
Decentralized Online Convex Optimization in Networked Systems

Yiheng Lin    Judy Gan    Guannan Qu    Yash Kanoria    Adam Wierman
Abstract

We study the problem of networked online convex optimization, where each agent individually decides on an action at every time step and agents cooperatively seek to minimize the total global cost over a finite horizon. The global cost is made up of three types of local costs: convex node costs, temporal interaction costs, and spatial interaction costs. In deciding their individual action at each time, an agent has access to predictions of local cost functions for the next $k$ time steps in an $r$-hop neighborhood. Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. We show that LPC achieves a competitive ratio of $1+\tilde{O}(\rho_T^k)+\tilde{O}(\rho_S^r)$ in an adversarial setting, where $\rho_T$ and $\rho_S$ are constants in $(0,1)$ that increase with the relative strength of temporal and spatial interaction costs, respectively. This is the first competitive ratio bound on decentralized predictive control for networked online convex optimization. Further, we show that the dependence on $k$ and $r$ in our results is near optimal by lower bounding the competitive ratio of any decentralized online algorithm.


1 Introduction

A wide variety of multi-agent systems can be modeled as optimization tasks in which individual agents must select actions based on local information with the goal of cooperatively learning to minimize a global objective in an uncertain, time-varying environment. This general setting emerges in applications such as formation control (Chen & Wang, 2005; Oh et al., 2015), power systems control (Molzahn et al., 2017; Shi et al., 2021), and multiproduct price optimization (Caro & Gallien, 2012; Candogan et al., 2012). In all these cases, it is key that the algorithms used by agents rely only on local information, due to the computational burden created by the size of the systems, the information constraints in the systems, and the need for fast and/or interpretable decisions.

At this point, there is a mature literature focused on decentralized optimization, e.g. Bertsekas & Tsitsiklis (1989); Boyd et al. (2011); Shi et al. (2015); Nedić et al. (2018), see Xin et al. (2020) for a survey; however, the design of learning policies for uncertain, time-varying environments requires decentralized online optimization. The literature studying decentralized online optimization is still nascent (see the related work section for a discussion of recent papers, e.g. Li et al. (2021b); Yuan et al. (2021); Yi et al. (2020)) and many challenging open questions remain.

Three issues of particular importance for real-world applications are the following.

First, temporal coupling in actions is often of first-order importance to applications. For example, startup costs, ramping costs, and switching costs are prominent in settings such as power systems and cloud computing, and lead to penalties for changing actions dramatically over time. The design of online algorithms to address such temporal interaction costs has received significant attention in the single-agent case recently, e.g., smoothed online optimization (Goel et al., 2019; Lin et al., 2020), convex body chasing (Argue et al., 2020b; Sellke, 2020), online optimization with memory (Agarwal et al., 2019; Shi et al., 2020), and dynamic pricing (Besbes & Lobel, 2015; Chen & Farias, 2018).

Second, spatial interaction costs are of broad importance in practical applications. Such costs arise because of the need for actions of nearby agents to be aligned with one another, and are prominent in settings such as economic team theory (Marschak, 1955; Marschak & Radner, 1972), combinatorial optimization over graphs (Hochba, 1997; Gamarnik & Goldberg, 2010), and statistical inference (Wainwright & Jordan, 2008). An example is (dynamic) multiproduct pricing, where the price of a product can impact the demand of other related products (Song & Chintagunta, 2006).

Third, leveraging predictions of future costs has long been recognized as a promising way to improve the performance of online agents (Morari & Lee, 1999; Lin et al., 2012; Badiei et al., 2015; Chen et al., 2016; Shi et al., 2019; Li et al., 2020). As learning tools become more prominent, the role of predictions is growing. By collecting data from repeated trials, data-driven learning tools make it possible to provide accurate predictions for near future costs. For example, in multiproduct pricing, good demand forecasts can be constructed up to a certain time horizon and are invaluable in setting prices (Caro & Gallien, 2012).

In addition to the three issues above, we would like to highlight that existing results for decentralized online optimization focus on designing algorithms with low (static) regret (Hosseini et al., 2016; Li et al., 2021b), i.e., algorithms that (nearly) match the performance of the best static action in hindsight. In a time-varying environment, it is desirable to instead obtain stronger bounds, such as those on the dynamic regret or competitive ratio, which compare to the dynamic optimal actions instead of the best static action in hindsight, e.g., see results in the centralized setting such as Lin et al. (2020); Li et al. (2020); Shi et al. (2020).

This paper aims to address decentralized online optimization with the three features described above. In particular, we are motivated by the open question: Can a decentralized algorithm make use of predictions to be competitive for networked online convex optimization in an adversarial environment when spatial and temporal costs are considered?

Contributions. This paper provides the first competitive algorithm for decentralized learning in networked online convex optimization. Agents in a network must each make a decision at each time step, to minimize a global cost which is the sum of convex node costs, spatial interaction costs, and temporal interaction costs. We propose a predictive control framework called Localized Predictive Control (LPC, Algorithm 1) and prove that it achieves a competitive ratio of $1+\tilde{O}(\rho_T^k)+\tilde{O}(\rho_S^r)$, which approaches 1 exponentially fast as the prediction horizon $k$ and communication radius $r$ increase simultaneously. Our results quantify the improvement in competitive ratio from increasing the communication radius $r$ (which also increases the computational requirements) versus increasing the prediction horizon $k$, and imply that – as a function of problem parameters – one of the two “resources” $k$ and $r$ emerges as the bottleneck to algorithmic performance. Given any target competitive ratio, we find the minimum required prediction horizon $k$ and communication radius $r$ as functions of the temporal interaction strength and the spatial interaction strength, respectively.

Further, we show that LPC is order-optimal in terms of $k$ and $r$ by proving a lower bound on the competitive ratio of any online algorithm. We formalize the near optimality of our algorithm by showing that a resource augmentation bound follows from our upper and lower bounds: our algorithm with given $k$ and $r$ performs at least as well as the best possible algorithm that is forced to work with $k'$ and $r'$ which are a constant factor smaller than $k$ and $r$, respectively.

The algorithm we propose, LPC, is inspired by Model Predictive Control (MPC). After fixing the prediction horizon $k$ and the communication radius $r$, each agent makes an individual decision by solving a $k$-time-step optimization problem on a local neighborhood centered at itself and with radius $r$. In doing so, the algorithm utilizes all available information and makes a “greedy” decision. One benefit of this algorithm is its simplicity and interpretability, which is often important for practical applications. Moreover, since the algorithm is local, the computation needed for each agent is independent of the network size.

Our main results are enabled by a new analysis methodology which obtains two separate decay factors for the propagation of decision errors (a temporal decay factor $\rho_T$ and a spatial decay factor $\rho_S$) through a novel perturbation analysis. Specifically, the perturbation analysis seeks to answer the following question: If we perturb the boundary condition of an agent $v$'s $r$-hop neighborhood at the time step $\tau$ steps after the present, how does that affect $v$'s current decision, in terms of the spatial distance $r$ and the temporal distance $\tau$? With our analysis, we are able to bound the impact on $v$'s current decision by $O(\rho_T^\tau\rho_S^r)$, where the decay factors $\rho_T$ and $\rho_S$ increase with the strength of temporal/spatial interactions among individual decisions. This novel analysis is critical for deriving a competitive ratio that distinguishes the decay rate for temporal and spatial distances.

To illustrate the use of our results in a concrete application, Appendix B provides a detailed discussion of dynamic multiproduct pricing, which is a central problem in revenue management. The resulting revenue maximization problem fits into our theoretical framework, and we deduce from our results that LPC guarantees near optimal revenue, in addition to reducing the computational burden (Schlosser, 2016) and providing interpretable prices (Biggs et al., 2021) for products.

Related Work. This paper contributes to the literature in three related areas, each of which we describe below.

Distributed Online Convex Optimization. Our work relates to a growing literature on distributed online convex optimization with time-varying cost functions over multi-agent networks. Many recent advances have been made including distributed OCO with delayed feedback (Cao & Basar, 2021), coordinating behaviors among agents (Li et al., 2021a; Cao & Başar, 2021), and distributed OCO with a time-varying communication graph (Hosseini et al., 2016; Akbari et al., 2017; Yuan et al., 2021; Li et al., 2021b; Yi et al., 2020). A common theme of the previous literature is the idea that agents can only access partial information of time-varying global loss functions, thus requiring local information exchange between neighboring agents. To the best of our knowledge, our paper is the first in this literature to provide competitive ratio bounds or consider spatial and temporal costs, e.g., switching costs.

Online Convex Optimization (OCO) with Switching Costs. Online convex optimization with switching costs was first introduced in Lin et al. (2012) to model dynamic power management in data centers. Different forms of cost functions have been studied since then, e.g., Chen et al. (2018); Shi et al. (2020); Lin et al. (2020), in order to fit a variety of applications from video streaming (Joseph & de Veciana, 2012) to energy markets (Kim & Giannakis, 2017). The quadratic form of switching cost was first proposed in Goel & Wierman (2019) and yields connections to optimal control, which were further explored in Lin et al. (2021). The literature has focused entirely on the centralized, single-agent setting. Our paper contributes to this literature by providing the first analysis of switching costs in a networked setting with a decentralized algorithm.

Perturbation Analysis of Online Algorithms. Sensitivity analysis of convex optimization problems, studying the properties of the optimal solutions as a function of the problem parameters, has a long and rich history (see Fiacco & Ishizuka (1990) for a survey). The works that are most related to ours consider the specific class of problems where the decision variables are located on a horizon axis, or consider a general network and aim to show that the impact of a perturbation on a decision variable is exponentially decaying in the graph distance from that variable, e.g., Shin et al. (2021); Shin & Zavala (2021); Lin et al. (2021). The idea of using exponentially decaying perturbation bounds to analyze an online algorithm was first proposed in Lin et al. (2021), where only the temporal dimension is considered. This style of perturbation analysis is key to our competitive bounds, and to prove them we provide new perturbation results that separate the impact of spatial and temporal costs in a network for the first time. Additionally, our analysis is enabled by new results on the decay rate of a product of exponential decay matrices, which may be of independent interest.

Notation. A complete notation table can be found in Appendix A. Here we describe the most commonly used notation. In a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we use $d_{\mathcal{G}}(v,u)$ to denote the distance (i.e., the length of the shortest path) between two vertices $v$ and $u$. $N_v^r$ denotes the $r$-hop neighborhood of vertex $v$, i.e., $N_v^r:=\{u\in\mathcal{V}\mid d_{\mathcal{G}}(u,v)\leq r\}$. $\partial N_v^r$ denotes the boundary of $N_v^r$, i.e., $\partial N_v^r=N_v^r\setminus N_v^{r-1}$. We generalize these notations to temporal-spatial graphs as follows. Let $\times$ denote the Cartesian product of sets, and define $N_{(t,v)}^{(k,r)}:=\{\tau\in\mathbb{Z}\mid t\leq\tau<t+k\}\times N_v^r$ and $\partial N_{(t,v)}^{(k,r)}:=N_{(t,v)}^{(k,r)}\setminus N_{(t,v)}^{(k-1,r-1)}$. For any subset of vertices $S$, we use $\mathcal{E}(S)$ to denote the set of all edges that have both endpoints in $S$, and define $S_+=\{u\in\mathcal{V}\mid\exists v\in S \text{ s.t. } d_{\mathcal{G}}(u,v)\leq 1\}$ (i.e., $S$ and its 1-hop neighbors). Let $\Delta$ denote the maximum degree of any vertex in $\mathcal{G}$, and let $h(r):=\sup_v\lvert\partial N_v^r\rvert$. We say a function is in $C^2$ if it is twice continuously differentiable. We use $\lVert\cdot\rVert$ to denote the (Euclidean) 2-norm for vectors and the induced 2-norm for matrices.
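To make the neighborhood notation concrete, the sketch below (plain Python; the helper names are our own, not from the paper) computes $N_v^r$ and $\partial N_v^r$ by breadth-first search on an adjacency-list representation of $\mathcal{G}$, using the network of Figure 1 (Section 2.1) as an example.

```python
from collections import deque

def r_hop_neighborhood(adj, v, r):
    """N_v^r = {u : d_G(u, v) <= r}, computed by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def r_hop_boundary(adj, v, r):
    """The boundary of N_v^r, i.e., N_v^r minus N_v^{r-1}."""
    if r == 0:
        return {v}
    return r_hop_neighborhood(adj, v, r) - r_hop_neighborhood(adj, v, r - 1)

# The 5-agent network of Figure 1.
adj = {"u1": ["u2"], "u2": ["u1", "v", "u3"],
       "v": ["u2", "u3"], "u3": ["u2", "v", "u4"], "u4": ["u3"]}
print(r_hop_neighborhood(adj, "v", 1))  # {'v', 'u2', 'u3'}
print(r_hop_boundary(adj, "v", 1))      # {'u2', 'u3'}
```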

2 Problem Setting

We consider a set of agents in a networked system where each agent individually decides on an action at each time step and the agents cooperatively seek to minimize a global cost over a finite time horizon $H$. Specifically, we consider a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ of agents. Each vertex $v\in\mathcal{V}$ denotes an individual agent, and two agents $v$ and $u$ interact with each other if and only if they are connected by an undirected edge $(v,u)\in\mathcal{E}$. At each time step $t=1,2,\dots,H$, each agent $v$ picks an $n$-dimensional local action $x_t^v\in D_t^v$, where $n$ is a positive integer and $D_t^v\subset\mathbb{R}^n$ is a convex set of feasible actions. The global action at time $t$ is the vector of agent actions $x_t=\{x_t^v\}_{v\in\mathcal{V}}$, and incurs a global stage cost, which is the sum of three types of local cost functions:

  • Node costs: Each agent $v$ incurs a time-varying node cost $f_t^v(x_t^v)$, which characterizes agent $v$'s local preference for its local action $x_t^v$.

  • Temporal interaction costs: Each agent $v$ incurs a time-varying temporal interaction cost $c_t^v(x_t^v, x_{t-1}^v)$, which characterizes how agent $v$'s previous local action $x_{t-1}^v$ interacts with its current local action $x_t^v$.

  • Spatial interaction costs: Each edge $e=(v,u)$ incurs a time-varying spatial interaction cost $s_t^e(x_t^v, x_t^u)$, which characterizes how agents $v$ and $u$'s current local actions affect each other. (Since $e$ is an undirected edge, the order in which we write the two inputs, the action of $v$ and the action of $u$, does not matter. Note that $s_t^e$ can still be asymmetric in the actions of $v$ and $u$, e.g., $s_t^e(x_t^v, x_t^u)=s_t^e(x_t^u, x_t^v)=\lVert x_t^v+2x_t^u\rVert^2$.)

In our model, the node cost is the part of the cost that depends only on the agent's current local action. If the other two types of costs are zero functions, each agent will trivially pick the minimizer of its node cost. Temporal interaction costs encourage agents to choose a local action that is “compatible” with their previous local action. For example, a temporal interaction could be a switching cost which penalizes large deviations from the previous action, in order to make the trajectory of local actions “smooth”. Such switching costs can be found in work on single-agent online convex optimization, e.g., Chen et al. (2018); Goel et al. (2019); Lin et al. (2020). Spatial interaction costs, on the other hand, can be used to enforce some collective behavior among the agents. For example, spatial interactions can model the probability that one agent's actions affect its neighbors' actions in diffusion processes on social networks (Kempe et al., 2015), or model interactions between complement/substitute products in multiproduct pricing (Candogan et al., 2012).

Our analysis is based on standard smoothness and convexity assumptions on the local cost functions (see Appendix A for definitions of smoothness and strong convexity):

Assumption 2.1.

For $\mu>0$, $\ell_f<\infty$, $\ell_T<\infty$, $\ell_S<\infty$, the local cost functions and feasible sets for all $t,v,e$ satisfy:

  • $f_t^v:\mathbb{R}^n\to\mathbb{R}_{\geq 0}$ is $\mu$-strongly convex, $\ell_f$-smooth, and in $C^2$;

  • $c_t^v:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}_{\geq 0}$ is convex, $\ell_T$-smooth, and in $C^2$;

  • $s_t^e:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}_{\geq 0}$ is convex, $\ell_S$-smooth, and in $C^2$;

  • $D_t^v\subseteq\mathbb{R}^n$ satisfies $\mathrm{int}(D_t^v)\neq\emptyset$ and can be written as $D_t^v:=\{x_t^v\in\mathbb{R}^n\mid (g_t^v)_i(x_t^v)\leq 0, \forall 1\leq i\leq m_t^v\}$, where each $(g_t^v)_i:\mathbb{R}^n\to\mathbb{R}$ is a convex function in $C^2$.

Note that the assumptions above are common, even in the case of single-agent online convex optimization, e.g., see Li et al. (2020); Shi et al. (2020); Lin et al. (2021).
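As a concrete illustration (our own choice of functions, not prescribed by the paper), the quadratic costs below satisfy Assumption 2.1 with $\mu=\ell_f=1$ and $\ell_T=\ell_S=2$:

```python
import numpy as np

def node_cost(x, theta):
    """f_t^v(x) = (1/2)||x - theta||^2: 1-strongly convex, 1-smooth, C^2."""
    return 0.5 * float(np.sum((x - theta) ** 2))

def temporal_cost(x_curr, x_prev):
    """c_t^v(x_t, x_{t-1}) = (1/2)||x_t - x_{t-1}||^2: convex and 2-smooth
    jointly in (x_t, x_{t-1}); penalizes switching away from the last action."""
    return 0.5 * float(np.sum((x_curr - x_prev) ** 2))

def spatial_cost(x_v, x_u):
    """s_t^e(x^v, x^u) = (1/2)||x^v - x^u||^2: convex, 2-smooth, and symmetric
    in its two arguments; encourages neighboring agents to align."""
    return 0.5 * float(np.sum((x_v - x_u) ** 2))
```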

It is useful to separate the global stage cost into two parts based on whether the cost term depends only on the current global action or whether it also depends on the previous action. Specifically, the part that depends only on the current global action $x_t$ is the sum of all node costs and spatial interaction costs. We refer to this component as the (global) hitting cost and denote it as

f_t(x_t) := \sum_{v\in\mathcal{V}} f_t^v(x_t^v) + \sum_{(v,u)\in\mathcal{E}} s_t^{(v,u)}(x_t^v, x_t^u).

The rest of the global stage cost involves the current global action $x_t$ and the previous global action $x_{t-1}$. We refer to it as the (global) switching cost and denote it as

c_t(x_t, x_{t-1}) = \sum_{v\in\mathcal{V}} c_t^v(x_t^v, x_{t-1}^v).

Combining the global hitting and switching costs, the networked agents work cooperatively to minimize the total global stage cost over a finite horizon $H$, starting from a given initial global action $x_0$ at time step 0: $cost(ALG):=\sum_{t=1}^H\left(f_t(x_t)+c_t(x_t, x_{t-1})\right)$, where $ALG$ denotes the decentralized online algorithm used by the agents. The offline optimal cost is the clairvoyant minimum cost one can incur on the same sequence of cost functions and the same initial global action $x_0$ at time step 0, i.e., $cost(OPT):=\min_{x_{1:H}}\sum_{t=1}^H\left(f_t(x_t)+c_t(x_t, x_{t-1})\right)$.
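The total cost above is straightforward to evaluate for a stored trajectory; the following sketch (with a hypothetical data layout of our choosing) mirrors the two sums, with `trajectory[0]` playing the role of $x_0$:

```python
def global_cost(trajectory, node_costs, temporal_costs, spatial_costs, edges):
    """cost(ALG) = sum_{t=1}^H ( f_t(x_t) + c_t(x_t, x_{t-1}) ).

    trajectory[t][v] : agent v's action at time t (index 0 holds x_0);
    node_costs[t][v], temporal_costs[t][v], spatial_costs[t][e] : callables
    implementing f_t^v, c_t^v, s_t^e (e.g., the quadratics sketched above).
    """
    H = len(trajectory) - 1
    total = 0.0
    for t in range(1, H + 1):
        x_t, x_prev = trajectory[t], trajectory[t - 1]
        total += sum(node_costs[t][v](x_t[v]) for v in x_t)                     # hitting: nodes
        total += sum(spatial_costs[t][e](x_t[e[0]], x_t[e[1]]) for e in edges)  # hitting: edges
        total += sum(temporal_costs[t][v](x_t[v], x_prev[v]) for v in x_t)      # switching
    return total
```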

We measure the performance of any online algorithm ALG by the competitive ratio (CR), which is a widely-used metric in the literature of online optimization, e.g., Chen et al. (2018); Goel et al. (2019); Argue et al. (2020a).

Definition 2.2.

The competitive ratio of online algorithm $ALG$ is the supremum of $cost(ALG)/cost(OPT)$ over all possible problem instances, i.e., $CR(ALG)\coloneqq\sup_{\mathcal{G},H,x_0,\{f_t^v,c_t^v,s_t^e,D_t^v\}} cost(ALG)/cost(OPT)$.

Finally, we define the partial hitting and switching costs over subsets of the agents. In particular, for a subset of agents $S\subseteq\mathcal{V}$, we denote the joint action over $S$ as $x_t^S:=\{x_t^v\mid v\in S\}$ and define the partial hitting cost and partial switching cost over $S$ as

f_t^S(x_t^{S_+}) := \sum_{v\in S} f_t^v(x_t^v) + \sum_{(v,u)\in\mathcal{E}(S_+)} s_t^{(v,u)}(x_t^v, x_t^u),
c_t^S(x_t^S, x_{t-1}^S) := \sum_{v\in S} c_t^v(x_t^v, x_{t-1}^v). (1)

This notation is useful for presenting decentralized online algorithms where the optimizations are performed over the $r$-hop neighborhood of each agent.

2.1 Information Availability Model

Figure 1: Illustration of the available information for agent $v$ at time $t$ in networked online convex optimization with $k=2$ and $r=1$, for the network with $\mathcal{V}=\{u_1,u_2,v,u_3,u_4\}$ and $\mathcal{E}=\{(u_1,u_2),(u_2,v),(v,u_3),(u_2,u_3),(u_3,u_4)\}$.

We assume that each agent has access to local cost functions up to a prediction horizon $k$ into the future, for themselves and their neighborhood up to a communication radius $r$. In more detail, recall that $N_v^r$ denotes the $r$-hop neighborhood of agent $v$, i.e., $N_v^r:=\{u\in\mathcal{V}\mid d_{\mathcal{G}}(u,v)\leq r\}$. Before picking a local action $x_t^v$ at time $t$, agent $v$ can observe $k$ steps of future node costs, temporal interaction costs, and spatial interaction costs within its $r$-hop neighborhood, $\{\{(f_\tau^u, c_\tau^u)\mid u\in N_v^r\}, \{s_\tau^e\mid e\in\mathcal{E}(N_v^r)\}\}_{t\leq\tau<t+k}$, as well as the previous local actions in $N_v^r$: $\{x_{t-1}^u\mid u\in N_v^r\}$.

We provide an illustration of the local cost functions known to agent $v$ at time $t$ in Figure 1. In the figure, the black circles, blue lines, and orange lines denote the node costs, temporal interaction costs, and spatial interaction costs, respectively. The known functions are marked by solid lines. Note that, in addition to the local cost functions, agent $v$ also knows the local actions in $N_v^r$ at time $t-1$, which are not illustrated in the figure.

To simplify notation, in cases when the prediction horizon exceeds the whole horizon length $H$, we adopt the convention that $f_t^v(x_t^v)=\frac{\mu}{2}\lVert x_t^v\rVert^2$, $c_t^v\equiv s_t^e\equiv 0$, and $D_t^v=\mathbb{R}^n$ for $t>H$. These extended definitions do not affect our original problem with horizon $H$.

As in many previous works studying the power of predictions in the online optimization literature, e.g., Yu et al. (2020); Lin et al. (2020); Li et al. (2020); Lin et al. (2021), we assume the $k$-step predictions of cost functions are exact and leave the case of inexact predictions for future work. This model is reasonable in the case where the predictors can be trained to be very accurate for the near future. Although we focus on exact predictions throughout this paper, we also discuss how to extend this information availability model to include inexact predictions in Appendix C.3.

3 Algorithm and Main Results

Algorithm 1 Localized Predictive Control (for agent $v$)
  Parameters: $k$ and $r$.
  for $t=1$ to $H$ do
     Receive information $\{x_{t-1}^u\mid u\in N_v^r\}$ and $\{\{(f_\tau^u, c_\tau^u)\mid u\in N_v^r\}, \{s_\tau^e\mid e\in\mathcal{E}(N_v^r)\}\}_{t\leq\tau<t+k}$.
     Choose local action $x_t^v$ to be the $(t,v)$-th element in $\psi_{(t,v)}^{(k,r)}\left(\{x_{t-1}^u\mid u\in N_v^r\}, \{\theta_\tau^u\mid (\tau,u)\in\partial N_{(t,v)}^{(k,r)}\}\right)$, the solution of (3.1), where $\theta_\tau^u\coloneqq\arg\min_{y\in D_\tau^u} f_\tau^u(y)$.
  end for

In this section we present our main results, which show that our simple and practical LPC algorithm can achieve an order-optimal competitive ratio for the networked online convex optimization problem. We first introduce LPC in Section 3.1. Then, we present the key idea that leads to our competitive ratio bound: a novel perturbation-based analysis (Section 3.2). Next, we use our perturbation analysis to derive bounds on the competitive ratio in Section 3.3. Finally, we show that the competitive ratio of LPC is order-optimal in Section 3.4. An outline that highlights the major novelties in our proofs can be found in Appendix C.

3.1 Localized Predictive Control (LPC)

The design of LPC is inspired by the classical model predictive control (MPC) framework (Garcia et al., 1989), which leverages all available information at the current time step to decide the current local action “greedily”. In our context, when an agent $v$ wants to decide its action $x_t^v$ at time $t$, the available information includes the previous local actions in the $r$-hop neighborhood and $k$-step predictions of all local node costs and temporal/spatial interaction costs. The boundaries of all available information, which are formed by $\{t-1\}\times N_v^r$ and $\partial N_{(t,v)}^{(k,r)}$, are illustrated in Figure 2.

The pseudocode for LPC is presented in Algorithm 1. For each agent $v$ at time step $t$, LPC fixes the actions on the boundaries of available information and then solves for the optimal actions inside the boundaries. Specifically, define $\psi_{(t,v)}^{(k,r)}\left(\{y_{t-1}^u\mid u\in N_v^r\}, \{z_\tau^u\mid (\tau,u)\in\partial N_{(t,v)}^{(k,r)}\}\right)$ as the optimal solution of the problem

\min \sum_{\tau=t}^{t+k-1}\left(f_\tau^{(N_v^{r-1})}\left(x_\tau^{(N_v^r)}\right)+c_\tau^{(N_v^r)}\left(x_\tau^{(N_v^r)}, x_{\tau-1}^{(N_v^r)}\right)\right)
\text{s.t.}\quad x_{t-1}^u = y_{t-1}^u, \ \forall u\in N_v^r,
\qquad\ x_\tau^u = z_\tau^u, \ \forall (\tau,u)\in\partial N_{(t,v)}^{(k,r)}, \qquad (3.1)
\qquad\ x_\tau^u\in D_\tau^u, \ \forall (\tau,u)\in N_{(t,v)}^{(k-1,r-1)},

where the partial hitting cost $f_\tau^S$ and partial switching cost $c_\tau^S$ for a subset $S$ of agents were defined in (1). Note that $\psi_{(t,v)}^{(k,r)}\left(\{y_{t-1}^u\mid u\in N_v^r\}, \{z_\tau^u\mid (\tau,u)\in\partial N_{(t,v)}^{(k,r)}\}\right)$ is a matrix of actions (in $\mathbb{R}^n$) indexed by $(\tau,u)\in N_{(t,v)}^{(k-1,r-1)}$. (When the context is clear, we use the shorthand $\psi_{(t,v)}^{(k,r)}\left(\{y_{t-1}^u\},\{z_\tau^u\}\right)$.) Once the parameters $\{y_{t-1}^u\}$ and $\{z_\tau^u\}$ are fixed, agent $v$ can leverage its knowledge of the local cost functions to solve (3.1).
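To make (3.1) concrete, the following sketch solves the local problem for the quadratic costs illustrated in Section 2, using cvxpy as a generic convex solver (our choice; the paper does not prescribe one) and taking $D_\tau^u=\mathbb{R}^n$ for simplicity. Terms that involve only fixed boundary actions are constants, so including them does not change the minimizer relative to the partial costs in (3.1).

```python
import cvxpy as cp
import numpy as np

def solve_local_problem(times, interior, boundary, prev, edges, n=2):
    """Sketch of psi_{(t,v)}^{(k,r)} in (3.1) for the quadratic costs above.

    times:    the k steps [t, ..., t+k-1]
    interior: pairs (tau, u) in N_{(t,v)}^{(k-1,r-1)}  -> decision variables
    boundary: dict {(tau, u): fixed value} on the boundary dN_{(t,v)}^{(k,r)}
    prev:     dict {u: x_{t-1}^u} of observed actions, for u in N_v^r
    edges:    the edges of E(N_v^r) as (u, w) pairs
    """
    x = {p: cp.Variable(n) for p in interior}

    def action(tau, u):
        """The action at (tau, u): a decision variable, a fixed boundary
        value, or the observed action at time t-1."""
        if (tau, u) in x:
            return x[(tau, u)]
        if (tau, u) in boundary:
            return boundary[(tau, u)]
        return prev[u]  # reached only when tau == times[0] - 1

    cost = 0
    for tau in times:
        for u in prev:  # prev.keys() is N_v^r
            cost += 0.5 * cp.sum_squares(action(tau, u))                       # node cost
            cost += 0.5 * cp.sum_squares(action(tau, u) - action(tau - 1, u))  # temporal
        for (u, w) in edges:
            cost += 0.5 * cp.sum_squares(action(tau, u) - action(tau, w))      # spatial
    cp.Problem(cp.Minimize(cost)).solve()
    return {p: x[p].value for p in interior}
```

Under LPC, `boundary` would hold the node-cost minimizers $\theta_\tau^u$ (the zero vector for the quadratic node cost above) and `prev` would hold the observed actions $\{x_{t-1}^u\mid u\in N_v^r\}$.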

Figure 2: Illustration of LPC with $k=3, r=2$ on a line graph (the underlying graph is replicated over the time dimension). The orange node marks the decision variable at $(t,v)$. The green part denotes the decisions in $N_v^r$ at time $t-1$. The blue “U” shape denotes the boundary of available information for node $v$ at time $t$. Edge $e\coloneqq(v,q)$.

LPC fixes the parameters $\{y_{t-1}^u\}$ to be $\{x_{t-1}^u\}$, the previous local actions in $N_v^r$, and fixes the parameters $\{z_\tau^u\}$ to be the minimizers of the local node cost functions at nodes in $\partial N_{(t,v)}^{(k,r)}$. The selection of the parameters at nodes in $\partial N_{(t,v)}^{(k,r)}$ plays a similar role as the terminal cost of classical MPC in centralized settings.

For a single-agent system, MPC-style algorithms are perhaps the most prominent approach for optimization-based control (Garcia et al., 1989) because of their simplicity and excellent performance in practice. LPC extends the ideas of MPC to a multi-agent setting in a networked system by leveraging available information in both the temporal and spatial dimensions, whereas classical MPC focuses only on the temporal dimension. This change leads to significant technical challenges in the analysis.

3.2 Perturbation Analysis

The key idea underlying our analysis of LPC is that the impact of perturbations to the actions at the boundaries of the available information of an agent decay quickly, in fact exponentially fast, in the distance of the boundary from the agent. This quick decay means that small errors cannot build up to hurt algorithm performance.

In this section, we formally study such perturbations by deriving several new results which generalize perturbation bounds for online convex optimization problems on networks. Our bounds capture both the effect of temporal interactions as well as spatial interactions between agent actions, which is a more challenging problem compared to previous literature which considers either temporal interactions (Lin et al., 2021) or spatial interactions (Shin et al., 2021) but not both simultaneously.

More specifically, recall from Section 3.1 that for each agent $v$ at time $t$, LPC solves an optimization problem $\psi_{(t,v)}^{(k,r)}$ where the actions on the boundaries of available information (i.e., $\{t-1\}\times N_v^r$ and $\partial N_{(t,v)}^{(k,r)}$) are fixed. By the principle of optimality, we know that if the actions on the boundaries are selected to be identical to the offline optimal actions, the agent can decide its current action optimally by solving $\psi_{(t,v)}^{(k,r)}$. However, due to the limits on the prediction horizon and communication radius, LPC can only approximate the offline optimal actions on the boundaries (we do this by using the minimizers of the node cost functions). The key step in our analysis of the optimality gap of LPC is to ask: If we perturb the parameters of $\psi_{(t,v)}^{(k,r)}$, i.e., the actions on the information boundaries, how large is the resulting change in a local action $x_{t_0}^{v_0}$ for $(t_0,v_0)\in N_{(t,v)}^{(k,r)}\setminus\partial N_{(t,v)}^{(k,r)}$ (in the optimal solution to (3.1))?

Ideally, we would like the above impact to decay exponentially fast in the graph distance between node $v_0$ and the communication boundary for node $v$ (i.e., $r$ minus the graph distance between $v_0$ and $v$), as well as in the temporal distance between $t_0$ and $t$. We formalize this goal as an exponentially decaying local perturbation bound in Definition 3.1. We then show in Theorem 3.2 and Theorem 3.3 that such bounds hold under appropriate assumptions.

Definition 3.1.

Define $x_{t_0}^{v_0}\coloneqq\psi_{(t,v)}^{(k,r)}\left(\{y_{t-1}^u\},\{z_\tau^u\}\right)_{(t_0,v_0)}$ and $(x_{t_0}^{v_0})'\coloneqq\psi_{(t,v)}^{(k,r)}\left(\{(y_{t-1}^u)'\},\{(z_\tau^u)'\}\right)_{(t_0,v_0)}$ for arbitrary boundary parameters $\{y_{t-1}^u\},\{z_\tau^u\}$ and $\{(y_{t-1}^u)'\},\{(z_\tau^u)'\}$. We say an exponentially decaying local perturbation bound holds if, for non-negative constants

C_1 = C_1(\ell_T/\mu, (\Delta\ell_S)/\mu) < \infty,
C_2 = C_2(\ell_T/\mu, (\Delta\ell_S)/\mu) < \infty,
\rho_T = \rho_T(\ell_T/\mu) < 1, \quad \rho_S = \rho_S((\Delta\ell_S)/\mu) < 1,

for any $(t_0,v_0)$ and arbitrary boundary parameters $\{y_{t-1}^u\},\{z_\tau^u\},\{(y_{t-1}^u)'\},\{(z_\tau^u)'\}$, we have:

\lVert x_{t_0}^{v_0} - (x_{t_0}^{v_0})' \rVert \leq C_1 \sum_{(\tau,u)\in\partial N_{(t,v)}^{(k,r)}} \rho_T^{|t_0-\tau|} \rho_S^{d_{\mathcal{G}}(v_0,u)} \lVert z_\tau^u - (z_\tau^u)' \rVert + C_2 \sum_{u\in N_v^r} \rho_T^{t_0-(t-1)} \rho_S^{d_{\mathcal{G}}(v_0,u)} \lVert y_{t-1}^u - (y_{t-1}^u)' \rVert.

Perturbation bounds were recently found to be a promising tool for the analysis of adaptive control and online optimization models (Lin et al., 2021). The exponentially decaying local perturbation bound defined above is similar in spirit to two recent results: Lin et al. (2021) derives a similar perturbation bound for line graphs, and Shin et al. (2021) for general graphs with local perturbations. In fact, one may attempt to derive such a bound by applying these results directly; however, a major weakness of the direct approach is that it will yield $\rho_T=\rho_S$, i.e., it cannot distinguish between spatial and temporal dependencies, and the bound deteriorates as $\max\{\ell_T/\mu, \ell_S/\mu\}$ increases. For instance, even if the temporal interactions are weak (i.e., $\ell_T/\mu\approx 0$), $\rho_T=\rho_S$ can still be close to $1$ if $\ell_S/\mu$ is large, leading to a large slack in the perturbation bound for small prediction horizons $k$.

We overcome this limitation by redefining the action variables. Specifically, to focus on the temporal decay effect, we regroup all local actions in $\{\tau\}\times N_v^r$ as a “large” decision variable for time $\tau$ (in Figure 1 we would group each horizontal blue plane in $N_v^r$ to create a new variable). After regrouping, we have $(k+1)$ “large” decision variables located on a line graph, where the strength of the interactions between consecutive variables is upper bounded by $\ell_T$. On the other hand, to focus on spatial decay, we regroup all local actions in $\{\tau\mid t-1\leq\tau<t+k\}\times\{v\}$ as a decision variable (in Figure 1 we would group each vertical orange line connecting time $t-1$ to $t+k-1$ to create a new variable). After regrouping, we have $\lvert\mathcal{V}\rvert$ “large” decision variables located on $\mathcal{G}$, where the strength of the interactions between two neighbors is upper bounded by $\ell_S$. Averaging over the two perturbation bounds (since we have two valid bounds, their average is also a valid bound) provides the following exponentially decaying local perturbation bound (see (D.1) in Appendix D.1 for details of the proof).

Theorem 3.2.

Under Assumption 2.1, the exponentially decaying local perturbation bound (Definition 3.1) holds with $C_1=C_2=\frac{2\sqrt{\Delta\ell_S\ell_T}}{\mu}$, and

\rho_T = \sqrt{1 - 2\left(\sqrt{1 + 2\ell_T/\mu} + 1\right)^{-1}}, \quad \rho_S = \sqrt{1 - 2\left(\sqrt{1 + \Delta\ell_S/\mu} + 1\right)^{-1}}.

Note that, as $\ell_T/\mu$ (respectively, $\ell_S/\mu$) tends to zero, $\rho_T$ (respectively, $\rho_S$) in Theorem 3.2 also tends to zero, with the scaling $\rho_T=\Theta(\sqrt{\ell_T/\mu})$ (resp. $\rho_S=\Theta(\sqrt{\ell_S/\mu})$).
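For a sense of magnitudes, the decay factors of Theorem 3.2 are easy to tabulate (our own numerics, not from the paper):

```python
import math

def decay_factors(lT_over_mu, Delta_lS_over_mu):
    """rho_T and rho_S from Theorem 3.2."""
    rho_T = math.sqrt(1 - 2 / (math.sqrt(1 + 2 * lT_over_mu) + 1))
    rho_S = math.sqrt(1 - 2 / (math.sqrt(1 + Delta_lS_over_mu) + 1))
    return rho_T, rho_S

# Weaker interactions give smaller factors, i.e., faster decay, with the
# Theta(sqrt(.)) scaling noted above:
print(decay_factors(0.1, 0.1))      # approx (0.213, 0.154)
print(decay_factors(0.001, 0.001))  # approx (0.0224, 0.0158)
```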

Next, we provide a tighter bound (through a refined analysis) for the regime where $\mu$ is much larger than $\ell_T, \ell_S$. Specifically, we establish a bound with the scaling $\rho_T=\Theta(\ell_T/\mu)$ and $\rho_S=\Theta(\ell_S/\mu)$. Again, it is not possible to obtain this result from previous perturbation bounds in the literature.

Theorem 3.3.

Recall $h(\gamma)\coloneqq\sup_{v\in\mathcal{V}}\lvert\partial N_v^\gamma\rvert$. Given any $b_1, b_2>0$, define $a=\sum_{\gamma\geq 0}\left(\frac{1+b_1}{1+b_1+b_2}\right)^\gamma h(\gamma)$, $\tilde{a}=\sum_{\gamma\geq 0}\left(\frac{1}{1+b_1}\right)^\gamma h(\gamma)$, and $\gamma_S=\frac{\sqrt{1+\Delta\ell_S/\mu}-1}{\sqrt{1+\Delta\ell_S/\mu}+1}$. Suppose Assumption 2.1 holds, $a,\tilde{a}<\infty$, and $\mu\geq\max\{8\tilde{a}\ell_T, \Delta\ell_S(b_1+b_2)/4\}$. Then the exponentially decaying local perturbation bound (Definition 3.1) holds with $C_1=C_2=\max\left\{\frac{a^2}{2\tilde{a}(1-4\tilde{a}\ell_T/\mu)}, \frac{2a^2\Delta\ell_S/\mu}{\gamma_S(1+b_1+b_2)(1-4\tilde{a}\ell_T/\mu)}\right\}$ and

\rho_T = \frac{4\tilde{a}\ell_T}{\mu}, \quad \rho_S = (1+b_1+b_2)\gamma_S.

Note that $\rho_T,\rho_S<1$ follow from the condition on $\mu$. Also observe that $\gamma_S=\Theta(\ell_S/\mu)$ as $\ell_S/\mu\to 0$.

The main difference between this result and Theorem 3.2 is that, instead of dividing and redefining the action variables, we explicitly write down the perturbations along spatial edges and along temporal edges in the original temporal-spatial graph. We observe that per-time-step spatial interactions are characterized by a banded matrix, and the inverse of a banded matrix exhibits exponential correlation decay, which implies the exponentially decaying local perturbation bound holds if the perturbed boundary action and the impacted local action we consider are at the same time step. However, for a multi-time-step problem, characterizing the impact on a local action at one time step of a perturbation to a boundary action at a different time step is difficult. The main technical contribution of this proof is to establish that a product of exponentially decaying matrices still satisfies exponential decay under the conditions in Theorem 3.3. In addition, we obtain a tight bound on the decay rate of the product matrix (see Lemma C.3), which may be of independent interest.

Our condition that $a,\tilde{a}<\infty$ and $\mu>\max\{8\tilde{a}\ell_T, \Delta\ell_S(b_1+b_2)/4\}$ characterizes a tradeoff between the allowable neighborhood boundary sizes $h(\gamma)$ and how large $\mu$ needs to be compared to the interaction cost parameters $\ell_T,\ell_S$. At one extreme, if $h(\gamma)=\Delta^\gamma$, then by setting $b_1=2\Delta-1$ and $b_2=4\Delta^2-2\Delta$, we obtain $a=\tilde{a}=2$ but must make a strong requirement on $\mu$, namely, $\mu>\max\{16\ell_T, \Delta^3\ell_S(1-\frac{1}{4\Delta^2})\}$. At the other extreme, if $h(\gamma)\leq O(\mathrm{poly}(\gamma))$ (as is the case if $\mathcal{G}$ is a grid), then $a,\tilde{a}<\infty$ holds for any $b_1,b_2>0$, and we can impose a weaker requirement on $\mu$: for example, taking $b_1=b_2=1$ yields the requirement $\mu>\max\{8\tilde{a}\ell_T, \Delta\ell_S/2\}$ (where $\tilde{a}=\sum_{\gamma\geq 0}(\frac{1}{2})^\gamma h(\gamma)$), which grows only linearly in $\Delta$ and compares favorably with the $\mu>\Omega(\Delta^3)$ requirement which arose earlier.
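A quick numerical check of the first extreme (our own verification): with $h(\gamma)=\Delta^\gamma$, $b_1=2\Delta-1$, and $b_2=4\Delta^2-2\Delta$, both series reduce to geometric series with ratio $1/2$ and sum to $2$.

```python
Delta = 3
b1, b2 = 2 * Delta - 1, 4 * Delta ** 2 - 2 * Delta

# (1+b1)/(1+b1+b2) = 2*Delta/(4*Delta^2) = 1/(2*Delta), and 1/(1+b1) = 1/(2*Delta),
# so multiplying by h(gamma) = Delta^gamma leaves ratio-1/2 geometric series.
a       = sum(((1 + b1) / (1 + b1 + b2)) ** g * Delta ** g for g in range(200))
a_tilde = sum((1 / (1 + b1)) ** g * Delta ** g for g in range(200))
print(a, a_tilde)  # 2.0 2.0 (up to floating point)
```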

Proofs of Theorem 3.2 and Theorem 3.3 are in Appendix D.

3.3 From Perturbations to Competitive Bounds

We now present our main result, which bounds the competitive ratio of LPC using the exponentially decaying local perturbation bounds defined in the previous section.

Before presenting the result, we first provide some intuition as to why the perturbation bounds are useful for deriving the competitive ratio bound. Bounding the competitive ratio requires bounding the gap between LPC's trajectory and the offline optimal trajectory. This gap comes from two sources: (i) the per-time-step error made by LPC due to its limited prediction horizon and communication radius; and (ii) the cumulative impact of all per-time-step errors made in the past. Intuitively, the local perturbation bounds we derive in Section 3.2 allow us to bound the per-step error made jointly by all agents in LPC, and the perturbation bounds from Lin et al. (2021) then help us bound the second, cumulative type of error.

We present our main result in the following theorem and defer a proof outline to Appendix C.2. A formal proof can be found in Appendix E.3.

Theorem 3.4.

Suppose Assumption 2.1 and the exponentially decaying local perturbation bound in Definition 3.1 hold. Define $\rho_G\coloneqq 1-2\left(\sqrt{1+2\ell_T/\mu}+1\right)^{-1}$ and $C_3(r)\coloneqq\sum_{\gamma=0}^r h(\gamma)\cdot\rho_S^\gamma$. If the parameters $r$ and $k$ of LPC are large enough that $O\left(h(r)^2\cdot\rho_S^{2r}+C_3(r)^2\cdot\rho_T^{2k}\cdot\rho_G^{2k}\right)\leq\frac{1}{2}$, then the competitive ratio of LPC is upper bounded by

1 + O\left(h(r)^2\cdot\rho_S^r\right) + O\left(C_3(r)^2\cdot\rho_T^k\right).

Here the $O(\cdot)$ notation hides factors that depend polynomially on $\ell_f/\mu$, $\ell_T/\mu$, and $(\Delta\ell_S)/\mu$; see Appendix E.3.

Recall that $h(r)$ denotes the size of the largest $r$-hop boundary in $\mathcal{G}$. The bound in Theorem 3.4 implies that if $h(r)$ can be upper bounded by $\mathrm{poly}(r)\cdot\rho_S^{-\frac{(1-\iota)r}{2}}$ for some constant $\iota>0$, the competitive ratio of LPC can be upper bounded by $1+O(\rho_S^{\iota r})+O(\rho_T^k)$, because $C_3(r)$ can be upper bounded by a constant that depends on $\iota$ in this case. Therefore, the competitive ratio improves exponentially with respect to the prediction horizon $k$ and communication radius $r$.
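Suppressing the hidden constants in Theorem 3.4 (so the numbers below are back-of-the-envelope estimates, not the paper's exact requirements), the smallest $k$ and $r$ achieving a target overhead $\epsilon$ follow by solving $\rho_T^k\leq\epsilon/2$ and $\rho_S^{\iota r}\leq\epsilon/2$:

```python
import math

def min_k_and_r(rho_T, rho_S, eps, iota=1.0):
    """Smallest k, r with rho_T**k <= eps/2 and rho_S**(iota*r) <= eps/2,
    i.e., a competitive ratio of roughly 1 + eps with constants suppressed."""
    k = math.ceil(math.log(eps / 2) / math.log(rho_T))
    r = math.ceil(math.log(eps / 2) / (iota * math.log(rho_S)))
    return k, r

# With the decay factors computed earlier for weak interactions:
print(min_k_and_r(0.213, 0.154, eps=0.01))  # (4, 3); both grow as log(1/eps)
```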

Note that the assumption $h(r)\leq\mathrm{poly}(r)\cdot\rho_S^{-\frac{(1-\iota)r}{2}}$ is not particularly restrictive: for commonly seen graphs like an $m$-dimensional grid, $h(r)$ is polynomial in $r$, so $\iota=1$ works. More generally, for graphs with bounded degree $\Delta<\infty$, there exists $\delta=\delta(\Delta)>0$ such that, for any graph with node degrees bounded above by $\Delta$ and any $\ell_S/\mu\leq\delta$, the factor $\rho_S$ (from either Theorem 3.2 or Theorem 3.3) will be small enough that, e.g., $h(r)\leq\Delta^r=O(\rho_S^{-\frac{r}{4}})$; i.e., $\iota=1/2$ works. Thus we can eliminate the dependence on $h(r)$ and $C_3(r)$ in the competitive ratio by making additional assumptions on $\ell_S/\mu$. This result is stated in Corollary 3.5, whose proof is deferred to Appendix E.4. Corollary 3.5 is a corollary of Theorem 3.4 and Theorem 3.3; we use the bound in Theorem 3.3 rather than that of Theorem 3.2 because Theorem 3.3 is tighter when $\ell_S/\mu$ is small.

Corollary 3.5.

Suppose Assumption 2.1 holds and the inequalities $\ell_S/\mu\leq\Delta^{-7}$ and $\ell_T/\mu\leq 1/16$ hold. If $r$ and $k$ satisfy $O\left(\rho_S^r+\rho_T^{2k}\cdot\rho_G^{2k}\right)\leq\frac{1}{2}$, then the competitive ratio of LPC is upper bounded by $1+O\left(\rho_S^{r/2}\right)+O\left(\rho_T^k\right)$, where $\rho_S$ and $\rho_T$ are given by Theorem 3.3 with parameters $b_1=2\Delta-1$ and $b_2=4\Delta^2-2\Delta$. The $O(\cdot)$ notation hides factors that depend polynomially on $\ell_f/\mu$, $\ell_T/\mu$, and $(\Delta\ell_S)/\mu$; see Appendix E.4 for the full constants.

3.4 A Lower Bound

We show that the competitive ratio in Theorem 3.4 is order-optimal by deriving a lower bound on the competitive ratio of any decentralized online algorithm with prediction horizon $k$ and communication radius $r$. The specific constants and a proof of Theorem 3.6 can be found in Appendix F.

Theorem 3.6.

When $\Delta\geq 3$, the competitive ratio of any decentralized online algorithm is lower bounded by $1+\Omega(\lambda_T^k)+\Omega(\lambda_S^r)$. Here, the decay factor $\lambda_T$ is given by $\lambda_T=\left(1-2\left(\sqrt{1+4\ell_T/\mu}+1\right)^{-1}\right)^2$. The decay factor $\lambda_S$ is given by $\lambda_S=\frac{\Delta\ell_S/\mu}{3+3(\Delta\ell_S/\mu)}$ if $\Delta\ell_S/\mu<48$, and $\lambda_S=\max\left(\frac{\Delta\ell_S/\mu}{3+3(\Delta\ell_S/\mu)}, \left(1-4\sqrt{3}\cdot(\Delta\ell_S/\mu)^{-\frac{1}{2}}\right)^2\right)$ otherwise. The $\Omega(\cdot)$ notation hides factors that depend polynomially on $1/\mu$, $\ell_T$, and $\ell_S$.

While Theorem 3.6 highlights that Theorem 3.4 is order-optimal, the decay factors $\lambda_T,\lambda_S$ in the lower bound differ from their counterparts $\rho_T,\rho_S$ in the upper bound for LPC. To understand the magnitude of the difference, we compare the bounds on graphs with bounded degree $\Delta$. The decay factors are functions of the interaction strengths, which are measured by $\ell_S/\mu$ and $\ell_T/\mu$. Our lower bound on the temporal decay factor $\lambda_T$ and the upper bound $\rho_T$ differ only by a constant factor in log-scale, and the same holds for the lower/upper bounds on the spatial decay factor.
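To illustrate the log-scale comparison for the temporal factors, here is a short numerical check (our own) of $\lambda_T$ from Theorem 3.6 against $\rho_T$ from Theorem 3.2:

```python
import math

def rho_T_upper(x):     # Theorem 3.2, with x = ell_T / mu
    return math.sqrt(1 - 2 / (math.sqrt(1 + 2 * x) + 1))

def lambda_T_lower(x):  # Theorem 3.6
    return (1 - 2 / (math.sqrt(1 + 4 * x) + 1)) ** 2

for x in (0.1, 1.0, 10.0):
    print(x, math.log(lambda_T_lower(x)) / math.log(rho_T_upper(x)))
# The ratio log(lambda_T) / log(rho_T) stays near 3 across interaction
# strengths: the two factors differ by a constant factor in log-scale.
```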

To formalize this comparison, we derive a resource augmentation bound that bounds the additional “resources” that LPC needs to outperform the optimal decentralized online algorithm (see, e.g., Roughgarden (2020) for an introduction to this flavor of bound for expressing the near-optimality of an algorithm). Here the prediction horizon $k$ and the communication radius $r$ can be viewed as the “resources” available to a decentralized online algorithm in our setting. We ask: how large do the $k$ and $r$ given to LPC need to be to ensure that it beats the optimal decentralized online algorithm with prediction horizon $k^*$ and communication radius $r^*$?

We formally state our result in the following corollary and provide a proof in Appendix G.

Corollary 3.7.

Under Assumption 2.1, suppose the optimal decentralized online algorithm achieves a competitive ratio of $c(k^*, r^*)$ with prediction horizon $k^*$ and communication radius $r^*$. Additionally assume that $h(\gamma)=\tilde{O}\left(\rho_S^{-\gamma/4}\right)$ and $\Delta\geq 3$, where the $\tilde{O}$ notation hides a factor that depends polynomially on $\gamma$. As $k^*, r^*\to\infty$, LPC achieves a competitive ratio at least as good as that of the optimal decentralized online algorithm when LPC uses a prediction horizon of $k=(4+o(1))k^*$ and a communication radius of $r=(32+o(1))r^*$.

Finally, note that we establish Corollary 3.7 based on the local perturbation bound in Theorem 3.2 rather than Theorem 3.3 for simplicity, because it does not make assumptions on the relationship among $\mu$, $\ell_T$, and $\ell_S$. We expect that Theorem 3.3 can give better resource augmentation bounds under stronger assumptions on $\mu$, $\ell_T$, and $\ell_S$.

4 Concluding Remarks

In this work, we introduce and study a novel form of decentralized online convex optimization in a networked system, where the local actions of each agent are coupled by temporal interactions and spatial interactions. We propose a decentralized online algorithm, LPC, which leverages all available information within a prediction horizon of length $k$ and a communication radius of $r$ to achieve a competitive ratio of $1+\tilde{O}(\rho_T^k)+\tilde{O}(\rho_S^r)$. Our lower bound result shows that this competitive ratio is order-optimal. Our results imply that the two types of resources, the prediction horizon and the communication radius, must be improved simultaneously in order to obtain a competitive ratio that converges to $1$. That is, it is not enough to have either a large communication radius or a long prediction horizon; the combination of both is necessary to approach the hindsight optimal performance.

A limitation of this work is that we have considered only exact predictions in the available information model, the LPC algorithm, and its analysis. Considering inexact predictions is an important future goal and we are optimistic that our work can be generalized in that direction using the roadmap in Appendix C.3.

Acknowledgement: We would like to thank the ICML reviewers and meta-reviewer, as well as Yisong Yue, for their valuable feedback on this work. This work is supported by NSF grants ECCS-2154171, CNS-2146814, CPS-2136197, CNS-2106403, NGSDI-2105648, and CMMI-1653477, with additional support from Amazon AWS, PIMCO, and the Resnick Sustainability Institute. Yiheng Lin was supported by the Kortschak Scholars program.

References

  • Agarwal et al. (2019) Agarwal, N., Bullins, B., Hazan, E., Kakade, S., and Singh, K. Online control with adversarial disturbances. In International Conference on Machine Learning, pp. 111–119. PMLR, 2019.
  • Akbari et al. (2017) Akbari, M., Gharesifard, B., and Linder, T. Distributed online convex optimization on time-varying directed graphs. IEEE Transactions on Control of Network Systems, 4(3), 2017.
  • Antoniadis et al. (2020) Antoniadis, A., Coester, C., Elias, M., Polak, A., and Simon, B. Online metric algorithms with untrusted predictions. In International Conference on Machine Learning, pp. 345–355. PMLR, 2020.
  • Argue et al. (2020a) Argue, C., Gupta, A., and Guruganesh, G. Dimension-free bounds for chasing convex functions. In Conference on Learning Theory, pp.  219–241. PMLR, 2020a.
  • Argue et al. (2020b) Argue, C., Gupta, A., Guruganesh, G., and Tang, Z. Chasing convex bodies with linear competitive ratio. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.  1519–1524. SIAM, 2020b.
  • Badiei et al. (2015) Badiei, M., Li, N., and Wierman, A. Online convex optimization with ramp constraints. In 2015 54th IEEE Conference on Decision and Control (CDC), pp.  6730–6736. IEEE, 2015.
  • Bertsekas & Tsitsiklis (1989) Bertsekas, D. and Tsitsiklis, J. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall Inc., 1989.
  • Besbes & Lobel (2015) Besbes, O. and Lobel, I. Intertemporal price discrimination: Structure and computation of optimal policies. Management Science, 61(1):92–110, 2015.
  • Biggs et al. (2021) Biggs, M., Sun, W., and Ettl, M. Model distillation for revenue optimization: Interpretable personalized pricing. In International Conference on Machine Learning, pp. 946–956. PMLR, 2021.
  • Boyd et al. (2011) Boyd, S., Parikh, N., and Chu, E. Distributed optimization and statistical learning via the alternating direction method of multipliers. Now Publishers Inc, 2011.
  • Candogan et al. (2012) Candogan, O., Bimpikis, K., and Ozdaglar, A. Optimal pricing in networks with externalities. Operations Research, 60(4):883–905, 2012.
  • Cao & Basar (2021) Cao, X. and Basar, T. Decentralized online convex optimization with feedback delays. IEEE Transactions on Automatic Control, 2021. ISSN 15582523. doi: 10.1109/TAC.2021.3092562.
  • Cao & Başar (2021) Cao, X. and Başar, T. Decentralized online convex optimization based on signs of relative states. Automatica, 129, 2021. ISSN 00051098. doi: 10.1016/j.automatica.2021.109676.
  • Caro & Gallien (2012) Caro, F. and Gallien, J. Clearance pricing optimization for a fast-fashion retailer. Operations Research, 60(6):1404–1422, 2012.
  • Chen & Chen (2015) Chen, M. and Chen, Z. L. Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production and Operations Management, 24, 2015. ISSN 19375956. doi: 10.1111/poms.12295.
  • Chen et al. (2016) Chen, N., Comden, J., Liu, Z., Gandhi, A., and Wierman, A. Using predictions in online optimization: Looking forward with an eye on the past. ACM SIGMETRICS Performance Evaluation Review, 44(1):193–206, 2016.
  • Chen et al. (2018) Chen, N., Goel, G., and Wierman, A. Smoothed online convex optimization in high dimensions via online balanced descent. In Proceedings of Conference On Learning Theory (COLT), pp. 1574–1594, 2018.
  • Chen & Farias (2018) Chen, Y. and Farias, V. F. Robust dynamic pricing with strategic customers. Mathematics of Operations Research, 43(4):1119–1142, 2018.
  • Chen & Wang (2005) Chen, Y. Q. and Wang, Z. Formation control: A review and a new consideration. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2005. doi: 10.1109/IROS.2005.1545539.
  • Fiacco & Ishizuka (1990) Fiacco, A. V. and Ishizuka, Y. Sensitivity and stability analysis for nonlinear programming. Annals of Operations Research, 27(1):215–235, 1990.
  • Forsgren et al. (2002) Forsgren, A., Gill, P. E., and Wright, M. H. Interior methods for nonlinear optimization. SIAM review, 44(4):525–597, 2002.
  • Gallego & Topaloglu (2019) Gallego, G. and Topaloglu, H. Revenue management and pricing analytics. International Series in Operations Research and Management Science, 279, 2019. ISSN 22147934. doi: 10.1007/978-1-4939-9606-3.
  • Gamarnik & Goldberg (2010) Gamarnik, D. and Goldberg, D. A. Randomized greedy algorithms for independent sets and matchings in regular graphs: Exact results and finite girth corrections. Combinatorics Probability and Computing, 19, 2010. ISSN 09635483. doi: 10.1017/S0963548309990186.
  • Garcia et al. (1989) Garcia, C. E., Prett, D. M., and Morari, M. Model predictive control: Theory and practice—a survey. Automatica, 25(3):335–348, 1989.
  • Goel & Wierman (2019) Goel, G. and Wierman, A. An online algorithm for smoothed regression and LQR control. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
  • Goel et al. (2019) Goel, G., Lin, Y., Sun, H., and Wierman, A. Beyond online balanced descent: An optimal algorithm for smoothed online optimization. In Neural Information Processing Systems (NeurIPS), 2019.
  • Hochba (1997) Hochba, D. S. Approximation algorithms for NP-hard problems. ACM Sigact News, 28(2):40–52, 1997.
  • Hosseini et al. (2016) Hosseini, S., Chapman, A., and Mesbahi, M. Online distributed convex optimization on dynamic networks. IEEE Transactions on Automatic Control, 61(11), 2016. ISSN 00189286. doi: 10.1109/TAC.2016.2525928.
  • Joseph & de Veciana (2012) Joseph, V. and de Veciana, G. Jointly optimizing multi-user rate adaptation for video transport over wireless systems: Mean-fairness-variability tradeoffs. In Proceedings of the IEEE INFOCOM, pp.  567–575, 2012.
  • Kempe et al. (2015) Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. Theory of Computing, 11, 2015. ISSN 15572862. doi: 10.4086/toc.2015.v011a004.
  • Kim & Giannakis (2017) Kim, S. and Giannakis, G. B. An online convex optimization approach to real-time energy pricing for demand response. IEEE Transactions on Smart Grid, 8(6):2784–2793, 2017. ISSN 1949-3053. doi: 10.1109/TSG.2016.2539948.
  • Li et al. (2021a) Li, X., Yi, X., and Xie, L. Distributed online convex optimization with an aggregative variable. IEEE Transactions on Control of Network Systems, 2021a. ISSN 2325-5870. doi: 10.1109/tcns.2021.3107480.
  • Li et al. (2021b) Li, X., Yi, X., and Xie, L. Distributed online optimization for multi-agent networks with coupled inequality constraints. IEEE Transactions on Automatic Control, 66(8), 2021b. ISSN 15582523. doi: 10.1109/TAC.2020.3021011.
  • Li et al. (2020) Li, Y., Qu, G., and Li, N. Online optimization with predictions and switching costs: Fast algorithms and the fundamental limit. IEEE Transactions on Automatic Control, 2020.
  • Lin et al. (2012) Lin, M., Liu, Z., Wierman, A., and Andrew, L. L. Online algorithms for geographical load balancing. In Proceedings of the International Green Computing Conference (IGCC), pp.  1–10, 2012.
  • Lin et al. (2020) Lin, Y., Goel, G., and Wierman, A. Online optimization with predictions and non-convex losses. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(1):1–32, 2020.
  • Lin et al. (2021) Lin, Y., Hu, Y., Sun, H., Shi, G., Qu, G., and Wierman, A. Perturbation-based regret analysis of predictive control in linear time varying systems. arXiv preprint arXiv:2106.10497, 2021.
  • Marschak (1955) Marschak, J. Elements for a theory of teams. Management Science, 1, 1955. ISSN 0025-1909. doi: 10.1287/mnsc.1.2.127.
  • Marschak & Radner (1972) Marschak, J. and Radner, R. Economic Theory of Teams. Cowles Foundation for Research in Economics at Yale University, 1972.
  • Molzahn et al. (2017) Molzahn, D. K., Dörfler, F., Sandberg, H., Low, S. H., Chakrabarti, S., Baldick, R., and Lavaei, J. A survey of distributed optimization and control algorithms for electric power systems. IEEE Transactions on Smart Grid, 8(6):2941–2962, 2017.
  • Morari & Lee (1999) Morari, M. and Lee, J. H. Model predictive control: past, present and future. Computers & Chemical Engineering, 23(4-5):667–682, 1999.
  • Nedić et al. (2018) Nedić, A., Olshevsky, A., and Rabbat, M. G. Network topology and communication-computation tradeoffs in decentralized optimization. Proceedings of the IEEE, 106(5):953–976, 2018.
  • Oh et al. (2015) Oh, K. K., Park, M. C., and Ahn, H. S. A survey of multi-agent formation control. Automatica, 53, 2015. ISSN 00051098. doi: 10.1016/j.automatica.2014.10.022.
  • Roughgarden (2020) Roughgarden, T. Resource augmentation. CoRR, abs/2007.13234, 2020. URL https://arxiv.org/abs/2007.13234.
  • Schlosser (2016) Schlosser, R. Stochastic dynamic multi-product pricing with dynamic advertising and adoption effects. Journal of Revenue and Pricing Management, 15(2):153–169, 2016.
  • Sellke (2020) Sellke, M. Chasing convex bodies optimally. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.  1509–1518. SIAM, 2020.
  • Shi et al. (2020) Shi, G., Lin, Y., Chung, S.-J., Yue, Y., and Wierman, A. Online optimization with memory and competitive control. Advances in Neural Information Processing Systems, 33:20636–20647, 2020.
  • Shi et al. (2019) Shi, M., Lin, X., and Jiao, L. On the value of look-ahead in competitive online convex optimization. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):22, 2019.
  • Shi et al. (2015) Shi, W., Ling, Q., Wu, G., and Yin, W. Extra: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.
  • Shi et al. (2021) Shi, Y., Qu, G., Low, S., Anandkumar, A., and Wierman, A. Stability constrained reinforcement learning for real-time voltage control. arXiv preprint arXiv:2109.14854, 2021.
  • Shin & Zavala (2021) Shin, S. and Zavala, V. M. Controllability and observability imply exponential decay of sensitivity in dynamic optimization. IFAC-PapersOnLine, 54(6):179–184, 2021.
  • Shin et al. (2020) Shin, S., Zavala, V. M., and Anitescu, M. Decentralized schemes with overlap for solving graph-structured optimization problems. IEEE Transactions on Control of Network Systems, 7(3):1225–1236, Sep 2020. ISSN 2372-2533. doi: 10.1109/tcns.2020.2967805. URL http://dx.doi.org/10.1109/TCNS.2020.2967805.
  • Shin et al. (2021) Shin, S., Anitescu, M., and Zavala, V. M. Exponential decay of sensitivity in graph-structured nonlinear programs. arXiv preprint arXiv:2101.03067, 2021.
  • Song & Chintagunta (2006) Song, I. and Chintagunta, P. K. Measuring cross-category price effects with aggregate store data. Management Science, 52(10):1594–1609, 2006.
  • Stanica (2001) Stanica, P. Good lower and upper bounds on binomial coefficients. Journal of Inequalities in Pure and Applied Mathematics, 2(3):30, 2001.
  • Talluri & Ryzin (2006) Talluri, K. and Ryzin, G. V. The Theory and Practice of Revenue Management. International Series in Operations Research and Management Science, 68, Springer, 2006.
  • Tretter (2008) Tretter, C. Spectral Theory of Block Operator Matrices and Applications. Imperial College Press, 2008. doi: 10.1142/P493.
  • Wainwright & Jordan (2008) Wainwright, M. J. and Jordan, M. I. Graphical models, exponential families, and variational inference. Now Publishers Inc, 2008.
  • Xin et al. (2020) Xin, R., Pu, S., Nedić, A., and Khan, U. A. A general framework for decentralized optimization with first-order methods. Proceedings of the IEEE, 108(11):1869–1889, 2020.
  • Yi et al. (2020) Yi, X., Li, X., Xie, L., and Johansson, K. H. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 68, 2020. ISSN 19410476. doi: 10.1109/TSP.2020.2964200.
  • Yu et al. (2020) Yu, C., Shi, G., Chung, S.-J., Yue, Y., and Wierman, A. The power of predictions in online control. Advances in Neural Information Processing Systems, 33, 2020.
  • Yuan et al. (2021) Yuan, D., Proutiere, A., and Shi, G. Distributed online optimization with long-term constraints. IEEE Transactions on Automatic Control, 2021. ISSN 15582523. doi: 10.1109/TAC.2021.3057601.

Appendix A Notation Summary and Definitions

The notation we use throughout the paper is summarized in the following two tables.

Table 1: Notation related to the graph/network structures.
Notation                                         Meaning
𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) The network of agents connected by undirected edges;
d𝒢(u,v)d_{\mathcal{G}}(u,v) The graph distance (i.e. the length of the shortest path) between two vertices uu and vv in 𝒢\mathcal{G};
NvrN_{v}^{r} The rr-hop neighborhood of vertex/agent vv in 𝒢\mathcal{G}, i.e., Nvr:={u𝒱d𝒢(u,v)r}N_{v}^{r}:=\{u\in\mathcal{V}\mid d_{\mathcal{G}}(u,v)\leq r\};
Nvr\partial N_{v}^{r} The boundary of the rr-hop neighborhood of vertex/agent vv, i.e., Nvr=NvrNvr1\partial N_{v}^{r}=N_{v}^{r}\setminus N_{v}^{r-1};
h(r)h(r) The size of the largest rr-hop boundary in 𝒢\mathcal{G}, i.e., h(r):=supv|Nvr|h(r):=\sup_{v}\mathopen{}\mathclose{{}\left\lvert\partial N_{v}^{r}}\right\rvert;
Δ\Delta The maximum degree of any vertex vv in 𝒢\mathcal{G};
(S)\mathcal{E}(S) The set of all edges that have both endpoints in SS, where S𝒱S\subseteq\mathcal{V};
S+S_{+} The extension of SS by 1-hop, i.e., S+=S{uvS s.t. (u,v)}S_{+}=S\cup\{u\mid\exists v\in S\text{ s.t. }(u,v)\in\mathcal{E}\};
N(t,v)(k,r)N_{(t,v)}^{(k,r)} {τtτ<t+k}×Nvr\{\tau\in\mathbb{Z}\mid t\leq\tau<t+k\}\times N_{v}^{r}, which is a set of (time, vertex) pairs;
N(t,v)(k,r)\partial N_{(t,v)}^{(k,r)} N(t,v)(k,r)N(t,v)(k1,r1)N_{(t,v)}^{(k,r)}\setminus N_{(t,v)}^{(k-1,r-1)}, which is a set of (time, vertex) pairs;
Table 2: Notation related to the optimization problems.
Notation                                         Meaning
\mathopen{}\mathclose{{}\left\lVert\cdot}\right\rVert The (Euclidean) 2-norm for vectors and the induced 2-norm for matrices;
H The horizon length of the Networked OCO problem;
[H][H] The sequence of integers 1,2,,H1,2,\ldots,H;
𝕊m\mathbb{S}^{m} For any positive integer mm, 𝕊m\mathbb{S}^{m} denotes the set of all symmetric real m×mm\times m matrices;
yt1:t2y_{t_{1}:t_{2}} The sequence yt1,yt1+1,,yt2y_{t_{1}},y_{t_{1}+1},\ldots,y_{t_{2}}, for t2t1t_{2}\geq t_{1};
xtvx_{t}^{v} The individual action of agent vv at time step tt. It is a vector in n\mathbb{R}^{n};
xtSx_{t}^{S} The joint action of all agents in set S𝒱S\subseteq\mathcal{V} at time tt, i.e., xtS={xtv}vSx_{t}^{S}=\{x_{t}^{v}\}_{v\in S};
xtx_{t} The joint action of all agents in 𝒱\mathcal{V} at time tt, i.e., xt={xtv}v𝒱x_{t}=\{x_{t}^{v}\}_{v\in\mathcal{V}}. It is a shorthand of xt𝒱x_{t}^{\mathcal{V}};
xtx_{t}^{*} The offline optimal joint action of all agents at time tt;
xτtx_{\tau\mid t}^{*} The clairvoyant joint decision of all agents at time τ\tau given that the joint decision is xtx_{t} at time tt;
ftv(xtv)f_{t}^{v}(x_{t}^{v}) The node cost function for agent v𝒱v\in\mathcal{V} at time step tt;
ctv(xtv,xt1v)c_{t}^{v}(x_{t}^{v},x_{t-1}^{v}) The temporal interaction cost function for agent v𝒱v\in\mathcal{V} at time step tt;
ste(xtv,xtu)s_{t}^{e}(x_{t}^{v},x_{t}^{u}) The spatial interaction cost for edge e=(v,u)e=(v,u)\in\mathcal{E} at time step tt;
μ\mu The strong convexity constant of node costs ftvf_{t}^{v};
f,T,S\ell_{f},\ell_{T},\ell_{S} The smoothness constant of node costs, temporal interaction costs, and spatial interaction costs;
DtvD_{t}^{v} The feasible set of xtvx_{t}^{v} for agent vv at time tt. It is a convex subset of n\mathbb{R}^{n};
θtv\theta_{t}^{v} The minimizer of node cost function for vv at time tt subject to DtvD_{t}^{v}, i.e., θtv=argminyDtvftv(y)\theta_{t}^{v}=\operatorname*{arg\,min}_{y\in D_{t}^{v}}f_{t}^{v}(y);
f_{t}^{S}(x_{t}^{S}) The total node costs and spatial interaction costs over a subset S\subseteq\mathcal{V} at time t, i.e.,
ftS(xtS):=vSftv(xtv)+(v,u)(S)st(v,u)(xtv,xtu)f_{t}^{S}(x_{t}^{S}):=\sum_{v\in S}f_{t}^{v}(x_{t}^{v})+\sum_{(v,u)\in\mathcal{E}(S)}s_{t}^{(v,u)}(x_{t}^{v},x_{t}^{u});
ctS(xtS)c_{t}^{S}(x_{t}^{S}) The total temporal interaction costs over a subset S𝒱S\subseteq\mathcal{V} at time tt, i.e., ctS(xtS):=vSctv(xtv,xt1v)c_{t}^{S}(x_{t}^{S}):=\sum_{v\in S}c_{t}^{v}(x_{t}^{v},x_{t-1}^{v});
f_{t}(x_{t}) The total node costs and spatial interaction costs over \mathcal{V} at time t. A shorthand for f_{t}^{\mathcal{V}}(x_{t}^{\mathcal{V}});
ct(xt)c_{t}(x_{t}) The total temporal interaction costs over 𝒱\mathcal{V} at time tt. A shorthand of ct𝒱(xt𝒱)c_{t}^{\mathcal{V}}(x_{t}^{\mathcal{V}});
ψ(t,v)(k,r)(,)\psi_{(t,v)}^{(k,r)}(\cdot,\cdot) The optimal individual decisions in N(t,v)(k,r)N_{(t,v)}^{(k,r)} when the decision boundaries formed by {t1}×Nvr\{t-1\}\times N_{v}^{r}
and N(t,v)(k,r)\partial N_{(t,v)}^{(k,r)} are fixed as parameters;
ψ~tp(,)\tilde{\psi}_{t}^{p}(\cdot,\cdot) The optimal global trajectory xt,xt+1,,xt+p2x_{t},x_{t+1},\ldots,x_{t+p-2} when xt1x_{t-1} and xt+p1x_{t+p-1} are fixed as parameters;
ψ~t()\tilde{\psi}_{t}(\cdot) The optimal global trajectory xt,xt+1,,xTx_{t},x_{t+1},\ldots,x_{T} when xt1x_{t-1} is fixed as the parameter;

In addition to the notation in the tables above, we make use of the concepts of strong convexity and smoothness throughout this paper.

Definition A.1.

For a fixed dimension m\in\mathbb{Z}_{+}, let D\subseteq\mathbb{R}^{m} be a convex set, and suppose \hat{h}:D\to\mathbb{R} is differentiable. Then, \hat{h} is called \ell-smooth for some constant \ell\in\mathbb{R}_{\geq 0} if

\hat{h}(y)\leq\hat{h}(x)+\langle\nabla\hat{h}(x),y-x\rangle+\frac{\ell}{2}\lVert y-x\rVert_{2}^{2},\quad\forall x,y\in D,

and is called μ\mu-strongly convex for some constant μ0\mu\in\mathbb{R}_{\geq 0} if

\hat{h}(y)\geq\hat{h}(x)+\langle\nabla\hat{h}(x),y-x\rangle+\frac{\mu}{2}\lVert y-x\rVert_{2}^{2},\quad\forall x,y\in D.

Here \langle\cdot,\cdot\rangle denotes the dot product of vectors.
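
As a quick illustration of Definition A.1 (a minimal sketch of our own, not part of the paper's development), the quadratic function \hat{h}(x)=\frac{1}{2}x^{\top}Qx is \mu-strongly convex and \ell-smooth with \mu=\lambda_{\min}(Q) and \ell=\lambda_{\max}(Q); the NumPy check below verifies both inequalities at random pairs of points.

```python
import numpy as np

# Check Definition A.1 for h(x) = 0.5 * x^T Q x, which is mu-strongly convex
# and ell-smooth with mu = lambda_min(Q) and ell = lambda_max(Q).
rng = np.random.default_rng(0)
m = 5
B = rng.standard_normal((m, m))
Q = B @ B.T + np.eye(m)                   # symmetric positive definite
mu, ell = np.linalg.eigvalsh(Q)[[0, -1]]  # smallest / largest eigenvalue

h = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(m), rng.standard_normal(m)
    gap = h(y) - h(x) - grad(x) @ (y - x)  # Bregman divergence of h at (y, x)
    half_sq = 0.5 * np.dot(y - x, y - x)
    assert mu * half_sq - 1e-9 <= gap <= ell * half_sq + 1e-9
print("both inequalities hold with mu =", mu, "and ell =", ell)
```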

Appendix B Example: Multiproduct Pricing

The networked online convex optimization problem captures many applications where individual agents must make decisions online in a networked system. In this section, we give a concrete motivating example from multiproduct pricing. Many recent books on revenue management (Talluri & Ryzin, 2006; Gallego & Topaloglu, 2019) and papers (Song & Chintagunta, 2006; Caro & Gallien, 2012; Candogan et al., 2012; Chen & Chen, 2015) describe numerous instances of this problem. In the example below, we follow a pricing model similar to that of Candogan et al. (2012).

Consider a setting where a large company sells nn different products and wants to maximize its revenue by adjusting prices adaptively in a time-varying market. Each vertex/agent v𝒱v\in\mathcal{V} corresponds to a product and xtvx_{t}^{v} denotes its price at time tt. Two products vv and uu are connected by an edge if they interact, e.g., because the products are complements or substitutes.

We assume a classical linear demand model (Talluri & Ryzin, 2006; Gallego & Topaloglu, 2019), where the demand of vv at time tt, denoted as dtvd_{t}^{v}, is given by

dtv=atvktvxtvPart 1uNv1{v}ηt(uv)xtuPart 2+btvxt1vPart 3,d_{t}^{v}=\underbrace{a_{t}^{v}-k_{t}^{v}x_{t}^{v}}_{\text{Part 1}}\overbrace{-\sum_{u\in N_{v}^{1}\setminus\{v\}}\eta_{t}^{(u\to v)}x_{t}^{u}}^{\text{Part 2}}\underbrace{+b_{t}^{v}x_{t-1}^{v}}_{\text{Part 3}},

with parameters a_{t}^{v},k_{t}^{v},b_{t}^{v}>0 and \eta_{t}^{(u\to v)}\in\mathbb{R}. Here, Part 1 corresponds to the nominal demand at price x_{t}^{v}; Part 2 adds the network externalities from v's complements/substitutes; and Part 3 reflects the pent-up demand for product v due to a high price at time t-1. Note that the coefficient \eta_{t}^{(u\to v)} can differ from \eta_{t}^{(v\to u)}. To simplify notation, for each undirected edge e=(u,v), we define the aggregate coefficient \gamma_{t}^{e}\coloneqq\frac{1}{2}\left(\eta_{t}^{(u\to v)}+\eta_{t}^{(v\to u)}\right).
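
For illustration, the following minimal sketch (with hypothetical parameter values of our own choosing, not taken from the paper) evaluates this demand model for one product on a toy three-product network.

```python
# Toy evaluation of the linear demand model; all parameter values are
# hypothetical and chosen only for illustration.
neighbors = {0: [1, 2]}            # product 0 interacts with products 1 and 2
a, k, b = 10.0, 2.0, 0.5           # a_t^v, k_t^v, b_t^v for product 0
eta = {1: 0.3, 2: -0.4}            # eta_t^{(u -> 0)}: substitute (+) / complement (-)

def demand(x_t, x_prev_own):
    """Demand of product 0: nominal demand (Part 1), network
    externalities (Part 2), and pent-up demand (Part 3)."""
    part1 = a - k * x_t[0]
    part2 = -sum(eta[u] * x_t[u] for u in neighbors[0])
    part3 = b * x_prev_own
    return part1 + part2 + part3

print(demand({0: 3.0, 1: 2.0, 2: 4.0}, x_prev_own=3.5))  # 6.75
```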

The full revenue maximization problem can be written as

max\displaystyle\max t=1Hv𝒱xtvdtv=t=1Hv𝒱xtv(atvktvxtvuNv1{v}ηt(uv)xtu+btvxt1v)\displaystyle\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}x_{t}^{v}d_{t}^{v}=\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}x_{t}^{v}({a_{t}^{v}-k_{t}^{v}x_{t}^{v}}-\sum_{u\in N_{v}^{1}\setminus\{v\}}\eta_{t}^{(u\to v)}x_{t}^{u}+b_{t}^{v}x_{t-1}^{v}) (3)
s.t. 0xtvp¯tv,\displaystyle 0\leq x_{t}^{v}\leq\overline{p}_{t}^{v},

which is equivalent to the following:

min\displaystyle\min t=1Hv𝒱xtvdtv=t=1Hv𝒱[xtv(atv+ktvxtv+uNv1{v}ηt(uv)xtubtvxt1v)]\displaystyle-\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}x_{t}^{v}d_{t}^{v}=\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}\Bigg{[}x_{t}^{v}(-a_{t}^{v}+k_{t}^{v}x_{t}^{v}+\sum_{u\in N_{v}^{1}\setminus\{v\}}\eta_{t}^{(u\to v)}x_{t}^{u}-b_{t}^{v}x_{t-1}^{v})\Bigg{]} (4)
s.t. 0xtvp¯tv\displaystyle 0\leq x_{t}^{v}\leq\overline{p}_{t}^{v}

We assume that each product's own-price elasticity coefficient k_{t}^{v} is uniformly larger than the sum of the magnitudes of the cross-elasticity coefficients, i.e., there exists \mu>0 such that

\xi_{t}^{v}:=k_{t}^{v}-\sum_{u\in N_{v}^{1}\setminus\{v\}}|\gamma_{t}^{(u,v)}|-\frac{b_{t}^{v}+b_{t+1}^{v}}{2}\geq\mu/2>0

holds for any (t,v). Further, we assume \sup_{v\in\mathcal{V},t\in[H]}k_{t}^{v}\leq\ell_{f}/2, \sup_{v\in\mathcal{V},t\in[H]}b_{t}^{v}\leq b, and \sup_{(u,v)\in\mathcal{E},t\in[H]}|\eta_{t}^{(u\to v)}|\leq\gamma.

We now observe that this problem fits within our framework, with quadratic node, spatial, and temporal costs defined as follows:

ftv(xtv)\displaystyle f_{t}^{v}(x_{t}^{v})\coloneqq{} ξtv(xtvatv2ξtv)2,\displaystyle\xi_{t}^{v}\mathopen{}\mathclose{{}\left(x_{t}^{v}-\frac{a_{t}^{v}}{2\xi_{t}^{v}}}\right)^{2},
st(u,v)(xtu,xtv)\displaystyle s_{t}^{(u,v)}(x_{t}^{u},x_{t}^{v})\coloneqq{} |γt(u,v)|(xtu+sgn(γt(u,v))xtv)2,\displaystyle|\gamma_{t}^{(u,v)}|\mathopen{}\mathclose{{}\left(x_{t}^{u}+sgn\mathopen{}\mathclose{{}\left(\gamma_{t}^{(u,v)}}\right)\cdot x_{t}^{v}}\right)^{2},
ctv(xtv,xt1v)\displaystyle c_{t}^{v}(x_{t}^{v},x_{t-1}^{v})\coloneqq{} btv2(xtvxt1v)2.\displaystyle\frac{b_{t}^{v}}{2}\mathopen{}\mathclose{{}\left(x_{t}^{v}-x_{t-1}^{v}}\right)^{2}.

Note that ftv(xtv)f_{t}^{v}(x_{t}^{v}) is ξtv(xtv)2atvxtv\xi_{t}^{v}(x_{t}^{v})^{2}-{a_{t}^{v}}{x_{t}^{v}} plus a constant (atv)2/(4ξtv)(a_{t}^{v})^{2}/(4\xi_{t}^{v}), and the interaction functions can be rewritten as

s_{t}^{(u,v)}(x_{t}^{u},x_{t}^{v})=|\gamma_{t}^{(u,v)}|\left((x_{t}^{u})^{2}+(x_{t}^{v})^{2}\right)+2\gamma_{t}^{(u,v)}x_{t}^{v}x_{t}^{u},\quad c_{t}^{v}(x_{t}^{v},x_{t-1}^{v})=\frac{b_{t}^{v}}{2}\left((x_{t}^{v})^{2}+(x_{t-1}^{v})^{2}\right)-b_{t}^{v}x_{t-1}^{v}x_{t}^{v}.

Summing, we see that

t=1Hv𝒱ftv(xtv)+ctv(xtv,xt1v)+t=1Heste(xtu,xtv)=(Objective in (4))+t=1Hv𝒱(atv)2/(4ξtv).\displaystyle\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}f_{t}^{v}(x_{t}^{v})+c_{t}^{v}(x_{t}^{v},x_{t-1}^{v})+\sum_{t=1}^{H}\sum_{e\in\mathcal{E}}s_{t}^{e}(x_{t}^{u},x_{t}^{v})=\textup{(Objective in \eqref{opt-problem:min})}+\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}(a_{t}^{v})^{2}/(4\xi_{t}^{v})\,. (5)

Hence the optimal solution of (4) coincides with that of the following problem:

min\displaystyle\min t=1Hv𝒱ftv(xtv)+ctv(xtv,xt1v)+t=1Heste(xtu,xtv)\displaystyle\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}f_{t}^{v}(x_{t}^{v})+c_{t}^{v}(x_{t}^{v},x_{t-1}^{v})+\sum_{t=1}^{H}\sum_{e\in\mathcal{E}}s_{t}^{e}(x_{t}^{u},x_{t}^{v}) (6)
s.t. 0xtvp¯tv\displaystyle 0\leq x_{t}^{v}\leq\overline{p}_{t}^{v}

where the node cost function f_{t}^{v}(x_{t}^{v}) is nonnegative, \mu-strongly convex, and \ell_{f}-smooth; the spatial interaction function is nonnegative, convex, and (4\gamma)-smooth; and the temporal interaction function is nonnegative, convex, and (2b)-smooth.
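
The decomposition in (5) is easy to sanity-check numerically. The sketch below is our own illustration; it assumes the conventions x_{0}=0 and b_{H+1}^{v}=0 (so that the identity holds exactly at the horizon boundaries) and compares the total cost against the negated revenue plus the constant on a random two-product instance.

```python
import numpy as np

rng = np.random.default_rng(1)
H, V = 4, 2                              # horizon; two products joined by one edge
a = rng.uniform(5, 10, (H, V))           # a_t^v
b = rng.uniform(0.1, 0.3, (H + 1, V))    # b_t^v, with b_{H+1}^v := 0 (assumed convention)
b[H] = 0.0
eta_01 = rng.uniform(-0.2, 0.2, H)       # eta_t^{(0 -> 1)}, enters product 1's demand
eta_10 = rng.uniform(-0.2, 0.2, H)       # eta_t^{(1 -> 0)}, enters product 0's demand
gamma = 0.5 * (eta_01 + eta_10)          # aggregate coefficient gamma_t^e
k = rng.uniform(2, 3, (H, V))            # own-price coefficients, large enough that xi > 0
xi = k - np.abs(gamma)[:, None] - 0.5 * (b[:H] + b[1:])
assert (xi > 0).all()

x = rng.uniform(0, 2, (H, V))            # arbitrary feasible prices
xs = np.vstack([np.zeros(V), x])         # prices including x_0 := 0 (assumed)

# left-hand side of (5): node + temporal + spatial costs
node = (xi * (x - a / (2 * xi)) ** 2).sum()
temporal = (0.5 * b[:H] * (xs[1:] - xs[:-1]) ** 2).sum()
spatial = (np.abs(gamma) * (x[:, 0] + np.sign(gamma) * x[:, 1]) ** 2).sum()

# right-hand side of (5): negated revenue plus the constant
d0 = a[:, 0] - k[:, 0] * x[:, 0] - eta_10 * x[:, 1] + b[:H, 0] * xs[:-1, 0]
d1 = a[:, 1] - k[:, 1] * x[:, 1] - eta_01 * x[:, 0] + b[:H, 1] * xs[:-1, 1]
revenue = (x[:, 0] * d0 + x[:, 1] * d1).sum()
const = (a ** 2 / (4 * xi)).sum()

print(np.isclose(node + temporal + spatial, -revenue + const))  # True
```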

The decentralized nature of our policy is important in this setting. Interpretable pricing algorithms (Biggs et al., 2021) are attractive in practice. Our local pricing algorithm is indeed interpretable, since the current price of a given product is transparently determined by reliable predictions of demand in the near future as well as interactions with directly related products.

In addition, exactly solving the global multiproduct pricing problem can be computationally challenging in practice, especially when the network is large. For example, large e-commerce companies maintain millions of products, which makes the entire network difficult to store, let alone compute over. Moreover, because prices are easy to change, e-commerce companies often use dynamic pricing and adjust prices on a daily (or faster) basis, which magnifies the computational burden.

B.1 Competitive Bound

We end our discussion of multiproduct pricing by showing how the competitive bound in Theorem 3.4 can be applied to the revenue maximization problem on product networks via the following lemma.

Lemma B.1.

Suppose the competitive ratio of our general cost minimization problem is CR(k,r), which is a function of the prediction horizon k and the communication radius r. Suppose \sup_{(u,v)\in\mathcal{E},t\in[H]}a_{t}^{u}/a_{t}^{v}\leq\tilde{b} and \sup_{v\in\mathcal{V},t\in[H]}\frac{a_{t}^{v}}{\overline{p}_{t}^{v}}\leq\tilde{c}. Then the competitive ratio for the corresponding revenue maximization problem, defined as rev(ALG)/rev(OPT), is at least 1-\frac{\eta}{2}(CR(k,r)-1), where \Delta denotes the maximum degree of the product network and \eta\coloneqq\max\{2(\ell_{f}+\Delta\tilde{b}\gamma)/\mu,\tilde{c}/\mu\}.

Proof.

We define Ct,v(atv)2/(4ξtv)C\coloneqq\sum_{t,v}(a_{t}^{v})^{2}/(4\xi_{t}^{v}). Suppose

CR(k,r)cost(OPT)cost(ALG),CR(k,r)\cdot cost(OPT)\geq cost(ALG),

then, since the total cost equals the negated revenue plus C by (5),

CR(k,r)(rev(OPT)+C)(rev(ALG)+C).CR(k,r)\cdot(-rev(OPT)+C)\geq(-rev(ALG)+C).

Rearranging the terms yields

(CR(k,r)1)CCR(k,r)rev(OPT)rev(ALG).(CR(k,r)-1)\cdot C\geq CR(k,r)rev(OPT)-rev(ALG). (7)

To find a lower bound on rev(OPT), we consider the pricing strategy x_{t}^{v}=\frac{a_{t}^{v}}{\eta\mu}\geq 0, with \eta as defined in the lemma statement. We first check that the demand is always nonnegative under this strategy:

atvktvatvημuNv1{v}ηt(uv)atuημ+btvat1vημ\displaystyle a_{t}^{v}-k_{t}^{v}\frac{a_{t}^{v}}{\eta\mu}-\sum_{u\in N_{v}^{1}\setminus\{v\}}\eta_{t}^{(u\to v)}\frac{a_{t}^{u}}{\eta\mu}+b_{t}^{v}\frac{a_{t-1}^{v}}{\eta\mu} atvktvatvημuNv1{v}ηt(uv)atuημ\displaystyle\geq a_{t}^{v}-k_{t}^{v}\frac{a_{t}^{v}}{\eta\mu}-\sum_{u\in N_{v}^{1}\setminus\{v\}}\eta_{t}^{(u\to v)}\frac{a_{t}^{u}}{\eta\mu}
atvktvatvημuNv1{v}b~γatvημ\displaystyle\geq a_{t}^{v}-k_{t}^{v}\frac{a_{t}^{v}}{\eta\mu}-\sum_{u\in N_{v}^{1}\setminus\{v\}}\tilde{b}\gamma\frac{a_{t}^{v}}{\eta\mu}
atv(1f+Δb~γημ)\displaystyle\geq a_{t}^{v}(1-\frac{\ell_{f}+\Delta\tilde{b}\gamma}{\eta\mu})
atv2.\displaystyle\geq\frac{a_{t}^{v}}{2}.

Moreover,

xtvatv/c~p¯tv.x_{t}^{v}\leq a_{t}^{v}/\tilde{c}\leq\overline{p}_{t}^{v}.

Hence this is a feasible pricing strategy.

We lower bound the optimal revenue:

rev(OPT)\displaystyle rev(OPT) t=1Hv𝒱atvημ(atvktvatvημuNv1{v}ηt(uv)atuημ+btvat1vημ)\displaystyle\geq\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}\frac{a_{t}^{v}}{\eta\mu}({a_{t}^{v}-k_{t}^{v}\frac{a_{t}^{v}}{\eta\mu}}-\sum_{u\in N_{v}^{1}\setminus\{v\}}\eta_{t}^{(u\to v)}\frac{a_{t}^{u}}{\eta\mu}+b_{t}^{v}\frac{a_{t-1}^{v}}{\eta\mu})
t=1Hv𝒱atvημatv2\displaystyle\geq\sum_{t=1}^{H}\sum_{v\in\mathcal{V}}\frac{a_{t}^{v}}{\eta\mu}\frac{a_{t}^{v}}{2}
2ηC.\displaystyle\geq\frac{2}{\eta}C.

Dividing (7) by rev(OPT) yields

(CR(k,r)1)Crev(OPT)CR(k,r)rev(ALG)/rev(OPT).(CR(k,r)-1)\frac{C}{rev(OPT)}\geq CR(k,r)-rev(ALG)/rev(OPT).

Since CR(k,r)1CR(k,r)\geq 1 for the cost minimization problem,

rev(ALG)/rev(OPT)1(CR(k,r)1)Crev(OPT).rev(ALG)/rev(OPT)\geq 1-(CR(k,r)-1)\frac{C}{rev(OPT)}.

Since rev(OPT)\geq\frac{2}{\eta}C implies C/rev(OPT)\leq\eta/2, this completes the proof:

rev(ALG)/rev(OPT)1η2(CR(k,r)1).rev(ALG)/rev(OPT)\geq 1-\frac{\eta}{2}(CR(k,r)-1).
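
To make Lemma B.1 concrete, here is a minimal plug-in sketch (with hypothetical constants of our own choosing) showing how a cost competitive ratio translates into a revenue guarantee.

```python
# Hypothetical constants, for illustration only.
ell_f, Delta, b_tilde, gamma, c_tilde, mu = 4.0, 3, 1.5, 0.1, 2.0, 1.0
eta = max(2 * (ell_f + Delta * b_tilde * gamma) / mu, c_tilde / mu)  # eta = 8.9

CR = 1.05                                   # cost competitive ratio CR(k, r)
rev_lower_bound = 1 - (eta / 2) * (CR - 1)  # Lemma B.1 guarantee
print(eta, rev_lower_bound)                 # 8.9 0.7775
```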

Appendix C Proof Outline

In this section, we outline the major novelties in our proofs for the tighter exponentially decaying local perturbation bound in Theorem 3.3 and the main competitive ratio bound for LPC in Theorem 3.4. The full details of the proofs of these and other results are in the appendices following this one.

C.1 Refined Analysis of Perturbation Bounds

We begin by outlining the four-step structure we use to prove Theorem 3.3. Our goal is to highlight the main ideas, while deferring a detailed proof to Appendix D.2.

Step 1: Establish first-order equations

We define h as the objective function in (3.1), where the actions on the boundary are fixed as \{z_{\tau}^{u}|(\tau,u)\in\partial N_{(t,v)}^{(k,r)}\} and the actions at time t-1 are fixed as \{x_{t-1}^{u}|u\in N_{v}^{r}\}. We denote these fixed actions as the system parameter

ζ(xt1(Nvr),{zτu|(τ,u)N(t,v)(k,r)}).\zeta\coloneqq(x_{t-1}^{(N_{v}^{r})},\{z_{\tau}^{u}|(\tau,u)\in\partial N_{(t,v)}^{(k,r)}\}).

To avoid writing the time index t repeatedly, we use \hat{x}_{i} to denote the actions at time t-1+i for 0\leq i\leq k. The main lemma for this step is the following.

Lemma C.1.

Given \theta\in\mathbb{R}, a system parameter \zeta, and a perturbation vector e, we have

ddθψ(ζ+θe)=M1(R(1)e0+R(k1)ek+τ=1k1K(τ)eτ)\frac{d}{d\theta}\psi(\zeta+\theta e)=M^{-1}\Bigg{(}R^{(1)}e_{0}+R^{(k-1)}e_{k}+\sum_{\tau=1}^{k-1}K^{(\tau)}e_{\tau}\Bigg{)}

where

M=x^1:k12h(ψ(ζ+θe),ζ+θe),M=\nabla_{\hat{x}_{1:k-1}}^{2}h(\psi(\zeta+\theta e),\zeta+\theta e),
R(1)x^0x^1:k1h(ψ(ζ+θe),ζ+θe),R^{(1)}\coloneqq-\nabla_{\hat{x}_{0}}\nabla_{\hat{x}_{1:k-1}}h(\psi(\zeta+\theta e),\zeta+\theta e),
R(k1)x^kx^1:k1h(ψ(ζ+θe),ζ+θe),R^{(k-1)}\coloneqq-\nabla_{\hat{x}_{k}}\nabla_{\hat{x}_{1:k-1}}h(\psi(\zeta+\theta e),\zeta+\theta e),
K(τ)x^τ(Nvr)x^1:k1h(ψ(ζ+θe),ζ+θe).K^{(\tau)}\coloneqq-\nabla_{\hat{x}_{\tau}^{(\partial N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}}h(\psi(\zeta+\theta e),\zeta+\theta e).

The proof of Lemma C.1 applies the first-order optimality conditions at the global minimizer of the convex function h(\cdot,\zeta+\theta e) and then takes derivatives with respect to \theta. See Appendix D.2 for the details.

Step 2: Exploit the structure of matrix MM

M is a hierarchical block matrix whose first level has dimension (k-1)\times(k-1). Fixing the first-level indices (i.e., time indices) in M, the lower-level matrices are non-zero only if the time indices differ by at most 1. Hence we decompose M into a block-diagonal matrix D and a block tri-diagonal matrix A with zero blocks on the diagonal. Each diagonal block of D is a graph-induced banded matrix, which captures the Hessian of h within a single time step. Denote each diagonal block by D_{i,i} for 1\leq i\leq k-1. Further, for 1\leq i\leq k-1, A_{i,i-1} (similarly A_{i,i+1}) captures the temporal coupling of each agent's actions between consecutive time steps. Given this decomposition,

M1=(D+A)1=D1(I+AD1)1.M^{-1}=(D+A)^{-1}=D^{-1}(I+AD^{-1})^{-1}.

For ease of notation, we denote I+AD^{-1} by P. Note that P is not necessarily a symmetric matrix. Nevertheless, under technical conditions on P's eigenvalues, we have the following power series expansion (Shin et al., 2020); the details are presented in Appendix D.2.

Lemma C.2.

Under the condition μ>2T\mu>2\ell_{T}, we have

P1=0(IP).P^{-1}=\sum_{\ell\geq 0}(I-P)^{\ell}. (8)

To understand the power series in (8), consider the special case where each block A_{i,j}=\ell_{T}\cdot I and D_{i,i}=Q. Denote P-I by J, which equals AD^{-1}. Then we have J_{i,i}=0, J_{i,i-1}=J_{i,i+1}=\ell_{T}Q^{-1}, and J_{i,j}=0 when |i-j|>1. Intuitively, J captures the “correlation over actions” after one time step. More generally, for \ell\geq 0 and any two time indices \tau^{\prime}, \tau,

Jτ,τ=TQb(,τ,τ),J^{\ell}_{\tau^{\prime},\tau}=\ell_{T}^{\ell}Q^{-\ell}b(\ell,\tau,\tau^{\prime}),

where b(,τ,τ)b(\ell,\tau,\tau^{\prime}) is a constant depending on ,τ,τ\ell,\tau,\tau^{\prime} and equal to zero if <|ττ|\ell<|\tau-\tau^{\prime}|.

Given that QQ is a graph-induced banded matrix, Q1Q^{-1} satisfies exponential-decay properties, which makes it plausible that QQ^{-\ell} is an exponential decay matrix with a slower rate.
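
As a quick numerical illustration of the expansion in (8) (a self-contained sketch with randomly generated blocks, not the paper's actual matrices), one can build a block tri-diagonal M=D+A whose diagonal blocks dominate the coupling blocks and check that the partial sums of the series converge to P^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 6, 3                                # number of time blocks, block size
N = k * n

# D: block diagonal with well-conditioned SPD blocks (the per-step Hessians)
D = np.zeros((N, N))
for i in range(k):
    B = rng.standard_normal((n, n))
    D[i*n:(i+1)*n, i*n:(i+1)*n] = B @ B.T + 5 * np.eye(n)

# A: block tri-diagonal with zero diagonal blocks (weak temporal coupling)
A = np.zeros((N, N))
for i in range(k - 1):
    C = 0.2 * rng.standard_normal((n, n))
    A[i*n:(i+1)*n, (i+1)*n:(i+2)*n] = C
    A[(i+1)*n:(i+2)*n, i*n:(i+1)*n] = C.T

P = np.eye(N) + A @ np.linalg.inv(D)

# partial sums of sum_{l >= 0} (I - P)^l should converge to P^{-1}
S, term = np.zeros((N, N)), np.eye(N)
for _ in range(60):
    S += term
    term = term @ (np.eye(N) - P)
print(np.allclose(S, np.linalg.inv(P)))    # True: spectral radius of I - P is < 1 here
```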

For the general case, we need to bound terms similar to (Di1,i11Di2,i21Di,i1)u,v\mathopen{}\mathclose{{}\left\lVert(D_{i_{1},i_{1}}^{-1}D_{i_{2},i_{2}}^{-1}\cdots D_{i_{\ell},i_{\ell}}^{-1})_{u,v}}\right\rVert. This is the goal of Step 3.

Step 3: Properties for general exponential-decay matrices

The goal of this step is to establish that a product of exponential-decay matrices still exhibits the exponential-decay property under technical conditions on the underlying graph.

Lemma C.3.

Given any graph \mathcal{M}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime}) and integers d,\ell\geq 1, suppose the block matrices A_{i}\in\mathbb{R}^{|\mathcal{V}^{\prime}|d\times|\mathcal{V}^{\prime}|d} all satisfy exponential decay properties, i.e., there exist C_{i}\geq 0 and 0\leq\lambda<1 such that

\left\lVert(A_{i})_{u,q}\right\rVert\leq C_{i}\lambda^{d_{\mathcal{M}}(u,q)}\quad\text{for any nodes }u,q\in\mathcal{V}^{\prime}.

Select some \delta>0 such that \lambda^{\prime}=\lambda+\delta<1. If \tilde{a}\coloneqq\sum_{k=0}^{\infty}(\frac{\lambda}{\lambda^{\prime}})^{k}(\sup_{u\in\mathcal{V}^{\prime}}|\partial N_{u}^{k}|)<\infty, then \prod_{i=1}^{\ell}A_{i} satisfies the exponential decay property with decay rate \lambda^{\prime}, i.e.,

\left\lVert(\prod_{i=1}^{\ell}A_{i})_{u,v}\right\rVert\leq C^{\prime}{(\lambda^{\prime})}^{d_{\mathcal{M}}(u,v)},

where C=(a~)i=1CiC^{\prime}=(\tilde{a})^{\ell}\prod_{i=1}^{\ell}C_{i}.

A proof of Lemma C.3 can be found in Appendix D.2.
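
A small numerical illustration of Lemma C.3 on a path graph (our own sketch, with hypothetical matrices): each factor has entries decaying like \lambda^{d(u,q)}, and their product still decays, at the slower rate \lambda^{\prime}.

```python
import numpy as np

n, lam, lam_p = 40, 0.5, 0.7                 # path graph on n nodes; rates lambda < lambda'
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])   # graph distance on a path

rng = np.random.default_rng(3)
A1 = rng.uniform(-1, 1, (n, n)) * lam ** dist   # |(A1)_{u,q}| <= lambda^{d(u,q)}, C_1 = 1
A2 = rng.uniform(-1, 1, (n, n)) * lam ** dist   # same decay, C_2 = 1
prod = A1 @ A2

# On a path, the k-hop boundary has at most 2 nodes, so
# a_tilde <= 2 / (1 - lam / lam_p) is finite and C' = a_tilde^2 (with ell = 2).
a_tilde = 2 / (1 - lam / lam_p)
C_prime = a_tilde ** 2
print(np.all(np.abs(prod) <= C_prime * lam_p ** dist))   # True
```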

Step 4: Establish correlation decay properties of matrix MM

The last step of the proof is to study the properties of MM. To accomplish this, we first show that, for time indices i,j1i,j\geq 1, JJ^{\ell} has the following properties:

  • (J)i,j=0(J^{\ell})_{i,j}=0 if <|ij|\ell<|i-j| or |ij|\ell-|i-j| is odd.

  • (J)i,j(J^{\ell})_{i,j} is a summation of terms k=1Ajk,ikDik,ik1\prod_{k=1}^{\ell}A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1} and the number of such terms is bounded by ((|ij|)/2)\binom{\ell}{(\ell-|i-j|)/2}.

We formally state and prove the above observation in Lemma D.3. We can further apply Lemma C.3 to the block matrices A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1}, which gives the following lemma.

Lemma C.4.

Recall \gamma_{S}\coloneqq\frac{\sqrt{1+(\Delta\ell_{S}/\mu)}-1}{\sqrt{1+(\Delta\ell_{S}/\mu)}+1}. Select \delta>0 such that \gamma_{S}^{\prime}=\gamma_{S}+\delta<1, and define b\coloneqq\sum_{\gamma\geq 0}(\frac{\gamma_{S}}{\gamma_{S}^{\prime}})^{\gamma}h(\gamma). Given \ell,i,j\geq 1 and u,q\in\mathcal{V}, we have

((J)i,j)u,q((|ij|)/2)(b2Tμ)(γS)d𝒢(u,q).\mathopen{}\mathclose{{}\left\lVert((J^{\ell})_{i,j})_{u,q}}\right\rVert\leq\binom{\ell}{(\ell-|i-j|)/2}(b\frac{2\ell_{T}}{\mu})^{\ell}(\gamma_{S}^{\prime})^{d_{\mathcal{G}}(u,q)}.

Intuitively, Lemma C.4 bounds the correlation between the action of node u at time step t-1+i and the action of node q at time step t-1+j. We present its proof in Appendix D.2.

Recall that, for 1i,jk11\leq i,j\leq k-1,

Mi,j1=Di,i10(J)i,j.M^{-1}_{i,j}=D^{-1}_{i,i}\sum_{\ell\geq 0}(-J)^{\ell}_{i,j}.

With these exponentially decaying bounds on the matrices J^{\ell}, we can conclude Theorem 3.3 by following a procedure similar to the proof of Theorem 3.1 in Lin et al. (2021). We present the details in Appendix D.4.

C.2 From Perturbation to Competitive Ratio

We now show how to use the results proven in the previous subsection to prove our competitive ratio bound in Theorem 3.4. Our starting point is the assumption that the exponentially decaying local perturbation bound in Definition 3.1 holds for some C_{1},C_{2}>0 and \rho_{S},\rho_{T}\in[0,1), which is established using the proof approach outlined in Section 3.3.

As we discussed in Section 3.3, our proof contains two key parts: (i) we bound the per-time-step error of LPC (Lemma C.7); and (ii) we show that the per-time-step errors do not accumulate unboundedly (Theorem C.8).

A key observation that enables this analysis is that the aggregation of the local per-time-step errors made by the agents at x_{t}^{v} can be viewed as a global per-time-step error in the joint global action x_{t}. Following this observation, we first introduce a global perturbation bound that concerns the global action x_{t} rather than the local actions x_{t}^{v}. Recall that f_{t} denotes the global hitting cost (see Section 2). Define the optimization problem that solves for the optimal trajectory from global state x_{t-1} to x_{t+p-1}:

ψ~tp(y,z)=argminxt:t+p1\displaystyle\tilde{\psi}_{t}^{p}(y,z)=\operatorname*{arg\,min}_{x_{t:t+p-1}} τ=tt+p1(fτ(xτ)+cτ(xτ,xτ1))\displaystyle\sum_{\tau=t}^{t+p-1}\mathopen{}\mathclose{{}\left(f_{\tau}(x_{\tau})+c_{\tau}(x_{\tau},x_{\tau-1})}\right)
s.t. xt1=y,xt+p1=z,\displaystyle x_{t-1}=y,x_{t+p-1}=z, (9)

and another one that solves for the optimal trajectory from global state x_{t-1} to the end of the game:

ψ~t(y)=argminxt:T\displaystyle\tilde{\psi}_{t}(y)=\operatorname*{arg\,min}_{x_{t:T}} τ=tT(fτ(xτ)+cτ(xτ,xτ1))\displaystyle\sum_{\tau=t}^{T}\mathopen{}\mathclose{{}\left(f_{\tau}(x_{\tau})+c_{\tau}(x_{\tau},x_{\tau-1})}\right)
s.t. xt1=y.\displaystyle x_{t-1}=y. (10)

The following global perturbation bound can be derived from Theorem 3.1 in Lin et al. (2021):

Theorem C.5 (Global Perturbation Bound).

Under Assumption 2.1, the following perturbation bounds hold for the optimization problems (9) and (10):

ψ~tp(y,z)t0ψ~tp(y,z)t0\displaystyle\mathopen{}\mathclose{{}\left\lVert\tilde{\psi}_{t}^{p}(y,z)_{t_{0}}-\tilde{\psi}_{t}^{p}(y^{\prime},z^{\prime})_{t_{0}}}\right\rVert\leq{} CGρGt0t+1yy+CGρGt+p1t0zz,\displaystyle C_{G}\rho_{G}^{t_{0}-t+1}\mathopen{}\mathclose{{}\left\lVert y-y^{\prime}}\right\rVert+C_{G}\rho_{G}^{t+p-1-t_{0}}\mathopen{}\mathclose{{}\left\lVert z-z^{\prime}}\right\rVert,
ψ~t(y)t0ψ~t(y)t0\displaystyle\mathopen{}\mathclose{{}\left\lVert\tilde{\psi}_{t}(y)_{t_{0}}-\tilde{\psi}_{t}(y^{\prime})_{t_{0}}}\right\rVert\leq{} CGρGt0t+1yy,\displaystyle C_{G}\rho_{G}^{t_{0}-t+1}\mathopen{}\mathclose{{}\left\lVert y-y^{\prime}}\right\rVert,

where ρG=12(1+2Tμ+1)1\rho_{G}=1-2\cdot\mathopen{}\mathclose{{}\left(\sqrt{1+\frac{2\ell_{T}}{\mu}}+1}\right)^{-1} and CG=2TμC_{G}=\frac{2\ell_{T}}{\mu}.
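
For a sense of scale, the constants in Theorem C.5 are easy to evaluate; the sketch below plugs in hypothetical values of \mu and \ell_{T} (our own choices, for illustration only).

```python
import numpy as np

# Evaluating the decay rate rho_G and constant C_G of Theorem C.5
# for hypothetical strong convexity / temporal smoothness parameters.
mu, ell_T = 1.0, 0.3
rho_G = 1 - 2 / (np.sqrt(1 + 2 * ell_T / mu) + 1)   # ~0.117
C_G = 2 * ell_T / mu                                # 0.6
print(rho_G, C_G)
```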

To make the concept of per-time-step error rigorous, we formally define it as the distance between the actual next action picked by LPC and the clairvoyant optimal next action, computed from the previous action x_{t-1} to the end of the game:

Definition C.6 (Per-step error magnitude).

At time step tt, given the previous state xt1x_{t-1}, the (decentralized) online algorithm ALGALG picks xtDtx_{t}\in D_{t}. We define error ete_{t} as

et:=xtψ~t(xt1)t.e_{t}:=\mathopen{}\mathclose{{}\left\lVert x_{t}-\tilde{\psi}_{t}(x_{t-1})_{t}}\right\rVert.

Using the local perturbation bound in Definition 3.1, we show the per-time-step error of LPC decays exponentially with respect to prediction length kk and communication range rr. This result is stated formally in Lemma C.7, and the proof can be found in Appendix E.2.

Lemma C.7.

For LPCLPC with parameters rr and kk, ete_{t} satisfies

et2=\displaystyle e_{t}^{2}={} O(h(r)2ρS2r+C3(r)2ρT2kρG2k)xt1xt12\displaystyle O\mathopen{}\mathclose{{}\left(h(r)^{2}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2k}\rho_{G}^{2k}}\right)\cdot\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}
+O(h(r)2ρS2r)τ=tt+k1ρTτtfτ(xτ)\displaystyle+O\mathopen{}\mathclose{{}\left(h(r)^{2}\cdot\rho_{S}^{2r}}\right)\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}f_{\tau}(x_{\tau}^{*})
+O(C3(r)2ρT2k)ft+k1(xt+k1),\displaystyle+O\mathopen{}\mathclose{{}\left(C_{3}(r)^{2}\cdot\rho_{T}^{2k}}\right)f_{t+k-1}(x_{t+k-1}^{*}),

where C3(r):=γ=0rh(γ)ρSγC_{3}(r):=\sum_{\gamma=0}^{r}h(\gamma)\cdot\rho_{S}^{\gamma}.

Using the global perturbation bound in Theorem C.5, we show in Theorem C.8 that \sum_{t=1}^{H}\lVert x_{t}-x_{t}^{*}\rVert^{2} can be upper bounded by the sum of the squared per-time-step errors of LPC. The proof can be found in Appendix E.1.

Theorem C.8.

Let x_{0},x_{1}^{*},x_{2}^{*},\ldots,x_{H}^{*} denote the offline optimal global trajectory and x_{0},x_{1},x_{2},\ldots,x_{H} denote the trajectory of ALG. The trajectory of ALG satisfies

t=1Hxtxt2C02(1ρG)2t=1Het2,\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}\leq\frac{C_{0}^{2}}{(1-\rho_{G})^{2}}\sum_{t=1}^{H}e_{t}^{2},

where C0:=max{1,CG}C_{0}:=\max\{1,C_{G}\} and CGC_{G} is defined in Theorem C.5.

To understand the bound in Theorem C.8, suppose all per-time-step errors e_{t} are zero except at a single time step \tau. Then the contribution of e_{\tau} to the total squared distance \sum_{t=1}^{H}\lVert x_{t}-x_{t}^{*}\rVert^{2} is at most a constant factor times e_{\tau}^{2}. This is because the impact of e_{\tau} on \lVert x_{t}-x_{t}^{*}\rVert decays exponentially as t increases from \tau to H.
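
This geometric-accumulation intuition can be replicated in a toy scalar simulation (a sketch under assumed dynamics, not the paper's model): the deviation from the optimal trajectory contracts at rate \rho between steps, while each step injects a fresh error e_{t}.

```python
import numpy as np

rng = np.random.default_rng(4)
H, rho, C0 = 200, 0.8, 1.0
e = rng.uniform(0, 1, H)            # per-time-step error magnitudes

# toy recursion: the deviation from the optimal trajectory contracts at
# rate rho each step and picks up a fresh error e_t
dev, total_sq = 0.0, 0.0
for t in range(H):
    dev = rho * dev + e[t]
    total_sq += dev ** 2

bound = (C0 ** 2 / (1 - rho) ** 2) * np.sum(e ** 2)
print(total_sq <= bound)            # True, mirroring the form of Theorem C.8
```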

By substituting the per-time-step error bound of Lemma C.7 into Theorem C.8, one can bound \sum_{t=1}^{H}\lVert x_{t}-x_{t}^{*}\rVert^{2} by the offline optimal cost, which can then be converted into the competitive ratio bound in Theorem 3.4.

C.3 Roadmap to Generalize the Proof to Inexact Predictions

In this section, we present a roadmap for generalizing our proof to the case where predictions of future cost functions are inexact. In the information availability model in Section 2.1, one can study inexact predictions by introducing additional disturbance parameters \{\delta_{t}^{v},w_{t}^{v},w_{t}^{e}\} into the three types of cost functions, generalizing them to f_{t}^{v}(x_{t}^{v},\delta_{t}^{v}), c_{t}^{v}(x_{t}^{v},x_{t-1}^{v},w_{t}^{v}), and s_{t}^{e}(x_{t}^{v},x_{t}^{u},w_{t}^{e}) for all t\in[H], v\in\mathcal{V}, e=(v,u)\in\mathcal{E}. Here \{\delta_{t}^{v},w_{t}^{v},w_{t}^{e}\} represent the disturbances in the cost functions that are hard to predict exactly. Before the decentralized online algorithm decides each local action, it receives the generalized cost functions f_{t}^{v}(\cdot,\cdot), c_{t}^{v}(\cdot,\cdot,\cdot), s_{t}^{e}(\cdot,\cdot,\cdot) and noisy predictions of the true disturbance parameters \{\delta_{t}^{v},w_{t}^{v},w_{t}^{e}\} within k time steps and an r-hop neighborhood. In the LPC algorithm (Algorithm 1), the optimization problem \psi_{(t,v)}^{(k,r)} can then be solved with the noisy predictions of the disturbance parameters. To analyze the performance of LPC in the presence of prediction errors, one can first generalize the exponentially decaying perturbation bounds in Theorem 3.2 and Theorem 3.3 to include perturbations of the disturbance parameters, similar to what we already did in Theorem D.2. The prediction error on the disturbance parameters results in an additional additive term in the per-step error bound in Lemma C.7. If one is willing to assume that the total sum of prediction errors is O(cost(OPT)), as in Antoniadis et al. (2020), one can derive a competitive ratio for LPC by substituting the per-step error bound into Theorem C.8. It is worth noting that the resulting competitive ratio will inevitably depend on the quality of the predictions, and will converge to a limit larger than 1 (under imperfect predictions) as the prediction horizon k and the communication radius r increase.

Appendix D Perturbation Bounds

This section provides the full proofs of the perturbation bounds stated in Section 3.2.

D.1 Proof of Theorem 3.2

We begin with a technical lemma. Recall that for any positive integer mm, 𝕊m\mathbb{S}^{m} denotes the set of all symmetric m×mm\times m real matrices.

Lemma D.1.

For a graph 𝒢=(𝒱,)\mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime}), suppose AA is a positive definite matrix in 𝕊i𝒱pi\mathbb{S}^{\sum_{i\in\mathcal{V}^{\prime}}p_{i}} formed by |𝒱|×|𝒱|\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}}\right\rvert\times\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}}\right\rvert blocks, where the (i,j)(i,j)-th block has dimension pi×pjp_{i}\times p_{j}, i.e., Ai,jpi×pjA_{i,j}\in\mathbb{R}^{p_{i}\times p_{j}}. Assume that AA is qq-banded for an even positive integer qq; i.e.,

Ai,j=0,d𝒢(i,j)>q/2.A_{i,j}=0,\forall d_{\mathcal{G}^{\prime}}(i,j)>q/2.

Let a_{0} denote the smallest eigenvalue of A, and b_{0} the largest eigenvalue of A. Assume that b_{0}\geq a_{0}>0. Suppose D=diag(D_{1},\ldots,D_{|\mathcal{V}^{\prime}|}), where D_{i}\in\mathbb{S}^{p_{i}} is positive semi-definite. Let M=\left((A+D)^{-1}\right)_{S_{R},S_{C}}, where S_{R},S_{C}\subseteq\{1,\ldots,|\mathcal{V}^{\prime}|\}. Then we have \lVert M\rVert\leq C\gamma^{\hat{d}}, where

C=\frac{2}{a_{0}},\quad\gamma=\left(\frac{\sqrt{cond(A)}-1}{\sqrt{cond(A)}+1}\right)^{2/q},\quad\hat{d}=\min_{i\in S_{R},j\in S_{C}}d_{\mathcal{G}^{\prime}}(i,j).

Here cond(A)=b0/a0cond(A)=b_{0}/a_{0} denotes the condition number of matrix AA.

We can show Lemma D.1 using the same method as Lemma B.1 in Lin et al. (2021). We only need to note that, even when the block sizes are not identical, the m-th power of a q-banded matrix is a (qm)-banded matrix for any positive integer m.
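
The linear growth of the bandwidth under powers is easy to confirm numerically (a standalone sketch of our own, with scalar blocks for simplicity):

```python
import numpy as np

n, q = 12, 2                          # q-banded: A[i, j] = 0 whenever |i - j| > q / 2
rng = np.random.default_rng(5)
A = rng.standard_normal((n, n))
i, j = np.indices((n, n))
A[np.abs(i - j) > q // 2] = 0.0       # here q = 2, so A is tridiagonal

for m in range(1, 5):
    Am = np.linalg.matrix_power(A, m)
    # the bandwidth grows linearly: A^m is (q * m)-banded
    assert np.allclose(Am[np.abs(i - j) > q * m // 2], 0.0)
print("A^m is (q*m)-banded for m = 1, ..., 4")
```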

With the help of Lemma D.1, we can proceed to show a local perturbation bound on a general graph \mathcal{G}^{\prime} in Theorem D.2, where \mathcal{G}^{\prime} can be different from the network \mathcal{G} of agents in Section 2. Compared with Theorem 3.1 in Lin et al. (2021), Theorem D.2 is more general because it considers a general network of decision variables, while Theorem 3.1 in Lin et al. (2021) only considers the special case of a line graph. Although Theorem D.2 does not consider the temporal dimension that features in the local perturbation bound defined in Definition 3.1, we will use it to show Theorem 3.2 later by redefining the variables from two perspectives.

Theorem D.2.

For a network \mathcal{G}^{\prime}=(\mathcal{V}^{\prime},\mathcal{E}^{\prime}) with undirected edges, suppose that each node v\in\mathcal{V}^{\prime} is associated with a decision vector \hat{x}_{v}\in\mathbb{R}^{p_{v}} and a cost function \hat{f}_{v}:\mathbb{R}^{p_{v}}\to\mathbb{R}_{\geq 0}, and each edge e=(u,v)\in\mathcal{E}^{\prime} is associated with an edge cost \hat{c}_{e}:\mathbb{R}^{p_{v}}\times\mathbb{R}^{p_{u}}\times\mathbb{R}^{q}\to\mathbb{R}_{\geq 0}. (We add a hat over the decision vector \hat{x}_{v} to distinguish it from the local action x_{t}^{v} and the global action x_{t} defined in Section 2; \hat{x}_{v} is a p_{v}-dimensional real vector.) Assume that \hat{f}_{v} is \mu-strongly convex for all v\in\mathcal{V}^{\prime} and \hat{c}_{e} is \ell-smooth for all e\in\mathcal{E}^{\prime}. For some subset S\subset\mathcal{V}^{\prime}, define

E0:=\displaystyle E_{0}:={} {(u,v)u,v𝒱S},\displaystyle\{(u,v)\in{\mathcal{E}^{\prime}}\mid u,v\in\mathcal{V}^{\prime}\setminus S\},
E1:=\displaystyle E_{1}:={} {(u,v)u𝒱S,vS}.\displaystyle\{(u,v)\in{\mathcal{E}^{\prime}}\mid u\in\mathcal{V}^{\prime}\setminus S,v\in S\}.

For disturbance vectors w\in\mathbb{R}^{(|E_{0}|+|E_{1}|)\times q} indexed by e\in E_{0}\cup E_{1} and y\in\mathbb{R}^{\sum_{v\in S}p_{v}} indexed by v\in S, let \psi(w,y) denote the optimal solution of the following optimization problem. (We do not consider disturbance vectors in the exponentially decaying local perturbation bounds defined in Definition 3.1, but adding w to the edge costs makes Theorem D.2 more general; for each edge e, w_{e} is a q-dimensional real vector.)

\psi(w,y):=\operatorname*{arg\,min}_{\hat{x}\in\mathbb{R}^{\sum_{v\in\mathcal{V}^{\prime}\setminus S}p_{v}}}\sum_{v\in\mathcal{V}^{\prime}\setminus S}\hat{f}_{v}(\hat{x}_{v})+\sum_{(u,v)\in E_{0}}\hat{c}_{(u,v)}(\hat{x}_{u},\hat{x}_{v},w_{(u,v)})+\sum_{(u,v)\in E_{1}}\hat{c}_{(u,v)}(\hat{x}_{u},y_{v},w_{(u,v)}).

Then, we have that for any vertex u0𝒱Su_{0}\in\mathcal{V}^{\prime}\setminus S, the following inequality holds:

\lVert\psi(w,y)_{u_{0}}-\psi(w^{\prime},y^{\prime})_{u_{0}}\rVert\leq C\left(\sum_{e\in E_{0}\cup E_{1}}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},e)-1}\lVert w_{e}-w_{e}^{\prime}\rVert+\sum_{v\in S}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},v)-1}\lVert y_{v}-y_{v}^{\prime}\rVert\right),\quad\forall w,y,w^{\prime},y^{\prime},

where C\coloneqq(2\ell)/\mu and \lambda\coloneqq 1-2\cdot\left(\sqrt{1+(\Delta^{\prime}\ell/\mu)}+1\right)^{-1}. Here, \Delta^{\prime} denotes the maximum degree of any vertex v\in\mathcal{V}^{\prime} in the graph \mathcal{G}^{\prime}. For e=(u,v)\in\mathcal{E}^{\prime}, we define d_{\mathcal{G}^{\prime}}(u_{0},e):=\min\{d_{\mathcal{G}^{\prime}}(u_{0},u),d_{\mathcal{G}^{\prime}}(u_{0},v)\}.

Proof of Theorem D.2.

Let e=[\pi^{\top},\epsilon^{\top}]^{\top} be a vector, where \epsilon=\{\epsilon_{v}\}_{v\in S} with \epsilon_{v}\in\mathbb{R}^{p_{v}}, and \pi=\{\pi_{e}\}_{e\in E_{0}\cup E_{1}} with \pi_{e}\in\mathbb{R}^{q}. Let \theta be an arbitrary real number. Define the function \hat{h}:\mathbb{R}^{\sum_{v\in\mathcal{V}^{\prime}\setminus S}p_{v}}\times\mathbb{R}^{(|E_{0}|+|E_{1}|)\times q}\times\mathbb{R}^{\sum_{v\in S}p_{v}}\to\mathbb{R}_{\geq 0} as

h^(x^,w,y)=v𝒱Sf^v(x^v)+(u,v)E0c^(u,v)(x^u,x^v,w(u,v))+(u,v)E1c^(u,v)(x^u,yv,w(u,v)).\hat{h}(\hat{x},w,y)=\sum_{v\in\mathcal{V}^{\prime}\setminus S}\hat{f}_{v}(\hat{x}_{v})+\sum_{(u,v)\in E_{0}}\hat{c}_{(u,v)}(\hat{x}_{u},\hat{x}_{v},w_{(u,v)})+\sum_{(u,v)\in E_{1}}\hat{c}_{(u,v)}(\hat{x}_{u},y_{v},w_{(u,v)}).

To simplify the notation, we use ζ\zeta to denote the tuple of system parameters, i.e.,

ζ:=(w,y).\zeta:=(w,y).

From our construction, we know that \hat{h} is \mu-strongly convex in \hat{x}, so we use the decomposition \hat{h}=\hat{h}_{a}+\hat{h}_{b}, where

h^a(x^,ζ)\displaystyle\hat{h}_{a}(\hat{x},\zeta) =v𝒱Sμ2x^v2+(u,v)E0c^(u,v)(x^u,x^v,w(u,v))+(u,v)E1c^(u,v)(x^u,yv,w(u,v)),\displaystyle=\sum_{v\in\mathcal{V}^{\prime}\setminus S}\frac{\mu}{2}\mathopen{}\mathclose{{}\left\lVert\hat{x}_{v}}\right\rVert^{2}+\sum_{(u,v)\in E_{0}}\hat{c}_{(u,v)}(\hat{x}_{u},\hat{x}_{v},w_{(u,v)})+\sum_{(u,v)\in E_{1}}\hat{c}_{(u,v)}(\hat{x}_{u},y_{v},w_{(u,v)}),
h^b(x^,ζ)\displaystyle\hat{h}_{b}(\hat{x},\zeta) =v𝒱S(f^v(x^v)μ2x^v2).\displaystyle=\sum_{v\in\mathcal{V}^{\prime}\setminus S}\mathopen{}\mathclose{{}\left(\hat{f}_{v}(\hat{x}_{v})-\frac{\mu}{2}\mathopen{}\mathclose{{}\left\lVert\hat{x}_{v}}\right\rVert^{2}}\right).

Since ψ(ζ+θe)\psi(\zeta+\theta e) is the minimizer of convex function h^(,ζ+θe)\hat{h}(\cdot,\zeta+\theta e), we see that

x^h^(ψ(ζ+θe),ζ+θe)=0.\nabla_{\hat{x}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)=0.

Taking the derivative with respect to θ\theta gives that

\nabla_{\hat{x}}^{2}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)\frac{d}{d\theta}\psi(\zeta+\theta e)={}-\sum_{v\in S}\nabla_{y_{v}}\nabla_{\hat{x}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)\epsilon_{v}
-\sum_{e\in E_{0}\cup E_{1}}\nabla_{w_{e}}\nabla_{\hat{x}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)\pi_{e}.

To simplify the notation, we define

M\displaystyle M :=x^2h^(ψ(ζ+θe),ζ+θe),which is a |𝒱S|×|𝒱S| block matrix,\displaystyle:=\nabla_{\hat{x}}^{2}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e),\text{which is a }\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}\setminus S}\right\rvert\times\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}\setminus S}\right\rvert\text{ block matrix},
R^{(v)} :=-\nabla_{y_{v}}\nabla_{\hat{x}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e),\forall v\in S,\text{ which are }|\mathcal{V}^{\prime}\setminus S|\times 1\text{ block matrices},
K(e)\displaystyle K^{(e)} :=wex^h^(ψ(ζ+θe),ζ+θe),eE0E1,which are |𝒱S|×1 block matrices,\displaystyle:=-\nabla_{w_{e}}\nabla_{\hat{x}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e),\forall e\in E_{0}\cup E_{1},\text{which are }\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}\setminus S}\right\rvert\times 1\text{ block matrices},

where in MM, the block size is pu×pv,(u,v)(𝒱S)2p_{u}\times p_{v},\forall(u,v)\in(\mathcal{V}^{\prime}\setminus S)^{2}; in R(v)R^{(v)}, the block size is pu×pv,u𝒱Sp_{u}\times p_{v},\forall u\in\mathcal{V}^{\prime}\setminus S; in K(e)K^{(e)}, the block size is pu×q,u𝒱Sp_{u}\times q,\forall u\in\mathcal{V}^{\prime}\setminus S. Hence we can write

\frac{d}{d\theta}\psi(\zeta+\theta e)=M^{-1}\left(\sum_{v\in S}R^{(v)}\epsilon_{v}+\sum_{e\in E_{0}\cup E_{1}}K^{(e)}\pi_{e}\right).

Recall that {R(v)}vS\{R^{(v)}\}_{v\in S} are |𝒱S|×1\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}\setminus S}\right\rvert\times 1 block matrices with block size pu×pv,u𝒱Sp_{u}\times p_{v},\forall u\in\mathcal{V}^{\prime}\setminus S; {K(e)}eE0E1\{K^{(e)}\}_{e\in E_{0}\cup E_{1}} are |𝒱S|×1\mathopen{}\mathclose{{}\left\lvert\mathcal{V}^{\prime}\setminus S}\right\rvert\times 1 block matrices with block size pu×q,u𝒱Sp_{u}\times q,\forall u\in\mathcal{V}^{\prime}\setminus S. Let N(v)N(v) denote the set of neighbors of vertex vv on 𝒢\mathcal{G}^{\prime}. For R(v),vSR^{(v)},v\in S, the (u,1)(u,1)-th block can be non-zero only if u(𝒱S)N(v)u\in(\mathcal{V}^{\prime}\setminus S)\cap N(v). For K(e),eE0E1K^{(e)},e\in E_{0}\cup E_{1}, the (u,1)(u,1)-th block can be non-zero only if ueu\in e and u𝒱Su\in\mathcal{V}^{\prime}\setminus S. Hence we see that

\frac{d}{d\theta}\psi(\zeta+\theta e)_{u_{0}}=\sum_{v\in S}(M^{-1})_{u_{0},(\mathcal{V}^{\prime}\setminus S)\cap N(v)}R^{(v)}_{(\mathcal{V}^{\prime}\setminus S)\cap N(v),1}\epsilon_{v}+\sum_{e\in E_{0}\cup E_{1}}(M^{-1})_{u_{0},\{u\in e\mid u\in\mathcal{V}^{\prime}\setminus S\}}K^{(e)}_{\{u\in e\mid u\in\mathcal{V}^{\prime}\setminus S\},1}\pi_{e}.

Since the edge costs \hat{c}_{e}(\cdot,\cdot,\cdot), e\in E_{0}\cup E_{1}, are \ell-smooth, we know that the norms of

R^{(v)}_{(\mathcal{V}^{\prime}\setminus S)\cap N(v),1},\text{ and }K^{(e)}_{\{u\in e\mid u\in\mathcal{V}^{\prime}\setminus S\},1}

are all upper bounded by \ell. Taking norms on both sides gives that

ddθψ(ζ+θe)u0\displaystyle\mathopen{}\mathclose{{}\left\lVert\frac{d}{d\theta}\psi(\zeta+\theta e)_{u_{0}}}\right\rVert\leq{} vS(M1)u0,(𝒱S)N(v)ϵv+eE0E1(M1)u0,{ueu𝒱S}πe.\displaystyle\sum_{v\in S}\ell\mathopen{}\mathclose{{}\left\lVert(M^{-1})_{u_{0},(\mathcal{V}^{\prime}\setminus S)\cap N(v)}}\right\rVert\mathopen{}\mathclose{{}\left\lVert\epsilon_{v}}\right\rVert+\sum_{e\in E_{0}\cup E_{1}}\ell\mathopen{}\mathclose{{}\left\lVert(M^{-1})_{u_{0},\{u\in e\mid u\in\mathcal{V}^{\prime}\setminus S\}}}\right\rVert\mathopen{}\mathclose{{}\left\lVert\pi_{e}}\right\rVert. (11)

Note that MM can be decomposed as M=Ma+MbM=M_{a}+M_{b}, where

Ma\displaystyle M_{a} :=x^2h^a(ψ(ζ+θe),ζ+θe),\displaystyle:=\nabla_{\hat{x}}^{2}\hat{h}_{a}(\psi(\zeta+\theta e),\zeta+\theta e),
Mb\displaystyle M_{b} :=x^2h^b(ψ(ζ+θe),ζ+θe).\displaystyle:=\nabla_{\hat{x}}^{2}\hat{h}_{b}(\psi(\zeta+\theta e),\zeta+\theta e).

Since M_{a} is 2-banded with respect to \mathcal{G}^{\prime} (its (u,v) block is non-zero only if d_{\mathcal{G}^{\prime}}(u,v)\leq 1) and satisfies (\mu+\Delta^{\prime}\ell)I\succeq M_{a}\succeq\mu I, and M_{b} is block diagonal and satisfies M_{b}\succeq 0, we obtain the following using Lemma D.1:

(M1)u0,(𝒱S)N(v)2μλd𝒢(u0,v)1, and (M1)u0,{ueu𝒱S}2μλd𝒢(u0,e)1,\mathopen{}\mathclose{{}\left\lVert(M^{-1})_{u_{0},(\mathcal{V}^{\prime}\setminus S)\cap N(v)}}\right\rVert\leq\frac{2}{\mu}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},v)-1},\text{ and }\mathopen{}\mathclose{{}\left\lVert(M^{-1})_{u_{0},\{u\in e\mid u\in\mathcal{V}^{\prime}\setminus S\}}}\right\rVert\leq\frac{2}{\mu}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},e)-1},

where \lambda:={(\sqrt{cond(M_{a})}-1)}/{(\sqrt{cond(M_{a})}+1)}=1-2\cdot\left(\sqrt{1+(\Delta^{\prime}\ell/\mu)}+1\right)^{-1}.

Substituting this into (11), we see that

ddθψ(ζ+θe)u0C(vSλd𝒢(u0,v)1ϵv+eE0E1λd𝒢(u0,e)1πe),\mathopen{}\mathclose{{}\left\lVert\frac{d}{d\theta}\psi(\zeta+\theta e)_{u_{0}}}\right\rVert\leq C\mathopen{}\mathclose{{}\left(\sum_{v\in S}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},v)-1}\mathopen{}\mathclose{{}\left\lVert\epsilon_{v}}\right\rVert+\sum_{e\in E_{0}\cup E_{1}}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},e)-1}\mathopen{}\mathclose{{}\left\lVert\pi_{e}}\right\rVert}\right),

where C=(2)/μC=(2\ell)/\mu.

Finally, integrating over \theta completes the proof:

ψ(ζ)u0ψ(ζ+e)u0=\displaystyle\mathopen{}\mathclose{{}\left\lVert\psi(\zeta)_{u_{0}}-\psi(\zeta+e)_{u_{0}}}\right\rVert={} 01ddθψ(ζ+θe)u0𝑑θ\displaystyle\mathopen{}\mathclose{{}\left\lVert\int_{0}^{1}\frac{d}{d\theta}\psi(\zeta+\theta e)_{u_{0}}d\theta}\right\rVert
\displaystyle\leq{} 01ddθψ(ζ+θe)u0𝑑θ\displaystyle\int_{0}^{1}\mathopen{}\mathclose{{}\left\lVert\frac{d}{d\theta}\psi(\zeta+\theta e)_{u_{0}}}\right\rVert d\theta
\displaystyle\leq{} C(vSλd𝒢(u0,v)1ϵv+eE0E1λd𝒢(u0,e)1πe).\displaystyle C\mathopen{}\mathclose{{}\left(\sum_{v\in S}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},v)-1}\mathopen{}\mathclose{{}\left\lVert\epsilon_{v}}\right\rVert+\sum_{e\in E_{0}\cup E_{1}}\lambda^{d_{\mathcal{G}^{\prime}}(u_{0},e)-1}\mathopen{}\mathclose{{}\left\lVert\pi_{e}}\right\rVert}\right).

Now we return to the proof of Theorem 3.2. For simplicity, we temporarily assume the individual decision variables are unconstrained, i.e., D_{t}^{v}=\mathbb{R}^{n}. We discuss how to relax this assumption in Appendix D.3.

We first consider the case where \left(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\}\right) and \left(\{(y_{t-1}^{u})^{\prime}\},\{(z_{\tau}^{u})^{\prime}\}\right) differ in only one entry, y_{t-1}^{u} or z_{\tau}^{u}. If the difference is at z_{\tau}^{u}, by viewing each subset \{\tau\}\times N_{v}^{r} for \tau\in\{t-1,t,\dots,t+k\} in the original problem as a vertex in the new graph \mathcal{G}^{\prime} and applying Theorem D.2, we obtain that

xt0v0(xt0v0)C10(ρT0)|t0τ|zτu(zτu),\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t_{0}}^{v_{0}}-(x_{t_{0}}^{v_{0}})^{\prime}}\right\rVert\leq C_{1}^{0}\cdot(\rho_{T}^{0})^{\mathopen{}\mathclose{{}\left\lvert t_{0}-\tau}\right\rvert}\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert, (12)

where C10=(2T)/μC_{1}^{0}=(2\ell_{T})/\mu and ρT0=12(1+(2T/μ)+1)1\rho_{T}^{0}=1-2\cdot\mathopen{}\mathclose{{}\left(\sqrt{1+(2\ell_{T}/\mu)}+1}\right)^{-1}. On the other hand, by viewing each subset {τt1τ<t+k}×{u}\{\tau\mid t-1\leq\tau<t+k\}\times\{u\} for uNvru\in N_{v}^{r} in the original problem as a vertex in the new graph 𝒢\mathcal{G}^{\prime} and applying Theorem D.2, we obtain that

xt0v0(xt0v0)C11(ρS0)d𝒢(u,v0)zτu(zτu),\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t_{0}}^{v_{0}}-(x_{t_{0}}^{v_{0}})^{\prime}}\right\rVert\leq C_{1}^{1}\cdot(\rho_{S}^{0})^{d_{\mathcal{G}}(u,v_{0})}\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert, (13)

where C_{1}^{1}=(2\Delta\ell_{S})/\mu and \rho_{S}^{0}=1-2\cdot\left(\sqrt{1+(2\Delta\ell_{S}/\mu)}+1\right)^{-1}. Combining (12) and (13) gives that

xt0v0(xt0v0)\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t_{0}}^{v_{0}}-(x_{t_{0}}^{v_{0}})^{\prime}}\right\rVert\leq{} min{C10(ρT0)|t0τ|,C11(ρS0)d𝒢(u,v0)}zτu(zτu)\displaystyle\min\{C_{1}^{0}\cdot(\rho_{T}^{0})^{\mathopen{}\mathclose{{}\left\lvert t_{0}-\tau}\right\rvert},C_{1}^{1}\cdot(\rho_{S}^{0})^{d_{\mathcal{G}}(u,v_{0})}\}\cdot\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert
\displaystyle\leq{} C10C11(ρT0)|t0τ|/2(ρS0)d𝒢(u,v0)/2zτu(zτu)\displaystyle\sqrt{C_{1}^{0}\cdot C_{1}^{1}}\cdot(\rho_{T}^{0})^{\mathopen{}\mathclose{{}\left\lvert t_{0}-\tau}\right\rvert/2}\cdot(\rho_{S}^{0})^{d_{\mathcal{G}}(u,v_{0})/2}\cdot\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert
\displaystyle\leq{} C1ρT|t0τ|ρSd𝒢(v0,u)zτu(zτu)\displaystyle C_{1}\cdot\rho_{T}^{|t_{0}-\tau|}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert (14)

when \left(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\}\right) and \left(\{(y_{t-1}^{u})^{\prime}\},\{(z_{\tau}^{u})^{\prime}\}\right) differ only at one entry z_{\tau}^{u} for (\tau,u)\in\partial N_{(t,v)}^{(k,r)}.

We can use the same method to show that when \left(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\}\right) and \left(\{(y_{t-1}^{u})^{\prime}\},\{(z_{\tau}^{u})^{\prime}\}\right) differ only at one entry y_{t-1}^{u} for u\in N_{v}^{r}, we have

xt0v0(xt0v0)C2ρTt0(t1)ρSd𝒢(v0,u)yt1u(yt1u).\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t_{0}}^{v_{0}}-(x_{t_{0}}^{v_{0}})^{\prime}}\right\rVert\leq{}C_{2}\rho_{T}^{t_{0}-(t-1)}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert y_{t-1}^{u}-(y_{t-1}^{u})^{\prime}}\right\rVert. (15)

In the general case, where $\left(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\}\right)$ and $\left(\{(y_{t-1}^{u})^{\prime}\},\{(z_{\tau}^{u})^{\prime}\}\right)$ differ in more than one entry, we can perturb the entries one at a time and apply the triangle inequality. The conclusion of Theorem 3.2 then follows from (14) and (15).

D.2 Proof of Theorem 3.3

The proof follows the four-step structure outlined in Section C.1.

Step 1: Establish first-order equations

Given any system parameter $\zeta=(x_{t-1}^{(N_{v}^{r})},\{z_{\tau}^{u}\mid(\tau,u)\in\partial N_{(t,v)}^{(k,r)}\})$, we can define the function $\hat{h}$ as follows:

h^(x^[k1],ζ)=i=1k1uNvr1ft1+iu(x^iu)+i=1k1(u,u)(Nvr)st1+i(u,u)(x^iu,x^iu)+i=1kuNvrct1+iu(x^iu,x^i1u).\hat{h}(\hat{x}_{[k-1]},\zeta)=\sum_{i=1}^{k-1}\sum_{u\in N_{v}^{r-1}}f^{u}_{t-1+i}(\hat{x}_{i}^{u})+\sum_{i=1}^{k-1}\sum_{(u,u^{\prime})\in\mathcal{E}(N_{v}^{r})}s_{t-1+i}^{(u,u^{\prime})}(\hat{x}_{i}^{u},\hat{x}_{i}^{u^{\prime}})+\sum_{i=1}^{k}\sum_{u\in N_{v}^{r}}c_{t-1+i}^{u}(\hat{x}_{i}^{u},\hat{x}_{i-1}^{u}).

Here $\hat{x}_{0}$ coincides with $x_{t-1}$ on every node in $N_{v}^{r}$; $\hat{x}_{k}$ coincides with $z_{t-1+k}$ on every node in $N_{v}^{r}$; and for $1\leq i\leq k-1$, $\hat{x}_{i}^{u}$ coincides with $z_{t-1+i}^{u}$ on the boundary, i.e., for $u\in\partial N_{v}^{r}$.

Let the perturbation vector be $e=[e_{0}^{T},e_{1}^{T},\cdots,e_{k-1}^{T},e_{k}^{T}]^{T}$, where $e_{0},e_{k}\in\mathbb{R}^{|N_{v}^{r}|\times n}$ and $e_{i}\in\mathbb{R}^{|\partial N_{v}^{r}|\times n}$ for $1\leq i\leq k-1$.

Given $\theta\in\mathbb{R}$, $\psi(\zeta+\theta e)$ is the global minimizer of the convex function $\hat{h}(\cdot,\zeta+\theta e)$, and hence we have

x^1:k1(Nvr1)h^(ψ(ζ+θe),ζ+θe)=0.\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)=0.

Taking the derivative with respect to θ\theta, we establish the following set of equations:

x^1:k1(Nvr1)2h^(ψ(ζ+θe),ζ+θe)ddθψ(ζ+θe)=x^0(Nvr)x^1:k1(Nvr1)h^(ψ(ζ+θe),ζ+θe)e0x^k(Nvr)x^1:k1(Nvr1)h^(ψ(ζ+θe),ζ+θe)ekτ=1k1x^τ(Nvr)x^1:k1(Nvr1)h^(ψ(ζ+θe),ζ+θe)eτ.\displaystyle\begin{split}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}^{2}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)\frac{d}{d\theta}\psi(\zeta+\theta e)&=-\nabla_{\hat{x}_{0}^{(N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)e_{0}\\ &-\nabla_{\hat{x}_{k}^{(N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)e_{k}\\ &-\sum_{\tau=1}^{k-1}\nabla_{\hat{x}_{\tau}^{(\partial N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)e_{\tau}.\end{split} (16)
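As a sanity check of this implicit-differentiation step, one can verify the analogue of (16) on a toy quadratic objective, where the minimizer is available in closed form. In the sketch below, $Q$ and $S$ are stand-ins for the Hessian blocks of $\hat{h}$; they are illustrative choices, not quantities defined in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
Q = rng.standard_normal((n, n)); Q = Q @ Q.T + n * np.eye(n)  # strongly convex Hessian
S = rng.standard_normal((n, m))                               # cross-derivative w.r.t. the parameter

# For h(x, zeta) = 0.5 x'Qx + x'S zeta, the minimizer is psi(zeta) = -Q^{-1} S zeta,
# and the analogue of (16) reads Q * (d/dtheta) psi(zeta + theta e) = -S e.
def psi(zeta):
    return -np.linalg.solve(Q, S @ zeta)

zeta, e = rng.standard_normal(m), rng.standard_normal(m)
theta, eps = 0.3, 1e-6
fd = (psi(zeta + (theta + eps) * e) - psi(zeta + (theta - eps) * e)) / (2 * eps)
assert np.allclose(fd, np.linalg.solve(Q, -S @ e), atol=1e-5)  # matches the first-order equation
```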

We adopt the following short-hand notation:

  • Mx^1:k1(Nvr1)2h^(ψ(ζ+θe),ζ+θe)M\coloneqq\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}^{2}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e), which is a hierarchical block matrix with the first level of dimension (k1)×(k1)(k-1)\times(k-1), the second level of dimension |Nvr1|×|Nvr1||N_{v}^{r-1}|\times|N_{v}^{r-1}| and the third level of dimension n×nn\times n.

  • R(1)x^0(Nvr)x^1:k1(Nvr1)h^(ψ(ζ+θe),ζ+θe)R^{(1)}\coloneqq-\nabla_{\hat{x}_{0}^{(N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e), which is also a hierarchical block matrix with the first level of dimension (k1)×1(k-1)\times 1, the second level of dimension |Nvr1|×|Nvr||N_{v}^{r-1}|\times|N_{v}^{r}| and the third level of dimension n×nn\times n.

  • R(k1)x^k(Nvr)x^1:k1(Nvr1)h^(ψ(ζ+θe),ζ+θe)R^{(k-1)}\coloneqq-\nabla_{\hat{x}_{k}^{(N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e), which is also a hierarchical block matrix with the first level of dimension (k1)×1(k-1)\times 1, the second level of dimension |Nvr1|×|Nvr||N_{v}^{r-1}|\times|N_{v}^{r}| and the third level of dimension n×nn\times n.

  • $K^{(\tau)}\coloneqq-\nabla_{\hat{x}_{\tau}^{(\partial N_{v}^{r})}}\nabla_{\hat{x}_{1:k-1}^{(N_{v}^{r-1})}}\hat{h}(\psi(\zeta+\theta e),\zeta+\theta e)$, which is also a hierarchical block matrix with the first level of dimension $(k-1)\times 1$, the second level of dimension $|N_{v}^{r-1}|\times|\partial N_{v}^{r}|$ and the third level of dimension $n\times n$.

Using the above, we can rewrite (16) as follows:

\frac{d}{d\theta}\psi(\zeta+\theta e)=M^{-1}\Bigg{(}R^{(1)}e_{0}+R^{(k-1)}e_{k}+\sum_{\tau=1}^{k-1}K^{(\tau)}e_{\tau}\Bigg{)}.

Due to the structure of the temporal interaction cost functions, the lower-level block matrix of $R^{(1)}$ (resp. $R^{(k-1)}$) is non-zero only when the first-level index is $1$ (resp. $k-1$); due to the structure of the spatial interaction cost functions, the lower-level block matrix of $K^{(\tau)}$ is non-zero only when the first-level index is $\tau$. Hence, for $1\leq\tau^{\prime}\leq k-1$, we have

\displaystyle\begin{split}\Big(\frac{d}{d\theta}\psi(\zeta+\theta e)\Big)_{\tau^{\prime}}&=M^{-1}_{\tau^{\prime},1}R^{(1)}_{1}e_{0}+M^{-1}_{\tau^{\prime},k-1}R^{(k-1)}_{k-1}e_{k}+\sum_{\tau=1}^{k-1}M^{-1}_{\tau^{\prime},\tau}K^{(\tau)}_{\tau}e_{\tau},\end{split} (17)

where the subscripts on the right hand side denote the first level index of hierarchical block matrices MM, R(1)R^{(1)}, R(k1)R^{(k-1)} and K(τ)K^{(\tau)}.

Step 2: Exploit the structure of matrix $M$

We decompose $M$ into a block diagonal matrix $D$ and a block tri-diagonal matrix $A$ such that $M=D+A$. We denote each diagonal block of $D$ by $D_{i,i}$ for $1\leq i\leq k-1$; all other blocks of $D$ are zero matrices.

D\coloneqq\begin{bmatrix}D_{1,1}&&&\\&D_{2,2}&&\\&&\ddots&\\&&&D_{k-1,k-1}\end{bmatrix}

Each non-zero block in $A$ is a diagonal matrix, which captures the Hessian of the temporal interaction cost between consecutive time steps. Denote each block of $A$ by $A_{i,j}$ for $1\leq i,j\leq k-1$; only the blocks with $|i-j|=1$ are non-zero:

A\coloneqq\begin{bmatrix}0&A_{1,2}&&\\A_{2,1}&0&\ddots&\\&\ddots&\ddots&A_{k-2,k-1}\\&&A_{k-1,k-2}&0\end{bmatrix}

We rewrite the inverse of MM as follows:

M1=(D+A)1=D1(I+AD1)1=D1P1.M^{-1}=(D+A)^{-1}=D^{-1}(I+AD^{-1})^{-1}=D^{-1}P^{-1}.
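The factorization $M^{-1}=D^{-1}(I+AD^{-1})^{-1}$ is a purely algebraic identity; a small numerical check (with generic matrices standing in for the block structure):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
D = np.diag(rng.uniform(1.0, 2.0, n))      # invertible "block diagonal" part
A = 0.1 * rng.standard_normal((n, n))      # perturbing "tri-diagonal" part
M = D + A

P = np.eye(n) + A @ np.linalg.inv(D)       # P = I + A D^{-1}
assert np.allclose(np.linalg.inv(M), np.linalg.inv(D) @ np.linalg.inv(P))
```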

Next, we prove Lemma C.2.

Proof of Lemma C.2.

We claim the eigenvalues of PP are in {λ||λz|R}\{\lambda\in\mathbb{C}||\lambda-z|\leq R\} for some R>0R\in\mathbb{R}_{>0} and z{0}z\in\mathbb{C}\setminus\{0\} such that R<|z|.R<|z|. We first establish Lemma C.2 based on the claim and then prove the claim.

We follow the argument in the proof of Theorem 4 in Shin et al. (2020). Since any eigenvalue $\lambda$ of $P$ satisfies $|\lambda-z|\leq R$, we have $|\lambda/z-1|\leq R/|z|<1$. Thus, the eigenvalues of $I-(1/z)P$ lie in $\{\tilde{\lambda}\in\mathbb{C}:|\tilde{\lambda}|\leq R/|z|\}$, which guarantees $\rho(I-(1/z)P)<1$. Therefore,

P1=1z(I(I1zP))1=1zq0(I1zP)q.P^{-1}=\frac{1}{z}\Bigg{(}I-(I-\frac{1}{z}P)\Bigg{)}^{-1}=\frac{1}{z}\sum_{q\geq 0}(I-\frac{1}{z}P)^{q}.

We let $z=1$ and $R=\frac{2\ell_{T}}{\mu}$, and prove the claim above using the Gershgorin circle theorem for block matrices; note that the requirement $R<|z|$ holds since $2\ell_{T}/\mu<1$ under our assumptions.

By Theorem 1.13.1 and Remark 1.13.2 of Tretter (2008), the following holds. Consider $\mathcal{A}=(A_{ij})\in\mathbb{R}^{dn\times dn}$ ($d,n\geq 1$), where $A_{ij}\in\mathbb{R}^{d\times d}$ and each $A_{ii}$ is symmetric. Let $\sigma(\cdot)$ denote the spectrum of a matrix. Define the set

Giσ(Aii){k=1dB(λk(Aii),jiAij)}G_{i}\coloneqq\sigma(A_{ii})\cup\Bigg{\{}\cup_{k=1}^{d}B\Bigg{(}\lambda_{k}(A_{ii}),\sum_{j\neq i}\mathopen{}\mathclose{{}\left\lVert A_{ij}}\right\rVert\Bigg{)}\Bigg{\}}

where $B(\cdot,\cdot)$ denotes a disk $B(c,r)=\{\lambda:\left\lVert\lambda-c\right\rVert\leq r\}$ and $\lambda_{k}$ is the $k$-th smallest eigenvalue of $A_{ii}$. Then,

\sigma(\mathcal{A})\subseteq\cup_{i=1}^{n}G_{i}.

Next, we use the above fact to find a superset of σ(P)\sigma(P). Every diagonal block of PP is I{I}. Moreover, Pi,j=0P_{i,j}=0 for |ij|>1|i-j|>1, Pi,i1=Ai,i1Di1,i11P_{i,i-1}=A_{i,i-1}D_{i-1,i-1}^{-1}, Pi,i+1=Ai,i+1Di+1,i+11P_{i,i+1}=A_{i,i+1}D_{i+1,i+1}^{-1}. Hence we have

jiPi,j\displaystyle\sum_{j\neq i}\mathopen{}\mathclose{{}\left\lVert P_{i,j}}\right\rVert Ai,i1Di1,i11+Ai,i+1Di+1,i+11\displaystyle\leq\mathopen{}\mathclose{{}\left\lVert A_{i,i-1}}\right\rVert\mathopen{}\mathclose{{}\left\lVert D_{i-1,i-1}^{-1}}\right\rVert+\mathopen{}\mathclose{{}\left\lVert A_{i,i+1}}\right\rVert\mathopen{}\mathclose{{}\left\lVert D_{i+1,i+1}^{-1}}\right\rVert
2Tμ.\displaystyle\leq\frac{2\ell_{T}}{\mu}.

The last inequality follows from Assumption 2.1. Therefore, $G_{i}\subseteq B(1,\frac{2\ell_{T}}{\mu})$, which implies that all eigenvalues of $P$ lie in $B(1,\frac{2\ell_{T}}{\mu})$.
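The block Gershgorin argument can be illustrated numerically: with identity diagonal blocks and off-diagonal blocks of spectral norm at most $r$ on the two adjacent block diagonals, every eigenvalue of $P$ stays within distance $2r$ of $1$. A toy check (the sizes and the bound $r$ below are illustrative, not the paper's constants):

```python
import numpy as np

rng = np.random.default_rng(3)
k, d, r = 6, 4, 0.2                       # number of block rows, block size, off-diagonal norm
P = np.eye(k * d)
for i in range(k - 1):
    for (a, b) in [(i, i + 1), (i + 1, i)]:
        B = rng.standard_normal((d, d))
        B *= r / np.linalg.norm(B, 2)     # scale each off-diagonal block to spectral norm r
        P[a * d:(a + 1) * d, b * d:(b + 1) * d] = B

# Block Gershgorin with diagonal blocks I: all eigenvalues lie in the disk B(1, 2r).
assert np.all(np.abs(np.linalg.eigvals(P) - 1.0) <= 2 * r + 1e-8)
```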

To further simplify the notation in the power series expansion, we define JAD1=PIJ\coloneqq AD^{-1}=P-I. Given any time indices τ\tau^{\prime} and τ\tau, we have

(M1)τ,τ=(D1)τ,τ(P1)τ,τ=(D1)τ,τ×0(J)τ,τ,\displaystyle\begin{split}(M^{-1})_{\tau^{\prime},\tau}&=(D^{-1})_{\tau^{\prime},\tau^{\prime}}(P^{-1})_{\tau^{\prime},\tau}\\ &=(D^{-1})_{\tau^{\prime},\tau^{\prime}}\times\sum_{\ell\geq 0}(-J)^{\ell}_{\tau^{\prime},\tau},\end{split} (18)

where the first equality holds because $D^{-1}$ is a block diagonal matrix, and the second equality is due to Lemma C.2.

Step 3: Property for general exponential-decay matrices

This step simply requires proving Lemma C.3.

Proof of Lemma C.3.

Under the assumptions, we see that

q(1λ)d(u,q)(A1A2A)u,q=q(1λ)d(u,q)s1,,s1(A1)u,s1(A2)s1,s2(A)s1,qq(1λ)d(u,q)s1,,s1(C1λd(u,s1))(C2λd(s1,s2))(Cλd(s1,q))qs1,,s1i=1Ci(λλ)d(u,s1)+d(s1,s2)++d(s1,q)(a~)i=1Ci.\displaystyle\begin{split}\sum_{q}(\frac{1}{\lambda^{\prime}})^{d_{\mathcal{M}}(u,q)}\mathopen{}\mathclose{{}\left\lVert(A_{1}A_{2}\cdots A_{\ell})_{u,q}}\right\rVert&=\sum_{q}(\frac{1}{\lambda^{\prime}})^{d_{\mathcal{M}}(u,q)}\mathopen{}\mathclose{{}\left\lVert\sum_{s_{1},\cdots,s_{\ell-1}}(A_{1})_{u,s_{1}}(A_{2})_{s_{1},s_{2}}\cdots(A_{\ell})_{s_{\ell-1},q}}\right\rVert\\ &\leq\sum_{q}(\frac{1}{\lambda^{\prime}})^{d_{\mathcal{M}}(u,q)}\sum_{s_{1},\cdots,s_{\ell-1}}(C_{1}\lambda^{d_{\mathcal{M}}(u,s_{1})})(C_{2}\lambda^{d_{\mathcal{M}}(s_{1},s_{2})})\cdots(C_{\ell}\lambda^{d_{\mathcal{M}}(s_{\ell-1},q)})\\ &\leq\sum_{q}\sum_{s_{1},\cdots,s_{\ell-1}}\prod_{i=1}^{\ell}C_{i}(\frac{\lambda}{\lambda^{\prime}})^{d_{\mathcal{M}}(u,s_{1})+d_{\mathcal{M}}(s_{1},s_{2})+\cdots+d_{\mathcal{M}}(s_{\ell-1},q)}\\ &\leq(\tilde{a})^{\ell}\prod_{i=1}^{\ell}C_{i}.\end{split} (19)

Hence, we obtain that

\left\lVert\Big(\prod_{i=1}^{\ell}A_{i}\Big)_{u,q}\right\rVert\leq C^{\prime}(\lambda^{\prime})^{d_{\mathcal{M}}(u,q)},

where $C^{\prime}=(\tilde{a})^{\ell}\prod_{i=1}^{\ell}C_{i}$.
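Lemma C.3 can be checked numerically in a scalar toy setting: on a path graph with metric $d(u,q)=|u-q|$, the product of two matrices decaying at rate $\lambda$ still decays at any slower rate $\lambda^{\prime}>\lambda$, with the constant inflated by a factor of the form $\tilde{a}^{\ell}$. The values below are illustrative:

```python
import numpy as np

n, lam, lam2, C = 30, 0.5, 0.7, 1.0
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # path-graph distances
A1 = C * lam ** d
A2 = C * lam ** d

# On a path, at most 2 vertices sit at each distance gamma >= 1, so
# a_tilde = sum_gamma (lam/lam2)^gamma * h(gamma) <= 2 / (1 - lam/lam2).
a_tilde = 2.0 / (1.0 - lam / lam2)
assert np.all(np.abs(A1 @ A2) <= a_tilde ** 2 * C ** 2 * lam2 ** d + 1e-9)
```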

Step 4: Establish correlation decay properties of matrix MM

In this step, we apply the property developed for general exponential-decay matrices to $M$ and derive the perturbation bound in Theorem 3.3.

Lemma D.3.

For $\ell\geq 1$ and time indices $i,j\geq 1$, $J^{\ell}$ has the following properties:

  • (J)i,j=0(J^{\ell})_{i,j}=0 if <|ij|\ell<|i-j| or |ij|\ell-|i-j| is odd.

  • (J)i,j(J^{\ell})_{i,j} is a summation of terms k=1Ajk,ikDik,ik1\prod_{k=1}^{\ell}A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1} and the number of such terms is bounded by ((|ij|)/2){\ell\choose(\ell-|i-j|)/2}.

Note that for integers $m,k\geq 1$, we define ${m\choose k/2}=0$ if $k$ is odd.

Proof.

Since $J$ is a block tri-diagonal matrix, $J^{\ell}_{i,j}=0$ for $\ell<|i-j|$. We prove the remaining properties of $J^{\ell}$ by induction on $\ell$. When $\ell=1$,

Ji,i=0,Ji,i1=Ai,i1Di1,i11,Ji,i+1=Ai,i+1Di+1,i+11.J_{i,i}=0,\quad J_{i,i-1}=A_{i,i-1}D_{i-1,i-1}^{-1},\quad J_{i,i+1}=A_{i,i+1}D_{i+1,i+1}^{-1}.

Thus Lemma D.3 holds in the base case. Suppose Lemma D.3 holds for $J^{q}$ for all $q\leq\ell-1$. For $q=\ell$, we then have

Ji,j\displaystyle J^{\ell}_{i,j} =kJi,k1Jk,j=Ji,j11Aj1,jDj,j1+Ji,j+11Aj+1,jDj,j1.\displaystyle=\sum_{k}J^{\ell-1}_{i,k}J_{k,j}=J^{\ell-1}_{i,j-1}A_{j-1,j}D^{-1}_{j,j}+J^{\ell-1}_{i,j+1}A_{j+1,j}D_{j,j}^{-1}.

By the induction hypothesis, $J^{\ell-1}_{i,j}$ is a summation of terms $\prod_{k=1}^{\ell-1}A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1}$, and the number of such terms is bounded by ${\ell-1\choose(\ell-1-|i-j-1|)/2}+{\ell-1\choose(\ell-1-|i-j+1|)/2}$. Next, we show that ${\ell-1\choose(\ell-1-|i-j-1|)/2}+{\ell-1\choose(\ell-1-|i-j+1|)/2}={\ell\choose(\ell-|i-j|)/2}$ case by case.

Case 1: |ij|\ell-|i-j| is odd.

If $\ell-|i-j|$ is odd, then $\ell-1-|i-j-1|$ and $\ell-1-|i-j+1|$ are both odd. In this case,

(1(1|ij1|)/2)+(1(1|ij+1|)/2)=0,{\ell-1\choose(\ell-1-|i-j-1|)/2}+{\ell-1\choose(\ell-1-|i-j+1|)/2}=0,

which is equal to ((|ij|)/2).{\ell\choose(\ell-|i-j|)/2}.

Case 2: $\ell-|i-j|$ is even and $i=j$. In this case, we have

(1(1|ij1|)/2)+(1(1|ij+1|)/2)=(1/21)+(1/21).{\ell-1\choose(\ell-1-|i-j-1|)/2}+{\ell-1\choose(\ell-1-|i-j+1|)/2}={\ell-1\choose\ell/2-1}+{\ell-1\choose\ell/2-1}.

Since \ell is even, (1/21)+(1/21)=(/2)=((|ij|)/2).{\ell-1\choose\ell/2-1}+{\ell-1\choose\ell/2-1}={\ell\choose\ell/2}={\ell\choose(\ell-|i-j|)/2}.

Case 3: |ij|\ell-|i-j| is even and iji\neq j.

If |ij|\ell-|i-j| is even, then 1|ij1|\ell-1-|i-j-1| and 1|ij+1|\ell-1-|i-j+1| are both even. We denote (|ij|)/2(\ell-|i-j|)/2 as k0k_{0}. By triangle inequality, (1|ij1|)/2(\ell-1-|i-j-1|)/2 and (1|ij+1|)/2(\ell-1-|i-j+1|)/2 are in {k01,k0}\{k_{0}-1,k_{0}\}. Since iji\neq j,

(1(1|ij1|)/2)+(1(1|ij+1|)/2)=(1k01)+(1k0),{\ell-1\choose(\ell-1-|i-j-1|)/2}+{\ell-1\choose(\ell-1-|i-j+1|)/2}={\ell-1\choose k_{0}-1}+{\ell-1\choose k_{0}},

which sums to ${\ell\choose k_{0}}$ by Pascal's identity.
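Both ingredients of this proof are easy to verify computationally: Pascal's identity, and the fact that the number of $\pm 1$ step sequences of length $\ell$ from block row $i$ to block row $j$ is exactly ${\ell\choose(\ell-|i-j|)/2}$ on the unbounded integer line (the boundary in $\{1,\dots,k-1\}$ can only reduce the count, consistent with the bound in Lemma D.3). A small check:

```python
from math import comb

# Pascal's identity used in Case 3.
for l in range(2, 20):
    for k0 in range(1, l):
        assert comb(l - 1, k0 - 1) + comb(l - 1, k0) == comb(l, k0)

def walks(l, i, j):
    """Number of length-l walks from i to j with steps +-1 (no boundary)."""
    if l == 0:
        return 1 if i == j else 0
    return walks(l - 1, i, j - 1) + walks(l - 1, i, j + 1)  # last step into j

for l in range(0, 10):
    for i, j in [(0, 0), (0, 2), (1, 4)]:
        ok = (l - abs(i - j)) % 2 == 0 and l >= abs(i - j)
        assert walks(l, i, j) == (comb(l, (l - abs(i - j)) // 2) if ok else 0)
```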

Next we present the proof of Lemma C.4.

Proof of Lemma C.4.

By Lemma D.3, (J)i,j(J^{\ell})_{i,j} is a summation of terms k=1Ajk,ikDik,ik1\prod_{k=1}^{\ell}A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1} and the number of such terms is bounded by ((|ij|)/2){\ell\choose(\ell-|i-j|)/2}.

Define BkAjk,ikDik,ik1B_{k}\coloneqq A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1}. Recall Ajk,ikA_{j_{k},i_{k}} is a diagonal matrix and Dik,ikD_{i_{k},i_{k}} is a graph-induced banded matrix.

\left\lVert(B_{k})_{u,q}\right\rVert=\left\lVert(A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1})_{u,q}\right\rVert=\left\lVert(A_{j_{k},i_{k}})_{u,u}(D_{i_{k},i_{k}}^{-1})_{u,q}\right\rVert\leq\ell_{T}\left\lVert(D_{i_{k},i_{k}}^{-1})_{u,q}\right\rVert\leq\frac{2\ell_{T}}{\mu}\gamma_{S}^{d_{\mathcal{G}}(u,q)},

where the last inequality follows from applying Lemma D.1 to $D_{i_{k},i_{k}}$.

Under the condition b<b<\infty, we can use Lemma C.3 to obtain the following bound,

\left\lVert\Big(\prod_{k=1}^{\ell}A_{j_{k},i_{k}}D_{i_{k},i_{k}}^{-1}\Big)_{u,q}\right\rVert\leq\Big(b\frac{2\ell_{T}}{\mu}\Big)^{\ell}(\gamma_{S}^{\prime})^{d_{\mathcal{G}}(u,q)}.

Since the number of such terms is bounded by ((|ij|)/2){\ell\choose(\ell-|i-j|)/2}, we have

\left\lVert((J^{\ell})_{i,j})_{u,q}\right\rVert\leq{\ell\choose(\ell-|i-j|)/2}\Big(b\frac{2\ell_{T}}{\mu}\Big)^{\ell}(\gamma_{S}^{\prime})^{d_{\mathcal{G}}(u,q)}.

Lemma D.4.

Given 1τ,τk11\leq\tau^{\prime},\tau\leq k-1, y|Nvr|×ny\in\mathbb{R}^{|\partial N_{v}^{r}|\times n} and v0Nvr1v_{0}\in N_{v}^{r-1}, we have

((M)τ,τ1Kτ(τ)y)v0C1ρT|ττ|uNvrρSd𝒢(v0,u)1yu,\mathopen{}\mathclose{{}\left\lVert\Bigg{(}(M)^{-1}_{\tau^{\prime},\tau}K_{\tau}^{(\tau)}y\Bigg{)}_{v_{0}}}\right\rVert\leq C_{1}\rho_{T}^{|\tau^{\prime}-\tau|}\sum_{u\in\partial N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)-1}\mathopen{}\mathclose{{}\left\lVert y_{u}}\right\rVert,

and for i{1,k1}i\in\{1,k-1\}, e|Nvr|×ne\in\mathbb{R}^{|N_{v}^{r}|\times n},

\left\lVert\Big((M^{-1})_{\tau^{\prime},i}R^{(i)}_{i}e\Big)_{v_{0}}\right\rVert\leq C_{2}\rho_{T}^{|\tau^{\prime}-i|+1}\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\left\lVert e_{u}\right\rVert,

where ρT=4a~Tμ\rho_{T}=\frac{4\tilde{a}\ell_{T}}{\mu} and ρS=(1+b1+b2)γS.\rho_{S}=(1+b_{1}+b_{2})\gamma_{S}. We let C1=C2=max{a22a~(14a~T/μ),2a2ΔS/μγS(1+b1+b2)(14a~T/μ)}C_{1}=C_{2}=\max\{\frac{a^{2}}{2\tilde{a}(1-4\tilde{a}\ell_{T}/\mu)},\frac{2a^{2}\Delta\ell_{S}/\mu}{\gamma_{S}(1+b_{1}+b_{2})(1-4\tilde{a}\ell_{T}/\mu)}\}.

Proof.

Given 1τ,τk11\leq\tau,\tau^{\prime}\leq k-1 and v0Nvr1v_{0}\in N_{v}^{r-1}, since M1=D10(J)M^{-1}=D^{-1}\sum_{\ell\geq 0}(-J)^{\ell}, we have

((M)τ,τ1Kτ(τ)y)v0\displaystyle\mathopen{}\mathclose{{}\left\lVert\Bigg{(}(M)^{-1}_{\tau^{\prime},\tau}K_{\tau}^{(\tau)}y\Bigg{)}_{v_{0}}}\right\rVert =(Dτ,τ10(J)τ,τKτ(τ)y)v0.\displaystyle=\mathopen{}\mathclose{{}\left\lVert\Bigg{(}D^{-1}_{\tau^{\prime},\tau^{\prime}}\sum_{\ell\geq 0}(-J)^{\ell}_{\tau^{\prime},\tau}K_{\tau}^{(\tau)}y\Bigg{)}_{v_{0}}}\right\rVert. (20)

With a slight abuse of notation, from now on in this proof we use $K$ to denote $K_{\tau}^{(\tau)}$ and $Q^{-1}$ to denote $D^{-1}_{\tau^{\prime},\tau^{\prime}}$. We can rewrite the right-hand side of (20) using the new notation as follows:

(Q10(J)τ,τKy)v00(Q1(J)τ,τKy)v0=0qNvr1(Q1(J)τ,τ)v0,q(Ky)q0qNvr1(Q1(J)τ,τ)v0,q(Ky)q.\displaystyle\begin{split}\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}\sum_{\ell\geq 0}(-J)^{\ell}_{\tau^{\prime},\tau}Ky\Bigg{)}_{v_{0}}}\right\rVert&\leq\sum_{\ell\geq 0}\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}(-J)^{\ell}_{\tau^{\prime},\tau}Ky\Bigg{)}_{v_{0}}}\right\rVert\\ &=\sum_{\ell\geq 0}\mathopen{}\mathclose{{}\left\lVert\sum_{q\in N_{v}^{r-1}}\Bigg{(}Q^{-1}(-J)^{\ell}_{\tau^{\prime},\tau}\Bigg{)}_{v_{0},q}(Ky)_{q}}\right\rVert\\ &\leq\sum_{\ell\geq 0}\sum_{q\in N_{v}^{r-1}}\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}(-J)^{\ell}_{\tau^{\prime},\tau}\Bigg{)}_{v_{0},q}}\right\rVert\mathopen{}\mathclose{{}\left\lVert(Ky)_{q}}\right\rVert.\end{split} (21)

For a given $q\in N_{v}^{r-1}$ and $y\in\mathbb{R}^{|\partial N_{v}^{r}|\times n}$,

(Ky)q=uNvrKq,uyu=uNvrNq1Kq,uyu.\mathopen{}\mathclose{{}\left\lVert(Ky)_{q}}\right\rVert=\mathopen{}\mathclose{{}\left\lVert\sum_{u\in\partial N_{v}^{r}}K_{q,u}y_{u}}\right\rVert=\mathopen{}\mathclose{{}\left\lVert\sum_{u\in\partial N_{v}^{r}\cap N_{q}^{1}}K_{q,u}y_{u}}\right\rVert.

where the last equality holds since spatial interaction costs occur only between neighboring nodes.

For a given $u\in\partial N_{v}^{r}$, since the spatial interaction cost on each edge is $\ell_{S}$-smooth,

Kq,uyuKq,uyuSyu,\mathopen{}\mathclose{{}\left\lVert K_{q,u}y_{u}}\right\rVert\leq\mathopen{}\mathclose{{}\left\lVert K_{q,u}}\right\rVert\mathopen{}\mathclose{{}\left\lVert y_{u}}\right\rVert\leq\ell_{S}\mathopen{}\mathclose{{}\left\lVert y_{u}}\right\rVert,

which gives

(Ky)quNvrNq1Syu.\mathopen{}\mathclose{{}\left\lVert(Ky)_{q}}\right\rVert\leq\sum_{u\in\partial N_{v}^{r}\cap N_{q}^{1}}\ell_{S}\mathopen{}\mathclose{{}\left\lVert y_{u}}\right\rVert.

Therefore,

(Q10(J)τ,τKy)v0S0qNvr1(Q1(J)τ,τ)v0,quNvrNq1yu.\displaystyle\begin{split}\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}\sum_{\ell\geq 0}(-J)^{\ell}_{\tau^{\prime},\tau}Ky\Bigg{)}_{v_{0}}}\right\rVert&\leq\ell_{S}\sum_{\ell\geq 0}\sum_{q\in N_{v}^{r-1}}\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}(-J)^{\ell}_{\tau^{\prime},\tau}\Bigg{)}_{v_{0},q}}\right\rVert\sum_{u\in\partial N_{v}^{r}\cap N_{q}^{1}}\mathopen{}\mathclose{{}\left\lVert y_{u}}\right\rVert.\end{split} (22)

By Lemma C.4, (J)τ,τ(-J)^{\ell}_{\tau^{\prime},\tau} satisfies the following exponential decay properties: for any u,qNvr1u,q\in N_{v}^{r-1},

((J)τ,τ)u,q((|ττ|)/2)(a~2Tμ)(γS)d𝒢(u,q),\mathopen{}\mathclose{{}\left\lVert((J^{\ell})_{\tau^{\prime},\tau})_{u,q}}\right\rVert\leq{\ell\choose(\ell-|\tau^{\prime}-\tau|)/2}(\tilde{a}\frac{2\ell_{T}}{\mu})^{\ell}(\gamma_{S}^{\prime})^{d_{\mathcal{G}}(u,q)},

where we choose δ=b1γS\delta=b_{1}\cdot\gamma_{S}, γS=(1+b1)γS\gamma_{S}^{\prime}=(1+b_{1})\gamma_{S} and a~=γ0(11+b1)γh(γ)\tilde{a}=\sum_{\gamma\geq 0}(\frac{1}{1+b_{1}})^{\gamma}h(\gamma).

Moreover, $Q^{-1}$ (which denotes $D_{\tau^{\prime},\tau^{\prime}}^{-1}$) is the inverse of a graph-induced banded matrix and satisfies, for any $u,q\in N_{v}^{r-1}$,

(Q1)u,q2μγSd𝒢(u,q)<2μ(γS)d𝒢(u,q),\mathopen{}\mathclose{{}\left\lVert(Q^{-1})_{u,q}}\right\rVert\leq\frac{2}{\mu}\gamma_{S}^{d_{\mathcal{G}}(u,q)}<\frac{2}{\mu}(\gamma_{S}^{\prime})^{d_{\mathcal{G}}(u,q)},

where the first inequality again follows from applying Lemma D.1 to $D_{\tau^{\prime},\tau^{\prime}}$.

Applying Lemma C.3 to $Q^{-1}$ and $(-J)^{\ell}_{\tau^{\prime},\tau}$, we have, for any $u,q\in N_{v}^{r-1}$ and $\ell\geq 1$,

(Q1(J)τ,τ)u,qa22μ((|ττ|)/2)(a~2Tμ)(λ)d𝒢(u,q),\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}(-J)^{\ell}_{\tau^{\prime},\tau}\Bigg{)}_{u,q}}\right\rVert\leq a^{2}\frac{2}{\mu}{\ell\choose(\ell-|\tau^{\prime}-\tau|)/2}(\tilde{a}\frac{2\ell_{T}}{\mu})^{\ell}(\lambda^{\prime})^{d_{\mathcal{G}}(u,q)},

where $\lambda^{\prime}\coloneqq\gamma_{S}^{\prime}+b_{2}\cdot\gamma_{S}<1$ and $a\coloneqq\sum_{\gamma\geq 0}(\frac{1+b_{1}}{1+b_{1}+b_{2}})^{\gamma}h(\gamma)$. Since $J^{0}\coloneqq I$, it is straightforward to verify that the above inequality also holds when $\ell=0$.

With the exponential decay properties of Q1(J)τ,τQ^{-1}(-J)^{\ell}_{\tau^{\prime},\tau}, we have

\displaystyle\begin{split}\left\lVert\Big(Q^{-1}\sum_{\ell\geq 0}(-J)^{\ell}_{\tau^{\prime},\tau}Ky\Big)_{v_{0}}\right\rVert&\leq\ell_{S}a^{2}\frac{2}{\mu}\sum_{\ell\geq 0}{\ell\choose(\ell-|\tau^{\prime}-\tau|)/2}\Big(\tilde{a}\frac{2\ell_{T}}{\mu}\Big)^{\ell}\sum_{q\in N_{v}^{r-1}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},q)}\sum_{u\in\partial N_{v}^{r}\cap N_{q}^{1}}\left\lVert y_{u}\right\rVert\\&\leq\ell_{S}a^{2}\frac{2}{\mu}\sum_{\ell\geq|\tau^{\prime}-\tau|}{\ell\choose(\ell-|\tau^{\prime}-\tau|)/2}\Big(\tilde{a}\frac{2\ell_{T}}{\mu}\Big)^{\ell}\sum_{u\in\partial N_{v}^{r}}\Delta(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},u)-1}\left\lVert y_{u}\right\rVert\\&\leq\Delta\ell_{S}a^{2}\frac{2}{\mu}\sum_{\ell\geq|\tau^{\prime}-\tau|}\Big(\frac{4\tilde{a}\ell_{T}}{\mu}\Big)^{\ell}\sum_{u\in\partial N_{v}^{r}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},u)-1}\left\lVert y_{u}\right\rVert\\&\leq\frac{2\Delta\ell_{S}a^{2}}{\mu-4\tilde{a}\ell_{T}}\Big(\frac{4\tilde{a}\ell_{T}}{\mu}\Big)^{|\tau^{\prime}-\tau|}\sum_{u\in\partial N_{v}^{r}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},u)-1}\left\lVert y_{u}\right\rVert\\&=\frac{2\Delta\ell_{S}a^{2}}{\lambda^{\prime}(\mu-4\tilde{a}\ell_{T})}\Big(\frac{4\tilde{a}\ell_{T}}{\mu}\Big)^{|\tau^{\prime}-\tau|}\sum_{u\in\partial N_{v}^{r}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},u)}\left\lVert y_{u}\right\rVert.\end{split} (23)

The third inequality uses ${\ell\choose(\ell-|\tau^{\prime}-\tau|)/2}\leq 2^{\ell}$, which can be proved using the following version of Stirling's approximation, in which $e$ denotes Euler's number: for all $n\geq 1$,

2πn(n/e)ne1/(12n+1)<n!<2πn(n/e)ne1/(12n).\sqrt{2\pi n}(n/e)^{n}e^{1/(12n+1)}<n!<\sqrt{2\pi n}(n/e)^{n}e^{1/(12n)}.
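In fact, the weaker statement needed here already follows from the binomial theorem, since each coefficient in row $\ell$ of Pascal's triangle is at most the row sum $2^{\ell}$; a brute-force check:

```python
from math import comb

# C(l, (l - m)/2) <= 2**l whenever the parity matches: each binomial coefficient
# is bounded by the row sum, sum_j C(l, j) = 2**l.
for l in range(0, 40):
    for m in range(0, l + 1):
        if (l - m) % 2 == 0:
            assert comb(l, (l - m) // 2) <= 2 ** l
```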

Similarly, consider $\left\lVert((M^{-1})_{\tau^{\prime},i}R^{(i)}_{i}e)_{v_{0}}\right\rVert$ for $i\in\{1,k-1\}$. With a slight abuse of notation, in this proof we use $R$ to denote $R_{i}^{(i)}$ and $Q^{-1}$ to denote $D^{-1}_{\tau^{\prime},\tau^{\prime}}$. Following the same steps as before, we have

\displaystyle\begin{split}\left\lVert\Big((M^{-1})_{\tau^{\prime},i}R^{(i)}_{i}e\Big)_{v_{0}}\right\rVert&\leq\sum_{\ell\geq 0}\sum_{q\in N_{v}^{r}}\left\lVert\Big(Q^{-1}(-J)^{\ell}_{\tau^{\prime},i}\Big)_{v_{0},q}\right\rVert\left\lVert(Re)_{q}\right\rVert.\end{split} (24)

Since temporal interactions occur at the same node between consecutive time steps, $R$ is a diagonal block matrix. Hence,

(Re)q=Rq,qeqTeq.\mathopen{}\mathclose{{}\left\lVert(Re)_{q}}\right\rVert=\mathopen{}\mathclose{{}\left\lVert R_{q,q}e_{q}}\right\rVert\leq\ell_{T}\mathopen{}\mathclose{{}\left\lVert e_{q}}\right\rVert.

Moreover, using the exponential decay properties of Q1(J)τ,iQ^{-1}(-J)^{\ell}_{\tau^{\prime},i}, we have for u,qNvr1u,q\in N_{v}^{r-1},

(Q1(J)τ,i)u,qa22μ((|τi|)/2)(a~2Tμ)(λ)d𝒢(u,q).\mathopen{}\mathclose{{}\left\lVert\Bigg{(}Q^{-1}(-J)^{\ell}_{\tau^{\prime},i}\Bigg{)}_{u,q}}\right\rVert\leq a^{2}\frac{2}{\mu}{\ell\choose(\ell-|\tau^{\prime}-i|)/2}(\tilde{a}\frac{2\ell_{T}}{\mu})^{\ell}(\lambda^{\prime})^{d_{\mathcal{G}}(u,q)}.

Therefore,

\displaystyle\begin{split}\left\lVert\Big((M^{-1})_{\tau^{\prime},i}R^{(i)}_{i}e\Big)_{v_{0}}\right\rVert&\leq\sum_{\ell\geq 0}\sum_{q\in N_{v}^{r}}a^{2}\frac{2}{\mu}{\ell\choose(\ell-|\tau^{\prime}-i|)/2}\Big(\tilde{a}\frac{2\ell_{T}}{\mu}\Big)^{\ell}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},q)}\ell_{T}\left\lVert e_{q}\right\rVert\\&\leq\sum_{\ell\geq|\tau^{\prime}-i|}\sum_{q\in N_{v}^{r}}a^{2}\frac{2}{\mu}{\ell\choose(\ell-|\tau^{\prime}-i|)/2}\Big(\tilde{a}\frac{2\ell_{T}}{\mu}\Big)^{\ell}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},q)}\ell_{T}\left\lVert e_{q}\right\rVert\\&\leq\frac{2\ell_{T}a^{2}}{\mu}\sum_{\ell\geq|\tau^{\prime}-i|}\Big(\frac{4\tilde{a}\ell_{T}}{\mu}\Big)^{\ell}\sum_{q\in N_{v}^{r}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},q)}\left\lVert e_{q}\right\rVert\\&\leq\frac{2\ell_{T}a^{2}}{\mu-4\tilde{a}\ell_{T}}\Big(\frac{4\tilde{a}\ell_{T}}{\mu}\Big)^{|\tau^{\prime}-i|}\sum_{q\in N_{v}^{r}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},q)}\left\lVert e_{q}\right\rVert\\&=\frac{a^{2}\mu}{2\tilde{a}(\mu-4\tilde{a}\ell_{T})}\Big(\frac{4\tilde{a}\ell_{T}}{\mu}\Big)^{|\tau^{\prime}-i|+1}\sum_{q\in N_{v}^{r}}(\lambda^{\prime})^{d_{\mathcal{G}}(v_{0},q)}\left\lVert e_{q}\right\rVert.\end{split} (25)

Given time index 1τk11\leq\tau^{\prime}\leq k-1, node v0Nvr1v_{0}\in N_{v}^{r-1}, and perturbation vector e=(e0,e1,,ek)e=(e_{0},e_{1},\cdots,e_{k}),

(ddθψ(ζ+θe))τ,v0\displaystyle\mathopen{}\mathclose{{}\left\lVert(\frac{d}{d\theta}\psi(\zeta+\theta e))_{\tau^{\prime},v_{0}}}\right\rVert (Mτ,11R1(1)e0)v0+(Mτ,k11Rk1(k1)ek)v0+τ=1k1(Mτ,τ1Kτ(τ)eτ)v0\displaystyle\leq\mathopen{}\mathclose{{}\left\lVert\Bigg{(}M^{-1}_{\tau^{\prime},1}R^{(1)}_{1}e_{0}\Bigg{)}_{v_{0}}}\right\rVert+\mathopen{}\mathclose{{}\left\lVert\Bigg{(}M^{-1}_{\tau^{\prime},k-1}R^{(k-1)}_{k-1}e_{k}\Bigg{)}_{v_{0}}}\right\rVert+\sum_{\tau=1}^{k-1}\mathopen{}\mathclose{{}\left\lVert\Bigg{(}M^{-1}_{\tau^{\prime},\tau}K^{(\tau)}_{\tau}e_{\tau}\Bigg{)}_{v_{0}}}\right\rVert
a2μ2a~(μ4a~T)[ρTτqNvrρSd𝒢(v0,q)(e0)q+ρTkτqNvrρSd𝒢(v0,q)(ek)q]\displaystyle\leq\frac{a^{2}\mu}{2\tilde{a}(\mu-4\tilde{a}\ell_{T})}\Bigg{[}\rho_{T}^{\tau^{\prime}}\sum_{q\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(v_{0},q)}\mathopen{}\mathclose{{}\left\lVert(e_{0})_{q}}\right\rVert+\rho_{T}^{k-\tau^{\prime}}\sum_{q\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(v_{0},q)}\mathopen{}\mathclose{{}\left\lVert(e_{k})_{q}}\right\rVert\Bigg{]}
+τ=1k12ΔSa2λ(μ4a~T)ρT|ττ|uNvr(ρS)d𝒢(v0,u)(eτ)u\displaystyle+\sum_{\tau=1}^{k-1}\frac{2\Delta\ell_{S}a^{2}}{\lambda^{\prime}(\mu-4\tilde{a}\ell_{T})}\rho_{T}^{|\tau^{\prime}-\tau|}\sum_{u\in\partial N_{v}^{r}}(\rho_{S})^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert(e_{\tau})_{u}}\right\rVert

where $\rho_{T}=\frac{4\tilde{a}\ell_{T}}{\mu}$ and $\rho_{S}=\lambda^{\prime}=(1+b_{1}+b_{2})\gamma_{S}$. We let $C=\max\{\frac{a^{2}}{2\tilde{a}(1-4\tilde{a}\ell_{T}/\mu)},\frac{2a^{2}\Delta\ell_{S}/\mu}{\gamma_{S}(1+b_{1}+b_{2})(1-4\tilde{a}\ell_{T}/\mu)}\}$. Under the condition $\mu\geq\max\{8\tilde{a}\ell_{T},\Delta\ell_{S}(b_{1}+b_{2})/4\}$, we have $\rho_{T}<1$ and $\rho_{S}<1$.

Then,

(ddθψ(ζ+θe))τ,v0\displaystyle\mathopen{}\mathclose{{}\left\lVert(\frac{d}{d\theta}\psi(\zeta+\theta e))_{\tau^{\prime},v_{0}}}\right\rVert
\displaystyle\leq{} C[ρTτqNvrρSd𝒢(v0,q)(e0)q+ρTkτqNvrρSd𝒢(v0,q)(ek)q+τ=1k1ρT|ττ|uNvr(ρS)d𝒢(v0,u)(eτ)u].\displaystyle C\Bigg{[}\rho_{T}^{\tau^{\prime}}\sum_{q\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(v_{0},q)}\mathopen{}\mathclose{{}\left\lVert(e_{0})_{q}}\right\rVert+\rho_{T}^{k-\tau^{\prime}}\sum_{q\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(v_{0},q)}\mathopen{}\mathclose{{}\left\lVert(e_{k})_{q}}\right\rVert+\sum_{\tau=1}^{k-1}\rho_{T}^{|\tau^{\prime}-\tau|}\sum_{u\in\partial N_{v}^{r}}(\rho_{S})^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert(e_{\tau})_{u}}\right\rVert\Bigg{]}.

Finally, let ζ={yt1u,zτu|(τ,u)N(v,t)(k,r)}\zeta=\{y_{t-1}^{u},z_{\tau}^{u}|(\tau,u)\in\partial N_{(v,t)}^{(k,r)}\} and e={(yt1u)yt1u,(zτu)zτu}e=\{(y_{t-1}^{u})^{\prime}-y_{t-1}^{u},(z_{\tau}^{u})^{\prime}-z_{\tau}^{u}\}. By integration,

\left\lVert\psi_{(t,v)}^{(k,r)}\left(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\}\right)_{(t_{0},v_{0})}-\psi_{(t,v)}^{(k,r)}\left(\{(y_{t-1}^{u})^{\prime}\},\{(z_{\tau}^{u})^{\prime}\}\right)_{(t_{0},v_{0})}\right\rVert\leq\int_{0}^{1}\left\lVert\Big(\frac{d}{d\theta}\psi(\zeta+\theta e)\Big)_{t_{0},v_{0}}\right\rVert d\theta,

which is bounded by

CuNvrρTt0(t1)ρSd𝒢(v0,u)yt1u(yt1u)+C(u,τ)N(t,v)(k,r)ρT|t0τ|ρSd𝒢(v0,u)zτu(zτu).C\sum_{u\in N_{v}^{r}}\rho_{T}^{t_{0}-(t-1)}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert y_{t-1}^{u}-(y_{t-1}^{u})^{\prime}}\right\rVert+C\sum_{(u,\tau)\in\partial N_{(t,v)}^{(k,r)}}\rho_{T}^{|t_{0}-\tau|}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert.

D.3 Adding Constraints to Perturbation Bounds

Recall that in Appendix D.1 and D.2, we showed Theorem 3.2 and Theorem 3.3 under the assumption that the individual decisions are unconstrained, to simplify the analysis. In this section, we present a general way to relax this assumption by incorporating logarithmic barrier functions, which also applies to Theorem C.5.

Recall that in Assumption 2.1, we assume that DtvD_{t}^{v} is convex with a non-empty interior, and can be expressed as

Dtv:={xtvn(gtv)i(xtv)0,1imtv},D_{t}^{v}:=\{x_{t}^{v}\in\mathbb{R}^{n}\mid(g_{t}^{v})_{i}(x_{t}^{v})\leq 0,\forall 1\leq i\leq m_{t}^{v}\},

where the $i$-th constraint $(g_{t}^{v})_{i}:\mathbb{R}^{n}\to\mathbb{R}$ is a convex function in $C^{2}$. For any time-vertex pair $(\tau,v)$, we can approximate the individual constraints

(gτv)i(xτv)0,1imτv,(g_{\tau}^{v})_{i}(x_{\tau}^{v})\leq 0,\forall 1\leq i\leq m_{\tau}^{v},

by adding the logarithmic barrier function $-\mu\sum_{i=1}^{m_{\tau}^{v}}\ln{(-(g_{\tau}^{v})_{i}(x_{\tau}^{v}))}$ to the original node cost function $f_{\tau}^{v}$. Here, the parameter $\mu$ is a positive real number that controls how well the barrier function approximates the indicator function

𝐈Dτv(xτv)={0 if (gτv)i(xτv)0,1imτv,+ otherwise.\displaystyle\mathbf{I}_{D_{\tau}^{v}}(x_{\tau}^{v})=\begin{cases}0&\text{ if }(g_{\tau}^{v})_{i}(x_{\tau}^{v})\leq 0,\forall 1\leq i\leq m_{\tau}^{v},\\ +\infty&\text{ otherwise.}\end{cases}

The approximation improves as parameter μ\mu becomes closer to 0. Thus, the new node cost function will be

Bτv(xτv;μ):=fτv(xτv)μi=1mτvln((gτv)i(xτv)).B_{\tau}^{v}(x_{\tau}^{v};\mu):=f_{\tau}^{v}(x_{\tau}^{v})-\mu\sum_{i=1}^{m_{\tau}^{v}}\ln{(-(g_{\tau}^{v})_{i}(x_{\tau}^{v}))}.
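A one-dimensional toy instance illustrates the barrier construction (the cost and constraint below are illustrative choices, not from the paper): minimizing $f(x)=(x-2)^{2}$ subject to $g(x)=x-1\leq 0$ has constrained optimum $x^{*}=1$, and the barrier minimizer converges to it as $\mu\to 0$:

```python
import numpy as np

# B(x; mu) = (x - 2)^2 - mu * ln(1 - x).  Setting B'(x) = 2(x - 2) + mu / (1 - x) = 0
# and solving the resulting quadratic gives the interior minimizer in closed form.
for mu in [1.0, 0.1, 0.01, 1e-4]:
    x_mu = (3.0 - np.sqrt(1.0 + 2.0 * mu)) / 2.0
    print(f"mu = {mu:8.4f}  ->  x(mu) = {x_mu:.6f}")   # approaches the constrained optimum 1.0
```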

As an extension of the original notation, we use $\psi_{(t,v)}^{(k,r)}(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\};\mu)$ to denote the optimal solution of the following optimization problem:

\operatorname*{arg\,min}_{\{x_{\tau}^{u}\mid(\tau,u)\in N_{(t,v)}^{(k-1,r-1)}\}} \sum_{\tau=t}^{t+k-1}\left(\sum_{u\in N_{v}^{r}}B_{\tau}^{u}(x_{\tau}^{u};\mu)+\sum_{u\in N_{v}^{r}}c_{\tau}^{u}(x_{\tau}^{u},x_{\tau-1}^{u})+\sum_{(u,q)\in{\mathcal{E}}(N_{v}^{r})}s_{\tau}^{(u,q)}(x_{\tau}^{u},x_{\tau}^{q})\right)
s.t. xt1u=yt1u,uNvr,\displaystyle x_{t-1}^{u}=y_{t-1}^{u},\forall u\in N_{v}^{r},
xτu=zτu,(τ,u)N(t,v)(k,r).\displaystyle x_{\tau}^{u}=z_{\tau}^{u},\forall(\tau,u)\in\partial N_{(t,v)}^{(k,r)}.

Compared with $\psi_{(t,v)}^{(k,r)}(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\})$ defined in Section 3.1, the constraints $x_{\tau}^{u}\in D_{\tau}^{u}$ are removed and the node costs $f_{\tau}^{u}(x_{\tau}^{u})$ are replaced with $B_{\tau}^{u}(x_{\tau}^{u};\mu)$.

A key observation is that the perturbation bounds shown in Appendix D.1 and D.2 do not depend on the smoothness constant $\ell_{f}$ of the node cost functions. That means the perturbation bound

ψ(t,v)(k,r)({yt1u},{zτu};μ)(t0,v0)ψ(t,v)(k,r)({(yt1u)},{(zτu)};μ)(t0,v0)\displaystyle\mathopen{}\mathclose{{}\left\lVert\psi_{(t,v)}^{(k,r)}\mathopen{}\mathclose{{}\left(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\};\mu}\right)_{(t_{0},v_{0})}-\psi_{(t,v)}^{(k,r)}\mathopen{}\mathclose{{}\left(\{(y_{t-1}^{u})^{\prime}\},\{(z_{\tau}^{u})^{\prime}\};\mu}\right)_{(t_{0},v_{0})}}\right\rVert
\displaystyle\leq{} C1(u,τ)N(t,v)(k,r)ρT|t0τ|ρSd𝒢(v0,u)zτu(zτu)+C2uNvrρTt0(t1)ρSd𝒢(v0,u)yt1u(yt1u)\displaystyle C_{1}\sum_{(u,\tau)\in\partial N_{(t,v)}^{(k,r)}}\rho_{T}^{|t_{0}-\tau|}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert z_{\tau}^{u}-(z_{\tau}^{u})^{\prime}}\right\rVert+C_{2}\sum_{u\in N_{v}^{r}}\rho_{T}^{t_{0}-(t-1)}\rho_{S}^{d_{\mathcal{G}}(v_{0},u)}\mathopen{}\mathclose{{}\left\lVert y_{t-1}^{u}-(y_{t-1}^{u})^{\prime}}\right\rVert

holds for arbitrary $\mu$, where $C_{1},C_{2},\rho_{S},\rho_{T}$ are specified in Theorem 3.2 or Theorem 3.3 and are independent of $\mu$. Theorem 3.10 in Forsgren et al. (2002) guarantees that $\psi_{(t,v)}^{(k,r)}(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\};\mu_{k})$ converges to $\psi_{(t,v)}^{(k,r)}(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\})$ for any positive sequence $\{\mu_{k}\}_{k=1}^{\infty}$ that tends to zero. Thus the above perturbation bound also holds for $\psi_{(t,v)}^{(k,r)}(\{y_{t-1}^{u}\},\{z_{\tau}^{u}\})$, which includes the constraints on individual decisions.

Note that the argument we present in this section also works for Theorem C.5.

Appendix E Competitive Bounds

This appendix includes the proofs of the competitive bounds presented in Section 3.3.

E.1 Proof of Theorem C.8

We first derive an upper bound on the distance between xtx_{t} and xtx_{t}^{*}.

Note that for any time step tt, we have

xtψ~t(xt1)tet.\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}-\tilde{\psi}_{t}\mathopen{}\mathclose{{}\left(x_{t-1}}\right)_{t}}\right\rVert\leq e_{t}. (26)

Thus we see that

xtxt=\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert={} xtψ~1(x0)t\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}-\tilde{\psi}_{1}(x_{0})_{t}}\right\rVert{}
\displaystyle\leq{} xtψ~t(xt1)t+i=1t1ψ~ti+1(xti)tψ~ti(xti1)t\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}-\tilde{\psi}_{t}(x_{t-1})_{t}}\right\rVert+\sum_{i=1}^{t-1}\mathopen{}\mathclose{{}\left\lVert\tilde{\psi}_{t-i+1}(x_{t-i})_{t}-\tilde{\psi}_{t-i}(x_{t-i-1})_{t}}\right\rVert
\displaystyle\leq{} xtψ~t(xt1)t+i=1t1CGρGixtiψ~ti(xti1)ti\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}-\tilde{\psi}_{t}(x_{t-1})_{t}}\right\rVert+\sum_{i=1}^{t-1}C_{G}\rho_{G}^{i}\mathopen{}\mathclose{{}\left\lVert x_{t-i}-\tilde{\psi}_{t-i}(x_{t-i-1})_{t-i}}\right\rVert (27a)
\displaystyle\leq{} i=0t1C0ρGixtiψ~ti(xti1)ti\displaystyle\sum_{i=0}^{t-1}C_{0}\rho_{G}^{i}\mathopen{}\mathclose{{}\left\lVert x_{t-i}-\tilde{\psi}_{t-i}(x_{t-i-1})_{t-i}}\right\rVert (27b)
\displaystyle\leq{} i=1tC0ρGtiei,\displaystyle\sum_{i=1}^{t}C_{0}\rho_{G}^{t-i}e_{i}, (27c)

where in (27a), we used Theorem C.5 and the fact that ψ~ti(xti1)t\tilde{\psi}_{t-i}(x_{t-i-1})_{t} can be written as

ψ~ti(xti1)t=ψ~ti+1(ψ~ti(xti1)ti)t.\tilde{\psi}_{t-i}(x_{t-i-1})_{t}=\tilde{\psi}_{t-i+1}\mathopen{}\mathclose{{}\left(\tilde{\psi}_{t-i}(x_{t-i-1})_{t-i}}\right)_{t}.

We also used C0=max{1,CG}C_{0}=\max\{1,C_{G}\} in (27b) and (26) in (27c).

By (27) and the Cauchy-Schwarz Inequality, we see that

xtxt2C02(i=1tρGtiei)2C02(i=1tρGti)(i=1tρGtiei2)C021ρG(i=1tρGtiei2).\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}\leq C_{0}^{2}\mathopen{}\mathclose{{}\left(\sum_{i=1}^{t}\rho_{G}^{t-i}e_{i}}\right)^{2}\leq C_{0}^{2}\mathopen{}\mathclose{{}\left(\sum_{i=1}^{t}\rho_{G}^{t-i}}\right)\cdot\mathopen{}\mathclose{{}\left(\sum_{i=1}^{t}\rho_{G}^{t-i}e_{i}^{2}}\right)\leq\frac{C_{0}^{2}}{1-\rho_{G}}\cdot\mathopen{}\mathclose{{}\left(\sum_{i=1}^{t}\rho_{G}^{t-i}e_{i}^{2}}\right).

Summing up over tt gives that

t=1Hxtxt2C02(1ρG)2t=1Het2.\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}\leq\frac{C_{0}^{2}}{(1-\rho_{G})^{2}}\cdot\sum_{t=1}^{H}e_{t}^{2}.
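The two steps above (Cauchy-Schwarz against the geometric weights, then summing over $t$) can be sanity-checked numerically; $\rho$ and the errors $e_{i}$ below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, H = 0.7, 50
e = rng.uniform(0.0, 1.0, H + 1)  # e[1..H] play the role of the per-step errors

lhs = sum(sum(rho ** (t - i) * e[i] for i in range(1, t + 1)) ** 2 for t in range(1, H + 1))
rhs = sum(e[t] ** 2 for t in range(1, H + 1)) / (1.0 - rho) ** 2

# sum_t (sum_{i<=t} rho^{t-i} e_i)^2 <= (1 - rho)^{-2} * sum_t e_t^2
assert lhs <= rhs + 1e-9
```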

E.2 Proof of Lemma C.7

In this section, we show that Lemma C.7 holds with the following specific constants:

et2:=\displaystyle e_{t}^{2}:={} xtxtt12\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t\mid t-1}^{*}}\right\rVert^{2}
\displaystyle\leq{} 4C12C02(h(r)2ρG2(1ρT)(1ρG2ρT)ρS2r+C3(r)2ρT2(k1)ρG2k)xt1xt12\displaystyle 4C_{1}^{2}C_{0}^{2}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}\rho_{G}^{2}}{(1-\rho_{T})(1-\rho_{G}^{2}\rho_{T})}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\cdot\rho_{G}^{2k}}\right)\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}
+8C12μ(h(r)21ρTρS2rτ=tt+k1ρTτtfτ(xτ)+C3(r)2ρT2(k1)ft+k1(xt+k1))\displaystyle+\frac{8C_{1}^{2}}{\mu}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}}{1-\rho_{T}}\cdot\rho_{S}^{2r}\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}f_{\tau}(x_{\tau}^{*})+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}f_{t+k-1}(x_{t+k-1}^{*})}\right) (28)

Note that, by the principle of optimality, we have

xtv\displaystyle x_{t}^{v} =ψ(t,v)(k,r)({xt1u},{θτu})(t,v),\displaystyle=\psi_{(t,v)}^{(k,r)}\mathopen{}\mathclose{{}\left(\{x_{t-1}^{u}\},\{\theta_{\tau}^{u}\}}\right)_{(t,v)},
(xtt1v)\displaystyle(x_{t\mid t-1}^{v})^{*} =ψ(t,v)(k,r)({xt1u},{(xτt1u)})(t,v).\displaystyle=\psi_{(t,v)}^{(k,r)}\mathopen{}\mathclose{{}\left(\{x_{t-1}^{u}\},\{(x_{\tau\mid t-1}^{u})^{*}\}}\right)_{(t,v)}.

Recall that we define the quantity C3(r):=γ=0rh(γ)ρSγC_{3}(r):=\sum_{\gamma=0}^{r}h(\gamma)\cdot\rho_{S}^{\gamma} to simplify the notation.

Since the exponentially decaying local perturbation bound in Definition 3.1 holds, we see that

xtv(xtt1v)\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}^{v}-(x_{t\mid t-1}^{v})^{*}}\right\rVert\leq{} C1ρSrτ=tt+k1ρTτtuNvr(xτt1u)θτu\displaystyle C_{1}\rho_{S}^{r}\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\sum_{u\in\partial N_{v}^{r}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{u})^{*}-\theta_{\tau}^{u}}\right\rVert
+C1ρTk1uNvrρSd𝒢(u,v)(xt+k1t1u)θt+k1u,\displaystyle+C_{1}\rho_{T}^{k-1}\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}\mathopen{}\mathclose{{}\left\lVert(x_{t+k-1\mid t-1}^{u})^{*}-\theta_{t+k-1}^{u}}\right\rVert, (29)

which implies that

xtv(xtt1v)2\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{t}^{v}-(x_{t\mid t-1}^{v})^{*}}\right\rVert^{2}\leq{} 2C12ρS2r(τ=tt+k1ρTτtuNvr(xτt1u)θτu)2\displaystyle 2C_{1}^{2}\rho_{S}^{2r}\mathopen{}\mathclose{{}\left(\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\sum_{u\in\partial N_{v}^{r}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{u})^{*}-\theta_{\tau}^{u}}\right\rVert}\right)^{2}
+2C12ρT2(k1)(uNvrρSd𝒢(u,v)(xt+k1t1u)θt+k1u)2\displaystyle+2C_{1}^{2}\rho_{T}^{2(k-1)}\mathopen{}\mathclose{{}\left(\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}\mathopen{}\mathclose{{}\left\lVert(x_{t+k-1\mid t-1}^{u})^{*}-\theta_{t+k-1}^{u}}\right\rVert}\right)^{2} (30a)
\displaystyle\leq{} 2C12ρS2r(τ=tt+k1ρTτtuNvr1)(τ=tt+k1ρTτtuNvr(xτt1u)θτu2)\displaystyle 2C_{1}^{2}\rho_{S}^{2r}\mathopen{}\mathclose{{}\left(\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\sum_{u\in\partial N_{v}^{r}}1}\right)\mathopen{}\mathclose{{}\left(\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\sum_{u\in\partial N_{v}^{r}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{u})^{*}-\theta_{\tau}^{u}}\right\rVert^{2}}\right)
+2C12ρT2(k1)(uNvrρSd𝒢(u,v))(uNvrρSd𝒢(u,v)(xt+k1t1u)θt+k1u2)\displaystyle+2C_{1}^{2}\rho_{T}^{2(k-1)}\mathopen{}\mathclose{{}\left(\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}}\right)\mathopen{}\mathclose{{}\left(\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}\mathopen{}\mathclose{{}\left\lVert(x_{t+k-1\mid t-1}^{u})^{*}-\theta_{t+k-1}^{u}}\right\rVert^{2}}\right) (30b)
\displaystyle\leq{} 2C12h(r)1ρTρS2r(τ=tt+k1ρTτtuNvr(xτt1u)θτu2)\displaystyle\frac{2C_{1}^{2}h(r)}{1-\rho_{T}}\cdot\rho_{S}^{2r}\mathopen{}\mathclose{{}\left(\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\sum_{u\in\partial N_{v}^{r}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{u})^{*}-\theta_{\tau}^{u}}\right\rVert^{2}}\right)
+2C12C3(r)ρT2(k1)(uNvrρSd𝒢(u,v)(xt+k1t1u)θt+k1u2),\displaystyle+2C_{1}^{2}C_{3}(r)\cdot\rho_{T}^{2(k-1)}\mathopen{}\mathclose{{}\left(\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}\mathopen{}\mathclose{{}\left\lVert(x_{t+k-1\mid t-1}^{u})^{*}-\theta_{t+k-1}^{u}}\right\rVert^{2}}\right), (30c)

where we used the AM-GM Inequality in (30a); we used the Cauchy-Schwarz Inequality in (30b); we used the definitions of functions h(r)h(r) and C3(r)C_{3}(r) in (30c).

Summing up (30) over all v𝒱v\in\mathcal{V} and reorganizing terms gives

v𝒱xtv(xtt1v)2\displaystyle\sum_{v\in\mathcal{V}}\mathopen{}\mathclose{{}\left\lVert x_{t}^{v}-(x_{t\mid t-1}^{v})^{*}}\right\rVert^{2}
\displaystyle\leq{} 2C12h(r)1ρTρS2rv𝒱(τ=tt+k1ρTτtuNvr(xτt1u)θτu2)\displaystyle\frac{2C_{1}^{2}h(r)}{1-\rho_{T}}\cdot\rho_{S}^{2r}\sum_{v\in\mathcal{V}}\mathopen{}\mathclose{{}\left(\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\sum_{u\in\partial N_{v}^{r}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{u})^{*}-\theta_{\tau}^{u}}\right\rVert^{2}}\right)
+2C12C3(r)ρT2(k1)v𝒱(uNvrρSd𝒢(u,v)(xt+k1t1u)θt+k1u2)\displaystyle+2C_{1}^{2}C_{3}(r)\cdot\rho_{T}^{2(k-1)}\sum_{v\in\mathcal{V}}\mathopen{}\mathclose{{}\left(\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}\mathopen{}\mathclose{{}\left\lVert(x_{t+k-1\mid t-1}^{u})^{*}-\theta_{t+k-1}^{u}}\right\rVert^{2}}\right)
\displaystyle\leq{} 2C12h(r)21ρTρS2rτ=tt+k1ρTτtxτt1θτ2+2C12C3(r)2ρT2(k1)xt+k1t1θt+k12,\displaystyle\frac{2C_{1}^{2}h(r)^{2}}{1-\rho_{T}}\cdot\rho_{S}^{2r}\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\mathopen{}\mathclose{{}\left\lVert x_{\tau\mid t-1}^{*}-\theta_{\tau}}\right\rVert^{2}+2C_{1}^{2}C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\mathopen{}\mathclose{{}\left\lVert x_{t+k-1\mid t-1}^{*}-\theta_{t+k-1}}\right\rVert^{2}, (31)

where we used the facts that

v𝒱uNvr(xτt1u)θτu2h(r)v𝒱(xτt1v)θτv2=h(r)xτt1θτ2,\displaystyle\sum_{v\in\mathcal{V}}\sum_{u\in\partial N_{v}^{r}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{u})^{*}-\theta_{\tau}^{u}}\right\rVert^{2}\leq{}h(r)\sum_{v\in\mathcal{V}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau\mid t-1}^{v})^{*}-\theta_{\tau}^{v}}\right\rVert^{2}=h(r)\cdot\mathopen{}\mathclose{{}\left\lVert x_{\tau\mid t-1}^{*}-\theta_{\tau}}\right\rVert^{2},

and

\displaystyle\sum_{v\in\mathcal{V}}\sum_{u\in N_{v}^{r}}\rho_{S}^{d_{\mathcal{G}}(u,v)}\left\lVert(x_{t+k-1\mid t-1}^{u})^{*}-\theta_{t+k-1}^{u}\right\rVert^{2}\leq{} C_{3}(r)\sum_{v\in\mathcal{V}}\left\lVert(x_{t+k-1\mid t-1}^{v})^{*}-\theta_{t+k-1}^{v}\right\rVert^{2}
={} C_{3}(r)\cdot\left\lVert x_{t+k-1\mid t-1}^{*}-\theta_{t+k-1}\right\rVert^{2}.

We also note that by the principle of optimality, the following equations hold for all τt\tau\geq t:

xτt1\displaystyle x_{\tau\mid t-1}^{*} =ψ~t(xt1)τ,\displaystyle=\tilde{\psi}_{t}\mathopen{}\mathclose{{}\left(x_{t-1}}\right)_{\tau},
xτ\displaystyle x_{\tau}^{*} =ψ~t(xt1)τ.\displaystyle=\tilde{\psi}_{t}\mathopen{}\mathclose{{}\left(x_{t-1}^{*}}\right)_{\tau}.

Recall that C0:=max{1,CG}C_{0}:=\max\{1,C_{G}\}. By Theorem C.5, we see that

xτt1xτC0ρGτt+1xt1xt1,\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{\tau\mid t-1}^{*}-x_{\tau}^{*}}\right\rVert\leq C_{0}\rho_{G}^{\tau-t+1}\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert, (32)

which implies

xτt1θτ2\displaystyle\mathopen{}\mathclose{{}\left\lVert x_{\tau\mid t-1}^{*}-\theta_{\tau}}\right\rVert^{2}\leq{} 2xτt1xτ2+2xτθτ2\displaystyle 2\mathopen{}\mathclose{{}\left\lVert x_{\tau\mid t-1}^{*}-x_{\tau}^{*}}\right\rVert^{2}+2\mathopen{}\mathclose{{}\left\lVert x_{\tau}^{*}-\theta_{\tau}}\right\rVert^{2} (33a)
\displaystyle\leq{} 2C02ρG2(τt+1)xt1xt12+2xτθτ2,\displaystyle 2C_{0}^{2}\rho_{G}^{2(\tau-t+1)}\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}+2\mathopen{}\mathclose{{}\left\lVert x_{\tau}^{*}-\theta_{\tau}}\right\rVert^{2}, (33b)

where we used the triangle inequality and the AM-GM inequality in (33a); we used (32) in (33b).

Substituting (33) into (31) gives

v𝒱xtv(xtt1v)2\displaystyle\sum_{v\in\mathcal{V}}\mathopen{}\mathclose{{}\left\lVert x_{t}^{v}-(x_{t\mid t-1}^{v})^{*}}\right\rVert^{2}
\displaystyle\leq{} 4C12C02(h(r)2ρG2(1ρT)(1ρG2ρT)ρS2r+C3(r)2ρT2(k1)ρG2k)xt1xt12\displaystyle 4C_{1}^{2}C_{0}^{2}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}\rho_{G}^{2}}{(1-\rho_{T})(1-\rho_{G}^{2}\rho_{T})}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\cdot\rho_{G}^{2k}}\right)\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}
+4C12(h(r)21ρTρS2rτ=tt+k1ρTτtxτθτ2+C3(r)2ρT2(k1)xt+k1θt+k12)\displaystyle+4C_{1}^{2}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}}{1-\rho_{T}}\cdot\rho_{S}^{2r}\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}\mathopen{}\mathclose{{}\left\lVert x_{\tau}^{*}-\theta_{\tau}}\right\rVert^{2}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\mathopen{}\mathclose{{}\left\lVert x_{t+k-1}^{*}-\theta_{t+k-1}}\right\rVert^{2}}\right)
\displaystyle\leq{} 4C12C02(h(r)2ρG2(1ρT)(1ρG2ρT)ρS2r+C3(r)2ρT2(k1)ρG2k)xt1xt12\displaystyle 4C_{1}^{2}C_{0}^{2}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}\rho_{G}^{2}}{(1-\rho_{T})(1-\rho_{G}^{2}\rho_{T})}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\cdot\rho_{G}^{2k}}\right)\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}
+8C12μ(h(r)21ρTρS2rτ=tt+k1ρTτtfτ(xτ)+C3(r)2ρT2(k1)ft+k1(xt+k1)),\displaystyle+\frac{8C_{1}^{2}}{\mu}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}}{1-\rho_{T}}\cdot\rho_{S}^{2r}\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}f_{\tau}(x_{\tau}^{*})+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}f_{t+k-1}(x_{t+k-1}^{*})}\right), (34)

where we used the fact that the node cost function fτvf_{\tau}^{v} is non-negative and μ\mu-strongly convex for all τ,v\tau,v, thus

fτ(xτ)v𝒱fτv((xτv))μ2v𝒱(xτv)θτv2=μ2xτθτ2.f_{\tau}(x_{\tau}^{*})\geq\sum_{v\in\mathcal{V}}f_{\tau}^{v}((x_{\tau}^{v})^{*})\geq\frac{\mu}{2}\sum_{v\in\mathcal{V}}\mathopen{}\mathclose{{}\left\lVert(x_{\tau}^{v})^{*}-\theta_{\tau}^{v}}\right\rVert^{2}=\frac{\mu}{2}\mathopen{}\mathclose{{}\left\lVert x_{\tau}^{*}-\theta_{\tau}}\right\rVert^{2}.

Note that $\sum_{v\in\mathcal{V}}\left\lVert x_{t}^{v}-(x_{t\mid t-1}^{v})^{*}\right\rVert^{2}=\left\lVert x_{t}-x_{t\mid t-1}^{*}\right\rVert^{2}=e_{t}^{2}$. Thus we have finished the proof of (28).

E.3 Proof of Theorem 3.4

In this section, we show that Theorem 3.4 holds with the following specific competitive ratio bound:

1+(1+32C02C12(f+ΔS+2T)h(r)2μ(1ρG)2(1ρT)2)ρSr+(1+32C02C12(f+ΔS+2T)C3(r)2μ(1ρG)2)ρTk1.\displaystyle 1+\mathopen{}\mathclose{{}\left(1+\frac{32C_{0}^{2}C_{1}^{2}(\ell_{f}+\Delta\ell_{S}+2\ell_{T})\cdot h(r)^{2}}{\mu(1-\rho_{G})^{2}(1-\rho_{T})^{2}}}\right)\cdot\rho_{S}^{r}+\mathopen{}\mathclose{{}\left(1+\frac{32C_{0}^{2}C_{1}^{2}(\ell_{f}+\Delta\ell_{S}+2\ell_{T})C_{3}(r)^{2}}{\mu(1-\rho_{G})^{2}}}\right)\rho_{T}^{k-1}. (35)

under the assumption that

4C12C04(1ρG)2(h(r)2ρG2(1ρT)(1ρG2ρT)ρS2r+C3(r)2ρT2(k1)ρG2k)12.\displaystyle\frac{4C_{1}^{2}C_{0}^{4}}{(1-\rho_{G})^{2}}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}\rho_{G}^{2}}{(1-\rho_{T})(1-\rho_{G}^{2}\rho_{T})}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\cdot\rho_{G}^{2k}}\right)\leq\frac{1}{2}. (36)

Recall that $C_{0}$ is defined in Theorem C.8. Note that Theorem 3.2 and Theorem C.5 hold under Assumption 2.1. One can check that $C_{0},C_{1},(1-\rho_{G})^{-1}$, and $(1-\rho_{T})^{-1}$ are bounded by polynomials in $\ell_{f}/\mu$, $\ell_{T}/\mu$, and $(\Delta\ell_{S})/\mu$.

In the proof, we need to use Lemma F.2 in Lin et al. (2021) to bound LPC’s total cost by a weighted sum of the offline optimal cost and the sum of squared distances between their trajectories. For completeness, we present Lemma F.2 in Lin et al. (2021) below:

Lemma E.1.

For a fixed dimension m+m\in\mathbb{Z}_{+}, assume a function h:m0h:\mathbb{R}^{m}\to\mathbb{R}_{\geq 0} is convex, \ell-smooth and continuously differentiable. For all x,ymx,y\in\mathbb{R}^{m}, for all η>0\eta>0, we have

h(x)(1+η)h(y)+2(1+1η)xy2.h(x)\leq(1+\eta)h(y)+\frac{\ell}{2}\mathopen{}\mathclose{{}\left(1+\frac{1}{\eta}}\right)\mathopen{}\mathclose{{}\left\lVert x-y}\right\rVert^{2}.
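Lemma E.1 is straightforward to verify numerically on a non-negative smooth convex quadratic (the instance below is an arbitrary example, not from Lin et al. (2021)):

```python
import numpy as np

rng = np.random.default_rng(5)
m, ell = 4, 2.0
A = rng.standard_normal((m, m))
Hess = A @ A.T
Hess *= ell / np.linalg.eigvalsh(Hess).max()     # rescale so h is exactly ell-smooth
c = rng.standard_normal(m)

def h(x):                                        # convex, ell-smooth, non-negative
    return 0.5 * (x - c) @ Hess @ (x - c)

for _ in range(1000):
    x, y = rng.standard_normal(m), rng.standard_normal(m)
    for eta in [0.1, 1.0, 10.0]:
        bound = (1 + eta) * h(y) + 0.5 * ell * (1 + 1 / eta) * np.dot(x - y, x - y)
        assert h(x) <= bound + 1e-9
```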

We now return to the proof of Theorem 3.4. We first bound the sum of squared distances between LPC's trajectory and the offline optimal trajectory:

t=1Hxtxt2\displaystyle\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}\leq{} C02(1ρG)2t=1Het2\displaystyle\frac{C_{0}^{2}}{(1-\rho_{G})^{2}}\sum_{t=1}^{H}e_{t}^{2} (37a)
\displaystyle\leq{} 4C12C04(1ρG)2(h(r)2ρG2(1ρT)(1ρG2ρT)ρS2r+C3(r)2ρT2(k1)ρG2k)t=1Hxt1xt12\displaystyle\frac{4C_{1}^{2}C_{0}^{4}}{(1-\rho_{G})^{2}}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}\rho_{G}^{2}}{(1-\rho_{T})(1-\rho_{G}^{2}\rho_{T})}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\cdot\rho_{G}^{2k}}\right)\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}
+8C02C12μ(1ρG)2t=1H(h(r)21ρTρS2rτ=tt+k1ρTτtfτ(xτ)+C3(r)2ρT2(k1)ft+k1(xt+k1)),\displaystyle+\frac{8C_{0}^{2}C_{1}^{2}}{\mu(1-\rho_{G})^{2}}\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}}{1-\rho_{T}}\cdot\rho_{S}^{2r}\sum_{\tau=t}^{t+k-1}\rho_{T}^{\tau-t}f_{\tau}(x_{\tau}^{*})+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}f_{t+k-1}(x_{t+k-1}^{*})}\right), (37b)

where we used Theorem C.8 in (37a); we used Lemma C.7 with the specific constants given in Appendix E.2 in (37b).

Recall that in (36), we assume $r$ and $k$ are sufficiently large so that the coefficient of the first term in (37) satisfies

4C12C04(1ρG)2(h(r)2ρG2(1ρT)(1ρG2ρT)ρS2r+C3(r)2ρT2(k1)ρG2k)12.\frac{4C_{1}^{2}C_{0}^{4}}{(1-\rho_{G})^{2}}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}\rho_{G}^{2}}{(1-\rho_{T})(1-\rho_{G}^{2}\rho_{T})}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}\cdot\rho_{G}^{2k}}\right)\leq\frac{1}{2}.

Substituting this bound into (37) gives that

t=1Hxtxt216C02C12μ(1ρG)2(h(r)2(1ρT)2ρS2r+C3(r)2ρT2(k1))t=1Hft(xt).\displaystyle\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}\leq\frac{16C_{0}^{2}C_{1}^{2}}{\mu(1-\rho_{G})^{2}}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}}{(1-\rho_{T})^{2}}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}}\right)\cdot\sum_{t=1}^{H}f_{t}(x_{t}^{*}). (38)

By Lemma E.1, since ftf_{t} is (f+ΔS)(\ell_{f}+\Delta\ell_{S})-smooth, convex, and non-negative on n\mathbb{R}^{n}, and ctc_{t} is T\ell_{T}-smooth, convex, and non-negative on n×n\mathbb{R}^{n}\times\mathbb{R}^{n}, we know that

ft(xt)\displaystyle f_{t}(x_{t}) (1+η)ft(xt)+f+ΔS2(1+1η)xtxt2\displaystyle\leq(1+\eta)f_{t}(x_{t}^{*})+\frac{\ell_{f}+\Delta\ell_{S}}{2}\mathopen{}\mathclose{{}\left(1+\frac{1}{\eta}}\right)\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}
ct(xt,xt1)\displaystyle c_{t}(x_{t},x_{t-1}) (1+η)ct(xt,xt1)+T2(1+1η)(xtxt2+xt1xt12)\displaystyle\leq(1+\eta)c_{t}(x_{t}^{*},x_{t-1}^{*})+\frac{\ell_{T}}{2}\mathopen{}\mathclose{{}\left(1+\frac{1}{\eta}}\right)\mathopen{}\mathclose{{}\left(\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}+\mathopen{}\mathclose{{}\left\lVert x_{t-1}-x_{t-1}^{*}}\right\rVert^{2}}\right) (39)

holds for any η>0\eta>0. Summing the above inequality over tt gives

t=1H(ft(xt)+ct(xt,xt1))\displaystyle\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left(f_{t}(x_{t})+c_{t}(x_{t},x_{t-1})}\right)
\displaystyle\leq{} (1+η)t=1H(ft(xt)+ct(xt,xt1))+(f+ΔS+2T)2(1+1η)t=1Hxtxt2\displaystyle(1+\eta)\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left(f_{t}(x_{t}^{*})+c_{t}(x_{t}^{*},x_{t-1}^{*})}\right)+\frac{(\ell_{f}+\Delta\ell_{S}+2\ell_{T})}{2}\mathopen{}\mathclose{{}\left(1+\frac{1}{\eta}}\right)\sum_{t=1}^{H}\mathopen{}\mathclose{{}\left\lVert x_{t}-x_{t}^{*}}\right\rVert^{2}
\displaystyle\leq{} (1+η)cost(OPT)+(1+1η)16C02C12(f+ΔS+2T)μ(1ρG)2(h(r)2(1ρT)2ρS2r+C3(r)2ρT2(k1))cost(OPT),\displaystyle(1+\eta)cost(OPT)+\mathopen{}\mathclose{{}\left(1+\frac{1}{\eta}}\right)\frac{16C_{0}^{2}C_{1}^{2}(\ell_{f}+\Delta\ell_{S}+2\ell_{T})}{\mu(1-\rho_{G})^{2}}\mathopen{}\mathclose{{}\left(\frac{h(r)^{2}}{(1-\rho_{T})^{2}}\cdot\rho_{S}^{2r}+C_{3}(r)^{2}\cdot\rho_{T}^{2(k-1)}}\right)\cdot cost(OPT), (40)

where we used (38) and $\sum_{t=1}^{H}f_{t}(x_{t}^{*})\leq cost(OPT)$ in the last inequality. Setting $\eta=\rho_{S}^{r}+\rho_{T}^{k-1}$ in (40) finishes the proof of (35).

As a remark, we require the local cost functions $\left(f_{t}^{v},c_{t}^{v},s_{t}^{e}\right)$ to be non-negative, convex, and smooth on the whole Euclidean spaces $\left(\mathbb{R}^{n},\mathbb{R}^{n}\times\mathbb{R}^{n},\mathbb{R}^{n}\times\mathbb{R}^{n}\right)$ in Assumption 2.1 because we want to apply Lemma E.1 in (40).

E.4 Proof of Corollary 3.5

We first show that $\Delta^{2}\rho_{S}\leq\sqrt{\rho_{S}}$ holds under Assumption 2.1 and the assumptions that $\frac{\ell_{S}}{\mu}\leq\frac{1}{\Delta^{7}}$ and $\frac{\ell_{T}}{\mu}\leq\frac{1}{16}$. To see this, note that, as we discussed in Section 3.2, by setting $b_{1}=2\Delta-1$ and $b_{2}=4\Delta^{2}-2\Delta$, Theorem 3.4 holds with

ρS=4Δ2(1+ΔS/μ1)1+ΔS/μ+1.\displaystyle\rho_{S}=\frac{4\Delta^{2}(\sqrt{1+\Delta\ell_{S}/\mu}-1)}{\sqrt{1+\Delta\ell_{S}/\mu}+1}.

Hence we see that

Δ2ρS=2Δ3(1+(ΔS/μ)11+(ΔS/μ)+1)122Δ3(1+Δ612)121,\Delta^{2}\sqrt{\rho_{S}}=2\Delta^{3}\mathopen{}\mathclose{{}\left(\frac{\sqrt{1+(\Delta\ell_{S}/\mu)}-1}{\sqrt{1+(\Delta\ell_{S}/\mu)}+1}}\right)^{\frac{1}{2}}\leq 2\Delta^{3}\mathopen{}\mathclose{{}\left(\frac{\sqrt{1+\Delta^{-6}}-1}{2}}\right)^{\frac{1}{2}}\leq 1,

which implies that

Δ2ρSρS.\displaystyle\Delta^{2}\rho_{S}\leq\sqrt{\rho_{S}}. (41)

Recall that function C3(r):=γ=0rh(γ)ρSγC_{3}(r):=\sum_{\gamma=0}^{r}h(\gamma)\cdot\rho_{S}^{\gamma}. Hence we see that

C_{3}(r)\leq\sum_{\gamma=0}^{r}\Delta^{\gamma}\cdot\rho_{S}^{\gamma}\leq\sum_{\gamma=0}^{r}\left(\frac{\sqrt{\rho_{S}}}{\Delta}\right)^{\gamma}\leq\frac{\Delta}{\Delta-\sqrt{\rho_{S}}}. (42)

Substituting (41) and (42) into the competitive ratio bound in (35) shows that the competitive ratio of LPC is upper bounded by

1+\left(1+\frac{32C_{0}^{2}C_{1}^{2}(\ell_{f}+\Delta\ell_{S}+2\ell_{T})}{\mu(1-\rho_{G})^{2}(1-\rho_{T})^{2}}\right)\cdot\rho_{S}^{\frac{r}{2}}+\left(1+\frac{32C_{0}^{2}C_{1}^{2}(\ell_{f}+\Delta\ell_{S}+2\ell_{T})\Delta^{2}}{\mu(1-\rho_{G})^{2}(\Delta-\sqrt{\rho_{S}})^{2}}\right)\rho_{T}^{k-1}.
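To get a feel for this bound, the sketch below (ours; the constants $K_{1}$, $K_{2}$, $\rho_{S}$, $\rho_{T}$ are placeholders, not values derived from a concrete instance) evaluates a bound of the same shape, $1+K_{1}\rho_{S}^{r/2}+K_{2}\rho_{T}^{k-1}$, as the prediction horizon $k$ and the communication radius $r$ grow:

```python
# Illustrative only: the competitive ratio bound decays exponentially in the
# prediction horizon k and the communication radius r. All constants below
# are hypothetical placeholders.
K1, K2 = 10.0, 10.0
rho_S, rho_T = 0.5, 0.6

def cr_bound(k: int, r: int) -> float:
    return 1 + K1 * rho_S ** (r / 2) + K2 * rho_T ** (k - 1)

for k in (2, 4, 8, 16):
    print(f"k={k:2d}  " + "  ".join(f"r={r}: {cr_bound(k, r):.4f}" for r in (1, 2, 4, 8)))
```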

Appendix F Proof of Theorem 3.6

In this appendix, we prove a lower bound on the competitive ratio of any online algorithm. Our proof first establishes temporal and spatial lower bounds separately, and then combines them.

Step 1: Temporal Lower Bounds

We first show that the competitive ratio of any online algorithm with $k$ steps of future predictions is lower bounded by $1+\Omega(\lambda_{T}^{k})$. To show this, we consider the special case where there are no spatial interaction costs (i.e., $s_{t}^{e}\equiv 0$ for all $t$ and $e$). In this case, since all agents are independent of each other, it suffices to assume there is only one agent in the network $\mathcal{G}$, so we drop the agent index in the following analysis. To further simplify the problem, we assume the dimension is $n=1$, $c_{t}(x_{t},x_{t-1})=\frac{\ell_{T}}{2}(x_{t}-x_{t-1})^{2}$, and the feasible set is $D_{t}\equiv D=[0,1]$ for all $t$. Let $R$ denote the diameter of $D$, i.e., $R=\sup_{x,y\in D}\lvert x-y\rvert=1$.

By Theorem 2 in Li et al. (2020) and Case 1 in its proof, we know that for any online algorithm $ALG$ with $k$ steps of future predictions and any $L_{H}\in(2R,RH)$, there exists a problem instance with quadratic functions $f_{1},f_{2},\ldots,f_{H}$ of the form $f_{t}(x_{t})=\frac{\mu}{2}(x_{t}-\theta_{t})^{2}$, $\theta_{t}\in D$, such that

cost(ALG)-cost(OPT)\geq\frac{\mu^{3}(1-\sqrt{\lambda_{T}})^{2}}{96(\mu+1)^{2}}\cdot\lambda_{T}^{k}\cdot R\cdot L_{H}, (43)

where $L_{H}\geq\sum_{t=1}^{H}\lvert\theta_{t}-\theta_{t-1}\rvert$. Note that

R\cdot L_{H}\geq\sum_{t=1}^{H}\lvert\theta_{t}-\theta_{t-1}\rvert^{2}
=\frac{2}{\ell_{T}}\cdot\sum_{t=1}^{H}\left(f_{t}(\theta_{t})+c_{t}(\theta_{t},\theta_{t-1})\right)
\geq\frac{2}{\ell_{T}}\cdot cost(OPT),

where the first inequality uses $\lvert\theta_{t}-\theta_{t-1}\rvert\leq R$, the equality uses $f_{t}(\theta_{t})=0$ and $c_{t}(\theta_{t},\theta_{t-1})=\frac{\ell_{T}}{2}(\theta_{t}-\theta_{t-1})^{2}$, and the last inequality holds because playing $x_{t}=\theta_{t}$ is a feasible trajectory.

Substituting this into (43) gives

cost(ALG)\geq\left(1+\frac{\mu^{3}(1-\sqrt{\lambda_{T}})^{2}}{48(\mu+1)^{2}\ell_{T}}\cdot\lambda_{T}^{k}\right)\cdot cost(OPT). (44)

Note that (43) implies $cost(ALG)>0$; hence the competitive ratio can be unbounded if $cost(OPT)=0$.

Step 2: Spatial Lower Bounds

We next show that the competitive ratio of any online algorithm that can communicate within an $r$-hop neighborhood, according to the scheme defined in Section 2.1, is lower bounded by $1+\Omega(\lambda_{S}^{r})$. To show this, we construct a special Networked OCO instance with random cost functions and use the probabilistic method to show that some realization achieves the lower bound.

Theorem F.1.

Under the assumption that $\Delta\geq 3$, the competitive ratio of any decentralized online algorithm $ALG$ with communication radius $r$ is lower bounded by $1+\Omega(\lambda_{S}^{r})$, where the $\Omega(\cdot)$ notation hides factors that depend polynomially on $1/\mu,\ell_{T},\ell_{S}$, and $\Delta$, and

\lambda_{S}=\begin{cases}\frac{\Delta\ell_{S}/\mu}{3+3(\Delta\ell_{S}/\mu)}&\text{ if }\Delta\ell_{S}/\mu<48,\\ \max\left(\frac{\Delta\ell_{S}/\mu}{3+3(\Delta\ell_{S}/\mu)},\left(1-4\sqrt{3}\cdot(\Delta\ell_{S}/\mu)^{-\frac{1}{2}}\right)^{2}\right)&\text{ otherwise.}\end{cases} (45)
Proof of Theorem F.1.

In the proof, we assume the online game lasts only one time step, i.e., $H=1$. Note that when $H>1$, the same counterexample can be constructed repeatedly by letting the temporal interaction costs $c_{t}^{v}\equiv 0$ for every agent $v$ and time step $t$. To simplify the notation, we define $\ell:=\ell_{S}/\mu$ and $d:=[\Delta/2]$. Without loss of generality, we assume $\mathcal{V}=\{1,2,\cdots,n\}$ so that each agent has a positive integer index.

We consider the case where the node cost function of each agent $i$ is $(x_{i}+w_{i})^{2}$ and the spatial interaction cost between two neighboring agents $i$ and $j$ is $\ell(x_{i}-x_{j})^{2}$. Here, $x_{i}\in\mathbb{R}$ is the scalar action of agent $i$, and the parameter $w_{i}\in\mathbb{R}$ is local information that corresponds to agent $i$. The parameters $\{w_{i}\}_{i=1}^{n}$ are sampled i.i.d. from some distribution $\mathcal{D}$, which we will discuss later.

For a general graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ of agents, let $L$ denote its graph Laplacian matrix. Recall that the graph Laplacian $L\in\mathbb{R}^{\mathcal{V}\times\mathcal{V}}$ is a symmetric $n\times n$ matrix defined as

L_{i,j}=\begin{cases}deg(i)&\text{ if }i=j,\\ -1&\text{ if }i\neq j\text{ and }(i,j)\in\mathcal{E},\\ 0&\text{ otherwise,}\end{cases}

for agents $i,j\in\mathcal{V}$. Here $deg(\cdot)$ denotes the degree of an agent in graph $\mathcal{G}$. We know that $L$ is symmetric positive semi-definite and has bandwidth $1$ w.r.t. $\mathcal{G}$. The centralized optimization problem can be expressed as

cost(OPT)=\min_{x\in\mathbb{R}^{n}}(x+w)^{\top}(x+w)+\ell\cdot x^{\top}Lx
=\min_{x\in\mathbb{R}^{n}}\left\lVert(I+\ell\cdot L)^{\frac{1}{2}}x+(I+\ell\cdot L)^{-\frac{1}{2}}w\right\rVert^{2}+w^{\top}(I-(I+\ell\cdot L)^{-1})w
=w^{\top}(I-(I+\ell\cdot L)^{-1})w,

where the minimum is attained at $x^{*}=-(I+\ell\cdot L)^{-1}w$. Since only squared deviations from $x^{*}$ matter in the argument below, we suppress the sign and write $x^{*}(w)=(I+\ell\cdot L)^{-1}w$.
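The closed form above is easy to verify numerically. The sketch below (ours; the small graph and the parameter values are arbitrary) checks the first-order optimality condition at $x^{*}$ and the optimal value $w^{\top}(I-(I+\ell\cdot L)^{-1})w$:

```python
import numpy as np

# Sanity check (ours): min_x (x+w)^T (x+w) + ell * x^T L x is attained at
# x* = -(I + ell L)^{-1} w with value w^T (I - (I + ell L)^{-1}) w.
rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # an arbitrary small graph
n, ell = 4, 0.7
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1
w = rng.normal(size=n)
x_star = -np.linalg.solve(np.eye(n) + ell * L, w)
grad = 2 * (x_star + w) + 2 * ell * L @ x_star     # gradient must vanish at x*
value = (x_star + w) @ (x_star + w) + ell * x_star @ L @ x_star
closed_form = w @ (np.eye(n) - np.linalg.inv(np.eye(n) + ell * L)) @ w
assert np.allclose(grad, 0) and np.isclose(value, closed_form)
print("closed-form minimizer verified")
```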

When each agent $i$ only has communication radius $r$, it can only observe the part of $w$ that is within $N_{i}^{r}$. To simplify the notation, we define the mask operator $\phi_{S}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ w.r.t. a set $S\subseteq\mathcal{V}$ as

\phi_{S}(w)_{i}=\begin{cases}w_{i}&\text{ if }i\in S,\\ 0&\text{ otherwise,}\end{cases}

for $i\in\mathcal{V}$. The local policy of agent $i$ (denoted $\pi_{i}$) is a mapping from $w_{N_{i}^{r}}$ to the local decision $x_{i}$.

Suppose the distribution $\mathcal{D}$ of each local parameter $w_{i}$ is a mean-zero distribution with support on $\mathbb{R}$. For every agent $i\in\mathcal{V}$, we see that

\mathbb{E}_{w}\lvert x_{i}(w)-x_{i}^{*}(w)\rvert^{2}\geq\min_{\pi_{i}}\mathbb{E}_{w}\lvert\pi_{i}(w_{N_{i}^{r}})-x_{i}^{*}(w)\rvert^{2}
\geq\mathbb{E}_{w}\lvert\mathbb{E}[x_{i}^{*}(w)\mid w_{N_{i}^{r}}]-x_{i}^{*}(w)\rvert^{2} (46a)
=\mathbb{E}_{w}\lvert\mathbb{E}[\left((I+\ell\cdot L)^{-1}w\right)_{i}\mid w_{N_{i}^{r}}]-\left((I+\ell\cdot L)^{-1}w\right)_{i}\rvert^{2}
=\mathbb{E}_{w}\lvert\left((I+\ell\cdot L)^{-1}\phi_{N_{i}^{r}}(w)\right)_{i}-\left((I+\ell\cdot L)^{-1}w\right)_{i}\rvert^{2} (46b)
=\mathbb{E}_{w}\lvert\left((I+\ell\cdot L)^{-1}\phi_{N_{-i}^{r}}(w)\right)_{i}\rvert^{2}, (46c)

where (46a) uses the fact that the conditional expectation minimizes the mean-square prediction error, and (46b) uses the requirement that the distribution of $w$ is mean-zero; here $N_{-i}^{r}:=\mathcal{V}\setminus N_{i}^{r}$.

To bound the variance term in (46c), we need the following lemma to lower bound the magnitude of every entry of the exponentially decaying matrix $(I+\ell\cdot L)^{-1}$:

Lemma F.2.

There exists a finite graph $\mathcal{G}$ with maximum degree $2d$ that satisfies the following condition: for any two vertices $i,j$ such that $d_{\mathcal{G}}(i,j)\geq 3$, the following inequality holds:

\left((I+\ell\cdot L)^{-1}\right)_{ij}\geq\frac{d_{\mathcal{G}}(i,j)}{d^{2}(2d\ell+1)}\cdot\left(\frac{d\ell}{2d\ell+1}\right)^{d_{\mathcal{G}}(i,j)}.

If we make the additional assumption that $\ell>\frac{16}{d}$, we have that

\left((I+\ell\cdot L)^{-1}\right)_{ij}\geq\frac{1}{4\sqrt{\pi\cdot d_{\mathcal{G}}(i,j)\cdot\sqrt{d\ell}}\cdot d^{2}(2d\ell+1)}\cdot\left(1-4(d\ell)^{-\frac{1}{2}}\right)^{d_{\mathcal{G}}(i,j)}.

We defer the proof of Lemma F.2 to the end of this section. Note that Lemma F.2 implies that there exists a graph $\mathcal{G}$ satisfying $\left((I+\ell\cdot L)^{-1}\right)_{i,j}=\Omega\left(\lambda_{S}^{r}\right)$ for vertices $i,j$ with $d_{\mathcal{G}}(i,j)=r+1$, where the $\Omega(\cdot)$ notation hides factors that depend polynomially on $1/\mu,\ell_{T},\ell_{S}$, and $\Delta$, and $\lambda_{S}$ is as defined in (45). We assume the agents are located in this graph $\mathcal{G}$ for the rest of the proof.

Using Lemma F.2, we can derive the following lower bound on the variance term in (46c):

\mathbb{E}_{w}\lvert\left((I+\ell\cdot L)^{-1}\phi_{N_{-i}^{r}}(w)\right)_{i}\rvert^{2}=\mathbb{E}_{w}\left(\sum_{j\in N_{-i}^{r}}\left((I+\ell\cdot L)^{-1}\right)_{ij}w_{j}\right)^{2}
=\sum_{j\in N_{-i}^{r}}\left((I+\ell\cdot L)^{-1}\right)_{ij}^{2}Var(w_{j})
\geq\sum_{j\in\partial N_{i}^{r+1}}\left((I+\ell\cdot L)^{-1}\right)_{ij}^{2}Var(w_{j})
\geq\Omega\left(\lambda_{S}^{r}\cdot Var(w_{i})\right). (47)

Substituting (47) into (46) and summing over all vertices $i$, we obtain that

\mathbb{E}_{w}\left\lVert x(w)-x^{*}(w)\right\rVert^{2}\geq\sum_{i=1}^{n}\mathbb{E}_{w}\lvert x_{i}(w)-x_{i}^{*}(w)\rvert^{2}\geq\Omega\left(n\cdot\lambda_{S}^{r}\cdot Var(w_{i})\right).

We also see that

\mathbb{E}_{w}[cost(OPT)]=\mathbb{E}_{w}\left[w^{\top}(I-(I+\ell\cdot L)^{-1})w\right]=O(n\cdot Var(w_{i})). (48)

Note that the global objective function $(x+w)^{\top}(x+w)+\ell\cdot x^{\top}Lx$ is $1$-strongly convex, and $x^{*}(w)$ is the minimizer of this function. Thus, we have that, for any outcome of $w$,

cost(ALG)-cost(OPT)\geq\frac{1}{2}\left\lVert x(w)-x^{*}(w)\right\rVert^{2}.

Taking expectations on both sides w.r.t. $w$ gives that

\mathbb{E}_{w}cost(ALG)-\mathbb{E}_{w}cost(OPT)\geq\frac{1}{2}\mathbb{E}_{w}\left\lVert x(w)-x^{*}(w)\right\rVert^{2}\geq\Omega\left(n\cdot\lambda_{S}^{r}\cdot Var(w_{i})\right). (49)

Dividing (49) by (48), we obtain that

\frac{\mathbb{E}_{w}cost(ALG)}{\mathbb{E}_{w}cost(OPT)}\geq 1+\Omega\left(\lambda_{S}^{r}\right).

Note that $\mathbb{P}_{w}\left[cost(OPT)=0\right]=0$. Thus, there must exist a realization of $w$ such that $cost(OPT)>0$ and

\frac{cost(ALG)}{cost(OPT)}\geq 1+\Omega\left(\lambda_{S}^{r}\right). ∎

Before presenting the proof of Lemma F.2, we first introduce two technical lemmas that will be used in it. The first lemma (Lemma F.3) provides a lower bound on the binomial coefficient $\binom{(2+\epsilon)m}{m}$.

Lemma F.3.

For any positive integer $m$ and $\epsilon\in\mathbb{R}_{\geq 0}$ such that $\epsilon m$ is an integer, the following inequality holds:

\binom{(2+\epsilon)m}{m}>\frac{1}{\sqrt{2\pi}}m^{-\frac{1}{2}}\cdot\frac{(2+\epsilon)^{(2+\epsilon)m+\frac{1}{2}}}{(1+\epsilon)^{(1+\epsilon)m+\frac{1}{2}}}\cdot e^{-\frac{1}{6m}}.
Proof of Lemma F.3.

By Lemma 2.1 in (Stanica, 2001), we know that for any $n\in\mathbb{Z}_{+}$,

n!=\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n+r(n)},

where $r(n)$ satisfies $\frac{1}{12n+1}<r(n)<\frac{1}{12n}$. Thus we see that

\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n+\frac{1}{12n+1}}<n!<\sqrt{2\pi}n^{n+\frac{1}{2}}e^{-n+\frac{1}{12n}},\quad\forall n\in\mathbb{Z}_{+}.

Therefore, we can lower bound $\binom{(2+\epsilon)m}{m}$ by

\binom{(2+\epsilon)m}{m}=\frac{((2+\epsilon)m)!}{m!\cdot((1+\epsilon)m)!}
>\frac{\sqrt{2\pi}((2+\epsilon)m)^{(2+\epsilon)m+\frac{1}{2}}e^{-(2+\epsilon)m+\frac{1}{12(2+\epsilon)m+1}}}{\sqrt{2\pi}m^{m+\frac{1}{2}}e^{-m+\frac{1}{12m}}\cdot\sqrt{2\pi}((1+\epsilon)m)^{(1+\epsilon)m+\frac{1}{2}}e^{-(1+\epsilon)m+\frac{1}{12(1+\epsilon)m}}}
=\frac{1}{\sqrt{2\pi}}m^{-\frac{1}{2}}\cdot\frac{(2+\epsilon)^{(2+\epsilon)m+\frac{1}{2}}}{(1+\epsilon)^{(1+\epsilon)m+\frac{1}{2}}}\cdot e^{\frac{1}{12(2+\epsilon)m+1}-\frac{1}{12m}-\frac{1}{12(1+\epsilon)m}}
>\frac{1}{\sqrt{2\pi}}m^{-\frac{1}{2}}\cdot\frac{(2+\epsilon)^{(2+\epsilon)m+\frac{1}{2}}}{(1+\epsilon)^{(1+\epsilon)m+\frac{1}{2}}}\cdot e^{-\frac{1}{6m}}. ∎
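A numerical check of Lemma F.3 (ours, not part of the proof); since $\epsilon m$ must be an integer, we write $\epsilon m=k$ and $(2+\epsilon)m=2m+k$:

```python
import math

# Numerical sanity check (ours) of Lemma F.3's lower bound on C(2m+k, m),
# where eps = k/m so that eps*m = k is an integer.
for m in (1, 2, 5, 10, 20):
    for k in range(0, 3 * m + 1):
        eps = k / m
        exact = math.comb(2 * m + k, m)
        bound = (
            (1 / math.sqrt(2 * math.pi)) * m ** -0.5
            * (2 + eps) ** ((2 + eps) * m + 0.5)
            / (1 + eps) ** ((1 + eps) * m + 0.5)
            * math.exp(-1 / (6 * m))
        )
        assert exact > bound
print("Lemma F.3 verified on a grid")
```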

The second technical lemma (Lemma F.4) will be used to simplify the decay factor in the proof of Lemma F.2.

Lemma F.4.

For all $\epsilon\in[0,\sqrt{2})$, the following inequality holds:

\frac{2+\epsilon}{2\cdot(1+\epsilon)^{\frac{1+\epsilon}{2+\epsilon}}}\geq 1-\frac{\epsilon^{2}}{2}.
Proof of Lemma F.4.

By taking logarithms on both sides, we see that the original inequality is equivalent to

\ln\left(1+\frac{\epsilon}{2}\right)-\frac{1+\epsilon}{2+\epsilon}\ln(1+\epsilon)\geq\ln\left(1-\frac{1}{2}\epsilon^{2}\right),

which is further equivalent to

\ln\left(1+\frac{\epsilon}{2}\right)-\frac{1+\epsilon}{2+\epsilon}\ln(1+\epsilon)-\ln\left(1-\frac{1}{2}\epsilon^{2}\right)\geq 0. (50)

Note that the LHS can be lower bounded by

\ln\left(1+\frac{\epsilon}{2}\right)-\frac{1+\epsilon}{2+\epsilon}\ln(1+\epsilon)-\ln\left(1-\frac{1}{2}\epsilon^{2}\right)\geq\ln\left(1+\frac{\epsilon}{2}\right)-\frac{1+\epsilon}{2}\ln(1+\epsilon)-\ln\left(1-\frac{1}{2}\epsilon^{2}\right)=:g(\epsilon).

The function $g$ satisfies $g(0)=0$, and its derivative is

g^{\prime}(\epsilon)=\frac{1}{2+\epsilon}-\frac{1}{2}-\frac{1}{2}\ln(1+\epsilon)+\frac{\epsilon}{1-\frac{1}{2}\epsilon^{2}}
\geq\frac{1}{2+\epsilon}-\frac{1}{2}-\frac{\epsilon}{2}+\epsilon
=\frac{2-(2+\epsilon)(1-\epsilon)}{2(2+\epsilon)}
=\frac{\epsilon+\epsilon^{2}}{2(2+\epsilon)}\geq 0.

Thus, $g(\epsilon)\geq 0$ for all $\epsilon\in[0,\sqrt{2})$. Hence (50) holds for all $\epsilon\in[0,\sqrt{2})$. ∎
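A numerical check of Lemma F.4 on a fine grid of $\epsilon\in[0,\sqrt{2})$ (ours, not part of the proof):

```python
import numpy as np

# Numerical sanity check (ours): (2+eps) / (2 (1+eps)^((1+eps)/(2+eps)))
# is at least 1 - eps^2/2 on [0, sqrt(2)).
eps = np.linspace(0.0, np.sqrt(2), 10_000, endpoint=False)
lhs = (2 + eps) / (2 * (1 + eps) ** ((1 + eps) / (2 + eps)))
rhs = 1 - eps ** 2 / 2
assert np.all(lhs >= rhs - 1e-12)
print("Lemma F.4 verified on a grid")
```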

Now we are ready to present the proof of Lemma F.2.

Proof of Lemma F.2.
Figure 3: Graph structure of $\mathcal{G}$ used to obtain the lower bound: $N$ blocks form a ring, and each block contains $d$ vertices.

Consider the graph $\mathcal{G}$ constructed as in Figure 3: let $N$ be a sufficiently large positive integer. $N$ blocks form a ring, where each block contains $d$ nodes, and every pair of adjacent blocks is connected by a complete bipartite graph, so every vertex has degree exactly $2d$. The graph Laplacian of $\mathcal{G}$ can therefore be decomposed as $L=2dI-M$, where $M$ is the adjacency matrix of $\mathcal{G}$. We see that

(I+\ell\cdot L)^{-1}=\left((2d\ell+1)I-\ell\cdot M\right)^{-1}
=\frac{1}{2d\ell+1}\left(I-\frac{\ell}{2d\ell+1}M\right)^{-1}
=\frac{1}{2d\ell+1}\sum_{t=0}^{\infty}\frac{\ell^{t}}{(2d\ell+1)^{t}}M^{t},

where the Neumann series converges because $\ell\lVert M\rVert/(2d\ell+1)\leq 2d\ell/(2d\ell+1)<1$.
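The construction is easy to reproduce numerically. The sketch below (ours; $N$, $d$, and $\ell$ are arbitrary test values) builds the block-ring graph of Figure 3, forms $(I+\ell\cdot L)^{-1}$, and checks the first bound of Lemma F.2 on vertex pairs at block distance $\kappa$:

```python
import itertools
import numpy as np

# Build the block-ring graph of Figure 3: N blocks on a ring, d vertices per
# block, complete bipartite edges between adjacent blocks (test values only).
N, d, ell = 12, 3, 2.0
n = N * d
M = np.zeros((n, n))                       # adjacency matrix
for b in range(N):
    cur = range(b * d, (b + 1) * d)
    nxt = range(((b + 1) % N) * d, (((b + 1) % N) + 1) * d)
    for i, j in itertools.product(cur, nxt):
        M[i, j] = M[j, i] = 1
L = 2 * d * np.eye(n) - M                  # the graph is 2d-regular
A = np.linalg.inv(np.eye(n) + ell * L)

# Graph distance between a vertex in block 0 and one in block kappa equals
# kappa here (for kappa <= N // 2), so Lemma F.2's first bound applies.
for kappa in range(3, N // 2 + 1):
    i, j = 0, kappa * d
    bound = kappa / (d ** 2 * (2 * d * ell + 1)) \
        * (d * ell / (2 * d * ell + 1)) ** kappa
    assert A[i, j] >= bound
print("Lemma F.2 (first bound) holds on this instance")
```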

Fix two vertices $i$ and $j$, denote $\kappa:=d_{\mathcal{G}}(i,j)$, and assume $\kappa\geq 3$. Without loss of generality, we can assume $j$ is in the clockwise direction from $i$. We see that

\left((I+\ell\cdot L)^{-1}\right)_{ij}=\frac{1}{2d\ell+1}\sum_{t=0}^{\infty}\frac{\ell^{t}}{(2d\ell+1)^{t}}(M^{t})_{ij}
=\frac{\ell^{\kappa}}{(2d\ell+1)^{\kappa+1}}\sum_{m=0}^{\infty}\frac{\ell^{2m}}{(2d\ell+1)^{2m}}(M^{\kappa+2m})_{ij}. (51)

Note that $(M^{\kappa+2m})_{ij}$ is the number of paths from $i$ to $j$ with length $\kappa+2m$ in graph $\mathcal{G}$, and that the shortest paths from $i$ to $j$ have length $\kappa$. To pick a path with length $(\kappa+2m)$ from $i$ to $j$, we can first pick a path on the level of blocks: the number of possible block-level paths is lower bounded by $\binom{\kappa+2m}{m}$ because we can choose $m$ of the $(\kappa+2m)$ steps to go in the counterclockwise direction. After a block-level path is fixed, we can choose which specific vertices in the blocks we want to land at, and there are at least $d^{\kappa+2m-2}$ choices. Thus we see that

(M^{\kappa+2m})_{ij}\geq\binom{\kappa+2m}{m}d^{\kappa+2m-2}.

Substituting this into (51) gives

((I+L)1)ij\displaystyle\mathopen{}\mathclose{{}\left((I+\ell\cdot L)^{-1}}\right)_{ij}\geq{} κdκ2(2d+1)κ+1m=02md2m(2d+1)2m(κ+2mm).\displaystyle\frac{\ell^{\kappa}d^{\kappa-2}}{(2d\ell+1)^{\kappa+1}}\sum_{m=0}^{\infty}\frac{\ell^{2m}d^{2m}}{(2d\ell+1)^{2m}}\binom{\kappa+2m}{m}. (52)

Taking $m=0$ gives that $\left((I+\ell\cdot L)^{-1}\right)_{ij}\geq\frac{\kappa}{d^{2}(2d\ell+1)}\cdot\left(\frac{d\ell}{2d\ell+1}\right)^{\kappa}$, which shows the first claim of Lemma F.2. Now we proceed to show the second claim of Lemma F.2.

By Lemma F.3, we know that when $\kappa=\epsilon m$, we have

\binom{\kappa+2m}{m}=\binom{(2+\epsilon)m}{m}
>\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{6m}}m^{-\frac{1}{2}}\frac{(2+\epsilon)^{(2+\epsilon)m}}{(1+\epsilon)^{(1+\epsilon)m}}\cdot\left(\frac{2+\epsilon}{1+\epsilon}\right)^{\frac{1}{2}}
\geq\frac{1}{2\sqrt{2\pi}}m^{-\frac{1}{2}}\left(\frac{2+\epsilon}{(1+\epsilon)^{\frac{1+\epsilon}{2+\epsilon}}}\right)^{(2+\epsilon)m}.

For any $m>\kappa$, the inequality we just showed lets us lower bound a term in the summation of (52):

\frac{\ell^{\kappa}d^{\kappa-2}}{(2d\ell+1)^{\kappa+1}}\cdot\frac{\ell^{2m}d^{2m}}{(2d\ell+1)^{2m}}\binom{\kappa+2m}{m}
\geq\frac{1}{d^{2}(2d\ell+1)}\cdot\frac{1}{2^{\kappa+2m}}\binom{\kappa+2m}{m}\cdot\left(1-\frac{1}{2d\ell+1}\right)^{\kappa+2m}
\geq\frac{1}{2\sqrt{2\pi}\cdot\sqrt{\frac{\kappa}{\epsilon}}\cdot d^{2}(2d\ell+1)}\cdot\left(\frac{2+\epsilon}{2\cdot(1+\epsilon)^{\frac{1+\epsilon}{2+\epsilon}}}\right)^{(2+\epsilon)\kappa/\epsilon}\cdot\left(1-\frac{1}{2d\ell+1}\right)^{(1+\frac{2}{\epsilon})\kappa}
\geq\frac{1}{2\sqrt{2\pi}\cdot\sqrt{\frac{\kappa}{\epsilon}}\cdot d^{2}(2d\ell+1)}\cdot\left(\left(1-\frac{\epsilon^{2}}{2}\right)^{\frac{1}{\epsilon}}\cdot\left(1-\frac{1}{2d\ell+1}\right)^{\frac{1}{\epsilon}}\right)^{(2+\epsilon)\kappa},

where the last line follows from Lemma F.4.

Thus, we obtain that the following inequality holds for arbitrary $\epsilon\in(0,1)$:

\left((I+\ell\cdot L)^{-1}\right)_{ij}\geq\frac{1}{2\sqrt{2\pi}\cdot\sqrt{\frac{\kappa}{\epsilon}}\cdot d^{2}(2d\ell+1)}\cdot\left(\left(1-\frac{\epsilon^{2}}{2}\right)^{\frac{2}{\epsilon}+1}\cdot\left(1-\frac{1}{2d\ell+1}\right)^{\frac{2}{\epsilon}+1}\right)^{\kappa}. (53)

By setting ϵ\epsilon such that 1/ϵ=[2(d)12]1/\epsilon=\mathopen{}\mathclose{{}\left[2(d\ell)^{\frac{1}{2}}}\right] in (53), we obtain that:

((I+L)1)ij\displaystyle\mathopen{}\mathclose{{}\left((I+\ell\cdot L)^{-1}}\right)_{ij}\geq{} 122π2κdd2(2d+1)((112d)4d+1(112d+1)4d+1)κ\displaystyle\frac{1}{2\sqrt{2\pi}\cdot\sqrt{2\kappa\cdot\sqrt{d\ell}}\cdot d^{2}(2d\ell+1)}\cdot\mathopen{}\mathclose{{}\left(\mathopen{}\mathclose{{}\left(1-\frac{1}{2d\ell}}\right)^{4\sqrt{d\ell}+1}\cdot\mathopen{}\mathclose{{}\left(1-\frac{1}{2d\ell+1}}\right)^{4\sqrt{d\ell}+1}}\right)^{\kappa}
\displaystyle\geq{} 14πκdd2(2d+1)((14d+12d)(14d+12d+1))κ\displaystyle\frac{1}{4\sqrt{\pi\cdot\kappa\cdot\sqrt{d\ell}}\cdot d^{2}(2d\ell+1)}\cdot\mathopen{}\mathclose{{}\left(\mathopen{}\mathclose{{}\left(1-\frac{4\sqrt{d\ell}+1}{2d\ell}}\right)\cdot\mathopen{}\mathclose{{}\left(1-\frac{4\sqrt{d\ell}+1}{2d\ell+1}}\right)}\right)^{\kappa}
\displaystyle\geq{} 14πκdd2(2d+1)(14d)κ.\displaystyle\frac{1}{4\sqrt{\pi\cdot\kappa\cdot\sqrt{d\ell}}\cdot d^{2}(2d\ell+1)}\cdot\mathopen{}\mathclose{{}\left(1-\frac{4}{\sqrt{d\ell}}}\right)^{\kappa}. (54)

Step 3: Combining Temporal and Spatial Lower Bounds

Combining the results of Steps 1 and 2, we know that the competitive ratio of any decentralized online algorithm is lower bounded by

\max\left\{1+\frac{\mu^{3}(1-\sqrt{\lambda_{T}})^{2}}{48(\mu+1)^{2}\ell_{T}}\cdot\lambda_{T}^{k},\;1+\Omega(\lambda_{S}^{r})\right\}=1+\Omega(\lambda_{T}^{k})+\Omega(\lambda_{S}^{r}).

Appendix G Proof of Corollary 3.7

In this appendix, we prove a resource augmentation bound for LPC. To simplify the notation, we define the shorthands $a_{T}:=\ell_{T}/\mu$ and $a_{S}:=\ell_{S}/\mu$, both positive real numbers. We first show two lemmas about the relationships between the decay factors $\rho_{T}$ and $\lambda_{T}$, and $\rho_{S}$ and $\lambda_{S}$.

Lemma G.1.

Under the assumptions of Theorem 3.2, we have $\rho_{T}^{4}\leq\lambda_{T}\leq\rho_{T}^{2}$.

Proof of Lemma G.1.

Recall that $\rho_{T}$ is given by

\rho_{T}=\sqrt{1-\frac{2}{\sqrt{1+2a_{T}}+1}}

in Theorem 3.2. Thus we see that

\rho_{T}^{4}=\left(1-\frac{2}{\sqrt{1+2a_{T}}+1}\right)^{2}\leq\left(1-\frac{2}{\sqrt{1+4a_{T}}+1}\right)^{2}=\lambda_{T}.

On the other hand, we have that

\lambda_{T}-\rho_{T}^{2}=\left(1-\frac{2}{\sqrt{1+4a_{T}}+1}\right)^{2}-1+\frac{2}{\sqrt{1+2a_{T}}+1}=\frac{4\sqrt{1+2a_{T}}\left(\sqrt{1+2a_{T}}-\sqrt{1+4a_{T}}\right)}{\left(\sqrt{1+2a_{T}}+1\right)\left(\sqrt{1+4a_{T}}+1\right)^{2}}\leq 0. ∎
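A numerical check of Lemma G.1 (ours), using $\lambda_{T}=\left(1-\frac{2}{\sqrt{1+4a_{T}}+1}\right)^{2}$ as in the display above:

```python
import numpy as np

# Numerical sanity check (ours): rho_T^4 <= lambda_T <= rho_T^2 for a_T > 0.
a_T = np.logspace(-6, 6, 100_000)
rho_T2 = 1 - 2 / (np.sqrt(1 + 2 * a_T) + 1)        # rho_T squared
lam_T = (1 - 2 / (np.sqrt(1 + 4 * a_T) + 1)) ** 2
assert np.all(rho_T2 ** 2 <= lam_T + 1e-15)
assert np.all(lam_T <= rho_T2 + 1e-15)
print("Lemma G.1 verified on a grid")
```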

Lemma G.2.

Under the assumptions of Theorem 3.2, we have $\rho_{S}^{32}\leq\lambda_{S}$.

Proof of Lemma G.2.

Recall that $\rho_{S}$ is given by

\rho_{S}=\sqrt{1-\frac{2}{\sqrt{1+\Delta a_{S}}+1}}

in Theorem 3.2. We consider the following three cases separately.

Case 1: $\Delta a_{S}\geq 224$. We have $\rho_{S}^{32}\leq\lambda_{S}$ in this case.

We first show that the following inequality holds for any positive integer $n_{0}$ and $x\in[0,1/(2n_{0})]$:

(1-x)^{2n_{0}}\leq 1-n_{0}x. (55)

To see this, define the function $g(x)=(1-x)^{2n_{0}}+n_{0}x-1$. Note that $g$ is a convex function with $g(0)=0$ and

g\left(\frac{1}{2n_{0}}\right)=\left(1-\frac{1}{2n_{0}}\right)^{2n_{0}}-\frac{1}{2}\leq e^{-1}-\frac{1}{2}<0.

Since $g$ is convex, it lies below the chord connecting $(0,g(0))$ and $(1/(2n_{0}),g(1/(2n_{0})))$; thus $g(x)\leq 0$ holds for all $x\in[0,1/(2n_{0})]$. Hence (55) holds.

By (55), we see that

\rho_{S}^{16}=\left(1-\frac{2}{\sqrt{1+\Delta a_{S}}+1}\right)^{8}
\leq 1-\frac{8}{\sqrt{1+\Delta a_{S}}+1}
\leq 1-\frac{4\sqrt{3}}{\sqrt{\Delta a_{S}}}
=\sqrt{\lambda_{S}}.

Case 2: $1\leq\Delta a_{S}<224$. We have $\rho_{S}^{28}\leq\lambda_{S}$ in this case.

To see this, note that $\rho_{S}^{2}\leq 1-\frac{2}{\sqrt{1+224}+1}=\frac{7}{8}$, and by Theorem 3.6, $\lambda_{S}\geq\frac{1}{3+3\cdot 1}=\frac{1}{6}$. Therefore, we see that

\rho_{S}^{28}\leq\left(\frac{7}{8}\right)^{14}\leq\frac{1}{6}\leq\lambda_{S}.

Case 3: $\Delta a_{S}<1$. We have $\rho_{S}^{4}\leq\lambda_{S}$ in this case.

To see this, note that

\rho_{S}^{2}=\frac{\sqrt{1+\Delta a_{S}}-1}{\sqrt{1+\Delta a_{S}}+1}\leq\frac{\sqrt{\Delta a_{S}}}{4}.

Thus we see that

\rho_{S}^{4}\leq\frac{\Delta a_{S}}{16}\leq\frac{\Delta a_{S}}{3+3\Delta a_{S}}=\lambda_{S}. ∎
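A numerical check of Lemma G.2 (ours), covering all three cases at once over a wide range of $x=\Delta a_{S}$, with $\lambda_{S}$ as defined in (45):

```python
import numpy as np

# Numerical sanity check (ours): rho_S^32 <= lambda_S, where rho_S is taken
# from Theorem 3.2 and lambda_S from (45).
x = np.logspace(-8, 8, 200_000)                     # x = Delta * a_S
rho_S2 = 1 - 2 / (np.sqrt(1 + x) + 1)               # rho_S squared
lam_S = x / (3 + 3 * x)
big = x >= 48
lam_S[big] = np.maximum(lam_S[big], (1 - 4 * np.sqrt(3) / np.sqrt(x[big])) ** 2)
assert np.all(rho_S2 ** 16 <= lam_S + 1e-15)
print("Lemma G.2 verified on a grid")
```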

Now we come back to the proof of Corollary 3.7. By Theorem 3.4 and Theorem 3.6, we know that the optimal competitive ratio is lower bounded by

c(k^{*},r^{*})\geq 1+C_{\lambda}\left(\lambda_{T}^{k^{*}}+\lambda_{S}^{r^{*}}\right)

and LPC’s competitive ratio is upper bounded by

c_{LPC}(k,r):=1+C_{\rho}\left(C_{3}(r)^{2}\cdot\rho_{T}^{k}+h(r)^{2}\cdot\rho_{S}^{r}\right),

where $C_{\lambda}$ and $C_{\rho}$ are some positive constants. To achieve $c_{LPC}(k,r)\leq c(k^{*},r^{*})$, it suffices to guarantee that

C_{\rho}\cdot C_{3}(r)^{2}\cdot\rho_{T}^{k}\leq C_{\lambda}\lambda_{T}^{k^{*}}\text{ and }C_{\rho}\cdot h(r)^{2}\cdot\rho_{S}^{r}\leq C_{\lambda}\lambda_{S}^{r^{*}}.

Note that $C_{3}(r)$ can be upper bounded by a constant and $h(r)^{2}\leq poly(r)\cdot\rho_{S}^{-\frac{r}{2}}$ under our assumptions. Applying Lemma G.1 and Lemma G.2 finishes the proof. ∎
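As a rough illustration of how the lemmas translate into a resource augmentation rule: since $\lambda_{T}\geq\rho_{T}^{4}$ and $\lambda_{S}\geq\rho_{S}^{32}$, and $h(r)^{2}$ contributes at most a $\rho_{S}^{-r/2}$ factor, it suffices to scale $k^{*}$ and $r^{*}$ by suitable constants plus additive slack absorbing $C_{\rho}/C_{\lambda}$ and the $poly(r)$ factor. The sketch below (ours; the specific factors and the slack value are placeholders, not the constants of Corollary 3.7) computes such an augmented pair:

```python
import math

# Illustrative sketch (ours): pick (k, r) so that rho_T^k <= lambda_T^{k*}
# and rho_S^{r/2} <= lambda_S^{r*} up to a constant slack factor, using
# lambda_T >= rho_T^4 and lambda_S >= rho_S^32 from Lemmas G.1 and G.2.
def augmented_resources(k_star: int, r_star: int,
                        rho_T: float, rho_S: float,
                        slack: float = 100.0) -> tuple:
    extra_k = math.ceil(math.log(slack) / math.log(1 / rho_T))
    extra_r = math.ceil(math.log(slack) / math.log(1 / rho_S))
    return 4 * k_star + extra_k, 64 * r_star + extra_r

print(augmented_resources(k_star=3, r_star=2, rho_T=0.6, rho_S=0.5))
```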