
On the Sample Complexity of Decentralized Linear Quadratic Regulator with Partially Nested Information Structure

Lintao Ye (Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN, USA; [email protected])    Hao Zhu (Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA; [email protected])    Vijay Gupta (Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN, USA; [email protected])
Abstract

We study the problem of control policy design for decentralized state-feedback linear quadratic control with a partially nested information structure, when the system model is unknown. We propose a model-based learning solution, which consists of two steps. First, we estimate the unknown system model from a single system trajectory of finite length, using least squares estimation. Next, based on the estimated system model, we design a decentralized control policy that satisfies the desired information structure. We show that the suboptimality gap between our control policy and the optimal decentralized control policy (designed using accurate knowledge of the system model) scales linearly with the estimation error of the system model. Using this result, we provide an end-to-end sample complexity result for learning decentralized controllers for a linear quadratic control problem with a partially nested information structure.

1 Introduction

In large-scale control systems, the control policy is often required to be decentralized, in the sense that each controller may use only partial state information when computing its local control input. For example, a given controller may only receive a subset of the global state measurements (e.g., [35]), and there may be a delay in receiving the measurements (e.g., [25]). In general, finding a globally optimal control policy under information constraints is NP-hard, even if the system model is known at the controllers [39, 31, 6]. This has led to a large literature on identifying tractable subclasses of the problem. For instance, if the information structure describing the decentralized control problem is partially nested [21], the state-feedback linear quadratic control problem can be solved efficiently using dynamic programming [26]. Other conditions, such as quadratic invariance [32, 33], have also been identified as yielding tractable subclasses of the problem.

However, the classical work in this field assumes knowledge of the system model at the controllers. In this work, we are interested in the situation when the system model is not known a priori [23]. In such a case, the existing algorithms do not apply. Moreover, it is not clear whether subclasses such as problems with partially nested information structures, or those satisfying quadratic invariance, remain any more tractable than the general decentralized control problem in this case.

In this paper, we consider a decentralized infinite-horizon state-feedback Linear Quadratic Regulator (LQR) control problem with a partially nested information structure [35, 26] and assume that the controllers do not have access to the system model. We use a model-based learning approach, where we first identify the system model, and then use it to design a decentralized control policy that satisfies the prescribed information constraints.

Related Work

Solving optimal control problems without prior knowledge of the system model has received much attention recently. One of the most studied problems is the centralized LQR problem. For this problem, two broad classes of methods have been studied, namely, model-based learning [1, 29, 12] and model-free learning [16, 41, 28, 20]. In the model-based learning approach, a system model is first estimated from observed system trajectories using some system identification method; a control policy is then obtained based on the estimated system model. In the model-free learning approach, the objective function in the LQR problem is viewed as a function of the control policies. Based on zeroth-order optimization methods (e.g., [19, 30]), the optimal solution can then be obtained using gradient descent, where the gradient of the objective function is estimated from data samples from system trajectories. Moreover, the model-based learning approach has also been studied for the centralized linear quadratic Gaussian control problem [43]. In general, compared to model-free learning, model-based learning tends to require fewer data samples to achieve a policy of equivalent performance [38].

Most of the previous works on model-based learning for centralized LQR build on recent advances in non-asymptotic analyses for system identification of linear dynamical systems with full state observations (e.g., [13, 36, 34]). Such non-asymptotic analyses (i.e., sample complexity results) relate the estimation error of the system matrices to the number of samples used for system identification. In particular, it was shown in [36] that when using a single system trajectory, the least squares approach for system identification achieves the optimal sample complexity up to logarithmic factors. In this paper, we utilize a similar least squares approach for estimating the system matrices from a single system trajectory. Although the system matrices in our problem are structured, as dictated by the interconnections among the subsystems, we leverage the results in [2, 10] to provide a non-asymptotic analysis of the resulting estimation error.

There are few results on solving decentralized linear quadratic control problems with information constraints, when the system model is unknown. In [18], the authors studied a decentralized output-feedback linear quadratic control problem, under the assumption that the quadratic invariance condition is satisfied. The authors proposed a model-free approach and provided a sample complexity analysis. They focused on a finite-horizon setting, since gradient-based optimization methods may not converge to the optimal controller for infinite-horizon decentralized linear quadratic control problems with information constraints, even when the system model is known [17, 7]. In [27], the authors proposed a consensus-based model-free learning algorithm for multi-agent decentralized LQR over an infinite horizon, where each agent (i.e., controller) has access to a subset of the global state without delay. They showed that their algorithm converges to a control policy that is a stationary point of the objective function in the LQR problem. In [15], the authors studied model-based learning for LQR with subspace constraints on the closed-loop responses. However, those constraints may not lead to controllers that satisfy the information constraints that we consider in this paper (e.g., [42]).

There is also a line of research on online adaptive control for centralized LQR with unknown system models, using either model-based learning [1, 11, 10], or model-free learning [3, 8]. The goal there is to adaptively design a control policy in an online manner when new data samples from the system trajectory become available, and bound the corresponding regret.

Contributions

We propose a two-step model-based approach to solving the problem of learning decentralized LQR with a partially nested information structure. Here, we summarize our contributions and technical challenges in the paper.

  • In Section 3, we provide a sample complexity result for estimating the system model from a single system trajectory using a least squares approach. Despite the existence of a sparsity pattern in the system model considered in our problem, we adapt the analyses in [10, 9] for least squares estimation of general linear system models (without any sparsity pattern) to our setting, and show that such a system identification method for general system models suffices for our ensuing analyses.

  • In Section 4, based on the estimated system model, we design a novel decentralized control policy that satisfies the given information structure. Our control policy is inspired by [26], which developed the optimal controller for the decentralized LQR problem with a partially nested information structure and a known system model. The optimal controller therein depends on internal states, each of which evolves according to an auxiliary linear system (characterized by the actual model of the original system and a disturbance term from the original system) and correlates with the other internal states. This complicated form of the internal states makes it challenging to extend the design in [26] to the case when the system model is unknown. To tackle this, we capitalize on the observation that the optimal controller proposed in [26] can be viewed as a disturbance-feedback control policy that maps the history of past disturbances (affecting the original system) to the current control input. Thanks to this viewpoint, we put forth a control policy that uses the aforementioned estimated system model and maps estimates of the past disturbances to the current control input via estimated internal states. In particular, the estimates of the disturbances are obtained using the estimated system model and the state information of the original system, and each of the estimated internal states evolves according to a linear system characterized by the estimated system model and the estimated disturbances. More importantly, we show that the proposed control policy can be implemented in a decentralized manner that satisfies the prescribed information structure, which requires a careful investigation of the structure of our problem.

  • In Section 5.2, we characterize the performance guarantee (i.e., suboptimality) of the control policy proposed in Section 4. As we discussed above, our control policy requires obtaining estimates of the past disturbances and maintaining the estimated internal states. When we compare the performance of our control policy to that of the optimal decentralized control policy in [26], both the estimates of the past disturbances and the estimated internal states contribute to the suboptimality of our control policy, which creates the major technical challenge in our analyses. We overcome this challenge by carefully investigating the structure of the proposed control policy, and we show that the suboptimality gap between our control policy and the optimal decentralized control policy (designed based on accurate knowledge of the system model) provided in [26] can be decomposed into two terms, both of which scale linearly with the estimation error of the system model.

  • In Section 5.3, we combine the above results and provide an end-to-end sample complexity result for learning decentralized LQR with a partially nested information structure. Surprisingly, despite the information constraints and the fact that the optimal controller is a linear dynamic controller, our sample complexity result matches that of learning centralized LQR without any information constraints [12].

2 Preliminaries and Problem Formulation

2.1 Notation and Terminology

The sets of integers and real numbers are denoted by \mathbb{Z} and \mathbb{R}, respectively. The set of integers (resp., real numbers) greater than or equal to a\in\mathbb{R} is denoted by \mathbb{Z}_{\geq a} (resp., \mathbb{R}_{\geq a}). For a real number a, let \lceil a\rceil be the smallest integer greater than or equal to a. The space of m-dimensional real vectors is denoted by \mathbb{R}^{m}, and the space of m\times n real matrices is denoted by \mathbb{R}^{m\times n}. For a matrix P\in\mathbb{R}^{n\times n}, let P^{\top}, \text{Tr}(P), and \{\sigma_{i}(P):i\in\{1,\dots,n\}\} denote its transpose, trace, and set of singular values, respectively. Without loss of generality, let the singular values of P be ordered as \sigma_{1}(P)\geq\cdots\geq\sigma_{n}(P). Let \norm{\cdot} denote the \ell_{2} norm, i.e., \norm{P}=\sigma_{1}(P) for a matrix P\in\mathbb{R}^{n\times n} and \norm{x}=\sqrt{x^{\top}x} for a vector x\in\mathbb{R}^{n}. Let \norm{P}_{F}=\sqrt{\text{Tr}(PP^{\top})} denote the Frobenius norm of P\in\mathbb{R}^{n\times m}. A positive semidefinite matrix P is denoted by P\succeq 0, and P\succeq Q if and only if P-Q\succeq 0. Let \mathbb{S}_{+}^{n} (resp., \mathbb{S}_{++}^{n}) denote the set of n\times n positive semidefinite (resp., positive definite) matrices. Let I denote an identity matrix whose dimension can be inferred from context. Given any integer n\geq 1, we define [n]=\{1,\dots,n\}. The cardinality of a finite set \mathcal{A} is denoted by |\mathcal{A}|. Let \mathcal{N}(\mu,\Sigma) denote a Gaussian distribution with mean \mu\in\mathbb{R}^{m} and covariance \Sigma\in\mathbb{S}_{+}^{m}.

2.2 Solution to Decentralized LQR with Sparsity and Delay Constraints

In this section, we sketch the method developed in [26, 35], which gives the optimal solution to a decentralized LQR problem with a partially nested information structure [21] when the system model is known a priori. First, consider a networked system that consists of p\in\mathbb{Z}_{\geq 1} interconnected linear time-invariant (LTI) subsystems. Letting the state, input, and disturbance of the subsystem corresponding to node i\in[p] be x_{i}(t)\in\mathbb{R}^{n_{i}}, u_{i}(t)\in\mathbb{R}^{m_{i}}, and w_{i}(t), respectively, the subsystem corresponding to node i is given by

x_{i}(t+1)=\Big(\sum_{j\in\mathcal{N}_{i}}A_{ij}x_{j}(t)+B_{ij}u_{j}(t)\Big)+w_{i}(t)\quad\forall i\in\mathcal{V}, (1)

where \mathcal{N}_{i}\subseteq[p] is the set of subsystems whose states and inputs directly affect the state of subsystem i, A_{ij}\in\mathbb{R}^{n_{i}\times n_{j}}, B_{ij}\in\mathbb{R}^{n_{i}\times m_{j}}, and w_{i}(t)\in\mathbb{R}^{n_{i}} is a white Gaussian noise process with w_{i}(t)\sim\mathcal{N}(0,\sigma_{w}^{2}I) for all t\in\mathbb{Z}_{\geq 0}, where \sigma_{w}\in\mathbb{R}_{>0}. (The analysis can be extended to the case when w_{i}(t) is a zero-mean white Gaussian noise process with covariance W_{i}\in\mathbb{S}_{++}^{n_{i}}; in that case, our analysis will depend on \max_{i\in\mathcal{V}}\sigma_{1}(W_{i}) and \min_{i\in\mathcal{V}}\sigma_{n}(W_{i}).) For simplicity, we assume throughout this paper that n_{i}\geq m_{i} for all i\in\mathcal{V}. We can also write Eq. (1) as

x_{i}(t+1)=A_{i}x_{\mathcal{N}_{i}}(t)+B_{i}u_{\mathcal{N}_{i}}(t)+w_{i}(t)\quad\forall i\in\mathcal{V}, (2)

where A_{i}\triangleq\begin{bmatrix}A_{ij_{1}}&\cdots&A_{ij_{|\mathcal{N}_{i}|}}\end{bmatrix}, B_{i}\triangleq\begin{bmatrix}B_{ij_{1}}&\cdots&B_{ij_{|\mathcal{N}_{i}|}}\end{bmatrix}, x_{\mathcal{N}_{i}}(t)\triangleq\begin{bmatrix}x_{j_{1}}(t)^{\top}&\cdots&x_{j_{|\mathcal{N}_{i}|}}(t)^{\top}\end{bmatrix}^{\top}, and u_{\mathcal{N}_{i}}(t)\triangleq\begin{bmatrix}u_{j_{1}}(t)^{\top}&\cdots&u_{j_{|\mathcal{N}_{i}|}}(t)^{\top}\end{bmatrix}^{\top}, with \mathcal{N}_{i}=\{j_{1},\dots,j_{|\mathcal{N}_{i}|}\}. Further letting n=\sum_{i\in\mathcal{V}}n_{i} and m=\sum_{i\in\mathcal{V}}m_{i}, and defining x(t)=\begin{bmatrix}x_{1}(t)^{\top}&\cdots&x_{p}(t)^{\top}\end{bmatrix}^{\top}, u(t)=\begin{bmatrix}u_{1}(t)^{\top}&\cdots&u_{p}(t)^{\top}\end{bmatrix}^{\top}, and w(t)=\begin{bmatrix}w_{1}(t)^{\top}&\cdots&w_{p}(t)^{\top}\end{bmatrix}^{\top}, we can compactly write Eq. (1) in the following matrix form:

x(t+1)=Ax(t)+Bu(t)+w(t), (3)

where the (i,j)th block of A\in\mathbb{R}^{n\times n} (resp., B\in\mathbb{R}^{n\times m}), i.e., A_{ij} (resp., B_{ij}), satisfies A_{ij}=0 (resp., B_{ij}=0) if j\notin\mathcal{N}_{i}. We assume that w_{i}(t_{1}) and w_{j}(t_{2}) are independent for all i,j\in\mathcal{V} with i\neq j and for all t_{1},t_{2}\in\mathbb{Z}_{\geq 0}. In other words, w(t) is a white Gaussian noise process with w(t)\sim\mathcal{N}(0,\sigma_{w}^{2}I) for all t\in\mathbb{Z}_{\geq 0}. For simplicity, we assume that x(0)=0 throughout this paper. (The analysis can be extended to the case when x(0) follows a zero-mean Gaussian distribution, as one may view x(0) as w(-1).)

Next, we use a directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) with \mathcal{V}=[p] to characterize the information flow among the subsystems in [p] due to communication constraints on the subsystems. Each node in \mathcal{G}(\mathcal{V},\mathcal{A}) represents a subsystem in [p], and we assume that \mathcal{G}(\mathcal{V},\mathcal{A}) has no self loops. We associate each edge (i,j)\in\mathcal{A} with a delay of either 0 or 1, denoted as i\xrightarrow{0}j or i\xrightarrow{1}j, respectively. (The framework described in this paper can also handle \mathcal{G}(\mathcal{V},\mathcal{A}) with larger delays; see [26] for a detailed discussion.) Then, we define the delay matrix corresponding to \mathcal{G}(\mathcal{V},\mathcal{A}) as D\in\mathbb{R}^{p\times p} such that: (i) if i\neq j and there is a directed path from j to i in \mathcal{G}(\mathcal{V},\mathcal{A}), then D_{ij} is the smallest cumulative delay over all directed paths from node j to node i; (ii) if i\neq j and there is no directed path from j to i in \mathcal{G}(\mathcal{V},\mathcal{A}), then D_{ij}=+\infty; (iii) D_{ii}=0 for all i\in\mathcal{V}. Here, we consider the scenario where the information (e.g., state information) corresponding to subsystem j\in\mathcal{V} can propagate to subsystem i\in\mathcal{V} with a delay of D_{ij} (in time), if and only if there exists a directed path from j to i with a cumulative delay of D_{ij}. Note that, as argued in [26], we assume that there is no directed cycle with zero cumulative delay; otherwise, one can first collapse all the nodes in such a directed cycle into a single node, and equivalently consider the resulting directed graph in the framework described above.
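To make this construction concrete, the delay matrix can be computed by a min-plus (Floyd–Warshall) shortest-path pass over the edge delays. The following minimal sketch is our own illustration, not code from [26]; the function name and the edge-list format are assumptions.

```python
import numpy as np

def delay_matrix(p, edges):
    """Delay matrix D for a directed graph on nodes 0, ..., p-1.

    edges: list of (j, i, d), meaning an edge j -> i with delay d in {0, 1}.
    D[i, j] is the smallest cumulative delay over directed paths from j to i,
    D[i, i] = 0, and D[i, j] = inf if i is not reachable from j.
    """
    D = np.full((p, p), np.inf)
    np.fill_diagonal(D, 0.0)
    for (j, i, d) in edges:
        D[i, j] = min(D[i, j], d)
    # Floyd-Warshall in the (min, +) semiring.
    for k in range(p):
        for i in range(p):
            for j in range(p):
                D[i, j] = min(D[i, j], D[i, k] + D[k, j])
    return D

# Edge delays consistent with our reading of Example 1 below (0-indexed):
# 2 -> 1 with delay 0, 3 -> 2 with delay 1, 2 -> 3 with delay 1.
# D = delay_matrix(3, [(1, 0, 0), (2, 1, 1), (1, 2, 1)])
```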

To proceed, we consider designing the control input u(t) for the LTI system in Eq. (3). We focus on state-feedback control, i.e., we can view u(t) as a policy that maps the states of the LTI system to a control input. Moreover, we require that u(t) satisfy the information structure given by the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) and the delay matrix D\in\mathbb{R}^{p\times p} described above. Specifically, considering any i\in\mathcal{V} and any t\in\mathbb{Z}_{\geq 0}, and noting that the controller corresponding to subsystem i\in\mathcal{V} provides the control input u_{i}(t)\in\mathbb{R}^{m_{i}}, the state information available to the controller corresponding to i\in\mathcal{V} is given by

\mathcal{I}_{i}(t)=\{x_{j}(k):j\in\mathcal{V}_{i},0\leq k\leq t-D_{ij}\}, (4)

where \mathcal{V}_{i}\triangleq\{j\in\mathcal{V}:D_{ij}\neq+\infty\}. In other words, the control policy u_{i}(t) maps the states contained in \mathcal{I}_{i}(t) to a control input. In the sequel, we also call \mathcal{I}_{i}(t) the information set of controller i\in\mathcal{V} at time t\in\mathbb{Z}_{\geq 0}. Note that \mathcal{I}_{i}(t) contains the states corresponding to the subsystems in \mathcal{V} that have had enough time to reach subsystem i\in\mathcal{V} by time t\in\mathbb{Z}_{\geq 0}, given the sparsity and delay constraints described above. Now, based on the information set \mathcal{I}_{i}(t), we further define \mathcal{S}(\mathcal{I}_{i}(t)) to be the set of all policies that map the states in \mathcal{I}_{i}(t) to a control input at node i. The goal is then to solve the following constrained optimization problem:

\begin{split}\min_{u(0),u(1),\dots}\ &\lim_{T\to\infty}\mathbb{E}\Big[\frac{1}{T}\sum_{t=0}^{T-1}\big(x(t)^{\top}Qx(t)+u(t)^{\top}Ru(t)\big)\Big]\\ \text{s.t.}\ &x(t+1)=Ax(t)+Bu(t)+w(t),\\ &u_{i}(t)\in\mathcal{S}(\mathcal{I}_{i}(t))\quad\forall i\in\mathcal{V},\ \forall t\in\mathbb{Z}_{\geq 0},\end{split} (5)

where Q\in\mathbb{S}_{+}^{n} and R\in\mathbb{S}_{++}^{m} are the cost matrices, and the expectation is taken with respect to w(t) for all t\in\mathbb{Z}_{\geq 0}. Throughout the paper, we assume that the following condition on the information propagation pattern among the subsystems in \mathcal{V} holds (e.g., [26, 40]).

Assumption 1.

For all i\in\mathcal{V} and all j\in\mathcal{N}_{i}, it holds that D_{ij}\leq 1, where \mathcal{N}_{i} is given in Eq. (1).

Assumption 1 says that the state of subsystem i\in\mathcal{V} is affected by the state and input of subsystem j\in\mathcal{V} if and only if there is a communication link with a delay of at most 1 from subsystem j to i in \mathcal{G}(\mathcal{V},\mathcal{A}). As shown in [26], Assumption 1 ensures that the information structure associated with the system given in Eq. (1) is partially nested [21]. Assumption 1 is frequently used in decentralized control problems (e.g., [26, 35] and the references therein), and it is satisfied in networked systems where information propagates at least as fast as the dynamics. To illustrate the arguments above, we introduce Example 1.

Example 1.

Consider the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) given in Fig. 1, where \mathcal{V}=\{1,2,3\} and each directed edge is associated with a delay of 0 or 1. The corresponding LTI system is then given by

\begin{bmatrix}x_{1}(t+1)\\ x_{2}(t+1)\\ x_{3}(t+1)\end{bmatrix}=\begin{bmatrix}A_{11}&A_{12}&A_{13}\\ 0&A_{22}&A_{23}\\ 0&A_{32}&A_{33}\end{bmatrix}\begin{bmatrix}x_{1}(t)\\ x_{2}(t)\\ x_{3}(t)\end{bmatrix}+\begin{bmatrix}B_{11}&B_{12}&B_{13}\\ 0&B_{22}&B_{23}\\ 0&B_{32}&B_{33}\end{bmatrix}\begin{bmatrix}u_{1}(t)\\ u_{2}(t)\\ u_{3}(t)\end{bmatrix}+\begin{bmatrix}w_{1}(t)\\ w_{2}(t)\\ w_{3}(t)\end{bmatrix}. (6)
Figure 1: The directed graph of Example 1. Node i\in\mathcal{V} represents a subsystem with state x_{i}(t), and edge (i,j)\in\mathcal{A} is labelled with the information propagation delay from i to j.

Now, in order to present the solution to (5) given in, e.g., [26], we need to construct an information graph \mathcal{P}(\mathcal{U},\mathcal{H}) (see [26] for more details). Considering any directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) with \mathcal{V}=[p] and the delay matrix D\in\mathbb{R}^{p\times p} described above, let us first define s_{j}(k) to be the set of nodes in \mathcal{G}(\mathcal{V},\mathcal{A}) that are reachable from node j within k time steps, i.e., s_{j}(k)=\{i\in\mathcal{V}:D_{ij}\leq k\}. The information graph \mathcal{P}(\mathcal{U},\mathcal{H}) is then constructed as

\begin{split}\mathcal{U}&=\{s_{j}(k):k\geq 0,\ j\in\mathcal{V}\},\\ \mathcal{H}&=\{(s_{j}(k),s_{j}(k+1)):k\geq 0,\ j\in\mathcal{V}\}.\end{split} (7)

Thus, we see from (7) that each node s\in\mathcal{U} corresponds to a set of nodes from \mathcal{V}=[p] in the original directed graph \mathcal{G}(\mathcal{V},\mathcal{A}). Using notation similar to that for the graph \mathcal{G}(\mathcal{V},\mathcal{A}), if there is an edge from s to r in \mathcal{P}(\mathcal{U},\mathcal{H}), we denote the edge as s\to r. Additionally, considering any s_{i}(0)\in\mathcal{U}, we write w_{i}\to s_{i}(0) to indicate that the noise w_{i}(t) is injected into node i\in\mathcal{V} at time t\in\mathbb{Z}_{\geq 0}. (Recall that we have assumed that there is no directed cycle with zero cumulative delay in \mathcal{G}(\mathcal{V},\mathcal{A}); hence, one can show that for any s_{i}(0)\in\mathcal{U}, w_{i} is the only noise term such that w_{i}\rightarrow s_{i}(0).) From the above construction of the information graph \mathcal{P}(\mathcal{U},\mathcal{H}), one can show that the following properties hold.
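The construction in (7) can also be stated algorithmically: starting from each leaf s_{j}(0), repeatedly apply the reachable-set map until it saturates at a node with a self loop. The sketch below is our own illustration (0-indexed nodes; it assumes the NumPy delay matrix D built in the earlier sketch):

```python
import numpy as np

def information_graph(D):
    """Information graph P(U, H) of Eq. (7) from the delay matrix D."""
    p = D.shape[0]
    s = lambda j, k: frozenset(i for i in range(p) if D[i, j] <= k)
    U, H = set(), set()
    for j in range(p):
        k, node = 0, s(j, 0)
        U.add(node)
        while True:
            nxt = s(j, k + 1)
            U.add(nxt)
            H.add((node, nxt))
            if nxt == node:  # saturated: a root with a self loop
                break
            k, node = k + 1, nxt
    return U, H

# The leaf nodes are {s(j, 0) : j in range(p)}; for Example 1 this yields
# U = {{0}, {0,1}, {2}, {0,1,2}} with the edges of Fig. 2 (0-indexed).
```

Since s_{j}(k)\subseteq s_{j}(k+1) and both are subsets of [p], each chain saturates after at most p steps, so the while loop terminates.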

Lemma 1.

[26, Proposition 1] Given a directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) with \mathcal{V}=[p], the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) constructed in (7) satisfies the following: (i) for every r\in\mathcal{U}, there is a unique s\in\mathcal{U} such that (r,s)\in\mathcal{H}, i.e., r\to s; (ii) every path in \mathcal{P}(\mathcal{U},\mathcal{H}) ends at a node with a self loop; and (iii) p\leq|\mathcal{U}|\leq p^{2}-p+1.

Remark 1.

One can see from the construction of \mathcal{P}(\mathcal{U},\mathcal{H}) and Lemma 1 that \mathcal{P}(\mathcal{U},\mathcal{H}) is a forest, i.e., a set of disconnected directed trees, where each directed tree in the forest is oriented toward a node with a self loop in \mathcal{P}(\mathcal{U},\mathcal{H}). Specifically, the nodes s_{i}(0) for i\in\mathcal{V} are the leaf nodes in \mathcal{P}(\mathcal{U},\mathcal{H}), and the nodes with self loops are the root nodes in \mathcal{P}(\mathcal{U},\mathcal{H}).

To illustrate the construction steps and the properties of the information graph discussed above, we again use Example 1; the resulting information graph \mathcal{P}(\mathcal{U},\mathcal{H}) is given in Fig. 2. Note that the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) in Fig. 2 contains two disconnected directed trees, one of which is an isolated node \{1\}\in\mathcal{U} with a self loop. Also notice that s_{1}(0)=\{1\}, s_{2}(0)=\{1,2\}, and s_{3}(0)=\{3\}. One can check that the results in Lemma 1 hold for the graph \mathcal{P}(\mathcal{U},\mathcal{H}) in Fig. 2.

Figure 2: The information graph of Example 1. Each node in the information graph is a subset of the nodes of the directed graph given in Fig. 1.

Throughout this paper, we assume that the elements in \mathcal{V}=[p] are ordered increasingly, and that the elements in s are ordered increasingly for all s\in\mathcal{U}. Now, for any s,r\in\mathcal{U}, we use A_{sr} (or A_{s,r}) to denote the submatrix of A whose rows and columns correspond to the nodes of the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) contained in s and r, respectively. For example, A_{\{1\},\{1,2\}}=\begin{bmatrix}A_{11}&A_{12}\end{bmatrix}. In the sequel, we will use similar notation to denote submatrices of B, Q, R, and the identity matrix I. We will make the following standard assumption (see, e.g., [26]).

Assumption 2.

For any s\in\mathcal{U} that has a self loop, the pair (A_{ss},B_{ss}) is stabilizable and the pair (A_{ss},C_{ss}) is detectable, where Q_{ss}=C_{ss}^{\top}C_{ss}.

Leveraging the partial nestedness of (5), the authors in [26] obtained the optimal solution to (5), which we summarize in the following lemma.

Lemma 2.

[26, Corollary 4] Consider the problem given in (5), and let \mathcal{P}(\mathcal{U},\mathcal{H}) be the associated information graph. Suppose Assumption 2 holds. For all r\in\mathcal{U}, define matrices P_{r} and K_{r} recursively as

K_{r}=-(R_{rr}+B_{sr}^{\top}P_{s}B_{sr})^{-1}B_{sr}^{\top}P_{s}A_{sr}, (8)
P_{r}=Q_{rr}+K_{r}^{\top}R_{rr}K_{r}+(A_{sr}+B_{sr}K_{r})^{\top}P_{s}(A_{sr}+B_{sr}K_{r}), (9)

where for each r\in\mathcal{U}, s\in\mathcal{U} is the unique node such that r\rightarrow s. In particular, for any s\in\mathcal{U} that has a self loop, the matrix P_{s} is the unique positive semidefinite solution to the Riccati equation given by Eq. (9), and the matrix A_{ss}+B_{ss}K_{s} is stable. The optimal solution to (5) is then given by

\zeta_{s}(t+1)=\sum_{r\rightarrow s}(A_{sr}+B_{sr}K_{r})\zeta_{r}(t)+\sum_{w_{i}\rightarrow s}I_{s,\{i\}}w_{i}(t), (10)

and

u_{i}^{\star}(t)=\sum_{r\ni i}I_{\{i\},r}K_{r}\zeta_{r}(t), (11)

for all t\in\mathbb{Z}_{\geq 0}, where \zeta_{s}(t) is an internal state initialized with \zeta_{s}(0)=\sum_{w_{i}\rightarrow s}I_{s,\{i\}}x_{i}(0)=0 for all s\in\mathcal{U}. The corresponding optimal cost of (5), denoted as J_{\star}, is given by

J_{\star}=\sigma_{w}^{2}\sum_{\substack{i\in\mathcal{V}\\ w_{i}\rightarrow s}}\text{Tr}\big(I_{\{i\},s}P_{s}I_{s,\{i\}}\big). (12)

Let us use Example 1 to illustrate the results in Lemma 2. First, considering node \{1\}\in\mathcal{U} in the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) given in Fig. 2, we have from Eq. (10) that

\zeta_{1}(t+1)=(A_{11}+B_{11}K_{1})\zeta_{1}(t)+\sum_{w_{i}\to\{1\}}I_{\{1\},\{i\}}w_{i}(t)=(A_{11}+B_{11}K_{1})\zeta_{1}(t)+w_{1}(t).

Next, considering node 2\in\mathcal{V} in the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) given in Fig. 1, we see from Eq. (11) and Fig. 2 that

u_{2}^{\star}(t)=\sum_{r\ni 2}I_{\{2\},r}K_{r}\zeta_{r}(t)=I_{\{2\},\{1,2\}}K_{\{1,2\}}\zeta_{\{1,2\}}(t)+I_{\{2\},\{1,2,3\}}K_{\{1,2,3\}}\zeta_{\{1,2,3\}}(t),

where K_{r} is given by Eq. (8).
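To make Lemma 2 concrete, the gains in Eqs. (8)-(9) can be computed by solving a discrete algebraic Riccati equation at each root of the information graph and then sweeping each tree from the root toward its leaves. The sketch below is our own illustration, not code from [26]; the dictionary-based bookkeeping (succ, A_sub, and so on) is an assumption about how the submatrices are stored:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def information_graph_gains(U, succ, A_sub, B_sub, Q_sub, R_sub):
    """P_r and K_r of Eqs. (8)-(9) for every node r of the information
    graph.  succ[r] is the unique s with r -> s (succ[r] == r at a root);
    A_sub[(s, r)] = A_{sr}, B_sub[(s, r)] = B_{sr}, Q_sub[r] = Q_{rr},
    R_sub[r] = R_{rr}."""
    P, K = {}, {}
    # Roots: P_s is the positive semidefinite solution of the Riccati
    # equation obtained by setting r = s in Eq. (9).
    for r in U:
        if succ[r] == r:
            A, B = A_sub[(r, r)], B_sub[(r, r)]
            P[r] = solve_discrete_are(A, B, Q_sub[r], R_sub[r])
            K[r] = -np.linalg.solve(R_sub[r] + B.T @ P[r] @ B,
                                    B.T @ P[r] @ A)
    # Non-roots: K_r, P_r only need P_s of the unique successor s.
    pending = [r for r in U if succ[r] != r]
    while pending:
        for r in list(pending):
            s = succ[r]
            if s in P:
                A, B = A_sub[(s, r)], B_sub[(s, r)]
                K[r] = -np.linalg.solve(R_sub[r] + B.T @ P[s] @ B,
                                        B.T @ P[s] @ A)
                ABK = A + B @ K[r]
                P[r] = Q_sub[r] + K[r].T @ R_sub[r] @ K[r] + ABK.T @ P[s] @ ABK
                pending.remove(r)
    return P, K
```

Processing the roots first is valid because, by Lemma 1, every path in the information graph ends at a root with a self loop, so the sweep resolves each remaining node once its successor is resolved.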

Remark 2.

Obtaining the optimal policy u^{\star}_{i}(t) for any i\in\mathcal{V} given by Lemma 2 requires global knowledge of the system matrices A and B, the cost matrices Q and R, and the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) with the associated delay matrix D. Moreover, u^{\star}(t) given in Lemma 2 is not a static state-feedback controller, but a linear dynamic controller based on the internal states \zeta_{r}(\cdot) for all r\in\mathcal{U}. For any controller i\in\mathcal{V} and any t\in\mathbb{Z}_{\geq 0}, the authors in [26] proposed an algorithm to determine \zeta_{r}(t) for all r\in\mathcal{U} with i\in r, and thus u_{i}^{\star}(t), using only the memory maintained by the algorithm, the state information contained in the information set \mathcal{I}_{i}(t) defined in Eq. (4), and the global information described above.

2.3 Problem Formulation and Summary of Results

We now formally introduce the problem studied in this paper. We consider the scenario where the system matrices A and B are unknown. However, we assume that the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) and the associated delay matrix D are known. Similarly to, e.g., [12, 43], we consider the scenario where we can first conduct experiments in order to estimate the unknown system matrices A and B. Specifically, starting from the initial state x(0)=0, we evolve the system given in Eq. (3) for N\in\mathbb{Z}_{\geq 1} time steps using a given control input sequence \{u(0),u(1),\dots,u(N-1)\}, and collect the resulting state sequence \{x(1),x(2),\dots,x(N)\}. Based on \{u(0),\dots,u(N-1)\} and \{x(0),\dots,x(N)\}, we use a least squares approach to obtain estimates of the system matrices A and B, denoted as \hat{A} and \hat{B}, respectively. Using the obtained \hat{A} and \hat{B}, the goal is still to solve (5). Since the true system matrices A and B are unknown, it may no longer be possible to solve (5) optimally using the methods introduced in Section 2.2. Thus, we aim to provide a solution to (5) using \hat{A} and \hat{B}, and characterize its performance (i.e., suboptimality) guarantees.

In the rest of this paper, we first analyze the estimation error of \hat{A} and \hat{B} obtained from the procedure described above. In particular, we show in Section 3 that the estimation errors \norm{\hat{A}-A} and \norm{\hat{B}-B} scale as \tilde{\mathcal{O}}(1/\sqrt{N}) with high probability. (Throughout this paper, \tilde{\mathcal{O}}(\cdot) hides logarithmic factors in N.) Next, in Section 4, we design a control policy \hat{u}(\cdot), based on \hat{A} and \hat{B}, which satisfies the information constraints given in (5). Supposing \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon\in\mathbb{R}_{>0}, and denoting the cost of (5) corresponding to \hat{u}(\cdot) as \hat{J}, we show in Section 5.2 that

\hat{J}-J_{\star}\leq C\varepsilon,

as long as \varepsilon\leq C_{0}, where J_{\star} is the optimal cost of (5) given by (12), and C and C_{0} are constants that explicitly depend on the problem parameters of (5). Finally, combining the above results, we show in Section 5.3 that with high probability and for large enough N, the following end-to-end sample complexity of learning decentralized LQR with the partially nested information structure holds:

\hat{J}-J_{\star}=\tilde{\mathcal{O}}\Big(\frac{1}{\sqrt{N}}\Big).

3 System Identification Using Least Squares

As described in Section 2.3, we use a least squares approach to estimate the system matrices A\in\mathbb{R}^{n\times n} and B\in\mathbb{R}^{n\times m}, based on a single system trajectory consisting of the control input sequence \{u(0),\dots,u(N-1)\} and the system state sequence \{x(0),\dots,x(N)\}, where x(0)=0 and N\in\mathbb{Z}_{\geq 1}. Here, we draw the inputs u(0),\dots,u(N-1) independently from a Gaussian distribution \mathcal{N}(0,\sigma^{2}_{u}I), where \sigma_{u}\in\mathbb{R}_{>0}. In other words, we let u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_{u}^{2}I) for all t\in\{0,\dots,N-1\}. Moreover, we assume that the input u(t) and the disturbance w(t) are independent for all t\in\{0,\dots,N-1\}. Note that we consider the scenario where the estimation of A and B is performed in a centralized manner using a least squares approach (detailed in Algorithm 1). However, we remark that Algorithm 1 can be carried out without violating the information constraints given by Eq. (4), since u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_{u}^{2}I) is not a function of the states in the information set defined in Eq. (4) for any t\in\{0,\dots,N-1\}. In the following, we present the least squares approach to estimate A and B, and characterize the corresponding estimation error.

3.1 Least Squares Estimation of System Matrices

Let us denote

\Theta=\begin{bmatrix}A&B\end{bmatrix}\quad\text{and}\quad z(t)=\begin{bmatrix}x(t)^{\top}&u(t)^{\top}\end{bmatrix}^{\top}, (13)

where \Theta\in\mathbb{R}^{n\times(n+m)} and z(t)\in\mathbb{R}^{n+m}. Given the sequences \{z(0),\dots,z(N-1)\} and \{x(1),\dots,x(N)\}, we use regularized least squares to obtain an estimate of \Theta, denoted as \hat{\Theta}(N), i.e.,

\hat{\Theta}(N)=\operatorname*{arg\,min}_{Y\in\mathbb{R}^{n\times(n+m)}}\Big\{\lambda\norm{Y}_{F}^{2}+\sum_{t=0}^{N-1}\norm{x(t+1)-Yz(t)}^{2}\Big\}, (14)

where \lambda\in\mathbb{R}_{>0} is the regularization parameter. We summarize the above least squares approach in Algorithm 1.

Algorithm 1 Least Squares Estimation of A and B
Input: regularization parameter \lambda>0 and time horizon length N
1: Initialize x(0)=0
2: for t=0,\dots,N-1 do
3:     Play u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_{u}^{2}I)
4: Obtain \hat{\Theta}(N) using (14)
5: Extract \hat{A} and \hat{B} from \hat{\Theta}(N)
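As a companion to Algorithm 1, the following NumPy sketch simulates the excitation phase and solves (14) in closed form. It is our own illustration: the function name and the choice \sigma_{w}=1 are assumptions made for brevity.

```python
import numpy as np

def least_squares_id(A, B, N, sigma_u=1.0, lam=1e-3, rng=None):
    """Algorithm 1: excite the system with i.i.d. Gaussian inputs and
    return the regularized least squares estimates of A and B."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = B.shape
    x = np.zeros(n)
    Z, X_next = [], []
    for _ in range(N):
        u = sigma_u * rng.standard_normal(m)
        w = rng.standard_normal(n)           # sigma_w = 1 for simplicity
        x_next = A @ x + B @ u + w
        Z.append(np.concatenate([x, u]))     # z(t) = [x(t)^T u(t)^T]^T
        X_next.append(x_next)
        x = x_next
    Z, X_next = np.array(Z), np.array(X_next)
    # Closed form of (14):
    # Theta_hat = (sum_t x(t+1) z(t)^T)(lam I + sum_t z(t) z(t)^T)^{-1}.
    Theta_hat = X_next.T @ Z @ np.linalg.inv(lam * np.eye(n + m) + Z.T @ Z)
    return Theta_hat[:, :n], Theta_hat[:, n:]   # A_hat, B_hat
```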

3.2 Least Squares Estimation Error

In order to characterize the estimation error of \hat{\Theta}(N) given by (14), we will use the following result from [10], which is a consequence of [2, Theorem 1].

Lemma 3.

[10, Lemma 6] For any t\in\mathbb{Z}_{\geq 1}, let V(t)=\lambda I+\sum_{k=0}^{t-1}z(k)z(k)^{\top} and \Delta(t)=\Theta-\hat{\Theta}(t), where \Theta and z(k) are given in (13), \hat{\Theta}(t) is given by (14), and \lambda\in\mathbb{R}_{>0}. Then, for any \delta_{\Theta}>0, the following holds with probability at least 1-\delta_{\Theta}:

\text{Tr}\big(\Delta(t)^{\top}V(t)\Delta(t)\big)\leq 4\sigma_{w}^{2}n\log\bigg(\frac{n}{\delta_{\Theta}}\frac{\det(V(t))}{\det(\lambda I)}\bigg)+2\lambda\norm{\Theta}_{F}^{2},\quad\forall t\in\mathbb{Z}_{\geq 1}.

For any \delta>0, we now introduce the following probabilistic events that will be useful in our later analysis:

\begin{split}\mathcal{E}_{w}&=\bigg\{\max_{0\leq t\leq N-1}\norm{w(t)}\leq\sigma_{w}\sqrt{5n\log\frac{4N}{\delta}}\bigg\},\\ \mathcal{E}_{u}&=\bigg\{\max_{0\leq t\leq N-1}\norm{u(t)}\leq\sigma_{u}\sqrt{5m\log\frac{4N}{\delta}}\bigg\},\\ \mathcal{E}_{\Theta}&=\bigg\{\text{Tr}\big(\Delta(N)^{\top}V(N)\Delta(N)\big)\leq 4\sigma_{w}^{2}n\log\bigg(\frac{4n}{\delta}\frac{\det(V(N))}{\det(\lambda I)}\bigg)+2\lambda\norm{\Theta}_{F}^{2}\bigg\},\\ \mathcal{E}_{z}&=\bigg\{\sum_{t=0}^{N-1}z(t)z(t)^{\top}\succeq\frac{(N-1)\underline{\sigma}^{2}}{40}I\bigg\},\end{split} (15)

where \underline{\sigma}\triangleq\min\{\sigma_{w},\sigma_{u}\}. Denoting

\mathcal{E}=\mathcal{E}_{w}\cap\mathcal{E}_{u}\cap\mathcal{E}_{\Theta}\cap\mathcal{E}_{z}, (16)

we have the following result; the proof is included in Appendix A.

Lemma 4.

For any \delta>0 and any N\geq 200(n+m)\log\frac{48}{\delta}, it holds that \mathbb{P}(\mathcal{E})\geq 1-\delta.

For the analysis in the sequel, we will make the following assumption, which is also made in related literature (see e.g., [24, 37, 43]).

Assumption 3.

The system matrix A\in\mathbb{R}^{n\times n} is stable, and \norm{A^{k}}\leq\kappa_{0}\gamma_{0}^{k} for all k\in\mathbb{Z}_{\geq 0}, where \kappa_{0}\geq 1 and \rho(A)<\gamma_{0}<1.

Note that for any stable matrix A, the Gelfand formula (e.g., [22]) guarantees that there always exist \kappa_{0}\in\mathbb{R}_{\geq 1} and \gamma_{0}\in\mathbb{R} with \rho(A)<\gamma_{0}<1 such that \norm{A^{k}}\leq\kappa_{0}\gamma_{0}^{k} for all k\in\mathbb{Z}_{\geq 0}. We then have the following results; the proofs are included in Appendix A.

Lemma 5.

Suppose Assumption 3 holds. On the event \mathcal{E} defined in Eq. (16),

\norm{z(t)}\leq\frac{5\kappa_{0}}{1-\gamma_{0}}\overline{\sigma}\sqrt{(\norm{B}^{2}m+m+n)\log\frac{4N}{\delta}}, (17)

for all t\in\{0,\dots,N-1\}, where N\geq 1, \overline{\sigma}=\max\{\sigma_{w},\sigma_{u}\}, \gamma_{0} and \kappa_{0} are given in Assumption 3, and z(t)=\begin{bmatrix}x(t)^{\top}&u(t)^{\top}\end{bmatrix}^{\top}, with x(t)\in\mathbb{R}^{n} and u(t)\in\mathbb{R}^{m} the state and input of the system in Eq. (3), respectively, corresponding to Algorithm 1.

Proposition 1.

Suppose Assumption 3 holds, and \norm{A}\leq\vartheta and \norm{B}\leq\vartheta, where \vartheta\in\mathbb{R}_{>0}. Consider any \delta>0. Let the input parameters to Algorithm 1 satisfy N\geq 200(n+m)\log\frac{48}{\delta} and \lambda\geq\underline{\sigma}^{2}/40, where \underline{\sigma}=\min\{\sigma_{w},\sigma_{u}\}. Define

z_{b}=\frac{5\kappa_{0}}{1-\gamma_{0}}\overline{\sigma}\sqrt{(\norm{B}^{2}m+m+n)\log\frac{4N}{\delta}},

where \kappa_{0} and \gamma_{0} are given in Assumption 3, and \overline{\sigma}=\max\{\sigma_{w},\sigma_{u}\}. Then, with probability at least 1-\delta, it holds that \norm{\hat{A}-A}\leq\varepsilon_{0} and \norm{\hat{B}-B}\leq\varepsilon_{0}, where \hat{A} and \hat{B} are returned by Algorithm 1, and

\varepsilon_{0}=4\sqrt{\frac{160}{N\underline{\sigma}^{2}}\bigg(2n\sigma_{w}^{2}(n+m)\log\frac{N+z^{2}_{b}/\lambda}{\delta}+\lambda n\vartheta^{2}\bigg)}. (18)
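The bound in Proposition 1 is fully explicit, so it can be evaluated numerically. The helper below is our own illustration, with each argument corresponding to a symbol in Proposition 1:

```python
import numpy as np

def epsilon_0(N, n, m, delta, sigma_w, sigma_u, kappa_0, gamma_0, norm_B,
              vartheta, lam):
    """Evaluate z_b and the estimation-error bound eps_0 of Eq. (18)."""
    sig_lo, sig_hi = min(sigma_w, sigma_u), max(sigma_w, sigma_u)
    z_b = (5 * kappa_0 / (1 - gamma_0)) * sig_hi * np.sqrt(
        (norm_B**2 * m + m + n) * np.log(4 * N / delta))
    eps = 4 * np.sqrt((160 / (N * sig_lo**2)) * (
        2 * n * sigma_w**2 * (n + m) * np.log((N + z_b**2 / lam) / delta)
        + lam * n * vartheta**2))
    return z_b, eps
```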

Several remarks pertaining to Algorithm 1 and the result in Proposition 1 are now in order. First, note that while considering the problem of learning centralized LQR without any information constraints, the authors in [12] proposed to obtain \hat{A} and \hat{B} from multiple system trajectories using least squares, where each trajectory starts from x(0)=0. They showed that \norm{\hat{A}-A}=\mathcal{O}(1/\sqrt{N_{r}}) and \norm{\hat{B}-B}=\mathcal{O}(1/\sqrt{N_{r}}), where N_{r}\in\mathbb{Z}_{\geq 1} is the number of system trajectories. In contrast, we estimate A and B from a single system trajectory, and achieve \norm{\hat{A}-A}=\tilde{\mathcal{O}}(1/\sqrt{N}) and \norm{\hat{B}-B}=\tilde{\mathcal{O}}(1/\sqrt{N}).

Second, note that we use regularized least squares in Algorithm 1 to obtain the estimates \hat{A} and \hat{B}. Although least squares without regularization can also be used to obtain estimates \hat{A} and \hat{B} from a single system trajectory with the same \tilde{\mathcal{O}}(1/\sqrt{N}) finite sample guarantee (e.g., [36]), we choose to use the regularized least squares considered in, e.g., [2, 10, 9]. The reason is that introducing the regularization makes the finite sample analysis more tractable (e.g., [10, 9]), which facilitates the adaptation of the analyses in [10, 9] to the setting described in this section. Moreover, note that the lower bound on \lambda required in Proposition 1 is merely used to guarantee that the right-hand side of Eq. (18) contains the factor 1/\sqrt{N}; choosing an arbitrary \lambda\in\mathbb{R}_{>0} leads to a factor 1/\sqrt{N-1}. In general, one can show that choosing any \lambda\in\mathbb{R}_{>0} leads to the same \tilde{\mathcal{O}}(1/\sqrt{N}) finite sample guarantee.

Third, note that we do not leverage the block structure (i.e., sparsity pattern) of A and B described in Section 2.2 when we obtain \hat{A} and \hat{B} using Algorithm 1. Therefore, the sparsity pattern of \hat{A} and \hat{B} may potentially be inconsistent with that of A and B. Nonetheless, such a potential inconsistency does not play any role in our later analysis. The reason is that the control policy to be proposed in Section 4 does not depend on the sparsity pattern of \hat{A} and \hat{B}. Moreover, when analyzing the suboptimality of the proposed control policy in Section 5, we only leverage the fact that the estimation error corresponding to submatrices of \hat{A} (resp., \hat{B}) is upper bounded by \norm{\hat{A}-A} (resp., \norm{\hat{B}-B}). Specifically, considering any nodes s,r in the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) given by (7), one can show that \norm{\hat{A}_{sr}-A_{sr}}\leq\norm{\hat{A}-A}, where we recall that \hat{A}_{sr} (resp., A_{sr}) is the submatrix of \hat{A} (resp., A) that corresponds to the nodes of the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) contained in s and r.

Finally, we remark that one may also use system identification schemes, and the associated sample complexity analyses, dedicated to sparse system matrices (e.g., [14]). Under some extra assumptions on A and B (e.g., [14]), one may then obtain \hat{A} and \hat{B} that have the same sparsity pattern as A and B, and remove the logarithmic factor in N in \varepsilon_{0} defined in Proposition 1. However, the assumptions on A and B made in, e.g., [14] can be restrictive and hard to check in practice.

4 Control Policy Design

While the estimation of A and B is performed in a centralized manner as discussed in Section 3.1, we assume that each controller i\in\mathcal{V} receives the estimates \hat{A} and \hat{B} after the system identification step described in Algorithm 1. Given the matrices \hat{A}, \hat{B}, Q, and R, and the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) (\mathcal{V}=[p]) with the delay matrix D, in this section we design a control policy that can be implemented in a decentralized manner while satisfying the information constraints described in Section 2.2. To this end, we leverage the structure of the optimal policy u^{\star}(\cdot) given in Lemma 2 (when A and B are known). Note that the optimal policy u^{\star}(\cdot) cannot be applied in our scenario, since only \hat{A} and \hat{B} are available.

First, given the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) with \mathcal{V}=[p] and the delay matrix D, we construct the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) given by (7). Recall from Remark 1 that \mathcal{P}(\mathcal{U},\mathcal{H}) is a forest that contains a set of disconnected directed trees. We then let \mathcal{L} denote the set of all leaf nodes in \mathcal{P}(\mathcal{U},\mathcal{H}), i.e.,

\mathcal{L}=\{s_{i}(0)\in\mathcal{U}:i\in\mathcal{V}\}. (19)

Moreover, for any s\in\mathcal{U}, we denote

\mathcal{L}_{s}=\{v\in\mathcal{L}:v\rightsquigarrow s\}, (20)

where we write v\rightsquigarrow s if and only if there is a unique directed path from node v to node s in \mathcal{P}(\mathcal{U},\mathcal{H}). In other words, \mathcal{L}_{s} is the set of leaf nodes in \mathcal{P}(\mathcal{U},\mathcal{H}) that can reach s. Moreover, for any v,s\in\mathcal{U} such that v\rightsquigarrow s, we let l_{vs} denote the length of the unique directed path from v to s in \mathcal{P}(\mathcal{U},\mathcal{H}); we let l_{vs}=0 if v=s. For example, in the information graph (associated with Example 1) given in Fig. 2, we have \mathcal{L}=\{\{1\},\{1,2\},\{3\}\}, \mathcal{L}_{\{1,2,3\}}=\{\{1,2\},\{3\}\}, and l_{\{1,2\}\{1,2,3\}}=1.

Next, in order to leverage the structure of the optimal policy u^{\star}(\cdot) given in Eqs. (8)-(11), we substitute (submatrices of) \hat{A} and \hat{B} into the right-hand sides of Eqs. (8)-(9), and obtain \hat{K}_{r} and \hat{P}_{r} for all r\in\mathcal{U}. Specifically, for all r\in\mathcal{U}, we obtain \hat{K}_{r} and \hat{P}_{r} recursively as

\hat{K}_{r}=-(R_{rr}+\hat{B}_{sr}^{\top}\hat{P}_{s}\hat{B}_{sr})^{-1}\hat{B}_{sr}^{\top}\hat{P}_{s}\hat{A}_{sr}, (21)
\hat{P}_{r}=Q_{rr}+\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}+(\hat{A}_{sr}+\hat{B}_{sr}\hat{K}_{r})^{\top}\hat{P}_{s}(\hat{A}_{sr}+\hat{B}_{sr}\hat{K}_{r}), (22)

where for each r\in\mathcal{U}, we let s\in\mathcal{U} be the unique node such that r\rightarrow s, and \hat{A}_{sr} (resp., \hat{B}_{sr}) is a submatrix of \hat{A} (resp., \hat{B}) obtained in the same manner as A_{sr} (resp., B_{sr}) described before. Similarly to Eq. (10), we then use \hat{K}_{r} for all r\in\mathcal{U}, together with \hat{A} and \hat{B}, to maintain an (estimated) internal state \hat{\zeta}_{r}(t) (to be defined later) for all r\in\mathcal{U} and all t\in\{0,\dots,T-1\}, which, via a form similar to Eq. (11), leads to our control policy, denoted as \hat{u}_{i}(t), for all i\in\mathcal{V} and all t\in\{0,\dots,T-1\}. Specifically, for all i\in\mathcal{V} in parallel, we propose Algorithm 2 to compute the control policy

\hat{u}_{i}(t)=\sum_{r\ni i}I_{\{i\},r}\hat{K}_{r}\hat{\zeta}_{r}(t)\quad\forall t\in\{0,\dots,T-1\}. (23)

Algorithm 2 Control policy design for node i\in\mathcal{V}
Input: estimates \hat{A} and \hat{B}, cost matrices Q and R, directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) with \mathcal{V}=[p] and delay matrix D, time horizon length T
1: Construct the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) from (7)
2: Obtain \hat{K}_{s} for all s\in\mathcal{U} from Eq. (21)
3: Initialize \mathcal{M}_{i}\leftarrow\bar{\mathcal{M}}_{i}
4: for t=0,\dots,T-1 do
5:     for s\in\mathcal{L}(\mathcal{T}_{i}) do
6:         Find s_{j}(0)\in\mathcal{U} s.t. j\in\mathcal{V} and s_{j}(0)=s
7:         Obtain \hat{w}_{j}(t-D_{ij}-1) from Eq. (29)
8:         Obtain \hat{\zeta}_{s}(t-D_{ij}) from Eq. (28)
9:         \mathcal{M}_{i}\leftarrow\mathcal{M}_{i}\cup\{\hat{\zeta}_{s}(t-D_{ij})\}
10:     for s\in\mathcal{R}(\mathcal{T}_{i}) do
11:         Obtain \hat{\zeta}_{s}(t-D_{\max}) from Eq. (28)
12:         \mathcal{M}_{i}\leftarrow\mathcal{M}_{i}\cup\{\hat{\zeta}_{s}(t-D_{\max})\}
13:     Play \hat{u}_{i}(t)=\sum_{r\ni i}I_{\{i\},r}\hat{K}_{r}\hat{\zeta}_{r}(t)
14:     \mathcal{M}_{i}\leftarrow\mathcal{M}_{i}\setminus\big(\{\hat{\zeta}_{s}(t-2D_{\max}-1):s\in\mathcal{L}(\mathcal{T}_{i})\}\cup\{\hat{\zeta}_{s}(t-D_{\max}-1):s\in\mathcal{R}(\mathcal{T}_{i})\}\big)

We now describe the notation used in Algorithm 2 and hereafter. Consider any i\in\mathcal{V}. In Algorithm 2, we let \mathcal{T}_{i} denote the set of disconnected directed trees in \mathcal{P}(\mathcal{U},\mathcal{H}) whose root node contains i. Slightly abusing the notation, we also let \mathcal{T}_{i} denote the set of nodes of all the trees in \mathcal{T}_{i}. Moreover, we denote

\mathcal{L}(\mathcal{T}_{i})=\mathcal{T}_{i}\cap\mathcal{L}, (24)

where \mathcal{L} is defined in Eq. (19), i.e., \mathcal{L}(\mathcal{T}_{i}) is the set of leaf nodes of all the trees in \mathcal{T}_{i}. Letting \mathcal{R}\subseteq\mathcal{U} be the set of root nodes in \mathcal{P}(\mathcal{U},\mathcal{H}), we denote

\mathcal{R}(\mathcal{T}_{i})=\mathcal{T}_{i}\cap\mathcal{R}, (25)

where we recall from Lemma 1 that any root node in \mathcal{P}(\mathcal{U},\mathcal{H}) has a self loop. We then see from the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) given in Fig. 2 that

\mathcal{L}(\mathcal{T}_{1})=\{\{1\},\{1,2\},\{3\}\},\quad\mathcal{L}(\mathcal{T}_{2})=\mathcal{L}(\mathcal{T}_{3})=\{\{1,2\},\{3\}\},
\mathcal{R}(\mathcal{T}_{1})=\{\{1,2,3\}\},\quad\mathcal{R}(\mathcal{T}_{2})=\mathcal{R}(\mathcal{T}_{3})=\{\{1,2,3\}\}.

Note that if any node s\in\mathcal{T}_{i} is a leaf node with a self loop (i.e., s is an isolated node in \mathcal{P}(\mathcal{U},\mathcal{H})), such as the node \{1\} in Fig. 2, we only include s in \mathcal{L}(\mathcal{T}_{i}) (i.e., s\in\mathcal{L}(\mathcal{T}_{i}) but s\notin\mathcal{R}(\mathcal{T}_{i})).

Furthermore, we denote

D_{\max}=\max_{\substack{i,j\in\mathcal{V}\\ j\rightsquigarrow i}}D_{ij}, (26)

where we write j\rightsquigarrow i if and only if there is a directed path from node j to node i in \mathcal{G}(\mathcal{V},\mathcal{A}), and recall that D_{ij} is the smallest cumulative delay over directed paths from j to i. Finally, the memory \mathcal{M}_{i} of Algorithm 2 is initialized as \mathcal{M}_{i}=\bar{\mathcal{M}}_{i} with

\bar{\mathcal{M}}_{i}=\big\{\hat{\zeta}_{s}(k):k\in\{-2D_{\max}-1,\dots,-D_{ij}-1\},\ s\in\mathcal{L}(\mathcal{T}_{i}),\ j\in\mathcal{V},\ s_{j}(0)=s\big\}\cup\{\hat{\zeta}_{s}(-D_{\max}-1):s\in\mathcal{R}(\mathcal{T}_{i})\}, (27)

where we initialize \hat{\zeta}_{s}(k)=0 for all \hat{\zeta}_{s}(k)\in\bar{\mathcal{M}}_{i}.

Remark 3.

For any s,r\in\mathcal{L}(\mathcal{T}_{i}), let j_{1},j_{2}\in\mathcal{V} be such that s_{j_{1}}(0)=s and s_{j_{2}}(0)=r. In Algorithm 2, we assume that the elements in \mathcal{L}(\mathcal{T}_{i}) are ordered such that if D_{ij_{1}}>D_{ij_{2}}, then s comes before r in \mathcal{L}(\mathcal{T}_{i}). We then let the for loop in lines 5-9 of Algorithm 2 iterate over the elements in \mathcal{L}(\mathcal{T}_{i}) according to the above order. Considering node 2 in the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) in Example 1, we see from Fig. 1 and Fig. 2 that \mathcal{L}(\mathcal{T}_{2})=\{\{1,2\},\{3\}\}, where s_{2}(0)=\{1,2\} and s_{3}(0)=\{3\}. Since D_{22}=0 and D_{23}=1, the elements in \mathcal{L}(\mathcal{T}_{2}) are ordered such that \mathcal{L}(\mathcal{T}_{2})=\{\{3\},\{1,2\}\}.

Remark 4.

Recall that each edge in \mathcal{G}(\mathcal{V},\mathcal{A}) is associated with a delay of either 0 or 1. In the scenario with only sparsity constraints (e.g., [35]), i.e., when all the edges in \mathcal{G}(\mathcal{V},\mathcal{A}) have zero delay, we see that D_{ij}=0 for all i,j\in\mathcal{V} such that j\rightsquigarrow i, which implies via Eq. (26) that D_{\max}=0.

For any r\ni i, the dynamics of the internal state \hat{\zeta}_{r}(t) are given by

\hat{\zeta}_{r}(t+1)=\sum_{v\rightarrow r}(\hat{A}_{rv}+\hat{B}_{rv}\hat{K}_{v})\hat{\zeta}_{v}(t)+\sum_{w_{j}\rightarrow r}I_{r,\{j\}}\hat{w}_{j}(t), (28)

where \hat{w}_{j}(t) is an estimate of the disturbance w_{j}(t) in Eq. (2), obtained as

\hat{w}_{j}(t)=\begin{cases}0&\text{if }t<-1,\\ x_{j}(0)&\text{if }t=-1,\\ x_{j}(t+1)-\hat{A}_{j}x_{\mathcal{N}_{j}}(t)-\hat{B}_{j}\hat{u}_{\mathcal{N}_{j}}(t)&\text{if }t\geq 0,\end{cases} (29)

where we replace A_{j} and B_{j} in Eq. (2) with the estimates \hat{A}_{j} and \hat{B}_{j}, respectively, and \hat{u}_{\mathcal{N}_{j}}(t) is the vector that collects \hat{u}_{j_{1}}(t) for all j_{1}\in\mathcal{N}_{j}, with \mathcal{N}_{j} given in Eq. (1). We note from Eqs. (28)-(29) that \hat{\zeta}_{r}(0)=\sum_{w_{j}\rightarrow r}I_{r,\{j\}}x_{j}(0), where x(0)=0 as assumed previously. We emphasize that Eqs. (28)-(29) are key to our control policy design, and they also enable our analyses in Section 5, where we provide a suboptimality guarantee for our control policy. As mentioned in Section 1, the motivation for the control policy \hat{u}(\cdot) given by Eqs. (23) and (28)-(29) is that the optimal control policy given in Lemma 2 can be viewed as a disturbance-feedback controller. Since the system matrices A and B are unknown, the control policy \hat{u}(\cdot) constructed in Eqs. (23) and (28)-(29) maps the estimates of the past disturbances given by Eq. (29) to the current control input via the estimated internal states given by Eq. (28).
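To summarize how Eqs. (23), (28), and (29) fit together, the following centralized rollout sketch simulates the proposed policy. It is our own illustration under simplifying assumptions stated in the docstring; it is not the decentralized implementation of Algorithm 2.

```python
import numpy as np

def simulate_policy(A, B, A_hat, B_hat, K_hat, U, succ, w_into, T, rng=None):
    """Centralized rollout of the policy in Eqs. (23), (28)-(29).

    Assumptions (ours, for brevity): scalar subsystems (n_i = m_i = 1),
    so each selector I_{s,{i}} reduces to a coordinate pick, and A_hat,
    B_hat carry the sparsity pattern of A, B, so the stacked-form
    disturbance estimate below agrees with the per-subsystem Eq. (29).
    succ[r] is the unique s with r -> s; w_into[r] is the i with
    w_i -> r, if any.  U is a collection of frozensets of node indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = A.shape[0]
    x = np.zeros(p)
    zeta = {r: np.zeros(len(r)) for r in U}   # estimated internal states
    for t in range(T):
        # Eq. (23): u_i(t) = sum over r containing i of I_{{i},r} K_hat_r zeta_r(t).
        u = np.zeros(p)
        for r in U:
            u[sorted(r)] += K_hat[r] @ zeta[r]
        w = rng.standard_normal(p)
        x_next = A @ x + B @ u + w
        # Eq. (29) in stacked form: w_hat(t) = x(t+1) - A_hat x(t) - B_hat u(t).
        w_hat = x_next - A_hat @ x - B_hat @ u
        # Eq. (28): propagate each internal state along its tree edge.
        zeta_next = {r: np.zeros(len(r)) for r in U}
        for v in U:
            r, rows, cols = succ[v], sorted(succ[v]), sorted(v)
            AK = A_hat[np.ix_(rows, cols)] + B_hat[np.ix_(rows, cols)] @ K_hat[v]
            zeta_next[r] += AK @ zeta[v]
        for r in U:
            if w_into.get(r) is not None:     # leaf r = s_i(0), with w_i -> r
                i = w_into[r]
                zeta_next[r][sorted(r).index(i)] += w_hat[i]
        zeta, x = zeta_next, x_next
    return x
```

Note that the update loop reproduces Observation 1 below automatically: leaves receive only the disturbance term, non-leaves receive only predecessor terms, and an isolated node receives both.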

Observation 1.

From the structure of the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) defined in (7), the following hold:
(a) If r is not a leaf node in \mathcal{P}(\mathcal{U},\mathcal{H}), Eq. (28) reduces to \hat{\zeta}_{r}(t+1)=\sum_{v\rightarrow r}(\hat{A}_{rv}+\hat{B}_{rv}\hat{K}_{v})\hat{\zeta}_{v}(t).
(b) If r is a leaf node in \mathcal{P}(\mathcal{U},\mathcal{H}) that is not isolated, Eq. (28) reduces to \hat{\zeta}_{r}(t+1)=\sum_{w_{j}\rightarrow r}I_{r,\{j\}}\hat{w}_{j}(t).
(c) If r is an isolated node in \mathcal{P}(\mathcal{U},\mathcal{H}), Eq. (28) reduces to \hat{\zeta}_{r}(t+1)=(\hat{A}_{rr}+\hat{B}_{rr}\hat{K}_{r})\hat{\zeta}_{r}(t)+\sum_{w_{j}\rightarrow r}I_{r,\{j\}}\hat{w}_{j}(t).

We will show that in each iteration t\in\{0,\dots,T-1\} of the for loop in lines 4-14 of Algorithm 2, the internal states \hat{\zeta}_{r}(t) for all r\in\mathcal{U} such that i\in r (i.e., for all r\ni i) can be determined, via Eq. (28), based on the current memory \mathcal{M}_{i} of the algorithm and the state information contained in (a subset of) the information set \mathcal{I}_{i}(t) defined in Eq. (4). As we will see, Algorithm 2 maintains, in its current memory \mathcal{M}_{i}, the internal states (with potential time delays) for a certain subset of nodes in \mathcal{U}, via the recursion in Eq. (28). Given those internal states, \hat{\zeta}_{r}(t) for all r\ni i can be determined using Eq. (28). Moreover, the memory \mathcal{M}_{i} of Algorithm 2 is recursively updated in the for loop in lines 4-14 of the algorithm. Formally, we have the following result for Algorithm 2; the proof can be found in Appendix B.

Proposition 2.

Suppose that any controller i\in\mathcal{V} at any time step t\in\mathbb{Z}_{\geq 0} has access to the states in \tilde{\mathcal{I}}_{i}(t) defined as

\tilde{\mathcal{I}}_{i}(t)=\big\{x_{j}(k):j\in\mathcal{V}_{i},k\in\{t-D_{\max}-1,\dots,t-D_{ij}\}\big\}\subseteq\mathcal{I}_{i}(t), (30)

where \mathcal{V}_{i}=\{j\in\mathcal{V}:D_{ij}\neq+\infty\}, and \mathcal{I}_{i}(t) is defined in Eq. (4). Then, the following properties hold for Algorithm 2:
(a) The memory \mathcal{M}_{i} of Algorithm 2 can be recursively updated such that at the beginning of any iteration t\in\{0,\dots,T-1\} of the for loop in lines 4-14 of the algorithm,

\mathcal{M}_{i}=\big\{\hat{\zeta}_{s}(k):k\in\{t-2D_{\max}-1,\dots,t-D_{ij}-1\},s\in\mathcal{L}(\mathcal{T}_{i}),j\in\mathcal{V},s_{j}(0)=s\big\}\cup\{\hat{\zeta}_{s}(t-D_{\max}-1):s\in\mathcal{R}(\mathcal{T}_{i})\}. (31)

(b) The control input \hat{u}_{i}(t) in line 13 can be determined using Eq. (28) and the states in the memory \mathcal{M}_{i} after line 12 (and before line 14) in any iteration t\in\{0,\dots,T-1\} of the for loop in lines 4-14 of Algorithm 2.

Since the proof of Proposition 2 is rather involved and requires careful consideration of the structures of the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) and the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) described in Section 2.2, we again use Example 1 to illustrate the steps of Algorithm 2 and the results and proof ideas of Proposition 2.

First, we note from Fig. 1 and Eq. (26) that D_{\max}=1. Now, let us consider Algorithm 2 with respect to node 2 in the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}) given in Fig. 1. We see that \mathcal{V}_{2}=\{j\in\mathcal{V}:D_{2j}\neq+\infty\}=\{2,3\}, which implies via Eq. (30) that \tilde{\mathcal{I}}_{2}(t)=\{x_{2}(t-2),x_{2}(t-1),x_{2}(t),x_{3}(t-2),x_{3}(t-1)\} for all t\in\{0,\dots,T-1\}. One can check that the initial memory \bar{\mathcal{M}}_{2} of Algorithm 2 given by Eq. (27) satisfies Eq. (31) for t=0, which implies that the memory \mathcal{M}_{2} satisfies Eq. (31) at the beginning of iteration t=0 of the for loop in lines 4-14 of the algorithm.

To proceed, let us consider iteration t=0 of the for loop in lines 4-14 of the algorithm. Noting that \mathcal{L}(\mathcal{T}_{2})=\{\{3\},\{1,2\}\} from Remark 3, Algorithm 2 first considers s=\{3\} in the for loop in lines 5-9, which implies that j=3 in line 7. We then see from Eq. (29) that in order to obtain \hat{w}_{3}(t-2), we need to know x_{3}(t-1), x_{3}(t-2), x_{2}(t-2), \hat{u}_{3}(t-2) and \hat{u}_{2}(t-2), where x_{3}(t-1),x_{3}(t-2),x_{2}(t-2)\in\tilde{\mathcal{I}}_{2}(t), and \hat{u}_{2}(t-2),\hat{u}_{3}(t-2) are given by Eq. (23). One can then check that the internal states \hat{\zeta}_{r}(t^{\prime}) that are needed to determine \hat{u}_{2}(t-2) and \hat{u}_{3}(t-2) are available in the current memory \mathcal{M}_{2} of Algorithm 2 or become available via further applications of Eq. (28). After \hat{w}_{3}(t-2) is obtained, we see from Eq. (28) that \hat{\zeta}_{\{3\}}(t-1) can also be obtained. Algorithm 2 then updates its current memory \mathcal{M}_{2} in line 9 and finishes the iteration with respect to s=\{3\} in the for loop in lines 5-9. Next, Algorithm 2 considers s=\{1,2\} in the for loop in lines 5-9, which implies that j=2 in line 7. Following similar arguments to those above for s=\{3\}, and noting that the current memory \mathcal{M}_{2} of Algorithm 2 has been updated, one can show that \hat{\zeta}_{\{1,2\}}(t) can be obtained from Eq. (28), based on the current memory of the algorithm. Algorithm 2 again updates its current memory \mathcal{M}_{2} in line 9 and finishes the iteration with respect to s=\{1,2\} in the for loop in lines 5-9.

Now, recalling that \mathcal{R}(\mathcal{T}_{2})=\{\{1,2,3\}\} from Fig. 2, we see that Algorithm 2 considers s=\{1,2,3\} in line 10. One can also check that \hat{\zeta}_{\{1,2,3\}}(t) can be obtained from Eq. (28), based on the current memory of the algorithm. Finally, based on the current memory \mathcal{M}_{2} of Algorithm 2 after line 12, one can check that the control input \hat{u}_{2}(t) can be determined from Eq. (23). Note that Algorithm 2 also removes from its current memory, in line 14, certain internal states that will no longer be used. One can check that after the removal, the current memory \mathcal{M}_{2} of Algorithm 2 satisfies Eq. (31) at the beginning of iteration t+1 of the for loop in lines 4-14 of the algorithm, where t=0. One can then repeat the above arguments for iteration t=1 of the for loop in lines 4-14 of the algorithm, and so on.

Several remarks pertaining to Algorithm 2 are now in order. First, since |\mathcal{L}(\mathcal{T}_{i})|\leq p and |\mathcal{R}(\mathcal{T}_{i})|\leq p, one can show via the definition of Algorithm 2 that the number of internal states in the memory \mathcal{M}_{i} of Algorithm 2 is always upper bounded by (2D_{\max}+2)p+2p, where we note that D_{\max} defined in Eq. (26) satisfies D_{\max}\leq p, and p is the number of nodes in the directed graph \mathcal{G}(\mathcal{V},\mathcal{A}). Moreover, one can check that Algorithm 2 can be implemented in polynomial time.

Second, it is worth noting that the control policy \hat{u}_{i}(\cdot) for all i\in\mathcal{V} that we proposed in Eq. (23) is related to the certainty equivalence approach (e.g., [4]) that has been used for learning centralized LQR without any information constraints on the controllers (e.g., [12, 29, 9]). It is known that the optimal solution to the classic centralized LQR problem (i.e., problem (5) without the information constraints) is given by a static state-feedback controller u^{\star}(t)=Kx(t), where K can be obtained from the solution to the Riccati equation corresponding to A, B, Q and R (e.g., [5]). The corresponding certainty equivalent controller simply takes the form \hat{u}(t)=\hat{K}x(t), where \hat{K} is obtained from the solution to the Riccati equation corresponding to \hat{A}, \hat{B}, Q and R, with \hat{A} and \hat{B} the estimates of A and B, respectively. While we also leverage the structure of the optimal control policy u^{\star}(\cdot) given in Eq. (11), we cannot simply replace K_{r} with \hat{K}_{r} for all r\in\mathcal{U} in Eq. (11), where \hat{K}_{r} is given by the Riccati equations in Eqs. (21)-(22). As we argued in Remark 2, this is because u^{\star}(\cdot) is not a static state-feedback controller, but a linear dynamic controller based on the internal states \zeta_{r}(\cdot) for all r\in\mathcal{U}, where the dynamics of \zeta_{r}(\cdot) given by Eq. (10) also depend on A and B. Thus, the control policy \hat{u}_{i}(\cdot) that we proposed in Eq. (23) is a linear dynamic controller based on \hat{K}_{r} and the estimated internal states \hat{\zeta}_{r}(\cdot) for all r\in\mathcal{U}, where the dynamics of \hat{\zeta}_{r}(\cdot) given by Eq. (28) depend on \hat{A} and \hat{B}. This more involved form of \hat{u}_{i}(\cdot) also creates several challenges when we analyze the corresponding suboptimality guarantees in the next section.
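For comparison, a minimal sketch of the certainty equivalent recipe for the centralized problem is given below; it assumes SciPy is available, and it is not the decentralized policy of Eq. (23), which additionally propagates the estimated internal states.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def ce_gain(A_hat, B_hat, Q, R):
    """Certainty-equivalent LQR gain K_hat, with u(t) = K_hat x(t).

    In the decentralized setting, the analogous computation is applied
    blockwise to (A_rr, B_rr) for nodes r with a self loop, cf. Eqs. (21)-(22).
    """
    P_hat = solve_discrete_are(A_hat, B_hat, Q, R)  # discrete Riccati solution
    return -np.linalg.solve(R + B_hat.T @ P_hat @ B_hat,
                            B_hat.T @ P_hat @ A_hat)
```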

Third, for any i\in\mathcal{V} and any t\in\{0,\dots,T-1\}, Proposition 2 only requires controller i to have access to a subset of the state information contained in the information set \mathcal{I}_{i}(t).

Finally, we remark that Algorithm 2 is not the only way to implement the control policy \hat{u}_{i}(\cdot) given in Eq. (23) under the information constraints on each controller i\in\mathcal{V}.

5 Suboptimality Guarantees

In this section, we characterize the suboptimality guarantees of the control policy \hat{u}(\cdot) proposed in Section 4. To begin, in order to explicitly distinguish the states of the system in Eq. (3) corresponding to the control policies u^{\star}(\cdot) and \hat{u}(\cdot) given by Eqs. (11) and (23), respectively, we let \hat{x}(t) denote the state of the system in Eq. (3) corresponding to the control policy \hat{u}(\cdot) given by Eq. (23), for t\in\mathbb{Z}_{\geq 0}, i.e.,

\hat{x}(t+1)=A\hat{x}(t)+B\hat{u}(t)+w(t), (32)

where we note from Eq. (23) that \hat{u}(t)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\hat{K}_{s}\hat{\zeta}_{s}(t), with \hat{K}_{s} and \hat{\zeta}_{s}(t) given by Eqs. (21) and (28), respectively, for all s\in\mathcal{U}. We let x(t) denote the state of the system in Eq. (3) corresponding to the optimal control policy u^{\star}(t) given by Eq. (11), for t\in\mathbb{Z}_{\geq 0}, i.e.,

x(t+1)=Ax(t)+Bu^{\star}(t)+w(t), (33)

where u^{\star}(t)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}K_{s}\zeta_{s}(t), with K_{s} and \zeta_{s}(t) given by Eqs. (8) and (10), respectively, for all s\in\mathcal{U}. In Eqs. (32)-(33), we set \hat{x}(0)=x(0)=0.

Moreover, for our analysis in the sequel, we introduce another control policy \tilde{u}(t) given by

\tilde{u}_{i}(t)=\sum_{s\ni i}I_{\{i\},s}\hat{K}_{s}\tilde{\zeta}_{s}(t)\quad\forall i\in\mathcal{V}, (34)

for t\in\mathbb{Z}_{\geq 0}, where for any s\in\mathcal{U}, \hat{K}_{s} is given by Eq. (21), and \tilde{\zeta}_{s}(t) is given by

\tilde{\zeta}_{s}(t+1)=\sum_{r\rightarrow s}(A_{sr}+B_{sr}\hat{K}_{r})\tilde{\zeta}_{r}(t)+\sum_{w_{i}\rightarrow s}I_{s,\{i\}}w_{i}(t), (35)

with \tilde{\zeta}_{s}(0)=\sum_{w_{i}\rightarrow s}I_{s,\{i\}}x_{i}(0)=0. We then let \tilde{x}(t) denote the state of the system in Eq. (3) corresponding to \tilde{u}_{i}(\cdot), for t\in\mathbb{Z}_{\geq 0}, i.e.,

\tilde{x}(t+1)=A\tilde{x}(t)+B\tilde{u}(t)+w(t), (36)

where \tilde{u}(t)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\hat{K}_{s}\tilde{\zeta}_{s}(t) from Eq. (34), and we set \tilde{x}(0)=x(0)=0. Roughly speaking, the auxiliary control policy \tilde{u}_{i}(\cdot) and the corresponding internal state \tilde{\zeta}_{s}(\cdot) introduced above allow us to decompose the suboptimality gap \hat{J}-J_{\star} of the control policy \hat{u}(\cdot) into two terms that are due to \hat{K}_{s} and \hat{\zeta}_{s}(\cdot), respectively, for all s\in\mathcal{U}. We then have the following result; the proof follows directly from [26, Lemma 14] and is thus omitted. Note that Lemma 6 is a consequence of the partially nested information structure and the structure of the information graph described in Section 2.2.

Lemma 6.

For any t\in\mathbb{Z}_{\geq 0}, the following hold: (a) \mathbb{E}[\tilde{\zeta}_{s}(t)]=0 for all s\in\mathcal{U}; (b) \tilde{x}(t)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\tilde{\zeta}_{s}(t); (c) \tilde{\zeta}_{s_{1}}(t) and \tilde{\zeta}_{s_{2}}(t) are independent for all s_{1},s_{2}\in\mathcal{U} with s_{1}\neq s_{2}.
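To illustrate the recursion in Eq. (35) and the decomposition in Lemma 6(b), the following sketch simulates the internal states for a hypothetical two-node path r \rightarrow s in the information graph; all matrices below are stand-ins chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 1000

# Hypothetical data: s is a root node (self loop), r is a leaf driven by w_i.
A_sr, B_sr = 0.3 * np.eye(n), 0.1 * np.eye(n)
A_ss, B_ss = 0.6 * np.eye(n), 0.2 * np.eye(n)
K_r, K_s = -0.2 * np.eye(n), -0.4 * np.eye(n)   # stand-ins for K_hat_r, K_hat_s

zeta_r, zeta_s = np.zeros(n), np.zeros(n)        # zeta_tilde(0) = 0
for t in range(T):
    w = rng.normal(size=n)                       # w_i(t), lifted into node r
    # Eq. (35): update s using the old value of zeta_r, then update the leaf.
    zeta_s = (A_ss + B_ss @ K_s) @ zeta_s + (A_sr + B_sr @ K_r) @ zeta_r
    zeta_r = w

# By Lemma 6(b), x_tilde(t) is recovered as the sum of the lifted states,
# here I_{V,r} zeta_r(t) + I_{V,s} zeta_s(t).
```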

Using the above notation, the cost of the optimization problem in (5) corresponding to the control policy \hat{u}(\cdot) (i.e., \hat{J}) can be written as

\hat{J}=\limsup_{T\to\infty}\mathbb{E}\Big[\frac{1}{T}\sum_{t=0}^{T-1}\big(\hat{x}(t)^{\top}Q\hat{x}(t)+\hat{u}(t)^{\top}R\hat{u}(t)\big)\Big], (37)

where we use \limsup instead of \lim since the limit may not exist. Furthermore, we let \tilde{J} denote the cost of the optimization problem in (5) corresponding to the control policy \tilde{u}(\cdot) given in Eq. (34); we will show in Proposition 3 that the limit in Eq. (38) exists:

\tilde{J}=\lim_{T\to\infty}\mathbb{E}\Big[\frac{1}{T}\sum_{t=0}^{T-1}\big(\tilde{x}(t)^{\top}Q\tilde{x}(t)+\tilde{u}(t)^{\top}R\tilde{u}(t)\big)\Big]. (38)

Supposing that the estimates \hat{A} and \hat{B} satisfy \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon with \varepsilon\in\mathbb{R}_{>0}, our ultimate goal in this section is to provide an upper bound on the suboptimality gap \hat{J}-J_{\star}, where J_{\star} is the optimal cost given by Eq. (12). To this end, we first decompose \hat{J}-J_{\star} into \tilde{J}-J_{\star} and \hat{J}-\tilde{J}, and then upper bound \tilde{J}-J_{\star} and \hat{J}-\tilde{J} separately. Such a decomposition of \hat{J}-J_{\star} is enabled by the structure of the control policy \hat{u}(\cdot) described in Section 4. Specifically, one may view \tilde{J}-J_{\star} as the suboptimality due to \hat{K}_{r} given by Eq. (21) for all r\in\mathcal{U}, and view \hat{J}-\tilde{J} as the suboptimality due to \hat{\zeta}_{r}(t) given by Eq. (28) for all r\in\mathcal{U} and for all t\in\mathbb{Z}_{\geq 0}. Moreover, the suboptimality introduced by \hat{\zeta}_{r}(t) is due to the fact that the dynamics of \hat{\zeta}_{r}(t) given in Eq. (28) for all r\in\mathcal{U} are characterized by \hat{A}, \hat{B} and \hat{w}(t), where \hat{w}(t) given by Eq. (29) is an estimate of the disturbance w(t) in Eq. (3).

To proceed, we recall from Lemma 2 that for any s\in\mathcal{U} that has a self loop, the matrix A_{ss}+B_{ss}K_{s} is stable, where K_{s} is given by Eq. (8). We then have from the Gelfand formula that for any s\in\mathcal{U} that has a self loop, there exist \kappa_{s}\in\mathbb{R}_{\geq 1} and \gamma_{s}\in\mathbb{R} with \rho(A_{ss}+B_{ss}K_{s})<\gamma_{s}<1 such that \norm{(A_{ss}+B_{ss}K_{s})^{k}}\leq\kappa_{s}\gamma_{s}^{k} for all k\in\mathbb{Z}_{\geq 0}. For notational simplicity, let us denote

\gamma=\max\big\{\max_{s\in\mathcal{R}}\gamma_{s},\gamma_{0}\big\},\quad\kappa=\max\big\{\max_{s\in\mathcal{R}}\kappa_{s},\kappa_{0}\big\}, (39)

where \mathcal{R}\subseteq\mathcal{U} denotes the set of root nodes in \mathcal{U}, and \kappa_{0}\in\mathbb{R}_{\geq 1} and \gamma_{0}\in\mathbb{R} with \rho(A)<\gamma_{0}<1 are given in Assumption 3. Thus, we see from Assumption 3 and the above arguments that \norm{(A_{ss}+B_{ss}K_{s})^{k}}\leq\kappa\gamma^{k} for all s\in\mathcal{R} and for all k\in\mathbb{Z}_{\geq 0}, and \norm{A^{k}}\leq\kappa\gamma^{k} for all k\in\mathbb{Z}_{\geq 0}, where \kappa\in\mathbb{R}_{\geq 1} and 0<\gamma<1. Moreover, we denote

\Gamma=\max\big\{\norm{A},\norm{B},\max_{s\in\mathcal{U}}\norm{P_{s}},\max_{s\in\mathcal{U}}\norm{K_{s}}\big\},\quad\tilde{\Gamma}=\Gamma+1. (40)
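The constants \kappa and \gamma above are existence statements, but for a given stable matrix one can estimate a valid pair numerically; the sketch below (our own diagnostic, not a step of the paper's method) picks \gamma as the midpoint between the spectral radius and 1 and fits the smallest \kappa over a finite horizon.

```python
import numpy as np

def decay_envelope(M, k_max=200):
    """Return (kappa, gamma) with ||M^k||_2 <= kappa * gamma^k for k <= k_max,
    assuming rho(M) < 1; gamma is set halfway between rho(M) and 1."""
    rho = max(abs(np.linalg.eigvals(M)))
    assert rho < 1.0, "M must be stable"
    gamma = 0.5 * (rho + 1.0)
    kappa, Mk = 1.0, np.eye(M.shape[0])
    for k in range(1, k_max + 1):
        Mk = Mk @ M
        kappa = max(kappa, np.linalg.norm(Mk, 2) / gamma**k)
    return kappa, gamma
```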

For our analysis in this section, we will make the following assumption; similar assumptions can be found in, e.g., [10, 29, 12].

Assumption 4.

The cost matrices R and Q in (5) satisfy \sigma_{n}(R)\geq 1 and \sigma_{m}(Q)\geq 1.

Note that the above assumption is no more restrictive than assuming that R and Q are positive definite. Specifically, supposing R\succ 0 and Q\succ 0, one can assume without loss of generality that \sigma_{n}(R)\geq 1 and \sigma_{m}(Q)\geq 1, since one can check that scaling the objective function in (5) by a positive constant does not change K_{r} in the optimal solution to (5) provided in Lemma 2, for any r\in\mathcal{U}.
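The rescaling argument can be made explicit with a short sketch: dividing both cost matrices by the smaller of their minimum singular values leaves the optimal gains unchanged while enforcing the assumption.

```python
import numpy as np

def normalize_costs(Q, R):
    """Rescale (Q, R) so that their smallest singular values are at least 1;
    scaling the objective in (5) by 1/c does not change the optimal K_r."""
    c = min(np.linalg.svd(Q, compute_uv=False)[-1],
            np.linalg.svd(R, compute_uv=False)[-1])
    assert c > 0, "Q and R must be positive definite"
    return Q / c, R / c
```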

5.1 Perturbation Bounds on Solutions to Riccati Equations

Supposing \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon with \varepsilon\in\mathbb{R}_{>0}, in this subsection we aim to provide upper bounds on the perturbations \norm{\hat{P}_{r}-P_{r}} and \norm{\hat{K}_{r}-K_{r}} for all r\in\mathcal{U}, where P_{r} (resp., \hat{P}_{r}) is given by Eq. (9) (resp., Eq. (22)), and K_{r} (resp., \hat{K}_{r}) is given by Eq. (8) (resp., Eq. (21)). We note from Lemma 2 that for any r\in\mathcal{U} that has a self loop, Eq. (9) (resp., Eq. (22)) reduces to a discrete Riccati equation in P_{r} (resp., \hat{P}_{r}). The following results characterize the bounds on \norm{\hat{P}_{r}-P_{r}} and \norm{\hat{K}_{r}-K_{r}} for all r\in\mathcal{U}; the proofs can be found in Appendix C.

Lemma 7.

Suppose Assumptions 2 and 4 hold, and \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon\in\mathbb{R}_{>0}. Then, for any r\in\mathcal{U} that has a self loop, the following hold:

\norm{\hat{P}_{r}-P_{r}}\leq\frac{6\kappa^{2}}{1-\gamma^{2}}\tilde{\Gamma}^{5}(1+\sigma_{1}(R^{-1}))\varepsilon\leq\frac{1}{6}, (41)

\norm{\hat{K}_{r}-K_{r}}\leq\frac{18\kappa^{2}}{1-\gamma^{2}}\tilde{\Gamma}^{8}(1+\sigma_{1}(R^{-1}))\varepsilon\leq 1, (42)

and

\norm{(A_{rr}+B_{rr}\hat{K}_{r})^{k}}\leq\kappa\Big(\frac{\gamma+1}{2}\Big)^{k},\ \forall k\in\mathbb{Z}_{\geq 0}, (43)

under the assumption that

\varepsilon\leq\frac{1}{768}\frac{(1-\gamma^{2})^{2}}{\kappa^{4}}\tilde{\Gamma}^{-11}(1+\sigma_{1}(R^{-1}))^{-2}, (44)

where P_{r} (resp., \hat{P}_{r}) is given by Eq. (9) (resp., Eq. (22)), K_{r} (resp., \hat{K}_{r}) is given by Eq. (8) (resp., Eq. (21)), \gamma and \kappa are defined in (39), and \tilde{\Gamma} is defined in (40).

Lemma 8.

Suppose Assumptions 2 and 4 hold, and \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon\in\mathbb{R}_{>0}. Then, for any r\in\mathcal{U} that does not have a self loop, the following hold:

\norm{\hat{K}_{r}-K_{r}}\leq\frac{18\kappa^{2}}{1-\gamma^{2}}\tilde{\Gamma}^{8}(1+\sigma_{1}(R^{-1}))(20\tilde{\Gamma}^{9}\sigma_{1}(R))^{l_{rs}-1}\varepsilon\leq 1, (45)

and

\norm{\hat{P}_{r}-P_{r}}\leq\frac{6\kappa^{2}}{1-\gamma^{2}}\tilde{\Gamma}^{5}(1+\sigma_{1}(R^{-1}))(20\tilde{\Gamma}^{9}\sigma_{1}(R))^{l_{rs}}\varepsilon\leq\frac{1}{6}, (46)

under the assumption that

\varepsilon\leq\frac{1}{768}\frac{(1-\gamma^{2})^{2}}{\kappa^{4}}\tilde{\Gamma}^{-11}(1+\sigma_{1}(R^{-1}))^{-2}(20\tilde{\Gamma}^{9}\sigma_{1}(R))^{-D_{\max}}, (47)

where K_{r} (resp., \hat{K}_{r}) is given by Eq. (8) (resp., Eq. (21)), P_{r} (resp., \hat{P}_{r}) is given by Eq. (9) (resp., Eq. (22)), \tilde{\Gamma} is defined in (40), \kappa and \gamma are defined in (39), l_{rs} is the length of the unique directed path in \mathcal{P}(\mathcal{U},\mathcal{H}) from node r to node s, where s\in\mathcal{U} is the unique root node that is reachable from r, and D_{\max} is defined in Eq. (26).

Consider any r\in\mathcal{U} with a self loop and suppose Eq. (44) holds. One can show via Eq. (43) and [29, Lemma 12] that \hat{K}_{r} given by Eq. (21) is also stabilizing for the pair (\hat{A}_{rr},\hat{B}_{rr}), i.e., \hat{A}_{rr}+\hat{B}_{rr}\hat{K}_{r} is stable (see our arguments for (102) in Appendix D for more details). Moreover, it is well known (e.g., [5]) that a stabilizing solution \hat{P}_{r} to the Riccati equation in Eq. (22) exists if and only if (\hat{A}_{rr},\hat{B}_{rr}) is stabilizable and (\hat{A}_{rr},C_{rr}) (with Q_{rr}=C_{rr}^{\top}C_{rr}) is detectable; recall that a solution \hat{P}_{r} to the Riccati equation in Eq. (19) is stabilizing if and only if \hat{A}_{rr}+\hat{B}_{rr}\hat{K}_{r} (with \hat{K}_{r} given by Eq. (18)) is stable. The above arguments together also imply that (\hat{A}_{rr},\hat{B}_{rr}) is stabilizable and (\hat{A}_{rr},C_{rr}) (with Q_{rr}=C_{rr}^{\top}C_{rr}) is detectable for all r\in\mathcal{U}, under the assumption on \varepsilon given by Eq. (44).
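In practice, the stability claim for the certainty equivalent closed loop can be checked directly; the following sketch is a numerical sanity check we add for illustration, verifying that \hat{A}_{rr}+\hat{B}_{rr}\hat{K}_{r} has spectral radius less than one.

```python
import numpy as np

def is_stabilizing(A_hat_rr, B_hat_rr, K_hat_r):
    """Check that A_hat_rr + B_hat_rr K_hat_r is stable, i.e., that K_hat_r
    is stabilizing for the estimated pair (A_hat_rr, B_hat_rr)."""
    rho = max(abs(np.linalg.eigvals(A_hat_rr + B_hat_rr @ K_hat_r)))
    return rho < 1.0
```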

5.2 Perturbation Bounds on Costs

Suppose \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon\in\mathbb{R}_{>0}. In this subsection, we aim to provide an upper bound on \hat{J}-J_{\star} that scales linearly with \varepsilon, where J_{\star} and \hat{J} are given by Eqs. (12) and (37), respectively.

Lemma 9.

Suppose Assumptions 2 and 4 hold, and \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon\in\mathbb{R}_{>0} satisfies (47). Then, for any s\in\mathcal{U},

\lim_{t\to\infty}\mathbb{E}\big[\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\big]\preceq\frac{4p\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}I, (48)

where p=|\mathcal{V}|, \kappa and \gamma are defined in (39), \tilde{\Gamma} is defined in (40), and D_{\max} is defined in Eq. (26).

For our analysis in the sequel, we further define \tilde{P}_{r} recursively, for all r\in\mathcal{U}, as

\tilde{P}_{r}=Q_{rr}+\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}+(A_{sr}+B_{sr}\hat{K}_{r})^{\top}\tilde{P}_{s}(A_{sr}+B_{sr}\hat{K}_{r}), (49)

where \hat{K}_{r} is given by Eq. (21), and s\in\mathcal{U} is the unique node such that r\rightarrow s. We then have the following result, which gives an upper bound on \tilde{J}-J_{\star}.
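Since each non-root node r has a unique out-neighbor, \tilde{P}_{r} can be computed by walking the information graph from the roots toward the leaves; a sketch of one step of Eq. (49) is given below (illustrative names, not the paper's code).

```python
import numpy as np

def p_tilde_step(Q_rr, R_rr, K_hat_r, A_sr, B_sr, P_tilde_s):
    """One step of the recursion in Eq. (49): given P_tilde_s for the unique
    node s with r -> s, return P_tilde_r. For a root node (r -> r), iterating
    this map to a fixed point recovers P_tilde_r."""
    Acl = A_sr + B_sr @ K_hat_r
    return Q_rr + K_hat_r.T @ R_rr @ K_hat_r + Acl.T @ P_tilde_s @ Acl
```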

Proposition 3.

Suppose Assumptions 2 and 4 hold, and \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon\in\mathbb{R}_{>0} satisfies (47). It holds that

\tilde{J}=\sigma_{w}^{2}\sum_{\begin{subarray}{c}i\in\mathcal{V}\\ w_{i}\rightarrow s\end{subarray}}\text{Tr}\big(I_{\{i\},s}\tilde{P}_{s}I_{s,\{i\}}\big), (50)

where \tilde{J} is defined in Eq. (38). Moreover, consider the optimal cost J_{\star} given by Eq. (12). For any \varphi\in\mathbb{R}_{>0},

\tilde{J}-J_{\star}\leq\frac{72\kappa^{4}\sigma_{w}^{2}npq}{(1-\gamma^{2})^{2}}\tilde{\Gamma}^{4D_{\max}+8}(\Gamma^{3}+\sigma_{1}(R))(1+\sigma_{1}(R^{-1}))(20\tilde{\Gamma}^{9}\sigma_{1}(R))^{D_{\max}}\varepsilon+\varphi,

where \kappa and \gamma are defined in (39), p=|\mathcal{V}| and q=|\mathcal{U}|, D_{\max} is defined in Eq. (26), and \Gamma and \tilde{\Gamma} are defined in (40).

Next, we aim to provide an upper bound on \hat{J}-\tilde{J}. We first prove the following result.

Lemma 10.

Suppose Assumptions 2 and 4 hold, and \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, where \varepsilon satisfies (47). Then, for any s\in\mathcal{U} and for any t\in\mathbb{Z}_{\geq 0},

\mathbb{E}\big[\norm{\tilde{\zeta}_{s}(t)}^{2}\big]\leq\frac{4np\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}, (51)

where \tilde{\zeta}_{s}(t) is given in Eq. (35), p=|\mathcal{V}|, \kappa and \gamma are defined in (39), \tilde{\Gamma} is defined in (40), and D_{\max} is defined in Eq. (26). Moreover, for any t\in\mathbb{Z}_{\geq 0},

\mathbb{E}\big[\norm{\tilde{x}(t)}^{2}\big]\leq\frac{4npq^{2}\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}, (52)

and

\mathbb{E}\big[\norm{\tilde{u}(t)}^{2}\big]\leq\frac{4npq^{2}\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}+2}\kappa^{2}}{1-\gamma^{2}}, (53)

where \tilde{x}(t) and \tilde{u}(t) are given by Eqs. (36) and (34), respectively, and q=|\mathcal{U}|.

For notational simplicity in the sequel, let us denote

\zeta_{b}=\sqrt{\frac{4np\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}},\quad\bar{\varepsilon}=\frac{(1-\gamma)^{3}}{768\kappa^{4}pq}(\tilde{\Gamma}+1)^{-2}\tilde{\Gamma}^{-9}(1+\sigma_{1}(R^{-1}))^{-2}(20(\tilde{\Gamma}+1)^{2}\tilde{\Gamma}^{7}\sigma_{1}(R))^{-D_{\max}}. (54)
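Both constants in (54) are explicit in the problem parameters, so they can be evaluated directly; the sketch below transcribes the two formulas, which is useful for checking how the admissible estimation error \bar{\varepsilon} shrinks with D_{\max}, p and q.

```python
import numpy as np

def constants_eq54(n, p, q, sigma_w, Gamma_tilde, kappa, gamma, D_max,
                   sigma1_R, sigma1_Rinv):
    """Evaluate zeta_b and epsilon_bar from Eq. (54)."""
    zeta_b = np.sqrt(4 * n * p * sigma_w**2 * Gamma_tilde**(4 * D_max)
                     * kappa**2 / (1 - gamma**2))
    eps_bar = ((1 - gamma)**3 / (768 * kappa**4 * p * q)
               * (Gamma_tilde + 1)**(-2) * Gamma_tilde**(-9)
               * (1 + sigma1_Rinv)**(-2)
               * (20 * (Gamma_tilde + 1)**2 * Gamma_tilde**7 * sigma1_R)**(-D_max))
    return zeta_b, eps_bar
```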

We then have the following results.

Lemma 11.

Suppose Assumptions 2-4 hold, and \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}, where \bar{\varepsilon} is defined in (54). Then, for all t\in\mathbb{Z}_{\geq 0},

\mathbb{E}\big[\norm{\hat{u}(t)-\tilde{u}(t)}^{2}\big]\leq\bigg(\frac{58\kappa^{2}(\tilde{\Gamma}+1)^{2D_{\max}+3}p^{2}q^{2}}{(1-\gamma)^{2}}\zeta_{b}\bar{\varepsilon}\bigg)^{2}, (55)

and

\mathbb{E}\big[\norm{\hat{x}(t)-\tilde{x}(t)}^{2}\big]\leq\bigg(\frac{58\kappa^{3}\Gamma(\tilde{\Gamma}+1)^{2D_{\max}+3}p^{2}q^{2}}{(1-\gamma)^{3}}\zeta_{b}\bar{\varepsilon}\bigg)^{2}, (56)

where \hat{u}(t) (resp., \tilde{u}(t)) is given by Eq. (23) (resp., Eq. (34)), \hat{x}(t) (resp., \tilde{x}(t)) is given by Eq. (32) (resp., Eq. (36)), \Gamma and \tilde{\Gamma} are defined in (40), \kappa and \gamma are defined in (39), p=|\mathcal{V}| and q=|\mathcal{U}|, D_{\max} is defined in Eq. (26), and \zeta_{b} is defined in (54).

Corollary 1.

Suppose Assumptions 2-4 hold, and \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}, where \bar{\varepsilon} is defined in (54). Then, for all t\in\mathbb{Z}_{\geq 0},

\mathbb{E}\big[\norm{\hat{x}(t)}^{2}\big]\leq\bigg(\frac{58\kappa^{3}\Gamma(\tilde{\Gamma}+1)^{2D_{\max}+3}p^{2}q^{2}}{(1-\gamma)^{3}}\zeta_{b}\bar{\varepsilon}+q\zeta_{b}\bigg)^{2}, (57)

and

\mathbb{E}\big[\norm{\hat{u}(t)}^{2}\big]\leq\bigg(\frac{58\kappa^{2}(\tilde{\Gamma}+1)^{2D_{\max}+3}p^{2}q^{2}}{(1-\gamma)^{2}}\zeta_{b}\bar{\varepsilon}+q\tilde{\Gamma}\zeta_{b}\bigg)^{2}, (58)

where \hat{u}(t) is given by Eq. (23), \hat{x}(t) is given by Eq. (32), \Gamma and \tilde{\Gamma} are defined in (40), \kappa and \gamma are defined in (39), p=|\mathcal{V}| and q=|\mathcal{U}|, D_{\max} is defined in Eq. (26), and \zeta_{b} is given by (54).

Proof.

Note that

\sqrt{\mathbb{E}\big[\norm{\hat{x}(t)}^{2}\big]}=\sqrt{\mathbb{E}\big[\norm{\hat{x}(t)-\tilde{x}(t)+\tilde{x}(t)}^{2}\big]}\leq\sqrt{\mathbb{E}\big[\norm{\hat{x}(t)-\tilde{x}(t)}^{2}\big]}+\sqrt{\mathbb{E}\big[\norm{\tilde{x}(t)}^{2}\big]},

where the inequality follows from Lemma 14. The proof of (57) now follows directly from Lemmas 10-11. Similarly, we can prove (58). ∎

Proposition 4.

Suppose Assumptions 2-4 hold, and \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}, where \bar{\varepsilon} is defined in Eq. (54). Then, for \hat{J} and \tilde{J} defined in Eqs. (37) and (38), respectively,

\hat{J}-\tilde{J}\leq\frac{696\kappa^{6}\sigma_{w}^{2}np^{4}q^{3}}{(1-\gamma)^{4}(1-\gamma^{2})}\tilde{\Gamma}^{4D_{\max}+2}(\tilde{\Gamma}+1)^{2D_{\max}+3}(\sigma_{1}(Q)+\sigma_{1}(R))\bar{\varepsilon}, (59)

where \kappa and \gamma are defined in (39), p=|\mathcal{V}| and q=|\mathcal{U}|, \tilde{\Gamma} is defined in (40), and D_{\max} is defined in Eq. (26).

Suppose \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon with \varepsilon\in\mathbb{R}_{>0}. We see from the results in Propositions 3-4 that \hat{J}-J_{\star}\leq C\varepsilon whenever \varepsilon\leq C_{0}, where C and C_{0} are constants that depend on the problem parameters.

5.3 Sample Complexity Result

We are now in a position to present the sample complexity result for learning decentralized LQR with the partially nested information structure described in Section 2.2.

Theorem 1.

Suppose Assumptions 2-4 hold, and Algorithm 1 is used to obtain \hat{A} and \hat{B}. Moreover, suppose \norm{A}\leq\vartheta and \norm{B}\leq\vartheta, where \vartheta\in\mathbb{R}_{>0}, and D_{\max}\leq D, where D_{\max} is defined in Eq. (26) and D is a universal constant. Consider any \delta>0. Let the input parameters to Algorithm 1 satisfy N\geq\alpha/\bar{\varepsilon} and \lambda\geq\underline{\sigma}^{2}/40, where

z_{b}=\frac{5\kappa_{0}}{1-\gamma_{0}}\overline{\sigma}\sqrt{(\norm{B}^{2}m+m+n)\log\frac{4N}{\delta}},

and

\alpha=\frac{160}{\underline{\sigma}^{2}}\bigg(2n\sigma_{w}^{2}(n+m)\log\frac{N+z_{b}^{2}/\lambda}{\delta}+\lambda n\vartheta^{2}\bigg),

where \kappa_{0} and \gamma_{0} are given in Assumption 3, \underline{\sigma}=\min\{\sigma_{w},\sigma_{u}\}, \overline{\sigma}=\max\{\sigma_{w},\sigma_{u}\}, and \bar{\varepsilon} is defined in (54). Then, with probability at least 1-\delta,

\hat{J}-J_{\star}\leq C_{1}\frac{\kappa^{6}\sigma_{w}^{2}np^{4}q^{3}}{(1-\gamma^{2})^{2}}\tilde{\Gamma}^{11D+5}(\tilde{\Gamma}+1)^{2D+3}(\Gamma^{3}+\sigma_{1}(R)+\sigma_{1}(Q))\sigma_{1}(R)^{D}\sqrt{\frac{\alpha}{N}}, (60)

where \hat{J} and J_{\star} are given in Eqs. (37) and (12), respectively, C_{1} is a universal constant, \kappa and \gamma are defined in (39), \Gamma and \tilde{\Gamma} are defined in (40), p=|\mathcal{V}|, and q=|\mathcal{U}|.

Proof.

Note that the results in Propositions 3-4 hold if \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}, with \bar{\varepsilon} given in (54). Thus, letting N\geq\frac{\alpha}{\bar{\varepsilon}} and \lambda\geq\underline{\sigma}^{2}/40, one can first check that N\geq 200(n+m)\log\frac{48}{\delta}, and then obtain from Proposition 1 that with probability at least 1-\delta, the \hat{A} and \hat{B} returned by Algorithm 1 satisfy \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}. Now, noting that D_{\max}\leq D, where D is a universal constant, and setting \varphi=1/\sqrt{N} in Proposition 3, one can then show via Propositions 3-4 that (60) holds with probability at least 1-\delta. ∎

Thus, we have shown an \tilde{\mathcal{O}}(1/\sqrt{N}) end-to-end sample complexity result for learning decentralized LQR with the partially nested information structure. In other words, we relate the number of data samples used for estimating the system model to the performance of the control policy proposed in Section 4. Note that our result in Theorem 1 matches the \mathcal{O}(1/\sqrt{N}) sample complexity result (up to logarithmic factors in N) provided in [12] for learning centralized LQR without any information constraints. Also note that the sample complexity for learning centralized LQR has been improved to \mathcal{O}(1/N) in [29]. Specifically, the authors in [29] showed that the gap between the cost \hat{J} corresponding to the control policy they proposed and the optimal cost J_{\star} is upper bounded by \mathcal{O}(\varepsilon^{2}), where \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon, provided that \varepsilon is sufficiently small. Due to the additional challenges introduced by the information constraints on the controllers (see our discussion at the end of Section 4), we leave investigating the possibility of improving our sample complexity result in Theorem 1 to future work.

6 Numerical Results

In this section, we illustrate the sample complexity result provided in Theorem 1 with numerical experiments based on Example 1. Specifically, we consider the LTI system given by Eq. (6), with the corresponding directed graph and information graph given by Fig. 1 and Fig. 2, respectively. Under the sparsity pattern of A and B specified in Eq. (6), we generate the nonzero entries in A\in\mathbb{R}^{3\times 3} and B\in\mathbb{R}^{3\times 3} independently from the Gaussian distribution \mathcal{N}(0,1), while ensuring that Assumption 3 is satisfied. We set the covariance of the zero-mean white Gaussian noise process w(t) to be I, and set the cost matrices to be Q=2I and R=5I. Moreover, we set the input sequence used in the system identification algorithm (Algorithm 1) to be u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,I) for all t\in\{0,\dots,N-1\}. In order to approximate the value of \hat{J} defined in Eq. (37), we simulate the system using Algorithm 2 for T=2000 steps and obtain \hat{J}\approx\frac{1}{T}\sum_{t=0}^{T-1}\big(\hat{x}(t)^{\top}Q\hat{x}(t)+\hat{u}(t)^{\top}R\hat{u}(t)\big). Fixing the randomly generated matrices A and B described above, the numerical results presented in this section are obtained by averaging over 100 independent experiments.
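A condensed version of the identification step of this pipeline is sketched below; the sparsity mask and the rescaling used to enforce stability are our own stand-ins for the pattern in Eq. (6) and Assumption 3, and only the estimation step (Algorithm 1) is reproduced, not the decentralized controller of Algorithm 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, sigma_u, sigma_w = 3, 200, 1.0, 1.0

# Hypothetical sparsity mask standing in for the pattern in Eq. (6).
mask_A = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1]], dtype=bool)
A = np.where(mask_A, rng.normal(size=(n, n)), 0.0)
A *= 0.5 / max(abs(np.linalg.eigvals(A)))   # rescale so that rho(A) < 1
B = np.where(np.eye(n, dtype=bool), rng.normal(size=(n, n)), 0.0)

# Single trajectory with i.i.d. Gaussian inputs, as in Algorithm 1.
x, Z, Y = np.zeros(n), [], []
for t in range(N):
    u = sigma_u * rng.normal(size=n)
    Z.append(np.concatenate([x, u]))
    x = A @ x + B @ u + sigma_w * rng.normal(size=n)
    Y.append(x)
Z, Y = np.array(Z), np.array(Y)

# Regularized least squares: Theta_hat = (sum x+ z^T)(lambda I + sum z z^T)^{-1}.
lam = min(sigma_w, sigma_u) ** 2 / 40       # satisfies lambda >= sigma_min^2/40
Theta_hat = np.linalg.solve(lam * np.eye(2 * n) + Z.T @ Z, Z.T @ Y).T
A_hat, B_hat = Theta_hat[:, :n], Theta_hat[:, n:]
print(np.linalg.norm(np.hstack([A_hat - A, B_hat - B]), 2))  # estimation error
```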

Figure 3: The performance of Algorithm 1 (panel (a)) and the performance of Algorithm 2 (panel (b)), plotted against the number of data samples used for estimating the system model, where shaded regions display quartiles.

In Fig. 3(a), we plot the estimation error \big\lVert\begin{bmatrix}\hat{A}&\hat{B}\end{bmatrix}-\begin{bmatrix}A&B\end{bmatrix}\big\rVert corresponding to Algorithm 1 as we vary the number of data samples used in Algorithm 1 from N=20 to N=280. Similarly, in Fig. 3(b) we plot the curve corresponding to the cost suboptimality \hat{J}-J_{\star}, where J_{\star} is obtained from the closed-form expression given in Eq. (12). From Fig. 3, we observe that the estimation error and the cost suboptimality exhibit a similar dependence on N. This aligns with the results shown in Proposition 1 and Theorem 1, namely that both the estimation error and the cost suboptimality scale as \tilde{\mathcal{O}}(1/\sqrt{N}), which is a consequence of the results shown in Propositions 3-4 that the cost suboptimality scales linearly with the estimation error. The results presented in Fig. 3 thus also suggest that our suboptimality bounds in Propositions 3-4 can be tight for certain instances of the problem. Finally, we observe from the shaded regions in Fig. 3 that the cost suboptimality is more sensitive to the randomness introduced by the random input u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,I) for t\in\{0,\dots,N-1\} and the noise w(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,I) for t\in\mathbb{Z}_{\geq 0}, across the 100 experiments described above. This is potentially due to the fact that we approximated the cost suboptimality as \frac{1}{T}\sum_{t=0}^{T-1}\big(\hat{x}(t)^{\top}Q\hat{x}(t)+\hat{u}(t)^{\top}R\hat{u}(t)\big)-J_{\star} with T=2000.

7 Conclusion

We considered the problem of control policy design for decentralized state-feedback linear quadratic control with a partially nested information structure, when the system model is unknown. We took a model-based learning approach consisting of two steps. First, we estimated the unknown system model from a single system trajectory of finite length, using least squares estimation. Next, we designed a control policy based on the estimated system model, which satisfies the desired information constraints. We showed that the suboptimality gap between our control policy and the optimal decentralized control policy (designed using accurate knowledge of the system model) scales linearly with the estimation error of the system model. Combining the above results, we provided an end-to-end sample complexity result for learning decentralized controllers for state-feedback linear quadratic control with a partially nested information structure.

References

  • Abbasi-Yadkori and Szepesvári [2011] Y. Abbasi-Yadkori and C. Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proc. Conference on Learning Theory, pages 1–26, 2011.
  • Abbasi-Yadkori et al. [2011] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. In Proc. Advances in Neural Information Processing Systems, volume 24, pages 2312–2320, 2011.
  • Abbasi-Yadkori et al. [2019] Y. Abbasi-Yadkori, P. Bartlett, K. Bhatia, N. Lazic, C. Szepesvari, and G. Weisz. Politex: Regret bounds for policy iteration using expert prediction. In Proc. International Conference on Machine Learning, pages 3692–3702, 2019.
  • Åström and Wittenmark [2008] K. J. Åström and B. Wittenmark. Adaptive Control. Courier Corporation, 2008.
  • Bertsekas [2017] D. P. Bertsekas. Dynamic programming and optimal control: Vol. 2 4th Edition. Athena Scientific, 2017.
  • Blondel and Tsitsiklis [2000] V. D. Blondel and J. N. Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249–1274, 2000.
  • Bu et al. [2019] J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi. LQR through the lens of first order methods: Discrete-time case. arXiv preprint arXiv:1907.08921, 2019.
  • Cassel and Koren [2021] A. Cassel and T. Koren. Online policy gradient for model free learning of linear quadratic regulators with T\sqrt{T} regret. arXiv preprint arXiv:2102.12608, 2021.
  • Cassel et al. [2020] A. Cassel, A. Cohen, and T. Koren. Logarithmic regret for learning linear quadratic regulators efficiently. In Proc. International Conference on Machine Learning, pages 1328–1337, 2020.
  • Cohen et al. [2019] A. Cohen, T. Koren, and Y. Mansour. Learning linear-quadratic regulators efficiently with only T\sqrt{T} regret. In Proc. International Conference on Machine Learning, pages 1300–1309, 2019.
  • Dean et al. [2018] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Proc. International Conference on Neural Information Processing Systems, pages 4192–4201, 2018.
  • Dean et al. [2020] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. On the sample complexity of the linear quadratic regulator. Foundations of Computational Mathematics, 20(4):633–679, 2020.
  • Faradonbeh et al. [2018] M. K. S. Faradonbeh, A. Tewari, and G. Michailidis. Finite time identification in unstable linear systems. Automatica, 96:342–353, 2018.
  • Fattahi et al. [2019] S. Fattahi, N. Matni, and S. Sojoudi. Learning sparse dynamical systems from a single sample trajectory. In Proc. IEEE Conference on Decision and Control, pages 2682–2689, 2019.
  • Fattahi et al. [2020] S. Fattahi, N. Matni, and S. Sojoudi. Efficient learning of distributed linear-quadratic control policies. SIAM Journal on Control and Optimization, 58(5):2927–2951, 2020.
  • Fazel et al. [2018] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi. Global convergence of policy gradient methods for the linear quadratic regulator. In Proc. International Conference on Machine Learning, pages 1467–1476, 2018.
  • Feng and Lavaei [2019] H. Feng and J. Lavaei. On the exponential number of connected components for the feasible set of optimal decentralized control problems. In Proc. American Control Conference, pages 1430–1437, 2019.
  • Furieri et al. [2020] L. Furieri, Y. Zheng, and M. Kamgarpour. Learning the globally optimal distributed LQ regulator. In Proc. Learning for Dynamics and Control Conference, pages 287–297, 2020.
  • Ghadimi and Lan [2013] S. Ghadimi and G. Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
  • Gravell et al. [2020] B. Gravell, P. M. Esfahani, and T. H. Summers. Learning optimal controllers for linear systems with multiplicative noise via policy gradient. IEEE Transactions on Automatic Control, 2020.
  • Ho et al. [1972] Y.-C. Ho et al. Team decision theory and information structures in optimal control problems–Part I. IEEE Transactions on Automatic Control, 17(1):15–22, 1972.
  • Horn and Johnson [2012] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
  • Hou and Wang [2013] Z.-S. Hou and Z. Wang. From model-based control to data-driven control: Survey, classification and perspective. Information Sciences, 235:3–35, 2013.
  • Lale et al. [2020] S. Lale, K. Azizzadenesheli, B. Hassibi, and A. Anandkumar. Logarithmic regret bound in partially observable linear dynamical systems. arXiv preprint arXiv:2003.11227, 2020.
  • Lamperski and Doyle [2012] A. Lamperski and J. C. Doyle. Dynamic programming solutions for decentralized state-feedback LQG problems with communication delays. In Proc. American Control Conference, pages 6322–6327, 2012.
  • Lamperski and Lessard [2015] A. Lamperski and L. Lessard. Optimal decentralized state-feedback control with sparsity and delays. Automatica, 58:143–151, 2015.
  • Li et al. [2019] Y. Li, Y. Tang, R. Zhang, and N. Li. Distributed reinforcement learning for decentralized linear quadratic control: A derivative-free policy optimization approach. arXiv preprint arXiv:1912.09135, 2019.
  • Malik et al. [2020] D. Malik, A. Pananjady, K. Bhatia, K. Khamaru, P. L. Bartlett, and M. J. Wainwright. Derivative-free methods for policy optimization: Guarantees for linear quadratic systems. Journal of Machine Learning Research, 21(21):1–51, 2020.
  • Mania et al. [2019] H. Mania, S. Tu, and B. Recht. Certainty equivalence is efficient for linear quadratic control. In Proc. International Conference on Neural Information Processing Systems, pages 10154–10164, 2019.
  • Nesterov and Spokoiny [2017] Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
  • Papadimitriou and Tsitsiklis [1986] C. H. Papadimitriou and J. Tsitsiklis. Intractable problems in control theory. SIAM Journal on Control and Optimization, 24(4):639–654, 1986.
  • Rotkowitz and Lall [2005] M. Rotkowitz and S. Lall. A characterization of convex problems in decentralized control. IEEE Transactions on Automatic Control, 50(12):1984–1996, 2005.
  • Rotkowitz and Martins [2011] M. C. Rotkowitz and N. C. Martins. On the nearest quadratically invariant information constraint. IEEE Transactions on Automatic Control, 57(5):1314–1319, 2011.
  • Sarkar and Rakhlin [2019] T. Sarkar and A. Rakhlin. Near optimal finite time identification of arbitrary linear dynamical systems. In Proc. International Conference on Machine Learning, pages 5610–5618, 2019.
  • Shah and Parrilo [2013] P. Shah and P. A. Parrilo. 2\mathcal{H}_{2}-optimal decentralized control over posets: A state-space solution for state-feedback. IEEE Transactions on Automatic Control, 58(12):3084–3096, 2013.
  • Simchowitz et al. [2018] M. Simchowitz, H. Mania, S. Tu, M. I. Jordan, and B. Recht. Learning without mixing: Towards a sharp analysis of linear system identification. In Proc. Conference On Learning Theory, pages 439–473, 2018.
  • Simchowitz et al. [2020] M. Simchowitz, K. Singh, and E. Hazan. Improper learning for non-stochastic control. In Proc. Conference on Learning Theory, pages 3320–3436, 2020.
  • Tu and Recht [2019] S. Tu and B. Recht. The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint. In Proc. Conference on Learning Theory, pages 3036–3083, 2019.
  • Witsenhausen [1968] H. S. Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control, 6(1):131–147, 1968.
  • Yu et al. [2022] J. Yu, D. Ho, and A. Wierman. Online stabilization of unknown networked systems with communication constraints. arXiv preprint arXiv:2203.02630, 2022.
  • Zhang et al. [2020] K. Zhang, B. Hu, and T. Basar. Policy optimization for 2\mathcal{H}_{2} linear control with \mathcal{H}_{\infty} robustness guarantee: Implicit regularization and global convergence. In Proc. Learning for Dynamics and Control Conference, pages 179–190, 2020.
  • Zheng et al. [2020] Y. Zheng, L. Furieri, A. Papachristodoulou, N. Li, and M. Kamgarpour. On the equivalence of Youla, system-level, and input–output parameterizations. IEEE Transactions on Automatic Control, 66(1):413–420, 2020.
  • Zheng et al. [2021] Y. Zheng, L. Furieri, M. Kamgarpour, and N. Li. Sample complexity of linear quadratic gaussian (LQG) control for output feedback systems. In Proc. Learning for Dynamics and Control Conference, pages 559–570, 2021.

Appendix

Appendix A Proofs for Least Squares Estimation of System Matrices

A.1 Proof of Lemma 4

First, let us consider \mathcal{E}_{w}. We can apply Lemma 12 with \delta_{w}=\frac{\delta}{4} and obtain that \mathbb{P}(\mathcal{E}_{w})\geq 1-\frac{\delta}{4}. Similarly, recalling from Algorithm 1 that u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_{u}^{2}I) for all t\in\{0,\dots,N-1\}, we have from Lemma 12 that \mathbb{P}(\mathcal{E}_{u})\geq 1-\frac{\delta}{4}. Next, let us consider \mathcal{E}_{\Theta}. Applying Lemma 3 with \delta_{\Theta}=\frac{\delta}{4}, we obtain \mathbb{P}(\mathcal{E}_{\Theta})\geq 1-\frac{\delta}{4}.

Finally, let us consider \mathcal{E}_{z}. Consider the sequence of random vectors \{z(t)\}_{t\geq 0} and the filtration \{\mathcal{F}_{t}\}_{t\geq 0}, where z(t)\in\mathbb{R}^{n+m} is defined in (13), and \mathcal{F}_{t}=\sigma(x(0),u(0),\dots,x(t),u(t)) for all t\in\mathbb{Z}_{\geq 0}. For any t\in\{1,\dots,N-1\}, we note from Eq. (3) that x(t) is conditionally Gaussian given x(t-1) and u(t-1), with

\mathbb{E}\big[x(t)x(t)^{\top}|\mathcal{F}_{t-1}\big]\succeq\mathbb{E}\big[w(t-1)w(t-1)^{\top}\big]=\sigma_{w}^{2}I.

Note again from Algorithm 1 that u(t)\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma^{2}_{u}I) for all t\in\{0,\dots,N-1\}, and that u(t) is assumed to be independent of w(t) for all t\in\{0,\dots,N-1\}. We then see that z(t) is also conditionally Gaussian given x(t-1) and u(t-1), with

\mathbb{E}\big[z(t)z(t)^{\top}|\mathcal{F}_{t-1}\big]=\begin{bmatrix}\mathbb{E}[x(t)x(t)^{\top}|\mathcal{F}_{t-1}]&0\\ 0&\mathbb{E}[u(t)u(t)^{\top}]\end{bmatrix}=\begin{bmatrix}\mathbb{E}[x(t)x(t)^{\top}|\mathcal{F}_{t-1}]&0\\ 0&\sigma_{u}^{2}I\end{bmatrix}\succeq\underline{\sigma}^{2}I.

Now, we can apply Lemma 13 with \{z(t)\}_{t\geq 0} and \{\mathcal{F}_{t}\}_{t\geq 0} described above, and let \delta_{z}=\frac{\delta}{4p}. Since N\geq 200(n+m)\log\frac{48}{\delta}, we have from Lemma 13 that \sum_{t=0}^{N-1}z(t)z(t)^{\top}\succeq\frac{(N-1)\underline{\sigma}^{2}}{40}I holds with probability at least 1-\frac{\delta}{4}. Combining the above arguments and applying a union bound over the events \mathcal{E}_{w}, \mathcal{E}_{u}, \mathcal{E}_{\Theta} and \mathcal{E}_{z}, we complete the proof of the lemma. ∎

A.2 Proof of Lemma 5

First, considering any t\in\{0,\dots,N-1\}, we denote \psi(t)=Bu(t)+w(t). We see from (15) that, for any k\in\{0,\dots,N-1\},

\norm{\psi(k)}\leq\norm{B}\sigma_{u}\sqrt{5m\log\frac{4N}{\delta}}+\sigma_{w}\sqrt{5n\log\frac{4N}{\delta}}\leq\overline{\sigma}\big(\sqrt{m\norm{B}^{2}}+\sqrt{n}\big)\sqrt{5\log\frac{4N}{\delta}}\leq\overline{\sigma}\sqrt{10(m\norm{B}^{2}+n)\log\frac{4N}{\delta}}.

Next, for any t\in\{1,\dots,N-1\}, we see from Eq. (3) that

x(t)=A^{t}x(0)+\sum_{k=0}^{t-1}A^{t-1-k}\psi(k),

where we recall that we assumed x(0)=0. Since A is stable by Assumption 3, we know that \norm{A^{k}}\leq\kappa_{0}\gamma_{0}^{k} for all k\in\mathbb{Z}_{\geq 0}, where \kappa_{0}\geq 1 and \rho(A)<\gamma_{0}<1. It now follows from the above arguments that

\norm{x(t)}\leq\frac{\kappa_{0}}{1-\gamma_{0}}\overline{\sigma}\sqrt{10(m\norm{B}^{2}+n)\log\frac{4N}{\delta}},

for all t\in\{0,\dots,N-1\}. Noting that \norm{z(t)}\leq\norm{x(t)}+\norm{u(t)}, we then have

\norm{z(t)}\leq\frac{\kappa_{0}}{1-\gamma_{0}}\overline{\sigma}\sqrt{10(m\norm{B}^{2}+n)\log\frac{4N}{\delta}}+\sigma_{u}\sqrt{5m\log\frac{4N}{\delta}}\leq\frac{5\kappa_{0}}{1-\gamma_{0}}\overline{\sigma}\sqrt{(\norm{B}^{2}m+m+n)\log\frac{4N}{\delta}},

for all t\in\{0,\dots,N-1\}. ∎

A.3 Proof of Proposition 1

First, we see from Eq. (15) that on the event \mathcal{E} defined in Eq. (16), the following holds:

\text{Tr}\big(\Delta(N)^{\top}V(N)\Delta(N)\big)\leq 4\sigma_{w}^{2}n\log\bigg(\frac{4n}{\delta}\frac{\det(V(N))}{\det(\lambda I)}\bigg)+4\lambda n\vartheta^{2}, (61)

where we recall that V(N)=\lambda I+\sum_{t=0}^{N-1}z(t)z(t)^{\top} and \Delta(N)=\Theta-\hat{\Theta}(N), with \Theta and z(t) defined in (13) and \hat{\Theta}(N) given by (14). To obtain (61), we also use the fact that \norm{\Theta}^{2}\leq\norm{A}^{2}+\norm{B}^{2}\leq 2\vartheta^{2}, which implies via \Theta\in\mathbb{R}^{n\times(n+m)} that \norm{\Theta}_{F}^{2}\leq 2n\vartheta^{2}. Moreover, we have from Eq. (15) that on the event \mathcal{E}, the following holds:

V(N)\succeq\lambda I+\frac{(N-1)\underline{\sigma}^{2}}{40}I\succeq\frac{N\underline{\sigma}^{2}}{40}I,

where the second inequality follows from the choice of \lambda. Combining the above arguments, one can show that

\norm{\Delta(N)}^{2}\leq\frac{160}{N\underline{\sigma}^{2}}\bigg(n\sigma_{w}^{2}\log\bigg(\frac{4n}{\delta}\frac{\det(V(N))}{\det(\lambda I)}\bigg)+\lambda n\vartheta^{2}\bigg).

Noting from Lemma 5 that \norm{z(t)}\leq z_{b} for all t\in\{0,\dots,N-1\}, one can use similar arguments to those for [9, Lemma 37] and show that

\log\frac{\det(V(N))}{\det(\lambda I)}\leq(n+m)\log\Big(N+\frac{z_{b}^{2}}{\lambda}\Big).

Noting that N\geq 200(n+m)\log\frac{48}{\delta}, we have N\geq 4n. It then follows that

\norm{\Delta(N)}^{2}\leq\frac{160}{N\underline{\sigma}^{2}}\bigg(n\sigma_{w}^{2}\log\frac{N}{\delta}+n\sigma_{w}^{2}(n+m)\log\Big(N+\frac{z_{b}^{2}}{\lambda}\Big)+\lambda n\vartheta^{2}\bigg)\leq\frac{160}{N\underline{\sigma}^{2}}\bigg(2n\sigma_{w}^{2}(n+m)\log\frac{N+z^{2}_{b}/\lambda}{\delta}+\lambda n\vartheta^{2}\bigg).

Noting that Algorithm 1 extracts \hat{A} and \hat{B} from \hat{\Theta}(N), i.e., \hat{\Theta}(N)=\begin{bmatrix}\hat{A}&\hat{B}\end{bmatrix}, one can show that \norm{\hat{A}-A}\leq\norm{\hat{\Theta}(N)-\Theta} and \norm{\hat{B}-B}\leq\norm{\hat{\Theta}(N)-\Theta}. Finally, since we know from Lemma 4 that \mathbb{P}(\mathcal{E})\geq 1-\delta, we conclude that \norm{\hat{A}-A}\leq\varepsilon_{0} and \norm{\hat{B}-B}\leq\varepsilon_{0} hold with probability at least 1-\delta. ∎

Appendix B Proof for Algorithm 2

B.1 Proof of Proposition 2

To prove part (a), we use induction on t=0,\dots,T-1. For the base case t=0, we see directly from line 4 in Algorithm 2 and Eq. (27) that \mathcal{M}_{i} satisfies Eq. (31) (with t=0) at the beginning of iteration 0 of the for loop in lines 4-14 of the algorithm. For the induction step, consider any t\in\{0,\dots,T-1\} and suppose that the memory \mathcal{M}_{i} satisfies Eq. (31) at the beginning of iteration t of the for loop in lines 4-14 of the algorithm.

To proceed, let us consider any s\in\mathcal{L}(\mathcal{T}_{i}) with j\in\mathcal{V} and s_{j}(0)=s in the for loop in lines 5-9 of Algorithm 2, where \mathcal{L}(\mathcal{T}_{i}) is defined in Eq. (24). We will show that \hat{w}_{j}(t-D_{ij}-1) in line 7, and thus \hat{\zeta}_{s}(t-D_{ij}) in line 8, can be determined using Eq. (28) and the current memory \mathcal{M}_{i} of Algorithm 2. As suggested by the first two cases in Eq. (29), we can focus on the case when t-D_{ij}>0 (otherwise, \hat{\zeta}_{s}(t-D_{ij}) can be directly determined). We then note from the third case in Eq. (29) that in order to determine \hat{w}_{j}(t-D_{ij}-1), we need to know x_{j}(t-D_{ij}), and x_{j_{1}}(t-D_{ij}-1) and \hat{u}_{j_{1}}(t-D_{ij}-1) for all j_{1}\in\mathcal{N}_{j}, where \mathcal{N}_{j} is given in Assumption 1. Also note that D_{ij_{1}}\leq D_{ij}+1 for all j_{1}\in\mathcal{N}_{j}, which implies that t-D_{\max}-1\leq t-D_{ij}-1\leq t-D_{ij_{1}}. Thus, we have that x_{j}(t-D_{ij})\in\tilde{\mathcal{I}}_{i}(t), and x_{j_{1}}(t-D_{ij}-1)\in\tilde{\mathcal{I}}_{i}(t) for all j_{1}\in\mathcal{N}_{j}, where \tilde{\mathcal{I}}_{i}(t) is defined in Eq. (30). Now, considering any j_{1}\in\mathcal{N}_{j}, we note from Eq. (23) that \hat{u}_{j_{1}}(t-D_{ij}-1)=\sum_{r\ni j_{1}}I_{\{j_{1}\},r}\hat{K}_{r}\hat{\zeta}_{r}(t-D_{ij}-1). Thus, in order to determine \hat{u}_{j_{1}}(t-D_{ij}-1), it suffices to determine \hat{\zeta}_{r}(t-D_{ij}-1) for all r\in\mathcal{U} such that j_{1}\in r (i.e., for all r\ni j_{1}). Note that j_{1}\rightsquigarrow i, i.e., there exists a directed path from node j_{1} to node i in \mathcal{G}(\mathcal{V},\mathcal{A}). Moreover, noting the definition of \mathcal{P}(\mathcal{U},\mathcal{H}) given by (7) with its properties discussed in Lemma 1 and Remark 1, and noting the way we defined the set \mathcal{T}_{i}, one can show that for any r\ni j_{1}, it holds that r\in\mathcal{T}_{i}. Next, considering any r\ni j_{1}, we recall from Eq. (20) that \mathcal{L}_{r} denotes the set of leaf nodes that can reach r in \mathcal{P}(\mathcal{U},\mathcal{H}), i.e., \mathcal{L}_{r}=\{v\in\mathcal{L}:v\rightsquigarrow r\}=\{v\in\mathcal{L}(\mathcal{T}_{i}):v\rightsquigarrow r\}, where the second equality again follows from the properties of \mathcal{P}(\mathcal{U},\mathcal{H}) and the definition of \mathcal{L}(\mathcal{T}_{i}) in Eq. (24). Now, we split our arguments into two cases: r is a root node in \mathcal{T}_{i} (i.e., r has a self loop), and r does not have a self loop.

First, suppose r has a self loop. For the case when r is a leaf node in \mathcal{T}_{i} (i.e., r is an isolated node in \mathcal{P}(\mathcal{U},\mathcal{H}) and r\in\mathcal{L}(\mathcal{T}_{i})), we see that \hat{\zeta}_{r}(k)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (31), for all k\in\{t-2D_{\max}-1,\dots,t-D_{ij_{r}}-1\}, where j_{r}\in\mathcal{V} and s_{j_{r}}(0)=r. Noting from the definition of \mathcal{T}_{i} that i,j_{r}\in r, we have from the construction of \mathcal{P}(\mathcal{U},\mathcal{H}) in Eq. (7) that j_{r}\rightarrow i in \mathcal{G}(\mathcal{V},\mathcal{A}) with D_{ij_{r}}=0. It follows that \hat{\zeta}_{r}(t-D_{ij}-1)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (31). Thus, we focus on the case when r is not a leaf node in \mathcal{T}_{i}, i.e., r\in\mathcal{R}(\mathcal{T}_{i}). We now see from Eq. (28) that given \hat{\zeta}_{r}(t-D_{ij}-2), and \hat{\zeta}_{r^{\prime}}(t-D_{ij}-2) for all r^{\prime} such that r^{\prime}\rightarrow r (with r\neq r^{\prime}) in \mathcal{P}(\mathcal{U},\mathcal{H}), the state \hat{\zeta}_{r}(t-D_{ij}-1) can be determined. Let us consider any r^{\prime} such that r^{\prime}\rightarrow r (with r^{\prime}\neq r) in \mathcal{P}(\mathcal{U},\mathcal{H}), and denote \mathcal{L}_{r^{\prime}}=\{v^{\prime}\in\mathcal{L}(\mathcal{T}_{i}):v^{\prime}\rightsquigarrow r^{\prime}\}, where we note that \mathcal{L}_{r^{\prime}}\subseteq\mathcal{L}_{r}. For any k\in\mathbb{Z}, one can recursively apply Eq. (28) to show that given \hat{\zeta}_{v^{\prime}}(k-l_{v^{\prime}r^{\prime}}) for all v^{\prime}\in\mathcal{L}_{r^{\prime}}, the state \hat{\zeta}_{r^{\prime}}(k) can be determined, where l_{v^{\prime}r^{\prime}} is the length of the (unique) directed path from v^{\prime} to r^{\prime} in \mathcal{P}(\mathcal{U},\mathcal{H}). Further considering any v^{\prime}\in\mathcal{L}_{r^{\prime}}, and noting that v^{\prime}\in\mathcal{L}(\mathcal{T}_{i}), we have that \hat{\zeta}_{v^{\prime}}(k)\in\mathcal{M}_{i} for all k\in\{t-2D_{\max}-1,\dots,t-D_{ij_{v^{\prime}}}-1\} with j_{v^{\prime}}\in\mathcal{V} and s_{j_{v^{\prime}}}(0)=v^{\prime}, where \mathcal{M}_{i} is given by Eq. (31). Recalling again the definition of \mathcal{P}(\mathcal{U},\mathcal{H}) in (7), and noting that v^{\prime}\in\mathcal{L}_{r^{\prime}}, j_{v^{\prime}}\in v^{\prime}, j_{1}\in r, and r^{\prime}\rightarrow r in \mathcal{P}(\mathcal{U},\mathcal{H}), one can show that

D_{j_{1}j_{v^{\prime}}}\leq l_{v^{\prime}r^{\prime}}+1\leq D_{\max}. (62)

We further split our arguments into the cases D_{ij_{v^{\prime}}}\leq D_{ij} and D_{ij_{v^{\prime}}}\geq D_{ij}+1. First, supposing D_{ij_{v^{\prime}}}\leq D_{ij}, we have

\begin{split}&t-2D_{\max}-1\leq t-D_{\max}-l_{v^{\prime}r^{\prime}}-1,\\ &t-D_{ij_{v^{\prime}}}-1\geq t-D_{ij}-l_{v^{\prime}r^{\prime}}-2.\end{split} (63)

Second, suppose D_{ij_{v^{\prime}}}\geq D_{ij}+1. Recall from Remark 3 that we let the for loop in lines 5-9 of Algorithm 2 iterate over the elements in \mathcal{L}(\mathcal{T}_{i}) according to a certain order. We then see from the inequality D_{ij_{v^{\prime}}}\geq D_{ij}+1 that s_{j_{v^{\prime}}}(0)\in\mathcal{L}(\mathcal{T}_{i}) (with j_{v^{\prime}}\in\mathcal{V} and s_{j_{v^{\prime}}}(0)=v^{\prime}) has already been considered by the for loop in lines 5-9 in Algorithm 2, i.e., the states \hat{\zeta}_{v^{\prime}}(k) for all k\in\{t-2D_{\max}-1,\dots,t-D_{ij_{v^{\prime}}}\} are in the current memory of Algorithm 2, denoted as \mathcal{M}_{i}^{\prime}, when we consider the s\in\mathcal{L}(\mathcal{T}_{i}) with j\in\mathcal{V} and s_{j}(0)=s in the for loop in lines 5-9 of the algorithm. Moreover, we have from the above arguments that j_{v^{\prime}}\rightsquigarrow j_{1}\rightsquigarrow i in \mathcal{G}(\mathcal{V},\mathcal{A}), i.e., there is a directed path from node j_{v^{\prime}} to node i that goes through node j_{1} in \mathcal{G}(\mathcal{V},\mathcal{A}). It then follows that

D_{ij_{v^{\prime}}}\leq D_{ij_{1}}+D_{j_{1}j_{v^{\prime}}}\leq D_{ij}+D_{jj_{1}}+D_{j_{1}j_{v^{\prime}}}, (64)

where D_{jj_{1}}\in\{0,1\}. Combining (62) and (64), we obtain

\begin{split}&t-2D_{\max}-1\leq t-D_{\max}-l_{v^{\prime}r^{\prime}}-1,\\ &t-D_{ij_{v^{\prime}}}\geq t-D_{ij}-l_{v^{\prime}r^{\prime}}-2.\end{split} (65)

It then follows from (63) and (65) and our arguments above that the states \hat{\zeta}_{v^{\prime}}(k) for all k\in\{t-D_{\max}-l_{v^{\prime}r^{\prime}}-1,\dots,t-D_{ij}-l_{v^{\prime}r^{\prime}}-2\} are in the current memory \mathcal{M}_{i}^{\prime} described above, for all v^{\prime}\in\mathcal{L}_{r^{\prime}}. Combining the above arguments together, we have that \hat{\zeta}_{r^{\prime}}(k) can be determined from Eq. (28) and the current memory \mathcal{M}_{i}^{\prime}, for all k\in\{t-D_{\max}-1,\dots,t-D_{ij}-2\} and for all r^{\prime} such that r^{\prime}\rightarrow r (with r\neq r^{\prime}) in \mathcal{P}(\mathcal{U},\mathcal{H}). Moreover, recalling that r\in\mathcal{R}(\mathcal{T}_{i}) as we argued above, we see from Eq. (31) that \hat{\zeta}_{r}(t-D_{\max}-1)\in\mathcal{M}_{i}^{\prime}. One can now apply Eq. (28) multiple times to show that \hat{\zeta}_{r}(k) can be determined from the current memory \mathcal{M}_{i}^{\prime} described above, for all k\in\{t-D_{\max},\dots,t-D_{ij}-1\}.

Next, suppose r does not have a self loop. Similarly to our arguments above, we first consider the case when r is a leaf node in \mathcal{T}_{i}, i.e., r\in\mathcal{L}(\mathcal{T}_{i}). We see that \hat{\zeta}_{r}(t-D_{ij_{r}}-1)\in\mathcal{M}_{i}, where j_{r}\in\mathcal{V} with s_{j_{r}}(0)=r, and \mathcal{M}_{i} is defined in Eq. (31). Since j_{1},j_{r}\in r, we have from the construction of \mathcal{P}(\mathcal{U},\mathcal{H}) in (7) that j_{r}\rightarrow j_{1} in \mathcal{G}(\mathcal{V},\mathcal{A}) with D_{j_{1}j_{r}}=0. Noting that j_{1}\rightsquigarrow i in \mathcal{G}(\mathcal{V},\mathcal{A}) as we argued above, we then have the following:

D_{ij_{r}}\leq D_{ij}+D_{jj_{1}}+D_{j_{1}j_{r}}=D_{ij}+D_{jj_{1}}, (66)

where D_{jj_{1}}\in\{0,1\}. Now, supposing D_{ij_{r}}\leq D_{ij}, we see directly from Eq. (31) that \hat{\zeta}_{r}(t-D_{ij}-1)\in\mathcal{M}_{i} holds. Supposing D_{ij_{r}}\geq D_{ij}+1, we see from (66) that D_{ij_{r}}=D_{ij}+1. Using similar arguments to those above for the case when r has a self loop (particularly, the order of the elements in \mathcal{L}(\mathcal{T}_{i}) over which the for loop in lines 5-9 of Algorithm 2 iterates), one can show that the states \hat{\zeta}_{r}(k) for all k\in\{t-2D_{\max}-1,\dots,t-D_{ij_{r}}\} have been added to the current memory of Algorithm 2, denoted as \mathcal{M}_{i}^{\prime\prime}, when we consider the s\in\mathcal{L}(\mathcal{T}_{i}) with j\in\mathcal{V} and s_{j}(0)=s in the for loop in lines 5-9 of the algorithm. It follows that \hat{\zeta}_{r}(t-D_{ij}-1)\in\mathcal{M}^{\prime\prime}_{i}. Then, we consider the case when r is not a leaf node in \mathcal{T}_{i}. We see from Eq. (28) that given \hat{\zeta}_{r^{\prime}}(t-D_{ij}-2) for all r^{\prime} such that r^{\prime}\rightarrow r (with r\neq r^{\prime}) in \mathcal{P}(\mathcal{U},\mathcal{H}), the state \hat{\zeta}_{r}(t-D_{ij}-1) can be determined. The remaining arguments then follow directly from those above for the case when r has a self loop.

In summary, we have shown that \hat{\zeta}_{r}(t-D_{ij}-1) can be determined from Eq. (28) and the current memory of Algorithm 2, for all r\ni j_{1}, for all j_{1}\in\mathcal{N}_{j}, and for all s\in\mathcal{L}(\mathcal{T}_{i}) with j\in\mathcal{V} and s_{j}(0)=s. It then follows from our arguments above that \hat{w}_{j}(t-D_{ij}-1) in line 7 of Algorithm 2, and thus \hat{\zeta}_{s}(t-D_{ij}) in line 8 of Algorithm 2, can be determined using Eq. (28) and the current memory of Algorithm 2, for all s\in\mathcal{L}(\mathcal{T}_{i}) with j\in\mathcal{V} and s_{j}(0)=s. In other words, we have shown that \hat{\zeta}_{s}(t-D_{ij}) can be added to the memory of Algorithm 2 in line 9, for all s\in\mathcal{L}(\mathcal{T}_{i}) with j\in\mathcal{V} and s_{j}(0)=s.

Now, let us consider any s\in\mathcal{R}(\mathcal{T}_{i}) with \mathcal{R}(\mathcal{T}_{i}) defined in Eq. (25). We will show that \hat{\zeta}_{r}(t-D_{\max}-1) can be determined using Eq. (28) and the states in \mathcal{M}_{i} given by Eq. (31), for all r such that r\rightarrow s in \mathcal{P}(\mathcal{U},\mathcal{H}). Note from our definition of \mathcal{R}(\mathcal{T}_{i}) in Eq. (25) that s is not a leaf node. Following similar arguments to those above, let us consider any r such that r\rightarrow s (with r\neq s) in \mathcal{P}(\mathcal{U},\mathcal{H}), and denote \mathcal{L}_{r}=\{v\in\mathcal{L}(\mathcal{T}_{i}):v\rightsquigarrow r\}. Further considering any v\in\mathcal{L}_{r}, and noting that v\in\mathcal{L}(\mathcal{T}_{i}), we have that \hat{\zeta}_{v}(k)\in\mathcal{M}_{i} for all k\in\{t-2D_{\max}-1,\dots,t-D_{ij_{v}}-1\}, where \mathcal{M}_{i} is given by Eq. (31), and j_{v}\in\mathcal{V} with s_{j_{v}}(0)=v. Similarly to Eq. (62), we have that l_{vr}\leq D_{\max}, which implies that t-2D_{\max}-1\leq t-D_{\max}-l_{vr}-1. Therefore, we see that \hat{\zeta}_{v}(t-D_{\max}-l_{vr}-1)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (31), for all v\in\mathcal{L}_{r}. Using similar arguments to those above, one can now recursively apply Eq. (28) to show that \hat{\zeta}_{r}(t-D_{\max}-1) can be determined from \mathcal{M}_{i} given by Eq. (31), for all r such that r\rightarrow s (with r\neq s). Since \hat{\zeta}_{s}(t-D_{\max}-1)\in\mathcal{M}_{i}, we see from Eq. (28) that \hat{\zeta}_{s}(t-D_{\max}) can be determined from \mathcal{M}_{i} given by Eq. (31). Thus, we have shown that \hat{\zeta}_{s}(t-D_{\max}) can be added to the memory of Algorithm 2 in line 12, for all s\in\mathcal{R}(\mathcal{T}_{i}).

Combining all the above arguments together and noting line 14 in Algorithm 2, we see that at the beginning of iteration (t+1) of the for loop in lines 4-14 of Algorithm 2, the memory \mathcal{M}_{i} of the algorithm satisfies

\mathcal{M}_{i}=\big{\{}\hat{\zeta}_{s}(k):k\in\{t-2D_{\max},\dots,t-D_{ij}\},s\in\mathcal{L}(\mathcal{T}_{i}),j\in\mathcal{V},s_{j}(0)=s\big{\}}\cup\{\hat{\zeta}_{s}(t-D_{\max}):s\in\mathcal{R}(\mathcal{T}_{i})\}. (67)

This completes the induction step for the proof of Eq. (31), and thus completes the proof of part (a).

We then prove part (b). Consider any t\in\{0,1,\dots,T-1\}. In order to prove part (b), it suffices to show that \hat{\zeta}_{r}(t) for all r\ni i can be determined using Eq. (28) and the memory \mathcal{M}_{i} after line 12 (and before line 14) in iteration t of the for loop in lines 4-14 of Algorithm 2, which is given by

\mathcal{M}_{i}=\big{\{}\hat{\zeta}_{s}(k):k\in\{t-2D_{\max}-1,\dots,t-D_{ij}\},s\in\mathcal{L}(\mathcal{T}_{i}),j\in\mathcal{V},s_{j}(0)=s\big{\}}\\ \cup\{\hat{\zeta}_{s}(k):k\in\{t-D_{\max}-1,t-D_{\max}\},s\in\mathcal{R}(\mathcal{T}_{i})\}. (68)

Considering any r\ni i, one can show via the definition of \mathcal{P}(\mathcal{U},\mathcal{H}) in (7) and the definition of \mathcal{T}_{i} that r\in\mathcal{T}_{i}. Again, we split our arguments into two cases: r has a self loop, and r does not have a self loop.

First, suppose r has a self loop. For the case when r\in\mathcal{L}(\mathcal{T}_{i}) (i.e., r is an isolated node in \mathcal{P}(\mathcal{U},\mathcal{H})), we see that \hat{\zeta}_{r}(t-D_{ij_{r}})\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (68), where j_{r}\in\mathcal{V} with s_{j_{r}}(0)=r. Noting that i,j_{r}\in r, we see from the definition of \mathcal{P}(\mathcal{U},\mathcal{H}) in (7) that j_{r}\rightarrow i in \mathcal{G}(\mathcal{V},\mathcal{A}) with D_{ij_{r}}=0. It follows that \hat{\zeta}_{r}(t)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (68). Thus, we focus on the case when r is not a leaf node, i.e., r\in\mathcal{R}(\mathcal{T}_{i}). We see from Eq. (28) that given \hat{\zeta}_{r}(t-1), and \hat{\zeta}_{r^{\prime}}(t-1) for all r^{\prime}\rightarrow r (with r^{\prime}\neq r) in \mathcal{P}(\mathcal{U},\mathcal{H}), the state \hat{\zeta}_{r}(t) can be determined. Again, let us consider any r^{\prime} such that r^{\prime}\rightarrow r in \mathcal{P}(\mathcal{U},\mathcal{H}), and denote \mathcal{L}_{r^{\prime}}=\{v^{\prime}\in\mathcal{L}(\mathcal{T}_{i}):v^{\prime}\rightsquigarrow r^{\prime}\}. Further considering any v^{\prime}\in\mathcal{L}_{r^{\prime}}, and noting that v^{\prime}\in\mathcal{L}(\mathcal{T}_{i}), we have that \hat{\zeta}_{v^{\prime}}(k)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (68), for all k\in\{t-2D_{\max}-1,\dots,t-D_{ij_{v^{\prime}}}\}, where j_{v^{\prime}}\in\mathcal{V} and s_{j_{v^{\prime}}}(0)=v^{\prime}. Similarly to (62), we have that D_{ij_{v^{\prime}}}\leq l_{v^{\prime}r^{\prime}}+1\leq D_{\max}, which implies that

\begin{split}&t-2D_{\max}-1\leq t-D_{\max}-l_{v^{\prime}r^{\prime}}-1,\\ &t-D_{ij_{v^{\prime}}}\geq t-l_{v^{\prime}r^{\prime}}-1.\end{split} (69)

It then follows from (69) that \hat{\zeta}_{v^{\prime}}(k)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (68), for all k\in\{t-D_{\max}-l_{v^{\prime}r^{\prime}}-1,\dots,t-l_{v^{\prime}r^{\prime}}-1\}, and for all v^{\prime}\in\mathcal{L}_{r^{\prime}}. Using similar arguments to those before, one can recursively use Eq. (28) to show that \hat{\zeta}_{r^{\prime}}(k) can be determined from \mathcal{M}_{i} given by Eq. (68), for all k\in\{t-D_{\max}-1,\dots,t-1\}. Moreover, recalling that r\in\mathcal{R}(\mathcal{T}_{i}) as we argued above, we see that \hat{\zeta}_{r}(t-D_{\max}-1)\in\mathcal{M}_{i} with \mathcal{M}_{i} given by Eq. (68). One can then apply Eq. (28) multiple times and show that \hat{\zeta}_{r}(t) can be determined from \mathcal{M}_{i} given by Eq. (68). Next, suppose r does not have a self loop. Using similar arguments to those above for the case when r has a self loop, one can show that \hat{\zeta}_{r}(t) can be determined using Eq. (28) and the current memory \mathcal{M}_{i} given in Eq. (68).

Combining the above arguments together, we conclude that for any t\in\{0,1,\dots,T-1\}, the control input \hat{u}_{i}(t) in line 13 can be determined using Eq. (28) and the memory \mathcal{M}_{i} given by Eq. (68). This completes the proof of part (b). ∎
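To summarize the mechanism used in both parts of this proof, the following schematic sketch (with hypothetical data structures, not the exact notation of Algorithm 2) shows how the state \hat{\zeta}_{r}(k) of a node is recovered by recursing through its in-neighbors in the information graph until the recursion bottoms out at states already stored in the memory, mirroring the repeated application of Eq. (28) above.

```python
# Schematic recursion behind the proof of Proposition 2 (hypothetical
# structures): `parents[r]` lists the nodes r' with r' -> r in the
# information graph (a self loop is modeled by r being in parents[r]),
# `memory` holds stored states keyed by (node, time), and `step` applies
# one update of Eq. (28).
def zeta_hat(r, k, parents, memory, step):
    if (r, k) in memory:                       # state already in M_i
        return memory[(r, k)]
    inputs = {rp: zeta_hat(rp, k - 1, parents, memory, step)
              for rp in parents[r]}            # in-neighbor states at k-1
    memory[(r, k)] = step(r, k, inputs)        # memoize, as lines 9/12 do
    return memory[(r, k)]
```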

Appendix C Proofs for Perturbation Bounds on Solutions to Riccati Equations

C.1 Proof of Lemma 7

Consider any r\in\mathcal{U} that has a self loop. To show that (41) holds under the assumption on \varepsilon given in (44), we use [29, Proposition 2]. Specifically, since \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon hold, one can show that \norm{\hat{A}_{rr}-A_{rr}}\leq\varepsilon and \norm{\hat{B}_{rr}-B_{rr}}\leq\varepsilon hold, for all r\in\mathcal{U}. Similarly, noting that \norm{A_{rr}}\leq\norm{A} and \norm{B_{rr}}\leq\norm{B} hold, for all r\in\mathcal{U}, one can show that \norm{A_{rr}+B_{rr}K_{r}}\leq\tilde{\Gamma}^{2}. Moreover, note that 1\leq\sigma_{n}(R)\leq\sigma_{n_{r}}(R_{rr})\leq\sigma_{1}(R_{rr})\leq\sigma_{1}(R) and 1\leq\sigma_{m}(Q)\leq\sigma_{m_{r}}(Q_{rr})\leq\sigma_{1}(Q_{rr})\leq\sigma_{1}(Q) from Assumption 4, where n_{r}\triangleq\sum_{i\in r}n_{i} and m_{r}\triangleq\sum_{i\in r}m_{i}. Recalling the definitions of \kappa, \gamma and \Gamma, the proof of (41) under the assumption on \varepsilon given in (44) now follows from [29, Proposition 2].

Next, let us denote

f(\varepsilon)=6\frac{\kappa^{2}}{1-\gamma^{2}}\tilde{\Gamma}^{5}(1+\sigma_{1}(R^{-1}))\varepsilon,

and note that \norm{P_{r}}\leq\Gamma. Moreover, we see from Eq. (21) that

\hat{K}_{r}=-(R_{rr}+\hat{B}_{rr}^{\top}\hat{P}_{r}\hat{B}_{rr})^{-1}\hat{B}_{rr}^{\top}\hat{P}_{r}\hat{A}_{rr},

where \norm{\hat{A}_{rr}-A_{rr}}\leq\varepsilon and \norm{\hat{B}_{rr}-B_{rr}}\leq\varepsilon. We have

\norm{\hat{B}^{\top}_{rr}\hat{P}_{r}\hat{B}_{rr}-B^{\top}_{rr}P_{r}B_{rr}}\leq\norm{\hat{B}^{\top}_{rr}\hat{P}_{r}\hat{B}_{rr}-B^{\top}_{rr}\hat{P}_{r}\hat{B}_{rr}}+\norm{B^{\top}_{rr}\hat{P}_{r}\hat{B}_{rr}-B^{\top}_{rr}P_{r}\hat{B}_{rr}}+\norm{B^{\top}_{rr}P_{r}\hat{B}_{rr}-B^{\top}_{rr}P_{r}B_{rr}},

which implies that

\norm{\hat{B}^{\top}_{rr}\hat{P}_{r}\hat{B}_{rr}-B^{\top}_{rr}P_{r}B_{rr}}\leq\norm{\hat{P}_{r}}\norm{\hat{B}_{rr}}\varepsilon+\norm{B_{rr}}\norm{\hat{B}_{rr}}f(\varepsilon)+\norm{B_{rr}}\norm{P_{r}}\varepsilon.

Noting that \varepsilon\leq f(\varepsilon)\leq\frac{1}{6} and recalling the definition of \Gamma given in (40), we have

\norm{\hat{B}^{\top}_{rr}\hat{P}_{r}\hat{B}_{rr}-B^{\top}_{rr}P_{r}B_{rr}}\leq 3\tilde{\Gamma}^{2}f(\varepsilon).

Similarly, one can show that

\norm{\hat{B}^{\top}_{rr}\hat{P}_{r}\hat{A}_{rr}-B^{\top}_{rr}P_{r}A_{rr}}\leq 3\tilde{\Gamma}^{2}f(\varepsilon).

Now, following similar arguments to those in the proof of [29, Lemma 2], one can show that

\norm{\hat{K}_{r}-K_{r}}\leq 3\tilde{\Gamma}^{3}f(\varepsilon)\leq 1,

where the second inequality follows from the assumption on \varepsilon given in (44). ∎
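As a numerical sanity check on Lemma 7 (not used anywhere in the analysis; the instance and the use of SciPy's Riccati solver are our own illustrative choices), one can verify that the certainty-equivalent gain is Lipschitz in the model error for small \varepsilon:

```python
# Numerical sketch: ||K_hat - K|| scales (at most) linearly in eps,
# matching the statement of Lemma 7 for a single subsystem (A_rr, B_rr).
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(1)
n, m = 3, 2
A = 0.5 * rng.standard_normal((n, n))      # a comfortably stable block
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

def riccati_gain(Ah, Bh):
    P = solve_discrete_are(Ah, Bh, Q, R)
    K = -np.linalg.solve(R + Bh.T @ P @ Bh, Bh.T @ P @ Ah)  # as in Eq. (21)
    return P, K

P, K = riccati_gain(A, B)
for eps in [1e-2, 1e-3, 1e-4]:
    Ph, Kh = riccati_gain(A + eps * rng.standard_normal((n, n)),
                          B + eps * rng.standard_normal((n, m)))
    print(eps, np.linalg.norm(Kh - K, 2) / eps)             # ratio stays bounded
```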

C.2 Proof of Lemma 8

First, let us consider any r\in\mathcal{U} such that l_{rs}=1, i.e., r\rightarrow s. Since s is the unique root node that is reachable from r, we see from Lemma 1 and Remark 1 that s has a self loop. Noting that \sigma_{1}(R)\geq 1 from Assumption 4, and that \tilde{\Gamma}\geq 1, we see that any \varepsilon satisfying (47) also satisfies (44). Thus, we have from (41) in Lemma 7 that

\norm{\hat{P}_{s}-P_{s}}\leq f(\varepsilon)\leq\frac{1}{6},

where

f(\varepsilon)=6\frac{\kappa^{2}}{1-\gamma^{2}}\tilde{\Gamma}^{5}(1+\sigma_{1}(R^{-1}))\varepsilon,

and we note that \norm{P_{s}}\leq\Gamma. Moreover, we see from Eq. (21) that

\hat{K}_{r}=-(R_{rr}+\hat{B}_{sr}^{\top}\hat{P}_{s}\hat{B}_{sr})^{-1}\hat{B}_{sr}^{\top}\hat{P}_{s}\hat{A}_{sr},

where \norm{\hat{A}_{sr}-A_{sr}}\leq\varepsilon and \norm{\hat{B}_{sr}-B_{sr}}\leq\varepsilon (since \norm{\hat{A}-A}\leq\varepsilon and \norm{\hat{B}-B}\leq\varepsilon). Now, using similar arguments to those in the proof of Lemma 7, one can also show that

\norm{\hat{B}^{\top}_{sr}\hat{P}_{s}\hat{B}_{sr}-B^{\top}_{sr}P_{s}B_{sr}}\leq 3\tilde{\Gamma}^{2}f(\varepsilon),

and

\norm{\hat{B}^{\top}_{sr}\hat{P}_{s}\hat{A}_{sr}-B^{\top}_{sr}P_{s}A_{sr}}\leq 3\tilde{\Gamma}^{2}f(\varepsilon).

Using similar arguments to those in the proof of [29, Lemma 2], one can now show that

\norm{\hat{K}_{r}-K_{r}}\leq 3\tilde{\Gamma}^{3}f(\varepsilon)\leq 1, (70)

where the second inequality follows from the assumption on \varepsilon given in (47), and we note that \norm{K_{r}}\leq\Gamma. Hence, we have shown that (45) holds for l_{rs}=1.

To prove Eq. (46) for l_{rs}=1, we first recall the expressions for P_{r} and \hat{P}_{r} given in Eqs. (9) and (22), respectively. Using similar arguments to those above, we have

\begin{aligned}\norm{\hat{A}_{sr}+\hat{B}_{sr}\hat{K}_{r}-A_{sr}-B_{sr}K_{r}}&\leq\norm{\hat{A}_{sr}-A_{sr}}+\norm{\hat{B}_{sr}\hat{K}_{r}-B_{sr}\hat{K}_{r}}+\norm{B_{sr}\hat{K}_{r}-B_{sr}K_{r}}\\&\leq\varepsilon+(\Gamma+1)\varepsilon+3\Gamma\tilde{\Gamma}^{3}f(\varepsilon)\leq 4\tilde{\Gamma}^{4}f(\varepsilon),\end{aligned} (71)

where the first inequality in (71) follows from (70) and the definition of \Gamma given in (40). To obtain the second inequality in (71), we first note from Eq. (9) that P_{r}\succeq Q_{rr}\succeq\sigma_{m}(Q)I\succeq I, where the last inequality follows from Assumption 4. We then have from (40) that \Gamma\geq\norm{P_{r}}\geq 1, which further implies that \tilde{\Gamma}+1\leq\tilde{\Gamma}^{4}. It follows that the second inequality in (71) holds. To proceed, denoting \hat{L}_{sr}=\hat{A}_{sr}+\hat{B}_{sr}\hat{K}_{r} and L_{sr}=A_{sr}+B_{sr}K_{r}, we have from Eqs. (9) and (22) that

\norm{\hat{P}_{r}-P_{r}}\leq\norm{\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}-K_{r}^{\top}R_{rr}K_{r}}+\norm{\hat{L}_{sr}^{\top}\hat{P}_{s}\hat{L}_{sr}-L_{sr}^{\top}P_{s}L_{sr}}, (72)

where we note that \norm{L_{sr}}\leq\tilde{\Gamma}^{2}. From the above arguments, we have the following:

\begin{aligned}\norm{\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}-K^{\top}_{r}R_{rr}K_{r}}&\leq\norm{\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}-K_{r}^{\top}R_{rr}\hat{K}_{r}}+\norm{K_{r}^{\top}R_{rr}\hat{K}_{r}-K_{r}^{\top}R_{rr}K_{r}}\\&\leq 3\tilde{\Gamma}^{3}f(\varepsilon)\sigma_{1}(R)(\Gamma+3\tilde{\Gamma}^{3}f(\varepsilon))+\Gamma\sigma_{1}(R)\cdot 3\tilde{\Gamma}^{3}f(\varepsilon)\\&\leq 6\tilde{\Gamma}^{4}\sigma_{1}(R)f(\varepsilon)+9\tilde{\Gamma}^{6}\sigma_{1}(R)f(\varepsilon)^{2},\end{aligned}

and

\begin{aligned}\norm{\hat{L}_{sr}^{\top}\hat{P}_{s}\hat{L}_{sr}-L_{sr}^{\top}P_{s}L_{sr}}&\leq\norm{\hat{L}_{sr}^{\top}\hat{P}_{s}\hat{L}_{sr}-L_{sr}^{\top}\hat{P}_{s}\hat{L}_{sr}}+\norm{L_{sr}^{\top}\hat{P}_{s}\hat{L}_{sr}-L_{sr}^{\top}P_{s}\hat{L}_{sr}}+\norm{L_{sr}^{\top}P_{s}\hat{L}_{sr}-L_{sr}^{\top}P_{s}L_{sr}}\\&\leq 4\tilde{\Gamma}^{4}f(\varepsilon)(\Gamma+1)(\tilde{\Gamma}^{2}+4\tilde{\Gamma}^{4}f(\varepsilon))+\tilde{\Gamma}^{2}f(\varepsilon)(\tilde{\Gamma}^{2}+4\tilde{\Gamma}^{4}f(\varepsilon))+4\Gamma\tilde{\Gamma}^{6}f(\varepsilon)\\&\leq 4\tilde{\Gamma}^{7}f(\varepsilon)+16\tilde{\Gamma}^{9}f(\varepsilon)^{2}+\tilde{\Gamma}^{4}f(\varepsilon)+4\tilde{\Gamma}^{6}f(\varepsilon)^{2}+4\tilde{\Gamma}^{7}f(\varepsilon).\end{aligned}

Noting that f(\varepsilon)\leq\frac{1}{6}, one can now obtain from (72) that

\norm{\hat{P}_{r}-P_{r}}\leq 20\tilde{\Gamma}^{9}\sigma_{1}(R)f(\varepsilon)\leq\frac{1}{6}, (73)

where the second inequality again follows from the assumption on \varepsilon given in (47).

Next, let us consider any r\in\mathcal{U} such that l_{rs}=2 (where s\in\mathcal{U} is the unique root node that is reachable from r), and denote the unique directed path from r to s in \mathcal{P}(\mathcal{U},\mathcal{H}) as r\rightarrow r_{1}\rightarrow s. We see that r_{1}\in\mathcal{U} satisfies (70) and (73). Repeating the above arguments for obtaining (70) and (73) one more time, one can show that

\norm{\hat{K}_{r}-K_{r}}\leq 3\tilde{\Gamma}^{3}\cdot 20\tilde{\Gamma}^{9}\sigma_{1}(R)f(\varepsilon)\leq 1,

and

\norm{\hat{P}_{r}-P_{r}}\leq(20\tilde{\Gamma}^{9}\sigma_{1}(R))^{2}f(\varepsilon)\leq\frac{1}{6},

where we use again the assumption on \varepsilon given by (47). Further repeating the above arguments, and noting from Eq. (26) and the definition of \mathcal{P}(\mathcal{U},\mathcal{H}) in (7) that l_{rs}\leq D_{\max} for all r,s\in\mathcal{U}, one can show that (45)-(46) hold, for all r\in\mathcal{U} without a self loop, under the assumption on \varepsilon given in (47). ∎

Appendix D Proofs for Perturbation Bounds on Costs

D.1 Proof of Lemma 9

First, let us consider any s\in\mathcal{U} that has a self loop. Noting the construction of the information graph \mathcal{P}=(\mathcal{U},\mathcal{H}) given in (7), one can show that Eq. (35) can be rewritten as

\tilde{\zeta}_{s}(t+1)=(A_{ss}+B_{ss}\hat{K}_{s})\tilde{\zeta}_{s}(t)+\sum_{v\in\mathcal{L}_{s}}H(v,s)\sum_{w_{j}\rightarrow v}I_{v,\{j\}}w_{j}(t-l_{vs}), (74)

where \mathcal{L}_{s}=\{v\in\mathcal{L}:v\rightsquigarrow s\} is the set of leaf nodes in \mathcal{P}(\mathcal{U},\mathcal{H}) that can reach s, l_{vs} is the length of the (unique) directed path from node v to node s in \mathcal{P}(\mathcal{U},\mathcal{H}) with l_{vs}=0 if v=s, and

H(v,s)\triangleq(A_{sr_{1}}+B_{sr_{1}}\hat{K}_{r_{1}})\cdots(A_{r_{l_{vs}-1}v}+B_{r_{l_{vs}-1}v}\hat{K}_{v}),

with H(v,s)=I if v=s, where \hat{K}_{r} is given by Eq. (21) for all r\in\mathcal{U}, and v,r_{l_{vs}-1},\dots,r_{1},s are the nodes along the directed path from v to s in \mathcal{P}(\mathcal{U},\mathcal{H}). Recalling from Eq. (35) that \tilde{\zeta}_{s}(0)=\sum_{w_{i}\rightarrow s}I_{s,\{i\}}x_{i}(0)=0, in Eq. (74) we set w_{j}(t-l_{vs})=0 if t-l_{vs}<0. Now, under the assumption on \varepsilon given in Eq. (47), we see from (45) in Lemma 8 that \norm{\hat{K}_{r}}\leq\tilde{\Gamma}, which implies that \norm{A_{sr}+B_{sr}\hat{K}_{r}}\leq\tilde{\Gamma}^{2}, for all r\in\mathcal{U} with r\neq s. Noting that l_{vs}\leq D_{\max} from the construction of \mathcal{P}(\mathcal{U},\mathcal{H}), we have \norm{H(v,s)}\leq\tilde{\Gamma}^{2D_{\max}}, for all v\in\mathcal{L}_{s}. Considering any t\in\mathbb{Z}_{\geq 0} and denoting

\eta_{s}(t)=\sum_{v\in\mathcal{L}_{s}}H(v,s)\sum_{w_{j}\rightarrow v}I_{v,\{j\}}w_{j}(t-l_{vs}), (75)

we have

\mathbb{E}\Big{[}\eta_{s}(t)\eta_{s}(t)^{\top}\Big{]}=\mathbb{E}\Big{[}\sum_{v\in\mathcal{L}_{s}}\sum_{w_{j}\rightarrow v}H(v,s)I_{v,\{j\}}w_{j}(t-l_{vs})w_{j}(t-l_{vs})^{\top}I_{\{j\},v}H(v,s)^{\top}\Big{]},

where we use the fact from w(t)\sim\mathcal{N}(0,\sigma_{w}^{2}I) that w_{j_{1}}(t) and w_{j_{2}}(t) are independent for all j_{1},j_{2}\in\mathcal{V} with j_{1}\neq j_{2}, and the fact that for any v\in\mathcal{U} with s_{j}(0)=v, w_{j} is the only noise term such that w_{j}\rightarrow v (see Footnote 2). Moreover, we see that \eta_{s}(t_{1}) and \eta_{s}(t_{2}) are independent for all t_{1},t_{2}\in\mathbb{Z}_{\geq 0} with t_{1}\neq t_{2}, and that \eta_{s}(t) is independent of \tilde{\zeta}_{s}(t) for all t\in\mathbb{Z}_{\geq 0}. Now, considering any k\in\mathbb{Z}_{\geq 0} such that k-l_{vs}\geq 0 for all v\in\mathcal{L}_{s}, and noting that w(t)\sim\mathcal{N}(0,\sigma_{w}^{2}I) for all t\in\mathbb{Z}_{\geq 0}, we have

\mathbb{E}\Big{[}\eta_{s}(k)\eta_{s}(k)^{\top}\Big{]}=\sigma_{w}^{2}\sum_{v\in\mathcal{L}_{s}}\sum_{w_{j}\rightarrow v}H(v,s)I_{v,\{j\}}I_{\{j\},v}H(v,s)^{\top}. (76)

Let us denote the right-hand side of Eq. (76) as \bar{W}_{s}, and denote

\tilde{L}_{ss}=A_{ss}+B_{ss}\hat{K}_{s}.

Fixing any \tau\in\mathbb{Z}_{\geq 1} such that \tau-l_{vs}\geq 0 for all v\in\mathcal{L}_{s}, and considering any t\geq\tau, one can then unroll Eq. (74) and show that

\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{]}=\tilde{L}_{ss}^{t-\tau}\mathbb{E}\Big{[}\tilde{\zeta}_{s}(\tau)\tilde{\zeta}_{s}(\tau)^{\top}\Big{]}(\tilde{L}_{ss}^{\top})^{t-\tau}+\sum_{k=0}^{t-\tau-1}\tilde{L}_{ss}^{k}\bar{W}_{s}(\tilde{L}_{ss}^{\top})^{k}. (77)

Under the assumption on \varepsilon given in (47), one can obtain from Lemma 7 that \norm{\tilde{L}_{ss}^{k}}\leq\kappa(\frac{\gamma+1}{2})^{k} for all k\geq 0, where 0<\frac{\gamma+1}{2}<1, which implies that \tilde{L}_{ss} is stable. It follows that

\begin{aligned}\lim_{t\to\infty}\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{]}&=\lim_{t\to\infty}\sum_{k=0}^{t-\tau-1}\tilde{L}_{ss}^{k}\bar{W}_{s}(\tilde{L}_{ss}^{\top})^{k}\\&\preceq\norm{\bar{W}_{s}}\lim_{t\to\infty}\sum_{k=0}^{t-\tau-1}\norm{\tilde{L}_{ss}^{k}}\norm{(\tilde{L}_{ss}^{\top})^{k}}I\\&\preceq\frac{4\norm{\bar{W}_{s}}\kappa^{2}}{1-\gamma^{2}}I.\end{aligned}

Noting that |\mathcal{L}_{s}|\leq p from the definition of \mathcal{P}(\mathcal{U},\mathcal{H}) given in (7), and that for any v\in\mathcal{U} with s_{j}(0)=v, w_{j} is the only noise term such that w_{j}\rightarrow v, as we argued above, we have from Eq. (76) that

\norm{\bar{W}_{s}}\leq\sigma_{w}^{2}p\max_{v\in\mathcal{L}_{s}}\norm{H(v,s)}^{2}\leq\sigma_{w}^{2}p\tilde{\Gamma}^{4D_{\max}},

where the second inequality follows from the fact that \norm{H(v,s)}\leq\tilde{\Gamma}^{2D_{\max}} as we argued above. It then follows that (48) holds.

Next, let us consider any s\in\mathcal{U} that does not have a self loop. Similarly to Eq. (74), one can rewrite Eq. (35) as

\tilde{\zeta}_{s}(t+1)=\sum_{v\in\mathcal{L}_{s}}H(v,s)\sum_{w_{j}\rightarrow v}I_{v,\{j\}}w_{j}(t-l_{vs}).

Using similar arguments to those above, one can show that (48) also holds. ∎
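The covariance bound underlying (48) is a standard Lyapunov-type estimate; the following small sketch (with hypothetical numbers, not part of the proof) illustrates it for a single stable closed-loop block.

```python
# Sketch of the bound behind (48): for a stable L with ||L^k|| <= kappa*gamma^k,
# the steady-state covariance sum_{k>=0} L^k W (L^T)^k is bounded in norm by
# ||W|| * kappa^2 / (1 - gamma^2).
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

L = np.array([[0.6, 0.2], [0.0, 0.5]])     # plays the role of L_tilde_ss
W = np.diag([1.0, 2.0])                    # plays the role of W_bar_s

Sigma = solve_discrete_lyapunov(L, W)      # solves Sigma = L Sigma L^T + W

gamma = max(abs(np.linalg.eigvals(L)))     # spectral radius
kappa = max(np.linalg.norm(np.linalg.matrix_power(L, k), 2) / gamma**k
            for k in range(50))            # crude numerical estimate of kappa
bound = np.linalg.norm(W, 2) * kappa**2 / (1 - gamma**2)
print(np.linalg.norm(Sigma, 2), "<=", bound)
```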

D.2 Proof of Proposition 3

First, since \varepsilon satisfies (47) (and thus (44)), we have from (43) in Lemma 8 that A_{ss}+B_{ss}\hat{K}_{s} is stable for any s\in\mathcal{U} that has a self loop. Now, using similar arguments to those for the proofs of Theorem 2 and Corollary 4 in [26], and leveraging Lemma 6 and Eqs. (34)-(35), (21) and (49), one can show that Eq. (50) holds. To proceed, for any t\in\mathbb{Z}_{\geq 0} and for any T\geq t, we set T^{\prime}=T+\lceil\frac{\varphi}{J_{\star}}T\rceil, and define

J_{T}(\tilde{x}(t))=\mathbb{E}\Big{[}\sum_{k=t}^{T^{\prime}-1}(x(k)^{\top}Qx(k)+u^{\star}(k)^{\top}Ru^{\star}(k))\Big{]}, (78)

where u^{\star}(k)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}K_{s}\zeta_{s}(k) is the optimal control policy given by Eq. (11) for all k\geq t, and where K_{r} and \zeta_{r}(k) are given by Eqs. (8) and (10), respectively, for all r\in\mathcal{U}. Moreover, on the right-hand side of Eq. (78), we set x(t)=\tilde{x}(t), where \tilde{x}(t) given by Eq. (36) is the state after applying the control policy \tilde{u}(k) in Eq. (34) for k\in\{0,\dots,t-1\}. Noting that \tilde{x}(0)=x(0) as we discussed at the beginning of Section 5, and that T+\frac{\varphi}{J_{\star}}T\leq T^{\prime}\leq T+\frac{\varphi}{J_{\star}}T+1, we see that

\begin{aligned}\lim_{T\to\infty}\frac{1}{T}J_{T}(\tilde{x}(0))&=\lim_{T\to\infty}\big{(}\frac{T^{\prime}}{T}\frac{1}{T^{\prime}}J_{T}(\tilde{x}(0))\big{)}\\&=\lim_{T\to\infty}\frac{T^{\prime}}{T}\big{(}\lim_{T\to\infty}\frac{1}{T^{\prime}}J_{T}(\tilde{x}(0))\big{)}\\&\leq(1+\frac{\varphi}{J_{\star}})J_{\star}=J_{\star}+\varphi,\end{aligned}

where we use the fact that \lim_{T\to\infty}\frac{1}{T^{\prime}}J_{T}(\tilde{x}(0))=J_{\star}.

Recalling the definition of \tilde{J} in Eq. (38), we denote \tilde{c}(t)=\tilde{x}(t)^{\top}Q\tilde{x}(t)+\tilde{u}(t)^{\top}R\tilde{u}(t). We then have the following:

\begin{aligned}\tilde{J}&=\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\tilde{c}(t)\Big{]}\\&=\lim_{T\to\infty}\frac{1}{T}\bigg{(}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\big{(}\tilde{c}(t)-J_{T}(\tilde{x}(t))\big{)}+\sum_{t=0}^{T-1}J_{T}(\tilde{x}(t))\Big{]}\bigg{)}\\&=\lim_{T\to\infty}\frac{1}{T}\bigg{(}\mathbb{E}\Big{[}\sum_{t=0}^{T-2}\big{(}\tilde{c}(t)-J_{T}(\tilde{x}(t))\big{)}+\sum_{t=0}^{T-2}J_{T}(\tilde{x}(t+1))+J_{T}(\tilde{x}(0))\Big{]}\bigg{)}\\&=\lim_{T\to\infty}\frac{1}{T}\bigg{(}\mathbb{E}\Big{[}\sum_{t=0}^{T-2}\big{(}\tilde{c}(t)+J_{T}(\tilde{x}(t+1))-J_{T}(\tilde{x}(t))\big{)}+J_{T}(\tilde{x}(0))\Big{]}\bigg{)}.\end{aligned}

Using similar arguments to those for the proof of [26, Theorem 2], one can show that

J_{T}(\tilde{x}(t+1))-J_{T}(\tilde{x}(t))=\mathbb{E}\Big{[}\sum_{r\in\mathcal{U}}\tilde{\zeta}_{r}(t+1)^{\top}P_{r}(t+1)\tilde{\zeta}_{r}(t+1)\Big{]}-\mathbb{E}\Big{[}\sum_{r\in\mathcal{U}}\tilde{\zeta}_{r}(t)^{\top}P_{r}(t)\tilde{\zeta}_{r}(t)\Big{]}-\sigma_{w}^{2}\sum_{\begin{subarray}{c}i\in\mathcal{V}\\ w_{i}\rightarrow r\end{subarray}}\text{Tr}\big{(}I_{\{i\},r}P_{r}(t+1)I_{r,\{i\}}\big{)}, (79)

where \tilde{\zeta}_{r}(k)=\sum_{w_{i}\rightarrow r}I_{r,\{i\}}\tilde{x}_{i}(k) for k\in\{t,t+1\}. To obtain P_{r}(t) for all t\in\{0,\dots,T\} and for all r\in\mathcal{U} in Eq. (79), we use the following recursion:

P_{r}(k)=Q_{rr}+K_{r}^{\top}R_{rr}K_{r}+(A_{sr}+B_{sr}K_{r})^{\top}P_{s}(k+1)(A_{sr}+B_{sr}K_{r}), (80)

initialized with P_{r}(T^{\prime})=0 for all r\in\mathcal{U}, where T^{\prime}=T+\lceil\frac{\varphi}{J_{\star}}T\rceil, and for each r\in\mathcal{U}, we let s\in\mathcal{U} be the unique node such that r\rightarrow s, and K_{r} is given by Eq. (8) for all r\in\mathcal{U}. Combining the above arguments together, we obtain the following:

\begin{aligned}\tilde{J}-J_{\star}&=\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\Big{(}\tilde{c}(t)+\sum_{r\in\mathcal{U}}\tilde{\zeta}_{r}(t+1)^{\top}P_{r}(t+1)\tilde{\zeta}_{r}(t+1)-\sum_{r\in\mathcal{U}}\tilde{\zeta}_{r}(t)^{\top}P_{r}(t)\tilde{\zeta}_{r}(t)\\&\qquad-\sigma_{w}^{2}\sum_{\begin{subarray}{c}i\in\mathcal{V}\\ w_{i}\rightarrow r\end{subarray}}\text{Tr}\big{(}I_{\{i\},r}P_{r}(t+1)I_{r,\{i\}}\big{)}\Big{)}\Big{]}+\lim_{T\to\infty}\frac{1}{T}J_{T}(\tilde{x}(0))-J_{\star}\\&\leq\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\sum_{r\in\mathcal{U}}\Big{(}\tilde{\zeta}_{r}(t)^{\top}(Q_{rr}+\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r})\tilde{\zeta}_{r}(t)+\tilde{\zeta}_{r}(t+1)^{\top}P_{r}(t+1)\tilde{\zeta}_{r}(t+1)\\&\qquad-\tilde{\zeta}_{r}(t)^{\top}P_{r}(t)\tilde{\zeta}_{r}(t)-\sigma_{w}^{2}\sum_{\begin{subarray}{c}i\in\mathcal{V}\\ w_{i}\rightarrow r\end{subarray}}\text{Tr}\big{(}I_{\{i\},r}P_{r}(t+1)I_{r,\{i\}}\big{)}\Big{)}\Big{]}+\varphi\qquad(81)\\&=\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\sum_{r\in\mathcal{U}}\Big{(}\tilde{\zeta}_{r}(t)^{\top}\big{(}Q_{rr}+\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}-P_{r}(t)\big{)}\tilde{\zeta}_{r}(t)\\&\qquad+\sum_{v\to r}\tilde{\zeta}_{v}^{\top}(t)(A_{rv}+B_{rv}\hat{K}_{v})^{\top}P_{r}(t+1)(A_{rv}+B_{rv}\hat{K}_{v})\tilde{\zeta}_{v}(t)\Big{)}\Big{]}+\varphi\qquad(82)\\&=\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\sum_{r\in\mathcal{U}}\Big{(}\tilde{\zeta}_{r}(t)^{\top}\big{(}Q_{rr}+\hat{K}_{r}^{\top}R_{rr}\hat{K}_{r}-P_{r}(t)\big{)}\tilde{\zeta}_{r}(t)\\&\qquad+\tilde{\zeta}_{r}^{\top}(t)\big{(}(A_{sr}+B_{sr}\hat{K}_{r})^{\top}P_{s}(t+1)(A_{sr}+B_{sr}\hat{K}_{r})\big{)}\tilde{\zeta}_{r}(t)\Big{)}\Big{]}+\varphi,\qquad(83)\end{aligned}

where for each r\in\mathcal{U} in Eq. (83), we let s\in\mathcal{U} be the unique node such that r\rightarrow s. To obtain Eq. (81), we note from Lemma 6 that \tilde{x}(t)=\sum_{r\in\mathcal{U}}I_{\mathcal{V},r}\tilde{\zeta}_{r}(t) for all t\in\mathbb{Z}_{\geq 0}, where \mathbb{E}[\tilde{\zeta}_{r}(t)]=0 for all r\in\mathcal{U}, and \tilde{\zeta}_{r_{1}}(t) and \tilde{\zeta}_{r_{2}}(t) are independent for all r_{1},r_{2}\in\mathcal{U} with r_{1}\neq r_{2}. Moreover, we note from Eq. (34) that \tilde{u}(t)=\sum_{r\in\mathcal{U}}I_{\mathcal{V},r}\hat{K}_{r}\tilde{\zeta}_{r}(t) for all t\in\mathbb{Z}_{\geq 0}, where \hat{K}_{r} is given by Eq. (21). Combining the above arguments together, and recalling that \tilde{c}(t)=\tilde{x}(t)^{\top}Q\tilde{x}(t)+\tilde{u}(t)^{\top}R\tilde{u}(t), we obtain Eq. (81). To obtain Eq. (82), we first apply Eq. (35) and note that w(t)\sim\mathcal{N}(0,\sigma_{w}^{2}I). Next, we use the facts that the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) defined in (7) is a tree (see Lemma 1 and Remark 1), and that \tilde{\zeta}_{r_{1}}(t) and \tilde{\zeta}_{r_{2}}(t) are independent for all r_{1},r_{2}\in\mathcal{U} with r_{1}\neq r_{2} and for all t\in\mathbb{Z}_{\geq 0}, as we argued above. To obtain Eq. (83), we leverage again the tree structure of \mathcal{P}(\mathcal{U},\mathcal{H}).

Now, leveraging the recursion in Eq. (80), and using similar arguments to those for the proof of [16, Lemma 12], one can show via (83) that

\begin{aligned}\tilde{J}-J_{\star}&\leq\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\sum_{r\in\mathcal{U}}\Big{(}\tilde{\zeta}_{r}(t)^{\top}(\hat{K}_{r}-K_{r})^{\top}\big{(}R_{rr}+B_{sr}^{\top}P_{s}(t+1)B_{sr}\big{)}(\hat{K}_{r}-K_{r})\tilde{\zeta}_{r}(t)\\&\qquad+2\tilde{\zeta}_{r}(t)^{\top}(\hat{K}_{r}-K_{r})^{\top}\big{(}(R_{rr}+B_{sr}^{\top}P_{s}(t+1)B_{sr})K_{r}+B_{sr}^{\top}P_{s}(t+1)A_{sr}\big{)}\tilde{\zeta}_{r}(t)\Big{)}\Big{]}+\varphi.\end{aligned}

Recall from Lemma 2 that A_{ss}+B_{ss}K_{s} is stable for any s\in\mathcal{U} that has a self loop. Using similar arguments to those for the proof of [26, Corollary 4], one can show via Eq. (80) that P_{r}(t)\to P_{r} as T\to\infty, for all t\in\{0,\dots,T\} and for all r\in\mathcal{U}, where P_{r} is given by Eq. (9). It then follows that

\begin{aligned}\tilde{J}-J_{\star}&\leq\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\sum_{r\in\mathcal{U}}\Big{(}\tilde{\zeta}_{r}(t)^{\top}(\hat{K}_{r}-K_{r})^{\top}\big{(}R_{rr}+B_{sr}^{\top}P_{s}B_{sr}\big{)}(\hat{K}_{r}-K_{r})\tilde{\zeta}_{r}(t)\\&\qquad+2\tilde{\zeta}_{r}(t)^{\top}(\hat{K}_{r}-K_{r})^{\top}\big{(}(R_{rr}+B_{sr}^{\top}P_{s}B_{sr})K_{r}+B_{sr}^{\top}P_{s}A_{sr}\big{)}\tilde{\zeta}_{r}(t)\Big{)}\Big{]}+\varphi\\&=\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big{[}\sum_{t=0}^{T-1}\sum_{r\in\mathcal{U}}\Big{(}\tilde{\zeta}_{r}(t)^{\top}(\hat{K}_{r}-K_{r})^{\top}\big{(}R_{rr}+B_{sr}^{\top}P_{s}B_{sr}\big{)}(\hat{K}_{r}-K_{r})\tilde{\zeta}_{r}(t)\Big{)}\Big{]}+\varphi\\&=\text{Tr}\Big{(}\sum_{r\in\mathcal{U}}(\hat{K}_{r}-K_{r})^{\top}\big{(}R_{rr}+B_{sr}^{\top}P_{s}B_{sr}\big{)}(\hat{K}_{r}-K_{r})\lim_{t\to\infty}\mathbb{E}\Big{[}\tilde{\zeta}_{r}(t)\tilde{\zeta}_{r}(t)^{\top}\Big{]}\Big{)}+\varphi,\qquad(84)\end{aligned}

where the first equality follows from Eq. (8), and the second equality follows from the fact that the limit \lim_{t\to\infty}\mathbb{E}\big{[}\tilde{\zeta}_{r}(t)\tilde{\zeta}_{r}(t)^{\top}\big{]} exists, for all r\in\mathcal{U}, as we argued in the proof of Lemma 9.

Finally, substituting (48) in Lemma 9 into the right-hand side of (84), we obtain

\begin{aligned}\tilde{J}-J_{\star}&\leq\frac{4p\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}\text{Tr}\Big{(}\sum_{r\in\mathcal{U}}(K_{r}-\hat{K}_{r})^{\top}(R_{rr}+B_{sr}^{\top}P_{s}B_{sr})(K_{r}-\hat{K}_{r})\Big{)}+\varphi\\&\leq\frac{4p\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}(\Gamma^{3}+\sigma_{1}(R))\text{Tr}\Big{(}\sum_{r\in\mathcal{U}}(K_{r}-\hat{K}_{r})^{\top}(K_{r}-\hat{K}_{r})\Big{)}+\varphi\\&\leq\frac{4p\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}}(\Gamma^{3}+\sigma_{1}(R))\sum_{r\in\mathcal{U}}\min\{m_{r},n_{r}\}\norm{K_{r}-\hat{K}_{r}}^{2}+\varphi\\&\leq\frac{72\kappa^{4}\sigma_{w}^{2}pqn}{(1-\gamma^{2})^{2}}\tilde{\Gamma}^{4D_{\max}+8}(\Gamma^{3}+\sigma_{1}(R))(1+\sigma_{1}(R^{-1}))(20\tilde{\Gamma}^{9}\sigma_{1}(R))^{D_{\max}}\varepsilon+\varphi,\qquad(85)\end{aligned}

where the third inequality follows from the fact that K_{r},\hat{K}_{r}\in\mathbb{R}^{m_{r}\times n_{r}} for all r\in\mathcal{U}, with n_{r}\triangleq\sum_{i\in r}n_{i} and m_{r}\triangleq\sum_{i\in r}m_{i}. To obtain (85), we first note that \varepsilon is assumed to satisfy (47) (and thus (44)). Recalling that |\mathcal{U}|=q, and that n_{i}\geq m_{i} for all i\in\mathcal{V} as we assumed previously, we then obtain (85) from Lemmas 7-8, where we also use the fact that \norm{\hat{K}_{r}-K_{r}}\leq 1 for all r\in\mathcal{U}. ∎
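The qualitative message of Proposition 3 can be illustrated on a small centralized instance (a hedged sketch with our own choice of system and SciPy routines; it does not reproduce the decentralized setting or the constants in (85)): the excess cost of the certainty-equivalent gain, computed through a Lyapunov equation, stays bounded relative to the model error \varepsilon as \varepsilon shrinks, consistent with a linear upper bound on the gap.

```python
# End-to-end sketch: cost gap of the certainty-equivalent gain vs. eps.
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

rng = np.random.default_rng(2)
n, m, sw2 = 3, 2, 1.0                       # sw2 plays the role of sigma_w^2
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

def lqr_gain(Ah, Bh):
    P = solve_discrete_are(Ah, Bh, Q, R)
    return -np.linalg.solve(R + Bh.T @ P @ Bh, Bh.T @ P @ Ah)

def avg_cost(K):                            # J = sw2 * Tr(P_K), via Lyapunov eq.
    L = A + B @ K
    P_K = solve_discrete_lyapunov(L.T, Q + K.T @ R @ K)
    return sw2 * np.trace(P_K)

J_star = avg_cost(lqr_gain(A, B))           # optimal average cost
for eps in [1e-1, 1e-2, 1e-3]:
    K_hat = lqr_gain(A + eps * rng.standard_normal((n, n)),
                     B + eps * rng.standard_normal((n, m)))
    print(eps, (avg_cost(K_hat) - J_star) / eps)   # ratio stays bounded
```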

D.3 Proof of Lemma 10

First, considering any s\in\mathcal{U} and any t\in\mathbb{Z}_{\geq 0}, we have

\begin{aligned}\mathbb{E}\Big{[}\norm{\tilde{\zeta}_{s}(t)}^{2}\Big{]}&=\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)^{\top}\tilde{\zeta}_{s}(t)\Big{]}=\mathbb{E}\Big{[}\text{Tr}\Big{(}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{)}\Big{]}=\text{Tr}\Big{(}\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{]}\Big{)}\\&\leq n\sigma_{1}\Big{(}\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{]}\Big{)}.\end{aligned}

Following similar arguments to those in the proof of Lemma 9 (particularly Eq. (77)), one can show that

\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{]}\preceq\tilde{L}_{ss}^{t}\mathbb{E}\Big{[}\tilde{\zeta}_{s}(0)\tilde{\zeta}_{s}(0)^{\top}\Big{]}(\tilde{L}_{ss}^{\top})^{t}+\sigma_{w}^{2}p\tilde{\Gamma}^{4D_{\max}}\sum_{k=0}^{t-1}\tilde{L}_{ss}^{k}(\tilde{L}_{ss}^{\top})^{k},

where \tilde{L}_{ss}=A_{ss}+B_{ss}\hat{K}_{s}, and \tilde{\zeta}_{s}(0)=\sum_{w_{i}\rightarrow s}I_{s,\{i\}}x_{i}(0)=0. Since \varepsilon satisfies (47) and thus (44), we see from Lemma 7 that \norm{\tilde{L}_{ss}^{k}}\leq\kappa(\frac{\gamma+1}{2})^{k} for all k\in\mathbb{Z}_{\geq 0}. It now follows that

\begin{aligned}\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)\tilde{\zeta}_{s}(t)^{\top}\Big{]}&\preceq\sigma_{w}^{2}p\kappa^{2}\tilde{\Gamma}^{4D_{\max}}\sum_{k=0}^{t}\Big{(}\frac{\gamma+1}{2}\Big{)}^{2k}I\\&\preceq\frac{4\sigma_{w}^{2}p\kappa^{2}\tilde{\Gamma}^{4D_{\max}}}{1-\gamma^{2}}I.\end{aligned}

Combining the above arguments together, we obtain (51).

Next, recalling from Lemma 6 that \tilde{x}(t)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\tilde{\zeta}_{s}(t) for all t\in\mathbb{Z}_{\geq 0}, we then have

\begin{aligned}\mathbb{E}\Big{[}\norm{\tilde{x}(t)}^{2}\Big{]}&=\mathbb{E}\Big{[}\big{(}\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\tilde{\zeta}_{s}(t)\big{)}^{\top}\big{(}\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\tilde{\zeta}_{s}(t)\big{)}\Big{]}\\&\leq\Big{(}\sum_{s\in\mathcal{U}}\sqrt{\mathbb{E}\Big{[}\tilde{\zeta}_{s}(t)^{\top}I_{s,\mathcal{V}}I_{\mathcal{V},s}\tilde{\zeta}_{s}(t)\Big{]}}\Big{)}^{2}\\&\leq\Big{(}\sum_{s\in\mathcal{U}}\sqrt{\mathbb{E}\Big{[}\norm{\tilde{\zeta}_{s}(t)}^{2}\Big{]}}\Big{)}^{2}\leq\frac{4npq^{2}\sigma_{w}^{2}\tilde{\Gamma}^{4D_{\max}}\kappa^{2}}{1-\gamma^{2}},\end{aligned}

where the first inequality follows from Lemma 14, and the last inequality follows from the fact that |\mathcal{U}|=q. This completes the proof of (52).

Finally, we note from Eq. (34) that \tilde{u}(t)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\hat{K}_{s}\tilde{\zeta}_{s}(t). Moreover, since \varepsilon satisfies (47) (and thus (44)), we know from Lemmas 7-8 that \norm{\hat{K}_{s}-K_{s}}\leq 1 for all s\in\mathcal{U}, which implies via the definition of \Gamma given in (40) that \norm{\hat{K}_{s}}\leq\tilde{\Gamma} for all s\in\mathcal{U}. Now, using similar arguments to those above, we can show that (53) holds. ∎
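Lemma 14, invoked repeatedly in this proof, is the triangle inequality in L_{2}: \mathbb{E}\norm{\sum_{s}z_{s}}^{2}\leq(\sum_{s}\sqrt{\mathbb{E}\norm{z_{s}}^{2}})^{2}. A quick Monte Carlo sketch (with hypothetical data, for illustration only) reads as follows.

```python
# Monte Carlo sketch of the L2 triangle inequality used via Lemma 14.
import numpy as np

rng = np.random.default_rng(3)
N, n, q = 20000, 4, 3
zs = [rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
      for _ in range(q)]                      # q random vector samples

lhs = np.mean(np.linalg.norm(sum(zs), axis=1) ** 2)
rhs = sum(np.sqrt(np.mean(np.linalg.norm(z, axis=1) ** 2)) for z in zs) ** 2
print(lhs, "<=", rhs)                         # holds up to Monte Carlo error
```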

D.4 Proof of Lemma 11

For notational simplicity in this proof, we denote

\delta_{h}=p(\tilde{\Gamma}+1)^{2D_{\max}-1},\ \beta=\tilde{\Gamma}^{2D_{\max}},
\Lambda_{1}=q\tilde{\Gamma}\bigg{(}\frac{2\kappa p(\beta+1)}{1-\gamma}+\frac{32\kappa^{2}p(\tilde{\Gamma}+1)}{(1-\gamma)^{2}}\bigg{)}\bigg{(}1+\frac{\kappa\Gamma}{1-\gamma}\bigg{)},

and

\Lambda_{2}=\frac{2pq\tilde{\Gamma}\kappa}{1-\gamma}\big{(}(\beta+1)q(\tilde{\Gamma}+1)+\delta_{h}\big{)}\zeta_{b}+\frac{16\kappa^{2}pq\tilde{\Gamma}(\tilde{\Gamma}+1)}{(1-\gamma)^{2}}\big{(}2q(\tilde{\Gamma}+1)+2\big{)}\zeta_{b}.

We first prove (55). Based on the above notations, we can show that

\begin{aligned}\Lambda_{2}&=\Big{(}\frac{2\kappa pq^{2}}{1-\gamma}\tilde{\Gamma}(\tilde{\Gamma}+1)(\beta+1)+\frac{32\kappa^{2}pq^{2}}{(1-\gamma)^{2}}\tilde{\Gamma}(\tilde{\Gamma}+1)^{2}\Big{)}\zeta_{b}+\Big{(}\frac{2\kappa pq}{1-\gamma}\tilde{\Gamma}\delta_{h}+\frac{32\kappa^{2}pq}{(1-\gamma)^{2}}\tilde{\Gamma}(\tilde{\Gamma}+1)\Big{)}\zeta_{b}\\&\leq\frac{34\kappa^{2}pq^{2}}{(1-\gamma)^{2}}(\tilde{\Gamma}+1)^{2}\tilde{\Gamma}^{2D_{\max}+1}\zeta_{b}+\frac{18\kappa^{2}p^{2}q}{(1-\gamma)^{2}}(\tilde{\Gamma}+1)^{2D_{\max}+3}\zeta_{b}\\&\leq\frac{52\kappa^{2}p^{2}q^{2}}{(1-\gamma)^{2}}(\tilde{\Gamma}+1)^{2D_{\max}+3}\zeta_{b},\end{aligned}

where the first inequality follows from the facts that \frac{\kappa}{1-\gamma}>1, and that \beta+1\leq\tilde{\Gamma}^{2D_{\max}+1} since \tilde{\Gamma}=\Gamma+1\geq 2 (see our arguments in the proof of Lemma 8). We then have

1.1\Lambda_{2}\leq\frac{58\kappa^{2}(\tilde{\Gamma}+1)^{2D_{\max}+3}p^{2}q^{2}}{(1-\gamma)^{2}}\zeta_{b}. (86)

Thus, in order to show that (55) holds for all t\in\mathbb{Z}_{\geq 0}, it suffices to show that \mathbb{E}\big{[}\norm{\hat{u}(t)-\tilde{u}(t)}^{2}\big{]}\leq(1.1\Lambda_{2}\bar{\varepsilon})^{2} holds for all t\in\mathbb{Z}_{\geq 0}. To this end, we proceed by induction on t=0,1,\dots. For any t\in\mathbb{Z}_{\geq 0}, we recall from Eqs. (23) and (34) that \hat{u}_{i}(t)=\sum_{r\ni i}I_{\{i\},r}\hat{K}_{r}\hat{\zeta}_{r}(t) and \tilde{u}_{i}(t)=\sum_{r\ni i}I_{\{i\},r}\hat{K}_{r}\tilde{\zeta}_{r}(t), respectively, for all i\in\mathcal{V}, where \hat{\zeta}_{r}(t) and \tilde{\zeta}_{r}(t) are given by Eqs. (28) and (35), respectively, and \hat{K}_{r} is given by Eq. (21), for all r\in\mathcal{U}. As we argued before, in Eqs. (28) and (35) we have \hat{\zeta}_{r}(0)=\tilde{\zeta}_{r}(0)=\sum_{w_{i}\rightarrow r}I_{r,\{i\}}x_{i}(0) for all r\in\mathcal{U}. Hence, we have \hat{u}(0)=\tilde{u}(0), which implies that (55) holds for t=0, completing the proof of the base step of the induction.

For the induction step, suppose \mathbb{E}\big{[}\norm{\hat{u}(k)-\tilde{u}(k)}^{2}\big{]}\leq(1.1\Lambda_{2}\bar{\varepsilon})^{2} holds for all k\in\{0,\dots,t\}. Now, considering any k\in\{0,\dots,t\}, we can unroll the expressions of \hat{x}(k) and \tilde{x}(k) given by Eqs. (32) and (36), respectively, and obtain

\hat{x}(k)=A^{k}\hat{x}(0)+\sum_{k^{\prime}=0}^{k-1}A^{k-k^{\prime}-1}(B\hat{u}(k^{\prime})+w(k^{\prime})),

and

\tilde{x}(k)=A^{k}\tilde{x}(0)+\sum_{k^{\prime}=0}^{k-1}A^{k-k^{\prime}-1}(B\tilde{u}(k^{\prime})+w(k^{\prime})),

where we note that \hat{x}(0)=\tilde{x}(0)=x(0). It then follows that

\begin{aligned}\sqrt{\mathbb{E}\Big{[}\norm{\hat{x}(k)-\tilde{x}(k)}^{2}\Big{]}}&\leq\sum_{k^{\prime}=0}^{k-1}\sqrt{\mathbb{E}\Big{[}\norm{A^{k-k^{\prime}-1}B(\hat{u}(k^{\prime})-\tilde{u}(k^{\prime}))}^{2}\Big{]}}\\&\leq\sum_{k^{\prime}=0}^{k-1}\sqrt{\mathbb{E}\Big{[}\norm{A^{k-k^{\prime}-1}}^{2}\norm{B}^{2}\norm{\hat{u}(k^{\prime})-\tilde{u}(k^{\prime})}^{2}\Big{]}}\\&\leq\norm{B}\sum_{k^{\prime}=0}^{k-1}\norm{A^{k-k^{\prime}-1}}\sqrt{\mathbb{E}\Big{[}\norm{\hat{u}(k^{\prime})-\tilde{u}(k^{\prime})}^{2}\Big{]}}\\&\leq 1.1\Gamma\Lambda_{2}\bar{\varepsilon}\sum_{k^{\prime}=0}^{k-1}\norm{A^{k-k^{\prime}-1}}\leq\frac{1.1\Gamma\Lambda_{2}\kappa}{1-\gamma}\bar{\varepsilon},\end{aligned} (87)

where the first inequality follows from Lemma 14. To obtain the fourth inequality in (87), we use the induction hypothesis. To obtain the last inequality in (87), we use the fact that \norm{A^{k^{\prime}}}\leq\kappa\gamma^{k^{\prime}} (with 0<\gamma<1), for all k^{\prime}\in\mathbb{Z}_{\geq 0}, from Assumption 3. Recalling our arguments in Section 4 (particularly, Eq. (29)), one can show that

\hat{w}(k)=\hat{x}(k+1)-\hat{A}\hat{x}(k)-\hat{B}\hat{u}(k),

where \hat{w}(k)=\begin{bmatrix}\hat{w}_{1}(k)^{\top}&\cdots&\hat{w}_{p}(k)^{\top}\end{bmatrix}^{\top} is an estimate of w(k) in Eq. (3). From Eq. (32), we see that

w(k)=\hat{x}(k+1)-A\hat{x}(k)-B\hat{u}(k).

It follows that

\begin{aligned}\norm{\hat{w}(k)-w(k)}&\leq\norm{\hat{A}-A}\norm{\hat{x}(k)}+\norm{\hat{B}-B}\norm{\hat{u}(k)}\\&\leq(\norm{\hat{x}(k)}+\norm{\hat{u}(k)})\bar{\varepsilon}.\end{aligned}

Recall from Lemma 10 that \mathbb{E}\big{[}\norm{\tilde{x}(k)}^{2}\big{]}\leq q^{2}\zeta_{b}^{2} and \mathbb{E}\big{[}\norm{\tilde{u}(k)}^{2}\big{]}\leq q^{2}\tilde{\Gamma}^{2}\zeta_{b}^{2}, for all k\in\mathbb{Z}_{\geq 0}. We then obtain

\begin{aligned}\mathbb{E}\Big{[}\norm{\hat{x}(k)}^{2}\Big{]}&=\mathbb{E}\Big{[}\norm{\hat{x}(k)-\tilde{x}(k)+\tilde{x}(k)}^{2}\Big{]}\\&\leq\Big{(}\sqrt{\mathbb{E}\Big{[}\norm{\hat{x}(k)-\tilde{x}(k)}^{2}\Big{]}}+\sqrt{\mathbb{E}\Big{[}\norm{\tilde{x}(k)}^{2}\Big{]}}\Big{)}^{2}\\&\leq\Big{(}\frac{1.1\Gamma\Lambda_{2}\kappa}{1-\gamma}\bar{\varepsilon}+q\zeta_{b}\Big{)}^{2},\end{aligned}

where the first inequality follows again from Lemma 14, and the second inequality uses (87). Similarly, we have

\begin{aligned}\mathbb{E}\Big{[}\norm{\hat{u}(k)}^{2}\Big{]}&=\mathbb{E}\Big{[}\norm{\hat{u}(k)-\tilde{u}(k)+\tilde{u}(k)}^{2}\Big{]}\\&\leq\Big{(}\sqrt{\mathbb{E}\Big{[}\norm{\hat{u}(k)-\tilde{u}(k)}^{2}\Big{]}}+\sqrt{\mathbb{E}\Big{[}\norm{\tilde{u}(k)}^{2}\Big{]}}\Big{)}^{2}\\&\leq(1.1\Lambda_{2}\bar{\varepsilon}+q\tilde{\Gamma}\zeta_{b})^{2}.\end{aligned}

It then follows that

\begin{aligned}\mathbb{E}\Big{[}\norm{\hat{w}(k)-w(k)}^{2}\Big{]}&=\mathbb{E}\Big{[}\norm{(\hat{A}-A)\hat{x}(k)+(\hat{B}-B)\hat{u}(k)}^{2}\Big{]}\\&\leq\Big{(}\sqrt{\mathbb{E}\Big{[}\norm{(\hat{A}-A)\hat{x}(k)}^{2}\Big{]}}+\sqrt{\mathbb{E}\Big{[}\norm{(\hat{B}-B)\hat{u}(k)}^{2}\Big{]}}\Big{)}^{2}\\&\leq\Big{(}\sqrt{\mathbb{E}\Big{[}\norm{\hat{x}(k)}^{2}\Big{]}}+\sqrt{\mathbb{E}\Big{[}\norm{\hat{u}(k)}^{2}\Big{]}}\Big{)}^{2}\bar{\varepsilon}^{2}\\&\leq\big{(}q(\tilde{\Gamma}+1)\zeta_{b}+\frac{1.1\Gamma\Lambda_{2}\kappa}{1-\gamma}\bar{\varepsilon}+1.1\Lambda_{2}\bar{\varepsilon}\big{)}^{2}\bar{\varepsilon}^{2},\end{aligned}

where the first inequality again follows from Lemma 14, and the second inequality follows from \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}. Denoting

\delta_{w}=q(\tilde{\Gamma}+1)\zeta_{b}+\frac{1.1\Gamma\Lambda_{2}\kappa}{1-\gamma}\bar{\varepsilon}+1.1\Lambda_{2}\bar{\varepsilon}, (88)

we have

\mathbb{E}\Big{[}\norm{\hat{w}(k)-w(k)}^{2}\Big{]}\leq\delta_{w}^{2}\bar{\varepsilon}^{2}\quad\forall k\in\{0,\dots,t\}. (89)

Moreover, note that

\begin{aligned}\mathbb{E}\Big{[}\norm{w(k)}^{2}\Big{]}&=\mathbb{E}\Big{[}\text{Tr}(w(k)w(k)^{\top})\Big{]}=\text{Tr}\Big{(}\mathbb{E}\Big{[}w(k)w(k)^{\top}\Big{]}\Big{)}\\&=n\sigma^{2}_{w}\leq\zeta_{b}^{2}\quad\forall k\in\mathbb{Z}_{\geq 0}.\end{aligned} (90)

To proceed, let us consider any s\in\mathcal{U} that has a self loop. Recalling the arguments in the proof of Lemma 9, we can rewrite Eq. (35) as

\tilde{\zeta}_{s}(t+1)=(A_{ss}+B_{ss}\hat{K}_{s})\tilde{\zeta}_{s}(t)+\eta_{s}(t), (91)

with

\eta_{s}(t)=\sum_{v\in\mathcal{L}_{s}}H(v,s)\sum_{w_{j}\rightarrow v}I_{v,\{j\}}w_{j}(t-l_{vs}), (92)

where \mathcal{L}_{s}=\{v\in\mathcal{L}:v\rightsquigarrow s\} is the set of leaf nodes in \mathcal{P}(\mathcal{U},\mathcal{H}) that can reach s, and l_{vs} is the length of the (unique) directed path from node v to node s in \mathcal{P}(\mathcal{U},\mathcal{H}), with l_{vs}=0 if v=s, and

H(v,s)=(A_{sr_{1}}+B_{sr_{1}}\hat{K}_{r_{1}})\cdots(A_{r_{l_{vs}-1}v}+B_{r_{l_{vs}-1}v}\hat{K}_{v}),

with H(v,s)=I if v=s, where v,r_{l_{vs}-1},\dots,r_{1},s are the nodes along the directed path from v to s in \mathcal{P}(\mathcal{U},\mathcal{H}). We also recall from the arguments in the proof of Lemma 9 that \norm{H(v,s)}\leq\beta for all v\in\mathcal{L}_{s}. We then see from (76) in the proof of Lemma 9 and the definition of \zeta_{b} in (54) that

\mathbb{E}\Big[\norm{\eta_{s}(k)}^{2}\Big] = \mathbb{E}\big[\text{Tr}(\eta_{s}(k)\eta_{s}(k)^{\top})\big] = \text{Tr}\Big(\mathbb{E}\big[\eta_{s}(k)\eta_{s}(k)^{\top}\big]\Big) \leq \sigma_{w}^{2}np\beta^{2} \leq \zeta_{b}^{2}\quad\forall k\in\mathbb{Z}_{\geq 0}. (93)
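
For intuition, the bookkeeping behind H(v,s) and the bound (93) can be sketched numerically. The snippet below (arbitrary toy blocks along a hypothetical two-edge path v \rightarrow r_1 \rightarrow s; all names are illustrative) forms the path product and checks the second-moment bound for a single filtered noise term:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_w = 3, 0.1

# Hypothetical closed-loop blocks along a directed path v -> r1 -> s, i.e. the
# factors (A_{s r1} + B_{s r1} K_{r1}) and (A_{r1 v} + B_{r1 v} K_v) of H(v, s).
path_blocks = [0.4 * rng.standard_normal((n, n)) for _ in range(2)]

def path_product(blocks):
    # H(v, s) is the ordered product of the blocks; H(v, s) = I when v = s.
    out = np.eye(n)
    for M in blocks:
        out = out @ M
    return out

H_vs = path_product(path_blocks)
# One valid (not necessarily tightest) choice of beta: the product of the
# factor norms, which dominates ||H(v, s)|| by submultiplicativity.
beta = np.prod([np.linalg.norm(M, 2) for M in path_blocks])
assert np.linalg.norm(H_vs, 2) <= beta + 1e-12

# Second moment of one filtered noise term H(v, s) w with w ~ N(0, sigma_w^2 I):
# E||H w||^2 = sigma_w^2 Tr(H H^T) <= sigma_w^2 n beta^2; summing over the at
# most p leaves that reach s gives the sigma_w^2 n p beta^2 factor in (93).
assert sigma_w**2 * np.trace(H_vs @ H_vs.T) <= sigma_w**2 * n * beta**2 + 1e-12
```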

Similarly, one can rewrite Eq. (28) as

\hat{\zeta}_{s}(t+1)=(\hat{A}_{ss}+\hat{B}_{ss}\hat{K}_{s})\hat{\zeta}_{s}(t)+\hat{\eta}_{s}(t), (94)

where

\hat{\eta}_{s}(t)=\sum_{v\in\mathcal{L}_{s}}\hat{H}(v,s)\sum_{w_{j}\rightarrow v}I_{v,\{j\}}\hat{w}_{j}(t-l_{vs}), (95)

with

\hat{H}(v,s)=(\hat{A}_{sr_{1}}+\hat{B}_{sr_{1}}\hat{K}_{r_{1}})\cdots(\hat{A}_{r_{l_{vs}-1}v}+\hat{B}_{r_{l_{vs}-1}v}\hat{K}_{v}),

and \hat{H}(v,s)=I if v=s. Note that for any v\in\mathcal{L}_{s} and any w_{j}\rightarrow v in Eqs. (92) and (95), we set \hat{w}_{j}(k-l_{vs})=w_{j}(k-l_{vs})=0 if k<l_{vs}. One can check that \bar{\varepsilon} satisfies (44) and (47). We then have from Lemmas 7-8 that \norm{\hat{K}_{r}}\leq\Gamma+1=\tilde{\Gamma} for all r\in\mathcal{U}. It follows that \norm{\hat{A}_{sr}+\hat{B}_{sr}\hat{K}_{r}}\leq(\tilde{\Gamma}+1)^{2} for all r\in\mathcal{U} with r\neq s. Noting from the construction of \mathcal{P}(\mathcal{U},\mathcal{H}) in (7) that l_{vs}\leq D_{\max}\leq p for all v\in\mathcal{L}_{s}, one can now show that

\norm{H(v,s)-\hat{H}(v,s)}\leq\delta_{h}\bar{\varepsilon}, (96)

which also implies that

\norm{\hat{H}(v,s)}\leq\delta_{h}\bar{\varepsilon}+\beta, (97)

for all v\in\mathcal{L}_{s}. For any k\in\{0,\dots,t\}, we then have from the above arguments that

\sqrt{\mathbb{E}\Big[\norm{\eta_{s}(k)-\hat{\eta}_{s}(k)}^{2}\Big]} = \bigg(\mathbb{E}\bigg[\Big\|\sum_{v\in\mathcal{L}_{s}}\sum_{w_{j}\rightarrow v}\big(H(v,s)I_{v,\{j\}}w_{j}(k-l_{vs})-\hat{H}(v,s)I_{v,\{j\}}\hat{w}_{j}(k-l_{vs})\big)\Big\|^{2}\bigg]\bigg)^{\frac{1}{2}}
\leq \sum_{v\in\mathcal{L}_{s}}\sum_{w_{j}\rightarrow v}\Big(\sqrt{\mathbb{E}\Big[\big\|\big(H(v,s)-\hat{H}(v,s)\big)w_{j}(k-l_{vs})\big\|^{2}\Big]}+\sqrt{\mathbb{E}\Big[\big\|\hat{H}(v,s)\big(w_{j}(k-l_{vs})-\hat{w}_{j}(k-l_{vs})\big)\big\|^{2}\Big]}\Big)
\leq p\big(\delta_{h}\zeta_{b}\bar{\varepsilon}+(\delta_{h}\bar{\varepsilon}+\beta)\delta_{w}\bar{\varepsilon}\big), (98)

where the first inequality follows from Lemma 14. To obtain (98), we first note (89)-(90) and (96)-(97). We then use the fact that |\mathcal{L}_{s}|\leq p, which follows from the definition of the information graph \mathcal{P}(\mathcal{U},\mathcal{H}) given by (7), and the fact that for any v\in\mathcal{U} with s_{j}(0)=v, w_{j} is the only noise term such that w_{j}\rightarrow v (see Footnote 2). From (93) and (98), we also obtain

\sqrt{\mathbb{E}\Big[\norm{\hat{\eta}_{s}(k)}^{2}\Big]} = \sqrt{\mathbb{E}\Big[\norm{\hat{\eta}_{s}(k)-\eta_{s}(k)+\eta_{s}(k)}^{2}\Big]}
\leq \sqrt{\mathbb{E}\Big[\norm{\hat{\eta}_{s}(k)-\eta_{s}(k)}^{2}\Big]}+\sqrt{\mathbb{E}\Big[\norm{\eta_{s}(k)}^{2}\Big]}
\leq p\big(\delta_{h}\zeta_{b}\bar{\varepsilon}+(\delta_{h}\bar{\varepsilon}+\beta)\delta_{w}\bar{\varepsilon}\big)+\zeta_{b}, (99)

where the first inequality follows from Lemma 14.

Now, let us denote \tilde{L}_{ss}=A_{ss}+B_{ss}\hat{K}_{s} and \hat{L}_{ss}=\hat{A}_{ss}+\hat{B}_{ss}\hat{K}_{s}. Recalling that \hat{\zeta}_{s}(0)=\tilde{\zeta}_{s}(0)=\sum_{w_{i}\rightarrow s}I_{s,\{i\}}x_{i}(0), where x(0)=0 as we assumed before, one can unroll Eqs. (91) and (94) and show that

\hat{\zeta}_{s}(t+1)-\tilde{\zeta}_{s}(t+1)=\sum_{k=0}^{t}\big(\hat{L}_{ss}^{t-k}\hat{\eta}_{s}(k)-\tilde{L}_{ss}^{t-k}\eta_{s}(k)\big). (100)
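
The identity (100) is the standard unrolling of two linear recursions started from the same (zero) initial condition; as an illustrative aside, it can be confirmed numerically. The sketch below uses arbitrary stand-in matrices and noise sequences (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 3, 20
L_tld = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)   # stands in for L~_ss
L_hat = L_tld + 0.01 * rng.standard_normal((n, n))       # stands in for L^_ss
eta = rng.standard_normal((T, n))                        # driving noise eta_s
eta_hat = eta + 0.01 * rng.standard_normal((T, n))       # its estimate eta^_s

# Run both recursions from the same zero initial condition.
z_tld, z_hat = np.zeros(n), np.zeros(n)
for t in range(T):
    z_tld = L_tld @ z_tld + eta[t]
    z_hat = L_hat @ z_hat + eta_hat[t]

# Closed form of the difference, as in Eq. (100):
diff = sum(np.linalg.matrix_power(L_hat, T - 1 - k) @ eta_hat[k]
           - np.linalg.matrix_power(L_tld, T - 1 - k) @ eta[k]
           for k in range(T))
assert np.allclose(z_hat - z_tld, diff)
```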

Since \norm{\hat{A}-A}\leq\bar{\varepsilon} and \norm{\hat{B}-B}\leq\bar{\varepsilon}, where \bar{\varepsilon} satisfies (44) as we argued above, we have from Lemma 7 that

\norm{\tilde{L}_{ss}^{k}}\leq\kappa\Big(\frac{\gamma+1}{2}\Big)^{k}\quad\forall k\in\mathbb{Z}_{\geq 0}, (101)

where \kappa\in\mathbb{R}_{\geq 1} and \gamma\in\mathbb{R} with 0<\gamma<1. Moreover, since \norm{\hat{L}_{ss}-\tilde{L}_{ss}}=\norm{\hat{A}_{ss}-A_{ss}+(\hat{B}_{ss}-B_{ss})\hat{K}_{s}}\leq(\tilde{\Gamma}+1)\bar{\varepsilon}, we have from Lemma 15 that

\norm{\hat{L}_{ss}^{k}-\tilde{L}_{ss}^{k}} \leq k\kappa^{2}\Big(\kappa(\tilde{\Gamma}+1)\bar{\varepsilon}+\frac{\gamma+1}{2}\Big)^{k-1}(\tilde{\Gamma}+1)\bar{\varepsilon}
\leq k\kappa^{2}\Big(\frac{\gamma+3}{4}\Big)^{k-1}(\tilde{\Gamma}+1)\bar{\varepsilon}\quad\forall k\in\mathbb{Z}_{\geq 0}, (102)

where (102) follows from the choice of \bar{\varepsilon} in (54). Now, considering any term in the summation on the right-hand side of (100), we have

\sqrt{\mathbb{E}\Big[\norm{\hat{L}_{ss}^{t-k}\hat{\eta}_{s}(k)-\tilde{L}_{ss}^{t-k}\eta_{s}(k)}^{2}\Big]}
\leq \sqrt{\mathbb{E}\Big[\norm{(\hat{L}_{ss}^{t-k}-\tilde{L}_{ss}^{t-k})\hat{\eta}_{s}(k)}^{2}\Big]}+\sqrt{\mathbb{E}\Big[\norm{\tilde{L}_{ss}^{t-k}(\hat{\eta}_{s}(k)-\eta_{s}(k))}^{2}\Big]}
\leq (t-k)\kappa^{2}\Big(\frac{\gamma+3}{4}\Big)^{t-k-1}(\tilde{\Gamma}+1)\bar{\varepsilon}\Big(p\big(\delta_{w}\delta_{h}\bar{\varepsilon}+\delta_{h}\zeta_{b}+\delta_{w}\beta\big)\bar{\varepsilon}+\zeta_{b}\Big)+\kappa\Big(\frac{\gamma+1}{2}\Big)^{t-k}p\big(\delta_{w}\delta_{h}\bar{\varepsilon}+\delta_{h}\zeta_{b}+\delta_{w}\beta\big)\bar{\varepsilon}
\leq (t-k)\kappa^{2}\Big(\frac{\gamma+3}{4}\Big)^{t-k-1}(\tilde{\Gamma}+1)p\big(2\delta_{w}+2\zeta_{b}\big)\bar{\varepsilon}+\kappa\Big(\frac{\gamma+1}{2}\Big)^{t-k}p\big((\beta+1)\delta_{w}+\delta_{h}\zeta_{b}\big)\bar{\varepsilon}, (103)

where the first inequality follows from Lemma 14, and the second inequality uses the upper bounds provided in (98)-(99) and (101)-(102). Moreover, one can show that \bar{\varepsilon} defined in (54) satisfies \delta_{h}\bar{\varepsilon}\leq 1 and \beta\bar{\varepsilon}\leq 1, which, via algebraic manipulations, yields (103). We then see from (100) that

\sqrt{\mathbb{E}\Big[\norm{\hat{\zeta}_{s}(t+1)-\tilde{\zeta}_{s}(t+1)}^{2}\Big]}
\leq \sum_{k=0}^{t}\sqrt{\mathbb{E}\Big[\norm{\hat{L}_{ss}^{t-k}\hat{\eta}_{s}(k)-\tilde{L}_{ss}^{t-k}\eta_{s}(k)}^{2}\Big]}
\leq \kappa p\big((\beta+1)\delta_{w}+\delta_{h}\zeta_{b}\big)\bar{\varepsilon}\sum_{k=0}^{t}\Big(\frac{\gamma+1}{2}\Big)^{t-k}+\kappa^{2}(\tilde{\Gamma}+1)p\big(2\delta_{w}+2\zeta_{b}\big)\bar{\varepsilon}\sum_{k=0}^{t}(t-k)\Big(\frac{\gamma+3}{4}\Big)^{t-k-1}
\leq \frac{2\kappa p}{1-\gamma}\big((\beta+1)\delta_{w}+\delta_{h}\zeta_{b}\big)\bar{\varepsilon}+\frac{16\kappa^{2}(\tilde{\Gamma}+1)p}{(1-\gamma)^{2}}\big(2\delta_{w}+2\zeta_{b}\big)\bar{\varepsilon}, (104)
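
As a quick numerical check of the two series bounds just used (the constants below are arbitrary illustrative choices):

```python
# Verify sum_{k=0}^{t} x1^{t-k} <= 1/(1-x1) = 2/(1-gamma) for x1 = (gamma+1)/2,
# and sum_{k=0}^{t} (t-k) x2^{t-k-1} <= 1/(1-x2)^2 = 16/(1-gamma)^2 for
# x2 = (gamma+3)/4, as used to pass from the sums to (104).
gamma, t = 0.6, 500
x1, x2 = (gamma + 1) / 2, (gamma + 3) / 4
s1 = sum(x1 ** (t - k) for k in range(t + 1))
s2 = sum((t - k) * x2 ** (t - k - 1) for k in range(t + 1))
assert s1 <= 2 / (1 - gamma)            # geometric series
assert s2 <= 16 / (1 - gamma) ** 2      # differentiated geometric series
```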

where the first inequality follows from Lemma 14, and (104) follows from the standard geometric-series bounds \sum_{k=0}^{t}x^{t-k}\leq\frac{1}{1-x} and \sum_{k=0}^{t}(t-k)x^{t-k-1}\leq\frac{1}{(1-x)^{2}} for 0<x<1. Now, substituting Eq. (88) into the right-hand side of (104), one can show that

\sqrt{\mathbb{E}\Big[\norm{\hat{\zeta}_{s}(t+1)-\tilde{\zeta}_{s}(t+1)}^{2}\Big]}\leq\frac{1}{q\tilde{\Gamma}}\big(\Lambda_{1}(1.1\Lambda_{2}\bar{\varepsilon})+\Lambda_{2}\big)\bar{\varepsilon}, (105)

where we note that \Lambda_{1}>0 and \Lambda_{2}>0 by their definitions.

Next, considering any s\in\mathcal{U} that does not have a self loop, we have from the arguments in the proof of Lemma 9 that Eq. (35) can be rewritten as \tilde{\zeta}_{s}(t+1)=\eta_{s}(t), where \eta_{s}(t) is defined in Eq. (92). Similarly, Eq. (28) can be rewritten as \hat{\zeta}_{s}(t+1)=\hat{\eta}_{s}(t), where \hat{\eta}_{s}(t) is defined in Eq. (95). Using similar arguments to those above, one can then show that (105) also holds.

Further recalling Eqs. (23) and (34), we know that \hat{u}(t+1)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\hat{K}_{s}\hat{\zeta}_{s}(t+1) and \tilde{u}(t+1)=\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\hat{K}_{s}\tilde{\zeta}_{s}(t+1), which imply that

\sqrt{\mathbb{E}\Big[\norm{\hat{u}(t+1)-\tilde{u}(t+1)}^{2}\Big]} = \sqrt{\mathbb{E}\Big[\big\|\sum_{s\in\mathcal{U}}I_{\mathcal{V},s}\hat{K}_{s}(\hat{\zeta}_{s}(t+1)-\tilde{\zeta}_{s}(t+1))\big\|^{2}\Big]}
\leq \sum_{s\in\mathcal{U}}\sqrt{\mathbb{E}\Big[\norm{I_{\mathcal{V},s}\hat{K}_{s}(\hat{\zeta}_{s}(t+1)-\tilde{\zeta}_{s}(t+1))}^{2}\Big]}
\leq \tilde{\Gamma}\sum_{s\in\mathcal{U}}\sqrt{\mathbb{E}\Big[\norm{\hat{\zeta}_{s}(t+1)-\tilde{\zeta}_{s}(t+1)}^{2}\Big]}
\leq \big(\Lambda_{1}(1.1\Lambda_{2}\bar{\varepsilon})+\Lambda_{2}\big)\bar{\varepsilon},

where the first inequality follows from Lemma 14, the second inequality follows from \norm{\hat{K}_{s}}\leq\tilde{\Gamma} for all s\in\mathcal{U}, as we argued above, and the last inequality follows from (105) and the fact that |\mathcal{U}|=q. Moreover, we can show that

\Lambda_{1} \leq q\tilde{\Gamma}\frac{34\kappa^{2}p}{(1-\gamma)^{2}}(\tilde{\Gamma}+1)(\beta+1)\frac{\tilde{\Gamma}\kappa}{1-\gamma}
\leq \frac{34\kappa^{3}pq}{(1-\gamma)^{3}}(\tilde{\Gamma}+1)\tilde{\Gamma}^{2D_{\max}+3},

where the first inequality follows from the fact that \frac{\kappa}{1-\gamma}>1, and the second inequality follows from the fact that \beta+1\leq\tilde{\Gamma}^{2D_{\max}+1}, as we argued above. One can then show that \bar{\varepsilon} given in (54) satisfies 0<\bar{\varepsilon}\leq\frac{1}{11\Lambda_{1}}. Thus, we obtain the following:

1.1\Lambda_{1}\bar{\varepsilon}+1\leq 1.1
\Leftrightarrow\ 1.1\Lambda_{1}\Lambda_{2}\bar{\varepsilon}+\Lambda_{2}\leq 1.1\Lambda_{2}
\Leftrightarrow\ \Lambda_{1}(1.1\Lambda_{2}\bar{\varepsilon})\bar{\varepsilon}+\Lambda_{2}\bar{\varepsilon}\leq 1.1\Lambda_{2}\bar{\varepsilon}.

It follows that

\sqrt{\mathbb{E}\Big[\norm{\hat{u}(t+1)-\tilde{u}(t+1)}^{2}\Big]}\leq 1.1\Lambda_{2}\bar{\varepsilon},

which completes the induction step, i.e., we have shown that \sqrt{\mathbb{E}\big[\norm{\hat{u}(k)-\tilde{u}(k)}^{2}\big]}\leq 1.1\Lambda_{2}\bar{\varepsilon} holds for all k\in\{0,\dots,t+1\}.

Next, using similar arguments to those for (87), we have \sqrt{\mathbb{E}\big[\norm{\hat{x}(t)-\tilde{x}(t)}^{2}\big]}\leq\frac{1.1\Gamma\Lambda_{2}\kappa}{1-\gamma}\bar{\varepsilon} for all t\in\mathbb{Z}_{\geq 0}. It then follows from (86) that (56) holds for all t\in\mathbb{Z}_{\geq 0}. ∎

D.5 Proof of Proposition 4

For notational simplicity in this proof, we denote

\Lambda=\frac{58\kappa^{2}(\tilde{\Gamma}+1)^{2D_{\max}+3}p^{2}q^{2}}{(1-\gamma)^{2}}. (106)

For all t\in\mathbb{Z}_{\geq 0}, we then see from Lemma 11 that

\mathbb{E}\Big[\norm{\hat{u}(t)-\tilde{u}(t)}^{2}\Big]\leq(\Lambda\zeta_{b}\bar{\varepsilon})^{2},

and

\mathbb{E}\Big[\norm{\hat{x}(t)-\tilde{x}(t)}^{2}\Big]\leq\Big(\frac{\kappa\Gamma}{1-\gamma}\Lambda\zeta_{b}\bar{\varepsilon}\Big)^{2},

where \hat{u}(t) (resp., \tilde{u}(t)) is given by Eq. (23) (resp., Eq. (34)), \hat{x}(t) (resp., \tilde{x}(t)) is given by Eq. (32) (resp., Eq. (36)), and \zeta_{b} is defined in Eq. (54). Similarly, we see from Corollary 1 that

\mathbb{E}\Big[\norm{\hat{x}(t)}^{2}\Big]\leq\Big(\frac{\kappa\Gamma}{1-\gamma}\Lambda\zeta_{b}\bar{\varepsilon}+q\zeta_{b}\Big)^{2},

and

\mathbb{E}\Big[\norm{\hat{u}(t)}^{2}\Big]\leq(\Lambda\zeta_{b}\bar{\varepsilon}+q\tilde{\Gamma}\zeta_{b})^{2},

for all t\in\mathbb{Z}_{\geq 0}.

To proceed, we have the following:

\hat{J}-\tilde{J} = \limsup_{T\to\infty}\mathbb{E}\Big[\frac{1}{T}\sum_{t=0}^{T-1}\big(\hat{x}(t)^{\top}Q\hat{x}(t)+\hat{u}(t)^{\top}R\hat{u}(t)\big)\Big]-\lim_{T\to\infty}\mathbb{E}\Big[\frac{1}{T}\sum_{t=0}^{T-1}\big(\tilde{x}(t)^{\top}Q\tilde{x}(t)+\tilde{u}(t)^{\top}R\tilde{u}(t)\big)\Big]
= \limsup_{T\to\infty}\mathbb{E}\Big[\frac{1}{T}\sum_{t=0}^{T-1}\big(\hat{x}(t)^{\top}Q\hat{x}(t)-\tilde{x}(t)^{\top}Q\tilde{x}(t)+\hat{u}(t)^{\top}R\hat{u}(t)-\tilde{u}(t)^{\top}R\tilde{u}(t)\big)\Big]. (107)

Now, considering any term in the summation on the right-hand side of Eq. (107), and dropping the dependency on t for notational simplicity, we have the following:

\mathbb{E}\Big[\hat{x}^{\top}Q\hat{x}-\tilde{x}^{\top}Q\tilde{x}\Big] = \mathbb{E}\Big[\hat{x}^{\top}Q(\hat{x}-\tilde{x})+(\hat{x}-\tilde{x})^{\top}Q\tilde{x}\Big]
\leq \mathbb{E}\Big[\norm{Q\hat{x}}\norm{\hat{x}-\tilde{x}}\Big]+\mathbb{E}\Big[\norm{\hat{x}-\tilde{x}}\norm{Q\tilde{x}}\Big]
\leq \sqrt{\mathbb{E}\Big[\norm{Q\hat{x}}^{2}\Big]\mathbb{E}\Big[\norm{\hat{x}-\tilde{x}}^{2}\Big]}+\sqrt{\mathbb{E}\Big[\norm{\hat{x}-\tilde{x}}^{2}\Big]\mathbb{E}\Big[\norm{Q\tilde{x}}^{2}\Big]}
\leq \sigma_{1}(Q)\Big(\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}+q\zeta_{b}\Big)\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}+\sigma_{1}(Q)\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}\,q\zeta_{b}
= \sigma_{1}(Q)\Big(\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}+2q\zeta_{b}\Big)\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}, (108)

where the first two inequalities follow from the Cauchy-Schwarz inequality, and the third inequality follows from the upper bounds on \mathbb{E}\big[\norm{\hat{x}}^{2}\big], \mathbb{E}\big[\norm{\hat{x}-\tilde{x}}^{2}\big], and \mathbb{E}\big[\norm{\tilde{x}}^{2}\big] given above and in Lemma 10. Similarly, we have

\mathbb{E}\Big[\hat{u}^{\top}R\hat{u}-\tilde{u}^{\top}R\tilde{u}\Big] \leq \sqrt{\mathbb{E}\Big[\norm{R\hat{u}}^{2}\Big]\mathbb{E}\Big[\norm{\hat{u}-\tilde{u}}^{2}\Big]}+\sqrt{\mathbb{E}\Big[\norm{\hat{u}-\tilde{u}}^{2}\Big]\mathbb{E}\Big[\norm{R\tilde{u}}^{2}\Big]}
\leq \sigma_{1}(R)(\Lambda\zeta_{b}\bar{\varepsilon}+q\tilde{\Gamma}\zeta_{b})\Lambda\zeta_{b}\bar{\varepsilon}+\sigma_{1}(R)\Lambda\zeta_{b}\bar{\varepsilon}\,q\tilde{\Gamma}\zeta_{b}
= \sigma_{1}(R)(\Lambda\zeta_{b}\bar{\varepsilon}+2q\tilde{\Gamma}\zeta_{b})\Lambda\zeta_{b}\bar{\varepsilon}, (109)

where the second inequality follows from the upper bounds on \mathbb{E}\big[\norm{\hat{u}}^{2}\big], \mathbb{E}\big[\norm{\hat{u}-\tilde{u}}^{2}\big], and \mathbb{E}\big[\norm{\tilde{u}}^{2}\big] given above and in Lemma 10. Combining (108) and (109), we obtain from Eq. (107) that

\hat{J}-\tilde{J} \leq \sigma_{1}(Q)\Big(\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}+2q\zeta_{b}\Big)\frac{\kappa\Gamma\Lambda\zeta_{b}}{1-\gamma}\bar{\varepsilon}+\sigma_{1}(R)(\Lambda\zeta_{b}\bar{\varepsilon}+2q\tilde{\Gamma}\zeta_{b})\Lambda\zeta_{b}\bar{\varepsilon}
\leq \Big(\frac{\kappa\tilde{\Gamma}\zeta_{b}}{1-\gamma}\Big)^{2}(\Lambda^{2}\bar{\varepsilon}+2q\Lambda)(\sigma_{1}(Q)+\sigma_{1}(R))\bar{\varepsilon}
\leq \Big(\frac{\kappa\tilde{\Gamma}\zeta_{b}}{1-\gamma}\Big)^{2}3\Lambda pq(\sigma_{1}(Q)+\sigma_{1}(R))\bar{\varepsilon}, (110)

where the second inequality follows from the fact that \frac{\kappa\tilde{\Gamma}}{1-\gamma}\geq 1. To obtain (110), one can show that \Lambda^{2}\bar{\varepsilon}\leq\Lambda pq. Finally, substituting the expressions for \zeta_{b} and \Lambda given in (54) and (106), respectively, we obtain from (110) that (59) holds. ∎
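
As an illustrative aside, the cost gap bounded above can also be approximated empirically by rolling out both closed loops over a long horizon and averaging the stage costs. The sketch below uses an arbitrary toy system and two arbitrary stand-in policies (all names hypothetical; this is not the paper's decentralized setup):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, T, sigma_w = 3, 2, 20000, 0.1
A = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)   # a stable toy system
B = rng.standard_normal((n, m)) / np.sqrt(n)
Q, R = np.eye(n), np.eye(m)
K_hat = -0.05 * rng.standard_normal((m, n))          # stand-in for the learned policy
K_tld = np.zeros((m, n))                             # stand-in for the reference policy

def avg_cost(K):
    # Empirical time-average LQR cost of u = K x under i.i.d. Gaussian noise.
    x, total = np.zeros(n), 0.0
    for _ in range(T):
        u = K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + sigma_w * rng.standard_normal(n)
    return total / T

print("estimated cost gap:", avg_cost(K_hat) - avg_cost(K_tld))
```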

Appendix E Auxiliary Lemmas

Lemma 12.

[9, Lemma 34] Let w(t)\in\mathbb{R}^{n} be a Gaussian random vector with distribution \mathcal{N}(0,\sigma_{w}^{2}I), for all t\in\{0,\dots,N-1\}, where \sigma_{w}\in\mathbb{R}_{\geq 0}. Then, for any N\geq 2 and any \delta_{w}>0, the following holds with probability at least 1-\delta_{w}:

\max_{0\leq t\leq N-1}\norm{w(t)}\leq\sigma_{w}\sqrt{5n\log\frac{N}{\delta_{w}}}.
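
Lemma 12 is a standard Gaussian maximal inequality; as an aside, it is easy to check empirically. The following sketch (the failure probability, dimensions, and trial count are arbitrary illustrative choices) estimates how often the bound is violated:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N, sigma_w, delta_w, trials = 5, 100, 1.0, 0.05, 2000
bound = sigma_w * np.sqrt(5 * n * np.log(N / delta_w))
failures = sum(
    np.linalg.norm(sigma_w * rng.standard_normal((N, n)), axis=1).max() > bound
    for _ in range(trials)
)
print("empirical failure rate:", failures / trials, "(should be <=", delta_w, ")")
```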
Lemma 13.

[9, Lemma 36] Let \{z(t)\}_{t\geq 0} be a sequence of random vectors adapted to a filtration \{\mathcal{F}_{t}\}_{t\geq 0}, where z(t)\in\mathbb{R}^{d} for all t\in\mathbb{Z}_{\geq 0}. Suppose z(t) is conditionally Gaussian on \mathcal{F}_{t-1} with \mathbb{E}[z(t)z(t)^{\top}|\mathcal{F}_{t-1}]\succeq\sigma_{z}^{2}I for all t\in\mathbb{Z}_{\geq 1}, where \sigma_{z}\in\mathbb{R}_{>0}. Then, for any \delta_{z}\in\mathbb{R}_{>0} and any t\geq 200d\log\frac{12}{\delta_{z}}, the following holds with probability at least 1-\delta_{z}:

\sum_{k=0}^{t-1}z(k)z(k)^{\top}\succeq\frac{(t-1)\sigma_{z}^{2}}{40}I.
Lemma 14.

Let X_{1},\dots,X_{t} be a sequence of random vectors, where t\in\mathbb{Z}_{\geq 1}. Then,

\mathbb{E}\Big[\big(\sum_{k=1}^{t}X_{k}\big)^{\top}\big(\sum_{k=1}^{t}X_{k}\big)\Big]\leq\Big(\sum_{k=1}^{t}\sqrt{\mathbb{E}\Big[\norm{X_{k}}^{2}\Big]}\Big)^{2}.
Proof.

We have the following:

\mathbb{E}\Big[\big(\sum_{k=1}^{t}X_{k}\big)^{\top}\big(\sum_{k=1}^{t}X_{k}\big)\Big] = \sum_{k=1}^{t}\mathbb{E}\Big[\norm{X_{k}}^{2}\Big]+2\sum_{k_{1}=1}^{t}\sum_{k_{2}=k_{1}+1}^{t}\mathbb{E}\Big[X_{k_{1}}^{\top}X_{k_{2}}\Big]
\leq \sum_{k=1}^{t}\mathbb{E}\Big[\norm{X_{k}}^{2}\Big]+2\sum_{k_{1}=1}^{t}\sum_{k_{2}=k_{1}+1}^{t}\mathbb{E}\Big[\norm{X_{k_{1}}}\norm{X_{k_{2}}}\Big]
\leq \sum_{k=1}^{t}\mathbb{E}\Big[\norm{X_{k}}^{2}\Big]+2\sum_{k_{1}=1}^{t}\sum_{k_{2}=k_{1}+1}^{t}\sqrt{\mathbb{E}\Big[\norm{X_{k_{1}}}^{2}\Big]}\sqrt{\mathbb{E}\Big[\norm{X_{k_{2}}}^{2}\Big]}
= \Big(\sum_{k=1}^{t}\sqrt{\mathbb{E}\Big[\norm{X_{k}}^{2}\Big]}\Big)^{2},

where the first and second inequalities follow from the Cauchy-Schwarz inequality. ∎
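
As an aside, Lemma 14 can be checked directly by Monte Carlo on deliberately correlated vectors; the sketch below uses an arbitrary test distribution (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
t, d, samples = 4, 3, 200_000
# Correlated test vectors: scaled copies of a shared Gaussian plus independent noise.
Z = rng.standard_normal((samples, d))
X = [(k + 1) * Z + rng.standard_normal((samples, d)) for k in range(t)]
S = sum(X)
lhs = np.mean(np.sum(S * S, axis=1))                            # E||sum_k X_k||^2
rhs = sum(np.sqrt(np.mean(np.sum(Xk * Xk, axis=1))) for Xk in X) ** 2
assert lhs <= rhs * (1 + 1e-2)                                  # Monte Carlo slack
print(lhs, "<=", rhs)
```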

Lemma 15.

[29, Lemma 5] Consider any matrices M\in\mathbb{R}^{n\times n} and \Delta\in\mathbb{R}^{n\times n}. Let \kappa_{M}\in\mathbb{R}_{\geq 1} and \gamma_{M}\in\mathbb{R}_{>0} be such that \gamma_{M}>\rho(M) and \norm{M^{k}}\leq\kappa_{M}\gamma_{M}^{k} for all k\in\mathbb{Z}_{\geq 0}. Then, for all k\in\mathbb{Z}_{\geq 0},

\norm{(M+\Delta)^{k}-M^{k}}\leq k\kappa_{M}^{2}(\kappa_{M}\norm{\Delta}+\gamma_{M})^{k-1}\norm{\Delta}.
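
A numerical sanity check of Lemma 15 on a random contraction follows; the choices of \kappa_{M} and \gamma_{M} below are one valid (not tightest) pair under the stated assumptions, and the matrices are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
M = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
Delta = 1e-2 * rng.standard_normal((n, n))
kappa_M = 1.0
# ||M^k||_2 <= ||M||_2^k by submultiplicativity, and ||M||_2 > rho(M) generically.
gamma_M = np.linalg.norm(M, 2)
d = np.linalg.norm(Delta, 2)
for k in range(1, 30):
    lhs = np.linalg.norm(np.linalg.matrix_power(M + Delta, k)
                         - np.linalg.matrix_power(M, k), 2)
    rhs = k * kappa_M**2 * (kappa_M * d + gamma_M) ** (k - 1) * d
    assert lhs <= rhs + 1e-12
print("Lemma 15 bound verified for k = 1..29")
```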