On the Sample Complexity of Decentralized Linear Quadratic Regulator with Partially Nested Information Structure
Abstract
We study the problem of control policy design for decentralized state-feedback linear quadratic control with a partially nested information structure, when the system model is unknown. We propose a model-based learning solution, which consists of two steps. First, we estimate the unknown system model from a single system trajectory of finite length, using least squares estimation. Next, based on the estimated system model, we design a decentralized control policy that satisfies the desired information structure. We show that the suboptimality gap between our control policy and the optimal decentralized control policy (designed using accurate knowledge of the system model) scales linearly with the estimation error of the system model. Using this result, we provide an end-to-end sample complexity result for learning decentralized controllers for a linear quadratic control problem with a partially nested information structure.
1 Introduction
In large-scale control systems, the control policy is often required to be decentralized, in the sense that each controller may use only partial state information when designing its local control policy. For example, a given controller may only receive a subset of the global state measurements (e.g., [35]), and there may be a delay in receiving the measurements (e.g., [25]). In general, finding a globally optimal control policy under information constraints is NP-hard, even if the system model is known at the controllers [39, 31, 6]. This has led to a large literature on identifying tractable subclasses of the problem. For instance, if the information structure describing the decentralized control problem is partially nested [21], the optimal solution to the state-feedback linear quadratic control problem can be found efficiently using dynamic programming [26]. Other conditions, such as quadratic invariance [32, 33], have also been identified as yielding tractable subclasses of the problem.
However, the classical work in this field assumes knowledge of the system model at the controllers. In this work, we are interested in the situation where the system model is not known a priori [23]. In such a case, the existing algorithms do not apply. Moreover, it is not clear whether subclasses such as problems with partially nested information structures, or problems satisfying quadratic invariance, remain any more tractable than the general decentralized control problem in this case.
In this paper, we consider a decentralized infinite-horizon state-feedback Linear Quadratic Regulator (LQR) control problem with a partially nested information structure [35, 26] and assume that the controllers do not have access to the system model. We use a model-based learning approach, where we first identify the system model, and then use it to design a decentralized control policy that satisfies the prescribed information constraints.
Related Work
Solving optimal control problems without prior knowledge of the system model has received much attention recently. One of the most studied problems is the centralized LQR problem. For this problem, two broad classes of methods have been studied, namely, model-based learning [1, 29, 12] and model-free learning [16, 41, 28, 20]. In the model-based learning approach, a system model is first estimated from observed system trajectories using some system identification method. A control policy can then be obtained based on the estimated system model. In the model-free learning approach, the objective function in the LQR problem is viewed directly as a function of the control policies. Based on zeroth-order optimization methods (e.g., [19, 30]), the optimal solution can then be obtained using gradient descent, where the gradient of the objective function is estimated from data samples of system trajectories. Moreover, the model-based learning approach has also been studied for the centralized linear quadratic Gaussian control problem [43]. In general, compared to model-free learning, model-based learning tends to require fewer data samples to achieve a policy of equivalent performance [38].
Most of the previous works on model-based learning for centralized LQR build on recent advances in non-asymptotic analyses for system identification of linear dynamical systems with full state observations (e.g., [13, 36, 34]). Such non-asymptotic analyses (i.e., sample complexity results) relate the estimation error of the system matrices to the number of samples used for system identification. In particular, it was shown in [36] that when using a single system trajectory, the least squares approach for system identification achieves the optimal sample complexity up to logarithmic factors. In this paper, we utilize a similar least squares approach for estimating the system matrices from a single system trajectory. Although the system matrices in our problem are structured, as dictated by the interconnections among the subsystems, we leverage the results in [2, 10] to provide a non-asymptotic analysis of the resulting estimation error.
There are few results on solving decentralized linear quadratic control problems with information constraints, when the system model is unknown. In [18], the authors studied a decentralized output-feedback linear quadratic control problem, under the assumption that the quadratic invariance condition is satisfied. The authors proposed a model-free approach and provided a sample complexity analysis. They focused on a finite-horizon setting, since gradient-based optimization methods may not converge to the optimal controller for infinite-horizon decentralized linear quadratic control problems with information constraints, even when the system model is known [17, 7]. In [27], the authors proposed a consensus-based model-free learning algorithm for multi-agent decentralized LQR over an infinite horizon, where each agent (i.e., controller) has access to a subset of the global state without delay. They showed that their algorithm converges to a control policy that is a stationary point of the objective function in the LQR problem. In [15], the authors studied model-based learning for LQR with subspace constraints on the closed-loop responses. However, those constraints may not lead to controllers that satisfy the information constraints that we consider in this paper (e.g., [42]).
There is also a line of research on online adaptive control for centralized LQR with unknown system models, using either model-based learning [1, 11, 10], or model-free learning [3, 8]. The goal there is to adaptively design a control policy in an online manner when new data samples from the system trajectory become available, and bound the corresponding regret.
Contributions
We propose a two-step model-based approach to solving the problem of learning decentralized LQR with a partially nested information structure. Here, we summarize our contributions and technical challenges in the paper.
•
In Section 3, we provide a sample complexity result for estimating the system model from a single system trajectory using a least squares approach. Despite the existence of a sparsity pattern in the system model considered in our problem, we adapt the analyses in [10, 9] for least squares estimation of general linear system models (without any sparsity pattern) to our setting, and show that such a system identification method for general system models suffices for our ensuing analyses.
•
In Section 4, based on the estimated system model, we design a novel decentralized control policy that satisfies the given information structure. Our control policy is inspired by [26], which developed the optimal controller for the decentralized LQR problem with a partially nested information structure and known system model. The optimal controller therein depends on some internal states, each of which evolves according to an auxiliary linear system (characterized by the actual model of the original system with a disturbance term from the original system) and correlates with other internal states. Accordingly, this complicated form of the internal states makes it challenging to extend the design in [26] to the case when the system model is unknown. To tackle this, we capitalize on the observation that the optimal controller proposed in [26] can be viewed as a disturbance-feedback control policy that maps the history of past disturbances (affecting the original system) to the current control input. Thanks to this viewpoint, we put forth a control policy that uses the aforementioned estimated system model and maps the estimates of past disturbances to the current control input via some estimated internal states. In particular, the estimates of the disturbances are obtained using the estimated system model and the state information of the original system, and each of the estimated internal states evolves according to a linear system characterized by the estimated system model and the estimated disturbances. More importantly, we show that the proposed control policy can be implemented in a decentralized manner that satisfies the prescribed information structure, which requires a careful investigation of the structure of our problem.
•
In Section 5.2, we characterize the performance guarantee (i.e., suboptimality) of the control policy proposed in Section 4. As we discussed above, our control policy requires obtaining estimates of the past disturbances and maintaining the estimated internal states. When we compare the performance of our control policy to that of the optimal decentralized control policy in [26], both the estimates of the past disturbances and the estimated internal states contribute to the suboptimality of our control policy, which creates the major technical challenge in our analyses. We overcome this challenge by carefully investigating the structure of the proposed control policy, and we show that the suboptimality gap between our control policy and the optimal decentralized control policy (designed based on accurate knowledge of the system model) provided in [26] can be decomposed into two terms, both of which scale linearly with the estimation error of the system model.
•
In Section 5.3, we combine the above results and provide an end-to-end sample complexity result for learning decentralized LQR with a partially nested information structure. Surprisingly, despite the presence of the information constraints and the fact that the optimal controller is a linear dynamic controller, our sample complexity result matches that of learning centralized LQR without any information constraints [12].
2 Preliminaries and Problem Formulation
2.1 Notation and Terminology
The sets of integers and real numbers are denoted as $\mathbb{Z}$ and $\mathbb{R}$, respectively. The set of integers (resp., real numbers) that are greater than or equal to $a$ is denoted as $\mathbb{Z}_{\geq a}$ (resp., $\mathbb{R}_{\geq a}$). For a real number $a$, let $\lceil a \rceil$ be the smallest integer that is greater than or equal to $a$. The space of $n$-dimensional real vectors is denoted by $\mathbb{R}^n$, and the space of $m \times n$ real matrices is denoted by $\mathbb{R}^{m \times n}$. For a matrix $P \in \mathbb{R}^{n \times n}$, let $P^\top$, $\mathrm{tr}(P)$, and $\{\sigma_i(P)\}$ be its transpose, trace, and set of singular values, respectively. Without loss of generality, let the singular values of $P$ be ordered as $\sigma_1(P) \geq \cdots \geq \sigma_n(P)$. Let $\|\cdot\|$ denote the spectral norm, i.e., $\|P\| = \sigma_1(P)$ for a matrix $P$, and the Euclidean norm for a vector. Let $\|P\|_F$ denote the Frobenius norm of $P$. A positive semidefinite matrix $P$ is denoted by $P \succeq 0$, and $P \succeq Q$ if and only if $P - Q \succeq 0$. Let $\mathbb{S}^n_{+}$ (resp., $\mathbb{S}^n_{++}$) denote the set of positive semidefinite (resp., positive definite) matrices. Let $I$ denote an identity matrix whose dimension can be inferred from the context. Given any integer $n \geq 1$, we define $[n] = \{1, \dots, n\}$. The cardinality of a finite set $\mathcal{A}$ is denoted by $|\mathcal{A}|$. Let $\mathcal{N}(\mu, \Sigma)$ denote a Gaussian distribution with mean $\mu$ and covariance $\Sigma$.
2.2 Solution to Decentralized LQR with Sparsity and Delay Constraints
In this section, we sketch the method developed in [26, 35], which presents the optimal solution to a decentralized LQR problem with a partially nested information structure [21], when the system model is known a priori. First, let us consider a networked system that consists of $p$ interconnected linear-time-invariant (LTI) subsystems. Letting the state, input and disturbance of the subsystem corresponding to node $i$ be $x_i(t) \in \mathbb{R}^{n_i}$, $u_i(t) \in \mathbb{R}^{m_i}$, and $w_i(t) \in \mathbb{R}^{n_i}$, respectively, the subsystem corresponding to node $i$ is given by
$$x_i(t+1) = \sum_{j \in \mathcal{N}_i} \big( A_{ij}\, x_j(t) + B_{ij}\, u_j(t) \big) + w_i(t) \qquad (1)$$
where $\mathcal{N}_i \subseteq \mathcal{V}$ is the set of subsystems whose states and inputs directly affect the state of subsystem $i$, $A_{ij} \in \mathbb{R}^{n_i \times n_j}$, $B_{ij} \in \mathbb{R}^{n_i \times m_j}$, and $w_i(\cdot)$ is a white Gaussian noise process with $w_i(t) \sim \mathcal{N}(0, W_i)$ for all $t \in \mathbb{Z}_{\geq 0}$, where $W_i \succ 0$.¹ (¹The analysis can be extended to the case when $w_i(\cdot)$ is assumed to be a zero-mean white Gaussian noise process with general covariance $W_i$; in that case, our analysis will depend on $\sigma_1(W_i)$ and $\sigma_{n_i}(W_i)$.) For simplicity, we assume throughout this paper that $W_i = I$ for all $i \in \mathcal{V}$. We can also write Eq. (1) as
$$x_i(t+1) = A_{i,\mathcal{N}_i}\, x_{\mathcal{N}_i}(t) + B_{i,\mathcal{N}_i}\, u_{\mathcal{N}_i}(t) + w_i(t) \qquad (2)$$
where $x_{\mathcal{N}_i}(t)$ (resp., $u_{\mathcal{N}_i}(t)$) is the vector that stacks $x_j(t)$ (resp., $u_j(t)$) for all $j \in \mathcal{N}_i$, and $A_{i,\mathcal{N}_i}$ (resp., $B_{i,\mathcal{N}_i}$) collects the corresponding blocks $A_{ij}$ (resp., $B_{ij}$) with $j \in \mathcal{N}_i$. Further letting $x(t) = [x_1(t)^\top \cdots x_p(t)^\top]^\top$ and $u(t) = [u_1(t)^\top \cdots u_p(t)^\top]^\top$, and defining $w(t)$ similarly, with $n = \sum_{i \in \mathcal{V}} n_i$ and $m = \sum_{i \in \mathcal{V}} m_i$, we can compactly write Eq. (1) into the following matrix form:
$$x(t+1) = A\, x(t) + B\, u(t) + w(t) \qquad (3)$$
where the $(i,j)$th block of $A \in \mathbb{R}^{n \times n}$ (resp., $B \in \mathbb{R}^{n \times m}$), i.e., $A_{ij}$ (resp., $B_{ij}$), satisfies $A_{ij} = 0$ (resp., $B_{ij} = 0$) if $j \notin \mathcal{N}_i$. We assume that $w_i(t)$ and $w_j(t')$ are independent for all $i, j \in \mathcal{V}$ with $i \neq j$ and for all $t, t' \in \mathbb{Z}_{\geq 0}$. In other words, $w(\cdot)$ is a white Gaussian noise process with $w(t) \sim \mathcal{N}(0, I)$ for all $t \in \mathbb{Z}_{\geq 0}$. For simplicity, we assume that $x(0) = 0$ throughout this paper.² (²The analysis can be extended to the case when $x(0)$ is given by a zero-mean Gaussian distribution, as one may view $x(0)$ as an additional noise term $w(-1)$.)
Next, we use a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $\mathcal{V} = [p]$ to characterize the information flow among the subsystems in $\mathcal{V}$ due to communication constraints on the subsystems. Each node in $\mathcal{V}$ represents a subsystem, and we assume that $\mathcal{G}$ does not have self loops. We associate any edge $(j, i) \in \mathcal{E}$ with a delay of either $0$ or $1$.³ (³The framework described in this paper can also be used to handle edges with larger delays; see [26] for a detailed discussion.) Then, we define the delay matrix corresponding to $\mathcal{G}$ as $D$ such that: (i) If $i \neq j$ and there is a directed path from $j$ to $i$ in $\mathcal{G}$, then $D_{ij}$ is equal to the sum of delays along the directed path from node $j$ to node $i$ with the smallest accumulated delay; (ii) If $i \neq j$ and there is no directed path from $j$ to $i$ in $\mathcal{G}$, then $D_{ij} = +\infty$; (iii) $D_{ii} = 0$ for all $i \in \mathcal{V}$. Here, we consider the scenario where the information (e.g., state information) corresponding to subsystem $j$ can propagate to subsystem $i$ with a delay of $D_{ij}$ (in time), if and only if there exists a directed path from $j$ to $i$ with an accumulated delay of $D_{ij}$. Note that, as argued in [26], we assume that there is no directed cycle with zero accumulated delay; otherwise, one can first collapse all the nodes in such a directed cycle into a single node, and equivalently consider the resulting directed graph in the framework described above.
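As an illustration, the delay matrix can be computed from the edge delays via an all-pairs shortest-path computation; the following is a minimal Python sketch, where the function name, the edge-list format, and the 0-based node indices are illustrative choices rather than part of the framework above.

```python
import numpy as np

def delay_matrix(p, edges):
    """Compute the delay matrix D of a directed graph with 0/1 edge delays.

    D[i, j] is the smallest accumulated delay over directed paths from
    node j to node i (np.inf if i is unreachable from j, 0 on the diagonal).
    `edges` is a list of triples (j, i, d) meaning an edge j -> i with delay d.
    """
    D = np.full((p, p), np.inf)
    np.fill_diagonal(D, 0.0)
    for (j, i, d) in edges:
        D[i, j] = min(D[i, j], d)
    # Floyd-Warshall over intermediate nodes: a path j -> k -> i.
    for k in range(p):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

# Example (0-indexed): an edge 0 -> 1 with delay 0, and an edge 1 -> 2 with delay 1.
D = delay_matrix(3, [(0, 1, 0), (1, 2, 1)])
```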
To proceed, we consider designing the control input for the LTI system in Eq. (3). We focus on state-feedback control, i.e., we can view $u(t)$ as a policy that maps the states of the LTI system to a control input. Moreover, we require that $u(t)$ satisfy the information structure given by the directed graph $\mathcal{G}$ and the delay matrix $D$ described above. Specifically, considering any $i \in \mathcal{V}$ and any $t \in \mathbb{Z}_{\geq 0}$, and noting that the controller corresponding to subsystem $i$ provides the control input $u_i(t)$, the state information that is available to the controller corresponding to subsystem $i$ is given by
$$\mathcal{I}_i(t) = \big\{ x_j(k) : j \in \mathcal{V},\ 0 \le k \le t - D_{ij} \big\} \qquad (4)$$
In other words, the control policy $u_i(t)$ maps the states contained in $\mathcal{I}_i(t)$ to a control input. In the sequel, we also call $\mathcal{I}_i(t)$ the information set of controller $i$ at time $t$. Note that $\mathcal{I}_i(t)$ contains the states corresponding to the subsystems in $\mathcal{V}$ that have had enough time to reach subsystem $i$ at time $t$, due to the sparsity and delay constraints described above. Now, based on the information set $\mathcal{I}_i(t)$, we further define $\mathcal{M}_i$ to be the set that consists of all the policies that map the states in $\mathcal{I}_i(t)$ to a control input at node $i$. The goal is then to solve the following constrained optimization problem:
$$\min_{u_i \in \mathcal{M}_i,\ \forall i \in \mathcal{V}}\ \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \sum_{t=0}^{T-1} \big( x(t)^\top Q\, x(t) + u(t)^\top R\, u(t) \big) \right] \qquad (5)$$
where $Q$ and $R$ are the cost matrices, and the expectation is taken with respect to the noise $w(t)$ for all $t \in \mathbb{Z}_{\geq 0}$. Throughout the paper, we assume that the following assumption on the information propagation pattern among the subsystems in $\mathcal{V}$ holds (e.g., [26, 40]).
Assumption 1.
For all $i, j \in \mathcal{V}$, it holds that $j \in \mathcal{N}_i$ if and only if $D_{ij} \le 1$, where $\mathcal{N}_i$ is given in Eq. (1).
Assumption 1 says that the state of subsystem $i$ is affected by the state and input of subsystem $j$ if and only if there is a communication link with a delay of at most $1$ from subsystem $j$ to subsystem $i$ in $\mathcal{G}$. As shown in [26], Assumption 1 ensures that the information structure associated with the system given in Eq. (1) is partially nested [21]. Assumption 1 is frequently used in decentralized control problems (e.g., [26, 35] and the references therein), and one can see that the assumption is satisfied in networked systems where information propagates at least as fast as dynamics. To illustrate the arguments above, we introduce Example 1.
Example 1.
Consider the directed graph $\mathcal{G}$ given in Fig. 1, where each directed edge is associated with a delay of $0$ or $1$. The corresponding LTI system is then given by
(6)
Figure 1: The directed graph $\mathcal{G}$ considered in Example 1, with each edge labeled by its delay.
Now, in order to present the solution to (5) given in, e.g., [26], we need to construct an information graph $\mathcal{P} = (\mathcal{U}, \mathcal{H})$ (see [26] for more details). Considering any directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $\mathcal{V} = [p]$, and the delay matrix $D$ as described above, let us first define $s_{j,k}$ to be the set of nodes in $\mathcal{V}$ that are reachable from node $j$ within $k$ time steps, i.e., $s_{j,k} = \{ i \in \mathcal{V} : D_{ij} \le k \}$. The information graph $\mathcal{P} = (\mathcal{U}, \mathcal{H})$ is then constructed as
$$\mathcal{U} = \big\{ s_{j,k} : j \in \mathcal{V},\ k \in \mathbb{Z}_{\geq 0} \big\}, \qquad \mathcal{H} = \big\{ (s_{j,k},\, s_{j,k+1}) : j \in \mathcal{V},\ k \in \mathbb{Z}_{\geq 0} \big\} \qquad (7)$$
Thus, we see from (7) that each node $s \in \mathcal{U}$ corresponds to a set of nodes from $\mathcal{V}$ in the original directed graph $\mathcal{G}$. Using a similar notation to that for the graph $\mathcal{G}$, if there is an edge from $r$ to $s$ in $\mathcal{P}$, we denote the edge as $r \to s$. Additionally, considering any $j \in \mathcal{V}$, we write $w_j \to s$ to indicate the fact that the noise $w_j(t)$ is injected to node $s = s_{j,0}$ at time $t$.⁴ (⁴Note that we have assumed that there is no directed cycle in $\mathcal{G}$ with zero accumulated delay. Hence, one can show that for any such $s \in \mathcal{U}$, $w_j$ is the only noise term such that $w_j \to s$.) From the above construction of the information graph $\mathcal{P}$, one can show that the following properties hold.
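As an illustration, the information graph can be built directly from the delay matrix by enumerating the reachability sets in (7); the following is a minimal Python sketch, where the function name and the set representation are illustrative choices.

```python
import numpy as np

def information_graph(D, k_max=None):
    """Construct the information graph from the delay matrix D, cf. (7).

    Nodes are the distinct reachability sets s_{j,k} = {i : D[i, j] <= k},
    and there is an edge s_{j,k} -> s_{j,k+1} for every j and k.
    """
    p = D.shape[0]
    k_max = p if k_max is None else k_max  # the sets stop growing after ~p steps
    nodes, edges = set(), set()
    for j in range(p):
        prev = None
        for k in range(k_max + 1):
            s = frozenset(np.flatnonzero(D[:, j] <= k).tolist())
            nodes.add(s)
            if prev is not None:
                edges.add((prev, s))  # becomes a self loop once s stabilizes
            prev = s
    return nodes, edges
```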
Lemma 1.
Remark 1.
One can see from the construction of $\mathcal{P}$ and Lemma 1 that $\mathcal{P}$ is a forest, i.e., a set of disconnected directed trees, where each directed tree in the forest is oriented toward a node with a self loop in $\mathcal{P}$. Specifically, the nodes $s_{j,0}$ for all $j \in \mathcal{V}$ are the leaf nodes in $\mathcal{P}$, and the nodes with self loops are the root nodes in $\mathcal{P}$.
To illustrate the construction steps and the properties of the information graph discussed above, we again use Example 1; the resulting information graph $\mathcal{P}$ is given in Fig. 2. Note that the information graph in Fig. 2 contains two disconnected directed trees, one of which is an isolated node with a self loop. In fact, one can check that the results in Lemma 1 hold for the information graph $\mathcal{P}$ in Fig. 2.
Figure 2: The information graph $\mathcal{P}$ corresponding to Example 1.
Throughout this paper, we assume that the elements in $\mathcal{V}$ are ordered in an increasing manner, and that the elements in each $s \in \mathcal{U}$ are also ordered in an increasing manner. Now, for any $r, s \in \mathcal{U}$, we use $A_{s,r}$ (or $B_{s,r}$) to denote the submatrix of $A$ (resp., $B$) that corresponds to the nodes of the directed graph $\mathcal{G}$ contained in $s$ and $r$. In the sequel, we will also use similar notations to denote submatrices of $Q$, $R$, and the identity matrix $I$. We will make the following standard assumption (see, e.g., [26]).
Assumption 2.
For any $s \in \mathcal{U}$ that has a self loop, the pair $(A_{s,s}, B_{s,s})$ is stabilizable and the pair $(A_{s,s}, C_s)$ is detectable, where $C_s^\top C_s = Q_{s,s}$.
Leveraging the partial nestedness of (5), the authors in [26] obtained the optimal solution to (5), which we summarize in the following lemma.
Lemma 2.
[26, Corollary 4] Consider the problem given in (5), and let $\mathcal{P} = (\mathcal{U}, \mathcal{H})$ be the associated information graph. Suppose Assumption 2 holds. For all $s \in \mathcal{U}$, define the matrices $K_s$ and $P_s$ recursively as
(8)
(9)
where for each $s \in \mathcal{U}$, $s^+$ is the unique node such that $s \to s^+$. In particular, for any $s \in \mathcal{U}$ that has a self loop, the matrix $P_s$ is the unique positive semidefinite solution to the Riccati equation given by Eq. (9), and the matrix $A_{s,s} + B_{s,s} K_s$ is stable. The optimal solution to (5) is then given by
(10)
and
(11)
for all $i \in \mathcal{V}$, where $\zeta_s(t)$ is an internal state initialized with $\zeta_s(0) = 0$ for all $s \in \mathcal{U}$. The corresponding optimal cost of (5), denoted as $J^\star$, is given by
(12)
Let us use Example 1 to illustrate the results in Lemma 2. First, considering node in the information graph given in Fig. 2, we have from Eq. (10) that
Next, considering node in the directed graph given in Fig. 1, we see from Eq. (11) and Fig. 2 that
where is given by Eq. (8).
Remark 2.
Obtaining the optimal policy $u_i^\star(t)$, for any $i \in \mathcal{V}$, given by Lemma 2 requires global knowledge of the system matrices $A$ and $B$, the cost matrices $Q$ and $R$, and the directed graph $\mathcal{G}$ with the associated delay matrix $D$. Moreover, the policy given in Lemma 2 is not a static state-feedback controller, but a linear dynamic controller based on the internal states $\zeta_s(t)$ for all $s \in \mathcal{U}$. For any controller $i \in \mathcal{V}$ and any $t \in \mathbb{Z}_{\geq 0}$, the authors in [26] proposed an algorithm to determine $\zeta_s(t)$ for all $s$ such that $i \in s$, and thus $u_i^\star(t)$, using only the memory maintained by the algorithm, the state information contained in the information set $\mathcal{I}_i(t)$ defined in Eq. (4), and the global information described above.
2.3 Problem Formulation and Summary of Results
We now formally introduce the problem that we will study in this paper. We consider the scenario where the system matrices $A$ and $B$ are unknown. However, we assume that the directed graph $\mathcal{G}$ and the associated delay matrix $D$ are known. Similarly to, e.g., [12, 43], we consider the scenario where we can first conduct experiments in order to estimate the unknown system matrices. Specifically, starting from the initial state $x(0) = 0$, we evolve the system given in Eq. (3) for $T$ time steps using a given control input sequence $\{u(t)\}_{t=0}^{T-1}$, and collect the resulting state sequence $\{x(t)\}_{t=0}^{T}$. Based on these sequences, we use a least squares approach to obtain estimates of the system matrices $A$ and $B$, denoted as $\hat{A}$ and $\hat{B}$, respectively. Using the obtained $\hat{A}$ and $\hat{B}$, the goal is still to solve (5). Since the true system matrices are unknown, it may no longer be possible to solve (5) optimally using the methods introduced in Section 2.2. Thus, we aim to provide a solution to (5) using $\hat{A}$ and $\hat{B}$, and characterize its performance (i.e., suboptimality) guarantees.
In the rest of this paper, we first analyze the estimation error of $\hat{A}$ and $\hat{B}$ obtained from the procedure described above. In particular, we show in Section 3 that the estimation errors $\|\hat{A} - A\|$ and $\|\hat{B} - B\|$ scale as $\tilde{O}(1/\sqrt{T})$ with high probability.⁵ (⁵Throughout this paper, we let $\tilde{O}(\cdot)$ hide logarithmic factors in $T$.) Next, in Section 4, we design a control policy, based on $\hat{A}$ and $\hat{B}$, which satisfies the information constraints given in (5). Supposing $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$, where $\varepsilon > 0$, and denoting the cost of (5) corresponding to the proposed policy as $\hat{J}$, we show in Section 5.2 that
$$\hat{J} - J^\star \le C\, \varepsilon$$
as long as $\varepsilon \le \bar{\varepsilon}$, where $J^\star$ is the optimal cost of (5) given by (12), and $C$ and $\bar{\varepsilon}$ are constants that explicitly depend on the problem parameters of (5). Finally, combining the above results, we show in Section 5.3 that with high probability and for $T$ large enough, the following end-to-end sample complexity of learning decentralized LQR with the partially nested information structure holds:
$$\hat{J} - J^\star \le \tilde{O}\big(1/\sqrt{T}\big).$$
3 System Identification Using Least Squares
As we described in Section 2.3, we use a least squares approach to estimate the system matrices $A$ and $B$, based on a single system trajectory consisting of the control input sequence $\{u(t)\}_{t=0}^{T-1}$ and the system state sequence $\{x(t)\}_{t=0}^{T}$, where $x(0) = 0$. Here, we draw the inputs independently from a Gaussian distribution, i.e., we let $u(t) \sim \mathcal{N}(0, \sigma_u^2 I)$ for all $t \in \{0, \dots, T-1\}$, where $\sigma_u > 0$. Moreover, we assume that the input $u(t)$ and the disturbance $w(t')$ are independent for all $t, t'$. Note that we consider the scenario where the estimation of $A$ and $B$ is performed in a centralized manner using a least squares approach (detailed in Algorithm 1). However, we remark that Algorithm 1 can be carried out without violating the information constraints given by Eq. (4), since $u(t)$ is not a function of the states in the information set defined in Eq. (4) for any $t$. In the following, we present the least squares approach to estimate $A$ and $B$, and characterize the corresponding estimation error.
3.1 Least Squares Estimation of System Matrices
Let us denote
$$z(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}, \qquad \Theta = \begin{bmatrix} A & B \end{bmatrix} \qquad (13)$$
where $z(t) \in \mathbb{R}^{n+m}$ and $\Theta \in \mathbb{R}^{n \times (n+m)}$, so that Eq. (3) can be written as $x(t+1) = \Theta z(t) + w(t)$. Given the sequences $\{x(t)\}_{t=0}^{T}$ and $\{u(t)\}_{t=0}^{T-1}$, we use regularized least squares to obtain an estimate of $\Theta$, denoted as $\hat{\Theta}$, i.e.,
$$\hat{\Theta} = \operatorname*{arg\,min}_{\Theta' \in \mathbb{R}^{n \times (n+m)}}\ \sum_{t=0}^{T-1} \big\| x(t+1) - \Theta' z(t) \big\|^2 + \lambda \|\Theta'\|_F^2 \qquad (14)$$
where $\lambda > 0$ is the regularization parameter. We summarize the above least squares approach in Algorithm 1.
Input: regularization parameter $\lambda$ and time horizon length $T$
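As an illustration of Algorithm 1, the following minimal Python sketch excites the system with the Gaussian inputs described above and computes the closed-form solution of the regularized least squares problem (14); the simulator, the input variance, and the numerical values are illustrative assumptions.

```python
import numpy as np

def estimate_model(A, B, T, sigma_u=1.0, lam=1.0, rng=np.random.default_rng(0)):
    """Sketch of Algorithm 1: excite the system with Gaussian inputs and
    estimate Theta = [A B] by regularized least squares, cf. Eqs. (13)-(14)."""
    n, m = B.shape
    x = np.zeros(n)                      # x(0) = 0
    Z, Y = [], []                        # regressors z(t) and targets x(t+1)
    for _ in range(T):
        u = sigma_u * rng.standard_normal(m)
        x_next = A @ x + B @ u + rng.standard_normal(n)   # w(t) ~ N(0, I)
        Z.append(np.concatenate([x, u]))
        Y.append(x_next)
        x = x_next
    Z, Y = np.array(Z), np.array(Y)      # shapes (T, n+m) and (T, n)
    # Closed-form solution of (14): Theta_hat = Y^T Z (Z^T Z + lam I)^{-1}.
    Theta_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(n + m), Z.T @ Y).T
    return Theta_hat[:, :n], Theta_hat[:, n:]   # A_hat, B_hat
```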
3.2 Least Squares Estimation Error
In order to characterize the estimation error of $\hat{\Theta}$ given by (14), we will use the following result from [10], which is a consequence of [2, Theorem 1].
Lemma 3.
For any $\delta \in (0, 1)$, we now introduce the following probabilistic events that will be useful in our analysis later:
(15)
where . Denoting
(16)
we have the following result; the proof is included in Appendix A.
Lemma 4.
For any and for any , it holds that .
For the analysis in the sequel, we will make the following assumption, which is also made in related literature (see e.g., [24, 37, 43]).
Assumption 3.
The system matrix $A$ is stable, and $\|A^k\| \le \kappa \rho^k$ for all $k \in \mathbb{Z}_{\geq 0}$, where $\kappa \ge 1$ and $\rho \in (0, 1)$.
Note that for any stable matrix $A$, we have from the Gelfand formula (e.g., [22]) that there always exist $\kappa \ge 1$ and $\rho$ with $0 < \rho < 1$ such that $\|A^k\| \le \kappa \rho^k$ for all $k \in \mathbb{Z}_{\geq 0}$. We then have the following results; the proofs are included in Appendix A.
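As an illustration, a valid pair of such constants for a given stable matrix can also be found numerically; in the following minimal sketch, the choice of the decay rate between the spectral radius and $1$, as well as the finite search horizon, are heuristic assumptions.

```python
import numpy as np

def power_bound_constants(A, margin=0.99, k_max=200):
    """Numerically find (kappa, rho) with ||A^k|| <= kappa * rho^k.

    Any rho strictly between the spectral radius of A and 1 works; kappa is
    then the largest observed ratio ||A^k|| / rho^k over a finite horizon
    (valid since the ratio tends to zero for rho above the spectral radius).
    """
    spec_rad = max(abs(np.linalg.eigvals(A)))
    assert spec_rad < 1.0, "A must be stable"
    rho = spec_rad + margin * (1.0 - spec_rad)   # spec_rad < rho < 1
    kappa, Ak = 1.0, np.eye(A.shape[0])
    for k in range(1, k_max + 1):
        Ak = Ak @ A
        kappa = max(kappa, np.linalg.norm(Ak, 2) / rho**k)
    return kappa, rho
```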
Lemma 5.
Proposition 1.
Several remarks pertaining to Algorithm 1 and the result in Proposition 1 are now in order. First, note that for the problem of learning centralized LQR without any information constraints, the authors in [12] proposed to obtain $\hat{A}$ and $\hat{B}$ from multiple system trajectories using least squares, where each trajectory starts from $x(0) = 0$. They showed that $\|\hat{A} - A\|$ and $\|\hat{B} - B\|$ scale as $O(1/\sqrt{N})$, where $N$ is the number of system trajectories. In contrast, we estimate $A$ and $B$ from a single system trajectory, and achieve $\|\hat{A} - A\| \le \tilde{O}(1/\sqrt{T})$ and $\|\hat{B} - B\| \le \tilde{O}(1/\sqrt{T})$.
Second, note that we use regularized least squares in Algorithm 1 to obtain the estimates $\hat{A}$ and $\hat{B}$. Although least squares without regularization can also be used to obtain estimates from a single system trajectory with the same finite sample guarantee (e.g., [36]), we choose the regularized least squares approach considered in, e.g., [2, 10, 9]. The reason is that introducing the regularization makes the finite sample analysis more tractable (e.g., [10, 9]), which facilitates the adaptation of the analyses in [10, 9] to our setting described in this section. Moreover, note that the lower bound on required in Proposition 1 is merely used to guarantee that the denominator of the right-hand side of Eq. (18) contains the factor ; choosing an arbitrary leads to a factor . In general, one can show that choosing any leads to the same finite sample guarantee.
Third, note that we do not leverage the block structure (i.e., sparsity pattern) of and described in Section 2.2, when we obtain and using Algorithm 1. Therefore, the sparsity pattern of and may potentially be inconsistent with that of and . Nonetheless, such a potential inconsistency does not play any role in our analysis later. The reason is that the control policy to be proposed in Section 4 does not depend on the sparsity pattern of and . Moreover, when analyzing the suboptimality of the proposed control policy later in Section 5, we only leverage the fact that the estimation error corresponding to submatrices in (resp., ) will be upper bounded by (resp., ). Specifically, considering any nodes in the information graph given by (7), one can show that , where recall that (resp., ) is a submatrix of (resp., ) that corresponds to the nodes of the directed graph contained in and .
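Indeed, writing each submatrix via binary selection matrices (introduced here only for this justification), we have, e.g.,
$$\big\| \hat{A}_{s,r} - A_{s,r} \big\| = \big\| E_s^\top (\hat{A} - A)\, E_r \big\| \le \|E_s\|\, \big\|\hat{A} - A\big\|\, \|E_r\| = \big\|\hat{A} - A\big\| \le \varepsilon,$$
since selection matrices have unit spectral norm; the same argument applies to the submatrices of $\hat{B} - B$.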
Finally, we remark that one may also use system identification schemes and the associated sample complexity analyses dedicated to sparse system matrices (e.g., [14]). Under some extra assumptions on $A$ and $B$ (e.g., [14]), one may then obtain estimates $\hat{A}$ and $\hat{B}$ that have the same sparsity pattern as $A$ and $B$, and remove the logarithmic factor in $T$ from the error bound in Proposition 1. However, the assumptions on $A$ and $B$ made in, e.g., [14] can be restrictive and hard to check in practice.
4 Control Policy Design
While the estimation of $A$ and $B$ is performed in a centralized manner as we discussed in Section 3.1, we assume that each controller receives the estimates $\hat{A}$ and $\hat{B}$ after we conduct the system identification step described in Algorithm 1. Given the matrices $\hat{A}$, $\hat{B}$, $Q$, and $R$, and the directed graph $\mathcal{G}$ with the delay matrix $D$, in this section we design a control policy that can be implemented in a decentralized manner, while satisfying the information constraints described in Section 2.2. To this end, we leverage the structure of the optimal policy given in Lemma 2 (for known $A$ and $B$). Note that the optimal policy cannot be applied to our scenario, since only $\hat{A}$ and $\hat{B}$ are available.
First, given the directed graph $\mathcal{G}$ and the delay matrix $D$, we construct the information graph $\mathcal{P} = (\mathcal{U}, \mathcal{H})$ given by (7). Recall from Remark 1 that $\mathcal{P}$ is a forest that contains a set of disconnected directed trees. We then let $\mathcal{L}$ denote the set of all the leaf nodes in $\mathcal{P}$, i.e.,
$$\mathcal{L} = \big\{ s_{j,0} : j \in \mathcal{V} \big\} \qquad (19)$$
Moreover, for any $s \in \mathcal{U}$, we denote
$$\mathcal{L}_s = \big\{ r \in \mathcal{L} : r \rightsquigarrow s \big\} \qquad (20)$$
where we write $r \rightsquigarrow s$ if and only if there is a unique directed path from node $r$ to node $s$ in $\mathcal{P}$. In other words, $\mathcal{L}_s$ is the set of leaf nodes in $\mathcal{P}$ that can reach $s$. Moreover, for any $r, s$ such that $r \rightsquigarrow s$, we let $l_{r,s}$ denote the length of the unique directed path from $r$ to $s$ in $\mathcal{P}$; we let $l_{s,s} = 0$. For example, in the information graph (associated with Example 1) given in Fig. 2, we have , , and .
Next, in order to leverage the structure of the optimal policy given in Eqs. (8)-(11), we substitute (submatrices of) $\hat{A}$ and $\hat{B}$ into the right-hand sides of Eqs. (8)-(9), and obtain $\hat{K}_s$ and $\hat{P}_s$ for all $s \in \mathcal{U}$. Specifically, for all $s \in \mathcal{U}$, we obtain $\hat{K}_s$ and $\hat{P}_s$ recursively as
(21)
(22)
where for each $s \in \mathcal{U}$, we let $s^+$ be the unique node such that $s \to s^+$, and $\hat{A}_{s,r}$ (resp., $\hat{B}_{s,r}$) is a submatrix of $\hat{A}$ (resp., $\hat{B}$) obtained in the same manner as $A_{s,r}$ (resp., $B_{s,r}$) described before. Similarly to Eq. (10), we then use $\hat{K}_s$ for all $s \in \mathcal{U}$ together with $\hat{A}$ and $\hat{B}$ to maintain an (estimated) internal state $\hat{\zeta}_s(t)$ (to be defined later) for all $s \in \mathcal{U}$ and for all $t \in \mathbb{Z}_{\geq 0}$, which, via a similar form to Eq. (11), will lead to our control policy, denoted as $\hat{u}_i(t)$, for all $i \in \mathcal{V}$ and for all $t \in \mathbb{Z}_{\geq 0}$. Specifically, for all $i \in \mathcal{V}$ in parallel, we propose Algorithm 2 to compute the control policy
(23)
Input: estimates $\hat{A}$ and $\hat{B}$, cost matrices $Q$ and $R$, directed graph $\mathcal{G}$ with delay matrix $D$, and time horizon length
We now describe the notation used in Algorithm 2 and hereafter. Let us consider any $i \in \mathcal{V}$. In Algorithm 2, we let $\mathcal{P}_i$ denote the set of disconnected directed trees in $\mathcal{P}$ such that the root node of any tree in $\mathcal{P}_i$ contains $i$. Slightly abusing the notation, we also let $\mathcal{P}_i$ denote the set of nodes of all the trees in $\mathcal{P}_i$. Moreover, we denote
(24)
where $\mathcal{L}$ is defined in Eq. (19), i.e., the set in Eq. (24) consists of the leaf nodes of all the trees in $\mathcal{P}_i$. Letting $\mathcal{R}_i$ be the set of root nodes in $\mathcal{P}_i$, we denote
(25)
where we recall from Lemma 1 that any root node in has a self loop. We then see from the information graph given in Fig. 2 that
Note that if any node is a leaf node with a self loop (i.e., an isolated node in $\mathcal{P}$) such as the node in Fig. 2, we only include it in (i.e., but ).
Furthermore, we denote
(26)
where we write if and only if there is a directed path from node to node in , and recall that is the sum of delays along the directed path from to with the smallest accumulative delay. Finally, the memory of Algorithm 2 is initialized as with
(27)
where we initialize for all .
Remark 3.
For any , let be such that and . In Algorithm 2, we assume that the elements in are already ordered such that if , then comes before in . We then let the for loop in lines 5-9 in Algorithm 2 iterate over the elements in according to the above order. Considering the node in the directed graph in Example 1, we see from Fig. 1 and Fig. 2 that , where and . Since and , we assume that the elements in are ordered such that .
Remark 4.
For any $s \in \mathcal{U}$, the dynamics of the internal state $\hat{\zeta}_s(t)$ is given by
(28)
where $\hat{w}_i(t)$ is an estimate of the disturbance $w_i(t)$ in Eq. (2) obtained as
$$\hat{w}_i(t) = x_i(t+1) - \hat{A}_{i,\mathcal{N}_i}\, x_{\mathcal{N}_i}(t) - \hat{B}_{i,\mathcal{N}_i}\, u_{\mathcal{N}_i}(t) \qquad (29)$$
where we replace $A_{i,\mathcal{N}_i}$ and $B_{i,\mathcal{N}_i}$ in Eq. (2) with the estimates $\hat{A}_{i,\mathcal{N}_i}$ and $\hat{B}_{i,\mathcal{N}_i}$, respectively, and $x_{\mathcal{N}_i}(t)$ is the vector that collects $x_j(t)$ for all $j \in \mathcal{N}_i$, with $\mathcal{N}_i$ given in Assumption 1. We note from Eqs. (28)-(29) that , where as we assumed previously. We emphasize that Eqs. (28)-(29) are key to our control policy design, and they also enable our analyses in Section 5, where we provide a suboptimality guarantee for our control policy. As we mentioned in Section 1, the motivation for the control policy given by Eqs. (23), (28)-(29) is that the optimal control policy given in Lemma 2 can be viewed as a disturbance-feedback controller. Since the system matrices $A$ and $B$ are unknown, the control policy constructed in Eqs. (23), (28)-(29) maps the estimates of the past disturbances given by Eq. (29) to the current control input via the estimated internal states given by Eq. (28).
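As an illustration of Eq. (29), the following minimal sketch computes the disturbance estimate in its global (stacked) form and verifies that its error inherits the model estimation error; all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
A = 0.5 * rng.standard_normal((n, n))            # true model (illustrative)
B = rng.standard_normal((n, m))
A_hat = A + 1e-2 * rng.standard_normal((n, n))   # estimated model
B_hat = B + 1e-2 * rng.standard_normal((n, m))

x = rng.standard_normal(n)
u = rng.standard_normal(m)
w = rng.standard_normal(n)
x_next = A @ x + B @ u + w                       # true dynamics, Eq. (3)

w_hat = x_next - A_hat @ x - B_hat @ u           # Eq. (29), stacked form
# The disturbance estimation error inherits the model estimation error:
assert np.allclose(w_hat - w, (A - A_hat) @ x + (B - B_hat) @ u)
```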
Observation 1.
We will show that in each iteration of the for loop in lines 4-14 of Algorithm 2, the internal states for all such that (i.e., for all ) can be determined, via Eq. (28), based on the current memory of the algorithm and the state information contained in (a subset of) the information set defined in Eq. (4). As we will see, Algorithm 2 maintains, in its current memory , the internal states (with potential time delays) for a certain subset of nodes in , via the recursion in Eq. (28). Given those internal states, for all can be determined using Eq. (28). Moreover, the memory of Algorithm 2 is recursively updated in the for loop in lines 4-14 of the algorithm. Formally, we have the following result for Algorithm 2; the proof can be found in Appendix B.
Proposition 2.
Suppose that any controller at any time step has access to the states in defined as
(30)
where , and is defined in Eq. (4). Then, the following properties hold for Algorithm 2:
(a) The memory of Algorithm 2 can be recursively updated such that at the beginning of any iteration of the for loop in lines 4-14 of the algorithm,
(31)
(b) The control input in line 13 can be determined using Eq. (28) and the states in the memory after line 12 (and before line 14) in any iteration of the for loop in lines 4-14 of Algorithm 2.
Since the proof of Proposition 2 is rather involved and requires careful consideration of the structure of the directed graph and the information graph described in Section 2.2, we again use Example 1 to illustrate the steps of Algorithm 2 and the results and proof ideas of Proposition 2.
First, we note from Fig. 1 and Eq. (26) that . Now, let us consider Algorithm 2 with respect to node in the directed graph given in Fig. 1. We see that , which implies via Eq. (30) that for all . One can check that the initial memory of Algorithm 2 given by Eq. (27) satisfies Eq. (31) for , which implies that the memory satisfies Eq. (31) at the beginning of iteration of the for loop in lines 4-14 of the algorithm.
To proceed, let us consider iteration of the for loop in lines 4-14 of the algorithm. Noting that from Remark 3, Algorithm 2 first considers in the for loop in lines 5-9, which implies that in line 7. We then see from Eq. (29) that in order to obtain , we need to know , , , and , where , and are given by Eq. (23). One can then check that the internal states that are needed to determine and are available in the current memory of Algorithm 2 or become available via further applications of Eq. (28). After is obtained, we see from Eq. (28) that can also be obtained. Algorithm 2 then updates its current memory in line 9 and finishes the iteration with respect to in the for loop in lines 5-9. Next, Algorithm 2 considers in the for loop in lines 5-9, which implies that in line 7. Following similar arguments to those above for and noting that the current memory of Algorithm 2 has been updated, one can show that can be obtained from Eq. (28), based on the current memory of the algorithm. Algorithm 2 again updates its current memory in line 9 and finishes the iteration with respect to in the for loop in lines 5-9.
Now, recalling that from Fig. 2, we see that Algorithm 2 considers in line 10. One can also check that can be obtained from Eq. (28), based on the current memory of the algorithm. Finally, based on the current memory of Algorithm 2 after line 12, one can check that the control input can be determined from Eq. (23). Note that Algorithm 2 also removes certain internal states from its current memory in line 14 that will no longer be used. One can check that after the removal, the current memory of Algorithm 2 will satisfy Eq. (31) at the beginning of iteration of the for loop in lines 4-14 of the algorithm, where . One can then repeat the above arguments for iteration of the for loop in lines 4-14 of the algorithm and so on.
Several remarks pertaining to Algorithm 2 are now in order. First, since and , one can show via the definition of Algorithm 2 that the number of states in the memory of Algorithm 2 is always upper bounded by , where we note that defined in Eq. (26) satisfies , and is the number of nodes in the directed graph . Moreover, one can check that Algorithm 2 can be implemented in polynomial time.
Second, it is worth noting that the control policy $\hat{u}_i(t)$ for all $i \in \mathcal{V}$ that we proposed in Eq. (23) is related to the certainty equivalent approach (e.g., [4]) that has been used for learning centralized LQR without any information constraints on the controllers (e.g., [12, 29, 9]). It is known that the optimal solution to the classic centralized LQR problem (i.e., problem (5) without the information constraints) is given by a static state-feedback controller $u(t) = K x(t)$, where $K$ can be obtained from the solution to the Riccati equation corresponding to $A$, $B$, $Q$, and $R$ (e.g., [5]). The corresponding certainty equivalent controller simply takes the form $u(t) = \hat{K} x(t)$, where $\hat{K}$ is obtained from the solution to the Riccati equation corresponding to $\hat{A}$, $\hat{B}$, $Q$, and $R$, with $\hat{A}$ and $\hat{B}$ being the estimates of $A$ and $B$, respectively (a minimal code sketch of this comparison is given at the end of this section). While we also leverage the structure of the optimal control policy given in Eq. (11), we cannot simply replace $K_s$ with $\hat{K}_s$ for all $s \in \mathcal{U}$ in Eq. (11), where $\hat{K}_s$ is given by the Riccati equations in Eqs. (21)-(22). As we argued in Remark 2, this is because the optimal policy is not a static state-feedback controller, but a linear dynamic controller based on the internal states $\zeta_s(t)$ for all $s \in \mathcal{U}$, where the dynamics of $\zeta_s(t)$ given by Eq. (10) also depend on $A$ and $B$. Thus, the control policy that we proposed in Eq. (23) is a linear dynamic controller based on $\hat{K}_s$ and the estimated internal states $\hat{\zeta}_s(t)$ for all $s \in \mathcal{U}$, where the dynamics of $\hat{\zeta}_s(t)$ given by Eq. (28) depend on $\hat{A}$ and $\hat{B}$. Such a more complicated form of the control policy also creates several challenges when we analyze the corresponding suboptimality guarantees in the next section.
Third, for any and any , Proposition 2 only requires controller to have access to a subset of the state information contained in the information set .
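To make the comparison with the certainty equivalent approach in the second remark above concrete, the following minimal sketch computes the centralized certainty equivalent gain from the estimated model via the discrete Riccati equation (using SciPy); in contrast, the decentralized policy in Eq. (23) must additionally maintain the estimated internal states in Eq. (28).

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def certainty_equivalent_gain(A_hat, B_hat, Q, R):
    """Centralized certainty-equivalent LQR: solve the discrete-time
    algebraic Riccati equation for the estimated model and return the
    static gain K_hat, so that u(t) = K_hat @ x(t)."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    K = -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    return K, P
```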
5 Suboptimality Guarantees
In this section, we characterize the suboptimality guarantees of the control policy proposed in Section 4. To begin with, in order to explicitly distinguish the states of the system in Eq. (3) corresponding to the control policies given by Eqs. (11) and (23), respectively, we let $\hat{x}(t)$ denote the state of the system in Eq. (3) corresponding to the control policy $\hat{u}(t)$ given by Eq. (23), for $t \in \mathbb{Z}_{\geq 0}$, i.e.,
$$\hat{x}(t+1) = A\, \hat{x}(t) + B\, \hat{u}(t) + w(t) \qquad (32)$$
where we note from Eq. (23) that $\hat{u}(t)$ is determined by $\hat{K}_s$ and $\hat{\zeta}_s(t)$ given by Eqs. (21) and (28), respectively, for all $s \in \mathcal{U}$. We let $x^\star(t)$ denote the state of the system in Eq. (3) corresponding to the optimal control policy $u^\star(t)$ given by Eq. (11), for $t \in \mathbb{Z}_{\geq 0}$, i.e.,
$$x^\star(t+1) = A\, x^\star(t) + B\, u^\star(t) + w(t) \qquad (33)$$
where $u^\star(t)$ is determined by $K_s$ and $\zeta_s(t)$ given by Eqs. (8) and (10), respectively, for all $s \in \mathcal{U}$. In Eqs. (32)-(33), we set $\hat{x}(0) = x^\star(0) = 0$.
Moreover, for our analysis in the sequel, we introduce another (auxiliary) control policy $\tilde{u}(t)$ given by
(34)
for $t \in \mathbb{Z}_{\geq 0}$, where for any $s \in \mathcal{U}$, $\hat{K}_s$ is given by Eq. (21), and $\tilde{\zeta}_s(t)$ is given by
(35)
with $\tilde{\zeta}_s(0) = 0$ for all $s \in \mathcal{U}$. We then let $\tilde{x}(t)$ denote the state of the system in Eq. (3) corresponding to $\tilde{u}(t)$, for $t \in \mathbb{Z}_{\geq 0}$, i.e.,
$$\tilde{x}(t+1) = A\, \tilde{x}(t) + B\, \tilde{u}(t) + w(t) \qquad (36)$$
where $\tilde{u}(t)$ is given by Eq. (34), and we set $\tilde{x}(0) = 0$. Roughly speaking, the auxiliary control policy $\tilde{u}(t)$ and the corresponding internal states $\tilde{\zeta}_s(t)$ introduced above allow us to decompose the suboptimality gap of the control policy $\hat{u}(t)$ into two terms that are due to $\hat{K}_s$ and $\hat{\zeta}_s(t)$, respectively, for all $s \in \mathcal{U}$. We then have the following result; the proof follows directly from [26, Lemma 14] and is thus omitted. Note that Lemma 6 is a consequence of the partially nested information structure and the structure of the information graph described in Section 2.2.
Lemma 6.
For any , the following hold: (a) , for all ; (b) ; (c) and are independent for all with .
Using the above notation, the cost of the optimization problem in (5) corresponding to the control policy $\hat{u}(t)$ (i.e., Eq. (23)) can be written as
$$\hat{J} = \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \sum_{t=0}^{T-1} \big( \hat{x}(t)^\top Q\, \hat{x}(t) + \hat{u}(t)^\top R\, \hat{u}(t) \big) \right] \qquad (37)$$
where we use $\limsup$ instead of $\lim$ since the limit may not exist. Furthermore, we let $\tilde{J}$ denote the cost of the optimization problem in (5) corresponding to the control policy $\tilde{u}(t)$ given in Eq. (34) (we will show in Proposition 3 that the limit in Eq. (38) exists):
$$\tilde{J} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \sum_{t=0}^{T-1} \big( \tilde{x}(t)^\top Q\, \tilde{x}(t) + \tilde{u}(t)^\top R\, \tilde{u}(t) \big) \right] \qquad (38)$$
Supposing that the estimates $\hat{A}$ and $\hat{B}$ satisfy $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$ with $\varepsilon > 0$, our ultimate goal in this section is to provide an upper bound on the suboptimality gap $\hat{J} - J^\star$, where $J^\star$ is the optimal cost given by Eq. (12). To this end, we first decompose $\hat{J} - J^\star$ into $\tilde{J} - J^\star$ and $\hat{J} - \tilde{J}$, and then upper bound the two terms separately. Such a decomposition is enabled by the structure of the control policy described in Section 4. Specifically, one may view $\tilde{J} - J^\star$ as the suboptimality due to $\hat{K}_s$ given by Eq. (21) for all $s \in \mathcal{U}$, and view $\hat{J} - \tilde{J}$ as the suboptimality due to $\hat{\zeta}_s(t)$ given by Eq. (28) for all $s \in \mathcal{U}$ and for all $t \in \mathbb{Z}_{\geq 0}$. Moreover, the suboptimality introduced by the estimated internal states is due to the fact that their dynamics in Eq. (28) are characterized by $\hat{A}$, $\hat{B}$ and $\hat{K}_s$, and driven by $\hat{w}(t)$ given by Eq. (29), which is an estimate of the disturbance $w(t)$ in Eq. (3).
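In the above notation, the decomposition just described reads
$$\hat{J} - J^\star = \underbrace{\big( \tilde{J} - J^\star \big)}_{\text{due to the estimated gains, Eq. (21)}} + \underbrace{\big( \hat{J} - \tilde{J} \big)}_{\text{due to the estimated internal states, Eqs. (28)-(29)}}.$$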
To proceed, we recall from Lemma 2 that for any $s \in \mathcal{U}$ that has a self loop, the matrix $A_{s,s} + B_{s,s} K_s$ is stable, where $K_s$ is given by Eq. (8). We then have from the Gelfand formula that for any such $s$, there exist $\kappa_s \ge 1$ and $\rho_s$ with $0 < \rho_s < 1$ such that $\|(A_{s,s} + B_{s,s} K_s)^k\| \le \kappa_s \rho_s^k$ for all $k \in \mathbb{Z}_{\geq 0}$. For notational simplicity, let us denote
(39)
where denotes the set of root nodes in , and and with are given in Assumption 3. Thus, we see from Assumption 3 and our above arguments that for all and for all , and for all , where and . Moreover, we denote
(40)
For our analysis in this section, we will make the following assumption; similar assumptions can be found in, e.g., [10, 29, 12].
Assumption 4.
The cost matrices $Q$ and $R$ in (5) satisfy $Q \succeq I$ and $R \succeq I$.
Note that the above assumption is not more restrictive than assuming that $Q$ and $R$ are positive definite. Specifically, supposing $Q \succ 0$ and $R \succ 0$, one can assume without loss of generality that $Q \succeq I$ and $R \succeq I$. This is because one can check that scaling the objective function in (5) by a positive constant does not change the optimal policy $u_i^\star$ in the optimal solution to (5) provided in Lemma 2, for any $i \in \mathcal{V}$.
5.1 Perturbation Bounds on Solutions to Riccati Equations
Supposing $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$ with $\varepsilon > 0$, in this subsection we aim to provide upper bounds on the perturbations $\|\hat{P}_s - P_s\|$ and $\|\hat{K}_s - K_s\|$ for all $s \in \mathcal{U}$, where $P_s$ (resp., $\hat{P}_s$) is given by Eq. (9) (resp., Eq. (22)), and $K_s$ (resp., $\hat{K}_s$) is given by Eq. (8) (resp., Eq. (21)). We note from Lemma 2 that for any $s \in \mathcal{U}$ that has a self loop, Eq. (9) (resp., Eq. (22)) reduces to a discrete Riccati equation in $P_s$ (resp., $\hat{P}_s$). The following results characterize the bounds on $\|\hat{P}_s - P_s\|$ and $\|\hat{K}_s - K_s\|$, for all $s \in \mathcal{U}$; the proofs can be found in Appendix C.
Lemma 7.
Suppose Assumptions 2 and 4 hold, and $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$, where $\varepsilon > 0$. Then, for any $s \in \mathcal{U}$ that has a self loop, the following hold:
(41)
(42)
and
(43)
under the assumption that
(44)
where (resp., ) is given by Eq. (9) (resp., Eq. (22)), (resp., ) is given by Eq. (8) (resp., (21)), and are defined in (39), and is defined in (40).
Lemma 8.
Suppose Assumptions 2 and 4 hold, and $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$, where $\varepsilon > 0$. Then, for any $s \in \mathcal{U}$ that does not have a self loop, the following hold:
(45)
and
(46)
under the assumption that
(47)
where $K_s$ (resp., $\hat{K}_s$) is given by Eq. (8) (resp., Eq. (21)), $P_s$ (resp., $\hat{P}_s$) is given by Eq. (9) (resp., Eq. (22)), is defined in (40), and are defined in (39), is the length of the unique directed path in $\mathcal{P}$ from node $s$ to the unique root node that is reachable from $s$, and is defined in Eq. (26).
Consider any $s \in \mathcal{U}$ with a self loop and suppose Eq. (44) holds. One can show via Eq. (43) and [29, Lemma 12] that $\hat{K}_s$ given by Eq. (21) is also stabilizing for the pair , i.e., is stable (see our arguments for (102) in Appendix D for more details). Moreover, it is well known (e.g., [5]) that a stabilizing solution to the Riccati equation in Eq. (22) exists if and only if is stabilizable and (with ) is detectable (a solution to the Riccati equation in Eq. (22) is stabilizing if and only if the corresponding closed-loop matrix, with the gain given by Eq. (21), is stable). The above arguments together also imply that is stabilizable and (with ) is detectable for all , under the assumption on $\varepsilon$ given by Eq. (44).
5.2 Perturbation Bounds on Costs
Suppose $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$, where $\varepsilon > 0$. In this subsection, we aim to provide an upper bound on $\hat{J} - J^\star$ that scales linearly with $\varepsilon$, where $J^\star$ and $\hat{J}$ are given by Eqs. (12) and (37), respectively.
Lemma 9.
For our analysis in the sequel, we further define recursively, for all , as
(49)
where is given by Eq. (21), and is the unique node such that . We then have the following result, which gives an upper bound on .
Proposition 3.
Next, we aim to provide an upper bound on . We first prove the following result.
Lemma 10.
For notational simplicity in the sequel, let us denote
(54)
We then have the following results.
Lemma 11.
Suppose Assumptions 2-4 hold, and $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$, where $\varepsilon \le \bar{\varepsilon}$ and $\bar{\varepsilon}$ is defined in (54). Then, for all $t \in \mathbb{Z}_{\geq 0}$,
(55)
and
(56)
where (resp., ) is given by Eq. (23) (resp., Eq. (34)), (resp., ) is given by Eq. (32) (resp., Eq. (36)), and are defined in (40), and are defined in (39), and , is defined in Eq. (26), and is defined in (54).
Corollary 1.
Proof.
Proposition 4.
5.3 Sample Complexity Result
We are now in a position to present the sample complexity result for learning decentralized LQR with the partially nested information structure described in Section 2.2.
Theorem 1.
Suppose Assumptions 2-4 hold, and Algorithm 1 is used to obtain and . Moreover, suppose and , where , and , where is defined in Eq. (26) and is a universal constant. Consider any . Let the input parameters to Algorithm 1 satisfy and , where
and
where and are given in Assumption 3, , , and is defined in (54). Then, with probability at least ,
(60)
where and are given in Eqs. (37) and (12), respectively, is a universal constant, and are defined in (39), and and are defined in (40), , and .
Proof.
Note that the results in Propositions 3-4 hold, if and with given in (54). Thus, letting and , one can first check that , and then obtain from Proposition 1 that with probability at least , and returned by Algorithm 1 satisfy that and . Now, noting that , where is a universal constant, and setting in Proposition 3, one can then show via Propositions 3-4 that (60) holds with probability at least . ∎
Thus, we have shown an end-to-end sample complexity result for learning decentralized LQR with the partially nested information structure. In other words, we relate the number of data samples used for estimating the system model to the performance of the control policy proposed in Section 4. Note that our result in Theorem 1 matches the sample complexity result (up to logarithmic factors in $T$) provided in [12] for learning centralized LQR without any information constraints. Also note that the sample complexity for learning centralized LQR has been improved to $\tilde{O}(1/T)$ in [29]. Specifically, the authors in [29] showed that the gap between the cost corresponding to the control policy they proposed and the optimal cost is upper bounded by $O(\varepsilon^2)$, where $\|\hat{A} - A\| \le \varepsilon$ and $\|\hat{B} - B\| \le \varepsilon$, if $\varepsilon$ is sufficiently small. Due to the additional challenges introduced by the information constraints on the controllers (see our discussions at the end of Section 4), we leave investigating the possibility of improving our sample complexity result in Theorem 1 for future work.
6 Numerical Results
In this section, we illustrate the sample complexity result provided in Theorem 1 with numerical experiments based on Example 1. Specifically, we consider the LTI system given by Eq. (6), with the corresponding directed graph and information graph given by Fig. 1 and Fig. 2, respectively. Under the sparsity pattern of $A$ and $B$ specified in Eq. (6), we generate the nonzero entries in $A$ and $B$ independently from a Gaussian distribution while ensuring that Assumption 3 is satisfied. We set the covariance of the zero-mean white Gaussian noise process to be , and set the cost matrices to be and . Moreover, we set the input sequence used in the system identification algorithm (Algorithm 1) to be for all . In order to approximate the value of $\hat{J}$ defined in Eq. (37), we simulate the system using Algorithm 2 for and obtain . Fixing the randomly generated matrices $A$ and $B$ described above, the numerical results presented in this section are obtained by averaging over independent experiments.
Figure 3: (a) Estimation error of Algorithm 1 versus the number of data samples; (b) cost suboptimality versus the number of data samples.
In Fig. 3(a), we plot the estimation error corresponding to Algorithm 1 as we range the number of data samples used in Algorithm 1 from to . Similarly, in Fig. 3(b) we plot the curve corresponding to the cost suboptimality , where is obtained from the closed-form expression given in Eq. (12). From Fig. 3, we observe that the estimation error and the cost suboptimality exhibit a similar dependence on the number of data samples. This aligns with the results shown in Proposition 1 and Theorem 1 that both the estimation error and the cost suboptimality scale as , which is a consequence of the results shown in Propositions 3-4 that the cost suboptimality scales linearly with the estimation error. The results presented in Fig. 3 thus also suggest that our suboptimality bounds provided in Propositions 3-4 can be tight for certain instances of the problem. Finally, we observe from the shaded regions in Fig. 3 that the cost suboptimality is more sensitive to the randomness introduced by the random input for and the noise for , when we run the experiments described above. This is potentially due to the fact that we approximated the cost suboptimality as with .
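The cost corresponding to a given policy is approximated by finite-horizon rollouts, as described above; the following minimal sketch illustrates such an approximation, where the policy is passed as a callable (e.g., a wrapper around Algorithm 2) and the horizon and number of runs are illustrative choices.

```python
import numpy as np

def empirical_cost(step_policy, A, B, Q, R, T=5000, n_runs=10,
                   rng=np.random.default_rng(2)):
    """Approximate the infinite-horizon average LQR cost in Eq. (37) by a
    finite rollout of length T, averaged over independent noise realizations."""
    n = A.shape[0]
    costs = []
    for _ in range(n_runs):
        x, total = np.zeros(n), 0.0
        for t in range(T):
            u = step_policy(x, t)                 # e.g., wraps Algorithm 2
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + rng.standard_normal(n)   # w(t) ~ N(0, I)
        costs.append(total / T)
    return np.mean(costs)
```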
7 Conclusion
We considered the problem of control policy design for decentralized state-feedback linear quadratic control with a partially nested information structure, when the system model is unknown. We took a model-based learning approach consisting of two steps. First, we estimated the unknown system model from a single system trajectory of finite length, using least squares estimation. Next, we designed a control policy based on the estimated system model, which satisfies the desired information constraints. We showed that the suboptimality gap between our control policy and the optimal decentralized control policy (designed using accurate knowledge of the system model) scales linearly with the estimation error of the system model. Combining the above results, we provided an end-to-end sample complexity result for learning decentralized controllers for state-feedback linear quadratic control with a partially nested information structure.
References
- Abbasi-Yadkori and Szepesvári [2011] Y. Abbasi-Yadkori and C. Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proc. Conference on Learning Theory, pages 1–26, 2011.
- Abbasi-Yadkori et al. [2011] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. In Proc. Advances in neural information processing systems, volume 24, pages 2312–2320, 2011.
- Abbasi-Yadkori et al. [2019] Y. Abbasi-Yadkori, P. Bartlett, K. Bhatia, N. Lazic, C. Szepesvari, and G. Weisz. Politex: Regret bounds for policy iteration using expert prediction. In Proc. International Conference on Machine Learning, pages 3692–3702, 2019.
- Åström and Wittenmark [2008] K. J. Åström and B. Wittenmark. Adaptive Control. Courier Corporation, 2008.
- Bertsekas [2017] D. P. Bertsekas. Dynamic programming and optimal control: Vol. 2 4th Edition. Athena Scientific, 2017.
- Blondel and Tsitsiklis [2000] V. D. Blondel and J. N. Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249–1274, 2000.
- Bu et al. [2019] J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi. LQR through the lens of first order methods: Discrete-time case. arXiv preprint arXiv:1907.08921, 2019.
- Cassel and Koren [2021] A. Cassel and T. Koren. Online policy gradient for model free learning of linear quadratic regulators with regret. arXiv preprint arXiv:2102.12608, 2021.
- Cassel et al. [2020] A. Cassel, A. Cohen, and T. Koren. Logarithmic regret for learning linear quadratic regulators efficiently. In Proc. International Conference on Machine Learning, pages 1328–1337, 2020.
- Cohen et al. [2019] A. Cohen, T. Koren, and Y. Mansour. Learning linear-quadratic regulators efficiently with only $\sqrt{T}$ regret. In Proc. International Conference on Machine Learning, pages 1300–1309, 2019.
- Dean et al. [2018] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Proc. International Conference on Neural Information Processing Systems, pages 4192–4201, 2018.
- Dean et al. [2020] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. On the sample complexity of the linear quadratic regulator. Foundations of Computational Mathematics, 20(4):633–679, 2020.
- Faradonbeh et al. [2018] M. K. S. Faradonbeh, A. Tewari, and G. Michailidis. Finite time identification in unstable linear systems. Automatica, 96:342–353, 2018.
- Fattahi et al. [2019] S. Fattahi, N. Matni, and S. Sojoudi. Learning sparse dynamical systems from a single sample trajectory. In Proc. IEEE Conference on Decision and Control, pages 2682–2689, 2019.
- Fattahi et al. [2020] S. Fattahi, N. Matni, and S. Sojoudi. Efficient learning of distributed linear-quadratic control policies. SIAM Journal on Control and Optimization, 58(5):2927–2951, 2020.
- Fazel et al. [2018] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi. Global convergence of policy gradient methods for the linear quadratic regulator. In Proc. International Conference on Machine Learning, pages 1467–1476, 2018.
- Feng and Lavaei [2019] H. Feng and J. Lavaei. On the exponential number of connected components for the feasible set of optimal decentralized control problems. In Proc. American Control Conference, pages 1430–1437, 2019.
- Furieri et al. [2020] L. Furieri, Y. Zheng, and M. Kamgarpour. Learning the globally optimal distributed LQ regulator. In Proc. Learning for Dynamics and Control Conference, pages 287–297, 2020.
- Ghadimi and Lan [2013] S. Ghadimi and G. Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
- Gravell et al. [2020] B. Gravell, P. M. Esfahani, and T. H. Summers. Learning optimal controllers for linear systems with multiplicative noise via policy gradient. IEEE Transactions on Automatic Control, 2020.
- Ho et al. [1972] Y.-C. Ho et al. Team decision theory and information structures in optimal control problems–Part I. IEEE Transactions on Automatic Control, 17(1):15–22, 1972.
- Horn and Johnson [2012] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge university press, 2012.
- Hou and Wang [2013] Z.-S. Hou and Z. Wang. From model-based control to data-driven control: Survey, classification and perspective. Information Sciences, 235:3–35, 2013.
- Lale et al. [2020] S. Lale, K. Azizzadenesheli, B. Hassibi, and A. Anandkumar. Logarithmic regret bound in partially observable linear dynamical systems. arXiv preprint arXiv:2003.11227, 2020.
- Lamperski and Doyle [2012] A. Lamperski and J. C. Doyle. Dynamic programming solutions for decentralized state-feedback LQG problems with communication delays. In Proc. American Control Conference, pages 6322–6327, 2012.
- Lamperski and Lessard [2015] A. Lamperski and L. Lessard. Optimal decentralized state-feedback control with sparsity and delays. Automatica, 58:143–151, 2015.
- Li et al. [2019] Y. Li, Y. Tang, R. Zhang, and N. Li. Distributed reinforcement learning for decentralized linear quadratic control: A derivative-free policy optimization approach. arXiv preprint arXiv:1912.09135, 2019.
- Malik et al. [2020] D. Malik, A. Pananjady, K. Bhatia, K. Khamaru, P. L. Bartlett, and M. J. Wainwright. Derivative-free methods for policy optimization: Guarantees for linear quadratic systems. Journal of Machine Learning Research, 21(21):1–51, 2020.
- Mania et al. [2019] H. Mania, S. Tu, and B. Recht. Certainty equivalence is efficient for linear quadratic control. In Proc. International Conference on Neural Information Processing Systems, pages 10154–10164, 2019.
- Nesterov and Spokoiny [2017] Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
- Papadimitriou and Tsitsiklis [1986] C. H. Papadimitriou and J. Tsitsiklis. Intractable problems in control theory. SIAM Journal on Control and Optimization, 24(4):639–654, 1986.
- Rotkowitz and Lall [2005] M. Rotkowitz and S. Lall. A characterization of convex problems in decentralized control. IEEE Transactions on Automatic Control, 50(12):1984–1996, 2005.
- Rotkowitz and Martins [2011] M. C. Rotkowitz and N. C. Martins. On the nearest quadratically invariant information constraint. IEEE Transactions on Automatic Control, 57(5):1314–1319, 2011.
- Sarkar and Rakhlin [2019] T. Sarkar and A. Rakhlin. Near optimal finite time identification of arbitrary linear dynamical systems. In Proc. International Conference on Machine Learning, pages 5610–5618, 2019.
- Shah and Parrilo [2013] P. Shah and P. A. Parrilo. $\mathcal{H}_2$-optimal decentralized control over posets: A state-space solution for state-feedback. IEEE Transactions on Automatic Control, 58(12):3084–3096, 2013.
- Simchowitz et al. [2018] M. Simchowitz, H. Mania, S. Tu, M. I. Jordan, and B. Recht. Learning without mixing: Towards a sharp analysis of linear system identification. In Proc. Conference On Learning Theory, pages 439–473, 2018.
- Simchowitz et al. [2020] M. Simchowitz, K. Singh, and E. Hazan. Improper learning for non-stochastic control. In Proc. Conference on Learning Theory, pages 3320–3436, 2020.
- Tu and Recht [2019] S. Tu and B. Recht. The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint. In Proc. Conference on Learning Theory, pages 3036–3083, 2019.
- Witsenhausen [1968] H. S. Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control, 6(1):131–147, 1968.
- Yu et al. [2022] J. Yu, D. Ho, and A. Wierman. Online stabilization of unknown networked systems with communication constraints. arXiv preprint arXiv:2203.02630, 2022.
- Zhang et al. [2020] K. Zhang, B. Hu, and T. Basar. Policy optimization for linear control with robustness guarantee: Implicit regularization and global convergence. In Proc. Learning for Dynamics and Control Conference, pages 179–190, 2020.
- Zheng et al. [2020] Y. Zheng, L. Furieri, A. Papachristodoulou, N. Li, and M. Kamgarpour. On the equivalence of Youla, system-level, and input–output parameterizations. IEEE Transactions on Automatic Control, 66(1):413–420, 2020.
- Zheng et al. [2021] Y. Zheng, L. Furieri, M. Kamgarpour, and N. Li. Sample complexity of linear quadratic Gaussian (LQG) control for output feedback systems. In Proc. Learning for Dynamics and Control Conference, pages 559–570, 2021.
Appendix
Appendix A Proofs for Least Squares Estimation of System Matrices
A.1 Proof of Lemma 4
First, let us consider . We can apply Lemma 12 with and obtain that . Similarly, recalling from Algorithm 1 that for all , we have from Lemma 12 that . Next, let us consider . Applying Lemma 3 with , we obtain .
Finally, let us consider . Consider the sequence of random vectors and the filtration , where is defined in (13), and for all . For any , we note from Eq. (3) that is conditionally Gaussian on and , with
Note again from Algorithm 1 that for all , and that is assumed to be independent of for all . We then see that is also conditionally Gaussian on and , with
Now, we can apply Lemma 13 with and described above, and let . Since , we have from Lemma 13 that holds with probability at least . Combining the above arguments together and applying a union bound over the events , , and , we complete the proof of the lemma. ∎
A.2 Proof of Lemma 5
First, considering any , we denote . We see from (15) that
Next, for any , we see from Eq. (3) that
where recall that we assumed previously that . Since is stable from Assumption 3, we know that for all , where and . It now follows from the above arguments that
for all . Noting that , we then have
for all . ∎
A.3 Proof of Proposition 1
First, we see from Eq. (15) that on the event defined in Eq. (16), the following holds:
(61)
where recall that and , and where and are defined in (13), and is given by (14). To obtain (61), we also use the fact that , which implies via that . Moreover, we have from Eq. (15) that on the event , the following holds:
where the second inequality follows from the choice of . Combining the above arguments together, one can show that
Noting from Lemma 5 that for all , one can use similar arguments to those for [9, Lemma 37], and show that
Noting that , we have . It then follows that
Noting that Algorithm 1 extracts and from , i.e., , one can show that and . Finally, since we know from Lemma 4 that , we conclude that and hold with probability at least . ∎
Appendix B Proof for Algorithm 2
B.1 Proof of Proposition 2
To prove part (a), we use an induction on . For the base case , we see directly from line 4 in Algorithm 2 and Eq. (27) that satisfies Eq. (31) (with ) at the beginning of iteration of the for loop in lines 4-14 of the algorithm. For the induction step, consider any and suppose the memory satisfies Eq. (31) at the beginning of iteration of the for loop in lines 4-14 of the algorithm.
To proceed, let us consider any with and in the for loop in lines 5-9 of Algorithm 2, where is defined in Eq. (24). We will show that in line 7, and thus in line 8, can be determined using Eq. (28) and the current memory of Algorithm 2. As suggested by the first two cases in Eq. (29), we can focus on the case when (otherwise, can be directly determined). We then note from the third case in Eq. (29) that in order to determine , we need to know , and and for all , where is given in Assumption 1. Also note that for all , which implies that . Thus, we have that , and for all , where is defined in Eq. (30). Now, considering any , we note from Eq. (23) that . Thus, in order to determine , it suffices to determine for all such that (i.e., for all ). Note that , i.e., there exists a directed path from node to node in . Moreover, noting the definition of given by (7) with its properties discussed in Lemma 1 and Remark 1, and noting the way we defined the set , one can show that for any , it holds that . Next, considering any , we recall from Eq. (20) that denotes the set of leaf nodes that can reach in , i.e., , where the second equality again follows from the properties of and the definition of in Eq. (24). Now, we split our arguments into two cases: is a root node in (i.e., has a self loop), and does not have a self loop.
First, suppose has a self loop. For the case when is a leaf node in (i.e., is an isolated node in and ), we see that with given by Eq. (31), for all , where and . Noting from the definition of that , we have from the construction of in Eq. (7) that in with . It follows that with given by Eq. (31). Thus, we focus on the case when is not a leaf node in , i.e., . We now see from Eq. (28) that given , and for all such that (with ) in , the state can be determined. Let us consider any such that (with ) in , and denote , where we note that . For any , one can recursively apply Eq. (28) to show that given for all , the state can be determined, where is the length of the (unique) directed path from to in . Further considering any , and noting that , we have that for all with and , where is given by Eq. (31). Recalling again the definition of in (7), and noting that , , , and in , one can show that
(62)
We further split our arguments into and . First, supposing , we have
(63)
Second, suppose . Recall from Remark 3 that we let the for loop in lines 5-9 of Algorithm 2 iterate over the elements in according to a certain order of the elements in . We then see from the inequality that (with and ) has already been considered by the for loop in lines 5-9 in Algorithm 2, i.e., the states for all are in the current memory of Algorithm 2, denoted as , when we consider the with and in the for loop in lines 5-9 of the algorithm. Moreover, we have from the above arguments that in , i.e., there is a directed path from node to node that goes through node in . It then follows that
(64)
where . Combining (62) and (64), we obtain
(65)
It then follows from (63) and (65) and our arguments above that the states for all are in the current memory described above, for all . Combining the above arguments together, we have that can be determined from Eq. (28) and the current memory , for all and for all such that (with ) in . Moreover, recalling that as we argued above, we see from Eq. (31) that . One can now apply Eq. (28) multiple times to show that can be determined from the current memory described above, for all .
Next, suppose does not have a self loop. Similarly to our arguments above, we first consider the case when is a leaf node in , i.e., . We see that , where with , and is defined in Eq. (31). Since , we have from the construction of in (7) that in with . Noting that in as we argued above, we then have the following:
(66)
where . Now, supposing , we see directly from Eq. (31) that holds. Supposing , we see from (66) that . Using similar arguments to those above for the case when has a self loop (particularly, the order of the elements in over which the for loop in lines 5-9 of Algorithm 2 iterates), one can show that the states for all have been added to the current memory of Algorithm 2, denoted as , when we consider the with and in the for loop in lines 5-9 of the algorithm. It follows that . Then, we consider the case when is not a leaf node in . We see from Eq. (28) that given for all such that (with ) in , the state can be determined. The remaining arguments then follow directly from those above for the case when has a self loop.
In summary, we have shown that can be determined from Eq. (28) and the current memory of Algorithm 2, for all , for all , and for all with and . It then follows from our arguments above that in line 7 of Algorithm 2, and thus in line 8 of Algorithm 2, can be determined using Eq. (28) and the current memory of Algorithm 2, for all with and . In other words, we have shown that can be added to the memory of Algorithm 2 in line 9, for all with and .
Now, let us consider any with defined in Eq. (25). We will show that can be determined using Eq. (28) and the states in given by Eq. (31), for all such that in . Note from our definition of in Eq. (25) that is not a leaf node. Following similar arguments to those above, let us consider any such that (with ) in , and denote . Further considering any , and noting that , we have that for all , where is given by Eq. (31), and with . Similarly to Eq. (62), we have that , which implies that . Therefore, we see that with given by Eq. (31), for all . Using similar arguments to those above, one can now recursively apply Eq. (28) to show that can be determined from given by Eq. (31), for all such that (with ). Since , we see from Eq. (28) that can be determined from given by Eq. (31). Thus, we have shown that can be added to the memory of Algorithm 2 in line 12, for all .
Combining all the above arguments together and noting line 14 in Algorithm 2, we see that at the beginning of iteration of the for loop in lines 4-14 of Algorithm 2, the memory of the algorithm satisfies
(67)
This completes the induction step for the proof of Eq. (31), and thus completes the proof of part (a).
We then prove part (b). Consider any . In order to prove part (b), it suffices for us to show that for all can be determined using Eq. (28) and the memory after line 12 (and before line 14) in iteration of the for loop in lines 4-14 of Algorithm 2, which is given by
(68)
Considering any , one can show via the definition of in (7) and the definition of that . Again, we split our arguments into two cases: has a self loop, and does not have a self loop.
First, suppose has a self loop. For the case when (i.e., is an isolated node in ), we see that with given by Eq. (68), where with . Noting that , we see from the definitions of in (7) that in with . It follows that with given by Eq. (68). Thus, we focus on the case when is not a leaf node, i.e., . We see from Eq. (28) that given , and for all (with ) in , the state can be determined. Again, let us consider any such that in , and denote . Further considering any , and noting that , we have that with given by Eq. (68), for all , where and . Similarly to (62), we have that , which implies that
(69)
It then follows from (69) that with given by Eq. (68), for all , and for all . Using similar arguments to those before, one can recursively use Eq. (28) to show that can be determined from given by Eq. (68), for all . Moreover, recalling that as we argued above, we see that with given by Eq. (68). One can then apply Eq. (28) multiple times and show that can be determined from given by Eq. (68). Next, suppose does not have a self loop. Using similar arguments to those above for the case when has a self loop, one can show that can be determined using Eq. (28) and the current memory given in Eq. (68). ∎
Appendix C Proofs for Perturbation Bounds on Solutions to Riccati Equations
C.1 Proof of Lemma 7
Consider any that has a self loop. To show that (41) holds under the assumption on given in (44), we use [29, Proposition 2]. Specifically, since and hold, one can show that and hold, for all . Similarly, noting that and hold, for all , one can show that . Moreover, note that and from Assumption 4, where and . Recalling the definitions of , and , the proof of (41) under the assumption on given in (44) now follows from [29, Proposition 2].
Next, let denote
and note that . Moreover, we see from Eq. (21) that
where and . We have
which implies that
Noting that and recalling the definition of given in (40), we have
Similarly, one can show that
Now, following similar arguments to those in the proof of [29, Lemma 2], one can show that
where the second inequality follows from the assumption on given in (44). ∎
C.2 Proof of Lemma 8
First, let us consider any such that , i.e., . Since is the unique root node that is reachable from , we see from Lemma 1 and Remark 1 that has a self loop. Noting that from Assumption 4, and that , we see that any satisfying (47) also satisfies (44). Thus, we have from (41) in Lemma 7 that
where
and we note that . Moreover, we see from Eq. (21) that
where and (since and ). Now, using similar arguments to those in the proof of Lemma 7, one can also show that
and
Using similar arguments to those in the proof of [29, Lemma 2], one can now show that
(70)
where the second inequality follows from the assumption on given in (47), and we note that . Hence, we have shown that (45) holds for
To prove Eq. (46) for , we first recall the expressions for and given in Eqs. (9) and (22), respectively. Using similar arguments to those above, we have
(71)
where the first inequality in (71) follows from (70) and the definition of given in (40). To obtain the second inequality in (71), we first note from Eq. (9) that , where the last inequality follows from Assumption 4. We then have from (40) that , which further implies that . It follows that the second inequality in (71) holds. To proceed, denoting and , we have from Eqs. (9) and (22) that
(72)
where we note that . From the above arguments, we have the following:
and
Noting that , one can now obtain from (72) that
(73)
where the second inequality again follows from the assumption on given in (47).
Next, let us consider any such that (where is the unique root node that is reachable from ), and denote the unique directed path from to in as . We see that satisfies (70) and (73). Repeating the above arguments for obtaining (70) and (73) one more time, one can show that
and
where we use again the assumption on given by (47). Further repeating the above arguments, and noting from Eq. (26) and the definition of in (7) that for all , one can show that (45)-(46) hold, for all without a self loop, under the assumption on given in (47). ∎
Appendix D Proofs for Perturbation Bounds on Costs
D.1 Proof of Lemma 9
First, let us consider any that has a self loop. Noting the construction of the information graph given in (7), one can show that Eq. (35) can be rewritten as
(74)
where is the set of leaf nodes in that can reach , is the length of the (unique) directed path from node to node in with if , and
with if , where is given by Eq. (8) for all , and are the nodes along the directed path from to in . Recalling from Eq. (35) that , in Eq. (74) we set if . Now, under the assumption on given in Eq. (47), we see from (45) in Lemma 8 that , which implies that , for all with . Noting that from the construction of , we have , for all . Considering any and denoting
(75)
we have
where we use the fact from that and are independent for all with , and the fact that for any with , is the only noise term such that (see Footnote 2). Moreover, we see that and are independent for all with , and that is independent of for all . Now, considering any such that for all , and noting that for all , we have
(76)
Let us denote the right-hand side of Eq. (76) as , and denote
Fixing any such that for all , and considering any , one can then unroll Eq. (74) and show that
(77)
Under the assumption on given in (47), one can obtain from Lemma 7 that for all , where , which implies that is stable. It follows that
Noting that from the definition of given in (7), and that for any with , is the only noise term such that , as we argued above, we have from Eq. (76) that
where the second inequality follows from the fact that as we argued above. It then follows that (48) holds. ∎
D.2 Proof of Proposition 3
First, since satisfies (47) (and thus (44)), we have from (43) in Lemma 8 that is stable for any that has a self loop. Now, using similar arguments to those for the proofs of Theorem 2 and Corollary 4 in [26], and leveraging Lemma 6 and Eqs. (34)-(35), (21) and (49), one can show that Eq. (50) holds. To proceed, for any and for any , we set , and define
(78)
where is the optimal control policy given by Eq. (11), for all and for all , and where and are given by Eqs. (8) and (10), respectively, for all . Moreover, on the right-hand side of Eq. (78), we set , where given by Eq. (36) is the state after applying the control policy in Eq. (34) for . Noting that as we discussed at the beginning of Section 5, and that , we see that
where we use the fact that .
Recalling the definition of in Eq. (38), we denote . We then have the following:
Using similar arguments to those for the proof of [26, Theorem 2], one can show that
(79)
where for . To obtain for all and for all in Eq. (79), we use the following recursion:
(80)
initialized with for all , where , and for each , we let be the unique node such that , and is given by Eq. (8) for all . Combining the above arguments together, we obtain the following:
(81)
(82)
(83)
where for each in Eq. (83), we let be the unique node such that . To obtain Eq. (81), we note from Lemma 6 that for all , where for all , and and are independent for all with . Moreover, we note from Eq. (34) that for all , where is given by Eq. (21). Combining the above arguments together, and recalling that , we obtain Eq. (81). To obtain Eq. (82), we first apply Eq. (35) and notice . Next, we use the facts that the information graph defined in (7) is a tree (see Lemma 1 and Remark 1), and that and are independent for all with and for all , as we argued above. To obtain Eq. (83), we leverage again the tree structure of .
Now, leveraging the recursion in Eq. (80), and using similar arguments to those for the proof of [16, Lemma 12], one can show via (83) that
Recall from Lemma 2 that is stable for any that has a self loop. Using similar arguments to those for the proof of [26, Corollary 4], one can show via Eq. (80) that as , for all and for all , where is given by Eq. (9). It then follows that
(84)
where the first equality follows from Eq. (8), and the second equality follows from the fact that the limit exists, for all , as we argued in the proof of Lemma 9.
Finally, substituting (48) in Lemma 9 into the right-hand side of (84), we obtain
(85)
where the third inequality follows from the fact that for all , with and . To obtain (85), we first note that is assumed to satisfy (47) (and thus (44)). Recalling , and for all as we assumed previously, we then obtain (85) from Lemmas 7-8, where we also use the fact that for all . ∎
D.3 Proof of Lemma 10
First, considering any and any , we have
Following similar arguments to those in the proof of Lemma 9 (particularly Eq. (77)), one can show that
where , and . Since satisfies (47) and thus (44), we see from Lemma 7 that for all . It now follows that
Combining the above arguments together, we obtain (51). ∎
D.4 Proof of Lemma 11
For notational simplicity in this proof, we denote
and
We first prove (55). Based on the above notation, we can show that
where the first inequality follows from the facts that , and that since (see our arguments in the proof of Lemma 8). We then have
(86)
Thus, in order to show that (55) holds for all , it suffices to show that holds for all . To this end, we prove via an induction on . For any , we recall from Eqs. (23) and (34) that and , respectively, for all , where and are given by Eqs. (28) and (35), respectively, and is given by Eq. (21), for all . As we argued before, in Eqs. (28) and (35) we have for all . Hence, we have , which implies that (55) holds for , completing the proof of the base step of the induction.
For the induction step, suppose holds for all . Now, considering any , we can unroll the expressions of and given by Eqs. (32) and (36), respectively, and obtain
and
where we note that . It then follows that
(87)
where the first inequality follows from Lemma 14. To obtain the first inequality in (87), we use the induction hypothesis. To obtain the second inequality in (87), we use the fact that (with ), for all , from Assumption 3. Recalling from our arguments in Section 4 (particularly, Eq. (29)), one can show that
where is an estimate of in Eq. (3). From Eq. (32), we see that
It follows that
Recall from Lemma 10 that and , for all . We then obtain
where the first inequality follows again from Lemma 14, and the second inequality uses (87). Similarly, we have
It then follows that
where the first inequality again follows from Lemma 14, and the second inequality follows from and . Denoting
(88)
we have
(89)
Moreover, note that
(90)
To proceed, let us consider any that has a self loop. Recalling the arguments in the proof of Lemma 9, we can rewrite Eq. (35) as
(91)
with
(92)
where is the set of leaf nodes in that can reach , is the length of the (unique) directed path from node to node in with if , and
with if , where are the nodes along the directed path from to in . We also recall from the arguments in the proof of Lemma 9 that for all . We then see from (76) in the proof of Lemma 9 and the definition of in (54) that
(93)
Similarly, one can rewrite Eq. (28) as
(94)
where
(95)
where
with if . Note that for any and for any in Eqs. (92) and (95), we set if . One can check that satisfies (44) and (47). We then have from Lemmas 7-8 that for all . It follows that for all with . Noting from the construction of in (7) that for all , one can now show that
(96)
which also implies that
(97)
for all . For any , we then have from the above arguments that
(98)
where the first inequality follows from Lemma 14. To obtain (98), we first note (89)-(90) and (96)-(97). We then use the fact that from the definition of the information graph given by (7), and the fact that for any with , is the only noise term such that (see Footnote 2). From (93) and (98), we also obtain
(99)
where the first inequality follows from Lemma 14.
Now, let us denote and . Recalling that , where as we assumed before, one can unroll Eqs. (91) and (94), and show that
(100)
Since and , where satisfies (44), as we argued above, we have from Lemma 7 that
(101)
where and , with . Moreover, since , we have from Lemma 15 that
(102)
where (102) follows from the choice of in (54). Now, considering any term in the summation on the right-hand side of (100), we have
(103)
where the first inequality follows from Lemma 14, and the second inequality uses the upper bounds provided in (98)-(99) and (101)-(102). Moreover, one can show that defined in (54) satisfies that and , which, via algebraic manipulations, yield (103). We then see from (100) that
(104)
where the first inequality follows from Lemma 14, and (104) follows from standard formulas for series. Now, substituting Eq. (88) into the right-hand side of (104), one can show that
(105)
where we note that and by their definitions.
Next, considering any that does not have a self loop, we have from the arguments in the proof of Lemma 9 that Eq. (35) can be rewritten as , where is defined in Eq. (92). Similarly, Eq. (28) can be rewritten as , where is defined in Eq. (95). Using similar arguments to those above, one can then show that (105) also holds.
Further recalling Eqs. (23) and (34), we know that and , which imply that
where the first inequality follows from Lemma 14, the second inequality follows from for all , as we argued above, and the last inequality follows from (105) and the fact that . Moreover, we can show that
where the first inequality follows from the fact that , and the second inequality follows from the fact that as we argued above. One can then show that given in (54) satisfies that . Thus, we obtain the following:
It follows that
which completes the induction step, i.e., we have shown that holds for all . ∎
D.5 Proof of Proposition 4
For notational simplicity in this proof, we denote
(106)
For all , we then see from Lemma 11 that
and
where (resp., ) is given by Eq. (23) (resp., Eq. (34)), (resp., ) is given by Eq. (32) (resp., Eq. (36)), and is defined in Eq. (54). Similarly, we see from Corollary 1 that
and
for all .
To proceed, we have the following:
(107)
Now, considering any term in the summation on the right-hand side of Eq. (107), and dropping the dependency on for notational simplicity, we have the following:
(108)
where the first two inequalities follow from the Cauchy-Schwarz inequality, and the third inequality follows from the upper bounds on , , and given above and in Lemma 10. Similarly, we have
(109)
where the second inequality follows from the upper bounds on , , and given above and in Lemma 10. Combining (108) and (109) together, we obtain from Eq. (107) that
(110)
where the second inequality follows from the fact that . To obtain (110), one can show that . Finally, substituting the expressions for and given in (54) and (106), respectively, we obtain from (110) that (59) holds. ∎
Appendix E Auxiliary Lemmas
Lemma 12.
[9, Lemma 34] Let be a Gaussian random vector with distribution , for all , where . Then for any and for any , the following holds with probability at least :
Lemma 13.
[9, Lemma 36] Let be a sequence of random vectors that is adapted to a filtration , where for all . Suppose is conditionally Gaussian on with , for all , where . Then, for any and for any , the following holds with probability at least :
Lemma 14.
Let be a sequence of random vectors, where . Then,
Proof.
We have the following:
where the first and second inequalities follow from the Cauchy-Schwarz inequality. ∎
Lemma 15.
[29, Lemma 5] Consider any matrix and any matrix . Let and be such that , and for all . Then, for all ,