Learning in Networked Control Systems

Rahul Singh and P. R. Kumar

Rahul Singh is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA, [email protected]. P. R. Kumar is with the Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843, USA, [email protected].

Abstract

We design an adaptive controller (learning rule) for a networked control system (NCS) in which data packets containing control information are transmitted across a lossy wireless channel. We propose Upper Confidence Bounds for Networked Control Systems (UCB-NCS), a learning rule that maintains confidence intervals for the estimates of the plant parameters $(A_{(\star)},B_{(\star)})$ and the channel reliability $p_{(\star)}$ , and utilizes the principle of optimism in the face of uncertainty while making control decisions.

We provide non-asymptotic performance guarantees for UCB-NCS by analyzing its “regret”, i.e., the performance gap from the scenario in which $(A_{(\star)},B_{(\star)},p_{(\star)})$ are known to the controller. We show that with high probability the regret can be upper-bounded as $\tilde{O}\left(C\sqrt{T}\right)$ (here $\tilde{O}$ hides logarithmic factors), where $T$ is the operating time horizon of the system, and $C$ is a problem-dependent constant.

I Introduction

Though adaptive control [1] of unknown Linear Quadratic Gaussian (LQG) systems [2] is a well-studied topic by now [3, 4, 5, 6], existing algorithms cannot be utilized for controlling an unknown NCS in which plant and network parameters are unknown. In departure from the traditional adaptive controllers for LQG systems, an algorithm now also needs to continually estimate the unknown network behaviour besides simultaneously learning and controlling the plant in an online manner. An important concern is that in general it is not optimal to design and operate network estimator independently of the process controller. Thus, the optimal controls $u(t)$ should utilize the information gained about network quality in addition to using the information gained about plant parameters. Similarly, decisions made by the network scheduler should also “aid” the controller in “learning” the unknown plant parameters.

This work addresses the problem of adaptive control of a simple NCS in which data packets from the controller to the plant are communicated over an unreliable channel. We model the plant as an LQG system. We propose a learning rule that maintains estimates and confidence sets for both a) the (unknown) plant parameters $(A_{(\star)},B_{(\star)})$ , and b) the (unknown) channel reliability $p_{(\star)}$ . Controls are then generated using the principle of optimism in the face of uncertainty [7], and depend upon both a) and b). We denote our algorithm as Upper Confidence Bounds for Networked Control Systems (UCB-NCS).

We show that UCB-NCS yields the same asymptotic performance as the optimal controller that has knowledge of the system and network parameters. We also quantify its finite-time performance by providing upper-bounds on its “regret” [8]. Regret scales as $\tilde{O}\left(C\sqrt{T}\right)$ , where $T$ is the operating time horizon and $C$ is a problem dependent constant. It also depends on the channel reliability through a certain quantity which we call the “margin of stability” $\eta$ (14). A larger value of $\eta$ means that the learning algorithm has a lower regret.

UCB-NCS has many appealing properties. For instance, the network estimator needs to communicate its optimistic estimate of the network reliability to the controller only occasionally; the controller then uses this value to generate controls.

II System Model

We assume that the system of interest is linear, and evolves as follows

\displaystyle x(t+1)=\begin{cases}A_{(\star)}x(t)+B_{(\star)}u(t)+w(t)\mbox{ if }\ell(t)=1\\ A_{(\star)}x(t)+w(t)\mbox{ if }\ell(t)=0,\end{cases}

(1)

where $A_{(\star)}\in\mathbb{R}^{n\times n},B_{(\star)}\in\mathbb{R}^{n\times m}$ are the system matrices, $\ell(t)\in\left\{0,1\right\}$ is the instantaneous state of the wireless channel, and $x(t)\in\mathbb{R}^{n},u(t)\in\mathbb{R}^{m}$ are the system state and control input at time $t$ respectively. $\{\ell(t)\}_{t=1}^{T}$ are Bernoulli i.i.d. with mean value $p_{(\star)}$ . $\{w(t)\}_{t=1}^{T}$ is the process noise, and is assumed to be i.i.d. with

\displaystyle\mathbb{E}\left(w(t)w^{T}(t)\right)=\sigma^{2}_{w},~{}\forall t\in[1,T].

The objective is to minimize the operating cost

\displaystyle\mathbb{E}\sum_{t=1}^{T-1}x^{T}(t)Qx(t)+u^{T}(t)Ru(t)+x^{T}(T)Qx(T).

(2)

We let $\theta_{(\star)}:=\left(A_{(\star)},B_{(\star)},p_{(\star)}\right)$ denote the system parameters. $\theta_{(\star)}$ is not known to the controller. We assume that the system is scalar, i.e., $m=n=1$ .
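The dynamics (1) and the cost (2) can be sketched in a few lines of Python for the scalar case. This is only an illustration under stated assumptions: the function names `simulate` and `quadratic_cost` are hypothetical, the noise is drawn as zero-mean Gaussian with standard deviation `sigma_w`, and a fixed linear feedback `u(t) = K x(t)` stands in for whichever controller is being studied.

```python
import random


def simulate(A, B, p, K, T, x1=0.0, sigma_w=1.0, seed=0):
    """Simulate the scalar system (1) under the fixed feedback u(t) = K * x(t).

    Returns (xs, us, ls): states x(1..T), controls u(1..T-1), channel states l(1..T-1).
    Assumption: w(t) is i.i.d. zero-mean Gaussian with standard deviation sigma_w.
    """
    rng = random.Random(seed)
    xs, us, ls = [x1], [], []
    for _ in range(T - 1):
        x = xs[-1]
        u = K * x
        l = 1 if rng.random() < p else 0   # Bernoulli(p) channel state
        w = rng.gauss(0.0, sigma_w)        # process noise
        # The control reaches the plant only when the packet is delivered (l = 1).
        xs.append(A * x + (B * u if l == 1 else 0.0) + w)
        us.append(u)
        ls.append(l)
    return xs, us, ls


def quadratic_cost(xs, us, Q=1.0, R=1.0):
    """Realized value of the sum inside the expectation in (2)."""
    running = sum(Q * x * x + R * u * u for x, u in zip(xs[:-1], us))
    return running + Q * xs[-1] ** 2
```

Setting `sigma_w=0.0` with `p=1.0` or `p=0.0` reduces the recursion to the deterministic closed-loop or open-loop map, which is a convenient sanity check.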

III Preliminaries on Jump Markov Linear Systems

Note that (1) is a Jump Markov Linear System (JMLS), and if the system parameter $\theta_{(\star)}$ is known, the optimal controls can be obtained by using Dynamic Programming [9].

There are matrices $\left\{K_{\theta_{(\star)}}(\ell)\right\}_{\ell\in\{0,1\}}$ such that the optimal control at $t$ is given by $K_{\theta_{(\star)}}(\ell(t))x(t)$ . We let $\left\{K_{\theta}(\ell)\right\}_{\ell\in\{0,1\}}$ denote the optimal matrices when system parameter is equal to $\theta$ .

We let $V_{\theta}(x,\ell)$ denote the “cost-to-go” when the system state is equal to $x$ , the channel state is $\ell$ , and the system dynamics are described by $\theta$ . The value function is in fact quadratic in $x$ for each channel state $\ell$ , and we let $\{P_{\theta}(\ell)\}_{\ell\in\{0,1\}}$ denote the corresponding matrices. We also let $J_{\theta}$ be the optimal operating cost.

Notation: For a random variable (r.v.) $X$ , let $X_{\mathcal{F}}$ denote its projection onto the space of $\mathcal{F}$ -measurable functions, i.e., its conditional expectation w.r.t. the sigma-algebra $\mathcal{F}$ . For $x,y\in\mathbb{Z}$ (the set of integers), we let $[x,y]:=\left\{x,x+1,\ldots,y\right\}$ . For a set of r.v.s $\mathcal{X}$ , we let $\sigma(\mathcal{X})$ denote the smallest sigma-algebra with respect to which each r.v. in $\mathcal{X}$ is measurable. For functions $f(x),g(x)$ , we say $f(x)=O(g(x))$ if $\limsup_{x\to\infty}|f(x)/g(x)|<\infty$ . For a set $\mathcal{X}$ , we let $\mathcal{X}^{c}$ denote its complement.

IV Upper Confidence Bounds for NCS (UCB-NCS)

Let $\mathcal{F}_{t}:=\sigma\left(\left\{(x(s),u(s))\right\}_{s=1}^{t-1}\cup\{x(t)\}\right)$ . A learning policy, or an adaptive controller, is a collection of maps $\left\{\mathcal{F}_{t}\mapsto u(t)\right\}_{t=1}^{T}$ . Let $\hat{\theta}(t):=\left(\hat{A}(t),\hat{B}(t),\hat{p}(t)\right)$ denote the estimates of $\theta_{(\star)}=(A_{(\star)},B_{(\star)},p_{(\star)})$ at time $t$ , defined as follows. Let $z(s):=x(s+1)$ , and $\lambda>0$ .

		$\displaystyle\hat{p}(t)=\frac{1}{t}\sum_{s=1}^{t}\ell(s),$
		$\displaystyle\hat{A}(t)\in\arg\min_{A}\frac{1}{2}\left[\lambda A^{2}+\sum_{s=1}^{t-1}\left(z(s)-Ax(s)\right)^{2}(1-\ell(s))\right],$
		$\displaystyle\hat{B}(t)\in\arg\min_{B}\frac{1}{2}\left[\lambda B^{2}+\sum_{s=1}^{t-1}\left(z(s)-\hat{A}(t)x(s)-Bu(s)\right)^{2}\ell(s)\right].$		(3)
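In the scalar case, both regularized least-squares problems in (3) have closed-form minimizers, obtained by setting the derivative of each objective to zero. The sketch below assumes the channel states are available to the estimator; the function name `estimate` and the list-based data layout are hypothetical choices made for illustration.

```python
def estimate(xs, us, ls, lam=1.0):
    """Closed-form scalar estimates corresponding to (3).

    xs has one more entry than us and ls: xs[s + 1] is the state reached
    from (xs[s], us[s], ls[s]), i.e. z(s) = x(s + 1).
    Returns (A_hat, B_hat, p_hat, V1, V2).
    """
    n = len(us)
    # Empirical channel reliability; 0.5 mirrors the initial guess when no data exist.
    p_hat = sum(ls) / n if n > 0 else 0.5
    # Regularized "information" collected on dropped (V1) and delivered (V2) slots.
    V1 = lam + sum(xs[s] ** 2 * (1 - ls[s]) for s in range(n))
    V2 = lam + sum(us[s] ** 2 * ls[s] for s in range(n))
    # A is estimated only from slots in which the control packet was lost.
    A_hat = sum(xs[s + 1] * xs[s] * (1 - ls[s]) for s in range(n)) / V1
    # B is estimated from delivered slots, after removing the estimated drift A_hat * x.
    B_hat = sum((xs[s + 1] - A_hat * xs[s]) * us[s] * ls[s] for s in range(n)) / V2
    return A_hat, B_hat, p_hat, V1, V2
```

Note that the error in `A_hat` propagates into `B_hat` through the residual `z(s) - A_hat * x(s)`, which is why the confidence radius for $B$ carries an additional term proportional to the radius for $A$.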

Define

	$\displaystyle V_{1}(t):$	$\displaystyle=\lambda+\sum_{s=1}^{t-1}x^{2}(s)(1-\ell(s)),V_{2}(t):=\lambda+\sum\limits_{s=1}^{t-1}u^{2}(s)\ell(s),$
	$\displaystyle\gamma_{i}(\delta,t)$	$\displaystyle:=\sqrt{\log\left(\lambda V_{i}(t)/\delta\right)},~{}i=1,2.$		(4)

Let $\mathcal{C}(t)=\left(\mathcal{C}_{1}(t),\mathcal{C}_{2}(t),\mathcal{C}_{3}(t)\right)$ be the confidence intervals associated with the estimates $\left(\hat{A}(t),\hat{B}(t),\hat{p}(t)\right)$ at time $t$ defined as follows,

$\displaystyle\mathcal{C}_{1}(t):$	$\displaystyle=\left\{A:\|A-\hat{A}(t)\|\leq\beta_{1}(t)\right\},$	(5)
$\displaystyle\mathcal{C}_{2}(t):$	$\displaystyle=\left\{B:\|B-\hat{B}(t)\|\leq\beta_{2}(t)\right\},$
$\displaystyle\mathcal{C}_{3}(t):$	$\displaystyle=\left\{p:\|p-\hat{p}(t)\|\leq\beta_{3}(t)\right\},$	(6)

where

	$\displaystyle\beta_{1}(t):$	$\displaystyle=(\gamma_{1}(\delta,t)+\lambda^{1/2})/\sqrt{V_{1}(t)},~{}\beta_{3}(t):=\sqrt{\log\left(1/\delta\right)/t},$
	$\displaystyle\beta_{2}(t):$	$\displaystyle=\frac{(\gamma_{2}(\delta,t)+\lambda^{1/2})}{\sqrt{V_{2}(t)}}+K_{\max}\frac{(\gamma_{1}(\delta,t)+\lambda^{1/2})}{\sqrt{V_{1}(t)}}.$
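The radii of the three confidence intervals can be evaluated directly from (4) and the definitions of $\beta_{1},\beta_{2},\beta_{3}$ . The following sketch reads the normalizers under the square roots as $V_{1}(t),V_{2}(t)$ from (4); the function name `confidence_radii` is hypothetical, and the explicit domain check is an added safeguard rather than part of the algorithm.

```python
import math


def confidence_radii(V1, V2, t, lam, delta, K_max):
    """Return (beta1, beta2, beta3) for the intervals C_1(t), C_2(t), C_3(t)."""
    if lam * V1 < delta or lam * V2 < delta:
        # gamma_i in (4) takes the square root of log(lam * V_i / delta).
        raise ValueError("need lam * V_i >= delta for gamma_i to be real")
    gamma1 = math.sqrt(math.log(lam * V1 / delta))
    gamma2 = math.sqrt(math.log(lam * V2 / delta))
    beta1 = (gamma1 + math.sqrt(lam)) / math.sqrt(V1)
    # The second term accounts for the error in A_hat entering the estimate of B.
    beta2 = (gamma2 + math.sqrt(lam)) / math.sqrt(V2) + K_max * beta1
    beta3 = math.sqrt(math.log(1.0 / delta) / t)
    return beta1, beta2, beta3
```

The intervals for $A$ and $B$ shrink as $V_{1}(t)$ and $V_{2}(t)$ grow, i.e., as more informative samples are collected on dropped and delivered slots respectively, while the interval for $p$ shrinks at rate $1/\sqrt{t}$ .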

The learning rule decomposes the cumulative time into episodes, and implements a single stationary controller within each single episode that chooses $u(t)$ as a function of $x(t)$ . Let $\tau_{k}$ denote the starting time of $k$ -th episode. The controller implemented within episode $k$ is obtained at time $\tau_{k}$ by solving the following optimization problem.

\displaystyle\min_{\theta\in\mathcal{C}(\tau_{k})\cap\Theta}J_{\theta},

(7)

where $\Theta$ is the set of “allowable” parameters. Let $\theta(\tau_{k})$ denote a solution to the above problem. UCB-NCS implements the optimal controller corresponding to the case when the true system parameters are equal to $\theta(\tau_{k})$ . Thus, $u(t)=K_{\theta(\tau_{k})}(\ell(t))x(t)$ for $t\in\left[\tau_{k},\tau_{k+1}-1\right]$ .

A new episode begins when either $V_{1}(t)$ or $V_{2}(t)$ doubles, or when the elapsed time $t$ reaches twice the starting time $\tau_{k}$ of the current episode (so that the number of samples used to estimate the channel reliability doubles). The learning rule also ensures that each episode lasts at least $L$ time-slots, i.e., $\tau_{k+1}-\tau_{k}\geq L$ . We set

\displaystyle\theta(t):=\theta(\tau_{k}),\forall t\in\left[\tau_{k},\tau_{k+1}-1\right],

i.e., it is the current value of the UCB estimate of $\theta_{(\star)}$ . UCB-NCS is summarized in Algorithm 1.

Algorithm 1 UCB-NCS

Input: $T,\lambda>0,\delta>0,L\in\mathbb{N},\alpha>2$

Set $V^{1,\star},V^{2,\star}=\lambda,\hat{A}(1)=.5,\hat{B}(1)=.5,\hat{p}(1)=.5,\tau=1,V_{1}(1)=\lambda,V_{2}(1)=\lambda$

1: for $t=1,2,\ldots$ do

2: if ( $V_{1}(t)\geq 2V^{1,\star}$ or $V_{2}(t)\geq 2V^{2,\star}$ or $t\geq 2\tau$ ) and $t-\tau\geq L$ then

3: Calculate $\hat{\theta}(t)$ as in (3) and $\theta(t)$ by solving (7). Update $V^{1,\star}=V_{1}(t),V^{2,\star}=V_{2}(t),\tau=t$

4: else

5: $\hat{\theta}(t)=\hat{\theta}(t-1)$

6: end if

Calculate $u(t)$ based on the current UCB estimate $\theta(t)$ , system state $x(t)$ , and channel state $\ell(t)$ . Use control $u(t)=K_{\theta(t)}(\ell(t))x(t)$ .

Update $V_{1}(t+1)=V_{1}(t)+x^{2}(t)(1-\ell(t)),V_{2}(t+1)=V_{2}(t)+u^{2}(t)\ell(t)$

7: end for
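The episode-switching test in step 2 of Algorithm 1 is simple enough to isolate. The sketch below reads the three trigger conditions as alternatives, in line with the textual description of when a new episode begins; the function name `starts_new_episode` is hypothetical.

```python
def starts_new_episode(V1, V2, t, V1_star, V2_star, tau, L):
    """Episode-switching test of Algorithm 1 (step 2).

    V1_star, V2_star: values of V_1, V_2 recorded at the start of the current episode.
    tau: starting time of the current episode; L: minimum episode length.
    """
    # Any one of the three quantities doubling can trigger a switch ...
    doubled = V1 >= 2 * V1_star or V2 >= 2 * V2_star or t >= 2 * tau
    # ... but only once the current episode has lasted at least L slots.
    return doubled and (t - tau >= L)
```

For example, `starts_new_episode(2.0, 1.0, 6, 1.0, 1.0, 5, 3)` evaluates to `False`: although $V_{1}$ has doubled, the current episode has lasted only one slot. The minimum length $L$ thus prevents a burst of very short episodes.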

V Large Deviation Bounds on Estimation Errors

We now analyze the estimation errors $e_{1}(t):=\hat{A}(t)-A,e_{2}(t):=\hat{B}(t)-B$ .

Lemma 1

Define

\displaystyle\mathcal{E}:=\left\{\omega:\theta_{(\star)}=\left(A_{(\star)},B_{(\star)},p_{(\star)}\right)\in\mathcal{C}(t),~{}\forall t\in[1,T]\right\}.

We then have that

\displaystyle\mathbb{P}\left(\mathcal{E}^{c}\right)\leq 3\delta.

Proof:

It can be shown that

\displaystyle e_{1}(t)=-\lambda A/V_{1}(t)+\sum\limits_{s=1}^{t-1}w(s)x(s)(1-\ell(s))/V_{1}(t).

(8)
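For completeness, the steps behind (8) can be written out as follows. This is a sketch of the scalar computation using only (1), (3) and the definition of $V_{1}(t)$ in (4), with $A$ denoting the true parameter as in (8).

```latex
% Setting the derivative of the objective for \hat{A}(t) in (3) to zero gives
\hat{A}(t)=\frac{\sum_{s=1}^{t-1}z(s)x(s)(1-\ell(s))}{\lambda+\sum_{s=1}^{t-1}x^{2}(s)(1-\ell(s))}
          =\frac{1}{V_{1}(t)}\sum_{s=1}^{t-1}z(s)x(s)(1-\ell(s)).
% On slots with \ell(s)=0, (1) gives z(s)=x(s+1)=A x(s)+w(s), hence
\sum_{s=1}^{t-1}z(s)x(s)(1-\ell(s))
  =A\left(V_{1}(t)-\lambda\right)+\sum_{s=1}^{t-1}w(s)x(s)(1-\ell(s)).
% Dividing by V_{1}(t) and subtracting A yields (8):
e_{1}(t)=\hat{A}(t)-A
  =-\frac{\lambda A}{V_{1}(t)}+\frac{1}{V_{1}(t)}\sum_{s=1}^{t-1}w(s)x(s)(1-\ell(s)).
```

The first term is the bias introduced by the regularizer $\lambda$ , and the second is the self-normalized noise term that the martingale bound controls.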

Note that $\{w(s)\}_{s=1}^{T-1}$ is a martingale difference sequence w.r.t. $\mathcal{F}_{t}$ , while $x(t)$ is adapted to $\mathcal{F}_{t}$ . Thus, a bound on $e_{1}(t)$ follows from the self-normalized bounds on martingales in Corollary 1 of [10].

To analyze $e_{2}(t)$ , we observe,

		$\displaystyle e_{2}(t)=\left(\sum\limits_{s=1}^{t-1}w(s)u(s)\ell(s)/V_{2}(t)-\lambda B/V_{2}(t)\right)$
		$\displaystyle+[A-\hat{A}(t)]\sum\limits_{s=1}^{t-1}x(s)u(s)\ell(s)/V_{2}(t).$		(9)

The first term, within parentheses, is bounded using Corollary 2 of [10]. The second term is upper-bounded by $K_{\max}|e_{1}(t)|$ , so it is controlled by the bound on $e_{1}(t)$ . A bound on the estimation error of $p_{(\star)}$ is obtained using the Azuma-Hoeffding inequality. ∎

VI Large Deviation Bounds on the System State $|x(t)|$

We now bound $|x(t)|$ under UCB-NCS. System evolution under UCB-NCS is given by

\displaystyle x(t+1)=A_{sw}(t)x(t)+w(t),t\in[1,T-1],

where

\displaystyle A_{sw}(t):=\left[\left(A_{(\star)}+B_{(\star)}K_{{}_{\theta(t)}}(\ell(t))\right)\ell(t)+A_{(\star)}(1-\ell(t))\right].

Thus,

\displaystyle x(t)=x(0)G(0,t)+\sum_{s=1}^{t-1}w(s)G(s,t-1),

(10)

where

\displaystyle G(s_{1},s_{2}):=\begin{cases}\prod\limits_{\ell=s_{1}}^{s_{2}}A_{sw}(\ell)\mbox{ if }s_{2}>s_{1},\\ 1\mbox{ if }s_{1}=s_{2}.\end{cases}

Consider the deviations

\displaystyle\Delta(t_{1},t_{2}):=\sum\limits_{s=t_{1}}^{t_{2}}\ell(s)-p_{(\star)}(t_{2}-t_{1}),

and the events,

\displaystyle\mathcal{J}_{t_{1},t_{2}}:=\left\{\omega:|\Delta(t_{1},t_{2})|\leq\sqrt{2\alpha\sigma^{2}_{p_{(\star)}}(t_{2}-t_{1})\log(t_{2}-t_{1})}\right\},

(11)

where $\sigma^{2}_{p_{(\star)}}:=p_{(\star)}(1-p_{(\star)})$ , and $\alpha>2$ . It follows from Azuma-Hoeffding inequality that

\displaystyle\mathbb{P}\left(\mathcal{J}^{c}_{t_{1},t_{2}}\right)\leq\frac{1}{(t_{2}-t_{1})^{\alpha}},~{}\forall t_{1},t_{2}\in[1,T].

(12)

Fix a sufficiently large $L>0$ (it suffices to let $L>\left(2\alpha\sigma^{2}_{p_{(\star)}}/\epsilon^{2}\right)^{2}$ ), and define

\displaystyle\mathcal{J}:=\cap_{t_{1},t_{2}:t_{2}\geq t_{1}+L}~{}\mathcal{J}_{t_{1},t_{2}}.

(13)

The following result follows by combining the union bound with the bound (12).

Lemma 2

\displaystyle\mathbb{P}\left(\mathcal{J}^{c}\right)\leq T^{2}/L^{\alpha}.

We now focus on upper-bounding $|G(s,t)|$ on $\mathcal{J}$ .

Throughout, we assume that the true system parameter $\theta_{(\star)}$ , and the set $\Theta$ used by UCB-NCS, satisfy the following.

Assumption 1

Define

\displaystyle\Lambda(\theta):=\mathbb{E}\left(\log A_{sw}(t)|\theta(t)=\theta\right).

Let $\epsilon>0,\eta>0$ . Then,

\displaystyle\Lambda(\theta)<-\eta-\epsilon<0,~{}\forall\theta\in\Theta.

(14)

We call $\eta$ the “margin of stability” of the NCS. Note that $\eta$ depends upon a) $\Theta$ , and b) $(A_{(\star)},B_{(\star)},p_{(\star)})$ .
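For the scalar system, the quantity $\Lambda(\theta)$ can be evaluated in closed form, since $A_{sw}(t)$ takes the value $A_{(\star)}+B_{(\star)}K_{\theta}(1)$ with probability $p_{(\star)}$ and $A_{(\star)}$ otherwise. The sketch below makes two assumptions that should be read as such: the logarithm is applied to $|A_{sw}(t)|$ , and both values of $A_{sw}$ are nonzero. The function names are hypothetical.

```python
import math


def lyapunov_exponent(A, B, p, K1):
    """E[log |A_sw|] when the gain K1 is applied on delivered slots.

    A, B, p are the true plant and channel parameters; K1 plays the role of K_theta(1).
    Assumes A != 0 and A + B * K1 != 0 so that both logarithms are finite.
    """
    closed_loop = abs(A + B * K1)   # multiplier when the packet is delivered
    open_loop = abs(A)              # multiplier when the packet is dropped
    return p * math.log(closed_loop) + (1 - p) * math.log(open_loop)


def satisfies_margin(Lambda, eta, eps):
    """Check the condition Lambda < -eta - eps < 0 of (14) for one parameter."""
    return eta > 0 and eps > 0 and Lambda < -eta - eps
```

As an illustration, for an open-loop unstable plant with $A=2$ , $B=1$ and gain $-1.5$ , the exponent is exactly zero at $p=0.5$ and becomes negative only for more reliable channels, which is how the channel reliability enters the margin of stability.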

Consider an element of $\mathcal{J}$ , and assume there are $K$ episodes during the time period $[s,t]$ . Let $N_{i,k},i=0,1$ denote the number of times the channel state assumes value $i$ during the $k$ -th episode, and let $\theta_{k}$ denote the UCB estimate of $\theta_{(\star)}$ during the $k$ -th episode. Let $D_{k}$ denote the duration of the $k$ -th episode. We have the following,

		$\displaystyle\|G(s,t)\|=\prod_{m=s}^{t}\left|A_{sw}(m)\right|$
		$\displaystyle\leq\prod_{k=1}^{K}\exp\left(D_{k}\Lambda(\theta_{k})\right)\exp\left(\sqrt{2\alpha\sigma^{2}_{p_{(\star)}}D_{k}\log D_{k}}\right)$
		$\displaystyle\leq\exp\left(-\eta(t-s)\right),$		(15)

where the first inequality follows from definition of $\mathcal{J}$ (13), while the second follows from Assumption 1.

Let

\displaystyle\mathcal{H}:=\left\{\omega:\max_{t\in[1,T]}|w(t)|\leq\log^{1/2}\left(T/\delta\right)\right\}.

The following is easily proved.

Lemma 3

We have

\displaystyle\mathbb{P}\left(\mathcal{H}^{c}\right)\leq\delta.

Lemma 4

Define

\displaystyle g(\delta,T):=|x(0)|+\log^{1/2}\left(T/\delta\right)/(1-\exp(-\eta)).

(16)

Under Assumption 1, we have the following on $\mathcal{H}\cap\mathcal{J}$

\displaystyle|x(t)|<g(\delta,T),~{}\forall t\in[1,T].

Note that we have suppressed dependence of function $g$ upon $\eta,x(0)$ .

Proof:

The proof follows by substituting in (10) the bound (15) on $|G(s,t)|$ and the bound $\log^{1/2}\left(\frac{T}{\delta}\right)$ on $|w(s)|$ on the set $\mathcal{H}$ . ∎
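The bound $g(\delta,T)$ in (16) is the sum of the initial state magnitude and a geometric series in $\exp(-\eta)$ scaled by the noise bound on $\mathcal{H}$ . A minimal sketch for evaluating it numerically (the function name `state_bound` is hypothetical):

```python
import math


def state_bound(x0, T, delta, eta):
    """Evaluate g(delta, T) from (16) for initial state x0 and margin of stability eta."""
    if eta <= 0:
        raise ValueError("the margin of stability eta must be positive")
    noise_bound = math.sqrt(math.log(T / delta))   # bound on |w(t)| on the event H
    # Sum of the geometric series sum_k exp(-eta * k) = 1 / (1 - exp(-eta)).
    return abs(x0) + noise_bound / (1.0 - math.exp(-eta))
```

The bound grows only as $\sqrt{\log T}$ in the horizon, but it blows up as $\eta\to 0$ , which is one way in which a small margin of stability inflates the regret bound.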

VII Regret Analysis of UCB-NCS

Define $R(T)$ , the regret incurred by UCB-NCS until time $T$ as follows

	$\displaystyle R(T):$	$\displaystyle=\sum_{t=1}^{T}c(t)-TJ_{\theta_{(\star)}},$
	$\displaystyle\mbox{ where }c(t):$	$\displaystyle=Qx^{2}(t)+Ru^{2}(t).$		(17)
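Given a realized trajectory, the regret in (17) is a direct computation. The sketch below assumes that the optimal average cost $J_{\theta_{(\star)}}$ is supplied by the caller as `J_star` (the text does not give a formula for it here); the function name `regret` is hypothetical.

```python
def regret(xs, us, J_star, Q=1.0, R=1.0):
    """Empirical regret R(T) from (17) for states xs and controls us of equal length T.

    J_star is the optimal average cost for the true parameters, assumed known
    to the analyst (not to the controller).
    """
    if len(xs) != len(us):
        raise ValueError("xs and us must both have length T")
    T = len(xs)
    incurred = sum(Q * x * x + R * u * u for x, u in zip(xs, us))   # sum of c(t)
    return incurred - T * J_star
```

A regret that grows sublinearly in $T$ means that the average cost of the learning rule approaches the optimal average cost.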

For $\theta=(A,B,p)$ , define

\displaystyle x_{\theta}(t+1;u)=Ax(t)+Bu+w(t).

Similarly, let $\{\ell_{\theta}(t)\}_{t=1}^{T}$ be drawn i.i.d. according to $\theta$ .

Lemma 5

On the set $\mathcal{E}$ , $R(T)$ can be upper-bounded as follows,

\displaystyle R(T)\leq R_{1}+R_{2},

where,

	$\displaystyle R_{1}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{(\star)}(t+1))_{\mathcal{F}_{t}}$
		$\displaystyle-V_{\theta(t)}(x(t),\ell(t))$
	$\displaystyle R_{2}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t)}(x_{\theta(t)}(t+1;u(t)),\ell_{\theta(t)}(t+1))_{\mathcal{F}_{t}}$
		$\displaystyle-V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{(\star)}(t+1))_{\mathcal{F}_{t}}.$

Proof:

Consider the Bellman optimality equation at time $t$ when the true system parameter is assumed equal to $\theta(t)$ ,

		$\displaystyle J_{\theta(t)}+V_{\theta(t)}(x(t),\ell(t))=Qx^{2}(t)$
		$\displaystyle+\min_{u\in\mathbb{R}}\left[Ru^{2}+V_{\theta(t)}(x_{\theta(t)}(t+1;u),\ell_{\theta(t)}(t+1))_{\mathcal{F}_{t}}\right]$
		$\displaystyle=Qx^{2}(t)+Ru^{2}(t)+V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{(\star)}(t+1))_{\mathcal{F}_{t}}$
		$\displaystyle+V_{\theta(t)}(x_{\theta(t)}(t+1;u(t)),\ell_{\theta(t)}(t+1))_{\mathcal{F}_{t}}$
		$\displaystyle-V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{(\star)}(t+1))_{\mathcal{F}_{t}}$		(18)

where the second equality follows since the learning rule applies controls by assuming that $\theta(t)$ is the true system parameter. Note that on $\mathcal{E}$ , $J_{\theta(t)}$ serves as a lower bound on the optimal cost $J_{\theta_{(\star)}}$ , so that $\sum\limits_{t=1}^{T}\left(Qx^{2}(t)+Ru^{2}(t)\right)-\sum\limits_{t=1}^{T}J_{\theta(t)}$ serves as an upper bound on $R(T)$ . The proof is completed by re-arranging the terms in (18), and summing them from $t=1$ to $t=T-1$ . ∎

We now bound the terms $R_{1},R_{2}$ on $\mathcal{E}$ .

VII-A Bounding $R_{1}$

We decompose $R_{1}$ as follows, $R_{1}=\mathcal{T}_{1}+\mathcal{T}_{2}$ , where,

	$\displaystyle\mathcal{T}_{1}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t-1)}(x_{\theta_{(\star)}}(t;u(t-1)),\ell_{(\star)}(t))_{\mathcal{F}_{t-1}}$
		$\displaystyle\qquad\qquad-V_{\theta(t)}(x(t),\ell(t)),$
	$\displaystyle\mathcal{T}_{2}:$	$\displaystyle=V_{\theta(T-1)}(x_{\theta_{(\star)}}(T;u(T-1)),\ell_{(\star)}(T))_{\mathcal{F}_{T-1}}$
		$\displaystyle\qquad\qquad-V_{\theta(1)}(x(1),\ell(1)).$

We further decompose $\mathcal{T}_{1}$ as follows,

\displaystyle\mathcal{T}_{1}=\mathcal{T}_{3}+\mathcal{T}_{4},

where,

	$\displaystyle\mathcal{T}_{3}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t-1)}(x_{\theta_{(\star)}}(t;u(t-1)),\ell_{(\star)}(t))_{\mathcal{F}_{t-1}}$
		$\displaystyle\qquad\qquad\qquad\qquad\qquad-V_{\theta(t-1)}(x(t),\ell(t))$
	$\displaystyle\mathcal{T}_{4}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t)}(x(t),\ell(t))-V_{\theta(t-1)}(x(t),\ell(t)).$

Lemma 6

\displaystyle\mathbb{P}\left(\mathcal{T}_{3}>\sqrt{Tg(\delta,T)\log\left(T/\delta\right)}\right)\leq\delta+\mathbb{P}\left(\left[\mathcal{H}\cap\mathcal{J}\right]^{c}\right),

where $g(\delta,T)$ is as in (16).

Proof:

$\mathcal{T}_{3}$ is a martingale, though its increments are not bounded. However, its increments are upper-bounded as $O\left(|x(t)|\right)$ . It follows from Lemma 4 that its increments are upper-bounded as $O\left(g(\delta,T)\right)$ on $\mathcal{H}\cap\mathcal{J}$ . The proof then follows from Proposition 34 of [11]. ∎

Henceforth denote

\displaystyle\mathcal{G}:=\left\{\omega:\mathcal{T}_{3}<\sqrt{Tg(\delta,T)\log\left(T/\delta\right)}\right\}.

We obtain the following bound on $R_{1}$ by combining results of Lemma 6 and Lemma 14.

Lemma 7 (Bounding $R_{1}$ )

Let

	$\displaystyle\mathcal{U}_{1}:$	$\displaystyle=\sqrt{Tg(\delta,T)\log\left(T/\delta\right)}$
		$\displaystyle+2P_{\max}g^{2}(\delta,T)+P_{\max}f(\delta,T)g(\delta,T),$		(19)

where $g(\delta,T),f(\delta,T)$ are as in (16), (29). On $\mathcal{G}\cap\left(\mathcal{H}\cap\mathcal{J}\right)$ we have $R_{1}\leq\mathcal{U}_{1}$ .

VII-B Bounding $R_{2}$

We decompose $R_{2}$ as follows,

\displaystyle R_{2}=\mathcal{T}_{5}+\mathcal{T}_{6}.

(20)

where

	$\displaystyle\mathcal{T}_{5}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t)}(x_{\theta(t)}(t+1;u(t)),\ell_{\theta(t)}(t+1))_{\mathcal{F}_{t}}$
		$\displaystyle\qquad\qquad-V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{\theta(t)}(t+1))_{\mathcal{F}_{t}}$
	$\displaystyle\mathcal{T}_{6}:$	$\displaystyle=\sum_{t=1}^{T-1}V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{\theta(t)}(t+1))_{\mathcal{F}_{t}}$
		$\displaystyle-V_{\theta(t)}(x_{\theta_{(\star)}}(t+1;u(t)),\ell_{(\star)}(t+1))_{\mathcal{F}_{t}}.$

Note that under UCB-NCS, we have that $u(t)=K_{\theta(t)}(\ell(t))x(t)$ . Let

\displaystyle K_{\max}:=\sup_{\theta\in\Theta,\ell\in\{0,1\}}K_{\theta}(\ell),P_{\max}:=\sup_{\theta\in\Theta,\ell\in\{0,1\}}P_{\theta}(\ell).

(21)

After performing simple algebraic manipulations, we can show that

$\displaystyle\mathcal{T}_{5}$	$\displaystyle\leq P_{\max}\sum_{t=1}^{T-1}\left\|\left(A_{\theta(t)}x(t)+B_{\theta(t)}u(t)\right)^{2}\right.$
	$\displaystyle\left.\qquad\qquad\qquad\qquad-\left(A_{(\star)}x(t)+B_{(\star)}u(t)\right)^{2}\right\|$
	$\displaystyle\leq P_{\max}~{}\mathcal{T}_{7}^{1/2}\times\mathcal{T}_{8}^{1/2}$	(22)

where

	$\displaystyle\mathcal{T}_{7}:=\sum_{t=1}^{T-1}\left\|A_{\theta(t)}x(t)-A_{(\star)}x(t)+B_{\theta(t)}u(t)-B_{(\star)}u(t)\right\|^{2},$
	$\displaystyle\mathcal{T}_{8}:=\sum_{t=1}^{T}\left\|A_{\theta(t)}x(t)+B_{\theta(t)}u(t)+A_{(\star)}x(t)+B_{(\star)}u(t)\right\|^{2},$

and the last inequality in (22) follows from the Cauchy-Schwarz inequality. The terms $\mathcal{T}_{7},\mathcal{T}_{8}$ are bounded in Lemma 10 and Lemma 11 in the Appendix. We substitute these bounds in (22) and obtain the following result.

Lemma 8

On $\mathcal{E}\cap\left(\mathcal{H}\cap\mathcal{J}\right)$ , we have

$\displaystyle\mathcal{T}_{5}$	$\displaystyle\leq C_{1}\sqrt{T}\log\left(V_{1}(T)/\lambda\right)\left(\gamma_{1}(\delta,T)+\gamma_{2}(\delta,T)+2\lambda^{1/2}\right)$
	$\displaystyle\times\sqrt{h(\delta,T)}~{}g^{3/2}(\delta,T),\mbox{ where, }$	(23)
$\displaystyle C_{1}:$	$\displaystyle=2\sqrt{2}P_{\max}\left(1+K_{\max}\right)G_{cl,\max}/\lambda.$	(24)

It remains to bound $\mathcal{T}_{6}$ in order to bound $R_{2}$ . This is done in Lemma 12 of the Appendix.

Lemma 9

Let

	$\displaystyle\mathcal{U}_{2}:=C_{1}\sqrt{T}\log\left(V_{1}(T)/\lambda\right)\left(\gamma_{1}(\delta,T)+\gamma_{2}(\delta,T)+2\lambda^{1/2}\right)$
	$\displaystyle\times\sqrt{h(\delta,T)}~{}g^{3/2}(\delta,T)$
	$\displaystyle+P_{\max}\left(G^{2}_{cl,\max}~{}g(\delta,T)+\sigma^{2}\right)\sqrt{\alpha T\log T}.$

On $\mathcal{E}\cap\mathcal{H}\cap\mathcal{J}$ , we have $R_{2}\leq\mathcal{U}_{2}$ .

Proof:

Follows by substituting bounds on $\mathcal{T}_{5}$ and $\mathcal{T}_{6}$ from Lemma 8 and Lemma 12 into (20). ∎

VIII Main Result

Theorem 1 (Bound on Regret)

Consider the NCS operating under UCB-NCS described in Algorithm 1. Under Assumption 1, $R(T)\leq\mathcal{U}_{1}+\mathcal{U}_{2}$ with probability at least $1-\left(7\delta+T^{2}/L^{\alpha}\right)$ . The terms $\mathcal{U}_{1},\mathcal{U}_{2}$ are defined in (19) and Lemma 9, respectively. Upon ignoring terms and factors that are $O(\log T)$ , this bound simplifies to

\displaystyle\sqrt{T}\left(\log^{1/4}(1/\delta)+\sqrt{\alpha}P_{\max}G^{2}_{cl,\max}\log^{1/2}(1/\delta)+C_{1}\right).

Proof:

It follows from Lemma 5 that $R(T)\leq R_{1}+R_{2}$ on $\mathcal{E}$ . Proof then follows by substituting upper-bounds from Lemma 7, Lemma 9, and using union bound to lower-bound the probability of $\mathcal{G}\cap\mathcal{E}\cap\mathcal{H}\cap\mathcal{J}$ . ∎

IX Conclusion and Future Work

We propose UCB-NCS, an adaptive control law, or learning rule, for NCS, and provide finite-time performance guarantees for it. We show that with high probability, its regret scales as $\tilde{O}(\sqrt{T})$ up to constant factors. We identify a certain quantity which we call the margin of stability of the NCS. The regret increases as the margin decreases; a small margin indicates low network quality.

Results in this work can be extended in various directions. So far we considered only scalar systems. A natural extension is to the case of vector systems. Another direction is to derive lower-bounds on expected value of regret that can be achieved under any admissible control policy.

Lemma 10 (Bounding $\mathcal{T}_{7}$ )

On $\mathcal{E}\cap\left(\mathcal{H}\cap\mathcal{J}\right)$ , we have

	$\displaystyle\mathcal{T}_{7}$	$\displaystyle\leq\left(1+K_{\max}\right)^{2}\left(\gamma_{1}(\delta,T)+\gamma_{2}(\delta,T)+2\lambda^{1/2}\right)^{2}$
		$\displaystyle\times 2\left\{2\sqrt{h(\delta,T)}\right\}^{2}\cdot g^{2}(\delta,T)\cdot\frac{1}{\lambda}\log\left(\frac{V_{1}(T)}{\lambda}\right).$

Proof:

Let $\tau$ be the time step at which the latest episode begins. Since under UCB-NCS we have $u(t)=K_{\theta(t)}(\ell(t))x(t)$ , it can be shown that

		$\displaystyle\left\|\left(A_{\theta(t)}x(t)-A_{(\star)}x(t)\right)+\left(B_{\theta(t)}u(t)-B_{(\star)}u(t)\right)\right\|$
		$\displaystyle\leq\left\|\left(A_{\theta(t)}-A_{(\star)}\right)\right\|\left\|x(t)\right\|+\left\|\left(B_{\theta(t)}-B_{(\star)}\right)\right\|K_{\max}\left\|x(t)\right\|.$		(25)

Now consider the following inequality,

\displaystyle\left|\left(A_{\theta(t)}-A_{(\star)}\right)\right|\leq\left|A_{\theta(t)}-\hat{A}(t)\right|+\left|\hat{A}(t)-A_{(\star)}\right|.

(26)

For $\theta=\left(A,B,p\right)\in\mathcal{C}(\tau)$ , we have,

	$\displaystyle\left\|A-\hat{A}(\tau)\right\|\|x(t)\|$	$\displaystyle\leq\sqrt{V_{1}(\tau)}\left\|A-\hat{A}(\tau)\right\|\frac{\|x(t)\|}{\sqrt{V_{1}(t)}}\sqrt{h(\delta,T)}$
		$\displaystyle\leq\left(\gamma_{1}(\delta,\tau)+\lambda^{1/2}\right)\frac{\|x(t)\|}{\sqrt{V_{1}(t)}}\sqrt{h(\delta,T)},$		(27)

where the first inequality follows from Lemma 15, and the second inequality follows from the size of the confidence intervals (5). On $\mathcal{E}$ , we have $\theta_{(\star)}\in\mathcal{C}(\tau)$ , and also $\theta(t)\in\mathcal{C}(\tau)$ ; so we use inequality (27) with $\theta$ set equal to $\theta_{(\star)},\theta(t)$ , and combine the resulting inequalities with (26) in order to obtain the following,

		$\displaystyle\left\|\left(A_{\theta(t)}-A_{(\star)}\right)\right\|\left\|x(t)\right\|$
		$\displaystyle\leq 2\sqrt{h(\delta,T)}\left(\gamma_{1}(\delta,t)+\lambda^{1/2}\right)\frac{\|x(t)\|}{\sqrt{V_{1}(t)}}.$		(28)

A similar bound can be obtained for $\left|\left(B_{\theta(t)}-B_{(\star)}\right)\right|\left|x(t)\right|$ also. The remainder of the proof consists of substituting these bounds in (25) and performing algebraic manipulations. We also utilize Lemma 10 of [6] in order to bound $\sum_{t=1}^{T}\left[|x(t)|^{2}/V_{1}(t)\wedge 1\right],\sum_{t=1}^{T}\left[|x(t)|^{2}/V_{2}(t)\wedge 1\right]$ . ∎

Lemma 11 (Bounding $\mathcal{T}_{8}$ )

On $\mathcal{H}\cap\mathcal{J}$ , we have

\displaystyle\mathcal{T}_{8}\leq G^{2}_{cl,\max}~{}Tg(\delta,T),

where

\displaystyle G_{cl,\max}:=\sup_{\theta\in\Theta,\ell\in\{0,1\}}\left\{|A_{\theta}+B_{\theta}K_{\theta}(\ell)|,|A_{(\star)}+B_{(\star)}K_{\theta}(\ell)|\right\}.

Proof:

Follows from Lemma 4. ∎

Lemma 12

On $\mathcal{E}\cap\left(\mathcal{H}\cap\mathcal{J}\right)$ we have

\displaystyle\mathcal{T}_{6}\leq P_{\max}\left(G^{2}_{cl,\max}g(\delta,T)+\sigma^{2}\right)\sqrt{\alpha T\log T}.

Proof:

We have

	$\displaystyle\mathcal{T}_{6}$	$\displaystyle\leq\sum_{t=1}^{T-1}\|p_{\theta(t)}-p_{(\star)}\|P_{\max}\left(G^{2}_{cl,\max}x^{2}(t)+\sigma^{2}\right)$
		$\displaystyle\leq P_{\max}\left(G^{2}_{cl,\max}\max_{t\in[1,T]}x^{2}(t)+\sigma^{2}\right)\left(\sum_{t=1}^{T-1}\|p_{\theta(t)}-p_{(\star)}\|\right).$

The proof is completed by noting that on $\mathcal{E}$ , we have $|p_{\theta(t)}-p_{(\star)}|\leq\beta_{3}(t)$ , while on $\mathcal{H}\cap\mathcal{J}$ we have $\max_{t\in[1,T]}|x(t)|\leq g(\delta,T)$ . ∎

Lemma 13 (Bounding $N(T)$ )

Define

		$\displaystyle f(\delta,T):=\log\left(1+Tg^{2}(\delta,T)/\lambda\right)$
		$\displaystyle+\log\left(1+TK^{2}_{\max}g^{2}(\delta,T)/\lambda\right)+\log\left(T\right).$		(29)

We have that

\displaystyle N(T)\leq f(\delta,T)\mbox{ on }\mathcal{H}\cap\mathcal{J}.

Proof:

Recall that a new episode starts only when either a) $V_{1}(t)$ or $V_{2}(t)$ doubles, or b) samples used for estimating channel reliability double. Let $N_{1}(T),N_{2}(T)$ denote the number of episodes that began due to doubling of $V_{1}(t),V_{2}(t)$ respectively. Let $N_{3}(T)$ be number of episodes that began due to b). Clearly, $V_{1}(T)\geq 2^{N_{1}(T)}\lambda$ , while on $\mathcal{H}\cap\mathcal{J}$ we have $|x(t)|\leq g(\delta,T)$ (Lemma 4) so that $V_{1}(T)\leq\lambda+Tg^{2}(\delta,T)$ . Combining these, we obtain $N_{1}(T)\leq\log\left(1+Tg^{2}(\delta,T)/\lambda\right)$ . Similarly, $N_{2}(T)\leq\log\left(1+TK^{2}_{\max}g^{2}(\delta,T)/\lambda\right)$ . Also, $N_{3}(T)\leq\log\left(T\right)$ . The proof then follows by noting that $N(T)=N_{1}(T)+N_{2}(T)+N_{3}(T)$ . ∎
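The three logarithmic terms in (29) correspond one-to-one to the three episode triggers counted in the proof. A small sketch for evaluating the bound is given below; the function name `episode_bound` is hypothetical, and the state bound $g(\delta,T)$ is passed in as a number rather than recomputed.

```python
import math


def episode_bound(T, lam, K_max, g):
    """Evaluate f(delta, T) from (29); g is the value of g(delta, T) from (16)."""
    n1 = math.log(1.0 + T * g ** 2 / lam)               # doublings of V_1
    n2 = math.log(1.0 + T * K_max ** 2 * g ** 2 / lam)  # doublings of V_2
    n3 = math.log(T)                                    # doublings of the elapsed time
    return n1 + n2 + n3
```

Since every term is logarithmic in $T$ , the number of policy switches, and hence the contribution of $\mathcal{T}_{4}$ to the regret, is polylogarithmic in the horizon.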

Lemma 14 (Bounding fluctuations within an episode)

We have

	$\displaystyle\mathcal{T}_{4}$	$\displaystyle\leq P_{\max}f(\delta,T)\cdot g(\delta,T),~{}\forall\omega\in\mathcal{H}\cap\mathcal{J},\mbox{ and},$
	$\displaystyle\mathcal{T}_{2}$	$\displaystyle\leq 2P_{\max}g^{2}(\delta,T),~{}\forall\omega\in\mathcal{H}\cap\mathcal{J},$

where $g(\delta,T),f(\delta,T)$ are as in (16), (29).

Proof:

Recall $\mathcal{T}_{4}=\sum\limits_{t=1}^{T-1}V_{\theta(t)}(x(t),\ell(t))-V_{\theta(t-1)}(x(t),\ell(t))$ . Hence, the term in summation corresponding to time $t$ is non-zero only if the UCB estimate $\theta(t)$ changes, i.e., a new episode begins at time $t$ . Thus,

\displaystyle\mathcal{T}_{4}\leq N(T)P_{\max}\max_{t\in[1,T]}|x(t)|^{2}.

The claim then follows by substituting bounds on $N(T)$ and $\max_{t\in[1,T]}|x(t)|$ from Lemma 4 and Lemma 13. ∎

Lemma 15

Define,

\displaystyle h(\delta,T):=\max\left\{1+2L\left(1+\frac{1}{\lambda}\left[\frac{\log^{1/2}\left(T/\delta\right)}{1-\exp(-\eta)}\right]^{2}\right),2\right\}.

(30)

We then have that

\displaystyle\frac{V_{1}(t)}{V_{1}(\tau_{k})}\leq h(\delta,T),~{}\forall t\in\left[\tau_{k},\tau_{k+1}-1\right].

Note that we have suppressed its dependence upon $\eta,L$ in order to simplify the notation.

Proof:

Consider the following cases.

Case a): $t\in[\tau_{k},\tau_{k}+L]$ . We have

	$\displaystyle\frac{V_{1}(t)}{V_{1}(\tau_{k})}-1=\sum\limits_{s=\tau_{k}+1}^{t}x^{2}(s)/V_{1}(\tau_{k})$
	$\displaystyle\leq L\left(\max_{s\in[\tau_{k},\tau_{k}+L]}x(s)\right)^{2}/V_{1}(\tau_{k})$
	$\displaystyle\leq L\left(\|x(\tau_{k})\|+\frac{\log^{1/2}\left(T/\delta\right)}{1-\exp(-\eta)}\right)^{2}/V_{1}(\tau_{k})$
	$\displaystyle\leq 2L\left(\frac{\|x(\tau_{k})\|^{2}}{V_{1}(\tau_{k})}+\frac{1}{V_{1}(\tau_{k})}\left[\frac{\log^{1/2}\left(T/\delta\right)}{1-\exp(-\eta)}\right]^{2}\right)$
	$\displaystyle\leq 2L\left(\frac{\|x(\tau_{k})\|^{2}}{\|x(\tau_{k})\|^{2}}+\frac{1}{\lambda}\left[\frac{\log^{1/2}\left(T/\delta\right)}{1-\exp(-\eta)}\right]^{2}\right).$

Case b): $t\in[\tau_{k}+L+1,\tau_{k+1}-1]$ . In this case we have $\frac{V_{1}(t)}{V_{1}(\tau_{k})}<2$ , since a new episode begins once the ratio becomes greater than or equal to $2$ . ∎

References

[1] R. E. Bellman, Adaptive control processes: a guided tour. Princeton university press, 2015.
[2] P. R. Kumar and P. Varaiya, Stochastic systems: Estimation, identification and adaptive control. Prentice Hall Inc., Englewood Cliffs, 1986.
[3] A. Becker, P. R. Kumar, and C.-Z. Wei, “Adaptive control with the stochastic approximation algorithm: Geometry and convergence,” IEEE Transactions on Automatic Control, vol. 30, no. 4, pp. 330–338, 1985.
[4] H.-F. Chen and L. Guo, “Optimal adaptive control and consistent parameter estimates for armax model with quadratic cost,” SIAM Journal on Control and Optimization, vol. 25, no. 4, pp. 845–867, 1987.
[5] S. Bittanti, M. C. Campi et al., “Adaptive control of linear time invariant systems: the ‘bet on the best’ principle,” Communications in Information & Systems, vol. 6, no. 4, pp. 299–320, 2006.
[6] Y. Abbasi-Yadkori and C. Szepesvári, “Regret bounds for the adaptive control of linear quadratic systems,” in Proceedings of the 24th Annual Conference on Learning Theory, 2011, pp. 1–26.
[7] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in applied mathematics, vol. 6, no. 1, pp. 4–22, 1985.
[8] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine learning, vol. 47, no. 2-3, pp. 235–256, 2002.
[9] O. L. V. Costa, M. D. Fragoso, and R. P. Marques, Discrete-time Markov jump linear systems. Springer Science & Business Media, 2006.
[10] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, “Online least squares estimation with self-normalized processes: An application to bandit problems,” arXiv preprint arXiv:1102.2670, 2011.
[11] T. Tao, V. Vu et al., “Random matrices: universality of local eigenvalue statistics,” Acta mathematica, vol. 206, no. 1, pp. 127–204, 2011.
