
Tracking an Auto-Regressive Process with Limited Communication per Unit Time

Rooji Jinan, Parimal Parag, Himanshu Tyagi

The authors are with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560012, India. Email: {roojijinan, parimal, htyagi}@iisc.ac.in.
Abstract

Samples from a high-dimensional AR[1] process are observed by a sender which can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real-time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to refine estimates of the latest sample and study the following question: Is it better to collect communication of multiple slots to send better refined estimates, making the receiver wait more for every refinement, or to be fast but loose and send new information in every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically for a large dimension. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality, and they typically will have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.

I Introduction

We consider the setting of real-time decision systems based on remotely sensed observations. In this setting, the decision maker needs to track the remote observations with high precision and in a timely manner. These are competing requirements, since high precision tracking will require larger number of bits to be communicated, resulting in larger transmission delay and increased staleness of information. Towards this larger goal, we study the following problem.

Consider a discrete time first-order auto-regressive (AR[1]) process X_{t}\in\mathbb{R}^{n}, t\geq 0. A sensor draws a sample from this process periodically, once every s time-slots. In each of these time-slots, the sensor can send nR bits to a center. The center seeks to form an estimate \hat{X}_{t} of X_{t} at time t, with small mean square error (MSE). Specifically, we are interested in minimizing the time-averaged error \sum_{t=1}^{T}\mathbb{E}\lVert X_{t}-\hat{X}_{t}\rVert_{2}^{2}/T to enable timely and accurate tracking of X_{t}.

We propose and study a successive update scheme where the encoder computes the error in the estimate of the latest sample at the decoder and sends its quantized value to the decoder. The decoder adds this value to its previous estimate to update the estimate of the latest sample, and uses it to estimate the current value using a linear predictor. We instantiate this scheme with a general gain-shape quantizer for error-quantization.

Note that we can send this update several times between two sampling instants. In particular, our interest is in comparing a fast but loose scheme, where an update is sent in every slot, with a slower scheme that sends an update every p communication slots. The latter allows the encoder to use more bits for the update, but the decoder will need to wait longer. We analyze this scheme in a universal setting and show that the fast but loose successive update scheme, used with an appropriately selected quantizer, is asymptotically optimal.

To show this optimality, we use a random construction for the quantizer, based on the spherical code given in [1, 2]. Roughly speaking, this ideal quantizer QQ yields

\mathbb{E}{\lVert y-Q(y)\rVert_{2}^{2}}\leq\lVert y\rVert_{2}^{2}2^{-2R},

for every uniformly bounded y. However, in practice, at finite n, such quantizers need not exist. Most practical vector quantizers have an extra additive error, i.e., the error bound takes the form

\mathbb{E}{\lVert y-Q(y)\rVert_{2}^{2}}\leq\lVert y\rVert_{2}^{2}\theta+n\varepsilon^{2}.

We present our analysis for such general quantizers. Interestingly, for such a quantizer (which is all we have at a finite nn), the optimal choice of pp can differ from 11. Our analysis provides a theoretically sound guideline for choosing the frequency of updates 1/p1/p for practical quantizers.

Our work relates to a large body of literature ranging from real-time compression to control and estimation over networks. The structure of real-time encoders for source coding has been studied in [3, 4, 5, 6, 7, 8, 9]. The general structure of real-time encoders for Markov sources is studied for communication over error-free channels in [3] and over noisy channels in [4, 5]. A similar structural result for the optimal encoders and decoders which are restricted to be causal is given in [6]. Furthermore, structural results in the context of optimal zero-delay coding of correlated sources are available in [7, 9, 8]. The setup in all these works is different from the problem we consider, and the results do not extend to our problem.

The problems of remote estimation under communication constraints of various kinds have been studied in [10, 11, 12, 13, 14, 15]. This line of work proposes several Kalman-like recursive estimation algorithms and evaluates their performances. In a related thread, [16, 17, 18] study remote estimation under communication constraints and other related constraints using tools from dynamic programming and stochastic control. However, in all these works the role of channel delay is slightly different from that in our setting. Furthermore, the specific problem of choice of quantizer we consider has not been looked at. More recently, [19] studied remote estimation of Wiener process for channels with random delays and proves that the optimal sampling policy is a threshold based policy. This work, like some of the works cited above, assumes real-valued transmissions and does not take into account any quantization effects.

In more information theoretic settings, sequential coding for individual sequences under delay constraints was studied in [20, 21, 22, 23]. Closer to our work, the causal (non-anticipatory) rate-distortion function for a stochastic process goes back to early works [24, 25]. Recent works [26, 27, 28] consider the specific case of auto-regressive and Gauss-Markov processes and use the general formula in these early works to establish asymptotic optimality of simpler information structure for the encoders (optimal decoder structure is straightforward). We note that related formulations have been studied for simple settings of two or three iterations in [29, 30] where interesting encoder structures for using previous communication for next sample as well emerge. Although some of these works propose specific optimal schemes as well, the key results in this line of research provide an expression for the rate-distortion function as an optimization problem, solving which will provide guidelines for a concrete scheme. In contrast, motivated by problems of estimation over an erasure channel, [31] provides an asymptotically optimal scheme that roughly uses Gaussian codebooks to quantize the innovation errors between the encoder’s observation and the decoder’s estimation. Our work is closest to [31], but differs in an important aspect from all the works mentioned in this research thread: We take transmission delays into account. In particular, in our formulation the decoder estimation and communication delays are on the same time-scales. The decoder must provide an estimate at every time instant, and a longer codeword will result in a longer delay for the decoder to get complete information.

Nonetheless, our converse bound is derived using similar methods as [32]. Even the achievability part of our proof draws from [32], but there is a technical caveat. Note that after the first round of quantization, the error vector need not be Gaussian, and the analysis in [32] can only be applied after showing closeness of the error vector distribution to a Gaussian in the Wasserstein distance of order 2. While the original proof [32] overlooks this technical point, this gap can be filled using a recent result from [33] if spherical codes are used. But we follow an alternative approach and show a direct analysis using vector quantizers.

In addition, there is a large body of work on control problems over rate-limited communication channels (cf. [34, 35, 36, 37, 38, 39, 40, 41, 42, 43]). This line of work implicitly requires handling of communication delays in construction of estimators. However, the simple formulation we have seems to be missing, and results in this long line of work do not resolve the questions we raise.

Our main contributions in this paper are as follows: We present a heuristically appealing encoder structure which we show to be optimal asymptotically in the dimension of the observation. Specifically, we propose to send successive updates that refine the estimates of the latest sample at the decoder. It is important to note that we quantize the estimation error at the decoder, instead of quantizing the innovation sequence formed at the encoder. Although the optimal MMSE decoder involves taking a conditional expectation, we use a simple decoder with a linear structure. Yet, we show that this decoder is asymptotically optimal. Then, we instantiate this general scheme with spherical codes for quantizers to obtain a universal scheme. In particular, we consider general gain-shape quantizers and develop a framework to analyze their performance. One interesting result we present shows that the tradeoff between the accuracy and frequency of the updates must be carefully balanced based on the “bias” (additive error) in the quantizer used.

We present our problem formulation in the next section. Section III presents a discussion of our achievability scheme, followed by the main results in the subsequent section. Section V provides a detailed analysis of our scheme, which we further build on in Section VI to get our asymptotic achievability results. We prove our converse bound in Section VII, and conclude with a discussion on extensions of our result in the final section.

II Problem Formulation

We begin by providing a formal description of our problem; different components of the model are presented in separate sections. Throughout the remainder of this paper, the set of real numbers is denoted by \mathbb{R}, the n-dimensional Euclidean space is denoted by \mathbb{R}^{n} and the associated Euclidean norm by \lVert\cdot\rVert_{2}, the set of positive integers is denoted by \mathbb{N}, the set of non-negative integers is denoted by \mathbb{Z}_{+}, the set of consecutive positive integers up to m is denoted by [m]\triangleq\left\{1,\dots,m\right\}, and an identity matrix of size n\times n is denoted by I_{n}.

II-A Observed random process and its sampling

For α(0,1)\alpha\in(0,1), we consider a discrete time auto-regressive process of order 1 (AR[1] process) in n\mathbb{R}^{n},

X_{t}=\alpha X_{t-1}+\xi_{t},\quad t\geqslant 1, (1)

where (\xi_{t}\in\mathbb{R}^{n},t\geqslant 1) is an independent and identically distributed (i.i.d.) random sequence with zero mean and covariance matrix \sigma^{2}(1-\alpha^{2})I_{n}. For simplicity, we assume that X_{0}\in\mathbb{R}^{n} is a zero-mean random vector with covariance matrix \sigma^{2}I_{n}. This implies that the covariance matrix of X_{t} is \sigma^{2}I_{n} for all t\geqslant 0. In addition, we assume that \lVert X_{t}\rVert_{2} has a bounded fourth moment at all times t\geqslant 0. Specifically, let \kappa>0 satisfy

\sup_{k\in\mathbb{Z}_{+}}\frac{1}{n}\sqrt{\mathbb{E}\lVert X_{k}\rVert_{2}^{4}}\leqslant\kappa.

It is clear that X=(Xtn,t0)X=(X_{t}\in\mathbb{R}^{n},t\geqslant 0) is a Markov process. We denote the set of processes XX satisfying the assumptions above by 𝕏n\mathbb{X}_{n} and the class of all such processes for different choices of dimension nn as 𝕏\mathbb{X}.

This discrete time process is sub-sampled periodically at sampling frequency 1/s1/s, for some ss\in\mathbb{N}, to obtain samples (Xksn,k0)(X_{ks}\in\mathbb{R}^{n},k\geqslant 0).
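For concreteness, the following minimal sketch simulates this sampling model (we assume Gaussian innovations purely for illustration; the model only requires zero mean and the stated covariances, and the names used here are ours, not from the paper):

```python
import numpy as np

def simulate_ar1(n, T, alpha, sigma, rng):
    """Simulate X_t = alpha * X_{t-1} + xi_t in R^n for t = 0, ..., T-1."""
    X = np.zeros((T, n))
    X[0] = rng.normal(0.0, sigma, size=n)          # X_0: zero mean, covariance sigma^2 I_n
    xi_std = sigma * np.sqrt(1.0 - alpha ** 2)     # innovation covariance sigma^2 (1 - alpha^2) I_n
    for t in range(1, T):
        X[t] = alpha * X[t - 1] + rng.normal(0.0, xi_std, size=n)
    return X

rng = np.random.default_rng(0)
X = simulate_ar1(n=128, T=60, alpha=0.9, sigma=1.0, rng=rng)
samples = X[::5]    # sub-sample once every s = 5 time-slots: X_0, X_s, X_{2s}, ...
```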

II-B Encoder description

The sampled process (Xks,k0)(X_{ks},k\geqslant 0) is passed to an encoder which converts it to a bit stream. The encoder operates in real-time and sends nRsnRs bits between any two sampling instants. Specifically, the encoder is given by a sequence of mappings (ϕt)t0(\phi_{t})_{t\geq 0}, where the mapping at any discrete time t=kst=ks is denoted by

\phi_{t}:\mathbb{R}^{n(k+1)}\to\left\{0,1\right\}^{nRs}.

The encoder output at time t=kst=ks is denoted by the codeword Ctϕt(X0,Xs,,Xks)C_{t}\triangleq\phi_{t}(X_{0},X_{s},\dots,X_{ks}). We represent this codeword by an ss-length sequence of binary strings Ct=(Ct,0,,Ct,s1)C_{t}=(C_{t,0},\dots,C_{t,s-1}), where each term Ct,iC_{t,i} takes values in {0,1}nR\left\{0,1\right\}^{nR}. For t=kst=ks and 0is10\leq i\leq s-1, we can view the binary string Ct,iC_{t,i} as the communication sent at time t+it+i. We elaborate on the communication channel next.

II-C Communication channel

The output bit-stream of the encoder is sent to the receiver via an error-free communication channel. Specifically, we assume slotted transmission with synchronization where in each slot the transmitter sends nRnR bits of communication error-free. That is, we are allowed to send RR bits per dimension, per time slot. Note that there is a delay of 11 time-unit (corresponding to one slot) in transmission of each nRnR bits. Therefore, the vector Cks,iC_{ks,i} of nRnR bits transmitted at time ks+iks+i is received at time instant ks+i+1ks+i+1 for 0is10\leqslant i\leqslant s-1. Throughout we use the notation Ik{ks,,(k+1)s1}I_{k}\triangleq\left\{ks,\dots,(k+1)s-1\right\} and I~k=Ik+1={ks+1,,(k+1)s}\tilde{I}_{k}=I_{k}+1=\left\{ks+1,\dots,(k+1)s\right\}, respectively, for the set of transmit and receive times for the strings Cks,iC_{ks,i}, 0is10\leq i\leq s-1.

II-D Decoder description

We describe the operation of the receiver at time tIkt\in I_{k}, for some kk\in\mathbb{N}, such that i=tks{0,,s1}i=t-ks\in\left\{0,\dots,s-1\right\}. Upon receiving the codewords Cs,C2s,,C(k1)sC_{s},C_{2s},...,C_{(k-1)s} and the partial codeword (Cks,0,,Cks,i1)(C_{ks,0},...,C_{ks,i-1}) at time t=ks+it=ks+i, the decoder estimates the current-state XtX_{t} of the process using the estimator mapping

\psi_{t}:\{0,1\}^{nRt}\to\mathbb{R}^{n}.

We denote the overall communication received by the decoder until time instant t by C^{t-1}. (The time index t-1 in C^{t-1} corresponds to the transmission time of the codewords, whereby the communication received till time t is denoted by C^{t-1}.) Further, we denote by \hat{X}_{t|t} the real-time causal estimate \psi_{t}(C^{t-1}) of X_{t} formed at the decoder at time t. Thus, the overall real-time causal estimation scheme is described by the mappings (\phi_{t},\psi_{t})_{t\geqslant 0}. It is important to note that the communication available to the decoder at time t\in I_{k} can only depend on samples X_{\ell} up to time \ell\leqslant ks. As a convention, we assume that \hat{X}_{0|0}=0.

II-E Performance metrics

We call the encoder-decoder mapping sequence (ϕ,ψ)=(ϕt,ψt)t0(\phi,\psi)=(\phi_{t},\psi_{t})_{t\geqslant 0} a tracking code of rate RR and sampling period ss. The tracking error of our tracking code at time tt for process XX is measured by the mean squared error (MSE) per dimension given by

D_{t}(\phi,\psi,X)\triangleq\frac{1}{n}\mathbb{E}\lVert X_{t}-\hat{X}_{t|t}\rVert_{2}^{2}.

Our goal is to design (ϕ,ψ)(\phi,\psi) with low average tracking error D¯T(ϕ,ψ,X)\overline{D}_{T}(\phi,\psi,X) given by

\overline{D}_{T}(\phi,\psi,X)\triangleq\frac{1}{T}\sum_{t=0}^{T-1}D_{t}(\phi,\psi,X).

For technical reasons, we restrict to a finite time horizon setting. For the most part, the time horizon TT will remain fixed and will be omitted from the notation. Instead of working with the mean-square error, a more convenient parameterization for us will be that of accuracy, given by

\delta^{T}(\phi,\psi,X)=1-\frac{\overline{D}_{T}(\phi,\psi,X)}{\sigma^{2}}.
Definition 1 (Maxmin tracking accuracy).

The worst-case tracking accuracy for 𝕏n\mathbb{X}_{n} attained by a tracking code (ϕ,ψ)(\phi,\psi) is given by

\delta^{T}(\phi,\psi,\mathbb{X}_{n})=\inf_{X\in\mathbb{X}_{n}}\delta^{T}(\phi,\psi,X).

The maxmin tracking accuracy for 𝕏n\mathbb{X}_{n} at rate RR and sampling period ss is given by

\delta_{n}^{T}(R,s,\mathbb{X}_{n})=\sup_{(\phi,\psi)}\delta^{T}(\phi,\psi,\mathbb{X}_{n}),

where the supremum is over all tracking codes (ϕ,ψ)(\phi,\psi).

The maxmin tracking accuracy δnT(R,s,𝕏n)\delta_{n}^{T}(R,s,\mathbb{X}_{n}) is the fundamental quantity of interest for us. Recall that nn denotes the dimension of the observations in XtX_{t} for X𝕏nX\in\mathbb{X}_{n} and TT the time horizon. However, we will only characterize δnT(R,s,𝕏n)\delta_{n}^{T}(R,s,\mathbb{X}_{n}) asymptotically in nn and TT. Specifically, we define the asymptotic maxmin tracking accuracy as

\delta^{*}(R,s,\mathbb{X})=\limsup_{T\to\infty}\limsup_{n\to\infty}\delta_{n}^{T}(R,s,\mathbb{X}_{n}).

We will provide a characterization of δ(R,s,𝕏)\delta^{*}(R,s,\mathbb{X}) and present a sequence of tracking codes that attains it. In fact, the tracking code we use is an instantiation of our successive update scheme, which we describe in the next section. It is important to note that our results may not hold if we switch the order of limits above: We need very large codeword lengths depending on a fixed finite time horizon TT.

III The Successive Update Scheme

In this section, we present our main contribution in this paper, namely the Successive Update tracking code. Before we describe the scheme completely, we present its different components. In every communication slot, the transmitter gets an opportunity to send nRnR bits. The transmitter may use it to send any information about a previously seen sample. There are various options for the encoder. For instance, it may use the current slot to send some information about a sample it had seen earlier. Or it may use all the slots between two sampling instants to send a quantized version of the latest sample. Interestingly, it will be seen (quite straightforwardly) that there are not so many options for the decoder; it gets roughly fixed once the encoder is chosen.

III-A Decoder structure

Once the quantized information is sent by the transmitter, at the receiver end, the decoder estimates the state X_{t} using the codewords received until time t. Since we are interested in forming estimates with small MSE, the decoder simply forms the minimum mean square error (MMSE) estimate using all the observations till that point. Specifically, for t\geq u, denoting by \tilde{X}_{u|t} the MMSE estimate of X_{u} formed using the communication C^{t-1} received before time t, we know (cf. [44])

\tilde{X}_{u|t}=\mathbb{E}[X_{u}|C^{t-1}].

The following result presents a simple structure for X~u|t\tilde{X}_{u|t} for our AR[1] model.

Lemma 1 (MMSE Structure).

The MMSE estimates X~t|t\tilde{X}_{t|t} and X~ti|t\tilde{X}_{t-i|t}, respectively, of samples XtX_{t} and XtiX_{t-i} at any time tIkt\in I_{k} and i=tksi=t-ks using communication Ct1C^{t-1} are related as

\tilde{X}_{t|t}=\alpha^{i}\tilde{X}_{t-i|t}=\alpha^{i}\mathbb{E}[X_{ks}|C^{t-1}].
Proof:

Recalling the notation IkI_{k}, we can represent tIkt\in I_{k} as t=ks+it=ks+i for 0is10\leq i\leq s-1. From the evolution of the AR[1] process, for 1is11\leq i\leq s-1, the sample Xks+iX_{ks+i} can be expressed in terms of the previous sample XksX_{ks} as

X_{ks+i}=\alpha^{i}X_{ks}+\sum_{j=1}^{i}\alpha^{i-j}\xi_{ks+j}, (2)

where the innovation sequence (ξks+j:j1)(\xi_{ks+j}:j\geqslant 1) is independent of process samples (X0,,Xks)(X_{0},\dots,X_{ks}). By our specification, the historical observations Ct1C^{t-1} at the receiver depend only on the process evolution until time ksks, namely Ct1C^{t-1} is independent of (ξks+j:j1)(\xi_{ks+j}:j\geqslant 1) conditioned on (X0,,Xks)(X_{0},\dots,X_{ks}). In particular, 𝔼[ξks+j|Ct1]=0\mathbb{E}[\xi_{ks+j}|C^{t-1}]=0 for every j1j\geq 1. Thus, taking conditional expectation on both sides of (2), we get

\tilde{X}_{ks+i|ks+i}=\mathbb{E}[X_{ks+i}|C^{t-1}]=\alpha^{i}\mathbb{E}[X_{ks}|C^{t-1}]=\alpha^{i}\tilde{X}_{ks|t},

since X~ks|t=𝔼[Xks|Ct1]\tilde{X}_{ks|t}=\mathbb{E}[X_{ks}|C^{t-1}]. ∎

Therefore, the optimal strategy for the decoder is to use the communication sent to form an estimate for the latest sample and then scale it to form the estimate of the state at the current time instant.

III-B Encoder structure: Refining the error successively

The structure of the decoder exposed in Lemma 1 gives an important insight for encoder design: The communication sent between two sampling instants is used only to form estimates of the latest sample. In particular, the communication C_{ks,i} transmitted at time t=ks+i must be chosen to refine the previous estimate from \mathbb{E}[X_{ks}|C_{0},\dots,C_{(k-1)s},C_{ks,0},\dots,C_{ks,i-1}] to \mathbb{E}[X_{ks}|C_{0},\dots,C_{(k-1)s},C_{ks,0},\dots,C_{ks,i-1},C_{ks,i}]. This principle can be applied (as a heuristic) for any other form of the estimate as follows. Let \hat{X}_{ks|t} denote the estimate for X_{ks} formed at the receiver at time t (which need not be the MMSE estimate \tilde{X}_{ks|t}). Our encoder computes the error in the receiver estimate of the last process sample at each time instant t. Denoting the error at time t\in I_{k} by Y_{t}\triangleq X_{ks}-\hat{X}_{ks|t}, the encoder quantizes this error Y_{t} and sends it as the communication C_{ks,i}.

Simply speaking, our encoder computes and quantizes the error in the current estimate of the last sample at the decoder, and sends it to the decoder to enable the refinement of the estimate in the next time slot. While we have not been able to establish optimality of this encoder structure, our results will show its optimality asymptotically, in the limit as the dimension nn goes to infinity.

Even within this structural simplification, a very interesting question remains. Since the process is sampled once in ss time slots, we have, potentially, nRsnRs bits to encode the latest sample. At any time tI~kt\in\tilde{I}_{k}, the receiver has access to (C0,,C(k1)s)(C_{0},\dots,C_{(k-1)s}) and the partial codewords (Cks,0,,Cks,i1)(C_{ks,0},\dots,C_{ks,i-1}) for i=tksi=t-ks. A simple approach for the encoder is to use the complete codeword to express the latest sample and the decoder can ignore the partial codewords. This approach will result in slow but very accurate updates of the sample estimates. An alternative fast but loose approach will send nRnR quantizer codewords to refine estimates in every communication slot. Should we prefer fast but loose estimates or slow but accurate ones? Our results will shed light on this conundrum.

III-C The choice of quantizers

In our description of the encoder structure above, we did not specify a key design choice, namely the choice of the quantizer. We will restrict to using the same quantizer to quantize the error in each round of communication. The precision of this quantizer will depend on whether we choose a fast but loose paradigm or a slow but accurate one. However, the overall structure will remain the same. Roughly speaking, we allow any gain-shape [45] quantizer which separately sends the quantized value of the gain y2\lVert y\rVert_{2} and the shape y/y2y/\lVert y\rVert_{2} for input yy. Formally, we use the following abstraction.

Definition 2 ((θ,ε)(\theta,\varepsilon)-quantizer family).

Fix 0<M<0<M<\infty. For 0θ10\leq\theta\leq 1 and 0ε0\leq\varepsilon, a quantizer QQ with dynamic range MM specified by a mapping Q:n{0,1}nRQ:\mathbb{R}^{n}\to\{0,1\}^{nR} constitutes an nRnR bit (θ,ε)(\theta,\varepsilon)-quantizer if for every vector yny\in\mathbb{R}^{n} such that y22nM2\lVert y\rVert_{2}^{2}\leqslant nM^{2}, we have

\mathbb{E}\lVert y-Q(y)\rVert_{2}^{2}\leqslant\lVert y\rVert_{2}^{2}\theta(R)+n\varepsilon^{2}.

Further, for a mapping θ:+[0,1]\theta:\mathbb{R}_{+}\to[0,1], which is a decreasing function of rate RR, a family of quantizers Q={QR:R>0}Q=\{Q_{R}:R>0\} constitutes an (θ,ε)(\theta,\varepsilon)-quantizer family if for every RR the quantizer QRQ_{R} constitutes an nRnR bit (θ(R),ε)(\theta(R),\varepsilon)-quantizer.

The expectation in the previous definition is taken with respect to the randomness in the quantizer, which is assumed to be shared between the encoder and the decoder for simplicity. The parameter M, termed the dynamic range of the quantizer, specifies the domain of the quantizer. When the input y does not satisfy \lVert y\rVert_{2}\leq\sqrt{n}M, the quantizer simply declares a failure, which we denote by \bot. Our tracking code may use any such (\theta,\varepsilon)-quantizer family. It is typical for any construction of a gain-shape quantizer to have a finite M and \varepsilon>0. Our analysis for finite n will apply to any such (\theta,\varepsilon)-quantizer family and, in particular, will bring out the role of the “bias” \varepsilon. However, when establishing our optimality result, we instantiate it using a random spherical code to get the desired performance.
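As a concrete (if crude) instance of Definition 2, the sketch below implements a coordinate-wise uniform quantizer, which is analyzed as Example 2 in Section IV; it has \theta=0 and picks up only the additive term \varepsilon^{2}=nM^{2}2^{-2R}. The code is only an illustration of the interface we assume: a map that either returns a reconstruction vector or declares \bot.

```python
import numpy as np

def uniform_quantizer(y, R, M):
    """Coordinate-wise uniform quantizer on [-M*sqrt(n), M*sqrt(n)] with R bits per coordinate.

    Acts as a (theta, eps)-quantizer with theta = 0 and eps^2 = n * M^2 * 2^(-2R);
    returns None (the failure symbol) when the input is outside the dynamic range.
    """
    n = y.shape[0]
    lim = M * np.sqrt(n)
    if np.linalg.norm(y) > lim:
        return None                       # quantizer failure, denoted by ⊥ in the text
    levels = 2 ** int(R)
    step = 2.0 * lim / levels
    idx = np.clip(np.floor((y + lim) / step), 0, levels - 1)
    return -lim + (idx + 0.5) * step      # mid-point reconstruction of each coordinate
```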

III-D Description of the successive update scheme

All the conceptual components of our scheme are ready. We use the structure of Lemma 1 and focus only on updating the estimates of the latest observed sample XksX_{ks} at the decoder. Our encoder successively updates the estimate of the latest sample at the decoder by quantizing and sending estimates for errors YtY_{t}.

As discussed earlier, we must decide if we prefer a fast but loose approach or a slow but accurate approach for sending error estimates. To carefully examine this tradeoff, we opt for a more general scheme where the nRs bits available between two samples are divided into m=s/p sub-fragments of nRp bits each. We use an nRp bit quantizer to refine error estimates for the latest sample X_{ks} (obtained at time t=ks) every p slots, and send the resulting quantizer codewords as partial tracking codewords (C_{ks,jp},\dots,C_{ks,(j+1)p-1}), 0\leqslant j\leq m-1. Specifically, the kth codeword transmission interval I_{k} is divided into m sub-fragments I_{k,j}, 0\leq j\leq m-1, given by

I_{k,j}\triangleq\left\{ks+jp,\dots,ks+(j+1)p-1\right\},\quad 0\leq j\leq m-1,

and (Cks,jp,,Cks,(j+1)p1)(C_{ks,jp},...,C_{ks,(j+1)p-1}) is transmitted in communication slots in Ik,jI_{k,j}.

By time instant t=ks+(j+1)p, the decoder has received the jth sub-fragment (C_{ks,t-ks}, t\in I_{k,j}) of nRp bits, and it uses this sub-fragment to refine the estimate of the latest source sample X_{ks}. Note that the fast but loose and the slow but accurate regimes described above correspond to p=1 and p=s, respectively. In the middle of the interval I_{k,j}, the decoder ignores the partially received quantizer codeword and retains the estimate \hat{X}_{ks} of X_{ks} formed at time ks+jp. It forms an estimate of the current state X_{ks+i} by simply scaling \hat{X}_{ks} by a factor of \alpha^{i}, as suggested by Lemma 1.

Finally, we impose one more additional simplification on the decoder structure. Instead of using MMSE estimates for the latest sample, we simply update the estimate by adding to it the quantized value of the error. Thus, the decoder has a simple linear structure.

We can use any nRp bit quantizer Q_{p} for the n-dimensional error vector (with a slight abuse of notation, we write Q_{p} instead of Q_{Rp} for an nRp bit quantizer), whereby this scheme can be easily implemented in practice if Q_{p} can be implemented. For instance, we can use any standard gain-shape quantizer. The performance of most quantizers can be analyzed explicitly to render them a (\theta,\varepsilon)-quantizer family for an appropriate M and function \theta. Later, when analyzing the scheme, we will consider a Q_{p} coming from a (\theta,\varepsilon)-quantizer family and present a theoretically sound guideline for choosing p.

Recall that we denote the estimate of X_{u} formed at the decoder at time t\geqslant u by \hat{X}_{u|t}. We start by initializing \hat{X}_{0|0}=0 and then proceed using the encoder and the decoder algorithms outlined above. Note that our quantizer Q_{p} may declare the failure symbol \bot, in which case the decoder must still yield a nominal estimate. We will simply declare the estimate as 0 once a failure happens. (In the analysis, we account for all such events as error; only the probability of failure determines the contribution of this part to the MSE since the process is mean-square bounded.)

We give a formal description of our encoder and decoder algorithms below.

The encoder.

  1. 1

    Initialize k=0k=0, j=0j=0, X^0|0=0\hat{X}_{0|0}=0.

  2. 2

    At time t=ks+jpt=ks+jp, use the decoder algorithm (to be described below) to form the estimate X^ks|t\hat{X}_{ks|t} and compute the error

    Y_{k,j}\triangleq X_{ks}-\hat{X}_{ks|t}, (3)

    where we use the latest sample XksX_{ks} available at time t=ks+jpt=ks+jp.

  3. 3

    Quantize Yk,jY_{k,j} to nRpnRp bit as Qp(Yk,j)Q_{p}(Y_{k,j}).

  4. 4

    If quantization failure occurs, i.e., Q_{p}(Y_{k,j})=\bot, send \bot to the receiver and terminate the encoder.

  5. 5

    Else, send a binary representation of Q_{p}(Y_{k,j}) as the communication (C_{ks,jp},\dots,C_{ks,(j+1)p-1}) to the receiver over the next p communication slots. (For simplicity, we do not account for the extra message symbol needed for sending \bot.)

  6. 6

    If j<m1j<m-1, increase jj by 11; else set j=0j=0 and increase kk by 11. Go to Step 2.

The decoder.

  1. 1

    Initialize k=0k=0, j=0j=0, X^0|0=0\hat{X}_{0|0}=0.

  2. 2

    At time t=ks+jpt=ks+jp, if encoding failure has not occurred until time tt, compute

    \hat{X}_{ks|ks+jp}=\hat{X}_{ks|ks+(j-1)p}+Q_{p}(Y_{k,j-1}),

    and output X^t|t=αtksX^ks|t\hat{X}_{t|t}=\alpha^{t-ks}\hat{X}_{ks|t}.

  3. 3

    Else, if encoding failure has occurred and the \bot symbol is received, declare \hat{X}_{u|u}=0 for all subsequent time instants u\geqslant t.

  4. 4

    At time t=ks+jp+i, for i\in[p-1], output \hat{X}_{t|t}=\alpha^{t-ks}\hat{X}_{ks|ks+jp}. (We ignore the partial quantizer codewords (C_{ks,jp+1},C_{ks,jp+2},\dots,C_{ks,jp+i-1}) received till time t.)

  5. 5

    If j<m1j<m-1, increase jj by 11; else set j=0j=0 and increase kk by 11. Go to Step 2.
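The listing below is a compact simulation of the p-SU loop above (a sketch, not the paper's exact pseudocode): `quantize` stands for any (\theta,\varepsilon)-quantizer Q_{p} returning either a reconstruction or None for \bot, and the estimate of a fresh sample is initialized by the linear prediction \alpha^{s}\hat{X}_{(k-1)s|ks}, which is our reading of the interval boundary implicit in Lemma 6.

```python
import numpy as np

def run_p_su(X, s, p, alpha, quantize):
    """Simulate the p-successive-update scheme on a trajectory X of shape (T, n)."""
    T, n = X.shape
    K, m = T // s, s // p
    X_hat = np.zeros((T, n))
    est = np.zeros(n)                   # decoder's estimate of the latest sample X_{ks}
    failed = False
    for k in range(K):
        sample = X[k * s]               # new sample observed by the encoder
        est = (alpha ** s) * est        # predict X_{ks} from the previous sample's estimate
        for j in range(m):
            for i in range(p):          # slots during which fragment j is still in flight
                t = k * s + j * p + i
                X_hat[t] = np.zeros(n) if failed else (alpha ** (j * p + i)) * est
            if failed:
                continue
            q = quantize(sample - est)  # encoder quantizes the error Y_{k,j}
            if q is None:               # ⊥ declared: decoder outputs 0 from this point on
                failed = True
            else:
                est = est + q           # decoder refinement once the fragment is received
    return X_hat
```

For instance, `run_p_su(X, s=5, p=1, alpha=0.9, quantize=...)` with the uniform quantizer sketched earlier reproduces the fast but loose variant.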

IV Main Results

We present results in two categories. First, we provide an explicit formula for the asymptotic maxmin tracking accuracy δ(R,s,𝕏)\delta^{*}(R,s,\mathbb{X}). Next, we present a theoretically-founded guideline for selecting a good pp for the successive update scheme with a (θ,ε)(\theta,\varepsilon)-quantizer family. Interestingly, the optimal choice may differ from the asymptotically optimal choice of p=1p=1.

IV-A Characterization of the maxmin tracking accuracy

To describe our result, we define functions \delta_{0}:\mathbb{R}_{+}\to[0,1] and g:\mathbb{R}_{+}\to[0,1] as

\delta_{0}(R)\triangleq\frac{\alpha^{2}(1-2^{-2R})}{(1-\alpha^{2}2^{-2R})},\text{ for all }R>0;
g(s)\triangleq\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})},\text{ for all }s>0. (4)

Note that g(s)g(s) is a decreasing function of ss with g(1)=1g(1)=1. The result below shows that, for an appropriate choice of the quantizer, our successive update scheme with p=1p=1 (the fast but loose version) achieves an accuracy of δ0(R)g(s)\delta_{0}(R)g(s) asymptotically, universally for all processes in 𝕏\mathbb{X}.

Theorem 2 (Lower bound for maxmin tracking accuracy: The achievability).

For R>0 and s\in\mathbb{N}, the asymptotic maxmin tracking accuracy is bounded below as

\delta^{*}(R,s,\mathbb{X})\geq\delta_{0}(R)g(s).

Furthermore, this bound can be obtained by a successive update scheme with p=1p=1 and appropriately chosen quantizer QpQ_{p}.

We provide a proof in Section VI. Note that while we assume that the per dimension fourth moment of the processes in 𝕏\mathbb{X} is bounded, the asymptotic result above does not depend on that bound. Interestingly, the performance characterized above is the best possible.

Theorem 3 (Upper bound for maxmin tracking accuracy: The converse).

For R>0 and s\in\mathbb{N}, the asymptotic maxmin tracking accuracy is bounded above as

\delta^{*}(R,s,\mathbb{X})\leq\delta_{0}(R)g(s).

Furthermore, the upper bound is obtained by considering a Gauss-Markov process.

We provide a proof in Section VII. Thus, δ(R,s,𝕏)=δ0(R)g(s)\delta^{*}(R,s,\mathbb{X})=\delta_{0}(R)g(s) with the fast but loose successive update scheme being universally (asymptotically) optimal and the Gauss-Markov process being the most difficult process to track. Clearly, the best possible choice of sampling period is s=1s=1 and the highest possible accuracy at rate RR is δ0(R)\delta_{0}(R), whereby we cannot hope for an accuracy exceeding δ0(R)\delta_{0}(R).

Alternatively, the results above can be interpreted as saying that we cannot subsample at a frequency less than 1/g1(δ/δ0(R))1/\lfloor g^{-1}(\delta/\delta_{0}(R))\rfloor for attaining a tracking accuracy δδ0(R)\delta\leqslant\delta_{0}(R).
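Both \delta_{0}(R) and g(s) are explicit, so the characterization is easy to evaluate numerically; a small sketch (the helper names are ours):

```python
def delta0(R, alpha):
    """delta_0(R) = alpha^2 (1 - 2^{-2R}) / (1 - alpha^2 2^{-2R})."""
    r = 2.0 ** (-2.0 * R)
    return alpha ** 2 * (1.0 - r) / (1.0 - alpha ** 2 * r)

def g(s, alpha):
    """g(s) = (1 - alpha^{2s}) / (s (1 - alpha^2)); decreasing in s with g(1) = 1."""
    return (1.0 - alpha ** (2 * s)) / (s * (1.0 - alpha ** 2))

def slowest_sampling(delta, R, alpha, s_max=100_000):
    """Largest sampling period s with delta_0(R) g(s) >= delta (None if delta > delta_0(R))."""
    best = None
    for s in range(1, s_max + 1):
        if delta0(R, alpha) * g(s, alpha) >= delta:
            best = s
        else:
            break                        # the product is decreasing in s
    return best

alpha, R = 0.9, 1.0
print(delta0(R, alpha) * g(2, alpha))    # asymptotic maxmin accuracy at rate R, period s = 2
print(slowest_sampling(0.5, R, alpha))   # slowest sampling that still attains accuracy 0.5
```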

IV-B Guidelines for choosing a good pp

The proof of Theorem 2 entails the analysis of the successive update scheme for p=1p=1. In fact, we can analyze this scheme for any pp\in\mathbb{N} and for any (θ,ε)(\theta,\varepsilon)-quantizer family; we term this tracking code the pp-successive update (pp-SU) scheme. This analysis can provide a simple guideline for the optimal choice of pp depending on the performance of the quantizer.

However, there are some technical caveats. A quantizer family will operate only as long as the input y satisfies \lVert y\rVert_{2}\leq\sqrt{n}M. If a y outside this range is observed, the quantizer will declare \bot and the tracking code encoder, in turn, will declare a failure. We denote by \tau the stopping time at which encoder failure occurs for the first time, i.e.,

\tau\triangleq\min\{ks+jp:Q_{p}(Y_{k,j})=\bot,0\leq k,0\leqslant j\leqslant m-1\}.

Further, denote by A_{t} the event that failure does not occur until time t, i.e.,

A_{t}\triangleq\{\tau>t\}.

We characterize the performance of a pp-SU in terms of the probability of encoder failure in a finite time horizon TT.

Theorem 4 (Performance of pp-SU).

For fixed θ,ε,β[0,1]\theta,\varepsilon,\beta\in[0,1], consider the pp-SU scheme with an nRpnRp bit (θ,ε)(\theta,\varepsilon)-quantizer QpQ_{p}, and denote the corresponding tracking code by (ϕp,ψp)(\phi_{p},\psi_{p}). Suppose that for a time horizon TT\in\mathbb{N}, the tracking code (ϕp,ψp)(\phi_{p},\psi_{p}) satisfies (τT)β\mathbb{P}\left(\tau\leq T\right)\leq\beta. Then,

\sup_{X\in\mathbb{X}_{n}}\overline{D}_{T}(\phi_{p},\psi_{p},X)\leq B_{T}(\theta,\varepsilon,\beta),

where BT(θ,ε,β)B_{T}(\theta,\varepsilon,\beta) satisfies

\displaystyle\limsup_{T\to\infty}B_{T}(\theta,\varepsilon,\beta)\leq\sigma^{2}\Big{[}1-\frac{g(s)\,\alpha^{2p}}{1-\alpha^{2p}\,\theta}\Big{(}1-\frac{\varepsilon^{2}}{\sigma^{2}}-\theta\Big{)}\Big{]}+\frac{\kappa\beta g(s)}{(1-\alpha^{2s})}\Big{(}1-\alpha^{2(s+p)}\frac{(1-\theta)}{1-\alpha^{2p}\theta}\Big{)}.

We remark that β\beta can be made small by choosing MM to be large for a quantizer family. Furthermore, the inequality in the upper bound for the MSE in the previous result (barring the dependence on β\beta) comes from the inequality in the definition of a (θ,ε)(\theta,\varepsilon)-quantizer, rendering it a good proxy for the performance of the quantizer. The interesting regime is that of very small β\beta where the encoder failure doesn’t occur during the time horizon of operation. If we ignore the dependence on β\beta, the accuracy of the pp-SU does not depend either on ss or on the bound for the fourth moment κ\kappa. Motivated by these insights, we define the accuracy-speed curve of a quantizer family as follows.

Definition 3 (The accuracy-speed curve).

For α[0,1]\alpha\in[0,1], σ2\sigma^{2}, and R>0R>0, the accuracy-speed curve for a (θ,ε)(\theta,\varepsilon)-quantizer family QQ is given by

\Gamma_{Q}(p)=\frac{\alpha^{2p}}{1-\alpha^{2p}\,\theta(Rp)}\left(1-\frac{\varepsilon^{2}}{\sigma^{2}}-\theta(Rp)\right),\quad p>0.

By Theorem 4, it is easy to see that the accuracy of a p-SU scheme (more precisely, the lower bound on its accuracy implied by the theorem) is better when \Gamma_{Q}(p) is larger. Thus, a good choice of p for a given quantizer family Q is the one that maximizes \Gamma_{Q}(p) for 1\leq p\leq s.
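In code, this guideline is a one-dimensional search over p\in[s]; a sketch with the quantizer family passed in as a function (names and interface are ours):

```python
def gamma_q(p, R, alpha, sigma2, theta, eps2):
    """Accuracy-speed curve Gamma_Q(p) of Definition 3 for a (theta, eps)-quantizer family."""
    th = theta(R * p)
    return alpha ** (2 * p) / (1.0 - alpha ** (2 * p) * th) * (1.0 - eps2 / sigma2 - th)

def best_update_period(s, R, alpha, sigma2, theta, eps2):
    """Return the p in {1, ..., s} maximizing Gamma_Q(p)."""
    return max(range(1, s + 1), key=lambda p: gamma_q(p, R, alpha, sigma2, theta, eps2))

# Ideal family of Example 1 below: theta(R) = 2^{-2R}, eps = 0, for which the best p is 1.
print(best_update_period(s=8, R=1.0, alpha=0.9, sigma2=1.0,
                         theta=lambda r: 2.0 ** (-2.0 * r), eps2=0.0))
```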

We conclude by providing accuracy-speed curves for some illustrative examples. To build some heuristics, note that a uniform quantization of [-M,M] has \theta(R)=0 and \varepsilon=M2^{-R}. For a gain-shape quantizer, we express a vector y=\lVert y\rVert_{2}y_{s}, where the shape vector y_{s} has \lVert y_{s}\rVert_{2}=1. An ideal shape quantizer (which can only be shown to exist asymptotically) using R bits per dimension will satisfy \mathbb{E}\lVert\hat{y_{s}}-y_{s}\rVert_{2}^{2}\leq 2^{-2R}, similar to the scalar uniform quantizer. In one of the examples below, we consider gain-shape quantizers with such an ideal shape quantizer.

Example 1.

We begin by considering an ideal quantizer family with θ(R)=22R\theta(R)=2^{-2R} and ε=0\varepsilon=0. In our asymptotic analysis, we will show roughly that such a quantizer with very small ε\varepsilon exists. For this ideal case, for R>0R>0, the accuracy-speed curve is given by

\Gamma_{Q}(p)=\frac{\alpha^{2p}-\alpha^{2p}\,\theta(Rp)}{1-\alpha^{2p}\,\theta(Rp)}=1-\frac{1-\alpha^{2p}}{1-\alpha^{2p}2^{-2Rp}}.

It can be seen that \Gamma_{Q}(p) is decreasing in p, whereby the optimal choice of p that maximizes \Gamma_{Q}(p) over p\in[s] is p=1. Heuristically, this justifies why asymptotically the fast but loose successive update scheme is optimal.

Example 2 (Uniform scalar quantization).

In this example, we consider a coordinate-wise uniform quantizer. Since we seek quantizers for inputs yny\in\mathbb{R}^{n} such that y2Mn\lVert y\rVert_{2}\leq M\sqrt{n}, we can only use uniform quantizer of [Mn,Mn][-M\sqrt{n},M\sqrt{n}] for each coordinate. For this quantizer, we have θ=0\theta=0 and ε2=nM222R\varepsilon^{2}=nM^{2}2^{-2R}, whereby the accuracy-speed curve is given by ΓQ(p)=α2p(1nM222R/σ2)\Gamma_{Q}(p)=\alpha^{2p}(1-nM^{2}2^{-2R}/\sigma^{2}). Thus, once again, the optimal choice of pp that maximizes accuracy is p=1p=1.

Example 3 (Gain-shape quantizer).

Consider the quantization of a vector y=ay_{s} where a=\lVert y\rVert_{2}. The vector y is quantized by a gain-shape quantizer which quantizes the norm and shape of the vector separately to give Q(y)=\hat{a}\hat{y}_{s}. We use a uniform quantizer with a fixed range [0,M\sqrt{n}] to quantize the norm a to \hat{a}, while an ideal shape quantizer is employed to quantize the shape vector y_{s}. Namely, we assume \mathbb{E}\lVert y_{s}-\hat{y}_{s}\rVert_{2}^{2}\leqslant 2^{-2R} and \lVert\hat{y}_{s}\rVert\leqslant 1. Suppose that we allot \ell bits out of the total budget of nR bits for norm quantization and the rest for shape quantization. Then, we see that

\mathbb{E}\lVert y-Q(y)\rVert_{2}^{2}\leqslant 2a^{2}2^{-2(R-\ell/n)}+nM^{2}2^{-2\ell-1},

whereby θ(R)=22(R/n)+1\theta(R)=2^{-2(R-\ell/n)+1} and ϵ2=M2221\epsilon^{2}=M^{2}2^{-2\ell-1}. Thus, the accuracy-speed curve is given by

\displaystyle\Gamma_{Q}(p)=\frac{\alpha^{2p}}{1-2\alpha^{2p}2^{-2(Rp-\ell/n)}}\,\Big{(}1-\frac{2M^{2}2^{-2\ell-1}}{\sigma^{2}}-2^{-2(Rp-\ell/n)+1}\Big{)}.

Note that the optimal choice of pp in this case depends on the choice of MM.

We illustrated application of our analysis for idealized quantizers, but it can be used to analyze even very practical quantizers, such as the recently proposed almost optimal quantizer in [46].

V Analysis of the Successive Update scheme

From the discussion in Section III, we observe that the successive update scheme is designed to refine the estimate of X_{ks} in each interval \tilde{I}_{k}. This fact helps us establish a recursive relation for D_{t}(\phi_{p},\psi_{p},X), t\in\tilde{I}_{k}, in terms of D_{ks}(\phi_{p},\psi_{p},X), which is provided next.

Lemma 5.

For a time instant t=ks+jp+i,0jm1,0ip1t=ks+jp+i,0\leq j\leq m-1,0\leq i\leq p-1 and k0k\geqslant 0, let (ϕp,ψp)(\phi_{p},\psi_{p}) denote the tracking code of a pp-SU scheme employing an nRpnRp bit (θ,ϵ)(\theta,\epsilon)-quantizer. Assume that (Atc)β2\mathbb{P}\left(A_{t}^{c}\right)\leqslant\beta^{2}. Then, we have

\displaystyle D_{t}(\phi_{p},\psi_{p},X)\leqslant\alpha^{2(t-ks)}\theta^{j}D_{ks}(\phi_{p},\psi_{p},X)+\sigma^{2}(1-\alpha^{2(t-ks)})+\frac{\alpha^{2(t-ks)}(1-\theta^{j})\varepsilon^{2}}{(1-\theta)}+\alpha^{2(t-ks)}\kappa\beta.
Proof:

From the evolution of the AR[1] process defined in (1), we see that Xt=αtksXks+u=ks+1tαtuξuX_{t}=\alpha^{t-ks}X_{ks}+\sum_{u=ks+1}^{t}\alpha^{t-u}\xi_{u}. Further for the pp-SU scheme, we know that X^t|t=αtksX^ks|ks+jp\hat{X}_{t|t}=\alpha^{t-ks}\hat{X}_{ks|ks+jp} at each instant t=ks+jp+it=ks+jp+i. Therefore, we have

X_{t}-\hat{X}_{t|t}=\alpha^{t-ks}(X_{ks}-\hat{X}_{ks|ks+jp})+\sum_{u=ks+1}^{t}\alpha^{t-u}\xi_{u}.

Since the estimate X^ks|ks+jp\hat{X}_{ks|ks+jp} is a function of samples (X0,,Xks)(X_{0},\dots,X_{ks}), and the sequence (ξu,uks)(\xi_{u},u\geqslant ks) is independent of the past, we obtain the per dimension MSE as

\displaystyle D_{t}(\phi_{p},\psi_{p},X)=\frac{\alpha^{2(t-ks)}}{n}\mathbb{E}\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}+\sigma^{2}(1-\alpha^{2(t-ks)}).

Further, we divide the error into two terms based on occurrence of the failure event as follows:

\displaystyle D_{t}(\phi_{p},\psi_{p},X)=\frac{\alpha^{2(t-ks)}}{n}\Big{[}\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}}]+\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}^{c}}]\Big{]}+\sigma^{2}(1-\alpha^{2(t-ks)}). (5)

Recall that at each instant t=ks+jpt=ks+jp, we refine the estimate X^ks|ks+(j1)p\hat{X}_{ks|ks+(j-1)p} of XksX_{ks} to X^ks|ks+jp=(X^ks|ks+(j1)p+Qp(Yk,j1))𝟙At\hat{X}_{ks|ks+jp}=(\hat{X}_{ks|ks+(j-1)p}+Q_{p}(Y_{k,j-1}))\mathbbm{1}_{A_{t}}. Upon substituting this expression for X^ks|ks+jp\hat{X}_{ks|ks+jp}, we obtain

\mathbb{E}\big{[}\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}}\big{]}=\mathbb{E}\big{[}\lVert Y_{k,j-1}-Q_{p}(Y_{k,j-1})\rVert^{2}\mathbbm{1}_{A_{t}}\big{]}
\leqslant\theta\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+(j-1)p}\rVert_{2}^{2}\mathbbm{1}_{A_{t}}]+n\varepsilon^{2},

where the identity uses the definition of error Yk,j1Y_{k,j-1} given in (3) and the inequality holds since QpQ_{p} is a (θ,ε)(\theta,\varepsilon)-quantizer. Repeating the previous step recursively, we get

\frac{1}{n}\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}}]\leqslant\theta^{j}\cdot\frac{1}{n}\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks}\rVert_{2}^{2}\mathbbm{1}_{A_{t}}]+\frac{1-\theta^{j}}{1-\theta}\cdot\varepsilon^{2}
\leqslant\theta^{j}\cdot\frac{1}{n}\mathbb{E}\lVert X_{ks}-\hat{X}_{ks|ks}\rVert_{2}^{2}+\frac{1-\theta^{j}}{1-\theta}\cdot\varepsilon^{2},

which is the same as

\frac{1}{n}\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}}]\leqslant\theta^{j}\cdot D_{ks}(\phi_{p},\psi_{p},X)+\frac{1-\theta^{j}}{1-\theta}\cdot\varepsilon^{2}.

Moving to the error term \mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}^{c}}] when encoder failure occurs, recall that the decoder sets the estimate to 0 in the event of an encoder failure. Thus, using the Cauchy-Schwarz inequality, we get

\frac{1}{n}\mathbb{E}[\lVert X_{ks}-\hat{X}_{ks|ks+jp}\rVert_{2}^{2}\mathbbm{1}_{A_{t}^{c}}]=\frac{1}{n}\mathbb{E}[\lVert X_{ks}\rVert^{2}\mathbbm{1}_{A_{t}^{c}}]
\leqslant\frac{1}{n}\sqrt{\mathbb{E}\left[\lVert X_{ks}\rVert^{4}\right]P(A_{t}^{c})}
\leqslant\kappa\beta.

Substituting the two bounds above in (5), we get the result. ∎

The following recursive bound can be obtained using almost the same proof as that of Lemma 5; we omit the details.

Lemma 6.

Let (ϕp,ψp)(\phi_{p},\psi_{p}) denote the tracking code of a pp-SU scheme employing an nRpnRp bit (θ,ϵ)(\theta,\epsilon)-quantizer. Assume that (Atc)β2\mathbb{P}\left(A_{t}^{c}\right)\leqslant\beta^{2}. Then, we have

D_{ks}(\phi_{p},\psi_{p},\mathbb{X}_{n})\leqslant\alpha^{2s}\theta^{m}D_{(k-1)s}(\phi_{p},\psi_{p},\mathbb{X}_{n})+\sigma^{2}(1-\alpha^{2s})+\alpha^{2s}\varepsilon^{2}\frac{(1-\theta^{m})}{(1-\theta)}+\alpha^{2s}\kappa\beta.

We also need the following technical observation.

Lemma 7.

For a sequence (Xk:k+)(X_{k}\in\mathbb{R}:k\in\mathbb{Z}_{+}) that satisfies sequence of upper bounds

X_{k}\leqslant aX_{k-1}+b,\qquad\forall\,k\in\mathbb{Z}_{+},

with constants a,ba,b\in\mathbb{R} such that bb is finite and a(1,1)a\in(-1,1), we have

\lim_{K\to\infty}\frac{1}{K}\sum_{k=0}^{K-1}X_{k}\leqslant\frac{b}{1-a}.
Proof:

From the sequence of upper bounds, we can inductively show that

X_{k}\leqslant a^{k}X_{0}+b\frac{1-a^{k}}{1-a},\qquad\forall\,k\in\mathbb{Z}_{+}.

Averaging XkX_{k} over the horizon {0,,K1}\left\{0,\dots,K-1\right\}, we get

\frac{1}{K}\sum_{k=0}^{K-1}X_{k}\leqslant\frac{1-a^{K}}{K(1-a)}\Big{(}X_{0}-\frac{b}{1-a}\Big{)}+\frac{b}{1-a}.

From the finiteness of X0,bX_{0},b and the fact that |a|<1\lvert a\rvert<1, the result follows by taking the limit KK growing arbitrarily large on both sides. ∎
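A quick numerical illustration of Lemma 7 (purely a sanity check, with an arbitrary sequence obeying the recursive upper bound):

```python
import numpy as np

a, b, K = 0.7, 2.0, 100_000
rng = np.random.default_rng(1)
x, total = 5.0, 0.0
for _ in range(K):
    total += x
    x = a * x + b - rng.uniform(0.0, 0.5)   # satisfies x_k <= a * x_{k-1} + b
print(total / K, "<=", b / (1.0 - a))        # Cesàro mean vs. the bound b / (1 - a)
```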

We are now in a position to prove Theorem 4.

Proof of Theorem 4: We begin by noting that, without any loss of generality, we can restrict to T=Ks. This holds since the contributions of the error terms within the fixed interval I_{K} are bounded. For T=Ks, the time horizon \{0,\dots,T-1\} can be partitioned into the intervals (I_{k}, k+1\in[K]). Therefore, we can write the average MSE per dimension for the p-SU scheme for time-horizon T=Ks as

\overline{D}_{T}(\phi_{p},\psi_{p},X)=\frac{1}{Ks}\sum_{k=0}^{K-1}\sum_{j=0}^{m-1}\sum_{i=0}^{p-1}D_{ks+jp+i}(\phi_{p},\psi_{p},X).

From the upper bound for per dimension MSE given in Lemma 5, we get

\sum_{i=0}^{p-1}D_{ks+jp+i}(\phi_{p},\psi_{p},X)\leq\sum_{i=0}^{p-1}\Big{[}\alpha^{2(jp+i)}\theta^{j}D_{ks}(\phi_{p},\psi_{p},X)+\sigma^{2}(1-\alpha^{2(jp+i)})+\frac{\alpha^{2(jp+i)}(1-\theta^{j})\varepsilon^{2}}{(1-\theta)}+\alpha^{2(jp+i)}\kappa\beta\Big{]}
=\alpha^{2jp}\frac{1-\alpha^{2p}}{1-\alpha^{2}}\Big{(}\theta^{j}D_{ks}(\phi_{p},\psi_{p},X)+\frac{(1-\theta^{j})\varepsilon^{2}}{(1-\theta)}+\kappa\beta-\sigma^{2}\Big{)}+p\sigma^{2}.

Summing the expression above over j{0,,m1}j\in\{0,...,m-1\} and k{0,,K1}k\in\{0,...,K-1\}, and dividing by TT, we get

\overline{D}_{T}(\phi_{p},\psi_{p},X)\leqslant\sigma^{2}+\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\Big{(}\kappa\beta-\sigma^{2}+\frac{\varepsilon^{2}}{1-\theta}\Big{)}+\frac{(1-\alpha^{2p})}{s(1-\alpha^{2})}\frac{(1-\alpha^{2s}\theta^{m})}{(1-\alpha^{2p}\theta)}\Big{(}\frac{1}{K}\sum_{k=0}^{K-1}D_{ks}(\phi_{p},\psi_{p},X)-\frac{\varepsilon^{2}}{1-\theta}\Big{)}.

It follows by Lemma 6 that

\overline{D}_{T}(\phi_{p},\psi_{p},X)\leqslant\sigma^{2}+\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\Big{(}\kappa\beta-\sigma^{2}+\frac{\varepsilon^{2}}{1-\theta}\Big{)}+\frac{(1-\alpha^{2p})}{s(1-\alpha^{2})}\frac{(1-\alpha^{2s}\theta^{m})}{(1-\alpha^{2p}\theta)}\Big{(}\sup_{\{a_{k}\}_{k\geq 0}\in\mathcal{A}}\frac{1}{K}\sum_{k=0}^{K-1}a_{k}-\frac{\varepsilon^{2}}{1-\theta}\Big{)}, (6)

where \mathcal{A} denotes the set of sequences \{a_{k}\}_{k\geq 0} satisfying

a_{k}\leqslant\alpha^{2s}\theta^{m}a_{k-1}+\sigma^{2}(1-\alpha^{2s})+\alpha^{2s}\kappa\beta+\alpha^{2s}\varepsilon^{2}\frac{(1-\theta^{m})}{(1-\theta)}.

We denote the right-side of (6) by BT(θ,ϵ,β)B_{T}(\theta,\epsilon,\beta). Noting that by Lemma 7 any sequence {ak}k0𝒜\{a_{k}\}_{k\geq 0}\in\mathcal{A} satisfies

\lim_{K\to\infty}\frac{1}{K}\sum_{k=0}^{K-1}a_{k}\leqslant\frac{1}{1-\alpha^{2s}\theta^{m}}\Big{(}\sigma^{2}(1-\alpha^{2s})+\alpha^{2s}\kappa\beta+\alpha^{2s}\varepsilon^{2}\frac{(1-\theta^{m})}{(1-\theta)}\Big{)},

we get that

\limsup_{T\to\infty}B_{T}(\theta,\epsilon,\beta)\leq\sigma^{2}\left(1-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\cdot\frac{\alpha^{2p}(1-\theta)}{(1-\alpha^{2p}\theta)}\right)+\varepsilon^{2}\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\cdot\frac{\alpha^{2p}}{(1-\alpha^{2p}\theta)}+\kappa\beta\frac{1}{s(1-\alpha^{2})}\left(1-\frac{\alpha^{2(s+p)}(1-\theta)}{(1-\alpha^{2p}\theta)}\right),

which completes the proof. ∎

VI Asymptotic Achievability Using Random Quantizer

With Theorem 4 at our disposal, the proof of achievability can be completed by fixing p=1 and showing the existence of an appropriate quantizer. However, we need to handle the failure event, and we address this first. The next result shows that the failure probability depends on the quantizer only through M.

Lemma 8.

For fixed TT and nn, consider the pp-SU scheme with p=1p=1 and an nRnR bit (θ,ϵ)(\theta,\epsilon)-quantizer QQ with dynamic range MM. Then, for every η>0\eta>0, there exists an M0M_{0} independent of nn such that for all MM0M\geq M_{0}, we get

\mathbb{P}\left(A_{T}^{c}\right)\leqslant\eta.
Proof:

The event A_{T} (of encoder failure not happening until time T for the successive update scheme) occurs when the errors Y_{k,j} satisfy \lVert Y_{k,j}\rVert_{2}^{2}\leq nM^{2}, for every k\geq 0 and 0\leq j\leq s-1 such that t=ks+j\leq T. For brevity, we denote by Y_{t} the error random variable Y_{k,j} and write Y^{t-1}=(Y_{1},\dots,Y_{t-1}). We note that

\mathbb{P}\left(A_{T}^{c}\right)=\mathbb{P}\left(A_{T-1}^{c}\right)+\mathbb{P}\left(A_{T-1}\cap A_{T}^{c}\right)
=\mathbb{P}\left(A_{T-1}^{c}\right)+\mathbb{P}\left(A_{T-1}\cap\{\lVert Y_{T}\rVert_{2}^{2}>nM^{2}\}\right)
=\mathbb{P}\left(A_{T-1}^{c}\right)+\mathbb{E}\big{[}\mathbbm{1}_{A_{T-1}}\mathbb{P}\left(\lVert Y_{T}\rVert_{2}^{2}>nM^{2}|Y^{T-1}\right)\big{]}
\leq\mathbb{P}\left(A_{T-1}^{c}\right)+\mathbb{E}\Big{[}\mathbbm{1}_{A_{T-1}}\frac{\mathbb{E}[\lVert Y_{T}\rVert_{2}^{2}|Y^{T-1}]}{nM^{2}}\Big{]}
=\mathbb{P}\left(A_{T-1}^{c}\right)+\frac{\mathbb{E}[\lVert Y_{T}\rVert_{2}^{2}\mathbbm{1}_{A_{T-1}}]}{nM^{2}}.

Note that we have YT=YT1Q(YT1)Y_{T}=Y_{T-1}-Q(Y_{T-1}) under AT1A_{T-1}, whereby

\mathbb{E}[\lVert Y_{T}\rVert_{2}^{2}\mathbbm{1}_{A_{T-1}}]\leq\theta\cdot\mathbb{E}[\lVert Y_{T-1}\rVert_{2}^{2}\mathbbm{1}_{A_{T-1}}]+n\varepsilon^{2}.

Denoting by βT2\beta_{T}^{2} the probability (ATc)\mathbb{P}\left(A_{T}^{c}\right), the previous two inequalities imply

\beta_{T}^{2}\leq\beta_{T-1}^{2}+\frac{\theta}{nM^{2}}\mathbb{E}[\lVert Y_{T-1}\rVert_{2}^{2}]+\frac{\varepsilon^{2}}{M^{2}}.

We saw earlier in the proof of Lemma 5 that \mathbb{E}[\lVert Y_{T-1}\rVert_{2}^{2}]/n depends only on the probability \beta_{T-1}^{2} that a failure occurs by time T-1. Proceeding as in that proof, we get

\beta_{T}^{2}\leq\beta_{T-1}^{2}+\frac{1}{M^{2}}(c_{1}\beta_{T-1}+c_{2}),

where c1c_{1} and c2c_{2} do not depend on nn. Therefore, there exists M0M_{0} independent of nn such that for all MM exceeding M0M_{0} we have

\beta_{T}^{2}\leq\beta_{T-1}^{2}+\eta,

which completes the proof by summing over TT. ∎

The bound above is rather loose, but it suffices for our purpose. In particular, it says that we can choose M sufficiently large to make the probability of failure until time T less than any \beta^{2}, whereby Theorem 4 can be applied by designing a quantizer for this M. Indeed, we can use the quantizer of the unit sphere from [1, 2], along with a uniform quantizer for the gain (which lies in [0,M]), to get the following performance. In fact, we will show that a deterministic quantizer with the desired performance exists. Note that we already considered such a quantizer in Example 3. But the analysis there was slightly loose, and assumed the existence of an ideal shape quantizer.

Lemma 9.

For every R,ε,γ,M>0R,\varepsilon,\gamma,M>0, there exists an nRnR bit (22(Rγ),ε)(2^{-2(R-\gamma)},\varepsilon)-quantizer with dynamic range MM, for all nn sufficiently large.

Proof:

We first borrow a classic construction from [1, 2], which gives us our desired shape quantizer. Denote by \mathbb{S}_{n} the (n-1)-dimensional unit sphere \{y\in\mathbb{R}^{n}:\lVert y\rVert_{2}=1\}. For every \gamma>0 and n sufficiently large, it was shown in [1, 2] that there exists a set \mathcal{C} of 2^{nR} vectors in \mathbb{S}_{n} such that for every y\in\mathbb{S}_{n} we can find y^{\prime}\in\mathcal{C} satisfying

\langle y,y^{\prime}\rangle\geqslant\sqrt{1-2^{-2(R-\gamma)}}.

Denoting cosθ=122(Rγ)\cos\theta=\sqrt{1-2^{-2(R-\gamma)}}, consider the shape quantizer QR(y)Q_{R}(y) from [2] given by

Q_{R}\left(y\right)\triangleq\cos\theta\cdot\operatorname*{arg\,min}_{y^{\prime}\in\mathcal{C}}\lVert y-y^{\prime}\rVert_{2}^{2}
=\cos\theta\cdot\arg\max_{y^{\prime}\in\mathcal{C}}\langle y,y^{\prime}\rangle,\quad\forall\,y\in\mathbb{S}_{n}.

Note that we shrink the length of yy^{\prime} by a factor of cosθ\cos\theta, which will be seen to yield the gain over the analysis in Example 3.

We append to this shape quantizer the uniform gain quantizer qM:[0,M][0,M]q_{M}:[0,M]\to[0,M], which quantizes the interval [0,M][0,M] uniformly into sub-intervals of length ϵ\epsilon. Specifically, qM(a)=εa/εq_{M}(a)=\varepsilon\lfloor a/\varepsilon\rfloor and the corresponding index is given by a/ε\lfloor a/\varepsilon\rfloor. We represent this index using its log(M/ε)\ell\triangleq\lceil\log(M/\varepsilon)\rceil bit binary representation and denote this mapping by ϕM:[0,M]{0,1}\phi^{M}:[0,M]\to\{0,1\}^{\ell}.

For every yny\in\mathbb{R}^{n} such that y22nM2\lVert y\rVert_{2}^{2}\leq nM^{2}, we consider the quantizer

Q(y)=nqM(y2n)QR(yy2).Q(y)=\sqrt{n}\cdot q_{M}\left(\frac{\lVert y\rVert_{2}}{\sqrt{n}}\right)\cdot Q_{R}\left(\frac{y}{\lVert y\rVert_{2}}\right).

For this quantizer, for every yny\in\mathbb{R}^{n} with y22=nB2\lVert y\rVert_{2}^{2}=nB^{2} such that BMB\leq M, we have

yQ(y)22\displaystyle\lVert y-Q(y)\rVert^{2}_{2} =y22+Q(y)222y,Q(y)\displaystyle=\lVert y\rVert^{2}_{2}+\lVert Q(y)\rVert_{2}^{2}-2\langle y,Q(y)\rangle
=nB2+nB^2cos2θ2nBB^cosθy~,QR(y~)\displaystyle=nB^{2}+n\hat{B}^{2}\cos^{2}\theta-2nB\hat{B}\cos\theta\langle\tilde{y},Q_{R}(\tilde{y})\rangle
nB2+nB^2cos2θ2nBB^cos2θ\displaystyle\leq nB^{2}+n\hat{B}^{2}\cos^{2}\theta-2nB\hat{B}\cos^{2}\theta
=nB2sin2θ+n(BB^)2cos2θ\displaystyle=nB^{2}\sin^{2}\theta+n(B-\hat{B})^{2}\cos^{2}\theta
nB2sin2θ+nε2cos2θ\displaystyle\leq nB^{2}\sin^{2}\theta+n\varepsilon^{2}\cos^{2}\theta
nB222(Rγ)+nε2,\displaystyle\leq nB^{2}2^{-2(R-\gamma)}+n\varepsilon^{2},

where the first inequality uses the covering property of \mathcal{C}. Therefore, Q constitutes an nR+\ell bit (2^{-2(R-\gamma)},\varepsilon)-quantizer with dynamic range M, for all n sufficiently large. Note that this quantizer is a deterministic one. ∎
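For concreteness, a minimal numerical sketch of the quantizer Q is given below (in Python). The covering code \mathcal{C} of [1, 2] is only shown to exist, so a random codebook is used as a stand-in; the function and parameter names are ours and purely illustrative, and with a random codebook the guarantee of Lemma 9 need not hold.

import numpy as np

rng = np.random.default_rng(0)

def random_shape_codebook(n, R):
    # Random stand-in for the covering code C of [1, 2] (which is only shown
    # to exist): 2^{nR} independent, uniformly random unit vectors in R^n.
    C = rng.standard_normal((int(2 ** (n * R)), n))
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def gain_shape_quantize(y, C, eps, R, gamma):
    # Illustrative version of Q(y) from the proof of Lemma 9: the uniform gain
    # quantizer q_M composed with the shape quantizer Q_R, the latter shrunk
    # by the design factor cos(theta).
    n = y.shape[0]
    cos_theta = np.sqrt(1 - 2 ** (-2 * (R - gamma)))
    B = np.linalg.norm(y) / np.sqrt(n)            # gain, assumed to satisfy B <= M
    B_hat = eps * np.floor(B / eps)               # q_M(B) = eps * floor(B / eps)
    y_tilde = y / np.linalg.norm(y)               # shape, a point on the unit sphere
    y_prime = C[np.argmax(C @ y_tilde)]           # codeword maximizing <y_tilde, y'>
    return np.sqrt(n) * B_hat * cos_theta * y_prime

# Arbitrary small parameters for illustration only; the guarantee of Lemma 9
# is asymptotic in n and requires the ideal covering codebook.
n, R, gamma, eps = 16, 1.0, 0.1, 0.05
C = random_shape_codebook(n, R)
y = rng.standard_normal(n)
err = np.linalg.norm(y - gain_shape_quantize(y, C, eps, R, gamma)) ** 2 / n
print("per-coordinate squared error:", err)

With the ideal codebook in place of the random one, the computation above guarantees a per-coordinate squared error of at most B^{2}2^{-2(R-\gamma)}+\varepsilon^{2}.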

Proof of Theorem 2: For any fixed β\beta and ε\varepsilon, we can make the probability of failure until time TT less than β\beta by choosing MM sufficiently large. Further, for any fixed R,γ>0R,\gamma>0, by Lemma 9, we can choose nn sufficiently large to get an nRnR bit (22(Rγ),ε)(2^{-2(R-\gamma)},\varepsilon)-quantizer for vectors yy with y22nM2\lVert y\rVert_{2}^{2}\leq nM^{2}. Therefore, by Theorem 4 applied for p=1p=1, we get that

δ(R,s,𝕏)g(s)α21α222(Rγ)(1ε2σ222(Rγ))κβg(s)σ2(1α2s)(1α2(s+1)1θ1α2θ).\displaystyle\delta^{*}(R,s,\mathbb{X})\geq\frac{g(s)\,\alpha^{2}}{1-\alpha^{2}2^{-2(R-\gamma)}}\,\Big{(}1-\frac{\varepsilon^{2}}{\sigma^{2}}-2^{-2(R-\gamma)}\Big{)}-\frac{\kappa\beta g(s)}{\sigma^{2}(1-\alpha^{2s})}\,\Big{(}1-\alpha^{2(s+1)}\frac{1-\theta}{1-\alpha^{2}\theta}\Big{)}.

The proof is completed upon taking the limits as ε,γ\varepsilon,\gamma, and β\beta go to 0.∎

VII Converse bound: Proof of Theorem 3

The proof is similar to the converse proof in [32], but now we need to handle the delay per transmission. We rely on properties of the entropy power of a random variable. Recall that for a continuous random variable X taking values in \mathbb{R}^{n}, the entropy power of X is given by

𝒩(X)=12πe2(2/n)h(X),\mathcal{N}(X)=\frac{1}{2\pi e}\cdot 2^{(2/n)\,h(X)},

where h(X)h(X) is the differential entropy of XX.
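For instance, for a Gaussian vector X with n independent \mathcal{N}(0,\sigma^{2}) coordinates, h(X)=\frac{n}{2}\log(2\pi e\sigma^{2}), whence \mathcal{N}(X)=\frac{1}{2\pi e}\cdot 2^{\log(2\pi e\sigma^{2})}=\sigma^{2}; that is, the entropy power of a Gaussian vector equals its per-coordinate variance, a fact used repeatedly below.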

Consider a tracking code (ϕ,ψ)(\phi,\psi) of rate RR and sampling period ss and a process X𝕏nX\in\mathbb{X}_{n}. We begin by noting that the state at time tt is related to the state at time t+it+i as

Xt+i=αiXt+j=0i1αjξt+ij,X_{t+i}=\alpha^{i}X_{t}+\sum_{j=0}^{i-1}\alpha^{j}\xi_{t+i-j},

where the noise j=0i1αjξt+ij\sum_{j=0}^{i-1}\alpha^{j}\xi_{t+i-j} is independent of XtX_{t} (and the past states). In particular, for t=ks+it=ks+i, 1i<s1\leqslant i<s, we get

𝔼[XtX^t|t22]\displaystyle{\mathbb{E}\left[\lVert X_{t}-\hat{X}_{t|t}\rVert_{2}^{2}\right]} =𝔼[αiXksX^t|t22]+j=0i1α2j𝔼[ξtj22]\displaystyle=\mathbb{E}\left[\lVert\alpha^{i}X_{ks}-\hat{X}_{t|t}\rVert_{2}^{2}\right]+\sum_{j=0}^{i-1}\alpha^{2j}\mathbb{E}\left[\lVert\xi_{t-j}\rVert_{2}^{2}\right]
=α2i𝔼[XksX~t22]+n(1α2i)σ2.\displaystyle=\alpha^{2i}\mathbb{E}\left[\lVert X_{ks}-\tilde{X}_{t}\rVert_{2}^{2}\right]+n(1-\alpha^{2i})\sigma^{2}.

where we define \tilde{X}_{t}:=\alpha^{-i}\hat{X}_{t|t}, and the first identity uses the fact that the zero-mean noise added after time ks is independent of (and hence orthogonal to) both X_{ks} and the estimate \hat{X}_{t|t}, which depends only on samples taken up to time ks. Since the Gaussian distribution has the maximum differential entropy among all continuous random variables with a given variance, and the entropy power of a Gaussian random variable equals its variance, we get that

σ2(1α2)𝒩(ξt+i).\sigma^{2}(1-\alpha^{2})\geqslant\mathcal{N}(\xi_{t+i}).

Therefore, the previous bound for tracking error yields

Dks+i(ϕ,ψ,X)\displaystyle{D_{ks+i}(\phi,\psi,X)} α2i1n𝔼[XksX~ks+i22]+(1α2i)(1α2)𝒩(ξks+i)\displaystyle\geqslant\alpha^{2i}\frac{1}{n}\mathbb{E}\left[\lVert X_{ks}-\tilde{X}_{ks+i}\rVert_{2}^{2}\right]+\frac{(1-\alpha^{2i})}{(1-\alpha^{2})}\mathcal{N}(\xi_{ks+i})
=α2i1n𝔼[XksX~ks+i22]+(1α2i)(1α2)𝒩(ξ1),\displaystyle=\alpha^{2i}\frac{1}{n}\mathbb{E}\left[\lVert X_{ks}-\tilde{X}_{ks+i}\rVert_{2}^{2}\right]+\frac{(1-\alpha^{2i})}{(1-\alpha^{2})}\mathcal{N}(\xi_{1}), (7)

where the identity uses the assumption that the \xi_{t} are identically distributed for all t. Taking the average of these terms over the T=Ks time instants, we get

D¯T(ϕ,ψ,X)\displaystyle{\overline{D}_{T}(\phi,\psi,X)} =1nKsk=0K1i=ks(k+1)s1𝔼[XiX^i|i22]\displaystyle=\frac{1}{nKs}\sum_{k=0}^{K-1}\sum_{i=ks}^{(k+1)s-1}\mathbb{E}\left[\lVert X_{i}-\hat{X}_{i|i}\rVert_{2}^{2}\right]
1nKsk=0K1i=0s1α2i𝔼[XksX~ks+i22]+𝒩(ξ1)(1α2)(1(1α2s)s(1α2)).\displaystyle\geqslant\frac{1}{nKs}\sum_{k=0}^{K-1}\sum_{i=0}^{s-1}\alpha^{2i}\mathbb{E}\left[\lVert X_{ks}-\tilde{X}_{ks+i}\rVert_{2}^{2}\right]+\frac{\mathcal{N}(\xi_{1})}{(1-\alpha^{2})}\left(1-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\right).
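Here, the coefficient of \mathcal{N}(\xi_{1})/(1-\alpha^{2}) in the second term is simply the geometric sum

\frac{1}{s}\sum_{i=0}^{s-1}(1-\alpha^{2i})=1-\frac{1}{s}\cdot\frac{1-\alpha^{2s}}{1-\alpha^{2}}=1-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}.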

Note that each \tilde{X}_{ks+i} acts as an estimate of X_{ks} which depends on the communication received by the decoder until time ks+i. We denote by C_{t-1} the communication received at time t, whereby \tilde{X}_{ks+i} depends only on C_{1},\ldots,C_{ks+i-1}. In particular, the communication C_{ks},\ldots,C_{ks+i-1} was sent as a function of X_{ks}, the sample seen at time t=ks.

From here on, we proceed by invoking the “entropy power bounds” for the MSE terms. For random variables X and Y such that P_{X|Y} has a conditional density, the conditional entropy power is given by \mathcal{N}(X|Y)=\frac{1}{2\pi e}2^{2h(X|Y)/n}, where the conditional differential entropy h(X|Y) is given by \mathbb{E}\left[h(P_{X|Y})\right]. Bounding MSE terms by entropy power is a standard step that allows us to track the reduction in error due to a fixed amount of communication.

We begin by using the following standard bound (see [47, Chapter 10]); it follows simply by noting that a Gaussian maximizes differential entropy among all random variables with a given second moment and that h(X)-h(X|Y)\leqslant H(Y)=nR. For a continuous random variable X and a discrete random variable Y taking values in \{0,1\}^{nR}, let \hat{X} be any function of Y. Then, it holds that

1n𝔼XX^222R𝒩(X).\frac{1}{n}\mathbb{E}\|X-\hat{X}\|^{2}\geqslant 2^{-2R}\mathcal{N}(X). (8)

We apply this result conditioned on C^{ks-1}, with X_{ks} in the role of X and the nRi bits of communication C_{ks},\ldots,C_{ks+i-1} in the role of Y. The previous bound and Jensen’s inequality yield

1n𝔼[XksX~ks+i22]22Ri𝔼[𝒩(Xks|Cks1)].\frac{1}{n}\mathbb{E}\left[\lVert X_{ks}-\tilde{X}_{ks+i}\rVert_{2}^{2}\right]\geqslant 2^{-2Ri}\mathbb{E}[\mathcal{N}(X_{ks}|C^{ks-1})].

Next, we recall the entropy power inequality (cf. [47]): for independent X_{1} and X_{2}, \mathcal{N}(X_{1}+X_{2})\geqslant\mathcal{N}(X_{1})+\mathcal{N}(X_{2}). Noting that X_{ks}=\alpha^{s}X_{(k-1)s}+\sum_{j=0}^{s-1}\alpha^{j}\xi_{ks-j}, where the \xi_{i} are iid zero-mean random vectors independent of X_{(k-1)s}, and that C^{ks-1} is a function of X_{1},\ldots,X_{(k-1)s}, we get

𝒩(Xks|Cks1)\displaystyle{\mathcal{N}(X_{ks}|C^{ks-1})} 𝒩(αsX(k1)s|Cks1)+𝒩(ξks)(1α2s)(1α2)\displaystyle\geqslant\mathcal{N}(\alpha^{s}X_{(k-1)s}|C^{ks-1})+\mathcal{N}(\xi_{ks})\frac{(1-\alpha^{2s})}{(1-\alpha^{2})}
=α2s𝒩(X(k1)s|Cks1)+𝒩(ξ1)(1α2s)(1α2),\displaystyle=\alpha^{2s}\mathcal{N}(X_{(k-1)s}|C^{ks-1})+\mathcal{N}(\xi_{1})\frac{(1-\alpha^{2s})}{(1-\alpha^{2})},

where the identity uses the scaling property of differential entropy together with the fact that the \xi_{t} are identically distributed. Upon combining the bounds given above and simplifying, we get

D¯T(ϕ,ψ,X)\displaystyle\overline{D}_{T}(\phi,\psi,X) α2s(1α2s22Rs)s(1α222R)1Kk=0K1𝔼[𝒩(X(k1)s|Cks1)]\displaystyle\geqslant\frac{\alpha^{2s}(1-\alpha^{2s}2^{-2Rs})}{s(1-\alpha^{2}2^{-2R})}\cdot\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}[\mathcal{N}(X_{(k-1)s}|C^{ks-1})]
+𝒩(ξ1)(1α2)(1+(1α2s)(1α2s22Rs)s(1α222R)(1α2s)s(1α2)).\displaystyle\hskip 85.35826pt+\frac{\mathcal{N}(\xi_{1})}{(1-\alpha^{2})}\left(1+\frac{(1-\alpha^{2s})(1-\alpha^{2s}2^{-2Rs})}{s(1-\alpha^{2}2^{-2R})}-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\right). (9)

Finally, note that the terms \mathcal{N}(X_{(k-1)s}|C^{ks-1}) are exactly the same as those considered in [32, eqn. 11e], since they correspond to recovering X_{(k-1)s} using communication that can depend on it. Therefore, a similar expression holds here for the sampled process \{X_{ks}:k\in\mathbb{N}\}. Using the recursive bound for the tracking error in (7) and (8), we adapt the results of [32, eqn. 11] to our case to obtain

𝔼[𝒩(X(k1)s|Cks1)]dk1,\mathbb{E}[\mathcal{N}(X_{(k-1)s}|C^{ks-1})]\geqslant d_{k-1}^{*},

where the quantity dkd_{k}^{*} is given by the recursion

dk=22Rs(α2sdk1+𝒩(ξ1)(1α2s)(1α2)),d_{k}^{*}=2^{-2Rs}\Big{(}\alpha^{2s}d_{k-1}^{*}+\mathcal{N}(\xi_{1})\frac{(1-\alpha^{2s})}{(1-\alpha^{2})}\Big{)},

with d0=0d_{0}^{*}=0.

The bound obtained above holds for any given process X\in\mathbb{X}_{n}. To obtain the best possible bound, we take \xi_{1} to be Gaussian, since that maximizes \mathcal{N}(\xi_{1}). Specifically, we set the \xi_{k} to be zero-mean Gaussian vectors with per-coordinate variance (1-\alpha^{2})\sigma^{2}, which gives \mathcal{N}(\xi_{1})=\sigma^{2}(1-\alpha^{2}). Thus, taking supremum over all distributions on both sides of (9), we have

supX𝕏nD¯T(ϕ,ψ,X)\displaystyle\sup_{X\in\mathbb{X}_{n}}\overline{D}_{T}(\phi,\psi,X) α2s(1α2s22Rs)s(1α222R)1Kk=0K1dk1+σ2(1+(1α2s)(1α2s22Rs)s(1α222R)(1α2s)s(1α2)),\displaystyle\geqslant\frac{\alpha^{2s}(1-\alpha^{2s}2^{-2Rs})}{s(1-\alpha^{2}2^{-2R})}\cdot\frac{1}{K}\sum_{k=0}^{K-1}d_{k-1}^{*}+\sigma^{2}\left(1+\frac{(1-\alpha^{2s})(1-\alpha^{2s}2^{-2Rs})}{s(1-\alpha^{2}2^{-2R})}-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}\right),

where

dk=22Rs(α2sdk1+σ2(1α2s)),d_{k}^{*}=2^{-2Rs}\big{(}\alpha^{2s}d_{k-1}^{*}+\sigma^{2}(1-\alpha^{2s})\big{)},

with d_{0}^{*}=0. For this sequence d_{k}^{*}, we can see that (cf. [32, Corollary 1])

\displaystyle\limsup_{K\to\infty}\frac{1}{K}\sum_{k=0}^{K-1}d_{k-1}^{*}=\lim_{k\to\infty}d_{k}^{*}=\frac{\sigma^{2}(1-\alpha^{2s})2^{-2Rs}}{(1-\alpha^{2s}2^{-2Rs})}.
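As a quick numerical sanity check (not part of the proof), the recursion can be iterated directly; the parameter values in the sketch below are arbitrary and purely illustrative.

# Iterate d_k = 2^{-2Rs} (alpha^{2s} d_{k-1} + sigma^2 (1 - alpha^{2s})) and
# compare with the closed-form limit stated above; parameter values are arbitrary.
alpha, sigma, R, s = 0.9, 1.0, 0.5, 4
d = 0.0
for _ in range(200):
    d = 2 ** (-2 * R * s) * (alpha ** (2 * s) * d + sigma ** 2 * (1 - alpha ** (2 * s)))
limit = sigma ** 2 * (1 - alpha ** (2 * s)) * 2 ** (-2 * R * s) / (1 - alpha ** (2 * s) * 2 ** (-2 * R * s))
print(d, limit)  # the two printed values agree to machine precision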

Therefore, we have obtained

lim supTsupX𝕏nD¯T(ϕ,ψ,X)\displaystyle{\limsup_{T\to\infty}\sup_{X\in\mathbb{X}_{n}}\overline{D}_{T}(\phi,\psi,X)} σ2((1α2s)α2s22Rss(1α222R)+1(1α2s)s(1α2)+(1α2s)(1α2s22Rs)s(1α222R))\displaystyle\geqslant\sigma^{2}\bigg{(}\frac{(1-\alpha^{2s})\alpha^{2s}2^{-2Rs}}{s(1-\alpha^{2}2^{-2R})}+1-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}+\frac{(1-\alpha^{2s})(1-\alpha^{2s}2^{-2Rs})}{s(1-\alpha^{2}2^{-2R})}\bigg{)}
=σ2(1g(s)δ0(R)).\displaystyle=\sigma^{2}\bigg{(}1-g(s)\delta_{0}(R)\bigg{)}.
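For the last identity, note that the first and third terms inside the parentheses combine as

\frac{(1-\alpha^{2s})\alpha^{2s}2^{-2Rs}}{s(1-\alpha^{2}2^{-2R})}+\frac{(1-\alpha^{2s})(1-\alpha^{2s}2^{-2Rs})}{s(1-\alpha^{2}2^{-2R})}=\frac{(1-\alpha^{2s})}{s(1-\alpha^{2}2^{-2R})},

so that the right side equals \sigma^{2}\big(1-\frac{(1-\alpha^{2s})}{s(1-\alpha^{2})}+\frac{(1-\alpha^{2s})}{s(1-\alpha^{2}2^{-2R})}\big), which is \sigma^{2}(1-g(s)\delta_{0}(R)) upon substituting the definitions of g(s) and \delta_{0}(R).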

As the previous bound holds for all tracking codes (ϕ,ψ)(\phi,\psi), it follows that δ(R,s,𝕏)g(s)δ0(R).\delta^{*}(R,s,\mathbb{X})\leqslant g(s)\delta_{0}(R).

VIII Discussion

We restricted our treatment to an AR[1] process with uncorrelated components. This restriction is for clarity of presentation, and some of the results can be extended to AR[1] processes with correlated components. In this case, the decoder will be replaced by a Kalman-like filter in the manner of [27]. A natural extension of this work is the study of an optimal transmission strategy for an AR[n] process in the given setting. In an AR[n] process, the strategy of refining the latest sample is clearly not sufficient, as the value of the process at any time instant depends on the past n samples. If the sampling is periodic, even the encoder does not have access to all these n samples unless we take a sample at every instant. A viable alternative is to take n consecutive samples at every sampling instant. However, even with this structure on the sampling policy, it is not clear how the information must be transmitted. A systematic analysis of this problem is an interesting direction for future research.

Another setting not discussed in the current work is one where the transmissions have nonuniform rates. Throughout our work, we have assumed periodic sampling and transmissions at a fixed rate. For the scheme presented in this paper, it is easy to see from our analysis that only the total number of bits transmitted in each sampling interval matters, when the dimension is sufficiently large. That is, for our scheme, even framing each packet (sent in each communication slot) using an unequal number of bits will give the same performance as equal packet sizes, as long as the overall bit-budget per sampling period is fixed. A similar phenomenon was observed in [31], which allowed the extension of some of their analysis to erasure channels with feedback. We remark that a similar extension is possible for some of our results, too. This behavior stems from the use of successive batches of bits to successively refine the estimate of a single sample within a sampling interval, whereby at the end of the interval the error roughly corresponds to that of a quantizer using the total number of bits sent during the interval. In general, a study of nonuniform rates for describing each sample, while keeping the bits per time-slot fixed, will require us to move beyond uniform sampling. This, too, is an interesting research direction to pursue.

Finally, we remark that the encoder structure we have imposed, wherein the error in the estimate of the latest sample is refined at each instant, is optimal only asymptotically and is justified only heuristically for fixed dimensions. Even for one-dimensional observations, it is not clear if this structure is optimal. We believe this is a question of fundamental interest that remains open.

Acknowledgements

The authors would like to thank Shun Watanabe for pointing to the reference [2].

References

  • [1] A. D. Wyner, “Random packings and coverings of the unit n-sphere,” The Bell System Technical Journal, vol. 46, no. 9, pp. 2111–2118, 1967.
  • [2] A. Lapidoth, “On the role of mismatch in rate distortion theory,” IEEE Trans. Inf. Theory, vol. 43, no. 1, pp. 38–47, 1997.
  • [3] H. S. Witsenhausen, “On the structure of real-time source coders,” Bell System Technical Journal, vol. 58, no. 6, pp. 1437–1451, 1979.
  • [4] D. Teneketzis, “On the structure of optimal real-time encoders and decoders in noisy communication,” IEEE Trans. Inf. Theory, vol. 52, pp. 4017–4035, 2006.
  • [5] A. Mahajan and D. Teneketzis, “On real-time communication systems with noisy feedback,” in IEEE Inf. Theory Workshop (ITW).   IEEE, 2007, pp. 283–288.
  • [6] J. C. Walrand and P. Varaiya, “Optimal causal coding - decoding problems,” IEEE Trans. Inf. Theory, vol. 29, pp. 814–819, 1983.
  • [7] S. Yuksel, “On optimal causal coding of partially observed markov sources in single and multiterminal settings,” IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 424–437, 2012.
  • [8] T. Linder and S. Yüksel, “On optimal zero-delay coding of vector markov sources,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 5975–5991, 2014.
  • [9] R. G. Wood, T. Linder, and S. Yüksel, “Optimal zero delay coding of markov sources: Stationary and finite memory codes,” IEEE Transactions on Information Theory, vol. 63, no. 9, pp. 5968–5980, 2017.
  • [10] W. S. Wong and R. W. Brockett, “Systems with finite communication bandwidth constraints. i. state estimation problems,” IEEE Transactions on Automatic Control, vol. 42, no. 9, pp. 1294–1299, 1997.
  • [11] G. N. Nair and R. J. Evans, “State estimation via a capacity-limited communication channel,” in IEEE Conference on Decision and Control, vol. 1, Dec 1997, pp. 866–871.
  • [12] ——, “State estimation under bit-rate constraints,” in IEEE Conference on Decision and Control, vol. 1, Dec 1998, pp. 251–256.
  • [13] N. G. Dokuchaev and A. V. Savkin, “Recursive state estimation via limited capacity communication channels,” in Proceedings of the 38th IEEE Conference on Decision and Control, vol. 5.   IEEE, 1999, pp. 4929–4932.
  • [14] S. C. Smith and P. Seiler, “Estimation with lossy measurements: jump estimators for jump systems,” IEEE Trans. Autom. Control, vol. 48, no. 12, pp. 2163–2171, 2003.
  • [15] A. S. Matveev and A. V. Savkin, “The problem of state estimation via asynchronous communication channels with irregular transmission times,” IEEE Trans. Autom. Control, vol. 48, no. 4, pp. 670–676, 2003.
  • [16] G. M. Lipsa and N. C. Martins, “Remote state estimation with communication costs for first-order lti systems,” IEEE Trans. Autom. Control, vol. 56, pp. 2013–2025, 2011.
  • [17] J. Chakravorty and A. Mahajan, “Fundamental limits of remote estimation of autoregressive markov processes under communication constraints,” IEEE Trans. Autom. Control, vol. 62, pp. 1109–1123, 2017.
  • [18] A. Nayyar, T. Basar, D. Teneketzis, and V. V. Veeravalli, “Optimal strategies for communication and remote estimation with an energy harvesting sensor,” IEEE Trans. Autom. Control, vol. 58, pp. 2246–2259, 2013.
  • [19] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of the wiener process over a channel with random delay,” in IEEE Inter. Symp. Inf. Theory (ISIT), Jun. 2017, pp. 321–325.
  • [20] T. Linder and G. Lugosi, “A zero-delay sequential scheme for lossy coding of individual sequences,” IEEE Trans. Inf. Theory, vol. 47, pp. 2533–2538, 2001.
  • [21] T. Weissman and N. Merhav, “On limited-delay lossy coding and filtering of individual sequences,” IEEE Trans. Inf. Theory, vol. 48, pp. 721–732, 2002.
  • [22] T. Weissman and N. Merhav, “Universal prediction of individual binary sequences in the presence of noise,” IEEE Trans. Inf. Theory, vol. 47, pp. 2151–2173, 2001.
  • [23] S. Matloub and T. Weissman, “Universal zero-delay joint source - channel coding,” IEEE Trans. Inf. Theory, vol. 52, pp. 5240–5249, 2006.
  • [24] A. K. Gorbunov and M. S. Pinsker, “Nonanticipatory and prognostic epsilon entropies and message generation rates,” Problems Inform. Transmission, vol. 9, pp. 184–191, 1973.
  • [25] ——, “Prognostic epsilon entropy of a gaussian message and a gaussian source,” Problems Inform. Transmission, vol. 10, pp. 93–109, 1974.
  • [26] P. Stavrou, C. K. Kourtellaris, and C. D. Charalambous, “Information nonanticipative rate distortion function and its applications,” CoRR, vol. abs/1405.1593, 2014.
  • [27] P. A. Stavrou, J. Østergaard, and C. D. Charalambous, “Zero-delay rate distortion via filtering for vector-valued gaussian sources,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 5, pp. 841–856, Oct 2018.
  • [28] P. A. Stavrou, T. Charalambous, C. D. Charalambous, and S. Loyka, “Optimal estimation via nonanticipative rate distortion function and applications to time-varying gauss–markov processes,” SIAM Journal on Control and Optimization, vol. 56, no. 5, pp. 3731–3765, 2018.
  • [29] H. Viswanathan and T. Berger, “Sequential coding of correlated sources,” IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 236–246, 2000.
  • [30] N. Ma and P. Ishwar, “On delayed sequential coding of correlated sources,” IEEE Transactions on Information Theory, vol. 57, no. 6, pp. 3763–3782, June 2011.
  • [31] A. Khina, V. Kostina, A. Khisti, and B. Hassibi, “Tracking and control of gauss–markov processes over packet-drop channels with acknowledgments,” IEEE Transactions on Control of Network Systems, vol. 6, no. 2, June 2019.
  • [32] A. Khina, A. Khisti, V. Kostina, and B. Hassibi, “Sequential coding of Gauss-Markov sources with packet erasures and feedback,” in IEEE Inf. Theory Workshop (ITW), Nov. 2017, pp. 529–530.
  • [33] A. Kipnis and G. Reeves, “Gaussian approximation of quantization error for estimation from compressed data,” in IEEE International Symposium on Information Theory, ISIT, 2019, pp. 2029–2033.
  • [34] D. F. Delchamps, “Extracting state information from a quantized output record,” Systems and Control Letters, vol. 13, no. 5, pp. 365–372, December 1989.
  • [35] V. S. Borkar and S. K. Mitter, LQG Control with Communication Constraints.   Boston, MA: Springer US, 1997, pp. 365–373.
  • [36] W. S. Wong and R. W. Brockett, “Systems with finite communication bandwidth constraints. ii. stabilization with limited information feedback,” IEEE Transactions on Automatic Control, vol. 44, no. 5, pp. 1049–1053, 1999.
  • [37] G. N. Nair and R. J. Evans, “Communication-limited stabilization of linear systems,” in Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No. 00CH37187), vol. 1.   IEEE, 2000, pp. 1005–1010.
  • [38] D. Liberzon, “On stabilization of linear systems with limited information,” IEEE Transactions on Automatic Control, vol. 48, no. 2, pp. 304–307, 2003.
  • [39] K. You and L. Xie, “Minimum data rate for mean square stabilization of discrete lti systems over lossy channels,” IEEE Transactions on Automatic Control, vol. 55, no. 10, pp. 2373–2378, 2010.
  • [40] S. Yuksel, “Stochastic stabilization of noisy linear systems with fixed-rate limited feedback,” IEEE Transactions on Automatic Control, vol. 55, no. 12, pp. 2847–2853, 2010.
  • [41] S. Yuksel and S. P. Meyn, “Random-time, state-dependent stochastic drift for markov chains and application to stochastic stabilization over erasure channels,” IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 47–59, 2012.
  • [42] S. Yuksel, “Characterization of information channels for asymptotic mean stationarity and stochastic stability of nonstationary/unstable linear systems,” IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6332–6354, 2012.
  • [43] ——, “Stationary and ergodic properties of stochastic nonlinear systems controlled over communication channels,” SIAM Journal on Control and Optimization, vol. 54, no. 5, pp. 2844–2871, 2016.
  • [44] H. V. Poor, An Introduction to Signal Detection and Estimation (2nd Edition).   Berlin, Heidelberg: Springer-Verlag, 1994.
  • [45] A. Gersho and R. M. Gray, Vector quantization and signal compression.   Springer Science & Business Media, 2012, vol. 159.
  • [46] P. Mayekar and H. Tyagi, “RATQ: A universal fixed-length quantizer for stochastic optimization,” arXiv:1908.08200, 2019.
  • [47] T. Cover and J. Thomas, Elements of Information Theory, ser. A Wiley-Interscience publication.   Wiley, 2006.