Tracking an Auto-Regressive Process with Limited Communication per Unit Time
Abstract
Samples from a high-dimensional AR[1] process are observed by a sender that can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme that uses the communication slots between sampling instants to refine estimates of the latest sample, and study the following question: is it better to accumulate the communication of multiple slots and send better-refined estimates, making the receiver wait longer for every refinement, or to be fast but loose and send new information in every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically as the dimension grows. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality; they typically have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.
I Introduction
We consider the setting of real-time decision systems based on remotely sensed observations. In this setting, the decision maker needs to track the remote observations with high precision and in a timely manner. These are competing requirements, since high-precision tracking requires a larger number of bits to be communicated, resulting in larger transmission delays and increased staleness of information. Towards this larger goal, we study the following problem.
Consider a discrete time first-order auto-regressive (AR[1]) process $\{X_t\}_{t \in \mathbb{Z}_+}$ taking values in $\mathbb{R}^n$. A sensor draws a sample from this process periodically, once every $s$ time-slots. In each of these time-slots, the sensor can send $nR$ bits to a center. The center seeks to form an estimate $\hat X_t$ of $X_t$ at every time $t$, with small mean square error (MSE). Specifically, we are interested in minimizing the time-averaged error, to enable timely and accurate tracking of $\{X_t\}$.
We propose and study a successive update scheme where the encoder computes the error in the decoder's estimate of the latest sample and sends its quantized value to the decoder. The decoder adds this value to its previous estimate to update the estimate of the latest sample, and uses the result to estimate the current value using a linear predictor. We instantiate this scheme with a general gain-shape quantizer for error quantization.
Note that we can send this update several times between two sampling instants. In particular, our interest is in comparing a fast but loose scheme, where an update is sent in every slot, with a slower scheme that sends an update once every $p$ communication slots. The latter allows the encoder to use more bits per update, but the decoder needs to wait longer for each refinement. We analyze this scheme in a universal setting and show that the fast but loose successive update scheme, used with an appropriately selected quantizer, is asymptotically optimal.
To show this optimality, we use a random construction for the quantizer, based on the spherical codes given in [1, 2]. Roughly speaking, at rate $R$ bits per dimension, this ideal quantizer $Q$ yields
$\frac{1}{n}\,\mathbb{E}\big[\|Q(Y) - Y\|^2\big] \lesssim 2^{-2R}\cdot \frac{1}{n}\,\|Y\|^2$
for every uniformly bounded input $Y$. However, in practice, at finite $n$, such quantizers need not exist. Most practical vector quantizers have an extra additive error $\varepsilon$; that is, the error bound takes the form
$\frac{1}{n}\,\mathbb{E}\big[\|Q(Y) - Y\|^2\big] \le \delta(R)\cdot \frac{1}{n}\,\|Y\|^2 + \varepsilon.$
We present our analysis for such general quantizers. Interestingly, for such a quantizer (which is all we have at a finite $n$), the optimal choice of the update period $p$ can differ from $p=1$. Our analysis provides a theoretically sound guideline for choosing the frequency of updates for practical quantizers.
Our work relates to a large body of literature ranging from real-time compression to control and estimation over networks. The structure of real-time encoders for source coding has been studied in [3, 4, 5, 6, 7, 8, 9]. The general structure of real-time encoders for Markov sources is studied for communication over error-free channels in [3] and over noisy channels in [4, 5]. A similar structural result for optimal encoders and decoders which are restricted to be causal is given in [6]. Furthermore, structural results in the context of optimal zero-delay coding of correlated sources are available in [7, 9, 8]. The setup in all these works is different from the problem we consider, and the results do not extend to our problem.
Problems of remote estimation under various kinds of communication constraints have been studied in [10, 11, 12, 13, 14, 15]. This line of work proposes several Kalman-like recursive estimation algorithms and evaluates their performance. In a related thread, [16, 17, 18] study remote estimation under communication and other related constraints using tools from dynamic programming and stochastic control. However, in all these works the role of channel delay is slightly different from that in our setting. Furthermore, the specific problem of the choice of quantizer that we consider has not been examined. More recently, [19] studied remote estimation of the Wiener process over channels with random delays and proved that the optimal sampling policy is a threshold-based policy. This work, like some of the works cited above, assumes real-valued transmissions and does not take into account any quantization effects.
In more information theoretic settings, sequential coding for individual sequences under delay constraints was studied in [20, 21, 22, 23]. Closer to our work, the causal (non-anticipatory) rate-distortion function for a stochastic process goes back to the early works [24, 25]. Recent works [26, 27, 28] consider the specific case of auto-regressive and Gauss-Markov processes and use the general formula in these early works to establish the asymptotic optimality of a simpler information structure for the encoders (the optimal decoder structure is straightforward). We note that related formulations have been studied for simple settings of two or three iterations in [29, 30], where interesting encoder structures emerge in which previous communication is used for the next sample as well. Although some of these works propose specific optimal schemes, the key results in this line of research provide an expression for the rate-distortion function as an optimization problem, solving which would provide guidelines for a concrete scheme. In contrast, motivated by problems of estimation over an erasure channel, [31] provides an asymptotically optimal scheme that roughly uses Gaussian codebooks to quantize the innovation errors between the encoder's observation and the decoder's estimate. Our work is closest to [31], but differs in an important aspect from all the works mentioned in this research thread: we take transmission delays into account. In particular, in our formulation the decoder estimation and communication delays are on the same time-scale. The decoder must provide an estimate at every time instant, and a longer codeword will result in a longer wait for the decoder to get complete information.
Nonetheless, our converse bound is derived using similar methods as [32]. Even the achievability part of our proof draws from [32], but there is a technical caveat. Note that after the first round of quantization, the error vector need not be Gaussian, and the analysis in [32] can only be applied after showing closeness of the error vector distribution to a Gaussian in the Wasserstein distance of order $2$. While the original proof [32] overlooks this technical point, the gap can be filled using a recent result from [33] if spherical codes are used. We follow an alternative approach instead, and show a direct analysis using vector quantizers.
In addition, there is a large body of work on control problems over rate-limited communication channels (see [34, 35, 36, 37, 38, 39, 40, 41, 42, 43]). This line of work implicitly requires handling communication delays in the construction of estimators. However, the simple formulation we consider seems to be missing, and the results in this long line of work do not resolve the questions we raise.
Our main contributions in this paper are as follows. We present a heuristically appealing encoder structure which we show to be asymptotically optimal in the dimension of the observation. Specifically, we propose to send successive updates that refine the estimates of the latest sample at the decoder. It is important to note that we quantize the estimation error at the decoder, instead of quantizing the innovation sequence formed at the encoder. Although the optimal MMSE decoder involves taking a conditional expectation, we use a simple decoder with a linear structure. Yet, we show that this decoder is asymptotically optimal. We then instantiate this general scheme with spherical codes for quantizers to obtain a universal scheme. In particular, we consider general gain-shape quantizers and develop a framework to analyze their performance. One interesting result we present shows that the tradeoff between the accuracy and the frequency of the updates must be carefully balanced based on the "bias" (additive error) of the quantizer used.
We present our problem formulation in the next section. Section III presents a discussion of our achievability scheme, followed by the main results in the subsequent section. Section V provides a detailed analysis of our scheme, which we further build on in Section VI to obtain our asymptotic achievability results. We prove our converse bound in Section VII, and conclude with a discussion of extensions of our results in the final section.
II Problem Formulation
We begin by providing a formal description of our problem; different components of the model are presented in separate sections. Throughout the remainder of this paper, the set of real numbers is denoted by $\mathbb{R}$, the $n$-dimensional Euclidean space is denoted by $\mathbb{R}^n$ and the associated Euclidean norm by $\|\cdot\|$, the set of positive integers is denoted by $\mathbb{N}$, the set of non-negative integers is denoted by $\mathbb{Z}_+$, the set of consecutive positive integers $\{1, \ldots, k\}$ is denoted by $[k]$, and an identity matrix of size $n$ is denoted by $I_n$.
II-A Observed random process and its sampling
For $n \in \mathbb{N}$, we consider a discrete time auto-regressive process of order 1 (AR[1] process) $\{X_t\}_{t \in \mathbb{Z}_+}$ taking values in $\mathbb{R}^n$,
$X_t = \alpha X_{t-1} + \xi_t, \quad t \in \mathbb{N},$ (1)
where $\alpha \in (0,1)$ and $\{\xi_t\}_{t \in \mathbb{N}}$ is an independent and identically distributed (i.i.d.) random sequence with zero mean and covariance matrix $(1-\alpha^2)\sigma^2 I_n$. For simplicity, we assume that $X_0$ is a zero mean random variable with covariance matrix $\sigma^2 I_n$. This implies that the per-dimension variance of $X_t$ is $\sigma^2$ for all $t$. In addition, we assume that $X_t$ has a bounded fourth moment at all times $t$; specifically, we let $\kappa$ denote a uniform upper bound on the per-dimension fourth moment of $X_t$.
It is clear that $\{X_t\}_{t \in \mathbb{Z}_+}$ is a Markov process. We denote the set of $n$-dimensional processes satisfying the assumptions above by $\mathcal{P}_n$, and the class of all such processes for different choices of dimension by $\mathcal{P}$.
This discrete time process is sub-sampled periodically at sampling frequency $1/s$, for some $s \in \mathbb{N}$, to obtain the samples $X_{ks}$, $k \in \mathbb{Z}_+$.
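To make the source model concrete, here is a minimal Python sketch of the process and its sub-sampling (the Gaussian innovations and all numeric values are illustrative assumptions; the model only requires zero mean, the stated covariances, and bounded fourth moments):

```python
import numpy as np

def ar1_process(n, T, alpha, sigma, rng):
    """Simulate the AR[1] process X_t = alpha * X_{t-1} + xi_t in R^n.

    Innovations are Gaussian with covariance (1 - alpha**2) * sigma**2 * I_n,
    so the per-dimension variance of X_t stays sigma**2 for all t.
    """
    X = np.empty((T + 1, n))
    X[0] = rng.normal(scale=sigma, size=n)
    xi_scale = sigma * np.sqrt(1.0 - alpha ** 2)
    for t in range(1, T + 1):
        X[t] = alpha * X[t - 1] + rng.normal(scale=xi_scale, size=n)
    return X

# Sub-sample once every s time slots: the encoder sees X_0, X_s, X_{2s}, ...
rng = np.random.default_rng(0)
X = ar1_process(n=128, T=60, alpha=0.9, sigma=1.0, rng=rng)
s = 5
samples = X[::s]
```

The innovation scale $\sigma\sqrt{1-\alpha^2}$ is exactly what keeps the per-dimension variance of $X_t$ equal to $\sigma^2$ for all $t$.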
II-B Encoder description
The sampled process is passed to an encoder which converts it to a bit stream. The encoder operates in real time and sends $nsR$ bits between any two sampling instants. Specifically, the encoder is given by a sequence of mappings $\{\varphi_k\}_{k \in \mathbb{Z}_+}$, where the mapping $\varphi_k$ at the sampling instant $ks$ takes the samples seen until time $ks$ to the codeword transmitted during the $k$th sampling interval.
The encoder output for the $k$th interval is denoted by the codeword $C_k$. We represent this codeword by an $s$-length sequence of binary strings $C_k = (C_{k,1}, \ldots, C_{k,s})$, where each term takes values in $\{0,1\}^{nR}$. For $k \in \mathbb{Z}_+$ and $j \in [s]$, we can view the binary string $C_{k,j}$ as the communication sent in the $j$th slot of the $k$th interval. We elaborate on the communication channel next.
II-C Communication channel
The output bit-stream of the encoder is sent to the receiver via an error-free communication channel. Specifically, we assume slotted transmission with synchronization, where in each slot the transmitter sends $nR$ bits of communication error-free. That is, we are allowed to send $R$ bits per dimension, per time slot. Note that there is a delay of one time-unit (corresponding to one slot) in the transmission of each batch of $nR$ bits; therefore, the bits transmitted at time $t$ are received at time instant $t+1$. Throughout, the transmit and receive times of the strings $C_{k,j}$ are understood accordingly.
II-D Decoder description
We describe the operation of the receiver at a time $t$ such that $ks \le t < (k+1)s$. Upon receiving the codewords $C_0, \ldots, C_{k-1}$ and the portion of the partial codeword $C_k$ available at time $t$, the decoder estimates the current state of the process using an estimator mapping $\psi_t$.
We denote the overall communication received by the decoder until time instant $t$ by $C^{t-1}$ (the time index corresponds to the transmission time of the codewords, whereby the communication received till time $t$ was transmitted till time $t-1$). Further, we denote by $\hat X_t$ the real-time causal estimate of $X_t$ formed at the decoder at time $t$. Thus, the overall real-time causal estimation scheme is described by the mappings $(\varphi, \psi)$. It is important to note that the communication available to the decoder at time $t$ can only depend on samples taken up to time $t-1$. As a convention, we assume that $\hat X_0 = 0$.
II-E Performance metrics
We call the encoder-decoder mapping sequence $(\varphi, \psi)$ a tracking code of rate $R$ and sampling period $s$. The tracking error of our tracking code at time $t$ for a process in $\mathcal{P}_n$ is measured by the mean squared error (MSE) per dimension, $\frac{1}{n}\,\mathbb{E}\big[\|X_t - \hat X_t\|^2\big]$.
Our goal is to design tracking codes with low average tracking error $\frac{1}{T}\sum_{t=0}^{T-1} \frac{1}{n}\,\mathbb{E}\big[\|X_t - \hat X_t\|^2\big]$.
For technical reasons, we restrict ourselves to a finite time horizon setting. For the most part, the time horizon $T$ will remain fixed and will be omitted from the notation. Instead of working with the mean-square error, a more convenient parameterization for us will be that of accuracy, defined next.
Definition 1 (Maxmin tracking accuracy).
The worst-case tracking accuracy attained by a tracking code $(\varphi, \psi)$ for the family $\mathcal{P}_n$ is its tracking accuracy for the worst process in $\mathcal{P}_n$.
The maxmin tracking accuracy $\delta^*(n, T, R, s)$ for $\mathcal{P}_n$ at rate $R$ and sampling period $s$ is given by the supremum of the worst-case tracking accuracy,
where the supremum is over all tracking codes $(\varphi, \psi)$ of rate $R$ and sampling period $s$.
The maxmin tracking accuracy is the fundamental quantity of interest for us. Recall that $n$ denotes the dimension of the observations in $\mathcal{P}_n$ and $T$ the time horizon. However, we will only characterize $\delta^*(n, T, R, s)$ asymptotically in $n$ and $T$. Specifically, we define the asymptotic maxmin tracking accuracy as
$\delta^*(R, s) := \lim_{T \to \infty} \liminf_{n \to \infty} \delta^*(n, T, R, s).$
We will provide a characterization of $\delta^*(R, s)$ and present a sequence of tracking codes that attains it. In fact, the tracking code we use is an instantiation of our successive update scheme, which we describe in the next section. It is important to note that our results may not hold if we switch the order of limits above: we need very large codeword lengths depending on a fixed finite time horizon $T$.
III The Successive Update Scheme
In this section, we present our main contribution, namely the successive update tracking code. Before we describe the scheme completely, we present its different components. In every communication slot, the transmitter gets an opportunity to send $nR$ bits. The transmitter may use it to send any information about a previously seen sample. There are various options for the encoder. For instance, it may use the current slot to send some information about a sample it had seen earlier. Or it may use all the slots between two sampling instants to send a quantized version of the latest sample. Interestingly, it will be seen (quite straightforwardly) that there are not many options for the decoder; it is roughly fixed once the encoder is chosen.
III-A Decoder structure
Once the quantized information is sent by the transmitter, at the receiver end, the decoder estimates the state $X_t$ using the codewords received until time $t$. Since we are interested in forming estimates with small MSE, the decoder simply forms the minimum mean square error (MMSE) estimate using all the observations received until that point. Specifically, denoting by $\hat X_t$ the MMSE estimate formed from the communication $C^{t-1}$ received before time $t$, we know (cf. [44]) that $\hat X_t = \mathbb{E}\big[X_t \mid C^{t-1}\big]$.
The following result presents a simple structure for for our AR[1] model.
Lemma 1 (MMSE Structure).
The MMSE estimates $\hat X_t$ and $\hat X_{ks}$, respectively, of the state $X_t$ and the latest sample $X_{ks}$ at any time $t \in \{ks, \ldots, (k+1)s - 1\}$ using the communication $C^{t-1}$ are related as
$\hat X_t = \alpha^{t - ks}\, \hat X_{ks}.$
Proof:
From the evolution of the AR[1] process, for $t \in \{ks, \ldots, (k+1)s - 1\}$, the state $X_t$ can be expressed in terms of the latest sample $X_{ks}$ as
$X_t = \alpha^{t - ks}\, X_{ks} + \sum_{i=ks+1}^{t} \alpha^{t-i}\, \xi_i,$ (2)
where the innovation terms $\xi_i$, $i > ks$, are independent of the process samples seen until time $ks$. By our specification, the communication $C^{t-1}$ received until time $t$ depends only on the samples taken until time $ks$; in particular, $\mathbb{E}[\xi_i \mid C^{t-1}] = 0$ for every $i > ks$. Thus, taking conditional expectations on both sides of (2), we get
$\mathbb{E}[X_t \mid C^{t-1}] = \alpha^{t - ks}\, \mathbb{E}[X_{ks} \mid C^{t-1}],$
since the innovation terms vanish. ∎
Therefore, the optimal strategy for the decoder is to use the communication sent to form an estimate for the latest sample and then scale it to form the estimate of the state at the current time instant.
III-B Encoder structure: Refining the error successively
The structure of the decoder exposed in Lemma 1 gives an important insight for encoder design: the communication sent between two sampling instants is used only to form estimates of the latest sample. In particular, the communication transmitted at time $t$ must be chosen to refine the previous estimate of the latest sample into an improved one. This principle can be applied (as a heuristic) for any other form of the estimate as follows. Let $Y_t$ denote the estimate for the latest sample $X_{ks}$ formed at the receiver at time $t$ (which need not be the MMSE estimate). Our encoder computes the error in the receiver's estimate of the last process sample at each time instant. Denoting the error at time $t$ by $e_t = X_{ks} - Y_t$, the encoder quantizes this error and sends it as communication.
Simply speaking, our encoder computes and quantizes the error in the current estimate of the last sample at the decoder, and sends it to the decoder to enable the refinement of the estimate in the next time slot. While we have not been able to establish optimality of this encoder structure, our results will show its optimality asymptotically, in the limit as the dimension goes to infinity.
Even within this structural simplification, a very interesting question remains. Since the process is sampled once in $s$ time slots, we have, potentially, $nsR$ bits to encode the latest sample. At any time $t$, the receiver has access to the complete codewords for all previous samples and the partial codeword received so far for the latest one. A simple approach for the encoder is to use the complete codeword to express the latest sample, in which case the decoder ignores the partial codewords. This approach will result in slow but very accurate updates of the sample estimates. An alternative fast but loose approach will send quantizer codewords to refine estimates in every communication slot. Should we prefer fast but loose estimates or slow but accurate ones? Our results will shed light on this conundrum.
III-C The choice of quantizers
In our description of the encoder structure above, we did not specify a key design choice, namely the choice of the quantizer. We will restrict ourselves to using the same quantizer to quantize the error in each round of communication. The precision of this quantizer will depend on whether we choose a fast but loose paradigm or a slow but accurate one. However, the overall structure will remain the same. Roughly speaking, we allow any gain-shape quantizer [45], which separately sends the quantized values of the gain $\|y\|$ and the shape $\frac{y}{\|y\|}$ for an input $y$. Formally, we use the following abstraction.
Definition 2 ($(\varepsilon,\delta)$-quantizer family).
Fix $\varepsilon \ge 0$ and $\delta_0 \in (0,1)$. For $n \in \mathbb{N}$ and $M > 0$, a quantizer with dynamic range $M$, specified by a (possibly randomized) mapping $Q$ whose output can be represented using $N$ bits, constitutes an $N$-bit $(\varepsilon, \delta_0)$-quantizer if for every vector $y$ such that $\frac{1}{n}\|y\|^2 \le M^2$, we have
$\frac{1}{n}\,\mathbb{E}\big[\|Q(y) - y\|^2\big] \le \delta_0 \cdot \frac{\|y\|^2}{n} + \varepsilon.$
Further, for a mapping $\delta(\cdot)$, which is a decreasing function of the rate $R$, a family of quantizers constitutes an $(\varepsilon,\delta)$-quantizer family if for every rate $R$ the corresponding quantizer constitutes an $nR$-bit $(\varepsilon, \delta(R))$-quantizer.
The expectation in the previous definition is taken with respect to the randomness in the quantizer, which is assumed to be shared between the encoder and the decoder for simplicity. The parameter $M$, termed the dynamic range of the quantizer, specifies the domain of the quantizer. When the input does not satisfy $\frac{1}{n}\|y\|^2 \le M^2$, the quantizer simply declares a failure, which we denote by $\perp$. Our tracking code may use any such $(\varepsilon,\delta)$-quantizer family. It is typical in any construction of a gain-shape quantizer to have a finite dynamic range $M$ and a nonzero bias $\varepsilon$. Our analysis for finite $n$ will apply to any such $(\varepsilon,\delta)$-quantizer family and, in particular, will bring out the role of the "bias" $\varepsilon$. However, when establishing our optimality result, we instantiate it using a random spherical code to get the desired performance.
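To make this abstraction concrete, here is a minimal Python sketch of a gain-shape quantizer exposing the interface assumed above (the uniform gain code and the random shape codebook are illustrative choices, not the construction used in our proofs, which appears in Lemma 9; `FAIL` plays the role of the failure symbol $\perp$):

```python
import numpy as np

FAIL = None  # stands in for the failure symbol, returned when the input is out of range

def gain_shape_quantize(y, M, gain_bits, shape_bits, rng):
    """Illustrative gain-shape quantizer with dynamic range M.

    Codes y = ||y|| * (y / ||y||) by a uniform quantizer for the gain on
    [0, M*sqrt(n)] and a random codebook for the shape. In a real scheme
    the codebook randomness would be shared by encoder and decoder.
    """
    n = y.size
    gain = np.linalg.norm(y)
    if gain > M * np.sqrt(n):
        return FAIL
    # Uniform gain quantizer: 2**gain_bits levels on [0, M*sqrt(n)].
    step = M * np.sqrt(n) / 2 ** gain_bits
    gain_hat = (np.floor(gain / step) + 0.5) * step
    # Random shape codebook: 2**shape_bits unit vectors; pick the closest
    # (maximal inner product is equivalent to minimal distance on the sphere).
    codebook = rng.normal(size=(2 ** shape_bits, n))
    codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
    shape = y / gain if gain > 0 else codebook[0]
    best = codebook[np.argmax(codebook @ shape)]
    return gain_hat * best
```

In the scheme below, such a function stands in for the map $Q$; note that the number of shape bits would scale as $nR$ for the rates considered here, so explicit codebooks of this kind are practical only for moderate sizes.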
III-D Description of the successive update scheme
All the conceptual components of our scheme are ready. We use the structure of Lemma 1 and focus only on updating the estimates of the latest observed sample at the decoder. Our encoder successively updates the estimate of the latest sample at the decoder by quantizing and sending estimates for the errors $e_t$.
As discussed earlier, we must decide if we prefer a fast but loose approach or a slow but accurate approach for sending error estimates. To carefully examine this tradeoff, we opt for a more general scheme where the $nsR$ bits available between two samples are divided into $s/p$ sub-fragments of $nRp$ bits each, for a parameter $p$ that divides $s$. We use an $nRp$-bit quantizer to refine the error estimate for the latest sample (obtained at time $ks$) every $p$ slots, and send the resulting quantizer codewords as partial tracking codewords. Specifically, the $k$th codeword transmission interval is divided into sub-fragments of $p$ consecutive slots each,
and the $j$th quantizer codeword is transmitted in the communication slots of the $j$th sub-fragment.
At time instant $ks + jp$ the decoder receives the $j$th sub-fragment of $nRp$ bits, and uses it to refine the estimate of the latest source sample $X_{ks}$. Note that the fast but loose and the slow but accurate regimes described above correspond to $p = 1$ and $p = s$, respectively. In the middle of a sub-fragment, the decoder ignores the partially received quantizer codeword and retains the estimate of $X_{ks}$ formed at the end of the previous sub-fragment. It forms an estimate of the current state by simply scaling the sample estimate by a factor of $\alpha^{t - ks}$, as suggested by Lemma 1.
Finally, we impose one additional simplification on the decoder structure. Instead of using MMSE estimates for the latest sample, we simply update the estimate by adding to it the quantized value of the error. Thus, the decoder has a simple linear structure.
We can use any $nRp$-bit quantizer $Q$ for the $n$-dimensional error vector (with a slight abuse of notation, we use $Q$ to denote the quantizer regardless of its precision), whereby this scheme can be easily implemented in practice if $Q$ can be implemented. For instance, we can use any standard gain-shape quantizer. The performance of most quantizers can be analyzed explicitly to render them an $(\varepsilon,\delta)$-quantizer family for an appropriate $\varepsilon$ and function $\delta$. Later, when analyzing the scheme, we will consider a $Q$ coming from an $(\varepsilon,\delta)$-quantizer family and present a theoretically sound guideline for choosing $p$.
Recall that we denote the estimate of the latest sample formed at the decoder at time $t$ by $Y_t$. We start by initializing $Y_0 = 0$ and then proceed using the encoder and the decoder algorithms outlined above. Note that our quantizer may declare the failure symbol $\perp$, in which case the decoder must still yield a nominal estimate. We will simply declare the estimate to be $0$ once a failure happens. (In the analysis, we account for all such events as error; only the probability of failure will determine the contribution of this part to the MSE, since the process is mean-square bounded.)
We give a formal description of our encoder and decoder algorithms below.
The encoder.
1) Initialize $k = 0$, $j = 1$, $Y_0 = 0$.
2) At time $t = ks + (j-1)p$, use the decoder algorithm (described below) to form the estimate $Y_t$ and compute the error
$e_t = X_{ks} - Y_t,$ (3)
where $X_{ks}$ is the latest sample available at time $t$.
3) Quantize $e_t$ to $nRp$ bits as $Q(e_t)$.
4) If quantizer failure occurs, i.e., $Q(e_t) = \perp$, send $\perp$ to the receiver and terminate the encoder.
5) Else, send a binary representation of $Q(e_t)$ as the communication to the receiver over the next $p$ communication slots. (For simplicity, we do not account for the extra message symbol needed for sending $\perp$.)
6) If $j < s/p$, increase $j$ by $1$; else set $j = 1$ and increase $k$ by $1$. Go to Step 2.
The decoder.
1) Initialize $k = 0$, $j = 1$, $Y_0 = 0$.
2) At time $t = ks + jp$, if encoding failure has not occurred until time $t$, compute
$Y_t = Y_{t-p} + Q(e_{t-p})$
and output $\hat X_t = \alpha^{t - ks}\, Y_t$.
3) Else, if encoding failure has occurred and the symbol $\perp$ is received, declare $\hat X_{t'} = 0$ for all subsequent time instants $t'$.
4) At a time $t$ with $ks + (j-1)p < t < ks + jp$, output $\hat X_t = \alpha^{t - ks}\, Y_{ks + (j-1)p}$. (We ignore the partial quantizer codewords received till time $t$.)
5) If $j < s/p$, increase $j$ by $1$; else set $j = 1$ and increase $k$ by $1$, initializing the estimate of the new sample $X_{(k+1)s}$ to $\alpha^s$ times the final estimate of $X_{ks}$, in accordance with Lemma 1. Go to Step 2.
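Putting the pieces together, the following Python sketch simulates the $p$-SU loop end to end. It is our own simplified rendering of the algorithms above, not a verbatim implementation: it assumes that $p$ divides $s$, models transmission delay by delivering each quantized update $p$ slots after the corresponding error is computed, and replaces encoder failure by a zero update.

```python
import numpy as np

def run_p_su(X, s, p, alpha, quantize):
    """Sketch of the p-SU scheme: X has shape (T+1, n), samples at 0, s, 2s, ...

    Every p slots the encoder quantizes the decoder's current error in
    estimating the latest sample; the update is received p slots later,
    and the decoder predicts the current state by scaling the sample
    estimate with alpha**(t - tau), as in Lemma 1.
    """
    T = X.shape[0] - 1
    Y = np.zeros(X.shape[1])   # decoder's estimate of the latest sample X_tau
    tau = 0                    # time of the latest sample
    in_flight = None           # quantized update currently being transmitted
    mse = np.zeros(T + 1)
    for t in range(T + 1):
        if t > 0 and (t - tau) % p == 0 and in_flight is not None:
            Y = Y + in_flight  # update sent p slots ago has now fully arrived
            in_flight = None
        if t > 0 and t % s == 0:
            tau = t
            Y = alpha ** s * Y  # predict the fresh sample from the old estimate
        if (t - tau) % p == 0:
            q = quantize(X[tau] - Y)   # quantize the decoder's current error
            in_flight = q if q is not None else np.zeros_like(Y)
        mse[t] = np.mean((X[t] - alpha ** (t - tau) * Y) ** 2)
    return mse
```

For instance, `run_p_su(X, s=6, p=1, alpha=0.9, quantize=lambda e: e)` with a perfect (identity) quantizer isolates the error due to sampling and delay alone, while plugging in a quantizer like the gain-shape sketch above exposes the effect of quantization.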
IV Main Results
We present results in two categories. First, we provide an explicit formula for the asymptotic maxmin tracking accuracy $\delta^*(R,s)$. Next, we present a theoretically-founded guideline for selecting a good update period $p$ for the successive update scheme with an $(\varepsilon,\delta)$-quantizer family. Interestingly, the optimal choice may differ from the asymptotically optimal choice of $p=1$.
IV-A Characterization of the maxmin tracking accuracy
To describe our result, we define a function $\delta_0(R,s)$ of the rate $R$ and the sampling period $s$ as
(4)
Note that $\delta_0(R,s)$ is a decreasing function of the sampling period $s$. The result below shows that, for an appropriate choice of the quantizer, our successive update scheme with $p=1$ (the fast but loose version) achieves an accuracy of $\delta_0(R,s)$ asymptotically, universally for all processes in $\mathcal{P}$.
Theorem 2 (Lower bound for maxmin tracking accuracy: The achievability).
For every $R > 0$ and $s \in \mathbb{N}$, the asymptotic maxmin tracking accuracy is bounded below as
$\delta^*(R, s) \ge \delta_0(R, s).$
Furthermore, this bound can be attained by a successive update scheme with $p = 1$ and an appropriately chosen quantizer.
We provide a proof in Section VI. Note that while we assume that the per-dimension fourth moment of the processes in $\mathcal{P}$ is bounded, the asymptotic result above does not depend on that bound. Interestingly, the performance characterized above is the best possible.
Theorem 3 (Upper bound for maxmin tracking accuracy: The converse).
For every $R > 0$ and $s \in \mathbb{N}$, the asymptotic maxmin tracking accuracy is bounded above as
$\delta^*(R, s) \le \delta_0(R, s).$
Furthermore, the upper bound is obtained by considering a Gauss-Markov process.
We provide a proof in Section VII. Thus, $\delta^*(R,s) = \delta_0(R,s)$, with the fast but loose successive update scheme being universally (asymptotically) optimal and the Gauss-Markov process being the most difficult process to track. Clearly, the best possible choice of sampling period is $s = 1$, and the highest possible accuracy at rate $R$ is $\delta_0(R,1)$, whereby we cannot hope for an accuracy exceeding $\delta_0(R,1)$.
Alternatively, the results above can be interpreted as saying that we cannot subsample at a frequency lower than that dictated by (4) while attaining a desired tracking accuracy.
IV-B Guidelines for choosing a good $p$
The proof of Theorem 2 entails the analysis of the successive update scheme for $p = 1$. In fact, we can analyze this scheme for any $p$ dividing $s$ and for any $(\varepsilon,\delta)$-quantizer family; we term this tracking code the $p$-successive update ($p$-SU) scheme. This analysis provides a simple guideline for the optimal choice of $p$, depending on the performance of the quantizer.
However, there are some technical caveats. A quantizer family with dynamic range $M$ will operate only as long as the input $y$ satisfies $\frac{1}{n}\|y\|^2 \le M^2$. If an input outside this range is observed, the quantizer will declare $\perp$ and the tracking code encoder, in turn, will declare a failure. We denote by $\tau_\perp$ the stopping time at which encoder failure occurs for the first time.
Further, we denote by $\mathcal{E}_t$ the event that failure does not occur until time $t$, i.e., $\mathcal{E}_t = \{\tau_\perp > t\}$.
We characterize the performance of a $p$-SU scheme in terms of the probability of encoder failure within a finite time horizon $T$.
Theorem 4 (Performance of $p$-SU).
For fixed $p$ dividing $s$, consider the $p$-SU scheme with an $nRp$-bit $(\varepsilon,\delta)$-quantizer $Q$, and denote the corresponding tracking code by $(\varphi, \psi)$. Suppose that for a time horizon $T$, the tracking code satisfies $\Pr(\mathcal{E}_T) \ge 1 - \eta$. Then, the average MSE per dimension over the horizon is bounded above in terms of a sequence
satisfying the recursion established in Section V (see (6)), with the quantizer entering the bound only through $\delta(Rp)$ and $\varepsilon$, and with an additional term, arising from failure, proportional to $\sqrt{\eta}$ (via the Cauchy-Schwarz step in Lemma 5).
We remark that $\eta$ can be made small by choosing the dynamic range $M$ to be large for a quantizer family. Furthermore, the inequality in the upper bound for the MSE in the previous result (barring the dependence on $\eta$) comes from the inequality in the definition of an $(\varepsilon,\delta)$-quantizer, rendering it a good proxy for the performance of the quantizer. The interesting regime is that of very small $\eta$, where encoder failure does not occur during the time horizon of operation. If we ignore the dependence on $\eta$, the accuracy of the $p$-SU scheme depends neither on the dynamic range $M$ nor on the bound $\kappa$ for the fourth moment. Motivated by these insights, we define the accuracy-speed curve of a quantizer family as follows.
Definition 3 (The accuracy-speed curve).
For rate $R$, sampling period $s$, and $p$ dividing $s$, the accuracy-speed curve $p \mapsto \Lambda(p)$ for an $(\varepsilon,\delta)$-quantizer family is the accuracy guaranteed by Theorem 4 (in the small-$\eta$ regime) for the $p$-SU scheme using that family.
By Theorem 4, it is easy to see that the accuracy (precisely, the upper bound on the accuracy) of a $p$-SU scheme is better when $\Lambda(p)$ is larger. Thus, a good choice of $p$ for a given quantizer family is the one that maximizes $\Lambda(p)$ over the divisors $p$ of $s$.
We conclude by providing accuracy-speed curves for some illustrative examples. To build some heuristics, note that a uniform quantizer of the interval $[-1,1]$ using $R$ bits has squared error at most $2^{-2R}$. For a gain-shape quantizer, we express a vector $y$ as $y = \|y\| \cdot \frac{y}{\|y\|}$, where the shape vector $\frac{y}{\|y\|}$ has unit norm. An ideal shape quantizer (which can only be shown to exist asymptotically) using $R$ bits per dimension will satisfy a squared-error bound of roughly $2^{-2R}$ per dimension, similar to the scalar uniform quantizer. In one of the examples below, we consider gain-shape quantizers with such an ideal shape quantizer.
Example 1.
We begin by considering an ideal quantizer family with $\delta(R) = 2^{-2R}$ and $\varepsilon = 0$. In our asymptotic analysis, we will show roughly that such a quantizer with a very small $\varepsilon$ exists. For this ideal case, the accuracy-speed curve can be evaluated in closed form.
It can be seen that the resulting curve is decreasing in $p$, whereby the optimal choice of $p$ that maximizes it over $p \in [s]$ is $p = 1$. Heuristically, this justifies why the fast but loose successive update scheme is asymptotically optimal.
Example 2 (Uniform scalar quantization).
In this example, we consider a coordinate-wise uniform quantizer. Since we seek quantizers for inputs $y$ such that $\frac{1}{n}\|y\|^2 \le M^2$, we can only use a uniform quantizer of the interval $[-M\sqrt{n}, M\sqrt{n}]$ for each coordinate. For this quantizer, the parameters $\delta$ and $\varepsilon$ can be computed explicitly, whereby the accuracy-speed curve follows from Definition 3. Thus, once again, the optimal choice of $p$ is the one that maximizes the resulting accuracy-speed curve.
Example 3 (Gain-shape quantizer).
Consider the quantization of a vector $y$ with $\frac{1}{n}\|y\|^2 \le M^2$. The vector is quantized by a gain-shape quantizer which quantizes the norm $\|y\|$ and the shape $\frac{y}{\|y\|}$ of the vector separately. We use a uniform quantizer within a fixed range in order to quantize the norm, and an ideal shape quantizer is employed in quantizing the shape vector. Suppose that we allot $b$ bits out of the total budget of $nRp$ bits for norm quantization and the rest for shape quantization. Then, combining the two quantization errors,
we see that $\delta$ and $\varepsilon$ are both determined by the split $b$, and the accuracy-speed curve, too, depends on $b$.
Note that the optimal choice of $p$ in this case depends on the choice of $b$.
We illustrated the application of our analysis for idealized quantizers, but it can be used to analyze even very practical quantizers, such as the recently proposed almost-optimal quantizer in [46].
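The choice of $p$ suggested by these examples can also be explored empirically. The sketch below is a toy experiment (all parameters, and the noise model mimicking an $(\varepsilon,\delta)$-quantizer at rate $Rp$ with relative factor $\delta^p$ and fixed bias $\varepsilon$, are our own illustrative choices); it reuses the `ar1_process` and `run_p_su` sketches from the earlier sections:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, sigma, n, s, T = 0.9, 1.0, 128, 6, 600
X = ar1_process(n, T, alpha, sigma, rng)

def biased_quantizer(p, delta=0.5, eps=0.01):
    """Toy model of a biased quantizer at rate R*p: the relative error factor
    improves as delta**p with more bits per update, while the additive bias
    eps stays fixed regardless of p."""
    def q(e):
        m = e.size
        rel = np.sqrt(delta ** p / m) * np.linalg.norm(e) * rng.standard_normal(m)
        add = np.sqrt(eps) * rng.standard_normal(m)
        return e + rel + add   # residual error has per-dim MSE delta**p*||e||^2/m + eps
    return q

for p in (1, 2, 3, 6):
    mse = run_p_su(X, s=s, p=p, alpha=alpha, quantize=biased_quantizer(p))
    print(f"p = {p}: average per-dimension MSE = {mse.mean():.4f}")
```

Such a sweep is only a sanity check; the theoretically sound choice of $p$ remains the maximizer of the accuracy-speed curve of the actual quantizer family in use.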
V Analysis of the Successive Update Scheme
From the discussion in Section III, we observe that the successive update scheme is designed to refine the estimate of the latest sample within each sampling interval. This fact helps us establish a recursive relation for the estimation error across refinement instants, which is provided next.
Lemma 5.
For a time instant $t = ks + jp$ with $j \in [s/p]$, let $(\varphi, \psi)$ denote the tracking code of a $p$-SU scheme employing an $nRp$-bit $(\varepsilon,\delta)$-quantizer. Assume that $\Pr(\mathcal{E}_t) > 0$. Then, the per-dimension MSE of the sample estimate at time $t$ satisfies a recursive upper bound: conditioned on no failure, it contracts by a factor of $\delta(Rp)$ relative to its value at the previous refinement instant, up to the additive error $\varepsilon$, plus a term accounting for the failure event.
Proof:
From the evolution of the AR[1] process defined in (1), we see that $X_t = \alpha^{t-ks} X_{ks} + \sum_{i=ks+1}^{t} \alpha^{t-i}\xi_i$. Further, for the $p$-SU scheme, we know that $\hat X_t = \alpha^{t-ks}\, Y_t$ at each instant $t$. Therefore, the tracking error decomposes into the error in the sample estimate and the accumulated innovation.
Since the estimate $Y_t$ is a function of the samples seen until time $ks$, and the innovation sequence is independent of the past, the cross terms vanish and we obtain the per-dimension MSE as the sum of the sample-estimation error and the innovation variance.
Further, we divide the sample-estimation error into two terms based on the occurrence of the failure event:
(5)
Recall that at each refinement instant we refine the estimate $Y_{t-p}$ of $X_{ks}$ to $Y_t = Y_{t-p} + Q(e_{t-p})$. Upon substituting this expression for $Y_t$, the first term contracts by a factor of $\delta(Rp)$, up to the additive error $\varepsilon$;
here the identity uses the definition of the error given in (3), and the inequality holds since $Q$ is an $(\varepsilon,\delta)$-quantizer. Repeating the previous step recursively over the refinement instants of the current sampling interval,
we obtain the claimed contraction.
Moving to the error term when encoder failure occurs, recall that the decoder sets the estimate to $0$ in the event of an encoder failure. Thus, using the Cauchy-Schwarz inequality, this term is bounded in terms of the fourth-moment bound $\kappa$ and the failure probability.
Substituting the two bounds above in (5), we get the result. ∎
The following recursive bound can be obtained using almost the same proof as that of Lemma 5; we omit the details.
Lemma 6.
Let $(\varphi, \psi)$ denote the tracking code of a $p$-SU scheme employing an $nRp$-bit $(\varepsilon,\delta)$-quantizer, and assume that $\Pr(\mathcal{E}_T) > 0$. Then, the per-dimension MSE of the sample estimates at consecutive sampling instants satisfies a recursion of the form $a_{k+1} \le \lambda a_k + C$, with $\lambda < 1$ and $C$ determined by $\alpha$, $s$, $p$, $\delta(Rp)$, $\varepsilon$, and the failure probability.
We also need the following technical observation.
Lemma 7.
For a sequence $\{a_k\}_{k \ge 0}$ that satisfies the upper bounds
$a_{k+1} \le \lambda a_k + C, \quad k \ge 0,$
with constants $\lambda \in [0,1)$ and $C \ge 0$ such that $a_0$ is finite, we have
$\limsup_{K \to \infty} \frac{1}{K}\sum_{k=0}^{K-1} a_k \le \frac{C}{1 - \lambda}.$
Proof:
From the sequence of upper bounds, we can inductively show that
$a_k \le \lambda^k a_0 + C\,\frac{1 - \lambda^k}{1 - \lambda}.$
Averaging over the horizon $K$, we get
$\frac{1}{K}\sum_{k=0}^{K-1} a_k \le \frac{a_0}{K(1-\lambda)} + \frac{C}{1-\lambda}.$
From the finiteness of $a_0$ and the fact that $\lambda < 1$, the result follows by letting $K$ grow arbitrarily large. ∎
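A quick numeric sanity check of this averaging argument, in the notation of our statement of the lemma ($\lambda$, $C$, $a_k$), simulating the worst case where the recursion holds with equality:

```python
# If a_{k+1} <= lam * a_k + C with 0 <= lam < 1 and a_0 finite,
# the running average of a_k approaches C / (1 - lam).
lam, C, a, K = 0.8, 0.5, 10.0, 10_000
avg = 0.0
for _ in range(K):
    avg += a / K
    a = lam * a + C        # worst case: the bound holds with equality
print(avg, C / (1 - lam))  # the average settles near the limit as K grows
```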
We are now in a position to prove Theorem 4.
Proof of Theorem 4: We begin by noting that, without any loss of generality, we can restrict to time horizons $T$ that are integer multiples of $s$; this holds since the contribution of the error terms within one fixed interval is bounded. For such $T$, the time duration can be partitioned into the $T/s$ sampling intervals. Therefore, we can write the average MSE per dimension for the $p$-SU scheme over the horizon $T$ as an average of per-interval errors.
From the upper bound for the per-dimension MSE given in Lemma 5, we can bound each term in this average.
Summing the resulting expression over the refinement indices and the sampling intervals, and dividing by $T$, we get a bound on the average MSE.
It follows by Lemma 6 that
(6)
where the supremum on the right side is over sequences satisfying the recursion of Lemma 6.
We denote the right side of (6) by $\Delta(p)$. Noting that, by Lemma 7, any such sequence has its running average bounded by the corresponding limit $C/(1-\lambda)$,
we get the claimed bound on $\Delta(p)$,
which completes the proof. ∎
VI Asymptotic Achievability Using a Random Quantizer
With Theorem 4 at our disposal, the proof of achievability can be completed by fixing $p=1$ and showing the existence of an appropriate quantizer. However, we need to handle the failure event, and we address this first. The next result shows that the failure probability depends on the quantizer only through its dynamic range $M$.
Lemma 8.
For fixed $R$, $s$, and time horizon $T$, consider the $p$-SU scheme with $p=1$ and an $nR$-bit $(\varepsilon,\delta)$-quantizer with dynamic range $M$. Then, for every $\eta > 0$, there exists an $M_0$ independent of $n$ such that for all $M \ge M_0$, the probability of encoder failure until time $T$ is at most $\eta$.
Proof:
The event $\mathcal{E}_T$ (of encoder failure not happening until time $T$ for the successive update scheme) occurs when every error computed by the encoder satisfies the dynamic-range constraint $\frac{1}{n}\|e_t\|^2 \le M^2$ at all refinement instants until $T$. We bound the probability that a single error violates this constraint, using Chebyshev's inequality, in terms of its second moment and $M^2$.
We saw earlier, in the proof of Lemma 5, that this second moment depends only on the probability that failure does not occur until the previous refinement instant. Proceeding as in that proof, we get
a bound on the violation probability involving constants that do not depend on $M$. Therefore, there exists an $M_0$ independent of $n$ such that for all $M$ exceeding $M_0$ the violation probability at each instant is at most $\eta/T$,
which completes the proof by the union bound over the refinement instants until time $T$. ∎
The bound above is rather loose, but it suffices for our purpose. In particular, it says that we can choose $M$ sufficiently large to make the probability of failure until time $T$ less than any $\eta > 0$, whereby Theorem 4 can be applied by designing a quantizer for this $M$. Indeed, we can use the quantizer of the unit sphere from [1, 2], along with a uniform quantizer for the gain, to get the following performance. In fact, we will show that a deterministic quantizer with the desired performance exists. Note that we already considered such a quantizer in Example 3, but the analysis there was slightly loose and assumed the existence of an ideal shape quantizer.
Lemma 9.
For every $\varepsilon > 0$ and dynamic range $M$, there exists a deterministic $nR$-bit quantizer with dynamic range $M$ whose relative error factor is at most $2^{-2R}$ up to an additive error of $\varepsilon$, for all $n$ sufficiently large.
Proof:
We first borrow a classic construction from [1, 2], which gives us our desired shape quantizer. Denote by $\mathbb{S}^{n-1}$ the unit sphere $\{x \in \mathbb{R}^n : \|x\| = 1\}$. For every $\varepsilon' > 0$ and $n$ sufficiently large, it was shown in [1, 2] that there exist $2^{nR}$ unit vectors $c_1, \ldots, c_{2^{nR}}$ in $\mathbb{S}^{n-1}$ such that for every $x \in \mathbb{S}^{n-1}$ we can find a codeword $c_{i(x)}$ satisfying $\langle x, c_{i(x)} \rangle^2 \ge 1 - 2^{-2R} - \varepsilon'$.
Denoting $\beta := \sqrt{1 - 2^{-2R}}$, consider the shape quantizer from [2] given by $x \mapsto \beta\, c_{i(x)}$.
Note that we shrink the length of the codeword by a factor of $\beta$, which will be seen to yield the gain over the analysis in Example 3.
We append to this shape quantizer a uniform gain quantizer, which quantizes the interval $[0, M\sqrt{n}]$ uniformly into sub-intervals of equal length and represents the index of the sub-interval containing $\|y\|$ by its binary representation.
For every $y$ such that $\frac{1}{n}\|y\|^2 \le M^2$, we consider the quantizer that combines the quantized gain with the (shrunk) quantized shape.
For this quantizer, for every such $y$, the per-dimension MSE is at most $2^{-2R}\,\frac{\|y\|^2}{n}$ plus an additive term that can be driven below $\varepsilon$ by the choice of $\varepsilon'$ and the number of gain bits, where the first inequality in the calculation uses the covering property of the codebook.
Therefore, the combined construction constitutes an $nR$-bit quantizer with the claimed guarantee and dynamic range $M$, for all $n$ sufficiently large. Note that this quantizer is a deterministic one. ∎
Proof of Theorem 2: For any fixed $\eta > 0$ and horizon $T$, we can make the probability of failure until time $T$ less than $\eta$ by choosing the dynamic range $M$ sufficiently large. Further, for any fixed $\varepsilon > 0$, by Lemma 9, we can choose $n$ sufficiently large to get an $nR$-bit quantizer with additive error at most $\varepsilon$ for vectors $y$ with $\frac{1}{n}\|y\|^2 \le M^2$. Therefore, by Theorem 4 applied for $p=1$, we get an accuracy approaching $\delta_0(R,s)$.
The proof is completed upon taking the limits as $\eta$ and $\varepsilon$ go to $0$ (with $n$ and $T$ growing as required). ∎
VII Converse Bound: Proof of Theorem 3
The proof is similar to the converse proof in [32], but now we need to handle the delay per transmission. We rely on the properties of the entropy power of a random variable. Recall that for a continuous random variable $X$ taking values in $\mathbb{R}^n$, the entropy power of $X$ is given by
$N(X) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X)},$
where $h(X)$ is the differential entropy of $X$.
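As a quick sanity check of this normalization (a standard fact used repeatedly below), for a Gaussian vector $X \sim \mathcal{N}(0, \sigma^2 I_n)$ we have
$h(X) = \frac{n}{2}\,\ln\!\big(2\pi e\,\sigma^2\big)
\quad\Longrightarrow\quad
N(X) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X)} = \sigma^2,$
so the entropy power of a Gaussian equals its per-coordinate variance; for any other $X$ with the same per-coordinate second moment, $N(X) \le \sigma^2$.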
Consider a tracking code of rate $R$ and sampling period $s$, and a process in $\mathcal{P}_n$. We begin by noting that the state at time $t$ is related to the state at the latest sampling time as in (2),
where the accumulated noise is independent of the latest sample (and the past states). In particular, for $t = ks + j$, $j \in \{0, \ldots, s-1\}$, the MSE at time $t$ decomposes into the MSE for the sample $X_{ks}$ and the accumulated innovation variance,
where the first identity uses the orthogonality of the noise added in each round to the previous states and noise. Since the Gaussian distribution has the maximum differential entropy among all continuous random variables with a given variance, and the entropy power of a Gaussian random variable equals its variance, the innovation contribution can be bounded below in terms of entropy powers. Therefore, the previous bound for the tracking error yields
(7)
where the identity uses the assumption that the innovations are identically distributed for all $t$. Taking the average of these terms over $j \in \{0, \ldots, s-1\}$, we get an expression for the average tracking error over a sampling interval.
Note that the decoder outputs act as estimates of the sampled process which depend on the communication received by the decoder until the current time. In particular, the communication received by time $ks + j$ was sent as a function of the samples $X_0, X_s, \ldots, X_{ks}$ seen by then.
From here on, we proceed by invoking "entropy power bounds" for the MSE terms. For random variables $X$ and $U$ such that $X$ has a conditional density given $U$, the conditional entropy power is given by $N(X \mid U) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X \mid U)}$, where $h(X \mid U)$ denotes the conditional differential entropy. Bounding MSE terms by entropy power is a standard step that allows us to track the reduction in error due to a fixed amount of communication.
We begin by using the following standard bound (see [47, Chapter 10]), which follows simply by noting that the Gaussian maximizes differential entropy among all random variables with a given second moment, and that conditioning on an $N$-bit random variable reduces differential entropy by at most $N \ln 2$: for a continuous random variable $X$ and a discrete random variable $U$ taking at most $2^N$ values, let $\hat X$ be any function of $U$. Then, it holds that
$\frac{1}{n}\,\mathbb{E}\big[\|X - \hat X\|^2\big] \;\ge\; N(X \mid U) \;\ge\; 2^{-\frac{2N}{n}}\, N(X).$ (8)
We apply this result with the sample $X_{ks}$ in the role of $X$ and the communication received about it in the role of $U$. The previous bound and Jensen's inequality yield a recursive bound for the entropy power of the sample given the communication.
Next, we recall the entropy power inequality (cf. [47]): for independent $X$ and $Y$, $N(X+Y) \ge N(X) + N(Y)$. Noting that $X_{(k+1)s} = \alpha^s X_{ks} + \tilde\xi_{k+1}$, where the accumulated innovation $\tilde\xi_{k+1}$ is zero mean and independent of $X_{ks}$, and that the decoder estimate is a function of the communication, we obtain a lower bound on the conditional entropy power of $X_{(k+1)s}$,
where the intermediate identities use the scaling property $N(aX) = a^2 N(X)$ of the entropy power. Upon combining the bounds given above and simplifying, we get
(9)
Finally, note that the remaining terms are exactly the same as those considered in [32, eqn. (11e)], since they correspond to recovering a sample using communication that can depend on it. Therefore, a similar expression holds here for the sampled process. Using the recursive bounds for the tracking error in (7) and (8), we adapt the results of [32, eqn. (11)] to our case
to obtain a bound in terms of a quantity given by a recursion
with an appropriate initial condition.
The bound obtained above holds for any given process in $\mathcal{P}_n$. To obtain the best possible bound, we choose the process distribution to be Gaussian, since that maximizes the entropy power; specifically, we set the process to be the zero-mean Gauss-Markov process with the variance specified in Section II, for which the entropy power equals the variance. Thus, taking the supremum over all distributions on both sides of (9),
we obtain a recursion
whose limit can be evaluated as in [32, Corollary 1].
Therefore, we have obtained the desired upper bound on the asymptotic accuracy.
As the previous bound holds for all tracking codes, the claim of Theorem 3 follows. ∎
VIII Discussion
We restricted our treatment to an AR[1] process with uncorrelated components. This restriction is for clarity of presentation, and some of the results can be extended to AR[1] processes with correlated components. In this case, the decoder will be replaced by a Kalman-like filter in the manner of [27]. A natural extension of this work is the study of an optimum transmission strategy for an AR[$m$] process in the given setting. In an AR[$m$] process, the strategy of refining the latest sample is clearly not sufficient, as the value of the process at any time instant depends on the past $m$ samples. If the sampling is periodic, even the encoder does not have access to all these samples unless we take a sample at every instant. A viable alternative is to take $m$ consecutive samples at every sampling instant. However, even with this structure on the sampling policy, it is not clear how the information must be transmitted. A systematic analysis of this problem is an interesting direction for future research.
Another setting which is not discussed in the current work is one where the transmissions are of nonuniform rates. Throughout our work, we have assumed periodic sampling and transmissions at a fixed rate. For the scheme presented in this paper, it is easy to see from our analysis that only the total number of bits transmitted in each sampling interval matters, when the dimension is sufficiently large. That is, for our scheme, even framing each packet (sent in each communication slot) using an unequal number of bits will give the same performance as equal packet sizes, if the overall bit-budget per sampling period is fixed. A similar phenomenon was observed in [31], which allowed the extension of some of their analysis to erasure channels with feedback. We remark that a similar extension is possible for some of our results, too. This behavior stems from the use of successive batches of bits to successively refine the estimate of a single sample within any sampling interval, whereby at the end of the sampling interval the error corresponds roughly to that of a quantizer using the total number of bits sent during the interval. In general, a study of nonuniform rates for describing each sample, while keeping the bits per time-slot fixed, will require us to move beyond uniform sampling. This, too, is an interesting research direction to pursue.
Finally, we remark that the encoder structure we have imposed, wherein the error in the estimate of the latest sample is refined at each instant, is optimal only asymptotically and is justified only heuristically for fixed dimensions. Even for one-dimensional observations, it is not clear if this structure is optimal. We believe that this is a question of fundamental interest which remains open.
Acknowledgements
The authors would like to thank Shun Watanabe for pointing to the reference [2].
References
- [1] A. D. Wyner, “Random packings and coverings of the unit n-sphere,” The Bell System Technical Journal, vol. 46, no. 9, pp. 2111–2118, 1967.
- [2] A. Lapidoth, “On the role of mismatch in rate distortion theory,” IEEE Trans. Inf. Theory, vol. 43, no. 1, pp. 38–47, 1997.
- [3] H. S. Witsenhausen, “On the structure of real-time source coders,” Bell System Technical Journal, vol. 58, no. 6, pp. 1437–1451, 1979.
- [4] D. Teneketzis, “On the structure of optimal real-time encoders and decoders in noisy communication,” IEEE Trans. Inf. Theory, vol. 52, pp. 4017–4035, 2006.
- [5] A. Mahajan and D. Teneketzis, “On real-time communication systems with noisy feedback,” in IEEE Inf. Theory Workshop (ITW). IEEE, 2007, pp. 283–288.
- [6] J. C. Walrand and P. Varaiya, “Optimal causal coding - decoding problems,” IEEE Trans. Inf. Theory, vol. 29, pp. 814–819, 1983.
- [7] S. Yuksel, “On optimal causal coding of partially observed markov sources in single and multiterminal settings,” IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 424–437, 2012.
- [8] T. Linder and S. Yüksel, “On optimal zero-delay coding of vector markov sources,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 5975–5991, 2014.
- [9] R. G. Wood, T. Linder, and S. Yüksel, “Optimal zero delay coding of markov sources: Stationary and finite memory codes,” IEEE Transactions on Information Theory, vol. 63, no. 9, pp. 5968–5980, 2017.
- [10] W. S. Wong and R. W. Brockett, “Systems with finite communication bandwidth constraints. i. state estimation problems,” IEEE Transactions on Automatic Control, vol. 42, no. 9, pp. 1294–1299, 1997.
- [11] G. N. Nair and R. J. Evans, “State estimation via a capacity-limited communication channel,” in IEEE Conference on Decision and Control, vol. 1, Dec 1997, pp. 866–871.
- [12] ——, “State estimation under bit-rate constraints,” in IEEE Conference on Decision and Control, vol. 1, Dec 1998, pp. 251–256.
- [13] N. G. Dokuchaev and A. V. Savkin, “Recursive state estimation via limited capacity communication channels,” in Proceedings of the 38th IEEE Conference on Decision and Control, vol. 5. IEEE, 1999, pp. 4929–4932.
- [14] S. C. Smith and P. Seiler, “Estimation with lossy measurements: jump estimators for jump systems,” IEEE Trans. Autom. Control, vol. 48, no. 12, pp. 2163–2171, 2003.
- [15] A. S. Matveev and A. V. Savkin, “The problem of state estimation via asynchronous communication channels with irregular transmission times,” IEEE Trans. Autom. Control, vol. 48, no. 4, pp. 670–676, 2003.
- [16] G. M. Lipsa and N. C. Martins, “Remote state estimation with communication costs for first-order lti systems,” IEEE Trans. Autom. Control, vol. 56, pp. 2013–2025, 2011.
- [17] J. Chakravorty and A. Mahajan, “Fundamental limits of remote estimation of autoregressive markov processes under communication constraints,” IEEE Trans. Autom. Control, vol. 62, pp. 1109–1123, 2017.
- [18] A. Nayyar, T. Basar, D. Teneketzis, and V. V. Veeravalli, “Optimal strategies for communication and remote estimation with an energy harvesting sensor,” IEEE Trans. Autom. Control, vol. 58, pp. 2246–2259, 2013.
- [19] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of the wiener process over a channel with random delay,” in IEEE Inter. Symp. Inf. Theory (ISIT), Jun. 2017, pp. 321–325.
- [20] T. Linder and G. Lugosi, “A zero-delay sequential scheme for lossy coding of individual sequences,” IEEE Trans. Inf. Theory, vol. 47, pp. 2533–2538, 2001.
- [21] T. Weissman and N. Merhav, “On limited-delay lossy coding and filtering of individual sequences,” IEEE Trans. Inf. Theory, vol. 48, pp. 721–732, 2002.
- [22] T. Weissman and N. Merhav, "Universal prediction of individual binary sequences in the presence of noise," IEEE Trans. Inf. Theory, vol. 47, pp. 2151–2173, 2001.
- [23] S. Matloub and T. Weissman, “Universal zero-delay joint source - channel coding,” IEEE Trans. Inf. Theory, vol. 52, pp. 5240–5249, 2006.
- [24] A. K. Gorbunov and M. S. Pinsker, "Nonanticipatory and prognostic epsilon entropies and message generation rates," Problems Inform. Transmission, vol. 9, pp. 184–191, 1973.
- [25] ——, “Prognostic epsilon entropy of a gaussian message and a gaussian source,” Problems Inform. Transmission, vol. 10, pp. 93–109, 1974.
- [26] P. Stavrou, C. K. Kourtellaris, and C. D. Charalambous, “Information nonanticipative rate distortion function and its applications,” CoRR, vol. abs/1405.1593, 2014.
- [27] P. A. Stavrou, J. Østergaard, and C. D. Charalambous, “Zero-delay rate distortion via filtering for vector-valued gaussian sources,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 5, pp. 841–856, Oct 2018.
- [28] P. A. Stavrou, T. Charalambous, C. D. Charalambous, and S. Loyka, “Optimal estimation via nonanticipative rate distortion function and applications to time-varying gauss–markov processes,” SIAM Journal on Control and Optimization, vol. 56, no. 5, pp. 3731–3765, 2018.
- [29] H. Viswanathan and T. Berger, “Sequential coding of correlated sources,” IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 236–246, 2000.
- [30] N. Ma and P. Ishwar, “On delayed sequential coding of correlated sources,” IEEE Transactions on Information Theory, vol. 57, no. 6, pp. 3763–3782, June 2011.
- [31] A. Khina, V. Kostina, A. Khisti, and B. Hassibi, “Tracking and control of gauss–markov processes over packet-drop channels with acknowledgments,” IEEE Transactions on Control of Network Systems, vol. 6, no. 2, June 2019.
- [32] A. Khina, A. Khisti, V. Kostina, and B. Hassibi, “Sequential coding of Gauss-Markov sources with packet erasures and feedback,” in IEEE Inf. Theory Workshop (ITW), Nov. 2017, pp. 529–530.
- [33] A. Kipnis and G. Reeves, “Gaussian approximation of quantization error for estimation from compressed data,” in IEEE International Symposium on Information Theory, ISIT, 2019, pp. 2029–2033.
- [34] D. F. Delchamps, “Extracting state information from a quantized output record,” Systems and Control Letters, vol. 13, no. 5, pp. 365–372, December 1989.
- [35] V. S. Borkar and S. K. Mitter, LQG Control with Communication Constraints. Boston, MA: Springer US, 1997, pp. 365–373.
- [36] W. S. Wong and R. W. Brockett, “Systems with finite communication bandwidth constraints. ii. stabilization with limited information feedback,” IEEE Transactions on Automatic Control, vol. 44, no. 5, pp. 1049–1053, 1999.
- [37] G. N. Nair and R. J. Evans, “Communication-limited stabilization of linear systems,” in Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No. 00CH37187), vol. 1. IEEE, 2000, pp. 1005–1010.
- [38] D. Liberzon, “On stabilization of linear systems with limited information,” IEEE Transactions on Automatic Control, vol. 48, no. 2, pp. 304–307, 2003.
- [39] K. You and L. Xie, “Minimum data rate for mean square stabilization of discrete lti systems over lossy channels,” IEEE Transactions on Automatic Control, vol. 55, no. 10, pp. 2373–2378, 2010.
- [40] S. Yuksel, “Stochastic stabilization of noisy linear systems with fixed-rate limited feedback,” IEEE Transactions on Automatic Control, vol. 55, no. 12, pp. 2847–2853, 2010.
- [41] S. Yuksel and S. P. Meyn, “Random-time, state-dependent stochastic drift for markov chains and application to stochastic stabilization over erasure channels,” IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 47–59, 2012.
- [42] S. Yuksel, “Characterization of information channels for asymptotic mean stationarity and stochastic stability of nonstationary/unstable linear systems,” IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6332–6354, 2012.
- [43] ——, “Stationary and ergodic properties of stochastic nonlinear systems controlled over communication channels,” SIAM Journal on Control and Optimization, vol. 54, no. 5, pp. 2844–2871, 2016.
- [44] H. V. Poor, An Introduction to Signal Detection and Estimation (2nd Edition). Berlin, Heidelberg: Springer-Verlag, 1994.
- [45] A. Gersho and R. M. Gray, Vector quantization and signal compression. Springer Science & Business Media, 2012, vol. 159.
- [46] P. Mayekar and H. Tyagi, “RATQ: A universal fixed-length quantizer for stochastic optimization,” arXiv:1908.08200, 2019.
- [47] T. Cover and J. Thomas, Elements of Information Theory, ser. A Wiley-Interscience publication. Wiley, 2006.