Improved Channel Coding Performance Through Cost Variability
Abstract
Channel coding for discrete memoryless channels (DMCs) with mean and variance cost constraints has been recently introduced. We show that there is an improvement in coding performance due to cost variability, both with and without feedback. We demonstrate this improvement over the traditional almost-sure cost constraint (also called the peak-power constraint) that prohibits any cost variation above a fixed threshold. Our result simultaneously shows that feedback does not improve the second-order coding rate of simple-dispersion DMCs under the peak-power constraint. This finding parallels similar results for unconstrained simple-dispersion DMCs, additive white Gaussian noise (AWGN) channels and parallel Gaussian channels.
Index Terms:
Channel coding, feedback communications, second-order coding rate, stochastic control.

I Introduction
Channel coding is a fundamental problem concerning the reliable transmission of information over a noisy channel. Information transmission with arbitrarily small error probability is possible at all rates below the capacity of the channel if the number of channel uses (also called the blocklength) is permitted to grow without bound [1]. At finite blocklengths, there is an unavoidable backoff from capacity due to the random nature of the channel. The second-order coding rate (SOCR) ([2, 3, 4, 5, 6]) quantifies the rate of this convergence to capacity.
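To fix ideas, the SOCR is the coefficient of the $\sqrt{n}$ term in the standard normal approximation (cf. [2, 3, 23]); a sketch for the unconstrained case, with capacity $C$ and dispersion $V$:

```latex
% Normal approximation behind the SOCR (cf. [2], [3], [23]): M^*(n, \varepsilon)
% is the maximum number of messages at blocklength n and average error
% probability \varepsilon; \Phi^{-1} is the inverse standard normal CDF.
\log M^*(n, \varepsilon) \;=\; nC \;+\; \sqrt{nV}\,\Phi^{-1}(\varepsilon) \;+\; O(\log n)
% The SOCR is the coefficient \sqrt{V}\,\Phi^{-1}(\varepsilon) of \sqrt{n},
% which is negative for \varepsilon < 1/2.
```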
In many practical scenarios, the channel input is subject to some cost constraints which limit the amount of resources that can be used for transmission. With a cost constraint present, the role of capacity is replaced by the capacity-cost function [1, Theorem 6.11]. One common form of the cost constraint is the almost-sure (a.s.) cost constraint ([3, 7]) which bounds the time-average cost of the channel input over all messages, realizations of any side randomness, channel noise (if there is feedback), etc.:
\[ \frac{1}{n}\sum_{i=1}^{n} c(X_i) \;\le\; \Gamma \quad \text{almost surely,} \tag{1} \]
where $c$ is the cost function and $\Gamma$ is the cost threshold. Under the a.s. cost constraint, the optimal first-order coding rate is the capacity-cost function, the strong converse holds [1, Theorem 6.11], and the optimal SOCR is also known [3, Theorem 3].
The a.s. cost constraint is quite unforgiving, never allowing the cost to exceed the threshold under any circumstances. Our first result (Theorem 1) shows that the SOCR can be strictly improved by merely allowing the cost to fluctuate above the threshold in a manner consistent with a noise process, i.e., the fluctuations of the time-average cost have a variance of $O(1/n)$. Our second result (Theorem 2) shows that the a.s. cost framework does not allow feedback improvement to the SOCR for simple-dispersion DMCs (see Definition 1). This again contrasts with the scenario where random fluctuations above the threshold are allowed, as shown in [8, Theorem 3]. This highlights the important role cost variability plays in enabling feedback mechanisms to improve coding performance.
These findings raise the question of whether it is necessary to impose a constraint as stringent as (1). Cost constraints in communication systems are typically imposed to achieve goals such as operating circuitry in the linear regime, minimizing power consumption, and reducing interference with other terminals. It is worth noting that these goals do not always necessitate the use of the strict a.s. cost constraint. For example, the expected cost constraint is often used in wireless communication literature (see, e.g., [9, 10, 11, 12]) because it allows for a dynamic allocation of power based on the current channel state. The expected cost constraint bounds the cost averaged over time and the ensemble:
\[ \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right] \;\le\; \Gamma. \tag{2} \]
Yet, if the a.s. constraint is too strict, the expectation constraint is arguably too weak. The expectation constraint allows highly non-ergodic use of power, as shown in Section II-A, which is problematic both from the vantage points of operating circuitry in the linear regime and interference management.
The variance allowance is a feature of a new cost formulation, referred to as mean and variance cost constraints in [8]. This formulation replaces (1) with the following conditions:
\[ \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right] \;\le\; \Gamma \tag{3} \]
\[ \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right) \;\le\; \frac{V}{n} \tag{4} \]
The mean and variance cost constraints were introduced as a relaxed version of the a.s. cost constraint that permits a small amount of stochastic fluctuation above the threshold while providing an ergodicity guarantee. Consider a random channel codebook whose codewords satisfy (3) with equality. For a given input $X^n$, define an ergodicity metric as
\[ T(X^n) \;=\; \frac{1}{\Gamma}\left(\frac{1}{n}\sum_{i=1}^{n} c(X_i) \;-\; \Gamma\right)^{\!+}. \tag{5} \]
The definition in (5) only penalizes cost variation above the threshold $\Gamma$ and normalizes by the mean cost $\Gamma$. Let $\alpha > 0$ be the desired ergodicity parameter. We say that a transmission $X^n$ is $\alpha$-ergodic if $T(X^n) \le \alpha$. Let $\delta > 0$ be the desired uncertainty parameter. We say that a random codebook is $(\alpha, \delta)$-ergodic if $\mathbb{P}(T(X^n) > \alpha) \le \delta$, where $X^n$ is a random transmission from the codebook.
Under the mean and variance cost formulation, we have $\mathbb{P}(T(X^n) > \alpha) \le \delta$ if
\[ n \;\ge\; \frac{V}{\delta\,(\alpha\Gamma)^2} \;=:\; n^*, \tag{6} \]
where we call $n^*$ the critical blocklength. Thus, the critical blocklength specifies the minimum blocklength of a channel code for which transmission behaves ergodically with high probability. For fixed $\Gamma$, $\alpha$, and $\delta$, the parameters $V$ and $n^*$ are in one-to-one correspondence, so one can view the choice of $V$ in (4) as specifying the critical blocklength. Note that with an expectation-only constraint, we effectively have $V = \infty$, so the transmission is not guaranteed to be ergodic at any blocklength. Furthermore, unlike the expected cost constraint, the mean and variance cost formulation forces the codeword costs to concentrate around the threshold (see Section II-A) and admits a strong converse (see Section III).
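For completeness, here is a short sketch of where (6) comes from, using (3) with equality and the variance bound (4):

```latex
% Chebyshev's inequality applied to the time-average cost:
\mathbb{P}\big(T(X^n) > \alpha\big)
  \;=\; \mathbb{P}\!\left(\frac{1}{n}\sum_{i=1}^{n} c(X_i) - \Gamma > \alpha\Gamma\right)
  \;\le\; \frac{V/n}{(\alpha\Gamma)^2},
% which is at most \delta as soon as n \ge V / (\delta (\alpha\Gamma)^2) = n^*.
```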
The results of this paper also have significance in the context of previous works. Our result in Theorem 2 extends the previously known result that feedback does not improve the second-order performance for simple-dispersion DMCs without cost constraints [4]. It is also similar to the result in [15] that feedback does not improve the second-order performance for AWGN channels.
Random channel coding schemes often use independent and identically distributed (i.i.d.) codewords. It was noted in [16] that the a.s. cost constraint, which is the most commonly considered cost constraint in the context of discrete memoryless channels (DMCs), prohibits the use of i.i.d. codewords. It was shown in [16] that a feedback scheme that uses both i.i.d. and constant-composition codewords leads to an improved SOCR compared to the best non-feedback SOCR achievable under the a.s. cost constraint. Our result in Theorem 2 strengthens the result in [16] by showing that the aforementioned improvement also holds compared to the best feedback SOCR achievable under the a.s. cost constraint.
I-A Related Work
The second- and third-order asymptotics for DMCs with the a.s. cost constraint in the non-feedback setting have been characterized in [3] and [17], respectively. The second-order asymptotics in the feedback setting of DMCs that are not simple-dispersion are studied in [4] without cost constraints. There are more feedback results available for AWGN channels compared to DMCs under the a.s. cost constraint. For example, the result in [15] also addresses the third-order performance with feedback while [18] gives the result that feedback does not improve the second-order performance for parallel Gaussian channels. The second-order performance for the AWGN channel with an expected cost constraint is characterized in [19]. Table I summarizes these results across different settings in channel coding.
Paper | Channel | Performance | Cost Constraint | Feedback | Non-feedback |
Hayashi [3] | DMC, AWGN | 2nd order | a.s. | No | Yes |
Tan and Tomamichel [20] | AWGN | 3rd order | a.s. | No | Yes |
Kostina and Verdú [17] | DMC | 3rd order | a.s. | No | Yes |
Fong and Tan [15] | AWGN | 2nd and 3rd order | a.s. | Yes | No |
Wagner, Shende and Altuğ [4] | DMC | 2nd order | none | Yes | No |
Mahmood and Wagner [8] | DMC | 2nd order | mean and variance | Yes | Yes |
This paper | DMC | 2nd order | mean and variance, a.s. | Yes | Yes |
Polyanskiy [13, Th. 78] | Parallel AWGN | 2nd order | a.s. | No | Yes |
Fong and Tan [18] | Parallel AWGN | 2nd order | a.s. | Yes | No |
Polyanskiy [13, Th. 77] | AWGN | 1st order | expected cost | No | Yes |
Yang et al. [19] | AWGN | 2nd order | expected cost | No | Yes |
Our proof technique for Theorem 2 is more closely aligned with that used in [4] for DMCs than in [15] for AWGN channels. Both proofs show converse bounds with feedback that match the previously known non-feedback achievability results for DMCs and AWGN channels, respectively. A common technique used in both converse proofs is a result from binary hypothesis testing, which is used in the derivation of Lemma 1 in our paper and a similar result in [15, (17)]. We then proceed with the proof by using a Berry-Esseen-type result for bounded martingale difference sequences whereas [15] uses the usual Berry-Esseen theorem by first showing equality in distribution of the information density with a sum of i.i.d. random variables.
II Preliminaries
Let $\mathcal{X}$ and $\mathcal{Y}$ be finite input and output alphabets, respectively, of the DMC $W$, where $W$ is a stochastic matrix from $\mathcal{X}$ to $\mathcal{Y}$. For a given sequence $x^n \in \mathcal{X}^n$, the $n$-type of $x^n$ is defined as
\[ P_{x^n}(a) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{x_i = a\} \]
for all $a \in \mathcal{X}$, where $\mathbb{1}\{\cdot\}$ is the standard indicator function. For a given sequence $x^n$, we will use $P_{x^n}$ or $\hat{P}_{x^n}$ to denote its type. Let $\mathcal{P}_n(\mathcal{X})$ be the set of $n$-types on $\mathcal{X}$. For a given $P \in \mathcal{P}_n(\mathcal{X})$, $T_P$ denotes the type class, i.e., the set of sequences with empirical distribution equal to $P$. For a random variable $X$, $\operatorname{ess\,sup} X$ denotes its essential supremum (that is, the infimum of those numbers $a$ such that $\mathbb{P}(X > a) = 0$). We will write $\log$ to denote the logarithm and $\exp$ to denote its inverse, both taken to the same fixed base. The cost function is denoted by $c : \mathcal{X} \to [0, c_{\max}]$, where $c_{\max} < \infty$ is a constant. Let $\Gamma_0 = \min_{x} c(x)$. Let $\Gamma_{\max}$ denote the smallest $\Gamma$ such that the capacity-cost function is equal to the unconstrained capacity. We assume $\Gamma_0 < \Gamma < \Gamma_{\max}$ throughout the paper. For $\Gamma \in (\Gamma_0, \Gamma_{\max})$, the capacity-cost function is defined as
\[ C(\Gamma) \;=\; \max_{P_X \,:\, \mathbb{E}[c(X)] \le \Gamma} I(X; Y), \tag{7} \]
where $Y$ is the output of the channel $W$ with input $X \sim P_X$. The function $C(\Gamma)$ is strictly increasing and differentiable [1, Problem 8.4] in the interval $(\Gamma_0, \Gamma_{\max})$. For a given input distribution $P_X$, we define $V(P_X, W)$ as the variance of the information density $\log \frac{W(Y|X)}{P_Y(Y)}$ under $(X, Y) \sim P_X \times W$.
Let $\Pi(\Gamma)$ be the set of all capacity-cost-achieving distributions, i.e., the set of maximizing distributions in (7). For any $P^* \in \Pi(\Gamma)$, let $P^*_Y$ be the induced marginal distribution on $\mathcal{Y}$. Note that the output distribution $P^*_Y$ is always unique, and without loss of generality, can be assumed to satisfy $P^*_Y(y) > 0$ for all $y \in \mathcal{Y}$ [21, Corollaries 1 and 2 to Theorem 4.5.1].
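To make (7) concrete, the following minimal sketch numerically approximates $C(\Gamma)$ for a binary-input channel by a grid search over input distributions. The channel matrix, cost function, and helper names (`mutual_information`, `capacity_cost`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in nats for input distribution p_x and channel matrix W."""
    p_xy = p_x[:, None] * W                # joint distribution P(x, y)
    p_y = p_xy.sum(axis=0)                 # output marginal
    mask = p_xy > 0                        # avoid log(0) on zero-mass entries
    prod = (p_x[:, None] * p_y[None, :])[mask]
    return float((p_xy[mask] * np.log(p_xy[mask] / prod)).sum())

def capacity_cost(W, c, Gamma, grid=10_001):
    """Grid-search approximation of C(Gamma) = max_{E[c(X)] <= Gamma} I(X;Y)
    for a binary-input channel."""
    best = 0.0
    for p1 in np.linspace(0.0, 1.0, grid):
        p_x = np.array([1.0 - p1, p1])
        if p_x @ c <= Gamma:               # mean cost constraint of (7)
            best = max(best, mutual_information(p_x, W))
    return best

# Illustrative binary symmetric channel with crossover 0.1 and cost c(x) = x.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
c = np.array([0.0, 1.0])
print(capacity_cost(W, c, Gamma=0.3))      # C(0.3) < unconstrained capacity
```

A grid search suffices here because the input alphabet is binary; larger alphabets would call for a constrained Blahut-Arimoto-style iteration instead.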
The following definitions will remain in effect throughout the paper:
Definition 1 (cf. [4])
A DMC $W$ is called simple-dispersion at the cost $\Gamma$ if $V(P^*, W)$ takes a common value for all $P^* \in \Pi(\Gamma)$.
We will only focus on simple-dispersion channels at a fixed cost $\Gamma$ and thus define
\[ V(\Gamma) := V(P^*, W) \]
for any $P^* \in \Pi(\Gamma)$.
With a blocklength $n$ and a fixed rate $R$, let $\mathcal{M} = \{1, \ldots, \lceil \exp(nR) \rceil\}$ denote the message set. Let $M$ denote the random message drawn uniformly from the message set.
Definition 2
An $(n, R)$ code for a DMC $W$ consists of an encoder which, for each message $m \in \mathcal{M}$, chooses an input $X^n(m) \in \mathcal{X}^n$, and a decoder which maps the output $Y^n$ to $\hat{M} \in \mathcal{M}$. The code is random if the encoder or the decoder is random.
Definition 3
An $(n, R)$ code with ideal feedback for a DMC $W$ consists of an encoder which, at each time instant $i$ ($1 \le i \le n$) and for each message $m \in \mathcal{M}$, chooses an input $X_i(m, Y^{i-1}) \in \mathcal{X}$, and a decoder which maps the output $Y^n$ to $\hat{M} \in \mathcal{M}$. The code is random if the encoder or the decoder is random.
Definition 4
An $(n, R, \Gamma)_{\mathrm{a}}$ code for a DMC $W$ is an $(n, R)$ code such that $\frac{1}{n}\sum_{i=1}^{n} c(X_i) \le \Gamma$ almost surely, where the message $M$ has a uniform distribution over the message set $\mathcal{M}$.
Definition 5
An $(n, R, \Gamma)_{\mathrm{a}}$ code with ideal feedback for a DMC $W$ is an $(n, R)$ code with ideal feedback such that $\frac{1}{n}\sum_{i=1}^{n} c(X_i) \le \Gamma$ almost surely, where the message $M$ has a uniform distribution over the message set $\mathcal{M}$.
Definition 6
An $(n, R, \Gamma, V)_{\mathrm{mv}}$ code for a DMC $W$ is an $(n, R)$ code such that $\mathbb{E}\big[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\big] \le \Gamma$ and $\operatorname{Var}\big(\frac{1}{n}\sum_{i=1}^{n} c(X_i)\big) \le \frac{V}{n}$, where the message $M$ has a uniform distribution over the message set $\mathcal{M}$.
Definition 7
An $(n, R, \Gamma, V)_{\mathrm{mv}}$ code with ideal feedback for a DMC $W$ is an $(n, R)$ code with ideal feedback such that $\mathbb{E}\big[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\big] \le \Gamma$ and $\operatorname{Var}\big(\frac{1}{n}\sum_{i=1}^{n} c(X_i)\big) \le \frac{V}{n}$, where the message $M$ has a uniform distribution over the message set $\mathcal{M}$.
Given $n$, $R$, $\Gamma$, and $V$, let $\varepsilon^{\mathrm{fb}}_{\mathrm{a}}(n, R, \Gamma)$ denote the minimum average error probability attainable by any random $(n, R, \Gamma)_{\mathrm{a}}$ code with feedback. Similarly, let $\varepsilon_{\mathrm{a}}(n, R, \Gamma)$ denote the minimum average error probability attainable by any random $(n, R, \Gamma)_{\mathrm{a}}$ code without feedback. Define $\varepsilon^{\mathrm{fb}}_{\mathrm{mv}}(n, R, \Gamma, V)$ and $\varepsilon_{\mathrm{mv}}(n, R, \Gamma, V)$ similarly for codes with mean and variance cost constraints.
II-A Expectation-only cost constraint
Under this cost formulation, the average cost of the codewords is constrained in expectation only:
\[ \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right] \;\le\; \Gamma. \tag{8} \]
We now illustrate a codebook construction (adapted from [17]) with an average error probability at most $\varepsilon + \delta$ that meets the cost threshold according to (8) but whose codeword costs are non-ergodic, i.e., the time-average cost does not converge to $\Gamma$. Consider a codebook with rate $R < C\big(\frac{\Gamma}{1-\delta}\big)$ whose average error probability is at most $\varepsilon$ and each of whose codewords has time-average cost equal to $\frac{\Gamma}{1-\delta}$. Such a codebook exists because $R < C\big(\frac{\Gamma}{1-\delta}\big)$. Assuming $c(0) = 0$ without loss of generality, one could modify the codebook by replacing a $\delta$-fraction of its codewords with the all-zero codeword. The modified codebook has average error probability at most $\varepsilon + \delta$ and meets the cost threshold according to (8). But the time-average cost of a codeword is either $0$ or $\frac{\Gamma}{1-\delta}$, never close to $\Gamma$. This construction also shows that the strong converse does not hold under the expected cost constraint.
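The expected-cost bookkeeping behind this construction is a one-line computation:

```latex
% Ensemble-average time-average cost of the modified codebook:
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right]
  \;=\; (1-\delta)\cdot\frac{\Gamma}{1-\delta} \;+\; \delta\cdot 0 \;=\; \Gamma,
% so (8) is met with equality, while every realized time-average cost is
% either 0 or \Gamma/(1-\delta), bounded away from \Gamma.
```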
The mean and variance cost constraints ensure that the time-average cost of the codewords concentrates around the cost threshold $\Gamma$, thereby disallowing codebook constructions with irregular or non-ergodic power consumption.
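As a quick numerical sanity check of (3)-(4), the sketch below estimates the mean and variance of the time-average cost of an i.i.d. random codebook; the input distribution and cost function are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n, trials = 500, 20_000
c = np.array([0.0, 1.0])       # cost function c(x) = x on X = {0, 1}
p1 = 0.3                       # P(X = 1), so the mean cost is Gamma = 0.3

# i.i.d. codeword symbols; each row is one transmission X^n.
X = rng.random((trials, n)) < p1
avg_cost = c[X.astype(int)].mean(axis=1)   # time-average cost per transmission

print("mean of time-average cost:", avg_cost.mean())     # ~ Gamma = 0.3
print("variance of time-average cost:", avg_cost.var())  # ~ p1*(1-p1)/n = V/n
```

Such an i.i.d. codebook violates the a.s. constraint (1) with positive probability at every blocklength, yet it satisfies (3) with equality and (4) with $V = \operatorname{Var}(c(X_1))$, consistent with the observation from [16] recalled in Section I.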
III Main Results
We prove the improvement in coding performance in terms of the second-order coding rate, although equivalent results in terms of the average error probability can also be shown as in [8, Theorems 1-3]. Let
• $r_{\mathrm{a}}(\varepsilon)$ denote the optimal SOCR with the a.s. cost constraint without feedback,
• $r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon)$ denote the optimal SOCR with the a.s. cost constraint with feedback,
• $r_{\mathrm{mv}}(\varepsilon)$ denote the optimal SOCR with the mean and variance cost constraints without feedback, and
• $r^{\mathrm{fb}}_{\mathrm{mv}}(\varepsilon)$ denote the optimal SOCR with the mean and variance cost constraints with feedback
for channel codes operating with average error probability of at most $\varepsilon$. In the non-feedback case, $C(\Gamma)$ is the optimal first-order rate for DMCs with the a.s. cost constraint [1, Theorem 6.11] as well as the mean and variance cost formulation [8, Theorems 1 and 2], i.e.,
\[ \lim_{n \to \infty} \frac{\log M^*_{\mathrm{a}}(n, \varepsilon, \Gamma)}{n} = C(\Gamma) \tag{9} \]
\[ \lim_{n \to \infty} \frac{\log M^*_{\mathrm{mv}}(n, \varepsilon, \Gamma, V)}{n} = C(\Gamma) \tag{10} \]
for all $\varepsilon \in (0, 1)$, where $M^*_{\mathrm{a}}$ and $M^*_{\mathrm{mv}}$ denote the maximum number of messages attainable with average error probability at most $\varepsilon$ under the respective cost constraints. The results (9) and (10) imply that the strong converse holds. We thus define the second-order rates with respect to the capacity-cost function as follows:
\[ r_{\mathrm{a}}(\varepsilon) := \sup\Big\{ r \in \mathbb{R} : \limsup_{n \to \infty} \varepsilon_{\mathrm{a}}\big(n,\, C(\Gamma) + r/\sqrt{n},\, \Gamma\big) \le \varepsilon \Big\}, \]
and analogously $r_{\mathrm{mv}}(\varepsilon)$ with $\varepsilon_{\mathrm{mv}}$ in place of $\varepsilon_{\mathrm{a}}$.
For the feedback case, we simply take the convention of defining the SOCR with respect to $C(\Gamma)$ as well:
\[ r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon) := \sup\Big\{ r \in \mathbb{R} : \limsup_{n \to \infty} \varepsilon^{\mathrm{fb}}_{\mathrm{a}}\big(n,\, C(\Gamma) + r/\sqrt{n},\, \Gamma\big) \le \varepsilon \Big\}, \]
and analogously $r^{\mathrm{fb}}_{\mathrm{mv}}(\varepsilon)$.
For the a.s. cost constraint, this convention is justified because, by Theorem 2, $C(\Gamma)$ is the optimal first-order rate, in the sense analogous to (9), for DMCs with feedback. For DMCs without cost constraints, Shannon [22, Theorem 6] showed that feedback does not increase the capacity.
III-A Performance improvement for non-feedback codes
From [3, Theorem 3], we have $r_{\mathrm{a}}(\varepsilon) = \sqrt{V(\Gamma)}\,\Phi^{-1}(\varepsilon)$ for a simple-dispersion DMC, where $\Phi$ denotes the standard normal CDF. (The result in [3, Theorem 3] is not restricted to simple-dispersion DMCs.) On the other hand, [8, Theorems 1 and 2] proved that
(11)
for a DMC such that and , where the function is given by
(12)
The maximum and the minimum in (11) and (12), respectively, are attained [8, Lemmas 3 and 4].
Theorem 1
Fix an arbitrary $\varepsilon \in (0, 1)$. Then for any $V > 0$ and a DMC satisfying the conditions of (11), we have $r_{\mathrm{mv}}(\varepsilon) > r_{\mathrm{a}}(\varepsilon)$.
Proof: The proof is given in Section IV.
The improvement in Theorem 1 is shown in Fig. 1 for a binary symmetric channel. Specifically, the second-order coding rate as a function of the average error probability $\varepsilon$ is shown in Figs. 1 and 2 for a binary symmetric channel with fixed crossover parameter, binary input and output alphabets, cost threshold $\Gamma$, and cost function $c$.
[Fig. 1: Second-order coding rate vs. average error probability $\varepsilon$ for the binary symmetric channel example.]
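The a.s.-constraint curve in Fig. 1 has the closed form $r_{\mathrm{a}}(\varepsilon) = \sqrt{V(\Gamma)}\,\Phi^{-1}(\varepsilon)$ from [3, Theorem 3]; a minimal sketch evaluating it, with a placeholder dispersion value rather than the one used in the figure:

```python
import numpy as np
from scipy.stats import norm

def socr_as(eps, V_gamma):
    """Optimal SOCR under the a.s. cost constraint: sqrt(V(Gamma)) * Phi^{-1}(eps)."""
    return np.sqrt(V_gamma) * norm.ppf(eps)

eps_grid = np.linspace(0.01, 0.5, 50)
print(socr_as(eps_grid, V_gamma=0.5))   # negative backoff for eps < 1/2
```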
As discussed in Section II, the choice of $V$ together with the desired values of $\alpha$ and $\delta$ specifies the critical blocklength $n^*$, beyond which the $(\alpha, \delta)$-ergodicity of the coding scheme is guaranteed. In practice, the choice of blocklength is more fundamental as it affects complexity and latency. Therefore, it is more prudent for the value of $V$ to be dictated by the blocklength and the desired $(\alpha, \delta)$-ergodicity via the relation $V = n^*\,\delta\,(\alpha\Gamma)^2$ derived from (6). With the same channel and cost parameters as used in Fig. 1, Fig. 2 shows the second-order performance for different critical blocklengths $n^*$ for an $(\alpha, \delta)$-ergodic codebook with fixed $\alpha$ and $\delta$.
[Fig. 2: Second-order coding rate vs. average error probability $\varepsilon$ for different critical blocklengths $n^*$.]
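The interplay between $V$ and $n^*$ used for Fig. 2 follows directly from (6); a small sketch with illustrative parameter values:

```python
def critical_blocklength(V, alpha, delta, Gamma):
    """Smallest n guaranteeing (alpha, delta)-ergodicity via Chebyshev, cf. (6)."""
    return V / (delta * (alpha * Gamma) ** 2)

def variance_allowance(n_star, alpha, delta, Gamma):
    """Inverse map: the V in (4) that makes n_star the critical blocklength."""
    return n_star * delta * (alpha * Gamma) ** 2

Gamma, alpha, delta = 0.3, 0.1, 0.1
for n_star in (100, 1_000, 10_000):
    print(n_star, variance_allowance(n_star, alpha, delta, Gamma))
```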
III-B Performance improvement for feedback codes
Definition 8
A controller is a collection of conditional distributions $f = \{f_i(\cdot \mid m, x^{i-1}, y^{i-1})\}_{i=1}^{n}$ on $\mathcal{X}$.
Random feedback codes can be constructed from controllers. Given a message $m$ and the past channel inputs $x^{i-1}$ and outputs $y^{i-1}$, the channel input at time instant $i$ is distributed according to $f_i(\cdot \mid m, x^{i-1}, y^{i-1})$. There is a one-to-one correspondence between a random feedback code and a controller-based code.
• A random feedback code is equivalently specified by the joint distribution of $(M, X^n, Y^n, \hat{M})$, where $\hat{M}$ is the decoded message. Marginalizing out $\hat{M}$ and conditioning on the past, we obtain the conditional distributions $P_{X_i \mid M, X^{i-1}, Y^{i-1}}$, from which a controller can be obtained with the specification $f_i(x_i \mid m, x^{i-1}, y^{i-1}) = P_{X_i \mid M, X^{i-1}, Y^{i-1}}(x_i \mid m, x^{i-1}, y^{i-1})$ for each time $i$.
• Likewise, a controller specifies a random feedback code by inducing the joint distribution given in (13) below.
A controller $f$ along with the channel $W$ specify a joint distribution over $(M, X^n, Y^n)$ with probability assignments given by
\[ P(m, x^n, y^n) \;=\; \frac{1}{|\mathcal{M}|} \prod_{i=1}^{n} f_i(x_i \mid m, x^{i-1}, y^{i-1})\, W(y_i \mid x_i). \tag{13} \]
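To make the sampling semantics of (13) concrete, the following sketch draws $(M, X^n, Y^n)$ from the joint distribution induced by a controller and a DMC. The toy controller, which backs off to the zero-cost symbol once the running average cost exceeds the threshold, is a hypothetical illustration rather than a construction from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n, num_messages = 100, 8
W = np.array([[0.9, 0.1],          # DMC W(y|x) on X = Y = {0, 1}
              [0.1, 0.9]])
c = np.array([0.0, 1.0])           # cost c(x) = x
Gamma = 0.3

def controller(m, x_past, y_past):
    """Toy f_i(. | m, x^{i-1}, y^{i-1}): transmit the costly symbol with
    probability 0.3 unless the running average cost already exceeds Gamma."""
    if x_past and np.mean([c[x] for x in x_past]) > Gamma:
        return np.array([1.0, 0.0])          # force the zero-cost symbol
    return np.array([0.7, 0.3])

m = rng.integers(num_messages)               # uniform message, cf. (13)
x, y = [], []
for i in range(n):
    f_i = controller(m, x, y)
    x_i = rng.choice(2, p=f_i)               # X_i ~ f_i(. | m, x^{i-1}, y^{i-1})
    y_i = rng.choice(2, p=W[x_i])            # Y_i ~ W(. | X_i), fed back
    x.append(x_i); y.append(y_i)

print("time-average cost:", np.mean([c[xi] for xi in x]))
```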
Lemma 1
Consider a channel $W$ with cost constraint $\Gamma$. Then for every $n$ and $\varepsilon \in (0, 1)$,
(14)
where the supremum in (14) is over all controllers satisfying
\[ \mathbb{P}\!\left(\frac{1}{n}\sum_{i=1}^{n} c(X_i) \le \Gamma\right) = 1. \tag{15} \]
Proof: Lemma 1 is similar to [23, Theorem 27], [18, (42)], [4, Lemma 15], [8, Lemma 2] and others. Its proof is omitted.
Theorem 2
Fix an arbitrary $\varepsilon \in (0, 1)$. Then for any $\Gamma \in (\Gamma_0, \Gamma_{\max})$ and a simple-dispersion DMC such that $V(\Gamma) > 0$, we have $r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon) = r_{\mathrm{a}}(\varepsilon)$.
Proof: The proof is given in Section V.
We only prove the converse result, namely the following upper bound:
\[ r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon) \;\le\; \sqrt{V(\Gamma)}\,\Phi^{-1}(\varepsilon). \tag{16} \]
The result of Theorem 2 follows by combining (16) with the existing achievability result (without feedback) from [3, Theorem 3].
From Theorems 1 and 2 alone, we have that $r_{\mathrm{mv}}(\varepsilon) > r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon)$. More importantly, the mean and variance cost formulation admits feedback mechanisms that improve the SOCR even if the capacity-cost-achieving distribution is unique, i.e., $|\Pi(\Gamma)| = 1$. This is the more interesting case since for compound-dispersion channels, where $|\Pi(\Gamma)| > 1$, feedback is already known to improve second-order performance via timid/bold coding [4].
In contrast to the almost-sure constraint, where $r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon) = r_{\mathrm{a}}(\varepsilon)$, we observe that $r^{\mathrm{fb}}_{\mathrm{mv}}(\varepsilon) > r_{\mathrm{mv}}(\varepsilon)$ for simple-dispersion channels with $|\Pi(\Gamma)| = 1$ [8, Theorem 3]. In summary, for any $\varepsilon \in (0, 1)$, $V > 0$, and a simple-dispersion DMC such that $V(\Gamma) > 0$ and $|\Pi(\Gamma)| = 1$,
\[ r^{\mathrm{fb}}_{\mathrm{mv}}(\varepsilon) \;>\; r_{\mathrm{mv}}(\varepsilon) \;>\; r_{\mathrm{a}}(\varepsilon) \;=\; r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon), \]
where the last equality above has been proven without the assumption $|\Pi(\Gamma)| = 1$.
IV Proof of Theorem 1
Since $r_{\mathrm{mv}}(\varepsilon)$ is a continuous function [8, Lemma 3], it suffices to show that, for all admissible parameter choices,
(17)
The LHS of (17) can be written as
where we used the constraint to eliminate one of the decision variables.
Assume by contradiction that
(18)
for all , and . The assumption is without loss of generality since one of the two point masses must be greater than or equal to .
If (18) holds, then
for all , and . Since in this case, we must have
(19)
for all .
Consider the function
with domain . For any ,
(20)
In equality (20), we have used the mean value theorem. It is easy to see that for sufficiently small values, the expression in (20) is negative. Since , we have for some , which contradicts (19).
V Proof of Theorem 2
For any , define
For any , define
(21)
Definition 9
For any distribution and such that , define the probability measure
Definition 10
For any and any , let
Fix arbitrarily and let denote a single point-mass distribution at . Then for any , and a controller satisfying , we define the controller as
Remark 1
Given any controller satisfying , Definition 10 constructs a modified controller which satisfies
(22)
Intuitively, the modified controller amplifies the probability assignments of the original controller over one set and nullifies them over the complementary set so that the cost constraint holds almost surely. Definition 10 is inspired by, and corrects an error in, [4, Def. 8]. With the definition given in [4, Def. 8], the analogue of (22) does not hold, although it is asserted in the proof of [4, Thm. 3]. This can be rectified by using Definition 10 in place of [4, Def. 8]. An analogous comment applies to the next definition and [4, Def. 9].
Definition 11
For any type such that , and any , let
Fix arbitrarily and let denote a single point-mass distribution at . Then for any , and a controller satisfying , we define the controller as
Remark 2
Given any controller satisfying , Definition 11 constructs a modified controller which satisfies for .
Now let
(23)
where
Let denote the distribution . Let denote the distribution . Let denote the distribution for each such that . Note that all controllers and satisfy . We have
(24)
Inequality (24) follows from the following argument. For any such that , note that for all , and so that
(25)
Then
With some abuse of notation, we assume in the inequality above that if , then
which is justified by (25).
A similar derivation gives
for all such that and , where .
Let , where will be specified later. Define the following; this proof follows that of [4, Thm. 3] and corrects an error therein: the filtration defined before (98) in [4] should be defined as it is here.
Two things are important to note here. First, by the Markov property , we have a.s. Second, .
For the first term in (24), we can upper bound it as follows:
(26)
In inequality (26), we used the following lemma and the fact that almost surely.
Lemma 2
For ,
(27)
almost surely, where has an arbitrary distribution and is the output of the channel when is the input.
Proof: See [24, Proposition 1] and its references.
We will now apply a martingale central limit theorem [25, Corollary to Theorem 2] to the expression in (26). We first verify that the hypotheses of [25, Corollary to Theorem 2] are satisfied:
1. First, we require that
Since for all by assumption and almost surely for each channel input and output pair , we have
for all .
2. The second condition of [25, Corollary to Theorem 2] is verified similarly.
Under the above two conditions, it follows from [25, Corollary to Theorem 2] that there exists a constant depending only on such that for any ,
(28)
Using Lemma 3 in (28), we obtain
(29)
where the last inequality holds for all sufficiently large $n$, for some constant which can be chosen to vanish as $n \to \infty$.
Lemma 3
We have
Furthermore, for ,
almost surely according to the probability measure .
Note that the upper bound in (29) holds for any . For a given error probability , we choose according to . Then using this choice, we obtain for any given that
(32)
Thus, (32) provides an upper bound to the first term in (24).
We now upper bound the second term in (24).
Using again the choice of in , we have
(33)
where in equality , is any arbitrary sequence from the type class . Equality holds because under the probability measure , a.s. (see Remark 2) and the distribution of
depends on only through its type. Continuing from (33), we have
(34)
where the last inequality holds for all sufficiently large $n$ because , as defined in , is an term, and from the construction of the set , we have
which implies
for some constant .
Let . We now show that for all . Let and . Then
Thus,
for all sufficiently large $n$. Hence, we can use Azuma’s inequality [26, (33), p. 61] to upper bound it, giving us
(35)
which goes to zero as $n \to \infty$.
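For intuition, Azuma's inequality for a martingale $S_n$ with differences bounded by $c$ gives $\mathbb{P}(S_n \ge t) \le \exp\!\big(-t^2/(2nc^2)\big)$; a quick simulation comparing the empirical tail with the bound (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(2)

n, trials, c = 1_000, 100_000, 1.0
# Martingale with differences bounded by c: S_n = sum of n i.i.d. +/-c steps.
# (Sampled via a binomial count of +c steps to keep memory small.)
S_n = c * (2.0 * rng.binomial(n, 0.5, size=trials) - n)

for t in (30.0, 60.0, 90.0):
    empirical = (S_n >= t).mean()
    azuma = np.exp(-t**2 / (2 * n * c**2))
    print(f"t={t}: empirical tail={empirical:.5f}  Azuma bound={azuma:.5f}")
```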
Substituting the upper bounds (32) and (35) in (24), we obtain
for all sufficiently large $n$. Since the controller was arbitrary, we can apply Lemma 1 to obtain
where is obtained from the expression of in after taking the limit as , i.e.,
Finally, since was arbitrary, we can take and arbitrarily small, giving us the converse result
Since this matches the optimal non-feedback SOCR [3, Theorem 3], we have
\[ r^{\mathrm{fb}}_{\mathrm{a}}(\varepsilon) \;=\; r_{\mathrm{a}}(\varepsilon) \;=\; \sqrt{V(\Gamma)}\,\Phi^{-1}(\varepsilon) \tag{36} \]
for simple-dispersion DMCs with a peak-power cost constraint.
Appendix A Proof of Lemma 3
We have
From Remark 1, since a.s., there exists a such that . Hence,
a.s., where we used the fact that the channel is simple-dispersion at the cost $\Gamma$. Furthermore, since ,
a.s. and
Finally, we have
assuming .
Acknowledgment
This research was supported by the US National Science Foundation under grant CCF-1956192.
References
- [1] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. Cambridge University Press, 2011.
- [2] V. Strassen, “Asymptotische Abschätzungen in Shannons Informationstheorie,” in Trans. 3rd Prague Conf. Inf. Theory, Prague, 1962, pp. 689–723.
- [3] M. Hayashi, “Information spectrum approach to second-order coding rate in channel coding,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 4947–4966, 2009.
- [4] A. B. Wagner, N. V. Shende, and Y. Altuğ, “A new method for employing feedback to improve coding performance,” IEEE Transactions on Information Theory, vol. 66, no. 11, pp. 6660–6681, 2020.
- [5] Y. Y. Shkel, V. Y. F. Tan, and S. C. Draper, “Second-order coding rate for -class source-channel codes,” in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015, pp. 620–626.
- [6] M. Tomamichel and V. Y. F. Tan, “Second-order coding rates for channels with state,” IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 4427–4448, 2014.
- [7] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” The Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, 1959.
- [8] A. Mahmood and A. B. Wagner, “Channel coding with mean and variance cost constraints,” in 2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 510–515.
- [9] R. Yates, “A framework for uplink power control in cellular radio systems,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995.
- [10] A. Goldsmith and P. Varaiya, “Capacity of fading channels with channel side information,” IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1986–1992, 1997.
- [11] S. Hanly and D. Tse, “Multiaccess fading channels, Part II: Delay-limited capacities,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2816–2831, 1998.
- [12] A. Lozano and N. Jindal, “Transmit diversity vs. spatial multiplexing in modern MIMO systems,” IEEE Transactions on Wireless Communications, vol. 9, no. 1, pp. 186–197, 2010.
- [13] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,” Ph.D. dissertation, Dept. Elect. Eng., Princeton Univ., Princeton, NJ, USA, 2010.
- [14] A. Sahai, S. Draper, and M. Gastpar, “Boosting reliability over AWGN networks with average power constraints and noiseless feedback,” in Proceedings. International Symposium on Information Theory, 2005. ISIT 2005., 2005, pp. 402–406.
- [15] S. L. Fong and V. Y. F. Tan, “Asymptotic expansions for the AWGN channel with feedback under a peak power constraint,” in 2015 IEEE International Symposium on Information Theory (ISIT), 2015, pp. 311–315.
- [16] A. Mahmood and A. B. Wagner, “Timid/bold coding for channels with cost constraints,” in 2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 1442–1447.
- [17] V. Kostina and S. Verdú, “Channels with cost constraints: Strong converse and dispersion,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2415–2429, 2015.
- [18] S. L. Fong and V. Y. F. Tan, “A tight upper bound on the second-order coding rate of the parallel Gaussian channel with feedback,” IEEE Transactions on Information Theory, vol. 63, no. 10, pp. 6474–6486, 2017.
- [19] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, “Optimum power control at finite blocklength,” IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4598–4615, 2015.
- [20] V. Y. F. Tan and M. Tomamichel, “The third-order term in the normal approximation for the AWGN channel,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2430–2438, 2015.
- [21] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: Wiley, 1968.
- [22] C. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
- [23] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
- [24] A. Mahmood and A. B. Wagner, “Channel coding with mean and variance cost constraints,” 2024. [Online]. Available: https://arxiv.org/abs/2401.16417
- [25] E. Bolthausen, “Exact convergence rates in some martingale central limit theorems,” The Annals of Probability, vol. 10, no. 3, pp. 672–688, Aug. 1982.
- [26] B. Bercu, B. Delyon, and E. Rio, Concentration Inequalities for Sums and Martingales. Cham, Switzerland: Springer, 2015.