Strong Converses using Typical Changes of Measures and Asymptotic Markov Chains
Abstract
The paper presents exponentially-strong converses for source coding, channel coding, and hypothesis testing problems. More specifically, it presents alternative proofs for the well-known exponentially-strong converse bounds for almost lossless source coding with side-information and for channel coding over a discrete memoryless channel (DMC). These alternative proofs are solely based on a change of measure argument on the sets of conditionally or jointly typical sequences that result in a correct decision, and on the analysis of these measures in the asymptotic regime of infinite blocklengths. The paper also presents a new exponentially-strong converse for the K-hop hypothesis testing against independence problem with certain Markov chains, and a strong converse for the two-terminal L-round interactive compression problem with multiple distortion constraints that depend on both sources and both reconstructions. This latter problem includes as special cases the Wyner-Ziv problem, the interactive function computation problem, and the compression with lossy common reconstruction problem. These new strong converse proofs are derived using similar change of measure arguments as described above and by additionally proving that certain Markov chains involving auxiliary random variables hold in the asymptotic regime of infinite blocklengths. As shown in related publications, the same method also yields converse bounds under expected resource constraints.
Index Terms:
Strong converse, change of measure, asymptotic Markov chains, source coding, channel coding, hypothesis testing.
I Introduction
Strong converse results have a rich history in information theory. They refer to proofs showing that the fundamental performance limit (such as the minimum compression rate or the maximum rate of communication) of a specific system does not depend on its allowed error (or excess) probability, as long as this probability does not tend to 1. For example, Wolfowitz's strong converse [2] established that the largest rate of communication over a discrete memoryless channel (DMC) ensuring a probability of error smaller than $\epsilon_n$ (see Figure 1) always equals capacity, as long as $\epsilon_n$ does not tend to 1 for increasing blocklengths $n$. Differently stated, for all rates of communication that exceed capacity, the probability of error of any system must tend to 1 as the blocklength tends to infinity. A stronger result, known as the exponentially-strong converse, states that for all rates above capacity the probability of decoding error tends to 1 exponentially fast in the blocklength. This result was first established by Csiszár and Körner [3], who presented lower bounds on the error exponents at rates above capacity. Since then, various alternative proofs for the strong or exponentially-strong converse for channel coding over a DMC have been proposed, for example based on the blowing-up lemma [4, 5], by bounding the decoding error probabilities at finite blocklengths [6, 7], or by putting forward geometric and typicality arguments [8]. Various extensions to multi-user communication networks were also derived, see e.g., [5, 9, 10]. In this paper, we present yet another proof, based on a change of measure argument that restricts to output sequences that are conditionally-typical for one of the possible codewords and lie in the decoding set of this codeword. Related is the converse proof for the wiretap channel by Tyagi and Watanabe [11], which is also based on a similar change of measure argument, but does not restrict to conditionally-typical output sequences. This restriction is however important in our converse proof to relate the rate of the code to the capacity of the DMC.

For (almost) lossless source coding, the strong converse states that a discrete memoryless source (DMS) cannot be compressed at a rate below the entropy of the source with a reconstruction error probability that stays below 1 asymptotically for infinite blocklengths. This result essentially follows from the asymptotic equipartition property [12, 13]. The exponentially-strong converse for lossless compression [14] states that for all compression rates below entropy, the probability of reconstruction error tends to 1 exponentially fast. Strong converse results also extend to lossy compression, where the limit of compression of DMSs is not the entropy but the well-known rate-distortion function. The strong converse for lossy compression of DMSs was established by Körner [15]; see also the related work by Kieffer [16].
Our focus is on compression scenarios where the decoder has side-information that is correlated with the source as depicted in Figure 2. For memoryless sources, the fundamental limits of compression with side-information were established by Slepian and Wolf [17] for the lossless case and by Wyner and Ziv [18] for the lossy case. Exponentially-strong converses were established by Oohama and Han [19] for the lossless case and by Oohama [20] for the lossy case. Various exponentially-strong converse results were also derived for compression problems in more complicated networks with and without side-information, see e.g., [21, 22, 23, 24, 11].
In this paper, we reprove the exponentially-strong converse for lossless source coding with side-information.

As a new result, we also prove the strong converse for an interactive lossy compression problem (see Figure 3) where the distortion measures can depend on the source sequences observed at the two terminals, as well as on the two terminals' produced reconstruction sequences. This problem includes as special cases Kaspi's two-terminal interactive lossy source-coding problem [25], Ma and Ishwar's two-terminal interactive function-computation problem [26], and Steinberg's lossy source coding problem with common reconstructions [27], as well as its extension by Malär et al. [28]. Our proof of the exponentially-strong converse for lossless source coding with side-information is based on a change of measure argument that restricts to source sequences that are jointly typical and result in correct decoding, and on the asymptotic analysis of this new measure. Our strong converse proof for interactive $L$-round lossy source coding with generalized distortion measures again relies on a change of measure argument on the set of typical sequences that result in the desired distortion and on analysing this new measure asymptotically in the limit of infinite blocklengths. As part of this analysis, we prove in particular that certain Markov chains hold in this asymptotic regime of infinite blocklengths.
The change of measure argument for proving exponentially-strong converses dates back to Gu and Effros [22, 23] for source coding problems, and to Tyagi and Watanabe [11] for function computation problems and channel coding. Our proofs are similar to [11]; however, in our changed measures we restrict to typical sequences, which allows us to circumvent resorting to variational characterizations of the multi-letter and single-letter problems.

In this paper, we also prove a new exponentially-strong converse result for the $K$-hop hypothesis testing problem in [29], see Figure 4. In this problem, all relays as well as the final receiver guess the hypothesis by testing against independence. The figure of merit is the set of type-II error exponents that can be achieved at these terminals subject to rate constraints on the links and the constraint that the type-I error probabilities at these terminals have to stay below predefined thresholds. Specifically, we consider a scenario where terminals observe memoryless source sequences whose underlying joint distribution depends on a binary hypothesis $\mathcal{H} \in \{0, 1\}$. The sources follow a given joint pmf under $\mathcal{H} = 0$, whereas under $\mathcal{H} = 1$ they are independent. Upon observing its source sequence, each terminal can send a message of $n R_k$ bits to the next-following terminal, where $n$ denotes the blocklength and $R_k$ the rate of the corresponding link. Terminals have to produce a guess of the hypothesis based on their local observations and their received message. The main goal is to maximize the exponential decay rate of the type-II error probability (the probability of error under $\mathcal{H} = 1$) under the constraint that for each blocklength the type-I error probability (the probability of error under the null hypothesis $\mathcal{H} = 0$) stays below a given threshold.
For $K = 1$, this problem was solved by Ahlswede and Csiszár [30] for type-I error probabilities that are asymptotically bounded away from 1. In particular, it was shown that the maximum achievable type-II error exponent does not depend on the allowed type-I error probabilities as long as they do not tend to 1. For arbitrary $K$, the problem was studied under the assumption that all the type-I error probabilities tend to 0 as $n \to \infty$ [29]. In [31], it was shown that for $K = 2$ the result in [29] applies unchanged for sequences of type-I error probabilities for which each individual sequence as well as the sum of the two sequences is asymptotically bounded away from 1. In this work, we prove the exponentially-strong converse to this result, i.e., we show that for arbitrary $K$ and arbitrary type-I error probabilities whose complements are not vanishing exponentially fast in the blocklength, the result in [29] continues to hold. Notice that the proof of the mentioned special case with $K = 2$ in [31] used the change of measure argument and variational characterizations proposed by Tyagi and Watanabe [11] and hypercontractivity arguments as in [32]. The proof of the strong converse for $K = 1$ in [30] was based on the blowing-up lemma [4], same as the proof of the strong converse for $K = 1$ when communication is over a DMC and without any rate constraint [33]. The latter work also used the Tyagi-Watanabe change of measure argument combined with variational characterizations.

We summarize the main contributions of our paper:
• We present alternative exponentially-strong converse proofs for the lossless compression problem with side-information for DMSs and the channel coding problem over a DMC. The proofs are rather simple and depend only on a change of measure argument and the asymptotic analysis of these new measures.
• We derive new strong converse results for the two-terminal $L$-round interactive lossy source coding problem for DMSs with multiple distortion measures that depend on both sources and both reconstructions. This setup includes as special cases the Wyner-Ziv problem, the interactive function computation problem, and the lossy source coding problem with (lossy) common reconstructions. Previous strong converse proofs for the Wyner-Ziv problem and the interactive function computation problem relied on the method of variational characterizations and on a change of measure argument. Our proof also relies on a similar change of measure, which is however restricted to the typical set. This, combined with asymptotic proofs of specific Markov chains, allows us to directly prove the desired results without resorting to variational characterizations.
• We derive new exponentially-strong converse results for a $K$-hop hypothesis testing problem. Previously, a general weak converse and strong converses in special cases had been derived for this problem. The previous strong converse for $K = 2$ was based on a change of measure argument combined with variational characterizations, and on hypercontractivity arguments. The general exponentially-strong converse proof in this paper relies on a change of measure argument restricted to the typical set and on an asymptotic analysis of this new measure, including the proof of asymptotic Markov chains.
Outline of the Paper: We end this section with remarks on notation. The following Section II presents and proves two key lemmas used in the rest of the paper. Sections III and IV present our new strong converse proof methods for the almost lossless source coding with side-information problem and for communication over a DMC. These strong converse proofs are solely based on change of measure arguments and on the analysis of these measures in the asymptotic regime of infinite blocklengths. The converse proofs in the next two Sections V and VI are also based on similar change of measure arguments and asymptotic analysis of these measures, but additionally require proving that certain Markov chains involving auxiliary random variables hold in the asymptotic regime of infinite blocklengths. Specifically, Section V considers the $L$-round interactive compression problem with distortion functions that can depend on the sources and reconstructions at both terminals. Section VI considers $K$-hop hypothesis testing against independence where the observations at the various terminals obey certain Markov conditions.
Notation: We mostly follow standard notation where upper-case letters are used for random quantities and lower-case letters for deterministic realizations. Sets are denoted using calligraphic fonts. All random variables are assumed finite and discrete. We abbreviate the $n$-tuples $(X_1, \ldots, X_n)$ and $(x_1, \ldots, x_n)$ as $X^n$ and $x^n$, and the tuples $(X_i, \ldots, X_n)$ and $(x_i, \ldots, x_n)$ as $X_i^n$ and $x_i^n$. We further abbreviate independent and identically distributed as i.i.d. and probability mass function as pmf.
Entropy, conditional entropy, and mutual information functionals are written as $H(\cdot)$, $H(\cdot|\cdot)$, and $I(\cdot\,;\cdot)$, where the arguments of these functionals are random variables, and whenever their probability mass function (pmf) is not clear from the context, we add it as a subscript to these functionals. The Kullback-Leibler divergence between two pmfs $P$ and $Q$ is denoted by $D(P \| Q)$. We shall use $\mathcal{T}_{\mu}^{(n)}(P_{XY})$ to indicate the jointly strongly-typical set with respect to the pmf $P_{XY}$ on the product alphabet $\mathcal{X} \times \mathcal{Y}$ and parameter $\mu > 0$, as defined in [3, Definition 2.8]. Specifically, denoting by $N(x, y \,|\, x^n, y^n)$ the number of occurrences of the pair $(x, y)$ in the sequences $(x^n, y^n)$:

$$N(x, y \,|\, x^n, y^n) \triangleq \sum_{i=1}^{n} \mathbb{1}\{x_i = x, \; y_i = y\}, \qquad (1)$$

a pair $(x^n, y^n)$ lies in $\mathcal{T}_{\mu}^{(n)}(P_{XY})$ if

$$\left| \frac{1}{n} N(x, y \,|\, x^n, y^n) - P_{XY}(x, y) \right| \leq \mu, \qquad \forall (x, y) \in \mathcal{X} \times \mathcal{Y}, \qquad (2)$$

and $N(x, y \,|\, x^n, y^n) = 0$ whenever $P_{XY}(x, y) = 0$. The conditionally strongly-typical set with respect to a conditional pmf $P_{Y|X}$ from $\mathcal{X}$ to $\mathcal{Y}$, parameter $\mu > 0$, and sequence $x^n$ is denoted $\mathcal{T}_{\mu}^{(n)}(P_{Y|X} \,|\, x^n)$ [3, Definition 2.9]. It contains all sequences $y^n$ satisfying

$$\left| \frac{1}{n} N(x, y \,|\, x^n, y^n) - \frac{1}{n} N(x \,|\, x^n) P_{Y|X}(y|x) \right| \leq \mu, \qquad \forall (x, y) \in \mathcal{X} \times \mathcal{Y}, \qquad (3)$$

and $N(x, y \,|\, x^n, y^n) = 0$ whenever $P_{Y|X}(y|x) = 0$. Here $N(x \,|\, x^n)$ denotes the number of occurrences of symbol $x$ in $x^n$. In this paper, we denote the joint type of $(x^n, y^n)$ by $\pi_{x^n y^n}$, i.e.,

$$\pi_{x^n y^n}(x, y) \triangleq \frac{1}{n} N(x, y \,|\, x^n, y^n), \qquad \forall (x, y) \in \mathcal{X} \times \mathcal{Y}. \qquad (4)$$

Accordingly, the marginal type of $x^n$ is written as $\pi_{x^n}$. Finally, we use the Landau notation $o(1)$ to indicate any function that tends to 0 as the blocklength $n \to \infty$.
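To make these typicality notions concrete, the following Python sketch (our illustration; all names are ours and not from the paper) computes the joint type (4) and checks membership in the strongly-typical set according to (1)–(2).

```python
from collections import Counter

def joint_type(xs, ys):
    """Joint type pi_{x^n y^n}(x, y) = N(x, y | x^n, y^n) / n, as in (4)."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return {pair: c / n for pair, c in counts.items()}

def is_strongly_typical(xs, ys, pmf, mu):
    """Membership test for T_mu^(n)(P_XY): all type deviations at most mu,
    and no occurrences of pairs outside the support of P_XY."""
    pi = joint_type(xs, ys)
    if any(abs(pi.get(pair, 0.0) - p) > mu for pair, p in pmf.items()):
        return False
    return all(pmf.get(pair, 0.0) > 0 for pair in pi)  # N = 0 where P_XY = 0

# Example: doubly symmetric binary source with crossover probability 0.1
pmf = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
xs = [0] * 50 + [1] * 50
ys = [0] * 45 + [1] * 5 + [0] * 5 + [1] * 45
print(is_strongly_typical(xs, ys, pmf, mu=0.02))  # True: types match exactly
```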
II Auxiliary Lemmas
Lemma 1
Let $(X^n, Y^n)$ be a sequence of pairs of i.i.d. random variables according to the pmf $P_{XY}$. Further let $\{\mu_n\}_{n \geq 1}$ be a sequence of small positive numbers satisfying the two conditions (Condition (6) ensures that the probability of the strongly typical set under $P_{XY}$ tends to 1 as $n \to \infty$ [3, Remark to Lemma 2.12].):

$$\lim_{n \to \infty} \mu_n = 0, \qquad (5)$$
$$\lim_{n \to \infty} \sqrt{n} \, \mu_n = \infty, \qquad (6)$$

and for each positive integer $n$ let $\mathcal{D}_n$ be a subset of the strongly-typical set $\mathcal{T}_{\mu_n}^{(n)}(P_{XY})$ so that its probability

$$\Delta_n \triangleq P_{XY}^{n}(\mathcal{D}_n) \qquad (7)$$

satisfies

$$\lim_{n \to \infty} \frac{1}{n} \log \Delta_n = 0. \qquad (8)$$

Let further $(\tilde{X}^n, \tilde{Y}^n)$ be random variables of joint pmf

$$P_{\tilde{X}^n \tilde{Y}^n}(x^n, y^n) \triangleq \frac{P_{XY}^{n}(x^n, y^n)}{\Delta_n} \cdot \mathbb{1}\{(x^n, y^n) \in \mathcal{D}_n\}, \qquad (9)$$

and let $T$ be uniform over $\{1, \ldots, n\}$, independent of all other random variables.
For the distribution in (9), the following limits hold as $n \to \infty$:
(10)
(11)
(12)
(13)
(14)
(15)
(16)
Proof:
Notice that (14)–(16) follow directly from (10) and continuity of entropy. To prove (10), notice that
(17)
(18)
(19)
where the expectations are with respect to the tuple $(\tilde{X}^n, \tilde{Y}^n)$ and the time-sharing variable $T$. Since by the definition of the typical set,
(20)
we conclude that as $n \to \infty$ the probability $P_{\tilde{X}_T \tilde{Y}_T}(x, y)$ tends to $P_{XY}(x, y)$.
To prove (11), notice first that
(21)
(22)
(23)
(24)
(25)
(26)
where (23) holds by the law of total probability applied to the random variables $(\tilde{X}_T, \tilde{Y}_T, T)$. Combined with the following two limits (27) and (28), this establishes (11). The first relevant limit is
(27)
which holds by (10) and because $P_{\tilde{X}_T \tilde{Y}_T}(x, y) = 0$ whenever $P_{XY}(x, y) = 0$. The second limit is:
(28)
and holds by (8) and by the following set of inequalities:
(29)
(30)
(31)
The second lemma dates back to Csiszár and Körner [3].
Lemma 2
Let $X^n$ and $Y^n$ be of arbitrary joint distribution and let $T$ be uniform over $\{1, \ldots, n\}$, independent of $(X^n, Y^n)$. Then:
$$H(X^n) - H(Y^n) = n \left[ H\left(X_T \,\middle|\, X_{T+1}^n, Y^{T-1}, T\right) - H\left(Y_T \,\middle|\, X_{T+1}^n, Y^{T-1}, T\right) \right]. \qquad (35)$$
The conditional version follows immediately from the definition of conditional entropy:
$$H(X^n | Z) - H(Y^n | Z) = n \left[ H\left(X_T \,\middle|\, X_{T+1}^n, Y^{T-1}, Z, T\right) - H\left(Y_T \,\middle|\, X_{T+1}^n, Y^{T-1}, Z, T\right) \right], \qquad (36)$$
for any random variable $Z$ so that $T$ is independent of $(X^n, Y^n, Z)$.
Proof:
Kramer's simple telescoping sum states:
$$\sum_{i=1}^{n} \left[ I\left(X_{i+1}^n; Y^i\right) - I\left(X_i^n; Y^{i-1}\right) \right] = 0, \qquad (37)$$
which by the chain rule of mutual information implies Csiszár's sum-identity [3]:
$$\sum_{i=1}^{n} I\left(X_{i+1}^n; Y_i \,\middle|\, Y^{i-1}\right) = \sum_{i=1}^{n} I\left(Y^{i-1}; X_i \,\middle|\, X_{i+1}^n\right), \qquad (38)$$
where we added and subtracted the mutual informations $I(X_{i+1}^n; Y^{i-1})$. Similarly, adding and subtracting suitable mutual information terms yields:
(39)
The desired equality in (35) then follows immediately from this last equation (39) and from the chain rule:
(40)
(41)
(42)
(43)
where the first step is obtained by adding and subtracting appropriate entropy terms; the subsequent step holds by (39); and the last step by the definition of $T$.
∎
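Identity (38), and hence Lemma 2, can be verified numerically on a small example. The sketch below (our illustration) draws a random joint pmf for $(X_1, X_2, X_3, Y_1, Y_2, Y_3)$ and checks Csiszár's sum-identity $\sum_i I(X_{i+1}^n; Y_i | Y^{i-1}) = \sum_i I(Y^{i-1}; X_i | X_{i+1}^n)$ up to floating-point precision.

```python
import itertools
import math
import random

random.seed(0)
n = 3
names = [f"X{i}" for i in range(1, n + 1)] + [f"Y{i}" for i in range(1, n + 1)]
idx = {v: i for i, v in enumerate(names)}

# A random joint pmf over 2n binary random variables
outcomes = list(itertools.product((0, 1), repeat=2 * n))
weights = [random.random() for _ in outcomes]
joint = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

def marginal(vs):
    m = {}
    for o, p in joint.items():
        key = tuple(o[idx[v]] for v in vs)
        m[key] = m.get(key, 0.0) + p
    return m

def cmi(A, B, C):
    """Conditional mutual information I(A; B | C) in bits; A, B, C are lists
    of variable names (empty lists are allowed and give zero)."""
    pabc, pac, pbc, pc = marginal(A+B+C), marginal(A+C), marginal(B+C), marginal(C)
    total = 0.0
    for o, p in pabc.items():
        a, b, c = o[:len(A)], o[len(A):len(A)+len(B)], o[len(A)+len(B):]
        total += p * math.log2(p * pc[c] / (pac[a + c] * pbc[b + c]))
    return total

X = lambda i, j: [f"X{k}" for k in range(i, j + 1)]
Y = lambda i, j: [f"Y{k}" for k in range(i, j + 1)]
lhs = sum(cmi(X(i + 1, n), Y(i, i), Y(1, i - 1)) for i in range(1, n + 1))
rhs = sum(cmi(Y(1, i - 1), X(i, i), X(i + 1, n)) for i in range(1, n + 1))
print(abs(lhs - rhs) < 1e-12)  # True: Csiszar's sum-identity holds
```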
III Lossless Source Coding with Side-Information
This section studies the lossless source coding with side-information setup in Figure 2.
III-A Setup and Result
Consider two terminals, an encoder observing the source sequence $X^n$ and a decoder observing the related side-information sequence $Y^n$, where we assume that

$$P_{X^n Y^n}(x^n, y^n) = \prod_{i=1}^{n} P_{XY}(x_i, y_i), \qquad (44)$$

for a given probability mass function $P_{XY}$ on the product alphabet $\mathcal{X} \times \mathcal{Y}$. The encoder uses a function $\phi^{(n)}$ to compress the sequence $X^n$ into a message $M$ of given rate $R$:

$$M = \phi^{(n)}(X^n), \qquad \phi^{(n)} \colon \mathcal{X}^n \to \{1, \ldots, \lfloor 2^{nR} \rfloor\}. \qquad (45)$$

Based on this message and its own observation $Y^n$, the decoder is supposed to reconstruct the source sequence $X^n$ with small probability of error. Thus, the decoder applies a decoding function $g^{(n)}$ to $(M, Y^n)$ to produce the reconstruction sequence $\hat{X}^n$:

$$\hat{X}^n = g^{(n)}(M, Y^n), \qquad g^{(n)} \colon \{1, \ldots, \lfloor 2^{nR} \rfloor\} \times \mathcal{Y}^n \to \mathcal{X}^n. \qquad (46)$$
Definition 1
Given a sequence of error probabilities $\{\epsilon_n\}_{n \geq 1}$, the rate $R$ is said $\{\epsilon_n\}$-achievable if there exist sequences (in $n$) of encoding and reconstruction functions $\phi^{(n)}$ and $g^{(n)}$ such that for each blocklength $n$:
$$\Pr\left[ \hat{X}^n \neq X^n \right] \leq \epsilon_n. \qquad (47)$$
A standard result in information theory is [19]:
Theorem 1
For any sequence $\{\epsilon_n\}_{n \geq 1}$ satisfying
$$\lim_{n \to \infty} \frac{1}{n} \log \frac{1}{1 - \epsilon_n} = 0, \qquad (48)$$
any rate $R < H(X|Y)$ is not $\{\epsilon_n\}$-achievable.
Notice that the theorem in particular implies an exponentially-strong converse, i.e., that for all rates $R < H(X|Y)$ the probability of error approaches 1 exponentially fast in $n$. The result is well known, but we provide an alternative proof in the following subsection.
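For a feel of the threshold in Theorem 1, the short sketch below (our illustration, with an arbitrary example pmf) evaluates the conditional entropy $H(X|Y)$ that separates achievable from non-achievable compression rates.

```python
import math

def conditional_entropy_bits(pmf_xy):
    """H(X|Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_y = {}
    for (x, y), p in pmf_xy.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * math.log2(p / p_y[y]) for (x, y), p in pmf_xy.items() if p > 0)

# Doubly symmetric binary source with crossover probability 0.1
pmf = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(f"H(X|Y) = {conditional_entropy_bits(pmf):.4f} bits")
# ~0.4690 bits: by Theorem 1, any rate below this fails with probability -> 1
```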
III-B Alternative Strong Converse Proof
Fix a sequence of encoding and decoding functions $\{(\phi^{(n)}, g^{(n)})\}_{n \geq 1}$ satisfying (47). Choose a sequence $\{\mu_n\}_{n \geq 1}$ of small positive numbers satisfying
$$\lim_{n \to \infty} \mu_n = 0, \qquad (49)$$
$$\lim_{n \to \infty} \sqrt{n} \, \mu_n = \infty, \qquad (50)$$
and select for each $n$ a subset
$$\mathcal{D}_n \triangleq \left\{ (x^n, y^n) \in \mathcal{T}_{\mu_n}^{(n)}(P_{XY}) \colon g^{(n)}\left(\phi^{(n)}(x^n), y^n\right) = x^n \right\}, \qquad (51)$$
i.e., the set of all typical $(x^n, y^n)$-sequences for which the reconstructed sequence coincides with the source sequence $x^n$. Let
$$\Delta_n \triangleq P_{XY}^{n}(\mathcal{D}_n), \qquad (52)$$
and notice that by (47) and [3, Remark to Lemma 2.12]:
$$\Delta_n \geq 1 - \epsilon_n - o(1), \qquad (53)$$
$$\lim_{n \to \infty} \frac{1}{n} \log \Delta_n = 0. \qquad (54)$$
Let further $(\tilde{X}^n, \tilde{Y}^n)$ be random variables of joint pmf
$$P_{\tilde{X}^n \tilde{Y}^n}(x^n, y^n) \triangleq \frac{P_{XY}^{n}(x^n, y^n)}{\Delta_n} \cdot \mathbb{1}\{(x^n, y^n) \in \mathcal{D}_n\}. \qquad (55)$$
Let also $\tilde{M} \triangleq \phi^{(n)}(\tilde{X}^n)$ and let $T$ be uniform over $\{1, \ldots, n\}$, independent of $(\tilde{X}^n, \tilde{Y}^n)$.
The strong converse is then easily obtained as follows. Similar to the weak converse we have:
$$nR \geq H(\tilde{M}) \geq H(\tilde{M} | \tilde{Y}^n) \geq H(\tilde{X}^n | \tilde{Y}^n), \qquad (56)$$
where the last inequality in (56) holds because $\tilde{X}^n$ can be obtained as a function of $\tilde{M}$ and $\tilde{Y}^n$, see the definition of the set $\mathcal{D}_n$. Dividing by $n$ and applying Lemma 1 to the new measure (55) establishes $R \geq H(X|Y) - o(1)$, and thus the theorem.
IV Communication over a Memoryless Channel
This section studies communication over a discrete memoryless channel (DMC) as depicted in Figure 1.
IV-A Setup and Results
Consider a transmitter (Tx) that wishes to communicate to a receiver (Rx) over a DMC parametrized by the finite input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$ and the transition law $P_{Y|X}$. The goal of the communication is that the Tx conveys a message $M$ to the Rx, where $M$ is uniformly distributed over the set $\{1, \ldots, \lfloor 2^{nR} \rfloor\}$, with $R$ and $n$ denoting the rate and blocklength of communication, respectively.
For a given blocklength $n$, the Tx thus produces the $n$-length sequence of channel inputs
$$X^n = f^{(n)}(M), \qquad (57)$$
for some choice of the encoding function $f^{(n)} \colon \{1, \ldots, \lfloor 2^{nR} \rfloor\} \to \mathcal{X}^n$, and the Rx observes the sequence of channel outputs $Y^n$, where the time-$i$ output $Y_i$ is distributed according to the law $P_{Y|X}(\cdot \,|\, x_i)$ when the time-$i$ input is $x_i$, irrespective of the previous and future inputs and outputs.
The receiver attempts to guess message $M$ based on the sequence of channel outputs $Y^n$:
$$\hat{M} = g^{(n)}(Y^n), \qquad (58)$$
using a decoding function of the form $g^{(n)} \colon \mathcal{Y}^n \to \{1, \ldots, \lfloor 2^{nR} \rfloor\}$. The goal is to minimize the maximum decoding error probability
$$p_{\max}^{(n)} \triangleq \max_{m} \Pr\left[ \hat{M} \neq m \,\middle|\, M = m \right]. \qquad (59)$$
Definition 2
The rate $R$ is said $\{\epsilon_n\}$-achievable over the DMC $P_{Y|X}$ if there exists a sequence of encoding and decoding functions $\{(f^{(n)}, g^{(n)})\}_{n \geq 1}$ such that for each blocklength $n$ the maximum probability of error satisfies
$$p_{\max}^{(n)} \leq \epsilon_n. \qquad (60)$$
A well-known result in information theory states [3]:
Theorem 2
Any rate $R > C$, where $C$ denotes the capacity
$$C = \max_{P_X} I(X; Y), \qquad (61)$$
is not $\{\epsilon_n\}$-achievable for all sequences $\{\epsilon_n\}_{n \geq 1}$ satisfying
$$\lim_{n \to \infty} \frac{1}{n} \log \frac{1}{1 - \epsilon_n} = 0. \qquad (62)$$
The above result implies that for all rates above capacity, the probability of error converges to 1 exponentially fast. This result is well known; here we present a different converse proof.
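Theorem 2 pivots on the capacity $C$ in (61). As a side note, $C = \max_{P_X} I(X;Y)$ can be computed with the standard Blahut–Arimoto algorithm; the sketch below (not part of the paper's argument) does so for a binary symmetric channel.

```python
import math

def blahut_arimoto(W, iters=200):
    """Capacity max_{P_X} I(X; Y) in bits of a DMC with transition matrix
    W[x][y] = P(y | x), via the standard Blahut-Arimoto iteration."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                    # start from the uniform input pmf
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c[x] = exp( D( W(.|x) || q ) )
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))
        p = [p[x] * c[x] / z for x in range(nx)]
    return math.log2(z)                    # log z converges to the capacity

# Binary symmetric channel, crossover 0.1: C = 1 - h_b(0.1) ~ 0.531 bits
W = [[0.9, 0.1], [0.1, 0.9]]
print(f"C = {blahut_arimoto(W):.4f} bits")
```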
IV-B Alternative Strong Converse Proof
Fix a sequence of encoding and decoding functions $\{(f^{(n)}, g^{(n)})\}_{n \geq 1}$ so that (60) holds. Choose a sequence $\{\mu_n\}_{n \geq 1}$ of small positive numbers satisfying
$$\lim_{n \to \infty} \mu_n = 0, \qquad (63)$$
$$\lim_{n \to \infty} \sqrt{n} \, \mu_n = \infty, \qquad (64)$$
and define for each message $m$ the set
$$\mathcal{D}_m \triangleq \left\{ y^n \in \mathcal{T}_{\mu_n}^{(n)}\left(P_{Y|X} \,\middle|\, x^n(m)\right) \colon g^{(n)}(y^n) = m \right\}, \qquad (65)$$
where $x^n(m) \triangleq f^{(n)}(m)$ denotes the codeword of message $m$, and its probability
$$\Delta_m \triangleq P_{Y|X}^{n}\left(\mathcal{D}_m \,\middle|\, x^n(m)\right). \qquad (66)$$
(For readability we write the sets $\mathcal{D}_m$ and their probabilities $\Delta_m$ without the subscript $n$.) By (60) and [3, Remark to Lemma 2.12]:
$$\Delta_m \geq 1 - \epsilon_n - o(1), \qquad (67)$$
$$\lim_{n \to \infty} \frac{1}{n} \log \Delta_m = 0. \qquad (68)$$
Let further $(\tilde{M}, \tilde{X}^n, \tilde{Y}^n)$ be random variables so that $\tilde{M}$ is uniform over the set $\{1, \ldots, \lfloor 2^{nR} \rfloor\}$ (i.e., it has the same distribution as $M$),
$$\tilde{X}^n = f^{(n)}(\tilde{M}), \qquad (69)$$
and
$$P_{\tilde{Y}^n | \tilde{M}}(y^n \,|\, m) \triangleq \frac{P_{Y|X}^{n}\left(y^n \,\middle|\, x^n(m)\right)}{\Delta_m} \cdot \mathbb{1}\{y^n \in \mathcal{D}_m\}. \qquad (70)$$
Further, let $T$ be independent of $(\tilde{M}, \tilde{X}^n, \tilde{Y}^n)$ and uniform over $\{1, \ldots, n\}$. Notice that since the decoding sets $\{\mathcal{D}_m\}$ are disjoint, by the definition of the new measure it is possible to determine $\tilde{M}$ from $\tilde{Y}^n$ with probability 1.
Following similar steps as in the weak converse, we have:
(71)
(72)
(73)
(74)
(75)
(76)
where the last step holds because $\tilde{M}$ can be determined from $\tilde{Y}^n$ with probability 1, as explained above.
By the following lemma, by considering an appropriate subsequence of blocklengths, and by the continuity of the entropy function, we deduce that
$$R \leq I_{P^*}(X; Y) \leq C, \qquad (77)$$
where the subscript $P^*$ indicates that the mutual information is with respect to the joint pmf $P_X^* \cdot P_{Y|X}$, with $P_X^*$ denoting the pmf mentioned in the lemma. This concludes the proof of the strong converse for channel coding.
Lemma 3
There exists an increasing subsequence of blocklengths $\{n_k\}_{k \geq 1}$ such that for some pmf $P_X^*$ on $\mathcal{X}$:
(78)
(79)
where $H_{P^*}(Y|X)$ denotes the conditional entropy of $Y$ given $X$ when the pair $(X, Y)$ follows the joint pmf $P_X^* \cdot P_{Y|X}$.
Proof:
For readability, we will also write $X^n(\tilde{M})$ and $x^n(m)$ to indicate the (random) codewords $f^{(n)}(\tilde{M})$ and $f^{(n)}(m)$. We have:
(80)
(81)
(82)
(83)
(84)
Since $\mathcal{D}_m$ is a subset of the conditional typical set $\mathcal{T}_{\mu_n}^{(n)}(P_{Y|X} \,|\, x^n(m))$, for all $m$, all $y^n \in \mathcal{D}_m$, and all $(x, y) \in \mathcal{X} \times \mathcal{Y}$:
(85)
and if $P_{Y|X}(y|x) = 0$ then $N(x, y \,|\, x^n(m), y^n) = 0$. Plugging these conditions into (84) we obtain
(86a)
and similarly:
(86b)
Let now $\{n_k\}_{k \geq 1}$ be an increasing subsequence of blocklengths so that the sequence of average types converges for each pair $(x, y)$, and denote the convergence point by $P_{XY}^*$. Then, since $\mu_n \to 0$ as $n \to \infty$, by (86):
(87)
establishing the first part of the lemma.
Notice next that by definition
(88)
(94)
Averaging over all messages $m$, we obtain
(96)
V Interactive Lossy Compression
This section focuses on the interactive lossy compression problem depicted in Figure 3.
V-A Setup
Consider two terminals observing the related source sequences $X^n$ and $Y^n$, where as in the case of source coding with side-information:
$$P_{X^n Y^n}(x^n, y^n) = \prod_{i=1}^{n} P_{XY}(x_i, y_i), \qquad (98)$$
for a given probability mass function $P_{XY}$ on the product alphabet $\mathcal{X} \times \mathcal{Y}$. Communication between the two terminals is over noise-free links and interactive in $L$ rounds. The terminal observing $X^n$ starts the communication, and thus in all odd rounds $\ell$, the message is created as:
$$M_\ell = \phi_\ell^{(n)}\left(X^n, M_1, \ldots, M_{\ell-1}\right), \qquad (99)$$
for an encoding function $\phi_\ell^{(n)}$ on appropriate domains, where each message $M_\ell \in \{1, \ldots, \lfloor 2^{n R_\ell} \rfloor\}$, for given non-negative rates $R_1, \ldots, R_L$. (Note that for $\ell = 1$ the message is $M_1 = \phi_1^{(n)}(X^n)$.) In even rounds $\ell$, the message is created as:
$$M_\ell = \phi_\ell^{(n)}\left(Y^n, M_1, \ldots, M_{\ell-1}\right). \qquad (100)$$
At the end of the $L$ rounds, each terminal produces a reconstruction sequence on a pre-specified alphabet. The terminal observing $X^n$ produces
$$\hat{X}^n = g_X^{(n)}\left(X^n, M_1, \ldots, M_L\right), \qquad (101)$$
with $\hat{X}^n$ taking value in the given alphabet $\hat{\mathcal{X}}^n$. The terminal observing $Y^n$ produces
$$\hat{Y}^n = g_Y^{(n)}\left(Y^n, M_1, \ldots, M_L\right), \qquad (102)$$
with $\hat{Y}^n$ taking value in the given alphabet $\hat{\mathcal{Y}}^n$.
The reconstructions are supposed to satisfy a set of $J$ distortion constraints:
$$\frac{1}{n} \sum_{i=1}^{n} d_j\left(X_i, Y_i, \hat{X}_i, \hat{Y}_i\right) \leq D_j, \qquad j \in \{1, \ldots, J\}, \qquad (103)$$
for given non-negative symbolwise-distortion functions $d_j \colon \mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}} \times \hat{\mathcal{Y}} \to \mathbb{R}_{\geq 0}$.
Definition 3
Given sequences $\{\epsilon_{j,n}\}_{n \geq 1}$, for $j \in \{1, \ldots, J\}$, a rate-tuple $(R_1, \ldots, R_L)$ is said $\{\epsilon_{j,n}\}$-achievable if there exist sequences (in $n$) of encoding functions $\{\phi_\ell^{(n)}\}$ and reconstruction functions $g_X^{(n)}$ and $g_Y^{(n)}$ such that the excess distortion probabilities satisfy
$$\Pr\left[ \frac{1}{n} \sum_{i=1}^{n} d_j\left(X_i, Y_i, \hat{X}_i, \hat{Y}_i\right) > D_j \right] \leq \epsilon_{j,n}, \qquad j \in \{1, \ldots, J\}. \qquad (104)$$
Remark 1
Our problem formulation includes various previously studied models as special cases. For example, the Wyner-Ziv problem [18] is included by setting $L = 1$ and choosing a single distortion function that depends only on the source $X$ and the reconstruction produced at the terminal observing $Y^n$. Kaspi's interactive source-coding problem is included by restricting to two distortion functions, one depending only on $(Y, \hat{X})$ and one only on $(X, \hat{Y})$. Lossy source coding with side-information and lossy common reconstruction [28, 27] is included by adding a distortion function that penalizes disagreement between the two terminals' reconstructions. The interactive function computation problem [26] is obtained by choosing indicator distortion functions that penalize the events $\hat{X}_i \neq f(X_i, Y_i)$ and $\hat{Y}_i \neq f(X_i, Y_i)$ for the desired function $f$.
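To illustrate the generality of four-argument distortion functions in this formulation, the sketch below (our own illustration; the function $f$ and all names are hypothetical) phrases some of the special cases of Remark 1 as callables with the common signature $d(x, y, \hat{x}, \hat{y})$ and evaluates the empirical distortion appearing in (103).

```python
# Each special case of Remark 1 becomes a distortion d(x, y, xhat, yhat) >= 0.

def wyner_ziv_hamming(x, y, xhat, yhat):
    """Wyner-Ziv flavor: only the source X and the reconstruction at the
    terminal observing Y^n matter."""
    return 0 if yhat == x else 1

def f(x, y):
    """Hypothetical function to be computed at both terminals."""
    return (x + y) % 2

def function_computation(x, y, xhat, yhat):
    """Ma-Ishwar flavor: both terminals should output f(X, Y)."""
    return (xhat != f(x, y)) + (yhat != f(x, y))

def common_reconstruction(x, y, xhat, yhat):
    """Common-reconstruction flavor: disagreement between the two
    reconstructions is penalized (enforce with distortion level D = 0)."""
    return 0 if xhat == yhat else 1

def empirical_distortion(d, xs, ys, xhats, yhats):
    """Average symbolwise distortion, the quantity constrained in (103)."""
    return sum(map(d, xs, ys, xhats, yhats)) / len(xs)

xs, ys = [0, 1, 1, 0], [0, 0, 1, 1]
fx = [f(a, b) for a, b in zip(xs, ys)]
print(empirical_distortion(function_computation, xs, ys, fx, fx))  # 0.0
```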
Theorem 3
Given any sequences $\{\epsilon_{j,n}\}_{n \geq 1}$, for $j \in \{1, \ldots, J\}$, satisfying
(105a)
(105b)
a rate-tuple $(R_1, \ldots, R_L)$ can only be $\{\epsilon_{j,n}\}$-achievable if it satisfies the rate-constraints
(106a)
for some auxiliary random variables $U_1, \ldots, U_L$ and reconstruction random variables $\hat{X}$ and $\hat{Y}$ satisfying the distortion constraints
(106b)
for $j \in \{1, \ldots, J\}$, and the Markov chains
(106c)
(106d)
(106e)
(106f)
Remark 2 (A single distortion)
For a single distortion constraint ($J = 1$), the theorem implies that if the rate-tuple violates the constraints in the theorem, then the probability of excess distortion tends to 1 exponentially fast.
Remark 3 (Vector-valued distortions)
Theorem 3 extends in a straightforward manner to vector-valued distortion functions and vector distortions
$$\frac{1}{n} \sum_{i=1}^{n} \mathbf{d}\left(X_i, Y_i, \hat{X}_i, \hat{Y}_i\right) \leq \mathbf{D}, \qquad (107)$$
where for some positive integer $J$, the $J$ component distortion functions are non-negative and of the form $d_j \colon \mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}} \times \hat{\mathcal{Y}} \to \mathbb{R}_{\geq 0}$, and inequality (107) is meant component-wise. The difference between $J$ scalar distortion constraints as in (103) and a single $J$-valued vector-distortion function as in (107) is that the vector-distortion constraint limits the probability that any of the constraints is violated, whereas the scalar distortion constraints individually limit the probability of each distortion being violated.
In the following section, we prove the strong converse, i.e., the non-achievability of any rate-tuple not satisfying the above conditions, for any sequences $\{\epsilon_{j,n}\}$ satisfying (105). Using standard arguments, it can be shown that for any rate-tuple satisfying constraints (106) there exist excess probabilities $\{\epsilon_{j,n}\}$ all tending to 0 as $n \to \infty$ so that the rate-tuple is $\{\epsilon_{j,n}\}$-achievable.
V-B Strong Converse Proof
Fix a sequence of encoding functions $\{\phi_\ell^{(n)}\}$ and reconstruction functions $g_X^{(n)}$ and $g_Y^{(n)}$ satisfying (104). Choose a sequence $\{\mu_n\}_{n \geq 1}$ of positive real numbers satisfying
$$\lim_{n \to \infty} \mu_n = 0, \qquad (108)$$
$$\lim_{n \to \infty} \sqrt{n} \, \mu_n = \infty, \qquad (109)$$
and define the set
(110)
i.e., the set of jointly typical source pairs for which all $J$ distortion constraints in (103) are satisfied, where for odd values of $\ell$ the message $m_\ell$ is computed from $x^n$ and the previous messages, whereas for even values of $\ell$ it is computed from $y^n$ and the previous messages.
Define also the probability
(111)
and notice that by (104) and [3, Remark to Lemma 2.12]:
(112)
which by assumptions (105b) and (109) satisfies
(113)
Let further $(\tilde{X}^n, \tilde{Y}^n)$ be random variables of joint pmf
(114)
Let also $T$ be uniform over $\{1, \ldots, n\}$, independent of $(\tilde{X}^n, \tilde{Y}^n)$, and define:
(115)
(116)
Note that for $\ell = 0$ the corresponding message tuple is empty. Define the auxiliary random variables
(117a)
(117b)
We start with some preliminary observations. For any odd $\ell \in \{1, \ldots, L\}$ observe the following:
(121)
(122)
where the first step holds because for odd $\ell$, the message $M_\ell$ is a function of $(X^n, M_1, \ldots, M_{\ell-1})$, whereas for even $\ell$ the message $M_\ell$ is a function of $(Y^n, M_1, \ldots, M_{\ell-1})$; the second step holds by Lemmas 1 and 2 in Section II and Definitions (117), where for the application of Lemma 1 we also used Equation (113), and it is worth noting here that when $\ell = 1$, all the terms before and including the first summation do not exist anymore; the third step holds by splitting the entropy terms between sums for odd and even rounds and by the definition of mutual information; and the last step holds by the non-negativity of mutual information.
Following similar steps, we obtain for any even $\ell \in \{1, \ldots, L\}$:
(123)
We now apply bounds (122) and (123) to obtain the desired bounds on the rates and to prove the validity of some desired asymptotic Markov chains. For any odd $\ell$, we have
(127)
(128)
(129)
where the first step holds because for odd $\ell$ the message $M_\ell$ is a function of the tuple $(X^n, M_1, \ldots, M_{\ell-1})$ and because conditioning can only reduce entropy; and the last step holds by (117) and (122). Notice that for $\ell = 1$:
(130)
For any even $\ell$, we have:
(131)
(134)
(135)
where the first step holds because for even $\ell$ the message $M_\ell$ is a function of $(Y^n, M_1, \ldots, M_{\ell-1})$, and the last step holds by (117) and (123).
We next notice that for even $\ell$ (because the message $M_\ell$ is a function of $(Y^n, M_1, \ldots, M_{\ell-1})$):
(136)
(137)
where the last step holds by (122) and because conditioning can only reduce entropy.
Similarly, for odd $\ell$ (because the message $M_\ell$ is a function of $(X^n, M_1, \ldots, M_{\ell-1})$):
(138)
In particular, for $\ell = 1$, message $M_1$ is a function of $X^n$ only, and we have:
(139)
Let now $\hat{X}^n \triangleq g_X^{(n)}(\tilde{X}^n, \tilde{M}_1, \ldots, \tilde{M}_L)$ and $\hat{Y}^n \triangleq g_Y^{(n)}(\tilde{Y}^n, \tilde{M}_1, \ldots, \tilde{M}_L)$, and notice that by our definition of the set in (110), for any $j \in \{1, \ldots, J\}$:
(140)
(141)
For simplicity, in the sequel we assume that $L$ is even; if $L$ is odd the proof is similar. Similarly to (137) and (138), since $\hat{X}^n$ is a function of $(\tilde{X}^n, \tilde{M}_1, \ldots, \tilde{M}_L)$ and $\hat{Y}^n$ is a function of $(\tilde{Y}^n, \tilde{M}_1, \ldots, \tilde{M}_L)$, we have:
(143)
and
(144)
where the first step holds since for even $L$, message $M_L$ is a function of $(Y^n, M_1, \ldots, M_{L-1})$; and the last step holds by (122) since $L - 1$ is odd and because conditioning can only reduce entropy.
The desired rate constraints are then obtained by combining (129), (130), (135), (137), (138), (139), (141), (143), and (144), and by taking $n \to \infty$. Details are as follows. By Carathéodory's theorem [34, Appendix C], there exist auxiliary random variables of bounded alphabets satisfying (129), (130), (135), (137), (138), (139), (141), (143), and (144). We restrict to such auxiliary random variables and invoke the Bolzano-Weierstrass theorem to conclude the existence of a limiting pmf, in the following abbreviated as $P^*$, and an increasing subsequence of blocklengths $\{n_k\}_{k \geq 1}$ so that
(145)
where $P^{(n)}$ denotes the pmf of the relevant tuple at blocklength $n$.
Notice that for any blocklength $n$ the pair $(\tilde{X}^n, \tilde{Y}^n)$ lies in the jointly typical set $\mathcal{T}_{\mu_n}^{(n)}(P_{XY})$, and thus, since $\mu_n \to 0$ as $n \to \infty$, by the definition of the typical set and by (145), the limiting pmf satisfies $P_{XY}^* = P_{XY}$. We further deduce from (129), (130), (135), (137), (138), (139), (141), (143), and (144) that:
(146a)
(146b)
(146c)
(146d)
(146e)
(146f)
where the subscript indicates that the mutual information quantities should be computed with respect to $P^*$.
VI Testing Against Independence in a $K$-Hop Network
In this section we focus on the $K$-hop hypothesis testing setup in Figure 4.
VI-A Setup
Consider a system with a transmitter T0 observing the source sequence $Y_0^n$, relays labelled R1 to RK−1 and observing the sequences $Y_1^n, \ldots, Y_{K-1}^n$, respectively, and a receiver RK observing the sequence $Y_K^n$.
The source sequences are distributed according to one of two distributions depending on a binary hypothesis $\mathcal{H} \in \{0, 1\}$:
$$\text{under } \mathcal{H} = 0 \colon \quad (Y_0^n, Y_1^n, \ldots, Y_K^n) \sim \text{ i.i.d. } P_{Y_0 Y_1 \cdots Y_K}, \qquad (148a)$$
$$\text{under } \mathcal{H} = 1 \colon \quad (Y_0^n, Y_1^n, \ldots, Y_K^n) \sim \text{ i.i.d. } P_{Y_0} \cdot P_{Y_1} \cdots P_{Y_K}. \qquad (148b)$$
Communication takes place over $K$ hops as illustrated in Figure 4. The transmitter T0 sends a message $M_1$ to the first relay R1, which sends a message $M_2$ to the second relay, and so on. The communication is thus described by encoding functions
$$\phi_1^{(n)} \colon \mathcal{Y}_0^n \to \{1, \ldots, \lfloor 2^{n R_1} \rfloor\}, \qquad (149)$$
$$\phi_k^{(n)} \colon \mathcal{Y}_{k-1}^n \times \{1, \ldots, \lfloor 2^{n R_{k-1}} \rfloor\} \to \{1, \ldots, \lfloor 2^{n R_k} \rfloor\}, \qquad k \in \{2, \ldots, K\}, \qquad (150)$$
and messages are obtained as:
$$M_1 = \phi_1^{(n)}(Y_0^n), \qquad (151)$$
$$M_k = \phi_k^{(n)}\left(Y_{k-1}^n, M_{k-1}\right), \qquad k \in \{2, \ldots, K\}. \qquad (152)$$
Each relay R1, …, RK−1, as well as the receiver RK, produces a guess of the hypothesis $\mathcal{H}$. These guesses are described by guessing functions
$$g_k^{(n)} \colon \mathcal{Y}_k^n \times \{1, \ldots, \lfloor 2^{n R_k} \rfloor\} \to \{0, 1\}, \qquad k \in \{1, \ldots, K\}, \qquad (153)$$
where we request that the guesses
$$\hat{\mathcal{H}}_k = g_k^{(n)}\left(Y_k^n, M_k\right) \qquad (154)$$
have type-I error probabilities
$$\alpha_{k,n} \triangleq \Pr\left[ \hat{\mathcal{H}}_k = 1 \,\middle|\, \mathcal{H} = 0 \right] \qquad (155)$$
not exceeding given thresholds, and type-II error probabilities
$$\beta_{k,n} \triangleq \Pr\left[ \hat{\mathcal{H}}_k = 0 \,\middle|\, \mathcal{H} = 1 \right] \qquad (156)$$
decaying to 0 exponentially fast with largest possible exponents.
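To fix the two error notions in (155)–(156), the following Monte Carlo sketch (our illustration, not the paper's coding scheme) estimates type-I and type-II error probabilities of a naive single-terminal test that declares $\mathcal{H} = 0$ when two binary sequences look correlated.

```python
import random

random.seed(1)
n, trials = 200, 2000

def sample(h):
    """Under H = 0 the pair (Y0, Y1) is correlated; under H = 1 independent."""
    y0 = [random.randint(0, 1) for _ in range(n)]
    if h == 0:
        y1 = [b if random.random() < 0.9 else 1 - b for b in y0]
    else:
        y1 = [random.randint(0, 1) for _ in range(n)]
    return y0, y1

def guess(y0, y1, thresh=0.7):
    """Declare H = 0 when the empirical agreement rate exceeds the threshold."""
    agree = sum(a == b for a, b in zip(y0, y1)) / n
    return 0 if agree > thresh else 1

alpha = sum(guess(*sample(0)) == 1 for _ in range(trials)) / trials  # type I
beta = sum(guess(*sample(1)) == 0 for _ in range(trials)) / trials   # type II
print(f"type-I ~ {alpha:.3f}, type-II ~ {beta:.3f}")  # both near 0 here
```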
Definition 4
Given sequences of allowed type-I error probabilities $\{\epsilon_{k,n}\}_{n \geq 1}$ and rates $R_1, \ldots, R_K$, the exponent tuple $(\theta_1, \ldots, \theta_K)$ is called $\{\epsilon_{k,n}\}$-achievable if there exists a sequence of encoding and decision functions satisfying for each $k \in \{1, \ldots, K\}$ and blocklength $n$:
$$\alpha_{k,n} \leq \epsilon_{k,n}, \qquad (157a)$$
$$-\frac{1}{n} \log \beta_{k,n} \geq \theta_k. \qquad (157b)$$
VI-B Old and New Results
Definition 5
For any $k \in \{1, \ldots, K\}$, define the function
$$\eta_j(R_j) \triangleq \max_{P_{U | Y_{j-1}} \colon I(U; Y_{j-1}) \leq R_j} I(U; Y_j), \qquad (158)$$
$$\theta_k^{\star}(R_1, \ldots, R_k) \triangleq \sum_{j=1}^{k} \eta_j(R_j), \qquad (159)$$
where the auxiliary $U$ satisfies the Markov chain $U \to Y_{j-1} \to Y_j$ and the mutual informations are computed under the null-hypothesis pmf.
The described setup was previously studied in [29] and [31], and an extension of the setup under variable-length coding was considered in [35]. While for a general number of hops $K$ only achievability results and weak converses were presented in [29], for $K = 2$ a strong converse was derived in [31].
Theorem 4 (Theorems 2 and 3 in [31])
Let $K = 2$ and consider fixed allowed type-I error probabilities
$$\epsilon_{k,n} = \epsilon_k, \qquad k \in \{1, 2\}, \qquad (160)$$
for given $\epsilon_1, \epsilon_2 \in (0, 1)$ with $\epsilon_1 + \epsilon_2 \neq 1$. An exponent pair $(\theta_1, \theta_2)$ is $(\epsilon_1, \epsilon_2)$-achievable if, and only if,
$$\theta_k \leq \theta_k^{\star}(R_1, \ldots, R_k), \qquad k \in \{1, 2\}. \qquad (161)$$
Remark 4
In [31], the presentation of Theorem 4 was split into two separate theorems depending on the values of $\epsilon_1$ and $\epsilon_2$. While [31, Theorem 2] considers one range of these values and coincides with the above formulation, [31, Theorem 3] considers the complementary range and is formulated as an optimization problem over three auxiliary random variables. Without loss in optimality, this optimization can however be restricted to two auxiliaries, and [31, Theorem 3] simplifies to the form presented in the above Theorem 4.
Remark 5
The set of exponent pairs $(\theta_1, \theta_2)$ that are achievable according to Theorem 4 does not depend on the values of $\epsilon_1$ and $\epsilon_2$ (as long as they satisfy the conditions of the theorem) and forms a rectangular region. In particular, each of the two exponents can be maximized without affecting the other exponent. This result extends to a general number of hops $K$, as shown by the achievability result in [29] and by the strong converse result in the following Theorem 5.
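For intuition on the quantities appearing in Theorem 4, recall that for a single hop the optimal exponent is the Ahlswede–Csiszár value $\max_{P_{U|Y_0}: I(U;Y_0) \leq R} I(U;Y_1)$ [30]. The sketch below (our illustration) approximates this maximum by a grid search over binary auxiliary channels $P_{U|Y_0}$; the grid resolution and the example pmf are arbitrary choices.

```python
import itertools
import math

def mi_bits(pxy):
    """Mutual information I(X; Y) in bits from a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)

# Joint pmf of (Y0, Y1): doubly symmetric binary source, crossover 0.1
P = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
P0 = {0: 0.5, 1: 0.5}  # marginal pmf of Y0

def single_hop_exponent(R, grid=51):
    """max I(U; Y1) over binary-U channels P_{U|Y0} with I(U; Y0) <= R."""
    best = 0.0
    pts = [i / (grid - 1) for i in range(grid)]
    for a, b in itertools.product(pts, pts):  # a = P(U=0|Y0=0), b = P(U=0|Y0=1)
        pu_y0 = {(0, 0): a, (1, 0): 1 - a, (0, 1): b, (1, 1): 1 - b}
        puy0 = {(u, y0): P0[y0] * pu_y0[(u, y0)] for u in (0, 1) for y0 in (0, 1)}
        puy1 = {}
        for (y0, y1), p in P.items():
            for u in (0, 1):
                puy1[(u, y1)] = puy1.get((u, y1), 0.0) + p * pu_y0[(u, y0)]
        if mi_bits(puy0) <= R:
            best = max(best, mi_bits(puy1))
    return best

print(f"theta(R = 0.2) ~ {single_hop_exponent(0.2):.3f} bits")
```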
Our main result in this section (Theorem 5 ahead) generalizes the strong converse in Theorem 4 to arbitrary $K$ and arbitrary sequences of type-I error probabilities. Technically speaking, we prove an exponentially-strong converse result, which is a stronger statement. In fact, for any $k$, an exponent $\theta_k$ violating Condition (163) can only be achieved with type-I error probabilities that tend to 1 exponentially fast in the blocklength $n$.
Theorem 5
Let $\{\epsilon_{1,n}\}_{n \geq 1}, \ldots, \{\epsilon_{K,n}\}_{n \geq 1}$ be sequences satisfying
$$\lim_{n \to \infty} \frac{1}{n} \log \frac{1}{1 - \epsilon_{k,n}} = 0, \qquad k \in \{1, \ldots, K\}. \qquad (162)$$
Given rates $R_1, \ldots, R_K$, the exponent-tuple $(\theta_1, \ldots, \theta_K)$ can only be $\{\epsilon_{k,n}\}$-achievable if
$$\theta_k \leq \theta_k^{\star}(R_1, \ldots, R_k), \qquad k \in \{1, \ldots, K\}. \qquad (163)$$
Remark 6
Remark 7
For all permissible type-I error probabilities that satisfy (162) and do not vanish too quickly, the set of achievable exponent-tuples forms a hypercube, implying that all decision centers, i.e., relays R1, …, RK−1 and receiver RK, can simultaneously achieve their optimal type-II error exponents. To prove the desired converse result in Theorem 5, it thus suffices to show that the bound in (163) holds in a setup where only the single decision center Rk takes a decision.
Remark 8
When one allows for variable-length coding and only limits the expected sizes of the message set but not its maximum sizes, then a tradeoff between the different exponents arises [35]. Moreover, as also shown in [35], in that case the set of all achievable exponent tuples depends on the asymptotic values of the allowed type-I error probabilities.
VI-C Strong Converse Proof to Theorem 5
Let $\{\epsilon_{k,n}\}_{n \geq 1}$, for $k \in \{1, \ldots, K\}$, be sequences of allowed type-I error probabilities. Fix a sequence (in $n$) of encoding and decision functions satisfying (157) for these type-I error probabilities and for type-II error exponents $\theta_1, \ldots, \theta_K$.
Choose a sequence (in $n$) of small positive numbers $\mu_n$ satisfying
$$\lim_{n \to \infty} \mu_n = 0, \qquad (164)$$
$$\lim_{n \to \infty} \sqrt{n} \, \mu_n = \infty. \qquad (165)$$
Fix now an arbitrary $k \in \{1, \ldots, K\}$ and a blocklength $n$, and let $\mathcal{A}_k$ denote the acceptance region of Rk, i.e.,
(166)
where we define the messages recursively as $M_1 = \phi_1^{(n)}(Y_0^n)$ and
(167)
Define also the intersection of this acceptance region with the typical set:
(168)
By [3, Remark to Lemma 2.12], the type-I error probability constraints in (157a), and the union bound:
(169)
and thus
(170)
Let $(\tilde{Y}_0^n, \ldots, \tilde{Y}_k^n)$ be random variables of joint pmf
(171)
Let also $\tilde{M}_j$, for $j \in \{1, \ldots, k\}$, denote the messages computed from these random variables via (151)–(152), and let $T$ be uniform over $\{1, \ldots, n\}$, independent of all other random variables.
Notice that for any $k$:
(172)
(173)
(175)
(176)
(177)
(178)
where the last step introduces the auxiliary random variable $U_k$. Here, (175) holds by extending (11) to $k$-tuples.
We next upper bound the exponential decay of the type-II error probability. Define:
(179)
and
(180)
and notice that by (171):
(181)
Moreover, by (166), the probability of the event $\{\hat{\mathcal{H}}_k = 0\}$ is determined by the acceptance region $\mathcal{A}_k$, and thus
(182)
where on the left-hand side we slightly abused notation and mean the KL divergence between the two binary pmfs induced by the respective acceptance probabilities under the two measures. Combined with (170), with (181), and with the data-processing inequality, we obtain from (182):
(184)
We continue to upper bound the divergence term as
(185)
(188)
(189)
(190)
Here, (188) is obtained by the data processing inequality for KL-divergence, and (190) by the definitions of the involved quantities.
Combined with (184), we obtain
(191)
Finally, we proceed to prove that for any $k$ the desired Markov chain holds in the limit as $n \to \infty$. We start by noticing the Markov chain induced by the encoding functions, and thus:
(192)
(195)
(196)
where (195) is obtained by extending (13) to multiple random variables. We thus conclude that
(197)
Notice next that for any $k$:
(199)
(200)
where the last equality can be proved by extending (11) and (13) to multiple random variables and by noting the factorization of the underlying pmf.
Following similar steps to (192)–(196), we further obtain:
(206)
(207)
We thus conclude that
(208)
which, combined with (200), proves
(209)
The converse is then concluded by taking $n \to \infty$, as we explain in the following. By Carathéodory's theorem [34, Appendix C], for each $n$ there must exist random variables satisfying (209), (191), and (178) over alphabets of sizes
(210)
We thus restrict to random variables over the above (bounded) supports and invoke the Bolzano-Weierstrass theorem to conclude, for each $k$, the existence of limiting pmfs, in the following abbreviated as $P^*$, and an increasing subsequence of blocklengths satisfying
(211)
where $P^{(n)}$ denotes the pmf at blocklength $n$.
By the monotone continuity of mutual information for discrete random variables, we can then deduce that
(212)
(213)
where the subscripts indicate that the mutual informations should be computed according to the indicated pmfs.
Since for any blocklength $n$ the pair $(\tilde{Y}_{k-1}^n, \tilde{Y}_k^n)$ lies in the jointly typical set $\mathcal{T}_{\mu_n}^{(n)}(P_{Y_{k-1} Y_k})$, the limiting pmfs satisfy $P_{Y_{k-1} Y_k}^* = P_{Y_{k-1} Y_k}$. By similar continuity considerations and by (209), for all $k$ the Markov chain
(214)
holds under $P^*$. This concludes the proof.
VII Conclusions and Outlook
This paper presented new strong converse proofs for source and channel coding setups and for hypothesis testing. Most of the presented converses are exponentially-strong converses and allow us to conclude that the error probabilities tend to 1 exponentially fast whenever the rates (or exponents) violate certain conditions. The proofs for the standard almost lossless source coding with side-information problem and for communication over a discrete memoryless channel (DMC) are solely based on change of measure arguments, as inspired by [22, 23, 11], and on an asymptotic analysis of the distributions implied by these changes of measure. Notice in particular that the restriction to strongly-typical and conditionally strongly-typical sets allows us to simplify the proofs and circumvent additional arguments based on variational characterizations as in [11]. The converse results for almost lossless source coding and communication over a DMC have long been known, and our contribution here is to present a simple alternative proof.
In contrast, our converse proofs for the $L$-round interactive compression and the $K$-hop hypothesis testing setups are novel contributions of this article. Only special cases had been reported previously. Our proofs for these setups use similar change of measure arguments as in almost lossless source coding, but additionally rely on proofs of Markov chains that hold in the asymptotic regime of infinite blocklengths. These Markov chains are required to conclude the existence of the desired auxiliary random variables. Strong converses for several special cases of our $L$-round interactive compression problem had been reported previously, see in particular [11]. A strong converse proof for the $K$-hop hypothesis testing setup was already presented in [31] for the case $K = 2$; the proof in [31] however does not extend to general $K$ and seems more complex, since it is based on the blowing-up lemma and hypercontractivity arguments.
In related publications we have used the presented proof method to derive fundamental limits of hypothesis testing systems under expected rate constraints [35, 36] and (expected) secrecy constraints [37]. In contrast to the results presented in this paper, the fundamental limits of distributed hypothesis testing under these expectation constraints depend on the allowed type-I error probabilities. It turns out that the proof technique based on change of measure arguments as proposed in the present paper naturally captures this dependence.
Acknowledgement
The authors would like to thank R. Graczyk for useful discussions on the strong converse proof for channel coding.
References
- [1] M. Hamad, M. Wigger, and M. Sarkiss, “Strong converses using change of measure and asymptotic Markov chains.” [Online]. Available: https://arxiv.org/, May 2022.
- [2] J. Wolfowitz, “The coding of messages subject to chance errors,” Illinois Journal of Mathematics, vol. 1, no. 4, pp. 591 – 606, 1957.
- [3] I. Csiszár and J. Körner, Information theory: coding theorems for discrete memoryless systems. Cambridge University Press, 2011.
- [4] K. Marton, “A simple proof of the blowing-up lemma,” IEEE Trans. Inf. Theory, vol. 32, pp. 445–446, May 1986.
- [5] G. Dueck, “The strong converse coding theorem for the multiple-access channel,” Journal of Combinatorics, Information & System Sciences, vol. 6, no. 3, pp. 187–196, 1981.
- [6] V. Strassen, “Asymptotische Abschätzungen in Shannons Informationstheorie,” (Prague, Czech Republic), pp. 689–723, 1962.
- [7] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
- [8] R. Graczyk, “An elementary proof of the strong converse of the channel coding theorem.” Nov. 2022.
- [9] S. L. Fong and V. Y. Tan, “A proof of the strong converse theorem for Gaussian multiple access channels,” IEEE Transactions on Information Theory, vol. 62, no. 8, pp. 4376–4394, 2016.
- [10] S. L. Fong and V. Y. Tan, “A proof of the strong converse theorem for Gaussian broadcast channels via the Gaussian Poincaré inequality,” IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7737–7746, 2017.
- [11] H. Tyagi and S. Watanabe, “Strong converse using change of measure arguments,” IEEE Trans. Inf. Theory, vol. 66, no. 2, pp. 689–703, 2019.
- [12] A. I. Khinchin, “The entropy concept in probability theory,” Usp. Mat. Nauk, vol. 8, pp. 3–20, 1953.
- [13] B. McMillan, “The basic theorems of information theory,” Ann. Math. Stat., vol. 24, pp. 196–219, 1953.
- [14] I. Csiszár and G. Longo, “On error exponent for source coding and for testing simple statistical hypotheses,” Studia Sci. Math. Hungar., pp. 181–191, 1977.
- [15] J. Körner, “Coding of an information source having ambiguous alphabet and the entropy of graphs,” in Transactions of the 6th Prague conference on Information Theory, etc., (Prague, Czech Republic), pp. 411–425, 1973.
- [16] J. Kieffer, “Strong converses in source coding relative to a fidelity criterion,” IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 257–262, 1991.
- [17] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471–480, 1973.
- [18] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, 1976.
- [19] Y. Oohama and T. S. Han, “Universal coding for the Slepian-Wolf data compression system and the strong converse theorem,” IEEE Transactions on Information Theory, vol. 40, no. 6, pp. 1908–1919, 1994.
- [20] Y. Oohama, “Exponential strong converse for source coding with side information at the decoder,” Entropy, vol. 20, no. 5, p. 352, 2018.
- [21] Y. Oohama, “Exponential strong converse for one helper source coding problem,” Entropy, vol. 21, no. 6, p. 567, 2019.
- [22] W. Gu and M. Effros, “A strong converse for a collection of network source coding problems,” in 2009 IEEE International Symposium on Information Theory, pp. 2316–2320, 2009.
- [23] W. Gu and M. Effros, “A strong converse in source coding for super-source networks,” in 2011 IEEE International Symposium on Information Theory Proceedings, pp. 395–399, 2011.
- [24] O. Kosut and J. Kliewer, “Strong converses are just edge removal properties,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3315–3339, 2018.
- [25] A. Kaspi, “Two-way source coding with a fidelity criterion,” IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 735–740, 1985.
- [26] N. Ma and P. Ishwar, “Some results on distributed source coding for interactive function computation,” IEEE Transactions on Information Theory, vol. 57, no. 9, pp. 6180–6195, 2011.
- [27] Y. Steinberg, “Coding and common reconstruction,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 4995–5010, 2009.
- [28] A. Lapidoth, A. Malär, and M. Wigger, “Constrained source-coding with side information,” IEEE Transactions on Information Theory, vol. 60, no. 6, pp. 3218–3237, 2014.
- [29] S. Salehkalaibar, M. Wigger, and L. Wang, “Hypothesis testing in multi-hop networks.” [Online]. Available: https://arxiv.org/abs/1708.05198v1, 2017.
- [30] R. Ahlswede and I. Csiszár, “Hypothesis testing with communication constraints,” IEEE Trans. Inf. Theory, vol. 32, pp. 533–542, Jul. 1986.
- [31] D. Cao, L. Zhou, and V. Y. F. Tan, “Strong converse for hypothesis testing against independence over a two-hop network,” Entropy (Special Issue on Multiuser Information Theory II), vol. 21, Nov. 2019.
- [32] J. Liu, R. Van Handel, and S. Verdú, “Beyond the blowing-up lemma: Sharp converses via reverse hypercontractivity,” in 2017 IEEE International Symposium on Information Theory (ISIT), pp. 943–947, IEEE, 2017.
- [33] S. Sreekumar and D. Gündüz, “Strong converse for testing against independence over a noisy channel,” 2020.
- [34] A. El Gamal and Y. H. Kim, Network Information Theory. Cambridge University Press, 2011.
- [35] M. Hamad, M. Wigger, and M. Sarkiss, “Multi-hop network with multiple decision centers under expected-rate constraints,” 2022.
- [36] M. Hamad, M. Sarkiss, and M. Wigger, “Benefits of rate-sharing for distributed hypothesis testing,” in 2022 IEEE International Symposium on Information Theory (ISIT), pp. 2714–2719, 2022.
- [37] S. Faour, M. Hamad, M. Sarkiss, and M. Wigger, “Testing against independence with an eavesdropper,” 2022.