
Strong Converses using Typical Changes of Measures and Asymptotic Markov Chains

Mustapha Hamad, Michèle Wigger, Mireille Sarkiss

Part of this material was presented at the IEEE Inf. Theory Workshop (ITW) 2022 [1]. M. Hamad was with LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France. He is now with the Huawei Mathematical and Algorithmic Sciences Lab, Paris Research Center, 92100 Boulogne-Billancourt, France, mustapha.hamad7@gmail.com. M. Wigger is with LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France, michele.wigger@telecom-paris.fr. M. Sarkiss is with SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, 91120 Palaiseau, France, mireille.sarkiss@telecom-sudparis.eu.
Abstract

The paper presents exponentially-strong converses for source-coding, channel coding, and hypothesis testing problems. More specifically, it presents alternative proofs for the well-known exponentially-strong converse bounds for almost lossless source-coding with side-information and for channel coding over a discrete memoryless channel (DMC). These alternative proofs are solely based on a change of measure argument on the sets of conditionally or jointly typical sequences that result in a correct decision, and on the analysis of these measures in the asymptotic regime of infinite blocklengths. The paper also presents a new exponentially-strong converse for the K-hop hypothesis testing against independence problem with certain Markov chains and a strong converse for the two-terminal L-round interactive compression problem with multiple distortion constraints that depend on both sources and both reconstructions. This latter problem includes as special cases the Wyner-Ziv problem, the interactive function computation problem, and the compression with lossy common reconstruction problem. These new strong converse proofs are derived using similar change of measure arguments as described above and by additionally proving that certain Markov chains involving auxiliary random variables hold in the asymptotic regime of infinite blocklengths. As shown in related publications, the same method also yields converse bounds under expected resource constraints.

Index Terms:
Strong converse, change of measure, asymptotic Markov chains, source coding, channel coding, hypothesis testing.

I Introduction

Strong converse results have a rich history in information theory. They refer to proofs showing that the fundamental performance limit (such as the minimum compression rate or the maximum rate of communication) of a specific system does not depend on its allowed error (or excess) probability, as long as this probability is not 1. For example, Wolfowitz' strong converse [2] established that the largest rate of communication over a discrete memoryless channel (DMC) ensuring a probability of error smaller than \delta_n (see Figure 1) always equals capacity, as long as \delta_n does not tend to 1 for increasing blocklengths n. Differently stated, for all rates of communication that exceed capacity, the probability of error of any system must tend to 1 as the blocklength n tends to infinity. A stronger result, known as the exponentially-strong converse, states that for all rates above capacity the probability of decoding error tends to 1 exponentially fast in the blocklength. This result was first established by Csiszár and Körner [3], who presented lower bounds on the error exponents at rates above capacity. Since then, various alternative proofs for the strong or exponentially-strong converse for channel coding over a DMC have been proposed, for example based on the blowing-up lemma [4, 5], by bounding the decoding error probabilities at finite blocklengths [6, 7], or by putting forward geometric and typicality arguments [8]. Various extensions to multi-user communication networks were also derived, see e.g., [5, 9, 10]. In this paper, we present yet another proof, based on a change of measure argument that restricts to output sequences that are conditionally typical for one of the possible codewords and lie in the decoding set of this codeword. Related is the converse proof for the wiretap channel by Tyagi and Watanabe [11], which is also based on a similar change of measure argument but does not restrict to conditionally typical output sequences. This restriction is, however, important in our converse proof to relate the rate of the code to the capacity of the DMC.

Figure 1: Channel coding over a DMC subject to a constraint on the probability of error.

For (almost) lossless source coding, the strong converse states that a discrete memoryless source (DMS) cannot be compressed at a rate below the entropy of the source with a reconstruction error probability that stays below 1 asymptotically for infinite blocklengths. This result essentially follows from the asymptotic equipartition property [12, 13]. The exponentially-strong converse for lossless compression [14] states that for all compression rates below entropy, the probability of reconstruction error tends to 1 exponentially fast. Strong converse results also extend to lossy compression, where the limit of compression of DMSs is not the entropy but the well-known rate-distortion function. The strong converse for lossy compression of DMSs was established by Körner [15], see also the related work by Kieffer [16].

Our focus is on compression scenarios where the decoder has side-information that is correlated with the source as depicted in Figure 2. For memoryless sources, the fundamental limits of compression with side-information were established by Slepian and Wolf [17] for the lossless case and by Wyner and Ziv [18] for the lossy case. Exponentially-strong converses were established by Oohama and Han [19] for the lossless case and by Oohama [20] for the lossy case. Various exponentially-strong converse results were also derived for compression problems in more complicated networks with and without side-information, see e.g., [21, 22, 23, 24, 11].

In this paper, we reprove the exponentially-strong converse for lossless source coding with side-information.

Figure 2: Almost lossless source coding of a DMS subject to a constraint on the probability of error.

As a new result, we also prove the strong converse for an interactive lossy compression problem (see Figure 3) where the distortion measures can depend on the source sequences observed at the two terminals, as well as on the two terminals' produced reconstruction sequences. This problem includes as special cases Kaspi's two-terminal interactive lossy source-coding problem [25], Ma and Ishwar's two-terminal interactive function-computation problem [26], and Steinberg's lossy source coding problem with common reconstructions [27], as well as its extension by Malär et al. [28]. Our proof of the exponentially-strong converse for lossless source coding with side-information is based on a change of measure argument that restricts to source sequences that are jointly typical and result in correct decoding, and on the asymptotic analysis of this new measure. Our strong converse proof for interactive L-round lossy source coding with generalized distortion measures again relies on a change of measure argument on the set of typical sequences that result in the desired distortion and on analysing this new measure asymptotically in the limit of infinite blocklengths. As part of this analysis, we prove in particular that certain Markov chains hold in this asymptotic regime of infinite blocklengths.

The change of measure argument for proving exponentially-strong converses for source coding problems dates back to Gu and Effros [22, 23] and for function computation problems and channel coding to Tyagi and Watanabe [11]. Our proofs are similar to [11], however in our changed measures we restrict to typical sequences, which allows us to circumvent resorting to variational characterizations of the multi-letter and single-letter problems.

Figure 3: Interactive source coding with multiple distortions depending on the sources and the reconstructions.

In this paper, we also prove a new exponentially-strong converse result for the K-hop hypothesis testing problem in [29], see Figure 4. In this problem, all K-1 relays as well as the final receiver guess the hypothesis by testing against independence. The figure of merit is the set of type-II error exponents that can be achieved at these K terminals subject to rate constraints on the K links and the constraint that the type-I error probabilities at these terminals stay below predefined thresholds \delta_{k,n}. Specifically, we consider a scenario where K+1 terminals observe memoryless source sequences whose underlying joint distribution depends on a binary hypothesis \mathcal{H}\in\{0,1\}. The distribution is P_{Y_0}\prod_{k=1}^{K}P_{Y_k|Y_{k-1}} under \mathcal{H}=0 and it is P_{Y_0}\prod_{k=1}^{K}P_{Y_k} under \mathcal{H}=1. Upon observing its source sequence Y_k^n, each terminal k=0,\ldots,K-1 can send an nR_{k+1}-bit message M_{k+1} to the next-following terminal. Terminals 1,\ldots,K have to produce a guess of the hypothesis \hat{\mathcal{H}}_k\in\{0,1\} based on their local observations Y_k^n and their received message M_k. The main goal is to minimize their type-II error probabilities (the probabilities of error under \mathcal{H}=1) under the constraint that for each blocklength n the type-I error probability (the probability of error under the null hypothesis \mathcal{H}=0) stays below a given threshold \delta_{k,n}.

For K=1, this problem was solved by Ahlswede and Csiszár [30] for type-I error probabilities \delta_{1,n} that are asymptotically bounded away from 1. In particular, it was shown that the maximum achievable type-II error exponent does not depend on the values \delta_{1,n} as long as they do not tend to 1. For arbitrary K\geq 2, the problem was studied under the assumption that all the type-I error probabilities \delta_{k,n} tend to 0 as n\to\infty [29]. In [31], it was shown that for K=2 the result in [29] applies unchanged for sequences of type-I error probabilities \delta_{1,n} and \delta_{2,n} for which each individual sequence as well as the sum of the two sequences is asymptotically bounded away from 1. In this work, we prove the exponentially-strong converse to this result, i.e., we show that for arbitrary K\geq 1 and arbitrary type-I error probabilities \delta_{k,n} not vanishing exponentially fast in the blocklength, the result in [29] continues to hold. Notice that the proof of the mentioned special case K=2 in [31] used the change of measure argument and variational characterizations proposed by Tyagi and Watanabe [11] and hypercontractivity arguments as in [32]. The proof of the strong converse for K=1 in [30] was based on the blowing-up lemma [4], same as the proof of the strong converse for K=1 when communication is over a DMC and without any rate constraint [33]. The latter work also used the Tyagi-Watanabe change of measure argument combined with variational characterizations. Unlike these works, our general proof does not require any blowing-up lemma or hypercontractivity arguments, nor variational characterizations.

Figure 4: K-hop hypothesis testing problem.

We summarize the main contributions of our paper:

  • We present alternative exponentially-strong converse proofs for the lossless compression problem with side-information for DMSs and for the channel coding problem over a DMC. The proofs are rather simple and depend only on a change of measure argument and the asymptotic analysis of these new measures.

  • We derive new strong converse results for the two-terminal L-round interactive lossy source coding problem for DMSs and with multiple distortion measures that depend on both sources and both reconstructions. This setup includes as special cases the Wyner-Ziv problem, the interactive function computation problem, and the lossy source coding problem with (lossy) common reconstructions. Previous strong converse proofs for the Wyner-Ziv problem and the interactive function computation problem relied on the method of variational characterizations and on a change of measure argument. Our proof also relies on a similar change of measure, which is however restricted to the typical set. This, combined with the asymptotic proofs of specific Markov chains, allows us to directly prove the desired results without resorting to variational characterizations.

  • We derive new exponentially-strong converse results for a K-hop hypothesis testing problem. Previously, a general weak converse and strong converses in special cases had been derived for this problem. The previous strong converse for K=2 was based on a change of measure argument combined with variational characterizations, and on hypercontractivity arguments. The general exponentially-strong converse proof in this paper relies on a change of measure argument restricted to the typical set and on an asymptotic analysis of this new measure including the proof of asymptotic Markov chains.

Outline of the Paper: We end this section with remarks on notation. The following Section II presents and proves two key lemmas used in the rest of the paper. Sections III and IV present our new strong converse proof methods for the almost lossless source coding with side-information problem and for communication over a DMC. These strong converse proofs are solely based on change of measure arguments and on the analysis of these measures in the asymptotic regime of infinite blocklengths. The converse proofs in the subsequent Sections V and VI are also based on similar change of measure arguments and asymptotic analyses of these measures, but additionally require proving that certain Markov chains involving auxiliary random variables hold in the asymptotic regime of infinite blocklengths. Specifically, Section V considers the L-round interactive compression problem with distortion functions that can depend on the sources and reconstructions at both terminals. Section VI considers K-hop hypothesis testing against independence where the observations at the various terminals obey some Markov conditions.

Notation: We mostly follow standard notation where upper-case letters are used for random quantities and lower-case letters for deterministic realizations. Sets are denoted using calligraphic fonts. All random variables are assumed finite and discrete. We abbreviate the n-tuples (X_1,\ldots,X_n) and (x_1,\ldots,x_n) as X^n and x^n and the (n-t)-tuples (X_{t+1},\ldots,X_n) and (x_{t+1},\ldots,x_n) as X_{t+1}^n and x_{t+1}^n. We further abbreviate independent and identically distributed as i.i.d. and probability mass function as pmf.

Entropy, conditional entropy, and mutual information functionals are written as H(\cdot), H(\cdot|\cdot), and I(\cdot;\cdot), where the arguments of these functionals are random variables; whenever their probability mass function (pmf) is not clear from the context, we add it as a subscript to these functionals. The Kullback-Leibler divergence between two pmfs is denoted by D(\cdot\|\cdot). We shall use \mathcal{T}_{\mu}^{(n)}(P_{XY}) to indicate the jointly strongly-typical set with respect to the pmf P_{XY} on the product alphabet \mathcal{X}\times\mathcal{Y} and parameter \mu as defined in [3, Definition 2.8]. Specifically, denoting by n_{x^n,y^n}(a,b) the number of occurrences of the pair (a,b) in the sequences (x^n,y^n):

n_{x^n,y^n}(a,b) = \left|\{t\colon (x_t,y_t)=(a,b)\}\right|, (1)

a pair (x^n,y^n) lies in \mathcal{T}_{\mu}^{(n)}(P_{XY}) if

\left|\frac{n_{x^n,y^n}(a,b)}{n} - P_{XY}(a,b)\right| \leq \mu, \qquad \forall (a,b)\in\mathcal{X}\times\mathcal{Y}, (2)

and n_{x^n,y^n}(a,b)=0 whenever P_{XY}(a,b)=0. The conditionally strongly-typical set with respect to a conditional pmf P_{Y|X} from \mathcal{X} to \mathcal{Y}, parameter \mu>0, and sequence x^n\in\mathcal{X}^n is denoted \mathcal{T}_{\mu}^{(n)}(P_{Y|X},x^n) [3, Definition 2.9]. It contains all sequences y^n\in\mathcal{Y}^n satisfying

\left|\frac{n_{x^n,y^n}(a,b)}{n} - \frac{n_{x^n}(a)}{n}P_{Y|X}(b|a)\right| \leq \mu, \qquad \forall (a,b)\in\mathcal{X}\times\mathcal{Y}, (3)

and n_{x^n,y^n}(a,b)=0 whenever P_{Y|X}(b|a)=0. Here n_{x^n}(a) denotes the number of occurrences of the symbol a in x^n. In this paper, we denote the joint type of (x^n,y^n) by \pi_{x^n y^n}, i.e.,

\pi_{x^n y^n}(a,b) \triangleq \frac{n_{x^n,y^n}(a,b)}{n}. (4)

Accordingly, the marginal type of x^n is written as \pi_{x^n}. Finally, we use the Landau notation o(1) to indicate any function that tends to 0 as the blocklength n\to\infty.
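To make the notation concrete, the following short Python sketch (our own illustration; the helper names joint_type and is_jointly_typical are not from the paper) computes the joint type \pi_{x^n y^n} of (4) and checks the strong-typicality conditions (2) for a toy pmf.

```python
import numpy as np

def joint_type(x, y, X_size, Y_size):
    """Empirical joint type pi_{x^n y^n}(a,b) = n_{x^n,y^n}(a,b)/n of two equal-length sequences."""
    assert len(x) == len(y)
    counts = np.zeros((X_size, Y_size))
    for a, b in zip(x, y):
        counts[a, b] += 1
    return counts / len(x)

def is_jointly_typical(x, y, P_XY, mu):
    """Checks the strong-typicality conditions (2): |type - P_XY| <= mu entrywise,
    and type(a,b) = 0 wherever P_XY(a,b) = 0."""
    pi = joint_type(x, y, *P_XY.shape)
    if np.any((P_XY == 0) & (pi > 0)):
        return False
    return np.all(np.abs(pi - P_XY) <= mu)

# Toy example: a doubly symmetric binary source.
P_XY = np.array([[0.45, 0.05], [0.05, 0.45]])
rng = np.random.default_rng(0)
n = 10000
flat = rng.choice(4, size=n, p=P_XY.flatten())
x, y = flat // 2, flat % 2
print(is_jointly_typical(x, y, P_XY, mu=n**(-1/3)))  # mu_n = n^{-1/3} satisfies (5)-(6) below
```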

II Auxiliary Lemmas

Lemma 1

Let \{(X_i,Y_i)\}_{i=1}^{\infty} be a sequence of pairs of i.i.d. random variables according to the pmf P_{XY}. Further let \{\mu_n\} be a sequence of small positive numbers satisfying the following two conditions. (Condition (6) ensures that the probability of the strongly-typical set \mathcal{T}_{\mu_n}^{(n)}(P_{XY}) under P_{XY}^{\otimes n} tends to 1 as n\to\infty [3, Remark to Lemma 2.12].)

\lim_{n\to\infty}\mu_n = 0 (5)
\lim_{n\to\infty}\left(n\cdot\mu_n^2\right)^{-1} = 0 (6)

Further, for each positive integer n let \mathcal{D}_n be a subset of the strongly-typical set \mathcal{T}_{\mu_n}^{(n)}(P_{XY}) so that its probability

\Delta_n := \Pr[(X^n,Y^n)\in\mathcal{D}_n] (7)

satisfies

\lim_{n\to\infty}\frac{1}{n}\log\Delta_n = 0. (8)

Let further (\tilde{X}^n,\tilde{Y}^n) be random variables of joint pmf

P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n) = \frac{P_{XY}^{\otimes n}(x^n,y^n)}{\Delta_n}\cdot\mathbbm{1}\{(x^n,y^n)\in\mathcal{D}_n\} (9)

and T be uniform over \{1,\ldots,n\} independent of all other random variables.

For the distribution in (9), the following limits hold as n\to\infty:

P_{\tilde{X}_T\tilde{Y}_T} \to P_{XY} (10)
\left|\frac{1}{n}H(\tilde{X}^n\tilde{Y}^n) - H(\tilde{X}_T\tilde{Y}_T)\right| \to 0 (11)
\left|\frac{1}{n}H(\tilde{Y}^n) - H(\tilde{Y}_T)\right| \to 0 (12)
\left|\frac{1}{n}H(\tilde{X}^n|\tilde{Y}^n) - H(\tilde{X}_T|\tilde{Y}_T)\right| \to 0 (13)
\left|H(\tilde{X}_T\tilde{Y}_T) - H(XY)\right| \to 0 (14)
\left|H(\tilde{Y}_T) - H(Y)\right| \to 0 (15)
\left|H(\tilde{X}_T|\tilde{Y}_T) - H(X|Y)\right| \to 0. (16)
Proof:

Notice that (14)–(16) follow directly from (10) and the continuity of entropy. To prove (10), notice that

P_{\tilde{X}_T\tilde{Y}_T}(x,y) = \frac{1}{n}\sum_{t=1}^n P_{\tilde{X}_t\tilde{Y}_t}(x,y) (17)
= \mathbb{E}\left[\frac{1}{n}\sum_{t=1}^n \mathbbm{1}\{\tilde{X}_t=x,\tilde{Y}_t=y\}\right] (18)
= \mathbb{E}\left[\pi_{\tilde{X}^n\tilde{Y}^n}(x,y)\right], (19)

where the expectations are with respect to the tuples \tilde{X}^n and \tilde{Y}^n. Since by the definition of the typical set,

|\pi_{\tilde{X}^n\tilde{Y}^n}(x,y) - P_{XY}(x,y)| \leq \mu_n, (20)

we conclude that as n\to\infty the probability P_{\tilde{X}_T\tilde{Y}_T}(x,y) tends to P_{XY}(x,y).

To prove (11), notice first that

\frac{1}{n}H(\tilde{X}^n\tilde{Y}^n) + \frac{1}{n}D(P_{\tilde{X}^n\tilde{Y}^n}\|P_{XY}^{\otimes n}) (21)
= -\frac{1}{n}\sum_{(x^n,y^n)\in\mathcal{D}_n} P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n)\log P_{XY}^{\otimes n}(x^n,y^n)
= -\frac{1}{n}\sum_{t=1}^n\sum_{(x^n,y^n)\in\mathcal{D}_n} P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n)\log P_{XY}(x_t,y_t) (22)
= -\frac{1}{n}\sum_{t=1}^n\sum_{(x_t,y_t)\in\mathcal{X}\times\mathcal{Y}} P_{\tilde{X}_t\tilde{Y}_t}(x_t,y_t)\log P_{XY}(x_t,y_t) (23)
= -\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}}\left(\frac{1}{n}\sum_{t=1}^n P_{\tilde{X}_t\tilde{Y}_t}(x,y)\right)\log P_{XY}(x,y) (24)
= -\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} P_{\tilde{X}_T\tilde{Y}_T}(x,y)\log P_{XY}(x,y) (25)
= H(\tilde{X}_T\tilde{Y}_T) + D(P_{\tilde{X}_T\tilde{Y}_T}\|P_{XY}), (26)

where (23) holds by the law of total probability applied to the random variables \tilde{X}^{t-1},\tilde{X}_{t+1}^n,\tilde{Y}^{t-1},\tilde{Y}_{t+1}^n. Combined with the following two limits (27) and (28), this establishes (11). The first relevant limit is

D(P_{\tilde{X}_T\tilde{Y}_T}\|P_{XY}) \to 0, (27)

which holds by (10) and because P_{\tilde{X}_T\tilde{Y}_T}(x,y)=0 whenever P_{XY}(x,y)=0. The second limit is

\frac{1}{n}D(P_{\tilde{X}^n\tilde{Y}^n}\|P_{XY}^{\otimes n}) \to 0, (28)

and holds because \frac{1}{n}\log\Delta_n\to 0 and by the following set of inequalities:

0 \leq \frac{1}{n}D(P_{\tilde{X}^n\tilde{Y}^n}\|P_{XY}^{\otimes n}) (29)
= \frac{1}{n}\sum_{(x^n,y^n)\in\mathcal{D}_n} P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n)\log\frac{P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n)}{P_{XY}^{\otimes n}(x^n,y^n)}
= -\frac{1}{n}\sum_{(x^n,y^n)\in\mathcal{D}_n} P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n)\log\Delta_n (30)
= -\frac{1}{n}\log\Delta_n. (31)

To prove (12), notice that by the same arguments that led to (26), we also have

\frac{1}{n}H(\tilde{Y}^n) + \frac{1}{n}D(P_{\tilde{Y}^n}\|P_Y^{\otimes n}) = H(\tilde{Y}_T) + D(P_{\tilde{Y}_T}\|P_Y). (32)

Moreover, (27) and (28) imply

\frac{1}{n}D(P_{\tilde{Y}^n}\|P_Y^{\otimes n}) \to 0 (33)
D(P_{\tilde{Y}_T}\|P_Y) \to 0, (34)

which combined with (32) imply (12).

The last limit (13) follows by the chain rule and the limits (11) and (12). This concludes the proof. ∎
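As a purely numerical illustration of the change of measure in (9) (our own sketch; here \mathcal{D}_n is simply taken to be the whole strongly-typical set of a toy pmf at a small blocklength), one can restrict the product pmf P_{XY}^{\otimes n} to \mathcal{D}_n, renormalize by \Delta_n, and check that the single-letter marginal P_{\tilde{X}_T\tilde{Y}_T}, computed as the expected joint type as in (17)–(19), stays within \mu_n of P_{XY}, in line with (10) and (20).

```python
import itertools
import numpy as np

P_XY = np.array([[0.4, 0.1], [0.1, 0.4]])  # joint pmf of (X,Y) on {0,1}^2
n, mu = 8, 0.1

def joint_type(xy_seq):
    pi = np.zeros((2, 2))
    for a, b in xy_seq:
        pi[a, b] += 1
    return pi / len(xy_seq)

# Enumerate all pairs of length-n sequences and keep the strongly typical ones (here: the set D_n).
pairs, probs, types = [], [], []
for xy_seq in itertools.product([(a, b) for a in range(2) for b in range(2)], repeat=n):
    pi = joint_type(xy_seq)
    if np.all(np.abs(pi - P_XY) <= mu):          # typicality condition (2)
        pairs.append(xy_seq)
        probs.append(np.prod([P_XY[a, b] for a, b in xy_seq]))
        types.append(pi)

Delta_n = sum(probs)                              # probability of D_n, as in (7)
P_tilde = np.array(probs) / Delta_n               # change of measure (9)

# Single-letter marginal under the new measure = expected joint type, cf. (17)-(19).
P_T = sum(p * pi for p, pi in zip(P_tilde, types))
print("Delta_n =", round(Delta_n, 3))
print("max |P_T - P_XY| =", np.abs(P_T - P_XY).max())  # bounded by mu, cf. (20)
```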

The second lemma dates back to Csiszár and Körner [3].

Lemma 2

Let A^n and B^n be of arbitrary joint distribution and let T be uniform over \{1,\ldots,n\} and independent of (A^n,B^n). Then:

H(A^n|B^n) - H(B^n|A^n) = n\left(H(A_T|B_T A^{T-1}B_{T+1}^n) - H(B_T|A_T A^{T-1}B_{T+1}^n)\right). (35)

The conditional version follows immediately from the definition of conditional entropy:

H(A^n|B^n S) - H(B^n|A^n S) = n\left(H(A_T|B_T A^{T-1}B_{T+1}^n S) - H(B_T|A_T A^{T-1}B_{T+1}^n S)\right) (36)

for any random variable S so that T is independent of (A^n,B^n,S).

Proof:

Kramer's simple telescoping sum states:

0 = \sum_{i=1}^n\left[I(A^i;B_{i+1}^n) - I(A^{i-1};B_i^n)\right], (37)

which by the chain rule of mutual information implies Csiszár's sum-identity [3]:

0 = \sum_{i=1}^n\left[I(A_i;B_{i+1}^n|A^{i-1}) - I(A^{i-1};B_i|B_{i+1}^n)\right], (38)

where we added and subtracted the mutual informations \sum_{i=1}^n I(A^{i-1};B_{i+1}^n). Similarly, adding and subtracting the mutual informations \sum_{i=1}^n I(A_i;B_i|A^{i-1},B_{i+1}^n) yields:

0 = \sum_{i=1}^n\left[I(A_i;B_i^n|A^{i-1}) - I(A^i;B_i|B_{i+1}^n)\right]. (39)

The desired equality in (35) then follows immediately from this last equation (39) and from the chain rule:

H(A^n|B^n) - H(B^n|A^n) (40)
\stackrel{(a)}{=} H(A^n) - H(B^n)
= \sum_{i=1}^n\left[H(A_i|A^{i-1}) - H(B_i|B_{i+1}^n)\right] (41)
\stackrel{(b)}{=} \sum_{i=1}^n\left[H(A_i|A^{i-1}B_i^n) - H(B_i|A^i B_{i+1}^n)\right] (42)
\stackrel{(c)}{=} n\left(H(A_T|B_T A^{T-1}B_{T+1}^n) - H(B_T|A_T A^{T-1}B_{T+1}^n)\right), (43)

where (a) is obtained by adding and subtracting I(A^n;B^n), (b) holds by (39), and (c) by the definition of T.
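As a quick numerical sanity check of identity (35) (our own illustration, not needed for the proof), one can draw an arbitrary joint pmf for (A^n,B^n) with a small blocklength and compare both sides by brute-force computation of the entropies.

```python
import numpy as np

rng = np.random.default_rng(1)
n, A, B = 3, 2, 2                        # blocklength and alphabet sizes
P = rng.random((A,) * n + (B,) * n)      # arbitrary joint pmf of (A^n, B^n)
P /= P.sum()

def H(P, axes_keep):
    """Entropy (in bits) of the marginal on the axes in axes_keep."""
    axes_sum = tuple(i for i in range(P.ndim) if i not in axes_keep)
    M = P.sum(axis=axes_sum)
    M = M[M > 0]
    return -(M * np.log2(M)).sum()

def Hcond(P, axes_a, axes_b):
    """Conditional entropy H(A-part | B-part)."""
    return H(P, tuple(axes_a) + tuple(axes_b)) - H(P, tuple(axes_b))

A_axes, B_axes = tuple(range(n)), tuple(range(n, 2 * n))
lhs = Hcond(P, A_axes, B_axes) - Hcond(P, B_axes, A_axes)

# Right-hand side of (35): sum over t of the per-time-index terms (= n times the T-average).
rhs = 0.0
for t in range(n):
    cond = A_axes[:t] + B_axes[t + 1:]   # A^{T-1} and B_{T+1}^n
    rhs += Hcond(P, (A_axes[t],), (B_axes[t],) + cond) \
         - Hcond(P, (B_axes[t],), (A_axes[t],) + cond)
print(np.isclose(lhs, rhs))              # True
```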

III Lossless Source Coding with Side-Information

This section studies the lossless source coding with side-information setup in Figure 2.

III-A Setup and Result

Consider two terminals, an encoder observing the source sequence X^n and a decoder observing the related side-information sequence Y^n, where we assume that

(X^n,Y^n) \textnormal{ i.i.d. } \sim P_{XY}, (44)

for a given probability mass function P_{XY} on the product alphabet \mathcal{X}\times\mathcal{Y}. The encoder uses a function \phi^{(n)} to compress the sequence X^n into a message M\in\{1,\ldots,2^{nR}\} of given rate R>0:

M = \phi^{(n)}(X^n). (45)

Based on this message and its own observation Y^n, the decoder is supposed to reconstruct the source sequence X^n with small probability of error. Thus, the decoder applies a decoding function g^{(n)} to (M,Y^n) to produce the reconstruction sequence \hat{X}^n\in\mathcal{X}^n:

\hat{X}^n = g^{(n)}(M,Y^n). (46)

Definition 1

Given a sequence of error probabilities \{\delta_n\}, the rate R>0 is said to be \{\delta_n\}-achievable if there exist sequences (in n) of encoding and reconstruction functions \phi^{(n)} and g^{(n)} such that for each blocklength n:

\Pr\left[X^n \neq g^{(n)}\big(\phi^{(n)}(X^n),Y^n\big)\right] \leq \delta_n. (47)

A standard result in information theory is [19]:

Theorem 1

For any sequence \{\delta_n\} satisfying

\lim_{n\to\infty}\frac{1}{n}\log(1-\delta_n) = 0, (48)

any rate R < H(X|Y) is not \{\delta_n\}-achievable.

Notice that the theorem in particular implies an exponentially-strong converse, i.e., that for all rates R < H(X|Y) the probability of error approaches 1 exponentially fast in n. The result is well known, but we provide an alternative proof in the following subsection.
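To make the term "exponentially fast" explicit (a small spelled-out step added here for concreteness): if a rate R<H(X|Y) were \{\delta_n\}-achievable, then by Theorem 1 condition (48) must fail, i.e., there exist \epsilon>0 and an increasing subsequence of blocklengths \{n_i\} with

\frac{1}{n_i}\log(1-\delta_{n_i}) \leq -\epsilon, \qquad\textnormal{equivalently}\qquad \delta_{n_i} \geq 1-2^{-n_i\epsilon},

so the probability of error tends to 1 exponentially fast along that subsequence.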

III-B Alternative Strong Converse Proof

Fix a sequence of encoding and decoding functions \{\phi^{(n)},g^{(n)}\}_{n=1}^{\infty} satisfying (47). Choose a sequence of small positive numbers \{\mu_n\} satisfying

\lim_{n\to\infty}\mu_n = 0 (49)
\lim_{n\to\infty}\left(n\cdot\mu_n^2\right)^{-1} = 0, (50)

and select for each n a subset

\mathcal{D}_n := \left\{(x^n,y^n)\in\mathcal{T}_{\mu_n}^{(n)}(P_{XY})\colon g^{(n)}\left(\phi^{(n)}(x^n),y^n\right)=x^n\right\}, (51)

i.e., the set of all typical (x^n,y^n)-sequences for which the reconstructed sequence \hat{X}^n coincides with the source sequence X^n. Let

\Delta_n := \Pr[(X^n,Y^n)\in\mathcal{D}_n], (52)

and notice that by (47) and [3, Remark to Lemma 2.12]:

\Delta_n \geq 1-\delta_n-\frac{|\mathcal{X}||\mathcal{Y}|}{4\mu_n^2 n}, (53)

and thus by (48) and (50):

\lim_{n\to\infty}\frac{1}{n}\log\Delta_n = 0. (54)

Let further (\tilde{X}^n,\tilde{Y}^n) be random variables of joint pmf

P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n) = \frac{P_{X^n Y^n}(x^n,y^n)}{\Delta_n}\cdot\mathbbm{1}\{(x^n,y^n)\in\mathcal{D}_n\}. (55)

Let also \tilde{M}=\phi^{(n)}(\tilde{X}^n) and T be uniform over \{1,\ldots,n\} independent of (\tilde{X}^n,\tilde{Y}^n,\tilde{M}).

The strong converse is then easily obtained as follows. Similarly to the weak converse, we have:

R \geq \frac{1}{n}H(\tilde{M}) \geq \frac{1}{n}H(\tilde{M}|\tilde{Y}^n) \geq \frac{1}{n}H(\tilde{X}^n|\tilde{Y}^n), (56)

where the last inequality in (56) holds because \tilde{X}^n can be obtained as a function of \tilde{M} and \tilde{Y}^n, see the definition of the set \mathcal{D}_n.

Letting n\to\infty, we obtain the desired limit by (13) and (16) in Lemma 1.
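Spelling out this last step (it merely combines (56) with Lemma 1):

R \geq \frac{1}{n}H(\tilde{X}^n|\tilde{Y}^n) = H(\tilde{X}_T|\tilde{Y}_T)+o(1) = H(X|Y)+o(1),

so that letting n\to\infty yields R\geq H(X|Y), which contradicts R<H(X|Y) and establishes Theorem 1.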

IV Communication over a Memoryless Channel

This section studies communication over a discrete memoryless channel (DMC) as depicted in Figure 1.

IV-A Setup and Results

Consider a transmitter (Tx) that wishes to communicate to a receiver (Rx) over a DMC parametrized by the finite input and output alphabets \mathcal{X} and \mathcal{Y} and the transition law P_{Y|X}. The goal of the communication is that the Tx conveys a message M to the Rx, where M is uniformly distributed over the set \{1,\ldots,2^{nR}\}, with R>0 and n>0 denoting the rate and blocklength of communication, respectively.

For a given blocklength n, the Tx thus produces the n-length sequence of channel inputs

X^n = \phi^{(n)}(M) (57)

for some choice of the encoding function \phi^{(n)}\colon\{1,\ldots,2^{nR}\}\to\mathcal{X}^n, and the Rx observes the sequence of channel outputs Y^n, where the time-t output Y_t is distributed according to the law P_{Y|X}(\cdot|x) when the time-t input is x, irrespective of the previous and future inputs and outputs.

The receiver attempts to guess the message M based on the sequence of channel outputs Y^n:

\hat{M} = g^{(n)}(Y^n) (58)

using a decoding function of the form g^{(n)}\colon\mathcal{Y}^n\to\{1,\ldots,2^{nR}\}. The goal is to minimize the maximum decoding error probability

p^{(n)}(\textnormal{error}) := \max_m \Pr[\hat{M}\neq M|M=m]. (59)

Definition 2

The rate R>0 is said to be \{\delta_n\}-achievable over the DMC (\mathcal{X},\mathcal{Y},P_{Y|X}) if there exists a sequence of encoding and decoding functions \{(\phi^{(n)},g^{(n)})\} such that for each blocklength n the maximum probability of error satisfies

p^{(n)}(\textnormal{error}) \leq \delta_n. (60)

A well-known result in information theory states [3]:

Theorem 2

Any rate R>C, where C denotes the capacity

C := \max_{P_X} I(X;Y), (61)

is not \{\delta_n\}-achievable for any sequence \{\delta_n\} satisfying

\lim_{n\to\infty}\frac{1}{n}\log(1-\delta_n) = 0. (62)

The above result implies that for all rates above capacity, the probability of error converges to 1 exponentially fast. This result is well known; here we present a different converse proof.
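As an aside (our own illustration, not part of the converse argument), the capacity in (61) can be evaluated numerically; the following self-contained Python sketch uses the standard Blahut–Arimoto iteration to approximate C = \max_{P_X} I(X;Y) for a binary symmetric channel with crossover probability 0.1.

```python
import numpy as np

def blahut_arimoto(P_yx, iters=200):
    """Approximates C = max_{P_X} I(X;Y) for channel matrix P_yx[x, y] = P_{Y|X}(y|x)."""
    nx, ny = P_yx.shape
    p = np.full(nx, 1.0 / nx)                 # current input distribution P_X
    for _ in range(iters):
        q = p @ P_yx                          # output distribution P_Y
        # D(P_{Y|X=x} || P_Y) for every input x (in bits)
        d = np.array([np.sum(P_yx[x] * np.log2(P_yx[x] / q, where=P_yx[x] > 0, out=np.zeros(ny)))
                      for x in range(nx)])
        p = p * np.exp2(d)                    # multiplicative Blahut-Arimoto update
        p /= p.sum()
    q = p @ P_yx
    C = sum(p[x] * np.sum(P_yx[x] * np.log2(P_yx[x] / q, where=P_yx[x] > 0, out=np.zeros(ny)))
            for x in range(nx))
    return C, p

eps = 0.1                                      # BSC crossover probability
P_yx = np.array([[1 - eps, eps], [eps, 1 - eps]])
C, p_opt = blahut_arimoto(P_yx)
print(C)   # approx. 1 - h_b(0.1) = 0.531 bits per channel use
```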

IV-B Alternative Strong Converse Proof

Fix a sequence of encoding and decoding functions \{(\phi^{(n)},g^{(n)})\}_{n=1}^{\infty} so that (60) holds. Choose a sequence of small positive numbers \{\mu_n\} satisfying

\lim_{n\to\infty}\mu_n = 0 (63)
\lim_{n\to\infty}\left(n\cdot\mu_n^2\right)^{-1} = 0, (64)

and define for each message M=m the set

\mathcal{D}_m := \left\{y^n\in\mathcal{T}_{\mu_n}^{(n)}(P_{Y|X=x^n(m)})\colon g^{(n)}(y^n)=m\right\} (65)

and its probability

\Delta_m := \Pr[Y^n\in\mathcal{D}_m|M=m]. (66)

(For readability we write the sets \mathcal{D}_m and their probabilities \Delta_m without the subscript n.) By (60) and [3, Remark to Lemma 2.12]:

\Delta_m \geq 1-\delta_n-\frac{|\mathcal{Y}||\mathcal{X}|}{4\mu_n^2 n}, (67)

and thus by (62) and (64):

\lim_{n\to\infty}\frac{1}{n}\log\Delta_m = 0. (68)

Let further (\tilde{M},\tilde{X}^n,\tilde{Y}^n) be random variables so that \tilde{M} is uniform over the set \{1,\ldots,2^{nR}\} (i.e., it has the same distribution as M),

\tilde{X}^n := \phi^{(n)}(\tilde{M}), (69)

and

P_{\tilde{Y}^n|\tilde{M}}(y^n|m) = \frac{P_{Y|X}^{\otimes n}(y^n|x^n(m))}{\Delta_m}\cdot\mathbbm{1}\{y^n\in\mathcal{D}_m\}. (70)

Further, let T be independent of (\tilde{M},\tilde{X}^n,\tilde{Y}^n) and uniform over \{1,\ldots,n\}. Notice that since the decoding sets \{\mathcal{D}_m\} are disjoint, by the definition of the new measure P_{\tilde{Y}^n|\tilde{M}} it is possible to determine \tilde{M} from \tilde{Y}^n with probability 1.

Following similar steps as in the weak converse, we have:

R = \frac{1}{n}H(\tilde{M}) (71)
\stackrel{(a)}{=} \frac{1}{n}I(\tilde{M};\tilde{Y}^n) (72)
= \frac{1}{n}H(\tilde{Y}^n) - \frac{1}{n}H(\tilde{Y}^n|\tilde{M}) (73)
\leq \frac{1}{n}\sum_{i=1}^n H(\tilde{Y}_i) - \frac{1}{n}H(\tilde{Y}^n|\tilde{M}) (74)
= H(\tilde{Y}_T|T) - \frac{1}{n}H(\tilde{Y}^n|\tilde{M}) (75)
\leq H(\tilde{Y}_T) - \frac{1}{n}H(\tilde{Y}^n|\tilde{M}), (76)

where (a) holds because \tilde{M}=g^{(n)}(\tilde{Y}^n) as explained above.

By the following lemma, by considering an appropriate subsequence of blocklengths, and by the continuity of the entropy function, we deduce that

R \leq I_{P_X P_{Y|X}}(X;Y) \leq C, (77)

where the subscript indicates that the mutual information is computed with respect to the joint pmf P_X P_{Y|X}, with P_X denoting the pmf mentioned in the lemma. This concludes the proof of the strong converse for channel coding.

Lemma 3

There exists an increasing subsequence of blocklengths \{n_i\}_{i=1}^{\infty} such that for some pmf P_X:

\lim_{i\to\infty} P_{\tilde{Y}_T}(y) = \sum_{x\in\mathcal{X}} P_X(x)P_{Y|X}(y|x) (78)
\lim_{i\to\infty} \frac{1}{n_i}H(\tilde{Y}^{n_i}|\tilde{M}) = H_{P_X P_{Y|X}}(Y|X), (79)

where H_{P_X P_{Y|X}}(Y|X) denotes the conditional entropy of Y given X when the pair (X,Y)\sim P_X P_{Y|X}.

Proof:

For readability, we will also write x^n(m) and x^n(\tilde{M}) to indicate the (random) codewords \phi^{(n)}(m) and \phi^{(n)}(\tilde{M}). We have:

P_{\tilde{Y}_T}(y) = \frac{1}{n}\sum_{t=1}^n P_{\tilde{Y}_t}(y) (80)
= \mathbb{E}\left[\frac{1}{n}\sum_{t=1}^n \mathbbm{1}\{\tilde{Y}_t=y\}\right] (81)
= \mathbb{E}\left[\pi_{\tilde{Y}^n}(y)\right] (82)
= \sum_{x\in\mathcal{X}} \mathbb{E}\left[\pi_{x^n(\tilde{M})\tilde{Y}^n}(x,y)\right] (83)
= \sum_{x\in\mathcal{X}}\frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}} \mathbb{E}\left[\pi_{x^n(m)\tilde{Y}^n}(x,y)\,\Big|\,\tilde{M}=m\right]. (84)

Since \mathcal{D}_m is a subset of the conditional typical set \mathcal{T}_{\mu_n}^{(n)}(P_{Y|X=x^n(m)}), for all m\in\{1,\ldots,2^{nR}\}, all y^n\in\mathcal{D}_m, and all (x,y)\in\mathcal{X}\times\mathcal{Y}:

\left|\pi_{x^n(m)y^n}(x,y) - \pi_{x^n(m)}(x)P_{Y|X}(y|x)\right| \leq \mu_n, (85)

and if P_{Y|X}(y|x)=0 then \pi_{x^n(m)y^n}(x,y)=0. Plugging these conditions into (84), we obtain

P_{\tilde{Y}_T}(y) \leq \sum_{\substack{x\in\mathcal{X}\colon\\ P_{Y|X}(y|x)>0}}\frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}} \pi_{x^n(m)}(x)P_{Y|X}(y|x) + |\mathcal{X}|\mu_n (86a)

and similarly:

P_{\tilde{Y}_T}(y) \geq \sum_{\substack{x\in\mathcal{X}\colon\\ P_{Y|X}(y|x)>0}}\frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}} \pi_{x^n(m)}(x)P_{Y|X}(y|x) - |\mathcal{X}|\mu_n. (86b)

Let now \{n_i\} be an increasing subsequence of blocklengths so that the sequence of average types \frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}}\pi_{x^n(m)}(x) converges for each x\in\mathcal{X}, and denote the convergence point by P_X(x). Then, since \mu_n\to 0 as n\to\infty, by (86):

\lim_{i\to\infty} P_{\tilde{Y}_T}(y) = \sum_{x\in\mathcal{X}} P_X(x)P_{Y|X}(y|x), (87)

establishing the first part of the lemma.

Notice next that by definition

\frac{1}{n}H(\tilde{Y}^n|\tilde{M}=m) = -\frac{1}{n}\sum_{y^n\in\mathcal{D}_m} P_{\tilde{Y}^n|\tilde{M}=m}(y^n)\log P_{\tilde{Y}^n|\tilde{M}=m}(y^n) (88)
= -\frac{1}{n}\sum_{y^n\in\mathcal{D}_m} P_{\tilde{Y}^n|\tilde{M}=m}(y^n)\log\frac{P_{Y|X}^{\otimes n}(y^n|x^n(m))}{\Delta_m} (94)
= -\frac{1}{n}\sum_{t=1}^n\sum_{y^n\in\mathcal{D}_m} P_{\tilde{Y}^n|\tilde{M}=m}(y^n)\log P_{Y|X}(y_t|x_t(m)) + \frac{1}{n}\log\Delta_m
= -\frac{1}{n}\sum_{t=1}^n\sum_{y_t\in\mathcal{Y}} P_{\tilde{Y}_t|\tilde{M}=m}(y_t)\log P_{Y|X}(y_t|x_t(m)) + \frac{1}{n}\log\Delta_m
= -\frac{1}{n}\sum_{t=1}^n\sum_{y\in\mathcal{Y}} \mathbb{E}\left[\mathbbm{1}\{\tilde{Y}_t=y\}\,\Big|\,\tilde{M}=m\right]\log P_{Y|X}(y|x_t(m)) + \frac{1}{n}\log\Delta_m
= -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} \mathbb{E}\left[\frac{1}{n}\sum_{t=1}^n \mathbbm{1}\{x_t(m)=x,\tilde{Y}_t=y\}\,\Big|\,\tilde{M}=m\right]\log P_{Y|X}(y|x) + \frac{1}{n}\log\Delta_m
= -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} \mathbb{E}\left[\pi_{x^n(m)\tilde{Y}^n}(x,y)\,\Big|\,\tilde{M}=m\right]\log P_{Y|X}(y|x) + \frac{1}{n}\log\Delta_m.

Averaging over all messages m\in\{1,\ldots,2^{nR}\}, we obtain

\frac{1}{n}H(\tilde{Y}^n|\tilde{M}) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}\frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}} \mathbb{E}\left[\pi_{x^n(m)\tilde{Y}^n}(x,y)\,\Big|\,\tilde{M}=m\right]\log P_{Y|X}(y|x) + \frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}}\frac{1}{n}\log\Delta_m. (96)

By (85) and by defining P_X as the convergence point of \frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}}\pi_{x^n(m)}(x) for the sequence of blocklengths \{n_i\}_{i=1}^{\infty}, one can follow the same bounding steps as those leading to (86) to obtain:

\lim_{i\to\infty}\frac{1}{n_i}H(\tilde{Y}^{n_i}|\tilde{M}) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} P_X(x)P_{Y|X}(y|x)\log P_{Y|X}(y|x) = H_{P_X P_{Y|X}}(Y|X), (97)

which concludes the second part of the proof. ∎
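As a numerical aside (our own toy illustration; the random codebook, the parameters, and the simplification of ignoring the decoding-set part of \mathcal{D}_m are all assumptions of ours), the following sketch mimics the first part of Lemma 3: when outputs are forced to be conditionally typical for their codeword, the time-averaged output distribution is close to the average codeword type pushed through the channel, cf. (85)–(86).

```python
import numpy as np

rng = np.random.default_rng(2)
P_yx = np.array([[0.9, 0.1], [0.2, 0.8]])      # toy DMC P_{Y|X}(y|x)
n, num_msgs, mu = 2000, 8, 0.03

# Random codebook x^n(1),...,x^n(num_msgs); its average empirical type plays the role of P_X.
codebook = rng.integers(0, 2, size=(num_msgs, n))
avg_type = np.mean([np.bincount(cw, minlength=2) / n for cw in codebook], axis=0)

# For each codeword, sample channel outputs conditioned on being conditionally typical
# (a stand-in for the restricted measure (70), ignoring the decoding-set part of D_m).
def conditionally_typical_output(cw):
    cw_type = np.bincount(cw, minlength=2) / n
    while True:
        y = np.array([rng.choice(2, p=P_yx[x]) for x in cw])
        joint = np.zeros((2, 2))
        for a, b in zip(cw, y):
            joint[a, b] += 1
        if np.all(np.abs(joint / n - cw_type[:, None] * P_yx) <= mu):  # condition (85)
            return y

# Empirical estimate of P_{Y_T}: average symbol frequency of the sampled outputs.
freqs = np.mean([np.bincount(conditionally_typical_output(cw), minlength=2) / n
                 for cw in codebook], axis=0)
print("P_Y_T (empirical):        ", freqs)
print("avg type pushed through P:", avg_type @ P_yx)   # close up to ~|X| mu, cf. (86)
```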

V Interactive Lossy Compression

This section focuses on the interactive lossy compression problem depicted in Figure 3.

V-A Setup

Consider two terminals observing the related source sequences X^n and Y^n, where, as in the case of source coding with side-information,

(X^n,Y^n) \textnormal{ i.i.d. } \sim P_{XY}, (98)

for a given probability mass function P_{XY} on the product alphabet \mathcal{X}\times\mathcal{Y}. Communication between the two terminals is over noise-free links and interactive in L>0 rounds. The terminal observing X^n starts the communication, and thus in all odd rounds \ell=1,3,5,\ldots the message M_\ell is created as

M_\ell = \phi_\ell^{(n)}(X^n,M_1,\ldots,M_{\ell-1}), \qquad \ell=1,3,5,\ldots, (99)

for an encoding function \phi_\ell^{(n)} on appropriate domains, where each message M_\ell\in\{1,\ldots,2^{nR_\ell}\}, for given non-negative rates R_1,\ldots,R_L. (Note that for \ell=1, M_1=\phi_1^{(n)}(X^n).) In even rounds \ell=2,4,6,\ldots, the message M_\ell\in\{1,\ldots,2^{nR_\ell}\} is created as

M_\ell = \phi_\ell^{(n)}(Y^n,M_1,\ldots,M_{\ell-1}), \qquad \ell=2,4,6,\ldots. (100)

At the end of the L rounds, each terminal produces a reconstruction sequence on a pre-specified alphabet. The terminal observing X^n produces

W^n = g_X^{(n)}(X^n,M_1,\ldots,M_L) (101)

for W^n taking value on the given alphabet \mathcal{W}^n. The terminal observing Y^n produces

Z^n = g_Y^{(n)}(Y^n,M_1,\ldots,M_L) (102)

for Z^n taking value on the given alphabet \mathcal{Z}^n.

The reconstructions are supposed to satisfy a set of J distortion constraints:

\frac{1}{n}\sum_{i=1}^n d_j(X_i,Y_i,W_i,Z_i) < D_j, \qquad j\in\{1,\ldots,J\}, (103)

for given non-negative symbolwise distortion functions d_j(\cdot,\cdot,\cdot,\cdot).

Definition 3

Given sequences \{\delta_{j,n}\}, a rate-tuple R_1,\ldots,R_L\geq 0 is said to be \{\delta_{j,n}\}-achievable if there exist sequences (in n) of encoding functions \{\phi_\ell^{(n)}\}_{\ell=1}^L and reconstruction functions g_X^{(n)} and g_Y^{(n)} such that the excess distortion probabilities satisfy

\Pr\left[\frac{1}{n}\sum_{i=1}^n d_j(X_i,Y_i,W_i,Z_i) > D_j\right] \leq \delta_{j,n}, \qquad j\in\{1,\ldots,J\}. (104)

Remark 1

Our problem formulation includes various previously studied models as special cases. For example, the Wyner-Ziv problem [18] is included by setting L=1 and choosing a distortion function of the form d_j(X_i,Z_i). Kaspi's interactive source-coding problem is included by restricting to two distortion functions of the form d_1(X_i,Z_i) and d_2(Y_i,W_i). Lossy source coding with side-information and lossy common reconstruction [28, 27] is included by setting L=1. The interactive function computation problem [26] is obtained by choosing J=1, D_1=0, and distortion function d_1(X,Y,W,Z)=\mathbbm{1}\{Z=W=f(X,Y)\} for the desired function f.

Theorem 3

Given any sequences \{\delta_{j,n}\} satisfying

\sum_{j=1}^J \delta_{j,n} < 1, \qquad n=1,2,\ldots, (105a)
\lim_{n\to\infty}\frac{1}{n}\log\left(1-\sum_{j=1}^J \delta_{j,n}\right) = 0, (105b)

a rate-tuple (R_1,\ldots,R_L) can only be \{\delta_{j,n}\}-achievable if it satisfies the rate constraints

R_\ell \geq I(X;U_\ell|U_1\cdots U_{\ell-1}), \qquad \ell\in\{1,\ldots,L\}, (106a)

for some auxiliary random variables U_1,\ldots,U_L and reconstruction random variables W and Z satisfying the distortion constraints

\mathbb{E}\left[d_j(X,Y,W,Z)\right] < D_j, \qquad j\in\{1,\ldots,J\}, (106b)

for (X,Y)\sim P_{XY}, and the Markov chains

U_\ell \to (X,U_1,\ldots,U_{\ell-1}) \to Y, \qquad \ell=1,3,5,\ldots, (106c)
U_\ell \to (Y,U_1,\ldots,U_{\ell-1}) \to X, \qquad \ell=2,4,6,\ldots, (106d)
W \to (X,U_1,\ldots,U_L) \to Y, (106e)
Z \to (Y,U_1,\ldots,U_L) \to X. (106f)
Remark 2 (A single distortion)

For a single distortion constraint (J=1), the theorem implies that if the rate-tuple violates the constraints in the theorem, then the probability of excess distortion tends to 1 exponentially fast.

Remark 3 (Vector-valued distortions)

Theorem 3 extends in a straightforward manner to vector-valued distortion functions and vector distortions

\frac{1}{n}\sum_{i=1}^n \boldsymbol{d}_j(X_i,Y_i,W_i,Z_i) < \boldsymbol{D}_j, \qquad j\in\{1,\ldots,J\}, (107)

where \boldsymbol{D}_j\in\mathbb{R}^{\nu_j} for some positive integer \nu_j, the distortion functions are non-negative and of the form \boldsymbol{d}_j\colon\mathcal{X}\times\mathcal{Y}\times\mathcal{W}\times\mathcal{Z}\to\mathbb{R}^{\nu_j}, and inequality (107) is meant component-wise. The difference between J scalar distortion constraints as in (103) and a single J-valued vector-distortion function as in (107) is that the vector-distortion constraint limits the probability that any of the J constraints is violated, whereas the J scalar distortion constraints individually limit the probability of each distortion being violated.

In the following section, we prove the strong converse, i.e., the non-achievability of any rate-tuple (R_1,\ldots,R_L) not satisfying the above conditions, for any sequences \{\delta_{j,n}\} satisfying (105). Using standard arguments, it can be shown that for any rate-tuple (R_1,\ldots,R_L) satisfying constraints (106) there exist excess probabilities \{\delta_{j,n}\}, all tending to 0 as n\to\infty, such that the rate-tuple (R_1,\ldots,R_L) is \{\delta_{j,n}\}-achievable.
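To illustrate how the single-letter conditions of Theorem 3 are evaluated for a candidate choice of auxiliaries (a sketch of ours with L=2 and J=1; the joint pmf and the Hamming-type distortion below are arbitrary toy assumptions, and a randomly drawn pmf will generally not satisfy the Markov chains (106c)–(106f)), the following Python snippet computes the conditional mutual informations in (106a) and the expected distortion in (106b).

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy joint pmf of (X, Y, U1, U2, W, Z), all binary; axes 0..5 in that order.
P = rng.random((2,) * 6)
P /= P.sum()

def H(P, keep):
    """Entropy (in bits) of the marginal over the axes in `keep`."""
    axes = tuple(i for i in range(P.ndim) if i not in keep)
    M = np.asarray(P.sum(axis=axes)).ravel()
    M = M[M > 0]
    return -(M * np.log2(M)).sum()

def cond_MI(P, a, b, c):
    """I(A;B|C) for the axis groups a, b, c."""
    return H(P, a + c) + H(P, b + c) - H(P, a + b + c) - H(P, c)

X, Y, U1, U2, W, Z = (0,), (1,), (2,), (3,), (4,), (5,)

# Rate constraints (106a) for L = 2.
R1_min = cond_MI(P, X, U1, ())          # I(X; U1)
R2_min = cond_MI(P, X, U2, U1)          # I(X; U2 | U1)

# Expected distortion (106b) for one illustrative distortion d(x,y,w,z) = 1{z != x}.
d = np.fromfunction(lambda x, y, u1, u2, w, z: (z != x).astype(float), (2,) * 6)
E_d = (P * d).sum()

print(R1_min, R2_min, E_d)
```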

V-B Strong Converse Proof

Fix a sequence of encoding functions \{\phi_\ell^{(n)}\}_{\ell=1}^L and reconstruction functions g_X^{(n)} and g_Y^{(n)} satisfying (104). Choose a sequence of positive real numbers \{\mu_n\} satisfying

\lim_{n\to\infty}\mu_n = 0 (108)
\lim_{n\to\infty}\left(n\cdot\mu_n^2\right)^{-1} = 0, (109)

and define the set

\mathcal{D}_n := \Big\{(x^n,y^n)\in\mathcal{T}_{\mu_n}^{(n)}(P_{XY})\colon d_j^{(n)}\left(g_X^{(n)}(x^n,m_1^L),\,g_Y^{(n)}(y^n,m_1^L)\right) \leq D_j, \quad j\in\{1,\ldots,J\}\Big\}, (110)

where m_1^L := (m_1,\ldots,m_L) and for odd values of \ell we have m_\ell=\phi_\ell^{(n)}(x^n,m_1,\ldots,m_{\ell-1}) whereas for even values of \ell we have m_\ell=\phi_\ell^{(n)}(y^n,m_1,\ldots,m_{\ell-1}).

Define also the probability

\Delta_n := \Pr[(X^n,Y^n)\in\mathcal{D}_n] (111)

and notice that by (104) and [3, Remark to Lemma 2.12]:

\Delta_n \geq 1-\sum_{j=1}^J \delta_{j,n}-\frac{|\mathcal{X}||\mathcal{Y}|}{4\mu_n^2 n}, (112)

which by assumptions (105b) and (109) satisfies

\lim_{n\to\infty}\frac{1}{n}\log\Delta_n = 0. (113)

Let further (\tilde{X}^n,\tilde{Y}^n) be random variables of joint pmf

P_{\tilde{X}^n\tilde{Y}^n}(x^n,y^n) = \frac{P_{XY}^{\otimes n}(x^n,y^n)}{\Delta_n}\cdot\mathbbm{1}\{(x^n,y^n)\in\mathcal{D}_n\}. (114)

Let also T be uniform over \{1,\ldots,n\} independent of (\tilde{X}^n,\tilde{Y}^n), and define:

\tilde{M}_\ell = \phi_\ell^{(n)}(\tilde{X}^n,\tilde{M}_1,\ldots,\tilde{M}_{\ell-1}), \qquad \ell=1,3,5,\ldots, (115)
\tilde{M}_\ell = \phi_\ell^{(n)}(\tilde{Y}^n,\tilde{M}_1,\ldots,\tilde{M}_{\ell-1}), \qquad \ell=2,4,6,\ldots. (116)

Note that for \ell=1, \tilde{M}_1=\phi_1^{(n)}(\tilde{X}^n). Define the auxiliary random variables

U_1 := (\tilde{X}^{T-1},\tilde{Y}_{T+1}^n,\tilde{M}_1,T) (117a)
U_\tau := \tilde{M}_\tau, \qquad \tau\in\{2,\ldots,L\}. (117b)

We start with some preliminary observations. For any odd \ell\geq 1, observe the following:

\frac{1}{n}H(\tilde{X}^n|\tilde{Y}^n\tilde{M}_1\cdots\tilde{M}_\ell) (121)
\stackrel{(d)}{=} \frac{1}{n}\Big[H(\tilde{X}^n|\tilde{Y}^n\tilde{M}_1\cdots\tilde{M}_\ell) + \sum_{\substack{\tau\in\{1,\ldots,\ell\}\colon\\ \tau\textnormal{ odd}}} I(\tilde{M}_\tau;\tilde{Y}^n|\tilde{X}^n\tilde{M}_1\cdots\tilde{M}_{\tau-1}) + \sum_{\substack{\tau\in\{2,\ldots,\ell-1\}\colon\\ \tau\textnormal{ even}}} I(\tilde{M}_\tau;\tilde{X}^n|\tilde{Y}^n\tilde{M}_1\cdots\tilde{M}_{\tau-1})\Big]
= \frac{1}{n}\left[H(\tilde{X}^n|\tilde{Y}^n\tilde{M}_1\cdots\tilde{M}_\ell) - H(\tilde{Y}^n|\tilde{X}^n\tilde{M}_1\cdots\tilde{M}_\ell)\right] + \frac{1}{n}\left[H(\tilde{Y}^n|\tilde{X}^n\tilde{M}_1\cdots\tilde{M}_{\ell-1}) - H(\tilde{X}^n|\tilde{Y}^n\tilde{M}_1\cdots\tilde{M}_{\ell-1})\right] + \cdots + \frac{1}{n}\left[H(\tilde{X}^n|\tilde{Y}^n\tilde{M}_1) - H(\tilde{Y}^n|\tilde{X}^n\tilde{M}_1)\right] + \frac{1}{n}H(\tilde{Y}^n|\tilde{X}^n)
\stackrel{(e)}{=} H(\tilde{X}_T|\tilde{Y}_T U_1\cdots U_\ell) - H(\tilde{Y}_T|\tilde{X}_T U_1\cdots U_\ell) + H(\tilde{Y}_T|\tilde{X}_T U_1\cdots U_{\ell-1}) - H(\tilde{X}_T|\tilde{Y}_T U_1\cdots U_{\ell-1}) + \cdots + H(\tilde{X}_T|\tilde{Y}_T U_1) - H(\tilde{Y}_T|\tilde{X}_T U_1) + H(\tilde{Y}_T|\tilde{X}_T) + o(1)
\stackrel{(f)}{=} H(\tilde{X}_T|\tilde{Y}_T U_1\cdots U_\ell) + \sum_{\substack{\tau\in\{1,\ldots,\ell\}\colon\\ \tau\textnormal{ odd}}} I(U_\tau;\tilde{Y}_T|\tilde{X}_T U_1\cdots U_{\tau-1}) + \sum_{\substack{\tau\in\{2,\ldots,\ell-1\}\colon\\ \tau\textnormal{ even}}} I(U_\tau;\tilde{X}_T|\tilde{Y}_T U_1\cdots U_{\tau-1}) + o(1)
\stackrel{(g)}{\geq} H(\tilde{X}_T|\tilde{Y}_T U_1\cdots U_\ell) + o(1), (122)

where (d) holds because for \tau odd the message \tilde{M}_\tau is a function of (\tilde{X}^n,\tilde{M}_1,\ldots,\tilde{M}_{\tau-1}) and thus I(\tilde{M}_\tau;\tilde{Y}^n|\tilde{X}^n\tilde{M}_1\cdots\tilde{M}_{\tau-1})=0, whereas for \tau even the message \tilde{M}_\tau is a function of (\tilde{Y}^n,\tilde{M}_1,\ldots,\tilde{M}_{\tau-1}) and thus I(\tilde{M}_\tau;\tilde{X}^n|\tilde{Y}^n\tilde{M}_1\cdots\tilde{M}_{\tau-1})=0; (e) holds by Lemmas 1 and 2 in Section II and by the definitions in (117), where for the application of Lemma 1 we also used Equation (113), and where for \ell=1 all the terms before and including the dots "+\cdots" are not present; (f) holds by grouping the entropy terms into the sums over odd and even \tau and by the definition of mutual information; and (g) holds by the non-negativity of mutual information.

Following similar steps, we obtain for any even \ell\geq 2:

\frac{1}{n}H(\tilde{Y}^n|\tilde{X}^n\tilde{M}_1\cdots\tilde{M}_\ell) \geq H(\tilde{Y}_T|\tilde{X}_T U_1\cdots U_\ell) + o(1). (123)

We now apply bounds (122) and (123) to obtain the desired bounds on the rates and prove validity of some desired asymptotic Markov chains. For any odd 1\ell\geq 1, we have

R\displaystyle R_{\ell} \displaystyle\geq 1nH(M~)1nH(M~|Y~nM~1M~1)\displaystyle\frac{1}{n}H(\tilde{{M}}_{\ell})\geq\frac{1}{n}H(\tilde{{M}}_{\ell}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1}) (127)
=\displaystyle= 1nI(M~;X~n|Y~nM~1M~1)\displaystyle\frac{1}{n}I(\tilde{{M}}_{\ell};\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1})
=\displaystyle= 1n[H(X~n|Y~nM~1M~1)H(X~n|Y~nM~1M~)]\displaystyle\frac{1}{n}\left[H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1})-H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell})\right]
(h)\displaystyle\stackrel{{\scriptstyle(h)}}{{\geq}} 1n[H(X~n|Y~nM~1M~2)\displaystyle\frac{1}{n}\big{[}H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-2})
i=1nH(X~i|Y~iX~i1Y~i+1nM~1M~)]\displaystyle\hskip 14.22636pt-\sum_{i=1}^{n}H(\tilde{X}_{i}|\tilde{Y}_{i}\tilde{X}^{i-1}\tilde{Y}_{i+1}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell})\big{]}
(i)\displaystyle\stackrel{{\scriptstyle(i)}}{{\geq}} H(X~T|Y~TU1U2)H(X~T|Y~TU1U)\displaystyle H(\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{\ell-2})-H(\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{\ell})
+o(1)\displaystyle+o(1)
=\displaystyle= I(U1U;X~T|Y~TU1U2)+o(1)\displaystyle I(U_{\ell-1}U_{\ell};\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{\ell-2})+o(1) (128)
\displaystyle\geq I(U;X~T|Y~TU1U1)+o(1),\displaystyle I(U_{\ell};\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{\ell-1})+o(1), (129)

where (h) holds because for \ell odd, the message \tilde{M}_{\ell-1} is a function of the tuple (\tilde{Y}^{n},\tilde{M}_{1},\ldots,\tilde{M}_{\ell-2}) and because conditioning can only reduce entropy; and (i) holds by (117) and (122). Notice that for \ell=1:

R1\displaystyle R_{1} \displaystyle\geq I(U1;X~T|Y~T)+o(1).\displaystyle I(U_{1};\tilde{X}_{T}|\tilde{Y}_{T})+o(1). (130)

For any even 2\ell\geq 2, we have:

R\displaystyle R_{\ell} \displaystyle\geq 1nH(M~)\displaystyle\frac{1}{n}H(\tilde{{M}}_{\ell}) (131)
\displaystyle\geq 1nI(M~;Y~n|X~nM~1M~1)\displaystyle\frac{1}{n}I(\tilde{{M}}_{\ell};\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1}) (134)
=(j)\displaystyle\stackrel{{\scriptstyle(j)}}{{=}} 1n[H(Y~n|X~nM~1M~2)H(Y~n|X~nM~1M~)]\displaystyle\frac{1}{n}\left[H(\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-2})-H(\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell})\right]
(k)\displaystyle\stackrel{{\scriptstyle(k)}}{{\geq}} H(Y~T|X~TU1U2)H(Y~T|X~TU1U)\displaystyle H(\tilde{Y}_{T}|\tilde{X}_{T}U_{1}\cdots U_{\ell-2})-H(\tilde{Y}_{T}|\tilde{X}_{T}U_{1}\cdots U_{\ell})
+o(1)\displaystyle+o(1)
\displaystyle\geq I(U;Y~T|X~TU1U1)+o(1)\displaystyle I(U_{\ell};\tilde{Y}_{T}|\tilde{X}_{T}U_{1}\cdots U_{\ell-1})+o(1) (135)

where (j) holds because for \ell even, the message \tilde{M}_{\ell-1} is a function of (\tilde{X}^{n},\tilde{M}_{1},\ldots,\tilde{M}_{\ell-2}); and (k) holds by (117) and (123).

We next notice that for \ell even (because the message M~\tilde{{M}}_{\ell} is a function of (Y~n,M~1,,M~1)(\tilde{Y}^{n},\tilde{{M}}_{1},\ldots,\tilde{{M}}_{\ell-1})):

0\displaystyle 0 =\displaystyle= 1nI(M~;X~n|Y~nM~1M~1)\displaystyle\frac{1}{n}I(\tilde{{M}}_{\ell};\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1}) (136)
=\displaystyle= 1n[H(X~n|Y~nM~1M~1)H(X~n|Y~nM~1M~)]\displaystyle\frac{1}{n}\left[H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1})-H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell})\right]
(l)\displaystyle\stackrel{{\scriptstyle(l)}}{{\geq}} H(X~T|Y~TU1U1)+o(1)\displaystyle H(\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{\ell-1})+o(1)
1ni=1nH(X~i|X~i1Y~iY~i+1nM~1M~)\displaystyle-\frac{1}{n}\sum_{i=1}^{n}H(\tilde{X}_{i}|\tilde{X}^{i-1}\tilde{Y}_{i}\tilde{Y}_{i+1}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell})
=\displaystyle= I(U;X~T|Y~TU1U1)+o(1),\displaystyle I(U_{\ell};\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{\ell-1}){+o(1)}, (137)

where (l)(l) holds by (122) and because conditioning can only reduce entropy.

Similarly, for 1\ell\geq 1 odd (because the message M~\tilde{{M}}_{\ell} is a function of (X~n,M~1,,M~1)(\tilde{X}^{n},\tilde{{M}}_{1},\ldots,\tilde{{M}}_{\ell-1})):

0\displaystyle 0 =\displaystyle= 1nI(M~ℓ;Y~n|X~nM~1⋯M~ℓ−1)\displaystyle{1\over n}I(\tilde{{M}}_{\ell};\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1}) (138)
=\displaystyle= 1n[H(Y~n|X~nM~1M~1)H(Y~n|X~nM~1M~)]\displaystyle\frac{1}{n}\left[H(\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell-1})-H(\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{\ell})\right]
\displaystyle\geq I(U;Y~T|X~TU1U1)+o(1).\displaystyle I(U_{\ell};\tilde{Y}_{T}|\tilde{X}_{T}U_{1}\cdots U_{\ell-1})+o(1).

In particular, for =1\ell=1, message M~1\tilde{{M}}_{1} is a function of X~n\tilde{X}^{n} and we have:

0\displaystyle 0 =\displaystyle= 1nI(M~1;Y~n|X~n)I(U1;Y~T|X~T)+o(1).\displaystyle\frac{1}{n}I(\tilde{{M}}_{1};\tilde{Y}^{n}|\tilde{X}^{n})\geq I(U_{1};\tilde{Y}_{T}|\tilde{X}_{T})+o(1). (139)
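The functional-dependence arguments in steps (d), (h), (j), and (l) all reduce to the elementary identity I(f(X);Y|X)=0 for a deterministic function f. The following self-contained Python sketch is a minimal numerical sanity check of this identity on a toy joint pmf; the binary alphabets and the encoder f are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cond_mutual_information(p_xyz):
    """I(Y;Z|X) in bits for a joint pmf array indexed as [x, y, z]."""
    p_x = p_xyz.sum(axis=(1, 2))
    p_xy = p_xyz.sum(axis=2)
    p_xz = p_xyz.sum(axis=1)
    mi = 0.0
    for x in range(p_xyz.shape[0]):
        for y in range(p_xyz.shape[1]):
            for z in range(p_xyz.shape[2]):
                p = p_xyz[x, y, z]
                if p > 0:
                    mi += p * np.log2(p * p_x[x] / (p_xy[x, y] * p_xz[x, z]))
    return mi

# Toy joint pmf of (X, Y) and a hypothetical deterministic encoder f, so that M = f(X).
p_xy = np.array([[0.1, 0.2],
                 [0.3, 0.4]])
f = np.array([0, 1])
p_xym = np.zeros((2, 2, 2))          # joint pmf of (X, Y, M), indexed [x, y, m]
for x in range(2):
    for y in range(2):
        p_xym[x, y, f[x]] = p_xy[x, y]

# Since M is a function of X, I(M; Y | X) must vanish (up to floating-point error).
print(cond_mutual_information(p_xym))   # ~ 0.0
```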

Let now W~n:=gX(X~n,M~1,,M~L){\tilde{W}}^{n}:=g_{X}(\tilde{X}^{n},\tilde{{M}}_{1},\ldots,\tilde{{M}}_{L}) and Z~n:=gY(Y~n,M~1,,M~L){\tilde{Z}}^{n}:=g_{Y}(\tilde{Y}^{n},\tilde{{M}}_{1},\ldots,\tilde{{M}}_{L}) and notice that by our definition of the set 𝒟n\mathcal{D}_{n}, for any j{1,,J}j\in\{1,\ldots,J\}:

Dj\displaystyle D_{j} \displaystyle\geq 1ni=1ndj(X~i,Y~i,W~i,Z~i)\displaystyle\frac{1}{n}\sum_{i=1}^{n}d_{j}\left(\tilde{X}_{i},\tilde{Y}_{i},{\tilde{W}}_{i},{\tilde{Z}}_{i}\right) (140)
=\displaystyle= 𝔼[dj(X~T,Y~T,W~T,Z~T)].\displaystyle\mathbb{E}\left[d_{j}\left(\tilde{X}_{T},\tilde{Y}_{T},{\tilde{W}}_{T},{\tilde{Z}}_{T}\right)\right]. (141)

For simplicity, in the sequel we assume that LL is even; if LL is odd the proof is similar. Similarly to (137) and (138), since W~n:=gX(X~n,M~1,,M~L){\tilde{W}}^{n}:=g_{X}(\tilde{X}^{n},\tilde{{M}}_{1},\ldots,\tilde{{M}}_{L}) and Z~n:=gY(Y~n,M~1,,M~L){\tilde{Z}}^{n}:=g_{Y}(\tilde{Y}^{n},\tilde{{M}}_{1},\ldots,\tilde{{M}}_{L}), we have:

0\displaystyle 0 =\displaystyle= 1nI(Z~n;X~n|Y~nM~1M~L)\displaystyle\frac{1}{n}I({\tilde{Z}}^{n};\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{L})
=(m)\displaystyle\stackrel{{\scriptstyle(m)}}{{=}} 1n[H(X~n|Y~nM~1M~L1)H(X~n|Y~nM~1M~LZ~n)]\displaystyle\frac{1}{n}\left[H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{{L-1}})-H(\tilde{X}^{n}|\tilde{Y}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{L}{\tilde{Z}}^{n})\right]
(n)\displaystyle\stackrel{{\scriptstyle(n)}}{{\geq}} H(X~T|Y~TU1UL1)+o(1)\displaystyle H(\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{{L-1}})+o(1)
1ni=1nH(X~i|X~i1Y~iY~i+1nM~1M~LZ~i)\displaystyle-\frac{1}{n}\sum_{i=1}^{n}H(\tilde{X}_{i}|\tilde{X}^{i-1}\tilde{Y}_{i}\tilde{Y}_{i+1}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{L}{\tilde{Z}}_{i})
=\displaystyle= I(ULZ~T;X~T|Y~TU1UL1)+o(1)\displaystyle I(U_{L}{\tilde{Z}_{T}};\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{L-1})+o(1)
\displaystyle\geq I(Z~T;X~T|Y~TU1UL)+o(1)\displaystyle I({\tilde{Z}}_{T};\tilde{X}_{T}|\tilde{Y}_{T}U_{1}\cdots U_{L})+o(1) (143)

and

0\displaystyle 0 =\displaystyle= 1nI(W~n;Y~n|X~nM~1M~L)\displaystyle\frac{1}{n}I({\tilde{W}}^{n};\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{L}) (144)
=\displaystyle= 1n[H(Y~n|X~nM~1M~L)H(Y~n|X~nM~1M~LW~n)]\displaystyle\frac{1}{n}\left[H(\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{L})-H(\tilde{Y}^{n}|\tilde{X}^{n}\tilde{{M}}_{1}\cdots\tilde{{M}}_{L}{\tilde{W}}^{n})\right]
\displaystyle\geq I(W~T;Y~T|X~TU1UL)+o(1),\displaystyle I({\tilde{W}}_{T};\tilde{Y}_{T}|\tilde{X}_{T}U_{1}\cdots U_{L})+o(1),

where (m)(m) holds since for even LL, message M~L\tilde{M}_{L} is a function of (Y~n,M~1,,M~L1)(\tilde{Y}^{n},\tilde{M}_{1},\cdots,\tilde{M}_{L-1}); and (n)(n) holds by (122) since L1L-1 is odd and because conditioning can only reduce entropy.

The desired rate constraints are then obtained by combining (129), (130), (135), (137), (138), (139), (141), (143), and (144) and by taking nn\to\infty. Details are as follows. By Carathéodory’s theorem [34, Appendix C], there exist auxiliary random variables U1,,UL{U}_{1},\ldots,U_{L} of bounded alphabets satisfying (129), (130), (135), (137), (138), (139), (141), (143), and (144). We restrict to such auxiliary random variables and invoke the Bolzano-Weierstrass theorem to conclude the existence of a pmf PU1ULXYWZP_{U_{1}\cdots U_{L}{X}{Y}W{Z}}^{*}, also abbreviated as PP^{*}, and an increasing subsequence of blocklengths {ni}i=1\{n_{i}\}_{i=1}^{\infty} so that

limiPU1ULX~Y~W~Z~;ni\displaystyle\lim_{{i\to\infty}}P_{U_{1}\cdots U_{L}\tilde{X}\tilde{Y}\tilde{W}\tilde{Z};n_{i}} =\displaystyle= PU1ULXYWZ,\displaystyle P_{U_{1}\cdots U_{L}{X}{Y}W{Z}}^{*}, (145)

where PU1ULX~Y~W~Z~;niP_{U_{1}\cdots U_{L}\tilde{X}\tilde{Y}\tilde{W}\tilde{Z};n_{i}} denotes the pmf of the tuple (U1ULX~TY~TW~TZ~T)(U_{1}\cdots U_{L}\tilde{X}_{T}\tilde{Y}_{T}\tilde{W}_{T}\tilde{Z}_{T}) at blocklength nin_{i}.

Notice that for any blocklength nin_{i} the pair (X~ni,Y~ni)\big{(}\tilde{X}^{n_{i}},\tilde{Y}^{n_{i}}\big{)} lies in the jointly typical set 𝒯μni(ni)(PXY)\mathcal{T}^{(n_{i})}_{\mu_{n_{i}}}(P_{XY}), i.e., |PX~Y~;niPXY|μni\big{|}P_{\tilde{X}\tilde{Y};n_{i}}-P_{XY}\big{|}\leq\mu_{n_{i}}, and thus since μn0\mu_{n}\to 0 as nn\to\infty, by the definition of (X~T,Y~T)(\tilde{X}_{T},\tilde{Y}_{T}) and by (145), the limiting pmf satisfies PXY=PXYP^{*}_{{X}{Y}}=P_{XY}. We further deduce from (129), (130), (135), (137), (138), (139), (141), (143), and (144) that:

R\displaystyle R_{\ell} \displaystyle\geq IP(X;U|YU1U1),=1,3\displaystyle I_{P^{*}}(X;U_{\ell}|YU_{1}\cdots U_{\ell-1}),\qquad\ell=1,3\ldots (146a)
R\displaystyle R_{\ell} \displaystyle\geq IP(Y;U|XU1U1),=2,4\displaystyle I_{P^{*}}(Y;U_{\ell}|XU_{1}\cdots U_{\ell-1}),\qquad\ell=2,4\ldots (146b)
0\displaystyle 0 =\displaystyle= IP(Y;U|XU1U1),=1,3\displaystyle I_{P^{*}}(Y;U_{\ell}|XU_{1}\cdots U_{\ell-1}),\qquad\ell=1,3\ldots (146c)
0\displaystyle 0 =\displaystyle= IP(X;U|YU1U1),=2,4\displaystyle I_{P^{*}}(X;U_{\ell}|YU_{1}\cdots U_{\ell-1}),\qquad\ell=2,4\ldots (146d)
0\displaystyle 0 =\displaystyle= IP(Z;X|YU1UL),\displaystyle I_{P^{*}}(Z;X|YU_{1}\cdots U_{L}), (146e)
0\displaystyle 0 =\displaystyle= IP(W;Y|XU1UL),\displaystyle I_{P^{*}}(W;Y|XU_{1}\cdots U_{L}), (146f)

where the subscript PP^{*} indicates that the mutual information quantities should be computed with respect to PP^{*}.

Combined with (141), which implies

Dj\displaystyle D_{j} \displaystyle\geq 𝔼P[dj(X,Y,W,Z)],j{1,,J},\displaystyle\mathbb{E}_{P^{*}}[d_{j}(X,Y,{W},{{Z}})],\quad j\in\{1,\ldots,J\}, (147)

the (in)equalities in (146) establish the desired converse, which concludes the proof.
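As a side remark, once a candidate limiting pmf P^{*} is fixed, all quantities in (146) and (147) are elementary finite-alphabet computations. The following hedged Python sketch evaluates the bound (146a) for \ell=1 and the distortion bound (147) for a toy candidate pmf with L=1; the joint pmf, the deterministic reconstructions, and the Hamming-type distortion are illustrative assumptions only and are not claimed to be optimal for any particular source.

```python
import itertools
import numpy as np

# Candidate pmf P*_{U1 X Y W Z} stored as {(u, x, y, w, z): prob}. Here U1 is a noisy
# copy of X, Y is a noisy copy of X, and both reconstructions equal U1 (assumptions).
p_star = {}
for x, y, u in itertools.product(range(2), repeat=3):
    p = 0.5 * (0.8 if y == x else 0.2) * (0.9 if u == x else 0.1)
    key = (u, x, y, u, u)            # w = z = u
    p_star[key] = p_star.get(key, 0.0) + p

def marg(p, keep):
    """Marginalize a dict-pmf onto the index positions in `keep`."""
    out = {}
    for k, v in p.items():
        kk = tuple(k[i] for i in keep)
        out[kk] = out.get(kk, 0.0) + v
    return out

def cond_mi(p, a, b, c):
    """I(A;B|C) in bits, where a, b, c are tuples of index positions."""
    p_abc, p_ac = marg(p, a + b + c), marg(p, a + c)
    p_bc, p_c = marg(p, b + c), marg(p, c)
    mi = 0.0
    for k, v in p_abc.items():
        if v > 0:
            ka, kb, kc = k[:len(a)], k[len(a):len(a) + len(b)], k[len(a) + len(b):]
            mi += v * np.log2(v * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return mi

# Rate bound (146a) for l = 1:  R_1 >= I_{P*}(X; U_1 | Y).
print("I(X;U1|Y) =", round(cond_mi(p_star, a=(1,), b=(0,), c=(2,)), 4))
# Distortion bound (147) with an illustrative distortion d_1(x, y, w, z) = 1{w != x}.
print("E[d1]     =", round(sum(v for (u, x, y, w, z), v in p_star.items() if w != x), 4))
```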

VI Testing Against Independence in a KK-Hop Network

In this section we focus on the KK-hop hypothesis testing setup in Figure 4.

VI-A Setup

Consider a system with a transmitter T0 observing the source sequence Y0nY_{0}^{n}, K1K-1 relays labelled R1,,RK1\text{R}_{1},\ldots,\text{R}_{K-1} and observing sequences Y1n,,YK1nY_{1}^{n},\ldots,Y_{K-1}^{n}, respectively, and a receiver RK observing sequence YKnY_{K}^{n}.

The source sequences (Y0n,Y1n,,YKn)(Y_{0}^{n},Y_{1}^{n},\ldots,Y_{K}^{n}) are distributed according to one of two distributions depending on a binary hypothesis {0,1}\mathcal{H}\in\{0,1\}:

if =0:(Y0n,Y1n,,YKn) i.i.d. PY0Y1PY2|Y1PYK|YK1;\displaystyle\textnormal{if }\mathcal{H}=0:(Y_{0}^{n},Y_{1}^{n},\ldots,Y_{K}^{n})\textnormal{ i.i.d. }\sim\,P_{Y_{0}Y_{1}}P_{Y_{2}|Y_{1}}\cdots P_{Y_{K}|Y_{K-1}}; (148a)
if =1:(Y0n,Y1n,,YKn) i.i.d. PY0PY1PYK.\displaystyle\textnormal{if }\mathcal{H}=1:(Y_{0}^{n},Y_{1}^{n},\ldots,Y_{K}^{n})\textnormal{ i.i.d. }\sim\,P_{Y_{0}}\cdot P_{Y_{1}}\cdots P_{Y_{K}}.

Communication takes place over KK hops as illustrated in Figure 4. The transmitter T0 sends a message M1=ϕ0(n)(Y0n){M}_{1}=\phi_{0}^{(n)}(Y_{0}^{n}) to the first relay R1, which sends a message M2=ϕ1(n)(Y1n,M1){M}_{2}=\phi_{1}^{(n)}(Y_{1}^{n},{M}_{1}) to the second relay and so on. The communication is thus described by encoding functions

ϕ0(n)\displaystyle\phi_{0}^{(n)} :\displaystyle\colon 𝒴0n{1,,2nR1},\displaystyle\mathcal{Y}_{0}^{n}\to\{1,\ldots,2^{nR_{1}}\}, (149)
ϕk(n)\displaystyle\phi_{k}^{(n)} :\displaystyle\colon 𝒴kn×{1,,2nRk}{1,,2nRk+1},\displaystyle\mathcal{Y}_{k}^{n}\times\{1,\ldots,2^{nR_{k}}\}\to\{1,\ldots,2^{nR_{k+1}}\}, (150)
k{1,,K1},\displaystyle\hskip 113.81102pt\quad k\in\{1,\ldots,K-1\},

and messages are obtained as:

M1\displaystyle{M}_{1} =\displaystyle= ϕ0(n)(Y0n)\displaystyle\phi_{0}^{(n)}({Y}_{0}^{n}) (151)
Mk+1\displaystyle{M}_{k+1} =\displaystyle= ϕk(n)(Ykn,Mk),k{1,,K1}.\displaystyle\phi_{k}^{(n)}({Y}_{k}^{n},{M}_{k}),\quad k\in\{1,\ldots,K-1\}. (152)

Each of the relays R1, …, RK-1, as well as the receiver RK, produces a guess of the hypothesis \mathcal{H}. These guesses are described by guessing functions

gk(n):𝒴kn×{1,,2nRk}{0,1},k{1,,K},g_{k}^{(n)}\colon\mathcal{Y}_{k}^{n}\times\{1,\ldots,2^{nR_{k}}\}\to\{0,1\},\quad k\in\{1,\ldots,K\}, (153)

where we request that the guesses

^k=gk(n)(Ykn,Mk),k{1,,K},\displaystyle\hat{\mathcal{H}}_{k}=g_{k}^{(n)}(Y_{k}^{n},{M}_{k}),\quad k\in\{1,\ldots,K\}, (154)

have type-I error probabilities

αk,n\displaystyle\alpha_{k,n} \displaystyle\triangleq Pr[^k=1|=0],k{1,,K},\displaystyle\Pr[\hat{\mathcal{H}}_{k}=1|\mathcal{H}=0],\quad k\in\{1,\ldots,K\}, (155)

not exceeding given thresholds, and type-II error probabilities

βk,n\displaystyle\beta_{k,n} \displaystyle\triangleq Pr[^k=0|=1],k{1,,K},\displaystyle\Pr[\hat{\mathcal{H}}_{k}=0|\mathcal{H}=1],\quad k\in\{1,\ldots,K\}, (156)

decaying to 0 exponentially fast with the largest possible exponents.
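To make the setup concrete, the following self-contained Python sketch simulates the K-hop protocol (148)-(156) for K=2 with binary alphabets and estimates the empirical type-I and type-II error probabilities of (155)-(156). The one-bit encoders and threshold decision rules below are naive illustrative choices, not the (optimal) schemes analyzed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 2000

def sample_sources(hypothesis):
    """(Y0^n, Y1^n, Y2^n): a Markov chain under H=0, independent under H=1."""
    y0 = rng.integers(0, 2, size=n)
    if hypothesis == 0:
        y1 = np.where(rng.random(n) < 0.9, y0, 1 - y0)    # toy P_{Y1|Y0}
        y2 = np.where(rng.random(n) < 0.9, y1, 1 - y1)    # toy P_{Y2|Y1}
    else:
        y1, y2 = rng.integers(0, 2, size=n), rng.integers(0, 2, size=n)
    return y0, y1, y2

def phi(yk, prev_msg=None):
    """Toy encoder: forward a one-bit quantization of the observed sequence."""
    bit = int(yk.mean() > 0.5)
    return (bit,) if prev_msg is None else (bit,) + prev_msg

def guess(yk, msg):
    """Toy decision rule: declare H=0 if the own quantization bit agrees with T0's bit."""
    return 0 if int(yk.mean() > 0.5) == msg[-1] else 1

errors = {(t, k): 0 for t in ("alpha", "beta") for k in (1, 2)}
for h in (0, 1):
    for _ in range(trials):
        y0, y1, y2 = sample_sources(h)
        m1 = phi(y0)              # M_1 = phi_0(Y_0^n)
        m2 = phi(y1, m1)          # M_2 = phi_1(Y_1^n, M_1)
        for k, (yk, mk) in enumerate(((y1, m1), (y2, m2)), start=1):
            hat = guess(yk, mk)
            if h == 0 and hat == 1:
                errors[("alpha", k)] += 1      # type-I error, cf. (155)
            if h == 1 and hat == 0:
                errors[("beta", k)] += 1       # type-II error, cf. (156)

for k in (1, 2):
    print(f"decision center {k}: alpha ~ {errors[('alpha', k)]/trials:.3f}, "
          f"beta ~ {errors[('beta', k)]/trials:.3f}")
```

The simulation merely illustrates the definitions; the exponents characterized below describe what optimized encoders and tests can achieve.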

Definition 4

Given sequences of allowed type-I error probabilities {δk,n}\{\delta_{k,n}\} and rates R1,R2,,RK0R_{1},R_{2},\ldots,R_{K}\geq 0, the exponent tuple (θ1,θ2,,θK)(\theta_{1},\theta_{2},\ldots,\theta_{K}) is called {δk,n}\{\delta_{k,n}\}-achievable if there exists a sequence of encoding and decision functions {ϕ0(n),ϕ1(n),,ϕK1(n),g1(n),g2(n),,gK(n)}n1\big{\{}\phi_{0}^{(n)},\phi_{1}^{(n)},\ldots,\phi_{K-1}^{(n)},g_{1}^{(n)},g_{2}^{(n)},\ldots,g_{K}^{(n)}\big{\}}_{n\geq 1} satisfying for each k{1,,K}k\in\{1,\ldots,K\} and blocklength nn:

αk,n\displaystyle\alpha_{k,n} \displaystyle\leq δk,n,\displaystyle\delta_{k,n}, (157a)
lim¯n1nlog1βk,n\displaystyle\varliminf_{n\to\infty}{1\over n}\log{1\over\beta_{k,n}} \displaystyle\geq θk.\displaystyle\theta_{k}. (157b)

VI-B Old and New Results

Definition 5

For any {1,,K}\ell\in\{1,\ldots,K\}, define the function

η:0+\displaystyle\eta_{\ell}\colon\mathbb{R}_{0}^{+} \displaystyle\to 0+\displaystyle\mathbb{R}_{0}^{+} (158)
R\displaystyle R \displaystyle\mapsto maxPU|Y1:RI(U;Y1)I(U;Y).\displaystyle\max_{\begin{subarray}{c}P_{U|Y_{\ell-1}}\colon\\ R\geq I\left(U;Y_{\ell-1}\right)\end{subarray}}I\left(U;Y_{\ell}\right). (159)
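As an illustration of Definition 5, the following Python sketch approximates \eta_{\ell}(R) by a crude grid search over test channels P_{U|Y_{\ell-1}}; the restriction to binary U, the grid resolution, and the doubly-symmetric joint pmf are assumptions made only for this example and are not claimed to attain the maximum in (159).

```python
import itertools
import numpy as np

def mi(p_ab):
    """Mutual information I(A;B) in bits for a 2D joint pmf."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def eta(R, p_y_prev_y, grid=41):
    """Grid search over P_{U|Y_{l-1}} (binary U) subject to I(U;Y_{l-1}) <= R."""
    p_prev = p_y_prev_y.sum(axis=1)                      # P_{Y_{l-1}}
    best = 0.0
    for a, b in itertools.product(np.linspace(0, 1, grid), repeat=2):
        p_u_given_prev = np.array([[a, 1 - a], [b, 1 - b]])   # rows: y_{l-1}, cols: u
        p_prev_u = p_prev[:, None] * p_u_given_prev           # joint (Y_{l-1}, U)
        if mi(p_prev_u) <= R:
            # joint (U, Y_l) through the Markov chain U - Y_{l-1} - Y_l
            p_u_y = p_u_given_prev.T @ p_y_prev_y
            best = max(best, mi(p_u_y))
    return best

# Illustrative P_{Y_{l-1} Y_l}: a doubly-symmetric binary source with crossover 0.1.
p_joint = np.array([[0.45, 0.05], [0.05, 0.45]])
for R in (0.05, 0.2, 0.5, 1.0):
    print(f"eta(R={R}) ~ {eta(R, p_joint):.3f}")
```

The computed values increase with R and saturate at I(Y_{\ell-1};Y_{\ell}), in line with the accumulated exponents \sum_{\ell=1}^{k}\eta_{\ell}(R_{\ell}) appearing in Theorems 4 and 5 below.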

The described setup was previously studied in [29] and [31], and an extension of the setup to variable-length coding was considered in [35]. While for a general number of users K\geq 2 only achievability results and weak converses were presented in [29], for K=2 users a strong converse was derived in [31].

Theorem 4 (Theorems 2 and 3 in [31])

Let K=2K=2 and consider fixed allowed type-I error probabilities

δk,n=ϵk,k{1,2},\displaystyle\delta_{k,n}=\epsilon_{k},\quad k\in\{1,2\}, (160)

for given ϵ1,ϵ2[0,1)\epsilon_{1},\epsilon_{2}\in[0,1) with ϵ1+ϵ21\epsilon_{1}+\epsilon_{2}\neq 1. An exponent pair (θ1,θ2)(\theta_{1},\theta_{2}) is (ϵ1,ϵ2)(\epsilon_{1},\epsilon_{2})-achievable if, and only if,

θk=1kη(R),k{1,2}.\displaystyle\theta_{k}\leq\sum_{\ell=1}^{k}\eta_{\ell}(R_{\ell}),\qquad k\in\{1,2\}. (161)
Remark 4

In [31], the presentation of Theorem 4 was split into two separate theorems depending on the values of \epsilon_1 and \epsilon_2. While [31, Theorem 2] considers the case \epsilon_1+\epsilon_2<1 and coincides with the above formulation, [31, Theorem 3] considers the case \epsilon_1+\epsilon_2>1 and is formulated as an optimization problem over three auxiliary random variables U_1, U_2, V. Without loss of optimality, this optimization can however be restricted to auxiliaries U_1=U_2, and [31, Theorem 3] then simplifies to the form presented in Theorem 4 above.

Remark 5

The set of pairs (\theta_1,\theta_2) that are (\epsilon_1,\epsilon_2)-achievable according to Theorem 4 does not depend on the values of \epsilon_1 and \epsilon_2 (as long as \epsilon_1+\epsilon_2\neq 1) and forms a rectangular region. In particular, each of the two exponents can be maximized without affecting the other exponent. This result extends to a general number of users K\geq 2, as shown by the achievability result in [29] and by the strong converse result in the following Theorem 5.

Our main result in this section (Theorem 5 ahead) generalizes the strong converse in Theorem 4 to arbitrary K\geq 2 and arbitrary \epsilon_1,\ldots,\epsilon_K\in[0,1). Technically speaking, we prove an exponentially-strong converse, which is a stronger statement: for any k, an exponent \theta_k violating Condition (163) can only be achieved if the type-I error probabilities \alpha_{k,n} tend to 1 exponentially fast in the blocklength n.

Theorem 5

Let {δk,n}\{\delta_{k,n}\} be sequences satisfying

limn1nlog(1δk,n)\displaystyle\lim_{n\to\infty}\frac{1}{n}\log(1-\delta_{k,n}) =\displaystyle= 0,k{1,,K}.\displaystyle 0,\qquad k\in\{1,\ldots,K\}. (162)

Given rates R1,,RK0R_{1},\ldots,R_{K}\geq 0, the exponent-tuple (θ1,,θK)(\theta_{1},\ldots,\theta_{K}) can only be {δk,n}\{\delta_{k,n}\}-achievable, if

θk=1kη(R),k{1,,K}.\displaystyle\theta_{k}\leq\sum_{\ell=1}^{k}\eta_{\ell}(R_{\ell}),\quad k\in\{1,\ldots,K\}. (163)
Remark 6

The direct part of this theorem was proved in [29] for some choice of admissible type-I error probabilities δk,n0\delta_{k,n}\to 0, for all kk. The strong converse in this theorem thus establishes the optimal exponents for arbitrary K2K\geq 2 and all sequences {δk,n}\{\delta_{k,n}\} that satisfy (162) and do not vanish too quickly.

Remark 7

For all permissible type-I error probabilities \{\delta_{k,n}\} that satisfy (162) and do not vanish too quickly, the set of achievable exponent-tuples (\theta_1,\ldots,\theta_K) forms a hypercube, implying that all decision centers, i.e., the relays R1, …, RK-1 and the receiver RK, can simultaneously achieve their optimal type-II error exponents. To prove the desired converse result in Theorem 5, it thus suffices to show that the bound in (163) holds in a setup where only the single decision center Rk takes a decision.

Remark 8

When one allows for variable-length coding and only limits the expected size of the message sets but not their maximum size, then a tradeoff between the different exponents \theta_1,\ldots,\theta_K arises [35]. Moreover, as also shown in [35], in that case the set of achievable exponent tuples depends on the asymptotic values of the allowed type-I error probabilities.

VI-C Strong Converse Proof to Theorem 5

Let {δk,n}\{\delta_{k,n}\} be sequences of allowed type-I error probabilities. Fix a sequence (in nn) of encoding and decision functions {(ϕ0(n),ϕ1(n),,ϕK1(n),g1(n),,gK(n))}n1\{(\phi_{0}^{(n)},\phi_{1}^{(n)},\ldots,\phi_{K-1}^{(n)},g_{1}^{(n)},\ldots,g_{K}^{(n)})\}_{n\geq 1} satisfying (157) for {δk,n}\{\delta_{k,n}\} and type-II error exponents θ1,,θK\theta_{1},\ldots,\theta_{K}.

Choose a sequence (in nn) of small positive numbers {μn}\{\mu_{n}\} satisfying

limnμn\displaystyle\lim_{n\to\infty}\mu_{n} =\displaystyle= 0\displaystyle 0 (164)
limn(nμn2)1\displaystyle\lim_{n\to\infty}\left(n\cdot\mu_{n}^{2}\right)^{-1} =\displaystyle= 0.\displaystyle 0. (165)

Fix now an arbitrary k{1,,K}k\in\{1,\ldots,K\} and a blocklength nn, and let 𝒜k\mathcal{A}_{k} denote the acceptance region of Rk, i.e.,

𝒜k:={(y0n,,ykn):gk(n)(ykn,mk)=0},\mathcal{A}_{k}:=\{(y_{0}^{n},\ldots,y_{k}^{n})\colon g_{k}^{(n)}(y_{k}^{n},{m}_{k})=0\}, (166)

where we define recursively m1:=ϕ0(n)(y0n){m}_{1}:=\phi_{0}^{(n)}(y_{0}^{n}) and

m:=ϕ1(n)(y1n,m1),{2,,k}.{m}_{\ell}:=\phi_{\ell-1}^{(n)}(y_{\ell-1}^{n},{m}_{\ell-1}),\qquad\ell\in\{2,\ldots,k\}. (167)

Define also the intersection of this acceptance region with the typical set:

𝒟k𝒜k𝒯μn(n)(PY0Yk).\displaystyle\mathcal{D}_{k}\triangleq\mathcal{A}_{k}\cap\mathcal{T}_{\mu_{n}}^{(n)}(P_{Y_{0}\cdots Y_{k}}). (168)

By [3, Remark to Lemma 2.12], the type-I error probability constraints in (157a), and the union bound:

Δk:=PY0nY1nYkn(𝒟k)\displaystyle\Delta_{k}:=P_{Y_{0}^{n}Y_{1}^{n}\cdots Y_{k}^{n}}(\mathcal{D}_{k}) \displaystyle\geq 1δk,n|𝒴0||𝒴k|4μn2n,\displaystyle 1-\delta_{k,n}-{|{\mathcal{Y}_{0}}|\cdots|{\mathcal{Y}_{k}}|\over{{4\mu_{n}^{2}}n}}, (169)

and thus

limn1nlogΔk=0.\lim_{n\to\infty}\frac{1}{n}\log\Delta_{k}=0. (170)
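For instance, a choice such as \mu_n=n^{-1/3} satisfies both (164) and (165), and then the bound (169) indeed forces (1/n)\log\Delta_k\to 0 whenever \{\delta_{k,n}\} satisfies (162). The short Python sketch below checks this numerically for a hypothetical binary 2-hop example with a fixed \delta_{k,n}=0.3; all numerical values are illustrative assumptions.

```python
import numpy as np

alphabet_product = 2 ** 3   # |Y_0|*|Y_1|*|Y_2| for a binary 2-hop example (assumption)
delta = 0.3                 # a fixed allowed type-I error probability (assumption)

for n in (10**2, 10**4, 10**6, 10**8):
    mu_n = n ** (-1 / 3)
    penalty = alphabet_product / (4 * mu_n**2 * n)    # the correction term in (169)
    delta_lb = 1 - delta - penalty                    # lower bound on Delta_k
    print(f"n={n:>9}: mu_n={mu_n:.4f}, 1/(n mu_n^2)={1/(n*mu_n**2):.2e}, "
          f"(1/n) log Delta_k >= {np.log(delta_lb)/n:.2e}")
```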

Let (Y~0n,Y~1n,,Y~kn)(\tilde{Y}_{0}^{n},\tilde{Y}_{1}^{n},\ldots,\tilde{Y}_{k}^{n}) be random variables of joint pmf

PY~0nY~1nY~kn(y0n,y1n,,ykn)\displaystyle P_{\tilde{Y}_{0}^{n}\tilde{Y}_{1}^{n}\cdots\tilde{Y}_{k}^{n}}({y}_{0}^{n},{y}_{1}^{n},\ldots,{y}_{k}^{n})
=\displaystyle= PY0nY1nYkn(y0n,y1n,,ykn)Δk𝟙{(y0n,y1n,,ykn)𝒟k}.\displaystyle\frac{P_{{Y}_{0}^{n}{Y}_{1}^{n}\cdots{Y}_{k}^{n}}({y}_{0}^{n},{y}_{1}^{n},\ldots,{y}_{k}^{n})}{\Delta_{k}}\cdot\mathds{1}\{({y}_{0}^{n},{y}_{1}^{n},\ldots,{y}_{k}^{n})\in\mathcal{D}_{k}\}.

Let also \tilde{M}_1=\phi_0^{(n)}(\tilde{Y}_0^{n}) and \tilde{M}_{\ell}=\phi_{\ell-1}^{(n)}(\tilde{Y}_{\ell-1}^{n},\tilde{M}_{\ell-1}) for \ell\in\{2,\ldots,k\}, and let T be uniform over \{1,\ldots,n\} and independent of (\tilde{Y}_0^{n},\tilde{Y}_1^{n},\ldots,\tilde{Y}_k^{n},\tilde{M}_1,\ldots,\tilde{M}_k).
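The subsequent steps only use two elementary properties of this change of measure: it is a valid pmf (it sums to one), and each of its marginals is pointwise dominated by \Delta_k^{-1} times the corresponding marginal of the original measure. The following toy Python sketch verifies both properties for an i.i.d. pmf restricted to an arbitrary hypothetical set D; the per-letter pmf, the blocklength, and the set D are assumptions chosen only for illustration.

```python
import itertools
import numpy as np

n = 4
p_bit = np.array([0.3, 0.7])                        # per-letter pmf P_{Y_0} (assumption)
seqs = list(itertools.product(range(2), repeat=n))  # all length-n binary sequences
p_seq = {s: np.prod(p_bit[list(s)]) for s in seqs}  # i.i.d. product pmf P_{Y_0^n}

# A hypothetical "accepted and typical" set D (here: sequences with at least two ones).
D = {s for s in seqs if sum(s) >= 2}
Delta = sum(p_seq[s] for s in D)

# Change of measure: restrict to D and renormalize by Delta.
p_tilde = {s: (p_seq[s] / Delta if s in D else 0.0) for s in seqs}
print("sum of tilde pmf:", sum(p_tilde.values()))                  # = 1.0

# Marginal of the first letter under both measures; the tilde marginal is <= marginal/Delta.
marg_tilde = [sum(v for s, v in p_tilde.items() if s[0] == b) for b in range(2)]
marg_p = [sum(v for s, v in p_seq.items() if s[0] == b) for b in range(2)]
print(all(mt <= mp / Delta + 1e-12 for mt, mp in zip(marg_tilde, marg_p)))  # True
```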

Notice that for any {1,,k}\ell\in\{1,\ldots,k\}:

R\displaystyle R_{\ell} \displaystyle\geq 1nH(M~)\displaystyle\frac{1}{n}H(\tilde{{M}}_{\ell}) (172)
=\displaystyle= 1nI(M~;Y~0nY~kn)\displaystyle\frac{1}{n}I(\tilde{{M}}_{\ell};\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{k}^{n}) (173)
=\displaystyle= 1nH(Y~0nY~kn)1nH(Y~0nY~kn|M~)\displaystyle\frac{1}{n}H(\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{k}^{n})-\frac{1}{n}H(\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{{M}}_{\ell}) (175)
=\displaystyle= H(Y~0,TY~k,T)+o(1)\displaystyle H(\tilde{Y}_{0,T}\cdots\tilde{Y}_{k,T})+o(1)
1nt=1nH(Y~0,tY~k,t|M~Y~0t1Y~kt1)\displaystyle-\frac{1}{n}\sum_{t=1}^{n}H(\tilde{Y}_{0,t}\cdots\tilde{Y}_{k,t}|\tilde{{M}}_{\ell}\tilde{Y}_{0}^{t-1}\cdots\tilde{Y}_{k}^{t-1})
=\displaystyle= H(Y~0,TY~k,T)+o(1)H(Y~0,TY~k,T|U)\displaystyle H(\tilde{Y}_{0,T}\cdots\tilde{Y}_{k,T})+o(1)-H(\tilde{Y}_{0,T}\cdots\tilde{Y}_{k,T}|{U}_{\ell}) (176)
=\displaystyle= I(Y~0,TY~k,T;U)+o(1)\displaystyle I(\tilde{Y}_{0,T}\cdots\tilde{Y}_{k,T};U_{\ell})+o(1) (177)
\displaystyle\geq I(Y~1,T;U)+o(1),\displaystyle I(\tilde{Y}_{\ell-1,T};U_{\ell})+o(1), (178)

where we defined U(M~,Y~0T1,,Y~kT1,T)U_{\ell}\triangleq(\tilde{{M}}_{\ell},\tilde{Y}_{0}^{T-1},\ldots,\tilde{Y}_{k}^{T-1},T). Here, (175) holds by extending (11) to kk-tuples.

We next upper bound the exponential decay of the type-II error probability. Define:

Q_{\tilde{M}_{k}}({m}_{k}) \triangleq \sum_{y_{0}^{n},y_{1}^{n},\ldots,y_{k-1}^{n}} P_{\tilde{Y}_{0}^{n}}(y_{0}^{n})\cdots P_{\tilde{Y}_{k-1}^{n}}(y_{k-1}^{n})
\qquad\qquad\cdot\mathds{1}\{{m}_{k}=\phi_{k-1}(\phi_{k-2}(\cdots(\phi_{0}(y_{0}^{n}))\cdots),y_{k-1}^{n})\},

and

QMk(mk)\displaystyle Q_{{{M}}_{k}}({m}_{k}) \displaystyle\triangleq y0n,y1n,,yk1nPY0n(y0n)PYk1n(yk1n)\displaystyle\sum_{y_{0}^{n},y_{1}^{n},\ldots,y_{k-1}^{n}}P_{{Y}_{0}^{n}}(y_{0}^{n})\cdots P_{{Y}_{k-1}^{n}}(y_{k-1}^{n})
𝟙{mk=ϕk1(ϕk2((ϕ0(y0n))),yk1n)},\displaystyle\;\;\cdot\mathds{1}\{{m}_{k}=\phi_{k-1}(\phi_{k-2}(\cdots(\phi_{0}(y_{0}^{n})\cdots)),y_{k-1}^{n})\},

and notice that, since the change of measure defined above implies P_{\tilde{Y}_{\ell}^{n}}\leq\Delta_k^{-1}\,P_{{Y}_{\ell}^{n}} pointwise for every \ell\in\{0,\ldots,k\}:

QM~kPY~kn(𝒜k)\displaystyle Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right) \displaystyle\leq QMkPYkn(𝒜k)Δk(k+1)\displaystyle Q_{{{M}}_{k}}P_{{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right)\Delta_{k}^{-(k+1)} (181)
=\displaystyle= βk,nΔk(k+1).\displaystyle\beta_{k,n}\Delta_{k}^{-(k+1)}.

Moreover, by (166), the probability PM~kY~kn(𝒜k)=1P_{\tilde{{M}}_{k}\tilde{Y}_{k}^{n}}({\mathcal{A}}_{k})=1, and thus

D(PM~kY~kn(𝒜k)QM~kPY~kn(𝒜k))=log(QM~kPY~kn(𝒜k)),D\left(P_{\tilde{{M}}_{k}\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right)\|Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right)\right)=-\log\left(Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right)\right), (182)

where on the left-hand side we slightly abused notation and mean the KL divergence of the two binary pmfs induced by PM~kY~kn(𝒜k)P_{\tilde{{M}}_{k}\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right) and 1PM~kY~kn(𝒜k)1-P_{\tilde{{M}}_{k}\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right) and by QM~kPY~kn(𝒜k)Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right) and 1QM~kPY~kn(𝒜k)1-Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right). Combined with (170), with (181), and with the data-processing inequality, we obtain from (182):

1nlogβk,n\displaystyle-{1\over n}\log\beta_{k,n} \displaystyle\leq 1nlog(QM~kPY~kn(𝒜k))(k+1)nlogΔk\displaystyle-{1\over n}\log\left(Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\left({\mathcal{A}}_{k}\right)\right)-\frac{(k+1)}{n}\log\Delta_{k} (184)
\displaystyle\leq 1nD(PM~kY~knQM~kPY~kn)+o(1).\displaystyle{1\over n}D\left(P_{\tilde{{M}}_{k}\tilde{Y}_{k}^{n}}\Big{\|}Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}\right)+o(1).
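The elementary fact behind (182)-(184) is that, for a set A with P(A)=1, the binary divergence D(P(A)\|Q(A)) equals -\log Q(A) and, by the data-processing inequality, is upper bounded by the full divergence D(P\|Q). A small numerical sanity check, with hypothetical pmfs P and Q over four outcomes, is given below; it is an illustration only, not part of the proof.

```python
import numpy as np

def kl(p, q):
    """KL divergence in nats between two pmfs given as sequences."""
    return float(sum(pi * np.log(pi / qi) for pi, qi in zip(p, q) if pi > 0))

# Hypothetical pmfs P and Q over four outcomes; the "acceptance region" A is {0, 1}.
P = np.array([0.6, 0.4, 0.0, 0.0])      # P puts all its mass on A, so P(A) = 1
Q = np.array([0.1, 0.2, 0.3, 0.4])
A = [0, 1]

PA, QA = P[A].sum(), Q[A].sum()
binary_div = kl([PA, 1 - PA], [QA, 1 - QA])
print(round(binary_div, 6), round(-np.log(QA), 6))   # equal, as in (182)
print(bool(binary_div <= kl(P, Q)))                  # True: the data-processing step in (184)
```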

We continue to upper bound the divergence term as

1nD(PM~kY~kn||QM~kPY~kn)\displaystyle{1\over n}D(P_{\tilde{{M}}_{k}\tilde{Y}_{k}^{n}}||Q_{\tilde{{M}}_{k}}P_{\tilde{Y}_{k}^{n}}) (185)
=\displaystyle= 1nI(M~k;Y~kn)+1nD(PM~k||QM~k)\displaystyle{1\over n}I(\tilde{{M}}_{k};\tilde{Y}_{k}^{n})+{1\over n}D(P_{\tilde{{M}}_{k}}||Q_{\tilde{{M}}_{k}})
\displaystyle\leq 1nI(M~k;Y~kn)+1nD(PY~k1nM~k1||PY~k1nQM~k1)\displaystyle{1\over n}I(\tilde{{M}}_{k};\tilde{Y}_{k}^{n})+{1\over n}D(P_{\tilde{Y}_{k-1}^{n}\tilde{{M}}_{k-1}}||P_{\tilde{Y}_{k-1}^{n}}Q_{\tilde{{M}}_{k-1}})
\displaystyle\leq 1nI(M~k;Y~kn)+1nI(M~k1;Y~k1n)\displaystyle{1\over n}I(\tilde{{M}}_{k};\tilde{Y}_{k}^{n})+{1\over n}I(\tilde{{M}}_{k-1};\tilde{Y}_{k-1}^{n})
+1nD(PY~k2nM~k2||PY~k2nQM~k2)\displaystyle\qquad\qquad\qquad+{1\over n}D(P_{\tilde{Y}_{k-2}^{n}\tilde{{M}}_{k-2}}||P_{\tilde{Y}_{k-2}^{n}}Q_{\tilde{{M}}_{k-2}})
\displaystyle\vdots
\displaystyle\leq 1n=1kI(M~;Y~n)\displaystyle{1\over n}\sum_{\ell=1}^{k}I(\tilde{{M}}_{\ell};\tilde{Y}_{\ell}^{n}) (188)
\displaystyle\leq 1n=1kt=1nI(M~Y~0t1Y~kt1;Y~,t)\displaystyle{1\over n}\sum_{\ell=1}^{k}\sum_{t=1}^{n}I(\tilde{{M}}_{\ell}\tilde{Y}_{0}^{t-1}\cdots\tilde{Y}_{k}^{t-1};\tilde{Y}_{\ell,t}) (189)
\displaystyle\leq =1kI(U;Y~,T).\displaystyle\sum_{\ell=1}^{k}I(U_{\ell};\tilde{Y}_{\ell,T}). (190)

Here, the successive inequalities in the above chain are obtained by the data-processing inequality for the KL divergence, and (190) holds by the definitions of U_{\ell} and T.

Combined with (184), we obtain

1nlogβk,n=1kI(U;Y~,T)+o(1).-{1\over n}\log\beta_{k,n}\leq\sum_{\ell=1}^{k}I(U_{\ell};\tilde{Y}_{\ell,T})+o(1). (191)

Finally, we proceed to prove that for any {1,,k}\ell\in\{1,\ldots,k\} the Markov chain UY~1,TY~,TU_{\ell}\to\tilde{Y}_{\ell-1,T}\to\tilde{Y}_{\ell,T} holds in the limit as nn\to\infty. We start by noticing the Markov chain M~1Y~0n(Y~1n,,Y~kn)\tilde{{M}}_{1}\to\tilde{Y}_{0}^{n}\to(\tilde{Y}_{1}^{n},\cdots,\tilde{Y}_{k}^{n}), and thus:

0\displaystyle 0 =\displaystyle= 1nI(M~1;Y~1nY~kn|Y~0n)\displaystyle{1\over n}I(\tilde{{M}}_{1};\tilde{Y}_{1}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}) (192)
=\displaystyle= 1nH(Y~1nY~kn|Y~0n)1nH(Y~1nY~kn|Y~0nM~1)\displaystyle{1\over n}H(\tilde{Y}_{1}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n})-{1\over n}H(\tilde{Y}_{1}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}\tilde{{M}}_{1}) (195)
=\displaystyle= H(Y~1,TY~k,T|Y~0,T)+o(1)1nH(Y~1nY~kn|Y~0nM~1)\displaystyle H(\tilde{Y}_{1,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T})+o(1)-{1\over n}H(\tilde{Y}_{1}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}\tilde{{M}}_{1})
\displaystyle\geq H(Y~1,TY~k,T|Y~0,T)+o(1)\displaystyle H(\tilde{Y}_{1,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T})+o(1)
H(Y~1,TY~k,T|Y~0,TY~0T1Y~kT1Y~0,T+1nM~1T)\displaystyle-H(\tilde{Y}_{1,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T}\tilde{Y}_{0}^{T-1}\cdots\tilde{Y}_{k}^{T-1}\tilde{Y}_{0,T+1}^{n}\tilde{{M}}_{1}T)
\displaystyle\geq I(Y~1,TY~k,T;U1|Y~0,T)+o(1)0,\displaystyle I(\tilde{Y}_{1,T}\cdots\tilde{Y}_{k,T};U_{1}|\tilde{Y}_{0,T})+o(1)\geq 0, (196)

where the identity \frac{1}{n}H(\tilde{Y}_{1}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n})=H(\tilde{Y}_{1,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T})+o(1) is obtained by extending (13) to multiple random variables. We thus conclude that

limnI(Y~1,TY~k,T;U1|Y~0,T)=0.\lim_{n\to\infty}I(\tilde{Y}_{1,T}\cdots\tilde{Y}_{k,T};U_{1}|\tilde{Y}_{0,T})=0. (197)

Notice next that for any {2,,k}\ell\in\{2,\ldots,k\}:

I(U;Y~,T|Y~1,T)\displaystyle I(U_{\ell};\tilde{Y}_{\ell,T}|\tilde{Y}_{\ell-1,T}) \displaystyle\leq I(UY~0,TY~2,T;Y~,T|Y~1,T)\displaystyle I(U_{\ell}\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-2,T};\tilde{Y}_{\ell,T}|\tilde{Y}_{\ell-1,T}) (199)
=\displaystyle= I(U;Y~,T|Y~0,TY~1,T)\displaystyle I(U_{\ell};\tilde{Y}_{\ell,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})
+I(Y~0,TY~2,T;Y~,T|Y~1,T)\displaystyle+I(\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-2,T};\tilde{Y}_{\ell,T}|\tilde{Y}_{\ell-1,T})
=\displaystyle= I(U;Y~,T|Y~0,TY~1,T)+o(1),\displaystyle I(U_{\ell};\tilde{Y}_{\ell,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})+o(1), (200)

where the last equality can be proved by extending (11) and (13) to multiple random variables and by noting the factorization PY0PY1|Y0PYK|YK1P_{Y_{0}}P_{Y_{1}|Y_{0}}\cdots P_{Y_{K}|Y_{K-1}}.

Following similar steps to (192)–(196), we further obtain:

0\displaystyle 0 =\displaystyle= 1nI(M~;Y~nY~kn|Y~0nY~1n)\displaystyle{1\over n}I(\tilde{{M}}_{\ell};\tilde{Y}_{\ell}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{\ell-1}^{n})
=\displaystyle= 1nH(Y~nY~kn|Y~0nY~1n)\displaystyle{1\over n}H(\tilde{Y}_{\ell}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{\ell-1}^{n})
1nH(Y~nY~kn|Y~0nY~1nM~)\displaystyle-{1\over n}H(\tilde{Y}_{\ell}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{\ell-1}^{n}\tilde{{M}}_{\ell})
=\displaystyle= H(Y~,TY~k,T|Y~0,TY~1,T)+o(1)\displaystyle H(\tilde{Y}_{\ell,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})+o(1)
1nH(Y~nY~kn|Y~0nY~1nM~)\displaystyle-{1\over n}H(\tilde{Y}_{\ell}^{n}\cdots\tilde{Y}_{k}^{n}|\tilde{Y}_{0}^{n}\cdots\tilde{Y}_{\ell-1}^{n}\tilde{{M}}_{\ell})
\displaystyle\geq H(Y~,TY~k,T|Y~0,TY~1,T)+o(1)\displaystyle H(\tilde{Y}_{\ell,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})+o(1)
1nt=1nH(Y~,tY~k,t|Y~0,tY~1,tY~0t1Y~kt1M~)\displaystyle-{1\over n}\sum_{t=1}^{n}H(\tilde{Y}_{\ell,t}\cdots\tilde{Y}_{k,t}|\tilde{Y}_{0,t}\cdots\tilde{Y}_{\ell-1,t}\tilde{Y}_{0}^{t-1}\cdots\tilde{Y}_{k}^{t-1}\tilde{{M}}_{\ell})
=\displaystyle= H(Y~,TY~k,T|Y~0,TY~1,T)+o(1)\displaystyle H(\tilde{Y}_{\ell,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})+o(1)
H(Y~,TY~k,T|Y~0,TY~1,TY~0T1Y~kT1M~T)\displaystyle-H(\tilde{Y}_{\ell,T}\cdots\tilde{Y}_{k,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T}\tilde{Y}_{0}^{T-1}\cdots\tilde{Y}_{k}^{T-1}\tilde{{M}}_{\ell}T)
=\displaystyle= I(Y~,TY~k,T;U|Y~0,TY~1,T)+o(1)\displaystyle I(\tilde{Y}_{\ell,T}\cdots\tilde{Y}_{k,T};U_{\ell}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})+o(1) (206)
\displaystyle\geq I(Y~,T;U|Y~0,TY~1,T)+o(1)0.\displaystyle I(\tilde{Y}_{\ell,T};U_{\ell}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})+o(1)\geq 0. (207)

We thus conclude that

I(U;Y~,T|Y~0,TY~1,T)=o(1),I(U_{\ell};\tilde{Y}_{\ell,T}|\tilde{Y}_{0,T}\cdots\tilde{Y}_{\ell-1,T})=o(1), (208)

which combined with (200) proves

I(U;Y~,T|Y~1,T)=o(1).I(U_{\ell};\tilde{Y}_{\ell,T}|\tilde{Y}_{\ell-1,T})=o(1). (209)

The converse is then concluded by taking nn\to\infty, as we explain in the following. By Carathéodory’s theorem [34, Appendix C], for each nn there must exist random variables U1,,Uk{U}_{1},\ldots,U_{k} satisfying (209), (191), and (178) over alphabets of sizes

|𝒰|\displaystyle|{\mathcal{U}}_{\ell}| |𝒴1||𝒴|+2,{1,,k}.\displaystyle\leq|\mathcal{Y}_{\ell-1}|\cdot|\mathcal{Y}_{\ell}|+2,\qquad\ell\in\{1,\ldots,k\}. (210)

We thus restrict to random variables of above (bounded) supports and invoke the Bolzano-Weierstrass theorem to conclude for each {1,,k}\ell\in\{1,\ldots,k\} the existence of pmfs PY1YU()P^{(\ell)}_{Y_{\ell-1}Y_{\ell}{U_{\ell}}} over 𝒴1×𝒴×𝒰\mathcal{Y}_{\ell-1}\times\mathcal{Y}_{\ell}\times{\mathcal{U}_{\ell}}, also abbreviated as P()P^{(\ell)}, and an increasing subsequence of positive numbers {ni}i=1\{n_{i}\}_{i=1}^{\infty} satisfying

limiPY~1Y~U;ni\displaystyle\lim_{i\to\infty}P_{\tilde{Y}_{\ell-1}\tilde{Y}_{\ell}{U}_{\ell};n_{i}} =\displaystyle= PY1YU(),{1,,k},\displaystyle P^{(\ell)}_{Y_{\ell-1}Y_{\ell}{U_{\ell}}},\quad\ell\in\{1,\ldots,k\}, (211)

where PY~1Y~U;niP_{\tilde{Y}_{\ell-1}\tilde{Y}_{\ell}{U}_{\ell};n_{i}} denotes the pmf at blocklength nin_{i}.

By the monotone continuity of mutual information for discrete random variables, we can then deduce that

R\displaystyle R_{\ell} \displaystyle\geq IP()(U;Y1),{1,,k},\displaystyle I_{P^{(\ell)}}({U}_{\ell};{Y}_{\ell-1}),\quad\ell\in\{1,\ldots,k\}, (212)
θk\displaystyle\theta_{k} \displaystyle\leq =1kIP()(U;Y),\displaystyle\sum_{\ell=1}^{k}I_{P^{(\ell)}}({U}_{\ell};{Y}_{\ell}), (213)

where the subscripts indicate that mutual informations should be computed according to the indicated pmfs.

Since for any blocklength nin_{i} the pair (Y~1ni,Y~ni)\big{(}\tilde{Y}_{\ell-1}^{n_{i}},\tilde{Y}_{\ell}^{n_{i}}\big{)} lies in the jointly typical set 𝒯μni(ni)(PY1Y)\mathcal{T}^{(n_{i})}_{\mu_{n_{i}}}(P_{Y_{\ell-1}Y_{\ell}}), we have |PY1Y;niPY1Y|μni\big{|}P_{{Y}_{\ell-1}{Y}_{\ell};n_{i}}-P_{{Y_{\ell-1}}Y_{\ell}}\big{|}\leq\mu_{n_{i}} and thus the limiting pmfs satisfy PY1Y()=PY1YP^{(\ell)}_{Y_{\ell-1}Y_{\ell}}=P_{Y_{\ell-1}Y_{\ell}}. By similar continuity considerations and by (209), for all {1,,k}\ell\in\{1,\ldots,k\} the Markov chain

UY1Y,\displaystyle U_{\ell}\to Y_{\ell-1}\to Y_{\ell}, (214)

holds under PY1YU()P_{Y_{\ell-1}Y_{\ell}U_{\ell}}^{(\ell)}. This concludes the proof.

VII Conclusions and Outlook

This paper presented new strong converse proofs for source coding, channel coding, and hypothesis testing setups. Most of the presented converses are exponentially-strong converses and allow one to conclude that the error probabilities tend to 1 exponentially fast whenever the rates (or exponents) violate certain conditions. The proofs for the standard almost lossless source coding with side-information problem and for communication over discrete memoryless channels (DMCs) are solely based on change of measure arguments, as inspired by [22, 23, 11], and on an asymptotic analysis of the distributions implied by these changes of measure. Notice in particular that the restriction to strongly-typical and conditionally strongly-typical sets simplifies the proofs and circumvents additional arguments based on variational characterizations as in [11]. The converse results for almost lossless source coding and for communication over DMCs have long been known, and our contribution here is to present a simple alternative proof.

In contrast, our converse proofs for the L-round interactive compression and the K-hop hypothesis testing setups are novel contributions of this article, with only special cases reported previously. Our proofs for these setups use change of measure arguments similar to those for almost lossless source coding, but additionally rely on proving Markov chains that hold in the asymptotic regime of infinite blocklengths; these Markov chains are required to conclude the existence of the desired auxiliary random variables. In particular, strong converses for several special cases of our L-round interactive compression problem had been reported previously, see [11], and a strong converse for the 2-hop hypothesis testing setup was presented in [31]; the proof in [31] however does not apply to the case \epsilon_1+\epsilon_2=1 and appears more complex since it is based on the blowing-up lemma and hypercontractivity arguments.

In related publications we have used the presented proof method to derive fundamental limits of hypothesis testing systems under expected rate constraints [35, 36] and (expected) secrecy constraints [37]. In contrast to the results presented in this paper, the fundamental limits of distributed hypothesis testing under these expectation constraints depend on the allowed type-I error probabilities. It turns out that the proof technique based on change of measure arguments as proposed in the present paper naturally captures this dependence.

Acknowledgement

The authors would like to thank R. Graczyk for useful discussions on the strong converse proof for channel coding.

References

  • [1] M. Hamad, M. Wigger, and M. Sarkiss, “Strong converses using change of measure and asymptotic Markov chains.” [Online]. Available: https://arxiv.org/, May 2022.
  • [2] J. Wolfowitz, “The coding of messages subject to chance errors,” Illinois Journal of Mathematics, vol. 1, no. 4, pp. 591 – 606, 1957.
  • [3] I. Csiszár and J. Körner, Information theory: coding theorems for discrete memoryless systems. Cambridge University Press, 2011.
  • [4] K. Marton, “A simple proof of the blowing-up lemma,” IEEE Trans. Inf. Theory, vol. 32, pp. 445–446, May 1986.
  • [5] G. Dueck, “The strong converse coding theorem for the multiple-access channel,” Journal of Combinatorics, Information & System Sciences, vol. 6, no. 3, pp. 187–196, 1981.
  • [6] V. Strassen, “Asymptotische Abschätzungen in Shannons Informationstheorie,” in Trans. Third Prague Conf. on Information Theory, (Prague, Czech Republic), pp. 689–723, 1962.
  • [7] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
  • [8] R. Graczyk, “An elementary proof of the strong converse of the channel coding theorem.” Nov. 2022.
  • [9] S. L. Fong and V. Y. Tan, “A proof of the strong converse theorem for Gaussian multiple access channels,” IEEE Transactions on Information Theory, vol. 62, no. 8, pp. 4376–4394, 2016.
  • [10] S. L. Fong and V. Y. Tan, “A proof of the strong converse theorem for Gaussian broadcast channels via the Gaussian Poincaré inequality,” IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7737–7746, 2017.
  • [11] H. Tyagi and S. Watanabe, “Strong converse using change of measure arguments,” IEEE Trans. Inf. Theory, vol. 66, no. 2, pp. 689–703, 2019.
  • [12] A. I. Khinchin, “The entropy concept in probability theory,” Usp. Mat. Nauk, vol. 8, pp. 3–20, 1953.
  • [13] B. McMillan, “The basic theorems of information theory,” Ann. Math. Stat., vol. 24, pp. 196–219, 1953.
  • [14] I. Csiszár and G. Longo, “On error exponent for source coding and for testing simple statistical hypotheses,” Studia Sci. Math. Hungar., pp. 181–191, 1977.
  • [15] J. Körner, “Coding of an information source having ambiguous alphabet and the entropy of graphs,” in Transactions of the 6th Prague conference on Information Theory, etc., (Prague, Czech Republic), pp. 411–425, 1973.
  • [16] J. Kieffer, “Strong converses in source coding relative to a fidelity criterion,” IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 257–262, 1991.
  • [17] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471–480, 1973.
  • [18] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, 1976.
  • [19] Y. Oohama and T. S. Han, “Universal coding for the Slepian-Wolf data compression system and the strong converse theorem,” IEEE Transactions on Information Theory, vol. 40, no. 6, pp. 1908–1919, 1994.
  • [20] Y. Oohama, “Exponential strong converse for source coding with side information at the decoder,” Entropy, vol. 20, no. 5, p. 352, 2018.
  • [21] Y. Oohama, “Exponential strong converse for one helper source coding problem,” Entropy, vol. 21, no. 6, p. 567, 2019.
  • [22] W. Gu and M. Effros, “A strong converse for a collection of network source coding problems,” in 2009 IEEE International Symposium on Information Theory, pp. 2316–2320, 2009.
  • [23] W. Gu and M. Effros, “A strong converse in source coding for super-source networks,” in 2011 IEEE International Symposium on Information Theory Proceedings, pp. 395–399, 2011.
  • [24] O. Kosut and J. Kliewer, “Strong converses are just edge removal properties,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3315–3339, 2018.
  • [25] A. Kaspi, “Two-way source coding with a fidelity criterion,” IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 735–740, 1985.
  • [26] N. Ma and P. Ishwar, “Some results on distributed source coding for interactive function computation,” IEEE Transactions on Information Theory, vol. 57, no. 9, pp. 6180–6195, 2011.
  • [27] Y. Steinberg, “Coding and common reconstruction,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 4995–5010, 2009.
  • [28] A. Lapidoth, A. Malär, and M. Wigger, “Constrained source-coding with side information,” IEEE Transactions on Information Theory, vol. 60, no. 6, pp. 3218–3237, 2014.
  • [29] S. Salehkalaibar, M. Wigger, and L. Wang, “Hypothesis testing in multi-hop networks.” [Online]. Available: https://arxiv.org/abs/1708.05198v1, 2017.
  • [30] R. Ahlswede and I. Csiszár, “Hypothesis testing with communication constraints,” IEEE Trans. Inf. Theory, vol. 32, pp. 533–542, Jul. 1986.
  • [31] D. Cao, L. Zhou, and V. Y. F. Tan, “Strong converse for hypothesis testing against independence over a two-hop network,” Entropy (Special Issue on Multiuser Information Theory II), vol. 21, Nov. 2019.
  • [32] J. Liu, R. Van Handel, and S. Verdú, “Beyond the blowing-up lemma: Sharp converses via reverse hypercontractivity,” in 2017 IEEE International Symposium on Information Theory (ISIT), pp. 943–947, IEEE, 2017.
  • [33] S. Sreekumar and D. Gündüz, “Strong converse for testing against independence over a noisy channel,” 2020.
  • [34] A. El Gamal and Y. H. Kim, Network Information Theory. Cambridge University Press, 2011.
  • [35] M. Hamad, M. Wigger, and M. Sarkiss, “Multi-hop network with multiple decision centers under expected-rate constraints,” 2022.
  • [36] M. Hamad, M. Sarkiss, and M. Wigger, “Benefits of rate-sharing for distributed hypothesis testing,” in 2022 IEEE International Symposium on Information Theory (ISIT), pp. 2714–2719, 2022.
  • [37] S. Faour, M. Hamad, M. Sarkiss, and M. Wigger, “Testing against independence with an eavesdropper,” 2022.