
Hypothesis Testing of Mixture Distributions using Compressed Data

Minh Thanh Vu
Abstract

In this paper we revisit the binary hypothesis testing problem with one-sided compression. Specifically, we assume that the distribution under the null hypothesis is a mixture of iid components. The distribution under the alternative hypothesis is a mixture of products of either iid distributions or finite-order Markov distributions with stationary transition kernels. The problem is studied under the Neyman-Pearson framework in which our main interest is the maximum error exponent of the second type of error. We derive the optimal achievable error exponent and, under a further sufficient condition, establish the maximum $\epsilon$-achievable error exponent. It is shown that to obtain the latter, the study of the exponentially strong converse is needed. Using a simple code transfer argument we also establish new results for the Wyner-Ahlswede-Körner problem in which the source distribution is a mixture of iid components.

Index Terms:
Mixture distribution, information-spectrum method, exponentially strong converse, Neyman-Pearson framework, error exponent.

I Introduction

I-A Motivations & Related Works

Hypothesis testing with communication constraints is a classic problem in information theory. The problem was initiated by Ahlswede and Csiszár in [1] as well as by Berger in [2]. It is assumed that a pair of data sequences $(x^n,y^n)$ is observed at separate locations. The sequence $x^n$ is compressed and sent to the location of $y^n$ via a noiseless channel. The decision center there decides whether $(x^n,y^n)$ is generated iid from the null hypothesis with a distribution $P_{XY}$ or from the alternative hypothesis with a distribution $Q_{XY}$. The Neyman-Pearson framework was used to study the trade-off between the probabilities of errors. The main interest was to establish the maximum error exponent of the second type of error when the probability of the first type of error is bounded as the number of samples tends to infinity. A single-letter formulation was given in [1] for the testing against independence scenario, and the strong converse was proven for the general setting under a special condition. Various lower bounds for the general setting have been proposed in [3] and [4]. The work [3] also established the optimal error exponent in the zero-compression-rate regime when $\epsilon$ is small enough. Using the same condition as in [1], the work [5] established the optimal $\epsilon$-error exponent, also in the zero-compression-rate regime. The optimality of the random binning scheme proposed in [4] was shown for the conditional independence setting in [6]. In [7] the authors extended the testing against independence study to the successive refinement setting. The work [8] studied the non-asymptotic regime of the general setting under a two-sided zero-compression-rate constraint. Yet, in all of the above studies, the distributions under both hypotheses are assumed to be iid.

Mixture distributions are prevalent in practice, cf. [9] for a comprehensive list of applications. However, hypothesis testing for mixture distributions is an under-examined direction in information theory. Notable studies are given in [10, 11, 12]. Communication constraints are not included in the above works. In this work we study the following binary hypothesis testing problem in which the hypotheses are given by

$$H_0\colon P_{Y^nX^n}=\sum_i \nu_i P_{Y_iX_i}^{\otimes n},\qquad H_1\colon Q_{Y^nX^n}=\sum_{j,t}\tau_{jt}\,Q_{Y_j^n}\times Q_{X_t^n}.\quad (1)$$

A mixture of iid components is also known in the literature as a mixture of repeated measurements or a mixture of grouped observations. By no means exhaustive, we list a few works with applications in topic modeling [13], in cognitive psychology [14], and in developmental psychology [15], cf. also [16]. In machine learning applications, to allow flexible modeling it is often assumed that within each component the joint distribution has a product form without the requirement of having the same marginal distribution, cf. [17] and [18]. We keep the iid condition in the null hypothesis for tractable analysis. A motivating example is provided in the following.

Assume that two statisticians observe two bags of words $x^n$ and $y^n$ taken as excerpts from some documents, where $n$ denotes the number of words in each bag. For simplicity we assume that within each bag words have an identical marginal distribution and that the order of words is not important. Then it is likely that the bag of words $x^n$ ($y^n$) is not generated iid since, for example, knowing the first word $x_1$ ($y_1$) to be in Latin (German) gives us a guess that the whole of $x^n$ ($y^n$) is in Latin (German) [19]. Since the order of words does not matter, $x^n$ and $y^n$ are sequences of exchangeable observations. It is natural to approximate the distributions of $x^n$ and $y^n$ by finite mixtures of iid distributions, due to de Finetti's theorem [20], in which each underlying state represents a topic. The two statisticians then form a hypothesis testing problem. In the null hypothesis, they assume that the two bags of words are generated jointly iid according to $P_{Y_iX_i}^{\otimes n}$ from an unknown topic $i\in[1:m]$ with probability $\nu_i$; for example, $x_l$ is a direct translation or a synonym of $y_l$ for all $l\in[1:n]$. In the alternative hypothesis they assume that the two bags of words are generated independently and iid according to $Q_{Y_j}^{\otimes n}\times Q_{X_t}^{\otimes n}$ from an unknown topic pair $(j,t)$ with probability $\tau_{jt}$.

We are similarly interested in the optimal error exponent of the second type of error when the probability of the first type is restricted by some $\epsilon\in[0,1)$. The iid assumption on each component in the null hypothesis as well as the factorization assumption in the alternative hypothesis are used to facilitate the derivation. In contrast to the classic strong converse result in [1], the maximum $\epsilon$-achievable error exponent generally depends not only on the prior distribution $(\nu_i)$ but also on relations between different constituent components in the mixture distributions. When no compression is involved, the weak law of large numbers can be used to establish the maximum $\epsilon$-achievable error exponent, cf. [11, 12]. However, this argument is no longer applicable in the presence of compression. We need to use an argument established through proving exponentially strong converses for the constituent components.

I-B Contributions

We summarize the contents of our work in the following.

  • We use code transformation arguments to unveil a new property of the classic Ahlswede-Csiszár problem in a general setting, cf. Theorem 7 in Appendix A. The arguments are also useful in simplifying proofs of different results in our work.

  • We provide the optimal achievable error exponent in Theorem 3 and show in Corollary 2 that for $\epsilon>0$ small enough the obtained error exponent is also $\epsilon$-optimal. Our proof is based on studying a compound hypothesis testing problem of differentiating between $\{P_{Y_iX_i}^{\otimes n}\}$ and $\{Q_{Y_j^n}\times Q_{X_t^n}\}$ under communication constraints. The results are established when $\{Q_{Y_j^n}\}$ and $\{Q_{X_t^n}\}$ are stationary memoryless processes or finite Markov processes with stationary transition probabilities.

  • Under a further sufficient condition on the sets of distributions $\{P_{Y_iX_i}^{\otimes n}\}$ and $\{Q_{Y_j^n}\}$, Assumption 2, we provide a complete characterization of the maximum $\epsilon$-error exponent in Theorem 4. It is shown that even if the strong converse is available for the compound problem of testing $\{P_{Y_iX_i}^{\otimes n}\}$ against $\{Q_{Y_j^n}\times Q_{X_t^n}\}$, it is not sufficient for establishing the maximum $\epsilon$-error exponent in our mixture setting. Our derivation is based on the exponentially strong converse for the false alarm probabilities under the assumed condition.

  • We refine a recently established connection in [21] between the Wyner-Ahlswede-Körner (WAK) problem [22, 23] and the hypothesis testing against independence problem. While the previous result holds only under the stationary ergodic assumption, our new connection in Theorem 5 is valid under a more general assumption. We then use the refined connection to establish the corresponding minimum achievable compression rate and the minimum $\epsilon$-achievable compression rate for the WAK problem with mixture distributions.

I-C Organization

Our paper is organized as follows. In Section II we review previous results and define quantities which are essential to characterize optimal error exponents in our study. We then establish various results for the compound setting, some of which might be of independent interest, in Section III. In Section IV, we provide the maximum achievable error exponent in the mixture setting. We state the sufficient condition and use it to establish the maximum $\epsilon$-achievable error exponent in the mixture setting in Section V. The refined connection between the WAK problem and the hypothesis testing against independence problem is then given in Section VI. Consequences of the new connection are also given therein.

I-D Notations

We focus on finite alphabets in this paper. Before we begin we make the following conventions. Given a probability measure $\mu$, $\mu^{\otimes n}$ denotes its $n$-fold product measure extension. $\log(\cdot)$ denotes the natural logarithm. For any two distributions $P$ and $Q$ on an alphabet $\mathcal{U}$ and a symbol $u$ with $Q(u)=0$: if $P$ is absolutely continuous w.r.t. $Q$, denoted by $P\ll Q$, then we define $\iota_{P\|Q}(u)=0$; otherwise we define $\iota_{P\|Q}(u)=+\infty$, irrespective of whether $P(u)$ is equal to $0$ or not. When $Q(u)>0$ we define $\iota_{P\|Q}(u)=\log P(u)/Q(u)$. (For simplicity we use the convention $\log 0=-\infty$ in the following.) The relative entropy between two distributions $P$ and $Q$ is defined as $D(P\|Q)=\mathbb{E}[\iota_{P\|Q}(U)]$ where $U\sim P$. For a joint distribution $P_{UV}$ on $\mathcal{U}\times\mathcal{V}$, if $P_U(u)=0$ or $P_V(v)=0$ holds then we define $\iota_{P_{UV}}(u;v)=0$, as $P_{UV}\ll P_U\times P_V$ is valid. Otherwise we define $\iota_{P_{UV}}(u;v)=\log P_{UV}(u,v)/(P_U(u)P_V(v))$. The mutual information between $U$ and $V$ jointly distributed according to $P_{UV}$ is defined as $I(U;V)=\mathbb{E}[\iota_{P_{UV}}(U;V)]$. For a finite set $\mathcal{A}$, we use $|\mathcal{A}|$ and $\mathcal{A}^c$ to denote its cardinality and its complement. For a mapping $\phi\colon\mathcal{X}\to\mathcal{M}$ we define $|\phi|$ to be the cardinality of its range, $|\phi|\triangleq|\mathcal{M}|$. For a given distribution $P$ on $\mathcal{X}$ and a stochastic mapping $f\colon\mathcal{X}\to\{0,1\}$ with corresponding transition kernel $W$ we define

$$P(f)\triangleq\sum_x P(x)W(0|x),\quad\text{and}\quad P(1-f)\triangleq 1-P(f).\quad (2)$$
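As a small numerical illustration of (2), the following sketch computes $P(f)$ and $P(1-f)$ for a randomized binary test; the alphabet, $P$, and the kernel $W$ below are hypothetical.

```python
import numpy as np

# Hypothetical randomized binary test on a ternary alphabet, cf. (2).
P = np.array([0.5, 0.3, 0.2])      # distribution P on X
W = np.array([[0.9, 0.1],          # W[x, k] = Pr(test outputs k | x)
              [0.2, 0.8],
              [0.6, 0.4]])

P_f = float(np.sum(P * W[:, 0]))   # P(f) = sum_x P(x) W(0|x)
print(P_f, 1.0 - P_f)              # P(f) and P(1 - f)
```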

II Preliminaries

We review previous results on the hypothesis testing problem with one-sided compression. Then we present an important assumption and essential quantities that are needed to establish results of our study in later sections.

Figure 1: Illustration of the hypothesis testing setting with communication constraints.

In [1], the authors studied the following binary hypothesis testing problem: decide whether $(x^n,y^n)$ is generated iid from $H_0\colon P_{XY}$ or from $H_1\colon Q_{XY}$ using a testing scheme $(\phi_n,\psi_n)$. Herein $\phi_n$ is a compression mapping,

$$\phi_n\colon\mathcal{X}^n\to\mathcal{M},\quad (3)$$

and ψn\psi_{n} is a decision mapping

$$\psi_n\colon\mathcal{Y}^n\times\mathcal{M}\to\{0,1\}.\quad (4)$$

The setting is depicted in Fig. 1. The corresponding type I and type II error (also known as false alarm and miss detection) probabilities are given by

$$\alpha_n=P_{Y^n\phi_n(X^n)}(1-\psi_n)=\sum P_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\,P_{1|y^n,\phi_n(x^n)},$$
$$\beta_n=Q_{Y^n\phi_n(X^n)}(\psi_n)=\sum Q_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\,P_{0|y^n,\phi_n(x^n)},\quad (5)$$

where $P_{k|y^n,\phi_n(x^n)}$, $k\in\{0,1\}$, is the probability that $\psi_n$ outputs $k$ given the pair $(y^n,\phi_n(x^n))$. The following achievability definition is repeatedly used in the subsequent analysis.

Definition 1.

For a given $R_c$ and an $\epsilon\in[0,1)$, $E$ is an $\epsilon$-achievable error exponent of the second type for the binary hypothesis testing problem if there exists a sequence of testing schemes $(\phi_n,\psi_n)$ such that

$$\limsup_{n\to\infty}\alpha_n\leq\epsilon,\qquad \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\beta_n}\geq E,\qquad \limsup_{n\to\infty}\frac{1}{n}\log|\phi_n|\leq R_c.\quad (6)$$

We define $E_\epsilon^\star(R_c)$ as the supremum of all $\epsilon$-achievable error exponents at $R_c$.

When $Q(y|x)>0$ holds for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$, Ahlswede and Csiszár proved the strong converse result that for a given $R_c$, $E_\epsilon^\star(R_c)$ does not depend on $\epsilon\in[0,1)$. They also provided a multi-letter formula for $E_\epsilon^\star(R_c)$ under the stated condition. In the case that $Q_{XY}=P_X\times P_Y$ holds, the formula reduces to (for notational simplicity, we suppress the cardinality bound for $\mathcal{U}$ in the sequel)

$$E_\epsilon^\star(R_c)=\max_{P_{U|X}\colon I(X;U)\leq R_c}I(Y;U),\quad\forall\epsilon\in[0,1).\quad (7)$$

A closer examination also reveals that in the case $Q_{XY}=Q_Y\times Q_X$, where $Q_Y$ and $Q_X$ are distributions on $\mathcal{Y}$ and $\mathcal{X}$ satisfying $D(P_Y\|Q_Y)<\infty$ and $D(P_X\|Q_X)<\infty$, we also have, for all $\epsilon\in[0,1)$, (In [5], the authors claimed in the introduction that Ahlswede and Csiszár obtained a single-letter characterization for $Q_{XY}=Q_X\times Q_Y$ using entropy characterization methods. However, in [1] only characterizations for $Q_{XY}=P_X\times P_Y$ and for an arbitrary $Q_{XY}$ with $R_c\geq H(X)$ were provided. Further details to validate this claim were not given in [5].)

$$E_\epsilon^\star(R_c)=\max_{P_{U|X}\colon I(X;U)\leq R_c}I(Y;U)+D(P_Y\|Q_Y)+D(P_X\|Q_X).\quad (8)$$

In other words, Theorem 5 in [1] is tight in this case. We give a general relation using code transformations which leads to this observation in Appendix A.
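As a quick numerical illustration of the single-letter formula (7), on small alphabets the maximization over test channels $P_{U|X}$ can be approximated by random search; the joint source $P_{XY}$, the rate $R_c$, and the search procedure below are illustrative stand-ins, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_info(pab):
    """I(A;B) in nats for a joint probability matrix pab (rows A, cols B)."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])))

P_XY = np.array([[0.4, 0.1],           # hypothetical joint source P_XY
                 [0.1, 0.4]])
P_X = P_XY.sum(axis=1)
P_Y_given_X = P_XY / P_X[:, None]
Rc = 0.3                               # compression rate in nats

best = 0.0
for _ in range(20000):                 # random search over test channels P_{U|X}
    U = rng.dirichlet(np.ones(3), size=2)      # rows: P_{U|X=x}, |U| = 3
    P_XU = P_X[:, None] * U
    if mutual_info(P_XU) <= Rc:                # constraint I(X;U) <= Rc
        P_YU = P_Y_given_X.T @ P_XU            # P_{YU} = sum_x P(y|x) P_{XU}(x,u)
        best = max(best, mutual_info(P_YU))    # objective I(Y;U)
print("approximate exponent:", best)
```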
This observation motivates us to study in this paper generalized problems of compound hypothesis testing and hypothesis testing for mixture distributions. These settings are based on finite collections of distributions $\{P_{X_iY_i}^{\otimes n}\}_{i=1}^m$ and $\{Q_{Y_j^n}\times Q_{X_t^n}\}_{j\in[1:k],t\in[1:r]}$. Let

$$\mathcal{P}_{\mathcal{X}}=\{\tilde{P}\mid\tilde{P}\ \text{is the marginal distribution on}\ \mathcal{X}\ \text{of}\ P_{X_iY_i}\ \text{for some}\ i\in[1:m]\}.\quad (9)$$

We assume that $\mathcal{P}_{\mathcal{X}}$ can be enumerated as $\{P_{\mathcal{X},1},\dots,P_{\mathcal{X},|\mathcal{P}_{\mathcal{X}}|}\}$. For each $s\in[1:|\mathcal{P}_{\mathcal{X}}|]$, let $\mathfrak{F}_s$ be the set of indices $i$ such that $P_{\mathcal{X},s}$ is the marginal distribution of $P_{Y_iX_i}$ on $\mathcal{X}$, i.e.,

$$\mathfrak{F}_s=\{i\mid P_{\mathcal{X},s}\ \text{is the marginal on}\ \mathcal{X}\ \text{of}\ P_{X_iY_i}\}.\quad (10)$$

To characterize the optimal error exponents in the compound setting and the mixture setting, we need to make the following assumption.

Assumption 1.

The processes $\{Q_{X_t^n}\}$ and $\{Q_{Y_j^n}\}$ are assumed to be in one of the following categories.

  • $Q_{X_t^n}=Q_{X_t}^{\otimes n}$ and $Q_{Y_j^n}=Q_{Y_j}^{\otimes n}$ hold for all $j\in[1:k]$ and $t\in[1:r]$, respectively. We assume further that for all $i\in[1:m]$, $\min_j D(P_{Y_i}\|Q_{Y_j})<\infty$, and for all $s\in[1:|\mathcal{P}_{\mathcal{X}}|]$, $\min_t D(P_{\mathcal{X},s}\|Q_{X_t})<\infty$ hold. Accordingly, we define the minimum distances as (in case $\operatorname*{arg\,min}$ does not return a unique value, we pick the first one)

    $$\forall i\in[1:m],\quad j_i^\star=\operatorname*{arg\,min}_{j\in[1:k]}D(P_{Y_i}\|Q_{Y_j}),\quad d_i^{\mathrm{y}}=D(P_{Y_i}\|Q_{Y_{j_i^\star}}),$$
    $$\forall s\in[1:|\mathcal{P}_{\mathcal{X}}|],\quad t_s^\star=\operatorname*{arg\,min}_{t\in[1:r]}D(P_{\mathcal{X},s}\|Q_{X_t}),\quad d_s^{\mathrm{x}}=D(P_{\mathcal{X},s}\|Q_{X_{t_s^\star}}).\quad (11)$$
  • $(Q_{X_t^n})$ and $(Q_{Y_j^n})$ are Markov processes of finite order with stationary transition probabilities for all $t\in[1:r]$ and $j\in[1:k]$. Compared to the first category we need an additional assumption: for all $n$, we have $P_{\mathcal{X},s}^{\otimes n}\ll Q_{X_t^n}$ for all $(s,t)$ and similarly $P_{Y_i}^{\otimes n}\ll Q_{Y_j^n}$ for all $(i,j)$. We define the following limits, called relative entropy rates,

    $$\forall(s,t),\quad A_X^{(st)}=\lim_{n\to\infty}\big[D(P_{\mathcal{X},s}^{\otimes(n+1)}\|Q_{X_t^{n+1}})-D(P_{\mathcal{X},s}^{\otimes n}\|Q_{X_t^n})\big],$$
    $$\forall(i,j),\quad A_Y^{(ij)}=\lim_{n\to\infty}\big[D(P_{Y_i}^{\otimes(n+1)}\|Q_{Y_j^{n+1}})-D(P_{Y_i}^{\otimes n}\|Q_{Y_j^n})\big].\quad (12)$$

    The existence of these relative entropy rate limits is guaranteed by [24, Theorem 1]. Similarly, we assume that for all $i\in[1:m]$, $\min_j A_Y^{(ij)}<\infty$, and for all $s\in[1:|\mathcal{P}_{\mathcal{X}}|]$, $\min_t A_X^{(st)}<\infty$ hold. We further define

    $$\forall i\in[1:m],\quad j_i^\star=\operatorname*{arg\,min}_{j\in[1:k]}A_Y^{(ij)},\quad d_i^{\mathrm{y}}=A_Y^{(ij_i^\star)},$$
    $$\forall s\in[1:|\mathcal{P}_{\mathcal{X}}|],\quad t_s^\star=\operatorname*{arg\,min}_{t\in[1:r]}A_X^{(st)},\quad d_s^{\mathrm{x}}=A_X^{(st_s^\star)}.\quad (13)$$

For notational simplicity we also define the total minimum distances, for all $i\in[1:m]$ with $i\in\mathfrak{F}_s$,

$$d_{is}^{\mathrm{yx}}=d_i^{\mathrm{y}}+d_s^{\mathrm{x}}.\quad (14)$$
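To make the quantities in Assumption 1 concrete, the following minimal sketch computes, for the iid category, the minimizers $j_i^\star$ and the distances $d_i^{\mathrm{y}}$ from single-letter marginals; all distributions are hypothetical, and $d_s^{\mathrm{x}}$ is obtained the same way from $P_{\mathcal{X},s}$ and $\{Q_{X_t}\}$.

```python
import numpy as np

def kl(p, q):
    """D(P||Q) in nats; +inf if P is not absolutely continuous w.r.t. Q."""
    if np.any((p > 0) & (q == 0)):
        return np.inf
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# hypothetical single-letter marginals for the first (iid) category
P_Y = [np.array([0.7, 0.3]), np.array([0.5, 0.5])]   # P_{Y_i}, i = 1, 2
Q_Y = [np.array([0.6, 0.4]), np.array([0.2, 0.8])]   # Q_{Y_j}, j = 1, 2

for i, p in enumerate(P_Y, start=1):
    ds = [kl(p, q) for q in Q_Y]
    j_star = int(np.argmin(ds)) + 1    # first minimizer, per the convention above
    print(f"i = {i}: j_i* = {j_star}, d_i^y = {min(ds):.4f}")
```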

The following quantities are used to characterize optimal error exponents in later sections. (For clarity, we reserve pairs of random variables $(Y_i,X_i)$ for distributions $P_{Y_iX_i}$ and pairs of random variables $(\bar{Y}_i,\bar{X}_s)$ for distributions $P_{Y_i|X_i}\times P_{\mathcal{X},s}$.)

$$\forall i\in[1:m],\ i\in\mathfrak{F}_s,\quad \xi_i(R_c)=\max_{P_{U|\bar{X}_s}\colon I(\bar{X}_s;U)\leq R_c}I(\bar{Y}_i;U)+d_{is}^{\mathrm{yx}},$$
$$\forall s\in[1:|\mathcal{P}_{\mathcal{X}}|],\quad \theta_s(R_c)=\max_{P_{U|\bar{X}_s}\colon I(\bar{X}_s;U)\leq R_c}\min_{i\in\mathfrak{F}_s}\big[I(\bar{Y}_i;U)+d_{is}^{\mathrm{yx}}\big],\quad (15)$$

where $P_{\bar{X}_s}=P_{\mathcal{X},s}$ and, for all $i\in\mathfrak{F}_s$, $P_{\bar{Y}_i|\bar{X}_s}=P_{Y_i|X_i}$ hold.
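The quantity $\theta_s(R_c)$ in (15) can likewise be approximated by brute force on small alphabets, with the minimum over $i\in\mathfrak{F}_s$ taken inside the maximization over test channels; the components and total minimum distances below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def mi(pab):
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    m = pab > 0
    return float(np.sum(pab[m] * np.log(pab[m] / (pa @ pb)[m])))

# hypothetical class F_s with two components sharing the marginal P_{X,s}
P_Xs = np.array([0.5, 0.5])
P_Y_given_X = [np.array([[0.9, 0.1], [0.1, 0.9]]),   # P_{Y_1|X_1}
               np.array([[0.8, 0.2], [0.3, 0.7]])]   # P_{Y_2|X_2}
d_yx = [0.05, 0.02]                                  # hypothetical d_{is}^{yx}
Rc = 0.2

theta = 0.0
for _ in range(20000):                               # random search over P_{U|X}
    U = rng.dirichlet(np.ones(3), size=2)
    P_XU = P_Xs[:, None] * U
    if mi(P_XU) <= Rc:                               # rate constraint
        val = min(mi(W.T @ P_XU) + d for W, d in zip(P_Y_given_X, d_yx))
        theta = max(theta, val)                      # min_i [I(Y_i;U) + d_{is}^{yx}]
print("approximate theta_s(Rc):", theta)
```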

III Compound Hypothesis Testing

In this section we study the compound hypothesis testing problem of differentiating between two collections of distributions $\{P_{Y_iX_i}^{\otimes n}\}_{i\in[1:m]}$ and $\{Q_{Y_j^n}\times Q_{X_t^n}\}_{j\in[1:k],t\in[1:r]}$ satisfying Assumption 1. We derive the optimal achievable error exponent and provide partial strong converse results for this setting.
A testing scheme for this problem is similarly characterized by a pair of compression-decision mappings $(\phi_n,\psi_n)$. For a given testing scheme $(\phi_n,\psi_n)$, we define the following quantities which characterize the maximum type I and type II error probabilities

$$\alpha_n=\max_{i\in[1:m]}\alpha_n^{(i)}=\max_{i\in[1:m]}P_{Y_i^n\phi_n(X_i^n)}(1-\psi_n),$$
$$\beta_n=\max_{j\in[1:k],t\in[1:r]}\beta_n^{(jt)}=\max_{j\in[1:k],t\in[1:r]}Q_{Y_j^n}\times Q_{\phi_n(X_t^n)}(\psi_n).\quad (16)$$

Similarly we also use Definition 1 as the $\epsilon$-achievability definition. For a given $R_c$, the maximum $\epsilon$-achievable error exponent is denoted by $E_{\mathrm{comp},\epsilon}^{\star}(R_c)$.
As our achievability result will be used in Sections IV and V, for notational compactness we define the following auxiliary sets. For a given mapping $\phi_n$ and a positive number $E$, define for each $i\in[1:m]$ an intersected decision set $\mathcal{I}_n^{(i)}(E)$ as follows

$$\mathcal{I}_n^{(i)}(E)=\Big\{(y^n,\phi_n(x^n))\mid P_{Y_i^n\phi_n(X_i^n)}(y^n,\phi_n(x^n))>e^{nE}\max_{j\in[1:k],t\in[1:r]}Q_{Y_j^n}\times Q_{\phi_n(X_t^n)}(y^n,\phi_n(x^n))\Big\}.\quad (17)$$

In the case that $k=1$ and $r=1$ hold, $\mathcal{I}_n^{(i)}(E)$ is a decision region based on the likelihood ratio for testing $P_{Y_i^n\phi_n(X_i^n)}$ against $Q_{Y_1^n}\times Q_{\phi_n(X_1^n)}$.

III-A Characterization of $E_{\mathrm{comp},0}^{\star}(R_c)$

The following result characterizes the optimal achievable error exponent in the compound hypothesis testing problem.

Theorem 1.

For a given compression threshold $R_c$, we have $E_{\mathrm{comp},0}^{\star}(R_c)=\min_{s\in[1:|\mathcal{P}_{\mathcal{X}}|]}\theta_s(R_c)$. Furthermore, for a given positive number $\gamma$ and any sequence of testing schemes $(\phi_n,\psi_n)$ such that $\bar{E}=\min_{s\in[1:|\mathcal{P}_{\mathcal{X}}|]}\theta_s(R_c)-\gamma$ is achievable, we also have, with $E=\bar{E}-\gamma$,

$$\lim_{n\to\infty}P_{Y_i^n\phi_n(X_i^n)}\big[(\mathcal{I}_n^{(i)}(E))^c\big]=0,\quad\forall i\in[1:m].\quad (18)$$

The achievability proof of Theorem 1 is given in Appendix B. The converse proof of Theorem 1 follows from that of Theorem 3. The proof of Theorem 1 uses a combination of techniques from [11], [25] and our new mixing idea. In the following we provide an overview of the steps in the proof of Theorem 1.

Assume that the set of marginal distributions on $\mathcal{X}$, $\mathcal{P}_{\mathcal{X}}$, consists of a single element. Assume further that the number of components in the null hypothesis is two, i.e., $m=2$ and $\mathfrak{F}_1=\{1,2\}$. First we check whether the sequence $x^n$ is "typical" in the sense that

$$\min_t \iota_{P_X^{\otimes n}\|Q_{X_t^n}}(x^n)>n(d_1^{\mathrm{x}}-\gamma).\quad (19)$$

This helps us to perform the change of measure step from $Q_{X_t^n}$ to $P_X^{\otimes n}$ in the analysis of the type II (or miss detection) probability. The above condition is violated with vanishing probability in the analysis of the false alarm probability. We then select a test channel $P_{U|X}$ and generate a codebook from the marginal distribution $P_U^{\otimes n}$. In our proof we do not estimate $i$ from the sequence $y^n$, to avoid potential complications in the analysis of the miss detection probability. Instead, we artificially create the following joint distribution

$$P_{Y^nU^n}=\sum_{i=1}^{2}\nu_i P_{Y_iU}^{\otimes n},\quad (20)$$

where $\nu_i$ are positive probability weights. $P_{Y^nU^n}$ shifts the burden from calculating the miss detection probabilities to bounding the false alarm probabilities, which is less complex. We then consider the following score function which is helpful in defining a deterministic decision mapping $\psi_n$

$$\zeta(y^n,u^n)=\iota_{P_{Y^nU^n}}(y^n;u^n)+\min_j \iota_{P_{Y^n}\|Q_{Y_j^n}}(y^n).\quad (21)$$

In the score function $\zeta(\cdot,\cdot)$ the first term resolves the uncertainty within the set of marginal distributions $\{P_{Y_i}^{\otimes n}\}$, while the second term resolves the mismatch between the two sets of distributions $\{P_{Y_i}^{\otimes n}\}$ and $\{Q_{Y_j^n}\}$. The second term also indirectly checks whether $y^n$ is "typical".
Given a chosen codeword, which we explain how to obtain later, we decide that the null hypothesis is true if $\zeta(y^n,u^n)>n(E-d_1^{\mathrm{x}})$ is fulfilled. Given this decision the miss detection probabilities can be deduced based on the following chain of measure changing steps

$$Q_{Y_j^n}(y^n)\ \xrightarrow{(21)}\ P_{Y^n|U^n}(y^n|u^n),\qquad Q_{\phi_n(X_t^n)}(u^n)\ \xrightarrow{(19)}\ P_{\phi(X^n)}(u^n),\quad (22)$$

as well as the fact that $\sum_{y^n,u^n}P_{Y^n|U^n}(y^n|u^n)P_{\phi(X^n)}(u^n)\leq 1$ holds, where the summation is performed over the decision region.
Now, to obtain a transmission message index, we search for a codeword that yields the smallest maximum conditional false alarm probability by looking at

$$\max_{i\in\mathfrak{F}_s}\mathrm{Pr}\{\zeta(Y_i^n,u^n)<n(E-d_1^{\mathrm{x}})\,|\,X_i^n=x^n\}.\quad (23)$$

In the analysis of the maximum false alarm probabilities, changing measure from $P_{Y^nU^n}$ to $P_{Y_i^nU_i^n}$ in the expression of $\zeta(\cdot,\cdot)$ is relatively standard, cf. [11]. We can then use standard typicality arguments to conclude the existence of a good codebook. In the general case where the marginal set $\mathcal{P}_{\mathcal{X}}$ has multiple elements, we need to estimate $s$. Because of the way that we design $\zeta(\cdot,\cdot)$, this extra step does not affect the exponent of the miss detection probability.
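To make the construction concrete, here is a minimal sketch of the score (21) for a two-component mixture with iid components on binary alphabets; the joint distributions $P_{Y_iU}$, the weights $\nu_i$, and the marginals $Q_{Y_j}$ are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

nu = [0.5, 0.5]                                          # weights nu_i
P_YU = [np.array([[0.45, 0.05], [0.05, 0.45]]),          # P_{Y_1 U}(y, u)
        np.array([[0.30, 0.20], [0.20, 0.30]])]          # P_{Y_2 U}(y, u)
Q_Y = [np.array([0.5, 0.5]), np.array([0.35, 0.65])]     # Q_{Y_j}

def zeta(y_seq, u_seq):
    # first term of (21): information density under the mixture P_{Y^n U^n}
    joint = sum(w * np.prod([P[y, u] for y, u in zip(y_seq, u_seq)])
                for w, P in zip(nu, P_YU))
    p_y = sum(w * np.prod(P.sum(axis=1)[y_seq]) for w, P in zip(nu, P_YU))
    p_u = sum(w * np.prod(P.sum(axis=0)[u_seq]) for w, P in zip(nu, P_YU))
    info = np.log(joint / (p_y * p_u))
    # second term of (21): min_j log P_{Y^n}(y^n) / Q_{Y_j}^n(y^n)
    mismatch = min(np.log(p_y) - float(np.sum(np.log(q[y_seq]))) for q in Q_Y)
    return info + mismatch

y = rng.integers(0, 2, size=10)
u = rng.integers(0, 2, size=10)
print(zeta(y, u))    # declare H_0 when zeta exceeds n (E - d_1^x)
```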

III-B A partial characterization of $E_{\mathrm{comp},\epsilon}^{\star}(R_c)$

We have the following result which provides a partial characterization of $E_{\mathrm{comp},\epsilon}^{\star}(R_c)$.

Theorem 2.

Given a positive number $R_c$, define the inactive set $\mathcal{S}=\{s\mid\theta_s(R_c)<\min_{i\in\mathfrak{F}_s}\xi_i(R_c)\}$.

  • If $\epsilon<\min\{\min_{s\in\mathcal{S}}\frac{1}{|\mathfrak{F}_s|},1\}$ holds (we use the convention that if $\mathcal{S}=\varnothing$ then $\min_{s\in\mathcal{S}}(\cdot)=+\infty$ and $\max_{s\in\mathcal{S}}(\cdot)=-\infty$), then we have

    $$E_{\mathrm{comp},\epsilon}^{\star}(R_c)=\min_{s\in[1:|\mathcal{P}_{\mathcal{X}}|]}\theta_s(R_c).$$
  • If $\epsilon>\max\{\max_{s\in\mathcal{S}}\frac{|\mathfrak{F}_s|-1}{|\mathfrak{F}_s|},0\}$, then we have

    $$E_{\mathrm{comp},\epsilon}^{\star}(R_c)=\min_{i\in[1:m]}\xi_i(R_c).$$
  • Let $s^\star$ be an optimality achieving index, i.e., $\theta_{s^\star}(R_c)=\min_{s'}\theta_{s'}(R_c)$. Assume that $s^\star$ is active, i.e., $\theta_{s^\star}(R_c)=\min_{i\in\mathfrak{F}_{s^\star}}\xi_i(R_c)$. For an arbitrarily given $\gamma>0$, for any sequence of testing schemes $(\phi_n,\psi_n)$ such that the following inequalities are satisfied

    $$\limsup_{n\to\infty}\frac{1}{n}\log|\phi_n|\leq R_c,\qquad \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\beta_n}\geq\min_s\theta_s(R_c)+\gamma,\quad (24)$$

    we then have $\lim_{n\to\infty}\alpha_n^{(i^\star)}=1$, where $i^\star\in\mathfrak{F}_{s^\star}$ is an index such that $\xi_{i^\star}(R_c)=\theta_{s^\star}(R_c)$. It also implies that under this assumption we have $E_{\mathrm{comp},\epsilon}^{\star}(R_c)=\min_s\theta_s(R_c)$ for all $\epsilon\in[0,1)$.

A simple case in which the third statement of Theorem 2 holds is when $|\mathfrak{F}_s|=1$ for all $s\in[1:|\mathcal{P}_{\mathcal{X}}|]$. The proof of Theorem 2 is given in Appendix C. The second part of Theorem 2 will be employed in proving the converse of Theorem 4 for mixture models.

Recall that $\mathfrak{F}_s$ represents the set of distributions $P_{Y_iX_i}$ which have the same marginal distribution $P_{\mathcal{X},s}$ on $\mathcal{X}$, cf. Equation (10). In the following we provide an outline of the proofs of the first two items in the statement of Theorem 2.

We discuss in this paragraph the first item of Theorem 2. When $s$ is active, i.e., $\theta_s(R_c)=\min_{i\in\mathfrak{F}_s}\xi_i(R_c)$ holds, then it follows from the strong converse bound for testing $P_{Y_iX_i}^{\otimes n}$ against $Q_{Y_{j_i^\star}^n}\times Q_{X_{t_s^\star}^n}$ for each $i$ inside the class $\mathfrak{F}_s$ that $E_{\mathrm{comp},\epsilon}^{\star}(R_c)\leq\theta_s(R_c)$ holds for all $\epsilon\in[0,1)$, cf. Theorem 7. Therefore we only need to focus on the inactive set $\mathcal{S}$. For simplicity, in this discussion we can assume that $\mathcal{S}=\{1\}$ holds and that there are two components inside $\mathfrak{F}_1$. We assume further that there is no mismatch, i.e., $\{Q_{Y_j^n}\}=\{P_{Y_i}^{\otimes n}\}$ and $\{Q_{X_t^n}\}=\{P_{X_i}^{\otimes n}\}$ hold. The formulation of $\theta_1(R_c)$ requires the selection of a test channel $P_{U|\bar{X}_1}$. To show the strong converse bound, the general idea is hence to identify a common test channel $P_{U|\bar{X}_1}$. This can be done by considering relevant sets $\mathcal{V}_i$, $i=1,2$, of $x^n$ such that for each $i=1,2$,

$$\mathrm{Pr}(\psi_n(Y_i^n,\phi_n(x^n))=0\,|\,X_i^n=x^n)>\eta$$

holds, where $\eta\in(0,1-(\epsilon+\gamma))$ and $\gamma\in(0,1-\epsilon)$ are arbitrary. By setting $\eta=1/2-(\epsilon+\gamma)$ and using the reverse Markov inequality we obtain the following inequalities; indeed, applying the reverse Markov inequality to $Z=\mathrm{Pr}(\psi_n(Y_i^n,\phi_n(x^n))=0\,|\,X_i^n)$, whose expectation is at least $1-(\epsilon+\gamma)$, gives $\mathrm{Pr}(Z>\eta)\geq(1-(\epsilon+\gamma)-\eta)/(1-\eta)$, which equals $1/(1+2(\epsilon+\gamma))$ for this choice of $\eta$:

$$P_{X_i}^{\otimes n}(\mathcal{V}_i)\geq 1/(1+2(\epsilon+\gamma)),\quad\forall i=1,2.\quad (25)$$

We require that the intersection $\mathcal{V}=\cap_{i=1}^{2}\mathcal{V}_i$ be non-empty. This allows us to define a joint distribution $P_{\tilde{Y}_1^n\tilde{Y}_2^n\tilde{X}^n}$ where $P_{\tilde{X}^n}$ is supported on $\mathcal{V}$. Note also that the $\eta$-restriction in the definition of $\mathcal{V}_i$ allows us to obtain the following inequality

$$n(E-\dots)\leq I(\tilde{Y}_1^n;\phi_n(\tilde{X}^n))+(\text{a bounded function of }\eta,\ \epsilon\text{ and }\gamma).$$

With this we can identify $\tilde{U}_l=(\phi_n(\tilde{X}^n),(\tilde{Y}_i^{l-1})_{i=1}^{2})$ for all $l\in[1:n]$. The variational arguments in [26] can then be used to show that $E_{\mathrm{comp},\epsilon}^{\star}(R_c)\leq\theta_1(R_c)$ also holds. To make $\mathcal{V}$ non-empty, we must have $\epsilon<1/2=\frac{1}{|\mathfrak{F}_1|}$. When the inactive set $\mathcal{S}$ contains more than one element we obtain the corresponding threshold $\min_{s\in\mathcal{S}}\frac{1}{|\mathfrak{F}_s|}$.

Now we discuss the second item of Theorem 2. Since $E_{\mathrm{comp},\epsilon}^{\star}(R_c)\leq\xi_i(R_c)$ holds for all $i\in[1:m]$ and $\epsilon\in[0,1)$, cf. Theorem 7, we only explain the achievability direction of the second item. For simplicity we assume that the set of marginal distributions $\mathcal{P}_{\mathcal{X}}$ has a single element, that this element is inactive, i.e., $\mathcal{S}=\{1\}$, and that $\mathfrak{F}_1=\{1,2\}$. Our achievability idea is to build two sequences of testing schemes separately and then mix them together. For this we need to divide the space $\mathcal{X}^n$ into $|\mathfrak{F}_1|=2$ partitions $\mathcal{C}_1$ and $\mathcal{C}_2$ such that $P_{\bar{X}_1}^{\otimes n}(\mathcal{C}_l)>1-\epsilon$ for $l=1,2$ and all sufficiently large $n$. Since $\epsilon>\frac{|\mathfrak{F}_1|-1}{|\mathfrak{F}_1|}=1/2$, such a partitioning can be done. For each $i\in\{1,2\}$, we design a sequence of testing schemes $(\phi_n^{1i},\psi_n^{1i})$ to differentiate between $P_{Y_iX_i}^{\otimes n}$ and $Q_{Y_{j_i^\star}^n}\times Q_{X_{t_1^\star}^n}$ such that $\xi_i(R_c)-\gamma$ is achievable. As in the proof of Theorem 1 we also define an auxiliary mixture distribution

$$P_{Y^nX^n}=\sum_i \nu_i P_{Y_iX_i}^{\otimes n}.\quad (26)$$

The mixture distribution helps us avoid estimating the distribution of $y^n$. Once the preparation is complete, we perform the compression as follows.
We first check whether $x^n$ is a typical sequence, in the sense that the following condition is fulfilled:

$$\min_t \iota_{P_{X^n}\|Q_{X_t^n}}(x^n)>n(d_1^{\mathrm{x}}-\gamma).\quad (27)$$

Again, the above inequality also helps resolve the mismatch between the two sets of distributions $\{P_{X_i}^{\otimes n}\}$ and $\{Q_{X_t^n}\}$. Suppose that $x^n$ is typical. If $x^n\in\mathcal{C}_1$ then we use $\phi_n^{11}$ to compress it, and similarly when $x^n\in\mathcal{C}_2$ we use $\phi_n^{12}$ to compress it. The joint compression mapping is then

$$\phi_n^{1}(x^n)=\phi_n^{11}(x^n)\mathbf{1}\{x^n\in\mathcal{C}_1\}+\phi_n^{12}(x^n)\mathbf{1}\{x^n\in\mathcal{C}_2\}.\quad (28)$$

The joint compression mapping induces the following distribution from $P_{Y^nX^n}$

$$P_{Y^nU}=\nu_1 P_{Y_1^n\phi_n^{11}(X_1^n)}+\nu_2 P_{Y_2^n\phi_n^{12}(X_2^n)}.\quad (29)$$

We also define the following score function

$$\zeta(y^n,\phi_n^{1}(x^n))=\iota_{P_{Y^nU}}(y^n;\phi_n^{1}(x^n))+\min_j \iota_{P_{Y^n}\|Q_{Y_j^n}}(y^n).\quad (30)$$

We say $H_0$ is true if $\zeta(y^n,\phi_n^{1}(x^n))>n(E-d_1^{\mathrm{x}})$ holds. Let us look at $\alpha_n^{(1)}$, which can be upper-bounded as

$$\alpha_n^{(1)}\leq\mathrm{Pr}\{\zeta(Y_1^n,\phi_n^{1}(X_1^n))<n(E-d_1^{\mathrm{x}}),\,X_1^n\text{ is typical}\}+\mathrm{Pr}\{X_1^n\text{ is atypical}\}$$
$$\stackrel{(a)}{\leq}\mathrm{Pr}\{\zeta(Y_1^n,\phi_n^{11}(X_1^n))<n(E-d_1^{\mathrm{x}}),\,X_1^n\in\mathcal{C}_1\}+\mathrm{Pr}\{X_1^n\notin\mathcal{C}_1\}+\mathrm{Pr}\{X_1^n\text{ is atypical}\}$$
$$\leq\mathrm{Pr}\{\zeta(Y_1^n,\phi_n^{11}(X_1^n))<n(E-d_1^{\mathrm{x}})\}+\mathrm{Pr}\{X_1^n\notin\mathcal{C}_1\}+\mathrm{Pr}\{X_1^n\text{ is atypical}\},\quad (31)$$

where $(a)$ holds since when $x^n$ is in $\mathcal{C}_1$ and is typical, we use $\phi_n^{11}$ to compress it. We have $\mathrm{Pr}\{X_1^n\notin\mathcal{C}_1\}<\epsilon$ by construction. The first term can be shown to be vanishing using similar steps as in the proof of Theorem 1.
We note that Theorem 1 only guarantees that $\theta_1(R_c)-\gamma$ is achievable, which is below $\min_{i\in\mathfrak{F}_1}\xi_i(R_c)-\gamma$, our desired error exponent in this part of Theorem 2. The collection of sets $\{\mathcal{C}_l\}_{l=1}^{2}$ is used to resolve the confusion about which compression mapping we should use when $s=1$ is not active. We are willing to pay an additional $\epsilon$ error probability price for using this collection. The general case is slightly more complicated but follows the same principles as discussed herein. Note that when $s$ is active, we do not need to divide $\mathcal{X}^n$ into a collection of subsets as above. This is because by Theorem 1 we can design a sequence of testing schemes to differentiate between $\{P_{Y_iX_i}^{\otimes n}\}_{i\in\mathfrak{F}_s}$ and $\{Q_{Y_j^n}\times Q_{X_t^n}\}$ such that $\min_{i\in\mathfrak{F}_s}\xi_i(R_c)-\gamma$ is achievable.
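A minimal sketch of the gated compressor (27)-(28) is given below; the partition rule, the stub mappings standing in for $\phi_n^{11}$ and $\phi_n^{12}$, and all distributions are hypothetical placeholders, and atypical inputs are merely flagged.

```python
import numpy as np

P_X = np.array([0.5, 0.5])                            # common X-marginal
Q_X = [np.array([0.4, 0.6]), np.array([0.7, 0.3])]    # Q_{X_t}, t = 1, 2
d1x = min(float(np.sum(P_X * np.log(P_X / q))) for q in Q_X)
gamma = 0.05

def phi_11(xn): return ("c1", hash(bytes(xn)) % 16)   # stub codes, 16 messages
def phi_12(xn): return ("c2", hash(bytes(xn)) % 16)
def in_C1(xn):  return xn[0] == 0                     # toy partition C_1 / C_2

def phi_1(xn):
    n = len(xn)
    # typicality check (27): min_t i_{P_X^n || Q_{X_t}^n}(x^n) > n (d_1^x - gamma)
    score = min(sum(np.log(P_X[x] / q[x]) for x in xn) for q in Q_X)
    if score <= n * (d1x - gamma):
        return ("atypical", None)                     # handled separately
    return phi_11(xn) if in_C1(xn) else phi_12(xn)    # dispatch as in (28)

print(phi_1(np.array([0, 1, 0, 0, 1, 1, 0, 1])))
```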

Remark 1.

We have $\theta_s(R_c)\leq\min_{i\in\mathfrak{F}_s}\xi_i(R_c)$. The inequality can be strict, which can be shown numerically. This means that in general the strong converse does not hold for the compound testing problem. In other words, $E_{\mathrm{comp},\epsilon}^{\star}(R_c)$ depends on $\epsilon$.

IV Testing Against Generalized Independence

In this section we consider the hypothesis testing problem involving mixture distributions. We use results and techniques from Section III to establish the optimal achievable error exponent in this setting. We begin with the definition of our model.
Assume that we have two sets of distributions $\{P_{Y_iX_i}^{\otimes n}\}_{i=1}^m$ and $\{Q_{Y_j^n}\times Q_{X_t^n}\}_{j\in[1:k],t\in[1:r]}$ which fulfill the conditions given in Assumption 1. For given $\{P_{Y_iX_i}^{\otimes n}\}_{i=1}^m$, let the distribution under the null hypothesis be defined as

$$P_{Y^nX^n}=\sum_i \nu_i P_{Y_iX_i}^{\otimes n},\quad\text{where}\ \forall i,\ \nu_i>0,\ \text{and}\ \sum_i\nu_i=1.\quad (32)$$

Similarly the distribution under the alternative hypothesis is given by

$$Q_{Y^nX^n}=\sum_{j,t}\tau_{jt}\,Q_{Y_j^n}\times Q_{X_t^n},\quad (33)$$

where $\tau_{jt}\geq 0$ for all $(j,t)$, $\sum_{j,t}\tau_{jt}=1$, and for all $i\in[1:m]$ with $i\in\mathfrak{F}_s$, $\tau_{j_i^\star t_s^\star}>0$. For notational simplicity in the subsequent analysis we define $\gamma_q=\min_{i\in[1:m],\,i\in\mathfrak{F}_s}\tau_{j_i^\star t_s^\star}$. We name this problem testing against generalized independence.
The model of $Q_{Y^nX^n}$ subsumes the following two cases:

  • testing against independence, in which $Q_{Y^nX^n}=(\sum_i \nu_i P_{Y_i}^{\otimes n})\times(\sum_{i'}\nu_{i'}P_{X_{i'}}^{\otimes n})$ holds,

  • and testing against (unobserved) conditional independence, in which $Q_{Y^nX^n}=\sum_i \nu_i P_{Y_i}^{\otimes n}\times P_{X_i}^{\otimes n}$ holds.

For a given pair of compression-decision mappings $(\phi_n,\psi_n)$, we define the corresponding type I and type II (false alarm and miss detection) error probabilities as

$$\alpha_n=P_{Y^n\phi_n(X^n)}(1-\psi_n),\qquad \beta_n=Q_{Y^n\phi_n(X^n)}(\psi_n).\quad (34)$$

Similarly as in Definition 1, we say that $E$ is an $\epsilon$-achievable error exponent at a compression rate $R_c$ for testing $P_{Y^nX^n}$ against $Q_{Y^nX^n}$ if there exists a sequence of testing schemes $(\phi_n,\psi_n)$ such that all the conditions in (6) are satisfied. We denote the maximum $\epsilon$-achievable error exponent at the given rate $R_c$ by $E_{\mathrm{mix},\epsilon}^{\star}(R_c)$. We first characterize the optimal achievable error exponent in the testing against generalized independence problem, $E_{\mathrm{mix},0}^{\star}(R_c)$.

Theorem 3.

For a given compression rate $R_c$, in testing $P_{Y^nX^n}$ against $Q_{Y^nX^n}$ using one-sided compression, we have

$$E_{\mathrm{mix},0}^{\star}(R_c)=E_{\mathrm{comp},0}^{\star}(R_c)=\min_{s\in[1:|\mathcal{P}_{\mathcal{X}}|]}\theta_s(R_c).\quad (35)$$

We first provide a remark about the first equality in the statement of Theorem 3, in which we highlight the difference between our model and a previous study.

Let us consider the case that $\tau_{jt}>0$ holds for all pairs $(j,t)$. Assume that $E$ is achievable in the mixture setting via a sequence of testing schemes $(\phi_n,\psi_n)$. Since $P_{Y^n\phi_n(X^n)}(1-\psi_n)\to 0$ and $\nu_i>0$ for all $i$, we have $P_{Y_i^n\phi_n(X_i^n)}(1-\psi_n)\to 0$. Similarly, for an arbitrarily given $\gamma>0$ and for all sufficiently large $n$ we have $Q_{Y^n\phi_n(X^n)}(\psi_n)\leq e^{-n(E-\gamma)}$. Since $\tau_{jt}>0$ holds for all $(j,t)$, we have $Q_{Y_j^n}\times Q_{\phi_n(X_t^n)}(\psi_n)\leq\frac{1}{\tau_{jt}}e^{-n(E-\gamma)}$. This implies that $E$ is an achievable error exponent in the compound hypothesis testing problem with the corresponding sequence of testing schemes $(\phi_n,\psi_n)$. Hence $E_{\mathrm{mix},0}^{\star}(R_c)\leq E_{\mathrm{comp},0}^{\star}(R_c)$ holds. The arguments discussed herein are similar to the ones given in [12] when data are not compressed. In our proof of Theorem 3, we only need the restriction that $\tau_{j_i^\star t_s^\star}>0$ for all $i\in[1:m]$ with $i\in\mathfrak{F}_s$.

We explain the idea of showing $E_{\mathrm{mix},0}^{\star}(R_c)\leq\min_{s\in[1:|\mathcal{P}_{\mathcal{X}}|]}\theta_s(R_c)$ in the following. For simplicity assume that there is no mismatch, i.e., $\{Q_{X_t^n}\}=\{P_{X_s}^{\otimes n}\}$ as well as $\{Q_{Y_j^n}\}=\{P_{Y_i}^{\otimes n}\}$ hold. Assume that $E$ is an achievable error exponent via a sequence of testing schemes $(\phi_n,\psi_n)$. A central idea of the Neyman-Pearson framework is to consider a decision region based on the likelihood ratio. An advantage of working with a likelihood-based decision region is that elementary set operations such as intersection, contraction, etc., can be performed through simple change of measure steps either in the numerator or the denominator of the likelihood ratio. We want to show that if $E$ is achievable then, roughly,

$$\mathrm{Pr}\{\iota_{P_{Y_i^n\phi_n(X_i^n)}}(Y_i^n;\phi_n(X_i^n))<nE\}\to 0\quad\text{as}\ n\to\infty.\quad (36)$$

The term inside the bracket is a rejection region based on the likelihood ratio for testing $P_{Y_i^n\phi_n(X_i^n)}$ against $P_{Y_i^n}\times P_{\phi_n(X_i^n)}$. Then, based on the definition of the spectral-inf mutual information rate, as well as the fact that the spectral-inf mutual information rate is bounded by the inf-mutual information rate, we can arrive at the conclusion that for an arbitrarily given $\gamma>0$ and for all $i\in[1:m]$,

$$E\leq\frac{1}{n}I(Y_i^n;\phi_n(X_i^n))+\gamma\quad (37)$$

for all sufficiently large $n$. Then we can use the standard single-letterization method to obtain that $E\leq\min_{s\in[1:|\mathcal{P}_{\mathcal{X}}|]}\theta_s(R_c)$. In order to obtain the conclusion in (36), we need to perform several change of measure steps. First we form a decision region $\mathcal{A}_n$ based on the likelihood ratio of $P_{Y^n\phi_n(X^n)}$ and $P_{Y^n}\times P_{\phi_n(X^n)}$ as well as the achievable error exponent $E$. Then it can be shown that $P_{Y^n\phi_n(X^n)}(\mathcal{A}_n^c)\to 0$. We then perform the first change of measure step, from $P_{Y^n\phi_n(X^n)}$ to $P_{Y_i^n\phi_n(X_i^n)}$, to obtain

$$P_{Y_i^n\phi_n(X_i^n)}(\mathcal{A}_n^c)\to 0.\quad (38)$$

This can be seen from the definition of $P_{Y^nX^n}$, as $\nu_i>0$. Next we need to change the measure inside the definition of $\mathcal{A}_n$. Roughly, we want to show that the following inequality holds

$$\log\frac{P_{Y_i^n\phi_n(X_i^n)}}{P_{Y_i^n}\times P_{\phi_n(X_i^n)}}(\cdot,\cdot)+\text{extra penalty}\geq\log\frac{P_{Y^n\phi_n(X^n)}}{P_{Y^n}\times P_{\phi_n(X^n)}}(\cdot,\cdot)\geq nE.\quad (39)$$

Changing the measures in the denominator from $P_{Y^n}$ to $P_{Y_i^n}$ and from $P_{\phi_n(X^n)}$ to $P_{\phi_n(X_i^n)}$ can be done based on the inequalities $P_{Y^n}(\cdot)\geq\nu_i P_{Y_i^n}(\cdot)$ and $P_{\phi_n(X^n)}(\cdot)\geq\nu_i P_{\phi_n(X_i^n)}(\cdot)$. These inequalities follow from the definition of $P_{Y^nX^n}$ and hold for all $y^n$ and $\phi_n(x^n)$. The change from $P_{Y^n\phi_n(X^n)}$ to $P_{Y_i^n\phi_n(X_i^n)}$ is not based on the definition of $P_{Y^nX^n}$. However, we can show that it holds with high probability. The proof of the general case involves another change of measure step, from $Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}$ to $P_{Y_i^n}\times P_{\phi_n(X_i^n)}$, using our code transformation arguments.
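The denominator changes of measure just described can be checked numerically: the mixture bound $P_{Y^n}(\cdot)\geq\nu_i P_{Y_i^n}(\cdot)$ holds pointwise, and the incurred log-likelihood penalty is at most $\log(1/\nu_i)$. A toy verification with hypothetical binary marginals:

```python
import numpy as np

nu = np.array([0.3, 0.7])                         # hypothetical weights nu_i
P_Y = np.array([[0.8, 0.2], [0.4, 0.6]])          # rows: component marginals P_{Y_i}
n = 6

# enumerate all binary sequences of length n and their component probabilities
seqs = np.array(np.meshgrid(*([[0, 1]] * n))).T.reshape(-1, n)
comp = np.array([[float(np.prod(P_Y[i][s])) for s in seqs] for i in range(2)])
P_mix = nu @ comp                                 # P_{Y^n}(y^n) for every y^n

# pointwise mixture bound: P_{Y^n}(.) >= nu_i P_{Y_i^n}(.)
assert np.all(P_mix[None, :] >= nu[:, None] * comp - 1e-12)

# the penalty log P_{Y_i^n} - log P_{Y^n} is capped by log(1/nu_i) per component
excess = np.log(comp / P_mix[None, :])
print(np.max(excess, axis=1), np.log(1 / nu))
```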

Proof.

Assume that $E$ is an achievable error exponent at a compression rate $R_c$ in the compound hypothesis testing problem via a sequence of testing schemes $(\phi_n,\psi_n)$. We have by definition

$$\lim_{n\to\infty}\max_{i\in[1:m]}\alpha_n^{(i)}=0,\qquad \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\max_{(j,t)}\beta_n^{(jt)}}\geq E.\quad (40)$$

Applying this sequence to the current testing against generalized independence setting we obtain

$$P_{Y^n\phi_n(X^n)}(1-\psi_n)=\sum_{i=1}^{m}\nu_i P_{Y_i^n\phi_n(X_i^n)}(1-\psi_n)\leq\max_{i\in[1:m]}P_{Y_i^n\phi_n(X_i^n)}(1-\psi_n)=\max_{i\in[1:m]}\alpha_n^{(i)},$$
$$Q_{Y^n\phi_n(X^n)}(\psi_n)=\sum_{j,t}\tau_{jt}\,Q_{Y_j^n}\times Q_{\phi_n(X_t^n)}(\psi_n)\leq\max_{(j,t)}Q_{Y_j^n}\times Q_{\phi_n(X_t^n)}(\psi_n)=\max_{(j,t)}\beta_n^{(jt)}.\quad (41)$$

We obtain that

$$\lim_{n\to\infty}P_{Y^n\phi_n(X^n)}(1-\psi_n)=0,\qquad \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{Q_{Y^n\phi_n(X^n)}(\psi_n)}\geq E.\quad (42)$$

Hence $E$ is also achievable in our testing against generalized independence setting, cf. Definition 1. Therefore we have

$$E_{\mathrm{mix},0}^{\star}(R_c)\geq E_{\mathrm{comp},0}^{\star}(R_c).\quad (43)$$

Now, for an arbitrarily given $\gamma>0$, assume that $(\phi_n,\psi_n)$ is a sequence of testing schemes such that

$$\lim_{n\to\infty}\alpha_n=0,\qquad \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\beta_n}\geq E+\gamma\quad (44)$$

hold. Define for each nn the following decision region based on the likelihood ratio

$$\mathcal{A}_n=\{(y^n,\phi_n(x^n))\mid P_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\geq e^{nE}Q_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\}.\quad (45)$$

By [11, Lemma 4.1.2] and the definition of $P_{Y^nX^n}$, we have for all $n$

$$\alpha_n+e^{nE}\beta_n\geq P_{Y^n\phi_n(X^n)}(\mathcal{A}_n^c)=\sum_{i=1}^{m}\nu_i P_{Y_i^n\phi_n(X_i^n)}(\mathcal{A}_n^c).\quad (46)$$

Let $(\gamma_n)$ be a sequence such that $\gamma_n\to 0$ and $n\gamma_n\to\infty$ as $n\to\infty$. For each $i\in[1:m]$, we define a set

$$\mathcal{G}_n^{(i)}=\{(y^n,\phi_n(x^n))\mid P_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))<e^{n\gamma_n}P_{Y_i^n\phi_n(X_i^n)}(y^n,\phi_n(x^n))\}.\quad (47)$$

We then have

$$P_{Y_i^n\phi_n(X_i^n)}[(\mathcal{G}_n^{(i)})^c]\leq e^{-n\gamma_n}P_{Y^n\phi_n(X^n)}[(\mathcal{G}_n^{(i)})^c]\leq e^{-n\gamma_n}.\quad (48)$$

$\mathcal{G}_n^{(i)}$ contains the high-probability pairs when we perform the change of measure from $P_{Y^n\phi_n(X^n)}$ to $P_{Y_i^n\phi_n(X_i^n)}$ in the numerator of the likelihood ratio test in the definition of $\mathcal{A}_n$. To make the derivation more compact, we further define the following two sets

$$\mathcal{C}_n^{(i)}=\{(y^n,\phi_n(x^n))\mid P_{Y_i^n\phi_n(X_i^n)}(y^n,\phi_n(x^n))<e^{n(E-\gamma_n+\log\gamma_q/n)}Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}(y^n,\phi_n(x^n))\},$$
$$\mathcal{D}_n^{(i)}=\{(y^n,\phi_n(x^n))\mid P_{Y_i^n\phi_n(X_i^n)}(y^n,\phi_n(x^n))<e^{n(E-\gamma_n)}Q_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\}.\quad (49)$$

$\mathcal{D}_n^{(i)}$ is a rejection region in testing $P_{Y_i^n\phi_n(X_i^n)}$ against $Q_{Y^n\phi_n(X^n)}$. $\mathcal{C}_n^{(i)}$ is a rejection region in testing $P_{Y_i^n\phi_n(X_i^n)}$ against $Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}$, which is our first desired test. From the definition of $Q_{Y^nX^n}$, we know that for all pairs $(y^n,\phi_n(x^n))$ and for all $i\in[1:m]$ with $i\in\mathfrak{F}_s$, the following inequality holds

$$Q_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\geq\gamma_q\,Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}(y^n,\phi_n(x^n)).\quad (50)$$

This implies that $\mathcal{C}_n^{(i)}\subseteq\mathcal{D}_n^{(i)}$ holds. Furthermore, for $(y^n,\phi_n(x^n))\in\mathcal{D}_n^{(i)}\cap\mathcal{G}_n^{(i)}$ we have

$$P_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))\stackrel{(47)}{<}e^{n\gamma_n}P_{Y_i^n\phi_n(X_i^n)}(y^n,\phi_n(x^n))\stackrel{(49)}{<}e^{nE}Q_{Y^n\phi_n(X^n)}(y^n,\phi_n(x^n))$$
$$\Rightarrow(y^n,\phi_n(x^n))\in\mathcal{A}_n^c.\quad (51)$$

Using the above analysis, we perform the following two change of measure steps:

$$P_{Y_i^n\phi_n(X_i^n)}(\mathcal{A}_n^c)+e^{-n\gamma_n}\stackrel{(a)}{\geq}P_{Y_i^n\phi_n(X_i^n)}(\mathcal{D}_n^{(i)}\cap\mathcal{G}_n^{(i)})+P_{Y_i^n\phi_n(X_i^n)}[(\mathcal{G}_n^{(i)})^c]\geq P_{Y_i^n\phi_n(X_i^n)}(\mathcal{D}_n^{(i)})\stackrel{(b)}{\geq}P_{Y_i^n\phi_n(X_i^n)}(\mathcal{C}_n^{(i)}),\quad (52)$$

where

  • in $(a)$ we change the measure from $P_{Y^n\phi_n(X^n)}$ to $P_{Y_i^n\phi_n(X_i^n)}$,

  • in $(b)$ we change the measure from $Q_{Y^n\phi_n(X^n)}$ to $Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}$.

In summary we have

$$\alpha_n+e^{nE}\beta_n+e^{-n\gamma_n}\geq\sum_{i=1}^{m}\nu_i P_{Y_i^n\phi_n(X_i^n)}(\mathcal{C}_n^{(i)}).\quad (53)$$

In combination with (44), since $e^{nE}\beta_n\to 0$ as $n\to\infty$ holds, we obtain

$$\lim_{n\to\infty}P_{Y_i^n\phi_n(X_i^n)}(\mathcal{C}_n^{(i)})=0,\quad\forall i\in[1:m].\quad (54)$$

In the next step, for each $i\in[1:m]$ with $i\in\mathfrak{F}_s$, we will perform a change of measure from $Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}$ to $P_{Y_i^n}\times P_{\bar{\phi}_n^s(X_i^n)}$, where $\bar{\phi}_n^s$ is a compression mapping for each class $\mathfrak{F}_s$ that is constructed from $\phi_n$.

For a given $i\in[1:m]$, consider the problem of differentiating between $P_{Y_iX_i}^{\otimes n}$ and $Q_{Y_{j_i^\star}^n}\times Q_{X_{t_s^\star}^n}$, where $i\in\mathfrak{F}_s$, via the testing scheme $(\phi_n,\mathbf{1}_{(\mathcal{C}_n^{(i)})^c})$. The corresponding error probabilities are given by

$$P_{Y_i^n\phi_n(X_i^n)}(\mathcal{C}_n^{(i)}),\qquad Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}[(\mathcal{C}_n^{(i)})^c].\quad (55)$$

Note that by the definition of $\mathcal{C}_n^{(i)}$ we have

$$Q_{Y_{j_i^\star}^n}\times Q_{\phi_n(X_{t_s^\star}^n)}[(\mathcal{C}_n^{(i)})^c]\leq e^{-n(E-\gamma_n+\log\gamma_q/n)}P_{Y_i^n\phi_n(X_i^n)}[(\mathcal{C}_n^{(i)})^c]\leq e^{-n(E-\gamma_n+\log\gamma_q/n)}.\quad (56)$$

We want to transform the given testing scheme to obtain a new testing scheme $(\bar{\phi}_n^s,\bar{\psi}_n^{(i)})$, $i\in\mathfrak{F}_s$, for differentiating between $P_{Y_iX_i}^{\otimes n}$ and $P_{Y_i}^{\otimes n}\times P_{X_i}^{\otimes n}$. This can be done using arguments similar to those given in Appendix A, as follows. For the given positive $\gamma$ we define, for the given $s\in[1:|\mathcal{P}_{\mathcal{X}}|]$, a typical subset of $\mathcal{X}^n$

n,γs={xn|ιP𝒳,snQXtsn(xn)/ndsx|<γ}.\displaystyle\mathcal{B}_{n,\gamma}^{s}=\{x^{n}\mid|\iota_{P_{\mathcal{X},s}^{\otimes n}\|Q_{X_{t_{s}^{\star}}^{n}}}(x^{n})/n-d_{s}^{\mathrm{x}}|<\gamma\}. (57)

Similarly for the given ii we define a typical subset of 𝒴n\mathcal{Y}^{n}

n,γ(i)={yn|ιPYinQYjin(yn)/ndiy|<γ}.\displaystyle\mathcal{B}_{n,\gamma}^{(i)}=\{y^{n}\mid|\iota_{P_{Y_{i}}^{\otimes n}\|Q_{Y_{j_{i}^{\star}}^{n}}}(y^{n})/n-d_{i}^{\mathrm{y}}|<\gamma\}. (58)

Then the new compression mapping ϕ¯ns\bar{\phi}_{n}^{s} is defined as

ϕ¯ns:𝒳n\displaystyle\bar{\phi}_{n}^{s}\colon\mathcal{X}^{n} {e}\displaystyle\to\mathcal{M}\cup\{e\}
ϕ¯ns(xn)\displaystyle\bar{\phi}_{n}^{s}(x^{n}) {ϕn(xn),ifxnn,γs,eotherwise.\displaystyle\mapsto\begin{dcases}\phi_{n}(x^{n}),\;&\text{if}\;x^{n}\in\mathcal{B}_{n,\gamma}^{s},\\ e\;&\text{otherwise}\end{dcases}.

The decision mapping ψ¯n(i)\bar{\psi}_{n}^{(i)} is defined as

ψ¯n(i):𝒴n×({e})\displaystyle\bar{\psi}_{n}^{(i)}\colon\mathcal{Y}^{n}\times(\mathcal{M}\cup\{e\}) {0,1}\displaystyle\to\{0,1\}
ψ¯n(i)(yn,u¯)\displaystyle\bar{\psi}_{n}^{(i)}(y^{n},\bar{u}) {𝟏(𝒞n(i))c(yn,u¯),ifynn,γ(i),andu¯e,1otherwise.\displaystyle\mapsto\begin{dcases}\mathbf{1}_{(\mathcal{C}_{n}^{(i)})^{c}}(y^{n},\bar{u}),\;&\text{if}\;y^{n}\in\mathcal{B}_{n,\gamma}^{(i)},\;\text{and}\;\bar{u}\neq e,\\ 1&\text{otherwise}\end{dcases}.
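
For concreteness, the construction above admits a direct algorithmic reading. The following Python sketch is a minimal illustration only; the callables phi_n, psi_n, in_B_x, and in_B_y are hypothetical stand-ins for ϕn\phi_{n}, the indicator 𝟏(𝒞n(i))c\mathbf{1}_{(\mathcal{C}_{n}^{(i)})^{c}}, and the typicality tests for the sets in (57) and (58).

    def phi_bar(x_seq, phi_n, in_B_x, erasure="e"):
        # Modified compressor: forward phi_n on the typical set B_{n,gamma}^s,
        # declare the erasure symbol otherwise.
        return phi_n(x_seq) if in_B_x(x_seq) else erasure

    def psi_bar(y_seq, u_bar, psi_n, in_B_y, erasure="e"):
        # Modified decision mapping: run the original test only when y^n is
        # typical and no erasure occurred; otherwise decide 1 (reject H_0).
        if u_bar != erasure and in_B_y(y_seq):
            return psi_n(y_seq, u_bar)
        return 1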

Using this testing scheme (ϕ¯ns,ψ¯n(i))(\bar{\phi}_{n}^{s},\bar{\psi}_{n}^{(i)}) we can bound the error probabilities in testing PYiXinP_{Y_{i}X_{i}}^{\otimes n} against PYin×PXinP_{Y_{i}}^{\otimes n}\times P_{X_{i}}^{\otimes n} as

PYinϕ¯ns(Xin)(1ψ¯n(i))\displaystyle P_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})}(1-\bar{\psi}_{n}^{(i)}) PYinϕn(Xin)(𝒞n(i))+PYin[(n,γ(i))c]+PXin[(n,γs)c]\displaystyle\leq P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(\mathcal{C}_{n}^{(i)})+P_{Y_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{(i)})^{c}]+P_{X_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]
PYin×Pϕ¯ns(Xin)(ψ¯n(i))\displaystyle P_{Y_{i}}^{\otimes n}\times P_{\bar{\phi}_{n}^{s}(X_{i}^{n})}(\bar{\psi}_{n}^{(i)}) QYjin×Qϕn(Xtsn)[(𝒞n(i))c]en(dsx+diy+2γ)\displaystyle\leq Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}(X_{t_{s}^{\star}}^{n})}[(\mathcal{C}_{n}^{(i)})^{c}]e^{n(d_{s}^{\mathrm{x}}+d_{i}^{\mathrm{y}}+2\gamma)}
enEi,\displaystyle\leq e^{-nE_{i}^{\prime}}, (59)

where Ei=Eγn+logγq/n[disyx+2γ]E_{i}^{\prime}=E-\gamma_{n}+\log\gamma_{q}/n-[d_{is}^{\mathrm{yx}}+2\gamma]. Then using [11, Lemma 4.1.2] we obtain the following inequality

PYinϕ¯ns(Xin)(1ψ¯n(i))\displaystyle P_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})}(1-\bar{\psi}_{n}^{(i)}) +en(Eiγ)PYin×Pϕ¯ns(Xin)(ψ¯n(i))\displaystyle+e^{n(E_{i}^{\prime}-\gamma)}P_{Y_{i}}^{\otimes n}\times P_{\bar{\phi}_{n}^{s}(X_{i}^{n})}(\bar{\psi}_{n}^{(i)})
PYinϕ¯ns(Xin)[(𝒜¯n(i))c],\displaystyle\geq P_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})}[(\bar{\mathcal{A}}_{n}^{(i)})^{c}], (60)

where

𝒜¯n(i)={(yn,ϕ¯ns(xn))\displaystyle\bar{\mathcal{A}}_{n}^{(i)}=\{(y^{n},\bar{\phi}_{n}^{s}(x^{n}))\mid PYinϕ¯ns(Xin)(yn,ϕ¯ns(xn))\displaystyle P_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})}(y^{n},\bar{\phi}_{n}^{s}(x^{n}))
en(Eiγ)PYin×Pϕ¯ns(Xin)(yn,ϕ¯ns(xn))},\displaystyle\geq e^{n(E_{i}^{\prime}-\gamma)}P_{Y_{i}}^{\otimes n}\times P_{\bar{\phi}_{n}^{s}(X_{i}^{n})}(y^{n},\bar{\phi}_{n}^{s}(x^{n}))\}, (61)

is the desired decision region of the likelihood ratio test for testing PYinϕ¯ns(Xin)P_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})} against PYin×Pϕ¯ns(Xin)P_{Y_{i}}^{\otimes n}\times P_{\bar{\phi}_{n}^{s}(X_{i}^{n})}. Using the inequalities in (59) we obtain

PYinϕ¯ns(Xin)[(𝒜¯n(i))c]\displaystyle P_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})}[(\bar{\mathcal{A}}_{n}^{(i)})^{c}]\leq PYinϕn(Xin)(𝒞n(i))+PYin[(n,γ(i))c]\displaystyle P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(\mathcal{C}_{n}^{(i)})+P_{Y_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{(i)})^{c}]
+PXin[(n,γs)c]+enγ.\displaystyle+P_{X_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]+e^{-n\gamma}. (62)

Under Assumption 1, PYin[(n,γ(i))c]0P_{Y_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{(i)})^{c}]\to 0 and PXin[(n,γs)c]0P_{X_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]\to 0 as nn\to\infty due to either the weak law of large numbers or Theorem 1 in [24]. Since γp>0\gamma_{p}>0 holds and γ>γnlogγq/n\gamma>\gamma_{n}-\log\gamma_{q}/n holds for all sufficiently large nn, by combining (54) and (62) we have for all i[1:m]i\in[1:m]

limnPr{ιYinϕ¯ns(Xin)(Yin;ϕ¯ns(Xin))<n(Edisyx4γ)}=0,\displaystyle\lim_{n\to\infty}\mathrm{Pr}\{\iota_{Y_{i}^{n}\bar{\phi}_{n}^{s}(X_{i}^{n})}(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))<n(E-d_{is}^{\mathrm{yx}}-4\gamma)\}=0, (63)

where (Yin,Xin)PYiXin(Y_{i}^{n},X_{i}^{n})\sim P_{Y_{i}X_{i}}^{\otimes n} holds. Hence, for all i[1:m]i\in[1:m], i𝔉si\in\mathfrak{F}_{s}, we have

Edisyx4γI¯(𝐘i;ϕ¯s(𝐗i)),\displaystyle E-d_{is}^{\mathrm{yx}}-4\gamma\leq\underline{I}(\mathbf{Y}_{i};\bar{\phi}_{s}(\mathbf{X}_{i})), (64)

where (𝐘i,ϕ¯s(𝐗i))={(Yin,ϕ¯ns(Xin))}n=1(\mathbf{Y}_{i},\bar{\phi}_{s}(\mathbf{X}_{i}))=\{(Y_{i}^{n},\bar{\phi}_{n}^{s}(X_{i}^{n}))\}_{n=1}^{\infty} and I¯(;)\underline{I}(\cdot;\cdot) is the spectral inf-mutual information rate, defined for a joint process (𝐔,𝐕)={(Un,Vn)}n=1(\mathbf{U},\mathbf{V})=\{(U^{n},V^{n})\}_{n=1}^{\infty} as

I¯(𝐔;𝐕)=sup{β|limnPr[ιPUnVn(Un;Vn)<nβ]=0}.\displaystyle\underline{I}(\mathbf{U};\mathbf{V})=\sup\big{\{}\beta\big{|}\lim_{n\to\infty}\mathrm{Pr}\big{[}\iota_{P_{U^{n}V^{n}}}(U^{n};V^{n})<n\beta\big{]}=0\big{\}}. (65)
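
As a sanity check of this definition (a standard fact, stated here only for orientation): when the joint process is iid, (Un,Vn)PUVn(U^{n},V^{n})\sim P_{UV}^{\otimes n}, the normalized information density is an average of iid terms, so the weak law of large numbers gives

\frac{1}{n}\iota_{P_{U^{n}V^{n}}}(U^{n};V^{n})=\frac{1}{n}\sum_{l=1}^{n}\log\frac{P_{UV}(U_{l},V_{l})}{P_{U}(U_{l})P_{V}(V_{l})}\xrightarrow{\;\mathrm{p}\;}I(U;V),

and hence I¯(𝐔;𝐕)=I(U;V)\underline{I}(\mathbf{U};\mathbf{V})=I(U;V) in that case.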

Since by [11, Theorem 3.5.2] the spectral inf-mutual information rate is upper bounded by the limit inferior of the normalized mutual information,

I¯(𝐘i;ϕ¯s(𝐗i))lim infn1nI(Yin;ϕ¯ns(Xin))\underline{I}(\mathbf{Y}_{i};\bar{\phi}_{s}(\mathbf{X}_{i}))\leq\liminf_{n\to\infty}\frac{1}{n}I(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))

holds for all i[1:m]i\in[1:m] with i𝔉si\in\mathfrak{F}_{s}, we have

Edisyx4γsupn0infnn01nI(Yin;ϕ¯ns(Xin)).\displaystyle E-d_{is}^{\mathrm{yx}}-4\gamma\leq\sup_{n_{0}}\inf_{n\geq n_{0}}\frac{1}{n}I(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n})). (66)

For each i[1:m]i\in[1:m], let ni(γ)n_{i}(\gamma) be such that

supn0infnn01nI(Yin;ϕ¯ns(Xin))infnni(γ)1nI(Yin;ϕ¯ns(Xin))+γ.\displaystyle\sup_{n_{0}}\inf_{n\geq n_{0}}\frac{1}{n}I(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))\leq\inf_{n\geq n_{i}(\gamma)}\frac{1}{n}I(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))+\gamma. (67)

Then for all i[1:m]i\in[1:m], we have

Edisyx4γinfnni(γ)1nI(Yin;ϕ¯ns(Xin))+γ.\displaystyle E-d_{is}^{\mathrm{yx}}-4\gamma\leq\inf_{n\geq n_{i}(\gamma)}\frac{1}{n}I(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))+\gamma. (68)

Let TT be a uniform random variable on [1:n][1:n] and independent of everything else. For each i[1:m]i\in[1:m] with i𝔉si\in\mathfrak{F}_{s}, we define Uil=(ϕ¯ns(Xin),Xil1)U_{il}=(\bar{\phi}_{n}^{s}(X_{i}^{n}),X_{i}^{l-1}) for all l[1:n]l\in[1:n] and Ui=(UiT,T)U_{i}=(U_{iT},T). Therefore for all i[1:m]i\in[1:m] with i𝔉si\in\mathfrak{F}_{s}, as well as for all sufficiently large nn, say nmaxini(γ)n\geq\max_{i}n_{i}(\gamma), we have

E\displaystyle E- disyx5γ1nI(Yin;ϕ¯ns(Xin))\displaystyle d_{is}^{\mathrm{yx}}-5\gamma\leq\frac{1}{n}I(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))
=1nl=1nI(Yil;ϕ¯ns(Xin)|Yil1)=(a)1nl=1nI(Yil;ϕ¯ns(Xin),Yil1)\displaystyle=\frac{1}{n}\sum_{l=1}^{n}I(Y_{il};\bar{\phi}_{n}^{s}(X_{i}^{n})|Y_{i}^{l-1})\stackrel{{\scriptstyle(a)}}{{=}}\frac{1}{n}\sum_{l=1}^{n}I(Y_{il};\bar{\phi}_{n}^{s}(X_{i}^{n}),Y_{i}^{l-1})
1nl=1nI(Yil;ϕ¯ns(Xin),Yil1,Xil1)=(b)1nl=1nI(Yil;ϕ¯ns(Xin),Xil1)\displaystyle\leq\frac{1}{n}\sum_{l=1}^{n}I(Y_{il};\bar{\phi}_{n}^{s}(X_{i}^{n}),Y_{i}^{l-1},X_{i}^{l-1})\stackrel{{\scriptstyle(b)}}{{=}}\frac{1}{n}\sum_{l=1}^{n}I(Y_{il};\bar{\phi}_{n}^{s}(X_{i}^{n}),X_{i}^{l-1})
=I(YiT;UiT,T)=I(YiT;Ui),\displaystyle=I(Y_{iT};U_{iT},T)=I(Y_{iT};U_{i}), (69)

where (a)(a) follows since YinPYinY_{i}^{n}\sim P_{Y_{i}}^{\otimes n}, and (b)(b) is valid since Yil1Xil1(Yil,ϕ¯ns(Xin))Y_{i}^{l-1}-X_{i}^{l-1}-(Y_{il},\bar{\phi}_{n}^{s}(X_{i}^{n})) forms a Markov chain. Similarly we also have

Rc+γ\displaystyle R_{c}+\gamma 1nlog|ϕ¯ns|1nI(Xin;ϕ¯ns(Xin))=1nl=1nI(Xil;ϕ¯ns(Xin),Xil1)\displaystyle\geq\frac{1}{n}\log|\bar{\phi}_{n}^{s}|\geq\frac{1}{n}I(X^{n}_{i};\bar{\phi}_{n}^{s}(X_{i}^{n}))=\frac{1}{n}\sum_{l=1}^{n}I(X_{il};\bar{\phi}_{n}^{s}(X_{i}^{n}),X_{i}^{l-1})
=I(XiT;UiT,T)=I(XiT;Ui),i[1:m].\displaystyle=I(X_{iT};U_{iT},T)=I(X_{iT};U_{i}),\;\forall i\in[1:m]. (70)
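
The time-sharing step used in (69) and (70) is the standard identity: with TT uniform on [1:n][1:n] and independent of everything else, and with YiTPYiY_{iT}\sim P_{Y_{i}} independent of TT because YinY_{i}^{n} is iid,

\frac{1}{n}\sum_{l=1}^{n}I(Y_{il};U_{il})=I(Y_{iT};U_{iT}\mid T)=I(Y_{iT};U_{iT},T)-I(Y_{iT};T)=I(Y_{iT};U_{iT},T).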

Note that for τ,η𝔉s\tau,\eta\in\mathfrak{F}_{s}, we have PUτ|XτT=PUη|XηTP_{U_{\tau}|X_{\tau T}}=P_{U_{\eta}|X_{\eta T}}. We define this common kernel for each 𝔉s\mathfrak{F}_{s} as PUs|X¯sP_{U_{s}|\bar{X}_{s}}. Therefore for all s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|] and all sufficiently large nn we have

E5γ\displaystyle E-5\gamma mini𝔉s[I(Y¯i;Us)+disyx],\displaystyle\leq\min_{i\in\mathfrak{F}_{s}}[I(\bar{Y}_{i};U_{s})+d_{is}^{\mathrm{yx}}],
Rc+γ\displaystyle R_{c}+\gamma I(X¯s;Us),\displaystyle\geq I(\bar{X}_{s};U_{s}), (71)

where for all i𝔉si\in\mathfrak{F}_{s} we have (Y¯i,X¯s,Us)PYi|Xi×P𝒳,s×PUs|X¯s(\bar{Y}_{i},\bar{X}_{s},U_{s})\sim P_{Y_{i}|X_{i}}\times P_{\mathcal{X},s}\times P_{U_{s}|\bar{X}_{s}}. By standard cardinality bound arguments [27] and taking γ0\gamma\to 0 we have that for all s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|]

Emini𝔉s\displaystyle E\leq\min_{i\in\mathfrak{F}_{s}} [I(Y¯i;U¯s)+disyx],RcI(X¯s;U¯s),\displaystyle[I(\bar{Y}_{i};\bar{U}_{s})+d_{is}^{\mathrm{yx}}],\;R_{c}\geq I(\bar{X}_{s};\bar{U}_{s}),
PY¯iX¯sU¯s=PY¯iX¯s×PU¯s|X¯s.\displaystyle P_{\bar{Y}_{i}\bar{X}_{s}\bar{U}_{s}}=P_{\bar{Y}_{i}\bar{X}_{s}}\times P_{\bar{U}_{s}|\bar{X}_{s}}. (72)

Hence Eθs(Rc)E\leq\theta_{s}(R_{c}) holds for all s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|], which leads to Emix,0(Rc)minsθs(Rc)E_{\mathrm{mix},0}^{\star}(R_{c})\leq\min_{s}\theta_{s}(R_{c}). ∎

V ϵ\epsilon-Error Exponent in Mixture Setting

In this section we characterize the maximum ϵ\epsilon-achievable error exponent in the testing against generalized independence setting in Section IV.

V-A Small ϵ\epsilon-optimality of Emix,0(Rc)E_{\mathrm{mix},0}^{\star}(R_{c})

The following partial result is an immediate consequence of the first part of Theorem 2. It states that when ϵ\epsilon is small enough, the maximum achievable error exponent Emix,0(Rc)E_{\mathrm{mix},0}^{\star}(R_{c}) is also ϵ\epsilon-optimal.

Corollary 1.

For a given RcR_{c}, if ϵ<mini[1:m]νi×min{mins𝒮1/|𝔉s|,1}\epsilon<\min_{i\in[1:m]}\nu_{i}\times\min\{\min_{s\in\mathcal{S}}1/|\mathfrak{F}_{s}|,1\}, where the inactive set 𝒮\mathcal{S} is defined as in the statement of Theorem 2, then

Emix,ϵ(Rc)=minsθs(Rc).E_{\mathrm{mix},\epsilon}^{\star}(R_{c})=\min_{s}\theta_{s}(R_{c}). (73)
Proof.

Assume that (ϕn,ψn)(\phi_{n},\psi_{n}) is a sequence of testing schemes such that the conditions in Definition 1 are satisfied for the pair (Rc,E)(R_{c},E)

lim supn1nlog|ϕn|Rc,\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log|\phi_{n}|\leq R_{c},\; lim supnαnϵ,\displaystyle\limsup_{n\to\infty}\alpha_{n}\leq\epsilon,
lim infn1n\displaystyle\liminf_{n\to\infty}\frac{1}{n} log1βnE.\displaystyle\log\frac{1}{\beta_{n}}\geq E. (74)

For an arbitrarily small γ>0\gamma>0 and for all i[1:m]i\in[1:m], i𝔉si\in\mathfrak{F}_{s}, we have QYjin×Qϕn(Xtsn)(ψn)en(Eγ)Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}(X_{t_{s}^{\star}}^{n})}(\psi_{n})\leq e^{-n(E-\gamma)} for all nn0(γ)n\geq n_{0}(\gamma). Furthermore we also have

ϵlim supnαnνilim supnPYinϕn(Xin)(1ψn),i[1:m],\displaystyle\epsilon\geq\limsup_{n\to\infty}\alpha_{n}\geq\nu_{i}\limsup_{n\to\infty}P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(1-\psi_{n}),\;\forall i\in[1:m], (75)

which implies that i[1:m]\forall i\in[1:m] we have lim supnPYinϕn(Xin)(1ψn)<min{mins𝒮1/|𝔉s|,1}\limsup_{n\to\infty}P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(1-\psi_{n})<\min\{\min_{s\in\mathcal{S}}1/|\mathfrak{F}_{s}|,1\}. By the proof of the first part of Theorem 2 we then obtain Eγ<minsθs(Rc)E-\gamma<\min_{s}\theta_{s}(R_{c}). Since γ\gamma is arbitrary, the conclusion follows. ∎

V-B A sufficient condition for characterizing Emix,ϵ(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c})

To obtain a full characterization of Emix,ϵ(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c}) we need to make an additional assumption. By [28, Theorem 2.5] the maximization in the expression of ξi(Rc)\xi_{i}(R_{c}), i𝔉si\in\mathfrak{F}_{s}, can be restricted to the set 𝒲s(Rc)={PU|X¯sI(X¯s;U)=Rc}\mathcal{W}_{s}(R_{c})=\{P_{U|\bar{X}_{s}}\mid I(\bar{X}_{s};U)=R_{c}\}. For each i𝔉si\in\mathfrak{F}_{s}, we define 𝒲s(i)(Rc)\mathcal{W}_{s}^{(i)}(R_{c}) to be the set of optimal solutions of ξi(Rc)\xi_{i}(R_{c}) within 𝒲s(Rc)\mathcal{W}_{s}(R_{c}). We define a set 𝒪s(Rc)\mathcal{O}_{s}(R_{c}) as follows:

Ifi𝔉s𝒲s(i)(Rc),then we define𝒪s(Rc)=i𝔉s𝒲s(i)(Rc),\displaystyle\text{If}\;\bigcap_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c})\neq\varnothing,\;\text{then we define}\;\mathcal{O}_{s}(R_{c})=\bigcap_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c}),
otherwise we take𝒪s(Rc)=i𝔉s𝒲s(i)(Rc).\displaystyle\text{otherwise we take}\;\mathcal{O}_{s}(R_{c})=\bigcup_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c}). (76)

For an arbitrarily given PU|X𝒪s(Rc)P_{U|X}\in\mathcal{O}_{s}(R_{c}), the quantities {I(Y¯i;U)+diy,i𝔉s}\{I(\bar{Y}_{i};U)+d_{i}^{\mathrm{y}},\;i\in\mathfrak{F}_{s}\} can be arranged in increasing order as

I(Y¯i1;U)+di1yI(Y¯i2;U)+di2yI(Y¯i|𝔉s|;U)+di|𝔉s|y.I(\bar{Y}_{i_{1}};U)+d_{i_{1}}^{\mathrm{y}}\leq I(\bar{Y}_{i_{2}};U)+d_{i_{2}}^{\mathrm{y}}\leq\dots\leq I(\bar{Y}_{i_{|\mathfrak{F}_{s}|}};U)+d_{i_{|\mathfrak{F}_{s}|}}^{\mathrm{y}}. (77)

This corresponds to a permutation σs(PU|X)\sigma_{s}(P_{U|X}) on [1:|𝔉s|][1:|\mathfrak{F}_{s}|]. It can be seen that when i𝔉s𝒲s(i)(Rc)\bigcap_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c}) is non-empty, the permutation σs()\sigma_{s}(\cdot) does not depend on the specific choice of PU|XP_{U|X} in 𝒪s(Rc)\mathcal{O}_{s}(R_{c}). To account for more general scenarios, we make the following separability assumption.

Assumption 2.

For a given set of distributions {PXiYi}i=1m\{P_{X_{i}Y_{i}}\}_{i=1}^{m}, a given set of sequences of distributions {(QYjn)}j=1k\{(Q_{Y_{j}^{n}})\}_{j=1}^{k}, a class 𝔉s\mathfrak{F}_{s}, and a rate RcR_{c}, any two kernels PU|X1P_{U|X}^{1} and PU|X2P_{U|X}^{2} in 𝒪s(Rc)\mathcal{O}_{s}(R_{c}) satisfy σs(PU|X1)=σs(PU|X2)\sigma_{s}(P_{U|X}^{1})=\sigma_{s}(P_{U|X}^{2}), i.e., the order is invariant to the choice of kernel inside 𝒪s(Rc)\mathcal{O}_{s}(R_{c}).

It can be seen that Assumption 2 is satisfied when |𝔉s|=1|\mathfrak{F}_{s}|=1 for all ss, i.e., when all PYiXiP_{Y_{i}X_{i}} have distinct marginal distributions. In the following we briefly discuss two non-trivial scenarios in which Assumption 2 can be fulfilled.

Example 1: In this example we assume that {QYjn}={PYin}\{Q_{Y_{j}^{n}}\}=\{P_{Y_{i}}^{\otimes n}\} and {QXtn}={P𝒳,sn}\{Q_{X_{t}^{n}}\}=\{P_{\mathcal{X},s}^{\otimes n}\} hold. This implies that for all i[1:m]i\in[1:m] and for all s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|], we have diy=0d_{i}^{\mathrm{y}}=0 and dsx=0d_{s}^{\mathrm{x}}=0. We then assume further that within each class 𝔉s\mathfrak{F}_{s} the set of channels PY¯i|X¯sP_{\bar{Y}_{i}|\bar{X}_{s}} can be ordered according to the less noisy relation (a channel PY|XP_{Y|X} is less noisy [29] than a channel PZ|XP_{Z|X} if for every PXUP_{XU} we have I(Y;U)I(Z;U)I(Y;U)\geq I(Z;U)). Then for all PU|X𝒪s(Rc)P_{U|X}\in\mathcal{O}_{s}(R_{c}), we have

I(Y¯i1;U)I(Y¯i2;U)I(Y¯i|𝔉s|;U),\displaystyle I(\bar{Y}_{i_{1}};U)\leq I(\bar{Y}_{i_{2}};U)\leq\dots\leq I(\bar{Y}_{i_{|\mathfrak{F}_{s}|}};U), (78)

for some fixed order {i1,,i|𝔉s|}\{i_{1},\dots,i_{|\mathfrak{F}_{s}|}\} which does not depend on whether i𝔉s𝒲s(i)(Rc)\bigcap_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c}) is empty or not. Hence Assumption 2 is satisfied.

Example 2: We consider another example in which the set of distributions in the null hypothesis is {PY¯i|X¯×PX¯}i=12\{P_{\bar{Y}_{i}|\bar{X}}\times P_{\bar{X}}\}_{i=1}^{2}. We assume further that PY¯1|X¯P_{\bar{Y}_{1}|\bar{X}} is an erasure channel with erasure probability tt. We also assume that {QYjn}={PY¯in}\{Q_{Y_{j}^{n}}\}=\{P_{\bar{Y}_{i}}^{\otimes n}\} and {QXtn}={PX¯n}\{Q_{X_{t}^{n}}\}=\{P_{\bar{X}}^{\otimes n}\} hold in this example. Then ξ1(Rc)=(1t)Rc\xi_{1}(R_{c})=(1-t)R_{c}, which can be achieved by any kernel PU1|X¯P_{U_{1}|\bar{X}} such that I(X¯;U1)=RcI(\bar{X};U_{1})=R_{c}. This implies that 𝒲(1)(Rc)𝒲(2)(Rc)=𝒲(2)(Rc)\mathcal{W}^{(1)}(R_{c})\cap\mathcal{W}^{(2)}(R_{c})=\mathcal{W}^{(2)}(R_{c}), which is a non-empty set, and hence Assumption 2 is satisfied.
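
The value ξ1(Rc)=(1t)Rc\xi_{1}(R_{c})=(1-t)R_{c} claimed above follows from the standard erasure-channel identity. Let BB be the indicator of an erasure event, so that BB is a function of Y¯1\bar{Y}_{1} and is independent of (X¯,U1)(\bar{X},U_{1}). Then

I(\bar{Y}_{1};U_{1})=I(\bar{Y}_{1},B;U_{1})=I(B;U_{1})+I(\bar{Y}_{1};U_{1}\mid B)=(1-t)I(\bar{X};U_{1}),

since given B=0B=0 we have Y¯1=X¯\bar{Y}_{1}=\bar{X} and given B=1B=1 the output is the constant ee. Maximizing (1t)I(X¯;U1)(1-t)I(\bar{X};U_{1}) subject to I(X¯;U1)RcI(\bar{X};U_{1})\leq R_{c}, and recalling that the dd-terms vanish in this example, yields ξ1(Rc)=(1t)Rc\xi_{1}(R_{c})=(1-t)R_{c}.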

Assumption 2 implies that at a given RcR_{c} we have

ξi1(Rc)ξi2(Rc)ξi|𝔉s|(Rc).\xi_{i_{1}}(R_{c})\leq\xi_{i_{2}}(R_{c})\leq\dots\leq\xi_{i_{|\mathfrak{F}_{s}|}}(R_{c}). (79)

This can be seen as follows. If i𝔉s𝒲s(i)(Rc)\bigcap_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c})\neq\varnothing, we can take any kernel inside 𝒪s(Rc)\mathcal{O}_{s}(R_{c}) to achieve ξil(Rc)\xi_{i_{l}}(R_{c}) for all l[1:|𝔉s|]l\in[1:|\mathfrak{F}_{s}|]. Hence (77) leads to (79). Otherwise, we take a kernel PU|X𝒲s(i1)(Rc)P_{U|X}\in\mathcal{W}_{s}^{(i_{1})}(R_{c}) to obtain ξi1(Rc)I(Y¯i2;U)+di2yξi2(Rc)\xi_{i_{1}}(R_{c})\leq I(\bar{Y}_{i_{2}};U)+d_{i_{2}}^{\mathrm{y}}\leq\xi_{i_{2}}(R_{c}) since 𝒲s(i1)(Rc)𝒪s(Rc)\mathcal{W}_{s}^{(i_{1})}(R_{c})\subseteq\mathcal{O}_{s}(R_{c}) holds and so on. Furthermore we observe that for all l[1:|𝔉s|]l\in[1:|\mathfrak{F}_{s}|], we have

ξil(Rc)\displaystyle\xi_{i_{l}}(R_{c}) maxPU|X¯s:I(X¯s;U)Rcmini{iη}η=l|𝔉s|[I(Y¯i;U)+disyx]\displaystyle\geq\max_{P_{U|\bar{X}_{s}}\colon I(\bar{X}_{s};U)\leq R_{c}}\min_{i\in\{i_{\eta}\}_{\eta=l}^{|\mathfrak{F}_{s}|}}[I(\bar{Y}_{i};U)+d_{is}^{\mathrm{yx}}]
maxPU|X¯s𝒪s(Rc)mini{iη}η=l|𝔉s|[I(Y¯i;U)+disyx]\displaystyle\geq\max_{P_{U|\bar{X}_{s}}\in\mathcal{O}_{s}(R_{c})}\min_{i\in\{i_{\eta}\}_{\eta=l}^{|\mathfrak{F}_{s}|}}[I(\bar{Y}_{i};U)+d_{is}^{\mathrm{yx}}]
=(77)maxPU|X¯s𝒪s(Rc)[I(Y¯il;U)+dilsyx]\displaystyle\stackrel{{\scriptstyle\eqref{fixed_order}}}{{=}}\max_{P_{U|\bar{X}_{s}}\in\mathcal{O}_{s}(R_{c})}[I(\bar{Y}_{i_{l}};U)+d_{i_{l}s}^{\mathrm{yx}}]
()ξil(Rc).\displaystyle\stackrel{{\scriptstyle(*)}}{{\geq}}\xi_{i_{l}}(R_{c}). (80)

()(*) can be verified as follows. If i𝔉s𝒲s(i)(Rc)\bigcap_{i\in\mathfrak{F}_{s}}\mathcal{W}_{s}^{(i)}(R_{c})\neq\varnothing, we can take any kernel inside 𝒪s(Rc)\mathcal{O}_{s}(R_{c}) to achieve ξil(Rc)\xi_{i_{l}}(R_{c}). Otherwise, 𝒪s(Rc)𝒲s(il)(Rc)\mathcal{O}_{s}(R_{c})\supseteq\mathcal{W}_{s}^{(i_{l})}(R_{c}) holds, and ()(*) follows. Therefore, we have the following relation

ξil(Rc)=maxPU|X¯s:I(X¯s;U)Rcmini{iη}η=l|𝔉s|[I(Y¯i;U)+disyx].\xi_{i_{l}}(R_{c})=\max_{P_{U|\bar{X}_{s}}\colon I(\bar{X}_{s};U)\leq R_{c}}\min_{i\in\{i_{\eta}\}_{\eta=l}^{|\mathfrak{F}_{s}|}}[I(\bar{Y}_{i};U)+d_{is}^{\mathrm{yx}}]. (81)

Assume that at a given RcR_{c}, Assumption 2 is valid. By relabeling elements in the set {PYiXi}i=1m\{P_{Y_{i}X_{i}}\}_{i=1}^{m} if necessary, we assume that (ξi(Rc))i=1m(\xi_{i}(R_{c}))_{i=1}^{m} is an increasing sequence. An example of such an ordering is given in Fig. 2; the relabeling step can also be carried out numerically, as shown below.
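
A minimal numerical sketch of this relabeling step (assuming finite alphabets; the array names are hypothetical): given a kernel PU|XP_{U|X}, the scores I(Y¯i;U)+diyI(\bar{Y}_{i};U)+d_{i}^{\mathrm{y}} can be computed and sorted as follows.

    import numpy as np

    def mutual_information(p_joint):
        # I in nats for a joint pmf given as a 2-D numpy array.
        p1 = p_joint.sum(axis=1, keepdims=True)
        p2 = p_joint.sum(axis=0, keepdims=True)
        mask = p_joint > 0
        return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p1 * p2)[mask])))

    def ordering(p_u_given_x, p_x, channels, d_y):
        # channels[i][x, y] = P_{Y_i|X}(y|x); returns the indices i_1, i_2, ...
        # sorted by the score I(Y_i;U) + d_i^y, cf. (77).
        p_xu = p_u_given_x * p_x[:, None]              # P_{XU}(x, u)
        scores = [mutual_information(ch.T @ p_xu) + d  # P_{Y_i U} = P_{Y_i|X}^T P_{XU}
                  for ch, d in zip(channels, d_y)]
        return np.argsort(scores)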

Figure 2: Illustration of Emix,ϵ(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c}) under Assumption 2. For comparison note that under Assumption 2 Ecomp,ϵ(Rc)E_{\mathrm{comp},\epsilon}^{\star}(R_{c}) does not depend on ϵ\epsilon. At l=2l=2 the left-over sets are given by 𝔉1(2)={4}\mathfrak{F}_{1}(2)=\{4\}, 𝔉2(2)={2,3}\mathfrak{F}_{2}(2)=\{2,3\}.

For each l[1:m]l\in[1:m] and each ss, for notational simplicity we define a left-over subset of 𝔉s\mathfrak{F}_{s} as 𝔉s(l)=𝔉s\[1:l1]\mathfrak{F}_{s}(l)=\mathfrak{F}_{s}\backslash[1:l-1]. Since the ordering is unique, (79) implies that i1i2i|𝔉s|i_{1}\leq i_{2}\leq\dots\leq i_{|\mathfrak{F}_{s}|}. Therefore, when 𝔉s(l)\mathfrak{F}_{s}(l)\neq\varnothing, we have

ξmin𝔉s(l)(Rc)=maxPU|X¯s:I(X¯s;U)Rcmini𝔉s(l)[I(Y¯i;U)+disyx].\xi_{\min\mathfrak{F}_{s}(l)}(R_{c})=\max_{P_{U|\bar{X}_{s}}\colon I(\bar{X}_{s};U)\leq R_{c}}\min_{i\in\mathfrak{F}_{s}(l)}[I(\bar{Y}_{i};U)+d_{is}^{\mathrm{yx}}]. (82)

The above analysis leads to the following result.

Proposition 1.

Assume that at a given RcR_{c} Assumption 2 is fulfilled and (ξi(Rc))i=1m(\xi_{i}(R_{c}))_{i=1}^{m} is an increasing sequence. Then for each l[1:m]l\in[1:m] the following holds

minsmaxPU|X¯s:I(X¯s;U)Rcmini𝔉s(l)[I(Y¯i;U)+disyx]=ξl(Rc).\displaystyle\min_{s}\max_{P_{U|\bar{X}_{s}}\colon I(\bar{X}_{s};U)\leq R_{c}}\min_{i\in\mathfrak{F}_{s}(l)}[I(\bar{Y}_{i};U)+d_{is}^{\mathrm{yx}}]=\xi_{l}(R_{c}). (83)

The left-hand side in (83) is the maximum achievable error exponent in testing {PYiXin}i[l:m]\{P_{Y_{i}X_{i}}^{\otimes n}\}_{i\in[l:m]} against {QYjn×QXtn}\{Q_{Y_{j}^{n}}\times Q_{X_{t}^{n}}\}. This result will be used to establish the optimal ϵ\epsilon-error exponent in this section.

Proof.

Due to the above analysis, cf. (82), the left-hand side equals

mins:𝔉s(l)ξmin𝔉s(l)(Rc)=minis𝔉s(l)ξi(Rc)\displaystyle\min_{s\colon\mathfrak{F}_{s}(l)\neq\varnothing}\xi_{\min\mathfrak{F}_{s}(l)}(R_{c})=\min_{i\in\bigcup_{s}\mathfrak{F}_{s}(l)}\xi_{i}(R_{c})
=mini[l:m]ξi(Rc)=ξl(Rc).\displaystyle=\min_{i\in[l:m]}\xi_{i}(R_{c})=\xi_{l}(R_{c}). (84) ∎

V-C Characterization of Emix,ϵ(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c}) under Assumption 2

A complete characterization of Emix,ϵ(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c}) under Assumption 2 is provided in the following.

Theorem 4.

Assume that at a given RcR_{c}, Assumption 2 holds and (ξi(Rc))i=1m(\xi_{i}(R_{c}))_{i=1}^{m} is an increasing sequence. Then we have

Emix,ϵ(Rc)=i=1mξi(Rc)𝟏[j=1i1νj,j=1iνj)(ϵ).\displaystyle E_{\mathrm{mix},\epsilon}^{\star}(R_{c})=\sum_{i=1}^{m}\xi_{i}(R_{c})\mathbf{1}_{[\sum_{j=1}^{i-1}\nu_{j},\sum_{j=1}^{i}\nu_{j})}(\epsilon). (85)
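
Numerically, (85) is a simple staircase in ϵ\epsilon. A minimal helper (with hypothetical array names), assuming the values ξi(Rc)\xi_{i}(R_{c}) and the weights νi\nu_{i} are given:

    import numpy as np

    def e_mix_eps(eps, xi, nu):
        # Staircase (85): return xi[i] for the unique i with
        # nu[0]+...+nu[i-1] <= eps < nu[0]+...+nu[i]  (0-indexed arrays).
        i = int(np.searchsorted(np.cumsum(nu), eps, side="right"))
        return xi[i]

    # Toy values with four components, xi increasing as assumed in Theorem 4.
    xi = [0.2, 0.5, 0.7, 1.1]
    nu = [0.4, 0.3, 0.2, 0.1]
    assert e_mix_eps(0.0, xi, nu) == 0.2 and e_mix_eps(0.45, xi, nu) == 0.5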

The behavior of Emix,ϵ(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c}) is depicted in Fig. 2. The proof of Theorem 4 uses the result from the second part of Theorem 2 and demonstrates an application of the exponentially strong converse. We briefly describe the proof idea in the following.

Fix an ϵ\epsilon such that ϵ[j=1i1νj,j=1iνj)\epsilon\in[\sum_{j=1}^{i-1}\nu_{j},\sum_{j=1}^{i}\nu_{j}) holds. We build an ϵ\epsilon-achievable sequence of testing schemes based on a compound hypothesis testing problem in which we need to specify two sets of distributions in the null and alternative hypotheses. The set of distributions in the alternative hypothesis is

H1:{QYjn×QXtn}j[1:k],t[1:r].\displaystyle H_{1}^{\prime}:\{Q_{Y_{j}^{n}}\times Q_{X_{t}^{n}}\}_{j\in[1:k],t\in[1:r]}. (86)

Since we are interested in showing ϵ\epsilon-error is achievable, we select the set of distributions in the null hypothesis as

H0:{PYlXln}l[i:m].\displaystyle H_{0}^{\prime}:\{P_{Y_{l}X_{l}}^{\otimes n}\}_{l\in[i:m]}. (87)

We omit the other distributions PYlXlnP_{Y_{l}X_{l}}^{\otimes n} for l[1:i1]l\in[1:i-1] in the null hypothesis of the above problem because in the mixture setting they contribute only a total error probability of at most j=1i1νj\sum_{j=1}^{i-1}\nu_{j}, which is what we desire. Given a sequence of testing schemes (ϕn,ψn)(\phi_{n},\psi_{n}) for which EE is achievable in testing H0H_{0}^{\prime} against H1H_{1}^{\prime}, we obtain as by-products the collection of intersected decision regions n(l)(E)\mathcal{I}_{n}^{(l)}(E) with vanishing complement probability, cf. Theorem 1.
We use ϕn\phi_{n} to compress xnx^{n}. Next we select a decision region 𝒜n\mathcal{A}_{n} for testing PYnXnP_{Y^{n}X^{n}} against QYnXnQ_{Y^{n}X^{n}} such that for all l[i:m]l\in[i:m], 𝒜nc(n(l)(E))c\mathcal{A}_{n}^{c}\subset(\mathcal{I}_{n}^{(l)}(E))^{c} holds. This ensures that PYlnϕn(Xln)(𝒜nc)0P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(\mathcal{A}_{n}^{c})\to 0 for all l[i:m]l\in[i:m]. Hence we have lim supnPYnϕn(Xn)[𝒜nc]j=1i1νjϵ\limsup_{n\to\infty}P_{Y^{n}\phi_{n}(X^{n})}[\mathcal{A}_{n}^{c}]\leq\sum_{j=1}^{i-1}\nu_{j}\leq\epsilon.
As in the converse proof of Theorem 3, assume that we have a likelihood-based decision region 𝒜n\mathcal{A}_{n} for testing PYnXnP_{Y^{n}X^{n}} against QYnXnQ_{Y^{n}X^{n}}. We then use change-of-measure steps to obtain for each i[1:m]i\in[1:m] a likelihood-based rejection region for testing PYinϕn(Xin)P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})} against QYjin×Qϕn(Xtsn)Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}(X_{t_{s}^{\star}}^{n})}, called 𝒞n(i)\mathcal{C}_{n}^{(i)}. Then for all sufficiently large nn we have

ϵ>l=1mνlPYlnϕn(Xln)(𝒞n(l)).\displaystyle\epsilon>\sum_{l=1}^{m}\nu_{l}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(\mathcal{C}_{n}^{(l)}). (88)

We show that for all l[1:i]l\in[1:i], PYlnϕn(Xln)(𝒞n(l))P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(\mathcal{C}_{n}^{(l)}) converges to 1 when E>ξi(Rc)+3γE>\xi_{i}(R_{c})+3\gamma holds. Taking the limit superior of both sides of the above inequality, we obtain ϵj=1iνj\epsilon\geq\sum_{j=1}^{i}\nu_{j}, a contradiction. Hence we must have Eξi(Rc)E\leq\xi_{i}(R_{c}).
When no compression is involved, i.e., ϕn\phi_{n} is the identity mapping, the convergence of PYlnXln(𝒞n(l))P_{Y_{l}^{n}X_{l}^{n}}(\mathcal{C}_{n}^{(l)}) to 1 can be seen from the weak law of large numbers since 𝒞n(l)\mathcal{C}_{n}^{(l)} is characterized by the likelihood ratio. In our case this is not possible since the likelihood ratio cannot be factorized as a sum of identical components. Using strong converse arguments, which guarantee convergence in the limit superior sense, it can be seen that if ϵ[maxj[1:i1]νj,maxj[1:i]νj)\epsilon\in[\max_{j\in[1:i-1]}{\nu_{j}},\max_{j\in[1:i]}{\nu_{j}}) holds, then we must have Eξi(Rc)E\leq\xi_{i}(R_{c}). This approach does not match the achievability result and does not yield conclusive information when ϵ>maxi[1:m]νi\epsilon>\max_{i\in[1:m]}\nu_{i}. The required convergence of PYlnϕn(Xln)(𝒞n(l))P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(\mathcal{C}_{n}^{(l)}) is guaranteed by the second part of Theorem 2, since E>ξl(Rc)E>\xi_{l}(R_{c}) holds for all l[1:i]l\in[1:i] due to Assumption 2.

Proof.

Achievability: Given i[1:m]i\in[1:m], consider a reduced compound problem of designing a testing scheme (ϕn,ψn)(\phi_{n},\psi_{n}) to differentiate between {PYlXln}l[i:m]\{P_{Y_{l}X_{l}}^{\otimes n}\}_{l\in[i:m]} and {QYjn×QXtn}j[1:k],t[1:r]\{Q_{Y_{j}^{n}}\times Q_{X_{t}^{n}}\}_{j\in[1:k],t\in[1:r]}. By Theorem 1 and Proposition 1, for each γ>0\gamma>0, there exists a sequence of testing schemes for this reduced compound setting such that E+γ/2=ξi(Rc)γ/2E+\gamma/2=\xi_{i}(R_{c})-\gamma/2 is achievable, i.e.,

limnPYlnϕn(Xln)[(n(l)(E))c]=0,l[i:m].\displaystyle\lim_{n\to\infty}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}[(\mathcal{I}_{n}^{(l)}(E))^{c}]=0,\;\forall l\in[i:m]. (89)

In contrast to the achievability proof of Theorem 3, although we can use the same sequence of compression mappings (ϕn)(\phi_{n}) to compress xnx^{n}, we need to define a new sequence of decision mappings ψ¯n\bar{\psi}_{n}. For each nn and given the compression mapping ϕn\phi_{n}, define the following measure

P¯n=l=1mPYlnϕn(Xln).\displaystyle\bar{P}_{n}=\sum_{l=1}^{m}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}. (90)

Define an acceptance region for our problem of testing PYnXnP_{Y^{n}X^{n}} against QYnXnQ_{Y^{n}X^{n}} as follows

𝒜n={(yn,ϕn(xn))\displaystyle\mathcal{A}_{n}=\big{\{}(y^{n},\phi_{n}(x^{n}))\mid P¯n(yn,ϕn(xn))\displaystyle\bar{P}_{n}(y^{n},\phi_{n}(x^{n}))
maxj[1:k],t[1:r]QYjn×Qϕn(Xtn)(yn,ϕn(xn))enE}.\displaystyle\geq\max_{j\in[1:k],t\in[1:r]}Q_{Y_{j}^{n}}\times Q_{\phi_{n}(X_{t}^{n})}(y^{n},\phi_{n}(x^{n}))e^{nE}\big{\}}. (91)
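
In the log domain, the test (91) is a single threshold comparison. A schematic sketch follows; the log-probability inputs are assumed to be supplied externally, so this is an illustration of the decision rule rather than a full implementation.

    import numpy as np

    def accept_H0(log_p_components, log_q_products, n, E):
        # log_p_components[l] = log P_{Y_l^n, phi_n(X_l^n)}(y^n, u); their
        # (unweighted) log-sum-exp is log \bar{P}_n(y^n, u), cf. (90).
        log_p_bar = np.logaddexp.reduce(log_p_components)
        # (91): accept H_0 iff \bar{P}_n >= e^{nE} max_{j,t} Q_{Y_j^n} x Q_{phi_n(X_t^n)}.
        return log_p_bar >= n * E + np.max(log_q_products)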

Then the probability of miss detection is given by

βn=QYnϕn(Xn)(𝒜n)\displaystyle\beta_{n}=Q_{Y^{n}\phi_{n}(X^{n})}(\mathcal{A}_{n}) krmaxj[1:k],t[1:r]QYjn×Qϕn(Xtn)(𝒜n)\displaystyle\leq kr\max_{j\in[1:k],t\in[1:r]}Q_{Y_{j}^{n}}\times Q_{\phi_{n}(X_{t}^{n})}(\mathcal{A}_{n})
krenEP¯n(𝒜n)krmenE.\displaystyle\leq kre^{-nE}\bar{P}_{n}(\mathcal{A}_{n})\leq krme^{-nE}. (92)

For the given PYnXnP_{Y^{n}X^{n}}, the false alarm probability is given by

PYnϕn(Xn)(𝒜nc)=l=1mνlPYlnϕn(Xln)(𝒜nc).\displaystyle P_{Y^{n}\phi_{n}(X^{n})}(\mathcal{A}_{n}^{c})=\sum_{l=1}^{m}\nu_{l}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(\mathcal{A}_{n}^{c}). (93)

For each l[1:m]l\in[1:m], if (yn,ϕn(xn))𝒜nc(y^{n},\phi_{n}(x^{n}))\in\mathcal{A}_{n}^{c} then we have

PYlnϕn(Xln)(yn,ϕn(xn))\displaystyle P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(y^{n},\phi_{n}(x^{n})) P¯n(yn,ϕn(xn))\displaystyle\leq\bar{P}_{n}(y^{n},\phi_{n}(x^{n}))
<enEmaxj[1:k],t[1:r]QYjn×Qϕn(Xtn)(yn,ϕn(xn)).\displaystyle<e^{nE}\max_{j\in[1:k],t\in[1:r]}Q_{Y_{j}^{n}}\times Q_{\phi_{n}(X_{t}^{n})}(y^{n},\phi_{n}(x^{n})). (94)

This means that (yn,ϕn(xn))(n(l)(E))c(y^{n},\phi_{n}(x^{n}))\in(\mathcal{I}_{n}^{(l)}(E))^{c}, cf. (17) for the definition. In summary we have

PYnϕn(Xn)(𝒜nc)\displaystyle P_{Y^{n}\phi_{n}(X^{n})}(\mathcal{A}_{n}^{c}) =lνlPYlnϕn(Xln)[𝒜nc]\displaystyle=\sum_{l}\nu_{l}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}[\mathcal{A}_{n}^{c}]
lνlPYlnϕn(Xln)[(n(l)(E))c]\displaystyle\leq\sum_{l}\nu_{l}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}[(\mathcal{I}_{n}^{(l)}(E))^{c}]
l[1:i1]νl+l[i:m]νlPYlnϕn(Xln)[(n(l)(E))c].\displaystyle\leq\sum_{l\in[1:i-1]}\nu_{l}+\sum_{l\in[i:m]}\nu_{l}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}[(\mathcal{I}_{n}^{(l)}(E))^{c}]. (95)

Therefore we have

lim supnαn(89)l[1:i1]νl.\limsup_{n\to\infty}\alpha_{n}\stackrel{{\scriptstyle\eqref{eq_79}}}{{\leq}}\sum_{l\in[1:i-1]}\nu_{l}. (96)

This implies that Emix,ϵ(Rc)ξi(Rc)E_{\mathrm{mix},\epsilon}^{\star}(R_{c})\geq\xi_{i}(R_{c}) if l=1i1νlϵ<l=1iνl\sum_{l=1}^{i-1}\nu_{l}\leq\epsilon<\sum_{l=1}^{i}\nu_{l}.

Converse: Given an ϵ[j=1i1νj,j=1iνj)\epsilon\in[\sum_{j=1}^{i-1}\nu_{j},\sum_{j=1}^{i}\nu_{j}) and an arbitrary γ>0\gamma>0, assume that (E+γ)(E+\gamma) is ϵ\epsilon-achievable via a sequence of testing schemes (ϕn,ψn)(\phi_{n},\psi_{n}). In a similar fashion as in the converse proof of Theorem 3 we have,

αn+enEβn+enγn\displaystyle\alpha_{n}+e^{nE}\beta_{n}+e^{-n\gamma_{n}}
i=1mνiPYinϕn(Xin)(𝒞n(i)),\displaystyle\geq\sum_{i=1}^{m}\nu_{i}P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(\mathcal{C}_{n}^{(i)}), (97)

where the rejection regions 𝒞n(i)\mathcal{C}_{n}^{(i)} are defined as in (49). Assume that E=ξi(Rc)+3γE=\xi_{i}(R_{c})+3\gamma holds. This implies that for all l[1:i]l\in[1:i] we have E>ξl(Rc)E>\xi_{l}(R_{c}). For each l[1:i]l\in[1:i], we apply the second part of Theorem 2 to the problem of testing PYlXlnP_{Y_{l}X_{l}}^{\otimes n} against QYjln×QXtsnQ_{Y_{j_{l}^{\star}}^{n}}\times Q_{X_{t_{s}^{\star}}^{n}} where l𝔉sl\in\mathfrak{F}_{s} via the sequence of testing schemes (ϕn,𝟏(𝒞n(l))c)(\phi_{n},\mathbf{1}_{(\mathcal{C}_{n}^{(l)})^{c}}) to obtain that

limnPYlnϕn(Xln)(𝒞n(l))=1,l[1:i].\lim_{n\to\infty}P_{Y_{l}^{n}\phi_{n}(X_{l}^{n})}(\mathcal{C}_{n}^{(l)})=1,\;\forall l\in[1:i]. (98)

Since enEβn0e^{nE}\beta_{n}\to 0 as nn\to\infty, this implies that

ϵlim supnαnl=1iνl,\displaystyle\epsilon\geq\limsup_{n\to\infty}\alpha_{n}\geq\sum_{l=1}^{i}\nu_{l}, (99)

a contradiction. Therefore we must have Eξi(Rc)E\leq\xi_{i}(R_{c}) and hence Emix,ϵ(Rc)ξi(Rc)E^{\star}_{\mathrm{mix},\epsilon}(R_{c})\leq\xi_{i}(R_{c}). ∎

VI A refined relation to the WAK problem

In this section we use the techniques and results from the previous sections to establish new results for the WAK problem in which the joint distribution is a mixture of iid components.
We first recall the definition of the WAK problem. Assume that we have a joint source (Xn,Yn)PXnYn(X^{n},Y^{n})\sim P_{X^{n}Y^{n}} which takes values in an alphabet 𝒳n×𝒴n\mathcal{X}_{n}\times\mathcal{Y}_{n}, where 𝒴n\mathcal{Y}_{n} is finite or countably infinite. A code for the WAK problem is a triple of mappings

ϕ1n:𝒳n1\displaystyle\phi_{1n}\colon\mathcal{X}_{n}\to\mathcal{M}_{1} ,ϕ2n:𝒴n2,\displaystyle,\;\phi_{2n}\colon\mathcal{Y}_{n}\to\mathcal{M}_{2},
ψ¯n:1\displaystyle\bar{\psi}_{n}\colon\mathcal{M}_{1} ×2𝒴n.\displaystyle\times\mathcal{M}_{2}\to\mathcal{Y}_{n}. (100)

In the WAK setting we aim to control the error probability Pr{Y^nYn}\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\} where Y^n=ψ¯n(ϕ1n(Xn),ϕ2n(Yn))\hat{Y}^{n}=\bar{\psi}_{n}(\phi_{1n}(X^{n}),\phi_{2n}(Y^{n})). The achievable region has been characterized using the information-spectrum formula in [30]. We are interested in single-letter formulas for the (ϵ\epsilon-) achievable regions. To obtain these we relate the WAK problem to the testing against independence problem with general distributions.

The hypotheses are given by

H0\displaystyle H_{0} :(yn,xn)PYnXn,\displaystyle\colon(y^{n},x^{n})\sim P_{Y^{n}X^{n}},
H1\displaystyle H_{1} :(yn,xn)QYn×PXn,\displaystyle\colon(y^{n},x^{n})\sim Q_{Y^{n}}\times P_{X^{n}}, (101)

where QYnQ_{Y^{n}} is a distribution on 𝒴n\mathcal{Y}_{n}. Similarly we use (3) and (4) for the definition of a testing scheme in this case. Additionally, Definition 1 is taken as the definition of ϵ\epsilon-achievability.

When 𝒴\mathcal{Y} is a finite or countably infinite alphabet, the process (Yi)i=(Y_{i})_{i=-\infty}^{\infty} in the WAK problem is stationary and ergodic with a finite entropy rate, and additionally 𝒴n=𝒴n\mathcal{Y}_{n}=\mathcal{Y}^{n} and QYn=PYnQ_{Y^{n}}=P_{Y^{n}} hold, a generalized relation between the WAK problem and the hypothesis testing against independence problem has been established in our recent work [21]. The relation allows us to transfer results from the testing against independence problem to the WAK problem and vice versa. However, it is not strong enough for our current interest. In the following we study a refined relation between the two problems.

In this section we similarly assume that 𝒴n=𝒴n\mathcal{Y}_{n}=\mathcal{Y}^{n} where specifically 𝒴\mathcal{Y} is a finite alphabet. We further assume that QYn=χnQ_{Y^{n}}=\chi^{\otimes n} where χ\chi is the uniform distribution on 𝒴\mathcal{Y}. For simplicity we call this setting U(niform)-HT. Define a set

𝒜n(ϕn,t)={(yn,ϕn(xn))PYn|ϕn(Xn)(yn|ϕn(xn))t}.\displaystyle\mathcal{A}_{n}(\phi_{n},t)=\{(y^{n},\phi_{n}(x^{n}))\mid P_{Y^{n}|\phi_{n}(X^{n})}(y^{n}|\phi_{n}(x^{n}))\leq t\}. (102)

For a given WAK-code (ϕ1n,ϕ2n,ψ¯n)(\phi_{1n},\phi_{2n},\bar{\psi}_{n}) and an arbitrary number η\eta, we have

Pr{Y^n=Yn}PYnϕ1n(Xn)[(𝒜n(ϕ1n,eη/|2|))c]\displaystyle\mathrm{Pr}\{\hat{Y}^{n}=Y^{n}\}\leq P_{Y^{n}\phi_{1n}(X^{n})}[(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))^{c}]
+u1Pϕ1n(Xn)(u1)yn:PYn|ϕ1n(Xn)(yn|u1)eη/|2|yn=ψ¯n(u1,ϕ2n(yn))PYn|ϕ1n(Xn)(yn|u1)\displaystyle\hskip 28.45274pt+\sum_{u_{1}}P_{\phi_{1n}(X^{n})}(u_{1})\sum_{\begin{subarray}{c}y^{n}\colon P_{Y^{n}|\phi_{1n}(X^{n})}(y^{n}|u_{1})\leq e^{-\eta}/|\mathcal{M}_{2}|\\ y^{n}=\bar{\psi}_{n}(u_{1},\phi_{2n}(y^{n}))\end{subarray}}P_{Y^{n}|\phi_{1n}(X^{n})}(y^{n}|u_{1})
()PYnϕ1n(Xn)[(𝒜n(ϕ1n,eη/|2|))c]+eη,\displaystyle\stackrel{{\scriptstyle(*)}}{{\leq}}P_{Y^{n}\phi_{1n}(X^{n})}[(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))^{c}]+e^{-\eta}, (103)

where ()(*) holds as for a given u11u_{1}\in\mathcal{M}_{1} the number of yny^{n} satisfying yn=ψ¯n(u1,ϕ2n(yn))y^{n}=\bar{\psi}_{n}(u_{1},\phi_{2n}(y^{n})) is upper bounded by |2||\mathcal{M}_{2}|. Therefore, we obtain

Pr{Y^nYn}+eηPYnϕ1n(Xn)(𝒜n(ϕ1n,eη/|2|)).\displaystyle\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\}+e^{-\eta}\geq P_{Y^{n}\phi_{1n}(X^{n})}(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|)). (104)

Additionally, given a testing scheme (ϕn,ψn)(\phi_{n},\psi_{n}) for the U-HT problem and an arbitrary positive number γ\gamma, we also have by [11, Lemma 4.1.2]

αn+γβn\displaystyle\alpha_{n}+\gamma\beta_{n} Pr{PYnϕn(Xn)χn×Pϕn(Xn)(Yn,ϕn(Xn))γ}\displaystyle\geq\mathrm{Pr}\big{\{}\frac{P_{Y^{n}\phi_{n}(X^{n})}}{\chi^{\otimes n}\times P_{\phi_{n}(X^{n})}}(Y^{n},\phi_{n}(X^{n}))\leq\gamma\big{\}}
=PYnϕn(Xn)(𝒜n(ϕn,γ/|𝒴|n)).\displaystyle=P_{Y^{n}\phi_{n}(X^{n})}(\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n})). (105)

Similarly to [21, Theorem 2], we have the following result.

Theorem 5.

Let η\eta and γ\gamma be given positive numbers.

  • From a WAK-code (ϕ1n,ϕ2n,ψ¯n)(\phi_{1n},\phi_{2n},\bar{\psi}_{n}), we can construct a testing scheme for a U-HT problem (ϕ1n,ψn)(\phi_{1n},\psi_{n}) such that

    αn\displaystyle\alpha_{n} Pr{Y^nYn}+eη,βneη|2||𝒴|n.\displaystyle\leq\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\}+e^{-\eta},\;\beta_{n}\leq\frac{e^{\eta}|\mathcal{M}_{2}|}{|\mathcal{Y}|^{n}}. (106)
  • For a given U-HT testing scheme (ϕn,ψn)(\phi_{n},\psi_{n}), there exists a WAK-code (ϕ1n,ϕ2n,ψ¯n)(\phi_{1n},\phi_{2n},\bar{\psi}_{n}) such that

    Pr{Y^nYn}αn+γβn+|𝒴|n/(γ|2|).\displaystyle\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\}\leq\alpha_{n}+\gamma\beta_{n}+|\mathcal{Y}|^{n}/(\gamma|\mathcal{M}_{2}|). (107)
Proof.

U-HT \Leftarrow WAK: given a WAK-code (ϕ1n,ϕ2n,ψ¯n)(\phi_{1n},\phi_{2n},\bar{\psi}_{n}) we design a U-HT testing scheme as follows. We use ϕ1n\phi_{1n} to compress xnx^{n} in the U-HT setting. A decision region for the U-HT setting is given by (𝒜n(ϕ1n,eη/|2|))c(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))^{c}. Then the false alarm probability is upper bounded as

αn\displaystyle\alpha_{n} =PYnϕ1n(Xn)(𝒜n(ϕ1n,eη/|2|))\displaystyle=P_{Y^{n}\phi_{1n}(X^{n})}(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))
(104)Pr{Y^nYn}+eη.\displaystyle\stackrel{{\scriptstyle\eqref{wak_basic}}}{{\leq}}\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\}+e^{-\eta}. (108)

The miss detection probability is upper bounded as

βn\displaystyle\beta_{n} =χn×Pϕ1n(Xn)[(𝒜n(ϕ1n,eη/|2|))c]\displaystyle=\chi^{\otimes n}\times P_{\phi_{1n}(X^{n})}[(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))^{c}]
()eη|2||𝒴|nPYnϕ1n(Xn)[(𝒜n(ϕ1n,eη/|2|))c]eη|2||𝒴|n,\displaystyle\stackrel{{\scriptstyle(**)}}{{\leq}}\frac{e^{\eta}|\mathcal{M}_{2}|}{|\mathcal{Y}|^{n}}P_{Y^{n}\phi_{1n}(X^{n})}[(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))^{c}]\leq\frac{e^{\eta}|\mathcal{M}_{2}|}{|\mathcal{Y}|^{n}}, (109)

where ()(**) follows since for (yn,ϕ1n(xn))(𝒜n(ϕ1n,eη/|2|))c(y^{n},\phi_{1n}(x^{n}))\in(\mathcal{A}_{n}(\phi_{1n},e^{-\eta}/|\mathcal{M}_{2}|))^{c} we have eη|2|PYn|ϕ1n(Xn)(yn|ϕ1n(xn))1e^{\eta}|\mathcal{M}_{2}|P_{Y^{n}|\phi_{1n}(X^{n})}(y^{n}|\phi_{1n}(x^{n}))\geq 1.
U-HT \Rightarrow WAK: given a U-HT scheme (ϕn,ψn)(\phi_{n},\psi_{n}) we show the existence of a WAK-code as follows. We use ϕn\phi_{n} to compress xnx^{n} in the WAK problem. We randomly assign yny^{n} to an index m2m_{2} in an alphabet 2\mathcal{M}_{2}. For each m2m_{2} we denote the corresponding (random) set of such yny^{n} by (m2)\mathcal{B}(m_{2}). We declare that y^n\hat{y}^{n} is the original source sequence if it is the unique sequence satisfying y^n(m2)\hat{y}^{n}\in\mathcal{B}(m_{2}) and (y^n,ϕn(xn))(𝒜n(ϕn,γ/|𝒴|n))c(\hat{y}^{n},\phi_{n}(x^{n}))\in(\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n}))^{c}. For each uu\in\mathcal{M}, the cardinality of the set of yny^{n} satisfying (yn,u)(𝒜n(ϕn,γ/|𝒴|n))c(y^{n},u)\in(\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n}))^{c} is upper bounded by |𝒴|n/γ|\mathcal{Y}|^{n}/\gamma. There are two sources of error:

  • either (yn,ϕn(xn))𝒜n(ϕn,γ/|𝒴|n)(y^{n},\phi_{n}(x^{n}))\in\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n}) holds,

  • or there exists another y~n\tilde{y}^{n} for which y~n(m2)\tilde{y}^{n}\in\mathcal{B}(m_{2}) and (y~n,ϕn(xn))(𝒜n(ϕn,γ/|𝒴|n))c(\tilde{y}^{n},\phi_{n}(x^{n}))\in(\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n}))^{c} hold.

The probability of the first event is upper bounded as

Pr{(Yn,ϕn(Xn))𝒜n(ϕn,γ/|𝒴|n)}(105)αn+γβn.\displaystyle\mathrm{Pr}\{(Y^{n},\phi_{n}(X^{n}))\in\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n})\}\stackrel{{\scriptstyle\eqref{testing_abg}}}{{\leq}}\alpha_{n}+\gamma\beta_{n}. (110)

The probability of the second event is upper bounded by |𝒴|n/(γ|2|)|\mathcal{Y}|^{n}/(\gamma|\mathcal{M}_{2}|) because each sequence y~n\tilde{y}^{n} is assigned to a bin with probability 1/|2|1/|\mathcal{M}_{2}| and the number of such sequences satisfying the second event is upper bounded by |𝒴|n/γ|\mathcal{Y}|^{n}/\gamma. Hence, it can be seen that

Pr{Y^nYn}αn+γβn+|𝒴|n/(γ|2|).\displaystyle\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\}\leq\alpha_{n}+\gamma\beta_{n}+|\mathcal{Y}|^{n}/(\gamma|\mathcal{M}_{2}|). (111) ∎
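
A schematic Python sketch of this random-binning construction may be helpful; the callable passes_test, standing for the indicator of (yn,u)(𝒜n(ϕn,γ/|𝒴|n))c(y^{n},u)\in(\mathcal{A}_{n}(\phi_{n},\gamma/|\mathcal{Y}|^{n}))^{c}, and the bin assignment are hypothetical placeholders.

    import random

    def assign_bins(sequences, num_bins, seed=0):
        # Randomly assign every y^n to one of |M_2| bins, as in the proof.
        rng = random.Random(seed)
        bins = [[] for _ in range(num_bins)]
        for y in sequences:
            bins[rng.randrange(num_bins)].append(y)
        return bins

    def wak_decode(u, m2, bins, passes_test):
        # Recover y^n as the unique sequence in bin m2 whose pair (y^n, u)
        # passes the likelihood test; declare an error otherwise.
        candidates = [y for y in bins[m2] if passes_test(y, u)]
        return candidates[0] if len(candidates) == 1 else None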

Fix an ϵ[0,1)\epsilon\in[0,1). Let WAK,ϵ\mathcal{R}_{\mathrm{WAK},\epsilon} be the closure of all (Rc,R2)(R_{c},R_{2}) such that there exists a sequence of WAK-codes (ϕ1n,ϕ2n,ψ¯n)(\phi_{1n},\phi_{2n},\bar{\psi}_{n}) which satisfies

lim supn1nlog|ϕ1n|Rc,\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log|\phi_{1n}|\leq R_{c}, lim supn1nlog|ϕ2n|R2,\displaystyle\;\limsup_{n\to\infty}\frac{1}{n}\log|\phi_{2n}|\leq R_{2},
lim supnPr{Y^nYn}\displaystyle\limsup_{n\to\infty}\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\} ϵ.\displaystyle\leq\epsilon. (112)

Define R2,ϵ(Rc)=inf{R2(Rc,R2)WAK,ϵ}R_{2,\epsilon}^{\star}(R_{c})=\inf\{R_{2}\mid(R_{c},R_{2})\in\mathcal{R}_{\mathrm{WAK},\epsilon}\}. We observe that R2,ϵ(Rc)log|𝒴|R_{2,\epsilon}^{\star}(R_{c})\leq\log|\mathcal{Y}| for all ϵ[0,1)\epsilon\in[0,1) and RcR_{c}. By setting γ=|𝒴|n\gamma=|\mathcal{Y}|^{n} in (105) we obtain that Eϵ(Rc)log|𝒴|E_{\epsilon}^{\star}(R_{c})\leq\log|\mathcal{Y}|. The following result, in analogy to [21, Theorem 3], summarizes the relation between the minimum encoding rate R2,ϵ(Rc)R_{2,\epsilon}^{\star}(R_{c}) in the WAK problem and the maximum ϵ\epsilon-achievable error exponent in the U-HT problem.

Corollary 2.

For any given Rc>0R_{c}>0 and ϵ[0,1)\epsilon\in[0,1), we have

R2,ϵ(Rc)+Eϵ(Rc)=log|𝒴|.R_{2,\epsilon}^{\star}(R_{c})+E_{\epsilon}^{\star}(R_{c})=\log|\mathcal{Y}|. (113)
Proof.

We consider the extreme case where R2,ϵ(Rc)=log|𝒴|R_{2,\epsilon}^{\star}(R_{c})=\log|\mathcal{Y}| holds. Assume that E>0E>0 is an achievable error exponent in the U-HT problem with the corresponding sequence of testing schemes (ϕn,ψn)(\phi_{n},\psi_{n}). We take γ=en(Eη)\gamma=e^{n(E-\eta)} where 0<η<23E0<\eta<\frac{2}{3}E. By plugging it into the second part of Theorem 5 and choosing |2|=(|𝒴|n/γ)enη/2|\mathcal{M}_{2}|=(|\mathcal{Y}|^{n}/\gamma)e^{n\eta/2}, we obtain a sequence of WAK-codes such that lim supnPr{Y^nYn}ϵ\limsup_{n\to\infty}\mathrm{Pr}\{\hat{Y}^{n}\neq Y^{n}\}\leq\epsilon. The corresponding compression rate is log|𝒴|E+32η<R2,ϵ(Rc)\log|\mathcal{Y}|-E+\frac{3}{2}\eta<R_{2,\epsilon}^{\star}(R_{c}), a contradiction. Therefore in this case we must have Eϵ(Rc)=0E_{\epsilon}^{\star}(R_{c})=0. The other cases can be worked out similarly to the proof of [21, Theorem 3]. ∎

Assume now that the source distribution in the WAK problem (and the distribution in the U-HT problem) is given by PXnYn=i=1mνiPXiYinP_{X^{n}Y^{n}}=\sum_{i=1}^{m}\nu_{i}P_{X_{i}Y_{i}}^{\otimes n}. Further we assume that QYjn=χnQ_{Y_{j}^{n}}=\chi^{\otimes n} for all j[1:k]j\in[1:k], and {QXtn}={P𝒳,sn}\{Q_{X_{t}^{n}}\}=\{P_{\mathcal{X},s}^{\otimes n}\} as well as QYnXn=χn×PXnQ_{Y^{n}X^{n}}=\chi^{\otimes n}\times P_{X^{n}} hold. Therefore we can use results from Section III as follows. The quantities θs(Rc)\theta_{s}(R_{c}), s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|], in this case are given by

θs(Rc)\displaystyle\theta_{s}(R_{c}) =maxPU|X:I(Xs;U)Rcmini𝔉s[I(Yi;U)H(Yi)+log|𝒴|]\displaystyle=\max_{P_{U|X}\colon I(X_{s};U)\leq R_{c}}\min_{i\in\mathfrak{F}_{s}}[I(Y_{i};U)-H(Y_{i})+\log|\mathcal{Y}|]
=log|𝒴|minPU|X:I(Xs;U)Rcmaxi𝔉sH(Yi|U).\displaystyle=\log|\mathcal{Y}|-\min_{P_{U|X}\colon I(X_{s};U)\leq R_{c}}\max_{i\in\mathfrak{F}_{s}}H(Y_{i}|U). (114)

Therefore using Corollary 2 we obtain

R2,0(Rc)=maxsminPU|X:I(Xs;U)Rcmaxi𝔉sH(Yi|U).R_{2,0}^{\star}(R_{c})=\max_{s}\min_{P_{U|X}\colon I(X_{s};U)\leq R_{c}}\max_{i\in\mathfrak{F}_{s}}H(Y_{i}|U). (115)

Similarly, if Assumption 2 holds and (ξi(Rc))i=1m(\xi_{i}(R_{c}))_{i=1}^{m} is an increasing sequence, the minimum ϵ\epsilon-achievable compression rate at RcR_{c} is given by

R2,ϵ(Rc)=log|𝒴|ξi(Rc),whenj=1i1νjϵ<j=1iνj,\displaystyle R_{2,\epsilon}^{\star}(R_{c})=\log|\mathcal{Y}|-\xi_{i}(R_{c}),\;\text{when}\;\sum_{j=1}^{i-1}\nu_{j}\leq\epsilon<\sum_{j=1}^{i}\nu_{j}, (116)

where in this case for all i[1:m]i\in[1:m] we have

ξi(Rc)=log|𝒴|minPU|X:I(Xs;U)RcH(Yi|U).\xi_{i}(R_{c})=\log|\mathcal{Y}|-\min_{P_{U|X}\colon I(X_{s};U)\leq R_{c}}H(Y_{i}|U). (117)

Appendix A Hypothesis testing with two-sided compression

In this section we study code transformations between hypothesis testing problems with compression at both terminals. Although the setting considered in the following is not directly related to our main problem, the arguments presented here are useful in simplifying the proofs of the main results of this work.

Assume that xnx^{n} is available at Terminal 1 and yny^{n} is available at Terminal 2. For a given number of samples nn, a generic testing scheme involves a triple of mappings (ϕ1n,ϕ2n,ψn)(\phi_{1n},\phi_{2n},\psi_{n}) where

ϕ1n:𝒳n\displaystyle\phi_{1n}\colon\mathcal{X}^{n} 1,ϕ2n:𝒴n2,\displaystyle\to\mathcal{M}_{1},\;\phi_{2n}\colon\mathcal{Y}^{n}\to\mathcal{M}_{2},
ψn:\displaystyle\psi_{n}\colon 1×2{0,1}.\displaystyle\mathcal{M}_{1}\times\mathcal{M}_{2}\to\{0,1\}. (118)

Similarly to (5), the generic error probabilities αn\alpha_{n} and βn\beta_{n} are defined as

αn=Pϕ1n(Xn)ϕ2n(Yn)(1ψn),βn=Qϕ1n(Xn)ϕ2n(Yn)(ψn).\displaystyle\alpha_{n}=P_{\phi_{1n}(X^{n})\phi_{2n}(Y^{n})}(1-\psi_{n}),\;\beta_{n}=Q_{\phi_{1n}(X^{n})\phi_{2n}(Y^{n})}(\psi_{n}). (119)

We study a relation between two hypothesis testing problems which have the same distribution in the null hypothesis. The first problem involves the following hypotheses

H0\displaystyle H_{0} :(xn,yn)PXnYn\displaystyle\colon(x^{n},y^{n})\sim P_{X^{n}Y^{n}}
H1\displaystyle H_{1} :(xn,yn)QYn×QXn,\displaystyle\colon(x^{n},y^{n})\sim Q_{Y^{n}}\times Q_{X^{n}}, (120)

whereas the second problem considers

H¯0\displaystyle\bar{H}_{0} :(xn,yn)PXnYn\displaystyle\colon(x^{n},y^{n})\sim P_{X^{n}Y^{n}}
H¯1\displaystyle\bar{H}_{1} :(xn,yn)Q¯Yn×Q¯Xn.\displaystyle\colon(x^{n},y^{n})\sim\bar{Q}_{Y^{n}}\times\bar{Q}_{X^{n}}. (121)

We make the following technical assumptions

n,\displaystyle\forall n,\; PXnQXn,PYnQYn,\displaystyle P_{X^{n}}\ll Q_{X^{n}},\;P_{Y^{n}}\ll Q_{Y^{n}},
PXnQ¯Xn,PYnQ¯Yn.\displaystyle\;P_{X^{n}}\ll\bar{Q}_{X^{n}},\;P_{Y^{n}}\ll\bar{Q}_{Y^{n}}. (122)
Figure 3: Constructive code transformations between two hypothesis testing problems. Each arrow indicates that given a testing scheme for the source node we can construct a testing scheme for the destination node. Each transformation allows us to change the measure of miss detection at the destination node to the measure of miss detection at the source node.

An overview of our derivation is depicted in Fig. 3. Let (ϕ1n,ϕ2n,ψn)(\phi_{1n},\phi_{2n},\psi_{n}) be a testing scheme for differentiating between H0H_{0} and H1H_{1} in (120). We construct a testing scheme (ϕ¯1n,ϕ¯2n,ψ¯n)(\bar{\phi}_{1n},\bar{\phi}_{2n},\bar{\psi}_{n}) for testing H¯0\bar{H}_{0} against H¯1\bar{H}_{1} in (121) as follows. For an arbitrarily given γ>0\gamma>0, we define typical sets n,γ1\mathcal{B}_{n,\gamma}^{1}, and n,γ2\mathcal{B}_{n,\gamma}^{2} as

n,γ1\displaystyle\mathcal{B}_{n,\gamma}^{1} ={xn|[logQ¯Xn(xn)logQXn(xn)]/nAX|<γ},\displaystyle=\{x^{n}\mid|[\log\bar{Q}_{X^{n}}(x^{n})-\log Q_{X^{n}}(x^{n})]/n-A_{X}|<\gamma\},
n,γ2\displaystyle\mathcal{B}_{n,\gamma}^{2} ={yn|[logQ¯Yn(yn)logQYn(yn)]/nAY|<γ},\displaystyle=\{y^{n}\mid|[\log\bar{Q}_{Y^{n}}(y^{n})-\log Q_{Y^{n}}(y^{n})]/n-A_{Y}|<\gamma\}, (123)

where AXA_{X} and AYA_{Y} are finite numbers.
We define the compression mapping ϕ¯1n\bar{\phi}_{1n} as follows

ϕ¯1n:𝒳n\displaystyle\bar{\phi}_{1n}\colon\mathcal{X}^{n} 1{e1}\displaystyle\to\mathcal{M}_{1}\cup\{e_{1}\}
ϕ¯1n(xn)\displaystyle\bar{\phi}_{1n}(x^{n}) {ϕ1n(xn)ifxnn,γ1,e1otherwise.\displaystyle\mapsto\begin{dcases}\phi_{1n}(x^{n})\;&\text{if}\;x^{n}\in\mathcal{B}_{n,\gamma}^{1},\\ e_{1}\;&\text{otherwise}\end{dcases}. (124)

Similarly the compression mapping ϕ¯2n\bar{\phi}_{2n} is defined as

ϕ¯2n:𝒴n\displaystyle\bar{\phi}_{2n}\colon\mathcal{Y}^{n} 2{e2}\displaystyle\to\mathcal{M}_{2}\cup\{e_{2}\}
ϕ¯2n(yn)\displaystyle\bar{\phi}_{2n}(y^{n}) {ϕ2n(yn)ifynn,γ2,e2otherwise.\displaystyle\mapsto\begin{dcases}\phi_{2n}(y^{n})\;&\text{if}\;y^{n}\in\mathcal{B}_{n,\gamma}^{2},\\ e_{2}\;&\text{otherwise}\end{dcases}. (125)

The corresponding decision mapping ψ¯n\bar{\psi}_{n}, is given as

ψ¯n:k=12(k{ek})\displaystyle\bar{\psi}_{n}\colon\prod_{k=1}^{2}(\mathcal{M}_{k}\cup\{e_{k}\}) {0,1}\displaystyle\to\{0,1\}
ψ¯n(u¯1,u¯2)\displaystyle\bar{\psi}_{n}(\bar{u}_{1},\bar{u}_{2}) {ψn(u¯1,u¯2)ifu¯kek,k=1,2,1otherwise.\displaystyle\mapsto\begin{dcases}\psi_{n}(\bar{u}_{1},\bar{u}_{2})\;&\text{if}\;\bar{u}_{k}\neq e_{k},\;\forall k=1,2,\\ 1&\;\text{otherwise}\end{dcases}.
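
A compact Python sketch of this two-sided transformation (the log-probability inputs and the callables are hypothetical placeholders for the quantities in (123)-(125)):

    def in_typical_set(log_q_bar, log_q, n, A, gamma):
        # Membership test for B_{n,gamma}^k in (123):
        # | [log Qbar(v^n) - log Q(v^n)]/n - A | < gamma.
        return abs((log_q_bar - log_q) / n - A) < gamma

    def transformed_scheme(x, y, phi1, phi2, psi, typ_x, typ_y):
        # (124)-(125): forward the original messages on the typical sets and
        # emit erasures otherwise; the decision mapping rejects on any erasure.
        u1 = phi1(x) if typ_x(x) else "e1"
        u2 = phi2(y) if typ_y(y) else "e2"
        return psi(u1, u2) if (u1 != "e1" and u2 != "e2") else 1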

For notational simplicity we define, for k=1,2k=1,2 and ukku_{k}\in\mathcal{M}_{k}, 𝒲ukk=ϕkn1(uk)n,γk\mathcal{W}_{u_{k}}^{k}=\phi_{kn}^{-1}(u_{k})\cap\mathcal{B}_{n,\gamma}^{k}. The set of all (xn,yn)(x^{n},y^{n}) for which u¯1e1\bar{u}_{1}\neq e_{1} and u¯2e2\bar{u}_{2}\neq e_{2} hold is n,γ1×n,γ2\mathcal{B}_{n,\gamma}^{1}\times\mathcal{B}_{n,\gamma}^{2}, which can be factorized further as

(u1,u2)1×2(𝒲u11×𝒲u22).\displaystyle\bigcup_{(u_{1},u_{2})\in\mathcal{M}_{1}\times\mathcal{M}_{2}}(\mathcal{W}_{u_{1}}^{1}\times\mathcal{W}_{u_{2}}^{2}). (126)

In differentiating between PYnXnP_{Y^{n}X^{n}} and Q¯Yn×Q¯Xn\bar{Q}_{Y^{n}}\times\bar{Q}_{X^{n}}, the false alarm probability α¯n\bar{\alpha}_{n} induced by the testing scheme (ϕ¯1n,ϕ¯2n,ψ¯n)(\bar{\phi}_{1n},\bar{\phi}_{2n},\bar{\psi}_{n}) is upper bounded as

α¯n\displaystyle\bar{\alpha}_{n} =1(u1,u2)1×2PYnXn(𝒲u11×𝒲u22)PH0|u1,u2\displaystyle=1-\sum_{(u_{1},u_{2})\in\mathcal{M}_{1}\times\mathcal{M}_{2}}P_{Y^{n}X^{n}}(\mathcal{W}_{u_{1}}^{1}\times\mathcal{W}_{u_{2}}^{2})P_{H_{0}|u_{1},u_{2}}
1(u1,u2)1×2PYnXn[ϕ1n1(u1)×ϕ2n1(u2)]PH0|u1,u2\displaystyle\leq 1-\sum_{(u_{1},u_{2})\in\mathcal{M}_{1}\times\mathcal{M}_{2}}P_{Y^{n}X^{n}}[\phi_{1n}^{-1}(u_{1})\times\phi_{2n}^{-1}(u_{2})]P_{H_{0}|u_{1},u_{2}}
+PYnXn[([(n,γ1×𝒴n)(𝒳n×n,γ2)])c]\displaystyle+P_{Y^{n}X^{n}}[([(\mathcal{B}_{n,\gamma}^{1}\times\mathcal{Y}^{n})\cap(\mathcal{X}^{n}\times\mathcal{B}_{n,\gamma}^{2})])^{c}]
1(u1,u2)1×2PYnXn[ϕ1n1(u1)×ϕ2n1(u2)]PH0|u1,u2\displaystyle\leq 1-\sum_{(u_{1},u_{2})\in\mathcal{M}_{1}\times\mathcal{M}_{2}}P_{Y^{n}X^{n}}[\phi_{1n}^{-1}(u_{1})\times\phi_{2n}^{-1}(u_{2})]P_{H_{0}|u_{1},u_{2}}
+PXn[(n,γ1)c]+PYn[(n,γ2)c]\displaystyle\quad+P_{X^{n}}[(\mathcal{B}_{n,\gamma}^{1})^{c}]+P_{Y^{n}}[(\mathcal{B}_{n,\gamma}^{2})^{c}]
=αn+PXn[(n,γ1)c]+PYn[(n,γ2)c].\displaystyle=\alpha_{n}+P_{X^{n}}[(\mathcal{B}_{n,\gamma}^{1})^{c}]+P_{Y^{n}}[(\mathcal{B}_{n,\gamma}^{2})^{c}]. (127)

Similarly the probability of miss detection β¯n\bar{\beta}_{n} is bounded by

β¯n\displaystyle\bar{\beta}_{n} =(u1,u2)1×2Q¯Xn(𝒲u11)Q¯Yn(𝒲u22)PH0|u1,u2\displaystyle=\sum_{(u_{1},u_{2})\in\mathcal{M}_{1}\times\mathcal{M}_{2}}\bar{Q}_{X^{n}}(\mathcal{W}_{u_{1}}^{1})\bar{Q}_{Y^{n}}(\mathcal{W}_{u_{2}}^{2})P_{H_{0}|u_{1},u_{2}}
en(AX+AY+2γ)\displaystyle\leq e^{n(A_{X}+A_{Y}+2\gamma)}
×(u1,u2)1×2QXn(𝒲u11)QYn(𝒲u22)PH0|u1,u2\displaystyle\quad\times\sum_{(u_{1},u_{2})\in\mathcal{M}_{1}\times\mathcal{M}_{2}}Q_{X^{n}}(\mathcal{W}_{u_{1}}^{1})Q_{Y^{n}}(\mathcal{W}_{u_{2}}^{2})P_{H_{0}|u_{1},u_{2}}
en(AX+AY+2γ)βn.\displaystyle\leq e^{n(A_{X}+A_{Y}+2\gamma)}\beta_{n}. (128)

Given a testing scheme (ϕ¯1n,ϕ¯2n,ψ¯n)(\bar{\phi}_{1n},\bar{\phi}_{2n},\bar{\psi}_{n}) for testing PYnXnP_{Y^{n}X^{n}} against Q¯Yn×Q¯Xn\bar{Q}_{Y^{n}}\times\bar{Q}_{X^{n}}, we apply a similar procedure as from (124) to (A): we swap the positions of ϕ¯kn\bar{\phi}_{kn} and ϕkn\phi_{kn} for k=1,2k=1,2 in (124) and (125), and switch the roles of ψ¯n\bar{\psi}_{n} and ψn\psi_{n} in (A), to obtain a testing scheme (ϕ1n,ϕ2n,ψn)(\phi_{1n},\phi_{2n},\psi_{n}) for testing PYnXnP_{Y^{n}X^{n}} against QYn×QXnQ_{Y^{n}}\times Q_{X^{n}}.

The induced false alarm probability αn\alpha_{n} in testing H0H_{0} against H1H_{1} is similarly upper bounded by

αnα¯n+PXn[(n,γ1)c]+PYn[(n,γ2)c].\displaystyle\alpha_{n}\leq\bar{\alpha}_{n}+P_{X^{n}}[(\mathcal{B}_{n,\gamma}^{1})^{c}]+P_{Y^{n}}[(\mathcal{B}_{n,\gamma}^{2})^{c}]. (129)

The induced miss detection probability βn\beta_{n} in testing H0H_{0} against H1H_{1} is upper bounded as

βn\displaystyle\beta_{n} =(u¯1,u¯2)¯1ׯ2QXn(𝒲u¯11)QYn(𝒲u¯22)PH¯0|u¯1,u¯2\displaystyle=\sum_{(\bar{u}_{1},\bar{u}_{2})\in\bar{\mathcal{M}}_{1}\times\bar{\mathcal{M}}_{2}}Q_{X^{n}}(\mathcal{W}_{\bar{u}_{1}}^{1})Q_{Y^{n}}(\mathcal{W}_{\bar{u}_{2}}^{2})P_{\bar{H}_{0}|\bar{u}_{1},\bar{u}_{2}}
en(AX+AY2γ)\displaystyle\leq e^{-n(A_{X}+A_{Y}-2\gamma)}
×(u¯1,u¯2)¯1ׯ2Q¯Xn(𝒲u¯11)Q¯Yn(𝒲u¯22)PH¯0|u¯1,u¯2\displaystyle\quad\times\sum_{(\bar{u}_{1},\bar{u}_{2})\in\bar{\mathcal{M}}_{1}\times\bar{\mathcal{M}}_{2}}\bar{Q}_{X^{n}}(\mathcal{W}_{\bar{u}_{1}}^{1})\bar{Q}_{Y^{n}}(\mathcal{W}_{\bar{u}_{2}}^{2})P_{\bar{H}_{0}|\bar{u}_{1},\bar{u}_{2}}
en(AX+AY2γ)β¯n.\displaystyle\leq e^{-n(A_{X}+A_{Y}-2\gamma)}\bar{\beta}_{n}. (130)

For a given pair of compression rates (R1,R2)(R_{1},R_{2}), let E¯ϵ(R1,R2)\bar{E}_{\epsilon}^{\star}(R_{1},R_{2}) be the maximum ϵ\epsilon-achievable error exponent for testing H¯0\bar{H}_{0} against H¯1\bar{H}_{1}. Similarly let Eϵ(R1,R2)E_{\epsilon}^{\star}(R_{1},R_{2}) be the maximum ϵ\epsilon-achievable error exponent for testing H0H_{0} against H1H_{1}. To establish a relation between E¯ϵ(R1,R2)\bar{E}_{\epsilon}^{\star}(R_{1},R_{2}) and Eϵ(R1,R2)E_{\epsilon}^{\star}(R_{1},R_{2}), we make the following assumption.

Assumption 3.

The sequences of joint distributions (PXnYn)(P_{X^{n}Y^{n}}), (QXnYn)(Q_{X^{n}Y^{n}}), (Q¯XnYn)(\bar{Q}_{X^{n}Y^{n}}), and the quantities AXA_{X}, AYA_{Y}, satisfy (122) and the following conditions

γ>0,limnPXn(n,γ1)=1,limnPYn(n,γ2)=1.\displaystyle\forall\gamma>0,\;\lim_{n\to\infty}P_{X^{n}}(\mathcal{B}_{n,\gamma}^{1})=1,\;\lim_{n\to\infty}P_{Y^{n}}(\mathcal{B}_{n,\gamma}^{2})=1. (131)

We give some examples in which Assumption 3 is satisfied.

  • Assume that

    PYn=PYn,PXn\displaystyle P_{Y^{n}}=P_{Y}^{\otimes n},\;P_{X^{n}} =PXn,QXn=QXn,QYn=QYn,\displaystyle=P_{X}^{\otimes n},\;Q_{X^{n}}=Q_{X}^{\otimes n},\;Q_{Y^{n}}=Q_{Y}^{\otimes n},
    andQ¯Xn\displaystyle\text{and}\;\bar{Q}_{X^{n}} =Q¯Xn,Q¯Yn=Q¯Yn,\displaystyle=\bar{Q}_{X}^{\otimes n},\;\bar{Q}_{Y^{n}}=\bar{Q}_{Y}^{\otimes n}, (132)

    hold such that the conditions in (122) are fulfilled. Furthermore we assume that

    AX\displaystyle A_{X} =D(PXQX)D(PXQ¯X),\displaystyle=D(P_{X}\|Q_{X})-D(P_{X}\|\bar{Q}_{X}),
    andAY\displaystyle\text{and}\;A_{Y} =D(PYQY)D(PYQ¯Y)\displaystyle=D(P_{Y}\|Q_{Y})-D(P_{Y}\|\bar{Q}_{Y}) (133)

    are finite. By the weak law of large numbers, the conditions in (131) are satisfied; a numerical sketch of these quantities is given after this list.

  • Assume that (PYn)(P_{Y^{n}}) and (PXn)(P_{X^{n}}) are stationary and ergodic processes, and that (QYn)(Q_{Y^{n}}), (QXn)(Q_{X^{n}}), (Q¯Yn)(\bar{Q}_{Y^{n}}), and (Q¯Xn)(\bar{Q}_{X^{n}}) are finite-order Markov processes with stationary transition probabilities such that the conditions in (122) are satisfied. Assume further that the relative divergence rates

    DX\displaystyle D_{X} =limn[D(PXn+1QXn+1)D(PXnQXn)],\displaystyle=\lim_{n\to\infty}[D(P_{X^{n+1}}\|Q_{X^{n+1}})-D(P_{X^{n}}\|Q_{X^{n}})],
    D¯X\displaystyle\bar{D}_{X} =limn[D(PXn+1Q¯Xn+1)D(PXnQ¯Xn)],\displaystyle=\lim_{n\to\infty}[D(P_{X^{n+1}}\|\bar{Q}_{X^{n+1}})-D(P_{X^{n}}\|\bar{Q}_{X^{n}})],
    DY\displaystyle D_{Y} =limn[D(PYn+1QYn+1)D(PYnQYn)],\displaystyle=\lim_{n\to\infty}[D(P_{Y^{n+1}}\|Q_{Y^{n+1}})-D(P_{Y^{n}}\|Q_{Y^{n}})],
    D¯Y\displaystyle\bar{D}_{Y} =limn[D(PYn+1Q¯Yn+1)D(PYnQ¯Yn)],\displaystyle=\lim_{n\to\infty}[D(P_{Y^{n+1}}\|\bar{Q}_{Y^{n+1}})-D(P_{Y^{n}}\|\bar{Q}_{Y^{n}})], (134)

    are finite. With AX=DXD¯XA_{X}=D_{X}-\bar{D}_{X}, and AY=DYD¯YA_{Y}=D_{Y}-\bar{D}_{Y}, the conditions in (131) are fulfilled by [24, Theorem 1].

If Assumption 3 is fulfilled, then (127) and (128) imply that

E¯ϵ(R1,R2)Eϵ(R1,R2)[AX+AY].\displaystyle\bar{E}_{\epsilon}^{\star}(R_{1},R_{2})\geq E_{\epsilon}^{\star}(R_{1},R_{2})-[A_{X}+A_{Y}]. (135)

Conversely, (129) and (130) imply that

Eϵ(R1,R2)E¯ϵ(R1,R2)+[AX+AY].\displaystyle E_{\epsilon}^{\star}(R_{1},R_{2})\geq\bar{E}_{\epsilon}^{\star}(R_{1},R_{2})+[A_{X}+A_{Y}]. (136)

In conclusion we have shown the following result.

Theorem 6.

Given ϵ[0,1)\epsilon\in[0,1) and (R1,R2)+2(R_{1},R_{2})\in\mathbb{R}_{+}^{2}, under Assumption 3 we have

Eϵ(R1,R2)=E¯ϵ(R1,R2)+[AX+AY].\displaystyle E_{\epsilon}^{\star}(R_{1},R_{2})=\bar{E}_{\epsilon}^{\star}(R_{1},R_{2})+[A_{X}+A_{Y}]. (137)

As a corollary of this result, by setting Q¯Xn=PXn\bar{Q}_{X^{n}}=P_{X}^{\otimes n} and Q¯Yn=PYn\bar{Q}_{Y^{n}}=P_{Y}^{\otimes n}, we obtain the following result, which states that Theorem 5 in [1] is tight in a special case.

Theorem 7.

Assume that the sequences of distributions (QYn)(Q_{Y^{n}}) and (QXn)(Q_{X^{n}}) satisfy Assumption 1 with m=1m=1, r=1r=1, S=1S=1, and PY1X1=PYXP_{Y_{1}X_{1}}=P_{YX}. In testing PYXnP_{YX}^{\otimes n} against QYn×QXnQ_{Y^{n}}\times Q_{X^{n}} using one-sided compression of the sequence xnx^{n}, the maximum ϵ\epsilon-achievable error exponent is given by

Eϵ(Rc)=maxPU|X:I(U;X)RcI(Y;U)+dyx,ϵ[0,1),\displaystyle E_{\epsilon}^{\star}(R_{c})=\max_{P_{U|X}\colon I(U;X)\leq R_{c}}I(Y;U)+d^{\mathrm{yx}},\;\forall\epsilon\in[0,1), (138)

where in the first category dyx=D(PXQX)+D(PYQY)d^{\mathrm{yx}}=D(P_{X}\|Q_{X})+D(P_{Y}\|Q_{Y}), and in the second category dyx=AX(11)+AY(11)d^{\mathrm{yx}}=A_{X}^{(11)}+A_{Y}^{(11)}.

Appendix B Proof of Theorem 1

B-A A support lemma

For the achievability proof of Theorem 1 we need the following support lemma.

Lemma 1.

Let (\{P_{V_{\eta}}^{\otimes n}\},\{Q_{V_{\tau}^{n}}\}) be either (\{P_{\mathcal{X},s}^{\otimes n}\},\{Q_{X_{t}^{n}}\}) or (\{P_{Y_{i}}^{\otimes n}\},\{Q_{Y_{j}^{n}}\}) in Assumption 1. For each \eta, let d_{\eta}^{\star} be the corresponding d_{s}^{\mathrm{x}} or d_{i}^{\mathrm{y}}. For each n, define P_{V^{n}}=\sum_{\eta}\nu_{\eta}P_{V_{\eta}}^{\otimes n}, where for simplicity (\nu_{\eta}) are fixed positive numbers with \sum_{\eta}\nu_{\eta}=1. Also for each \eta, we assume that V_{\eta}^{n}\sim P_{V_{\eta}}^{\otimes n}. Then for an arbitrarily given \gamma>0 we have

limnPr{minτιPVnQVτn(Vηn)<n(dηγ)}=0.\displaystyle\lim_{n\to\infty}\mathrm{Pr}\{\min_{\tau}\iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})<n(d_{\eta}^{\star}-\gamma)\}=0. (139)
Proof.

It can be seen that

Pr{minτιPVnQVτn(Vηn)n(dηγ)}\displaystyle\mathrm{Pr}\{\min_{\tau}\iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})\leq n(d_{\eta}^{\star}-\gamma)\}
τPr{ιPVnQVτn(Vηn)n(dηγ)}\displaystyle\leq\sum_{\tau}\mathrm{Pr}\{\iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})\leq n(d_{\eta}^{\star}-\gamma)\}
τPr{ιPVηnQVτn(Vηn)n(dηγlogνη/n)}.\displaystyle\leq\sum_{\tau}\mathrm{Pr}\{\iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})\leq n(d_{\eta}^{\star}-\gamma-\log\nu_{\eta}/n)\}. (140)

If \{Q_{V_{\tau}^{n}}\} belongs to the second category, we always have P_{V_{\eta}}^{\otimes n}\ll Q_{V_{\tau}^{n}} for all (\eta,\tau). This leads to P_{V^{n}}\ll Q_{V_{\tau}^{n}} for all \tau. The second inequality then follows since P_{V^{n}}(v^{n})\geq\nu_{\eta}P_{V_{\eta}}^{\otimes n}(v^{n}) holds for all v^{n}.
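Explicitly, for every v^{n} with Q_{V_{\tau}^{n}}(v^{n})>0,

\iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(v^{n})=\log\frac{P_{V^{n}}(v^{n})}{Q_{V_{\tau}^{n}}(v^{n})}\geq\log\frac{\nu_{\eta}P_{V_{\eta}}^{\otimes n}(v^{n})}{Q_{V_{\tau}^{n}}(v^{n})}=\log\nu_{\eta}+\iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}^{n}}}(v^{n}),

so the event \{\iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})\leq n(d_{\eta}^{\star}-\gamma)\} is contained in \{\iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})\leq n(d_{\eta}^{\star}-\gamma-\log\nu_{\eta}/n)\}, which yields the second inequality in (140).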

Assume now that Q_{V_{\tau}^{n}} lies in the first category. When P_{V^{n}}\ll Q_{V_{\tau}^{n}} holds, i.e., P_{V_{\eta}}^{\otimes n}\ll Q_{V_{\tau}^{n}} for all \eta, the set \{v^{n}\mid Q_{V_{\tau}^{n}}(v^{n})=0\} can be omitted since it has zero probability. If P_{V^{n}}\not\ll Q_{V_{\tau}^{n}}, i.e., there exists an \bar{\eta} satisfying P_{V_{\bar{\eta}}}^{\otimes n}\not\ll Q_{V_{\tau}^{n}}, then for all v^{n} such that Q_{V_{\tau}^{n}}(v^{n})=0 we have \iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(v^{n})=+\infty, which in turn violates the inequality \iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}}(v^{n})\leq n(d_{\eta}^{\star}-\gamma). Therefore in both cases we only need to consider the set \{v^{n}\mid Q_{V_{\tau}^{n}}(v^{n})>0\}, on which the second inequality follows from the definitions of \iota_{P_{V^{n}}\|Q_{V_{\tau}^{n}}} and \iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}^{n}}} as well as the fact that P_{V^{n}}(v^{n})\geq\nu_{\eta}P_{V_{\eta}}^{\otimes n}(v^{n}) holds for all v^{n}. For notational simplicity, we define E_{\eta,n}=d_{\eta}^{\star}-\gamma-\log\nu_{\eta}/n. We examine the two cases in Assumption 1 separately in the following.

  • Consider the first case, in which \{Q_{V_{\tau}^{n}}\} is the set of product distributions on a common alphabet \mathcal{V}. To avoid dealing with cumbersome extended real number operations, we define the set \mathcal{A}_{\eta,n}=\{v^{n}\mid v^{n}\in\mathcal{V}^{n},\;P_{V_{\eta}}^{\otimes n}(v^{n})>0\}. (Without restricting to \mathcal{A}_{\eta,n}, we might encounter an indeterminate sum containing both +\infty and -\infty terms in the third expression below.) We further have

    Pr{\displaystyle\mathrm{Pr}\big{\{} ιPVηnQVτn(Vηn)<nEη,n}\displaystyle\iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}}^{\otimes n}}(V_{\eta}^{n})<nE_{\eta,n}\big{\}}
    =Pr{ιPVηnQVτn(Vηn)<nEη,n,Vηn𝒜η,n}\displaystyle=\mathrm{Pr}\big{\{}\iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}}^{\otimes n}}(V_{\eta}^{n})<nE_{\eta,n},V_{\eta}^{n}\in\mathcal{A}_{\eta,n}\big{\}}
    =Pr{lιPVηlQVτl(Vηl)<nEη,n,Vηn𝒜η,n}.\displaystyle=\mathrm{Pr}\big{\{}\sum_{l}\iota_{P_{V_{\eta l}}\|Q_{V_{\tau l}}}(V_{\eta l})<nE_{\eta,n},V_{\eta}^{n}\in\mathcal{A}_{\eta,n}\big{\}}. (141)

    If D(P_{V_{\eta}}\|Q_{V_{\tau}})<\infty, then (141) goes to 0 by the weak law of large numbers, since E_{\eta,n}\to d_{\eta}^{\star}-\gamma and d_{\eta}^{\star}\leq D(P_{V_{\eta}}\|Q_{V_{\tau}}). When D(P_{V_{\eta}}\|Q_{V_{\tau}})=+\infty, let \mathcal{B}_{\eta\tau} be the largest subset of \mathcal{V} such that P_{V_{\eta}}(v)>0 and Q_{V_{\tau}}(v)=0 for all v\in\mathcal{B}_{\eta\tau}. By our assumption we have P_{V_{\eta}}(\mathcal{B}_{\eta\tau})>0, and if v\in\mathcal{B}_{\eta\tau} then \iota_{P_{V_{\eta}}\|Q_{V_{\tau}}}(v)=+\infty holds. Therefore we have

    {vnlιPVηlQVτl(vl)<nEη,n,vn𝒜η,n}\displaystyle\{v^{n}\mid\sum_{l}\iota_{P_{V_{\eta l}}\|Q_{V_{\tau l}}}(v_{l})<nE_{\eta,n},\;v^{n}\in\mathcal{A}_{\eta,n}\}
    {vnvlητc,l[1:n],vn𝒜η,n}.\displaystyle\subseteq\{v^{n}\mid v_{l}\in\mathcal{B}_{\eta\tau}^{c},\;\forall l\in[1:n],\;v^{n}\in\mathcal{A}_{\eta,n}\}. (142)

    Therefore

    Pr\displaystyle\mathrm{Pr} {lιPVηlQVτl(Vηl)<nEη,n,Vηn𝒜η,n}\displaystyle\big{\{}\sum_{l}\iota_{P_{V_{\eta l}}\|Q_{V_{\tau l}}}(V_{\eta l})<nE_{\eta,n},V_{\eta}^{n}\in\mathcal{A}_{\eta,n}\big{\}}
    (1PVη(ητ))n0,asn.\displaystyle\leq(1-P_{V_{\eta}}(\mathcal{B}_{\eta\tau}))^{n}\to 0,\;\text{as}\;n\to\infty. (143)
  • Next consider the case in which \{Q_{V_{\tau}^{n}}\} is the set of finite order Markov processes satisfying Assumption 1. By [24, Theorem 1] we have with probability 1

    1nιPVηnQVτn(Vηn)Aητ,η,τ,\displaystyle\frac{1}{n}\iota_{P_{V_{\eta}}^{\otimes n}\|Q_{V_{\tau}^{n}}}(V_{\eta}^{n})\to A^{\eta\tau},\;\forall\eta,\tau, (144)

    where A^{\eta\tau} is either A_{X}^{(st)} or A_{Y}^{(ij)}. This holds even when A^{\eta\tau}=+\infty. Since, by definition, d_{\eta}^{\star}=\min_{\tau}A^{\eta\tau}<+\infty, the conclusion also holds in this case; a numerical sketch of (139) follows this proof. ∎
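To illustrate Lemma 1 numerically, the following minimal Python sketch (first category, with two hypothetical iid mixture components and two hypothetical product alternatives on a ternary alphabet, chosen purely for illustration) estimates the probability in (139) and shows it vanishing as n grows.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)

# Hypothetical mixture components P_{V_1}, P_{V_2} and product
# alternatives Q_{V_1}, Q_{V_2} on a ternary alphabet (illustration only).
P = [np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.3, 0.5])]
Q = [np.array([0.4, 0.4, 0.2]), np.array([0.3, 0.2, 0.5])]
nu = [0.5, 0.5]
gamma, eta = 0.05, 0  # examine the component with index eta

# d_eta^* = min_tau D(P_eta || Q_tau)
d_star = min(float(np.sum(P[eta] * np.log(P[eta] / q))) for q in Q)

for n in [50, 200, 800]:
    trials, bad = 2000, 0
    for _ in range(trials):
        v = rng.choice(3, size=n, p=P[eta])  # V^n ~ P_eta^{\otimes n}
        # log P_{V^n}(v^n) for the mixture, computed stably in the log domain
        log_mix = logsumexp([np.log(nu[s]) + np.sum(np.log(P[s][v]))
                             for s in range(2)])
        # min_tau (1/n) * iota_{P_{V^n} || Q_tau^{\otimes n}}(v^n)
        dens = min((log_mix - np.sum(np.log(q[v]))) / n for q in Q)
        bad += dens < d_star - gamma
    print(f"n={n:4d}: estimated probability in (139) ~ {bad / trials:.3f}")
```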

B-B Existence of a testing scheme

Let t_{\min}=\min_{s\neq\bar{s}\in[1:|\mathcal{P}_{\mathcal{X}}|]}d_{TV}(P_{\mathcal{X},s},P_{\mathcal{X},\bar{s}}) be the minimum total variation distance between any two distinct probability distributions in the set of marginal distributions on \mathcal{X}, \mathcal{P}_{\mathcal{X}}. Denote by P_{x^{n}} the type of a sequence x^{n}. We define a mapping T\colon\mathcal{X}^{n}\to[1:|\mathcal{P}_{\mathcal{X}}|]\cup\{e\} as follows. We look for an s\in[1:|\mathcal{P}_{\mathcal{X}}|] such that d_{TV}(P_{x^{n}},P_{\mathcal{X},s})<t_{\min}/2; since the balls of radius t_{\min}/2 around distinct elements of \mathcal{P}_{\mathcal{X}} are disjoint, at most one such s exists. If such an s exists, we set T(x^{n})=s. Otherwise we set T(x^{n})=e.
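A minimal Python sketch of the mapping T, assuming a hypothetical binary alphabet with two marginals (the actual alphabet and the set \mathcal{P}_{\mathcal{X}} come from Assumption 1):

```python
import numpy as np

# Hypothetical set of marginal distributions P_{X,s} on a binary alphabet.
marginals = [np.array([0.7, 0.3]), np.array([0.3, 0.7])]

def tv(p, q):
    """Total variation distance between two pmfs."""
    return 0.5 * float(np.sum(np.abs(p - q)))

# Minimum distance between distinct marginals.
t_min = min(tv(p, q) for i, p in enumerate(marginals) for q in marginals[i + 1:])

def T(xn):
    """Return the unique s with d_TV(type(x^n), P_{X,s}) < t_min/2, else 'e'.

    The balls of radius t_min/2 around distinct marginals are disjoint,
    so at most one such s can exist."""
    p_type = np.bincount(xn, minlength=2) / len(xn)  # the type of x^n
    for s, p in enumerate(marginals):
        if tv(p_type, p) < t_min / 2:
            return s
    return "e"

rng = np.random.default_rng(2)
xn = rng.choice(2, size=200, p=marginals[0])
print(T(xn))  # prints 0 with high probability
```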
For a given Rc>0R_{c}>0 and for each s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|], let PUs|X¯sP_{U_{s}|\bar{X}_{s}} be a probability kernel which achieves θs(Rc)\theta_{s}(R_{c}). Define

PYnXnUn\displaystyle P_{Y^{n}X^{n}U^{n}} =i=1mνiPYiXiUin,whereνi>0,i[1:m],i=1mνi=1,\displaystyle=\sum_{i=1}^{m}\nu_{i}P_{Y_{i}X_{i}U_{i}}^{\otimes n},\;\text{where}\;\nu_{i}>0,\forall i\in[1:m],\;\sum_{i=1}^{m}\nu_{i}=1,
andPUi|Xi=PUs|X¯s,i𝔉s,s[1:|𝒫𝒳|].\displaystyle\text{and}\;P_{U_{i}|X_{i}}=P_{U_{s}|\bar{X}_{s}},\;\forall i\in\mathfrak{F}_{s},\;s\in[1:|\mathcal{P}_{\mathcal{X}}|]. (145)

Let P_{Y^{n}U^{n}} be the corresponding marginal joint distribution on \mathcal{Y}^{n}\times\mathcal{U}^{n}. For notational simplicity we define the following score function. (By our assumption \min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(y^{n})<+\infty: if P_{Y^{n}}(y^{n})>0 then there exists a j such that Q_{Y_{j}^{n}}(y^{n})>0; therefore \zeta is well-defined.)

ζ(yn,un)=ιPYnUn(yn;un)+minj[1:k]ιPYnQYjn(yn).\displaystyle\zeta(y^{n},u^{n})=\iota_{P_{Y^{n}U^{n}}}(y^{n};u^{n})+\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(y^{n}). (146)

For each i[1:m]i\in[1:m], and nn, let (Yin,Xin)(Y_{i}^{n},X_{i}^{n}) be a tuple of random variables such that (Yin,Xin)PYiXin(Y_{i}^{n},X_{i}^{n})\sim P_{Y_{i}X_{i}}^{\otimes n}. Let γ>0\gamma>0 be arbitrary but given. For each s[1:|𝒫𝒳|]s\in[1:|\mathcal{P}_{\mathcal{X}}|] draw MM codewords {usn(j)}j=1M\{u_{s}^{n}(j)\}_{j=1}^{M}, M=en(Rc+2γ)M=e^{n(R_{c}+2\gamma)}, from the marginal distribution PUsnP_{U_{s}}^{\otimes n} of (PUs|X¯s×P𝒳,s)n(P_{U_{s}|\bar{X}_{s}}\times P_{\mathcal{X},s})^{\otimes n}. Given xnx^{n}, let s^=T(xn)\hat{s}=T(x^{n}) be an estimate of the index of the marginal distribution of xnx^{n}. Assume that s^e\hat{s}\neq e. We proceed with the compression process if

mintιPXnQXtn(xn)>n(ds^xγ),\displaystyle\min_{t}\iota_{P_{X^{n}}\|Q_{X_{t}^{n}}}(x^{n})>n(d_{\hat{s}}^{\mathrm{x}}-\gamma), (147)

where P_{X^{n}}=\sum_{i}\nu_{i}P_{X_{i}}^{\otimes n} is the marginal on \mathcal{X}^{n} of P_{Y^{n}X^{n}U^{n}}. Otherwise we send a special index j_{\hat{s}}^{\star}=e^{\star}.

When (147) is fulfilled, we select the transmission index j^{\star}_{\hat{s}}=\arg\min_{j\in[1:M]}\pi_{{\hat{s}}}(x^{n};u_{\hat{s}}^{n}(j)), where for all s\in[1:|\mathcal{P}_{\mathcal{X}}|]

πs(xn;un)=maxi𝔉sPr{ζ(Yin,un)n(Edsx)|Xin=xn}.\pi_{s}(x^{n};u^{n})=\max_{i\in\mathfrak{F}_{s}}\mathrm{Pr}\{\zeta(Y_{i}^{n},u^{n})\leq n(E-d_{s}^{\mathrm{x}})|X_{i}^{n}=x^{n}\}. (148)
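A minimal Python sketch of this codeword-selection rule: the channel, score function, and threshold below are hypothetical toy stand-ins for the conditional law of Y_{i}^{n} given X_{i}^{n}, the score \zeta in (146), and n(E-d_{s}^{\mathrm{x}}); the conditional probability in (148) is estimated by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)

def select_codeword(xn, codebook, zeta, sample_y_given_x, threshold, n_mc=500):
    """Return j* = argmin_j pi(x^n; u^n(j)), where pi is a Monte Carlo
    estimate of Pr{ zeta(Y^n, u^n(j)) <= threshold | X^n = x^n }."""
    ys = [sample_y_given_x(xn) for _ in range(n_mc)]
    def pi(un):
        return np.mean([zeta(yn, un) <= threshold for yn in ys])
    return min(range(len(codebook)), key=lambda j: pi(codebook[j]))

# Toy stand-ins: Y^n is X^n through a binary symmetric channel, and the
# score rewards agreement between y^n and the codeword u^n.
n, p_flip = 100, 0.1
sample_y = lambda xn: xn ^ (rng.random(n) < p_flip).astype(int)
zeta = lambda yn, un: -int(np.sum(yn != un))

xn = rng.choice(2, size=n)
codebook = [rng.choice(2, size=n) for _ in range(8)] + [xn.copy()]
print(select_codeword(xn, codebook, zeta, sample_y, threshold=-n * 0.2))
# Prints 8: the codeword equal to x^n minimizes the estimated failure
# probability, while unrelated codewords exceed the threshold almost surely.
```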

If \hat{s}=e, we set j^{\star}_{\hat{s}}=1. The pair (\hat{s},j^{\star}_{\hat{s}}) is provided to the decision center. If \hat{s}\neq e and j^{\star}_{\hat{s}}\neq e^{\star} hold, we declare that H_{0} is the underlying hypothesis if

ζ(yn,us^n(js^))>n(Eds^x).\displaystyle\zeta(y^{n},u_{\hat{s}}^{n}(j^{\star}_{\hat{s}}))>n(E-d_{\hat{s}}^{\mathrm{x}}). (149)

Otherwise we declare that H_{1} is true. Given a codebook realization, let \hat{S} and J^{\star}_{\hat{S}} be the induced random variables under the null hypothesis. For each i\in[1:m] with i\in\mathfrak{F}_{s}, the corresponding false alarm probability is given by

αn(i)\displaystyle\alpha_{n}^{(i)} =Pr{[ζ(Yin,uS^n(JS^))n(EdS^x),andS^e,andJS^e],\displaystyle=\mathrm{Pr}\{[\zeta(Y_{i}^{n},u_{\hat{S}}^{n}(J^{\star}_{\hat{S}}))\leq n(E-d_{\hat{S}}^{\mathrm{x}}),\;\text{and}\;\hat{S}\neq e,\;\text{and}\;J^{\star}_{\hat{S}}\neq e^{\star}],
or[S^=e],or[JS^=e]}\displaystyle\hskip 56.9055pt\text{or}\;[\hat{S}=e],\;\text{or}\;[J^{\star}_{\hat{S}}=e^{\star}]\}
Pr{ζ(Yin,usn(Js))n(Edsx)}τi1+Pr{T(Xin)s}\displaystyle\leq\underbrace{\mathrm{Pr}\{\zeta(Y_{i}^{n},u_{s}^{n}(J^{\star}_{s}))\leq n(E-d_{s}^{\mathrm{x}})\}}_{\tau_{i}^{1}}+\mathrm{Pr}\{T(X_{i}^{n})\neq s\}
+Pr{mintιPXnQXtn(Xin)<n(dsxγ)}τs3.\displaystyle\qquad+\underbrace{\mathrm{Pr}\{\min_{t}\iota_{P_{X^{n}}\|Q_{X_{t}^{n}}}(X_{i}^{n})<n(d_{s}^{\mathrm{x}}-\gamma)\}}_{\tau_{s}^{3}}. (150)

For notational simplicity, for each s we define E_{s}=E-d_{s}^{\mathrm{x}}. The first term can be upper bounded further as

τi1\displaystyle\tau_{i}^{1} =Pr{ζ(Yin,un))nEs|Xin=xn}dPXinusn(Js)(xn,un)\displaystyle=\int\mathrm{Pr}\{\zeta(Y_{i}^{n},u^{n}))\leq nE_{s}|X_{i}^{n}=x^{n}\}dP_{X_{i}^{n}u_{s}^{n}(J^{\star}_{s})}(x^{n},u^{n})
maxi𝔉sPr{ζ(Yin,un))nEs|Xin=xn}dPXinusn(Js)(xn,un)\displaystyle\leq\int\max_{i^{\prime}\in\mathfrak{F}_{s}}\mathrm{Pr}\{\zeta(Y_{i^{\prime}}^{n},u^{n}))\leq nE_{s}|X_{i^{\prime}}^{n}=x^{n}\}dP_{X_{i}^{n}u_{s}^{n}(J^{\star}_{s})}(x^{n},u^{n})
=(148)E[πs(Xin;usn(Js))]=[0,1]Pr{πs(Xin;usn(Js))>t}𝑑t.\displaystyle\stackrel{{\scriptstyle\eqref{pi_funcs}}}{{=}}\mathrm{E}[\pi_{s}(X_{i}^{n};u_{s}^{n}(J^{\star}_{s}))]=\int_{[0,1]}\mathrm{Pr}\{\pi_{s}(X_{i}^{n};u_{s}^{n}(J^{\star}_{s}))>t\}dt. (151)

Let (X¯sn,U¯sn)(\bar{X}_{s}^{n},\bar{U}_{s}^{n}) be another tuple of generic random variables that is independent of the random codebook (Usn(j))(U_{s}^{n}(j)) such that (X¯sn,U¯sn)(P𝒳,s×PUs|X¯s)n(\bar{X}_{s}^{n},\bar{U}_{s}^{n})\sim(P_{\mathcal{X},s}\times P_{U_{s}|\bar{X}_{s}})^{\otimes n}. Furthermore for the given codebook realization let J¯s\bar{J}^{\star}_{s} be the random message induced by X¯sn\bar{X}_{s}^{n} through the encoding process. Then we have

αn=maxi[1:m]αn(i)\displaystyle\alpha_{n}=\max_{i\in[1:m]}\alpha_{n}^{(i)}\leq s[0,1]Pr{πs(X¯sn;usn(J¯s))>t}𝑑t\displaystyle\sum_{s}\int_{[0,1]}\mathrm{Pr}\{\pi_{s}(\bar{X}_{s}^{n};u_{s}^{n}(\bar{J}^{\star}_{s}))>t\}dt
+sPr{T(X¯sn)s}τs2+sτs3.\displaystyle+\sum_{s}\underbrace{\mathrm{Pr}\{T(\bar{X}_{s}^{n})\neq s\}}_{\tau_{s}^{2}}+\sum_{s}\tau_{s}^{3}. (152)

Averaging over all codebooks we obtain by Fubini’s theorem

𝔼[αn]s[0,1]Pr{πs(X¯sn;Usn(J¯s))>t}𝑑t+s(τs2+τs3).\displaystyle\mathbb{E}[\alpha_{n}]\leq\sum_{s}\int_{[0,1]}\mathrm{Pr}\{\pi_{s}(\bar{X}_{s}^{n};U_{s}^{n}(\bar{J}^{\star}_{s}))>t\}dt+\sum_{s}(\tau_{s}^{2}+\tau_{s}^{3}). (153)

By the non-asymptotic covering lemma in [25, Lemma 5] we have

Pr{πs(X¯sn;Usn(J¯s))>t}\displaystyle\mathrm{Pr}\{\pi_{s}(\bar{X}_{s}^{n};U_{s}^{n}(\bar{J}^{\star}_{s}))>t\} Pr{πs(X¯sn;U¯sn)>t}+eexp(nγ)\displaystyle\leq\mathrm{Pr}\{\pi_{s}(\bar{X}_{s}^{n};\bar{U}_{s}^{n})>t\}+e^{-\exp(n\gamma)}
+Pr{ιPX¯snU¯sn(X¯sn;U¯sn)>n(Rc+γ)}τs4.\displaystyle+\underbrace{\mathrm{Pr}\{\iota_{P_{\bar{X}_{s}^{n}\bar{U}_{s}^{n}}}(\bar{X}_{s}^{n};\bar{U}_{s}^{n})>n(R_{c}+\gamma)\}}_{\tau_{s}^{4}}. (154)

This result implies the following chain of expressions

𝔼[αn]\displaystyle\mathbb{E}[\alpha_{n}] s[0,1]Pr{πs(X¯sn;U¯sn)>t}𝑑t+Seexp(nγ)+s(τs2+τs3+τs4)\displaystyle\leq\sum_{s}\int_{[0,1]}\mathrm{Pr}\{\pi_{s}(\bar{X}_{s}^{n};\bar{U}_{s}^{n})>t\}dt+Se^{-\exp(n\gamma)}+\sum_{s}(\tau_{s}^{2}+\tau_{s}^{3}+\tau_{s}^{4})
=s𝔼[πs(X¯sn;U¯sn)]+Seexp(nγ)+s(τs2+τs3+τs4).\displaystyle=\sum_{s}\mathbb{E}[\pi_{s}(\bar{X}_{s}^{n};\bar{U}_{s}^{n})]+Se^{-\exp(n\gamma)}+\sum_{s}(\tau_{s}^{2}+\tau_{s}^{3}+\tau_{s}^{4}). (155)

For each i\in\mathfrak{F}_{s}, let \bar{Y}_{i}^{n} be a tuple of random variables such that the Markov chain \bar{Y}_{i}^{n}-\bar{X}_{s}^{n}-\bar{U}_{s}^{n} and P_{\bar{Y}_{i}^{n}|\bar{X}_{s}^{n}}=P_{Y_{i}|X_{i}}^{\otimes n} hold. From the definition of \pi_{s}(\cdot;\cdot) we also have

𝔼[πs(X¯sn;U¯sn)]i𝔉sPr{ζ(Y¯in,U¯sn)nEs}.\displaystyle\mathbb{E}[\pi_{s}(\bar{X}_{s}^{n};\bar{U}_{s}^{n})]\leq\sum_{i\in\mathfrak{F}_{s}}\mathrm{Pr}\{\zeta(\bar{Y}_{i}^{n},\bar{U}_{s}^{n})\leq nE_{s}\}. (156)

In summary there exists a codebook realization, hence a mapping ϕn:𝒳n([1:|𝒫𝒳|]{e})×([1:M]{e})\phi_{n}\colon\mathcal{X}^{n}\to\mathcal{M}\triangleq([1:|\mathcal{P}_{\mathcal{X}}|]\cup\{e\})\times([1:M]\cup\{e^{\star}\}), such that

αn\displaystyle\alpha_{n}\leq si𝔉sPr{ζ(Y¯in,U¯sn)nEs}\displaystyle\sum_{s}\sum_{i\in\mathfrak{F}_{s}}\mathrm{Pr}\{\zeta(\bar{Y}_{i}^{n},\bar{U}_{s}^{n})\leq nE_{s}\}
+Seexp(nγ)+s(τs2+τs3+τs4),\displaystyle+Se^{-\exp(n\gamma)}+\sum_{s}(\tau_{s}^{2}+\tau_{s}^{3}+\tau_{s}^{4}), (157)

holds.

B-C Bounding error probabilities

Due to Lemma 1, τs3\tau_{s}^{3} goes to 0 as nn\to\infty. By the weak law of large numbers the terms τs2\tau_{s}^{2} and τs4\tau_{s}^{4} go to 0 as nn\to\infty. We focus now on the first term. We observe that

Pr{ζ(Y¯in,U¯sn)nEs}\displaystyle\mathrm{Pr}\{\zeta(\bar{Y}_{i}^{n},\bar{U}_{s}^{n})\leq nE_{s}\} Pr{ιPYnUn(Y¯in;U¯sn)n(Esdiy+γ)}\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U^{n}}}(\bar{Y}_{i}^{n};\bar{U}_{s}^{n})\leq n(E_{s}-d_{i}^{\mathrm{y}}+\gamma)\}
+Pr{minj[1:k]ιPYnQYjn(Y¯in)<n(diyγ)}.\displaystyle+\mathrm{Pr}\{\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(\bar{Y}_{i}^{n})<n(d_{i}^{\mathrm{y}}-\gamma)\}. (158)

Similarly, the last term goes to 0 due to Lemma 1. In the next step we perform several change of measure steps from the general distribution PYnUnP_{Y^{n}U^{n}} to our distributions of interest PYiUinP_{Y_{i}U_{i}}^{\otimes n}, i[1:m]i\in[1:m]. For that purpose, for each i𝔉si\in\mathfrak{F}_{s}, we define the following sets

𝒜is\displaystyle\mathcal{A}_{is} ={(yn,un)PY¯inU¯sn(yn,un)>0},\displaystyle=\{(y^{n},u^{n})\mid P_{\bar{Y}_{i}^{n}\bar{U}_{s}^{n}}(y^{n},u^{n})>0\},
i\displaystyle\mathcal{B}_{i} ={ynPYn(yn)enγnPY¯in(yn)},\displaystyle=\{y^{n}\mid P_{Y^{n}}(y^{n})\leq e^{n\gamma_{n}}P_{\bar{Y}_{i}}^{\otimes n}(y^{n})\},
𝒞s\displaystyle\mathcal{C}_{s} ={unPUn(un)enγnPU¯sn(un)},\displaystyle=\{u^{n}\mid P_{U^{n}}(u^{n})\leq e^{n\gamma_{n}}P_{\bar{U}_{s}}^{\otimes n}(u^{n})\}, (159)

where γn0\gamma_{n}\to 0 and nγnn\gamma_{n}\to\infty as nn\to\infty. We have

PY¯iU¯sn[(i×𝒰n)c𝒜is]PY¯in[(i)c]enγn,\displaystyle P_{\bar{Y}_{i}\bar{U}_{s}}^{\otimes n}[(\mathcal{B}_{i}\times\mathcal{U}^{n})^{c}\cap\mathcal{A}_{is}]\leq P_{\bar{Y}_{i}}^{\otimes n}[(\mathcal{B}_{i})^{c}]\leq e^{-n\gamma_{n}},
PY¯iU¯sn[(𝒴n×𝒞s)c𝒜is]PU¯sn[(𝒞s)c]enγn.\displaystyle P_{\bar{Y}_{i}\bar{U}_{s}}^{\otimes n}[(\mathcal{Y}^{n}\times\mathcal{C}_{s})^{c}\cap\mathcal{A}_{is}]\leq P_{\bar{U}_{s}}^{\otimes n}[(\mathcal{C}_{s})^{c}]\leq e^{-n\gamma_{n}}. (160)

For (y^{n},u^{n})\in\mathcal{A}_{is}\cap(\mathcal{B}_{i}\times\mathcal{U}^{n})\cap(\mathcal{Y}^{n}\times\mathcal{C}_{s}), since P_{Y^{n}U^{n}}(y^{n},u^{n})\geq\nu_{i}P_{\bar{Y}_{i}\bar{U}_{s}}^{\otimes n}(y^{n},u^{n}) holds by the definition of P_{Y^{n}U^{n}}, we further have

ιPYnUn(yn;un)logνi+ιPY¯inU¯sn(yn;un)2nγn.\iota_{P_{Y^{n}U^{n}}}(y^{n};u^{n})\geq\log\nu_{i}+\iota_{P_{\bar{Y}_{i}^{n}\bar{U}_{s}^{n}}}(y^{n};u^{n})-2n\gamma_{n}. (161)

This leads to

Pr\displaystyle\mathrm{Pr} {ιPYnUn(Y¯in;U¯sn)n(Esdiy+γ)}\displaystyle\{\iota_{P_{Y^{n}U^{n}}}(\bar{Y}_{i}^{n};\bar{U}_{s}^{n})\leq n(E_{s}-d_{i}^{\mathrm{y}}+\gamma)\}
=Pr{ιPYnUn(Y¯in;U¯sn)n(Edisyx+γ),(Y¯in,U¯sn)𝒜is}\displaystyle=\mathrm{Pr}\{\iota_{P_{Y^{n}U^{n}}}(\bar{Y}_{i}^{n};\bar{U}_{s}^{n})\leq n(E-d_{is}^{\mathrm{yx}}+\gamma),\;(\bar{Y}_{i}^{n},\bar{U}_{s}^{n})\in\mathcal{A}_{is}\}
Pr{ιPYnUn(Y¯in;U¯sn)n(Edisyx+γ),\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U^{n}}}(\bar{Y}_{i}^{n};\bar{U}_{s}^{n})\leq n(E-d_{is}^{\mathrm{yx}}+\gamma),\;
(Y¯in,U¯sn)𝒜is(i×𝒰n)(𝒴n×𝒞s)}+2enγn\displaystyle\hskip 56.9055pt(\bar{Y}_{i}^{n},\bar{U}_{s}^{n})\in\mathcal{A}_{is}\cap(\mathcal{B}_{i}\times\mathcal{U}^{n})\cap(\mathcal{Y}^{n}\times\mathcal{C}_{s})\}+2e^{-n\gamma_{n}}
Pr{ιPY¯inU¯sn(Y¯in;U¯sn)n(Edisyxlogνi/n+2γn+γ)}+2enγn.\displaystyle\leq\mathrm{Pr}\big{\{}\iota_{P_{\bar{Y}_{i}^{n}\bar{U}_{s}^{n}}}(\bar{Y}_{i}^{n};\bar{U}_{s}^{n})\leq n(E-d_{is}^{\mathrm{yx}}-\log\nu_{i}/n+2\gamma_{n}+\gamma)\big{\}}+2e^{-n\gamma_{n}}. (162)

For an arbitrary γ>0\gamma>0 select E=minsθs(Rc)2γE=\min_{s}\theta_{s}(R_{c})-2\gamma which implies that Edisyx+γ<I(Y¯i;U¯s)E-d_{is}^{\mathrm{yx}}+\gamma<I(\bar{Y}_{i};\bar{U}_{s}) holds. By the weak law of large numbers we obtain

limnPr{ιPY¯inU¯sn(Y¯in;U¯sn)\displaystyle\lim_{n\to\infty}\mathrm{Pr}\big{\{}\iota_{P_{\bar{Y}_{i}^{n}\bar{U}_{s}^{n}}}(\bar{Y}_{i}^{n};\bar{U}_{s}^{n}) n(Edisyxlogνi/n+2γn+γ)}=0,\displaystyle\leq n(E-d_{is}^{\mathrm{yx}}-\log\nu_{i}/n+2\gamma_{n}+\gamma)\big{\}}=0,
i[1:m],i𝔉s.\displaystyle\forall i\in[1:m],i\in\mathfrak{F}_{s}. (163)

Therefore we have

Pr{ζ(Y¯in,U¯sn)nEs}\displaystyle\mathrm{Pr}\{\zeta(\bar{Y}_{i}^{n},\bar{U}_{s}^{n})\leq nE_{s}\} 0,s,i𝔉s,asn,\displaystyle\to 0,\;\forall s,\;i\in\mathfrak{F}_{s},\;\text{as}\;n\to\infty,
αn\displaystyle\Rightarrow\alpha_{n} 0,asn.\displaystyle\to 0,\;\text{as}\;n\to\infty. (164)

Define the following sets

𝒢={(yn,ϕn(xn))\displaystyle\mathcal{G}=\{(y^{n},\phi_{n}(x^{n}))\mid ζ(yn,un(ϕn(xn)))>n(Eds^x),\displaystyle\zeta(y^{n},u^{n}(\phi_{n}(x^{n})))>n(E-d_{\hat{s}}^{\mathrm{x}}),
ϕn(xn){(e,1),(1,e),,(S,e)}}\displaystyle\phi_{n}(x^{n})\notin\{(e,1),(1,e^{\star}),\dots,(S,e^{\star})\}\} (165)

and

j={ynQYjn(yn)>0},j[1:k].\displaystyle\mathcal{H}_{j}=\{y^{n}\mid Q_{Y_{j}^{n}}(y^{n})>0\},\;j\in[1:k]. (166)

\mathcal{G} is our decision region described in (149). Using \mathcal{G} and \{\mathcal{H}_{j}\} we perform in the following the change-of-measure steps from Q_{Y_{j}^{n}} to P_{Y^{n}|U^{n}} and from Q_{\phi_{n}(X_{t}^{n})} to P_{\phi_{n}(X^{n})} in the calculation of the miss-detection probability. For each j\in[1:k] and (y^{n},\phi_{n}(x^{n}))\in\mathcal{G}\cap(\mathcal{H}_{j}\times\mathcal{M}) we have

ζ(yn,un(ϕn(xn)))>n(Eds^x)\displaystyle\zeta(y^{n},u^{n}(\phi_{n}(x^{n})))>n(E-d_{\hat{s}}^{\mathrm{x}})
\displaystyle\Rightarrow ιPYnUn(yn,un(ϕn(xn))+ιPYnQYjn(yn)>n(Eds^x)\displaystyle\iota_{P_{Y^{n}U^{n}}}(y^{n},u^{n}(\phi_{n}(x^{n}))+\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(y^{n})>n(E-d_{\hat{s}}^{\mathrm{x}})
\displaystyle\Rightarrow logPYn|Un(yn|un(ϕn(xn)))QYjn(yn)>n(Eds^x),\displaystyle\log\frac{P_{Y^{n}|U^{n}}(y^{n}|u^{n}(\phi_{n}(x^{n})))}{Q_{Y_{j}^{n}}(y^{n})}>n(E-d_{\hat{s}}^{\mathrm{x}}),
\displaystyle\Rightarrow QYjn(yn)en(Eds^x)PYn|Un(yn|un(ϕn(xn))),\displaystyle Q_{Y_{j}^{n}}(y^{n})\leq e^{-n(E-d_{\hat{s}}^{\mathrm{x}})}P_{Y^{n}|U^{n}}(y^{n}|u^{n}(\phi_{n}(x^{n}))), (167)

where the second-to-last implication follows since (y^{n},\phi_{n}(x^{n}))\in\mathcal{G}\cap(\mathcal{H}_{j}\times\mathcal{M}) implies that P_{Y^{n}}(y^{n})>0 (otherwise in the previous line the first term would be 0 while the second term would be -\infty) and since, due to our coding arguments, P_{U^{n}}(u^{n}(\phi_{n}(x^{n})))>0 holds. Furthermore for (y^{n},\phi_{n}(x^{n}))\in\mathcal{G}, as \phi_{n}(x^{n})\notin\{(1,e^{\star}),\dots,(|\mathcal{P}_{\mathcal{X}}|,e^{\star})\} holds, we also have

Qϕn(Xtn)(ϕn(xn))Pϕn(Xn)(ϕn(xn))en(ds^xγ),t[1:r].\displaystyle Q_{\phi_{n}(X_{t}^{n})}(\phi_{n}(x^{n}))\leq P_{\phi_{n}(X^{n})}(\phi_{n}(x^{n}))e^{-n(d_{\hat{s}}^{\mathrm{x}}-\gamma)},\;\forall t\in[1:r]. (168)

Therefore, for each j[1:k]j\in[1:k] and t[1:r]t\in[1:r] the probability of miss detection is bounded by

βn(jt)\displaystyle\beta_{n}^{(jt)} =QYjn×Qϕn(Xtn)(𝒢)=QYjn×Qϕn(Xtn)(𝒢(j×))\displaystyle=Q_{Y_{j}^{n}}\times Q_{\phi_{n}(X_{t}^{n})}(\mathcal{G})=Q_{Y_{j}^{n}}\times Q_{\phi_{n}(X_{t}^{n})}(\mathcal{G}\cap(\mathcal{H}_{j}\times\mathcal{M}))
en(ds^xγ)en(Eds^x)\displaystyle\leq e^{-n(d_{\hat{s}}^{\mathrm{x}}-\gamma)}e^{-n(E-d_{\hat{s}}^{\mathrm{x}})}
×(yn,ϕn(xn))𝒢(j×)Pϕn(Xn)(ϕn(xn))PYn|Un(yn|un(ϕn(xn)))\displaystyle\times\sum_{(y^{n},\phi_{n}(x^{n}))\in\mathcal{G}\cap(\mathcal{H}_{j}\times\mathcal{M})}P_{\phi_{n}(X^{n})}(\phi_{n}(x^{n}))P_{Y^{n}|U^{n}}(y^{n}|u^{n}(\phi_{n}(x^{n})))
en(Eγ).\displaystyle\leq e^{-n(E-\gamma)}. (169)

This implies that we have

βn=maxj[1:k],t[1:r]βn(jt)en(Eγ).\displaystyle\beta_{n}=\max_{j\in[1:k],t\in[1:r]}\beta_{n}^{(jt)}\leq e^{-n(E-\gamma)}. (170)

Therefore, the chosen sequence of ϕn\phi_{n} satisfies

limnαn=0,lim infn1nlog1βnEγ.\displaystyle\lim_{n\to\infty}\alpha_{n}=0,\;\liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\beta_{n}}\geq E-\gamma. (171)

Combining this with the definition of the intersected sets \mathcal{I}_{n}^{(i)}(E), we further obtain for all i\in[1:m] that

limnPYinϕn(Xin)[n(i)(E2γ)c]limn(αn(i)+en(E2γ)j,tβ(jt)n)=0.\displaystyle\lim_{n\to\infty}P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}[\mathcal{I}_{n}^{(i)}(E-2\gamma)^{c}]\leq\lim_{n\to\infty}(\alpha_{n}^{(i)}+e^{n(E-2\gamma)}\sum_{j,t}\beta^{(jt)}_{n})=0. (172)

Appendix C Proof of Theorem 2

For an arbitrarily given γ>0\gamma>0, define the following typical sets

n,γ(i)\displaystyle\mathcal{B}_{n,\gamma}^{(i)} ={yn|ιPYinQYjin(yn)/ndiy|<γ},i[1:m],\displaystyle=\{y^{n}\mid|\iota_{P_{Y_{i}}^{\otimes n}\|Q_{Y_{j_{i}^{\star}}^{n}}}(y^{n})/n-d_{i}^{\mathrm{y}}|<\gamma\},\;i\in[1:m],
n,γs\displaystyle\mathcal{B}_{n,\gamma}^{s} ={xn|ιP𝒳,snQXtsn(xn)/ndsx|<γ},s[1:|𝒫𝒳|].\displaystyle=\{x^{n}\mid|\iota_{P_{\mathcal{X},s}^{\otimes n}\|Q_{X_{t_{s}^{\star}}}^{n}}(x^{n})/n-d_{s}^{\mathrm{x}}|<\gamma\},\;s\in[1:|\mathcal{P}_{\mathcal{X}}|]. (173)

We have

limn\displaystyle\lim_{n\to\infty} P𝒳,sn(n,γs)=1,s[1:|𝒫𝒳|],\displaystyle P_{\mathcal{X},s}^{\otimes n}(\mathcal{B}_{n,\gamma}^{s})=1,\;\forall s\in[1:|\mathcal{P}_{\mathcal{X}}|],
limn\displaystyle\lim_{n\to\infty} PYin(n,γ(i))=1,i[1:m].\displaystyle P_{Y_{i}}^{\otimes n}(\mathcal{B}_{n,\gamma}^{(i)})=1,\;\forall i\in[1:m]. (174)

either due to the weak law of large numbers or due to [24, Theorem 1]. For simplicity we first consider the case that there is a single marginal distribution in the set 𝒫𝒳\mathcal{P}_{\mathcal{X}}. In this case we simply write P𝒳,1P_{\mathcal{X},1} as PXP_{X}, t1t_{1}^{\star} as tt^{\star}, n,γ1\mathcal{B}_{n,\gamma}^{1} as n,γ\mathcal{B}_{n,\gamma}, and θs(Rc)\theta_{s}(R_{c}) as θ(Rc)\theta(R_{c}). Furthermore, for notation compactness we define the following quantities in this case

d¯i=diy+d1x,i[1:m].\displaystyle\bar{d}_{i}^{\star}=d_{i}^{\mathrm{y}}+d_{1}^{\mathrm{x}},\;\forall i\in[1:m]. (175)

To support the analysis we define \bar{P}=\prod P_{Y_{i}|X_{i}}\times P_{X}. Given an arbitrary joint distribution P_{(Y_{i})_{i=1}^{m}X}, consider the following region

={(Rc,E)\displaystyle\mathcal{R}=\{(R_{c},E)\mid Emini[1:m][I(Yi;U)+d¯i],\displaystyle E\leq\min_{i\in[1:m]}[I(Y_{i};U)+\bar{d}_{i}^{\star}],
RcI(X;U),UX(Yi)i=1m}.\displaystyle R_{c}\geq I(X;U),\;U-X-(Y_{i})_{i=1}^{m}\}. (176)

It can be seen that \mathcal{R} depends only on the marginal distributions \{P_{Y_{i}X}\}_{i\in[1:m]} and not on the joint distribution P_{(Y_{i})_{i=1}^{m}X}. Without loss of generality we assume that in the evaluation of \mathcal{R}, P_{(Y_{i})_{i=1}^{m}X}=\bar{P}. In the following we use the hyperplane characterization of \mathcal{R}. For that purpose, we first show the following result.

Lemma 2.

\mathcal{R} is a closed, convex set. Furthermore, θ(Rc)\theta(R_{c}) is a concave function.

Proof.

Assume that \{(R_{c,i},E_{i})\}_{i=1}^{2} are two points in \mathcal{R} with corresponding kernels P_{U_{i}|X}. Let \alpha be a random variable taking values in \{1,2\} with P_{\alpha}(1)=\nu, \nu\in[0,1], independent of everything else. Define U=(U_{\alpha},\alpha). Then we have

I(X;U)\displaystyle I(X;U) =νI(X;U1)+(1ν)I(X;U2)\displaystyle=\nu I(X;U_{1})+(1-\nu)I(X;U_{2})
νRc,1+(1ν)Rc,2.\displaystyle\leq\nu R_{c,1}+(1-\nu)R_{c,2}.
I(Yi;U)\displaystyle I(Y_{i};U) =νI(Yi;U1)+(1ν)I(Yi;U2)\displaystyle=\nu I(Y_{i};U_{1})+(1-\nu)I(Y_{i};U_{2})
νE1+(1ν)E2d¯i,i[1:m].\displaystyle\geq\nu E_{1}+(1-\nu)E_{2}-\bar{d}_{i}^{\star},\;i\in[1:m]. (177)

Therefore, the convex combination (\nu R_{c,1}+(1-\nu)R_{c,2},\nu E_{1}+(1-\nu)E_{2}) also lies in \mathcal{R}; hence \mathcal{R} is a convex set. It can also be seen that \mathcal{R} is a closed set since all alphabets are finite. Let \mathcal{R}^{-}=\{(R_{c},E)\mid(R_{c},-E)\in\mathcal{R}\} be the reflection of the set \mathcal{R} about the R_{c}-axis. Then \mathcal{R}^{-} is a convex set. Furthermore the function

θ(Rc)\displaystyle\theta^{-}(R_{c}) =min{E(Rc,E)}\displaystyle=\min\{E\mid(R_{c},E)\in\mathcal{R}^{-}\}
=minPU|X:I(X;U)Rcmini[1:m][I(Yi;U)+d¯i],\displaystyle=\min_{P_{U|X}\colon I(X;U)\leq R_{c}}-\min_{i\in[1:m]}[I(Y_{i};U)+\bar{d}_{i}^{\star}],
=maxPU|X:I(X;U)Rcmini[1:m][I(Yi;U)+d¯i],\displaystyle=-\max_{P_{U|X}\colon I(X;U)\leq R_{c}}\min_{i\in[1:m]}[I(Y_{i};U)+\bar{d}_{i}^{\star}], (178)

is a convex function. This implies that θ(Rc)=θ(Rc)\theta(R_{c})=-\theta^{-}(R_{c}) is a concave function. ∎
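As a numerical companion to Lemma 2, the following Python sketch evaluates \theta(R_{c}) by grid search for a hypothetical model with m=2 and binary alphabets. Restricting U to be binary is a simplification (the cardinality bound may require more values of U), so the sketch computes a lower bound whose values nevertheless exhibit the concave, nondecreasing shape asserted by the lemma.

```python
import numpy as np

def mi(joint):
    """Mutual information (in nats) of a 2D joint pmf."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Hypothetical model (illustration only): X ~ Bernoulli(1/2), two channels
# P_{Y_i|X} as row-stochastic matrices, and constants playing \bar{d}_i^*.
px = np.array([0.5, 0.5])
W = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.6, 0.4], [0.1, 0.9]])]
dbar = [0.05, 0.10]

grid = np.linspace(0.0, 1.0, 41)

def theta(Rc):
    """Grid-search lower bound of theta(Rc) with binary U."""
    best = 0.0
    for a in grid:        # P(U=1 | X=0)
        for b in grid:    # P(U=1 | X=1)
            pu_x = np.array([[1 - a, a], [1 - b, b]])
            if mi(px[:, None] * pu_x) > Rc:   # rate constraint I(X;U) <= Rc
                continue
            # joint of (Y_i, U): sum_x P(x) P(y|x) P(u|x)
            val = min(mi(np.einsum('x,xy,xu->yu', px, W[i], pu_x)) + dbar[i]
                      for i in range(2))
            best = max(best, val)
    return best

for Rc in np.linspace(0.0, 0.7, 8):
    print(f"Rc={Rc:.2f}  theta(Rc) >= {theta(Rc):.4f}")
# The printed values are nondecreasing with shrinking increments,
# consistent with the concavity of theta established in Lemma 2.
```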

For any μ>0\mu>0 define

RHTμ(P¯)=minPU|X(I(X;U)μmini[1:m][I(Yi;U)+d¯i]).\displaystyle R_{\mathrm{HT}}^{\mu}(\bar{P})=\min_{P_{U|X}}(I(X;U)-\mu\min_{i\in[1:m]}[I(Y_{i};U)+\bar{d}_{i}^{\star}]). (179)

Since \mathcal{R} is a closed convex set, the line R_{c}-\mu E=R_{\mathrm{HT}}^{\mu}(\bar{P}) supports \mathcal{R}, i.e., \mathcal{R}=\cap_{\mu>0}\{(R_{c},E)\mid R_{c}-\mu E\geq R_{\mathrm{HT}}^{\mu}(\bar{P})\}. (To see this, assume that (x,y)\in\mathbb{R}_{+}^{2} and (x,y)\notin\mathcal{R} hold. Then there exists a pair (a,b)\in\mathbb{R}^{2} such that ax+by<aR+bE for all (R,E)\in\mathcal{R}, as \mathcal{R} is a closed convex set. Since (0,\min_{i}\bar{d}_{i}^{\star})\in\mathcal{R}, we must have y>\min_{i}\bar{d}_{i}^{\star}. Plugging (R,E)=(0,\min_{i}\bar{d}_{i}^{\star}) in, we see that either a or b is negative. Similarly, plugging in (R,\min_{i}\bar{d}_{i}^{\star}) with sufficiently large R, we see that a is positive and b is negative. Setting \mu=-b/a, we see that if (x,y)\notin\mathcal{R} then (x,y)\notin\cap_{\mu>0}\{(R_{c},E)\mid R_{c}-\mu E\geq R_{\mathrm{HT}}^{\mu}(\bar{P})\}. Hence \mathcal{R}\supseteq\cap_{\mu>0}\{(R_{c},E)\mid R_{c}-\mu E\geq R_{\mathrm{HT}}^{\mu}(\bar{P})\} holds; the other direction is straightforward.) For our proof we also use an additional characterization of R_{\mathrm{HT}}^{\mu}(\bar{P}), which is stated in the following. For any pair of positive numbers (\mu,\alpha) we define

RHTμ,α(P¯)=\displaystyle R_{\mathrm{HT}}^{\mu,\alpha}(\bar{P})= minPU~X~(Y~i)i=1m(I(X~,(Y~i)i=1m;U~)μmini[1:m][I(Y~i;U~)+d¯i]\displaystyle\min_{P_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}}\bigg{(}I(\tilde{X},(\tilde{Y}_{i})_{i=1}^{m};\tilde{U})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i};\tilde{U})+\bar{d}_{i}^{\star}]
+αI(U~;(Y~i)i=1m|X~)+(α+1)D(PX~(Y~i)i=1mP¯)),\displaystyle\qquad+\alpha I(\tilde{U};(\tilde{Y}_{i})_{i=1}^{m}|\tilde{X})+(\alpha+1)D(P_{\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}\|\bar{P})\bigg{)}, (180)

where P_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}} is a joint probability measure on \tilde{\mathcal{U}}\times\mathcal{X}\times\mathcal{Y}^{m} satisfying P_{\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}\ll\bar{P} and (\tilde{U},\tilde{X},(\tilde{Y}_{i})_{i=1}^{m})\sim P_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}. By the support lemma in [27] we can upper bound the cardinality |\tilde{\mathcal{U}}| by a constant. An alternative characterization of R_{\mathrm{HT}}^{\mu}(\bar{P}) is given in the following.

Lemma 3.
supα>0RHTμ,α(P¯)=RHTμ(P¯).\displaystyle\sup_{\alpha>0}R_{\mathrm{HT}}^{\mu,\alpha}(\bar{P})=R_{\mathrm{HT}}^{\mu}(\bar{P}). (181)

The proof of Lemma 3 is deferred to the end of Subsection C-A.

C-A Strong converse proof for ϵ<min{mins𝒮1|𝔉s|,1}\epsilon<\min\{\min_{s\in\mathcal{S}}\frac{1}{|\mathfrak{F}_{s}|},1\}

We now present the main part of the proof of Theorem 2. In the proof, we will use a recent technique by Tyagi and Watanabe [26]. When the inactive set is empty, 𝒮=\mathcal{S}=\varnothing, showing that Ecomp,ϵ(Rc)mini[1:m]ξi(Rc)E_{\mathrm{comp},\epsilon}^{\star}(R_{c})\leq\min_{i\in[1:m]}\xi_{i}(R_{c}) for all ϵ[0,1)\epsilon\in[0,1) can be deduced from the strong converse result of testing PYiXinP_{Y_{i}X_{i}}^{\otimes n} against QYjin×QXtnQ_{Y_{j_{i}^{\star}}^{n}}\times Q_{X_{t^{\star}}^{n}} for all i[1:m]i\in[1:m], cf. Theorem 7. Without loss of generality, we assume in the following that the inactive set is not empty 𝒮\mathcal{S}\neq\varnothing.

Assume first that the set of marginal distributions on 𝒳\mathcal{X}, 𝒫𝒳\mathcal{P}_{\mathcal{X}}, is a singleton. This implies that |𝔉1|=m|\mathfrak{F}_{1}|=m holds. Given a sequence of testing schemes (ϕn,ψn)(\phi_{n},\psi_{n}) such that the ϵ\epsilon-achievability conditions are fulfilled

lim supn1nlog|ϕn|\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log|\phi_{n}| Rc,lim supnαnϵ,\displaystyle\leq R_{c},\;\limsup_{n\to\infty}\alpha_{n}\leq\epsilon,
lim infn\displaystyle\;\liminf_{n\to\infty} 1nlog1βnE,\displaystyle\frac{1}{n}\log\frac{1}{\beta_{n}}\geq E, (182)

we define for each i[1:m]i\in[1:m] the following likelihood based decision region

𝒜n,γi={(yn,ϕn(xn))\displaystyle\mathcal{A}_{n,\gamma}^{i}=\{(y^{n},\phi_{n}(x^{n}))\mid PYinϕn(Xin)(yn,ϕn(xn))\displaystyle P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(y^{n},\phi_{n}(x^{n}))
en(Eγ)QYjin×Qϕn(Xtn)(yn,ϕn(xn))}.\displaystyle\geq e^{n(E-\gamma)}Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}(X_{t^{\star}}^{n})}(y^{n},\phi_{n}(x^{n}))\}. (183)

By [11, Lemma 4.1.2], cf. also [31, Lemma 12.2], we obtain

αn+en(Eγ)βn\displaystyle\alpha_{n}+e^{n(E-\gamma)}\beta_{n} PYinϕn(Xin)[(𝒜n,γi)c],\displaystyle\geq P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}[(\mathcal{A}_{n,\gamma}^{i})^{c}],
QYjin×Qϕn(Xtn)(𝒜in,γ)\displaystyle Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}(X_{t^{\star}}^{n})}(\mathcal{A}^{i}_{n,\gamma}) en(Eγ).\displaystyle\leq e^{-n(E-\gamma)}. (184)

For each n, (\phi_{n},\mathbf{1}_{\mathcal{A}^{i}_{n,\gamma}}) can be seen as a testing scheme for differentiating between P_{Y_{i}X_{i}}^{\otimes n} and Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{X_{t^{\star}}^{n}}. For a given i\in[1:m] we now construct a testing scheme (\bar{\phi}_{n},\bar{\psi}_{in}) to differentiate between P_{Y_{i}X_{i}}^{\otimes n} and P_{Y_{i}}^{\otimes n}\times P_{X}^{\otimes n}. The compression mapping \bar{\phi}_{n} does not depend on i. Our arguments are similar to the code transformations given in Appendix A; we present the procedure in the following for completeness. Given \phi_{n}, \bar{\phi}_{n} is defined as

ϕ¯n:𝒳n\displaystyle\bar{\phi}_{n}\colon\mathcal{X}^{n} {e}\displaystyle\to\mathcal{M}\cup\{e\}
ϕ¯n(xn)\displaystyle\bar{\phi}_{n}(x^{n}) {ϕn(xn),ifxnn,γ,eotherwise.\displaystyle\mapsto\begin{dcases}\phi_{n}(x^{n}),\;&\text{if}\;x^{n}\in\mathcal{B}_{n,\gamma},\\ e\;&\text{otherwise}\end{dcases}.

For each i𝔉1i\in\mathfrak{F}_{1}, the decision mapping ψ¯in\bar{\psi}_{in} is defined as

ψ¯in:𝒴n×({e})\displaystyle\bar{\psi}_{in}\colon\mathcal{Y}^{n}\times(\mathcal{M}\cup\{e\}) {0,1}\displaystyle\to\{0,1\}
ψ¯in(yn,u¯)\displaystyle\bar{\psi}_{in}(y^{n},\bar{u}) {𝟏𝒜in,γ(yn,u¯),ifynn,γ(i),andu¯e,1otherwise.\displaystyle\mapsto\begin{dcases}\mathbf{1}_{\mathcal{A}^{i}_{n,\gamma}}(y^{n},\bar{u}),\;&\text{if}\;y^{n}\in\mathcal{B}_{n,\gamma}^{(i)},\;\text{and}\;\bar{u}\neq e,\\ 1&\text{otherwise}\end{dcases}.

In the above definitions the typical sets n,γ\mathcal{B}_{n,\gamma} (or n,γs\mathcal{B}_{n,\gamma}^{s}) and n,γ(i)\mathcal{B}_{n,\gamma}^{(i)} are defined in (173). Then we also have

PYinϕ¯n(Xin)(1ψ¯in)\displaystyle P_{Y_{i}^{n}\bar{\phi}_{n}(X_{i}^{n})}(1-\bar{\psi}_{in}) PYinϕn(Xin)[(𝒜n,γi)c]+PXn[(n,γ)c]+PYin[(n,γ(i))c]\displaystyle\leq P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}[(\mathcal{A}_{n,\gamma}^{i})^{c}]+P_{X}^{\otimes n}[(\mathcal{B}_{n,\gamma})^{c}]+P_{Y_{i}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{(i)})^{c}]
PYin×Pϕ¯n(Xin)(ψ¯in)\displaystyle P_{Y_{i}^{n}}\times P_{\bar{\phi}_{n}(X_{i}^{n})}(\bar{\psi}_{in}) en(diy+dsx+2γ)QYjin×Qϕn(Xtn)(𝒜in,γ)\displaystyle\leq e^{n(d_{i}^{\mathrm{y}}+d_{s}^{\mathrm{x}}+2\gamma)}Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}(X_{t^{\star}}^{n})}(\mathcal{A}^{i}_{n,\gamma})
en(Ed¯i3γ).\displaystyle\leq e^{-n(E-\bar{d}_{i}^{\star}-3\gamma)}. (185)

Let 𝒜n(i)\mathcal{A}_{n}^{(i)} be the corresponding acceptance region of ψ¯in\bar{\psi}_{in}. We take n0n_{0} to be sufficiently large such that the following conditions hold for all nn0n\geq n_{0}

log|ϕ¯n|\displaystyle\log|\bar{\phi}_{n}| n(Rc+γ),\displaystyle\leq n(R_{c}+\gamma),
PYinϕ¯n(Xin)(𝒜n(i))\displaystyle P_{Y_{i}^{n}\bar{\phi}_{n}(X_{i}^{n})}(\mathcal{A}_{n}^{(i)}) 1(ϵ+γ),i[1:m].\displaystyle\geq 1-(\epsilon+\gamma),\;\forall i\in[1:m]. (186)

Next, we can further decompose 𝒜n(i)\mathcal{A}_{n}^{(i)} as

𝒜n(i)=u𝒜n,u(i)×{u}.\mathcal{A}_{n}^{(i)}=\bigcup_{u\in\mathcal{M}}\mathcal{A}_{n,u}^{(i)}\times\{u\}. (187)

For simplicity define δ=1(ϵ+γ)\delta=1-(\epsilon+\gamma). Consider the set

𝒱i={xnPYi|Xin(𝒜n,ϕ¯n(xn)(i)|xn)>η},\displaystyle\mathcal{V}_{i}=\{x^{n}\mid P_{Y_{i}|X_{i}}^{\otimes n}(\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}|x^{n})>\eta\}, (188)

where δ>η>0\delta>\eta>0. Then we have

δ\displaystyle\delta PYinϕn(Xin)(𝒜n(i))\displaystyle\leq P_{Y_{i}^{n}\phi_{n}(X_{i}^{n})}(\mathcal{A}_{n}^{(i)})
=Pr[(Yin,ϕn(Xin))𝒜n(i),Xin𝒱i]+Pr[(Yin,ϕn(Xin))𝒜n(i),Xin𝒱i]\displaystyle=\mathrm{Pr}[(Y_{i}^{n},\phi_{n}(X_{i}^{n}))\in\mathcal{A}_{n}^{(i)},X_{i}^{n}\in\mathcal{V}_{i}]+\mathrm{Pr}[(Y_{i}^{n},\phi_{n}(X_{i}^{n}))\in\mathcal{A}_{n}^{(i)},X_{i}^{n}\notin\mathcal{V}_{i}]
PXn(𝒱i)+ηPXn(𝒱ic),\displaystyle\leq P_{X}^{\otimes n}(\mathcal{V}_{i})+\eta P_{X}^{\otimes n}(\mathcal{V}_{i}^{c}), (189)

which implies that PXn(𝒱i)(δη)/(1η)P_{X}^{\otimes n}(\mathcal{V}_{i})\geq(\delta-\eta)/(1-\eta). Using this inequality we further obtain

PXn(i=1m𝒱i)\displaystyle P_{X}^{\otimes n}(\cap_{i=1}^{m}\mathcal{V}_{i}) =1PXn(i=1m𝒱ic)\displaystyle=1-P_{X}^{\otimes n}(\cup_{i=1}^{m}\mathcal{V}_{i}^{c})
1m(1δ)/(1η).\displaystyle\geq 1-m(1-\delta)/(1-\eta). (190)

For P_{X}^{\otimes n}(\cap_{i=1}^{m}\mathcal{V}_{i})>0, the condition

η<1m(1δ)=1m(ϵ+γ),\eta<1-m(1-\delta)=1-m(\epsilon+\gamma), (191)

must hold. Since \eta>0, this in turn requires \epsilon<1/m=1/|\mathfrak{F}_{1}| (for sufficiently small \gamma). Taking \eta=(1-m(\epsilon+\gamma))/2, we obtain P_{X}^{\otimes n}(\cap_{i=1}^{m}\mathcal{V}_{i})>(1-m(\epsilon+\gamma))/(1+m(\epsilon+\gamma)). Define \tilde{\mathcal{V}}_{n}=\cap_{i=1}^{m}\mathcal{V}_{i}, \tilde{\epsilon}=(1-m(\epsilon+\gamma))/(1+m(\epsilon+\gamma)), and the following distribution on \mathcal{X}^{n}

P~𝒳n(xn)=PXn(xn)/PXn(𝒱~n)𝟏{xn𝒱~n}.\displaystyle\tilde{P}_{\mathcal{X}^{n}}(x^{n})=P_{X}^{\otimes n}(x^{n})/P_{X}^{\otimes n}(\tilde{\mathcal{V}}_{n})\mathbf{1}\{x^{n}\in\tilde{\mathcal{V}}_{n}\}. (192)
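For a concrete feel of these quantities, consider the hypothetical values m=2, \epsilon=0.4, and \gamma=0.05, so that \epsilon<1/m. Then m(\epsilon+\gamma)=0.9, the constraint (191) reads \eta<0.1, the choice above gives \eta=0.05, and

\tilde{\epsilon}=\frac{1-m(\epsilon+\gamma)}{1+m(\epsilon+\gamma)}=\frac{0.1}{1.9}\approx 0.053,

so \tilde{\mathcal{V}}_{n} carries probability mass greater than \tilde{\epsilon} under P_{X}^{\otimes n}.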

For a given xn𝒱~nx^{n}\in\tilde{\mathcal{V}}_{n} we also define the following joint conditional distribution on 𝒴nm\mathcal{Y}^{nm}

\tilde{P}_{\mathcal{Y}^{nm},x^{n}}((y_{i}^{n})_{i=1}^{m}|x^{n})=\prod_{i=1}^{m}\frac{P_{Y_{i}|X_{i}}^{\otimes n}(y_{i}^{n}|x^{n})}{P_{Y_{i}|X_{i}}^{\otimes n}(\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}|x^{n})}\mathbf{1}\{y_{i}^{n}\in\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}\}. (193)

Additionally, we define \tilde{P}_{\mathcal{Y}^{nm},x^{n}}((y_{i}^{n})_{i=1}^{m}|x^{n})=0 for all (y_{i}^{n})_{i=1}^{m} if x^{n}\notin\tilde{\mathcal{V}}_{n}. Let (\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m}) be a tuple of general sources such that

(X~n,(Y~in)i=1m)P~P~𝒴nm,xn×P~𝒳n.(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m})\sim\tilde{P}\triangleq\tilde{P}_{\mathcal{Y}^{nm},x^{n}}\times\tilde{P}_{\mathcal{X}^{n}}. (194)

Then for each xn𝒱~nx^{n}\in\tilde{\mathcal{V}}_{n} we have the following inequality

PXn(xn)=PX~n(xn)PXn(𝒱~n)ϵ~PX~n(xn).\displaystyle P_{X}^{\otimes n}(x^{n})=P_{\tilde{X}^{n}}(x^{n})P_{X}^{\otimes n}(\tilde{\mathcal{V}}_{n})\geq\tilde{\epsilon}P_{\tilde{X}^{n}}(x^{n}). (195)
Figure 4: Different situations for ynu:ϕ¯n1(u)𝒱~n𝒜n,u(i)y^{n}\in\bigcup_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\mathcal{A}_{n,u}^{(i)}.

Consider an arbitrarily fixed sequence y^{n}\in\bigcup_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\mathcal{A}_{n,u}^{(i)}. For the following analysis, the illustration given in Fig. 4 may be helpful. Let u be a message index such that \bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing and y^{n}\in\mathcal{A}_{n,u}^{(i)} hold. For all x^{n}\in\tilde{\mathcal{V}}_{n}\cap\bar{\phi}_{n}^{-1}(u), we then have

PYiXin(yn,xn)\displaystyle P_{Y_{i}X_{i}}^{\otimes n}(y^{n},x^{n}) =PYi|Xin(𝒜n,ϕ¯n(xn)(i)|xn)PXn(𝒱~n)PY~inX~n(yn,xn)\displaystyle=P_{Y_{i}|X_{i}}^{\otimes n}(\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}|x^{n})P_{X}^{\otimes n}(\tilde{\mathcal{V}}_{n})P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n})
ηϵ~PY~inX~n(yn,xn).\displaystyle\geq\eta\tilde{\epsilon}P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n}). (196)

If u is another message index such that \bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing and y^{n}\notin\mathcal{A}_{n,u}^{(i)} hold, then for all x^{n}\in\tilde{\mathcal{V}}_{n}\cap\bar{\phi}_{n}^{-1}(u) we have P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n})=0. When x^{n}\notin\tilde{\mathcal{V}}_{n}, we also have P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n})=0. Hence for all y^{n}\in\bigcup_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\mathcal{A}_{n,u}^{(i)}, we obtain

PYin(yn)=xn𝒳nPYiXin(yn,xn)\displaystyle P_{Y_{i}}^{\otimes n}(y^{n})=\sum_{x^{n}\in\mathcal{X}^{n}}P_{Y_{i}X_{i}}^{\otimes n}(y^{n},x^{n})
=xn𝒱~nPYiXin(yn,xn)+xn𝒱~nyn𝒜n,ϕ¯n(xn)(i)PYiXin(yn,xn)\displaystyle=\sum_{x^{n}\notin\tilde{\mathcal{V}}_{n}}P_{Y_{i}X_{i}}^{\otimes n}(y^{n},x^{n})+\sum_{\begin{subarray}{c}x^{n}\in\tilde{\mathcal{V}}_{n}\\ y^{n}\in\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}\end{subarray}}P_{Y_{i}X_{i}}^{\otimes n}(y^{n},x^{n})
+xn𝒱~nyn𝒜n,ϕ¯n(xn)(i)PYiXin(yn,xn)\displaystyle\hskip 28.45274pt+\sum_{\begin{subarray}{c}x^{n}\in\tilde{\mathcal{V}}_{n}\\ y^{n}\notin\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}\end{subarray}}P_{Y_{i}X_{i}}^{\otimes n}(y^{n},x^{n})
ηϵ~[xn𝒱~nPY~inX~n(yn,xn)+xn𝒱~nyn𝒜n,ϕ¯n(xn)(i)PY~inX~n(yn,xn)\displaystyle\geq\eta\tilde{\epsilon}\big{[}\sum_{x^{n}\notin\tilde{\mathcal{V}}_{n}}P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n})+\sum_{\begin{subarray}{c}x^{n}\in\tilde{\mathcal{V}}_{n}\\ y^{n}\in\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}\end{subarray}}P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n})
+xn𝒱~nyn𝒜n,ϕ¯n(xn)(i)PY~inX~n(yn,xn)]\displaystyle\hskip 28.45274pt+\sum_{\begin{subarray}{c}x^{n}\in\tilde{\mathcal{V}}_{n}\\ y^{n}\notin\mathcal{A}_{n,\bar{\phi}_{n}(x^{n})}^{(i)}\end{subarray}}P_{\tilde{Y}_{i}^{n}\tilde{X}^{n}}(y^{n},x^{n})\big{]}
=\eta\tilde{\epsilon}P_{\tilde{Y}_{i}^{n}}(y^{n}). (197)

We observe that not only does \tilde{P}\ll\bar{P}^{\otimes n} hold, but also the following bound holds

D(P~P¯n)\displaystyle D(\tilde{P}\|\bar{P}^{\otimes n})
=\sum P_{\tilde{X}^{n}}(x^{n})\log\frac{P_{\tilde{X}^{n}}(x^{n})}{P_{X}^{\otimes n}(x^{n})}
+PX~n(xn)P(Y~in)i=1m|X~n((yin)i=1m|xn)logP(Y~in)i=1m|X~n((yin)i=1m|xn)i=1mPYi|Xin(yin|xn)\displaystyle+\sum P_{\tilde{X}^{n}}(x^{n})\sum P_{(\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n}}((y_{i}^{n})_{i=1}^{m}|x^{n})\log\frac{P_{(\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n}}((y_{i}^{n})_{i=1}^{m}|x^{n})}{\prod_{i=1}^{m}P_{Y_{i}|X_{i}}^{\otimes n}(y_{i}^{n}|x^{n})}
log1ϵ~+mlog1ηη~.\displaystyle\leq\log\frac{1}{\tilde{\epsilon}}+m\log\frac{1}{\eta}\triangleq\tilde{\eta}. (198)

We now derive bounds on the compression rate RcR_{c} and the ϵ\epsilon-achievable error exponent EE using expressions involving (X~n,(Y~in)i=1m)(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m}). We first have

n(Rc+γ)log|ϕ¯n|I(X~n;M~),\displaystyle n(R_{c}+\gamma)\geq\log|\bar{\phi}_{n}|\geq I(\tilde{X}^{n};\tilde{M}), (199)

where M~=ϕ¯n(X~n)\tilde{M}=\bar{\phi}_{n}(\tilde{X}^{n}). Note further that the following Markov chain holds

(Y~in)i=1mX~nM~.(\tilde{Y}_{i}^{n})_{i=1}^{m}-\tilde{X}^{n}-\tilde{M}. (200)

For each i[1:m]i\in[1:m] the support set of the joint distribution PY~inϕ¯n(X~n)P_{\tilde{Y}_{i}^{n}\bar{\phi}_{n}(\tilde{X}^{n})} is a subset of the following set, cf. Fig. 4 for a visual illustration,

𝒜~n(i)=u:ϕ¯n1(u)𝒱~n𝒜n,u(i)×{u}.\tilde{\mathcal{A}}_{n}^{(i)}=\bigcup_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\mathcal{A}_{n,u}^{(i)}\times\{u\}.

Then compared with (187) we have 𝒜~n(i)𝒜n(i)\tilde{\mathcal{A}}_{n}^{(i)}\subseteq\mathcal{A}_{n}^{(i)} due to the restriction on uu. Therefore, on one hand we have

PYin×Pϕ¯n(Xin)(𝒜~n(i))PYin×Pϕ¯n(Xin)(𝒜n(i)).\displaystyle P_{Y_{i}^{n}}\times P_{\bar{\phi}_{n}(X_{i}^{n})}(\tilde{\mathcal{A}}_{n}^{(i)})\leq P_{Y_{i}^{n}}\times P_{\bar{\phi}_{n}(X_{i}^{n})}(\mathcal{A}_{n}^{(i)}). (201)

On the other hand from (195) and (197) we have

PYin×Pϕ¯n(Xin)(𝒜~n(i))=u:ϕ¯n1(u)𝒱~nPϕ¯n(Xin)(u)yn𝒜n,u(i)PYin(yn)\displaystyle P_{Y_{i}^{n}}\times P_{\bar{\phi}_{n}(X_{i}^{n})}(\tilde{\mathcal{A}}_{n}^{(i)})=\sum_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}P_{\bar{\phi}_{n}(X_{i}^{n})}(u)\sum_{y^{n}\in\mathcal{A}_{n,u}^{(i)}}P_{Y_{i}}^{\otimes n}(y^{n})
()u:ϕ¯n1(u)𝒱~nxnϕ¯n1(u)𝒱~nPXn(xn)yn𝒜n,u(i)PYin(yn)\displaystyle\stackrel{{\scriptstyle(*)}}{{\geq}}\sum_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\sum_{x^{n}\in\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}}P_{X}^{\otimes n}(x^{n})\sum_{y^{n}\in\mathcal{A}_{n,u}^{(i)}}P_{Y_{i}}^{\otimes n}(y^{n})
u:ϕ¯n1(u)𝒱~nxnϕ¯n1(u)𝒱~nϵ~PX~n(xn)yn𝒜n,u(i)ϵ~ηPY~in(yn)\displaystyle\geq\sum_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\sum_{x^{n}\in\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}}\tilde{\epsilon}P_{\tilde{X}^{n}}(x^{n})\sum_{y^{n}\in\mathcal{A}_{n,u}^{(i)}}\tilde{\epsilon}\eta P_{\tilde{Y}_{i}^{n}}(y^{n})
=ϵ~2ηu:ϕ¯n1(u)𝒱~nyn𝒜n,u(i)Pϕ¯n(X~n)(u)PY~in(yn).\displaystyle=\tilde{\epsilon}^{2}\eta\sum_{u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing}\sum_{y^{n}\in\mathcal{A}_{n,u}^{(i)}}P_{\bar{\phi}_{n}(\tilde{X}^{n})}(u)P_{\tilde{Y}_{i}^{n}}(y^{n}). (202)

Here (*) holds because P_{\bar{\phi}_{n}(X_{i}^{n})}(u)=\sum_{x^{n}\in\bar{\phi}_{n}^{-1}(u)}P_{X}^{\otimes n}(x^{n}) and restricting the sum to \bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n} can only decrease it. The last equality is valid because for x^{n}\in\bar{\phi}_{n}^{-1}(u)\cap(\tilde{\mathcal{V}}_{n})^{c} we have P_{\tilde{X}^{n}}(x^{n})=0. This leads to the following

\log\frac{1}{P_{Y_{i}^{n}}\times P_{\bar{\phi}_{n}(X_{i}^{n})}(\tilde{\mathcal{A}}_{n}^{(i)})}
\stackrel{(**)}{\leq}\sum_{u,y^{n}}P_{\tilde{Y}_{i}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)\log\frac{P_{\tilde{Y}_{i}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)}{\tilde{\epsilon}^{2}\eta P_{\bar{\phi}_{n}(\tilde{X}^{n})}(u)P_{\tilde{Y}_{i}^{n}}(y^{n})}
\leq I(\tilde{Y}_{i}^{n};\bar{\phi}_{n}(\tilde{X}^{n}))+\log\frac{1}{\eta\tilde{\epsilon}^{2}}, (203)

where we have used the log-sum inequality in (**), with the summation therein taken over u\colon\bar{\phi}_{n}^{-1}(u)\cap\tilde{\mathcal{V}}_{n}\neq\varnothing and y^{n}\in\mathcal{A}_{n,u}^{(i)}. The last inequality holds since for (y^{n},u)\notin\tilde{\mathcal{A}}_{n}^{(i)} we have P_{\tilde{Y}_{i}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)=0. Combining (185), (201), and (203) we obtain

n(E3γ)mini[1:m][I(Y~in;M~)+nd¯i]+log1ηϵ~2.\displaystyle n(E-3\gamma)\leq\min_{i\in[1:m]}[I(\tilde{Y}_{i}^{n};\tilde{M})+n\bar{d}_{i}^{\star}]+\log\frac{1}{\eta\tilde{\epsilon}^{2}}. (204)

By combining the two previous inequalities (199) and (204), we obtain for arbitrarily given positive numbers \mu and \alpha

n(Rc+γμ(E3γ))\displaystyle n(R_{c}+\gamma-\mu(E-3\gamma))
I(X~n;M~)μmini[1:m][I(Y~in;M~)+nd¯i]μlog1ηϵ~2\displaystyle\geq I(\tilde{X}^{n};\tilde{M})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i}^{n};\tilde{M})+n\bar{d}_{i}^{\star}]-\mu\log\frac{1}{\eta\tilde{\epsilon}^{2}}
(198),(200)I(X~n;M~)μmini[1:m][I(Y~in;M~)+nd¯i]\displaystyle\stackrel{{\scriptstyle\eqref{divergence_bound},\eqref{markov_thm2}}}{{\geq}}I(\tilde{X}^{n};\tilde{M})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i}^{n};\tilde{M})+n\bar{d}_{i}^{\star}]
+(α+1)I(M~;(Y~in)i=1m|X~n)+(α+1)D(P~P¯n)(α+1)η~μlog1ηϵ~2\displaystyle+(\alpha+1)I(\tilde{M};(\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n})+(\alpha+1)D(\tilde{P}\|\bar{P}^{\otimes n})-(\alpha+1)\tilde{\eta}-\mu\log\frac{1}{\eta\tilde{\epsilon}^{2}}
=I(M~;X~n,(Y~in)i=1m)μmini[1:m][I(Y~in;M~)+nd¯i]+αI(M~;(Y~in)i=1m|X~n)\displaystyle=I(\tilde{M};\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i}^{n};\tilde{M})+n\bar{d}_{i}^{\star}]+\alpha I(\tilde{M};(\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n})
+(α+1)D(P~P¯n)(α+1)η~μlog1ηϵ~2\displaystyle+(\alpha+1)D(\tilde{P}\|\bar{P}^{\otimes n})-(\alpha+1)\tilde{\eta}-\mu\log\frac{1}{\eta\tilde{\epsilon}^{2}}
=A1+A2(α+1)η~μlog1ηϵ~2,\displaystyle=A_{1}+A_{2}-(\alpha+1)\tilde{\eta}-\mu\log\frac{1}{\eta\tilde{\epsilon}^{2}}, (205)

where

A1\displaystyle A_{1} =H(X~n,(Y~in)i=1m)+αH((Y~in)i=1m|X~n)+(α+1)D(P~P¯n),\displaystyle=H(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m})+\alpha H((\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n})+(\alpha+1)D(\tilde{P}\|\bar{P}^{\otimes n}),
A2\displaystyle A_{2} =H(X~n,(Y~in)i=1m|M~)αH((Y~in)i=1m|X~n,M~)\displaystyle=-H(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{M})-\alpha H((\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n},\tilde{M})
μmini[1:m][I(Y~in;M~)+nd¯i].\displaystyle\hskip 56.9055pt-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i}^{n};\tilde{M})+n\bar{d}_{i}^{\star}]. (206)

Let T be a random variable uniformly distributed on [1:n]. Furthermore, define \tilde{U}_{l}=(\tilde{M},(\tilde{Y}_{i}^{l-1})_{i=1}^{m}) for all l\in[1:n], and \tilde{U}=(\tilde{U}_{T},T). We show at the end of this subsection that

A1\displaystyle A_{1} n(H(X~T,(Y~iT)i=1m)+αH((Y~iT)i=1m|X~T)\displaystyle\geq n(H(\tilde{X}_{T},(\tilde{Y}_{iT})_{i=1}^{m})+\alpha H((\tilde{Y}_{iT})_{i=1}^{m}|\tilde{X}_{T})
+(α+1)D(PX~T(Y~iT)i=1mP¯)),\displaystyle\hskip 28.45274pt+(\alpha+1)D(P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}\|\bar{P})), (207)
A2\displaystyle A_{2} n(H(X~T,(Y~iT)i=1m|U~)αH((Y~iT)i=1m|X~T,U~)\displaystyle\geq n(-H(\tilde{X}_{T},(\tilde{Y}_{iT})_{i=1}^{m}|\tilde{U})-\alpha H((\tilde{Y}_{iT})_{i=1}^{m}|\tilde{X}_{T},\tilde{U})
μmini[1:m][I(Y~iT;U~)+d¯i]).\displaystyle\hskip 28.45274pt-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{iT};\tilde{U})+\bar{d}_{i}^{\star}]). (208)

In summary we obtain for all positive μ\mu and α\alpha that

(Rc+γ)μ(E3γ)\displaystyle(R_{c}+\gamma)-\mu(E-3\gamma)
I(X~T,(Y~iT)i=1m;U~)μmini[1:m][I(Y~iT;U~)+d¯i]+αI(U~;(Y~iT)i=1m|X~T)\displaystyle\geq I(\tilde{X}_{T},(\tilde{Y}_{iT})_{i=1}^{m};\tilde{U})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{iT};\tilde{U})+\bar{d}_{i}^{\star}]+\alpha I(\tilde{U};(\tilde{Y}_{iT})_{i=1}^{m}|\tilde{X}_{T})
+(α+1)D(PX~T(Y~iT)i=1mP¯)(α+1)η~+μlog1ηϵ~2n\displaystyle\qquad+(\alpha+1)D(P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}\|\bar{P})-\frac{(\alpha+1)\tilde{\eta}+\mu\log\frac{1}{\eta\tilde{\epsilon}^{2}}}{n}
RHTμ,α(P¯)(α+1)η~+μlog1ηϵ~2n.\displaystyle\geq R_{\text{HT}}^{\mu,\alpha}(\bar{P})-\frac{(\alpha+1)\tilde{\eta}+\mu\log\frac{1}{\eta\tilde{\epsilon}^{2}}}{n}. (209)

Taking nn\to\infty and supremum over α>0\alpha>0 and using Lemma 3, we have shown that (Rc+γ,E3γ)(R_{c}+\gamma,E-3\gamma)\in\mathcal{R}. Taking γ0\gamma\to 0 we obtain the conclusion that Eθ(Rc)E\leq\theta(R_{c}).
If the set of marginal distributions on \mathcal{X}, \mathcal{P}_{\mathcal{X}}, has more than one element, then for each inactive s\in\mathcal{S} we have E\leq\theta_{s}(R_{c}) provided that \epsilon<1/|\mathfrak{F}_{s}| holds. For each active s\notin\mathcal{S}, by the strong converse result for the simple hypothesis testing problem we obtain E\leq\min_{i\in\mathfrak{F}_{s}}\xi_{i}(R_{c})=\theta_{s}(R_{c}) for all \epsilon\in[0,1). Thereby we obtain the conclusion.

Proof of Lemma 3: For any P_{U|X} in the optimization domain of R_{\text{HT}}^{\mu}(\bar{P}), select P_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}=P_{U|X}\times\bar{P}. Then we can see that

supα>0RHTμ,α(P¯)RHTμ(P¯).\sup_{\alpha>0}R_{\text{HT}}^{\mu,\alpha}(\bar{P})\leq R_{\text{HT}}^{\mu}(\bar{P}).

Given an α>0\alpha>0, let PU~X~(Y~i)i=1mαP_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}^{\alpha} be an optimal solution for RHTμ,α(P¯)R_{\text{HT}}^{\mu,\alpha}(\bar{P}). As RHTμ(P¯)log|𝒳|R_{\text{HT}}^{\mu}(\bar{P})\leq\log|\mathcal{X}|, and I(X~;U~)μmini[1:m][I(Y~i;U~)+d¯i]μ(log|𝒴|+minid¯i)I(\tilde{X};\tilde{U})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i};\tilde{U})+\bar{d}_{i}^{\star}]\geq-\mu(\log|\mathcal{Y}|+\min_{i}\bar{d}_{i}^{\star}), we have

D(PU~X~(Y~i)i=1mαPU~|X~α×P¯)\displaystyle D(P_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}^{\alpha}\|P_{\tilde{U}|\tilde{X}}^{\alpha}\times\bar{P}) =I(U~;(Y~i)i=1m|X~)+D(PX~(Y~i)i=1mαP¯)\displaystyle=I(\tilde{U};(\tilde{Y}_{i})_{i=1}^{m}|\tilde{X})+D(P_{\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}^{\alpha}\|\bar{P})
log|𝒳|+μ(log|𝒴|+mini[1:m]d¯i)α.\displaystyle\leq\frac{\log|\mathcal{X}|+\mu(\log|\mathcal{Y}|+\min_{i\in[1:m]}\bar{d}_{i}^{\star})}{\alpha}. (210)

We then have

RHTμ,α(P¯)\displaystyle R_{\text{HT}}^{\mu,\alpha}(\bar{P})
I(X~;U~)μmini[1:m][I(Y~i;U~)+d¯i]\displaystyle\geq I(\tilde{X};\tilde{U})-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{i};\tilde{U})+\bar{d}_{i}^{\star}] eval. withPU~X~(Y~i)i=1mα\displaystyle\text{eval. with}\;P_{\tilde{U}\tilde{X}(\tilde{Y}_{i})_{i=1}^{m}}^{\alpha}
I(X;U¯)μmini[1:m][I(Yi;U¯)+d¯i]\displaystyle\geq I(X;\bar{U})-\mu\min_{i\in[1:m]}[I(Y_{i};\bar{U})+\bar{d}_{i}^{\star}]
Δ(log|𝒳|+μ(log|𝒴|+mini[1:m]d¯i)α)\displaystyle-\Delta\bigg{(}\frac{\log|\mathcal{X}|+\mu(\log|\mathcal{Y}|+\min_{i\in[1:m]}\bar{d}_{i}^{\star})}{\alpha}\bigg{)} eval. withPU~|X~α×P¯\displaystyle\text{eval. with}\;P_{\tilde{U}|\tilde{X}}^{\alpha}\times\bar{P}
RHTμ(P¯)Δ(log|𝒳|+μ(log|𝒴|+mini[1:m]d¯i)α),\displaystyle\geq R_{\text{HT}}^{\mu}(\bar{P})-\Delta\bigg{(}\frac{\log|\mathcal{X}|+\mu(\log|\mathcal{Y}|+\min_{i\in[1:m]}\bar{d}_{i}^{\star})}{\alpha}\bigg{)}, (211)

where \Delta(t)\to 0 as t\to 0. Taking the supremum over \alpha we obtain the conclusion. ∎

Proof of (207) and (208): First, we have

A1=\displaystyle A_{1}= H(X~n,(Y~in)i=1m)+D(P~P¯n)\displaystyle H(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m})+D(\tilde{P}\|\bar{P}^{\otimes n})
+α[H((Y~in)i=1m|X~n)+D(P~P¯n)].\displaystyle+\alpha[H((\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n})+D(\tilde{P}\|\bar{P}^{\otimes n})]. (212)

Note that since \tilde{P}\ll\bar{P}^{\otimes n}, whenever \bar{P}(x,(y_{i})_{i=1}^{m})=0 we must have P_{\tilde{X}_{l}(\tilde{Y}_{il})_{i=1}^{m}}(x,(y_{i})_{i=1}^{m})=0 for all l\in[1:n]. This implies that P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}=\frac{1}{n}\sum_{l=1}^{n}P_{\tilde{X}_{l}(\tilde{Y}_{il})_{i=1}^{m}}\ll\bar{P}. These absolute continuity relations ensure the validity of the following derivations

H(X~n,(Y~in)i=1m)+D(P~P¯n)\displaystyle H(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m})+D(\tilde{P}\|\bar{P}^{\otimes n})
=PX~n(Y~in)i=1m(xn,(yin)i=1m)log1P¯n(xn,(yin)i=1m)\displaystyle=\sum P_{\tilde{X}^{n}(\tilde{Y}_{i}^{n})_{i=1}^{m}}(x^{n},(y_{i}^{n})_{i=1}^{m})\log\frac{1}{\bar{P}^{\otimes n}(x^{n},(y_{i}^{n})_{i=1}^{m})}
=l=1nPX~l(Y~il)i=1m(x,(yi)i=1m)log1P¯(x,(yi)i=1m)\displaystyle=\sum_{l=1}^{n}\sum P_{\tilde{X}_{l}(\tilde{Y}_{il})_{i=1}^{m}}(x,(y_{i})_{i=1}^{m})\log\frac{1}{\bar{P}(x,(y_{i})_{i=1}^{m})}
=PX~T(Y~iT)i=1m(x,(yi)i=1m)log1P¯(x,(yi)i=1m)\displaystyle=\sum P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}(x,(y_{i})_{i=1}^{m})\log\frac{1}{\bar{P}(x,(y_{i})_{i=1}^{m})}
=n(H(X~T,(Y~iT)i=1m)+D(PX~T(Y~iT)i=1mP¯)),\displaystyle=n(H(\tilde{X}_{T},(\tilde{Y}_{iT})_{i=1}^{m})+D(P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}\|\bar{P})), (213)

and

H((Y~in)i=1m|X~n)+D(P~P¯n)=H(X~n)+H(X~n,(Y~in)i=1m)+D(P~P¯n)\displaystyle H((\tilde{Y}_{i}^{n})_{i=1}^{m}|\tilde{X}^{n})+D(\tilde{P}\|\bar{P}^{\otimes n})=-H(\tilde{X}^{n})+H(\tilde{X}^{n},(\tilde{Y}_{i}^{n})_{i=1}^{m})+D(\tilde{P}\|\bar{P}^{\otimes n})
=H(X~n)+nH(X~T)+nH((Y~iT)i=1m|X~T)+nD(PX~T(Y~iT)i=1mP¯)\displaystyle=-H(\tilde{X}^{n})+nH(\tilde{X}_{T})+nH((\tilde{Y}_{iT})_{i=1}^{m}|\tilde{X}_{T})+nD(P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}\|\bar{P})
nH((Y~iT)i=1m|X~T)+nD(PX~T(Y~iT)i=1mP¯).\displaystyle\geq nH((\tilde{Y}_{iT})_{i=1}^{m}|\tilde{X}_{T})+nD(P_{\tilde{X}_{T}(\tilde{Y}_{iT})_{i=1}^{m}}\|\bar{P}). (214)

In the last step we have used the inequality H(\tilde{X}_{T})\geq H(\tilde{X}_{T}|T), as in this case \tilde{X}_{T} and T might not be independent of each other. This implies (207). Furthermore, since conditioning reduces entropy, we can lower bound the term A_{2} as follows

A2\displaystyle A_{2} l=1nH(X~l,(Y~il)i=1m|M~,X~l1,(Y~il1)i=1m)\displaystyle\geq\sum_{l=1}^{n}-H(\tilde{X}_{l},(\tilde{Y}_{il})_{i=1}^{m}|\tilde{M},\tilde{X}^{l-1},(\tilde{Y}_{i}^{l-1})_{i=1}^{m})
αH((Y~il)i=1m|M~,(Y~il1)i=1m,X~n)\displaystyle-\alpha H((\tilde{Y}_{il})_{i=1}^{m}|\tilde{M},(\tilde{Y}_{i}^{l-1})_{i=1}^{m},\tilde{X}^{n})
μmini[1:m][l=1nI(Y~il;M~,Y~il1)+nd¯i]\displaystyle\qquad-\mu\min_{i\in[1:m]}\bigg{[}\sum_{l=1}^{n}I(\tilde{Y}_{il};\tilde{M},\tilde{Y}_{i}^{l-1})+n\bar{d}_{i}^{\star}\bigg{]}
l=1nH(X~l,(Y~il)i=1m|M~,(Y~il1)i=1m)αH((Y~il)i=1m|X~l,M~,(Y~il1)i=1m)\displaystyle\geq\sum_{l=1}^{n}-H(\tilde{X}_{l},(\tilde{Y}_{il})_{i=1}^{m}|\tilde{M},(\tilde{Y}_{i}^{l-1})_{i=1}^{m})-\alpha H((\tilde{Y}_{il})_{i=1}^{m}|\tilde{X}_{l},\tilde{M},(\tilde{Y}_{i}^{l-1})_{i=1}^{m})
μmini[1:m][l=1nI(Y~il;M~,(Y~ηl1)η=1m)+nd¯i]\displaystyle\qquad-\mu\min_{i\in[1:m]}\bigg{[}\sum_{l=1}^{n}I(\tilde{Y}_{il};\tilde{M},(\tilde{Y}_{\eta}^{l-1})_{\eta=1}^{m})+n\bar{d}_{i}^{\star}\bigg{]}
=l=1nH(X~l,(Y~il)i=1m|U~l)αH((Y~il)i=1m|X~l,U~l)\displaystyle=\sum_{l=1}^{n}-H(\tilde{X}_{l},(\tilde{Y}_{il})_{i=1}^{m}|\tilde{U}_{l})-\alpha H((\tilde{Y}_{il})_{i=1}^{m}|\tilde{X}_{l},\tilde{U}_{l})
μ[mini[1:m]l=1nI(Y~il;U~l)+nd¯i]\displaystyle\hskip 28.45274pt-\mu\bigg{[}\min_{i\in[1:m]}\sum_{l=1}^{n}I(\tilde{Y}_{il};\tilde{U}_{l})+n\bar{d}_{i}^{\star}\bigg{]}
n(H(X~T,(Y~iT)i=1m|U~T,T)αH((Y~iT)i=1m|X~T,U~T,T)\displaystyle\geq n(-H(\tilde{X}_{T},(\tilde{Y}_{iT})_{i=1}^{m}|\tilde{U}_{T},T)-\alpha H((\tilde{Y}_{iT})_{i=1}^{m}|\tilde{X}_{T},\tilde{U}_{T},T)
μmini[1:m][I(Y~iT;U~T,T)+d¯i]).\displaystyle\hskip 28.45274pt-\mu\min_{i\in[1:m]}[I(\tilde{Y}_{iT};\tilde{U}_{T},T)+\bar{d}_{i}^{\star}]). (215)
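As a numerical sanity check of the single-letterization step (213), the following Python sketch verifies that the n-letter cross entropy relative to \bar{P}^{\otimes n} equals n times the cross entropy of the time-averaged marginal. It is a toy scalar version, collapsing the tuple (X,(Y_{i})_{i=1}^{m}) into a single letter with n=2 and a binary alphabet; all variable names and the randomly drawn laws are ours.

    import numpy as np

    # Toy check of (213): sum_{x^n} Ptilde(x^n) log(1/Pbar^{otimes n}(x^n))
    #                     = n * (H(X_T) + D(P_{X_T} || Pbar)),
    # where P_{X_T} = (1/n) sum_l P_{X_l} is the time-averaged marginal.
    rng = np.random.default_rng(0)
    n, A = 2, 2                                   # blocklength, alphabet size
    P_tilde = rng.random((A,) * n); P_tilde /= P_tilde.sum()   # joint law of X^n
    P_bar = rng.random(A); P_bar /= P_bar.sum()                # iid reference law

    lhs = 0.0                                     # n-letter cross entropy
    for idx in np.ndindex(*P_tilde.shape):
        q = np.prod([P_bar[i] for i in idx])      # Pbar^{otimes n}(x^n)
        lhs += P_tilde[idx] * np.log(1.0 / q)

    marginals = [P_tilde.sum(axis=tuple(j for j in range(n) if j != l))
                 for l in range(n)]               # P_{X_1}, ..., P_{X_n}
    P_T = sum(marginals) / n                      # law of X_T, T uniform on [1:n]
    rhs = n * float(np.sum(P_T * np.log(1.0 / P_bar)))
    assert np.isclose(lhs, rhs)                   # the cross entropy single-letterizes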

C-B The case that ϵ>max{maxs𝒮|𝔉s|1|𝔉s|,0}\epsilon>\max\{\max_{s\in\mathcal{S}}\frac{|\mathfrak{F}_{s}|-1}{|\mathfrak{F}_{s}|},0\}

Fix a compression rate R_{c} and an arbitrary \gamma>0. If the inactive set is empty, \mathcal{S}=\varnothing, then \min_{s}\theta_{s}(R_{c})=\min_{i\in[1:m]}\xi_{i}(R_{c}) and the threshold \max\{\max_{s\in\mathcal{S}}(|\mathfrak{F}_{s}|-1)/|\mathfrak{F}_{s}|,0\} becomes 0; the achievability part of Theorem 1 and the strong converse in Subsection C-A then verify the statement. Therefore in the following we assume that the inactive set is non-empty, \mathcal{S}\neq\varnothing. Our coding scheme is influenced by the one given in [7].

C-B1 Construction of a testing scheme

Recall that \mathfrak{F}_{s} represents the distributions which share the same marginal distribution P_{\mathcal{X},s} on \mathcal{X}, and |\mathfrak{F}_{s}| is the number of these. For each inactive s\in\mathcal{S} we partition the set \mathcal{X}^{n} into |\mathfrak{F}_{s}| sets \{\mathcal{C}_{n}^{(ls)}\}_{l=1}^{|\mathfrak{F}_{s}|} such that for all sufficiently large n we have P_{\mathcal{X},s}^{\otimes n}(\mathcal{C}_{n}^{(ls)})>1-\epsilon for all l\in[1:|\mathfrak{F}_{s}|]. This is possible because \epsilon\in(\max_{s\in\mathcal{S}}(|\mathfrak{F}_{s}|-1)/|\mathfrak{F}_{s}|,1): let 0<\gamma'<[1-|\mathfrak{F}_{s}|(1-\epsilon)]/2 be an arbitrarily given number. Given a type class T_{P}^{n} which is a subset of the strongly typical set \mathcal{T}_{\gamma'}^{n}(P_{\mathcal{X},s}), we divide it into |\mathfrak{F}_{s}| subsets, each of cardinality \lfloor|T_{P}^{n}|/|\mathfrak{F}_{s}|\rfloor, and omit the remaining sequences. Enumerating over all type classes inside the typical set, the number of omitted typical sequences is upper bounded by (n+1)^{|\mathcal{X}|}|\mathfrak{F}_{s}|. Hence each of the constructed subsets has probability at least (1-2\gamma')/|\mathfrak{F}_{s}|>1-\epsilon for all sufficiently large n, and the atypical sequences and the omitted typical sequences can then be assigned to these sets arbitrarily; see the sketch below. Furthermore, for each i\in\mathfrak{F}_{s} where s\in\mathcal{S}, let l_{i} denote its position inside the set \mathfrak{F}_{s} according to the natural ordering.
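This partition admits a simple algorithmic rendering. The Python sketch below is schematic only (tiny parameters, no typicality restriction, and the leftover sequences are merely collected rather than reassigned at random; the function name and arguments are ours): it splits every type class of length-n sequences into k near-equal cells, so each cell captures close to a 1/k fraction of any iid law sharing that type.

    import itertools
    from collections import defaultdict

    def partition_by_type(n, alphabet, k):
        """Split each type class of alphabet^n into k near-equal cells."""
        type_classes = defaultdict(list)
        for seq in itertools.product(alphabet, repeat=n):
            t = tuple(seq.count(a) for a in alphabet)  # empirical type of seq
            type_classes[t].append(seq)
        cells = [[] for _ in range(k)]
        leftovers = []                                 # at most k-1 per class
        for seqs in type_classes.values():
            q = len(seqs) // k                         # floor(|T_P^n| / k) per cell
            for l in range(k):
                cells[l].extend(seqs[l * q:(l + 1) * q])
            leftovers.extend(seqs[k * q:])
        return cells, leftovers

    cells, rest = partition_by_type(n=4, alphabet=(0, 1), k=3)
    print([len(c) for c in cells], len(rest))          # e.g. [4, 4, 4] 4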

For each inactive state s\in\mathcal{S} and each i\in\mathfrak{F}_{s}, let (\phi_{n}^{(l_{i}s)},\psi_{n}^{(l_{i}s)}) be a sequence of testing schemes for differentiating between P_{Y_{i}X_{i}}^{\otimes n} and Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{X_{t_{s}^{\star}}^{n}} such that the exponent \xi_{i}(R_{c})-\gamma/3 is achievable. Then, as in the conclusion of Theorem 1, the false alarm probability of the likelihood ratio test goes to zero:

limnPr{\displaystyle\lim_{n\to\infty}\mathrm{Pr}\big{\{} PYinϕn(lis)(Xin)(Yin,ϕn(lis)(Xin))\displaystyle P_{Y_{i}^{n}\phi_{n}^{(l_{i}s)}(X_{i}^{n})}(Y_{i}^{n},\phi_{n}^{(l_{i}s)}(X_{i}^{n}))
en(ξi(Rc)γ/3)QYjin×Qϕn(lis)(Xtsn)(Yin,ϕn(lis)(Xin))}=0.\displaystyle\leq e^{n(\xi_{i}(R_{c})-\gamma/3)}Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}^{(l_{i}s)}(X_{t_{s}^{\star}}^{n})}(Y_{i}^{n},\phi_{n}^{(l_{i}s)}(X_{i}^{n}))\big{\}}=0. (216)
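For intuition, the vanishing false-alarm probability in (216) can be observed in simulation. The Python sketch below is a drastically simplified stand-in (identity encoder in place of \phi_{n}^{(l_{i}s)}, iid scalar laws P and Q of our choosing): by the weak law of large numbers, the probability that the normalized log-likelihood ratio falls below any exponent E<D(P\|Q) tends to zero.

    import numpy as np

    rng = np.random.default_rng(3)
    P = np.array([0.7, 0.3]); Q = np.array([0.4, 0.6])  # toy laws (ours)
    D = float(np.sum(P * np.log(P / Q)))                # D(P||Q), the best exponent
    E = 0.5 * D                                         # an achievable target E < D

    for n in (50, 200, 800):
        x = rng.choice(2, size=(5000, n), p=P)          # 5000 sample paths under H0
        llr = np.log(P / Q)[x].sum(axis=1)              # log P^n/Q^n per path
        print(n, float((llr <= n * E).mean()))          # false-alarm estimate -> 0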

Note also that the following convergences

Pr{PXin(Xin)en(dsx+γ/3)QXtsn(Xin)}\displaystyle\mathrm{Pr}\{P_{X_{i}}^{\otimes n}(X_{i}^{n})\leq e^{n(d_{s}^{\mathrm{x}}+\gamma/3)}Q_{X_{t_{s}^{\star}}^{n}}(X_{i}^{n})\} 1,\displaystyle\to 1,
Pr{PYin(Yin)en(diy+γ/3)QYjin(Yin)}\displaystyle\mathrm{Pr}\{P_{Y_{i}}^{\otimes n}(Y_{i}^{n})\leq e^{n(d_{i}^{\mathrm{y}}+\gamma/3)}Q_{Y_{j_{i}^{\star}}^{n}}(Y_{i}^{n})\} 1,\displaystyle\to 1, (217)

hold, due either to the weak law of large numbers or to Theorem 1 in [24]. Rewriting (216) with the help of (217) we therefore obtain

limnPr{ιPYinϕn(lis)(Xin)\displaystyle\lim_{n\to\infty}\mathrm{Pr}\big{\{}\iota_{P_{Y_{i}^{n}\phi_{n}^{(l_{i}s)}(X_{i}^{n})}} (Yin;ϕn(lis)(Xin))n(ξi(Rc)disyxγ)}=0.\displaystyle(Y_{i}^{n};\phi_{n}^{(l_{i}s)}(X_{i}^{n}))\leq n(\xi_{i}(R_{c})-d_{is}^{\mathrm{yx}}-\gamma)\big{\}}=0. (218)

For each active state s, s\notin\mathcal{S}, let (\phi_{n}^{s},\psi_{n}^{s}) be a sequence of testing schemes for differentiating between \{P_{Y_{i}X_{i}}^{\otimes n}\}_{i\in\mathfrak{F}_{s}} and \{Q_{Y_{j}^{n}}\times Q_{X_{t}^{n}}\} such that \theta_{s}(R_{c})-\gamma/3=\min_{\bar{i}\in\mathfrak{F}_{s}}\xi_{\bar{i}}(R_{c})-\gamma/3 is achievable; the existence of (\phi_{n}^{s},\psi_{n}^{s}) follows from Theorem 1. Similarly, from the conclusion of Theorem 1, for each intersected set \mathcal{I}^{(i)}(\theta_{s}(R_{c})-\gamma/3), i\in\mathfrak{F}_{s}, we have

limnPYinϕns(Xin)[((i)(θs(Rc)γ/3))c]=0.\displaystyle\lim_{n\to\infty}P_{Y_{i}^{n}\phi_{n}^{s}(X_{i}^{n})}[(\mathcal{I}^{(i)}(\theta_{s}(R_{c})-\gamma/3))^{c}]=0. (219)

From the definition of (i)(θs(Rc)γ/3)\mathcal{I}^{(i)}(\theta_{s}(R_{c})-\gamma/3) in (17) we further have

limnPr{\displaystyle\lim_{n\to\infty}\mathrm{Pr}\big{\{} PYinϕns(Xin)(Yin,ϕns(Xin))\displaystyle P_{Y_{i}^{n}\phi_{n}^{s}(X_{i}^{n})}(Y_{i}^{n},\phi_{n}^{s}(X_{i}^{n}))
en(θs(Rc)γ/3)QYjin×Qϕns(Xtsn)(Yin,ϕns(Xin))}=0.\displaystyle\leq e^{n(\theta_{s}(R_{c})-\gamma/3)}Q_{Y_{j_{i}^{\star}}^{n}}\times Q_{\phi_{n}^{s}(X_{t_{s}^{\star}}^{n})}(Y_{i}^{n},\phi_{n}^{s}(X_{i}^{n}))\big{\}}=0. (220)

Using (217) again we obtain

limnPr{\displaystyle\lim_{n\to\infty}\mathrm{Pr}\big{\{} ιPYinϕns(Xin)(Yin;ϕns(Xin))\displaystyle\iota_{P_{Y_{i}^{n}\phi_{n}^{s}(X_{i}^{n})}}(Y_{i}^{n};\phi_{n}^{s}(X_{i}^{n}))
n(mini¯𝔉sξi¯(Rc)disyxγ)}=0,i𝔉s.\displaystyle\leq n(\min_{\bar{i}\in\mathfrak{F}_{s}}\xi_{\bar{i}}(R_{c})-d_{is}^{\mathrm{yx}}-\gamma)\big{\}}=0,\;\forall i\in\mathfrak{F}_{s}. (221)

Define the following auxiliary distribution

PYnXn=i=1mνiPYiXin,whereνi>0,andνi=1.\displaystyle P_{Y^{n}X^{n}}=\sum_{i=1}^{m}\nu_{i}P_{Y_{i}X_{i}}^{\otimes n},\;\text{where}\;\nu_{i}>0,\;\text{and}\;\sum\nu_{i}=1. (222)

Let PXnP_{X^{n}} and PYnP_{Y^{n}} be the marginal distributions of PYnXnP_{Y^{n}X^{n}}. Furthermore, let PYnUP_{Y^{n}U} be the push-forward distribution resulting from applying (ϕn(lis))(\phi_{n}^{(l_{i}s)}) and ϕns\phi_{n}^{s} to {PXin}\{P_{X_{i}^{n}}\}, i.e.,

PYnU=s𝒮i𝔉sνiPYinϕn(lis)(Xin)+s𝒮i𝔉sνiPYinϕns(Xin).\displaystyle P_{Y^{n}U}=\sum_{s\in\mathcal{S}}\sum_{i\in\mathfrak{F}_{s}}\nu_{i}P_{Y_{i}^{n}\phi_{n}^{(l_{i}s)}(X_{i}^{n})}+\sum_{s\notin\mathcal{S}}\sum_{i\in\mathfrak{F}_{s}}\nu_{i}P_{Y_{i}^{n}\phi_{n}^{s}(X_{i}^{n})}. (223)
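As a sanity check on the construction (222)-(223), the following Python sketch (toy sizes, randomly drawn component laws; all names are ours) assembles a mixture of n-fold product distributions and verifies the pointwise bound P_{Y^{n}X^{n}}\geq\nu_{i}P_{Y_{i}X_{i}}^{\otimes n}, which is the mechanism behind the change-of-measure steps used below.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n, A = 3, 2, 2                                # components, blocklength, alphabet
    nu = rng.random(m); nu /= nu.sum()               # mixture weights nu_i, sum to 1
    P = rng.random((m, A, A))
    P /= P.sum(axis=(1, 2), keepdims=True)           # component joints P_i(y, x)

    def product_extension(P_i, n):
        """P_i^{otimes n} as an array over interleaved letters (y_1,x_1,...,y_n,x_n)."""
        out = P_i.copy()
        for _ in range(n - 1):
            out = np.multiply.outer(out, P_i)
        return out

    P_mix = sum(nu[i] * product_extension(P[i], n) for i in range(m))
    assert np.isclose(P_mix.sum(), 1.0)              # (222) is a distribution
    for i in range(m):                               # P_mix >= nu_i * P_i^{otimes n}
        assert np.all(P_mix >= nu[i] * product_extension(P[i], n) - 1e-12)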

We use the same mapping T(\cdot) to estimate s before encoding, as in the proof of Theorem 1. If \hat{s}=e we send (e,1) to the decision center. If \hat{s}\neq e, we check whether x^{n}\in\mathcal{B}_{n,\gamma}^{\hat{s}} where

n,γs^={xnmint[1:r]ιPXnQXtn(xn)>n(ds^xγ)}.\displaystyle\mathcal{B}_{n,\gamma}^{\hat{s}}=\{x^{n}\mid\min_{t\in[1:r]}\iota_{P_{X^{n}}\|Q_{X_{t}^{n}}}(x^{n})>n(d_{\hat{s}}^{\mathrm{x}}-\gamma)\}. (224)

If the condition is not fulfilled we send the special symbol u=e^{\star}. Assume now that x^{n}\in\mathcal{B}_{n,\gamma}^{\hat{s}} holds. If \hat{s}\in\mathcal{S} we send the message u=\bar{\phi}_{n}^{\hat{s}}(x^{n})=\sum_{l=1}^{|\mathfrak{F}_{\hat{s}}|}\phi_{n}^{(l\hat{s})}(x^{n})\mathbf{1}\{x^{n}\in\mathcal{C}_{n}^{(l\hat{s})}\cap\mathcal{B}_{n,\gamma}^{\hat{s}}\} to the decision center. If \hat{s}\notin\mathcal{S} we send u=\phi_{n}^{\hat{s}}(x^{n})\mathbf{1}\{x^{n}\in\mathcal{B}_{n,\gamma}^{\hat{s}}\}. For E=\min_{i\in[1:m]}\xi_{i}(R_{c})-2\gamma, we decide that the null hypothesis H_{0} is true if \hat{s}\neq e, u\neq e^{\star} and

ιPYnU(yn;u)+minj[1:k]ιPYnQYjn(yn)>n(Eds^x).\displaystyle\iota_{P_{Y^{n}U}}(y^{n};u)+\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(y^{n})>n(E-d_{\hat{s}}^{\mathrm{x}}). (225)
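The protocol just described can be condensed into pseudocode. The Python sketch below is purely schematic: T, the sets \mathcal{B}_{n,\gamma}^{s}, the sub-encoders and the information densities are abstract stand-ins for the objects defined above, passed in as callables and containers of our own devising.

    def encoder(x_n, T, S_inactive, B, phi_bar, phi):
        """Schematic encoder: estimate the state, test membership in the
        set (224), then apply the state-dependent sub-encoder."""
        s_hat = T(x_n)                        # state estimate, 'e' on failure
        if s_hat == 'e':
            return 'e', ('e', 1)              # report estimation failure
        if x_n not in B[s_hat]:               # information-density check (224)
            return s_hat, 'e_star'
        if s_hat in S_inactive:               # inactive state: cell-based encoder
            return s_hat, phi_bar[s_hat](x_n)
        return s_hat, phi[s_hat](x_n)         # active state: single encoder

    def accept_H0(y_n, u, s_hat, n, E, d_x, iota_joint, iota_min_marginal):
        """Decision rule (225): accept H0 iff no failure symbol occurred and
        the combined information density clears the threshold."""
        if s_hat == 'e' or u == 'e_star':
            return False
        return iota_joint(y_n, u) + iota_min_marginal(y_n) > n * (E - d_x[s_hat])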

C-B2 Bounding error probabilities

For each i\in\mathfrak{F}_{s} where s is active, s\notin\mathcal{S}, the false alarm probability is bounded as

αn(i)\displaystyle\alpha_{n}^{(i)} Pr{ιPYnU(Yin;ϕns(Xin))+minj[1:k]ιPYnQYjn(Yin)n(Edsx)}\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\phi_{n}^{s}(X_{i}^{n}))+\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(Y_{i}^{n})\leq n(E-d_{s}^{\mathrm{x}})\}
\displaystyle\quad+P_{\mathcal{X},s}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]+\mathrm{Pr}\{T(X_{i}^{n})\neq s\}
Pr{ιPYnU(Yin;ϕns(Xin))n(Edisyx+γ)}\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\phi_{n}^{s}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma)\}
+Pr{minj[1:k]ιPYnQYjn(Y¯in)<n(diyγ)}\displaystyle\quad+\mathrm{Pr}\{\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(\bar{Y}_{i}^{n})<n(d_{i}^{\mathrm{y}}-\gamma)\}
\displaystyle\quad+P_{\mathcal{X},s}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]+\mathrm{Pr}\{T(X_{i}^{n})\neq s\}. (226)

Similarly, for each i𝔉si\in\mathfrak{F}_{s} where ss is not active, s𝒮s\in\mathcal{S}, we have

αn(i)\displaystyle\alpha_{n}^{(i)} Pr{ιPYnU(Yin;ϕ¯ns(Xin))+minj[1:k]ιPYnQYjn(Yin)\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))+\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(Y_{i}^{n})
n(Edsx),Xinn,γs}\displaystyle\hskip 28.45274pt\leq n(E-d_{s}^{\mathrm{x}}),\;X_{i}^{n}\in\mathcal{B}_{n,\gamma}^{s}\}
\displaystyle\quad+P_{\mathcal{X},s}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]+\mathrm{Pr}\{T(X_{i}^{n})\neq s\}
Pr{ιPYnU(Yin;ϕn(lis)(Xin))n(Edisyx+γ),Xin𝒞n(lis)n,γs}\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\phi_{n}^{(l_{i}s)}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma),\;X_{i}^{n}\in\mathcal{C}_{n}^{(l_{i}s)}\cap\mathcal{B}_{n,\gamma}^{s}\}
+Pr{ιPYnU(Yin;ϕ¯ns(Xin))n(Edisyx+γ),Xin𝒞n(lis)}\displaystyle\quad+\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\bar{\phi}_{n}^{s}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma),\;X_{i}^{n}\notin\mathcal{C}_{n}^{(l_{i}s)}\}
+Pr{minj[1:k]ιPYnQYjn(Y¯in)<n(diyγ)}\displaystyle\quad+\mathrm{Pr}\{\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(\bar{Y}_{i}^{n})<n(d_{i}^{\mathrm{y}}-\gamma)\}
\displaystyle\quad+P_{\mathcal{X},s}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]+\mathrm{Pr}\{T(X_{i}^{n})\neq s\}
Pr{ιPYnU(Yin;ϕn(lis)(Xin))n(Edisyx+γ)}\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\phi_{n}^{(l_{i}s)}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma)\}
+Pr{minj[1:k]ιPYnQYjn(Y¯in)<n(diyγ)}\displaystyle\quad+\mathrm{Pr}\{\min_{j\in[1:k]}\iota_{P_{Y^{n}}\|Q_{Y_{j}^{n}}}(\bar{Y}_{i}^{n})<n(d_{i}^{\mathrm{y}}-\gamma)\}
\displaystyle\quad+P_{\mathcal{X},s}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s})^{c}]+\mathrm{Pr}\{T(X_{i}^{n})\neq s\}+\epsilon, (227)

for all sufficiently large n. The last inequality holds since, by the definition of \mathcal{C}_{n}^{(l_{i}s)}, we have P_{\mathcal{X},s}^{\otimes n}[(\mathcal{C}_{n}^{(l_{i}s)})^{c}]\leq\epsilon for all sufficiently large n. Let \gamma_{n} be a sequence such that \gamma_{n}\to 0 and n\gamma_{n}\to\infty as n\to\infty, for instance \gamma_{n}=n^{-1/2}. Using the change-of-measure steps as in the proof of Theorem 1 we have

Pr{ιPYnU(Yin;ϕns(Xin))n(Edisyx+γ)}\displaystyle\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\phi_{n}^{s}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma)\}
Pr{ιPYinϕns(Xin)(Yin;ϕns(Xin))n(Edisyx+γ1/nlogνi+2γn)}\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y_{i}^{n}\phi_{n}^{s}(X_{i}^{n})}}(Y_{i}^{n};\phi_{n}^{s}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma-1/n\log{\nu_{i}}+2\gamma_{n})\}
+2enγn,\displaystyle\quad+2e^{-n\gamma_{n}},
Pr{ιPYnU(Yin;ϕn(lis)(Xin))n(Edisyx+γ)}\displaystyle\mathrm{Pr}\{\iota_{P_{Y^{n}U}}(Y_{i}^{n};\phi_{n}^{(l_{i}s)}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma)\}
\displaystyle\leq\mathrm{Pr}\{\iota_{P_{Y_{i}^{n}\phi_{n}^{(l_{i}s)}(X_{i}^{n})}}(Y_{i}^{n};\phi_{n}^{(l_{i}s)}(X_{i}^{n}))\leq n(E-d_{is}^{\mathrm{yx}}+\gamma-1/n\log\nu_{i}+2\gamma_{n})\}
+2enγn.\displaystyle\quad+2e^{-n\gamma_{n}}. (228)

Using (218), (221) and Lemma 1, we obtain

limnαn(i)=0,i𝔉s,s𝒮,\displaystyle\lim_{n\to\infty}\alpha_{n}^{(i)}=0,\;\forall i\in\mathfrak{F}_{s},\;s\notin\mathcal{S},
lim supnαn(i)ϵ,i𝔉s,s𝒮.\displaystyle\limsup_{n\to\infty}\alpha_{n}^{(i)}\leq\epsilon,\;\forall i\in\mathfrak{F}_{s},\;s\in\mathcal{S}. (229)

This implies that

lim supnαn=lim supnmaxi[1:m]αn(i)=maxi[1:m]lim supnαn(i)ϵ.\limsup_{n\to\infty}\alpha_{n}=\limsup_{n\to\infty}\max_{i\in[1:m]}\alpha_{n}^{(i)}=\max_{i\in[1:m]}\limsup_{n\to\infty}\alpha_{n}^{(i)}\leq\epsilon. (230)

As in the last part of the proof of Theorem 1, using the same change-of-measure steps, we also have

βn(jt)enE,j[1:k],t[1:r].\displaystyle\beta_{n}^{(jt)}\leq e^{-nE},\;\forall j\in[1:k],\;t\in[1:r]. (231)

Therefore for \epsilon>\max_{s\in\mathcal{S}}(|\mathfrak{F}_{s}|-1)/|\mathfrak{F}_{s}| we have E_{\epsilon}^{\star}(R_{c})=\min_{i\in[1:m]}\xi_{i}(R_{c}), since the converse direction is straightforward.

C-C Convergence of αn(i)\alpha_{n}^{(i^{\star})}

We assume that the optimality-achieving index s^{\star} is active, s^{\star}\notin\mathcal{S}; otherwise there is nothing to prove. Let \mathcal{R}, R_{\mathrm{HT}}^{\mu}(P_{X_{i^{\star}}Y_{i^{\star}}}) and R_{\mathrm{HT}}^{\mu,\alpha}(P_{X_{i^{\star}}Y_{i^{\star}}}) be defined as in (176), (179) and (180) with m=1, with P_{X_{i^{\star}}Y_{i^{\star}}} in place of \bar{P}, and with d_{i^{\star}s^{\star}}^{\mathrm{yx}} in place of \bar{d}_{i}^{\star}.

Let (\phi_{n},\psi_{n}) be an arbitrary sequence of testing schemes such that \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\beta_{n}}\geq E holds, where E=\xi_{i^{\star}}(R_{c})+\tau for an arbitrary \tau>0. Select \gamma\in(0,\tau/5) small enough such that (R_{c}+\gamma,E-3\gamma)\notin\mathcal{R}.
As in the previous proofs we transform (\phi_{n},\psi_{n}) into a testing scheme (\bar{\phi}_{n},\bar{\psi}_{n}), where \bar{\psi}_{n} is a deterministic mapping, for differentiating between P_{Y_{i^{\star}}X_{i^{\star}}}^{\otimes n} and P_{Y_{i^{\star}}}^{\otimes n}\times P_{X_{i^{\star}}}^{\otimes n}. This can be done by using the typical sets \mathcal{B}_{n,\gamma}^{s^{\star}} and \mathcal{B}_{n,\gamma}^{(i^{\star})} defined in (173). The resulting error probabilities are similarly bounded by

PYinϕ¯n(Xin)(1ψ¯n)\displaystyle P_{Y_{i^{\star}}^{n}\bar{\phi}_{n}(X_{i^{\star}}^{n})}(1-\bar{\psi}_{n}) αn(i)+en(Eγ)βn(jits)\displaystyle\leq\alpha_{n}^{(i^{\star})}+e^{n(E-\gamma)}\beta_{n}^{(j_{i^{\star}}^{\star}t_{s^{\star}}^{\star})}
+PXin[(n,γs)c]+PYin[(n,γ(i))c],\displaystyle\quad+P_{X_{i^{\star}}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{s^{\star}})^{c}]+P_{Y_{i^{\star}}}^{\otimes n}[(\mathcal{B}_{n,\gamma}^{(i^{\star})})^{c}],
PYin×Pϕ¯n(Xin)(ψ¯n)\displaystyle P_{Y_{i^{\star}}^{n}}\times P_{\bar{\phi}_{n}(X_{i^{\star}}^{n})}(\bar{\psi}_{n}) en(Edisyx3γ).\displaystyle\leq e^{-n(E-d_{i^{\star}s^{\star}}^{\mathrm{yx}}-3\gamma)}. (232)

Let 𝒜¯n\bar{\mathcal{A}}_{n} be the acceptance region of ψ¯n\bar{\psi}_{n}. We argue that there exists a λ>0\lambda>0 such that for all nn0(γ)n\geq n_{0}(\gamma) we have

PYinϕ¯n(Xin)(𝒜¯n)eλn.\displaystyle P_{Y_{i^{\star}}^{n}\bar{\phi}_{n}(X_{i^{\star}}^{n})}(\bar{\mathcal{A}}_{n})\leq e^{-\lambda n}. (233)

Assume, to the contrary, that for all \lambda>0 there exists an n\geq n_{0}(\gamma) such that

PYinϕ¯n(Xin)(𝒜¯n)>eλn.\displaystyle P_{Y_{i^{\star}}^{n}\bar{\phi}_{n}(X_{i^{\star}}^{n})}(\bar{\mathcal{A}}_{n})>e^{-\lambda n}. (234)

As before, for notational simplicity, we define \bar{B}_{n}=\bigcup_{u}\{\bar{\phi}_{n}^{-1}(u)\}\times\bar{\mathcal{A}}_{n,u} as well as

\displaystyle P_{\tilde{X}^{n}\tilde{Y}^{n}}(x^{n},y^{n})=\frac{P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n}(x^{n},y^{n})}{P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n}(\bar{B}_{n})}\mathbf{1}\{(x^{n},y^{n})\in\bar{B}_{n}\}. (235)

Since P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n}(\bar{B}_{n})=P_{Y_{i^{\star}}^{n}\bar{\phi}_{n}(X_{i^{\star}}^{n})}(\bar{\mathcal{A}}_{n})>e^{-\lambda n} and conditioning on \bar{B}_{n} yields D(P_{\tilde{X}^{n}\tilde{Y}^{n}}\|P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n})=\log(1/P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n}(\bar{B}_{n})), we then have

D(PX~nY~nPXiYin)λn.\displaystyle D(P_{\tilde{X}^{n}\tilde{Y}^{n}}\|P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n})\leq\lambda n. (236)
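The step from (235) to (236) thus rests on the identity D(P(\cdot|B)\|P)=\log(1/P(B)) for a law conditioned on an event B; a minimal Python check with an arbitrary toy law (names and sizes ours):

    import numpy as np

    rng = np.random.default_rng(2)
    P = rng.random(8); P /= P.sum()              # toy law on 8 points
    B = np.zeros(8, dtype=bool); B[:5] = True    # the event (here: bar{B}_n)

    P_cond = np.where(B, P, 0.0) / P[B].sum()    # the restriction, as in (235)
    D = float(np.sum(P_cond[B] * np.log(P_cond[B] / P[B])))
    assert np.isclose(D, np.log(1.0 / P[B].sum()))  # P(B) > e^{-lambda n} gives (236)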

For (yn,u)𝒜¯n(y^{n},u)\in\bar{\mathcal{A}}_{n} it can be seen that

PYinϕ¯n(Xin)(yn,u)\displaystyle P_{Y_{i^{\star}}^{n}\bar{\phi}_{n}(X_{i^{\star}}^{n})}(y^{n},u) =PXiYin(B¯n)PY~nϕ¯n(X~n)(yn,u)\displaystyle=P_{X_{i^{\star}}Y_{i^{\star}}}^{\otimes n}(\bar{B}_{n})P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)
eλnPY~nϕ¯n(X~n)(yn,u)\displaystyle\geq e^{-\lambda n}P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)

holds, whereas for (y^{n},u)\notin\bar{\mathcal{A}}_{n} we have P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)=0. Therefore for all (y^{n},u)\in\bar{\mathcal{A}}_{n} the following inequalities hold:

PYin(yn)eλnPY~n(yn),Pϕ¯n(Xin)(u)eλnPϕ¯n(X~n)(u).\displaystyle P_{Y_{i^{\star}}}^{\otimes n}(y^{n})\geq e^{-\lambda n}P_{\tilde{Y}^{n}}(y^{n}),\;P_{\bar{\phi}_{n}(X_{i^{\star}}^{n})}(u)\geq e^{-\lambda n}P_{\bar{\phi}_{n}(\tilde{X}^{n})}(u). (237)

This further implies that

log1PYin×Pϕ¯n(Xin)(𝒜¯n)\displaystyle\log\frac{1}{P_{Y_{i^{\star}}^{n}}\times P_{\bar{\phi}_{n}(X_{i^{\star}}^{n})}(\bar{\mathcal{A}}_{n})}
(yn,u)𝒜¯nPY~nϕ¯n(X~n)(yn,u)logPY~nϕ¯n(X~n)(yn,u)PYin(yn)Pϕ¯n(Xin)(u)\displaystyle\leq\sum_{(y^{n},u)\in\bar{\mathcal{A}}_{n}}P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)\log\frac{P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)}{P_{Y_{i^{\star}}^{n}}(y^{n})P_{\bar{\phi}_{n}(X_{i^{\star}}^{n})}(u)}
(yn,u)𝒜¯nPY~nϕ¯n(X~n)(yn,u)logPY~nϕ¯n(X~n)(yn,u)e2λnPY~n(yn)Pϕ¯n(X~n)(u)\displaystyle\leq\sum_{(y^{n},u)\in\bar{\mathcal{A}}_{n}}P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)\log\frac{P_{\tilde{Y}^{n}\bar{\phi}_{n}(\tilde{X}^{n})}(y^{n},u)}{e^{-2\lambda n}P_{\tilde{Y}^{n}}(y^{n})P_{\bar{\phi}_{n}(\tilde{X}^{n})}(u)}
=I(Y~n;ϕ¯n(X~n))+2λn.\displaystyle=I(\tilde{Y}^{n};\bar{\phi}_{n}(\tilde{X}^{n}))+2\lambda n. (238)

Hence we obtain

\displaystyle n(E-3\gamma)\leq I(\tilde{Y}^{n};\bar{\phi}_{n}(\tilde{X}^{n}))+nd_{i^{\star}s^{\star}}^{\mathrm{yx}}+2\lambda n, (239)

where (\tilde{Y}^{n},\tilde{X}^{n})\sim P_{\tilde{X}^{n}\tilde{Y}^{n}}. Using the same line of argument as from (205) to (209), for given positive \alpha and \mu we have

(Rc+γ)μ(E3γ)RHTμ,α(PXiYi)((α+1)+2μ)λ.\displaystyle(R_{c}+\gamma)-\mu(E-3\gamma)\geq R_{\mathrm{HT}}^{\mu,\alpha}(P_{X_{i^{\star}}Y_{i^{\star}}})-((\alpha+1)+2\mu)\lambda. (240)

As \mathcal{R} is a closed convex set and (R_{c}+\gamma,E-3\gamma)\notin\mathcal{R} holds, by the hyperplane separation theorem there exist positive numbers \mu and \nu such that (R_{c}+\gamma)-\mu(E-3\gamma)<R_{\mathrm{HT}}^{\mu}(P_{X_{i^{\star}}Y_{i^{\star}}})-2\nu holds. Furthermore there also exists an \alpha such that R_{\mathrm{HT}}^{\mu}(P_{X_{i^{\star}}Y_{i^{\star}}})<R_{\mathrm{HT}}^{\mu,\alpha}(P_{X_{i^{\star}}Y_{i^{\star}}})+\nu. For such \alpha and \mu we then obtain

RHTμ,α(PXiYi)ν\displaystyle R_{\mathrm{HT}}^{\mu,\alpha}(P_{X_{i^{\star}}Y_{i^{\star}}})-\nu RHTμ(PXiYi)2ν\displaystyle\geq R_{\mathrm{HT}}^{\mu}(P_{X_{i^{\star}}Y_{i^{\star}}})-2\nu
>(Rc+γ)μ(E3γ)\displaystyle>(R_{c}+\gamma)-\mu(E-3\gamma)
RHTμ,α(PXiYi)((α+1)+2μ)λ.\displaystyle\geq R_{\mathrm{HT}}^{\mu,\alpha}(P_{X_{i^{\star}}Y_{i^{\star}}})-((\alpha+1)+2\mu)\lambda. (241)

This inequality is violated for \lambda<\nu/((\alpha+1)+2\mu), a contradiction; hence (233) holds. Then (233) and (232) imply that

limnαn(i)=1.\displaystyle\lim_{n\to\infty}\alpha_{n}^{(i^{\star})}=1. (242)

References

  • [1] R. Ahlswede and I. Csiszár, “Hypothesis testing with communication constraints,” IEEE Transactions on Information Theory, vol. 32, no. 4, pp. 533–542, 1986.
  • [2] T. Berger, “Decentralized estimation and decision theory,” in IEEE 7th Spring Workshop on Inf. Theory, Mt. Kisco, NY, September 1979.
  • [3] T. Han, “Hypothesis testing with multiterminal data compression,” IEEE Transactions on Information Theory, vol. 33, no. 6, pp. 759–772, 1987.
  • [4] H. Shimokawa, T. S. Han, and S. Amari, “Error bound of hypothesis testing with data compression,” in Proceedings of 1994 IEEE International Symposium on Information Theory.   IEEE, 1994, p. 114.
  • [5] H. M. Shalaby and A. Papamarcou, “Multiterminal detection with zero-rate data compression,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 254–267, 1992.
  • [6] M. S. Rahman and A. B. Wagner, “On the optimality of binning for distributed hypothesis testing,” IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6282–6303, 2012.
  • [7] C. Tian and J. Chen, “Successive refinement for hypothesis testing and lossless one-helper problem,” IEEE Transactions on Information Theory, vol. 54, no. 10, pp. 4666–4681, 2008.
  • [8] S. Watanabe, “Neyman–Pearson test for zero-rate multiterminal hypothesis testing,” IEEE Transactions on Information Theory, vol. 64, no. 7, pp. 4923–4939, 2018.
  • [9] G. J. McLachlan and D. Peel, Finite mixture models.   New York: John Wiley & Sons, 2000.
  • [10] P.-N. Chen, “General formulas for the Neyman–Pearson type-II error exponent subject to fixed and exponential type-I error bounds,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 316–323, 1996.
  • [11] T. S. Han, Information-Spectrum Methods in Information Theory.   Berlin Heidelberg: Springer-Verlag Berlin Heidelberg, 2003.
  • [12] T. S. Han and R. Nomura, “First- and second-order hypothesis testing for mixed memoryless sources,” Entropy, vol. 20, no. 3, p. 174, 2018.
  • [13] A. Ritchie, R. A. Vandermeulen, and C. Scott, “Consistent estimation of identifiable nonparametric mixture models from grouped observations,” Advances in Neural Information Processing Systems, vol. 33, pp. 11 676–11 686, 2020.
  • [14] R. T. Elmore, T. P. Hettmansperger, and H. Thomas, “Estimating component cumulative distribution functions in finite mixture models,” Communications in Statistics-Theory and Methods, vol. 33, no. 9, pp. 2075–2086, 2004.
  • [15] I. Cruz-Medina, T. Hettmansperger, and H. Thomas, “Semiparametric mixture models and repeated measures: the multinomial cut point model,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 53, no. 3, pp. 463–474, 2004.
  • [16] Y. Wei and X. Nguyen, “Convergence of de Finetti’s mixing measure in latent structure models for observed exchangeable sequences,” arXiv preprint arXiv:2004.05542, 2020.
  • [17] C. Pal, B. Frey, and T. Kristjansson, “Noise robust speech recognition using Gaussian basis functions for non-linear likelihood function approximation,” in 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1.   IEEE, 2002, pp. I–405.
  • [18] A. Anandkumar, D. Hsu, and S. M. Kakade, “A method of moments for mixture models and hidden Markov models,” in Conference on Learning Theory.   JMLR Workshop and Conference Proceedings, 2012, pp. 33–1.
  • [19] M. I. Jordan, “Stat260: Bayesian Modeling and Inference, Lecture 1: History and De Finetti’s Theorem,” 2010.
  • [20] W. Kirsch, “An elementary proof of de Finetti’s theorem,” Statistics & Probability Letters, vol. 151, pp. 84–88, 2019.
  • [21] M. T. Vu, T. J. Oechtering, and M. Skoglund, “Hypothesis testing and identification systems,” IEEE Transactions on Information Theory, vol. 67, no. 6, pp. 3765–3780, 2021.
  • [22] A. Wyner, “On source coding with side information at the decoder,” IEEE Transactions on Information Theory, vol. 21, no. 3, pp. 294–300, 1975.
  • [23] R. Ahlswede and J. Körner, “Source coding with side information and a converse for degraded broadcast channels,” IEEE Transactions on Information Theory, vol. 21, no. 6, pp. 629–637, 1975.
  • [24] A. R. Barron, “The Strong Ergodic Theorem for Densities: Generalized Shannon-McMillan-Breiman Theorem,” Annals of Probability, vol. 13, no. 4, pp. 1292–1303, 1985.
  • [25] S. Verdú, “Non-asymptotic achievability bounds in multiuser information theory,” in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on.   IEEE, 2012, pp. 1–8.
  • [26] H. Tyagi and S. Watanabe, “Strong converse using change of measure arguments,” IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 689–703, 2019.
  • [27] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems.   Cambridge: Cambridge University Press, 2011.
  • [28] H. Witsenhausen and A. Wyner, “A conditional entropy bound for a pair of discrete random variables,” IEEE Transactions on Information Theory, vol. 21, no. 5, pp. 493–501, 1975.
  • [29] J. Körner and K. Marton, “Comparison of two noisy channels,” in Topics in Information Theory, vol. 16.   Amsterdam: North Holland, 1977, pp. 411–424.
  • [30] S. Miyake and F. Kanaya, “Coding theorems on correlated general sources,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 78, no. 9, pp. 1063–1070, 1995.
  • [31] Y. Polyanskiy and Y. Wu, “Lecture notes on information theory,” MIT (6.441), UIUC (ECE 563), 2017.