Hypothesis Testing of Mixture Distributions using Compressed Data
Abstract
In this paper we revisit the binary hypothesis testing problem with one-sided compression. Specifically we assume that the distribution in the null hypothesis is a mixture distribution of iid components. The distribution under the alternative hypothesis is a mixture of products of either iid distributions or finite order Markov distributions with stationary transition kernels. The problem is studied under the Neyman-Pearson framework in which our main interest is the maximum error exponent of the second type of error. We derive the optimal achievable error exponent and, under a further sufficient condition, establish the maximum ε-achievable error exponent. It is shown that to obtain the latter, the study of the exponentially strong converse is needed. Using a simple code transfer argument we also establish new results for the Wyner-Ahlswede-Körner problem in which the source distribution is a mixture of iid components.
Index Terms:
Mixture distribution, information-spectrum method, exponentially strong converse, Neyman-Pearson framework, error exponent.
I Introduction
I-A Motivations & Related Works
Hypothesis testing with communication constraints is a classic problem in information theory. The problem was initiated by Ahlswede and Csiszár in [1] as well as by Berger in [2]. It is assumed that a pair of data sequences is observed at separate locations. The sequence is compressed and sent to the location of via a noiseless channel. The decision center there decides whether is iid generated from the null hypothesis with a distribution or from the alternative hypothesis with a distribution . The Neyman-Pearson framework was used to study the trade-off between the probabilities of errors. The main interest was to establish the maximum error exponent of the second type of error when the probability of the first type of error is bounded as the number of samples tends to infinity. A single letter formulation was given in [1] for the testing against independence scenario and a strong converse was proven for the general setting under a special condition. Various lower bounds for the general setting have been proposed in [3] and [4]. The work [3] also established the optimal error exponent in the zero-compression rate regime when ε is small enough. Using the same condition as in [1], the work [5] established the optimal ε-error exponent also in the zero-compression rate regime. The optimality of the random binning scheme proposed in [4] was shown for the conditional independence setting in [6]. In [7] the authors extended the testing against independence study to the successive refinement setting. The work [8] studied the non-asymptotic regime of the general setting under the two-sided zero-compression rate constraint. Yet, in all of the above studies, the distributions in both hypotheses are assumed to be iid.
Mixture distributions are prevalent in practice, cf. [9] for a comprehensive list of applications. However, hypothesis testing for mixture distributions is an under-examined direction in information theory. Notable studies are given in [10, 11, 12]. Communication constraints are not included in the above works. In this work we study the following binary hypothesis testing problem in which the hypotheses are given by
(1) |
A mixture of iid components is also known as a mixture of repeated measurements or a mixture of grouped observations in the literature. Without being exhaustive, we list a few works with applications in topic modeling [13], in cognitive psychology [14], and in developmental psychology [15], cf. also [16]. In machine learning applications, to allow flexible modeling it is often assumed that within each component the joint distribution has a product form without the requirement of having the same marginal distribution, cf. [17] and [18]. We keep the iid condition in the null hypothesis for tractable analysis. A motivating example is provided in the following.
Assume that two statisticians observe two bags of words and taken as excerpts from some documents where denote the number of words in each bag. For simplicity we assume that in each bag words have an identical marginal distribution and also the order of words is not important. Then it is likely that the bag of words () is not generated iid, since for example knowing the first word () to be in Latin (German) gives us a guess that the whole () is in Latin (German) [19]. Since the order of words does not matter, and are sequences of exchangeable observations. It is natural to approximate the distributions of and by finite mixtures of iid distributions, due to de Finetti’s theorem [20], in which each underlying state represents a topic. The two statisticians then form a hypothesis testing problem. In the null hypothesis, they assume that the two bags of words are generated jointly iid according to from an unknown topic with probability , for example is a direct translation or a synonym of for all . In the alternative hypothesis they assume that the two bags of words are generated independently and iid according to from an unknown topic with probability .
We are similarly interested in the optimal error exponent of the second type of error when the probability of the first type is restricted by some ε. The iid assumption of each component in the null hypothesis as well as the factorization assumption in the alternative hypothesis are used to facilitate the derivation. In contrast to the classic strong converse result in [1], the maximum ε-achievable error exponent generally depends on not only the prior distribution but also relations between different constituent components in the mixture distributions. When no compression is involved, the weak law of large numbers can be used to establish the maximum ε-achievable error exponent, cf. [11, 12]. However, this argument is no longer applicable in the presence of compression. We need to use an argument established through proving exponentially strong converses of constituent components.
I-B Contributions
We summarize the contents of our work in the following.
-
•
We provide the optimal achievable error exponent in Theorem 3 and show in Corollary 1 that for small enough ε the obtained error exponent is also ε-optimal. Our proof is based on studying a compound hypothesis testing problem of differentiating between and under communication constraints. The results are established when and are stationary memoryless processes or finite Markov processes with stationary transition probabilities.
-
•
Under a further sufficient condition on the sets of distributions and , Assumption 2, we provide a complete characterization of the maximum ε-error exponent in Theorem 4. It is shown that even if the strong converse is available for the compound problem of testing against , it is not sufficient for establishing the maximum ε-error exponent in our mixture setting. Our derivation is based on the exponentially strong converse of the false alarm probabilities under the assumed condition.
-
•
We refine a recently established connection in [21] between the Wyner-Ahlswede-Körner (WAK) problem [22, 23] and the hypothesis testing against independence problem. While the previous result holds only under the stationary ergodic assumption, our new connection in Theorem 5 is valid under a more general assumption. We then use the refined connection to establish the corresponding minimum achievable compression rate and the minimum ε-achievable compression rate for the WAK problem with mixture distributions.
I-C Organization
Our paper is organized as follows. In Section II we review previous results and define quantities which are essential to characterize optimal error exponents in our study. We then establish various results for the compound setting, some of which might be of independent interest, in Section III. In Section IV, we provide the maximum achievable error exponent in the mixture setting. We state the sufficient condition and use it to establish the maximum ε-achievable error exponent in the mixture setting in Section V. Then the refined connection between the WAK problem and the hypothesis testing against independence problem is given in Section VI. Consequences of the new connection are also given therein.
I-D Notations
We focus on finite alphabets in this paper. Before we begin we make the following conventions. Given a probability measure , denotes its -fold product measure extension. denotes the natural logarithm. For any two distributions and on an alphabet , assume that ; if is absolutely continuous w.r.t. , denoted by , then we define , otherwise we define , irrespective of whether is equal to 0 or not. We also define when (for simplicity we use the convention in the following). The relative entropy between two distributions and is defined as where . For a joint distribution on , if or holds, then we define , as the expression is valid. Otherwise we define . The mutual information between and that is jointly distributed according to is defined as . For a finite set , we use and to denote its cardinality and its complement. For a mapping we define to be the cardinality of its range, . For a given distribution on and a stochastic mapping with the corresponding transition kernel we define
(2) |
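The divergence and mutual information conventions above can be made concrete with a small numerical sketch. The following Python snippet is illustrative only; the function names and the toy distribution are ours, and distributions on finite alphabets are represented as probability vectors.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p||q) in nats, following the conventions above:
    terms with p(a) = 0 contribute 0, and D(p||q) = +infinity
    whenever p is not absolutely continuous w.r.t. q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):   # absolute continuity violated
        return np.inf
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def mutual_information(p_xy):
    """I(X;Y) = D(P_XY || P_X x P_Y) for a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, float)
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    return kl_divergence(p_xy.ravel(), (p_x * p_y).ravel())

# sanity check: a product distribution has zero mutual information
assert abs(mutual_information(np.outer([0.3, 0.7], [0.5, 0.5]))) < 1e-12
```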
II Preliminaries
We review previous results on the hypothesis testing problem with one-sided compression. Then we present an important assumption and essential quantities that are needed to establish results of our study in later sections.

In [1], the authors studied the following binary hypothesis testing problem: deciding whether is iid generated from or , using a testing scheme . Herein is a compression mapping,
(3) |
and is a decision mapping
(4) |
The setting is depicted in Fig. 1. The corresponding type I and type II error (also known as false alarm and miss detection) probabilities are given by
(5) |
where , , is the probability that outputs given the pair . The following achievability definition is repeatedly used in the subsequent analysis.
Definition 1.
For a given and an ε, is an ε-achievable error exponent of the second type for the binary hypothesis testing problem if there exists a sequence of testing schemes such that
(6) |
We define as the supremum of all ε-achievable error exponents at .
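To make Definition 1 concrete, the following sketch simulates the uncompressed special case, where by the Chernoff-Stein lemma the best type-II exponent is the relative entropy between the two hypotheses. The distributions, the slack delta, and all parameters below are toy choices of ours, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.5, 0.5])   # null hypothesis (toy choice)
Q = np.array([0.8, 0.2])   # alternative hypothesis (toy choice)
D = float(np.sum(P * np.log(P / Q)))   # D(P||Q), the Stein exponent

def accept_null(x, delta=0.05):
    # accept H0 iff the normalized log-likelihood ratio exceeds D - delta
    return np.log(P[x] / Q[x]).mean() >= D - delta

trials = 50_000
for n in (10, 20, 40):
    alpha = np.mean([not accept_null(rng.choice(2, n, p=P)) for _ in range(trials)])
    beta = max(np.mean([accept_null(rng.choice(2, n, p=Q)) for _ in range(trials)]), 1 / trials)
    # alpha -> 0 by the weak law of large numbers, while beta decays
    # exponentially with an exponent approaching D(P||Q) as delta -> 0
    print(f"n={n}: alpha={alpha:.3f}, (1/n)log(1/beta)={-np.log(beta)/n:.3f}, D={D:.3f}")
```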
When for all holds, Ahlswede and Csiszár proved the strong converse result that for a given , does not depend on . They also provided a multi-letter formula for under the stated condition. In the case that holds, the formula reduces to (for notation simplicity, we suppress the cardinality bound for in the sequel)
(7) |
A closer examination also reveals that in the case holds, where and are distributions on and satisfying and , we also have for all (in [5], the authors claimed in the introduction that Ahlswede and Csiszár obtained a single-letter characterization when using the entropy characterization methods; however, in [1] only characterizations for and an arbitrary with were provided, and further details validating this claim were not given in [5])
(8) |
In other words, Theorem 5 in [1] is tight in this case. We give a general relation using code transformations which leads to this observation in Appendix A.
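For testing against independence, the single-letter characterization of [1] is the maximum of I(U;Y) over test channels P(U|X) with I(U;X) ≤ R. The brute-force sketch below evaluates this quantity on a toy binary example with a binary auxiliary variable; the grid search and all names are our illustrative choices.

```python
import numpy as np

def mi(p_joint):
    """Mutual information (nats) of a 2-D joint pmf."""
    px = p_joint.sum(1, keepdims=True)
    py = p_joint.sum(0, keepdims=True)
    m = p_joint > 0
    return float(np.sum(p_joint[m] * np.log(p_joint[m] / (px * py)[m])))

def theta(R, p_xy, grid=60):
    """Brute-force max over binary test channels P(U|X) of I(U;Y) s.t. I(U;X) <= R."""
    px = p_xy.sum(axis=1)
    best = 0.0
    for a in np.linspace(0, 1, grid):       # P(U=0 | X=0)
        for b in np.linspace(0, 1, grid):   # P(U=0 | X=1)
            p_u_given_x = np.array([[a, 1 - a], [b, 1 - b]])
            if mi(px[:, None] * p_u_given_x) <= R:   # constraint I(U;X) <= R
                p_uy = p_u_given_x.T @ p_xy          # joint of (U,Y) via U - X - Y
                best = max(best, mi(p_uy))
    return best

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])   # toy joint source
for R in (0.05, 0.2, np.log(2)):
    print(f"R={R:.3f}: theta(R) ~ {theta(R, p_xy):.3f}, I(X;Y)={mi(p_xy):.3f}")
```

At R = log 2 the choice U = X is feasible, and the maximum reaches I(X;Y), as the last line of output confirms.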
This observation motivates us to study in this paper generalized problems of compound hypothesis testing and hypothesis testing for mixture distributions. These settings are based on finite collections of distributions and . Let
(9) |
We assume that can be enumerated as . For each , let be the set of indices such that is the marginal distribution of on , i.e.,
(10) |
To characterize the optimal error exponents in the compound setting and the mixture setting, we need to make the following assumption.
Assumption 1.
The processes and are assumed to be in one of the following categories.
-
•
and hold for all and , respectively. We assume further that for all , and for all , hold. Accordingly, we define the minimum distances as (in case the minimizer is not unique, we pick the first one)
(11)
-
•
and are Markov processes of finite orders with stationary transition probabilities for all and . Compared to the first category we need an additional assumption: for all , we have for all , and similarly for all . We define the following limits, called relative entropy rates,
(12)
The existence of these relative entropy rate limits is guaranteed by [24, Theorem 1]. Similarly, we assume that for all , and for all , hold. We further define
(13)
For notation simplicity we also define total minimum distances for all , ,
(14) |
The following quantities are used to characterize optimal error exponents in later sections (for clarity, we reserve pairs of random variables for the distributions and pairs of random variables for the distributions )
(15) |
where and for all , hold.
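In the iid category, the minimum distances of Assumption 1 are minima of pairwise relative entropies between component distributions. A small sketch of how one might tabulate them (the component distributions here are toy stand-ins):

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# toy component distributions for the null and alternative hypotheses
P_list = [np.array([0.5, 0.5]), np.array([0.7, 0.3])]
Q_list = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]

# pairwise divergence matrix; Assumption 1 requires the row minima
# to be positive, and ties are broken by taking the first minimizer
Dmat = np.array([[kl(p, q) for q in Q_list] for p in P_list])
print("divergence matrix:\n", Dmat)
print("minimum distances:", Dmat.min(axis=1), "minimizers:", Dmat.argmin(axis=1))
```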
III Compound Hypothesis Testing
In this section we study the compound hypothesis testing problem of differentiating between two collections of distributions and satisfying Assumption 1. We derive the optimal achievable error exponent and provide partial strong converse results for this setting.
A testing scheme for this problem is similarly characterized by a pair of compression-decision mappings . For a given testing scheme , we define the following quantities which characterize the maximum type I and type II error probabilities
(16) |
Similarly we also use Definition 1 as the ε-achievability definition. For a given , the maximum ε-achievable error exponent is denoted by .
As our achievability result will be used in Sections IV and V, for notation compactness, we define the following auxiliary sets. For a given mapping and a positive number , define for each , an intersected decision set as follows
(17) |
In the case and hold, is a decision region based on likelihood ratio for testing against .
III-A Characterization of
The following result characterizes the optimal achievable error exponent in the compound hypothesis testing problem.
Theorem 1.
For a given compression threshold , we have . Furthermore, for a given positive number and any sequence of testing schemes such that is achievable we also have with
(18) |
The achievability proof of Theorem 1 is given in Appendix B. The converse proof of Theorem 1 follows from the one of Theorem 3. The proof of Theorem 1 uses a combination of techniques from [11], [25] and our new mixing idea. In the following we provide an overview of steps in the proof of Theorem 1.
Assume that the set of marginal distributions on , , consists of a single element. Assume further that the number of components in the null hypothesis is two, i.e. and . First we check whether the sequence is “typical” in the sense that
(19) |
This helps us to perform the change of measure step from to in the analysis of the type-II (or miss detection) probability. The above condition is violated with vanishing probability in the analysis of the false alarm probability. We then select a test channel and generate a codebook from the marginal distribution . In our proof we do not estimate from the sequence , to avoid potential complications in the analysis of the miss detection probability. Instead we artificially create the following joint distribution
(20) |
where are positive probability weights. shifts the burden from calculating the miss detection probabilities to bounding the false alarm probabilities, which is less complex. We then consider the following score function which is helpful in defining a deterministic decision mapping
(21) |
In the score function the first term resolves the uncertainty within the set of marginal distributions , while the second term resolves the mismatch between two sets of distributions and . The second term also indirectly checks whether is “typical”.
Given a chosen codeword, which we explain how to obtain later, we decide that the null hypothesis is true if is fulfilled. Given this decision the miss detection probabilities can be deduced based on the following chain of measure changing steps
(22) |
as well as the fact that holds, where the summation is performed over the decision region.
Now to obtain a transmission message index, we search for a codeword that yields the lowest maximum conditional false alarm probability, by looking at
(23) |
In the analysis of the maximum false alarm probabilities, changing measure from to in the expression of is relatively standard, cf. [11]. We can then use standard typicality arguments to conclude the existence of a good codebook. In the general case where the marginal set has multiple elements, we need to estimate . Because of the way that we design , this extra step does not affect the exponent of the miss detection probability.
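The reason the auxiliary mixture makes the false alarm analysis sufficient can be summarized by the following elementary change of measure bound, written here in generic notation (the weights w_i, components P_i, and event A are placeholders of ours):

```latex
% For any event A and any mixture \tilde{P} = \sum_i w_i P_i with w_i > 0:
\tilde{P}(A) = \sum_{i} w_i\, P_i(A) \;\ge\; w_i\, P_i(A)
\quad\Longrightarrow\quad
P_i(A) \;\le\; \frac{1}{w_i}\,\tilde{P}(A) \quad\text{for every } i.
```

Hence any exponential bound on an error probability under the mixture transfers to every component at no cost in the exponent, since the weights do not depend on the sample size.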
III-B A partial characterization of
We have the following result which provides a partial characterization of .
Theorem 2.
Given a positive number , define the inactive set .
-
•
If holds (we use the following convention: if then and ), then we have
-
•
If , then we have
-
•
Let be an optimality achieving index, i.e., . Assume that is active, i.e., . For an arbitrarily given , for any sequence of testing schemes such that the following inequalities are satisfied
(24)
we then have , where is an index such that . This also implies that under this assumption we have for all .
A simple case in which the third statement in Theorem 2 holds is when for all . The proof of Theorem 2 is given in Appendix C. The second part of Theorem 2 will be employed in proving the converse of Theorem 4 for mixture models in Section V.
Recall that represents the set of distributions which have the same marginal distribution on , cf. Equation (10). In the following we provide an outline of the proofs of the first two points in the statement Theorem 2.
We discuss in this paragraph the first item in the first part of Theorem 2. When is active, i.e., holds, then it follows from the strong converse bound for testing against for each inside the class , that holds for all , cf. Theorem 7. Therefore we only need to focus on the inactive set . For simplicity, in this discussion we can assume that holds and there are two components inside . We assume further that there is no mismatch, i.e., and hold. The formulation of requires the selection of a test channel . To show the strong converse bound, the general idea is hence to identify a common test channel . This can be done by considering relevant sets , , of such that for each
holds, where and are arbitrary. By setting and using the reverse Markov inequality, we obtain the following inequalities
(25) |
We require that the intersection should be non-empty. This allows us to define a joint distribution where is supported on . Note also that -restriction in the definition of allows us to obtain the following inequality
With this we can identify for all . The variational arguments in [26] can indeed be used to show that also holds. To make non-empty, we must have . When the inactive set contains more than one element we obtain the corresponding threshold .
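For reference, the reverse Markov inequality invoked above can be stated as follows (in generic notation):

```latex
% If X \le b almost surely and a < \mathbb{E}[X], then
\Pr\{X > a\} \;\ge\; \frac{\mathbb{E}[X] - a}{b - a},
% which follows by rearranging
% \mathbb{E}[X] \le a\,\Pr\{X \le a\} + b\,\Pr\{X > a\}.
```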
Now we discuss the second item in the first part of Theorem 2. Since holds for all and , cf. Theorem 7, we only explain the achievability direction of the second item. For simplicity we assume that the set of marginal distributions has a single element, the element is inactive , and . Our achievability idea is to build two sequences of testing schemes separately and then mix them together. For this we need to divide the space into partitions and such that for all for all sufficiently large . Since , such a partitioning can be done. For each , we design a sequence of testing schemes to differentiate between and such that is achievable. As in the proof of Theorem 1 we also define an auxiliary mixture distribution
(26) |
The mixture distribution helps to alleviate the estimation of the distribution of . Once the preparation is complete, we perform the compression as follows.
We first check whether is a typical sequence, in the sense that the following condition is fulfilled:
(27) |
Again the above inequality also helps to resolve the mismatch between the two sets of distributions and . Suppose that is typical. If then we use to compress it, and similarly when we use to compress it. The joint compression mapping is then
(28) |
The joint compression mapping induces the following distribution from
(29) |
We also define the following score function
(30) |
We say is true if holds. Let us look at which can be upper-bounded as
(31) |
where holds since when is in and is typical, we use to compress it. We have by construction. The first term can be shown to be vanishing using steps similar to those in the proof of Theorem 1.
We note that Theorem 1 only guarantees that is achievable, which is below , our desired error exponent in this part of Theorem 2. The collection of sets is used to resolve the confusion about which compression mappings we should use when is not active. We are willing to pay an additional error probability price for using this collection. The general case is a little more complicated but follows the same principles as discussed herein. Note that when is active, we do not need to divide into a collection of subsets as above. This is because by Theorem 1 we can design a sequence of testing schemes to differentiate between and such that is achievable.
Remark 1.
We have . The inequality can be strict, which can be shown numerically. This means that in general the strong converse does not hold for the compound testing problem. In other words, depends on .
IV Testing Against Generalized Independence
In this section we consider the hypothesis testing problem involving mixture distributions. We use results and techniques from Section III to establish the optimal achievable error exponent in this section. We begin with our model’s definition.
Assume that we have two sets of distributions and , which fulfill the conditions given in Assumption 1.
For given , let the distribution under the null hypothesis be defined as
(32) |
Similarly the distribution under the alternative hypothesis is given by
(33) |
where for all , , and . For notation simplicity in the subsequent analysis we define . We name this problem testing against generalized independence.
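A defining feature of the null model is that the latent component index is drawn once per block, after which the n pairs are generated iid from the selected component; the resulting samples are exchangeable but not iid. The following sampling sketch uses toy components and weights of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy 2x2 joint pmfs mixed with weights w (illustrative values only)
components = [np.array([[0.40, 0.10], [0.10, 0.40]]),
              np.array([[0.25, 0.25], [0.25, 0.25]])]
w = np.array([0.6, 0.4])

def sample_mixture_of_iid(n):
    """Draw (x^n, y^n) from sum_i w_i P_i^n: pick the component index ONCE,
    then generate n iid pairs from that component."""
    i = rng.choice(len(components), p=w)
    sym = rng.choice(4, size=n, p=components[i].ravel())
    return sym // 2, sym % 2   # decode the flat symbol into (x, y)

x, y = sample_mixture_of_iid(10)
print(x, y)
```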
The model of subsumes the following two cases:
-
•
testing against independence in which hold,
-
•
and testing against (unobserved) conditional independence in which hold.
For a given pair of compression-decision mappings , we define the corresponding type-I and type-II (false alarm and miss detection) probabilities as
(34) |
Similarly as in Definition 1 we say that is an ε-achievable error exponent at a compression rate for testing against if there exists a sequence of testing schemes such that all the conditions in (6) are satisfied. We denote the maximum ε-achievable error exponent at the given rate by . We first characterize the optimal achievable error exponent in the testing against generalized independence problem .
Theorem 3.
For a given compression rate , in testing against using one-sided compression, we have
(35) |
We first provide a remark about the first equality in the statement of Theorem 3. In it we highlight the difference between our model and a previous study.
Let us consider the case that for all pairs holds. Assume that is achievable in the mixture setting via a sequence of testing schemes . Since and for all , we have . Similarly, for an arbitrarily given and for all sufficiently large we have . Since for all , holds, we have . This implies is an achievable error exponent in the compound hypothesis testing problem with the corresponding sequence of testing schemes . Hence holds. The arguments discussed herein are similar to the ones given in [12] when data are not compressed. In our proof of Theorem 3, we only need the restriction that for all and .
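In generic notation, the reduction just sketched rests on the elementary component-wise bounds below: if the mixture errors of a testing scheme are α_n and β_n, then the errors under the individual components are at most constant multiples of them, so error exponents carry over unchanged.

```latex
% With null mixture \sum_i w_i P_i^n and alternative mixture \sum_j v_j Q_j^n:
\alpha_n^{(i)} \le \frac{\alpha_n}{w_i}, \qquad \beta_n^{(j)} \le \frac{\beta_n}{v_j},
% hence \liminf_{n\to\infty} -\tfrac{1}{n}\log \beta_n^{(j)}
%        \ge \liminf_{n\to\infty} -\tfrac{1}{n}\log \beta_n for every j.
```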
We explain the idea of showing in the following. For simplicity assume that there is no mismatch, i.e., as well as hold. Assume that is an achievable error exponent via a sequence of testing schemes . A central idea of the Neyman-Pearson framework is to consider a decision region based on the likelihood ratio. An advantage of working with a likelihood-based decision region is that elementary set operations such as intersection, contraction, etc. can be performed through simple change of measure steps either in the numerator or denominator of the likelihood ratio. We want to show that if is achievable then roughly
(36) |
The term inside the bracket is a rejection region based on the likelihood ratio for testing against . Then based on the definition of the spectral-inf mutual information rate as well as the fact that the spectral-inf mutual information rate is bounded by the inf-mutual information rate, we can arrive at a conclusion that for an arbitrarily given , and for all ,
(37) |
for all sufficiently large . Then we can use the standard single-letterization method to obtain that . In order to obtain the conclusion in (36), we need to perform several change of measure steps. First we form a decision region based on the likelihood ratio of and as well as the achievable error exponent . Then it can be shown that . We then do the first change of measure step from to to obtain
(38) |
This can be seen from the definition of , as . Next we need to change the measure inside the definition of . Roughly we want to show that the following inequality holds
(39) |
Changing the measures in the denominator from to and to can be done based on inequalities and . These inequalities follow from the definition of and hold for all and . The change from to is not based on the definition of . However we can show that it holds with high probability. The proof of the general case involves another change of measure step from to using our code transformation arguments.
Proof.
Assume that is an achievable error exponent at a compression rate in the compound hypothesis testing problem via a sequence of testing schemes . We have by definition
(40) |
Applying this sequence to the current testing against generalized independence setting we obtain
(41) |
We obtain that
(42) |
Hence is also achievable in our testing against generalized independence setting, cf. Definition 1. Therefore we have
(43) |
Now for an arbitrarily given , assume that is a sequence of testing schemes such that
(44) |
hold. Define for each the following decision region based on the likelihood ratio
(45) |
By [11, Lemma 4.1.2] and the definition of , we have for all
(46) |
Let be a sequence such that and as . For each , we define a set
(47) |
We then have
(48) |
contains high probability pairs when we perform the change of measure from to in the numerator of the likelihood ratio test in the definition of . To make the derivation more compact, we further define the following two sets
(49) |
is a rejection region in testing against . is a rejection region of testing against , which is our first desired test. From the definition of , we know that for all pairs , and for all and , the following inequality holds
(50) |
This implies that holds. Furthermore for we have
(51) |
Using the above analysis, we perform the following two change of measure steps
(52) |
where
-
•
in we change the measures from to ,
-
•
in we change the measures from to .
In summary we have
(53) |
In combination with (44), since as holds, we obtain
(54) |
In the next step, for each , and , we will perform a change of measure from to . Here is a compression mapping for each class , constructed from .
For a given , consider the problem of differentiating between and , where , via the testing scheme . The corresponding error probabilities are given by
(55) |
Note that by the definition of we have
(56) |
We want to transform the given testing scheme to obtain a new testing scheme , , for differentiating between and . This can be done using similar arguments to those given in Appendix A as follows. For the given positive we define for the given a typical subset of
(57) |
Similarly for the given we define a typical subset of
(58) |
Then the new compression mapping is defined as
The decision mapping is defined as
Using this testing scheme we can bound the error probabilities in testing against as
(59) |
where . Then using [11, Lemma 4.1.2] we obtain the following inequality
(60) |
holds where
(61) |
is our desired decision region using the likelihood ratio test in testing against . Using the inequalities in (59) we obtain
(62) |
Under Assumption 1, and as due to either the weak law of large numbers or Theorem 1 in [24]. Since both , and as hold, by combining (54) and (62) we have for all
(63) |
where holds. Hence, for all , , we have
(64) |
where and is the spectral-inf mutual information rate, defined for a joint process as
(65) |
Since the spectral-inf mutual information rate is less than or equal to the inf-mutual information rate by [11, Theorem 3.5.2]
holds, , we have
(66) |
For each , let be such that
(67) |
Then for all , we have
(68) |
Let be a uniform random variable on and independent of everything else. For each and , we define for all and . Therefore for all and , as well as for all sufficiently large , say , we have
(69) |
where follows since , and is valid since forms a Markov chain. Similarly we also have
(70) |
Note that for , we have . We define this common kernel for each as . Therefore for all and all sufficiently large we have
(71) |
where for all we have . By standard cardinality bound arguments [27] and taking we have that for all
(72) |
Hence holds for all , which leads to . ∎
V -Error Exponent in Mixture Setting
In this section we characterize the maximum ε-achievable error exponent in the testing against generalized independence setting in Section IV.
V-A Small ε-optimality of
The following partial result is an immediate consequence of the first part of Theorem 2. It states that when ε is small enough, the maximum achievable error exponent is also ε-optimal.
Corollary 1.
For a given , if , where the inactive set is defined as in the statement of Theorem 2, then
(73) |
Proof.
Assume that is a sequence of testing schemes such that the conditions in Definition 1 are satisfied for the pair
(74) |
For an arbitrarily small and for all , , we have for all . Furthermore we also have
(75) |
which implies that we have . By the proof of the first part of Theorem 2 we then obtain . Since is arbitrary, the conclusion follows. ∎
V-B A sufficient condition for characterizing
To obtain a full characterization of we need to make an additional assumption. By [28, Theorem 2.5] the maximization in the expression of , , can be restricted to the set . For each , we define to be the set of optimal solutions of within . We define a set as follows:
(76) |
For an arbitrarily given , the set of can be arranged as
(77) |
This corresponds to a permutation on . It can be seen that when is not empty then the permutation does not depend on a specific in . To account for more general scenarios, we make the following separability assumption.
Assumption 2.
For a given set of distributions and a given set of sequences of distributions , a class , and , any two kernels and in satisfy , i.e., the order is invariant to the change of kernels inside .
It can be seen that Assumption 2 is satisfied when for all , i.e., when all have distinct marginal distributions. In the following we briefly discuss two non-trivial scenarios in which Assumption 2 can be fulfilled.
Example 1: In this example we assume that and hold. This implies that for all and for all , we have and . We then assume further that within each class the set of channels can be ordered according to the less noisy relation (a channel is less noisy [29] than a channel if for every we have ). Then for all , we have
(78) |
for some fixed order which does not depend on whether is empty or not. Hence Assumption 2 is satisfied.
Example 2: We consider another example in which the set of distributions in the null hypothesis is . We assume further that is an erasure channel with erasure probability . We also assume that and hold in this example. Then , which can be achieved by any kernel such that . This implies that holds, a non-empty set, and hence Assumption 2 is satisfied.
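Example 2 rests on the identity I(U;Y) = (1−p) I(U;X), valid whenever P(Y|X) is an erasure channel with erasure probability p and U–X–Y forms a Markov chain. A quick numerical check (the joint distribution of (U,X) below is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def mi(p):
    px = p.sum(1, keepdims=True)
    py = p.sum(0, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / (px * py)[m])))

p_erase = 0.3
p_ux = rng.random((3, 4)); p_ux /= p_ux.sum()   # random joint pmf of (U, X)

# pass X through an erasure channel: Y agrees with X w.p. 1-p,
# and equals the erasure symbol (index 4) w.p. p, independently of U
p_uy = np.zeros((3, 5))
p_uy[:, :4] = (1 - p_erase) * p_ux
p_uy[:, 4] = p_erase * p_ux.sum(axis=1)

print(mi(p_uy), (1 - p_erase) * mi(p_ux))   # the two values coincide
```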
Assumption 2 implies that at a given we have
(79) |
This can be seen as follows. If , we can take any kernel inside to achieve for all . Hence (77) leads to (79). Otherwise, we take a kernel to obtain since holds and so on. Furthermore we observe that for all , we have
(80) |
can be verified as follows. If , we can take any kernel inside to achieve . Otherwise, holds, then follows. Therefore, we have the following relation
(81) |
Assume that at a given , Assumption 2 is valid. By relabeling elements in the set if necessary, we assume that is an increasing sequence. An example of such ordering is given in Fig. 2.

For notation simplicity we define for each a left-over subset of as . Since the ordering is unique, (79) implies that . Therefore, when , we have
(82) |
The above analysis leads to the following result.
Proposition 1.
Assume that at a given Assumption 2 is fulfilled and is an increasing sequence. Then for each the following holds
(83) |
The left-hand side in (83) is the maximum achievable error exponent in testing against . This result will be used to establish the optimal ε-error exponent in this section.
Proof.
V-C Characterization of under Assumption 2
A complete characterization of under Assumption 2 is provided in the following.
Theorem 4.
Assume that at a given , Assumption 2 holds and is an increasing sequence. Then we have
(85) |
The behavior of is depicted in Fig. 2. The proof of Theorem 4 uses the result from the second part of Theorem 2. It demonstrates an application of establishing an exponentially strong converse. We briefly describe the proof idea in the following.
Fix an such that holds. We build an ε-achievable sequence of testing schemes based on a compound hypothesis testing problem in which we need to specify two sets of distributions in the null and alternative hypotheses. The set of distributions in the alternative hypothesis is
(86) |
Since we are interested in showing that ε-error is achievable, we select the set of distributions in the null hypothesis as
(87) |
We omit the other distributions for in the null hypothesis of the above problem because in the mixture setting they contribute only to a total error probability of up to , which is what we desire. Given a sequence of testing schemes such that is achievable in testing against , we obtain as by-products the collection of intersected decision regions with vanishing complement probability, cf. Theorem 1.
We use to compress . Next we select a decision region for testing against such that for all , holds. This ensures that for all . Hence, asymptotically we have .
As in the converse proof of Theorem 3, assume that we have a likelihood based decision region for testing against . Then we use change of measure steps to obtain for each a likelihood based rejection region of testing against , called . Then for all sufficiently large we have
(88) |
We show that for all , converges to 1 when holds. Taking the limit superior on both sides of the above inequality, we obtain that , a contradiction. Hence we must have .
When no compression is involved, i.e., is the identity mapping, the convergence of to 1 can be seen from the weak law of large numbers since is characterized by the likelihood ratio. In our case this is not possible since the likelihood ratio cannot be factorized as the sum of identical components. Using the strong converse arguments which guarantee the convergence in the limit superior sense, it can be seen that if holds, then we must have . This approach does not match the achievability result and does not yield conclusive information when . The required convergence of is guaranteed by the second part of Theorem 2, since holds for all due to Assumption 2.
Proof.
Achievability: Given , consider a reduced compound problem of designing a testing scheme to differentiate between and . By Theorem 1 and Proposition 1, for each , there exists a sequence of testing schemes for this reduced compound setting such that is achievable, i.e.,
(89) |
In contrast to the achievability proof of Theorem 3, although we can use the same sequence of compression mappings to compress , we need to define a new sequence of decision mappings . For each and given the compression mapping , define the following measure
(90) |
Define an acceptance region for our testing against problem as follows
(91) |
Then the probability of miss detection is given by
(92) |
For the given , the false alarm probability is given by
(93) |
For each , if then we have
(94) |
This means that , cf. (17) for the definition. In summary we have
(95) |
Therefore we have
(96) |
This implies that if .
Converse: Given an ε and an arbitrary , assume that is ε-achievable via a sequence of testing schemes . In a similar fashion as in the converse proof of Theorem 3 we have,
(97) |
where the rejection regions are defined as in (49). Assume that holds. This implies that for all we have . For each , we apply the second part of Theorem 2 to the problem of testing against where via the sequence of testing schemes to obtain that
(98) |
Since as , this implies that
(99) |
a contradiction. Therefore we must have and hence . ∎
VI A refined relation to the WAK problem
In this section we use the techniques and results from the previous sections to establish new results for the WAK problem in which the joint distribution is a mixture of iid components.
We first recall the definition of the WAK problem. Assume that we have a joint source which takes values on an alphabet where is finite or countably infinite. A code for the WAK problem is a triple of mappings
(100) |
In the WAK setting we aim to control the error probability where . The achievable region has been characterized using the information-spectrum formula in [30]. We are interested in single-letter formulas for the (ε-)achievable regions. To obtain these we relate the WAK problem to the testing against independence problem with general distributions.
The hypotheses are given by
(101) |
where is a distribution on . Similarly we use (3) and (4) for the definition of a testing scheme in this case. Additionally, Definition 1 is taken as the definition of -achievability.
When is a finite or countably infinite alphabet, and the process in the WAK problem is a stationary and ergodic process which has a finite entropy rate, as well as and , a generalized relation between the WAK problem and the hypothesis testing against independence problem has been established in our recent work [21]. The relation allows us to transfer results from the testing against independence problem to the WAK problem and vice versa. However it is not strong enough for our current interest. In the following we study a refined relation between the two problems.
In this section we similarly assume that where specifically is a finite alphabet. We further assume that where is the uniform distribution on . For simplicity we call this setting U(niform)-HT. Define a set
(102) |
For a given WAK-code , and an arbitrary number , we have
(103) |
where holds as for a given the number of satisfying is upper bounded by . Therefore, we obtain
(104) |
Additionally, given a testing scheme for the U-HT problem and an arbitrary positive number we also have by [11, Lemma 4.1.2]
(105) |
Similarly as in [21, Theorem 2], we have the following result.
Theorem 5.
Given positive numbers and .
-
•
From a WAK-code , we can construct a testing scheme for a U-HT problem such that
(106)
-
•
For a given U-HT testing scheme , there exists a WAK-code such that
(107)
Proof.
WAK → U-HT: given a WAK-code we design a U-HT testing scheme as follows. We use to compress in the U-HT setting. A decision region for the U-HT setting is given by . Then the false alarm probability is upper bounded as
(108) |
The miss detection probability is upper bounded as
(109) |
where follows since for we have .
U-HT → WAK: given a U-HT scheme we show the existence of a WAK-code as follows. We use to compress in the WAK problem. We randomly assign to an index in an alphabet . For each we denote the corresponding (random) set of such by . We declare that is the original source sequence if it is the unique sequence satisfying and . For each , the cardinality of the set of satisfying is upper bounded by . There are two sources of errors:
-
•
either holds,
-
•
or there exists another for which and hold.
The probability of the first event is upper bounded as
(110) |
The probability of the second event is upper bounded by because each sequence is assigned to a bin with probability and the number of such sequences satisfying the second event is upper bounded by . Hence, it can be seen that
(111) |
∎
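The dominant error event in the binning construction above is a collision: some other sequence in the same bin also passes the acceptance test. Since each of roughly exp(n H(X|Y)) confusable sequences lands in the true bin independently with probability 1/M, the union bound gives an error term of about exp(n H(X|Y))/M. The sketch below simulates only this collision event; the value of the conditional entropy and the rates are toy choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)
H_cond = 0.5   # toy value (nats) standing in for the conditional entropy H(X|Y)

for n in (20, 40, 60):
    for R in (0.45, 0.60):   # binning rate below vs. above H(X|Y)
        K = int(np.exp(n * H_cond))   # ~ number of confusable sequences
        M = int(np.exp(n * R))        # number of bins
        # each confusable sequence independently hits the true bin w.p. 1/M
        err = np.mean(rng.binomial(K, 1.0 / M, size=2000) > 0)
        print(f"n={n}, R={R:.2f}: P(collision) ~ {err:.4f}, "
              f"union bound K/M = {min(1.0, K / M):.4f}")
```

For R above the conditional entropy the collision probability decays exponentially in n, which is the mechanism behind the second part of the proof.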
Fix an ε. Let be the closure of all such that there exists a sequence of WAK-codes which satisfies
(112) |
Define . We observe that for all and . By setting in (105) we obtain that . The following result summarizes the relation between the minimum encoding rate in the WAK problem and the maximum ε-achievable error exponent in the U-HT problem. As in [21, Theorem 3] we have the following result.
Corollary 2.
For any given and , we have
(113) |
Proof.
We consider the extreme case where holds. Assume that is an achievable error exponent in the U-HT problem with the corresponding sequence of testing schemes . We take where . By plugging it into the second part of Theorem 5 and choosing , we obtain a sequence of WAK-codes such that . The corresponding compression rate is , a contradiction. Therefore in this case we must have . The other cases can be worked out similarly as in the proof of [21, Theorem 3]. ∎
Assume now that the source distribution in the WAK problem (and the distribution in the U-HT problem) is given by . Further we assume that for all , and as well as hold. Therefore we can use results from Section III as follows. The quantities , , in this case are given by
(114) |
Therefore using Corollary 2 we obtain
(115) |
Similarly, if Assumption 2 holds and is an increasing sequence, the minimum ε-achievable compression rate at is given by
(116) |
where in this case for all we have
(117) |
Appendix A Hypothesis testing with two-sided compression
In this section we study code transformations between hypothesis testing problems with compression at both terminals. Although the setting considered in the following is not directly related to our main problem, the arguments presented herein are useful in simplifying the proofs of the main results in this work.
Assume that is available at Terminal 1. Further is available at Terminal 2. At a given number of samples a generic testing scheme involves a triple of mappings where
(118) |
Similarly as in (5) the generic error probabilities and are defined as
(119) |
We study a relation between two hypothesis testing problems which have the same distribution in the null hypothesis. The first problem involves the following hypotheses
(120) |
whereas the second problem considers
(121) |
We make the following technical assumptions
(122) |

An overview of our derivation is depicted in Fig. 3. Let be a testing scheme for differentiating between and in (120). We construct a testing scheme for testing against in (121) as follows. For an arbitrarily given , we define typical sets , and as
(123) |
where and are finite numbers.
We define the compression mapping as follows
(124) |
Similarly the compression mapping is defined as
(125) |
The corresponding decision mapping , is given as
For notation simplicity we define for and , . The set of all for which and hold is which can be factorized further as
(126) |
In differentiating between and the induced false alarm probability by the testing scheme is upper bounded as
(127) |
Similarly the probability of miss detection is bounded by
(128) |
Given a testing scheme for testing against we apply a similar procedure as from (124) to (A), by swapping the positions of and for , in (124) and (125) as well as switching the roles of and in (A), to obtain a testing scheme for testing against .
The induced false alarm probability in testing against is similarly upper bounded by
(129) |
The induced miss detection probability in testing against is upper bounded as
(130) |
For a given pair of compression rates , let be the maximum ε-achievable error exponent for testing against . Similarly let be the maximum ε-achievable error exponent for testing against . To establish a relation between and , we make the following assumption.
Assumption 3.
The sequences of joint distributions , , , and the quantities , , satisfy (122) and the following conditions
(131) |
We give some examples in which Assumption 3 is satisfied.
-
•
Assume that , are stationary and ergodic processes as well as , , and , are finite order Markov processes with stationary transition probabilities such that the conditions in (122) are satisfied. Assume further that the relative divergence rates
(134)
are finite. With , and , the conditions in (131) are fulfilled by [24, Theorem 1].
Assume that Assumption 3 is fulfilled, then (127) and (128) imply that
(135) |
Conversely, (129) and (130) imply that
(136) |
In conclusion we have shown the following result.
Theorem 6.
Given and , under Assumption 3 we have
(137) |
As a corollary of this result, by setting and , we have the following result, which states that Theorem 5 in [1] is tight for a special case.
Theorem 7.
Assume that the sequences of distributions and satisfy Assumption 1 with , , , and . In testing against using one-sided compression of the sequence , the maximum ε-achievable error exponent is given by
(138) |
where in the first category , and in the second category .
Appendix B Proof of Theorem 1
B-A A support lemma
For the achievability proof of Theorem 1 we need the following support lemma.
Lemma 1.
Let be either or in Assumption 1. For each , let be the corresponding or . For each , define where for simplicity are fixed positive numbers and . Also for each , we assume that . Then for an arbitrarily given we have
(139) |
Proof.
It can be seen that
(140) |
If belongs to the second category, we always have for all . This leads to for all . The second inequality then follows since for all holds.
Assume now that lies in the first category. When holds, i.e., for all we have , then the set can be omitted since it has zero probability. If , for example when there exists a satisfying , then for all such that we have which in turn violates the inequality . Therefore in both cases we only need to consider the set , in which the second inequality follows from the definitions of , and as well as the fact that for all holds. For notation simplicity, we define . We examine two cases in Assumption 1 separately in the following.
-
•
Consider the first case in which is the set of product distributions on a common alphabet . To avoid dealing with cumbersome extended real number operations, we define a set . We further have (without considering , we might encounter an indeterminate form in the third expression)
(141)
If , then by the weak law of large numbers (141) goes to 0 as . When , let be the largest subset of such that and for all . By our assumption we have , and if then holds. Therefore we have
(142)
Therefore
(143)
∎
B-B Existence of a testing scheme
Let be the minimum (total variation) distance between any two probability distributions in the set of marginal distributions on , . Denote by the type of a sequence . We define a mapping as follows. We look for a unique such that . If such an exists, we set . Otherwise we set .
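The estimation step just described can be mocked up as follows: compute the type (empirical distribution) of the observed sequence and return the index of the unique candidate marginal within half the minimum total variation distance; by the triangle inequality at most one candidate can qualify. The threshold and the fallback index are our illustrative choices.

```python
import numpy as np

def estimate_index(x, marginals):
    """Index of the unique candidate marginal whose total variation distance
    to the type of x is below d_min/2; fall back to index 0 otherwise."""
    type_x = np.bincount(x, minlength=len(marginals[0])) / len(x)
    tv = np.array([0.5 * np.abs(type_x - p).sum() for p in marginals])
    d_min = min(0.5 * np.abs(p - q).sum()
                for i, p in enumerate(marginals) for q in marginals[i + 1:])
    hits = np.flatnonzero(tv < d_min / 2)   # at most one index can qualify
    return int(hits[0]) if len(hits) == 1 else 0

rng = np.random.default_rng(4)
marginals = [np.array([0.5, 0.5]), np.array([0.85, 0.15])]
x = rng.choice(2, size=200, p=marginals[1])
print(estimate_index(x, marginals))   # -> 1 with high probability
```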
For a given and for each , let be a probability kernel which achieves . Define
(145) |
Let be the corresponding marginal joint distribution on . For notation simplicity we define the following score function (by our assumption , if then there exists an such that ; therefore the score function is well-defined)
(146) |
For each , and , let be a tuple of random variables such that . Let be arbitrary but given. For each draw codewords , , from the marginal distribution of . Given , let be an estimate of the index of the marginal distribution of . Assume that . We proceed with the compression process if
(147) |
where is the marginal on of . Otherwise we send a special index .
When (147) is fulfilled, we then select a transmission index to be where for all
(148) |
If , we set . The pair is provided to the decision center. If , and hold we declare is the underlying distribution if
(149) |
Otherwise we declare that is true. Given a codebook realization, let and be induced random variables under the null hypothesis. For each , , the corresponding false alarm probability is given by
(150) |
For notation simplicity we define for each , . The first term can be upper bounded further as
(151) |
Let be another tuple of generic random variables that is independent of the random codebook such that . Furthermore for the given codebook realization let be the random message induced by through the encoding process. Then we have
(152) |
Averaging over all codebooks we obtain by Fubini’s theorem
(153) |
By the non-asymptotic covering lemma in [25, Lemma 5] we have
(154) |
This result implies the following chain of expressions
(155) |
For each let be a tuple of random variables such that and hold. From the definition of we also have
(156) |
In summary there exists a codebook realization, hence a mapping , such that
(157) |
holds.
B-C Bounding error probabilities
Due to Lemma 1, goes to 0 as . By the weak law of large numbers the terms and go to as . We focus now on the first term. We observe that
(158) |
Similarly, the last term goes to 0 due to Lemma 1. In the next step we perform several change of measure steps from the general distribution to our distributions of interest , . For that purpose, for each , we define the following sets
(159) |
where and as . We have
(160) |
For as holds by the definition of , we further have
(161) |
This leads to
(162) |
For an arbitrary select which implies that holds. By the weak law of large numbers we obtain
(163) |
Therefore we have
(164) |
Define the following sets
(165) |
and
(166) |
is our decision region described in (149). Using and we perform in the following change of measure steps from to and from to in the calculation of the miss detection probability. For each and we have
(167) |
where the second-to-last expression follows since implies that (otherwise in the previous line the first term is while the second term is ), and due to our coding arguments . Furthermore for , as holds, we also have
(168) |
Therefore, for each and the probability of miss detection is bounded by
(169) |
This implies that we have
(170) |
Therefore, the chosen sequence of satisfies
(171) |
Combining with the definition of intersected sets this further implies that for all we have
(172) |
Appendix C Proof of Theorem 2
For an arbitrarily given , define the following typical sets
(173) |
We have
(174) |
either due to the weak law of large numbers or due to [24, Theorem 1]. For simplicity we first consider the case that there is a single marginal distribution in the set . In this case we simply write as , as , as , and as . Furthermore, for notation compactness we define the following quantities in this case
(175) |
To support the analysis we define . Given an arbitrary joint distribution , consider the following region
(176) |
It can be seen that only depends on marginal distributions but not on the joint distribution . Without loss of generality we assume that in the evaluation of , . In the following we use the hyperplane characterization of . For that purpose, we first show the following result.
Lemma 2.
is a closed, convex set. Furthermore, is a concave function.
Proof.
Assume that are two points in with corresponding kernels . Assume that is a random variable taking values on with , , and independent of everything else. Define . Then we have
(177) |
Therefore, the convex combination also lies in , i.e., is a convex set. It can also be seen that is a closed set since all alphabets are finite. Let be the reflection of the set via the -axis. Then is a convex set. Furthermore the function
(178) |
is a convex function. This implies that is a concave function. ∎
For any define
(179) |
Since is a closed convex set, the line supports or . (Assume that and hold. Then there exists a pair such that for all , as is a closed convex set. Since , we must have . This implies that either or is negative by plugging in . Similarly, plugging with sufficiently large into , we see that is positive and is negative. Setting , we see that if then . Hence holds. The other direction is straightforward.) For our proof we also use an additional characterization of , which is stated in the following. For any pair of positive numbers we define
(180) |
where is a joint probability measure on satisfying and . By the support lemma in [27] we can upper bound the cardinality of by a constant. An alternative characterization of is given in the following.
Lemma 3.
(181) |
C-A Strong converse proof for
We now present the main part of the proof of Theorem 2. In the proof, we will use a recent technique by Tyagi and Watanabe [26]. When the inactive set is empty, , showing that for all can be deduced from the strong converse result of testing against for all , cf. Theorem 7. Without loss of generality, we assume in the following that the inactive set is not empty .
Assume first that the set of marginal distributions on , , is a singleton. This implies that holds. Given a sequence of testing schemes such that the ε-achievability conditions are fulfilled
(182) |
we define for each the following likelihood based decision region
(183) |
By [11, Lemma 4.1.2], cf. also [31, Lemma 12.2], we obtain
(184) |
For each , can be seen as a testing scheme for differentiating between and . For a given we now construct a testing scheme to differentiate between and . Given , the compression mapping does not depend on . Our arguments are similar to the code transformations given in Appendix A. We present the procedure in the following for completeness. Given , is defined as
For each , the decision mapping is defined as
In the above definitions the typical sets (or ) and are defined in (173). Then we also have
(185) |
Let be the corresponding acceptance region of . We take to be sufficiently large such that the following conditions hold for all
(186) |
Next, we can further decompose as
(187) |
For simplicity define . Consider the set
(188) |
where . Then we have
(189) |
which implies that . Using this inequality we further obtain
(190) |
For , we require that
(191) |
must hold. This in turn implies that . Taking , we obtain . Define , , and the following distribution on
(192) |
For a given we also define the following joint conditional distribution on
(193) |
Additionally, we define for all if . Let be a tuple of general sources such that
(194) |
Then for each we have the following inequality
(195) |

Consider an arbitrarily fixed sequence . For the following analysis, the illustration given in Fig. 4 is perhaps helpful for readers. Let be a message index such that and hold. For all , we then have
(196) |
If is another message index such that and that hold, then for all , we have . When , we also have . Hence for all , we obtain
(197) |
We observe that not only holds but also we have the following bound
(198) |
We now derive bounds on the compression rate and the ε-achievable error exponent using expressions involving . We first have
(199) |
where . Note further that the following Markov chain holds
(200) |
For each the support set of the joint distribution is a subset of the following set, cf. Fig. 4 for a visual illustration,
Then compared with (187) we have due to the restriction on . Therefore, on one hand we have
(201) |
On the other hand from (195) and (197) we have
(202) |
where holds because . The last equality is valid because for we have . This leads to the following
(203) |
where we have used the log-sum inequality in and the summation therein is taken over and . The last inequality holds since for we have . Combining (185), (201), and (203) we obtain
(204) |
By combining the two previous inequalities, (199) and (204), we obtain for arbitrarily given positive numbers and
(205) |
where
(206) |
Let be a uniform random variable on . Furthermore, define for all , and . We show at the end of this subsection that
(207) |
(208) |
In summary we obtain for all positive and that
(209) |
Taking and the supremum over , and using Lemma 3, we have shown that . Taking we obtain the conclusion that .
If the set of marginal distributions on , , has more than one element, then for each inactive we have provided that holds. For each active , by the strong converse result for the simple hypothesis testing problem we obtain for all . Thereby we obtain the conclusion.
Proof of Lemma 3: For any in the optimization domain of , select a . Then we can see that
Given an , let be an optimal solution for . As , and , we have
(210) |
We then have
(211) |
where as . Taking the supremum over we obtain the conclusion.
Proof of (207) and (208): First, we have
(212) |
Note that since , whenever we must have for all . This implies that . These absolute continuity relations ensure the validity of the following derivations
(213) |
and
(214) |
In the last step we have used the inequality , as in this case and might not be independent of each other. This implies (207). Furthermore, since conditioning reduces entropy, we can lower bound the term as follows
(215) |
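The step in which conditioning reduces entropy is the standard fact that for any random variables $A$, $B$, $C$,
\[
H(A \mid B, C) \;\le\; H(A \mid B), \qquad \text{equivalently} \qquad I(A; C \mid B) \;\ge\; 0,
\]
which is the nonnegativity of conditional mutual information.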
C-B The case that
Fix a compression rate and an arbitrary . If the inactive set is empty, , then we have and the threshold becomes . We can use the achievability of Theorem 1 and the strong converse in Subsection C-A to verify the statement. Therefore, in the following we assume that the inactive set is non-empty, . Our coding scheme is influenced by the one given in [7].
C-B1 Construction of a testing scheme
Recall that represents the distributions which have the same marginal distribution on , and is the number of these. For each inactive we partition the set into sets such that for all sufficiently large , we have for all . This is possible because we have . (To see this, let be an arbitrarily given number. Given a type class which is a subset of the strongly typical set , we divide it into subsets such that each subset has cardinality and omit the remaining sequences. Enumerating over all type classes inside the typical set, the number of omitted typical sequences is upper bounded by . Hence each of the constructed subsets has probability of at least for all sufficiently large . The atypical sequences and the omitted typical sequences can then be assigned to these sets randomly.) Furthermore, for each where , let be its position inside the set according to the natural ordering.
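A hedged sketch of the counting argument just given, written with generic placeholders ($P$ for the relevant marginal distribution, $k$ for the number of subsets, and $\epsilon'$ for a typicality slack vanishing with the typicality parameter): by the method of types [27] there are at most $(n+1)^{|\mathcal{X}|}$ type classes, fewer than $k$ sequences are omitted from each, and every strongly typical sequence has probability at most $2^{-n(H(P)-\epsilon')}$, so the total omitted probability mass satisfies
\[
\Pr[\text{omitted}] \;\le\; (k-1)\,(n+1)^{|\mathcal{X}|}\, 2^{-n(H(P)-\epsilon')} \;\longrightarrow\; 0 \quad \text{as } n \to \infty,
\]
which is why each constructed subset retains probability close to $1/k$ for all sufficiently large $n$.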
For each of these inactive states and , let be a sequence of testing schemes to differentiate between and such that is achievable. Then, similarly to the conclusion of Theorem 1, the false alarm probability of the likelihood ratio test also goes to zero
(216) |
Note also that the following expressions
(217) |
hold due to either the weak law of large numbers or Theorem 1 in [24]. Rewriting the above expression we therefore obtain
(218) |
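The convergences in (217) are instances of the following general facts; the placeholder distributions $P$ and $Q$ stand for the respective component distributions. For an iid sequence $X^n \sim P^n$, the weak law of large numbers gives
\[
\frac{1}{n} \log \frac{P^n(X^n)}{Q^n(X^n)} \;=\; \frac{1}{n}\sum_{i=1}^{n} \log \frac{P(X_i)}{Q(X_i)} \;\xrightarrow{\;\Pr\;}\; D(P \,\|\, Q),
\]
while for the Markov components the corresponding almost-sure convergence of the normalized log-likelihood is supplied by the generalized Shannon-McMillan-Breiman theorem, Theorem 1 in [24].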
For each active state , , let be a sequence of testing schemes to differentiate between and such that is achievable. The existence of follows from Theorem 1. Similarly from the conclusion of Theorem 1, for each intersected set , , we have
(219) |
From the definition of in (17) we further have
(220) |
Using (217) again we obtain
(221) |
Define the following auxiliary distribution
(222) |
Let and be the marginal distributions of . Furthermore, let be the push-forward distribution resulting from applying and to , i.e.,
(223) |
We use the same mapping to estimate before encoding the , as in the proof of Theorem 1. If , we send to the decision center. If , we check whether , where
(224) |
If the condition is not fulfilled, we send a special symbol . Assume now that the condition holds. If , we send the following message to the decision center: . If , we send . For , we decide that the null hypothesis is true if , and
(225) |
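A hedged schematic of the encoder just described may be useful; the names $f_n$ (overall compression mapping), $\hat{s}$ (estimated mixture component), $\varphi_n^{(\hat{s})}$ (component encoder), and $j$ (the position index within a partition cell) are illustrative placeholders rather than the paper's exact notation, and the case split reflects one plausible reading of the construction:
\[
f_n(x^n) \;=\; \begin{cases} 0 & \text{if the typicality check fails},\\ \big(\varphi_n^{(\hat{s})}(x^n),\, \hat{s}\big) & \text{if the estimated component } \hat{s} \text{ is active},\\ \big(j,\, \hat{s}\big) & \text{if the estimated component } \hat{s} \text{ is inactive}. \end{cases}
\]
The decision center then declares the null hypothesis only when the received message is consistent with the corresponding component test, as specified in (225).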
C-B2 Bounding error probabilities
For each where is active, , the false alarm probability is given by
(226) |
Similarly, for each where is not active, , we have
(227) |
for all sufficiently large . The last inequality holds since for all sufficiently large we have by the definition of . Let be a sequence such that and as . Using the change-of-measure steps as in the proof of Theorem 1 we have
(228) |
C-C Convergence of
We assume that the optimality-achieving index is active, ; otherwise there is nothing to prove. Let , and be defined as in (176), (179) and (180) with and in place of and in place of .
Let be an arbitrary sequence of testing schemes such that where for an arbitrary holds. Select small enough such that .
As in the previous proofs, we transform into a testing scheme , where is a deterministic mapping, for differentiating between and . This can be done by using the typical sets and defined in (173). The resulting error probabilities are similarly bounded by
(232) |
Let be the acceptance region of . We argue that there exists a such that for all we have
(233) |
Assume, to the contrary, that for all there exists an such that
(234) |
Similarly, for notational simplicity, we define as well as
(235) |
We then have
(236) |
For it can be seen that
holds whereas for we have . Therefore for all the following inequalities hold
(237) |
This further implies that we also have
(238) |
Hence we obtain
(239) |
where . By using the same lines of arguments as from (205) to (209) we have for given positive and
(240) |
As is a closed convex set and holds, by the hyperplane separation theorem there exist positive numbers and such that holds. Furthermore, there also exists an such that . Then we obtain for such
(241) |
This inequality is violated for . Then (233) and (232) imply that
(242) |
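For reference, the separation step above relies on the following standard fact: if $\mathcal{C} \subseteq \mathbb{R}^2$ is closed and convex and $p = (p_1, p_2) \notin \mathcal{C}$, then there exist $(\lambda_1, \lambda_2) \neq (0,0)$ and $c \in \mathbb{R}$ such that
\[
\lambda_1 p_1 + \lambda_2 p_2 \;<\; c \;\le\; \lambda_1 q_1 + \lambda_2 q_2 \qquad \text{for all } (q_1, q_2) \in \mathcal{C}.
\]
The positivity of the separating coefficients asserted above is an additional property that we read as coming from the monotonicity of the region in question, not from the separation theorem itself.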
References
- [1] R. Ahlswede and I. Csiszár, “Hypothesis testing with communication constraints,” IEEE Transactions on Information Theory, vol. 32, no. 4, pp. 533–542, 1986.
- [2] T. Berger, “Decentralized estimation and decision theory,” in IEEE 7th Spring Workshop on Inf. Theory, Mt. Kisco, NY, September 1979.
- [3] T. S. Han, “Hypothesis testing with multiterminal data compression,” IEEE Transactions on Information Theory, vol. 33, no. 6, pp. 759–772, 1987.
- [4] H. Shimokawa, T. S. Han, and S. Amari, “Error bound of hypothesis testing with data compression,” in Proceedings of 1994 IEEE International Symposium on Information Theory. IEEE, 1994, p. 114.
- [5] H. M. Shalaby and A. Papamarcou, “Multiterminal detection with zero-rate data compression,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 254–267, 1992.
- [6] M. S. Rahman and A. B. Wagner, “On the optimality of binning for distributed hypothesis testing,” IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6282–6303, 2012.
- [7] C. Tian and J. Chen, “Successive refinement for hypothesis testing and lossless one-helper problem,” IEEE Transactions on Information Theory, vol. 54, no. 10, pp. 4666–4681, 2008.
- [8] S. Watanabe, “Neyman–Pearson test for zero-rate multiterminal hypothesis testing,” IEEE Transactions on Information Theory, vol. 64, no. 7, pp. 4923–4939, 2017.
- [9] G. J. McLachlan and D. Peel, Finite mixture models. New York: John Wiley & Sons, 2000.
- [10] P.-N. Chen, “General formulas for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error bounds,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 316–323, 1996.
- [11] T. S. Han, Information-Spectrum Methods in Information Theory. Berlin, Heidelberg: Springer-Verlag, 2003.
- [12] T. S. Han and R. Nomura, “First- and second-order hypothesis testing for mixed memoryless sources,” Entropy, vol. 20, no. 3, p. 174, 2018.
- [13] A. Ritchie, R. A. Vandermeulen, and C. Scott, “Consistent estimation of identifiable nonparametric mixture models from grouped observations,” Advances in Neural Information Processing Systems, vol. 33, pp. 11676–11686, 2020.
- [14] R. T. Elmore, T. P. Hettmansperger, and H. Thomas, “Estimating component cumulative distribution functions in finite mixture models,” Communications in Statistics-Theory and Methods, vol. 33, no. 9, pp. 2075–2086, 2004.
- [15] I. Cruz-Medina, T. Hettmansperger, and H. Thomas, “Semiparametric mixture models and repeated measures: the multinomial cut point model,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 53, no. 3, pp. 463–474, 2004.
- [16] Y. Wei and X. Nguyen, “Convergence of de Finetti’s mixing measure in latent structure models for observed exchangeable sequences,” arXiv preprint arXiv:2004.05542, 2020.
- [17] C. Pal, B. Frey, and T. Kristjansson, “Noise robust speech recognition using Gaussian basis functions for non-linear likelihood function approximation,” in 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. IEEE, 2002, pp. I–405.
- [18] A. Anandkumar, D. Hsu, and S. M. Kakade, “A method of moments for mixture models and hidden Markov models,” in Conference on Learning Theory. JMLR Workshop and Conference Proceedings, 2012, pp. 33–1.
- [19] M. I. Jordan, “Stat260: Bayesian Modeling and Inference, Lecture 1: History and De Finetti’s Theorem,” 2010.
- [20] W. Kirsch, “An elementary proof of de Finetti’s theorem,” Statistics & Probability Letters, vol. 151, pp. 84–88, 2019.
- [21] M. T. Vu, T. J. Oechtering, and M. Skoglund, “Hypothesis testing and identification systems,” IEEE Transactions on Information Theory, vol. 67, no. 6, pp. 3765–3780, 2021.
- [22] A. Wyner, “On source coding with side information at the decoder,” IEEE Transactions on Information Theory, vol. 21, no. 3, pp. 294–300, 1975.
- [23] R. Ahlswede and J. Körner, “Source coding with side information and a converse for degraded broadcast channels,” IEEE Transactions on Information Theory, vol. 21, no. 6, pp. 629–637, 1975.
- [24] A. R. Barron, “The Strong Ergodic Theorem for Densities: Generalized Shannon-McMillan-Breiman Theorem,” Annals of Probability, vol. 13, no. 4, pp. 1292–1303, 1985.
- [25] S. Verdú, “Non-asymptotic achievability bounds in multiuser information theory,” in 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2012, pp. 1–8.
- [26] H. Tyagi and S. Watanabe, “Strong converse using change of measure arguments,” IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 689–703, 2019.
- [27] I. Csiszár and J. Körner, Information theory: coding theorems for discrete memoryless systems. Cambridge: Cambridge University Press, 2011.
- [28] H. Witsenhausen and A. Wyner, “A conditional entropy bound for a pair of discrete random variables,” IEEE Transactions on Information Theory, vol. 21, no. 5, pp. 493–501, 1975.
- [29] J. Körner and K. Marton, “Comparison of two noisy channels,” in Topics in Information Theory, vol. 16. Amsterdam: North Holland, 1977, pp. 411–424.
- [30] S. Miyake and F. Kanaya, “Coding theorems on correlated general sources,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 78, no. 9, pp. 1063–1070, 1995.
- [31] Y. Polyanskiy and Y. Wu, “Lecture notes on information theory,” MIT (6.441), UIUC (ECE 563), 2017.