
On Sub-optimality of Random Binning for Distributed Hypothesis Testing

Shun Watanabe
Department of Computer and Information Sciences, Tokyo University of Agriculture and Technology, Japan
E-mail: [email protected]
Abstract

We investigate the quantize and binning scheme, known as the Shimokawa-Han-Amari (SHA) scheme, for distributed hypothesis testing. We develop tools to evaluate the critical rate attainable by the SHA scheme. For a product of binary symmetric double sources, we present a sequential scheme that improves upon the SHA scheme.

I Introduction

In [1], Berger introduced a framework of statistical decision problems under communication constraints. Inspired by his work, many researchers studied various problems of this kind [2, 3, 4, 5, 6, 7, 8]; see [9] for a thorough review of work up to the 1990s. More recently, the problem of communication-constrained statistics has regained the interest of researchers; see [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] and references therein.

In this paper, we are interested in the most basic setting of hypothesis testing, in which the sender and the receiver observe correlated sources $X^n$ and $Y^n$, respectively, and the observation $(X^n, Y^n)$ is distributed according to the product of the null hypothesis $P_{XY}$ or of the alternative hypothesis $Q_{XY}$; the sender transmits a message at rate $R$, and the receiver decides between the hypotheses based on the message and its own observation.

In [2], Ahlswede and Csiszár introduced the so-called quantization scheme, and showed that it is optimal for testing against independence, i.e., when the alternative hypothesis is $Q_{XY} = P_X \times P_Y$. In [3], Han introduced an improved version of the quantization scheme that fully exploits the knowledge of the marginal types. In [25], Shimokawa, Han, and Amari introduced the quantize and binning scheme, which we shall call the SHA scheme in the following. For a long time, it has not been known whether the SHA scheme is optimal. In [13], Rahman and Wagner showed that the SHA scheme is optimal for testing against conditional independence. (In fact, for testing against conditional independence, it suffices to consider a simpler version of the quantize and binning scheme than the SHA scheme.) More recently, Weinberger and Kochman [17] (see also [26]) derived the optimal exponential trade-off between the type I and type II error probabilities of the quantize and binning scheme. Since the decision in [17] is conducted by the optimal likelihood decision rule, their scheme may potentially outperform the SHA scheme; however, since the expression involves a complicated optimization over multiple parameters, a strict improvement has not been clarified so far.

A main objective of this paper is to investigate whether the SHA scheme is optimal. It turns out that the answer is negative: there exists a scheme that improves upon the SHA scheme. Since the general SHA scheme is involved, we focus on the critical rate, i.e., the minimum rate required to attain the same type II exponent as the Stein exponent of centralized testing. Under a certain non-degeneracy condition, we prove that the auxiliary random variable in the SHA scheme must coincide with the source $X$ itself, i.e., we must use binning without quantization. By leveraging this simplification, for the binary symmetric double source (BSDS), we derive a closed-form expression for the critical rate attainable by the SHA scheme. Next, we consider a product of BSDSs such that each component of the null hypothesis and the alternative hypothesis is "reversely aligned." Perhaps surprisingly, it turns out that the SHA scheme is sub-optimal for such hypotheses. Our improved scheme is simple: it applies the SHA scheme to each component in a sequential manner; however, it should be emphasized that we need to conduct the binning of each component separately, so our scheme is not a naive random binning.

II Problem Formulation

We consider the statistical problem of testing the null hypothesis $\mathtt{H}_0 : P_{XY}$ on finite alphabets ${\cal X} \times {\cal Y}$ versus the alternative hypothesis $\mathtt{H}_1 : Q_{XY}$ on the same alphabets. The i.i.d. random variables $(X^n, Y^n)$, distributed according to either $P_{XY}^n$ or $Q_{XY}^n$, are observed by the sender and the receiver, respectively; the sender encodes the observation $X^n$ into a message by an encoding function

\varphi_n : {\cal X}^n \to {\cal M}_n,

and the receiver decides whether to accept the null hypothesis or not by a decoding function

\psi_n : {\cal M}_n \times {\cal Y}^n \to \{\mathtt{H}_0, \mathtt{H}_1\}.

When the block length $n$ is obvious from the context, we omit the subscript $n$. For a given testing scheme $T_n = (\varphi, \psi)$, the type I error probability is defined by

\alpha[T_n] := P\big( \psi(\varphi(X^n), Y^n) = \mathtt{H}_1 \big)

and the type II error probability is defined by

\beta[T_n] := Q\big( \psi(\varphi(X^n), Y^n) = \mathtt{H}_0 \big).

In the following, $P(\cdot)$ (or $Q(\cdot)$) means that $(X^n, Y^n)$ is distributed according to $P_{XY}^n$ (or $Q_{XY}^n$).

In the problem of distributed hypothesis testing, we are interested in the trade-off between the communication rate and the type I and type II error probabilities. Particularly, we focus on the optimal trade-off between the communication rate and the exponent of the type II error probability when the type I error probability is vanishing, which is sometimes called the "Stein regime."

Definition 1

A pair $(R, E)$ of the communication rate and the exponent is defined to be achievable if there exists a sequence $\{T_n\}_{n=1}^{\infty}$ of testing schemes such that

\lim_{n\to\infty} \alpha[T_n] = 0

and

\limsup_{n\to\infty} \frac{1}{n} \log |{\cal M}_n| \leq R,
\liminf_{n\to\infty} -\frac{1}{n} \log \beta[T_n] \geq E.

The achievable region ${\cal R}$ is the set of all achievable pairs $(R, E)$.

For a given communication rate RR, the maximum exponent is denoted by

E(R) := \max\{ E : (R, E) \in {\cal R} \}.

When there is no communication constraint, i.e., $R \geq \log|{\cal X}|$, the sender can send the full observation without any encoding, and the receiver can use a scheme for standard (centralized) hypothesis testing. In such a case, the Stein exponent $D(P_{XY}\|Q_{XY})$ is attainable. We define the critical rate as the minimum communication rate required to attain the Stein exponent:

R_{\mathtt{cr}} := \inf\{ R : E(R) = D(P_{XY}\|Q_{XY}) \}.
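As a numerical sanity check (our own illustration, not part of the paper; the 2x2 joint distributions below correspond to the BSDS of Section IV with $p = 0.1$ and $q = 0.3$), the Stein exponent $D(P_{XY}\|Q_{XY})$ can be computed directly:

```python
import math

def kl_divergence(P, Q):
    """Kullback-Leibler divergence D(P||Q) in nats; P and Q are flat
    probability lists with matching supports."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

# Joint distributions on a 2x2 alphabet, flattened row-wise.
P_XY = [0.45, 0.05, 0.05, 0.45]   # BSDS with p = 0.1
Q_XY = [0.35, 0.15, 0.15, 0.35]   # BSDS with q = 0.3

stein_exponent = kl_divergence(P_XY, Q_XY)
```

For doubly symmetric sources this reduces to the binary divergence $D(p\|q)$, consistent with Section IV.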

III On Evaluation of SHA Scheme

When the communication rate $R$ is not sufficient to send $X$, the standard approach is to quantize $X$ using a test channel $P_{U|X}$ for some finite auxiliary alphabet ${\cal U}$. For $P_{UXY} = P_{U|X} P_{XY}$ and $Q_{UXY} = Q_{U|X} Q_{XY}$ with $Q_{U|X} = P_{U|X}$, let

E(P_{UXY}\|Q_{UXY}) := \min\big\{ D(P_{\tilde U\tilde X\tilde Y}\|Q_{UXY}) : P_{\tilde U\tilde X} = P_{UX},\ P_{\tilde U\tilde Y} = P_{UY} \big\}.

In [3], a testing scheme based on quantization was proposed, and the following achievability bound was derived: for any test channel $P_{U|X}$,

E(R) \geq E(P_{UXY}\|Q_{UXY}).

In [25], a testing scheme based on quantization and binning was proposed, and the following achievability bound was derived: for any test channel $P_{U|X}$ such that $R \geq I(U \wedge X|Y)$,

E(R) \geq \min\Big[ \min_{P_{\tilde U\tilde X\tilde Y} \in {\cal P}_{\mathtt{b}}(P_{U|X})} D(P_{\tilde U\tilde X\tilde Y}\|Q_{UXY}) + |R - I(U \wedge X|Y)|^{+},\ E(P_{UXY}\|Q_{UXY}) \Big],

where $|t|^{+} = \max[t, 0]$, and

{\cal P}_{\mathtt{b}}(P_{U|X}) := \big\{ P_{\tilde U\tilde X\tilde Y} : P_{\tilde U\tilde X} = P_{UX},\ P_{\tilde Y} = P_Y,\ H(\tilde U|\tilde Y) \geq H(U|Y) \big\}.

Particularly, by taking $P_{U|X}$ to be the noiseless test channel, i.e., $U = X$, we have

E(R) \geq E_{\mathtt{b}}(R) := \min\Big[ \min_{P_{\tilde X\tilde Y} \in {\cal P}_{\mathtt{b}}} D(P_{\tilde X\tilde Y}\|Q_{XY}) + |R - H(X|Y)|^{+},\ D(P_{XY}\|Q_{XY}) \Big], \quad (1)

where

{\cal P}_{\mathtt{b}} := \big\{ P_{\tilde X\tilde Y} : P_{\tilde X} = P_X,\ P_{\tilde Y} = P_Y,\ H(\tilde X|\tilde Y) \geq H(X|Y) \big\}.

In the following, we focus on the critical rate. Under some mild conditions, in order to attain the Stein exponent, we must take $U = X$ in the quantization-binning bound, i.e., no quantization. In the rest of this section, we prove this claim.

Suppose that $P_{XY}$ and $Q_{XY}$ have full support. Let $\hat\Lambda$ be a function on ${\cal X} \times {\cal Y}$ defined by (we suppose that the alphabets are ${\cal X} = \{0, \ldots, m_{\mathsf{x}}\}$ and ${\cal Y} = \{0, \ldots, m_{\mathsf{y}}\}$; taking $y = 0$ as a special symbol in (3) is just a notational convenience)

\hat\Lambda(x, y) := \log\frac{P_{XY}(x, y)}{Q_{XY}(x, y)} - \log\frac{P_{XY}(x, 0)}{Q_{XY}(x, 0)}. \quad (3)
Proposition 2

Suppose that, for every $x \neq x'$, the rows $\hat\Lambda(x, \cdot)$ and $\hat\Lambda(x', \cdot)$ are distinct. Then, $E(P_{UXY}\|Q_{UXY}) = D(P_{XY}\|Q_{XY})$ only if there do not exist $u \in {\cal U}$ and $x \neq x'$ such that $P_{U|X}(u|x) P_{U|X}(u|x') > 0$.
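The assumption of Proposition 2 is easy to check numerically. The following sketch (our own illustration; the function names are not from the paper) computes $\hat\Lambda$ from joint distributions given as nested lists and tests whether all rows are distinct:

```python
import math

def lambda_hat(P, Q):
    """Matrix Lambda_hat(x, y) = log(P(x,y)/Q(x,y)) - log(P(x,0)/Q(x,0));
    P and Q are joint distributions as nested lists, rows indexed by x."""
    return [[math.log(P[x][y] / Q[x][y]) - math.log(P[x][0] / Q[x][0])
             for y in range(len(P[x]))] for x in range(len(P))]

def rows_distinct(L, tol=1e-9):
    """Assumption of Proposition 2: every pair of rows of Lambda_hat differs."""
    for i in range(len(L)):
        for j in range(i + 1, len(L)):
            if all(abs(a - b) < tol for a, b in zip(L[i], L[j])):
                return False
    return True

# Example: BSDS with p = 0.1, q = 0.3 satisfies the assumption.
L = lambda_hat([[0.45, 0.05], [0.05, 0.45]], [[0.35, 0.15], [0.15, 0.35]])
```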

Proof:

Since we assumed that $P_{XY}$ and $Q_{XY}$ have full support, and since $P_{U|X} = Q_{U|X}$, the distributions $P_{UXY}$ and $Q_{UXY}$ have the same support, which we denote by ${\cal A}$. Let ${\cal A}' = \mathtt{supp}(P_{UX}) = \mathtt{supp}(Q_{UX})$. Note that $P_{UY}$ and $Q_{UY}$ have full support. Let ${\cal L} \subseteq {\cal P}({\cal A})$ be the linear family of distributions $P_{\tilde U\tilde X\tilde Y}$ satisfying

\sum_{(u,x,y)\in{\cal A}} P_{\tilde U\tilde X\tilde Y}(u,x,y)\,\delta_{\bar u\bar x}(u,x) = P_{UX}(\bar u,\bar x)

for every $(\bar u, \bar x) \in {\cal A}'$, and

\sum_{(u,x,y)\in{\cal A}} P_{\tilde U\tilde X\tilde Y}(u,x,y)\,\delta_{\bar u\bar y}(u,y) = P_{UY}(\bar u,\bar y)

for every $(\bar u, \bar y) \in {\cal U} \times {\cal Y}$. Then, we can write $E(P_{UXY}\|Q_{UXY})$ as

E(P_{UXY}\|Q_{UXY}) = \min_{P_{\tilde U\tilde X\tilde Y}\in{\cal L}} D(P_{\tilde U\tilde X\tilde Y}\|Q_{UXY}). \quad (4)

If

E(P_{UXY}\|Q_{UXY}) = D(P_{XY}\|Q_{XY}) = D(P_{UXY}\|Q_{UXY}),

then $P_{UXY}$ must be the (unique) minimizer in (4). Then, the I-projection theorem (cf. [27, Theorem 3.2]) implies that the minimizer $P_{UXY}$ lies in the exponential family generated by $Q_{UXY}$ and the constraints of ${\cal L}$, i.e., it has the form

P_{UXY}(u,x,y) \propto Q_{UXY}(u,x,y) \exp\Big[ \sum_{\bar u,\bar x} \theta_{\bar u\bar x} \delta_{\bar u\bar x}(u,x) + \sum_{\bar u,\bar y} \theta_{\bar u\bar y} \delta_{\bar u\bar y}(u,y) \Big]

for some $\theta_{\bar u\bar x}, \theta_{\bar u\bar y} \in \mathbb{R}$. Equivalently, this condition can be written as

\log\frac{P_{UXY}(u,x,y)}{Q_{UXY}(u,x,y)} = g_1(u,x) + g_2(u,y)

for some functions $g_1$ on ${\cal A}'$ and $g_2$ on ${\cal U} \times {\cal Y}$. If there exist $u \in {\cal U}$ and $x \neq x'$ such that $P_{U|X}(u|x) P_{U|X}(u|x') > 0$, then, for every $y \in {\cal Y}$, we have

\hat\Lambda(x,y) - \hat\Lambda(x',y)
= \Big[ \log\frac{P_{XY}(x,y)}{Q_{XY}(x,y)} - \log\frac{P_{XY}(x,0)}{Q_{XY}(x,0)} \Big] - \Big[ \log\frac{P_{XY}(x',y)}{Q_{XY}(x',y)} - \log\frac{P_{XY}(x',0)}{Q_{XY}(x',0)} \Big]
= \Big[ \log\frac{P_{UXY}(u,x,y)}{Q_{UXY}(u,x,y)} - \log\frac{P_{UXY}(u,x,0)}{Q_{UXY}(u,x,0)} \Big] - \Big[ \log\frac{P_{UXY}(u,x',y)}{Q_{UXY}(u,x',y)} - \log\frac{P_{UXY}(u,x',0)}{Q_{UXY}(u,x',0)} \Big]
= \big[ g_1(u,x) + g_2(u,y) - g_1(u,x) - g_2(u,0) \big] - \big[ g_1(u,x') + g_2(u,y) - g_1(u,x') - g_2(u,0) \big]
= 0,

which contradicts the assumption that $\hat\Lambda(x,\cdot)$ and $\hat\Lambda(x',\cdot)$ are distinct. ∎

Since the quantization-binning bound is at most $E(P_{UXY}\|Q_{UXY})$, when the assumption of Proposition 2 is satisfied, $U = X$ is the only choice of test channel to attain the Stein exponent. (Even though we can take a redundant test channel $P_{U|X}$ such that $\mathtt{supp}(P_{U|X}(\cdot|x)) \cap \mathtt{supp}(P_{U|X}(\cdot|x')) = \emptyset$ for any $x \neq x'$, such a test channel does not improve the attainable exponent.)

In Proposition 2, the assumption that all rows of $\hat\Lambda(x, y)$ are distinct cannot be replaced by the assumption that all rows of the log-likelihood function

\Lambda(x,y) = \log\frac{P_{XY}(x,y)}{Q_{XY}(x,y)}

are distinct; for instance, when $P_{XY}$ and $Q_{XY}$ are given by

P_{XY} = \frac{1}{24a} \begin{bmatrix} a & 2a & 3a \\ 2a & 3a & 3a \\ a & 3a & 6a \end{bmatrix}, \quad
Q_{XY} = \frac{1}{14a} \begin{bmatrix} 2a & a & a \\ a & a & a \\ a & 2a & 4a \end{bmatrix}

for some $a > 0$, then all rows $\Lambda(x, \cdot)$ are distinct, but $\hat\Lambda(1, \cdot) = \hat\Lambda(2, \cdot)$. In fact, as shown in the following Proposition 3, the symbols $x = 1$ and $x = 2$ in this example can be merged while maintaining $E(P_{UXY}\|Q_{UXY}) = D(P_{XY}\|Q_{XY})$. Even though Proposition 3 is not used in the rest of the paper, it may be of independent interest.
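The claimed properties of this example can be verified numerically (the common factor $a$ cancels in every likelihood ratio, so it is dropped below):

```python
import math

# The example above (the common factor a cancels in every ratio).
P = [[1/24, 2/24, 3/24], [2/24, 3/24, 3/24], [1/24, 3/24, 6/24]]
Q = [[2/14, 1/14, 1/14], [1/14, 1/14, 1/14], [1/14, 2/14, 4/14]]

Lam = [[math.log(P[x][y] / Q[x][y]) for y in range(3)] for x in range(3)]
Lam_hat = [[Lam[x][y] - Lam[x][0] for y in range(3)] for x in range(3)]

# All rows of Lambda are distinct ...
lam_rows_distinct = all(
    any(abs(Lam[i][y] - Lam[j][y]) > 1e-9 for y in range(3))
    for i in range(3) for j in range(3) if i < j)
# ... but rows 1 and 2 of Lambda_hat coincide, so x = 1, 2 can be merged.
rows_1_2_coincide = all(abs(Lam_hat[1][y] - Lam_hat[2][y]) < 1e-9 for y in range(3))
```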

Proposition 3

Let $\kappa : {\cal X} \to {\cal U}$ be a function such that $\kappa(x) = \kappa(x')$ if and only if $\hat\Lambda(x, \cdot) = \hat\Lambda(x', \cdot)$, and let $U = \kappa(X)$. Then, for any $P_{\tilde U\tilde X\tilde Y}$ satisfying $P_{\tilde U\tilde X} = P_{UX}$ and $P_{\tilde U\tilde Y} = P_{UY}$, we have

D(P_{\tilde U\tilde X\tilde Y}\|Q_{UXY}) = D(P_{\tilde U\tilde X\tilde Y}\|P_{UXY}) + D(P_{UXY}\|Q_{UXY}). \quad (5)

Particularly, we have

E(P_{UXY}\|Q_{UXY}) = D(P_{XY}\|Q_{XY}). \quad (6)
Proof:

Since (6) follows from (5), it suffices to prove (5). For each $u$, fix $x_u \in {\cal X}$ such that $\kappa(x_u) = u$. Then, we have

D(P_{\tilde U\tilde X\tilde Y}\|Q_{UXY}) - D(P_{\tilde U\tilde X\tilde Y}\|P_{UXY}) - D(P_{UXY}\|Q_{UXY})
= \sum_{u,x,y} \big( P_{\tilde U\tilde X\tilde Y}(u,x,y) - P_{UXY}(u,x,y) \big) \log\frac{P_{UXY}(u,x,y)}{Q_{UXY}(u,x,y)}
= \sum_{u,x,y} \big( P_{\tilde U\tilde X\tilde Y}(u,x,y) - P_{UXY}(u,x,y) \big) \log\frac{P_{XY}(x,y)}{Q_{XY}(x,y)}
= \sum_{u,x,y} \big( P_{\tilde U\tilde X\tilde Y}(u,x,y) - P_{UXY}(u,x,y) \big) \hat\Lambda(x,y)
= \sum_{u,x,y} \big( P_{\tilde U\tilde X\tilde Y}(u,x,y) - P_{UXY}(u,x,y) \big) \hat\Lambda(x_u,y)
= \sum_{u,y} \big( P_{\tilde U\tilde Y}(u,y) - P_{UY}(u,y) \big) \hat\Lambda(x_u,y)
= 0,

where the third equality follows from $P_{\tilde X} = P_X$ and the last equality follows from $P_{\tilde U\tilde Y} = P_{UY}$. ∎

Remark 4

Introducing the function $\hat\Lambda(x, y)$ as in (3) is motivated by the problem of distributed function computing. Note that, in order to conduct the likelihood ratio test, the receiver need not compute the individual log-likelihood ratios $(\Lambda(x_i, y_i) : 1 \leq i \leq n)$ for a given observation $(\bm{x}, \bm{y}) \in {\cal X}^n \times {\cal Y}^n$; instead, it suffices to compute the summation

\Lambda_n(\bm{x}, \bm{y}) = \sum_{i=1}^{n} \Lambda(x_i, y_i).

In the terminology of distributed computing [28, Definition 4], we can verify that the function $\Lambda_n$ is $\overline{{\cal X}}_{\hat\Lambda}$-informative.
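For illustration (our own sketch; the table `Lam` and the sample sequences are hypothetical), the receiver's statistic above amounts to a table lookup followed by a single summation:

```python
import math

def lambda_n(xs, ys, Lam):
    """Receiver statistic Lambda_n(x, y) = sum_i Lambda(x_i, y_i) via table lookup."""
    return sum(Lam[x][y] for x, y in zip(xs, ys))

# Log-likelihood ratio table Lambda(x, y) for a BSDS with p = 0.1, q = 0.3.
p, q = 0.1, 0.3
Lam = [[math.log((1 - p) / (1 - q)), math.log(p / q)],
       [math.log(p / q), math.log((1 - p) / (1 - q))]]

score = lambda_n([0, 1, 1, 0], [0, 1, 0, 1], Lam)
```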

IV Binary Symmetric Double Source (BSDS)

Let ${\cal X} = {\cal Y} = \{0, 1\}$, and let $P_{XY}$ and $Q_{XY}$ be given by

P_{XY} = \begin{bmatrix} \frac{1-p}{2} & \frac{p}{2} \\ \frac{p}{2} & \frac{1-p}{2} \end{bmatrix}, \quad
Q_{XY} = \begin{bmatrix} \frac{1-q}{2} & \frac{q}{2} \\ \frac{q}{2} & \frac{1-q}{2} \end{bmatrix}

for $0 \leq p, q \leq 1$ and $p \neq q$. Since $\hat\Lambda(0,0) = \hat\Lambda(1,0) = 0$ and

\hat\Lambda(0,1) = \log\frac{(1-q)p}{(1-p)q} \neq \log\frac{(1-p)q}{(1-q)p} = \hat\Lambda(1,1),

the assumption of Proposition 2 is satisfied. Thus, we need to set $U = X$ in the quantization-binning bound to attain the Stein exponent. In the minimization of (1), since $P_{\tilde X} = P_X$ and $P_{\tilde Y} = P_Y$ are uniform distributions, the only free parameter is the crossover probability $P_{\tilde Y|\tilde X}(1|0) = P_{\tilde Y|\tilde X}(0|1) = \tilde p$. Then, we can easily find that the binning bound for the BSDS is

E_{\mathtt{b}}(R) = \begin{cases} \min[\,|R - h(p)|^{+},\ D(p\|q)\,] & \text{if } h(p) \leq h(q) \\ D(p\|q) & \text{if } h(p) > h(q) \end{cases} \quad (9)

for $R \geq h(p)$. Furthermore, this bound implies that the critical rate attainable by the SHA scheme satisfies

R_{\mathtt{cr}} \leq \begin{cases} h(p) + D(p\|q) & \text{if } h(p) \leq h(q) \\ h(p) & \text{if } h(p) > h(q) \end{cases} \quad (12)
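The bounds (9) and (12) are straightforward to evaluate; the following is a minimal sketch in natural logarithms (the parameter values in the accompanying checks are our own illustrative choices):

```python
import math

def h(p):
    """Binary entropy in nats."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def d(p, q):
    """Binary divergence D(p||q) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def binning_exponent(R, p, q):
    """E_b(R) of (9) for the BSDS; valid for R >= h(p)."""
    return min(max(R - h(p), 0.0), d(p, q)) if h(p) <= h(q) else d(p, q)

def critical_rate_bound(p, q):
    """Upper bound (12) on the SHA critical rate for the BSDS."""
    return h(p) + d(p, q) if h(p) <= h(q) else h(p)
```

At the rate given by (12), the exponent (9) indeed reaches the Stein exponent $D(p\|q)$.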

V Product of BSDS

Next, let us consider the case where $P_{XY} = P_{X_1Y_1} \times P_{X_2Y_2}$ and $Q_{XY} = Q_{X_1Y_1} \times Q_{X_2Y_2}$, and $P_{X_iY_i}$ and $Q_{X_iY_i}$ for $i = 1, 2$ are BSDSs with parameters $0 \leq p_1, p_2, q_1, q_2 \leq 1$, respectively. Particularly, we consider the case where $p_1 = q_2$ and $p_2 = q_1$, with $p_1 \neq q_1$. Note that, for the product source, the function defined by (3) can be decomposed as

\hat\Lambda(x_1 x_2, y_1 y_2) = \hat\Lambda_1(x_1, y_1) + \hat\Lambda_2(x_2, y_2),

where

\hat\Lambda_i(x_i, y_i) = \log\frac{P_{X_iY_i}(x_i, y_i)}{Q_{X_iY_i}(x_i, y_i)} - \log\frac{P_{X_iY_i}(x_i, 0)}{Q_{X_iY_i}(x_i, 0)}.

Furthermore, we have $\hat\Lambda_1(1,1) = -\hat\Lambda_1(0,1)$, $\hat\Lambda_2(0,1) = -\hat\Lambda_1(0,1)$, and $\hat\Lambda_2(1,1) = \hat\Lambda_1(0,1)$. Thus, by denoting $a = \hat\Lambda_1(0,1)$, we have

\hat\Lambda(x_1 x_2, y_1 y_2) = \begin{bmatrix} 0 & -a & a & 0 \\ 0 & a & a & 2a \\ 0 & -a & -a & -2a \\ 0 & a & -a & 0 \end{bmatrix},

and the assumption of Proposition 2 is satisfied. Thus, we need to set $U = X$ in the quantization-binning bound to attain the Stein exponent.
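The decomposition and the matrix above can be checked numerically; in the following sketch (our own, with illustrative parameters $p_1 = 0.1$, $q_1 = 0.3$), the second component is the "reversely aligned" BSDS:

```python
import math

def lam_hat_bsds(p, q):
    """Lambda_hat for one BSDS component; rows x in {0,1}, columns y in {0,1}."""
    P = [[(1 - p) / 2, p / 2], [p / 2, (1 - p) / 2]]
    Q = [[(1 - q) / 2, q / 2], [q / 2, (1 - q) / 2]]
    return [[math.log(P[x][y] / Q[x][y]) - math.log(P[x][0] / Q[x][0])
             for y in range(2)] for x in range(2)]

p1, q1 = 0.1, 0.3                 # component 1
L1 = lam_hat_bsds(p1, q1)
L2 = lam_hat_bsds(q1, p1)         # component 2 is "reversely aligned"

# 4x4 matrix Lambda_hat(x1 x2, y1 y2) = Lambda_hat_1 + Lambda_hat_2,
# rows and columns ordered 00, 01, 10, 11.
L = [[L1[x1][y1] + L2[x2][y2] for y1 in range(2) for y2 in range(2)]
     for x1 in range(2) for x2 in range(2)]
a = L1[0][1]
```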

Since $p_1 = q_2$ and $p_2 = q_1$, the conditional entropies $H_P(X|Y)$ and $H_Q(X|Y)$ with respect to $P_{XY}$ and $Q_{XY}$ satisfy $H_P(X|Y) = H_Q(X|Y)$. Thus, we find that the inner minimization of the binning bound is attained by $P_{\tilde X\tilde Y} = Q_{XY}$, and

E_{\mathtt{b}}(R) = \min\big[ |R - h(p_1) - h(p_2)|^{+},\ D(p_1\|q_1) + D(p_2\|q_2) \big].

Furthermore, this bound implies that the critical rate attainable by the SHA scheme satisfies

R_{\mathtt{cr}} \leq h(p_1) + h(p_2) + D(p_1\|q_1) + D(p_2\|q_2). \quad (13)

VI Sequential Scheme for Product of BSDS

Now, we describe a modified version of the SHA scheme for the product of BSDSs. For $i = 1, 2$, let $T_{\mathtt{b}}^{[i]} = (\varphi^{[i]}, \psi^{[i]})$ be the SHA scheme (without quantization) for $P_{X_iY_i}$ versus $Q_{X_iY_i}$. For a given rate $R$, we split the rate as $R = R_1 + R_2$, and consider the sequential scheme $\tilde T_{\mathtt{b}} = (\varphi, \psi)$ constructed from $(T_{\mathtt{b}}^{[1]}, T_{\mathtt{b}}^{[2]})$ as follows. Upon observing $\bm{x} = (\bm{x}_1, \bm{x}_2)$, the sender transmits $m_1 = \varphi^{[1]}(\bm{x}_1)$ and $m_2 = \varphi^{[2]}(\bm{x}_2)$. (Note that $m_1$ and $m_2$ include the indices of random binning as well as the marginal types of $\bm{x}_1$ and $\bm{x}_2$.) Then, upon receiving $(m_1, m_2)$ and observing $\bm{y} = (\bm{y}_1, \bm{y}_2)$, the receiver decides $\psi(m_1, m_2, \bm{y}) = \mathtt{H}_0$ if $\psi^{[1]}(m_1, \bm{y}_1) = \mathtt{H}_0$ and $\psi^{[2]}(m_2, \bm{y}_2) = \mathtt{H}_0$. The type I and type II error probabilities of this sequential scheme $\tilde T_{\mathtt{b}}$ can be evaluated as

\alpha[\tilde T_{\mathtt{b}}] = P\Big( \psi^{[1]}(\varphi^{[1]}(X_1^n), Y_1^n) = \mathtt{H}_1 \ \vee\ \psi^{[2]}(\varphi^{[2]}(X_2^n), Y_2^n) = \mathtt{H}_1 \Big) \leq \sum_{i=1}^{2} P\Big( \psi^{[i]}(\varphi^{[i]}(X_i^n), Y_i^n) = \mathtt{H}_1 \Big)

by the union bound, and

\beta[\tilde T_{\mathtt{b}}] = Q\Big( \psi^{[1]}(\varphi^{[1]}(X_1^n), Y_1^n) = \mathtt{H}_0,\ \psi^{[2]}(\varphi^{[2]}(X_2^n), Y_2^n) = \mathtt{H}_0 \Big) = \prod_{i=1}^{2} Q\Big( \psi^{[i]}(\varphi^{[i]}(X_i^n), Y_i^n) = \mathtt{H}_0 \Big).

Thus, if the type I error probability of each component scheme is vanishing, then the type I error probability $\alpha[\tilde T_{\mathtt{b}}]$ is also vanishing. On the other hand, the type II exponent of the sequential scheme is the summation of the type II exponents of the two component schemes.

As in Section V, suppose that $p_1 = q_2$ and $p_2 = q_1$. Furthermore, suppose that $h(q_1) < h(p_1)$. Then, from the above argument and (9), we find that the following exponent is attainable by the sequential scheme:

\tilde E_{\mathtt{b}}(R) = D(p_1\|q_1) + \min[\,|R_2 - h(p_2)|^{+},\ D(p_2\|q_2)\,] \quad (14)

for $R_1 \geq h(p_1)$ and $R_2 \geq h(p_2)$. By setting $R_1 = h(p_1)$ and $R_2 = h(p_2) + D(p_2\|q_2)$, we can derive the following bound on the critical rate:

R_{\mathtt{cr}} \leq h(p_1) + h(p_2) + D(p_2\|q_2). \quad (15)

Interestingly, the bound on the critical rate in (15) improves upon the bound in (13) by $D(p_1\|q_1) > 0$.
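The gap between the two bounds is exactly $D(p_1\|q_1)$; a minimal numerical check (our own illustrative parameters, chosen so that $h(q_1) < h(p_1)$ as required in this section):

```python
import math

def h(p):
    """Binary entropy in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def d(p, q):
    """Binary divergence D(p||q) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# "Reversely aligned" product of BSDSs with h(q1) < h(p1).
p1, q1 = 0.3, 0.1
p2, q2 = q1, p1

bound_sha = h(p1) + h(p2) + d(p1, q1) + d(p2, q2)   # bound (13), SHA scheme
bound_seq = h(p1) + h(p2) + d(p2, q2)               # bound (15), sequential scheme
gap = bound_sha - bound_seq                          # equals D(p1||q1) > 0
```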

From an operational point of view, this improvement can be explained as follows. In the SHA scheme (without quantization), we use the minimum entropy decoder to compute an estimate $\hat{\bm{x}}$ of $X^n = \bm{x}$; then, if the joint type $\mathsf{t}_{\hat{\bm{x}}\bm{y}}$ is close to $P_{XY}$, we accept the null hypothesis. When the empirical conditional entropy $H(\bm{x}|\bm{y})$ satisfies $H_P(X|Y) \gnsim H(\bm{x}|\bm{y})$, even if there exists $\hat{\bm{x}}$ satisfying $H(\bm{x}|\bm{y}) \geq H(\hat{\bm{x}}|\bm{y})$, the type II testing error does not occur, though the receiver may erroneously compute $\hat{\bm{x}} \neq \bm{x}$. (Note that $H_P(X|Y) \gnsim H(\bm{x}|\bm{y}) \geq H(\hat{\bm{x}}|\bm{y})$ implies that $\mathsf{t}_{\hat{\bm{x}}\bm{y}}$ is not close to $P_{XY}$.) Consequently, the type II testing error may occur only if $(X^n, Y^n) = (\bm{x}, \bm{y})$ satisfies $H(\bm{x}|\bm{y}) \gtrsim H_P(X|Y)$.

For the product of BSDSs, when the types of the modulo sums $\bm{x}_1 + \bm{y}_1$ and $\bm{x}_2 + \bm{y}_2$ satisfy $H(\mathsf{t}_{\bm{x}_1+\bm{y}_1}) \simeq h(\tau_1)$ and $H(\mathsf{t}_{\bm{x}_2+\bm{y}_2}) \simeq h(\tau_2)$, the type II testing error may occur only if $h(\tau_1) + h(\tau_2) \gtrsim h(p_1) + h(p_2)$; see Fig. 1(a). In the modified SHA scheme, $\bm{x}_1$ and $\bm{x}_2$ are binned and estimated separately. Because of this, the constraint on the type II testing error event becomes more stringent, i.e., $h(\tau_1) \gtrsim h(p_1)$ and $h(\tau_2) \gtrsim h(p_2)$; see Fig. 1(b).

Figure 1: The meshed area describes the range of $h(\tau_1)$ and $h(\tau_2)$ such that the type II testing error may occur when $H(\mathsf{t}_{\bm{x}_1+\bm{y}_1}) \simeq h(\tau_1)$ and $H(\mathsf{t}_{\bm{x}_2+\bm{y}_2}) \simeq h(\tau_2)$, for (a) the SHA scheme and (b) the modified SHA scheme.

VII Discussion

In this paper, we have developed tools to evaluate the critical rate attainable by the SHA scheme, and shown by example that the SHA scheme is sub-optimal for a product of BSDSs. A future problem is to generalize the idea of the sequential scheme in Section VI and develop a new achievability scheme for distributed hypothesis testing. Another future problem is the potential utility of the scheme in [17]; as mentioned in Section I, it has the potential to improve upon the SHA scheme, but a strict improvement is not clear since the expression involves a complicated optimization over multiple parameters.

Lastly, it should be pointed out that certain kinds of "structured" random coding are known to improve upon naive random coding in the context of the mismatched decoding problem; e.g., see [29, 30] and references therein. Even though there is no direct connection between the two problems, it might be interesting to pursue a mathematical connection between them.

Acknowledgment

The author appreciates Marat Burnashev for valuable discussions at an early stage of the work. The author also thanks Te Sun Han and Yasutada Oohama for fruitful discussions.

References

  • [1] T. Berger, “Decentralized estimation and decision theory,” in Presented at the IEEE 7th Spring Workshop on Information Theory, Mt. Kisco, NY, September 1979.
  • [2] R. Ahlswede and I. Csiszár, “Hypothesis testing with communication constraints,” IEEE Trans. Inform. Theory, vol. 32, no. 4, pp. 533–542, July 1986.
  • [3] T. S. Han, “Hypothesis testing with multiterminal data compression,” IEEE Trans. Inform. Theory, vol. 33, no. 6, pp. 759–772, November 1987.
  • [4] Z. Zhang and T. Berger, “Estimation via compressed information,” IEEE Trans. Inform. Theory, vol. 34, no. 2, pp. 198–211, March 1988.
  • [5] S. Amari and T. S. Han, “Statistical inference under multiterminal rate restrictions: A differential geometric approach,” IEEE Trans. Inform. Theory, vol. 35, no. 2, pp. 217–227, March 1989.
  • [6] S. Amari, “Fisher information under restriction of Shannon information in multi-terminal situations,” Annals of the Institute of Statistical Mathematics, vol. 41, no. 4, pp. 623–648, 1989.
  • [7] T. S. Han and K. Kobayashi, “Exponential-type error probabilities for multiterminal hypothesis testing,” IEEE Trans. Inform. Theory, vol. 35, no. 1, pp. 2–14, January 1989.
  • [8] H. M. H. Shalaby and A. Papamarcou, “Multiterminal detection with zero-rate data compression,” IEEE Trans. Inform. Theory, vol. 38, no. 2, pp. 254–267, March 1992.
  • [9] T. S. Han and S. Amari, “Statistical inference under multiterminal data compression,” IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2300–2324, October 1998.
  • [10] S. Amari, “On optimal data compression in multiterminal statistical inference,” IEEE Trans. Inform. Theory, vol. 57, no. 9, pp. 5577–5587, September 2011.
  • [11] M. Ueta and S. Kuzuoka, “The error exponent of zero-rate multiterminal hypothesis testing for sources with common information,” in Proc. ISITA 2014, Melbourne, Australia, 2014, pp. 559–563.
  • [12] Y. Polyanskiy, “Hypothesis testing via a comparator,” in Proc. IEEE Int. Symp. Inf. Theory 2012, Cambridge, MA, 2012, pp. 2206–2210.
  • [13] M. S. Rahman and A. B. Wagner, “On the optimality of binning for distributed hypothesis testing,” IEEE Trans. Inform. Theory, vol. 58, no. 10, pp. 6282–6303, October 2012.
  • [14] Y. Xiang and Y.-H. Kim, “Interactive hypothesis testing against independence,” in IEEE International Symposium on Information Theory, 2013, pp. 2840–2844.
  • [15] W. Zhao and L. Lai, “Distributed testing with cascaded encoders,” IEEE Trans. Inform. Theory, vol. 64, no. 11, pp. 7339–7348, November 2018.
  • [16] S. Watanabe, “Neyman-Pearson test for zero-rate multiterminal hypothesis testing,” IEEE Trans. Inform. Theory, vol. 64, no. 7, pp. 4923–4939, July 2018.
  • [17] N. Weinberger and Y. Kochman, “On the reliability function of distributed hypothesis testing under optimal detection,” IEEE Trans. Inform. Theory, vol. 65, no. 8, pp. 4940–4965, August 2019.
  • [18] S. Salehkalaibar, M. Wigger, and L. Wang, “Hypothesis testing over the two-hop relay network,” IEEE Trans. Inform. Theory, vol. 65, no. 7, pp. 4411–4433, July 2019.
  • [19] U. Hadar, J. Liu, Y. Polyanskiy, and O. Shayevitz, “Communication complexity of estimating correlations,” in Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC ’19).   ACM Press, 2019, pp. 792–803.
  • [20] U. Hadar and O. Shayevitz, “Distributed estimation of Gaussian correlations,” IEEE Trans. Inform. Theory, vol. 65, no. 9, pp. 5323–5338, September 2019.
  • [21] P. Escamilla, M. Wigger, and A. Zaidi, “Distributed hypothesis testing: Cooperation and concurrent detection,” IEEE Trans. Inform. Theory, vol. 66, no. 12, pp. 7550–7564, December 2020.
  • [22] M. V. Burnashev, “New upper bounds in the hypothesis testing problem with information constraints,” Problems of Information Transmission, vol. 56, no. 2, pp. 64–81, 2020.
  • [23] S. Sreekumar and D. Gündüz, “Distributed hypothesis testing over discrete memoryless channels,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2044–2066, April 2020.
  • [24] K. R. Sahasranand and H. Tyagi, “Communication complexity of distributed high dimensional correlation testing,” IEEE Trans. Inform. Theory, vol. 67, no. 9, pp. 6082–6095, September 2021.
  • [25] H. Shimokawa, T. S. Han, and S. Amari, “Error bound of hypothesis testing with data compression,” in Proceedings of 1994 IEEE International Symposium on Information Theory, 1994, p. 114.
  • [26] E. Haim and Y. Kochman, “Binary distributed hypothesis testing via Körner-Marton coding,” in 2016 IEEE Information Theory Workshop, 2016.
  • [27] I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial.   Now Publishers Inc., 2004.
  • [28] S. Kuzuoka and S. Watanabe, “On distributed computing for functions with certain structures,” IEEE Trans. Inform. Theory, vol. 63, no. 11, pp. 7003–7017, November 2017.
  • [29] A. Lapidoth, “Mismatched decoding and the multiple-access channel,” IEEE Trans. Inform. Theory, vol. 42, no. 5, pp. 1439–1452, September 1996.
  • [30] J. Scarlett, A. Somekh-Baruch, A. Martinez, and A. G. i Fábregas, “A counter-example to the mismatched decoding converse for binary-input discrete memoryless channels,” IEEE Trans. Inform. Theory, vol. 61, no. 10, pp. 5387–5395, October 2015.