
Robust Anytime-Valid Sequential Probability Ratio Test

1 Method

Define the distributions $Q_{i,\epsilon}$, $i=0,1$, by their densities $q_{i,\epsilon}$ as follows:

\[
q_{0,\epsilon}(x)=
\begin{cases}
(1-\epsilon)\,p_{0}(x) & \text{for } p_{1}(x)/p_{0}(x)<c''\\[2pt]
(1/c'')\,(1-\epsilon)\,p_{1}(x) & \text{for } p_{1}(x)/p_{0}(x)\geq c''
\end{cases}
\]
\[
q_{1,\epsilon}(x)=
\begin{cases}
(1-\epsilon)\,p_{1}(x) & \text{for } p_{1}(x)/p_{0}(x)>c'\\[2pt]
c'\,(1-\epsilon)\,p_{0}(x) & \text{for } p_{1}(x)/p_{0}(x)\leq c'.
\end{cases}
\]

The numbers $0\leq c'<c''\leq\infty$ have to be determined such that $q_{0,\epsilon}$ and $q_{1,\epsilon}$ are probability densities, i.e.,

\[
(1-\epsilon)\left\{P_{0}\!\left[p_{1}/p_{0}<c''\right]+(c'')^{-1}P_{1}\!\left[p_{1}/p_{0}\geq c''\right]\right\}=1 \tag{1}
\]
\[
(1-\epsilon)\left\{P_{1}\!\left[p_{1}/p_{0}>c'\right]+c'\,P_{0}\!\left[p_{1}/p_{0}\leq c'\right]\right\}=1 \tag{2}
\]

Then, Huber (1965) proved that such $c',c''$ exist and that $Q_{i,\epsilon}\in\mathcal{P}_{i}$, $i=0,1$, are “least favorable” distributions.

Note that $\frac{q_{1,\epsilon}(x)}{q_{0,\epsilon}(x)}=\max\left\{c',\min\left\{c'',\frac{p_{1}(x)}{p_{0}(x)}\right\}\right\}$ is a truncation of the original likelihood ratio $\frac{p_{1}(x)}{p_{0}(x)}$.

Now we define,

\[
R_{t}=\frac{\prod_{i=1}^{t}\frac{q_{1,\epsilon}(X_{i})}{q_{0,\epsilon}(X_{i})}}{\left(\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon\right)^{t}}. \tag{3}
\]
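For concreteness, the construction in (1)–(3) can be implemented numerically. The following Python sketch (our own illustration, not code from the paper) solves (1) and (2) for $c''$ and $c'$ by root finding, forms the truncated likelihood ratio, and accumulates $\log R_t$ along a data stream. It assumes the Gaussian pair $P_0=N(1,1)$, $P_1=N(0,1)$ used later in the simulations, for which $p_1(x)/p_0(x)=\exp(1/2-x)$; the Monte Carlo estimate of $\mathbb{E}_{P_0}[q_{1,\epsilon}/q_{0,\epsilon}]$ and all function names are our own choices.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Illustrative robust-SPRT ingredients for P0 = N(1,1) vs P1 = N(0,1).
# For this pair, p1(x)/p0(x) = exp(1/2 - x), so {p1/p0 >= c} = {x <= 1/2 - log c},
# which gives closed forms for the probabilities appearing in (1) and (2).

def f_upper(c):
    """P0[p1/p0 < c] + (1/c) P1[p1/p0 >= c]; equation (1) sets this equal to 1/(1-eps)."""
    thr = 0.5 - np.log(c)
    return (1.0 - norm.cdf(thr, loc=1.0)) + norm.cdf(thr, loc=0.0) / c

def g_lower(c):
    """P1[p1/p0 > c] + c * P0[p1/p0 <= c]; equation (2) sets this equal to 1/(1-eps)."""
    thr = 0.5 - np.log(c)
    return norm.cdf(thr, loc=0.0) + c * (1.0 - norm.cdf(thr, loc=1.0))

def huber_thresholds(eps):
    """Solve (1) and (2) for c'' and c' (f_upper is decreasing, g_lower is increasing in c)."""
    target = 1.0 / (1.0 - eps)
    c_hi = brentq(lambda c: f_upper(c) - target, 1e-8, 1e8)  # c''
    c_lo = brentq(lambda c: g_lower(c) - target, 1e-8, 1e8)  # c'
    return c_lo, c_hi

def log_R(x, eps, n_mc=200_000):
    """Running log R_t from (3) along the data stream x."""
    c_lo, c_hi = huber_thresholds(eps)
    trunc_lr = lambda z: np.clip(np.exp(0.5 - z), c_lo, c_hi)  # q_{1,eps}/q_{0,eps}
    # Monte Carlo estimate of E_{P0}[q_{1,eps}/q_{0,eps}] under P0 = N(1,1).
    denom = trunc_lr(np.random.default_rng(0).normal(1.0, 1.0, n_mc)).mean() + (c_hi - c_lo) * eps
    return np.cumsum(np.log(trunc_lr(np.asarray(x))) - np.log(denom))

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(0.0, 1.0, 1000)  # data from P1 = N(0,1)
    print(huber_thresholds(0.01))                            # (c', c'') for eps = 0.01
    print(log_R(data, eps=0.01)[-1] / 1000)                  # empirical growth rate log R_t / t
```

With uncontaminated data from $P_1$, the printed empirical growth rate can be compared against the non-robust SPRT growth $D_{\mathrm{KL}}(P_1,P_0)=1/2$ for this pair, in the spirit of the comparison in Fig. 2.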
Lemma 1.1.

$R_{t}$ is a non-negative supermartingale for $\mathcal{P}_{0}$.

Proof.

We know that the total variation distance is an integral probability metric in the sense that for any pair of real numbers $c_{1}<c_{2}$,

\[
D_{\mathrm{TV}}(P,Q)=\frac{1}{c_{2}-c_{1}}\sup_{c_{1}\leq f\leq c_{2}}\left|\mathbb{E}_{X\sim P}f(X)-\mathbb{E}_{X\sim Q}f(X)\right|. \tag{4}
\]

For any distribution $Q\in\mathcal{P}_{0}$, $D_{\mathrm{TV}}(Q,P_{0})<\epsilon$. Since $q_{1,\epsilon}/q_{0,\epsilon}$ takes values in $[c',c'']$, applying (4) with $c_{1}=c'$, $c_{2}=c''$ and $f=q_{1,\epsilon}/q_{0,\epsilon}$ gives

\[
\mathbb{E}_{Q}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\leq\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon.
\]

Hence, for every $Q\in\mathcal{P}_{0}$,
\[
\mathbb{E}_{Q}\!\left[R_{t}\mid X_{1},\dots,X_{t-1}\right]
=R_{t-1}\,\frac{\mathbb{E}_{Q}\frac{q_{1,\epsilon}(X_{t})}{q_{0,\epsilon}(X_{t})}}{\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon}\leq R_{t-1}.
\]
Therefore, $R_{t}$ is a non-negative supermartingale for $\mathcal{P}_{0}$. ∎

Proposition 1.2.

For $i=0,1$, $Q_{i,\epsilon}\in\mathcal{P}_{i}=\mathbb{B}_{\mathrm{TV}}(P_{i},\epsilon)$.

Proof.

We can rewrite $q_{0,\epsilon}$ as

\[
q_{0,\epsilon}(x)=(1-\epsilon)\,p_{0}(x)+\epsilon\,h(x), \tag{5}
\]

where $h(x)=\frac{1-\epsilon}{\epsilon}\left(\frac{1}{c''}p_{1}(x)-p_{0}(x)\right)\mathds{1}\!\left(p_{1}(x)/p_{0}(x)\geq c''\right)$. Note that $h$ is a valid density function, since $h\geq 0$ and integrating (5) shows that $\int h\,d\mu=1$. Therefore, $D_{\mathrm{TV}}(P_{0},Q_{0,\epsilon})<\epsilon$. A symmetric decomposition of $q_{1,\epsilon}$ shows that $D_{\mathrm{TV}}(P_{1},Q_{1,\epsilon})<\epsilon$. ∎

Lemma 1.3.

As $\epsilon\downarrow 0$, $c''\uparrow\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$ and $c'\downarrow\operatorname{ess\,inf}_{[\mu]}\frac{p_{1}}{p_{0}}$.

Proof.

Define
\[
f(c)=P_{0}\left[p_{1}/p_{0}<c\right]+\frac{1}{c}P_{1}\left[p_{1}/p_{0}\geq c\right]=1+\int_{p_{1}/p_{0}\geq c}\left(1/c-p_{0}/p_{1}\right)p_{1}\,d\mu.
\]
Note that $c=c''$ is a solution of the equation $f(c)=\frac{1}{1-\epsilon}$.

\[
f(c+\delta)-f(c)=-\int_{c\leq p_{1}/p_{0}<c+\delta}\left(\frac{1}{c}-\frac{p_{0}}{p_{1}}\right)p_{1}\,d\mu-\frac{\delta}{c(c+\delta)}\int_{p_{1}/p_{0}\geq c+\delta}p_{1}\,d\mu \tag{6}
\]

Therefore, $-\frac{\delta}{c(c+\delta)}\leq f(c+\delta)-f(c)\leq-\frac{\delta}{c(c+\delta)}P_{1}[p_{1}/p_{0}\geq c+\delta]$, which implies that $f$ is a continuous and decreasing function.

Let $c_{0}=\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$. If $c_{0}<\infty$, we have $f(c)=1$ for $c\geq c_{0}$, and for $c<c_{0}$, $f(c)$ is strictly decreasing because $f(c+\delta)-f(c)\leq-\frac{\delta}{c(c+\delta)}P_{1}[p_{1}/p_{0}\geq c+\delta]<0$ for small $\delta>0$.

Now, if $c_{0}=\infty$, then $f(c+\delta)-f(c)\leq-\frac{\delta}{c(c+\delta)}P_{1}[p_{1}/p_{0}\geq c+\delta]<0$ for all $c$, and hence $f$ is strictly decreasing with $\lim_{c\to\infty}f(c)=1$.

Note that $\frac{1}{1-\epsilon}\downarrow 1$ as $\epsilon\downarrow 0$. Since $f(c)$ is strictly decreasing for $c<c_{0}$, the solution of the equation $f(c)=\frac{1}{1-\epsilon}$ increases to $c_{0}$ in both cases. Therefore, we have $c''\uparrow\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$ as $\epsilon\downarrow 0$. Similarly, one can show that $c'\downarrow\operatorname{ess\,inf}_{[\mu]}\frac{p_{1}}{p_{0}}$ as $\epsilon\downarrow 0$. ∎

Lemma 1.4.

Suppose that either $\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}<\infty$ or $D_{\mathrm{KL}}(P_{1},P_{0})<\infty$. Then $c''\epsilon\to 0$ as $\epsilon\to 0$.

Proof.

Let $c_{0}=\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$. If $c_{0}<\infty$, then $c''\leq c_{0}$ and so $c''\epsilon\to 0$ as $\epsilon\to 0$.

Now, if $c_{0}=\infty$, then $c''\to\infty$ as $\epsilon\to 0$.

From (1), $1+\frac{1}{c''}P_{1}\left[p_{1}/p_{0}\geq c''\right]\geq\frac{1}{1-\epsilon}$, which implies $c''\epsilon\leq(1-\epsilon)P_{1}\left[p_{1}/p_{0}\geq c''\right]$.

If $D_{\mathrm{KL}}(P_{1},P_{0})<\infty$, we have $\mathbb{E}_{P_{1}}\left|\log(p_{1}/p_{0})\right|<\infty$. Then, by Markov's inequality,

\[
P_{1}\left[p_{1}/p_{0}\geq c''\right]=P_{1}\left[\log(p_{1}/p_{0})\geq\log c''\right]\leq P_{1}\left[|\log(p_{1}/p_{0})|\geq\log c''\right]\leq\frac{\mathbb{E}_{P_{1}}|\log(p_{1}/p_{0})|}{\log c''}\to 0,
\]

as $c''\to\infty$. Hence, $c''\epsilon\to 0$, since $c''\to\infty$ as $\epsilon\to 0$ in the case $c_{0}=\infty$. ∎
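As a purely illustrative numerical check of Lemmas 1.3 and 1.4 (not part of the proofs), one can solve (1) for $c''$ at decreasing values of $\epsilon$ in a concrete example. The sketch below again assumes $P_0=N(1,1)$ and $P_1=N(0,1)$, for which $p_1/p_0=\exp(1/2-x)$, so $\operatorname{ess\,sup}_{[\mu]}p_1/p_0=\infty$ while $D_{\mathrm{KL}}(P_1,P_0)=1/2<\infty$; one should see $c''$ grow without bound while $c''\epsilon$ shrinks.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def f(c):
    """f(c) = P0[p1/p0 < c] + (1/c) P1[p1/p0 >= c] for P0 = N(1,1), P1 = N(0,1),
    using {p1/p0 >= c} = {x <= 1/2 - log c}."""
    thr = 0.5 - np.log(c)
    return (1.0 - norm.cdf(thr, loc=1.0)) + norm.cdf(thr, loc=0.0) / c

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    c_hi = brentq(lambda c: f(c) - 1.0 / (1.0 - eps), 1e-12, 1e12)  # c'' solving (1)
    print(f"eps={eps:.0e}  c''={c_hi:8.2f}  c''*eps={c_hi * eps:.4f}")
```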

2 Growth Rate

Theorem 2.1.

Suppose that $\epsilon>0$ and $X_{1},X_{2},\dots\stackrel{iid}{\sim}Q\in\mathcal{P}_{1}$. Then

\[
\frac{\log R_{t}}{t}\to r\ \text{ almost surely, where }\ r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-2(\log c''-\log c')\,\epsilon-\log\!\left(1+2(c''-c')\,\epsilon\right).
\]
Proof.

By SLLN,

\[
\frac{\log R_{t}}{t}\to r\ \text{ almost surely,} \tag{7}
\]

where $r=\mathbb{E}_{Q}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\log\left(\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon\right)$. Since $D_{\mathrm{TV}}(Q_{0,\epsilon},P_{0})<\epsilon$, applying (4) with $c_{1}=c'$, $c_{2}=c''$ and $f=q_{1,\epsilon}/q_{0,\epsilon}$ gives

\[
1=\mathbb{E}_{Q_{0,\epsilon}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\geq\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-(c''-c')\,\epsilon.
\]

Hence, $r\geq\mathbb{E}_{Q}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\log\!\left(1+2(c''-c')\,\epsilon\right)$. Note that $D_{\mathrm{TV}}(Q_{1,\epsilon},Q)<2\epsilon$ by the triangle inequality (both $Q_{1,\epsilon}$ and $Q$ are within TV distance $\epsilon$ of $P_{1}$); since $\log\frac{q_{1,\epsilon}}{q_{0,\epsilon}}$ takes values in $[\log c',\log c'']$, (4) gives

\[
\mathbb{E}_{Q}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\geq\mathbb{E}_{Q_{1,\epsilon}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-2(\log c''-\log c')\,\epsilon.
\]

Therefore,

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-2(\log c''-\log c')\,\epsilon-\log\!\left(1+2(c''-c')\,\epsilon\right). \tag{8}
\]
∎

Corollary 2.2.

If $r^{*}$ is the optimal growth rate for testing $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$, then $r\geq r^{*}-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right)$.

Proof.

From (1), we get $(1-\epsilon)\left(1+(c'')^{-1}\right)\geq 1$, which implies $c''\leq\frac{1}{\epsilon}-1$. Similarly, from (2), we get $c'\geq\frac{\epsilon}{1-\epsilon}$. Hence,

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right). \tag{9}
\]

The growth rate of an optimal robust test for $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$ cannot exceed $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$, since any test for $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$ is also a test for $Q_{0,\epsilon}$ vs $Q_{1,\epsilon}$, for which the growth rate can be at most $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$; hence $r^{*}\leq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$ and (9) yields the claim. Therefore, the growth rate of our test can deviate from the optimal growth rate by at most $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right)$, which is approximately $\log 3$ for small positive values of $\epsilon$. ∎
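A quick arithmetic check of the $\log 3$ approximation (our own illustration): evaluating the slack term $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right)$ at a few small values of $\epsilon$ shows it approaching $\log 3\approx 1.0986$.

```python
import numpy as np

def slack(eps):
    """Slack term in Corollary 2.2: 4 eps log((1-eps)/eps) + log(3 - 2 eps (1-2 eps)/(1-eps))."""
    return 4 * eps * np.log((1 - eps) / eps) + np.log(3 - 2 * eps * (1 - 2 * eps) / (1 - eps))

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:.0e}  slack={slack(eps):.4f}")
print(f"log 3 = {np.log(3):.4f}")
```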

Corollary 2.3.

Suppose that either $\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}<\infty$ or $D_{\mathrm{KL}}(P_{1},P_{0})$ is finite. Then for any $\delta>0$, there exists a sufficiently small $\epsilon>0$ such that $r\geq r^{*}-\delta$, where $r^{*}$ is the optimal growth rate for testing $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$.

Proof.

From (1), we get $(1-\epsilon)\left(1+(c'')^{-1}\right)\geq 1$, which implies $c''\leq\frac{1}{\epsilon}-1$. Similarly, from (2), we get $c'\geq\frac{\epsilon}{1-\epsilon}$. Hence,

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\left(1+2(c''-c')\,\epsilon\right). \tag{10}
\]

If either $\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}<\infty$ or $D_{\mathrm{KL}}(P_{1},P_{0})$ is finite, Lemma 1.4 says that $c''\epsilon\to 0$ as $\epsilon\to 0$. So $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(1+2(c''-c')\epsilon\right)\to 0$ as $\epsilon\to 0$. Therefore, for any $\delta>0$, there exists a sufficiently small $\epsilon>0$ such that $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(1+2(c''-c')\epsilon\right)<\delta$. Using arguments similar to those in the proof of the previous corollary (namely $r^{*}\leq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$), we get $r\geq r^{*}-\delta$. ∎

Theorem 2.4.

The growth rate of our test satisfies $r\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$.

Proof.

Define $Z_{\epsilon}=\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}$ and $Z=\log\frac{p_{1}(X)}{p_{0}(X)}$, and write $Z_{\epsilon}=Z_{\epsilon}^{+}-Z_{\epsilon}^{-}$, $Z=Z^{+}-Z^{-}$. As $\epsilon\downarrow 0$, $c''\uparrow\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$ and $c'\downarrow\operatorname{ess\,inf}_{[\mu]}\frac{p_{1}}{p_{0}}$, so the truncation interval $[\log c',\log c'']$ widens and (for small enough $\epsilon$, when $c'\leq 1\leq c''$) $Z_{\epsilon}^{+}\uparrow Z^{+}$ and $Z_{\epsilon}^{-}\uparrow Z^{-}$ almost surely. Therefore, by the monotone convergence theorem, $\mathbb{E}_{P_{1}}Z_{\epsilon}^{+}\uparrow\mathbb{E}_{P_{1}}Z^{+}$ and $\mathbb{E}_{P_{1}}Z_{\epsilon}^{-}\uparrow\mathbb{E}_{P_{1}}Z^{-}$ as $\epsilon\downarrow 0$. Since $D_{\mathrm{KL}}(P_{1},P_{0})=\mathbb{E}_{P_{1}}Z^{+}-\mathbb{E}_{P_{1}}Z^{-}$ exists (with $\mathbb{E}_{P_{1}}Z^{-}$ finite), we have $\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$.

Case I: If $D_{\mathrm{KL}}(P_{1},P_{0})<\infty$, then $c''\epsilon\to 0$ by Lemma 1.4, and since $c'\geq\frac{\epsilon}{1-\epsilon}$ we also have $\epsilon\log(1/c')\leq\epsilon\log\frac{1-\epsilon}{\epsilon}\to 0$. Since $\log\frac{q_{1,\epsilon}}{q_{0,\epsilon}}$ takes values in $[\log c',\log c'']$ and $D_{\mathrm{TV}}(Q_{1,\epsilon},P_{1})<\epsilon$, (4) gives

\[
\left|\mathbb{E}_{Q_{1,\epsilon}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\right|\leq(\log c''-\log c')\,\epsilon\to 0.
\]

Therefore, $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$. Now, from Theorem 2.1 and Lemma 1.4, we have

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-2(\log c''-\log c')\,\epsilon-\log\!\left(1+2(c''-c')\,\epsilon\right)\to D_{\mathrm{KL}}(P_{1},P_{0}). \tag{11}
\]

And we must have $r\leq D_{\mathrm{KL}}(P_{1},P_{0})$. Thus, $r\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$.

Case II: If $D_{\mathrm{KL}}(P_{1},P_{0})=\infty$, then $\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\to D_{\mathrm{KL}}(P_{1},P_{0})=\infty$ as $\epsilon\to 0$. Also, the bounds $c''\leq\frac{1-\epsilon}{\epsilon}$ and $c'\geq\frac{\epsilon}{1-\epsilon}$ (from (1) and (2)) together with (4) imply

\[
\left|\mathbb{E}_{Q_{1,\epsilon}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\right|\leq(\log c''-\log c')\,\epsilon\leq 2\epsilon\log\frac{1-\epsilon}{\epsilon},
\]

which is bounded. Therefore, $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})\to D_{\mathrm{KL}}(P_{1},P_{0})=\infty$ as $\epsilon\to 0$. From (10),

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\!\left(1+2(c''-c')\,\epsilon\right)\to\infty,\ \text{ as }\epsilon\to 0. \tag{12}
\]

Therefore, in both cases we have $r\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$. ∎

3 Simulations

In this section, we present a series of simulations designed to evaluate the performance of our robust SPRT. We use two key parameters in our analysis: $\epsilon^{A}$, the value of $\epsilon$ specified to the test algorithm, and $\epsilon^{R}$, the true fraction of contaminated data.

Growth rate with different contamination

For this experiment, samples are simulated independently from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$ for $\epsilon^{R}=10^{-4},10^{-3},10^{-2}$. This mixture model ensures that an $\epsilon^{R}$ fraction of the sample is drawn from the heavy-tailed Cauchy distribution with location and scale parameters $1$ and $5$, respectively; see the sampling sketch below. Fig. 1 shows the growth of the test supermartingales when $\epsilon^{A}=\epsilon^{R}$.
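A minimal sketch of how such a contaminated stream can be generated (our own illustration of the sampling scheme described above, not the authors' simulation code); the resulting stream can then be fed to the $\log R_t$ recursion sketched in Section 1:

```python
import numpy as np

def contaminated_sample(n, eps_r, seed=0):
    """Draw n observations from (1 - eps_r) * N(0,1) + eps_r * Cauchy(loc=1, scale=5)."""
    rng = np.random.default_rng(seed)
    clean = rng.normal(0.0, 1.0, n)
    outliers = 1.0 + 5.0 * rng.standard_cauchy(n)  # Cauchy with location 1 and scale 5
    is_outlier = rng.random(n) < eps_r             # each point is contaminated w.p. eps_r
    return np.where(is_outlier, outliers, clean)

x = contaminated_sample(10_000, eps_r=1e-2)
print(np.mean(np.abs(x) > 10.0))  # fraction of gross outliers injected by the Cauchy component
```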

Comparison with SPRT when actual data has no contamination

Here, samples are drawn independently from $N(0,1)$ without adding any contamination. Our objective is to quantify the cost of safeguarding against potential adversarial contamination when none is actually present, i.e., in a setting where a naive SPRT could have been used instead. Fig. 2 shows the growth of our robust SPRT for different specified values of $\epsilon^{A}$, together with that of the original SPRT.

Figure 1: Data is drawn from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$ and $P_{0}=N(1,1)$, $P_{1}=N(0,1)$. Here we observe that the growth rate for $\epsilon^{A}=\epsilon^{R}$ slightly increases as $\epsilon$ increases.

Figure 2: Data is drawn from $N(0,1)$, $\epsilon^{R}=0$, and $P_{0}=N(1,1)$, $P_{1}=N(0,1)$. Here, the growth rate of our robust SPRT is nearly half of that of the (non-robust) SPRT.

Growth rate with different separation between null and alternative

For this experiment, samples are simulated independently from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$ for $\epsilon^{R}=10^{-4},10^{-3},10^{-2}$. To ensure that the data are contaminated with potential outliers, an $\epsilon^{R}$ fraction of the sample is drawn from the heavy-tailed Cauchy distribution with location and scale parameters $1$ and $5$, respectively. We consider the $\epsilon^{A}$-robust test for $P_{0}=N(\mu,1)$ vs $P_{1}=N(0,1)$, for $\mu=1,0.5,0.25$. As anticipated, the growth rate of the robust test decreases as the null and alternative hypotheses become harder to distinguish.

Figure 3: Data is drawn from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$, $\epsilon^{A}=\epsilon^{R}=0.01$. As expected, the growth rate increases as the TV distance between $P_{0}$ and $P_{1}$ increases.