
Robust Anytime-Valid Sequential Probability Ratio Test

1 Method

Define the distributions $Q_{i,\epsilon}$, $i=0,1$, by their densities $q_{i,\epsilon}$ as follows:

\[
q_{0,\epsilon}(x)=
\begin{cases}
(1-\epsilon)\,p_{0}(x) & \text{for } p_{1}(x)/p_{0}(x)<c''\\[2pt]
(1/c'')\,(1-\epsilon)\,p_{1}(x) & \text{for } p_{1}(x)/p_{0}(x)\geq c''
\end{cases}
\]
\[
q_{1,\epsilon}(x)=
\begin{cases}
(1-\epsilon)\,p_{1}(x) & \text{for } p_{1}(x)/p_{0}(x)>c'\\[2pt]
c'\,(1-\epsilon)\,p_{0}(x) & \text{for } p_{1}(x)/p_{0}(x)\leq c'.
\end{cases}
\]

The numbers $0\leq c'<c''\leq\infty$ have to be determined such that $q_{0,\epsilon}$ and $q_{1,\epsilon}$ are probability densities, i.e.,

\[
(1-\epsilon)\left\{P_{0}\!\left[p_{1}/p_{0}<c''\right]+(c'')^{-1}P_{1}\!\left[p_{1}/p_{0}\geq c''\right]\right\}=1 \tag{1}
\]
\[
(1-\epsilon)\left\{P_{1}\!\left[p_{1}/p_{0}>c'\right]+c'\,P_{0}\!\left[p_{1}/p_{0}\leq c'\right]\right\}=1 \tag{2}
\]

Then, Huber (1965) proved that such $c',c''$ exist and that $Q_{i,\epsilon}\in\mathcal{P}_{i}$, $i=0,1$, are “least favorable” distributions.

Note that $\frac{q_{1,\epsilon}(x)}{q_{0,\epsilon}(x)}=\max\left\{c',\min\left\{c'',\frac{p_{1}(x)}{p_{0}(x)}\right\}\right\}$ is a truncation of the original likelihood ratio $\frac{p_{1}(x)}{p_{0}(x)}$.

Now we define,

\[
R_{t}=\frac{\prod_{i=1}^{t}\frac{q_{1,\epsilon}(X_{i})}{q_{0,\epsilon}(X_{i})}}{\left(\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon\right)^{t}}. \tag{3}
\]
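For concreteness, the construction in (1)–(3) can be implemented numerically. The following Python sketch (our own illustration, not code from the paper) solves (1) and (2) for $c''$ and $c'$ by root finding, forms the truncated likelihood ratio, and accumulates $\log R_t$ along a data stream. It assumes the Gaussian pair $P_0=N(1,1)$, $P_1=N(0,1)$ used later in the simulations, for which $p_1(x)/p_0(x)=\exp(1/2-x)$; the Monte Carlo estimate of $\mathbb{E}_{P_0}[q_{1,\epsilon}/q_{0,\epsilon}]$ and all function names are our own choices.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Illustrative robust-SPRT ingredients for P0 = N(1,1) vs P1 = N(0,1).
# For this pair, p1(x)/p0(x) = exp(1/2 - x), so {p1/p0 >= c} = {x <= 1/2 - log c},
# which gives closed forms for the probabilities appearing in (1) and (2).

def f_upper(c):
    """P0[p1/p0 < c] + (1/c) P1[p1/p0 >= c]; equation (1) sets this equal to 1/(1-eps)."""
    thr = 0.5 - np.log(c)
    return (1.0 - norm.cdf(thr, loc=1.0)) + norm.cdf(thr, loc=0.0) / c

def g_lower(c):
    """P1[p1/p0 > c] + c * P0[p1/p0 <= c]; equation (2) sets this equal to 1/(1-eps)."""
    thr = 0.5 - np.log(c)
    return norm.cdf(thr, loc=0.0) + c * (1.0 - norm.cdf(thr, loc=1.0))

def huber_thresholds(eps):
    """Solve (1) and (2) for c'' and c' (f_upper is decreasing, g_lower is increasing in c)."""
    target = 1.0 / (1.0 - eps)
    c_hi = brentq(lambda c: f_upper(c) - target, 1e-8, 1e8)  # c''
    c_lo = brentq(lambda c: g_lower(c) - target, 1e-8, 1e8)  # c'
    return c_lo, c_hi

def log_R(x, eps, n_mc=200_000):
    """Running log R_t from (3) along the data stream x."""
    c_lo, c_hi = huber_thresholds(eps)
    trunc_lr = lambda z: np.clip(np.exp(0.5 - z), c_lo, c_hi)  # q_{1,eps}/q_{0,eps}
    # Monte Carlo estimate of E_{P0}[q_{1,eps}/q_{0,eps}] under P0 = N(1,1).
    denom = trunc_lr(np.random.default_rng(0).normal(1.0, 1.0, n_mc)).mean() + (c_hi - c_lo) * eps
    return np.cumsum(np.log(trunc_lr(np.asarray(x))) - np.log(denom))

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(0.0, 1.0, 1000)  # data from P1 = N(0,1)
    print(huber_thresholds(0.01))                            # (c', c'') for eps = 0.01
    print(log_R(data, eps=0.01)[-1] / 1000)                  # empirical growth rate log R_t / t
```

With uncontaminated data from $P_1$, the printed empirical growth rate can be compared against the non-robust SPRT growth $D_{\mathrm{KL}}(P_1,P_0)=1/2$ for this pair, in the spirit of the comparison in Fig. 2.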
Lemma 1.1.

$R_{t}$ is a non-negative supermartingale for $\mathcal{P}_{0}$.

Proof.

We know that the total variation distance is an integral probability metric in the sense that for any pair of real numbers $c_{1}<c_{2}$,

\[
D_{\mathrm{TV}}(P,Q)=\frac{1}{c_{2}-c_{1}}\sup_{c_{1}\leq f\leq c_{2}}\left|\mathbb{E}_{X\sim P}f(X)-\mathbb{E}_{X\sim Q}f(X)\right|. \tag{4}
\]

For any distribution $Q\in\mathcal{P}_{0}$, $D_{\mathrm{TV}}(Q,P_{0})<\epsilon$. Since $q_{1,\epsilon}/q_{0,\epsilon}$ takes values in $[c',c'']$, applying (4) with $c_{1}=c'$, $c_{2}=c''$ and $f=q_{1,\epsilon}/q_{0,\epsilon}$ gives

\[
\mathbb{E}_{Q}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\leq\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon.
\]

Hence, for every $Q\in\mathcal{P}_{0}$,
\[
\mathbb{E}_{Q}\!\left[R_{t}\mid X_{1},\dots,X_{t-1}\right]
=R_{t-1}\,\frac{\mathbb{E}_{Q}\frac{q_{1,\epsilon}(X_{t})}{q_{0,\epsilon}(X_{t})}}{\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon}\leq R_{t-1}.
\]
Therefore, $R_{t}$ is a non-negative supermartingale for $\mathcal{P}_{0}$. ∎

Proposition 1.2.

For $i=0,1$, $Q_{i,\epsilon}\in\mathcal{P}_{i}=\mathbb{B}_{\mathrm{TV}}(P_{i},\epsilon)$.

Proof.

We can rewrite $q_{0,\epsilon}$ as

\[
q_{0,\epsilon}(x)=(1-\epsilon)\,p_{0}(x)+\epsilon\,h(x), \tag{5}
\]

where $h(x)=\frac{1-\epsilon}{\epsilon}\left(\frac{1}{c''}p_{1}(x)-p_{0}(x)\right)\mathds{1}\!\left(p_{1}(x)/p_{0}(x)\geq c''\right)$. Note that $h$ is a valid density function, since $h\geq 0$ and integrating (5) shows that $\int h\,d\mu=1$. Therefore, $D_{\mathrm{TV}}(P_{0},Q_{0,\epsilon})<\epsilon$. A symmetric decomposition of $q_{1,\epsilon}$ shows that $D_{\mathrm{TV}}(P_{1},Q_{1,\epsilon})<\epsilon$. ∎

Lemma 1.3.

As $\epsilon\downarrow 0$, $c''\uparrow\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$ and $c'\downarrow\operatorname{ess\,inf}_{[\mu]}\frac{p_{1}}{p_{0}}$.

Proof.

Define
\[
f(c)=P_{0}\left[p_{1}/p_{0}<c\right]+\frac{1}{c}P_{1}\left[p_{1}/p_{0}\geq c\right]=1+\int_{p_{1}/p_{0}\geq c}\left(1/c-p_{0}/p_{1}\right)p_{1}\,d\mu.
\]
Note that $c=c''$ is a solution of the equation $f(c)=\frac{1}{1-\epsilon}$.

\[
f(c+\delta)-f(c)=-\int_{c\leq p_{1}/p_{0}<c+\delta}\left(\frac{1}{c}-\frac{p_{0}}{p_{1}}\right)p_{1}\,d\mu-\frac{\delta}{c(c+\delta)}\int_{p_{1}/p_{0}\geq c+\delta}p_{1}\,d\mu \tag{6}
\]

Therefore, $-\frac{\delta}{c(c+\delta)}\leq f(c+\delta)-f(c)\leq-\frac{\delta}{c(c+\delta)}P_{1}[p_{1}/p_{0}\geq c+\delta]$, which implies that $f$ is a continuous and decreasing function.

Let $c_{0}=\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$. If $c_{0}<\infty$, we have $f(c)=1$ for $c\geq c_{0}$, and for $c<c_{0}$, $f(c)$ is strictly decreasing because $f(c+\delta)-f(c)\leq-\frac{\delta}{c(c+\delta)}P_{1}[p_{1}/p_{0}\geq c+\delta]<0$ for small $\delta>0$.

Now, if $c_{0}=\infty$, then $f(c+\delta)-f(c)\leq-\frac{\delta}{c(c+\delta)}P_{1}[p_{1}/p_{0}\geq c+\delta]<0$ for all $c$, and hence $f$ is strictly decreasing with $\lim_{c\to\infty}f(c)=1$.

Note that $\frac{1}{1-\epsilon}\downarrow 1$ as $\epsilon\downarrow 0$. Since $f(c)$ is strictly decreasing for $c<c_{0}$, the solution of the equation $f(c)=\frac{1}{1-\epsilon}$ increases to $c_{0}$ in both cases. Therefore, we have $c''\uparrow\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$ as $\epsilon\downarrow 0$. Similarly, one can show that $c'\downarrow\operatorname{ess\,inf}_{[\mu]}\frac{p_{1}}{p_{0}}$ as $\epsilon\downarrow 0$. ∎

Lemma 1.4.

Suppose that either $\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}<\infty$ or $D_{\mathrm{KL}}(P_{1},P_{0})<\infty$. Then $c''\epsilon\to 0$ as $\epsilon\to 0$.

Proof.

Let $c_{0}=\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$. If $c_{0}<\infty$, then $c''\leq c_{0}$ and so $c''\epsilon\to 0$ as $\epsilon\to 0$.

Now, if $c_{0}=\infty$, then $c''\to\infty$ as $\epsilon\to 0$.

From (1), $1+\frac{1}{c''}P_{1}\left[p_{1}/p_{0}\geq c''\right]\geq\frac{1}{1-\epsilon}$, which implies $c''\epsilon\leq(1-\epsilon)P_{1}\left[p_{1}/p_{0}\geq c''\right]$.

If $D_{\mathrm{KL}}(P_{1},P_{0})<\infty$, we have $\mathbb{E}_{P_{1}}\left|\log(p_{1}/p_{0})\right|<\infty$. Then, by Markov's inequality,

\[
P_{1}\left[p_{1}/p_{0}\geq c''\right]=P_{1}\left[\log(p_{1}/p_{0})\geq\log c''\right]\leq P_{1}\left[|\log(p_{1}/p_{0})|\geq\log c''\right]\leq\frac{\mathbb{E}_{P_{1}}|\log(p_{1}/p_{0})|}{\log c''}\to 0,
\]

as $c''\to\infty$. Hence, $c''\epsilon\to 0$, since $c''\to\infty$ as $\epsilon\to 0$ in the case $c_{0}=\infty$. ∎
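As a purely illustrative numerical check of Lemmas 1.3 and 1.4 (not part of the proofs), one can solve (1) for $c''$ at decreasing values of $\epsilon$ in a concrete example. The sketch below again assumes $P_0=N(1,1)$ and $P_1=N(0,1)$, for which $p_1/p_0=\exp(1/2-x)$, so $\operatorname{ess\,sup}_{[\mu]}p_1/p_0=\infty$ while $D_{\mathrm{KL}}(P_1,P_0)=1/2<\infty$; one should see $c''$ grow without bound while $c''\epsilon$ shrinks.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def f(c):
    """f(c) = P0[p1/p0 < c] + (1/c) P1[p1/p0 >= c] for P0 = N(1,1), P1 = N(0,1),
    using {p1/p0 >= c} = {x <= 1/2 - log c}."""
    thr = 0.5 - np.log(c)
    return (1.0 - norm.cdf(thr, loc=1.0)) + norm.cdf(thr, loc=0.0) / c

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    c_hi = brentq(lambda c: f(c) - 1.0 / (1.0 - eps), 1e-12, 1e12)  # c'' solving (1)
    print(f"eps={eps:.0e}  c''={c_hi:8.2f}  c''*eps={c_hi * eps:.4f}")
```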

2 Growth Rate

Theorem 2.1.

Suppose that $\epsilon>0$ and $X_{1},X_{2},\dots\stackrel{iid}{\sim}Q\in\mathcal{P}_{1}$. Then

\[
\frac{\log R_{t}}{t}\to r\ \text{ almost surely, where }\ r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-2(\log c''-\log c')\,\epsilon-\log\!\left(1+2(c''-c')\,\epsilon\right).
\]
Proof.

By SLLN,

\[
\frac{\log R_{t}}{t}\to r\ \text{ almost surely,} \tag{7}
\]

where $r=\mathbb{E}_{Q}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\log\left(\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}+(c''-c')\,\epsilon\right)$. Since $D_{\mathrm{TV}}(Q_{0,\epsilon},P_{0})<\epsilon$, applying (4) with $c_{1}=c'$, $c_{2}=c''$ and $f=q_{1,\epsilon}/q_{0,\epsilon}$ gives

\[
1=\mathbb{E}_{Q_{0,\epsilon}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\geq\mathbb{E}_{P_{0}}\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-(c''-c')\,\epsilon.
\]

Hence, $r\geq\mathbb{E}_{Q}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\log\!\left(1+2(c''-c')\,\epsilon\right)$. Note that $D_{\mathrm{TV}}(Q_{1,\epsilon},Q)<2\epsilon$ by the triangle inequality (both $Q_{1,\epsilon}$ and $Q$ are within TV distance $\epsilon$ of $P_{1}$); since $\log\frac{q_{1,\epsilon}}{q_{0,\epsilon}}$ takes values in $[\log c',\log c'']$, (4) gives

\[
\mathbb{E}_{Q}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\geq\mathbb{E}_{Q_{1,\epsilon}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-2(\log c''-\log c')\,\epsilon.
\]

Therefore,

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-2(\log c''-\log c')\,\epsilon-\log\!\left(1+2(c''-c')\,\epsilon\right). \tag{8}
\]
∎

Corollary 2.2.

If $r^{*}$ is the optimal growth rate for testing $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$, then $r\geq r^{*}-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right)$.

Proof.

From (1), we get $(1-\epsilon)\left(1+(c'')^{-1}\right)\geq 1$, which implies $c''\leq\frac{1}{\epsilon}-1$. Similarly, from (2), we get $c'\geq\frac{\epsilon}{1-\epsilon}$. Hence,

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right). \tag{9}
\]

The growth rate of an optimal robust test for $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$ cannot exceed $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$, since any test for $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$ is also a test for $Q_{0,\epsilon}$ vs $Q_{1,\epsilon}$, for which the growth rate can be at most $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$; hence $r^{*}\leq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$ and (9) yields the claim. Therefore, the growth rate of our test can deviate from the optimal growth rate by at most $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right)$, which is approximately $\log 3$ for small positive values of $\epsilon$. ∎
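A quick arithmetic check of the $\log 3$ approximation (our own illustration): evaluating the slack term $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(3-\frac{2\epsilon(1-2\epsilon)}{1-\epsilon}\right)$ at a few small values of $\epsilon$ shows it approaching $\log 3\approx 1.0986$.

```python
import numpy as np

def slack(eps):
    """Slack term in Corollary 2.2: 4 eps log((1-eps)/eps) + log(3 - 2 eps (1-2 eps)/(1-eps))."""
    return 4 * eps * np.log((1 - eps) / eps) + np.log(3 - 2 * eps * (1 - 2 * eps) / (1 - eps))

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:.0e}  slack={slack(eps):.4f}")
print(f"log 3 = {np.log(3):.4f}")
```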

Corollary 2.3.

Suppose that either $\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}<\infty$ or $D_{\mathrm{KL}}(P_{1},P_{0})$ is finite. Then for any $\delta>0$, there exists a sufficiently small $\epsilon>0$ such that $r\geq r^{*}-\delta$, where $r^{*}$ is the optimal growth rate for testing $\mathcal{P}_{0}$ vs $\mathcal{P}_{1}$.

Proof.

From (1), we get $(1-\epsilon)\left(1+(c'')^{-1}\right)\geq 1$, which implies $c''\leq\frac{1}{\epsilon}-1$. Similarly, from (2), we get $c'\geq\frac{\epsilon}{1-\epsilon}$. Hence,

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\left(1+2(c''-c')\,\epsilon\right). \tag{10}
\]

If either $\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}<\infty$ or $D_{\mathrm{KL}}(P_{1},P_{0})$ is finite, Lemma 1.4 says that $c''\epsilon\to 0$ as $\epsilon\to 0$. So $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(1+2(c''-c')\epsilon\right)\to 0$ as $\epsilon\to 0$. Therefore, for any $\delta>0$, there exists a sufficiently small $\epsilon>0$ such that $4\epsilon\log\frac{1-\epsilon}{\epsilon}+\log\left(1+2(c''-c')\epsilon\right)<\delta$. Using arguments similar to those in the proof of the previous corollary (namely $r^{*}\leq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})$), we get $r\geq r^{*}-\delta$. ∎

Theorem 2.4.

The growth rate of our test satisfies $r\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$.

Proof.

Define $Z_{\epsilon}=\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}$ and $Z=\log\frac{p_{1}(X)}{p_{0}(X)}$, and write $Z_{\epsilon}=Z_{\epsilon}^{+}-Z_{\epsilon}^{-}$, $Z=Z^{+}-Z^{-}$. As $\epsilon\downarrow 0$, $c''\uparrow\operatorname{ess\,sup}_{[\mu]}\frac{p_{1}}{p_{0}}$ and $c'\downarrow\operatorname{ess\,inf}_{[\mu]}\frac{p_{1}}{p_{0}}$, so the truncation interval $[\log c',\log c'']$ widens and (for small enough $\epsilon$, when $c'\leq 1\leq c''$) $Z_{\epsilon}^{+}\uparrow Z^{+}$ and $Z_{\epsilon}^{-}\uparrow Z^{-}$ almost surely. Therefore, by the monotone convergence theorem, $\mathbb{E}_{P_{1}}Z_{\epsilon}^{+}\uparrow\mathbb{E}_{P_{1}}Z^{+}$ and $\mathbb{E}_{P_{1}}Z_{\epsilon}^{-}\uparrow\mathbb{E}_{P_{1}}Z^{-}$ as $\epsilon\downarrow 0$. Since $D_{\mathrm{KL}}(P_{1},P_{0})=\mathbb{E}_{P_{1}}Z^{+}-\mathbb{E}_{P_{1}}Z^{-}$ exists (with $\mathbb{E}_{P_{1}}Z^{-}$ finite), we have $\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$.

Case I: If $D_{\mathrm{KL}}(P_{1},P_{0})<\infty$, then $c''\epsilon\to 0$ by Lemma 1.4, and since $c'\geq\frac{\epsilon}{1-\epsilon}$ we also have $\epsilon\log(1/c')\leq\epsilon\log\frac{1-\epsilon}{\epsilon}\to 0$. Since $\log\frac{q_{1,\epsilon}}{q_{0,\epsilon}}$ takes values in $[\log c',\log c'']$ and $D_{\mathrm{TV}}(Q_{1,\epsilon},P_{1})<\epsilon$, (4) gives

\[
\left|\mathbb{E}_{Q_{1,\epsilon}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\right|\leq(\log c''-\log c')\,\epsilon\to 0.
\]

Therefore, $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$. Now, from Theorem 2.1 and Lemma 1.4, we have

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-2(\log c''-\log c')\,\epsilon-\log\!\left(1+2(c''-c')\,\epsilon\right)\to D_{\mathrm{KL}}(P_{1},P_{0}). \tag{11}
\]

And we must have $r\leq D_{\mathrm{KL}}(P_{1},P_{0})$. Thus, $r\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$.

Case II: If $D_{\mathrm{KL}}(P_{1},P_{0})=\infty$, then $\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\to D_{\mathrm{KL}}(P_{1},P_{0})=\infty$ as $\epsilon\to 0$. Also, the bounds $c''\leq\frac{1-\epsilon}{\epsilon}$ and $c'\geq\frac{\epsilon}{1-\epsilon}$ (from (1) and (2)) together with (4) imply

\[
\left|\mathbb{E}_{Q_{1,\epsilon}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}-\mathbb{E}_{P_{1}}\log\frac{q_{1,\epsilon}(X)}{q_{0,\epsilon}(X)}\right|\leq(\log c''-\log c')\,\epsilon\leq 2\epsilon\log\frac{1-\epsilon}{\epsilon},
\]

which is bounded. Therefore, $D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})\to D_{\mathrm{KL}}(P_{1},P_{0})=\infty$ as $\epsilon\to 0$. From (10),

\[
r\geq D_{\mathrm{KL}}(Q_{1,\epsilon},Q_{0,\epsilon})-4\epsilon\log\frac{1-\epsilon}{\epsilon}-\log\!\left(1+2(c''-c')\,\epsilon\right)\to\infty,\ \text{ as }\epsilon\to 0. \tag{12}
\]

Therefore, in both cases we have $r\to D_{\mathrm{KL}}(P_{1},P_{0})$ as $\epsilon\to 0$. ∎

3 Simulations

In this section, we present a series of simulations designed to evaluate the performance of our robust SPRT. We use two key parameters in our analysis: $\epsilon^{A}$, the value of $\epsilon$ specified to the test algorithm, and $\epsilon^{R}$, the true fraction of contaminated data.

Growth rate with different contamination

For this experiment, samples are simulated independently from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$ for $\epsilon^{R}=10^{-4},10^{-3},10^{-2}$. This mixture model ensures that an $\epsilon^{R}$ fraction of the sample is drawn from the heavy-tailed Cauchy distribution with location and scale parameters $1$ and $5$, respectively; see the sampling sketch below. Fig. 1 shows the growth of the test supermartingales when $\epsilon^{A}=\epsilon^{R}$.
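A minimal sketch of how such a contaminated stream can be generated (our own illustration of the sampling scheme described above, not the authors' simulation code); the resulting stream can then be fed to the $\log R_t$ recursion sketched in Section 1:

```python
import numpy as np

def contaminated_sample(n, eps_r, seed=0):
    """Draw n observations from (1 - eps_r) * N(0,1) + eps_r * Cauchy(loc=1, scale=5)."""
    rng = np.random.default_rng(seed)
    clean = rng.normal(0.0, 1.0, n)
    outliers = 1.0 + 5.0 * rng.standard_cauchy(n)  # Cauchy with location 1 and scale 5
    is_outlier = rng.random(n) < eps_r             # each point is contaminated w.p. eps_r
    return np.where(is_outlier, outliers, clean)

x = contaminated_sample(10_000, eps_r=1e-2)
print(np.mean(np.abs(x) > 10.0))  # fraction of gross outliers injected by the Cauchy component
```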

Comparison with SPRT when actual data has no contamination

Here, samples are drawn independently from $N(0,1)$ without adding any contamination. Our objective is to quantify the cost of safeguarding against potential adversarial contamination when none is actually present, i.e., in a setting where a naive SPRT could have been used instead. Fig. 2 shows the growth of our robust SPRT for different specified values of $\epsilon^{A}$, together with that of the original SPRT.

Figure 1: Data is drawn from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$ and $P_{0}=N(1,1)$, $P_{1}=N(0,1)$. Here we observe that the growth rate for $\epsilon^{A}=\epsilon^{R}$ slightly increases as $\epsilon$ increases.

Figure 2: Data is drawn from $N(0,1)$, $\epsilon^{R}=0$, and $P_{0}=N(1,1)$, $P_{1}=N(0,1)$. Here, the growth rate of our robust SPRT is nearly half of that of the (non-robust) SPRT.

Growth rate with different separation between null and alternative

For this experiment, samples are simulated independently from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$ for $\epsilon^{R}=10^{-4},10^{-3},10^{-2}$. To ensure that the data are contaminated with potential outliers, an $\epsilon^{R}$ fraction of the sample is drawn from the heavy-tailed Cauchy distribution with location and scale parameters $1$ and $5$, respectively. We consider the $\epsilon^{A}$-robust test for $P_{0}=N(\mu,1)$ vs $P_{1}=N(0,1)$, for $\mu=1,0.5,0.25$. As anticipated, the growth rate of the robust test decreases as the null and alternative hypotheses become harder to distinguish.

Figure 3: Data is drawn from $(1-\epsilon^{R})\times N(0,1)+\epsilon^{R}\times\mathrm{Cauchy}(1,5)$, $\epsilon^{A}=\epsilon^{R}=0.01$. As expected, the growth rate increases as the TV distance between $P_{0}$ and $P_{1}$ increases.