Robust Anytime-Valid Sequential Probability Ratio Test
1 Method
Define the distributions by their densities as follows:
The numbers have to be determined such that are probability densities, i.e.,
(1)
(2)
Then, Huber (1965) proved that such exist and are “least favorable” distributions.
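As a hedged illustration of how such constants might be found numerically, the sketch below assumes the normalisation conditions take the classical form from Huber's construction for ε-contamination neighbourhoods, namely (1-ε){P0[LR < c_hi] + P1[LR ≥ c_hi]/c_hi} = 1 and (1-ε){P1[LR > c_lo] + c_lo·P0[LR ≤ c_lo]} = 1, where LR = p1/p0. That form, the Gaussian pair P0 = N(0, 1) and P1 = N(mu, 1) with mu > 0, and every name in the code (`c_lo`, `c_hi`, `mass_q0`, `mass_q1`, `truncation_constants`) are assumptions made for illustration; the conditions (1) and (2) of this section may differ.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def mass_q0(c_hi, eps, mu):
    """Total mass of the assumed q0: (1-eps) * (P0[LR < c_hi] + P1[LR >= c_hi] / c_hi)."""
    b = (math.log(c_hi) + 0.5 * mu**2) / mu  # LR(x) < c_hi  iff  x < b  (mu > 0)
    return (1.0 - eps) * (std_normal_cdf(b) + std_normal_cdf(mu - b) / c_hi)


def mass_q1(c_lo, eps, mu):
    """Total mass of the assumed q1: (1-eps) * (P1[LR > c_lo] + c_lo * P0[LR <= c_lo])."""
    a = (math.log(c_lo) + 0.5 * mu**2) / mu  # LR(x) > c_lo  iff  x > a  (mu > 0)
    return (1.0 - eps) * (std_normal_cdf(mu - a) + c_lo * std_normal_cdf(a))


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def truncation_constants(eps, mu, log_bound=50.0):
    """Solve mass_q1(c_lo) = 1 and mass_q0(c_hi) = 1 on a log scale.

    Assumes eps is small enough that both masses exceed 1 at c = 1,
    i.e. (1 - eps) * 2 * std_normal_cdf(mu / 2) > 1.
    """
    log_c_lo = bisect(lambda t: mass_q1(math.exp(t), eps, mu) - 1.0, -log_bound, 0.0)
    log_c_hi = bisect(lambda t: mass_q0(math.exp(t), eps, mu) - 1.0, 0.0, log_bound)
    return math.exp(log_c_lo), math.exp(log_c_hi)


# Hypothetical parameter values, chosen only for illustration.
c_lo, c_hi = truncation_constants(eps=0.05, mu=1.0)
```

Searching on a log scale keeps both roots well conditioned. For this symmetric Gaussian pair the two constants come out as reciprocals of each other, which gives a quick sanity check on the solver.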
Note that is a truncation of the original likelihood ratio .
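The truncation can be sketched in a few lines. The example below assumes a Gaussian pair N(mu0, sigma²) versus N(mu1, sigma²) and takes the two truncation levels as given inputs; the names `log_lr_gaussian`, `truncated_lr`, `c_lo` and `c_hi` are hypothetical and chosen only for illustration.

```python
import math


def log_lr_gaussian(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """log(p1(x) / p0(x)) for N(mu1, sigma^2) against N(mu0, sigma^2)."""
    return ((mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)) / sigma**2


def truncated_lr(x, c_lo, c_hi, log_lr=log_lr_gaussian):
    """Likelihood ratio clipped to [c_lo, c_hi].

    Clipping is done in log space so that extreme observations
    (for example heavy-tailed outliers) cannot overflow exp().
    """
    clipped = min(math.log(c_hi), max(math.log(c_lo), log_lr(x)))
    return math.exp(clipped)
```

For example, `truncated_lr(0.5, 0.2, 5.0)` leaves the untruncated value unchanged, whereas a very large or very small observation is mapped to `c_hi` or `c_lo`, so a single outlier can move the product of ratios by at most a bounded factor.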
Now we define
(3)
Lemma 1.1.
is a non-negative supermartingale for .
We know that the total variation distance is an integral probability metric in the sense that for any pair of real numbers ,
(4)
For any distribution , which implies
Therefore, is a non-negative supermartingale for .
Proposition 1.2.
For , .
Proof.
We can rewrite as
(5)
where Note that is a valid density function since and (5) implies that . Therefore, . ∎
Lemma 1.3.
As , and .
Proof.
Define, Note that is a solution of the equation
(6)
Therefore, , which implies that is a continuous and decreasing function.
Let, . If , we have for and for , is strictly decreasing because , for small .
Now, if , , for all and hence is strictly decreasing with .
Note that , as Since, is a strictly decreasing function for , the solution of the equation increases to in both cases. Therefore, we have , as Similarly, one can show that , as ∎
Lemma 1.4.
Suppose that either or . Then, as
Proof.
Let, . If , and so , as .
Now, if , as
From (1), , which implies .
If , we have Then,
as . Hence, since , as for the case when . ∎
2 Growth Rate
Theorem 2.1.
Suppose that and . Then,
Proof.
By SLLN,
(7)
where Since ,
Hence, . Note that , so
Therefore,
(8)
∎
Corollary 2.2.
If is the optimal growth rate for testing vs , then
Proof.
From (1), we get , which implies . Similarly, from (2), we get . Hence,
(9)
The growth rate of an optimal robust test for vs cannot be better than , since any test for vs is a test for vs as well, for which we know that the growth rate can be at most . Therefore, the growth rate of our test can deviate from the optimal growth rate by at most , which is approximately for small positive values of ∎
Corollary 2.3.
Suppose that either or is finite. Then for any , there exists a sufficiently small , such that where is the optimal growth rate for testing vs .
Proof.
From (1), we get , which implies . Similarly, from (2), we get . Hence,
(10)
If either or is finite, Lemma 1.4 gives , as . So, we have that , as . Therefore, for any , there exists a sufficiently small , such that . Using arguments similar to those in the proof of the previous corollary, we get ∎
Theorem 2.4.
The growth rate of our test, , as .
Proof.
Define, and . We write them as As , and . Therefore, and almost surely as . Therefore, by the monotone convergence theorem, we have and , as . Since exists, we have , as
Case I: If , using Lemma 1.4 we have
Therefore, , as . Now, from Theorem 2.1 and Lemma 1.4, we have
(11)
And we must have, . Thus, , as .
Case II: If In this case, , as . Also, implies
Therefore, , as . From (9),
(12)
Therefore, in both cases we have , as . ∎
3 Simulations
In this section, we present a series of simulations designed to evaluate the performance of our robust SPRT. We use two key parameters in our analysis: , which represents the value of specified to the test algorithm, and , which denotes the true fraction of contaminated data.
Growth rate with different contamination
For this experiment, samples are simulated independently from for . This mixture model ensures that a fraction of the sample is drawn from the heavy-tailed Cauchy distribution with location and scale parameters and , respectively. Fig. 1 shows the growth of the test supermartingales when .
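A contaminated sample of this kind can be sketched as follows. The sketch assumes a unit-variance Gaussian base distribution and uses placeholder values for the Cauchy location and scale, the contamination fraction, and the truncation levels `c_lo` and `c_hi`; none of these are the settings used in the experiments. It tracks only the running sum of the log truncated likelihood ratio and omits any further normalisation that the test statistic in (3) may apply.

```python
import math
import random


def sample_contaminated(n, eps_true, mu=1.0, cauchy_loc=-1.0, cauchy_scale=10.0, seed=0):
    """Draw n points: N(mu, 1) with prob. 1 - eps_true, Cauchy with prob. eps_true.

    All default parameter values are hypothetical placeholders.
    """
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        if rng.random() < eps_true:
            # Inverse-CDF sampling of a Cauchy(cauchy_loc, cauchy_scale) variate.
            u = rng.random()
            xs.append(cauchy_loc + cauchy_scale * math.tan(math.pi * (u - 0.5)))
        else:
            xs.append(rng.gauss(mu, 1.0))
    return xs


def log_truncated_lr_path(xs, c_lo, c_hi, mu=1.0):
    """Running sum of the log likelihood ratio of N(mu, 1) vs N(0, 1), clipped to [log c_lo, log c_hi]."""
    lo, hi = math.log(c_lo), math.log(c_hi)
    total, path = 0.0, []
    for x in xs:
        log_lr = mu * x - 0.5 * mu**2
        total += min(hi, max(lo, log_lr))  # clip in log space to avoid overflow
        path.append(total)
    return path


xs = sample_contaminated(n=2000, eps_true=0.1)
path = log_truncated_lr_path(xs, c_lo=0.2, c_hi=5.0)
empirical_growth = path[-1] / len(path)  # time-averaged log increment
```

Because every increment lies between `log(c_lo)` and `log(c_hi)`, a Cauchy outlier cannot dominate the path, which is the behaviour the experiment is designed to show.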
Comparison with SPRT when actual data has no contamination
Here, samples are drawn independently from without adding any contamination. Our objective is to quantify the cost of safeguarding against potential adversarial contamination when none is actually present, in which case a naive SPRT could have been used instead. Fig. 2 shows the growth of our robust SPRT for different specified values of and the original SPRT.


Growth rate with different separation between null and alternative
For this experiment, samples are simulated independently from for . To ensure that the data is contaminated with potential outliers, a fraction of the sample is drawn from the heavy-tailed Cauchy distribution with location and scale parameters and , respectively. We consider the -robust test for vs , for . As anticipated, the growth rate of the robust test decreases as the null and alternative hypotheses become harder to distinguish.
