Strong uniform laws of large numbers for bootstrap means and other randomly weighted sums
Abstract
This article establishes novel strong uniform laws of large numbers for randomly weighted sums such as bootstrap means. By leveraging recent advances, these results extend previous work in their general applicability to a wide range of weighting procedures and in their flexibility with respect to the effective bootstrap sample size. In addition to the standard multinomial bootstrap and the $m$-out-of-$n$ bootstrap, our results apply to a large class of randomly weighted sums involving negatively orthant dependent (NOD) weights, including the Bayesian bootstrap, jackknife, resampling without replacement, simple random sampling with over-replacement, independent weights, and multivariate Gaussian weighting schemes. Weights are permitted to be non-identically distributed and possibly even negative. Our proof technique is based on extending a proof of the i.i.d. strong uniform law of large numbers to employ strong laws for randomly weighted sums; in particular, we exploit a recent Marcinkiewicz–Zygmund strong law for NOD weighted sums.
1 Introduction
The bootstrap (Efron and Tibshirani, 1994) and related resampling procedures such as bagging (Breiman, 1996), the Bayesian bootstrap (Rubin, 1981), and the jackknife (Efron, 1982) are widely used general-purpose tools for statistical inference. In addition to its original purpose of approximating sampling distributions of estimators (Efron, 1979), the bootstrap and its relatives have been applied to a variety of statistical tasks, including model averaging (Breiman, 1996), approximate Bayesian inference (Newton and Raftery, 1994), outlier detection (Singh and Xie, 2003), robust Bayesian inference (Huggins and Miller, 2019, 2022), and causal inference (Little and Badawy, 2019).
Due to its versatility, extensions of the bootstrap are frequently proposed to address new statistical questions. When establishing the properties of such methods, bootstrap versions of classical asymptotic results play a key role, such as the weak law of large numbers (Athreya et al., 1984, Theorem 1), the strong law of large numbers (Athreya et al., 1984, Theorem 2), and the central limit theorem (Singh, 1981) for bootstrap means.
Meanwhile, it is sometimes important to obtain convergence over an entire collection of random variables simultaneously, thus guaranteeing convergence for even the worst case in the collection. To this end, several authors have established uniform laws of large numbers for bootstrap means. Giné and Zinn (1990, Theorems 2.6 and 3.5) proved weak uniform laws of large numbers for the standard multinomial bootstrap, that is, with $\mathrm{Multinomial}(n;\ 1/n, \ldots, 1/n)$ weights. Vaart and Wellner (1996, Theorem 3.6.16) proved an analogous result for exchangeably weighted sums such as the Bayesian bootstrap. As weak laws, these show convergence in probability, but often one needs almost sure convergence, that is, a strong law.
Strong uniform laws of large numbers for bootstrap means are provided by Kosorok (2008, Section 10.2) for weighted sums with independent and identically distributed (i.i.d.) weights, with weights obtained by normalizing i.i.d. random variables, and with multinomial weights (Theorem 10.13, Corollary 10.14, and Theorem 10.15, respectively). However, these results do not apply to more general schemes such as the jackknife, resampling without replacement, and $\mathrm{Multinomial}(m_n;\ 1/n, \ldots, 1/n)$ weights.
In this article, we present new strong uniform laws of large numbers for randomly weighted sums, aiming to fill these gaps in the literature. Our first result applies to the case of $\mathrm{Multinomial}(m_n;\ 1/n, \ldots, 1/n)$ weights, known as the $m$-out-of-$n$ bootstrap. Our second and third results apply more generally to a large class of randomly weighted sums that involve negatively orthant dependent (NOD) weights. This covers a wide range of weighting schemes, including the Bayesian bootstrap (Rubin, 1981), various versions of the jackknife (Efron, 1982; Chatterjee and Bose, 2005), resampling without replacement (Bickel et al., 2012), simple random sampling with over-replacement (Antal and Tillé, 2011b), multinomial weights with unequal probabilities (Antal and Tillé, 2011a, Section 3), independent weights (Newton and Raftery, 1994), and even schemes involving negative weights such as multivariate Gaussian weights with non-positive correlations (Patak and Beaumont, 2009). All three theorems are flexible in terms of the effective bootstrap sample size (that is, the sum of the weights), for instance, allowing it to grow more slowly than $n$, which is of particular interest for certain applications (Bickel et al., 2012; Huggins and Miller, 2022).
2 Main results
We present three strong uniform laws of large numbers: Theorem 1 for the multinomial bootstrap, Theorem 2 for more general randomly weighted sums, and Theorem 3 which establishes faster convergence rates under stronger regularity and moment conditions than Theorem 2. All three results are obtained via extensions of the proof of the i.i.d. strong uniform law of large numbers presented by Ghosh and Ramamoorthi (2003). Specifically, our proof of Theorem 1 involves replacing the traditional strong law of large numbers with a strong law of large numbers for bootstrap means as presented by Arenal-Gutiérrez et al. (1996). Similarly, Theorems 2 and 3 rely on a strong law of large numbers for randomly weighted sums of random variables presented by Chen et al. (2019).
Condition 1.
Suppose $\Theta$ is a compact subset of a separable metric space. Let $X \sim P$ be a random variable taking values in a measurable space $\mathcal{X}$, and let $f$ be a real-valued function on $\mathcal{X} \times \Theta$ such that
- (i) for each $x \in \mathcal{X}$, $\theta \mapsto f(x, \theta)$ is continuous on $\Theta$, and
- (ii) for each $\theta \in \Theta$, $x \mapsto f(x, \theta)$ is a measurable function on $\mathcal{X}$.
Theorem 1.
Let $X_1, X_2, \ldots \overset{\mathrm{i.i.d.}}{\sim} P$ and, for $n = 1, 2, \ldots$ independently, let
$(W_{n1}, \ldots, W_{nn}) \sim \mathrm{Multinomial}(m_n;\ 1/n, \ldots, 1/n)$
independently of $X_1, X_2, \ldots$, where $m_n$ is a positive integer for each $n$. Assume Condition 1 and suppose there exists $\beta$ such that
(1) |
Then
(2)  $\displaystyle \sup_{\theta \in \Theta} \Big| \frac{1}{m_n} \sum_{i=1}^{n} W_{ni}\, f(X_i, \theta) - \mathbb{E} f(X_1, \theta) \Big| \xrightarrow[n \to \infty]{} 0 \quad \text{almost surely.}$
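As a purely illustrative numerical sketch (not part of the formal development), the uniform convergence asserted in Theorem 1 can be observed by simulation. Assuming $f(x, \theta) = \cos(\theta x)$ with $X \sim \mathcal{N}(0, 1)$, so that $\mathbb{E} f(X, \theta) = e^{-\theta^2/2}$ in closed form, the sup-distance over a grid on $\Theta = [0, 3]$ between the bootstrap weighted average and the limit shrinks as $n$ and $m_n$ grow:

```python
import numpy as np

rng = np.random.default_rng(0)
thetas = np.linspace(0.0, 3.0, 61)   # grid over the compact set Theta = [0, 3]
truth = np.exp(-thetas ** 2 / 2)     # E cos(theta X) for X ~ N(0, 1)

def sup_error(n, m):
    """sup over theta of |(1/m) sum_i W_ni f(X_i, theta) - E f(X, theta)|."""
    x = rng.standard_normal(n)                    # X_1, ..., X_n i.i.d. N(0, 1)
    w = rng.multinomial(m, np.ones(n) / n)        # Multinomial(m; 1/n, ..., 1/n)
    boot = (w / m) @ np.cos(np.outer(x, thetas))  # bootstrap weighted average
    return float(np.max(np.abs(boot - truth)))

errors = {n: sup_error(n, m=n) for n in (100, 1_000, 10_000)}
print(errors)  # sup-error shrinks as n grows
```

The grid of $\theta$ values and the choice $m_n = n$ are arbitrary illustrations; the theorem itself allows other effective sample sizes.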
In Theorems 2 and 3, we generalize beyond the standard multinomial bootstrap to weighting schemes involving NOD random variables (Lehmann, 1966; Chen et al., 2019).
Definition 1.
A finite collection of random variables $X_1, \ldots, X_k$ is said to be negatively orthant dependent (NOD) if

(3)  $\mathbb{P}(X_1 \le x_1, \ldots, X_k \le x_k) \le \prod_{i=1}^{k} \mathbb{P}(X_i \le x_i)$

and

(4)  $\mathbb{P}(X_1 > x_1, \ldots, X_k > x_k) \le \prod_{i=1}^{k} \mathbb{P}(X_i > x_i)$

for all $x_1, \ldots, x_k \in \mathbb{R}$. An infinite collection of random variables is NOD if every finite subcollection is NOD.
Any collection of independent random variables is NOD, and many commonly used multivariate distributions are NOD including the multinomial distribution, the Dirichlet distribution, the Dirichlet-multinomial distribution, the multivariate hypergeometric distribution, convolutions of multinomial distributions, and multivariate Gaussian distributions for which the correlations are all non-positive (Joag-Dev and Proschan, 1983).
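To make the NOD inequalities concrete, they can be verified exactly for a small multinomial example by enumerating the joint distribution (an illustrative check, not part of the formal development; the parameters $m = 4$ and $k = 3$ are arbitrary):

```python
import itertools
from math import comb, factorial, prod

m, k, p = 4, 3, 1.0 / 3.0   # (W1, W2, W3) ~ Multinomial(4; 1/3, 1/3, 1/3)

# All outcomes (w1, w2, w3) with w1 + w2 + w3 = m.
outcomes = [w for w in itertools.product(range(m + 1), repeat=k) if sum(w) == m]

def joint_pmf(w):
    coef = factorial(m)
    for wi in w:
        coef //= factorial(wi)
    return coef * p ** m

def marginal_cdf(a):
    # Each coordinate is marginally Binomial(m, p).
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(a + 1))

tol = 1e-12
nod_holds = True
for a in itertools.product(range(m + 1), repeat=k):
    # Lower-orthant probability P(W1 <= a1, ..., Wk <= ak) vs. product of marginals.
    lower = sum(joint_pmf(w) for w in outcomes
                if all(wi <= ai for wi, ai in zip(w, a)))
    # Upper-orthant probability P(W1 > a1, ..., Wk > ak) vs. product of marginals.
    upper = sum(joint_pmf(w) for w in outcomes
                if all(wi > ai for wi, ai in zip(w, a)))
    nod_holds &= lower <= prod(marginal_cdf(ai) for ai in a) + tol
    nod_holds &= upper <= prod(1 - marginal_cdf(ai) for ai in a) + tol

print(nod_holds)
```

Both orthant inequalities hold at every threshold vector, as guaranteed for the multinomial distribution by Joag-Dev and Proschan (1983).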
Theorem 2.
Let $X_1, X_2, \ldots \overset{\mathrm{i.i.d.}}{\sim} P$ and, for each $n$ independently, let $W_{n1}, \ldots, W_{nn}$ be NOD random variables, independent of $X_1, X_2, \ldots$. Assume Condition 1 and suppose , , and where and and satisfy . Then
(5) |
In particular, if the weights sum to $n$, then Equation 5 is analogous to Equation 2 with $n$ in place of the effective sample size. While the moment condition in Theorem 2 is slightly more stringent than in Theorem 1, it applies to a more general class of resampling procedures. For instance, the distribution of $W_{ni}$ can be different for each $n$ and $i$; in particular, there is no assumption that $W_{n1}, \ldots, W_{nn}$ are exchangeable or even identically distributed. Further, the weights are not restricted to being non-negative; thus, random weights taking positive and negative values are permitted.
The main limitation of Theorem 2 is that whenever , the condition that requires the weights to get smaller as $n$ increases. One can scale the weights up by a factor of , but then the leading factor of in Equation 5 becomes , effectively reverting back to the standard rate of convergence. In Theorem 3, we show that this condition can be dropped if we assume stronger regularity and moment conditions.
Condition 2.
Assume $f$ is uniformly locally Hölder continuous, in the sense that there exist $C > 0$, $\gamma \in (0, 1]$, and $r_0 > 0$ such that for all $x \in \mathcal{X}$ and $\theta, \theta' \in \Theta$, if $d(\theta, \theta') < r_0$ then $|f(x, \theta) - f(x, \theta')| \le C\, d(\theta, \theta')^{\gamma}$.
Condition 3.
Assume $N(\varepsilon, \Theta) \le C_0\, \varepsilon^{-d}$ for some $C_0 > 0$ and $d > 0$, where $N(\varepsilon, \Theta)$ is the smallest number of open balls of radius $\varepsilon$, centered at points in $\Theta$, needed to cover $\Theta$.
Condition 3 holds for any compact $\Theta \subseteq \mathbb{R}^{d}$; indeed, by Shalev-Shwartz and Ben-David (2014, Example 27.1), $N'(\varepsilon, \Theta) \le (2\,\mathrm{diam}(\Theta)\sqrt{d}/\varepsilon)^{d}$, where $N'(\varepsilon, \Theta)$ is the number of $\varepsilon$-balls needed to cover $\Theta$, centered at any points in $\mathbb{R}^{d}$, and it is straightforward to verify that $N(\varepsilon, \Theta) \le N'(\varepsilon/2, \Theta)$.
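As a quick sanity check on this covering-number scaling (an illustrative sketch using an explicit grid of centers, with the unit cube standing in for a generic compact set), one can count the Euclidean balls in a regular grid covering of $[0, 1]^d$ and confirm the $O(\varepsilon^{-d})$ growth that Condition 3 requires:

```python
import math

def grid_cover_size(d, eps):
    """Number of Euclidean eps-balls, centered on a regular grid, covering [0, 1]^d.

    With grid spacing 2 * eps / sqrt(d), every point of the cube lies within
    eps / sqrt(d) of a center in each coordinate, hence within eps in
    Euclidean norm.
    """
    per_axis = math.ceil(math.sqrt(d) / (2 * eps))
    return per_axis ** d

# Ball counts scale like C / eps**d, consistent with Condition 3 (here d = 2).
sizes = [grid_cover_size(2, 1 / 2 ** j) for j in range(1, 5)]
print(sizes)
```

Halving $\varepsilon$ roughly quadruples the count in dimension 2, matching the $\varepsilon^{-d}$ rate.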
Theorem 3.
Note that as from above, the bound on the power approaches .
3 Examples of relevant resampling schemes
A key condition of Theorems 2 and 3 is that the weights must be NOD. It turns out that most popular resampling techniques satisfy this condition.
For instance, the $m$-out-of-$n$ bootstrap corresponds to $\mathrm{Multinomial}(m;\ 1/n, \ldots, 1/n)$ weights, unequal probability sampling with replacement corresponds to multinomial weights with unequal probabilities (Antal and Tillé, 2011a), the Bayesian bootstrap corresponds to Dirichlet weights, the delete-$d$ jackknife and resampling without replacement correspond to multivariate hypergeometric weights (Chatterjee and Bose, 2005), the weighted likelihood bootstrap is equivalent to using independent weights (Newton and Raftery, 1994), and the reweighting scheme of Patak and Beaumont (2009) employs multivariate Gaussian weights with non-positive correlations. All of these distributions satisfy the NOD requirement (Joag-Dev and Proschan, 1983).
The NOD requirement is also satisfied by less standard reweighting schemes such as the downweight-$d$ jackknife (Chatterjee and Bose, 2005) and simple random sampling with over-replacement (Antal and Tillé, 2011b). In the downweight-$d$ jackknife, $d$ indices are selected uniformly at random to be downweighted, whereas the remaining $n - d$ indices are upweighted accordingly. These weights can be viewed as a monotonic transformation of the multivariate hypergeometric weights corresponding to the delete-$d$ jackknife, and thus are NOD (Chen et al., 2019). For simple random sampling with over-replacement, the weights can be viewed as the conditional distribution of a sequence of independent geometric random variables, given that their sum equals $n$. Geometric random variables satisfy the conditions of Theorem 2.6 from Joag-Dev and Proschan (1983) according to Efron (1965, 3.1), implying that the resulting weights are NOD.
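The following sketch generates weights for several of the schemes above. The numerical choices (sample size, downweighting level $\varepsilon = 0.1$) are illustrative only, and the delete-$d$ and downweight-$d$ normalizations shown are one common convention, not necessarily the exact ones used in the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 10, 10, 3

# m-out-of-n bootstrap: Multinomial(m; 1/n, ..., 1/n) weights.
w_boot = rng.multinomial(m, np.ones(n) / n)

# Bayesian bootstrap: flat Dirichlet weights, scaled to sum to n.
w_bayes = n * rng.dirichlet(np.ones(n))

# delete-d jackknife / resampling without replacement: keep n - d points
# (a multivariate hypergeometric 0/1 pattern), rescaled to sum to n.
keep = rng.choice(n, size=n - d, replace=False)
w_jack = np.zeros(n)
w_jack[keep] = n / (n - d)

# downweight-d jackknife: d random indices get a small weight eps, and the
# rest are upweighted so the weights still sum to n (eps = 0.1 is arbitrary).
eps = 0.1
down = rng.choice(n, size=d, replace=False)
w_down = np.full(n, (n - d * eps) / (n - d))
w_down[down] = eps

print(w_boot.sum(), w_bayes.sum(), w_jack.sum(), w_down.sum())
```

In each case the sum of the weights is the effective bootstrap sample size ($m$ for the first scheme, $n$ for the others), the quantity that Theorems 1–3 allow to vary.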
A general family of applicable NOD reweighting schemes can also be derived from the Farlie-Gumbel-Morgenstern (FGM) $n$-copula (Nelsen, 2006, page 108). For variables $U_1, \ldots, U_n$, an FGM copula is characterized by $2^n - n - 1$ parameters, one for each subset of $\{1, \ldots, n\}$ that contains at least two elements. Let $\theta_S$, for $S \subseteq \{1, \ldots, n\}$ with $|S| \ge 2$, denote these parameters. The density of the FGM copula is given by
(7)  $c(u_1, \ldots, u_n) = 1 + \sum_{S : |S| \ge 2} \theta_S \prod_{j \in S} (1 - 2 u_j)$
where the values $\theta_S$ must adhere to the constraints

(8)  $1 + \sum_{S : |S| \ge 2} \theta_S \prod_{j \in S} \varepsilon_j \ge 0$

for all $\varepsilon_1, \ldots, \varepsilon_n \in \{-1, +1\}$, to ensure non-negativity of the density function. It follows that each $U_i$ is marginally uniformly distributed, and for each $i \ne j$,
(9) | ||||
(10) |
Therefore, for any ,
(11) |
An FGM $n$-copula is thus NOD if and only if
(12) | |||
(13) |
for all . Because NOD is preserved under monotonic transformations, it follows that any multivariate distribution corresponding to such an FGM copula with parameters satisfying the criteria in Equations 12 and 13 will be NOD.
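For intuition, consider the bivariate case, where the FGM copula takes the standard form $C(u, v) = uv\,[1 + \theta (1-u)(1-v)]$ with $|\theta| \le 1$, and the NOD criterion reduces to $\theta \le 0$. This can be checked numerically on a grid (an illustrative sketch; the grid resolution and test values of $\theta$ are arbitrary):

```python
import numpy as np

def fgm_cdf(u, v, theta):
    """Bivariate FGM copula C(u, v) = u v [1 + theta (1 - u)(1 - v)]."""
    return u * v * (1 + theta * (1 - u) * (1 - v))

grid = np.linspace(0.0, 1.0, 101)
U, V = np.meshgrid(grid, grid)

def is_nod(theta):
    # Lower-orthant condition: C(u, v) <= u v for all (u, v). For two
    # uniform marginals the upper-orthant condition is equivalent, since
    # P(U > u, V > v) - (1 - u)(1 - v) = C(u, v) - u v.
    return bool(np.all(fgm_cdf(U, V, theta) <= U * V + 1e-12))

print(is_nod(-0.8), is_nod(0.5))
```

Negative $\theta$ passes the orthant check while positive $\theta$ fails it, matching the sign criterion above.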
Finally, the condition that in Theorem 3 holds for all of the aforementioned reweighting procedures without any additional assumptions, except for a few cases. For the case, it holds as long as there exists a constant such that for all and . For the Gaussian reweighting scheme (Patak and Beaumont, 2009), it holds as long as . For the independent weights and FGM copula cases, it holds when .
4 Proofs
We begin by stating a strong law of large numbers for the bootstrap mean due to Arenal-Gutiérrez et al. (1996, Theorem 2.1). This lemma plays a key role in our proof of Theorem 1.
Lemma 1.
Let $X_1, X_2, \ldots \overset{\mathrm{i.i.d.}}{\sim} P$ and, for $n = 1, 2, \ldots$ independently, let
$(W_{n1}, \ldots, W_{nn}) \sim \mathrm{Multinomial}(m_n;\ 1/n, \ldots, 1/n)$
independently of $X_1, X_2, \ldots$, where $m_n$ is a positive integer for each $n$. Suppose there exists a $\beta$ such that
(14) |
Then
(15)  $\displaystyle \frac{1}{m_n} \sum_{i=1}^{n} W_{ni} X_i \xrightarrow[n \to \infty]{} \mathbb{E} X_1 \quad \text{almost surely.}$
Proof.
See Arenal-Gutiérrez et al. (1996, Theorem 2.1) for the proof. ∎
Proof of Theorem 1.
Our argument is based on the proof of Theorem 1.3.3 of Ghosh and Ramamoorthi (2003), except that we use Lemma 1 in place of the strong law of large numbers for i.i.d. random variables.
Condition 1 ensures that $\sup_{\theta \in \Theta} |f(\cdot, \theta)|$ is measurable; this can be seen by letting $\Theta_0$ be a countable dense subset of $\Theta$, verifying that $\sup_{\theta \in \Theta} |f(x, \theta)| = \sup_{\theta \in \Theta_0} |f(x, \theta)|$, and using Folland (2013, Proposition 2.7). Define $g(\theta) = \mathbb{E} f(X_1, \theta)$, and note that $g$ is continuous by the dominated convergence theorem. Let $B_r(\theta)$ denote the open ball of radius $r$ at $\theta$, where $d$ is the metric on $\Theta$. For $\theta \in \Theta$, $r > 0$, and $x$, define
(16) |
and observe that by continuity and compactness,
(17) |
Applying the dominated convergence theorem again, we have for all . Thus, for any , by compactness of there exist , , and such that and for all . Choosing according to the statement of Theorem 1,
(18) | ||||
(19) |
since . For all , by applying Lemma 1 with we have that
(20) |
Similarly, by Lemma 1 with ,
(21) |
Thus, for any , by choosing such that , we have
(22) | ||||
by the triangle inequality and Equation 16. Letting denote the right-hand side of Equation 22, we have a.s. by Equations 20 and 21. Therefore,
(23) |
Since $\epsilon$ is arbitrary, Equation 23 holds almost surely for $\epsilon = 1/k$ for all $k \in \mathbb{N}$, completing the proof. ∎
The following result, due to Chen et al. (2019), is a more general version of Lemma 1 that extends beyond the standard bootstrap to negatively orthant dependent (NOD) weights.
Lemma 2.
Let $X_1, X_2, \ldots$ be identically distributed NOD random variables. For each $n$ independently, let $W_{n1}, \ldots, W_{nn}$ be NOD random variables, independent of $X_1, X_2, \ldots$. Suppose and where and either (a) and satisfy , or (b) the weights are identically distributed for all $n$, and . Then
(24) |
Proof.
See Chen et al. (2019, Theorem 1 and Corollary 1) for the proof. ∎
Proof of Theorem 2.
The proof is similar to the proof of Theorem 1, except that we use Lemma 2 instead of Lemma 1, and some modifications are needed to handle more general distributions of the weights $W_{ni}$.
As in the proof of Theorem 1, define by Equation 16 and . As before, for any , there exist , , and such that and for all . Just as in Equation 18,
For all , by applying Lemma 2 with and , respectively,
(25) | |||
(26) |
Let and . Then and are each NOD because they are monotone transformations of the $W_{ni}$'s (Chen et al., 2019). Thus, Lemma 2 applied to with and , respectively, yields
(27) | ||||
(28) |
By adding Equations 27 and 28 and using the fact that , we have
(29) |
Since , Equation 29 implies that, almost surely,
(30) |
where . Note that by the assumption that . For any , choosing such that , we can write
(31) | ||||
Summing Equation 31 over , using the triangle inequality, and employing Equation 16,
(32) | ||||
Letting denote the right-hand side of Equation 32, we have a.s. by Equations 30, 25, and 26, along with the fact that is continuous and is compact. Therefore, almost surely,
(33) |
As before, since $\epsilon$ is arbitrary, Equation 33 holds almost surely for $\epsilon = 1/k$ for all $k \in \mathbb{N}$, completing the proof. ∎
Proof of Theorem 3.
First, we may assume without loss of generality that $f$ and the weights are nonnegative, since we can write $f = f^{+} - f^{-}$ and $W_{ni} = W_{ni}^{+} - W_{ni}^{-}$, and apply the result in the nonnegative case to each of the four combinations of $W_{ni}^{+}$ and $W_{ni}^{-}$ with $f^{+}$ and $f^{-}$ to obtain the result in the general case; see the proof of Theorem 1 from Chen et al. (2019) for a similar argument.
Let . As before, define by Equation 16 and . For , define and , where is defined in Condition 3. Let be the centers of balls of radius that cover , that is, .
Let . Note that since . Thus, by Lemma 2 with ,
(34) |
For all sufficiently large that , we have by Condition 2, and thus,
(35) |
where . By another application of Lemma 2 with ,
(36) |
since . Defining , we claim that
(37) |
Assuming Equation 37 for the moment, we use the same decomposition as in Equation 32 and plug in Equations 35–37 to obtain
Since is arbitrarily small, this will yield the result of the theorem.
To complete the proof, we need to show Equation 37. For each and , by Chen et al. (2019, Lemma 1), is an NOD sequence since we have assumed without loss of generality that $f$ and the weights are nonnegative, and adding constants preserves the NOD property. Asadian et al. (2006) and Rivaz et al. (2007) provide moment inequalities that are useful in this context. By Asadian et al. (2006, Corollary 2.2) (also see Rivaz et al., 2007, Corollary 3), since and ,
(38) |
where is a universal constant that depends only on . Letting , we have
(39) |
Likewise, since for any random variable ,
(40) |
Plugging Equations 39 and 40 into Equation 38, we have
(41) |
where . By Condition 3, . Along with Markov’s inequality and Equation 41, this implies
where because by assumption. Hence,
(42) |
Therefore, Equation 37 holds by the Borel–Cantelli lemma. This completes the proof. ∎
References
- Antal and Tillé (2011a) E. Antal and Y. Tillé. A direct bootstrap method for complex sampling designs from a finite population. Journal of the American Statistical Association, 106(494):534–543, 2011a.
- Antal and Tillé (2011b) E. Antal and Y. Tillé. Simple random sampling with over-replacement. Journal of Statistical Planning and Inference, 141(1):597–601, 2011b.
- Arenal-Gutiérrez et al. (1996) E. Arenal-Gutiérrez, C. Matrán, and J. A. Cuesta-Albertos. On the unconditional strong law of large numbers for the bootstrap mean. Statistics & Probability Letters, 27(1):49–60, 1996.
- Asadian et al. (2006) N. Asadian, V. Fakoor, and A. Bozorgnia. Rosenthal’s Type Inequalities for Negatively Orthant Dependent Random Variables. Journal of the Iranian Statistical Society, 5(1 and 2):69–75, 2006.
- Athreya et al. (1984) K. B. Athreya, M. Ghosh, L. Y. Low, and P. K. Sen. Laws of large numbers for bootstrapped U-statistics. Journal of Statistical Planning and Inference, 9(2):185–194, 1984.
- Bickel et al. (2012) P. J. Bickel, F. Götze, and W. R. van Zwet. Resampling fewer than n observations: gains, losses, and remedies for losses. In Selected works of Willem van Zwet, pages 267–297. Springer, 2012.
- Breiman (1996) L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
- Chatterjee and Bose (2005) S. Chatterjee and A. Bose. Generalized bootstrap for estimating equations. The Annals of Statistics, 33(1):414–436, 2005.
- Chen et al. (2019) P. Chen, T. Zhang, and S. H. Sung. Strong laws for randomly weighted sums of random variables and applications in the bootstrap and random design regression. Statistica Sinica, 29(4):1739–1749, 2019.
- Efron (1965) B. Efron. Increasing properties of Pólya frequency functions. The Annals of Mathematical Statistics, pages 272–279, 1965.
- Efron (1979) B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, pages 1–26, 1979.
- Efron (1982) B. Efron. The jackknife, the bootstrap and other resampling plans. SIAM, 1982.
- Efron and Tibshirani (1994) B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994.
- Folland (2013) G. B. Folland. Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, 2013.
- Ghosh and Ramamoorthi (2003) J. K. Ghosh and R. V. Ramamoorthi. Bayesian Nonparametrics. Springer Series in Statistics. Springer, New York, NY, 2003. ISBN 0387955372.
- Giné and Zinn (1990) E. Giné and J. Zinn. Bootstrapping general empirical measures. The Annals of Probability, pages 851–869, 1990.
- Huggins and Miller (2019) J. H. Huggins and J. W. Miller. Robust inference and model criticism using bagged posteriors. arXiv preprint arXiv:1912.07104, 2019.
- Huggins and Miller (2022) J. H. Huggins and J. W. Miller. Reproducible model selection using bagged posteriors. Bayesian Analysis, pages 1–26, 2022.
- Joag-Dev and Proschan (1983) K. Joag-Dev and F. Proschan. Negative association of random variables with applications. The Annals of Statistics, pages 286–295, 1983.
- Kosorok (2008) M. R. Kosorok. Introduction to Empirical Processes and Semiparametric Inference. Springer, 2008.
- Lehmann (1966) E. L. Lehmann. Some concepts of dependence. The Annals of Mathematical Statistics, 37(5):1137–1153, 1966.
- Little and Badawy (2019) M. A. Little and R. Badawy. Causal bootstrapping. arXiv preprint arXiv:1910.09648, 2019.
- Nelsen (2006) R. B. Nelsen. An Introduction to Copulas. Springer, 2006.
- Newton and Raftery (1994) M. A. Newton and A. E. Raftery. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society: Series B (Methodological), 56(1):3–26, 1994.
- Patak and Beaumont (2009) Z. Patak and J.-F. Beaumont. Generalized bootstrap for prices surveys. 57th Session of the International Statistical Institute, Durban, South-Africa, 2009.
- Rivaz et al. (2007) F. Rivaz, M. Amini, and A. G. Bozorgnia. Moment Inequalities and Applications for Negative Dependence Random Variables. 26(4):7–11, 2007.
- Rubin (1981) D. B. Rubin. The Bayesian bootstrap. The Annals of Statistics, pages 130–134, 1981.
- Shalev-Shwartz and Ben-David (2014) S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
- Singh (1981) K. Singh. On the asymptotic accuracy of Efron’s bootstrap. The Annals of Statistics, pages 1187–1195, 1981.
- Singh and Xie (2003) K. Singh and M. Xie. Bootlier-plot: bootstrap based outlier detection plot. Sankhyā: The Indian Journal of Statistics, pages 532–559, 2003.
- Vaart and Wellner (1996) A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.