
Strong uniform laws of large numbers for bootstrap means and other randomly weighted sums

Neil A. Spencer and Jeffrey W. Miller
Abstract

This article establishes novel strong uniform laws of large numbers for randomly weighted sums such as bootstrap means. By leveraging recent advances, these results extend previous work in their general applicability to a wide range of weighting procedures and in their flexibility with respect to the effective bootstrap sample size. In addition to the standard multinomial bootstrap and the $m$-out-of-$n$ bootstrap, our results apply to a large class of randomly weighted sums involving negatively orthant dependent (NOD) weights, including the Bayesian bootstrap, the jackknife, resampling without replacement, simple random sampling with over-replacement, independent weights, and multivariate Gaussian weighting schemes. Weights are permitted to be non-identically distributed and possibly even negative. Our proof technique is based on extending a proof of the i.i.d. strong uniform law of large numbers to employ strong laws for randomly weighted sums; in particular, we exploit a recent Marcinkiewicz–Zygmund strong law for NOD weighted sums.

1 Introduction

The bootstrap (Efron and Tibshirani, 1994) and related resampling procedures such as bagging (Breiman, 1996), the Bayesian bootstrap (Rubin, 1981), and the jackknife (Efron, 1982) are widely used general-purpose tools for statistical inference. In addition to its original purpose of approximating sampling distributions of estimators (Efron, 1979), the bootstrap and its relatives have been applied to a variety of statistical tasks, including model averaging (Breiman, 1996), approximate Bayesian inference (Newton and Raftery, 1994), outlier detection (Singh and Xie, 2003), robust Bayesian inference (Huggins and Miller, 2019, 2022), and causal inference (Little and Badawy, 2019).

Due to its versatility, extensions of the bootstrap are frequently proposed to address new statistical questions. When establishing the properties of such methods, bootstrap versions of classical asymptotic results play a key role, such as the weak law of large numbers (Athreya et al., 1984, Theorem 1), the strong law of large numbers (Athreya et al., 1984, Theorem 2), and the central limit theorem (Singh, 1981) for bootstrap means.

Meanwhile, it is sometimes important to obtain convergence over an entire collection of random variables simultaneously, thus guaranteeing convergence for even the worst case in the collection. To this end, several authors have established uniform laws of large numbers for bootstrap means. Giné and Zinn (1990, Theorems 2.6 and 3.5) proved weak uniform laws of large numbers for the standard multinomial bootstrap, that is, with $\mathrm{Multinomial}(n,(1/n,\ldots,1/n))$ weights. van der Vaart and Wellner (1996, Theorem 3.6.16) proved an analogous result for exchangeably weighted sums such as the Bayesian bootstrap. As weak laws, these show convergence in probability, but often one needs almost sure convergence, that is, a strong law.

Strong uniform laws of large numbers for bootstrap means are provided by Kosorok (2008, Section 10.2) for weighted sums with independent and identically distributed (i.i.d.) weights, with weights obtained by normalizing $n$ i.i.d. random variables, and with $\mathrm{Multinomial}(n,(1/n,\ldots,1/n))$ weights (Theorem 10.13, Corollary 10.14, and Theorem 10.15, respectively). However, these results do not apply to more general schemes such as the jackknife, resampling without replacement, and $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ weights.

In this article, we present new strong uniform laws of large numbers for randomly weighted sums, aiming to fill these gaps in the literature. Our first result applies to the case of $\mathrm{Multinomial}(m_n,(1/n,\ldots,1/n))$ weights, known as the $m$-out-of-$n$ bootstrap. Our second and third results apply more generally to a large class of randomly weighted sums that involve negatively orthant dependent (NOD) weights. This covers a wide range of weighting schemes, including the Bayesian bootstrap (Rubin, 1981), various versions of the jackknife (Efron, 1982; Chatterjee and Bose, 2005), resampling without replacement (Bickel et al., 2012), simple random sampling with over-replacement (Antal and Tillé, 2011b), $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ weights (Antal and Tillé, 2011a, Section 3), independent weights (Newton and Raftery, 1994), and even schemes involving negative weights such as multivariate Gaussian weights with non-positive correlations (Patak and Beaumont, 2009). All three theorems are flexible in terms of the effective bootstrap sample size $m_n$ (that is, the sum of the weights), for instance, allowing $m_n=o(n)$, which is of particular interest for certain applications (Bickel et al., 2012; Huggins and Miller, 2022).

The article is organized as follows. We present our main results in Section 2, Section 3 provides some examples of relevant re-weighting schemes, and the proofs of the main results are provided in Section 4.

2 Main results

We present three strong uniform laws of large numbers: Theorem 1 for the multinomial bootstrap, Theorem 2 for more general randomly weighted sums, and Theorem 3, which establishes faster convergence rates under stronger regularity and moment conditions than Theorem 2. All three results are obtained via extensions of the proof of the i.i.d. strong uniform law of large numbers presented by Ghosh and Ramamoorthi (2003). Specifically, our proof of Theorem 1 involves replacing the traditional strong law of large numbers with a strong law of large numbers for bootstrap means as presented by Arenal-Gutiérrez et al. (1996). Similarly, Theorems 2 and 3 rely on a strong law of large numbers for randomly weighted sums of random variables presented by Chen et al. (2019).

Condition 1.

Suppose $\Theta$ is a compact subset of a separable metric space. Let $u\in\mathbb{N}$ and let $H(\theta,x)$ be a real-valued function on $\Theta\times\mathbb{R}^u$ such that

  • (i) for each $x\in\mathbb{R}^u$, $\theta\mapsto H(\theta,x)$ is continuous on $\Theta$, and

  • (ii) for each $\theta\in\Theta$, $x\mapsto H(\theta,x)$ is a measurable function on $\mathbb{R}^u$.

Theorem 1.

Let $X_1,X_2,\ldots\in\mathbb{R}^u$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let

\[
(W_{n1},\ldots,W_{nn})\sim\mathrm{Multinomial}\big(m_n,(1/n,\ldots,1/n)\big)
\]

independently of $(X_1,X_2,\ldots)$, where $m_n$ is a positive integer for each $n\in\mathbb{N}$. Assume Condition 1 and suppose there exists $\delta\in[0,1)$ such that

\[
\lim_{n\to\infty}\frac{n^{1-\delta}\log(n)}{m_n}=0 \quad\text{and}\quad \mathbb{E}\Big(\sup_{\theta\in\Theta}|H(\theta,X_1)|^{\frac{1}{1-\delta}}\Big)<\infty. \tag{1}
\]

Then

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta,X_j)-\mathbb{E}\big(H(\theta,X_1)\big)\bigg|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{2}
\]
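
To make Theorem 1 concrete, the following is a minimal numerical sketch (our illustration, not part of the paper's formal development), assuming $H(\theta,x)=|x-\theta|$ on $\Theta=[-1,1]$, standard Gaussian data, and $m_n=n$; the grid over $\Theta$, the sample sizes, and the use of numpy/scipy are all illustrative assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
thetas = np.linspace(-1.0, 1.0, 201)  # finite grid standing in for compact Theta

def sup_error(n, m_n):
    x = rng.standard_normal(n)                     # X_1, ..., X_n i.i.d. N(0,1)
    w = rng.multinomial(m_n, np.full(n, 1.0 / n))  # (W_n1, ..., W_nn)
    H = np.abs(x[None, :] - thetas[:, None])       # H(theta, X_j) = |X_j - theta|
    boot = H @ w / m_n                             # (1/m_n) sum_j W_nj H(theta, X_j)
    exact = 2 * norm.pdf(thetas) + thetas * (2 * norm.cdf(thetas) - 1)  # E|X - theta|
    return np.max(np.abs(boot - exact))            # sup over the theta grid

for n in [100, 1_000, 10_000]:
    print(n, sup_error(n, m_n=n))                  # shrinks as n grows, as in Equation 2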

In Theorems 2 and 3, we generalize beyond the standard multinomial bootstrap to weighting schemes involving NOD random variables (Lehmann, 1966; Chen et al., 2019).

Definition 1.

A finite collection of random variables $X_1,\ldots,X_n\in\mathbb{R}$ is said to be negatively orthant dependent (NOD) if

\[
\mathbb{P}(X_1\leq x_1,\ldots,X_n\leq x_n)\leq\prod_{i=1}^n\mathbb{P}(X_i\leq x_i) \tag{3}
\]

and

\[
\mathbb{P}(X_1>x_1,\ldots,X_n>x_n)\leq\prod_{i=1}^n\mathbb{P}(X_i>x_i) \tag{4}
\]

for all $x_1,\ldots,x_n\in\mathbb{R}$. An infinite collection of random variables is NOD if every finite subcollection is NOD.

Any collection of independent random variables is NOD, and many commonly used multivariate distributions are NOD, including the multinomial distribution, the Dirichlet distribution, the Dirichlet-multinomial distribution, the multivariate hypergeometric distribution, convolutions of multinomial distributions, and multivariate Gaussian distributions for which the correlations are all non-positive (Joag-Dev and Proschan, 1983).
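
For intuition, the following Monte Carlo sketch (an illustration under assumed parameter choices, not a proof) estimates both sides of Equations 3 and 4 for multinomial weights at one fixed point of the orthant.

import numpy as np

rng = np.random.default_rng(1)
W = rng.multinomial(6, [1/3, 1/3, 1/3], size=200_000)  # rows are (W_1, W_2, W_3)
x = np.array([2, 2, 2])                                # an arbitrary orthant corner

joint_low = np.mean(np.all(W <= x, axis=1))            # left side of Equation 3
prod_low = np.prod([(W[:, i] <= x[i]).mean() for i in range(3)])
joint_up = np.mean(np.all(W > x, axis=1))              # left side of Equation 4
prod_up = np.prod([(W[:, i] > x[i]).mean() for i in range(3)])
print(joint_low, "<=", prod_low, "and", joint_up, "<=", prod_up)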

Theorem 2.

Let $X_1,X_2,\ldots\in\mathbb{R}^u$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let $W_{n1},\ldots,W_{nn}\in\mathbb{R}$ be NOD random variables, independent of $(X_1,X_2,\ldots)$. Assume Condition 1 and suppose $\mathbb{E}(\sup_{\theta\in\Theta}|H(\theta,X_1)|^\beta)<\infty$, $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^\alpha)=O(n)$, and $\sum_{j=1}^n\mathbb{E}(|W_{nj}|)=O(n^{1/p})$, where $p\in[1,2)$, and $\alpha>2p$ and $\beta>1$ satisfy $\alpha^{-1}+\beta^{-1}=p^{-1}$. Then

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}\big(W_{nj}H(\theta,X_j)\big)\Big)\bigg|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{5}
\]

In particular, if $n^{-1/p}\sum_{j=1}^n\mathbb{E}(W_{nj})\to 1$, then Equation 5 is analogous to Equation 2 with $n^{1/p}$ in place of $m_n$. While the moment condition on $H(\theta,X_1)$ in Theorem 2 is slightly more stringent than in Theorem 1, it applies to a more general class of resampling procedures. For instance, the distribution of $W_{nj}$ can be different for each $n$ and $j$; in particular, there is no assumption that $W_{n1},\ldots,W_{nn}$ are exchangeable or even identically distributed. Further, the weights $W_{nj}$ are not restricted to being non-negative; thus, random weights taking positive and negative values are permitted.

The main limitation of Theorem 2 is that whenever $p>1$, the condition that $\sum_{j=1}^n\mathbb{E}(|W_{nj}|)=O(n^{1/p})$ requires the weights to become smaller as $n$ increases. One can scale the weights up by a factor of $n^{1-1/p}$, but then the leading factor of $1/n^{1/p}$ in Equation 5 becomes $1/n$, effectively reverting to the standard rate of convergence. In Theorem 3, we show that this condition can be dropped if we assume stronger regularity and moment conditions.

Condition 2.

Assume $\theta\mapsto H(\theta,x)$ is uniformly locally Hölder continuous, in the sense that there exist $a>0$, $M>0$, and $\delta>0$ such that for all $x\in\mathbb{R}^u$ and $\theta,\theta'\in\Theta$, if $d(\theta,\theta')<\delta$ then $|H(\theta,x)-H(\theta',x)|\leq M\,d(\theta,\theta')^a$.

Condition 3.

Assume $N(r)\leq c\,r^{-D}$ for some $c>0$ and $D>0$, where $N(r)$ is the smallest number of open balls of radius $r$, centered at points in $\Theta$, needed to cover $\Theta$.

Condition 3 holds for any compact $\Theta\subset\mathbb{R}^D$; indeed, by Shalev-Shwartz and Ben-David (2014, Example 27.1), $N_1(r)\leq c_1 r^{-D}$, where $N_1(r)$ is the number of $r$-balls needed to cover $\Theta$, centered at any points in $\mathbb{R}^D$, and it is straightforward to verify that $N(r)\leq N_1(r/2)$.
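
To illustrate Condition 3, here is a small sketch (under assumed conventions) for the unit cube $[0,1]^D$: a regular grid of balls centered at cube centers gives a cover of size $O(r^{-D})$.

import math

def grid_cover_size(r: float, D: int = 2) -> int:
    # cubes of side s = 2r/sqrt(D) have circumradius r, so balls of radius r
    # centered at the cube centers cover [0,1]^D
    s = 2.0 * r / math.sqrt(D)
    return math.ceil(1.0 / s) ** D   # at most (sqrt(D)/(2r) + 1)^D = O(r^{-D})

for r in [0.5, 0.1, 0.01]:
    print(r, grid_cover_size(r))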

Theorem 3.

Let $X_1,X_2,\ldots\in\mathbb{R}^u$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let $W_{n1},\ldots,W_{nn}\in\mathbb{R}$ be NOD random variables, independent of $(X_1,X_2,\ldots)$. Assume Conditions 1, 2, and 3. Let $p\in(1,2)$ and suppose $\sup_{\theta\in\Theta}\mathbb{E}(|H(\theta,X_1)|^q)<\infty$ and $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^q)=O(n)$ for some $q>((1-1/p)D/a+1)/(1/p-1/2)$. Then

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}\big(W_{nj}H(\theta,X_j)\big)\Big)\bigg|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{6}
\]

Note that as $p\to 1$ from above, the bound on the power $q$ approaches $2$.

3 Examples of relevant resampling schemes

A key condition of Theorems 2 and 3 is that the weights $W_{n1},\ldots,W_{nn}$ must be NOD. It turns out that most popular resampling techniques satisfy this condition.

For instance, the $m$-out-of-$n$ bootstrap corresponds to $\mathrm{Multinomial}(m_n,(1/n,\ldots,1/n))$ weights, unequal probability sampling with replacement corresponds to $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ weights (Antal and Tillé, 2011a), the Bayesian bootstrap corresponds to Dirichlet weights, the delete-$d$ jackknife and resampling without replacement correspond to multivariate hypergeometric weights (Chatterjee and Bose, 2005), the weighted likelihood bootstrap is equivalent to using independent weights (Newton and Raftery, 1994), and the reweighting scheme of Patak and Beaumont (2009) employs multivariate Gaussian weights with non-positive correlations. All of these distributions satisfy the NOD requirement (Joag-Dev and Proschan, 1983).
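
The following Python sketch illustrates how weight vectors for several of these schemes can be generated; all distributional and parameter choices (sample sizes, Dirichlet parameters, the Gaussian covariance) are illustrative assumptions, and scaling conventions vary across the literature.

import numpy as np

rng = np.random.default_rng(2)
n, m_n, d = 10, 10, 3

# m-out-of-n bootstrap: Multinomial(m_n, (1/n, ..., 1/n)) weights
w_boot = rng.multinomial(m_n, np.full(n, 1.0 / n))

# unequal probability sampling with replacement: Multinomial(m_n, (p_n1, ..., p_nn))
p = rng.dirichlet(np.ones(n))
w_unequal = rng.multinomial(m_n, p)

# Bayesian bootstrap: flat Dirichlet weights (sum to 1; rescale by n if desired)
w_bayes = rng.dirichlet(np.ones(n))

# delete-d jackknife / resampling without replacement: 0/1 weights on a
# uniformly chosen subset of n - d indices (multivariate hypergeometric)
w_jack = np.zeros(n)
w_jack[rng.choice(n, size=n - d, replace=False)] = 1.0

# weighted likelihood bootstrap: independent (here Exponential(1)) weights
w_wlb = rng.exponential(1.0, size=n)

# Gaussian weights with non-positive correlations: mean 1, covariance I - J/n,
# whose off-diagonal entries are -1/n (a valid positive semidefinite choice)
w_gauss = rng.multivariate_normal(np.ones(n), np.eye(n) - np.ones((n, n)) / n)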

The NOD requirement is also satisfied by less standard reweighting schemes such as the downweight-$d$ jackknife (Chatterjee and Bose, 2005) and simple random sampling with over-replacement (Antal and Tillé, 2011b). In the downweight-$d$ jackknife, $d$ indices $i_1,\ldots,i_d$ are selected uniformly at random to be downweighted such that $W_{ni_1}=\cdots=W_{ni_d}=d/n$, whereas the remaining $n-d$ indices are upweighted to $1+d/n$. These weights can be viewed as a monotonic transformation of the multivariate hypergeometric weights corresponding to the delete-$d$ jackknife, and thus are NOD (Chen et al., 2019). For simple random sampling with over-replacement, the weights can be viewed as the conditional distribution of a sequence of $n$ independent geometric random variables, given that their sum equals $n$. Geometric random variables satisfy the conditions of Theorem 2.6 of Joag-Dev and Proschan (1983) according to Efron (1965, 3.1), implying that the resulting weights are NOD.
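
Here is a sketch of these two schemes under assumed conventions; in the over-replacement case, the geometric success probability cancels in the conditioning, so the value 1/2 below is an arbitrary choice, and the naive rejection sampler is only meant for small $n$.

import numpy as np

rng = np.random.default_rng(3)

def downweight_d_jackknife(n: int, d: int) -> np.ndarray:
    w = np.full(n, 1.0 + d / n)                        # the n - d remaining indices
    w[rng.choice(n, size=d, replace=False)] = d / n    # the d downweighted indices
    return w

def over_replacement(n: int, max_tries: int = 100_000) -> np.ndarray:
    # condition n i.i.d. geometric counts (support 0, 1, 2, ...) on sum == n
    for _ in range(max_tries):
        g = rng.geometric(0.5, size=n) - 1             # numpy's geometric starts at 1
        if g.sum() == n:
            return g
    raise RuntimeError("rejection sampling failed")

print(downweight_d_jackknife(10, 3))   # weights sum to n
print(over_replacement(10))            # nonnegative integers summing to n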

A general family of applicable NOD reweighting schemes can also be derived from the Farlie–Gumbel–Morgenstern (FGM) $n$-copula (Nelsen, 2006, page 108). For variables $u_1,\ldots,u_n\in[0,1]$, an FGM copula is characterized by $2^n-n-1$ parameters, one for each subset of $\{1,\ldots,n\}$ that contains at least two elements. Let $\theta_{1,2},\theta_{1,3},\ldots,\theta_{1,2,\ldots,n}\in[-1,1]$ denote these parameters. The density of the FGM copula is given by

\[
f(u)=1+\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}(1-2u_{j_1})(1-2u_{j_2})\cdots(1-2u_{j_k}), \tag{7}
\]

where the $\theta$ values must adhere to the constraints

\[
\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}\epsilon_{j_1}\epsilon_{j_2}\cdots\epsilon_{j_k}\geq-1 \tag{8}
\]

for all $(\epsilon_1,\ldots,\epsilon_n)\in\{-1,1\}^n$ to ensure non-negativity of the density function. It follows that each $u_j$ is marginally uniformly distributed, and for each $x\in[0,1]^n$,

\[
\mathbb{P}(u_1\leq x_1,\ldots,u_n\leq x_n)=\bigg(\prod_{j=1}^n x_j\bigg)\bigg(1+\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}\prod_{\ell=1}^k(1-x_{j_\ell})\bigg), \tag{9}
\]

\[
\mathbb{P}(u_1>x_1,\ldots,u_n>x_n)=\bigg(\prod_{j=1}^n(1-x_j)\bigg)\bigg(1+\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}(-1)^k\prod_{\ell=1}^k x_{j_\ell}\bigg). \tag{10}
\]

Therefore, for any $x_1,\ldots,x_n\in[0,1]$,

\[
\prod_{i=1}^n\mathbb{P}(u_i\leq x_i)=\prod_{i=1}^n x_i \quad\text{and}\quad \prod_{i=1}^n\mathbb{P}(u_i>x_i)=\prod_{i=1}^n(1-x_i). \tag{11}
\]

An FGM $n$-copula is thus NOD if and only if

\[
\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}\prod_{\ell=1}^k(1-x_{j_\ell})\leq 0 \quad\text{and} \tag{12}
\]

\[
\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}(-1)^k\prod_{\ell=1}^k x_{j_\ell}\leq 0 \tag{13}
\]

for all $x_1,\ldots,x_n\in[0,1]$. Because the NOD property is preserved under monotonic transformations, it follows that any multivariate distribution corresponding to such an FGM copula, with $\theta$ satisfying the criteria in Equations 12 and 13, is NOD.
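
As a concrete instance, for $n=2$ the copula has a single parameter $\theta_{1,2}=\theta\in[-1,1]$, and Equations 12 and 13 both reduce to $\theta\leq 0$. The sketch below samples from this bivariate FGM copula via the standard conditional-inversion formula and checks the lower-orthant inequality numerically; the parameter value and sample size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)

def fgm_sample(theta: float, size: int) -> np.ndarray:
    u1 = rng.uniform(size=size)
    v = rng.uniform(size=size)
    a = theta * (1.0 - 2.0 * u1)
    # invert the conditional CDF  F(u2 | u1) = u2 * (1 + a * (1 - u2))
    u2 = 2.0 * v / (1.0 + a + np.sqrt((1.0 + a) ** 2 - 4.0 * a * v))
    return np.column_stack([u1, u2])

u = fgm_sample(theta=-0.8, size=100_000)   # theta <= 0, so the copula is NOD
x = 0.3
print(np.mean((u[:, 0] <= x) & (u[:, 1] <= x)), "<=", x * x)   # Equation 3 with n = 2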

Finally, the condition that $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^q)=O(n)$ in Theorem 3 holds for all of the aforementioned reweighting procedures without any additional assumptions, except for a few cases. For the $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ case, it holds as long as there exists a constant $\kappa>0$ such that $p_{nj}<\kappa/n$ for all $j$ and $n$. For the Gaussian reweighting scheme (Patak and Beaumont, 2009), it holds as long as $\sup_{n,j}\mathrm{Var}(W_{nj})<\infty$. For the independent weights and FGM copula cases, it holds when $\sup_{n,j}\mathbb{E}(|W_{nj}|^q)<\infty$.
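
As an informal check of the multinomial case (with illustrative parameters throughout, and $m_n=n$ assumed), the following sketch estimates $\frac{1}{n}\sum_{j=1}^n\mathbb{E}(W_{nj}^q)$ by Monte Carlo for probabilities satisfying $p_{nj}<\kappa/n$, and the estimate stays bounded as $n$ grows.

import numpy as np

rng = np.random.default_rng(5)
q = 3
for n in [50, 200, 800]:
    p = rng.uniform(0.5, 2.0, size=n)
    p /= p.sum()                          # p_nj is of order 1/n, so p_nj < kappa/n
    W = rng.multinomial(n, p, size=5_000).astype(float)   # m_n = n here
    print(n, (W ** q).mean(axis=0).sum() / n)   # estimates (1/n) sum_j E(W_nj^q)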

4 Proofs

We begin by stating a strong law of large numbers for the bootstrap mean due to Arenal-Gutiérrez et al. (1996, Theorem 2.1). This lemma plays a key role in our proof of Theorem 1.

Lemma 1.

Let $Z_1,Z_2,\ldots\in\mathbb{R}$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let

\[
(W_{n1},\ldots,W_{nn})\sim\mathrm{Multinomial}\big(m_n,(1/n,\ldots,1/n)\big)
\]

independently of $(Z_1,Z_2,\ldots)$, where $m_n$ is a positive integer for each $n\in\mathbb{N}$. Suppose there exists a $\delta\in[0,1)$ such that

\[
\lim_{n\to\infty}\frac{n^{1-\delta}\log(n)}{m_n}=0 \quad\text{and}\quad \mathbb{E}\big(|Z_1|^{\frac{1}{1-\delta}}\big)<\infty. \tag{14}
\]

Then

\[
\frac{1}{m_n}\sum_{j=1}^n W_{nj}Z_j\xrightarrow[n\to\infty]{\mathrm{a.s.}}\mathbb{E}(Z_1). \tag{15}
\]
Proof.

See Arenal-Gutiérrez et al. (1996, Theorem 2.1) for the proof. ∎

Proof of Theorem 1.

Our argument is based on the proof of Theorem 1.3.3 of Ghosh and Ramamoorthi (2003), except that we use Lemma 1 in place of the strong law of large numbers for i.i.d. random variables.

Condition 1 ensures that $x\mapsto\sup_{\theta\in\Theta}|H(\theta,x)|$ is measurable; this can be seen by letting $\theta_1,\theta_2,\ldots$ be a countable dense subset of $\Theta$, verifying that $\sup_{j\in\mathbb{N}}|H(\theta_j,x)|=\sup_{\theta\in\Theta}|H(\theta,x)|$, and using Folland (2013, Proposition 2.7). Define $\mu(\theta):=\mathbb{E}(H(\theta,X_1))$, and note that $\mu(\theta)$ is continuous by the dominated convergence theorem. Let $B_r(\theta_0):=\{\theta\in\Theta:d(\theta,\theta_0)<r\}$ denote the open ball of radius $r$ at $\theta_0$, where $d(\cdot,\cdot)$ is the metric on $\Theta$. For $\theta\in\Theta$, $x\in\mathbb{R}^u$, and $r>0$, define

\[
\eta(\theta,x,r):=\sup_{\theta'\in B_r(\theta)}\Big|\big(H(\theta,x)-\mu(\theta)\big)-\big(H(\theta',x)-\mu(\theta')\big)\Big|, \tag{16}
\]

and observe that by continuity and compactness,

\[
\eta(\theta,x,r)\leq 2\sup_{\theta\in\Theta}|H(\theta,x)|+2\sup_{\theta\in\Theta}|\mu(\theta)|<\infty. \tag{17}
\]

Applying the dominated convergence theorem again, we have $\lim_{r\to 0}\mathbb{E}(\eta(\theta,X_1,r))=0$ for all $\theta\in\Theta$. Thus, for any $\varepsilon>0$, by compactness of $\Theta$ there exist $K\in\mathbb{N}$, $\theta_1,\ldots,\theta_K\in\Theta$, and $r_1,\ldots,r_K>0$ such that $\Theta=\bigcup_{i=1}^K B_{r_i}(\theta_i)$ and $\mathbb{E}(\eta(\theta_i,X_1,r_i))<\varepsilon$ for all $i\in\{1,\ldots,K\}$. Choosing $\delta\in[0,1)$ according to the statement of Theorem 1,

\[
\mathbb{E}\big(\eta(\theta_i,X_1,r_i)^{\frac{1}{1-\delta}}\big)\leq\mathbb{E}\Big(\big(2\sup_{\theta\in\Theta}|H(\theta,X_1)|+2\sup_{\theta\in\Theta}|\mu(\theta)|\big)^{\frac{1}{1-\delta}}\Big) \tag{18}
\]

\[
\leq 4^{\frac{1}{1-\delta}}\,\mathbb{E}\Big(\sup_{\theta\in\Theta}|H(\theta,X_1)|^{\frac{1}{1-\delta}}\Big)+4^{\frac{1}{1-\delta}}\sup_{\theta\in\Theta}|\mu(\theta)|^{\frac{1}{1-\delta}}<\infty \tag{19}
\]

since $(x+y)^{\frac{1}{1-\delta}}\leq(2\max\{|x|,|y|\})^{\frac{1}{1-\delta}}\leq 2^{\frac{1}{1-\delta}}\big(|x|^{\frac{1}{1-\delta}}+|y|^{\frac{1}{1-\delta}}\big)$. For all $i\in\{1,\ldots,K\}$, by applying Lemma 1 with $Z_j=\eta(\theta_i,X_j,r_i)$ we have that

\[
\frac{1}{m_n}\sum_{j=1}^n W_{nj}\,\eta(\theta_i,X_j,r_i)\xrightarrow[n\to\infty]{\mathrm{a.s.}}\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)<\varepsilon. \tag{20}
\]

Similarly, by Lemma 1 with $Z_j=H(\theta_i,X_j)$,

\[
\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta_i,X_j)\xrightarrow[n\to\infty]{\mathrm{a.s.}}\mu(\theta_i). \tag{21}
\]

Thus, for any $\theta\in\Theta$, by choosing $i$ such that $\theta\in B_{r_i}(\theta_i)$, we have

\[
\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta,X_j)-\mu(\theta)\bigg|\leq\frac{1}{m_n}\sum_{j=1}^n W_{nj}\,\eta(\theta_i,X_j,r_i)+\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta_i,X_j)-\mu(\theta_i)\bigg| \tag{22}
\]

by the triangle inequality and Equation 16. Letting $V_{ni}$ denote the right-hand side of Equation 22, we have $V_{ni}\to\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)<\varepsilon$ a.s. by Equations 20 and 21. Therefore,

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta,X_j)-\mu(\theta)\bigg|\leq\max_{1\leq i\leq K}V_{ni}\xrightarrow[n\to\infty]{\mathrm{a.s.}}\max_{1\leq i\leq K}\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)<\varepsilon. \tag{23}
\]

Since $\varepsilon>0$ is arbitrary, Equation 23 holds almost surely for $\varepsilon=\varepsilon_k=1/k$ for all $k\in\mathbb{N}$, completing the proof. ∎

The following result, due to Chen et al. (2019), is a more general version of Lemma 1 that extends beyond the standard bootstrap to negatively orthant dependent (NOD) weights.

Lemma 2.

Let $Z_1,Z_2,\ldots\in\mathbb{R}$ be identically distributed NOD random variables. For each $n\in\mathbb{N}$ independently, let $W_{n1},\ldots,W_{nn}$ be NOD random variables, independent of $(Z_1,Z_2,\ldots)$. Suppose $\mathbb{E}(|Z_1|^\beta)<\infty$ and $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^\alpha)=O(n)$, where $p\in[1,2)$ and either (a) $\alpha>2p$ and $\beta>1$ satisfy $\alpha^{-1}+\beta^{-1}=p^{-1}$, or (b) the weights $W_{nj}$ are identically distributed for all $n,j$, and $\alpha=\beta=2p$. Then

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}Z_j-\mathbb{E}(W_{nj}Z_j)\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{24}
\]
Proof.

See Chen et al. (2019, Theorem 1 and Corollary 1) for the proof. ∎

Proof of Theorem 2.

The proof is similar to the proof of Theorem 1, except that we use Lemma 2 instead of Lemma 1, and some modifications are needed to handle more general distributions of the weights $W_{nj}$.

As in the proof of Theorem 1, define $\eta(\theta,x,r)$ by Equation 16 and $\mu(\theta):=\mathbb{E}(H(\theta,X_1))$. As before, for any $\varepsilon>0$, there exist $K\in\mathbb{N}$, $\theta_1,\ldots,\theta_K\in\Theta$, and $r_1,\ldots,r_K>0$ such that $\Theta=\bigcup_{i=1}^K B_{r_i}(\theta_i)$ and $\mathbb{E}(\eta(\theta_i,X_1,r_i))<\varepsilon$ for all $i\in\{1,\ldots,K\}$. Just as in Equation 18,

\[
\mathbb{E}\big(\eta(\theta_i,X_1,r_i)^\beta\big)\leq 4^\beta\,\mathbb{E}\Big(\sup_{\theta\in\Theta}|H(\theta,X_1)|^\beta\Big)+4^\beta\sup_{\theta\in\Theta}|\mu(\theta)|^\beta<\infty.
\]

For all $i\in\{1,\ldots,K\}$, by applying Lemma 2 with $Z_j=1$ and $Z_j=H(\theta_i,X_j)$, respectively,

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0, \tag{25}
\]

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}H(\theta_i,X_j)-\mathbb{E}(W_{nj})\mu(\theta_i)\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{26}
\]

Let $W_{nj}^+=\max\{W_{nj},0\}$ and $W_{nj}^-=\max\{-W_{nj},0\}$. Then $(W_{n1}^+,\ldots,W_{nn}^+)$ and $(W_{n1}^-,\ldots,W_{nn}^-)$ are each NOD because they are monotone transformations of the $W_{nj}$'s (Chen et al., 2019). Thus, Lemma 2 applied to $Z_j=\eta(\theta_i,X_j,r_i)$ with $W_{nj}^+$ and $W_{nj}^-$, respectively, yields

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}^+\,\eta(\theta_i,X_j,r_i)-\mathbb{E}(W_{nj}^+)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\Big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0, \tag{27}
\]

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}^-\,\eta(\theta_i,X_j,r_i)-\mathbb{E}(W_{nj}^-)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\Big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{28}
\]

By adding Equations 27 and 28 and using the fact that $|W_{nj}|=W_{nj}^++W_{nj}^-$, we have

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(|W_{nj}|\,\eta(\theta_i,X_j,r_i)-\mathbb{E}(|W_{nj}|)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\Big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{29}
\]

Since $\mathbb{E}(\eta(\theta_i,X_1,r_i))<\varepsilon$, Equation 29 implies that, almost surely,

\[
\limsup_{n\to\infty}\frac{1}{n^{1/p}}\sum_{j=1}^n|W_{nj}|\,\eta(\theta_i,X_j,r_i)\leq\limsup_{n\to\infty}\frac{1}{n^{1/p}}\sum_{j=1}^n\mathbb{E}(|W_{nj}|)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\leq C\varepsilon, \tag{30}
\]

where $C:=\limsup_{n\to\infty}n^{-1/p}\sum_{j=1}^n\mathbb{E}(|W_{nj}|)$. Note that $C<\infty$ by the assumption that $\sum_{j=1}^n\mathbb{E}(|W_{nj}|)=O(n^{1/p})$. For any $\theta\in\Theta$, choosing $i$ such that $\theta\in B_{r_i}(\theta_i)$, we can write

\[
\begin{aligned}
W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta) &= W_{nj}\Big(\big(H(\theta,X_j)-\mu(\theta)\big)-\big(H(\theta_i,X_j)-\mu(\theta_i)\big)\Big) \\
&\quad+\big(W_{nj}-\mathbb{E}(W_{nj})\big)\big(\mu(\theta)-\mu(\theta_i)\big) \\
&\quad+\big(W_{nj}H(\theta_i,X_j)-\mathbb{E}(W_{nj})\mu(\theta_i)\big).
\end{aligned} \tag{31}
\]

Summing Equation 31 over $j$, using the triangle inequality, and employing Equation 16,

\[
\begin{aligned}
\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta)\Big)\bigg| &\leq\frac{1}{n^{1/p}}\sum_{j=1}^n|W_{nj}|\,\eta(\theta_i,X_j,r_i) \\
&\quad+\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\bigg|\,\sup_{\theta\in\Theta}|\mu(\theta)-\mu(\theta_i)| \\
&\quad+\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta_i,X_j)-\mathbb{E}(W_{nj})\mu(\theta_i)\Big)\bigg|.
\end{aligned} \tag{32}
\]

Letting $V_{ni}$ denote the right-hand side of Equation 32, we have $\limsup_n V_{ni}\leq C\varepsilon$ a.s. by Equations 30, 25, and 26, along with the fact that $\mu(\theta)$ is continuous and $\Theta$ is compact. Therefore, almost surely,

\[
\limsup_{n\to\infty}\,\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta)\Big)\bigg|\leq\limsup_{n\to\infty}\max_{1\leq i\leq K}V_{ni}\leq C\varepsilon. \tag{33}
\]

As before, since $\varepsilon>0$ is arbitrary, Equation 33 holds almost surely for $\varepsilon=\varepsilon_k=1/k$ for all $k\in\mathbb{N}$, completing the proof. ∎

Proof of Theorem 3.

First, we may assume without loss of generality that $W_{nj}$ and $H(\theta,X_j)$ are nonnegative, since we can write $W_{nj}=W_{nj}^+-W_{nj}^-$ and $H(\theta,X_j)=H(\theta,X_j)^+-H(\theta,X_j)^-$, and apply the result in the nonnegative case to each of $W_{nj}^+H(\theta,X_j)^+$, $W_{nj}^+H(\theta,X_j)^-$, $W_{nj}^-H(\theta,X_j)^+$, and $W_{nj}^-H(\theta,X_j)^-$ to obtain the result for $W_{nj}H(\theta,X_j)$ in the general case; see the proof of Theorem 1 of Chen et al. (2019) for a similar argument.

Let $\varepsilon>0$. As before, define $\eta(\theta,x,r)$ by Equation 16 and $\mu(\theta):=\mathbb{E}(H(\theta,X_1))$. For $n\in\mathbb{N}$, define $r_n=(\varepsilon/n^{1-1/p})^{1/a}$ and $K_n=N(r_n)$, where $N(r)$ is defined in Condition 3. Let $\theta_{n1},\ldots,\theta_{nK_n}\in\Theta$ be the centers of $K_n$ balls of radius $r_n$ that cover $\Theta$, that is, $\Theta=\bigcup_{i=1}^{K_n}B_{r_n}(\theta_{ni})$.

Let $C_W=1+\sup_n\frac{1}{n}\sum_{j=1}^n\mathbb{E}(W_{nj}^q)<\infty$. Note that $q>2p$ since $p>1$. Thus, by Lemma 2 with $Z_j=1$,

\[
\frac{1}{n}\sum_{j=1}^n W_{nj}\leq C_W+\frac{1}{n}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}C_W. \tag{34}
\]

For all $n$ sufficiently large that $r_n<\delta$, we have $\eta(\theta,x,r_n)\leq 2Mr_n^a=2M\varepsilon/n^{1-1/p}$ by Condition 2, and thus,

\[
\limsup_{n\to\infty}\max_{i\in[K_n]}\frac{1}{n^{1/p}}\sum_{j=1}^n W_{nj}\,\eta(\theta_{ni},X_j,r_n)\leq 2M\varepsilon\limsup_{n\to\infty}\frac{1}{n}\sum_{j=1}^n W_{nj}\stackrel{\mathrm{a.s.}}{\leq}2MC_W\varepsilon, \tag{35}
\]

where $[K_n]=\{1,\ldots,K_n\}$. By another application of Lemma 2 with $Z_j=1$,

\[
\limsup_{n\to\infty}\max_{i\in[K_n]}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\bigg|\,\sup_{\theta\in\Theta}|\mu(\theta)-\mu(\theta_{ni})|\stackrel{\mathrm{a.s.}}{=}0, \tag{36}
\]

since $\max_{i\in[K_n]}\sup_\theta|\mu(\theta)-\mu(\theta_{ni})|\leq 2\sup_\theta|\mu(\theta)|<\infty$. Defining $Y_{nij}:=W_{nj}H(\theta_{ni},X_j)-\mathbb{E}(W_{nj})\mu(\theta_{ni})$, we claim that

\[
\limsup_{n\to\infty}\max_{i\in[K_n]}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\bigg|\stackrel{\mathrm{a.s.}}{\leq}\varepsilon. \tag{37}
\]

Assuming Equation 37 for the moment, we use the same decomposition as in Equation 32 and plug in Equations 35–37 to obtain

\[
\limsup_{n\to\infty}\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta)\Big)\bigg|\leq 2MC_W\varepsilon+0+\varepsilon.
\]

Since $\varepsilon>0$ is arbitrarily small, this will yield the result of the theorem.

To complete the proof, we need to show Equation 37. For each $n$ and $i$, by Chen et al. (2019, Lemma 1), $Y_{ni1},\ldots,Y_{nin}$ is an NOD sequence, since we have assumed without loss of generality that $W_{nj}$ and $H(\theta,X_j)$ are nonnegative, and adding constants preserves the NOD property. Asadian et al. (2006) and Rivaz et al. (2007) provide moment inequalities that are useful in this context. By Asadian et al. (2006, Corollary 2.2) (see also Rivaz et al., 2007, Corollary 3), since $\mathbb{E}(Y_{nij})=0$ and $q>2$,

\[
\mathbb{E}\bigg(\Big|\sum_{j=1}^n Y_{nij}\Big|^q\bigg)\leq C_A(q)\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^q)+C_A(q)\bigg(\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^2)\bigg)^{q/2}, \tag{38}
\]

where $C_A(q)$ is a universal constant that depends only on $q$. Letting $C_H=1+\sup_{\theta\in\Theta}\mathbb{E}(|H(\theta,X_1)|^q)<\infty$, we have

\[
\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^q)\leq\sum_{j=1}^n 2^{q+1}\,\mathbb{E}\big(|W_{nj}|^q|H(\theta_{ni},X_j)|^q\big)\leq 2^{q+1}C_H C_W n. \tag{39}
\]

Likewise, since $\mathrm{Var}(X)\leq\mathbb{E}(X^2)$ for any random variable $X$,

\[
\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^2)\leq\sum_{j=1}^n\mathbb{E}\big(|W_{nj}|^2|H(\theta_{ni},X_j)|^2\big)\leq C_H C_W n. \tag{40}
\]

Plugging Equations 39 and 40 into Equation 38, we have

\[
\mathbb{E}\bigg(\Big|\sum_{j=1}^n Y_{nij}\Big|^q\bigg)\leq Cn^{q/2}, \tag{41}
\]

where $C=C_A(q)2^{q+1}C_H C_W+C_A(q)(C_H C_W)^{q/2}$. By Condition 3, $K_n=N(r_n)\leq c\,r_n^{-D}=c\,n^{(1-1/p)D/a}/\varepsilon^{D/a}$. Along with Markov's inequality and Equation 41, this implies

\[
\begin{aligned}
\mathbb{P}\bigg(\max_{i\in[K_n]}\Big|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\Big|>\varepsilon\bigg) &\leq\sum_{i=1}^{K_n}\mathbb{P}\bigg(\Big|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\Big|>\varepsilon\bigg)\leq\frac{1}{\varepsilon^q n^{q/p}}\sum_{i=1}^{K_n}\mathbb{E}\bigg(\Big|\sum_{j=1}^n Y_{nij}\Big|^q\bigg) \\
&\leq\frac{K_n C n^{q/2}}{\varepsilon^q n^{q/p}}\leq\frac{c\,C}{\varepsilon^q\varepsilon^{D/a}}\,n^{(1-1/p)D/a}\,n^{q(1/2-1/p)}=\frac{c\,C}{\varepsilon^q\varepsilon^{D/a}}\,n^{-\gamma},
\end{aligned}
\]

where $\gamma:=q(1/p-1/2)-(1-1/p)D/a>1$ because $q>((1-1/p)D/a+1)/(1/p-1/2)$ by assumption. Hence,

\[
\sum_{n=1}^\infty\mathbb{P}\bigg(\max_{i\in[K_n]}\Big|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\Big|>\varepsilon\bigg)\leq\frac{c\,C}{\varepsilon^q\varepsilon^{D/a}}\sum_{n=1}^\infty n^{-\gamma}<\infty. \tag{42}
\]

Therefore, Equation 37 holds by the Borel–Cantelli lemma. This completes the proof. ∎

References

  • Antal and Tillé (2011a) E. Antal and Y. Tillé. A direct bootstrap method for complex sampling designs from a finite population. Journal of the American Statistical Association, 106(494):534–543, 2011a.
  • Antal and Tillé (2011b) E. Antal and Y. Tillé. Simple random sampling with over-replacement. Journal of Statistical Planning and Inference, 141(1):597–601, 2011b.
  • Arenal-Gutiérrez et al. (1996) E. Arenal-Gutiérrez, C. Matrán, and J. A. Cuesta-Albertos. On the unconditional strong law of large numbers for the bootstrap mean. Statistics & Probability Letters, 27(1):49–60, 1996.
  • Asadian et al. (2006) N. Asadian, V. Fakoor, and A. Bozorgnia. Rosenthal’s Type Inequalities for Negatively Orthant Dependent Random Variables. Journal of the Iranian Statistical Society, 5(1 and 2):69–75, 2006.
  • Athreya et al. (1984) K. B. Athreya, M. Ghosh, L. Y. Low, and P. K. Sen. Laws of large numbers for bootstrapped U-statistics. Journal of Statistical Planning and Inference, 9(2):185–194, 1984.
  • Bickel et al. (2012) P. J. Bickel, F. Götze, and W. R. van Zwet. Resampling fewer than n observations: gains, losses, and remedies for losses. In Selected works of Willem van Zwet, pages 267–297. Springer, 2012.
  • Breiman (1996) L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
  • Chatterjee and Bose (2005) S. Chatterjee and A. Bose. Generalized bootstrap for estimating equations. The Annals of Statistics, 33(1):414–436, 2005.
  • Chen et al. (2019) P. Chen, T. Zhang, and S. H. Sung. Strong laws for randomly weighted sums of random variables and applications in the bootstrap and random design regression. Statistica Sinica, 29(4):1739–1749, 2019.
  • Efron (1965) B. Efron. Increasing properties of Pólya frequency functions. The Annals of Mathematical Statistics, pages 272–279, 1965.
  • Efron (1979) B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, pages 1–26, 1979.
  • Efron (1982) B. Efron. The jackknife, the bootstrap and other resampling plans. SIAM, 1982.
  • Efron and Tibshirani (1994) B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC press, 1994.
  • Folland (2013) G. B. Folland. Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, 2013.
  • Ghosh and Ramamoorthi (2003) J. K. Ghosh and R. V. Ramamoorthi. Bayesian Nonparametrics. Springer Series in Statistics. Springer, New York, NY, 2003. ISBN 0387955372.
  • Giné and Zinn (1990) E. Giné and J. Zinn. Bootstrapping general empirical measures. The Annals of Probability, pages 851–869, 1990.
  • Huggins and Miller (2019) J. H. Huggins and J. W. Miller. Robust inference and model criticism using bagged posteriors. arXiv preprint arXiv:1912.07104, 2019.
  • Huggins and Miller (2022) J. H. Huggins and J. W. Miller. Reproducible model selection using bagged posteriors. Bayesian Analysis, pages 1–26, 2022.
  • Joag-Dev and Proschan (1983) K. Joag-Dev and F. Proschan. Negative association of random variables with applications. The Annals of Statistics, pages 286–295, 1983.
  • Kosorok (2008) M. R. Kosorok. Introduction to Empirical Processes and Semiparametric Inference. Springer, 2008.
  • Lehmann (1966) E. L. Lehmann. Some concepts of dependence. The Annals of Mathematical Statistics, 37(5):1137–1153, 1966.
  • Little and Badawy (2019) M. A. Little and R. Badawy. Causal bootstrapping. arXiv preprint arXiv:1910.09648, 2019.
  • Nelsen (2006) R. B. Nelsen. An Introduction to Copulas. Springer, 2006.
  • Newton and Raftery (1994) M. A. Newton and A. E. Raftery. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society: Series B (Methodological), 56(1):3–26, 1994.
  • Patak and Beaumont (2009) Z. Patak and J.-F. Beaumont. Generalized bootstrap for prices surveys. 57th Session of the International Statistical Institute, Durban, South-Africa, 2009.
  • Rivaz et al. (2007) F. Rivaz, M. Amini, and A. G. Bozorgnia. Moment Inequalities and Applications for Negative Dependence Random Variables. 26(4):7–11, 2007.
  • Rubin (1981) D. B. Rubin. The Bayesian bootstrap. The Annals of Statistics, pages 130–134, 1981.
  • Shalev-Shwartz and Ben-David (2014) S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
  • Singh (1981) K. Singh. On the asymptotic accuracy of Efron’s bootstrap. The Annals of Statistics, pages 1187–1195, 1981.
  • Singh and Xie (2003) K. Singh and M. Xie. Bootlier-plot: bootstrap based outlier detection plot. Sankhyā: The Indian Journal of Statistics, pages 532–559, 2003.
  • van der Vaart and Wellner (1996) A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.