
Strong uniform laws of large numbers for bootstrap means and other randomly weighted sums

Neil A. Spencer and Jeffrey W. Miller
Abstract

This article establishes novel strong uniform laws of large numbers for randomly weighted sums such as bootstrap means. By leveraging recent advances, these results extend previous work in their general applicability to a wide range of weighting procedures and in their flexibility with respect to the effective bootstrap sample size. In addition to the standard multinomial bootstrap and the $m$-out-of-$n$ bootstrap, our results apply to a large class of randomly weighted sums involving negatively orthant dependent (NOD) weights, including the Bayesian bootstrap, the jackknife, resampling without replacement, simple random sampling with over-replacement, independent weights, and multivariate Gaussian weighting schemes. Weights are permitted to be non-identically distributed and possibly even negative. Our proof technique is based on extending a proof of the i.i.d. strong uniform law of large numbers to employ strong laws for randomly weighted sums; in particular, we exploit a recent Marcinkiewicz–Zygmund strong law for NOD weighted sums.

1 Introduction

The bootstrap (Efron and Tibshirani, 1994) and related resampling procedures such as bagging (Breiman, 1996), the Bayesian bootstrap (Rubin, 1981), and the jackknife (Efron, 1982) are widely used general-purpose tools for statistical inference. In addition to its original purpose of approximating sampling distributions of estimators (Efron, 1979), the bootstrap and its relatives have been applied to a variety of statistical tasks, including model averaging (Breiman, 1996), approximate Bayesian inference (Newton and Raftery, 1994), outlier detection (Singh and Xie, 2003), robust Bayesian inference (Huggins and Miller, 2019, 2022), and causal inference (Little and Badawy, 2019).

Due to its versatility, extensions of the bootstrap are frequently proposed to address new statistical questions. When establishing the properties of such methods, bootstrap versions of classical asymptotic results play a key role, such as the weak law of large numbers (Athreya et al., 1984, Theorem 1), the strong law of large numbers (Athreya et al., 1984, Theorem 2), and the central limit theorem (Singh, 1981) for bootstrap means.

Meanwhile, it is sometimes important to obtain convergence over an entire collection of random variables simultaneously, thus guaranteeing convergence for even the worst case in the collection. To this end, several authors have established uniform laws of large numbers for bootstrap means. Giné and Zinn (1990, Theorems 2.6 and 3.5) proved weak uniform laws of large numbers for the standard multinomial bootstrap, that is, with $\mathrm{Multinomial}(n,(1/n,\ldots,1/n))$ weights. van der Vaart and Wellner (1996, Theorem 3.6.16) proved an analogous result for exchangeably weighted sums such as the Bayesian bootstrap. As weak laws, these show convergence in probability, but often one needs almost sure convergence, that is, a strong law.

Strong uniform laws of large numbers for bootstrap means are provided by Kosorok (2008, Section 10.2) for weighted sums with independent and identically distributed (i.i.d.) weights, with weights obtained by normalizing $n$ i.i.d. random variables, and with $\mathrm{Multinomial}(n,(1/n,\ldots,1/n))$ weights (Theorem 10.13, Corollary 10.14, and Theorem 10.15, respectively). However, these results do not apply to more general schemes such as the jackknife, resampling without replacement, and $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ weights.

In this article, we present new strong uniform laws of large numbers for randomly weighted sums, aiming to fill these gaps in the literature. Our first result applies to the case of $\mathrm{Multinomial}(m_n,(1/n,\ldots,1/n))$ weights, known as the $m$-out-of-$n$ bootstrap. Our second and third results apply more generally to a large class of randomly weighted sums that involve negatively orthant dependent (NOD) weights. This covers a wide range of weighting schemes, including the Bayesian bootstrap (Rubin, 1981), various versions of the jackknife (Efron, 1982; Chatterjee and Bose, 2005), resampling without replacement (Bickel et al., 2012), simple random sampling with over-replacement (Antal and Tillé, 2011b), $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ weights (Antal and Tillé, 2011a, Section 3), independent weights (Newton and Raftery, 1994), and even schemes involving negative weights such as multivariate Gaussian weights with non-positive correlations (Patak and Beaumont, 2009). All three theorems are flexible in terms of the effective bootstrap sample size $m_n$ (that is, the sum of the weights), for instance, allowing $m_n=o(n)$, which is of particular interest for certain applications (Bickel et al., 2012; Huggins and Miller, 2022).

The article is organized as follows. We present our main results in Section 2, Section 3 provides some examples of relevant re-weighting schemes, and the proofs of the main results are provided in Section 4.

2 Main results

We present three strong uniform laws of large numbers: Theorem 1 for the multinomial bootstrap, Theorem 2 for more general randomly weighted sums, and Theorem 3, which establishes faster convergence rates under stronger regularity and moment conditions than Theorem 2. All three results are obtained via extensions of the proof of the i.i.d. strong uniform law of large numbers presented by Ghosh and Ramamoorthi (2003). Specifically, our proof of Theorem 1 involves replacing the traditional strong law of large numbers with a strong law of large numbers for bootstrap means as presented by Arenal-Gutiérrez et al. (1996). Similarly, Theorems 2 and 3 rely on a strong law of large numbers for randomly weighted sums of random variables presented by Chen et al. (2019).

Condition 1.

Suppose $\Theta$ is a compact subset of a separable metric space. Let $u\in\mathbb{N}$ and let $H(\theta,x)$ be a real-valued function on $\Theta\times\mathbb{R}^u$ such that

  • (i) for each $x\in\mathbb{R}^u$, $\theta\mapsto H(\theta,x)$ is continuous on $\Theta$, and

  • (ii) for each $\theta\in\Theta$, $x\mapsto H(\theta,x)$ is a measurable function on $\mathbb{R}^u$.

Theorem 1.

Let $X_1,X_2,\ldots\in\mathbb{R}^u$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let

\[
(W_{n1},\ldots,W_{nn})\sim\mathrm{Multinomial}\big(m_n,(1/n,\ldots,1/n)\big)
\]

independently of $(X_1,X_2,\ldots)$, where $m_n$ is a positive integer for each $n\in\mathbb{N}$. Assume Condition 1 and suppose there exists $\delta\in[0,1)$ such that

\[
\lim_{n\to\infty}\frac{n^{1-\delta}\log(n)}{m_n}=0 \quad\text{and}\quad \mathbb{E}\Big(\sup_{\theta\in\Theta}|H(\theta,X_1)|^{\frac{1}{1-\delta}}\Big)<\infty. \tag{1}
\]

Then

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta,X_j)-\mathbb{E}\big(H(\theta,X_1)\big)\bigg|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{2}
\]
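
To make Theorem 1 concrete, the following is a minimal numerical sketch (our illustration, not part of the paper's formal development), assuming $H(\theta,x)=|x-\theta|$ on $\Theta=[-1,1]$, standard Gaussian data, and $m_n=n$; the grid over $\Theta$, the sample sizes, and the use of numpy/scipy are all illustrative assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
thetas = np.linspace(-1.0, 1.0, 201)  # finite grid standing in for compact Theta

def sup_error(n, m_n):
    x = rng.standard_normal(n)                     # X_1, ..., X_n i.i.d. N(0,1)
    w = rng.multinomial(m_n, np.full(n, 1.0 / n))  # (W_n1, ..., W_nn)
    H = np.abs(x[None, :] - thetas[:, None])       # H(theta, X_j) = |X_j - theta|
    boot = H @ w / m_n                             # (1/m_n) sum_j W_nj H(theta, X_j)
    exact = 2 * norm.pdf(thetas) + thetas * (2 * norm.cdf(thetas) - 1)  # E|X - theta|
    return np.max(np.abs(boot - exact))            # sup over the theta grid

for n in [100, 1_000, 10_000]:
    print(n, sup_error(n, m_n=n))                  # shrinks as n grows, as in Equation 2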

In Theorems 2 and 3, we generalize beyond the standard multinomial bootstrap to weighting schemes involving NOD random variables (Lehmann, 1966; Chen et al., 2019).

Definition 1.

A finite collection of random variables $X_1,\ldots,X_n\in\mathbb{R}$ is said to be negatively orthant dependent (NOD) if

\[
\mathbb{P}(X_1\leq x_1,\ldots,X_n\leq x_n)\leq\prod_{i=1}^n\mathbb{P}(X_i\leq x_i) \tag{3}
\]

and

\[
\mathbb{P}(X_1>x_1,\ldots,X_n>x_n)\leq\prod_{i=1}^n\mathbb{P}(X_i>x_i) \tag{4}
\]

for all $x_1,\ldots,x_n\in\mathbb{R}$. An infinite collection of random variables is NOD if every finite subcollection is NOD.

Any collection of independent random variables is NOD, and many commonly used multivariate distributions are NOD, including the multinomial distribution, the Dirichlet distribution, the Dirichlet-multinomial distribution, the multivariate hypergeometric distribution, convolutions of multinomial distributions, and multivariate Gaussian distributions for which the correlations are all non-positive (Joag-Dev and Proschan, 1983).
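
For intuition, the following Monte Carlo sketch (an illustration under assumed parameter choices, not a proof) estimates both sides of Equations 3 and 4 for multinomial weights at one fixed point of the orthant.

import numpy as np

rng = np.random.default_rng(1)
W = rng.multinomial(6, [1/3, 1/3, 1/3], size=200_000)  # rows are (W_1, W_2, W_3)
x = np.array([2, 2, 2])                                # an arbitrary orthant corner

joint_low = np.mean(np.all(W <= x, axis=1))            # left side of Equation 3
prod_low = np.prod([(W[:, i] <= x[i]).mean() for i in range(3)])
joint_up = np.mean(np.all(W > x, axis=1))              # left side of Equation 4
prod_up = np.prod([(W[:, i] > x[i]).mean() for i in range(3)])
print(joint_low, "<=", prod_low, "and", joint_up, "<=", prod_up)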

Theorem 2.

Let $X_1,X_2,\ldots\in\mathbb{R}^u$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let $W_{n1},\ldots,W_{nn}\in\mathbb{R}$ be NOD random variables, independent of $(X_1,X_2,\ldots)$. Assume Condition 1 and suppose $\mathbb{E}(\sup_{\theta\in\Theta}|H(\theta,X_1)|^\beta)<\infty$, $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^\alpha)=O(n)$, and $\sum_{j=1}^n\mathbb{E}(|W_{nj}|)=O(n^{1/p})$, where $p\in[1,2)$, and $\alpha>2p$ and $\beta>1$ satisfy $\alpha^{-1}+\beta^{-1}=p^{-1}$. Then

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}\big(W_{nj}H(\theta,X_j)\big)\Big)\bigg|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{5}
\]

In particular, if $n^{-1/p}\sum_{j=1}^n\mathbb{E}(W_{nj})\to 1$, then Equation 5 is analogous to Equation 2 with $n^{1/p}$ in place of $m_n$. While the moment condition on $H(\theta,X_1)$ in Theorem 2 is slightly more stringent than in Theorem 1, it applies to a more general class of resampling procedures. For instance, the distribution of $W_{nj}$ can be different for each $n$ and $j$; in particular, there is no assumption that $W_{n1},\ldots,W_{nn}$ are exchangeable or even identically distributed. Further, the weights $W_{nj}$ are not restricted to being non-negative; thus, random weights taking positive and negative values are permitted.

The main limitation of Theorem 2 is that whenever $p>1$, the condition that $\sum_{j=1}^n\mathbb{E}(|W_{nj}|)=O(n^{1/p})$ requires the weights to become smaller as $n$ increases. One can scale the weights up by a factor of $n^{1-1/p}$, but then the leading factor of $1/n^{1/p}$ in Equation 5 becomes $1/n$, effectively reverting to the standard rate of convergence. In Theorem 3, we show that this condition can be dropped if we assume stronger regularity and moment conditions.

Condition 2.

Assume $\theta\mapsto H(\theta,x)$ is uniformly locally Hölder continuous, in the sense that there exist $a>0$, $M>0$, and $\delta>0$ such that for all $x\in\mathbb{R}^u$ and $\theta,\theta'\in\Theta$, if $d(\theta,\theta')<\delta$ then $|H(\theta,x)-H(\theta',x)|\leq M\,d(\theta,\theta')^a$.

Condition 3.

Assume $N(r)\leq c\,r^{-D}$ for some $c>0$ and $D>0$, where $N(r)$ is the smallest number of open balls of radius $r$, centered at points in $\Theta$, needed to cover $\Theta$.

Condition 3 holds for any compact $\Theta\subset\mathbb{R}^D$; indeed, by Shalev-Shwartz and Ben-David (2014, Example 27.1), $N_1(r)\leq c_1 r^{-D}$, where $N_1(r)$ is the number of $r$-balls needed to cover $\Theta$, centered at any points in $\mathbb{R}^D$, and it is straightforward to verify that $N(r)\leq N_1(r/2)$.
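
To illustrate Condition 3, here is a small sketch (under assumed conventions) for the unit cube $[0,1]^D$: a regular grid of balls centered at cube centers gives a cover of size $O(r^{-D})$.

import math

def grid_cover_size(r: float, D: int = 2) -> int:
    # cubes of side s = 2r/sqrt(D) have circumradius r, so balls of radius r
    # centered at the cube centers cover [0,1]^D
    s = 2.0 * r / math.sqrt(D)
    return math.ceil(1.0 / s) ** D   # at most (sqrt(D)/(2r) + 1)^D = O(r^{-D})

for r in [0.5, 0.1, 0.01]:
    print(r, grid_cover_size(r))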

Theorem 3.

Let $X_1,X_2,\ldots\in\mathbb{R}^u$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let $W_{n1},\ldots,W_{nn}\in\mathbb{R}$ be NOD random variables, independent of $(X_1,X_2,\ldots)$. Assume Conditions 1, 2, and 3. Let $p\in(1,2)$ and suppose $\sup_{\theta\in\Theta}\mathbb{E}(|H(\theta,X_1)|^q)<\infty$ and $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^q)=O(n)$ for some $q>((1-1/p)D/a+1)/(1/p-1/2)$. Then

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}\big(W_{nj}H(\theta,X_j)\big)\Big)\bigg|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{6}
\]

Note that as $p\to 1$ from above, the bound on the power $q$ approaches $2$.

3 Examples of relevant resampling schemes

A key condition of Theorems 2 and 3 is that the weights $W_{n1},\ldots,W_{nn}$ must be NOD. It turns out that most popular resampling techniques satisfy this condition.

For instance, the $m$-out-of-$n$ bootstrap corresponds to $\mathrm{Multinomial}(m_n,(1/n,\ldots,1/n))$ weights, unequal probability sampling with replacement corresponds to $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ weights (Antal and Tillé, 2011a), the Bayesian bootstrap corresponds to Dirichlet weights, the delete-$d$ jackknife and resampling without replacement correspond to multivariate hypergeometric weights (Chatterjee and Bose, 2005), the weighted likelihood bootstrap is equivalent to using independent weights (Newton and Raftery, 1994), and the reweighting scheme of Patak and Beaumont (2009) employs multivariate Gaussian weights with non-positive correlations. All of these distributions satisfy the NOD requirement (Joag-Dev and Proschan, 1983).
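
The following Python sketch illustrates how weight vectors for several of these schemes can be generated; all distributional and parameter choices (sample sizes, Dirichlet parameters, the Gaussian covariance) are illustrative assumptions, and scaling conventions vary across the literature.

import numpy as np

rng = np.random.default_rng(2)
n, m_n, d = 10, 10, 3

# m-out-of-n bootstrap: Multinomial(m_n, (1/n, ..., 1/n)) weights
w_boot = rng.multinomial(m_n, np.full(n, 1.0 / n))

# unequal probability sampling with replacement: Multinomial(m_n, (p_n1, ..., p_nn))
p = rng.dirichlet(np.ones(n))
w_unequal = rng.multinomial(m_n, p)

# Bayesian bootstrap: flat Dirichlet weights (sum to 1; rescale by n if desired)
w_bayes = rng.dirichlet(np.ones(n))

# delete-d jackknife / resampling without replacement: 0/1 weights on a
# uniformly chosen subset of n - d indices (multivariate hypergeometric)
w_jack = np.zeros(n)
w_jack[rng.choice(n, size=n - d, replace=False)] = 1.0

# weighted likelihood bootstrap: independent (here Exponential(1)) weights
w_wlb = rng.exponential(1.0, size=n)

# Gaussian weights with non-positive correlations: mean 1, covariance I - J/n,
# whose off-diagonal entries are -1/n (a valid positive semidefinite choice)
w_gauss = rng.multivariate_normal(np.ones(n), np.eye(n) - np.ones((n, n)) / n)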

The NOD requirement is also satisfied by less standard reweighting schemes such as the downweight-$d$ jackknife (Chatterjee and Bose, 2005) and simple random sampling with over-replacement (Antal and Tillé, 2011b). In the downweight-$d$ jackknife, $d$ indices $i_1,\ldots,i_d$ are selected uniformly at random to be downweighted such that $W_{ni_1}=\cdots=W_{ni_d}=d/n$, whereas the remaining $n-d$ indices are upweighted to $1+d/n$. These weights can be viewed as a monotonic transformation of the multivariate hypergeometric weights corresponding to the delete-$d$ jackknife, and thus are NOD (Chen et al., 2019). For simple random sampling with over-replacement, the weights can be viewed as the conditional distribution of a sequence of $n$ independent geometric random variables, given that their sum equals $n$. Geometric random variables satisfy the conditions of Theorem 2.6 of Joag-Dev and Proschan (1983) according to Efron (1965, 3.1), implying that the resulting weights are NOD.
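
Here is a sketch of these two schemes under assumed conventions; in the over-replacement case, the geometric success probability cancels in the conditioning, so the value 1/2 below is an arbitrary choice, and the naive rejection sampler is only meant for small $n$.

import numpy as np

rng = np.random.default_rng(3)

def downweight_d_jackknife(n: int, d: int) -> np.ndarray:
    w = np.full(n, 1.0 + d / n)                        # the n - d remaining indices
    w[rng.choice(n, size=d, replace=False)] = d / n    # the d downweighted indices
    return w

def over_replacement(n: int, max_tries: int = 100_000) -> np.ndarray:
    # condition n i.i.d. geometric counts (support 0, 1, 2, ...) on sum == n
    for _ in range(max_tries):
        g = rng.geometric(0.5, size=n) - 1             # numpy's geometric starts at 1
        if g.sum() == n:
            return g
    raise RuntimeError("rejection sampling failed")

print(downweight_d_jackknife(10, 3))   # weights sum to n
print(over_replacement(10))            # nonnegative integers summing to n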

A general family of applicable NOD reweighting schemes can also be derived from the Farlie–Gumbel–Morgenstern (FGM) $n$-copula (Nelsen, 2006, page 108). For variables $u_1,\ldots,u_n\in[0,1]$, an FGM copula is characterized by $2^n-n-1$ parameters, one for each subset of $\{1,\ldots,n\}$ that contains at least two elements. Let $\theta_{1,2},\theta_{1,3},\ldots,\theta_{1,2,\ldots,n}\in[-1,1]$ denote these parameters. The density of the FGM copula is given by

\[
f(u)=1+\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}(1-2u_{j_1})(1-2u_{j_2})\cdots(1-2u_{j_k}), \tag{7}
\]

where the $\theta$ values must adhere to the constraints

\[
\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}\epsilon_{j_1}\epsilon_{j_2}\cdots\epsilon_{j_k}\geq-1 \tag{8}
\]

for all $(\epsilon_1,\ldots,\epsilon_n)\in\{-1,1\}^n$ to ensure non-negativity of the density function. It follows that each $u_j$ is marginally uniformly distributed, and for each $x\in[0,1]^n$,

\[
\mathbb{P}(u_1\leq x_1,\ldots,u_n\leq x_n)=\bigg(\prod_{j=1}^n x_j\bigg)\bigg(1+\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}\prod_{\ell=1}^k(1-x_{j_\ell})\bigg), \tag{9}
\]

\[
\mathbb{P}(u_1>x_1,\ldots,u_n>x_n)=\bigg(\prod_{j=1}^n(1-x_j)\bigg)\bigg(1+\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}(-1)^k\prod_{\ell=1}^k x_{j_\ell}\bigg). \tag{10}
\]

Therefore, for any $x_1,\ldots,x_n\in[0,1]$,

\[
\prod_{i=1}^n\mathbb{P}(u_i\leq x_i)=\prod_{i=1}^n x_i \quad\text{and}\quad \prod_{i=1}^n\mathbb{P}(u_i>x_i)=\prod_{i=1}^n(1-x_i). \tag{11}
\]

An FGM $n$-copula is thus NOD if and only if

\[
\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}\prod_{\ell=1}^k(1-x_{j_\ell})\leq 0 \quad\text{and} \tag{12}
\]

\[
\sum_{k=2}^n\sum_{1\leq j_1<j_2<\cdots<j_k\leq n}\theta_{j_1,j_2,\ldots,j_k}(-1)^k\prod_{\ell=1}^k x_{j_\ell}\leq 0 \tag{13}
\]

for all $x_1,\ldots,x_n\in[0,1]$. Because the NOD property is preserved under monotonic transformations, it follows that any multivariate distribution corresponding to such an FGM copula, with $\theta$ satisfying the criteria in Equations 12 and 13, is NOD.
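
As a concrete instance, for $n=2$ the copula has a single parameter $\theta_{1,2}=\theta\in[-1,1]$, and Equations 12 and 13 both reduce to $\theta\leq 0$. The sketch below samples from this bivariate FGM copula via the standard conditional-inversion formula and checks the lower-orthant inequality numerically; the parameter value and sample size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)

def fgm_sample(theta: float, size: int) -> np.ndarray:
    u1 = rng.uniform(size=size)
    v = rng.uniform(size=size)
    a = theta * (1.0 - 2.0 * u1)
    # invert the conditional CDF  F(u2 | u1) = u2 * (1 + a * (1 - u2))
    u2 = 2.0 * v / (1.0 + a + np.sqrt((1.0 + a) ** 2 - 4.0 * a * v))
    return np.column_stack([u1, u2])

u = fgm_sample(theta=-0.8, size=100_000)   # theta <= 0, so the copula is NOD
x = 0.3
print(np.mean((u[:, 0] <= x) & (u[:, 1] <= x)), "<=", x * x)   # Equation 3 with n = 2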

Finally, the condition that $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^q)=O(n)$ in Theorem 3 holds for all of the aforementioned reweighting procedures without any additional assumptions, except for a few cases. For the $\mathrm{Multinomial}(m_n,(p_{n1},\ldots,p_{nn}))$ case, it holds as long as there exists a constant $\kappa>0$ such that $p_{nj}<\kappa/n$ for all $j$ and $n$. For the Gaussian reweighting scheme (Patak and Beaumont, 2009), it holds as long as $\sup_{n,j}\mathrm{Var}(W_{nj})<\infty$. For the independent weights and FGM copula cases, it holds when $\sup_{n,j}\mathbb{E}(|W_{nj}|^q)<\infty$.
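
As an informal check of the multinomial case (with illustrative parameters throughout, and $m_n=n$ assumed), the following sketch estimates $\frac{1}{n}\sum_{j=1}^n\mathbb{E}(W_{nj}^q)$ by Monte Carlo for probabilities satisfying $p_{nj}<\kappa/n$, and the estimate stays bounded as $n$ grows.

import numpy as np

rng = np.random.default_rng(5)
q = 3
for n in [50, 200, 800]:
    p = rng.uniform(0.5, 2.0, size=n)
    p /= p.sum()                          # p_nj is of order 1/n, so p_nj < kappa/n
    W = rng.multinomial(n, p, size=5_000).astype(float)   # m_n = n here
    print(n, (W ** q).mean(axis=0).sum() / n)   # estimates (1/n) sum_j E(W_nj^q)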

4 Proofs

We begin by stating a strong law of large numbers for the bootstrap mean due to Arenal-Gutiérrez et al. (1996, Theorem 2.1). This lemma plays a key role in our proof of Theorem 1.

Lemma 1.

Let $Z_1,Z_2,\ldots\in\mathbb{R}$ be i.i.d., and for each $n\in\mathbb{N}$ independently, let

\[
(W_{n1},\ldots,W_{nn})\sim\mathrm{Multinomial}\big(m_n,(1/n,\ldots,1/n)\big)
\]

independently of $(Z_1,Z_2,\ldots)$, where $m_n$ is a positive integer for each $n\in\mathbb{N}$. Suppose there exists a $\delta\in[0,1)$ such that

\[
\lim_{n\to\infty}\frac{n^{1-\delta}\log(n)}{m_n}=0 \quad\text{and}\quad \mathbb{E}\big(|Z_1|^{\frac{1}{1-\delta}}\big)<\infty. \tag{14}
\]

Then

\[
\frac{1}{m_n}\sum_{j=1}^n W_{nj}Z_j\xrightarrow[n\to\infty]{\mathrm{a.s.}}\mathbb{E}(Z_1). \tag{15}
\]
Proof.

See Arenal-Gutiérrez et al. (1996, Theorem 2.1) for the proof. ∎

Proof of Theorem 1.

Our argument is based on the proof of Theorem 1.3.3 of Ghosh and Ramamoorthi (2003), except that we use Lemma 1 in place of the strong law of large numbers for i.i.d. random variables.

Condition 1 ensures that $x\mapsto\sup_{\theta\in\Theta}|H(\theta,x)|$ is measurable; this can be seen by letting $\theta_1,\theta_2,\ldots$ be a countable dense subset of $\Theta$, verifying that $\sup_{j\in\mathbb{N}}|H(\theta_j,x)|=\sup_{\theta\in\Theta}|H(\theta,x)|$, and using Folland (2013, Proposition 2.7). Define $\mu(\theta):=\mathbb{E}(H(\theta,X_1))$, and note that $\mu(\theta)$ is continuous by the dominated convergence theorem. Let $B_r(\theta_0):=\{\theta\in\Theta:d(\theta,\theta_0)<r\}$ denote the open ball of radius $r$ at $\theta_0$, where $d(\cdot,\cdot)$ is the metric on $\Theta$. For $\theta\in\Theta$, $x\in\mathbb{R}^u$, and $r>0$, define

\[
\eta(\theta,x,r):=\sup_{\theta'\in B_r(\theta)}\Big|\big(H(\theta,x)-\mu(\theta)\big)-\big(H(\theta',x)-\mu(\theta')\big)\Big|, \tag{16}
\]

and observe that by continuity and compactness,

\[
\eta(\theta,x,r)\leq 2\sup_{\theta\in\Theta}|H(\theta,x)|+2\sup_{\theta\in\Theta}|\mu(\theta)|<\infty. \tag{17}
\]

Applying the dominated convergence theorem again, we have $\lim_{r\to 0}\mathbb{E}(\eta(\theta,X_1,r))=0$ for all $\theta\in\Theta$. Thus, for any $\varepsilon>0$, by compactness of $\Theta$ there exist $K\in\mathbb{N}$, $\theta_1,\ldots,\theta_K\in\Theta$, and $r_1,\ldots,r_K>0$ such that $\Theta=\bigcup_{i=1}^K B_{r_i}(\theta_i)$ and $\mathbb{E}(\eta(\theta_i,X_1,r_i))<\varepsilon$ for all $i\in\{1,\ldots,K\}$. Choosing $\delta\in[0,1)$ according to the statement of Theorem 1,

\[
\mathbb{E}\big(\eta(\theta_i,X_1,r_i)^{\frac{1}{1-\delta}}\big)\leq\mathbb{E}\Big(\big(2\sup_{\theta\in\Theta}|H(\theta,X_1)|+2\sup_{\theta\in\Theta}|\mu(\theta)|\big)^{\frac{1}{1-\delta}}\Big) \tag{18}
\]

\[
\leq 4^{\frac{1}{1-\delta}}\,\mathbb{E}\Big(\sup_{\theta\in\Theta}|H(\theta,X_1)|^{\frac{1}{1-\delta}}\Big)+4^{\frac{1}{1-\delta}}\sup_{\theta\in\Theta}|\mu(\theta)|^{\frac{1}{1-\delta}}<\infty \tag{19}
\]

since $(x+y)^{\frac{1}{1-\delta}}\leq(2\max\{|x|,|y|\})^{\frac{1}{1-\delta}}\leq 2^{\frac{1}{1-\delta}}\big(|x|^{\frac{1}{1-\delta}}+|y|^{\frac{1}{1-\delta}}\big)$. For all $i\in\{1,\ldots,K\}$, by applying Lemma 1 with $Z_j=\eta(\theta_i,X_j,r_i)$ we have that

\[
\frac{1}{m_n}\sum_{j=1}^n W_{nj}\,\eta(\theta_i,X_j,r_i)\xrightarrow[n\to\infty]{\mathrm{a.s.}}\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)<\varepsilon. \tag{20}
\]

Similarly, by Lemma 1 with $Z_j=H(\theta_i,X_j)$,

\[
\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta_i,X_j)\xrightarrow[n\to\infty]{\mathrm{a.s.}}\mu(\theta_i). \tag{21}
\]

Thus, for any $\theta\in\Theta$, by choosing $i$ such that $\theta\in B_{r_i}(\theta_i)$, we have

\[
\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta,X_j)-\mu(\theta)\bigg|\leq\frac{1}{m_n}\sum_{j=1}^n W_{nj}\,\eta(\theta_i,X_j,r_i)+\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta_i,X_j)-\mu(\theta_i)\bigg| \tag{22}
\]

by the triangle inequality and Equation 16. Letting $V_{ni}$ denote the right-hand side of Equation 22, we have $V_{ni}\to\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)<\varepsilon$ a.s. by Equations 20 and 21. Therefore,

\[
\sup_{\theta\in\Theta}\bigg|\frac{1}{m_n}\sum_{j=1}^n W_{nj}H(\theta,X_j)-\mu(\theta)\bigg|\leq\max_{1\leq i\leq K}V_{ni}\xrightarrow[n\to\infty]{\mathrm{a.s.}}\max_{1\leq i\leq K}\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)<\varepsilon. \tag{23}
\]

Since $\varepsilon>0$ is arbitrary, Equation 23 holds almost surely for $\varepsilon=\varepsilon_k=1/k$ for all $k\in\mathbb{N}$, completing the proof. ∎

The following result, due to Chen et al. (2019), is a more general version of Lemma 1 that extends beyond the standard bootstrap to negatively orthant dependent (NOD) weights.

Lemma 2.

Let $Z_1,Z_2,\ldots\in\mathbb{R}$ be identically distributed NOD random variables. For each $n\in\mathbb{N}$ independently, let $W_{n1},\ldots,W_{nn}$ be NOD random variables, independent of $(Z_1,Z_2,\ldots)$. Suppose $\mathbb{E}(|Z_1|^\beta)<\infty$ and $\sum_{j=1}^n\mathbb{E}(|W_{nj}|^\alpha)=O(n)$, where $p\in[1,2)$ and either (a) $\alpha>2p$ and $\beta>1$ satisfy $\alpha^{-1}+\beta^{-1}=p^{-1}$, or (b) the weights $W_{nj}$ are identically distributed for all $n,j$, and $\alpha=\beta=2p$. Then

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}Z_j-\mathbb{E}(W_{nj}Z_j)\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{24}
\]
Proof.

See Chen et al. (2019, Theorem 1 and Corollary 1) for the proof. ∎

Proof of Theorem 2.

The proof is similar to the proof of Theorem 1, except that we use Lemma 2 instead of Lemma 1, and some modifications are needed to handle more general distributions of the weights $W_{nj}$.

As in the proof of Theorem 1, define $\eta(\theta,x,r)$ by Equation 16 and $\mu(\theta):=\mathbb{E}(H(\theta,X_1))$. As before, for any $\varepsilon>0$, there exist $K\in\mathbb{N}$, $\theta_1,\ldots,\theta_K\in\Theta$, and $r_1,\ldots,r_K>0$ such that $\Theta=\bigcup_{i=1}^K B_{r_i}(\theta_i)$ and $\mathbb{E}(\eta(\theta_i,X_1,r_i))<\varepsilon$ for all $i\in\{1,\ldots,K\}$. Just as in Equation 18,

\[
\mathbb{E}\big(\eta(\theta_i,X_1,r_i)^\beta\big)\leq 4^\beta\,\mathbb{E}\Big(\sup_{\theta\in\Theta}|H(\theta,X_1)|^\beta\Big)+4^\beta\sup_{\theta\in\Theta}|\mu(\theta)|^\beta<\infty.
\]

For all $i\in\{1,\ldots,K\}$, by applying Lemma 2 with $Z_j=1$ and $Z_j=H(\theta_i,X_j)$, respectively,

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0, \tag{25}
\]

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}H(\theta_i,X_j)-\mathbb{E}(W_{nj})\mu(\theta_i)\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{26}
\]

Let $W_{nj}^+=\max\{W_{nj},0\}$ and $W_{nj}^-=\max\{-W_{nj},0\}$. Then $(W_{n1}^+,\ldots,W_{nn}^+)$ and $(W_{n1}^-,\ldots,W_{nn}^-)$ are each NOD because they are monotone transformations of the $W_{nj}$'s (Chen et al., 2019). Thus, Lemma 2 applied to $Z_j=\eta(\theta_i,X_j,r_i)$ with $W_{nj}^+$ and $W_{nj}^-$, respectively, yields

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}^+\,\eta(\theta_i,X_j,r_i)-\mathbb{E}(W_{nj}^+)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\Big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0, \tag{27}
\]

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}^-\,\eta(\theta_i,X_j,r_i)-\mathbb{E}(W_{nj}^-)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\Big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{28}
\]

By adding Equations 27 and 28 and using the fact that $|W_{nj}|=W_{nj}^++W_{nj}^-$, we have

\[
\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(|W_{nj}|\,\eta(\theta_i,X_j,r_i)-\mathbb{E}(|W_{nj}|)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\Big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}0. \tag{29}
\]

Since $\mathbb{E}(\eta(\theta_i,X_1,r_i))<\varepsilon$, Equation 29 implies that, almost surely,

\[
\limsup_{n\to\infty}\frac{1}{n^{1/p}}\sum_{j=1}^n|W_{nj}|\,\eta(\theta_i,X_j,r_i)\leq\limsup_{n\to\infty}\frac{1}{n^{1/p}}\sum_{j=1}^n\mathbb{E}(|W_{nj}|)\,\mathbb{E}\big(\eta(\theta_i,X_1,r_i)\big)\leq C\varepsilon, \tag{30}
\]

where $C:=\limsup_{n\to\infty}n^{-1/p}\sum_{j=1}^n\mathbb{E}(|W_{nj}|)$. Note that $C<\infty$ by the assumption that $\sum_{j=1}^n\mathbb{E}(|W_{nj}|)=O(n^{1/p})$. For any $\theta\in\Theta$, choosing $i$ such that $\theta\in B_{r_i}(\theta_i)$, we can write

\[
\begin{aligned}
W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta) &= W_{nj}\Big(\big(H(\theta,X_j)-\mu(\theta)\big)-\big(H(\theta_i,X_j)-\mu(\theta_i)\big)\Big) \\
&\quad+\big(W_{nj}-\mathbb{E}(W_{nj})\big)\big(\mu(\theta)-\mu(\theta_i)\big) \\
&\quad+\big(W_{nj}H(\theta_i,X_j)-\mathbb{E}(W_{nj})\mu(\theta_i)\big).
\end{aligned} \tag{31}
\]

Summing Equation 31 over $j$, using the triangle inequality, and employing Equation 16,

\[
\begin{aligned}
\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta)\Big)\bigg| &\leq\frac{1}{n^{1/p}}\sum_{j=1}^n|W_{nj}|\,\eta(\theta_i,X_j,r_i) \\
&\quad+\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\bigg|\,\sup_{\theta\in\Theta}|\mu(\theta)-\mu(\theta_i)| \\
&\quad+\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta_i,X_j)-\mathbb{E}(W_{nj})\mu(\theta_i)\Big)\bigg|.
\end{aligned} \tag{32}
\]

Letting $V_{ni}$ denote the right-hand side of Equation 32, we have $\limsup_n V_{ni}\leq C\varepsilon$ a.s. by Equations 30, 25, and 26, along with the fact that $\mu(\theta)$ is continuous and $\Theta$ is compact. Therefore, almost surely,

\[
\limsup_{n\to\infty}\,\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta)\Big)\bigg|\leq\limsup_{n\to\infty}\max_{1\leq i\leq K}V_{ni}\leq C\varepsilon. \tag{33}
\]

As before, since $\varepsilon>0$ is arbitrary, Equation 33 holds almost surely for $\varepsilon=\varepsilon_k=1/k$ for all $k\in\mathbb{N}$, completing the proof. ∎

Proof of Theorem 3.

First, we may assume without loss of generality that $W_{nj}$ and $H(\theta,X_j)$ are nonnegative, since we can write $W_{nj}=W_{nj}^+-W_{nj}^-$ and $H(\theta,X_j)=H(\theta,X_j)^+-H(\theta,X_j)^-$, and apply the result in the nonnegative case to each of $W_{nj}^+H(\theta,X_j)^+$, $W_{nj}^+H(\theta,X_j)^-$, $W_{nj}^-H(\theta,X_j)^+$, and $W_{nj}^-H(\theta,X_j)^-$ to obtain the result for $W_{nj}H(\theta,X_j)$ in the general case; see the proof of Theorem 1 of Chen et al. (2019) for a similar argument.

Let $\varepsilon>0$. As before, define $\eta(\theta,x,r)$ by Equation 16 and $\mu(\theta):=\mathbb{E}(H(\theta,X_1))$. For $n\in\mathbb{N}$, define $r_n=(\varepsilon/n^{1-1/p})^{1/a}$ and $K_n=N(r_n)$, where $N(r)$ is defined in Condition 3. Let $\theta_{n1},\ldots,\theta_{nK_n}\in\Theta$ be the centers of $K_n$ balls of radius $r_n$ that cover $\Theta$, that is, $\Theta=\bigcup_{i=1}^{K_n}B_{r_n}(\theta_{ni})$.

Let $C_W=1+\sup_n\frac{1}{n}\sum_{j=1}^n\mathbb{E}(W_{nj}^q)<\infty$. Note that $q>2p$ since $p>1$. Thus, by Lemma 2 with $Z_j=1$,

\[
\frac{1}{n}\sum_{j=1}^n W_{nj}\leq C_W+\frac{1}{n}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\xrightarrow[n\to\infty]{\mathrm{a.s.}}C_W. \tag{34}
\]

For all $n$ sufficiently large that $r_n<\delta$, we have $\eta(\theta,x,r_n)\leq 2Mr_n^a=2M\varepsilon/n^{1-1/p}$ by Condition 2, and thus,

\[
\limsup_{n\to\infty}\max_{i\in[K_n]}\frac{1}{n^{1/p}}\sum_{j=1}^n W_{nj}\,\eta(\theta_{ni},X_j,r_n)\leq 2M\varepsilon\limsup_{n\to\infty}\frac{1}{n}\sum_{j=1}^n W_{nj}\stackrel{\mathrm{a.s.}}{\leq}2MC_W\varepsilon, \tag{35}
\]

where $[K_n]=\{1,\ldots,K_n\}$. By another application of Lemma 2 with $Z_j=1$,

\[
\limsup_{n\to\infty}\max_{i\in[K_n]}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\big(W_{nj}-\mathbb{E}(W_{nj})\big)\bigg|\,\sup_{\theta\in\Theta}|\mu(\theta)-\mu(\theta_{ni})|\stackrel{\mathrm{a.s.}}{=}0, \tag{36}
\]

since $\max_{i\in[K_n]}\sup_\theta|\mu(\theta)-\mu(\theta_{ni})|\leq 2\sup_\theta|\mu(\theta)|<\infty$. Defining $Y_{nij}:=W_{nj}H(\theta_{ni},X_j)-\mathbb{E}(W_{nj})\mu(\theta_{ni})$, we claim that

\[
\limsup_{n\to\infty}\max_{i\in[K_n]}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\bigg|\stackrel{\mathrm{a.s.}}{\leq}\varepsilon. \tag{37}
\]

Assuming Equation 37 for the moment, we use the same decomposition as in Equation 32 and plug in Equations 35–37 to obtain

\[
\limsup_{n\to\infty}\sup_{\theta\in\Theta}\bigg|\frac{1}{n^{1/p}}\sum_{j=1}^n\Big(W_{nj}H(\theta,X_j)-\mathbb{E}(W_{nj})\mu(\theta)\Big)\bigg|\leq 2MC_W\varepsilon+0+\varepsilon.
\]

Since $\varepsilon>0$ is arbitrarily small, this will yield the result of the theorem.

To complete the proof, we need to show Equation 37. For each $n$ and $i$, by Chen et al. (2019, Lemma 1), $Y_{ni1},\ldots,Y_{nin}$ is an NOD sequence, since we have assumed without loss of generality that $W_{nj}$ and $H(\theta,X_j)$ are nonnegative, and adding constants preserves the NOD property. Asadian et al. (2006) and Rivaz et al. (2007) provide moment inequalities that are useful in this context. By Asadian et al. (2006, Corollary 2.2) (see also Rivaz et al., 2007, Corollary 3), since $\mathbb{E}(Y_{nij})=0$ and $q>2$,

\[
\mathbb{E}\bigg(\Big|\sum_{j=1}^n Y_{nij}\Big|^q\bigg)\leq C_A(q)\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^q)+C_A(q)\bigg(\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^2)\bigg)^{q/2}, \tag{38}
\]

where $C_A(q)$ is a universal constant that depends only on $q$. Letting $C_H=1+\sup_{\theta\in\Theta}\mathbb{E}(|H(\theta,X_1)|^q)<\infty$, we have

\[
\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^q)\leq\sum_{j=1}^n 2^{q+1}\,\mathbb{E}\big(|W_{nj}|^q|H(\theta_{ni},X_j)|^q\big)\leq 2^{q+1}C_H C_W n. \tag{39}
\]

Likewise, since $\mathrm{Var}(X)\leq\mathbb{E}(X^2)$ for any random variable $X$,

\[
\sum_{j=1}^n\mathbb{E}(|Y_{nij}|^2)\leq\sum_{j=1}^n\mathbb{E}\big(|W_{nj}|^2|H(\theta_{ni},X_j)|^2\big)\leq C_H C_W n. \tag{40}
\]

Plugging Equations 39 and 40 into Equation 38, we have

\[
\mathbb{E}\bigg(\Big|\sum_{j=1}^n Y_{nij}\Big|^q\bigg)\leq Cn^{q/2}, \tag{41}
\]

where $C=C_A(q)2^{q+1}C_H C_W+C_A(q)(C_H C_W)^{q/2}$. By Condition 3, $K_n=N(r_n)\leq c\,r_n^{-D}=c\,n^{(1-1/p)D/a}/\varepsilon^{D/a}$. Along with Markov's inequality and Equation 41, this implies

\[
\begin{aligned}
\mathbb{P}\bigg(\max_{i\in[K_n]}\Big|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\Big|>\varepsilon\bigg) &\leq\sum_{i=1}^{K_n}\mathbb{P}\bigg(\Big|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\Big|>\varepsilon\bigg)\leq\frac{1}{\varepsilon^q n^{q/p}}\sum_{i=1}^{K_n}\mathbb{E}\bigg(\Big|\sum_{j=1}^n Y_{nij}\Big|^q\bigg) \\
&\leq\frac{K_n C n^{q/2}}{\varepsilon^q n^{q/p}}\leq\frac{c\,C}{\varepsilon^q\varepsilon^{D/a}}\,n^{(1-1/p)D/a}\,n^{q(1/2-1/p)}=\frac{c\,C}{\varepsilon^q\varepsilon^{D/a}}\,n^{-\gamma},
\end{aligned}
\]

where $\gamma:=q(1/p-1/2)-(1-1/p)D/a>1$ because $q>((1-1/p)D/a+1)/(1/p-1/2)$ by assumption. Hence,

\[
\sum_{n=1}^\infty\mathbb{P}\bigg(\max_{i\in[K_n]}\Big|\frac{1}{n^{1/p}}\sum_{j=1}^n Y_{nij}\Big|>\varepsilon\bigg)\leq\frac{c\,C}{\varepsilon^q\varepsilon^{D/a}}\sum_{n=1}^\infty n^{-\gamma}<\infty. \tag{42}
\]

Therefore, Equation 37 holds by the Borel–Cantelli lemma. This completes the proof. ∎

References

  • Antal and Tillé (2011a) E. Antal and Y. Tillé. A direct bootstrap method for complex sampling designs from a finite population. Journal of the American Statistical Association, 106(494):534–543, 2011a.
  • Antal and Tillé (2011b) E. Antal and Y. Tillé. Simple random sampling with over-replacement. Journal of Statistical Planning and Inference, 141(1):597–601, 2011b.
  • Arenal-Gutiérrez et al. (1996) E. Arenal-Gutiérrez, C. Matrán, and J. A. Cuesta-Albertos. On the unconditional strong law of large numbers for the bootstrap mean. Statistics & Probability Letters, 27(1):49–60, 1996.
  • Asadian et al. (2006) N. Asadian, V. Fakoor, and A. Bozorgnia. Rosenthal’s Type Inequalities for Negatively Orthant Dependent Random Variables. Journal of the Iranian Statistical Society, 5(1 and 2):69–75, 2006.
  • Athreya et al. (1984) K. B. Athreya, M. Ghosh, L. Y. Low, and P. K. Sen. Laws of large numbers for bootstrapped U-statistics. Journal of Statistical Planning and Inference, 9(2):185–194, 1984.
  • Bickel et al. (2012) P. J. Bickel, F. Götze, and W. R. van Zwet. Resampling fewer than n observations: gains, losses, and remedies for losses. In Selected works of Willem van Zwet, pages 267–297. Springer, 2012.
  • Breiman (1996) L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
  • Chatterjee and Bose (2005) S. Chatterjee and A. Bose. Generalized bootstrap for estimating equations. The Annals of Statistics, 33(1):414–436, 2005.
  • Chen et al. (2019) P. Chen, T. Zhang, and S. H. Sung. Strong laws for randomly weighted sums of random variables and applications in the bootstrap and random design regression. Statistica Sinica, 29(4):1739–1749, 2019.
  • Efron (1965) B. Efron. Increasing properties of Pólya frequency functions. The Annals of Mathematical Statistics, pages 272–279, 1965.
  • Efron (1979) B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, pages 1–26, 1979.
  • Efron (1982) B. Efron. The jackknife, the bootstrap and other resampling plans. SIAM, 1982.
  • Efron and Tibshirani (1994) B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC press, 1994.
  • Folland (2013) G. B. Folland. Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, 2013.
  • Ghosh and Ramamoorthi (2003) J. K. Ghosh and R. V. Ramamoorthi. Bayesian Nonparametrics. Springer Series in Statistics. Springer, New York, NY, 2003. ISBN 0387955372.
  • Giné and Zinn (1990) E. Giné and J. Zinn. Bootstrapping general empirical measures. The Annals of Probability, pages 851–869, 1990.
  • Huggins and Miller (2019) J. H. Huggins and J. W. Miller. Robust inference and model criticism using bagged posteriors. arXiv preprint arXiv:1912.07104, 2019.
  • Huggins and Miller (2022) J. H. Huggins and J. W. Miller. Reproducible model selection using bagged posteriors. Bayesian Analysis, pages 1–26, 2022.
  • Joag-Dev and Proschan (1983) K. Joag-Dev and F. Proschan. Negative association of random variables with applications. The Annals of Statistics, pages 286–295, 1983.
  • Kosorok (2008) M. R. Kosorok. Introduction to Empirical Processes and Semiparametric Inference. Springer, 2008.
  • Lehmann (1966) E. L. Lehmann. Some concepts of dependence. The Annals of Mathematical Statistics, 37(5):1137–1153, 1966.
  • Little and Badawy (2019) M. A. Little and R. Badawy. Causal bootstrapping. arXiv preprint arXiv:1910.09648, 2019.
  • Nelsen (2006) R. B. Nelsen. An Introduction to Copulas. Springer, 2006.
  • Newton and Raftery (1994) M. A. Newton and A. E. Raftery. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society: Series B (Methodological), 56(1):3–26, 1994.
  • Patak and Beaumont (2009) Z. Patak and J.-F. Beaumont. Generalized bootstrap for prices surveys. 57th Session of the International Statistical Institute, Durban, South-Africa, 2009.
  • Rivaz et al. (2007) F. Rivaz, M. Amini, and A. G. Bozorgnia. Moment Inequalities and Applications for Negative Dependence Random Variables. 26(4):7–11, 2007.
  • Rubin (1981) D. B. Rubin. The Bayesian bootstrap. The Annals of Statistics, pages 130–134, 1981.
  • Shalev-Shwartz and Ben-David (2014) S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
  • Singh (1981) K. Singh. On the asymptotic accuracy of Efron’s bootstrap. The Annals of Statistics, pages 1187–1195, 1981.
  • Singh and Xie (2003) K. Singh and M. Xie. Bootlier-plot: bootstrap based outlier detection plot. Sankhyā: The Indian Journal of Statistics, pages 532–559, 2003.
  • van der Vaart and Wellner (1996) A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.