
On the lower expected star discrepancy of jittered sampling compared with simple random sampling

Jun Xian
Department of Mathematics and Guangdong Province Key Laboratory of Computational Science
Sun Yat-sen University
510275 Guangzhou
China.
[email protected]

Xiaoda Xu
Department of Mathematics
Sun Yat-sen University
510275 Guangzhou
China.
[email protected]
Abstract.

We compare the expected star discrepancy of jittered sampling with that of simple random sampling, and prove the strong partition principle for the star discrepancy.

Key words and phrases:
Expected star discrepancy; Stratified sampling.
2010 Mathematics Subject Classification:
65C10, 11K38, 65D30.

1. Introduction

Classical jittered sampling (JS) achieves better expected star discrepancy bounds than traditional Monte Carlo (MC) sampling, see [9]. In this sense, jittered sampling is a refinement of the traditional Monte Carlo method; the question now is whether there is a direct comparison of the star discrepancy itself, not merely a comparison of bounds. This is the star discrepancy version of the strong partition principle, see the introduction in [13]. Among the various techniques for measuring the irregularity of point distributions, the star discrepancy is the most prominent; it is well established and has found applications in areas such as computer graphics, machine learning, numerical integration, and financial engineering, see [15, 1, 7, 10].

Star discrepancy. The star discrepancy of a sampling set $P_{N,d}=\{t_{1},t_{2},\ldots,t_{N}\}$ is defined by

(1.1) $D_{N}^{*}(t_{1},t_{2},\ldots,t_{N}):=\sup_{x\in[0,1]^{d}}\left|\lambda([0,x])-\frac{\sum_{n=1}^{N}I_{[0,x]}(t_{n})}{N}\right|,$

where $\lambda$ denotes the $d$-dimensional Lebesgue measure and $I_{[0,x]}$ denotes the characteristic function of the $d$-dimensional rectangle $[0,x]$.
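The definition (1.1) can be checked numerically. The following is a minimal sketch (ours, not from the paper) that approximates $D_{N}^{*}$ from below by evaluating the local discrepancy over a finite grid of anchored boxes; the function name and the resolution parameter `res` are our own choices.

```python
# Brute-force lower-bound approximation of the star discrepancy D_N^*:
# evaluate |lambda([0,x]) - (# points in [0,x])/N| on a res^d grid of
# corners x.  A grid maximum only bounds the supremum in (1.1) from below.
import itertools
import numpy as np

def star_discrepancy_lower_bound(points: np.ndarray, res: int = 50) -> float:
    n, d = points.shape
    corners = np.linspace(0.0, 1.0, res + 1)[1:]    # grid coordinates in (0, 1]
    best = 0.0
    for x in itertools.product(corners, repeat=d):
        x = np.asarray(x)
        volume = np.prod(x)                          # lambda([0, x])
        count = np.all(points <= x, axis=1).sum()    # points inside [0, x]
        best = max(best, abs(volume - count / n))
    return best

rng = np.random.default_rng(0)
print(star_discrepancy_lower_bound(rng.random((16, 2))))   # N = 16, d = 2
```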

Special deterministic point set constructions with small star discrepancy are called low-discrepancy point sets; the best known asymptotic upper bounds for their star discrepancy are of the form

$O\Big(\frac{(\ln N)^{\alpha_{d}}}{N}\Big),$

where $\alpha_{d}\geq 0$ for fixed $d$; for studies of such point sets, see [14, 6]. In the present paper we introduce random factors. Previous works have extensively compared expected discrepancies. In [13], it is proved that jittered sampling constitutes a set whose expected $L_{p}$-discrepancy is smaller than that of purely random points. Further, [12] shows that jittered sampling does not have the minimal expected $L_{2}$-discrepancy. We study the expected star discrepancy under these sampling models and prove the following comparison:

(1.2) $\mathbb{E}(D_{N}^{*}(Y))<\mathbb{E}(D_{N}^{*}(X)),$

where $X$ denotes a simple random sampling point set and $Y$ denotes a stratified sampling point set uniformly distributed in the grid-based strata. In 2016, F. Pausinger and S. Steinerberger [16] gave upper and lower bounds for the expected star discrepancy under jittered sampling and proved the strong partition principle for the $L_{2}$-discrepancy; they asked whether this conclusion could be generalized to the star discrepancy. In 2021, M. Kiderlen and F. Pausinger [12] referred again to the problem of proving the strong partition principle for the star discrepancy.

The rest of this paper is organized as follows. Section 2 presents preliminaries on stratified sampling and $\delta$-covers. Section 3 presents our main result, which compares the expected star discrepancy of simple random sampling and of stratified sampling under the grid-based equivolume partition. Section 4 contains the proof of the main result. Finally, in Section 5 we conclude the paper with a short summary.

2. Preliminaries on stratified sampling and $\delta$-covers

Before introducing the main result, we list the preliminaries used in this paper.

2.1. Jittered sampling

Jittered sampling is a type of grid-based equivolume partition: $[0,1]^{d}$ is divided into $N=m^{d}$ axis-parallel boxes $Q_{i}$, $1\leq i\leq N$, each with sides of length $\frac{1}{m}$; see the illustration in Figure 1. Research on jittered sampling is extensive, see [4, 9, 12, 13, 16].

Figure 1. Jittered sampling formed by the isometric grid partition: (a) two dimensions; (b) three dimensions.
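For concreteness, jittered sampling can be generated as follows; this is a minimal sketch under our own conventions (one uniform point per grid cell), not code from the paper.

```python
# Jittered sampling: one uniform point in each of the N = m^d subcubes
# with side 1/m that partition [0,1]^d.
import itertools
import numpy as np

def jittered_sample(m: int, d: int, rng: np.random.Generator) -> np.ndarray:
    corners = np.array(list(itertools.product(range(m), repeat=d))) / m
    return corners + rng.random(corners.shape) / m   # uniform offset inside each cell

rng = np.random.default_rng(1)
Y = jittered_sample(4, 2, rng)       # stratified set Y, N = 16 points in [0,1]^2
X = rng.random((16, 2))              # simple random sampling set X, for comparison
```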

For jittered sampling, we consider a rectangle $R=[0,x)$ (we shall call it the test set in the following) in $[0,1]^{d}$ anchored at 0. For the corresponding isometric grid partition $\Omega=\{Q_{1},Q_{2},\ldots,Q_{N}\}$ of $[0,1]^{d}$, we put

$I_{N}:=\{j:\partial R\cap Q_{j}\neq\emptyset\},$

and

$C_{N}:=|I_{N}|,$

that is, $C_{N}$ is the cardinality of the index set $I_{N}$. It is easy to show that

(2.1) $C_{N}\leq d\cdot N^{1-\frac{1}{d}}.$
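Bound (2.1) is easy to check numerically; below is a minimal sketch (our own construction) that counts the grid cells met by the boundary of a random anchored box and compares the count with $d\cdot N^{1-\frac{1}{d}}$.

```python
# Count the cells Q_j of the m^d grid that the boundary of R = [0, x)
# passes through, and check C_N <= d * N^(1 - 1/d) = d * m^(d-1).
import itertools
import numpy as np

def boundary_cell_count(x: np.ndarray, m: int) -> int:
    d = x.size
    count = 0
    for k in itertools.product(range(m), repeat=d):
        a = np.array(k) / m                  # lower corner of the cell
        b = a + 1.0 / m                      # upper corner of the cell
        touches = bool(np.all(a <= x))       # cell meets the closed box [0, x]
        inside = bool(np.all(b <= x))        # cell lies entirely inside [0, x]
        count += touches and not inside      # boundary cuts through this cell
    return count

m, d = 4, 2
N = m ** d
rng = np.random.default_rng(2)
for _ in range(5):
    x = rng.random(d)
    assert boundary_cell_count(x, m) <= d * N ** (1 - 1 / d)
```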

2.2. $\delta$-covers

To discretize the star discrepancy, we use the definition of $\delta$-covers as in [8], which is well known in empirical process theory, see, e.g., [17].

Definition 2.1.

For any $\delta\in(0,1]$, a finite set $\Gamma$ of points in $[0,1)^{d}$ is called a $\delta$-cover of $[0,1)^{d}$ if for every $y\in[0,1)^{d}$ there exist $x,z\in\Gamma\cup\{0\}$ such that $x\leq y\leq z$ and $\lambda([0,z])-\lambda([0,x])\leq\delta$. The number $\mathcal{N}(d,\delta)$ denotes the smallest cardinality of a $\delta$-cover of $[0,1)^{d}$.

From [8, 11], combined with Stirling's formula, the following estimate for $\mathcal{N}(d,\delta)$ holds: for any $d\geq 1$ and $\delta\in(0,1]$,

(2.2) $\mathcal{N}(d,\delta)\leq 2^{d}\cdot\frac{e^{d}}{\sqrt{2\pi d}}\cdot(\delta^{-1}+1)^{d}.$

Let $P=\{p_{1},p_{2},\ldots,p_{N}\}\subset[0,1]^{d}$ and let $\Gamma$ be a $\delta$-cover; then

$D_{N}^{*}(P)\leq D_{\Gamma}(P)+\delta,$

where

(2.3) $D_{\Gamma}(P):=\max_{x\in\Gamma}\Big|\lambda([0,x])-\frac{\sum_{n=1}^{N}I_{[0,x]}(p_{n})}{N}\Big|.$

Formula (2.3) is convenient for estimating the star discrepancy.
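As an illustration, one simple (suboptimal) $\delta$-cover is the equidistant grid with coordinates $\{1/k,\ldots,1\}$, which covers with $\delta=1-(1-1/k)^{d}$; the sketch below (our own construction, not from [8]) evaluates $D_{\Gamma}$ on such a grid, so that $D_{\Gamma}\leq D_{N}^{*}\leq D_{\Gamma}+\delta$.

```python
# Discretized star discrepancy D_Gamma over an equidistant grid Gamma,
# giving the two-sided bound D_Gamma <= D_N^* <= D_Gamma + delta of (2.3).
import itertools
import numpy as np

def discretized_discrepancy(points: np.ndarray, k: int):
    n, d = points.shape
    coords = np.arange(1, k + 1) / k                 # grid coordinates {1/k, ..., 1}
    d_gamma = max(
        abs(np.prod(x) - np.all(points <= np.asarray(x), axis=1).sum() / n)
        for x in itertools.product(coords, repeat=d)
    )
    delta = 1.0 - (1.0 - 1.0 / k) ** d               # cover radius of this grid
    return d_gamma, d_gamma + delta                  # lower / upper bound for D_N^*

rng = np.random.default_rng(3)
lower, upper = discretized_discrepancy(rng.random((16, 2)), k=32)
print(lower, upper)
```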

3. Expected star discrepancy for stratified and simple random sampling

In this section, a comparison of the expected star discrepancy for jittered sampling and simple random sampling is obtained, which shows that the strong partition principle holds for the star discrepancy.

3.1. Comparison of expected star discrepancy under jittered sampling and simple random sampling

Theorem 3.1.

Let $m,d\in\mathbb{N}$ with $m\geq d\geq 2$ and let $N=m^{d}$. Let $X=\{X_{1},X_{2},\ldots,X_{N}\}$ be a simple random sampling set, and let the stratified random $d$-dimensional point set $Y=\{Y_{1},Y_{2},\ldots,Y_{N}\}$ be uniformly distributed in the grid-based strata $\Omega^{*}=\{\Omega^{*}_{1},\ldots,\Omega^{*}_{N}\}$; then

(3.1) $\mathbb{E}(D_{N}^{*}(Y))<\mathbb{E}(D_{N}^{*}(X)).$
Remark 3.2.

Inequality (3.1) in Theorem 3.1 shows that grid-based stratified sampling has a lower expected star discrepancy than simple random sampling, which also means that the Strong Partition Principle [13] holds for the star discrepancy; it is stated in [12] that the strong partition principle for the star discrepancy remained to be proved.
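Theorem 3.1 can also be illustrated empirically. The following minimal simulation (ours; it uses the grid proxy $D_{\Gamma}$ for $D_{N}^{*}$ and hypothetical parameter choices) typically shows a smaller average discrepancy for the jittered points $Y$ than for the i.i.d. points $X$.

```python
# Monte Carlo illustration of (3.1): average the grid proxy D_Gamma of the
# star discrepancy over many draws of jittered points Y and i.i.d. points X.
import itertools
import numpy as np

def d_gamma(points, k=32):
    n, d = points.shape
    coords = np.arange(1, k + 1) / k
    return max(
        abs(np.prod(x) - np.all(points <= np.asarray(x), axis=1).sum() / n)
        for x in itertools.product(coords, repeat=d)
    )

m, d, trials = 4, 2, 200
N = m ** d
rng = np.random.default_rng(4)
corners = np.array(list(itertools.product(range(m), repeat=d))) / m
mean_y = np.mean([d_gamma(corners + rng.random((N, d)) / m) for _ in range(trials)])
mean_x = np.mean([d_gamma(rng.random((N, d))) for _ in range(trials)])
print(mean_y, mean_x)       # empirically mean_y < mean_x, matching Theorem 3.1
```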

4. Proofs

In this section, we present the proof of Theorem 3.1. We first state Bernstein's inequality, a standard result that can be found in textbooks on probability, see, e.g., [5]; we omit its proof.

Lemma 4.1 (Bernstein’s inequality).

Let $Z_{1},\ldots,Z_{N}$ be independent random variables with expected values $\mathbb{E}(Z_{j})=\mu_{j}$ and variances $\sigma_{j}^{2}$ for $j=1,\ldots,N$. Assume $|Z_{j}-\mu_{j}|\leq C$ for each $j$, where $C$ is a constant, and set $\Sigma^{2}:=\sum_{j=1}^{N}\sigma_{j}^{2}$. Then for any $\lambda\geq 0$,

$\mathbb{P}\left\{\Big|\sum_{j=1}^{N}[Z_{j}-\mu_{j}]\Big|\geq\lambda\right\}\leq 2\exp\left(-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2}{3}C\lambda}\right).$
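As a sanity check, the tail bound of Lemma 4.1 can be tested on a toy example; the setup below (centered Bernoulli variables, our own parameter choices) is ours and not part of the proof.

```python
# Empirical check of Bernstein's inequality for sums of centered
# Bernoulli(p) variables: |Z_j - mu_j| <= C with C = max(p, 1 - p).
import numpy as np

rng = np.random.default_rng(5)
N, p, trials = 100, 0.3, 20000
C = max(p, 1 - p)
Sigma2 = N * p * (1 - p)                     # sum of the N variances
sums = rng.binomial(1, p, size=(trials, N)).sum(axis=1) - N * p
for lam in (5.0, 10.0, 15.0):
    empirical = np.mean(np.abs(sums) >= lam)
    bound = 2 * np.exp(-lam ** 2 / (2 * Sigma2 + 2 * C * lam / 3))
    print(lam, empirical, bound)             # empirical <= bound in each case
```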

4.1. Proof of Theorem 3.1

For an arbitrary test set $R=[0,x)$, we use the unified label $W=\{W_{1},W_{2},\ldots,W_{N}\}$ for the sampling point sets formed by the different sampling models, and we consider the following discrepancy function:

(4.1) $\Delta_{\mathscr{P}}(x)=\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{R}(W_{n})-\lambda(R).$

For the grid-based equivolume partition $\Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\}$, we divide the test set $R$ into two parts: the disjoint union of the $\Omega_{i}$ entirely contained in $R$, and the union of the remaining pieces, which are the intersections of some $\Omega_{j}$ with $R$, i.e.,

(4.2) $R=\bigcup_{i\in I_{0}}\Omega_{i}\cup\bigcup_{j\in J_{0}}(\Omega_{j}\cap R),$

where $I_{0},J_{0}$ are two index sets.

We set

$T=\bigcup_{j\in J_{0}}(\Omega_{j}\cap R).$

Besides, for an equivolume partition $\Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\}$ of $[0,1]^{d}$ and the corresponding stratified sampling set $P_{\Omega}$, we have

(4.3) $\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{R}(P_{\Omega})\Big)=\sum_{i=1}^{N}\frac{|\Omega_{i}\cap[0,x]|}{|\Omega_{i}|}\Big(1-\frac{|\Omega_{i}\cap[0,x]|}{|\Omega_{i}|}\Big)=N|[0,x]|-N^{2}\sum_{i=1}^{N}|\Omega_{i}\cap[0,x]|^{2}.$

For the sampling sets $Y=\{Y_{1},Y_{2},\ldots,Y_{N}\}$ and $X=\{X_{1},X_{2},\ldots,X_{N}\}$, we have

(4.4) $\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{R}(Y)\Big)=N|[0,x]|-N^{2}\sum_{i=1}^{N}|\Omega^{*}_{i}\cap[0,x]|^{2},$

and

(4.5) $\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{R}(X)\Big)=N|[0,x]|-N|[0,x]|^{2}.$

Hence, since the Cauchy–Schwarz inequality gives $\sum_{i=1}^{N}|\Omega^{*}_{i}\cap[0,x]|^{2}\geq\frac{1}{N}\big(\sum_{i=1}^{N}|\Omega^{*}_{i}\cap[0,x]|\big)^{2}=\frac{1}{N}|[0,x]|^{2}$, we have

(4.6) $\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{R}(Y)\Big)\leq\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{R}(X)\Big).$

Now we exclude the equality case in (4.6). Equality in the Cauchy–Schwarz step would require

(4.7) $N|\Omega^{*}_{i}\cap[0,x]|=|[0,x]|$

for $i=1,2,\ldots,N$ and for almost all $x\in[0,1]^{d}$.

Hence,

(4.8) $\int_{[0,x]}\mathbf{1}_{\Omega^{*}_{i}}(y)\,dy=\int_{[0,x]}\frac{1}{N}\,dy$

for almost all $x\in[0,1]^{d}$ and all $i=1,2,\ldots,N$, which implies $\mathbf{1}_{\Omega^{*}_{i}}=\frac{1}{N}$ almost everywhere; this is impossible for $N\geq 2$, since a characteristic function takes only the values 0 and 1.
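Inequalities (4.4)–(4.6) can also be verified numerically for the grid partition; the following minimal sketch (ours) computes the exact overlap volumes $|\Omega^{*}_{i}\cap[0,x]|$ and checks the variance comparison, with (4.5) taken as the binomial variance $N|[0,x]|(1-|[0,x]|)$.

```python
# Analytic check of Var over jittered Y vs i.i.d. X, using the exact cell
# overlap volumes of the m^d grid with an anchored box [0, x].
import itertools
import numpy as np

def overlaps(x: np.ndarray, m: int) -> np.ndarray:
    d = x.size
    vols = []
    for k in itertools.product(range(m), repeat=d):
        a = np.array(k) / m
        side = np.clip(np.minimum(x, a + 1.0 / m) - a, 0.0, None)
        vols.append(np.prod(side))           # |Omega_i ∩ [0, x]|
    return np.array(vols)

m, d = 4, 2
N = m ** d
rng = np.random.default_rng(6)
for _ in range(5):
    x = rng.random(d)
    v = overlaps(x, m)
    vol = v.sum()                                # |[0, x]|
    var_y = N * vol - N ** 2 * np.sum(v ** 2)    # (4.4), jittered sampling
    var_x = N * vol - N * vol ** 2               # (4.5), binomial variance
    assert var_y <= var_x + 1e-12                # (4.6), by Cauchy-Schwarz
```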

For the set $T$, we have

(4.9) $\mathrm{Var}\Big(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(Y_{n})\Big)=\frac{1}{N^{2}}\sum_{i=1}^{|J_{0}|}\frac{|\Omega^{*}_{i}\cap[0,x]|}{|\Omega^{*}_{i}|}\Big(1-\frac{|\Omega^{*}_{i}\cap[0,x]|}{|\Omega^{*}_{i}|}\Big).$

Applying to $T$ the same analysis as for $R$, we obtain

(4.10) $\mathrm{Var}\Big(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(Y_{n})\Big)<\mathrm{Var}\Big(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(X_{n})\Big).$

For the test set $R=[0,x)$, we choose $R_{0}=[0,y)$ and $R_{1}=[0,z)$ such that $y\leq x\leq z$ and $\lambda(R_{1})-\lambda(R_{0})\leq\frac{1}{N}$; then the pairs $(R_{0},R_{1})$ constitute a $\frac{1}{N}$-cover. We divide $R_{0}$ and $R_{1}$ into two parts each, as we did for $R$ in (4.2). Let

$T_{0}=\bigcup_{j\in J_{0}}(\Omega_{j}\cap R_{0}),$

and

$T_{1}=\bigcup_{j\in J_{0}}(\Omega_{j}\cap R_{1}).$

We have the same conclusion for $T_{0}$ and $T_{1}$. In order to treat the two cases $T_{0}$ and $T_{1}$ uniformly (both are generated from test sets drawn from a family of the same cardinality, namely the covering number), we consider a set $R^{\prime}$ which can be divided into two parts

(4.11) $R^{\prime}=\bigcup_{k\in K}\Omega_{k}\cup\bigcup_{l\in L}(\Omega_{l}\cap R^{\prime}),$

where $K,L$ are two index sets. Moreover, the number of test sets $R^{\prime}\subset[0,1)^{d}$ is at most $2^{d-1}\frac{e^{d}}{\sqrt{2\pi d}}(N+1)^{d}$ (the $\delta$-covering number, where we choose $\delta=\frac{1}{N}$), and we let

$T^{\prime}=\bigcup_{l\in L}(\Omega_{l}\cap R^{\prime}).$

We define new random variables $\chi_{j}$, $1\leq j\leq|L|$, as follows:

$\chi_{j}=\begin{cases}1,&W_{j}\in\Omega_{j}\cap R^{\prime},\\ 0,&\text{otherwise}.\end{cases}$

Then,

(4.12) $N\cdot D^{*}_{N}(W_{1},W_{2},\ldots,W_{N};R^{\prime})=N\cdot D^{*}_{N}(W_{1},W_{2},\ldots,W_{N};T^{\prime})=\Big|\sum_{j=1}^{|L|}\chi_{j}-N\sum_{j=1}^{|L|}\lambda(\Omega_{j}\cap T^{\prime})\Big|.$

Since

$\mathbb{P}(\chi_{j}=1)=\frac{\lambda(\Omega_{j}\cap T^{\prime})}{\lambda(\Omega_{j})}=N\cdot\lambda(\Omega_{j}\cap T^{\prime}),$

we get

(4.13) $\mathbb{E}(\chi_{j})=N\cdot\lambda(\Omega_{j}\cap T^{\prime}).$

Thus, from (4.12) and (4.13), we obtain

(4.14) $N\cdot D^{*}_{N}(W_{1},W_{2},\ldots,W_{N};R^{\prime})=\Big|\sum_{j=1}^{|L|}(\chi_{j}-\mathbb{E}(\chi_{j}))\Big|.$

Let

$\sigma_{j}^{2}=\mathbb{E}(\chi_{j}-\mathbb{E}(\chi_{j}))^{2},\qquad\Sigma=\Big(\sum_{j=1}^{|L|}\sigma_{j}^{2}\Big)^{\frac{1}{2}}.$

Therefore, from Lemma 4.1 (with $C=1$), for every $R^{\prime}$ we have

$\mathbb{P}\left(\Big|\sum_{j=1}^{|L|}(\chi_{j}-\mathbb{E}(\chi_{j}))\Big|>\lambda\right)\leq 2\exp\Big(-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2\lambda}{3}}\Big).$

Let $\mathscr{B}=\bigcup_{R^{\prime}}\Big\{\Big|\sum_{j=1}^{|L|}(\chi_{j}-\mathbb{E}(\chi_{j}))\Big|>\lambda\Big\}$; then, using the $\delta$-covering number, we have

(4.15) $\mathbb{P}(\mathscr{B})\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big(-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2\lambda}{3}}\Big).$

Combining with (4.14), we get

(4.16) $\mathbb{P}\Big(\bigcup_{R^{\prime}}\big\{N\cdot D_{N}^{*}(W_{1},W_{2},\ldots,W_{N};R^{\prime})>\lambda\big\}\Big)\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big(-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2\lambda}{3}}\Big).$

For the point sets $Y$ and $X$, let

$\Sigma_{0}^{2}=\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{T^{\prime}}(Y_{n})\Big),\qquad\Sigma_{1}^{2}=\mathrm{Var}\Big(\sum_{n=1}^{N}\mathbf{1}_{T^{\prime}}(X_{n})\Big).$

Then (4.10) implies

$\Sigma_{0}^{2}<\Sigma_{1}^{2}.$

Besides, as in (4.16), we have

(4.17) $\mathbb{P}\Big(\bigcup_{R^{\prime}}\big\{N\cdot D_{N}^{*}(Y_{1},Y_{2},\ldots,Y_{N};R^{\prime})>\lambda\big\}\Big)\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big(-\frac{\lambda^{2}}{2\Sigma_{0}^{2}+\frac{2\lambda}{3}}\Big),$

and

$\mathbb{P}\Big(\bigcup_{R^{\prime}}\big\{N\cdot D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R^{\prime})>\lambda\big\}\Big)\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big(-\frac{\lambda^{2}}{2\Sigma_{1}^{2}+\frac{2\lambda}{3}}\Big),$

respectively.

Set $A(d,q,N)=d\ln(2e)+d\ln(N+1)-\frac{\ln(2\pi d)}{2}-\ln(1-q)$, and choose

$\lambda=\sqrt{2\Sigma_{0}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}+\frac{A(d,q,N)}{3}$

in (4.17); then we have

(4.18) $\mathbb{P}\Big(\bigcup_{R^{\prime}}\big\{N\cdot D_{N}^{*}(Y_{1},Y_{2},\ldots,Y_{N};R^{\prime})>\lambda\big\}\Big)\leq 1-q.$

Hence, from (4.18), it is easily verified that

(4.19) $\max_{R_{i},i=0,1}D_{N}^{*}(Y_{1},Y_{2},\ldots,Y_{N};R_{i})\leq\frac{\sqrt{2\Sigma_{0}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}}{N}+\frac{A(d,q,N)}{3N}$

holds with probability at least $q$.

From (2.3), combined with the $\delta$-cover property (where $\delta=\frac{1}{N}$), we get that

(4.20) $D_{N}^{*}(Y)\leq\frac{\sqrt{2\Sigma_{0}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}}{N}+\frac{A(d,q,N)+3}{3N}\leq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q,N)}{N}$

holds with probability at least $q$; the last inequality in (4.20) holds because $A(d,q,N)\geq 3$ for all $q\in(0,1)$.
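For orientation, the quantities in (4.20) are easy to evaluate; the sketch below (ours) computes $A(d,q,N)$ and the resulting high-probability bound, with the variance proxy $\Sigma_{0}$ set to an illustrative placeholder rather than a value from the paper.

```python
# Evaluate A(d, q, N) and the high-probability star discrepancy bound of
# (4.20) for a jittered point set with variance proxy Sigma0.
import numpy as np

def A(d: int, q: float, N: int) -> float:
    return (d * np.log(2 * np.e) + d * np.log(N + 1)
            - 0.5 * np.log(2 * np.pi * d) - np.log(1 - q))

def bound_420(d: int, q: float, N: int, sigma0: float) -> float:
    a = A(d, q, N)
    return (np.sqrt(2 * sigma0 ** 2 * a + a ** 2 / 9) + a / 3 + 1) / N

d, m = 2, 8
N = m ** d
sigma0 = (d * N ** (1 - 1 / d) / 4) ** 0.5   # placeholder: cut-cell variance is at most C_N / 4
for q in (0.5, 0.9, 0.99):
    print(q, bound_420(d, q, N, sigma0))     # bound holding with probability >= q
```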

The same analysis for the point set $X$ shows that

(4.21) $D_{N}^{*}(X)\leq\frac{\sqrt{2\Sigma_{1}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}}{N}+\frac{A(d,q,N)+3}{3N}\leq(\sqrt{2}\cdot\Sigma_{1}+1)\frac{A(d,q,N)}{N}$

holds with probability at least $q$.

Now we fix a probability value $q_{0}\in[q,1)$ in (4.20); i.e., we suppose that (4.20) holds with probability exactly $q_{0}$. Choosing this $q_{0}$ in (4.21), we find that

$D_{N}^{*}(X)\leq(\sqrt{2}\cdot\Sigma_{1}+1)\frac{A(d,q_{0},N)}{N}$

holds with probability $q_{0}$.

Therefore, from $\Sigma_{0}<\Sigma_{1}$, we obtain that

(4.22) $D_{N}^{*}(X)\leq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}$

holds with probability $q_{1}$, where $0<q_{1}<q_{0}$.

We use the following standard identity to estimate the expected star discrepancy:

(4.23) $\mathbb{E}[D^{*}_{N}(W)]=\int_{0}^{1}\mathbb{P}(D^{*}_{N}(W)\geq t)\,dt,$

where $D^{*}_{N}(W)$ denotes the star discrepancy of the point set $W$; the upper integration limit 1 suffices since $D^{*}_{N}(W)\leq 1$.
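Identity (4.23) is the layer-cake formula for a $[0,1]$-valued random variable; a quick numerical check (our toy example with uniform samples, not part of the proof) is below.

```python
# Check E[W] = integral_0^1 P(W >= t) dt for a [0,1]-valued variable W,
# here W = U^2 with U uniform on [0,1], so that E[W] = 1/3.
import numpy as np

rng = np.random.default_rng(7)
w = rng.random(200_000) ** 2
ts = np.linspace(0.0, 1.0, 1001)
layer_cake = np.trapz([np.mean(w >= t) for t in ts], ts)
print(w.mean(), layer_cake)      # both close to 1/3
```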

Plugging $q_{0}$ into (4.20), we have

(4.24) $D_{N}^{*}(Y)\leq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}$

holds with probability $q_{0}$. Then (4.24) is equivalent to

$\mathbb{P}\Big(D_{N}^{*}(Y)\geq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}\Big)=1-q_{0}.$

Now we release $q_{0}$ and set

(4.25) $t=(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N},$

(4.26) $C_{0}(\Sigma_{0},N)=\frac{\sqrt{2}\cdot\Sigma_{0}+1}{N},$

and

(4.27) $C_{1}(d,\Sigma_{0},N)=\frac{\sqrt{2}\cdot\Sigma_{0}+1}{N}\cdot\Big(d\ln(2e)+d\ln(N+1)-\frac{\ln(2\pi d)}{2}\Big).$

Then

(4.28) $t=C_{1}(d,\Sigma_{0},N)-C_{0}(\Sigma_{0},N)\ln(1-q_{0}).$

Thus, from (4.23) and $q_{0}\in[q,1)$, we have

(4.29) $\mathbb{E}[D^{*}_{N}(Y)]=\int_{0}^{1}\mathbb{P}(D^{*}_{N}(Y)\geq t)\,dt=\int_{1-e^{\frac{C_{1}(d,\Sigma_{0},N)}{C_{0}(\Sigma_{0},N)}}}^{1-e^{\frac{C_{1}(d,\Sigma_{0},N)-1}{C_{0}(\Sigma_{0},N)}}}\mathbb{P}\Big(D^{*}_{N}(Y)\geq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}\Big)\cdot C_{0}(\Sigma_{0},N)\cdot\frac{1}{1-q_{0}}\,dq_{0}=\int_{q}^{1-e^{\frac{C_{1}(d,\Sigma_{0},N)-1}{C_{0}(\Sigma_{0},N)}}}C_{0}(\Sigma_{0},N)\cdot\frac{1-q_{0}}{1-q_{0}}\,dq_{0}.$

Furthermore, from (4.22), we have

$\mathbb{P}\Big(D_{N}^{*}(X)\geq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}\Big)=1-q_{1}.$

Following the steps from (4.25) to (4.28), we obtain

$\mathbb{E}[D^{*}_{N}(X)]=\int_{0}^{1}\mathbb{P}(D^{*}_{N}(X)\geq t)\,dt=\int_{q}^{1-e^{\frac{C_{1}(d,\Sigma_{0},N)-1}{C_{0}(\Sigma_{0},N)}}}C_{0}(\Sigma_{0},N)\cdot\frac{1-q_{1}}{1-q_{0}}\,dq_{0}.$

From $q_{1}<q_{0}$, we obtain

$\frac{1-q_{1}}{1-q_{0}}>\frac{1-q_{0}}{1-q_{0}}=1.$

Hence,

(4.30) $\mathbb{E}(D_{N}^{*}(Y))<\mathbb{E}(D_{N}^{*}(X)).$

This completes the proof of Theorem 3.1.

5. Conclusion

We studied the expected star discrepancy under jittered sampling, and proved the strong partition principle for the star discrepancy. The most direct next task is to find sampling schemes whose expected star discrepancy is better than that of jittered sampling, which may be closely related to the variance of the characteristic function defined on convex test sets.

References

  • [1] H. Avron, V. Sindhwani, J. Yang and M. W. Mahoney, Quasi-Monte Carlo feature maps for shift-invariant kernels, J. Mach. Learn. Res., 17 (2016), 1–38.
  • [2] J. Beck, Some upper bounds in the theory of irregularities of distribution, Acta Arith., 43 (1984), 115–130.
  • [3] J. Beck, Irregularities of distribution. I, Acta Math., 159 (1987), 1–49.
  • [4] K. Chiu, P. Shirley and C. Wang, Multi-jittered sampling, Graphics Gems, 4 (1994), 370–374.
  • [5] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, 2007.
  • [6] J. Dick and F. Pillichshammer, Digital Nets and Sequences. Discrepancy Theory and Quasi-Monte Carlo Integration, Cambridge University Press, 2010.
  • [7] J. Dick, F. Y. Kuo and I. H. Sloan, High-dimensional integration: the quasi-Monte Carlo way, Acta Numer., 22 (2013), 133–288.
  • [8] B. Doerr, M. Gnewuch and A. Srivastav, Bounds and constructions for the star-discrepancy via $\delta$-covers, J. Complexity, 21 (2005), 691–709.
  • [9] B. Doerr, A sharp discrepancy bound for jittered sampling, Math. Comp. (2022), http://doi.org/10.1090/mcom/3727.
  • [10] P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer-Verlag, New York, 2004.
  • [11] M. Gnewuch, Bracketing numbers for axis-parallel boxes and applications to geometric discrepancy, J. Complexity, 24 (2008), 154–172.
  • [12] M. Kiderlen and F. Pausinger, On a partition with a lower expected $L_{2}$-discrepancy than classical jittered sampling, J. Complexity (2021), https://doi.org/10.1016/j.jco.2021.101616.
  • [13] M. Kiderlen and F. Pausinger, Discrepancy of stratified samples from partitions of the unit cube, Monatsh. Math., 195 (2021), 267–306.
  • [14] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, Philadelphia, 1992.
  • [15] A. B. Owen, Quasi-Monte Carlo sampling, in Monte Carlo Ray Tracing: ACM SIGGRAPH 2003 Course Notes, 44 (2003), 69–88.
  • [16] F. Pausinger and S. Steinerberger, On the discrepancy of jittered sampling, J. Complexity, 33 (2016), 199–216.
  • [17] A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes. With Applications to Statistics, Springer Series in Statistics, Springer-Verlag, New York, 1996.