Improved expected $L_2$-discrepancy formulas on jittered sampling
Jun Xian, Xiaoda Xu
J. Xian: Department of Mathematics and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, 510275 Guangzhou, China. [email protected]
X. Xu: Department of Mathematics, Sun Yat-sen University, 510275 Guangzhou, China. [email protected]
Abstract.
We study the expected $L_2$-discrepancy under two classes of partitions, and derive explicit and exact formulas, respectively. These results give better expected $L_2$-discrepancy formulas than jittered sampling.
Key words and phrases:
Expected $L_2$-discrepancy; Stratified sampling.
2010 Mathematics Subject Classification:
65C10, 11K38, 65D30.
1. Introduction
Among various techniques for measuring the irregularity of point distributions, discrepancy methods have proven to be among the most efficient approaches. Among the various notions of discrepancy, the $L_2$-discrepancy is widely studied; see applications to low-discrepancy point sets in [1, 2, 3, 4]. The best known asymptotic upper bounds on the $L_2$-discrepancy of these point sets are of the form $O\big((\log N)^{(d-1)/2}/N\big)$.
In order to facilitate the comparison of the $L_2$-discrepancy of different sampling point sets, it is necessary to derive explicit $L_2$-discrepancy formulas for them. Thanks to Warnock’s formula, explicit $L_2$-discrepancy formulas for deterministic point sets can be derived; see [3]. It is then of interest to calculate the expected $L_2$-discrepancy for random sampling. This problem originates in integration approximation: for a random sampling set $P_N=\{x_1,\ldots,x_N\}\subset[0,1]^d$, it is proved in [3] that
(1.1) $\mathbb{E}\left[\sup_{\|f\|\le 1}\left|\int_{[0,1]^d}f(y)\,\mathrm{d}y-\frac{1}{N}\sum_{i=1}^{N}f(x_i)\right|\right]=\mathbb{E}\left[L_2(N,P_N)\right],$
where the norm is taken in a Sobolev space.
This means that a smaller expected $L_2$-discrepancy implies a better expected uniform integration approximation over a class of functions. It also means that practical approximation problems which can be converted into the form (1.1) become solvable.
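To illustrate this connection numerically, the following sketch compares plain Monte Carlo with jittered sampling on a smooth integrand in the unit square. The integrand, grid size, and trial count are our own illustrative choices, not taken from the results of this paper.

```python
import random

def mc_points(n, rng):
    """n i.i.d. uniform points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def jittered_points(m, rng):
    """One uniform point in each cell of the m x m grid (N = m^2 points)."""
    return [((i + rng.random()) / m, (j + rng.random()) / m)
            for i in range(m) for j in range(m)]

def quad_error(points, f, exact):
    """Absolute error of the equal-weight quadrature rule on the points."""
    return abs(sum(f(x, y) for x, y in points) / len(points) - exact)

rng = random.Random(0)
f = lambda x, y: x * y                     # integral over [0,1]^2 equals 1/4
m, trials = 8, 200                         # N = 64 points per rule
mc_err = sum(quad_error(mc_points(m * m, rng), f, 0.25)
             for _ in range(trials)) / trials
jit_err = sum(quad_error(jittered_points(m, rng), f, 0.25)
              for _ in range(trials)) / trials
print(mc_err, jit_err)                     # jittered error is markedly smaller
```

The lower-discrepancy (jittered) point set produces a visibly smaller average integration error, as formula (1.1) suggests.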
The main purpose of this paper is to derive explicit $L_2$-discrepancy formulas for two classes of partitions of $[0,1]^d$ with sampling number $N=m^d$, and to attain better results than jittered sampling under the same sampling conditions.
The explicit formula for the expected $L_2$-discrepancy of the jittered sampling set is a known result, recently derived in [7]. A significant application of improving this formula is to obtain a better, explicit upper bound on the approximation error in formula (1.1). Our models are motivated by [6], and the corresponding construction models in higher dimensions are motivated by [5]; see model 2.4 in Section 2.
The rest of this paper is organized as follows. Section 2 presents preliminaries. Section 3 presents our main results, which provide the explicit expected $L_2$-discrepancy for two classes of partitions. Section 4 contains the proofs of the main results. Finally, in Section 5 we conclude the paper with a short summary.
2. Preliminaries on random sampling
Before introducing the main result, we list the preliminaries used in this paper.
2.1. $L_2$-discrepancy
The $L_2$-discrepancy of a sampling set $P_N=\{x_1,\ldots,x_N\}\subset[0,1]^d$ is defined by
(2.1) $L_2(N,P_N)=\left(\int_{[0,1]^d}\left|\frac{1}{N}\sum_{i=1}^{N}\chi_{[0,y)}(x_i)-\lambda([0,y))\right|^2\mathrm{d}y\right)^{\frac{1}{2}},$
where $\lambda$ denotes the Lebesgue measure and $\chi_A$ denotes the characteristic function of a set $A$.
In the definition of the $L_2$-discrepancy, if we introduce the counting measure $\#$, then (2.1) can also be expressed as
(2.2) $L_2(N,P_N)=\left(\int_{[0,1]^d}\left|\frac{\#(P_N\cap[0,y))}{N}-\lambda([0,y))\right|^2\mathrm{d}y\right)^{\frac{1}{2}},$
where $\#(P_N\cap[0,y))$ denotes the number of points of $P_N$ falling into the set $[0,y)$.
To simplify the expression of the $L_2$-discrepancy, we employ the discrepancy function
(2.3) $\Delta(P_N,y)=\frac{\#(P_N\cap[0,y))}{N}-\lambda([0,y)).$
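Warnock’s formula, mentioned in the introduction, expresses the squared $L_2$-discrepancy of a finite point set in closed form and is convenient for checking the formulas below numerically. A minimal sketch (the function name is ours):

```python
from math import prod

def l2_discrepancy_sq(points):
    """Squared L2 (star) discrepancy of points in [0,1)^d, via Warnock's
    formula:
      L2^2 = 3^(-d) - (2/N) * sum_n prod_j (1 - x_{n,j}^2) / 2
           + (1/N^2) * sum_{n,m} prod_j (1 - max(x_{n,j}, x_{m,j})).
    """
    n, d = len(points), len(points[0])
    t2 = 2.0 / n * sum(prod((1 - xj * xj) / 2 for xj in p) for p in points)
    t3 = sum(prod(1 - max(a, b) for a, b in zip(p, q))
             for p in points for q in points) / (n * n)
    return 3.0 ** (-d) - t2 + t3

# Sanity check: a single point at 1/2 in dimension one has L2^2 = 1/12.
print(abs(l2_discrepancy_sq([(0.5,)]) - 1 / 12) < 1e-12)  # True
```

The pairwise sum makes the cost quadratic in $N$, which is ample for the small verification experiments considered here.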
2.2. Simple random sampling
In a sense, simple random sampling is Monte Carlo sampling: a uniformly distributed point set is selected in $[0,1]^d$; see Figure 1.
Figure 1. Simple random sampling: (a) in two dimensions; (b) in three dimensions.
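For simple random sampling the expected squared $L_2$-discrepancy is known in closed form: for $N$ i.i.d. uniform points in $[0,1]^d$ one has $\mathbb{E}[L_2^2]=(2^{-d}-3^{-d})/N$, since the discrepancy function has pointwise variance $\lambda(1-\lambda)/N$ with $\lambda=\lambda([0,y))$, and $\int\lambda\,\mathrm{d}y=2^{-d}$, $\int\lambda^2\,\mathrm{d}y=3^{-d}$. A quick empirical check (the Warnock helper is repeated so the sketch is self-contained):

```python
import random
from math import prod

def l2_discrepancy_sq(points):
    """Squared L2 (star) discrepancy via Warnock's formula."""
    n, d = len(points), len(points[0])
    t2 = 2.0 / n * sum(prod((1 - xj * xj) / 2 for xj in p) for p in points)
    t3 = sum(prod(1 - max(a, b) for a, b in zip(p, q))
             for p in points for q in points) / (n * n)
    return 3.0 ** (-d) - t2 + t3

rng = random.Random(1)
d, n, trials = 2, 32, 400
empirical = sum(
    l2_discrepancy_sq([tuple(rng.random() for _ in range(d))
                       for _ in range(n)])
    for _ in range(trials)) / trials
exact = (2.0 ** (-d) - 3.0 ** (-d)) / n    # = (1/4 - 1/9) / 32
print(empirical, exact)
```

With a few hundred trials the Monte Carlo estimate agrees with the closed form to within a few percent.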
2.3. Jittered sampling
Jittered sampling is a type of grid-based equivolume partition: $[0,1]^d$ is divided into $m^d$ axis-parallel boxes, each with sides of length $\frac{1}{m}$; see the illustration in Figure 2.
Figure 2. Jittered sampling formed by an isometric grid partition: (a) in two dimensions; (b) in three dimensions.
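A minimal generator for the jittered point set just described (the names are ours): each of the $m^d$ grid cells of side $1/m$ receives exactly one uniform point.

```python
import random
from itertools import product

def jittered_sample(m, d, rng):
    """One uniform point in each of the m^d cubes of side 1/m that
    partition [0,1)^d (classical jittered sampling)."""
    return [tuple((i + rng.random()) / m for i in cell)
            for cell in product(range(m), repeat=d)]

pts = jittered_sample(4, 2, random.Random(7))
print(len(pts))  # 16 points, one per grid cell
```

Every cell of the grid contains exactly one of the returned points, which is the stratification property the expected-discrepancy formulas rely on.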
2.5. Class of partition model I
For the merged rectangle, we use a family of straight-line partitions to divide the rectangle into two equal-volume parts. This reduces to a one-parameter model if we parametrize by the angle between the dividing line and the horizontal line through the center. A simple calculation shows that any such dividing line must pass through the center of the rectangle. For convenience of notation, we set up this partition model in the two-dimensional case.
Figure 4. A class of partitions for two dimensions
2.6. Class of partition model II
For the rectangle
where the side lengths are given by three positive constants, a straight-line partition is used to divide the rectangle into two parts: we take the line parallel to the diagonal of the rectangle, with a prescribed distance from the intersection point to the endpoint at the upper right corner of the rectangle.
For convenience of notation, we set this partition model
where .
Figure 5. Unequivolume partition
Now, consider the $d$-dimensional cuboid
(2.4)
and its two partitions into two closed, convex bodies with
(2.5)
and
We choose one point uniformly in each of the two bodies; denoting these points accordingly, we obtain
(2.6)
and
(2.7)
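The stratified sample above takes one uniform point from each of the two bodies. As a generic illustration (our own construction; the triangles below merely stand in for the two convex bodies of the partition), such points can be drawn by rejection sampling from the bounding cuboid:

```python
import random

def uniform_in_body(indicator, box, rng, max_tries=100000):
    """Uniform point in {x in box : indicator(x)} by rejection sampling.
    `box` is a list of (low, high) intervals for the bounding cuboid."""
    for _ in range(max_tries):
        x = tuple(lo + (hi - lo) * rng.random() for lo, hi in box)
        if indicator(x):
            return x
    raise RuntimeError("acceptance region too small")

# Example: split the unit square along the main diagonal into two parts
# and draw one stratified sample point from each part.
rng = random.Random(3)
box = [(0.0, 1.0), (0.0, 1.0)]
p1 = uniform_in_body(lambda x: x[0] + x[1] <= 1, box, rng)
p2 = uniform_in_body(lambda x: x[0] + x[1] > 1, box, rng)
print(p1[0] + p1[1] <= 1, p2[0] + p2[1] > 1)  # True True
```

Rejection sampling is inefficient when a body occupies a small fraction of its bounding box, but for the equal- or near-equal-volume parts considered here the acceptance rate is about one half.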
3. Explicit expected $L_2$-discrepancy for stratified random sampling formed by two classes of partitions
3.1. Main results
In this section, explicit $L_2$-discrepancy formulas are given for the Classes of partition models I and II.
Theorem 3.1.
For the partition of model I and the corresponding stratified sampling set, we have
(3.1)
where
(3.2)
Remark 3.2.
Note that the function appearing in Theorem 3.1 is continuous: it is first monotonically decreasing and then monotonically increasing on its domain; see Figure 6. Choosing the parameter value corresponding to the grid partition in Theorem 3.1, we recover the case of classical jittered sampling.
Besides, since the formula is a continuous function of the partition parameter, it establishes a continuous dependence on the partition parameters and suggests a way of continuously improving grid-based partitions.
Figure 6. The function defined in (3.2)
Corollary 3.3.
For the partition of model I and the corresponding stratified sampling set, we have
(3.3)
where the equality holds if and only if and .
Remark 3.4.
Note that the inequality is strict except in the boundary case identified above. It is shown in [5] that the conclusion
holds; this is a counterexample yielding a theoretical conclusion, namely it proves the existence of a partition with a lower expected $L_2$-discrepancy than jittered sampling. That partition is a special case of the Class of partition model I, and it happens to achieve the optimal expected $L_2$-discrepancy within this class. However, the aim of [5] is only to give a counterexample proving that jittered sampling does not minimize the expected $L_2$-discrepancy among all partition manners. The contribution of our Theorem 3.1 is to give explicit expected $L_2$-discrepancy formulas under a whole class of equivolume partitions, improving the known expected $L_2$-discrepancy formula
(3.4)
recently derived in [7].
From the application point of view, a smaller expected $L_2$-discrepancy formula gives a better integration approximation error in a certain class of function spaces.
Theorem 3.5.
For the partition of model II and the corresponding stratified sampling set, we have
(3.5)
where
Remark 3.6.
It can be easily seen that
for any choice of the parameter; thus Theorem 3.5 improves the expected $L_2$-discrepancy formula for jittered sampling, and moreover it attains better results than all cases of convex equivolume partitions in the Class of partition model I. Therefore, the expected $L_2$-discrepancy formulas under the Classes of partition models I and II can be seen as improvements of Theorem 1 in [7], and the Class of partition model II can be seen as a class of partitions with lower expected $L_2$-discrepancy than both jittered sampling and the partition model in [5]. The main contribution of Theorem 3.5 is to give explicit expected $L_2$-discrepancy formulas under a class of unequivolume partitions, which provides a method to compute exact expected $L_2$-discrepancy formulas in higher-dimensional space. Incidentally, this proves that in grid-like partitions of $d$-dimensional space, the use of equivolume cells may not be optimal. This conclusion was previously known for two-dimensional space with two points; see [6, 8]. In particular, [8] gives the optimal partition manner (which happens to be an unequivolume partition) for two-dimensional space with two points. For higher-dimensional partitions the partition model may be easy to extend, but the exact expected $L_2$-discrepancy formula, in the sense of (3.4), for the optimal case remains unknown.
3.2. Some discussions
The model in [8] gives a family of unequivolume partitions in two dimensions with two sampling points; a certain type of symmetry along the diagonal is required, and the partition is non-convex. Under these restrictions the expected $L_2$-discrepancy attains its optimum. It is also proved in [6] that some classes of convex partitions could minimize the expected $L_2$-discrepancy; thus an interesting question is whether some class of convex unequivolume partitions can achieve better expected $L_2$-discrepancy formulas than this family of non-convex partitions. The family of partitions in [8] is:
where the boundary function is as specified there. If the boundary function satisfies a highly nonlinear integral equation, then the partition is optimal. This result can be extended to higher dimensions as in partition model 2.4 of [5], but it may be difficult to obtain an explicit expected $L_2$-discrepancy formula, due to the difficulty of solving the integral equation that the boundary function has to satisfy. If we instead consider a convex partition, the boundary function satisfies a linear equation, and an explicit formula is within reach. The optimal expected $L_2$-discrepancy formula may be obtained from the class of unequivolume partitions formed by a special class of linear equations, but we have not yet carried out the accurate calculation and comparison. Besides, since the partition in Theorem 3.5 satisfies a certain symmetry, it can be converted into a one-parameter model; considering the partition class formed by general linear equations, the result would be a two-parameter model. As the equation satisfied by the boundary function becomes more complex, the parameters contained in the expected $L_2$-discrepancy formula are expected to become more complex as well, but the formula retains numerical value: the approximation error in (1.1) can still be accurately analyzed.
For some partition manners not covered above, convex or non-convex, we can naturally provide the following two examples; see Figure 7. Similarly to model 2.4, they can easily be extended to higher dimensions. Their expected $L_2$-discrepancy formulas are also easy to compute if we follow the proof process of Theorems 3.1 and 3.5, because their partition curve equations can always be written explicitly in terms of a few parameters.
4. Proofs of the main results
For an equivolume partition (the same argument applies after the obvious replacements), we use the following proposition from [5]: for an equivolume partition of a compact convex set and the corresponding stratified sampling set, we have
The equation of the dividing line is the following
Besides, set , thus,
In the second step, for area III, we have
and
Figure 13. Division of the integral region II
Furthermore, we have
(4.47)
Besides,
Moreover, we have,
Therefore, we have
(4.48)
According to the above, let
We obtain
(4.49)
For the jittered grid areas, it is clear that
(4.50)
For the jittered sampling point set, we have
(4.51)
Besides,
(4.52)
It is easy to obtain
(4.53)
Therefore, combining (4.49), (4.50), (4.51), (4.52) and (4.53), we obtain the desired result.
5. Conclusion
We studied the expected $L_2$-discrepancy under two classes of partitions and gave explicit formulas; these results improve the expected $L_2$-discrepancy formula for jittered sampling. It should be pointed out that the results presented here for the Classes of partition models I and II still leave room for improvement, since they are all based on the strict condition of grid-based partition. The expected $L_2$-discrepancy of stratified sampling under general equal-measure partition models will be investigated in future research, and we hope to give further applications in function approximation.
References
[1] J. Dick and F. Pillichshammer, On the mean square weighted $L_2$ discrepancy of randomized digital $(t,m,s)$-nets over $\mathbb{Z}_2$, Acta Arith., 117 (2005), 371-403.
[2] L. L. Cristea, J. Dick and F. Pillichshammer, On the mean square weighted $L_2$ discrepancy of randomized digital nets in prime base, J. Complexity, 22 (2006), 605-629.
[3] J. Dick and F. Pillichshammer, Discrepancy theory and quasi-Monte Carlo integration. In: W. W. L. Chen, A. Srivastav, G. Travaglini (eds.), A Panorama of Discrepancy Theory, Springer, Cham, 2014, 539-619.
[4] J. Dick, A. Hinrichs and F. Pillichshammer, A note on the periodic $L_2$-discrepancy of Korobov’s $p$-sets, Arch. Math. (Basel), 115 (2020), 67-78.
[5] M. Kiderlen and F. Pausinger, On a partition with a lower expected $L_2$-discrepancy than classical jittered sampling, J. Complexity, 70 (2022), https://doi.org/10.1016/j.jco.2021.101616.
[6] M. Kiderlen and F. Pausinger, Discrepancy of stratified samples from partitions of the unit cube, Monatsh. Math., 195 (2021), 267-306.
[7] N. Kirk and F. Pausinger, On the expected $L_2$-discrepancy of jittered sampling, arXiv preprint arXiv:2208.08924, 2022.
[8] F. Pausinger, M. Rachh and S. Steinerberger, Optimal jittered sampling for two points in the unit square, Statist. Probab. Lett., 132 (2018), 55-61.