

An efficient implementation algorithm for quasi-Monte Carlo approximations of high-dimensional integrals

Huicong Zhong, School of Mathematics and Statistics, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China ([email protected]).    Xiaobing Feng, Department of Mathematics, The University of Tennessee, Knoxville, TN 37996 ([email protected]).
Abstract

In this paper, we develop and test a fast numerical algorithm, called MDI-LR, for efficient implementation of quasi-Monte Carlo lattice rules for computing $d$-dimensional integrals of a given function. It is based on the idea of converting and improving the underlying lattice rule into a tensor product rule by an affine transformation, and of adopting the multilevel dimension iteration approach, which computes the function evaluations (at the integration points) in the tensor product multi-summation in clusters and iterates along each (transformed) coordinate direction so that many computations can be reused. The proposed algorithm also eliminates the need for storing integration points and computing function values independently at each point. Extensive numerical experiments are presented to gauge the performance of the MDI-LR algorithm and to compare it with the standard implementation of quasi-Monte Carlo lattice rules. It is also shown numerically that the MDI-LR algorithm can achieve a computational complexity of order $O(N^2d^3)$ or better, where $N$ represents the number of points in each (transformed) coordinate direction and $d$ stands for the dimension. Thus, the MDI-LR algorithm effectively overcomes the curse of dimensionality and revitalizes QMC lattice rules for high-dimensional integration.

keywords: Lattice rule (LR), multilevel dimension iteration (MDI), Monte Carlo methods, quasi-Monte Carlo (QMC) method, high-dimensional integration.
AMS subject classifications: 65D30, 65D40, 65C05, 65N99

1 Introduction

Numerical integration is an essential tool and building block in many scientific and engineering fields that require evaluating or estimating integrals of given (explicitly or implicitly defined) functions; this becomes very challenging in high dimensions due to the so-called curse of dimensionality (CoD). Such integrals arise in evaluating quantities of stochastic interest, solving high-dimensional partial differential equations, and computing value functions of options on a basket of securities. The goal of this paper is to develop and test an efficient algorithm based on quasi-Monte Carlo methods for evaluating the $d$-dimensional integral

(1.1)  $I_d(f) := \int_{\Omega} f(\mathbf{x})\, d\mathbf{x}$

for a given function $f:\Omega:=[0,1]^d\to\mathbb{R}$ and $d\gg 1$.

Classical numerical integration methods, such as tensor product and sparse grid methods [17, 18], as well as Monte Carlo (MC) methods [19, 20], require the evaluation of a function at a set of integration points. The computational complexity of the first two types of methods grows exponentially with the dimension $d$ of the problem (i.e., the CoD), which limits their practical usage. MC methods are often the default methods for high-dimensional integration problems due to their ability to handle complicated functions and to mitigate the CoD. We recall that the MC method approximates the integral by randomly sampling points within the integration domain and averaging their function values. The classical MC method has the form

(1.2)  $Q_{n,d}(f)=\frac{1}{n}\sum_{i=0}^{n-1}f(\mathbf{x}_{i}),$

where $\{\mathbf{x}_i\}_{i=0}^{n-1}$ denotes independent and uniformly distributed random samples in the integration domain $\Omega$. The expected error of the MC method is proportional to $\frac{\sigma(f)}{\sqrt{n}}$, where $\sigma(f)^2$ stands for the variance of $f$. If $f$ is square-integrable, then the expected error of (1.2) has the order $O(n^{-\frac{1}{2}})$ (note that the convergence rate is independent of the dimension $d$). Evidently, the MC method is simple and easy to implement, making it a popular choice for many applications. However, the MC method is slow to converge, especially for high-dimensional problems, and the accuracy of the approximation depends on the number of random samples. One way to improve the convergence rate of the MC method is to use quasi-Monte Carlo methods.
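For concreteness, the following is a minimal sketch of the plain MC estimator (1.2); the test integrand, sample size, and random seed are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of the plain MC estimator (1.2), assuming the
# integrand f takes an array of shape (n, d) and returns n values.
def mc_integrate(f, d, n, rng=np.random.default_rng(0)):
    x = rng.random((n, d))          # i.i.d. uniform samples in [0,1]^d
    return f(x).mean()              # equal-weight average, error O(n^{-1/2})

# Example: f(x) = exp(sum x_j), whose exact integral over [0,1]^d is (e-1)^d.
f = lambda x: np.exp(x.sum(axis=1))
print(mc_integrate(f, d=5, n=100_000), (np.e - 1) ** 5)
```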

Quasi-Monte Carlo (QMC) methods [3, 21] employ integration techniques that use point sets with better distribution properties than random sampling. Similar to the MC method, the QMC method also has the general form (1.2), but unlike the MC method, the integration points $\{\mathbf{x}_i\}_{i=0}^{n-1}\subset\Omega$ are chosen deterministically and methodically. The deterministic nature of the QMC method can lead to guaranteed error bounds, and the convergence rate can be faster than the $O(n^{-\frac{1}{2}})$ order of the MC method for sufficiently smooth functions. QMC error bounds are typically given in the form of Koksma-Hlawka-type inequalities as follows:

(1.3)  $|I_d(f)-Q_{n,d}(f)|\leq D(\mathbf{x}_0,\mathbf{x}_1,\cdots,\mathbf{x}_{n-1})\,V(f),$

where $D(\mathbf{x}_0,\mathbf{x}_1,\cdots,\mathbf{x}_{n-1})$ is a (positive) discrepancy function which measures the non-uniformity of the point set $\{\mathbf{x}_i\}_{i=0}^{n-1}$ and $V(f)$ is a (positive) functional which measures the variability of $f$. Error bounds of this type separate the dependence on the cubature points from the dependence on the integrand. QMC point sets with discrepancy of order $O(n^{-1}(\log n)^d)$ or better are collectively known as low-discrepancy point sets [22].

One of the most popular QMC methods is the lattice rule, whose integration points are chosen to have a lattice structure, low discrepancy, and better distribution properties than random sampling [1, 2, 4], hence resulting in a more accurate method with a faster convergence rate. However, traditional lattice rules still have limitations when applied to high-dimensional problems. Good lattice rules almost always involve searching (cf. [23, 24]), and the cost of an exhaustive search (for fixed $n$) grows exponentially with the dimension $d$. Moreover, like the MC method, the number of integration points required to achieve a reasonable accuracy also increases exponentially with the dimension $d$ (i.e., the CoD phenomenon), which makes the method computationally infeasible for very high-dimensional integration problems.

To overcome the limitations of QMC lattice rules, we first propose an improved QMC lattice rule based on a change of variables and reformulate it as a tensor product rule in the transformed coordinates. We then develop an efficient implementation algorithm, called MDI-LR, by adapting the multilevel dimension iteration (MDI) idea first proposed by the authors in [11] to the improved QMC lattice rule. The proposed MDI-LR algorithm optimizes the function evaluations at integration points by clustering them and sharing computations via a symbolic-function-based dimension/coordinate iteration procedure. This algorithm significantly reduces the computational complexity of the QMC lattice rule from exponential growth in the dimension $d$ to a polynomial order $O(N^2d^3)$, where $N$ denotes the number of integration points in each (transformed) coordinate direction. Thus, the MDI-LR algorithm effectively overcomes the CoD and revitalizes QMC lattice rules for high-dimensional integration.

The remainder of this paper is organized as follows. In Section 2, we briefly review the rank-one lattice rule and its properties. In Section 3, we introduce a reformulation of this lattice rule and propose a tensor product generalization based on an affine transformation. In Section 4, we introduce our MDI-LR algorithm for efficiently implementing the proposed lattice rule based on a multilevel dimension iteration idea. In Section 5, we present extensive numerical experiments to test the performance of the proposed MDI-LR algorithm and compare it with the original lattice rule and the improved lattice rule with standard implementation. The numerical experiments show that the MDI-LR algorithm is much faster and more efficient in medium- and high-dimensional cases. In Section 6, we numerically examine the impact of the parameters appearing in the MDI-LR algorithm, including the choice of the generating vector $\mathbf{z}$ for the lattice rule. In Section 7, we present a detailed numerical study of the computational complexity of the MDI-LR algorithm. This is done by using regression techniques to discover the relationship between CPU time and dimension $d$. Finally, the paper is concluded with a summary given in Section 8.

2 Preliminaries

In this section, we briefly recall some basic materials about quasi-Monte Carlo (QMC) lattice rules for evaluating the integral (1.1) and their properties; they will set the stage for introducing our fast implementation algorithm in the later sections.

2.1 Quasi-Monte Carlo lattice rules

Lattice rules are a class of quasi-Monte Carlo (QMC) methods which were first introduced by Korobov in [1] to approximate (1.1) with a periodic integrand $f$. A lattice rule is an equal-weight cubature rule whose cubature points are those points of an integration lattice that lie in the half-open unit cube $[0,1)^d$. Every lattice point set includes the origin, and the projection of the lattice points onto each coordinate axis is equally spaced. Essentially, the integral is approximated in each coordinate direction by a rectangle rule (or a trapezoidal rule if the integrand is periodic). The simplest lattice rules are called rank-one lattice rules; they use a lattice point set generated by multiples of a single generating vector and are defined as follows.

Definition 2.1 (rank-one lattice rule).

An $n$-point rank-one lattice rule in $d$ dimensions, also known as the method of good lattice points, is a QMC method with cubature points

(2.1)  $\mathbf{x}_i=\Bigl\{\frac{i\mathbf{z}}{n}\Bigr\},\qquad i=0,1,\cdots,n-1,$

where $\mathbf{z}\in\mathbb{Z}^d$, known as the generating vector, is a $d$-dimensional integer vector having no factor in common with $n$, and the braces operator $\{\cdot\}$ takes the fractional part of the input vector.
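A minimal sketch of generating the rank-one lattice point set (2.1) is given below; the generating vector and the value of $n$ are taken from the 81-point example shown in Figure 1 below.

```python
import numpy as np

# A minimal sketch of the rank-one lattice point set (2.1): x_i = {i z / n}.
def rank_one_lattice(z, n):
    z = np.asarray(z)
    i = np.arange(n).reshape(-1, 1)
    return (i * z / n) % 1.0        # braces operator = fractional part

pts = rank_one_lattice(z=[1, 7], n=81)   # the 81-point lattice of Figure 1
print(pts[:4])
```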

Every lattice rule can be written as a multiple sum involving one or more generating vectors. The minimal number of generating vectors required to generate a lattice rule is known as the rank of the rule. Besides rank-one lattice rules, which have only one generating vector, there are also lattice rules having rank up to $d$.

The integrand $f$ is said to have an absolutely convergent Fourier series expansion if

(2.2)  $f(\mathbf{x})=\sum_{\mathbf{h}\in\mathbb{Z}^d}\hat{f}(\mathbf{h})e^{2\pi i\mathbf{h}\cdot\mathbf{x}},\qquad i=\sqrt{-1},$

where the Fourier coefficient is defined as

$\hat{f}(\mathbf{h})=\int_{\Omega}f(\mathbf{x})e^{-2\pi i\mathbf{h}\cdot\mathbf{x}}\,d\mathbf{x}.$

The following theorem gives two characterizations for the error of the lattice rules (cf. [2, Theorem 1] and [3, Theorem 5.2]).

Theorem 2.2.

Let $Q_{n,d}$ denote a lattice rule (not necessarily rank-one) and let $\mathcal{L}$ denote the associated integration lattice. If $f$ has an absolutely convergent Fourier series (2.2), then

(2.3)  $Q_{n,d}(f)-I_d(f)=\sum_{\mathbf{h}\in\mathcal{L}^{\perp}\setminus\{\mathbf{0}\}}\hat{f}(\mathbf{h}),$

where $\mathcal{L}^{\perp}:=\{\mathbf{h}\in\mathbb{Z}^d:\mathbf{h}\cdot\mathbf{x}\in\mathbb{Z}\ \ \forall\mathbf{x}\in\mathcal{L}\}$ is the dual lattice associated with $\mathcal{L}$.

Theorem 2.3.

Let $Q_{n,d}$ denote a rank-one lattice rule with generating vector $\mathbf{z}$. If $f$ has an absolutely convergent Fourier series (2.2), then

(2.4)  $Q_{n,d}(f)-I_d(f)=\sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus\{\mathbf{0}\}\\ \mathbf{h}\cdot\mathbf{z}\equiv 0\,(\mathrm{mod}\,n)}}\hat{f}(\mathbf{h}).$

It follows from (2.4) that the least upper bound of the error for the class $E_\alpha(c)$ of functions whose Fourier coefficients satisfy $|\hat{f}(\mathbf{h})|\leq\frac{c}{(\bar{h}_1\cdots\bar{h}_d)^{\alpha}}$ (where $\bar{h}:=\max(1,|h|)$ and $\alpha>1$, $c>0$, $\mathbf{h}\neq\mathbf{0}$) is given by

(2.5)  $|Q_{n,d}(f)-I_d(f)|\leq c\sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus\{\mathbf{0}\}\\ \mathbf{h}\cdot\mathbf{z}\equiv 0\,(\mathrm{mod}\,n)}}\frac{1}{(\bar{h}_1\cdots\bar{h}_d)^{\alpha}}.$

Let

(2.6)  $P_{\alpha,n,d}(\mathbf{z}):=\sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus\{\mathbf{0}\}\\ \mathbf{h}\cdot\mathbf{z}\equiv 0\,(\mathrm{mod}\,n)}}\frac{1}{(\bar{h}_1\cdots\bar{h}_d)^{\alpha}}.$

For fixed $n$ and $\alpha$, a good lattice point $\mathbf{z}$ is chosen so as to make $P_{\alpha,n,d}(\mathbf{z})$ as small as possible. It was proved by Niederreiter in [7, 8, Theorem 2.11] that for a prime $n$ (or a prime power) there exists a lattice point $\mathbf{z}$ such that

(2.7)  $P_{\alpha,n,d}(\mathbf{z})=O\Bigl(\frac{(\log n)^{\alpha d}}{n^{\alpha}}\Bigr).$

This was done by proving that $P_{\alpha,n,d}(\mathbf{z})$ has the following expansion:

(2.8)  $P_{\alpha,n,d}(\mathbf{z})=-1+\frac{1}{n}\sum_{k=0}^{n-1}\prod_{j=1}^{d}\Bigl(1+\sum_{h\in\mathbb{Z}\setminus\{0\}}\frac{e^{2\pi ikhz_j/n}}{|h|^{\alpha}}\Bigr)
=-1+\frac{1}{n}\prod_{j=1}^{d}\bigl(1+2\zeta(\alpha)\bigr)+\frac{1}{n}\sum_{k=1}^{n-1}\prod_{j=1}^{d}\Bigl(1+\frac{(-1)^{\frac{\alpha}{2}+1}(2\pi)^{\alpha}}{\alpha!}B_{\alpha}\Bigl(\Bigl\{\frac{kz_j}{n}\Bigr\}\Bigr)\Bigr),$

where

(2.9)  $\zeta(\alpha):=\sum_{j=1}^{\infty}\frac{1}{j^{\alpha}},\qquad \alpha>1;$
(2.10)  $B_{\alpha}(\lambda):=\frac{(-1)^{\frac{\alpha}{2}+1}\alpha!}{(2\pi)^{\alpha}}\sum_{h\in\mathbb{Z}\setminus\{0\}}\frac{e^{2\pi ih\lambda}}{|h|^{\alpha}},\qquad \lambda\in[0,1].$

As expected, the performance of a lattice rule depends heavily on the choice of the generating vector $\mathbf{z}$. For large $n$ and $d$, an exhaustive search to find such a generating vector by minimizing some desired error criterion is practically impossible. Below we list a few common strategies for constructing lattice generating vectors.

We end this subsection by stating some well-known error estimate results. To this end, we need to introduce some notation. The worst-case error of a QMC rule $Q_{n,d}(f)$ using the point set $P\subset[0,1]^d$ in a normed space $H$ (with norm $\|\cdot\|$) is

$E_{n,d}(P):=\sup_{\|f\|\leq 1}|I_d(f)-Q_{n,d}(f)|.$

By linearity, for any function $f\in H$, we have

$\bigl|I_d(f)-Q_{n,d}(f)\bigr|\leq E_{n,d}(P)\,\|f\|.$

For a given (shift) vector $\Delta\in[0,1]^d$, we define the shifted lattice $P+\Delta:=\{\{\mathbf{t}+\Delta\}:\mathbf{t}\in P\}$. For any QMC lattice rule $Q_{n,d}(\cdot)$ with the lattice point set $P$, let $Q^{(sh)}_{n,d}(\cdot)$ denote the corresponding shifted QMC lattice rule over the lattice $P+\Delta$. Then, for any integrand $f\in H$, it follows from the definition of the worst-case error that

$\bigl|I_d(f)-Q_{n,d}^{(sh)}(f)\bigr|\leq E_{n,d}(P+\Delta)\,\|f\|.$

Define the quantity

$E^{(sh)}_{n,d}(P):=\Bigl(\int_{[0,1]^d}E^2_{n,d}(P+\Delta)\,d\Delta\Bigr)^{\frac{1}{2}},$

which denotes the shift-averaged worst-case error. The following bound for the root-mean-square error was derived in [3, Section 5.2]:

$\Bigl(\mathbb{E}\bigl|I_d(f)-Q_{n,d}^{(sh)}(f)\bigr|^2\Bigr)^{\frac{1}{2}}\leq E^{(sh)}_{n,d}(P)\,\|f\|,$

where the expectation $\mathbb{E}$ is taken over the random shift $\Delta$, which is uniformly distributed over $[0,1]^d$. The shift-averaged worst-case error $E^{(sh)}_{n,d}(P)$ is often used as a quality measure for randomly shifted QMC rules. For any given point set $P$, an averaging argument guarantees the existence of at least one shift $\Delta$ for which

(2.11)  $E_{n,d}(P+\Delta)\leq E_{n,d}^{(sh)}(P).$

In the case of a rank-one QMC lattice rule with generating vector $\mathbf{z}$, we write $E_{n,d}(\mathbf{z})$ and $E^{(sh)}_{n,d}(\mathbf{z})$ for $E_{n,d}(P)$ and $E^{(sh)}_{n,d}(P)$. It was proved in [3, Lemma 5.5] that, for any rank-one QMC lattice rule, $\bigl[E^{(sh)}_{n,d}(\mathbf{z})\bigr]^2$ has an explicit formula, as quoted in the following theorem.

Theorem 2.4.

The shift-averaged worst-case error for a rank-one QMC lattice rule in the weighted anchored or unanchored Sobolev space (see Remark 2.5 below for the definitions) is given by

(2.12)  $\bigl[E^{(sh)}_{n,d}(\mathbf{z})\bigr]^2=\sum_{\emptyset\neq\nu\subseteq\{1:d\}}\gamma_{\nu}\biggl(\frac{1}{n}\sum_{k=0}^{n-1}\prod_{j\in\nu}\Bigl[B_2\Bigl(\Bigl\{\frac{kz_j}{n}\Bigr\}\Bigr)+\beta\Bigr]-\beta^{|\nu|}\biggr)
=-\prod_{j=1}^{d}(1+\gamma_j\beta)+\frac{1}{n}\sum_{k=0}^{n-1}\prod_{j=1}^{d}\Bigl(1+\gamma_j\Bigl[B_2\Bigl(\Bigl\{\frac{kz_j}{n}\Bigr\}\Bigr)+\beta\Bigr]\Bigr),$

where $\beta=c^2-c+\frac{1}{3}$ for the anchored Sobolev space, $\beta=0$ for the unanchored Sobolev space, and $\{\gamma_{\nu}\}$ are the weights.

Remark 2.5.

(1) We recall that for general given weights $\{\gamma_{\nu}\}$, the inner product of the weighted anchored Sobolev space is defined by

(2.13)  $\langle f,g\rangle_{d,\gamma}:=\sum_{\nu\subseteq\{1,2,\cdots,d\}}\gamma_{\nu}^{-1}\int_{[0,1]^{|\nu|}}\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}f(\mathbf{x}_{\nu};c)\,\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}g(\mathbf{x}_{\nu};c)\,d\mathbf{x}_{\nu},$

where the sum is over all subsets $\nu\subseteq\{1,2,\cdots,d\}$, including the empty set; for $\mathbf{x}\in[0,1]^d$, the symbol $\mathbf{x}_{\nu}$ denotes the set of components $x_j$ of $\mathbf{x}$ with $j\in\nu$, and $(\mathbf{x}_{\nu};c)$ denotes the vector obtained by replacing the components of $\mathbf{x}$ with $j\notin\nu$ by $c\in[0,1]$, which is called the 'anchor' value. The partial derivative $\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}$ denotes the mixed first partial derivative with respect to the components of $\mathbf{x}_{\nu}$.

(2) The inner product for the weighted unanchored Sobolev space is defined by

(2.14)  $\langle f,g\rangle_{d,\gamma}:=\sum_{\nu\subseteq\{1:d\}}\gamma_{\nu}^{-1}\int_{[0,1]^{|\nu|}}\biggl(\int_{[0,1]^{d-|\nu|}}\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}f(\mathbf{x})\,d\mathbf{x}_{-\nu}\biggr)\biggl(\int_{[0,1]^{d-|\nu|}}\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}g(\mathbf{x})\,d\mathbf{x}_{-\nu}\biggr)d\mathbf{x}_{\nu},$

where $\mathbf{x}_{-\nu}$ stands for the vector consisting of the remaining components of the $d$-dimensional vector $\mathbf{x}$ that are not in $\mathbf{x}_{\nu}$.

2.2 Examples of good rank-one lattice rules

The first example is the Fibonacci lattice; we refer the reader to [3] for the details.

Example 2.6 (Fibonacci lattice).

Let $\mathbf{z}=(1,F_k)$ and $n=F_{k+1}$, where $F_k$ and $F_{k+1}$ are consecutive Fibonacci numbers. Then the resulting two-dimensional lattice set generated by $\mathbf{z}$ is called a Fibonacci lattice.

Fibonacci lattices in 2-d have a certain optimality property, but there is no obvious generalization to higher dimensions that retains the optimality property (cf. [3]).

The second example is the so-called Korobov lattice; we refer the reader to [5, 6] for the details.

Example 2.7 (Korobov lattice).

Let $a$ be an integer satisfying $1\leq a\leq n-1$ and $\gcd(a,n)=1$, and set

$\mathbf{z}=\mathbf{z}(a):=(1,a,a^2,\cdots,a^{d-1})\mod n.$

Then the resulting $d$-dimensional lattice set generated by $\mathbf{z}$ is called a Korobov lattice.

It is easy to see that there are (at most) $n-1$ choices for the Korobov parameter $a$, which leads to (at most) $n-1$ choices for the generating vector $\mathbf{z}$. Thus it is feasible in practice to search through the (at most) $n-1$ choices and take the one that fulfills the desired error criterion, such as the one that minimizes $P_{\alpha,n,d}(\mathbf{z})$; the expansion (2.8) allows $P_{\alpha,n,d}(\mathbf{z})$ to be computed in $O(dn^2)$ operations (cf. [4]).
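To illustrate, below is a minimal sketch of this Korobov search using the closed form (2.8) with $\alpha=2$, for which $B_2(x)=x^2-x+\frac{1}{6}$ and the coefficient $\frac{(-1)^{\alpha/2+1}(2\pi)^{\alpha}}{\alpha!}$ equals $2\pi^2$; the values of $n$ and $d$ are illustrative assumptions.

```python
import numpy as np
from math import gcd

# Compute P_2(z) via (2.8) with alpha = 2, using B_2(x) = x^2 - x + 1/6;
# the k = 0 term reproduces the (1 + 2*zeta(2))^d product.  Cost: O(d n).
def P2(z, n):
    k = np.arange(n).reshape(-1, 1)
    frac = (k * np.asarray(z) / n) % 1.0
    B2 = frac**2 - frac + 1.0 / 6.0
    return -1.0 + np.prod(1.0 + 2.0 * np.pi**2 * B2, axis=1).mean()

# Search all admissible Korobov parameters a, total cost O(d n^2).
def korobov_search(n, d):
    best = None
    for a in range(1, n):
        if gcd(a, n) != 1:
            continue
        z = [pow(a, j, n) for j in range(d)]   # (1, a, a^2, ...) mod n
        p = P2(z, n)
        if best is None or p < best[0]:
            best = (p, a, z)
    return best

print(korobov_search(n=101, d=4))
```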

The last example is the CBC lattice, which is based on the component-by-component construction (cf. [9]).

Example 2.8 (CBC lattice).

Let $\mathbb{N}_n:=\{z\in\mathbb{Z}:1\leq z\leq n-1 \text{ and } \gcd(z,n)=1\}$. Given $n$, $d$, and weights $\gamma_{\nu}$ as in (2.12), define the generating vector $\mathbf{z}=(z_1,z_2,\cdots,z_d)$ component-wise as follows.

  • (i) Set $z_1=1$.

  • (ii) With $z_1$ held fixed, choose $z_2$ from $\mathbb{N}_n$ to minimize $[E^{(sh)}_{n,d}((z_1,z_2))]^2$ in 2-d.

  • (iii) With $z_1,z_2$ held fixed, choose $z_3$ from $\mathbb{N}_n$ to minimize $[E^{(sh)}_{n,d}((z_1,z_2,z_3))]^2$ in 3-d.

  • (iv) Repeat the above process until all $\{z_j\}_{j=1}^d$ are determined.

With general weights $\{\gamma_{\nu}\}$, the cost of the CBC algorithm is prohibitively high; thus, in practice some special structure is always adopted, among which product weights, order-dependent weights, finite-order weights, and POD (product and order-dependent) weights are commonly used. In each of the $d$ steps of the CBC algorithm, the search space $\mathbb{N}_n$ has cardinality $n-1$, so the overall search space of the CBC algorithm is reduced to a size of order $O(dn)$ (cf. [10, page 11]). Hence, this provides a feasible way of constructing a generating vector $\mathbf{z}$, as sketched below.
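The following is a minimal CBC sketch for the unanchored space ($\beta=0$) with product weights $\gamma_{\nu}=\prod_{j\in\nu}\gamma_j$, using the second form of (2.12); since the term $-\prod_j(1+\gamma_j\beta)$ is constant over the candidates, it suffices to minimize the $k$-sum. The weight choice $\gamma_j=1/j^2$ and the value of $n$ are illustrative assumptions.

```python
import numpy as np
from math import gcd

# CBC construction with product weights, beta = 0 (unanchored space),
# minimizing the k-sum of (2.12) one component at a time.
def cbc(n, d, gamma=lambda j: 1.0 / j**2):
    k = np.arange(n)
    prod = np.ones(n)              # running product over already-fixed z_j
    z = []
    cand = np.array([c for c in range(1, n) if gcd(c, n) == 1])
    for j in range(1, d + 1):
        # B_2({k c / n}) for every candidate c, shape (len(cand), n)
        frac = np.outer(cand, k) % n / n
        B2 = frac**2 - frac + 1.0 / 6.0
        # squared error (up to an additive constant) for each candidate
        err2 = (prod * (1.0 + gamma(j) * B2)).mean(axis=1)
        c = int(cand[np.argmin(err2)])
        z.append(c)
        fr = (k * c) % n / n
        prod *= 1.0 + gamma(j) * (fr**2 - fr + 1.0 / 6.0)
    return z

print(cbc(n=101, d=6))   # for j = 1, all candidates tie, so z_1 = 1
```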

Figure 1 shows three two-dimensional lattices with 81 points, whose generating vectors are $(1,2)$, $(1,4)$, and $(1,7)$, respectively. Figure 2 shows three three-dimensional lattices with 81 points, whose generating vectors are $(1,2,4)$, $(1,4,16)$, and $(1,7,49)$, respectively.


Figure 1: 81-point lattices with generating vectors $(1,2)$, $(1,4)$, and $(1,7)$.


Figure 2: 81-point lattices with generating vectors $(1,2,4)$, $(1,4,16)$, and $(1,7,49)$.

3 Reformulation of lattice rules

Clearly, the lattice point set of each QMC lattice rule has some pattern or structure; indeed, one main goal of this section is precisely to describe that pattern. We show that a lattice rule almost has a tensor product reformulation when viewed in an appropriately transformed coordinate space obtained via an affine transformation. This discovery allows us to introduce a tensor product rule as an improvement of the original QMC lattice rule. More importantly, the reformulation lays the groundwork for developing an efficient and fast implementation algorithm (or solver), called the MDI-LR algorithm and based on the idea of multilevel dimension iteration [11], for evaluating the QMC lattice rule (1.2).

3.1 Construction of affine coordinate transformations

From Figures 1-2, we see that the lattice points lie on lines/planes which are not parallel to the coordinate axes/planes; however, those lines/planes are parallel to each other. This observation suggests that they can be made parallel to the coordinate axes/planes via affine transformations. Below we prove that this is indeed the case and explicitly construct such an affine transformation for a given QMC lattice rule.

Theorem 3.1.

Let $\mathbf{z}\in\mathbb{Z}^d$ and let $\mathbf{x}_j=\bigl\{\frac{j\mathbf{z}}{n}\bigr\}$ for $j=0,1,2,\cdots,n-1$ denote the rank-one QMC lattice rule point set. Define

(3.1)  $A=\begin{pmatrix}\frac{1}{z_1}&-\frac{1}{z_2}&0&\cdots&0&0\\ 0&\frac{1}{z_2}&-\frac{1}{z_3}&\cdots&0&0\\ \vdots&\vdots&\vdots&&\vdots&\vdots\\ 0&0&0&\cdots&\frac{1}{z_{d-1}}&-\frac{1}{z_d}\\ 0&0&0&\cdots&0&1\end{pmatrix}\quad\text{and}\quad \mathbf{b}=\begin{pmatrix}0\\ 0\\ \vdots\\ -\bigl\{\frac{nx_d}{z_d}\bigr\}\cdot\frac{z_d}{n}\end{pmatrix}.$

Notice that $A\in\mathbb{R}^{d\times d}$ and $\mathbf{b}\in\mathbb{R}^d$. Then $\mathbf{y}_j:=\mathrm{abs}(A\mathbf{x}_j+\mathbf{b})$, $j=0,1,\cdots,n-1$, form a Cartesian grid in the new coordinate system, where $\mathrm{abs}(\mathbf{y})$ is defined as taking the absolute value of each component of the vector $\mathbf{y}$.

Proof 3.2.

By the definition of $\mathbf{x}_j$, we have $\mathbf{x}_j=\bigl(\{\frac{jz_1}{n}\},\{\frac{jz_2}{n}\},\cdots,\{\frac{jz_d}{n}\}\bigr)^{\intercal}$. A direct computation yields

(3.7)  $\mathbf{y}_j=\mathrm{abs}(A\mathbf{x}_j+\mathbf{b})=\mathrm{abs}\begin{pmatrix}\frac{1}{z_1}\{\frac{jz_1}{n}\}-\frac{1}{z_2}\{\frac{jz_2}{n}\}\\ \frac{1}{z_2}\{\frac{jz_2}{n}\}-\frac{1}{z_3}\{\frac{jz_3}{n}\}\\ \vdots\\ \frac{1}{z_{d-1}}\{\frac{jz_{d-1}}{n}\}-\frac{1}{z_d}\{\frac{jz_d}{n}\}\\ \{\frac{jz_d}{n}\}-\{\frac{n}{z_d}\{\frac{jz_d}{n}\}\}\cdot\frac{z_d}{n}\end{pmatrix}.$

Recall that $\{x\}$ and $\lfloor x\rfloor$ denote respectively the fractional and integer parts of the number $x$. Because

$\frac{1}{z_i}\Bigl\{\frac{jz_i}{n}\Bigr\}=\frac{1}{z_i}\Bigl(\frac{jz_i}{n}-\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor\Bigr)=\frac{j}{n}-\frac{1}{z_i}\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor,$

then

$\frac{1}{z_{i-1}}\Bigl\{\frac{jz_{i-1}}{n}\Bigr\}-\frac{1}{z_i}\Bigl\{\frac{jz_i}{n}\Bigr\}=\frac{1}{z_i}\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor-\frac{1}{z_{i-1}}\Bigl\lfloor\frac{jz_{i-1}}{n}\Bigr\rfloor,$

and

(3.8)  $\mathbf{y}_j=\mathrm{abs}\begin{pmatrix}\frac{1}{z_1}\{\frac{jz_1}{n}\}-\frac{1}{z_2}\{\frac{jz_2}{n}\}\\ \frac{1}{z_2}\{\frac{jz_2}{n}\}-\frac{1}{z_3}\{\frac{jz_3}{n}\}\\ \vdots\\ \frac{1}{z_{d-1}}\{\frac{jz_{d-1}}{n}\}-\frac{1}{z_d}\{\frac{jz_d}{n}\}\\ \{\frac{jz_d}{n}\}-\bigl\{\frac{\{jz_d/n\}}{z_d/n}\bigr\}\cdot\frac{z_d}{n}\end{pmatrix}=\mathrm{abs}\begin{pmatrix}\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor-\frac{1}{z_1}\lfloor\frac{jz_1}{n}\rfloor\\ \frac{1}{z_3}\lfloor\frac{jz_3}{n}\rfloor-\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor\\ \vdots\\ \frac{1}{z_d}\lfloor\frac{jz_d}{n}\rfloor-\frac{1}{z_{d-1}}\lfloor\frac{jz_{d-1}}{n}\rfloor\\ \frac{z_d}{n}\bigl\lfloor j-\frac{n}{z_d}\lfloor\frac{jz_d}{n}\rfloor\bigr\rfloor\end{pmatrix}.$

It is easy to check that

(3.9)  $\frac{1}{z_i}\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor-\frac{1}{z_{i-1}}\Bigl\lfloor\frac{jz_{i-1}}{n}\Bigr\rfloor=\begin{cases}0,& 0\leq j<\frac{n}{z_i},\\[2pt] \frac{1}{z_i},&\frac{n}{z_i}\leq j<\frac{n}{z_{i-1}},\\[2pt] \frac{1}{z_{i-1}}-\frac{1}{z_i},&\frac{n}{z_{i-1}}\leq j<\frac{2n}{z_i},\\[2pt] \frac{2}{z_i}-\frac{1}{z_{i-1}},&\frac{2n}{z_i}\leq j<\frac{2n}{z_{i-1}},\\[2pt] \frac{2}{z_{i-1}}-\frac{2}{z_i},&\frac{2n}{z_{i-1}}\leq j<\frac{3n}{z_i},\\[2pt] \ \ \vdots&\ \ \vdots\end{cases}$

On the other hand, let

$\Gamma_{n_1}^{1}:=\Bigl\{y_{s_1}\ \Big|\ y_{s_1}=\mathrm{abs}\Bigl(\frac{1}{z_2}\Bigl\lfloor\frac{iz_2}{n}\Bigr\rfloor-\frac{1}{z_1}\Bigl\lfloor\frac{iz_1}{n}\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_1=0,1,\cdots,n_1-1\Bigr\},$
$\Gamma_{n_2}^{1}:=\Bigl\{y_{s_2}\ \Big|\ y_{s_2}=\mathrm{abs}\Bigl(\frac{1}{z_3}\Bigl\lfloor\frac{iz_3}{n}\Bigr\rfloor-\frac{1}{z_2}\Bigl\lfloor\frac{iz_2}{n}\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_2=0,1,\cdots,n_2-1\Bigr\},$
$\quad\vdots$
$\Gamma_{n_{d-1}}^{1}:=\Bigl\{y_{s_{d-1}}\ \Big|\ y_{s_{d-1}}=\mathrm{abs}\Bigl(\frac{1}{z_d}\Bigl\lfloor\frac{iz_d}{n}\Bigr\rfloor-\frac{1}{z_{d-1}}\Bigl\lfloor\frac{iz_{d-1}}{n}\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_{d-1}=0,1,\cdots,n_{d-1}-1\Bigr\},$
$\Gamma_{n_d}^{1}:=\Bigl\{y_{s_d}\ \Big|\ y_{s_d}=\mathrm{abs}\Bigl(\frac{z_d}{n}\Bigl\lfloor i-\frac{n}{z_d}\Bigl\lfloor\frac{iz_d}{n}\Bigr\rfloor\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_d=0,1,\cdots,n_d-1\Bigr\},$
$\Gamma_n^d:=\Gamma_{n_1}^{1}\otimes\Gamma_{n_2}^{1}\otimes\cdots\otimes\Gamma_{n_d}^{1},$

where

(3.10)  $n_i=\frac{\mathrm{lcm}(z_i,z_{i+1})}{\min(z_i,z_{i+1})},\quad i=1,2,\cdots,d-1;\qquad n_d=\Bigl\lceil\frac{n}{z_d}\Bigr\rceil,$

and $\mathrm{lcm}$ represents the least common multiple.

For any $\mathbf{y}_k=(y_{s_1},y_{s_2},\cdots,y_{s_d})^{\intercal}\in\Gamma_n^d$, we have $k=s_1+s_2n_1+s_3n_1n_2+\cdots+s_dn_1n_2\cdots n_{d-1}$. Since $s_1=0,1,\cdots,n_1-1,\ \cdots,\ s_d=0,1,\cdots,n_d-1$, it follows that $k=0,1,\cdots,n_1n_2\cdots n_d-1$. For $\mathbf{y}_j$ in the set described by (3.8), for all $j=1,2,\cdots,n$, we get

(3.11)  $\mathbf{y}_j=\mathrm{abs}\begin{pmatrix}\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor-\frac{1}{z_1}\lfloor\frac{jz_1}{n}\rfloor\\ \frac{1}{z_3}\lfloor\frac{jz_3}{n}\rfloor-\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor\\ \vdots\\ \frac{1}{z_d}\lfloor\frac{jz_d}{n}\rfloor-\frac{1}{z_{d-1}}\lfloor\frac{jz_{d-1}}{n}\rfloor\\ \frac{z_d}{n}\bigl\lfloor k-\frac{n}{z_d}\lfloor\frac{jz_d}{n}\rfloor\bigr\rfloor\end{pmatrix}.$

Let $y_{i_1}:=\mathrm{abs}\bigl(\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor-\frac{1}{z_1}\lfloor\frac{jz_1}{n}\rfloor\bigr)$; it follows that there exists an $s_1$ such that $s_1=n_1-\lfloor\frac{j}{n_1}\rfloor$, resulting in $y_{i_1}=y_{s_1}\in\Gamma_{n_1}^{1}$. In the same way, $y_{i_2}\in\Gamma_{n_2}^{1},\cdots,y_{i_d}\in\Gamma_{n_d}^{1}$. Therefore, we conclude that $\mathbf{y}_j=(y_{i_1},y_{i_2},\cdots,y_{i_d})^{\intercal}\in\Gamma_n^d$, that is, the transformed lattice points have the Cartesian product structure.

Lemma 3.3.

Let $\mathbf{x}_j=\bigl\{\frac{j\mathbf{z}}{n}\bigr\}$ for $j=1,2,\cdots,n-1$ denote the Korobov lattice point set, that is, $\mathbf{z}=(1,a,a^2,\cdots,a^{d-1})$ with $1\leq a\leq n-1$ and $\gcd(a,n)=1$. Then $\mathbf{y}_j:=\mathrm{abs}(A\mathbf{x}_j+\mathbf{b})$, $j=0,1,\cdots,n-1$, satisfies the conclusion of Theorem 3.1. Moreover, if $a=[n]^{\frac{1}{d}}$, then the number of points in each direction of the lattice set $\Gamma_n^d$ is the same, that is, $n_1=n_2=\cdots=n_d=a$.

Proof 3.4.

From Theorem 3.1 we have

(3.12)  $\mathbf{y}_j=\mathrm{abs}\begin{pmatrix}\frac{1}{a}\lfloor\frac{ja}{n}\rfloor\\ \frac{1}{a^2}\lfloor\frac{ja^2}{n}\rfloor-\frac{1}{a}\lfloor\frac{ja}{n}\rfloor\\ \vdots\\ \frac{1}{a^{d-1}}\lfloor\frac{ja^{d-1}}{n}\rfloor-\frac{1}{a^{d-2}}\lfloor\frac{ja^{d-2}}{n}\rfloor\\ \frac{a^{d-1}}{n}\bigl\lfloor j-\frac{n}{a^{d-1}}\lfloor\frac{ja^{d-1}}{n}\rfloor\bigr\rfloor\end{pmatrix},$

and

$\Gamma_{n_1}^{1}=\Bigl\{y_{s_1}\ \Big|\ y_{s_1}=\mathrm{abs}\Bigl(\frac{1}{a}\Bigl\lfloor\frac{ja}{n}\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_1=0,1,\cdots,n_1-1\Bigr\}=\Bigl\{0,\frac{1}{a},\frac{2}{a},\cdots,\frac{a-1}{a}\Bigr\},$
$\Gamma_{n_2}^{1}=\Bigl\{y_{s_2}\ \Big|\ y_{s_2}=\mathrm{abs}\Bigl(\frac{1}{a^2}\Bigl\lfloor\frac{ja^2}{n}\Bigr\rfloor-\frac{1}{a}\Bigl\lfloor\frac{ja}{n}\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_2=0,1,\cdots,n_2-1\Bigr\}=\Bigl\{0,\frac{1}{a^2},\frac{2}{a^2},\cdots,\frac{a-1}{a^2}\Bigr\},$
$\quad\vdots$
$\Gamma_{n_{d-1}}^{1}=\Bigl\{y_{s_{d-1}}\ \Big|\ y_{s_{d-1}}=\mathrm{abs}\Bigl(\frac{1}{a^{d-1}}\Bigl\lfloor\frac{ja^{d-1}}{n}\Bigr\rfloor-\frac{1}{a^{d-2}}\Bigl\lfloor\frac{ja^{d-2}}{n}\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_{d-1}=0,1,\cdots,n_{d-1}-1\Bigr\}=\Bigl\{0,\frac{1}{a^{d-1}},\frac{2}{a^{d-1}},\cdots,\frac{a-1}{a^{d-1}}\Bigr\},$
$\Gamma_{n_d}^{1}=\Bigl\{y_{s_d}\ \Big|\ y_{s_d}=\mathrm{abs}\Bigl(\frac{a^{d-1}}{n}\Bigl\lfloor j-\frac{n}{a^{d-1}}\Bigl\lfloor\frac{ja^{d-1}}{n}\Bigr\rfloor\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_d=0,1,\cdots,n_d-1\Bigr\}=\Bigl\{0,\frac{a^{d-1}}{n},\frac{2a^{d-1}}{n},\cdots,\frac{n-a^{d-1}}{n}\Bigr\}.$

For $\mathbf{y}_j$ in the set described by (3.12), let $y_{i_1}:=\mathrm{abs}\bigl(\frac{1}{a}\lfloor\frac{ja}{n}\rfloor\bigr)$; it follows that there exists an $s_1$ such that $s_1=n_1-\lfloor\frac{j}{n_1}\rfloor$, resulting in $y_{i_1}=\frac{s_1}{a}=y_{s_1}\in\Gamma_{n_1}^{1}$. Similarly, $y_{i_2}=\frac{s_2}{a^2}=y_{s_2}\in\Gamma_{n_2}^{1},\cdots,y_{i_d}\in\Gamma_{n_d}^{1}$. Therefore, we conclude that $\mathbf{y}_j=(y_{i_1},y_{i_2},\cdots,y_{i_d})^{\intercal}\in\Gamma_n^d$; obviously, the transformed lattice points have the Cartesian product structure. Moreover, if $a=[n]^{\frac{1}{d}}$, then

$n_i=\frac{\mathrm{lcm}(z_i,z_{i+1})}{\min(z_i,z_{i+1})}=\frac{z_{i+1}}{z_i}=a,\quad i=1,2,\cdots,d-1,\qquad n_d=\Bigl\lceil\frac{n}{z_d}\Bigr\rceil=\frac{a^d}{a^{d-1}}=a.$

Hence, $n_1=n_2=\cdots=n_{d-1}=n_d=a$. The proof is complete.


Figure 3: Left: 81-point lattice with the generating vector $(1,4)$. Right: transformed lattice after the affine coordinate transformation.

The left graph of Figure 3 shows a 2-d example of an 81-point rank-one lattice with the generating vector $(1,4)$. The right graph displays the transformed lattice under the affine coordinate transformation $\mathbf{y}=A\mathbf{x}+\mathbf{b}$ from $\mathbb{R}^2$ to itself, where

(3.13)  $A=\begin{pmatrix}1&-\frac{1}{4}\\ 0&1\end{pmatrix},\qquad \mathbf{b}=\begin{pmatrix}0\\ -\{\frac{81x_2}{4}\}\cdot\frac{4}{81}\end{pmatrix}.$
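As a sanity check, the following minimal sketch applies the transformation $\mathbf{y}=\mathrm{abs}(A\mathbf{x}+\mathbf{b})$ of (3.13) to this 81-point lattice and confirms the Cartesian structure predicted by Theorem 3.1 and (3.10): $n_1=4$ points in the first direction and $n_2=\lceil 81/4\rceil=21$ in the second (so $n^*=4\cdot 21-81=3$ ghost points complete the grid).

```python
import numpy as np
from math import lcm

# Verify Theorem 3.1 on the 81-point lattice of Figure 3 (z = (1, 4)):
# y = abs(A x + b) with A, b as in (3.13) lands on a Cartesian grid.
n, z = 81, np.array([1, 4])
x = (np.arange(n).reshape(-1, 1) * z / n) % 1.0       # rank-one points (2.1)
A = np.array([[1.0, -0.25],
              [0.0,  1.0]])
b = np.column_stack([np.zeros(n), -((n * x[:, 1] / z[1]) % 1.0) * z[1] / n])
y = np.abs(x @ A.T + b)

print(np.unique(y[:, 0].round(12)))                    # 4 values: 0, 1/4, 1/2, 3/4
print(len(np.unique(y[:, 1].round(12))))               # 21 values
# grid sizes predicted by (3.10): n_1 = lcm(1,4)/min(1,4), n_2 = ceil(81/4)
print(lcm(1, 4) // min(1, 4), -(-n // int(z[1])))      # prints: 4 21
```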


Figure 4: Left: 161-point rank-one lattice with generating vector $(1,4,16)$. Right: transformed lattice after the coordinate transformation.

Figure 4 demonstrates a specific example in 3-d. The left graph is a 161-point rank-one lattice with generating vector $(1,4,16)$. The right one shows the transformed points under the affine coordinate transformation $\mathbf{y}=A\mathbf{x}+\mathbf{b}$ from $\mathbb{R}^3$ to itself, where

(3.14)  $A=\begin{pmatrix}1&-\frac{1}{4}&0\\ 0&\frac{1}{4}&-\frac{1}{16}\\ 0&0&1\end{pmatrix},\qquad \mathbf{b}=\begin{pmatrix}0\\ 0\\ -\{\frac{161x_3}{16}\}\cdot\frac{16}{161}\end{pmatrix}.$

3.2 Improved lattice rules

From Figures 3 and 4 we see that the transformed lattice point sets do not exactly form tensor product grids because many lines miss one point. By adding those "missing" points, which can be done systematically, we easily make them become tensor product grids in the transformed coordinate system. Since more integration points are added to the QMC lattice rule, the resulting quadrature rule is expected to be more accurate (which is supported by our numerical tests); hence, it is an improvement of the original QMC lattice rule. We also note that those added points correspond to ghost points in the original coordinates.

Definition 3.5 (Improved QMC lattice rule).

Let $\mathbf{z}\in\mathbb{R}^d$ and $\mathbf{x}_i=\bigl\{\frac{i\mathbf{z}}{n}\bigr\}$, $i=0,1,\cdots,n-1$, be a rank-one lattice point set, and let $\mathbf{y}_i:=A\mathbf{x}_i+\mathbf{b}$, $i=0,1,\cdots,n-1$, for some $A\in\mathbb{R}^{d\times d}$ and $\mathbf{b}\in\mathbb{R}^d$ (which uniquely determine an affine transformation). Suppose there exist $n^*\,(\ll n)$ points so that together the $n+n^*$ points form a tensor product grid in the transformed coordinate system; then the QMC lattice rule obtained by using those $n+n^*$ sampling points is called an improved QMC lattice rule and is denoted by $\widehat{Q}_{n,d}(f)$.

Figure 5 shows an 81-point (i.e., $n=81$) 2-d rank-one lattice with the generating vector $(1,7)$ (left), the transformed lattice (middle), and the improved tensor product grid (right). Three points are added on the top, so $n^*=3$ for this example.


Figure 5: Left: 81-point lattice with the generating vector $(1,7)$. Middle: transformed lattice. Right: improved tensor product grid after adding 3 points.

4 The MDI-LR algorithm

An improved rank-one lattice is a tensor product grid in the transformed coordinate system, and its corresponding quasi-Monte Carlo (QMC) rule is a tensor product rule with equal weight $w=\frac{1}{n+n^*}$. This tensor product improvement allows us to apply the multilevel dimension iteration (MDI) approach, which was proposed by the authors in [11], for a fast implementation of the original QMC lattice rule, especially in the high-dimensional case. The resulting algorithm will be called the MDI-LR algorithm throughout this paper.

4.1 Formulation of the MDI-LR algorithm

To formulate our MDI-LR algorithm, we first recall the MDI idea/algorithm in simple terms (cf. [11]).

For a tensor product rule, we need to compute a multi-summation with variable limits

$\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\cdots\sum_{i_d=1}^{n_d}f(\xi_{i_1},\xi_{i_2},\cdots,\xi_{i_d}),$

which involves $n_1n_2\cdots n_d$ function evaluations for the given function $f$ if one uses the conventional approach of computing the function value at each point independently, which inevitably leads to the curse of dimensionality (CoD). To make the computation feasible in high dimensions, it is imperative to save computational cost by evaluating the summation more efficiently.

The main idea of the MDI approach proposed in [11] is to compute those $n_1n_2\cdots n_d$ function values in clusters (not independently) and to compute the summation layer by layer based on a dimension iteration with the help of symbolic computation. To this end, we write

(4.1)  $\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\cdots\sum_{i_d=1}^{n_d}f(\xi_{i_1},\xi_{i_2},\cdots,\xi_{i_d})=\sum_{i_{m+1}=1}^{n_{m+1}}\cdots\sum_{i_d=1}^{n_d}f_{d-m}(\xi_{i_{m+1}},\cdots,\xi_{i_d}),$

where $1\leq m\ll d$ is fixed and

$f_{d-m}(x_1,\cdots,x_{d-m}):=\sum_{i_1=1}^{n_1}\cdots\sum_{i_m=1}^{n_m}f(\xi_{i_1},\cdots,\xi_{i_m},x_1,\cdots,x_{d-m}).$

The MDI approach recursively generates a sequence of symbolic functions $\{f_{d-m},f_{d-2m},\cdots,f_{d-\ell m}\}$, each of which has $m$ fewer arguments than its predecessor (because the dimension is reduced by $m$ at each iteration). As already mentioned above, the MDI approach explores the lattice structure of the tensor product integration point set: instead of evaluating function values at all integration points independently, it evaluates them in clusters and iteratively along $m$ coordinate directions, and the function evaluation at any integration point is not completed until the last step of the algorithm is executed. In some sense, the implementation strategy of the MDI approach is to trade large space complexity for low time complexity. That being said, the price paid by the MDI approach for the speedy evaluation of the multi-summation is that those symbolic functions need to be saved during the iteration process, which often takes up more computer memory.

For example, consider the 2-d function $f(x_1,x_2)=x_1^2+x_1x_2+x_2^2$ and let $n_1=n_2=N$. In the standard approach, to compute the function value $f(\xi_{i_1},\xi_{i_2})$ at an integration point $(\xi_{i_1},\xi_{i_2})$, one needs to compute three multiplications $\xi_{i_1}*\xi_{i_1}=\xi_{i_1}^2$, $\xi_{i_1}*\xi_{i_2}$, and $\xi_{i_2}*\xi_{i_2}=\xi_{i_2}^2$, and two additions. To compute $N^2$ function values, this requires a total of $3N^2$ multiplications and $3N^2-1$ additions. On the other hand, the first for-loop of the MDI approach generates $f_1(x)=\sum_{i_2=1}^{N}f(x,\xi_{i_2})$, which requires $N$ evaluations of $\xi_{i_2}*x$ (symbolic computations) and $N$ evaluations of $\xi_{i_2}*\xi_{i_2}$, as well as $3(N-1)$ additions. The second for-loop generates $\sum_{i_1=1}^{N}f_1(\xi_{i_1})$, which requires $N$ evaluations of $\xi_{i_1}*\xi_{i_1}$ and $N$ evaluations of $\xi_{i_1}*\bar{\xi}_{i_2}$, as well as $3N-1$ additions. After the second for-loop completes, we obtain the summation value. The computational complexity of the MDI approach thus consists of a total of $4N$ multiplications and $6N-4$ additions, which is much cheaper than the standard approach. In fact, the speedup is even more dramatic in higher dimensions. A code sketch of this two-loop iteration is given below.
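The following is a minimal symbolic sketch of this 2-d example; the grid $\{\xi_i\}=\{i/N\}$ and $N=5$ are illustrative assumptions.

```python
import sympy as sp

# The 2-d MDI iteration for f(x1,x2) = x1^2 + x1*x2 + x2^2 on an N x N grid.
x1, x2 = sp.symbols('x1 x2')
f = x1**2 + x1 * x2 + x2**2
N = 5
xi = [sp.Rational(i, N) for i in range(N)]   # grid points (an assumption)

# first loop: collapse the x2-direction into a symbolic function f_1(x1)
f1 = sp.expand(sum(f.subs(x2, t) for t in xi))
# second loop: collapse the remaining direction to get the multi-sum value
total = sum(f1.subs(x1, s) for s in xi)
print(f1, total)

# brute-force check: sum over all N^2 points independently
print(sum(f.subs({x1: s, x2: t}) for s in xi for t in xi))
```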

It is easy to see that the MDI approach cannot be applied to the QMC rule (1.2) directly because it is not in a multi-summation form. However, we have shown in Section 3 that this obstacle can be overcome by a simple affine coordinate transformation (i.e., a change of variables) and by adding a few integration points.

Let $\mathbf{y}=A\mathbf{x}+\mathbf{b}$ denote the affine transformation; then the integral (1.1) is equivalent to

(4.2)  $I_d(f)=\frac{1}{|A|}\int_{\widehat{\Omega}}f\bigl(A^{-1}(\mathbf{y}-\mathbf{b})\bigr)\,d\mathbf{y}=\frac{1}{|A|}\int_{\widehat{\Omega}}g(\mathbf{y})\,d\mathbf{y},$

where $|A|$ stands for the determinant of $A\in\mathbb{R}^{d\times d}$ and

$g(\mathbf{y}):=f\bigl(A^{-1}(\mathbf{y}-\mathbf{b})\bigr),\qquad \widehat{\Omega}:=\bigl\{\mathbf{y}\,|\,\mathbf{y}=A\mathbf{x}+\mathbf{b},\ \mathbf{x}\in\Omega\bigr\}.$

Then our improved QMC rank-one lattice rule for (1.2) in the $\mathbf{y}$-coordinate system takes the form

(4.3)  $\widehat{Q}_{n,d}(f)=\frac{1}{n+n^*}\sum_{i=0}^{n+n^*-1}f(\mathbf{x}_i)=\frac{1}{|A|}\sum_{s_1=1}^{n_1}\sum_{s_2=1}^{n_2}\cdots\sum_{s_d=1}^{n_d}g(y_{s_1},y_{s_2},\cdots,y_{s_d}).$

Let

$J(g,\Omega):=\sum_{s_1=1}^{n_1}\sum_{s_2=1}^{n_2}\cdots\sum_{s_d=1}^{n_d}g(y_{s_1},y_{s_2},\cdots,y_{s_d}).$

Clearly, this is a multi-summation with variable limits; thus, we can apply the MDI approach to compute it efficiently. Before doing that, we first need to extend the MDI algorithm, Algorithm 2.3 of [11], to the case of variable limits. We name the extended algorithm MDI$(d,g,\Omega_d,\mathbf{N}_d,m)$; it is defined as follows.

Algorithm 1 MDI($d$, $g$, $\Omega$, $\mathbf{N}_d$, $m$)

Inputs: $d\,(\geq 4)$, $g$, $\Omega$, $m\,(=1,2,3)$, $\mathbf{N}_k=(n_1,n_2,\cdots,n_k)$, $k=1,2,\cdots,d$.
Output: $J=J(g,\Omega)$.

1: $\Omega_d=\Omega$, $g_d=g$, $\ell=[\frac{d}{m}]$.
2: for $k=d:-m:d-\ell m$ do (the index is decreased by $m$ at each iteration)
3:     $\Omega_{k-m}=P_k^{k-m}\Omega_k$.
4:     Construct the symbolic function $g_{k-m}$ by (4.4) below.
5:     MDI$(k,g_k,\Omega_k,\mathbf{N}_k,m):=$ MDI$(k-m,g_{k-m},\Omega_{k-m},\mathbf{N}_{k-m},m)$.
6: end for
7: $J=$ MDI$(d-\ell m,g_{d-\ell m},\Omega_{d-\ell m},\mathbf{N}_{d-\ell m},m)$.
8: return $J$.

where $P_k^{k-m}$ denotes the natural embedding from $\mathbb{R}^k$ to $\mathbb{R}^{k-m}$ obtained by deleting the first $m$ components of vectors in $\mathbb{R}^k$, and

(4.4)  $g_{k-m}(s_1,\cdots,s_{k-m})=\sum_{i_1,\cdots,i_m=1}^{n_1,\cdots,n_m}w_{i_1}w_{i_2}\cdots w_{i_m}\,g_k\bigl(\xi_{i_1},\cdots,\xi_{i_m},s_1,\cdots,s_{k-m}\bigr).$
Remark 4.1.
  • (a) Algorithm 1 recursively generates a sequence of symbolic functions $\{g_d,g_{d-m},g_{d-2m},\cdots,g_{d-\ell m}\}$, each of which has $m$ fewer arguments than its predecessor.

  • (b) Since $m\leq 3$, when $d=2,3$ we simply use the underlying low-dimensional QMC quadrature rules. As done in [11], we name those low-dimensional algorithms 2d-MDI$(g,\Omega,\mathbf{N}_2)$ and 3d-MDI$(g,\Omega,\mathbf{N}_3)$, and introduce the following conventions.

    • If $k=1$, set MDI$(k,g_k,\Omega_k,n_1,m):=J(g_k,\Omega_k)$, which is computed by using the underlying 1-d QMC quadrature rule.

    • If $k=2$, set MDI$(k,g_k,\Omega_k,\mathbf{N}_k,m):=$ 2d-MDI$(g_k,\Omega_k,\mathbf{N}_k)$.

    • If $k=3$, set MDI$(k,g_k,\Omega_k,\mathbf{N}_k,m):=$ 3d-MDI$(g_k,\Omega_k,\mathbf{N}_k)$.

    We note that when $k=1,2,3$, the parameter $m$ becomes a dummy variable and can be given any value.

  • (c) We also note that the MDI algorithm in [11] has an additional parameter $r$ which selects the 1-d quadrature rule. However, such a choice is not needed here because the underlying QMC rule is used as the 1-d quadrature rule.

We are now ready to define our MDI-LR algorithm, denoted by MDI-LR$(d,g,\Omega_d,\mathbf{N}_d,m)$, by using the above MDI algorithm to evaluate $\widehat{Q}_{n,d}(f)$ in (4.3).

Algorithm 2 MDI-LR($f$, $\Omega$, $d$, $a$, $n$)

Inputs: $f$, $\Omega$, $d$, $a$, $n$.
Output: $\widehat{Q}_{n,d}(f)=Q_{n+n^*,d}(f)$.

1: Initialize $\mathbf{z}=(1,a,a^2,\cdots,a^{d-1})$, $J=0$, $Q=0$, $m=1$.
2: Construct the matrix $A$ and the vector $\mathbf{b}$ by (3.1).
3: $g(\mathbf{y}):=f(A^{-1}(\mathbf{y}-\mathbf{b}))$.
4: Generate the vector $\mathbf{N}_d$ by (3.10).
5: $\widehat{\Omega}:=\{\mathbf{y}\,|\,\mathbf{y}=A\mathbf{x}+\mathbf{b},\ \mathbf{x}\in\Omega\}$.
6: $J=$ MDI$(d,g,\widehat{\Omega},\mathbf{N}_d,m)$.
7: $Q=\frac{J}{|A|}$.
8: return $\widehat{Q}_{n,d}(f)=Q$.

Note that here we set $m=1$, that is, the dimension is reduced by $1$ at each dimension iteration; this is because the numerical tests of [11] show that the MDI algorithm is more efficient with $m=1$ than with $m>1$. Also, the upper limit vector $\mathbf{N}_d$ depends on the choice of the underlying QMC rule. In Lemma 3.3 we showed that when $N=[n^{\frac{1}{d}}]$ and $a=N$, then $n_1=n_2=\cdots=n_d=N$, that is, the number of integration points is the same in each (transformed) coordinate direction. An end-to-end sketch in this setting is given below.
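The following is a minimal end-to-end sketch of this setting ($a=N$, $n=N^d$), combining the transformed grids of Lemma 3.3 with the $m=1$ dimension iteration; since the improved rule is an equal-weight average over the $n+n^*=N^d$ transformed grid points, the multi-sum is divided by $N^d$. The polynomial integrand stands in for the transformed $g=f(A^{-1}(\mathbf{y}-\mathbf{b}))$ of (4.2) and is an assumption made to keep the sketch self-contained.

```python
import sympy as sp

# End-to-end MDI-LR sketch in the transformed coordinates (a = N, n = N^d).
def mdi(expr, ys, grids):                  # m = 1 dimension iteration
    for sym, nodes in zip(reversed(ys), reversed(grids)):
        expr = sp.expand(sum(expr.subs(sym, t) for t in nodes))
    return expr

N, d = 4, 4
ys = sp.symbols(f'y1:{d+1}')
g = (sum(ys) + 1)**2                       # placeholder transformed integrand
# grids of Lemma 3.3: N points per (transformed) coordinate direction
grids = [[sp.Rational(s, N**(j + 1)) for s in range(N)] for j in range(d - 1)]
grids.append([sp.Rational(s, N) for s in range(N)])
Q = mdi(g, ys, grids) / N**d               # equal-weight average, cf. (4.3)
print(float(Q))
```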

5 Numerical performance tests

In this section, we present extensive and purposely designed numerical experiments to gauge the performance of the proposed MDI-LR algorithm and to demonstrate its superiority over the standard implementation of the QMC lattice rule (SLR) and the improved lattice rule (Imp-LR) for computing high-dimensional integrals. All our numerical experiments are done in Matlab on a desktop PC with an Intel(R) Xeon(R) Gold 6226R CPU at 2.90GHz and 32GB RAM.

5.1 Two and three-dimensional tests

We first test our MDI-LR algorithm on simple 2-d and 3-d examples and compare its performance (in terms of CPU time) with the SLR and Imp-LR methods.

Test 1. Let $\Omega=[0,1]^2$ and consider the following 2-d integrands:

(5.1)  $f(x):=\frac{x_2\exp(x_1x_2)}{e-2};\qquad \widehat{f}(x):=\sin\bigl(2\pi+x_1^2+x_2^2\bigr).$

Tables 1 and 2 present the computational results (errors and CPU times) of the SLR, Imp-LR, and MDI-LR methods for approximating $I_2(f)$ and $I_2(\widehat{f})$, respectively. Recall that the Imp-LR is obtained by adding some integration points on the boundary of the domain in the transformed coordinates, and the MDI-LR algorithm provides a fast implementation of the Imp-LR using the MDI approach. From Tables 1 and 2, we observe that all three methods require very little CPU time. The difference is almost negligible, although the SLR is faster than the other two methods. Moreover, the Imp-LR and MDI-LR methods use function values at some additional sampling points on the boundary, which leads to higher accuracy compared to the SLR method, as we predicted earlier.

Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 1.332×10^{-2} | 0.0422 | 1.218×10^{-3} | 0.0423 | 1.218×10^{-3} | 0.0877
501 | 5.169×10^{-3} | 0.0567 | 2.520×10^{-4} | 0.0547 | 2.520×10^{-4} | 0.3230
1001 | 4.051×10^{-3} | 0.0610 | 1.269×10^{-4} | 0.0657 | 1.269×10^{-4} | 0.5147
5001 | 2.570×10^{-3} | 0.0755 | 2.489×10^{-5} | 0.0754 | 2.489×10^{-5} | 1.6242
10001 | 2.094×10^{-4} | 0.0922 | 1.220×10^{-5} | 0.0921 | 1.220×10^{-5} | 3.9471
40001 | 7.294×10^{-5} | 0.1782 | 3.050×10^{-6} | 0.1787 | 3.050×10^{-6} | 7.0408
Table 1: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=N$ for approximating $I_2(f)$.
Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 1.163×10^{-2} | 0.0415 | 1.072×10^{-3} | 0.0410 | 1.072×10^{-3} | 0.0980
501 | 6.794×10^{-3} | 0.0539 | 1.399×10^{-4} | 0.0546 | 1.399×10^{-4} | 0.3498
1001 | 3.814×10^{-3} | 0.0647 | 7.040×10^{-5} | 0.0653 | 7.040×10^{-5} | 0.5028
5001 | 1.858×10^{-3} | 0.0723 | 1.3411×10^{-5} | 0.0733 | 1.341×10^{-5} | 1.7212
10001 | 1.175×10^{-4} | 0.0965 | 6.759×10^{-6} | 0.0945 | 6.759×10^{-6} | 3.4528
40001 | 2.937×10^{-5} | 0.1386 | 1.689×10^{-6} | 0.1399 | 1.689×10^{-6} | 6.1104
Table 2: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=N$ for approximating $I_2(\widehat{f})$.

Test 2. Let $\Omega=[0,1]^3$ and consider the following 3-d integrands:

(5.2)  $f(x):=\frac{\exp(x_1+x_2+x_3)}{(e-1)^3};\qquad \widehat{f}(x):=\sin\bigl(2\pi+x_1^2+x_2^2+x_3^2\bigr).$

Tables 3 and 4 present the simulation results (errors and CPU times) of the SLR, Imp-LR, and MDI-LR methods for computing $I_3(f)$ and $I_3(\widehat{f})$ in Test 2. We observe that the SLR method requires less CPU time in both simulations; the advantage of the MDI-LR method in accelerating the computation does not materialize in low dimensions, as seen in Test 1. Once again, the Imp-LR and MDI-LR methods have higher accuracy compared to the SLR method because they use additional sampling points on the boundary of the transformed domain.

Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 3.426×10^{-3} | 0.0574 | 4.985×10^{-3} | 0.0588 | 4.985×10^{-3} | 0.0877
1001 | 6.276×10^{-3} | 0.0634 | 1.249×10^{-3} | 0.0654 | 1.249×10^{-3} | 0.2684
10001 | 9.920×10^{-4} | 0.0833 | 3.124×10^{-4} | 0.0877 | 3.124×10^{-4} | 0.6322
100001 | 5.717×10^{-4} | 0.1500 | 5.907×10^{-5} | 0.1499 | 5.907×10^{-5} | 2.5866
1000001 | 1.369×10^{-5} | 1.0589 | 1.249×10^{-5} | 1.0587 | 1.249×10^{-5} | 14.737
10000001 | 8.441×10^{-6} | 9.8969 | 3.124×10^{-6} | 10.280 | 3.124×10^{-6} | 91.897
Table 3: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=n_3=N$ for computing $I_3(f)$.
Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 1.866×10^{-2} | 0.0580 | 1.008×10^{-3} | 0.0554 | 1.008×10^{-3} | 0.1366
1001 | 9.746×10^{-3} | 0.0628 | 2.739×10^{-4} | 0.0649 | 2.739×10^{-4} | 0.3804
10001 | 1.001×10^{-3} | 0.0820 | 6.337×10^{-5} | 0.0828 | 6.337×10^{-5} | 1.1032
100001 | 7.063×10^{-4} | 0.1443 | 1.326×10^{-5} | 0.1557 | 1.326×10^{-5} | 4.8794
1000001 | 2.211×10^{-5} | 1.1163 | 2.810×10^{-6} | 1.2104 | 2.810×10^{-6} | 20.305
10000001 | 1.650×10^{-5} | 10.207 | 7.026×10^{-7} | 10.427 | 7.026×10^{-7} | 101.22
Table 4: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=n_3=N$ for computing $I_3(\widehat{f})$.

5.2 High-dimensional tests

Since the MDI-LR method is designed for computing high-dimensional integrals, its performance for $d\gg 1$ is more important and anticipated, and examining it is indeed the main task of this subsection. First, we test and compare the performance (in terms of CPU time) of the SLR, Imp-LR, and MDI-LR methods for computing high-dimensional integrals as the number of lattice points grows with the increasing dimension. Then we also test the performance of the SLR and MDI-LR methods for computing high-dimensional integrals when the number of lattice points increases slowly with the dimension $d$.

Test 3. Let $\Omega=[0,1]^d$ for $2\leq d\leq 50$ and consider the following Gaussian integrand:

(5.3)  $f(x)=\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{1}{2}|x|^2\Bigr),$

where $|x|$ stands for the Euclidean norm of the vector $x\in\mathbb{R}^d$.

Table 5 shows the relative errors and CPU times of the SLR, Imp-LR, and MDI-LR methods for approximating the Gaussian integral $I_d(f)$. The simulation results indicate that the SLR and Imp-LR methods are more efficient when $d<7$, but they struggle to compute the integrals when $d>11$ as the number of lattice points increases exponentially with the dimension. However, this is not a problem for the MDI-LR method, which can compute this high-dimensional integral easily. Moreover, the MDI-LR method improves the accuracy of the original QMC rule significantly by adding some integration points on the boundary of the transformed domain.

| Dimension (d) | SLR relative error | SLR CPU time (s) | Imp-LR relative error | Imp-LR CPU time (s) | MDI-LR relative error | MDI-LR CPU time (s) |
| 2 | 4.802×10^{-3} | 0.0622 | 5.398×10^{-4} | 0.0637 | 5.398×10^{-4} | 0.1335 |
| 4 | 3.796×10^{-3} | 0.1068 | 1.131×10^{-3} | 0.1206 | 1.131×10^{-3} | 0.5780 |
| 6 | 7.780×10^{-3} | 1.2450 | 1.723×10^{-3} | 1.2745 | 1.723×10^{-3} | 1.2890 |
| 8 | 1.189×10^{-2} | 124.91 | 2.315×10^{-3} | 126.85 | 2.315×10^{-3} | 1.4083 |
| 10 | 1.602×10^{-2} | 13084 | 2.908×10^{-3} | 13255 | 2.908×10^{-3} | 3.1418 |
| 11 | 1.809×10^{-2} | 132927 | 3.204×10^{-3} | 141665 | 3.204×10^{-3} | 3.8265 |
| 12 | failed | failed | failed | failed | 3.501×10^{-3} | 4.5919 |
Table 5: Relative errors and CPU times of the SLR, Imp-LR, and MDI-LR simulations with N=[n^{\frac{1}{d}}], a=N, and n_{1}=\cdots=n_{d}=N for computing I_{d}(f); the SLR uses 1+10^{d} nodes, while Imp-LR and MDI-LR use 1.1×10^{d} nodes.
| Dimension (d) | Total nodes (n) | a value | SLR relative error | SLR CPU time (s) | MDI-LR relative error | MDI-LR CPU time (s) |
| 2 | 1+10^{3} | 31 | 4.8020×10^{-4} | 0.0369 | 6.1474×10^{-5} | 0.432905 |
| 6 | 1+10^{6} | 10 | 7.7798×10^{-3} | 1.2450 | 1.7745×10^{-3} | 0.790102 |
| 10 | 1+10^{6} | 4 | 5.3673×10^{-2} | 1.2453 | 1.8683×10^{-2} | 0.582487 |
| 14 | 1+10^{8} | 4 | 7.9282×10^{-2} | 144.759 | 2.6253×10^{-2} | 0.536131 |
| 18 | 1+10^{9} | 3 | 1.5827×10^{-1} | 1649.59 | 6.1158×10^{-2} | 0.774606 |
| 22 | 1+10^{10} | 3 | 2.0007×10^{-1} | 18694.04 | 7.5249×10^{-2} | 0.702708 |
| 26 | 1+10^{11} | 3 | 2.4341×10^{-1} | 217381.41 | 8.9527×10^{-2} | 0.866122 |
| 30 | 1+10^{11} | 3 | 2.9009×10^{-1} | 269850.87 | 1.0399×10^{-1} | 1.045107 |
Table 6: Relative errors and CPU times of the SLR and MDI-LR simulations with the same number of integration points for computing I_{d}(f).

Table 6 shows the relative errors and CPU times of the SLR and MDI-LR methods for computing I_{d}(f) when the number of lattice points increases slowly with the dimension d. As the dimension increases, the CPU time required by the SLR method grows sharply (see Figure 6). When approximating the Gaussian integral in about 30 dimensions with 10^{11} lattice points, the SLR method requires about 74 hours to obtain a result of relatively low accuracy. In contrast, the MDI-LR method takes only about one second to obtain a more accurate value, which demonstrates that the acceleration effect of the MDI-LR method is quite dramatic.


Figure 6: CPU time comparison of the SLR and MDI-LR simulations: the number of lattice points grows exponentially with the dimension (left); the number of lattice points increases slowly with the dimension (right).

It is well known that obtaining high-accuracy approximations in high dimensions is difficult because the number of integration points required is enormous. A natural question is whether the MDI-LR method can handle very high-dimensional (i.e., d\approx 1000) integration with reasonable accuracy. First, we note that the answer is machine dependent, as expected. Next, we present a test on the computer at our disposal that provides a positive answer to this question.

Test 4. Let \Omega=[0,1]^{d} and consider the following integrands:

(5.4) f(x)=\exp\Bigl{(}\sum_{i=1}^{d}(-1)^{i+1}x_{i}\Bigr{)},\qquad\widehat{f}(x)=\prod_{i=1}^{d}\frac{1}{0.9^{2}+(x_{i}-0.6)^{2}}.

We use the algorithm MDI-LR to compute I_{d}(f) and I_{d}(\widehat{f}) with parameters a=8 and a=20, respectively, and an increasing sequence of d. The computed results are presented in Table 7. The simulation is stopped at d=1000 because it is already in the very high dimension regime. These tests demonstrate the efficacy and potential of the MDI-LR method for efficiently computing high-dimensional integrals. However, we note that in terms of efficiency and accuracy, the MDI-LR method underperforms its two companion methods, the MDI-TP [11] and MDI-SG [12] methods. The main reason for the underperformance is that the original lattice rule is unable to provide high-accuracy integral approximations, and MDI-LR is only a fast implementation algorithm (i.e., a solver) for the original lattice rule. Nevertheless, the lattice rule has its own advantages, such as allowing flexible numbers of integration points and giving better results for periodic integrands.

| Dimension (d) | a value | I_{d}(f) relative error (1×8^{d} nodes) | CPU time (s) | a value | I_{d}(\widehat{f}) relative error (1×20^{d} nodes) | CPU time (s) |
| 10 | 8 | 6.4884×10^{-3} | 0.4329063 | 20 | 1.6107×10^{-3} | 0.9851172 |
| 100 | 8 | 6.3022×10^{-2} | 71.253076 | 20 | 1.6225×10^{-2} | 11.1203255 |
| 300 | 8 | 1.7740×10^{-1} | 1856.91018 | 20 | 4.9469×10^{-2} | 37.0903112 |
| 500 | 8 | 2.7781×10^{-1} | 8076.92429 | 20 | 8.3801×10^{-2} | 65.9497657 |
| 700 | 8 | 3.6597×10^{-1} | 20969.96162 | 20 | 1.1925×10^{-1} | 108.989057 |
| 900 | 8 | 4.4337×10^{-1} | 47870.50843 | 20 | 1.5587×10^{-1} | 157.487672 |
| 1000 | 8 | 4.7845×10^{-1} | 69991.88017 | 20 | 1.7462×10^{-1} | 189.132615 |
Table 7: Computed results for I_{d}(f) and I_{d}(\widehat{f}) by the algorithm MDI-LR.

6 Influence of parameters

The original MDI algorithm involves three crucial input parameters: r, m, and N. The parameter r determines the base one-dimensional quadrature rule, m sets the step size of the dimension iteration, and N is the number of integration points in each coordinate direction. The algorithm MDI-LR is similar to the original MDI but uses the QMC rank-one lattice rule with generating vector \mathbf{z}, so the parameter r is not needed. Here we focus on the Korobov approach for constructing the generating vector, namely \mathbf{z}=\mathbf{z}(a):=(1,a,a^{2},\cdots,a^{d-1}). Moreover, the improved tensor product rule (in the transformed coordinate system) implemented by the algorithm Imp-LR has variable upper limits in the summation (cf. (4.3)); hence, N is now replaced by \mathbf{N}_{d}, which is determined by the underlying QMC lattice rule. Furthermore, as explained earlier, we set m=1 based on our experience in [11]. As a result, the only parameter left to select is a. Below, we first test the influence of the Korobov parameter a on the efficiency of the algorithm MDI-LR and then test the dependence of its performance on \mathbf{N}_{d} and d.

6.1 Influence of parameter a

In this subsection, we investigate the impact of the Korobov generating vector \mathbf{z}=\mathbf{z}(a):=(1,a,a^{2},\cdots,a^{d-1}) on the algorithm MDI-LR. We note that similar methods can be constructed using other choices of \mathbf{z}.

Test 5. Let \Omega=[0,1]^{d} and consider the following integrands:

f(x)=\frac{1}{\sqrt{2\pi}}\exp\Bigl{(}-\frac{1}{2}|x|^{2}\Bigr{)},\qquad\widehat{f}(x)=\cos\Bigl{(}2\pi+2\sum_{i=1}^{d}x_{i}\Bigr{)},
\widetilde{f}(x)=\prod_{i=1}^{d}\frac{1}{0.9^{2}+(x_{i}-0.6)^{2}}.

We compare the performance of the algorithm MDI-LR with different Korobov parameters a, while holding the other parameters unchanged, when computing I_{d}(f), I_{d}(\widehat{f}), and I_{d}(\widetilde{f}).


Figure 7: Performance comparison of the algorithm MDI-LR with n=1+10^{d} and a=4,6,8,10,12,14,16 for computing I_{d}(f), I_{d}(\widehat{f}), and I_{d}(\widetilde{f}). Top left: d=5, CPU time comparison. Top right: d=10, CPU time comparison. Bottom left: d=5, comparison of relative errors. Bottom right: d=10, comparison of relative errors.

Figure 7 shows the computed results for d=5,10 and a=4,6,8,10,12,14,16, respectively. We observe that the algorithm MDI-LR with different parameters a has different accuracy, and the effect can be significant. These results indicate that the algorithm is most efficient when a=N, where N=[n^{\frac{1}{d}}] and n is the total number of integration points. When a smaller a is used, fewer integration points need to be evaluated in each coordinate direction in the first d-1 dimension iterations, but since the total number of integration points n is the same, the amount of computation increases dramatically. When a larger a is used, more integration points must be evaluated in each coordinate direction in the first d-1 dimension iterations. Only when the integration points are distributed equally among the coordinate directions is the efficiency of the algorithm MDI-LR optimized. The distribution of 100 points is shown in Figure 8. When a=2, only 2 iterations in the x_{1}-direction are needed, but 50 iterations in the x_{2}-direction must be performed; hence, a total of 52 iterations in the two directions is required. On the other hand, when a=20, a total of 25 iterations in the two directions is required. It is easy to check that the smallest total of 20 iterations occurs when a=10. The difference in accuracy is also expected, because different a lead to different generating vectors \mathbf{z}, which in turn result in different integration points. We note that how to choose a to achieve the highest accuracy has been well studied in the literature (cf. [5, 23]).
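In formula form, distributing the n integration points as an a\times(n/a) grid over two (transformed) coordinate directions gives the two-direction iteration count T(a)=a+n/a; for the n=100 points of Figure 8,

T(2)=2+\frac{100}{2}=52,\qquad T(20)=20+\frac{100}{20}=25,\qquad T(10)=10+\frac{100}{10}=20,

and by the AM-GM inequality T(a)=a+\frac{n}{a}\geq 2\sqrt{n}=20, with equality exactly at a=\sqrt{n}=10. This is the arithmetic behind the recommendation a=N=[n^{\frac{1}{d}}].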


Figure 8: Distribution of 100 integration points in the transformed coordinate system for a=2, 10, and 20, respectively.

6.2 Influence of parameter N=[n^{\frac{1}{d}}]

The previous subsection showed that the algorithm is most efficient when a=N, where N is the number of integration points in each direction. This subsection investigates the impact of N on the MDI-LR algorithm. For this purpose, we conduct tests with a=N and d=5, 10.

Test 6. Let \Omega, f, \widehat{f}, and \widetilde{f} be the same as in Test 5.

Tables 8, 9, and 10 present a performance comparison of the algorithm MDI-LR with d=5, 10 and N=4,6,8,10,12,14,16, respectively. We note that the quality of the computed results also depends on the type of integrand. As expected, more integration points must be used to achieve good accuracy for highly oscillatory or fast-growing integrands.

| N (n) | Korobov parameter (a) | d=5 relative error | d=5 CPU time (s) | d=10 relative error | d=10 CPU time (s) |
| 4 (1+4^{d}) | 4 | 8.6248×10^{-3} | 0.1456465 | 1.8003×10^{-2} | 0.3336161 |
| 6 (1+6^{d}) | 6 | 3.8967×10^{-3} | 0.1911801 | 8.0284×10^{-3} | 0.5690320 |
| 8 (1+8^{d}) | 8 | 2.2145×10^{-3} | 0.3373442 | 4.5314×10^{-3} | 0.9552591 |
| 10 (1+10^{d}) | 10 | 1.4271×10^{-3} | 0.3884146 | 2.9078×10^{-3} | 1.9385378 |
| 12 (1+12^{d}) | 12 | 9.9601×10^{-4} | 0.6545521 | 2.0234×10^{-3} | 3.5639475 |
| 14 (1+14^{d}) | 14 | 7.3448×10^{-4} | 0.7224777 | 1.4889×10^{-3} | 6.0036393 |
| 16 (1+16^{d}) | 16 | 5.6396×10^{-4} | 1.0909097 | 1.1414×10^{-3} | 8.4313528 |
Table 8: Performance comparison of the algorithm MDI-LR with d=5,10, a=N, and N=4,6,8,10,12,14,16 for computing I_{d}(f).
| N (n) | Korobov parameter (a) | d=5 relative error | d=5 CPU time (s) | d=10 relative error | d=10 CPU time (s) |
| 4 (1+4^{d}) | 4 | 4.9621×10^{-2} | 0.1323887 | 1.0585×10^{-1} | 0.2775234 |
| 6 (1+6^{d}) | 6 | 2.2181×10^{-2} | 0.1955847 | 4.6141×10^{-2} | 0.4267706 |
| 8 (1+8^{d}) | 8 | 1.2558×10^{-2} | 0.2689113 | 2.5836×10^{-2} | 0.5697773 |
| 10 (1+10^{d}) | 10 | 8.0791×10^{-3} | 0.3227299 | 1.6517×10^{-2} | 0.7828456 |
| 12 (1+12^{d}) | 12 | 5.6328×10^{-3} | 0.4056192 | 1.1470×10^{-2} | 0.9228344 |
| 14 (1+14^{d}) | 14 | 4.1513×10^{-3} | 0.4940739 | 8.4305×10^{-3} | 1.0968489 |
| 16 (1+16^{d}) | 16 | 3.1863×10^{-3} | 0.6079693 | 6.4576×10^{-3} | 1.2933549 |
Table 9: Performance comparison of the algorithm MDI-LR with d=5,10, a=N, and N=4,6,8,10,12,14,16 for computing I_{d}(\widehat{f}).
| N (n) | Korobov parameter (a) | d=5 relative error | d=5 CPU time (s) | d=10 relative error | d=10 CPU time (s) |
| 4 (1+4^{d}) | 4 | 1.9003×10^{-2} | 0.1254485 | 3.9895×10^{-2} | 0.2460844 |
| 6 (1+6^{d}) | 6 | 8.5331×10^{-3} | 0.1802281 | 1.7625×10^{-2} | 0.3613987 |
| 8 (1+8^{d}) | 8 | 4.8390×10^{-3} | 0.2114595 | 9.9155×10^{-3} | 0.4414383 |
| 10 (1+10^{d}) | 10 | 3.1153×10^{-3} | 0.2748469 | 6.3531×10^{-3} | 0.4892808 |
| 12 (1+12^{d}) | 12 | 2.1729×10^{-3} | 0.3092816 | 4.4172×10^{-3} | 0.5859328 |
| 14 (1+14^{d}) | 14 | 1.6018×10^{-3} | 0.3602077 | 3.2488×10^{-3} | 0.6783681 |
| 16 (1+16^{d}) | 16 | 1.2296×10^{-3} | 0.4157161 | 2.4897×10^{-3} | 0.7819849 |
Table 10: Performance comparison of the algorithm MDI-LR with d=5,10, a=N, and N=4,6,8,10,12,14,16 for computing I_{d}(\widetilde{f}).

7 Computational complexity

7.1 The relationship between the CPU time and N

In this subsection, we examine the relationship between the CPU time and the parameter N=[n^{\frac{1}{d}}] (with a=N) using a regression technique based on the test data.

| Integrand | a | m | d | Fitting function | R-square |
| f(x) | N | 1 | 5 | h_{1}(N)=(0.007569)*N^{1.772} | 0.9687 |
| \widehat{f}(x) | N | 1 | 5 | h_{2}(N)=(0.02326)*N^{1.165} | 0.9920 |
| \widetilde{f}(x) | N | 1 | 5 | h_{3}(N)=(0.03592)*N^{0.8767} | 0.9946 |
| f(x) | N | 1 | 10 | h_{4}(N)=(0.002136)*N^{2.992} | 0.9968 |
| \widehat{f}(x) | N | 1 | 10 | h_{5}(N)=(0.05679)*N^{1.125} | 0.9984 |
| \widetilde{f}(x) | N | 1 | 10 | h_{6}(N)=(0.07872)*N^{0.8184} | 0.9901 |
Table 11: Relationship between the CPU time and the parameter N.
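The fits in Table 11 were obtained in Matlab; the following Python sketch shows one way to reproduce such a power-law fit h(N)=c\,N^{p} by linear least squares in log-log coordinates, together with the R-square measure defined in Section 7.2. This is a hypothetical reimplementation, not the authors' script, and a log-log fit may return slightly different constants than Matlab's nonlinear fitting.

```python
import numpy as np

def fit_power_law(N, T):
    # Fit T ~ c * N**p via linear least squares on log(T) = log(c) + p*log(N)
    p, log_c = np.polyfit(np.log(N), np.log(T), 1)
    return np.exp(log_c), p

def r_square(y, y_hat):
    # R-square = 1 - SS_res/SS_tot (cf. Section 7.2)
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Example with the d = 5 data for I_d(f) from Table 8: N versus CPU time (s)
N = np.array([4, 6, 8, 10, 12, 14, 16], dtype=float)
T = np.array([0.1456, 0.1912, 0.3373, 0.3884, 0.6546, 0.7225, 1.0909])
c, p = fit_power_law(N, T)
print(f"CPU time ~ {c:.4g} * N^{p:.3g}, R-square = {r_square(T, c * N**p):.4f}")
```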

Figures 9 and 10 show the CPU time as a function of N obtained by least squares regression, with the fitted functions given in Table 11. All results show that the CPU time grows no faster than N^{3}.


Figure 9: The relationship between the CPU time and the parameter N when d=5: I_{d}(f) (left), I_{d}(\widehat{f}) (middle), I_{d}(\widetilde{f}) (right).


Figure 10: The relationship between the CPU time and the parameter N when d=10: I_{d}(f) (left), I_{d}(\widehat{f}) (middle), I_{d}(\widetilde{f}) (right).

7.2 The relationship between the CPU time and the dimension d

In this subsection, we examine the computational complexity (in terms of the CPU time as a function of d) using least squares regression on numerical test data.

Test 7. Let \Omega=[0,1]^{d} and consider the following six integrands:

f_{1}(x)=\exp\Bigl{(}\sum_{i=1}^{d}(-1)^{i+1}x_{i}\Bigr{)},\qquad f_{2}(x)=\prod_{i=1}^{d}\frac{1}{0.9^{2}+(x_{i}-0.6)^{2}},
f_{3}(x)=\frac{1}{\sqrt{2\pi}}\exp\Bigl{(}-\frac{1}{2}|x|^{2}\Bigr{)},\qquad f_{4}(x)=\cos\Bigl{(}2\pi+\sum_{i=1}^{d}2x_{i}\Bigr{)},
f_{5}(x)=\exp\Bigl{(}\sum_{i=1}^{d}(-1)^{i+1}x_{i}^{2}\Bigr{)},\qquad f_{6}(x)=\Bigl{(}1+\sum_{i=1}^{d}x_{i}\Bigr{)}^{-(d+1)}.

Figure 11 displays the CPU time as a function of d obtained by least squares regression; the analytical expressions of the fitted functions are given in Table 12. We note that the parameters of the algorithm MDI-LR only affect the coefficients of the fitted functions, not the powers of the polynomials. These results show that the CPU time required by the proposed algorithm MDI-LR grows at most with polynomial order O(d^{3}N^{2}).

| Integrand | a | N | m | Fitting function | R-square |
| f_{1} | 8 | 8 | 1 | g_{1}=(1.057e-06)*N^{2}d^{3} | 0.9973 |
| f_{1} | 10 | 10 | 1 | g_{2}=(1.192e-06)*N^{2}d^{3} | 0.9995 |
| f_{1} | 20 | 20 | 1 | g_{3}=(1.433e-06)*N^{2}d^{3} | 0.9978 |
| f_{2} | 10 | 10 | 1 | g_{4}=0.0001774*N*d^{1.611} | 0.9983 |
| f_{2} | 14 | 14 | 1 | g_{5}=0.003028*N*d^{1.147} | 0.9987 |
| f_{2} | 20 | 20 | 1 | g_{6}=0.000539*N*d^{1.41} | 0.9964 |
| f_{3} | 8 | 8 | 1 | g_{7}=(7.334e-06)*N^{2}d^{3} | 0.9983 |
| f_{3} | 10 | 10 | 1 | g_{8}=(9.321e-06)*N^{2}d^{3} | 0.9986 |
| f_{3} | 14 | 14 | 1 | g_{9}=(1.339e-05)*N^{2}d^{3} | 0.9972 |
| f_{4} | 10 | 10 | 1 | g_{10}=(1.164e-06)*N^{2}d^{3} | 0.9988 |
| f_{4} | 20 | 20 | 1 | g_{11}=(1.319e-06)*N^{2}d^{3} | 0.9974 |
| f_{5} | 10 | 10 | 1 | g_{12}=(6.479e-05)*N^{2}d^{2.557} | 0.9996 |
| f_{5} | 14 | 14 | 1 | g_{13}=(1.164e-05)*N^{2}d^{3} | 0.9993 |
| f_{6} | 10 | 10 | 1 | g_{14}=(1.556e-06)*N^{2}d^{3} | 0.9983 |
| f_{6} | 20 | 20 | 1 | g_{15}=(8.328e-06)*N^{2}d^{2.431} | 0.9998 |
Table 12: The relationship between the CPU time and the dimension d.

We assess the quality of the fitted curves using the R-square criterion in Matlab, defined by R\text{-square}=1-\frac{\sum_{i=1}^{n}(y_{i}-\widehat{y}_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}}, where y_{i} is a test data output, \widehat{y}_{i} is the predicted value, and \overline{y} is the mean of the y_{i}. As shown in Table 12, the R-square values of all fitted functions are close to 1, indicating their high accuracy. These results support the observation that the CPU time grows no more than cubically with the dimension d. Combined with the results of Section 7.1, we conclude that the computational cost of the proposed MDI-LR algorithm scales at most polynomially, in the order of O(N^{2}d^{3}).


Figure 11: The relationship between the CPU time and the dimension d.

8 Conclusions

In this paper, we introduced an efficient and fast algorithm, MDI-LR, for implementing QMC lattice rules for high-dimensional numerical integration. It is based on the idea of converting and extending a lattice rule into a tensor product rule by an affine transformation, and on adopting the multilevel dimension iteration approach, which computes the function evaluations (at the integration points) in the multi-summation in clusters and iterates along each (transformed) coordinate direction so that many computations can be reused. Based on numerical simulation results, we concluded that the computational complexity of the algorithm MDI-LR (in terms of CPU time) grows at most cubically in the dimension d, with an overall growth rate of O(d^{3}N^{2}). This suggests that the proposed algorithm can effectively mitigate the curse of dimensionality in high-dimensional numerical integration, making the QMC lattice rule not only competitive but also practically useful in high dimensions. Extensive numerical tests were provided to gauge the performance of the algorithm MDI-LR and to compare it with the standard implementation of QMC lattice rules. Extensions to general Monte Carlo methods and applications of the MDI-LR algorithm to solving high-dimensional PDEs will be explored and reported in a forthcoming work.

References

  • [1] N. M. Korobov, The approximate computation of multiple integrals, Dokl. Akad. Nauk SSSR, 124:1207–1210, 1959 (in Russian).
  • [2] I. H. Sloan and S. Joe, Lattice Methods for Multiple Integration, Oxford University Press, 1994.
  • [3] J. Dick, F. Y. Kuo, and I. H. Sloan, High-dimensional integration: the quasi-Monte Carlo way, Acta Numer., 22:133–288, 2013.
  • [4] X. Wang, I. H. Sloan, and J. Dick, On Korobov lattice rules in weighted spaces, SIAM J. Numer. Anal., 42(4):1760–1779, 2004.
  • [5] N. M. Korobov, Properties and calculation of optimal coefficients, Dokl. Akad. Nauk SSSR, 132(5):1009–1012, 1960.
  • [6] H. Niederreiter and A. Winterhof, Applied Number Theory, Springer, Cham, 2015.
  • [7] H. Niederreiter, Quasi-Monte Carlo methods and pseudo-random numbers, Bull. Amer. Math. Soc., 84:957–1041, 1978.
  • [8] H. Niederreiter, Existence of good lattice points in the sense of Hlawka, Monatsh. Math., 86(3):203–219, 1978.
  • [9] I. H. Sloan and A. Reztsov, Component-by-component construction of good lattice rules, Math. Comp., 71(237):263–273, 2002.
  • [10] P. Kritzer, H. Niederreiter, and F. Pillichshammer, Ian Sloan and lattice rules, in Contemporary Computational Mathematics - A Celebration of the 80th Birthday of Ian Sloan, pages 741–769, 2018.
  • [11] X. Feng and H. Zhong, A fast multilevel dimension iteration algorithm for high dimensional numerical integration, preprint.
  • [12] H. Zhong and X. Feng, An efficient and fast sparse grid algorithm for high dimensional numerical integration, preprint.
  • [13] P. Bratley, B. Fox, and H. Niederreiter, Implementation and tests of low discrepancy sequences, ACM Trans. Model. Comput. Simul., 2(3):195–213, 1992.
  • [14] H. Faure, Good permutations for extreme discrepancy, J. Number Theory, 42:47–56, 1992.
  • [15] I. Sobol, Uniformly distributed sequences with an additional uniform property, USSR Comput. Math. Math. Phys., 16:236–242, 1977.
  • [16] H. Niederreiter, Low-discrepancy and low-dispersion sequences, J. Number Theory, 30:51–70, 1988.
  • [17] H.-J. Bungartz and M. Griebel, Sparse grids, Acta Numer., 13:147–269, 2004.
  • [18] T. Gerstner and M. Griebel, Numerical integration using sparse grids, Numer. Algorithms, 18(3):209–232, 1998.
  • [19] R. E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numer., 7:1–49, 1998.
  • [20] Y. Ogata, A Monte Carlo method for high dimensional integration, Numer. Math., 55:137–157, 1989.
  • [21] F. Y. Kuo, C. Schwab, and I. H. Sloan, Quasi-Monte Carlo methods for high-dimensional integration: the standard (weighted Hilbert space) setting and beyond, ANZIAM J., 53:1–37, 2011.
  • [22] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, 1992.
  • [23] S. Haber, Experiments on optimal coefficients, in Applications of Number Theory to Numerical Analysis, Academic Press, pages 11–37, 1972.
  • [24] A. R. Krommer and C. W. Ueberhuber, Computational Integration, SIAM, 1998.