

An efficient implementation algorithm for quasi-Monte Carlo approximations of high-dimensional integrals

Huicong Zhong, School of Mathematics and Statistics, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China ([email protected]).    Xiaobing Feng, Department of Mathematics, The University of Tennessee, Knoxville, TN 37996 ([email protected]).
Abstract

In this paper, we develop and test a fast numerical algorithm, called MDI-LR, for efficient implementation of quasi-Monte Carlo lattice rules for computing $d$-dimensional integrals of a given function. It is based on the idea of converting and improving the underlying lattice rule into a tensor product rule by an affine transformation, and of adopting the multilevel dimension iteration approach, which computes the function evaluations (at the integration points) in the tensor product multi-summation in clusters and iterates along each (transformed) coordinate direction so that many computations can be reused. The proposed algorithm also eliminates the need for storing integration points and computing function values independently at each point. Extensive numerical experiments are presented to gauge the performance of the MDI-LR algorithm and to compare it with the standard implementation of quasi-Monte Carlo lattice rules. It is also shown numerically that the MDI-LR algorithm can achieve a computational complexity of order $O(N^2d^3)$ or better, where $N$ represents the number of points in each (transformed) coordinate direction and $d$ stands for the dimension. Thus, the MDI-LR algorithm effectively overcomes the curse of dimensionality and revitalizes QMC lattice rules for high-dimensional integration.

keywords: Lattice rule (LR), multilevel dimension iteration (MDI), Monte Carlo methods, quasi-Monte Carlo (QMC) method, high-dimensional integration.
AMS subject classifications: 65D30, 65D40, 65C05, 65N99

1 Introduction

Numerical integration is an essential tool and building block in many scientific and engineering fields that require evaluating or estimating integrals of given (explicitly or implicitly defined) functions; this becomes very challenging in high dimensions due to the so-called curse of dimensionality (CoD). Such integrals arise in evaluating quantities of stochastic interest, solving high-dimensional partial differential equations, and computing value functions of options on a basket of securities. The goal of this paper is to develop and test an efficient algorithm based on quasi-Monte Carlo methods for evaluating the $d$-dimensional integral

(1.1)  $I_d(f) := \int_{\Omega} f(\mathbf{x})\, d\mathbf{x}$

for a given function $f:\Omega:=[0,1]^d\to\mathbb{R}$ and $d\gg 1$.

Classical numerical integration methods, such as tensor product and sparse grid methods [17, 18], as well as Monte Carlo (MC) methods [19, 20], require the evaluation of a function at a set of integration points. The computational complexity of the first two types of methods grows exponentially with the dimension $d$ of the problem (i.e., the CoD), which limits their practical usage. MC methods are often the default methods for high-dimensional integration problems due to their ability to handle complicated functions and to mitigate the CoD. We recall that the MC method approximates the integral by randomly sampling points within the integration domain and averaging their function values. The classical MC method has the form

(1.2)  $Q_{n,d}(f)=\frac{1}{n}\sum_{i=0}^{n-1}f(\mathbf{x}_{i}),$

where $\{\mathbf{x}_i\}_{i=0}^{n-1}$ denotes independent and uniformly distributed random samples in the integration domain $\Omega$. The expected error of the MC method is proportional to $\frac{\sigma(f)}{\sqrt{n}}$, where $\sigma(f)^2$ stands for the variance of $f$. If $f$ is square-integrable, then the expected error of (1.2) has the order $O(n^{-\frac{1}{2}})$ (note that the convergence rate is independent of the dimension $d$). Evidently, the MC method is simple and easy to implement, making it a popular choice for many applications. However, the MC method is slow to converge, especially for high-dimensional problems, and the accuracy of the approximation depends on the number of random samples. One way to improve the convergence rate of the MC method is to use quasi-Monte Carlo methods.
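For concreteness, the following is a minimal sketch of the plain MC estimator (1.2); the test integrand, sample size, and random seed are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of the plain MC estimator (1.2), assuming the
# integrand f takes an array of shape (n, d) and returns n values.
def mc_integrate(f, d, n, rng=np.random.default_rng(0)):
    x = rng.random((n, d))          # i.i.d. uniform samples in [0,1]^d
    return f(x).mean()              # equal-weight average, error O(n^{-1/2})

# Example: f(x) = exp(sum x_j), whose exact integral over [0,1]^d is (e-1)^d.
f = lambda x: np.exp(x.sum(axis=1))
print(mc_integrate(f, d=5, n=100_000), (np.e - 1) ** 5)
```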

Quasi-Monte Carlo (QMC) methods [3, 21] employ integration techniques that use point sets with better distribution properties than random sampling. Similar to the MC method, the QMC method also has the general form (1.2), but unlike the MC method, the integration points $\{\mathbf{x}_i\}_{i=0}^{n-1}\subset\Omega$ are chosen deterministically and methodically. The deterministic nature of the QMC method can lead to guaranteed error bounds, and the convergence rate can be faster than the $O(n^{-\frac{1}{2}})$ order of the MC method for sufficiently smooth functions. QMC error bounds are typically given in the form of Koksma-Hlawka-type inequalities as follows:

(1.3)  $|I_d(f)-Q_{n,d}(f)|\leq D(\mathbf{x}_0,\mathbf{x}_1,\cdots,\mathbf{x}_{n-1})\,V(f),$

where $D(\mathbf{x}_0,\mathbf{x}_1,\cdots,\mathbf{x}_{n-1})$ is a (positive) discrepancy function which measures the non-uniformity of the point set $\{\mathbf{x}_i\}_{i=0}^{n-1}$ and $V(f)$ is a (positive) functional which measures the variability of $f$. Error bounds of this type separate the dependence on the cubature points from the dependence on the integrand. QMC point sets with discrepancy of order $O(n^{-1}(\log n)^d)$ or better are collectively known as low-discrepancy point sets [22].

One of the most popular QMC methods is the lattice rule, whose integration points are chosen to have a lattice structure, low discrepancy, and better distribution properties than random sampling [1, 2, 4], hence resulting in a more accurate method with a faster convergence rate. However, traditional lattice rules still have limitations when applied to high-dimensional problems. Good lattice rules almost always involve searching (cf. [23, 24]), and the cost of an exhaustive search (for fixed $n$) grows exponentially with the dimension $d$. Moreover, like the MC method, the number of integration points required to achieve a reasonable accuracy also increases exponentially with the dimension $d$ (i.e., the CoD phenomenon), which makes the method computationally infeasible for very high-dimensional integration problems.

To overcome the limitations of QMC lattice rules, we first propose an improved QMC lattice rule based on a change of variables and reformulate it as a tensor product rule in the transformed coordinates. We then develop an efficient implementation algorithm, called MDI-LR, by adapting the multilevel dimension iteration (MDI) idea first proposed by the authors in [11] to the improved QMC lattice rule. The proposed MDI-LR algorithm optimizes the function evaluations at integration points by clustering them and sharing computations via a symbolic-function-based dimension/coordinate iteration procedure. This algorithm significantly reduces the computational complexity of the QMC lattice rule from exponential growth in the dimension $d$ to a polynomial order $O(N^2d^3)$, where $N$ denotes the number of integration points in each (transformed) coordinate direction. Thus, the MDI-LR algorithm effectively overcomes the CoD and revitalizes QMC lattice rules for high-dimensional integration.

The remainder of this paper is organized as follows. In Section 2, we briefly review the rank-one lattice rule and its properties. In Section 3, we introduce a reformulation of this lattice rule and propose a tensor product generalization based on an affine transformation. In Section 4, we introduce our MDI-LR algorithm for efficiently implementing the proposed lattice rule based on a multilevel dimension iteration idea. In Section 5, we present extensive numerical experiments to test the performance of the proposed MDI-LR algorithm and compare it with the original lattice rule and the improved lattice rule with standard implementation. The numerical experiments show that the MDI-LR algorithm is much faster and more efficient in medium- and high-dimensional cases. In Section 6, we numerically examine the impact of the parameters appearing in the MDI-LR algorithm, including the choice of the generating vector $\mathbf{z}$ for the lattice rule. In Section 7, we present a detailed numerical study of the computational complexity of the MDI-LR algorithm. This is done by using regression techniques to discover the relationship between CPU time and dimension $d$. Finally, the paper is concluded with a summary given in Section 8.

2 Preliminaries

In this section, we briefly recall some basic materials about quasi-Monte Carlo (QMC) lattice rules for evaluating the integral (1.1) and their properties; they will set the stage for introducing our fast implementation algorithm in the later sections.

2.1 Quasi-Monte Carlo lattice rules

Lattice rules are a class of quasi-Monte Carlo (QMC) methods which were first introduced by Korobov in [1] to approximate (1.1) with a periodic integrand $f$. A lattice rule is an equal-weight cubature rule whose cubature points are those points of an integration lattice that lie in the half-open unit cube $[0,1)^d$. Every lattice point set includes the origin, and the projection of the lattice points onto each coordinate axis is equally spaced. Essentially, the integral is approximated in each coordinate direction by a rectangle rule (or a trapezoidal rule if the integrand is periodic). The simplest lattice rules are called rank-one lattice rules; they use a lattice point set generated by multiples of a single generating vector and are defined as follows.

Definition 2.1 (rank-one lattice rule).

An $n$-point rank-one lattice rule in $d$ dimensions, also known as the method of good lattice points, is a QMC method with cubature points

(2.1)  $\mathbf{x}_i=\Bigl\{\frac{i\mathbf{z}}{n}\Bigr\},\qquad i=0,1,\cdots,n-1,$

where $\mathbf{z}\in\mathbb{Z}^d$, known as the generating vector, is a $d$-dimensional integer vector having no factor in common with $n$, and the braces operator $\{\cdot\}$ takes the fractional part of the input vector.
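A minimal sketch of generating the rank-one lattice point set (2.1) is given below; the generating vector and the value of $n$ are taken from the 81-point example shown in Figure 1 below.

```python
import numpy as np

# A minimal sketch of the rank-one lattice point set (2.1): x_i = {i z / n}.
def rank_one_lattice(z, n):
    z = np.asarray(z)
    i = np.arange(n).reshape(-1, 1)
    return (i * z / n) % 1.0        # braces operator = fractional part

pts = rank_one_lattice(z=[1, 7], n=81)   # the 81-point lattice of Figure 1
print(pts[:4])
```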

Every lattice rule can be written as a multiple sum involving one or more generating vectors. The minimal number of generating vectors required to generate a lattice rule is known as the rank of the rule. Besides rank-one lattice rules, which have only one generating vector, there are also lattice rules having rank up to $d$.

The integrand $f$ is said to have an absolutely convergent Fourier series expansion if

(2.2)  $f(\mathbf{x})=\sum_{\mathbf{h}\in\mathbb{Z}^d}\hat{f}(\mathbf{h})e^{2\pi i\mathbf{h}\cdot\mathbf{x}},\qquad i=\sqrt{-1},$

where the Fourier coefficient is defined as

$\hat{f}(\mathbf{h})=\int_{\Omega}f(\mathbf{x})e^{-2\pi i\mathbf{h}\cdot\mathbf{x}}\,d\mathbf{x}.$

The following theorem gives two characterizations for the error of the lattice rules (cf. [2, Theorem 1] and [3, Theorem 5.2]).

Theorem 2.2.

Let $Q_{n,d}$ denote a lattice rule (not necessarily rank-one) and let $\mathcal{L}$ denote the associated integration lattice. If $f$ has an absolutely convergent Fourier series (2.2), then

(2.3)  $Q_{n,d}(f)-I_d(f)=\sum_{\mathbf{h}\in\mathcal{L}^{\perp}\setminus\{\mathbf{0}\}}\hat{f}(\mathbf{h}),$

where $\mathcal{L}^{\perp}:=\{\mathbf{h}\in\mathbb{Z}^d:\mathbf{h}\cdot\mathbf{x}\in\mathbb{Z}\ \ \forall\mathbf{x}\in\mathcal{L}\}$ is the dual lattice associated with $\mathcal{L}$.

Theorem 2.3.

Let $Q_{n,d}$ denote a rank-one lattice rule with generating vector $\mathbf{z}$. If $f$ has an absolutely convergent Fourier series (2.2), then

(2.4)  $Q_{n,d}(f)-I_d(f)=\sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus\{\mathbf{0}\}\\ \mathbf{h}\cdot\mathbf{z}\equiv 0\,(\mathrm{mod}\,n)}}\hat{f}(\mathbf{h}).$

It follows from (2.4) that the least upper bound of the error for the class $E_\alpha(c)$ of functions whose Fourier coefficients satisfy $|\hat{f}(\mathbf{h})|\leq\frac{c}{(\bar{h}_1\cdots\bar{h}_d)^{\alpha}}$ (where $\bar{h}:=\max(1,|h|)$ and $\alpha>1$, $c>0$, $\mathbf{h}\neq\mathbf{0}$) is given by

(2.5)  $|Q_{n,d}(f)-I_d(f)|\leq c\sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus\{\mathbf{0}\}\\ \mathbf{h}\cdot\mathbf{z}\equiv 0\,(\mathrm{mod}\,n)}}\frac{1}{(\bar{h}_1\cdots\bar{h}_d)^{\alpha}}.$

Let

(2.6)  $P_{\alpha,n,d}(\mathbf{z}):=\sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus\{\mathbf{0}\}\\ \mathbf{h}\cdot\mathbf{z}\equiv 0\,(\mathrm{mod}\,n)}}\frac{1}{(\bar{h}_1\cdots\bar{h}_d)^{\alpha}}.$

For fixed $n$ and $\alpha$, a good lattice point $\mathbf{z}$ is chosen so as to make $P_{\alpha,n,d}(\mathbf{z})$ as small as possible. It was proved by Niederreiter in [7, 8, Theorem 2.11] that for a prime $n$ (or a prime power) there exists a lattice point $\mathbf{z}$ such that

(2.7)  $P_{\alpha,n,d}(\mathbf{z})=O\Bigl(\frac{(\log n)^{\alpha d}}{n^{\alpha}}\Bigr).$

This was done by proving that $P_{\alpha,n,d}(\mathbf{z})$ has the following expansion:

(2.8)  $P_{\alpha,n,d}(\mathbf{z})=-1+\frac{1}{n}\sum_{k=0}^{n-1}\prod_{j=1}^{d}\Bigl(1+\sum_{h\in\mathbb{Z}\setminus\{0\}}\frac{e^{2\pi ikhz_j/n}}{|h|^{\alpha}}\Bigr)
=-1+\frac{1}{n}\prod_{j=1}^{d}\bigl(1+2\zeta(\alpha)\bigr)+\frac{1}{n}\sum_{k=1}^{n-1}\prod_{j=1}^{d}\Bigl(1+\frac{(-1)^{\frac{\alpha}{2}+1}(2\pi)^{\alpha}}{\alpha!}B_{\alpha}\Bigl(\Bigl\{\frac{kz_j}{n}\Bigr\}\Bigr)\Bigr),$

where

(2.9)  $\zeta(\alpha):=\sum_{j=1}^{\infty}\frac{1}{j^{\alpha}},\qquad \alpha>1;$
(2.10)  $B_{\alpha}(\lambda):=\frac{(-1)^{\frac{\alpha}{2}+1}\alpha!}{(2\pi)^{\alpha}}\sum_{h\in\mathbb{Z}\setminus\{0\}}\frac{e^{2\pi ih\lambda}}{|h|^{\alpha}},\qquad \lambda\in[0,1].$

As expected, the performance of a lattice rule depends heavily on the choice of the generating vector $\mathbf{z}$. For large $n$ and $d$, an exhaustive search to find such a generating vector by minimizing some desired error criterion is practically impossible. Below we list a few common strategies for constructing lattice generating vectors.

We end this subsection by stating some well-known error estimate results. To this end, we need to introduce some notation. The worst-case error of a QMC rule $Q_{n,d}(f)$ using the point set $P\subset[0,1]^d$ in a normed space $H$ (with norm $\|\cdot\|$) is

$E_{n,d}(P):=\sup_{\|f\|\leq 1}|I_d(f)-Q_{n,d}(f)|.$

By linearity, for any function $f\in H$, we have

$\bigl|I_d(f)-Q_{n,d}(f)\bigr|\leq E_{n,d}(P)\,\|f\|.$

For a given (shift) vector $\Delta\in[0,1]^d$, we define the shifted lattice $P+\Delta:=\{\{\mathbf{t}+\Delta\}:\mathbf{t}\in P\}$. For any QMC lattice rule $Q_{n,d}(\cdot)$ with the lattice point set $P$, let $Q^{(sh)}_{n,d}(\cdot)$ denote the corresponding shifted QMC lattice rule over the lattice $P+\Delta$. Then, for any integrand $f\in H$, it follows from the definition of the worst-case error that

$\bigl|I_d(f)-Q_{n,d}^{(sh)}(f)\bigr|\leq E_{n,d}(P+\Delta)\,\|f\|.$

Define the quantity

$E^{(sh)}_{n,d}(P):=\Bigl(\int_{[0,1]^d}E^2_{n,d}(P+\Delta)\,d\Delta\Bigr)^{\frac{1}{2}},$

which denotes the shift-averaged worst-case error. The following bound for the root-mean-square error was derived in [3, Section 5.2]:

$\Bigl(\mathbb{E}\bigl|I_d(f)-Q_{n,d}^{(sh)}(f)\bigr|^2\Bigr)^{\frac{1}{2}}\leq E^{(sh)}_{n,d}(P)\,\|f\|,$

where the expectation $\mathbb{E}$ is taken over the random shift $\Delta$, which is uniformly distributed over $[0,1]^d$. The shift-averaged worst-case error $E^{(sh)}_{n,d}(P)$ is often used as a quality measure for randomly shifted QMC rules. For any given point set $P$, an averaging argument guarantees the existence of at least one shift $\Delta$ for which

(2.11)  $E_{n,d}(P+\Delta)\leq E_{n,d}^{(sh)}(P).$

In the case of a rank-one QMC lattice rule with generating vector $\mathbf{z}$, we write $E_{n,d}(\mathbf{z})$ and $E^{(sh)}_{n,d}(\mathbf{z})$ for $E_{n,d}(P)$ and $E^{(sh)}_{n,d}(P)$. It was proved in [3, Lemma 5.5] that, for any rank-one QMC lattice rule, $\bigl[E^{(sh)}_{n,d}(\mathbf{z})\bigr]^2$ has an explicit formula, as quoted in the following theorem.

Theorem 2.4.

The shift-averaged worst-case error for a rank-one QMC lattice rule in the weighted anchored or unanchored Sobolev space (see Remark 2.5 below for the definitions) is given by

(2.12)  $\bigl[E^{(sh)}_{n,d}(\mathbf{z})\bigr]^2=\sum_{\emptyset\neq\nu\subseteq\{1:d\}}\gamma_{\nu}\biggl(\frac{1}{n}\sum_{k=0}^{n-1}\prod_{j\in\nu}\Bigl[B_2\Bigl(\Bigl\{\frac{kz_j}{n}\Bigr\}\Bigr)+\beta\Bigr]-\beta^{|\nu|}\biggr)
=-\prod_{j=1}^{d}(1+\gamma_j\beta)+\frac{1}{n}\sum_{k=0}^{n-1}\prod_{j=1}^{d}\Bigl(1+\gamma_j\Bigl[B_2\Bigl(\Bigl\{\frac{kz_j}{n}\Bigr\}\Bigr)+\beta\Bigr]\Bigr),$

where $\beta=c^2-c+\frac{1}{3}$ for the anchored Sobolev space, $\beta=0$ for the unanchored Sobolev space, and $\{\gamma_{\nu}\}$ are the weights.

Remark 2.5.

(1) We recall that for general given weights $\{\gamma_{\nu}\}$, the inner product of the weighted anchored Sobolev space is defined by

(2.13)  $\langle f,g\rangle_{d,\gamma}:=\sum_{\nu\subseteq\{1,2,\cdots,d\}}\gamma_{\nu}^{-1}\int_{[0,1]^{|\nu|}}\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}f(\mathbf{x}_{\nu};c)\,\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}g(\mathbf{x}_{\nu};c)\,d\mathbf{x}_{\nu},$

where the sum is over all subsets $\nu\subseteq\{1,2,\cdots,d\}$, including the empty set; for $\mathbf{x}\in[0,1]^d$, the symbol $\mathbf{x}_{\nu}$ denotes the set of components $x_j$ of $\mathbf{x}$ with $j\in\nu$, and $(\mathbf{x}_{\nu};c)$ denotes the vector obtained by replacing the components of $\mathbf{x}$ with $j\notin\nu$ by $c\in[0,1]$, which is called the 'anchor' value. The partial derivative $\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}$ denotes the mixed first partial derivative with respect to the components of $\mathbf{x}_{\nu}$.

(2) The inner product for the weighted unanchored Sobolev space is defined by

(2.14)  $\langle f,g\rangle_{d,\gamma}:=\sum_{\nu\subseteq\{1:d\}}\gamma_{\nu}^{-1}\int_{[0,1]^{|\nu|}}\biggl(\int_{[0,1]^{d-|\nu|}}\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}f(\mathbf{x})\,d\mathbf{x}_{-\nu}\biggr)\biggl(\int_{[0,1]^{d-|\nu|}}\frac{\partial^{|\nu|}}{\partial\mathbf{x}_{\nu}}g(\mathbf{x})\,d\mathbf{x}_{-\nu}\biggr)d\mathbf{x}_{\nu},$

where $\mathbf{x}_{-\nu}$ stands for the vector consisting of the remaining components of the $d$-dimensional vector $\mathbf{x}$ that are not in $\mathbf{x}_{\nu}$.

2.2 Examples of good rank-one lattice rules

The first example is the Fibonacci lattice; we refer the reader to [3] for the details.

Example 2.6 (Fibonacci lattice).

Let $\mathbf{z}=(1,F_k)$ and $n=F_{k+1}$, where $F_k$ and $F_{k+1}$ are consecutive Fibonacci numbers. Then the resulting two-dimensional lattice set generated by $\mathbf{z}$ is called a Fibonacci lattice.

Fibonacci lattices in 2-d have a certain optimality property, but there is no obvious generalization to higher dimensions that retains the optimality property (cf. [3]).

The second example is the so-called Korobov lattice; we refer the reader to [5, 6] for the details.

Example 2.7 (Korobov lattice).

Let $a$ be an integer satisfying $1\leq a\leq n-1$ and $\gcd(a,n)=1$, and set

$\mathbf{z}=\mathbf{z}(a):=(1,a,a^2,\cdots,a^{d-1})\mod n.$

Then the resulting $d$-dimensional lattice set generated by $\mathbf{z}$ is called a Korobov lattice.

It is easy to see that there are (at most) $n-1$ choices for the Korobov parameter $a$, which leads to (at most) $n-1$ choices for the generating vector $\mathbf{z}$. Thus it is feasible in practice to search through the (at most) $n-1$ choices and take the one that fulfills the desired error criterion, such as the one that minimizes $P_{\alpha,n,d}(\mathbf{z})$; the expansion (2.8) allows $P_{\alpha,n,d}(\mathbf{z})$ to be computed in $O(dn^2)$ operations (cf. [4]).
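To illustrate, below is a minimal sketch of this Korobov search using the closed form (2.8) with $\alpha=2$, for which $B_2(x)=x^2-x+\frac{1}{6}$ and the coefficient $\frac{(-1)^{\alpha/2+1}(2\pi)^{\alpha}}{\alpha!}$ equals $2\pi^2$; the values of $n$ and $d$ are illustrative assumptions.

```python
import numpy as np
from math import gcd

# Compute P_2(z) via (2.8) with alpha = 2, using B_2(x) = x^2 - x + 1/6;
# the k = 0 term reproduces the (1 + 2*zeta(2))^d product.  Cost: O(d n).
def P2(z, n):
    k = np.arange(n).reshape(-1, 1)
    frac = (k * np.asarray(z) / n) % 1.0
    B2 = frac**2 - frac + 1.0 / 6.0
    return -1.0 + np.prod(1.0 + 2.0 * np.pi**2 * B2, axis=1).mean()

# Search all admissible Korobov parameters a, total cost O(d n^2).
def korobov_search(n, d):
    best = None
    for a in range(1, n):
        if gcd(a, n) != 1:
            continue
        z = [pow(a, j, n) for j in range(d)]   # (1, a, a^2, ...) mod n
        p = P2(z, n)
        if best is None or p < best[0]:
            best = (p, a, z)
    return best

print(korobov_search(n=101, d=4))
```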

The last example is the CBC lattice, which is based on the component-by-component construction (cf. [9]).

Example 2.8 (CBC lattice).

Let $\mathbb{N}_n:=\{z\in\mathbb{Z}:1\leq z\leq n-1 \text{ and } \gcd(z,n)=1\}$. Given $n$, $d$, and weights $\gamma_{\nu}$ as in (2.12), define the generating vector $\mathbf{z}=(z_1,z_2,\cdots,z_d)$ component-wise as follows.

  • (i) Set $z_1=1$.

  • (ii) With $z_1$ held fixed, choose $z_2$ from $\mathbb{N}_n$ to minimize $[E^{(sh)}_{n,d}((z_1,z_2))]^2$ in 2-d.

  • (iii) With $z_1,z_2$ held fixed, choose $z_3$ from $\mathbb{N}_n$ to minimize $[E^{(sh)}_{n,d}((z_1,z_2,z_3))]^2$ in 3-d.

  • (iv) Repeat the above process until all $\{z_j\}_{j=1}^d$ are determined.

With general weights $\{\gamma_{\nu}\}$, the cost of the CBC algorithm is prohibitively high; thus, in practice some special structure is always adopted, among which product weights, order-dependent weights, finite-order weights, and POD (product and order-dependent) weights are commonly used. In each of the $d$ steps of the CBC algorithm, the search space $\mathbb{N}_n$ has cardinality $n-1$, so the overall search space of the CBC algorithm is reduced to a size of order $O(dn)$ (cf. [10, page 11]). Hence, this provides a feasible way of constructing a generating vector $\mathbf{z}$, as sketched below.
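The following is a minimal CBC sketch for the unanchored space ($\beta=0$) with product weights $\gamma_{\nu}=\prod_{j\in\nu}\gamma_j$, using the second form of (2.12); since the term $-\prod_j(1+\gamma_j\beta)$ is constant over the candidates, it suffices to minimize the $k$-sum. The weight choice $\gamma_j=1/j^2$ and the value of $n$ are illustrative assumptions.

```python
import numpy as np
from math import gcd

# CBC construction with product weights, beta = 0 (unanchored space),
# minimizing the k-sum of (2.12) one component at a time.
def cbc(n, d, gamma=lambda j: 1.0 / j**2):
    k = np.arange(n)
    prod = np.ones(n)              # running product over already-fixed z_j
    z = []
    cand = np.array([c for c in range(1, n) if gcd(c, n) == 1])
    for j in range(1, d + 1):
        # B_2({k c / n}) for every candidate c, shape (len(cand), n)
        frac = np.outer(cand, k) % n / n
        B2 = frac**2 - frac + 1.0 / 6.0
        # squared error (up to an additive constant) for each candidate
        err2 = (prod * (1.0 + gamma(j) * B2)).mean(axis=1)
        c = int(cand[np.argmin(err2)])
        z.append(c)
        fr = (k * c) % n / n
        prod *= 1.0 + gamma(j) * (fr**2 - fr + 1.0 / 6.0)
    return z

print(cbc(n=101, d=6))   # for j = 1, all candidates tie, so z_1 = 1
```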

Figure 1 shows three two-dimensional lattices with 81 points, whose generating vectors are $(1,2)$, $(1,4)$, and $(1,7)$, respectively. Figure 2 shows three three-dimensional lattices with 81 points, whose generating vectors are $(1,2,4)$, $(1,4,16)$, and $(1,7,49)$, respectively.


Figure 1: 81-point lattices with generating vectors $(1,2)$, $(1,4)$, and $(1,7)$.


Figure 2: 81-point lattices with generating vectors $(1,2,4)$, $(1,4,16)$, and $(1,7,49)$.

3 Reformulation of lattice rules

Clearly, the lattice point set of each QMC lattice rule has some pattern or structure; indeed, one main goal of this section is precisely to describe that pattern. We show that a lattice rule almost has a tensor product reformulation when viewed in an appropriately transformed coordinate space obtained via an affine transformation. This discovery allows us to introduce a tensor product rule as an improvement of the original QMC lattice rule. More importantly, the reformulation lays the groundwork for developing an efficient and fast implementation algorithm (or solver), called the MDI-LR algorithm and based on the idea of multilevel dimension iteration [11], for evaluating the QMC lattice rule (1.2).

3.1 Construction of affine coordinate transformations

From Figures 1-2, we see that the lattice points lie on lines/planes which are not parallel to the coordinate axes/planes; however, those lines/planes are parallel to each other. This observation suggests that they can be made parallel to the coordinate axes/planes via affine transformations. Below we prove that this is indeed the case and explicitly construct such an affine transformation for a given QMC lattice rule.

Theorem 3.1.

Let $\mathbf{z}\in\mathbb{Z}^d$ and let $\mathbf{x}_j=\bigl\{\frac{j\mathbf{z}}{n}\bigr\}$ for $j=0,1,2,\cdots,n-1$ denote the rank-one QMC lattice rule point set. Define

(3.1)  $A=\begin{pmatrix}\frac{1}{z_1}&-\frac{1}{z_2}&0&\cdots&0&0\\ 0&\frac{1}{z_2}&-\frac{1}{z_3}&\cdots&0&0\\ \vdots&\vdots&\vdots&&\vdots&\vdots\\ 0&0&0&\cdots&\frac{1}{z_{d-1}}&-\frac{1}{z_d}\\ 0&0&0&\cdots&0&1\end{pmatrix}\quad\text{and}\quad \mathbf{b}=\begin{pmatrix}0\\ 0\\ \vdots\\ -\bigl\{\frac{nx_d}{z_d}\bigr\}\cdot\frac{z_d}{n}\end{pmatrix}.$

Notice that $A\in\mathbb{R}^{d\times d}$ and $\mathbf{b}\in\mathbb{R}^d$. Then $\mathbf{y}_j:=\mathrm{abs}(A\mathbf{x}_j+\mathbf{b})$, $j=0,1,\cdots,n-1$, form a Cartesian grid in the new coordinate system, where $\mathrm{abs}(\mathbf{y})$ is defined as taking the absolute value of each component of the vector $\mathbf{y}$.

Proof 3.2.

By the definition of $\mathbf{x}_j$, we have $\mathbf{x}_j=\bigl(\{\frac{jz_1}{n}\},\{\frac{jz_2}{n}\},\cdots,\{\frac{jz_d}{n}\}\bigr)^{\intercal}$. A direct computation yields

(3.7)  $\mathbf{y}_j=\mathrm{abs}(A\mathbf{x}_j+\mathbf{b})=\mathrm{abs}\begin{pmatrix}\frac{1}{z_1}\{\frac{jz_1}{n}\}-\frac{1}{z_2}\{\frac{jz_2}{n}\}\\ \frac{1}{z_2}\{\frac{jz_2}{n}\}-\frac{1}{z_3}\{\frac{jz_3}{n}\}\\ \vdots\\ \frac{1}{z_{d-1}}\{\frac{jz_{d-1}}{n}\}-\frac{1}{z_d}\{\frac{jz_d}{n}\}\\ \{\frac{jz_d}{n}\}-\{\frac{n}{z_d}\{\frac{jz_d}{n}\}\}\cdot\frac{z_d}{n}\end{pmatrix}.$

Recall that $\{x\}$ and $\lfloor x\rfloor$ denote respectively the fractional and integer parts of the number $x$. Because

$\frac{1}{z_i}\Bigl\{\frac{jz_i}{n}\Bigr\}=\frac{1}{z_i}\Bigl(\frac{jz_i}{n}-\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor\Bigr)=\frac{j}{n}-\frac{1}{z_i}\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor,$

then

$\frac{1}{z_{i-1}}\Bigl\{\frac{jz_{i-1}}{n}\Bigr\}-\frac{1}{z_i}\Bigl\{\frac{jz_i}{n}\Bigr\}=\frac{1}{z_i}\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor-\frac{1}{z_{i-1}}\Bigl\lfloor\frac{jz_{i-1}}{n}\Bigr\rfloor,$

and

(3.8)  $\mathbf{y}_j=\mathrm{abs}\begin{pmatrix}\frac{1}{z_1}\{\frac{jz_1}{n}\}-\frac{1}{z_2}\{\frac{jz_2}{n}\}\\ \frac{1}{z_2}\{\frac{jz_2}{n}\}-\frac{1}{z_3}\{\frac{jz_3}{n}\}\\ \vdots\\ \frac{1}{z_{d-1}}\{\frac{jz_{d-1}}{n}\}-\frac{1}{z_d}\{\frac{jz_d}{n}\}\\ \{\frac{jz_d}{n}\}-\bigl\{\frac{\{jz_d/n\}}{z_d/n}\bigr\}\cdot\frac{z_d}{n}\end{pmatrix}=\mathrm{abs}\begin{pmatrix}\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor-\frac{1}{z_1}\lfloor\frac{jz_1}{n}\rfloor\\ \frac{1}{z_3}\lfloor\frac{jz_3}{n}\rfloor-\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor\\ \vdots\\ \frac{1}{z_d}\lfloor\frac{jz_d}{n}\rfloor-\frac{1}{z_{d-1}}\lfloor\frac{jz_{d-1}}{n}\rfloor\\ \frac{z_d}{n}\bigl\lfloor j-\frac{n}{z_d}\lfloor\frac{jz_d}{n}\rfloor\bigr\rfloor\end{pmatrix}.$

It is easy to check that

(3.9)  $\frac{1}{z_i}\Bigl\lfloor\frac{jz_i}{n}\Bigr\rfloor-\frac{1}{z_{i-1}}\Bigl\lfloor\frac{jz_{i-1}}{n}\Bigr\rfloor=\begin{cases}0,& 0\leq j<\frac{n}{z_i},\\[2pt] \frac{1}{z_i},&\frac{n}{z_i}\leq j<\frac{n}{z_{i-1}},\\[2pt] \frac{1}{z_{i-1}}-\frac{1}{z_i},&\frac{n}{z_{i-1}}\leq j<\frac{2n}{z_i},\\[2pt] \frac{2}{z_i}-\frac{1}{z_{i-1}},&\frac{2n}{z_i}\leq j<\frac{2n}{z_{i-1}},\\[2pt] \frac{2}{z_{i-1}}-\frac{2}{z_i},&\frac{2n}{z_{i-1}}\leq j<\frac{3n}{z_i},\\[2pt] \ \ \vdots&\ \ \vdots\end{cases}$

On the other hand, let

$\Gamma_{n_1}^{1}:=\Bigl\{y_{s_1}\ \Big|\ y_{s_1}=\mathrm{abs}\Bigl(\frac{1}{z_2}\Bigl\lfloor\frac{iz_2}{n}\Bigr\rfloor-\frac{1}{z_1}\Bigl\lfloor\frac{iz_1}{n}\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_1=0,1,\cdots,n_1-1\Bigr\},$
$\Gamma_{n_2}^{1}:=\Bigl\{y_{s_2}\ \Big|\ y_{s_2}=\mathrm{abs}\Bigl(\frac{1}{z_3}\Bigl\lfloor\frac{iz_3}{n}\Bigr\rfloor-\frac{1}{z_2}\Bigl\lfloor\frac{iz_2}{n}\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_2=0,1,\cdots,n_2-1\Bigr\},$
$\quad\vdots$
$\Gamma_{n_{d-1}}^{1}:=\Bigl\{y_{s_{d-1}}\ \Big|\ y_{s_{d-1}}=\mathrm{abs}\Bigl(\frac{1}{z_d}\Bigl\lfloor\frac{iz_d}{n}\Bigr\rfloor-\frac{1}{z_{d-1}}\Bigl\lfloor\frac{iz_{d-1}}{n}\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_{d-1}=0,1,\cdots,n_{d-1}-1\Bigr\},$
$\Gamma_{n_d}^{1}:=\Bigl\{y_{s_d}\ \Big|\ y_{s_d}=\mathrm{abs}\Bigl(\frac{z_d}{n}\Bigl\lfloor i-\frac{n}{z_d}\Bigl\lfloor\frac{iz_d}{n}\Bigr\rfloor\Bigr\rfloor\Bigr),\ i=0,1,\cdots,n-1,\ s_d=0,1,\cdots,n_d-1\Bigr\},$
$\Gamma_n^d:=\Gamma_{n_1}^{1}\otimes\Gamma_{n_2}^{1}\otimes\cdots\otimes\Gamma_{n_d}^{1},$

where

(3.10)  $n_i=\frac{\mathrm{lcm}(z_i,z_{i+1})}{\min(z_i,z_{i+1})},\quad i=1,2,\cdots,d-1;\qquad n_d=\Bigl\lceil\frac{n}{z_d}\Bigr\rceil,$

and $\mathrm{lcm}$ represents the least common multiple.

For any $\mathbf{y}_k=(y_{s_1},y_{s_2},\cdots,y_{s_d})^{\intercal}\in\Gamma_n^d$, we have $k=s_1+s_2n_1+s_3n_1n_2+\cdots+s_dn_1n_2\cdots n_{d-1}$. Since $s_1=0,1,\cdots,n_1-1,\ \cdots,\ s_d=0,1,\cdots,n_d-1$, it follows that $k=0,1,\cdots,n_1n_2\cdots n_d-1$. For $\mathbf{y}_j$ in the set described by (3.8), for all $j=1,2,\cdots,n$, we get

(3.11)  $\mathbf{y}_j=\mathrm{abs}\begin{pmatrix}\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor-\frac{1}{z_1}\lfloor\frac{jz_1}{n}\rfloor\\ \frac{1}{z_3}\lfloor\frac{jz_3}{n}\rfloor-\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor\\ \vdots\\ \frac{1}{z_d}\lfloor\frac{jz_d}{n}\rfloor-\frac{1}{z_{d-1}}\lfloor\frac{jz_{d-1}}{n}\rfloor\\ \frac{z_d}{n}\bigl\lfloor k-\frac{n}{z_d}\lfloor\frac{jz_d}{n}\rfloor\bigr\rfloor\end{pmatrix}.$

Let $y_{i_1}:=\mathrm{abs}\bigl(\frac{1}{z_2}\lfloor\frac{jz_2}{n}\rfloor-\frac{1}{z_1}\lfloor\frac{jz_1}{n}\rfloor\bigr)$; it follows that there exists an $s_1$ such that $s_1=n_1-\lfloor\frac{j}{n_1}\rfloor$, resulting in $y_{i_1}=y_{s_1}\in\Gamma_{n_1}^{1}$. In the same way, $y_{i_2}\in\Gamma_{n_2}^{1},\cdots,y_{i_d}\in\Gamma_{n_d}^{1}$. Therefore, we conclude that $\mathbf{y}_j=(y_{i_1},y_{i_2},\cdots,y_{i_d})^{\intercal}\in\Gamma_n^d$, that is, the transformed lattice points have the Cartesian product structure.

Lemma 3.3.

Let $\mathbf{x}_j=\bigl\{\frac{j\mathbf{z}}{n}\bigr\}$ for $j=1,2,\cdots,n-1$ denote the Korobov lattice point set, that is, $\mathbf{z}=(1,a,a^2,\cdots,a^{d-1})$ with $1\leq a\leq n-1$ and $\gcd(a,n)=1$. Then $\mathbf{y}_j:=\mathrm{abs}(A\mathbf{x}_j+\mathbf{b})$, $j=0,1,\cdots,n-1$, satisfies the conclusion of Theorem 3.1. Moreover, if $a=[n]^{\frac{1}{d}}$, then the number of points in each direction of the lattice set $\Gamma_n^d$ is the same, that is, $n_1=n_2=\cdots=n_d=a$.

Proof 3.4.

From Theorem 3.1 we have

(3.12)  $\mathbf{y}_j=\mathrm{abs}\begin{pmatrix}\frac{1}{a}\lfloor\frac{ja}{n}\rfloor\\ \frac{1}{a^2}\lfloor\frac{ja^2}{n}\rfloor-\frac{1}{a}\lfloor\frac{ja}{n}\rfloor\\ \vdots\\ \frac{1}{a^{d-1}}\lfloor\frac{ja^{d-1}}{n}\rfloor-\frac{1}{a^{d-2}}\lfloor\frac{ja^{d-2}}{n}\rfloor\\ \frac{a^{d-1}}{n}\bigl\lfloor j-\frac{n}{a^{d-1}}\lfloor\frac{ja^{d-1}}{n}\rfloor\bigr\rfloor\end{pmatrix},$

and

$\Gamma_{n_1}^{1}=\Bigl\{y_{s_1}\ \Big|\ y_{s_1}=\mathrm{abs}\Bigl(\frac{1}{a}\Bigl\lfloor\frac{ja}{n}\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_1=0,1,\cdots,n_1-1\Bigr\}=\Bigl\{0,\frac{1}{a},\frac{2}{a},\cdots,\frac{a-1}{a}\Bigr\},$
$\Gamma_{n_2}^{1}=\Bigl\{y_{s_2}\ \Big|\ y_{s_2}=\mathrm{abs}\Bigl(\frac{1}{a^2}\Bigl\lfloor\frac{ja^2}{n}\Bigr\rfloor-\frac{1}{a}\Bigl\lfloor\frac{ja}{n}\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_2=0,1,\cdots,n_2-1\Bigr\}=\Bigl\{0,\frac{1}{a^2},\frac{2}{a^2},\cdots,\frac{a-1}{a^2}\Bigr\},$
$\quad\vdots$
$\Gamma_{n_{d-1}}^{1}=\Bigl\{y_{s_{d-1}}\ \Big|\ y_{s_{d-1}}=\mathrm{abs}\Bigl(\frac{1}{a^{d-1}}\Bigl\lfloor\frac{ja^{d-1}}{n}\Bigr\rfloor-\frac{1}{a^{d-2}}\Bigl\lfloor\frac{ja^{d-2}}{n}\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_{d-1}=0,1,\cdots,n_{d-1}-1\Bigr\}=\Bigl\{0,\frac{1}{a^{d-1}},\frac{2}{a^{d-1}},\cdots,\frac{a-1}{a^{d-1}}\Bigr\},$
$\Gamma_{n_d}^{1}=\Bigl\{y_{s_d}\ \Big|\ y_{s_d}=\mathrm{abs}\Bigl(\frac{a^{d-1}}{n}\Bigl\lfloor j-\frac{n}{a^{d-1}}\Bigl\lfloor\frac{ja^{d-1}}{n}\Bigr\rfloor\Bigr\rfloor\Bigr),\ j=0,1,\cdots,n-1,\ s_d=0,1,\cdots,n_d-1\Bigr\}=\Bigl\{0,\frac{a^{d-1}}{n},\frac{2a^{d-1}}{n},\cdots,\frac{n-a^{d-1}}{n}\Bigr\}.$

For $\mathbf{y}_j$ in the set described by (3.12), let $y_{i_1}:=\mathrm{abs}\bigl(\frac{1}{a}\lfloor\frac{ja}{n}\rfloor\bigr)$; it follows that there exists an $s_1$ such that $s_1=n_1-\lfloor\frac{j}{n_1}\rfloor$, resulting in $y_{i_1}=\frac{s_1}{a}=y_{s_1}\in\Gamma_{n_1}^{1}$. Similarly, $y_{i_2}=\frac{s_2}{a^2}=y_{s_2}\in\Gamma_{n_2}^{1},\cdots,y_{i_d}\in\Gamma_{n_d}^{1}$. Therefore, we conclude that $\mathbf{y}_j=(y_{i_1},y_{i_2},\cdots,y_{i_d})^{\intercal}\in\Gamma_n^d$; obviously, the transformed lattice points have the Cartesian product structure. Moreover, if $a=[n]^{\frac{1}{d}}$, then

$n_i=\frac{\mathrm{lcm}(z_i,z_{i+1})}{\min(z_i,z_{i+1})}=\frac{z_{i+1}}{z_i}=a,\quad i=1,2,\cdots,d-1,\qquad n_d=\Bigl\lceil\frac{n}{z_d}\Bigr\rceil=\frac{a^d}{a^{d-1}}=a.$

Hence, $n_1=n_2=\cdots=n_{d-1}=n_d=a$. The proof is complete.


Figure 3: Left: 81-point lattice with the generating vector $(1,4)$. Right: transformed lattice after the affine coordinate transformation.

The left graph of Figure 3 shows a 2-d example of an 81-point rank-one lattice with the generating vector $(1,4)$. The right graph displays the transformed lattice under the affine coordinate transformation $\mathbf{y}=A\mathbf{x}+\mathbf{b}$ from $\mathbb{R}^2$ to itself, where

(3.13)  $A=\begin{pmatrix}1&-\frac{1}{4}\\ 0&1\end{pmatrix},\qquad \mathbf{b}=\begin{pmatrix}0\\ -\{\frac{81x_2}{4}\}\cdot\frac{4}{81}\end{pmatrix}.$
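As a sanity check, the following minimal sketch applies the transformation $\mathbf{y}=\mathrm{abs}(A\mathbf{x}+\mathbf{b})$ of (3.13) to this 81-point lattice and confirms the Cartesian structure predicted by Theorem 3.1 and (3.10): $n_1=4$ points in the first direction and $n_2=\lceil 81/4\rceil=21$ in the second (so $n^*=4\cdot 21-81=3$ ghost points complete the grid).

```python
import numpy as np
from math import lcm

# Verify Theorem 3.1 on the 81-point lattice of Figure 3 (z = (1, 4)):
# y = abs(A x + b) with A, b as in (3.13) lands on a Cartesian grid.
n, z = 81, np.array([1, 4])
x = (np.arange(n).reshape(-1, 1) * z / n) % 1.0       # rank-one points (2.1)
A = np.array([[1.0, -0.25],
              [0.0,  1.0]])
b = np.column_stack([np.zeros(n), -((n * x[:, 1] / z[1]) % 1.0) * z[1] / n])
y = np.abs(x @ A.T + b)

print(np.unique(y[:, 0].round(12)))                    # 4 values: 0, 1/4, 1/2, 3/4
print(len(np.unique(y[:, 1].round(12))))               # 21 values
# grid sizes predicted by (3.10): n_1 = lcm(1,4)/min(1,4), n_2 = ceil(81/4)
print(lcm(1, 4) // min(1, 4), -(-n // int(z[1])))      # prints: 4 21
```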


Figure 4: Left: 161-point rank-one lattice with generating vector $(1,4,16)$. Right: transformed lattice after the coordinate transformation.

Figure 4 demonstrates a specific example in 3-d. The left graph is a 161-point rank-one lattice with generating vector $(1,4,16)$. The right one shows the transformed points under the affine coordinate transformation $\mathbf{y}=A\mathbf{x}+\mathbf{b}$ from $\mathbb{R}^3$ to itself, where

(3.14)  $A=\begin{pmatrix}1&-\frac{1}{4}&0\\ 0&\frac{1}{4}&-\frac{1}{16}\\ 0&0&1\end{pmatrix},\qquad \mathbf{b}=\begin{pmatrix}0\\ 0\\ -\{\frac{161x_3}{16}\}\cdot\frac{16}{161}\end{pmatrix}.$

3.2 Improved lattice rules

From Figures 3 and 4 we see that the transformed lattice point sets do not exactly form tensor product grids because many lines miss one point. By adding those "missing" points, which can be done systematically, we easily make them become tensor product grids in the transformed coordinate system. Since more integration points are added to the QMC lattice rule, the resulting quadrature rule is expected to be more accurate (which is supported by our numerical tests); hence, it is an improvement of the original QMC lattice rule. We also note that those added points correspond to ghost points in the original coordinates.

Definition 3.5 (Improved QMC lattice rule).

Let $\mathbf{z}\in\mathbb{R}^d$ and $\mathbf{x}_i=\bigl\{\frac{i\mathbf{z}}{n}\bigr\}$, $i=0,1,\cdots,n-1$, be a rank-one lattice point set, and let $\mathbf{y}_i:=A\mathbf{x}_i+\mathbf{b}$, $i=0,1,\cdots,n-1$, for some $A\in\mathbb{R}^{d\times d}$ and $\mathbf{b}\in\mathbb{R}^d$ (which uniquely determine an affine transformation). Suppose there exist $n^*\,(\ll n)$ points so that together the $n+n^*$ points form a tensor product grid in the transformed coordinate system; then the QMC lattice rule obtained by using those $n+n^*$ sampling points is called an improved QMC lattice rule and is denoted by $\widehat{Q}_{n,d}(f)$.

Figure 5 shows an 81-point (i.e., $n=81$) 2-d rank-one lattice with the generating vector $(1,7)$ (left), the transformed lattice (middle), and the improved tensor product grid (right). Three points are added on the top, so $n^*=3$ for this example.


Figure 5: Left: 81-point lattice with the generating vector $(1,7)$. Middle: transformed lattice. Right: improved tensor product grid after adding 3 points.

4 The MDI-LR algorithm

An improved rank-one lattice is a tensor product grid in the transformed coordinate system, and its corresponding quasi-Monte Carlo (QMC) rule is a tensor product rule with equal weight $w=\frac{1}{n+n^*}$. This tensor product improvement allows us to apply the multilevel dimension iteration (MDI) approach, which was proposed by the authors in [11], for a fast implementation of the original QMC lattice rule, especially in the high-dimensional case. The resulting algorithm will be called the MDI-LR algorithm throughout this paper.

4.1 Formulation of the MDI-LR algorithm

To formulate our MDI-LR algorithm, we first recall the MDI idea/algorithm in simple terms (cf. [11]).

For a tensor product rule, we need to compute a multi-summation with variable limits

$\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\cdots\sum_{i_d=1}^{n_d}f(\xi_{i_1},\xi_{i_2},\cdots,\xi_{i_d}),$

which involves $n_1n_2\cdots n_d$ function evaluations for the given function $f$ if one uses the conventional approach of computing the function value at each point independently, which inevitably leads to the curse of dimensionality (CoD). To make the computation feasible in high dimensions, it is imperative to save computational cost by evaluating the summation more efficiently.

The main idea of the MDI approach proposed in [11] is to compute those $n_1n_2\cdots n_d$ function values in clusters (not independently) and to compute the summation layer by layer based on a dimension iteration with the help of symbolic computation. To this end, we write

(4.1)  $\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\cdots\sum_{i_d=1}^{n_d}f(\xi_{i_1},\xi_{i_2},\cdots,\xi_{i_d})=\sum_{i_{m+1}=1}^{n_{m+1}}\cdots\sum_{i_d=1}^{n_d}f_{d-m}(\xi_{i_{m+1}},\cdots,\xi_{i_d}),$

where $1\leq m\ll d$ is fixed and

$f_{d-m}(x_1,\cdots,x_{d-m}):=\sum_{i_1=1}^{n_1}\cdots\sum_{i_m=1}^{n_m}f(\xi_{i_1},\cdots,\xi_{i_m},x_1,\cdots,x_{d-m}).$

The MDI approach recursively generates a sequence of symbolic functions $\{f_{d-m},f_{d-2m},\cdots,f_{d-\ell m}\}$, each of which has $m$ fewer arguments than its predecessor (because the dimension is reduced by $m$ at each iteration). As already mentioned above, the MDI approach explores the lattice structure of the tensor product integration point set: instead of evaluating function values at all integration points independently, it evaluates them in clusters and iteratively along $m$ coordinate directions, and the function evaluation at any integration point is not completed until the last step of the algorithm is executed. In some sense, the implementation strategy of the MDI approach is to trade large space complexity for low time complexity. That being said, the price paid by the MDI approach for the speedy evaluation of the multi-summation is that those symbolic functions need to be saved during the iteration process, which often takes up more computer memory.

For example, consider the 2-d function $f(x_1,x_2)=x_1^2+x_1x_2+x_2^2$ and let $n_1=n_2=N$. In the standard approach, to compute the function value $f(\xi_{i_1},\xi_{i_2})$ at an integration point $(\xi_{i_1},\xi_{i_2})$, one needs to compute three multiplications $\xi_{i_1}*\xi_{i_1}=\xi_{i_1}^2$, $\xi_{i_1}*\xi_{i_2}$, and $\xi_{i_2}*\xi_{i_2}=\xi_{i_2}^2$, and two additions. To compute $N^2$ function values, this requires a total of $3N^2$ multiplications and $3N^2-1$ additions. On the other hand, the first for-loop of the MDI approach generates $f_1(x)=\sum_{i_2=1}^{N}f(x,\xi_{i_2})$, which requires $N$ evaluations of $\xi_{i_2}*x$ (symbolic computations) and $N$ evaluations of $\xi_{i_2}*\xi_{i_2}$, as well as $3(N-1)$ additions. The second for-loop generates $\sum_{i_1=1}^{N}f_1(\xi_{i_1})$, which requires $N$ evaluations of $\xi_{i_1}*\xi_{i_1}$ and $N$ evaluations of $\xi_{i_1}*\bar{\xi}_{i_2}$, as well as $3N-1$ additions. After the second for-loop completes, we obtain the summation value. The computational complexity of the MDI approach thus consists of a total of $4N$ multiplications and $6N-4$ additions, which is much cheaper than the standard approach. In fact, the speedup is even more dramatic in higher dimensions. A code sketch of this two-loop iteration is given below.
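The following is a minimal symbolic sketch of this 2-d example; the grid $\{\xi_i\}=\{i/N\}$ and $N=5$ are illustrative assumptions.

```python
import sympy as sp

# The 2-d MDI iteration for f(x1,x2) = x1^2 + x1*x2 + x2^2 on an N x N grid.
x1, x2 = sp.symbols('x1 x2')
f = x1**2 + x1 * x2 + x2**2
N = 5
xi = [sp.Rational(i, N) for i in range(N)]   # grid points (an assumption)

# first loop: collapse the x2-direction into a symbolic function f_1(x1)
f1 = sp.expand(sum(f.subs(x2, t) for t in xi))
# second loop: collapse the remaining direction to get the multi-sum value
total = sum(f1.subs(x1, s) for s in xi)
print(f1, total)

# brute-force check: sum over all N^2 points independently
print(sum(f.subs({x1: s, x2: t}) for s in xi for t in xi))
```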

It is easy to see that the MDI approach cannot be applied to the QMC rule (1.2) directly because it is not in a multi-summation form. However, we have shown in Section 3 that this obstacle can be overcome by a simple affine coordinate transformation (i.e., a change of variables) and by adding a few integration points.

Let $\mathbf{y}=A\mathbf{x}+\mathbf{b}$ denote the affine transformation; then the integral (1.1) is equivalent to

(4.2)  $I_d(f)=\frac{1}{|A|}\int_{\widehat{\Omega}}f\bigl(A^{-1}(\mathbf{y}-\mathbf{b})\bigr)\,d\mathbf{y}=\frac{1}{|A|}\int_{\widehat{\Omega}}g(\mathbf{y})\,d\mathbf{y},$

where $|A|$ stands for the determinant of $A\in\mathbb{R}^{d\times d}$ and

$g(\mathbf{y}):=f\bigl(A^{-1}(\mathbf{y}-\mathbf{b})\bigr),\qquad \widehat{\Omega}:=\bigl\{\mathbf{y}\,|\,\mathbf{y}=A\mathbf{x}+\mathbf{b},\ \mathbf{x}\in\Omega\bigr\}.$

Then our improved QMC rank-one lattice rule for (1.2) in the $\mathbf{y}$-coordinate system takes the form

(4.3)  $\widehat{Q}_{n,d}(f)=\frac{1}{n+n^*}\sum_{i=0}^{n+n^*-1}f(\mathbf{x}_i)=\frac{1}{|A|}\sum_{s_1=1}^{n_1}\sum_{s_2=1}^{n_2}\cdots\sum_{s_d=1}^{n_d}g(y_{s_1},y_{s_2},\cdots,y_{s_d}).$

Let

$J(g,\Omega):=\sum_{s_1=1}^{n_1}\sum_{s_2=1}^{n_2}\cdots\sum_{s_d=1}^{n_d}g(y_{s_1},y_{s_2},\cdots,y_{s_d}).$

Clearly, this is a multi-summation with variable limits; thus, we can apply the MDI approach to compute it efficiently. Before doing that, we first need to extend the MDI algorithm, Algorithm 2.3 of [11], to the case of variable limits. We name the extended algorithm MDI$(d,g,\Omega_d,\mathbf{N}_d,m)$; it is defined as follows.

Algorithm 1 MDI($d$, $g$, $\Omega$, $\mathbf{N}_d$, $m$)

Inputs: $d\,(\geq 4)$, $g$, $\Omega$, $m\,(=1,2,3)$, $\mathbf{N}_k=(n_1,n_2,\cdots,n_k)$, $k=1,2,\cdots,d$.
Output: $J=J(g,\Omega)$.

1: $\Omega_d=\Omega$, $g_d=g$, $\ell=[\frac{d}{m}]$.
2: for $k=d:-m:d-\ell m$ do (the index is decreased by $m$ at each iteration)
3:     $\Omega_{k-m}=P_k^{k-m}\Omega_k$.
4:     Construct the symbolic function $g_{k-m}$ by (4.4) below.
5:     MDI$(k,g_k,\Omega_k,\mathbf{N}_k,m):=$ MDI$(k-m,g_{k-m},\Omega_{k-m},\mathbf{N}_{k-m},m)$.
6: end for
7: $J=$ MDI$(d-\ell m,g_{d-\ell m},\Omega_{d-\ell m},\mathbf{N}_{d-\ell m},m)$.
8: return $J$.

where $P_k^{k-m}$ denotes the natural embedding from $\mathbb{R}^k$ to $\mathbb{R}^{k-m}$ obtained by deleting the first $m$ components of vectors in $\mathbb{R}^k$, and

(4.4)  $g_{k-m}(s_1,\cdots,s_{k-m})=\sum_{i_1,\cdots,i_m=1}^{n_1,\cdots,n_m}w_{i_1}w_{i_2}\cdots w_{i_m}\,g_k\bigl(\xi_{i_1},\cdots,\xi_{i_m},s_1,\cdots,s_{k-m}\bigr).$
Remark 4.1.
  • (a) Algorithm 1 recursively generates a sequence of symbolic functions $\{g_d,g_{d-m},g_{d-2m},\cdots,g_{d-\ell m}\}$, each of which has $m$ fewer arguments than its predecessor.

  • (b) Since $m\leq 3$, when $d=2,3$ we simply use the underlying low-dimensional QMC quadrature rules. As done in [11], we name those low-dimensional algorithms 2d-MDI$(g,\Omega,\mathbf{N}_2)$ and 3d-MDI$(g,\Omega,\mathbf{N}_3)$, and introduce the following conventions.

    • If $k=1$, set MDI$(k,g_k,\Omega_k,n_1,m):=J(g_k,\Omega_k)$, which is computed by using the underlying 1-d QMC quadrature rule.

    • If $k=2$, set MDI$(k,g_k,\Omega_k,\mathbf{N}_k,m):=$ 2d-MDI$(g_k,\Omega_k,\mathbf{N}_k)$.

    • If $k=3$, set MDI$(k,g_k,\Omega_k,\mathbf{N}_k,m):=$ 3d-MDI$(g_k,\Omega_k,\mathbf{N}_k)$.

    We note that when $k=1,2,3$, the parameter $m$ becomes a dummy variable and can be given any value.

  • (c) We also note that the MDI algorithm in [11] has an additional parameter $r$ which selects the 1-d quadrature rule. However, such a choice is not needed here because the underlying QMC rule is used as the 1-d quadrature rule.

We are now ready to define our MDI-LR algorithm, denoted by MDI-LR$(d,g,\Omega_d,\mathbf{N}_d,m)$, by using the above MDI algorithm to evaluate $\widehat{Q}_{n,d}(f)$ in (4.3).

Algorithm 2 MDI-LR($f$, $\Omega$, $d$, $a$, $n$)

Inputs: $f$, $\Omega$, $d$, $a$, $n$.
Output: $\widehat{Q}_{n,d}(f)=Q_{n+n^*,d}(f)$.

1: Initialize $\mathbf{z}=(1,a,a^2,\cdots,a^{d-1})$, $J=0$, $Q=0$, $m=1$.
2: Construct the matrix $A$ and the vector $\mathbf{b}$ by (3.1).
3: $g(\mathbf{y}):=f(A^{-1}(\mathbf{y}-\mathbf{b}))$.
4: Generate the vector $\mathbf{N}_d$ by (3.10).
5: $\widehat{\Omega}:=\{\mathbf{y}\,|\,\mathbf{y}=A\mathbf{x}+\mathbf{b},\ \mathbf{x}\in\Omega\}$.
6: $J=$ MDI$(d,g,\widehat{\Omega},\mathbf{N}_d,m)$.
7: $Q=\frac{J}{|A|}$.
8: return $\widehat{Q}_{n,d}(f)=Q$.

Note that here we set $m=1$, that is, the dimension is reduced by $1$ at each dimension iteration; this is because the numerical tests of [11] show that the MDI algorithm is more efficient with $m=1$ than with $m>1$. Also, the upper limit vector $\mathbf{N}_d$ depends on the choice of the underlying QMC rule. In Lemma 3.3 we showed that when $N=[n^{\frac{1}{d}}]$ and $a=N$, then $n_1=n_2=\cdots=n_d=N$, that is, the number of integration points is the same in each (transformed) coordinate direction. An end-to-end sketch in this setting is given below.
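The following is a minimal end-to-end sketch of this setting ($a=N$, $n=N^d$), combining the transformed grids of Lemma 3.3 with the $m=1$ dimension iteration; since the improved rule is an equal-weight average over the $n+n^*=N^d$ transformed grid points, the multi-sum is divided by $N^d$. The polynomial integrand stands in for the transformed $g=f(A^{-1}(\mathbf{y}-\mathbf{b}))$ of (4.2) and is an assumption made to keep the sketch self-contained.

```python
import sympy as sp

# End-to-end MDI-LR sketch in the transformed coordinates (a = N, n = N^d).
def mdi(expr, ys, grids):                  # m = 1 dimension iteration
    for sym, nodes in zip(reversed(ys), reversed(grids)):
        expr = sp.expand(sum(expr.subs(sym, t) for t in nodes))
    return expr

N, d = 4, 4
ys = sp.symbols(f'y1:{d+1}')
g = (sum(ys) + 1)**2                       # placeholder transformed integrand
# grids of Lemma 3.3: N points per (transformed) coordinate direction
grids = [[sp.Rational(s, N**(j + 1)) for s in range(N)] for j in range(d - 1)]
grids.append([sp.Rational(s, N) for s in range(N)])
Q = mdi(g, ys, grids) / N**d               # equal-weight average, cf. (4.3)
print(float(Q))
```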

5 Numerical performance tests

In this section, we present extensive and purposely designed numerical experiments to gauge the performance of the proposed MDI-LR algorithm and to demonstrate its superiority over the standard implementation of the QMC lattice rule (SLR) and the improved lattice rule (Imp-LR) for computing high-dimensional integrals. All our numerical experiments are done in Matlab on a desktop PC with an Intel(R) Xeon(R) Gold 6226R CPU at 2.90GHz and 32GB RAM.

5.1 Two and three-dimensional tests

We first test our MDI-LR algorithm on simple 2-d and 3-d examples and compare its performance (in terms of CPU time) with the SLR and Imp-LR methods.

Test 1. Let $\Omega=[0,1]^2$ and consider the following 2-d integrands:

(5.1)  $f(x):=\frac{x_2\exp(x_1x_2)}{e-2};\qquad \widehat{f}(x):=\sin\bigl(2\pi+x_1^2+x_2^2\bigr).$

Tables 1 and 2 present the computational results (errors and CPU times) of the SLR, Imp-LR, and MDI-LR methods for approximating $I_2(f)$ and $I_2(\widehat{f})$, respectively. Recall that the Imp-LR is obtained by adding some integration points on the boundary of the domain in the transformed coordinates, and the MDI-LR algorithm provides a fast implementation of the Imp-LR using the MDI approach. From Tables 1 and 2, we observe that all three methods require very little CPU time. The difference is almost negligible, although the SLR is faster than the other two methods. Moreover, the Imp-LR and MDI-LR methods use function values at some additional sampling points on the boundary, which leads to higher accuracy compared to the SLR method, as we predicted earlier.

Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 1.332×10^{-2} | 0.0422 | 1.218×10^{-3} | 0.0423 | 1.218×10^{-3} | 0.0877
501 | 5.169×10^{-3} | 0.0567 | 2.520×10^{-4} | 0.0547 | 2.520×10^{-4} | 0.3230
1001 | 4.051×10^{-3} | 0.0610 | 1.269×10^{-4} | 0.0657 | 1.269×10^{-4} | 0.5147
5001 | 2.570×10^{-3} | 0.0755 | 2.489×10^{-5} | 0.0754 | 2.489×10^{-5} | 1.6242
10001 | 2.094×10^{-4} | 0.0922 | 1.220×10^{-5} | 0.0921 | 1.220×10^{-5} | 3.9471
40001 | 7.294×10^{-5} | 0.1782 | 3.050×10^{-6} | 0.1787 | 3.050×10^{-6} | 7.0408
Table 1: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=N$ for approximating $I_2(f)$.
Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 1.163×10^{-2} | 0.0415 | 1.072×10^{-3} | 0.0410 | 1.072×10^{-3} | 0.0980
501 | 6.794×10^{-3} | 0.0539 | 1.399×10^{-4} | 0.0546 | 1.399×10^{-4} | 0.3498
1001 | 3.814×10^{-3} | 0.0647 | 7.040×10^{-5} | 0.0653 | 7.040×10^{-5} | 0.5028
5001 | 1.858×10^{-3} | 0.0723 | 1.3411×10^{-5} | 0.0733 | 1.341×10^{-5} | 1.7212
10001 | 1.175×10^{-4} | 0.0965 | 6.759×10^{-6} | 0.0945 | 6.759×10^{-6} | 3.4528
40001 | 2.937×10^{-5} | 0.1386 | 1.689×10^{-6} | 0.1399 | 1.689×10^{-6} | 6.1104
Table 2: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=N$ for approximating $I_2(\widehat{f})$.

Test 2. Let $\Omega=[0,1]^3$ and consider the following 3-d integrands:

(5.2)  $f(x):=\frac{\exp(x_1+x_2+x_3)}{(e-1)^3};\qquad \widehat{f}(x):=\sin\bigl(2\pi+x_1^2+x_2^2+x_3^2\bigr).$

Tables 3 and 4 present the simulation results (errors and CPU times) of the SLR, Imp-LR, and MDI-LR methods for computing $I_3(f)$ and $I_3(\widehat{f})$ in Test 2. We observe that the SLR method requires less CPU time in both simulations; the advantage of the MDI-LR method in accelerating the computation does not materialize in low dimensions, as seen in Test 1. Once again, the Imp-LR and MDI-LR methods have higher accuracy compared to the SLR method because they use additional sampling points on the boundary of the transformed domain.

Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 3.426×10^{-3} | 0.0574 | 4.985×10^{-3} | 0.0588 | 4.985×10^{-3} | 0.0877
1001 | 6.276×10^{-3} | 0.0634 | 1.249×10^{-3} | 0.0654 | 1.249×10^{-3} | 0.2684
10001 | 9.920×10^{-4} | 0.0833 | 3.124×10^{-4} | 0.0877 | 3.124×10^{-4} | 0.6322
100001 | 5.717×10^{-4} | 0.1500 | 5.907×10^{-5} | 0.1499 | 5.907×10^{-5} | 2.5866
1000001 | 1.369×10^{-5} | 1.0589 | 1.249×10^{-5} | 1.0587 | 1.249×10^{-5} | 14.737
10000001 | 8.441×10^{-6} | 9.8969 | 3.124×10^{-6} | 10.280 | 3.124×10^{-6} | 91.897
Table 3: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=n_3=N$ for computing $I_3(f)$.
Total nodes (n) | SLR rel. error | SLR CPU time | Imp-LR rel. error | Imp-LR CPU time | MDI-LR rel. error | MDI-LR CPU time
101 | 1.866×10^{-2} | 0.0580 | 1.008×10^{-3} | 0.0554 | 1.008×10^{-3} | 0.1366
1001 | 9.746×10^{-3} | 0.0628 | 2.739×10^{-4} | 0.0649 | 2.739×10^{-4} | 0.3804
10001 | 1.001×10^{-3} | 0.0820 | 6.337×10^{-5} | 0.0828 | 6.337×10^{-5} | 1.1032
100001 | 7.063×10^{-4} | 0.1443 | 1.326×10^{-5} | 0.1557 | 1.326×10^{-5} | 4.8794
1000001 | 2.211×10^{-5} | 1.1163 | 2.810×10^{-6} | 1.2104 | 2.810×10^{-6} | 20.305
10000001 | 1.650×10^{-5} | 10.207 | 7.026×10^{-7} | 10.427 | 7.026×10^{-7} | 101.22
Table 4: Relative errors and CPU times of SLR, Improved LR, and MDI-LR simulations with $N=[n^{\frac{1}{d}}]$, $a=N$, $n_1=n_2=n_3=N$ for computing $I_3(\widehat{f})$.

5.2 High-dimensional tests

Since the MDI-LR method is designed for computing high-dimensional integrals, its performance for $d\gg 1$ is more important and anticipated, and examining it is indeed the main task of this subsection. First, we test and compare the performance (in terms of CPU time) of the SLR, Imp-LR, and MDI-LR methods for computing high-dimensional integrals as the number of lattice points grows with the increasing dimension. Then we also test the performance of the SLR and MDI-LR methods for computing high-dimensional integrals when the number of lattice points increases slowly with the dimension $d$.

Test 3. Let $\Omega=[0,1]^d$ for $2\leq d\leq 50$ and consider the following Gaussian integrand:

(5.3)  $f(x)=\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{1}{2}|x|^2\Bigr),$

where $|x|$ stands for the Euclidean norm of the vector $x\in\mathbb{R}^d$.

Table 5 shows the relative errors and CPU times of the SLR, Imp-LR, and MDI-LR methods for approximating the Gaussian integral $I_d(f)$. The simulation results indicate that the SLR and Imp-LR methods are more efficient when $d<7$, but they struggle to compute the integrals when $d>11$ as the number of lattice points increases exponentially with the dimension. However, this is not a problem for the MDI-LR method, which can compute this high-dimensional integral easily. Moreover, the MDI-LR method improves the accuracy of the original QMC rule significantly by adding some integration points on the boundary of the transformed domain.

| Dimension (d) | SLR relative error | SLR CPU time (s) | Imp-LR relative error | Imp-LR CPU time (s) | MDI-LR relative error | MDI-LR CPU time (s) |
| 2 | 4.802×10^{-3} | 0.0622 | 5.398×10^{-4} | 0.0637 | 5.398×10^{-4} | 0.1335 |
| 4 | 3.796×10^{-3} | 0.1068 | 1.131×10^{-3} | 0.1206 | 1.131×10^{-3} | 0.5780 |
| 6 | 7.780×10^{-3} | 1.2450 | 1.723×10^{-3} | 1.2745 | 1.723×10^{-3} | 1.2890 |
| 8 | 1.189×10^{-2} | 124.91 | 2.315×10^{-3} | 126.85 | 2.315×10^{-3} | 1.4083 |
| 10 | 1.602×10^{-2} | 13084 | 2.908×10^{-3} | 13255 | 2.908×10^{-3} | 3.1418 |
| 11 | 1.809×10^{-2} | 132927 | 3.204×10^{-3} | 141665 | 3.204×10^{-3} | 3.8265 |
| 12 | failed | failed | failed | failed | 3.501×10^{-3} | 4.5919 |
Table 5: Relative errors and CPU times of the SLR, Imp-LR, and MDI-LR simulations with N=[n^{\frac{1}{d}}], a=N, and n_{1}=\cdots=n_{d}=N for computing I_{d}(f); the SLR uses 1+10^{d} nodes, while Imp-LR and MDI-LR use 1.1×10^{d} nodes.
| Dimension (d) | Total nodes (n) | a value | SLR relative error | SLR CPU time (s) | MDI-LR relative error | MDI-LR CPU time (s) |
| 2 | 1+10^{3} | 31 | 4.8020×10^{-4} | 0.0369 | 6.1474×10^{-5} | 0.432905 |
| 6 | 1+10^{6} | 10 | 7.7798×10^{-3} | 1.2450 | 1.7745×10^{-3} | 0.790102 |
| 10 | 1+10^{6} | 4 | 5.3673×10^{-2} | 1.2453 | 1.8683×10^{-2} | 0.582487 |
| 14 | 1+10^{8} | 4 | 7.9282×10^{-2} | 144.759 | 2.6253×10^{-2} | 0.536131 |
| 18 | 1+10^{9} | 3 | 1.5827×10^{-1} | 1649.59 | 6.1158×10^{-2} | 0.774606 |
| 22 | 1+10^{10} | 3 | 2.0007×10^{-1} | 18694.04 | 7.5249×10^{-2} | 0.702708 |
| 26 | 1+10^{11} | 3 | 2.4341×10^{-1} | 217381.41 | 8.9527×10^{-2} | 0.866122 |
| 30 | 1+10^{11} | 3 | 2.9009×10^{-1} | 269850.87 | 1.0399×10^{-1} | 1.045107 |
Table 6: Relative errors and CPU times of the SLR and MDI-LR simulations with the same number of integration points for computing I_{d}(f).

Table 6 shows the relative errors and CPU times of the SLR and MDI-LR methods for computing I_{d}(f) when the number of lattice points increases slowly with the dimension d. As the dimension increases, the CPU time required by the SLR method grows sharply (see Figure 6). When approximating the Gaussian integral in about 30 dimensions with 10^{11} lattice points, the SLR method requires about 74 hours to obtain a result of relatively low accuracy. In contrast, the MDI-LR method takes only about one second to obtain a more accurate value, which demonstrates that the acceleration effect of the MDI-LR method is quite dramatic.


Figure 6: CPU time comparison of the SLR and MDI-LR simulations: the number of lattice points grows exponentially with the dimension (left); the number of lattice points increases slowly with the dimension (right).

It is well known that obtaining high-accuracy approximations in high dimensions is difficult because the number of integration points required is enormous. A natural question is whether the MDI-LR method can handle very high-dimensional (i.e., d\approx 1000) integration with reasonable accuracy. First, we note that the answer is machine dependent, as expected. Next, we present a test on the computer at our disposal that provides a positive answer to this question.

Test 4. Let \Omega=[0,1]^{d} and consider the following integrands:

(5.4) f(x)=\exp\Bigl{(}\sum_{i=1}^{d}(-1)^{i+1}x_{i}\Bigr{)},\qquad\widehat{f}(x)=\prod_{i=1}^{d}\frac{1}{0.9^{2}+(x_{i}-0.6)^{2}}.

We use the algorithm MDI-LR to compute I_{d}(f) and I_{d}(\widehat{f}) with parameters a=8 and a=20, respectively, and an increasing sequence of d. The computed results are presented in Table 7. The simulation is stopped at d=1000 because it is already in the very high dimension regime. These tests demonstrate the efficacy and potential of the MDI-LR method for efficiently computing high-dimensional integrals. However, we note that in terms of efficiency and accuracy, the MDI-LR method underperforms its two companion methods, the MDI-TP [11] and MDI-SG [12] methods. The main reason for the underperformance is that the original lattice rule is unable to provide high-accuracy integral approximations, and MDI-LR is only a fast implementation algorithm (i.e., a solver) for the original lattice rule. Nevertheless, the lattice rule has its own advantages, such as allowing flexible numbers of integration points and giving better results for periodic integrands.

| Dimension (d) | a value | I_{d}(f) relative error (1×8^{d} nodes) | CPU time (s) | a value | I_{d}(\widehat{f}) relative error (1×20^{d} nodes) | CPU time (s) |
| 10 | 8 | 6.4884×10^{-3} | 0.4329063 | 20 | 1.6107×10^{-3} | 0.9851172 |
| 100 | 8 | 6.3022×10^{-2} | 71.253076 | 20 | 1.6225×10^{-2} | 11.1203255 |
| 300 | 8 | 1.7740×10^{-1} | 1856.91018 | 20 | 4.9469×10^{-2} | 37.0903112 |
| 500 | 8 | 2.7781×10^{-1} | 8076.92429 | 20 | 8.3801×10^{-2} | 65.9497657 |
| 700 | 8 | 3.6597×10^{-1} | 20969.96162 | 20 | 1.1925×10^{-1} | 108.989057 |
| 900 | 8 | 4.4337×10^{-1} | 47870.50843 | 20 | 1.5587×10^{-1} | 157.487672 |
| 1000 | 8 | 4.7845×10^{-1} | 69991.88017 | 20 | 1.7462×10^{-1} | 189.132615 |
Table 7: Computed results for I_{d}(f) and I_{d}(\widehat{f}) by the algorithm MDI-LR.

6 Influence of parameters

The original MDI algorithm involves three crucial input parameters: r, m, and N. The parameter r determines the base one-dimensional quadrature rule, m sets the step size of the dimension iteration, and N is the number of integration points in each coordinate direction. The algorithm MDI-LR is similar to the original MDI but uses the QMC rank-one lattice rule with generating vector \mathbf{z}, so the parameter r is not needed. Here we focus on the Korobov approach for constructing the generating vector, namely \mathbf{z}=\mathbf{z}(a):=(1,a,a^{2},\cdots,a^{d-1}). Moreover, the improved tensor product rule (in the transformed coordinate system) implemented by the algorithm Imp-LR has variable upper limits in the summation (cf. (4.3)); hence, N is now replaced by \mathbf{N}_{d}, which is determined by the underlying QMC lattice rule. Furthermore, as explained earlier, we set m=1 based on our experience in [11]. As a result, the only parameter left to select is a. Below, we first test the influence of the Korobov parameter a on the efficiency of the algorithm MDI-LR and then test the dependence of its performance on \mathbf{N}_{d} and d.

6.1 Influence of parameter a

In this subsection, we investigate the impact of the Korobov generating vector \mathbf{z}=\mathbf{z}(a):=(1,a,a^{2},\cdots,a^{d-1}) on the algorithm MDI-LR. We note that similar methods can be constructed using other choices of \mathbf{z}.

Test 5. Let \Omega=[0,1]^{d} and consider the following integrands:

f(x)=\frac{1}{\sqrt{2\pi}}\exp\Bigl{(}-\frac{1}{2}|x|^{2}\Bigr{)},\qquad\widehat{f}(x)=\cos\Bigl{(}2\pi+2\sum_{i=1}^{d}x_{i}\Bigr{)},
\widetilde{f}(x)=\prod_{i=1}^{d}\frac{1}{0.9^{2}+(x_{i}-0.6)^{2}}.

We compare the performance of the algorithm MDI-LR with different Korobov parameters a, while holding the other parameters unchanged, when computing I_{d}(f), I_{d}(\widehat{f}), and I_{d}(\widetilde{f}).


Figure 7: Performance comparison of the algorithm MDI-LR with n=1+10^{d} and a=4,6,8,10,12,14,16 for computing I_{d}(f), I_{d}(\widehat{f}), and I_{d}(\widetilde{f}). Top left: d=5, CPU time comparison. Top right: d=10, CPU time comparison. Bottom left: d=5, comparison of relative errors. Bottom right: d=10, comparison of relative errors.

Figure 7 shows the computed results for d=5,10 and a=4,6,8,10,12,14,16, respectively. We observe that the algorithm MDI-LR with different parameters a has different accuracy, and the effect can be significant. These results indicate that the algorithm is most efficient when a=N, where N=[n^{\frac{1}{d}}] and n is the total number of integration points. When a smaller a is used, fewer integration points need to be evaluated in each coordinate direction in the first d-1 dimension iterations, but since the total number of integration points n is the same, the amount of computation increases dramatically. When a larger a is used, more integration points must be evaluated in each coordinate direction in the first d-1 dimension iterations. Only when the integration points are distributed equally among the coordinate directions is the efficiency of the algorithm MDI-LR optimized. The distribution of 100 points is shown in Figure 8. When a=2, only 2 iterations in the x_{1}-direction are needed, but 50 iterations in the x_{2}-direction must be performed; hence, a total of 52 iterations in the two directions is required. On the other hand, when a=20, a total of 25 iterations in the two directions is required. It is easy to check that the smallest total of 20 iterations occurs when a=10. The difference in accuracy is also expected, because different a lead to different generating vectors \mathbf{z}, which in turn result in different integration points. We note that how to choose a to achieve the highest accuracy has been well studied in the literature (cf. [5, 23]).
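In formula form, distributing the n integration points as an a\times(n/a) grid over two (transformed) coordinate directions gives the two-direction iteration count T(a)=a+n/a; for the n=100 points of Figure 8,

T(2)=2+\frac{100}{2}=52,\qquad T(20)=20+\frac{100}{20}=25,\qquad T(10)=10+\frac{100}{10}=20,

and by the AM-GM inequality T(a)=a+\frac{n}{a}\geq 2\sqrt{n}=20, with equality exactly at a=\sqrt{n}=10. This is the arithmetic behind the recommendation a=N=[n^{\frac{1}{d}}].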


Figure 8: Distribution of 100 integration points in the transformed coordinate system for a=2, 10, and 20, respectively.

6.2 Influence of parameter N=[n^{\frac{1}{d}}]

The previous subsection showed that the algorithm is most efficient when a=N, where N is the number of integration points in each direction. This subsection investigates the impact of N on the MDI-LR algorithm. For this purpose, we conduct tests with a=N and d=5, 10.

Test 6. Let \Omega, f, \widehat{f}, and \widetilde{f} be the same as in Test 5.

Tables 8, 9, and 10 present a performance comparison of the algorithm MDI-LR with d=5, 10 and N=4,6,8,10,12,14,16, respectively. We note that the quality of the computed results also depends on the type of integrand. As expected, more integration points must be used to achieve good accuracy for highly oscillatory or fast-growing integrands.

| N (n) | Korobov parameter (a) | d=5 relative error | d=5 CPU time (s) | d=10 relative error | d=10 CPU time (s) |
| 4 (1+4^{d}) | 4 | 8.6248×10^{-3} | 0.1456465 | 1.8003×10^{-2} | 0.3336161 |
| 6 (1+6^{d}) | 6 | 3.8967×10^{-3} | 0.1911801 | 8.0284×10^{-3} | 0.5690320 |
| 8 (1+8^{d}) | 8 | 2.2145×10^{-3} | 0.3373442 | 4.5314×10^{-3} | 0.9552591 |
| 10 (1+10^{d}) | 10 | 1.4271×10^{-3} | 0.3884146 | 2.9078×10^{-3} | 1.9385378 |
| 12 (1+12^{d}) | 12 | 9.9601×10^{-4} | 0.6545521 | 2.0234×10^{-3} | 3.5639475 |
| 14 (1+14^{d}) | 14 | 7.3448×10^{-4} | 0.7224777 | 1.4889×10^{-3} | 6.0036393 |
| 16 (1+16^{d}) | 16 | 5.6396×10^{-4} | 1.0909097 | 1.1414×10^{-3} | 8.4313528 |
Table 8: Performance comparison of the algorithm MDI-LR with d=5,10, a=N, and N=4,6,8,10,12,14,16 for computing I_{d}(f).
| N (n) | Korobov parameter (a) | d=5 relative error | d=5 CPU time (s) | d=10 relative error | d=10 CPU time (s) |
| 4 (1+4^{d}) | 4 | 4.9621×10^{-2} | 0.1323887 | 1.0585×10^{-1} | 0.2775234 |
| 6 (1+6^{d}) | 6 | 2.2181×10^{-2} | 0.1955847 | 4.6141×10^{-2} | 0.4267706 |
| 8 (1+8^{d}) | 8 | 1.2558×10^{-2} | 0.2689113 | 2.5836×10^{-2} | 0.5697773 |
| 10 (1+10^{d}) | 10 | 8.0791×10^{-3} | 0.3227299 | 1.6517×10^{-2} | 0.7828456 |
| 12 (1+12^{d}) | 12 | 5.6328×10^{-3} | 0.4056192 | 1.1470×10^{-2} | 0.9228344 |
| 14 (1+14^{d}) | 14 | 4.1513×10^{-3} | 0.4940739 | 8.4305×10^{-3} | 1.0968489 |
| 16 (1+16^{d}) | 16 | 3.1863×10^{-3} | 0.6079693 | 6.4576×10^{-3} | 1.2933549 |
Table 9: Performance comparison of the algorithm MDI-LR with d=5,10, a=N, and N=4,6,8,10,12,14,16 for computing I_{d}(\widehat{f}).
| N (n) | Korobov parameter (a) | d=5 relative error | d=5 CPU time (s) | d=10 relative error | d=10 CPU time (s) |
| 4 (1+4^{d}) | 4 | 1.9003×10^{-2} | 0.1254485 | 3.9895×10^{-2} | 0.2460844 |
| 6 (1+6^{d}) | 6 | 8.5331×10^{-3} | 0.1802281 | 1.7625×10^{-2} | 0.3613987 |
| 8 (1+8^{d}) | 8 | 4.8390×10^{-3} | 0.2114595 | 9.9155×10^{-3} | 0.4414383 |
| 10 (1+10^{d}) | 10 | 3.1153×10^{-3} | 0.2748469 | 6.3531×10^{-3} | 0.4892808 |
| 12 (1+12^{d}) | 12 | 2.1729×10^{-3} | 0.3092816 | 4.4172×10^{-3} | 0.5859328 |
| 14 (1+14^{d}) | 14 | 1.6018×10^{-3} | 0.3602077 | 3.2488×10^{-3} | 0.6783681 |
| 16 (1+16^{d}) | 16 | 1.2296×10^{-3} | 0.4157161 | 2.4897×10^{-3} | 0.7819849 |
Table 10: Performance comparison of the algorithm MDI-LR with d=5,10, a=N, and N=4,6,8,10,12,14,16 for computing I_{d}(\widetilde{f}).

7 Computational complexity

7.1 The relationship between the CPU time and N

In this subsection, we examine the relationship between the CPU time and the parameter N=[n^{\frac{1}{d}}] (with a=N) using a regression technique based on the test data.

| Integrand | a | m | d | Fitting function | R-square |
| f(x) | N | 1 | 5 | h_{1}(N)=(0.007569)*N^{1.772} | 0.9687 |
| \widehat{f}(x) | N | 1 | 5 | h_{2}(N)=(0.02326)*N^{1.165} | 0.9920 |
| \widetilde{f}(x) | N | 1 | 5 | h_{3}(N)=(0.03592)*N^{0.8767} | 0.9946 |
| f(x) | N | 1 | 10 | h_{4}(N)=(0.002136)*N^{2.992} | 0.9968 |
| \widehat{f}(x) | N | 1 | 10 | h_{5}(N)=(0.05679)*N^{1.125} | 0.9984 |
| \widetilde{f}(x) | N | 1 | 10 | h_{6}(N)=(0.07872)*N^{0.8184} | 0.9901 |
Table 11: Relationship between the CPU time and the parameter N.
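The fits in Table 11 were obtained in Matlab; the following Python sketch shows one way to reproduce such a power-law fit h(N)=c\,N^{p} by linear least squares in log-log coordinates, together with the R-square measure defined in Section 7.2. This is a hypothetical reimplementation, not the authors' script, and a log-log fit may return slightly different constants than Matlab's nonlinear fitting.

```python
import numpy as np

def fit_power_law(N, T):
    # Fit T ~ c * N**p via linear least squares on log(T) = log(c) + p*log(N)
    p, log_c = np.polyfit(np.log(N), np.log(T), 1)
    return np.exp(log_c), p

def r_square(y, y_hat):
    # R-square = 1 - SS_res/SS_tot (cf. Section 7.2)
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Example with the d = 5 data for I_d(f) from Table 8: N versus CPU time (s)
N = np.array([4, 6, 8, 10, 12, 14, 16], dtype=float)
T = np.array([0.1456, 0.1912, 0.3373, 0.3884, 0.6546, 0.7225, 1.0909])
c, p = fit_power_law(N, T)
print(f"CPU time ~ {c:.4g} * N^{p:.3g}, R-square = {r_square(T, c * N**p):.4f}")
```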

Figures 9 and 10 show the CPU time as a function of N obtained by least squares regression, with the fitted functions given in Table 11. All results show that the CPU time grows no faster than N^{3}.


Figure 9: The relationship between the CPU time and the parameter N when d=5: I_{d}(f) (left), I_{d}(\widehat{f}) (middle), I_{d}(\widetilde{f}) (right).


Figure 10: The relationship between the CPU time and the parameter N when d=10: I_{d}(f) (left), I_{d}(\widehat{f}) (middle), I_{d}(\widetilde{f}) (right).

7.2 The relationship between the CPU time and the dimension d

In this subsection, we examine the computational complexity (in terms of the CPU time as a function of d) using least squares regression on numerical test data.

Test 7. Let \Omega=[0,1]^{d} and consider the following six integrands:

f_{1}(x)=\exp\Bigl{(}\sum_{i=1}^{d}(-1)^{i+1}x_{i}\Bigr{)},\qquad f_{2}(x)=\prod_{i=1}^{d}\frac{1}{0.9^{2}+(x_{i}-0.6)^{2}},
f_{3}(x)=\frac{1}{\sqrt{2\pi}}\exp\Bigl{(}-\frac{1}{2}|x|^{2}\Bigr{)},\qquad f_{4}(x)=\cos\Bigl{(}2\pi+\sum_{i=1}^{d}2x_{i}\Bigr{)},
f_{5}(x)=\exp\Bigl{(}\sum_{i=1}^{d}(-1)^{i+1}x_{i}^{2}\Bigr{)},\qquad f_{6}(x)=\Bigl{(}1+\sum_{i=1}^{d}x_{i}\Bigr{)}^{-(d+1)}.

Figure 11 displays the CPU time as a function of d obtained by least squares regression; the analytical expressions of the fitted functions are given in Table 12. We note that the parameters of the algorithm MDI-LR only affect the coefficients of the fitted functions, not the powers of the polynomials. These results show that the CPU time required by the proposed algorithm MDI-LR grows at most with polynomial order O(d^{3}N^{2}).

| Integrand | a | N | m | Fitting function | R-square |
| f_{1} | 8 | 8 | 1 | g_{1}=(1.057e-06)*N^{2}d^{3} | 0.9973 |
| f_{1} | 10 | 10 | 1 | g_{2}=(1.192e-06)*N^{2}d^{3} | 0.9995 |
| f_{1} | 20 | 20 | 1 | g_{3}=(1.433e-06)*N^{2}d^{3} | 0.9978 |
| f_{2} | 10 | 10 | 1 | g_{4}=0.0001774*N*d^{1.611} | 0.9983 |
| f_{2} | 14 | 14 | 1 | g_{5}=0.003028*N*d^{1.147} | 0.9987 |
| f_{2} | 20 | 20 | 1 | g_{6}=0.000539*N*d^{1.41} | 0.9964 |
| f_{3} | 8 | 8 | 1 | g_{7}=(7.334e-06)*N^{2}d^{3} | 0.9983 |
| f_{3} | 10 | 10 | 1 | g_{8}=(9.321e-06)*N^{2}d^{3} | 0.9986 |
| f_{3} | 14 | 14 | 1 | g_{9}=(1.339e-05)*N^{2}d^{3} | 0.9972 |
| f_{4} | 10 | 10 | 1 | g_{10}=(1.164e-06)*N^{2}d^{3} | 0.9988 |
| f_{4} | 20 | 20 | 1 | g_{11}=(1.319e-06)*N^{2}d^{3} | 0.9974 |
| f_{5} | 10 | 10 | 1 | g_{12}=(6.479e-05)*N^{2}d^{2.557} | 0.9996 |
| f_{5} | 14 | 14 | 1 | g_{13}=(1.164e-05)*N^{2}d^{3} | 0.9993 |
| f_{6} | 10 | 10 | 1 | g_{14}=(1.556e-06)*N^{2}d^{3} | 0.9983 |
| f_{6} | 20 | 20 | 1 | g_{15}=(8.328e-06)*N^{2}d^{2.431} | 0.9998 |
Table 12: The relationship between the CPU time and the dimension d.

We assess the quality of the fitted curves using the R-square criterion in Matlab, defined by R\text{-square}=1-\frac{\sum_{i=1}^{n}(y_{i}-\widehat{y}_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}}, where y_{i} is a test data output, \widehat{y}_{i} is the predicted value, and \overline{y} is the mean of the y_{i}. As shown in Table 12, the R-square values of all fitted functions are close to 1, indicating their high accuracy. These results support the observation that the CPU time grows no more than cubically with the dimension d. Combined with the results of Section 7.1, we conclude that the computational cost of the proposed MDI-LR algorithm scales at most polynomially, in the order of O(N^{2}d^{3}).


Figure 11: The relationship between the CPU time and the dimension d.

8 Conclusions

In this paper, we introduced an efficient and fast algorithm, MDI-LR, for implementing QMC lattice rules for high-dimensional numerical integration. It is based on the idea of converting and extending a lattice rule into a tensor product rule by an affine transformation, and on adopting the multilevel dimension iteration approach, which computes the function evaluations (at the integration points) in the multi-summation in clusters and iterates along each (transformed) coordinate direction so that many computations can be reused. Based on numerical simulation results, we concluded that the computational complexity of the algorithm MDI-LR (in terms of CPU time) grows at most cubically in the dimension d, with an overall growth rate of O(d^{3}N^{2}). This suggests that the proposed algorithm can effectively mitigate the curse of dimensionality in high-dimensional numerical integration, making the QMC lattice rule not only competitive but also practically useful in high dimensions. Extensive numerical tests were provided to gauge the performance of the algorithm MDI-LR and to compare it with the standard implementation of QMC lattice rules. Extensions to general Monte Carlo methods and applications of the MDI-LR algorithm to solving high-dimensional PDEs will be explored and reported in a forthcoming work.

References

  • [1] N. M. Korobov, The approximate computation of multiple integrals, Dokl. Akad. Nauk SSSR, 124:1207–1210, 1959 (in Russian).
  • [2] I. H. Sloan and S. Joe, Lattice Methods for Multiple Integration, Oxford University Press, 1994.
  • [3] J. Dick, F. Y. Kuo, and I. H. Sloan, High-dimensional integration: the quasi-Monte Carlo way, Acta Numer., 22:133–288, 2013.
  • [4] X. Wang, I. H. Sloan, and J. Dick, On Korobov lattice rules in weighted spaces, SIAM J. Numer. Anal., 42(4):1760–1779, 2004.
  • [5] N. M. Korobov, Properties and calculation of optimal coefficients, Dokl. Akad. Nauk SSSR, 132(5):1009–1012, 1960.
  • [6] H. Niederreiter and A. Winterhof, Applied Number Theory, Springer, Cham, 2015.
  • [7] H. Niederreiter, Quasi-Monte Carlo methods and pseudo-random numbers, Bull. Amer. Math. Soc., 84:957–1041, 1978.
  • [8] H. Niederreiter, Existence of good lattice points in the sense of Hlawka, Monatsh. Math., 86(3):203–219, 1978.
  • [9] I. H. Sloan and A. Reztsov, Component-by-component construction of good lattice rules, Math. Comp., 71(237):263–273, 2002.
  • [10] P. Kritzer, H. Niederreiter, and F. Pillichshammer, Ian Sloan and lattice rules, in Contemporary Computational Mathematics - A Celebration of the 80th Birthday of Ian Sloan, pages 741–769, 2018.
  • [11] X. Feng and H. Zhong, A fast multilevel dimension iteration algorithm for high dimensional numerical integration, preprint.
  • [12] H. Zhong and X. Feng, An efficient and fast sparse grid algorithm for high dimensional numerical integration, preprint.
  • [13] P. Bratley, B. Fox, and H. Niederreiter, Implementation and tests of low discrepancy sequences, ACM Trans. Model. Comput. Simul., 2(3):195–213, 1992.
  • [14] H. Faure, Good permutations for extreme discrepancy, J. Number Theory, 42:47–56, 1992.
  • [15] I. Sobol, Uniformly distributed sequences with an additional uniform property, USSR Comput. Math. Math. Phys., 16:236–242, 1977.
  • [16] H. Niederreiter, Low-discrepancy and low-dispersion sequences, J. Number Theory, 30:51–70, 1988.
  • [17] H.-J. Bungartz and M. Griebel, Sparse grids, Acta Numer., 13:147–269, 2004.
  • [18] T. Gerstner and M. Griebel, Numerical integration using sparse grids, Numer. Algorithms, 18(3):209–232, 1998.
  • [19] R. E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numer., 7:1–49, 1998.
  • [20] Y. Ogata, A Monte Carlo method for high dimensional integration, Numer. Math., 55:137–157, 1989.
  • [21] F. Y. Kuo, C. Schwab, and I. H. Sloan, Quasi-Monte Carlo methods for high-dimensional integration: the standard (weighted Hilbert space) setting and beyond, ANZIAM J., 53:1–37, 2011.
  • [22] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, 1992.
  • [23] S. Haber, Experiments on optimal coefficients, in Applications of Number Theory to Numerical Analysis, Academic Press, pages 11–37, 1972.
  • [24] A. R. Krommer and C. W. Ueberhuber, Computational Integration, SIAM, 1998.