
J. Dick: School of Mathematics and Statistics, University of New South Wales, Sydney NSW 2052, Australia. Email: [email protected]

T. Goda, H. Murata: School of Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan. Email: [email protected], [email protected]

Toeplitz Monte Carlo

Josef Dick    Takashi Goda    Hiroya Murata
(Received: date / Accepted: date)
Abstract

Motivated mainly by applications to partial differential equations with random coefficients, we introduce a new class of Monte Carlo estimators, called the Toeplitz Monte Carlo (TMC) estimator, for approximating the integral of a multivariate function with respect to the direct product of an identical univariate probability measure. The TMC estimator generates a sequence $x_{1},x_{2},\ldots$ of i.i.d. samples for one random variable, and then uses $(x_{n+s-1},x_{n+s-2},\ldots,x_{n})$ with $n=1,2,\ldots$ as quadrature points, where $s$ denotes the dimension. Although consecutive points have some dependency, the concatenation of all quadrature nodes forms a Toeplitz matrix, which allows for a fast matrix-vector multiplication. In this paper we study the variance of the TMC estimator and its dependence on the dimension $s$. Numerical experiments confirm a considerable efficiency improvement over the standard Monte Carlo estimator for applications to partial differential equations with random coefficients, particularly when the dimension $s$ is large.

Keywords:
Monte Carlo · Toeplitz matrix · fast matrix-vector multiplication · high-dimensional integration · PDEs with random coefficients
MSC:
65C05

1 Introduction

The motivation of this research mainly comes from applications to uncertainty quantification for ordinary or partial differential equations with random coefficients. The problem we are interested in is to estimate an expectation (integral)

Iρs(f):=Ωsf(𝒙)ρs(𝒙)d𝒙withρs(𝒙)=j=1sρ(xj),I_{\rho_{s}}(f):=\int_{\Omega^{s}}f(\boldsymbol{x})\rho_{s}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\quad\text{with}\quad\rho_{s}(\boldsymbol{x})=\prod_{j=1}^{s}\rho(x_{j}),

for large ss with ρ\rho being the univariate probability density function defined over Ω\Omega\subseteq\mathbb{R}. In some applications, the integrand is of the form

f(𝒙)=g(𝒙A),f(\boldsymbol{x})=g(\boldsymbol{x}A),

for a matrix As×tA\in\mathbb{R}^{s\times t} and a function g:tg\colon\mathbb{R}^{t}\to\mathbb{R}, see Dick et al. (2015). Here we note that 𝒙\boldsymbol{x} is defined as a row vector. Typically, ρ\rho is given by the uniform distribution on the unit interval Ω=[0,1]\Omega=[0,1], or by the standard normal distribution on the real line Ω=\Omega=\mathbb{R}.

The standard Monte Carlo method approximates Iρs(f)I_{\rho_{s}}(f) as follows: we first generate a sequence of i.i.d. samples of the random variables 𝒙ρs\boldsymbol{x}\sim\rho_{s}:

𝒙1=(x1,1,,xs,1),𝒙2=(x1,2,,xs,2),,\boldsymbol{x}_{1}=(x_{1,1},\ldots,x_{s,1}),\boldsymbol{x}_{2}=(x_{1,2},\ldots,x_{s,2}),\ldots,

and then approximate Iρs(f)I_{\rho_{s}}(f) by

IρsMC(f;N)=1Nn=1Nf(𝒙n)=1Nn=1Ng(𝒙nA).I^{\mathrm{MC}}_{\rho_{s}}(f;N)=\frac{1}{N}\sum_{n=1}^{N}f(\boldsymbol{x}_{n})=\frac{1}{N}\sum_{n=1}^{N}g(\boldsymbol{x}_{n}A). (1)

It is well known that

𝔼[IρsMC(f;N)]=Iρs(f)\mathbb{E}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]=I_{\rho_{s}}(f)

and

𝕍[IρsMC(f;N)]=Iρs(f2)(Iρs(f))2N,\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]=\frac{I_{\rho_{s}}(f^{2})-(I_{\rho_{s}}(f))^{2}}{N}, (2)

which ensures the canonical “one over square root of NN” convergence.

Now let us consider a situation where computing $\boldsymbol{x}_{n}A$ for $n=1,\ldots,N$ takes a significant amount of time in the computation of $I^{\mathrm{MC}}_{\rho_{s}}(f;N)$. In general, if the matrix $A$ does not have any special structure such as circulant, Hankel, Toeplitz, or Vandermonde, then fast matrix-vector multiplication is not available and the computation of $I^{\mathrm{MC}}_{\rho_{s}}(f;N)$ requires $O(Nst)$ arithmetic operations. Some examples where a fast matrix-vector multiplication has been established are the following: in Feischl et al. (2018) the authors use $H$-matrices to obtain an approximation of a covariance matrix which also permits a fast matrix-vector multiplication; in Giles et al. (2008) the authors show how a (partially) fast matrix-vector product can be implemented for multi-asset pricing in finance; Brownian bridge and principal component analysis factorizations of the covariance matrix in finance also permit a fast matrix-vector multiplication (Giles et al. 2008). Here we consider the case where either a fast matrix-vector product is not available, or one wants to avoid $H$-matrices and particular covariance factorizations, since we do not impose any restrictions on $A$.

In order to reduce this computational cost, we propose an alternative, novel Monte Carlo estimator in this paper. Instead of generating a sequence of i.i.d. samples of the vector $\boldsymbol{x}$, we generate a sequence of i.i.d. samples of a single random variable, denoted by $x_{1},x_{2},\ldots$, and then approximate $I_{\rho_{s}}(f)$ by

IρsTMC(f;N)=1Nn=1Nf(𝒙~n)I^{\mathrm{TMC}}_{\rho_{s}}(f;N)=\frac{1}{N}\sum_{n=1}^{N}f(\tilde{\boldsymbol{x}}_{n}) (3)

with

𝒙~n=(xn+s1,,xn).\tilde{\boldsymbol{x}}_{n}=(x_{n+s-1},\ldots,x_{n}).

The computation of IρsTMC(f;N)I^{\mathrm{TMC}}_{\rho_{s}}(f;N) can be done as follows:

Algorithm 1

For N>0N\in\mathbb{Z}_{>0}, let x1,x2,,xN+s1x_{1},x_{2},\ldots,x_{N+s-1} be N+s1N+s-1 i.i.d. samples of a random variable following ρ\rho.

  1. 1.

    Define XN×sX\in\mathbb{R}^{N\times s} by

    X=(xsxs1xs2x1xs+1xsxs1x2xs+2xs+1xsx3xN+s2xN+s3xN+s4xN1xN+s1xN+s2xN+s3xN).X=\begin{pmatrix}x_{s}&x_{s-1}&x_{s-2}&\cdots&x_{1}\\ x_{s+1}&x_{s}&x_{s-1}&\cdots&x_{2}\\ x_{s+2}&x_{s+1}&x_{s}&\cdots&x_{3}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ x_{N+s-2}&x_{N+s-3}&x_{N+s-4}&\cdots&x_{N-1}\\ x_{N+s-1}&x_{N+s-2}&x_{N+s-3}&\cdots&x_{N}\\ \end{pmatrix}.

    Note that XX is a Toeplitz matrix.

  2. 2.

    Compute

    XA=Y=(𝒚1𝒚N)N×t.XA=Y=\begin{pmatrix}\boldsymbol{y}_{1}\\ \vdots\\ \boldsymbol{y}_{N}\end{pmatrix}\in\mathbb{R}^{N\times t}.
  3. 3.

    Then IρsTMC(f;N)I^{\mathrm{TMC}}_{\rho_{s}}(f;N) is given by

    IρsTMC(f;N)=1Nn=1Ng(𝒚n).I^{\mathrm{TMC}}_{\rho_{s}}(f;N)=\frac{1}{N}\sum_{n=1}^{N}g(\boldsymbol{y}_{n}).

The idea behind introducing this algorithm comes from a recent paper by Dick et al. (2015) who consider replacing the point set used in the standard Monte Carlo estimator (1) with a special class of quasi-Monte Carlo point sets which permit a fast matrix-vector multiplication 𝒙nA\boldsymbol{x}_{n}A for n=1,,Nn=1,\ldots,N. This paper considers a sampling scheme different from Dick et al. (2015) while still allowing for a fast matrix-vector multiplication.

When $s$ is quite large, say in the thousands or millions, $N$ has to be set significantly smaller than $2^{s}$. Throughout this paper we consider the case where $N\approx s^{\kappa}$ for some $\kappa>0$. Since the matrix-vector multiplication between a Toeplitz matrix $X$ and each column vector of $A$ can be done with $O(N\log s)$ arithmetic operations by using the fast Fourier transform (Frigo and Johnson 2005), the matrix-matrix multiplication $XA$ appearing in the second item of Algorithm 1 can be done with $O(tN\log s)$ arithmetic operations. This way the necessary computational cost can be reduced from $O(Nst)$ to $O(tN\log s)$, which is the major advantage of using $I^{\mathrm{TMC}}_{\rho_{s}}(f;N)$.
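To make this concrete, the following is a minimal NumPy sketch of Algorithm 1 (not the authors' C implementation; the function names toeplitz_matmul and tmc_estimate are ours). It realizes the product $XA$ column by column through FFT-based linear convolution, whose cost is $O(t(N+s)\log(N+s))$, of the same order as stated above when $N$ grows polynomially in $s$.

```python
import numpy as np

def toeplitz_matmul(x, N, A):
    """Compute X @ A, where X is the N x s Toeplitz matrix whose n-th row is
    (x[n+s-1], ..., x[n]) (0-based), using FFT-based linear convolution.
    x must have length N + s - 1; A is an s x t matrix."""
    s, t = A.shape
    assert len(x) == N + s - 1
    L = N + 2 * s - 2                 # length of the full linear convolution
    fx = np.fft.rfft(x, n=L)
    Y = np.empty((N, t))
    for j in range(t):
        conv = np.fft.irfft(fx * np.fft.rfft(A[:, j], n=L), n=L)
        # (X a)[n] = sum_k x[n+s-1-k] a[k] is entry n+s-1 of the convolution x*a.
        Y[:, j] = conv[s - 1:s - 1 + N]
    return Y

def tmc_estimate(g, A, N, rng):
    """TMC estimate of E[g(x A)] for x with i.i.d. standard normal entries."""
    s = A.shape[0]
    x = rng.standard_normal(N + s - 1)   # one i.i.d. stream of scalar samples
    Y = toeplitz_matmul(x, N, A)         # row n is the product x_tilde_n A
    return np.mean([g(y) for y in Y])

# Example: estimate E[||x A||^2] = trace(A^T A) for a random 500 x 500 matrix A.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) / np.sqrt(500)
print(tmc_estimate(lambda y: y @ y, A, N=4000, rng=rng))
```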

Remark 1

Here we give some comments about the memory requirements and the parallel implementation of the estimators. For the standard Monte Carlo estimator, since each sample $\boldsymbol{x}_{n}$ is generated independently, the corresponding function values $f(\boldsymbol{x}_{n})=g(\boldsymbol{x}_{n}A)$ can be computed in parallel. The memory required to keep one vector $\boldsymbol{x}_{n}A$ until evaluating $g(\boldsymbol{x}_{n}A)$ is only of order $t$. Regarding the TMC estimator, let us assume that $N$ is a multiple of $s$, that is, $N=Ls$ with some $L\in\mathbb{Z}_{>0}$. For each $\ell\in\{1,\ldots,L\}$, define a Toeplitz submatrix

X=(xsxs1x(1)s+1xs+1xsx(1)s+2xs+s1xs+s2xs)s×s.X_{\ell}=\begin{pmatrix}x_{\ell s}&x_{\ell s-1}&\cdots&x_{(\ell-1)s+1}\\ x_{\ell s+1}&x_{\ell s}&\cdots&x_{(\ell-1)s+2}\\ \vdots&\vdots&\ddots&\vdots\\ x_{\ell s+s-1}&x_{\ell s+s-2}&\cdots&x_{\ell s}\\ \end{pmatrix}\in\mathbb{R}^{s\times s}.

In fact, it is easy to see that X=(X1,,XL)X=\left(X_{1}^{\top},\ldots,X_{L}^{\top}\right)^{\top} and

Y=((X1A),,(XLA))N×t.Y=\left((X_{1}A)^{\top},\ldots,(X_{L}A)^{\top}\right)^{\top}\in\mathbb{R}^{N\times t}.

This clearly shows that each XAX_{\ell}A can be computed in parallel by using the fast Fourier transform. Given that the matrix XAs×tX_{\ell}A\in\mathbb{R}^{s\times t} has to be kept in memory until evaluating g(𝒚n)g(\boldsymbol{y}_{n}) for n=(1)s+1,,sn=(\ell-1)s+1,\ldots,\ell s, the required memory size of the TMC estimator is of order stst.
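As a small sanity check of this block structure, the following NumPy/SciPy sketch (variable names are ours) builds the blocks $X_{\ell}$ explicitly and verifies that stacking the products $X_{\ell}A$ reproduces $Y=XA$; in an actual implementation each block product would be computed by a separate worker using the FFT.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
s, L, t = 4, 3, 2
N = L * s
x = rng.standard_normal(N + s - 1)
A = rng.standard_normal((s, t))

# Full N x s Toeplitz matrix X with rows (x[n+s-1], ..., x[n]).
X = toeplitz(x[s - 1:], x[s - 1::-1])

# The s x s blocks X_1, ..., X_L of X; each product X_ell @ A could be
# computed by a separate worker (e.g. via the FFT), and only its s x t
# result needs to be kept in memory.
blocks = [X[(ell - 1) * s: ell * s, :] for ell in range(1, L + 1)]
Y_blocks = np.vstack([B @ A for B in blocks])

print(np.allclose(Y_blocks, X @ A))   # True: stacking the blocks recovers Y = X A
```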

In this paper we call $I^{\mathrm{TMC}}_{\rho_{s}}(f;N)$ a Toeplitz Monte Carlo (TMC) estimator of $I_{\rho_{s}}(f)$, as we rely on the Toeplitz structure of $X$ to achieve a faster computation. (If the sample nodes were given by $\tilde{\boldsymbol{x}}_{n}=(x_{n},\ldots,x_{n+s-1})$ instead of $\tilde{\boldsymbol{x}}_{n}=(x_{n+s-1},\ldots,x_{n})$, the matrix $X$ would become a Hankel matrix, which also allows for a fast matrix-vector multiplication, so we could call our proposal a Hankel Monte Carlo (HMC) estimator instead. However, in the context of Monte Carlo methods, HMC often refers to the Hamiltonian Monte Carlo algorithm, and we avoid this clash of abbreviations by coining the name Toeplitz Monte Carlo.) In the remainder of this paper, we study the variance of the TMC estimator and its dependence on the dimension $s$, and also examine the practical efficiency of the TMC estimator by carrying out numerical experiments for applications to ordinary and partial differential equations with random coefficients.

2 Theoretical results

2.1 Variance analysis

In order to study the variance of $I^{\mathrm{TMC}}_{\rho_{s}}(f;N)$, we introduce the concept of the analysis-of-variance (ANOVA) decomposition of multivariate functions (Hoeffding 1948, Kuo et al. 2010, Sobol' 1993). In what follows, for simplicity of notation, we write $[1:s]=\{1,\ldots,s\}$. For a subset $u\subseteq[1:s]$, we write $-u:=[1:s]\setminus u$ and denote the cardinality of $u$ by $|u|$. Let $f$ be a square-integrable function, i.e., $I_{\rho_{s}}(f^{2})<\infty$. Then $f$ can be decomposed into

f(𝒙)=u[1:s]fu(𝒙u),f(\boldsymbol{x})=\sum_{u\subseteq[1:s]}f_{u}(\boldsymbol{x}_{u}),

where we write 𝒙u=(xj)ju\boldsymbol{x}_{u}=(x_{j})_{j\in u} and each summand is defined recursively by f=Iρs(f)f_{\emptyset}=I_{\rho_{s}}(f) and

fu(𝒙u)=Ωs|u|f(𝒙)ρs|u|(𝒙u)d𝒙uvufv(𝒙v)f_{u}(\boldsymbol{x}_{u})=\int_{\Omega^{s-|u|}}f(\boldsymbol{x})\rho_{s-|u|}(\boldsymbol{x}_{-u})\,\mathrm{d}\boldsymbol{x}_{-u}-\sum_{v\subset u}f_{v}(\boldsymbol{x}_{v})

for u[1:s]\emptyset\neq u\subseteq[1:s]. Regarding this decomposition of multivariate functions, the following properties hold. We refer to Lemmas A.1 & A.3 of Owen (2019) for the proof of the case where ρ\rho is the uniform distribution over the unit interval Ω=[0,1]\Omega=[0,1].

Lemma 1

With the notation above, we have:

  1. 1.

    For any non-empty u[1:s]u\subseteq[1:s] and juj\in u,

    Ωfu(𝒙u)ρ(xj)dxj=0.\int_{\Omega}f_{u}(\boldsymbol{x}_{u})\rho(x_{j})\,\mathrm{d}x_{j}=0.
  2. 2.

    For any u,v[1:s]u,v\subseteq[1:s],

    Iρs(fufv)\displaystyle I_{\rho_{s}}(f_{u}f_{v}) =Ωsfu(𝒙u)fv(𝒙v)ρs(𝒙)d𝒙\displaystyle=\int_{\Omega^{s}}f_{u}(\boldsymbol{x}_{u})f_{v}(\boldsymbol{x}_{v})\rho_{s}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}
    ={Iρs(fu2)if u=v,0otherwise.\displaystyle=\begin{cases}I_{\rho_{s}}(f_{u}^{2})&\text{if $u=v$,}\\ 0&\text{otherwise.}\end{cases}

It follows from the second assertion of Lemma 1 that

Iρs(f2)\displaystyle I_{\rho_{s}}(f^{2}) =Iρs(u,v[1:s]fufv)\displaystyle=I_{\rho_{s}}\left(\sum_{u,v\subseteq[1:s]}f_{u}f_{v}\right)
=u,v[1:s]Iρs(fufv)=u[1:s]Iρs(fu2).\displaystyle=\sum_{u,v\subseteq[1:s]}I_{\rho_{s}}(f_{u}f_{v})=\sum_{u\subseteq[1:s]}I_{\rho_{s}}(f_{u}^{2}).

This equality means that the variance of ff can be expressed as a sum of the variances of the lower-dimensional functions:

Iρs(f2)(Iρs(f))2=u[1:s]Iρs(fu2).I_{\rho_{s}}(f^{2})-(I_{\rho_{s}}(f))^{2}=\sum_{\emptyset\neq u\subseteq[1:s]}I_{\rho_{s}}(f_{u}^{2}). (4)

Using these facts, the variance of the TMC estimator IρsTMC(f;N)I^{\mathrm{TMC}}_{\rho_{s}}(f;N) can be analyzed as follows:

Theorem 2.1

Let N,s2N,s\in\mathbb{Z}_{\geq 2}. Then we have

𝔼[IρsTMC(f;N)]=Iρs(f)\mathbb{E}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]=I_{\rho_{s}}(f)

and

𝕍[IρsTMC(f;N)]=𝕍[IρsMC(f;N)]\displaystyle\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]=\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]
+2N2=1min(s,N)1(N)u[1:s]Iρ|u|(fufu+),\displaystyle\quad+\frac{2}{N^{2}}\sum_{\ell=1}^{\min(s,N)-1}(N-\ell)\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u}f_{u+\ell}),

where we write u+={j+:ju}u+\ell=\{j+\ell\colon j\in u\}.

Note that, in the theorem, we write

Iρ|u|(fufu+)=Ω|u|fu(𝒙u)fu+(𝒙u)ρ|u|(𝒙u)d𝒙u.I_{\rho_{|u|}}(f_{u}f_{u+\ell})=\int_{\Omega^{|u|}}f_{u}(\boldsymbol{x}_{u})f_{u+\ell}(\boldsymbol{x}_{u})\rho_{|u|}(\boldsymbol{x}_{u})\,\mathrm{d}\boldsymbol{x}_{u}.

This should not be confused with

Iρs(fufu+)=Ωsfu(𝒙u)fu+(𝒙u+)ρs(𝒙)d𝒙=0.I_{\rho_{s}}(f_{u}f_{u+\ell})=\int_{\Omega^{s}}f_{u}(\boldsymbol{x}_{u})f_{u+\ell}(\boldsymbol{x}_{u+\ell})\rho_{s}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}=0.
Proof

The first assertion follows immediately from the linearity of expectation and the trivial equality 𝔼[f(𝒙~n)]=Iρs(f)\mathbb{E}[f(\tilde{\boldsymbol{x}}_{n})]=I_{\rho_{s}}(f). For the second assertion, by using the ANOVA decomposition of ff, we have

𝕍[IρsTMC(f;N)]\displaystyle\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]
=𝔼[(IρsTMC(f;N))2](𝔼[IρsTMC(f;N)])2\displaystyle=\mathbb{E}\left[(I^{\mathrm{TMC}}_{\rho_{s}}(f;N))^{2}\right]-\left(\mathbb{E}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]\right)^{2}
=𝔼[(IρsTMC(f;N))2](Iρs(f))2\displaystyle=\mathbb{E}\left[(I^{\mathrm{TMC}}_{\rho_{s}}(f;N))^{2}\right]-\left(I_{\rho_{s}}(f)\right)^{2}
=1N2m,n=1N𝔼[f(𝒙~m)f(𝒙~n)](Iρs(f))2\displaystyle=\frac{1}{N^{2}}\sum_{m,n=1}^{N}\mathbb{E}[f(\tilde{\boldsymbol{x}}_{m})f(\tilde{\boldsymbol{x}}_{n})]-\left(I_{\rho_{s}}(f)\right)^{2}
=1N2n=1N𝔼[(f(𝒙~n))2]+2N2m,n=1m>nN𝔼[f(𝒙~m)f(𝒙~n)]\displaystyle=\frac{1}{N^{2}}\sum_{n=1}^{N}\mathbb{E}[(f(\tilde{\boldsymbol{x}}_{n}))^{2}]+\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\mathbb{E}[f(\tilde{\boldsymbol{x}}_{m})f(\tilde{\boldsymbol{x}}_{n})]
(Iρs(f))2\displaystyle\quad-\left(I_{\rho_{s}}(f)\right)^{2}
=Iρs(f2)N+2N2m,n=1m>nNu,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle=\frac{I_{\rho_{s}}(f^{2})}{N}+\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\sum_{u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
(Iρs(f))2.\displaystyle\quad-\left(I_{\rho_{s}}(f)\right)^{2}.

It follows from the first assertion of Lemma 1 that the second term on the right-most side above becomes

2N2m,n=1m>nNu,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\sum_{u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
=2N2m,n=1m>nNf2+2N2m,n=1m>nNu[1:s]f𝔼[fu(𝒙~m,u)]\displaystyle=\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}f_{\emptyset}^{2}+\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\sum_{\emptyset\neq u\subseteq[1:s]}f_{\emptyset}\,\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})]
+2N2m,n=1m>nNv[1:s]f𝔼[fv(𝒙~n,v)]\displaystyle\quad+\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\sum_{\emptyset\neq v\subseteq[1:s]}f_{\emptyset}\,\mathbb{E}[f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
+2N2m,n=1m>nNu,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle\quad+\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
=N1N(Iρs(f))2\displaystyle=\frac{N-1}{N}\left(I_{\rho_{s}}(f)\right)^{2}
+2N2m,n=1m>nNu,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle\quad+\frac{2}{N^{2}}\sum_{\begin{subarray}{c}m,n=1\\ m>n\end{subarray}}^{N}\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
=N1N(Iρs(f))2\displaystyle=\frac{N-1}{N}\left(I_{\rho_{s}}(f)\right)^{2}
+2N2=1N1m,n=1mn=Nu,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)],\displaystyle\quad+\frac{2}{N^{2}}\sum_{\ell=1}^{N-1}\sum_{\begin{subarray}{c}m,n=1\\ m-n=\ell\end{subarray}}^{N}\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})],

where we reordered the sum over mm and nn with respect to the difference mnm-n in the last equality.

If mnsm-n\geq s, there is no overlapping of the components between 𝒙~m\tilde{\boldsymbol{x}}_{m} and 𝒙~n\tilde{\boldsymbol{x}}_{n}. Because of the independence of samples, it follows from the first assertion of Lemma 1 that the inner sum over uu and vv above is given by

u,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
=u,v[1:s]𝔼[fu(𝒙~m,u)]𝔼[fv(𝒙~n,v)]=0.\displaystyle=\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})]\mathbb{E}[f_{v}(\tilde{\boldsymbol{x}}_{n,v})]=0.

If =mn<s\ell=m-n<s, on the other hand, we have x~n,j=x~m,j+\tilde{x}_{n,j}=\tilde{x}_{m,j+\ell} for any j=1,,sj=1,\ldots,s-\ell. With this equality and the first assertion of Lemma 1, the inner sum over uu and vv becomes

u,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
=u[1:s]𝔼[fu+(𝒙~m,u+)fu(𝒙~n,u)]\displaystyle=\sum_{\emptyset\neq u\subseteq[1:s-\ell]}\mathbb{E}[f_{u+\ell}(\tilde{\boldsymbol{x}}_{m,u+\ell})f_{u}(\tilde{\boldsymbol{x}}_{n,u})]
=u[1:s]𝔼[fu+(𝒙~n,u)fu(𝒙~n,u)]\displaystyle=\sum_{\emptyset\neq u\subseteq[1:s-\ell]}\mathbb{E}[f_{u+\ell}(\tilde{\boldsymbol{x}}_{n,u})f_{u}(\tilde{\boldsymbol{x}}_{n,u})]
=u[1:s]Iρ|u|(fu+fu).\displaystyle=\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u+\ell}f_{u}).

Altogether we obtain

𝕍[IρsTMC(f;N)]\displaystyle\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]
=Iρs(f2)N+N1N(Iρs(f))2(Iρs(f))2\displaystyle=\frac{I_{\rho_{s}}(f^{2})}{N}+\frac{N-1}{N}\left(I_{\rho_{s}}(f)\right)^{2}-\left(I_{\rho_{s}}(f)\right)^{2}
+2N2=1N1m,n=1mn=Nu,v[1:s]𝔼[fu(𝒙~m,u)fv(𝒙~n,v)]\displaystyle\quad+\frac{2}{N^{2}}\sum_{\ell=1}^{N-1}\sum_{\begin{subarray}{c}m,n=1\\ m-n=\ell\end{subarray}}^{N}\sum_{\emptyset\neq u,v\subseteq[1:s]}\mathbb{E}[f_{u}(\tilde{\boldsymbol{x}}_{m,u})f_{v}(\tilde{\boldsymbol{x}}_{n,v})]
=Iρs(f2)(Iρs(f))2N\displaystyle=\frac{I_{\rho_{s}}(f^{2})-\left(I_{\rho_{s}}(f)\right)^{2}}{N}
+2N2=1min(s,N)1u[1:s]Iρ|u|(fufu+)m,n=1mn=N1\displaystyle\quad+\frac{2}{N^{2}}\sum_{\ell=1}^{\min(s,N)-1}\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u}f_{u+\ell})\sum_{\begin{subarray}{c}m,n=1\\ m-n=\ell\end{subarray}}^{N}1
=𝕍[IρsMC(f;N)]\displaystyle=\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]
+2N2=1min(s,N)1(N)u[1:s]Iρ|u|(fufu+).\displaystyle\quad+\frac{2}{N^{2}}\sum_{\ell=1}^{\min(s,N)-1}(N-\ell)\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u}f_{u+\ell}).

Thus we are done.∎

Remark 2

We now consider a parallel implementation of the TMC method with $L$ CPUs. Each CPU independently generates its own Toeplitz matrix from $N+s-1$ random numbers as in Algorithm 1, and CPU $v$ computes an estimator $I^{\mathrm{TMC}}_{\rho_{s}}(f;N;v)$ of the form (3), $v=1,2,\ldots,L$. Theorem 2.1 then applies to the output of each CPU in the parallel implementation. The average of these $L$ independent estimators is again unbiased and satisfies

𝕍[1Lv=1LIρsTMC(f;N;v)]=1L𝕍[IρsMC(f;N)]\displaystyle\mathbb{V}\left[\frac{1}{L}\sum_{v=1}^{L}I^{\mathrm{TMC}}_{\rho_{s}}(f;N;v)\right]=\frac{1}{L}\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]
+2LN2=1min(s,N)1(N)u[1:s]Iρ|u|(fufu+).\displaystyle\quad+\frac{2}{LN^{2}}\sum_{\ell=1}^{\min(s,N)-1}(N-\ell)\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u}f_{u+\ell}).

Notice that we have LNLN points altogether.

If $L=1$ then we get the result from Theorem 2.1, and if $N=1$ we get (2) (where, in this case, $L$ is the number of samples), as the sum $\sum_{\ell=1}^{0}\ldots$ is equal to 0 in this instance.

As is clear from Theorem 2.1, the TMC estimator is unbiased and maintains the canonical “one over square root of $N$” convergence. Moreover, the TMC estimator can be regarded as a variance reduction technique, since the second term in the variance $\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]$ can be negative, depending on the function.

Example 1

To illustrate the last comment, let us consider a simple test function f:3f\colon\mathbb{R}^{3}\to\mathbb{R} given by

f(x,y,z)=xyz+xyxzyz,f(x,y,z)=x-y-z+xy-xz-yz,

and let x,y,zx,y,z be normally distributed independent random variables with mean 0 and variance 1. It is easy to see that

f{1}(x)=x,f{2}(y)=y,f{3}(z)=z,\displaystyle f_{\{1\}}(x)=x,\quad f_{\{2\}}(y)=-y,\quad f_{\{3\}}(z)=-z,
f{1,2}(x,y)=xy,f{1,3}(x,z)=xz,\displaystyle f_{\{1,2\}}(x,y)=xy,\quad f_{\{1,3\}}(x,z)=-xz,
f{2,3}(y,z)=yz,f{1,2,3}(x,y,z)=0.\displaystyle f_{\{2,3\}}(y,z)=-yz,\quad f_{\{1,2,3\}}(x,y,z)=0.

Then it follows that

𝕍[IρsMC(f;N)]=1Nu[1:3]Iρs(fu2)=6N,\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]=\frac{1}{N}\sum_{\emptyset\neq u\subseteq[1:3]}I_{\rho_{s}}(f_{u}^{2})=\frac{6}{N},

whereas, for N3N\geq 3, we have

𝕍[IρsTMC(f;N)]\displaystyle\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]
=6N+2(N2)N2Iρ1(f{1}f{3})+2(N1)N2\displaystyle=\frac{6}{N}+\frac{2(N-2)}{N^{2}}I_{\rho_{1}}(f_{\{1\}}f_{\{3\}})+\frac{2(N-1)}{N^{2}}
×[Iρ1(f{1}f{2})+Iρ1(f{2}f{3})+Iρ2(f{1,2}f{2,3})]\displaystyle\quad\times\left[I_{\rho_{1}}(f_{\{1\}}f_{\{2\}})+I_{\rho_{1}}(f_{\{2\}}f_{\{3\}})+I_{\rho_{2}}(f_{\{1,2\}}f_{\{2,3\}})\right]
=6N2(N2)N22(N1)N2=2N+6N2.\displaystyle=\frac{6}{N}-\frac{2(N-2)}{N^{2}}-\frac{2(N-1)}{N^{2}}=\frac{2}{N}+\frac{6}{N^{2}}.

Therefore the variance of the TMC estimator is almost one-third of the variance of the standard Monte Carlo estimator.
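The closed-form variances above are easy to check empirically. The following small simulation sketch (NumPy assumed; the parameter choices are ours) estimates the variance of both estimators over many independent replications and compares them with $6/N$ and $2/N+6/N^{2}$; with $N=100$ the two theoretical values are $0.06$ and roughly $0.021$.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x, y, z: x - y - z + x*y - x*z - y*z
N, R = 100, 20_000                     # samples per estimator, replications

# Standard Monte Carlo: a fresh i.i.d. triple (x, y, z) for every sample.
x, y, z = rng.standard_normal((3, R, N))
mc = f(x, y, z).mean(axis=1)

# Toeplitz Monte Carlo: one scalar stream per replication,
# quadrature points x_tilde_n = (x_{n+2}, x_{n+1}, x_n).
u = rng.standard_normal((R, N + 2))
tmc = f(u[:, 2:], u[:, 1:-1], u[:, :-2]).mean(axis=1)

print(mc.var(ddof=1), 6 / N)                 # empirical vs. 6/N
print(tmc.var(ddof=1), 2 / N + 6 / N**2)     # empirical vs. 2/N + 6/N^2
```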

It is also possible that the variance of the TMC estimator increases compared to standard Monte Carlo; however, we show below that this increase is bounded.

2.2 Weighted L2L_{2} space and tractability

Here we study the dependence of the variance 𝕍[IρsTMC(f;N)]\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)] on the dimension ss. For this purpose, we first give a bound on 𝕍[IρsTMC(f;N)]\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]. For 1s1\leq\ell\leq s, let

α(f):=(u[1:s]minjuj=Iρ|u|(fu2))1/2.\alpha_{\ell}(f):=\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}.

Then it follows from the decomposition (4) that

Iρs(f2)(Iρs(f))2==1su[1:s]minjuj=Iρs(fu2)==1s(α(f))2,I_{\rho_{s}}(f^{2})-(I_{\rho_{s}}(f))^{2}=\sum_{\ell=1}^{s}\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}I_{\rho_{s}}(f_{u}^{2})=\sum_{\ell=1}^{s}(\alpha_{\ell}(f))^{2},

resulting in an equality

𝕍[IρsMC(f;N)]=1N=1s(α(f))2.\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]=\frac{1}{N}\sum_{\ell=1}^{s}(\alpha_{\ell}(f))^{2}.

Using Theorem 2.1, the variance 𝕍[IρsTMC(f;N)]\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)] is bounded above as follows.

Corollary 1

We have

𝕍[IρsTMC(f;N)]1N(=1sα(f))2.\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]\leq\frac{1}{N}\left(\sum_{\ell=1}^{s}\alpha_{\ell}(f)\right)^{2}.
Proof

For any =1,,s1\ell=1,\ldots,s-1, the Cauchy-Schwarz inequality leads to

u[1:s]Iρ|u|(fufu+)\displaystyle\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u}f_{u+\ell})
u[1:s](Iρ|u|(fu2))1/2(Iρ|u|(fu+2))1/2\displaystyle\leq\sum_{\emptyset\neq u\subseteq[1:s-\ell]}\left(I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}\left(I_{\rho_{|u|}}(f_{u+\ell}^{2})\right)^{1/2}
=v=1su[1:s]minjuj=v(Iρ|u|(fu2))1/2(Iρ|u|(fu+2))1/2\displaystyle=\sum_{v=1}^{s-\ell}\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s-\ell]\\ \min_{j\in u}j=v\end{subarray}}\left(I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}\left(I_{\rho_{|u|}}(f_{u+\ell}^{2})\right)^{1/2}
v=1s(u[1:s]minjuj=vIρ|u|(fu2))1/2\displaystyle\leq\sum_{v=1}^{s-\ell}\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s-\ell]\\ \min_{j\in u}j=v\end{subarray}}I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}
×(u[1:s]minjuj=vIρ|u|(fu+2))1/2\displaystyle\qquad\times\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s-\ell]\\ \min_{j\in u}j=v\end{subarray}}I_{\rho_{|u|}}(f_{u+\ell}^{2})\right)^{1/2}
=v=1s(u[1:s]minjuj=vIρ|u|(fu2))1/2\displaystyle=\sum_{v=1}^{s-\ell}\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s-\ell]\\ \min_{j\in u}j=v\end{subarray}}I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}
×(u[+1:s]minjuj=+vIρ|u|(fu2))1/2\displaystyle\qquad\times\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[\ell+1:s]\\ \min_{j\in u}j=\ell+v\end{subarray}}I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}
v=1sαv(f)α+v(f).\displaystyle\leq\sum_{v=1}^{s-\ell}\alpha_{v}(f)\alpha_{\ell+v}(f).

Applying this bound to the second assertion of Theorem 2.1, we obtain

𝕍[IρsTMC(f;N)]\displaystyle\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]
1N=1s(α(f))2+2N2=1s1(N)v=1sαv(f)α+v(f)\displaystyle\leq\frac{1}{N}\sum_{\ell=1}^{s}(\alpha_{\ell}(f))^{2}+\frac{2}{N^{2}}\sum_{\ell=1}^{s-1}(N-\ell)\sum_{v=1}^{s-\ell}\alpha_{v}(f)\alpha_{\ell+v}(f)
1N=1s(α(f))2+2Nv=1s1αv(f)=v+1sα(f)\displaystyle\leq\frac{1}{N}\sum_{\ell=1}^{s}(\alpha_{\ell}(f))^{2}+\frac{2}{N}\sum_{v=1}^{s-1}\alpha_{v}(f)\sum_{\ell=v+1}^{s}\alpha_{\ell}(f)
=1N(=1sα(f))2.\displaystyle=\frac{1}{N}\left(\sum_{\ell=1}^{s}\alpha_{\ell}(f)\right)^{2}.

Using this result, we have

𝕍[IρsTMC(f;N)]𝕍[IρsMC(f;N)](α1(f)++αs(f))2α12(f)++αs2(f)s,\frac{\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]}{\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]}\leq\frac{(\alpha_{1}(f)+\cdots+\alpha_{s}(f))^{2}}{\alpha^{2}_{1}(f)+\cdots+\alpha^{2}_{s}(f)}\leq s,

where, in the second inequality, equality is attained if and only if $\alpha_{1}(f)=\cdots=\alpha_{s}(f)$. Therefore, when we fix the number of samples, the variance of the TMC estimator is at most $s$ times as large as the variance of the standard Monte Carlo estimator.

Now let us consider the case s=ts=t and assume, as discussed in the first section, that the computational time for the standard Monte Carlo estimator is proportional to Ns2Ns^{2}, whereas the computational time for the TMC estimator is proportional to NslogsNs\log s (assuming that the main cost in evaluating f(𝒙)=g(𝒙A)f(\boldsymbol{x})=g(\boldsymbol{x}A) lies in the computation of 𝒙A\boldsymbol{x}A). When we fix the cost instead of the number of samples, we have

NMCs2NTMCslogs,N_{\mathrm{MC}}s^{2}\asymp N_{\mathrm{TMC}}s\log s,

where \asymp indicates that the terms should be of the same order, and so

𝕍[IρsTMC(f;NTMC)]𝕍[IρsMC(f;NMC)]\displaystyle\frac{\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N_{\mathrm{TMC}})]}{\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N_{\mathrm{MC}})]} NMCNTMC(α1(f)++αs(f))2α12(f)++αs2(f)\displaystyle\leq\frac{N_{\mathrm{MC}}}{N_{\mathrm{TMC}}}\cdot\frac{(\alpha_{1}(f)+\cdots+\alpha_{s}(f))^{2}}{\alpha^{2}_{1}(f)+\cdots+\alpha^{2}_{s}(f)}
logs.\displaystyle\propto\log s.

Thus, the variance of the TMC estimator for a given cost is at most logs\log s times as large as the standard Monte Carlo estimator (up to some constant factor). On the other hand, if there is some decay of the importance of the ANOVA terms as the index of the variable increases, for instance, if the first few terms in α1(f)++αs(f)\alpha_{1}(f)+\cdots+\alpha_{s}(f) dominate the sum, then the ratio (α1(f)++αs(f))2α12(f)++αs2(f)\frac{(\alpha_{1}(f)+\cdots+\alpha_{s}(f))^{2}}{\alpha^{2}_{1}(f)+\cdots+\alpha^{2}_{s}(f)} can be bounded independently of ss, leading to a gain in the efficiency of the TMC estimator. We observe such a behaviour in our numerical experiments below.

Following the idea from Sloan and Woźniakowski (1998), we now introduce the notion of a weighted $L_{2}$ space. Let $(\gamma_{u})_{u\subset\mathbb{N}}$ be a sequence of non-negative real numbers, called weights. Then the weighted $L_{2}$ space is defined by

\mathcal{F}_{s,\boldsymbol{\gamma}}=\left\{f\colon\Omega^{s}\to\mathbb{R}\mid\|f\|_{s,\boldsymbol{\gamma}}<\infty\right\},

where

fs,𝜸:=(u[1:s]γu1Iρs(fu2))1/2.\|f\|_{s,\boldsymbol{\gamma}}:=\left(\sum_{u\subseteq[1:s]}\gamma_{u}^{-1}I_{\rho_{s}}(f_{u}^{2})\right)^{1/2}.

For any subset uu with γu=0\gamma_{u}=0, we assume that the corresponding ANOVA term fuf_{u} is 0 and we formally set 0/0=00/0=0.

For a randomized algorithm using NN function evaluations of ff to estimate Iρs(f)I_{\rho_{s}}(f), which we denote by Alg(f;N)\mathrm{Alg}(f;N), let us consider the minimal cost to estimate Iρs(f)I_{\rho_{s}}(f) with mean square error ε2\varepsilon^{2} for any ff in the unit ball of s,𝜸\mathcal{F}_{s,\boldsymbol{\gamma}}:

N(ε,Alg):=min{N>0e2(Alg,s,𝜸)ε2},N(\varepsilon,\mathrm{Alg}):=\min\left\{N\in\mathbb{Z}_{>0}\mid e^{2}(\mathrm{Alg},\mathcal{F}_{s,\boldsymbol{\gamma}})\leq\varepsilon^{2}\right\},

where

e2(Alg,s,𝜸):=supfs,𝜸fs,𝜸1𝔼[(Alg(f;N)Iρs(f))2].e^{2}(\mathrm{Alg},\mathcal{F}_{s,\boldsymbol{\gamma}}):=\sup_{\begin{subarray}{c}f\in\mathcal{F}_{s,\boldsymbol{\gamma}}\\ \|f\|_{s,\boldsymbol{\gamma}}\leq 1\end{subarray}}\mathbb{E}\left[\left(\mathrm{Alg}(f;N)-I_{\rho_{s}}(f)\right)^{2}\right].

We say that the algorithm Alg\mathrm{Alg} is

  • a weakly tractable algorithm if

    limε1+slnN(ε,Alg)ε1+s=0,\lim_{\varepsilon^{-1}+s\to\infty}\frac{\ln N(\varepsilon,\mathrm{Alg})}{\varepsilon^{-1}+s}=0,
  • a polynomially tractable algorithm if there exist non-negative constants C,p,qC,p,q, such that

    N(ε,Alg)CεpsqN(\varepsilon,\mathrm{Alg})\leq C\varepsilon^{-p}s^{q}

    holds for all s=1,2,s=1,2,\ldots, where pp and qq are called the ε1\varepsilon^{-1}-exponent and the ss-exponent, respectively,

  • a strongly polynomially tractable algorithm if Alg\mathrm{Alg} is a polynomially tractable algorithm with the ss-exponent 0.

We refer to Novak and Woźniakowski (2010) for more information on the notion of tractability.

For instance, the standard Monte Carlo estimator IρsMC(f;N)I^{\mathrm{MC}}_{\rho_{s}}(f;N) is a strongly polynomially tractable algorithm with ε1\varepsilon^{-1}-exponent 2 if

supuγu<\sup_{u\subset\mathbb{N}}\gamma_{u}<\infty (5)

holds. This claim can be proven as follows:

𝔼[(IρsMC(f;N)Iρs(f))2]\displaystyle\mathbb{E}\left[\left(I^{\mathrm{MC}}_{\rho_{s}}(f;N)-I_{\rho_{s}}(f)\right)^{2}\right]
=𝕍[IρsMC(f;N)]=1Nu[1:s]Iρs(fu2)\displaystyle=\mathbb{V}[I^{\mathrm{MC}}_{\rho_{s}}(f;N)]=\frac{1}{N}\sum_{\emptyset\neq u\subseteq[1:s]}I_{\rho_{s}}(f_{u}^{2})
1N(maxu[1:s]γu)(u[1:s]Iρs(fu2)γu)\displaystyle\leq\frac{1}{N}\left(\max_{u\subseteq[1:s]}\gamma_{u}\right)\left(\sum_{u\subseteq[1:s]}\frac{I_{\rho_{s}}(f_{u}^{2})}{\gamma_{u}}\right)
=fs,𝜸2Nmaxu[1:s]γu.\displaystyle=\frac{\|f\|_{s,\boldsymbol{\gamma}}^{2}}{N}\max_{u\subseteq[1:s]}\gamma_{u}.

It follows that, in order to have

𝔼[(IρsMC(f;N)Iρs(f))2]ε2\mathbb{E}\left[\left(I^{\mathrm{MC}}_{\rho_{s}}(f;N)-I_{\rho_{s}}(f)\right)^{2}\right]\leq\varepsilon^{2}

for any $f\in\mathcal{F}_{s,\boldsymbol{\gamma}}$ with $\|f\|_{s,\boldsymbol{\gamma}}\leq 1$, it suffices to take $N\geq\varepsilon^{-2}\max_{u\subseteq[1:s]}\gamma_{u}$. Thus the minimal cost is bounded above by

N(ε,IρsMC)ε2maxu[1:s]γu.N(\varepsilon,I^{\mathrm{MC}}_{\rho_{s}})\leq\varepsilon^{-2}\max_{u\subseteq[1:s]}\gamma_{u}.

Given the condition (5), we see that N(ε,IρsMC)N(\varepsilon,I^{\mathrm{MC}}_{\rho_{s}}) is bounded independently of the dimension ss and the algorithm IρsMCI^{\mathrm{MC}}_{\rho_{s}} is strongly polynomially tractable with the ε1\varepsilon^{-1}-exponent 2.

The following theorem gives sufficient conditions on the weights $(\gamma_{u})_{u\subset\mathbb{N}}$ for the TMC estimator to be a weakly tractable algorithm, a polynomially tractable algorithm, or a strongly polynomially tractable algorithm.

Theorem 2.2

The TMC estimator is

  • a weakly tractable algorithm if

    lims1sln(=1smaxu[1:s]minjuj=γu)=0,\lim_{s\to\infty}\frac{1}{s}\ln\left(\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}u\subset[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}\right)=0,
  • a polynomially tractable algorithm with the ε1\varepsilon^{-1}-exponent 2 if there exists q>0q>0 such that

    sups=1,2,1sq=1smaxu[1:s]minjuj=γu<,\sup_{s=1,2,\ldots}\frac{1}{s^{q}}\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}u\subset[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}<\infty,
  • a strongly polynomially tractable algorithm with the ε1\varepsilon^{-1}-exponent 2 if

    =1supuminjuj=γu<.\sum_{\ell=1}^{\infty}\sup_{\begin{subarray}{c}u\subset\mathbb{N}\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}<\infty.
Proof

It follows from Corollary 1 and Hölder’s inequality for sums that

𝔼[(IρsTMC(f;N)Iρs(f))2]\displaystyle\mathbb{E}\left[\left(I^{\mathrm{TMC}}_{\rho_{s}}(f;N)-I_{\rho_{s}}(f)\right)^{2}\right]
=𝕍[IρsTMC(f;N)]1N(=1sα(f))2\displaystyle=\mathbb{V}[I^{\mathrm{TMC}}_{\rho_{s}}(f;N)]\leq\frac{1}{N}\left(\sum_{\ell=1}^{s}\alpha_{\ell}(f)\right)^{2}
=1N(=1s(u[1:s]minjuj=Iρ|u|(fu2))1/2)2\displaystyle=\frac{1}{N}\left(\sum_{\ell=1}^{s}\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}\right)^{2}
1N(=1s(maxu[1:s]minjuj=γu)1/2\displaystyle\leq\frac{1}{N}\left(\sum_{\ell=1}^{s}\left(\max_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}\right)^{1/2}\right.
×(u[1:s]minjuj=γu1Iρ|u|(fu2))1/2)2\displaystyle\quad\times\left.\left(\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}^{-1}I_{\rho_{|u|}}(f_{u}^{2})\right)^{1/2}\right)^{2}
1N(=1smaxu[1:s]minjuj=γu)(=1su[1:s]minjuj=γu1Iρ|u|(fu2))\displaystyle\leq\frac{1}{N}\left(\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}\right)\left(\sum_{\ell=1}^{s}\sum_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}^{-1}I_{\rho_{|u|}}(f_{u}^{2})\right)
fs,𝜸2N=1smaxu[1:s]minjuj=γu.\displaystyle\leq\frac{\|f\|_{s,\boldsymbol{\gamma}}^{2}}{N}\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}.

Thus, the minimal cost to have

𝔼[(IρsTMC(f;N)Iρs(f))2]ε2\mathbb{E}\left[\left(I^{\mathrm{TMC}}_{\rho_{s}}(f;N)-I_{\rho_{s}}(f)\right)^{2}\right]\leq\varepsilon^{2}

for any fs,𝜸f\in\mathcal{F}_{s,\boldsymbol{\gamma}} with fs,𝜸1\|f\|_{s,\boldsymbol{\gamma}}\leq 1 is bounded above by

N(ε,IρsTMC)ε2=1smaxu[1:s]minjuj=γu.N(\varepsilon,I^{\mathrm{TMC}}_{\rho_{s}})\leq\varepsilon^{-2}\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}.

Let us consider the first assertion of the theorem. If the weights satisfy

lims1sln(=1smaxu[1:s]minjuj=γu)=0,\lim_{s\to\infty}\frac{1}{s}\ln\left(\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}u\subset[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}\right)=0,

then we have

limε1+slnN(ε,IρsTMC)ε1+s\displaystyle\lim_{\varepsilon^{-1}+s\to\infty}\frac{\ln N(\varepsilon,I^{\mathrm{TMC}}_{\rho_{s}})}{\varepsilon^{-1}+s}
limε1+s(lnε2ε1+1sln(=1smaxu[1:s]minjuj=γu))=0,\displaystyle\leq\lim_{\varepsilon^{-1}+s\to\infty}\left(\frac{\ln\varepsilon^{-2}}{\varepsilon^{-1}}+\frac{1}{s}\ln\left(\sum_{\ell=1}^{s}\max_{\begin{subarray}{c}\emptyset\neq u\subseteq[1:s]\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}\right)\right)=0,

meaning that IρsTMCI^{\mathrm{TMC}}_{\rho_{s}} is a weakly tractable algorithm. Since the second and third assertions can be shown similarly, we omit the proof.∎

For instance, if the weights satisfy γuγv\gamma_{u}\geq\gamma_{v} whenever uvu\subset v, we always have

supuminjuj=γu=γ{}\sup_{\begin{subarray}{c}u\subset\mathbb{N}\\ \min_{j\in u}j=\ell\end{subarray}}\gamma_{u}=\gamma_{\{\ell\}}

for any $\ell\in\mathbb{Z}_{>0}$. Therefore, the sufficient condition for the TMC estimator to be strongly polynomially tractable reduces to a simple summability condition:

=1γ{}<.\sum_{\ell=1}^{\infty}\gamma_{\{\ell\}}<\infty.

Note that the sufficient condition for the TMC estimator to be strongly polynomially tractable is stronger than condition (5) for the standard Monte Carlo estimator to be strongly polynomially tractable. Whether the sufficient conditions for the TMC estimator given in Theorem 2.2 can be weakened is an open question.

3 Numerical experiments

In order to see the practical performance of the TMC estimator, we conduct four kinds of numerical experiments. (The C codes used in our experiments are available from https://github.com/takashigoda/Toeplitz-Monte-Carlo.) The first test case follows Section 4.1 of Dick et al. (2015), which considers generating quadrature points from a multivariate normal distribution with a general covariance matrix. The second test case, taken from Section 4.2 of Dick et al. (2015), deals with approximating linear functionals of solutions of a one-dimensional PDE with “uniform” random coefficients. The third test case is an extension of the second test case to a one-dimensional PDE with “log-normal” random coefficients. Finally, in the fourth test case we consider another possible extension of the second test case, namely a two-dimensional PDE with uniform random coefficients. All computations are performed on a laptop with a 1.6 GHz Intel Core i5 CPU and 8 GB memory.

For every test case, we carry out numerical experiments with various values of NN and ss using both the standard Monte Carlo estimator and the TMC estimator. For each pair of NN and ss, we repeat computations R=25R=25 times independently and calculate the average computational time. For the latter three test cases, the variances of these estimators are measured by

1R(R1)r=1R(Iρs,(r)(f;N)Iρs(f;N)¯)2\frac{1}{R(R-1)}\sum_{r=1}^{R}\left(I^{\bullet,(r)}_{\rho_{s}}(f;N)-\overline{I^{\bullet}_{\rho_{s}}(f;N)}\right)^{2}

with

Iρs(f;N)¯=1Rr=1RIρs,(r)(f;N),\overline{I^{\bullet}_{\rho_{s}}(f;N)}=\frac{1}{R}\sum_{r=1}^{R}I^{\bullet,(r)}_{\rho_{s}}(f;N),

for {MC,TMC}\bullet\in\{\mathrm{MC},\mathrm{TMC}\}, where Iρs,(r)(f;N)I^{\bullet,(r)}_{\rho_{s}}(f;N) denotes the rr-th realization of the estimator Iρs(f;N)I^{\bullet}_{\rho_{s}}(f;N).
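For concreteness, this variance measure can be computed as in the following short sketch (NumPy assumed; the function name is ours).

```python
import numpy as np

def estimator_variance(estimates):
    """Variance measure used in the tables: the sample variance of the mean
    over R independent realizations of an estimator."""
    estimates = np.asarray(estimates, dtype=float)
    R = len(estimates)
    return np.sum((estimates - estimates.mean())**2) / (R * (R - 1))
```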

3.1 Generating points from multivariate Gaussian

Table 1: Average times (in seconds) to generate normally distributed points with a random covariance matrix for various values of NN and ss using the standard Monte Carlo method and the TMC method.
N s=128 s=256 s=512 s=1024 s=2048
stdMC 1024 0.041 0.192 1.369 9.613
TMC 0.165 0.239 0.403 0.872
saving 0.250 0.800 3.399 11.021
stdMC 2048 0.083 0.367 2.077 20.154 217.162
TMC 0.322 0.465 0.923 1.794 3.599
saving 0.259 0.790 2.251 11.234 60.342
stdMC 4096 0.154 0.740 5.551 45.287 446.974
TMC 0.586 0.981 1.909 3.407 7.112
saving 0.263 0.754 2.908 13.291 62.851
stdMC 8192 0.288 1.669 10.332 87.988 830.648
TMC 1.169 1.868 3.110 6.204 14.561
saving 0.247 0.893 3.322 14.183 57.047
stdMC 16384 0.660 3.255 19.764 159.792 1820.426
TMC 2.434 3.750 7.088 14.073 28.082
saving 0.271 0.868 2.788 11.355 64.826
stdMC 32768 1.078 5.627 33.011 331.747 3648.755
TMC 5.486 9.098 15.164 25.950 57.171
saving 0.197 0.618 2.177 12.784 63.822

Generating quadrature points from the multivariate normal distribution 𝒩(𝝁,Σ)\mathcal{N}(\boldsymbol{\mu},\Sigma) with mean vector 𝝁s\boldsymbol{\mu}\in\mathbb{R}^{s} and covariance matrix Σs×s\Sigma\in\mathbb{R}^{s\times s} is ubiquitous in scientific computation. The standard procedure is as follows (Devroye 1986, Chapter XI.2): Let As×sA\in\mathbb{R}^{s\times s} be a matrix which satisfies AA=ΣA^{\top}A=\Sigma. For instance, the Cholesky decomposition gives such AA in an upper triangular form for any symmetric positive-definite matrix Σ\Sigma. Using this decomposition, we can generate a point 𝒚𝒩(𝝁,Σ)\boldsymbol{y}\sim\mathcal{N}(\boldsymbol{\mu},\Sigma) by first generating 𝒙=(x1,,xs)\boldsymbol{x}=(x_{1},\ldots,x_{s}) with xj𝒩(0,1)x_{j}\sim\mathcal{N}(0,1) and then transforming 𝒙\boldsymbol{x} by

𝒚=𝝁+𝒙A.\boldsymbol{y}=\boldsymbol{\mu}+\boldsymbol{x}A.

Even if the matrix AA does not have any further structure, a set of quadrature points can be generated in a fast way by following Algorithm 1.
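As an illustration, the following sketch (using scipy.linalg.cholesky and scipy.linalg.toeplitz; the function and variable names are ours) generates such a point set for a small covariance matrix. For clarity the Toeplitz matrix is formed explicitly; in practice one would use an FFT-based product as in Algorithm 1.

```python
import numpy as np
from scipy.linalg import cholesky, toeplitz

def tmc_gaussian_points(mu, Sigma, N, rng):
    """Generate N points whose rows follow N(mu, Sigma) from the TMC point set.
    Consecutive points are dependent, but each point has the exact target
    distribution, so plain averages remain unbiased."""
    s = len(mu)
    A = cholesky(Sigma)                       # upper triangular with A^T A = Sigma
    x = rng.standard_normal(N + s - 1)
    # Toeplitz matrix X with first column x[s-1:] and first row x[s-1::-1];
    # row n is (x[n+s-1], ..., x[n]).  Formed explicitly here for clarity.
    X = toeplitz(x[s - 1:], x[s - 1::-1])
    return mu + X @ A

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
pts = tmc_gaussian_points(np.zeros(2), Sigma, 10_000, rng)
print(np.cov(pts, rowvar=False))              # should be close to Sigma
```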

For our experiments, we fix $\boldsymbol{\mu}=(0,\ldots,0)$ and choose $A$ to be a random upper triangular matrix with positive diagonal entries. Table 1 shows the average computational times for various values of $N$ and $s$. As the theory predicts, the computational time for the standard Monte Carlo (stdMC) estimator scales as $Ns^{2}$, whereas that for the TMC estimator approximately scales as $Ns\log s$. For low-dimensional cases up to $s=256$, the stdMC estimator is faster to compute than the TMC estimator. However, as the dimension $s$ increases, the relative speed of the TMC estimator also increases. For the case $s=2048$, the TMC estimator is approximately 60 times faster than the stdMC estimator. This result indicates that the TMC estimator is useful in high-dimensional settings for generating normally distributed quadrature points with a general covariance matrix.

3.2 1D differential equation with uniform random coefficients

Let us consider the ODE

ddx(a(x,𝒚)ddxu(x,𝒚))=g(x)1\displaystyle-\frac{\mathrm{d}}{\mathrm{d}x}\left(a(x,\boldsymbol{y})\frac{\mathrm{d}}{\mathrm{d}x}u(x,\boldsymbol{y})\right)=g(x)\equiv 1
for x(0,1)x\in(0,1) and 𝒚[12,12]\boldsymbol{y}\in\left[-\frac{1}{2},\frac{1}{2}\right]^{\mathbb{N}}
u(x,𝒚)=0for x=0,1\displaystyle u(x,\boldsymbol{y})=0\quad\text{for $x=0,1$}
a(x,𝒚)=2+j=1yjsin(2πjx)j3/2.\displaystyle a(x,\boldsymbol{y})=2+\sum_{j=1}^{\infty}y_{j}\frac{\sin(2\pi jx)}{j^{3/2}}.

In order to solve this ODE approximately with the finite element method, we consider a system of hat functions

ϕm(x)={(xxm1)Mif xm1xxm,(xm+1x)Mif xmxxm+1,0otherwise,\phi_{m}(x)=\begin{cases}(x-x_{m-1})M&\text{if $x_{m-1}\leq x\leq x_{m}$,}\\ (x_{m+1}-x)M&\text{if $x_{m}\leq x\leq x_{m+1}$,}\\ 0&\text{otherwise,}\end{cases}

for m=1,,M1m=1,\ldots,M-1 over M+1M+1 equi-distributed nodes xm=m/Mx_{m}=m/M for m=0,1,,Mm=0,1,\ldots,M, and truncate the infinite sum appearing in the random field aa by the first ss terms. Therefore, we have three different parameters N,MN,M and ss.

Given the boundary condition on uu, the approximate solution uMu_{M} of the ODE for y1,,ysU[1/2,1/2]y_{1},\ldots,y_{s}\sim U[-1/2,1/2] is given by

uM=m=1M1u^mϕm(x),u_{M}=\sum_{m=1}^{M-1}\hat{u}_{m}\phi_{m}(x),

with

(u^1,,u^M1)B(y1,,ys)=(g^1,,g^M1),\left(\hat{u}_{1},\ldots,\hat{u}_{M-1}\right)\cdot B(y_{1},\ldots,y_{s})=(\hat{g}_{1},\ldots,\hat{g}_{M-1}), (6)

for the symmetric stiffness matrix BB depending on y1,,ysy_{1},\ldots,y_{s} and the forcing vector with entries g^m=01g(x)ϕm(x)dx=1/M\hat{g}_{m}=\int_{0}^{1}g(x)\phi_{m}(x)\,\mathrm{d}x=1/M. The entries of the matrix B=(bk,)k,(M1)×(M1)B=(b_{k,\ell})_{k,\ell}\in\mathbb{R}^{(M-1)\times(M-1)} are given by

bk,\displaystyle b_{k,\ell} =01a(x,(y1,,ys,0,0,))ϕk(x)ϕ(x)dx\displaystyle=\int_{0}^{1}a(x,(y_{1},\ldots,y_{s},0,0,\ldots)){\phi}^{\prime}_{k}(x){\phi}^{\prime}_{\ell}(x)\,\mathrm{d}x
=201ϕk(x)ϕ(x)dx\displaystyle=2\int_{0}^{1}{\phi}^{\prime}_{k}(x){\phi}^{\prime}_{\ell}(x)\,\mathrm{d}x
+j=1syj01sin(2πjx)j3/2ϕk(x)ϕ(x)dx\displaystyle\quad+\sum_{j=1}^{s}y_{j}\int_{0}^{1}\frac{\sin(2\pi jx)}{j^{3/2}}{\phi}^{\prime}_{k}(x){\phi}^{\prime}_{\ell}(x)\,\mathrm{d}x
=:ak,(0)+j=1syjak,(j).\displaystyle=:a_{k,\ell}^{(0)}+\sum_{j=1}^{s}y_{j}a_{k,\ell}^{(j)}.

Hence, by letting A(j)=(ak,(j))k,(M1)×(M1)A^{(j)}=(a_{k,\ell}^{(j)})_{k,\ell}\in\mathbb{R}^{(M-1)\times(M-1)} for j=0,1,,sj=0,1,\ldots,s, we have

B=A(0)+j=1syjA(j).B=A^{(0)}+\sum_{j=1}^{s}y_{j}A^{(j)}.

Here we note that every entry of A(j)A^{(j)} can be explicitly calculated as

ak,(0)={4Mif k=,2Mif |k|=1,0otherwise,a_{k,\ell}^{(0)}=\begin{cases}4M&\text{if $k=\ell$,}\\ -2M&\text{if $|k-\ell|=1$,}\\ 0&\text{otherwise,}\end{cases}

and

ak,(j)={M2πj5/2sin(2πjM)sin(2πjkM)if k=,M2πj5/2sin(πjM)sin(πj(k+)M)if |k|=1,0otherwise,a_{k,\ell}^{(j)}=\begin{cases}\frac{M^{2}}{\pi j^{5/2}}\sin\left(\frac{2\pi j}{M}\right)\sin\left(\frac{2\pi jk}{M}\right)&\text{if $k=\ell$,}\\ -\frac{M^{2}}{\pi j^{5/2}}\sin\left(\frac{\pi j}{M}\right)\sin\left(\frac{\pi j(k+\ell)}{M}\right)&\text{if $|k-\ell|=1$,}\\ 0&\text{otherwise,}\end{cases}

for 1js1\leq j\leq s. Since each matrix A(j)A^{(j)} is tridiagonal, the LU decomposition requires only O(M)O(M) computational time to solve the system of linear equations (6). This way, it is clear that computing the matrix BB for NN Monte Carlo samples on (y1,,ys)(y_{1},\ldots,y_{s}) is computationally dominant for the whole process, and as shown in Section 3.2 of Dick et al. (2015), the standard Monte Carlo method requires O(NMs)O(NMs) arithmetic operations, whereas Algorithm 1 can reduce them to O(NMlogs)O(NM\log s). It is expected that the TMC estimator brings substantial computational cost savings, particularly for large ss. For our experiments, we estimate the expectation of u(1/2,)u(1/2,\cdot).
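The following sketch illustrates only this dominant step for the uniform test problem: forming, for all $N$ samples at once, the diagonal and off-diagonal entries of $B=A^{(0)}+\sum_{j}y_{j}A^{(j)}$ through a single Toeplitz product. It assumes SciPy 1.6 or later, which provides the FFT-based routine scipy.linalg.matmul_toeplitz; the function name and array layout are ours, and the subsequent tridiagonal solves are omitted.

```python
import numpy as np
from scipy.linalg import matmul_toeplitz

def assemble_tridiagonal_entries(M, s, N, rng):
    """For all N samples at once, form the diagonal and off-diagonal entries of
    B(y) = A^(0) + sum_j y_j A^(j) for the 1D uniform test problem.
    Returns diag of shape (N, M-1) and off of shape (N, M-2)."""
    k = np.arange(1, M)                       # interior node indices 1, ..., M-1
    j = np.arange(1, s + 1)[:, None]          # expansion indices as a column
    # Entries of A^(j), j >= 1, taken from the closed-form expressions above.
    diag_j = (M**2 / (np.pi * j**2.5)) * np.sin(2*np.pi*j/M) * np.sin(2*np.pi*j*k/M)
    kk = np.arange(1, M - 1)                  # off-diagonal pairs (k, k+1)
    off_j = -(M**2 / (np.pi * j**2.5)) * np.sin(np.pi*j/M) * np.sin(np.pi*j*(2*kk + 1)/M)

    # Toeplitz sample matrix Y in R^{N x s} with rows (y_{n+s-1}, ..., y_n).
    y = rng.uniform(-0.5, 0.5, N + s - 1)
    C = np.hstack([diag_j, off_j])            # one column per stiffness entry
    YC = matmul_toeplitz((y[s - 1:], y[s - 1::-1]), C)   # FFT-based product
    diag = 4.0 * M + YC[:, :M - 1]            # add the A^(0) contributions
    off = -2.0 * M + YC[:, M - 1:]
    return diag, off

diag, off = assemble_tridiagonal_entries(M=64, s=64, N=1000,
                                         rng=np.random.default_rng(0))
```

Each row of diag and off then defines one tridiagonal system of the form (6), which can be solved in $O(M)$ time, for instance with scipy.linalg.solve_banded.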

Table 2 shows the results for various values of N,MN,M and ss. In general, in the area of PDEs with random coefficients, both MM and ss grow with NN so that the following three errors are balanced (Dick et al. 2015): the finite element discretization error, the truncation error on the random field aa, the Monte Carlo error. The optimal balancing depends on the decay rates of these errors. However, in our numerical experiments, we are only interested in how the computation times change for different relations between MM, ss, and NN, and so we test different relations between those parameters (irrespective of what the optimal choice actually is).

Since computations are repeated 25 times independently for each pair of parameters, we report the average of the estimates and the variance of the estimator for both methods. By comparing the mean values computed by the two methods, we can confirm that the TMC estimator is also unbiased (just as the stdMC estimator is). The variances of both estimators decay with the rate $O(1/N)$, whereas the magnitude for the TMC estimator is approximately 2–5 times larger than that for the stdMC estimator. On the other hand, the computational time for the stdMC estimator increases with $N$ (equivalently, with $s$) significantly faster than that for the TMC estimator. This behavior of the computation times indicates that computing the stiffness matrix is the dominant part of the whole computation, and so the TMC estimator is quite effective in reducing the computation time.

Table 2: Estimating the expectation of u(1/2,)u(1/2,\cdot) with various values of N,MN,M and ss using the standard Monte Carlo method and the TMC method for the uniform case. The average estimate, the variance of the estimator and the average computational time (in seconds) are shown for each method. The efficiency is defined by the ratio of the product of the variance and the computational time between two methods.
stdMC TMC
mean variance time mean variance time efficiency
N (N = M = s)
64 0.066 2.00108\cdot 10^{-8} 0.002 0.066 4.39108\cdot 10^{-8} 0.023 0.046
128 0.065 4.20109\cdot 10^{-9} 0.020 0.065 1.47108\cdot 10^{-8} 0.071 0.081
256 0.064 2.00109\cdot 10^{-9} 0.189 0.064 7.11109\cdot 10^{-9} 0.221 0.240
512 0.064 1.01109\cdot 10^{-9} 1.455 0.064 2.27109\cdot 10^{-9} 0.823 0.781
1024 0.064 5.751010\cdot 10^{-10} 19.720 0.064 8.451010\cdot 10^{-10} 3.202 4.190
2048 0.064 1.961010\cdot 10^{-10} 217.303 0.064 5.511010\cdot 10^{-10} 12.069 6.405
4096 0.064 9.501011\cdot 10^{-11} 2367.836 0.064 2.361010\cdot 10^{-10} 57.402 16.612
N (N = M^2 = s)
256 0.076 3.34108\cdot 10^{-8} 0.005 0.076 1.17107\cdot 10^{-7} 0.013 0.125
512 0.072 7.88109\cdot 10^{-9} 0.042 0.072 2.44108\cdot 10^{-8} 0.031 0.441
1024 0.070 2.77109\cdot 10^{-9} 0.298 0.070 4.98109\cdot 10^{-9} 0.087 1.892
2048 0.068 4.781010\cdot 10^{-10} 1.871 0.068 2.07109\cdot 10^{-9} 0.235 1.838
4096 0.066 3.661010\cdot 10^{-10} 10.965 0.066 6.481010\cdot 10^{-10} 0.766 8.088
8192 0.066 6.701011\cdot 10^{-11} 75.863 0.066 2.281010\cdot 10^{-10} 2.390 9.329
16384 0.065 2.201011\cdot 10^{-11} 963.569 0.065 9.701011\cdot 10^{-11} 7.033 31.073
N (N = 2M = 2s)
64 0.069 3.06108\cdot 10^{-8} 0.001 0.070 9.98108\cdot 10^{-8} 0.017 0.013
128 0.066 6.16109\cdot 10^{-9} 0.004 0.066 2.36108\cdot 10^{-8} 0.044 0.026
256 0.065 2.15109\cdot 10^{-9} 0.041 0.065 9.24109\cdot 10^{-9} 0.135 0.070
512 0.064 7.611010\cdot 10^{-10} 0.326 0.064 2.58109\cdot 10^{-9} 0.449 0.214
1024 0.064 3.861010\cdot 10^{-10} 2.681 0.064 8.951010\cdot 10^{-10} 1.413 0.818
2048 0.064 1.721010\cdot 10^{-10} 35.731 0.064 5.671010\cdot 10^{-10} 5.648 1.919
4096 0.064 7.841011\cdot 10^{-11} 444.156 0.064 2.411010\cdot 10^{-10} 23.480 6.144
N (N = 2M^2 = 2s)
512 0.076 1.39108\cdot 10^{-8} 0.011 0.076 4.37108\cdot 10^{-8} 0.026 0.137
1024 0.072 3.25109\cdot 10^{-9} 0.064 0.072 9.13109\cdot 10^{-9} 0.062 0.364
2048 0.070 7.201010\cdot 10^{-10} 0.534 0.070 3.26109\cdot 10^{-9} 0.183 0.645
4096 0.068 2.231010\cdot 10^{-10} 3.204 0.068 9.551010\cdot 10^{-10} 0.524 1.427
8192 0.066 1.411010\cdot 10^{-10} 20.888 0.066 3.011010\cdot 10^{-10} 1.596 6.133
16384 0.066 3.601011\cdot 10^{-11} 159.880 0.066 1.221010\cdot 10^{-10} 4.618 10.216
32768 0.065 1.021011\cdot 10^{-11} 2023.179 0.065 5.801011\cdot 10^{-11} 13.586 26.144

As is standard (see for instance Chapter 8 of Owen (2019)), we measure the relative efficiency of the TMC estimator compared to the stdMC estimator by the ratio

TMCσMC2TTMCσTMC2,\frac{T_{\mathrm{MC}}\sigma^{2}_{\mathrm{MC}}}{T_{\mathrm{TMC}}\sigma^{2}_{\mathrm{TMC}}},

where TT_{\bullet} and σ2\sigma^{2}_{\bullet} denote the computational time spent and the variance, respectively, for the estimators {MC,TMC}\bullet\in\{\mathrm{MC},\mathrm{TMC}\}. As shown in the rightmost column of Table 2, the relative efficiency is smaller than 1 for low-dimensional cases, which means that we do not gain any benefit from using the TMC estimator. However, because of the substantial computational time savings, the efficiency increases significantly for large ss where it goes well beyond 1.

3.3 1D differential equation in the log-normal case

Let us move on to an ODE with log-normal random coefficients:

ddx(a(x,𝒚)ddxu(x,𝒚))=g(x)1\displaystyle-\frac{\mathrm{d}}{\mathrm{d}x}\left(a(x,\boldsymbol{y})\frac{\mathrm{d}}{\mathrm{d}x}u(x,\boldsymbol{y})\right)=g(x)\equiv 1
for x(0,1)x\in(0,1) and yji.i.d. N(0,1)y_{j}\sim\text{i.i.d.\ }N(0,1)
u(x,𝒚)=0for x=0,1\displaystyle u(x,\boldsymbol{y})=0\quad\text{for $x=0,1$}
a(x,𝒚)=exp(j=1yjsin(2πjx)j2).\displaystyle a(x,\boldsymbol{y})=\exp\left(\sum_{j=1}^{\infty}y_{j}\frac{\sin(2\pi jx)}{j^{2}}\right).

Similarly to the uniform case, we truncate the infinite sum appearing in the random field $a$ to the first $s$ terms. A similar test case was also used in Section 4.3 of Dick et al. (2015). (We point out that in Section 4.3 of Dick et al. (2015) the authors replaced the normal distribution by a uniform distribution. However, a normal distribution could have been used in (Dick et al. 2015, Section 4.3) as well, for instance by shifting the QMC points in $[0,1]^{s}$ in the quadrature rule by $(1/(2N),\ldots,1/(2N))$ and applying the inverse CDF to the shifted points. Doing so avoids the point $(0,\ldots,0)\in[0,1]^{s}$, which would get transformed to $(-\infty,\ldots,-\infty)$. Note that this shift does not affect the fast QMC matrix-vector product, which can still be applied in the usual way.)

Now, as the entries of the stiffness matrix B=(bk,)k,(M1)×(M1)B=(b_{k,\ell})_{k,\ell}\in\mathbb{R}^{(M-1)\times(M-1)} cannot be expressed simply as a linear sum of y1,y2,y_{1},y_{2},\ldots, we need to approximate the integral by using some quadrature formulas, except for the case |k|2|k-\ell|\geq 2 where we just have bk,=0b_{k,\ell}=0. Denoting the quadrature nodes and the corresponding weights by x1,k,,,xI,k,x_{1,k,\ell},\ldots,x_{I,k,\ell} and ω1,k,,,ωI,k,\omega_{1,k,\ell},\ldots,\omega_{I,k,\ell}, the entry bk,b_{k,\ell} is approximated by

bk,b^k,=i=1Iωi,k,exp(θi,k,)ϕk(xi,k,)ϕ(xi,k,),b_{k,\ell}\approx\hat{b}_{k,\ell}=\sum_{i=1}^{I}\omega_{i,k,\ell}\exp(\theta_{i,k,\ell}){\phi}^{\prime}_{k}(x_{i,k,\ell}){\phi}^{\prime}_{\ell}(x_{i,k,\ell}),

where

θi,k,=j=1syjsin(2πjxi,k,)j2,\theta_{i,k,\ell}=\sum_{j=1}^{s}y_{j}\frac{\sin(2\pi jx_{i,k,\ell})}{j^{2}},

for the case $|k-\ell|\leq 1$. As stated in Section 3.2 of Dick et al. (2015), computing $\theta_{i,k,\ell}$ for all $1\leq i\leq I$ and all $N$ Monte Carlo samples on $(y_{1},\ldots,y_{s})$ can be done in a fast way by using the TMC estimator, requiring $O(INM\log s)$ arithmetic operations. On the other hand, we need $O(INMs)$ arithmetic operations when using the standard Monte Carlo estimator. This way, the log-normal case poses a further computational challenge compared to the uniform case, so it is interesting to see whether the TMC estimator is still effective. In this paper we apply the 3-point closed Newton-Cotes formula with nodes at $x_{k-1},x_{k},x_{k+1}$ if $k=\ell$, and the 2-point closed Newton-Cotes formula with nodes at $x_{(k+\ell-1)/2},x_{(k+\ell+1)/2}$ if $|k-\ell|=1$. Again we estimate the expectation of $u(1/2,\cdot)$.
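Since the Newton-Cotes nodes used here are themselves grid points $x_{m}=m/M$, the dominant step of evaluating $\theta$ at all nodes for all $N$ samples is again a single Toeplitz product. The following sketch (assuming SciPy 1.6 or later for scipy.linalg.matmul_toeplitz; the names are ours) computes the field values $\exp(\theta)$ at all grid points for all samples; the remaining assembly of $\hat{b}_{k,\ell}$ from these values and the quadrature weights is omitted.

```python
import numpy as np
from scipy.linalg import matmul_toeplitz

def lognormal_field_at_nodes(M, s, N, rng):
    """Evaluate theta(x_m) = sum_j y_j sin(2*pi*j*x_m)/j^2 at all grid points
    x_m = m/M for all N TMC samples and return exp(theta) as an (N, M+1) array."""
    m = np.arange(M + 1)
    j = np.arange(1, s + 1)[:, None]
    S = np.sin(2 * np.pi * j * m / M) / j**2              # shape (s, M+1)
    y = rng.standard_normal(N + s - 1)                    # y_j i.i.d. N(0, 1)
    theta = matmul_toeplitz((y[s - 1:], y[s - 1::-1]), S) # Toeplitz product via FFT
    return np.exp(theta)
```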

Table 3 shows the results for various values of $N$, $M$ and $s$. Similarly to the uniform case, we see that the TMC estimator is unbiased, as the mean values agree well with the results for the stdMC estimator. In this case, however, the variances of the two estimators do not necessarily decay with the rate $O(1/N)$. This is possible because we increase $M$ and $s$ simultaneously with $N$, which may lead to an increase of the variance of $u_{M}(1/2,\cdot)$ in a non-asymptotic range of $N$. As $N$ increases further, it is expected that the variance of $u_{M}(1/2,\cdot)$ stays almost the same and that the variances of both estimators tend to decay with the rate $O(1/N)$. Moreover, it can be seen from the table that the magnitude of the variance for the TMC estimator is comparable to that of the stdMC estimator for many choices of $N,M,s$. As expected, the computational time for the stdMC estimator increases with $s$ significantly faster than that for the TMC estimator, and it is clear that computation of the stiffness matrix takes most of the computational time, even in the log-normal case. The relative efficiency of the TMC estimator over the stdMC estimator gets larger as $N$ (or, equivalently, $s$) increases.

Table 3: Estimating the expectation of u(1/2,)u(1/2,\cdot) with various values of N,MN,M and ss using the standard Monte Carlo method and the TMC method for the log-normal case.
stdMC TMC
mean variance time mean variance time efficiency
N (N = M = s)
64 0.018 8.34108\cdot 10^{-8} 0.073 0.018 2.97107\cdot 10^{-7} 0.079 0.260
128 0.018 1.17108\cdot 10^{-8} 0.580 0.018 1.76107\cdot 10^{-7} 0.171 0.227
256 0.018 1.55108\cdot 10^{-8} 4.650 0.018 7.74108\cdot 10^{-8} 0.549 1.694
512 0.018 2.11108\cdot 10^{-8} 36.640 0.018 3.35108\cdot 10^{-8} 2.260 10.230
1024 0.018 2.65108\cdot 10^{-8} 310.415 0.018 6.63108\cdot 10^{-8} 10.385 11.956
2048 0.018 1.23108\cdot 10^{-8} 2391.495 0.018 3.06108\cdot 10^{-8} 41.504 23.262
4096 0.018 2.33108\cdot 10^{-8} 20739.823 0.018 2.03108\cdot 10^{-8} 182.282 131.012
N (N = M^2 = s)
256 0.017 8.08109\cdot 10^{-9} 0.301 0.017 1.24108\cdot 10^{-8} 0.032 6.069
512 0.018 1.44108\cdot 10^{-8} 1.680 0.018 9.75109\cdot 10^{-9} 0.093 26.813
1024 0.018 2.68108\cdot 10^{-8} 9.762 0.018 1.24108\cdot 10^{-8} 0.277 76.387
2048 0.018 8.59109\cdot 10^{-9} 56.480 0.018 5.37108\cdot 10^{-8} 0.615 14.695
4096 0.018 1.89108\cdot 10^{-8} 317.294 0.018 2.98108\cdot 10^{-8} 2.212 91.011
8192 0.018 6.15108\cdot 10^{-8} 1865.110 0.018 3.27108\cdot 10^{-8} 7.094 494.909
16384 0.018 1.91108\cdot 10^{-8} 9964.289 0.018 2.35108\cdot 10^{-8} 16.548 473.083
N (N = 2M = 2s)
64 0.018 1.16107\cdot 10^{-7} 0.021 0.018 1.83107\cdot 10^{-7} 0.046 0.288
128 0.018 9.58108\cdot 10^{-8} 0.163 0.018 1.46107\cdot 10^{-7} 0.126 0.844
256 0.018 5.09108\cdot 10^{-8} 1.279 0.018 7.30108\cdot 10^{-8} 0.394 2.264
512 0.018 2.40108\cdot 10^{-8} 9.997 0.018 3.33108\cdot 10^{-8} 1.089 6.616
1024 0.018 9.53108\cdot 10^{-8} 82.822 0.018 6.00108\cdot 10^{-8} 4.914 26.758
2048 0.018 1.18108\cdot 10^{-8} 644.228 0.018 2.97108\cdot 10^{-8} 19.265 13.243
4096 0.018 1.45108\cdot 10^{-8} 5127.910 0.018 1.72108\cdot 10^{-8} 73.837 58.506
NN N=2M2=2sN=2M^{2}=2s
512 0.017 1.81108\cdot 10^{-8} 0.597 0.017 5.08109\cdot 10^{-9} 0.060 35.776
1024 0.018 5.07109\cdot 10^{-9} 3.370 0.018 6.70109\cdot 10^{-9} 0.151 16.834
2048 0.018 9.32109\cdot 10^{-9} 22.059 0.018 7.58109\cdot 10^{-9} 0.393 68.996
4096 0.018 8.18109\cdot 10^{-9} 123.556 0.018 5.70109\cdot 10^{-9} 1.187 149.507
8192 0.018 4.32109\cdot 10^{-9} 646.786 0.018 4.16109\cdot 10^{-9} 4.738 141.992
16384 0.018 2.47109\cdot 10^{-9} 3546.950 0.018 5.11109\cdot 10^{-9} 11.673 146.925
32768 0.018 1.81109\cdot 10^{-9} 20018.280 0.018 3.97109\cdot 10^{-9} 33.323 274.032

3.4 2D differential equation with random coefficients

Following Section 4 of Dick et al. (2016), our last example considers the following two-dimensional PDE with uniform random coefficients:

\[
-\nabla\cdot\left(a(\boldsymbol{x},\boldsymbol{y})\nabla u(\boldsymbol{x},\boldsymbol{y})\right)=g(\boldsymbol{x})\equiv 100x_{1}\quad\text{for $\boldsymbol{x}\in[0,1]^{2}$ and $\boldsymbol{y}\in\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{\mathbb{N}}$,}
\]
\[
u(\boldsymbol{x},\boldsymbol{y})=0\quad\text{for $\boldsymbol{x}\in\partial\left([0,1]^{2}\right)$,}
\]
\[
a(\boldsymbol{x},\boldsymbol{y})=1+\sum_{j=1}^{\infty}y_{j}\,\frac{\sin(\pi k_{j,1}x_{1})\sin(\pi k_{j,2}x_{2})}{(k_{j,1}^{2}+k_{j,2}^{2})^{2}}.
\]

Here the elements $(k_{j,1},k_{j,2})_{j\in\mathbb{N}}$ are ordered in such a way that $\{(k_{j,1},k_{j,2})\mid j\in\mathbb{N}\}=\mathbb{N}\times\mathbb{N}$ and

\[
\frac{1}{(k_{j,1}^{2}+k_{j,2}^{2})^{2}}\geq\frac{1}{(k_{j+1,1}^{2}+k_{j+1,2}^{2})^{2}}\quad\text{for all $j\in\mathbb{N}$.}
\]
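For illustration, the first $s$ index pairs in this ordering can be enumerated as in the sketch below; the grid bound `K` and the helper name `first_pairs` are ad hoc choices for this example, and ties are resolved by the stable sort, consistent with the arbitrary ordering mentioned next.

```python
# A minimal sketch: enumerate the first s pairs (k_{j,1}, k_{j,2}) ordered by
# non-increasing 1/(k1^2 + k2^2)^2. The bound K is a heuristic chosen large
# enough that truncating the search grid does not affect the first s pairs.
import math

def first_pairs(s):
    K = 2 * math.isqrt(s) + 2
    pairs = [(k1, k2) for k1 in range(1, K + 1) for k2 in range(1, K + 1)]
    pairs.sort(key=lambda p: p[0] ** 2 + p[1] ** 2)
    return pairs[:s]

print(first_pairs(6))   # [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3), (3, 1)]
```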

In cases where equality holds, the ordering is arbitrary. We solve this PDE by a finite element discretization. Given the homogeneous boundary condition on $u$, we exclude the basis functions along the boundary and use a set of local piecewise linear hat functions $\{\phi_{p,q}:0<p,q<M\}$ as the system of basis functions. The basis function $\phi_{p,q}$ is centered at $(p/M,q/M)$ and its support is the hexagon with the following vertices:

\[
\left(\tfrac{p+1}{M},\tfrac{q}{M}\right),\ \left(\tfrac{p+1}{M},\tfrac{q+1}{M}\right),\ \left(\tfrac{p}{M},\tfrac{q+1}{M}\right),\ \left(\tfrac{p-1}{M},\tfrac{q}{M}\right),\ \left(\tfrac{p-1}{M},\tfrac{q-1}{M}\right),\ \left(\tfrac{p}{M},\tfrac{q-1}{M}\right).
\]

The approximate solution $u_{M}$ of the PDE in this setting is given by

\[
u_{M}=\sum_{p,q=1}^{M-1}\hat{u}_{p,q}\,\phi_{p,q}(x_{1},x_{2}),
\]

with

\[
\left(\hat{u}_{1,1},\ldots,\hat{u}_{1,M-1},\ldots,\hat{u}_{M-1,1},\ldots,\hat{u}_{M-1,M-1}\right)B(y_{1},\ldots,y_{s})=\left(\hat{g}_{1,1},\ldots,\hat{g}_{1,M-1},\ldots,\hat{g}_{M-1,1},\ldots,\hat{g}_{M-1,M-1}\right), \tag{7}
\]

for the stiffness matrix $B$ depending on $y_{1},\ldots,y_{s}$ and the forcing vector with entries $\hat{g}_{p,q}=\int_{[0,1]^{2}}g(\boldsymbol{x})\phi_{p,q}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}$, which is explicitly computable. By truncating the infinite sum appearing in the random field $a$ to the first $s$ terms, the entries of the matrix $B=(b_{(p,q),(p^{\prime},q^{\prime})})_{p,q,p^{\prime},q^{\prime}}$ are given by

\[
\begin{aligned}
b_{(p,q),(p^{\prime},q^{\prime})}
&=\int_{[0,1]^{2}}a(\boldsymbol{x},(y_{1},\ldots,y_{s},0,\ldots))\,\nabla\phi_{p,q}(\boldsymbol{x})\cdot\nabla\phi_{p^{\prime},q^{\prime}}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\\
&=\int_{[0,1]^{2}}\nabla\phi_{p,q}(\boldsymbol{x})\cdot\nabla\phi_{p^{\prime},q^{\prime}}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\\
&\quad+\sum_{j=1}^{s}y_{j}\int_{[0,1]^{2}}\frac{\sin(\pi k_{j,1}x_{1})\sin(\pi k_{j,2}x_{2})}{(k_{j,1}^{2}+k_{j,2}^{2})^{2}}\,\nabla\phi_{p,q}(\boldsymbol{x})\cdot\nabla\phi_{p^{\prime},q^{\prime}}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\\
&=:a_{(p,q),(p^{\prime},q^{\prime})}^{(0)}+\sum_{j=1}^{s}y_{j}\,a_{(p,q),(p^{\prime},q^{\prime})}^{(j)}.
\end{aligned}
\]

Similarly to the one-dimensional case, every term $a_{(p,q),(p^{\prime},q^{\prime})}^{(j)}$ is explicitly computable. Hence, by letting $A^{(j)}=(a_{(p,q),(p^{\prime},q^{\prime})}^{(j)})_{p,q,p^{\prime},q^{\prime}}$ for $j=0,1,\ldots,s$, we have

\[
B=A^{(0)}+\sum_{j=1}^{s}y_{j}A^{(j)}.
\]
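Because each TMC sample is a window $(y_{n+s-1},\ldots,y_{n})$ of a single i.i.d. stream, the $N$ values of $\sum_{j=1}^{s}y_{j}a^{(j)}_{(p,q),(p^{\prime},q^{\prime})}$ (one per sample) needed for a fixed matrix entry form a Toeplitz matrix-vector product, which can be evaluated with one FFT-based correlation instead of $N$ separate dot products. The sketch below illustrates this; the coefficient vector `a_entry` is a hypothetical placeholder for the values $a^{(1)},\ldots,a^{(s)}$ of a single entry, and SciPy's `fftconvolve` plays the role of the fast Toeplitz multiplication.

```python
# A minimal sketch, assuming a placeholder coefficient vector a_entry;
# it exploits the Toeplitz structure of the TMC sample matrix via an FFT.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
N = s = 1024
y_stream = rng.uniform(-0.5, 0.5, size=N + s - 1)    # one i.i.d. stream
a_entry = 1.0 / np.arange(1, s + 1) ** 2              # placeholder a^(1), ..., a^(s)

# Row n of the TMC sample matrix is (y_{n+s-1}, ..., y_n), so the matrix is
# Toeplitz and its product with a_entry is a "valid" correlation of the stream.
fast = fftconvolve(y_stream, a_entry, mode="valid")   # O((N + s) log(N + s))

# Naive O(N s) evaluation, kept only as a correctness check.
naive = np.array([y_stream[n:n + s][::-1] @ a_entry for n in range(N)])
assert np.allclose(fast, naive)
```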

This time, unlike in the one-dimensional uniform case, each matrix $A^{(j)}$ is no longer tridiagonal but has a bandwidth of $O(M)$. Instead of an LU decomposition, we use the BiCGSTAB method without a preconditioner to solve the system of linear equations (7); we refer to Saad (2003) for detailed information on the BiCGSTAB method. Since this is an iterative method, the resulting solution $u_{M}$ is not exact, and the required computational time depends on the stopping criterion. Moreover, this additional computational burden makes it a priori unclear whether TMC can significantly reduce the computational time as a whole. However, in all our tests, computing the matrix $B$ for the $N$ Monte Carlo samples of $(y_{1},\ldots,y_{s})$ remained the computationally dominant part, and the relative efficiency of the TMC estimator did not depend strongly on the stopping criterion. For our experiments, we estimate the expectation of $u((1/2,1/2),\cdot)$.
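As a schematic of the solve for a single sample, the fragment below assembles $B$ from sparse stand-ins for $A^{(0)}$ and $A^{(j)}$ (a 2D finite-difference Laplacian plus small diagonal perturbations, not the actual finite element matrices above) and calls SciPy's BiCGSTAB routine; since the stiffness matrix is symmetric, the row-vector system (7) is equivalent to $B\hat{u}^{\top}=\hat{g}^{\top}$, and the stopping criterion would be configured through the solver's tolerance argument.

```python
# Schematic only: A0 and the A_j below are sparse stand-ins, not the FE
# matrices described above; the forcing vector is a placeholder as well.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

rng = np.random.default_rng(0)
M, s = 32, 16
m = M - 1                                              # interior nodes per direction
L1 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A0 = sp.kron(sp.identity(m), L1) + sp.kron(L1, sp.identity(m))
A_j = [sp.identity(m * m) * 0.01 / (j + 1) for j in range(s)]   # placeholders

y = rng.uniform(-0.5, 0.5, size=s)                     # one sample (y_1, ..., y_s)
B = (A0 + sum(yj * Aj for yj, Aj in zip(y, A_j))).tocsr()
g_hat = np.ones(m * m)                                 # placeholder forcing vector

u_hat, info = bicgstab(B, g_hat)                       # info == 0 means converged
center = (M // 2 - 1) * (M - 1) + (M // 2 - 1)         # index of the node (1/2, 1/2)
u_center = u_hat[center]                               # u_M(1/2, 1/2), hat functions are nodal
```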

Table 4 shows the results for various values of $N$, $M$ and $s$. The stopping criterion for the BiCGSTAB method is that the relative 2-norm of the residual be less than $10^{-5}$. We see that the TMC estimator is unbiased, and that the variances of both estimators decay approximately at the rate $O(1/N)$. Similarly to the one-dimensional log-normal case, the magnitude of the variance for the TMC estimator is comparable to that of the stdMC estimator. The computational time for the stdMC estimator increases with $s$ much faster than that for the TMC estimator, resulting in a substantial relative efficiency of the TMC estimator over the stdMC estimator for larger $s$.

Table 4: Estimating the expectation of $u((1/2,1/2),\cdot)$ with various values of $N$, $M$ and $s$ using the standard Monte Carlo method and the TMC method for the two-dimensional uniform case.

$N$ | stdMC mean | stdMC variance | stdMC time | TMC mean | TMC variance | TMC time | efficiency

$N = M^2 = s$:
16 | 3.516 | $1.09\cdot 10^{-3}$ | 0.0002 | 3.514 | $7.17\cdot 10^{-4}$ | 0.0020 | 0.119
64 | 3.647 | $2.16\cdot 10^{-4}$ | 0.009 | 3.643 | $2.43\cdot 10^{-4}$ | 0.027 | 0.302
256 | 3.676 | $8.03\cdot 10^{-5}$ | 1.248 | 3.678 | $8.84\cdot 10^{-5}$ | 0.406 | 2.792
1024 | 3.685 | $1.89\cdot 10^{-5}$ | 127.366 | 3.686 | $1.25\cdot 10^{-5}$ | 8.369 | 22.925
4096 | 3.688 | $3.62\cdot 10^{-6}$ | 14773.594 | 3.688 | $3.74\cdot 10^{-6}$ | 228.490 | 62.562

$N = 2M^2 = 2s$:
32 | 3.517 | $3.97\cdot 10^{-4}$ | 0.0004 | 3.515 | $3.20\cdot 10^{-4}$ | 0.0035 | 0.132
128 | 3.646 | $6.81\cdot 10^{-5}$ | 0.019 | 3.646 | $1.26\cdot 10^{-4}$ | 0.057 | 0.174
512 | 3.676 | $3.41\cdot 10^{-5}$ | 2.773 | 3.677 | $3.35\cdot 10^{-5}$ | 0.769 | 3.670
2048 | 3.686 | $5.67\cdot 10^{-6}$ | 260.868 | 3.686 | $7.59\cdot 10^{-6}$ | 16.375 | 11.893
8192 | 3.688 | $1.55\cdot 10^{-6}$ | 47914.800 | 3.688 | $1.66\cdot 10^{-6}$ | 494.336 | 90.562

$2N = M^2 = 2s$:
8 | 3.517 | $1.83\cdot 10^{-3}$ | 0.0001 | 3.513 | $1.89\cdot 10^{-3}$ | 0.0021 | 0.046
32 | 3.646 | $4.06\cdot 10^{-4}$ | 0.003 | 3.639 | $3.19\cdot 10^{-4}$ | 0.020 | 0.194
128 | 3.679 | $7.52\cdot 10^{-5}$ | 0.291 | 3.679 | $1.26\cdot 10^{-4}$ | 0.771 | 0.771
512 | 3.685 | $2.61\cdot 10^{-5}$ | 21.702 | 3.686 | $3.35\cdot 10^{-5}$ | 4.135 | 4.087
2048 | 3.687 | $3.63\cdot 10^{-6}$ | 2385.625 | 3.688 | $7.59\cdot 10^{-6}$ | 105.021 | 10.862

$2N = M^2 = s$:
32 | 3.522 | $4.11\cdot 10^{-4}$ | 0.0005 | 3.515 | $3.21\cdot 10^{-4}$ | 0.0035 | 0.184
128 | 3.646 | $7.54\cdot 10^{-5}$ | 0.034 | 3.646 | $1.27\cdot 10^{-4}$ | 0.041 | 0.490
512 | 3.676 | $2.61\cdot 10^{-5}$ | 5.259 | 3.677 | $3.35\cdot 10^{-5}$ | 0.749 | 5.468
2048 | 3.685 | $3.63\cdot 10^{-6}$ | 742.863 | 3.686 | $7.59\cdot 10^{-6}$ | 16.304 | 21.790
8192 | 3.688 | $1.96\cdot 10^{-6}$ | 115052.392 | 3.688 | $1.66\cdot 10^{-6}$ | 510.611 | 266.587

4 Conclusion

Motivated by applications to partial differential equations with random coefficients, we introduced the Toeplitz Monte Carlo estimator in this paper. The theoretical analysis shows that the TMC estimator is unbiased and that its variance converges at the canonical $1/N$ rate. From the viewpoint of tractability in the weighted $L_{2}$ space, the TMC estimator requires a stronger condition on the weights than the standard Monte Carlo estimator to achieve strong polynomial tractability. Through a series of numerical experiments for PDEs with random coefficients, we observed that the TMC estimator is quite effective in reducing the necessary computational time, and that its relative efficiency over the standard Monte Carlo estimator is substantial, particularly in high-dimensional settings.

We leave the following topics open for future research.

  • Combination with variance reduction techniques: In our numerical experiments for the one-dimensional uniform case, the variance of the TMC estimator tends to be much larger than that of the standard Monte Carlo estimator. To address this issue, it would be reasonable to apply variance reduction techniques to the TMC estimator in such a way that the resulting algorithm still allows for a fast matrix-vector multiplication. In particular, it would be interesting to design a variance reduction technique which reduces the term

    \[
    \sum_{\ell=1}^{\min(s,N)-1}(N-\ell)\sum_{\emptyset\neq u\subseteq[1:s-\ell]}I_{\rho_{|u|}}(f_{u}f_{u+\ell}).
    \]
  • Multilevel Toeplitz Monte Carlo (MLTMC): Recently, multilevel Monte Carlo (MLMC) methods in the spirit of Giles (2008) have been studied intensively in the context of PDEs with random coefficients; see for instance Cliffe et al. (2011) and Teckentrup et al. (2013). By combining the TMC estimator with MLMC, the dependence of the total computational complexity not only on the truncation dimension $s$ but also on the discretization parameter $M$ could possibly be weakened.

  • Applications to different areas: Although this work was originally motivated by PDEs with random coefficients, the TMC estimator itself is more general and can be applied in other contexts as well. Since generating points from a multivariate normal distribution is quite common, for instance, in financial engineering, operations research and machine learning, the TMC estimator may also be applied in those areas.

Acknowledgements.
Josef Dick is partly supported by the Australian Research Council Discovery Project DP190101197. Takashi Goda is partly supported by JSPS KAKENHI Grant Number 20K03744. J.D. would like to thank T.G. for his hospitality during his visit to the University of Tokyo.

References

  • Cliffe et al. (2011) Cliffe, K. A., Giles, M. B., Scheichl, R., Teckentrup, A. L.: Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14, 3–15 (2011)
  • Devroye (1986) Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
  • Dick et al. (2015) Dick, J., Kuo, F. Y., Le Gia, Q. T., Schwab, C.: Fast QMC matrix-vector multiplication. SIAM J. Sci. Comput. 37, A1436–A1450 (2015)
  • Dick et al. (2016) Dick, J., Kuo, F. Y., Le Gia, Q. T., Schwab, C.: Multilevel higher order QMC Petrov-Galerkin discretization for affine parametric operator equations. SIAM J. Numer. Anal. 54, 2541–2568 (2016)
  • Feischl et al. (2018) Feischl, M., Kuo, F. Y., Sloan, I. H.: Fast random field generation with HH-matrices. Numer. Math. 140, 639–676 (2018)
  • Frigo and Johnson (2005) Frigo, M., Johnson, S. G.: The design and implementation of FFTW3. Proc. IEEE 93, 216–231 (2005)
  • Giles (2008) Giles, M. B.: Multilevel Monte Carlo path simulation. Oper. Res. 56, 607–617 (2008)
  • Giles et al. (2008) Giles, M. B., Kuo, F. Y., Sloan, I. H., Waterhouse, B. J.: Quasi-Monte Carlo for finance applications. ANZIAM J. 50, C308–C323 (2008)
  • Hoeffding (1948) Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293–325 (1948)
  • Kuo et al. (2010) Kuo, F. Y., Sloan, I. H., Wasilkowski, G., Woźniakowski, H.: On decompositions of multivariate functions. Math. Comp. 79, 953–966 (2010)
  • Novak and Woźniakowski (2010) Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems. Volume II: Standard Information for Functionals. EMS Tracts in Mathematics, 12. European Mathematical Society, Zürich (2010)
  • Owen (2019) Owen, A. B.: Monte Carlo Theory, Methods and Examples. http://statweb.stanford.edu/~owen/mc/ (2019) [Last accessed 4 March 2020]
  • Saad (2003) Saad, Y.: Iterative Methods for Sparse Linear Systems. 2nd Ed., SIAM, Philadelphia (2003)
  • Sloan and Woźniakowski (1998) Sloan, I. H., Woźniakowski, H.: When are quasi-Monte Carlo algorithms efficient for high-dimensional integrals? J. Complexity 14, 1–33 (1998)
  • Sobol’ (1993) Sobol’, I. M.: Sensitivity estimates for nonlinear mathematical models. Math. Mod. Comp. Exp. 1, 407–414 (1993)
  • Teckentrup et al. (2013) Teckentrup, A. L., Scheichl, R., Giles, M. B., Ullmann, E.: Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients. Numer. Math. 125, 569–600 (2013)