
Context-aware learning of hierarchies of low-fidelity models for multi-fidelity uncertainty quantification

Ionuț-Gabriel Farcaș, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin
Benjamin Peherstorfer, Courant Institute of Mathematical Sciences, New York University
Frank Jenko, Max Planck Institute for Plasma Physics
Tobias Neckel, Department of Informatics, Technical University of Munich
Hans-Joachim Bungartz, Department of Informatics, Technical University of Munich
Abstract

Multi-fidelity Monte Carlo methods leverage low-fidelity and surrogate models for variance reduction to make tractable uncertainty quantification even when numerically simulating the physical systems of interest with high-fidelity models is computationally expensive. This work proposes a context-aware multi-fidelity Monte Carlo method that optimally balances the costs of training low-fidelity models with the costs of Monte Carlo sampling. It generalizes the previously developed context-aware bi-fidelity Monte Carlo method to hierarchies of multiple models and to more general types of low-fidelity models. When training low-fidelity models, the proposed approach takes into account the context in which the learned low-fidelity models will be used, namely for variance reduction in Monte Carlo estimation, which allows it to find optimal trade-offs between training and sampling to minimize upper bounds of the mean-squared errors of the estimators for given computational budgets. This is in stark contrast to traditional surrogate modeling and model reduction techniques that construct low-fidelity models with the primary goal of approximating well the high-fidelity model outputs and typically ignore the context in which the learned models will be used in upstream tasks. The proposed context-aware multi-fidelity Monte Carlo method applies to hierarchies of a wide range of types of low-fidelity models such as sparse-grid and deep-network models. Numerical experiments with the gyrokinetic simulation code Gene show speedups of up to two orders of magnitude compared to standard estimators when quantifying uncertainties in small-scale fluctuations in confined plasma in fusion reactors. This corresponds to a runtime reduction from 72 days to about four hours on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center.

1 Introduction

Uncertainty quantification is an essential building block for achieving predictive numerical simulations of physical systems. To make accurate Monte Carlo estimation of uncertainties tractable even when simulations of the physical system of interest are computationally expensive, multi-fidelity methods rely on low-fidelity or surrogate models: the low-fidelity models are leveraged for variance reduction to achieve speedups and the high-fidelity models are occasionally evaluated to guarantee unbiasedness; see [35] for a survey on multi-fidelity methods. However, if low-fidelity models are not readily available, then they need to be constructed and trained first, which can incur additional computational costs and require additional evaluations of the high-fidelity models to generate training data.

In this work, we build on the context-aware bi-fidelity Monte Carlo method introduced in [32] and propose the context-aware multi-fidelity Monte Carlo (CA-MFMC) method that trades off the costs of training hierarchies of multiple low-fidelity models with the costs of Monte Carlo sampling to obtain multi-fidelity estimators that minimize upper bounds of the mean-squared errors for given computational budgets. The proposed approach is context-aware [1, 12, 32, 43, 41] in the sense that the low-fidelity models are trained to maximize variance reduction in Monte Carlo estimation. This means that the context in which the learned models will be used, namely for variance reduction in Monte Carlo estimation, is taken into account during training, which distinguishes it from traditional surrogate modeling and model reduction techniques that construct low-fidelity models with the primary goal of approximating well the high-fidelity model outputs and typically ignore the context in which the learned models are ultimately used [2, 38]. Our proposed CA-MFMC method can be combined with a wide range of types of data-fit low-fidelity models such as sparse-grid-based models and deep-network models. We show that CA-MFMC achieves speedups of up to two orders of magnitude compared to single-fidelity Monte Carlo and standard multi-fidelity estimators when quantifying uncertainties in a plasma micro-turbulence scenario from the ASDEX Upgrade Experiment (https://www.ipp.mpg.de/16195/asdex).

We build on multi-fidelity Monte Carlo (MFMC) estimators [25, 31, 33, 34, 35, 36, 21, 20] that leverage a given hierarchy of low-fidelity models for variance reduction. Several works have employed MFMC estimators to quantify uncertainties in plasma micro-turbulence simulations [12, 26] and for estimating statistics in collisionless energetic particle confinement in stellarators [27]. See also [13] for other techniques for uncertainty quantification in plasma simulations. None of these methods trade off the training of low-fidelity models with sampling, however. The works [17, 39, 40] formulate generalized control variate techniques for multi-fidelity uncertainty propagation. There is a wide range of other multi-fidelity techniques that are based on concepts other than control variates and aim to find surrogate models from multi-fidelity information sources, such as the collocation approach introduced in [29, 23] and the multi-fidelity networks proposed in [19, 18]; see also [30, 9, 37, 44]. There have been extensions to the multilevel Monte Carlo method [6, 16, 42] that consider spatially-adaptive mesh refinements and optimal mesh hierarchies [7, 10, 22]. However, in that line of work, the low-fidelity models are coarse-grid approximations and thus the costs of constructing low-fidelity models are typically considered to be negligible and therefore are ignored. In contrast, we consider more general types of low-fidelity models such as data-fit models that incur training costs.

The first work that considered trading off the training of low-fidelity models and multi-fidelity Monte Carlo estimation is [32], which studies the bi-fidelity setting in which the low-fidelity model has algebraic accuracy and cost rates. In [45], a similar trade-off is considered in the bi-fidelity case but with more general cost and error rates and a specific focus on polynomial-chaos-based surrogate models. We go beyond these works by introducing the CA-MFMC estimator based on context-aware learning that learns hierarchies of more than one low-fidelity model for variance reduction. We first extend the work on a single low-fidelity model with algebraic cost and error rates in [32] to apply to low-fidelity models with more general rates. Key is that the corresponding bounds can be nested, which motivates the sequential training of low-fidelity models to obtain a hierarchy. In the proposed sequential training approach to fit hierarchies of low-fidelity models for CA-MFMC estimators, each step leads to an optimal trade-off between training and sampling. This leads to a context-aware learning approach because the models are learned such that the CA-MFMC estimator achieves a low mean-squared error, rather than being trained to accurately approximate high-fidelity model outputs as in traditional model reduction.

To demonstrate the performance of the proposed CA-MFMC estimator, we apply it to quantifying uncertainties in plasma micro-turbulence simulations motivated by the ITER experiment (https://www.iter.org). The goal of the ITER experiment is to create, for the very first time, a self-sustained plasma in the laboratory. A key physics obstacle is the above-mentioned small-scale fluctuations, which cause high energy loss rates despite sophisticated plasma confinement via strong and shaped magnetic fields. Building on the gyrokinetic simulation code GENE [24], we show that the proposed estimator achieves speedups of up to two orders of magnitude compared to single-fidelity Monte Carlo and standard multi-fidelity estimators, which translates into a runtime reduction from 72 days to about four hours on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center (https://www.tacc.utexas.edu/systems/lonestar6).

The remainder of this paper is organized as follows. Section 2 introduces the notation and summarizes the traditional MFMC algorithm [31, 34] and the bi-fidelity context-aware algorithm formulated in [32]. Section 3 introduces our context-aware learning approach for low-fidelity models with general cost/error rates and multiple low-fidelity models for multi-fidelity sampling methods. Section 4 presents numerical results in two scenarios: a heat conduction problem defined on a two-dimensional spatial domain with nine uncertain parameters and a realistic plasma micro-turbulence scenario with 12 uncertain inputs, for which one realization of the uncertain inputs requires a total runtime of about 411 seconds on 32 cores. The code and data to reproduce our numerical results are available at https://github.com/ionutfarcas/context-aware-mfmc.

2 Preliminaries

This section reviews traditional MFMC estimators [31, 34] and the bi-fidelity context-aware MFMC estimator [32] that uses a high-fidelity model and a single low-fidelity model only.

2.1 Static multi-fidelity Monte Carlo estimation

Let f(0):𝒳𝒴f^{(0)}:\mathcal{X}\to\mathcal{Y} represent the input-output response of a potentially expensive-to-evaluate computational model. The input domain is 𝒳d\mathcal{X}\subset\mathbb{R}^{d} and the output is scalar with output domain 𝒴\mathcal{Y}\subset\mathbb{R}. We consider the situation where the input 𝜽=[θ1,θ2,,θd]T\bm{\theta}=[\theta_{1},\theta_{2},\ldots,\theta_{d}]^{T} is a realization of a random variable 𝚯=[Θ1,,Θd]T\bm{\Theta}=[\Theta_{1},\dots,\Theta_{d}]^{T} so that the output f(0)(𝜽)f^{(0)}(\bm{\theta}) becomes a realization of a random variable f(0)(𝚯)f^{(0)}(\bm{\Theta}) too. We denote the probability density function of 𝚯\bm{\Theta} as π\pi. Our goal is to estimate the expected value of f(0)(𝚯)f^{(0)}(\bm{\Theta})

μ0=𝔼[f(0)(𝚯)]=𝒳f(0)(𝜽)π(𝜽)d𝜽.\mu_{0}=\mathbb{E}[f^{(0)}(\bm{\Theta})]=\int_{\mathcal{X}}f^{(0)}(\bm{\theta})\pi(\bm{\theta})\mathrm{d}\bm{\theta}\,.

The MFMC estimator introduced in [31, 34] combines a hierarchy of k+1k+1 models, consisting of a high-fidelity model f(0)f^{(0)} and kk low-fidelity models f(1)f^{(1)}, f(2)f^{(2)}, …, f(k)f^{(k)}. The accuracy of the low-fidelity models is measured by their Pearson correlation coefficient with respect to f(0)f^{(0)},

ρj=Cov[f(0),f(j)]σ0σj[1,1],j=1,,k,\rho_{j}=\frac{\mathrm{Cov}[f^{(0)},f^{(j)}]}{\sigma_{0}\sigma_{j}}\in[-1,1],\quad j=1,\ldots,k,

where σj2\sigma_{j}^{2} denotes the variance of the output random variable f(j)(𝚯)f^{(j)}(\bm{\Theta}) for j=0,1,,kj=0,1,\ldots,k and Cov[,]\operatorname{Cov}[\cdot,\cdot] is the covariance. The evaluation costs of the models are 0<w1,,wk0<w_{1},\dots,w_{k}, respectively. We normalize the cost of evaluating the high-fidelity model such that w0=1w_{0}=1, without loss of generality. The models are assumed to satisfy the following ordering

1=|ρ0|>|ρ1|>>|ρk|,wj1wj>ρj12ρj2ρj2ρj+12,j=1,,k,1=\left|\rho_{0}\right|>\left|\rho_{1}\right|>\ldots>\left|\rho_{k}\right|,\quad\frac{w_{j-1}}{w_{j}}>\frac{\rho_{j-1}^{2}-\rho_{j}^{2}}{\rho_{j}^{2}-\rho_{j+1}^{2}},\quad j=1,\ldots,k, (2.1)

where ρ=0\rho_{\ell}=0 for k+1\ell\geq k+1.
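For practical use of the MFMC estimator, the ordering condition (2.1) has to be verified from estimated correlations and costs. The following Python sketch performs this check; the function name and interface are ours and are not part of any released code.

```python
import numpy as np

def satisfies_mfmc_ordering(rho, w):
    """Check the MFMC ordering condition (2.1) for correlation coefficients
    rho = [rho_0, ..., rho_k] (with rho_0 = 1) and costs w = [w_0, ..., w_k]."""
    rho = np.abs(np.asarray(rho, dtype=float))
    w = np.asarray(w, dtype=float)
    k = len(rho) - 1
    rho_ext = np.append(rho, 0.0)  # convention rho_{k+1} = 0
    # 1 = |rho_0| > |rho_1| > ... > |rho_k|
    if not np.all(np.diff(rho) < 0.0):
        return False
    # cost-to-correlation ratio condition for j = 1, ..., k
    for j in range(1, k + 1):
        lhs = w[j - 1] / w[j]
        rhs = (rho_ext[j - 1] ** 2 - rho_ext[j] ** 2) / (rho_ext[j] ** 2 - rho_ext[j + 1] ** 2)
        if not lhs > rhs:
            return False
    return True
```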

Let now mjm_{j}\in\mathbb{N} for j=0,1,kj=0,1\ldots,k denote the number of evaluations of model f(j)f^{(j)}, which satisfy

0m0m1mk0\leq m_{0}\leq m_{1}\leq\ldots\leq m_{k}

because of (2.1). Consider mkm_{k} independent and identically distributed (i.i.d.) samples drawn from the input density π\pi:

𝜽1,𝜽2,,𝜽mk.\bm{\theta}_{1},\bm{\theta}_{2},\ldots,\bm{\theta}_{m_{k}}.

To obtain the MFMC estimator, model f(j)f^{(j)} is evaluated at the first mjm_{j} samples 𝜽1,𝜽2,,𝜽mj\bm{\theta}_{1},\bm{\theta}_{2},\ldots,\bm{\theta}_{m_{j}} to obtain output random variables

f(j)(𝜽1),f(j)(𝜽2),,f(j)(𝜽mj),f^{(j)}(\bm{\theta}_{1}),f^{(j)}(\bm{\theta}_{2}),\ldots,f^{(j)}(\bm{\theta}_{m_{j}})\,, (2.2)

for j=0,1,\ldots,k. From (2.2), we derive the following standard MC estimators for j=1,\dots,k:

E^mj(j)=1mji=1mjf(j)(𝜽i),E^mj1(j)=1mj1i=1mj1f(j)(𝜽i)\hat{E}_{m_{j}}^{(j)}=\frac{1}{m_{j}}\sum_{i=1}^{m_{j}}f^{(j)}(\bm{\theta}_{i}),\quad\hat{E}_{m_{j-1}}^{(j)}=\frac{1}{m_{j-1}}\sum_{i=1}^{m_{j-1}}f^{(j)}(\bm{\theta}_{i})

as well as

\hat{E}_{m_{0}}^{(0)}=\frac{1}{m_{0}}\sum_{i=1}^{m_{0}}f^{(0)}(\bm{\theta}_{i})\,.

The MFMC estimator is

E^MFMC=E^m0(0)+j=1kαj(E^mj(j)E^mj1(j))\hat{E}^{\mathrm{MFMC}}=\hat{E}_{m_{0}}^{(0)}+\sum_{j=1}^{k}\alpha_{j}(\hat{E}_{m_{j}}^{(j)}-\hat{E}_{m_{j-1}}^{(j)})

with the coefficients α1,,αk\alpha_{1},\dots,\alpha_{k}\in\mathbb{R}. The total computational cost of the MFMC estimator is therefore

p=j=0kmjwj.p=\sum_{j=0}^{k}m_{j}w_{j}\,.

Fixing pp and then selecting m0,m1,,mkm_{0}^{*},m_{1}^{*},\ldots,m_{k}^{*} model evaluations and α1,α2,,αk\alpha_{1}^{*},\alpha_{2}^{*},\ldots,\alpha_{k}^{*} coefficients such that the variance of the MFMC estimator E^MFMC\hat{E}^{\mathrm{MFMC}} is minimized, gives the MSE

MSE[E^MFMC]=σ02p(j=0kwj(ρj2ρj+12))2.\mathrm{MSE}[\hat{E}^{\mathrm{MFMC}}]=\frac{\sigma_{0}^{2}}{p}\left(\sum_{j=0}^{k}\sqrt{w_{j}(\rho_{j}^{2}-\rho_{j+1}^{2})}\right)^{2}. (2.3)

The optimal m0,m1,,mkm_{0}^{*},m_{1}^{*},\ldots,m_{k}^{*} model evaluations and α1,α2,,αk\alpha_{1}^{*},\alpha_{2}^{*},\ldots,\alpha_{k}^{*} coefficients are available in closed form for a given budget pp; see [34] for details.
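As an illustration, the sketch below evaluates the closed-form allocation of [34] and assembles the MFMC estimator from precomputed model outputs. The array layout and helper names are our own assumptions, and the expressions should be checked against [34].

```python
import numpy as np

def mfmc_allocation(rho, w, sigma, p):
    """Optimal sample sizes m_j and coefficients alpha_j for budget p,
    following the closed-form expressions of [34].
    rho   : [rho_0 = 1, rho_1, ..., rho_k] correlations with f^(0)
    w     : [w_0 = 1, w_1, ..., w_k] normalized evaluation costs
    sigma : [sigma_0, ..., sigma_k] standard deviations of the model outputs"""
    rho, w, sigma = (np.asarray(a, dtype=float) for a in (rho, w, sigma))
    rho_ext = np.append(rho, 0.0)                          # rho_{k+1} = 0
    diff = rho_ext[:-1] ** 2 - rho_ext[1:] ** 2            # rho_j^2 - rho_{j+1}^2
    r = np.sqrt(w[0] * diff / (w * (1.0 - rho[1] ** 2)))   # sample-size ratios, r_0 = 1
    m0 = p / np.dot(w, r)                                  # high-fidelity sample size
    m = np.maximum(1, np.floor(m0 * r)).astype(int)        # m_j = r_j m_0 (rounded)
    alpha = rho[1:] * sigma[0] / sigma[1:]                 # alpha_j = rho_j sigma_0 / sigma_j
    return m, alpha

def mfmc_estimate(outputs, m, alpha):
    """MFMC estimator from outputs[j] = f^(j) evaluated at the shared inputs
    theta_1, ..., theta_{m_j} (the first m[j] entries are used)."""
    est = np.mean(outputs[0][: m[0]])
    for j in range(1, len(outputs)):
        est += alpha[j - 1] * (np.mean(outputs[j][: m[j]]) - np.mean(outputs[j][: m[j - 1]]))
    return est
```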

The MFMC estimator leverages the cheaper-to-evaluate low-fidelity models with the aim to achieve a lower MSE than a standard MC estimator with the same costs. That is, given mm i.i.d. samples from 𝚯\bm{\Theta}, the standard MC mean estimator is

E^m(0)=1mi=1mf(0)(𝜽i).\hat{E}_{m}^{(0)}=\frac{1}{m}\sum_{i=1}^{m}f^{(0)}(\bm{\theta}_{i}). (2.4)

The computational cost of the Monte Carlo estimator (2.4) of \mathbb{E}[f^{(0)}(\bm{\Theta})] is p=mw_{0}=m because the function f^{(0)} is evaluated at m realizations and the cost of each evaluation of f^{(0)} is w_{0}=1. The MSE of the standard MC estimator \hat{E}_{m}^{(0)} reads

MSE[E^m(0)]=σ02pw0=σ02p.\mathrm{MSE}[\hat{E}_{m}^{(0)}]=\frac{\sigma_{0}^{2}}{p}w_{0}=\frac{\sigma_{0}^{2}}{p}. (2.5)

2.2 Context-aware bi-fidelity Monte Carlo estimator with algebraic accuracy and cost rates

The work [32] introduces a context-aware bi-fidelity MFMC approach that trades off the training costs of constructing a low-fidelity model fn(1)f^{(1)}_{n} and sampling. Following [32], the low-fidelity model fn(1)f^{(1)}_{n} is obtained via a training process such as data-fit models and reduced models. Correspondingly, the subscript nn refers to the number of high-fidelity evaluations that are needed to train fn(1)f^{(1)}_{n}. For example, in case of obtaining fn(1)f^{(1)}_{n} via training a neural network, the subscript nn refers to the number of training input-output pairs obtained with the high-fidelity model f(0)f^{(0)}. This also means that the correlation coefficient ρ1(n)\rho_{1}(n) between f(0)(𝚯)f^{(0)}(\bm{\Theta}) and fn(1)(𝚯)f^{(1)}_{n}(\bm{\Theta}) as well as the evaluation cost w1(n)w_{1}(n) of fn(1)f^{(1)}_{n} depend on nn.

The dependence of the correlation and costs on n is described as follows: to trade off training the low-fidelity model and sampling, the work [32] makes the assumption that the correlation coefficient between the high-fidelity output random variable f^{(0)}(\bm{\Theta}) and the low-fidelity output random variable f^{(1)}_{n}(\bm{\Theta}) is bounded by

1ρ12(n)c1nα1-\rho_{1}^{2}(n)\leq c_{1}n^{-\alpha}

with respect to nn, where c1,α>0c_{1},\alpha>0 are constants. The cost w1(n)w_{1}(n) of evaluating the low-fidelity model is bounded by

w1(n)c2nβw_{1}(n)\leq c_{2}n^{\beta}

with constants c2,β>0c_{2},\beta>0.

A budget pp corresponds to pp high-fidelity model evaluations because we have w0=1w_{0}=1. Thus, if nn high-fidelity model samples are used for training fn(1)f^{(1)}_{n}, then a budget of pnp-n is left for sampling. The corresponding context-aware bi-fidelity estimator is

E^nCA-MFMC=E^m0(0)+α1(E^m1,n(1)E^m0,n(1))\hat{E}^{\text{CA-MFMC}}_{n}=\hat{E}_{m_{0}^{*}}^{(0)}+\alpha_{1}^{*}(\hat{E}_{m_{1}^{*},n}^{(1)}-\hat{E}_{m_{0}^{*},n}^{(1)})

where m0,m1,α1m_{0}^{*},m_{1}^{*},\alpha_{1}^{*} are chosen to minimize the MSE of E^nCA-MFMC\hat{E}^{\text{CA-MFMC}}_{n} for a given pp and nn. Consequently, the MSE of E^nCA-MFMC\hat{E}^{\text{CA-MFMC}}_{n} depends on pp and nn

MSE(E^nCA-MFMC)=σ02pn(1ρ12(n)+w1(n)ρ12(n))2,\operatorname{MSE}(\hat{E}^{\text{CA-MFMC}}_{n})=\frac{\sigma_{0}^{2}}{p-n}\left(\sqrt{1-\rho_{1}^{2}(n)}+\sqrt{w_{1}(n)\rho_{1}^{2}(n)}\right)^{2}\,,

which is in contrast to the MFMC estimator E^MFMC\hat{E}^{\mathrm{MFMC}} for which the MSE (2.3) depends on pp only and is independent of potential training costs of constructing low-fidelity models.

If the budget pp is fixed, nn can be chosen by minimizing the upper bound of the MSE

MSE(E^nCA-MFMC)2σ02pn(c1nα+c2nβ).\operatorname{MSE}(\hat{E}^{\text{CA-MFMC}}_{n})\leq\frac{2\sigma_{0}^{2}}{p-n}\left(c_{1}n^{-\alpha}+c_{2}n^{\beta}\right)\,. (2.6)

The work [32] shows that there exists a unique nn^{*} that minimizes (2.6) with respect to nn for a given pp under certain assumptions; however, no closed form expression of nn^{*} is available and thus nn^{*} needs to be found numerically. The optimum nn^{*} is bounded independently of the budget pp; see [32] for details.
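Because no closed-form expression of n^* is available, the upper bound (2.6) can be minimized numerically, for example with a bounded scalar optimization followed by rounding to an integer. The sketch below assumes algebraic rates and uses illustrative constants only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def optimal_training_size(p, c1, alpha, c2, beta, sigma0_sq=1.0):
    """Minimize the upper bound (2.6),
    u(n) = 2 sigma_0^2 / (p - n) * (c1 * n**(-alpha) + c2 * n**beta), over [1, p-1]."""
    u = lambda n: 2.0 * sigma0_sq / (p - n) * (c1 * n ** (-alpha) + c2 * n ** beta)
    res = minimize_scalar(u, bounds=(1.0, p - 1.0), method="bounded")
    n_lo = max(1, int(np.floor(res.x)))          # training sizes are integers:
    n_hi = min(int(p - 1), int(np.ceil(res.x)))  # compare the two neighboring integers
    return min((n_lo, n_hi), key=u)

# illustrative constants, not taken from the numerical experiments
n_star = optimal_training_size(p=500, c1=1.0, alpha=1.5, c2=1e-3, beta=1.2)
```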

3 Context-aware multi-fidelity Monte Carlo estimation

This section presents the methodological novelty of this paper. We introduce context-aware learning for MFMC with low-fidelity models that have general cost and error rates and with hierarchies of more than one low-fidelity model.

3.1 Setup with multiple models

We now consider multiple low-fidelity models f_{n_{1}}^{(1)},\dots,f_{n_{k}}^{(k)} that require n_{1},\dots,n_{k} evaluations of the high-fidelity model, respectively, for their training. We refer to n_{1},\dots,n_{k} as the numbers of training samples used to fit the low-fidelity models. We then define the CA-MFMC estimator as

E^n1,,nkCA-MFMC=E^m0(0)+j=1kαj(E^mj(j)E^mj1(j)),\hat{E}_{n_{1},\dots,n_{k}}^{\text{CA-MFMC}}=\hat{E}_{m_{0}^{*}}^{(0)}+\sum_{j=1}^{k}\alpha_{j}^{*}\left(\hat{E}_{m_{j}^{*}}^{(j)}-\hat{E}_{m_{j-1}^{*}}^{(j)}\right)\,,

where the m0,,mkm_{0}^{*},\dots,m_{k}^{*} and α1,,αk\alpha_{1}^{*},\dots,\alpha_{k}^{*} minimize the MSE of E^n1,,nkCA-MFMC\hat{E}_{n_{1},\dots,n_{k}}^{\text{CA-MFMC}} for given n1,,nkn_{1},\dots,n_{k} and budget pp. The MSE of the CA-MFMC estimator is

MSE[E^n1,,nkCAMFMC]=σ02pi=1kni(j=0kwj(nj)(ρj2(nj)ρj+12(nj+1)))2,\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}}]=\frac{\sigma_{0}^{2}}{p-\sum_{i=1}^{k}n_{i}}\left(\sum_{j=0}^{k}\sqrt{w_{j}(n_{j})(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1}))}\right)^{2}\,, (3.1)

where now the MSE depends on the n1,,nkn_{1},\dots,n_{k} training samples used for constructing the low-fidelity models. We are interested in choosing n1,,nkn_{1},\dots,n_{k} to minimize an upper bound of the MSE (3.1) so that the training costs given by n1,,nkn_{1},\dots,n_{k} and the costs of taking samples from the high- and the low-fidelity output random variables are balanced.

We make the following assumptions on the accuracy and cost behavior of the low-fidelity models with respect to the number of training samples n1,,nkn_{1},\dots,n_{k}:

Assumption 1.

For all j=1,,kj=1,\dots,k, there exists a constant ca,j>0c_{a,j}>0 and a positive, decreasing, at least twice continuously differentiable function ra,j(nj):(0,)(0,)r_{a,j}(n_{j}):(0,\infty)\rightarrow(0,\infty) such that

1ρj2(nj)ca,jra,j(nj),j=1,,k.1-\rho_{j}^{2}(n_{j})\leq c_{a,j}r_{a,j}(n_{j}),\quad j=1,\ldots,k\,.
Assumption 2.

For all j=1,,kj=1,\dots,k, there exists a constant cc,j>0c_{c,j}>0 and a positive, increasing, at least twice continuously differentiable function rc,j(nj):(0,)(0,)r_{c,j}(n_{j}):(0,\infty)\rightarrow(0,\infty) such that the evaluation costs are bounded as

wj(nj)cc,jrc,j(nj),j=1,,k.w_{j}(n_{j})\leq c_{c,j}r_{c,j}(n_{j}),\quad j=1,\ldots,k.

Assumptions 1 and 2 hold for a wide range of typical error and cost rates: the functions r_{a,j}(n)=n^{-\alpha}, \alpha>0, and r_{a,j}(n)=\mathrm{e}^{-\alpha n}, \alpha>0, satisfy Assumption 1 for n\in(0,\infty). The functions r_{c,j}(n)=n^{\alpha}, \alpha>1, and r_{c,j}(n)=\mathrm{e}^{\alpha n}, \alpha>0, satisfy Assumption 2 for n\in(0,\infty). Additionally, the error and cost rates given here are strictly convex.

3.2 Context-aware multi-fidelity Monte Carlo sampling with one low-fidelity model

Let us first consider only one low-fidelity model, which means that k=1. The following goes beyond what was introduced in [32] because here we allow more general error and cost rates; cf. Section 2.2 where [32] is reviewed.

We obtain the following upper bound of the MSE of the CA-MFMC estimator (3.1),

MSE[E^n1CAMFMC]=\displaystyle\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1}}]= σ02pn1(1ρ12(n1)+w1(n1)ρ12(n1))2\displaystyle\frac{\sigma_{0}^{2}}{p-n_{1}}\left(\sqrt{1-\rho_{1}^{2}(n_{1})}+\sqrt{w_{1}(n_{1})\rho_{1}^{2}(n_{1})}\right)^{2}
=\displaystyle= σ02pn1(1ρ12(n1)+w1(n1)ρ12(n1)+21ρ12(n1)w1(n1)ρ12(n1))\displaystyle\frac{\sigma_{0}^{2}}{p-n_{1}}\left(1-\rho_{1}^{2}(n_{1})+w_{1}(n_{1})\rho_{1}^{2}(n_{1})+2\sqrt{1-\rho_{1}^{2}(n_{1})}\sqrt{w_{1}(n_{1})\rho_{1}^{2}(n_{1})}\right)
\displaystyle\leq σ02pn1(2(1ρ12(n1))+2w1(n1)ρ12(n1))\displaystyle\frac{\sigma_{0}^{2}}{p-n_{1}}\left(2(1-\rho_{1}^{2}(n_{1}))+2w_{1}(n_{1})\rho_{1}^{2}(n_{1})\right) (3.2)
\displaystyle\leq 2σ02pn1(1ρ12(n1)+w1(n1))\displaystyle\frac{2\sigma_{0}^{2}}{p-n_{1}}\left(1-\rho_{1}^{2}(n_{1})+w_{1}(n_{1})\right) (3.3)
\displaystyle\leq 2σ02pn1(ca,1ra,1(n1)+cc,1rc,1(n1)),\displaystyle\frac{2\sigma_{0}^{2}}{p-n_{1}}\left(c_{a,1}r_{a,1}\left(n_{1}\right)+c_{c,1}r_{c,1}\left(n_{1}\right)\right), (3.4)

where inequality (3.2) follows from the inequality of arithmetic and geometric means, 2\sqrt{a}\sqrt{b}\leq a+b for all a,b\geq 0, inequality (3.3) results from \rho_{1}^{2}\leq 1, and (3.4) is due to Assumptions 1 and 2.

The objective that we want to minimize with respect to n1n_{1} is

u1:[1,p1](0,),n11pn1(ca,1ra,1(n1)+cc,1rc,1(n1)).u_{1}:[1,p-1]\rightarrow(0,\infty),n_{1}\mapsto\frac{1}{p-n_{1}}\left(c_{a,1}r_{a,1}\left(n_{1}\right)+c_{c,1}r_{c,1}\left(n_{1}\right)\right)\,. (3.5)

The following proposition clarifies when a unique global minimum exists.

Proposition 1.

Consider function u1u_{1} defined in (3.5) with ra,1r_{a,1} and rc,1r_{c,1} satisfying Assumptions 1 and 2, respectively. Additionally assume that

ca,1ra,1′′(n1)+cc,1rc,1′′(n1)>0c_{a,1}r_{a,1}^{\prime\prime}(n_{1})+c_{c,1}r_{c,1}^{\prime\prime}(n_{1})>0 (3.6)

holds for all n1[1,p1]n_{1}\in[1,p-1], where ra,1′′r_{a,1}^{\prime\prime} and rc,1′′r_{c,1}^{\prime\prime} are the second derivatives of ra,1r_{a,1} and rc,1r_{c,1}, respectively. Then, the function u1u_{1} has a unique global minimum n1[1,p1]n_{1}^{*}\in[1,p-1].

Proof.

Define the function h1(n1)=ca,1ra,1(n1)+cc,1rc,1(n1)h_{1}(n_{1})=c_{a,1}r_{a,1}(n_{1})+c_{c,1}r_{c,1}(n_{1}) and notice that

h1′′(n1)=ca,1ra,1′′(n1)+cc,1rc,1′′(n1)>0,n1[1,p1]h_{1}^{\prime\prime}(n_{1})=c_{a,1}r_{a,1}^{\prime\prime}(n_{1})+c_{c,1}r_{c,1}^{\prime\prime}(n_{1})>0\,,\qquad n_{1}\in[1,p-1] (3.7)

due to (3.6), which means that h1h_{1} is strictly convex in [1,p1][1,p-1]. We now consider u1u_{1} and its first derivative

u1(n1)=(pn1)h1(n1)+h1(n1)(pn1)2.u_{1}^{\prime}(n_{1})=\frac{(p-n_{1})h^{\prime}_{1}(n_{1})+h_{1}(n_{1})}{(p-n_{1})^{2}}\,. (3.8)

Case 1: If u1(n1)>0u_{1}^{\prime}(n_{1})>0 for all n1[1,p1]n_{1}\in[1,p-1], then u1u_{1} is strictly increasing and the minimum of u1u_{1} in [1,p1][1,p-1] is taken at the left boundary n1=1n_{1}^{*}=1, which is the global unique minimum.

Case 2: Analogously to Case 1, if u_{1}^{\prime}(n_{1})<0 for all n_{1}\in[1,p-1], then u_{1} is strictly decreasing and the minimum is taken at the right boundary n_{1}^{*}=p-1, which is the unique global minimum.

Case 3: There is a stationary point n1[1,p1]n_{1}^{*}\in[1,p-1] such that u1(n1)=0u_{1}^{\prime}(n_{1}^{*})=0. At a stationary point n1n_{1}^{*} of u1u_{1}, the second derivative,

u1′′(n1)=h1′′(n1)pn1+2pn1u1(n1),u_{1}^{\prime\prime}(n_{1})=\frac{h_{1}^{\prime\prime}(n_{1})}{p-n_{1}}+\frac{2}{p-n_{1}}u_{1}^{\prime}(n_{1})\,,

is positive, u_{1}^{\prime\prime}(n_{1}^{*})>0, because of (3.7) and u^{\prime}_{1}(n_{1}^{*})=0. Thus, together with the fact that the function u_{1} is twice continuously differentiable and univariate, we have that u^{\prime\prime}_{1} is positive in a neighborhood of every stationary point, because the inequality h_{1}^{\prime\prime}(n_{1}^{*})>0 is strict. Thus, in the interior (1,p-1), there can only be minima, which also means that at the boundaries of the interval [1,p-1] there can only be maxima. This means that there exists only one minimum, which shows uniqueness. ∎

Remark 1.

Assumption (3.6) in Proposition 1 is satisfied if ra,1r_{a,1} is strictly convex and rc,1r_{c,1} is convex, which holds for the examples given in Section 3.1.

We next show that the minimizer of (3.5) is also bounded from above, independent of the computational budget p, which implies that beyond a finite number of training samples, additional high-fidelity model evaluations should be spent on Monte Carlo sampling rather than on further improving the low-fidelity model.

Proposition 2.

Let Proposition 1 apply with condition (3.6) satisfied for all n1(0,)n_{1}\in(0,\infty). Additionally, assume that there exists an n¯1(0,)\bar{n}_{1}\in(0,\infty) such that

ca,1ra,1(n¯1)+cc,1rc,1(n¯1)=0c_{a,1}r_{a,1}^{\prime}(\bar{n}_{1})+c_{c,1}r_{c,1}^{\prime}(\bar{n}_{1})=0 (3.9)

holds. Then n¯1\bar{n}_{1} is unique and the minimum n1n_{1}^{*} of u1u_{1} derived in Proposition 1 is bounded from above independently of the budget pp as

n1max{1,n¯1}.n_{1}^{*}\leq\max\{1,\bar{n}_{1}\}\,. (3.10)
Proof.

Consider h_{1} introduced in the proof of Proposition 1. There is an \bar{n}_{1}\in(0,\infty) with h_{1}^{\prime}(\bar{n}_{1})=0 as given by (3.9). The stationary point \bar{n}_{1} is unique because h_{1}^{\prime} is strictly increasing, since we assumed that (3.6) holds in (0,\infty). Furthermore, the stationary point \bar{n}_{1} is independent of p because h_{1} does not depend on p.

Case 1: If \bar{n}_{1}<1, then h_{1}^{\prime}>0 (h_{1}^{\prime} is strictly increasing because h_{1}^{\prime\prime}>0) for all n_{1}\in[1,p-1] and all p\geq 1, and thus u_{1}^{\prime}>0 in [1,p-1]. Thus, n_{1}^{*}=1 and (3.10) is a bound independent of p.

Case 2: Let now \bar{n}_{1}\geq 1. First, consider budgets p>\bar{n}_{1}+1 so that \bar{n}_{1}\in[1,p-1). Because h_{1}^{\prime}(\bar{n}_{1})=0 with \bar{n}_{1}\in[1,p-1) and h_{1}^{\prime} strictly increasing, we have that h_{1}^{\prime}>0 in (\bar{n}_{1},p-1] and thus u_{1}^{\prime}>0 in (\bar{n}_{1},p-1], which means u_{1} cannot be strictly decreasing on all of [1,p-1]. Thus, the minimum of u_{1} in [1,p-1] is either at a stationary point of u_{1} or at 1. If it is a stationary point, then h_{1}^{\prime}(n_{1}^{*})<0 has to hold to obtain u_{1}^{\prime}(n_{1}^{*})=0, and we obtain h_{1}^{\prime}(n_{1}^{*})<0=h_{1}^{\prime}(\bar{n}_{1}), which implies n_{1}^{*}\leq\bar{n}_{1} because h_{1}^{\prime} is strictly increasing; thus (3.10) is a p-independent bound of n_{1}^{*}. If n_{1}^{*}=1, then 1=n_{1}^{*}\leq\bar{n}_{1} still holds because \bar{n}_{1}\geq 1. Second, consider budgets p\leq\bar{n}_{1}+1 so that \bar{n}_{1}\geq p-1. Then, it holds that n_{1}^{*}\leq p-1\leq\bar{n}_{1}, which again leads to the bound (3.10).

This shows that depending on the properties of h1h_{1}, the maximum of 11 and n¯1\bar{n}_{1} is an upper bound of n1n_{1}^{*}. ∎
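In practice, the budget-independent bound \bar{n}_{1} from (3.9) can be computed by root finding on h_{1}^{\prime}. A minimal sketch is given below; it assumes that the bracket passed to the solver contains the sign change of h_{1}^{\prime} and uses illustrative rate constants rather than values from the numerical experiments.

```python
import math
from scipy.optimize import brentq

def n_bar(h1_prime, lo=1e-3, hi=1e6):
    """Unique root of h_1'(n) = c_{a,1} r_{a,1}'(n) + c_{c,1} r_{c,1}'(n); by (3.6),
    h_1' is strictly increasing, so any bracket with a sign change contains the root."""
    return brentq(h1_prime, lo, hi)

# illustrative example: exponential accuracy rate and algebraic cost rate
c_a, a, c_c, b = 1.0, 0.5, 1e-3, 1.2
h1_prime = lambda n: -c_a * a * math.exp(-a * n) + c_c * b * n ** (b - 1.0)
n1_cap = n_bar(h1_prime)   # then n_1^* <= max(1, n1_cap) by Proposition 2
```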

3.3 Context-aware multi-fidelity Monte Carlo sampling with two or more low-fidelity models

The second novelty of our proposed context-aware approach is that we can consider more than just one low-fidelity model. In the following, we introduce a sequential training approach to fit hierarchies of low-fidelity models for the CA-MFMC estimator, where in each step the optimal trade off between training and sampling is achieved.

We first show an upper bound of the MSE (3.1) in terms of accuracy and cost rates for the case with k>1k>1 low-fidelity models:

Lemma 1.

Let Assumptions 1 and 2 hold. Then, the MSE (3.1) of the CA-MFMC estimator E^n1,,nkCAMFMC\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}} can be bounded as

MSE[E^n1,,nkCAMFMC](k+1)σ02pk1nk(κk1+wk1(nk1)ca,kra,k(nk)+cc,krc,k(nk)),\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}}]\leq\frac{(k+1)\sigma_{0}^{2}}{p_{k-1}-n_{k}}\left(\kappa_{k-1}+w_{k-1}(n_{k-1})c_{a,k}r_{a,k}(n_{k})+c_{c,k}r_{c,k}(n_{k})\right)\,, (3.11)

where

p_{k-1}=p-\sum_{j=1}^{k-1}n_{j},\quad p_{0}=p\,,\qquad\kappa_{k-1}=\sum_{j=0}^{k-2}w_{j}(n_{j})(1-\rho_{j+1}^{2}(n_{j+1})),\quad\kappa_{0}=0\,. (3.12)
Proof.

With the sums of squares inequality,

(i=0kai)2(k+1)i=0kai2,a0,,ak,\left(\sum_{i=0}^{k}a_{i}\right)^{2}\leq(k+1)\sum_{i=0}^{k}a_{i}^{2}\,,\qquad a_{0},\dots,a_{k}\in\mathbb{R}\,,

we obtain

MSE[E^n1,,nkCAMFMC]=\displaystyle\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}}]= σ02pj=0knj(j=0kwj(nj)(ρj2(nj)ρj+12(nj+1)))2\displaystyle\frac{\sigma_{0}^{2}}{p-\sum_{j=0}^{k}n_{j}}\left(\sum_{j=0}^{k}\sqrt{w_{j}(n_{j})\left(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1})\right)}\right)^{2}
\displaystyle\leq σ02(k+1)pj=0knjj=0kwj(nj)(ρj2(nj)ρj+12(nj+1))\displaystyle\frac{\sigma_{0}^{2}(k+1)}{p-\sum_{j=0}^{k}n_{j}}\sum_{j=0}^{k}w_{j}(n_{j})(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1}))
=\displaystyle= σ02(k+1)pj=0knj(j=0k2wj(nj)(ρj2(nj)ρj+12(nj+1))+\displaystyle\frac{\sigma_{0}^{2}(k+1)}{p-\sum_{j=0}^{k}n_{j}}\left(\sum_{j=0}^{k-2}w_{j}(n_{j})(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1}))+\right.
wk1(nk1)(ρk12(nk1)ρk2(nk))+wk(nk)ρk2(nk)),\displaystyle\qquad\qquad\qquad\qquad\qquad w_{k-1}(n_{k-1})(\rho_{k-1}^{2}(n_{k-1})-\rho_{k}^{2}(n_{k}))+w_{k}(n_{k})\rho_{k}^{2}(n_{k})\bigg{)}\,,

where we used the convention that ρk+1=0\rho_{k+1}=0 in the last equation. With ρj21\rho_{j}^{2}\leq 1 for j=0,,kj=0,\dots,k and pk1,κk1p_{k-1},\kappa_{k-1} defined in (3.12), the bound (3.11) follows with Assumptions 1 and 2. ∎

Bound (3.11) in Lemma 1 decomposes into the component κk1\kappa_{k-1} which depends on n1,,nk1n_{1},\dots,n_{k-1} only and components wk1(nk1)(ρk12(nk1)ρk2(nk))w_{k-1}(n_{k-1})(\rho_{k-1}^{2}(n_{k-1})-\rho_{k}^{2}(n_{k})) and wk(nk)ρk2(nk)w_{k}(n_{k})\rho_{k}^{2}(n_{k}) that can be bounded with Assumptions 1 and 2 so that they depend on nkn_{k} for a fixed nk1n_{k-1}. This decomposition motivates a sequential approach of adding low-fidelity models to the CA-MFMC estimator. At iteration j=1,,kj=1,\dots,k, we consider the objective

uj(nj;n1,,nj1):[1,pj11](0,),nj1pj1nj(κj1+c^a,jra,j(nj)+cc,jrc,j(nj)),u_{j}(n_{j};n_{1},\dots,n_{j-1}):[1,p_{j-1}-1]\to(0,\infty),\qquad n_{j}\mapsto\frac{1}{p_{j-1}-n_{j}}\left(\kappa_{j-1}+\hat{c}_{a,j}r_{a,j}(n_{j})+c_{c,j}r_{c,j}(n_{j})\right)\,, (3.13)

where c^a,j=wj1(nj1)ca,j\hat{c}_{a,j}=w_{j-1}(n_{j-1})c_{a,j}. For j=1j=1, we obtain the objective defined in (3.5) with the convention that n0=0n_{0}=0 and w0=1w_{0}=1. For j>1j>1, we obtain objectives uju_{j} in (3.13) that depend on n1,,nj1n_{1},\dots,n_{j-1}. For given n1,,nj1n_{1},\dots,n_{j-1}, there is a global, unique minimizer of uju_{j} in [1,pj11][1,p_{j-1}-1] as the following proposition shows.

Proposition 3.

Let Assumptions 1 and 2 hold. For j=1j=1, let Proposition 1 apply and let n1n_{1}^{*} be the corresponding unique global minimum. Iterate now over j=2,,kj=2,\dots,k and consider the objective uj(nj;n1,,nj1)u_{j}(n_{j};n_{1}^{*},\dots,n_{j-1}^{*}) defined in (3.13) at iteration jj. If it holds that

c^a,jra,j′′(nj)+cc,jrc,j′′(nj)>0,\hat{c}_{a,j}r_{a,j}^{\prime\prime}(n_{j})+c_{c,j}r_{c,j}^{\prime\prime}(n_{j})>0\,, (3.14)

then there exists a unique minimizer nj[1,pj11]n_{j}^{*}\in[1,p_{j-1}-1] of uju_{j}.

Proof.

The function njκj1+c^a,jra,j(nj)+cc,jrc,j(nj)n_{j}\mapsto\kappa_{j-1}+\hat{c}_{a,j}r_{a,j}(n_{j})+c_{c,j}r_{c,j}(n_{j}) satisfies (3.6) in [1,pj11][1,p_{j-1}-1] because κj1\kappa_{j-1} is constant in njn_{j} and (3.14) is assumed. Thus, the same arguments as in the proof of Proposition 1 apply and thus unique minimizers n2,,nkn_{2}^{*},\dots,n_{k}^{*} exist. ∎

The next proposition shows the minimizers of (3.13) for j=1,,kj=1,\dots,k are bounded from above, independent of the computational budget pp.

Proposition 4.

Let Proposition 3 apply, with the condition (3.14) holding in (0,\infty). Furthermore, analogously to Proposition 2, let there exist \bar{n}_{j}\in(0,\infty) so that

c^a,jra,j(n¯j)+cc,jrc,j(n¯j)=0\hat{c}_{a,j}r_{a,j}^{\prime}(\bar{n}_{j})+c_{c,j}r_{c,j}^{\prime}(\bar{n}_{j})=0

holds for all j=1,,kj=1,\dots,k. Then, for j=1,,kj=1,\dots,k, there exist bounds n¯1,,n¯k\bar{n}_{1}^{*},\dots,\bar{n}_{k}^{*} that are independent of pp so that njn¯jn_{j}^{*}\leq\bar{n}_{j}^{*} for j=1,,kj=1,\dots,k.

Proof.

The proof follows by induction over j=1,\dots,k, applying the argument of Proposition 2 sequentially to the objectives in Proposition 3. ∎

Propositions 1–4 highlight that even though using more than n_{\ell}^{*} high-fidelity model evaluations as training data to construct the \ell-th low-fidelity model can lead to a more accurate low-fidelity model in terms of the correlation coefficient, it also increases its evaluation costs, which ultimately leads to an increase of the upper bound (3.11) of the MSE of the MFMC estimator for a fixed budget and thus a poorer estimator. This shows that it is beneficial in multi-fidelity methods to trade off accuracy and evaluation costs of the low-fidelity models.

Remark 2.

Increasing the model hierarchy by adding the jjth context-aware low-fidelity model fnj(j)f^{(j)}_{n_{j}} must ensure that the MFMC ordering (2.1) is satisfied, i.e., the accuracy and evaluation cost rates corresponding to fnj(j)f^{(j)}_{n_{j}}, evaluated at njn_{j}^{*}, have to satisfy (2.1). If (2.1) is not satisfied, the order of the low-fidelity models in the multi-fidelity hierarchy must be changed accordingly.
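Putting Sections 3.2 and 3.3 together, the sequential context-aware training of a hierarchy of low-fidelity models can be sketched as follows. The callable interfaces rates, w_of_n, and train_model are hypothetical placeholders for problem-specific code, and rounding the minimizer to an integer as well as the placement of the ordering check of Remark 2 are pragmatic choices of this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sequential_context_aware_training(p, rates, w_of_n, train_model):
    """For j = 1, ..., k: minimize the objective u_j in (3.13), train the j-th
    low-fidelity model with n_j^* high-fidelity samples, and update the remaining
    budget p_j and the constant kappa_j from (3.12).

    rates[j]       : tuple (c_a, r_a, c_c, r_c) satisfying Assumptions 1 and 2
    w_of_n[j]      : callable n -> measured evaluation cost w_j(n)
    train_model[j] : callable n -> trained low-fidelity model f_n^(j)"""
    kappa, w_prev, budget = 0.0, 1.0, float(p)   # kappa_0 = 0, w_0 = 1, p_0 = p
    models, n_star = [], []
    for j in range(len(rates)):
        c_a, r_a, c_c, r_c = rates[j]
        c_a_hat = w_prev * c_a                   # hat{c}_{a,j} = w_{j-1}(n_{j-1}) c_{a,j}
        u_j = lambda n: (kappa + c_a_hat * r_a(n) + c_c * r_c(n)) / (budget - n)
        res = minimize_scalar(u_j, bounds=(1.0, budget - 1.0), method="bounded")
        n_j = int(round(res.x))
        models.append(train_model[j](n_j))       # consumes n_j high-fidelity evaluations
        n_star.append(n_j)
        kappa += c_a_hat * r_a(n_j)              # bound on the new term in kappa_j
        w_prev = w_of_n[j](n_j)                  # evaluation cost of the new model
        budget -= n_j                            # p_j = p_{j-1} - n_j
        # Remark 2: verify here that the extended hierarchy still satisfies (2.1)
    return models, n_star, budget                # remaining budget goes to MFMC sampling
```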

4 Numerical experiments and discussion

We now present numerical results. In Section 4.1, we consider a heat conduction problem given by a parametric elliptic partial differential equation (PDE) with nine uncertain parameters, defined on a two-dimensional spatial domain. Our goal in this experiment is to draw initial insights about the proposed context-aware learning algorithm. To this end, we will employ two heterogeneous low-fidelity models: an accurate low-fidelity model that is computationally more expensive than a second, less accurate, low-fidelity model. Section 4.2 considers plasma micro-turbulence simulations that depend on 12 uncertain inputs. In this example, we consider three low-fidelity models: a coarse-grid model and two non-intrusive data-driven low-fidelity models based on sparse grid approximations and deep neural network regression.

4.1 Heat conduction in a two-dimensional spatial domain

All numerical experiments in this section were performed using an eight core Intel i7-10510U CPU and 16 GB of RAM.

4.1.1 Setup

The thermal block problem [38] is defined on a two-dimensional spatial domain Ω=(0,1)2=i=1dΩi\Omega=(0,1)^{2}=\bigcup_{i=1}^{d}\Omega_{i}, divided into d=B1×B2d=B_{1}\times B_{2} non-overlapping vertical and horizontal square blocks, Ωi\Omega_{i} with i=1,di=1,\ldots d. The mathematical model is given by the parametric elliptic PDE

divk(x,y,𝜽)u(x,y)=0 in Ω,u(x,y)=0 on ΓD,k(x,y,𝜽)u(x,y)𝒏=j on ΓN,j,j=0,1,\begin{split}-\operatorname{div}k(x,y,\bm{\theta})\nabla u(x,y)=0&\text{ in }\Omega\,,\\ u(x,y)=0&\text{ on }\Gamma_{D}\,,\\ k(x,y,\bm{\theta})\nabla u(x,y)\cdot\bm{n}=j&\text{ on }\Gamma_{N,j},\quad j=0,1,\end{split} (4.1)

where ΓD\Gamma_{D} is a Dirichlet boundary, ΓN,0\Gamma_{N,0} and ΓN,1\Gamma_{N,1} are Neumann boundaries, and

k(x,y,𝜽)=i=1dθiχΩi(x,y)k(x,y,\bm{\theta})=\sum_{i=1}^{d}\theta_{i}\chi_{\Omega_{i}}(x,y)

is the piece-wise constant heat conductivity field parametrized by \bm{\theta}=[\theta_{1},\theta_{2},\ldots,\theta_{d}]^{T}\in\mathbb{R}^{d}, where \chi_{\Omega_{i}} is the indicator function of \Omega_{i}. We set B_{1}=B_{2}=3 and thus d=9, so that k(x,y,\bm{\theta}) is parametrized by nine uniformly distributed random parameters in [1,10]^{9}. The output of interest is the mean heat flow at the Neumann boundary \Gamma_{N,1} given by

𝔼[u(𝜽)]=ΓN,1u(x,y,𝜽)dxdy.\mathbb{E}[u(\bm{\theta})]=\int_{\Gamma_{N,1}}u(x,y,\bm{\theta})\mathrm{d}x\mathrm{d}y. (4.2)

This setup is a slight modification of the problem considered by Patera and Rozza in [38].

The high-fidelity model is a finite element discretization of (4.1) consisting of 7,200 triangular elements with mesh width h=\sqrt{2}/60, provided by the RBMatlab package (https://www.morepas.org/software/rbmatlab/). The high-fidelity model evaluation cost is 0.1150 seconds. Moreover, its variance, estimated using 100 MC samples, is \sigma_{0}^{2}\approx 0.0018.

4.1.2 A low-fidelity model based on the reduced basis method

The reduced-basis (RB) low-fidelity model f^{(1)}_{n_{1}} is constructed using a greedy strategy, as described, for example, in [3]. We employ the implementation provided by RBMatlab. It has been shown that greedy RB low-fidelity models have exponential accuracy rates for problems similar to the thermal block [3], which is the rate type we use when fitting the error decay. Once the reduced basis is found in the offline stage, evaluating the low-fidelity model online entails solving a dense linear system of size equal to the number of reduced basis vectors to find the reduced basis coefficients. Therefore, we model the evaluation cost rate as algebraic in the reduced-model dimension.

The constants in the rate functions are estimated via regression from pilot runs. To estimate the constants in the exponential accuracy rate, we use 100 high- and low-fidelity evaluations. For estimating the constants in the evaluation cost rate, we average the runtimes of the low-fidelity model constructed using increasing values of n_{1}, evaluated at 1,000 MC samples. We perform these runtime measurements 50 times and average the results. The estimated rates are visualized in Figure 1 and the constants are shown in Table 1.

Figure 1: Thermal block: Estimated accuracy (left) and evaluation cost rates (right) for the RB low-fidelity model.
rate | rate type | estimated rate constants
accuracy: c_{a,1}r_{a,1}(n_{1}) | exponential: r_{a,1}(n_{1})=e^{-\alpha_{1}n_{1}} | c_{a,1}=0.6312, \alpha_{1}=0.5754
evaluation cost: c_{c,1}r_{c,1}(n_{1}) | algebraic: r_{c,1}(n_{1})=n_{1}^{\beta_{1}} | c_{c,1}=9.6233\times 10^{-5}, \beta_{1}=1.0704
Table 1: Thermal block: Estimated accuracy and evaluation cost rates and their constants for the RB low-fidelity model.
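The rate constants in Table 1 (and analogously in Tables 2 and 5) are obtained by regression on pilot data. A minimal sketch of such fits in log space is shown below; the pilot measurements in the usage example are hypothetical and serve only to illustrate the interface.

```python
import numpy as np

def fit_exponential_rate(n, one_minus_rho_sq):
    """Fit c * exp(-alpha * n) to measured 1 - rho^2(n) by least squares in semilog space."""
    slope, intercept = np.polyfit(np.asarray(n, dtype=float), np.log(one_minus_rho_sq), 1)
    return np.exp(intercept), -slope             # (c_a, alpha)

def fit_algebraic_rate(n, values):
    """Fit c * n^beta to measured values (costs, or errors with beta < 0) in log-log space."""
    slope, intercept = np.polyfit(np.log(np.asarray(n, dtype=float)), np.log(values), 1)
    return np.exp(intercept), slope              # (c, beta)

# hypothetical pilot measurements for illustration only
n_pilot = np.array([2, 4, 8, 12, 18])
acc = np.array([3e-1, 8e-2, 6e-3, 6e-4, 2e-5])             # measured 1 - rho_1^2(n)
cost = np.array([1.9e-4, 4.1e-4, 8.6e-4, 1.3e-3, 2.0e-3])  # measured w_1(n)
c_a1, alpha1 = fit_exponential_rate(n_pilot, acc)
c_c1, beta1 = fit_algebraic_rate(n_pilot, cost)
```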

4.1.3 A data-fit low-fidelity model

The second low-fidelity model f^{(2)}_{n_{2}} that we consider in the thermal block example is an \varepsilon-support vector regression (\varepsilon-SVR) model. Our numerical implementation is based on libsvm [5]. The training data consists of pairs of pseudo-random realizations of the nine-dimensional input and the corresponding high-fidelity model evaluations. In our experiments, we used \varepsilon=10^{-2}. We model the accuracy and evaluation cost rates as algebraic. To estimate the constants in the accuracy rate, we use 100 high- and low-fidelity evaluations. The constants in the evaluation cost rate are estimated by averaging the runtimes of the low-fidelity model evaluated at 1,000 MC samples. We perform these runtime measurements 50 times and average the results. The estimated rates are visualized in Figure 2 and the constants are shown in Table 2. Note that the \varepsilon-SVR low-fidelity model is less accurate but cheaper to evaluate than the RB low-fidelity model.

Figure 2: Thermal block: Estimated accuracy (left) and evaluation cost rates (right) for the ε\varepsilon-SVR low-fidelity model.
rate | rate type | estimated rate constants
accuracy: c_{a,2}r_{a,2}(n_{2}) | algebraic: r_{a,2}(n_{2})=n_{2}^{-\alpha_{2}} | c_{a,2}=0.7309, \alpha_{2}=0.4053
evaluation cost: c_{c,2}r_{c,2}(n_{2}) | algebraic: r_{c,2}(n_{2})=n_{2}^{\beta_{2}} | c_{c,2}=9.3245\times 10^{-7}, \beta_{2}=0.5696
Table 2: Thermal block: Accuracy and evaluation cost rates and their constants for the \varepsilon-SVR low-fidelity model.
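A sketch of fitting such an \varepsilon-SVR low-fidelity model from high-fidelity training pairs is given below. The paper uses libsvm directly; the sketch relies on scikit-learn's SVR class, which wraps libsvm, and the kernel and regularization settings are assumptions since they are not specified in the text.

```python
import numpy as np
from sklearn.svm import SVR

def train_svr_surrogate(theta_train, y_train, eps=1e-2):
    """Fit an epsilon-SVR model on n_2 input/output pairs of the high-fidelity model."""
    model = SVR(epsilon=eps)  # default RBF kernel; kernel and C are not fixed by the text
    model.fit(np.asarray(theta_train), np.asarray(y_train).ravel())
    return model

# usage: f2 = train_svr_surrogate(theta[:n2_star], f0_outputs[:n2_star]); f2.predict(theta_new)
```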

4.1.4 Context-aware multi-fidelity Monte Carlo with RB low-fidelity model

We first consider only the RB low-fidelity model and budgets p\in\{5,10,30,50,80,100,300,500\} seconds. We compare, in terms of MSE, our context-aware estimator with standard MC sampling and with MFMC in which the RB low-fidelity model is constructed using a fixed, a priori chosen number of basis vectors, independent of the budget p. We compute the MSEs from N replicates as

eMSE(E^())=1Nn=1N(μ^refE^())2,e_{\text{MSE}}(\hat{E}^{(\cdot)})=\frac{1}{N}\sum_{n=1}^{N}\left(\hat{\mu}_{\text{ref}}-\hat{E}^{(\cdot)}\right)^{2}, (4.3)

where μ^ref\hat{\mu}_{\text{ref}} represents the reference mean estimator and E^()\hat{E}^{(\cdot)} is either the CA-MFMC, standard MFMC or the MC estimator. The reference result μ^ref\hat{\mu}_{\text{ref}} was obtained using the context-aware multi-fidelity estimator in which the two low-fidelity models, RB and ε\varepsilon-SVR, were sequentially added with a computational budget pref=103p_{\text{ref}}=10^{3} seconds (cf. Section 4.1.5). To distinguish between the employed estimators, we use “std. MC: f(0)f^{(0)}” to denote the standard MC estimator, “MFMC: f(0),f(1)f^{(0)},f^{(1)}” to refer to the MFMC estimator in which the RB low-fidelity model is statically constructed independent of the budget pp, and “CA-MFMC: f(0),fn1(1)f^{(0)},f^{(1)}_{n_{1}}” to refer to the context-aware multi-fidelity estimator with the RB low-fidelity model.
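The replicate-based MSE (4.3) amounts to a few lines of code; in the usage example below, run_estimator is a hypothetical function that returns one replicate of the estimator of interest.

```python
import numpy as np

def empirical_mse(estimates, mu_ref):
    """Empirical MSE (4.3) over N independent replicates of an estimator."""
    return float(np.mean((mu_ref - np.asarray(estimates, dtype=float)) ** 2))

# usage: e_mse = empirical_mse([run_estimator(p, seed=i) for i in range(50)], mu_ref)
```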

Based on the estimated rate parameters in Table 1, both the accuracy and cost rates of the RB low-fidelity model are strictly convex for all n_{1}\in[1,p-1] and any budget p>0, and hence Propositions 1 and 2 apply. Therefore the optimal, context-aware number of high-fidelity model evaluations for constructing the RB low-fidelity model, n_{1}^{*}, exists, is unique, and is bounded with respect to the budget. The optimal number of training samples n_{1}^{*} for increasing budgets is shown in Figure 3, on the left. Notice that n_{1}^{*}\leq 18 and this upper bound is attained for budgets p\geq 10 seconds, which is because the RB low-fidelity model is accurate already with few basis vectors. Figure 3, right, depicts the split of the total computational budget between constructing the RB low-fidelity model (offline phase, using n_{1}^{*} high-fidelity model evaluations) and multi-fidelity sampling (online phase, using the remaining budget). Because n_{1}^{*} is bounded, the percentage of the budget allocated to constructing the RB low-fidelity model decreases with p.

Figure 3: Thermal block: Left, optimal number of high-fidelity training samples for training the RB low-fidelity model for the context-aware estimator. Right, visualization of how the context-aware estimator splits the computational budget between training (offline) and sampling (online).

Figure 4 compares the MSEs of regular MC, static MFMC with RB low-fidelity models of fixed dimensions 2, 8, and 50, and the CA-MFMC estimator. The left plot reports the analytical MSEs derived using equations (2.3), (2.5) and (3.1), which can be evaluated with the constants reported in Tables 1 and 3.

n_{1} | \rho_{1} | w_{1} | \sigma_{1}^{2}
2 | 0.9277 | 1.0915\times 10^{-4} | 9.4347\times 10^{-4}
8 | 0.9939 | 1.9791\times 10^{-4} | 0.0015
50 | 0.9999 | 6.7539\times 10^{-4} | 0.0018
Table 3: Thermal block: Correlations, evaluation costs and variances of the RB low-fidelity model f^{(1)} with dimensions 2, 8, and 50, respectively.

Computing the analytical MSEs requires no numerical simulations of the forward model as long as the constants in Tables 1 and 3 are available. The right plot shows the MSEs obtained numerically by taking N=50 replicates of the estimators. All multi-fidelity estimators give a lower MSE than standard MC sampling by at least half an order of magnitude, even when using only two dimensions in the static RB low-fidelity model. Overall, the proposed context-aware multi-fidelity estimator gives the lowest MSE of all employed estimators. It is about four orders of magnitude more accurate than standard MC sampling in terms of MSE. Our context-aware estimator is also more accurate than the MFMC estimators with static models because 2 and 8 are too few basis vectors, whereas 50 basis vectors make the low-fidelity model unnecessarily expensive to evaluate. In contrast, our context-aware estimator optimally trades off training and sampling and so achieves the lowest MSE of all estimators, which agrees with Proposition 1 and Proposition 2. Even though the static MFMC estimator with n_{1}=50 basis vectors is close to the CA-MFMC estimator, it is unclear how to choose the dimension in static MFMC a priori and whether, e.g., 50 is a good choice, whereas the proposed context-aware approach provides the optimal n_{1}^{*} with respect to an upper bound of the MSE for a given budget p; cf. Section 3.2 and Figure 3, left.

Figure 4: Thermal block: Analytical MSE is shown left and estimated MSE is shown right. The context-aware MFMC estimator CA-MFMC achieves the lowest MSE by trading off the costs of training the low-fidelity model and sampling. Even though we see that the static MFMC with an RB model of dimension 50 performs well here, it is unclear before the error has been computed what dimension of the RB low-fidelity model is a good choice. Furthermore, the static estimator with an RB model of dimension 50 requires at least 50 high-fidelity model evaluations to start with, whereas the CA-MFMC adaptively chooses the number of high-fidelity model evaluations n_{1}^{*} for training the low-fidelity model based on the available budget.

4.1.5 Context-aware multi-fidelity Monte Carlo with RB and data-fit low-fidelity models

We now sequentially add, as detailed in Section 3.3, the second low-fidelity model that is based on ε\varepsilon-SVR. We consider budgets of p{5,10,30,50}p\in\{5,10,30,50\} seconds. The abbreviation “CA-MFMC: f(0),fn1(1),fn2(2)f^{(0)},f^{(1)}_{n_{1}},f^{(2)}_{n_{2}}” refers to the CA-MFMC estimator in which the ε\varepsilon-SVR low-fidelity model fn2(2)f^{(2)}_{n_{2}} is sequentially added after the RB low-fidelity model fn1(1)f^{(1)}_{n_{1}}.

As can be seen from Table 2, the algebraic cost rate of the \varepsilon-SVR low-fidelity model is concave since 0<\beta_{2}<1, which implies that we need to verify whether the assumptions of Proposition 3 hold. We therefore verify whether the function h_{2}(n_{2})=\kappa_{1}+\hat{c}_{a,2}r_{a,2}(n_{2})+c_{c,2}r_{c,2}(n_{2}), cf. the objective (3.13), with \kappa_{1}=c_{a,1}r_{a,1}(n_{1}^{*}) and \hat{c}_{a,2}=\kappa_{2}c_{a,2}, where \kappa_{2}=c_{c,1}r_{c,1}(n_{1}^{*}) bounds the evaluation cost w_{1}(n_{1}^{*}), satisfies condition (3.14). Since \kappa_{1} and \kappa_{2} are constants for a fixed n_{1}^{*}, h_{2}(n_{2}) is independent of the budget and it is hence sufficient to verify (3.14) for the largest considered budget, p=50 seconds. For this budget, n_{1}^{*}=18 (cf. Figure 3, left), \kappa_{1}=2.0061\times 10^{-5}, and \kappa_{2}=2.1227\times 10^{-4}. Moreover, n_{2}\in[1,\lfloor p/w_{0}\rfloor-n_{1}^{*}-1]=[1,415]. The left plot in Figure 5 shows that h_{2}^{\prime\prime}(n_{2})>0 for all n_{2}\in[1,415], which implies that (3.14) is satisfied for all budgets 0<p\leq 50. Therefore the number n_{2}^{*} of high-fidelity model evaluations used to construct the \varepsilon-SVR low-fidelity model exists and is unique with respect to the budget p-n_{1}^{*}w_{0} that is left after adding the context-aware RB low-fidelity model. The right plot in Figure 5 depicts the objective u_{2}(n_{2}) defined in (3.13).
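The positivity check of h_{2}^{\prime\prime} underlying the left plot of Figure 5 can be carried out numerically as sketched below; the sketch assumes algebraic rates, and the value used for \hat{c}_{a,2} is illustrative only.

```python
import numpy as np

def check_condition_314(c_a_hat, alpha, c_c, beta, n_max):
    """Verify (3.14) for algebraic rates on 1 <= n <= n_max:
    h''(n) = c_a_hat*alpha*(alpha+1)*n**(-alpha-2) + c_c*beta*(beta-1)*n**(beta-2) > 0.
    For 0 < beta < 1 the cost term is concave, so the check is not automatic."""
    n = np.arange(1, n_max + 1, dtype=float)
    h_second = (c_a_hat * alpha * (alpha + 1.0) * n ** (-alpha - 2.0)
                + c_c * beta * (beta - 1.0) * n ** (beta - 2.0))
    return bool(np.all(h_second > 0.0))

# epsilon-SVR rates of Table 2 with an illustrative value for c_a_hat
ok = check_condition_314(c_a_hat=2.0e-4, alpha=0.4053, c_c=9.3245e-7, beta=0.5696, n_max=415)
```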

Figure 5: Thermal block: The plot on the left demonstrates that (3.14) holds for budget p=50p=50 seconds, which corresponds to n2415n_{2}\leq 415. The plot on the right shows the objective u2u_{2} defined in (3.13), which has a unique, global minimizer in [1,415][1,415].

Figure 6 compares the MSE of the estimators. In the left plot, we show the analytical MSEs computed using equations (2.3), (2.5) and (3.1) and the constants reported in Tables 1, 2 and 3, without requiring any new numerical simulations. The MSEs reported in the right plot were computed using N=50 replicates. Sequentially adding the \varepsilon-SVR low-fidelity model to the context-aware estimator further decreases the MSE. This shows that even though the slowly decreasing accuracy rate of the \varepsilon-SVR low-fidelity model indicates that it would require a large training set to accurately predict high-fidelity model outputs in single-fidelity settings, training it using at most n_{2}^{*}=50 high-fidelity training samples in our context-aware learning approach is sufficient to achieve variance reduction and thus a more accurate multi-fidelity estimator. This indicates that even low-fidelity models with poor predictive capabilities can be useful for multi-fidelity methods as long as they capture the trend of the high-fidelity model.

Figure 6: Thermal block: The left plot shows the analytic MSE and the right plot shows the estimated MSE. When adding the second model f_{n_{2}}^{(2)}, the MSE of the CA-MFMC estimator is further reduced compared to the CA-MFMC estimator with only one low-fidelity model.

4.2 Plasma micro-turbulence simulation in the ASDEX Upgrade Experiment

We now consider uncertainty quantification in simulations of plasma micro-turbulence in magnetic confinement devices. Of interest are small-scale fluctuations which cause high energy loss rates despite sophisticated plasma confinement via strong and shaped magnetic fields. The micro-turbulence is driven by the free energy provided by the steep plasma temperature and density gradients. However, the measurements of these gradients, as well as of further decisive physics parameters affecting the underlying micro-instabilities, are subject to uncertainties, which need to be quantified.

4.2.1 Setup of the experiment

We focus on linear (in the phase-space variables) gyrokinetic simulations (a five-dimensional phase space characterized by three positions and two velocities), which are used to characterize the underlying micro-instabilities; we refer to [4] and the references therein for details. A parameter set from a discharge of the ASDEX Upgrade Experiment is considered, which is similar to the parameter set used in [15] for a validation study of gyrokinetic simulations. The experiment considered here is characterized by two particle species, (deuterium) ions and electrons. Moreover, the magnetic geometry is described by an analytical Miller equilibrium [28]. We consider both electromagnetic effects and collisions. Collisions are modeled by a linearized Landau-Boltzmann operator. The total number of uncertain inputs is 12; they are modeled as given in Table 4.

\theta | parameter name | symbol | left bound | right bound
\theta_{1} | plasma \beta | \beta | 0.4889\times 10^{-3} | 0.5975\times 10^{-3}
\theta_{2} | collision frequency | \nu_{c} | 0.6412\times 10^{-2} | 0.8676\times 10^{-2}
\theta_{3} | ion and electron density gradient | \omega_{n} | 1.1563 | 1.9272
\theta_{4} | ion temperature gradient | \omega_{T_{i}} | 2.0966 | 3.4943
\theta_{5} | temperature ratio | T_{i}/T_{e} | 0.6142 | 0.6788
\theta_{6} | electron temperature gradient | \omega_{T_{e}} | 4.0403 | 6.7338
\theta_{7} | effective ion charge | Z_{\mathrm{eff}} | 1.2800 | 1.9200
\theta_{8} | safety factor | q | 2.1708 | 2.3993
\theta_{9} | magnetic shear | \hat{s} | 1.9927 | 2.4356
\theta_{10} | elongation | k | 1.2815 | 1.4165
\theta_{11} | elongation gradient | s_{k} | 0.2081 | 0.2544
\theta_{12} | triangularity | \delta | 0.0528 | 0.0583
Table 4: ASDEX Upgrade Experiment: The 12 uniformly distributed uncertain inputs.

We perform experiments at one bi-normal wave-number k_{y}\rho_{s}=0.8. The output of interest is the growth rate (spectral abscissa) of the dominant micro-instability mode, which corresponds to the maximum real part over the spectrum. For this parameter set, the dominant micro-instability mode at k_{y}\rho_{s}=0.8 and at the nominal values of the input parameters is electron-temperature-gradient-driven, characterized by a negative frequency (imaginary part), while the first subdominant mode is ion-temperature-gradient-driven, characterized by a positive frequency. However, for certain values of the ion temperature and density gradients within the bounds shown in Table 4, the first subdominant ion mode can transition to become the dominant mode. This mode transition can be sharp and in turn lead to sharp transitions or even discontinuities in the growth rate. To avoid the potentially detrimental effects of this discontinuity, in our experiments we determine the growth rate of the underlying electron-driven micro-instability mode. To this end, we compute the first two micro-instability eigenmodes and then select the growth rate that has a corresponding negative frequency.
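A sketch of this selection step is shown below. It assumes the two dominant eigenvalues are available as complex numbers with the growth rate as real part and the frequency as imaginary part; this convention is an assumption of the sketch and not a statement about the Gene output format.

```python
import numpy as np

def electron_mode_growth_rate(eigenvalues):
    """Return the growth rate of the eigenmode with negative frequency, i.e., the
    electron-temperature-gradient-driven branch, from a list of complex eigenvalues
    (real part = growth rate, imaginary part = frequency)."""
    ev = np.asarray(eigenvalues, dtype=complex)
    electron_modes = ev[ev.imag < 0.0]
    if electron_modes.size == 0:
        raise ValueError("no eigenmode with negative frequency found")
    return float(np.max(electron_modes.real))

# usage with the first two computed eigenmodes (gamma_i = growth rate, omega_i = frequency):
# gamma = electron_mode_growth_rate([gamma_1 + 1j * omega_1, gamma_2 + 1j * omega_2])
```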

4.2.2 High-fidelity model

The high-fidelity model is provided by the gyrokinetic simulation code Gene [24] in the flux-tube limit [8]. This assumes periodic boundary conditions over the radial x and bi-normal k_{y} directions. To discretize the five-dimensional gyrokinetic phase-space domain, we use 15 Fourier modes in the radial direction and 24 points along the field line, in the z direction. The bi-normal coordinate k_{y} is decoupled in linear flux-tube simulations and hence only one such mode is used here. In velocity space, we employ 48 equidistant and symmetric parallel velocity grid points and 16 Gauss-Laguerre distributed magnetic moment points. This results in a total of 276,480 = 15\times 1\times 24\times 48\times 16 degrees of freedom in the 5D phase space.

The average high-fidelity model runtime is 410.9941 seconds on 32 cores on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center; cf. Section 1. One such node comprises two AMD EPYC 7763 64-core processors for a total of 128 cores per node, and 256 GB of RAM. The high-fidelity variance, estimated using 20 MC samples, is \sigma_{0}^{2}\approx 0.0279.

4.2.3 A coarse-grid low-fidelity model

We first consider a low-fidelity model f^{(1)} obtained by coarsening the grid in the z direction (20 points), the v_{\parallel} direction (20 points), and the \mu direction (8 Gauss-Laguerre points), for a total of 48,000 = 15\times 20\times 20\times 8 degrees of freedom, which corresponds to 5.76\times fewer degrees of freedom than the high-fidelity model. We note that this still represents a useful resolution for this experiment, while further considerable coarsening of the underlying mesh could lead to significant loss of accuracy or potentially even unphysical solutions. The average evaluation cost of f^{(1)} relative to the high-fidelity model is w_{1}=0.0849 on 16 cores on one node of Lonestar6. The variance and correlation coefficient, approximated using 20 MC high- and low-fidelity model evaluations, are \sigma_{1}^{2}\approx 0.0256 and \rho_{1}\approx 0.9990, respectively.
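The correlation, variance, and relative cost estimates reported here and in Section 4.2.2 can be obtained from a small number of paired pilot runs; a minimal sketch:

```python
import numpy as np

def pilot_statistics(f0_vals, f1_vals, t0, t1):
    """Estimate the quantities entering the MFMC formulas from paired pilot runs:
    output variances, the Pearson correlation with the high-fidelity model, and the
    relative evaluation cost (t0, t1 are average runtimes of f^(0) and f^(1))."""
    f0 = np.asarray(f0_vals, dtype=float)
    f1 = np.asarray(f1_vals, dtype=float)
    sigma0_sq = np.var(f0, ddof=1)
    sigma1_sq = np.var(f1, ddof=1)
    rho1 = np.corrcoef(f0, f1)[0, 1]
    return sigma0_sq, sigma1_sq, rho1, t1 / t0
```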

Single-fidelity settings require fine discretizations to obtain results with the desired accuracy. Here, we want to show that when coarse-grid low-fidelity models are correlated with the high-fidelity model, their lower computational cost can be leveraged to speed up multi-fidelity sampling methods. This is particularly relevant in plasma physics simulations, in which typically only a limited number of high-fidelity simulations can be performed for training data-fit low-fidelity models. We also want to show that our algorithm has broad scope in the sense that a wide variety of low-fidelity models can be used.

4.2.4 A sensitivity-driven dimension-adaptive sparse grid low-fidelity model

The low-fidelity model $f^{(2)}_{n_2}$ is based on sensitivity-driven dimension-adaptive sparse grid interpolation [11, 12], with code available at https://github.com/ionutfarcas/sensitivity-driven-sparse-grid-approx. Such sparse grid (SG) low-fidelity models have been successfully used for uncertainty quantification in plasma micro-instability simulations [11, 12, 14], including computationally expensive nonlinear simulations [13]. The adaptive procedure to construct the SG low-fidelity model is terminated when the underlying sparse grid reaches cardinality $n_2^*$, which corresponds to $n_2^*$ high-fidelity training samples, where $n_2^*$ is determined as discussed in Section 3. Even though the interpolation points used to construct the SG low-fidelity model have fine granularity (see [11, 12]), in cases where exactly $n_2^*$ sparse-grid points cannot be realized, we take the closest feasible number of sparse-grid points to $n_2^*$.

We use $20$ high- and low-fidelity model evaluations to assess the accuracy of the SG low-fidelity model; see also [26]. To estimate the evaluation cost rate, we average the runtimes of the low-fidelity model evaluated at $1{,}000$ MC samples. As can be seen in Figure 7 and Table 5, both accuracy and runtime can be modeled well with algebraic rates.
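One standard way to obtain such algebraic rate fits, shown here only as an illustrative sketch and not necessarily the authors' exact procedure, is a least-squares fit in log-log space, where the exponent appears as the slope and the multiplicative constant as the intercept.

```python
import numpy as np

def fit_algebraic_rate(n_samples, values):
    """Fit values ~ c * n_samples**gamma; return (c, gamma).

    For the accuracy rate, gamma is expected to be negative (the error decays);
    for the evaluation cost rate, gamma is expected to be positive.
    """
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_v = np.log(np.asarray(values, dtype=float))
    gamma, log_c = np.polyfit(log_n, log_v, 1)  # slope and intercept of the log-log fit
    return float(np.exp(log_c)), float(gamma)
```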

Figure 7: ASDEX Upgrade Experiment: Estimated accuracy (left) and evaluation cost rates (right) for the SG low-fidelity model.
rate | rate type | estimated rate constants
accuracy: $c_{a,2}\,r_{a,2}(n_2)$ | algebraic: $r_{a,2}(n_2) = n_2^{-\alpha_2}$ | $c_{a,2} = 0.3361$, $\alpha_2 = 0.8617$
evaluation cost: $c_{c,2}\,r_{c,2}(n_2)$ | algebraic: $r_{c,2}(n_2) = n_2^{\beta_2}$ | $c_{c,2} = 2.9060 \times 10^{-7}$, $\beta_2 = 1.1480$
Table 5: ASDEX Upgrade Experiment: Estimated accuracy and evaluation cost rates, and their constants, for the SG low-fidelity model.

4.2.5 A deep learning low-fidelity model

The low-fidelity model $f^{(3)}_{n_3}$ is given by a fully connected, feed-forward, three-layer deep neural network (DNN) with ReLU activation functions that is trained on high-fidelity data. The number of nodes in the hidden layers is $\lfloor 0.05 \times n_3 \rfloor$ and thus depends on the number of training samples $n_3$. Training is done over $20$ epochs using the Adam optimizer with the MSE as loss function. The accuracy rate is estimated from $20$ high- and low-fidelity model evaluations and the cost rate from $1{,}000$ low-fidelity model evaluations; see Figure 8 and Table 6. Notice that within the considered range of training sizes, the error decreases rather slowly, at an algebraic rate of only about $0.2$. This indicates that a large training set would be required to obtain accurate approximations should the DNN low-fidelity model be used to replace the high-fidelity model. In addition, the evaluation cost increases very slowly with $n_3$, which is because the network has only a few layers and the employed libraries are highly optimized.
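For concreteness, the following PyTorch sketch shows one possible realization of such a DNN low-fidelity model; the choice of three hidden layers of equal width, full-batch training, and the learning rate are assumptions not specified in the text, while the hidden-layer width $\lfloor 0.05\, n_3 \rfloor$, the $20$ Adam epochs, the MSE loss, and the 12-dimensional input (Table 4) follow the description above.

```python
import torch
from torch import nn

def train_dnn_lofi(X_train, y_train, n3, epochs=20, lr=1e-3):
    """Train a fully connected ReLU network on n3 high-fidelity samples.

    X_train: (n3, 12) array of uncertain inputs; y_train: (n3,) growth rates.
    The three-hidden-layer architecture and full-batch training are assumptions.
    """
    width = max(1, int(0.05 * n3))  # hidden-layer width grows with the training-set size
    model = nn.Sequential(
        nn.Linear(12, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    X = torch.as_tensor(X_train, dtype=torch.float32)
    y = torch.as_tensor(y_train, dtype=torch.float32).unsqueeze(1)
    for _ in range(epochs):          # full-batch training over the 20 epochs
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    return model
```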

Figure 8: ASDEX Upgrade Experiment: Estimated accuracy (left) and evaluation cost rates (right) for the DNN low-fidelity model.
rate | rate type | estimated rate constants
accuracy: $c_{a,3}\,r_{a,3}(n_3)$ | algebraic: $r_{a,3}(n_3) = n_3^{-\alpha_3}$ | $c_{a,3} = 0.1399$, $\alpha_3 = 0.2180$
evaluation cost: $c_{c,3}\,r_{c,3}(n_3)$ | algebraic: $r_{c,3}(n_3) = n_3^{\beta_3}$ | $c_{c,3} = 3.6664 \times 10^{-7}$, $\beta_3 = 0.0501$
Table 6: ASDEX Upgrade Experiment: Estimated accuracy and evaluation cost rates, and their constants, for the DNN low-fidelity model.

4.2.6 Context-aware multi-fidelity estimation

We consider a computational budget of $p = 5\times 10^5$ seconds, which corresponds to $1{,}216$ high-fidelity model evaluations and is the limit of the computational resources available to us for this project. To determine which combinations of models lead to CA-MFMC estimators with low MSEs for a budget $p$, we plot the analytic MSEs in Figure 9. The CA-MFMC estimators with the high-fidelity model $f^{(0)}$, the coarse-grid low-fidelity model $f^{(1)}$, and either the SG low-fidelity model $f^{(2)}_{n_2}$ or the DNN low-fidelity model $f^{(3)}_{n_3}$ lead to low MSEs. In contrast, the CA-MFMC estimator with the high-fidelity model $f^{(0)}$, the SG model $f^{(2)}_{n_2}$, and the DNN model $f^{(3)}_{n_3}$, but without the coarse-grid model $f^{(1)}$, leads to a higher MSE. We therefore consider the CA-MFMC estimators with $f^{(0)}$, $f^{(1)}$, and either $f^{(2)}_{n_2}$ or $f^{(3)}_{n_3}$ (cf. Section 3.3). This experiment shows that the analytic MSE can be used to select model combinations.

Figure 9: ASDEX Upgrade Experiment: Combinations of models can be selected based on the analytic MSE, which can be computed without additional model evaluations as long as the accuracy and cost rates are available. In this example, the CA-MFMC estimators with the high-fidelity model $f^{(0)}$, the coarse-grid low-fidelity model $f^{(1)}$, and either the SG low-fidelity model $f^{(2)}_{n_2}$ or the DNN low-fidelity model $f^{(3)}_{n_3}$ have the highest (best) cost/error ratio, which indicates a good model selection.

Based on the estimated constants in Table 5, the accuracy and cost rates of the SG low-fidelity model are both strictly convex for any $n_2 \in [1, p-1]$ and any budget $p > 0$, and therefore Propositions 3 and 4 apply. That is, the optimal, context-aware number of high-fidelity model evaluations for constructing the SG low-fidelity model, $n_2^*$, exists, is unique, and is bounded with respect to the budget. On the other hand, we see from Table 6 that the algebraic cost rate of the DNN low-fidelity model is concave since $\beta_3 < 1$, which means that we need to verify whether the assumptions of Proposition 3 hold. To this end, we check whether the function $h_3(n_3) = \kappa_2 + \hat{c}_{a,3} r_{a,3}(n_3) + c_{c,3} r_{c,3}(n_3)$ defined in (3.12), with $\kappa_2 = w_1 = 0.0849$ and $\hat{c}_{a,3} = \kappa_1 c_{a,3}$, where $\kappa_1 = 1 - \rho_1^2 = 0.0018$, satisfies condition (3.14). We do so for the budget $p = 10^7$ seconds, i.e., the largest budget considered in Figure 9, for which $n_3 \in [1, 24{,}330]$. The left plot in Figure 10 shows that $h_3''(n_3) > 0$ for all $n_3 \in [1, 24{,}330]$, which implies that (3.14) is satisfied for all budgets $0 < p \leq 10^7$, including the budget considered in our computations, i.e., $p = 5\times 10^5$ seconds. Therefore, Proposition 3 applies and the optimal, context-aware number of high-fidelity model evaluations for constructing the DNN low-fidelity model, $n_3^*$, exists and is unique. The right plot in Figure 10 depicts the objective $u_3$ defined in (3.13), which has a unique global minimizer.
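Since $h_3''(n_3) > 0$ on the admissible range implies condition (3.14), this check can also be reproduced directly from the constants quoted above; the following sketch evaluates the analytic second derivative of $h_3$ numerically (the additive constant $\kappa_2$ drops out of the second derivative).

```python
import numpy as np

kappa1 = 0.0018                       # 1 - rho_1^2, as quoted above
c_a3, alpha3 = 0.1399, 0.2180         # DNN accuracy rate constants (Table 6)
c_c3, beta3 = 3.6664e-7, 0.0501       # DNN cost rate constants (Table 6)
hat_c_a3 = kappa1 * c_a3

def h3_second_derivative(n3):
    """Second derivative of h_3(n_3) = kappa_2 + hat_c_a3*n_3**(-alpha3) + c_c3*n_3**beta3."""
    return (hat_c_a3 * alpha3 * (alpha3 + 1.0) * n3 ** (-alpha3 - 2.0)
            + c_c3 * beta3 * (beta3 - 1.0) * n3 ** (beta3 - 2.0))

n3_grid = np.arange(1, 24331, dtype=float)           # admissible range for p = 1e7 seconds
assert np.all(h3_second_derivative(n3_grid) > 0.0)   # h_3'' > 0, implying (3.14)
```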

Figure 10: ASDEX Upgrade Experiment: On the left, we verify that $h_3'' > 0$, which ensures that (3.14) is satisfied up to the budget $p = 10^7$ seconds. On the right, we plot the objective $u_3$ defined in (3.13), which has a unique global minimizer.

In the left plot of Figure 11, we show the number of training samples $n_2^*$ for the CA-MFMC estimators with models $f^{(0)}, f^{(2)}_{n_2}$ and with models $f^{(0)}, f^{(1)}, f^{(2)}_{n_2}$. For the SG model $f^{(2)}_{n_2}$, we took $133$ training samples because the optimal $n_2^* = 131$ does not correspond to a valid sparse grid; cf. Section 4.2.4. In the right plot of Figure 11, the number of training samples $n_3^*$ for the CA-MFMC estimator with models $f^{(0)}, f^{(1)}, f^{(3)}_{n_3}$ is shown. Training with only $n_3^* = 159$ samples for our budget $p = 5\times 10^5$ is sufficient to obtain a low-fidelity model that is useful for variance reduction in the CA-MFMC estimator together with the high-fidelity model, even though the low-fidelity model alone is insufficient to provide accurate predictions in single-fidelity settings.

Figure 11: ASDEX Upgrade Experiment: The numbers $n_2^*$ and $n_3^*$ of high-fidelity training samples selected by the proposed context-aware approach for obtaining a multi-fidelity estimator with the highest (best) cost/error ratio. For example, depending on whether the coarse-grid low-fidelity model $f^{(1)}$ is used together with the high-fidelity model $f^{(0)}$ and the SG-based model $f^{(2)}_{n_2}$, the proposed approach selects either $n_2^* = 131$ or $n_2^* = 438$ training samples, which demonstrates the context-aware trade-off made by the proposed approach when training low-fidelity models.

Consider now the left plot of Figure 12, which compares the MSEs of the CA-MFMC estimators with those of static MFMC and standard MC. The reference solution was obtained via the MFMC estimator based on the coarse-grid and DNN low-fidelity models for a budget of $p = 3\times 10^6$ seconds using $N = 5$ replicates; here, $n_3^* = 847$ high-fidelity evaluations were needed to construct the DNN low-fidelity model. The average reference value of the expected growth rate of the dominant electron temperature gradient-driven micro-instability mode is $\hat{\mu}_{\mathrm{ref}} = 0.3967$. The context-aware estimators are up to two orders of magnitude more accurate than the standard MC estimator. This roughly two orders of magnitude speedup of the CA-MFMC estimators translates into a runtime reduction from $72$ days to roughly four hours on $32$ cores on one node of the Lonestar6 supercomputer. We also observe that MFMC with the high-fidelity and coarse-grid models is about one order of magnitude more accurate in terms of the MSE than the standard MC estimator, showing that coarse-grid plasma physics models can be useful in multi-fidelity settings. The trends that we observe here numerically are in alignment with the analytic MSEs shown in Figure 9.

The right plot of Figure 12 shows how the samples are distributed among the models. All multi-fidelity estimators allocate more than $98\%$ of the total number of samples to the low-fidelity models, which explains the speedups. However, a small fraction of the samples is still allocated to the high-fidelity model, which is desirable in practical applications to guarantee unbiasedness of the estimators.

Figure 12: ASDEX Upgrade Experiment: The left plot shows that the CA-MFMC estimators achieve about two orders of magnitude speedup, which translates into a runtime reduction from $72$ days to roughly four hours on one node of the Lonestar6 supercomputer. The right plot shows the sample distribution across models for the employed estimators.

5 Conclusions

The proposed context-aware learning approach for variance reduction trades off training hierarchies of low-fidelity models with Monte Carlo estimation. By leveraging the trade-off between adding a high-fidelity model sample to the training set for constructing low-fidelity models and using that sample in the Monte Carlo estimator, our analysis shows that the quasi-optimal number of high-fidelity training samples is bounded independently of the computational budget. Thus, after a finite number of samples, all further high-fidelity model samples should be used in the Monte Carlo estimator rather than to improve the low-fidelity models. This means that low-fidelity models can be over-trained for multi-fidelity uncertainty quantification. We demonstrated this over-training numerically in a thermal block scenario, in which a reduced basis low-fidelity model constructed from $50$ high-fidelity training samples led to a multi-fidelity estimator with a poorer cost/error ratio than the proposed context-aware estimator, which determined $18$ to be the maximum quasi-optimal number of training samples. This has clear implications for training data-fit and machine-learning-based models as well: such models can require large data sets to achieve high prediction accuracy on their own, yet few training points suffice when they are merely used for variance reduction together with the high-fidelity model in multi-fidelity uncertainty quantification, as we have shown in a numerical experiment with plasma micro-turbulence simulations.

Acknowledgements

I.-G.F. and F.J. were partially supported by the Exascale Computing Project (No. 17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. B.P. acknowledges support from the Air Force Office of Scientific Research (AFOSR) award FA9550-21-1-0222 (Dr. Fariba Fahroo). We thank Tobias Goerler for useful discussions and insights about the considered plasma micro-turbulence simulation scenario. We also gratefully acknowledge the compute and data resources provided by the Texas Advanced Computing Center at The University of Texas at Austin (https://www.tacc.utexas.edu/).

References

  • [1] T. Alsup and B. Peherstorfer. Context-aware surrogate modeling for balancing approximation and sampling costs in multi-fidelity importance sampling and Bayesian inverse problems. arXiv, 2010.11708, 2020.
  • [2] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review, 57(4):483–531, 2015.
  • [3] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Convergence rates for greedy algorithms in reduced basis methods. SIAM Journal on Mathematical Analysis, 43(3):1457–1472, 2011.
  • [4] A. Brizard and T. Hahm. Foundations of nonlinear gyrokinetic theory. Reviews of Modern Physics, 79, 2007.
  • [5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
  • [6] K. A. Cliffe, M. B. Giles, R. Scheichl, and A. L. Teckentrup. Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Computing and Visualization in Science, 14(1):3–15, 2011.
  • [7] N. Collier, A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone. A continuation multilevel Monte Carlo algorithm. BIT Numerical Mathematics, 55(2):399–432, Jun 2015.
  • [8] T. Dannert and F. Jenko. Gyrokinetic simulation of collisionless trapped-electron mode turbulence. Physics of Plasmas, 12(7):072309, 2005.
  • [9] S. De, J. Britton, M. Reynolds, R. Skinner, K. Jansen, and A. Doostan. On transfer learning of neural networks using bi-fidelity data for uncertainty propagation. International Journal for Uncertainty Quantification, 10(6):543–573, 2020.
  • [10] M. Eigel, C. Merdon, and J. Neumann. An adaptive multilevel Monte Carlo method with stochastic bounds for quantities of interest with uncertain data. SIAM/ASA Journal on Uncertainty Quantification, 4(1):1219–1245, 2016.
  • [11] I.-G. Farcaş, T. Görler, H.-J. Bungartz, F. Jenko, and T. Neckel. Sensitivity-driven adaptive sparse stochastic approximations in plasma microinstability analysis. Journal of Computational Physics, 410:109394, 2020.
  • [12] I.-G. Farcaş. Context-aware model hierarchies for higher-dimensional uncertainty quantification. PhD thesis, Technical University of Munich, 2020.
  • [13] I.-G. Farcaş, G. Merlo, and F. Jenko. A general framework for quantifying uncertainty at scale and its application to fusion research. arXiv, 2202.03999, 2022.
  • [14] I.-G. Farcaş, A. D. Siena, and F. Jenko. Turbulence suppression by energetic particles: a sensitivity-driven dimension-adaptive sparse grid framework for discharge optimization. Nuclear Fusion, 61(5):056004, Apr 2021.
  • [15] S. J. Freethy, T. Görler, A. J. Creely, G. D. Conway, S. S. Denk, T. Happel, C. Koenen, P. Hennequin, and A. E. White. Validation of gyrokinetic simulations with measurements of electron temperature fluctuations and density-temperature phase angles on ASDEX upgrade. Physics of Plasmas, 25(5):055903, 2018.
  • [16] M. B. Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607–617, 2008.
  • [17] A. A. Gorodetsky, G. Geraci, M. S. Eldred, and J. D. Jakeman. A generalized approximate control variate framework for multifidelity uncertainty quantification. Journal of Computational Physics, 408:109257, 2020.
  • [18] A. A. Gorodetsky, J. D. Jakeman, and G. Geraci. MFNets: data efficient all-at-once learning of multifidelity surrogates as directed networks of information sources. Computational Mechanics, 68(4):741–758, Oct 2021.
  • [19] A. A. Gorodetsky, J. D. Jakeman, G. Geraci, and M. S. Eldred. MFNets: Multi-fidelity data-driven networks for Bayesian learning and prediction. International Journal for Uncertainty Quantification, 10(6):595–622, 2020.
  • [20] A. Gruber, M. Gunzburger, L. Ju, R. Lan, and Z. Wang. Multifidelity Monte Carlo estimation for efficient uncertainty quantification in climate-related modeling. EGUsphere, 2022:1–27, 2022.
  • [21] A. Gruber, M. Gunzburger, L. Ju, and Z. Wang. A Multifidelity Monte Carlo Method for Realistic Computational Budgets. arXiv, 2206.07572, 2022.
  • [22] A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone. Optimization of mesh hierarchies in multilevel Monte Carlo samplers. Stochastics and Partial Differential Equations Analysis and Computations, 4(1):76–112, Mar 2016.
  • [23] J. Hampton, H. R. Fairbanks, A. Narayan, and A. Doostan. Practical error bounds for a non-intrusive bi-fidelity approach to parametric/stochastic model reduction. Journal of Computational Physics, 368:315–332, 2018.
  • [24] F. Jenko, W. Dorland, M. Kotschenreuther, and B. N. Rogers. Electron temperature gradient driven turbulence. Physics of Plasmas, 7(5):1904–1910, 2000.
  • [25] P. Khodabakhshi, K. E. Willcox, and M. Gunzburger. A multifidelity method for a nonlocal diffusion model. Applied Mathematics Letters, 121:107361, 2021.
  • [26] J. Konrad, I.-G. Farcaş, B. Peherstorfer, A. Di Siena, F. Jenko, T. Neckel, and H.-J. Bungartz. Data-driven low-fidelity models for multi-fidelity Monte Carlo sampling in plasma micro-turbulence analysis. Journal of Computational Physics, 451:110898, 2022.
  • [27] F. Law, A. Cerfon, and B. Peherstorfer. Accelerating the estimation of collisionless energetic particle confinement statistics in stellarators using multifidelity Monte Carlo. Nuclear Fusion, 62(7):076019, May 2022.
  • [28] R. Miller, M. Chu, J. Greene, Y. Lin-Liu, and R. Waltz. Noncircular, finite aspect ratio, local equilibrium model. Physics of Plasmas, 5:979, 1998.
  • [29] A. Narayan, C. Gittelson, and D. Xiu. A Stochastic Collocation Algorithm with Multifidelity Models. SIAM Journal on Scientific Computing, 36(2):A495–A521, 2014.
  • [30] F. Newberry, J. Hampton, K. Jansen, and A. Doostan. Bi-fidelity reduced polynomial chaos expansion for uncertainty quantification. Computational Mechanics, 69(2):405–424, Feb 2022.
  • [31] L. Ng and K. Willcox. Multifidelity approaches for optimization under uncertainty. International Journal for Numerical Methods in Engineering, 100(10):746–772, 2014.
  • [32] B. Peherstorfer. Multifidelity Monte Carlo estimation with adaptive low-fidelity models. SIAM/ASA Journal on Uncertainty Quantification, 7(2):579–603, 2019.
  • [33] B. Peherstorfer, M. Gunzburger, and K. Willcox. Convergence analysis of multifidelity Monte Carlo estimation. Numerische Mathematik, 139(3):683–707, 2018.
  • [34] B. Peherstorfer, K. Willcox, and M. Gunzburger. Optimal model management for multifidelity Monte Carlo estimation. SIAM Journal on Scientific Computing, 38(5):A3163–A3194, 2016.
  • [35] B. Peherstorfer, K. Willcox, and M. Gunzburger. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Review, 60(3):550–591, 2018.
  • [36] E. Qian, B. Peherstorfer, D. O’Malley, V. Vesselinov, and K. Willcox. Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA Journal on Uncertainty Quantification, 6(2):683–706, 2018.
  • [37] M. Razi, R. M. Kirby, and A. Narayan. Fast predictive multi-fidelity prediction with models of quantized fidelity levels. Journal of Computational Physics, 376:992–1008, 2019.
  • [38] G. Rozza, D. Huynh, and A. Patera. Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Archives of Computational Methods in Engineering, 15(3):1–47, 2007.
  • [39] D. Schaden and E. Ullmann. On multilevel best linear unbiased estimators. SIAM/ASA Journal on Uncertainty Quantification, 8(2):601–635, 2020.
  • [40] D. Schaden and E. Ullmann. Asymptotic analysis of multilevel best linear unbiased estimators. SIAM/ASA Journal on Uncertainty Quantification, 9(3):953–978, 2021.
  • [41] N. Shyamkumar, S. Gugercin, and B. Peherstorfer. Towards context-aware learning for control: Balancing stability and model-learning error. In 2022 American Control Conference (ACC), pages 4808–4813, 2022.
  • [42] A. L. Teckentrup, R. Scheichl, M. B. Giles, and E. Ullmann. Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients. Numerische Mathematik, 125(3):569–600, 2013.
  • [43] S. Werner and B. Peherstorfer. Context-aware controller inference for stabilizing dynamical systems from scarce data. arXiv, 2207.11049, 2022.
  • [44] Y. Xu, V. Keshavarzzadeh, R. M. Kirby, and A. Narayan. A Bandit-Learning Approach to Multifidelity Approximation. SIAM Journal on Scientific Computing, 44(1):A150–A175, 2022.
  • [45] H. Yang, Y. Fujii, K. W. Wang, and A. A. Gorodetsky. Control variate polynomial chaos: Optimal fusion of sampling and surrogates for multifidelity uncertainty quantification, 2022.