
Context-aware learning of hierarchies of low-fidelity models for multi-fidelity uncertainty quantification

Ionuț-Gabriel Farcaș, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin
Benjamin Peherstorfer, Courant Institute of Mathematical Sciences, New York University
Frank Jenko, Max Planck Institute for Plasma Physics
Tobias Neckel, Department of Informatics, Technical University of Munich
Hans-Joachim Bungartz, Department of Informatics, Technical University of Munich
Abstract

Multi-fidelity Monte Carlo methods leverage low-fidelity and surrogate models for variance reduction to make tractable uncertainty quantification even when numerically simulating the physical systems of interest with high-fidelity models is computationally expensive. This work proposes a context-aware multi-fidelity Monte Carlo method that optimally balances the costs of training low-fidelity models with the costs of Monte Carlo sampling. It generalizes the previously developed context-aware bi-fidelity Monte Carlo method to hierarchies of multiple models and to more general types of low-fidelity models. When training low-fidelity models, the proposed approach takes into account the context in which the learned low-fidelity models will be used, namely for variance reduction in Monte Carlo estimation, which allows it to find optimal trade-offs between training and sampling to minimize upper bounds of the mean-squared errors of the estimators for given computational budgets. This is in stark contrast to traditional surrogate modeling and model reduction techniques that construct low-fidelity models with the primary goal of approximating well the high-fidelity model outputs and typically ignore the context in which the learned models will be used in upstream tasks. The proposed context-aware multi-fidelity Monte Carlo method applies to hierarchies of a wide range of types of low-fidelity models such as sparse-grid and deep-network models. Numerical experiments with the gyrokinetic simulation code Gene show speedups of up to two orders of magnitude compared to standard estimators when quantifying uncertainties in small-scale fluctuations in confined plasma in fusion reactors. This corresponds to a runtime reduction from 72 days to about four hours on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center.

1 Introduction

Uncertainty quantification is an essential building block for achieving predictive numerical simulations of physical systems. To make accurate Monte Carlo estimation of uncertainties tractable even when simulations of the physical system of interest are computationally expensive, multi-fidelity methods rely on low-fidelity or surrogate models: the low-fidelity models are leveraged for variance reduction to achieve speedups and the high-fidelity models are occasionally evaluated to guarantee unbiasedness; see [35] for a survey on multi-fidelity methods. However, if low-fidelity models are not readily available, then they need to be constructed and trained first, which can incur additional computational costs and require additional evaluations of the high-fidelity models to generate training data.

In this work, we build on the context-aware bi-fidelity Monte Carlo method introduced in [32] and propose the context-aware multi-fidelity Monte Carlo (CA-MFMC) method that trades off the costs of training hierarchies of multiple low-fidelity models with the costs of Monte Carlo sampling to obtain multi-fidelity estimators that minimize upper bounds of the mean-squared errors for given computational budgets. The proposed approach is context-aware [1, 12, 32, 43, 41] in the sense that the low-fidelity models are trained to maximize variance reduction in Monte Carlo estimation. This means that the context in which the learned models will be used, namely for variance reduction in Monte Carlo estimation, is taken into account during training, which distinguishes it from traditional surrogate modeling and model reduction techniques that construct low-fidelity models with the primary goal of approximating well the high-fidelity model outputs and typically ignore the context in which the learned models are ultimately used [2, 38]. Our proposed CA-MFMC method can be combined with a wide range of types of data-fit low-fidelity models such as sparse-grid-based models and deep-network models. We show that CA-MFMC achieves speedups of up to two orders of magnitude compared to single-fidelity Monte Carlo and standard multi-fidelity estimators when quantifying uncertainties in a plasma micro-turbulence scenario from the ASDEX Upgrade Experiment (https://www.ipp.mpg.de/16195/asdex).

We build on multi-fidelity Monte Carlo (MFMC) estimators [25, 31, 33, 34, 35, 36, 21, 20] that leverage a given hierarchy of low-fidelity models for variance reduction. Several works have employed MFMC estimators to quantify uncertainties in plasma micro-turbulence simulations [12, 26] and for estimating statistics in collisionless energetic particle confinement in stellarators [27]. See also [13] for other techniques for uncertainty quantification in plasma simulations. None of these methods trade off the training of low-fidelity models with sampling, however. The works [17, 39, 40] formulate generalized control variate techniques for multi-fidelity uncertainty propagation. There is a wide range of other multi-fidelity techniques that are based on concepts other than control variates and aim to find surrogate models from multi-fidelity information sources, such as the collocation approach introduced in [29, 23] and the multi-fidelity networks proposed in [19, 18]; see also [30, 9, 37, 44]. There have been extensions to the multilevel Monte Carlo method [6, 16, 42] that consider spatially-adaptive mesh refinements and optimal mesh hierarchies [7, 10, 22]. However, in that line of work, the low-fidelity models are coarse-grid approximations and thus the costs of constructing low-fidelity models are typically considered to be negligible and therefore are ignored. In contrast, we consider more general types of low-fidelity models such as data-fit models that incur training costs.

The first work that considered trading off the training of low-fidelity models and multi-fidelity Monte Carlo estimation is [32], which studies the bi-fidelity setting in which the low-fidelity model has algebraic accuracy and cost rates. In [45], a similar trade-off is considered in the bi-fidelity case but with more general cost and error rates and a specific focus on polynomial-chaos-based surrogate models. We go beyond these works by introducing the CA-MFMC estimator based on context-aware learning that learns hierarchies of more than one low-fidelity model for variance reduction. We first extend the work on a single low-fidelity model with algebraic cost and error rates in [32] to apply to low-fidelity models with more general rates. Key is that the corresponding bounds can be nested, which motivates the sequential training of low-fidelity models to obtain a hierarchy. In the proposed sequential training approach to fit hierarchies of low-fidelity models for CA-MFMC estimators, each step leads to an optimal trade-off between training and sampling. This leads to a context-aware learning approach because the models are learned such that the CA-MFMC estimator achieves a low mean-squared error, rather than being trained to accurately approximate high-fidelity model outputs as in traditional model reduction.

To demonstrate the performance of the proposed CA-MFMC estimator, we apply it to quantifying uncertainties in plasma micro-turbulence simulations motivated by the ITER experiment (https://www.iter.org). The goal of the ITER experiment is to create, for the very first time, a self-sustained plasma in the laboratory. A key physics obstacle is the above-mentioned small-scale fluctuations, which cause high energy loss rates despite sophisticated plasma confinement via strong and shaped magnetic fields. Building on the gyrokinetic simulation code GENE [24], we show that the proposed estimator achieves speedups of up to two orders of magnitude compared to single-fidelity Monte Carlo and standard multi-fidelity estimators, which translates into a runtime reduction from 72 days to about four hours on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center (https://www.tacc.utexas.edu/systems/lonestar6).

The remainder of this paper is organized as follows. Section 2 introduces the notation and summarizes the traditional MFMC algorithm [31, 34] and the bi-fidelity context-aware algorithm formulated in [32]. Section 3 introduces our context-aware learning approach for low-fidelity models with general cost/error rates and multiple low-fidelity models for multi-fidelity sampling methods. Section 4 presents numerical results in two scenarios: a heat conduction problem defined on a two-dimensional spatial domain with nine uncertain parameters and a realistic plasma micro-turbulence scenario with 12 uncertain inputs, for which one realization of the uncertain inputs requires a total runtime of about 411 seconds on 32 cores. The code and data to reproduce our numerical results are available at https://github.com/ionutfarcas/context-aware-mfmc.

2 Preliminaries

This section reviews traditional MFMC estimators [31, 34] and the bi-fidelity context-aware MFMC estimator [32] that uses a high-fidelity model and a single low-fidelity model only.

2.1 Static multi-fidelity Monte Carlo estimation

Let f(0):𝒳𝒴f^{(0)}:\mathcal{X}\to\mathcal{Y} represent the input-output response of a potentially expensive-to-evaluate computational model. The input domain is 𝒳d\mathcal{X}\subset\mathbb{R}^{d} and the output is scalar with output domain 𝒴\mathcal{Y}\subset\mathbb{R}. We consider the situation where the input 𝜽=[θ1,θ2,,θd]T\bm{\theta}=[\theta_{1},\theta_{2},\ldots,\theta_{d}]^{T} is a realization of a random variable 𝚯=[Θ1,,Θd]T\bm{\Theta}=[\Theta_{1},\dots,\Theta_{d}]^{T} so that the output f(0)(𝜽)f^{(0)}(\bm{\theta}) becomes a realization of a random variable f(0)(𝚯)f^{(0)}(\bm{\Theta}) too. We denote the probability density function of 𝚯\bm{\Theta} as π\pi. Our goal is to estimate the expected value of f(0)(𝚯)f^{(0)}(\bm{\Theta})

μ0=𝔼[f(0)(𝚯)]=𝒳f(0)(𝜽)π(𝜽)d𝜽.\mu_{0}=\mathbb{E}[f^{(0)}(\bm{\Theta})]=\int_{\mathcal{X}}f^{(0)}(\bm{\theta})\pi(\bm{\theta})\mathrm{d}\bm{\theta}\,.

The MFMC estimator introduced in [31, 34] combines a hierarchy of k+1k+1 models, consisting of a high-fidelity model f(0)f^{(0)} and kk low-fidelity models f(1)f^{(1)}, f(2)f^{(2)}, …, f(k)f^{(k)}. The accuracy of the low-fidelity models is measured by their Pearson correlation coefficient with respect to f(0)f^{(0)},

ρj=Cov[f(0),f(j)]σ0σj[1,1],j=1,,k,\rho_{j}=\frac{\mathrm{Cov}[f^{(0)},f^{(j)}]}{\sigma_{0}\sigma_{j}}\in[-1,1],\quad j=1,\ldots,k,

where σj2\sigma_{j}^{2} denotes the variance of the output random variable f(j)(𝚯)f^{(j)}(\bm{\Theta}) for j=0,1,,kj=0,1,\ldots,k and Cov[,]\operatorname{Cov}[\cdot,\cdot] is the covariance. The evaluation costs of the models are 0<w1,,wk0<w_{1},\dots,w_{k}, respectively. We normalize the cost of evaluating the high-fidelity model such that w0=1w_{0}=1, without loss of generality. The models are assumed to satisfy the following ordering

1=|ρ0|>|ρ1|>>|ρk|,wj1wj>ρj12ρj2ρj2ρj+12,j=1,,k,1=\left|\rho_{0}\right|>\left|\rho_{1}\right|>\ldots>\left|\rho_{k}\right|,\quad\frac{w_{j-1}}{w_{j}}>\frac{\rho_{j-1}^{2}-\rho_{j}^{2}}{\rho_{j}^{2}-\rho_{j+1}^{2}},\quad j=1,\ldots,k, (2.1)

where ρ=0\rho_{\ell}=0 for k+1\ell\geq k+1.
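For practical use of the MFMC estimator, the ordering condition (2.1) has to be verified from estimated correlations and costs. The following Python sketch performs this check; the function name and interface are ours and are not part of any released code.

```python
import numpy as np

def satisfies_mfmc_ordering(rho, w):
    """Check the MFMC ordering condition (2.1) for correlation coefficients
    rho = [rho_0, ..., rho_k] (with rho_0 = 1) and costs w = [w_0, ..., w_k]."""
    rho = np.abs(np.asarray(rho, dtype=float))
    w = np.asarray(w, dtype=float)
    k = len(rho) - 1
    rho_ext = np.append(rho, 0.0)  # convention rho_{k+1} = 0
    # 1 = |rho_0| > |rho_1| > ... > |rho_k|
    if not np.all(np.diff(rho) < 0.0):
        return False
    # cost-to-correlation ratio condition for j = 1, ..., k
    for j in range(1, k + 1):
        lhs = w[j - 1] / w[j]
        rhs = (rho_ext[j - 1] ** 2 - rho_ext[j] ** 2) / (rho_ext[j] ** 2 - rho_ext[j + 1] ** 2)
        if not lhs > rhs:
            return False
    return True
```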

Let now mjm_{j}\in\mathbb{N} for j=0,1,kj=0,1\ldots,k denote the number of evaluations of model f(j)f^{(j)}, which satisfy

0m0m1mk0\leq m_{0}\leq m_{1}\leq\ldots\leq m_{k}

because of (2.1). Consider mkm_{k} independent and identically distributed (i.i.d.) samples drawn from the input density π\pi:

𝜽1,𝜽2,,𝜽mk.\bm{\theta}_{1},\bm{\theta}_{2},\ldots,\bm{\theta}_{m_{k}}.

To obtain the MFMC estimator, model f(j)f^{(j)} is evaluated at the first mjm_{j} samples 𝜽1,𝜽2,,𝜽mj\bm{\theta}_{1},\bm{\theta}_{2},\ldots,\bm{\theta}_{m_{j}} to obtain output random variables

f(j)(𝜽1),f(j)(𝜽2),,f(j)(𝜽mj),f^{(j)}(\bm{\theta}_{1}),f^{(j)}(\bm{\theta}_{2}),\ldots,f^{(j)}(\bm{\theta}_{m_{j}})\,, (2.2)

for j=0,1,\ldots,k. From (2.2), we derive the following standard MC estimators for j=1,\dots,k:

E^mj(j)=1mji=1mjf(j)(𝜽i),E^mj1(j)=1mj1i=1mj1f(j)(𝜽i)\hat{E}_{m_{j}}^{(j)}=\frac{1}{m_{j}}\sum_{i=1}^{m_{j}}f^{(j)}(\bm{\theta}_{i}),\quad\hat{E}_{m_{j-1}}^{(j)}=\frac{1}{m_{j-1}}\sum_{i=1}^{m_{j-1}}f^{(j)}(\bm{\theta}_{i})

as well as

\hat{E}_{m_{0}}^{(0)}=\frac{1}{m_{0}}\sum_{i=1}^{m_{0}}f^{(0)}(\bm{\theta}_{i})\,.

The MFMC estimator is

E^MFMC=E^m0(0)+j=1kαj(E^mj(j)E^mj1(j))\hat{E}^{\mathrm{MFMC}}=\hat{E}_{m_{0}}^{(0)}+\sum_{j=1}^{k}\alpha_{j}(\hat{E}_{m_{j}}^{(j)}-\hat{E}_{m_{j-1}}^{(j)})

with the coefficients α1,,αk\alpha_{1},\dots,\alpha_{k}\in\mathbb{R}. The total computational cost of the MFMC estimator is therefore

p=j=0kmjwj.p=\sum_{j=0}^{k}m_{j}w_{j}\,.

Fixing pp and then selecting m0,m1,,mkm_{0}^{*},m_{1}^{*},\ldots,m_{k}^{*} model evaluations and α1,α2,,αk\alpha_{1}^{*},\alpha_{2}^{*},\ldots,\alpha_{k}^{*} coefficients such that the variance of the MFMC estimator E^MFMC\hat{E}^{\mathrm{MFMC}} is minimized, gives the MSE

MSE[E^MFMC]=σ02p(j=0kwj(ρj2ρj+12))2.\mathrm{MSE}[\hat{E}^{\mathrm{MFMC}}]=\frac{\sigma_{0}^{2}}{p}\left(\sum_{j=0}^{k}\sqrt{w_{j}(\rho_{j}^{2}-\rho_{j+1}^{2})}\right)^{2}. (2.3)

The optimal m0,m1,,mkm_{0}^{*},m_{1}^{*},\ldots,m_{k}^{*} model evaluations and α1,α2,,αk\alpha_{1}^{*},\alpha_{2}^{*},\ldots,\alpha_{k}^{*} coefficients are available in closed form for a given budget pp; see [34] for details.
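As an illustration, the sketch below evaluates the closed-form allocation of [34] and assembles the MFMC estimator from precomputed model outputs. The array layout and helper names are our own assumptions, and the expressions should be checked against [34].

```python
import numpy as np

def mfmc_allocation(rho, w, sigma, p):
    """Optimal sample sizes m_j and coefficients alpha_j for budget p,
    following the closed-form expressions of [34].
    rho   : [rho_0 = 1, rho_1, ..., rho_k] correlations with f^(0)
    w     : [w_0 = 1, w_1, ..., w_k] normalized evaluation costs
    sigma : [sigma_0, ..., sigma_k] standard deviations of the model outputs"""
    rho, w, sigma = (np.asarray(a, dtype=float) for a in (rho, w, sigma))
    rho_ext = np.append(rho, 0.0)                          # rho_{k+1} = 0
    diff = rho_ext[:-1] ** 2 - rho_ext[1:] ** 2            # rho_j^2 - rho_{j+1}^2
    r = np.sqrt(w[0] * diff / (w * (1.0 - rho[1] ** 2)))   # sample-size ratios, r_0 = 1
    m0 = p / np.dot(w, r)                                  # high-fidelity sample size
    m = np.maximum(1, np.floor(m0 * r)).astype(int)        # m_j = r_j m_0 (rounded)
    alpha = rho[1:] * sigma[0] / sigma[1:]                 # alpha_j = rho_j sigma_0 / sigma_j
    return m, alpha

def mfmc_estimate(outputs, m, alpha):
    """MFMC estimator from outputs[j] = f^(j) evaluated at the shared inputs
    theta_1, ..., theta_{m_j} (the first m[j] entries are used)."""
    est = np.mean(outputs[0][: m[0]])
    for j in range(1, len(outputs)):
        est += alpha[j - 1] * (np.mean(outputs[j][: m[j]]) - np.mean(outputs[j][: m[j - 1]]))
    return est
```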

The MFMC estimator leverages the cheaper-to-evaluate low-fidelity models with the aim to achieve a lower MSE than a standard MC estimator with the same costs. That is, given mm i.i.d. samples from 𝚯\bm{\Theta}, the standard MC mean estimator is

E^m(0)=1mi=1mf(0)(𝜽i).\hat{E}_{m}^{(0)}=\frac{1}{m}\sum_{i=1}^{m}f^{(0)}(\bm{\theta}_{i}). (2.4)

The computational cost of the Monte Carlo estimator (2.4) of \mathbb{E}[f^{(0)}(\bm{\Theta})] is p=mw_{0}=m because the function f^{(0)} is evaluated at m realizations and the cost of each evaluation of f^{(0)} is w_{0}=1. The MSE of the standard MC estimator \hat{E}_{m}^{(0)} reads

MSE[E^m(0)]=σ02pw0=σ02p.\mathrm{MSE}[\hat{E}_{m}^{(0)}]=\frac{\sigma_{0}^{2}}{p}w_{0}=\frac{\sigma_{0}^{2}}{p}. (2.5)

2.2 Context-aware bi-fidelity Monte Carlo estimator with algebraic accuracy and cost rates

The work [32] introduces a context-aware bi-fidelity MFMC approach that trades off the training costs of constructing a low-fidelity model fn(1)f^{(1)}_{n} and sampling. Following [32], the low-fidelity model fn(1)f^{(1)}_{n} is obtained via a training process such as data-fit models and reduced models. Correspondingly, the subscript nn refers to the number of high-fidelity evaluations that are needed to train fn(1)f^{(1)}_{n}. For example, in case of obtaining fn(1)f^{(1)}_{n} via training a neural network, the subscript nn refers to the number of training input-output pairs obtained with the high-fidelity model f(0)f^{(0)}. This also means that the correlation coefficient ρ1(n)\rho_{1}(n) between f(0)(𝚯)f^{(0)}(\bm{\Theta}) and fn(1)(𝚯)f^{(1)}_{n}(\bm{\Theta}) as well as the evaluation cost w1(n)w_{1}(n) of fn(1)f^{(1)}_{n} depend on nn.

The dependence of the correlation and costs on n is described as follows: to trade off training the low-fidelity model and sampling, the work [32] makes the assumption that the correlation coefficient between the high-fidelity output random variable f^{(0)}(\bm{\Theta}) and the low-fidelity output random variable f^{(1)}_{n}(\bm{\Theta}) is bounded by

1ρ12(n)c1nα1-\rho_{1}^{2}(n)\leq c_{1}n^{-\alpha}

with respect to nn, where c1,α>0c_{1},\alpha>0 are constants. The cost w1(n)w_{1}(n) of evaluating the low-fidelity model is bounded by

w1(n)c2nβw_{1}(n)\leq c_{2}n^{\beta}

with constants c2,β>0c_{2},\beta>0.

A budget pp corresponds to pp high-fidelity model evaluations because we have w0=1w_{0}=1. Thus, if nn high-fidelity model samples are used for training fn(1)f^{(1)}_{n}, then a budget of pnp-n is left for sampling. The corresponding context-aware bi-fidelity estimator is

E^nCA-MFMC=E^m0(0)+α1(E^m1,n(1)E^m0,n(1))\hat{E}^{\text{CA-MFMC}}_{n}=\hat{E}_{m_{0}^{*}}^{(0)}+\alpha_{1}^{*}(\hat{E}_{m_{1}^{*},n}^{(1)}-\hat{E}_{m_{0}^{*},n}^{(1)})

where m0,m1,α1m_{0}^{*},m_{1}^{*},\alpha_{1}^{*} are chosen to minimize the MSE of E^nCA-MFMC\hat{E}^{\text{CA-MFMC}}_{n} for a given pp and nn. Consequently, the MSE of E^nCA-MFMC\hat{E}^{\text{CA-MFMC}}_{n} depends on pp and nn

MSE(E^nCA-MFMC)=σ02pn(1ρ12(n)+w1(n)ρ12(n))2,\operatorname{MSE}(\hat{E}^{\text{CA-MFMC}}_{n})=\frac{\sigma_{0}^{2}}{p-n}\left(\sqrt{1-\rho_{1}^{2}(n)}+\sqrt{w_{1}(n)\rho_{1}^{2}(n)}\right)^{2}\,,

which is in contrast to the MFMC estimator E^MFMC\hat{E}^{\mathrm{MFMC}} for which the MSE (2.3) depends on pp only and is independent of potential training costs of constructing low-fidelity models.

If the budget pp is fixed, nn can be chosen by minimizing the upper bound of the MSE

MSE(E^nCA-MFMC)2σ02pn(c1nα+c2nβ).\operatorname{MSE}(\hat{E}^{\text{CA-MFMC}}_{n})\leq\frac{2\sigma_{0}^{2}}{p-n}\left(c_{1}n^{-\alpha}+c_{2}n^{\beta}\right)\,. (2.6)

The work [32] shows that there exists a unique nn^{*} that minimizes (2.6) with respect to nn for a given pp under certain assumptions; however, no closed form expression of nn^{*} is available and thus nn^{*} needs to be found numerically. The optimum nn^{*} is bounded independently of the budget pp; see [32] for details.
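Because no closed-form expression of n^* is available, the upper bound (2.6) can be minimized numerically, for example with a bounded scalar optimization followed by rounding to an integer. The sketch below assumes algebraic rates and uses illustrative constants only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def optimal_training_size(p, c1, alpha, c2, beta, sigma0_sq=1.0):
    """Minimize the upper bound (2.6),
    u(n) = 2 sigma_0^2 / (p - n) * (c1 * n**(-alpha) + c2 * n**beta), over [1, p-1]."""
    u = lambda n: 2.0 * sigma0_sq / (p - n) * (c1 * n ** (-alpha) + c2 * n ** beta)
    res = minimize_scalar(u, bounds=(1.0, p - 1.0), method="bounded")
    n_lo = max(1, int(np.floor(res.x)))          # training sizes are integers:
    n_hi = min(int(p - 1), int(np.ceil(res.x)))  # compare the two neighboring integers
    return min((n_lo, n_hi), key=u)

# illustrative constants, not taken from the numerical experiments
n_star = optimal_training_size(p=500, c1=1.0, alpha=1.5, c2=1e-3, beta=1.2)
```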

3 Context-aware multi-fidelity Monte Carlo estimation

This section presents the methodological novelty of this paper. We introduce context-aware learning for MFMC with low-fidelity models that have general cost and error rates and with hierarchies of more than one low-fidelity model.

3.1 Setup with multiple models

We now consider multiple low-fidelity models f_{n_{1}}^{(1)},\dots,f_{n_{k}}^{(k)} that require n_{1},\dots,n_{k} evaluations of the high-fidelity model, respectively, for their training. We refer to n_{1},\dots,n_{k} as the numbers of training samples used to fit the low-fidelity models. We then define the CA-MFMC estimator as

E^n1,,nkCA-MFMC=E^m0(0)+j=1kαj(E^mj(j)E^mj1(j)),\hat{E}_{n_{1},\dots,n_{k}}^{\text{CA-MFMC}}=\hat{E}_{m_{0}^{*}}^{(0)}+\sum_{j=1}^{k}\alpha_{j}^{*}\left(\hat{E}_{m_{j}^{*}}^{(j)}-\hat{E}_{m_{j-1}^{*}}^{(j)}\right)\,,

where the m0,,mkm_{0}^{*},\dots,m_{k}^{*} and α1,,αk\alpha_{1}^{*},\dots,\alpha_{k}^{*} minimize the MSE of E^n1,,nkCA-MFMC\hat{E}_{n_{1},\dots,n_{k}}^{\text{CA-MFMC}} for given n1,,nkn_{1},\dots,n_{k} and budget pp. The MSE of the CA-MFMC estimator is

MSE[E^n1,,nkCAMFMC]=σ02pi=1kni(j=0kwj(nj)(ρj2(nj)ρj+12(nj+1)))2,\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}}]=\frac{\sigma_{0}^{2}}{p-\sum_{i=1}^{k}n_{i}}\left(\sum_{j=0}^{k}\sqrt{w_{j}(n_{j})(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1}))}\right)^{2}\,, (3.1)

where now the MSE depends on the n1,,nkn_{1},\dots,n_{k} training samples used for constructing the low-fidelity models. We are interested in choosing n1,,nkn_{1},\dots,n_{k} to minimize an upper bound of the MSE (3.1) so that the training costs given by n1,,nkn_{1},\dots,n_{k} and the costs of taking samples from the high- and the low-fidelity output random variables are balanced.

We make the following assumptions on the accuracy and cost behavior of the low-fidelity models with respect to the number of training samples n1,,nkn_{1},\dots,n_{k}:

Assumption 1.

For all j=1,,kj=1,\dots,k, there exists a constant ca,j>0c_{a,j}>0 and a positive, decreasing, at least twice continuously differentiable function ra,j(nj):(0,)(0,)r_{a,j}(n_{j}):(0,\infty)\rightarrow(0,\infty) such that

1ρj2(nj)ca,jra,j(nj),j=1,,k.1-\rho_{j}^{2}(n_{j})\leq c_{a,j}r_{a,j}(n_{j}),\quad j=1,\ldots,k\,.
Assumption 2.

For all j=1,,kj=1,\dots,k, there exists a constant cc,j>0c_{c,j}>0 and a positive, increasing, at least twice continuously differentiable function rc,j(nj):(0,)(0,)r_{c,j}(n_{j}):(0,\infty)\rightarrow(0,\infty) such that the evaluation costs are bounded as

wj(nj)cc,jrc,j(nj),j=1,,k.w_{j}(n_{j})\leq c_{c,j}r_{c,j}(n_{j}),\quad j=1,\ldots,k.

Assumptions 1 and 2 hold for a wide range of typical error and cost rates: the functions r_{a,j}(n)=n^{-\alpha}, \alpha>0, and r_{a,j}(n)=\mathrm{e}^{-\alpha n}, \alpha>0, satisfy Assumption 1 for n\in(0,\infty). The functions r_{c,j}(n)=n^{\alpha}, \alpha>1, and r_{c,j}(n)=\mathrm{e}^{\alpha n}, \alpha>0, satisfy Assumption 2 for n\in(0,\infty). Additionally, the error and cost rates given here are strictly convex.

3.2 Context-aware multi-fidelity Monte Carlo sampling with one low-fidelity model

Let us first consider only one low-fidelity model, which means that k=1. The following goes beyond what was introduced in [32] because here we allow more general error and cost rates; cf. Section 2.2 where [32] is reviewed.

We obtain the following upper bound of the MSE of the CA-MFMC estimator (3.1),

MSE[E^n1CAMFMC]=\displaystyle\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1}}]= σ02pn1(1ρ12(n1)+w1(n1)ρ12(n1))2\displaystyle\frac{\sigma_{0}^{2}}{p-n_{1}}\left(\sqrt{1-\rho_{1}^{2}(n_{1})}+\sqrt{w_{1}(n_{1})\rho_{1}^{2}(n_{1})}\right)^{2}
=\displaystyle= σ02pn1(1ρ12(n1)+w1(n1)ρ12(n1)+21ρ12(n1)w1(n1)ρ12(n1))\displaystyle\frac{\sigma_{0}^{2}}{p-n_{1}}\left(1-\rho_{1}^{2}(n_{1})+w_{1}(n_{1})\rho_{1}^{2}(n_{1})+2\sqrt{1-\rho_{1}^{2}(n_{1})}\sqrt{w_{1}(n_{1})\rho_{1}^{2}(n_{1})}\right)
\displaystyle\leq σ02pn1(2(1ρ12(n1))+2w1(n1)ρ12(n1))\displaystyle\frac{\sigma_{0}^{2}}{p-n_{1}}\left(2(1-\rho_{1}^{2}(n_{1}))+2w_{1}(n_{1})\rho_{1}^{2}(n_{1})\right) (3.2)
\displaystyle\leq 2σ02pn1(1ρ12(n1)+w1(n1))\displaystyle\frac{2\sigma_{0}^{2}}{p-n_{1}}\left(1-\rho_{1}^{2}(n_{1})+w_{1}(n_{1})\right) (3.3)
\displaystyle\leq 2σ02pn1(ca,1ra,1(n1)+cc,1rc,1(n1)),\displaystyle\frac{2\sigma_{0}^{2}}{p-n_{1}}\left(c_{a,1}r_{a,1}\left(n_{1}\right)+c_{c,1}r_{c,1}\left(n_{1}\right)\right), (3.4)

where inequality (3.2) follows from the inequality of arithmetic and geometric means, 2\sqrt{a}\sqrt{b}\leq a+b for all a,b\geq 0, inequality (3.3) results from \rho_{1}^{2}\leq 1, and (3.4) is due to Assumptions 1 and 2.

The objective that we want to minimize with respect to n1n_{1} is

u1:[1,p1](0,),n11pn1(ca,1ra,1(n1)+cc,1rc,1(n1)).u_{1}:[1,p-1]\rightarrow(0,\infty),n_{1}\mapsto\frac{1}{p-n_{1}}\left(c_{a,1}r_{a,1}\left(n_{1}\right)+c_{c,1}r_{c,1}\left(n_{1}\right)\right)\,. (3.5)

The following proposition clarifies when a unique global minimum exists.

Proposition 1.

Consider function u1u_{1} defined in (3.5) with ra,1r_{a,1} and rc,1r_{c,1} satisfying Assumptions 1 and 2, respectively. Additionally assume that

ca,1ra,1′′(n1)+cc,1rc,1′′(n1)>0c_{a,1}r_{a,1}^{\prime\prime}(n_{1})+c_{c,1}r_{c,1}^{\prime\prime}(n_{1})>0 (3.6)

holds for all n1[1,p1]n_{1}\in[1,p-1], where ra,1′′r_{a,1}^{\prime\prime} and rc,1′′r_{c,1}^{\prime\prime} are the second derivatives of ra,1r_{a,1} and rc,1r_{c,1}, respectively. Then, the function u1u_{1} has a unique global minimum n1[1,p1]n_{1}^{*}\in[1,p-1].

Proof.

Define the function h1(n1)=ca,1ra,1(n1)+cc,1rc,1(n1)h_{1}(n_{1})=c_{a,1}r_{a,1}(n_{1})+c_{c,1}r_{c,1}(n_{1}) and notice that

h1′′(n1)=ca,1ra,1′′(n1)+cc,1rc,1′′(n1)>0,n1[1,p1]h_{1}^{\prime\prime}(n_{1})=c_{a,1}r_{a,1}^{\prime\prime}(n_{1})+c_{c,1}r_{c,1}^{\prime\prime}(n_{1})>0\,,\qquad n_{1}\in[1,p-1] (3.7)

due to (3.6), which means that h1h_{1} is strictly convex in [1,p1][1,p-1]. We now consider u1u_{1} and its first derivative

u1(n1)=(pn1)h1(n1)+h1(n1)(pn1)2.u_{1}^{\prime}(n_{1})=\frac{(p-n_{1})h^{\prime}_{1}(n_{1})+h_{1}(n_{1})}{(p-n_{1})^{2}}\,. (3.8)

Case 1: If u1(n1)>0u_{1}^{\prime}(n_{1})>0 for all n1[1,p1]n_{1}\in[1,p-1], then u1u_{1} is strictly increasing and the minimum of u1u_{1} in [1,p1][1,p-1] is taken at the left boundary n1=1n_{1}^{*}=1, which is the global unique minimum.

Case 2: Analogously to Case 1, if u_{1}^{\prime}(n_{1})<0 for all n_{1}\in[1,p-1], then u_{1} is strictly decreasing and the minimum is taken at the right boundary n_{1}^{*}=p-1, which is the unique global minimum.

Case 3: There is a stationary point n1[1,p1]n_{1}^{*}\in[1,p-1] such that u1(n1)=0u_{1}^{\prime}(n_{1}^{*})=0. At a stationary point n1n_{1}^{*} of u1u_{1}, the second derivative,

u1′′(n1)=h1′′(n1)pn1+2pn1u1(n1),u_{1}^{\prime\prime}(n_{1})=\frac{h_{1}^{\prime\prime}(n_{1})}{p-n_{1}}+\frac{2}{p-n_{1}}u_{1}^{\prime}(n_{1})\,,

is positive, u_{1}^{\prime\prime}(n_{1}^{*})>0, because of (3.7) and u^{\prime}_{1}(n_{1}^{*})=0. Thus, together with the fact that the function u_{1} is twice continuously differentiable and univariate, we have that u^{\prime\prime}_{1} is positive in a neighborhood of every stationary point, because the inequality h_{1}^{\prime\prime}(n_{1}^{*})>0 is strict. Thus, in the interior (1,p-1), there can only be minima, which also means that at the boundaries of the interval [1,p-1] there can only be maxima. This means that there exists only one minimum, which shows uniqueness. ∎

Remark 1.

Assumption (3.6) in Proposition 1 is satisfied if ra,1r_{a,1} is strictly convex and rc,1r_{c,1} is convex, which holds for the examples given in Section 3.1.

We next show that the minimizer of (3.5) is also bounded from above, independent of the computational budget p, which implies that beyond a finite number of training samples, additional high-fidelity model evaluations should be spent on Monte Carlo sampling rather than on further improving the low-fidelity model.

Proposition 2.

Let Proposition 1 apply with condition (3.6) satisfied for all n1(0,)n_{1}\in(0,\infty). Additionally, assume that there exists an n¯1(0,)\bar{n}_{1}\in(0,\infty) such that

ca,1ra,1(n¯1)+cc,1rc,1(n¯1)=0c_{a,1}r_{a,1}^{\prime}(\bar{n}_{1})+c_{c,1}r_{c,1}^{\prime}(\bar{n}_{1})=0 (3.9)

holds. Then n¯1\bar{n}_{1} is unique and the minimum n1n_{1}^{*} of u1u_{1} derived in Proposition 1 is bounded from above independently of the budget pp as

n1max{1,n¯1}.n_{1}^{*}\leq\max\{1,\bar{n}_{1}\}\,. (3.10)
Proof.

Consider h_{1} introduced in the proof of Proposition 1. There is an \bar{n}_{1}\in(0,\infty) with h_{1}^{\prime}(\bar{n}_{1})=0 as given by (3.9). The stationary point \bar{n}_{1} is unique because h_{1}^{\prime} is strictly increasing, since we assumed that (3.6) holds in (0,\infty). Furthermore, the stationary point \bar{n}_{1} is independent of p because h_{1} does not depend on p.

Case 1: If \bar{n}_{1}<1, then h_{1}^{\prime}>0 (h_{1}^{\prime} is strictly increasing because h_{1}^{\prime\prime}>0) for all n_{1}\in[1,p-1] and all p\geq 1, and thus u_{1}^{\prime}>0 in [1,p-1]. Thus, n_{1}^{*}=1 and (3.10) is a bound independent of p.

Case 2: Let now \bar{n}_{1}\geq 1. First, consider budgets p>\bar{n}_{1}+1 so that \bar{n}_{1}\in[1,p-1). Because h_{1}^{\prime}(\bar{n}_{1})=0 with \bar{n}_{1}\in[1,p-1) and h_{1}^{\prime} strictly increasing, we have that h_{1}^{\prime}>0 in (\bar{n}_{1},p-1] and thus u_{1}^{\prime}>0 in (\bar{n}_{1},p-1], which means u_{1} cannot be strictly decreasing on all of [1,p-1]. Thus, the minimum of u_{1} in [1,p-1] is either at a stationary point of u_{1} or at 1. If it is a stationary point, then h_{1}^{\prime}(n_{1}^{*})<0 has to hold to obtain u_{1}^{\prime}(n_{1}^{*})=0, and we obtain h_{1}^{\prime}(n_{1}^{*})<0=h_{1}^{\prime}(\bar{n}_{1}), which implies n_{1}^{*}\leq\bar{n}_{1} because h_{1}^{\prime} is strictly increasing; thus (3.10) is a p-independent bound of n_{1}^{*}. If n_{1}^{*}=1, then 1=n_{1}^{*}\leq\bar{n}_{1} still holds because \bar{n}_{1}\geq 1. Second, consider budgets p\leq\bar{n}_{1}+1 so that \bar{n}_{1}\geq p-1. Then, it holds that n_{1}^{*}\leq p-1\leq\bar{n}_{1}, which again leads to the bound (3.10).

This shows that depending on the properties of h1h_{1}, the maximum of 11 and n¯1\bar{n}_{1} is an upper bound of n1n_{1}^{*}. ∎
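In practice, the budget-independent bound \bar{n}_{1} from (3.9) can be computed by root finding on h_{1}^{\prime}. A minimal sketch is given below; it assumes that the bracket passed to the solver contains the sign change of h_{1}^{\prime} and uses illustrative rate constants rather than values from the numerical experiments.

```python
import math
from scipy.optimize import brentq

def n_bar(h1_prime, lo=1e-3, hi=1e6):
    """Unique root of h_1'(n) = c_{a,1} r_{a,1}'(n) + c_{c,1} r_{c,1}'(n); by (3.6),
    h_1' is strictly increasing, so any bracket with a sign change contains the root."""
    return brentq(h1_prime, lo, hi)

# illustrative example: exponential accuracy rate and algebraic cost rate
c_a, a, c_c, b = 1.0, 0.5, 1e-3, 1.2
h1_prime = lambda n: -c_a * a * math.exp(-a * n) + c_c * b * n ** (b - 1.0)
n1_cap = n_bar(h1_prime)   # then n_1^* <= max(1, n1_cap) by Proposition 2
```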

3.3 Context-aware multi-fidelity Monte Carlo sampling with two or more low-fidelity models

The second novelty of our proposed context-aware approach is that we can consider more than just one low-fidelity model. In the following, we introduce a sequential training approach to fit hierarchies of low-fidelity models for the CA-MFMC estimator, where in each step the optimal trade off between training and sampling is achieved.

We first show an upper bound of the MSE (3.1) in terms of accuracy and cost rates for the case with k>1k>1 low-fidelity models:

Lemma 1.

Let Assumptions 1 and 2 hold. Then, the MSE (3.1) of the CA-MFMC estimator E^n1,,nkCAMFMC\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}} can be bounded as

MSE[E^n1,,nkCAMFMC](k+1)σ02pk1nk(κk1+wk1(nk1)ca,kra,k(nk)+cc,krc,k(nk)),\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}}]\leq\frac{(k+1)\sigma_{0}^{2}}{p_{k-1}-n_{k}}\left(\kappa_{k-1}+w_{k-1}(n_{k-1})c_{a,k}r_{a,k}(n_{k})+c_{c,k}r_{c,k}(n_{k})\right)\,, (3.11)

where

p_{k-1}=p-\sum_{j=1}^{k-1}n_{j},\quad p_{0}=p\,,\qquad\kappa_{k-1}=\sum_{j=0}^{k-2}w_{j}(n_{j})(1-\rho_{j+1}^{2}(n_{j+1})),\quad\kappa_{0}=0\,. (3.12)
Proof.

With the sums of squares inequality,

(i=0kai)2(k+1)i=0kai2,a0,,ak,\left(\sum_{i=0}^{k}a_{i}\right)^{2}\leq(k+1)\sum_{i=0}^{k}a_{i}^{2}\,,\qquad a_{0},\dots,a_{k}\in\mathbb{R}\,,

we obtain

MSE[E^n1,,nkCAMFMC]=\displaystyle\mathrm{MSE}[\hat{E}^{\mathrm{CA-MFMC}}_{n_{1},\dots,n_{k}}]= σ02pj=0knj(j=0kwj(nj)(ρj2(nj)ρj+12(nj+1)))2\displaystyle\frac{\sigma_{0}^{2}}{p-\sum_{j=0}^{k}n_{j}}\left(\sum_{j=0}^{k}\sqrt{w_{j}(n_{j})\left(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1})\right)}\right)^{2}
\displaystyle\leq σ02(k+1)pj=0knjj=0kwj(nj)(ρj2(nj)ρj+12(nj+1))\displaystyle\frac{\sigma_{0}^{2}(k+1)}{p-\sum_{j=0}^{k}n_{j}}\sum_{j=0}^{k}w_{j}(n_{j})(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1}))
=\displaystyle= σ02(k+1)pj=0knj(j=0k2wj(nj)(ρj2(nj)ρj+12(nj+1))+\displaystyle\frac{\sigma_{0}^{2}(k+1)}{p-\sum_{j=0}^{k}n_{j}}\left(\sum_{j=0}^{k-2}w_{j}(n_{j})(\rho_{j}^{2}(n_{j})-\rho_{j+1}^{2}(n_{j+1}))+\right.
wk1(nk1)(ρk12(nk1)ρk2(nk))+wk(nk)ρk2(nk)),\displaystyle\qquad\qquad\qquad\qquad\qquad w_{k-1}(n_{k-1})(\rho_{k-1}^{2}(n_{k-1})-\rho_{k}^{2}(n_{k}))+w_{k}(n_{k})\rho_{k}^{2}(n_{k})\bigg{)}\,,

where we used the convention that ρk+1=0\rho_{k+1}=0 in the last equation. With ρj21\rho_{j}^{2}\leq 1 for j=0,,kj=0,\dots,k and pk1,κk1p_{k-1},\kappa_{k-1} defined in (3.12), the bound (3.11) follows with Assumptions 1 and 2. ∎

Bound (3.11) in Lemma 1 decomposes into the component κk1\kappa_{k-1} which depends on n1,,nk1n_{1},\dots,n_{k-1} only and components wk1(nk1)(ρk12(nk1)ρk2(nk))w_{k-1}(n_{k-1})(\rho_{k-1}^{2}(n_{k-1})-\rho_{k}^{2}(n_{k})) and wk(nk)ρk2(nk)w_{k}(n_{k})\rho_{k}^{2}(n_{k}) that can be bounded with Assumptions 1 and 2 so that they depend on nkn_{k} for a fixed nk1n_{k-1}. This decomposition motivates a sequential approach of adding low-fidelity models to the CA-MFMC estimator. At iteration j=1,,kj=1,\dots,k, we consider the objective

uj(nj;n1,,nj1):[1,pj11](0,),nj1pj1nj(κj1+c^a,jra,j(nj)+cc,jrc,j(nj)),u_{j}(n_{j};n_{1},\dots,n_{j-1}):[1,p_{j-1}-1]\to(0,\infty),\qquad n_{j}\mapsto\frac{1}{p_{j-1}-n_{j}}\left(\kappa_{j-1}+\hat{c}_{a,j}r_{a,j}(n_{j})+c_{c,j}r_{c,j}(n_{j})\right)\,, (3.13)

where c^a,j=wj1(nj1)ca,j\hat{c}_{a,j}=w_{j-1}(n_{j-1})c_{a,j}. For j=1j=1, we obtain the objective defined in (3.5) with the convention that n0=0n_{0}=0 and w0=1w_{0}=1. For j>1j>1, we obtain objectives uju_{j} in (3.13) that depend on n1,,nj1n_{1},\dots,n_{j-1}. For given n1,,nj1n_{1},\dots,n_{j-1}, there is a global, unique minimizer of uju_{j} in [1,pj11][1,p_{j-1}-1] as the following proposition shows.

Proposition 3.

Let Assumptions 1 and 2 hold. For j=1j=1, let Proposition 1 apply and let n1n_{1}^{*} be the corresponding unique global minimum. Iterate now over j=2,,kj=2,\dots,k and consider the objective uj(nj;n1,,nj1)u_{j}(n_{j};n_{1}^{*},\dots,n_{j-1}^{*}) defined in (3.13) at iteration jj. If it holds that

c^a,jra,j′′(nj)+cc,jrc,j′′(nj)>0,\hat{c}_{a,j}r_{a,j}^{\prime\prime}(n_{j})+c_{c,j}r_{c,j}^{\prime\prime}(n_{j})>0\,, (3.14)

then there exists a unique minimizer nj[1,pj11]n_{j}^{*}\in[1,p_{j-1}-1] of uju_{j}.

Proof.

The function njκj1+c^a,jra,j(nj)+cc,jrc,j(nj)n_{j}\mapsto\kappa_{j-1}+\hat{c}_{a,j}r_{a,j}(n_{j})+c_{c,j}r_{c,j}(n_{j}) satisfies (3.6) in [1,pj11][1,p_{j-1}-1] because κj1\kappa_{j-1} is constant in njn_{j} and (3.14) is assumed. Thus, the same arguments as in the proof of Proposition 1 apply and thus unique minimizers n2,,nkn_{2}^{*},\dots,n_{k}^{*} exist. ∎

The next proposition shows the minimizers of (3.13) for j=1,,kj=1,\dots,k are bounded from above, independent of the computational budget pp.

Proposition 4.

Let Proposition 3 apply, with the condition (3.14) holding in (0,\infty). Furthermore, analogously to Proposition 2, let there exist \bar{n}_{j}\in(0,\infty) so that

c^a,jra,j(n¯j)+cc,jrc,j(n¯j)=0\hat{c}_{a,j}r_{a,j}^{\prime}(\bar{n}_{j})+c_{c,j}r_{c,j}^{\prime}(\bar{n}_{j})=0

holds for all j=1,,kj=1,\dots,k. Then, for j=1,,kj=1,\dots,k, there exist bounds n¯1,,n¯k\bar{n}_{1}^{*},\dots,\bar{n}_{k}^{*} that are independent of pp so that njn¯jn_{j}^{*}\leq\bar{n}_{j}^{*} for j=1,,kj=1,\dots,k.

Proof.

The proof follows by induction over j=1,\dots,k, applying the argument of Proposition 2 sequentially to the objectives in Proposition 3. ∎

Propositions 1–4 highlight that even though using more than n_{\ell}^{*} high-fidelity model evaluations as training data to construct the \ell-th low-fidelity model can lead to a more accurate low-fidelity model in terms of the correlation coefficient, it also increases its evaluation costs, which ultimately leads to an increase of the upper bound (3.11) of the MSE of the MFMC estimator for a fixed budget and thus a poorer estimator. This shows that it is beneficial in multi-fidelity methods to trade off accuracy and evaluation costs of the low-fidelity models.

Remark 2.

Increasing the model hierarchy by adding the jjth context-aware low-fidelity model fnj(j)f^{(j)}_{n_{j}} must ensure that the MFMC ordering (2.1) is satisfied, i.e., the accuracy and evaluation cost rates corresponding to fnj(j)f^{(j)}_{n_{j}}, evaluated at njn_{j}^{*}, have to satisfy (2.1). If (2.1) is not satisfied, the order of the low-fidelity models in the multi-fidelity hierarchy must be changed accordingly.
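Putting Sections 3.2 and 3.3 together, the sequential context-aware training of a hierarchy of low-fidelity models can be sketched as follows. The callable interfaces rates, w_of_n, and train_model are hypothetical placeholders for problem-specific code, and rounding the minimizer to an integer as well as the placement of the ordering check of Remark 2 are pragmatic choices of this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sequential_context_aware_training(p, rates, w_of_n, train_model):
    """For j = 1, ..., k: minimize the objective u_j in (3.13), train the j-th
    low-fidelity model with n_j^* high-fidelity samples, and update the remaining
    budget p_j and the constant kappa_j from (3.12).

    rates[j]       : tuple (c_a, r_a, c_c, r_c) satisfying Assumptions 1 and 2
    w_of_n[j]      : callable n -> measured evaluation cost w_j(n)
    train_model[j] : callable n -> trained low-fidelity model f_n^(j)"""
    kappa, w_prev, budget = 0.0, 1.0, float(p)   # kappa_0 = 0, w_0 = 1, p_0 = p
    models, n_star = [], []
    for j in range(len(rates)):
        c_a, r_a, c_c, r_c = rates[j]
        c_a_hat = w_prev * c_a                   # hat{c}_{a,j} = w_{j-1}(n_{j-1}) c_{a,j}
        u_j = lambda n: (kappa + c_a_hat * r_a(n) + c_c * r_c(n)) / (budget - n)
        res = minimize_scalar(u_j, bounds=(1.0, budget - 1.0), method="bounded")
        n_j = int(round(res.x))
        models.append(train_model[j](n_j))       # consumes n_j high-fidelity evaluations
        n_star.append(n_j)
        kappa += c_a_hat * r_a(n_j)              # bound on the new term in kappa_j
        w_prev = w_of_n[j](n_j)                  # evaluation cost of the new model
        budget -= n_j                            # p_j = p_{j-1} - n_j
        # Remark 2: verify here that the extended hierarchy still satisfies (2.1)
    return models, n_star, budget                # remaining budget goes to MFMC sampling
```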

4 Numerical experiments and discussion

We now present numerical results. In Section 4.1, we consider a heat conduction problem given by a parametric elliptic partial differential equation (PDE) with nine uncertain parameters, defined on a two-dimensional spatial domain. Our goal in this experiment is to draw initial insights about the proposed context-aware learning algorithm. To this end, we will employ two heterogeneous low-fidelity models: an accurate low-fidelity model that is computationally more expensive than a second, less accurate, low-fidelity model. Section 4.2 considers plasma micro-turbulence simulations that depend on 12 uncertain inputs. In this example, we consider three low-fidelity models: a coarse-grid model and two non-intrusive data-driven low-fidelity models based on sparse grid approximations and deep neural network regression.

4.1 Heat conduction in a two-dimensional spatial domain

All numerical experiments in this section were performed using an eight core Intel i7-10510U CPU and 16 GB of RAM.

4.1.1 Setup

The thermal block problem [38] is defined on a two-dimensional spatial domain Ω=(0,1)2=i=1dΩi\Omega=(0,1)^{2}=\bigcup_{i=1}^{d}\Omega_{i}, divided into d=B1×B2d=B_{1}\times B_{2} non-overlapping vertical and horizontal square blocks, Ωi\Omega_{i} with i=1,di=1,\ldots d. The mathematical model is given by the parametric elliptic PDE

divk(x,y,𝜽)u(x,y)=0 in Ω,u(x,y)=0 on ΓD,k(x,y,𝜽)u(x,y)𝒏=j on ΓN,j,j=0,1,\begin{split}-\operatorname{div}k(x,y,\bm{\theta})\nabla u(x,y)=0&\text{ in }\Omega\,,\\ u(x,y)=0&\text{ on }\Gamma_{D}\,,\\ k(x,y,\bm{\theta})\nabla u(x,y)\cdot\bm{n}=j&\text{ on }\Gamma_{N,j},\quad j=0,1,\end{split} (4.1)

where ΓD\Gamma_{D} is a Dirichlet boundary, ΓN,0\Gamma_{N,0} and ΓN,1\Gamma_{N,1} are Neumann boundaries, and

k(x,y,𝜽)=i=1dθiχΩi(x,y)k(x,y,\bm{\theta})=\sum_{i=1}^{d}\theta_{i}\chi_{\Omega_{i}}(x,y)

is the piece-wise constant heat conductivity field parametrized by \bm{\theta}=[\theta_{1},\theta_{2},\ldots,\theta_{d}]^{T}\in\mathbb{R}^{d}, where \chi_{\Omega_{i}} is the indicator function of \Omega_{i}. We set B_{1}=B_{2}=3 and thus d=9, so that k(x,y,\bm{\theta}) is parametrized by nine uniformly distributed random parameters in [1,10]^{9}. The output of interest is the mean heat flow at the Neumann boundary \Gamma_{N,1} given by

𝔼[u(𝜽)]=ΓN,1u(x,y,𝜽)dxdy.\mathbb{E}[u(\bm{\theta})]=\int_{\Gamma_{N,1}}u(x,y,\bm{\theta})\mathrm{d}x\mathrm{d}y. (4.2)

This setup is a slight modification of the problem considered by Patera and Rozza in [38].

The high-fidelity model is a finite element discretization of (4.1) consisting of 7,200 triangular elements with mesh width h=\sqrt{2}/60, provided by the RBMatlab package (https://www.morepas.org/software/rbmatlab/). The high-fidelity model evaluation cost is 0.1150 seconds. Moreover, its variance, estimated using 100 MC samples, is \sigma_{0}^{2}\approx 0.0018.

4.1.2 A low-fidelity model based on the reduced basis method

The reduced-basis (RB) low-fidelity model f^{(1)}_{n_{1}} is constructed using a greedy strategy, as described, for example, in [3]. We employ the implementation provided by RBMatlab. It has been shown that greedy RB low-fidelity models have exponential accuracy rates for problems similar to the thermal block [3], which is the rate type we use when fitting the error decay. Once the reduced basis is found in the offline stage, evaluating the low-fidelity model online entails solving a dense linear system of size equal to the number of reduced basis vectors to find the reduced basis coefficients. Therefore, we model the evaluation cost rate as algebraic in the reduced-model dimension.

The constants in the rate functions are estimated via regression from pilot runs. To estimate the constants in the exponential accuracy rate, we use 100 high- and low-fidelity evaluations. For estimating the constants in the evaluation cost rate, we average the runtimes of the low-fidelity model constructed using increasing values of n_{1}, evaluated at 1,000 MC samples. We perform these runtime measurements 50 times and average the results. The estimated rates are visualized in Figure 1 and the constants are shown in Table 1.

Figure 1: Thermal block: Estimated accuracy (left) and evaluation cost rates (right) for the RB low-fidelity model.
rate | rate type | estimated rate constants
accuracy: c_{a,1}r_{a,1}(n_{1}) | exponential: r_{a,1}(n_{1})=e^{-\alpha_{1}n_{1}} | c_{a,1}=0.6312, \alpha_{1}=0.5754
evaluation cost: c_{c,1}r_{c,1}(n_{1}) | algebraic: r_{c,1}(n_{1})=n_{1}^{\beta_{1}} | c_{c,1}=9.6233\times 10^{-5}, \beta_{1}=1.0704
Table 1: Thermal block: Estimated accuracy and evaluation cost rates and their constants for the RB low-fidelity model.
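The rate constants in Table 1 (and analogously in Tables 2 and 5) are obtained by regression on pilot data. A minimal sketch of such fits in log space is shown below; the pilot measurements in the usage example are hypothetical and serve only to illustrate the interface.

```python
import numpy as np

def fit_exponential_rate(n, one_minus_rho_sq):
    """Fit c * exp(-alpha * n) to measured 1 - rho^2(n) by least squares in semilog space."""
    slope, intercept = np.polyfit(np.asarray(n, dtype=float), np.log(one_minus_rho_sq), 1)
    return np.exp(intercept), -slope             # (c_a, alpha)

def fit_algebraic_rate(n, values):
    """Fit c * n^beta to measured values (costs, or errors with beta < 0) in log-log space."""
    slope, intercept = np.polyfit(np.log(np.asarray(n, dtype=float)), np.log(values), 1)
    return np.exp(intercept), slope              # (c, beta)

# hypothetical pilot measurements for illustration only
n_pilot = np.array([2, 4, 8, 12, 18])
acc = np.array([3e-1, 8e-2, 6e-3, 6e-4, 2e-5])             # measured 1 - rho_1^2(n)
cost = np.array([1.9e-4, 4.1e-4, 8.6e-4, 1.3e-3, 2.0e-3])  # measured w_1(n)
c_a1, alpha1 = fit_exponential_rate(n_pilot, acc)
c_c1, beta1 = fit_algebraic_rate(n_pilot, cost)
```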

4.1.3 A data-fit low-fidelity model

The second low-fidelity model f^{(2)}_{n_{2}} that we consider in the thermal block example is an \varepsilon-support vector regression (\varepsilon-SVR) model. Our numerical implementation is based on libsvm [5]. The training data consists of pairs of pseudo-random realizations of the nine-dimensional input and the corresponding high-fidelity model evaluations. In our experiments, we used \varepsilon=10^{-2}. We model the accuracy and evaluation cost rates as algebraic. To estimate the constants in the accuracy rate, we use 100 high- and low-fidelity evaluations. The constants in the evaluation cost rate are estimated by averaging the runtimes of the low-fidelity model evaluated at 1,000 MC samples. We perform these runtime measurements 50 times and average the results. The estimated rates are visualized in Figure 2 and the constants are shown in Table 2. Note that the \varepsilon-SVR low-fidelity model is less accurate but cheaper to evaluate than the RB low-fidelity model.

Figure 2: Thermal block: Estimated accuracy (left) and evaluation cost rates (right) for the ε\varepsilon-SVR low-fidelity model.
rate | rate type | estimated rate constants
accuracy: c_{a,2}r_{a,2}(n_{2}) | algebraic: r_{a,2}(n_{2})=n_{2}^{-\alpha_{2}} | c_{a,2}=0.7309, \alpha_{2}=0.4053
evaluation cost: c_{c,2}r_{c,2}(n_{2}) | algebraic: r_{c,2}(n_{2})=n_{2}^{\beta_{2}} | c_{c,2}=9.3245\times 10^{-7}, \beta_{2}=0.5696
Table 2: Thermal block: Accuracy and evaluation cost rates and their constants for the \varepsilon-SVR low-fidelity model.
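A sketch of fitting such an \varepsilon-SVR low-fidelity model from high-fidelity training pairs is given below. The paper uses libsvm directly; the sketch relies on scikit-learn's SVR class, which wraps libsvm, and the kernel and regularization settings are assumptions since they are not specified in the text.

```python
import numpy as np
from sklearn.svm import SVR

def train_svr_surrogate(theta_train, y_train, eps=1e-2):
    """Fit an epsilon-SVR model on n_2 input/output pairs of the high-fidelity model."""
    model = SVR(epsilon=eps)  # default RBF kernel; kernel and C are not fixed by the text
    model.fit(np.asarray(theta_train), np.asarray(y_train).ravel())
    return model

# usage: f2 = train_svr_surrogate(theta[:n2_star], f0_outputs[:n2_star]); f2.predict(theta_new)
```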

4.1.4 Context-aware multi-fidelity Monte Carlo with RB low-fidelity model

We first consider only the RB low-fidelity model and budgets p\in\{5,10,30,50,80,100,300,500\} seconds. We compare, in terms of MSE, our context-aware estimator with standard MC sampling and with MFMC in which the RB low-fidelity model is constructed using a fixed, a priori chosen number of basis vectors, independent of the budget p. We compute the MSEs from N replicates as

eMSE(E^())=1Nn=1N(μ^refE^())2,e_{\text{MSE}}(\hat{E}^{(\cdot)})=\frac{1}{N}\sum_{n=1}^{N}\left(\hat{\mu}_{\text{ref}}-\hat{E}^{(\cdot)}\right)^{2}, (4.3)

where μ^ref\hat{\mu}_{\text{ref}} represents the reference mean estimator and E^()\hat{E}^{(\cdot)} is either the CA-MFMC, standard MFMC or the MC estimator. The reference result μ^ref\hat{\mu}_{\text{ref}} was obtained using the context-aware multi-fidelity estimator in which the two low-fidelity models, RB and ε\varepsilon-SVR, were sequentially added with a computational budget pref=103p_{\text{ref}}=10^{3} seconds (cf. Section 4.1.5). To distinguish between the employed estimators, we use “std. MC: f(0)f^{(0)}” to denote the standard MC estimator, “MFMC: f(0),f(1)f^{(0)},f^{(1)}” to refer to the MFMC estimator in which the RB low-fidelity model is statically constructed independent of the budget pp, and “CA-MFMC: f(0),fn1(1)f^{(0)},f^{(1)}_{n_{1}}” to refer to the context-aware multi-fidelity estimator with the RB low-fidelity model.
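The replicate-based MSE (4.3) amounts to a few lines of code; in the usage example below, run_estimator is a hypothetical function that returns one replicate of the estimator of interest.

```python
import numpy as np

def empirical_mse(estimates, mu_ref):
    """Empirical MSE (4.3) over N independent replicates of an estimator."""
    return float(np.mean((mu_ref - np.asarray(estimates, dtype=float)) ** 2))

# usage: e_mse = empirical_mse([run_estimator(p, seed=i) for i in range(50)], mu_ref)
```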

Based on the estimated rate parameters in Table 1, both the accuracy and cost rates of the RB low-fidelity model are strictly convex for all n_{1}\in[1,p-1] and any budget p>0, and hence Propositions 1 and 2 apply. Therefore the optimal, context-aware number of high-fidelity model evaluations for constructing the RB low-fidelity model, n_{1}^{*}, exists, is unique, and is bounded with respect to the budget. The optimal number of training samples n_{1}^{*} for increasing budgets is shown in Figure 3, on the left. Notice that n_{1}^{*}\leq 18 and this upper bound is attained for budgets p\geq 10 seconds, which is because the RB low-fidelity model is accurate already with few basis vectors. Figure 3, right, depicts the split of the total computational budget between constructing the RB low-fidelity model (offline phase, using n_{1}^{*} high-fidelity model evaluations) and multi-fidelity sampling (online phase, using the remaining budget). Because n_{1}^{*} is bounded, the percentage of the budget allocated to constructing the RB low-fidelity model decreases with p.

Figure 3: Thermal block: Left, optimal number of high-fidelity training samples for training the RB low-fidelity model for the context-aware estimator. Right, visualization of how the context-aware estimator splits the computational budget between training (offline) and sampling (online).

Figure 4 compares the MSEs of regular MC, static MFMC with RB low-fidelity models of fixed dimensions 2, 8, and 50, and the CA-MFMC estimator. The left plot reports the analytical MSEs derived using equations (2.3), (2.5) and (3.1), which can be evaluated with the constants reported in Tables 1 and 3.

n_{1} | \rho_{1} | w_{1} | \sigma_{1}^{2}
2 | 0.9277 | 1.0915\times 10^{-4} | 9.4347\times 10^{-4}
8 | 0.9939 | 1.9791\times 10^{-4} | 0.0015
50 | 0.9999 | 6.7539\times 10^{-4} | 0.0018
Table 3: Thermal block: Correlations, evaluation costs and variances of the RB low-fidelity model f^{(1)} with dimensions 2, 8, and 50, respectively.

Computing the analytical MSEs requires no numerical simulations of the forward model as long as the constants in Tables 1 and 3 are available. The right plot shows the MSEs obtained numerically by taking N=50 replicates of the estimators. All multi-fidelity estimators give a lower MSE than standard MC sampling by at least half an order of magnitude, even when using only two dimensions in the static RB low-fidelity model. Overall, the proposed context-aware multi-fidelity estimator gives the lowest MSE of all employed estimators. It is about four orders of magnitude more accurate than standard MC sampling in terms of MSE. Our context-aware estimator is also more accurate than the MFMC estimators with static models because 2 and 8 are too few basis vectors, whereas 50 basis vectors make the low-fidelity model unnecessarily expensive to evaluate. In contrast, our context-aware estimator optimally trades off training and sampling and so achieves the lowest MSE of all estimators, which agrees with Proposition 1 and Proposition 2. Even though the static MFMC estimator with n_{1}=50 basis vectors is close to the CA-MFMC estimator, it is unclear how to choose the dimension in static MFMC a priori and whether, e.g., 50 is a good choice, whereas the proposed context-aware approach provides the optimal n_{1}^{*} with respect to an upper bound of the MSE for a given budget p; cf. Section 3.2 and Figure 3, left.

Figure 4: Thermal block: Analytical MSE is shown left and estimated MSE is shown right. The context-aware MFMC estimator CA-MFMC achieves the lowest MSE by trading off the costs of training the low-fidelity model and sampling. Even though we see that the static MFMC with an RB model of dimension 50 performs well here, it is unclear before the error has been computed what dimension of the RB low-fidelity model is a good choice. Furthermore, the static estimator with an RB model of dimension 50 requires at least 50 high-fidelity model evaluations to start with, whereas the CA-MFMC adaptively chooses the number of high-fidelity model evaluations n_{1}^{*} for training the low-fidelity model based on the available budget.

4.1.5 Context-aware multi-fidelity Monte Carlo with RB and data-fit low-fidelity models

We now sequentially add, as detailed in Section 3.3, the second low-fidelity model that is based on ε\varepsilon-SVR. We consider budgets of p{5,10,30,50}p\in\{5,10,30,50\} seconds. The abbreviation “CA-MFMC: f(0),fn1(1),fn2(2)f^{(0)},f^{(1)}_{n_{1}},f^{(2)}_{n_{2}}” refers to the CA-MFMC estimator in which the ε\varepsilon-SVR low-fidelity model fn2(2)f^{(2)}_{n_{2}} is sequentially added after the RB low-fidelity model fn1(1)f^{(1)}_{n_{1}}.

As can be seen from Table 2, the algebraic cost rate of the \varepsilon-SVR low-fidelity model is concave since 0<\beta_{2}<1, which implies that we need to verify whether the assumptions of Proposition 3 hold. We therefore verify whether the function h_{2}(n_{2})=\kappa_{1}+\hat{c}_{a,2}r_{a,2}(n_{2})+c_{c,2}r_{c,2}(n_{2}), cf. the objective (3.13), with \kappa_{1}=c_{a,1}r_{a,1}(n_{1}^{*}) and \hat{c}_{a,2}=\kappa_{2}c_{a,2}, where \kappa_{2}=c_{c,1}r_{c,1}(n_{1}^{*}) bounds the evaluation cost w_{1}(n_{1}^{*}), satisfies condition (3.14). Since \kappa_{1} and \kappa_{2} are constants for a fixed n_{1}^{*}, h_{2}(n_{2}) is independent of the budget and it is hence sufficient to verify (3.14) for the largest considered budget, p=50 seconds. For this budget, n_{1}^{*}=18 (cf. Figure 3, left), \kappa_{1}=2.0061\times 10^{-5}, and \kappa_{2}=2.1227\times 10^{-4}. Moreover, n_{2}\in[1,\lfloor p/w_{0}\rfloor-n_{1}^{*}-1]=[1,415]. The left plot in Figure 5 shows that h_{2}^{\prime\prime}(n_{2})>0 for all n_{2}\in[1,415], which implies that (3.14) is satisfied for all budgets 0<p\leq 50. Therefore the number n_{2}^{*} of high-fidelity model evaluations used to construct the \varepsilon-SVR low-fidelity model exists and is unique with respect to the budget p-n_{1}^{*}w_{0} that is left after adding the context-aware RB low-fidelity model. The right plot in Figure 5 depicts the objective u_{2}(n_{2}) defined in (3.13).
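The positivity check of h_{2}^{\prime\prime} underlying the left plot of Figure 5 can be carried out numerically as sketched below; the sketch assumes algebraic rates, and the value used for \hat{c}_{a,2} is illustrative only.

```python
import numpy as np

def check_condition_314(c_a_hat, alpha, c_c, beta, n_max):
    """Verify (3.14) for algebraic rates on 1 <= n <= n_max:
    h''(n) = c_a_hat*alpha*(alpha+1)*n**(-alpha-2) + c_c*beta*(beta-1)*n**(beta-2) > 0.
    For 0 < beta < 1 the cost term is concave, so the check is not automatic."""
    n = np.arange(1, n_max + 1, dtype=float)
    h_second = (c_a_hat * alpha * (alpha + 1.0) * n ** (-alpha - 2.0)
                + c_c * beta * (beta - 1.0) * n ** (beta - 2.0))
    return bool(np.all(h_second > 0.0))

# epsilon-SVR rates of Table 2 with an illustrative value for c_a_hat
ok = check_condition_314(c_a_hat=2.0e-4, alpha=0.4053, c_c=9.3245e-7, beta=0.5696, n_max=415)
```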

Figure 5: Thermal block: The plot on the left demonstrates that (3.14) holds for budget p=50p=50 seconds, which corresponds to n2415n_{2}\leq 415. The plot on the right shows the objective u2u_{2} defined in (3.13), which has a unique, global minimizer in [1,415][1,415].

Figure 6 compares the MSE of the estimators. In the left plot, we show the analytical MSEs computed using equations (2.3), (2.5) and (3.1) and the constants reported in Tables 1, 2 and 3, without requiring any new numerical simulations. The MSEs reported in the right plot were computed using N=50 replicates. Sequentially adding the \varepsilon-SVR low-fidelity model to the context-aware estimator further decreases the MSE. This shows that even though the slowly decreasing accuracy rate of the \varepsilon-SVR low-fidelity model indicates that it would require a large training set to accurately predict high-fidelity model outputs in single-fidelity settings, training it using at most n_{2}^{*}=50 high-fidelity training samples in our context-aware learning approach is sufficient to achieve variance reduction and thus a more accurate multi-fidelity estimator. This indicates that even low-fidelity models with poor predictive capabilities can be useful for multi-fidelity methods as long as they capture the trend of the high-fidelity model.

Figure 6: Thermal block: The left plot shows the analytic MSE and the right plot shows the estimated MSE. When adding the second model f_{n_{2}}^{(2)}, the MSE of the CA-MFMC estimator is further reduced compared to the CA-MFMC estimator with only one low-fidelity model.

4.2 Plasma micro-turbulence simulation in the ASDEX Upgrade Experiment

We now consider uncertainty quantification in simulations of plasma micro-turbulence in magnetic confinement devices. Of interest are small-scale fluctuations which cause high energy loss rates despite sophisticated plasma confinement via strong and shaped magnetic fields. The micro-turbulence is driven by the free energy provided by the steep plasma temperature and density gradients. However, the measurements of these gradients, as well as of further decisive physics parameters affecting the underlying micro-instabilities, are subject to uncertainties, which need to be quantified.

4.2.1 Setup of the experiment

We focus on linear (in the phase-space variables) gyrokinetic simulations (a five-dimensional phase space characterized by three positions and two velocities), which are used to characterize the underlying micro-instabilities; we refer to [4] and the references therein for details. A parameter set from a discharge of the ASDEX Upgrade Experiment is considered, which is similar to the parameter set used in [15] for a validation study of gyrokinetic simulations. The experiment considered here is characterized by two particle species, (deuterium) ions and electrons. Moreover, the magnetic geometry is described by an analytical Miller equilibrium [28]. We consider both electromagnetic effects and collisions. Collisions are modeled by a linearized Landau-Boltzmann operator. The total number of uncertain inputs is 12; they are modeled as given in Table 4.

\theta | parameter name | symbol | left bound | right bound
\theta_{1} | plasma \beta | \beta | 0.4889\times 10^{-3} | 0.5975\times 10^{-3}
\theta_{2} | collision frequency | \nu_{c} | 0.6412\times 10^{-2} | 0.8676\times 10^{-2}
\theta_{3} | ion and electron density gradient | \omega_{n} | 1.1563 | 1.9272
\theta_{4} | ion temperature gradient | \omega_{T_{i}} | 2.0966 | 3.4943
\theta_{5} | temperature ratio | T_{i}/T_{e} | 0.6142 | 0.6788
\theta_{6} | electron temperature gradient | \omega_{T_{e}} | 4.0403 | 6.7338
\theta_{7} | effective ion charge | Z_{\mathrm{eff}} | 1.2800 | 1.9200
\theta_{8} | safety factor | q | 2.1708 | 2.3993
\theta_{9} | magnetic shear | \hat{s} | 1.9927 | 2.4356
\theta_{10} | elongation | k | 1.2815 | 1.4165
\theta_{11} | elongation gradient | s_{k} | 0.2081 | 0.2544
\theta_{12} | triangularity | \delta | 0.0528 | 0.0583
Table 4: ASDEX Upgrade Experiment: The 12 uniformly distributed uncertain inputs.

We perform experiments at one bi-normal wave-number k_{y}\rho_{s}=0.8. The output of interest is the growth rate (spectral abscissa) of the dominant micro-instability mode, which corresponds to the maximum real part over the spectrum. For this parameter set, the dominant micro-instability mode at k_{y}\rho_{s}=0.8 and at the nominal values of the input parameters is electron-temperature-gradient-driven, characterized by a negative frequency (imaginary part), while the first subdominant mode is ion-temperature-gradient-driven, characterized by a positive frequency. However, for certain values of the ion temperature and density gradients within the bounds shown in Table 4, the first subdominant ion mode can transition to become the dominant mode. This mode transition can be sharp and in turn lead to sharp transitions or even discontinuities in the growth rate. To avoid the potentially detrimental effects of this discontinuity, in our experiments we determine the growth rate of the underlying electron-driven micro-instability mode. To this end, we compute the first two micro-instability eigenmodes and then select the growth rate that has a corresponding negative frequency.
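A sketch of this selection step is shown below. It assumes the two dominant eigenvalues are available as complex numbers with the growth rate as real part and the frequency as imaginary part; this convention is an assumption of the sketch and not a statement about the Gene output format.

```python
import numpy as np

def electron_mode_growth_rate(eigenvalues):
    """Return the growth rate of the eigenmode with negative frequency, i.e., the
    electron-temperature-gradient-driven branch, from a list of complex eigenvalues
    (real part = growth rate, imaginary part = frequency)."""
    ev = np.asarray(eigenvalues, dtype=complex)
    electron_modes = ev[ev.imag < 0.0]
    if electron_modes.size == 0:
        raise ValueError("no eigenmode with negative frequency found")
    return float(np.max(electron_modes.real))

# usage with the first two computed eigenmodes (gamma_i = growth rate, omega_i = frequency):
# gamma = electron_mode_growth_rate([gamma_1 + 1j * omega_1, gamma_2 + 1j * omega_2])
```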

4.2.2 High-fidelity model

The high-fidelity model is provided by the gyrokinetic simulation code Gene [24] in the flux-tube limit [8]. This assumes periodic boundary conditions over the radial x and bi-normal k_{y} directions. To discretize the five-dimensional gyrokinetic phase-space domain, we use 15 Fourier modes in the radial direction and 24 points along the field line, in the z direction. The bi-normal coordinate k_{y} is decoupled in linear flux-tube simulations and hence only one such mode is used here. In velocity space, we employ 48 equidistant and symmetric parallel velocity grid points and 16 Gauss-Laguerre distributed magnetic moment points. This results in a total of 276,480 = 15\times 1\times 24\times 48\times 16 degrees of freedom in the 5D phase space.

The average high-fidelity model runtime is 410.9941 seconds on 32 cores on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center; cf. Section 1. One such node comprises two AMD EPYC 7763 64-core processors for a total of 128 cores per node, and 256 GB of RAM. The high-fidelity variance, estimated using 20 MC samples, is \sigma_{0}^{2}\approx 0.0279.

4.2.3 A coarse-grid low-fidelity model

We first consider a low-fidelity model f^{(1)} obtained by coarsening the grid in the z direction (20 points), the v_{\parallel} direction (20 points), and the \mu direction (8 Gauss-Laguerre points), for a total of 48,000 = 15\times 20\times 20\times 8 degrees of freedom, which corresponds to 5.76\times fewer degrees of freedom than the high-fidelity model. We note that this still represents a useful resolution for this experiment, while further considerable coarsening of the underlying mesh could lead to significant loss of accuracy or potentially even unphysical solutions. The average evaluation cost of f^{(1)} relative to the high-fidelity model is w_{1}=0.0849 on 16 cores on one node of Lonestar6. The variance and correlation coefficient, approximated using 20 MC high- and low-fidelity model evaluations, are \sigma_{1}^{2}\approx 0.0256 and \rho_{1}\approx 0.9990, respectively.
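The correlation, variance, and relative cost estimates reported here and in Section 4.2.2 can be obtained from a small number of paired pilot runs; a minimal sketch:

```python
import numpy as np

def pilot_statistics(f0_vals, f1_vals, t0, t1):
    """Estimate the quantities entering the MFMC formulas from paired pilot runs:
    output variances, the Pearson correlation with the high-fidelity model, and the
    relative evaluation cost (t0, t1 are average runtimes of f^(0) and f^(1))."""
    f0 = np.asarray(f0_vals, dtype=float)
    f1 = np.asarray(f1_vals, dtype=float)
    sigma0_sq = np.var(f0, ddof=1)
    sigma1_sq = np.var(f1, ddof=1)
    rho1 = np.corrcoef(f0, f1)[0, 1]
    return sigma0_sq, sigma1_sq, rho1, t1 / t0
```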

Single-fidelity settings require fine discretizations to obtain results with the desired accuracy. Here, we want to show that when coarse-grid low-fidelity models are correlated with the high-fidelity model, their lower computational cost can be leveraged to speed up multi-fidelity sampling methods. This is particularly relevant in plasma physics simulations, in which typically only a limited number of high-fidelity simulations can be performed for training data-fit low-fidelity models. We also want to show that our algorithm has broad scope in the sense that a wide variety of low-fidelity models can be used.

4.2.4 A sensitivity-driven dimension-adaptive sparse grid low-fidelity model

The low-fidelity model $f^{(2)}_{n_2}$ is based on sensitivity-driven dimension-adaptive sparse grid interpolation [11, 12], with code available at https://github.com/ionutfarcas/sensitivity-driven-sparse-grid-approx. Such sparse grid (SG) low-fidelity models have been successfully used for uncertainty quantification in plasma micro-instability simulations [11, 12, 14], including computationally expensive nonlinear simulations [13]. The adaptive procedure to construct the SG low-fidelity model is terminated when the underlying sparse grid reaches cardinality $n_2^*$, which corresponds to $n_2^*$ high-fidelity training samples, where $n_2^*$ is determined as discussed in Section 3. Even though the interpolation points used to construct the SG low-fidelity model have fine granularity (see [11, 12]), in cases where exactly $n_2^*$ sparse-grid points cannot be realized, we take the closest feasible number of sparse-grid points to $n_2^*$.

We use $20$ high- and low-fidelity model evaluations to assess the accuracy of the SG low-fidelity model; see also [26]. To estimate the evaluation cost rate, we average the runtimes of the low-fidelity model evaluated at $1{,}000$ MC samples. As can be seen in Figure 7 and Table 5, both accuracy and runtime can be modeled well with algebraic rates.
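One standard way to obtain such algebraic rate fits, shown here only as an illustrative sketch and not necessarily the authors' exact procedure, is a least-squares fit in log-log space, where the exponent appears as the slope and the multiplicative constant as the intercept.

```python
import numpy as np

def fit_algebraic_rate(n_samples, values):
    """Fit values ~ c * n_samples**gamma; return (c, gamma).

    For the accuracy rate, gamma is expected to be negative (the error decays);
    for the evaluation cost rate, gamma is expected to be positive.
    """
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_v = np.log(np.asarray(values, dtype=float))
    gamma, log_c = np.polyfit(log_n, log_v, 1)  # slope and intercept of the log-log fit
    return float(np.exp(log_c)), float(gamma)
```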

Figure 7: ASDEX Upgrade Experiment: Estimated accuracy (left) and evaluation cost rates (right) for the SG low-fidelity model.
rate | rate type | estimated rate constants
accuracy: $c_{a,2}\,r_{a,2}(n_2)$ | algebraic: $r_{a,2}(n_2) = n_2^{-\alpha_2}$ | $c_{a,2} = 0.3361$, $\alpha_2 = 0.8617$
evaluation cost: $c_{c,2}\,r_{c,2}(n_2)$ | algebraic: $r_{c,2}(n_2) = n_2^{\beta_2}$ | $c_{c,2} = 2.9060 \times 10^{-7}$, $\beta_2 = 1.1480$
Table 5: ASDEX Upgrade Experiment: Estimated accuracy and evaluation cost rates, and their constants, for the SG low-fidelity model.

4.2.5 A deep learning low-fidelity model

The low-fidelity model $f^{(3)}_{n_3}$ is given by a fully connected, feed-forward, three-layer deep neural network (DNN) with ReLU activation functions that is trained on high-fidelity data. The number of nodes in the hidden layers is $\lfloor 0.05 \times n_3 \rfloor$ and thus depends on the number of training samples $n_3$. Training is done over $20$ epochs using the Adam optimizer with the MSE as loss function. The accuracy rate is estimated from $20$ high- and low-fidelity model evaluations and the cost rate from $1{,}000$ low-fidelity model evaluations; see Figure 8 and Table 6. Notice that within the considered range of training sizes, the error decreases rather slowly, at an algebraic rate of only about $0.2$. This indicates that a large training set would be required to obtain accurate approximations should the DNN low-fidelity model be used to replace the high-fidelity model. In addition, the evaluation cost increases very slowly with $n_3$, which is because the network has only a few layers and the employed libraries are highly optimized.
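For concreteness, the following PyTorch sketch shows one possible realization of such a DNN low-fidelity model; the choice of three hidden layers of equal width, full-batch training, and the learning rate are assumptions not specified in the text, while the hidden-layer width $\lfloor 0.05\, n_3 \rfloor$, the $20$ Adam epochs, the MSE loss, and the 12-dimensional input (Table 4) follow the description above.

```python
import torch
from torch import nn

def train_dnn_lofi(X_train, y_train, n3, epochs=20, lr=1e-3):
    """Train a fully connected ReLU network on n3 high-fidelity samples.

    X_train: (n3, 12) array of uncertain inputs; y_train: (n3,) growth rates.
    The three-hidden-layer architecture and full-batch training are assumptions.
    """
    width = max(1, int(0.05 * n3))  # hidden-layer width grows with the training-set size
    model = nn.Sequential(
        nn.Linear(12, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    X = torch.as_tensor(X_train, dtype=torch.float32)
    y = torch.as_tensor(y_train, dtype=torch.float32).unsqueeze(1)
    for _ in range(epochs):          # full-batch training over the 20 epochs
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    return model
```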

Figure 8: ASDEX Upgrade Experiment: Estimated accuracy (left) and evaluation cost rates (right) for the DNN low-fidelity model.
rate | rate type | estimated rate constants
accuracy: $c_{a,3}\,r_{a,3}(n_3)$ | algebraic: $r_{a,3}(n_3) = n_3^{-\alpha_3}$ | $c_{a,3} = 0.1399$, $\alpha_3 = 0.2180$
evaluation cost: $c_{c,3}\,r_{c,3}(n_3)$ | algebraic: $r_{c,3}(n_3) = n_3^{\beta_3}$ | $c_{c,3} = 3.6664 \times 10^{-7}$, $\beta_3 = 0.0501$
Table 6: ASDEX Upgrade Experiment: Estimated accuracy and evaluation cost rates, and their constants, for the DNN low-fidelity model.

4.2.6 Context-aware multi-fidelity estimation

We consider a computational budget of $p = 5\times 10^5$ seconds, which corresponds to $1{,}216$ high-fidelity model evaluations and is the limit of the computational resources available to us for this project. To determine which combinations of models lead to CA-MFMC estimators with low MSEs for a budget $p$, we plot the analytic MSEs in Figure 9. The CA-MFMC estimators with the high-fidelity model $f^{(0)}$, the coarse-grid low-fidelity model $f^{(1)}$, and either the SG low-fidelity model $f^{(2)}_{n_2}$ or the DNN low-fidelity model $f^{(3)}_{n_3}$ lead to low MSEs. In contrast, the CA-MFMC estimator with the high-fidelity model $f^{(0)}$, the SG model $f^{(2)}_{n_2}$, and the DNN model $f^{(3)}_{n_3}$, but without the coarse-grid model $f^{(1)}$, leads to a higher MSE. We therefore consider the CA-MFMC estimators with $f^{(0)}$, $f^{(1)}$, and either $f^{(2)}_{n_2}$ or $f^{(3)}_{n_3}$ (cf. Section 3.3). This experiment shows that the analytic MSE can be used to select model combinations.

Figure 9: ASDEX Upgrade Experiment: Combinations of models can be selected based on the analytic MSE, which can be computed without additional model evaluations as long as the accuracy and cost rates are available. In this example, the CA-MFMC estimators with the high-fidelity model $f^{(0)}$, the coarse-grid low-fidelity model $f^{(1)}$, and either the SG low-fidelity model $f^{(2)}_{n_2}$ or the DNN low-fidelity model $f^{(3)}_{n_3}$ have the highest (best) cost/error ratio, which indicates a good model selection.

Based on the estimated constants in Table 5, the accuracy and cost rates of the SG low-fidelity model are both strictly convex for any $n_2 \in [1, p-1]$ and any budget $p > 0$, and therefore Propositions 3 and 4 apply. That is, the optimal, context-aware number of high-fidelity model evaluations for constructing the SG low-fidelity model, $n_2^*$, exists, is unique, and is bounded with respect to the budget. On the other hand, we see from Table 6 that the algebraic cost rate of the DNN low-fidelity model is concave since $\beta_3 < 1$, which means that we need to verify whether the assumptions of Proposition 3 hold. To this end, we check whether the function $h_3(n_3) = \kappa_2 + \hat{c}_{a,3} r_{a,3}(n_3) + c_{c,3} r_{c,3}(n_3)$ defined in (3.12), with $\kappa_2 = w_1 = 0.0849$ and $\hat{c}_{a,3} = \kappa_1 c_{a,3}$, where $\kappa_1 = 1 - \rho_1^2 = 0.0018$, satisfies condition (3.14). We do so for the budget $p = 10^7$ seconds, i.e., the largest budget considered in Figure 9, for which $n_3 \in [1, 24{,}330]$. The left plot in Figure 10 shows that $h_3''(n_3) > 0$ for all $n_3 \in [1, 24{,}330]$, which implies that (3.14) is satisfied for all budgets $0 < p \leq 10^7$, including the budget considered in our computations, i.e., $p = 5\times 10^5$ seconds. Therefore, Proposition 3 applies and the optimal, context-aware number of high-fidelity model evaluations for constructing the DNN low-fidelity model, $n_3^*$, exists and is unique. The right plot in Figure 10 depicts the objective $u_3$ defined in (3.13), which has a unique global minimizer.
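Since $h_3''(n_3) > 0$ on the admissible range implies condition (3.14), this check can also be reproduced directly from the constants quoted above; the following sketch evaluates the analytic second derivative of $h_3$ numerically (the additive constant $\kappa_2$ drops out of the second derivative).

```python
import numpy as np

kappa1 = 0.0018                       # 1 - rho_1^2, as quoted above
c_a3, alpha3 = 0.1399, 0.2180         # DNN accuracy rate constants (Table 6)
c_c3, beta3 = 3.6664e-7, 0.0501       # DNN cost rate constants (Table 6)
hat_c_a3 = kappa1 * c_a3

def h3_second_derivative(n3):
    """Second derivative of h_3(n_3) = kappa_2 + hat_c_a3*n_3**(-alpha3) + c_c3*n_3**beta3."""
    return (hat_c_a3 * alpha3 * (alpha3 + 1.0) * n3 ** (-alpha3 - 2.0)
            + c_c3 * beta3 * (beta3 - 1.0) * n3 ** (beta3 - 2.0))

n3_grid = np.arange(1, 24331, dtype=float)           # admissible range for p = 1e7 seconds
assert np.all(h3_second_derivative(n3_grid) > 0.0)   # h_3'' > 0, implying (3.14)
```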

Figure 10: ASDEX Upgrade Experiment: On the left, we verify that $h_3'' > 0$, which ensures that (3.14) is satisfied up to the budget $p = 10^7$ seconds. On the right, we plot the objective $u_3$ defined in (3.13), which has a unique global minimizer.

In the left plot of Figure 11, we show the number of training samples $n_2^*$ for the CA-MFMC estimators with models $f^{(0)}, f^{(2)}_{n_2}$ and with models $f^{(0)}, f^{(1)}, f^{(2)}_{n_2}$. For the SG model $f^{(2)}_{n_2}$, we took $133$ training samples because the optimal $n_2^* = 131$ does not correspond to a valid sparse grid; cf. Section 4.2.4. In the right plot of Figure 11, the number of training samples $n_3^*$ for the CA-MFMC estimator with models $f^{(0)}, f^{(1)}, f^{(3)}_{n_3}$ is shown. Training with only $n_3^* = 159$ samples for our budget $p = 5\times 10^5$ is sufficient to obtain a low-fidelity model that is useful for variance reduction in the CA-MFMC estimator together with the high-fidelity model, even though the low-fidelity model alone is insufficient to provide accurate predictions in single-fidelity settings.

Figure 11: ASDEX Upgrade Experiment: The numbers $n_2^*$ and $n_3^*$ of high-fidelity training samples selected by the proposed context-aware approach for obtaining a multi-fidelity estimator with the highest (best) cost/error ratio. For example, depending on whether the coarse-grid low-fidelity model $f^{(1)}$ is used together with the high-fidelity model $f^{(0)}$ and the SG-based model $f^{(2)}_{n_2}$, the proposed approach selects either $n_2^* = 131$ or $n_2^* = 438$ training samples, which demonstrates the context-aware trade-off made by the proposed approach when training low-fidelity models.

Consider now the left plot of Figure 12, which compares the MSEs of the CA-MFMC estimators with those of static MFMC and standard MC. The reference solution was obtained via the MFMC estimator based on the coarse-grid and DNN low-fidelity models for a budget of $p = 3\times 10^6$ seconds using $N = 5$ replicates; here, $n_3^* = 847$ high-fidelity evaluations were needed to construct the DNN low-fidelity model. The average reference value of the expected growth rate of the dominant electron temperature gradient-driven micro-instability mode is $\hat{\mu}_{\mathrm{ref}} = 0.3967$. The context-aware estimators are up to two orders of magnitude more accurate than the standard MC estimator. This roughly two orders of magnitude speedup of the CA-MFMC estimators translates into a runtime reduction from $72$ days to roughly four hours on $32$ cores on one node of the Lonestar6 supercomputer. We also observe that MFMC with the high-fidelity and coarse-grid models is about one order of magnitude more accurate in terms of the MSE than the standard MC estimator, showing that coarse-grid plasma physics models can be useful in multi-fidelity settings. The trends that we observe here numerically are in alignment with the analytic MSEs shown in Figure 9.

The right plot of Figure 12 shows how the samples are distributed among the models. All multi-fidelity estimators allocate more than $98\%$ of the total number of samples to the low-fidelity models, which explains the speedups. However, a small fraction of the samples is still allocated to the high-fidelity model, which is desirable in practical applications to guarantee unbiasedness of the estimators.

Figure 12: ASDEX Upgrade Experiment: The left plot shows that the CA-MFMC estimators achieve about two orders of magnitude speedup, which translates into a runtime reduction from $72$ days to roughly four hours on one node of the Lonestar6 supercomputer. The right plot shows the sample distribution across models for the employed estimators.

5 Conclusions

The proposed context-aware learning approach for variance reduction trades off training hierarchies of low-fidelity models with Monte Carlo estimation. By leveraging the trade-off between adding a high-fidelity model sample to the training set for constructing low-fidelity models and using that sample in the Monte Carlo estimator, our analysis shows that the quasi-optimal number of high-fidelity training samples is bounded independently of the computational budget. Thus, after a finite number of samples, all further high-fidelity model samples should be used in the Monte Carlo estimator rather than to improve the low-fidelity models. This means that low-fidelity models can be over-trained for multi-fidelity uncertainty quantification. We demonstrated this over-training numerically in a thermal block scenario, in which a reduced basis low-fidelity model constructed from $50$ high-fidelity training samples led to a multi-fidelity estimator with a poorer cost/error ratio than the proposed context-aware estimator, which determined $18$ to be the maximum quasi-optimal number of training samples. This has clear implications for training data-fit and machine-learning-based models as well: such models can require large data sets to achieve high prediction accuracy on their own, yet few training points suffice when they are merely used for variance reduction together with the high-fidelity model in multi-fidelity uncertainty quantification, as we have shown in a numerical experiment with plasma micro-turbulence simulations.

Acknowledgements

I.-G.F. and F.J. were partially supported by the Exascale Computing Project (No. 17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. B.P. acknowledges support from the Air Force Office of Scientific Research (AFOSR) award FA9550-21-1-0222 (Dr. Fariba Fahroo). We thank Tobias Goerler for useful discussions and insights about the considered plasma micro-turbulence simulation scenario. We also gratefully acknowledge the compute and data resources provided by the Texas Advanced Computing Center at The University of Texas at Austin (https://www.tacc.utexas.edu/).

References

  • [1] T. Alsup and B. Peherstorfer. Context-aware surrogate modeling for balancing approximation and sampling costs in multi-fidelity importance sampling and Bayesian inverse problems. arXiv, 2010.11708, 2020.
  • [2] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review, 57(4):483–531, 2015.
  • [3] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Convergence rates for greedy algorithms in reduced basis methods. SIAM Journal on Mathematical Analysis, 43(3):1457–1472, 2011.
  • [4] A. Brizard and T. Hahm. Foundations of nonlinear gyrokinetic theory. Reviews of Modern Physics, 79, 2007.
  • [5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
  • [6] K. A. Cliffe, M. B. Giles, R. Scheichl, and A. L. Teckentrup. Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Computing and Visualization in Science, 14(1):3–15, 2011.
  • [7] N. Collier, A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone. A continuation multilevel Monte Carlo algorithm. BIT Numerical Mathematics, 55(2):399–432, Jun 2015.
  • [8] T. Dannert and F. Jenko. Gyrokinetic simulation of collisionless trapped-electron mode turbulence. Physics of Plasmas, 12(7):072309, 2005.
  • [9] S. De, J. Britton, M. Reynolds, R. Skinner, K. Jansen, and A. Doostan. On transfer learning of neural networks using bi-fidelity data for uncertainty propagation. International Journal for Uncertainty Quantification, 10(6):543–573, 2020.
  • [10] M. Eigel, C. Merdon, and J. Neumann. An adaptive multilevel Monte Carlo method with stochastic bounds for quantities of interest with uncertain data. SIAM/ASA Journal on Uncertainty Quantification, 4(1):1219–1245, 2016.
  • [11] I.-G. Farcaş, T. Görler, H.-J. Bungartz, F. Jenko, and T. Neckel. Sensitivity-driven adaptive sparse stochastic approximations in plasma microinstability analysis. Journal of Computational Physics, 410:109394, 2020.
  • [12] I.-G. Farcaş. Context-aware model hierarchies for higher-dimensional uncertainty quantification. PhD thesis, Technical University of Munich, 2020.
  • [13] I.-G. Farcaş, G. Merlo, and F. Jenko. A general framework for quantifying uncertainty at scale and its application to fusion research. arXiv, 2202.03999, 2022.
  • [14] I.-G. Farcaş, A. D. Siena, and F. Jenko. Turbulence suppression by energetic particles: a sensitivity-driven dimension-adaptive sparse grid framework for discharge optimization. Nuclear Fusion, 61(5):056004, Apr 2021.
  • [15] S. J. Freethy, T. Görler, A. J. Creely, G. D. Conway, S. S. Denk, T. Happel, C. Koenen, P. Hennequin, and A. E. White. Validation of gyrokinetic simulations with measurements of electron temperature fluctuations and density-temperature phase angles on ASDEX upgrade. Physics of Plasmas, 25(5):055903, 2018.
  • [16] M. B. Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607–617, 2008.
  • [17] A. A. Gorodetsky, G. Geraci, M. S. Eldred, and J. D. Jakeman. A generalized approximate control variate framework for multifidelity uncertainty quantification. Journal of Computational Physics, 408:109257, 2020.
  • [18] A. A. Gorodetsky, J. D. Jakeman, and G. Geraci. MFNets: data efficient all-at-once learning of multifidelity surrogates as directed networks of information sources. Computational Mechanics, 68(4):741–758, Oct 2021.
  • [19] A. A. Gorodetsky, J. D. Jakeman, G. Geraci, and M. S. Eldred. MFNets: Multi-fidelity data-driven networks for Bayesian learning and prediction. International Journal for Uncertainty Quantification, 10(6):595–622, 2020.
  • [20] A. Gruber, M. Gunzburger, L. Ju, R. Lan, and Z. Wang. Multifidelity Monte Carlo estimation for efficient uncertainty quantification in climate-related modeling. EGUsphere, 2022:1–27, 2022.
  • [21] A. Gruber, M. Gunzburger, L. Ju, and Z. Wang. A Multifidelity Monte Carlo Method for Realistic Computational Budgets. arXiv, 2206.07572, 2022.
  • [22] A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone. Optimization of mesh hierarchies in multilevel Monte Carlo samplers. Stochastics and Partial Differential Equations Analysis and Computations, 4(1):76–112, Mar 2016.
  • [23] J. Hampton, H. R. Fairbanks, A. Narayan, and A. Doostan. Practical error bounds for a non-intrusive bi-fidelity approach to parametric/stochastic model reduction. Journal of Computational Physics, 368:315–332, 2018.
  • [24] F. Jenko, W. Dorland, M. Kotschenreuther, and B. N. Rogers. Electron temperature gradient driven turbulence. Physics of Plasmas, 7(5):1904–1910, 2000.
  • [25] P. Khodabakhshi, K. E. Willcox, and M. Gunzburger. A multifidelity method for a nonlocal diffusion model. Applied Mathematics Letters, 121:107361, 2021.
  • [26] J. Konrad, I.-G. Farcaş, B. Peherstorfer, A. Di Siena, F. Jenko, T. Neckel, and H.-J. Bungartz. Data-driven low-fidelity models for multi-fidelity Monte Carlo sampling in plasma micro-turbulence analysis. Journal of Computational Physics, 451:110898, 2022.
  • [27] F. Law, A. Cerfon, and B. Peherstorfer. Accelerating the estimation of collisionless energetic particle confinement statistics in stellarators using multifidelity Monte Carlo. Nuclear Fusion, 62(7):076019, May 2022.
  • [28] R. Miller, M. Chu, J. Greene, Y. Lin-Liu, and R. Waltz. Noncircular, finite aspect ratio, local equilibrium model. Physics of Plasmas, 5:979, 1998.
  • [29] A. Narayan, C. Gittelson, and D. Xiu. A Stochastic Collocation Algorithm with Multifidelity Models. SIAM Journal on Scientific Computing, 36(2):A495–A521, 2014.
  • [30] F. Newberry, J. Hampton, K. Jansen, and A. Doostan. Bi-fidelity reduced polynomial chaos expansion for uncertainty quantification. Computational Mechanics, 69(2):405–424, Feb 2022.
  • [31] L. Ng and K. Willcox. Multifidelity approaches for optimization under uncertainty. International Journal for Numerical Methods in Engineering, 100(10):746–772, 2014.
  • [32] B. Peherstorfer. Multifidelity Monte Carlo estimation with adaptive low-fidelity models. SIAM/ASA Journal on Uncertainty Quantification, 7(2):579–603, 2019.
  • [33] B. Peherstorfer, M. Gunzburger, and K. Willcox. Convergence analysis of multifidelity Monte Carlo estimation. Numerische Mathematik, 139(3):683–707, 2018.
  • [34] B. Peherstorfer, K. Willcox, and M. Gunzburger. Optimal model management for multifidelity Monte Carlo estimation. SIAM Journal on Scientific Computing, 38(5):A3163–A3194, 2016.
  • [35] B. Peherstorfer, K. Willcox, and M. Gunzburger. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Review, 60(3):550–591, 2018.
  • [36] E. Qian, B. Peherstorfer, D. O’Malley, V. Vesselinov, and K. Willcox. Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA Journal on Uncertainty Quantification, 6(2):683–706, 2018.
  • [37] M. Razi, R. M. Kirby, and A. Narayan. Fast predictive multi-fidelity prediction with models of quantized fidelity levels. Journal of Computational Physics, 376:992–1008, 2019.
  • [38] G. Rozza, D. Huynh, and A. Patera. Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Archives of Computational Methods in Engineering, 15(3):1–47, 2007.
  • [39] D. Schaden and E. Ullmann. On multilevel best linear unbiased estimators. SIAM/ASA Journal on Uncertainty Quantification, 8(2):601–635, 2020.
  • [40] D. Schaden and E. Ullmann. Asymptotic analysis of multilevel best linear unbiased estimators. SIAM/ASA Journal on Uncertainty Quantification, 9(3):953–978, 2021.
  • [41] N. Shyamkumar, S. Gugercin, and B. Peherstorfer. Towards context-aware learning for control: Balancing stability and model-learning error. In 2022 American Control Conference (ACC), pages 4808–4813, 2022.
  • [42] A. L. Teckentrup, R. Scheichl, M. B. Giles, and E. Ullmann. Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients. Numerische Mathematik, 125(3):569–600, 2013.
  • [43] S. Werner and B. Peherstorfer. Context-aware controller inference for stabilizing dynamical systems from scarce data. arXiv, 2207.11049, 2022.
  • [44] Y. Xu, V. Keshavarzzadeh, R. M. Kirby, and A. Narayan. A Bandit-Learning Approach to Multifidelity Approximation. SIAM Journal on Scientific Computing, 44(1):A150–A175, 2022.
  • [45] H. Yang, Y. Fujii, K. W. Wang, and A. A. Gorodetsky. Control variate polynomial chaos: Optimal fusion of sampling and surrogates for multifidelity uncertainty quantification, 2022.