Context-aware learning of hierarchies of low-fidelity models for multi-fidelity uncertainty quantification
Abstract
Multi-fidelity Monte Carlo methods leverage low-fidelity and surrogate models for variance reduction to make uncertainty quantification tractable even when numerically simulating the physical systems of interest with high-fidelity models is computationally expensive. This work proposes a context-aware multi-fidelity Monte Carlo method that optimally balances the costs of training low-fidelity models with the costs of Monte Carlo sampling. It generalizes the previously developed context-aware bi-fidelity Monte Carlo method to hierarchies of multiple models and to more general types of low-fidelity models. When training low-fidelity models, the proposed approach takes into account the context in which the learned low-fidelity models will be used, namely for variance reduction in Monte Carlo estimation, which allows it to find optimal trade-offs between training and sampling to minimize upper bounds of the mean-squared errors of the estimators for given computational budgets. This is in stark contrast to traditional surrogate modeling and model reduction techniques that construct low-fidelity models with the primary goal of approximating well the high-fidelity model outputs and typically ignore the context in which the learned models will be used in upstream tasks. The proposed context-aware multi-fidelity Monte Carlo method applies to hierarchies of a wide range of types of low-fidelity models such as sparse-grid and deep-network models. Numerical experiments with the gyrokinetic simulation code GENE show speedups of up to two orders of magnitude compared to standard estimators when quantifying uncertainties in small-scale fluctuations in confined plasma in fusion reactors. This corresponds to a runtime reduction from 72 days to about four hours on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center.
1 Introduction
Uncertainty quantification is an essential building block for achieving predictive numerical simulations of physical systems. To make accurate Monte Carlo estimation of uncertainties tractable even when simulations of the physical system of interest are computationally expensive, multi-fidelity methods rely on low-fidelity or surrogate models: the low-fidelity models are leveraged for variance reduction to achieve speedups and the high-fidelity models are occasionally evaluated to guarantee unbiasedness; see [35] for a survey on multi-fidelity methods. However, if low-fidelity models are not readily available, then they need to be constructed and trained first, which can incur additional computational costs and require additional evaluations of the high-fidelity models to generate training data.
In this work, we build on the context-aware bi-fidelity Monte Carlo method introduced in [32] and propose the context-aware multi-fidelity Monte Carlo (CA-MFMC) method that trades off the costs of training hierarchies of multiple low-fidelity models with the costs of Monte Carlo sampling to obtain multi-fidelity estimators that minimize upper bounds of the mean-squared errors for given computational budgets. The proposed approach is context-aware [1, 12, 32, 43, 41] in the sense that the low-fidelity models are trained to maximize variance reduction in Monte Carlo estimation. This means that the context in which the learned models will be used, namely for variance reduction in Monte Carlo estimation, is taken into account during training, which distinguishes it from traditional surrogate modeling and model reduction techniques that construct low-fidelity models with the primary goal of approximating well the high-fidelity model outputs and typically ignore the context in which the learned models are ultimately used [2, 38]. Our proposed CA-MFMC method can be combined with a wide range of types of data-fit low-fidelity models such as sparse-grid-based models and deep-network models. We show that CA-MFMC achieves speedups of up to two orders of magnitude compared to single-fidelity Monte Carlo and standard multi-fidelity estimators when quantifying uncertainties in a plasma micro-turbulence scenario from the ASDEX Upgrade Experiment (https://www.ipp.mpg.de/16195/asdex).
We build on multi-fidelity Monte Carlo (MFMC) estimators [25, 31, 33, 34, 35, 36, 21, 20] that leverage a given hierarchy of low-fidelity models for variance reduction. Several works have employed MFMC estimators to quantify uncertainties in plasma micro-turbulence simulations [12, 26] and for estimating statistics in collisionless energetic particle confinement in stellarators [27]. See also [13] for other techniques for uncertainty quantification in plasma simulations. None of these methods trade off the training of low-fidelity models with sampling, however. The works [17, 39, 40] formulate generalized control variate techniques for multi-fidelity uncertainty propagation. There is a wide range of other multi-fidelity techniques that are based on concepts other than control variates and aim to find surrogate models from multi-fidelity information sources, such as the collocation approach introduced in [29, 23] and the multi-fidelity networks proposed in [19, 18]; see also [30, 9, 37, 44]. There have been extensions to the multilevel Monte Carlo method [6, 16, 42] that consider spatially-adaptive mesh refinements and optimal mesh hierarchies [7, 10, 22]. However, in that line of work, the low-fidelity models are coarse-grid approximations and thus the costs of constructing low-fidelity models are typically considered to be negligible and therefore are ignored. In contrast, we consider more general types of low-fidelity models such as data-fit models that incur training costs.
The first work that considered trading off training low-fidelity models and multi-fidelity Monte Carlo estimation is [32], which studies the bi-fidelity setting in which the low-fidelity model has algebraic accuracy and cost rates. In [45] a similar trade-off is considered in the bi-fidelity case but with more general cost and error rates and a specific focus on polynomial-chaos-based surrogate models. We go beyond these works by introducing the CA-MFMC estimator based on context-aware learning that learns hierarchies of more than one low-fidelity model for variance reduction. We first extend the work on a single low-fidelity model with algebraic cost and error rates in [32] to apply to low-fidelity models with more general rates. Key is that the corresponding bounds can be nested, which motivates the sequential training of low-fidelity models to obtain a hierarchy. In the proposed sequential training approach to fit hierarchies of low-fidelity models for CA-MFMC estimators, each step leads to an optimal trade-off between training and sampling. This leads to a context-aware learning approach because the models are learned such that the CA-MFMC estimator achieves a low mean-squared error, rather than being trained to accurately approximate high-fidelity model outputs as in traditional model reduction.
To demonstrate the performance of the proposed CA-MFMC estimator, we apply it to quantifying uncertainties in plasma micro-turbulence simulations motivated by the ITER experiment (https://www.iter.org). The goal of the ITER experiment is to create, for the very first time, a self-sustained plasma in the laboratory. A physics obstacle is posed by the above-mentioned small-scale fluctuations, which cause high energy loss rates despite sophisticated plasma confinement via strong and shaped magnetic fields. Building on the gyrokinetic simulation code GENE [24], we show that the proposed estimator achieves speedups of up to two orders of magnitude compared to single-fidelity Monte Carlo and standard multi-fidelity estimators, which translates into a runtime reduction from 72 days to about four hours on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center (https://www.tacc.utexas.edu/systems/lonestar6).
The remainder of this paper is organized as follows. Section 2 introduces the notation and summarizes the traditional MFMC algorithm [31, 34] and the bi-fidelity context-aware algorithm formulated in [32]. Section 3 introduces our context-aware learning approach for low-fidelity models with general cost/error rates and multiple low-fidelity models for multi-fidelity sampling methods. Section 4 presents numerical results in two scenarios: a heat conduction problem defined on a two-dimensional spatial domain with nine uncertain parameters and a realistic plasma micro-turbulence scenario with uncertain inputs, for which one realization of the uncertain inputs requires a total runtime of about seconds on cores. The code and data to reproduce our numerical results are available at https://github.com/ionutfarcas/context-aware-mfmc.
2 Preliminaries
This section reviews traditional MFMC estimators [31, 34] and the bi-fidelity context-aware MFMC estimator [32] that uses a high-fidelity model and a single low-fidelity model only.
2.1 Static multi-fidelity Monte Carlo estimation
Let $f_0 \colon \mathcal{X} \to \mathcal{Y}$ represent the input-output response of a potentially expensive-to-evaluate computational model. The input domain is $\mathcal{X} \subseteq \mathbb{R}^d$ and the output is scalar with output domain $\mathcal{Y} \subseteq \mathbb{R}$. We consider the situation where the input is a realization of a random variable $\boldsymbol\theta$ so that the output $f_0(\boldsymbol\theta)$ becomes a realization of a random variable too. We denote the probability density function of $\boldsymbol\theta$ as $\pi$. Our goal is to estimate the expected value $\mathbb{E}[f_0(\boldsymbol\theta)]$ of $f_0(\boldsymbol\theta)$.
The MFMC estimator introduced in [31, 34] combines a hierarchy of models, consisting of a high-fidelity model $f_0$ and $k$ low-fidelity models $f_1, f_2, \dots, f_k$. The accuracy of the low-fidelity models is measured by their Pearson correlation coefficient with respect to $f_0$,

$\rho_i = \frac{\mathrm{Cov}[f_0(\boldsymbol\theta),\, f_i(\boldsymbol\theta)]}{\sigma_0\, \sigma_i}\,, \quad i = 1, \dots, k\,,$

where $\sigma_i^2 = \mathrm{Var}[f_i(\boldsymbol\theta)]$ denotes the variance of the output random variable $f_i(\boldsymbol\theta)$ for $i = 0, 1, \dots, k$ and $\mathrm{Cov}[\cdot, \cdot]$ is the covariance. The evaluation costs of the models are $w_0, w_1, \dots, w_k > 0$, respectively. We normalize the cost of evaluating the high-fidelity model such that $w_0 = 1$, without loss of generality. The models are assumed to satisfy the following ordering

$1 = \rho_0^2 > \rho_1^2 > \dots > \rho_k^2\,, \qquad \frac{w_{i-1}}{w_i} > \frac{\rho_{i-1}^2 - \rho_i^2}{\rho_i^2 - \rho_{i+1}^2}\,, \quad i = 1, \dots, k\,,$ (2.1)

where $\rho_0 = 1$ and $\rho_{k+1} = 0$.
Let now $m_i \in \mathbb{N}$ for $i = 0, 1, \dots, k$ denote the number of evaluations of model $f_i$, which satisfy

$0 < m_0 \le m_1 \le \dots \le m_k$

because of (2.1). Consider $m_k$ independent and identically distributed (i.i.d.) samples drawn from the input density $\pi$:

$\boldsymbol\theta_1, \dots, \boldsymbol\theta_{m_k} \sim \pi\,.$

To obtain the MFMC estimator, model $f_i$ is evaluated at the first $m_i$ samples to obtain the output random variables

$f_i(\boldsymbol\theta_1), \dots, f_i(\boldsymbol\theta_{m_i})$ (2.2)

for $i = 0, 1, \dots, k$. From (2.2), derive the following standard MC estimators for $i = 0, 1, \dots, k$:

$\bar f_i^{(m_i)} = \frac{1}{m_i} \sum_{j=1}^{m_i} f_i(\boldsymbol\theta_j)\,,$

as well as

$\bar f_i^{(m_{i-1})} = \frac{1}{m_{i-1}} \sum_{j=1}^{m_{i-1}} f_i(\boldsymbol\theta_j)\,, \quad i = 1, \dots, k\,.$

The MFMC estimator is

$\hat E^{\mathrm{MFMC}} = \bar f_0^{(m_0)} + \sum_{i=1}^{k} \alpha_i \left(\bar f_i^{(m_i)} - \bar f_i^{(m_{i-1})}\right)\,,$

with the coefficients $\alpha_1, \dots, \alpha_k \in \mathbb{R}$. The total computational cost of the MFMC estimator is therefore

$\sum_{i=0}^{k} m_i\, w_i\,.$
Fixing a computational budget $p > 0$ and then selecting the numbers of model evaluations $m_0, \dots, m_k$ and the coefficients $\alpha_1, \dots, \alpha_k$ such that the variance of the MFMC estimator is minimized gives the MSE

$e(\hat E^{\mathrm{MFMC}}) = \frac{\sigma_0^2}{p} \left(\sum_{i=0}^{k} \sqrt{w_i \left(\rho_i^2 - \rho_{i+1}^2\right)}\right)^2\,.$ (2.3)

The optimal numbers of model evaluations $m_0^*, \dots, m_k^*$ and coefficients $\alpha_i^* = \rho_i \sigma_0 / \sigma_i$ are available in closed form for a given budget $p$; see [34] for details.
The MFMC estimator leverages the cheaper-to-evaluate low-fidelity models with the aim to achieve a lower MSE than a standard MC estimator with the same costs. That is, given $m$ i.i.d. samples $\boldsymbol\theta_1, \dots, \boldsymbol\theta_m$ from $\pi$, the standard MC mean estimator is

$\hat E_m^{\mathrm{MC}} = \frac{1}{m} \sum_{j=1}^{m} f_0(\boldsymbol\theta_j)\,.$ (2.4)

The computational cost of the Monte Carlo estimator (2.4) of $\mathbb{E}[f_0(\boldsymbol\theta)]$ is $m$ because the function $f_0$ is evaluated at $m$ realizations and the cost of each evaluation of $f_0$ is $w_0 = 1$. The MSE of the standard MC estimator reads

$e(\hat E_m^{\mathrm{MC}}) = \frac{\sigma_0^2}{m}\,.$ (2.5)
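To make the sampling procedure concrete, the following minimal sketch implements MFMC estimation with the closed-form allocation of [34]; it is an illustration, not the authors' implementation. The callables in `models`, the sampler `sample_inputs`, and the statistics `w`, `rho`, `sigma0`, and `sigma` (estimated, e.g., from pilot samples) are assumed inputs.

```python
# Minimal sketch of MFMC estimation with the closed-form allocation of [34].
# Assumed inputs: models = [f_0, ..., f_k] (vectorized callables),
# w = [1, w_1, ..., w_k], rho = [1, rho_1, ..., rho_k], sigma = [sigma_1, ..., sigma_k].
import numpy as np

def mfmc_estimate(models, w, rho, sigma0, sigma, budget, sample_inputs):
    w, rho = np.asarray(w, float), np.append(np.asarray(rho, float), 0.0)
    k = len(models) - 1
    # Closed-form sample-allocation ratios r_i and coefficients alpha_i from [34].
    r = np.sqrt(w[0] * (rho[:-1] ** 2 - rho[1:] ** 2) / (w * (1.0 - rho[1] ** 2)))
    m0 = budget / np.dot(w, r)
    m = np.maximum(np.floor(r * m0).astype(int), 1)   # m_0 <= m_1 <= ... <= m_k
    alpha = rho[1:-1] * sigma0 / np.asarray(sigma, float)
    theta = sample_inputs(m[-1])                      # m_k i.i.d. input samples
    estimate = np.mean(models[0](theta[: m[0]]))      # standard MC part
    for i in range(1, k + 1):                         # control-variate corrections
        y = models[i](theta[: m[i]])
        estimate += alpha[i - 1] * (np.mean(y) - np.mean(y[: m[i - 1]]))
    return estimate
```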
2.2 Context-aware bi-fidelity Monte Carlo estimator with algebraic accuracy and cost rates
The work [32] introduces a context-aware bi-fidelity MFMC approach that trades off the training costs of constructing a low-fidelity model and sampling. Following [32], the low-fidelity model $f_{1,n}$ is obtained via a training process, as is the case for data-fit models and reduced models. Correspondingly, the subscript $n$ refers to the number of high-fidelity evaluations that are needed to train $f_{1,n}$. For example, in case of obtaining $f_{1,n}$ via training a neural network, the subscript $n$ refers to the number of training input-output pairs obtained with the high-fidelity model $f_0$. This also means that the correlation coefficient $\rho_1(n)$ between $f_0$ and $f_{1,n}$ as well as the evaluation cost $w_1(n)$ of $f_{1,n}$ depend on $n$.

The dependency of the correlation and costs on $n$ is described as follows: to trade off training the low-fidelity model and sampling, the work [32] makes the assumption that the correlation coefficient between the high-fidelity output random variable $f_0(\boldsymbol\theta)$ and the low-fidelity output random variable $f_{1,n}(\boldsymbol\theta)$ is bounded by

$1 - \rho_1^2(n) \le c_a\, n^{-\alpha}$

with respect to $n$, where $c_a > 0$ and $\alpha > 0$ are constants. The cost of evaluating the low-fidelity model $f_{1,n}$ is bounded by

$w_1(n) \le c_w\, n^{\gamma}$

with constants $c_w > 0$ and $\gamma > 0$.
A budget $p$ corresponds to $p$ high-fidelity model evaluations because we have $w_0 = 1$. Thus, if $n$ high-fidelity model samples are used for training $f_{1,n}$, then a budget of $p - n$ is left for sampling. The corresponding context-aware bi-fidelity estimator is

$\hat E_n^{\mathrm{CA}} = \bar f_0^{(m_0^*)} + \alpha_1^* \left(\bar f_{1,n}^{(m_1^*)} - \bar f_{1,n}^{(m_0^*)}\right)\,,$

where $m_0^*, m_1^*$ and $\alpha_1^*$ are chosen to minimize the MSE of $\hat E_n^{\mathrm{CA}}$ for a given $n$ and budget $p - n$. Consequently, the MSE of $\hat E_n^{\mathrm{CA}}$ depends on $n$,

$e(\hat E_n^{\mathrm{CA}}) = \frac{\sigma_0^2}{p - n} \left(\sqrt{1 - \rho_1^2(n)} + \sqrt{w_1(n)\, \rho_1^2(n)}\right)^2\,,$

which is in contrast to the MFMC estimator for which the MSE (2.3) depends on $p$ only and is independent of potential training costs of constructing low-fidelity models.
If the budget $p$ is fixed, $n$ can be chosen by minimizing the upper bound of the MSE

$e(\hat E_n^{\mathrm{CA}}) \le \frac{\sigma_0^2}{p - n} \left(\sqrt{c_a\, n^{-\alpha}} + \sqrt{c_w\, n^{\gamma}}\right)^2\,.$ (2.6)

The work [32] shows that there exists a unique $n^*$ that minimizes (2.6) with respect to $n$ for a given $p$ under certain assumptions; however, no closed-form expression of $n^*$ is available and thus $n^*$ needs to be found numerically. The optimum $n^*$ is bounded independently of the budget $p$; see [32] for details.
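Because no closed form of $n^*$ is available, the one-dimensional minimization of (2.6) can be carried out numerically. A minimal sketch under the algebraic rates above; the rate constants in the example call are purely illustrative assumptions:

```python
# Minimal sketch: numerically minimize the MSE bound (2.6) over the number n of
# high-fidelity training samples; the rate constants are assumed from pilot runs.
import numpy as np
from scipy.optimize import minimize_scalar

def optimal_training_samples(p, sigma0_sq, c_a, alpha, c_w, gamma):
    def bound(n):  # upper bound (2.6) of the MSE of the context-aware estimator
        return sigma0_sq / (p - n) * (np.sqrt(c_a * n ** (-alpha))
                                      + np.sqrt(c_w * n ** gamma)) ** 2
    return minimize_scalar(bound, bounds=(1.0, p - 1.0), method="bounded").x

n_star = optimal_training_samples(p=1000.0, sigma0_sq=1.0,
                                  c_a=2.0, alpha=2.0, c_w=1e-3, gamma=1.0)
```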
3 Context-aware multi-fidelity Monte Carlo estimation
This section presents the methodological novelty of this paper. We introduce context-aware learning for MFMC with low-fidelity models that have general cost and error rates and with hierarchies of multiple (more than one) low-fidelity models.
3.1 Setup with multiple models
We now consider $k$ low-fidelity models $f_{1,n_1}, \dots, f_{k,n_k}$ that take $n_1, \dots, n_k$ evaluations, respectively, of the high-fidelity model to be trained. We refer to $\boldsymbol n = [n_1, \dots, n_k]^\top$ as the numbers of training samples to fit the low-fidelity models. We then define the CA-MFMC estimator as

$\hat E^{\mathrm{CA}} = \bar f_0^{(m_0^*)} + \sum_{i=1}^{k} \alpha_i^* \left(\bar f_{i,n_i}^{(m_i^*)} - \bar f_{i,n_i}^{(m_{i-1}^*)}\right)\,,$

where the $m_0^*, \dots, m_k^*$ and $\alpha_1^*, \dots, \alpha_k^*$ minimize the MSE of $\hat E^{\mathrm{CA}}$ for given $\boldsymbol n$ and budget $p - \sum_{i=1}^{k} n_i$. The MSE of the CA-MFMC estimator is

$e(\hat E^{\mathrm{CA}}) = \frac{\sigma_0^2}{p - \sum_{i=1}^{k} n_i} \left(\sum_{i=0}^{k} \sqrt{w_i(n_i) \left(\rho_i^2(n_i) - \rho_{i+1}^2(n_{i+1})\right)}\right)^2\,,$ (3.1)

with the conventions $\rho_0(n_0) = 1$, $w_0(n_0) = 1$, and $\rho_{k+1}(n_{k+1}) = 0$, where now the MSE depends on the numbers of training samples $\boldsymbol n$ used for constructing the low-fidelity models. We are interested in choosing $\boldsymbol n$ to minimize an upper bound of the MSE (3.1) so that the training costs given by $\boldsymbol n$ and the costs of taking samples from the high- and the low-fidelity output random variables are balanced.
We make the following assumptions on the accuracy and cost behavior of the low-fidelity models with respect to the numbers of training samples $n_1, \dots, n_k$:
Assumption 1.
For all $i = 1, \dots, k$, there exists a constant $c_{a,i} > 0$ and a positive, decreasing, at least twice continuously differentiable function $r_{a,i}$ such that

$1 - \rho_i^2(n_i) \le c_{a,i}\, r_{a,i}(n_i)\,, \quad n_i \in [1, \infty)\,.$
Assumption 2.
For all $i = 1, \dots, k$, there exists a constant $c_{c,i} > 0$ and a positive, increasing, at least twice continuously differentiable function $r_{c,i}$ such that the evaluation costs are bounded as

$w_i(n_i) \le c_{c,i}\, r_{c,i}(n_i)\,, \quad n_i \in [1, \infty)\,.$
3.2 Context-aware multi-fidelity Monte Carlo sampling with one low-fidelity model
Let us first consider only one low-fidelity model, which means that $k = 1$. The following is novel compared to what was introduced in [32] because here we allow more general error and cost rates than [32]; cf. Section 2.2, where [32] is reviewed.
We obtain the following upper bound of the MSE of the CA-MFMC estimator (3.1):

$e(\hat E^{\mathrm{CA}}) = \frac{\sigma_0^2}{p - n_1} \left(\sqrt{1 - \rho_1^2(n_1)} + \sqrt{w_1(n_1)\, \rho_1^2(n_1)}\right)^2$

$\le \frac{2\sigma_0^2}{p - n_1} \left(1 - \rho_1^2(n_1) + w_1(n_1)\, \rho_1^2(n_1)\right)$ (3.2)

$\le \frac{2\sigma_0^2}{p - n_1} \left(1 - \rho_1^2(n_1) + w_1(n_1)\right)$ (3.3)

$\le \frac{2\sigma_0^2}{p - n_1} \left(c_{a,1}\, r_{a,1}(n_1) + c_{c,1}\, r_{c,1}(n_1)\right)\,,$ (3.4)

where inequality (3.2) is due to the means inequality $(a + b)^2 \le 2(a^2 + b^2)$, inequality (3.3) results from $\rho_1^2(n_1) \le 1$, and (3.4) is due to Assumptions 1 and 2.
The objective that we want to minimize with respect to $n_1 \in [1, p - 1]$ is

$J_1(n_1) = \frac{c_{a,1}\, r_{a,1}(n_1) + c_{c,1}\, r_{c,1}(n_1)}{p - n_1}\,.$ (3.5)
The following proposition clarifies when a unique global minimum exists.
Proposition 1.

Let Assumptions 1 and 2 hold with $k = 1$ and let the rate functions satisfy

$c_{a,1}\, r''_{a,1}(n_1) + c_{c,1}\, r''_{c,1}(n_1) > 0\,, \quad n_1 \in [1, p - 1]\,.$ (3.6)

Then, the objective $J_1$ defined in (3.5) has a unique global minimizer $n_1^* \in [1, p - 1]$.
Proof.
Define the function $g(n_1) = c_{a,1}\, r_{a,1}(n_1) + c_{c,1}\, r_{c,1}(n_1)$ and notice that

$g''(n_1) > 0\,, \quad n_1 \in [1, p - 1]\,,$ (3.7)

due to (3.6), which means that $g$ is strictly convex in $[1, p - 1]$. We now consider $J_1(n_1) = g(n_1)/(p - n_1)$ and its first derivative

$J_1'(n_1) = \frac{g'(n_1)\, (p - n_1) + g(n_1)}{(p - n_1)^2}\,.$ (3.8)

Case 1: If $J_1'(n_1) > 0$ for all $n_1 \in (1, p - 1)$, then $J_1$ is strictly increasing and the minimum of $J_1$ in $[1, p - 1]$ is taken at the left boundary $n_1^* = 1$, which is the global unique minimum.

Case 2: Analogously to Case 1, if $J_1'(n_1) < 0$ for all $n_1 \in (1, p - 1)$, then $J_1$ is strictly decreasing and the unique minimum is $n_1^* = p - 1$.

Case 3: There is a stationary point $\tilde n_1 \in (1, p - 1)$ such that $J_1'(\tilde n_1) = 0$. At a stationary point of $J_1$, the second derivative,

$J_1''(\tilde n_1) = \frac{g''(\tilde n_1)}{p - \tilde n_1}\,,$

is positive because of (3.7) and $\tilde n_1 < p$. Thus, together with the fact that the function $J_1$ is twice continuously differentiable and univariate, we have that $J_1''$ is positive in a neighborhood about all stationary points, because the inequality is strict. Thus, in the interior $(1, p - 1)$, there can only be minima, which also means that at the boundaries of the interval there can only be maxima. This means that there exists only one minimum, which shows uniqueness. ∎
Remark 1.
We will next show that the minimizer of (3.5) is also bounded from above, independently of the computational budget $p$, which implies that after a finite number of samples, all high-fidelity model samples should be used in the Monte Carlo estimator rather than being used to improve the low-fidelity model.
Proposition 2.

Let the assumptions of Proposition 1 hold with (3.6) satisfied for all $n_1 \in [1, \infty)$, and let there exist a stationary point $\bar n_1 \in [1, \infty)$ of $g(n_1) = c_{a,1}\, r_{a,1}(n_1) + c_{c,1}\, r_{c,1}(n_1)$, i.e.,

$g'(\bar n_1) = 0\,.$ (3.9)

Then, the global minimizer $n_1^*$ of the objective (3.5) is bounded as

$n_1^* \le \max\{1, \bar n_1\}\,,$ (3.10)

independently of the budget $p$.
Proof.
Consider $g$ introduced in the proof of Proposition 1. There is an $\bar n_1$ with $g'(\bar n_1) = 0$ as given by (3.9). The stationary point $\bar n_1$ is unique because $g'$ is strictly increasing, because we assumed that (3.6) holds in $[1, \infty)$. Furthermore, the stationary point $\bar n_1$ is independent of $p$ because $g$ does not depend on $p$.

Case 1: If $\bar n_1 \le 1$, then this implies that $g'(n_1) > 0$ ($g'$ is strictly increasing because of (3.6)) for all $n_1 > 1$ and thus, with (3.8) and $g > 0$, we have $J_1' > 0$ in $(1, p - 1)$. Thus, $n_1^* = 1$ and (3.10) is a bound independent of $p$.

Case 2: Let now $\bar n_1 > 1$. First, consider budgets $p$ so that $p - 1 > \bar n_1$. Because $g'(\bar n_1) = 0$ with $g'$ strictly increasing, we have that $g' > 0$ in $(\bar n_1, p - 1)$ and thus $J_1' > 0$ in $(\bar n_1, p - 1)$, which means that $J_1$ cannot be strictly decreasing on all of $[1, p - 1]$. Thus, the minimum of $J_1$ in $[1, p - 1]$ is either at a stationary point of $J_1$ or at 1. If it is at a stationary point $\tilde n_1$, then $J_1'(\tilde n_1) = 0$ has to hold, and from (3.8) we obtain $g'(\tilde n_1) = -g(\tilde n_1)/(p - \tilde n_1) < 0$, which implies $\tilde n_1 < \bar n_1$ because $g'$ is strictly increasing, and thus (3.10) is a $p$-independent bound of $n_1^*$. If it is at 1, then $n_1^* \le \bar n_1$ still holds because $\bar n_1 > 1$. Second, consider budgets $p$ so that $p - 1 \le \bar n_1$. Then, it holds that $n_1^* \le p - 1 \le \bar n_1$, which again leads to the bound (3.10).

This shows that, depending on the properties of $g$, the maximum of $\bar n_1$ and 1 is an upper bound of $n_1^*$. ∎
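The budget-independent cap of Proposition 2 can be computed directly as the root of $g'$. A minimal sketch with illustrative algebraic rates; all constants shown are assumptions:

```python
# Minimal sketch of the budget-independent cap of Proposition 2: the unique
# root of g'(n) = c_a * r_a'(n) + c_c * r_c'(n) bounds the quasi-optimal number
# of training samples n_1^* for every budget p. All constants are illustrative.
import numpy as np
from scipy.optimize import brentq

c_a, alpha = 2.0, 2.0     # accuracy rate r_a(n) = n**(-alpha)  (decreasing)
c_c, gamma = 1e-3, 1.0    # cost rate     r_c(n) = n**gamma     (increasing)

def g_prime(n):           # derivative of g(n) = c_a * n**(-alpha) + c_c * n**gamma
    return -c_a * alpha * n ** (-alpha - 1) + c_c * gamma * n ** (gamma - 1)

# g' is strictly increasing here (g is strictly convex), so the root is unique.
n_bar = brentq(g_prime, 1.0, 1e8)
print(f"n_1* <= max(1, {n_bar:.1f}) for every budget p")
```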
3.3 Context-aware multi-fidelity Monte Carlo sampling with two or more low-fidelity models
The second novelty of our proposed context-aware approach is that we can consider more than just one low-fidelity model. In the following, we introduce a sequential training approach to fit hierarchies of low-fidelity models for the CA-MFMC estimator, where in each step the optimal trade-off between training and sampling is achieved.
We first show an upper bound of the MSE (3.1) in terms of the accuracy and cost rates for the case with $k > 1$ low-fidelity models:
Lemma 1.

Let Assumptions 1 and 2 hold and let $k > 1$. Then, the MSE (3.1) of the CA-MFMC estimator is bounded as

$e(\hat E^{\mathrm{CA}}) \le \frac{2\sigma_0^2}{p - \sum_{i=1}^{k} n_i} \left[\left(\sum_{i=0}^{k-2} \sqrt{w_i(n_i) \left(\rho_i^2(n_i) - \rho_{i+1}^2(n_{i+1})\right)}\right)^2 + 2\, w_{k-1}(n_{k-1}) \left(1 - \rho_k^2(n_k)\right) + 2\, w_k(n_k)\right]\,.$ (3.11)
Proof.

Apply the means inequality $(a + b)^2 \le 2(a^2 + b^2)$ twice to (3.1) to split off the terms corresponding to $i = k - 1$ and $i = k$, and bound $\rho_{k-1}^2(n_{k-1}) - \rho_k^2(n_k) \le 1 - \rho_k^2(n_k)$ as well as $\rho_k^2(n_k) \le 1$, which leads to (3.11). ∎
Bound (3.11) in Lemma 1 decomposes into the component $\left(\sum_{i=0}^{k-2} \sqrt{w_i(n_i) (\rho_i^2(n_i) - \rho_{i+1}^2(n_{i+1}))}\right)^2$, which depends on $n_1, \dots, n_{k-1}$ only, and the components $1 - \rho_k^2(n_k)$ and $w_k(n_k)$ that can be bounded with Assumptions 1 and 2 so that they depend on $n_k$ for a fixed $n_1, \dots, n_{k-1}$. This decomposition motivates a sequential approach of adding low-fidelity models to the CA-MFMC estimator. At iteration $j = 1, \dots, k$, the models $f_{1,n_1^*}, \dots, f_{j-1,n_{j-1}^*}$ have already been trained, so that the model $f_{j,n_j}$ is the lowest-fidelity model in the current hierarchy and Lemma 1 applies with $k = j$. With the function

$g_j(n_j) = w_{j-1}(n_{j-1}^*)\, c_{a,j}\, r_{a,j}(n_j) + c_{c,j}\, r_{c,j}(n_j)\,,$ (3.12)

we consider the objective

$J_j(n_j) = \frac{\bar c_j + g_j(n_j)}{p - \sum_{i=1}^{j-1} n_i^* - n_j}\,,$ (3.13)

where $\bar c_j \ge 0$ denotes the component of bound (3.11) that depends only on the already fixed $n_1^*, \dots, n_{j-1}^*$ (constant factors that do not affect the minimizer are omitted). For $j = 1$, we obtain the objective defined in (3.5) with the convention that $\bar c_1 = 0$ and $w_0(n_0^*) = 1$. For $j > 1$, we obtain objectives in (3.13) that depend on $n_1^*, \dots, n_{j-1}^*$. For given $n_1^*, \dots, n_{j-1}^*$, there is a global, unique minimizer of $J_j$ as the following proposition shows.
Proposition 3.

Let Assumptions 1 and 2 hold, let $n_1^*, \dots, n_{j-1}^*$ be given, and set $p_j = p - \sum_{i=1}^{j-1} n_i^*$. If the function $g_j$ defined in (3.12) satisfies

$g_j''(n_j) > 0\,, \quad n_j \in [1, p_j - 1]\,,$ (3.14)

then the objective $J_j$ defined in (3.13) has a unique global minimizer $n_j^* \in [1, p_j - 1]$.
Proof.

The function $\bar c_j + g_j$ is positive and, because of (3.14), strictly convex in $[1, p_j - 1]$, so the arguments of the proof of Proposition 1 apply with $g$ replaced by $\bar c_j + g_j$ and $p$ replaced by $p_j$. ∎
The next proposition shows that the minimizers of (3.13) for $j = 1, \dots, k$ are bounded from above, independently of the computational budget $p$.
Proposition 4.

Let the assumptions of Proposition 3 hold with (3.14) satisfied for all $n_j \in [1, \infty)$, and let there exist a stationary point $\bar n_j \in [1, \infty)$ of $g_j$ with $g_j'(\bar n_j) = 0$. Then, the global minimizer $n_j^*$ of the objective (3.13) is bounded as $n_j^* \le \max\{1, \bar n_j\}$, independently of the budget $p$. The proof is analogous to the proof of Proposition 2.
Propositions 1–4 highlight that even though using more than $n_j^*$ high-fidelity model evaluations as training data to construct the $j$th low-fidelity model can lead to a more accurate low-fidelity model in terms of the correlation coefficient, it also increases its evaluation costs, which ultimately leads to an increase of the upper bound (3.11) of the MSE of the MFMC estimator for a fixed budget $p$ and thus to a poorer estimator. This shows that it is beneficial in multi-fidelity methods to trade off accuracy and evaluation costs of the low-fidelity models; the resulting sequential procedure is sketched below.
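The following minimal sketch summarizes the sequential training loop: at step $j$, the objective (3.13) is minimized over $n_j$ with the already chosen $n_1^*, \dots, n_{j-1}^*$ held fixed. The per-model rate data and the callback `fixed_component`, which supplies the $n_j$-independent component $\bar c_j$, are assumed inputs.

```python
# Minimal sketch of the sequential context-aware training of Section 3.3: at
# step j, the objective J_j in (3.13) is minimized over n_j with the previously
# chosen n_1^*, ..., n_{j-1}^* fixed.
import numpy as np
from scipy.optimize import minimize_scalar

def sequential_training(p, rates, fixed_component):
    """rates[j] = (c_a, r_a, c_c, r_c, w) for low-fidelity model j+1, where
    w(n) models its evaluation cost; fixed_component(j, n_star) returns the
    n_j-independent numerator component bar_c_j of (3.13)."""
    n_star, spent, w_prev = [], 0.0, lambda n: 1.0   # w_0 = 1 (high-fidelity)
    for j, (c_a, r_a, c_c, r_c, w) in enumerate(rates):
        bar_c = fixed_component(j, n_star)
        w_fix = w_prev(n_star[-1]) if n_star else 1.0
        def J(n):   # objective (3.13) for the current model
            return (bar_c + w_fix * c_a * r_a(n) + c_c * r_c(n)) / (p - spent - n)
        res = minimize_scalar(J, bounds=(1.0, p - spent - 1.0), method="bounded")
        n_star.append(res.x)
        spent += res.x          # training budget used so far
        w_prev = w
    return n_star               # budget p - sum(n_star) remains for MFMC sampling
```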
Remark 2.
Increasing the model hierarchy by adding the $j$th context-aware low-fidelity model must ensure that the MFMC ordering (2.1) is satisfied, i.e., the correlation coefficients and evaluation costs corresponding to $f_{j,n_j^*}$, evaluated at $n_j^*$, have to satisfy (2.1). If (2.1) is not satisfied, the order of the low-fidelity models in the multi-fidelity hierarchy must be changed accordingly; a sketch of this check follows.
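A minimal sketch of this ordering check; the correlation coefficients and costs, evaluated at the trained $n_i^*$, are assumed inputs:

```python
# Minimal sketch of the ordering check in Remark 2: verify that, with the
# correlation coefficients and costs evaluated at the trained n_i^*, the model
# hierarchy still satisfies the MFMC ordering (2.1).
import numpy as np

def satisfies_mfmc_ordering(rho, w):
    """rho = [1, rho_1, ..., rho_k] and w = [1, w_1, ..., w_k] at the n_i^*."""
    rho = np.append(np.abs(np.asarray(rho, float)), 0.0)  # rho_{k+1} = 0
    w = np.asarray(w, float)
    if np.any(np.diff(rho) >= 0.0):       # need 1 > |rho_1| > ... > |rho_k| > 0
        return False
    ratios = (rho[:-2] ** 2 - rho[1:-1] ** 2) / (rho[1:-1] ** 2 - rho[2:] ** 2)
    return bool(np.all(w[:-1] / w[1:] > ratios))  # cost condition in (2.1)

# If the check fails, reorder the low-fidelity models and test again.
print(satisfies_mfmc_ordering([1.0, 0.98, 0.9], [1.0, 1e-2, 1e-4]))
```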
4 Numerical experiments and discussion
We now present numerical results. In Section 4.1, we consider a heat conduction problem given by a parametric elliptic partial differential equation (PDE) with nine uncertain parameters, defined on a two-dimensional spatial domain. Our goal in this experiment is to draw initial insights about the proposed context-aware learning algorithm. To this end, we will employ two heterogeneous low-fidelity models: an accurate low-fidelity model that is computationally more expensive than a second, less accurate, low-fidelity model. Section 4.2 considers plasma micro-turbulence simulations that depend on uncertain inputs. In this example, we consider three low-fidelity models: a coarse-grid model and two non-intrusive data-driven low-fidelity models based on sparse grid approximations and deep neural network regression.
4.1 Heat conduction in a two-dimensional spatial domain
All numerical experiments in this section were performed using an eight-core Intel i7-10510U CPU and 16 GB of RAM.
4.1.1 Setup
The thermal block problem [38] is defined on a two-dimensional spatial domain $\Omega = (0, 1)^2$, divided into $d_1 \times d_2$ non-overlapping square blocks $\Omega_1, \dots, \Omega_d$, with $d = d_1 d_2$. The mathematical model is given by the parametric elliptic PDE

$-\nabla \cdot \left(\kappa(\boldsymbol x; \boldsymbol\theta)\, \nabla u(\boldsymbol x; \boldsymbol\theta)\right) = 0\,, \quad \boldsymbol x \in \Omega\,,$ (4.1)

with $u = 0$ on $\Gamma_D$, $\kappa\, \nabla u \cdot \boldsymbol n = 0$ on $\Gamma_{N,0}$, and $\kappa\, \nabla u \cdot \boldsymbol n = 1$ on $\Gamma_{N,1}$, where $\Gamma_D$ is a Dirichlet boundary, $\Gamma_{N,0}$ and $\Gamma_{N,1}$ are Neumann boundaries, and

$\kappa(\boldsymbol x; \boldsymbol\theta) = \sum_{i=1}^{d} \theta_i\, \chi_{\Omega_i}(\boldsymbol x)$

is the piece-wise constant heat conductivity field parametrized by $\boldsymbol\theta = [\theta_1, \dots, \theta_d]^\top$, where $\chi_{\Omega_i}$ is the indicator function of $\Omega_i$. We set $d_1 = d_2 = 3$ and thus $d = 9$. Here, $\kappa$ is parametrized by $d = 9$ uniformly distributed random parameters. The output of interest is the mean heat flow at the Neumann boundary $\Gamma_{N,1}$ given by

$f_0(\boldsymbol\theta) = \frac{1}{|\Gamma_{N,1}|} \int_{\Gamma_{N,1}} u(\boldsymbol x; \boldsymbol\theta)\, \mathrm{d}\boldsymbol x\,.$ (4.2)
This setup is a slight modification of the problem considered by Patera and Rozza in [38].
The high-fidelity model is a finite element discretization of (4.1) consisting of triangular elements with mesh width , provided by RBMatlab (https://www.morepas.org/software/rbmatlab/). The high-fidelity model evaluation cost is seconds. Moreover, its variance, estimated using MC samples, is .
4.1.2 A low-fidelity model based on the reduced basis method
The reduced-basis (RB) low-fidelity model is constructed using a greedy strategy, as described, for example, in [3]. We employ the implementation provided by RBMatlab. It has been shown that greedy RB low-fidelity models have exponential accuracy rates for problems similar to the thermal block [3], and we use an exponential rate when fitting the error decay. Once the reduced basis is found in the offline stage, evaluating the low-fidelity model online entails solving a dense linear system of size equal to the number of reduced basis vectors to find the reduced basis coefficients. Therefore, we model the evaluation cost rate as algebraic in the reduced-model dimension.
The constants in the rate functions are estimated via regression from pilot runs. To estimate the constants in the exponential accuracy rate, we use pilot high- and low-fidelity evaluations. For estimating the constants in the evaluation cost rate, we average the runtimes of the low-fidelity model constructed using increasing values of $n_1$, evaluated at MC samples. We perform these runtime measurements multiple times and average the results. The estimated rates are visualized in Figure 1 and the constants are shown in Table 1.
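A minimal sketch of such a rate fit from pilot data; the measurement arrays below are illustrative placeholders. The exponential accuracy rate is fit by semi-log least squares and the algebraic cost rate by log-log least squares.

```python
# Minimal sketch of the rate fitting used here: the constants of the exponential
# accuracy rate 1 - rho^2(n) <= c_a * exp(-alpha * n) and of the algebraic cost
# rate w(n) <= c_c * n**gamma are estimated by linear least squares on log scale.
import numpy as np

ns = np.array([2.0, 4.0, 6.0, 8.0, 10.0])               # pilot training sizes
one_minus_rho_sq = np.array([1e-2, 1e-4, 1e-6, 1e-8, 1e-10])  # pilot accuracies
runtimes = np.array([1e-4, 3e-4, 7e-4, 1.2e-3, 2e-3])   # pilot runtimes

# log(1 - rho^2) ~ log(c_a) - alpha * n   (semi-log fit, exponential rate)
A = np.vstack([np.ones_like(ns), -ns]).T
log_ca, alpha = np.linalg.lstsq(A, np.log(one_minus_rho_sq), rcond=None)[0]

# log(w) ~ log(c_c) + gamma * log(n)      (log-log fit, algebraic rate)
B = np.vstack([np.ones_like(ns), np.log(ns)]).T
log_cc, gamma = np.linalg.lstsq(B, np.log(runtimes), rcond=None)[0]
print(np.exp(log_ca), alpha, np.exp(log_cc), gamma)
```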
4.1.3 A data-fit low-fidelity model
The second low-fidelity model that we consider in the thermal block example is a support vector regression (SVR) model. Our numerical implementation is based on libsvm [5]. The training data consists of pairs of pseudo-random realizations of the nine-dimensional input and the corresponding high-fidelity model evaluations. We model the accuracy and evaluation cost rates as algebraic. To estimate the constants in the accuracy rate, we use pilot high- and low-fidelity evaluations. The constants in the evaluation cost rate are estimated by averaging the runtimes of the low-fidelity model evaluated at MC samples. We perform these runtime measurements multiple times and average the results. The estimated rates are visualized in Figure 2 and the constants are shown in Table 2. Note that the SVR low-fidelity model is less accurate but cheaper to evaluate than the RB low-fidelity model.
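A minimal sketch of constructing such a data-fit low-fidelity model; we use scikit-learn's libsvm-backed SVR as a stand-in for the implementation based on libsvm [5], and the kernel and hyperparameters shown are assumptions:

```python
# Minimal sketch of the data-fit low-fidelity model: support vector regression
# trained on n high-fidelity input/output pairs.
import numpy as np
from sklearn.svm import SVR

def fit_svr_low_fidelity(f0, sample_inputs, n):
    theta = sample_inputs(n)                 # n pseudo-random input realizations
    y = np.array([f0(t) for t in theta])     # n high-fidelity training outputs
    svr = SVR(kernel="rbf", C=10.0, epsilon=1e-3).fit(theta, y)
    return svr.predict                       # cheap-to-evaluate surrogate
```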
4.1.4 Context-aware multi-fidelity Monte Carlo with RB low-fidelity model
We first consider only the RB low-fidelity model and budgets seconds. We compare, in terms of MSE, our context-aware estimator with standard MC sampling and with MFMC in which the RB low-fidelity model is constructed using a fixed, a priori chosen number of basis vectors independent of the budget $p$. We compute the MSEs from $M$ replicates as

$\widehat{\mathrm{MSE}} = \frac{1}{M} \sum_{m=1}^{M} \left(\hat E^{(m)} - \hat E^{\mathrm{ref}}\right)^2\,,$ (4.3)

where $\hat E^{\mathrm{ref}}$ represents the reference mean estimator and $\hat E^{(m)}$ is the $m$th replicate of either the CA-MFMC, standard MFMC, or the MC estimator. The reference result was obtained using the context-aware multi-fidelity estimator in which the two low-fidelity models, RB and SVR, were sequentially added with a computational budget seconds (cf. Section 4.1.5). To distinguish between the employed estimators, we use “std. MC: $f_0$” to denote the standard MC estimator, “MFMC: $f_0, f_1$” to refer to the MFMC estimator in which the RB low-fidelity model $f_1$ is statically constructed independent of the budget $p$, and “CA-MFMC: $f_0, f_{1,n_1^*}$” to refer to the context-aware multi-fidelity estimator with the RB low-fidelity model.
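The replicate-based MSE computation (4.3) is straightforward to implement; a minimal sketch, assuming `estimator` draws fresh samples on each call and `E_ref` is the reference mean estimate:

```python
# Minimal sketch of the replicate-based MSE estimate (4.3).
import numpy as np

def replicated_mse(estimator, E_ref, num_replicates):
    replicates = np.array([estimator() for _ in range(num_replicates)])
    return float(np.mean((replicates - E_ref) ** 2))
```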
Based on the estimated rate parameters in Table 1, both the accuracy and cost rates of the RB low-fidelity model are strictly convex for all $n_1$ and any budget $p$, and hence Propositions 1 and 2 apply. Therefore, the optimal, context-aware number of high-fidelity model evaluations for constructing the RB low-fidelity model, $n_1^*$, exists, is unique, and is bounded with respect to the budget. The optimal number of training samples for increasing budgets is shown in Figure 3, on the left. Notice that $n_1^*$ is bounded and this upper bound is attained for budgets seconds, which is because the RB low-fidelity model is accurate already with few basis vectors. Figure 3, right, depicts the split of the total computational budget between constructing the RB low-fidelity model (offline phase, using $n_1^*$ high-fidelity model evaluations) and multi-fidelity sampling (online phase, using the remaining budget). Because $n_1^*$ is bounded, the percentage of the budget allocated to constructing the RB low-fidelity model decreases with $p$.
Figure 4 compares the MSEs of regular MC, static MFMC with RB low-fidelity models of fixed dimension , and CA-MFMC estimators. The left plot reports the analytical MSEs derived using equations (2.3), (2.5) and (3.1), which can be evaluated with the constants reported in Tables 1 and 3.
Computing the analytical MSEs requires no numerical simulations of the forward model as long as the constants in Tables 1 and 3 are available. The right plot shows the MSEs obtained numerically by taking replicates of the estimators. All multi-fidelity estimators give a lower MSE than standard MC sampling by at least half an order of magnitude, even when using only two dimensions in the static RB low-fidelity model. Overall, the proposed context-aware multi-fidelity estimator gives the lowest MSE of all employed estimators. It is about four orders of magnitude more accurate than standard MC sampling in terms of MSE. Our context-aware estimator is also more accurate than the MFMC estimators with static models, for which too few basis vectors yield a poorly correlated low-fidelity model and too many basis vectors make the low-fidelity model unnecessarily expensive to evaluate. In contrast, our context-aware estimator optimally trades off training and sampling and so achieves the lowest MSE of all estimators, which agrees with Propositions 1 and 2. Even though the static MFMC estimator with a well-chosen number of basis vectors is close to the CA-MFMC estimator, it is unclear how to choose the dimension in static MFMC a priori, whereas the proposed context-aware approach provides the optimal $n_1^*$ with respect to an upper bound of the MSE for a given budget $p$; cf. Section 3.2 and Figure 3, left.
4.1.5 Context-aware multi-fidelity Monte Carlo with RB and data-fit low-fidelity models
We now sequentially add, as detailed in Section 3.3, the second low-fidelity model that is based on SVR. We consider budgets of seconds. The abbreviation “CA-MFMC: $f_0, f_{1,n_1^*}, f_{2,n_2^*}$” refers to the CA-MFMC estimator in which the SVR low-fidelity model $f_{2,n_2^*}$ is sequentially added after the RB low-fidelity model $f_{1,n_1^*}$.
As can be seen from Table 2, the algebraic cost rate of the SVR low-fidelity model is concave since its exponent is smaller than one, which implies that we need to verify whether the assumptions in Proposition 3 hold. We therefore verify whether the function $g_2$ defined in (3.12) satisfies condition (3.14). Since $n_1^*$, and hence $w_1(n_1^*)$, is constant for the budgets considered here (cf. Figure 3, left), $g_2$ is independent of the budget and it is hence sufficient to verify (3.14) for the largest considered budget, seconds. The left plot in Figure 5 shows that $g_2''(n_2) > 0$ for all $n_2$, which implies that (3.14) is satisfied for all budgets $p$. Therefore, the number $n_2^*$ of high-fidelity model evaluations used to construct the SVR low-fidelity model exists and is unique with respect to the budget that is left after adding the context-aware RB low-fidelity model. The right plot in Figure 5 depicts the objective $J_2$ defined in (3.13).
Figure 6 compares the MSEs of the estimators. In the left plot, we show the analytical MSEs computed using equations (2.3), (2.5) and (3.1) and the constants reported in Tables 1, 2 and 3, without requiring any new numerical simulations. The MSEs reported in the right plot were computed using replicates. Sequentially adding the SVR low-fidelity model to the context-aware estimator further decreases the MSE. This shows that even though the slowly decreasing accuracy rate of the SVR low-fidelity model indicates that it would necessitate a large training set to be accurate in single-fidelity settings for predicting high-fidelity model outputs, training it with at most $n_2^*$ high-fidelity training samples in our context-aware learning approach is sufficient to achieve variance reduction and thus a more accurate multi-fidelity estimator. This indicates that even low-fidelity models with poor predictive capabilities can be useful for multi-fidelity methods as long as they capture the trend of the high-fidelity model.
4.2 Plasma micro-turbulence simulation in the ASDEX Upgrade Experiment
We now consider uncertainty quantification in simulations of plasma micro-turbulence in magnetic confinement devices. Of interest are small-scale fluctuations which cause high energy loss rates despite sophisticated plasma confinement via strong and shaped magnetic fields. The micro-turbulence is driven by the free energy provided by the steep plasma temperature and density gradients. However, the measurements of these gradients, as well as further decisive physics parameters affecting the underlying micro-instabilities, are subject to uncertainties, which need to be quantified.
4.2.1 Setup of the experiment
We focus on linear (in the phase-space variables) gyrokinetic simulations (the five-dimensional phase space is characterized by three position and two velocity coordinates), which are used to characterize the underlying micro-instabilities; we refer to [4] and the references therein for details. A parameter set from a discharge from the ASDEX Upgrade Experiment is considered, which is similar to the parameter set used in [15] for a validation study of gyrokinetic simulations. The experiment considered here is characterized by two particle species, (deuterium) ions and electrons. Moreover, the magnetic geometry is described by an analytical Miller equilibrium [28]. We consider both electromagnetic effects and collisions. Collisions are modeled by a linearized Landau-Boltzmann operator. The total number of uncertain inputs is 12, which are modeled as given in Table 4.
parameter name | symbol | left bound | right bound
---|---|---|---
plasma β | | |
collision frequency | | |
ion and electron density gradient | | |
ion temperature gradient | | |
temperature ratio | | |
electron temperature gradient | | |
effective ion charge | | |
safety factor | | |
magnetic shear | | |
elongation | | |
elongation gradient | | |
triangularity | | |
We perform experiments at one bi-normal wave-number. The output of interest is the growth rate (spectral abscissa) of the dominant micro-instability mode, which corresponds to the maximum real part over the spectrum. For this parameter set, the dominant micro-instability mode at the considered wave-number and at the nominal values of the input parameters is electron temperature gradient-driven, characterized by a negative frequency (imaginary part), while the first subdominant mode is ion temperature gradient-driven, characterized by a positive frequency. However, for certain values of the ion temperature and density gradients within the bounds shown in Table 4, the first subdominant ion mode can transition to become the dominant mode. This mode transition can be sharp and in turn lead to sharp transitions or even discontinuities in the growth rate. To avoid the potentially detrimental effects of this discontinuity, in our experiments, we determine the growth rate of the underlying electron-driven micro-instability mode. To this end, we compute the first two micro-instability eigenmodes and then select the growth rate that has a corresponding negative frequency.
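A minimal sketch of this mode-selection step; the eigenvalue convention, where each eigenvalue combines growth rate (real part) and frequency (imaginary part), and the values in the example call are assumptions:

```python
# Minimal sketch of the mode selection described above: given the first two
# computed micro-instability eigenvalues lambda = gamma + 1j * omega, return the
# growth rate gamma of the electron-driven mode, identified by a negative
# frequency omega.
def electron_mode_growth_rate(eigenvalues):
    electron_modes = [lam for lam in eigenvalues if lam.imag < 0.0]
    if not electron_modes:
        raise ValueError("no negative-frequency (electron-driven) mode found")
    return max(lam.real for lam in electron_modes)   # spectral abscissa

print(electron_mode_growth_rate([0.12 - 0.8j, 0.25 + 0.3j]))  # illustrative values
```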
4.2.2 High-fidelity model
The high-fidelity model is provided by the gyrokinetic simulation code GENE [24] in the flux-tube limit [8]. This assumes periodic boundary conditions in the radial and bi-normal directions. To discretize the five-dimensional gyrokinetic phase-space domain, we use Fourier modes in the radial direction and points along the field line. The bi-normal coordinate is decoupled in linear flux-tube simulations and hence only one such mode is used here. In velocity space, we employ equidistant and symmetric parallel velocity grid points and Gauss-Laguerre distributed magnetic moment points. This results in a total of degrees of freedom in the five-dimensional phase space.
The average high-fidelity model runtime is seconds on cores on one node of the Lonestar6 supercomputer at the Texas Advanced Computing Center; cf. Section 1. One such node comprises two AMD EPYC 7763 64-core processors, for a total of 128 cores per node, and GB of RAM. The high-fidelity variance, estimated using MC samples, is .
4.2.3 A coarse-grid low-fidelity model
We first consider a low-fidelity model $f_1$ obtained by coarsening the grid in direction (20 points), direction ( points) and direction ( Gauss-Laguerre points), for a total of degrees of freedom, which corresponds to fewer degrees of freedom than the high-fidelity model. We note that this still represents a useful resolution for this experiment, while further considerable coarsening of the underlying mesh could lead to significant loss of accuracy or potentially even unphysical solutions. The average evaluation cost of $f_1$ relative to the high-fidelity model is on cores on one node of Lonestar6. The variance and correlation coefficient, approximated using MC high- and low-fidelity model evaluations, are and , respectively.
Single-fidelity settings require fine discretizations for obtaining results with desired accuracy. Here, we want to show that when coarse-grid low-fidelity models are correlated with the high-fidelity model, their lower computational cost can be leveraged for speeding up multi-fidelity sampling methods. This is particularly relevant in plasma physics simulations in which usually only a limited number of high-fidelity simulations can be performed for training data-fit low-fidelity models. Also, we want to show that our algorithm has a wide scope in the sense that a wide variety of low-fidelity models can be used.
4.2.4 A sensitivity-driven dimension-adaptive sparse grid low-fidelity model
The low-fidelity model $f_2$ is based on sensitivity-driven dimension-adaptive sparse grid interpolation [11, 12], with code available at https://github.com/ionutfarcas/sensitivity-driven-sparse-grid-approx. Such sparse grid (SG) low-fidelity models have been successfully used for uncertainty quantification in plasma micro-instability simulations [11, 12, 14], including computationally expensive nonlinear simulations [13]. The adaptive procedure to construct the SG low-fidelity model is terminated when the underlying sparse grid reaches cardinality $n_2$, which corresponds to $n_2$ high-fidelity training samples, where $n_2$ is determined as discussed in Section 3. We note that even though the interpolation points used to construct the SG low-fidelity model have fine granularity (see [11, 12]), in cases where we cannot have exactly $n_2$ sparse-grid points, we take the closest feasible number of sparse grid points to $n_2$.
We use high- and low-fidelity model evaluations to assess the accuracy of the SG low-fidelity model; see also [26]. To estimate the evaluation cost rate, we average the runtimes of the low-fidelity model evaluated at MC samples. As can be seen in Figure 7 and Table 5, both accuracy and runtime can be modeled well with algebraic rates.
4.2.5 A deep learning low-fidelity model
Low-fidelity model $f_3$ is given by a fully connected, feed-forward three-layer deep neural network (DNN) with ReLU activation functions that is trained on high-fidelity data. The number of nodes of the hidden layers depends on the number of training samples $n_3$. Training is done over epochs using an Adam optimizer in which the loss function is the MSE. The accuracy rate is estimated from high- and low-fidelity model evaluations and the cost rate from low-fidelity model evaluations; see Figure 8 and Table 6. Notice that within the considered training size, the accuracy rate decreases rather slowly, at a low algebraic rate. This indicates that a large training size would be required for obtaining accurate approximations should the DNN low-fidelity model be used to replace the high-fidelity model. In addition, the evaluation cost rate increases very slowly, which is because the network has only a few layers and the employed libraries are highly optimized.
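A minimal sketch of such a DNN low-fidelity model in PyTorch; the coupling of the hidden-layer width to the number of training samples and all hyperparameters shown are assumptions, not the exact settings used here:

```python
# Minimal sketch of the DNN low-fidelity model: a fully connected feed-forward
# network with three hidden ReLU layers, trained with Adam on an MSE loss.
import torch

def fit_dnn_low_fidelity(theta_train, y_train, epochs=1000, lr=1e-3):
    n, d = theta_train.shape                 # n_3 training samples, d inputs
    width = max(8, n // 2)                   # hidden width tied to training size
    net = torch.nn.Sequential(
        torch.nn.Linear(d, width), torch.nn.ReLU(),
        torch.nn.Linear(width, width), torch.nn.ReLU(),
        torch.nn.Linear(width, width), torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(net(theta_train).squeeze(-1), y_train)
        loss.backward()
        optimizer.step()
    return net                               # cheap-to-evaluate surrogate f_3
```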
4.2.6 Context-aware multi-fidelity estimation
We consider the computational budget seconds, which corresponds to high-fidelity model evaluations and is the limit of the computational resources available to us for this project. To determine which combinations of models lead to CA-MFMC estimators with low MSEs for a budget $p$, we plot the analytic MSEs in Figure 9. The CA-MFMC estimators with the high-fidelity model $f_0$, the coarse-grid low-fidelity model $f_1$, and either the SG low-fidelity model $f_2$ or the DNN low-fidelity model $f_3$ lead to low MSEs. In contrast, the CA-MFMC estimator with the high-fidelity model $f_0$, SG model $f_2$, and DNN model $f_3$, but without the coarse-grid model $f_1$, leads to a higher MSE. We therefore consider the CA-MFMC estimators with $f_0$, $f_1$, and either $f_2$ or $f_3$ (cf. Section 3.3). This experiment shows that the analytic MSE can be used to select model combinations.
Based on the estimated constants in Table 5, the accuracy and cost rates of the SG low-fidelity model are both strictly convex for any number of training samples and any budget $p$, and therefore Propositions 3 and 4 apply. That is, the optimal, context-aware number of high-fidelity model evaluations for constructing the SG low-fidelity model, $n_2^*$, exists, is unique, and is bounded with respect to the budget. On the other hand, we see from Table 6 that the algebraic cost rate of the DNN low-fidelity model is concave since its exponent is smaller than one, which means that we need to verify whether the assumptions in Proposition 3 hold. To this end, we verify whether the function $g_3$ defined in (3.12) satisfies condition (3.14). We do so for the largest budget considered in Figure 9. The left plot in Figure 10 shows that $g_3''(n_3) > 0$ for all $n_3$, which implies that (3.14) is satisfied for all budgets, including the budget considered in our computations. Therefore, Proposition 3 applies and the optimal, context-aware number of high-fidelity model evaluations for constructing the DNN low-fidelity model, $n_3^*$, exists and is unique. The right plot in Figure 10 depicts the objective $J_3$ defined in (3.13), which has a unique global minimizer.
In the left plot of Figure 11, we show the number of training samples for the CA-MFMC estimators with models $f_0$, $f_1$, and $f_2$. For the SG model we took training samples close to the optimum because the optimal $n_2^*$ does not correspond to a valid sparse grid; cf. Section 4.2.4. In the right plot of Figure 11, the number of training samples for the CA-MFMC estimator with models $f_0$, $f_1$, and $f_3$ is shown. Training with only a small number of samples in case of our budget is sufficient to obtain a low-fidelity model that is useful for variance reduction in the CA-MFMC estimator together with the high-fidelity model, even though the low-fidelity model alone is insufficient to provide accurate predictions in single-fidelity settings.
Consider now the left plot of Figure 12, which compares the MSEs of the CA-MFMC estimators to static MFMC and standard MC. The reference solution was obtained via the MFMC estimator depending on the coarse-grid and DNN low-fidelity models for a budget of seconds using replicates; here, we needed high-fidelity evaluations to construct the DNN low-fidelity model. The average reference value of the expected growth rate of the dominant electron temperature gradient-driven micro-instability mode is . The context-aware estimators are up to two orders of magnitude more accurate than the standard MC estimator. The roughly two orders of magnitude speedup of the CA-MFMC estimators translates into a runtime reduction from 72 days to roughly four hours on cores on one node of the Lonestar6 supercomputer. We also observe that MFMC with the high-fidelity and coarse-grid models is about one order of magnitude more accurate in terms of the MSE than the standard MC estimator, showing that coarse-grid plasma physics models can be useful in multi-fidelity settings. The trends that we observe here numerically are in alignment with the analytic MSEs shown in Figure 9.
The right plot of Figure 12 shows how the samples are distributed among the models. All multi-fidelity estimators allocate more than of the total number of samples to the low-fidelity models, which explains the speedups. However, a small percentage of samples still corresponds to the high-fidelity model, which is desirable in practical applications to achieve unbiasedness of the estimators.
5 Conclusions
The proposed context-aware learning approach for variance reduction trades off training hierarchies of low-fidelity models with Monte Carlo estimation. By leveraging the trade-off between adding a high-fidelity model sample to the training set for constructing low-fidelity models versus using that sample in the Monte Carlo estimator, our analysis shows that the quasi-optimal number of training samples from the high-fidelity model is bounded independently of the computational budget. Thus, after a finite number of samples, all high-fidelity model samples should be used in the Monte Carlo estimator rather than being used to improve the low-fidelity model. This means that low-fidelity models can be over-trained for multi-fidelity uncertainty quantification. We have demonstrated this over-training numerically in a thermal block scenario, in which a reduced basis low-fidelity model constructed from more than the quasi-optimal number of high-fidelity training samples led to a multi-fidelity estimator with a poorer cost/error ratio than the proposed context-aware estimator, which determined the maximum quasi-optimal number of training samples. This has clear implications for training data-fit and machine-learning-based models too, which can require large data sets for achieving high prediction accuracy but for which few data points are sufficient when they are merely used for variance reduction together with the high-fidelity model in multi-fidelity uncertainty quantification, as we have shown in a numerical experiment with plasma micro-turbulence simulations.
Acknowledgements
I.-G.F. and F.J. were partially supported by the Exascale Computing Project (No. 17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. B.P. acknowledges support from the Air Force Office of Scientific Research (AFOSR) award FA9550-21-1-0222 (Dr. Fariba Fahroo). We thank Tobias Goerler for useful discussions and insights about the considered plasma micro-turbulence simulation scenario. We also gratefully acknowledge the compute and data resources provided by the Texas Advanced Computing Center at The University of Texas at Austin (https://www.tacc.utexas.edu/).
References
- [1] T. Alsup and B. Peherstorfer. Context-aware surrogate modeling for balancing approximation and sampling costs in multi-fidelity importance sampling and Bayesian inverse problems. arXiv, 2010.11708, 2020.
- [2] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review, 57(4):483–531, 2015.
- [3] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Convergence rates for greedy algorithms in reduced basis methods. SIAM Journal on Mathematical Analysis, 43(3):1457–1472, 2011.
- [4] A. Brizard and T. Hahm. Foundations of nonlinear gyrokinetic theory. Reviews of Modern Physics, 79, 2007.
- [5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
- [6] K. A. Cliffe, M. B. Giles, R. Scheichl, and A. L. Teckentrup. Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Computing and Visualization in Science, 14(1):3–15, 2011.
- [7] N. Collier, A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone. A continuation multilevel Monte Carlo algorithm. BIT Numerical Mathematics, 55(2):399–432, Jun 2015.
- [8] T. Dannert and F. Jenko. Gyrokinetic simulation of collisionless trapped-electron mode turbulence. Physics of Plasmas, 12(7):072309, 2005.
- [9] S. De, J. Britton, M. Reynolds, R. Skinner, K. Jansen, and A. Doostan. On transfer learning of neural networks using bi-fidelity data for uncertainty propagation. International Journal for Uncertainty Quantification, 10(6):543–573, 2020.
- [10] M. Eigel, C. Merdon, and J. Neumann. An adaptive multilevel Monte Carlo method with stochastic bounds for quantities of interest with uncertain data. SIAM/ASA Journal on Uncertainty Quantification, 4(1):1219–1245, 2016.
- [11] I.-G. Farcaş, T. Görler, H.-J. Bungartz, F. Jenko, and T. Neckel. Sensitivity-driven adaptive sparse stochastic approximations in plasma microinstability analysis. Journal of Computational Physics, 410:109394, 2020.
- [12] I.-G. Farcas. Context-aware model hierarchies for higher-dimensional uncertainty quantification. PhD thesis, Technical University of Munich, 2020.
- [13] I.-G. Farcas, G. Merlo, and F. Jenko. A general framework for quantifying uncertainty at scale and its application to fusion research. arXiv, 2202.03999, 2022.
- [14] I.-G. Farcaş, A. Di Siena, and F. Jenko. Turbulence suppression by energetic particles: a sensitivity-driven dimension-adaptive sparse grid framework for discharge optimization. Nuclear Fusion, 61(5):056004, April 2021.
- [15] S. J. Freethy, T. Görler, A. J. Creely, G. D. Conway, S. S. Denk, T. Happel, C. Koenen, P. Hennequin, and A. E. White. Validation of gyrokinetic simulations with measurements of electron temperature fluctuations and density-temperature phase angles on ASDEX upgrade. Physics of Plasmas, 25(5):055903, 2018.
- [16] M. B. Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607–617, 2008.
- [17] A. A. Gorodetsky, G. Geraci, M. S. Eldred, and J. D. Jakeman. A generalized approximate control variate framework for multifidelity uncertainty quantification. Journal of Computational Physics, 408:109257, 2020.
- [18] A. A. Gorodetsky, J. D. Jakeman, and G. Geraci. MFNets: data efficient all-at-once learning of multifidelity surrogates as directed networks of information sources. Computational Mechanics, 68(4):741–758, Oct 2021.
- [19] A. A. Gorodetsky, J. D. Jakeman, G. Geraci, and M. S. Eldred. MFNets: Multi-fidelity data-driven networks for bayesian learning and prediction. International Journal for Uncertainty Quantification, 10(6):595–622, 2020.
- [20] A. Gruber, M. Gunzburger, L. Ju, R. Lan, and Z. Wang. Multifidelity Monte Carlo estimation for efficient uncertainty quantification in climate-related modeling. EGUsphere, 2022:1–27, 2022.
- [21] A. Gruber, M. Gunzburger, L. Ju, and Z. Wang. A Multifidelity Monte Carlo Method for Realistic Computational Budgets. arXiv, 2206.07572, 2022.
- [22] A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone. Optimization of mesh hierarchies in multilevel Monte Carlo samplers. Stochastics and Partial Differential Equations Analysis and Computations, 4(1):76–112, Mar 2016.
- [23] J. Hampton, H. R. Fairbanks, A. Narayan, and A. Doostan. Practical error bounds for a non-intrusive bi-fidelity approach to parametric/stochastic model reduction. Journal of Computational Physics, 368:315–332, 2018.
- [24] F. Jenko, W. Dorland, M. Kotschenreuther, and B. N. Rogers. Electron temperature gradient driven turbulence. Physics of Plasmas, 7(5):1904–1910, 2000.
- [25] P. Khodabakhshi, K. E. Willcox, and M. Gunzburger. A multifidelity method for a nonlocal diffusion model. Applied Mathematics Letters, 121:107361, 2021.
- [26] J. Konrad, I.-G. Farcaş, B. Peherstorfer, A. Di Siena, F. Jenko, T. Neckel, and H.-J. Bungartz. Data-driven low-fidelity models for multi-fidelity Monte Carlo sampling in plasma micro-turbulence analysis. Journal of Computational Physics, 451:110898, 2022.
- [27] F. Law, A. Cerfon, and B. Peherstorfer. Accelerating the estimation of collisionless energetic particle confinement statistics in stellarators using multifidelity Monte Carlo. Nuclear Fusion, 62(7):076019, May 2022.
- [28] R. Miller, M. Chu, J. Greene, Y. Lin-Liu, and R. Waltz. Noncircular, finite aspect ratio, local equilibrium model. Physics of Plasmas, 5:979, 1998.
- [29] A. Narayan, C. Gittelson, and D. Xiu. A Stochastic Collocation Algorithm with Multifidelity Models. SIAM Journal on Scientific Computing, 36(2):A495–A521, 2014.
- [30] F. Newberry, J. Hampton, K. Jansen, and A. Doostan. Bi-fidelity reduced polynomial chaos expansion for uncertainty quantification. Computational Mechanics, 69(2):405–424, Feb 2022.
- [31] L. Ng and K. Willcox. Multifidelity approaches for optimization under uncertainty. International Journal for Numerical Methods in Engineering, 100(10):746–772, 2014.
- [32] B. Peherstorfer. Multifidelity Monte Carlo estimation with adaptive low-fidelity models. SIAM/ASA Journal on Uncertainty Quantification, 7(2):579–603, 2019.
- [33] B. Peherstorfer, M. Gunzburger, and K. Willcox. Convergence analysis of multifidelity Monte Carlo estimation. Numerische Mathematik, 139(3):683–707, 2018.
- [34] B. Peherstorfer, K. Willcox, and M. Gunzburger. Optimal model management for multifidelity Monte Carlo estimation. SIAM Journal on Scientific Computing, 38(5):A3163–A3194, 2016.
- [35] B. Peherstorfer, K. Willcox, and M. Gunzburger. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Review, 60(3):550–591, 2018.
- [36] E. Qian, B. Peherstorfer, D. O’Malley, V. Vesselinov, and K. Willcox. Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA Journal on Uncertainty Quantification, 6(2):683–706, 2018.
- [37] M. Razi, R. M. Kirby, and A. Narayan. Fast predictive multi-fidelity prediction with models of quantized fidelity levels. Journal of Computational Physics, 376:992–1008, 2019.
- [38] G. Rozza, D. Huynh, and A. Patera. Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Archives of Computational Methods in Engineering, 15(3):1–47, 2007.
- [39] D. Schaden and E. Ullmann. On multilevel best linear unbiased estimators. SIAM/ASA Journal on Uncertainty Quantification, 8(2):601–635, 2020.
- [40] D. Schaden and E. Ullmann. Asymptotic analysis of multilevel best linear unbiased estimators. SIAM/ASA Journal on Uncertainty Quantification, 9(3):953–978, 2021.
- [41] N. Shyamkumar, S. Gugercin, and B. Peherstorfer. Towards context-aware learning for control: Balancing stability and model-learning error. In 2022 American Control Conference (ACC), pages 4808–4813, 2022.
- [42] A. L. Teckentrup, R. Scheichl, M. B. Giles, and E. Ullmann. Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients. Numerische Mathematik, 125(3):569–600, 2013.
- [43] S. Werner and B. Peherstorfer. Context-aware controller inference for stabilizing dynamical systems from scarce data. arXiv, 2207.11049, 2022.
- [44] Y. Xu, V. Keshavarzzadeh, R. M. Kirby, and A. Narayan. A Bandit-Learning Approach to Multifidelity Approximation. SIAM Journal on Scientific Computing, 44(1):A150–A175, 2022.
- [45] H. Yang, Y. Fujii, K. W. Wang, and A. A. Gorodetsky. Control variate polynomial chaos: Optimal fusion of sampling and surrogates for multifidelity uncertainty quantification, 2022.