
An adaptive model hierarchy for data-augmented training of kernel models for reactive flow

B. Haasdonk (Institute of Applied Analysis and Numerical Simulation, Pfaffenwaldring 57, D-70569 Stuttgart, e-mail: [email protected]), M. Ohlberger, F. Schindler (Mathematics Münster, Westfälische Wilhelms-Universität Münster, Einsteinstr. 62, D-48149 Münster, e-mail: {mario.ohlberger,felix.schindler}@uni-muenster.de)

Funding: Funded by BMBF under contracts 05M20PMA and 05M20VSA, and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under contracts OH 98/11-1 and SCHI 1493/1-1, as well as under Germany's Excellence Strategy EXC 2044 390685587 (Mathematics Münster: Dynamics – Geometry – Structure) and EXC 2075 390740016 (Stuttgart Center for Simulation Science, SimTech).

1 Reference model

We are interested in constructing efficient and accurate models to approximate time-dependent quantities of interest (QoI) $f\in L^{2}\big(\mathcal{P};L^{2}([0,T])\big)$ in the context of reactive flow, with $T>0$ and where $\mathcal{P}\subset\mathbb{R}^{p}$ for $p\geq 1$ denotes the set of possible input parameters. As a class of QoI functions, we consider those obtained by applying linear functionals $s_{\mu}\in V'$ to solution trajectories $c_{\mu}\in L^{2}(0,T;V)$ of, e.g., parametric parabolic partial differential equations. Thus, $f(\mu;t):=s_{\mu}(c_{\mu}(t))$, where for each parameter $\mu\in\mathcal{P}$, the concentration $c_{\mu}$ with $\partial_{t}c_{\mu}\in L^{2}(0,T;V')$ and initial condition $c_{0}\in V$ is the unique weak solution of

$$\langle\partial_{t}c_{\mu},v\rangle+a_{\mu}(c_{\mu},v)=l_{\mu}(v)\quad\forall\,v\in V,\qquad c_{\mu}(0)=c_{0}.\tag{1}$$

Here, $V\subset H^{1}(\Omega)\subset L^{2}(\Omega)\subset V'$ denotes a Gelfand triple of Hilbert spaces associated with a spatial Lipschitz domain $\Omega$ and, for $\mu\in\mathcal{P}$, $l_{\mu}\in V'$ denotes a continuous linear functional and $a_{\mu}:V\times V\to\mathbb{R}$ a continuous coercive bilinear form.

As a basic model for reactive flow in catalytic filters, (1) could stem from a single-phase one-dimensional linear advection-diffusion-reaction problem with Damköhler and Péclet numbers as input (thus $p=2$), where $c$ models the dimensionless molar concentration of a species and the break-through curve $s$ measures the concentration at the outflow, as detailed in Gavrilenko et al. (2022).
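For orientation, one standard dimensionless form of such a problem (a sketch of this setting, not necessarily the exact setup of the cited work, which also fixes the boundary conditions) reads, on $\Omega=(0,1)$ with Péclet number $\mathrm{Pe}$ and Damköhler number $\mathrm{Da}$, i.e. $\mu=(\mathrm{Pe},\mathrm{Da})$:

$$\partial_{t}c+\partial_{x}c-\tfrac{1}{\mathrm{Pe}}\,\partial_{xx}c+\mathrm{Da}\,c=0\quad\text{in }(0,1)\times(0,T],\qquad f(\mu;t)=s(c(t)):=c(1,t).$$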

Since direct evaluations of $f$ are not available, we resort to a full order model (FOM) as reference model, yielding

$$f_{h}:\mathcal{P}\to\mathbb{R}^{N_{T}}\text{ for }N_{T}\geq 1,\qquad f_{h}(\mu;t):=s_{\mu}(c_{h,\mu}(t)),\tag{2}$$

which we assume to be a sufficiently accurate approximation of the QoI. For simplicity, we consider a $P^{1}$-conforming Finite Element space $V_{h}\subset V$ and obtain the FOM solution trajectory $c_{h,\mu}\in L^{2}(0,T;V_{h})$ by Galerkin projection of (1) onto $V_{h}$ and an implicit Euler approximation of the temporal derivative.
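To fix ideas, the following is a minimal sketch of such a FOM time loop (hypothetical helper names, not the implementation used here; the assembled sparse mass matrix `M`, system matrix `A_mu`, load vector `l_mu` and output functional `s_mu` are assumed to come from a $P^{1}$ finite element assembly, e.g., via dune-gdt):

```python
import numpy as np
import scipy.sparse as sps
import scipy.sparse.linalg as spla

def solve_fom(M, A_mu, l_mu, s_mu, c0, T, N_T):
    """Implicit Euler for M dc/dt + A c = l; returns the QoI trajectory f_h(mu).

    M, A_mu: sparse FEM mass/system matrices; l_mu: load vector;
    s_mu: Riesz vector of the output functional; c0: initial coefficients.
    """
    dt = T / N_T
    solve = spla.factorized(sps.csc_matrix(M + dt * A_mu))  # factorize once, reuse each step
    c, f_h = c0.copy(), np.empty(N_T)
    for n in range(N_T):
        c = solve(M @ c + dt * l_mu)  # (M + dt A) c^{n+1} = M c^n + dt l
        f_h[n] = s_mu @ c             # f_h(mu; t_{n+1}) = s_mu(c_{h,mu}(t_{n+1}))
    return f_h
```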

2 Surrogate models

The evaluation of (2) may be arbitrarily costly, in particular in multi- or large-scale scenarios where $\dim V_{h}\gg 1$, but also if $N_{T}\gg 1$ due to long-time integration or when a high resolution of $f_{h}$ is required. We thus seek to build a machine learning (ML) based surrogate model

$$f_{\text{ml}}:\mathcal{P}\to\mathbb{R}^{N_{T}},\qquad f_{\text{ml}}(\mu;t_{n})\approx f_{h}(\mu;t_{n})\quad\forall\,1\leq n\leq N_{T},\tag{3}$$

to predict all values $\{f_{\text{ml}}(\mu;t_{n})\}_{n=1}^{N_{T}}$ at once, without time-integration. Such models based on neural networks or kernels typically rely on a large amount of training data

$$\big\{\big(\mu,f_{h}(\mu)\big)\,\big|\,\mu\in\mathcal{P}_{\text{train}}\big\},\qquad\mathcal{P}_{\text{train}}\subset\mathcal{P},\quad|\mathcal{P}_{\text{train}}|\gg 1,\tag{4}$$

rendering their training prohibitively expensive in the aforementioned scenarios; we refer to Gavrilenko et al. (2022) and the references therein, in particular to Santin and Haasdonk (2021). In Gavrilenko et al. (2022), we thus employ an intermediate surrogate to generate sufficient training data.

2.1 Structure-preserving Reduced Basis models

The idea of projection-based model order reduction by Reduced Basis (RB) methods is to approximate the state $c_{h}$ in a low-dimensional subspace $V_{\text{rb}}\subset V_{h}$ and to obtain online-efficient approximations of $f_{h}$ by Galerkin projection of the FOM detailed in Section 1 onto $V_{\text{rb}}$, pre-computing all quantities involving $V_{h}$ in a possibly expensive offline computation; we refer to Milk et al. (2016) and the references therein. Using such structure-preserving reduced order models (ROMs), we obtain RB trajectories $c_{\text{rb},\mu}\in L^{2}(0,T;V_{\text{rb}})$ and a RB model

$$f_{\text{rb}}:\mathcal{P}\to\mathbb{R}^{N_{T}},\qquad f_{\text{rb}}(\mu;t):=s_{\mu}(c_{\text{rb},\mu}(t)),\tag{5}$$

with a computational complexity independent of $\dim V_{h}$, the solution of which, however, still requires time-integration.

The quality and efficiency of RB models hinges on the problem-adapted RB space $V_{\text{rb}}$, which could be constructed iteratively, steered by a posteriori error estimates, using the POD-greedy algorithm from Haasdonk (2013). Instead, we obtain it by the method of snapshots,

$$V_{\text{rb}}:=\big\langle\texttt{POD}\big(\{c_{h,\mu}\,|\,\mu\in\mathcal{P}_{\text{rb}}\}\big)\big\rangle,\qquad\text{with }\mathcal{P}_{\text{rb}}\subset\mathcal{P}\tag{6}$$

consisting of only a few a priori selected parameters (e.g., the outermost four points in $\mathcal{P}$), where we use the hierarchical approximate POD from Himpe et al. (2018) for $N_{T}\gg 1$ to avoid computing the SVD of a dense snapshot Gramian of size $N_{T}^{2}$.
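A minimal sketch of the plain method of snapshots via a truncated SVD (hypothetical helper names; for simplicity with respect to the Euclidean inner product, whereas in practice pyMOR's POD routines or the hierarchical approximate POD from Himpe et al. (2018) with the correct $V$-inner product would be used):

```python
import numpy as np

def pod(snapshots, rtol=1e-4):
    """Method of snapshots: POD modes of a (dofs x n_snapshots) matrix.

    Returns the left singular vectors whose relative singular values
    exceed rtol; their span serves as V_rb.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    rank = int(np.sum(s / s[0] > rtol))
    return U[:, :rank]

# Snapshots stacked from FOM trajectories for the a priori chosen P_rb,
# e.g. the four outermost points of a box-shaped parameter domain:
# basis = pod(np.hstack([fom_trajectory(mu) for mu in P_rb]))
```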

2.2 Kernel models

While still requiring time-integration, we can afford to use RB ROMs to generate a sufficient amount of training data

$$X_{\text{train}}=\big\{\big(\mu,f_{\text{rb}}(\mu)\big)\,\big|\,\mu\in\mathcal{P}_{\text{ml}}\big\}\cup\big\{\big(\mu,f_{h}(\mu)\big)\,\big|\,\mu\in\mathcal{P}_{\text{rb}}\big\},$$

augmented by the FOM data available as a side-effect of generating $V_{\text{rb}}$. Using this data, we obtain the ML model $f_{\text{ml}}$ from (3) using the vectorial greedy orthogonal kernel algorithm from Santin and Haasdonk (2021).
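For illustration, a plain vectorial kernel model could be fitted as follows (a hedged sketch with hypothetical names, using a Gaussian kernel and simple regularized interpolation; this is not the greedy orthogonal kernel algorithm of Santin and Haasdonk (2021), which additionally selects centers greedily):

```python
import numpy as np

def fit_kernel_model(mus, F, gamma=1.0, reg=1e-10):
    """Vectorial Gaussian-kernel regression: one coefficient set for all N_T outputs.

    mus: (m, p) training parameters; F: (m, N_T) training QoI trajectories.
    Returns a predictor mu -> R^{N_T}.
    """
    mus, F = np.asarray(mus, float), np.asarray(F, float)

    def gram(X, Y):  # Gaussian kernel matrix k(x, y) = exp(-gamma |x - y|^2)
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * d2)

    K = gram(mus, mus) + reg * np.eye(len(mus))  # regularize for numerical stability
    coeffs = np.linalg.solve(K, F)               # (m, N_T) coefficient matrix
    return lambda mu: (gram(np.atleast_2d(mu), mus) @ coeffs)[0]
```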

While resulting in substantial computational gains, the approach presented in Gavrilenko et al. (2022) still relies on the traditional offline/online splitting of the computational process: the RB ROM as well as the ML model are trained to be valid on all of $\mathcal{P}$, requiring a priori choices of $\mathcal{P}_{\text{rb}}$ and $\mathcal{P}_{\text{ml}}$ that significantly impact the overall performance and applicability of these models.

3 An adaptive model hierarchy

Keil et al. (2021) introduced an approach beyond the classical offline/online splitting, where a RB ROM is adaptively enriched based on rigorous a posteriori error estimates, following the path of an optimization procedure through the parameter space. Similarly, we propose an adaptive enrichment yielding a hierarchy of FOM, RB ROM and ML models, based on the standard residual-based a posteriori estimate on the RB output error, $\|f_{h}(\mu)-f_{\text{rb}}(\mu)\|_{L^{2}([0,T])}\leq\Delta_{\text{rb}}(\mu)$, for which we refer to the references in Milk et al. (2016).

Algorithm 1 Adaptive QoI model generation
1: Input: ROM tolerance $\varepsilon>0$, ML trust/train criteria
2: Initialize: $X_{\text{train}}=\emptyset$, $\Phi_{\text{rb}}=\{\}$, $V_{\text{rb}}:=\langle\Phi_{\text{rb}}\rangle$, $f_{\text{ml}}:=0$
3: for all $\mu\in\mathcal{P}$ selected by outer loop do
4:     if ML model is trustworthy then return $f_{\text{ml}}(\mu)$
5:     else
6:         compute $f_{\text{rb}}(\mu)$, $\Delta_{\text{rb}}(\mu)$
7:         if $\Delta_{\text{rb}}(\mu)\leq\varepsilon$ then
8:             collect $X_{\text{train}}=X_{\text{train}}\cup\{(\mu,f_{\text{rb}}(\mu))\}$
9:             (optionally) fit ML model, return $f_{\text{rb}}(\mu)$
10:        else
11:            compute $f_{h}(\mu)$
12:            enrich $\Phi_{\text{rb}}=\Phi_{\text{rb}}\cup\texttt{POD}\big(c_{h}(\mu)-\Pi_{\Phi_{\text{rb}}}[c_{h}(\mu)]\big)$, with $\Pi_{\Phi_{\text{rb}}}$ the orthogonal projection onto $\langle\Phi_{\text{rb}}\rangle$
13:            update RB ROM
14:            collect $X_{\text{train}}=X_{\text{train}}\cup\{(\mu,f_{h}(\mu))\}$
15:            (optionally) fit ML model, return $f_{h}(\mu)$
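A schematic rendering of one query of Algorithm 1 (all names hypothetical: `state` bundles the training data, the RB ROM with its error estimator, the FOM and the current ML model; the trust and retrain criteria are those used in the experiment below):

```python
import numpy as np

def adaptive_query(mu, state, eps=1e-2, retrain_every=10, trust_after=50):
    """One query of the adaptive FOM/RB/ML model hierarchy (schematic)."""
    # Trust criterion as in the experiment of Section 3: unconditionally
    # trust f_ml once enough training samples have been collected.
    if state.f_ml is not None and len(state.X_train) >= trust_after:
        return state.f_ml(mu)                       # cheapest model first
    f_rb, delta_rb = state.rom.solve_with_estimate(mu)
    if delta_rb <= eps:                             # RB ROM certified accurate
        f_out = f_rb
    else:                                           # fall back to the FOM,
        f_out, c_h = state.fom.solve(mu)            # enrich the RB space and
        state.enrich_basis(c_h)                     # rebuild the RB ROM
    state.X_train.append((mu, f_out))
    if len(state.X_train) % retrain_every == 0:     # periodic ML (re)training
        mus, Fs = map(np.array, zip(*state.X_train))
        state.f_ml = fit_kernel_model(mus, Fs)
    return f_out
```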

As a means to judge whether a ML model is trustworthy, we propose a manual validation using the following a posteriori error estimate on the ML QoI error. While not as cheaply computable as $f_{\text{ml}}$ itself, it still allows validating the ML model without computing $f_{h}$.

Proposition 1 (ML model a posteriori error estimate)


Let $f_{\text{rb}}(\mu),f_{\text{ml}}(\mu)\in\mathbb{R}^{N_{T}}$ denote the RB ROM and ML model approximations of $f_{h}(\mu)$, respectively, and let $\Delta_{\text{rb}}(\mu)$ denote an upper bound on the RB output error. By the triangle inequality, we then have for all $\mu\in\mathcal{P}$

$$\|f_{h}(\mu)-f_{\text{ml}}(\mu)\|_{L^{2}([0,T])}\leq\|f_{h}(\mu)-f_{\text{rb}}(\mu)\|_{L^{2}([0,T])}+\|f_{\text{rb}}(\mu)-f_{\text{ml}}(\mu)\|_{L^{2}([0,T])}\leq\Delta_{\text{rb}}(\mu)+\|f_{\text{rb}}(\mu)-f_{\text{ml}}(\mu)\|_{L^{2}([0,T])},\tag{7}$$

where the right-hand side is computable with a computational complexity independent of $\dim V_{h}$.
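In code, evaluating the bound (7) amounts to one RB solve with error estimate plus a discrete norm of the trajectory difference (a sketch with hypothetical names; a dt-weighted Euclidean norm serves as discrete $L^{2}([0,T])$ norm):

```python
import numpy as np

def ml_error_bound(mu, rom, f_ml, dt):
    """Computable upper bound (7) on ||f_h(mu) - f_ml(mu)||, no FOM solve needed."""
    f_rb, delta_rb = rom.solve_with_estimate(mu)  # Delta_rb(mu) bounds ||f_h - f_rb||
    return delta_rb + np.sqrt(dt) * np.linalg.norm(f_rb - f_ml(mu))
```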

Applying Algorithm 1 to the example of one-dimensional single-phase reactive flow from the last row of Table 1 in Gavrilenko et al. (2022), with $\dim\mathcal{P}=2$, $N_{T}=24576$ time steps and $\dim V_{h}=65537$, gives the behaviour shown in Figure 1, where we set $\varepsilon=10^{-2}$, retrain the ML model every 10 collected samples and unconditionally trust the ML model as soon as $|X_{\text{train}}|\geq 50$.¹ For the considered diffusion-dominated regime, we only require a single evaluation of $f_{h}$ (yielding a $\dim V_{\text{rb}}=15$-dimensional RB ROM), which results in even further computational savings compared to the results obtained in Gavrilenko et al. (2022).

¹ The experiments were performed using pyMOR from Milk et al. (2016) and dune-gdt from https://docs.dune-gdt.org/.

Figure 1: Each dot corresponds to the input-to-output query time of the adaptive model from Algorithm 1 applied to the example from Gavrilenko et al. (2022).

References

  • Gavrilenko, P., Haasdonk, B., Iliev, O., Ohlberger, M., Schindler, F., Toktaliev, P., Wenzel, T., and Youssef, M. (2022). A full order, reduced order and machine learning model pipeline for efficient prediction of reactive flows. In Large-Scale Scientific Computing, 378–386. Springer International Publishing.
  • Haasdonk, B. (2013). Convergence rates of the POD-greedy method. ESAIM Math. Model. Numer. Anal., 47(3), 859–873.
  • Himpe, C., Leibner, T., and Rave, S. (2018). Hierarchical approximate proper orthogonal decomposition. SIAM J. Sci. Comput., 40(5), A3267–A3292.
  • Keil, T., Mechelli, L., Ohlberger, M., Schindler, F., and Volkwein, S. (2021). A non-conforming dual approach for adaptive trust-region reduced basis approximation of PDE-constrained parameter optimization. ESAIM Math. Model. Numer. Anal., 55(3), 1239–1269.
  • Milk, R., Rave, S., and Schindler, F. (2016). pyMOR – generic algorithms and interfaces for model order reduction. SIAM J. Sci. Comput., 38(5), S194–S216.
  • Santin, G. and Haasdonk, B. (2021). Kernel methods for surrogate modeling. In P. Benner, S. Grivet-Talocia, A. Quarteroni, G. Rozza, W. Schilders, and L.M. Silveira (eds.), Model Order Reduction, volume 2, 311–353. De Gruyter.