
1 Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, (5000), Córdoba, Argentina
2 LIGO, California Institute of Technology, Pasadena, CA 91125, USA
3 Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas (CIFASIS, CONICET–UNR)
4 Instituto de Astronomía Teórica y Experimental – Observatorio Astronómico Córdoba (IATE–OAC–UNC–CONICET)

Arby - Fast data–driven surrogates

Aarón Villanueva 1, Martin Beroiz 2, Juan Cabral 3,4, Martín Chalela 4, Mariano Dominguez 4
Abstract

Context. The availability of fast-to-evaluate and reliable predictive models is highly relevant in multi-query scenarios, where evaluating some quantities in real or near-real time becomes crucial. As a result, reduced-order modeling techniques have gained traction in many areas in recent years.

Aims. We introduce Arby, an entirely data-driven Python package for building reduced-order or surrogate models. In contrast to standard approaches, which involve solving partial differential equations, Arby relies only on training data. The package encompasses several tools for building and interacting with surrogate models in a user-friendly manner. Furthermore, the surrogate model allows for fast evaluations at a minimum computational cost.

Methods. The package implements the Reduced Basis approach and the Empirical Interpolation Method, along with a classic regression stage, for surrogate modeling.

Results. We illustrate the simplicity of using Arby to build surrogates through a simple toy model: a damped pendulum. Then, for a real case scenario, we use Arby to describe CMB temperature anisotropies power spectra. In this multidimensional setting, we find that an initial set of 80,000 power spectrum solutions, with 3,000 multipole indices each, can be well described at a given tolerance error using just a subset of 84 solutions.

Key Words.:
Reduced Order Modeling – Surrogate Models – Reduced Basis – Empirical Interpolation – Python Package.

1 Introduction

Several problems arise in observational and theoretical contexts that demand the resolution of computationally intensive differential equations. From structural analysis in engineering to spacetime simulations in astrophysics, rapid and reliable evaluations of solutions to these equations are crucial due to the need for computing, in real or near-real time, quantities that depend on those solutions. Due to the pervasive computational costs driven by the inherent complexity of many problems, achieving fast responses becomes a ubiquitous bottleneck.

As a case study, let us take the example of an ongoing problem in the field of gravitational wave (GW) research, namely the template bank problem (Field et al. (2012)). To be able to detect the very faint gravitational wave signals among the noisy background of ground-based interferometers, the LIGO-Virgo collaboration uses matched filtering against a bank of GW signal templates to maximize the signal-to-noise ratio (SNR) (Cutler & Flanagan (1994); Abbott et al. (2016, 2020)).

The observed time series is filtered to decide whether a binary coalescence took place, and parameter estimation allows us to infer the properties of the progenitors of the remnant, such as masses and spins for binary black hole mergers.

It is convenient to have template banks of theoretical waveforms large enough to fill the parameter space of GWs. However, this poses an exceedingly challenging task due to the complexity of solving the Einstein equations of General Relativity. A single numerically generated GW waveform can take from days to weeks to become available for production (Lehner & Pretorius (2014)).

A palliative solution is the construction of approximate waveform models, which are more direct to build and deploy than numerical relativity ones. There are several methods to build these approximations. Analytical examples are the Post-Newtonian (Blanchet (2006)), Effective One Body (Damour & Nagar (2011)) and Phenomenological (Sturani et al. (2010); Hannam et al. (2014); Khan et al. (2019)) approximations. However, we focus here on a particular set of methods that have proven to be very fertile in GW research and produced some of the milestones in waveform modeling in recent years: Reduced Order methods.

Reduced Order Modeling (ROM) is an umbrella term that encompasses a variety of techniques for dimensional reduction, developed to address the problem of complexity in numerical simulations. In particular, Surrogate Models obtained through the application of ROM to ground-truth solutions are low-resolution representations intended to be fast to build and evaluate without compromising accuracy. We take a data-driven approach, i.e. driven only by data, as opposed to more standard and intrusive ones in which reduction methods are coupled to differential solvers to build solutions (Quarteroni et al. (2015); Hesthaven et al. (2015)).

In waveform modeling, the combination of two ROM methods originally posed for intrusive problems and recreated later for data-driven ones led to significant success in constructing surrogate models for GWs. Those methods are the Reduced Basis (RB) (Boyaval et al. (2009); Field et al. (2011, 2012)) and the Empirical Interpolation (EI) (Barrault et al. (2004); Chaturantabut & Sorensen (2010)) methods, which we describe in the next section.

The primary purpose of this work is to disseminate these tools in the astronomy/astrophysics community by introducing a single, user-friendly Python package for data-driven dimensional reduction: Arby.

Arby arises as a response to the lack of well-documented, tested, and actively maintained code for reduced-basis and surrogate modeling in the scientific community, while adhering to the data-driven and user-friendliness premise.

2 Theory Overview

This section briefly describes the basics of a reduced-order pipeline for building surrogate models. As we stated above, it merges two main ingredients, the Reduced Basis and the Empirical Interpolation methods, for dimensional reduction of raw data. This pipeline was first introduced in (Field et al. (2014)) followed by the construction of several surrogate waveform models based on this method (Blackman et al. (2015, 2017b, 2017a); Varma et al. (2019a, b); Rifat et al. (2020)). See also (Tiglio & Villanueva (2021)) for a review.

Representation.– We are interested in parametrized scalar (real or complex) functions, solutions or models of the form

h_{\lambda}(x) := h(\lambda; x)\,, \qquad (1)

where \lambda represents the parameter(s) of the model and x is the independent variable. Both are real and possibly multidimensional. In physical models, h_{\lambda} can represent a parametrized time series, with x being the time variable. For convenience, we denote the spaces for \lambda and x as the parameter and physical domains, respectively.

In the first stage, we look for a low-resolution representation of the model h_{\lambda}. The RB method consists of representing a whole set of solutions, usually called the training set,

{\cal K} := \{h_{\lambda_i}\}_{i=1}^{N}\,,

by linear combinations of basis elements of the form

h_{\lambda} \approx \sum_{i=1}^{n} \langle e_i, h_{\lambda}\rangle\, e_i\,, \qquad (2)

where

\langle e_i, h_{\lambda}\rangle := \int_{\Omega} \bar{e}_i(x)\, h_{\lambda}(x)\, dx \qquad (3)

defines the inner product between training functions, where \bar{e}_i is the complex conjugate of e_i (if we deal with complex functions) and \Omega is the physical domain. The set \{e_i\}_{i=1}^{n} is called the reduced basis and is composed of a subset of optimally chosen solutions from the training set. The construction of the reduced basis is iterative: at each step of the algorithm, the most dissimilar (orthogonal) element of the training set {\cal K} joins the current basis, and the process stops when a user-specified tolerance is met. This tolerance is related to the maximum error of the difference between solutions and their approximations. In consequence, the spaces {\cal X}_i (i = 1, 2, \ldots) spanned by the reduced bases built at each step are nested, i.e. {\cal X}_1 \subset {\cal X}_2 \subset \ldots The addition of a new element to the basis implies a fine-tuning of the previous approximation space.
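The projection in Eqs. (2)–(3) can be written out in a few lines. The following is a minimal NumPy sketch on synthetic data, assuming an equispaced grid with simple Riemann weights (this is an illustration, not Arby code):

import numpy as np

x = np.linspace(0, 2 * np.pi, 1001)              # discretized physical domain Omega
w = np.full_like(x, x[1] - x[0])                 # simple Riemann quadrature weights

def dot(a, b):
    return np.sum(w * np.conj(a) * b)            # discrete version of Eq. (3)

# a toy orthonormal "reduced basis" with two elements
e1 = np.sin(x) / np.sqrt(dot(np.sin(x), np.sin(x)).real)
e2 = np.sin(2 * x) / np.sqrt(dot(np.sin(2 * x), np.sin(2 * x)).real)

h = 0.3 * np.sin(x) + 0.7 * np.sin(2 * x)        # a "training" function
h_approx = dot(e1, h) * e1 + dot(e2, h) * e2     # Eq. (2) with n = 2
print(np.max(np.abs(h - h_approx)))              # tiny: h lies in the span of {e1, e2}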

The RB method amounts to a compression in the parameter space of solutions. The next step is to achieve compression in the physical domain (e.g. time). To this end, we turn to interpolation, replacing the projection-based approach described so far by an interpolation scheme. We pose the problem by looking for an efficient linear interpolation operator {\cal I}_n such that

h_{\lambda}(x) \approx {\cal I}_n[h_{\lambda}](x) = \sum_{i=1}^{n} C_i(\lambda)\, e_i(x)\,, \qquad (4)

subject to

{\cal I}_n[h_{\lambda}](X_i) = h_{\lambda}(X_i)\,, \quad i = 1, \ldots, n \qquad (5)

for strategically selected nodes X_i (i = 1, \ldots, n) taken from the physical domain. The EI method (Maday et al. (2009); Barrault et al. (2004); Chaturantabut & Sorensen (2010)) gives us an algorithm for building such an interpolant. The Empirical Interpolation (EI) algorithm, as described in Field et al. (2014), iteratively selects the nodes \{X_i\} from the physical domain following a local optimization criterion (Tiglio & Villanueva (2020)).

The EI algorithm receives the reduced basis as its only input and selects the interpolation nodes for building the interpolant. Note that there is no need for the whole training set {\cal K}, since the RB algorithm has already introspected it, and we assume that the relevant information about {\cal K} is encoded in the reduced basis.

It is possible to show that, under some conditions, the interpolation error is similar to the projection one, which in most applications has exponential decay in the number of basis elements (see (Tiglio & Villanueva (2021)) and citations therein). This leads to an efficient and (in most cases of interest) accurate representation of the training set by means of an empirical interpolation.

To summarize: first, a reduced basis is built from a training set using the RB method, which leads to the linear representation in Eq. (2). This step completes a compression in the parameter domain. Next, an empirical interpolant is built solely from the reduced basis. This step completes a compression in the physical domain. Finally, we end up with an empirical interpolation (Eq. (4)) which provides efficient and high-accuracy representations of all functions in the training set.

Predictive models.– We want our model to represent solutions that are not present in the training set. That is, we look for predictability. To this end, we perform parametric fits at each empirical node along the training values. Let us break this down.

Let us rewrite the interpolation in (4) as

{\cal I}_n[h_{\lambda}](x) = \sum_{i=1}^{n} B_i(x)\, h_{\lambda}(X_i)\,, \qquad (6)

where

B_i(x) := \sum_{j=1}^{n} ({\bf V}_n^{-1})_{ji}\, e_j(x) \qquad (7)

and

({\bf V}_n)_{ij} := e_j(X_i)\,.
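As an illustration of Eqs. (6)–(7), the following minimal NumPy sketch assembles {\bf V}_n and the functions B_i(x) for a toy basis and an assumed set of nodes, and checks the interpolation condition (5):

import numpy as np

x = np.linspace(-1, 1, 401)
E = np.array([np.ones_like(x), x, x ** 2])       # rows are a toy 3-element basis e_j(x)
nodes = [0, 200, 400]                            # assumed node indices X_1, X_2, X_3

V = E[:, nodes].T                                # (V_n)_{ij} = e_j(X_i)
B = np.linalg.inv(V).T @ E                       # rows are B_i(x), Eq. (7)

h = 2.0 - x + 0.5 * x ** 2                       # a function lying in the span of the basis
h_interp = h[nodes] @ B                          # Eq. (6)
print(np.allclose(h_interp[nodes], h[nodes]))    # interpolation condition (5): True
print(np.allclose(h_interp, h))                  # exact here, since h is in the span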

Recall that the approach is data-driven, so we do not fill the training set with more solutions to approximate new ones. Instead, we predict them by performing fits along data that we already know, that is, along h_{\lambda}(X_i) (i = 1, \ldots, n). For 1-D problems (\lambda \in \mathbb{R}), choosing problem-agnostic fitting methods that are well suited for most cases can be a challenging task, not to mention the high-dimensional case, which remains an open question (Tiglio & Villanueva (2021)). For a rough classification, we distinguish between regression and interpolation methods. The former deals with calibrating the free parameters of some model by optimizing an error function; the latter, with solving an interpolation problem, which essentially consists in solving algebraic systems possibly subject to constraints (e.g. Eqs. (4, 5)).

Here we address the second approach for the parametric fits. The procedure consists of interpolating the values

\{h(\lambda_j; X_i)\}_{j=1}^{N}

along the parameter samples for each empirical node X_i, i = 1, \ldots, n. As we describe in Section 4, as of now Arby uses splines for this step, though we expect to generalize this in the future. Once the fits are done, we end up with a surrogate model h_{\lambda}^{surr}(x) which can represent and predict h_{\lambda} at any \lambda in the parameter domain with high accuracy and low computational cost.
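For instance, the fit at a single empirical node can be sketched with SciPy splines as follows (synthetic data; the node value and the model are assumptions made only for illustration):

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

params = np.linspace(1.0, 5.0, 101)              # training parameters lambda_j
X_1 = 0.7                                        # an assumed empirical node
h_at_node = np.sin(params * X_1)                 # training values {h(lambda_j; X_1)}

h_fit_1 = InterpolatedUnivariateSpline(params, h_at_node, k=5)   # degree-5 spline
print(h_fit_1(2.34))                             # prediction at an out-of-sample lambda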

3 Algorithms

The pipeline for building surrogates described in the previous section is valid in any number of parameter and physical dimensions if we consider arbitrary fits through parameter space. For surrogate modeling, Arby supports in its present version 1-D parameter and physical domains (real intervals of the form [a, b] for a, b \in \mathbb{R}) and real-valued functions. Again, this restriction applies only to building surrogate models. On the other hand, Arby supports multidimensional parameter domains (although still restricted to 1-D domains in the physical dimension) and complex-valued functions for building reduced bases and empirical interpolants.

Below we summarize the algorithm for surrogate modeling. We refer the reader to the Appendices for technical details about the RB and EI algorithms used in intermediate stages.

The inputs are the training set {\cal K} = \{h_{\lambda_i}\}_{i=1}^{N}, the parameter set {\cal T} := \{\lambda_i\}_{i=1}^{N}, and the greedy tolerance \epsilon \in \mathbb{R}.

Algorithm 1 Surrogate modeling
1: Input: {\cal K}, {\cal T}, \epsilon
2: Build the reduced basis \{e_i\}_{i=1}^{n} up to tolerance \epsilon.
3: Find the empirical nodes \{X_i\}_{i=1}^{n} and build the interpolant {\cal I}_n by assembling the functions B_i(x) (i = 1, \ldots, n) (see Eq. (7)).
4: for i = 1 \to n do
5:     Build a continuous function h_i^{fit}(\lambda) by doing fits along the values \{h(\lambda; X_i)\}_{\lambda \in {\cal T}}.
6: end for
7: Assemble the surrogate: h^{surr}(\lambda; x) := \sum_{i=1}^{n} B_i(x)\, h_i^{fit}(\lambda)
8: Output: surrogate model h^{surr}(\lambda; x)

Let’s make some remarks on Alg. 1.

  • The training set {\cal K} is built from a discretization {\cal T} of the parameter domain. In the current version of Arby, {\cal T} is a discretized real interval {\cal T} := \{\lambda_1, \ldots, \lambda_N\}.

  • The RB algorithm used for building the reduced basis is fully described in Alg. 2, see the Appendices. It selects from {\cal T} n points \lambda_i = \Lambda_i (i = 1, \ldots, n), called the greedy points, which label those functions in the training set that conform the reduced basis. For conditioning purposes, the basis is orthonormalized at each step, so the algorithm's final output is a set of orthonormal basis elements along with the set of greedy points. We therefore use the term reduced basis interchangeably for both the basis conformed by the greedy solutions and its orthonormal version, since they are equivalent (they span the same space).

    The number n depends on the greedy tolerance \epsilon. In Arby we must specify a discretization of the physical domain \Omega so as to be able to compute integrals (see Eq. (3)). In this context, \Omega is a real interval [x_a, x_b] and Arby must receive as input an equispaced discretization of it.

  • Step 3 implements the EI algorithm described in Alg. 3, see the Appendices. It receives the reduced basis as its only input and finds n empirical nodes X_i (i = 1, \ldots, n) to build the interpolant. In practice, the interpolant is specified by assembling the n functions B_i(x) defined in Eqs. (6, 7).

  • To achieve predictability, Steps 4-6 perform parametric fits along the training values for each empirical node X_i. Let us illustrate this by looking at the first iteration of the loop in Steps 4-6. For the first node X_1 we collect all the values

    \{h(\lambda_1; X_1), h(\lambda_2; X_1), \ldots, h(\lambda_N; X_1)\}

    and perform a fit along them. This results in a function h_1^{fit}(\lambda) that is continuous in the interval [\lambda_1, \lambda_N]. This is repeated n times, once for each empirical node. The resulting functions h_i^{fit}(\lambda), together with the reduced basis, constitute the building blocks for the final surrogate assembly. The current Arby version implements splines (J.H. Ahlberg & Walsh (1967)) for the parametric fits, i.e., piecewise polynomial interpolation of some degree arbitrarily set by the user.

  • From Eqs. (6) and (7), the empirical interpolant is defined through the n functions B_i(x), which comprise all the RB-EI information. Combining the functions B_i(x) with the fits h_i^{fit}(\lambda) built in the previous steps, Step 7 leads to the desired surrogate h^{surr}(\lambda; x), which is continuous in \lambda inside the real interval [\lambda_1, \lambda_N].

3.1 Related works

A previous implementation of the RB-EI approach is GreedyCpp (https://bitbucket.org/sfield83/greedycpp/), an MPI/OpenMP parallel code written in C++ (Antil et al. (2018)). Although it is not designed for building surrogates and training sets have to be loaded at runtime, it allows for building reduced bases, empirical interpolants and reduced-order quadratures. Another example is ROMPy (https://bitbucket.org/chadgalley/rompy/), a previous attempt written in pure Python which supports surrogate modeling.

Other ROM implementations in the Python ecosystem are not fully data-driven. Typically they are weakly or strongly coupled to solvers for differential equations. Mature examples are PyMOR (Milk et al. (2016)) and RBniCS (Ballarin et al. (2015)). The latter is built on top of the FEniCS (Logg et al. (2011)) library for differential equations, while the former allows for coupling with external solvers.

4 Arby

Arby is a Python package for data-driven surrogate modeling that satisfies standard software compliance and quality assurance (see Section 4.4). It allows the user to build reduced bases and empirical interpolants for any number of parameter dimensions. In the current release, Arby builds surrogate models for 1-D domains.

4.1 Implementation

Integrals and inner products (see Eq. (3)) must be discretized to implement Alg. 1. For this, the physical interval \Omega is sampled at L equispaced points \{x_1, \ldots, x_L\} to define a discrete inner product between two functions,

\langle h_1, h_2\rangle_d := \sum_{i=1}^{L} \bar{h}_1(x_i)\, h_2(x_i)\, \omega_i\,.

The bar represents complex conjugation in case h is complex. The \omega_i's are L positive real values called weights. The weights \{\omega_i\} and the sample points \{x_i\} constitute a quadrature rule. Arby uses quadrature rules to compute integrals.
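Written out in NumPy, the discrete inner product reads as follows (a minimal sketch with Riemann-like weights on an equispaced grid; Arby also offers other rules, see Section 4.3):

import numpy as np

L = 1000
x = np.linspace(0.0, np.pi, L)
w = np.full(L, x[1] - x[0])                      # quadrature weights omega_i

def dot(h1, h2):
    return np.sum(np.conj(h1) * h2 * w)

print(dot(np.sin(x), np.sin(x)).real)            # close to pi/2, the integral of sin^2 on [0, pi]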

4.2 Public API

Classes.– The main class in Arby is ReducedOrderModel, which implements the surrogate modeling pipeline of Alg. 1. There are three basic inputs for this class:

  • training_set: the training functions, stored as a 2-D array,

  • physical_points: the discretization of the physical domain used for the quadrature rules (see the discrete inner product above), and

  • parameter_points: the discretization of the parameter domain.

These inputs represent the minimum and indispensable for building surrogates. Optional parameters are:

  • greedy_tol: the greedy tolerance \epsilon for the reduced basis, and

  • poly_deg: the degree of the splines used for the parametric fits.

These parameters can be tuned for controlling the model accuracy. See the Arby documentation (https://arby.readthedocs.io) for a thorough tutorial on this. Once a ReducedOrderModel object is created, the workflow splits into offline-online stages (Field et al. (2014)). Thus, the offline stage corresponds to a (possibly) expensive first building; the online one corresponds to fast model evaluations.

There is a class to compute inner products and integrals, Integration, which implements the quadrature rules and exposes, among others, the integral and norm methods. The Basis class encompasses data utilities for handling arbitrary bases, whether they are reduced bases or user-specified ones, and is built from basis data and an Integration object. Available methods include project and projection_error; the latter computes squared projection errors due to projecting arrays onto the basis. Auxiliary classes are the RB and EIM containers, which store the RB/EIM information.

Functions.– The main function in Arby is reduced_basis, which implements the RB greedy algorithm (Tiglio & Villanueva (2021)). For conditioning purposes, there are two patterns related to the normalization of the training set, which lead to two different implementations of the greedy algorithm. There is also a function for Gram-Schmidt orthonormalization, gram_schmidt, which implements an iterated Gram-Schmidt procedure (Hoffmann (1989)) to orthonormalize a set of linearly independent arrays. Internally, this routine is used to build the reduced bases.
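To illustrate the lower-level API, the snippet below builds a reduced basis directly from a training array (a schematic example on synthetic data; the exact signatures and the returned attribute names are assumptions to be checked against the Arby documentation):

import numpy as np
from arby import Integration, reduced_basis

x = np.linspace(0, 1, 101)                                   # physical_points
training = np.array([np.cos(k * x) for k in np.linspace(1, 5, 20)])

integration = Integration(x, rule="riemann")                 # quadrature-aware inner products
print(integration.norm(training[0]))

rb = reduced_basis(training, x, greedy_tol=1e-10)            # offline stage
basis = rb.basis                                             # a Basis object (assumed attribute name)
print(basis.projection_error(training[3]))                   # squared projection error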

4.3 Benchmarks

It is interesting to present a performance analysis of the most important routine of the project: the reduced_basis function. We want to measure how different combinations of its parameters influence the general performance of the algorithm, and whether this adjusts to the theoretical estimates. These parameters are:

  • integration_rule – The quadrature rule used to compute integrals. Possible values are riemann, trapezoidal and euclidean.

  • normalize – True if the training data must be normalized before training, or False otherwise.

  • greedy_tol – The greedy tolerance. We tested the values 10^{-14} and 10^{-12}.

  • training_set – The training data as a 2-D array. We tested on square random arrays with sizes 11 × 11 and 101 × 101.

  • physical_points – Physical points for the quadrature rules. Must match the number of columns of the training set.

With these parameters, we simulated 100 training sets for each one of the 24 possible combinations, giving a total of 24,000 test cases (a minimal timing sketch of this setup is shown below, after the hardware specifications). The benchmark was then executed on a computer with the following specifications:

  • CPU – 4 x 2.4 GHz AMD Opteron(tm) Processor 6282 SE.

  • RAM – 251GB DDR3L.

  • OS – CentOS 7 Linux 3.10.0-514.el7.x86_64

  • Software – Python 3.9.0.final.0 (64 bit), NumPy 1.21.1 and SciPy 1.7.0
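For reference, a single test case of this benchmark can be reproduced with a few lines like the following (a sketch; the reduced_basis signature follows the parameter names listed above and may differ in detail from the released API):

import time
import numpy as np
from arby import reduced_basis

rng = np.random.default_rng(seed=0)
for n in (11, 101):                              # training-set sizes used in the benchmark
    training_set = rng.random((n, n))            # square random training set
    physical_points = np.linspace(0, 1, n)       # must match the number of columns
    start = time.perf_counter()
    reduced_basis(training_set, physical_points,
                  integration_rule="riemann",
                  greedy_tol=1e-12,
                  normalize=False)
    print(n, time.perf_counter() - start, "s")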

The results are presented in the figure below. As we can anticipate, the size of the training set is the most important factor impacting the execution times; all the other parameters leave the times relatively unchanged.

[Figure: Results of the benchmark on 24,000 test cases varying the values of integration_rule, normalize, greedy_tol and the training-set size. In all cases, the horizontal axis represents the different parameter values, and the vertical axis represents the execution time in seconds. The execution time increases with the size of the training set, while for all the other parameters the times remain relatively unchanged.]

To further explore the relationship between the training-set size and the execution time, we ran a second experiment fixing normalize=False, greedy_tol=1e-12 and the riemann integration rule, and generating random training sets of increasing size. The measured times (Fig. 1) show the anticipated behavior: for random training sets the cost grows as O(LN^2) as we increase the size of the input. The benchmark dataset is available at https://zenodo.org/record/5139187 (Villanueva et al., 2021).

Figure 1: Measured times for 28,900 test samples of training sets with sizes between 11 × 11 and 300 × 300. The samples keep integration_rule, greedy_tol and normalize fixed at riemann, 10^{-12} and False, respectively.

4.4 Quality assurance

To ensure the proper software quality of Arby, we provide standard quantitative and qualitative metrics, in particular i) unit testing and ii) code-coverage, and adhere to the PEP 8 style guide (Van Rossum et al., 2001) throughout the entire project.

  1. Unit testing: its purpose is to ensure that the individual software components work as expected (Jazayeri, 2007).

  2. Code-coverage: it measures the amount of code covered by the unit test suite, expressed as a percentage of executed sentences (Miller & Maloney, 1963). By providing comprehensive code-coverage we ensure code validation, expand the ability and efficiency of error handling, and increase confidence in the code.

Arby currently uses pytest (Okken, 2017) and Coverage.py (https://coverage.readthedocs.io) for unit testing and coverage, respectively, reaching up to 99% of code-coverage with Python versions 3.6, 3.7, 3.8 and 3.9. The PEP 8 - Style Guide for Python Code (Van Rossum et al., 2001) is one of a series of guidelines and practices on how to write Python code to improve code readability and consistency. There are several PEPs (Python Enhancement Proposals), including PEP 8. The latter has recommendations for code layout, whitespaces, comments, naming conventions and programming recommendations. In addition, there are tools, called linters, that can be used to automate compliance with PEP 8; Arby currently uses flake8 (https://pypi.org/project/flake8/), which checks for any deviation in code style. Finally, the entire source code is MIT-licensed (Initiative et al., 2006) and is publicly available from its GitHub repository (https://github.com/aaronuv/arby). All versions committed to this code are automatically tested in a continuous-integration service using Travis CI (https://travis-ci.com/github/aaronuv/arby) and GitHub Actions (https://github.com/features/actions). Documentation is automatically built from the repository and made public in the read-the-docs service at https://arby.readthedocs.io/en/latest/. Arby is built on top of the Python scientific stack: NumPy (Walt et al., 2011), to perform efficient numerical linear algebra operations, and SciPy (Virtanen et al., 2020), used in the current release for spline interpolation. The Arby package is available for installation from the Python-Package-Index (PyPI) and can be installed using the command pip install arby.

4.5 Toy model: a damped pendulum

We illustrate the construction of surrogate models by applying Arby to a classical problem in physics: the damped pendulum. This system is a simple pendulum of given length subject to gravity and a dissipative force such as friction, which allows the pendulum oscillations to damp at long times. We encode the generic dynamics of this system in the ordinary differential equation (ODE)

\ddot{\theta} = -b\dot{\theta} - \lambda\sin(\theta)\,, \qquad (8)

where \theta represents the time-dependent angle of the pendulum with respect to the equilibrium axis and dots represent time differentiation. The symbols b and \lambda represent friction and gravity strength per unit length, respectively, at a fixed value of the pendulum's length. Time units do not play any role in this example, so for practical purposes we choose time to be adimensional. This choice makes the two parameters of the model, b and \lambda, also adimensional. In order to make the model one-dimensional and be able to apply Arby for surrogate modeling, we fix b to a convenient value so as to widely cover the variation of solutions in the selected time range. With b fixed, our parametrized model consists of damped oscillations \theta_{\lambda}(t), with time t the physical variable and \lambda the parameter. We must solve the ODE in (8) numerically in order to generate the training set of solutions to feed Arby. To this end, we set b = 0.2 and choose the physical and parameter ranges as t \in [0, 50] and \lambda \in [1, 5]. The initial conditions are set to (\theta, \omega) = (\pi/2, 0), where \omega := \dot{\theta}, meaning the pendulum departs from rest at \theta = \pi/2 and falls under the action of gravity. We generate a training set of solutions using an ODE solver from the SciPy Python package (Virtanen et al., 2020). We discretize the parameter and time domains in 101 and 1,001 equispaced points, respectively, and generate 101 solutions using the same initial conditions (the code for this example can be found at https://arby.readthedocs.io/en/latest/). See Fig. 2.
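The training-set generation just described can be sketched as follows (a minimal script consistent with the setup above; the exact code is available in the documentation linked in the text):

import numpy as np
from scipy.integrate import odeint

b = 0.2
time = np.linspace(0.0, 50.0, 1001)              # physical domain, 1,001 points
param = np.linspace(1.0, 5.0, 101)               # parameter domain, 101 points

def pendulum(y, t, lam):
    theta, omega = y
    return [omega, -b * omega - lam * np.sin(theta)]     # Eq. (8) as a first-order system

# one training solution theta_lambda(t) per parameter value, all with theta(0)=pi/2, omega(0)=0
training = np.array(
    [odeint(pendulum, [np.pi / 2, 0.0], time, args=(lam,))[:, 0] for lam in param]
)

The resulting arrays training, time and param are the inputs used below.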

Figure 2: Graphical illustration of the training set. We plot a subset of training solutions.

To build a surrogate for solutions to the pendulum equation, we create a ReducedOrderModel object, passing training, time and param as the training set, the physical points and the parameter points, respectively. In addition, we modify the default values of optional class parameters to increase the surrogate's quality:

>>> from arby import ReducedOrderModel as ROM
# set the greedy tolerance to 1e-14 and
# splines degree to 5 and create a model
>>> pendulum = ROM(training, time, param, greedy_tol=1e-14, poly_deg=5)

Once the model for the pendulum is created, the next step is to build and evaluate the surrogate. For that, we just call it:

# define the parameter par for surrogate evaluation
>>> par = 2.
# evaluate
>>> pendulum.surrogate(par)
array([1.57079633, 1.5683046 , 1.56086267, ..., 0.00993474, 0.00975876, 0.009536])

In order to test the surrogate's accuracy, we build a test set composed of 1,001 solutions (10× denser than the training set) for the same parameter and physical domains. This means the test set contains the training set plus several solutions not used in the training stage. We use the L_2 norm to compute the relative error between the surrogate and the ground truth,

e(\lambda) := \frac{\|\theta_{\lambda}^{surr} - \theta_{\lambda}\|}{\|\theta_{\lambda}\|}\,, \qquad (9)

where

\|\theta_{\lambda}\| := \left(\int_{[0,50]} |\theta_{\lambda}(t)|^2\, dt\right)^{\frac{1}{2}}\,.
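In practice, this error can be computed with a quadrature over the time grid, for example as in the following sketch (using a trapezoidal rule; Arby's own integration tools can be used instead):

import numpy as np

def relative_error(theta_surr, theta_true, time):
    # relative L2 error of Eq. (9), with the integral over [0, 50] discretized on the time grid
    norm = lambda f: np.sqrt(np.trapz(np.abs(f) ** 2, time))
    return norm(theta_surr - theta_true) / norm(theta_true)

# e.g. relative_error(pendulum.surrogate(par), theta_true, time) for a test solution theta_true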

This metric allows us to quantify how well the surrogate globally matches the ground truth model at some parameter value. Furthermore, we compute these errors not only for the surrogate model based on 101 training parameters but for a set of surrogates built upon different discretizations, starting from very sparse ones until reaching the discretization of 101 training parameters. With the integration tools available in Arby, these computations become straightforward. In Fig. 3 we build a colormap of errors for different discretizations and parameter values. Naturally, the biggest errors correspond to very sparse training sets and fall below 10^{-4} for discretizations \gtrsim 50. For a specific model (a horizontal line in the colormap), bright-dark patterns describe the behavior of the model when it alternates between in-sample and out-of-sample parameter evaluations. The lowest errors usually correspond to in-sample evaluations, where splines become exact. The largest ones usually correspond to out-of-sample evaluations, where the errors due to the parametric fits become relevant.

Figure 3: Global errors for surrogate models built from different training discretizations N = 11, 12, 13, 15, 17, 21, 26, 34, 51, 101. All models are evaluated at the test parameters.

For the surrogate trained with 101 solutions, in Fig. 4 (top panel) we show the function curves for both the surrogate and the ground truth model, for the parameter value corresponding to the worst global error. The surrogate evaluation at this parameter is a prediction, i.e. the associated parameter does not correspond to a training one. Since the surrogate and the ground truth model are indistinguishable at eyeball resolution, we plot the absolute value of both functions in logarithmic scale to better locate the sectors where the curves differ. We conclude that even the worst-case scenario shows almost no difference between the surrogate and the ground truth solution. The bottom panel of Fig. 4 shows the absolute value of the point-wise difference between the surrogate and the test solutions. We removed from the test set those points which correspond to training parameters in order to focus only on the generalization errors of the surrogate. We see that the point-wise errors jump up at most to \sim 10^{-4}, whereas the bulk remains close to \sim 10^{-7}.

Figure 4: Top: absolute value of the function curves for both models, surrogate and ground truth; they correspond to the worst prediction in the test set. Bottom: point-wise difference errors for the test parameters. Training points, for which the errors are much smaller than pure test errors, are excluded.

5 Cosmic Microwave Background Anisotropies: a multidimensional case

A valuable application of surrogate models could stem from modeling the CMB temperature and polarization anisotropies power spectra as measured by satellites like Planck (Planck Collaboration et al., 2020). Such power spectra have a strong dependence on the underlying cosmological model, defined by a set of cosmological parameters [\Omega_b h^2, \Omega_m h^2, H_0, n, \tau, A_s, 10^9 A_s \exp(-2\tau)], here taken to be 7-dimensional. Using CAMB (Code for Anisotropies in the Microwave Background; Lewis et al., 2000), we generated the observed temperature anisotropies power spectrum corresponding to each cosmological model. We randomly sampled the cosmological parameter space using 80,000 points; as we will see, this is sufficiently dense. The independent or physical variable here turns out to be the angular multipole index \ell, which we sampled using 3,000 discrete points \ell = 1, \ldots, 3000. We compute a reduced basis using Arby. For a greedy tolerance of 10^{-4} we obtain a set of greedy parameters identifying those elements in the training set that conform the reduced basis, allowing us to describe any power spectrum in our sample as a linear combination of them. In particular, the training set of power spectra is equivalent to a set of just 84 reduced functions. Thus, for example, in Fig. 5 we show the distribution of the cosmological parameters for a training set of 3,000 CMB power spectra as blue points and the corresponding selected reduced basis as orange points. This methodology could accelerate the estimation of cosmological parameters from CMB anisotropies by using a surrogate model built with Machine Learning algorithms on the reduced basis set. The use of this approach to estimate cosmological parameters will be the subject of a forthcoming publication.
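Schematically, the reduction step for this dataset amounts to a single call to the reduced-basis routine of Section 4.2 (a hypothetical sketch: the file name is a placeholder and the returned attribute name is an assumption, to be checked against the documentation):

import numpy as np
from arby import reduced_basis

ells = np.arange(1, 3001)                        # multipole indices, used as the physical points
spectra = np.load("camb_tt_spectra.npy")         # placeholder: array of shape (80000, 3000)

rb = reduced_basis(spectra, ells, greedy_tol=1e-4)
print(len(rb.indices))                           # number of greedy elements (~84 at this tolerance)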

Figure 5: Cosmological parameter space of a training set (blue dots) and the corresponding reduced basis selected by Arby (orange points).

We report the convergence of the number of reduced basis elements n as a function of the training set size N in Fig. 6. This plot shows that n stabilizes close to n = 84, meaning that, for the specified greedy tolerance, the training set begins to saturate at N \sim 10^2-10^3. This shows that the full 80,000-point dataset involves highly redundant information, for which only 84 functions are enough to represent the entire set. With the reduced basis we reach a compression factor of 34 simply by computing projections of the training set.

Figure 6: Convergence of the number of basis elements with the size of the training set. We observe the curve saturates around 84 basis elements.

6 Conclusion

We have introduced Arby, an open source Python package that provides a set of tools to generate and handle fast and highly accurate surrogate models in a non-intrusive way. Arby can be used to construct continuous models from sparse data composed of functions generated, for instance, by differential equations, or to explore redundancies in data through dimensional reduction. The offline-online architecture of Arby allows for fast deployments of predictive models that can approximate functions which otherwise can be expensive to compute. We assessed the package with unit testing tools and ensured it satisfies proper software quality assurance, which improves the robustness and readability of the code. We also performed benchmarks measuring the computation times of reduced bases for different parameter combinations. To date, for surrogate modeling Arby works on 1-D domains for both spaces, the parametric and the physical, though it supports multidimensional parameter domains for reduced bases and empirical interpolation. In future releases we want to extend Arby to several dimensions for surrogate modeling and combine it with state-of-the-art regression methods at the fitting stages. The curse of dimensionality present in high-dimensional problems, like the CMB power spectra discussed in Section 5, surely becomes a bottleneck, so parallelized scenarios are worth exploring in the future. The user-friendly API of Arby expands the usability of the code to virtually anyone looking for a continuous model out of discrete data, even with little or no knowledge of ROM methods.

Acknowledgements.
The authors would like to thank their families and friends, as well as the IATE astronomers and Manuel Tiglio, for useful comments and suggestions. This work was partially supported by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina). A.V., J.B.C. and M.Ch. are supported by fellowships from CONICET. This research has made use of the NASA Astrophysics Data System (http://adsabs.harvard.edu/), the Cornell University arXiv repository, adstex (https://github.com/yymao/adstex) and the Python programming language.

References

  • Abbott et al. (2016) Abbott, B. P., Abbott, R., Abbott, T. D., et al. 2016, Phys. Rev. Lett., 116, 241102
  • Abbott et al. (2020) Abbott, B. P., Abbott, R., Abbott, T. D., et al. 2020, Classical and Quantum Gravity, 37, 055002
  • Antil et al. (2018) Antil, H., Chen, D., & Field, S. E. 2018, Comput. Sci. Eng., 20, 10
  • Ballarin et al. (2015) Ballarin, F., Sartori, A., & Rozza, G. 2015, ScienceOpen Posters
  • Barrault et al. (2004) Barrault, M., Maday, Y., Nguyen, N. C., & Patera, A. T. 2004, Comptes Rendus Mathematique, 339, 667
  • Blackman et al. (2015) Blackman, J., Field, S. E., Galley, C. R., et al. 2015, Phys. Rev. Lett., 115, 121102
  • Blackman et al. (2017a) Blackman, J., Field, S. E., Scheel, M. A., et al. 2017a, Phys. Rev. D, 95, 104023
  • Blackman et al. (2017b) Blackman, J., Field, S. E., Scheel, M. A., et al. 2017b, Phys. Rev. D, 96, 024058
  • Blanchet (2006) Blanchet, L. 2006, Living Rev. Rel., 9, 4
  • Boyaval et al. (2009) Boyaval, S., Bris, C. L., Maday, Y., Nguyen, N. C., & Patera, A. T. 2009, Computer Methods in Applied Mechanics and Engineering, 198, 3187
  • Chaturantabut & Sorensen (2010) Chaturantabut, S. & Sorensen, D. 2010, SIAM J. Scientific Computing, 32, 2737
  • Cutler & Flanagan (1994) Cutler, C. & Flanagan, E. E. 1994, Phys. Rev. D, 49, 2658
  • Damour & Nagar (2011) Damour, T. & Nagar, A. 2011, Fundam. Theor. Phys., 162, 211
  • Field et al. (2011) Field, S. E., Galley, C. R., Herrmann, F., et al. 2011, Phys. Rev. Lett., 106, 221102
  • Field et al. (2014) Field, S. E., Galley, C. R., Hesthaven, J. S., Kaye, J., & Tiglio, M. 2014, Phys. Rev. X, 4, 031006
  • Field et al. (2012) Field, S. E., Galley, C. R., & Ochsner, E. 2012, Phys. Rev., D86, 084046
  • Hannam et al. (2014) Hannam, M., Schmidt, P., Bohé, A., et al. 2014, Phys. Rev. Lett., 113, 151101
  • Hesthaven et al. (2015) Hesthaven, J. S., Rozza, G., & Stamm, B. 2015, Certified Reduced Basis Methods for Parametrized Partial Differential Equations, 1st edn., Springer Briefs in Mathematics (Switzerland: Springer), 135
  • Hoffmann (1989) Hoffmann, W. 1989, Computing, 41, 335
  • Initiative et al. (2006) Initiative, O. S., et al. 2006, The MIT License. [Online]. Available: https://opensource.org/licenses/MIT [Accessed 27 March 2017]
  • Jazayeri (2007) Jazayeri, M. 2007, in 2007 Future of Software Engineering, FOSE ’07 (USA: IEEE Computer Society), 199–213
  • J.H. Ahlberg & Walsh (1967) Ahlberg, J. H., Nilson, E. N., & Walsh, J. L. 1967, The Theory of Splines and Their Applications (Academic Press)
  • Khan et al. (2019) Khan, S., Chatziioannou, K., Hannam, M., & Ohme, F. 2019, Phys. Rev. D, 100, 024059
  • Lehner & Pretorius (2014) Lehner, L. & Pretorius, F. 2014, Ann. Rev. Astron. Astrophys., 52, 661
  • Lewis et al. (2000) Lewis, A., Challinor, A., & Lasenby, A. 2000, ApJ, 538, 473
  • Logg et al. (2011) Logg, A., Wells, G., & Mardal, K.-A. 2011, Automated solution of differential equations by the finite element method. The FEniCS book, Vol. 84
  • Maday et al. (2009) Maday, Y., Nguyen, N. C., Patera, A. T., & Pau, S. H. 2009, Communications on Pure and Applied Analysis, 8, 383
  • Milk et al. (2016) Milk, R., Rave, S., & Schindler, F. 2016, SIAM Journal on Scientific Computing, 38, S194
  • Miller & Maloney (1963) Miller, J. C. & Maloney, C. J. 1963, Commun. ACM, 6, 58
  • Okken (2017) Okken, B. 2017, Python testing with Pytest: simple, rapid, effective, and scalable (Pragmatic Bookshelf)
  • Planck Collaboration et al. (2020) Planck Collaboration, Aghanim, N., Akrami, Y., et al. 2020, A&A, 641, A6
  • Quarteroni et al. (2015) Quarteroni, A., Manzoni, A., & Negri, F. 2015, Reduced Basis Methods for Partial Differential Equations: An Introduction, UNITEXT (Springer International Publishing)
  • Rifat et al. (2020) Rifat, N. E. M., Field, S. E., Khanna, G., & Varma, V. 2020, Phys. Rev. D, 101, 081502
  • Sturani et al. (2010) Sturani, R., Fischetti, S., Cadonati, L., et al. 2010, Journal of Physics: Conference Series, 243, 012007
  • Tiglio & Villanueva (2020) Tiglio, M. & Villanueva, A. 2020
  • Tiglio & Villanueva (2021) Tiglio, M. & Villanueva, A. 2021, Reduced Order and Surrogate Models for Gravitational Waves
  • Van Rossum et al. (2001) Van Rossum, G., Warsaw, B., & Coghlan, N. 2001, Python. org, 1565
  • Varma et al. (2019a) Varma, V., Field, S. E., Scheel, M. A., et al. 2019a, Phys. Rev. Research., 1, 033015
  • Varma et al. (2019b) Varma, V., Field, S. E., Scheel, M. A., et al. 2019b, Phys. Rev. D, 99, 064045
  • Villanueva et al. (2021) Villanueva, A., Beroiz, M., Cabral, J., Chalela, M., & Dominguez, M. 2021, Benchmark dataset for arby
  • Virtanen et al. (2020) Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature methods, 17, 261
  • Walt et al. (2011) Walt, S., Colbert, S. C., & Varoquaux, G. 2011, Computing in Science and Engineering, 13, 22

 

Appendix A Build reduced bases

By defining

  • the training set {\cal K} := \{h_{\lambda_i}\}_{i=1}^{N},

  • the projection error at parameter \lambda,

    \sigma_i(\lambda) := \|h_{\lambda} - {\cal P}_i h_{\lambda}\|^2\,,

    where {\cal P}_i is the projector operator associated with an i-sized basis,

  • \epsilon: the greedy tolerance,

  • GS(h, basis): orthonormalizes h against a basis through a Gram-Schmidt procedure,

  • gp: the greedy points, and

  • rb: the reduced basis,

the Reduced Basis-greedy algorithm proceeds as follows:

Algorithm 2 RB greedy algorithm
1: Input: {\cal K}, \epsilon
2: Seed choice (arbitrary): \Lambda_1 \in {\cal T}
3: e_1 = h_{\Lambda_1} / \|h_{\Lambda_1}\|
4: rb = \{e_1\}, gp = \{\Lambda_1\}
5: \Lambda_2 = \text{argmax}_{\lambda \in {\cal T}}\, \sigma_1(\lambda)
6: \sigma_1 = \sigma_1(\Lambda_2)
7: Initialize i = 1
8: while \sigma_i > \epsilon do
9:     i = i + 1
10:     gp = gp \cup \{\Lambda_i\}
11:     e_i = GS(h_{\Lambda_i}, rb)
12:     rb = rb \cup \{e_i\}
13:     \Lambda_{i+1} = \text{argmax}_{\lambda \in {\cal T}}\, \sigma_i(\lambda)
14:     \sigma_i = \sigma_i(\Lambda_{i+1})
15: end while
16: Output: rb = \{e_i\}_{i=1}^{n} and gp = \{\Lambda_i\}_{i=1}^{n}
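For concreteness, the following is a minimal NumPy sketch of Algorithm 2 on synthetic data (Riemann weights on an equispaced grid are assumed; this is an illustration, not the Arby implementation):

import numpy as np

def rb_greedy(K, w, tol, seed=0):
    # Greedy reduced-basis construction (Alg. 2) for a training set K of shape (N, L),
    # with quadrature weights w of shape (L,). Returns the orthonormal basis and the greedy indices.
    def dot(a, b):
        return np.sum(w * np.conj(a) * b)

    def gram_schmidt(h, basis):
        for e in basis:
            h = h - dot(e, h) * e
        return h / np.sqrt(dot(h, h).real)

    basis = [K[seed] / np.sqrt(dot(K[seed], K[seed]).real)]
    gp = [seed]
    while True:
        # squared projection errors of every training element onto the current basis (Eq. 10)
        errors = [dot(h, h).real - sum(abs(dot(e, h)) ** 2 for e in basis) for h in K]
        i_max = int(np.argmax(errors))
        if errors[i_max] <= tol:
            break
        gp.append(i_max)
        basis.append(gram_schmidt(K[i_max], basis))
    return np.array(basis), gp

# quick check on a toy training set
x = np.linspace(0, 1, 1001)
w = np.full_like(x, x[1] - x[0])
K = np.array([np.sin(lam * x) for lam in np.linspace(1, 10, 50)])
rb, greedy_points = rb_greedy(K, w, tol=1e-12)
print(len(rb), greedy_points[:5])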

Appendix B Build Empirical Interpolants

Algorithm 3 EIM algorithm
1: Input: rb = \{e_i\}_{i=1}^{n}
2: X_1 = \text{argmax}_x |e_1|
3: for j = 2 \to n do
4:     Build {\cal I}_{j-1}[e_j](x)
5:     r(x) = e_j(x) - {\cal I}_{j-1}[e_j](x)
6:     X_j = \text{argmax}_x |r|
7: end for
8: Output: EIM nodes \{X_i\}_{i=1}^{n} and interpolant {\cal I}_n
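A minimal NumPy sketch of Algorithm 3, again for illustration only (the basis is stored as rows of an array; the final matrix collects the functions B_i(x) of Eq. (7)):

import numpy as np

def eim(rb):
    # Return the EIM node indices and the matrix whose rows are B_i(x) (Eq. 7).
    n = rb.shape[0]
    nodes = [int(np.argmax(np.abs(rb[0])))]
    for j in range(1, n):
        V = rb[:j, nodes].T                        # (V)_{ik} = e_k(X_i) for the current nodes
        coeffs = np.linalg.solve(V, rb[j, nodes])  # interpolate e_j with the (j-1)-point interpolant
        residual = rb[j] - coeffs @ rb[:j]
        nodes.append(int(np.argmax(np.abs(residual))))
    V = rb[:, nodes].T
    B = np.linalg.inv(V).T @ rb                    # B_i(x) = sum_j (V^{-1})_{ji} e_j(x)
    return nodes, B

# quick demo with an orthonormalized polynomial basis on [0, 1]
x = np.linspace(0, 1, 500)
q, _ = np.linalg.qr(np.vander(x, 5, increasing=True))
nodes, B = eim(q.T)
h = np.exp(x)                                      # empirical interpolation of a generic function
print(np.max(np.abs(h - h[nodes] @ B)))            # small, since exp is well approximated by the basis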

Appendix C On computing projection coefficients

The most relevant step in terms of computational cost in the RB greedy algorithm is the computation of the projection coefficients, that is, step 13 of Alg. 2. Taking full advantage of the reduced basis orthonormality

\langle e_i, e_j\rangle = \delta_{ij}\,, \quad i, j = 1, \ldots, n\,,

we can write projection errors as

\sigma_n(\lambda) = \|h_{\lambda} - {\cal P}_n h_{\lambda}\|^2 = \|h_{\lambda}\|^2 - \sum_{i=1}^{n} |c_i(\lambda)|^2\,, \qquad (10)

where c_i(\lambda) = \langle e_i, h_{\lambda}\rangle are the projection coefficients. Note that

\sigma_{n+1} = \sigma_n - |c_{n+1}|^2\,.

We omitted the \lambda label for simplicity. This allows for a constant computational cost when adding a new element to the basis, since one only needs to compute the projection coefficients for the (n+1)-th basis element while storing those corresponding to the previous basis. In practice, the orthonormalization of the basis is not perfect and carries some error due to machine precision \epsilon. Therefore, the inner products between basis elements read

\langle e_i, e_j\rangle = \delta_{ij} + \epsilon\,,

and this error propagates to the projection error as

\sigma_n = \|h\|^2 - \sum_{i=1}^{n} |c_i|^2 + \epsilon \sum_{i,j=1}^{n} \bar{c}_i c_j\,.

Then, naive implementations of this rule (saving the projection coefficients to compute the next projection error) can lead to undesired error amplifications whenever |c_i| > 1, and therefore to wrong estimations of the projection errors. This is avoided simply by normalizing the training set. By doing so, we ensure |c_i| \leq 1 and keep all orthonormalization errors under control. Arby provides an option to normalize the training set before building the reduced basis.