
Bayesian model inversion using stochastic spectral embedding

P.-R. Wagner, S. Marelli, B. Sudret
(14.05.2020)
Abstract

In this paper we propose a new sampling-free approach to solve Bayesian model inversion problems that is an extension of the previously proposed spectral likelihood expansions (SLE) method. Our approach, called stochastic spectral likelihood embedding (SSLE), uses the recently presented stochastic spectral embedding (SSE) method for local spectral expansion refinement to approximate the likelihood function at the core of Bayesian inversion problems.

We show that, similarly to SLE, this approach results in analytical expressions for key statistics of the Bayesian posterior distribution, such as the evidence, the posterior moments and the posterior marginals, by direct post-processing of the expansion coefficients. Because SSLE and SLE rely on the direct approximation of the likelihood function, they are in a way independent of the computational/mathematical complexity of the forward model. We further enhance the efficiency of SSLE by introducing a likelihood-specific adaptive sample enrichment scheme.

To showcase the performance of the proposed SSLE, we solve three problems that exhibit different kinds of complexity in the likelihood function: multimodality, high posterior concentration and high nominal dimensionality. We demonstrate how SSLE significantly improves on SLE, and present it as a promising alternative to existing inversion frameworks.

Keywords: Bayesian model inversion, inverse problems, polynomial chaos expansions, spectral likelihood expansions, stochastic spectral likelihood embedding, sampling-free inversion.

1 Introduction

Computational models are an invaluable tool for decision making, scientific advances and engineering breakthroughs. They establish a connection between a set of input parameters and output quantities with wide-ranging applications. Model inversion uses available experimental observations of the output to determine the set of input parameters that maximize the predictive potential of a model. The importance of efficient and reliable model inversion frameworks can hardly be overstated, considering that they establish a direct connection between models and the real world. Without it, the most advanced model predictions might lack physical meaning and, consequently, be useless for their intended applications.

Bayesian model inversion is one way to formalize this problem (Jaynes, 2003; Gelman et al., 2014). It is based on Bayesian inference and poses the problem in a probabilistic setting by capitalizing on Bayes’ theorem. In this setting a so-called prior (i.e., before observations) probability distribution about the model parameters is updated to a so-called posterior (i.e., after observations) distribution. The posterior distribution is the probability distribution of the input parameters conditioned on the available observations, and the main outcome of the Bayesian inversion process.

In Bayesian model inversion, the connection between the model output and the observations is established through a probabilistic discrepancy model. This model, which is a function of the input parameters, leads to the so-called likelihood function. The specific form of the likelihood function depends on the problem at hand, but typically it has a global maximum for the input parameters with the model output that is closest to the available observations (w.r.t. some metric), and rapidly goes to zero with increasing distance to those parameters.

Analytical expressions for the posterior distribution can only be found in a few academic examples (e.g., conjugate priors with a linear forward model, Bishop (2006); Gelman et al. (2014)). In general model inversion problems, however, such analytical solutions are not available. Instead, it is common practice to resort to sampling methods to generate a sample distributed according to the posterior distribution. The family of Markov chain Monte Carlo (MCMC) algorithms is particularly suitable for generating such a posterior sample (Beck and Au, 2002; Robert and Casella, 2004).

While MCMC and its extensions are extensively used in model inversion, and new algorithms are continuously being developed (Haario et al., 2001; Ching and Chen, 2007; Goodman and Weare, 2010; Neal, 2011), these methods have a few notable shortcomings that hinder their application in many practical cases. It is well known that there are no robust convergence criteria for MCMC algorithms, and that their performance is particularly sensitive to their tuning parameters. Additionally, samples generated by MCMC algorithms are often highly correlated, thus requiring extensive heuristic post-processing and empirical rules (Gelman and Rubin, 1992; Brooks and Gelman, 1998). MCMC algorithms are also in general not well suited for sampling multimodal posterior distributions.

When considering complex engineering scenarios, the models subject to inversion are often computationally expensive. Because MCMC algorithms usually require a significant number of forward model evaluations, it has been proposed to accelerate the procedure by using surrogate models in lieu of the original models. These surrogate models are either constructed non-adaptively before sampling from the posterior distribution (Marzouk et al., 2007; Marzouk and Xiu, 2009) or adaptively during the sampling procedure (Li and Marzouk, 2014; Birolleau et al., 2014; Cui et al., 2014; Conrad et al., 2016; Conrad et al., 2018; Yan and Zhou, 2019). Adaptive techniques can be of great benefit with posterior distributions that are concentrated in a small subspace of the prior domain, as the surrogate only needs to be accurate near high density areas of the posterior distribution.

Polynomial chaos expansions (PCE) are a widely used surrogate modelling technique based on expanding the forward model onto a suitable polynomial basis (Ghanem and Spanos, 1991; Xiu and Karniadakis, 2002). In other words, they provide a spectral representation of the computational forward model. Thanks to the introduction of sparse regression (see, e.g., Blatman and Sudret (2011)), their computation has become feasible even in the presence of complex and computationally expensive engineering models. This technique has been successfully used in conjunction with MCMC to reduce the total computational cost associated with sampling from the posterior distribution (Marzouk et al., 2007; Wagner et al., 2020).

Alternative approaches to compute the posterior distribution or its statistics include the Laplace approximation at a posterior mode (Tierney and Kadane, 1986; Tierney et al., 1989b, a), approximate Bayesian computations (ABC) (Marin et al., 2012; Sisson et al., 2018), optimal transport approaches (El Moselhy and Marzouk, 2012; Parno, 2015; Marzouk et al., 2016) and embarrassingly parallel quasi-Monte Carlo sampling (Dick et al., 2017; Gantner and Peters, 2018).

Stochastic spectral embedding (SSE) is a metamodelling technique recently developed in Marelli et al. (2021), suitable for approximating functions with complex localized features. In this paper we propose to extend this technique with an ad-hoc adaptive sample enrichment strategy that makes it suitable for efficiently approximating likelihood functions in Bayesian model inversion problems. This method can be seen as a generalization of the previously proposed spectral likelihood expansions (SLE) approach (Nagel and Sudret, 2016).

Due to its deep connection to SLE, we call the application of SSE to likelihood functions stochastic spectral likelihood embedding (SSLE). We show that, due to its local spectral characteristics, this approach allows us to analytically derive expressions for the posterior marginals and general posterior moments by post-processing its expansion coefficients.

The paper is organized as follows: In Section 2 we establish the basics of Bayesian inference and particularly Bayesian model inversion. We then give an introduction into spectral function decomposition with a focus on polynomial chaos expansions and their application to likelihood functions (SLE) in Section 3. In Section 4 we present the main contribution of the paper, namely the derivation of Bayesian posterior quantities of interest through SSLE and the extension of the SSE algorithm with an adaptive sampling strategy. Finally, in Section 5 we showcase the performance of our approach on three case studies of varying complexity.

2 Model inversion

The problem of model inversion occurs whenever the predictions of a model are to be brought into agreement with available observations or data. This is achieved by properly adjusting a set of input parameters of the model. The goal of inversion can be twofold: on the one hand, the inferred input parameters might be used to predict new realizations of the model output; on the other hand, the inferred input parameters might themselves be the main interest. Model inversion is a common problem in many engineering disciplines that in some cases is still routinely solved manually, i.e., by simply changing the input parameters until some, often qualitative, goodness-of-fit criterion is met. More quantitative inversion approaches aim at automating this process by establishing a metric (e.g., the $L^{2}$-distance) between the data and the model response, which is then minimized through suitable optimization algorithms.

While such approaches can often be used in practical applications, they tend not to provide measures of the uncertainties associated with the inferred model input or predictions. These uncertainties are useful in judging the accuracy of the inversion, as well as indicating non-informative measurements. In fact, the lack of uncertainty quantification in the context of model inversion can lead to erroneous results that have far-reaching consequences in subsequent applications. One approach to consider uncertainties in inverse problems is the Bayesian framework for model inversion that will be presented hereinafter.

2.1 Bayesian inference

Consider some non-observable parameters $\bm{X}\in{\mathcal{D}}_{\bm{X}}$ and the observables $\bm{Y}\in{\mathcal{D}}_{\bm{Y}}$. Furthermore, let ${\mathcal{Y}}=\{\bm{y}^{(1)},\dots,\bm{y}^{(N)}\}$ be a set of $N$ measurements, i.e., noisy observations of a set of realizations of $\bm{Y}$. Statistical inference consists in drawing conclusions about $\bm{X}$ using the information from ${\mathcal{Y}}$ (Gelman et al., 2014). These measurements can be direct observations of the parameters ($\bm{Y}=\bm{X}$) or some quantities indirectly related to $\bm{X}$ through a function or model ${\mathcal{M}}:{\mathcal{D}}_{\bm{X}}\to{\mathcal{D}}_{\bm{Y}}$. One way to conduct this inference is through Bayes' theorem of conditional probabilities, a process known as Bayesian inference.

Denoting by $\pi(\cdot)$ a probability density function (PDF) and by $\pi(\cdot|\bm{x})$ a PDF conditioned on $\bm{x}$, Bayes' theorem can be written as

\pi(\bm{x}|{\mathcal{Y}})=\frac{\pi({\mathcal{Y}}|\bm{x})\pi(\bm{x})}{\pi({\mathcal{Y}})},   (1)

where $\pi(\bm{x})$ is known as the prior distribution of the parameters, i.e., the distribution of $\bm{X}$ before observing the data ${\mathcal{Y}}$. The conditional distribution $\pi({\mathcal{Y}}|\bm{x})$, known as the likelihood, establishes a connection between the observations ${\mathcal{Y}}$ and a realization of the parameters $\bm{X}=\bm{x}$. For a given realization $\bm{x}$, it returns the probability density of observing the data ${\mathcal{Y}}$. Under the common assumption of independence between individual observations $\{\bm{y}^{(i)}\}_{i=1}^{N}$, the likelihood function takes the form:

{\mathcal{L}}:\bm{x}\mapsto{\mathcal{L}}(\bm{x};\,{\mathcal{Y}})\stackrel{\text{def}}{=}\pi({\mathcal{Y}}|\bm{x})=\prod_{i=1}^{N}\pi(\bm{y}^{(i)}|\bm{x}).   (2)

The likelihood function is a map ${\mathcal{D}}_{\bm{X}}\rightarrow\mathbb{R}_{+}$, and it attains its maximum for the parameter set with the highest probability of yielding ${\mathcal{Y}}$. With this, Bayes' theorem from Eq. (1) can be rewritten as:

\pi(\bm{x}|{\mathcal{Y}})=\frac{{\mathcal{L}}(\bm{x};\,{\mathcal{Y}})\pi(\bm{x})}{Z},\quad\text{with}\quad Z=\int_{{\mathcal{D}}_{\bm{X}}}{\mathcal{L}}(\bm{x};\,{\mathcal{Y}})\pi(\bm{x})\,{\rm d}\bm{x},   (3)

where $Z$ is a normalizing constant often called the evidence or marginal likelihood. On the left-hand side, $\pi(\bm{x}|{\mathcal{Y}})$ is the posterior PDF, i.e., the distribution of $\bm{X}$ after observing the data ${\mathcal{Y}}$. In this sense, Bayes' theorem establishes a general expression for updating the prior distribution using a likelihood function to incorporate information from the data.

2.2 Bayesian model inversion

Bayesian model inversion describes the application of the Bayesian inference framework to the problem of model inversion (Beck and Katafygiotis, 1998; Kennedy and O'Hagan, 2001; Jaynes, 2003; Tarantola, 2005). The two main ingredients needed to infer model parameters within the Bayesian framework are a prior distribution $\pi(\bm{x})$ of the model parameters and a likelihood function ${\mathcal{L}}$. In practical applications, prior information about the model parameters is often readily available. Typical sources of such information are physical parameter constraints or expert knowledge. Additionally, prior inversion attempts can serve as guidelines to assign informative prior distributions. In cases where no prior information about the parameters is available, so-called non-informative or invariant prior distributions (Jeffreys, 1946; Harney, 2016) can also be assigned. The likelihood function serves instead as the link between the model parameters $\bm{X}$ and the observations of the model output ${\mathcal{Y}}$. To connect these two quantities, it is necessary to choose a so-called discrepancy model that gives the relative probability that the model response to a realization of $\bm{X}=\bm{x}$ describes the observations. One common assumption for this probabilistic model is that the measurements are perturbed by a Gaussian additive discrepancy term $\bm{E}\sim{\mathcal{N}}(\bm{\varepsilon}|\bm{0},\bm{\Sigma})$ with covariance matrix $\bm{\Sigma}$. For a single measurement $\bm{y}^{(i)}$ it reads:

\bm{y}^{(i)}={\mathcal{M}}(\bm{x})+\bm{\varepsilon}.   (4)

This discrepancy between the model output ${\mathcal{M}}(\bm{X})$ and the observables $\bm{Y}$ can result from measurement error or model inadequacies. With this additive discrepancy model, the distribution of the observables conditioned on the parameters, $\bm{Y}|\bm{x}$, is written as:

\pi(\bm{y}^{(i)}|\bm{x})={\mathcal{N}}(\bm{y}^{(i)}|{\mathcal{M}}(\bm{x}),\bm{\Sigma}),   (5)

where ${\mathcal{N}}(\cdot|\bm{\mu},\bm{\Sigma})$ denotes the multivariate Gaussian PDF with mean value $\bm{\mu}$ and covariance matrix $\bm{\Sigma}$. The likelihood function ${\mathcal{L}}$ is then constructed using this probabilistic model $\pi(\bm{y}^{(i)}|\bm{x})$ and Eq. (2). For a given set of measurements ${\mathcal{Y}}$ it thus reads:

{\mathcal{L}}(\bm{x};\,{\mathcal{Y}})\stackrel{\text{def}}{=}\prod_{i=1}^{N}{\mathcal{N}}(\bm{y}^{(i)}|{\mathcal{M}}(\bm{x}),\bm{\Sigma}).   (6)

With the Bayesian model inversion problem fully specified, Eq. (3) directly gives the posterior distribution of the model parameters $\pi(\bm{x}|{\mathcal{Y}})$. In the setting of model inversion, the posterior distribution therefore represents the state of belief about the true data-generating model parameters, considering all available information: computational forward model, discrepancy model and measurement data (Beck and Katafygiotis, 1998; Jaynes, 2003).

Often, the ultimate goal of model inversion is to provide a set of inferred parameters, with associated confidence measures or intervals. This is typically achieved by computing posterior statistics (e.g., moments, mode, etc.). Propagating the posterior through secondary models is also of interest. Such quantities of interest (QoI) can be expressed by calculating the posterior expectation of suitable functions of the parameters $h(\bm{x}):\mathbb{R}^{M}\to\mathbb{R}$, with $\bm{X}|{\mathcal{Y}}\sim\pi(\bm{x}|{\mathcal{Y}})$, as in:

{\mathbb{E}}\left[h(\bm{X})|{\mathcal{Y}}\right]=\int_{{\mathcal{D}}_{\bm{X}|{\mathcal{Y}}}}h(\bm{x})\pi(\bm{x}|{\mathcal{Y}})\,{\rm d}\bm{x}.   (7)

Depending on $h$, this formulation encompasses posterior moments ($h(\bm{x})=x_{i}$ or $h(\bm{x})=(x_{i}-{\mathbb{E}}\left[X_{i}\right])^{2}$ for the first and second moments, respectively), posterior covariance ($h(\bm{x})=x_{i}x_{j}-{\mathbb{E}}\left[X_{i}\right]{\mathbb{E}}\left[X_{j}\right]$) or expectations of secondary models ($h(\bm{x})={\mathcal{M}}^{\star}(\bm{x})$).

3 Spectral function decomposition

To pose a Bayesian inversion problem, the specification of a prior distribution and a likelihood function described in the previous section is sufficient. Its solution, however, is not available in closed form in the general case.

Spectral likelihood expansion (SLE) is a recently proposed method that aims at solving the Bayesian inversion problem by finding a polynomial chaos expansion (PCE) of the likelihood function in a basis orthogonal w.r.t. the prior distribution (Nagel and Sudret, 2016). This representation allows one to derive analytical expressions for the evidence $Z$, the posterior distribution, the posterior marginals, and many types of QoIs, including the posterior moments.

We offer here a brief introduction to regression-based, sparse PCE before introducing SLE, but refer the interested reader to more exhaustive resources on PCE (Ghanem and Spanos, 1991; Xiu and Karniadakis, 2002) and sparse PCE (Xiu, 2010; Blatman and Sudret, 2010, 2011).

Let us consider a random variable $\bm{X}$ with independent components $\{X_{i},\,i=1,\dots,M\}$ and associated probability density functions $\pi_{i}(x_{i})$, so that $\pi(\bm{x})=\prod_{i=1}^{M}\pi_{i}(x_{i})$. Assume further that $\mathcal{M}:{\mathcal{D}}_{\bm{X}}=\prod_{i=1}^{M}{\mathcal{D}}_{X_{i}}\subseteq\mathbb{R}^{M}\to\mathbb{R}$ is a scalar function of $\bm{X}$ which fulfills the finite variance condition ${\mathbb{E}}\left[\mathcal{M}(\bm{X})^{2}\right]<+\infty$. Then it is possible to find a so-called truncated polynomial chaos approximation of $\mathcal{M}$ such that

\mathcal{M}(\bm{X})\approx\mathcal{M}_{\mathrm{PCE}}(\bm{X})\stackrel{\text{def}}{=}\sum_{\bm{\alpha}\in{\mathcal{A}}}a_{\bm{\alpha}}\Psi_{\bm{\alpha}}(\bm{X}),   (8)

where $\bm{\alpha}$ is an $M$-tuple $(\alpha_{1},\dots,\alpha_{M})\in\mathbb{N}^{M}$ and ${\mathcal{A}}\subset\mathbb{N}^{M}$. For most parametric distributions, well-known classical orthonormal polynomials $\{\Psi_{\bm{\alpha}}\}_{\bm{\alpha}\in\mathbb{N}^{M}}$ satisfy the necessary orthonormality condition w.r.t. $\pi(\bm{x})$ (Xiu and Karniadakis, 2002). For more general distributions, arbitrary orthonormal polynomials can be constructed numerically through the Stieltjes procedure (Gautschi, 2004). If, additionally, ${\mathcal{A}}$ is a sparse subset of $\mathbb{N}^{M}$, the truncated expansion in Eq. (8) is called a sparse PCE.

In this contribution, we restrict the discussion to independent inputs, because constructing an orthogonal basis for dependent inputs is computationally challenging, albeit possible (e.g., through Gram-Schmidt orthogonalization). Furthermore, Torre et al. (2019) demonstrated that, for purely predictive purposes, ignoring the input dependence can significantly improve the predictive performance of PCE, at the cost of losing basis orthonormality. The latter, however, is required to derive analytical post-processing quantities such as the moments of the posterior distribution (see Sections 3.1 and 4.1).

There exist different algorithms to produce a sparse PCE in practice, i.e., to select a sparse basis ${\mathcal{A}}$ and compute the corresponding coefficients. A powerful class of methods are regression-based approaches that rely on an initial input sample $\mathcal{X}$, called the experimental design, and corresponding model evaluations $\mathcal{M}(\mathcal{X})$ (see, e.g., Lüthen et al. (2021) for a recent survey). Additionally, it is possible to design adaptive algorithms that choose the truncated basis size (Blatman and Sudret, 2011; Jakeman et al., 2015).

To assess the accuracy of a PCE, the so-called generalization error $\mathbb{E}\left[(\mathcal{M}(\bm{X})-\mathcal{M}_{\mathrm{PCE}}(\bm{X}))^{2}\right]$ can be evaluated. A robust estimator of the generalization error is given by the leave-one-out (LOO) cross-validation technique. This estimator is obtained as

\varepsilon_{\mathrm{LOO}}=\frac{1}{K}\sum_{i=1}^{K}\left(\mathcal{M}(\bm{x}^{(i)})-\mathcal{M}_{\mathrm{PCE}}^{\sim i}(\bm{x}^{(i)})\right)^{2},   (9)

where $\mathcal{M}_{\mathrm{PCE}}^{\sim i}$ is constructed by leaving out the $i$-th point from the experimental design ${\mathcal{X}}$. For methods based on linear regression, it can be shown (Chapelle et al., 2002; Blatman and Sudret, 2010) that the LOO error is available analytically by post-processing the regressor matrix.
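The following is a minimal sketch of this analytical LOO estimator for a generic least-squares regression surrogate; the names (the regression matrix Psi, the response vector y) are our own notation, not part of the original paper, and the formula shown is the standard hat-matrix identity referred to above.

```python
import numpy as np

def loo_error(Psi, y):
    """Analytical LOO error of the least-squares fit y ~ Psi @ a (Eq. (9)).

    Psi : (K, P) regression matrix (basis evaluations on the experimental design)
    y   : (K,)   corresponding model/likelihood evaluations
    """
    coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    residuals = y - Psi @ coeffs
    # diagonal of the "hat" matrix H = Psi (Psi^T Psi)^{-1} Psi^T
    h = np.einsum('ij,ij->i', Psi, Psi @ np.linalg.pinv(Psi.T @ Psi))
    # leave-one-out residuals without refitting K times
    return np.mean((residuals / (1.0 - h)) ** 2)
```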

3.1 Spectral likelihood expansions

The idea of SLE is to use sparse PCE to find a spectral representation of the likelihood function {\mathcal{L}} occurring in Bayesian model inversion problems (see Eq. (2)). We present here a brief introduction to the method and the main results of Nagel and Sudret (2016).

Likelihood functions can be seen as scalar functions of the input random vector $\bm{X}\sim\pi(\bm{x})$. In this work we assume priors of the type $\pi(\bm{x})=\prod_{i=1}^{M}\pi_{i}(x_{i})$, i.e., with independent marginals. The spectral expansion of the likelihood then reads:

{\mathcal{L}}(\bm{X})\approx{\mathcal{L}}_{\mathrm{SLE}}(\bm{X})\stackrel{\text{def}}{=}\sum_{\bm{\alpha}\in{\mathcal{A}}}a_{\bm{\alpha}}\Psi_{\bm{\alpha}}(\bm{X}),   (10)

where the explicit dependence on ${\mathcal{Y}}$ was dropped for notational simplicity.

Upon computing the basis and coefficients in Eq. (10), the solution of the inverse problem reduces to post-processing the coefficients $a_{\bm{\alpha}}$. The following expressions can be derived for the individual quantities:

Evidence

The evidence emerges as the coefficient of the constant polynomial, $a_{\bm{0}}$:

Z=\int_{{\mathcal{D}}_{\bm{X}}}{\mathcal{L}}(\bm{x})\pi(\bm{x})\,{\rm d}\bm{x}\approx\left<{\mathcal{L}}_{\mathrm{SLE}},1\right>_{\pi}=a_{\bm{0}}.   (11)
Posterior

Upon computing the evidence $Z$, the posterior can be evaluated directly through

\pi(\bm{x}|{\mathcal{Y}})\approx\frac{{\mathcal{L}}_{\mathrm{SLE}}(\bm{x})\pi(\bm{x})}{Z}=\frac{\pi(\bm{x})}{a_{\bm{0}}}\sum_{\bm{\alpha}\in{\mathcal{A}}}a_{\bm{\alpha}}\Psi_{\bm{\alpha}}(\bm{x}).   (12)
Posterior marginals

We split the random vector $\bm{X}$ into two vectors: $\bm{X}_{{\bm{\mathsf{u}}}}$ with components $\{X_{i}\}_{i\in{\bm{\mathsf{u}}}}\in\mathcal{D}_{\bm{X}_{{\bm{\mathsf{u}}}}}$ and $\bm{X}_{{\bm{\mathsf{v}}}}$ with components $\{X_{i}\}_{i\in{\bm{\mathsf{v}}}}\in\mathcal{D}_{\bm{X}_{{\bm{\mathsf{v}}}}}$, where ${\bm{\mathsf{u}}}$ and ${\bm{\mathsf{v}}}$ are two non-empty disjoint index sets such that ${\bm{\mathsf{u}}}\cup{\bm{\mathsf{v}}}=\{1,\dots,M\}$. Denote further by $\pi_{{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}})\stackrel{\text{def}}{=}\prod_{i\in{\bm{\mathsf{u}}}}\pi_{i}(x_{i})$ and $\pi_{{\bm{\mathsf{v}}}}(\bm{x}_{{\bm{\mathsf{v}}}})\stackrel{\text{def}}{=}\prod_{i\in{\bm{\mathsf{v}}}}\pi_{i}(x_{i})$ the prior marginal density functions of $\bm{X}_{{\bm{\mathsf{u}}}}$ and $\bm{X}_{{\bm{\mathsf{v}}}}$, respectively. The posterior marginals then read:

\pi_{{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}}|{\mathcal{Y}})=\int_{{\mathcal{D}}_{\bm{X}_{{\bm{\mathsf{v}}}}}}\pi(\bm{x}|{\mathcal{Y}})\,{\rm d}\bm{x}_{{\bm{\mathsf{v}}}}\approx\frac{\pi_{{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}})}{a_{\bm{0}}}\sum_{\bm{\alpha}\in{\mathcal{A}}_{{\bm{\mathsf{v}}}=0}}a_{\bm{\alpha}}\Psi_{\bm{\alpha}}(\bm{x}_{{\bm{\mathsf{u}}}}),   (13)

where ${\mathcal{A}}_{{\bm{\mathsf{v}}}=0}=\{\bm{\alpha}\in\mathcal{A}:\alpha_{i}=0\Leftrightarrow i\in{\bm{\mathsf{v}}}\}$. The series in the above equation constitutes a subexpansion that contains non-constant polynomials only in the directions $i\in{\bm{\mathsf{u}}}$.

Quantities of interest

Finally, it is also possible to analytically compute posterior expectations of functions that admit a polynomial chaos expansion in the same basis, of the form $h(\bm{X})\approx\sum_{\bm{\alpha}\in{\mathcal{A}}}b_{\bm{\alpha}}\Psi_{\bm{\alpha}}(\bm{X})$. Eq. (7) then reduces to the spectral product:

{\mathbb{E}}\left[h(\bm{X})|{\mathcal{Y}}\right]=\frac{1}{a_{\bm{0}}}\sum_{\bm{\alpha}\in{\mathcal{A}}}a_{\bm{\alpha}}b_{\bm{\alpha}}.   (14)
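As a minimal illustration of this post-processing, the sketch below evaluates Eq. (11) and Eq. (14) when the two expansions are stored as plain dictionaries mapping multi-indices (tuples) to coefficients in the same prior-orthonormal basis; this data layout is our own choice for illustration, not the paper's.

```python
def sle_evidence(a):
    """Evidence Z = a_0, Eq. (11); `a` maps multi-index tuples -> coefficients."""
    M = len(next(iter(a)))               # input dimension from the key length
    return a.get((0,) * M, 0.0)

def sle_posterior_expectation(a, b):
    """E[h(X)|Y] = (1/a_0) * sum_alpha a_alpha * b_alpha, Eq. (14)."""
    common = set(a) & set(b)             # only shared basis terms contribute
    return sum(a[alpha] * b[alpha] for alpha in common) / sle_evidence(a)
```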

The quality of these results depends only on the approximation error introduced in Eq. (10). The latter, in turn, depends mainly on the chosen PCE truncation strategy (Blatman and Sudret, 2011; Nagel and Sudret, 2016) and the number of points used to compute the coefficients (i.e., the experimental design). It is known that likelihood functions typically have quasi-compact supports (i.e., ${\mathcal{L}}(\bm{X})\approx 0$ on a majority of ${\mathcal{D}}_{\bm{X}}$). Such functions require a very high polynomial degree to be approximated accurately, which in turn can lead to the need for prohibitively large experimental designs.

4 Stochastic spectral embedding

Stochastic spectral embedding (SSE) is a multi-level approach to surrogate modeling originally proposed in Marelli et al. (2021). It attempts to approximate a given square-integrable function $\mathcal{M}$ with independent inputs $\bm{X}$ through

\mathcal{M}\approx\mathcal{M}_{\text{SSE}}(\bm{X})=\sum_{k\in\mathcal{K}}\bm{1}_{\mathcal{D}_{\bm{X}}^{k}}(\bm{X})\,\widehat{\mathcal{R}}_{S}^{k}(\bm{X}),   (15)

where $\mathcal{K}\subseteq\mathbb{N}^{2}$ is a set of multi-indices with elements $k=(\ell,p)$, for which $\ell=0,\dots,L$ and $p=1,\dots,P_{\ell}$, where $L$ is the number of levels and $P_{\ell}$ is the number of subdomains at a specific level $\ell$. We call $\widehat{\mathcal{R}}_{S}^{k}(\bm{X})$ a residual expansion, given by

\widehat{\mathcal{R}}^{k}_{S}(\bm{X})=\sum_{\bm{\alpha}\in\mathcal{A}^{k}}a_{\bm{\alpha}}^{k}\Psi_{\bm{\alpha}}^{k}(\bm{X}).   (16)

In the present paper, the term $\sum_{\bm{\alpha}\in\mathcal{A}^{k}}a_{\bm{\alpha}}^{k}\Psi_{\bm{\alpha}}^{k}(\bm{X})$ denotes a polynomial chaos expansion (see Eq. (8)) constructed in the subdomain ${\mathcal{D}}_{\bm{X}}^{k}$, but in principle it can refer to any spectral expansion (e.g., Fourier series). A schematic representation of the summation in Eq. (15) is given in Figure 2. The detailed notation and the algorithm to sequentially construct an SSE are given in the sequel.

4.1 Stochastic spectral likelihood embedding

Viewing the likelihood as a function of a random variable $\bm{X}$ with independent marginals, we can directly use Eq. (15) to write down its SSLE representation

{\mathcal{L}}(\bm{X})\approx{\mathcal{L}}_{\mathrm{SSLE}}(\bm{X})\stackrel{\text{def}}{=}\sum_{k\in\mathcal{K}}\bm{1}_{\mathcal{D}_{\bm{X}}^{k}}(\bm{X})\,\widehat{\mathcal{R}}_{S}^{k}(\bm{X}),   (17)

where the variable $\bm{X}$ is distributed according to the prior distribution $\pi(\bm{x})$ and, consequently, the local basis used to compute $\widehat{\mathcal{R}}_{S}^{k}(\bm{X})$ is orthonormal w.r.t. that distribution.

Due to the local spectral properties of the residual expansions, the SSLE representation of the likelihood function retains all of the post-processing properties of SLE (Section 3.1):

Evidence

The normalization constant $Z$ emerges as the sum of the constant polynomial coefficients weighted by the prior mass:

Z=\sum_{k\in{\mathcal{K}}}\sum_{\bm{\alpha}\in{\mathcal{A}}^{k}}a_{\bm{\alpha}}^{k}\int_{{\mathcal{D}}_{\bm{X}}^{k}}\Psi_{\bm{\alpha}}^{k}(\bm{x})\pi(\bm{x})\,{\rm d}\bm{x}=\sum_{k\in{\mathcal{K}}}\mathcal{V}^{k}a_{\bm{0}}^{k},\quad\text{where}\quad\mathcal{V}^{k}=\int_{{\mathcal{D}}_{\bm{X}}^{k}}\pi(\bm{x})\,{\rm d}\bm{x}.   (18)
Posterior

This allows us to write the posterior as

\pi(\bm{x}|{\mathcal{Y}})\approx\frac{{\mathcal{L}}_{\mathrm{SSLE}}(\bm{x})\pi(\bm{x})}{Z}=\frac{\pi(\bm{x})}{\sum_{k\in{\mathcal{K}}}\mathcal{V}^{k}a_{\bm{0}}^{k}}\sum_{k\in{\mathcal{K}}}\bm{1}_{\mathcal{D}_{\bm{X}}^{k}}(\bm{x})\,\widehat{\mathcal{R}}^{k}_{S}(\bm{x}).   (19)
Posterior marginal

Utilizing again the disjoint index sets ${\bm{\mathsf{u}}}$ and ${\bm{\mathsf{v}}}$ from Eq. (13), it is also possible to analytically derive the posterior marginal PDFs as

\pi_{{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}}|{\mathcal{Y}})=\int_{{\mathcal{D}}_{\bm{X}_{{\bm{\mathsf{v}}}}}}\pi(\bm{x}|{\mathcal{Y}})\,{\rm d}\bm{x}_{{\bm{\mathsf{v}}}}\approx\frac{\pi_{{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}})}{\sum_{k\in{\mathcal{K}}}\mathcal{V}^{k}a_{\bm{0}}^{k}}\sum_{k\in{\mathcal{K}}}\bm{1}_{{\mathcal{D}}_{\bm{X}_{{\bm{\mathsf{u}}}}}^{k}}(\bm{x}_{{\bm{\mathsf{u}}}})\,\widehat{\mathcal{R}}^{k}_{S,{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}})\,\mathcal{V}^{k}_{{\bm{\mathsf{v}}}},   (20)

where

\widehat{\mathcal{R}}^{k}_{S,{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}})=\sum_{\bm{\alpha}\in{\mathcal{A}}_{{\bm{\mathsf{v}}}=0}^{k}}a_{\bm{\alpha}}^{k}\Psi_{\bm{\alpha}}^{k}(\bm{x}_{{\bm{\mathsf{u}}}})\quad\text{and}\quad\mathcal{V}^{k}_{{\bm{\mathsf{v}}}}=\int_{{\mathcal{D}}_{\bm{X}_{{\bm{\mathsf{v}}}}}^{k}}\pi_{{\bm{\mathsf{v}}}}(\bm{x}_{{\bm{\mathsf{v}}}})\,{\rm d}\bm{x}_{{\bm{\mathsf{v}}}}.   (21)

Here, $\widehat{\mathcal{R}}^{k}_{S,{\bm{\mathsf{u}}}}(\bm{x}_{{\bm{\mathsf{u}}}})$ is a subexpansion of $\widehat{\mathcal{R}}^{k}_{S}(\bm{x})$ that contains only non-constant polynomials in the directions $i\in{\bm{\mathsf{u}}}$. Note that, since the prior distribution is assumed to have independent components, the constants $\mathcal{V}^{k}$ and $\mathcal{V}^{k}_{{\bm{\mathsf{v}}}}$ are obtained as products of univariate integrals, which are available analytically from the prior marginal cumulative distribution functions (CDFs).

Quantities of interest

Expected values under the posterior of a function $h$ that admits the local expansions $h(\bm{x})=\sum_{\bm{\alpha}\in{\mathcal{A}}^{k}}b_{\bm{\alpha}}^{k}\Psi_{\bm{\alpha}}^{k}(\bm{x})$, $k\in{\mathcal{K}}$, can be approximated by:

\mathbb{E}\left[h(\bm{X})|{\mathcal{Y}}\right]=\int_{{\mathcal{D}}_{\bm{X}}}h(\bm{x})\pi(\bm{x}|{\mathcal{Y}})\,{\rm d}\bm{x}=\frac{1}{Z}\sum_{k\in{\mathcal{K}}}\sum_{\bm{\alpha}\in{\mathcal{A}}^{k}}a_{\bm{\alpha}}^{k}\int_{{\mathcal{D}}_{\bm{X}}^{k}}h(\bm{x})\Psi_{\bm{\alpha}}^{k}(\bm{x})\pi(\bm{x})\,{\rm d}\bm{x}=\frac{1}{Z}\sum_{k\in{\mathcal{K}}}\sum_{\bm{\alpha}\in{\mathcal{A}}^{k}}a_{\bm{\alpha}}^{k}b_{\bm{\alpha}}^{k},   (22)

where $b_{\bm{\alpha}}^{k}$ are the coefficients of the PCE of $h$ in the $\mathrm{card}({\mathcal{K}})$ bases $\{\Psi_{\bm{\alpha}}^{k}\}_{\bm{\alpha}\in{\mathcal{A}}^{k}}$. This can also be used for computing posterior moments such as the mean, variance or covariance.

These expressions can be seen as a generalization of the SLE expressions detailed in Section 3.1. For a single-level global expansion (i.e., $\mathrm{card}({\mathcal{K}})=1$ and consequently $\mathcal{V}^{(0,1)}=1$), they are identical.
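As a minimal sketch of this local post-processing, the snippet below evaluates the evidence of Eq. (18) and the posterior density of Eq. (19) from a list of local terms; the chosen data layout (an indicator function of the subdomain, its prior mass, its constant coefficient and a callable local residual expansion) is an illustrative assumption, not the paper's implementation.

```python
def ssle_evidence(terms):
    """Z = sum_k V^k * a_0^k, Eq. (18); `terms` holds (indomain, V_k, a0_k, eval_k)."""
    return sum(V_k * a0_k for (_, V_k, a0_k, _) in terms)

def ssle_posterior_pdf(x, prior_pdf, terms):
    """pi(x|Y) ~ pi(x) * sum_k 1_{D_k}(x) * R_S^k(x) / Z, Eq. (19)."""
    Z = ssle_evidence(terms)
    # only the subdomains containing x contribute to the embedded sum
    local_sum = sum(eval_k(x) for (indomain, _, _, eval_k) in terms if indomain(x))
    return prior_pdf(x) * local_sum / Z
```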

4.2 Modifications to the original algorithm

The original algorithm for computing an SSE was presented in Marelli et al. (2021). It recursively partitions the input domain $\mathcal{D}_{\bm{X}}$ and constructs truncated expansions of the residual. We reproduce it below for reference, but replace the model $\mathcal{M}$ with the likelihood function $\mathcal{L}$. We further simplify the algorithm by choosing a partitioning strategy with $N_{S}=2$.

1. Initialization:
   (a) $\ell=0$, $p=1$
   (b) $\mathcal{D}_{\bm{X}}^{\ell,p}=\mathcal{D}_{\bm{X}}$
   (c) ${\mathcal{R}}^{\ell}(\bm{X})=\mathcal{L}(\bm{X})$

2. For each subdomain $\mathcal{D}_{\bm{X}}^{\ell,p}$, $p=1,\dots,P_{\ell}$:
   (a) Calculate the truncated expansion $\widehat{\mathcal{R}}_{S}^{\ell,p}(\bm{X}^{\ell,p})$ of the residual ${\mathcal{R}}^{\ell}(\bm{X}^{\ell,p})$ in the current subdomain
   (b) Update the residual in the current subdomain: ${\mathcal{R}}^{\ell+1}(\bm{X}^{\ell,p})={\mathcal{R}}^{\ell}(\bm{X}^{\ell,p})-\widehat{\mathcal{R}}_{S}^{\ell,p}(\bm{X}^{\ell,p})$
   (c) Split the current subdomain $\mathcal{D}_{\bm{X}}^{\ell,p}$ into $2$ subdomains $\mathcal{D}_{\bm{X}}^{\ell+1,\{s_{1},s_{2}\}}$ based on a partitioning strategy
   (d) If $\ell<L$, set $\ell\leftarrow\ell+1$ and go back to 2(a); otherwise terminate the algorithm

3. Termination:
   (a) Return the full sequence of $\mathcal{D}_{\bm{X}}^{\ell,p}$ and $\widehat{\mathcal{R}}_{S}^{\ell,p}(\bm{X}^{\ell,p})$ needed to compute Eq. (15).

In practice, the residual expansions $\widehat{\mathcal{R}}_{S}^{\ell,p}(\bm{X}^{\ell,p})$ are computed using a fixed experimental design $\mathcal{X}$ and the corresponding model evaluations $\mathcal{L}(\mathcal{X})$. The algorithm then only requires the specification of a partitioning strategy and a termination criterion, as detailed in Marelli et al. (2021).

Likelihood functions are typically characterized by a localized behaviour: they peak close to the data-generating parameters, while quickly decaying to zero in the remainder of the prior domain. This means that in the majority of the domain, likelihood evaluations are non-informative. Directly applying the original algorithm is therefore expected to waste many likelihood evaluations.

We therefore modify the original algorithm by adding an adaptive sampling scheme (Section 4.2.1), which includes the termination criterion, and by introducing an improved partitioning strategy (Section 4.2.2) that is especially suitable for identifying compact-support features. The rationale for these modifications is presented next.

4.2.1 Adaptive sampling scheme

The proposed algorithm has two parameters: the experimental design size for the residual expansions, $N_{\mathrm{ref}}$, and the final experimental design size corresponding to the available computational budget, $N_{\mathrm{ED}}$. At the initialization of the algorithm, $N_{\mathrm{ref}}$ points are sampled as a first experimental design. At every further iteration, additional points are then sampled from the prior distribution. These samples are generated in a space-filling way (e.g., through Latin hypercube sampling) in the newly created subdomains $\mathcal{D}_{\bm{X}}^{\ell+1,s}$, so that exactly $N_{\mathrm{ref}}$ points are always available for constructing the residual expansions. The algorithm is terminated once the computational budget $N_{\mathrm{ED}}$ has been exhausted.

Adding new sample points requires evaluating the likelihood function. Because multiple points are added at every iteration of the algorithm, this step can easily be performed simultaneously on multiple computational nodes.

At every step, the proposed algorithm chooses a single refinement domain from the set of unsplit (i.e., terminal) domains, creates two new subdomains by splitting the refinement domain, and constructs residual expansions after enriching the experimental design. The selection of this refinement domain is based on the error estimator $\mathcal{E}^{k}$, defined by

\mathcal{E}^{\ell+1,s}=\begin{cases}E^{\ell+1,s}_{\mathrm{LOO}}\,\mathcal{V}^{\ell+1,s},\quad&\text{if}\quad\exists\,\widehat{\mathcal{R}}_{S}^{\ell+1,s},\\ E^{\ell,s}_{\mathrm{LOO}}\,\mathcal{V}^{\ell+1,s},\quad&\text{otherwise}.\end{cases}   (23)

This estimator incorporates the subdomain size, through the prior mass $\mathcal{V}^{\ell+1,s}$, and the approximation accuracy, through the leave-one-out estimator. The case distinction is necessary to assign an error estimator also to domains that have too few points to construct a residual expansion, in which case the error estimator of the previous level, $E^{\ell,s}_{\mathrm{LOO}}$, is reused.

The algorithm sequentially splits and refines subdomains with large approximation errors. Because likelihood functions typically have the highest complexity close to their peak, these regions tend to have larger approximation errors and are therefore predominantly picked for refinement. The proposed way of adaptive sampling then ends up placing more points near the likelihood peak, thereby reducing the number of non-informative likelihood evaluations.

The choice of a constant $N_{\mathrm{ref}}$ is simple, and could in principle be replaced by a more elaborate strategy (e.g., based on the approximation error of the current subdomain relative to the total approximation error). A benefit of this enrichment criterion is that all residual expansions are computed with experimental designs of the same size. Because the domain with the maximum approximation error is chosen among the terminal domains, the error estimators then have a more comparable estimation accuracy.
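A minimal sketch of this refinement-domain selection is given below. The dictionary fields (`V` for the prior mass, `eps_loo`, `has_expansion`, `parent_eps_loo`) are illustrative names introduced here, not notation from the paper.

```python
def refinement_domain(terminal_domains):
    """Pick the terminal domain with the largest error estimator of Eq. (23)."""
    def error(dom):
        # use the domain's own LOO error if a residual expansion exists,
        # otherwise fall back to the error of the previous level
        eps = dom['eps_loo'] if dom.get('has_expansion', False) else dom['parent_eps_loo']
        return eps * dom['V']
    return max(terminal_domains, key=error)
```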

4.2.2 Partitioning strategy

The partitioning strategy determines how a selected refinement domain is split. As described in Marelli et al. (2021), it is easy to define the split in the uniformly distributed quantile space $\bm{U}$ and map the resulting split domains $\mathcal{D}_{\bm{U}}^{\ell,p}$ to the (possibly unbounded) real space $\bm{X}$ through an appropriate isoprobabilistic transform (e.g., the Rosenblatt transform (Rosenblatt, 1952)).

Similar to the original SSE algorithm presented in Marelli et al. (2021), we split the refinement domain in half w.r.t. its prior mass. The original algorithm chooses the splitting direction based on the partial variance in the refinement domain. This approach is well suited for generic function approximation problems. For the approximation of likelihood functions, however, we propose a partitioning strategy that is more apt for dealing with their compact support.

We propose to pick the split direction along which a split yields the maximum difference in the residual empirical variance between the two candidate subdomains created by the split. This can easily be visualized with the example of the $M=2$ dimensional domain ${\mathcal{D}}_{\bm{X}}^{\ell,p}$ in Figure 1(a). Assume this subdomain was selected as the refinement domain. To decide along which dimension to split, we construct the $M$ candidate subdomain pairs $\{{\mathcal{D}}_{\mathrm{split}}^{i,1},{\mathcal{D}}_{\mathrm{split}}^{i,2}\}_{i=1,\dots,M}$ and estimate the corresponding $\{E_{\mathrm{split}}^{i}\}_{i=1,\dots,M}$ in those subdomains, defined by

E_{\mathrm{split}}^{i}\stackrel{\text{def}}{=}\left|{\rm Var}\left[\mathcal{R}^{\ell+1}(\mathcal{X}^{i,1}_{\mathrm{split}})\right]-{\rm Var}\left[\mathcal{R}^{\ell+1}(\mathcal{X}^{i,2}_{\mathrm{split}})\right]\right|.   (24)

In this expression, $\mathcal{X}^{i,1}_{\mathrm{split}}$ and $\mathcal{X}^{i,2}_{\mathrm{split}}$ denote the subsets of the experimental design $\mathcal{X}$ inside the subdomains ${\mathcal{D}}_{\mathrm{split}}^{i,1}$ and ${\mathcal{D}}_{\mathrm{split}}^{i,2}$, respectively. The occurring variances can easily be estimated with the empirical variance of the residuals in the respective candidate subdomains.

After computing the residual variance differences, the split is carried out along the dimension

d=\arg\max_{i\in\{1,\dots,M\}}E_{\mathrm{split}}^{i},   (25)

i.e., we keep the subdomains ${\mathcal{D}}_{\mathrm{split}}^{d,1}$ and ${\mathcal{D}}_{\mathrm{split}}^{d,2}$ that introduce the largest difference in variance. For $d=1$, the resulting split can be seen in Figure 1(d).

(a) Refinement domain
(b) Split along $d=1$
(c) Split along $d=2$
(d) Selected pair
Figure 1: Partitioning strategy for a 2D example visualized in the quantile space $\bm{U}$. The refinement domain ${\mathcal{D}}_{\bm{U}}^{\ell,p}$ is split into two subdomains ${\mathcal{D}}_{\bm{U}}^{\ell+1,s_{1}}$ and ${\mathcal{D}}_{\bm{U}}^{\ell+1,s_{2}}$.

The choice of this partitioning strategy can be justified heuristically by the goal of approximating compact-support functions. Assuming that the likelihood function has compact support, this criterion avoids cutting through its support and instead identifies a split direction that results in one subdomain with large variance (expected to contain the likelihood support) and one subdomain with small variance. In subsequent steps, the algorithm proceeds by cutting away low-variance subdomains until the likelihood support is isolated.
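The following is a minimal sketch of this split-direction selection in the quantile space, under the stated equal-prior-mass splitting rule; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def split_direction(U_local, R_local, bounds):
    """Return the split direction d of Eq. (25) for a refinement domain.

    U_local : (n, M) design points inside the refinement domain (quantile space)
    R_local : (n,)   residual values at those points
    bounds  : (2, M) lower/upper bounds of the refinement domain
    """
    lo, hi = bounds
    M = U_local.shape[1]
    E_split = np.full(M, -np.inf)
    for i in range(M):
        mid = 0.5 * (lo[i] + hi[i])                 # split in half w.r.t. prior mass
        left = U_local[:, i] < mid
        if 2 <= left.sum() <= len(left) - 2:        # need points on both sides
            # difference of the empirical residual variances, Eq. (24)
            E_split[i] = abs(np.var(R_local[left]) - np.var(R_local[~left]))
    return int(np.argmax(E_split))                  # Eq. (25)
```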

4.2.3 The adaptive SSLE algorithm

The algorithm is presented here with its two parameters: $N_{\mathrm{ref}}$, the minimum experimental design size needed to expand a residual, and $N_{\mathrm{ED}}$, the final experimental design size. The sample ${\mathcal{X}}^{\ell,p}$ refers to ${\mathcal{X}}\cap{\mathcal{D}}_{\bm{X}}^{\ell,p}$, i.e., the subset of $\mathcal{X}$ inside ${\mathcal{D}}_{\bm{X}}^{\ell,p}$. Further, the multi-index set $\mathcal{T}\subset\mathbb{N}^{2}$ at each step of the algorithm gathers all indices $(\ell,p)$ of unsplit subdomains. It thus denotes the terminal domains $\mathcal{D}_{\bm{X}}^{k}$, $k\in\mathcal{T}$. For visualization purposes, we show the first iterations of the algorithm for a two-dimensional example in Figure 2.

1. Initialization:
   (a) $\mathcal{D}_{\bm{X}}^{0,1}=\mathcal{D}_{\bm{X}}$
   (b) Sample from the prior distribution $\mathcal{X}=\{\bm{x}^{(1)},\dots,\bm{x}^{(N_{\mathrm{ref}})}\}$
   (c) Calculate the truncated expansion $\widehat{\mathcal{R}}_{S}^{0,1}(\bm{X})$ of $\mathcal{L}(\bm{X})$ in the full domain using $\mathcal{X}^{0,1}$, retrieve its approximation error $\mathcal{E}^{0,1}$ and initialize $\mathcal{T}=\{(0,1)\}$
   (d) ${\mathcal{R}}^{1}(\bm{X})=\mathcal{L}(\bm{X})-\widehat{\mathcal{R}}_{S}^{0,1}(\bm{X})$

2. For $(\ell,p)=\operatorname*{arg\,max}_{k\in\mathcal{T}}\mathcal{E}^{k}$:
   (a) Split the current subdomain $\mathcal{D}_{\bm{X}}^{\ell,p}$ into $2$ sub-parts $\mathcal{D}_{\bm{X}}^{\ell+1,\{s_{1},s_{2}\}}$ and update $\mathcal{T}$
   (b) For each split $s=\{s_{1},s_{2}\}$:
      i. If $|\mathcal{X}^{\ell+1,s}|<N_{\mathrm{ref}}$ and $N_{\mathrm{ref}}-|\mathcal{X}^{\ell+1,s}|<N_{\mathrm{ED}}-|\mathcal{X}|$:
         A. Enrich the sample $\mathcal{X}$ with $N_{\mathrm{ref}}-|\mathcal{X}^{\ell+1,s}|$ new points inside $\mathcal{D}_{\bm{X}}^{\ell+1,s}$
      ii. If $|\mathcal{X}^{\ell+1,s}|=N_{\mathrm{ref}}$:
         A. Create the truncated expansion $\widehat{\mathcal{R}}_{S}^{\ell+1,s}(\bm{X}^{\ell+1,s})$ of the residual ${\mathcal{R}}^{\ell+1}(\bm{X}^{\ell+1,s})$ in the current subdomain using $\mathcal{X}^{\ell+1,s}$
         B. Update the residual in the current subdomain: ${\mathcal{R}}^{\ell+2}(\bm{X}^{\ell+1,s})={\mathcal{R}}^{\ell+1}(\bm{X}^{\ell+1,s})-\widehat{\mathcal{R}}_{S}^{\ell+1,s}(\bm{X}^{\ell+1,s})$
      iii. Retrieve the approximation error $\mathcal{E}^{\ell+1,s}$ from Eq. (23)
   (c) If no new expansions were created, terminate the algorithm; otherwise go back to 2

3. Termination:
   (a) Return the full sequence of $\mathcal{D}_{\bm{X}}^{\ell,p}$ and $\widehat{\mathcal{R}}_{S}^{\ell,p}(\bm{X}^{\ell,p})$ needed to compute Eq. (15)

The updating of the multi-index set in Step 2(a) refers to removing the current index $(\ell,p)$ from the set and adding to it the newly created indices $(\ell+1,s_{1})$ and $(\ell+1,s_{2})$.
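The sketch below illustrates one possible implementation of this greedy loop in the unit quantile space, under the assumptions stated in its docstring. The helpers `fit_local_pce` and `choose_split_direction` are hypothetical (the latter standing for Eqs. (24)-(25)); the termination here is purely budget-based, a simplification of step 2(c).

```python
import numpy as np

def adaptive_ssle(likelihood, M, N_ref, N_ED, fit_local_pce, choose_split_direction, seed=None):
    """Greedy SSLE construction in the quantile space [0, 1]^M (Section 4.2.3).

    `likelihood` maps an (n, M) array of quantile-space points to likelihood values;
    `fit_local_pce(U_loc, R_loc, bounds)` returns (predictor, eps_loo);
    `choose_split_direction(U_loc, R_loc, bounds)` returns the split dimension.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((N_ref, M))                      # initial experimental design, step 1(b)
    R = likelihood(U)                               # level-0 residual equals the likelihood

    expansions = []                                 # (bounds, predictor) pairs entering Eq. (15)
    terminal = []                                   # terminal domains with error estimators

    def in_domain(bounds, pts):
        return np.all((pts >= bounds[0]) & (pts < bounds[1]), axis=1)

    def expand(dom):
        idx = in_domain(dom['bounds'], U)
        predictor, eps_loo = fit_local_pce(U[idx], R[idx], dom['bounds'])
        expansions.append((dom['bounds'], predictor))
        R[idx] -= predictor(U[idx])                 # residual update, step 2(b)ii.B
        dom['eps_loo'] = eps_loo
        # prior mass in quantile space is simply the subdomain volume
        dom['error'] = eps_loo * np.prod(dom['bounds'][1] - dom['bounds'][0])   # Eq. (23)
        terminal.append(dom)

    expand({'bounds': np.vstack([np.zeros(M), np.ones(M)])})   # steps 1(c)-(d)

    while terminal and U.shape[0] < N_ED:
        dom = max(terminal, key=lambda d: d['error'])           # refinement domain, step 2
        terminal.remove(dom)
        lo, hi = dom['bounds']
        mask = in_domain(dom['bounds'], U)
        d = choose_split_direction(U[mask], R[mask], dom['bounds'])
        mid = 0.5 * (lo[d] + hi[d])                              # equal prior-mass split
        for lo_d, hi_d in ((lo[d], mid), (mid, hi[d])):          # step 2(a)
            c_lo, c_hi = lo.copy(), hi.copy()
            c_lo[d], c_hi[d] = lo_d, hi_d
            child = {'bounds': np.vstack([c_lo, c_hi])}
            n_loc = int(in_domain(child['bounds'], U).sum())
            n_new = min(N_ref - n_loc, N_ED - U.shape[0])
            if n_new > 0:                                        # enrichment, step 2(b)i
                U_new = c_lo + (c_hi - c_lo) * rng.random((n_new, M))
                R_new = likelihood(U_new)
                for bnds, predictor in expansions:               # residual of the new points
                    sel = in_domain(bnds, U_new)
                    R_new[sel] -= predictor(U_new[sel])
                U, R = np.vstack([U, U_new]), np.concatenate([R, R_new])
            if int(in_domain(child['bounds'], U).sum()) >= N_ref:
                expand(child)                                    # step 2(b)ii
            else:                                                # reuse the parent error, Eq. (23)
                child['eps_loo'] = dom['eps_loo']
                child['error'] = dom['eps_loo'] * np.prod(c_hi - c_lo)
                terminal.append(child)
    return expansions
```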

(a) Initialization
(b) First iteration
(c) 3rd iteration
Figure 2: Graphical representation of the first steps of the adaptive SSLE algorithm described in Section 4.2.3 for a two-dimensional problem with independent prior distribution. Upper row: partitioning in the quantile space; lower row: partitioning in the unbounded real space with $\pi(\bm{x})$ contour lines in dashed blue. Red dots show the adaptive experimental design, which has a constant size of $N_{\mathrm{ref}}=5$ in each created subdomain. The terminal domains $\mathcal{T}$ are highlighted in orange. The splitting direction in each subdomain is determined randomly in this example.

4.2.4 Convergence of the adaptive SSLE algorithm

Convergence of the original SSE algorithm is guaranteed in the mean-square sense by the spectral convergence in each subdomain (Marelli et al., 2021). This convergence property directly applies to the SSLE of any likelihood function since, because a finite maximum likelihood value exists, likelihood functions fulfil the finite variance condition (Nagel and Sudret, 2016).

This result cannot be directly extended to the present adaptive (greedy) setting, because a combination of parameters might in general lead to a whole subdomain not being explored further. This can only happen, however, if the error estimator $\mathcal{E}^{k}$ severely underestimates the actual error in a terminal domain. The choice of the leave-one-out cross-validation error estimator (see Eq. (23)), with its tendency to avoid overfitting, significantly reduces the probability of this scenario. Further investigations in this direction are ongoing.

5 Case studies

To showcase the effectiveness of the proposed SSLE approach, we present three case studies with different types of likelihood complexity: (i) a one-dimensional vibration problem with a bimodal posterior, (ii) a six-dimensional heat transfer problem that exhibits high posterior concentration (i.e., a highly informative likelihood) and (iii) a 62-dimensional diffusion problem with low active dimensionality that models concentration-driven diffusion in a one-dimensional domain.

For all case studies, we adopt the adaptive sparse PCE approach based on least-angle regression (LARS) developed in Blatman and Sudret (2011), through its numerical implementation in UQLab (Marelli and Sudret, 2014, 2019). Each $\widehat{\mathcal{R}}_{S}^{k}$ is therefore a degree- and $q$-norm-adaptive polynomial chaos expansion. We further introduce a rank truncation of $r=2$, i.e., we limit the maximum number of input interactions (Marelli and Sudret, 2019) to two variables at a time. The truncation set for each spectral expansion (Eq. (8)) thus reads:

\mathcal{A}^{M,p,q,r}=\{\bm{\alpha}\in\mathbb{N}^{M}:||\bm{\alpha}||_{q}\leq p,\,||\bm{\alpha}||_{0}\leq r\},   (26)

where

||\bm{\alpha}||_{q}=\left(\sum_{i=1}^{M}\alpha_{i}^{q}\right)^{\frac{1}{q}},\;q\in(0,1];\qquad||\bm{\alpha}||_{0}=\sum_{i=1}^{M}1_{\{\alpha_{i}>0\}}.   (27)

The $q$-norm is adaptively increased in the range $q\in\{0.5,\dots,0.8\}$, while the maximum polynomial degree is adaptively increased in the interval $\{0,1,\dots,p\}$, with a maximum degree of $p=20$ for case studies (i) and (ii) and $p=3$ for case study (iii), due to its high dimensionality.
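For illustration, the snippet below enumerates the truncation set of Eq. (26) by brute force; this is only practical for small input dimension $M$ and is our own sketch, not the adaptive basis-selection scheme of the LARS implementation.

```python
import itertools
import numpy as np

def truncation_set(M, p, q, r):
    """Multi-indices alpha with ||alpha||_q <= p and at most r non-zero entries, Eq. (26)."""
    A = []
    # brute-force enumeration over all alpha with componentwise degree <= p
    for alpha in itertools.product(range(p + 1), repeat=M):
        a = np.asarray(alpha, dtype=float)
        if np.count_nonzero(a) <= r and np.sum(a ** q) ** (1.0 / q) <= p:
            A.append(alpha)
    return A
```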

In case studies (ii) and (iii), the performance of SLE (Nagel and Sudret, 2016), the original non-adaptive SSLE (Marelli et al., 2021) and the proposed adaptive SSLE approach presented in Section 4.2 is compared. The comparison is omitted for case study (i), because only adaptive SSLE succeeded in solving the problem. For clarity, we henceforth abbreviate the adaptive SSLE algorithm as adSSLE.

To simplify the comparison, the same partitioning strategy used for adSSLE (Section 4.2.2) was also employed for the non-adaptive SSLE approach. Likewise, the same experimental designs were used for the non-adaptive SSLE and SLE approaches. Finally, the same parameter $N_{\mathrm{ref}}$ was used to define the enrichment samples in adSSLE and the termination criterion in non-adaptive SSLE.

Because the considered case studies do not admit a closed form solution, we validate the algorithm instead through MCMC reference solutions with long chain lengths to produce samples from the posterior distributions. In this respect, identifiability of the “true” underlying data-generating model, while clearly important in inverse problems in general, is not central in the present discussion. As a proxy for identifiability, however, we consider several different case studies with more or less informative (peaked) likelihood and posterior distributions.

To assess the performance of the three algorithms considered, we define an error measure that allows us to quantitatively compare the similarity of the SSLE, adSSLE and SLE solutions with the reference MCMC solution. This comparison is inherently difficult, as a sampling-based approach (MCMC) needs to be compared to a functional approximation (SSLE, adSSLE, SLE). We proceed by comparing the univariate posterior marginals, available analytically in SSLE, adSSLE and SLE (see Eq. (13) and Eq. (20)), to the reference posterior marginals estimated with kernel density estimation (KDE, Wand and Jones (1995)) from the MCMC sample. Denoting by $\hat{\pi}_{i}(\kappa_{i}|{\mathcal{Y}})$ the SSLE, adSSLE or SLE approximations and by $\pi_{i}(\kappa_{i}|{\mathcal{Y}})$ the reference solution, we define the following error measure:

\eta\stackrel{\text{def}}{=}\frac{1}{M}\sum_{i=1}^{M}\operatorname{JSD}\left(\hat{\pi}_{i}(\kappa_{i}|{\mathcal{Y}})\,||\,\pi_{i}(\kappa_{i}|{\mathcal{Y}})\right),   (28)

where $M$ is the dimensionality of the problem and $\operatorname{JSD}$ is the Jensen-Shannon divergence (Lin, 1991), a symmetric and bounded ($[0,\log(2)]$) distance measure for probability distributions based on the Kullback-Leibler divergence.
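A minimal sketch of this error measure, evaluating the Jensen-Shannon divergence on a common grid of the marginal densities, is given below; the grid-based discretization and the function names are our own illustrative choices.

```python
import numpy as np

def jensen_shannon_divergence(p, q, dx):
    """JSD of two univariate densities sampled on a common grid with spacing dx."""
    p = p / np.sum(p * dx)                       # renormalize on the grid
    q = q / np.sum(q * dx)
    m = 0.5 * (p + q)
    def kld(a, b):                               # Kullback-Leibler divergence on the grid
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]) * dx)
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

def marginal_error(post_marginals, ref_marginals, dx):
    """eta = average JSD over the M univariate posterior marginals, Eq. (28)."""
    return np.mean([jensen_shannon_divergence(p, q, dx)
                    for p, q in zip(post_marginals, ref_marginals)])
```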

The purpose of the error measure $\eta$ is to allow for a fair comparison between the different methods investigated. It is not a practical measure for engineering applications, because it relies on the availability of a reference solution and its magnitude does not have a clear quantitative interpretation. However, it is considerably more comprehensive than a pure moment-based error measure: because it is averaged over all $M$ marginals, it encapsulates the approximation accuracy of all univariate posterior marginals in a single scalar value.

As all algorithms (SSLE, adSSLE, SLE) depend on randomly chosen experimental designs, we produce 20 replications for case study (ii) and 5 replications for case study (iii) by running them multiple times. The computational cost of all examples is given in terms of the number of likelihood evaluations $N_{\mathrm{ED}}$, as these dominate the total computational cost in any engineering setting. All computations shown were carried out on a standard laptop, with the longest single runs taking $\approx 1$ h, which includes both forward model evaluations and the overall algorithmic overhead.

5.1 1-dimensional vibration problem

In this first case study, the goal is the inference of a single unknown parameter with a multimodal posterior distribution. This problem is difficult to solve with standard MCMC methods due to the presence of a probability valley, i.e. low probability region, between essentially disjoint posterior peaks. It also serves as an illustrative example of how the proposed adaptive algorithm constructs an adSSLE. The problem is fabricated and uses synthetic data, but is presented in the context of a relevant engineering problem.

Consider the oscillator displayed in Figure 3, subject to harmonic (i.e., sinusoidal) excitation. Assume that the prior information about its stiffness $X\stackrel{\text{def}}{=}k$ is that it follows a lognormal distribution with $\mu=0.8~\mathrm{N/m}$ and $\sigma=0.1~\mathrm{N/m}$. Its true value shall be determined using measurements of the oscillation amplitude at the location of the mass $m$. The known properties of the oscillator system are the oscillator mass $m=1~\mathrm{kg}$, the excitation frequency $\omega=1~\mathrm{rad/s}$ and the viscous damping coefficient $c=0.1~\mathrm{Ns/m}$. The oscillation amplitude is measured in five independent oscillation events and normalized by the forcing amplitude, yielding the measured amplitude ratios ${\mathcal{Y}}=\{9.01,8.67,8.84,9.22,8.54\}$.

Figure 3: 1-dimensional vibration problem: sketch of the linear oscillator.

This problem is well known in mechanics and in the linear case (i.e., assuming small deformations and linear material behavior) can be solved analytically with the amplitude of the frequency response function. This function returns the ratio between the steady state amplitude of a linear oscillator and the amplitude of its excitation. It is given by

{\mathcal{M}}(X)=\frac{m\omega^{2}}{\sqrt{(X-m\omega^{2})^{2}+(c\omega)^{2}}}.   (29)

We assume a discrepancy model with known discrepancy standard deviation $\sigma$. In conjunction with the available measurements ${\mathcal{Y}}$, this leads to the following likelihood function:

{\mathcal{L}}(x;\,{\mathcal{Y}})=\prod_{i=1}^{5}{\mathcal{N}}(y^{(i)}|{\mathcal{M}}(x),\sigma^{2}).   (30)
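A minimal sketch of this forward model and likelihood is given below. The numerical value of the discrepancy standard deviation `sigma` is an assumption for illustration only; the text above merely states that it is known.

```python
import numpy as np
from scipy.stats import norm

m, omega, c = 1.0, 1.0, 0.1                      # known oscillator properties
Y = np.array([9.01, 8.67, 8.84, 9.22, 8.54])     # measured amplitude ratios

def amplitude_ratio(k):
    """Frequency response amplitude of the linear oscillator, Eq. (29)."""
    return m * omega**2 / np.sqrt((k - m * omega**2)**2 + (c * omega)**2)

def likelihood(k, sigma=0.2):
    """Gaussian likelihood of the five measurements, Eq. (30); sigma is assumed."""
    return np.prod(norm.pdf(Y, loc=amplitude_ratio(k), scale=sigma))
```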

We employ the adSSLE algorithm to approximate this likelihood function with $N_{\mathrm{ref}}=10$. A few iterations from the solution process are shown in Figure 4. The top plots show the subdomains ${\mathcal{D}}_{X}^{\ell,p}$ constructed at each refinement step, highlighting the terminal domains ${\mathcal{T}}$. The middle plots display the residual between the true likelihood and the approximation at the current iteration, as well as the adaptively chosen experimental design $\mathcal{X}$. The bottom plots display the target likelihood function and its current approximation.

The initial global approximation of the first iteration in Figure 4(a) is a constant polynomial based on the initial experimental design. By the third iteration, the algorithm has identified the subdomain ${\mathcal{D}}_{X}^{2,2}$ as the one of interest and proceeds to refine it in subsequent steps. By the 8th iteration, both likelihood peaks have been identified. Finally, by the 10th iteration in Figure 4(d), both likelihood peaks are approximated well by the adSSLE approach.

The last iteration shows how the algorithm splits domains and adds new sample points. There is a clear clustering of subdomains and sample points near the likelihood peaks at $X=0.95$ and $X=1.05$.

(a) 1st iteration, $N_{\mathrm{ED}}=10$
(b) 3rd iteration, $N_{\mathrm{ED}}=30$
(c) 8th iteration, $N_{\mathrm{ED}}=80$
(d) 10th iteration, $N_{\mathrm{ED}}=100$
Figure 4: One-dimensional vibration problem: illustration of the adSSLE algorithm approximating the likelihood function ${\mathcal{L}}$.

The results from Eq. (22) show that, without further computations, it would be possible to directly extract the posterior moments by post-processing the SSLE coefficients. In the present bimodal case, however, the posterior moments are not very meaningful. Instead, the available posterior approximation gives a full picture of the inferred parameter $X|{\mathcal{Y}}$. It is shown together with the true posterior and the original prior distribution in Figure 5.

Figure 5: One-dimensional vibration problem: Comparison of the true multimodal posterior and its adSSLE-based approximation.

For this case study, non-adaptive experimental design approaches like the standard SSLE (Marelli et al., 2021) and the original SLE algorithm (Nagel and Sudret, 2016) will almost surely fail for the considered experimental design of N_{\mathrm{ED}}=100. In numerous trial runs these approaches did not manage to accurately reconstruct the likelihood function due to a lack of informative samples near the likelihood peaks.

5.2 Moderate-dimensional heat transfer problem

This case study is a complex engineering problem that can be modified to exhibit high posterior concentration in the prior domain. It was originally presented in Nagel and Sudret (2016) and solved there using SLE. We again solve the same problem with SSLE and compare the performance of SLE (Nagel and Sudret, 2016), the original non-adaptive SSLE (Marelli et al., 2021) and the proposed adSSLE approach presented in Section 4.2. To investigate the performance of the algorithm in the case of high posterior concentrations (e.g., due to a large dataset), two instances of the problem with different discrepancy parameters are investigated.

Consider the diffusion-driven stationary heat transfer problem sketched in Figure 6(a). It models a 2D plate with a background matrix of constant thermal conductivity \kappa_{0} and 6 inclusions with conductivities \bm{\kappa}\stackrel{\text{def}}{=}(\kappa_{1},\dots,\kappa_{6})^{\intercal}. The diffusion-driven steady-state heat distribution is described by a heat equation in Euclidean coordinates \bm{r}\stackrel{\text{def}}{=}(r_{1},r_{2})^{\intercal} of the form

\nabla\cdot(\kappa(\bm{r})\nabla\tilde{T}(\bm{r}))=0, (31)

where the thermal conductivity field is denoted by \kappa and the temperature field by \tilde{T}. The boundary conditions of the plate are given by no-heat-flux Neumann boundary conditions on the left and right sides (\partial\tilde{T}/\partial r_{1}=0), a Neumann boundary condition on the bottom (\kappa_{0}\,\partial\tilde{T}/\partial r_{2}=q_{2}) and a Dirichlet boundary condition with temperature \tilde{T}_{1} on the top.

We employ a finite element (FE) solver to solve the weak form of Eq. (31) by discretizing the domain into approximately 10^{5} triangular elements. A sample solution returned by the FE solver is shown in Figure 6(b).

(a) Problem sketch
(b) Steady state solution
Figure 6: Moderate-dimensional heat transfer problem: Model setup and exemplary solution.

In this example we intend to infer the thermal conductivities \bm{\kappa} of the inclusions. We assume the same problem constants as in Nagel and Sudret (2016) (i.e., q_{2}=2{,}000~\text{W/m}^{2}, \tilde{T}_{1}=200~\text{K}, \kappa_{0}=45~\text{W/m/K}). The forward model {\mathcal{M}} takes as an input the conductivities of the inclusions \bm{\kappa}, solves the finite element problem and returns the steady state temperature \tilde{\bm{T}}\stackrel{\text{def}}{=}(\tilde{T}_{1},\dots,\tilde{T}_{20})^{\intercal} at the measurement points, i.e., {\mathcal{M}}:\bm{\kappa}\mapsto\tilde{\bm{T}}.

To solve the inverse problem, we assume a multivariate lognormal prior distribution with independent marginals on the inclusion conductivities, i.e., \pi(\bm{\kappa})=\prod_{i=1}^{6}\mathcal{LN}(\kappa_{i}\,|\,\mu=30~\text{W/m/K},\sigma=6~\text{W/m/K}). We further assume an additive Gaussian discrepancy model, which yields the likelihood function

{\mathcal{L}}(\bm{\kappa};{\mathcal{Y}})=\frac{1}{(2\pi\sigma^{2})^{10}}\exp\left(-\frac{1}{2\sigma^{2}}\left(\bm{T}-{\mathcal{M}}(\bm{\kappa})\right)^{\intercal}\left(\bm{T}-{\mathcal{M}}(\bm{\kappa})\right)\right), (32)

with a discrepancy standard deviation of \sigma.

As measurements, we generate one temperature field with \hat{\bm{\kappa}}\stackrel{\text{def}}{=}(32,36,20,24,40,28)^{\intercal}~\text{W/m/K} and collect its values at 20 points indicated by black dots in Figure 6(a). We then perturb these temperature values with additive Gaussian noise and use them as the inversion data {\mathcal{Y}}\stackrel{\text{def}}{=}\bm{T}=(T_{1},\dots,T_{20})^{\intercal}.
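The data-generation step and the logarithm of the likelihood in Eq. (32) can be sketched as follows; heat_transfer_model is a hypothetical placeholder that would wrap the FE solver (or a surrogate of it), and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

kappa_true = np.array([32.0, 36.0, 20.0, 24.0, 40.0, 28.0])   # W/m/K
sigma = 0.25                                                   # discrepancy std in K

def heat_transfer_model(kappa):
    """Hypothetical wrapper around the FE solver: kappa -> 20 sensor temperatures."""
    raise NotImplementedError("call the FE model or a surrogate here")

# Synthetic data: noise-free model response perturbed by additive Gaussian noise
T_data = heat_transfer_model(kappa_true) + sigma * rng.normal(size=20)

def log_likelihood(kappa):
    """Logarithm of the Gaussian likelihood in Eq. (32) for the 20 measurements."""
    resid = T_data - heat_transfer_model(kappa)
    return -0.5 * resid @ resid / sigma**2 - 10.0 * np.log(2.0 * np.pi * sigma**2)
```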

We look at two instances of this problem that differ only by the discrepancy parameter \sigma from Eq. (32). The prior model response has a standard deviation of approximately 0.3~\text{K}, depending on the measurement point T_{i}. We therefore solve the problem first with a large value \sigma=0.25~\text{K} and second with a small value \sigma=0.1~\text{K}. As the discrepancy standard deviation determines how peaked the likelihood function is, the first problem has a likelihood function with a much wider support and is in turn significantly easier to solve than the second one. Note that, in practice, the peakedness of the likelihood function is increased either by a smaller discrepancy standard deviation or by the inclusion of additional experimental data.

To monitor the dependence of the algorithms on the number of likelihood evaluations, we solve both problems with a set of maximum likelihood evaluations N_{\mathrm{ED}}=\{1{,}000;\,2{,}000;\,5{,}000;\,10{,}000;\,30{,}000\}. The number of refinement samples is set to N_{\mathrm{ref}}=1{,}000.

As a benchmark, we use reference posterior samples generated by the affine-invariant ensemble sampler MCMC algorithm (Goodman and Weare, 2010) with 30{,}000 steps and 50 parallel chains, requiring a total of N_{\mathrm{ED}}=1.5\cdot 10^{6} likelihood evaluations. Based on numerous heuristic convergence tests and due to the large number of MCMC steps, the resulting samples can be considered to accurately represent the true posterior distributions.
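A benchmark of this kind can be reproduced, for instance, with the emcee package, which implements the affine-invariant ensemble sampler of Goodman and Weare (2010). The sketch below reuses the hypothetical log_likelihood from the previous snippet; the burn-in length and the walker initialization are assumptions, not the settings used here.

```python
import numpy as np
import emcee

# Underlying Gaussian parameters of the lognormal prior (mean 30, std 6 W/m/K)
zeta = np.sqrt(np.log(1.0 + (6.0 / 30.0) ** 2))
lam = np.log(30.0) - 0.5 * zeta**2

def log_prior(kappa):
    """Log-density of the independent lognormal priors on the six conductivities."""
    if np.any(kappa <= 0.0):
        return -np.inf
    z = (np.log(kappa) - lam) / zeta
    return float(np.sum(-0.5 * z**2 - np.log(kappa * zeta * np.sqrt(2.0 * np.pi))))

def log_posterior(kappa):
    lp = log_prior(kappa)
    return lp + log_likelihood(kappa) if np.isfinite(lp) else -np.inf

ndim, nwalkers, nsteps = 6, 50, 30_000
p0 = np.exp(lam + zeta * np.random.default_rng(2).normal(size=(nwalkers, ndim)))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior)
sampler.run_mcmc(p0, nsteps, progress=True)
reference_samples = sampler.get_chain(discard=5_000, flat=True)  # discard burn-in
```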

The results of the analyses are summarized in Figure 7, where the error measure \eta is plotted against the number of likelihood evaluations for the large and small standard deviation case. For the large discrepancy standard deviation case, both SSLE approaches clearly outperform standard SLE w.r.t. the error measure \eta. This is most significant at mid-range experimental designs (N_{\mathrm{ED}}=5{,}000;\,10{,}000), where SLE cannot reach the required high polynomial degrees and fails to accurately approximate the likelihood function. At larger experimental designs SLE catches up to non-adaptive SSLE but is still outperformed by the proposed adSSLE approach. The real strength of the adaptive algorithm shows in the case of a small discrepancy standard deviation, where the limitations of fixed experimental designs become obvious. When the likelihood function is nonzero only in a small subdomain of the prior, the global SLE and non-adaptive SSLE approaches will fail in practice because of the insufficient number of samples placed in the informative regions. The adSSLE approach, however, works very well in these types of problems. It manages to identify the regions of interest and produces a likelihood approximation that accurately reproduces the posterior marginals.

(a) Large discrepancy, \sigma=0.25~\text{K}
(b) Small discrepancy, \sigma=0.1~\text{K}
Figure 7: Moderate-dimensional heat transfer problem: Convergence of the \eta error measure (Eq. (28)) as a function of the experimental design size N_{\mathrm{ED}} in 20 replications for SLE, SSLE with a static experimental design and the proposed adSSLE approach. We display the two discrepancy standard deviation cases \sigma=\{0.25,0.1\}~\text{K}.

Tables 1 and 2 show the convergence of the adSSLE posterior moment estimates (mean and standard deviation) to the reference solution for a single run. In brackets next to the moment estimates \xi, the relative error \epsilon\stackrel{\text{def}}{=}|\xi_{\mathrm{MCMC}}-\xi_{\mathrm{SSLE}}|/\xi_{\mathrm{MCMC}} is also shown. Because the SSLE approximation is not strictly positive, one variance estimate computed with Eq. (22) is negative and is therefore omitted from Table 2.

Table 1: Moderate-dimensional heat transfer problem: adSSLE results with large discrepancy standard deviation \sigma=0.25~\text{K}. Relative errors w.r.t. the MCMC reference solution are shown in brackets.
{\mathbb{E}}\left[\kappa_{i}|{\mathcal{Y}}\right] | \kappa_{1} | \kappa_{2} | \kappa_{3} | \kappa_{4} | \kappa_{5} | \kappa_{6}
adSSLE, N_{\mathrm{ED}}=1{,}000 | 30.00 (0.7%) | 30.00 (7.1%) | 19.51 (5.8%) | 30.01 (7.1%) | 35.39 (2.9%) | 30.00 (14.6%)
N_{\mathrm{ED}}=2{,}000 | 31.74 (6.5%) | 29.80 (7.7%) | 20.89 (0.9%) | 30.00 (7.1%) | 35.56 (2.4%) | 28.30 (8.1%)
N_{\mathrm{ED}}=5{,}000 | 29.70 (0.4%) | 31.74 (1.7%) | 20.21 (2.4%) | 30.21 (6.5%) | 36.45 (0.0%) | 27.58 (5.4%)
N_{\mathrm{ED}}=10{,}000 | 29.96 (0.5%) | 32.41 (0.3%) | 19.90 (3.9%) | 31.55 (2.3%) | 36.57 (0.3%) | 26.20 (0.1%)
N_{\mathrm{ED}}=30{,}000 | 29.92 (0.4%) | 32.40 (0.3%) | 19.94 (3.7%) | 31.96 (1.1%) | 36.53 (0.2%) | 26.15 (0.1%)
MCMC | 29.81 | 32.30 | 20.71 | 32.31 | 36.45 | 26.18
\sqrt{{\rm Var}\left[\kappa_{i}|{\mathcal{Y}}\right]}
adSSLE, N_{\mathrm{ED}}=1{,}000 | 6.00 (86.5%) | 6.01 (41.7%) | 6.00 (142.0%) | 5.99 (18.1%) | 4.44 (15.9%) | 5.99 (97.1%)
N_{\mathrm{ED}}=2{,}000 | 6.09 (89.3%) | 4.78 (12.9%) | 3.44 (38.6%) | 6.00 (18.1%) | 4.34 (13.5%) | 5.38 (76.9%)
N_{\mathrm{ED}}=5{,}000 | 2.19 (32.0%) | 5.61 (32.4%) | 3.09 (24.4%) | 6.06 (19.4%) | 3.62 (5.5%) | 4.78 (57.2%)
N_{\mathrm{ED}}=10{,}000 | 2.62 (18.7%) | 4.40 (3.7%) | 2.82 (13.6%) | 5.15 (1.5%) | 3.96 (3.6%) | 3.14 (3.4%)
N_{\mathrm{ED}}=30{,}000 | 2.40 (25.5%) | 4.38 (3.4%) | 2.70 (8.8%) | 5.23 (3.1%) | 3.87 (1.2%) | 3.19 (5.0%)
MCMC | 3.22 | 4.24 | 2.48 | 5.08 | 3.83 | 3.04
Table 2: Moderate-dimensional heat transfer problem: adSSLE results with small discrepancy standard deviation \sigma=0.1~\text{K}. Relative errors w.r.t. the MCMC reference solution are shown in brackets. The field marked with an asterisk (*) indicates a negative variance estimate.
{\mathbb{E}}\left[\kappa_{i}|{\mathcal{Y}}\right] | \kappa_{1} | \kappa_{2} | \kappa_{3} | \kappa_{4} | \kappa_{5} | \kappa_{6}
adSSLE, N_{\mathrm{ED}}=1{,}000 | 30.00 (3.8%) | 30.00 (10.7%) | 30.00 (62.2%) | 30.00 (5.7%) | 30.00 (22.8%) | 30.00 (17.6%)
N_{\mathrm{ED}}=2{,}000 | 30.00 (3.8%) | 34.58 (2.9%) | 30.00 (62.2%) | 30.00 (5.7%) | 30.00 (22.8%) | 30.00 (17.6%)
N_{\mathrm{ED}}=5{,}000 | 30.00 (3.8%) | 34.70 (3.2%) | 25.30 (36.8%) | 30.00 (5.7%) | 37.28 (4.0%) | 30.00 (17.6%)
N_{\mathrm{ED}}=10{,}000 | 30.00 (3.8%) | 34.71 (3.3%) | 18.92 (2.3%) | 30.00 (5.7%) | 37.87 (2.5%) | 25.44 (0.3%)
N_{\mathrm{ED}}=30{,}000 | 31.16 (0.0%) | 34.28 (2.0%) | 18.57 (0.4%) | 31.37 (1.4%) | 38.68 (0.4%) | 25.56 (0.2%)
MCMC | 31.17 | 33.61 | 18.49 | 31.83 | 38.84 | 25.51
\sqrt{{\rm Var}\left[\kappa_{i}|{\mathcal{Y}}\right]}
adSSLE, N_{\mathrm{ED}}=1{,}000 | 6.00 (268.3%) | 6.00 (149.7%) | 6.00 (362.1%) | 6.00 (69.4%) | 5.99 (202.8%) | 5.99 (264.7%)
N_{\mathrm{ED}}=2{,}000 | 6.00 (268.5%) | 4.53 (88.9%) | 6.00 (362.2%) | 6.00 (69.5%) | 6.00 (203.1%) | 6.00 (265.2%)
N_{\mathrm{ED}}=5{,}000 | 6.00 (268.5%) | 4.43 (84.3%) | 2.87 (120.8%) | 6.00 (69.5%) | 4.29 (116.8%) | 6.00 (265.2%)
N_{\mathrm{ED}}=10{,}000 | 4.49 (176.0%) | (*) | 1.84 (42.1%) | 2.58 (27.2%) | 3.89 (96.6%) | 2.66 (61.8%)
N_{\mathrm{ED}}=30{,}000 | 1.20 (26.6%) | 3.60 (49.8%) | 1.62 (25.2%) | 2.66 (24.7%) | 3.06 (54.6%) | 1.74 (5.9%)
MCMC | 1.63 | 2.40 | 1.30 | 3.54 | 1.98 | 1.64

The full posterior marginals obtained from one run of adSSLE with N_{\mathrm{ED}}=30{,}000 are also compared to those of the reference MCMC and displayed in Figure 8. The individual plots show the univariate posterior marginals (i.e., \pi(x_{i}|\mathcal{Y})) on the main diagonal and the bivariate posterior marginals (i.e., \pi(\bm{x}_{ij}|\mathcal{Y})) in the i-th row and j-th column. It can be clearly seen that the posterior characteristics are very well captured. However, the adSSLE approach sometimes fails to accurately represent the tails of the distribution. This is especially obvious in the small discrepancy case in Figure 8(c), where the tail is sometimes cut off. We emphasize here that the SSLE marginals are obtained analytically, as 1D functions and 2D surfaces for the univariate and bivariate marginals, respectively. For the reference MCMC approach, on the other hand, they need to be approximated with histograms based on the available posterior sample.

(a) Large discrepancy, \sigma=0.25~\text{K}, adSSLE
(b) Large discrepancy, \sigma=0.25~\text{K}, MCMC
(c) Small discrepancy, \sigma=0.1~\text{K}, adSSLE
(d) Small discrepancy, \sigma=0.1~\text{K}, MCMC
Figure 8: Moderate-dimensional heat transfer problem: Comparison plots of the posterior distribution marginals computed from adSSLE (N_{\mathrm{ED}}=30{,}000) and MCMC (N_{\mathrm{ED}}=1.5\cdot 10^{6}). The prior marginals are shown in red.

5.2.1 Convergence of the posterior moments

In practical inference applications, posterior moments are often one of the main quantities of interest. An estimator of these moments is readily available at every refinement step of adSSLE through Eq. (22).

Tracking the evolution of the posterior moments throughout the adSSLE iterations can be used as a heuristic indicator of the convergence of the adSSLE algorithm. However, only the stability of the solution can be assessed, without guarantees on the bias. As an example, we now consider the large discrepancy problem and plot the evolution of the posterior mean and standard deviation for every \kappa_{i} as a function of the number of likelihood evaluations in Figure 9. It can be seen that after \sim 10{,}000 likelihood evaluations, most moment estimators achieve convergence to a value close to the reference solution. This plot also reveals a small bias of the {\mathbb{E}}\left[\kappa_{3}|{\mathcal{Y}}\right] and \sqrt{{\rm Var}\left[\kappa_{3}|{\mathcal{Y}}\right]} estimators, which was previously highlighted in Table 1.
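A simple way to implement such a stability check is to monitor the relative change of the moment estimates over the last few refinement steps; the tolerance and window length below are arbitrary choices, and this is a generic heuristic rather than the termination criterion of adSSLE.

```python
import numpy as np

def moments_stable(moment_history, tol=0.01, window=3):
    """True if every moment estimate changed by less than `tol` (relatively)
    over the last `window` refinement steps; `moment_history` has one row per step."""
    hist = np.asarray(moment_history)
    if hist.shape[0] < window + 1:
        return False
    ref = np.abs(hist[-1]) + 1e-12          # guard against near-zero estimates
    rel_change = np.abs(hist[-window - 1:-1] - hist[-1]) / ref
    return bool(np.all(rel_change < tol))
```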

(a) Mean
(b) Standard deviation
Figure 9: Moderate-dimensional heat transfer problem: Evolution of the posterior moment estimation for a typical run of the adSSLE algorithm on the large discrepancy problem. The thin lines show the MCMC reference solution.

5.2.2 Influence of N_{\mathrm{ref}}

The main hyperparameter of the proposed adSSLE algorithm is the number N_{\mathrm{ref}}, which corresponds to the number of sample points that are required at each PCE construction step (see Section 4.2). In Figure 10 we display the effect of different N_{\mathrm{ref}} values on the convergence in the small and large discrepancy problems.

(a) Large discrepancy, \sigma=0.25~\text{K}
(b) Small discrepancy, \sigma=0.1~\text{K}
Figure 10: Moderate-dimensional heat transfer problem: Convergence of the \eta error measure (Eq. (28)) as a function of the experimental design size N_{\mathrm{ED}} in 20 replications for the proposed adSSLE approach with different N_{\mathrm{ref}} parameters. We display the two discrepancy standard deviation cases \sigma=\{0.25,0.1\}~\text{K}.

N_{\mathrm{ref}} influences the accuracy of the two error estimators used inside the adSSLE algorithm: (i) the residual expansion error E_{\mathrm{LOO}} in Eq. (23) and (ii) the splitting error E_{\mathrm{split}} in Eq. (24).

Small values of N_{\mathrm{ref}} allow a crude likelihood approximation to be obtained quickly with limited experimental design sizes N_{\mathrm{ED}}, but this comes at the cost of lower convergence rates at larger N_{\mathrm{ED}}. This behaviour can be partially attributed to the deterioration of the residual expansion error estimator E_{\mathrm{LOO}} in Eq. (23). At small experimental design sizes, the overall number of terminal domains is relatively small and this effect is not as pronounced. At larger experimental designs and higher numbers of subdomains, however, the high variance of the error estimators can make it difficult to identify the subdomains with truly high errors.

Large values of N_{\mathrm{ref}} lead to slower initial convergence rates because of the smaller overall number of subdomains. The stability of the algorithm, however, is increased, because both error estimators have a lower variance and thereby allow the algorithm to more reliably identify the subdomains with truly high errors and to choose the split directions that maximize Eq. (25).

5.3 High-dimensional diffusion problem

The last case study shows that adSSLE for Bayesian model inversion remains feasible in high-dimensional problems with low effective dimensionality. The considered forward model is often used as a standard benchmark in UQ computations (Shin and Xiu, 2016; Fajraoui et al., 2017). It represents the 1D diffusion along a domain with coordinate \xi\in[0,1] given by the following boundary value problem:

-\frac{\partial}{\partial\xi}\left[\kappa(\xi)\frac{\partial u}{\partial\xi}(\xi)\right]=1,\quad\text{with}\quad\begin{cases}u(0)=0,\\ \frac{\partial u}{\partial\xi}(1)=1.\end{cases} (33)

The concentration field u can be used to describe any steady-state diffusion-driven process (e.g., heat diffusion, concentration diffusion, etc.). Assume that the diffusion coefficient \kappa is a lognormal random field given by \kappa(\xi,\omega)=\exp{(10+3g(\xi))}, where g is a standard normal stationary Gaussian random field with exponential autocorrelation function \rho(\xi,\xi^{\prime})=\exp{(-3\left|\xi^{\prime}-\xi\right|)}. Let g be approximated through a truncated Karhunen-Loève expansion

g(\xi)\approx\sum_{k=1}^{M}X_{k}e_{k}(\xi), (34)

with the pairwise uncorrelated random variables X_{k} denoting the field coefficients and the real-valued functions e_{k} obtained from the solution of the Fredholm equation for \rho (Ghanem and Spanos, 1991). The truncation order is set to M=62 to explain 99\% of the variance. Some realizations of the random field and the resulting concentrations are shown in Figure 11.
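Numerically, the Karhunen-Loève pairs in Eq. (34) can be obtained, for instance, by discretizing the exponential covariance on a grid and taking its eigendecomposition (a Nyström-type approximation). In the sketch below the modes e_k absorb the square roots of the eigenvalues, so that the coefficients X_k are standard normal as in the prior used later; the grid resolution is an assumption, and this is not necessarily the discretization used here.

```python
import numpy as np

n_grid, M = 500, 62
xi = np.linspace(0.0, 1.0, n_grid)
h = xi[1] - xi[0]

# Exponential autocovariance of the standard normal field g
C = np.exp(-3.0 * np.abs(xi[:, None] - xi[None, :]))

# Nystroem approximation of the Fredholm eigenvalue problem
eigval, eigvec = np.linalg.eigh(h * C)
order = np.argsort(eigval)[::-1][:M]
eigval, eigvec = eigval[order], eigvec[:, order]
e_k = eigvec / np.sqrt(h) * np.sqrt(eigval)      # modes e_k(xi) of Eq. (34)

def sample_g(x_coeff):
    """Realization of g for a vector of standard normal coefficients X_1..X_M."""
    return e_k @ x_coeff

g = sample_g(np.random.default_rng(3).normal(size=M))
kappa = np.exp(10.0 + 3.0 * g)                   # lognormal diffusion coefficient
print("retained variance fraction:", eigval.sum() / (h * np.trace(C)))
```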

In this example, the random vector of coefficients \bm{X}=(X_{1},\dots,X_{62}) shall be inferred using a single measurement of the diffusion field at u(\xi=1), given by {\mathcal{Y}}=0.16. The considered model therefore takes as an input a realization of that random vector and returns the diffusion field at \xi=1, i.e., {\mathcal{M}}:\bm{x}\mapsto u(1). It is expected that due to the single measurement location at \xi=1, only very little information about the parameters will be recovered in the inverse problem.

(a) Diffusion coefficient
(b) Concentration
Figure 11: High-dimensional diffusion problem: 5 independent realizations of \bm{X} and the resulting \kappa and u, with the model output u(1) highlighted with a circle.

We impose a standard normal prior on the field coefficients such that \pi(\bm{x})=\prod_{i=1}^{62}{\mathcal{N}}(x_{i}|0,1) and assume the standard additive discrepancy model with known discrepancy variance \sigma^{2}=10^{-6}. This yields the likelihood function

{\mathcal{L}}(\bm{x};{\mathcal{Y}})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{1}{2\sigma^{2}}\left(0.16-{\mathcal{M}}(\bm{x})\right)^{2}\right). (35)

We proceed to compare the performance of standard SLE, non-adaptive SSLE and the proposed adSSLE approach on this example. We solve the problem with a set of maximum likelihood evaluations N_{\mathrm{ED}}=\{2{,}000;\,4{,}000;\,6{,}000;\,8{,}000;\,10{,}000\}.

In the present high-dimensional case, it is necessary to set N_{\mathrm{ref}} to a relatively large number (N_{\mathrm{ref}}=2{,}000). At smaller values of N_{\mathrm{ref}}, the variance of the estimator of E_{\mathrm{split}} in Eq. (24) makes it difficult for the algorithm to correctly identify the splitting direction that maximizes Eq. (25).

The results of the three algorithms are compared to a reference MCMC solution obtained with the affine-invariant ensemble sampler (Goodman and Weare, 2010) with 100{,}000 steps and 100 parallel chains, at a total cost of 10^{7} likelihood evaluations.

To allow a quantitative comparison, we again use the error measure from Eq. (28) with M=62. It is plotted for the set of maximum likelihood evaluations in Figure 12. It is clear that both SSLE algorithms outperform SLE, while the adSSLE approach improves on SLE by an order of magnitude.

The overall small magnitude of the error \eta in Figure 12 can be attributed to the low active dimensionality of this problem. Despite its high nominal dimensionality (M=62), this problem in fact has only very few active dimensions, as the first few variables are significantly more important than the rest. In physical terms, very local fluctuations of the conductivity do not influence the output u(1), which results from an integration over these fluctuations. Therefore, the biggest change between prior and posterior distribution happens in the first few parameters (see also Table 3), while the other parameters remain unchanged by the updating procedure. This results in a small value of the Jensen-Shannon divergence for the inactive dimensions, which lowers the average value of \eta as defined in Eq. (28).
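For reference, a Jensen-Shannon-based error measure of this kind can be estimated from two posterior sample sets by histogram binning and averaging the per-dimension divergences, in the spirit of Eq. (28); in the paper one marginal is available analytically from SSLE, and the number of bins and the logarithm base below are assumptions.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discretized densities."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def eta(samples_a, samples_b, n_bins=100):
    """Average the per-marginal JS divergences of two sample sets of shape (n, M)."""
    n_dim = samples_a.shape[1]
    total = 0.0
    for i in range(n_dim):
        lo = min(samples_a[:, i].min(), samples_b[:, i].min())
        hi = max(samples_a[:, i].max(), samples_b[:, i].max())
        bins = np.linspace(lo, hi, n_bins + 1)
        p, _ = np.histogram(samples_a[:, i], bins=bins)
        q, _ = np.histogram(samples_b[:, i], bins=bins)
        total += js_divergence(p.astype(float), q.astype(float))
    return total / n_dim
```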

Figure 12: High-dimensional diffusion problem: Convergence of the \eta error measure (Eq. (28)) over five replications for SLE, SSLE with a static experimental design and the proposed adSSLE approach.

To highlight the results of one adSSLE algorithm instance with N_{\mathrm{ED}}=10{,}000, we display plots of the marginal posteriors in Figure 13. Due to the low active dimensionality of the problem, we focus on the first 3 parameters \{X_{1},X_{2},X_{3}\}. The remaining posterior parameters are not significantly influenced by the considered data. The comparative plots show a good agreement between adSSLE and the reference solution, especially w.r.t. the interaction between X_{1} and \{X_{2},X_{3}\}. As can be expected from the single measurement location, only the first parameter X_{1} is significantly influenced by the Bayesian updating procedure.

For the same instance, we also compute the first two posterior moments for all posterior marginals and compare them to the MCMC reference solution. The resulting values are presented in Table 3. Keeping in mind that the prior distribution is a multivariate standard normal distribution ({\mathbb{E}}\left[X_{i}\right]=0 and \sqrt{{\rm Var}\left[X_{i}\right]}=1 for i=1,\dots,62), it is obvious from this table that the data most significantly affects the first three parameters.

The adSSLE approach manages to accurately recover the first two posterior moments at the relatively low cost of N_{\mathrm{ED}}=10{,}000 likelihood evaluations. The average absolute error for {\mathbb{E}}\left[X_{i}|{\mathcal{Y}}\right] and \sqrt{{\rm Var}\left[X_{i}|{\mathcal{Y}}\right]} is approximately 0.02.

(a) adSSLE
(b) MCMC
Figure 13: High-dimensional diffusion problem: Comparison plots of the posterior distribution marginals for the first 3 parameters \{X_{1},X_{2},X_{3}\} computed from adSSLE (N_{\mathrm{ED}}=10{,}000) and MCMC (N_{\mathrm{ED}}=10^{7}). The prior marginals are shown in red.
Table 3: High-dimensional diffusion problem: Posterior mean and standard deviation for selected marginals X_{i}|{\mathcal{Y}},\,i=1,\dots,6,10,20,\dots,50,62. The values in brackets are computed from the MCMC reference solution. The prior is a multivariate standard normal distribution ({\mathbb{E}}\left[X_{i}\right]=0 and \sqrt{{\rm Var}\left[X_{i}\right]}=1).
 | X_{1} | X_{2} | X_{3} | X_{4} | X_{5} | X_{6}
{\mathbb{E}}\left[X_{i}|{\mathcal{Y}}\right] | -0.88 (-0.89) | 0.14 (0.14) | 0.07 (0.09) | 0.04 (-0.03) | -0.04 (-0.00) | -0.04 (0.03)
\sqrt{{\rm Var}\left[X_{i}|{\mathcal{Y}}\right]} | 0.16 (0.14) | 1.07 (1.04) | 0.97 (1.03) | 1.02 (1.00) | 1.00 (1.00) | 0.98 (1.03)
 | X_{10} | X_{20} | X_{30} | X_{40} | X_{50} | X_{62}
{\mathbb{E}}\left[X_{i}|{\mathcal{Y}}\right] | 0.00 (0.03) | 0.00 (0.00) | -0.02 (-0.01) | -0.03 (-0.01) | -0.00 (0.01) | 0.00 (-0.03)
\sqrt{{\rm Var}\left[X_{i}|{\mathcal{Y}}\right]} | 1.03 (1.01) | 1.03 (0.98) | 1.00 (1.00) | 1.04 (1.02) | 1.00 (0.99) | 1.00 (0.98)

6 Conclusions

Motivated by the recently proposed spectral likelihood expansions (SLE) framework for Bayesian model inversion presented in Nagel and Sudret (2016), we showed that the same analytical post-processing capabilities can be derived when the novel SSE approach from Marelli et al. (2021) is applied to likelihood functions, giving rise to the proposed SSLE approach. Because SSE is designed for models with complex local characteristics, it was expected to outperform SLE on practically relevant, highly localized likelihood functions. To further improve the SSLE performance, we introduced a novel adaptive sampling procedure and a modified partitioning strategy.

A few shortcomings of SSLE remain to be addressed in future work. Namely, the discontinuities at the subdomain boundaries may be a source of approximation error. Additionally, for the adSSLE algorithm, it is currently not possible to specify the optimal N_{\mathrm{ref}} parameter a priori. In light of the considerable influence of that parameter, as shown in Section 5.2.2, it might be necessary to adjust it adaptively, or to decouple it from the termination criterion.

Approximating likelihood functions through local PCEs prohibits the enforcement of strict positivity throughout the function domain. For visualization purposes this is not an issue, as negative predictions can simply be set to 0 in a post-processing step. When computing posterior expectations with Eq. (22), however, this can lead to erroneous results such as negative posterior variances. One way to enforce strict positivity is through an initial transformation of the likelihood function (e.g., expanding the log-likelihood, \log{{\mathcal{L}}}\approx\sum_{k\in{\mathcal{K}}}f_{k}^{\mathrm{PCE}}(\bm{X})). This is avoided in the present work because it comes at the loss of the desirable analytical post-processing properties.

Another class of problems that can cause instability when approximating the likelihood function directly (both for SLE and SSLE) is that of very sharp likelihood functions. This can occur, e.g., when many measurements are available and the discrepancy variance is small, causing the likelihood function to approach a Dirac delta function, which has a very dense spectral representation.

In problems with few active dimensions (e.g., the high-dimensional diffusion problem of Section 5.3), SSLE performs well because the likelihood is constant in many dimensions. The local sparse PCE construction can exploit this and could therefore handle up to hundreds of input dimensions. However, if the number of active dimensions is high as well, the present SSLE algorithm will require prohibitively large experimental designs, thereby rendering it infeasible.

The biggest advantage of SSLE, however, is that it recasts the challenging Bayesian computation as a function approximation problem. This yields an analytical expression of the posterior distribution and preserves the analytical post-processing capabilities of SLE, while delivering highly improved likelihood function approximations. As opposed to many existing algorithms, SSLE can even efficiently solve Bayesian problems with multiple posterior modes. As shown in the case studies, the proposed adaptive algorithm further capitalizes on the essentially compact support of typical likelihood functions and leads to significant performance gains, especially at larger experimental designs.

Acknowledgements

The PhD thesis of the first author is supported by ETH grant #44 17-1.

References

  • Beck and Au (2002) Beck, J. L. and S.-K. Au (2002). Bayesian updating of structural models and reliability using Markov chain Monte Carlo simulation. Journal of Engineering Mechanics 128(4), 380–391.
  • Beck and Katafygiotis (1998) Beck, J. L. and L. S. Katafygiotis (1998). Updating models and their uncertainties. I: Bayesian statistical framework. Journal of Engineering Mechanics 124(4), 455–461.
  • Birolleau et al. (2014) Birolleau, A., G. Poëtte, and D. Lucor (2014). Adaptive Bayesian inference for discontinuous inverse problems, application to hyperbolic conservation laws. Communications in Computational Physics 16(1), 1–34.
  • Bishop (2006) Bishop, C. M. (2006). Pattern recognition and machine learning. Information Science and Statistics. Springer.
  • Blatman and Sudret (2010) Blatman, G. and B. Sudret (2010). An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probabilistic Engineering Mechanics 25, 183–197.
  • Blatman and Sudret (2011) Blatman, G. and B. Sudret (2011). Adaptive sparse polynomial chaos expansion based on Least Angle Regression. Journal of Computational Physics 230, 2345–2367.
  • Brooks and Gelman (1998) Brooks, S. P. and A. Gelman (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7(4), 434–455.
  • Chapelle et al. (2002) Chapelle, O., V. Vapnik, and Y. Bengio (2002). Model selection for small sample regression. Journal of Machine Learning Research 48(1), 9–23.
  • Ching and Chen (2007) Ching, J. and Y. Chen (2007). Transitional Markov chain Monte Carlo method for Bayesian model updating, model class selection, and model averaging. Journal of Engineering Mechanics 133(7), 816–832.
  • Conrad et al. (2018) Conrad, P. R., A. D. Davis, Y. M. Marzouk, N. S. Pillai, and A. Smith (2018, jan). Parallel local approximation MCMC for expensive models. SIAM/ASA Journal on Uncertainty Quantification 6(1), 339–373.
  • Conrad et al. (2016) Conrad, P. R., Y. M. Marzouk, N. S. Pillai, and A. Smith (2016, oct). Accelerating asymptotically exact MCMC for computationally intensive models via local approximations. Journal of the American Statistical Association 111(516), 1591–1607.
  • Cui et al. (2014) Cui, T., Y. M. Marzouk, and K. E. Willcox (2014, aug). Data-driven model reduction for the Bayesian solution of inverse problems. International Journal for Numerical Methods in Engineering 102(5), 966–990.
  • Dick et al. (2017) Dick, J., R. N. Gantner, Q. T. Le Gia, and C. Schwab (2017). Multilevel higher-order quasi-Monte Carlo Bayesian estimation. Mathematical Models and Methods in Applied Sciences 27(5), 953–995.
  • El Moselhy and Marzouk (2012) El Moselhy, T. A. and Y. M. Marzouk (2012). Bayesian inference with optimal maps. Journal of Computational Physics 231(23), 7815–7850.
  • Fajraoui et al. (2017) Fajraoui, N., S. Marelli, and B. Sudret (2017). Sequential design of experiment for sparse polynomial chaos expansions. SIAM/ASA Journal on Uncertainty Quantification 5(1), 1061–1085.
  • Gantner and Peters (2018) Gantner, R. N. and M. D. Peters (2018). Higher-order quasi-Monte Carlo for Bayesian shape inversion. SIAM/ASA Journal on Uncertainty Quantification 6(2), 707–736.
  • Gautschi (2004) Gautschi, W. (2004). Orthogonal polynomials: Computation and approximation. Numerical Mathematics and Scientific Computation. Oxford University Press.
  • Gelman et al. (2014) Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2014). Bayesian Data Analysis (3 ed.). Texts in Statistical Science. CRC Press.
  • Gelman and Rubin (1992) Gelman, A. and D. B. Rubin (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7(4), 457–472.
  • Ghanem and Spanos (1991) Ghanem, R. G. and P. Spanos (1991). Stochastic finite elements – A spectral approach. Springer Verlag, New York. (Reedited by Dover Publications, Mineola, 2003).
  • Goodman and Weare (2010) Goodman, J. and J. Weare (2010). Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science 5(1), 65–80.
  • Haario et al. (2001) Haario, H., E. Saksman, and J. Tamminen (2001). An adaptive Metropolis algorithm. Bernoulli 7(2), 223–242.
  • Harney (2016) Harney, H. L. (2016). Bayesian Inference: Data Evaluation and Decisions (2 ed.). Springer International Publishing.
  • Jakeman et al. (2015) Jakeman, J., M. S. Eldred, and K. Sargsyan (2015). Enhancing 1\ell_{1}-minimization estimates of polynomial chaos expansions using basis selection. Journal of Computational Physics 289, 18–34.
  • Jaynes (2003) Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge University Press.
  • Jeffreys (1946) Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 186(1007), 453–461.
  • Kennedy and O’Hagan (2001) Kennedy, M. C. and A. O’Hagan (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(3), 425–464.
  • Li and Marzouk (2014) Li, J. and Y. M. Marzouk (2014). Adaptive construction of surrogates for the Bayesian solution of inverse problems. SIAM Journal on Scientific Computing 36(3), A1163–A1186.
  • Lin (1991) Lin, J. (1991). Divergence measures based on the Shannon entropy. Transactions on Information Theory 37(1), 145–151.
  • Lüthen et al. (2021) Lüthen, N., S. Marelli, and B. Sudret (2021). Sparse polynomial chaos expansions: Review and benchmark. SIAM/ASA Journal on Uncertainty Quantification.
  • Marelli and Sudret (2014) Marelli, S. and B. Sudret (2014). UQLab: A framework for uncertainty quantification in Matlab. In Vulnerability, Uncertainty, and Risk (Proceedings of the 2nd International Conference on Vulnerability, Risk Analysis and Management (ICVRAM2014), Liverpool, United Kingdom), pp.  2554–2563.
  • Marelli and Sudret (2019) Marelli, S. and B. Sudret (2019). UQLab user manual – Polynomial chaos expansions. Technical report, Chair of Risk, Safety and Uncertainty Quantification, ETH Zurich, Switzerland. Report # UQLab-V1.3-104.
  • Marelli et al. (2021) Marelli, S., P.-R. Wagner, C. Lataniotis, and B. Sudret (2021). Stochastic spectral embedding. International Journal for Uncertainty Quantification 11(2), 25–47.
  • Marin et al. (2012) Marin, J.-M., P. Pudlo, C. P. Robert, and R. J. Ryder (2012). Approximate Bayesian computational methods. Statistics and Computing 22(6), 1167–1180.
  • Marzouk et al. (2016) Marzouk, Y., T. Moselhy, M. Parno, and A. Spantini (2016). Sampling via measure transport: An introduction. In R. Ghanem, D. Higdon, and H. Owhadi (Eds.), Handbook of Uncertainty Quantification. Springer International Publishing.
  • Marzouk et al. (2007) Marzouk, Y. M., H. N. Najm, and L. A. Rahn (2007). Stochastic spectral methods for efficient Bayesian solution of inverse problems. Journal of Computational Physics 224, 560–586.
  • Marzouk and Xiu (2009) Marzouk, Y. M. and D. Xiu (2009). A stochastic collocation approach to Bayesian inference in inverse problems. Communications in Computational Physics 6(4), 826–847.
  • Nagel and Sudret (2016) Nagel, J. and B. Sudret (2016). Spectral likelihood expansions for Bayesian inference. Journal of Computational Physics 309, 267–294.
  • Neal (2011) Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo, Handbooks of Modern Statistical Methods, Chapter 5, pp.  113–162. Chapman & Hall/CRC.
  • Parno (2015) Parno, M. D. (2015). Transport maps for accelerated Bayesian computation. PhD thesis, Massachusetts Institute of Technology (MIT).
  • Robert and Casella (2004) Robert, C. P. and G. Casella (2004). Monte Carlo statistical methods (2nd Ed.). Springer Series in Statistics. Springer Verlag.
  • Rosenblatt (1952) Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics 23(3), 470–472.
  • Shin and Xiu (2016) Shin, Y. and D. Xiu (2016). Nonadaptive quasi-optimal points selection for least squares linear regression. SIAM Journal on Scientific Computing 38(1), A385–A411.
  • Sisson et al. (2018) Sisson, S. A., Y. Fan, and M. A. Beaumont (Eds.) (2018). Handbook of approximate Bayesian computation. Chapman and Hall/CRC.
  • Tarantola (2005) Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. Society for Industrial and Applied Mathematics (SIAM).
  • Tierney and Kadane (1986) Tierney, L. and J. B. Kadane (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81(393), 82–86.
  • Tierney et al. (1989a) Tierney, L., R. E. Kass, and J. B. Kadane (1989a). Approximate marginal densities of nonlinear functions. Biometrika 76(3), 425–433.
  • Tierney et al. (1989b) Tierney, L., R. E. Kass, and J. B. Kadane (1989b). Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association 84(407), 710–716.
  • Torre et al. (2019) Torre, E., S. Marelli, P. Embrechts, and B. Sudret (2019, jul). Data-driven polynomial chaos expansion for machine learning regression. Journal of Computational Physics 388, 601–623.
  • Wagner et al. (2020) Wagner, P.-R., R. Fahrni, M. Klippel, A. Frangi, and B. Sudret (2020, feb). Bayesian calibration and sensitivity analysis of heat transfer models for fire insulation panels. Engineering Structures 205(15), 110063.
  • Wand and Jones (1995) Wand, M. and M. C. Jones (1995). Kernel smoothing. Chapman and Hall, Boca Raton.
  • Xiu (2010) Xiu, D. (2010). Numerical methods for stochastic computations – A spectral method approach. Princeton University press.
  • Xiu and Karniadakis (2002) Xiu, D. and G. E. Karniadakis (2002). The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing 24(2), 619–644.
  • Yan and Zhou (2019) Yan, L. and T. Zhou (2019, mar). Adaptive multi-fidelity polynomial chaos approach to Bayesian inference in inverse problems. Journal of Computational Physics 381, 110–128.