
$l_p$ regularization for ensemble Kalman inversion

Yoonsang Lee [email protected] Department of Mathematics, Dartmouth College
Abstract

Ensemble Kalman inversion (EKI) is a derivative-free optimization method that lies between the deterministic and the probabilistic approaches for inverse problems. EKI iterates the Kalman update of ensemble-based Kalman filters, whose ensemble converges to a minimizer of an objective function. EKI regularizes ill-posed problems by restricting the ensemble to the linear span of the initial ensemble, or by iterating regularization with early stopping. Another regularization approach for EKI, Tikhonov EKI, penalizes the objective function using the $l_2$ penalty term, preventing overfitting in the standard EKI. This paper proposes a strategy to implement $l_p$, $0<p\leq 1$, regularization for EKI to recover sparse structures in the solution. The strategy transforms an $l_p$ problem into an $l_2$ problem, which is then solved by Tikhonov EKI. The transformation is explicit, and thus the proposed approach has a computational cost comparable to Tikhonov EKI. We validate the proposed approach’s effectiveness and robustness through a suite of numerical experiments, including compressive sensing and subsurface flow inverse problems.

1 Introduction

A wide range of problems in science and engineering are formulated as inverse problems. Inverse problems aim to estimate a quantity of interest from noisy, imperfect observation or measurement data, such as state variables or a set of parameters that constitute a forward model. Examples include deblurring and denoising in image processing [13], recovery of permeability in subsurface flow using pressure fields [21], and training a neural network in machine learning [14, 18], to name a few. In this paper, we consider the inverse problem of finding $u\in\mathbb{R}^{N}$ from measurement data $y\in\mathbb{R}^{m}$, where $u$ and $y$ are related as follows

y=G(u)+\eta. (1)

Here $G:\mathbb{R}^{N}\to\mathbb{R}^{m}$ is a forward model that can be nonlinear and computationally expensive to solve, for example, solving a PDE problem. The last term $\eta$ is a measurement error. The measurement error is unknown in general, but we assume that it is drawn from a known probability distribution, a Gaussian distribution with mean zero and a known covariance $\Gamma$. By assuming that the forward model $G$ and the observation covariance $\Gamma$ are known, the unknown variable $u$ is estimated by solving an optimization problem

\operatorname*{argmin}_{u\in\mathbb{R}^{N}}\frac{1}{2}\|y-G(u)\|^{2}_{\Gamma}, (2)

where $\|\cdot\|_{\Gamma}$ is the norm induced by the inner product weighted by the inverse of the covariance matrix $\Gamma$, that is, $\|a\|_{\Gamma}^{2}=\langle a,\Gamma^{-1}a\rangle$ for the standard inner product $\langle\cdot,\cdot\rangle$ in $\mathbb{R}^{m}$.
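For concreteness, the following minimal numpy sketch evaluates the data misfit of eq. 2; the function name and the assumption of a dense covariance matrix $\Gamma$ are ours, not from the paper.

```python
import numpy as np

def gamma_misfit(y, Gu, Gamma):
    """Evaluate 0.5 * ||y - G(u)||_Gamma^2 = 0.5 * (y - G(u))^T Gamma^{-1} (y - G(u))."""
    r = y - Gu
    return 0.5 * r @ np.linalg.solve(Gamma, r)
```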

Ensemble Kalman inversion (EKI), pioneered in the oil industry [21] and mathematically formulated in an application-neutral setting in [16], is a derivative-free method that lies between the deterministic and the probabilistic approaches for inverse problems. EKI’s key feature is an iterative application of the Kalman update of the ensemble-based Kalman filters [1, 11]. Ensemble-based Kalman filters are well known for their success in numerical weather prediction, a stringent inverse problem involving high-dimensional systems. EKI iterates the ensemble-based Kalman update, in which the ensemble mean converges to the solution of the optimization problem eq. 2. EKI can be thought of as a least-squares method in which the derivatives are approximated from an empirical correlation of an ensemble [5], not from a variational approach. Thus, EKI is highly parallelizable and does not require the derivatives of the forward or adjoint problem used in gradient-based methods.

Inverse problems are often ill-posed: they suffer from non-uniqueness of the solution and lack of stability. Also, in the context of regression, the solution can show overfitting. A common strategy to overcome ill-posedness is regularizing the solution of the optimization problem [3]. That is, a special structure of the solution from prior information, such as sparsity, is imposed to address ill-posedness. The standard EKI [16] implements regularization by restricting the ensemble to the linear span of the initial ensemble, which reflects prior information. It is known that the ensemble-based Kalman update keeps the ensemble in the linear span of the initial ensemble [19, 16]. Thus, the EKI ensemble always stays in the linear span of the initial ensemble, which regularizes the solution. Although this approach shows robust results in certain applications, numerical evidence demonstrates that overfitting may still occur [16]. As an effort to address the overfitting of the standard EKI, an iterative regularization method has been proposed in [17], which approximates the regularizing Levenberg-Marquardt scheme [15]. As another regularization approach, which adds a penalty term to the objective function, a recent work called Tikhonov EKI (TEKI) [8] implements Tikhonov regularization (which imposes an $l_2$ penalty term on the objective function) using an augmented measurement model that adds artificial measurements to the original measurement. TEKI’s implementation is a straightforward modification of the standard EKI method with a marginal increase in the computational cost.

The regularization methods for EKI mentioned above address several issues of ill-posed problems, including overfitting. However, it is still an open problem to implement other types of regularizers, such as $l_1$ or total variation (TV) regularization. This paper aims to implement $l_p$, $0<p\leq 1$, regularization to recover sparse structures in the solution of inverse problems. In other words, we propose a highly parallelizable derivative-free method that solves the following $l_p$ regularized optimization problem

\operatorname*{argmin}_{u\in X}\frac{\lambda}{2}\|u\|_{p}^{p}+\frac{1}{2}\|y-G(u)\|^{2}_{\Gamma}, (3)

where $\|u\|_{p}$ is the $l_p$ norm of $u$, i.e., $\|u\|_{p}^{p}=\sum_{i=1}^{N}|u_{i}|^{p}$, and $\lambda$ is a regularization coefficient. The proposed method’s key idea is a transformation of variables that converts the $l_p$ regularization problem to the Tikhonov regularization problem. Therefore, a local minimizer of the original $l_p$ problem can be found by finding a local minimizer of the $l_2$ problem, which is solved using the idea of Tikhonov EKI. As this transformation is explicit and easy to calculate, the proposed method’s overall computational complexity remains comparable to the complexity of Tikhonov EKI. In general, a transformed optimization problem can lead to additional difficulties, such as change of convexity, increased nonlinearity, additional/missing local minima of the original problem, etc. [12]. We show that the transformation does not add or remove local minimizers in the transformed formulation. A work imposing sparsity in EKI has been reported recently [25]. The idea of this work is to use thresholding and an $l_1$ constraint to impose sparsity in the inverse problem solution. The $l_1$ constraint is further relaxed by splitting the solution into positive and negative parts. The split converts the $l_1$ problem to a quadratic problem, while it still has a non-negativity constraint. On the other hand, our method does not require additional constraints, as it reformulates the optimization problem and works as a solver for the $l_p$ regularized optimization problem eq. 3.

This paper is structured as follows. Section 2 reviews the standard EKI and Tikhonov EKI. In section 3, we describe a transformation that converts the $l_p$ regularization problem eq. 3, $0<p\leq 1$, to the Tikhonov (that is, $l_2$) regularization problem, and provide the complete description of the $l_p$ regularized EKI algorithm. We also discuss implementation and computation issues. Section 4 is devoted to the validation of the effectiveness and robustness of the regularized EKI through a suite of numerical tests. The tests include a scalar toy problem with an analytic solution, a compressive sensing problem to benchmark against a convex $l_1$ minimization method, and a PDE-constrained nonlinear inverse problem from subsurface flow. We conclude this paper in section 5, discussing the proposed method’s limitations and future work.

2 Ensemble Kalman inversion

The $l_p$ regularized EKI uses a change of variables to transform an $l_p$ problem into an $l_2$ problem, which is then solved by the standard EKI using an augmented measurement model. This section reviews the standard EKI and the application of the augmented measurement model in Tikhonov EKI to implement $l_2$ regularization. The review is intended to be concise, delivering the minimal ideas needed for the $l_p$ regularized EKI. Detailed descriptions of the standard EKI and the Tikhonov EKI methods can be found in [16] and [8], respectively.

2.1 Standard ensemble Kalman inversion

EKI incorporates an artificial dynamics, which corresponds to the application of the forward model to each ensemble member. This application moves each ensemble member to the measurement space, which is then updated using the ensemble Kalman update formula. The ensemble updated by EKI stays in the linear span of the initial ensemble [16, 19]. Therefore, by choosing an initial ensemble appropriately for prior information, EKI is regularized as the ensemble is restricted to the linear span of the initial ensemble. Under a continuous-time limit, when the operator $G$ is linear, it is proved in [24] that the EKI estimate converges to the solution of the following optimization problem

\operatorname*{argmin}_{u\in\mathbb{R}^{N}}\frac{1}{2}\|y-G(u)\|_{\Gamma}^{2}. (4)

In this paper, we consider the discrete-time EKI in [16], which is described below.

Algorithm: standard EKI
Assumption: an initial ensemble of size $K$, $\{u_{0}^{(k)}\}_{k=1}^{K}$ from prior information, is given.
For $n=1,2,...,$

  1. Prediction step using the artificial dynamics:

    (a) Apply the forward model $G$ to each ensemble member

      g_{n}^{(k)}:=G(u_{n-1}^{(k)}) (5)

    (b) From the set of the predictions $\{g_{n}^{(k)}\}_{k=1}^{K}$, calculate the mean and covariances

      \overline{g}_{n}=\frac{1}{K}\sum_{k=1}^{K}g_{n}^{(k)}, (6)

      C^{ug}_{n}=\frac{1}{K}\sum_{k=1}^{K}(u_{n-1}^{(k)}-\overline{u}_{n-1})\otimes(g_{n}^{(k)}-\overline{g}_{n}),\quad C^{gg}_{n}=\frac{1}{K}\sum_{k=1}^{K}(g_{n}^{(k)}-\overline{g}_{n})\otimes(g_{n}^{(k)}-\overline{g}_{n}), (7)

      where $\overline{u}_{n-1}$ is the mean of $\{u_{n-1}^{(k)}\}$, i.e., $\frac{1}{K}\sum_{k=1}^{K}u_{n-1}^{(k)}$.

  2. Analysis step:

    (a) Update each ensemble member using the Kalman update

      u_{n}^{(k)}=u_{n-1}^{(k)}+C^{ug}_{n}(C^{gg}_{n}+\Gamma)^{-1}(y_{n}^{(k)}-g_{n}^{(k)}), (8)

      where $y_{n}^{(k)}=y+\zeta_{n}^{(k)}$ is a perturbed measurement using Gaussian noise $\zeta_{n}^{(k)}$ with mean zero and covariance $\Gamma$.

    (b) Compute the mean of the ensemble as an estimate for the solution

      \overline{u}_{n}=\frac{1}{K}\sum_{k=1}^{K}u_{n}^{(k)} (9)
Remark 1.

The term $C^{ug}_{n}(C^{gg}_{n}+\Gamma)^{-1}$ in eq. 8 comes from the Kalman gain matrix. The standard EKI uses an extended space, $(u,G(u))\in\mathbb{R}^{N+m}$, and then uses the Kalman update for the extended-space variable. However, as we need to update only $u$ while $G(u)$ is subordinate to $u$, we have the update formula eq. 8.
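The following is a minimal numpy sketch of one iteration of the update above; the array layout, function names, and the use of a dense $\Gamma$ are our own assumptions rather than part of the original algorithm description.

```python
import numpy as np

def eki_step(U, G, y, Gamma, rng):
    """One iteration of the standard EKI update (eq. 5-9).
    U: N x K array of ensemble members; G: forward model mapping R^N -> R^m."""
    N, K = U.shape
    GU = np.column_stack([G(U[:, k]) for k in range(K)])    # predictions, eq. 5
    du = U - U.mean(axis=1, keepdims=True)                   # ensemble anomalies
    dg = GU - GU.mean(axis=1, keepdims=True)
    Cug = du @ dg.T / K                                      # cross-covariance, eq. 7
    Cgg = dg @ dg.T / K
    # perturbed measurements y^{(k)} = y + zeta^{(k)}, zeta ~ N(0, Gamma)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=K).T
    return U + Cug @ np.linalg.solve(Cgg + Gamma, Y - GU)    # Kalman update, eq. 8
```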

2.2 Tikhonov ensemble Kalman inversion

EKI is regularized through the initial ensemble reflecting prior information. However, there is numerical evidence showing that EKI regularized only through the ensemble may suffer from overfitting [16]. Among other approaches to regularize EKI, Tikhonov EKI [8] uses the idea of an augmented measurement to implement $l_2$ regularization, which is a simple modification of the standard EKI. For the original measurement $y$, the augmented measurement model extends $y$ by adding the zero vector in $\mathbb{R}^{N}$, which yields an augmented measurement vector $z\in\mathbb{R}^{m+N}$

\mbox{augmented measurement vector: }z=(y,0). (10)

The forward model is also augmented to account for the augmented measurement vector, which adds the identity measurement

\mbox{augmented forward model: }F(u)=(G(u),u). (11)

Using the augmented measurement vector and forward model, Tikhonov EKI considers the following inverse problem of estimating $u$ from $z$

z=F(u)+\zeta. (12)

Here $\zeta$ is an $(m+N)$-dimensional measurement error for the augmented measurement model, which is Gaussian with mean zero and covariance

\Sigma=\begin{pmatrix}\Gamma&0\\ 0&\frac{1}{\lambda}I_{N}\end{pmatrix}, (13)

for the $N\times N$ identity matrix $I_{N}$.

The mechanism enabling the $l_2$ regularization in Tikhonov EKI is the incorporation of the $l_2$ penalty term as a part of the augmented measurement model. From the orthogonality between different components in $\mathbb{R}^{m+N}$, we have

\frac{1}{2}\|z-F(u)\|^{2}_{\Sigma}=\frac{1}{2}\|y-G(u)\|^{2}_{\Gamma}+\frac{1}{2}\|0-u\|^{2}_{\frac{1}{\lambda}I_{N}}=\frac{1}{2}\|y-G(u)\|^{2}_{\Gamma}+\frac{\lambda}{2}\|u\|^{2}_{2}. (14)

Therefore, the standard EKI algorithm applied to the augmented measurement minimizes $\frac{1}{2}\|z-F(u)\|^{2}_{\Sigma}$, which equivalently minimizes the $l_2$ regularized problem.
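A minimal sketch of the augmentation of eq. 10-13, assuming a generic forward model `G` and a dense $\Gamma$ (all names are ours, not from the paper):

```python
import numpy as np
import scipy.linalg as sla

def augment(G, y, Gamma, lam, N):
    """Build the augmented model F(u) = (G(u), u), measurement z = (y, 0),
    and noise covariance Sigma = blockdiag(Gamma, I_N / lambda) of eq. 10-13."""
    F = lambda u: np.concatenate([G(u), u])
    z = np.concatenate([y, np.zeros(N)])
    Sigma = sla.block_diag(Gamma, np.eye(N) / lam)
    return F, z, Sigma
```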

3 $l_p$ regularization for EKI

This section describes a transformation that converts an $l_p$, $0<p\leq 1$, regularization problem to an $l_2$ regularization problem. $l_p$-regularized EKI ($l_p$EKI), which we describe completely in section 3.2, utilizes this transformation and solves the transformed $l_2$ regularization problem using the idea of Tikhonov EKI [8], the augmented measurement model.

3.1 Transformation of $l_p$ regularization into $l_2$ regularization

For $0<p\leq 1$, we define a function $\psi:\mathbb{R}\to\mathbb{R}$ given by

\psi(x)=\operatorname{sgn}(x)|x|^{\frac{p}{2}},\quad x\in\mathbb{R}. (15)

Here $\operatorname{sgn}(x)$ is the sign function of $x$, which is 1 for $x>0$, 0 for $x=0$, and -1 for $x<0$. It is straightforward to check that $\psi$ is bijective and has an inverse $\xi:\mathbb{R}\to\mathbb{R}$ defined as

\xi(x)=\operatorname{sgn}(x)|x|^{\frac{2}{p}},\quad x\in\mathbb{R}. (16)

For $u$ in $\mathbb{R}^{N}$, we define a nonlinear map $\Psi:\mathbb{R}^{N}\to\mathbb{R}^{N}$, which applies $\psi$ to each component of $u=(u_{1},u_{2},...,u_{N})$,

\Psi(u)=(\psi(u_{1}),\psi(u_{2}),...,\psi(u_{N})). (17)

As $\psi$ has an inverse, the map $\Psi$ also has an inverse, say $\Xi$

\Xi(u)=\Psi^{-1}(u)=(\xi(u_{1}),\xi(u_{2}),...,\xi(u_{N})). (18)

For $v=\Psi(u)$, it can be checked that for each $i=1,2,...,N$,

|v_{i}|^{2}=|\psi(u_{i})|^{2}=|u_{i}|^{p},

and thus we have the following norm relation

\|v\|_{2}^{2}=\|u\|_{p}^{p}. (19)

This relation shows that the map $v=\Psi(u)$ converts the $l_p$-regularized optimization problem in $u$, eq. 3, to an $l_2$ regularized problem in $v$,

\operatorname*{argmin}_{v\in\mathbb{R}^{N}}\frac{\lambda}{2}\|v\|_{2}^{2}+\frac{1}{2}\|y-\tilde{G}(v)\|^{2}_{\Gamma}, (20)

where $\tilde{G}$ is the pullback of $G$ by $\Xi$

\tilde{G}=G\circ\Xi. (21)
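The transformation and its inverse are cheap, componentwise operations. A minimal numpy sketch (function names are ours) of $\Psi$, $\Xi$, and a check of the norm relation eq. 19:

```python
import numpy as np

def Psi(u, p):
    """Componentwise map psi(x) = sgn(x) |x|^{p/2} of eq. 15 and eq. 17."""
    return np.sign(u) * np.abs(u) ** (p / 2.0)

def Xi(v, p):
    """Componentwise inverse map xi(x) = sgn(x) |x|^{2/p} of eq. 16 and eq. 18."""
    return np.sign(v) * np.abs(v) ** (2.0 / p)

# norm relation eq. 19: ||Psi(u)||_2^2 == ||u||_p^p
u, p = np.array([0.5, -2.0, 0.0, 1.5]), 0.7
assert np.isclose(np.sum(Psi(u, p) ** 2), np.sum(np.abs(u) ** p))
assert np.allclose(Xi(Psi(u, p), p), u)   # Xi inverts Psi
```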

A transformation between $l_1$ and $l_2$ regularization terms has already been used to solve an inverse problem in the Bayesian framework [26]. In the context of the randomize-then-optimize framework [2], the method in [26] draws a sample from a Gaussian distribution, which is then transformed to a Laplace distribution. As this method needs to match the corresponding densities of the original and the transformed variables as random variables, the transformation involves calculations related to cumulative distribution functions. For the scalar case, $v\in\mathbb{R}$, the transformation from $l_2$ to $l_1$, denoted as $gl$, is given by

gl(v)=-\operatorname{sgn}(v)\log\left(1-2\left|\phi(v)-\frac{1}{2}\right|\right), (22)

where $\phi(v)$ is the cumulative distribution function of the standard Gaussian distribution. fig. 1 shows the two transformations $\xi$ eq. 16 and $gl$ eq. 22; the former is based on the norm relation eq. 19 and the latter on matching densities as random variables.

Figure 1: $\xi$: transformation matching the norm relation eq. 19; $gl$: transformation from Gaussian to Laplace distributions.

We note that the transformation $\xi$ has a flatter region around 0 than the transformation $gl$, but $\xi$ grows quickly as $v$ moves further away from 0. From this comparison, we expect that the flattened region of $\xi$ plays another role in imposing sparsity by trapping the ensemble in the flattened area.
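Both transformations are simple to evaluate; the following sketch (assuming scipy for the Gaussian CDF; variable names are ours) compares $\xi$ of eq. 16 with $gl$ of eq. 22 for $p=1$.

```python
import numpy as np
from scipy.stats import norm

def gl(v):
    """Gaussian-to-Laplace transformation of eq. 22."""
    return -np.sign(v) * np.log(1.0 - 2.0 * np.abs(norm.cdf(v) - 0.5))

def xi(v, p=1.0):
    """Norm-relation-based transformation of eq. 16."""
    return np.sign(v) * np.abs(v) ** (2.0 / p)

v = np.linspace(-3, 3, 601)
# near v = 0, xi(v) = sgn(v) v^2 is much flatter than gl(v) ~ sqrt(2/pi) v,
# while |xi(v)| overtakes |gl(v)| as |v| grows
```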

In general, a reformulation of an optimization problem using a transformation has the following potential issues [12]: i) the degree of nonlinearity may be significantly increased, ii) the desired minimum may be inadvertently excluded, or iii) an additional local minimum may be introduced. In [9], for a non-convex problem, it is shown that TEKI converges to an approximate local minimum if the gradient and Hessian of the objective function are bounded. It is straightforward to check that the transformed objective function has bounded gradient and Hessian if $0<p\leq 1$, regardless of the convexity of the problem. Therefore, if we can show that the original and the transformed problems have the same number of local minima, then it is guaranteed that a local minimum of the original problem can be found by finding a local minimum of the transformed problem using TEKI. We want to note the importance of the sign function in defining $\psi$ and $\xi$. The sign function is not necessary to satisfy the norm relation eq. 19, but it is essential to make the transformation $\Psi$ and its inverse $\Xi$ bijective. Without being bijective, the transformed $l_2$ problem can have more or fewer local minima than the original problem.

The following theorem shows that the transformation does not add or remove local minima.

Theorem 1.

For an objective function $J(u):\mathbb{R}^{N}\to\mathbb{R}$, if $u^{*}$ is a local minimizer of $J(u)$, $\Psi(u^{*})$ is also a local minimizer of $\tilde{J}(v)=J\circ\Xi(v)$. Similarly, if $v^{*}$ is a local minimizer of $\tilde{J}(v)$, then $\Xi(v^{*})$ is also a local minimizer of $J(u)=\tilde{J}\circ\Psi(u)$.

Proof.

From the definitions eq. 17 and eq. 18, $\Psi$ and $\Xi$ are continuous and bijective. Thus for $u\in\mathbb{R}^{N}$, both $\Psi$ and $\Xi$ map a neighborhood of $u$ to neighborhoods of $\Psi(u)$ and $\Xi(u)$, respectively. As $u^{*}$ is a local minimizer, there exists a neighborhood $\mathcal{N}$ of $u^{*}$ such that

J(u^{*})\leq J(w)\quad\mbox{for all }w\in\mathcal{N}. (23)

Let $v=\Psi(u^{*})$ and $\mathcal{M}:=\Psi(\mathcal{N})$, which is a neighborhood of $v$. For any $w\in\mathcal{M}$, $\Xi(w)\in\mathcal{N}$, and thus we have

\tilde{J}(v)=J(\Xi(v))=J(u^{*})\leq J(\Xi(w))=\tilde{J}(w), (24)

which shows that $v$ is a local minimizer of $\tilde{J}$. The other direction is proved similarly by exchanging the roles of $\Psi$ and $\Xi$ and of $J$ and $\tilde{J}$. ∎

We note that the theorem also holds with the local minimizers replaced by isolated local minimizers. If there is a unique global minimizer of the $l_p$ regularization problem eq. 3, the theorem guarantees that we can find it by finding the global minimizer of the $l_2$ regularized problem eq. 20.

Corollary 1.

For $0<p\leq 1$, if the $l_p$ regularized optimization problem eq. 3 has a unique global minimizer, say $u^{\dagger}$, the $l_2$ regularized optimization problem eq. 20 also has a unique global minimizer, say $v^{\dagger}$, and $u^{\dagger}$ is given by

u^{\dagger}=\Xi(v^{\dagger}). (25)

3.2 Algorithm

$l_p$-regularized EKI ($l_p$EKI) solves the transformed $l_2$ regularization problem using the standard EKI with the augmented measurement model. For completeness, this subsection describes the full $l_p$EKI algorithm and discusses issues related to implementation. Note that the Tikhonov EKI (TEKI) part in $l_p$EKI is slightly modified to reflect the setting assumed in this paper. The general TEKI algorithm and its variants can be found in [8].

We assume that the forward model $G$ and the measurement error covariance $\Gamma$ are known, and the measurement $y\in\mathbb{R}^{m}$ is given (and thus $z=(y,0)$ is also given). We also fix the regularization coefficient $\lambda$ and $p$. Under this assumption, $l_p$EKI uses the following iterative procedure to update the ensemble until the ensemble mean $\overline{v}=\frac{1}{K}\sum_{k=1}^{K}v^{(k)}$ converges.

Algorithm: $l_p$-regularized EKI
Assumption: an initial ensemble of size $K$, $\{v_{0}^{(k)}\}_{k=1}^{K}$, is given.
For $n=1,2,...,$

  1. Prediction step using the forward model:

    (a) Apply the augmented forward model $F$ to each ensemble member

      f_{n}^{(k)}:=F(v_{n-1}^{(k)})=(\tilde{G}(v_{n-1}^{(k)}),v_{n-1}^{(k)}) (26)

    (b) From the set of the predictions $\{f_{n}^{(k)}\}_{k=1}^{K}$, calculate the mean and covariances

      \overline{f}_{n}=\frac{1}{K}\sum_{k=1}^{K}f_{n}^{(k)}, (27)

      C^{vf}_{n}=\frac{1}{K}\sum_{k=1}^{K}(v_{n-1}^{(k)}-\overline{v}_{n-1})\otimes(f_{n}^{(k)}-\overline{f}_{n}),\quad C^{ff}_{n}=\frac{1}{K}\sum_{k=1}^{K}(f_{n}^{(k)}-\overline{f}_{n})\otimes(f_{n}^{(k)}-\overline{f}_{n}), (28)

      where $\overline{v}_{n-1}$ is the ensemble mean of $\{v_{n-1}^{(k)}\}$, i.e., $\frac{1}{K}\sum_{k=1}^{K}v_{n-1}^{(k)}$.

  2. Analysis step:

    (a) Update each ensemble member using the Kalman update

      v_{n}^{(k)}=v_{n-1}^{(k)}+C^{vf}_{n}(C^{ff}_{n}+\Sigma)^{-1}(z_{n}^{(k)}-f_{n}^{(k)}), (29)

      where $z_{n}^{(k)}=z+\zeta_{n}^{(k)}$ is a perturbed measurement using Gaussian noise $\zeta_{n}^{(k)}$ with mean zero and covariance $\Sigma$.

    (b) For the ensemble mean $\overline{v}_{n}=\frac{1}{K}\sum_{k=1}^{K}v_{n}^{(k)}$, the $l_p$EKI estimate $u_{n}$ of the minimizer of the $l_p$ regularization problem is given by

      u_{n}=\Xi(\overline{v}_{n}). (30)
Remark 2.

In EKI and TEKI, the covariance of $\zeta_{n}^{(k)}$ can be set to zero so that all ensemble members use the same measurement $z$ without perturbations. In our study, we focus on the perturbed measurement using the covariance matrix $\Gamma$.

Remark 3.

The above algorithm is equivalent to TEKI, except that the forward model $G$ is replaced with the pullback of $G$ by the transformation $\Xi$. In comparison with TEKI, the additional computational cost of $l_p$EKI is the calculation of the transformation $\Xi(v)$. In comparison with the standard EKI, the additional cost of $l_p$EKI, apart from the cost related to the transformation, is the matrix inversion $(C^{ff}_{n}+\Sigma)^{-1}$ in the augmented measurement space $\mathbb{R}^{m+N}$ instead of a matrix inversion in the original measurement space $\mathbb{R}^{m}$. As the covariance matrices are symmetric positive definite, the matrix inversion can be done efficiently.

Remark 4.

In $l_p$EKI, it is also possible to estimate $u$ by transforming each ensemble member and taking the average of the transformed members, that is,

u=\frac{1}{K}\sum_{k=1}^{K}\Xi(v_{n}^{(k)}) (31)

instead of eq. 30. If the ensemble spread is large, these two approaches will make a difference. In our numerical tests in the next section, we do not incorporate covariance inflation. Thus the ensemble spread becomes relatively small when the estimate converges, and eq. 30 and eq. 31 are not significantly different. In this study, we use eq. 30 to measure the performance of $l_p$EKI.
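Putting the pieces together, the following is a minimal numpy sketch of the full $l_p$EKI iteration (eq. 26-30); the function signature, array layout, and dense covariance blocks are our own assumptions rather than a prescribed implementation.

```python
import numpy as np

def lp_eki(G, y, Gamma, lam, p, V, n_iter, rng):
    """Sketch of lpEKI: TEKI applied to the pullback G_tilde = G o Xi.
    V is an N x K initial ensemble in the transformed variable v."""
    N, K = V.shape
    Xi = lambda v: np.sign(v) * np.abs(v) ** (2.0 / p)        # eq. 16 / eq. 18
    F = lambda v: np.concatenate([G(Xi(v)), v])                # augmented model, eq. 26
    z = np.concatenate([y, np.zeros(N)])
    Sigma = np.block([[Gamma, np.zeros((len(y), N))],
                      [np.zeros((N, len(y))), np.eye(N) / lam]])
    for _ in range(n_iter):
        FV = np.column_stack([F(V[:, k]) for k in range(K)])
        dv, df = V - V.mean(1, keepdims=True), FV - FV.mean(1, keepdims=True)
        Cvf, Cff = dv @ df.T / K, df @ df.T / K                # eq. 28
        Z = z[:, None] + rng.multivariate_normal(np.zeros(len(z)), Sigma, size=K).T
        V = V + Cvf @ np.linalg.solve(Cff + Sigma, Z - FV)     # eq. 29
    return Xi(V.mean(axis=1))                                   # estimate u = Xi(v_bar), eq. 30
```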

In recovering sparsity using the $l_p$ penalty term, if convexity of the penalty term is not necessary, it is preferred to use a small $p<1$, as a smaller $p$ imposes stronger sparsity. The transformation in $l_p$EKI works for any positive $p$, but it can lead to an overflow for a small $p$; the function $\xi$ involves an exponent $\frac{2}{p}$ that becomes large for a small $p$. Therefore, there is a limit on the smallest usable $p$. In our numerical experiments in the next section, the smallest $p$ is 0.7 in the compressive sensing test.

There is a variant of $l_p$EKI worth further consideration. In [24], a continuous-time limit of EKI has been proposed, which rescales $\Gamma\to h^{-1}\Gamma$ using $h>0$ so that the matrix inversion $(C^{gg}_{n}+h^{-1}\Gamma)^{-1}$ is approximated by $h\Gamma^{-1}$ in the limit $h\to 0$. In many applications, the measurement error covariance is assumed to be diagonal; that is, the measurement errors corresponding to different components are uncorrelated. Thus the inversion $\Gamma^{-1}$ becomes a cheap calculation in the continuous-time limit. The continuous-time limit is then discretized in time using an explicit time integration method with a finite time step. The latter is called the ‘learning rate’ in the machine learning community, and it is known that adaptive time-stepping in solving an optimization problem often shows improved results [10, 22]. The current study focuses on the discrete-time update described in eq. 8, and we leave adaptive time-stepping for future work.

4 Numerical tests

We apply $l_p$-regularized EKI ($l_p$EKI) to a suite of inverse problems to check its performance in regularizing EKI and recovering sparse structures of solutions. The tests include: i) a scalar toy model where an analytic solution is available, ii) a compressive sensing problem to recover a sparse signal from random measurements of the signal, and iii) an inverse problem in subsurface flow: estimation of permeability from measurements of the hydraulic pressure field, whose forward model is described by a 2D elliptic partial differential equation [7, 21]. In all tests, we run $l_p$EKI for various values of $p\leq 1$ and compare with the result of Tikhonov EKI. We analyze the results to check how effectively $l_p$EKI implements $l_p$ regularization and recovers sparse solutions. When available, we also compare $l_p$EKI with a convex $l_1$ minimization method. As quantitative measures of the estimation performance, we calculate the $l_1$ error of the $l_p$EKI estimates and the data misfit $\|y-G(u)\|_{2}$.

Several parameters need to be determined in $l_p$EKI to achieve robust estimation results: the regularization coefficient $\lambda$, the ensemble size, and the ensemble initialization. The regularization coefficient can be selected, for example, using cross-validation. As the coefficient can significantly affect the performance, we find the coefficient by hand-tuning so that $l_p$EKI achieves the best result for a given $p$. Ensemble initialization plays a role in regularizing EKI, restricting the estimate to the linear span of the initial ensemble. In our experiments, instead of tuning the initial ensemble for improved results, we initialize the ensemble using a Gaussian distribution with mean zero and a constant diagonal covariance matrix (the variance will be specified later for each test). As this initialization does not utilize any prior information, such as a sparse structure in the solution, we regularize the solution mainly through the $l_p$ penalty term. For each test, we run 100 trials of $l_p$EKI through 100 realizations of the initial ensemble distribution and use the estimate averaged over the trials, along with its standard deviation, to measure the performance difference.

Regarding the ensemble size, for the scalar toy and the compressive sensing problems, we test ensemble sizes larger than the dimension of $u$, the unknown variable of interest. The purpose of a large ensemble size is to minimize the sampling error while we focus on the regularization effect of $l_p$EKI. To show the applicability of $l_p$EKI to high-dimensional problems, we also test a small ensemble size using the idea of multiple batches used in [23]. The multiple batch approach runs several batches, removing small-magnitude components after each batch; the reduced ensemble is then used for the next batch, as sketched below. The multiple batch approach enables a small ensemble size, 50 ensemble members, for the compressive sensing and the 2D elliptic inversion problems, where the dimensions of $u$ are 200 and 400, respectively.
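A minimal sketch of the pruning step between batches, as we read the multiple batch approach of [23]; the function name and threshold argument are hypothetical.

```python
import numpy as np

def next_batch_support(u_est, threshold):
    """Keep only the indices whose estimated magnitude exceeds the threshold;
    the next batch solves the reduced problem restricted to these indices."""
    support = np.flatnonzero(np.abs(u_est) >= threshold)
    return support  # e.g., restrict a linear forward model to these columns
```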

In ensemble-based Kalman filters, covariance inflation is an essential tool to stabilize and improve the performance of the filters. In connection with inflation, adaptive time-stepping has been investigated to improve the performance of EKI. Although adaptive time-stepping can be incorporated into $l_p$EKI for performance improvements, we use the discrete version of $l_p$EKI described in section 3.2, focusing on the effect of different types of regularization on inversion. We will report a thorough investigation along the lines of adaptive time-stepping elsewhere.

4.1 A scalar toy problem

The first numerical test is a scalar problem for $u\in\mathbb{R}$ with an analytic solution. As this is a scalar problem, there is no regularization effect from the ensemble initialization, and we can see the effect of the $l_p$ penalty term. The scalar optimization problem we consider here is the minimization of an objective function $J(u)=\frac{1}{4}|u|^{p}+\frac{1}{2}(1-u)^{2}$

\operatorname*{argmin}_{u\in\mathbb{R}}J(u)=\operatorname*{argmin}_{u\in\mathbb{R}}\frac{1}{4}|u|^{p}+\frac{1}{2}(1-u)^{2}. (32)

This setup is equivalent to solving the optimization problem eq. 3 using $l_p$ regularization with $\lambda=1/2$, where $y=1$, $G(u)=u$, and $\eta$ is Gaussian with mean zero and variance 1. Using the transformation $v=\Psi(u)=\psi(u)=\operatorname{sgn}(u)|u|^{\frac{p}{2}}$ defined in eq. 15, $l_p$EKI minimizes the transformed objective function $\tilde{J}(v)=\frac{1}{4}|v|^{2}+\frac{1}{2}(1-\operatorname{sgn}(v)|v|^{2/p})^{2}$

\operatorname*{argmin}_{v\in\mathbb{R}}\tilde{J}(v)=\operatorname*{argmin}_{v\in\mathbb{R}}\frac{1}{4}|v|^{2}+\frac{1}{2}(1-\operatorname{sgn}(v)|v|^{2/p})^{2}, (33)

which is an $l_2$ regularization of $\frac{1}{2}(1-\operatorname{sgn}(v)|v|^{\frac{2}{p}})^{2}$.
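Both objective functions are easy to evaluate and minimize numerically; the following sketch (using scipy's bounded scalar minimizer, which is our choice and not part of the paper) reproduces the analytic minimizers discussed below for $p=1$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

p = 1.0
J = lambda u: 0.25 * np.abs(u) ** p + 0.5 * (1.0 - u) ** 2                              # eq. 32
Jt = lambda v: 0.25 * v ** 2 + 0.5 * (1.0 - np.sign(v) * np.abs(v) ** (2.0 / p)) ** 2   # eq. 33

u_star = minimize_scalar(J, bounds=(0, 2), method="bounded").x    # ~0.75 for p = 1
v_star = minimize_scalar(Jt, bounds=(0, 2), method="bounded").x   # ~sqrt(3)/2
print(u_star, v_star, np.sign(v_star) * np.abs(v_star) ** (2.0 / p))  # Xi(v*) recovers u*
```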

Figure 2: Objective functions of eq. 32 and eq. 33 for $p=1$ (first row) and $p=0.5$ (second row).

For $p=1$, the first row of fig. 2 shows the objective functions of the $l_p$ formulation eq. 32 and the transformed $l_2$ formulation eq. 33. Each objective function has a unique global minimum without other local minima. The minimizers are $\frac{3}{4}$ and $\frac{\sqrt{3}}{2}$ for $l_1$ and $l_2$, respectively. We can check that the transformation does not add/remove local minimizers, but the convexity of the objective function changes. The transformed objective function $\tilde{J}$ has an inflection point at $v=0$, which is also a stationary point. Note that the original function has no stationary points other than the global minimizer.

When $p=0.5$, a potential issue of the transformation can be seen explicitly. The original and the transformed objective functions are shown in the second row of fig. 2. Due to the regularization term with $p=0.5$, the objective functions are non-convex and have a local minimizer at $u=v=0$ in addition to the global minimizers. In the transformed formulation (bottom right of fig. 2), the objective function flattens around $v=0$, which shows a potential issue of trapping ensemble members around $v=0$. Numerical experiments show that if the ensemble is initialized with a small variance, the ensemble is trapped around $v=0$. On the other hand, if the ensemble is initialized with a sufficiently large variance (so that some of the ensemble members are initialized out of the well around $v=0$), $l_p$EKI shows convergence to the true minimizer, $v=0.9304$ (or $u=0.8656$), even when it is initialized around 0.

Figure 3: Time series of the $l_p$EKI estimate, $\xi(\overline{v}_{n})$, averaged over 100 different trials.

We use 100 different realizations for the ensemble initialization, and each trial uses 50 ensemble members. The estimates at each iteration, averaged over the different trials, are shown in fig. 3. For $p=1$ (first row) and $p=0.5$ (second row), the left and right columns show the results when the ensemble is initialized with mean 1 and 0, respectively. When $p=1$ and initialized around 1, the ensemble estimate quickly converges to the true value 0.75, as the objective function is convex and the initial guess is close to the true value. When $p=0.5$, as the objective function is non-convex due to the regularization term, the convergence is slower than in the $p=1$ case. When the ensemble is initialized around 0 for $p=0.5$, a local minimizer, the ensemble needs to be initialized with a large variance. Using variance 1, which is 10 times larger than the variance 0.1 used for the ensemble initialization around 1, $l_p$EKI converges to the true value. The performance difference between different trials is marginal. The standard deviations of the estimate after 50 iterations are $6.62\times 10^{-3}$ ($p=1$ initialized with 1), $7.95\times 10^{-3}$ ($p=1$ initialized with 0), $8.79\times 10^{-3}$ ($p=0.5$ initialized with 1), and $1.14\times 10^{-2}$ ($p=0.5$ initialized with 0). As a reference, the estimate using the transformation eq. 22, based on matching the densities of random variables, converges to 0.71.

4.2 Compressive sensing

The second test is a compressive sensing problem. The true signal $u$ is a vector in $\mathbb{R}^{200}$, which is sparse with only four randomly selected non-zero components (their magnitudes are also randomly drawn from the standard normal distribution). The forward model $G:\mathbb{R}^{200}\to\mathbb{R}^{20}$ is a random Gaussian matrix of size $20\times 200$, which yields a measurement vector in $\mathbb{R}^{20}$. The measurement $y$ is obtained by applying the forward model to the true signal $u$ and adding Gaussian noise with mean zero and variance 0.01. As the forward model is linear, several robust methods can solve the sparse recovery problem, including the convex $l_1$ minimization method [4]. This test aims to compare the performance of $l_p$EKI for various $p$ values, rather than to advocate the use of $l_p$EKI over other standard methods. As the forward model is linear and cheap to calculate, the standard methods are preferred over $l_p$EKI for this test.
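A minimal sketch of this measurement setup (the random seed and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, s = 200, 20, 4                      # signal size, measurements, nonzeros
u_true = np.zeros(N)
idx = rng.choice(N, size=s, replace=False)
u_true[idx] = rng.standard_normal(s)      # random support and magnitudes
G = rng.standard_normal((m, N))           # random Gaussian measurement matrix
y = G @ u_true + 0.1 * rng.standard_normal(m)   # noise variance 0.01
```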

We first use a large ensemble size, 2000 ensemble members, to run $l_p$EKI. The ensemble is initialized by drawing samples from a Gaussian distribution with mean zero and a diagonal covariance (which yields variance 0.1 for each component). For $p=1$ and $0.7$, the tuned regularization coefficients $\lambda$ are 100 and 300, respectively. When $p=2$, which corresponds to TEKI, the best result can be obtained using $\lambda$ ranging from 10 to 200; we use the result of $\lambda=50$ to compare with the other cases. For $p=1$, we also compare with the result of the convex $l_1$ minimization method using the KKT solver in the Python library CVXOPT [20].

Figure 4: Compressive sensing. Reconstruction of the sparse signal using $l_p$EKI for $p$=2, 1, and 0.7. Ensemble size is 2000. The bottom right plot is the reconstruction using the convex $l_1$ minimization method. For the true signal, only the nonzero components are marked.

Figure 4 shows the $l_p$EKI estimates after 20 iterations averaged over 100 trials for $p=2$ (top left), $p=1$ (top right), and $p=0.7$ (bottom left), along with the estimate by the convex optimization method (bottom right). As is well known in compressive sensing, $l_2$ regularization fails to capture the true signal’s sparse structure. As $p$ decreases to 1, $l_p$EKI develops sparsity in the estimate, comparable to the estimate of the convex $l_1$ minimization method. The slightly weak magnitudes of the three most significant components by $l_p$EKI improve as $p$ decreases to $0.7$. When $p=0.7$, $l_p$EKI captures the correct magnitudes at the cost of losing the smallest-magnitude component. We note that the smallest-magnitude component is difficult to capture; its magnitude is comparable to the measurement error $0.1=\sqrt{0.01}$.

Method $l_1$ error data misfit
$p=2$, ens size 2000 14.0802 0.0515
$p=1$, ens size 2000 0.7848 0.8018
$p=0.7$, ens size 2000 0.2773 1.2737
$p=1$, ens size 50 1.6408 1.4095
$p=0.7$, ens size 50 0.6027 1.8958
$l_1$ convex minimization 0.5623 0.9030
Table 1: Compressive sensing. $l_p$EKI estimate $l_1$ error and data misfit for $p=2,1$, and $0.7$.
Figure 5: Compressive sensing. $l_1$ error of the $l_p$EKI estimate and data misfit.

Another cost of using $p<1$ to impose stronger sparsity than $p=1$ is a slower convergence rate of $l_p$EKI. The time series of the $l_1$ estimation error and the data misfit of $l_p$EKI averaged over 100 trials are shown in fig. 5, alongside those of the convex optimization method. The results show that $p=0.7$ converges more slowly than $p=1$ (see table 1 for the numerical values of the error and the misfit). Although there is a slowdown in convergence, it is worth noting that $l_p$EKI with $p=0.7$ converges in a reasonably short time, 15 iterations, to achieve the best result. $l_p$EKI with $p=2$ converges fast with the smallest data misfit. However, the $l_2$ regularization is not strong enough to impose sparsity in the estimate and yields the largest estimation error, which is 20 times larger than that of $p=1$. Note that the convex optimization method has the fastest convergence rate; it converges within three iterations and captures the four nonzero components with slightly smaller magnitudes than $p=0.7$ for the three most significant components.

Figure 6: Compressive sensing. Reconstruction of the sparse signal using $l_p$EKI for $p$=1 and 0.7. Ensemble size is 50. For the true signal, only the nonzero components are marked.

The ensemble size 2000 is larger than the dimension of the unknown vector $u$, 200. A large ensemble size can be impractical for a high-dimensional unknown vector. To see the applicability of $l_p$EKI using a small ensemble size, we use 50 ensemble members and two batches following the multiple batch approach [23]. The first batch runs 10 iterations, and all components whose magnitudes are less than 0.1 (the square root of the observation variance) are removed. The size of the problem solved by the second batch ranges from 30 to 45 (depending on the realization of the initial ensemble), which is then solved for another 10 iterations. The estimates using 50 ensemble members for $p=1$ and $p=0.7$ after two batch runs (i.e., 20 iterations) are shown in fig. 6. Compared with the large ensemble size case, the small ensemble size run also captures the most significant components, at the cost of larger fluctuations in the remaining components than in the large ensemble size test. We note that the estimates are averaged over 100 trials, and thus there are components whose magnitudes are less than the threshold value 0.1 used in the multiple batch run.

Figure 7: Compressive sensing. Standard deviation of the estimates using 100 trials.

As a measure of the performance difference across trials, fig. 7 shows the standard deviations of the $l_p$EKI estimates for $p=1$ and $0.7$ after 20 iterations. The first row shows the results using 2000 ensemble members, while the second row shows the ones using 50 ensemble members. The standard deviations of the large ensemble size are smaller than those of the small ensemble size case, as the large ensemble size has a smaller sampling error. In all cases, the standard deviations are smaller than 6% of the magnitude of the most significant components. In terms of $p$, the standard deviations of $p=0.7$ are smaller than those of $p=1$.

4.3 2D elliptic problem

Next, we consider an inverse problem where the forward model is given by an elliptic partial differential equation. The model is related to subsurface flow described by Darcy flow in the two-dimensional unit square $(0,1)^{2}\subset\mathbb{R}^{2}$

-\nabla\cdot(k(x)\nabla p(x))=f(x),\quad x=(x_{1},x_{2})\in(0,1)^{2}. (34)

The scalar field $k(x)>\alpha>0$ is the permeability, and the field $p(x)$ is the piezometric head or the pressure field of the flow. For a known source term $f(x)$, the inverse problem estimates the permeability from measurements of the pressure field $p$. This model is a standard model for an inverse problem in oil reservoir simulations and has been actively used to measure the performance of EKI and its variants, including TEKI [16, 8].

We follow the same setting used in TEKI [8] for the boundary conditions and the source term. The boundary conditions consist of Dirichlet and Neumann boundary conditions

p(x_{1},0)=100,\quad\frac{\partial p}{\partial x_{1}}(1,x_{2})=0,\quad-k\frac{\partial p}{\partial x_{1}}(0,x_{2})=500,\quad\frac{\partial p}{\partial x_{2}}(x_{1},1)=0,

and the source term is piecewise constant

f(x_{1},x_{2})=\begin{cases}0&\mbox{if }0\leq x_{2}\leq\frac{4}{6},\\ 137&\mbox{if }\frac{4}{6}<x_{2}\leq\frac{5}{6},\\ 274&\mbox{if }\frac{5}{6}<x_{2}\leq 1.\end{cases}

A physical motivation for the above configuration can be found in [7]. We use $15\times 15$ regularly spaced points in $(0,1)^{2}$ to measure the pressure field with a small measurement error variance $10^{-6}$. For a given $k$, the forward model is solved by a finite element method using the second-order polynomial basis on a $60\times 60$ uniform mesh.

In addition to the standard setup, we impose a sparse structure on the permeability. We assume that the log permeability, $\log k$, can be represented by 400 components in the cosine basis $\phi_{ij}=\cos(i\pi x_{1})\cos(j\pi x_{2})$, $i,j=0,1,...,19$,

\log k(x)=\sum_{i,j=0}^{19}u_{ij}\phi_{ij}(x), (35)

where only six of $\{u_{ij}\}$ are nonzero. That is, we assume that the discrete cosine transform of $\log k$ is sparse with only 6 nonzero components out of 400 components. Thus, the problem we consider here can be formulated as an inverse problem to recover $u=\{u_{ij}\}\in\mathbb{R}^{400}$ (which has only six nonzero components) from a measurement $y\in\mathbb{R}^{225}$, the measurement of $p$ at $15\times 15$ regularly spaced points. In terms of sparsity reconstruction, the current setup is similar to the previous compressive sensing problem, but the main difference lies in the forward model. In this test, the forward model is nonlinear and computationally expensive to solve, whereas the forward model in the compressive sensing test was a linear random measurement matrix.
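A minimal sketch of evaluating the log permeability of eq. 35 on a grid (the function name and grid handling are ours); the PDE solve itself is omitted here.

```python
import numpy as np

def log_permeability(u, x1, x2):
    """Evaluate log k(x) = sum_{i,j} u_ij cos(i pi x1) cos(j pi x2), eq. 35,
    for a 20 x 20 coefficient array u on a grid given by 1D arrays x1, x2."""
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    logk = np.zeros_like(X1)
    for i in range(20):
        for j in range(20):
            logk += u[i, j] * np.cos(i * np.pi * X1) * np.cos(j * np.pi * X2)
    return logk
```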

For this test, we run $l_p$EKI using only a small ensemble size due to the high computational cost of running the forward model. As in the previous test, we use the multiple batch approach. First, the $l_p$EKI ensemble of size 50 is initialized around zero with Gaussian perturbations of variance 0.1. After the first five iterations, all components whose magnitudes are less than $5\times 10^{-3}$ are removed at each iteration. The threshold value is slightly smaller than the smallest-magnitude component of the true signal. Over 100 different trials, the average number of nonzero components after 30 iterations is 18, which is smaller than the ensemble size.

Figure 8: 2D elliptic problem. Rows from top to bottom: (a) true, (b) $p=2$, (c) $p=1$, (d) $p=0.8$. Left column: the true $u$ and $l_p$EKI estimates for $p=2,1$, and $0.8$. Right column: $\log k$ of the true and $l_p$EKI estimates. All plots have the same grey scale. $p=1$ and $0.8$ use the results after 20 iterations while $p=2$ uses the result after 50 iterations. For the true signal, only the nonzero components are marked.

The true value of $u$ used in this test and its corresponding log permeability, $\log k$, are shown in the first row of fig. 8 ($u$ is represented as a one-dimensional vector by concatenating the row vectors of $\{u_{ij}\}$). The $l_p$EKI estimates for $p=2,1$, and $0.8$ are shown in the second to the fourth rows of fig. 8. Here $p=0.8$ was the smallest value we could use for $l_p$EKI due to the numerical overflow in the exponentiation of $\log k$. A smaller $p$ can be used with a smaller variance for the ensemble initialization, but the gain is marginal. The results of $l_p$EKI are similar to the compressive sensing case. $p=0.8$ has the best performance, recovering the four most significant components of $u$. $p=1$ has slightly weak magnitudes, missing the correct magnitudes of the two most significant components (corresponding to one-dimensional indices 141 and 364). Both cases converge within 20 iterations to yield the best result (see fig. 9 and table 2 for the time series and numerical values of the $l_1$ error and data misfit). When $p=2$, $l_p$EKI performs the worst; it has the largest $l_1$ error and data misfit. We note that $p=2$ uses the result after running 50 iterations, at which point the estimate converges.

Figure 9: 2D elliptic problem. $l_1$ error of the $l_p$EKI estimates and data misfit.
Figure 10: 2D elliptic problem. Standard deviation of the estimates using 100 trials.

The performance difference between different trials is not significant. The standard deviations of the $l_p$EKI estimates using 100 trials are shown in fig. 10. The standard deviations of the nonzero components are larger than those of the other components, but the largest standard deviation is less than 3% of the magnitude of the true signal. As in the compressive sensing test, the deviations are slightly smaller for $p<1$ than for $p=1$.

$p$ $l_1$ error data misfit
2 21.3389 4.1227
1 0.1553 0.5707
0.8 0.0719 0.5682
Table 2: 2D elliptic problem. $l_p$EKI estimate $l_1$ error and data misfit for $p=2,1$, and $0.8$.

5 Discussions and conclusions

We have proposed a strategy to implement $l_p$, $0<p\leq 1$, regularization in ensemble Kalman inversion (EKI) to recover sparse structures in the solution of an inverse problem. The $l_p$-regularized ensemble Kalman inversion ($l_p$EKI) proposed here uses a transformation to convert the $l_p$ regularization to the $l_2$ regularization, which is then solved by the standard EKI with the augmented measurement model used in Tikhonov EKI. We showed a one-to-one correspondence between the local minima of the original and the transformed formulations. Thus a local minimum of the original problem can be obtained by finding a local minimum of the transformed problem. As in other iterative methods for non-convex problems, initialization plays a vital role in the proposed method’s performance. The effectiveness and robustness of the regularized EKI are validated through a suite of numerical tests, showing robust results in recovering sparse solutions using $p\leq 1$.

In implementing $l_p$ regularization for EKI, there is a lower limit on $p<1$ due to numerical overflow. One possible workaround is to use a nonlinear augmented measurement model related to the transformation $\Psi$, not the transformation $\Xi$. The nonlinear measurement model is general enough to incorporate the $l_p$ regularization term directly instead of using the transformed $l_2$ problem. However, this approach lacks a mathematical framework to prevent the inadvertent addition of local minima. This approach is under investigation and will be reported elsewhere.

For successful applications of $l_p$EKI to high-dimensional inverse problems, it is essential to maintain a small ensemble size for efficiency. In the current study, we considered the multiple batch approach. The approach removes non-significant components after each batch, and thus the problem size (i.e., the dimension of the unknown signal) decreases over the batch runs. This approach enabled $l_p$EKI to use only 50 ensemble members to solve 200- and 400-dimensional inverse problems. Other techniques, such as variance inflation and localization, improve the performance of the standard EKI using a small ensemble size [24]. It would be natural to investigate whether these techniques can be extended to $l_p$EKI to decrease its sampling error.

In the current study, we have left several variants of $l_p$EKI for future work. Weighted $l_1$ has been shown to recover sparse solutions using fewer measurements than the standard $l_1$ [6]. It is straightforward to implement weighted $l_1$ (and further, weighted $l_p$ for $p<1$) in $l_p$EKI by replacing the identity matrix in eq. 13 with another type of covariance matrix corresponding to the desired weights. We plan to study several weighting strategies to improve the performance of $l_p$EKI. As another variant of $l_p$EKI, we plan to investigate adaptive time-stepping under the continuous-time limit. The time step for solving the continuous limit equation, which is called the ‘learning rate’ in the machine learning community, is known to affect the performance of an optimization solver [10]. The standard ensemble Kalman inversion has been applied to machine learning tasks, such as discovering the vector fields defining a differential equation using time series data [18] and sparse learning using thresholding [25]. We plan to investigate the effect of adaptive time-stepping for performance improvements and to compare with the sparsity-imposing EKI method using thresholding for dimension reduction in machine learning.

References

  • [1] J. L. Anderson, An ensemble adjustment kalman filter for data assimilation, Monthly Weather Review, 129 (2001), pp. 2884–2903.
  • [2] J. M. Bardsley, A. Solonen, H. Haario, and M. Laine, Randomize-then-optimize: A method for sampling from posterior distributions in nonlinear inverse problems, SIAM Journal on Scientific Computing, 36 (2014), pp. A1895–A1910.
  • [3] M. Benning and M. Burger, Modern regularization methods for inverse problems, Acta Numerica, 27 (2018), p. 1–111.
  • [4] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, USA, 2004.
  • [5] A. C. Reynolds, M. Zafari, and G. Li, Iterative forms of the ensemble kalman filter, (2006).
  • [6] E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted l1l_{1} minimization, Journal of Fourier Analysis and Applications, 14 (2008), pp. 877–905.
  • [7] J. Carrera and S. P. Neuman, Estimation of aquifer parameters under transient and steady state conditions: 3. application to synthetic and field data, Water Resources Research, 22 (1986), pp. 228–242.
  • [8] N. K. Chada, A. M. Stuart, and X. T. Tong, Tikhonov regularization within ensemble kalman inversion, SIAM Journal on Numerical Analysis, 58 (2020), pp. 1263–1294.
  • [9] N. K. Chada and X. T. Tong, Convergence acceleration of ensemble kalman inversion in nonlinear settings, (2019). arXiv:1911.02424.
  • [10] J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., 12 (2011), p. 2121–2159.
  • [11] G. Evensen, Data Assimilation: The Ensemble Kalman Filter, Springer, London, 2009.
  • [12] R. Fletcher, Practical methods of optimization, John Wiley, New York, 2nd ed., 1987.
  • [13] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Birkhäuser Basel, 2013.
  • [14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press, 2016.
  • [15] M. Hanke, A regularizing levenberg - marquardt scheme, with applications to inverse groundwater filtration problems, Inverse Problems, 13 (1997), pp. 79–95.
  • [16] M. Iglesias, K. J. H. Law, and A. Stuart, Ensemble kalman methods for inverse problems, Inverse Problems, 29 (2013), p. 045001.
  • [17] M. A. Iglesias, A regularizing iterative ensemble kalman method for PDE-constrained inverse problems, Inverse Problems, 32 (2016), p. 025002.
  • [18] N. B. Kovachki and A. M. Stuart, Ensemble kalman inversion: a derivative-free technique for machine learning tasks, Inverse Problems, 35 (2019), p. 095005.
  • [19] G. Li and A. C. Reynolds, An iterative ensemble kalman filter for data assimilation, (2007).
  • [20] M. S. Andersen, J. Dahl, and L. Vandenberghe, CVXOPT: A Python package for convex optimization. cvxopt.org.
  • [21] D. Oliver, A. C. Reynolds, and N. Liu, Inverse Theory for Petroleum Reservoir Characterization and History Matching, Cambridge University Press, Cambridge, UK, 1st ed., 2008.
  • [22] B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, 30 (1992), pp. 838–855.
  • [23] H. Schaeffer, Learning partial differential equations via data discovery and sparse optimization, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473 (2017), p. 20160446.
  • [24] C. Schillings and A. M. Stuart, Analysis of the ensemble kalman filter for inverse problems, SIAM Journal on Numerical Analysis, 55 (2017), pp. 1264–1290.
  • [25] T. Schneider, A. M. Stuart, and J.-L. Wu, Imposing sparsity within ensemble kalman inversion, (2020). arXiv:2007.06175.
  • [26] Z. Wang, J. M. Bardsley, A. Solonen, T. Cui, and Y. M. Marzouk, Bayesian inverse problems with $l_1$ priors: A randomize-then-optimize approach, SIAM Journal on Scientific Computing, 39 (2017), pp. S140–S166.