This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

An Efficient Data-Driven Multiscale Stochastic Reduced Order Modeling Framework for Complex Systems

Changhong Mou Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Drive, Madison, WI 53706, USA. [email protected] Nan Chen111Corresponding author Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Drive, Madison, WI 53706, USA. [email protected] Traian Iliescu Department of Mathematics, Virginia Tech, Virginia Tech, Blacksburg, VA 24061, USA. [email protected]
Abstract

Suitable reduced order models (ROMs) are computationally efficient tools in characterizing key dynamical and statistical features of nature. In this paper, a systematic multiscale stochastic ROM framework is developed for complex systems with strong chaotic or turbulent behavior. The new ROMs are fundamentally different from the traditional Galerkin ROM (G-ROM) or those deterministic ROMs that aim at minimizing the path-wise errors and applying mainly to laminar systems. Here, the new ROM focuses on recovering the large-scale dynamics to the maximum extent while it also exploits cheap but effective conditional linear functions as the closure terms to capture the statistical features of the medium-scale variables and its feedback to the large scales. In addition, physics constraints are incorporated into the new ROM. One unique feature of the resulting ROM is that it facilitates an efficient and accurate scheme for nonlinear data assimilation, the solution of which is provided by closed analytic formulae. Such an analytic solvable data assimilation solution significantly accelerates the computational efficiency and allows the new ROM to avoid many potential numerical and sampling issues in recovering the unobserved states from partial observations. The overall model calibration is efficient and systematic via explicit mathematical formulae. The new ROM framework is applied to complex nonlinear systems, in which the intrinsic turbulent behavior is either triggered by external random forcing or deterministic nonlinearity. It is shown that the new ROM significantly outperforms the G-ROM in both scenarios in terms of reproducing the dynamical and statistical features as well as recovering unobserved states via the associated efficient data assimilation scheme.

keywords:
Reduced Order Models, Data Assimilation, Physics Constraints, Analytically Solvable Conditional Statistics, Turbulent Systems
MSC:
[2010] 37N10 , 65C20 , 86A04 , 62M10
journal: arXiv.org

1 Introduction

Modeling complex nonlinear turbulent systems is of practical importance. These complex nonlinear system appear in many areas [1, 2, 3, 4, 5], such as geophysics, climate science, engineering, atmosphere and ocean science, neural science, and material science. They often involve strong nonlinearities and multiscale structures [3, 6, 7, 8]. Energy is transferred across different scales via the intrinsic nonlinearity that triggers regime switching, intermittent instabilities, extreme events and many other chaotic and turbulent features [9, 10, 11, 12, 13, 14, 15, 16]. As a result, the associated probability density functions (PDFs) are usually non-Gaussian with fat tails. Building appropriate mathematical models for complex systems that succeed in characterizing these key features not only advances the understanding of nature, but facilitates effective state estimation and skillful forecast as well. However, the complexity of nature makes it extremely challenging to develop a full order model (FOM) and apply it to forecast many quantities of interest. In fact, due to the lack of physical understanding or the inadequate resolution in computations, a perfect FOM is never available in practice. Even if a nearly perfect FOM is accessible, the model is often of high dimensionality and contains a complicated model structure [17, 18]. As a consequence, there exist many mathematical and computational difficulties in analyzing and simulating such a FOM.

Suitable reduced order models (ROMs) [19, 20, 21, 22, 23] are desirable surrogates for the FOMs to reproduce certain key dynamical and statistical features of nature with a much lower computational cost. Linear regression models are possibly the simplest class of ROMs [24, 25] that can provide certain skill for short-term forecasts but they usually struggle in characterizing the underlying nonlinear physics. Beyond the regression strategies, one systematic and commonly used approach to developing ROMs is to project the starting complex nonlinear system to the leading a few energetic modes in light of the Galerkin proper orthogonal decomposition methods [20] or other empirical basis functions such as the principal interaction patterns [26, 27] and the dynamic mode decomposition [28, 29]. With a careful design of the closure terms (see [30] for a survey) to compensate the truncation error, these ROMs are skillful in resolving certain problems in many areas, including fluids and turbulence [31, 32, 21, 23, 33]. While many traditional ROM strategies focus on seeking for the optimal approximate models with deterministic model structures, stochastic effects have been incorporated into many recent developments of ROMs for complex turbulent systems. The stochasticity is essential in assessing the uncertainty and model error in ROMs. It also plays a pivotal role in facilitating data assimilation and advancing probabilistic forecast, which is vital for predicting non-Gaussian extreme events. The MTV (named after Majda, Timofeyev and Vanden Eijnden) strategy is one of the most straightforward approaches to incorporate stochastic effects in the ROMs [34, 35], the idea of which is to replace the self-interactions among the high frequencies by random noise. The recently proposed nonlinear autoregression (NAR) model emphasizes the random effect in the development of regression-based ROMs and its significance in advancing the statistical forecast [36, 37]. On the other hand, physics-constrained regression models are a set of nonlinear stochastic ROMs [38, 39, 40], which take into account the energy conserving nonlinear interactions in the model development that guarantees the well-posednss of long-term behavior of the system. In addition, stochastic parameterizations are powerful ways to reduce the complexity of the unresolved part of the dynamics that nevertheless provides effective statistical feedbacks to the resolved-scale dynamics [41, 42, 43, 44, 45].

The goal of this paper is to build a systematic multiscale stochastic ROM framework for complex systems with strong chaotic or turbulent behavior. The new ROM framework has several unique features. First, a new strategy of building effective closure terms is incorporated into the framework that includes both deterministic and stochastic components. Fundamentally different from the traditional closure methods, the development of the closures in this new ROM framework exploits the multiscale characteristics of the underlying system to reach a systematic formulation that is mathematically more concise and tractable than most of the traditional nonlinear closures. Nevertheless, appropriate stochastic closure terms are combined with the deterministic ones in this framework that succeeds in providing effective statistical feedbacks between different scales. These closure terms significantly advance the ROM in reproducing both the dynamics and statistics of nature. Second, data assimilation is a necessary procedure in understanding and forecasting turbulent systems [46, 47, 48, 49]. It aims at recovering the unobserved state variables using Bayesian inference, and is the prerequisite of the ensemble forecast for turbulent flows. Different from most of the traditional ROMs that have to rely on ensemble based data assimilation methods for state estimation and the initialization of the forecast, the ROM with the new concise closures facilitates an efficient and accurate scheme for nonlinear data assimilation, the solution of which is provided by closed analytic formulae. Such an analytic solvable data assimilation solution increases the computational efficiency and allows the new ROM to avoid many potential numerical and sampling issues that usually require careful ad hoc tunings. Third, physics constraints can be naturally incorporated into this new ROM framework, which is essential for preserving the basic principles in physics and preventing finite-time blow-up solutions in the statistical forecast [38, 39, 50]. Another desirable feature is that the ROM and the associated closures can be systematically determined via a computationally efficient optimization procedure with no ad hoc tuning. Note that the development of the ROM in this framework does not require to know the exact FOM, which is often not accessible in practice. The framework is in fact amenable to data-driven scenarios. Specifically, the partially available information from the FOM can be used to determine the large-scale structure of the ROM, while the remaining part of the ROM including the closures can be inferred from data. Finally, it is worthwhile to highlight that many well-known complex nonlinear systems in geophysics, neural science, and engineering turbulence have the same structure as those belonging to the new ROM framework [51], which is a further justification of the framework.

The new ROM framework will be applied to complex nonlinear systems that belong to two categories, distinguished by the triggering mechanisms of the turbulent behavior. The first test model is a viscous stochastic Burgers equation [52, 53]. In such a complex system, external random forcing interacts with the deterministic nonlinearity to induce intermittent signals in the form of viscous shocks, which will be dissipated via the viscous term before the appearance of discontinuities. Here, the stochastic forcing triggers turbulent features. The second test example is the quasi-geostrophic (QG) equation [1, 9]. The QG equation is a deterministic system and the turbulent features are induced completely by the nonlinear interactions between the state variables at different spatial scales. In both cases, only a small number of the state variables are involved in the ROM that allows a much lower computational cost to reproduce the key dynamical and statistical features. It is also significant to see the skill of the stochastic closure terms in effectively characterizing the turbulent features triggered completely by the deterministic nonlinearity.

The rest of the paper is organized as follows. Section 2 includes the new ROM framework, the associated data assimilation scheme, and the model calibration. Section 3 demonstrates the skill of the new ROMs in reproducing the key dynamical and statistical features of both the viscous stochastic Burgers equation and the QG system. The comparison with the state-of-the-art ROMs in terms of both the model simulation and the data assimilation skill is presented in this section as well. The paper is concluded in Section 4, together with a discussion of potential extensions of the current framework. All the technical details are included in Appendix.

2 The New Reduced Order Modeling (ROM) Framework

2.1 The mathematical framework of the ROM with a conditional linear closure

Many complex turbulent nonlinear systems in nature have the following abstract mathematical structure [6, 1, 9, 47],

d𝐮dt=(L+D)𝐮+B(𝐮,𝐮)+𝐅(t)+𝝈(t)𝐖˙(t),\frac{\mathrm{d}\mathbf{u}}{\mathrm{d}t}=\left(L+D\right)\mathbf{u}+B\left(\mathbf{u},\mathbf{u}\right)+\mathbf{F}\left(t\right)+\bm{\sigma}\left(t\right)\dot{\mathbf{W}}\left(t\right), (1)

where 𝐮N\mathbf{u}\in\mathbb{C}^{N} is the multidimensional state variable and tt is time. In (1), L𝐮L\mathbf{u} and D𝐮D\mathbf{u} represent linear dispersion and dissipation effects, respectively, where LL is skew-symmetric and DD is negative-definite. The nonlinear effect is introduced through an energy-conserving quadratic form with 𝐮B(𝐮,𝐮)=0\mathbf{u}\cdot B\left(\mathbf{u},\mathbf{u}\right)=0. The system is also subject to external forcing effects that contain a deterministic component, 𝐅(t)\mathbf{F}\left(t\right), and possibly a stochastic disturbance as well, which is assumed to be a Gaussian random noise, 𝝈(t)𝐖˙(t)\bm{\sigma}\left(t\right)\dot{\mathbf{W}}\left(t\right), for simplicity. The model in (1) can be understood as discretizing a (stochastic) partial differential equation by projecting it onto a given set of basis functions {𝝋1,𝝋2,,𝝋N}\{\bm{\varphi}_{1},\bm{\varphi}_{2},\ldots,\bm{\varphi}_{N}\}, resulting in a high-dimensional system, where the stochastic forcing accounts for the truncation error. Due to the intrinsic turbulent behavior, the system in (1) is highly nonlinear and the state variable 𝐮\mathbf{u} can possess strongly non-Gaussian statistics in both the marginal and joint PDFs.

Decompose the state variable 𝐮\mathbf{u} into 𝐮=(𝐯𝚃,𝐰𝚃,𝐳𝚃)𝚃\mathbf{u}=(\mathbf{v}^{\mathtt{T}},\mathbf{w}^{\mathtt{T}},\mathbf{z}^{\mathtt{T}})^{\mathtt{T}} with 𝚃\cdot^{\mathtt{T}} being the vector transpose, where 𝐯N1\mathbf{v}\in\mathbb{C}^{N_{1}}, 𝐰N2\mathbf{w}\in\mathbb{C}^{N_{2}}, 𝐳N3\mathbf{z}\in\mathbb{C}^{N_{3}}, and N1+N2+N3=NN_{1}+N_{2}+N_{3}=N. A typical criterion for such a decomposition is based on the rank of the scales for the state variables. In other words, 𝐯\mathbf{v} includes the leading few modes corresponding to the large-scale components of the state variables, 𝐰\mathbf{w} contains the next a few modes describing the medium-scale features, while 𝐳\mathbf{z} involves the remaining modes that are treated as the small-scale variables. Note that different from many traditional ROM development, where the state variables are decomposed into only the large- and small-scale components, a three-scale decomposition is utilized here. Such a new decomposition facilitates a smoother transition from large- to small-scale variables by means of a set of intermediate-scale variables. It also allows the feedback from medium- to large-scale variables to be explicitly incorporated into the ROM, which is often crucial in enhancing the model accuracy in characterizing energy transfer and intermittency. Finally, the choice of N1,N2N_{1},N_{2}, and N3N_{3} is case dependent but in a typical scenario 𝐳\mathbf{z} accounts for the majority number of the modes.

A natural way to build ROMs with a reduced dimension of the coupled system in (1) is to project 𝐮\mathbf{u} onto its large- and medium-scale components 𝐯\mathbf{v} and 𝐰\mathbf{w},

d𝐯dt\displaystyle\frac{{\,\rm d}\mathbf{v}}{{\,\rm d}t} =𝐀v(v)𝐯+𝐯𝐁~vv(v)𝐯+𝐯𝐁~vw(v)𝐰+𝐰𝐁~ww(v)𝐰+residual1,\displaystyle=\mathbf{A}^{(v)}_{v}\mathbf{v}+\mathbf{v}^{*}\tilde{\mathbf{B}}^{(v)}_{vv}\mathbf{v}+\mathbf{v}^{*}\tilde{\mathbf{B}}^{(v)}_{vw}\mathbf{w}+\mathbf{w}^{*}\tilde{\mathbf{B}}^{(v)}_{ww}\mathbf{w}+\mbox{residual}_{1}, (2a)
d𝐰dt\displaystyle\frac{{\,\rm d}\mathbf{w}}{{\,\rm d}t} =𝐀w(w)𝐰+𝐯𝐁~vv(w)𝐯+𝐯𝐁~vw(w)𝐰+𝐰𝐁~ww(w)𝐰+residual2,\displaystyle=\mathbf{A}^{(w)}_{w}\mathbf{w}+\mathbf{v}^{*}\tilde{\mathbf{B}}^{(w)}_{vv}\mathbf{v}+\mathbf{v}^{*}\tilde{\mathbf{B}}^{(w)}_{vw}\mathbf{w}+\mathbf{w}^{*}\tilde{\mathbf{B}}^{(w)}_{ww}\mathbf{w}+\mbox{residual}_{2}, (2b)

where \cdot^{*} is the conjugate transpose. It is seen that the linear and the quadratic nonlinear terms in (1) have been explicitly expressed in (2). Therefore, 𝐀()\mathbf{A}^{(\cdot)}_{\cdot} and 𝐁~()\tilde{\mathbf{B}}^{(\cdot)}_{\cdot} in (2) are all constant matrices and tensors while the two residual terms include the contribution from the variable 𝐳\mathbf{z} that has been truncated here. The notation in (2) is slightly abused in the sense that the quadratic nonlinear terms between 𝐯\mathbf{v} and 𝐰\mathbf{w} are simply written as 𝐯𝐁~vw(v)𝐰\mathbf{v}^{*}\tilde{\mathbf{B}}^{(v)}_{vw}\mathbf{w} and 𝐯𝐁~vw(w)𝐰\mathbf{v}^{*}\tilde{\mathbf{B}}^{(w)}_{vw}\mathbf{w} while the other parts of the associated nonlinear terms, say 𝐰𝐁~~vw(v)𝐯\mathbf{w}^{*}\tilde{\tilde{\mathbf{B}}}^{(v)}_{vw}\mathbf{v} and 𝐰𝐁~~vw(w)𝐯\mathbf{w}^{*}\tilde{\tilde{\mathbf{B}}}^{(w)}_{vw}\mathbf{v}, are omitted in (2) for notation simplicity. Note that if the state variables are all real-valued, then 𝐯𝐁~vw(v)𝐰\mathbf{v}^{*}\tilde{\mathbf{B}}^{(v)}_{vw}\mathbf{w} and 𝐯𝐁~vw(w)𝐰\mathbf{v}^{*}\tilde{\mathbf{B}}^{(w)}_{vw}\mathbf{w} are sufficient to characterize the quadratic nonlinear interactions between 𝐯\mathbf{v} and 𝐰\mathbf{w}. Since each complex-valued variable can always be rewritten as two real-valued variables, representing its real and imaginary parts, all the state variables in the following will be regarded as real-valued. Similarly, the notation \cdot^{*} hereafter simply denotes the transpose of a real-valued vector. In the development of ROMs, various methods have been proposed to deal with the residual terms. If these residual terms are simply ignored, then the resulting ROM becomes the so-called Garlerkin ROM (G-ROM) [26, 27]. However, it has been shown that the G-ROM often introduces inaccuracy in reproducing even the large-scale dynamics of the FOM. Such a finding implies the importance of building approximate functions to compensate the contribution from the residuals. If these functions depend only on the state variables of the ROM, then they are known as the closures since they allow the ROM to be closed [30, 31, 21, 23, 33]. In most of the existing methods, the closure terms are assumed to be a linear or quadratic function of the entire ROM state variables, which are (𝐯,𝐰)(\mathbf{v},\mathbf{w}) in the setup of (2).

The new ROM to be developed here has several fundamental differences compared with the existing ROMs. First, only the first three terms on the right hand side of (2) are retained in the new ROM framework, which represent the basic dynamics. In other words, the self nonlinear interactions of 𝐰\mathbf{w} are omitted. One motivation for such a simplification is that 𝐰\mathbf{w} is a smaller scale variable compared with 𝐯\mathbf{v} and therefore it is often a faster variable and contains less energy. As a consequence, 𝐰𝐁~vw(v)𝐰\mathbf{w}^{*}\tilde{\mathbf{B}}^{(v)}_{vw}\mathbf{w} and 𝐰𝐁~vw(w)𝐰\mathbf{w}^{*}\tilde{\mathbf{B}}^{(w)}_{vw}\mathbf{w} are expected to have a weaker role than the other nonlinear terms in (2). Another important reason is related to the development of an efficient data assimilation scheme for the ROM, which will be discussed in Section 2.2. Next, a hybrid closure configuration is introduced to approximate the residual part as well as the contribution of the self nonlinear interactions of 𝐰\mathbf{w}. Here, the closure includes both a set of deterministic functions and a stochastic component. The building block of the deterministic part of the closure in this new ROM framework is very distinct fromthat in the existing methods, which is either linear or quadratic nonlinear in terms of both 𝐯\mathbf{v} and 𝐰\mathbf{w}. Here, a special type of nonlinear functions are adopted, which are fully nonlinear with respect to 𝐯\mathbf{v} but are conditional linear with respect to 𝐰\mathbf{w}. The motivation is again that, due to the relatively faster nature of 𝐰\mathbf{w}, a conditional linear function and stochastic terms can effectively approximate the statistical behavior of 𝐰\mathbf{w} and its feedback to the large-scale component, 𝐯\mathbf{v}. With these new ingredients, the resulting ROM with the closure reads:

d𝐯dt\displaystyle\frac{{\,\rm d}\mathbf{v}}{{\,\rm d}t} =𝐀v(v)𝐯+𝐯𝐁vv(v)𝐯+𝐯𝐁vw(v)𝐰+(𝝉w(v)(𝐯)𝐰+𝝉v(v)(𝐯))+𝝈v𝐖˙v,\displaystyle=\mathbf{A}^{(v)}_{v}\mathbf{v}+\mathbf{v}^{*}\mathbf{B}^{(v)}_{vv}\mathbf{v}+\mathbf{v}^{*}\mathbf{B}^{(v)}_{vw}\mathbf{w}+(\bm{\tau}^{(v)}_{w}(\mathbf{v})\mathbf{w}+\bm{\tau}^{(v)}_{v}(\mathbf{v}))+\bm{\sigma}_{v}\dot{\mathbf{W}}_{v}, (3a)
d𝐰dt\displaystyle\frac{{\,\rm d}\mathbf{w}}{{\,\rm d}t} =𝐀w(w)𝐰+𝐯𝐁vv(w)𝐯+𝐯𝐁vw(w)𝐰+(𝝉w(w)(𝐯)𝐰+𝝉v(w)(𝐯))+𝝈w𝐖˙w,\displaystyle=\mathbf{A}^{(w)}_{w}\mathbf{w}+\mathbf{v}^{*}\mathbf{B}^{(w)}_{vv}\mathbf{v}+\mathbf{v}^{*}\mathbf{B}^{(w)}_{vw}\mathbf{w}+(\bm{\tau}^{(w)}_{w}(\mathbf{v})\mathbf{w}+\bm{\tau}^{(w)}_{v}(\mathbf{v}))+\bm{\sigma}_{w}\dot{\mathbf{W}}_{w}, (3b)

where 𝐖˙v\dot{\mathbf{W}}_{v} and 𝐖˙w\dot{\mathbf{W}}_{w} are the standard Gaussian white noises. The noise coefficients 𝝈v\bm{\sigma}_{v} and 𝝈w\bm{\sigma}_{w} are constant matrices and for simplicity they are diagonal. The four matrix or vector functions in the closure terms 𝝉w(v)(𝐯)\bm{\tau}^{(v)}_{w}(\mathbf{v}), 𝝉v(v)(𝐯)\bm{\tau}^{(v)}_{v}(\mathbf{v}), 𝝉w(w)(𝐯)\bm{\tau}^{(w)}_{w}(\mathbf{v}), and 𝝉v(w)(𝐯)\bm{\tau}^{(w)}_{v}(\mathbf{v}) can be arbitrarily nonlinear functions of 𝐯\mathbf{v}. It is worthwhile to highlight that the conditional linear functions of 𝐰\mathbf{w} are not linear functions of 𝐰\mathbf{w}. The conditional linearity means that 𝐰\mathbf{w} is linear with 𝐯\mathbf{v} being fixed. Therefore, the closure in (3) includes the effects from the nonlinear interactions between 𝐯\mathbf{v} and 𝐰\mathbf{w}. Here, for the consistency with the quadratic nonlinearity in the starting model (2), the closure terms 𝝉w(v)(𝐯)𝐰+𝝉v(v)(𝐯)\bm{\tau}^{(v)}_{w}(\mathbf{v})\mathbf{w}+\bm{\tau}^{(v)}_{v}(\mathbf{v}) and 𝝉w(w)(𝐯)𝐰+𝝉v(w)(𝐯)\bm{\tau}^{(w)}_{w}(\mathbf{v})\mathbf{w}+\bm{\tau}^{(w)}_{v}(\mathbf{v}) are assumed to be quadratic, and therefore

𝝉w(v)(𝐯)=𝐯𝐂wτ(v)+𝐃wτ(v),𝝉v(v)(𝐯)=𝐯𝐁vτ(v)𝐯+𝐂vτ(v)𝐯+𝐃vτ(v)𝝉w(w)(𝐯)=𝐯𝐂wτ(w)+𝐃wτ(w),𝝉v(w)(𝐯)=𝐯𝐁vτ(w)𝐯+𝐂vτ(w)𝐯+𝐃vτ(w)\begin{split}\bm{\tau}^{(v)}_{w}(\mathbf{v})=\mathbf{v}^{*}\mathbf{C}^{(v)}_{w\tau}+\mathbf{D}^{(v)}_{w\tau},&\qquad\bm{\tau}^{(v)}_{v}(\mathbf{v})=\mathbf{v}^{*}\mathbf{B}^{(v)}_{v\tau}\mathbf{v}+\mathbf{C}^{(v)}_{v\tau}\mathbf{v}+\mathbf{D}^{(v)}_{v\tau}\\ \bm{\tau}^{(w)}_{w}(\mathbf{v})=\mathbf{v}^{*}\mathbf{C}^{(w)}_{w\tau}+\mathbf{D}^{(w)}_{w\tau},&\qquad\bm{\tau}^{(w)}_{v}(\mathbf{v})=\mathbf{v}^{*}\mathbf{B}^{(w)}_{v\tau}\mathbf{v}+\mathbf{C}^{(w)}_{v\tau}\mathbf{v}+\mathbf{D}^{(w)}_{v\tau}\end{split} (4)

where 𝐃vτ(v)\mathbf{D}^{(v)}_{v\tau} and 𝐃vτ(w)\mathbf{D}^{(w)}_{v\tau} are constant vectors, while 𝐁()\mathbf{B}^{(\cdot)}_{\cdot}, 𝐂()\mathbf{C}^{(\cdot)}_{\cdot}, and 𝐃()\mathbf{D}^{(\cdot)}_{\cdot} are constant matrices.

It is worthwhile to highlight that many complex nonlinear systems have already fitted into the framework of (3). Some well-known classes of the models are [51, 54]:

  • 1.

    Physics-constrained nonlinear stochastic models. Examples include the noisy versions of Lorenz models, low-order models of Charney-DeVore flows, and a paradigm model for topographic mean flow interaction.

  • 2.

    Stochastically coupled reaction-diffusion models in neuroscience and ecology. Examples include stochastically coupled FitzHugh-Nagumo models and stochastically coupled SIR epidemic models.

  • 3.

    Multi-scale models in turbulence, fluids and geophysical flows. Example include the Boussinesq equations with noise and stochastically forced rotating shallow water equation.

Other applications that share similar motivations or ideas as (3) include predicting the intermittent time-series of the monsoon intraseasonal variabilities [55, 56], recovering the turbulent ocean flows with noisy observations from Lagrangian tracers [57, 58], cheap solvable forecast models in dynamic stochastic superresolution [59, 60], and stochastic superparameterization for geophysical turbulence [8]. These examples further justify that the new ROM framework (3) is appropriate to characterize or approximate many nonlinear and non-Gaussian systems in various disciplines.

2.2 Efficient data assimilation utilizing the new ROM

One of the major goals of building a ROM is to use it as a cheap forecast model. For turbulent systems, the forecast skill relies heavily on the accuracy of the initial values. However, it is often the case that only partial observations are available for complex systems. These partial observations usually correspond to the large-scale modes as they are the easiest to identify, while the variables that are below certain spatial scales are often hard to beapproximated. Therefore, it is important to combine the ROM with the partial observations for recovering the unobserved state variables in an accurate fashion,which then can be used as initializations for the forecast [61, 62, 63, 64, 65, 66, 67]. This is known as data assimilation (or filtering) [47, 68, 49, 46, 48].

Ensemble based data assimilation schemes [46, 69], such as the ensemble Kalman filter, have been widely utilized in practiceeliminate the rest of this sentence? and lead to many successes. However, there are a few intrinsic difficulties in applying these classical data assimilation methods. The main challenge comes from the sampling error as the state estimation is completely determined by the ensembles. To prevent the filter divergence, localization, noise inflation, and many other ad hoc tuning procedures are usually inevitable as effective remedies when using the ensemble based methods [70, 71]. In addition, if a large number of ensembles are used in the system to guarantee the accuracy of the data assimilation results, then the computational cost can significantly increase.

Different from the traditional data assimilation methods, the new ROM in (3) allows an efficient and accurate data assimilation scheme. The solution of the data assimilation result is given by closed analytic formulae and therefore it avoids all the ad hoc tuning procedures. In fact, the ROM in (3) can be written into a more abstract form,

d𝐯(t)dt\displaystyle\frac{{\,\rm d}\mathbf{v}(t)}{{\,\rm d}t} =[𝐀𝟎(𝐯,t)+𝐀𝟏(𝐯,t)𝐰(t)]+𝐁𝟏(𝐯,t)𝐖˙𝟏(t),\displaystyle=\Big{[}\mathbf{A}_{\mathbf{0}}(\mathbf{v},t)+\mathbf{A}_{\mathbf{1}}(\mathbf{v},t)\mathbf{w}(t)\Big{]}+\mathbf{B}_{\mathbf{1}}(\mathbf{v},t)\dot{\mathbf{W}}_{\mathbf{1}}(t), (5a)
d𝐰(t)dt\displaystyle\frac{{\,\rm d}\mathbf{w}(t)}{{\,\rm d}t} =[𝐚𝟎(𝐯,t)+𝐚𝟏(𝐯,t)𝐰(t)]+𝐛𝟐(𝐯,t)𝐖˙𝟐(t),\displaystyle=\Big{[}\mathbf{a}_{\mathbf{0}}(\mathbf{v},t)+\mathbf{a}_{\mathbf{1}}(\mathbf{v},t)\mathbf{w}(t)\Big{]}+\mathbf{b}_{\mathbf{2}}(\mathbf{v},t)\dot{\mathbf{W}}_{\mathbf{2}}(t), (5b)

where 𝐀0,𝐚0,𝐀1,𝐚1,𝐁1\mathbf{A}_{0},\mathbf{a}_{0},\mathbf{A}_{1},\mathbf{a}_{1},\mathbf{B}_{1}, and 𝐛2\mathbf{b}_{2} are matrices and vectors that can depend nonlinearly on the state variables 𝐯\mathbf{v}. Despite being highly nonlinear and non-Gaussian, the process of 𝐰\mathbf{w} in (5) is linear by design if a realization of the time series 𝐯(s)\mathbf{v}(s) for s[0,t]s\in[0,t] is given. Due to such a conditional linear property, the conditional distribution of 𝐰(t)\mathbf{w}(t), given a realization of 𝐯(s)\mathbf{v}(s) for s[0,t]s\in[0,t], is Gaussian,

p(𝐰(t)|𝐯(s),st)=𝒩(𝝁(t),𝐑(t)).p(\mathbf{w}(t)|\mathbf{v}(s),s\leq t)=\mathcal{N}(\bm{\mu}(t),\mathbf{R}(t)). (6)

One desirable feature of the ROM framework (5) is that the conditional mean 𝝁\bm{\mu} and the conditional covariance 𝐑\mathbf{R} can be solved via the following closed analytic formulae [72, 51, 54]

d𝝁dt\displaystyle\frac{{\,\rm d}\bm{\mu}}{{\,\rm d}t} =(𝐚𝟎+𝐚𝟏𝝁)+(𝐑𝐀𝟏)(𝐁1𝐁1)1[d𝐯dt(𝐀𝟎+𝐀𝟏𝝁)],\displaystyle=(\mathbf{a}_{\mathbf{0}}+\mathbf{a}_{\mathbf{1}}\bm{\mu})+(\mathbf{R}\mathbf{A}_{\mathbf{1}}^{*})(\mathbf{B}_{1}\mathbf{B}_{1}^{*})^{-1}\left[\frac{{\,\rm d}\mathbf{v}}{{\,\rm d}t}-(\mathbf{A}_{\mathbf{0}}+\mathbf{A}_{\mathbf{1}}\bm{\mu})\right], (7a)
d𝐑dt\displaystyle\frac{{\,\rm d}\mathbf{R}}{{\,\rm d}t} =𝐚𝟏𝐑+𝐑𝐚𝟏+𝐛2𝐛2(𝐑𝐀𝟏)(𝐁1𝐁1)1(𝐀𝟏𝐑).\displaystyle=\mathbf{a}_{\mathbf{1}}\mathbf{R}+\mathbf{R}\mathbf{a}_{\mathbf{1}}^{*}+\mathbf{b}_{2}\mathbf{b}_{2}^{*}-(\mathbf{R}\mathbf{A}_{\mathbf{1}}^{*})(\mathbf{B}_{1}\mathbf{B}_{1}^{*})^{-1}(\mathbf{A}_{\mathbf{1}}\mathbf{R}). (7b)

The explicit formulae in (6)–(7) correspond to the optimal nonlinear filter (namely the online data assimilation) solution of the state variable 𝐰(t)\mathbf{w}(t) given a realization of the observed time series 𝐯(s)\mathbf{v}(s) for s[0,t]s\in[0,t]. Thus, 𝝁\bm{\mu} and 𝐑\mathbf{R} are also known as the filter posterior mean and the filter posterior covariance, respectively. The closed analytic formulae for the filtering solution allow an efficient and accurate online state estimation of the unobserved state 𝐰\mathbf{w} from the observed time series of 𝐯\mathbf{v} and the model structure of the ROM. The real-time recovery of the unobserved state provides the initial condition of 𝐰\mathbf{w}, which together with the observed initial value of 𝐯\mathbf{v}, advances the forecast utilizing the ROM. Note that the classical Kalman-Bucy filter [73] is the simplest special example of (7).

Due to the solvable conditional Gaussian statistics in (6)–(7), the new ROM is named as the conditional Gaussian ROM (CG-ROM).

2.3 Data-driven model calibration

What remains is to determine the matrices and vectors in the ROM (3), which correspond to the model parameters. As was discussed above, partial observation means only a time series of the observed large-scale variable 𝐯\mathbf{v} is available in practice. Yet, it is assumed that a time series of both 𝐯\mathbf{v} and 𝐰\mathbf{w} is accessible in the model calibration stage to simplify the procedure of determining the model parameters. This can be explained as using the reanalysis data in many practical applications, where the historical trajectories of the unobserved data has been inferred via certain Bayesian approaches [74]. Otherwise, an iteration process involving an alternation between state estimation and parameter estimation can be adopted for the model calibration. The basic algorithm of parameter estimation remains the same but the entire process is more complicated, which will diverge from the ROM development here, and is therefore treated as future work(see the discussion in Section 4). Nevertheless, only the value of the observed variable 𝐯\mathbf{v} is available in the forecast stage such that data assimilation becomes essential in recovering the initial value of 𝐰\mathbf{w}.

The parameters in the ROM (3) can be divided into three sets:

  1. (A).

    The parameters in the basic dynamics: 𝐀v(v)\mathbf{A}^{(v)}_{v}, 𝐁vv(v)\mathbf{B}^{(v)}_{vv}, 𝐁vw(v)\mathbf{B}^{(v)}_{vw}, 𝐀w(w)\mathbf{A}^{(w)}_{w}, 𝐁vv(w)\mathbf{B}^{(w)}_{vv}, 𝐁vw(w)\mathbf{B}^{(w)}_{vw} in (3)

  2. (B).

    The parameters in the deterministic part of the closure: 𝐁()\mathbf{B}^{(\cdot)}_{\cdot}, 𝐂()\mathbf{C}^{(\cdot)}_{\cdot} and 𝐃()\mathbf{D}^{(\cdot)}_{\cdot} in (4)

  3. (C).

    The parameters in the stochastic part of the closure: 𝝈v\bm{\sigma}_{v} and 𝝈w\bm{\sigma}_{w} in (3)

The parameters in Set (A) are inherited from the FOM if the FOM is available. Otherwise, they can be handled in the same way as those in Sets (B) and (C), which are computed using the following optimization algorithm. For the simplicity of discussion, consider the discrete approximation of the original continuous version of the ROM (3) by applying the Euler-Maruyama scheme:

𝐯j+1\displaystyle\mathbf{v}^{j+1} =𝐯j+[𝐀v(v)𝐯j+(𝐯j)𝐁vv(v)𝐯j+(𝐯j)𝐁vw(v)𝐰j]Δt𝐂Lj+\displaystyle=\underbrace{\mathbf{v}^{j}+\biggl{[}\mathbf{A}^{(v)}_{v}\mathbf{v}^{j}+(\mathbf{v}^{j})^{*}\mathbf{B}^{(v)}_{vv}\mathbf{v}^{j}+(\mathbf{v}^{j})^{*}\mathbf{B}^{(v)}_{vw}\mathbf{w}^{j}\biggr{]}\Delta t}_{\mathbf{C}_{L}^{j}}+
[𝝉w(v)(𝐯j)𝐰j+𝝉v(v)(𝐯j)]Δt𝐌Lj(θL)+(𝝈vjΔt)ε𝐯j,\displaystyle\underbrace{\biggl{[}\bm{\tau}^{(v)}_{w}(\mathbf{v}^{j})\mathbf{w}^{j}+\bm{\tau}^{(v)}_{v}(\mathbf{v}^{j})\biggr{]}\Delta t}_{\mathbf{M}^{j}_{L}(\theta_{L})}+(\bm{\sigma}_{v}^{j}\sqrt{\Delta t})\varepsilon^{j}_{\mathbf{v}}, (8a)
𝐰j+1\displaystyle\mathbf{w}^{j+1} =𝐰j+[𝐀w(w)𝐯j+(𝐯j)𝐁vv(w)𝐯j+(𝐯j)𝐁vw(w)𝐰j]Δt𝐂Sj+\displaystyle=\underbrace{\mathbf{w}^{j}+\biggl{[}\mathbf{A}^{(w)}_{w}\mathbf{v}^{j}+(\mathbf{v}^{j})^{*}\mathbf{B}^{(w)}_{vv}\mathbf{v}^{j}+(\mathbf{v}^{j})^{*}\mathbf{B}^{(w)}_{vw}\mathbf{w}^{j}\biggr{]}\Delta t}_{\mathbf{C}_{S}^{j}}+
[𝝉w(w)(𝐯j)𝐰j+𝝉v(w)(𝐯j)]Δt𝐌Sj(θS)+(𝝈wjΔt)ε𝐰j,\displaystyle\underbrace{\biggl{[}\bm{\tau}^{(w)}_{w}(\mathbf{v}^{j})\mathbf{w}^{j}+\bm{\tau}^{(w)}_{v}(\mathbf{v}^{j})\biggr{]}\Delta t}_{\mathbf{M}^{j}_{S}(\theta_{S})}+(\bm{\sigma}_{w}^{j}\sqrt{\Delta t})\varepsilon^{j}_{\mathbf{w}}, (8b)

where ε𝐯j\varepsilon^{j}_{\mathbf{v}} and ε𝐰j\varepsilon^{j}_{\mathbf{w}} are standard independent and identically distributed Gaussian random variables at time tjt_{j}. Assume the noise coefficients are constants, namely 𝝈vj:=𝝈v\bm{\sigma}_{v}^{j}:=\bm{\sigma}_{v} and 𝝈wj:=𝝈w\bm{\sigma}_{w}^{j}:=\bm{\sigma}_{w} for all jj. Denote 𝚺\bm{\Sigma} a block diagonal matrix with entries (𝝈v)2Δt(\bm{\sigma}_{v})^{2}\Delta t and (𝝈w)2Δt(\bm{\sigma}_{w})^{2}\Delta t. Note that in many practical situations 𝚺\bm{\Sigma} is a diagonal matrix, from which 𝝈v\bm{\sigma}_{v} and 𝝈w\bm{\sigma}_{w} can be inferred. Further denote

𝜽=[𝜽L𝜽S]𝐂=[𝐂Lj𝐂Sj]𝐌j𝜽=[𝐌Lj𝜽L00𝐌Sj𝜽S],\bm{\theta}=\begin{bmatrix}\bm{\theta}_{L}\\ \bm{\theta}_{S}\end{bmatrix}\qquad\mathbf{C}=\begin{bmatrix}\mathbf{C}_{L}^{j}\\ \mathbf{C}_{S}^{j}\end{bmatrix}\qquad\mathbf{M}^{j}\bm{\theta}=\begin{bmatrix}\mathbf{M}_{L}^{j}\bm{\theta}_{L}&0\\ 0&\mathbf{M}_{S}^{j}\bm{\theta}_{S}\end{bmatrix}, (9)

where 𝜽L\bm{\theta}_{L} contains all the parameters in 𝝉w(v),𝝉v(v)\bm{\tau}^{(v)}_{w},\bm{\tau}^{(v)}_{v} and 𝜽S\bm{\theta}_{S} contains all those in 𝝉w(w),𝝉v(w)\bm{\tau}^{(w)}_{w},\bm{\tau}^{(w)}_{v}. Here, the parameters are assumed to appear as multiplicative pre-factors of functions consisting of state variables 𝐌j\mathbf{M}^{j} such that the closure terms can be written in an abstract form 𝐌j𝜽\mathbf{M}^{j}\bm{\theta}. The vector 𝐂j\mathbf{C}^{j} in (9) includes all those terms on the right hand side of (3) that do not contain parameters. Then the optimization problem based on a linear regression can be formulated as [75]:

~=min𝜽,𝚺(j=1J(Jlog|𝚺|+12j(𝒛j+1𝐌j𝜽𝐂j)𝐑1(𝒛j+1𝐌j𝜽𝐂j))),\displaystyle\widetilde{\mathcal{L}}=\min_{\bm{\theta},\bm{\Sigma}}\biggl{(}\sum_{j=1}^{J}\biggl{(}J\log|\bm{\Sigma}|+\frac{1}{2}\sum_{j}\,\bigl{(}\bm{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\bigr{)}^{\ast}\mathbf{R}^{-1}\bigl{(}\bm{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\bigr{)}\,\biggr{)}\biggr{)}, (10)

where 𝒛j+1=[𝐯j+1;𝐰j+1]\bm{z}^{j+1}=[\mathbf{v}^{j+1};\mathbf{w}^{j+1}] and JJ is the number of training snapshots. To solve the optimization problem (10), we use the iterative method to compute the following:

𝚺=1Jj=1J(𝐳j+1𝐌j𝜽𝐂j)(𝐳j+1𝐌j𝜽𝐂j),\displaystyle\bm{\Sigma}=\frac{1}{J}\sum_{j=1}^{J}\left(\mathbf{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\right)\left(\mathbf{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\right)^{\ast}, (11)
𝜽=(jJ(𝐌j)𝚺1𝐌j)1(jJ(𝐌j)𝚺1(𝐳j+1𝐂j)).\displaystyle\bm{\theta}=\biggl{(}\sum_{j}^{J}\,(\mathbf{M}^{j})^{\ast}\bm{\Sigma}^{-1}\mathbf{M}^{j}\biggr{)}^{-1}\,\biggl{(}\sum_{j}^{J}\,(\mathbf{M}^{j})^{\ast}\bm{\Sigma}^{-1}\bigl{(}\mathbf{z}^{j+1}-\mathbf{C}^{j}\bigr{)}\biggr{)}. (12)

2.4 Incorporating physics constraints into the ROM

Physics constraints correspond to the conservation of the energy in the quadratic nonlinear terms in (1), namely, 𝐮B(𝐮,𝐮)=0\mathbf{u}\cdot B\left(\mathbf{u},\mathbf{u}\right)=0. Incorporating the physics constraint into the ROM is central in preventing finite-time blow-up of the solutions and facilitates a skillful medium- to long-range forecast [38, 39, 50].

The physics constraints (or in general any constraint) for the model can be written as the following matrix equality:

𝐇𝜽=𝟎,\displaystyle\mathbf{H}\bm{\theta}=\mathbf{0}, (13)

where 𝐇\mathbf{H} contains those functions that are involved in the constraints. Then the optimization problem in (10) is reformulated as a constrained optimization problem,

~=min𝜽,𝚺(J2log|𝚺1|+12j(𝒛j+1𝐌j𝜽𝐂j)𝚺1(𝒛j+1𝐌j𝜽𝐂j))\displaystyle\begin{aligned} \widetilde{\mathcal{L}}=\min_{\bm{\theta},\bm{\Sigma}}\biggl{(}-\frac{J}{2}\log|&\bm{\Sigma}^{-1}|+\\ &\frac{1}{2}\sum_{j}\,\bigl{(}\bm{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\bigr{)}^{\ast}\bm{\Sigma}^{-1}\bigl{(}\bm{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\bigr{)}\,\biggr{)}\end{aligned} (14)
s.t.𝐇𝜽=𝟎.\displaystyle s.t.\hskip 56.9055pt\mathbf{H}\bm{\theta}=\mathbf{0}. (15)

The constrained optimization problem can be solved by the Lagrangian multiplier method [76], where the Lagrangian function yields

f(𝜽,𝚺,λ)=(J2log|𝚺1|+12j(𝒛j+1𝐌j𝜽𝐂j)𝚺1(𝒛j+1𝐌j𝜽𝐂j))+λ𝐇𝜽.\displaystyle\begin{aligned} f(\bm{\theta},\bm{\Sigma}^{,}\lambda)=\biggl{(}-\frac{J}{2}\log|\bm{\Sigma}^{-1}|+\frac{1}{2}\sum_{j}\,\bigl{(}\bm{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\bigr{)}^{\ast}\bm{\Sigma}^{-1}\\ \cdot\bigl{(}\bm{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\bigr{)}\,\biggr{)}+\lambda^{\ast}\mathbf{H}\bm{\theta}.\end{aligned} (16)

The solution can be obtained using the iterative method:

𝚺\displaystyle\bm{\Sigma} =1Jj=1J(𝐳j+1𝐌j𝜽𝐂j)(𝐳j+1𝐌j𝜽𝐂j),\displaystyle=\frac{1}{J}\sum_{j=1}^{J}\left(\mathbf{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\right)\left(\mathbf{z}^{j+1}-\mathbf{M}^{j}\bm{\theta}-\mathbf{C}^{j}\right)^{\ast}, (17)
λ\displaystyle\lambda =(𝐇𝒦1𝐇)1(𝐇𝒦1𝐛),\displaystyle=\left(\mathbf{H}\mathcal{K}^{-1}\mathbf{H}^{\ast}\right)^{-1}\left(\mathbf{H}\mathcal{K}^{-1}\mathbf{b}\right), (18)
𝜽\displaystyle\bm{\theta} =𝒦1(𝐛𝐇λ),\displaystyle=\mathcal{K}^{-1}(\mathbf{b}-\mathbf{H}^{\ast}\lambda), (19)

where

𝒦\displaystyle\mathcal{K} =j(𝐌j)𝚺1𝐌j,\displaystyle=\sum_{j}\,(\mathbf{M}^{j})^{\ast}\bm{\Sigma}^{-1}\mathbf{M}^{j}\,, (20)
𝐛\displaystyle\mathbf{b} =j(𝐌j)𝚺1(𝐳j+1𝐂j).\displaystyle=\sum_{j}\,(\mathbf{M}^{j})^{\ast}\bm{\Sigma}^{-1}{(\mathbf{z}^{j+1}-\mathbf{C}^{j})}\,. (21)
Refer to caption
Figure 1: Flowchart of conditional Gaussian framework: a summary of models and the corresponding equations.

In Figure 1, we summarize all the models and corresponding equations used in this paper.

3 Applications to Complex Turbulent Flows

3.1 The viscous stochastic Burgers equation 

The first application of the CG-ROM framework is the following viscous stochastic Burgers equation [52, 53],

ut=νuxx+λuγuux+σW˙t,\frac{\partial u}{\partial t}=\nu u_{xx}+\lambda u-\gamma uu_{x}+\sigma\dot{W}_{t}, (22)

which is posed on the spatial domain (0,L)(0,L) supplemented with the Dirichlet boundary conditions u(0,t)=u(L,t)=0u(0,t)=u(L,t)=0 for t0t\geq 0 and a suitable initial condition u(x,0)=u0(x)u(x,0)=u_{0}(x) for x(0,L)x\in(0,L). In (22), ν>0\nu>0 is the viscosity constant, λ\lambda is the linear drag coefficient, γ\gamma is the advection strength, σ\sigma stands for the level of the stochastic noise, and W˙t\dot{W}_{t} is a spatiotemporal white noise source.

The viscous stochastic Burgers equation (22) is very different from the standard deterministic inviscid Burgers equation, which is well-known for the development of discontinuities, namely shock waves. In (22), the external random forcing interacts with the deterministic nonlinearity to induce intermittent bursts of the signals in the form of viscous shocks, which will be dissipated via the viscous term before the appearance of discontinuities. Therefore, the viscous stochastic Burgers equation is a suitable test model that mimics turbulent behavior in nature with intermittency. It is also noticeable that the stochastic forcing is triggering turbulent features.

3.1.1 Numerical solver of the FOM

The viscous stochastic Burgers equation (22) is solved by a semi-implicit Euler-Maruyama scheme. At each time step the nonlinearity uux=(u2)x/2uu_{x}=(u^{2})_{x}/2 and the noise term σW˙t\sigma\,\dot{W}_{t} are treated explicitly, where the random noises are drawn independently from a normal distribution 𝒩(0,1)\mathcal{N}(0,1), while the other terms are treated implicitly. The Laplacian operator is discretized using the standard second-order central difference approximation.

The spatial domain considered in the following tests is x(0,2π)x\in(0,2\pi). In the numerical discretization, the total number of grid points is Nx=512N_{x}=512, and therefore Δx1.2×102\Delta x\approx 1.2\times 10^{-2}. The simulations of the viscous stochastic Burgers equation (22) are performed with a numerical integration time step Δt=103\Delta t=10^{-3} and are integrated up to T=104T=10^{4}. The FOM solutions are recorded every 5×1025\times 10^{-2} simulation time units, and the initial condition is given by

u(x,0)=0.11πsin(x2)+0.11πsin(2x).\displaystyle u(x,0)=0.1\sqrt{\frac{1}{\pi}}\sin(\frac{x}{2})+0.1\sqrt{\frac{1}{\pi}}\sin(2x). (23)

In particular, the stochastic noise is added as

σ=ks=1Nkσ^i2Lsin(ksπxL),\displaystyle\sigma=\sum_{k_{s}=1}^{N_{k}}\hat{\sigma}_{i}\sqrt{\frac{2}{L}}\sin\left(\frac{k_{s}\pi x}{L}\right), (24)

where NkN_{k} is chosen to be 44 and σ^i\hat{\sigma}_{i} is a constant for i=1,2,3,4i=1,2,3,4.

3.1.2 ROM formulation

To build the standard Galerkin ROM, the simplest way is to adopt the Fourier sine modes as the basis functions due to the Dirichlet boundary conditions:

φk=2Lsin(kπxL),k=1,2,,R.\displaystyle\varphi_{k}=\sqrt{\frac{2}{L}}\sin\left(\frac{k\pi x}{L}\right),k=1,2,\cdots,R. (25)

Then, the G-ROM yields the following:

(du,φk)+[(νxu,xφk)(λu,φk)+(γuux,φk)]dt\displaystyle\left(du,\varphi_{k}\right)+\bigl{[}\left(\nu\partial_{x}u,\partial_{x}\varphi_{k}\right)-\left(\lambda u,\varphi_{k}\right)+\left(\gamma uu_{x},\varphi_{k}\right)\bigr{]}dt =(σdWt,φk),\displaystyle=\left(\sigma dW_{t},\varphi_{k}\right), (26)
k=1,2,,R.\displaystyle k=1,2,\cdots,R.

With the sought solution, ur=l=1ral(t)φl(𝒙)u_{r}=\sum_{l=1}^{r}a_{l}(t)\varphi_{l}(\bm{x}), where rr is the dimension of reduced space and {al}l=1r\{a_{l}\}_{l=1}^{r} are the sought ROM coefficients varying with time, one can replace uu with uru_{r} in the equation (26) and hence obtain the Galerkin ROM (G-ROM):

dak=l=1rAklaldt+l=1rm=1rBlmkalamdt+σkdWt,k=1,,r,da_{k}=\sum_{l=1}^{r}A_{kl}a_{l}\,dt+\sum_{l=1}^{r}\sum_{m=1}^{r}B_{lmk}a_{l}a_{m}\,dt+\sigma_{k}dW_{t},\qquad k=1,\cdots,r, (27)

where AA is a matrix and BB is a tensor which are defined as

Akl=ν(φlx,φkx)+λ(φl,φk)\displaystyle A_{kl}=-\nu\bigl{(}\frac{\partial\varphi_{l}}{\partial x},\frac{\partial\varphi_{k}}{\partial x}\bigr{)}+\lambda\bigl{(}\varphi_{l},\varphi_{k}\bigr{)} (28)
Blmk=γ(φlφmx,φk).\displaystyle B_{lmk}=-\gamma\bigl{(}\varphi_{l}\frac{\partial\varphi_{m}}{\partial x},\varphi_{k}\bigr{)}. (29)

Thanks to the explicit formulation of ROM basis functions (i.e., they are sine functions (25)), we can explicitly calculate the formulas in (28) and (29):

(φl,φk)=δlk,(φlx,φkx)=k2π2L2δlk,l,k=1,,r,\displaystyle\bigl{(}\varphi_{l},\varphi_{k}\bigr{)}=\delta_{lk},\quad\biggl{(}\frac{\partial\varphi_{l}}{\partial x},\frac{\partial\varphi_{k}}{\partial x}\biggr{)}=\frac{k^{2}\pi^{2}}{L^{2}}\delta_{lk},\quad l,k=1,\cdots,r, (30)
(φlφmx,φk)={mπ4(2L)3/2if l+mk=0 or lmk=0,mπ4(2L)3/2if lm+k=0,0otherwise.\displaystyle\biggl{(}\varphi_{l}\frac{\partial\varphi_{m}}{\partial x},\varphi_{k}\biggr{)}=\begin{cases}\frac{m\pi}{4}\biggl{(}\frac{2}{L}\biggr{)}^{3/2}&\text{if $l+m-k=0$ or $l-m-k=0$},\\ -\frac{m\pi}{4}\biggl{(}\frac{2}{L}\biggr{)}^{3/2}&\text{if $l-m+k=0$,}\\ 0&\text{otherwise.}\end{cases} (31)

The CG-ROM starts from (27). It drops the quadratic nonlinear interaction between the medium-scale variables but adds closure terms as was described in Section 2.1. Physics constraints are imposed in the development of the CG-ROM, while they are automatically included in the G-ROM.

3.1.3 Dynamical regimes

Two different regimes are considered in the following tests, with the associated parameters listed in Table 1. The two dynamical regimes use a different linear drag coefficient, λ\lambda. A larger (i.e., more positive value of) λ\lambda is more prone to induce linear instability and, thus, yields more turbulent features. Linear analysis shows that only one eigenvalue is positive in Regime I, while three positive eigenvalues are observed in Regime II. Therefore, Regime I is a weakly chaotic regime, while Regime II is a much more challenging regime with strong instabilities.

Parameters ν\nu λ\lambda γ\gamma σ^\hat{\sigma}
Regime I 0.0050.005 0.003750.00375 11 0.0030.003
Regime II 0.0050.005 0.011250.01125 11 0.0030.003
Table 1: Parameters of the two dynamical regimes of the viscous stochastic Burgers equation (22).

3.1.4 Results: Comparison of the FOM and ROMs in model simulations and statistics

To present numerical results of ROMs, we mainly consider the ROMs with r=5r=5 modes, whichaccount for 96.05% of the FOM energy for Regime I, and 93.81% of the FOM energy for Regime II, where the FOM system has in total 512512 modes. In the CG-ROM, the dimension of the observed variable 𝐯\mathbf{v} is r1=2r_{1}=2 and therefore the dimension of the unobserved medium-scale variable 𝐰\mathbf{w} is r2=3r_{2}=3, such that the total number of the modes in the CG-ROM is the same as that in the G-ROM. For the simplicity of the notation, these leading five modes are denoted by a1,,a5a_{1},\ldots,a_{5}.

Figure 2 compares the trajectories, PDFs, and autocorrelation functions (ACFs) of the FOM (black), G-ROM (red), and CG-ROM (blue) for Regime I. In order to contrast the path-wise behavior, the same random noise source is utilized in the three models. Yet, it is important to notice that in practice the noise is unknown and therefore the statistics, namely the PDFs and the ACFs, are more suitable metrics for model comparison. The G-ROM is nearly as accurate as the CG-ROM for the first two modes in terms of both the simulated trajectories and the statistics. However, due to the lack of closure terms, the G-ROM loses its skill in capturing the statistics of modes a3,a4a_{3},a_{4}, and a5a_{5}. The amplitudes of the corresponding trajectories are also overestimated. Table 2 includes the L2L^{2} errors for different r1r_{1} and rr values in the ROMs. The results in Table 2 consistently show that the CG-ROM outperforms the G-ROM for both the large-scale modes 𝐯\mathbf{v} and the medium-scale modes 𝐰\mathbf{w}, indicating the robustness of the CG-ROM performance.

Figure 3 compares the trajectories and the statistics of the FOM (black) and the CG-ROM (blue) in Regime II, which is a much tougher regime for the test. Despite being strongly turbulent with significant non-Gaussian statistics in all the leading r=5r=5 modes, the CG-ROM succeeds in reproducing both the trajectories (by assigning the same random forcing as the FOM) and the overall non-Gaussian PDFs. Note that the solution of the G-ROM suffers from strong instability and blows up in a finite time in such a test case. This further indicates the necessity of incorporating the closure terms in the development of the ROMs. Figure 4 shows the reconstructed spatiotemporal patterns. The CG-ROM is able to reproduce the intermittent spatiotemporal patterns of the viscous shocks, which highly resemble those from the FOM with a pattern correlation that is above 0.950.95.

Refer to caption
Figure 2: Regime I of the viscous stochastic Burgers equation. Comparison of the trajectories, PDFs, and ACFs of the FOM (black), G-ROM (red), and CG-ROM (blue) for r=5r=5 and r1=2r_{1}=2. The same random noise source is utilized in the three models. The PDFs and the ACFs are computed based on a finite finite trajectory length of 5×1045\times 10^{4} units.
r1,rr_{1},r ROMs (𝒗;L2)\mathcal{E}\left(\bm{v};L^{2}\right) (𝒘;L2)\mathcal{E}\left(\bm{w};L^{2}\right) (𝒖;L2)\mathcal{E}\left(\bm{u};L^{2}\right)
r1=2,r=4r_{1}=2,r=4 G-ROM (rr dim) 5.818e-2 6.476e-2 8.705e-2
CG-ROM 2.051e-2 1.924e-2 2.812e-2
r1=2,r=5r_{1}=2,r=5 G-ROM (rr dim) 5.109e-2 5.919e-2 7.819e-2
CG-ROM 2.251e-2 2.086e-2 3.069e-2
r1=3,r=6r_{1}=3,r=6 G-ROM (rr dim) 2.193e-2 3.302e-2 3.964e-2
CG-ROM 2.957e-2 1.668e-2 3.395e-2
Table 2: Regime I of the viscous stochastic Burgers equation. The L2L^{2} errors for different r1r_{1} and rr values in the ROMs. The same random noise source is utilized in the FOM and the two ROMs.
Refer to caption
Figure 3: Regime II of the viscous stochastic Burgers equation. Comparison of the trajectories, PDFs, and ACFs of the FOM (black) and the CG-ROM (blue) for r=5r=5 and r1=2r_{1}=2. The same random noise source is utilized in the two models. The PDFs and the ACFs are computed based on a finite finite trajectory length of 5×1045\times 10^{4} units. The path-wise solution of the G-ROM blows up around time unit 100100 and is therefore omitted here.
Refer to caption
Figure 4: Regime II of the viscous stochastic Burgers equation. Reconstructed spatiotemporal patterns of the FOM and CG-ROM.

3.1.5 Results: Comparison of the FOM and ROMs in data assimilation

The focus of this subsection is on the data assimilation (i.e., filtering) of the unobserved variable 𝐰=(a3,a4,a5)𝚃\mathbf{w}=(a_{3},a_{4},a_{5})^{\mathtt{T}} using ROMs given one realization of the observed trajectories of the leading two modes 𝐯=(a1,a2)𝚃\mathbf{v}=(a_{1},a_{2})^{\mathtt{T}} generated from the FOM. As in the previous subsection, the two ROMs utilized here are the G-ROM and the CG-ROM. Recall that the closed analytic formulae (7) associated with the CG-ROM provide an efficient and accurate solution of the filter posterior distribution. However, no such a closed formula can be applied to the G-ROM. Therefore, the ensemble Kalman-Bucy filter (EnKBF) [77], which is a widely used ensemble based data assimilation scheme for continuous-in-time observations, is utilized for the G-ROM. As will be seen below, the solutions using G-ROM with the EnKBF are much less accurate than those using the CG-ROM with closed analytic filter formulae. To understand if the data assimilation scheme or the ROM itself is the main source accounting for the poor behavior of the former, the EnKBF is also applied to the CG-ROM. In other words, three filters will be tested in our numerical investigation:

  • 1.

    CG-ROM Closed Form: using the closed analytic formulae (7) as the data assimilation scheme for the CG-ROM,

  • 2.

    G-ROM EnKBF: using the EnKBF as the data assimilation scheme for the G-ROM, and

  • 3.

    CG-ROM EnKBF: using the EnKBF as the data assimilation scheme for the CG-ROM.

For the EnKBF, the size of ensemble is taken to be 100100, which has been tested to be sufficient to reach reasonably accurate and robust results. Note that although the G-ROM or CG-ROM are often relatively low dimensional, running such a moderate ensemble size forward through the system via EnKBF is still much more computationally expensive than the closed form in the CG-ROM.

Figure 5 shows the posterior mean time series of 𝐰=(a3,a4,a5)𝚃\mathbf{w}=(a_{3},a_{4},a_{5})^{\mathtt{T}} from data assimilation using the three different filters in Regime I. Consistent with the model simulation results presented in Figure 2, the G-ROM EnKBF is significantly worse than the CG-ROM Closed Form, especially for a5a_{5}. As the results of CG-ROM EnKBF are only slightly worse than those of CG-ROM Closed Form in such a case, it is conclusive that the main error source for the unskillful behavior in the G-ROM EnKBF is the model error in the G-ROM, not the filtering method EnKBF. In fact, the bare truncation in the G-ROM effectively breaks the coupling between 𝐯\mathbf{v} and 𝐰\mathbf{w} to a large extent. Therefore, it is more difficult to infer the unobserved states from the observations via a model with a poorly described coupling relationship. Similar qualitative conclusions are reached in Figure 6 that illustrates the results of Regime II. However, the G-ROM EnKBF is quantitatively much worse than its counterparts due to the stronger turbulent behavior in this regime. Note that, despite the big error, the data assimilation results using the G-ROM EnKBF do not suffer from the finite-time blow-up issue as in the forward run of the G-ROM. The reason is that in data assimilation observations are an additional input, which constantly helps mitigate the pathological behavior of the model. Finally, Figure 7 displays a spatiotemporal reconstruction of the three unobserved modes 𝐰=(a3,a4,a5)𝚃\mathbf{w}=(a_{3},a_{4},a_{5})^{\mathtt{T}}, using the recovered results from data assimilation, which is compared with the truth. It further validates that recovered fields from the CG-ROM Closed Form and from the CG-ROM EnKBF are comparably accurate, while that from the G-ROM is completely biased.

Refer to caption
Figure 5: Regime I of the viscous stochastic Burgers equation. Truth and posterior mean time series of 𝐰=(a3,a4,a5)𝚃\mathbf{w}=(a_{3},a_{4},a_{5})^{\mathtt{T}} from data assimilation using the three different filters.
Refer to caption
Figure 6: Regime II of the viscous stochastic Burgers equation. Truth and posterior mean time series of 𝐰=(a3,a4,a5)𝚃\mathbf{w}=(a_{3},a_{4},a_{5})^{\mathtt{T}} from data assimilation using the three different filters.
Refer to caption
Figure 7: Regime II of the viscous stochastic Burgers equation. Reconstructed spatiotemporal field based on the three modes 𝐰=(a3,a4,a5)𝚃\mathbf{w}=(a_{3},a_{4},a_{5})^{\mathtt{T}} from data assimilation and the truth, where the truth is computed by projecting the true signal on the basis functions associated with these three modes.

3.2 The QG model

The second application is the quasi-geostrophic (QG) equation:

ωt+J(ω,ψ)Ro1ψx=Re1Δω+Ro1F,ω=Δψ,\begin{split}\frac{\partial\omega}{\partial t}+J(\omega,\psi)-Ro^{-1}\frac{\partial\psi}{\partial x}&=Re^{-1}\Delta\omega+Ro^{-1}F,\\ \omega&=-\Delta\psi,\end{split} (32)

which is a widely used model that describes the large scale ocean circulation [1, 9]. In the QG equation (32), ω\omega is the vorticity, ψ\psi is the streamfunction, FF is the external forcing, and theJacobian is defined as J(A,B)=A/xB/yA/yB/xJ(A,B)=\partial A/\partial x\,\partial B/\partial y-\partial A/\partial y\,\partial B/\partial x. There are two non-dimensional numbers in (32): ReRe is the Reynolds number and RoRo is the Rossby number. The former is the ratio of inertial forces to viscous forces, representing the strength of turbulence, while the latter stands for the ratio of inertial force to Coriolis force. Note that, different from the viscous stochastic Burgers equation, the QG equation is a deterministic system and its turbulent features are induced completely by the nonlinear interactions between the state variables at different spatial scales.

In the following, a symmetric double-gyre wind forcing is imposed to the QG equation (32) [78, 79, 80, 81] with F=sin(π(y1))F=\sin(\pi(y-1)). A rectangular domain is considered here with Ω=[0,1]×[0,2]\Omega=[0,1]\times[0,2]. Homogeneous Dirichlet boundary conditions are applied for both ψ\psi and ω\omega:

ψ(t,x,y)=0,ω(t,x,y)=0for(x,y)Ω and t0.\displaystyle\psi(t,x,y)=0,\qquad\omega(t,x,y)=0\qquad\text{for}\quad(x,y)\in\partial\Omega\;\text{ and }\;t\geq 0. (33)

The two non-dimensional numbers are set to be Re=450Re=450 and Ro=0.0036Ro=0.0036, such that the system has moderate turbulent behavior and a fast rotation.

3.2.1 Numerical solver of the FOM

A spectral method with a 256×512256\times 512 spatial resolution and an explicit Runge-Kutta method are utilized to solve the QG model. The FOM is simulated for 8080 time units. The solution displays a transient behavior on the time interval [0,10][0,10], and then converges to a statistically steady state on the time interval [10,80][10,80]. The FOM solutions are recorded on the time interval [10,80][10,80] every 10210^{-2} simulation time units. which ensures that the snapshots used in the construction of the ROM basis are equally spaced. In Figure 8, the vorticity and streamfunction snapshots at t=20,30,40,50,60t=20,30,40,50,60 are plotted.

Refer to caption
(a) Vorticity
Refer to caption
(b) Streamfunction
Figure 8: The QG equation. Snapshots of vorticity, ω\omega (first row) and streamfunction ψ\psi, (second row) at five different time instances

3.2.2 ROM formulation

The ROMs start from projecting the vorticity ω\omega to its POD bases and keeping the leading rr modes. Correspondingly, a reduced set of rr basis functions are also introduced for ψ\psi, which are subordinate to the above POD basis functions for ω\omega via ϕi(x,y)=Δ1φi(x,y)\phi_{i}(x,y)=-\Delta^{-1}\varphi_{i}(x,y), i.e., they solve the following Poisson equation:

Δϕi(x,y)=φi(x,y), subject to ϕi(x,y)=0,for(x,y)Ω,i=1,2,,r.\displaystyle\begin{aligned} -\Delta\phi_{i}(x,y)=\varphi_{i}(x,y),\quad\text{ subject to }&\quad\phi_{i}(x,y)=0,\\ &\text{for}\quad(x,y)\in\partial\Omega,\qquad i=1,2,\cdots,r.\end{aligned} (34)

Note that while the POD basis {φi}\{\varphi_{i}\} for the vorticity ω\omega is an orthonormal basis under the L2L^{2} inner product, the basis {ϕi}\{\phi_{i}\} for the streamfunction ψ\psi is not orthogonal. Given the G-ROM approximation ωr=i=1rai(t)φi(x,y)\omega_{r}=\sum_{i=1}^{r}a_{i}(t)\varphi_{i}(x,y) of ω\omega, the corresponding ψ\psi is approximated by ψr=i=1rai(t)ϕi(x,y)\psi_{r}=\sum_{i=1}^{r}a_{i}(t)\phi_{i}(x,y), which results from the ansatz ψr=Δ1ωr\psi_{r}=-\Delta^{-1}\omega_{r} and definition (34) of the basis function ϕi\phi_{i}. With the above notations, the rr-dimensional G-ROM for the problem (32)–(33) is given by:

(ωrt,φi)+(J(ωr,ψr),φi)Ro1(ψrx,φi)+Re1(ωr,φi)=Ro1(F,φi),\displaystyle\left(\frac{\partial\omega_{r}}{\partial t},\varphi_{i}\right)+\left(J(\omega_{r},\psi_{r}),\varphi_{i}\right)-Ro^{-1}\left(\frac{\partial\psi_{r}}{\partial x},\varphi_{i}\right)+Re^{-1}\left(\nabla\omega_{r},\nabla\varphi_{i}\right)=Ro^{-1}\left(F,\varphi_{i}\right), (35)

where (,)(\cdot,\cdot) denotes the L2L^{2} inner product over the spatial domain, and i=1,,ri=1,\cdots,r. The component-wise form of the G-ROM is as follows: for i=1,2,,ri=1,2,\cdots,r

a˙i(t)=bi+m=1rAimam(t)+m=1rn=1rBimnam(t)an(t),\dot{a}_{i}(t)=b_{i}+\sum_{m=1}^{r}A_{im}a_{m}(t)+\sum_{m=1}^{r}\sum_{n=1}^{r}B_{imn}a_{m}(t)a_{n}(t), (36)

where

bi=Ro1(F,φi),Aim=Ro1(ϕmx,φi)Re1(φm,φi),Bimn=(J(φm,ϕn),φi).\begin{split}b_{i}&=Ro^{-1}(F,\varphi_{i}),\,\\ A_{im}&=Ro^{-1}\left(\frac{\partial\phi_{m}}{\partial x},\varphi_{i}\right)-Re^{-1}\left(\nabla\varphi_{m},\nabla\varphi_{i}\right),\,\\ B_{imn}&=-\left(J(\varphi_{m},\phi_{n}),\varphi_{i}\right).\end{split}

In the above procedure, 35003500 equally spaced FOM vorticity snapshots in the time interval [10,45][10,45] are utilized to construct the ROM bases [79]. For computational efficiency, the FOM vorticity is interpolated onto a uniform mesh with the resolution 257×513257\times 513 over the spatial domain Ω=[0,1]×[0,2]\Omega=[0,1]\times[0,2], i.e., with a mesh size Δx=Δy=1/256\Delta x=\Delta y=1/256. Based on the interpolated snapshots, the corresponding eigenvalue problemis solved to generate the ROM basis. The fourth-order Runge-Kutta scheme (RK4) is utilized for the temporal discretization. To ensure the numerical stability of the time discretization, a time step size Δt=0.001\Delta t=0.001 is utilized. The ROM data is stored every ten time steps to match the FOM sampling rate.

On the other hand, the CG-ROM follows the procedure described in Section 2.1. It starts from (36) and then drops the quadratic nonlinear interaction between the medium-scale variables but adds a closure term. Physics constraints are also included in the CG-ROM.

Figure 9 shows the time series and the associated statistics of the leading 66 POD modes of the FOM. Fast oscillations and turbulent behavior can be observed in these time series. In addition, the intermittency introduces extreme events, which lead to non-Gaussian PDFs with fat tails. The shading period up to t=45t=45 is utilized to calibrate the ROMs, while the remaining time series is utilized to test for the data assimilation skill.

In the following, r=20r=20 modes are kept in both the G-ROM and the CG-ROM. For notation simplicity, these 2020 modes are denoted byidentified with their time-dependent coefficients, a1,,a20a_{1},\ldots,a_{20}. Assume that the first r1=10r_{1}=10 modes are accurately resolved in the observations. Therefore, 𝐯=(a1,,a10)\mathbf{v}=(a_{1},\ldots,a_{10}) and 𝐰=(a11,,a20)\mathbf{w}=(a_{11},\ldots,a_{20}).

Refer to caption
Figure 9: The time series and the associated PDFs and ACFs of the leading 66 POD modes of the QG equation. The shading period up to t=45t=45 is utilized to calibrate the ROMs, while the remaining time series is utilized to test for the data assimilation skill.

3.2.3 Results: Comparison of the FOM and the ROMs in model statistics

For the viscous stochastic Burgers equation, the random noise source can be artificially assumed to be known, which allows a path-wise comparison between the ROMs and the FOM. In contrast, the turbulent nature of the QG equation makes the trajectories of different models diverge very quickly. Therefore,in this case it is more justified to compare the statistics of these models.

Figure 10 shows the comparison of the PDFs using the FOM and the two ROMs for different POD modes. For the first r1=10r_{1}=10 modes, the CG-ROM captures the true statistics quite accurately. Particularly, the non-Gaussian behavior in the PDFs is well characterized by the CG-ROM. In contrast, the resulting PDFs from the G-ROM are more biased. For a2a_{2} and a3a_{3}, there is a severe mean shift in the result from G-ROM. For a7,a8,a9a_{7},a_{8},a_{9} and a10a_{10}, the variance of the G-ROM is significantly overestimated. For the remaining 1010 medium-scale modes, neither the G-ROM nor the CG-ROM is very accurate. Nevertheless, the CG-ROM is still more skillul than the G-ROM in recovering the statistics.

Refer to caption
Figure 10: Comparison of the PDFs using the FOM and the two ROMs for different POD modes.

3.2.4 Results: Comparison of the FOM and the ROMs in data assimilation

As in Section 3.1.5, the three filters — CG-ROM Closed Form, G-ROM EnKBF and CG-ROM EnKBF — are utilized for assimilating the trajectories of the 1010 unobserved state variables a11,,a20a_{11},\ldots,a_{20}. Figures 11 and 12 include the comparison of the assimilated posterior mean time series and the associated statistics using different filters. Despite that the statistics related to the model free run using the CG-ROM contain certain bias for all these variables, the data assimilation solutions using the CG-ROM remain reasonably accurate. This is not surprising since the observations help correct the model error in the ROM simulations. In contrast to the filters based on the CG-ROM, the G-ROM EnKBF is significantly worse.

For the sake of a qualitative comparison between the filtered posterior mean time series and the truth using different filters, three skill scores are utilized here: the root-mean-square error (RMSE), the pattern correlation (Corr), and the relative entropy. They are defined as follows:

Corr=i=1n(uiMu¯M)(uirefu¯ref)i=1n(uiMu¯M)2i=1n(uirefu¯ref)2,RMSE=i=1n(uiMuiref)2n,Relative Entropy=𝒫(pref(u),pM(u))=pref(u)lnpref(u)pM(u)du,\begin{split}\mbox{Corr}&=\frac{\sum_{i=1}^{n}(u^{M}_{i}-\bar{u}^{M})(u_{i}^{ref}-\bar{u}^{ref})}{\sqrt{\sum_{i=1}^{n}(u^{M}_{i}-\bar{u}^{M})^{2}}\sqrt{\sum_{i=1}^{n}(u^{ref}_{i}-\bar{u}^{ref})^{2}}},\\ \mbox{RMSE}&=\sqrt{\frac{\sum_{i=1}^{n}(u^{M}_{i}-u^{ref}_{i})^{2}}{n}},\\ \mbox{Relative Entropy}&=\mathcal{P}(p^{ref}(u),p^{M}(u))=\int p^{ref}(u)\ln\frac{p^{ref}(u)}{p^{M}(u)}du,\end{split} (37)

where uiMu^{M}_{i} and uirefu^{ref}_{i} are the assimilated solution and the truth, respectively, at time t=tit=t_{i}. The time averages of the assimilated and the true time series are denoted by u¯M\bar{u}^{M} and u¯ref\bar{u}^{ref}, respectively. The RMSE and the Corr are widely used metrics in practice to quantify the path-wise error. A smaller RMSE and a larger Corr correspond to a skillful assimilated time series. On the other hand, the relative entropy is an information criterion [82, 83, 84, 85], which is adopted to quantify the statistics error between the two distributions formulated by the assimilated time series and the true signal, respectively. The relative entropy has many attractive features. First, 𝒫(p,pM)0\mathcal{P}(p,p^{M})\geq 0 with equality if and only if p=pMp=p^{M}. Second, 𝒫(p,pM)\mathcal{P}(p,p^{M}) is invariant under general nonlinear changes of variables. A smaller relative entropy value corresponds to a smaller statistical error. Figure 13 shows the skill scores using different filters. First, the larger RMSE and the smaller Corr in the G-ROM EnKBF confirm the intuition from Figures 11 and 12 that the path-wise error using the G-ROM EnKBF is much larger than the filters using the CG-ROMs. Second, the relative entropy, which quantifies the error in the PDFs formed by the posterior mean time series and the true signal, indicates that the G-ROM has a bigger statistical error for all the modes as well. It is also worthwhile to mention that the difference between CG-ROM Closed Form and CG-ROM EnKBF implies that closed analytic formulae in the CG-ROM is advantageous in not only decreasing the computational time but reducing the sampling error as well.

Figure 14 shows the reconstructed spatiotemporal patterns from the posterior mean based on modes a11,,a20a_{11},\ldots,a_{20}. It indicates that the CG-ROM Closed Form can overall recover the streamlines (and thus velocity fields) and the vortices in an accurate fashion. The CG-ROM EnKBF can capture the overall tendency but the sampling error leads to some biases in the amplitude. In contrast, the G-ROM EnKBF leads to a much larger error in reproducing the spatiotemporal patterns of the truth.

Refer to caption
Figure 11: The data assimilation results of the POD modes a11a_{11} to a15a_{15} in the QG equation. Comparison of the posterior mean time series and the associated statistics using different filters with the truth.
Refer to caption
Figure 12: Similar to Figure 11 but for the POD modes a16a_{16} to a20a_{20}.
Refer to caption
Figure 13: The skill scores of the posterior mean time series. The three panels show the root-mean-square error (RMSE), the pattern correlation (Corr), and the relative entropy.
Refer to caption
(a) Time instance, t=34t=34
Refer to caption
(b) Time instance, t=64t=64
Figure 14: The reconstructed spatiotemporal patterns of ω\omega (first row) and streamfunction ψ\psi (second row) from the posterior mean based on modes a11,,a20a_{11},\ldots,a_{20}, compared with the truth.

4 Discussion and Conclusions

In this paper, a new multiscale stochastic ROM framework, known as the CG-ROM, is developed. The new ROM focuses on recovering the mutliscale dynamical features as well as the statistical feedback between different scales. The CG-ROM exploits cheap but effective conditional linear functions as the closure terms, which facilitates an efficient and accurate data assimilation scheme that is crucial in recovering the unobserved states in turbulent systems. Physics constraints are incorporated into the CG-ROM to prevent the occurrence of pathological behavior of the model.

There are a few issues that remain as future work. First, both the trajectories of 𝐯\mathbf{v} and 𝐰\mathbf{w} are assumedknown in the training period for model calibration. Admittedly, the availability of the time series of the unobserved variable 𝐰\mathbf{w} can be explained as the result of the reanalysis. However, it is more natural to assume that only the data of the observed variable 𝐯\mathbf{v} is accessible in practice. Then determining the model parameters is more challenging than the case with full observations in (10). Estimating the model parameters and recovering the unobserved states have to be carried out simultaneously. Fortunately, due to the analytic formulae of the state estimation, such a procedure can still be efficiently using an expectation-maximization algorithm [75]. It is then interesting to see how such an uncertainty affects the skill of the ROM. Second, the ROMs are often preferred to be parsimonious, which prevents the overfitting issue and can yield more physical results. Such a constraint has not been included in this work, as the focus here is on reproducing the statistics and data assimilation for which the current setup is sufficient. Nevertheless, existing techniques, such as the LASSO (least absolute shrinkage and selection operator) regression [86] or information-theoretic based sparse identification methods [87, 88], can be incorporated into the current framework. Lastly, as in other ROMs, there is often no rigorous way to determine the dimension of the ROMs. The data assimilation framework may have the potential to provide a different criterion to determine the optimal number of the state variables in the ROMs in addition to using the explained variance as in most of the existing work.

Acknowledgments

The research of N.C. is partially funded by the Office of VCRGE at UW-Madison and ONR N00014-21-1-2904. The research of T.I. is partially funded by NSF through grant DMS-2012253 and CDS&E-MSS-1953113.

Appendix A Burgers equation

A.1 Eigenvalue Problems

Consider the linear part of the viscous stochastic Burgers equation (22), which yields the following initial boundary value problem (IBVP):

ut=νuxx+λu\displaystyle u_{t}=\nu u_{xx}+\lambda u (x,t)(0,1)×+,\displaystyle(x,t)\in(0,1)\times\mathbb{R}^{+}, (38)
u(0,t)=u(L,t)=0,\displaystyle u(0,t)=u(L,t)=0, t0,\displaystyle t\geq 0, (39)
u(x,0)=u0(x),\displaystyle u(x,0)=u_{0}(x), x(0,L).\displaystyle x\in(0,L). (40)

This IBVP yields the following eigenvalue problem:

\displaystyle- ν2wjx2λwj=σjwj\displaystyle\nu\frac{\partial^{2}w_{j}}{\partial x^{2}}-\lambda w_{j}=\sigma_{j}w_{j} xΩ,\displaystyle x\in\Omega, (41)
wj=0\displaystyle w_{j}=0 xΩ,\displaystyle x\in\partial\Omega, (42)

where {wj}j=1\{w_{j}\}_{j=1}^{\infty} are eigenfunctions for the operator =ν2x2λ\displaystyle\mathcal{L}=-\nu\frac{\partial^{2}}{\partial x^{2}}-\lambda on Ω=(0,l)\Omega=(0,l) with zero boundary conditions. Note that for (41)-(42), (λ+σj)/ν>0\displaystyle(\lambda+\sigma_{j})/\nu>0 must be true for all jj. Letting αj=λ+σjν\displaystyle\alpha_{j}=\frac{\lambda+\sigma_{j}}{\nu}, (41)-(42) become

2wj2x+αjwj=0\displaystyle\frac{\partial^{2}w_{j}}{\partial^{2}x}+\alpha_{j}w_{j}=0 xΩ,\displaystyle x\in\Omega, (43)
wj=0\displaystyle w_{j}=0 xΩ.\displaystyle x\in\partial\Omega. (44)

Since wj(x)w_{j}(x) is a nontrivial solution, αj>0\alpha_{j}>0 and the boundary condition imply

αj=(jπl)2,\displaystyle\alpha_{j}=\left(\frac{j\pi}{l}\right)^{2}, wj(x)=cjsin(jπxl),\displaystyle w_{j}(x)={c_{j}}\,\sin\left(\frac{j\pi x}{l}\right), j+,\displaystyle j\in\mathbb{N}^{+}, (45)

where cjc_{j} are arbitrary constants. We choose {wj}j=1\{w_{j}\}_{j=1}^{\infty} to be orthonormal in L2(Ω)L^{2}(\Omega), i.e.,

Ωwi(x)wj(x)𝑑x=δij,i,j=1,2,\displaystyle\int_{\Omega}w_{i}(x)w_{j}(x)dx=\delta_{ij},\qquad\qquad{i,j=1,2,\ldots} (46)

Consequently, the eigenfunctions in (45) can be written as follows:

wj(x)=2lsin(jπxl),\displaystyle w_{j}(x)=\sqrt{\frac{2}{l}}\sin\left(\frac{j\pi x}{l}\right), j=1,2,.\displaystyle j=1,2,\ldots. (47)

A.2 Unstable Eigen-modes

To apply the Galerkin projection, we define the L2L^{2} inner product, i.e., (f,g)L2=Ωf(x)g(x)𝑑x(f,g)_{L^{2}}=\int_{\Omega}f(x)g(x)dx. Then, with a finite-dimensional approximation space spanned by {wk}k=1N\{w_{k}\}_{k=1}^{N}, the Galerkin projection of the viscous stochastic Burgers equation (22) yields

(ut,wk)L2+ν(xu,xwk)L2λ(u,wk)L2+γ(uxu,wk)L2=0,\displaystyle(u_{t},w_{k})_{L^{2}}+\nu(\partial_{x}u,\partial_{x}w_{k})_{L^{2}}-\lambda(u,w_{k})_{L^{2}}+\gamma(u\partial_{x}u,w_{k})_{L^{2}}=0, (48)
k=1,2,,N.\displaystyle k=1,2,\cdots,N.

Writing u(x,t)=j=1Ndj(t)wj(x)\displaystyle u(x,t)=\sum_{j=1}^{N}d_{j}(t)w_{j}(x), we have

j=1Nd˙j(wj,wk)L2+νj=1Ndj(xwj,xwk)L2λj=1Ndj(wj,wk)L2+\displaystyle\sum_{j=1}^{N}\dot{d}_{j}(w_{j},w_{k})_{L^{2}}+\nu\sum_{j=1}^{N}d_{j}(\partial_{x}w_{j},\partial_{x}w_{k})_{L^{2}}-\lambda\sum_{j=1}^{N}d_{j}(w_{j},w_{k})_{L^{2}}+ (49)
γi=1Nj=1Ndidj(wixwj,wk)L2=0,k=1,2,,N.\displaystyle\gamma\sum_{i=1}^{N}\sum_{j=1}^{N}d_{i}d_{j}(w_{i}\partial_{x}w_{j},w_{k})_{L^{2}}=0,\qquad k=1,2,\cdots,N.

With the inner products (wixwj,wk)L2(w_{i}\partial_{x}w_{j},w_{k})_{L^{2}}, (wj,wk)L2(w_{j},w_{k})_{L^{2}}, and (xwj,xwk)L2(\partial_{x}w_{j},\partial_{x}w_{k})_{L^{2}} evaluated in equations (30) and (31), respectively, the Galerkin projection (49) (i.e., the G-ROM (26) in Section 3.1.2) can be reformulated as a set of ODE:

d1˙\displaystyle\dot{d_{1}} =(λνπ2l2)d1+h.o.t.\displaystyle=\left(\lambda-\nu\frac{\pi^{2}}{l^{2}}\right)d_{1}+h.o.t. (50)
d2˙\displaystyle\dot{d_{2}} =(λν4π2l2)d2+h.o.t.\displaystyle=\left(\lambda-\nu\frac{4\pi^{2}}{l^{2}}\right)d_{2}+h.o.t. (51)
d3˙\displaystyle\dot{d_{3}} =(λν9π2l2)d3+h.o.t.\displaystyle=\left(\lambda-\nu\frac{9\pi^{2}}{l^{2}}\right)d_{3}+h.o.t. (52)
\displaystyle\vdots
dn˙\displaystyle\dot{d_{n}} =(λνn2π2l2)dn+h.o.t.,\displaystyle=\left(\lambda-\nu\frac{n^{2}\pi^{2}}{l^{2}}\right)d_{n}+h.o.t., (53)

where h.o.t.h.o.t. denotes the higher order terms and 1nN1\leq n\leq N. As a result, the choice of the parameter pair, λ\lambda and ν\nu, can determine the linear stability of the viscous stochastic Burgers equation. For example, the Regime I in Table 1 yields one unstable eigen-mode, while the Regime II yields three unstable eigen-modes. Thus, the Regime II is more challenging than Regime I.

Appendix B Optimization Problem

B.1 Conditional Gaussian ROM with Physical Constraints

In what follows, the details of the multiscale physics constraints are presented. In particular, in Section B.2, the mathematical formulation of the multiscale physics constraints are derived and the matrix forms for implementation are presented. In Section B.2.1, a simple example of multiscale physics constraints is given.

B.2 Multiscale physical constraints

The proposed conditional Gaussian ROMs aim at preserving the mutliscale dynamical featuresof the underlying system.Therefore, multiscale physics constraints, are considered. In particular, the multiscale physics constraintsthat are imposed for the quadratic terms are expected to satisfy the following:

[v;w](v(𝐁vτ(v)𝐂wτ(v))[v;w]v(𝐁vτ(w)𝐂wτ(w))[v;w])=0.\displaystyle[v;w]^{\ast}\begin{pmatrix}v^{\ast}\left(\mathbf{B}^{(v)}_{v\tau}\oplus\mathbf{C}^{(v)}_{w\tau}\right)[v;w]\\ v^{\ast}\left(\mathbf{B}^{(w)}_{v\tau}\oplus\mathbf{C}^{(w)}_{w\tau}\right)[v;w]\end{pmatrix}=0. (54)

This constraint enforces the conservation of energy with respect to both the large and the small scales. The discrete formulation of (54) yields the following:

(i,j,k)=(1,1,1)(r1,r1,r1)(𝐁vτ(v))(ijk)vivjvk+(i,j,k)=(1,1,1)(r1,r2,r1)(𝐂wτ(v))(ijk)viwjvk+(i,j,k)=(1,1,1)(r1,r1,r2)(𝐁vτ(w))(ijk)vivjwk+(i,j,k)=(1,1,1)(r1,r2,r2)(𝐂wτ(w))(ijk)viwjwk=0.\displaystyle\begin{aligned} &\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{1},r_{1})}\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ijk)}v_{i}v_{j}v_{k}+{\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{2},r_{1})}\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(ijk)}v_{i}w_{j}v_{k}}\\ &+{\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{1},r_{2})}\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(ijk)}v_{i}v_{j}w_{k}}+\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{2},r_{2})}\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ijk)}v_{i}w_{j}w_{k}=0.\end{aligned} (55)

Taking into account the symmetries of the nonlinear terms in equation (55), the second and third terms on the LHS of (55) should share their energy conservation:

(i,j,k)=(1,1,1)(r1,r2,r1)(𝐂wτ(v))(ijk)viwjvk\displaystyle\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{2},r_{1})}\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(ijk)}v_{i}w_{j}v_{k} +(i,j,k)=(1,1,1)(r1,r1,r2)(𝐁vτ(w))(ijk)vivjwk=0.\displaystyle+\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{1},r_{2})}\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(ijk)}v_{i}v_{j}w_{k}=0. (56)

On the other hand, the other two terms should enforce their own energy conservation:

(i,j,k)=(1,1,1)(r1,r1,r1)(𝐁vτ(v))(ijk)vivjvk=0\displaystyle\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{1},r_{1})}\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ijk)}v_{i}v_{j}v_{k}=0 (i,j,k)=(1,1,1)(r1,r2,r2)(𝐂wτ(w))(ijk)viwjwk=0.\displaystyle\sum_{(i,j,k)=(1,1,1)}^{(r_{1},r_{2},r_{2})}\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ijk)}v_{i}w_{j}w_{k}=0. (57)

In particular, these terms yield the following discrete relations:

𝐂wτ(v) and 𝐁vτ(w):\displaystyle{\mathbf{C}}^{(v)}_{w\tau}\text{ and }\mathbf{B}^{(w)}_{v\tau}:
(𝐂wτ(v))(iki)+(𝐁vτ(w))(iik)=0,i=1,,r1,k=1,,r2;\displaystyle\left({\mathbf{C}}^{(v)}_{w\tau}\right)_{(iki)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(iik)}=0,\quad i=1,\cdots,r_{1},k=1,\cdots,r_{2}; (58)
(𝐂wτ(v))(ikj)+(𝐂wτ(v))(jki)+(𝐁vτ(w))(ijk)+(𝐁vτ(w))(jik)=0,\displaystyle\left({\mathbf{C}}^{(v)}_{w\tau}\right)_{(ikj)}+\left({\mathbf{C}}^{(v)}_{w\tau}\right)_{(jki)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(ijk)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(jik)}=0, (59)
i,j=1,,r1,ij;k=1,,r2,\displaystyle\hskip 170.71652pti,j=1,\cdots,r_{1},i\neq j;k=1,\cdots,r_{2},
𝐁vτ(v):\displaystyle\mathbf{B}^{(v)}_{v\tau}:
(𝐁vτ(v))(iii)=0,i=1,,r1,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(iii)}=0,\qquad\qquad i=1,\cdots,r_{1}, (60)
(𝐁vτ(v))(ijj)+(𝐁vτ(v))(jij)+(𝐁vτ(v))(jji)=0,i,j=1,,r1,ij,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ijj)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(jij)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(jji)}=0,\qquad i,j=1,\cdots,r_{1},i\neq j, (61)
(𝐁vτ(v))(ijk)+(𝐁vτ(v))(ikj)+(𝐁vτ(v))(jik)+(𝐁vτ(v))(jki)+(𝐁vτ(v))(kij)\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ijk)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ikj)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(jik)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(jki)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(kij)} (62)
+(𝐁wτ(v))(kji)=0,i,j,k=1,,r1,ijk.\displaystyle\hskip 113.81102pt+\left(\mathbf{B}^{(v)}_{w\tau}\right)_{(kji)}=0,\qquad i,j,k=1,\cdots,r_{1},i\neq j\neq k.
𝐂wτ(w):\displaystyle\mathbf{C}^{(w)}_{w\tau}:
(𝐂wτ(w))(ijj)=0,\displaystyle\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ijj)}=0, i=1,,r1;j=1,,r2,\displaystyle i=1,\cdots,r_{1};j=1,\cdots,r_{2}, (63)
(𝐂wτ(w))(ijk)+(𝐂wτ(w))(ikj)=0,\displaystyle\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ijk)}+\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ikj)}=0, i=1,,r1;j,k=1,,r2,jk.\displaystyle i=1,\cdots,r_{1};j,k=1,\cdots,r_{2},j\neq k. (64)

To further construct the constraint coefficient matrix HH in equation (13), let θL\theta_{L} contain every element in 𝐁vτ(v),𝐂wτ(v),𝐃wτ(v),𝐂vτ(v)\mathbf{B}^{(v)}_{v\tau},\mathbf{C}^{(v)}_{w\tau},\mathbf{D}^{(v)}_{w\tau},\mathbf{C}^{(v)}_{v\tau}, and let θS\theta_{S} contain every element in 𝐁vτ(w),𝐂wτ(w),𝐃wτ(w),𝐂vτ(w)\mathbf{B}^{(w)}_{v\tau},\mathbf{C}^{(w)}_{w\tau},\mathbf{D}^{(w)}_{w\tau},\mathbf{C}^{(w)}_{v\tau}. Then, the physical constraints (58) to (64) can be converted into a matrix form:

H(θLθS)=0,\displaystyle H\begin{pmatrix}\theta_{L}\\ \theta_{S}\end{pmatrix}=0, (65)

where θ=[θL;θS]\theta=[\theta_{L};\theta_{S}]. In particular, we can formulate θL\theta_{L}and θS\theta_{S} as follows:

θL=[(𝐁vτ(v))111,(𝐁vτ(v))121,(𝐁vτ(v))r1r1r1,(𝐂wτ(v))111,(𝐂wτ(v))121,(𝐂wτ(v))r1r2r1,(𝐃wτ(v))11,(𝐃wτ(v))r1r1,(𝐂vτ(v))11,(𝐂vτ(v))r1r2]θS=[(𝐁vτ(w))111,(𝐁vτ(w))121,(𝐁vτ(w))r1r1r1,(𝐂wτ(w))111,(𝐂wτ(w))121,(𝐂wτ(w))r1r2r1,(𝐃wτ(w))11,(𝐃wτ(w))r1r1,(𝐂vτ(w))11,(𝐂vτ(w))r1r2]\displaystyle\begin{aligned} &\theta_{L}=\biggl{[}\left(\mathbf{B}^{(v)}_{v\tau}\right)_{111},\left(\mathbf{B}^{(v)}_{v\tau}\right)_{121},\cdots\left(\mathbf{B}^{(v)}_{v\tau}\right)_{r_{1}r_{1}r_{1}},\left(\mathbf{C}^{(v)}_{w\tau}\right)_{111},\left(\mathbf{C}^{(v)}_{w\tau}\right)_{121},\cdots\left(\mathbf{C}^{(v)}_{w\tau}\right)_{r_{1}r_{2}r_{1}},\\ &\left(\mathbf{D}^{(v)}_{w\tau}\right)_{11},\cdots\left(\mathbf{D}^{(v)}_{w\tau}\right)_{r_{1}r_{1}},\left(\mathbf{C}^{(v)}_{v\tau}\right)_{11},\cdots\left(\mathbf{C}^{(v)}_{v\tau}\right)_{r_{1}r_{2}}\biggr{]}^{\top}\\ &\theta_{S}=\biggl{[}\left(\mathbf{B}^{(w)}_{v\tau}\right)_{111},\left(\mathbf{B}^{(w)}_{v\tau}\right)_{121},\cdots\left(\mathbf{B}^{(w)}_{v\tau}\right)_{r_{1}r_{1}r_{1}},\left(\mathbf{C}^{(w)}_{w\tau}\right)_{111},\left(\mathbf{C}^{(w)}_{w\tau}\right)_{121},\cdots\left(\mathbf{C}^{(w)}_{w\tau}\right)_{r_{1}r_{2}r_{1}},\\ &\left(\mathbf{D}^{(w)}_{w\tau}\right)_{11},\cdots\left(\mathbf{D}^{(w)}_{w\tau}\right)_{r_{1}r_{1}},\left(\mathbf{C}^{(w)}_{v\tau}\right)_{11},\cdots\left(\mathbf{C}^{(w)}_{v\tau}\right)_{r_{1}r_{2}}\biggr{]}^{\top}\end{aligned}

B.2.1 A Simple Example

A simple example, e.g,with r1=3,r=4r_{1}=3,r=4, can be used to illustrate the constraints from (60) to (64).

𝐁vτ(v)3×3×3:\displaystyle\mathbf{B}^{(v)}_{v\tau}\in\mathbb{R}^{3\times 3\times 3}:
(𝐁vτ(v))(111)=(𝐁vτ(v))(222)=(𝐁vτ(v))(333)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(111)}=\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(222)}=\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(333)}=0, (66)
(𝐁vτ(v))(122)+(𝐁vτ(v))(212)+(𝐁vτ(v))(221)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(122)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(212)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(221)}=0, (67)
(𝐁vτ(v))(133)+(𝐁vτ(v))(313)+(𝐁vτ(v))(331)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(133)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(313)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(331)}=0, (68)
(𝐁vτ(v))(233)+(𝐁vτ(v))(323)+(𝐁vτ(v))(332)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(233)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(323)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(332)}=0, (69)
(𝐁vτ(v))(211)+(𝐁vτ(v))(121)+(𝐁vτ(v))(112)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(211)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(121)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(112)}=0, (70)
(𝐁vτ(v))(311)+(𝐁vτ(v))(131)+(𝐁vτ(v))(113)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(311)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(131)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(113)}=0, (71)
(𝐁vτ(v))(322)+(𝐁vτ(v))(232)+(𝐁vτ(v))(223)=0,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(322)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(232)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(223)}=0, (72)
(𝐁vτ(v))(123)+(𝐁vτ(v))(132)+(𝐁vτ(v))(213)+(𝐁vτ(v))(231)+(𝐁vτ(v))(312)\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(123)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(132)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(213)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(231)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(312)} (73)
+(𝐁wτ(v))(321)=0.\displaystyle\hskip 113.81102pt+\left(\mathbf{B}^{(v)}_{w\tau}\right)_{(321)}=0. (74)
𝐁vτ(w)3×3×1,𝐂wτ(v)3×1×3:\displaystyle\mathbf{B}^{(w)}_{v\tau}\in\mathbb{R}^{3\times 3\times 1},\,\mathbf{C}^{(v)}_{w\tau}\in\mathbb{R}^{3\times 1\times 3}:
(𝐁vτ(w))(111)+(𝐂wτ(v))(111)=0,\displaystyle\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(111)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(111)}=0, (75)
(𝐁vτ(w))(221)+(𝐂wτ(v))(212)=0,\displaystyle\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(221)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(212)}=0, (76)
(𝐁vτ(w))(331)+(𝐂wτ(v))(313)=0,\displaystyle\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(331)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(313)}=0,\hskip 170.71652pt (77)
(𝐁vτ(w))(121)+(𝐁vτ(w))(211)+(𝐂wτ(v))(112)+(𝐂wτ(v))(211)=0\displaystyle\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(121)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(211)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(112)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(211)}=0 (78)
(𝐁vτ(w))(131)+(𝐁vτ(w))(311)+(𝐂wτ(v))(113)+(𝐂wτ(v))(311)=0\displaystyle\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(131)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(311)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(113)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(311)}=0 (79)
(𝐁vτ(w))(231)+(𝐁vτ(w))(321)+(𝐂wτ(v))(213)+(𝐂wτ(v))(312)=0\displaystyle\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(231)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(321)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(213)}+\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(312)}=0 (80)
𝐂wτ(w)3×1×1:\displaystyle\mathbf{C}^{(w)}_{w\tau}\in\mathbb{R}^{3\times 1\times 1}:
(𝐂wτ(w))(111)=(𝐂wτ(w))(211)=(𝐂wτ(w))(311)=0,\displaystyle\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(111)}=\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(211)}=\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(311)}=0,\hskip 99.58464pt (81)

B.3 Optimization Problem Formulation

In this section, the details of constructing the optimization problem in Section 2.3 and Section 2.4 are presented. In Section B.3.1, the matrix formulations of 𝐌Lj\mathbf{M}^{j}_{L} in the constrained optimization problem (14)-(15) are sketched. In Section B.3.2, the ill conditioning in the general formulation of the optimization problem is identified and the mathematical explanation is provided. In Section B.3.3, a special regularization for the constrained optimization problem is proposed, which can overcome the ill conditioning described in Section B.3.2.

B.3.1 General Formulation

Following the notations in Section 2.3 and Section 2.4, the 𝐌Lj\mathbf{M}^{j}_{L}, the coefficient matrix for θL\theta_{L}, can be expressed as:

𝐌Lj=ΔT\displaystyle\mathbf{M}^{j}_{L}=\Delta T\cdot (82)
[(v(j))(u(j))𝟎𝟎(u(j))𝟎𝟎𝟎(v(j))(u(j))𝟎𝟎(u(j))𝟎𝟎𝟎(v(j))(u(j))𝟎𝟎(u(j))]\displaystyle\begin{bmatrix}\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top}&\mathbf{0}&\cdots&\mathbf{0}&\left(u^{(j)}\right)^{\top}&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top}&\cdots&\mathbf{0}&\mathbf{0}&\left(u^{(j)}\right)^{\top}&\cdots&\mathbf{0}\\ \vdots&\ddots&\ldots&\vdots&\vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\dots&\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top}&\mathbf{0}&\mathbf{0}&\ldots&\left(u^{(j)}\right)^{\top}\end{bmatrix} (83)

where v(j)v^{(j)} is the snapshot of 𝒗\bm{v} at tjt_{j}, u(j)u^{(j)} is the snapshot of 𝒖=[𝒗;𝒘]\bm{u}=[\bm{v};\bm{w}] at tjt_{j}, and \bigotimes is the Kronecker tensor product. Using the notations from tensor products, 𝐌Lj\mathbf{M}^{j}_{L} can also be expressed as 𝐌Lj=ΔT(𝐈r1×r1((v(j))(u(j))))(𝐈r1×r1(u(j)))\mathbf{M}^{j}_{L}=\Delta T\cdot\left(\mathbf{I}_{r_{1}\times r_{1}}\bigotimes\left(\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top}\right)\right)\bigoplus\left(\mathbf{I}_{r_{1}\times r_{1}}\bigotimes\left(u^{(j)}\right)^{\top}\right), where 𝐈r1×r1\mathbf{I}_{r_{1}\times r_{1}} is the r1×r1r_{1}\times r_{1} identity matrix.

B.3.2 Ill Conditioning Problem

When solving the constrained optimization problem, j(𝐌Lj)𝚺1𝐌Lj\sum_{j}(\mathbf{M}^{j}_{L})^{\ast}\mathbf{\Sigma}^{-1}\mathbf{M}^{j}_{L} is expected to be full rank. However, due to the general formulation presented in Section B.3.1, j(𝐌Lj)𝚺1𝐌Lj\sum_{j}(\mathbf{M}^{j}_{L})^{\ast}\mathbf{\Sigma}^{-1}\mathbf{M}^{j}_{L} will inevitably be rank deficient, which leads to the ill conditioning. To track the source of the ill conditioning, it is essential to check the matrix form of (𝐌j)(𝚺(k))1𝐌j(\mathbf{M}^{j})^{\ast}(\mathbf{\Sigma}^{(k)})^{-1}\mathbf{M}^{j}. In particular, (𝐌j)(𝚺(k))1𝐌j(\mathbf{M}^{j})^{\ast}(\mathbf{\Sigma}^{(k)})^{-1}\mathbf{M}^{j} yields the following:

(𝐌j)(𝚺(k))1𝐌j=[αj000αj000αjβj000βj000βj][𝚺1100𝚺r1,r1][αj00βj000αj00βj000αj00βj]=[𝚺11αj000𝚺22αj000𝚺r1,r1αj𝚺11βj000𝚺22βj000𝚺r1,r1βj][αj00βj000αj00βj000αj00βj]=[𝚺11αjαj00𝚺11αjβj000𝚺22αjαj00𝚺22αjβj000𝚺r1,r1αjαj00𝚺r1,r1αjβj𝚺11βjαj00𝚺11βjβj000𝚺22βjαj00𝚺22βjβj000𝚺r1,r1βjαj00𝚺r1,r1βjβj],\displaystyle\begin{aligned} &&&(\mathbf{M}^{j})^{\ast}(\mathbf{\Sigma}^{(k)})^{-1}\mathbf{M}^{j}\\ &&&=\begin{bmatrix}\alpha_{j}^{\ast}&0&\cdots&0\\ 0&\alpha_{j}^{\ast}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\alpha_{j}^{\ast}\\ \beta_{j}^{\ast}&0&\cdots&0\\ 0&\beta_{j}^{\ast}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\beta_{j}^{\ast}\\ \end{bmatrix}\begin{bmatrix}\mathbf{\Sigma}_{11}&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\end{bmatrix}\begin{bmatrix}\alpha_{j}&0&\cdots&0&\beta_{j}&0&\cdots&0\\ 0&\alpha_{j}&\cdots&0&0&\beta_{j}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\alpha_{j}&0&0&\cdots&\beta_{j}\\ \end{bmatrix}\\ &&&=\begin{bmatrix}\mathbf{\Sigma}_{11}\alpha_{j}^{\ast}&0&\cdots&0\\ 0&\mathbf{\Sigma}_{22}\alpha_{j}^{\ast}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\alpha_{j}^{\ast}\\ \mathbf{\Sigma}_{11}\beta_{j}^{\ast}&0&\cdots&0\\ 0&\mathbf{\Sigma}_{22}\beta_{j}^{\ast}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\beta_{j}^{\ast}\\ \end{bmatrix}\begin{bmatrix}\alpha_{j}&0&\cdots&0&\beta_{j}&0&\cdots&0\\ 0&\alpha_{j}&\cdots&0&0&\beta_{j}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\alpha_{j}&0&0&\cdots&\beta_{j}\\ \end{bmatrix}\\ &&&=\begin{bmatrix}\mathbf{\Sigma}_{11}\alpha_{j}^{\ast}\alpha_{j}&0&\cdots&0&\mathbf{\Sigma}_{11}\alpha_{j}^{\ast}\beta_{j}&0&\cdots&0\\ 0&\mathbf{\Sigma}_{22}\alpha_{j}^{\ast}\alpha_{j}&\cdots&0&0&\mathbf{\Sigma}_{22}\alpha_{j}^{\ast}\beta_{j}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\alpha_{j}^{\ast}\alpha_{j}&0&0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\alpha_{j}^{\ast}\beta_{j}\\ \mathbf{\Sigma}_{11}\beta_{j}^{\ast}\alpha_{j}&0&\cdots&0&\mathbf{\Sigma}_{11}\beta_{j}^{\ast}\beta_{j}&0&\cdots&0\\ 0&\mathbf{\Sigma}_{22}\beta_{j}^{\ast}\alpha_{j}&\cdots&0&0&\mathbf{\Sigma}_{22}\beta_{j}^{\ast}\beta_{j}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\beta_{j}^{\ast}\alpha_{j}&0&0&\cdots&\mathbf{\Sigma}_{r_{1},r_{1}}\beta_{j}^{\ast}\beta_{j}\\ \end{bmatrix}\,,\end{aligned} (84)

where

αj\displaystyle\alpha_{j} =(v(j))(u(j))\displaystyle=\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top} 1×r1r\displaystyle\in\mathbb{R}^{1\times r_{1}r} (85)
βj\displaystyle\beta_{j} =(u(j))\displaystyle=\left(u^{(j)}\right)^{\top} 1×r.\displaystyle\in\mathbb{R}^{1\times r}. (86)

Moreover, the products between αj\alpha_{j} and βj\beta_{j} can be written as

αjαj=[v(j)u(j)][(v(j))(u(j))]=[v1(j)u1(j)v1(j)u1(j)v1(j)u1(j)v1(j)u2(j)v1(j)u1(j)v1(j)u3(j)v1(j)u1(j)vr1(j)ur(j)v1(j)u2(j)v1(j)u1(j)v1(j)u2(j)v1(j)u2(j)v1(j)u2(j)v1(j)u3(j)v1(j)u2(j)vr1(j)ur(j)v1(j)u3(j)v1(j)u1(j)v1(j)u3(j)v1(j)u2(j)v1(j)u3(j)v1(j)u3(j)v1(j)u3(j)vr1(j)ur(j)v2(j)u1(j)v1(j)u1(j)v2(j)u1(j)v1(j)u2(j)v2(j)u1(j)v1(j)u3(j)v2(j)u1(j)vr1(j)ur(j)v2(j)u2(j)v1(j)u1(j)v2(j)u2(j)v1(j)u2(j)v2(j)u2(j)v1(j)u3(j)v2(j)u2(j)vr1(j)ur(j)vr1(j)ur(j)v1(j)u1(j)vr1(j)ur(j)v1(j)u2(j)vr1(j)ur(j)v1(j)u3(j)vr1(j)ur(j)vr1(j)ur(j)]r1r×r1r,\displaystyle\begin{aligned} \alpha_{j}^{\ast}\alpha_{j}&=\left[v^{(j)}\bigotimes u^{(j)}\right]\bigotimes\left[\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top}\right]\\ &=\begin{bmatrix}v_{1}^{(j)}u_{1}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&v_{1}^{(j)}u_{1}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&v_{1}^{(j)}u_{1}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{1}^{(j)}u_{1}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ v_{1}^{(j)}u_{2}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&v_{1}^{(j)}u_{2}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&v_{1}^{(j)}u_{2}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{1}^{(j)}u_{2}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ v_{1}^{(j)}u_{3}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&v_{1}^{(j)}u_{3}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&v_{1}^{(j)}u_{3}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{1}^{(j)}u_{3}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ v_{2}^{(j)}u_{1}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&v_{2}^{(j)}u_{1}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&v_{2}^{(j)}u_{1}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{2}^{(j)}u_{1}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ v_{2}^{(j)}u_{2}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&v_{2}^{(j)}u_{2}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&v_{2}^{(j)}u_{2}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{2}^{(j)}u_{2}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ v_{r_{1}}^{(j)}u_{r}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&v_{r_{1}}^{(j)}u_{r}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&v_{r_{1}}^{(j)}u_{r}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{r_{1}}^{(j)}u_{r}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ \end{bmatrix}&&\in\mathbb{R}^{r_{1}r\times r_{1}r},\end{aligned} (87)
βjαj=[v(j)u(j)][(u(j))]=[v1(j)u1(j)u1(j)v1(j)u1(j)u2(j)v1(j)u1(j)u3(j)v1(j)u1(j)ur(j)v1(j)u2(j)u1(j)v1(j)u2(j)u2(j)v1(j)u2(j)u3(j)v1(j)u2(j)ur(j)v1(j)u3(j)u1(j)v1(j)u3(j)u2(j)v1(j)u3(j)u3(j)v1(j)u3(j)ur(j)v2(j)u1(j)u1(j)v2(j)u1(j)u2(j)v2(j)u1(j)u3(j)v2(j)u1(j)ur(j)v2(j)u2(j)u1(j)v2(j)u2(j)u2(j)v2(j)u2(j)u3(j)v2(j)u2(j)ur(j)vr1(j)ur(j)u1(j)vr1(j)ur(j)u2(j)vr1(j)ur(j)u3(j)vr1(j)ur(j)ur(j)]r1r×r\displaystyle\begin{aligned} \beta_{j}^{\ast}\alpha_{j}&=\left[v^{(j)}\bigotimes u^{(j)}\right]\bigotimes\left[\left(u^{(j)}\right)^{\top}\right]\\ &=\begin{bmatrix}v_{1}^{(j)}u_{1}^{(j)}u_{1}^{(j)}&v_{1}^{(j)}u_{1}^{(j)}u_{2}^{(j)}&v_{1}^{(j)}u_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{1}^{(j)}u_{1}^{(j)}u_{r}^{(j)}\\ v_{1}^{(j)}u_{2}^{(j)}u_{1}^{(j)}&v_{1}^{(j)}u_{2}^{(j)}u_{2}^{(j)}&v_{1}^{(j)}u_{2}^{(j)}u_{3}^{(j)}&\cdots&v_{1}^{(j)}u_{2}^{(j)}u_{r}^{(j)}\\ v_{1}^{(j)}u_{3}^{(j)}u_{1}^{(j)}&v_{1}^{(j)}u_{3}^{(j)}u_{2}^{(j)}&v_{1}^{(j)}u_{3}^{(j)}u_{3}^{(j)}&\cdots&v_{1}^{(j)}u_{3}^{(j)}u_{r}^{(j)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ v_{2}^{(j)}u_{1}^{(j)}u_{1}^{(j)}&v_{2}^{(j)}u_{1}^{(j)}u_{2}^{(j)}&v_{2}^{(j)}u_{1}^{(j)}u_{3}^{(j)}&\cdots&v_{2}^{(j)}u_{1}^{(j)}u_{r}^{(j)}\\ v_{2}^{(j)}u_{2}^{(j)}u_{1}^{(j)}&v_{2}^{(j)}u_{2}^{(j)}u_{2}^{(j)}&v_{2}^{(j)}u_{2}^{(j)}u_{3}^{(j)}&\cdots&v_{2}^{(j)}u_{2}^{(j)}u_{r}^{(j)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ v_{r_{1}}^{(j)}u_{r}^{(j)}u_{1}^{(j)}&v_{r_{1}}^{(j)}u_{r}^{(j)}u_{2}^{(j)}&v_{r_{1}}^{(j)}u_{r}^{(j)}u_{3}^{(j)}&\cdots&v_{r_{1}}^{(j)}u_{r}^{(j)}u_{r}^{(j)}\\ \end{bmatrix}&&\in\mathbb{R}^{r_{1}r\times r}\end{aligned} (88)
αjβj=[u(j)][(v(j))(u(j))]=[u1(j)v1(j)u1(j)u1(j)v1(j)u2(j)u1(j)v1(j)u3(j)u1(j)vr1(j)ur(j)u2(j)v1(j)u1(j)u2(j)v1(j)u2(j)u2(j)v1(j)u3(j)u2(j)vr1(j)ur(j)ur(j)v1(j)u1(j)ur(j)v1(j)u2(j)ur(j)v1(j)u3(j)ur(j)vr1(j)ur(j)]r1r×r1r,\displaystyle\begin{aligned} \alpha_{j}^{\ast}\beta_{j}&=\left[u^{(j)}\right]\bigotimes\left[\left(v^{(j)}\right)^{\top}\bigotimes\left(u^{(j)}\right)^{\top}\right]\\ &=\begin{bmatrix}u_{1}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&u_{1}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&u_{1}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&u_{1}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ u_{2}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&u_{2}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&u_{2}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&u_{2}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ u_{r}^{(j)}v_{1}^{(j)}u_{1}^{(j)}&u_{r}^{(j)}v_{1}^{(j)}u_{2}^{(j)}&u_{r}^{(j)}v_{1}^{(j)}u_{3}^{(j)}&\cdots&u_{r}^{(j)}v_{r_{1}}^{(j)}u_{r}^{(j)}\\ \end{bmatrix}&&\in\mathbb{R}^{r_{1}r\times r_{1}r},\end{aligned} (89)
βjβj=u(j)(u(j))=[u1(j)u1(j)u1(j)u2(j)u1(j)u3(j)u1(j)ur(j)u2(j)u1(j)u2(j)u2(j)u2(j)u3(j)u2(j)ur(j)ur(j)u1(j)ur(j)u2(j)ur(j)u3(j)ur(j)ur(j)]r×r.\displaystyle\begin{aligned} \beta_{j}^{\ast}\beta_{j}&=u^{(j)}\bigotimes\left(u^{(j)}\right)^{\top}\\ &=\begin{bmatrix}u_{1}^{(j)}u_{1}^{(j)}&u_{1}^{(j)}u_{2}^{(j)}&u_{1}^{(j)}u_{3}^{(j)}&\cdots&u_{1}^{(j)}u_{r}^{(j)}\\ u_{2}^{(j)}u_{1}^{(j)}&u_{2}^{(j)}u_{2}^{(j)}&u_{2}^{(j)}u_{3}^{(j)}&\cdots&u_{2}^{(j)}u_{r}^{(j)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ u_{r}^{(j)}u_{1}^{(j)}&u_{r}^{(j)}u_{2}^{(j)}&u_{r}^{(j)}u_{3}^{(j)}&\cdots&u_{r}^{(j)}u_{r}^{(j)}\\ \end{bmatrix}&&\in\mathbb{R}^{r\times r}.\end{aligned} (90)

The rank deficiency occurs from αjαj\alpha_{j}^{\ast}\alpha_{j} where the repeated entries appear in different rows. The entry-wise expression of αjαj\alpha_{j}^{\ast}\alpha_{j} yields

(αjαj)p,q=(vl1ul2)(vm1um2),p=(r(l11)+l2)q=(r(m11)+m2),\displaystyle\begin{aligned} &\biggl{(}\alpha_{j}^{\ast}\alpha_{j}\biggr{)}_{p,q}=\left(v_{l_{1}}u_{l_{2}}\right)\left(v_{m_{1}}u_{m_{2}}\right),&&p=(r(l_{1}-1)+l_{2})\\ &&&q=(r(m_{1}-1)+m_{2}),\end{aligned} (91)

where 1l1,m1r11\leq l_{1},m_{1}\leq r_{1}, 1l2,m2r1\leq l_{2},m_{2}\leq r, pp is the row index, and qq is the column index of αjαj\alpha_{j}^{\ast}\alpha_{j}, which is also the block matrix in (𝐌j)(𝚺(k))1𝐌j(\mathbf{M}^{j})^{\ast}(\mathbf{\Sigma}^{(k)})^{-1}\mathbf{M}^{j}. Note that, since vl1=ul1,1l1r1v_{l_{1}}=u_{l_{1}},1\leq l_{1}\leq r_{1}, the following equality holds:

vl1ul2=vl2ul1,provided that 1l1,l2r1.\displaystyle v_{l_{1}}u_{l_{2}}=v_{l_{2}}u_{l_{1}},\qquad\text{provided that }1\leq l_{1},l_{2}\leq r_{1}. (92)

Therefore, for l1l2l_{1}\neq l_{2} and 1l1,l2r11\leq l_{1},l_{2}\leq r_{1} (r12r_{1}\geq 2), there must exist p1=(r(l11)+l2)p_{1}=(r(l_{1}-1)+l_{2}) and p2=(r(l21)+l1)p_{2}=(r(l_{2}-1)+l_{1}), such that

(αjαj)p1,q=(αjαj)p2,q,p1p2.\displaystyle\biggl{(}\alpha_{j}^{\ast}\alpha_{j}\biggr{)}_{p_{1},q}=\biggl{(}\alpha_{j}^{\ast}\alpha_{j}\biggr{)}_{p_{2},q},\qquad p_{1}\neq p_{2}. (93)

Then r1(r11)2\frac{r_{1}(r_{1}-1)}{2} pairs of rows are the same in each sub-block (r1r_{1} sub-blocks in total) of (𝐌Lj)(𝚺L(k))1𝐌Lj(\mathbf{M}_{L}^{j})^{\ast}(\mathbf{\Sigma}_{L}^{(k)})^{-1}\mathbf{M}_{L}^{j}, and hence the deficient rank of matrix 𝒦L\mathcal{K}_{L} will be r12(r11)2\frac{r_{1}^{2}(r_{1}-1)}{2}.

Remark.

To illustrate the ill conditioning, we use r1=2r_{1}=2 and r=4r=4 as a simple case. Let

u=(v1v2w1w2),\displaystyle u=\begin{pmatrix}v_{1}\\ v_{2}\\ w_{1}\\ w_{2}\end{pmatrix}, v=(v1v2),\displaystyle v=\begin{pmatrix}v_{1}\\ v_{2}\end{pmatrix}, v=(w1w2).\displaystyle v=\begin{pmatrix}w_{1}\\ w_{2}\end{pmatrix}. (94)

Then

𝐌Lj=(α(j)𝟎1×8β(j)𝟎1×4𝟎1×8α(j)𝟎1×4β(j)),\displaystyle\mathbf{M}^{j}_{L}=\begin{pmatrix}\alpha^{(j)}&\mathbf{0}_{1\times 8}&\beta^{(j)}&\mathbf{0}_{1\times 4}\\ \mathbf{0}_{1\times 8}&\alpha^{(j)}&\mathbf{0}_{1\times 4}&\beta^{(j)}\end{pmatrix}, (95)

with

α(j)\displaystyle\alpha^{(j)} =(v1(j)v1(j)v1(j)v2(j)v1(j)w1(j)v1(j)w2(j)v2(j)v1(j)v2(j)v2(j)v2(j)w1(j)v2(j)w2(j)),\displaystyle=\begin{pmatrix}v_{1}^{(j)}v_{1}^{(j)}&v_{1}^{(j)}v_{2}^{(j)}&v_{1}^{(j)}w_{1}^{(j)}&v_{1}^{(j)}w_{2}^{(j)}&v_{2}^{(j)}v_{1}^{(j)}&v_{2}^{(j)}v_{2}^{(j)}&v_{2}^{(j)}w_{1}^{(j)}&v_{2}^{(j)}w_{2}^{(j)}\end{pmatrix}, (96)
β(j)\displaystyle\beta^{(j)} =(v1v2w1w2),\displaystyle=\begin{pmatrix}v_{1}&v_{2}&w_{1}&w_{2}\end{pmatrix}, (97)
𝟎1×8\displaystyle\mathbf{0}_{1\times 8} =(00000000),\displaystyle=\begin{pmatrix}0&0&0&0&0&0&0&0\end{pmatrix}, (98)
𝟎1×4\displaystyle\mathbf{0}_{1\times 4} =(0000).\displaystyle=\begin{pmatrix}0&0&0&0\end{pmatrix}. (99)

The unknown vector θL\theta_{L} yields the following: I don’t think we’re allowed colored text.

θL=[(𝐁vτ(v))111,(𝐁vτ(v))121,(𝐂wτ(v))(111),(𝐂wτ(v))(121),(𝐁vτ(v))211,(𝐁vτ(v))221,(𝐂wτ(v))(211),(𝐂wτ(v))(221),(𝐁vτ(v))112,(𝐁vτ(v))122,(𝐂wτ(v))(112),(𝐂wτ(v))(122),(𝐁vτ(v))212,(𝐁vτ(v))222,(𝐂wτ(v))(212),(𝐂wτ(v))(222),(𝐂vτ(v))(11),(𝐂vτ(v))(12),(𝐃wτ(v))(11),(𝐃wτ(v))(12),(𝐂vτ(v))(21),(𝐂vτ(v))(22),(𝐃wτ(v))(21),(𝐃wτ(v))(22)],\displaystyle\begin{aligned} \theta_{L}=\left[\left(\mathbf{B}^{(v)}_{v\tau}\right)_{111},\quad\left(\mathbf{B}^{(v)}_{v\tau}\right)_{121},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(111)},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(121)},\quad\right.\\ \left.\left(\mathbf{B}^{(v)}_{v\tau}\right)_{211},\quad\left(\mathbf{B}^{(v)}_{v\tau}\right)_{221},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(211)},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(221)},\right.\\ \left.\left(\mathbf{B}^{(v)}_{v\tau}\right)_{112},\quad\left(\mathbf{B}^{(v)}_{v\tau}\right)_{122},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(112)},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(122)},\quad\right.\\ \left.\left(\mathbf{B}^{(v)}_{v\tau}\right)_{212},\quad\left(\mathbf{B}^{(v)}_{v\tau}\right)_{222},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(212)},\quad\left(\mathbf{C}^{(v)}_{w\tau}\right)_{(222)},\quad\right.\\ \left.\left(\mathbf{C}^{(v)}_{v\tau}\right)_{(11)},\quad\left(\mathbf{C}^{(v)}_{v\tau}\right)_{(12)},\quad\left(\mathbf{D}^{(v)}_{w\tau}\right)_{(11)},\quad\left(\mathbf{D}^{(v)}_{w\tau}\right)_{(12)},\right.\\ \left.\left(\mathbf{C}^{(v)}_{v\tau}\right)_{(21)},\quad\left(\mathbf{C}^{(v)}_{v\tau}\right)_{(22)},\quad\left(\mathbf{D}^{(v)}_{w\tau}\right)_{(21)},\quad\left(\mathbf{D}^{(v)}_{w\tau}\right)_{(22)}\right],\end{aligned} (100)

where the first four rows in (100) terms denote the nonlinear parameters in θL\theta_{L}, and the last two rows in (100) terms denote the linear parameters in θL\theta_{L}. The jjth row of product (𝐌Lj)(𝚺(k))1𝐌Lj(\mathbf{M}_{L}^{j})^{\top}(\mathbf{\Sigma}^{(k)})^{-1}\mathbf{M}_{L}^{j} is related to the jjth row of the residual. In particular, the first row of the product (𝐌Lj)(𝚺(k))1𝐌Lj(\mathbf{M}_{L}^{j})^{\top}(\mathbf{\Sigma}^{(k)})^{-1}\mathbf{M}_{L}^{j} yields the following:

((𝐌Lj)(𝚺(k))1𝐌Lj)1=[𝚺11v1v1v1v1,𝚺11v1v1v1v2,𝚺11v1v1v1w1,𝚺11v1v1v1w2,𝚺11v1v1v2v1,𝚺11v1v1v2v2,𝚺11v1v1v2w1,𝚺11v1v1v2w2,0,0,0,0,0,0,0,0,𝚺11v1v1v1,𝚺11v1v1v2,𝚺11v1v1w1,𝚺11v1v1w2,0,0,0,0].\displaystyle\begin{aligned} &\left({(\mathbf{M}_{L}^{j})^{\ast}}{(\mathbf{\Sigma}^{(k)})^{-1}}{\mathbf{M}_{L}^{j}}\right)_{1\cdot}=\\ &\left[{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{1}v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{1}v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{1}w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{1}w_{2}},\right.\\ &\left.{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{2}v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{2}v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{2}w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{2}w_{2}},\right.\\ &\left.0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\right.\\ &\left.{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{1}}{w_{2}},\right.\\ &\left.0,\quad 0,\quad 0,\quad 0\right].\end{aligned} (101)

The problem occurs in the second and fifth rows. The second row yields:

((𝐌Lj)(𝚺(k))1𝐌Lj)2=[𝚺11v1v2v1v1,𝚺11v1v2v1v2,𝚺11v1v2v1w1,𝚺11v1v2v1w2,𝚺11v1v2v2v1,𝚺11v1v2v2v2,𝚺11v1v2v2w1,𝚺11v1v2v2w2,0,0,0,0,0,0,0,0,𝚺11v1v2v1,𝚺11v1v2v2,𝚺11v1v2w1,𝚺11v1v2w2,0,0,0,0].\displaystyle\begin{aligned} &\left({(\mathbf{M}_{L}^{j})^{\ast}}{(\mathbf{\Sigma}^{(k)})^{-1}}{\mathbf{M}_{L}^{j}}\right)_{2\cdot}=\\ &\left[{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{1}v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{1}v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{1}w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{1}w_{2}},\right.\\ &\left.{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{2}v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{2}v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{2}w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{2}w_{2}},\right.\\ &\left.0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\right.\\ &\left.{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{1}v_{2}}{w_{2}},\right.\\ &\left.0,\quad 0,\quad 0,\quad 0\right].\end{aligned} (102)

The fifth row yields:

((𝐌Lj)(𝚺(k))1𝐌Lj)5=[𝚺11v2v1v1v1,𝚺11v2v1v1v2,𝚺11v2v1v1w1,𝚺11v2v1v1w2,𝚺11v2v1v2v1,𝚺11v2v1v2v2,𝚺11v2v1v2w1,𝚺11v2v1v2w2,0,0,0,0,0,0,0,0,𝚺11v2v1v1,𝚺11v2v1v2,𝚺11v2v1w1,𝚺11v2v1w2,0,0,0,0].\displaystyle\begin{aligned} &\left({(\mathbf{M}_{L}^{j})^{\ast}}{(\mathbf{\Sigma}^{(k)})^{-1}}{\mathbf{M}_{L}^{j}}\right)_{5\cdot}=\\ &\left[{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{1}v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{1}v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{1}w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{1}w_{2}},\right.\\ &\left.{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{2}v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{2}v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{2}w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{2}w_{2}},\right.\\ &\left.0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\quad 0,\right.\\ &\left.{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{v_{2}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{w_{1}},\quad{\mathbf{\Sigma}_{11}}{v_{2}v_{1}}{w_{2}},\right.\\ &\left.0,\quad 0,\quad 0,\quad 0\right].\end{aligned} (103)

The terms v2v1v_{2}v_{1} and v1v2v_{1}v_{2} are equal. These two terms are corresponding to the second and fifth elements of θL\theta_{L}, i.e., (𝐁vτ(v))121\left(\mathbf{B}^{(v)}_{v\tau}\right)_{121} and (𝐁vτ(v))211\left(\mathbf{B}^{(v)}_{v\tau}\right)_{211}. In fact, this issue may occur in solving the least square problem if all the nonlinear unknowns are involved, i.e., all entries of the unknown tensors are assumed non-zeros.

B.3.3 Special Regularization in Optimization Problem

To overcome the rank-deficiency from the formulation of MLM_{L}, we consider that 𝐁vτ(v)\mathbf{B}^{(v)}_{v\tau} and 𝐁vτ(w)\mathbf{B}^{(w)}_{v\tau} are upper triangular matrices in each dimension, i.e., (𝐁vτ(v))ijk\left(\mathbf{B}^{(v)}_{v\tau}\right)_{ijk} and (𝐁vτ(w))ijk\left(\mathbf{B}^{(w)}_{v\tau}\right)_{ijk} are upper triangular matrices for a fixed index kk. In particular, these terms share the following:

(𝐁vτ(v))ijk=0,provided that i>j;i,j,k=1,,r1\displaystyle\begin{aligned} &\left(\mathbf{B}^{(v)}_{v\tau}\right)_{ijk}=0,&\text{provided that }i>j;&\,i,j,k=1,\ldots,r_{1}\end{aligned} (104)
(𝐁vτ(w))ijk=0,provided that i>j;i,j=1,,r1;k=1,,r2.\displaystyle\begin{aligned} &\left(\mathbf{B}^{(w)}_{v\tau}\right)_{ijk}=0,&\text{provided that }i>j;&\,i,j=1,\cdots,r_{1};k=1,\ldots,r_{2}.\end{aligned} (105)

Note that, since the constraints for (𝐁vτ(w))ijk\left(\mathbf{B}^{(w)}_{v\tau}\right)_{ijk} require that (𝐁vτ(w))iii=0\left(\mathbf{B}^{(w)}_{v\tau}\right)_{iii}=0 and (𝐁vτ(w))ijk+(𝐁vτ(w))jik=0,ij\left(\mathbf{B}^{(w)}_{v\tau}\right)_{ijk}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{jik}=0,i\neq j, 𝐁vτ(w)\mathbf{B}^{(w)}_{v\tau} vanishes. Hence, the discrete formulation of the new multiscale physics constraints yield the following:

𝐁vτ(v):\displaystyle\mathbf{B}^{(v)}_{v\tau}:
(𝐁vτ(v))(iii)=0,i=1,,r1,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(iii)}=0,\qquad\qquad i=1,\cdots,r_{1}, (106)
(𝐁vτ(v))(ijj)++(𝐁vτ(v))(jji)=0,i,j=1,,r1,i<j,\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ijj)}++\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(jji)}=0,\qquad i,j=1,\cdots,r_{1},i<j, (107)
(𝐁vτ(v))(ijk)+(𝐁vτ(v))(ikj)+(𝐁vτ(v))(jki)=0,i,j,k=1,,r1,i<j<k.\displaystyle\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ijk)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(ikj)}+\left(\mathbf{B}^{(v)}_{v\tau}\right)_{(jki)}=0,\qquad i,j,k=1,\cdots,r_{1},i<j<k. (108)
𝐂wτ(v) and 𝐁vτ(w):\displaystyle\mathbf{C}^{(v)}_{w\tau}\,\text{ and }\,\mathbf{B}^{(w)}_{v\tau}:
(𝐂wτ(v))(iki)+(𝐁vτ(w))(iik)=0,i=1,,r1,k=1,,r2;\displaystyle\left({\mathbf{C}}^{(v)}_{w\tau}\right)_{(iki)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(iik)}=0,\quad i=1,\cdots,r_{1},k=1,\cdots,r_{2}; (109)
(𝐂wτ(v))(ikj)+(𝐂wτ(v))(jki)+(𝐁vτ(w))(ijk)=0,\displaystyle\left({\mathbf{C}}^{(v)}_{w\tau}\right)_{(ikj)}+\left({\mathbf{C}}^{(v)}_{w\tau}\right)_{(jki)}+\left(\mathbf{B}^{(w)}_{v\tau}\right)_{(ijk)}=0, (110)
i<j,i,j=1,,r1,k=1,,r2.\displaystyle\hskip 170.71652pti<j,i,j=1,\cdots,r_{1},k=1,\cdots,r_{2}.
𝐂wτ(w):\displaystyle\mathbf{C}^{(w)}_{w\tau}:
(𝐂wτ(w))(ijj)=0,\displaystyle\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ijj)}=0, i=1,,r1;j=1,,r2,\displaystyle i=1,\cdots,r_{1};j=1,\cdots,r_{2}, (111)
(𝐂wτ(w))(ijk)+(𝐂wτ(w))(ikj)=0,\displaystyle\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ijk)}+\left(\mathbf{C}^{(w)}_{w\tau}\right)_{(ikj)}=0, i=1,,r1;j,k=1,,r2,jk.\displaystyle i=1,\cdots,r_{1};j,k=1,\cdots,r_{2},j\neq k. (112)

References

  • [1] G. K. Vallis, Atmospheric and oceanic fluid dynamics, Cambridge University Press, 2017.
  • [2] S. H. Strogatz, Nonlinear dynamics and chaos with student solutions manual: With applications to physics, biology, chemistry, and engineering, CRC press, 2018.
  • [3] D. C. Wilcox, Multiscale model for turbulent flows, AIAA journal 26 (11) (1988) 1311–1320.
  • [4] S. A. Sheard, A. Mostashari, Principles of complex systems for systems engineering, Systems Engineering 12 (4) (2009) 295–311.
  • [5] M. Ghil, S. Childress, Topics in geophysical fluid dynamics: atmospheric dynamics, dynamo theory, and climate dynamics, Springer Science & Business Media, 2012.
  • [6] A. J. Majda, Introduction to turbulent dynamical systems in complex systems, Springer, 2016.
  • [7] W.-K. Tao, J.-D. Chern, R. Atlas, D. Randall, M. Khairoutdinov, J.-L. Li, D. E. Waliser, A. Hou, X. Lin, C. Peters-Lidard, et al., A multiscale modeling system: Developments, applications, and critical issues, Bulletin of the American Meteorological Society 90 (4) (2009) 515–534.
  • [8] A. J. Majda, I. Grooms, New perspectives on superparameterization for geophysical turbulence, Journal of Computational Physics 271 (2014) 60–77.
  • [9] R. Salmon, Lectures on geophysical fluid dynamics, Oxford University Press, 1998.
  • [10] H. A. Dijkstra, Nonlinear climate dynamics, Cambridge University Press, 2013.
  • [11] T. Palmer, A nonlinear dynamical perspective on climate change, Weather 48 (10) (1993) 314–326.
  • [12] M. Farazmand, T. P. Sapsis, Extreme events: Mechanisms and prediction, Applied Mechanics Reviews 71 (5).
  • [13] K. E. Trenberth, J. T. Fasullo, T. G. Shepherd, Attribution of climate extreme events, Nature Climate Change 5 (8) (2015) 725–730.
  • [14] H. Moffatt, Extreme events in turbulent flow, Journal of Fluid Mechanics 914.
  • [15] A. Majda, Introduction to PDEs and Waves for the Atmosphere and Ocean, Vol. 9, American Mathematical Soc., 2003.
  • [16] P. Manneville, Y. Pomeau, Intermittency and the Lorenz model, Physics Letters A 75 (1-2) (1979) 1–2.
  • [17] A. J. Majda, Challenges in climate science and contemporary applied mathematics, Communications on Pure and Applied Mathematics 65 (7) (2012) 920–948.
  • [18] A. J. Majda, N. Chen, Model error, information barriers, state estimation and prediction in complex multiscale systems, Entropy 20 (9) (2018) 644.
  • [19] J. S. Hesthaven, G. Rozza, B. Stamm, Certified Reduced Basis Methods for Parametrized Partial Differential Equations, Springer, 2015.
  • [20] P. Holmes, J. L. Lumley, G. Berkooz, Turbulence, Coherent Structures, Dynamical Systems and Symmetry, Cambridge, 1996.
  • [21] B. R. Noack, M. Morzynski, G. Tadmor, Reduced-order modelling for flow control, Vol. 528, Springer Science & Business Media, 2011.
  • [22] A. Quarteroni, A. Manzoni, F. Negri, Reduced Basis Methods for Partial Differential Equations: An Introduction, Vol. 92, Springer, 2015.
  • [23] K. Taira, M. S. Hemati, S. L. Brunton, Y. Sun, K. Duraisamy, S. Bagheri, S. T. Dawson, C.-A. Yeh, Modal analysis of fluid flows: Applications and outlook, AIAA journal 58 (3) (2020) 998–1022.
  • [24] X. Yan, X. Su, Linear regression analysis: theory and computing, World Scientific, 2009.
  • [25] D. A. Freedman, Statistical models: theory and practice, Cambridge University Press, 2009.
  • [26] K. Hasselmann, PIPs and POPs: The reduction of complex dynamical systems using principal interaction and oscillation patterns, Journal of Geophysical Research: Atmospheres 93 (D9) (1988) 11015–11021.
  • [27] F. Kwasniok, The reduction of complex dynamical systems using principal interaction patterns, Physica D: Nonlinear Phenomena 92 (1-2) (1996) 28–60.
  • [28] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, D. S. Henningson, Spectral analysis of nonlinear flows, Journal of Fluid Mechanics 641 (2009) 115–127.
  • [29] P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal of Fluid Mechanics 656 (2010) 5–28.
  • [30] S. E. Ahmed, S. Pawar, O. San, A. Rasheed, T. Iliescu, B. R. Noack, On closures for reduced order models–A spectrum of first-principle to machine-learned avenues, Physics of Fluids 33 (9) (2021) 091301.
  • [31] K. Carlberg, C. Farhat, J. Cortial, D. Amsallem, The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows, Journal of Computational Physics 242 (2013) 623–647.
  • [32] C. Mou, B. Koc, O. San, L. G. Rebholz, T. Iliescu, Data-driven variational multiscale reduced order models, Computer Methods in Applied Mechanics and Engineering 373 (2021) 113470.
  • [33] X. Xie, M. Mohebujjaman, L. G. Rebholz, T. Iliescu, Data-driven filtered reduced order modeling of fluid flows, SIAM Journal on Scientific Computing 40 (3) (2018) B834–B857.
  • [34] A. J. Majda, I. Timofeyev, E. Vanden Eijnden, A mathematical framework for stochastic climate models, Communications on Pure and Applied Mathematics 54 (8) (2001) 891–974.
  • [35] A. J. Majda, I. Timofeyev, E. V. Eijnden, Models for stochastic climate prediction, Proceedings of the National Academy of Sciences 96 (26) (1999) 14687–14691.
  • [36] F. Lu, Data-driven model reduction for stochastic burgers equations, Entropy 22 (12) (2020) 1360.
  • [37] K. K. Lin, F. Lu, Data-driven model reduction, wiener projections, and the Koopman-Mori-Zwanzig formalism, Journal of Computational Physics 424 (2021) 109864.
  • [38] A. J. Majda, J. Harlim, Physics constrained nonlinear regression models for time series, Nonlinearity 26 (1) (2012) 201.
  • [39] J. Harlim, A. Mahdi, A. J. Majda, An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models, Journal of Computational Physics 257 (2014) 782–812.
  • [40] D. Kondrashov, M. D. Chekroun, M. Ghil, Data-driven non-Markovian closure models, Physica D: Nonlinear Phenomena 297 (2015) 33–55.
  • [41] T. N. Palmer, A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models, Quarterly Journal of the Royal Meteorological Society 127 (572) (2001) 279–304.
  • [42] A. J. Majda, B. Gershgorin, Improving model fidelity and sensitivity for complex systems through empirical information theory, Proceedings of the National Academy of Sciences 108 (25) (2011) 10044–10049.
  • [43] D. Crommelin, E. Vanden-Eijnden, Subgrid-scale parameterization with conditional Markov chains, Journal of the Atmospheric Sciences 65 (8) (2008) 2661–2675.
  • [44] T. J. Phillips, G. L. Potter, D. L. Williamson, R. T. Cederwall, J. S. Boyle, M. Fiorino, J. J. Hnilo, J. G. Olson, S. Xie, J. J. Yio, Evaluating parameterizations in general circulation models: Climate simulation meets weather prediction, Bulletin of the American Meteorological Society 85 (12) (2004) 1903–1916.
  • [45] M. Branicki, N. Chen, A. J. Majda, Non-Gaussian test models for prediction and state estimation with model errors, Chinese Annals of Mathematics, Series B 34 (1) (2013) 29–64.
  • [46] G. Evensen, Data assimilation: the ensemble Kalman filter, Springer Science & Business Media, 2009.
  • [47] E. Kalnay, Atmospheric modeling, data assimilation and predictability, Cambridge University Press, 2003.
  • [48] K. Law, A. Stuart, K. Zygalakis, Data assimilation, Cham, Switzerland: Springer 214.
  • [49] A. J. Majda, J. Harlim, Filtering complex turbulent systems, Cambridge University Press, 2012.
  • [50] A. J. Majda, Y. Yuan, Fundamental limitations of ad hoc linear and quadratic multi-level regression models for physical systems, Discrete and Continuous Dynamical Systems B 17 (4) (2012) 1333–1363.
  • [51] N. Chen, A. Majda, Conditional Gaussian systems for multiscale nonlinear stochastic systems: Prediction, state estimation and uncertainty quantification, Entropy 20 (7) (2018) 509.
  • [52] W. E, K. Khanin, A. Mazel, Y. Sinai, Invariant measures for Burgers equation with stochastic forcing, Annals of Mathematics (2000) 877–960.
  • [53] W. E, K. Khanin, A. Mazel, Y. Sinai, Probability distribution functions for the random forced Burgers equation, Physical Review Letters 78 (10) (1997) 1904.
  • [54] N. Chen, A. J. Majda, Filtering nonlinear turbulent dynamical systems through conditional Gaussian statistics, Monthly Weather Review 144 (12) (2016) 4885–4917.
  • [55] N. Chen, A. J. Majda, D. Giannakis, Predicting the cloud patterns of the Madden-Julian oscillation through a low-order nonlinear stochastic model, Geophysical Research Letters 41 (15) (2014) 5612–5619.
  • [56] N. Chen, A. J. Majda, Predicting the real-time multivariate Madden–Julian oscillation index through a low-order nonlinear stochastic model, Monthly Weather Review 143 (6) (2015) 2148–2169.
  • [57] N. Chen, A. J. Majda, X. T. Tong, Information barriers for noisy Lagrangian tracers in filtering random incompressible flows, Nonlinearity 27 (9) (2014) 2133.
  • [58] N. Chen, A. J. Majda, X. T. Tong, Noisy Lagrangian tracers for filtering random rotating compressible flows, Journal of Nonlinear Science 25 (3) (2015) 451–488.
  • [59] M. Branicki, A. J. Majda, Dynamic stochastic superresolution of sparsely observed turbulent systems, Journal of Computational Physics 241 (2013) 333–363.
  • [60] S. R. Keating, A. J. Majda, K. S. Smith, New methods for estimating ocean eddy heat transport using satellite altimetry, Monthly Weather Review 140 (5) (2012) 1703–1722.
  • [61] M. Kaercher, S. Boyaval, M. A. Grepl, K. Veroy, Reduced basis approximation and a posteriori error bounds for 4D-Var data assimilation, Optim. Eng. (2018) 1–33.
  • [62] Y. Maday, A. T. Patera, J. D. Penn, M. Yano, A parameterized-background data-weak approach to variational data assimilation: formulation, analysis, and application to acoustics, Int. J. Num. Meth. Engng. 102 (5) (2015) 933–965.
  • [63] S. Pagani, A. Manzoni, A. Quarteroni, Efficient state/parameter estimation in nonlinear unsteady PDEs by a reduced basis ensemble Kalman filter, SIAM-ASA J. Uncertain. 5 (1) (2017) 890–921.
  • [64] A. A. Popov, C. Mou, T. Iliescu, A. Sandu, A multifidelity ensemble Kalman filter with reduced order control variates, SIAM J. Sci. Comput. 43 (2) (2021) A1134–A1162.
  • [65] R. Ştefănescu, A. Sandu, I. M. Navon, POD/DEIM reduced-order strategies for efficient four dimensional variational data assimilation, J. Comput. Phys. 295 (2015) 569–595.
  • [66] D. Xiao, J. Du, F. Fang, C. C. Pain, J. Li, Parameterised non-intrusive reduced order methods for ensemble Kalman filter data assimilation, Comput. & Fluids 177 (2018) 69–77.
  • [67] C. Zerfas, L. G. Rebholz, M. Schneier, T. Iliescu, Continuous data assimilation reduced order models of fluid flow, Comput. Meth. Appl. Mech. Eng. 357 (2019) 112596.
  • [68] W. Lahoz, B. Khattatov, R. Ménard, Data assimilation and information, in: Data Assimilation, Springer, 2010, pp. 3–12.
  • [69] J. S. Whitaker, T. M. Hamill, Ensemble data assimilation without perturbed observations, Monthly weather review 130 (7) (2002) 1913–1924.
  • [70] B. R. Hunt, E. J. Kostelich, I. Szunyogh, Efficient data assimilation for spatiotemporal chaos: A local ensemble transform kalman filter, Physica D: Nonlinear Phenomena 230 (1-2) (2007) 112–126.
  • [71] P. L. Houtekamer, F. Zhang, Review of the ensemble kalman filter for atmospheric data assimilation, Monthly Weather Review 144 (12) (2016) 4489–4532.
  • [72] R. S. Liptser, A. N. Shiryaev, Statistics of random processes II: Applications, Vol. 6, Springer Science & Business Media, 2013.
  • [73] R. E. Kalman, R. S. Bucy, New results in linear filtering and prediction theory, Journal of Basic Engineering 83 (1) (1961) 95–108.
  • [74] S. Saha, S. Moorthi, H.-L. Pan, X. Wu, J. Wang, S. Nadiga, P. Tripp, R. Kistler, J. Woollen, D. Behringer, et al., The ncep climate forecast system reanalysis, Bulletin of the American Meteorological Society 91 (8) (2010) 1015–1058.
  • [75] N. Chen, Learning nonlinear turbulent dynamics from partial observations via analytically solvable conditional statistics, Journal of Computational Physics 418 (2020) 109635.
  • [76] S. Boyd, L. Vandenberghe, Convex optimization, Cambridge University Press, 2004.
  • [77] K. Bergemann, S. Reich, An ensemble Kalman-Bucy filter for continuous data assimilation, Meteorologische Zeitschrift 21 (2012) 213–219.
  • [78] R. J. Greatbatch, B. T. Nadiga, Four-gyre circulation in a barotropic model with double-gyre wind forcing, Journal of Physical Oceanography 30 (6) (2000) 1461–1471.
  • [79] C. Mou, H. Liu, D. R. Wells, T. Iliescu, Data-driven correction reduced order models for the quasi-geostrophic equations: A numerical investigation, International Journal of Computational Fluid Dynamics 34 (2) (2020) 147–159.
  • [80] O. San, T. Iliescu, A stabilized proper orthogonal decomposition reduced-order model for large scale quasigeostrophic ocean circulation, Advances in Computational Mathematics (2015) 1289–1319.
  • [81] O. San, A. E. Staples, Z. Wang, T. Iliescu, Approximate deconvolution large eddy simulation of a barotropic ocean circulation model, Ocean Modelling 40 (2011) 120–132.
  • [82] S. Kullback, R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1) (1951) 79–86.
  • [83] S. Kullback, Letter to the editor: The kullback-leibler distance, American Statistician.
  • [84] R. Kleeman, Information theory and dynamical system predictability, Entropy 13 (3) (2011) 612–649.
  • [85] A. Majda, R. V. Abramov, M. J. Grote, Information theory and stochastics for multiscale nonlinear systems, Vol. 25, American Mathematical Soc., 2005.
  • [86] S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (15) (2016) 3932–3937.
  • [87] J. Sun, E. M. Bollt, Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings, Physica D: Nonlinear Phenomena 267 (2014) 49–57.
  • [88] J. Elinger, Information theoretic causality measures for parameter estimation and system identification, Ph.D. thesis, Georgia Institute of Technology (2020).