
Improving Surrogate Model Robustness to Perturbations for Dynamical Systems Through Machine Learning and Data Assimilation

Abhishek Ajayakumar Soumyendu Raha
Abstract

Many real-world systems are modelled using complex ordinary differential equations (ODEs). However, the dimensionality of these systems can make them challenging to analyze. Dimensionality reduction techniques like Proper Orthogonal Decomposition (POD) can be used in such cases. However, these reduced-order models are susceptible to perturbations in the input. We propose a novel framework that combines machine learning and data assimilation techniques to improve the ability of surrogate models to handle perturbations in input data effectively. Through rigorous experiments on dynamical systems modelled on graphs, we demonstrate that our framework substantially improves the accuracy of surrogate models under input perturbations. Furthermore, we evaluate the framework’s efficacy on alternative surrogate models, including neural ODEs, and the empirical results consistently show enhanced performance.

Keywords: Surrogate model \cdot Machine learning \cdot Dimensionality reduction \cdot Reaction-diffusion system

1   Introduction

Complex networks provide valuable insights into real-world systems, such as the spread of epidemics [1] and the understanding of biochemical and neuronal processes [2]. However, modeling these systems becomes computationally complex when dealing with large state spaces. To address this challenge, methods like Proper Orthogonal Decomposition (POD) ([3], [4], [5], [6]) are employed to capture patterns while operating in reduced dimensions. While dimensionality reduction techniques like POD are effective, their sensitivity to data necessitates robust surrogate models that can maintain accuracy under input perturbations. Consequently, the question arises regarding how to enhance a surrogate model to accommodate perturbed data sets. This study proposes a framework designed to improve the robustness of surrogate models, ensuring their reliability under varying input conditions.

Surrogate models serve as approximations of complex real-world systems, which are often modeled using ordinary differential equations (ODEs) or partial differential equations (PDEs). In many cases, the complexity of these systems makes it difficult to fully capture their underlying processes. To address the challenges of modeling high-dimensional and chaotic systems, several machine learning (ML) approaches have been proposed to construct surrogate models that provide accurate and computationally efficient approximations. These include analog models [7], recurrent neural networks (RNNs) [8, 9], residual neural networks that approximate resolvents [10, 11], and differential equation-based models such as in [12, 13, 14, 15]. A notable study [10] integrates machine learning and data assimilation (DA) to improve predictions in scenarios with sparse observations. However, the DA step is computationally expensive, limiting its practical applicability. To address model inaccuracies, researchers have explored alternative approaches such as weak-constraint DA methods [16] and ML-DA frameworks [17].

The contributions of our paper can be summarized as follows:

  1.

    We introduce a novel framework (Figure 1) that improves the robustness of surrogate models against input perturbations by integrating data assimilation and machine learning techniques.

  2.

    We conduct extensive experiments on dynamical systems represented as graphs, specifically the diffusion system (\mathbb{D}) and the chemical Brusselator model (\mathbb{C}) discussed in Section 2. For dynamical systems on graphs, we propose a dynamic optimization step, described in Sections 5.2 and 5.3, to obtain a sparse graph and reduce memory complexity.

    (a)

      For the diffusion equation (\mathbb{D}) on graphs (Section 2.2), we derive conditions (Theorem 3) to guide the constraints in our dynamic optimization step.

    (b)

      Section 5.3 presents the dynamic optimization step for reaction-diffusion systems on graphs, detailing the required constraints.

  3.

    Section 6 presents experimental results for the framework on both linear and non-linear dynamical systems represented on graphs, using the POD-based surrogate model. Empirical results in Table 1 demonstrate that our framework enhances surrogate model performance under input perturbations.

  4.

    In Section 7, we demonstrate how our framework can be integrated into a general setting to reinforce neural ODE-based surrogate models trained on data-driven systems. The improved models exhibit enhanced performance and robustness against input perturbations, outperforming both POD-based models and the original neural ODE solutions, as shown in Figures 3, 4, and 5.

2   Background

We present the fundamental tools and techniques essential for understanding the methodologies discussed in this paper.

2.1   Orthogonal collocation method

The orthogonal collocation method is a well-established numerical technique for solving differential equations, particularly in dynamic optimization problems. It provides a versatile approach applicable to both ordinary and partial differential equations [18, 19].

The method entails partitioning the problem domain into a discrete set of collocation points, ensuring that the polynomial approximation of the vector field f(t,z(t)) satisfies the orthogonality condition. Common choices for collocation points include Legendre, Chebyshev, and Gaussian points.

The method enforces the differential equations at the collocation points, resulting in a system of algebraic equations that can be solved using numerical techniques such as Newton’s method or direct solvers. The choice of collocation points and basis functions significantly impacts the method’s accuracy and efficiency. To illustrate the method, we provide an example using the Lotka-Volterra system.

\frac{dx}{dt} = \alpha x - \beta xy
\frac{dy}{dt} = \delta xy - \gamma y.

Here x and y denote the prey and predator populations. The parameters \alpha,\beta determine the prey growth rate and \delta,\gamma determine the predator growth rate. If we consider a single collocation element with two nodes per element, the orthogonal collocation method [18] solves the following system of equations to determine the prey and predator populations at time t_{s}, denoted (x_{t_{s}},y_{t_{s}}).

t_{s}N_{2\times 2}\begin{pmatrix}\alpha x_{t_{s}}-\beta x_{t_{s}}y_{t_{s}}\\ \delta x_{t_{s}}y_{t_{s}}-\gamma y_{t_{s}}\end{pmatrix}=\begin{pmatrix}x_{t_{s}}\\ y_{t_{s}}\end{pmatrix}-\begin{pmatrix}x_{t_{0}}\\ y_{t_{0}}\end{pmatrix},\qquad N_{2\times 2}=\begin{pmatrix}0.75&-0.25\\ 1&0\end{pmatrix}.
This method is applied in the dynamic optimization step of our framework (Section 4).
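For concreteness, the sketch below solves exactly the single-element, two-node collocation system displayed above with scipy.optimize.fsolve. The Lotka-Volterra parameter values, the collocation time t_s, and the initial populations are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.optimize import fsolve

alpha, beta, delta, gamma = 1.1, 0.4, 0.1, 0.4   # assumed Lotka-Volterra parameters
t_s = 0.1                                        # assumed collocation time
u0 = np.array([10.0, 5.0])                       # assumed initial populations (x_{t_0}, y_{t_0})
N = np.array([[0.75, -0.25],
              [1.0, 0.0]])                       # collocation matrix N_{2x2} from the text

def residual(u):
    x, y = u
    f = np.array([alpha * x - beta * x * y,      # prey equation evaluated at t_s
                  delta * x * y - gamma * y])    # predator equation evaluated at t_s
    return t_s * (N @ f) - (u - u0)              # enforce t_s N f(u) = u - u0

x_ts, y_ts = fsolve(residual, u0)
print(x_ts, y_ts)
```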

2.2   Spatio temporal propagation in graphs

Spatiotemporal propagation in complex networks has been widely studied across disciplines such as physics, biology, and neuroscience. Complex networks comprise interconnected nodes whose interactions give rise to dynamic processes. Understanding the mechanisms governing information, signal, or dynamic propagation within these networks is crucial for analyzing their complex behaviors and emergent properties.

Spatiotemporal propagation describes the transmission or diffusion of information, signals, or dynamics across interconnected nodes and edges in a complex network. It examines how localized events, disturbances, or modifications in one region of the network evolve and influence other nodes or areas. The following examples highlight key applications of complex networks.

  1.

    Diffusion Processes (\mathbb{D}): The study of diffusion phenomena, including heat transfer and molecular diffusion, aids in modeling and optimizing processes involving transport and dispersion. By considering the normalized Laplacian matrix of the graph, the heat equation can be represented as an equivalent discrete dynamical system. The normalized Laplacian matrix is \mathcal{L}=D^{-1/2}LD^{-1/2}, where D is the diagonal matrix of degrees and L=D-A is the Laplacian matrix of the graph. A code sketch of this system appears after this list.

    \frac{dF}{dt}=-\mathcal{L}F.

    Here F\in\mathbb{R}^{n}, and F(x,t) denotes the temperature at node x and time t.

  2.

    Chemical Brusselator model (\mathbb{C}): Introduced in 1971, the chemical Brusselator model exemplifies an autocatalytic chemical reaction system [20, 21]. Its dynamics are governed by the equations:

    \left\{\begin{aligned}\dot{x}_{i}&=a-(b+d)x_{i}+c\,x_{i}^{2}y_{i}-D_{x}\sum_{j}L_{ij}x_{j}\\ \dot{y}_{i}&=bx_{i}-c\,x_{i}^{2}y_{i}-D_{y}\sum_{j}L_{ij}y_{j}\end{aligned}\right. \qquad (1)
  3.

    Epidemic Spread (\mathbb{E}): Understanding how infectious diseases propagate through social networks can aid in designing effective strategies for disease control, information dissemination, and opinion formation. The SIS (susceptible-infected-susceptible) model, used to study epidemic spreading, is described as follows:

    \frac{dx_{i}}{dt}=-x_{i}+\sum_{j=1}^{N}A_{ij}(1-x_{i})x_{j}.

    A_{ij} represents the (i,j)-th entry of the adjacency matrix of the graph, and N denotes the number of nodes.

  4.

    Neural Dynamics (\mathbb{N}): The study of neural activity propagation in brain networks provides insights into cognition and neurological disorders. One such system, described in [22], follows the equation:

    \frac{dx_{i}}{dt}=-Bx_{i}+C\tanh{x_{i}}+\sum_{j=1}^{N}A_{ij}\tanh{x_{j}}.
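To make the diffusion system (\mathbb{D}) referenced in item 1 concrete, the sketch below builds the normalized Laplacian of an assumed Erdős-Rényi graph with networkx and integrates dF/dt = -\mathcal{L}F with forward Euler; the graph, step size, and horizon are illustrative assumptions.

```python
import numpy as np
import networkx as nx

G = nx.erdos_renyi_graph(n=30, p=0.3, seed=0)      # assumed example graph
A = nx.to_numpy_array(G)
d = A.sum(axis=1)
assert d.min() > 0, "sketch assumes no isolated nodes"
L = np.diag(d) - A                                 # combinatorial Laplacian L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt               # normalized Laplacian

h, steps = 1e-3, 1000                              # assumed Euler step size and horizon
F = np.random.rand(A.shape[0])                     # random initial temperatures F(x, 0)
snapshots = [F.copy()]
for _ in range(steps):
    F = F - h * (L_norm @ F)                       # Euler step of dF/dt = -L F
    snapshots.append(F.copy())
snapshots = np.asarray(snapshots)                  # trajectory snapshots, e.g. for POD
```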

2.3   Proper Orthogonal Decomposition for Dynamical systems

This section provides a concise overview of the application of Proper Orthogonal Decomposition (POD) to initial value problems in dynamical systems [3]. Given a dataset \mathcal{D} consisting of a collection of points x^{c}_{i}, where x^{c}_{i}\in\mathbb{R}^{n} denotes the state of the system at time t_{i} for a particular trajectory c, the POD method seeks a subspace S\subset\mathbb{R}^{n} such that

||\mathcal{D}-\rho_{S}\mathcal{D}||^{2}

is minimized. Here \rho_{S} is the orthogonal projection onto the subspace S, and \rho_{S}\mathcal{D} denotes the projected data set.

S\subset\mathbb{R}^{n} is the best k-dimensional approximating affine subspace, with the projection matrix \rho consisting of the leading k eigenvectors of the covariance matrix \bar{R}. The subspace for a dataset \mathcal{D} is uniquely determined by the projection matrix P=\rho^{T}\rho and the mean \bar{x}.

\bar{R}=\sum_{c=1}^{N_{T}}\int_{0}^{T}(x^{c}(t)-\bar{x})(x^{c}(t)-\bar{x})^{T}\,dt,
\bar{x}=\frac{1}{N_{T}T}\sum_{c=1}^{N_{T}}\int_{0}^{T}x^{c}(t)\,dt.

N_{T} denotes the number of trajectories.

An asymptotic and sensitivity analysis of POD is presented in [3]. Consider a dynamical system in \mathbb{R}^{n} governed by a vector field f:

\dot{x}=f(x,t).

The reduced-order model (ROM) vector field is defined as:

\dot{z}=\rho f(\rho^{T}z+\bar{x},t)=f_{a}(z,t).

Thus, an initial value problem for the system \dot{x}=f(x,t) with x(0)=x_{0} using the projection method is given by

\dot{\hat{x}}=Pf(\hat{x},t);\quad\hat{x}(0)=\hat{x}_{0}=P(x_{0}-\bar{x})+\bar{x}. \qquad (2)

Here, \hat{x}_{0} is the projection of x_{0} onto the subspace S. The sensitivity of the POD projection P to changes in the dataset \mathcal{D} is quantified by the following proposition from [3]:
Proposition (Rathinam and Petzold [3]): Consider applying POD to a data set \mathcal{D} to find the best approximating k(<n)-dimensional subspace. Let the ordered eigenvalues of the covariance matrix of the data \mathcal{D} be given by \tilde{\lambda}_{1}\geq\cdots\geq\tilde{\lambda}_{n}. Suppose \tilde{\lambda}_{k}>\tilde{\lambda}_{k+1}, which ensures that P(\mathcal{D}) is well defined. Then

S_{k}(\mathcal{D})=\max_{i\leq k,\,j\leq n-k}\sqrt{2}\;\frac{\sqrt{\tilde{\lambda}_{i}+\tilde{\lambda}_{j+k}}}{\tilde{\lambda}_{i}-\tilde{\lambda}_{j+k}}\sqrt{\tilde{\lambda}_{1}+\cdots+\tilde{\lambda}_{n}}\geq\sqrt{2}. \qquad (3)
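As a minimal illustration of this section, the sketch below computes the POD mean, the projection \rho, and the sensitivity S_{k} of Eq. (3) from discrete snapshots, replacing the time integrals by sums over sampled states; the function names are ours, and strict separation of the eigenvalues involved is assumed.

```python
import numpy as np

def pod(snapshots, k):
    """snapshots: (T, n) array of states; returns mean, rho (k x n), sorted eigenvalues."""
    x_bar = snapshots.mean(axis=0)
    X = snapshots - x_bar
    R_bar = X.T @ X                          # discrete covariance (up to a constant factor)
    lam, V = np.linalg.eigh(R_bar)
    lam, V = lam[::-1], V[:, ::-1]           # eigenvalues in decreasing order
    rho = V[:, :k].T                         # leading k eigenvectors as rows
    return x_bar, rho, lam

def pod_sensitivity(lam, k):
    """S_k from Eq. (3); lam must be sorted decreasingly with lam[k-1] > lam[k]."""
    total = np.sqrt(lam.sum())
    n = len(lam)
    best = 0.0
    for i in range(k):
        for j in range(n - k):
            num = np.sqrt(lam[i] + lam[j + k])
            den = lam[i] - lam[j + k]        # assumed nonzero
            best = max(best, np.sqrt(2) * num / den * total)
    return best
```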

2.4   Filtering

This section introduces the filtering problem and outlines the key steps in the filtering framework. For further details on the concept of filtering, see [23, 24]. Rooted in Bayesian inference, optimal filtering estimates the state of time-varying systems under noisy measurements. Its objective is to achieve statistically optimal estimation of the system state. Optimal filtering follows the Bayesian framework for state estimation, integrating statistical optimization with Bayesian reasoning to improve applications in signal processing, control systems, and sensor networks.

In Bayesian optimal filtering, the system state consists of dynamic variables such as position, velocity, orientation, and angular velocity. Due to measurement noise, observations do not yield deterministic values but rather a distribution of possible states, introducing uncertainty. The system state evolution is modeled as a dynamic system with process noise capturing inherent uncertainties in system dynamics. While the underlying system is often deterministic, stochasticity is introduced to represent model uncertainties.

The system’s state evolves according to:

x_{k+1}=M(x_{k})+w_{k+1},

where x_{k}\in\mathbb{R}^{n} is the system state at time t_{k}, and w_{k+1}\in\mathbb{R}^{n} represents the model error. The observations are given by:

z_{k}=h(x_{k})+v_{k},

where v_{k} represents the observation noise.

The filtering step estimates the state x_{k+1} at time t_{k+1} based on observations up to t_{k+1}.
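The sketch below is a minimal bootstrap particle filter for the generic model above, assuming Gaussian process and observation noise; the forward model M, observation map h, and noise scales are placeholders supplied by the user, and the simplest multinomial resampling scheme is used. It is a sketch of the filtering idea, not the specific filter configuration used later in the paper.

```python
import numpy as np

def particle_filter(z_seq, M, h, n_particles, x0, sigma_x, sigma_y, rng=None):
    rng = rng or np.random.default_rng(0)
    n = x0.shape[0]
    particles = x0 + sigma_x * rng.standard_normal((n_particles, n))
    estimates = []
    for z in z_seq:
        # propagate each particle through the forward model plus process noise
        particles = np.array([M(p) for p in particles])
        particles += sigma_x * rng.standard_normal(particles.shape)
        # weight particles by the Gaussian observation likelihood
        resid = np.array([z - h(p) for p in particles])
        logw = -0.5 * np.sum(resid ** 2, axis=1) / sigma_y ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates.append(w @ particles)          # posterior-mean state estimate
        # multinomial resampling
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)
```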

3   Problem statement

Surrogate modeling techniques, such as the reduced-order model (ROM) discussed in Section 2.3, provide computational efficiency but are highly sensitive to input variations, as indicated in Equation 3. Consequently, when initialized with a new random initial condition, the surrogate model M(\cdot) efficiently approximates the system trajectory but may fail to capture deviations from the true state.
The surrogate model is used to obtain compressed representations of state vectors at various timesteps, which are treated as noisy observations. The observations are compressed for better memory efficiency if the model operates on the state dimension, as in the neural ODE surrogate model (Section 7). Let h:\mathbb{R}^{n}\rightarrow\mathbb{R}^{k} (where k<n) denote the state observation relationship. e_{mh}(t_{k+1}) and e_{ml}(t_{k+1}) denote the high- and low-wavelength components of error, capturing deviations between the true state and the state obtained from the surrogate model M(x_{k}) at time t_{k+1}. e_{oh}(t_{k}) and e_{ol}(t_{k}) denote the high- and low-wavelength components of error that capture the difference between the observation and the state at time t_{k}. x_{k+1} denotes the true state of the system at time t_{k+1}, and M(\cdot) denotes the forward model. For instance, if the forward model M(\cdot) uses the Euler forward scheme and the surrogate model applies POD to the vector field \dot{x}=f(x(t)), then:

M(x_{k})=x_{k}+hPf(x_{k},t_{k}),

where h denotes the step size. The state forward dynamics and state observation relationship are given by:

x_{k+1}=M(x_{k})+e_{mh}(t_{k+1})+e_{ml}(t_{k+1}),
z_{k}=h(x_{k})+e_{oh}(t_{k})+e_{ol}(t_{k}).
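A small sketch of the forward model and observation map just described, assuming a POD projection (\rho, \bar{x}) and a vector field f as in Section 2.3; the helper names are ours.

```python
import numpy as np

def make_forward_model(f, rho, x_bar, h):
    P = rho.T @ rho                      # projection matrix P = rho^T rho
    def M(x, t=0.0):
        return x + h * (P @ f(x, t))     # Euler step of the projected dynamics
    return M

def observe(x, rho, x_bar):
    return rho @ (x - x_bar)             # compressed observation z = rho (x - x_bar)
```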

4   Proposed Framework

Figure 1: Diagram illustrating the key steps of the proposed methodology, including dynamic optimization, hierarchical clustering, solving the inverse problem, and filtering. See Section 4 for a detailed explanation.

The key steps of the framework, as illustrated in Figure 1, are as follows. For brevity, the framework is discussed in the context of dynamical systems represented on graphs. For dynamical systems that do not involve graphs, the dynamic optimization step is not required. The framework’s applicability to general modeling scenarios is demonstrated in Section 7, where a neural ODE-based surrogate model is used to learn from data.

  1.

    Dynamic optimization. For dynamical systems on graphs, a dynamic optimization problem is formulated to extract a sparse graph from known solution trajectory snapshots. The sparse graph improves memory efficiency in the filtering step (Step 4) of the framework, where it can replace the original graph. This step outputs the graph Laplacian matrix of the sparse graph, given by L_{1}=B^{T}\text{diag}(w^{*})B, where the weights w^{*} for linear and nonlinear dynamical systems on graphs are determined by solving the optimization problems detailed in Sections 5.2 and 5.3.

  2.

    Hierarchical clustering. This step quantifies deviations between the surrogate model (e.g., POD) and the actual system trajectories. Given a dataset \mathcal{D}=\{x_{1},x_{2},\ldots,x_{T}\} and surrogate model predictions \mathcal{P}=\{z_{1},z_{2},\ldots,z_{T}\}, we apply hierarchical clustering to partition \mathcal{P} into p clusters. The assigned cluster labels are represented as Y=\{y_{1},y_{2},\ldots,y_{T}\}, where y_{t} denotes the cluster assignment for data point z_{t}. For each cluster C_{i}, we construct a set of triplets \{z_{j},e_{j},\hat{e}_{j}\}_{j=1}^{N_{C_{i}}}, where N_{C_{i}} is the number of triplets in cluster C_{i}. For the reduced-order model using POD (Section 2.3), we set e_{j}=x_{j}-(\rho^{T}z_{j}+\bar{x}) and \hat{e}_{j}=z_{j}-\rho(x_{j}-\bar{x}). The center of cluster C_{i} is the triplet \{z^{C_{i}}_{m},e^{C_{i}}_{m},\hat{e}^{C_{i}}_{m}\}, where e^{C_{i}}_{m}=\frac{1}{N_{C_{i}}}\sum^{N_{C_{i}}}_{j=1}e_{j}, \hat{e}^{C_{i}}_{m}=\frac{1}{N_{C_{i}}}\sum^{N_{C_{i}}}_{j=1}\hat{e}_{j}, and z^{C_{i}}_{m}=\frac{1}{N_{C_{i}}}\sum^{N_{C_{i}}}_{j=1}z_{j}. As shown in Figure 2, a visualization of the clusters is provided, with I_{C_{i}} representing the indices of data points within cluster C_{i}. A code sketch of this step, together with the error matrices of Step 3, appears after this list of steps.

  3.

    Inverse Problem for Estimating Weights Between Clusters. For an initial condition x_{0} at time step t_{0}, we obtain a compressed representation of the data point z_{0}. For example, when using the POD method, we get the compressed representation of x_{0} as z_{0}=\rho(x_{0}-\bar{x}). From the clusters formed in Step 2, we assign a cluster to the point z_{0} based on the following:

    \arg\min_{f\in\{1,2,\dots,p\}}\left\|z^{C_{f}}_{m}-z_{0}\right\|_{2}. \qquad (4)

    The initial cluster distribution p_{0}\in\mathbb{R}^{p} is a one-hot vector, where the index corresponding to f\in\{1,2,\ldots,p\} is set to 1. The cluster distribution at time t is denoted by p_{t}. For successive transitions \hat{x}_{1}\rightarrow\hat{x}_{2}\rightarrow\ldots\rightarrow\hat{x}_{T}, it is essential to efficiently determine the cluster of each approximate state \hat{x}_{i} generated by the surrogate model at time t_{i}. This step is performed to estimate the high-wavelength components of the errors described in Section 3. The transition of the cluster distribution p_{t} is modelled as a Markovian process, as discussed in [25]. We consider the clusters from Step 2 as the nodes of a graph \mathcal{G}_{I}=(\mathcal{V}_{I},\mathcal{E}_{I},q), |\mathcal{V}_{I}|=p, where the weights q of the graph are unknown. The topology of the graph is taken to be complete with self-loops, meaning there is an edge between each pair of cluster nodes and an edge from each node to itself. We use a supervised learning approach to determine the weights q of the graph, where the labels y_{t} from Step 2 enter the optimization objective through the cross-entropy loss (see Problem P).

    The transitions of the cluster distribution are modeled as Markovian, based on the assumption that the states of the ODE exhibit Markovian properties, as explained below. The general explicit Runge-Kutta numerical scheme using n slopes [26] for obtaining the solution of the differential equation \dot{x}=f(x(t)) is given by the following relation:

    x_{t+1}=x_{t}+hb_{1}k_{1}+hb_{2}k_{2}+\ldots+hb_{n}k_{n},
    k_{i}=f\big(x_{t}+h\sum_{j=1}^{i-1}a_{ij}k_{j}\big).

    From this formulation, it follows that the solution transitions of the ODE adhere to a discrete-time Markov chain:

    p(x_{t+1}\mid x_{0},x_{1},\dots,x_{t})=p(x_{t+1}\mid x_{t}).

    In a graph \mathcal{G}_{I}, a walk is represented as a sequence of vertices (v_{0},v_{1},\dots,v_{s}), where each consecutive pair of vertices (v_{i-1},v_{i}) is connected by an edge in \mathcal{G}_{I}; that is, there is an edge between v_{i-1} and v_{i} for all 1\leq i\leq s. A random walk is characterized by the transition probabilities P(u,v), which define the probability of moving from vertex u to vertex v in one step. Each vertex u in the graph can transition to multiple neighboring vertices v. If we consider the transitions between the clusters as Markovian, the transition matrix is P=D^{-1}A, where D is the diagonal matrix of degrees and A is the adjacency matrix of the graph. For each vertex u,

    \sum_{v}P(u,v)=1,
    P(u,v)=\begin{cases}\frac{1}{d_{u}}&\text{if }u\text{ and }v\text{ are adjacent},\\ 0&\text{otherwise}.\end{cases}

    If we consider a complete graph on 3 nodes with self-loops as shown in Figure 2, the transition matrix,

    P=\begin{bmatrix}\frac{1}{q_{11}+q_{12}+q_{13}}&0&0\\ 0&\frac{1}{q_{12}+q_{22}+q_{23}}&0\\ 0&0&\frac{1}{q_{13}+q_{23}+q_{33}}\end{bmatrix}\begin{bmatrix}q_{11}&q_{12}&q_{13}\\ q_{12}&q_{22}&q_{23}\\ q_{13}&q_{23}&q_{33}\end{bmatrix}.
    Figure 2: Illustration of triplet organization into clusters, with mutually exclusive sets of indices I_{C_{i}} (where i=1,2,3). The inverse problem P is used to estimate the weights q_{ij}.

    We now pose a constrained optimization problem P to estimate the weights q responsible for governing the transitions of data points.

    \begin{aligned}
    &\underset{q\,\in\,\mathbb{R}^{p(p+1)/2}}{\textbf{minimize}} && J=\sum_{t=1}^{T}\sum_{k=1}^{p}-y_{t}\log(p_{t}(k))-(1-y_{t})\log(1-p_{t}(k))\\
    &\textbf{subject to} && p_{t}=P(q)\,p_{t-1},\\
    & && q_{ij}\geq 0,\quad i,j=1,2,\ldots,p.
    \end{aligned} \qquad (P)

    y_{t} represents the cluster index of data point z_{t}. The optimization problem above is solved to find the optimal q^{*}, which can be efficiently obtained using the adjoint method for data assimilation, as detailed in [23]. Given the cluster distribution p_{k} at time t_{k}, we estimate the components A_{k},R_{k} (Eq. 5) for the high-wavelength components of the errors e_{mh}(t_{k}),e_{oh}(t_{k}) as follows: A_{k}=\mathrm{diag}\big(\sum_{i=1}^{p}p_{k}(i)e_{m}^{C_{i}}\big),\;R_{k}=\mathrm{diag}\big(\sum_{i=1}^{p}p_{k}(i)\hat{e}_{m}^{C_{i}}\big).

  4.

    State Estimation with Filtering. The goal of filtering is to estimate system states over time using noisy measurements and state observation relations, as discussed in Section 2.4. The POD surrogate model defines the evolution of state dynamics and the state-measurement relationship as follows:

    \begin{cases}x_{k+1}=M(Pf(x_{k}))+A_{k+1}v_{k+1}+w_{k+1},\\ z_{k}=\rho(x_{k}-\bar{x})+R_{k}\beta_{k}+\mu_{k}.\end{cases} \qquad (5)

    The terms A_{k+1}v_{k+1} and w_{k+1} represent the high- and low-wavelength components of model error at time t_{k+1}, respectively. Similarly, R_{k}\beta_{k} and \mu_{k} characterize the high- and low-wavelength components of the state-observation error at time t_{k}. P=\rho^{T}\rho denotes the projection matrix. At a particular time step, the data point z_{t} at time t is obtained using the ROM \dot{z}_{t}=f_{a}(z_{t},t)=\rho f(\rho^{T}z_{t}+\bar{x},t), and the state vector after projection is x_{t}=\rho^{T}z_{t}+\bar{x}. The matrices A_{k+1} and R_{k} are computed following the methodology outlined in Step 3 of the framework. The term v_{1} is the solution to the linear system A_{1}v_{1}=x_{1}-M(Pf(x_{0})). For dynamical systems represented on graphs, the term x_{1} is computed using an explicit numerical scheme with the sparse graph obtained from the dynamic optimization Step 1. In the case of a diffusion equation represented on graphs (\mathbb{D} in Section 2.2) with Euler discretization step size h, the sparse Laplacian matrix L_{1} from the output of Step 1 is used instead of the matrix L to obtain x_{1}=x_{0}-hL_{1}x_{0}. \beta_{1} is the solution to the linear system R_{1}\beta_{1}=z_{1}-\rho(x_{1}-\bar{x}). w_{k+1} and \mu_{k} are modelled as Gaussian with means \mathbf{0}_{n\times 1},\mathbf{0}_{k\times 1} and variances \zeta_{x}=\sigma_{x}I_{n\times n},\zeta_{y}=\sigma_{y}I_{k\times k}. The distribution governing the low-wavelength components of the errors (w_{k+1},\mu_{k}) is deliberately shaped to exhibit minimal variability, with a mean of \mathbf{0}_{n\times 1}, under the expectation that the high-wavelength component will accurately capture the true nature of the error. Linear and non-linear filters for state estimation are described in [23], [24]. When the uncertainty in the state prediction exceeds a threshold, the vectors v_{k+1} and \beta_{k} are updated as solutions of the linear systems defined below:

    A_{k+1}\,v_{k+1}=x_{k+1}-M(Pf(x_{k})), \qquad (6)
    R_{k}\,\beta_{k}=z_{k}-\rho(x_{k}-\bar{x}). \qquad (7)

    The computation of x_{k+1} is performed using an explicit numerical scheme that leverages the state vector obtained from the filter at time step k, together with the sparse graph.
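The following sketch illustrates Steps 2 and 3 above (as referenced in Step 2): hierarchical clustering of the compressed predictions into p clusters with per-cluster mean error triplets, and the diagonal matrices A_{k}, R_{k} assembled from a cluster distribution p_{k}. The Ward linkage and the helper names are our assumptions, and each of the p clusters is assumed non-empty.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_triplets(X, Z, rho, x_bar, p):
    """X: (T, n) true states, Z: (T, k) surrogate predictions, rho: (k, n) POD projection."""
    labels = fcluster(linkage(Z, method='ward'), t=p, criterion='maxclust')
    e = X - (Z @ rho + x_bar)               # e_j = x_j - (rho^T z_j + x_bar)
    e_hat = Z - (X - x_bar) @ rho.T         # e_hat_j = z_j - rho (x_j - x_bar)
    centers, e_means, e_hat_means = [], [], []
    for c in range(1, p + 1):
        idx = labels == c                   # assumed non-empty cluster
        centers.append(Z[idx].mean(axis=0))
        e_means.append(e[idx].mean(axis=0))
        e_hat_means.append(e_hat[idx].mean(axis=0))
    return labels, np.array(centers), np.array(e_means), np.array(e_hat_means)

def error_matrices(p_k, e_means, e_hat_means):
    # A_k = diag(sum_i p_k(i) e_m^{C_i}),  R_k = diag(sum_i p_k(i) e_hat_m^{C_i})
    A_k = np.diag(p_k @ e_means)
    R_k = np.diag(p_k @ e_hat_means)
    return A_k, R_k
```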

5   Dynamic optimization problem formulation for dynamical systems represented on graphs

This section outlines the constraints necessary for the dynamic optimization step (Step 1 in Section 4) applied to a linear dynamical system \mathbb{D} represented on graphs (Section 2.2). Section 5.2 details the dynamic optimization process for linear diffusion systems on graphs. Section 5.3 presents the optimization framework for reaction-diffusion systems on graphs, including the corresponding constraint formulations.
Existing techniques for approximating linear dynamical systems on graphs include [27], [28], and [29]. The approach in [29] constructs spectral sparsifiers by sampling edges according to their resistance R_{e}. The probability of sampling an edge e to construct a sparse approximation \bar{G} of graph G is given by

p_{e}=\frac{w_{e}R_{e}}{n-1},

where the edge resistance is defined as

R_{e}=\|L^{1/2}L_{e}L^{1/2}\|.

See Algorithm 5 for further details.

\bar{G}=\textbf{Sparsify}(G,q) [29]
Choose a random edge e of G with probability p_{e} proportional to w_{e}R_{e},
and add e to \bar{G} with weight w_{e}/(qp_{e}). Take q samples independently
with replacement, summing weights if an edge is chosen more than once.

Computing bounds on the edge weights of the sparse graph approximation \bar{G}, using effective resistances or local edge connectivities, is computationally intensive. To overcome this challenge, we propose Algorithm 5.1, which leverages Theorems 2 and 3 to efficiently compute upper bounds on edge weights and node degrees in the sparse graph. The following discussion introduces Bernstein’s Inequality [30], which is utilized in the derivation of Theorem 2.

Theorem 1.

Bernstein’s Inequality: Suppose X_{1},\ldots,X_{n} are independent random variables with finite variances, and suppose that \max_{1\leq i\leq n}|X_{i}|\leq B almost surely for some constant B>0. Let V=\sum_{i=1}^{n}\mathbb{E}X_{i}^{2}. Then, for every t\geq 0,

P\left\{\sum_{i=1}^{n}(X_{i}-\mathbb{E}X_{i})\geq t\right\}\leq\exp\left(-\frac{t^{2}}{2(V+tB/3)}\right)

and

P\left\{\sum_{i=1}^{n}(X_{i}-\mathbb{E}X_{i})\leq-t\right\}\leq\exp\left(-\frac{t^{2}}{2(V+tB/3)}\right).
Theorem 2.

Let each edge (u,v) of a graph be sampled with probability p_{uv}\propto R_{uv}, where p_{uv}\geq\frac{\beta}{n\,\min(\mathrm{deg}(u),\mathrm{deg}(v))}. Then \mathbb{P}\left\{\frac{\sum_{i=1}^{q}X_{i}}{q}-w_{uv}\geq t\right\}\leq\frac{1}{mn} for t\geq\frac{\epsilon_{1}c_{1}}{3q}+\sqrt{\frac{2\epsilon_{1}w_{uv}c_{1}}{q}+\left(\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}}, where

X_{i}=\begin{cases}\frac{w_{uv}}{p_{uv}},&\text{with probability }p_{uv},\;(u,v)\in E(G),\\ 0,&\text{otherwise},\end{cases}

\epsilon_{1}=\log(nm),\;|X_{i}|\leq c_{1},\;c_{1}=\frac{w_{uv}nc_{2}}{\beta}, and c_{2}=\min(\mathrm{deg}(u),\mathrm{deg}(v)).

Proof.

The expectation of XiX_{i} satisfies

\mathbb{E}[X_{i}]=w_{uv}.

The second moment sum satisfies

\sum_{i=1}^{q}\mathbb{E}(X_{i}^{2})=q\,\frac{w_{uv}^{2}}{p_{uv}}\leq\frac{w_{uv}^{2}nc_{2}q}{\beta},

where c_{2}=\min(\mathrm{deg}(u),\mathrm{deg}(v)).

Applying Bernstein’s Inequality (Theorem 1), we obtain

\mathbb{P}\left\{\frac{\sum_{i=1}^{q}X_{i}}{q}-w_{uv}\geq t\right\}\leq\exp\left(-\frac{(tq)^{2}}{2}\cdot\frac{1}{w_{uv}c_{1}q+\frac{c_{1}tq}{3}}\right).
Requiring this bound to be at most \frac{1}{mn} gives
\frac{(tq)^{2}}{2}\cdot\frac{1}{w_{uv}c_{1}q+\frac{c_{1}tq}{3}}\geq\log(nm)\equiv\epsilon_{1}.

Completing the square yields

\left(t-\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}\geq\frac{2\epsilon_{1}w_{uv}c_{1}}{q}+\left(\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}.

Hence, it suffices to choose

t\geq\frac{\epsilon_{1}c_{1}}{3q}+\sqrt{\frac{2\epsilon_{1}w_{uv}c_{1}}{q}+\left(\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}}.
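A small sketch of the resulting bound, as used later in Algorithm 5.1 to form w^{+}=w_{e}+t and w^{-}=\max(w_{e}-t,0); all numerical values passed in the example call are assumptions for illustration.

```python
import numpy as np

def theorem2_bound(w_uv, q, n, m, beta, deg_u, deg_v):
    c2 = min(deg_u, deg_v)
    c1 = w_uv * n * c2 / beta                        # |X_i| <= c1
    eps1 = np.log(n * m)                             # epsilon_1 = log(nm)
    term = eps1 * c1 / (3 * q)
    return term + np.sqrt(2 * eps1 * w_uv * c1 / q + term ** 2)

# assumed example values for one edge
t = theorem2_bound(w_uv=1.0, q=200, n=30, m=336, beta=1.0, deg_u=10, deg_v=12)
```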

Theorem 2 provides upper bounds on the edge weights of the sparse graph \bar{G} generated by Algorithm 5. Furthermore, it is necessary to establish bounds on the node degrees in the sparse graph \bar{G}. To derive these bounds, we introduce Theorem 3, which utilizes the established relationship between the eigenvalues of the Laplacian matrix and the node degrees, as discussed in Section 5.1.
Notation: Let G be a simple weighted graph, i.e., a graph without self-loops or multiple edges, with vertices \{v_{1},v_{2},\ldots,v_{n}\}. We define two subsets of vertices in G based on their degrees. The set l_{m} contains the n-m+1 vertices with the highest degrees, while s_{m} consists of the m vertices with the lowest degrees. The subgraphs G_{l_{m}} and G_{s_{m}} are defined as the subgraphs of G induced by the vertex sets l_{m} and s_{m}, respectively.

a_{ub} = upper bound on the maximal edge weight of the sparsified graph, \qquad (8)
\Delta(G_{l_{i}})_{ub} = upper bound on the maximum degree in G_{l_{i}},
\delta(G_{s_{i}})_{lb} = lower bound on the minimum degree in G_{s_{i}}.

Here d_{v_{1}}=d_{1}\leq d_{v_{2}}=d_{2}\leq\ldots\leq d_{v_{n}}=d_{n}.

5.1   Some known bounds on eigenvalues of the Laplacian matrix

Unweighted graphs ([31], Corollary 1): For simple unweighted graphs on n vertices with G\neq K_{n-m+1}+(m-1)K_{1} and \bar{G}\neq K_{m-1}+(n-m+1)K_{1}, we have the following relation:

d_{m}-n+m+1\leq\lambda_{m}(G)\leq d_{m-1}(G)+m-2.

Weighted graphs ([31], Corollary 2): Let G be a finite simple weighted graph on n vertices and denote by a the maximal weight of an edge in G; then

d_{m}(G)-\Delta(G_{l_{m}})\leq\lambda_{m}(G)\leq d_{m-1}(G)+(m-1)a-\delta(G_{s_{m-1}}). \qquad (9)

We present the following theorem, which establishes bounds on the degrees of the spectral sparsifier \bar{G} of G; these bounds are incorporated as constraints in the dynamic optimization step of our framework (Section 5.2).

Theorem 3.

Let \bar{G} be an \epsilon-spectral sparsifier of an undirected graph G with n vertices, where the eigenvalues of G satisfy \lambda_{1}\leq\lambda_{2}\leq\cdots\leq\lambda_{n}, with \lambda_{n} the largest eigenvalue. The following bounds on the degrees of \bar{G} must hold:

d_{2}(G)-\Delta(G_{l_{2}})(1-\epsilon)-a(G)\leq d_{1}(\bar{G})\leq d_{1}(G)(1+\epsilon),
(d_{i+1}(G)-\Delta(G_{l_{i+1}}))(1-\epsilon)-ia(\bar{G})+\delta(\bar{G}_{s_{i}})_{lb}\leq d_{i}(\bar{G})\leq(1+\epsilon)(d_{i-1}(G)+(i-1)a(G)-\delta(G_{s_{i-1}}))+\Delta(\bar{G}_{l_{i}})_{ub},

for i=2 to n-1, and

d_{n}(G)(1-\epsilon)\leq d_{n}(\bar{G})\leq(1+\epsilon)(d_{n-1}(G)+(n-1)a(G)-\delta(G_{s_{n-1}})).
Proof.

Since \bar{G} is an \epsilon-spectral approximation of G, the eigenvalues satisfy:

(1-\epsilon)\lambda_{i}(G)\leq\lambda_{i}(\bar{G})\leq(1+\epsilon)\lambda_{i}(G), \qquad (a)

for i=1,\dots,n.

From the spectral bounds on eigenvalues in Equation 9, we know:

\lambda_{i}(G)\leq d_{i-1}(G)+(i-1)a(G)-\delta(G_{s_{i-1}}),\quad\lambda_{i}(G)\geq d_{i}(G)-\Delta(G_{l_{i}}). \qquad (b)

Applying (a) to the upper bound in (b), we obtain:

d_{i}(\bar{G})\leq(1+\epsilon)\big(d_{i-1}(G)+(i-1)a(G)-\delta(G_{s_{i-1}})\big)+\Delta(\bar{G}_{l_{i}})_{ub}. \qquad (c)

Similarly, using the lower bound in (b), we have:

d_{i-1}(\bar{G})\geq\big(d_{i}(G)-\Delta(G_{l_{i}})\big)(1-\epsilon)-(i-1)a(\bar{G})+\delta(\bar{G}_{s_{i-1}})_{lb}. \qquad (d)

For the largest degree, we use the fact that \bar{G} preserves cuts in G, giving:

d_{n}(G)(1-\epsilon)\leq d_{n}(\bar{G})\leq(1+\epsilon)(d_{n-1}(G)+(n-1)a(G)-\delta(G_{s_{n-1}})). \qquad (e)

Hence, the degree bounds for all nodes in \bar{G} are established. ∎

Let \delta_{i}^{u} and \delta_{i}^{l} denote the upper and lower bounds on the degree of node i, respectively, as established in Theorem 3. Additionally, let \delta^{-}(i) represent the refined lower bound and \Delta^{+}(i) the refined upper bound on that degree:

d_{i}(\bar{G})\geq\max\left(\delta_{i}^{l},(1-\epsilon)\,d_{i}(G)\right)=\delta^{-}(i),
d_{i}(\bar{G})\leq\min\left(\delta_{i}^{u},(1+\epsilon)\,d_{i}(G)\right)=\Delta^{+}(i).

Algorithm 5.1 outlines the computation of the upper bound \Delta^{+} and the lower bound \delta^{-} on the node degrees in the sparsified graph \bar{G}.

Here, \Delta^{+}(i) denotes the upper bound on the maximum degree in the subgraph of \bar{G} induced by the vertices \{v_{i},v_{i+1},\dots,v_{n}\}, while \delta^{-}(i) represents the lower bound on the minimum degree in the subgraph induced by \{v_{1},v_{2},\dots,v_{n-i+1}\}. Furthermore, we define upper and lower bounds on graph G as \Delta(i)=\Delta(G_{l_{i}}) and \delta(i)=\delta(G_{s_{i}}).

———————————————————————————————————————
Algorithm 1
———————————————————————————————————————
Input : Graph G=(V,E,W), \epsilon
Output : \Delta^{+},\delta^{-},\Delta,\delta,w^{+},w^{-}
w^{+}=\{\};\;\;w^{-}=\{\}
for each edge e\in E(G) do
       Compute t using Theorem 2.
       Append w_{e}+t to w^{+}.
       Append \max(w_{e}-t,0) to w^{-}.
end for
node \leftarrow \{1,2,\ldots,n\}
degree \leftarrow \{d_{1},d_{2},\ldots,d_{n}\} via dfs(G) // O(V+E)
degree, degree1, degree^{+}, degree^{-} \leftarrow dictionary(\{v_{1},v_{2},\ldots,v_{n}\}, \{d_{v_{1}},d_{v_{2}},\ldots,d_{v_{n}}\}) (where d_{v_{1}}\leq d_{v_{2}}\leq\ldots\leq d_{v_{n}})
\Delta(1)=\max(\text{degree}),\;\delta(n)=\min(\text{degree}),\;a(G) (maximal weight)
\Delta(n)=0,\;\;\Delta^{+}(n)=0
\delta(1)=0,\;\;\delta^{-}(1)=0
\Delta^{+}(1)=\max(\text{degree}),\;\delta^{-}(n)=\min(\text{degree})
l_{2} \leftarrow node; l_{3} \leftarrow node
Remove node v_{1} from l_{2}
Remove node v_{n} from l_{3}
for i=2 to n-1 do
      for y\in\{v_{i+1},v_{i+2},\ldots,v_{n}\} do
             if (v_{i},y)\in E then
                   degree(y) = degree(y) - w(v_{i},y), degree(v_{i}) = degree(v_{i}) - w(v_{i},y)
                   degree^{+}(y) = degree^{+}(y) - w^{+}(v_{i},y), degree^{+}(v_{i}) = degree^{+}(v_{i}) - w^{+}(v_{i},y)
      end for
      \Delta(i) = \max_{x\in l_{2}}\text{degree}(x)
      \Delta^{+}(i) = \max_{x\in l_{2}}\text{degree}^{+}(x)
      Remove node v_{i} from l_{2}
      for y\in\{v_{1},v_{2},\ldots,v_{n-i}\} do
             if (v_{n-i+1},y)\in E then
                   degree1(y) = degree1(y) - w(v_{n-i+1},y), degree1(v_{n-i+1}) = degree1(v_{n-i+1}) - w(v_{n-i+1},y)
                   degree^{-}(y) = degree^{-}(y) - w^{-}(v_{n-i+1},y), degree^{-}(v_{n-i+1}) = degree^{-}(v_{n-i+1}) - w^{-}(v_{n-i+1},y)
      end for
      \delta(n-i+1) = \min_{x\in l_{3}}\text{degree1}(x)
      \delta^{-}(n-i+1) = \min_{x\in l_{3}}\text{degree}^{-}(x)
      Remove node v_{n-i+1} from l_{3}
end for

The outputs of Algorithm 5.1 (\Delta^{+},\delta^{-},w^{+},w^{-}) are incorporated as constraints in the dynamic optimization process (Constraints D3, D4, D5, D6). These constraints are specifically applied in the dynamic optimization of linear dynamical systems on graphs, as discussed in Section 5.2 and outlined in Section 4.

5.2   Dynamic Optimization for Linear Dynamical Systems on Graphs Using a POD-Based Surrogate Model

Real-time dynamic optimization problems are described in [32], [33], and [34]. Using snapshots of solutions from several trajectories (Section 2.2), we formulate the dynamic optimization problem in a ROM space [3], [35] using the projection matrix \rho\in\mathbb{R}^{k\times|V|} and mean \bar{x}, where k denotes the reduced dimension and k<n (Section 2.3). The cost function J (Eq. 10) in the dynamic optimization makes use of data points obtained from the ROM. The dynamic optimization framework for obtaining a sparse graph in the case of diffusion (\mathbb{D}), considering a single trajectory, is given below:

\underset{\bar{\gamma}}{\textbf{minimize}}\;\;J(\bar{z},\bar{\gamma})=\underbrace{\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{k}\big(z(i,j)(i\;\text{step}\;m_{1})-\bar{z}(i,j)(i\;\text{step}\;m_{1})\big)^{2}}_{A1}+\underbrace{\alpha\sum_{i=1}^{m}|\bar{\gamma}_{i}|}_{A2} \qquad (10)
subject   to
t_{n}\,N\,f_{a}(\bar{z}(i,1),\bar{z}(i,2),\ldots,\bar{z}(i,k),\bar{\gamma})=\begin{bmatrix}\bar{z}(i,1)\\ \bar{z}(i,2)\\ \vdots\\ \bar{z}(i,k)\end{bmatrix}-\begin{bmatrix}\bar{z}(i-1,1)\\ \bar{z}(i-1,2)\\ \vdots\\ \bar{z}(i-1,k)\end{bmatrix},\qquad i=1,2,\ldots,C
\{\bar{z}(0,1),\bar{z}(0,2),\ldots,\bar{z}(0,k)\}=\rho(F_{0}-\bar{F})
Q^{\prime}\bar{\gamma}\geq\delta^{-} \qquad (D3)
Q^{\prime}\bar{\gamma}\leq\Delta^{+} \qquad (D4)
w_{j}\bar{\gamma}_{j}\geq w^{-}_{j},\quad j=1,2,\ldots,m \qquad (D5)
w_{j}\bar{\gamma}_{j}\leq w^{+}_{j},\quad j=1,2,\ldots,m \qquad (D6)
\bar{\gamma}_{j}\geq 0,\quad j=1,2,\ldots,m \qquad (D7)

The term A1 in the objective function (Eq. 10) aims to minimize the discrepancy between the states obtained using the projected vector field with multipliers \bar{\gamma} and the known ROM solution obtained from the original graph. The term A2 enforces sparsity in \bar{\gamma} by penalizing nonzero elements. The optimization considers states at discrete increments of step size m_{1}. The matrix N is determined by the chosen collocation method (see [18]).

t_{n} denotes the step size. C denotes the number of collocation elements. z(i,k) represents the k-th entry of the state z in the i-th collocation element. F_{0}=\{F(0,1),F(0,2),\ldots,F(0,|V|)\} represents the initial condition used for generating data with the ROM z(i,j). \bar{F} is determined based on the procedure described in [3]. The first set of kC constraints stems from the orthogonal collocation method on finite elements (Eq. 5.2). Non-negativity of the multipliers \bar{\gamma}_{j} is imposed in constraint (D7). The constraints Q^{\prime}\bar{\gamma}\geq\delta^{-} (D3) and Q^{\prime}\bar{\gamma}\leq\Delta^{+} (D4) enforce connectivity levels as derived in Theorem 3. Here, Q^{\prime}=Q\,\mathrm{diag}(w), where Q represents the unsigned incidence matrix of the graph and w represents the weights of the given graph. The weight constraints w_{j}\bar{\gamma}_{j}\geq w^{-}_{j} (D5) and w_{j}\bar{\gamma}_{j}\leq w^{+}_{j} (D6) are based on Theorem 2. From the output of the dynamic optimization framework (z^{*},\bar{\gamma}^{*}), we prune out weights w^{*}_{j}\leq\epsilon_{1}, where \epsilon_{1} is a user-defined parameter and w^{*}_{j}=w_{j}\bar{\gamma}^{*}_{j}. The new weight vector is denoted as w_{1}^{*}. The sparse graph Laplacian L_{1}=B^{T}\text{diag}(w_{1}^{*})B replaces the standard Laplacian L in the filtering step to improve the efficiency of Laplacian-vector product computations.

Several key observations can be made about the dynamic optimization problem. The objective function J can be made differentiable by using the substitution \gamma=\gamma^{+}-\gamma^{-} with \|\gamma\|_{1}=\mathbf{1}^{T}\gamma^{+}+\mathbf{1}^{T}\gamma^{-}, where \gamma^{+} holds the positive entries of \gamma (0 elsewhere) and \gamma^{-} the magnitudes of its negative entries (0 elsewhere). The objective function is convex, and the constraints are continuous. To solve this optimization problem, we can employ the barrier method for constrained optimization, as discussed in [36] and [37].
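As a rough illustration of this optimization, the sketch below re-weights the edges with \bar{\gamma} and solves a simplified version of the problem with scipy.optimize.minimize (trust-constr) rather than a barrier method: the collocation constraints are replaced by an Euler-integrated ROM mismatch inside the objective, constraints D3-D7 appear as linear constraints and bounds, and since \bar{\gamma}\geq 0 the \ell_{1} term reduces to a plain sum. The function name, argument names, and matrix conventions (B the (m, n) signed incidence matrix, Q the (n, m) unsigned incidence matrix) are our assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def sparsify_linear(z_ref, rho, x_bar, B, Q, w, F0, h, alpha,
                    delta_lb, delta_ub, w_lo, w_hi):
    """z_ref: (T, k) ROM snapshots; rho: (k, n) POD projection; w: original edge weights."""
    m = w.size

    def rom_trajectory(gamma):
        L1 = B.T @ np.diag(w * gamma) @ B                 # Laplacian of the re-weighted graph
        z = rho @ (F0 - x_bar)
        traj = [z]
        for _ in range(len(z_ref) - 1):
            # Euler stand-in for the collocation constraints of the projected dynamics
            z = z + h * (rho @ (-L1 @ (rho.T @ z + x_bar)))
            traj.append(z)
        return np.array(traj)

    def J(gamma):
        misfit = 0.5 * np.sum((rom_trajectory(gamma) - z_ref) ** 2)   # term A1
        return misfit + alpha * np.sum(gamma)                         # A2 (gamma >= 0)

    Qp = Q @ np.diag(w)                                   # Q' = Q diag(w)
    cons = [LinearConstraint(Qp, delta_lb, delta_ub),     # D3, D4: degree bounds
            LinearConstraint(np.diag(w), w_lo, w_hi)]     # D5, D6: edge-weight bounds
    res = minimize(J, np.ones(m), method='trust-constr',
                   bounds=[(0.0, np.inf)] * m,            # D7: gamma >= 0
                   constraints=cons)
    return w * res.x                                      # candidate sparse edge weights
```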

5.3    Dynamic Optimization for Reaction-diffusion Dynamical Systems on Graphs Using a POD-Based Surrogate Model

In this section, we present the formulation of the dynamic optimization framework for a reaction-diffusion system. A general reaction-diffusion system is described in [21], where the activity at node j at time t is represented by an m-dimensional variable r_{j}(t). The temporal evolution of r_{j} follows the differential equation:

\frac{dr_{j}}{dt}=\mathcal{F}(r_{j})+K\sum_{k=1}^{N}A_{jk}\mathcal{G}(r_{k}-r_{j}),\qquad j=1,2,\ldots,N.

\mathcal{F} denotes the reaction component, while the remaining terms represent the diffusion process, with \mathcal{F}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{m} and \mathcal{G}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{m}. For brevity, we use the alternating self-dynamics Brusselator model (\mathbb{C}) discussed in Section 2. When the input graph is connected, we impose a minimum connectivity constraint to limit perturbations in the second smallest eigenvalue of the graph Laplacian matrix. The intuition behind this constraint is that for a connected graph, the lowest degree is greater than zero, making this constraint necessary. \tau_{L} is the minimum degree imposed on the sparse graph (Eq. D10). Non-negativity of weights is also imposed in the dynamic optimization problem (Eq. D11). While [35] discusses generating sparse graphs using snapshots of data at arbitrary time points, we utilize data points at collocation time points for sparsification. We consider the following dynamic optimization problem for a reaction-diffusion system, considering a single trajectory:

\underset{\bar{\gamma}}{\textbf{minimize}}\;\;J(\bar{z},\bar{\gamma})=\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{k}\big(z(i,j)(i\;\text{step}\;m_{1})-\bar{z}(i,j)(i\;\text{step}\;m_{1})\big)^{2}+\alpha\sum_{i=1}^{m}|\bar{\gamma}_{i}|
subject   to
t_{n}\,N\,f_{a}(\bar{z}(i,1),\bar{z}(i,2),\ldots,\bar{z}(i,k),\bar{\gamma})=\begin{bmatrix}\bar{z}(i,1)\\ \bar{z}(i,2)\\ \vdots\\ \bar{z}(i,k)\end{bmatrix}-\begin{bmatrix}\bar{z}(i-1,1)\\ \bar{z}(i-1,2)\\ \vdots\\ \bar{z}(i-1,k)\end{bmatrix},\qquad i=1,2,\ldots,C
\{\bar{z}(0,1),\bar{z}(0,2),\ldots,\bar{z}(0,k)\}=\rho(X_{0}-\bar{X})
Q^{\prime}\bar{\gamma}\geq\tau_{L} \qquad (D10)
\bar{\gamma}_{j}\geq 0,\quad j=1,2,\ldots,m \qquad (D11)

The initial condition X_{0}=\{x(0,1),x(0,2),\ldots,x(0,|V|),y(0,1),y(0,2),\ldots,y(0,|V|)\} is used to obtain data points z(i,j) using the ROM. \bar{X} is determined as described in [3]. The output of the dynamic optimization problem is (\bar{z}^{*},\bar{\gamma}^{*}). Then, w_{1}=\mathrm{diag}(W)\gamma^{*}. Elements in w_{1} less than \epsilon_{1} are set to 0, and w_{1} is updated. We update the solutions in the filtering step using this sparsified graph G_{1}=(V,E_{1},w_{1}) when the uncertainty value in the covariance matrix of the filter exceeds a threshold.
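For reference, a sketch of the graph Brusselator vector field of Eq. (1), which generates the snapshots z(i,j) fed into the optimization above after POD projection; the parameter values in the signature are illustrative assumptions.

```python
import numpy as np

def brusselator_rhs(x, y, L, a=1.0, b=3.0, c=1.0, d=1.0, Dx=0.1, Dy=0.05):
    """x, y: node activities (length N); L: graph Laplacian matrix."""
    dx = a - (b + d) * x + c * x ** 2 * y - Dx * (L @ x)   # dot{x}_i of Eq. (1)
    dy = b * x - c * x ** 2 * y - Dy * (L @ y)             # dot{y}_i of Eq. (1)
    return dx, dy
```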

6   Experimental Results

In this section, we present empirical results demonstrating the effectiveness of the proposed framework in reinforcing graph-based linear dynamical systems (\mathbb{D}) and nonlinear reaction-diffusion systems, specifically the chemical Brusselator model (\mathbb{C}), as described in Section 2.2. We evaluate the effectiveness of our framework by analyzing RMSE values under perturbations in initial conditions for surrogate models applied to both linear and reaction-diffusion systems on graphs. Additionally, we assess the impact of incorporating our framework into these models for both linear and non-linear dynamical systems. The influence of initial condition perturbations on neural ODE-based surrogate models, both with and without the framework discussed in Section 7, is also presented in Table 1.

Experiment                                         | Surrogate model RMSE | Surrogate model with Framework RMSE | ROM RMSE
Reaction-Diffusion with ROM surrogate model        | 0.59                 | 0.40                                | --
Linear Diffusion with ROM surrogate model          | 5.17                 | 0.38                                | --
Linear Diffusion with Neural ODE surrogate model   | 0.48                 | 0.29                                | 0.65
Table 1: Comparison of RMSE values across different experiments, analyzing the effects of input perturbations on surrogate models. The Reaction-Diffusion experiment is conducted on a 40-node Erdős-Rényi random graph using the chemical Brusselator dynamics (\mathbb{C}), as detailed in Section 2.2. The Linear Diffusion experiment is performed on a 30-node Erdős-Rényi random graph with the graph-based dynamical system (\mathbb{D}), employing the ROM (Section 2.3) as the surrogate model. For both the Linear Diffusion and Reaction-Diffusion experiments, we assess the robustness of the surrogate model with our framework under perturbations. In the Linear Diffusion experiment with neural ODEs (Section 7), we analyze the impact of perturbations on a neural-ODE surrogate model for the diffusion equation (\mathbb{D} in Section 2.2) on a 10-node complete graph. The surrogate model with our framework achieves the lowest RMSE across all experiments, demonstrating its effectiveness in improving surrogate model accuracy under perturbations.

Remark: In certain graph structures, empirical experiments revealed instances of particle filter divergence, necessitating additional updates in the filtering step. Particle filter divergence is a critical issue that must be carefully addressed, as it compromises estimation accuracy and reduces the framework’s effectiveness. Several factors can contribute to this phenomenon, including suboptimal filter tuning, modeling inaccuracies, inconsistent measurement data, or system-related issues such as hardware-induced delays. Specifically, inaccurate likelihood estimations due to imprecise noise assumptions, erroneous process models, or delayed measurement updates can lead to divergence. For further examples, see [38].

6.1   Linear dynamical system represented on graphs

We present the experimental results for the linear diffusion equation (\mathbb{D}) on a 30-node Erdős-Rényi random graph using the 4-point orthogonal collocation method with 20 elements. The parameters used in Algorithm 5.1 and Section 5.2 are as follows: \epsilon=0.5, T=0.15, k=\min(\lceil\frac{n}{5}\rceil,50), with two trajectories considered. The number of clusters is set to p=30, while the particle filter employs 15,000 particles with \epsilon_{1}=1\times 10^{-5}. The noise terms w_{k} and \mu_{k} are assumed to be normally distributed with zero mean and variances \zeta_{x}I and \zeta_{y}I, where \zeta_{x}=0.01 and \zeta_{y}=1\times 10^{-7}.

Following the dynamic optimization step, the resulting graph exhibited 31 sparse edges, a substantial reduction from the original graph’s 336 edges. During the filtering step, 193 updates were required for prediction over 1000 timesteps. Figure 3 compares the reduced-order model solution, the actual solution, and the reduced-order model solution with our framework for the linear dynamical system (\mathbb{D}) described in Section 2.2. Figure 3(c) shows that the ROM solution with our framework more closely resembles the actual solution in Figure 3(b) than the reduced-order model solution in Figure 3(a) does, as discussed in Section 2.3.

Figure 3: Comparison of the (a) ROM solution, (b) actual solution, and (c) ROM solution with our framework for the graph-based dynamical system \mathbb{D}, as described in Section 2.2.

6.2   Reaction-diffusion system represented on graphs

Figure 4: Comparing (a) ROM, (b) Actual, and (c) ROM solution with our framework for the case of the chemical Brusselator reaction-diffusion system (Eq. 1).

The results for the nonlinear case are based on the alternating self-dynamics Brusselator model applied to a 40-node Erdős-Rényi random graph, using the 4-point orthogonal collocation method. The number of trajectories is set to two, and the particle filter employs 15,000 particles. The total simulation time is set to T=3, with the number of clusters fixed at p=30. The sparsification parameter \epsilon_{1} is set to 10^{-5}, and the minimum connectivity constraint is defined as \tau_{L}=0.1\mathrm{d}_{s}, where \mathrm{d}_{s} denotes the minimum degree of the graph. The reduced-order dimension is given by k=\min(\lceil\frac{2n}{5}\rceil,50). The noise terms w_{k} and \mu_{k} are assumed to be normally distributed with zero mean and variances \zeta_{x}I and \zeta_{y}I, where \zeta_{x}=0.01 and \zeta_{y}=1\times 10^{-7}. The graph obtained from the dynamic optimization step contained 286 sparse edges, a reduction from the original graph’s 336 edges. The filtering step required a total of 50 updates for predictions over 100 timesteps.

Figure 4 presents a comparison of the actual solution of the chemical Brusselator reaction-diffusion system (\mathbb{C} in Section 2.2) for a randomly initialized condition, with both the reduced-order model (Section 2.3) and the ROM solution with our framework. The grid-based representation highlights regions marked in black, indicating areas where the absolute error of the particle filter solution is lower than that of the ROM solution.

7   Benchmarking the framework using neural ODEs

To broaden the applicability of our framework, we evaluate its effectiveness using a neural-ODE-based surrogate model, where state vectors are observed at discrete time intervals. Neural ordinary differential equations (neural ODEs) provide a powerful framework for modeling and analyzing complex dynamical systems [39, 40]. They offer a flexible approach to capturing system dynamics by integrating the principles of ordinary differential equations with neural networks. Unlike traditional neural networks that process inputs through discrete layers, neural ODEs represent system dynamics continuously, using an ordinary differential equation to govern the evolution of hidden states over time. Neural ODEs learn system dynamics by estimating the parameters of the differential equation with the adjoint sensitivity method ([23]). This approach allows the model to capture complex temporal dependencies in data. By leveraging the continuous-time nature of differential equations, neural ODEs provide key advantages, including the ability to model irregularly sampled time-series data and accommodate variable-length inputs.

To illustrate the framework, we consider a linear dynamical system (\mathbb{D}) from Section 2.2, defined as follows:

\frac{dx}{dt}=-Lx. \qquad (11)

The key distinction in applying our framework with the neural ODE model as the surrogate is the absence of the dynamic optimization step outlined in Section 4. Instead, an additional step is included to train the parameters of the neural ODE surrogate model, as described in Equation 13. The neural ODE solutions are treated as observations in the filtering step after projection (z_{k}), where the POD method is applied for dimensionality reduction. The forward state dynamics and the state observation relationship, as applied in Step 4 of the filtering process within the framework, are given by:

\begin{cases}x_{k+1}=M(x_{k})+A_{k+1}v_{k+1}+w_{k},\\ z_{k}=\rho(x_{k}-\bar{x})+R_{k}\beta_{k}+\mu_{k}.\end{cases} \qquad (12)

When utilizing the neural ODE surrogate model with an Euler discretization step \Delta t_{k}, the forward model is expressed as:

M(x_{k})=x_{k}+\Delta t_{k}\cdot nn(x_{k}).
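As an illustration, the forward model and the corresponding particle-propagation step of Equation 12 can be written as follows. This is a sketch under stated assumptions: `nn` is any trained vector field (such as the one above), and `A_next` and `v_next` stand for the quantities A_{k+1} and v_{k+1} computed in Step 3 of the framework, whose construction is not shown here.

```python
import numpy as np

def forward_model(x_k, dt_k, nn):
    """Euler-discretized neural-ODE surrogate: M(x_k) = x_k + dt_k * nn(x_k)."""
    return x_k + dt_k * nn(x_k)

def propagate_particle(x_k, dt_k, nn, A_next, v_next, zeta_x, rng):
    """One particle prediction from Eq. 12: x_{k+1} = M(x_k) + A_{k+1} v_{k+1} + w_k,
    with process noise w_k ~ N(0, zeta_x * I)."""
    w_k = rng.normal(0.0, np.sqrt(zeta_x), size=x_k.shape)
    return forward_model(x_k, dt_k, nn) + A_next @ v_next + w_k
```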

At each step, the vector \beta_{k+1} is updated by solving the following linear system:

R_{k+1}\beta_{k+1}=z_{k+1}-\rho(x_{k+1}-\bar{x}).
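In code, this update is a small linear solve. Since R_{k+1} need not be square, the sketch below solves the system in the least-squares sense (an implementation choice on our part), with `rho` denoting the projection matrix onto the reduced subspace and `x_bar` the snapshot mean.

```python
import numpy as np

def update_beta(R_next, z_next, x_next, rho, x_bar):
    """Solve R_{k+1} beta_{k+1} = z_{k+1} - rho (x_{k+1} - x_bar) for beta_{k+1}."""
    rhs = z_next - rho @ (x_next - x_bar)
    beta_next, *_ = np.linalg.lstsq(R_next, rhs, rcond=None)
    return beta_next
```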

The matrices A_{k+1} and R_{k+1} are computed as described in Step 3 of the framework (Section 4). The solution at time step k+1 is determined as:

x_{k+1}=x_{k}+\Delta t_{k}\,nn(x_{k}),

where \Delta t_{k}=t_{k+1}-t_{k}. The vector v_{k} is initialized as \mathbf{0}_{n\times 1}. The neural network architecture is structured as follows:

nn(x(t))=\sinh\left(\theta_{3}\theta_{2}\theta_{1}x(t)+\theta_{3}\theta_{2}b_{1}+b_{3}\right),
where \theta_{1}\in\mathbb{R}^{50\times 10},\ \theta_{2}\in\mathbb{R}^{50\times 50},\ \theta_{3}\in\mathbb{R}^{10\times 50},\ b_{1}\in\mathbb{R}^{50},\ b_{3}\in\mathbb{R}^{10}.
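A direct transcription of this architecture is given below; the random initialization is purely illustrative, and in the experiments the parameters are obtained by minimizing the cost function in Eq. 13.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shapes chosen so that theta3 @ theta2 @ theta1 maps R^10 to R^10.
theta1 = 0.1 * rng.standard_normal((50, 10))
theta2 = 0.1 * rng.standard_normal((50, 50))
theta3 = 0.1 * rng.standard_normal((10, 50))
b1 = np.zeros(50)
b3 = np.zeros(10)

def nn(x):
    """nn(x) = sinh(theta3 theta2 theta1 x + theta3 theta2 b1 + b3)."""
    return np.sinh(theta3 @ (theta2 @ (theta1 @ x + b1)) + b3)

# Flattened parameter vector: 500 + 2500 + 500 + 50 + 10 = 3560 entries.
theta = np.concatenate([p.ravel() for p in (theta1, theta2, theta3, b1, b3)])
assert theta.size == 3560
```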

The experiment involved 41 updates, with predictions performed over 100 timesteps. The number of particles in the particle filter step was set to 20,000. The noise terms w_{k} and \mu_{k} were assumed to follow normal distributions with zero mean and variances \zeta_{x}I and \zeta_{y}I, where \zeta_{x}=0.01 and \zeta_{y}=1\times 10^{-3}. The Laplacian matrix L was chosen as the Laplacian of the complete graph \mathbb{K}_{10}, and N=30 observations were randomly sampled over the time interval t=0 to t=0.05.

The cost function J (Eq. 13) for estimating the parameters of the neural ODE model is given by:

J(\theta)=\frac{1}{2}\sum_{k=1}^{N}\|x_{k}-\tilde{x}_{k}\|_{2}^{2}+\alpha_{1}\|\theta\|_{1}.   (13)

The parameter vector \theta, obtained by flattening \theta_{1},\theta_{2},\theta_{3},b_{1}, and b_{3}, is represented as \theta\in\mathbb{R}^{3560\times 1}. For experimentation, the parameters are estimated by minimizing Eq. 13 using the adjoint sensitivity method with an Euler discretization scheme. The number of clusters is fixed at 40.
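As a reference for the objective being minimized, the sketch below evaluates J(\theta) from Eq. 13 given the neural-ODE predictions and the reference snapshots at the N observation times. It does not implement the adjoint sensitivity computation itself; the array names are our own.

```python
import numpy as np

def cost_J(theta, x_pred, x_ref, alpha1):
    """J(theta) = 0.5 * sum_k ||x_k - x_tilde_k||_2^2 + alpha1 * ||theta||_1 (Eq. 13).

    x_pred, x_ref : arrays of shape (N, n) holding the neural-ODE predictions and the
    reference snapshots at the N observation times; theta is the flattened parameter
    vector; alpha1 weights the sparsity-promoting L1 penalty.
    """
    data_misfit = 0.5 * np.sum((x_pred - x_ref) ** 2)
    l1_penalty = alpha1 * np.sum(np.abs(theta))
    return data_misfit + l1_penalty
```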

Figure 5: Comparison of (a) the actual solution, (b) the neural-ODE solution, (c) the neural-ODE solution with our framework, and (d) the ROM solution for a random initial condition in the case of a linear dynamical system on graphs (\mathbb{D} in Section 2.2).
(a) Comparison of the absolute errors of the neural-ODE solution with our framework and the standalone neural-ODE solution. Black regions indicate grid points where the absolute error of the neural-ODE solution with our framework is lower than that of the standalone neural-ODE solution.
(b) Comparison of the absolute errors of the neural-ODE solution with our framework and the ROM solution. Black regions indicate grid points where the absolute error of the neural-ODE solution with our framework is lower than that of the ROM solution.
Figure 6: Comparison of absolute errors between the neural-ODE surrogate model with our framework, the standalone neural-ODE model, and the ROM solution. Black regions indicate grid points where the solution given by the framework achieves a lower absolute error than the other methods.

For a random initial condition, Figure 5 compares the actual solution with the standalone neural-ODE solution, the neural-ODE solution with our framework, and the ROM solution. The standalone neural-ODE solution (Figure 5(b)) exhibits higher noise levels than the actual solution (Figure 5(a)). Figure 6 contrasts the errors of the neural-ODE solution with our framework (Figure 5(c)) against those of the standalone neural-ODE solution (Figure 5(b)) and the ROM solution (Figure 5(d)) for the linear dynamical system (Equation 11). In the grid-wise comparison, the black regions in Figures 6(a) and 6(b) indicate areas where the error of the neural-ODE solution with our framework is lower than that of the standalone neural-ODE and ROM solutions, respectively. Notably, the neural-ODE solution with our framework exhibits a greater number of black regions, suggesting a closer resemblance to the actual solution than either the ROM or the standalone neural-ODE solution.

8   Conclusion and Discussion

In this study, we proposed a novel framework to improve the robustness of surrogate models against input perturbations, addressing a key challenge in modeling complex systems while ensuring computational efficiency. By integrating machine learning techniques with stochastic filtering, our approach demonstrates significant improvements in surrogate model accuracy under perturbations. The experimental results presented in the preceding sections provide strong validation of the framework’s effectiveness.

The versatility of our framework is evident from its application to dynamical systems represented on graphs and its extension to a general setting, where a neural-ODE-based surrogate model was employed to simulate complex physical phenomena. This not only highlights the robustness of our approach but also underscores its applicability across diverse domains, as discussed in Section 7.

Despite its strengths, our framework has certain limitations. Notably, its reliance on stochastic filtering methods introduces computational overhead and may also be susceptible to filter divergence in certain cases. While our study focuses on undirected graphs, many real-world complex systems involve directed networks, such as those governed by the complex Ginzburg-Landau equation [21]. Addressing directed graph structures presents an important avenue for future research, with potential extensions of our framework to handle these more intricate dynamical systems.

Acknowledgments

We thank Prof. S. Lakshmivarahan and Prof. Arun Tangirala for their insightful feedback, contributing to the enhancement of this work. We thank the anonymous reviewers for their insightful comments, which significantly improved the quality of the manuscript. This work was partially supported by the MATRIX grant MTR/2020/000186 of the Science and Engineering Research Board of India.

References

  • [1] Dirk Brockmann and Dirk Helbing. The hidden geometry of complex, network-driven contagion phenomena. Science, 342(6164):1337–1342, 2013.
  • [2] Sergei Maslov and I. Ispolatov. Propagation of large concentration changes in reversible protein-binding networks. Proceedings of the National Academy of Sciences, 104(34):13655–13660, 2007.
  • [3] Muruhan Rathinam and Linda R. Petzold. A new look at proper orthogonal decomposition. SIAM Journal on Numerical Analysis, 41(5):1893–1925, 2003.
  • [4] Suhan Kim and Hyunseong Shin. Data-driven multiscale finite-element method using deep neural network combined with proper orthogonal decomposition. Engineering with Computers, 40, 2023.
  • [5] B.A. Le, Julien Yvonnet, and Qi-Chang He. Computational homogenization of nonlinear elastic materials using neural networks. International Journal for Numerical Methods in Engineering, 104(12):1061–1084, 2015.
  • [6] Yanwen Huang and Yuanchang Deng. A hybrid model utilizing principal component analysis and artificial neural networks for driving drowsiness detection. Applied Sciences, 12(12), 2022.
  • [7] Redouane Lguensat, Pierre Tandeo, Pierre Ailliot, Manuel Pulido, and Ronan Fablet. The analog data assimilation. Monthly Weather Review, 145(10):4093–4107, 2017.
  • [8] D.C. Park and Yan Zhu. Bilinear recurrent neural network. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), 1994.
  • [9] D.-C. Park. A time series data prediction scheme using bilinear recurrent neural network. In 2010 International Conference on Information Science and Applications, pages 1–7, Seoul, Korea (South), 2010. IEEE.
  • [10] Julien Brajard, Alberto Carrassi, Marc Bocquet, and Laurent Bertino. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model. Journal of Computational Science, 44:101171, 2020.
  • [11] Peter D. Dueben and Peter Bauer. Challenges and design choices for global weather and climate models based on machine learning. Geoscientific Model Development, 11(10):3999–4009, 2018.
  • [12] Ronan Fablet, Souhaib Ouala, and Cédric Herzet. Bilinear residual neural network for the identification and forecasting of geophysical dynamics. In 2018 26th European Signal Processing Conference (EUSIPCO), pages 1477–1481, Rome, 2018. IEEE.
  • [13] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-net: Learning PDEs from data. In Proceedings of the 35th International Conference on Machine Learning, page 9, 2018.
  • [14] Marc Bocquet, Jérémie Brajard, Alberto Carrassi, and Laurent Bertino. Data assimilation as a learning tool to infer ordinary differential equation representations of dynamical models. Nonlinear Processes in Geophysics, 26(3):143–162, 2019.
  • [15] Marc Bocquet, Jérémie Brajard, Alberto Carrassi, and Laurent Bertino. Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Foundations of Data Science, 2(1):55–80, 2020.
  • [16] Pavel Sakov, Jean-Michel Haussaire, and Marc Bocquet. An iterative ensemble Kalman filter in the presence of additive model error. Quarterly Journal of the Royal Meteorological Society, 144(713):1297–1309, 2018.
  • [17] Alban Farchi, Patrick Laloyaux, Massimo Bonavita, and Marc Bocquet. Using machine learning to correct model error in data assimilation and forecast applications. Quarterly Journal of the Royal Meteorological Society, 147(739):3067–3084, jul 2021.
  • [18] John D. Hedengren, Reza Asgharzadeh Shishavan, Kody M. Powell, and Thomas F. Edgar. Nonlinear modeling, estimation and predictive control in APMonitor. Computers & Chemical Engineering, 70:133–148, 2014. Manfred Morari Special Issue.
  • [19] Lorenz T. Biegler. Nonlinear Programming. Society for Industrial and Applied Mathematics, 2010.
  • [20] P. T. Landsberg. The fourth law of thermodynamics. Nature, 238(5361):229–231, 1972.
  • [21] Giulia Cencetti, Pau Clusella, and Duccio Fanelli. Pattern invariance for reaction-diffusion systems on complex networks. Scientific Reports, 8(1), 2018.
  • [22] Chittaranjan Hens, Uzi Harush, Simi Haber, Reuven Cohen, and Baruch Barzel. Spatiotemporal signal propagation in complex networks. Nature Physics, 15(4):403–412, 2019.
  • [23] John M. Lewis, Sivaramakrishnan Lakshmivarahan, and Sudarshan Kumar Dhall. Dynamic Data Assimilation: A least squares approach. Cambridge Univ. Press, 2009.
  • [24] Tiancheng Li, Miodrag Bolic, and Petar M. Djuric. Resampling methods for particle filtering: Classification, implementation, and strategies. IEEE Signal Processing Magazine, 32(3):70–86, 2015.
  • [25] Fan R. K. Chung. Spectral Graph Theory. American Mathematical Society, Providence, RI, 1997.
  • [26] Richard L. Burden, J. Douglas Faires, and Annette M. Burden. Numerical analysis. Cengage Learning, 2016.
  • [27] Ming-Jun Lai, Jiaxin Xie, and Zhiqiang Xu. Graph sparsification by universal greedy algorithms, 2021. https://doi.org/10.48550/arXiv.2007.07161.
  • [28] Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.
  • [29] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances, 2009.
  • [30] Stéphane Boucheron, Gábor Lugosi, and Olivier Bousquet. Concentration inequalities, 2013.
  • [31] Miriam Farber and Ido Kaminer. Upper bound for the Laplacian eigenvalues of a graph, 2011. https://doi.org/10.48550/arXiv.1106.0769.
  • [32] Martin Grötschel, Sven Krumke, and Jörg Rambau. Online Optimization of Large Scale Systems. 2001.
  • [33] R. Donald Bartusiak. NLMPC: A platform for optimal control of feed- or product-flexible manufacturing. Assessment and Future Directions of Nonlinear Model Predictive Control, pages 367–381.
  • [34] Zoltan K. Nagy, Bernd Mahn, Rüdiger Franke, and Frank Allgöwer. Real-Time Implementation of Nonlinear Model Predictive Control of Batch Processes in an Industrial Framework, pages 465–472. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
  • [35] Abhishek Ajayakumar and Soumyendu Raha. Data assimilation for sparsification of reaction diffusion systems in a complex network, 2023. https://doi.org/10.48550/arXiv.2303.11943.
  • [36] Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer, 2006.
  • [37] David G. Luenberger and Yinyu Ye. Linear and nonlinear programming. Springer, 2021.
  • [38] Jeroen Elfring, Elena Torta, and Rob van de Molengraft. Particle filters: A hands-on tutorial. Sensors (Basel, Switzerland), 21(2):438, 2021.
  • [39] Wai Shing Fung and Nicholas J. A. Harvey. Graph sparsification by edge-connectivity and random spanning trees. CoRR, abs/1005.0265, 2010.
  • [40] Hanshu Yan, Jiawei Du, Vincent Y. F. Tan, and Jiashi Feng. On robustness of neural ordinary differential equations, 2022.