
Improving Surrogate Model Robustness to Perturbations for Dynamical Systems Through Machine Learning and Data Assimilation

Abhishek Ajayakumar Soumyendu Raha
Abstract

Many real-world systems are modelled using complex ordinary differential equations (ODEs). However, the dimensionality of these systems can make them challenging to analyze. Dimensionality reduction techniques like Proper Orthogonal Decomposition (POD) can be used in such cases. However, these reduced-order models are susceptible to perturbations in the input. We propose a novel framework that combines machine learning and data assimilation techniques to improve the ability of surrogate models to handle perturbations in input data effectively. Through rigorous experiments on dynamical systems modelled on graphs, we demonstrate that our framework substantially improves the accuracy of surrogate models under input perturbations. Furthermore, we evaluate the framework’s efficacy on alternative surrogate models, including neural ODEs, and the empirical results consistently show enhanced performance.

Keywords: Surrogate model \cdot Machine learning \cdot Dimensionality reduction \cdot Reaction-diffusion system

1   Introduction

Complex networks provide valuable insights into real-world systems, such as the spread of epidemics [1] and the understanding of biochemical and neuronal processes [2]. However, modeling these systems becomes computationally complex when dealing with large state spaces. To address this challenge, methods like Proper Orthogonal Decomposition (POD) ([3], [4], [5], [6]) are employed to capture patterns while operating in reduced dimensions. While dimensionality reduction techniques like POD are effective, their sensitivity to data necessitates robust surrogate models that can maintain accuracy under input perturbations. Consequently, the question arises regarding how to enhance a surrogate model to accommodate perturbed data sets. This study proposes a framework designed to improve the robustness of surrogate models, ensuring their reliability under varying input conditions.

Surrogate models serve as approximations of complex real-world systems, which are often modeled using ordinary differential equations (ODEs) or partial differential equations (PDEs). In many cases, the complexity of these systems makes it difficult to fully capture their underlying processes. To address the challenges of modeling high-dimensional and chaotic systems, several machine learning (ML) approaches have been proposed to construct surrogate models that provide accurate and computationally efficient approximations. These include analog models [7], recurrent neural networks (RNNs) [8, 9], residual neural networks that approximate resolvents [10, 11], and differential equation-based models such as in [12, 13, 14, 15]. A notable study [10] integrates machine learning and data assimilation (DA) to improve predictions in scenarios with sparse observations. However, the DA step is computationally expensive, limiting its practical applicability. To address model inaccuracies, researchers have explored alternative approaches such as weak-constraint DA methods [16] and ML-DA frameworks [17].

The contributions of our paper can be summarized as follows:

  1.

    We introduce a novel framework (Figure 1) that improves the robustness of surrogate models against input perturbations by integrating data assimilation and machine learning techniques.

  2.

    We conduct extensive experiments on dynamical systems represented as graphs, specifically the diffusion system (\mathbb{D}) and the chemical Brusselator model (\mathbb{C}) discussed in Section 2. For dynamical systems on graphs, we propose a dynamic optimization step, described in Sections 5.2 and 5.3, to obtain a sparse graph and reduce memory complexity.

    (a)

      For the diffusion equation (\mathbb{D}) on graphs (Section 2.2), we derive conditions (Theorem 3) to guide the constraints in our dynamic optimization step.

    (b)

      Section 5.3 presents the dynamic optimization step for reaction-diffusion systems on graphs, detailing the required constraints.

  3.

    Section 6 presents experimental results for the framework on both linear and non-linear dynamical systems represented on graphs, using the POD-based surrogate model. Empirical results in Table 1 demonstrate that our framework enhances surrogate model performance under input perturbations.

  4.

    In Section 7, we demonstrate how our framework can be integrated into a general setting to reinforce neural ODE-based surrogate models trained on data-driven systems. The improved models exhibit enhanced performance and robustness against input perturbations, outperforming both POD-based models and the original neural ODE solutions, as shown in Figures 3, 4, and 5.

2   Background

We present the fundamental tools and techniques essential for understanding the methodologies discussed in this paper.

2.1   Orthogonal collocation method

The orthogonal collocation method is a well-established numerical technique for solving differential equations, particularly in dynamic optimization problems. It provides a versatile approach applicable to both ordinary and partial differential equations [18, 19].

The method entails partitioning the problem domain into a discrete set of collocation points, ensuring that the polynomial approximation of the vector field f(t,z(t)) satisfies the orthogonality condition. Common choices for collocation points include Legendre, Chebyshev, and Gaussian points.

The method enforces the differential equations at the collocation points, resulting in a system of algebraic equations that can be solved using numerical techniques such as Newton’s method or direct solvers. The choice of collocation points and basis functions significantly impacts the method’s accuracy and efficiency. To illustrate the method, we provide an example using the Lotka-Volterra system.

\frac{dx}{dt} = \alpha x - \beta xy
\frac{dy}{dt} = \delta xy - \gamma y.

Here x and y denote the prey and predator populations. The parameters \alpha,\beta determine the prey growth rate and \delta,\gamma determine the predator growth rate. If we consider a single collocation element with two nodes per element, the orthogonal collocation method [18] solves the following system of equations to determine the prey and predator populations at time t_{s}, denoted (x_{t_{s}},y_{t_{s}}).

t_{s}N_{2\times 2}\begin{pmatrix}\alpha x_{t_{s}}-\beta x_{t_{s}}y_{t_{s}}\\ \delta x_{t_{s}}y_{t_{s}}-\gamma y_{t_{s}}\end{pmatrix}=\begin{pmatrix}x_{t_{s}}\\ y_{t_{s}}\end{pmatrix}-\begin{pmatrix}x_{t_{0}}\\ y_{t_{0}}\end{pmatrix},\qquad N_{2\times 2}=\begin{pmatrix}0.75&-0.25\\ 1&0\end{pmatrix}.
This method is applied in the dynamic optimization step of our framework (Section 4).
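For concreteness, the sketch below solves exactly the single-element, two-node collocation system displayed above with scipy.optimize.fsolve. The Lotka-Volterra parameter values, the collocation time t_s, and the initial populations are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.optimize import fsolve

alpha, beta, delta, gamma = 1.1, 0.4, 0.1, 0.4   # assumed Lotka-Volterra parameters
t_s = 0.1                                        # assumed collocation time
u0 = np.array([10.0, 5.0])                       # assumed initial populations (x_{t_0}, y_{t_0})
N = np.array([[0.75, -0.25],
              [1.0, 0.0]])                       # collocation matrix N_{2x2} from the text

def residual(u):
    x, y = u
    f = np.array([alpha * x - beta * x * y,      # prey equation evaluated at t_s
                  delta * x * y - gamma * y])    # predator equation evaluated at t_s
    return t_s * (N @ f) - (u - u0)              # enforce t_s N f(u) = u - u0

x_ts, y_ts = fsolve(residual, u0)
print(x_ts, y_ts)
```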

2.2   Spatio temporal propagation in graphs

Spatiotemporal propagation in complex networks has been widely studied across disciplines such as physics, biology, and neuroscience. Complex networks comprise interconnected nodes whose interactions give rise to dynamic processes. Understanding the mechanisms governing information, signal, or dynamic propagation within these networks is crucial for analyzing their complex behaviors and emergent properties.

Spatiotemporal propagation describes the transmission or diffusion of information, signals, or dynamics across interconnected nodes and edges in a complex network. It examines how localized events, disturbances, or modifications in one region of the network evolve and influence other nodes or areas. The following examples highlight key applications of complex networks.

  1.

    Diffusion Processes (\mathbb{D}): The study of diffusion phenomena, including heat transfer and molecular diffusion, aids in modeling and optimizing processes involving transport and dispersion. By considering the normalized Laplacian matrix of the graph, the heat equation can be represented as an equivalent discrete dynamical system. The normalized Laplacian matrix is \mathcal{L}=D^{-1/2}LD^{-1/2}, where D is the diagonal matrix of degrees and L=D-A is the Laplacian matrix of the graph. A code sketch of this system appears after this list.

    \frac{dF}{dt}=-\mathcal{L}F.

    Here F\in\mathbb{R}^{n}, and F(x,t) denotes the temperature at node x and time t.

  2.

    Chemical Brusselator model (\mathbb{C}): Introduced in 1971, the chemical Brusselator model exemplifies an autocatalytic chemical reaction system [20, 21]. Its dynamics are governed by the equations:

    \left\{\begin{aligned}\dot{x}_{i}&=a-(b+d)x_{i}+c\,x_{i}^{2}y_{i}-D_{x}\sum_{j}L_{ij}x_{j}\\ \dot{y}_{i}&=bx_{i}-c\,x_{i}^{2}y_{i}-D_{y}\sum_{j}L_{ij}y_{j}\end{aligned}\right. \qquad (1)
  3.

    Epidemic Spread (\mathbb{E}): Understanding how infectious diseases propagate through social networks can aid in designing effective strategies for disease control, information dissemination, and opinion formation. The SIS (susceptible-infected-susceptible) model, used to study epidemic spreading, is described as follows:

    \frac{dx_{i}}{dt}=-x_{i}+\sum_{j=1}^{N}A_{ij}(1-x_{i})x_{j}.

    A_{ij} represents the (i,j)-th entry of the adjacency matrix of the graph, and N denotes the number of nodes.

  4.

    Neural Dynamics (\mathbb{N}): The study of neural activity propagation in brain networks provides insights into cognition and neurological disorders. One such system, described in [22], follows the equation:

    \frac{dx_{i}}{dt}=-Bx_{i}+C\tanh{x_{i}}+\sum_{j=1}^{N}A_{ij}\tanh{x_{j}}.
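To make the diffusion system (\mathbb{D}) referenced in item 1 concrete, the sketch below builds the normalized Laplacian of an assumed Erdős-Rényi graph with networkx and integrates dF/dt = -\mathcal{L}F with forward Euler; the graph, step size, and horizon are illustrative assumptions.

```python
import numpy as np
import networkx as nx

G = nx.erdos_renyi_graph(n=30, p=0.3, seed=0)      # assumed example graph
A = nx.to_numpy_array(G)
d = A.sum(axis=1)
assert d.min() > 0, "sketch assumes no isolated nodes"
L = np.diag(d) - A                                 # combinatorial Laplacian L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt               # normalized Laplacian

h, steps = 1e-3, 1000                              # assumed Euler step size and horizon
F = np.random.rand(A.shape[0])                     # random initial temperatures F(x, 0)
snapshots = [F.copy()]
for _ in range(steps):
    F = F - h * (L_norm @ F)                       # Euler step of dF/dt = -L F
    snapshots.append(F.copy())
snapshots = np.asarray(snapshots)                  # trajectory snapshots, e.g. for POD
```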

2.3   Proper Orthogonal Decomposition for Dynamical systems

This section provides a concise overview of the application of Proper Orthogonal Decomposition (POD) to initial value problems in dynamical systems [3]. Given a dataset \mathcal{D} consisting of a collection of points x^{c}_{i}, where x^{c}_{i}\in\mathbb{R}^{n} denotes the state of the system at time t_{i} for a particular trajectory c, the POD method seeks a subspace S\subset\mathbb{R}^{n} such that

||\mathcal{D}-\rho_{S}\mathcal{D}||^{2}

is minimized. Here \rho_{S} is the orthogonal projection onto the subspace S, and \rho_{S}\mathcal{D} denotes the projected data set.

S\subset\mathbb{R}^{n} is the best k-dimensional approximating affine subspace, with the projection matrix \rho consisting of the leading k eigenvectors of the covariance matrix \bar{R}. The subspace for a dataset \mathcal{D} is uniquely determined by the projection matrix P=\rho^{T}\rho and the mean \bar{x}.

\bar{R}=\sum_{c=1}^{N_{T}}\int_{0}^{T}(x^{c}(t)-\bar{x})(x^{c}(t)-\bar{x})^{T}\,dt,
\bar{x}=\frac{1}{N_{T}T}\sum_{c=1}^{N_{T}}\int_{0}^{T}x^{c}(t)\,dt.

N_{T} denotes the number of trajectories.

An asymptotic and sensitivity analysis of POD is presented in [3]. Consider a dynamical system in \mathbb{R}^{n} governed by a vector field f:

\dot{x}=f(x,t).

The reduced-order model (ROM) vector field is defined as:

\dot{z}=\rho f(\rho^{T}z+\bar{x},t)=f_{a}(z,t).

Thus, an initial value problem for the system \dot{x}=f(x,t) with x(0)=x_{0} using the projection method is given by

\dot{\hat{x}}=Pf(\hat{x},t);\quad\hat{x}(0)=\hat{x}_{0}=P(x_{0}-\bar{x})+\bar{x}. \qquad (2)

Here, \hat{x}_{0} is the projection of x_{0} onto the subspace S. The sensitivity of the POD projection P to changes in the dataset \mathcal{D} is quantified by the following proposition from [3]:
Proposition (Rathinam and Petzold [3]): Consider applying POD to a data set \mathcal{D} to find the best approximating k(<n)-dimensional subspace. Let the ordered eigenvalues of the covariance matrix of the data \mathcal{D} be given by \tilde{\lambda}_{1}\geq\cdots\geq\tilde{\lambda}_{n}. Suppose \tilde{\lambda}_{k}>\tilde{\lambda}_{k+1}, which ensures that P(\mathcal{D}) is well defined. Then

S_{k}(\mathcal{D})=\max_{i\leq k,\,j\leq n-k}\sqrt{2}\;\frac{\sqrt{\tilde{\lambda}_{i}+\tilde{\lambda}_{j+k}}}{\tilde{\lambda}_{i}-\tilde{\lambda}_{j+k}}\sqrt{\tilde{\lambda}_{1}+\cdots+\tilde{\lambda}_{n}}\geq\sqrt{2}. \qquad (3)
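As a minimal illustration of this section, the sketch below computes the POD mean, the projection \rho, and the sensitivity S_{k} of Eq. (3) from discrete snapshots, replacing the time integrals by sums over sampled states; the function names are ours, and strict separation of the eigenvalues involved is assumed.

```python
import numpy as np

def pod(snapshots, k):
    """snapshots: (T, n) array of states; returns mean, rho (k x n), sorted eigenvalues."""
    x_bar = snapshots.mean(axis=0)
    X = snapshots - x_bar
    R_bar = X.T @ X                          # discrete covariance (up to a constant factor)
    lam, V = np.linalg.eigh(R_bar)
    lam, V = lam[::-1], V[:, ::-1]           # eigenvalues in decreasing order
    rho = V[:, :k].T                         # leading k eigenvectors as rows
    return x_bar, rho, lam

def pod_sensitivity(lam, k):
    """S_k from Eq. (3); lam must be sorted decreasingly with lam[k-1] > lam[k]."""
    total = np.sqrt(lam.sum())
    n = len(lam)
    best = 0.0
    for i in range(k):
        for j in range(n - k):
            num = np.sqrt(lam[i] + lam[j + k])
            den = lam[i] - lam[j + k]        # assumed nonzero
            best = max(best, np.sqrt(2) * num / den * total)
    return best
```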

2.4   Filtering

This section introduces the filtering problem and outlines the key steps in the filtering framework. For further details on the concept of filtering, see [23, 24]. Rooted in Bayesian inference, optimal filtering estimates the state of time-varying systems under noisy measurements. Its objective is to achieve statistically optimal estimation of the system state. Optimal filtering follows the Bayesian framework for state estimation, integrating statistical optimization with Bayesian reasoning to improve applications in signal processing, control systems, and sensor networks.

In Bayesian optimal filtering, the system state consists of dynamic variables such as position, velocity, orientation, and angular velocity. Due to measurement noise, observations do not yield deterministic values but rather a distribution of possible states, introducing uncertainty. The system state evolution is modeled as a dynamic system with process noise capturing inherent uncertainties in system dynamics. While the underlying system is often deterministic, stochasticity is introduced to represent model uncertainties.

The system’s state evolves according to:

x_{k+1}=M(x_{k})+w_{k+1},

where x_{k}\in\mathbb{R}^{n} is the system state at time t_{k}, and w_{k+1}\in\mathbb{R}^{n} represents the model error. The observations are given by:

z_{k}=h(x_{k})+v_{k},

where v_{k} represents the observation noise.

The filtering step estimates the state x_{k+1} at time t_{k+1} based on observations up to t_{k+1}.
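The sketch below is a minimal bootstrap particle filter for the generic model above, assuming Gaussian process and observation noise; the forward model M, observation map h, and noise scales are placeholders supplied by the user, and the simplest multinomial resampling scheme is used. It is a sketch of the filtering idea, not the specific filter configuration used later in the paper.

```python
import numpy as np

def particle_filter(z_seq, M, h, n_particles, x0, sigma_x, sigma_y, rng=None):
    rng = rng or np.random.default_rng(0)
    n = x0.shape[0]
    particles = x0 + sigma_x * rng.standard_normal((n_particles, n))
    estimates = []
    for z in z_seq:
        # propagate each particle through the forward model plus process noise
        particles = np.array([M(p) for p in particles])
        particles += sigma_x * rng.standard_normal(particles.shape)
        # weight particles by the Gaussian observation likelihood
        resid = np.array([z - h(p) for p in particles])
        logw = -0.5 * np.sum(resid ** 2, axis=1) / sigma_y ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates.append(w @ particles)          # posterior-mean state estimate
        # multinomial resampling
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)
```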

3   Problem statement

Surrogate modeling techniques, such as the reduced-order model (ROM) discussed in Section 2.3, provide computational efficiency but are highly sensitive to input variations, as indicated in Equation 3. Consequently, when initialized with a new random initial condition, the surrogate model M(\cdot) efficiently approximates the system trajectory but may fail to capture deviations from the true state.
The surrogate model is used to obtain compressed representations of state vectors at various timesteps, which are treated as noisy observations. The observations are compressed for better memory efficiency if the model operates on the state dimension, as in the neural ODE surrogate model (Section 7). Let h:\mathbb{R}^{n}\rightarrow\mathbb{R}^{k} (where k<n) denote the state observation relationship. e_{mh}(t_{k+1}) and e_{ml}(t_{k+1}) denote the high- and low-wavelength components of error, capturing deviations between the true state and the state obtained from the surrogate model M(x_{k}) at time t_{k+1}. e_{oh}(t_{k}) and e_{ol}(t_{k}) denote the high- and low-wavelength components of error that capture the difference between the observation and the state at time t_{k}. x_{k+1} denotes the true state of the system at time t_{k+1}, and M(\cdot) denotes the forward model. For instance, if the forward model M(\cdot) uses the Euler forward scheme and the surrogate model applies POD to the vector field \dot{x}=f(x(t)), then:

M(x_{k})=x_{k}+hPf(x_{k},t_{k}),

where h denotes the step size. The state forward dynamics and state observation relationship are given by:

x_{k+1}=M(x_{k})+e_{mh}(t_{k+1})+e_{ml}(t_{k+1}),
z_{k}=h(x_{k})+e_{oh}(t_{k})+e_{ol}(t_{k}).
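A small sketch of the forward model and observation map just described, assuming a POD projection (\rho, \bar{x}) and a vector field f as in Section 2.3; the helper names are ours.

```python
import numpy as np

def make_forward_model(f, rho, x_bar, h):
    P = rho.T @ rho                      # projection matrix P = rho^T rho
    def M(x, t=0.0):
        return x + h * (P @ f(x, t))     # Euler step of the projected dynamics
    return M

def observe(x, rho, x_bar):
    return rho @ (x - x_bar)             # compressed observation z = rho (x - x_bar)
```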

4   Proposed Framework

Figure 1: Diagram illustrating the key steps of the proposed methodology, including dynamic optimization, hierarchical clustering, solving the inverse problem, and filtering. See Section 4 for a detailed explanation.

The key steps of the framework, as illustrated in Figure 1, are as follows. For brevity, the framework is discussed in the context of dynamical systems represented on graphs. For dynamical systems that do not involve graphs, the dynamic optimization step is not required. The framework’s applicability to general modeling scenarios is demonstrated in Section 7, where a neural ODE-based surrogate model is used to learn from data.

  1.

    Dynamic optimization. For dynamical systems on graphs, a dynamic optimization problem is formulated to extract a sparse graph from known solution trajectory snapshots. The sparse graph improves memory efficiency in the filtering step (Step 4) of the framework, where it can replace the original graph. This step outputs the graph Laplacian matrix of the sparse graph, given by L_{1}=B^{T}\text{diag}(w^{*})B, where the weights w^{*} for linear and nonlinear dynamical systems on graphs are determined by solving the optimization problems detailed in Sections 5.2 and 5.3.

  2.

    Hierarchical clustering. This step quantifies deviations between the surrogate model (e.g., POD) and the actual system trajectories. Given a dataset \mathcal{D}=\{x_{1},x_{2},\ldots,x_{T}\} and surrogate model predictions \mathcal{P}=\{z_{1},z_{2},\ldots,z_{T}\}, we apply hierarchical clustering to partition \mathcal{P} into p clusters. The assigned cluster labels are represented as Y=\{y_{1},y_{2},\ldots,y_{T}\}, where y_{t} denotes the cluster assignment for data point z_{t}. For each cluster C_{i}, we construct a set of triplets \{z_{j},e_{j},\hat{e}_{j}\}_{j=1}^{N_{C_{i}}}, where N_{C_{i}} is the number of triplets in cluster C_{i}. For the reduced-order model using POD (Section 2.3), we set e_{j}=x_{j}-(\rho^{T}z_{j}+\bar{x}) and \hat{e}_{j}=z_{j}-\rho(x_{j}-\bar{x}). The center of cluster C_{i} is the triplet \{z^{C_{i}}_{m},e^{C_{i}}_{m},\hat{e}^{C_{i}}_{m}\}, where e^{C_{i}}_{m}=\frac{1}{N_{C_{i}}}\sum^{N_{C_{i}}}_{j=1}e_{j}, \hat{e}^{C_{i}}_{m}=\frac{1}{N_{C_{i}}}\sum^{N_{C_{i}}}_{j=1}\hat{e}_{j}, and z^{C_{i}}_{m}=\frac{1}{N_{C_{i}}}\sum^{N_{C_{i}}}_{j=1}z_{j}. As shown in Figure 2, a visualization of the clusters is provided, with I_{C_{i}} representing the indices of data points within cluster C_{i}. A code sketch of this step, together with the error matrices of Step 3, appears after this list of steps.

  3.

    Inverse Problem for Estimating Weights Between Clusters. For an initial condition x_{0} at time step t_{0}, we obtain a compressed representation of the data point z_{0}. For example, when using the POD method, we get the compressed representation of x_{0} as z_{0}=\rho(x_{0}-\bar{x}). From the clusters formed in Step 2, we assign a cluster to the point z_{0} based on the following:

    \arg\min_{f\in\{1,2,\dots,p\}}\left\|z^{C_{f}}_{m}-z_{0}\right\|_{2}. \qquad (4)

    The initial cluster distribution p_{0}\in\mathbb{R}^{p} is a one-hot vector, where the index corresponding to f\in\{1,2,\ldots,p\} is set to 1. The cluster distribution at time t is denoted by p_{t}. For successive transitions \hat{x}_{1}\rightarrow\hat{x}_{2}\rightarrow\ldots\rightarrow\hat{x}_{T}, it is essential to efficiently determine the cluster of each approximate state \hat{x}_{i} generated by the surrogate model at time t_{i}. This step is performed to estimate the high-wavelength components of the errors described in Section 3. The transition of the cluster distribution p_{t} is modelled as a Markovian process, as discussed in [25]. We consider the clusters from Step 2 as the nodes of a graph \mathcal{G}_{I}=(\mathcal{V}_{I},\mathcal{E}_{I},q), |\mathcal{V}_{I}|=p, where the weights q of the graph are unknown. The topology of the graph is taken to be complete with self-loops, meaning there is an edge between each pair of cluster nodes and an edge from each node to itself. We use a supervised learning approach to determine the weights q of the graph, where the labels y_{t} from Step 2 enter the optimization objective through the cross-entropy loss (see Problem P).

    The transitions of the cluster distribution are modeled as Markovian, based on the assumption that the states of the ODE exhibit Markovian properties, as explained below. The general explicit Runge-Kutta numerical scheme using n slopes [26] for obtaining the solution of the differential equation \dot{x}=f(x(t)) is given by the following relation:

    x_{t+1}=x_{t}+hb_{1}k_{1}+hb_{2}k_{2}+\ldots+hb_{n}k_{n},
    k_{i}=f\big(x_{t}+h\sum_{j=1}^{i-1}a_{ij}k_{j}\big).

    From this formulation, it follows that the solution transitions of the ODE adhere to a discrete-time Markov chain:

    p(x_{t+1}\mid x_{0},x_{1},\dots,x_{t})=p(x_{t+1}\mid x_{t}).

    In a graph \mathcal{G}_{I}, a walk is represented as a sequence of vertices (v_{0},v_{1},\dots,v_{s}), where each consecutive pair of vertices (v_{i-1},v_{i}) is connected by an edge in \mathcal{G}_{I}; that is, there is an edge between v_{i-1} and v_{i} for all 1\leq i\leq s. A random walk is characterized by the transition probabilities P(u,v), which define the probability of moving from vertex u to vertex v in one step. Each vertex u in the graph can transition to multiple neighboring vertices v. If we consider the transitions between the clusters as Markovian, the transition matrix is P=D^{-1}A, where D is the diagonal matrix of degrees and A is the adjacency matrix of the graph. For each vertex u,

    \sum_{v}P(u,v)=1,
    P(u,v)=\begin{cases}\frac{1}{d_{u}}&\text{if }u\text{ and }v\text{ are adjacent},\\ 0&\text{otherwise}.\end{cases}

    If we consider a complete graph on 3 nodes with self-loops as shown in Figure 2, the transition matrix,

    P=\begin{bmatrix}\frac{1}{q_{11}+q_{12}+q_{13}}&0&0\\ 0&\frac{1}{q_{12}+q_{22}+q_{23}}&0\\ 0&0&\frac{1}{q_{13}+q_{23}+q_{33}}\end{bmatrix}\begin{bmatrix}q_{11}&q_{12}&q_{13}\\ q_{12}&q_{22}&q_{23}\\ q_{13}&q_{23}&q_{33}\end{bmatrix}.
    Figure 2: Illustration of triplet organization into clusters, with mutually exclusive sets of indices I_{C_{i}} (where i=1,2,3). The inverse problem P is used to estimate the weights q_{ij}.

    We now pose a constrained optimization problem P to estimate the weights q responsible for governing the transitions of data points.

    \begin{aligned}
    &\underset{q\,\in\,\mathbb{R}^{p(p+1)/2}}{\textbf{minimize}} && J=\sum_{t=1}^{T}\sum_{k=1}^{p}-y_{t}\log(p_{t}(k))-(1-y_{t})\log(1-p_{t}(k))\\
    &\textbf{subject to} && p_{t}=P(q)\,p_{t-1},\\
    & && q_{ij}\geq 0,\quad i,j=1,2,\ldots,p.
    \end{aligned} \qquad (P)

    y_{t} represents the cluster index of data point z_{t}. The optimization problem above is solved to find the optimal q^{*}, which can be efficiently obtained using the adjoint method for data assimilation, as detailed in [23]. Given the cluster distribution p_{k} at time t_{k}, we estimate the components A_{k},R_{k} (Eq. 5) for the high-wavelength components of the errors e_{mh}(t_{k}),e_{oh}(t_{k}) as follows: A_{k}=\mathrm{diag}\big(\sum_{i=1}^{p}p_{k}(i)e_{m}^{C_{i}}\big),\;R_{k}=\mathrm{diag}\big(\sum_{i=1}^{p}p_{k}(i)\hat{e}_{m}^{C_{i}}\big).

  4.

    State Estimation with Filtering. The goal of filtering is to estimate system states over time using noisy measurements and state observation relations, as discussed in Section 2.4. The POD surrogate model defines the evolution of state dynamics and the state-measurement relationship as follows:

    \begin{cases}x_{k+1}=M(Pf(x_{k}))+A_{k+1}v_{k+1}+w_{k+1},\\ z_{k}=\rho(x_{k}-\bar{x})+R_{k}\beta_{k}+\mu_{k}.\end{cases} \qquad (5)

    The terms A_{k+1}v_{k+1} and w_{k+1} represent the high- and low-wavelength components of model error at time t_{k+1}, respectively. Similarly, R_{k}\beta_{k} and \mu_{k} characterize the high- and low-wavelength components of the state-observation error at time t_{k}. P=\rho^{T}\rho denotes the projection matrix. At a particular time step, the data point z_{t} at time t is obtained using the ROM \dot{z}_{t}=f_{a}(z_{t},t)=\rho f(\rho^{T}z_{t}+\bar{x},t), and the state vector after projection is x_{t}=\rho^{T}z_{t}+\bar{x}. The matrices A_{k+1} and R_{k} are computed following the methodology outlined in Step 3 of the framework. The term v_{1} is the solution to the linear system A_{1}v_{1}=x_{1}-M(Pf(x_{0})). For dynamical systems represented on graphs, the term x_{1} is computed using an explicit numerical scheme with the sparse graph obtained from the dynamic optimization Step 1. In the case of a diffusion equation represented on graphs (\mathbb{D} in Section 2.2) with Euler discretization step size h, the sparse Laplacian matrix L_{1} from the output of Step 1 is used instead of the matrix L to obtain x_{1}=x_{0}-hL_{1}x_{0}. \beta_{1} is the solution to the linear system R_{1}\beta_{1}=z_{1}-\rho(x_{1}-\bar{x}). w_{k+1} and \mu_{k} are modelled as Gaussian with means \mathbf{0}_{n\times 1},\mathbf{0}_{k\times 1} and variances \zeta_{x}=\sigma_{x}I_{n\times n},\zeta_{y}=\sigma_{y}I_{k\times k}. The distribution governing the low-wavelength components of the errors (w_{k+1},\mu_{k}) is deliberately shaped to exhibit minimal variability, with a mean of \mathbf{0}_{n\times 1}, under the expectation that the high-wavelength component will accurately capture the true nature of the error. Linear and non-linear filters for state estimation are described in [23], [24]. When the uncertainty in the state prediction exceeds a threshold, the vectors v_{k+1} and \beta_{k} are updated as solutions of the linear systems defined below:

    A_{k+1}\,v_{k+1}=x_{k+1}-M(Pf(x_{k})), \qquad (6)
    R_{k}\,\beta_{k}=z_{k}-\rho(x_{k}-\bar{x}). \qquad (7)

    The computation of x_{k+1} is performed using an explicit numerical scheme that leverages the state vector obtained from the filter at time step k, together with the sparse graph.
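The following sketch illustrates Steps 2 and 3 above (as referenced in Step 2): hierarchical clustering of the compressed predictions into p clusters with per-cluster mean error triplets, and the diagonal matrices A_{k}, R_{k} assembled from a cluster distribution p_{k}. The Ward linkage and the helper names are our assumptions, and each of the p clusters is assumed non-empty.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_triplets(X, Z, rho, x_bar, p):
    """X: (T, n) true states, Z: (T, k) surrogate predictions, rho: (k, n) POD projection."""
    labels = fcluster(linkage(Z, method='ward'), t=p, criterion='maxclust')
    e = X - (Z @ rho + x_bar)               # e_j = x_j - (rho^T z_j + x_bar)
    e_hat = Z - (X - x_bar) @ rho.T         # e_hat_j = z_j - rho (x_j - x_bar)
    centers, e_means, e_hat_means = [], [], []
    for c in range(1, p + 1):
        idx = labels == c                   # assumed non-empty cluster
        centers.append(Z[idx].mean(axis=0))
        e_means.append(e[idx].mean(axis=0))
        e_hat_means.append(e_hat[idx].mean(axis=0))
    return labels, np.array(centers), np.array(e_means), np.array(e_hat_means)

def error_matrices(p_k, e_means, e_hat_means):
    # A_k = diag(sum_i p_k(i) e_m^{C_i}),  R_k = diag(sum_i p_k(i) e_hat_m^{C_i})
    A_k = np.diag(p_k @ e_means)
    R_k = np.diag(p_k @ e_hat_means)
    return A_k, R_k
```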

5   Dynamic optimization problem formulation for dynamical systems represented on graphs

This section outlines the constraints necessary for the dynamic optimization step (Step 1 in Section 4) applied to a linear dynamical system \mathbb{D} represented on graphs (Section 2.2). Section 5.2 details the dynamic optimization process for linear diffusion systems on graphs. Section 5.3 presents the optimization framework for reaction-diffusion systems on graphs, including the corresponding constraint formulations.
Existing techniques for approximating linear dynamical systems on graphs include [27], [28], and [29]. The approach in [29] constructs spectral sparsifiers by sampling edges according to their resistance R_{e}. The probability of sampling an edge e to construct a sparse approximation \bar{G} of graph G is given by

p_{e}=\frac{w_{e}R_{e}}{n-1},

where the edge resistance is defined as

R_{e}=\|L^{1/2}L_{e}L^{1/2}\|.

See Algorithm 5 for further details.

\bar{G}=\textbf{Sparsify}(G,q) [29]
Choose a random edge e of G with probability p_{e} proportional to w_{e}R_{e},
and add e to \bar{G} with weight w_{e}/(qp_{e}). Take q samples independently
with replacement, summing weights if an edge is chosen more than once.

Computing bounds on the edge weights of the sparse graph approximation \bar{G}, using effective resistances or local edge connectivities, is computationally intensive. To overcome this challenge, we propose Algorithm 5.1, which leverages Theorems 2 and 3 to efficiently compute upper bounds on edge weights and node degrees in the sparse graph. The following discussion introduces Bernstein’s Inequality [30], which is utilized in the derivation of Theorem 2.

Theorem 1.

Bernstein’s Inequality: Suppose X_{1},\ldots,X_{n} are independent random variables with finite variances, and suppose that \max_{1\leq i\leq n}|X_{i}|\leq B almost surely for some constant B>0. Let V=\sum_{i=1}^{n}\mathbb{E}X_{i}^{2}. Then, for every t\geq 0,

P\left\{\sum_{i=1}^{n}(X_{i}-\mathbb{E}X_{i})\geq t\right\}\leq\exp\left(-\frac{t^{2}}{2(V+tB/3)}\right)

and

P\left\{\sum_{i=1}^{n}(X_{i}-\mathbb{E}X_{i})\leq-t\right\}\leq\exp\left(-\frac{t^{2}}{2(V+tB/3)}\right).
Theorem 2.

Let each edge (u,v) of a graph be sampled with probability p_{uv}\propto R_{uv}, where p_{uv}\geq\frac{\beta}{n\,\min(\mathrm{deg}(u),\mathrm{deg}(v))}. Then \mathbb{P}\left\{\frac{\sum_{i=1}^{q}X_{i}}{q}-w_{uv}\geq t\right\}\leq\frac{1}{mn} for t\geq\frac{\epsilon_{1}c_{1}}{3q}+\sqrt{\frac{2\epsilon_{1}w_{uv}c_{1}}{q}+\left(\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}}, where

X_{i}=\begin{cases}\frac{w_{uv}}{p_{uv}},&\text{with probability }p_{uv},\;(u,v)\in E(G),\\ 0,&\text{otherwise},\end{cases}

\epsilon_{1}=\log(nm),\;|X_{i}|\leq c_{1},\;c_{1}=\frac{w_{uv}nc_{2}}{\beta}, and c_{2}=\min(\mathrm{deg}(u),\mathrm{deg}(v)).

Proof.

The expectation of XiX_{i} satisfies

\mathbb{E}[X_{i}]=w_{uv}.

The second moment sum satisfies

\sum_{i=1}^{q}\mathbb{E}(X_{i}^{2})=q\,\frac{w_{uv}^{2}}{p_{uv}}\leq\frac{w_{uv}^{2}nc_{2}q}{\beta},

where c_{2}=\min(\mathrm{deg}(u),\mathrm{deg}(v)).

Applying Bernstein’s Inequality (Theorem 1), we obtain

\mathbb{P}\left\{\frac{\sum_{i=1}^{q}X_{i}}{q}-w_{uv}\geq t\right\}\leq\exp\left(-\frac{(tq)^{2}}{2}\cdot\frac{1}{w_{uv}c_{1}q+\frac{c_{1}tq}{3}}\right).
Requiring this bound to be at most \frac{1}{mn} gives
\frac{(tq)^{2}}{2}\cdot\frac{1}{w_{uv}c_{1}q+\frac{c_{1}tq}{3}}\geq\log(nm)\equiv\epsilon_{1}.

Completing the square yields

\left(t-\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}\geq\frac{2\epsilon_{1}w_{uv}c_{1}}{q}+\left(\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}.

Hence, it suffices to choose

t\geq\frac{\epsilon_{1}c_{1}}{3q}+\sqrt{\frac{2\epsilon_{1}w_{uv}c_{1}}{q}+\left(\frac{\epsilon_{1}c_{1}}{3q}\right)^{2}}.
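A small sketch of the resulting bound, as used later in Algorithm 5.1 to form w^{+}=w_{e}+t and w^{-}=\max(w_{e}-t,0); all numerical values passed in the example call are assumptions for illustration.

```python
import numpy as np

def theorem2_bound(w_uv, q, n, m, beta, deg_u, deg_v):
    c2 = min(deg_u, deg_v)
    c1 = w_uv * n * c2 / beta                        # |X_i| <= c1
    eps1 = np.log(n * m)                             # epsilon_1 = log(nm)
    term = eps1 * c1 / (3 * q)
    return term + np.sqrt(2 * eps1 * w_uv * c1 / q + term ** 2)

# assumed example values for one edge
t = theorem2_bound(w_uv=1.0, q=200, n=30, m=336, beta=1.0, deg_u=10, deg_v=12)
```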

Theorem 2 provides upper bounds on the edge weights of the sparse graph \bar{G} generated by Algorithm 5. Furthermore, it is necessary to establish bounds on the node degrees in the sparse graph \bar{G}. To derive these bounds, we introduce Theorem 3, which utilizes the established relationship between the eigenvalues of the Laplacian matrix and the node degrees, as discussed in Section 5.1.
Notation: Let G be a simple weighted graph, i.e., a graph without self-loops or multiple edges, with vertices \{v_{1},v_{2},\ldots,v_{n}\}. We define two subsets of vertices in G based on their degrees. The set l_{m} contains the n-m+1 vertices with the highest degrees, while s_{m} consists of the m vertices with the lowest degrees. The subgraphs G_{l_{m}} and G_{s_{m}} are defined as the subgraphs of G induced by the vertex sets l_{m} and s_{m}, respectively.

a_{ub} = upper bound on the maximal edge weight of the sparsified graph, \qquad (8)
\Delta(G_{l_{i}})_{ub} = upper bound on the maximum degree in G_{l_{i}},
\delta(G_{s_{i}})_{lb} = lower bound on the minimum degree in G_{s_{i}}.

Here d_{v_{1}}=d_{1}\leq d_{v_{2}}=d_{2}\leq\ldots\leq d_{v_{n}}=d_{n}.

5.1   Some known bounds on eigenvalues of the Laplacian matrix

Unweighted graphs ([31], Corollary 1): For simple unweighted graphs on n vertices with G\neq K_{n-m+1}+(m-1)K_{1} and \bar{G}\neq K_{m-1}+(n-m+1)K_{1}, we have the following relation:

d_{m}-n+m+1\leq\lambda_{m}(G)\leq d_{m-1}(G)+m-2.

Weighted graphs ([31], Corollary 2): Let G be a finite simple weighted graph on n vertices and denote by a the maximal weight of an edge in G; then

d_{m}(G)-\Delta(G_{l_{m}})\leq\lambda_{m}(G)\leq d_{m-1}(G)+(m-1)a-\delta(G_{s_{m-1}}). \qquad (9)

We present the following theorem, which establishes bounds on the degrees of the spectral sparsifier \bar{G} of G; these bounds are incorporated as constraints in the dynamic optimization step of our framework (Section 5.2).

Theorem 3.

Let \bar{G} be an \epsilon-spectral sparsifier of an undirected graph G with n vertices, where the eigenvalues of G satisfy \lambda_{1}\leq\lambda_{2}\leq\cdots\leq\lambda_{n}, with \lambda_{n} the largest eigenvalue. The following bounds on the degrees of \bar{G} must hold:

d_{2}(G)-\Delta(G_{l_{2}})(1-\epsilon)-a(G)\leq d_{1}(\bar{G})\leq d_{1}(G)(1+\epsilon),
(d_{i+1}(G)-\Delta(G_{l_{i+1}}))(1-\epsilon)-ia(\bar{G})+\delta(\bar{G}_{s_{i}})_{lb}\leq d_{i}(\bar{G})\leq(1+\epsilon)(d_{i-1}(G)+(i-1)a(G)-\delta(G_{s_{i-1}}))+\Delta(\bar{G}_{l_{i}})_{ub},

for i=2 to n-1, and

d_{n}(G)(1-\epsilon)\leq d_{n}(\bar{G})\leq(1+\epsilon)(d_{n-1}(G)+(n-1)a(G)-\delta(G_{s_{n-1}})).
Proof.

Since \bar{G} is an \epsilon-spectral approximation of G, the eigenvalues satisfy:

(1-\epsilon)\lambda_{i}(G)\leq\lambda_{i}(\bar{G})\leq(1+\epsilon)\lambda_{i}(G), \qquad (a)

for i=1,\dots,n.

From the spectral bounds on eigenvalues in Equation 9, we know:

\lambda_{i}(G)\leq d_{i-1}(G)+(i-1)a(G)-\delta(G_{s_{i-1}}),\quad\lambda_{i}(G)\geq d_{i}(G)-\Delta(G_{l_{i}}). \qquad (b)

Applying (a) to the upper bound in (b), we obtain:

d_{i}(\bar{G})\leq(1+\epsilon)\big(d_{i-1}(G)+(i-1)a(G)-\delta(G_{s_{i-1}})\big)+\Delta(\bar{G}_{l_{i}})_{ub}. \qquad (c)

Similarly, using the lower bound in (b), we have:

d_{i-1}(\bar{G})\geq\big(d_{i}(G)-\Delta(G_{l_{i}})\big)(1-\epsilon)-(i-1)a(\bar{G})+\delta(\bar{G}_{s_{i-1}})_{lb}. \qquad (d)

For the largest degree, we use the fact that \bar{G} preserves cuts in G, giving:

d_{n}(G)(1-\epsilon)\leq d_{n}(\bar{G})\leq(1+\epsilon)(d_{n-1}(G)+(n-1)a(G)-\delta(G_{s_{n-1}})). \qquad (e)

Hence, the degree bounds for all nodes in \bar{G} are established. ∎

Let \delta_{i}^{u} and \delta_{i}^{l} denote the upper and lower bounds on the degree of node i, respectively, as established in Theorem 3. Additionally, let \delta^{-}(i) represent the refined lower bound and \Delta^{+}(i) the refined upper bound on that degree:

d_{i}(\bar{G})\geq\max\left(\delta_{i}^{l},(1-\epsilon)\,d_{i}(G)\right)=\delta^{-}(i),
d_{i}(\bar{G})\leq\min\left(\delta_{i}^{u},(1+\epsilon)\,d_{i}(G)\right)=\Delta^{+}(i).

Algorithm 5.1 outlines the computation of the upper bound \Delta^{+} and the lower bound \delta^{-} on the node degrees in the sparsified graph \bar{G}.

Here, \Delta^{+}(i) denotes the upper bound on the maximum degree in the subgraph of \bar{G} induced by the vertices \{v_{i},v_{i+1},\dots,v_{n}\}, while \delta^{-}(i) represents the lower bound on the minimum degree in the subgraph induced by \{v_{1},v_{2},\dots,v_{n-i+1}\}. Furthermore, we define upper and lower bounds on graph G as \Delta(i)=\Delta(G_{l_{i}}) and \delta(i)=\delta(G_{s_{i}}).

———————————————————————————————————————
Algorithm 1
———————————————————————————————————————
Input : Graph G=(V,E,W), \epsilon
Output : \Delta^{+},\delta^{-},\Delta,\delta,w^{+},w^{-}
w^{+}=\{\};\;\;w^{-}=\{\}
for each edge e\in E(G) do
       Compute t using Theorem 2.
       Append w_{e}+t to w^{+}.
       Append \max(w_{e}-t,0) to w^{-}.
end for
node \leftarrow \{1,2,\ldots,n\}
degree \leftarrow \{d_{1},d_{2},\ldots,d_{n}\} via dfs(G) // O(V+E)
degree, degree1, degree^{+}, degree^{-} \leftarrow dictionary(\{v_{1},v_{2},\ldots,v_{n}\}, \{d_{v_{1}},d_{v_{2}},\ldots,d_{v_{n}}\}) (where d_{v_{1}}\leq d_{v_{2}}\leq\ldots\leq d_{v_{n}})
\Delta(1)=\max(\text{degree}),\;\delta(n)=\min(\text{degree}),\;a(G) (maximal weight)
\Delta(n)=0,\;\;\Delta^{+}(n)=0
\delta(1)=0,\;\;\delta^{-}(1)=0
\Delta^{+}(1)=\max(\text{degree}),\;\delta^{-}(n)=\min(\text{degree})
l_{2} \leftarrow node; l_{3} \leftarrow node
Remove node v_{1} from l_{2}
Remove node v_{n} from l_{3}
for i=2 to n-1 do
      for y\in\{v_{i+1},v_{i+2},\ldots,v_{n}\} do
             if (v_{i},y)\in E then
                   degree(y) = degree(y) - w(v_{i},y), degree(v_{i}) = degree(v_{i}) - w(v_{i},y)
                   degree^{+}(y) = degree^{+}(y) - w^{+}(v_{i},y), degree^{+}(v_{i}) = degree^{+}(v_{i}) - w^{+}(v_{i},y)
      end for
      \Delta(i) = \max_{x\in l_{2}}\text{degree}(x)
      \Delta^{+}(i) = \max_{x\in l_{2}}\text{degree}^{+}(x)
      Remove node v_{i} from l_{2}
      for y\in\{v_{1},v_{2},\ldots,v_{n-i}\} do
             if (v_{n-i+1},y)\in E then
                   degree1(y) = degree1(y) - w(v_{n-i+1},y), degree1(v_{n-i+1}) = degree1(v_{n-i+1}) - w(v_{n-i+1},y)
                   degree^{-}(y) = degree^{-}(y) - w^{-}(v_{n-i+1},y), degree^{-}(v_{n-i+1}) = degree^{-}(v_{n-i+1}) - w^{-}(v_{n-i+1},y)
      end for
      \delta(n-i+1) = \min_{x\in l_{3}}\text{degree1}(x)
      \delta^{-}(n-i+1) = \min_{x\in l_{3}}\text{degree}^{-}(x)
      Remove node v_{n-i+1} from l_{3}
end for

The outputs of Algorithm 5.1 (\Delta^{+},\delta^{-},w^{+},w^{-}) are incorporated as constraints in the dynamic optimization process (Constraints D3, D4, D5, D6). These constraints are specifically applied in the dynamic optimization of linear dynamical systems on graphs, as discussed in Section 5.2 and outlined in Section 4.

5.2   Dynamic Optimization for Linear Dynamical Systems on Graphs Using a POD-Based Surrogate Model

Real-time dynamic optimization problems are described in [32], [33], and [34]. Using snapshots of solutions from several trajectories (Section 2.2), we formulate the dynamic optimization problem in a ROM space [3], [35] using the projection matrix \rho\in\mathbb{R}^{k\times|V|} and mean \bar{x}, where k denotes the reduced dimension and k<n (Section 2.3). The cost function J (Eq. 10) in the dynamic optimization makes use of data points obtained from the ROM. The dynamic optimization framework for obtaining a sparse graph in the case of diffusion (\mathbb{D}), considering a single trajectory, is given below:

\underset{\bar{\gamma}}{\textbf{minimize}}\;\;J(\bar{z},\bar{\gamma})=\underbrace{\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{k}\big(z(i,j)(i\;\text{step}\;m_{1})-\bar{z}(i,j)(i\;\text{step}\;m_{1})\big)^{2}}_{A1}+\underbrace{\alpha\sum_{i=1}^{m}|\bar{\gamma}_{i}|}_{A2} \qquad (10)
subject   to
t_{n}\,N\,f_{a}(\bar{z}(i,1),\bar{z}(i,2),\ldots,\bar{z}(i,k),\bar{\gamma})=\begin{bmatrix}\bar{z}(i,1)\\ \bar{z}(i,2)\\ \vdots\\ \bar{z}(i,k)\end{bmatrix}-\begin{bmatrix}\bar{z}(i-1,1)\\ \bar{z}(i-1,2)\\ \vdots\\ \bar{z}(i-1,k)\end{bmatrix},\qquad i=1,2,\ldots,C
\{\bar{z}(0,1),\bar{z}(0,2),\ldots,\bar{z}(0,k)\}=\rho(F_{0}-\bar{F})
Q^{\prime}\bar{\gamma}\geq\delta^{-} \qquad (D3)
Q^{\prime}\bar{\gamma}\leq\Delta^{+} \qquad (D4)
w_{j}\bar{\gamma}_{j}\geq w^{-}_{j},\quad j=1,2,\ldots,m \qquad (D5)
w_{j}\bar{\gamma}_{j}\leq w^{+}_{j},\quad j=1,2,\ldots,m \qquad (D6)
\bar{\gamma}_{j}\geq 0,\quad j=1,2,\ldots,m \qquad (D7)

The term A1 in the objective function (Eq. 10) aims to minimize the discrepancy between the states obtained using the projected vector field with multipliers \bar{\gamma} and the known ROM solution obtained from the original graph. The term A2 enforces sparsity in \bar{\gamma} by penalizing nonzero elements. The optimization considers states at discrete increments of step size m_{1}. The matrix N is determined by the chosen collocation method (see [18]).

t_{n} denotes the step size. C denotes the number of collocation elements. z(i,k) represents the k-th entry of the state z in the i-th collocation element. F_{0}=\{F(0,1),F(0,2),\ldots,F(0,|V|)\} represents the initial condition used for generating data with the ROM z(i,j). \bar{F} is determined based on the procedure described in [3]. The first set of kC constraints stems from the orthogonal collocation method on finite elements (Eq. 5.2). Non-negativity of the multipliers \bar{\gamma}_{j} is imposed in constraint (D7). The constraints Q^{\prime}\bar{\gamma}\geq\delta^{-} (D3) and Q^{\prime}\bar{\gamma}\leq\Delta^{+} (D4) enforce connectivity levels as derived in Theorem 3. Here, Q^{\prime}=Q\,\mathrm{diag}(w), where Q represents the unsigned incidence matrix of the graph and w represents the weights of the given graph. The weight constraints w_{j}\bar{\gamma}_{j}\geq w^{-}_{j} (D5) and w_{j}\bar{\gamma}_{j}\leq w^{+}_{j} (D6) are based on Theorem 2. From the output of the dynamic optimization framework (z^{*},\bar{\gamma}^{*}), we prune out weights w^{*}_{j}\leq\epsilon_{1}, where \epsilon_{1} is a user-defined parameter and w^{*}_{j}=w_{j}\bar{\gamma}^{*}_{j}. The new weight vector is denoted as w_{1}^{*}. The sparse graph Laplacian L_{1}=B^{T}\text{diag}(w_{1}^{*})B replaces the standard Laplacian L in the filtering step to improve the efficiency of Laplacian-vector product computations.

Several key observations can be made about the dynamic optimization problem. The objective function J can be made differentiable by using the substitution \gamma=\gamma^{+}-\gamma^{-} with \|\gamma\|_{1}=\mathbf{1}^{T}\gamma^{+}+\mathbf{1}^{T}\gamma^{-}, where \gamma^{+} holds the positive entries of \gamma (0 elsewhere) and \gamma^{-} the magnitudes of its negative entries (0 elsewhere). The objective function is convex, and the constraints are continuous. To solve this optimization problem, we can employ the barrier method for constrained optimization, as discussed in [36] and [37].
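As a rough illustration of this optimization, the sketch below re-weights the edges with \bar{\gamma} and solves a simplified version of the problem with scipy.optimize.minimize (trust-constr) rather than a barrier method: the collocation constraints are replaced by an Euler-integrated ROM mismatch inside the objective, constraints D3-D7 appear as linear constraints and bounds, and since \bar{\gamma}\geq 0 the \ell_{1} term reduces to a plain sum. The function name, argument names, and matrix conventions (B the (m, n) signed incidence matrix, Q the (n, m) unsigned incidence matrix) are our assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def sparsify_linear(z_ref, rho, x_bar, B, Q, w, F0, h, alpha,
                    delta_lb, delta_ub, w_lo, w_hi):
    """z_ref: (T, k) ROM snapshots; rho: (k, n) POD projection; w: original edge weights."""
    m = w.size

    def rom_trajectory(gamma):
        L1 = B.T @ np.diag(w * gamma) @ B                 # Laplacian of the re-weighted graph
        z = rho @ (F0 - x_bar)
        traj = [z]
        for _ in range(len(z_ref) - 1):
            # Euler stand-in for the collocation constraints of the projected dynamics
            z = z + h * (rho @ (-L1 @ (rho.T @ z + x_bar)))
            traj.append(z)
        return np.array(traj)

    def J(gamma):
        misfit = 0.5 * np.sum((rom_trajectory(gamma) - z_ref) ** 2)   # term A1
        return misfit + alpha * np.sum(gamma)                         # A2 (gamma >= 0)

    Qp = Q @ np.diag(w)                                   # Q' = Q diag(w)
    cons = [LinearConstraint(Qp, delta_lb, delta_ub),     # D3, D4: degree bounds
            LinearConstraint(np.diag(w), w_lo, w_hi)]     # D5, D6: edge-weight bounds
    res = minimize(J, np.ones(m), method='trust-constr',
                   bounds=[(0.0, np.inf)] * m,            # D7: gamma >= 0
                   constraints=cons)
    return w * res.x                                      # candidate sparse edge weights
```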

5.3    Dynamic Optimization for Reaction-diffusion Dynamical Systems on Graphs Using a POD-Based Surrogate Model

In this section, we present the formulation of the dynamic optimization framework for a reaction-diffusion system. A general reaction-diffusion system is described in [21], where the activity at node j at time t is represented by an m-dimensional variable r_{j}(t). The temporal evolution of r_{j} follows the differential equation:

\frac{dr_{j}}{dt}=\mathcal{F}(r_{j})+K\sum_{k=1}^{N}A_{jk}\mathcal{G}(r_{k}-r_{j}),\qquad j=1,2,\ldots,N.

\mathcal{F} denotes the reaction component, while the remaining terms represent the diffusion process, with \mathcal{F}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{m} and \mathcal{G}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{m}. For brevity, we use the alternating self-dynamics Brusselator model (\mathbb{C}) discussed in Section 2. When the input graph is connected, we impose a minimum connectivity constraint to limit perturbations in the second smallest eigenvalue of the graph Laplacian matrix. The intuition behind this constraint is that for a connected graph, the lowest degree is greater than zero, making this constraint necessary. \tau_{L} is the minimum degree imposed on the sparse graph (Eq. D10). Non-negativity of weights is also imposed in the dynamic optimization problem (Eq. D11). While [35] discusses generating sparse graphs using snapshots of data at arbitrary time points, we utilize data points at collocation time points for sparsification. We consider the following dynamic optimization problem for a reaction-diffusion system, considering a single trajectory:

\underset{\bar{\gamma}}{\textbf{minimize}}\;\;J(\bar{z},\bar{\gamma})=\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{k}\big(z(i,j)(i\;\text{step}\;m_{1})-\bar{z}(i,j)(i\;\text{step}\;m_{1})\big)^{2}+\alpha\sum_{i=1}^{m}|\bar{\gamma}_{i}|
subject   to
t_{n}\,N\,f_{a}(\bar{z}(i,1),\bar{z}(i,2),\ldots,\bar{z}(i,k),\bar{\gamma})=\begin{bmatrix}\bar{z}(i,1)\\ \bar{z}(i,2)\\ \vdots\\ \bar{z}(i,k)\end{bmatrix}-\begin{bmatrix}\bar{z}(i-1,1)\\ \bar{z}(i-1,2)\\ \vdots\\ \bar{z}(i-1,k)\end{bmatrix},\qquad i=1,2,\ldots,C
\{\bar{z}(0,1),\bar{z}(0,2),\ldots,\bar{z}(0,k)\}=\rho(X_{0}-\bar{X})
Q^{\prime}\bar{\gamma}\geq\tau_{L} \qquad (D10)
\bar{\gamma}_{j}\geq 0,\quad j=1,2,\ldots,m \qquad (D11)

The initial condition X_{0}=\{x(0,1),x(0,2),\ldots,x(0,|V|),y(0,1),y(0,2),\ldots,y(0,|V|)\} is used to obtain data points z(i,j) using the ROM. \bar{X} is determined as described in [3]. The output of the dynamic optimization problem is (\bar{z}^{*},\bar{\gamma}^{*}). Then, w_{1}=\mathrm{diag}(W)\gamma^{*}. Elements in w_{1} less than \epsilon_{1} are set to 0, and w_{1} is updated. We update the solutions in the filtering step using this sparsified graph G_{1}=(V,E_{1},w_{1}) when the uncertainty value in the covariance matrix of the filter exceeds a threshold.
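For reference, a sketch of the graph Brusselator vector field of Eq. (1), which generates the snapshots z(i,j) fed into the optimization above after POD projection; the parameter values in the signature are illustrative assumptions.

```python
import numpy as np

def brusselator_rhs(x, y, L, a=1.0, b=3.0, c=1.0, d=1.0, Dx=0.1, Dy=0.05):
    """x, y: node activities (length N); L: graph Laplacian matrix."""
    dx = a - (b + d) * x + c * x ** 2 * y - Dx * (L @ x)   # dot{x}_i of Eq. (1)
    dy = b * x - c * x ** 2 * y - Dy * (L @ y)             # dot{y}_i of Eq. (1)
    return dx, dy
```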

6   Experimental Results

In this section, we present empirical results demonstrating the effectiveness of the proposed framework in reinforcing graph-based linear dynamical systems (\mathbb{D}) and nonlinear reaction-diffusion systems, specifically the chemical Brusselator model (\mathbb{C}), as described in Section 2.2. We evaluate the effectiveness of our framework by analyzing RMSE values under perturbations in initial conditions for surrogate models applied to both linear and reaction-diffusion systems on graphs. Additionally, we assess the impact of incorporating our framework into these models for both linear and non-linear dynamical systems. The influence of initial condition perturbations on neural ODE-based surrogate models, both with and without the framework discussed in Section 7, is also presented in Table 1.

Experiment                                         | Surrogate model RMSE | Surrogate model with Framework RMSE | ROM RMSE
Reaction-Diffusion with ROM surrogate model        | 0.59                 | 0.40                                | --
Linear Diffusion with ROM surrogate model          | 5.17                 | 0.38                                | --
Linear Diffusion with Neural ODE surrogate model   | 0.48                 | 0.29                                | 0.65
Table 1: Comparison of RMSE values across different experiments, analyzing the effects of input perturbations on surrogate models. The Reaction-Diffusion experiment is conducted on a 40-node Erdős-Rényi random graph using the chemical Brusselator dynamics (\mathbb{C}), as detailed in Section 2.2. The Linear Diffusion experiment is performed on a 30-node Erdős-Rényi random graph with the graph-based dynamical system (\mathbb{D}), employing the ROM (Section 2.3) as the surrogate model. For both the Linear Diffusion and Reaction-Diffusion experiments, we assess the robustness of the surrogate model with our framework under perturbations. In the Linear Diffusion experiment with neural ODEs (Section 7), we analyze the impact of perturbations on a neural-ODE surrogate model for the diffusion equation (\mathbb{D} in Section 2.2) on a 10-node complete graph. The surrogate model with our framework achieves the lowest RMSE across all experiments, demonstrating its effectiveness in improving surrogate model accuracy under perturbations.

Remark: In certain graph structures, empirical experiments revealed instances of particle filter divergence, necessitating additional updates in the filtering step. Particle filter divergence is a critical issue that must be carefully addressed, as it compromises estimation accuracy and reduces the framework’s effectiveness. Several factors can contribute to this phenomenon, including suboptimal filter tuning, modeling inaccuracies, inconsistent measurement data, or system-related issues such as hardware-induced delays. Specifically, inaccurate likelihood estimations due to imprecise noise assumptions, erroneous process models, or delayed measurement updates can lead to divergence. For further examples, see [38].

6.1   Linear dynamical system represented on graphs

We present the experimental results for the linear diffusion equation (\mathbb{D}) on a 30-node Erdős-Rényi random graph using the 4-point orthogonal collocation method with 20 elements. The parameters used in Algorithm 5.1 and Section 5.2 are as follows: \epsilon=0.5, T=0.15, k=\min(\lceil\frac{n}{5}\rceil,50), with two trajectories considered. The number of clusters is set to p=30, while the particle filter employs 15,000 particles with \epsilon_{1}=1\times 10^{-5}. The noise terms w_{k} and \mu_{k} are assumed to be normally distributed with zero mean and variances \zeta_{x}I and \zeta_{y}I, where \zeta_{x}=0.01 and \zeta_{y}=1\times 10^{-7}.

Following the dynamic optimization step, the resulting graph exhibited 31 sparse edges, a substantial reduction from the original graph’s 336 edges. During the filtering step, 193 updates were required for prediction over 1000 timesteps. Figure 3 compares the reduced-order model solution, the actual solution, and the reduced-order model solution with our framework for the linear dynamical system (\mathbb{D}) described in Section 2.2. Figure 3(c) shows that the ROM solution with our framework more closely resembles the actual solution in Figure 3(b) than the reduced-order model solution in Figure 3(a) does, as discussed in Section 2.3.

Figure 3: Comparison of the (a) ROM solution, (b) actual solution, and (c) ROM solution with our framework for the graph-based dynamical system \mathbb{D}, as described in Section 2.2.

6.2   Reaction-diffusion system represented on graphs

Figure 4: Comparing (a) ROM, (b) Actual, and (c) ROM solution with our framework for the case of the chemical Brusselator reaction-diffusion system (Eq. 1).

The results for the nonlinear case are based on the alternating self-dynamics Brusselator model applied to a 40-node Erdős-Rényi random graph, using the 4-point orthogonal collocation method. The number of trajectories is set to two, and the particle filter employs 15,000 particles. The total simulation time is set to T=3, with the number of clusters fixed at p=30. The sparsification parameter \epsilon_{1} is set to 10^{-5}, and the minimum connectivity constraint is defined as \tau_{L}=0.1\mathrm{d}_{s}, where \mathrm{d}_{s} denotes the minimum degree of the graph. The reduced-order dimension is given by k=\min(\lceil\frac{2n}{5}\rceil,50). The noise terms w_{k} and \mu_{k} are assumed to be normally distributed with zero mean and variances \zeta_{x}I and \zeta_{y}I, where \zeta_{x}=0.01 and \zeta_{y}=1\times 10^{-7}. The graph obtained from the dynamic optimization step contained 286 sparse edges, a reduction from the original graph’s 336 edges. The filtering step required a total of 50 updates for predictions over 100 timesteps.

Figure 4 presents a comparison of the actual solution of the chemical Brusselator reaction-diffusion system (\mathbb{C} in Section 2.2) for a randomly initialized condition, with both the reduced-order model (Section 2.3) and the ROM solution with our framework. The grid-based representation highlights regions marked in black, indicating areas where the absolute error of the particle filter solution is lower than that of the ROM solution.

7   Benchmarking the framework using neural ODEs

To broaden the applicability of our framework, we evaluate its effectiveness using a neural-ODE-based surrogate model, where state vectors are observed at discrete time intervals. Neural ordinary differential equations (neural ODEs) provide a powerful framework for modeling and analyzing complex dynamical systems [39, 40]. They offer a flexible approach to capturing system dynamics by integrating the principles of ordinary differential equations with neural networks. Unlike traditional neural networks that process inputs through discrete layers, neural ODEs represent system dynamics continuously, using an ordinary differential equation to govern the evolution of hidden states over time. Neural ODEs learn system dynamics by estimating the parameters of the differential equation with the adjoint sensitivity method ([23]). This approach allows the model to capture complex temporal dependencies in data. By leveraging the continuous-time nature of differential equations, neural ODEs provide key advantages, including the ability to model irregularly sampled time-series data and accommodate variable-length inputs.

To illustrate the framework, we consider a linear dynamical system (\mathbb{D}) from Section 2.2, defined as follows:

\frac{dx}{dt}=-Lx. \qquad (11)

The key distinction in applying our framework with the neural ODE model as the surrogate is the absence of the dynamic optimization step outlined in Section 4. Instead, an additional step is included to train the parameters of the neural ODE surrogate model, as described in Equation 13. The neural ODE solutions are treated as observations in the filtering step after projection (z_{k}), where the POD method is applied for dimensionality reduction. The forward state dynamics and the state observation relationship, as applied in Step 4 of the filtering process within the framework, are given by:

\begin{cases}x_{k+1}=M(x_{k})+A_{k+1}v_{k+1}+w_{k},\\ z_{k}=\rho(x_{k}-\bar{x})+R_{k}\beta_{k}+\mu_{k}.\end{cases} \qquad (12)

When utilizing the neural ODE surrogate model with an Euler discretization step \Delta t_{k}, the forward model is expressed as:

M(x_{k})=x_{k}+\Delta t_{k}\cdot nn(x_{k}).
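As an illustration, the forward model and the corresponding particle-propagation step of Equation 12 can be written as follows. This is a sketch under stated assumptions: `nn` is any trained vector field (such as the one above), and `A_next` and `v_next` stand for the quantities A_{k+1} and v_{k+1} computed in Step 3 of the framework, whose construction is not shown here.

```python
import numpy as np

def forward_model(x_k, dt_k, nn):
    """Euler-discretized neural-ODE surrogate: M(x_k) = x_k + dt_k * nn(x_k)."""
    return x_k + dt_k * nn(x_k)

def propagate_particle(x_k, dt_k, nn, A_next, v_next, zeta_x, rng):
    """One particle prediction from Eq. 12: x_{k+1} = M(x_k) + A_{k+1} v_{k+1} + w_k,
    with process noise w_k ~ N(0, zeta_x * I)."""
    w_k = rng.normal(0.0, np.sqrt(zeta_x), size=x_k.shape)
    return forward_model(x_k, dt_k, nn) + A_next @ v_next + w_k
```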

At each step, the vector \beta_{k+1} is updated by solving the following linear system:

R_{k+1}\beta_{k+1}=z_{k+1}-\rho(x_{k+1}-\bar{x}).
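In code, this update is a small linear solve. Since R_{k+1} need not be square, the sketch below solves the system in the least-squares sense (an implementation choice on our part), with `rho` denoting the projection matrix onto the reduced subspace and `x_bar` the snapshot mean.

```python
import numpy as np

def update_beta(R_next, z_next, x_next, rho, x_bar):
    """Solve R_{k+1} beta_{k+1} = z_{k+1} - rho (x_{k+1} - x_bar) for beta_{k+1}."""
    rhs = z_next - rho @ (x_next - x_bar)
    beta_next, *_ = np.linalg.lstsq(R_next, rhs, rcond=None)
    return beta_next
```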

The matrices A_{k+1} and R_{k+1} are computed as described in Step 3 of the framework (Section 4). The solution at time step k+1 is determined as:

x_{k+1}=x_{k}+\Delta t_{k}\,nn(x_{k}),

where \Delta t_{k}=t_{k+1}-t_{k}. The vector v_{k} is initialized as \mathbf{0}_{n\times 1}. The neural network architecture is structured as follows:

nn(x(t))=\sinh\left(\theta_{3}\theta_{2}\theta_{1}x(t)+\theta_{3}\theta_{2}b_{1}+b_{3}\right),
where \theta_{1}\in\mathbb{R}^{50\times 10},\ \theta_{2}\in\mathbb{R}^{50\times 50},\ \theta_{3}\in\mathbb{R}^{10\times 50},\ b_{1}\in\mathbb{R}^{50},\ b_{3}\in\mathbb{R}^{10}.
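A direct transcription of this architecture is given below; the random initialization is purely illustrative, and in the experiments the parameters are obtained by minimizing the cost function in Eq. 13.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shapes chosen so that theta3 @ theta2 @ theta1 maps R^10 to R^10.
theta1 = 0.1 * rng.standard_normal((50, 10))
theta2 = 0.1 * rng.standard_normal((50, 50))
theta3 = 0.1 * rng.standard_normal((10, 50))
b1 = np.zeros(50)
b3 = np.zeros(10)

def nn(x):
    """nn(x) = sinh(theta3 theta2 theta1 x + theta3 theta2 b1 + b3)."""
    return np.sinh(theta3 @ (theta2 @ (theta1 @ x + b1)) + b3)

# Flattened parameter vector: 500 + 2500 + 500 + 50 + 10 = 3560 entries.
theta = np.concatenate([p.ravel() for p in (theta1, theta2, theta3, b1, b3)])
assert theta.size == 3560
```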

The experiment involved 41 updates, with predictions performed over 100 timesteps. The number of particles in the particle filter step was set to 20,000. The noise terms w_{k} and \mu_{k} were assumed to follow normal distributions with zero mean and variances \zeta_{x}I and \zeta_{y}I, where \zeta_{x}=0.01 and \zeta_{y}=1\times 10^{-3}. The Laplacian matrix L was chosen as the Laplacian of the complete graph \mathbb{K}_{10}, and N=30 observations were randomly sampled over the time interval t=0 to t=0.05.

The cost function J (Eq. 13) for estimating the parameters of the neural ODE model is given by:

J(\theta)=\frac{1}{2}\sum_{k=1}^{N}\|x_{k}-\tilde{x}_{k}\|_{2}^{2}+\alpha_{1}\|\theta\|_{1}.   (13)

The parameter vector \theta, obtained by flattening \theta_{1},\theta_{2},\theta_{3},b_{1}, and b_{3}, is represented as \theta\in\mathbb{R}^{3560\times 1}. For experimentation, the parameters are estimated by minimizing Eq. 13 using the adjoint sensitivity method with an Euler discretization scheme. The number of clusters is fixed at 40.
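As a reference for the objective being minimized, the sketch below evaluates J(\theta) from Eq. 13 given the neural-ODE predictions and the reference snapshots at the N observation times. It does not implement the adjoint sensitivity computation itself; the array names are our own.

```python
import numpy as np

def cost_J(theta, x_pred, x_ref, alpha1):
    """J(theta) = 0.5 * sum_k ||x_k - x_tilde_k||_2^2 + alpha1 * ||theta||_1 (Eq. 13).

    x_pred, x_ref : arrays of shape (N, n) holding the neural-ODE predictions and the
    reference snapshots at the N observation times; theta is the flattened parameter
    vector; alpha1 weights the sparsity-promoting L1 penalty.
    """
    data_misfit = 0.5 * np.sum((x_pred - x_ref) ** 2)
    l1_penalty = alpha1 * np.sum(np.abs(theta))
    return data_misfit + l1_penalty
```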

Figure 5: Comparison of (a) the actual solution, (b) the neural-ODE solution, (c) the neural-ODE solution with our framework, and (d) the ROM solution for a random initial condition in the case of a linear dynamical system on graphs (\mathbb{D} in Section 2.2).
(a) Comparison of the absolute errors of the neural-ODE solution with our framework and the standalone neural-ODE solution. Black regions indicate grid points where the absolute error of the neural-ODE solution with our framework is lower than that of the standalone neural-ODE solution.
(b) Comparison of the absolute errors of the neural-ODE solution with our framework and the ROM solution. Black regions indicate grid points where the absolute error of the neural-ODE solution with our framework is lower than that of the ROM solution.
Figure 6: Comparison of absolute errors between the neural-ODE surrogate model with our framework, the standalone neural-ODE model, and the ROM solution. Black regions indicate grid points where the solution given by the framework achieves a lower absolute error than the other methods.

For a random initial condition, Figure 5 compares the actual solution with the standalone neural-ODE solution, the neural-ODE solution with our framework, and the ROM solution. The standalone neural-ODE solution (Figure 5(b)) exhibits higher noise levels than the actual solution (Figure 5(a)). Figure 6 contrasts the errors of the neural-ODE solution with our framework (Figure 5(c)) against those of the standalone neural-ODE solution (Figure 5(b)) and the ROM solution (Figure 5(d)) for the linear dynamical system (Equation 11). In the grid-wise comparison, the black regions in Figures 6(a) and 6(b) indicate areas where the error of the neural-ODE solution with our framework is lower than that of the standalone neural-ODE and ROM solutions, respectively. Notably, the neural-ODE solution with our framework exhibits a greater number of black regions, suggesting a closer resemblance to the actual solution than either the ROM or the standalone neural-ODE solution.

8   Conclusion and Discussion

In this study, we proposed a novel framework to improve the robustness of surrogate models against input perturbations, addressing a key challenge in modeling complex systems while ensuring computational efficiency. By integrating machine learning techniques with stochastic filtering, our approach demonstrates significant improvements in surrogate model accuracy under perturbations. The experimental results presented in the preceding sections provide strong validation of the framework’s effectiveness.

The versatility of our framework is evident from its application to dynamical systems represented on graphs and its extension to a general setting, where a neural-ODE-based surrogate model was employed to simulate complex physical phenomena. This not only highlights the robustness of our approach but also underscores its applicability across diverse domains, as discussed in Section 7.

Despite its strengths, our framework has certain limitations. Notably, its reliance on stochastic filtering methods introduces computational overhead and may also be susceptible to filter divergence in certain cases. While our study focuses on undirected graphs, many real-world complex systems involve directed networks, such as those governed by the complex Ginzburg-Landau equation [21]. Addressing directed graph structures presents an important avenue for future research, with potential extensions of our framework to handle these more intricate dynamical systems.

Acknowledgments

We thank Prof. S. Lakshmivarahan and Prof. Arun Tangirala for their insightful feedback, contributing to the enhancement of this work. We thank the anonymous reviewers for their insightful comments, which significantly improved the quality of the manuscript. This work was partially supported by the MATRIX grant MTR/2020/000186 of the Science and Engineering Research Board of India.

References

  • [1] Dirk Brockmann and Dirk Helbing. The hidden geometry of complex, network-driven contagion phenomena. Science, 342(6164):1337–1342, 2013.
  • [2] Sergei Maslov and I. Ispolatov. Propagation of large concentration changes in reversible protein-binding networks. Proceedings of the National Academy of Sciences, 104(34):13655–13660, 2007.
  • [3] Muruhan Rathinam and Linda R. Petzold. A new look at proper orthogonal decomposition. SIAM Journal on Numerical Analysis, 41(5):1893–1925, 2003.
  • [4] Suhan Kim and Hyunseong Shin. Data-driven multiscale finite-element method using deep neural network combined with proper orthogonal decomposition. Engineering with Computers, 40, 2023.
  • [5] B.A. Le, Julien Yvonnet, and Qi-Chang He. Computational homogenization of nonlinear elastic materials using neural networks. International Journal for Numerical Methods in Engineering, 104(12):1061–1084, 2015.
  • [6] Yanwen Huang and Yuanchang Deng. A hybrid model utilizing principal component analysis and artificial neural networks for driving drowsiness detection. Applied Sciences, 12(12), 2022.
  • [7] Redouane Lguensat, Pierre Tandeo, Pierre Ailliot, Manuel Pulido, and Ronan Fablet. The analog data assimilation. Monthly Weather Review, 145(10):4093–4107, 2017.
  • [8] D.C. Park and Yan Zhu. Bilinear recurrent neural network. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), 1994.
  • [9] D.-C. Park. A time series data prediction scheme using bilinear recurrent neural network. In 2010 International Conference on Information Science and Applications, pages 1–7, Seoul, Korea (South), 2010. IEEE.
  • [10] Julien Brajard, Alberto Carrassi, Marc Bocquet, and Laurent Bertino. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model. Journal of Computational Science, 44:101171, 2020.
  • [11] Peter D. Dueben and Peter Bauer. Challenges and design choices for global weather and climate models based on machine learning. Geoscientific Model Development, 11(10):3999–4009, 2018.
  • [12] Ronan Fablet, Souhaib Ouala, and Cédric Herzet. Bilinear residual neural network for the identification and forecasting of geophysical dynamics. In 2018 26th European Signal Processing Conference (EUSIPCO), pages 1477–1481, Rome, 2018. IEEE.
  • [13] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-net: Learning PDEs from data. In Proceedings of the 35th International Conference on Machine Learning, page 9, 2018.
  • [14] Marc Bocquet, Jérémie Brajard, Alberto Carrassi, and Laurent Bertino. Data assimilation as a learning tool to infer ordinary differential equation representations of dynamical models. Nonlinear Processes in Geophysics, 26(3):143–162, 2019.
  • [15] Marc Bocquet, Jérémie Brajard, Alberto Carrassi, and Laurent Bertino. Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Foundations of Data Science, 2(1):55–80, 2020.
  • [16] Pavel Sakov, Jean-Michel Haussaire, and Marc Bocquet. An iterative ensemble Kalman filter in the presence of additive model error. Quarterly Journal of the Royal Meteorological Society, 144(713):1297–1309, 2018.
  • [17] Alban Farchi, Patrick Laloyaux, Massimo Bonavita, and Marc Bocquet. Using machine learning to correct model error in data assimilation and forecast applications. Quarterly Journal of the Royal Meteorological Society, 147(739):3067–3084, jul 2021.
  • [18] John D. Hedengren, Reza Asgharzadeh Shishavan, Kody M. Powell, and Thomas F. Edgar. Nonlinear modeling, estimation and predictive control in APMonitor. Computers & Chemical Engineering, 70:133–148, 2014. Manfred Morari Special Issue.
  • [19] Lorenz T. Biegler. Nonlinear Programming. Society for Industrial and Applied Mathematics, 2010.
  • [20] P. T. Landsberg. The fourth law of thermodynamics. Nature, 238(5361):229–231, 1972.
  • [21] Giulia Cencetti, Pau Clusella, and Duccio Fanelli. Pattern invariance for reaction-diffusion systems on complex networks. Scientific Reports, 8(1), 2018.
  • [22] Chittaranjan Hens, Uzi Harush, Simi Haber, Reuven Cohen, and Baruch Barzel. Spatiotemporal signal propagation in complex networks. Nature Physics, 15(4):403–412, 2019.
  • [23] John M. Lewis, Sivaramakrishnan Lakshmivarahan, and Sudarshan Kumar Dhall. Dynamic Data Assimilation: A least squares approach. Cambridge Univ. Press, 2009.
  • [24] Tiancheng Li, Miodrag Bolic, and Petar M. Djuric. Resampling methods for particle filtering: Classification, implementation, and strategies. IEEE Signal Processing Magazine, 32(3):70–86, 2015.
  • [25] Fan R. K. Chung. Spectral Graph Theory. American Mathematical Society, Providence, RI, 1997.
  • [26] Richard L. Burden, J. Douglas Faires, and Annette M. Burden. Numerical analysis. Cengage Learning, 2016.
  • [27] Ming-Jun Lai, Jiaxin Xie, and Zhiqiang Xu. Graph sparsification by universal greedy algorithms, 2021. https://doi.org/10.48550/arXiv.2007.07161.
  • [28] Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.
  • [29] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances, 2009.
  • [30] Stéphane Boucheron, Gábor Lugosi, and Olivier Bousquet. Concentration inequalities, 2013.
  • [31] Miriam Farber and Ido Kaminer. Upper bound for the Laplacian eigenvalues of a graph, 2011. https://doi.org/10.48550/arXiv.1106.0769.
  • [32] Martin Grötschel, Sven Krumke, and Jörg Rambau. Online Optimization of Large Scale Systems. 2001.
  • [33] R. Donald Bartusiak. NLMPC: A platform for optimal control of feed- or product-flexible manufacturing. Assessment and Future Directions of Nonlinear Model Predictive Control, pages 367–381.
  • [34] Zoltan K. Nagy, Bernd Mahn, Rüdiger Franke, and Frank Allgöwer. Real-Time Implementation of Nonlinear Model Predictive Control of Batch Processes in an Industrial Framework, pages 465–472. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
  • [35] Abhishek Ajayakumar and Soumyendu Raha. Data assimilation for sparsification of reaction diffusion systems in a complex network, 2023. https://doi.org/10.48550/arXiv.2303.11943.
  • [36] Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer, 2006.
  • [37] David G. Luenberger and Yinyu Ye. Linear and nonlinear programming. Springer, 2021.
  • [38] Jeroen Elfring, Elena Torta, and Rob van de Molengraft. Particle filters: A hands-on tutorial. Sensors (Basel, Switzerland), 21(2):438, 2021.
  • [39] Wai Shing Fung and Nicholas J. A. Harvey. Graph sparsification by edge-connectivity and random spanning trees. CoRR, abs/1005.0265, 2010.
  • [40] Hanshu Yan, Jiawei Du, Vincent Y. F. Tan, and Jiashi Feng. On robustness of neural ordinary differential equations, 2022.