A Finite Element-Inspired Hypergraph Neural Network:
Application to Fluid Dynamics Simulations
Abstract
An emerging trend in deep learning research focuses on the applications of graph neural networks (GNNs) for mesh-based continuum mechanics simulations. Most of these learning frameworks operate on graphs wherein each edge connects two nodes. Inspired by the data connectivity in the finite element method, we present a method to construct a hypergraph by connecting the nodes by elements rather than edges. A hypergraph message-passing network is defined on such a node-element hypergraph that mimics the calculation process of local stiffness matrices. We term this method a finite element-inspired hypergraph neural network, in short FEIH($\phi$)-GNN. We further equip the proposed network with rotation equivariance, and explore its capability for modeling unsteady fluid flow systems. The effectiveness of the network is demonstrated on two common benchmark problems, namely the fluid flow around circular cylinder and airfoil configurations. Stabilized and accurate temporal roll-out predictions can be obtained using the $\phi$-GNN framework within the interpolation Reynolds number range. The network is also able to extrapolate moderately to Reynolds numbers above the training range.
Keywords. Graph neural network, hypergraph, finite element, rotation equivariance, fluid dynamics
1 Introduction
There has been an increasing interest in predicting and controlling the spatial-temporal dynamics of fluid flow based on the solution of Navier-Stokes equations [1, 2]. Since analytical solutions are usually not available, numerical solutions on discretized space and time domains are considered for predictive modeling. Leveraging state-of-the-art computational fluid dynamics (CFD) approaches based on finite volume [3] or finite element [4, 5] methods, one could obtain high-fidelity solutions that can be suitable for downstream design optimization and control purposes. However, the cost of performing such simulations is significant, and becomes prohibitively high for complex problems arising from real-world applications.
This limitation of traditional CFD approaches has inspired the development of data-driven projection-based reduced-order modeling techniques. Such models are usually used in an offline-online manner. In the offline stage, an approximation of the governing flow dynamics in a low-order linear subspace is constructed based on the available fluid flow data collected. This approximation reduces the complexity of the problem in the online stage, making it possible to acquire fast, accurate predictions. Popular methods in this category include proper orthogonal decomposition (POD) [6, 7], dynamic mode decomposition (DMD) [8], along with many variants (e.g., [9, 10, 11]). However, these methods encounter difficulty when applied to scenarios with high Reynolds numbers and convection-dominated problems, for which a significantly larger number of linear basis modes is needed to achieve a satisfactory approximation.
In the last decade, deep neural networks have been explored as alternatives to the aforementioned techniques. The convolutional neural network, especially the convolutional encoder-propagator-decoder architecture, has been adopted in many applications, including flow around rigid bodies in both 2D [12, 13, 14, 15] and 3D [16] scenarios, interactions between fluid flow and moving solid bodies [17, 18, 19], wave and sound propagation [20, 21], and many more. By exploiting the computational power of modern graphics processing units (GPUs), these convolutional neural networks are usually orders of magnitude faster than traditional full-order CFD simulations.
While convolutional neural networks have achieved numerous successes, they are inherently restricted to operate on a uniform Cartesian grid, which limits their capabilities in scenarios with complex geometries. Furthermore, for the same reason, one cannot use different grid resolutions between areas of minor interest and areas with important dynamics. Under such a restriction, one has to either introduce a dense Cartesian grid covering the whole field, or tolerate low resolution at regions of interest. The former is not always feasible due to computational resource limitations and prediction speed requirements. Therefore, the latter choice is usually inevitable, leading to the loss of fine physical details and subsequent difficulties in calculating important physical statistics such as the lift and drag forces.
It turns out that the (unstructured) computational mesh used for traditional CFD simulations can be converted to a graph with nodes connected by edges rather easily. Graph neural networks that work on such graphs therefore have the ability to address the aforementioned difficulty with convolutional neural networks. One can consider two approaches for the conversion of a mesh (Fig. 1a) into a graph. One approach (Fig. 1b) converts each vertex of the mesh to a node, and each cell boundary between two vertices to two directed edges. The other approach (Fig. 1c) converts each cell within the mesh into a node, while each border between two neighboring cells is converted to two directed edges.

With the system states originally attached to the mesh converted to node and edge features, a graph neural network can be applied to learn the temporal evolution of these features, effectively serving as a surrogate to the traditional CFD model of the system. As a result, GNNs can maintain a significantly better resolution for important regions within the domain, while keeping the same overall mesh size as the CNNs. With this clear advantage over CNNs, graph neural networks have recently been introduced to the field of continuum mechanics [22, 23], inspiring a surge of works in the past three years. In particular, the encode-process-decode architecture with generalized graph message-passing layers [23, 24, 25] has seen popularity in many applications, including flow around fixed bodies such as cylinders, airfoils or buildings [23, 26, 27], reacting flows [28], flow field super-resolution [22, 28], flow field completion [29], etc. Additional techniques and designs are also being developed to work with this architecture, like on-the-fly graph adaptation [23], multi-grid methods [26, 30, 31, 32], and rotation equivariance [26], among many more.
For a graph neural network that fits into the generalized graph message-passing [24, 25] framework, each message-passing layer can be written as the combination of an edge update stage
$$\mathbf{e}'_{ij} = f^{e}\left(\mathbf{e}_{ij},\, \mathbf{v}_i,\, \mathbf{v}_j\right), \tag{1a}$$
and a node update stage,
$$\mathbf{v}'_{j} = f^{v}\left(\mathbf{v}_j,\, \rho\left(\left\{\mathbf{e}'_{ij}\right\}_{i \in \mathcal{N}(j)}\right)\right), \tag{1b}$$
in which $\mathbf{v}_i$ and $\mathbf{v}_j$ denote the node features attached to the nodes $i$ and $j$ respectively, $\mathbf{e}_{ij}$ denotes the edge feature corresponding to the directed edge pointing from node $i$ to node $j$, and the superscript $(\cdot)'$ denotes the updated features. The edge update function $f^{e}$ and the node update function $f^{v}$ are some nonlinear functions, e.g., multi-layer perceptrons. The function $\rho$ is a permutation-invariant aggregation function that aggregates the information from all the edges pointing to each node $j$.
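For illustration, a minimal PyTorch sketch of one such generalized message-passing layer is given below. The MLP sizes, the summation choice for $\rho$, and all names are our own illustrative assumptions rather than the configuration of any cited work.

```python
import torch
import torch.nn as nn

class EdgeNodeMPLayer(nn.Module):
    """One generalized message-passing layer (Eq. 1): edge update, then node update."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.f_edge = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.f_node = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, v, e, senders, receivers):
        # v: [num_nodes, dim] node features; e: [num_edges, dim] edge features
        # senders/receivers: [num_edges] endpoint indices of each directed edge
        e_new = self.f_edge(torch.cat([v[senders], v[receivers], e], dim=-1))  # Eq. (1a)
        # rho: sum-aggregate incoming edge messages at each receiving node
        agg = torch.zeros_like(v).index_add_(0, receivers, e_new)
        v_new = self.f_node(torch.cat([v, agg], dim=-1))                       # Eq. (1b)
        return v_new, e_new
```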
From a computational mechanics point of view, such a message-passing network defined on the graph mimics the finite volume method [3]. Interpreting each node as a control volume, and each edge as the boundary between neighboring control volumes, the edge update function calculates the flux of information between the two control volumes, and the node update function evaluates the information in the control volume based on the aggregated incoming flux. With residual links [33] for the node and edge features, each layer or step within the message-passing network can be interpreted as one iteration of an iterative approximation towards the ground-truth flux and control volume updates.
Besides the finite volume method, many other approaches are available in computational mechanics. The finite element method, for example, has been used as inspiration for some recent works. Alet et al. [34] construct encoders and decoders that interpolate between data at random points within the field and data on the nodes of the graph, mimicking the behavior of shape function-based interpolation within the element in finite element methods. Gao et al. [35], more recently, proposed to calculate the loss by integrating the prediction error over the simulation domain using Gaussian quadrature with high-order shape functions, instead of the traditional mean-squared loss on nodes.
In contrast to these works, we take inspiration from the data connectivity of the finite element method and connect the nodes by elements, effectively forming a hypergraph, illustrated in Fig. 2b. Further modifications of the hypergraph as detailed in Sec. 2.2 lead to a node-element hypergraph (Fig. 2c). We further mimic the local stiffness matrix calculation process in finite element analysis, and design a hypergraph message-passing network that works with the node-element hypergraph converted from a computational mesh. We refer to this as a finite element-inspired hypergraph neural network, or in short, FEIH($\phi$)-GNN. We further equip the $\phi$-GNN with rotation equivariance via appropriate input and output feature transformations, and apply the network to the simulation of fluid flow around obstacles. For the assessment of our $\phi$-GNN, numerical experiments are systematically performed on two classic fluid flow configurations, namely the flow around a cylinder and the flow around an airfoil. Through our numerical experiments, we demonstrate stabilized and accurate predictions of the flow field as well as the lift and drag statistics calculated from it.

To the best of our knowledge, the most closely-related work to this research is the recent work by Lienen and Günnemann [36], who employed a single hypergraph message-passing step to estimate the time derivatives of system states at each node, which are then sent to an ODE solver to generate predictions of the system states at future time steps. In contrast to the approach adopted in [36], our work has similarities with that of Pfaff et al. [23]. For example, we introduce more non-local information by stacking multiple hypergraph message-passing layers, and we directly predict the difference of system states between neighboring time steps (i.e., using a forward Euler time discretization) rather than relying on an ODE solver. We also target stable and accurate predictions over a long horizon, with tests on four thousand time steps into the future starting from one single time step, while the results reported by Lienen and Günnemann only include predictions over a short period into the future (at most 60 time steps).
The remaining part of this paper is organized as follows: Section 2 presents the conversion from mesh to node-element hypergraph, the derivation of the $\phi$-GNN layers from the stiffness matrix assembly process, as well as the network architecture. Section 3 discusses the application of the $\phi$-GNN in fluid flow simulations. Sec. 4 discusses the experimental setup and model training details, with the results presented in Sec. 5. We conclude the work in Sec. 6.
2 Methodology
In this section, we introduce our node-element hypergraph together with the hypergraph message-passing layer that works with it. Starting from a general continuum dynamical system, we discretize it in space and time, and then describe the target function to be approximated with our $\phi$-GNN model in Sec. 2.1. We proceed to explain the conversion of the mesh employed to discretize the system in space to a node-element hypergraph in Sec. 2.2, followed by a detailed derivation of the hypergraph message-passing layer from the local stiffness matrix calculation process in Secs. 2.3 and 2.4. The complete architecture of the network is presented in Sec. 2.5.
2.1 Surrogate model of a continuum system
Consider a continuum dynamical system defined in a bounded spatial domain $\Omega$, with governing equation in a general form
$$\frac{\partial y}{\partial t} = f\left(y, u\right), \tag{2}$$
in which $y$ is the system state and $u$ denotes any input to the system. We discretize in time with the forward Euler scheme
$$y^{n+1} = y^{n} + \Delta t\, f\left(y^{n}, u^{n}\right), \tag{3}$$
where the superscript $n$ denotes the $n$-th time step. Further discretization in space with a mesh leads to
$$\mathbf{Y}^{n+1} = \mathbf{Y}^{n} + \Delta t\, \mathbf{F}\left(\mathbf{Y}^{n}, \mathbf{U}^{n}\right), \tag{4}$$
in which $\mathbf{Y}^{n}$ denotes the system state vector at time step $n$, and $\mathbf{U}^{n}$ denotes the system input vector at time step $n$. The function $\mathbf{F}$ that governs the evolution of the system in time can be approximated by a surrogate model $\hat{\mathbf{F}}$. In this work, we construct such a surrogate model with a graph neural network, which leverages the geometric relationship between the data points. As such a relationship is already prescribed by the mesh, it is preferable to convert the mesh into a graph for subsequent use with the graph neural network.
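In sketch form, the roll-out of such a surrogate reduces to a simple autoregressive loop; the snippet below is schematic, with `surrogate` standing in for the trained approximation of $\Delta t\, \mathbf{F}$.

```python
import torch

def rollout(surrogate, y0: torch.Tensor, u_seq: list) -> list:
    """Forward-Euler roll-out (Eq. 4): Y^{n+1} = Y^n + surrogate(Y^n, U^n)."""
    y, trajectory = y0, [y0]
    for u in u_seq:
        y = y + surrogate(y, u)  # the surrogate predicts the one-step state increment
        trajectory.append(y)
    return trajectory
```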
As discussed in the introduction, one could transform the mesh into a graph consisting of nodes connected by edges (cf. Fig. 1), and a generalized graph message-passing network operating on such a graph would mimic the finite volume method. Instead of following this approach, we take inspiration from the data connectivity in the finite element method, and transform the mesh into a hypergraph via a different approach in this work, discussed in detail in the subsequent Sec. 2.2.
2.2 From mesh to node-element hypergraph
Consider a bounded spatial domain that is meshed (cf. Fig. 2a). In the finite element method, one treats each cell as an element connecting all the vertices of the cell. Mimicking this behavior, we convert each cell within the mesh into an undirected hyperedge (an "element") that connects all the vertices of the cell, and convert each of these vertices to a node, leading to an undirected hypergraph as shown in Fig. 2b. We further explicitly define the connection between each element and each of the nodes it connects as an undirected element-node edge, and this gives us a hypergraph $\mathcal{H} = \left(\mathcal{V}, \mathcal{C}, \mathcal{W}\right)$, in which $\mathcal{V}$ is the set of all nodes, $\mathcal{C}$ is the set of all elements, and $\mathcal{W}$ is the set of all element-node edges, as illustrated in Fig. 2c. We coin the name node-element hypergraph for such a hypergraph $\mathcal{H}$.
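As an implementation note, this conversion is a light bookkeeping step. A minimal sketch, assuming the quadrilateral mesh is stored as an integer cell-connectivity array (our assumed format), is:

```python
import numpy as np

def mesh_to_hypergraph(cells: np.ndarray):
    """Convert a quad mesh into node-element hypergraph connectivity.

    cells: [num_cells, 4] array; row l lists the vertex indices of cell l.
    Returns (element_ids, node_ids): paired index arrays, one entry per
    undirected element-node edge, usable for gather/scatter operations.
    """
    num_cells, nodes_per_cell = cells.shape
    element_ids = np.repeat(np.arange(num_cells), nodes_per_cell)  # element of each edge
    node_ids = cells.reshape(-1)                                   # node of each edge
    return element_ids, node_ids

# Usage: two quads sharing an edge -> 8 element-node edges
cells = np.array([[0, 1, 2, 3], [1, 4, 5, 2]])
elem, node = mesh_to_hypergraph(cells)
```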
With the node-element hypergraph converted from the mesh, one could proceed to transform the mesh information and system states and attach them as features. These transformations are usually specific to the problem in concern, and we thus delay the discussion of these details to Sec. 3. In the next subsection, we assume that the system states and mesh information are transformed into some node features, element features, and element-node edge features, and define a message-passing network in the general form.
Remark 1.
It should be mentioned that it is possible to adopt a directed hypergraph, i.e., assigning a sequence to the nodes connected by the elements, rather than an undirected version. Ma et al. [37], for example, used a directed hypergraph for the simulation of particulate suspensions. However, to achieve permutation invariance by covering all possible permutations, each undirected hyperedge connecting $n$ nodes has to be converted to $n!$ directed hyperedges. This means that the computational cost can be very high when each hyperedge connects more than three nodes (e.g., hypergraphs converted from quadrilateral or hexagonal meshes in 2D, or hypergraphs converted from most mesh types in 3D); a single quadrilateral cell alone would already require $4! = 24$ directed hyperedges. We therefore adopt an undirected hypergraph in this work.
2.3 Message-passing layer on node-element hypergraph
We now proceed to define a graph message-passing network that works with the node-element hypergraph. Taking further inspiration from the finite element method, we design hypergraph message-passing layers that mimic the local stiffness matrix assembly process. In the remaining part of this subsection, we start from the calculation of the element stiffness matrix for a simple Poisson problem (Eq. 8), and proceed step-by-step towards the formulation of the hypergraph message-passing layers (Eq. 19 and Eq. 21). For the sake of consistency and simplicity in symbols, we write the equations in this subsection and the remaining parts of the paper assuming a 2D quadrilateral mesh and the corresponding converted node-element hypergraph (cf. Fig. 2).
2.3.1 Finite element form for Poisson equation
To begin, consider a simple Poisson equation defined on a 2D bounded domain $\Omega$,
$$-\nabla \cdot \left(k \nabla u\right) = s \quad \text{in } \Omega, \tag{5}$$
for some scalar $k$ and source term $s$. The weak form of the problem can then be written as
$$\int_{\Omega} k\, \nabla u \cdot \nabla w\, \mathrm{d}\Omega = \int_{\Omega} s\, w\, \mathrm{d}\Omega, \tag{6}$$
with $w$ being the test function. Discretizing the domain with a quadrilateral mesh, the matrix form can be written as
$$\mathbf{K} \mathbf{u} = \mathbf{s}, \tag{7}$$
with the stiffness matrix $\mathbf{K}$ assembled from the element stiffness matrices $\mathbf{K}^{e}$. We now focus on the element stiffness matrix of an element that connects four vertices $i$, $j$, $m$, and $n$. Assuming 4-node bi-linear elements and four Gaussian quadrature points, the element stiffness matrix can be summed from the values at individual quadrature points $q$,
$$\mathbf{K}^{e} = \sum_{q=1}^{4} \mathbf{K}^{e}_{q}. \tag{8}$$
To calculate $\mathbf{K}^{e}_{q}$, we first consider the Jacobian matrix, which can be written as
$$\mathbf{J} = \begin{bmatrix} \partial x / \partial \xi & \partial x / \partial \eta \\ \partial y / \partial \xi & \partial y / \partial \eta \end{bmatrix}, \tag{9}$$
where $\xi$ and $\eta$ denote the two axes of the iso-parametric element. With the chain rule, the Jacobian matrix is calculated through
$$\mathbf{J} = \sum_{i' \in \{i,j,m,n\}} \begin{bmatrix} \dfrac{\partial N_{i'}}{\partial \xi} x_{i'} & \dfrac{\partial N_{i'}}{\partial \eta} x_{i'} \\[4pt] \dfrac{\partial N_{i'}}{\partial \xi} y_{i'} & \dfrac{\partial N_{i'}}{\partial \eta} y_{i'} \end{bmatrix}, \tag{10}$$
where $N_{i'}$ denotes the shape function and the subscript $i'$ denotes the vertex index. Note that the values of the derivatives $\partial N_{i'} / \partial \xi$ and $\partial N_{i'} / \partial \eta$ only depend on $\xi$ and $\eta$, meaning that if we flatten the Jacobian matrix into a (column) vector, it can be written as
$$\mathrm{vec}\left(\mathbf{J}\right) = \rho\left(\left\{ g\left(\mathbf{x}_{i'}\right) \right\}_{i' \in \{i,j,m,n\}}\right), \tag{11}$$
where $\mathrm{vec}(\cdot)$ denotes the flattening operation, the symbol $g$ here (and in the rest of this subsection) denotes some function, and $\rho$ denotes an aggregation function, which is a summation in this case.
The calculated Jacobian matrix is used to calculate the gradient of the shape function at quadrature point $q$,
$$\nabla N_{i'} = \mathbf{J}^{-\mathsf{T}} \begin{bmatrix} \partial N_{i'} / \partial \xi \\ \partial N_{i'} / \partial \eta \end{bmatrix}. \tag{12}$$
Since $\mathbf{J}$ is a $2 \times 2$ matrix, its inverse transpose is a function of its determinant and the value of its individual entries,
$$\mathbf{J}^{-\mathsf{T}} = \frac{1}{\det \mathbf{J}} \begin{bmatrix} J_{22} & -J_{21} \\ -J_{12} & J_{11} \end{bmatrix}. \tag{13}$$
This means that Eq. 12 can be rewritten as
$$\nabla N_{i'} = g\left(\mathrm{vec}\left(\mathbf{J}\right), \det \mathbf{J}\right). \tag{14}$$
With the gradient of shape functions available, the next step is to calculate the entries of the element stiffness matrix. Note that within the element stiffness matrix $\mathbf{K}^{e}_{q}$, only a vector (row) $\mathbf{k}^{e}_{q,i'}$ will be assembled to each of the vertices $i' \in \{i,j,m,n\}$, and this vector can be calculated through
$$\mathbf{k}^{e}_{q,i'} = w_{q}\, k\, \left(\nabla N_{i'}\right)^{\mathsf{T}} \mathbf{B}\, \det \mathbf{J}, \qquad \mathbf{B} = \begin{bmatrix} \nabla N_{i} & \nabla N_{j} & \nabla N_{m} & \nabla N_{n} \end{bmatrix}, \tag{15}$$
where $w_{q}$ is the quadrature weight. Note that the matrix $\mathbf{B}$ does not depend on $i'$ and is therefore only a function of the Jacobian matrix and its determinant,
$$\mathbf{B} = g\left(\mathrm{vec}\left(\mathbf{J}\right), \det \mathbf{J}\right). \tag{16}$$
Combining Eq. 15 with Eq. 14 and 16 gives us
$$\mathbf{k}^{e}_{q,i'} = g\left(\mathrm{vec}\left(\mathbf{J}\right), \det \mathbf{J}\right) \approx g\left(\mathrm{vec}\left(\mathbf{J}\right), A^{e}\right). \tag{17}$$
The last step in Eq. 17 approximates $\det \mathbf{J}$ by the area $A^{e}$ of the cell corresponding to the element. Since $\mathbf{K}^{e}$ is simply a summation of $\mathbf{K}^{e}_{q}$ over the different quadrature points $q$, we can rewrite Eq. 11 and 17 as
$$\left[\mathrm{vec}\left(\mathbf{J}_{1}\right); \ldots; \mathrm{vec}\left(\mathbf{J}_{4}\right)\right] = \rho\left(\left\{ g\left(\mathbf{x}_{i'}\right) \right\}_{i' \in \{i,j,m,n\}}\right) \tag{18a}$$
and
$$\mathbf{k}^{e}_{i'} = \sum_{q} \mathbf{k}^{e}_{q,i'} = g\left(\mathrm{vec}\left(\mathbf{J}_{1}\right), \ldots, \mathrm{vec}\left(\mathbf{J}_{4}\right), A^{e}\right). \tag{18b}$$
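As a simple sanity check of these expressions (our own example, not part of the original derivation), consider a unit-square element with vertices $(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$ and bi-linear shape functions $N_{i'} = \frac{1}{4}\left(1 \pm \xi\right)\left(1 \pm \eta\right)$. Eq. 10 then gives, at every quadrature point,
$$\mathbf{J} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/2 \end{bmatrix}, \qquad \det \mathbf{J} = \frac{1}{4} = \frac{A^{e}}{4},$$
so that $\det \mathbf{J}$ is proportional to the cell area $A^{e}$ and the constant factor can be absorbed into the function $g$. This is the sense in which the last step of Eq. 17 replaces $\det \mathbf{J}$ by $A^{e}$; the substitution is exact for affine elements and approximate for distorted ones.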
2.3.2 Conversion to hypergraph message-passing layer
The next step is to convert Eq. 18 into a form that is suitable as a hypergraph neural network layer in general application scenarios. With the node-element hypergraph defined in Sec. 2.2, we assume that the available system information at each vertex is converted to a node feature $\mathbf{v}_{i}$, the system information at each cell is converted to an element feature $\mathbf{c}_{l}$, and the system information at each cell that is specific to a certain vertex connected by it is converted to an element-node edge feature $\mathbf{w}_{l,i}$. Under such an assumption, we rewrite Eq. 18 in a general form
$$\mathbf{c}'_{l} = f^{c}\left(\mathbf{c}_{l},\, \rho\left(\left\{ g^{c}\left(\mathbf{v}_{i'}, \mathbf{w}_{l,i'}\right) \right\}_{i' \in \mathcal{N}(l)}\right)\right), \tag{19a}$$
$$\mathbf{w}'_{l,i'} = f^{w}\left(\mathbf{c}'_{l},\, \mathbf{w}_{l,i'},\, \mathbf{v}_{i'}\right), \tag{19b}$$
which is equivalent to Eq. 18 by designating
$$g^{c}\left(\mathbf{v}_{i'}, \mathbf{w}_{l,i'}\right) = g\left(\mathbf{x}_{i'}\right), \qquad f^{c}\left(\mathbf{c}_{l}, \cdot\right) = \left[\,\cdot\,;\ \mathbf{c}_{l}\right], \tag{20a}$$
$$f^{w}\left(\mathbf{c}'_{l}, \cdot, \cdot\right) = g\left(\mathbf{c}'_{l}\right), \tag{20b}$$
choosing the aggregation function $\rho$ as summation, and specifying that the function $f^{w}$ does not use the additional inputs $\mathbf{w}_{l,i'}$ and $\mathbf{v}_{i'}$ in Eq. 19b.
In the stiffness matrix assembly process of the finite element method, the vectors $\mathbf{k}^{e}_{i'}$ obtained through Eq. 18b for different elements will be assembled to different locations within the stiffness matrix. For the neural network, since the features are usually embedded in high dimensions, we simplify this process to an aggregation rather than an assembly,
$$\mathbf{v}'_{i} = f^{v}\left(\mathbf{v}_{i},\, \rho\left(\left\{ \mathbf{w}'_{l,i} \right\}_{l \in \mathcal{N}(i)}\right)\right), \tag{21}$$
in which $\rho$ denotes an aggregation function over all elements $l \in \mathcal{N}(i)$ that connect node $i$ with other nodes.
We describe Eq. 19a as the element update stage, Eq. 19b and 21 as the node update stage, and write the two stages together as a single message-passing layer
$$\left(\mathcal{V}', \mathcal{C}', \mathcal{W}'\right) = \mathrm{MP}_{k}\left(\mathcal{V}, \mathcal{C}, \mathcal{W}\right), \tag{22}$$
in which the subscript $k$ is used to distinguish between different message-passing layers. An illustration of the two hypergraph message-passing stages described in Eq. 19 and Eq. 21 is plotted in Fig. 3.
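A minimal PyTorch sketch of one such layer is given below. The mean aggregation and residual links follow the setup described later in Sec. 4.2, while the SiLU activation (used here for brevity in place of the sinusoidal activation), tensor layouts, and names are our own illustrative assumptions.

```python
import torch
import torch.nn as nn

def mlp(in_dim: int, out_dim: int, width: int = 128) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, width), nn.SiLU(),
                         nn.Linear(width, width), nn.SiLU(),
                         nn.Linear(width, out_dim))

def mean_agg(src: torch.Tensor, index: torch.Tensor, num: int) -> torch.Tensor:
    """Permutation-invariant mean aggregation via scatter-add."""
    out = torch.zeros(num, src.shape[-1], device=src.device).index_add_(0, index, src)
    cnt = torch.zeros(num, 1, device=src.device).index_add_(
        0, index, torch.ones(index.shape[0], 1, device=src.device))
    return out / cnt.clamp(min=1.0)

class HypergraphMPLayer(nn.Module):
    """Element update (Eq. 19a), edge update (Eq. 19b), node update (Eq. 21)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.f_elem = mlp(3 * dim, dim)  # f^c in Eq. 19a
        self.f_edge = mlp(3 * dim, dim)  # f^w in Eq. 19b
        self.f_node = mlp(2 * dim, dim)  # f^v in Eq. 21

    def forward(self, v, c, w, elem_ids, node_ids):
        # v: [N, dim] node feats; c: [L, dim] element feats; w: [M, dim] edge feats
        # elem_ids[m], node_ids[m]: the element and node joined by element-node edge m
        to_elem = mean_agg(torch.cat([w, v[node_ids]], -1), elem_ids, c.shape[0])
        c = c + self.f_elem(torch.cat([c, to_elem], -1))                            # Eq. 19a
        w = w + self.f_edge(torch.cat([w, c[elem_ids], v[node_ids]], -1))           # Eq. 19b
        v = v + self.f_node(torch.cat([v, mean_agg(w, node_ids, v.shape[0])], -1))  # Eq. 21
        return v, c, w
```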

2.4 Multi-layer perceptrons
The behavior of the element and node update stages described in Eq. 19 and 21 is largely determined by the choice of the functions $f^{c}$, $f^{w}$, and $f^{v}$. For the purpose of the neural network, we wish to retain the flexibility to learn any dynamics rather than just the local stiffness matrix calculation process, and it is therefore preferable that these functions are universal approximators that can be trained to approximate any function. Under this consideration, we follow the practice shared by most of the works discussed in the introduction, and choose each of these functions to be a separate multi-layer perceptron, which can be written as
$$f_{\mathrm{MLP}}\left(\mathbf{z}\right) = l_{L} \circ l_{L-1} \circ \cdots \circ l_{1}\left(\mathbf{z}\right), \tag{23}$$
in which the symbol $\circ$ denotes function composition, and $\mathbf{z}$ is the input vector to the multi-layer perceptron. Each layer $l$ within the multi-layer perceptron is defined as
$$l\left(\mathbf{z}\right) = \sigma\left(\mathbf{W} \mathbf{z} + \mathbf{b}\right), \tag{24}$$
where $\mathbf{W} \mathbf{z} + \mathbf{b}$ is a linear transformation of the input vector $\mathbf{z}$, and $\sigma$ is a non-linear function called the activation function. The values of all entries within the matrix $\mathbf{W}$ and the vector $\mathbf{b}$ are trainable.
2.5 Network architecture
With the message-passing layer defined on the node-element hypergraph, we are able to construct the surrogate graph neural network model $\hat{\mathbf{F}}$ in Eq. 4. Following the encode-process-decode architecture [23], the network includes encoders, a series of message-passing layers, and decoders, stacked together in a feedforward fashion. Separate encoders and decoders are constructed for the node features, element-node edge features, and element features respectively using multi-layer perceptrons. The model is summarized in Algorithm 1.
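In code form, the encode-process-decode stack can be sketched as follows, reusing the layer sketch from Sec. 2.3.2; the depth and widths are placeholders rather than the reported configuration.

```python
class FEIHGNN(nn.Module):
    """Encode-process-decode on the node-element hypergraph (cf. Algorithm 1)."""
    def __init__(self, node_in, elem_in, edge_in, edge_out, dim=128, depth=15):
        super().__init__()
        self.enc_v = mlp(node_in, dim)   # node feature encoder
        self.enc_c = mlp(elem_in, dim)   # element feature encoder
        self.enc_w = mlp(edge_in, dim)   # element-node edge feature encoder
        self.layers = nn.ModuleList(HypergraphMPLayer(dim) for _ in range(depth))
        self.dec_w = mlp(dim, edge_out)  # only edge outputs drive the time stepping

    def forward(self, v, c, w, elem_ids, node_ids):
        v, c, w = self.enc_v(v), self.enc_c(c), self.enc_w(w)
        for layer in self.layers:
            v, c, w = layer(v, c, w, elem_ids, node_ids)
        return self.dec_w(w)  # predicted per-edge state increments (cf. Eq. 33)
```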
3 Implementation for fluid dynamics simulations
In this section, we present a framework that applies the $\phi$-GNN defined in Sec. 2 to the surrogate modeling of fluid systems. We start with the transformation of the mesh and system states into the appropriate features in Sec. 3.1, and then describe the pre-processing (Sec. 3.2) and post-processing (Sec. 3.3) procedures that are used alongside the network to form the complete framework.
3.1 Feature transformation, translation invariance and rotation equivariance
We consider a fluid system defined on a bounded 2D domain with a given mesh, with the flow velocity $\mathbf{u}_{i}$ and pressure $p_{i}$ available at each vertex $i$ within the mesh, along with the coordinate $\mathbf{x}_{i}$ of the vertex itself. We assume that the Reynolds number $Re$ is known. The mesh is assumed to be body-fitting, i.e., each system boundary is defined as a set of mesh cell boundaries, with boundary conditions known. The input features to the neural network are transformed from this information.
Similar to the characteristics of traditional computational fluid dynamics solvers, we expect the network to satisfy a series of invariance and equivariance restrictions. In particular, the predicted pressure values on the vertices are expected to be invariant when the input system translates and/or rotates on the 2D plane. The predicted flow velocity vectors, on the other hand, are not expected to change when the system translates, but are expected to rotate together with the system. These properties are not necessarily satisfied by a graph neural network by default, and therefore the available information $\mathbf{u}_{i}$, $p_{i}$ and $\mathbf{x}_{i}$ on the vertices needs to be transformed into suitable forms before being attached as features.
Geometry features
We first transform the geometry information, namely the coordinates of the vertices, to graph features that are invariant to translation and rotation of the system. This is achieved by converting the global coordinate values to local geometry features within each element. Given an element $l$ (Fig. 4a) connecting nodes $i$, $j$, $m$, and $n$, we also give a coordinate $\mathbf{x}_{l}$ to the element. For simplicity, we designate that this coordinate is at the center of the cell,
$$\mathbf{x}_{l} = \overline{\mathbf{x}_{i'}}, \quad i' \in \{i, j, m, n\}, \tag{25}$$
with the overline denoting the mean over the vertices $i'$. The local coordinates of the nodes relative to the element can then be written as
$$\Delta \mathbf{x}_{l,i'} = \mathbf{x}_{i'} - \mathbf{x}_{l}. \tag{26}$$
The shape and size of the element can then be fully constrained (in fact over-constrained) by the lengths $\left\|\Delta \mathbf{x}_{l,i'}\right\|$ of the local coordinate vectors, the angles $\theta_{l,i'}$ at the four corners of the element, and the area $A_{l}$ of the element, as marked out in Fig. 4b. These features satisfy the desired translation and rotation invariance properties, and are therefore suitable to be attached as features.
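A sketch of this geometry-feature construction, assuming the cell-connectivity layout used in the conversion sketch of Sec. 2.2 (all function and variable names are ours), is:

```python
import numpy as np

def geometry_features(x_nodes: np.ndarray, cells: np.ndarray):
    """Translation/rotation-invariant geometry features per element (Eqs. 25-26).

    x_nodes: [N, 2] vertex coordinates; cells: [L, 4] vertex indices per cell.
    Returns cell centers, local coordinate vectors, their lengths, corner
    angle cosines, and cell areas.
    """
    corners = x_nodes[cells]                          # [L, 4, 2]
    centers = corners.mean(axis=1)                    # Eq. 25: element coordinate
    rel = corners - centers[:, None, :]               # Eq. 26: local coordinates
    lengths = np.linalg.norm(rel, axis=-1)            # |Delta x_{l,i'}|
    # interior angle at each corner, between the two adjacent cell edges
    prev_e = corners - np.roll(corners, 1, axis=1)
    next_e = np.roll(corners, -1, axis=1) - corners
    cosang = np.einsum("lkd,lkd->lk", -prev_e, next_e) / (
        np.linalg.norm(prev_e, axis=-1) * np.linalg.norm(next_e, axis=-1))
    # shoelace formula for the cell area
    x, y = corners[..., 0], corners[..., 1]
    area = 0.5 * np.abs(np.sum(x * np.roll(y, -1, axis=1)
                               - np.roll(x, -1, axis=1) * y, axis=1))
    return centers, rel, lengths, cosang, area
```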

Fluid flow features
The available fluid flow information comprises the flow velocity $\mathbf{u}_{i}$ and pressure $p_{i}$ at each node, the Reynolds number $Re$, as well as the set of boundary conditions. With the geometry features being translation and rotation invariant, no specific treatment is needed for the scalar pressure values. The flow velocity, on the other hand, requires specific treatment. Similar to the approach adopted by Lino et al. [26] for graph neural networks defined on normal graphs, we project the flow velocity vector of each vertex (node) onto the directions of the unit local coordinate vectors of the node,
$$a_{l,i} = \mathbf{u}_{i} \cdot \hat{\mathbf{e}}_{l,i}, \qquad \hat{\mathbf{e}}_{l,i} = \frac{\Delta \mathbf{x}_{l,i}}{\left\|\Delta \mathbf{x}_{l,i}\right\|}, \tag{27}$$
as illustrated in Fig. 4c. The weights $a_{l,i}$ of these projections are suitable to be used as features. Following the practice of Pfaff et al. [23], we transform the boundary conditions into a one-hot vector $\mathbf{b}_{i}$ for each node $i$.
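A small sketch of the projection in Eq. 27, reusing the index and geometry arrays from the previous sketches:

```python
def project_velocity(u, x_nodes, x_elems, elem_ids, node_ids):
    """Project nodal velocities onto unit local coordinate vectors (Eq. 27).

    u: [N, 2] nodal velocities; x_nodes: [N, 2]; x_elems: [L, 2] cell centers.
    Returns one scalar projection weight per element-node edge.
    """
    rel = x_nodes[node_ids] - x_elems[elem_ids]               # local coordinate vectors
    e_hat = rel / np.linalg.norm(rel, axis=1, keepdims=True)  # unit directions
    return np.einsum("md,md->m", u[node_ids], e_hat)          # weights a_{l,i}
```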
3.2 Preprocessing, graph feature attachment
With the system information transformed to features of suitable form, we proceed to attach them to the graph. Since the system boundaries are body-fitting, the one-hot boundary condition vector is attached to the node feature at each node $i$:
$$\mathbf{v}_{i} = \mathbf{b}_{i}. \tag{28}$$
It is intuitive to attach the element area to the element feature $\mathbf{c}_{l}$. As the cell areas at different regions within the mesh can vary by orders of magnitude, and the cell areas are strictly greater than zero, we take the natural logarithm of the actual area values,
$$\mathbf{c}_{l} = \left[\ln A_{l},\ 0\right]^{\mathsf{T}}, \tag{29}$$
in which the constant second entry does not provide additional information, and is only used to prevent the existence of a feature vector of length 1.
The remaining system information is attached to the element-node edge features. The pressure values $p_{i}$, originally defined on each node $i$, are gathered to all the element-node edges connected to it,
$$p_{l,i} = p_{i}, \quad \forall l \in \mathcal{N}(i), \tag{30}$$
in which the subscript $l,i$ denotes all the element-node edges that connect node $i$ with an element. Similar to the element area, we also take the natural logarithm of the lengths $\left\|\Delta \mathbf{x}_{l,i}\right\|$ of the local coordinate vectors. Since the mesh enforces that $0 < \theta_{l,i} < \pi$, we can take the cosine of the angles for ease of computation and normalization without loss of information. The element-node edge feature vector, after these treatments, can be written as
$$\mathbf{w}_{l,i} = \left[a_{l,i},\ p_{l,i},\ \ln \left\|\Delta \mathbf{x}_{l,i}\right\|,\ \cos \theta_{l,i}\right]^{\mathsf{T}}. \tag{31}$$
When a variety of dynamics (i.e., a range of Reynolds numbers) is involved, to make it possible for the network to differentiate between similar dynamics, we explicitly supply the Reynolds number as a feature, which gives the element-node edge feature vector
$$\mathbf{w}_{l,i} = \left[a_{l,i},\ p_{l,i},\ \ln \left\|\Delta \mathbf{x}_{l,i}\right\|,\ \cos \theta_{l,i},\ Re\right]^{\mathsf{T}}. \tag{32}$$
A discussion on the necessity of this treatment is provided in Appendix A.
3.3 Time stepping, postprocessing
In the temporal roll-out predictions, the network iteratively takes the features as inputs, and updates some of these features with its outputs. Assuming that the boundary conditions and geometry information do not need to be predicted, the features that need to be updated at every time step during the roll-out prediction are the first two entries $a_{l,i}$ and $p_{l,i}$ of the element-node edge feature vectors. With the system discretized temporally with the forward Euler time-stepping scheme (Eq. 4), we have the temporal update from time step $n$ to time step $n+1$:
$$\left[a^{n+1}_{l,i},\ p^{n+1}_{l,i}\right]^{\mathsf{T}} = \left[a^{n}_{l,i},\ p^{n}_{l,i}\right]^{\mathsf{T}} + \hat{\mathbf{F}}\left(\mathcal{V}^{n}, \mathcal{C}^{n}, \mathcal{W}^{n}\right)_{l,i}, \tag{33}$$
with $\hat{\mathbf{F}}$ being the $\phi$-GNN defined in Sec. 2.5. The network outputs are vectors of length 2 on the element-node edges, and the decoders for the node and element features are not used.
It should be noted that the temporal updates are on the element-node edges, which do not represent a physical location. To retrieve the velocity and pressure fields defined on the vertices of the mesh, we aggregate the element-node edge features back to the nodes as a post-processing step. For the pressure, the value on each node can simply be retrieved by calculating the mean value across all the element-node edges connecting to the node,
$$p_{i} = \mathrm{mean}\left(\left\{ p_{l,i} \right\}_{l \in \mathcal{N}(i)}\right), \tag{34}$$
in which $\mathrm{mean}$ is the mean aggregation function. The velocity at each node is retrieved by reverting the process of Eq. 27. Similar to that in Lino et al. [26], this can be achieved by solving the (usually over-constrained) problem via the Moore-Penrose pseudo-inverse:
$$\mathbf{u}_{i} = \left(\mathbf{E}_{i}^{\mathsf{T}} \mathbf{E}_{i}\right)^{+} \left(\mathbf{A}_{i} \odot \mathbf{E}_{i}\right)^{\mathsf{T}} \mathbf{1}_{m_{i}}, \tag{35}$$
in which the symbol $\odot$ denotes the Hadamard product, $(\cdot)^{+}$ denotes the Moore-Penrose pseudo-inverse, $m_{i}$ denotes the number of element-node edges connecting to node $i$, $\mathbf{1}_{m_{i}}$ is a vector of ones, $\mathbf{A}_{i}$ is an $m_{i} \times 2$ matrix containing the (column-repeated) projection weights for all the element-node edges connecting to node $i$, and $\mathbf{E}_{i}$ is also an $m_{i} \times 2$ matrix containing the unit local coordinate vectors for all the element-node edges connecting to node $i$.
Remark 2.
In the case when a node only connects to one element (e.g., at a corner of the domain boundary), Eq. 35 cannot be solved directly due to the problem being under-constrained, and we simply set the nodal values from the single available element-node edge, $\mathbf{u}_{i} = a_{l,i}\, \hat{\mathbf{e}}_{l,i}$ and $p_{i} = p_{l,i}$.
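The recovery in Eq. 35 amounts to a per-node least-squares solve. A NumPy sketch, looping over nodes for clarity, with the single-edge fallback following Remark 2 as reconstructed above:

```python
def recover_velocity(a, node_ids, e_hat, num_nodes):
    """Invert the projections of Eq. 27 per node (equivalent to Eq. 35).

    a: [M] predicted projection weights; e_hat: [M, 2] unit local coordinate
    vectors; returns [num_nodes, 2] nodal velocities.
    """
    u = np.zeros((num_nodes, 2))
    for i in range(num_nodes):
        mask = node_ids == i
        E, ai = e_hat[mask], a[mask]
        if len(ai) >= 2:
            u[i] = np.linalg.pinv(E) @ ai   # least-squares via pseudo-inverse
        elif len(ai) == 1:
            u[i] = ai[0] * E[0]             # Remark 2: single-edge node
    return u
```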
4 Problem Set-up
In this section, we apply the framework developed in Sec. 3 to the modeling and prediction of two fluid systems. We first describe the data sets used, and then discuss the details of the setup and training of the models. The results will be presented in Sec. 5. All neural network trainings are performed with random seeds fixed at 1 unless specifically mentioned.
4.1 Data sets
We choose two common benchmark fluid flow systems for the experiments: the flow around a circular cylinder and the flow around an NACA0012 airfoil. Both flow data sets are 2-D incompressible fluid flows simulated at laminar flow conditions. The flow around cylinder data set is obtained through simulation at a single fixed Reynolds number, and is used to test the capability of the neural network in learning a certain flow dynamic. The flow around the airfoil is simulated at multiple Reynolds numbers (with the flow at each Reynolds number forming a separate "trajectory"), and the resulting data set is used to test the capability of the network to interpolate within a range of dynamics and extrapolate out of that range.
Both flow data sets are generated via a Petrov-Galerkin finite element solver with a semi-discrete time-stepping scheme [38] written in Matlab, with the computational meshes created using Gmsh [39]. The domains for the two sets of simulations are plotted in Fig. 5. The inlet is a fixed uniform flow, the outlet is set to be traction-free, while the top and bottom boundaries are set to be slip walls. A no-slip boundary condition is applied at the surface of the cylinder and airfoil. The computational meshes are shown in Fig. 6a and Fig. 7a. For the flow around the cylinder, a total of 6499 continuous time steps are sampled with a fixed non-dimensionalized time step. For the flow around the airfoil, simulations are performed at 61 different Reynolds numbers. For each of the considered Reynolds numbers, 4500 continuous time steps are sampled with a fixed non-dimensionalized time step.



The simulated flow data are interpolated onto a coarser mesh (depicted in Fig. 6b and 7b) via the Clough-Tocher interpolator available in the SciPy package [40] before subsequent conversion to features. It should be noted that the mesh for the flow around the fixed cylinder is the same as the one used for the fluid-structure interaction between fluid flow and an elastically mounted cylinder in reference [41]. The statistics on the numbers of nodes, elements and element-node edges for the two data sets are reported in Table 1. The interpolated data sets are then split into train, cross-validation and test data sets, as listed in Table 2. It should be noted that the training and cross-validation share the same data set for the cylinder case. This is possible because the network is trained with single-step supervision, but evaluated by generating roll-out predictions over thousands of time steps starting from the first time step within the training/cross-validation data set.
Data set | Nodes | Elements | Element-node edges |
---|---|---|---|
Cylinder | 2204 | 2160 | 8640 |
Airfoil | 3653 | 3590 | 14360 |
Data set | Training time steps | Cross-validation time steps | Test time steps | Roll-out prediction time steps
---|---|---|---|---
Cylinder | 200 | 200 | 2048 | 4000
Airfoil | | | 256 | 4000
4.2 Network setup, implementation, training
We implement the models with PyTorch [42]. All multi-layer perceptrons in the network (encoders, decoders, processors) have two hidden layers, with hidden and output layer widths of 128, except for the outputs of the element-node edge decoder, which have width 2. It should be noted that the hyperparameters of network width and depth, including the number of message-passing layers, are not tuned but fixed following the choice of Pfaff et al. [23] for their MeshGraphNet. Residual links [33] are added in all message-passing layers.
Unlike most of the existing works on graph neural network-based continuum mechanics simulations discussed in the introduction, we adopt a sinusoidal activation function [43] in this work, along with the associated initialization scheme. To avoid conflict with this design choice, we choose to use the mean aggregation function for both the element and node update stages, and avoid the use of any normalization layers. In addition, we apply min-max normalization to the range $[-1, 1]$ for each entry of the input and output features in the training data sets, with the exception of $\cos \theta_{l,i}$ in the element-node edge features, since its value is already limited to be within the range $[-1, 1]$. The test data sets are normalized with the statistics calculated from the training data sets.
Models reported in Sec. 5.1 and 5.2 are trained with the Adam [44] optimizer with the PyTorch default setup for a total of 200 epochs with batch size 4, using an adaptive smooth training loss that will be discussed in Sec. 4.3. The network is trained using single-step supervision, i.e., for the input features at time step $n$, the training loss is calculated between the predicted increment of the element-node edge features from time step $n$ to time step $n+1$ (cf. Eq. 33) and the ground truth values. Artificial noise is added to the ground truth input and output values during training, with the scheme discussed in detail later in Sec. 4.4. Maximum and minimum learning rates are selected, with a warm-up stage of 10 epochs at the start. Details of the learning rate scheme will be presented in Sec. 4.5. All the reported training and testing runs are completed on a single Nvidia RTX 3090 GPU, with the CPU being an AMD Ryzen 9 5900 (12 cores @ 3 GHz).
Remark 3.
It should be noted that the message-passing layers are implemented via a gather-scatter scheme similar to that of PyTorch Geometric [45], which is non-deterministic due to the use of GPU atomic operations. This means that the results from multiple trials of training and testing runs can still differ from each other even when the random seeds are fixed at the same value. A fully-deterministic implementation is possible via the direct use of sparse matrix multiplications, but appears to be significantly slower than the non-deterministic implementation. We therefore adopt the non-deterministic version in the present work.
4.3 Adaptive smooth training loss
Most of the existing works discussed in Sec. 1 adopt the mean squared error (MSE) as the training loss function. For graph or hypergraph message-passing networks with sinusoidal activation functions, however, we observe that the training process can be unstable when the MSE loss is used. We therefore seek an alternative training loss in this work. Another typical loss function, the $\mathcal{L}_1$ loss, might not be preferable in this case since it is not smooth at zero, and we thus adopt a smooth $\mathcal{L}_1$ loss [46] function, which states that for a ground truth vector $\mathbf{y}$ and its neural network prediction $\hat{\mathbf{y}}$, the loss is
$$\mathcal{L}\left(\mathbf{y}, \hat{\mathbf{y}}\right) = \frac{1}{N} \sum_{d=1}^{N} l_{d}, \qquad l_{d} = \begin{cases} \dfrac{\left(y_{d} - \hat{y}_{d}\right)^{2}}{2 \beta}, & \left|y_{d} - \hat{y}_{d}\right| < \beta, \\[4pt] \left|y_{d} - \hat{y}_{d}\right| - \dfrac{\beta}{2}, & \text{otherwise}, \end{cases} \tag{36}$$
in which $\beta$ is a non-negative parameter that controls the transition point between the $\mathcal{L}_2$ loss region and the $\mathcal{L}_1$ loss region. The subscript $d$ here denotes entry-by-entry calculation. A fixed value of $\beta$ is not preferable, since the loss function is not very different from the $\mathcal{L}_1$ loss when $\beta$ is too small, and will converge to a scaled $\mathcal{L}_2$ loss during training when $\beta$ is too large. For the present case, the instability in training usually occurs when the training MSE error is relatively low, so an adaptive scheme for $\beta$ is preferred. Assuming that the distribution of the error during the training approximately follows a symmetric distribution centered at zero, the target is to make sure that the two tails of the distribution fall into the $\mathcal{L}_1$ loss region so that they do not lead to instability. More complex on-the-fly adaptation algorithms like references [47, 48, 49] exist, but we choose to adapt the value in this work using a simpler yet robust approach by setting $\beta$ as the variance of the model prediction error,
$$\beta = \sigma^{2}\left(\hat{\mathbf{y}} - \mathbf{y}\right), \tag{37}$$
based on the idea that at least part of the two tails of any zero-centered symmetric distribution would reside more than one standard deviation away from zero. This variance of the prediction error can be approximated by computing the mean-squared error between the predicted and ground truth values over the whole training set, which can be further approximated on-the-fly by an exponential moving average
$$\beta^{s+1} = \left(1 - \frac{1}{T}\right) \beta^{s} + \frac{1}{T}\, \mathrm{MSE}^{s}, \tag{38}$$
in which $T$ is the total number of training iterations within each epoch and the superscript $s$ denotes the training iteration. We further stabilize this process by preventing $\beta$ from increasing, i.e.,
$$\beta^{s+1} \leftarrow \min\left(\beta^{s+1},\ \beta^{s}\right), \tag{39}$$
for each training step. The initial value of $\beta$ can be determined by calculating the mean-squared error of the initialized network over a small randomly-sampled subset of the whole training set for the reported cases in Sec. 5.
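A compact PyTorch sketch of this adaptive scheme (Eqs. 36, 38, 39; class and argument names are ours):

```python
import torch
import torch.nn.functional as F

class AdaptiveSmoothL1:
    """Smooth L1 loss whose transition point beta tracks the training MSE."""
    def __init__(self, beta0: float, iters_per_epoch: int):
        self.beta = beta0   # initialized from the MSE of the untrained network
        self.T = iters_per_epoch

    def __call__(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        mse = F.mse_loss(pred, target).item()
        # Eq. 38: exponential moving average; Eq. 39: never allow beta to grow
        self.beta = min(self.beta, (1.0 - 1.0 / self.T) * self.beta + mse / self.T)
        return F.smooth_l1_loss(pred, target, beta=self.beta)
```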
4.4 Training noise
The prediction of the network is expected to have an error. Without special treatment, this error would accumulate over the prediction time steps and would eventually cause the predictions to blow up. It is therefore preferable that the network automatically corrects its prediction error. In this work, this is achieved by training the network with artificial noise added to the training data. With the trained network $\hat{\mathbf{F}}$, we consider its prediction output calculated from the input features at time step $n$,
$$\hat{\mathbf{y}}^{n+1}_{l,i} = \mathbf{y}^{n}_{l,i} + \mathbf{s} \odot \hat{\mathbf{F}}\left(\mathcal{V}^{n}, \mathcal{C}^{n}, \mathcal{W}^{n}\right)_{l,i}, \qquad \mathbf{y}^{n}_{l,i} = \left[a^{n}_{l,i},\ p^{n}_{l,i}\right]^{\mathsf{T}}. \tag{40}$$
It should be noted that this equation is different from Eq. 33. The additional scaling factor (vector) $\mathbf{s}$ exists due to the necessary normalization of the neural network output targets during the training process. Assuming that the network is large enough and trained thoroughly enough, such that it is able to capture the evolution of the system dynamics, it is reasonable to assume that the error of this prediction would approximately follow a Gaussian distribution,
$$\boldsymbol{\epsilon}^{n+1} = \hat{\mathbf{y}}^{n+1} - \mathbf{y}^{n+1} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}\right). \tag{41}$$
We hope that the network will be able to correct this error during its prediction in the subsequent time step, i.e., we expect that
$$\hat{\mathbf{y}}^{n+1} + \mathbf{s} \odot \hat{\mathbf{F}}\left(\hat{\mathcal{V}}^{n+1}, \hat{\mathcal{C}}^{n+1}, \hat{\mathcal{W}}^{n+1}\right) = \mathbf{y}^{n+2}, \tag{42}$$
in which the hats on the right-hand side inputs denote features assembled from the erroneous prediction $\hat{\mathbf{y}}^{n+1} = \mathbf{y}^{n+1} + \boldsymbol{\epsilon}^{n+1}$, with $\boldsymbol{\epsilon}^{n+1}$ being the network prediction error at time step $n+1$. Therefore, during training, one can enforce the network to learn such a behavior by adding a Gaussian noise with a specified variance to the training output target, as well as to the corresponding entries of the training input.
Remark 4.
It should be noted that this scheme is directly inspired by, but not equivalent to, the one adopted by Pfaff et al. [23], as they chose to specify a certain magnitude of Gaussian noise for the input rather than for the output. The two schemes are only equivalent when the different entries of the scaling vector $\mathbf{s}$ are equal to each other.
For systems that change only slightly over each time step, i.e., when both entries of $\mathbf{s}$ are much smaller than 1, the predicted outputs at neighboring time steps will be similar to each other. It is therefore also reasonable to expect that the prediction errors of two neighboring time steps are significantly correlated,
$$\mathrm{corr}\left(\boldsymbol{\epsilon}^{n+1},\ \boldsymbol{\epsilon}^{n+2}\right) \gg 0. \tag{43}$$
This is empirically verified during preliminary tests; we skip the details for brevity. Based on such an observation, we further propose to train the network to over-correct the error at each time step, such that a proportion of the error at the subsequent time step is also canceled. This means that for a Gaussian noise $\boldsymbol{\epsilon}$ with a specified variance, we subtract $\left(1 + \lambda\right) \boldsymbol{\epsilon}$ from the training output target, and add $\boldsymbol{\epsilon}$ to the corresponding entries of the input, with $\lambda$ being a hyper-parameter that controls the level of over-correction. We take inspiration from the typical relaxation factor range used in successive over-relaxation methods, and directly specify a fixed value of $\lambda$ in this work rather than tuning it as a hyper-parameter.
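One plausible realization of this noise-injection and over-correction scheme, following the sign conventions described above (names are ours, and the output normalization is omitted for brevity):

```python
def noisy_training_pair(w_n, w_np1, noise_std: float, lam: float, state_dims: int = 2):
    """Corrupt inputs with Gaussian noise and build an over-correcting target.

    w_n, w_np1: element-node edge features at steps n and n+1; only the first
    `state_dims` entries (velocity projection and pressure) carry state.
    """
    eps = torch.randn_like(w_n[:, :state_dims]) * noise_std
    w_in = w_n.clone()
    w_in[:, :state_dims] += eps                       # noise added to the input state
    # target: the true increment, minus (1 + lambda) times the injected noise,
    # so the network learns to cancel the current error and part of the next
    target = (w_np1[:, :state_dims] - w_n[:, :state_dims]) - (1.0 + lam) * eps
    return w_in, target
```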
For the flow around cylinder data set, the model reported in Sec. 5.1 is trained with a designated standard deviation for the random noise, selected after a screen through a range of candidate values; a discussion of the behavior of the network when trained with different amounts of training noise is attached in Appendix B. For the flow around airfoil data sets, to reduce the amount of computational power needed, we start from a relatively large noise standard deviation and divide this value by half in each trial, eventually arriving at the noise level used for the results reported in Sec. 5.2.
4.5 Learning rate
We choose to merge the popular cosine and exponential learning rate schemes to take advantage of both, such that a scaled cosine learning rate applies near the start of the training and a scaled exponential learning rate applies near the end. In particular, a decaying learning rate curve starting from the maximum learning rate $\eta_{\max}$ and ending at the minimum learning rate $\eta_{\min}$ is specified by
$$\eta\left(\tau\right) = \begin{cases} \eta_{\min} + \left(\eta_{\max} - \eta_{\min}\right) \dfrac{1 + \cos\left(\pi \tau\right)}{2}, & \tau \leq \tau_{0}, \\[4pt] s\, \eta_{\max} \left(\dfrac{\eta_{\min}}{\eta_{\max}}\right)^{\tau}, & \tau > \tau_{0}, \end{cases} \tag{44a}$$
in which the scaling factor
$$s = \dfrac{\eta_{\min} + \left(\eta_{\max} - \eta_{\min}\right) \frac{1 + \cos\left(\pi \tau_{0}\right)}{2}}{\eta_{\max} \left(\eta_{\min} / \eta_{\max}\right)^{\tau_{0}}} \tag{44b}$$
scales the two parts of the curve such that the curve is continuous over $\tau \in [0, 1]$. The point $\tau_{0}$ where the gradients of the two parts of the curve match each other,
$$-\dfrac{\pi}{2} \left(\eta_{\max} - \eta_{\min}\right) \sin\left(\pi \tau_{0}\right) = s\, \eta_{\max} \left(\dfrac{\eta_{\min}}{\eta_{\max}}\right)^{\tau_{0}} \ln \dfrac{\eta_{\min}}{\eta_{\max}}, \tag{44c}$$
is numerically solved starting from a suitable initial guess. The value $\tau \in [0, 1]$ is equal to the proportion of training completed. An increasing learning rate curve from $\eta_{\min}$ to $\eta_{\max}$, used in the warm-up stage at the start of the training process, is acquired by calculating the decaying learning rate curve and then reversing the sequence. For the models reported in this work, we change the learning rate at the start of every iteration during the warm-up stage, and change the learning rate at the start of every epoch for the rest of the training process.
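Under the reconstruction in Eq. 44 (one consistent realization of the merged scheme rather than a definitive implementation), the decaying curve can be sketched as follows, with `tau0` and `s` precomputed from Eqs. 44b and 44c:

```python
import math

def lr_at(tau: float, lr_max: float, lr_min: float, tau0: float, s: float) -> float:
    """Merged cosine/exponential decay (Eq. 44a); tau in [0, 1] is training progress."""
    if tau <= tau0:  # cosine segment near the start of training
        return lr_min + (lr_max - lr_min) * 0.5 * (1.0 + math.cos(math.pi * tau))
    # scaled exponential segment near the end; s (Eq. 44b) enforces continuity at tau0
    return s * lr_max * (lr_min / lr_max) ** tau
```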
4.6 Evaluation metrics
The trained models are evaluated on the test data sets. As the main purpose of the neural network surrogate model is to generate stable and accurate roll-out simulations, we evaluate the models by feeding the state of the system at a certain time step, i.e., the first time step within each of the test data sets, to the model, and comparing the predicted system states over the next several thousand time steps with the ground truth values from the interpolated CFD data. We evaluate the model based on the following criteria:
Accuracy & stability of predicted system states
We quantify the accuracy of the predicted system states by calculating the similarity between the ground truth system state vector $\mathbf{Y}$ and its neural network prediction $\hat{\mathbf{Y}}$ over the prediction roll-out time steps, using the coefficient of determination
$$R^{2} = 1 - \dfrac{\left\| \hat{\mathbf{Y}} - \mathbf{Y} \right\|_{2}^{2}}{\left\| \mathbf{Y} - \overline{\mathbf{Y}} \right\|_{2}^{2}}, \tag{45}$$
in which $\left\| \cdot \right\|_{2}$ denotes the $\mathcal{L}_2$ norm and the overline denotes the mean. The system state vector could be either of the velocity components $u_x$ or $u_y$, or the non-dimensionalized pressure field $p$. A higher coefficient of determination indicates a more accurate prediction, up to $R^{2} = 1$, which means perfectly accurate predictions. It is also preferable if the network is able to generate stable predictions continuously. While it could be difficult to prove the stability mathematically, it is possible to empirically verify it by using the network to predict over a very long period into the future and comparing with the ground truth CFD predictions. In this work, we deem that being able to generate a stable roll-out prediction over 4000 time steps (equivalent to about 30-40 vortex shedding cycles for the test data sets used in this work) is a sufficient empirical verification of stability.
Accuracy of lift and drag coefficients
We also calculate the lift and drag coefficients directly from the predicted system states. These statistics are compared with the ground truth values calculated from the CFD data. In particular, the lift and drag coefficients are calculated by directly integrating the Cauchy stress tensor $\boldsymbol{\sigma} = -p \mathbf{I} + \mu \left(\nabla \mathbf{u} + \left(\nabla \mathbf{u}\right)^{\mathsf{T}}\right)$ at the first layer of mesh away from the surface of the cylinder or airfoil,
$$C_{L} = \frac{1}{\frac{1}{2} \rho U_{0}^{2} L} \oint_{\Gamma} \left(\boldsymbol{\sigma} \cdot \mathbf{n}\right) \cdot \mathbf{e}_{y} \, \mathrm{d}\Gamma, \tag{46a}$$
$$C_{D} = \frac{1}{\frac{1}{2} \rho U_{0}^{2} L} \oint_{\Gamma} \left(\boldsymbol{\sigma} \cdot \mathbf{n}\right) \cdot \mathbf{e}_{x} \, \mathrm{d}\Gamma, \tag{46b}$$
where $\rho$ is the density of the fluid, $\mu$ is the viscosity of the fluid, $U_{0}$ is equal to the magnitude of the inlet velocity, $\mathbf{n}$ is the surface normal, $\mathbf{e}_{x}$ and $\mathbf{e}_{y}$ are the unit vectors along the $x$ and $y$ axes, and $\Gamma$ denotes the surface of the airfoil or cylinder. The characteristic length $L$ is either equal to the diameter $D$ of the cylinder or the chord length $c$ of the airfoil.
It should be reiterated that the lift and drag coefficients reported in this work are not directly given by the neural network, but rather calculated from the predicted velocity $u_x$, $u_y$ and pressure $p$, without any correction or interpolation schemes like the one used in [17].
5 Results and discussion
5.1 Learning a certain fluid flow dynamic
Starting from the first time step in each of the test data sets, we generate roll-out predictions of the states of the fluid system over the next 4000 time steps. In this subsection, we present these results for the flow around cylinder data set, using the criteria discussed in Sec. 4.6. We also verify the designed invariance and equivariance properties of the network.
Field prediction and stability
Figure 8 shows the coefficients of determination of the predicted system states $u_x$, $u_y$ and $p$ for the flow around cylinder data set over the roll-out prediction of 4000 time steps, and Fig. 9 presents the predicted versus ground truth $x$-axis velocity component $u_x$ at time steps 3900, 3950 and 4000. $R^{2}$ values consistently close to 1, along with the close match between the predicted and ground truth $u_x$, indicate that the predictions are stabilized and accurate.


Lift and drag
Using Eq. 46, we calculate the lift and drag coefficients from the predicted system states over time, and compare them with the ground truth values. The results between roll-out prediction time steps 3501 and 4000 are plotted in Fig. 10. The lift and drag coefficients calculated from the predicted system states are reasonably accurate, with the drag coefficients slightly lower than the ground truth values.
Predictions near the cylinder surface
We further zoom in to the vicinity of the cylinder, and plot the predicted versus ground truth pressure fields in Fig. 11. As marked out by a magenta circle, the predicted pressure values around a region near the surface of the cylinder are slightly higher than the ground truth values. Since the lift and drag forces are only integrated using the values at the first layer of mesh away from the cylinder surface, this error in the predicted pressure field would cause the drag coefficient calculated from the predicted system states to be lower than that calculated from the ground truth CFD data, which is consistent with the observations in Fig. 10.


Rotation equivariance and translation invariance
We verify the desired invariance and equivariance properties by rotating and translating the input system before sending it into the network for roll-out predictions. In particular, we rotate the input system at the first time step within the flow around cylinder test data set by a prescribed angle, and then translate the system by a random offset. The resulting lift and drag coefficient curves plotted in Fig. 12 are accurate, showing that these desired properties are satisfied.


5.2 Learning a range of fluid flow dynamics
With the network shown to be able to generate stable and accurate predictions when used to learn a single fluid flow dynamic in Sec. 5.1, we proceed to apply the network to learning a range of dynamics using the flow around airfoil data sets, and verify whether it can serve as an effective surrogate model for time series predictions within this range of Reynolds numbers. In particular, we train the network with the flow around airfoil data simulated between Reynolds numbers 2000 and 3000, and inspect its behavior when applied to other flow around airfoil cases over the full range of simulated Reynolds numbers. In the remaining part of this subsection, we first test the stability of the predictions, and then inspect various aspects of the predictions within the stability range.
Stability range
We first identify the range of Reynolds numbers within which the network is able to produce stable outputs, by generating roll-out predictions over 4000 time steps for each of the test data sets (cf. Table 2), starting from the first time step in each of them. The coefficients of determination at roll-out prediction time step 4000 for the different test data sets are plotted in Fig. 13, and it is observed that the predictions are accurate and stable within the interpolation range and moderately above its upper end. This, combined with the fact that the predictions are stable for all the cross-validation data sets, leads to the conclusion that the network is an effective surrogate model within the interpolation range of Reynolds numbers, and can extrapolate moderately to higher Reynolds number cases out of the training range.
Lift and drag coefficients
We plot the lift and drag coefficients between roll-out prediction time steps 3501 and 4000 for several test data sets within the stability range in Fig. 14. It is clear that the lift and drag coefficients calculated from the flow fields predicted by the network are accurate within the stability range. One might notice the existence of slight phase differences between the predicted and ground truth lift and drag coefficient curves. This is caused by a very slight error in the learned vortex shedding frequencies, which is reasonable since the network is trained with one-step supervision only (cf. Sec. 4.2).

Field predictions
To evaluate the accuracy of the predicted system states, we calculate the coefficients of determination of the predicted velocity components and pressure fields, and these values for several test data sets are plotted in Fig. 15. The high $R^{2}$ values show that the predictions are stabilized and reasonably accurate. The gradually decreasing trend of the $R^{2}$ values is caused by the accumulated phase difference over the prediction roll-out.

Predictions at far wake
We notice that the coefficient of determination of the $y$-axis velocity component $u_y$, although still close to 1, is significantly lower than the others, and also demonstrates relatively large fluctuations in its values. To explore the reason behind such behavior, we plot the predicted versus ground truth $u_y$ fields over the whole domain at prediction time step 4000 for the test data set with Reynolds number 2050 in Fig. 16. It turns out that the network prediction in a region near the end of the far wake deviates appreciably from the ground truth values, as the predicted $u_y$ fluctuation dissipates. As marked out by the magenta circle in Fig. 16, such an error is likely caused by the coarsening of the mesh in this region. The predictions for the test data sets at other Reynolds numbers also demonstrate similar (but not as large) errors in about the same region.

Predictions at near wake & trailing edge
We proceed to examine the predictions in the near wake region. The predicted pressure fields at time steps 3933, 3967 and 4000 for the case with Reynolds number 2050 in the near wake region are plotted in Fig. 17. It is clear that the predictions are accurate in the near wake region. The prediction error is mostly due to the slight phase difference between the predicted and ground truth system states. We further zoom in to the local neighborhood of the trailing edge, and plot the predicted versus ground truth pressure fields in Fig. 18. The network is able to predict the low pressure region at the trailing edge, as well as the flow dynamics around it.


6 Conclusion
In this paper, we presented a new graph neural network framework, and applied it to fluid dynamics simulations. Inspired by the finite element method, we converted the computational mesh to a node-element hypergraph, and designed a message-passing network on such a hypergraph that mimics the local stiffness matrix calculation process. We equipped the network with appropriate invariance and equivariance properties via input and output feature transformations. We demonstrated the proposed network for the surrogate modeling of the flow around cylinder and airfoil configurations, and obtained stabilized and accurate temporal predictions for a range of Reynolds numbers. As the proposed network architecture only changes the graph connectivity and the message-passing strategy at the individual graph level, it should be compatible, after minor modifications, with most existing techniques that were originally built upon graph neural networks defined on a normal graph. This means that the proposed network could serve as an alternative to the generalized graph message-passing network in these applications. Similar to the finite element method versus the finite volume method, we believe that some problems are more easily resolved with the proposed $\phi$-GNN framework. We thus hope that this work will inspire new applications of graph neural networks in the simulation of continuum mechanics.
It should be mentioned again that we took inspiration from the graph connectivity of the finite element method, as well as the calculation process of the local stiffness matrices and load vectors. However, it remains an open question whether it is possible to construct a network architecture that is strictly a neural network version of the finite element method, which requires a step further from this work. This possibility will be investigated in our future work.
Acknowledgment
This research is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Seaspan Shipyards. The training and evaluation of the neural network models were supported in part by the computational resources and services provided by Advanced Research Computing at the University of British Columbia. Dr. Renjie Liao and Xiaoyu Mao provided numerous suggestions to the work, their help is hereby gratefully acknowledged.
References
- [1] S. S. Collis, R. D. Joslin, A. Seifert, V. Theofilis, Issues in active flow control: theory, control, simulation, and experiment, Progress in Aerospace Sciences 40 (4-5) (2004) 237–289.
- [2] R. D. Joslin, D. N. Miller, Fundamentals and applications of modern flow control, American Institute of Aeronautics and Astronautics, 2009.
- [3] R. J. LeVeque, Finite volume methods for hyperbolic problems, Vol. 31, Cambridge University Press, 2002.
- [4] T. J. Hughes, The finite element method: linear static and dynamic finite element analysis, Courier Corporation, 2012.
- [5] C. Johnson, Numerical solution of partial differential equations by the finite element method, Courier Corporation, 2012.
- [6] J. L. Lumley, The structure of inhomogeneous turbulent flows, in: Atmospheric Turbulence & Wave Propagation, 1967.
- [7] L. Sirovich, Turbulence and the dynamics of coherent structures. part I. coherent structures, Quarterly of Applied Mathematics 45 (3) (1987) 561–571.
- [8] P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal of Fluid Mechanics 656 (2010) 5–28.
- [9] A. Towne, O. T. Schmidt, T. Colonius, Spectral proper orthogonal decomposition and its relationship to dynamic mode decomposition and resolvent analysis, Journal of Fluid Mechanics 847 (2018) 821–867.
- [10] O. T. Schmidt, P. J. Schmid, A conditional space–time POD formalism for intermittent and rare events: example of acoustic bursts in turbulent jets, Journal of Fluid Mechanics 867 (2019).
- [11] H. Zhang, C. W. Rowley, E. A. Deem, L. N. Cattafesta, Online dynamic mode decomposition for time-varying systems, SIAM Journal on Applied Dynamical Systems 18 (3) (2019) 1586–1609.
- [12] F. J. Gonzalez, M. Balajewicz, Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems, arXiv preprint arXiv:1808.01346 (2018).
- [13] N. Thuerey, K. Weißenow, L. Prantl, X. Hu, Deep learning methods for Reynolds-averaged Navier–Stokes simulations of airfoil flows, AIAA Journal 58 (1) (2020) 25–36.
- [14] S. R. Bukka, R. Gupta, A. R. Magee, R. K. Jaiman, Assessment of unsteady flow predictions using hybrid deep learning based reduced-order models, Physics of Fluids 33 (1) (2021) 013601.
- [15] P. Pant, R. Doshi, P. Bahl, A. Barati Farimani, Deep learning for reduced order modelling and efficient temporal evolution of fluid simulations, Physics of Fluids 33 (10) (2021) 107101.
- [16] R. Gupta, R. K. Jaiman, Three-dimensional deep learning-based reduced order model for unsteady flow dynamics with variable reynolds number, Physics of Fluids 34 (3) (2022) 033612.
- [17] R. Gupta, R. K. Jaiman, A hybrid partitioned deep learning methodology for moving interface and fluid–structure interaction, Computers & Fluids 233 (2022) 105239.
- [18] X. Zhang, T. Ji, F. Xie, C. Zheng, Y. Zheng, Data-driven nonlinear reduced-order modeling of unsteady fluid–structure interactions, Physics of Fluids 34 (5) (2022) 053608.
- [19] X. Fan, J.-X. Wang, Differentiable hybrid neural modeling for fluid-structure interaction, arXiv preprint arXiv:2303.12971 (2023).
- [20] I. K. Deo, R. Jaiman, Predicting waves in fluids with deep neural network, Physics of Fluids 34 (6) (2022) 067108.
- [21] W. Mallik, R. K. Jaiman, J. Jelovica, Predicting transmission loss in underwater acoustics using convolutional recurrent autoencoder network, The Journal of the Acoustical Society of America 152 (3) (2022) 1627–1638.
- [22] F. D. A. Belbute-Peres, T. Economon, Z. Kolter, Combining differentiable PDE solvers and graph neural networks for fluid flow prediction, in: International Conference on Machine Learning, PMLR, 2020, pp. 2402–2411.
- [23] T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, P. W. Battaglia, Learning mesh-based simulation with graph networks, arXiv preprint arXiv:2010.03409 (2020).
- [24] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al., Relational inductive biases, deep learning, and graph networks, arXiv preprint arXiv:1806.01261 (2018).
- [25] A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, P. Battaglia, Graph networks as learnable physics engines for inference and control, in: International Conference on Machine Learning, PMLR, 2018, pp. 4470–4479.
- [26] M. Lino, S. Fotiadis, A. A. Bharath, C. D. Cantwell, Multi-scale rotation-equivariant graph neural networks for unsteady Eulerian fluid dynamics, Physics of Fluids 34 (8) (2022) 087110.
- [27] X. Shao, Z. Liu, S. Zhang, Z. Zhao, C. Hu, PIGNN-CFD: A physics-informed graph neural network for rapid predicting urban wind field defined on unstructured mesh, Building and Environment 232 (2023) 110056.
- [28] J. Xu, A. Pradhan, K. Duraisamy, Conditionally parameterized, discretization-aware neural networks for mesh-based modeling of physical systems, Advances in Neural Information Processing Systems 34 (2021) 1634–1645.
- [29] X. He, Y. Wang, J. Li, Flow completion network: Inferring the fluid dynamics from incomplete flow information using graph neural networks, arXiv preprint arXiv:2205.04739 (2022).
- [30] Z. Yang, Y. Dong, X. Deng, L. Zhang, AMGNet: multi-scale graph neural networks for flow field prediction, Connection Science 34 (1) (2022) 2500–2519.
- [31] M. Fortunato, T. Pfaff, P. Wirnsberger, A. Pritzel, P. Battaglia, Multiscale MeshGraphNets, arXiv preprint arXiv:2210.00612 (2022).
- [32] Y. Cao, M. Chai, M. Li, C. Jiang, Bi-stride multi-scale graph neural network for mesh-based physical simulation, arXiv preprint arXiv:2210.02573 (2022).
- [33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- [34] F. Alet, A. K. Jeewajee, M. B. Villalonga, A. Rodriguez, T. Lozano-Perez, L. Kaelbling, Graph element networks: adaptive, structured computation and memory, in: International Conference on Machine Learning, PMLR, 2019, pp. 212–222.
- [35] H. Gao, M. J. Zahr, J.-X. Wang, Physics-informed graph neural Galerkin networks: A unified framework for solving PDE-governed forward and inverse problems, Computer Methods in Applied Mechanics and Engineering 390 (2022) 114502.
- [36] M. Lienen, S. Günnemann, Learning the dynamics of physical systems from sparse observations with finite element networks, arXiv preprint arXiv:2203.08852 (2022).
- [37] Z. Ma, Z. Ye, W. Pan, Fast simulation of particulate suspensions enabled by graph neural network, Computer Methods in Applied Mechanics and Engineering 400 (2022) 115496.
- [38] R. K. Jaiman, M. Z. Guan, T. P. Miyanawala, Partitioned iterative and dynamic subgrid-scale methods for freely vibrating square-section structures at subcritical Reynolds number, Computers & Fluids 133 (2016) 68–89.
- [39] C. Geuzaine, J.-F. Remacle, Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities, International Journal for Numerical Methods in Engineering 79 (11) (2009) 1309–1331.
- [40] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy 1.0 Contributors, SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nature Methods 17 (2020) 261–272.
- [41] R. Gao, R. K. Jaiman, Quasi-monolithic graph neural network for fluid-structure interaction, arXiv preprint arXiv:2210.04193 (2022).
- [42] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems 32 (2019).
- [43] V. Sitzmann, J. Martel, A. Bergman, D. Lindell, G. Wetzstein, Implicit neural representations with periodic activation functions, Advances in Neural Information Processing Systems 33 (2020) 7462–7473.
- [44] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
- [45] M. Fey, J. E. Lenssen, Fast graph representation learning with PyTorch Geometric, in: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
- [46] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
- [47] C.-Y. Fu, M. Shvets, A. C. Berg, RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free, arXiv preprint arXiv:1901.03353 (2019).
- [48] H. Zhang, H. Chang, B. Ma, N. Wang, X. Chen, Dynamic R-CNN: Towards high quality object detection via dynamic training, in: European Conference on Computer Vision, Springer, 2020, pp. 260–275.
- [49] A. R. Sutanto, D.-K. Kang, A novel diminish smooth l1 loss model with generative adversarial network, in: International Conference on Intelligent Human Computer Interaction, Springer, 2021, pp. 361–368.
- [50] I. K. Deo, R. Gao, R. K. Jaiman, Combined space-time reduced-order model with three-dimensional deep convolution for extrapolating fluid dynamics, Physics of Fluids 35 (2023) 043606.
Appendix A Necessity of a Reynolds number feature
The Reynolds number is explicitly used as an input feature for the network in this work (cf. Eq. 32). While it should theoretically be possible to implicitly identify the dynamics at different Reynolds numbers, in practice the network does not give good results when the Reynolds numbers are not explicitly supplied. To demonstrate this, we train the network on the airfoil data set with almost the same setup as the model reported in Sec. 5.2, except that the element-node edge feature vectors are constructed with Eq. 31 and that a different amount of noise is added to stabilize the outputs. The resulting lift and drag force curves for prediction time steps 1 to 500 and 3501 to 4000 are plotted in Fig. 1. The network is still able to generate stable predictions, but the results are not correct. In particular, the lift and drag force curves are gradually 'corrected' towards a wrong pattern over the first hundred time steps. The similarity in shape between the predicted force curves for test sets at different Reynolds numbers indicates that the network is not able to distinguish between the dynamics at different Reynolds numbers from the flow data directly, but rather treats all cases as if they were governed by one single dynamics. This behavior is consistent with the observations from our recent work [50], where it is shown that a convolutional neural network encounters difficulties in learning the difference between the flow dynamics at different Reynolds numbers when it is trained to learn the flow evolution from one single time step to another, while it is able to do so when the flow information from a series of consecutive time steps is used as input.
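To make the distinction concrete, the sketch below illustrates one generic way of appending a Reynolds number channel to the edge feature vectors before message passing. This is a minimal illustration rather than the exact construction of Eq. 32: the function name, tensor layout, and the normalization constant `re_ref` are all hypothetical choices.

```python
import torch

def append_reynolds_feature(edge_feats: torch.Tensor, re: float,
                            re_ref: float = 1000.0) -> torch.Tensor:
    """Append a normalized Reynolds number to every edge feature vector.

    edge_feats: tensor of shape [num_edges, d] holding the element-node
                edge features (hypothetical layout, not the paper's Eq. 32).
    re:         Reynolds number of the current flow case.
    re_ref:     normalization constant (an assumed choice for illustration).
    """
    re_column = torch.full(
        (edge_feats.shape[0], 1), re / re_ref,
        dtype=edge_feats.dtype, device=edge_feats.device,
    )
    # Concatenate the constant Reynolds number channel onto each edge feature.
    return torch.cat([edge_feats, re_column], dim=-1)
```

With such a channel removed, cases at different Reynolds numbers become indistinguishable at the input level given a single snapshot, which is consistent with the behavior observed in Fig. 1.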

Appendix B Effect of training noise amount
As mentioned in Sec. 4.4, the optimal amount of training noise is difficult to determine. To further explore the behavior of the network under different amounts of training noise, we plot in Fig. 2 the coefficient of determination of the predicted non-dimensionalized pressure field at prediction time step 4000 for different amounts of training noise. The results show that the trend is rather chaotic, with the training batch size, random seed, and total number of training iterations all influencing the optimal choice of training noise. The only conclusion that we can draw from these results is that the network does not produce stable results when an insufficient amount of noise is added during the training process.
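For reference, the two ingredients of this study can be written down generically as follows: Gaussian perturbation of the training inputs and the coefficient of determination used for evaluation. This is a plain sketch under the assumption of zero-mean Gaussian noise with standard deviation `sigma`; it is not the exact training code used in this work.

```python
import torch

def add_training_noise(fields: torch.Tensor, sigma: float) -> torch.Tensor:
    """Perturb input flow fields with zero-mean Gaussian noise of std sigma.

    Injecting such noise during training is a common way to stabilize
    long temporal roll-outs of learned simulators.
    """
    return fields + sigma * torch.randn_like(fields)

def r2_score(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Coefficient of determination R^2 of a predicted field against the truth."""
    ss_res = torch.sum((target - pred) ** 2)
    ss_tot = torch.sum((target - target.mean()) ** 2)
    return (1.0 - ss_res / ss_tot).item()
```

Sweeping `sigma` over a grid and recording `r2_score` at the final roll-out step would reproduce the kind of study summarized in Fig. 2, although, as noted above, the resulting trend is sensitive to the batch size, random seed, and iteration count.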

It is likely that these conclusions would also apply to a wide range of other scenarios in which training noise is used, and this leads to a broader question when comparing two (or more) different networks: how can one find a set of training scenarios that is not biased towards any of the networks under comparison, and does such a set even exist? One might think of comparing the networks without using training noise, but such a comparison could be biased against those networks that are designed to work with training noise. Precisely because of this issue, we avoid comparing prediction accuracy with other existing graph neural network architectures in this work. Nevertheless, the development of unbiased, rigorous criteria for comparing different neural networks is certainly a core issue in neural network-based surrogate modeling of fluid systems, and can be of interest for our future investigations.