Learning Vortex Dynamics for Fluid
Inference and Prediction
Abstract
We propose a novel differentiable vortex particle (DVP) method to infer and predict fluid dynamics from a single video. Lying at its core is a particle-based latent space to encapsulate the hidden, Lagrangian vortical evolution underpinning the observable, Eulerian flow phenomena. Our differentiable vortex particles are coupled with a learnable, vortex-to-velocity dynamics mapping to effectively capture the complex flow features in a physically-constrained, low-dimensional space. This representation facilitates the learning of a fluid simulator tailored to the input video that can deliver robust, long-term future predictions. The value of our method is twofold: first, our learned simulator enables the inference of hidden physics quantities (e.g., the velocity field) purely from visual observation; second, it also supports future prediction, constructing the input video's sequel along with its future dynamics evolution. We compare our method with a range of existing methods on both synthetic and real-world videos, demonstrating improved reconstruction quality, visual plausibility, and physical integrity. Our video results, code, and data can be found at our project website: https://yitongdeng.github.io/vortex_learning_webpage.
1 Introduction
As small as thin soap films and as large as atmospheric eddies observable from outer space, fluid systems exhibit intricate dynamic features across media and scales. However, despite recent progress, it remains an open problem for scientific machine learning to effectively represent these flow features, identify the underlying dynamical system, and predict its future evolution, due to noisy data, imperfect modeling, and hidden, unmeasurable physics quantities.
Here, we identify three fundamental challenges that currently hinder the success of such endeavors. First, flow features are difficult to represent. Traditional methods learn fluid dynamics by storing velocity fields on regularly spaced grids or in smooth neural networks. These approaches have demonstrated promising results for fluid phenomena that are relatively damped and laminar (e.g., Chu et al., 2022), but for fluid systems that exhibit turbulent features on varying scales, these methods fall short due to the problem's curse of dimensionality (high-resolution space and time), local non-smoothness, and hidden constraints. As a result, more compact and structured representation spaces and data structures are called for.
Second, hidden flow dynamics is hard to learn. Fluid systems as prescribed by the Navier-Stokes equations tightly couple multiple physical quantities (i.e., velocity, pressure, and density), and yet only the density can be readily measured. Due to the system's complexity, ambiguity, and non-linearity, directly learning the underlying dynamics from the observable density space is infeasible; successful learning usually relies on velocity or pressure supervision, a requirement that keeps these methods from deployment in real-world scenarios.
Exciting recent progress has been made in hidden dynamics inference by PDE-based frameworks such as Raissi et al. (2020), which uncover the underlying physics variables solely from density observations. However, this class of methods encounters the third fundamental challenge: future prediction is difficult. As we will demonstrate, although strong results are obtained for interpolating inside the observation window provided by the training data, these methods cannot extrapolate into the future, profoundly limiting their usage.
In this paper, we propose the differentiable vortex particle (DVP) method, a novel fluid learning model to tackle the three aforementioned challenges in a unified framework. In particular, harnessing the physical insights developed for the vortex methods in the computational fluid dynamics (CFD) literature, we design a novel, data-driven Lagrangian vortex system to serve as a compact and structured latent representation of the flow dynamics. We learn the complex dynamical system underneath the high-dimensional image space by learning a surrogate, low-dimensional model on the latent vortex space, and use a physics-based, learnable mechanism: the vortex-to-velocity module, to decode the latent-space dynamics back to the image space. Leveraging this physics-based representation, we design an end-to-end training pipeline that learns from a single video containing only density information. Our DVP method enables accurate inference of hidden physics quantities and robust long-term future predictions, as shown in Figure 1.
To examine the efficacy of our method, we compare its performance on both motion inference and future prediction against various state-of-the-art methods and their extensions. We conduct benchmark tests on synthetic videos generated using high-order numerical simulation schemes as well as real-world videos in the wild. Evaluation is carried out both quantitatively, through exhaustive numerical analysis, and qualitatively, by generating a range of realistic visual effects. We compare the uncovered velocities in terms of both reconstruction quality and physical integrity, and the predicted visual results in terms of both pixel-level and perceptual proximity. Results indicate that our method provides enhanced abilities on both fronts, inferring hidden quantities with higher accuracy and predicting future evolution with higher plausibility.

In summary, the main technical contributions of our framework align with the three challenges regarding flow representation, dynamics learning, and simulator synthesis. (1) We devise a novel representation for fluid learning, the differentiable vortex particles (DVP), to drastically reduce the learning problem’s dimensionality on complex flow fields. Motivated by the vortex methods in CFD, we establish the vorticity-carrying fluid particles as a new type of learning primitive to transform the existing PDE-constrained optimization problem to a particle trajectory (ODE) learning problem. (2) We design a novel particle-to-field paradigm for learning the Lagrangian vortex dynamics. Instead of learning the interaction among particles (e.g., Sanchez-Gonzalez et al., 2020), our model learns the continuous vortex-to-velocity induction mapping to naturally connect the vortex particle dynamics in the latent space with the fluid phenomena captured in the image space. (3) We develop an end-to-end differentiable pipeline composed of two network models to synthesize data-driven simulators based on single, short RGB videos.
2 Related Work
Hidden Dynamics Inference. The problem of inferring dynamical systems based on noisy or incomplete observations has been addressed using a variety of techniques, including symbolic regression (Bongard & Lipson, 2007; Schmidt & Lipson, 2009), dynamic mode decomposition (Schmid, 2010; Kutz et al., 2016), sparse regression (Brunton et al., 2016; Rudy et al., 2017), Gaussian process regression (Raissi et al., 2017; Raissi & Karniadakis, 2018), and neural networks (Raissi et al., 2019; Yang et al., 2020; Jin et al., 2021; Chu et al., 2022). Among these inspiring advancements, the “hidden fluid mechanics” (HFM) method proposed in Raissi et al. (2020) is particularly noteworthy, as it uncovers the continuous solutions of fluid flow using only images (the transport of smoke or ink).
Data-driven Simulation. Recently, growing interest has been cast on learning numerical simulators from data, which has shown promise to reduce computation time (Ladickỳ et al., 2015; Guo et al., 2016; Wiewel et al., 2019; Pfaff et al., 2020; Sanchez-Gonzalez et al., 2020; Tompson et al., 2017), increase simulation realism (Chu & Thuerey, 2017; Xie et al., 2018), enable stylized control (Kim et al., 2020), estimate dynamic quantities such as viscosity and energy (Chang et al., 2016; Battaglia et al., 2016; Ummenhofer et al., 2019), and facilitate the training of control policies (Sanchez-Gonzalez et al., 2018; Li et al., 2018). Akin to Watters et al. (2017), our system takes images as inputs and performs dynamics simulation in a low-dimensional latent space; but our method learns purely from the input video and performs future rollout in the image space. Our method is also related to Guan et al. (2022), which infers Lagrangian fluid simulation from observed images; we propose sparse neural vortices as our representation, whereas they use dense material points.
Vortex Methods. The underlying physical prior incorporated in our machine learning system is rooted in the family of vortex methods that have been rigorously derived, analyzed, and tested in the computational fluid dynamics (CFD) (Leonard, 1980; Perlman, 1985; Beale & Majda, 1985; Winckelmans & Leonard, 1993; Mimeau & Mortazavi, 2021) and computer graphics (CG) (Selle et al., 2005; Park & Kim, 2005; Weißmann & Pinkall, 2010; Brochu et al., 2012) communities. Xiong et al. (2020) pioneered the combination of the Discrete Vortex Method with neural networks, but their method relies on a large set of ground-truth velocity sequences, whereas ours learns from a single video without needing the ground-truth velocity.
3 Physical Model
We consider the velocity-vorticity form of the Navier–Stokes equations (obtained by taking the curl operator of both sides of the momentum equation, see Cottet et al. (2000) for details):
$$\frac{\partial \boldsymbol{\omega}}{\partial t} + (\boldsymbol{u}\cdot\nabla)\boldsymbol{\omega} = (\boldsymbol{\omega}\cdot\nabla)\boldsymbol{u} + \nu\nabla^{2}\boldsymbol{\omega} + \nabla\times\boldsymbol{f}, \qquad (1)$$

$$\boldsymbol{u} = \nabla\times\boldsymbol{\psi}, \qquad \nabla^{2}\boldsymbol{\psi} = -\boldsymbol{\omega}, \qquad (2)$$

where $\boldsymbol{\omega}$ denotes the vorticity, $\boldsymbol{u}$ the velocity, $\boldsymbol{f}$ the conservative body force, $\nu$ the kinematic viscosity, and $\boldsymbol{\psi}$ the streamfunction. If we ignore the viscosity and stretching terms (inviscid 2D flow), we obtain $D\boldsymbol{\omega}/Dt = 0$, which directly conveys the Lagrangian conservative nature of vorticity (i.e., a particle's vorticity will not change during its advection).

If we assume the fluid domain has an open boundary, we can further obtain the vortex-to-velocity induction formula, which is derived by solving Poisson's equation on $\boldsymbol{\psi}$ using Green's method (also known as the Biot-Savart law in fluid mechanics):

$$\boldsymbol{u}(\boldsymbol{x}) = \int K(\boldsymbol{x}-\boldsymbol{x}')\,\omega(\boldsymbol{x}')\,d\boldsymbol{x}'. \qquad (3)$$

The kernel $K$ exhibits a type-II singularity at $\boldsymbol{x}' = \boldsymbol{x}$ and causes numerical instabilities; therefore, in CFD practice, $K$ is replaced by various mollified versions to improve the simulation accuracy (Beale & Majda, 1985). We note that the mollified version is not unique, and it can be customized and tuned in different numerical schemes per human heuristics. Different types and parameters of the mollification bring about significantly different simulation results.
Takeaways. The mathematical models above provide two central physical insights guiding the design of our vortex-based learning framework: (1) The Lagrangian conservation of vorticity suggests the suitability of adopting Lagrangian data structures (e.g., particles as opposed to grids) to capture the dynamics. Since the tracked variable remains temporally invariant for each Lagrangian vortex, the evolution of the continuous flow field is embodied fully by the movement of these vortices, which significantly alleviates the difficulty in learning. (2) Equation 3 presents an induction mapping from the vorticity $\boldsymbol{\omega}$, a Lagrangian quantity carried by particles, to the velocity $\boldsymbol{u}$, an Eulerian variable that can be queried continuously at an arbitrary location $\boldsymbol{x}$. This makes it possible for the Lagrangian method to be used in conjunction with Eulerian data structures (e.g., a grid) for learning from the widely available video data. Furthermore, such a mapping can benefit from data-driven learning, as we can replace the human heuristics by learning a mollified kernel that minimizes the discrepancy between the simulated and observed flow phenomena.
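To make the induction formula concrete, below is a minimal NumPy sketch of Equation 3 discretized over a set of point vortices, with the singular kernel replaced by a first-order Gaussian mollification in the spirit of Beale & Majda (1985). It is an illustrative sketch rather than our implementation: the array layout, the mollification radius `delta`, and the example configuration are assumptions made for exposition.

```python
import numpy as np

def biot_savart_velocity(query, positions, strengths, delta=0.05):
    """Velocity induced at `query` (shape (2,)) by a set of 2D point vortices.

    positions: (n, 2) vortex positions; strengths: (n,) circulations.
    The singular kernel x_perp / (2*pi*|x|^2) is mollified by a first-order
    Gaussian factor (1 - exp(-|x|^2 / delta^2)), one of many possible choices.
    """
    diff = query[None, :] - positions                       # (n, 2) offsets x - x_i
    r2 = np.sum(diff**2, axis=1) + 1e-12                    # squared distances
    perp = np.stack([-diff[:, 1], diff[:, 0]], axis=1)      # 90-degree rotation of each offset
    mollifier = 1.0 - np.exp(-r2 / delta**2)                # removes the singularity as r -> 0
    weights = strengths * mollifier / (2.0 * np.pi * r2)
    return np.sum(perp * weights[:, None], axis=0)          # (2,) induced velocity

# Example: a counter-rotating vortex pair induces an upward jet between the two cores.
pos = np.array([[0.4, 0.5], [0.6, 0.5]])
gam = np.array([1.0, -1.0])
print(biot_savart_velocity(np.array([0.5, 0.5]), pos, gam))
```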
4 Method

System Overview. Following the physical insight conveyed in Section 3, we design a learning system whose workflow is illustrated in Figure 2. As shown on the top row, our system takes as input a single RGB video that captures the vortical flow phenomena. As shown on the bottom row, our method learns and outputs a dynamical simulator, not on the image space itself, but on a latent space consisting of discrete vortices. Learning the latent dynamics in the vortex space is useful and feasible only if we can tie it back to the image space: it is the image space on which we want to perform future prediction, and we have no ground-truth values for the vortex particles to begin with. The bridge between the vortex space and the image space derives from Equation 3, which supplies the core insight that there exists a learnable mapping from vortex particles to the continuous velocity field at arbitrary positions. This mapping is modeled by our learned dynamics module $\mathcal{D}$, which gives rise to the intermediate velocity space, as shown in the middle row of Figure 2.
4.1 Differentiable Vortex Particles
We track a collection of $n$ vortex particles $\{p_i\}_{i=1}^{n}$. We define each vortex as the 3-tuple $p_i = (\boldsymbol{x}_i, w_i, \sigma_i)$, where $\boldsymbol{x}_i$ represents the position, $w_i$ the vortex strength, and $\sigma_i$ the size. The number of particles $n$ is a hyperparameter, which we set to 16 for all our results. Further discussions and experiments regarding the choice of $n$ can be found in Appendix D. We also note that, since we are concerned with 2D inviscid incompressible flow, the size of a vortex does not change in time due to incompressibility, and the vortex strength does not change in time due to Kelvin's circulation theorem (see Hald (1979) for a thorough discussion).
Learning Particle Trajectory. As shown in Figure 3, we learn a particle trajectory module: a query function $\mathcal{T}$ such that $\mathcal{T}(t) = \{(\boldsymbol{x}_i(t), w_i, \sigma_i)\}_{i=1}^{n}$, which predicts the configuration of all the vortices at any time $t \in [0, T]$, where $T$ represents the end time of the input video. As described above, predicting $\mathcal{T}(t)$ boils down to determining two time-invariant components, (1) the strengths $w_i$ and (2) the sizes $\sigma_i$, and one time-varying component, the positions $\boldsymbol{x}_i(t)$. For the two time-invariant components, we introduce two trainable vectors to represent the strengths and sizes respectively (this parameterization involves a hyperparameter which we set to 0.03). The vortex sizes and strengths are optimized to fit the motion depicted by the input RGB video. For the time-varying component, we use a neural network to encode $\boldsymbol{x}_i(t)$, and the particle velocities $\dot{\boldsymbol{x}}_i(t)$ can be extracted using automatic differentiation. We note that learning the full particle trajectory, rather than only the initial particle configuration, allows the aggregation of dynamics information throughout the input video for better inference and prediction. We provide further discussion on this design in Appendix F.
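The following PyTorch sketch illustrates one possible form of this representation: trainable per-particle strength and size tensors plus a small MLP that maps time to particle positions, with velocities obtained by automatic differentiation. The network architecture, the softplus size parameterization, and the tensor names are illustrative assumptions rather than the exact choices described in Appendix A.

```python
import torch
import torch.nn as nn

class VortexTrajectory(nn.Module):
    """Query function T: t -> {(x_i(t), w_i, sigma_i)} for n vortex particles."""
    def __init__(self, n_particles=16, hidden=64):
        super().__init__()
        self.strength = nn.Parameter(torch.zeros(n_particles))   # time-invariant strengths w_i
        self.size_raw = nn.Parameter(torch.zeros(n_particles))   # time-invariant sizes (pre-activation)
        self.pos_net = nn.Sequential(                            # time-varying positions x_i(t)
            nn.Linear(1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * n_particles),
        )

    def forward(self, t):
        """t: scalar tensor. Returns positions (n, 2), strengths (n,), sizes (n,)."""
        pos = self.pos_net(t.reshape(1, 1)).reshape(-1, 2)
        size = nn.functional.softplus(self.size_raw) + 1e-2      # keep sizes positive (illustrative)
        return pos, self.strength, size

traj = VortexTrajectory()
t = torch.tensor(0.3)
pos, w, sigma = traj(t)
# Particle velocities dx_i/dt via automatic differentiation through T.
vel = torch.autograd.functional.jacobian(lambda tt: traj(tt)[0], t)   # (n, 2)
```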
Trajectory Initialization. As discussed above, the trajectory has three learnable components: the strength vector, the size vector, and the position network. We initialize the strength and size vectors as zero vectors, which gives every particle zero strength and the default size at the start of training. Conceptually, these vortices are initialized as large blobs with no vortex strength, which then learn to alter their sizes and grow their strengths to better recreate the eddies seen in the video. The initial positions are regularly spaced points that populate the entire domain: we initialize the 16 particles to lie at the centers of a $4 \times 4$ grid. To do so, we simply pretrain $\mathcal{T}$ so that the predicted positions evaluate to the grid centers. The details regarding pretraining are given in Appendix A.

Learning the Vortex-to-Velocity Mapping. The vortex-to-velocity mapping is performed by our dynamics module $\mathcal{D}$, which predicts the velocity $\boldsymbol{u}(\boldsymbol{x})$ at an arbitrary query point $\boldsymbol{x}$ given the collection of vortices $\{p_i\}_{i=1}^{n}$. Following the physical insight conveyed in Section 3, $\mathcal{D}$ should evaluate the integration:

$$\boldsymbol{u}(\boldsymbol{x}) = \int K_{\theta}(\boldsymbol{x}-\boldsymbol{x}')\,\omega(\boldsymbol{x}')\,d\boldsymbol{x}', \qquad (4)$$

which replaces the kernel $K$ in Equation 3 by a learnable mapping $K_{\theta}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}$, with $d$ representing the spatial dimension. Rather than directly using a neural network to model this mapping, we incorporate further physical insights by analyzing the structure of $K$. As derived in Beale & Majda (1985), the kernel for 2-dimensional flow exhibits the following form:

$$K(\boldsymbol{x}) = \frac{\boldsymbol{x}^{\perp}}{|\boldsymbol{x}|}\, g(|\boldsymbol{x}|), \qquad (5)$$

where $\boldsymbol{x}^{\perp}/|\boldsymbol{x}|$ computes the unit direction of the cross product of $\boldsymbol{x}$ and the out-of-plane unit vector, and $g$ is the human-heuristic term that varies by choice. Hence, we opt to replace $g$ by a neural network function $g_{\theta}$ so that:

$$K_{\theta}(\boldsymbol{x}) = \frac{\boldsymbol{x}^{\perp}}{|\boldsymbol{x}|}\, g_{\theta}(|\boldsymbol{x}|), \qquad (6)$$

$$\boldsymbol{u}(\boldsymbol{x}) = \mathcal{D}\big(\boldsymbol{x}, \{p_i\}\big) = \sum_{i=1}^{n} K_{\theta}(\boldsymbol{x}-\boldsymbol{x}_i)\, w_i. \qquad (7)$$
Learning this induction kernel instead of using heuristics-based kernels allows for more accurate fluid learning and prediction from input videos. We discuss more on this in Appendix E.
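A sketch of Equations 6 and 7 with the heuristic radial profile replaced by a small MLP is given below. How (and whether) the per-vortex size enters the radial profile, as well as the MLP architecture, are assumptions for illustration; the actual kernel network is described in Appendix A.

```python
import torch
import torch.nn as nn

class LearnableInductionKernel(nn.Module):
    """Vortex-to-velocity mapping D: the radial profile g in Eq. 6 becomes an MLP."""
    def __init__(self, hidden=40):
        super().__init__()
        self.profile = nn.Sequential(                 # g_theta: scalar distance -> scalar weight
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query, pos, strength, size):
        """query: (m, 2) points; pos: (n, 2); strength, size: (n,). Returns (m, 2) velocities."""
        diff = query[:, None, :] - pos[None, :, :]                         # (m, n, 2)
        dist = diff.norm(dim=-1, keepdim=True).clamp_min(1e-8)             # (m, n, 1)
        unit_perp = torch.stack([-diff[..., 1], diff[..., 0]], dim=-1) / dist
        # Assumption: normalize the distance by each vortex's size before the radial profile.
        g = self.profile((dist / size[None, :, None]).reshape(-1, 1)).reshape(dist.shape)
        return (unit_perp * g * strength[None, :, None]).sum(dim=1)        # Eq. 7: sum over vortices
```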
4.2 End-to-end training
As previously mentioned, the dynamics on the latent vortex space is bridged to the evolution of the image space through the differentiable dynamics module $\mathcal{D}$. Hence, we can optimize the vortex representation at time $t$ using images as supervision. First, we select $s+1$ consecutive frames $I_{t}, I_{t+1}, \ldots, I_{t+s}$ from the video. Then, we compute the velocity field $\boldsymbol{u}_{t}$ by evaluating $\mathcal{D}$ on the configuration $\mathcal{T}(t)$. After that, $\boldsymbol{u}_{t}$ is fed into an integrator on the Eulerian grid to predict $\hat{I}_{t+1}$. Simultaneously, $\boldsymbol{u}_{t}$ is fed into an integrator on the Lagrangian particles to predict the vortex configuration at time $t+1$. The process is then repeated, using $\hat{I}_{t+1}$ in place of $I_{t}$ and the advected configuration in place of $\mathcal{T}(t)$, to generate $\hat{I}_{t+2}$ and the configuration at $t+2$, and so on. Eventually, we obtain $\hat{I}_{t+1}, \ldots, \hat{I}_{t+s}$, the predicted outcomes starting at time $t$. We optimize $\mathcal{T}$ and $\mathcal{D}$ jointly by minimizing the difference between $\hat{I}_{t+1}, \ldots, \hat{I}_{t+s}$ and $I_{t+1}, \ldots, I_{t+s}$ in an end-to-end fashion.

By picking different values of $t$ in each training iteration to cover $[0, T]$, we optimize $\mathcal{T}$ and $\mathcal{D}$ to fit the input video. There remains one more caveat: the trajectories encoded by $\mathcal{T}$ are not enforced to be consistent with the velocities predicted by $\mathcal{D}$, because each frame of the rollout is optimized individually. In other words, if we evaluate the particle velocities $\dot{\boldsymbol{x}}_{i}(t)$ as prescribed by $\mathcal{T}$ (via automatic differentiation), they should coincide with the velocities induced at the particle locations as prescribed by $\mathcal{D}$. Hence, in training, another loss is computed between these two quantities to align the vortex trajectory with the predicted velocity.
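For illustration, the following schematic shows one training iteration combining the rolled-out image loss with the trajectory-alignment loss, reusing the trajectory and kernel modules sketched above. The differentiable Eulerian integrator `advect`, the pixel coordinates `grid_pts`, and the loss weighting `lam` are placeholders; this is a sketch of the procedure, not the released implementation.

```python
import torch

def training_step(traj, kernel, frames, t0, s, advect, grid_pts, dt=1.0, lam=1.0):
    """One DVP training iteration on frames I_{t0}, ..., I_{t0+s}.

    frames: (s+1, H, W, 3) tensor; traj, kernel: modules from the sketches above.
    advect(image, velocity, dt): a differentiable Eulerian integrator (e.g., BFECC), assumed given.
    grid_pts: (H*W, 2) pixel-center coordinates at which the velocity field is rasterized.
    """
    img = frames[0]
    pos, w, sigma = traj(torch.tensor(float(t0)))
    image_loss, align_loss = 0.0, 0.0
    for k in range(s):
        u_grid = kernel(grid_pts, pos, w, sigma)          # Eulerian velocity, (H*W, 2)
        img = advect(img, u_grid, dt)                     # predict frame t0+k+1
        u_p = kernel(pos, pos, w, sigma)                  # induced velocities at the particles
        pos = pos + dt * u_p                              # Lagrangian forward-Euler step
        image_loss = image_loss + ((img - frames[k + 1]) ** 2).mean()
        # Align the trajectory network's velocities with the induced velocities at time t0+k.
        traj_vel = torch.autograd.functional.jacobian(
            lambda tt: traj(tt)[0], torch.tensor(float(t0 + k)), create_graph=True)
        align_loss = align_loss + ((traj_vel - u_p.detach()) ** 2).mean()
    return image_loss + lam * align_loss
```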
Deployment. After successful training, our learned system performs two important tasks. First, using our query function $\mathcal{T}$, we are able to temporally interpolate for $t \in [0, T]$, which then uncovers the hidden velocity field at arbitrary resolutions, providing the same functionality as Raissi & Karniadakis (2018), but using vorticity instead of pressure as the secondary variable. Moreover, with the dynamics module $\mathcal{D}$, we can perform future prediction to unroll the input video, a feature unsupported by previous methods. As shown in Figure 4, since our method is forward-simulating by nature, it can provide more realistic and robust future predictions than existing methods or their extensions. Further implementation details of our method, including hyperparameters, network architectures, training schemes, and computational costs, can be found in Appendix A.


5 Experiments
We evaluate our method’s ability to perform motion inference and future prediction on both synthetic and real videos, comparing against existing methods.
Baselines. For motion inference, we compare our method against Raissi & Karniadakis (2018) (HFM) and Zhang et al. (2022) (E-R). We reimplement the HFM method as prescribed in the paper, with the sole modification that, instead of using a single concentration variable and its corresponding auxiliary variable, we create three such pairs, one for each RGB channel, to support colored videos. The E-R method is evaluated using the published pretrained models. We further compare against an ablated version of our proposed method, termed "UNet", which replaces the Lagrangian components of the system with a UNet architecture (Ronneberger et al., 2015), a classic method for learning field-to-field mappings. The UNet baseline takes two consecutive images and predicts a velocity field, which is then used to predict the next frame with the same Eulerian integrator as our method. For future prediction, no previous methods operate in comparable settings, so we extend the inference methods in a few logical and straightforward ways to support future prediction. First, since HFM offers a query function parameterized by time $t$, we test its future-prediction behavior by simply extrapolating $t$ beyond the training window; this is referred to as "HFM extp.". Since both Raissi & Karniadakis (2018) and Zhang et al. (2022) uncover the time-varying velocity field, we use a UNet to learn the evolution from $\boldsymbol{u}_{t}$ to $\boldsymbol{u}_{t+1}$, and use this velocity update mechanism to perform future prediction. The two baselines thus obtained are referred to as "HFM+UNet" and "E-R+UNet" respectively. Our method's ablation "UNet" supports future prediction intrinsically.
5.1 Synthetic Video
The synthetic video for vortical flow is generated using the Discrete Vortex Method with a first-order Gaussian mollifying kernel (Beale & Majda, 1985). The high-fidelity BFECC advection scheme (Kim et al., 2005) with third-order Runge-Kutta time integration is deployed. The simulation advects a background image grid to create 300 simulated frames. Only the first 100 frames are disclosed to train all methods, and future predictions are tested and examined on the following 200 frames.
Motion Inference. The results for uncovering the hidden dynamic variables are illustrated in Figures 5 and 6. Figure 5 shows the velocities uncovered by all 4 methods against the ground truth at frame 55 of the synthetic video with 100 observed frames. The velocity is visualized as colors (top row) and streamlines (middle row), while the velocity residual, measured in end-point error (EPE), is depicted on the bottom row. HFM, UNet, and our method produce consistent results, all matching the ground truth with high accuracy. The bottom row shows that, compared to HFM and UNet, our method infers the velocity that best matches the unseen ground truth.
The inference results over the full 100 frames are depicted at the top of Figure 6. We evaluate the velocity with four metrics: the average end-point error (AEPE), the average angular error (AAE), the vorticity RMSE, and the compressibility (divergence) RMSE. On all 4 metrics, our method consistently outperforms the baselines. The time-averaged values of all four metrics are shown on the left of Table 1, which favors our method on every metric used.
Future Prediction. In Figure 7, we visually compare the future prediction results (from frame 100 to frame 299) using our method and the 4 benchmarks against the ground truth. It can be seen that the sequence generated by our method best matches the ground truth video, capturing the vortical flow structures, while the other baselines either quickly diffuse or generate unnatural, hard-edged patterns. Numerical analysis confirms these visual observations, as we compare the 200 future frames in terms of both velocity and visual similarity. The velocity analysis inherits the same 4 metrics, and the visual similarity is gauged using the pixel-level RMSE and the VGG feature reconstruction loss (Johnson et al., 2016). The time-averaged results of all 6 metrics are documented on the right of Table 1, and the time-varying results are plotted on the bottom of Figure 6. It can be concluded from the visual and numerical evidence that our method outperforms the baselines in this case.

Table 1: Time-averaged inference errors (AEPE, AAE, Vort., Div.) and time-averaged prediction errors (VGG, RMSE, AEPE, AAE, Vort., Div.) on the synthetic video.

| Method | AEPE | AAE | Vort. | Div. | Prediction | VGG | RMSE | AEPE | AAE | Vort. | Div. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| E-R | 0.505 | 1.393 | 8.470 | 2.319 | +UNet | 4.346 | 0.205 | 0.631 | 1.424 | 12.84 | 6.580 |
| HFM | 0.100 | 0.212 | 3.949 | 0.202 | +UNet | 4.258 | 0.205 | 0.720 | 1.062 | 36.73 | 10.41 |
| | | | | | Extp. | 4.080 | 0.285 | 0.541 | 1.464 | 7.761 | 4.315 |
| UNet | 0.048 | 0.100 | 1.799 | 1.145 | | 4.530 | 0.211 | 0.424 | 1.159 | 7.334 | 3.017 |
| Ours | 0.020 | 0.041 | 0.976 | 0.053 | | 2.010 | 0.080 | 0.048 | 0.096 | 1.621 | 0.043 |
5.2 Real Video
A similar numerical analysis is carried out on a real video published on YouTube, as shown in Figure 8. The video has 150 frames: the first 100 frames are used for training, while the remaining 50 are reserved for testing. Since the ground-truth velocities for the real video are unavailable, we only analyze the future-prediction performance. For all methods, we perform future prediction for 150 frames; among these, the first 50 frames are compared with the original video, and the remaining 100 frames are evaluated visually and qualitatively. Since only part of the video is fluid (within the circular rim), we pre-generate a signed distance field for all methods, so that only the fluid region is considered in learning and simulation. The same boundary condition is employed for all methods (except for "HFM extp.", which requires no advection).
The numerical analysis for the first 50 predicted frames is documented in Table 2 and plotted in Figure 9. We compare our method against the baselines using the VGG perceptual loss for visual plausibility, and the velocity divergence (which should in theory be zero for incompressible fluids) for physical integrity. Our method prevails on all metrics used. For prediction results that exceed the duration of the real video, qualitative observations can be made: our method preserves the vortical structures and generates smooth visualizations over the entire time horizon, while the other methods end up yielding glitchy patterns.

We perform additional quantitative benchmark tests in Appendix B against a differentiable grid-based simulator on real and synthetic videos, and in Appendix C against 4 baselines on another synthetic video featuring different visual and dynamical distributions.
6 Conclusion & Limitations
In this work, we propose a novel data-driven system to perform fluid hidden-dynamics inference and future prediction from single RGB videos, leveraging a novel vortex latent space. The success of our method on synthetic and real data, both qualitatively and quantitatively, suggests the potential of embedding Lagrangian structures for fluid learning. Our method has several limitations. First, our vortex model is currently limited to 2D inviscid flow. Extending to 3D, viscous flow is an exciting direction, which can be enabled by allowing vortex strengths and sizes to evolve in time (Mimeau & Mortazavi, 2021). Second, our vortex evolution does not take boundary conditions into account in a physically-based manner; hence, it cannot accurately predict flow details around a solid boundary. Incorporating learning-based boundary modeling may be an interesting exploration. Third, scaling our method to handle turbulence with multi-scale vortices remains to be explored. We consider two additional directions for future work. First, we plan to explore the numerical accuracy of our neural vortex representation to improve current vortex particle methods for scientific computing. Second, we plan to combine our differentiable simulator with neural rendering methods to synthesize visually appealing simulations from 3D videos.
Acknowledgments
We thank all the anonymous reviewers for their constructive feedback. This work is in part supported by ONR MURI N00014-22-1-2740, NSF RI #2211258, #1919647, #2106733, #2144806, #2153560, the Stanford Institute for Human-Centered AI (HAI), Google, and Qualcomm.
References
- Battaglia et al. (2016) Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. Advances in neural information processing systems, 29, 2016.
- Beale & Majda (1985) J Thomas Beale and Andrew Majda. High order accurate vortex methods with explicit velocity kernels. Journal of Computational Physics, 58(2):188–208, 1985.
- Bongard & Lipson (2007) Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.
- Brochu et al. (2012) Tyson Brochu, Todd Keeler, and Robert Bridson. Linear-time smoke animation with vortex sheet meshes. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 87–95. Citeseer, 2012.
- Brunton et al. (2016) Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 113(15):3932–3937, 2016.
- Chang et al. (2016) Michael B Chang, Tomer Ullman, Antonio Torralba, and Joshua B Tenenbaum. A compositional object-based approach to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.
- Chu & Thuerey (2017) Mengyu Chu and Nils Thuerey. Data-driven synthesis of smoke flows with cnn-based feature descriptors. ACM Transactions on Graphics (TOG), 36(4):1–14, 2017.
- Chu et al. (2022) Mengyu Chu, Lingjie Liu, Quan Zheng, Erik Franz, Hans-Peter Seidel, Christian Theobalt, and Rhaleb Zayer. Physics informed neural fields for smoke reconstruction with sparse data. ACM Transactions on Graphics (TOG), 41(4):1–14, 2022.
- Cottet et al. (2000) Georges-Henri Cottet, Petros D Koumoutsakos, et al. Vortex methods: theory and practice, volume 8. Cambridge university press Cambridge, 2000.
- Fedkiw et al. (2001) Ronald Fedkiw, Jos Stam, and Henrik Wann Jensen. Visual simulation of smoke. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’01, pp. 15–22, New York, NY, USA, 2001. Association for Computing Machinery. ISBN 158113374X. doi: 10.1145/383259.383260. URL https://doi.org/10.1145/383259.383260.
- Guan et al. (2022) Shanyan Guan, Huayu Deng, Yunbo Wang, and Xiaokang Yang. Neurofluid: Fluid dynamics grounding with particle-driven neural radiance fields. In ICML, 2022.
- Guo et al. (2016) Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 481–490, 2016.
- Hald (1979) Ole H. Hald. Convergence of vortex methods for euler’s equations. ii. SIAM Journal on Numerical Analysis, 16(5):726–755, 1979. ISSN 00361429. URL http://www.jstor.org/stable/2156630.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
- Hu et al. (2020) Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand. Difftaichi: Differentiable programming for physical simulation. ICLR, 2020.
- Jin et al. (2021) Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. Nsfnets (navier-stokes flow nets): Physics-informed neural networks for the incompressible navier-stokes equations. Journal of Computational Physics, 426:109951, 2021.
- Johnson et al. (2016) Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, pp. 694–711. Springer, 2016.
- Kim et al. (2005) ByungMoon Kim, Yingjie Liu, Ignacio Llamas, and Jaroslaw R Rossignac. Flowfixer: Using bfecc for fluid simulation. Technical report, Georgia Institute of Technology, 2005.
- Kim et al. (2020) Byungsoo Kim, Vinicius C Azevedo, Markus Gross, and Barbara Solenthaler. Lagrangian neural style transfer for fluids. ACM Transactions on Graphics (TOG), 39(4):52–1, 2020.
- Kutz et al. (2016) J Nathan Kutz, Steven L Brunton, Bingni W Brunton, and Joshua L Proctor. Dynamic mode decomposition: data-driven modeling of complex systems. SIAM, 2016.
- Ladickỳ et al. (2015) L’ubor Ladickỳ, SoHyeon Jeong, Barbara Solenthaler, Marc Pollefeys, and Markus Gross. Data-driven fluid simulations using regression forests. ACM Transactions on Graphics (TOG), 34(6):1–9, 2015.
- Leonard (1980) Anthony Leonard. Vortex methods for flow simulation. Journal of Computational Physics, 37(3):289–335, 1980.
- Li et al. (2018) Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. arXiv preprint arXiv:1810.01566, 2018.
- Mimeau & Mortazavi (2021) Chloé Mimeau and Iraj Mortazavi. A review of vortex methods and their applications: From creation to recent advances. Fluids, 6(2):68, 2021.
- Park & Kim (2005) Sang Il Park and Myoung Jun Kim. Vortex fluid for gaseous phenomena. In Proceedings of the ACM SIGGRAPH/Eurographics symposium on Computer animation, pp. 261–270, 2005.
- Perlman (1985) Mirta Perlman. On the accuracy of vortex methods. Journal of Computational Physics, 59(2):200–223, 1985. ISSN 0021-9991. doi: https://doi.org/10.1016/0021-9991(85)90142-1. URL https://www.sciencedirect.com/science/article/pii/0021999185901421.
- Pfaff et al. (2020) Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409, 2020.
- Raissi & Karniadakis (2018) Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
- Raissi et al. (2017) Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using gaussian processes. Journal of Computational Physics, 348:683–693, 2017.
- Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
- Raissi et al. (2020) Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
- Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation, 2015. URL https://arxiv.org/abs/1505.04597.
- Rudy et al. (2017) Samuel H Rudy, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Data-driven discovery of partial differential equations. Science advances, 3(4):e1602614, 2017.
- Sanchez-Gonzalez et al. (2018) Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. Graph networks as learnable physics engines for inference and control. In International Conference on Machine Learning, pp. 4470–4479. PMLR, 2018.
- Sanchez-Gonzalez et al. (2020) Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter Battaglia. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning, pp. 8459–8468. PMLR, 2020.
- Schmid (2010) Peter J Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of fluid mechanics, 656:5–28, 2010.
- Schmidt & Lipson (2009) Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. science, 324(5923):81–85, 2009.
- Selle et al. (2005) Andrew Selle, Nick Rasmussen, and Ronald Fedkiw. A vortex particle method for smoke, water and explosions. In ACM SIGGRAPH, pp. 910–914, 2005.
- Sitzmann et al. (2020) Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33:7462–7473, 2020.
- Tompson et al. (2017) Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. Accelerating eulerian fluid simulation with convolutional networks. In International Conference on Machine Learning, pp. 3424–3433. PMLR, 2017.
- Ummenhofer et al. (2019) Benjamin Ummenhofer, Lukas Prantl, Nils Thuerey, and Vladlen Koltun. Lagrangian fluid simulation with continuous convolutions. In International Conference on Learning Representations, 2019.
- Watters et al. (2017) Nicholas Watters, Daniel Zoran, Theophane Weber, Peter Battaglia, Razvan Pascanu, and Andrea Tacchetti. Visual interaction networks: Learning a physics simulator from video. Advances in neural information processing systems, 30, 2017.
- Weißmann & Pinkall (2010) Steffen Weißmann and Ulrich Pinkall. Filament-based smoke with vortex shedding and variational reconnection. In ACM SIGGRAPH, pp. 1–12, 2010.
- Wiewel et al. (2019) Steffen Wiewel, Moritz Becher, and Nils Thuerey. Latent space physics: Towards learning the temporal evolution of fluid flow. Computer Graphics Forum, 38(2):71–82, 2019.
- Winckelmans & Leonard (1993) G.S. Winckelmans and A. Leonard. Contributions to vortex particle methods for the computation of three-dimensional incompressible unsteady flows. Journal of Computational Physics, 109(2):247–273, 1993. ISSN 0021-9991. doi: https://doi.org/10.1006/jcph.1993.1216. URL https://www.sciencedirect.com/science/article/pii/S0021999183712167.
- Xie et al. (2018) You Xie, Erik Franz, Mengyu Chu, and Nils Thuerey. tempogan: A temporally coherent, volumetric gan for super-resolution fluid flow. ACM Transactions on Graphics (TOG), 37(4):1–15, 2018.
- Xiong et al. (2020) Shiying Xiong, Xingzhe He, Yunjin Tong, and Bo Zhu. Neural vortex method: from finite lagrangian particles to infinite dimensional eulerian dynamics. arXiv preprint arXiv:2006.04178, 2020.
- Yang et al. (2020) Liu Yang, Dongkun Zhang, and George Em Karniadakis. Physics-informed generative adversarial networks for stochastic differential equations. SIAM Journal on Scientific Computing, 42(1):A292–A317, 2020.
- Zhang et al. (2022) Mingrui Zhang, Jianhong Wang, James Tlhomole, and Matthew D Piggott. Learning to estimate and refine fluid motion with physical dynamics. arXiv preprint arXiv:2206.10480, 2022.
Appendix A Implementation Details
In this section, we describe the implementation details of our proposed method.
Integrators. As described above and illustrated in Figure 2, our system embeds two differentiable integrators in the loop. The Eulerian integrator is implemented using the Back and Forth Error Compensation and Correction (BFECC) method (Kim et al., 2005) for value look-up, and the third-order Runge-Kutta method for time-stepping. The Lagrangian integrator is implemented using the forward Euler method.
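A compact NumPy sketch of the two integrators is given below, assuming fields sampled at pixel centers, velocities expressed in pixels per unit time, and clamped bilinear lookups at the boundary; the backtracing here is first-order rather than the third-order Runge-Kutta used in practice.

```python
import numpy as np

def bilinear(field, x, y):
    """Sample a (H, W, C) field at continuous pixel coordinates (x, y), clamped at the border."""
    H, W = field.shape[:2]
    x = np.clip(x, 0.0, W - 1.001)
    y = np.clip(y, 0.0, H - 1.001)
    x0 = np.floor(x).astype(int); y0 = np.floor(y).astype(int)
    fx = (x - x0)[..., None];     fy = (y - y0)[..., None]
    top = field[y0, x0] * (1 - fx) + field[y0, x0 + 1] * fx
    bot = field[y0 + 1, x0] * (1 - fx) + field[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def semi_lagrangian(field, vel, dt):
    """First-order semi-Lagrangian advection (the backtrace is third-order RK in practice)."""
    H, W = field.shape[:2]
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return bilinear(field, xx - dt * vel[..., 0], yy - dt * vel[..., 1])

def bfecc(field, vel, dt):
    """Back and Forth Error Compensation and Correction (Kim et al., 2005)."""
    fwd = semi_lagrangian(field, vel, dt)
    back = semi_lagrangian(fwd, -vel, dt)        # advect back to estimate the scheme's error
    corrected = field + 0.5 * (field - back)     # compensate half of the round-trip error
    return semi_lagrangian(corrected, vel, dt)

def advance_particles(pos, vel_at_pos, dt):
    """Lagrangian integrator: one forward-Euler step for the vortex particles."""
    return pos + dt * vel_at_pos
```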
Network $\mathcal{T}$. The trajectory network adopts a series of 3 residual blocks with increasing widths, whose architecture is similar to He et al. (2016) but with the convolution layers replaced by linear layers with sine activation functions. The frequency factor discussed in Sitzmann et al. (2020) is set to 1.
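A minimal sketch of one such residual block with sine activations (frequency factor 1) is shown below; the layer widths and the skip-projection choice are placeholders.

```python
import torch
import torch.nn as nn

class SineResidualBlock(nn.Module):
    """Residual block of linear layers with sine activations (frequency factor 1)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, out_dim)
        self.fc2 = nn.Linear(out_dim, out_dim)
        self.skip = nn.Linear(in_dim, out_dim) if in_dim != out_dim else nn.Identity()

    def forward(self, x):
        h = torch.sin(self.fc1(x))   # sine activation, frequency factor 1
        h = torch.sin(self.fc2(h))
        return h + self.skip(x)      # residual connection as in He et al. (2016)
```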
Network $g_{\theta}$. The kernel network is structured as follows. First, the input distance is rescaled by a hyperparameter that corresponds to the characteristic scale of the vortices. Then, the rescaled distance is reparametrized to stretch the values near 0, exploiting the insight that the velocity varies more aggressively near a vortex. The resulting value is fed through 4 residual blocks, which are the same as in $\mathcal{T}$ but with a shared width of 40. The output of these residual blocks is then rescaled to produce the final output of $g_{\theta}$, which is used for the velocity computation according to Equation 7.
Training details. Both the image loss and the velocity alignment loss are MSE losses, and the velocity alignment loss carries an extra scaling factor. We use the Adam optimizer with separate learning rates for $\mathcal{T}$, $g_{\theta}$, and the trainable vortex parameters. We use a step learning-rate scheduler that decays the learning rates at iteration 20000. We use a batch size of 4, so for each iteration, 4 starting times are picked uniformly at random within the observed time window for evaluation. The sliding-window size $s$ is set to 2.
Pretraining $\mathcal{T}$. We pretrain $\mathcal{T}$ for 10000 iterations with 2 objectives: (1) for all $t$, the particle positions coincide with the centers of a $4 \times 4$ grid; (2) for all $t$, the particle velocities are zero, so that the particles are initialized to be stationary. We use MSE for the positional and velocity losses, and the other training specifications are the same as described above.
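A minimal sketch of one pretraining step under these two objectives, assuming grid centers on the unit square, normalized video time in [0, 1), and the jacobian-based velocity evaluation used in the earlier sketches:

```python
import torch

def pretrain_step(traj, optimizer, n_side=4):
    """Push T(t) toward stationary particles at the centers of an n_side x n_side grid."""
    ticks = (torch.arange(n_side, dtype=torch.float32) + 0.5) / n_side
    centers = torch.stack(torch.meshgrid(ticks, ticks, indexing="ij"), dim=-1).reshape(-1, 2)
    t = torch.rand(())                                            # random time in [0, 1)
    pos, _, _ = traj(t)
    vel = torch.autograd.functional.jacobian(lambda tt: traj(tt)[0], t, create_graph=True)
    loss = ((pos - centers) ** 2).mean() + (vel ** 2).mean()      # positional MSE + stationarity MSE
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```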
Computational performance. Running on a laptop with Nvidia RTX 3070 Ti and Intel Core i7-12700H, our model takes around 0.4s per training iteration, and around 40000 iterations to converge (for a video with 100 frames). For inference, each advance step costs around 0.035s.
Appendix B Comparison with Differentiable Fluid Simulation



We compare our method qualitatively and quantitatively against a standard, grid-based differentiable fluid simulator (referred to as Diff-Sim) on both synthetic and real videos. This baseline method is an auto-differentiable implementation of the method proposed by Fedkiw et al. (2001), which is a classic, widely-adopted numerical method for simulating vortical fluids. The method is designed to solve the 2D Euler equations for inviscid fluid, hence it can in theory recreate the inviscid fluid phenomena represented by any video if provided with the appropriate initial conditions and simulation parameters.
Therefore, in this experiment, we make use of its differentiable nature to optimize (1) the initial grid velocities (a velocity tensor defined on the simulation grid), and (2) the vorticity confinement strength, a scalar value, with the objective of minimizing the discrepancy between the simulated results and the input video. The loss computation between the simulated image sequence and the ground truth is the same as in our method. We note that the idea of optimizing initial conditions using differentiable fluid simulation to fit specific target frames has been explored in Hu et al. (2020). However, their task is notably simpler than ours, since they only require the simulated image to match a target frame at the end of the simulation, while our goal is to match the underlying motion of the entire video and dynamically unroll into the future.
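Schematically, the Diff-Sim baseline's optimization loop looks like the sketch below, where `differentiable_sim` stands in for an auto-differentiable implementation of Fedkiw et al. (2001); the function name, its interface, and the hyperparameters are ours, purely for illustration.

```python
import torch

def fit_diff_sim(frames, differentiable_sim, grid_shape, n_iters=2000, lr=1e-2):
    """Optimize the initial grid velocity and the vorticity-confinement strength so that
    the rolled-out simulation matches the observed frames (same image loss as our method)."""
    u0 = torch.zeros(*grid_shape, 2, requires_grad=True)      # initial velocity field on the grid
    confinement = torch.tensor(0.5, requires_grad=True)       # scalar confinement strength
    opt = torch.optim.Adam([u0, confinement], lr=lr)
    for _ in range(n_iters):
        rollout = differentiable_sim(frames[0], u0, confinement, steps=len(frames) - 1)
        loss = ((rollout - frames[1:]) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return u0.detach(), confinement.detach()
```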

Comparison on a synthetic video. We start by comparing both methods on a synthetic video with 300 frames (the first 100 observed for training, the last 200 reserved for testing), which yields a visual comparison that can be found in Figure 10. We observe that our method successfully learns the dynamics represented in the video: the generated video and velocities closely resemble the ground truth even in the unseen frames. Diff-Sim, on the other hand, shows a weak resemblance with the ground truth for the seen frames, but fails to capture the individual eddies in the video. Consequently, it fails to predict the future dynamics. Diff-Sim’s lack of correspondence to the dynamics of the ground truth is also made evident in Figure 13. The result clearly suggests that our method has better learned the dynamical evolution. This performance discrepancy is numerically supported by the errors documented on the left panel of Table 3 and plotted on the left panel of Figure 11, both showing that our method yields reduced image-level and velocity-level errors compared to Diff-Sim.
Comparison on a real video. We then use the same experimental setup to perform learning on a real video with 139 frames (the first 93 observed for training, the last 46 reserved for testing), as depicted in Figure 12. We observe that the behavioral patterns seen for both systems on the synthetic video carry over to the real one. For the results generated by Diff-Sim (top row), the overall, large-scale motion (the large eddy moving towards the bottom-left) is faintly identifiable. Nevertheless, all the smaller vortices are missing, and the entire image quickly diffuses as the simulation proceeds. This can be attributed to the numerical diffusion innate to grid-based simulations, as well as the lack of embedded fluid structures. In comparison, our method preserves the vortical movements well thanks to its built-in structure, and produces a plausible future rollout extending beyond the duration of the original video. Although neither system can perfectly model the exact mechanism that governs this real-world video (due to unmodeled factors such as fluid viscosity, air friction, and 3-dimensional forces), our proposed method does a better job of retaining the vortical patterns and energetic flows thanks to its vorticity-based formulation and its Lagrangian-Eulerian design, as can be observed in the middle row of Figure 12. The advantage of our system over Diff-Sim on the real video is numerically supported, as shown on the right panels of Table 3 and Figure 11. Since we do not have ground-truth velocities for real videos, we compare the VGG perceptual loss (Johnson et al., 2016) between the simulated sequences of both methods and the real video, which demonstrates quantitatively that our generated results resemble the input video more closely than those generated by the baseline.
Table 3: Errors of Diff-Sim and our method on the synthetic video (AEPE, AAE, Vort., RMSE, VGG) and on the real video (VGG avg., VGG final).

| Method | AEPE | AAE | Vort. | RMSE | VGG | VGG (avg.) | VGG (final) |
|---|---|---|---|---|---|---|---|
| Diff-Sim (Grid) | 0.469 | 0.953 | 24.43 | 0.157 | 26043 | 15076 | 18792 |
| Ours | 0.041 | 0.081 | 1.482 | 0.055 | 2171.4 | 7846.2 | 11081 |

Appendix C Additional Benchmark Testing



Table 4: Time-averaged inference errors (AEPE, AAE, Vort., Div.) and time-averaged prediction errors (VGG, RMSE, AEPE, AAE, Vort., Div.) on the additional synthetic video.

| Method | AEPE | AAE | Vort. | Div. | Prediction | VGG | RMSE | AEPE | AAE | Vort. | Div. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| E-R | 0.229 | 0.805 | 4.380 | 1.750 | +UNet | 8138.8 | 0.178 | 0.272 | 1.115 | 4.694 | 2.504 |
| HFM | 0.038 | 0.097 | 3.001 | 0.533 | +UNet | 9389.5 | 0.146 | 0.201 | 0.715 | 10.58 | 3.199 |
| | | | | | Extp. | 40967 | 0.166 | 0.293 | 1.221 | 4.862 | 3.152 |
| UNet | 0.026 | 0.101 | 1.013 | 0.895 | | 7721.3 | 0.170 | 0.330 | 1.141 | 5.462 | 1.496 |
| Ours | 0.015 | 0.046 | 0.480 | 0.015 | | 2045.0 | 0.097 | 0.057 | 0.173 | 1.547 | 0.013 |
As depicted in Figure 14, to further illustrate our method’s advantage and generalizability, we have conducted an additional set of numerical tests on another synthetic video (of 180 frames with the first 60 revealed for training), and compared our method’s performance with 4 benchmarks in terms of both velocity inference quality and future prediction quality. The ground truth data is generated using a significantly different background image (sharp color tiles vs. smooth color gradients), and a different velocity kernel (second-order Gaussian kernel vs. first-order Gaussian kernel) (Beale & Majda, 1985). The experimental setup is otherwise the same as the one presented in the main text (in Figure 7), with the same compared benchmarks.
The comparison of the velocity inference quality can be found in Figure 16 and the top panel of Figure 15. Figure 16 depicts the velocities uncovered at frame 40 (among the 60 input frames) by all 4 methods, compared to the ground truth. The top row depicts the respective velocities in colors, with the color wheel supplied; the middle row depicts the velocities as streamlines; and the bottom row depicts the velocity residuals compared to the ground truth, measured in end-point error (EPE). As with the results in Figure 5, we can see that HFM, UNet, and our method can all infer the underlying velocity field at high precision, whereas E-R yields a visibly noisier approximation. As seen on the bottom row, the inference performance of UNet and Ours is very close, but our method takes a slight edge with an average error (AEPE) of 0.0143, which is 33.49% less than the 0.0215 yielded by UNet. The advantage of our method is not unique to this specific frame. As plotted on the top row of Figure 15, our method (red) consistently yields the lowest velocity-inference error throughout the 60 input frames, in terms of the average end-point error (AEPE), average angular error (AAE), vorticity RMSE, and compressibility RMSE. The time-averaged errors of these metrics are documented in Table 4, which again shows that our method yields the best estimations.
Future prediction. We also compare our method's future prediction results with the baselines. In Figure 14, we show a visual comparison of all 5 methods against the ground truth. It highlights the close resemblance of our generated sequence to the ground truth, which is twice as long as the sequence used for training. Compared to the baselines, our method yields the best match to the ground truth video, capturing the accurate vortical flow structures. HFM+UNet, E-R+UNet, and UNet can generate reasonable future predictions for roughly the first 40 predicted frames. Beyond that point, these sequences start to distort in different ways, due to their lack of physical structures and constraints. The direct extrapolation of HFM yields the least plausible results, quickly degrading to noise. We compare these sequences quantitatively using the 4 velocity-based metrics, along with the 2 image-based metrics: the pixel-level RMSE and the VGG feature reconstruction loss. Four of these time-dependent errors are plotted in the bottom row of Figure 15, with their time-averaged counterparts documented on the right of Table 4. In summary, we observe that our method outperforms the existing baselines for this video both quantitatively and qualitatively.
Appendix D Number of Vortex Particles
In our proposed method, we use vortex particles to learn fluid dynamics. However, we note that vortices are not intrinsic to fluid phenomena, but are rather imposed constructs to allow fluids to be better understood conceptually and modeled numerically. Thus, the number of vortices is fundamentally a hyperparameter that does not admit a uniquely-correct value.
With this in mind, we let $n^{*}$ denote the minimum number of particles that can be used to model the fluid system to an acceptable accuracy. This natural number surely exists, since it has been proven that vortex particle methods converge to the exact solution of the 2D Euler equations (Beale & Majda, 1985; Hald, 1979). We are mostly concerned with the cases where $n > n^{*}$, which means the number of deployed degrees of freedom (DoFs) is greater than the number necessary for the given fluid system. In the following, we show that our method can spontaneously prune the redundant vortices and is thus robust to a reasonable range of $n$. In Figure 17, we show the results of learning the same underlying motion with increasing numbers of vortex particles. In Figure 18, we show the underlying velocity and vorticity fields for different numbers of vortex particles.
Spontaneous pruning of redundant DoFs. As shown on the top row of Figure 17, the ground truth is generated with 4 vortices, so it is safe to take $n^{*} = 4$. Learning with 4 vortices (as shown on the second row) represents the case where $n = n^{*}$. Comparing the first row with the second row, we can see that there is a one-to-one correspondence between the ground-truth vortices and the learned vortices, with each learned vortex assuming the role of one individual ground-truth vortex (obtaining the same vorticity and initial position).
When we deploy more vortices than the ground truth requires (third row), two interesting phenomena occur that spontaneously prune the redundant particles: degeneration and clustering. First, some particles degenerate by reducing their strengths to 0 or by moving away from the domain. We can observe both mechanisms taking place on the two lingering particles in the top part of the third row: they both have low strengths (evident from their turquoise color) and are peripheral to the domain. Secondly, multiple particles can aggregate to emulate a single particle with greater strength. Since the velocity computation is a distance-weighted summation (as in Equation 7), if multiple particles coincide at the same location, they effectively act as one single particle with their vorticities added together. This phenomenon can be observed by comparing the lower halves of the second and third rows. Both of these mechanisms enable our system to spontaneously prune redundant vortices. In the last row, we show that our method is robust even with 64 vortices.
Figure 19 helps to illustrate this spontaneous pruning mechanism by showing snapshots at different stages of the training process. Shown on the left are the vortex particles' behaviors soon after training has begun. It is particularly noticeable that, in the bottom row, the 64 particles are scattered throughout the fluid domain, and the simulated result appears quite different from the ground truth. Moving from left to right, the particles become increasingly clustered around the flow regions, with far fewer "freelance" particles, and the end result approximates the ground truth much better.
Finally, we note that the case $n < n^{*}$ is still challenging to resolve, as the system would be over-constrained. Nevertheless, we empirically find that $n = 16$ is sufficient for all the real and synthetic videos we consider in our experiments.



Appendix E Ablation: Learnable Velocity Kernel

In traditional vortex simulation applications in computer graphics and computational fluid dynamics, the velocity kernel is hand-selected (typically from Gaussian kernels of different orders) with a uniform support radius (size). Such approaches are designed for forward simulation, and they become limiting when used for backward inference tasks, i.e., to reconstruct input videos. In our method, we address this issue by learning neural kernels with learnable sizes. By leveraging data-driven techniques, we can reconstruct and predict fluid flows that are not only visually pleasing but also reproduce the specific dynamical traits embodied in the input video.
In Figure 20, we present an ablation study on the learnable velocity kernels. We reconstruct and predict a real-world video using our method and an ablated version in which the learnable kernel is replaced with a hard-coded, first-order Gaussian kernel of uniform size. The ground truth, shown on the bottom row, has 126 frames revealed for training and 62 frames hidden for testing. In the middle row, we fit the video with our learnable kernel enabled; in the top row, we do the same with the learnable kernel disabled. The middle row captures the characteristic smoothness of the flow well and simulates an image sequence that resembles the ground truth. The ablated version (top row) also learns the correct overall motion (clockwise rotation), but it induces various smaller eddies and wrinkles uncharacteristic of the input video.
Extending to unseen frames, our method can continue to retain the overall structure of the eddies, while the ablated version (without the learnable kernel) drives the pattern to disintegrate, and develops various folds and wrinkles that do not resemble the dynamical characteristics of the real video. We further show quantitative results plotted in Figure 21 and documented in Table 5. In summary, learning the velocity kernels allows for better reconstruction and prediction of fluid flow specific to the input video.
Appendix F Ablation: Trajectory Learning


In our approach, we learn the full trajectories of vortex particles for the input video. An alternative is to learn the initial condition only. However, we find that the former option is more computationally tractable and effective, since it can exploit the full range of the input video at a manageable cost. To see this, suppose we have 100 training frames in the video, and the goal is to infer the initial condition at frame-1. If we directly optimize the initial condition using the last frame, we need to simulate from frame-1 all the way to frame-100, compute the loss and backpropagate. Unrolling such a long sequence for each training iteration (1) takes a long time, (2) leads to noisy gradients, and (3) is practically infeasible due to memory constraints. On the other hand, learning the whole trajectory allows us to address these challenges by using a smaller sliding window in time (e.g., simulating only 3 frames at a time) and aggregating the dynamics information throughout the whole video. In Figure 22, we show a comparison of both methods in action, with a total of 180 simulated frames. On the top panel, we show the reconstruction and prediction results for both our full method and an ablated version where we directly learn the initial condition. Note that the ablated version can only unroll the first 13 frames (and thus is learned using only the input video’s first 14 frames) due to the memory constraint (which is consistent for both candidates). In contrast, our full method can handle the 60 input frames like in the setup of Appendix C. On the bottom panel, we show the velocity corresponding to the top panel. We observe that our method and its ablated version can approximate the ground truth reasonably well at the beginning of the simulation (the left three images). However, the ablated version starts to distort significantly in terms of both the advected image and the underlying velocity. This observation is in agreement with the numerical evidence, as plotted in Figure 23 and documented in Table 6, which shows that our full method consistently outperforms its ablated counterpart across all metrics. We conjecture that the underlying reasons for this performance discrepancy are threefold: first, the ablated version can only learn from the beginning section of the fluid observation, which provides limited information to correctly infer the initial condition. Secondly, only learning the initial condition is more susceptible to accumulated errors than our full method. Thirdly, using a limited number of frames makes it harder to synthesize an appropriate velocity kernel. In summary, our observations suggest that learning the full trajectory is more desirable than learning the initial condition only.
Table 6: Velocity errors (AEPE, AAE, Vort., Div.) and image errors (RMSE, VGG) for the trajectory-learning ablation.

| Method | AEPE | AAE | Vort. | Div. | RMSE | VGG |
|---|---|---|---|---|---|---|
| Ours (Ablated) | 0.257 | 0.753 | 4.689 | 0.054 | 0.170 | 6759.4 |
| Ours | 0.043 | 0.131 | 1.180 | 0.014 | 0.081 | 1805.9 |
