
Machine learning of hidden variables in multiscale fluid simulation

Archis S. Joglekar, Ergodic LLC, San Francisco, CA 94117, USA (also affiliated with the Department of Nuclear Engineering and Radiological Sciences, University of Michigan, Ann Arbor, MI 48109, USA), [email protected]
Alexander G. R. Thomas, Gerard Mourou Center for Ultrafast Optical Sciences, University of Michigan, Ann Arbor, MI 48109, USA
Abstract

Solving fluid dynamics equations often requires the use of closure relations that account for missing microphysics. For example, when solving equations related to fluid dynamics for systems with a large Reynolds number, sub-grid effects become important and a turbulence closure is required, and in systems with a large Knudsen number, kinetic effects become important and a kinetic closure is required. By adding an equation governing the growth and transport of the quantity requiring the closure relation, it becomes possible to capture microphysics through the introduction of “hidden variables” that are non-local in space and time. The behavior of the “hidden variables” in response to the fluid conditions can be learned from a higher fidelity or ab-initio model that contains all the microphysics. In our study, a partial differential equation simulator that is end-to-end differentiable is used to train judiciously placed neural networks against ground-truth simulations. We show that this method enables an Euler equation based approach to reproduce non-linear, large Knudsen number plasma physics that can otherwise only be modeled using Boltzmann-like equation simulators such as Vlasov or Particle-In-Cell modeling.

Keywords: neural operators, plasma physics, kinetics, machine learning, differentiable physics

1 Introduction

Machine learning for partial differential equations (PDEs) has received much attention, with an emphasis on computational fluid dynamics [1, 2, 3, 4]. Fluid equations describe conservation laws for bulk properties of matter, such as mass, energy, and average flow velocity, under the continuum assumption. The equation set usually requires a closure: additional information that represents some “microphysics” not captured by the continuum model, for example because of turbulence on small scales, or because of granularity arising from the media being made up of particles, molecules, cells, and so on. The closure may take the form of an equation relating bulk properties, such as a constitutive relation (material response to a force) or an equation of state (a thermodynamic equation relating bulk properties). While analytic closure models based on equilibrium states have existed at least since the introduction of the hydrodynamic equations, the abundance of data and of data-generation mechanisms has prompted a re-examination of existing methods in light of data-driven models that provide closure for the coupled system of equations. This approach is particularly relevant in systems where physics occurs on length- and time-scales smaller than those resolved by the simulations.

Similarly, there are other systems where the fluid approximation itself breaks down because of effects involving an extended phase space of states, for example momentum states, where physical effects from the interaction between molecules or particles of a certain velocity become important. In these cases, arbitrarily increasing the resolution of the simulated domain does not recover the missing kinetic physics. Capturing kinetic physics from first principles requires a fully kinetic description, such as solving the Boltzmann equation. Such effects may nevertheless be included in a fluid model through constitutive relations that depend on local conditions and that may be derived or learned from a kinetic model or from experimental data. An example is the introduction of modified transport coefficients for nonlocal heat flow in fusion plasmas [5]. More recently, a growing body of research has applied machine learning to model turbulence (see references within refs. [4, 6]).

The issue with any relation derived from local bulk fluid conditions is that it retains no “memory” of microphysics effects from other locations; the effect is localized in space and time. But it is often the case that nonlocality is important. For example in the case of kinetic flows, fast particle streams will affect transport far, in both time and space, from a region that generates them.

In this article, we extend the method of machine-learned closure to learn variables that themselves obey a PDE describing the space-time variation of the microphysics. We refer to these as “hidden variables” because they do not describe bulk variables of the fluid system of equations but instead represent physics that is “hidden” from the fluid equations. The term “hidden variables” was originally introduced by Bohm in an attempt to explain quantum behavior in terms of unobservable underlying variables [7]. Here, the meaning is slightly different: although our “hidden variable” indeed describes physics that is unobservable within the fluid description, the variable itself is not thought to be fundamental but is an ad-hoc, reduced-fidelity approximation of the fundamental behavior, which can be learned from an ab-initio model. We specifically look at the dynamics of Landau damping of excited plasma wavepackets in the presence of trapped hot electrons, which is described by the higher fidelity Vlasov-Poisson equations. Our fluid model uses a Landau closure similar to that in Ref. [8], modified to include a hidden variable representing the effect of the hot electron population, and is able to replicate the fully kinetic behavior. The structure of the paper is as follows. Section 2 discusses the inclusion of kinetic effects in a new differentiable fluid simulator using a “hidden variable” approach. Section 3 introduces the specific problem to be addressed, Landau damping of excited plasma wavepackets, and the basic equations to be solved. Section 4 introduces the new “hidden variable” and its evolution. Section 5 presents the results and a comparison with full Vlasov simulation. Finally, we conclude in Section 6.

2 Inclusion of kinetic effects in a fluid code

The Boltzmann equation is a PDE that governs the evolution of the distribution function of particles in phase space and is given by

$$\frac{\partial f}{\partial t}+\frac{\mathbf{p}}{m}\cdot\frac{\partial f}{\partial\mathbf{x}}+\mathbf{F}\cdot\frac{\partial f}{\partial\mathbf{p}}=\left(\frac{\partial f}{\partial t}\right)_{\text{coll}},\qquad(1)$$

where $f=f(t,\mathbf{x},\mathbf{p})$ is the distribution function in $(\mathbf{x},\mathbf{p})$ phase space, $\mathbf{F}$ are the forces, and the right-hand side describes collisional effects. Velocity-space moments of eq. 1 give an infinite chain of equations for the conservation of particle density, momentum, and energy. With appropriate closure conditions, these form the Euler equations in the inviscid case, or the Navier-Stokes equations in the viscous regime, depending on the treatment of the collisional term. If the particles are charged, this moment-driven approach gives the plasma fluid equations.
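As a concrete illustration of the moment reduction, the bulk quantities can be computed directly from a discretized distribution function. The sketch below is a minimal NumPy example with illustrative names (not taken from any solver in the paper); it takes the zeroth, first, and second velocity moments of a 1D $f(x,v)$:

```python
import numpy as np

def fluid_moments(f, v, dv):
    """Velocity-space moments of a 1D distribution f with shape (nx, nv).

    Returns density n, mean velocity u, and a temperature-like variance T,
    the bulk quantities that the moment reduction of eq. (1) evolves.
    v is the velocity grid of length nv; dv is its spacing.
    """
    n = np.sum(f, axis=1) * dv                              # zeroth moment
    u = np.sum(f * v, axis=1) * dv / n                      # first moment
    T = np.sum(f * (v - u[:, None]) ** 2, axis=1) * dv / n  # second (central) moment
    return n, u, T
```

For a Maxwellian input, these recover the density, drift, and temperature that parameterize it, which is exactly the information the fluid equations retain.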

In all cases, the reduction of a three configuration and three velocity dimension phase space down to a three-dimensional configuration space averages over the details contained in velocity space. This approximation is justified in systems where the Knudsen number, defined as the ratio of the particle mean free path to the size of the system, is small, such that the particles reach local thermal equilibrium (LTE) and velocity-space effects, called kinetic effects, are negligible. There are many important systems where the Knudsen number is not small and the thermal equilibrium approximation does not hold, for example micro- or nano-scale flows, where the system size is small, and rarefied gases such as the high-altitude atmosphere experienced by spacecraft, where the mean free path is large.

In plasmas, kinetic effects are important in the Scrape-Off Layer (SOL) in magnetic fusion plasmas [9], in the low-density gas-fill in inertial fusion plasmas [9], and in laser- and plasma-based accelerator experiments where the plasma density is low and the physics is weakly-collisional. In such settings, it becomes important to retain the influence of kinetic effects in order to perform accurate modeling of the system dynamics. Another example is in plasma based particle acceleration [10], in particular the phenomena of wavebreaking or trajectory crossing in plasma waves [11]. Fluid simulations of plasma accelerators have the benefit of having smooth distributions compared to typical particle-in-cell simulations, but cannot model crucial electron trapping processes because of phase space mixing.

Previous attempts at the inclusion of kinetic effects into plasma fluid modeling have leveraged analytical methods [8, 12, 13, 14]. Data-driven modeling of kinetic physics has either been benchmarked only against known analytical methods [15] or is restricted in the geometries to which it can be applied [16]. In this work, we develop a machine-learned model that is applied to a phenomenon with no known analytical representation. It is trained in a small geometry and successfully transferred to a much larger domain.

To accurately model the desired kinetic effects in this work, we find that a model that is non-local in space and time is necessary. In other words, we find that a model that has memory is required. To accomplish this, we use an additional transport equation for a hidden variable with learned coefficients for the source term. Because we leverage a hidden variable that can persist over time, it enables the reproduction of the kinetic effect that requires non-locality in time. Since our hidden variable can also be transported, it can be leveraged to reproduce non-local behavior in space. We train the model in a reduced, idealized system against numerical solutions of the collisionless Boltzmann equation. Because our model is a self-consistent system of coupled PDEs with carefully constructed asymptotic behavior, we show that it not only extends to larger systems but can also be used to model phenomena that the neural networks have never previously seen.

Figure 1: A single step in the training loop is illustrated. A batch of physical parameters is chosen. (Bottom) Those parameters are then fed to a kinetic simulator and an observable is calculated. (Top) The same parameters are fed to the differentiable fluid simulator that includes the machine learned hidden variable. This system is evolved in time over the same duration as the kinetic simulation. The observables from each simulation are used for calculating the loss function. The loss, and more importantly, the gradient of the loss with respect to the weights of the function approximator is computed. That gradient is used by an optimization algorithm to update the weights of the hidden variable model. The updated weights are used in the next batch of simulations. In practice, we precalculate the reference simulations.

In order to maximize the likelihood of developing models that are stable over long time-scales, it is important to develop models either using symbolic or sparse regression techniques [17, 18, 19] or via differentiable simulators [20, 21]. The former allow the user to craft and analyze symbolic formulations and ensure their stability. In the most general case, however, formulating a library of terms that may enable accurate modeling is a challenging task, and using a function approximator such as a neural network provides a method by which to circumvent that challenge. Training neural network models in-situ with simulations, rather than only using a single time-step, helps ensure the stability of the learned representation. This paradigm is visualized in fig. 1.

To increase the likelihood of learning stable models, we choose to develop a differentiable fluid simulator for plasma physics called ADEPT (Automatic-Differentiation-Enabled Plasma Transport). While differentiable simulators have been used to construct models for computational fluid dynamics [20, 21, 6] and for physics discovery in plasma kinetics [22], to our knowledge, this is the first differentiable simulator for computational plasma fluid dynamics. The code is publicly available along with the dataset introduced in this work.

3 Problem Statement – Landau damping of excited plasma wavepackets

Plasmas are systems where an ionized gas exhibits collective behavior through electrostatic or electromagnetic fields. These fields are capable of sustaining oscillations and waves which leads to a discipline rich in non-linear kinetic behavior. The kinetic behavior arises due to wave-particle interactions, where particles moving at the phase velocity of the wave can exchange energy with the wave. One important example of this is Landau damping [23], where an excited plasma wave will damp even in the absence of dissipative forces. This inherently requires a kinetic description to describe the physics of the wave-particle interaction driven damping mechanism. The linear version of Landau damping can be included via an analytic closure relation, which we do here. We extend that implementation to include the nonlinear version of this effect in a fluid model through the introduction of a “hidden variable” representing the trapped electron population. We start by describing the simple one dimensional fluid model.

3.1 Fluid model

The relevant equations are the simplest form of the moment equations for electron plasma transport that are necessary for illustrating our method. The two-moment equations are given by

$$\partial_{t}n+\partial_{x}(u\,n)=0,\qquad(2)$$
$$mn\left(\partial_{t}u+u\,\partial_{x}u\right)=-\partial_{x}p+qnE,\qquad(3)$$

where $n=n(t,x)$, $u=u(t,x)$, and $p=p(t,x)$ are the density, velocity, and pressure, respectively. We use an adiabatic equation of state as a closure for the equation set,

$$p=p_{0}\left(\frac{n}{n_{0}}\right)^{\gamma},\qquad(4)$$

where $\gamma=(f+2)/f$ is the adiabatic index, with $f$ the number of degrees of freedom; for one-dimensional problems, $f=1$ and $\gamma=3$. $T$ is the electron temperature and is assumed to be constant and uniform. $E=E(t,x)$ is the electrostatic field given by solving Poisson's equation, $\partial_{x}E=(Zn_{i}-en)/\epsilon_{0}$, where $n_{i}$ is the ion density. Because this problem evolves on electron timescales, it is satisfactory to assume that the ions are stationary and form a neutralizing background.

3.2 Base Model - Linear Landau Damping

During linear Landau damping, particles absorb energy from the electric field and are accelerated. Because individual particle trajectories are no longer modeled in the fluid approximation, modeling Landau damping requires a modification to include the microphysics. In ref. [8], a Fourier space operation is used to provide the microphysics closure.

We include Landau damping similarly by adding a damping term to the momentum equation given by eq. 3. The modified momentum equation is given by

$$mn\left(\partial_{t}u+u\,\partial_{x}u\right)=-\partial_{x}p+qnE+2mn\,\nu_{L}*u,\qquad(5)$$

where $*$ is the convolution operator defined by $[a*b](x)\equiv\int a(x^{\prime}-x)b(x^{\prime})dx^{\prime}$ and $\nu_{L}$ is the wavenumber-dependent Landau damping rate. Since $\nu_{L}$ can be calculated for a range of wavenumbers using numerical rootfinders, we pre-compute $\nu_{L}$ for the computational domain; it serves as a convolutional filter that represents the wavenumber-dependent damping rate. We verify that this implementation reproduces Landau damping by comparing the results against those from a Vlasov code. We also verify that the analytical linear dispersion relation arising from eqs. 2 and 5 recovers the Landau damping rate from the imaginary part of the kinetic dispersion relation.
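By the convolution theorem, applying the pre-computed $\nu_{L}$ filter amounts to a pointwise multiplication in Fourier space. The sketch below shows this under the assumption of a periodic domain; `nu_of_k` is a hypothetical callable standing in for the pre-computed, wavenumber-dependent Landau damping rates:

```python
import numpy as np

def landau_damping_term(u, nu_of_k, dx):
    """Evaluate the convolution nu_L * u of eq. (5) as a spectral filter.

    nu_of_k maps |k| to a damping rate (a stand-in for the pre-computed
    rates from the kinetic dispersion relation); the convolution becomes
    a pointwise product of the filter with the Fourier modes of u.
    """
    nx = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)  # wavenumbers of the periodic box
    u_hat = np.fft.fft(u)
    return np.real(np.fft.ifft(nu_of_k(np.abs(k)) * u_hat))
```

With a constant filter, the operation reduces to uniform damping of every mode, which provides a quick sanity check of the implementation.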

3.3 Training - Non-linear Landau damping

When the electric field amplitude is large, particles exchange energy with the electric field, and eventually become trapped in the potential of the wave. The electric field and particles reach a time-averaged steady state during which Landau damping is suppressed and the electric field is not damped. In a weakly-collisional plasma, Landau damping does not completely disappear, but is reduced significantly, and the amount of reduction is a function of the collision frequency, field amplitude, and the wavenumber of the field. This is a distinctly non-linear and kinetic effect that is not captured by the fluid model.

A model equation that enables the field amplitude dependent suppression of Landau damping can be given by

$$mn\left(\partial_{t}u+u\,\partial_{x}u\right)=-\partial_{x}p+qnE+2mn\,\nu_{L}(k,|\hat{E}^{k}|;\theta)*u,\qquad(6)$$

where $\nu_{L}$ is now also a function of the wavenumber-dependent field amplitude. An analytical expression for this term is not available, but a representation can be machine learned by tuning its weights and biases $\theta$. However, it is readily seen that this implementation is unable to recover effects that are non-local in space, that is, effects that require nearby information, such as wavepacket etching. The formulation we use instead, which introduces a spatiotemporally dependent “hidden variable”, is detailed in the next section.

3.4 Testing - Motivating non-local physics through wavepacket etching

In the previous subsections, we have been discussing phenomena that occur in an idealized periodic system of a single wavelength. In finite-length wavepackets of $\mathcal{O}(10)$ wavelengths, a kinetic effect occurs where the damping rate becomes increasingly non-local because it depends on past dynamics. The observable effect is that the back of the wavepacket is damped faster than the front. It was shown in ref. [24] that the reason for this is that resonant electrons are accelerated by the wave and travel faster than the group velocity of the wave. Because the electrons in the back of the wave are no longer trapped, non-linear Landau damping no longer occurs and the back of the wave resumes linear behavior. On the other hand, since the electrons from the back are traveling to the front of the wave, the front of the wave continues to experience non-linear effects. For this reason, the back of the wave damps faster than the front. The Euler equations do not discriminate between the resonant electrons and the bulk fluid. Therefore, they are unable to capture this effect and require further modification.

4 Method

4.1 A spatiotemporally non-local model of Landau Damping

We modify eq. 6 by adding an auxiliary “hidden variable” $\delta=\delta(t,x)$ that represents the resonant electrons. From this section on we write the equations in normalized form using electrostatic units where $v\rightarrow v/v_{th}$, $x\rightarrow x/\lambda_{D}$, $t\rightarrow\omega_{p}t$, $m\rightarrow m/m_{e}$, $q\rightarrow q/e$, and $E\rightarrow eE/m_{e}v_{th}\omega_{p}$ [22]. Here, $v_{th}$ is the thermal velocity, $\omega_{p}$ is the plasma frequency, and $m_{e}$ and $e$ are the electron mass and charge, respectively. Since we seek to suppress Landau damping in the presence of resonant electrons, we assume a sigmoidal dependence, similar to that in ref. [25], and the resulting equation is given by

$$mn\left(\partial_{t}u+u\,\partial_{x}u\right)=-\partial_{x}p+qnE+2mn\,\frac{\nu_{L}*u}{1+\delta^{2}}.\qquad(7)$$

The growth and transport of δ\delta is given by

$$\partial_{t}\delta=v_{ph}\partial_{x}\delta+\nu_{g}\,\frac{|E*\nu_{L}|}{1+\delta^{2}},\qquad(8)$$
$$\nu_{g}=\nu_{g}(k,\nu_{ee},|\hat{E}^{k}|;\theta_{g}).\qquad(9)$$

This model makes use of the knowledge that, since δ\delta represents the electrons in phase with the wave, δ\delta must be transported at the phase velocity. It also uses a source term inspired by linearizing the Vlasov-Boltzmann equation. Because δ\delta is initialized at zero, the proportionality factor of the source term is what is approximated via a machine learned function.
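A single time-step of eq. 8 can be sketched compactly. ADEPT uses a fourth-order time-stepper (sec. 4.2.2); for clarity the example below advances $\delta$ with a simple forward-Euler step on a periodic grid, with a spectral spatial derivative. The function and argument names are illustrative, and `nu_g` stands in for the machine-learned growth-rate coefficient:

```python
import numpy as np

def step_delta(delta, E_filtered, v_ph, nu_g, dx, dt):
    """One forward-Euler update of the hidden variable delta (eq. 8).

    delta is advected at the phase velocity v_ph and grows from a source
    proportional to the filtered field amplitude |E * nu_L| (passed in as
    E_filtered), with the same 1/(1 + delta^2) saturation as the damping
    term. The spectral derivative assumes a periodic domain.
    """
    nx = delta.size
    k = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ddelta_dx = np.real(np.fft.ifft(1j * k * np.fft.fft(delta)))
    source = nu_g * np.abs(E_filtered) / (1.0 + delta ** 2)
    return delta + dt * (v_ph * ddelta_dx + source)
```

Because $\delta$ is initialized at zero, the first step is pure growth from the source term; advection only matters once a nonzero $\delta$ profile exists.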

4.2 ADEPT - Automatic-Differentiation-Enabled Plasma Transport

To train the model in eq. 9, we need a simulator that can model the set of equations given by

$$\partial_{t}n+\partial_{x}(u\,n)=0,\qquad(10)$$
$$mn\left(\partial_{t}u+u\,\partial_{x}u\right)=-\partial_{x}p+qnE+2mn\,\frac{\nu_{L}*u}{1+\delta^{2}},\qquad(11)$$
$$p=n^{\gamma},\qquad(12)$$
$$\partial_{x}E=\frac{q}{m}(n-1),\qquad(13)$$
$$\partial_{t}\delta=v_{ph}\,\partial_{x}\delta+\nu_{g}\,\frac{|E*\nu_{L}|}{1+\delta^{2}},\qquad(14)$$
$$\nu_{g}=10^{f(\tilde{E}_{k},\tilde{\nu}_{ee},\tilde{k};\theta_{g})},\qquad(15)$$
$$f(\tilde{E}_{k},\tilde{\nu}_{ee},\tilde{k};\theta_{g})=3\tanh\left(g(\tilde{\nu}_{ee},\tilde{k},|\hat{\tilde{E}}^{k}|;\theta_{g})\right),\qquad(16)$$

where the final term in eq. 11 represents the non-local kinetic effect that is added to the typical Euler equations. The closure relation for that term is given by eqs. 14, 15, 16 where νg\nu_{g} is the key coefficient that must be learned from data.

4.2.1 Neural Network Details and Input Normalization

$g(\tilde{\nu}_{ee},\tilde{k},|\hat{\tilde{E}}^{k}|;\theta)$ in eq. 16 represents the output of a feedforward neural network with a depth of 3 and a width of 8 units. The intermediate activation function is the hyperbolic tangent. To ensure that the growth rate does not result in numerical instability, we bound it so that $10^{-3}<\nu_{g}<10^{3}$ by leveraging the properties of the hyperbolic tangent.

$\tilde{\nu}_{ee}=\left(\log_{10}(\nu_{ee})+7\right)/4$ is the normalized electron-electron collision frequency, $\tilde{k}=(k-0.26)/0.14$ is the normalized wavenumber, and $|\hat{\tilde{E}}^{k}|=(\log_{10}(|\hat{E}^{k}|+10^{-10})+10)/10$ is the normalized electric field amplitude of the $k$th mode number. We choose this formulation for normalizing the electric field to avoid taking the logarithm of zero.
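The input normalizations and the $\tanh$-based bounding of $\nu_{g}$ (eqs. 15 and 16) can be written out directly. The sketch below uses the stated denominators; the sign convention on the electric-field normalization is ambiguous in the source, so a positive $/10$ is assumed here, and the function names are illustrative:

```python
import numpy as np

def normalize_inputs(nu_ee, k, E_k_amp):
    """Map raw physical parameters to the normalized network inputs."""
    nu_t = (np.log10(nu_ee) + 7.0) / 4.0   # collision frequency, log scale
    k_t = (k - 0.26) / 0.14                # wavenumber, shifted and scaled
    # The 1e-10 offset avoids log10(0); the /10 sign is an assumption.
    E_t = (np.log10(E_k_amp + 1e-10) + 10.0) / 10.0
    return np.array([nu_t, k_t, E_t])

def bounded_growth_rate(g_out):
    """Bound nu_g within [1e-3, 1e3] via nu_g = 10**(3*tanh(g)), per eqs. 15-16."""
    return 10.0 ** (3.0 * np.tanh(g_out))
```

Because $\tanh$ saturates at $\pm 1$, the exponent is confined to $[-3, 3]$ regardless of the raw network output, which is what keeps the learned growth rate from destabilizing the solver.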

4.2.2 Solver and Simulations

We solve this set using a pseudo-spectral, finite-volume method with a fourth order time-stepper [26] in a spatially periodic domain. The gradients are computed using Fourier transforms. Similarly, Poisson’s equation is also solved spectrally. All the other operations, e.g. advection, are performed in real space.
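As an illustration of the spectral Poisson solve described above, eq. 13 can be inverted mode-by-mode in Fourier space on a periodic grid. This is a minimal sketch, not ADEPT's implementation; the function name and the `q_over_m` argument are illustrative:

```python
import numpy as np

def solve_poisson_periodic(n, dx, q_over_m=1.0):
    """Solve eq. (13), dE/dx = (q/m)(n - 1), spectrally on a periodic grid.

    In the normalized units of the paper the ion background is uniform, so
    only the fluctuating part of n drives E. The k = 0 mode is set to zero
    (no net field in a neutral periodic box).
    """
    nx = n.size
    k = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    rhs_hat = np.fft.fft(q_over_m * (n - 1.0))
    E_hat = np.zeros_like(rhs_hat)
    nz = k != 0
    E_hat[nz] = rhs_hat[nz] / (1j * k[nz])  # divide by ik to integrate in x
    return np.real(np.fft.ifft(E_hat))
```

For a single-mode density perturbation $n = 1 + a\cos(k_0 x)$, this returns $E = (a/k_0)\sin(k_0 x)$, the expected analytic solution.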

5 Results

5.1 Training - Suppression of Landau Damping in large amplitude plasma waves

5.1.1 Dataset from kinetic simulations

We train our model against simulations collected from a Vlasov-Boltzmann equation solver. The equations solved by the fully-kinetic solver are

$$\partial_{t}f+v\,\partial_{x}f+E\,\partial_{v}f=\nu_{ee}\,\partial_{v}(vf+v_{0}^{2}\,\partial_{v}f),\qquad(17)$$
$$\partial_{x}E=Zn_{i}-n_{e},\qquad(18)$$

where $f=f(t,x,v)$ is the electron distribution function, $Z=1$ is the atomic charge, and $n_{i}=1$ is the ion density. Details of this solver are given in refs. [27, 22].

The fixed parameters are $t_{max}=500\,\omega_{p}^{-1}$, $\Delta t=0.5$, $N_{x}=64$, $N_{v}=2048$. In all simulations, we simulate a single-mode wave. The parameters of the temporal envelope of the external forcing term are $t_{w}=40\,\omega_{p}^{-1}$, $t_{r}=5\,\omega_{p}^{-1}$, $t_{c}=40\,\omega_{p}^{-1}$. The implementation of the forcing term is given in ref. [22].

Figure 2: The training data are single mode simulations of plasma waves that are excited by a short duration driver using the fully-kinetic, Vlasov formulation. The training involves reproducing these excitations with the simulations using the modified fluid equations.

The training dataset comprises 200 first-principles simulations. The parameters that are varied are given by

$$\nu_{ee}\in[10^{-7},10^{-6},10^{-5},10^{-4},10^{-3}],\qquad(19)$$
$$k\in[0.26,0.28,0.3,0.32,0.34,0.36,0.38,0.4],\qquad(20)$$
$$a_{0}\in[10^{-6},10^{-5},10^{-4},10^{-3},10^{-2}].\qquad(21)$$
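The 200-simulation count follows directly from the Cartesian product of these three lists (5 collision frequencies, 8 wavenumbers, and 5 drive amplitudes). A sketch of how such a parameter grid could be enumerated, with illustrative variable names:

```python
import itertools

# The three scan axes from eqs. (19)-(21).
nu_ee_vals = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3]
k_vals = [0.26, 0.28, 0.30, 0.32, 0.34, 0.36, 0.38, 0.40]
a0_vals = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]

# Every unique (nu_ee, k, a0) combination: 5 x 8 x 5 = 200 simulations.
grid = list(itertools.product(nu_ee_vals, k_vals, a0_vals))
```

Each tuple in `grid` corresponds to one first-principles Vlasov run in the training dataset.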

A few examples are shown in fig. 2.

5.1.2 Fluid Modeling

The fluid simulations used for training the model have different spatial and temporal resolutions, given by $N_{x}=16$, $\Delta t=0.025$, but are performed over the same spatial and temporal domains as the kinetic simulations.

Since we seek to replicate the behavior of a single-mode electron plasma wave with our fluid model, our training objective function is to minimize the mean squared error between the timeseries from the Vlasov and fluid simulations. Formally, the loss function is given by

$$\vec{p}=(a_{0},\nu_{ee},k\lambda_{D}),\qquad(22)$$
$$\mathcal{L}=\frac{1}{N_{t}}\sum_{t=250\omega_{p}^{-1}}^{450\omega_{p}^{-1}}\left(\log_{10}|\hat{n}^{1}_{f}(\vec{p};\theta)|-\log_{10}|\hat{n}^{1}_{V}(\vec{p})|\right)^{2},\qquad(23)$$

where $\vec{p}$ is the parameter vector consisting of unique combinations of the parameters listed in eqs. 19, 20, and 21, and $|\hat{n}^{1}|$ is the time history of the amplitude of the first mode of the density profile, with the subscripts $f$ and $V$ corresponding to fluid and Vlasov, respectively. It is important to note that we compare the logarithm of the timeseries in the loss function. This is because the wave amplitude can be damped over many orders of magnitude. Without the logarithm, the wave amplitude at the end of the simulation would be negligible in comparison to that near the beginning and would effectively be ignored by the gradient descent algorithm.
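The log-space comparison of eq. 23 is simple to state in code. The sketch below (illustrative names, assuming the two time histories have already been restricted to the $250$-$450\,\omega_p^{-1}$ window) shows why the logarithm matters for heavily damped waves:

```python
import numpy as np

def log_amplitude_loss(n1_fluid, n1_vlasov):
    """Mean squared error between log10 amplitudes of the first density mode.

    Comparing in log space (eq. 23) keeps late-time, heavily damped
    amplitudes from being swamped by the large early-time amplitudes:
    a factor-of-10 mismatch costs the same at any absolute amplitude.
    """
    return np.mean((np.log10(n1_fluid) - np.log10(n1_vlasov)) ** 2)
```

A uniform factor-of-10 amplitude error yields a loss of exactly 1 regardless of where in the damping history it occurs, which is the property that keeps the late-time tail visible to gradient descent.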

5.1.3 Assessment

Figure 3: The training and validation loss is plotted against the number of epochs.

We use a 90%/10% training and validation split. In fig. 3, we show the mean squared error over the training epochs for training and validation. Training was performed using A10G GPUs. We implement data parallel training by running NbatchN_{\text{batch}} simulations in parallel and averaging their gradients on the primary node that is running the optimization algorithm. We use ADAM with a learning rate of 0.004 as the optimization algorithm. We stop training after the performance has saturated for many epochs.

(a) Linear Landau damping
(b) $\delta(t,x)$
(c) Non-linear Landau damping
(d) $\delta(t,x)$
Figure 4: (a) A small amplitude simulation. In this simulation, $\delta$ is not needed to modify the damping rate because the system exhibits linear dynamics. Accordingly, (b) shows that $\delta$ remains small. (c) A large amplitude simulation with non-linear Landau damping. $\delta$ is needed to modify the damping rate; the linear fluid model is incapable of reproducing the desired physics. To recover the non-linear effects, the simulation leverages $\delta$, and (d) shows that $\delta\approx 7$.

Figure 4 shows the result of two sample simulations. We choose to show the result of these two sets of parameters because they are representative examples of two extremes that the machine learning model must be able to handle.

In fig. 4(a), we show the results from a simulation with $\nu_{ee}=$ , $k_{0}=0.34$, $a_{0}=10^{-6}$. The two simulations agree well. This is a parameter regime where linear Landau damping is expected, and therefore, the theory from sec. 3.2 applies. Theory thus dictates that to reproduce the desired behavior, $\delta$ must remain small. Figure 4(b) shows that $\delta\approx 10^{-5}$, as the theory suggests it should. Because we train using the logarithm, as stated in eq. 23, we are able to capture the damping over five orders of magnitude.

In fig. 4(c), we show the results from a simulation with $\nu_{ee}=10^{-5}$, $k_{0}=0.32$, $a_{0}=10^{-2}$. The two simulations agree well. In this parameter regime, kinetic theory dictates that there is a complex interplay between the collisions, represented by $\nu_{ee}$, and the field amplitude, represented by $a_{0}$. Here we expect that to reproduce the desired behavior, $\delta$ must be some finite quantity that appreciably, but not completely, suppresses the damping. Figure 4(d) shows that $\delta\approx 7$ and the damping is suppressed by a factor of 50.

Now that we have been able to learn a microphysics model, we use the coupled set of PDEs along with the newly acquired model weights to simulate a novel geometry and verify its performance qualitatively.

5.2 Testing - Extending to unseen geometries and modeling unseen phenomena

(a) Vlasov
(b) Fluid + Local Model
(c) Fluid + $\delta(t,x)$
(d) $\delta(t,x)$
Figure 5: (a) shows the ground-truth density profile over space and time for a large amplitude excitation. (b) shows that using a local model for Landau damping is insufficient for modeling this excitation. (c) shows the result of a simulation that uses the machine-learned $\delta(t,x)$ model, which shows better agreement and reproduction of the non-local damping. (d) shows that $\delta$ grows to 3 in the case of this large amplitude perturbation to account for the non-locality. It is important to note that the scale length over which the model is trained in fig. 4 is significantly different from the scale length to which the model is applied here.

In the previous section, we trained on a dataset of systems where a single wavelength wave was excited in a box of one wavelength in length. To assess the performance of our learned model in unseen geometries, we perform simulations where the box size is increased by a factor of 100 and a 20 wavelength long, finite-length wave is modeled. This tests the ability of the machine learned model combined with the hidden variable approach that was trained on a limited geometry to generalize to larger length scales and different boundary conditions. To verify the fidelity, we compare the results of the learned fluid simulations to their kinetic counterparts acquired by using the Vlasov-Boltzmann solver.

In refs. [24, 22], it has been shown that finite-length wavepackets damp non-uniformly due to a non-linear, kinetic phenomenon. In fig. 5(a), we show an example of this phenomenon. Figure 5(b) shows that if a local implementation of Landau damping is used, for example, like the one given by eq. 6, the non-uniform damping will not be reproduced because the damping rate will be the same along the length of the wavepacket. In contrast, fig. 5(c) shows that the implementation using the machine-learned dynamical hidden variable approach is much more capable of reproducing the results from the ab-initio simulation. Figure 5(d) confirms that δ\delta grows to account for the suppressed damping, and then is transported to account for the non-locality in space and time.

Figure 6 panels: (a) Vlasov, (b) Fluid + Local, (c) Fluid + δ(t,x), (d) δ(t,x).
Figure 6: (a) shows the ground-truth density profile over space and time for a small amplitude excitation. In contrast to fig. 5, (b) and (c) agree well here because at small amplitudes, the damping remains local and the non-local model is expected to converge to the local result. Correspondingly, (d) shows that δ remains small.

Similar to fig. 4, we also show the behavior in the linear regime, exemplified by a small amplitude simulation in fig. 6(a). In this regime, Landau damping is expected to remain local, as shown in fig. 6(b). Figure 6(d) shows that the model does not anomalously generate large δ. Because of this, in fig. 6(c), we see that linear Landau damping proceeds as expected and the non-local model recovers the local result.

In this work, we seek to highlight that because we embed the machine learning model into a set of coupled PDEs that are capable of generalizing, we can train the model on a simple system and scale the learned physics to a significantly more complex domain. In this case, we scale from a single-wavelength periodic system to a finite-length wavepacket in an open-boundary simulation, but this model could very well be extended to multiple spatial dimensions. If the use case were to acquire as accurate a model as possible in order to perform multi-scale modeling of an actual experiment, one could and should train the model on as many geometries as is feasible.

6 Conclusion

The marriage of numerical PDE solvers and deep-learning-driven models is more than coincidental: both are products of numerical computing. The advent of scalable and performant scientific computing libraries capable of automatic differentiation, e.g. JAX and Julia, has enabled a seamless interchange of ideas between the two paradigms, and the lines between them continue to blur.

In this work, we show how the two paradigms can be intimately woven together and leveraged towards performing multiscale modeling where the equation set does not natively contain all the necessary microphysics, and where the microphysics model itself has proven to be challenging to construct using conventional methods. By training against a dataset with a relatively simple geometry and parameters, we are able to learn a model that can emulate spatio-temporally non-local physics over broad parameter ranges and geometries, physics which can otherwise only be described by computationally expensive higher-dimensional PDE solvers. The key to preserving the spatio-temporally non-local dynamics is providing a dynamical system for a hidden variable that governs the physics. By allowing for growth and transport of the hidden variable, we enable non-locality in time and space.

We envision that a model that leverages machine-learned hidden variables can be used for various phenomena inside and outside of plasma physics. For example, hot electrons created by laser-plasma instabilities are deleterious to successful fusion experiments. Accurate modeling of their dynamics and energetics in fully integrated hydrodynamics simulations of fusion experiments can substantially improve the predictive power of such simulations and, in turn, the outcomes of fusion experiments.

7 Data Availability Statement

The data that support the findings of this study are openly available at the following URL: https://github.com/ergodicio/adept

8 Conflict of Interest

The authors declare no competing interests.

9 Acknowledgments

We acknowledge funding from the DOE Grant # DE-SC0016804.

We use and would like to acknowledge open-source software from the Python ecosystem for scientific computing. Specifically, we use NumPy [28], SciPy [29], matplotlib [30], Xarray [31], JAX [32], and Diffrax [33] while the neural network is managed using Equinox [34]. The numerical experiments and data are managed using MLFlow [35] as discussed in ref. [36].

A. J. also thanks D. J. Strozzi and W. B. Mori for useful discussions.

References

  • [1] Brenner M P, Eldredge J D and Freund J B 2019 Physical Review Fluids 4 100501 publisher: American Physical Society
  • [2] Duraisamy K, Iaccarino G and Xiao H 2019 Annual Review of Fluid Mechanics 51 357–377 ISSN 0066-4189, 1545-4479
  • [3] Brunton S L, Noack B R and Koumoutsakos P 2020 Annual review of fluid mechanics 52 477–508
  • [4] Vinuesa R and Brunton S L 2022 Nature Computational Science 2 358–366 ISSN 2662-8457 number: 6 Publisher: Nature Publishing Group
  • [5] Brodrick J P, Kingham R J, Marinak M M, Patel M V, Chankin A V, Omotani J T, Umansky M V, Del Sorbo D, Dudson B, Parker J T, Kerbel G D, Sherlock M and Ridgers C P 2017 Physics of Plasmas 24 092309 ISSN 1070-664X
  • [6] Shankar V, Puri V, Balakrishnan R, Maulik R and Viswanathan V 2023 Machine Learning: Science and Technology 4 015017 ISSN 2632-2153 publisher: IOP Publishing
  • [7] Bohm D 1952 Phys. Rev. 85(2) 166–179
  • [8] Hammett G W and Perkins F W 1990 Physical Review Letters 64 3019–3022 ISSN 0031-9007
  • [9] Brodrick J P, Kingham R J, Marinak M M, Patel M V, Chankin A V, Omotani J T, Umansky M V, Del Sorbo D, Dudson B, Parker J T, Kerbel G D, Sherlock M and Ridgers C P 2017 Physics of Plasmas 24 092309 ISSN 1070-664X arXiv: 1704.08963
  • [10] Albert F, Couprie M E, Debus A, Downer M C, Faure J, Flacco A, Gizzi L A, Grismayer T, Huebl A, Joshi C, Labat M, Leemans W P, Maier A R, Mangles S P D, Mason P, Mathieu F, Muggli P, Nishiuchi M, Osterhoff J, Rajeev P P, Schramm U, Schreiber J, Thomas A G R, Vay J L, Vranic M and Zeil K 2021 New Journal of Physics 23 031101
  • [11] Thomas A G R 2016 Phys. Rev. E 94(5) 053204
  • [12] Dimits A M, Joseph I and Umansky M V 2014 Physics of Plasmas 21 055907 ISSN 1070-664X
  • [13] Hunana P, Zank G, Laurenza M, Tenerani A, Webb G, Goldstein M, Velli M and Adhikari L 2018 Physical Review Letters 121 135101 publisher: American Physical Society
  • [14] Fan K, Xu X, Zhu B and Li P 2022 Physics of Plasmas 29 042116 ISSN 1070-664X
  • [15] Cheng W, Fu H, Wang L, Dong C, Jin Y, Jiang M, Ma J, Qin Y and Liu K 2022 Computer Physics Communications 108538 ISSN 0010-4655
  • [16] Lamy C, Dubroca B, Nicolaï P, Tikhonchuk V and Feugeas J L 2022 Physical Review E 105 055201 publisher: American Physical Society
  • [17] Rudy S H, Brunton S L, Proctor J L and Kutz J N 2017 Science Advances 3 e1602614 ISSN 2375-2548
  • [18] Alves E P and Fiuza F 2022 Physical Review Research 4 033192 publisher: American Physical Society
  • [19] Kaptanoglu A A, Hansen C, Lore J D, Landreman M and Brunton S L 2023 Physics of Plasmas 30 033906 ISSN 1070-664X publisher: American Institute of Physics
  • [20] Bar-Sinai Y, Hoyer S, Hickey J and Brenner M P 2019 Proceedings of the National Academy of Sciences 116 15344–15349 ISSN 0027-8424, 1091-6490 publisher: National Academy of Sciences Section: Physical Sciences
  • [21] Kochkov D, Smith J A, Alieva A, Wang Q, Brenner M P and Hoyer S 2021 Proceedings of the National Academy of Sciences 118 e2101784118 publisher: Proceedings of the National Academy of Sciences
  • [22] Joglekar A S and Thomas A G 2022 Journal of Plasma Physics 88 905880608 ISSN 0022-3778, 1469-7807
  • [23] Landau L D 1946 Zh. Eksp. Teor. Fiz. 10 25
  • [24] Fahlen J E, Winjum B J, Grismayer T and Mori W B 2009 Physical Review Letters 102 245002 ISSN 0031-9007, 1079-7114
  • [25] Tran G, Loiseau P, Fusaro A, Héron A, Hüller S, Maëder L, Masson-Laborde P E, Penninckx D and Riazuelo G 2020 Physics of Plasmas 27 122707 ISSN 1070-664X, 1089-7674
  • [26] Tsitouras C 2011 Computers & Mathematics with Applications 62 770–775 ISSN 08981221 publisher: Elsevier Ltd
  • [27] Joglekar A and Levy M 2020 Journal of Open Source Software 5 2182 ISSN 2475-9066 URL https://joss.theoj.org/papers/10.21105/joss.02182
  • [28] Harris C R, Millman K J, van der Walt S J, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith N J, Kern R, Picus M, Hoyer S, van Kerkwijk M H, Brett M, Haldane A, del Río J F, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C and Oliphant T E 2020 Nature 585 357–362 ISSN 1476-4687 arXiv: 2006.10256 Publisher: Springer US
  • [29] SciPy 1.0 Contributors, Virtanen P, Gommers R, Oliphant T E, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt S J, Brett M, Wilson J, Millman K J, Mayorov N, Nelson A R J, Jones E, Kern R, Larson E, Carey C J, Polat I, Feng Y, Moore E W, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero E A, Harris C R, Archibald A M, Ribeiro A H, Pedregosa F and van Mulbregt P 2020 Nature Methods 17 261–272 ISSN 1548-7091, 1548-7105
  • [30] Hunter J D 2007 Computing in Science & Engineering 9 90–95
  • [31] Hoyer S and Hamman J J 2017 Journal of Open Research Software 5 1–6 ISSN 2049-9647
  • [32] Bradbury J, Frostig R, Hawkins P, Johnson M J, Leary C, Maclaurin D, Necula G, Paszke A, VanderPlas J, Wanderman-Milne S and Zhang Q 2018 JAX: Autograd and XLA original-date: 2018-10-25T21:25:02Z URL https://github.com/google/jax
  • [33] Kidger P 2021 On Neural Differential Equations Ph.D. thesis University of Oxford
  • [34] Kidger P and Garcia C 2021 Differentiable Programming workshop at Neural Information Processing Systems 2021
  • [35] Chen A, Chow A, Davidson A, DCunha A, Ghodsi A, Hong S A, Konwinski A, Mewald C, Murching S, Nykodym T, Ogilvie P, Parkhe M, Singh A, Xie F, Zaharia M, Zang R, Zheng J and Zumar C 2020 Developments in MLflow: A System to Accelerate the Machine Learning Lifecycle Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning (New York, NY, USA: Association for Computing Machinery) ISBN 978-1-4503-8023-2 series Title: DEEM’20
  • [36] Feister S, Cassou K, Dann S, Döpp A, Gauron P, Gonsalves A J, Joglekar A, Marshall V, Neveu O, Schlenvoigt H P, Streeter M J V and Palmer C A J 2023 High Power Laser Science and Engineering 1–31 ISSN 2095-4719, 2052-3289 publisher: Cambridge University Press