
NF-MKV Net: A Constraint-Preserving Neural Network Approach to Solving Mean-Field Games Equilibrium

Jinwei Liu^{1,2} \quad Lu Ren^{1,2} \quad Wang Yao^{2,3,4,5,\dagger} \quad Xiao Zhang^{1,2,3,5,\dagger}
1School of Mathematical Sciences, Beihang University
2Key Laboratory of Mathematics, Information and Behavioral Semantics, Ministry of Education
3Zhongguancun Laboratory
4Institute of Artificial Intelligence, Beihang University
5Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
{liujinwei,urrenlu,yaowang,xiao.zh}@buaa.edu.cn
^{\dagger}: Corresponding Author
Abstract

Neural network-based methods for solving Mean-Field Games (MFGs) equilibria have garnered significant attention for their effectiveness in high-dimensional problems. However, many algorithms struggle with ensuring that the evolution of the density distribution adheres to the required mathematical constraints. This paper investigates a neural network approach to solving MFGs equilibria through a stochastic process perspective. It integrates process-regularized Normalizing Flow (NF) frameworks with state-policy-connected time-series neural networks to address McKean-Vlasov-type Forward-Backward Stochastic Differential Equation (MKV FBSDE) fixed-point problems, equivalent to MFGs equilibria. First, we reformulate MFGs equilibria as MKV FBSDEs, embedding the density distribution into the equation coefficients within a probabilistic framework. Neural networks are then designed to approximate value functions and gradients derived from these equations. Second, we employ NF architectures, a class of generative neural network models, and impose loss constraints on each density transfer function to ensure volumetric invariance and time continuity. Additionally, this paper presents theoretical proofs of the algorithm’s validity and demonstrates its applicability across diverse scenarios, highlighting its superior effectiveness compared to existing methods.

1 Introduction

Mean-Field Games (MFGs), introduced independently by Lasry & Lions (2007) and Huang et al. (2006), provide a robust framework for addressing large-scale multi-agent problems. MFGs are widely applied in domains such as autonomous driving, social networks, crowd management, and power systems.

Neural network-based algorithms have recently been employed to solve MFGs equations due to their ability to handle high-dimensional problems effectively. For example, Lin et al. (2021) reformulated MFGs as a generative adversarial network (GAN) training problem, and Ruthotto et al. (2020) introduced a Lagrangian-based approach to approximate agent states through sampling. Additionally, Chen et al. (2023) applied reinforcement learning and neural networks to model distributions and address value functions. Most approaches focus on optimizing the loss term in MFGs coupled equations using sampled agents, but they often neglect density dynamics, leading to challenges in representing continuous state density distributions.

Carmona et al. (2018) introduced a stochastic process perspective on MFGs, leveraging McKean-Vlasov Forward-Backward Stochastic Differential Equations (MKV FBSDEs) to address MFGs equilibria, and explored numerical methods for solving them. Achdou & Lauriere (2015) proposed simplified MFGs models for pedestrian dynamics and demonstrated them with numerical simulations. Ren et al. (2024) studied multi-group MFGs by solving asymmetric Riccati differential equations and established sufficient conditions for the existence and uniqueness of optimal solutions. However, existing MKV FBSDE methods are often limited to linear-quadratic MFGs, where distributions are simplified to the expectation of agents’ states, rather than full distribution functions, in nonlinear settings. Recently, Huang et al. (2023) proposed a data-driven Normalizing Flow (NF) approach to solve distribution-involved optimal transport stochastic problems. Nevertheless, constraints from MFGs process dynamics, terminal loss, and equation coupling limit the availability of NF frameworks for high-dimensional MKV FBSDEs.

In summary, solving the MFGs equilibrium reduces to addressing equivalent stochastic fixed-point problems that incorporate distribution flows. NF-MKV Net is proposed to solve the MKV FBSDEs problem by coupling the process-regularized NF and state-policy-connected time series neural networks. The enhanced NF framework models flows of probability measures, constructing a density distribution flow that satisfies volumetric invariance at each time step. State-policy-connected time series neural networks, grounded in MKV FBSDEs, model relationships between time-step value functions and approximate their gradients, enabling solutions in a time-consistent manner. Using the coupled frameworks, the fixed-point distribution flow equivalent to the MFGs equilibrium can be determined while ensuring mathematical constraints are satisfied.

Contributions: The main contributions and results are summarized as follows:

  • NF-MKV Net is proposed to solve MKV FBSDEs, which are equivalent to MFGs equilibrium, from a stochastic process perspective. By integrating process-regularized NFs and state-policy-connected time series neural networks into a coupled framework with alternating training, the method adheres to volumetric invariance and time-continuity constraints.

  • Process-regularized NF frameworks are designed to model probability measure flows by enforcing loss constraints on each density transfer function, ensuring volumetric invariance at each time step.

  • State-policy-connected time series neural networks, built upon MKV FBSDEs, capture the relationships between time-step value functions and approximate their gradients, enabling time-consistent solutions.

  • The method demonstrates applicability in traffic flow, low- and high-dimensional crowd motion, and obstacle avoidance problems. Additionally, it satisfies mathematical constraints better than existing neural network-based approaches.

2 Connections among MFG, MKV, and NF

2.1 MFGs & McKean-Vlasov FBSDE

We now formalize the MFGs problem without common noise. For this purpose, we start with a complete filtered probability space (\Omega,\mathcal{F},\mathbb{F}=(\mathcal{F}_{t})_{0\leq t\leq T},\mathbb{P}), where the filtration \mathbb{F} supports a d-dimensional Wiener process \mathbf{W}=(W_{t})_{0\leq t\leq T}, together with an initial condition \xi\in L^{2}(\Omega,\mathcal{F}_{0},\mathbb{P};\mathbb{R}^{d}). This MFGs problem can be described as:

(i) For each fixed deterministic flow \bm{\mu}=(\mu_{t})_{0\leq t\leq T} of probability measures on \mathbb{R}^{d}, solve the standard stochastic control problem:

\inf_{\alpha\in\mathbb{A}}J^{\mu}(\alpha)\quad\text{with}\quad J^{\mu}(\alpha)=\mathbb{E}\left[\int_{0}^{T}f(t,X^{\alpha}_{t},\mu_{t},\alpha_{t})\,dt+g(X_{T}^{\alpha},\mu_{T})\right], (1)

subject to

\left\{\begin{array}{l}dX_{t}^{\alpha}=b(t,X_{t}^{\alpha},\mu_{t},\alpha_{t})dt+\sigma(t,X_{t}^{\alpha},\mu_{t},\alpha_{t})dW_{t},\quad t\in[0,T],\\ X_{0}^{\alpha}=\xi\end{array}\right. (2)

(ii) Find a flow \bm{\mu}=(\mu_{t})_{0\leq t\leq T} such that \mathcal{L}(\hat{X}_{t}^{\bm{\mu}})=\mu_{t} for all t\in[0,T], where \hat{X}^{\bm{\mu}} is a solution of the above optimal control problem.

We can see that the first step provides the best response of a given player interacting with the statistical distribution of the states of the other players if this statistical distribution is assumed to be given by μt\mu_{t}. In contrast, the second step solves a specific fixed point problem in the spirit of the search for fixed points of the best response function.

Usually, the solution of MFGs is transformed into a set of coupled partial differential equations, namely the HJB-FPK equations, which respectively describe the evolution of the value function of the representative element and the density evolution of the group, as shown below:

-\partial_{t}u-\nu\Delta u+H(x,\nabla u)=f(x,\mu)\qquad (HJB) (3)
\partial_{t}\mu-\nu\Delta\mu-\operatorname{div}(\mu\nabla_{p}H(x,\nabla u))=0\qquad (FPK)
\mu(x,0)=\mu_{0},\quad u(x,T)=g(x,\mu(\cdot,T))

in which u:\mathbb{R}^{n}\times[0,T]\rightarrow\mathbb{R} is the value function guiding the agents' decisions; H:\mathbb{R}^{n}\times\mathbb{R}^{n}\rightarrow\mathbb{R} is the Hamiltonian, which describes the physical energy of the system; \mu(\cdot,t)\in\mathcal{L}(\mathbb{R}^{n}) is the distribution of agents at time t; f:\mathbb{R}^{n}\times\mathcal{L}(\mathbb{R}^{n})\rightarrow\mathbb{R} denotes the running cost during the process; and g:\mathbb{R}^{n}\times\mathcal{L}(\mathbb{R}^{n})\rightarrow\mathbb{R} is the terminal condition, guiding the agents to the final distribution.

Let assumption MFGs Solvability HJB (stated in Appendix A.1) be in force. Then, for any initial condition \xi\in L^{2}(\Omega,\mathcal{F}_{0},\mathbb{P};\mathbb{R}^{d}), the McKean-Vlasov FBSDEs:

\left\{\begin{array}{l}dX_{t}=b(t,X_{t},\mathcal{L}(X_{t}),\hat{\alpha}(t,X_{t},\mathcal{L}(X_{t}),\sigma(t,X_{t},\mathcal{L}(X_{t}))^{-1\dagger}Z_{t}))dt+\sigma(t,X_{t},\mathcal{L}(X_{t}))dW_{t}\\ dY_{t}=-f(t,X_{t},\mathcal{L}(X_{t}),\hat{\alpha}(t,X_{t},\mathcal{L}(X_{t}),\sigma(t,X_{t},\mathcal{L}(X_{t}))^{-1\dagger}Z_{t}))dt+Z_{t}\cdot dW_{t}\end{array}\right. (4)

for t\in[0,T], with terminal condition Y_{T}=g(X_{T},\mathcal{L}(X_{T})), are solvable. Moreover, the flow (\mathcal{L}(X_{t}))_{0\leq t\leq T} given by the marginal distributions of the forward component of any solution is an equilibrium of the MFGs problem associated with the stochastic control problem Eq.(1).

We consider the optimal control step of the MFGs formulation described earlier. Probabilists take a two-pronged approach to these problems: first, the input \bm{\mu}=(\mu_{t})_{0\leq t\leq T} is treated as deterministic and fixed while the optimal reaction decision is sought; then, with that decision fixed, the optimal flow of probability measures is solved. By alternating these two searches, the MFGs equilibrium is finally derived.
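The alternating search for a consistent flow can be caricatured in one dimension. The sketch below is a toy, not the paper's model: the linear best-response map standing in for step (i) is an assumption, and step (ii) becomes a damped fixed-point update on a single mean-field statistic.

```python
# Toy fixed-point iteration for a mean-field equilibrium (illustrative only).
# Assume the best response to a population mean m drives the representative
# agent's mean state to 0.5 * m + 1.0 (a stand-in for solving step (i)).

def best_response_mean(m: float) -> float:
    return 0.5 * m + 1.0

def solve_fixed_point(m0: float = 0.0, damping: float = 0.5, tol: float = 1e-10) -> float:
    m = m0
    for _ in range(1000):
        # Step (ii): damped update toward consistency m = best_response_mean(m).
        m_new = (1 - damping) * m + damping * best_response_mean(m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# The consistency condition m = 0.5 * m + 1 gives the equilibrium m* = 2.
print(round(solve_fixed_point(), 6))  # -> 2.0
```

In the full method, the scalar update is replaced by retraining the flow network against the current value function, but the contraction-to-consistency structure is the same.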

2.2 MFGs & NF

Normalizing Flows (NF), introduced by Tabak & Vanden-Eijnden (2010), enable exact computation of data likelihood through a sequence of invertible mappings. A key feature of NF is its use of arbitrary bijective functions, achieved through stacked reversible transformations. The flow model R consists of a sequence of reversible flows, expressed as R(x)=r_{1}\circ r_{2}\circ\cdots\circ r_{L}(x), where each r_{i} has a tractable inverse and Jacobian determinant.

Our algorithm leverages the volume-preserving property of NF, aligning with the consistency of density flow in MFGs during evolution. This principle is essential for constructing the density flow in the MFGs model.

The connection between MFGs and NF provides inherent advantages. For example, in MFGs, the initial distribution is often represented in a simple analytical form. This parallels NF's approach, where a simple initial distribution transforms into a more complex one for density estimation. Additionally, one advantage of NF over other generative models is its preservation of total density during transformation, consistent with the MFGs requirement \int\mu\,dx=1. A challenge, however, is that MFGs, unlike Optimal Transport (OT), does not prescribe both initial and terminal density distributions. In MFGs, only an initial distribution exists, and the terminal condition is governed by the terminal value function g(x,\mu(\cdot,T)). This complicates framing the problem as a complete density evolution problem.
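The mass-preservation property \int\mu\,dx=1 can be checked numerically for a single invertible map. A minimal sketch, assuming a toy affine map s(x)=2x+1 pushing a standard Gaussian forward; the transformed density carries the inverse-map Jacobian factor from the change of variables:

```python
import numpy as np

# Check that the NF change-of-variables keeps total mass at 1 (illustrative).
# Forward map s(x) = 2x + 1 pushes a standard Gaussian mu_0 to mu_T;
# the transformed density is p_T(y) = p_0(s^{-1}(y)) * |d s^{-1}/dy|.

def p0(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

y = np.linspace(-20.0, 20.0, 200001)
dy = y[1] - y[0]
x = (y - 1.0) / 2.0          # s^{-1}(y)
pT = p0(x) * 0.5             # |d s^{-1}/dy| = 1/2

mass = pT.sum() * dy         # Riemann-sum approximation of the integral
print(abs(mass - 1.0) < 1e-6)  # -> True
```

Any bijective map with the correct Jacobian correction passes this check; it is exactly this invariance that the process-regularized NF enforces at every intermediate time step.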

To address this, we idealize the MFGs model with the assumption that the terminal value function corresponds to an explicitly solvable optimal terminal density. For example, in trajectory planning problems, the terminal value function is often related to the destination, such as g(x,\mu)=-\int_{\mathbb{R}^{d}}e^{-\|x-x_{T}\|^{2}}d\mu(x). In such cases, we assume the optimal terminal distribution is \mu(x_{T})=1, enabling the MFGs problem to be framed with initial and terminal densities. We will show that this assumption is reasonable.

3 Methodology: NF-MKV Net

We propose NF-MKV Net, an alternately trained model combining NF and McKean-Vlasov Forward-Backward Stochastic Differential Equations (MKV FBSDEs), to address MFGs equilibrium problems. The main advantage of MKV FBSDEs is their ability to capture both optimization and interaction components in a single coupled FBSDE, eliminating the need for separate references to Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations. NF generates flows represented by neural networks, constraining each density transfer function to define density distributions at specific times. The density flow from NF couples with the value function of the MFG. The value function constrains the neural network generating the flow via the HJB equation, while its gradient update depends on the current marginal distribution flow.

First, we reformulate the stochastic equations of MFGs using MKV FBSDEs and approximate value function gradients with neural networks, effectively addressing the curse of dimensionality in traditional numerical methods. Second, to address the distribution-coupled challenge, we use NF architectures to model agents’ state density distributions, alternately training the unknown transition processes with value functions. Figure (1) illustrates the framework of NF-MKV Net.

Refer to caption
Figure 1: Framework Diagram of the NF-MKV Net

3.1 Modeling Value Function with MKV FBSDEs

We consider a general class of MFGs problems associated with the stochastic control problem (1); the corresponding FBSDEs can be represented as in (4), with initial condition \xi\in L^{2}(\Omega,\mathcal{F}_{0},\mathbb{P};\mathbb{R}^{d}) and terminal condition Y_{T}=g(X_{T},\mathcal{L}(X_{T})).

Then the solution of Eq.(3) satisfies the following FBSDE:

u(x,t)-u(x,0)=-\int_{0}^{t}f(s,X_{s},\mathcal{L}(X_{s}),\hat{\alpha}(s,X_{s},\mathcal{L}(X_{s}),\sigma(s,X_{s},\mathcal{L}(X_{s}))^{-1\dagger}Z_{s}))ds+\int_{0}^{t}Z_{s}dW_{s}. (5)

We apply a temporal discretization to Eq.(4). Given a partition of the time interval [0,T]: 0=t_{0}<t_{1}<\cdots<t_{N}=T, we consider the simple Euler scheme for n=0,1,\cdots,N-1,

X_{t_{n+1}}-X_{t_{n}}\approx b(t_{n},X_{t_{n}},\mu_{t_{n}},\alpha_{t_{n}})\Delta t_{n}+\sigma(t_{n},X_{t_{n}},\mu_{t_{n}},\alpha_{t_{n}})\Delta W_{n}, (6)

and

u(x,t_{n+1})-u(x,t_{n})\approx-f(t_{n},X_{t_{n}},\mathcal{L}(X_{t_{n}}),\hat{\alpha}(t_{n},X_{t_{n}},\mathcal{L}(X_{t_{n}}),\sigma(t_{n},X_{t_{n}},\mathcal{L}(X_{t_{n}}))^{-1\dagger}Z_{t_{n}}))\Delta t_{n}+[\partial_{x}u(x,t_{n})]^{T}\sigma\Delta W_{n}, (7)

where

\Delta t_{n}=t_{n+1}-t_{n},\quad\Delta W_{n}=W_{t_{n+1}}-W_{t_{n}}. (8)
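The forward half of this Euler scheme can be sketched directly. The drift b(x)=-x and diffusion \sigma=0.1 below are illustrative assumptions (not the paper's coefficients), chosen so the discretized mean has a closed form to check against:

```python
import numpy as np

# Euler-Maruyama discretization of the forward SDE as in Eq. (6) (illustrative).
# Assumed toy coefficients: drift b(x) = -x (mean reversion), sigma = 0.1.

rng = np.random.default_rng(0)
T, N, M = 1.0, 20, 5000          # horizon, time steps, sample paths
dt = T / N
X = np.full(M, 2.0)              # X_0 = 2 for every path

for n in range(N):
    dW = rng.normal(0.0, np.sqrt(dt), size=M)   # Delta W_n ~ N(0, dt)
    X = X + (-X) * dt + 0.1 * dW                # X_{n+1} = X_n + b*dt + sigma*dW

# With b(x) = -x the mean contracts by (1 - dt) each step: 2 * (1 - dt)^N.
print(abs(X.mean() - 2.0 * (1 - dt)**N) < 0.05)  # -> True
```

In NF-MKV Net, the same time grid also drives the backward update of Eq. (7), with the Z_{t_n} factor supplied by the per-step networks.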

The key to modeling the above FBSDEs is to approximate the value function x\mapsto u(0,x) at t=t_{0}, i.e., u(0,x)\approx u(0,x|\theta_{0}), while approximating the function x\mapsto[\partial_{x}u(x,t)]^{T}\sigma at each time step t=t_{n} through a multi-layer feedforward neural network

x\mapsto[\partial_{x}u(x,t)]^{T}\sigma\approx[\partial_{x}u(x,t)|\theta_{n}]^{T}\sigma, (9)

which represents the adjoint variable Z_{t} of the value function, expressed in the MFGs optimization problem as the product of the gradient and the diffusion coefficient.

Subsequently, all value functions are connected by summing Eq.(7) over t. The network takes the generated density flows \bm{\mu} and W_{t_{n}} as inputs and produces the final output \hat{u}, approximating u(x,T)=g(x,\mu(\cdot,T)). This approximation defines the expected loss function by comparing the likelihoods of the two functions to minimize their difference for \{x_{i}\}_{i=1}^{N}\sim z=\hat{\mu}_{T}:

l_{\textit{MKV}}=-\log\mathbf{p}(g(z,\hat{\mu}_{T})|\hat{u}(\theta,z))=-\frac{1}{N}\sum\nolimits_{i=1}^{N}\log\mathbf{p}(g(x_{i},\hat{\mu}_{T})|\hat{u}(\theta,x_{i})). (10)

In summary, we reformulate MFGs as MKV FBSDEs (Eq.(5)) and discretize time to establish the relationship between value functions at each time t (Eq.(7)), connecting them via the adjoint variable Z_{t}. Next, we parameterize Z_{t} and u_{0}, using these relationships to link u_{T} with the terminal condition g(x,\mu(\cdot,T)) for maximum likelihood estimation (Eq.(10)) to minimize the final MFGs loss.

3.2 Modeling density distribution with NF

Typically, NF methods prioritize density estimation results. In contrast, our approach emphasizes the NF density evolution process, constraining each layer to align with the density evolution in MFGs.

In an NF model, if the r_{i} are differentiable and invertible functions, the flow is typically expressed as:

\begin{array}{ll}\text{(Normalizing)}\quad\mathbf{r}=r_{1}\circ r_{2}\circ\cdots\circ r_{N},\quad\mathbf{p}_{\mu_{0}}(X)=\mathbf{p}_{\mu_{T}}(\mathbf{r}(X))|\det D\mathbf{r}(X)|\\ \text{(Construct)}\quad\mathbf{s}=s_{N}\circ s_{N-1}\circ\cdots\circ s_{1},\quad\mathbf{p}_{\mu_{T}}(X)=\mathbf{p}_{\mu_{0}}(\mathbf{s}(X))|\det D\mathbf{s}(X)|\end{array} (11)

To simplify the explanation, we consider a one-dimensional model. During training, r_{i}(x) is represented as a neural network, r_{i}(x;\phi). Multiple r_{i} are combined to obtain the desired transformation \mathbf{r}. Typically, training minimizes the negative log-likelihood loss between the final estimated density and the dataset, expressed as

L(x)=-\log\mathbf{p}_{\mu_{T}}(x)=-\log\mathbf{p}_{\mu_{0}}(\mathbf{r}^{-1}(x))-\log|\det D\mathbf{r}^{-1}(x)| (12)

in which there is no need to consider the loss of each r_{i} during the process.
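Eq. (12) can be verified on the simplest possible flow. A minimal sketch, assuming a single affine map r(z)=az+b (an illustrative stand-in for a trained network) with a standard-Gaussian base density, cross-checked against the analytic density it induces:

```python
import math

# Negative log-likelihood of Eq. (12) for a single affine flow (illustrative).
# r maps base samples z ~ mu_0 = N(0,1) to x = r(z) = a*z + b, so
# r^{-1}(x) = (x - b)/a and log|det D r^{-1}(x)| = -log|a|.

a, b = 2.0, 1.0

def log_p0(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def nll(x):
    z = (x - b) / a
    return -log_p0(z) - (-math.log(abs(a)))   # -log p0(r^{-1}(x)) - log|det D r^{-1}|

# Cross-check against the analytic density of N(b, a^2):
def nll_analytic(x):
    return 0.5 * ((x - b) / a) ** 2 + 0.5 * math.log(2 * math.pi) + math.log(a)

print(abs(nll(3.7) - nll_analytic(3.7)) < 1e-12)  # -> True
```

A stacked flow only adds more log-det terms to the same formula, which is why the standard NF objective never needs the intermediate layers individually.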

Discretizing the NF construction process reveals that each function corresponds to one Euler time step. Thus, each sub-function r_{i} transforms the group density \mu_{i} in MFGs into \mu_{i+1}. This series of reversible flows represents the time evolution process, so that

\mu_{0}\xrightarrow{\mathbf{r}(x)}\mu_{T}\quad\Leftrightarrow\quad\mu_{0}\xrightarrow{r_{t_{1}}(x)}\mu_{t_{1}}\xrightarrow{r_{t_{2}}(x)}\mu_{t_{2}}\xrightarrow{r_{t_{3}}(x)}\cdots\xrightarrow{r_{t_{N}}(x)}\mu_{t_{N}} (13)

Additionally, as each sub-function is implemented as a neural network, the loss at each time step can be used to constrain and optimize the sub-functions. The density \mu_{t_{n}} can be expressed as \mu_{t_{n}}=\mathbf{r}_{1,2,\cdots,n}\circ\mu_{0}, where \mathbf{r}_{1,2,\cdots,n}=r_{t_{n}}\circ r_{t_{n-1}}\circ\cdots\circ r_{t_{1}}.
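The intermediate densities \mu_{t_{n}} are recoverable from the partial compositions. A minimal sketch, assuming each sub-flow is an affine map x \mapsto a_i x + b_i (hypothetical stand-ins for the MAF layers) with a standard-Gaussian \mu_0:

```python
import math

# Intermediate densities mu_{t_n} from composed sub-flows (illustrative).
# Each r_{t_i} is an assumed affine map x -> a_i * x + b_i; composing the
# first n maps and applying the change of variables gives mu_{t_n}.

steps = [(1.5, 0.2), (0.8, -0.1), (1.2, 0.5)]   # (a_i, b_i) per sub-flow

def compose(n):
    """Return (scale, shift) of r_{t_n} o ... o r_{t_1}."""
    a_tot, b_tot = 1.0, 0.0
    for a, b in steps[:n]:
        a_tot, b_tot = a * a_tot, a * b_tot + b   # apply the next map on the outside
    return a_tot, b_tot

def log_mu(n, x):
    """log mu_{t_n}(x) with base mu_0 = N(0,1)."""
    a_tot, b_tot = compose(n)
    z = (x - b_tot) / a_tot                       # invert the composed map
    return (-0.5 * z * z - 0.5 * math.log(2 * math.pi)) - math.log(abs(a_tot))

# mu_{t_3} is N(b_tot, a_tot^2); its density peaks at x = b_tot.
a3, b3 = compose(3)
print(log_mu(3, b3) > log_mu(3, b3 + 1.0))  # -> True
```

This per-step density access is exactly what lets NF-MKV Net attach a loss to every \mu_{t_{n}} rather than only to the terminal density.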

Our modeling strategy is to approximate the crowd density at each layer,

\mu_{0}\mapsto\mathbf{r}_{1,2,\cdots,n}(x)\circ\mu_{0}(x)\approx\mathbf{r}_{1,2,\cdots,n}(x;\bm{\Phi})\circ\mu_{0} (14)

at each time step t=t_{n}, through an NF model consisting of multiple layers of Masked Autoregressive Flow (MAF) and Permute layers, parameterized by \phi.

To train the NF, the first step is computing the process loss l_{\text{HJB}}. At each time step t=t_{n}, the MFGs system satisfies the HJB equation. Section 3.1 describes the value function and its gradient. Thus, samples \{x_{i}\}_{i=1}^{M} from \mu_{t_{n}} at each time step can be used in the HJB equation to compute the loss,

l_{\text{HJB}}=\frac{1}{N}\frac{1}{M}\sum\nolimits_{n=1}^{N}\sum\nolimits_{i=1}^{M}\|\partial_{t}u\left(x_{i},t_{n}\right)+\nu\Delta u\left(x_{i},t_{n}\right)-H\left(\nabla_{x}u\left(x_{i},t_{n}\right)\right)+f\left(x_{i},t_{n}\right)\|^{2} (15)

where

\left(\{x_{i}\}_{i=1}^{M},t_{n}\right)\sim\mu_{t_{n}}\approx\mu_{t_{n}}(\phi)=r_{t_{n}}(x;\phi_{n})\circ r_{t_{n-1}}(x;\phi_{n-1})\circ\cdots\circ r_{t_{1}}(x;\phi_{1})\circ\mu_{0}. (16)
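The residual inside Eq. (15) can be evaluated pointwise with finite differences. A minimal sketch, assuming a hypothetical candidate u(x,t)=x^{2}+t with H(p)=p^{2}, and an f constructed here purely so the residual vanishes (it is not one of the paper's cost functions):

```python
# Finite-difference evaluation of the HJB residual in Eq. (15) (illustrative).
# Candidate u(x,t) = x^2 + t with H(p) = p^2; f is chosen so the residual
# du/dt + nu*Lap(u) - H(grad u) + f vanishes exactly, and central
# differences should recover that.

nu = 0.5
h = 1e-4

def u(x, t):
    return x * x + t

def f(x, t):
    return 4 * x * x - 1.0 - 2.0 * nu    # makes the residual identically zero

def residual(x, t):
    du_dt = (u(x, t + h) - u(x, t - h)) / (2 * h)            # d u / d t
    lap_u = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)  # Laplacian
    grad_u = (u(x + h, t) - u(x - h, t)) / (2 * h)           # d u / d x
    return du_dt + nu * lap_u - grad_u**2 + f(x, t)

print(abs(residual(0.7, 0.3)) < 1e-5)  # -> True
```

In practice the derivatives of the parameterized u are available from automatic differentiation, and the residual is averaged over flow samples exactly as in Eq. (15).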

Additionally, the NF method must match the terminal density condition, so the terminal loss l_{\text{T}} is also included in the loss calculation. If the terminal condition g is explicitly defined, the corresponding optimal density \hat{\mu}_{T}(\phi)=\mathbf{r}(x;\bm{\Phi})\circ\mu_{0} can serve as the target distribution for NF. The negative log-likelihood between \hat{\mu}_{T}(\phi) and the \mu_{T} generated by NF is used to compute the terminal loss l_{\text{T}}. For \{x_{i}\}_{i=1}^{N}\sim\hat{\mu}_{T}(\bm{\Phi}):

l_{\textit{T}}=-\log\mathbf{p}(\mu_{T}|\hat{\mu}_{T}(\Phi))=-\frac{1}{N}\sum\nolimits_{i=1}^{N}\log\mathbf{p}(\mu_{T}|\hat{\mu}_{T}(\Phi,x_{i})). (17)

In summary, NF, as a generative model, can construct intermediate function compositions and distributions without direct data use, relying solely on the initial distribution \mu_{0} and the terminal distribution \mu_{T}, while preserving density consistency. We first compute the terminal density distribution \mu_{T} explicitly and construct an NF to transition from \mu_{0} to \mu_{T}. The losses l_{\text{HJB}} (Eq. 15) and l_{\textit{T}} (Eq. 17) constrain the NF evolution, ensuring the flow density aligns with the control objectives.

3.3 Coupling two processes

The two processes can be coupled and trained alternately. As NF is a generative model, it can first generate a set of flow density evolution functions along with the corresponding density distributions at each time step. This generated set of density distributions is fixed as the marginal distribution to optimize the value function and its gradient under the MKV FBSDE framework. Once the optimal value function for this marginal distribution flow is obtained, it is fixed to update each r_{n} and its corresponding \mu_{t_{n}} in the NF evolution process, until the optimal density distribution flow under the current value function is achieved. This coupled training iterates until convergence. Algorithm (1) presents the pseudo-code of the model.

Algorithm 1 NF-MKV Net
Input: \sigma diffusion parameter, g terminal cost, H Hamiltonian, f process loss, \mu_{0} initial density
Output: \bm{\mu}=(\mu_{t})_{0\leq t\leq T} density flow, \mathbf{u}=(u_{t})_{0\leq t\leq T} value function
\mu_{T}\leftarrow\arg\max_{\mu}g(x,\mu(x,T))
Generate NF \{\mu_{t_{n}}(\phi_{n})\}_{n=1}^{N} from \mu_{0} to \mu_{T}
while not converged do
     Train u(0,x|\theta_{0}) and [\partial_{x}u(x,t)|\theta_{n}]^{T}\sigma for n=1,2,\cdots,N:
     Sample batch \left(\{x_{i}\}_{i=1}^{M},t_{n}\right)\sim\mu_{t_{n}} for n=1,2,\cdots,N
     Sample Wiener process \{W_{t_{n}}\}_{n=1}^{N}\sim\mathcal{N}(0,\sigma^{2})
     l_{\textit{MKV}}\leftarrow\frac{1}{N}\sum\nolimits_{i=1}^{N}|g(x_{i},\mu_{T})-\hat{u}(\{(x_{i,n},t_{n})\},\{W_{t_{n}}\}_{n=1}^{N})|^{2}
     Back-propagate the loss l_{\textit{MKV}} to the \theta weights.
     Train r_{n}(\phi_{n}) for n=1,2,\cdots,N:
     Sample batch \left(\{x_{i}\}_{i=1}^{M},t_{n}\right)\sim\mu_{t_{n}}(\phi_{n}) for n=1,2,\cdots,N
     l_{\text{HJB}}\leftarrow\frac{1}{N}\frac{1}{M}\sum\nolimits_{n=1}^{N}\sum\nolimits_{i=1}^{M}\|\partial_{t}u\left(x_{i},t_{n}\right)+\nu\Delta u\left(x_{i},t_{n}\right)-H\left(\nabla_{x}u\left(x_{i},t_{n}\right)\right)+f\left(x_{i},t_{n}\right)\|^{2}
     Sample batch \{x_{i}\}_{i=1}^{N}\sim\hat{\mu}_{T}(\Phi)
     l_{\text{T}}\leftarrow-\frac{1}{N}\sum\nolimits_{i=1}^{N}\log\mathbf{p}(\mu_{T}|\hat{\mu}_{T}(\Phi,x_{i}))
     Back-propagate the loss l_{\text{NF}}=l_{\text{HJB}}+l_{\textit{T}} to the \phi weights.
end while
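The control flow of Algorithm 1 can be sketched with scalar stand-ins. Everything below is an illustrative assumption: theta plays the value-network weights, phi the flow weights, and two quadratic losses replace l_{MKV} and l_{NF}; only the alternating freeze-and-train structure is the point.

```python
# Skeleton of the alternating training in Algorithm 1 (illustrative only).
# Scalar stand-ins: theta for the value-network weights, phi for the flow
# weights; the quadratic losses are assumptions replacing l_MKV and l_NF.

def l_mkv(theta, phi):
    return (theta - phi) ** 2              # fit value net to the frozen flow

def l_nf(phi, theta):
    return (phi - 0.5 * theta - 1.0) ** 2  # fit flow to the frozen value net

def grad(loss, x, other, h=1e-6):
    # Central finite-difference gradient with respect to the first argument.
    return (loss(x + h, other) - loss(x - h, other)) / (2 * h)

theta, phi, lr = 0.0, 0.0, 0.1
for _ in range(2000):                      # "while not converged"
    for _ in range(20):                    # train theta with the flow fixed
        theta -= lr * grad(l_mkv, theta, phi)
    for _ in range(20):                    # train phi with the value net fixed
        phi -= lr * grad(l_nf, phi, theta)

# Consistency (theta = phi and phi = 0.5*theta + 1) holds at theta = phi = 2.
print(abs(theta - 2.0) < 1e-3 and abs(phi - 2.0) < 1e-3)  # -> True
```

The alternation converges to the joint fixed point of the two losses, mirroring how the fixed density flow and the fixed value function take turns as the frozen target in the full model.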

4 Numerical Experiment

We apply NF-MKV Net to MFGs instances and present the numerical results in two parts. The first part demonstrates NF-MKV Net as an effective method for solving MFGs equilibrium involving density distributions. The second part highlights the accuracy of NF-MKV Net in comparison to other algorithms.

4.1 Solving MFGs with NF-MKV Net

This section presents three examples of solving MFGs using NF-MKV Net, demonstrating its applicability to traffic flow problems, low- and high-dimensional crowd motion problems, and scenarios with obstacles.

4.1.1 Example 1: MFGs Traffic Flow Control

A series of numerical experiments in MFGs Traffic Flow Control explore the dynamics of MFGs, focusing on autonomous vehicles (AVs) navigating a circular road network. The traffic flow scenario is formulated as an MFGs problem involving density distribution and the value function.

The initial density is defined on the ring road, where the state x represents the AVs' position. The state transfer function is dx=v\,dt+\sigma dW_{t}, and the process cost is f(x)=\frac{1}{2}(1-\mu-b)^{2}. The Hamiltonian is defined as H(x,p,t)=f(p,\mu)+pu_{x}, leading to the optimal control u^{*}=\arg\min_{p}(f(p,\mu)+pu_{x}). In the finite-time-horizon problem, the terminal value function u_{T} of the AVs system is constrained at t=T. It is assumed that AVs have no preference for their locations at time T, i.e., u(x,T)=0. In the MFGs traffic flow problem, the optimal terminal density can then be solved explicitly as \mu_{T}(x)=1,\forall x\in(0,1), satisfying the model's assumptions.

Without loss of generality, we define the time interval as [0,1] and set the initial density \mu_{0} at t=0. To verify the volumetric invariance of the density distribution discussed in our study, we selected initial density functions satisfying \int_{0}^{1}\mu_{0}(x)dx=1. Four different initial densities were selected, each with a distinct diffusion coefficient \sigma for the Wiener process. NF-MKV Net was then employed to solve for equilibrium, verifying the proposed algorithm's applicability.

Results in Fig.(2) indicate that the agent distribution, regardless of the initial density \mu_{0} or diffusion coefficient \sigma, converges to the equilibrium \mu(x,T)=1. Comparing NF-MKV Net with the numerical solution shows errors below 10^{-3}, demonstrating the algorithm's effectiveness in solving traffic flow problems. The results illustrate the evolution of the density distribution \mu(x) over time t and the \log errors compared to the noise-free numerical method.

Refer to caption
Figure 2: NF-MKV Net solutions and numerical error of MFGs traffic flow with various initial density distribution and diffusion coefficients

4.1.2 Example 2: MFGs Crowd Motion

In this example, a dynamically formulated MFGs problem, the Crowd Motion problem, is constructed in dimensions d=2 and d=50 to demonstrate the applicability of NF-MKV Net. We set the problems as in Eq.(3) with parameters:

f(x,\mu)=\int_{\mathbb{R}^{n}}\mathbf{e}^{-|x-\hat{x}|^{2}}d\mu(\hat{x}),\quad H(x,p,t)=|p|^{2}+f(x,\mu,t), (18)
\mu(x,0)=\mu_{0}(x),\quad u(x,T)=\int_{\mathbb{R}^{n}}|x-x_{T}|^{2}d\mu(x)

\mathbf{d=2} Crowd Motion. Here, \sigma=\sqrt{2} is used with 20 time steps in the dynamics process, and the initial distribution is set as \mu_{0}(x)=\mathcal{N}((-2,-2),(1^{2},1^{2})). To reach the goal point, we set x_{T}=(2,-2), so the terminal condition is g(x,\mu(\cdot))=\int_{\mathbb{R}^{2}}|x-(2,-2)|^{2}d\mu(x). The terminal distribution is set as \mu_{T}(x_{T})=1 to minimize the terminal condition. With these settings, NF-MKV Net trains the MFGs model associated with the dynamic system.

Refer to caption
Figure 3: 2-dimensional crowd motion dynamics flow

\mathbf{d=50} Crowd Motion. Similar to the 2-dimensional case, the high-dimensional experiment adopts the same settings as in Eq.(3) and Eq.(18), but handles agent states and controls in \mathbb{R}^{50}, along with density distributions in \mathcal{L}(\mathbb{R}^{50}). Our initial density \mu_{0}(x) is a Gaussian centered at (-2,-2,0,\cdots,0), and the terminal condition is g(x,\mu(\cdot))=\int_{\mathbb{R}^{50}}|x-(2,2,0,\cdots,0)|^{2}d\mu(x). As before, the optimal terminal density distribution can be written as \mu_{T}(x_{T})=1. With these settings, NF-MKV Net trains the MFGs model associated with the dynamic system. Results are visualized in the first two dimensions by summing projections from higher dimensions onto these two dimensions.

Refer to caption
Figure 4: 50-dimensional crowd motion dynamics flow

The trajectories of 1000 points are shown in Fig. (3) for the 2-dimensional case and Fig. (4) for the 50-dimensional case. NF-MKV Net effectively transforms the initial Gaussian density into the terminal condition along a nearly straight trajectory, while ensuring crowd deformation and inter-group collision avoidance. This behavior remains consistent as the dimensionality increases.

4.1.3 Example 3: MFGs Crowd Motion with obstacle

This experiment considers an MFGs problem with complex process interaction costs. Following the general setting in Eq.(3) and Eq.(18), we change the process interaction costs ff as

f(x,\mu)=\int_{\mathbb{R}^{n}}\mathbf{e}^{-|x-\hat{x}|^{2}}d\mu(\hat{x})+\int_{\mathbb{R}^{n}}\left(\mathbf{e}^{-|x-x_{\text{o}}|^{2}}+\mathbf{e}^{-||x-x_{\text{o}}|^{2}-s_{\text{safe}}|^{2}}\right)d\mu(x). (19)

We set \sigma=\sqrt{2} with 20 time steps in the dynamics process and set the initial distribution as \mu_{0}(x)=\mathcal{N}((-2,-2);1^{2}). To reach the goal point, we set x_{T}=(2,2), so the terminal condition is g(x,\mu(\cdot))=\int_{\mathbb{R}^{2}}|x-(2,2)|^{2}d\mu(x). The terminal distribution should be \mu_{T}(x_{T})=1 to minimize the terminal condition. During the dynamics, the system optimizes the process loss f defined in Eq. (19). The problem involves transporting the initial Gaussian density to a new location while minimizing the terminal condition, avoiding congestion, and bypassing an obstacle at x_{o}=(2,-2) with a safety radius s_{\text{safe}}=1.5. With these settings, NF-MKV Net successfully trains the MFGs model associated with the dynamics system.

Refer to caption
Figure 5: 2-dimensional crowd motion dynamics flow with an obstacle.

The results, shown in Fig. (5), demonstrate that NF-MKV Net successfully transforms the initial Gaussian density into the desired terminal condition along an optimized trajectory, ensuring crowd deformation, inter-group collision avoidance, and obstacle avoidance.

4.2 Comparison with other Methods

To verify volumetric invariance and time continuity, we compare NF-MKV Net with existing MFGs solving methods, including the distribution-based RL-PIDL method proposed by Chen et al. (2023) and the high-dimensional neural network-based APAC-Net by Lin et al. (2021).

Distribution volumetric invariance. We implement an approximate integral over the dynamics region, a technique widely used in density estimation, e.g., O’Brien et al. (2016). A grid is generated over a specified area, the numerical integral of the probability distribution over that area is computed, and the result should be close to 1; this verifies the validity of the distribution. Since the approximate integral is grid-based, it applies only to low-dimensional problems. For the d=50 crowd motion, we therefore use the same projection as when visualizing the high-dimensional trajectories above: the density is accumulated over the remaining components and estimated over the first two components in a 2-dimensional region.
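The grid-based check can be sketched as follows; the grid bounds and resolution are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def check_mass(density, lo=-6.0, hi=6.0, n=400):
    """Grid-based volumetric-invariance check: numerically integrate a 2-D
    density over [lo, hi]^2 by a Riemann sum; a valid density returns ~1."""
    xs = np.linspace(lo, hi, n)
    h = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, xs)
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)  # (n*n, 2) grid points
    return float(np.sum(density(pts)) * h * h)      # sum(density) * cell area
```

For a standard 2-D Gaussian the returned mass is close to 1; the log of the deviation from 1 is the quantity reported in Tab. (1).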

Agent states time continuity. The Wasserstein distance is a standard metric for the difference between two density distributions; Laurière et al. (2022) used it to assess the difference between distributions. Even without an explicit probability density function, the Wasserstein distance (W-dis) can be computed by optimization methods, as long as both distributions can be sampled.
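As a minimal sketch (our own illustration, not the paper's implementation): in one dimension, the Wasserstein-1 distance between two equal-size empirical samples reduces to matching sorted samples in order, so no optimization solver is needed:

```python
import numpy as np

def w1_empirical(a, b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples:
    the optimal coupling matches order statistics, so the distance is the
    mean absolute difference of the sorted samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))
```

Averaging this quantity over consecutive time steps \mu_{t} and \mu_{t+1} gives the time-continuity metric of Tab. (1); in higher dimensions, an optimal-transport solver replaces the sorting argument.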

We set up the comparison experiments on traffic flow (d=1) and crowd motion (d=2 and d=50). The comparison results are shown in Tab. (1) and Fig. (6).

Table 1: Comparison with other methods

Experiment | Metric | NF-MKV | RL-PIDL | APAC-Net
Exp 1: traffic flow (d=1) | log of \mu-integral difference from 1 | -2.67 | -0.21 | / (sample-based)
Exp 1: traffic flow (d=1) | W-dis of \mu_t between time steps | 0.044 | 0.052 | 0.047
Exp 2: crowd motion (d=2) | log of \mu-integral difference from 1 | -2.32 | -0.13 | / (sample-based)
Exp 2: crowd motion (d=2) | W-dis of \mu_t between time steps | 0.096 | 0.108 | 0.101
Exp 2: crowd motion (d=50) | log of \mu-integral difference from 1 | -1.06 | 0.26 | / (sample-based)
Exp 2: crowd motion (d=50) | W-dis of \mu_t between time steps | 2.27 | 3.49 | 2.85
Figure 6: Comparison results over time steps. Left: log of the integral's deviation from 1 in traffic flow (d=1). Right: Wasserstein distance between distributions at consecutive time steps in crowd motion (d=2).

The integral error of NF-MKV Net over the dynamic region is below 10^{-2}, close to the reference value of 1, significantly outperforming the other method, whose error exceeds 0.1 in low-dimensional problems. In high-dimensional scenarios, the error of the other method is over ten times larger than that of NF-MKV Net, highlighting the distribution volumetric invariance of NF-MKV Net. NF-MKV Net also has the smallest average Wasserstein distance between adjacent time steps among the three methods, demonstrating smoother evolution and superior agent-state time continuity.

In summary, NF-MKV Net excels in distribution volumetric invariance and agent state time-continuity, making it suitable for solving MFGs problems involving density distributions.

5 Conclusion

This paper investigates MFGs equilibrium solutions using a stochastic process framework, addressing equivalent probability distribution flow fixed-point problems instead of directly solving the coupled MFGs equations. We propose NF-MKV Net, which integrates process-regularized NF with state-policy-connected time-series neural networks. Process-regularized NF frameworks enforce mathematical constraints by regulating the transfer functions to represent flows of probability measures. State-policy-connected time-series neural networks, grounded in MKV FBSDEs, establish relationships among value functions to ensure a time-consistent process. The proposed method is validated in diverse scenarios, demonstrating its effectiveness compared to existing approaches. NF-MKV Net exhibits strong performance in distribution volumetric invariance and agent state time-continuity, making it applicable to MFGs problems involving distributions.

References

  • Achdou & Lauriere (2015) Yves Achdou and Mathieu Lauriere. On the System of Partial Differential Equations Arising in Mean Field type Control, 2015.
  • Carmona et al. (2018) René Carmona, François Delarue, et al. Probabilistic Theory of Mean Field Games with Applications I-II. Springer, 2018.
  • Chen et al. (2023) Xu Chen, Shuo Liu, and Xuan Di. A Hybrid Framework of Reinforcement Learning and Physics-Informed Deep Learning for Spatiotemporal Mean Field Games. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems. ACM Digital Library, 2023.
  • Huang et al. (2023) Han Huang, Jiajia Yu, Jie Chen, and Rongjie Lai. Bridging Mean-Field Games and Normalizing Flows with Trajectory Regularization. Journal of Computational Physics, 487:112155, 2023.
  • Huang et al. (2006) Minyi Huang, Roland P Malhamé, and Peter E Caines. Large Population Stochastic Dynamic Games: Closed-Loop McKean-Vlasov Systems and the Nash Certainty Equivalence Principle. Journal of Systems Science & Complexity, 2006.
  • Lasry & Lions (2007) Jean-Michel Lasry and Pierre-Louis Lions. Mean Field Games. Japanese journal of mathematics, 2(1):229–260, 2007.
  • Laurière et al. (2022) Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Elie, Olivier Pietquin, et al. Scalable deep reinforcement learning algorithms for mean field games. In International Conference on Machine Learning, pp.  12078–12095. PMLR, 2022.
  • Lin et al. (2021) Alex Tong Lin, Samy Wu Fung, Wuchen Li, Levon Nurbekyan, and Stanley J Osher. Alternating the Population and Control Neural Networks to Solve High-Dimensional Stochastic Mean-Field Games. Proceedings of the National Academy of Sciences, 118(31):e2024713118, 2021.
  • O’Brien et al. (2016) Travis A. O’Brien, Karthik Kashinath, Nicholas R. Cavanaugh, William D. Collins, and John P. O’Brien. A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis, 101:148–160, 2016. doi: 10.1016/j.csda.2016.02.014.
  • Ren et al. (2024) Lu Ren, Yuxin Jin, Zijia Niu, Wang Yao, and Xiao Zhang. Hierarchical Cooperation in LQ Multi-Population Mean Field Game with its Application to Opinion Evolution. IEEE Transactions on Network Science and Engineering, 2024.
  • Ruthotto et al. (2020) Lars Ruthotto, Stanley J Osher, Wuchen Li, Levon Nurbekyan, and Samy Wu Fung. A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems. Proceedings of the National Academy of Sciences, 117(17):9183–9193, 2020.
  • Tabak & Vanden-Eijnden (2010) Esteban G Tabak and Eric Vanden-Eijnden. Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1):217–233, 2010.

Appendix A Assumption (MFGs Solvability HJB)

  • (A1)

    The volatility \sigma is independent of the control parameter \alpha.

  • (A2)

    There exists a constant L\geq 0 such that:

    |b(t,x,\mu,\alpha)|\leq L(1+|\alpha|),\quad|f(t,x,\mu,\alpha)|\leq L\left(1+|\alpha|^{2}\right),
    \left|\left(\sigma,\sigma^{-1}\right)(t,x,\mu)\right|\leq L,\quad|g(x,\mu)|\leq L,

    for all (t,x,\mu,\alpha)\in[0,T]\times\mathbb{R}^{d}\times\mathcal{P}_{2}\left(\mathbb{R}^{d}\right)\times A.

  • (A3)

    For any t\in[0,T], x\in\mathbb{R}^{d}, and \mu\in\mathcal{P}_{2}\left(\mathbb{R}^{d}\right), the functions b(t,x,\mu,\cdot) and f(t,x,\mu,\cdot) are continuously differentiable in \alpha.

  • (A4)

    For any t\in[0,T], \mu\in\mathcal{P}_{2}\left(\mathbb{R}^{d}\right) and \alpha\in A, the functions b(t,\cdot,\mu,\alpha), f(t,\cdot,\mu,\alpha), \sigma(t,\cdot,\mu) and g(\cdot,\mu) are L-Lipschitz continuous in x; for any t\in[0,T], x\in\mathbb{R}^{d} and \alpha\in A, the functions b(t,x,\cdot,\alpha), f(t,x,\cdot,\alpha), \sigma(t,x,\cdot) and g(x,\cdot) are continuous in the measure argument with respect to the 2-Wasserstein distance.

  • (A5)

    For the same constant L and for all (t,x,\mu,\alpha)\in[0,T]\times\mathbb{R}^{d}\times\mathcal{P}_{2}\left(\mathbb{R}^{d}\right)\times A,

    \left|\partial_{\alpha}b(t,x,\mu,\alpha)\right|\leq L,\quad\left|\partial_{\alpha}f(t,x,\mu,\alpha)\right|\leq L(1+|\alpha|).
  • (A6)

    Letting

    H(t,x,\mu,y,\alpha)=b(t,x,\mu,\alpha)\cdot y+f(t,x,\mu,\alpha)

    for all (t,x,\mu,\alpha)\in[0,T]\times\mathbb{R}^{d}\times\mathcal{P}_{2}\left(\mathbb{R}^{d}\right)\times A, there exists a unique minimizer \hat{\alpha}(t,x,\mu,y)\in\operatorname{argmin}_{\alpha}H(t,x,\mu,y,\alpha), continuous in \mu and L-Lipschitz continuous in (x,y), satisfying:

    |\hat{\alpha}(t,x,\mu,y)|\leq L(1+|y|),

    for all (t,x,\mu,y)\in[0,T]\times\mathbb{R}^{d}\times\mathcal{P}_{2}\left(\mathbb{R}^{d}\right)\times\mathbb{R}^{d}.
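As a concrete instance of assumption (A6), consider the canonical case b(t,x,\mu,\alpha)=\alpha with a quadratic action cost |\alpha|^{2}/2 in f; this choice is our own illustration, not the paper's model. The Hamiltonian H(\alpha)=\alpha\cdot y+|\alpha|^{2}/2 is strictly convex in \alpha, so its minimizer is unique, Lipschitz in y, and satisfies the growth bound:

```python
import numpy as np

def alpha_hat(y):
    """Unique minimizer of H(alpha) = alpha . y + |alpha|^2 / 2: setting the
    gradient y + alpha to zero gives alpha_hat = -y, which is 1-Lipschitz in y
    and satisfies |alpha_hat| <= 1 + |y| as required by (A6)."""
    return -np.asarray(y, dtype=float)

def hamiltonian(alpha, y):
    """The reduced Hamiltonian H(alpha) for this quadratic example."""
    alpha, y = np.asarray(alpha, dtype=float), np.asarray(y, dtype=float)
    return float(alpha @ y + 0.5 * alpha @ alpha)
```

At the minimizer the Hamiltonian equals -|y|^{2}/2, and any perturbation of \alpha strictly increases H, illustrating uniqueness.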

Appendix B Symbols Table

To ensure clarity, we include a correspondence table for all symbols.

Table 2: Table of symbols
Symbol Meaning
\Omega Sample space of the state process
\mathbb{F} Filtration of the process
\mathcal{F}_{t} Filtration at time t
\mathbb{P} Probability measure
\mathbf{W} Wiener process
W_{t} Wiener process at time t
\xi Initial condition
L^{2}(\cdot) Square-integrable function space
\mathbb{R}^{d} d-dimensional real space
\mu Flow of probability density measures over the process
\mu_{t} Probability density at time t
\alpha Control action process
A Control action set
J^{\mu}(\alpha) Loss functional under the control and density flow
\mathbb{E} Expectation
f Process loss function
g Terminal loss function
b State control (drift)
\sigma Random perturbation (diffusion coefficient)
X_{0}^{\alpha} Initial state under a particular control flow
\hat{X}_{t}^{\mu} Optimal state of the representative agent
L(\hat{X}_{t}^{\mu}) Marginal density flow corresponding to the optimum
u Value function
v Random perturbation
H(x,p) Hamiltonian of state x and covariate p
X_{t} State at time t
Y_{t} Value function in the MKV FBSDE at time t
Z_{t} Covariate in the MKV FBSDE at time t
r_{t} Intermediate transfer function of each NF layer
R(x) Transfer function of the NF
s_{t} Inverse of the intermediate transfer function of each NF layer
S(x) Inverse of the NF transfer function
\sigma^{-1\dagger} Transpose of the inverse of \sigma
u(0,x|\theta_{0}) Neural-network representation of the value function at time t=0
\partial_{x}u(x,t|\theta_{n}) Neural-network representation of the gradient of the value function
\hat{u} Final output
\mathbf{P}(g,u) Maximum likelihood of functions g and u
l_{MKV} Training loss of the MKV FBSDE
R_{1,2,\cdots,n}(x;\Phi) Neural-network parameterization of the NF
l_{HJB} Training loss of the HJB equation
N,M Number of time segments; number of full-plane samples
l_{T} Terminal loss
r_{t}(x;\Phi_{t}) Parameterization of the NF transfer function
x\sim\mu Sample drawn from \mu
p Covariate in the Hamiltonian
u_{x} Partial derivative of u with respect to x

Appendix C Theoretical Analysis

The theoretical contribution of our algorithm is primarily reflected in using the Representation Theorem for the Strong Formulation to guarantee that the neural network solution corresponds to the equilibrium of Mean-Field Games (MFGs).

In fact, MFGs do not always admit an equilibrium; its existence depends on the form of the value function in the equations. According to the conditions for the existence of an MFGs equilibrium [1] and the Representation Theorem for the Strong Formulation, the objective function must satisfy Lipschitz continuity, continuous differentiability, and convexity conditions (see Appendix A). If an objective function that fails to meet these conditions is used directly as the network's loss function, the resulting solution cannot be guaranteed to correspond to the MFGs equilibrium, even if the network produces a solution. Moreover, the two equations must be solved iteratively, as the MFGs reach a fixed point through their iteration.

We employ two networks to represent the forward and backward equations, respectively. The value function of each network is strongly tied to the objective function of its equation, following an approach similar to that of Han et al. [2], which effectively represents the equations. By designing an iterative training structure for the two networks, we characterize the iterative solution process of the two equations in the MKV FBSDE.

These measures theoretically guarantee the existence and uniqueness of the equilibrium in the MFGs system, ensuring that the solution produced by our algorithm is indeed the equilibrium.

To show the existence of solutions to the MKV FBSDEs and their equivalence with the MFGs under the given Solvability HJB conditions, we give theoretical proofs that illustrate the validity of our proposed transformation approach to the MFGs equilibrium.

The objective is to prove that, for a given initial condition, the FBSDE has a solution with a bounded martingale integrand, and that this solution is unique within the class of solutions with bounded martingale integrands. Meanwhile, we must also construct a decoupling field. The theoretical analysis follows below.

References

[1] Lasry J M, Lions P L. Mean field games[J]. Japanese journal of mathematics, 2007, 2(1): 229-260.

[2] Han J, Jentzen A, E W. Solving high-dimensional partial differential equations using deep learning[J]. Proceedings of the National Academy of Sciences, 2018, 115(34): 8505-8510.

Theorem: Representation Theorem for Strong Formulation.

For the same input \mu as above and under assumption MFGs Solvability HJB, the FBSDE with X_{0}=\xi as initial condition at time 0 has a unique solution (X_{t}^{0,\xi},Y_{t}^{0,\xi},Z_{t}^{0,\xi})_{0\leq t\leq T} with (Z_{t}^{0,\xi})_{0\leq t\leq T} bounded by a deterministic constant, almost everywhere for \mathrm{Leb}_{1}\otimes\mathbb{P} on [0,T]\times\Omega.

Moreover, there exists a continuous mapping u:[0,T]\times\mathbb{R}^{d}\to\mathbb{R}, Lipschitz continuous in x uniformly with respect to t\in[0,T] and to the input \mu, such that, for any initial condition \xi\in L^{2}(\Omega,\mathcal{F}_{0},\mathbb{P};\mathbb{R}^{d}), the unique solution (X_{t}^{0,\xi},Y_{t}^{0,\xi},Z_{t}^{0,\xi})_{0\leq t\leq T} to the FBSDE with X_{0}=\xi as initial condition at time 0 satisfies:

\mathbb{P}\left[\forall t\in[0,T],\quad Y_{t}^{0,\xi}=u\big(t,X_{t}^{0,\xi}\big)\right]=1.

Also, the process (\sigma(t,X_{t}^{0,\xi},\mu_{t})^{-1\dagger}Z_{t}^{0,\xi})_{0\leqslant t\leqslant T} is bounded by the Lipschitz constant of u in x. Finally, the process (X_{t}^{0,\xi})_{0\leqslant t\leqslant T} is the unique solution of the optimal control problem. In particular, \mathbb{E}[u(0,\xi)]=J^{\mu}(\hat{\alpha}) for

\hat{\alpha}=(\hat{\alpha}(t,X_{t}^{0,\xi},\mu_{t},\sigma(t,X_{t}^{0,\xi},\mu_{t})^{-1\dagger}Z_{t}^{0,\xi}))_{0\leqslant t\leqslant T}.

Proof. We split the proof into successive steps.

First Step. We first focus on a truncated version of FBSDE, namely:

\left\{\begin{array}{l}dX_{t}=b\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt+\sigma(t,X_{t},\mu_{t})dW_{t},\\ dY_{t}=-\psi(Z_{t})f\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt+Z_{t}\cdot dW_{t},\end{array}\right. (20)

for t\in[0,T], with the terminal condition Y_{T}=g(X_{T},\mu_{T}), for a cut-off function \psi:\mathbb{R}^{d}\to[0,1], equal to 1 on the ball of center 0 and radius R, and equal to 0 outside the ball of center 0 and radius 2R, such that \sup|\psi^{\prime}|\leqslant 2/R. For the time being, R>0 is an arbitrary real number; its value will be fixed later on.
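One concrete cut-off with these properties can be sketched piecewise-linearly (an illustrative choice of ours; the proof only needs \psi\in[0,1], \psi=1 on the ball of radius R, \psi=0 outside radius 2R, and \sup|\psi^{\prime}|\leqslant 2/R):

```python
import numpy as np

def psi(z, R=1.0):
    """Piecewise-linear cut-off: equals 1 for |z| <= R, 0 for |z| >= 2R,
    and interpolates linearly in between with slope 1/R <= 2/R."""
    r = np.linalg.norm(np.atleast_1d(np.asarray(z, dtype=float)))
    return float(np.clip(2.0 - r / R, 0.0, 1.0))
```

This \psi is Lipschitz rather than smooth, which suffices for the gradient bound \sup|\psi^{\prime}|\leqslant 2/R used below.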

By Ref. 1, we know that, for any initial condition (t_{0},x)\in[0,T]\times\mathbb{R}^{d}, Eq. (20) is uniquely solvable. We denote the unique solution by (X^{R;t_{0},x},Y^{R;t_{0},x},Z^{R;t_{0},x})=(X_{t}^{R;t_{0},x},Y_{t}^{R;t_{0},x},Z_{t}^{R;t_{0},x})_{t_{0}\leqslant t\leqslant T}. Thanks to the cut-off function \psi, the driver of Eq. (20) is indeed Lipschitz continuous in the variable z. Moreover, the solution can be represented through a continuous decoupling field u^{R}, Lipschitz continuous in the variable x, uniformly in time. Also, the martingale integrand Z^{R;t_{0},x} is bounded by L times the Lipschitz constant of u^{R}, with L as in assumption MFGs Solvability HJB. Therefore, the proof boils down to showing that we can bound the Lipschitz constant of the decoupling field independently of the cut-off function in Eq. (20).

Second Step. In this step, we fix the values of (t_{0},x)\in[0,T]\times\mathbb{R}^{d} and R>0, and we use the notation (X,Y,Z) for (X^{R;t_{0},x},Y^{R;t_{0},x},Z^{R;t_{0},x}). We then let (\mathcal{E}_{t})_{0\leqslant t\leqslant T} be the Doléans-Dade exponential of the stochastic integral:

\left(-\int_{0}^{t}\left[(\sigma^{-1}b)(s,X_{s},\mu_{s},\hat{\alpha}_{s})\right]\cdot dW_{s}\right)_{0\leqslant t\leqslant T},

where \hat{\alpha}_{s}=\hat{\alpha}(s,X_{s},\mu_{s},\sigma(s,X_{s},\mu_{s})^{-1\dagger}Z_{s}). As earlier, we write (\sigma^{-1}b)(t,x,\mu,\alpha) for \sigma(t,x,\mu)^{-1}b(t,x,\mu,\alpha), despite the fact that \sigma^{-1} and b do not have the same arguments. Since the integrand is bounded, (\mathcal{E}_{t})_{0\leqslant t\leqslant T} is a true martingale, and we can define the probability measure \mathbb{Q}=\mathcal{E}_{T}\cdot\mathbb{P}. Under \mathbb{Q}, the process:

\left(W_{t}^{\mathbb{Q}}=W_{t}+\int_{0}^{t}\left(\sigma^{-1}b\right)(s,X_{s},\mu_{s},\hat{\alpha}_{s})ds\right)_{0\leqslant t\leqslant T}

is a d-dimensional Brownian motion. Following the proof of Proposition 4.51, we learn that under \mathbb{Q}, (X_{t},Y_{t},Z_{t})_{0\leq t\leq T} is a solution of the forward-backward SDE:

\left\{\begin{array}{ll}dX_{t}=&\sigma(t,X_{t},\mu_{t})dW_{t}^{\mathbb{Q}},\\ dY_{t}=&-\psi(Z_{t})f\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt\\ &-Z_{t}\cdot(\sigma^{-1}b)\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt\\ &+Z_{t}\cdot dW_{t}^{\mathbb{Q}},\end{array}\right. (21)

over the interval [t_{0},T], with the same terminal condition as before. Since Z is bounded, the forward-backward SDE Eq. (21) may be regarded as an FBSDE with Lipschitz-continuous coefficients. By the FBSDE version of the Yamada-Watanabe theorem proven in Ref. 1, any other solution with a bounded martingale integrand, with the same initial condition but constructed with respect to another Brownian motion, has the same distribution. Therefore, we can focus on the version of Eq. (21) obtained by replacing \bm{W}^{\mathbb{Q}} by \bm{W}. If, for this version, the backward component Y can be represented in the form Y_{t}=V(t,X_{t}), for all t\in[t_{0},T], with V being Lipschitz continuous in space, uniformly in time, and with Z bounded, then V(t_{0},x) must coincide with u^{R}(t_{0},x). Repeating the argument for any (t_{0},x)\in[0,T]\times\mathbb{R}^{d}, we then have V\equiv u^{R}.

Third Step. The strategy is now as follows. We consider the same FBSDE as in Eq. (21), but with \bm{W}^{\mathbb{Q}} replaced by the original \bm{W}:

\left\{\begin{aligned}&dX_{t}=\sigma(t,X_{t},\mu_{t})dW_{t},\\ &dY_{t}=-\psi(Z_{t})f\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt\\ &\qquad-Z_{t}\cdot(\sigma^{-1}b)\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt\\ &\qquad+Z_{t}\cdot dW_{t},\quad t\in[0,T],\end{aligned}\right. (22)

with Y_{T}=g(X_{T},\mu_{T}). This BSDE may be regarded as a quadratic BSDE. In particular, Ref. 1 applies and guarantees that it is uniquely solvable. However, since the driver in the backward equation is not Lipschitz continuous, we shall modify the form of the equation and focus on the following version:

\left\{\begin{aligned}&dX_{t}=\sigma(t,X_{t},\mu_{t})dW_{t},\\ &dY_{t}=-\psi(Z_{t})f\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt\\ &\qquad-\psi(Z_{t})Z_{t}\cdot(\sigma^{-1}b)\big(t,X_{t},\mu_{t},\hat{\alpha}\big(t,X_{t},\mu_{t},\sigma(t,X_{t},\mu_{t})^{-1\dagger}Z_{t}\big)\big)dt\\ &\qquad+Z_{t}\cdot dW_{t},\quad t\in[0,T].\end{aligned}\right. (23)

Notice that the cut-off function \psi now appears on the third line. Since our objective is to prove that Eq. (23) admits a solution for which Z is bounded independently of R, the presence of the cut-off makes no difference when R is large.

Now, Eq. (23) may be regarded as both a quadratic and a Lipschitz FBSDE. For any initial condition (t_{0},x), we may again denote the solution by (\bm{X}^{R;t_{0},x},\bm{Y}^{R;t_{0},x},\bm{Z}^{R;t_{0},x}). This is the same notation as in the first step although the equation is different; since the steps are completely separated, there is no risk of confusion. We denote the corresponding decoupling field by V^{R}. By the theorem in Ref. 1, it is bounded (the bound possibly depending on R at this stage of the proof) and \bm{Z}^{R;t_{0},x} is bounded.

For the sake of simplicity, we assume that t_{0}=0 and we drop the indices R and t_{0} in the notation (\bm{X}^{R;t_{0},x},\bm{Y}^{R;t_{0},x},\bm{Z}^{R;t_{0},x}), which we simply denote by (\bm{X}^{x},\bm{Y}^{x},\bm{Z}^{x}). Similarly, we denote V^{R} by V.

The goal is then to prove that there exists a constant C, independent of R and of the cut-off function, such that, for all x,x^{\prime}\in\mathbb{R}^{d},

\left|\mathbb{E}[Y_{0}^{x^{\prime}}-Y_{0}^{x}]\right|\leqslant C|x^{\prime}-x|, (24)

from which we will deduce that, for all x,x^{\prime}\in\mathbb{R}^{d},

|V(0,x^{\prime})-V(0,x)|\leqslant C|x^{\prime}-x|,

which is exactly the Lipschitz control we need on the decoupling field.

Fourth Step. We now proceed with the proof of Eq. (24). Fixing the values of x and x^{\prime} and letting

\left(\delta X_{t},\delta Y_{t},\delta Z_{t}\right)=\left(X_{t}^{x^{\prime}}-X_{t}^{x},Y_{t}^{x^{\prime}}-Y_{t}^{x},Z_{t}^{x^{\prime}}-Z_{t}^{x}\right),\quad t\in[0,T],

we can write:

d\delta X_{t}=\left[\delta\sigma_{t}\delta X_{t}\right]dW_{t},\quad t\in[0,T], (25)

where \delta\sigma_{t}\delta X_{t} is the d\times d matrix with entries:

\left(\delta\sigma_{t}\delta X_{t}\right)_{i,j}=\sum_{\ell=1}^{d}\left(\delta\sigma_{t}\right)_{i,j,\ell}(\delta X_{t})_{\ell},\quad(i,j)\in\{1,\cdots,d\}^{2},

where (\delta X_{t})_{\ell} is the \ell^{\mathrm{th}} coordinate of \delta X_{t} and

\left(\delta\sigma_{t}\right)_{i,j,\ell}=\frac{\sigma_{i,j}\left(t,X_{t}^{\ell-1;x\leftarrow x^{\prime}},\mu_{t}\right)-\sigma_{i,j}\left(t,X_{t}^{\ell;x\leftarrow x^{\prime}},\mu_{t}\right)}{(\delta X_{t})_{\ell}}\mathbf{1}_{(\delta X_{t})_{\ell}\neq 0},

with:

X_{t}^{\ell;x\leftarrow x^{\prime}}=\left((X_{t}^{x})_{1},\cdots,(X_{t}^{x})_{\ell},(X_{t}^{x^{\prime}})_{\ell+1},\cdots,(X_{t}^{x^{\prime}})_{d}\right).

From the Lipschitz property of \sigma in x, the process (\delta\sigma_{t})_{0\leqslant t\leqslant T} is bounded by a constant C only depending upon L in the assumption. Notice that in the notation \delta\sigma_{t}\delta X_{t}, the entry (\delta\sigma_{t}\delta X_{t})_{ij} appears as the inner product of ((\delta\sigma_{t})_{i,j,\ell})_{1\leq\ell\leq d} and ((\delta X_{t})_{\ell})_{1\leq\ell\leq d}. Because of the presence of the additional indices (i,j), we chose not to use the inner product notation in this definition. This warning being out of the way, we may use the inner product notation when convenient.
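To make the index bookkeeping concrete, the contraction (\delta\sigma_{t}\delta X_{t})_{i,j}=\sum_{\ell}(\delta\sigma_{t})_{i,j,\ell}(\delta X_{t})_{\ell} can be written as a single einsum over the extra index \ell; the numeric values below are purely illustrative:

```python
import numpy as np

d = 3
delta_sigma = np.arange(d * d * d, dtype=float).reshape(d, d, d)  # index (i, j, l)
delta_X = np.ones(d)                                              # index (l,)
# (delta_sigma delta_X)_{ij} = sum over l of delta_sigma[i, j, l] * delta_X[l]
M = np.einsum("ijl,l->ij", delta_sigma, delta_X)                  # d x d matrix
```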

Indeed, in a similar fashion, the pair (\delta Y_{t},\delta Z_{t})_{0\leqslant t\leqslant T} satisfies a backward equation of the form:

\delta Y_{t}=\delta g_{T}\cdot\delta X_{T}+\int_{t}^{T}\left(\delta F_{s}^{(1)}\cdot\delta X_{s}+\delta F_{s}^{(2)}\cdot\delta Z_{s}\right)ds-\int_{t}^{T}\delta Z_{s}\cdot dW_{s},\quad t\in[0,T], (26)

where \delta g_{T} is an \mathbb{R}^{d}-valued random variable bounded by C, and \delta\bm{F}^{(1)}=(\delta F_{t}^{(1)})_{0\leqslant t\leqslant T} and \delta\bm{F}^{(2)}=(\delta F_{t}^{(2)})_{0\leq t\leq T} are progressively measurable \mathbb{R}^{d}-valued processes, which are bounded, the bounds possibly depending upon the function \psi. Here, "\cdot" denotes the inner product of \mathbb{R}^{d}. Notice that, as a uniform bound on the growth of \delta\bm{F}^{(1)} and \delta\bm{F}^{(2)}, we have:

\begin{aligned}|\delta F_{t}^{(1)}|&\leqslant C\big(1+|Z_{t}^{x}|^{2}+|Z_{t}^{x^{\prime}}|^{2}\big),\\ |\delta F_{t}^{(2)}|&\leqslant C\big(1+|Z_{t}^{x}|+|Z_{t}^{x^{\prime}}|\big),\end{aligned}\quad t\in[0,T], (27)

the constant C only depending on the constant L appearing in the assumption, where we used the assumption \sup|\psi^{\prime}|\leqslant 2/R. Since \delta\bm{F}^{(2)} is bounded, we may introduce a probability measure \mathbb{Q} (again, this is not the same \mathbb{Q} as in the second step, but, since the two steps are completely independent, there is no risk of confusion), equivalent to \mathbb{P}, under which (W_{t}^{\mathbb{Q}}=W_{t}-\int_{0}^{t}\delta F_{s}^{(2)}ds)_{0\leqslant t\leqslant T} is a Brownian motion. Then,

|\mathbb{E}(\delta Y_{0})|=\left|\mathbb{E}^{\mathbb{Q}}(\delta Y_{0})\right|=\left|\mathbb{E}^{\mathbb{Q}}\left[\delta g_{T}\cdot\delta X_{T}+\int_{0}^{T}\delta F_{s}^{(1)}\cdot\delta X_{s}\,ds\right]\right|. (28)

In order to handle the above right-hand side, we need to investigate d\mathbb{Q}/d\mathbb{P}. This requires going back to Eq. (27) and to Eq. (23).

Fifth Step. The backward equation in Eq. (23) may be regarded as a BSDE satisfying assumption Quadratic BSDE, uniformly in R. By Ref. 1, the integral (\int_{0}^{t}Z_{s}^{x}\cdot dW_{s})_{0\leq t\leq T} is of bounded mean oscillation and its BMO norm is independent of x and R. Without any loss of generality, we may assume that it is less than C.

The same holds true if we replace Z_{s}^{x} by \delta F_{s}^{(2)} from Eq. (26), as |\delta F_{s}^{(2)}|\leqslant C(1+|Z_{s}^{x}|+|Z_{s}^{x^{\prime}}|). By Ref. 1, we deduce that there exists an exponent r>1, only depending on L and T, such that (allowing the constant C to increase from line to line):

\mathbb{E}\left[\left(\frac{d\mathbb{Q}}{d\mathbb{P}}\right)^{r}\right]\leqslant C.

Now Eq. (25) implies that, for any p\geqslant 1, there exists a constant C_{p}^{\prime}, independent of the cut-off function \psi, such that \mathbb{E}[\sup_{0\leqslant s\leqslant T}|\delta X_{s}|^{p}]^{1/p}\leqslant C_{p}^{\prime}|x-x^{\prime}|. Therefore, applying Hölder's inequality, Eq. (28) and the bound for the r-th moment of d\mathbb{Q}/d\mathbb{P}, we obtain:

\left|\mathbb{E}(\delta Y_{0})\right|\leqslant C|x-x^{\prime}|\left\{1+\mathbb{E}\left[\left(\int_{0}^{T}\left(|Z_{s}^{x}|^{2}+|Z_{s}^{x^{\prime}}|^{2}\right)ds\right)^{r^{\prime}}\right]^{1/r^{\prime}}\right\},

for some r^{\prime}>1. In order to estimate the right-hand side, we invoke Ref. 1 again. We deduce that:

\left|\mathbb{E}(\delta Y_{0})\right|\leqslant C^{\prime}|x-x^{\prime}|,

for a constant C^{\prime} that only depends upon L and T. This proves the required estimate for the Lipschitz constant of the decoupling field associated with the system (23).

Appendix D Error Analysis

The error due to the discretized MKV FBSDEs is inversely proportional to the number of temporal discretization steps N, i.e., O(\delta_{u})\sim O(\frac{1}{N}). Therefore, the denser the temporal discretization, the smaller the resulting discretization error. Meanwhile, the training loss can be expressed as the solution loss of the discretized MFGs, and the remaining error is caused by the parameterized neural network. The details of the error analysis follow.

D.1 Errors caused by discretization of MKV FBSDEs

For convenience, abbreviations will be used in the following error analysis: f(t,\cdot) will represent f(t,X_{t},L(X_{t}),\sigma^{-1\dagger}Z_{t}).

In the MKV FBSDEs, Y_{t} represents the value function u_{t}. To obtain the value function, we integrate the second equation in Eq. (4) in Section 2.1, that is,

u_{t}-u_{0}=-\int_{0}^{t}f(s,\cdot)ds+\int_{0}^{t}Z_{s}dW_{s},

and subtracting the equation at t=t_{n} from the equation at t=t_{n+1} gives:

u_{t_{n+1}}-u_{t_{n}}=-\int_{t_{n}}^{t_{n+1}}f(s,\cdot)ds+\int_{t_{n}}^{t_{n+1}}Z_{s}dW_{s},

which can be discretized by the Euler method, using f(t_{n},\cdot) to represent the average value over the interval, giving

u(x,t_{n+1})-u(x,t_{n})\approx-f(t_{n},\cdot)\Delta t_{n}+[\partial_{x}u(x,t_{n})]^{T}\sigma\Delta W_{n},

where

\Delta t_{n}=t_{n+1}-t_{n},\quad\Delta W_{n}=W_{t_{n+1}}-W_{t_{n}}.
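Concretely, one Euler step of this discretized backward equation can be sketched as follows. This is a minimal illustration under our own naming (euler_bsde_step, z_n standing in for $\partial_{x}u(x,t_{n})$), not the paper's implementation:

```python
import numpy as np

def euler_bsde_step(u_n, z_n, f_n, sigma, dt, rng):
    """One Euler step of the discretized backward equation:
    u(x, t_{n+1}) ~ u(x, t_n) - f(t_n, .) * dt + z_n^T sigma dW_n,
    where z_n plays the role of the gradient d_x u(x, t_n)."""
    dW = rng.normal(0.0, np.sqrt(dt), size=z_n.shape)  # Brownian increment with variance dt
    return u_n - f_n * dt + z_n @ sigma @ dW
```

In the network of the main text, z_n would be produced by the gradient network at step $n$; here it is simply passed in.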

The term $[\partial_{x}u(x,t_{n})]^{T}\sigma\Delta W_{n}$ introduces no additional discretization error, because the Brownian increment $\Delta W_{n}$ coincides exactly with the integral $\int_{t_{n}}^{t_{n+1}}dW_{s}$. The term $-f(t_{n},\cdot)\Delta t_{n}$, however, deviates from the true value $\int_{t_{n}}^{t_{n+1}}-f(s,\cdot)ds$. This error, denoted $\delta_{n}$, can be calculated as:

\delta_{n}=\left|-f(t_{n},\cdot)\Delta t_{n}-\int_{t_{n}}^{t_{n+1}}-f(s,\cdot)ds\right|=\left|\int_{t_{n}}^{t_{n+1}}f(s,\cdot)ds-f(t_{n},\cdot)\Delta t_{n}\right|.

By the first mean value theorem for definite integrals, there exists $t^{\prime}\in[t_{n},t_{n+1}]$ such that

\delta_{n}=\left|\left(f(t^{\prime},\cdot)-f(t_{n},\cdot)\right)\Delta t_{n}\right|\leqslant\left|f_{\max}-f_{\min}\right|\Delta t_{n}\leqslant|f^{\prime}|_{\max}(\Delta t_{n})^{2}.

Over the whole process, since $t$ is discretized into $N$ uniform parts, the total error (denoted $\delta_{u}$) can be bounded as

\delta_{u}\leqslant\sum_{n=1}^{N}\delta_{n}=\sum_{n=1}^{N}|f_{n}^{\prime}|_{\max}(\Delta t_{n})^{2}\leqslant|f^{\prime}|_{\max}(\Delta t_{n})^{2}\cdot\frac{T}{\Delta t_{n}}=|f^{\prime}|_{\max}\cdot\Delta t_{n}\cdot T,

resulting in $\delta_{u}\sim O(\Delta t_{n})=O(\frac{1}{N})$.

In summary, the error $\delta_{u}$ caused by discretization scales with the inverse of the number of discretization steps. Moreover, for any $\varepsilon>0$ and any bound $|f^{\prime}|_{\max}<M$, there exists $N\in\mathbb{N}^{*}$ with $N>\frac{MT^{2}}{\varepsilon}$ such that $\delta_{u}<\varepsilon$.
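This first-order rate can be checked numerically with a left-endpoint (Euler) quadrature on a simple driver. The helper below is illustrative only (the name euler_quadrature_error and the choice $f=\sin$ are ours):

```python
import numpy as np

def euler_quadrature_error(f, exact, T, N):
    """delta_u: gap between the integral of f over [0, T] and the
    left-endpoint Euler sum f(t_0)*dt + ... + f(t_{N-1})*dt,
    with N uniform steps of size dt = T/N."""
    t_left = np.linspace(0.0, T, N, endpoint=False)  # t_0, ..., t_{N-1}
    return abs(np.sum(f(t_left)) * (T / N) - exact)

# Doubling N roughly halves the error, consistent with delta_u ~ O(1/N):
errs = [euler_quadrature_error(np.sin, 1.0 - np.cos(1.0), 1.0, N)
        for N in (100, 200, 400)]
```

Each doubling of $N$ roughly halves the observed error, matching the bound $\delta_{u}\leqslant|f^{\prime}|_{\max}\cdot\Delta t_{n}\cdot T$.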

D.2 Errors caused by parameterized density flow and value function

In MFGs, the loss function of the entire system can be written as

\inf_{\alpha\in\mathbb{A}}J^{\mu}(\alpha)\quad\text{with}\quad J^{\mu}(\alpha)=\mathbb{E}\left[\int_{0}^{T}f(t,X^{\alpha}_{t},\mu_{t},\alpha_{t})dt+g(X_{T}^{\alpha},\mu_{T})\right].

For a fixed control $\alpha$, the optimization loss function becomes

\inf_{\mu}J^{\mu}=\mathbb{E}\left[\int_{0}^{T}f(t,X_{t},\mu_{t})dt+g(X_{T},\mu_{T})\right] (29)

This can be transformed into a problem over neural networks parameterized by $\mathbf{\Phi}$:

\inf_{\mu(\mathbf{\Phi})}J^{\mu}=\mathbb{E}\left[\int_{0}^{T}f(t,X_{t},\mu_{t})dt+g(X_{T},\mu_{T})\right] (30)
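The expectation in Eq.(30) can be approximated by a Monte Carlo average over sampled trajectories. The sketch below uses hypothetical names (mfg_cost, with $f$ and $g$ passed as callables) and a simple left-endpoint time sum, under our own conventions rather than the paper's code:

```python
import numpy as np

def mfg_cost(f, g, X, mu, dt):
    """Monte Carlo estimate of J^mu = E[ int_0^T f(t, X_t, mu_t) dt + g(X_T, mu_T) ].
    X:  array of shape (paths, N+1) with sampled state trajectories;
    mu: sequence of N+1 measure representations (e.g. empirical samples)."""
    n_steps = X.shape[1] - 1
    # left-endpoint Riemann sum over time, averaged over sample paths below
    running = sum(f(n * dt, X[:, n], mu[n]) for n in range(n_steps)) * dt
    return float(np.mean(running + g(X[:, -1], mu[-1])))
```

With a constant running cost of 1 and zero terminal cost, the estimate reduces to $T=N\cdot\Delta t$, which gives a quick consistency check.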

Thus, the loss $J$ consists of two parts: a process loss and a terminal loss.

Process loss. After discretizing the MKV FBSDEs and constructing the neural network accordingly, the process loss changes from an integral to a cumulative sum accumulated across the network's time steps; it is denoted $l_{MKV}$.

Terminal loss. When iteratively solving for the NF network with the value-function network fixed, the terminal loss is computed by sampling from $\mu_{T}$ and evaluating the terminal-value function $g$ on the samples; it is denoted $l_{T}$.

Thus the sum $l_{MKV}+l_{T}$ represents the total loss $J$ of the MFGs.

Regularization term loss. The HJB-FPK equations underlying the MFGs must still be satisfied throughout the solution process, so $l_{HJB}$ is added as a regularization term.

In summary, the sum of the three loss terms during training can be expressed as the solution loss of the discretized MFGs; since the error due to discretization has already been analyzed, the remaining error is due solely to the parameterized neural network.
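Putting the three terms together, the training objective could be assembled as in the following sketch. The residual inputs, the mean-squared reduction, and the weight lam are illustrative placeholders, not the paper's exact implementation:

```python
import numpy as np

def training_loss(step_residuals, terminal_residuals, hjb_residuals, lam=1.0):
    """Total loss l_MKV + l_T + lam * l_HJB from per-sample residuals.
    step_residuals:     shape (batch, N), one residual per Euler step;
    terminal_residuals: shape (batch,),  mismatch against g under mu_T samples;
    hjb_residuals:      shape (batch,),  HJB-FPK regularizer residuals."""
    l_mkv = float(np.mean(np.sum(step_residuals**2, axis=1)))  # cumulative sum over the N steps
    l_t = float(np.mean(terminal_residuals**2))
    l_hjb = float(np.mean(hjb_residuals**2))
    return l_mkv + l_t + lam * l_hjb
```

In a training loop, the three residual arrays would come from the discretized MKV FBSDE rollout, the terminal sampling, and the HJB-FPK residual evaluation, respectively.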