NF-MKV Net: A Constraint-Preserving Neural Network Approach to Solving Mean-Field Games Equilibrium
Abstract
Neural network-based methods for solving Mean-Field Games (MFGs) equilibria have garnered significant attention for their effectiveness in high-dimensional problems. However, many algorithms struggle with ensuring that the evolution of the density distribution adheres to the required mathematical constraints. This paper investigates a neural network approach to solving MFGs equilibria through a stochastic process perspective. It integrates process-regularized Normalizing Flow (NF) frameworks with state-policy-connected time-series neural networks to address McKean-Vlasov-type Forward-Backward Stochastic Differential Equation (MKV FBSDE) fixed-point problems, equivalent to MFGs equilibria. First, we reformulate MFGs equilibria as MKV FBSDEs, embedding the density distribution into the equation coefficients within a probabilistic framework. Neural networks are then designed to approximate value functions and gradients derived from these equations. Second, we employ NF architectures, a class of generative neural network models, and impose loss constraints on each density transfer function to ensure volumetric invariance and time continuity. Additionally, this paper presents theoretical proofs of the algorithm’s validity and demonstrates its applicability across diverse scenarios, highlighting its superior effectiveness compared to existing methods.
1 Introduction
Mean-Field Games (MFGs), introduced independently by Lasry & Lions (2007) and Huang et al. (2006), provide a robust framework for addressing large-scale multi-agent problems. MFGs are widely applied in domains such as autonomous driving, social networks, crowd management, and power systems.
Neural network-based algorithms have recently been employed to solve MFGs equations due to their ability to handle high-dimensional problems effectively. For example, Lin et al. (2021) reformulated MFGs as a generative adversarial network (GAN) training problem, and Ruthotto et al. (2020) introduced a Lagrangian-based approach to approximate agent states through sampling. Additionally, Chen et al. (2023) applied reinforcement learning and neural networks to model distributions and address value functions. Most approaches focus on optimizing the loss term in MFGs coupled equations using sampled agents, but they often neglect density dynamics, leading to challenges in representing continuous state density distributions.
Carmona et al. (2018) introduced a stochastic process perspective on MFGs, leveraging McKean-Vlasov Forward-Backward Stochastic Differential Equations (MKV FBSDEs) to address MFGs equilibria, and explored numerical methods for solving them. Achdou & Lauriere (2015) proposed simplified MFGs models for pedestrian dynamics and demonstrated them with numerical simulations. Ren et al. (2024) studied multi-group MFGs by solving asymmetric Riccati differential equations and established sufficient conditions for the existence and uniqueness of optimal solutions. However, existing MKV FBSDE methods are often limited to linear-quadratic MFGs, where distributions are simplified to the expectation of agents’ states, rather than full distribution functions, in nonlinear settings. Recently, Huang et al. (2023) proposed a data-driven Normalizing Flow (NF) approach to solve distribution-involved optimal transport stochastic problems. Nevertheless, constraints from MFGs process dynamics, terminal loss, and equation coupling limit the applicability of NF frameworks to high-dimensional MKV FBSDEs.
In summary, solving the MFGs equilibrium reduces to addressing equivalent stochastic fixed-point problems that incorporate distribution flows. NF-MKV Net is proposed to solve the MKV FBSDEs problem by coupling the process-regularized NF and state-policy-connected time series neural networks. The enhanced NF framework models flows of probability measures, constructing a density distribution flow that satisfies volumetric invariance at each time step. State-policy-connected time series neural networks, grounded in MKV FBSDEs, model relationships between time-step value functions and approximate their gradients, enabling solutions in a time-consistent manner. Using the coupled frameworks, the fixed-point distribution flow equivalent to the MFGs equilibrium can be determined while ensuring mathematical constraints are satisfied.
Contributions: The main contributions and results are summarized as follows:
• NF-MKV Net is proposed to solve MKV FBSDEs, which are equivalent to MFGs equilibrium, from a stochastic process perspective. By integrating process-regularized NFs and state-policy-connected time series neural networks into a coupled framework with alternating training, the method adheres to volumetric invariance and time-continuity constraints.
• Process-regularized NF frameworks are designed to model probability measure flows by enforcing loss constraints on each density transfer function, ensuring volumetric invariance at each time step.
• State-policy-connected time series neural networks, built upon MKV FBSDEs, capture the relationships between time-step value functions and approximate their gradients, enabling time-consistent solutions.
• The method demonstrates applicability in traffic flow, low- and high-dimensional crowd motion, and obstacle avoidance problems. Additionally, it satisfies mathematical constraints better than existing neural network-based approaches.
2 Connections among MFG, MKV, and NF
2.1 MFGs & McKean-Vlasov FBSDE
We now formalize the MFGs problem without considering common noise. For this purpose, we start with a complete filtered probability space (Ω, F, (F_t)_{0≤t≤T}, P), with the filtration supporting a d-dimensional Wiener process W = (W_t)_{0≤t≤T} with respect to (F_t)_{0≤t≤T}, and an initial condition ξ ∈ L²(Ω, F_0, P; R^d). This MFGs problem can be described as:
(i) For each fixed deterministic flow μ = (μ_t)_{0≤t≤T} of probability measures on R^d, solve the standard stochastic control problem:
inf_α J^μ(α) = inf_α E[ ∫_0^T f(X_t, μ_t, α_t) dt + g(X_T, μ_T) ]   (1)
subject to
dX_t = b(t, X_t, μ_t, α_t) dt + σ(t, X_t, μ_t) dW_t,   X_0 = ξ   (2)
(ii) Find a flow μ = (μ_t)_{0≤t≤T} such that μ_t = Law(X_t) for all t ∈ [0, T], if X = (X_t)_{0≤t≤T} is a solution of the above optimal control problem.
We can see that the first step provides the best response of a given player interacting with the statistical distribution of the states of the other players when this statistical distribution is assumed to be given by (μ_t)_{0≤t≤T}. The second step then solves a specific fixed-point problem, in the spirit of the search for fixed points of the best-response function.
Usually, the solution of MFGs is transformed into a set of coupled partial differential equations, namely the HJB-FPK equations, which respectively describe the evolution of the value function of the representative element and the density evolution of the group, as shown below:
(HJB)   -∂_t u(t, x) - (σ²/2) Δu(t, x) + H(x, ∇_x u(t, x)) = f(x, μ_t),   u(T, x) = g(x, μ_T)   (3)
(FPK)   ∂_t μ_t(x) - (σ²/2) Δμ_t(x) - div( μ_t(x) ∂_p H(x, ∇_x u(t, x)) ) = 0,   μ_0 given
in which u is the value function that guides the agents' decisions; H is the Hamiltonian, which describes the physical energy of the system; μ_t is the distribution of agents at time t; f denotes the loss during the process; and g is the terminal condition, guiding the agents toward the final distribution.
Let assumption MFGs Solvability HJB (as shown in Appendix A.1) be in force. Then, for any initial condition ξ ∈ L²(Ω, F_0, P; R^d), the McKean-Vlasov FBSDEs:
dX_t = b(t, X_t, μ_t, α*_t) dt + σ(t, X_t, μ_t) dW_t,
dY_t = - f(X_t, μ_t, α*_t) dt + Z_t · dW_t,   (4)
α*_t = α*(t, X_t, μ_t, Z_t),   μ_t = Law(X_t)
for t ∈ [0, T], with Y_T = g(X_T, μ_T) as terminal condition, is solvable. Moreover, the flow (μ_t)_{0≤t≤T} given by the marginal distributions of the forward component of any solution is an equilibrium of the MFGs problem associated with the stochastic control problem Eq.(1).
We consider the optimal control step of the MFGs formulation described earlier. The probabilistic approach proceeds in two stages: first, the flow of probability measures is held deterministic and fixed while the optimal response control is sought; then, given the resulting control, the optimal flow of probability measures is solved. By alternating between these two steps, the MFGs equilibrium is finally obtained.
2.2 MFGs & NF
Normalizing Flows (NF), introduced by Tabak & Vanden-Eijnden (2010), enable exact computation of data likelihood through a sequence of invertible mappings. A key feature of NF is its use of arbitrary bijective functions, achieved through stacked reversible transformations. The flow model consists of a sequence of reversible flows, expressed as f = f_K ∘ ⋯ ∘ f_1, where each f_k has a tractable inverse and Jacobian determinant.
Our algorithm leverages the volume-preserving property of NF, aligning with the consistency of density flow in MFGs during evolution. This principle is essential for constructing the density flow in the MFGs model.
The connection between MFGs and NF provides inherent advantages. For example, in MFGs, the initial distribution μ_0 is often represented in a simple analytical form. This parallels NF’s approach, where a simple initial distribution transforms into a more complex one for density estimation. Additionally, one advantage of NF over other generative models is its preservation of total density during transformation, consistent with the MFGs requirement that ∫ μ_t(x) dx = 1 for all t. A challenge, however, is that MFGs, unlike Optimal Transport (OT), lack both initial and terminal density distributions. In MFGs, only an initial distribution exists, and the terminal condition is governed by the terminal value function g. This complicates framing the problem as a complete density evolution problem.
To address this, we idealize the MFGs model with the assumption that the terminal value function corresponds to an explicitly solvable optimal terminal density. For example, in trajectory planning problems, the terminal value function is often related to the destination, such as . In such cases, we assume the optimal terminal distribution is , enabling the MFGs problem to be framed with initial and terminal densities. We will show that this assumption is reasonable.
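To make the volume-preservation argument concrete, the following minimal NumPy sketch (our own illustration, not the paper's implementation) pushes a Gaussian density through a single invertible affine map and checks, via the change-of-variables formula, that the transported density still integrates to one; the map f(x) = a*x + b, the Gaussian initial density, and the grid bounds are arbitrary choices for illustration.

```python
import numpy as np

# Initial density mu_0: a standard Gaussian, discretized on a fine grid.
x = np.linspace(-10.0, 10.0, 20001)
rho0 = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# One invertible affine layer f(x) = a*x + b, standing in for a single NF transfer function.
a, b = 2.0, 1.0
f_inv = lambda y: (y - b) / a          # inverse map f^{-1}
log_det_inv = -np.log(abs(a))          # log |d f^{-1} / dy|

# Change of variables: rho1(y) = rho0(f^{-1}(y)) * |d f^{-1}/dy|.
y = np.linspace(-10.0, 10.0, 20001)
rho1 = np.exp(-0.5 * f_inv(y) ** 2) / np.sqrt(2.0 * np.pi) * np.exp(log_det_inv)

print("total mass before the layer:", np.trapz(rho0, x))  # ~ 1
print("total mass after the layer :", np.trapz(rho1, y))  # ~ 1 (up to grid truncation)
```

The same bookkeeping, applied layer by layer, is what allows an NF to represent an entire flow of probability measures with constant total mass.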
3 Methodology: NF-MKV Net
We propose NF-MKV Net, an alternately trained model combining NF and McKean-Vlasov Forward-Backward Stochastic Differential Equations (MKV FBSDEs), to address MFGs equilibrium problems. The main advantage of MKV FBSDEs is their ability to capture both optimization and interaction components in a single coupled FBSDE, eliminating the need for separate references to Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations. NF generates flows represented by neural networks, constraining each density transfer function to define density distributions at specific times. The density flow from NF couples with the value function of the MFG. The value function constrains the neural network generating the flow via the HJB equation, while its gradient update depends on the current marginal distribution flow.
First, we reformulate the stochastic equations of MFGs using MKV FBSDEs and approximate value function gradients with neural networks, effectively addressing the curse of dimensionality in traditional numerical methods. Second, to address the distribution-coupled challenge, we use NF architectures to model agents’ state density distributions, alternately training the unknown transition processes with value functions. Figure (1) illustrates the framework of NF-MKV Net.
Figure 1: The framework of NF-MKV Net.
3.1 Modeling Value Function with MKV FBSDEs
We consider a general class of MFGs problems associated with the stochastic control problem (1); the corresponding FBSDEs can be represented as in (4), with initial condition X_0 = ξ and terminal condition Y_T = g(X_T, μ_T).
Then the solution of Eq.(3) satisfies the following FBSDE:
u(t, X_t) = u(0, X_0) - ∫_0^t f(X_s, μ_s, α_s) ds + ∫_0^t ∇_x u(s, X_s)^T σ(s, X_s, μ_s) dW_s   (5)
We apply a temporal discretization to Eq.(4). Given a partition 0 = t_0 < t_1 < ⋯ < t_N = T of the time interval, we consider the simple Euler scheme, for n = 0, 1, …, N - 1,
X_{t_{n+1}} ≈ X_{t_n} + b(t_n, X_{t_n}, μ_{t_n}, α_{t_n}) Δt_n + σ(t_n, X_{t_n}, μ_{t_n}) ΔW_n   (6)
and
u(t_{n+1}, X_{t_{n+1}}) ≈ u(t_n, X_{t_n}) - f(X_{t_n}, μ_{t_n}, α_{t_n}) Δt_n + ∇_x u(t_n, X_{t_n})^T σ(t_n, X_{t_n}, μ_{t_n}) ΔW_n   (7)
where
Δt_n = t_{n+1} - t_n,   ΔW_n = W_{t_{n+1}} - W_{t_n}   (8)
The key to modeling the above FBSDEs is to approximate the value function at the initial time, u(t_0, X_{t_0}), while approximating the adjoint term at each time step t_n through a multi-layer feedforward neural network
Z_{t_n} = σ(t_n, X_{t_n}, μ_{t_n})^T ∇_x u(t_n, X_{t_n}) ≈ N_{θ_n}(X_{t_n})   (9)
which represents the adjoint variable of the value function of the representative agent, expressed as the product of the diffusion coefficient and the gradient of the value function in the MFGs optimization problem.
Subsequently, all value functions are connected by summing Eq.(7) over n = 0, …, N - 1. The network uses the generated density flow (μ_{t_n})_n and the simulated states (X_{t_n})_n as inputs and produces the final output û(t_N, X_{t_N}), approximating the terminal condition g(X_{t_N}, μ_{t_N}). This approximation defines the expected loss function, which compares the likelihoods of the two quantities to minimize their difference over the parameters θ:
(10)
In summary, we reformulate MFGs as MKV FBSDEs (Eq.(5)) and discretize time to establish the relationship between value functions at adjacent time steps (Eq.(7)), connecting them via the adjoint variable Z. Next, we parameterize the initial value function and the adjoint variables, using these relationships to link the final output û(t_N, X_{t_N}) with the terminal condition g(X_{t_N}, μ_{t_N}) for maximum likelihood estimation (Eq.(10)), minimizing the final MFGs loss.
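A minimal PyTorch sketch of this time-stepping construction is given below, under simplifying assumptions introduced purely for illustration: state dimension d = 2, zero drift b, constant scalar diffusion σ, quadratic running and terminal costs, a frozen density flow summarized only by its per-step means, and a squared terminal mismatch standing in for the likelihood-based comparison of Eq.(10). The subnetworks `ZNet`, the learnable scalar `y0`, and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

d, N, T, sigma = 2, 20, 1.0, 0.3        # illustrative dimensions and coefficients
dt = T / N

class ZNet(nn.Module):
    """One feedforward subnetwork per time step, approximating Z_{t_n} ~ sigma^T grad_x u(t_n, .)."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))

    def forward(self, x):
        return self.net(x)

z_nets = nn.ModuleList([ZNet(d) for _ in range(N)])
y0 = nn.Parameter(torch.zeros(1))       # u(t_0, X_{t_0}), treated as a learnable scalar
opt = torch.optim.Adam(list(z_nets.parameters()) + [y0], lr=1e-3)

# Frozen density flow, summarized here by per-step means only (placeholder for mu_{t_n}).
mu_means = [torch.zeros(d) for _ in range(N + 1)]

def f_cost(x, mu_mean):                 # illustrative running cost f(x, mu)
    return 0.5 * ((x - mu_mean) ** 2).sum(dim=1)

def g_term(x, mu_mean):                 # illustrative terminal cost g(x, mu_T)
    return ((x - mu_mean) ** 2).sum(dim=1)

for _ in range(200):
    batch = 256
    x = torch.zeros(batch, d)           # X_{t_0} = 0 for simplicity
    y = y0.expand(batch)                # running approximation of u(t_n, X_{t_n})
    for n in range(N):
        z = z_nets[n](x)
        dW = torch.randn(batch, d) * dt ** 0.5
        # Eq.(7): u(t_{n+1}, X_{t_{n+1}}) ~ u(t_n, X_{t_n}) - f dt + Z . dW
        y = y - f_cost(x, mu_means[n]) * dt + (z * dW).sum(dim=1)
        # Eq.(6) with b = 0: X_{t_{n+1}} ~ X_{t_n} + sigma dW
        x = x + sigma * dW
    loss = ((y - g_term(x, mu_means[N])) ** 2).mean()  # terminal mismatch, a stand-in for Eq.(10)
    opt.zero_grad()
    loss.backward()
    opt.step()
```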
3.2 Modeling density distribution with NF
Typically, NF methods prioritize density estimation results. In contrast, our approach emphasizes the NF density evolution process, constraining each layer to align with the density evolution in MFGs.
In an NF model, if f_1, …, f_K are differentiable and invertible functions, the transformation is usually expressed as:
log p_K(z_K) = log p_0(z_0) - Σ_{k=1}^{K} log | det( ∂f_k / ∂z_{k-1} ) |,   z_0 = x,  z_k = f_k(z_{k-1})   (11)
To simplify the explanation, we consider a one-dimensional model. During training, each f_k is represented as a neural network f_{φ_k}. Multiple f_{φ_k} are composed to obtain the desired function f_φ = f_{φ_K} ∘ ⋯ ∘ f_{φ_1}. Typically, training minimizes the negative log-likelihood loss between the final estimated density and the dataset, expressed as
L_NLL = - E_{x~data}[ log p_0(x_0) + Σ_{k=1}^{K} log | det( ∂f_k^{-1}(x_k) / ∂x_k ) | ],   x_K = x,  x_{k-1} = f_k^{-1}(x_k)   (12)
in which there is no need to consider the loss of each intermediate f_{φ_k} in the process.
Discretizing the NF construction process reveals that each function f_k corresponds to one Euler time step. Thus, each sub-function f_k transforms the group density μ_{t_k} in MFGs into μ_{t_{k+1}}. This series of reversible flows represents the time evolution process, which results in
μ_{t_{k+1}} = (f_k)_# μ_{t_k},  i.e.  ρ_{t_{k+1}}(x) = ρ_{t_k}( f_k^{-1}(x) ) | det( ∂f_k^{-1}(x) / ∂x ) |,  so that ∫ ρ_{t_{k+1}}(x) dx = ∫ ρ_{t_k}(x) dx = 1   (13)
Additionally, as each sub-function is implemented as a neural network, the loss at each time step can be used to constrain and optimize the sub-functions. The density can be expressed as ρ_{t_k}(x) = ρ_0(g_k(x)) | det( ∂g_k(x) / ∂x ) |, where g_k = (f_k ∘ ⋯ ∘ f_1)^{-1}.
Our way of modeling this constrained flow is to approximate the crowd density of each layer,
ρ_{t_k} ≈ ρ^φ_{t_k} = ( f_{φ_k} ∘ ⋯ ∘ f_{φ_1} )_# ρ_0   (14)
at each time step through an NF model consisting of multiple layers of Masked Autoregressive Flow (MAF) and Permute, parameterized by φ.
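As a deliberately simplified stand-in for the MAF-plus-Permute stack, the sketch below composes hand-written affine-coupling layers, one per Euler step, so that the density after k layers plays the role of μ_{t_k}; the 2-D state space, the Gaussian base density ρ_0, and the layer sizes are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One invertible affine-coupling layer: a hand-rolled stand-in for a MAF + Permute block."""
    def __init__(self, flip):
        super().__init__()
        self.flip = flip  # alternate which of the two coordinates is transformed
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))  # outputs (log-scale, shift)

    def _split(self, v):
        return (v[:, 1:], v[:, :1]) if self.flip else (v[:, :1], v[:, 1:])

    def _merge(self, cond, trans):
        return torch.cat([trans, cond], 1) if self.flip else torch.cat([cond, trans], 1)

    def forward(self, x):      # x -> y, returns (y, log|det df/dx|)
        cond, trans = self._split(x)
        s, t = self.net(cond).chunk(2, dim=1)
        return self._merge(cond, trans * torch.exp(s) + t), s.squeeze(1)

    def inverse(self, y):      # y -> x, returns (x, log|det d f^{-1}/dy|)
        cond, trans = self._split(y)
        s, t = self.net(cond).chunk(2, dim=1)
        return self._merge(cond, (trans - t) * torch.exp(-s)), -s.squeeze(1)

K = 20                                               # one layer per Euler time step
layers = nn.ModuleList([Coupling(flip=(k % 2 == 1)) for k in range(K)])
base = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))  # rho_0 = mu_0

def log_density_at_step(k, x):
    """log mu_{t_k}(x): pull x back through layers k, ..., 1 and accumulate log-determinants."""
    logdet = torch.zeros(x.shape[0])
    for layer in reversed(list(layers[:k])):
        x, ld = layer.inverse(x)
        logdet = logdet + ld
    return base.log_prob(x) + logdet

def sample_at_step(k, n):
    """Samples from mu_{t_k}: push base samples through the first k layers."""
    x = base.sample((n,))
    for layer in layers[:k]:
        x, _ = layer(x)
    return x
```

Because each layer exposes both `forward` and `inverse` together with its log-determinant, the density and samples of every intermediate μ_{t_k} are available, which is exactly what the per-time-step constraints of this section require.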
To train the NF, the first step is computing the process loss ℓ_HJB. At each time step t_n, the MFGs system satisfies the HJB equation. Section 3.1 describes the value function and its gradient. Thus, samples drawn from μ_{t_n} at each time step can be used in the HJB equation to compute the loss,
ℓ_HJB(φ) = Σ_{n=0}^{N-1} (1/M) Σ_{m=1}^{M} | R(t_n, x_n^m) |²,   x_n^m ~ μ_{t_n}^φ   (15)
where
R(t, x) = -∂_t u(t, x) - (σ²/2) Δu(t, x) + H(x, ∇_x u(t, x)) - f(x, μ_t^φ)   (16)
Additionally, the NF method must match the terminal density condition, so the terminal loss is also included in the loss calculation. If the terminal condition is explicitly defined, the corresponding optimal density μ_T^* can serve as the target distribution for NF. The negative log-likelihood between μ_T^* and the terminal density ρ_T generated by NF is used to compute the terminal loss ℓ_T. For samples x^m ~ μ_T^*:
ℓ_T(φ) = -(1/M) Σ_{m=1}^{M} log ρ_T^φ(x^m),   x^m ~ μ_T^*   (17)
In summary, NF, as a generative model, can construct intermediate function compositions and distributions without direct data use, relying solely on the initial distribution μ_0 and the terminal distribution μ_T^*, while preserving density consistency. We first compute the terminal density distribution μ_T^* explicitly and construct an NF to transition from μ_0 to μ_T^*. The losses ℓ_HJB (Eq. 15) and ℓ_T (Eq. 17) constrain the NF evolution, ensuring the flow density aligns with the control objectives.
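Continuing the sketch above, the two training signals could be assembled as follows; here the value function is represented directly as a network `u_net(t, x)` for convenience (rather than through the time-stepping parameterization of Section 3.1), the Hamiltonian H(x, p) = 0.5|p|², the diffusion coefficient `nu`, and the running cost are illustrative placeholders, and `log_density_fn` is any callable returning the NF terminal log-density, such as `log_density_at_step` from the previous sketch. The residual is evaluated on detached samples here, so in this simplified form it directly trains only the value network; tying it back to the flow parameters, as the method requires, needs the samples to keep their dependence on the NF.

```python
import torch
import torch.nn as nn

nu = 0.05                                             # illustrative diffusion coefficient
u_net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))  # u(t, x) with x in R^2

def hjb_residual(t, x, mu_mean):
    """R(t, x) = -du/dt - nu*Laplacian(u) + 0.5*|grad_x u|^2 - f(x, mu); every coefficient is illustrative."""
    tx = torch.cat([t, x.detach()], dim=1).requires_grad_(True)
    u = u_net(tx)
    grad = torch.autograd.grad(u.sum(), tx, create_graph=True)[0]
    du_dt, du_dx = grad[:, :1], grad[:, 1:]
    lap = 0.0
    for i in range(du_dx.shape[1]):                   # Laplacian: one extra autograd pass per spatial dimension
        lap = lap + torch.autograd.grad(du_dx[:, i].sum(), tx, create_graph=True)[0][:, 1 + i]
    f_run = 0.5 * ((tx[:, 1:] - mu_mean) ** 2).sum(dim=1)   # placeholder running cost f(x, mu_t)
    return -du_dt.squeeze(1) - nu * lap + 0.5 * (du_dx ** 2).sum(dim=1) - f_run

def process_loss(samples_per_step, times):
    """Eq.(15): mean squared HJB residual over flow samples at each time step."""
    loss = 0.0
    for t_n, x_n in zip(times, samples_per_step):
        t_col = torch.full((x_n.shape[0], 1), float(t_n))
        loss = loss + (hjb_residual(t_col, x_n, x_n.mean(dim=0).detach()) ** 2).mean()
    return loss

def terminal_loss(log_density_fn, target_samples):
    """Eq.(17): negative log-likelihood of samples from mu_T^* under the NF terminal density."""
    return -log_density_fn(target_samples).mean()
```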
3.3 Coupling two processes
The two processes can be coupled and trained alternately. As NF is a generative model, it can first generate a set of flow density evolution functions along with the corresponding density distributions at each time step. This generated set of density distributions is fixed as the marginal distribution flow to optimize the value function and its gradient under the MKV FBSDE framework. Once the optimal value function for this marginal distribution flow is obtained, it is fixed in turn to update each transfer function f_k and its corresponding density μ_{t_k} in the NF evolution process. This continues until the optimal density distribution flow under the current value function is achieved, and the iterative coupled training repeats until convergence. Algorithm (1) presents the pseudo-code of the model.
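A minimal skeleton of this alternating scheme is sketched below; `train_value_step` and `train_flow_step` are hypothetical callbacks standing in for one optimization pass of Section 3.1 (with the flow frozen) and of Section 3.2 (with the value networks frozen), and the convergence test on the flow objective is a simplification of the fixed-point check.

```python
from typing import Any, Callable

def alternating_training(
    train_value_step: Callable[[Any], float],   # one pass of the MKV FBSDE step (flow frozen)
    train_flow_step: Callable[[Any], float],    # one pass of the NF step (value networks frozen)
    flow_state: Any,
    value_state: Any,
    outer_iters: int = 50,
    inner_iters: int = 100,
    tol: float = 1e-4,
) -> None:
    """Alternate between the two coupled sub-problems until the flow objective stops improving."""
    prev_flow_loss = float("inf")
    for _ in range(outer_iters):
        # 1) Freeze the density flow; fit the value function and its gradients.
        for _ in range(inner_iters):
            train_value_step(flow_state)
        # 2) Freeze the value networks; update the NF transfer functions and per-step densities.
        flow_loss = prev_flow_loss
        for _ in range(inner_iters):
            flow_loss = train_flow_step(value_state)
        # Crude fixed-point check: stop when the flow objective no longer changes.
        if abs(prev_flow_loss - flow_loss) < tol:
            break
        prev_flow_loss = flow_loss
```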
4 Numerical Experiment
We apply NF-MKV Net to MFGs instances and present the numerical results in two parts. The first part demonstrates NF-MKV Net as an effective method for solving MFGs equilibrium involving density distributions. The second part highlights the accuracy of NF-MKV Net in comparison to other algorithms.
4.1 Solving MFGs with NF-MKV Net
This section presents three examples of solving MFGs using NF-MKV Net, demonstrating its applicability to traffic flow problems, low- and high-dimensional crowd motion problems, and scenarios with obstacles.
4.1.1 Example 1: MFGs Traffic Flow Control
A series of numerical experiments on MFGs traffic flow control explores the dynamics of MFGs, focusing on autonomous vehicles (AVs) navigating a circular road network. The traffic flow scenario is formulated as an MFGs problem involving the density distribution and the value function.
The initial density is defined on the ring road, where the state represents the AVs’ position. The state transfer function is , and the process constraint is . The Hamiltonian is defined as , leading to the optimal control . In the finite time domain problem, the terminal value function of the AVs system is constrained at . It is assumed that AVs have no preference for their locations at time , i.e., . In the MFGs traffic flow problem, the terminal value function can be solved explicitly as , satisfying the model’s assumptions.
Without loss of generality, we define the time interval as and set the initial density at . To verify the volumetric invariance of the density distribution discussed in our study, we selected initial density functions satisfying . Four different initial densities were selected, each with a distinct diffusion coefficient for the Wiener process. NF-MKV Net was then employed to solve for equilibrium, verifying the proposed algorithm’s applicability.
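The exact initial densities used in the experiment are not reproduced above; as a stand-in, the snippet below builds several periodic densities on the unit ring (wrapped Gaussians of different widths, an assumption for illustration) and verifies numerically that each integrates to one, which is the volumetric-invariance property the experiment relies on.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)              # positions on the unit ring road

def wrapped_gaussian(center, std, n_wraps=10):
    """A periodic density on [0, 1): a Gaussian summed over integer shifts."""
    rho = np.zeros_like(x)
    for k in range(-n_wraps, n_wraps + 1):
        rho += np.exp(-0.5 * ((x - center - k) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))
    return rho

for std in (0.05, 0.1, 0.2, 0.3):            # four illustrative initial spreads
    rho0 = wrapped_gaussian(center=0.5, std=std)
    print(f"std={std}: integral over the ring = {np.trapz(rho0, x):.6f}")   # each ~ 1
```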
Results in Fig.(2) indicate that the agent distribution, regardless of initial density or drift term , converges to the equilibrium . Comparing NF-MKV Net with the numerical solution shows errors below , demonstrating the algorithm’s effectiveness in solving traffic flow problems. The results illustrate the evolution of the density distribution over time and the errors compared to the noise-free numerical method.
Figure 2: Evolution of the density distribution over time in the traffic flow problem and errors compared with the noise-free numerical method.
4.1.2 Example 2: MFGs Crowd Motion
In this example, a dynamically formulated MFGs problem, the Crowd Motion problem, is constructed in dimensions 2 and 50 to demonstrate the applicability of NF-MKV Net. We set the problems as in Eq.(3) with parameters:
(18)
2-Dimensional Crowd Motion. Here, the time horizon is discretized into 20 time steps in the dynamics process, and the initial distribution is set as . To reach the goal point, we set , which means the terminal condition is . The terminal distribution is set so as to minimize the terminal condition. With these settings, NF-MKV Net trains the MFGs model associated with the dynamic system.
Figure 3: Trajectories for the 2-dimensional crowd motion problem.
50-Dimensional Crowd Motion. Similar to the 2-dimensional case, the high-dimensional setting adopts the same formulation as in Eq.(3) and Eq.(18). In contrast, the high-dimensional case handles agent states and controls in R^50, along with density distributions on R^50. Our initial density is a Gaussian centered at , and the terminal condition is . Also, the optimal terminal density distribution can be written as . With these settings, NF-MKV Net trains the MFGs model associated with the dynamic system. Results are visualized in the first two dimensions by summing projections from higher dimensions onto these two dimensions.
Figure 4: Trajectories for the 50-dimensional crowd motion problem, projected onto the first two dimensions.
The trajectories of points are shown in Fig. (3) for the 2-dimensional case and Fig. (4) for the 50-dimensional case. NF-MKV Net effectively transforms the initial Gaussian density into the terminal condition along a nearly straight trajectory, while ensuring crowd deformation and inter-group collision avoidance. This behavior remains consistent as the dimensionality increases.
4.1.3 Example 3: MFGs Crowd Motion with obstacle
This experiment considers an MFGs problem with complex process interaction costs. Following the general setting in Eq.(3) and Eq.(18), we change the process interaction costs as
(19)
We set with 20 time steps in the dynamics process and set initial distribution as . To reach the goal point, we set which means the terminal condition is . The terminal distribution should be to minimize the terminal condition. During the dynamics, the system optimizes the process loss defined in Eq. (19). The problem involves transforming the initial Gaussian density to a new location while minimizing terminal conditions, avoiding congestion, and bypassing an obstacle at with a safety radius . With these settings, NF-MKV Net successfully trains the MFGs model associated with the dynamics system.
Figure 5: Trajectories for the crowd motion problem with an obstacle.
The results, shown in Fig. (5), demonstrate that NF-MKV Net successfully transforms the initial Gaussian density into the desired terminal condition along an optimized trajectory, ensuring crowd deformation, inter-group collision avoidance, and obstacle avoidance.
4.2 Comparison with other Methods
To verify volumetric invariance and time continuity, we compare NF-MKV Net with existing MFGs solving methods, including the distribution-based RL-PIDL method proposed by Chen et al. (2023) and the high-dimensional neural network-based APAC-Net by Lin et al. (2021).
Distribution volumetric-invariance. We implement an approximate integral over the dynamics region, a technique widely used in density estimation, e.g., by O’Brien et al. (2016). By generating a grid over a specified area, the numerical integral of the probability distribution over that area is computed; the returned value should be close to 1, which verifies the validity of the distribution. Since the approximate integral is a grid-based method, it can only be used in low-dimensional problems. Therefore, when approximating the integral in the high-dimensional crowd motion case, we use the same procedure as when visualizing the high-dimensional trajectories above: a projection-like method that accumulates the density distribution over the remaining components onto the first two components and estimates the density over a 2-dimensional region.
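A minimal version of this grid check, assuming a 2-D region [-5, 5]² and an arbitrary vectorized density evaluator `density_fn`, could look like the following.

```python
import numpy as np

def grid_integral_2d(density_fn, lo=-5.0, hi=5.0, n=400):
    """Numerically integrate a 2-D density over [lo, hi]^2 on an n x n grid; the result should be ~1."""
    xs = np.linspace(lo, hi, n)
    ys = np.linspace(lo, hi, n)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)
    vals = density_fn(pts).reshape(n, n)
    return np.trapz(np.trapz(vals, ys, axis=1), xs, axis=0)

# Sanity check with a standard 2-D Gaussian, whose mass on a large box is ~1.
standard_gaussian = lambda p: np.exp(-0.5 * (p ** 2).sum(axis=1)) / (2.0 * np.pi)
print(grid_integral_2d(standard_gaussian))   # ~ 1.0
```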
Agents’ states time-continuity. The Wasserstein distance is generally chosen as the metric for the difference between two density distributions; Laurière et al. (2022) have used it to assess the difference between distributions. Even without an explicit probability density function, the Wasserstein distance (W-dis) can be computed by optimization methods as long as both distributions can be sampled.
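Since only samples from adjacent-step distributions are required, a simple sliced approximation of the Wasserstein distance (our own stand-in, not necessarily the solver used in the experiments) is enough for the time-continuity check.

```python
import numpy as np

def sliced_wasserstein(a, b, n_proj=100, seed=0):
    """Approximate the Wasserstein-1 distance between two sample sets a, b of shape (n, d)
    by averaging 1-D Wasserstein distances over random projection directions."""
    rng = np.random.default_rng(seed)
    d = a.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        pa, pb = np.sort(a @ v), np.sort(b @ v)
        m = min(len(pa), len(pb))
        total += np.mean(np.abs(pa[:m] - pb[:m]))   # 1-D W_1 for equal-size empirical measures
    return total / n_proj

# Example: samples from two adjacent time steps (here, two nearby Gaussians for illustration).
x_t  = np.random.default_rng(1).normal(0.0, 1.0, size=(2000, 2))
x_t1 = np.random.default_rng(2).normal(0.1, 1.0, size=(2000, 2))
print(sliced_wasserstein(x_t, x_t1))   # a small value indicates smooth evolution between steps
```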
We set up the comparison experiments on traffic flow and on crowd motion in dimensions 2 and 50. The comparison results are shown in Tab.(1) and Fig.(6).
Table 1: Comparison results.
| Experiment | Metric | NF-MKV | RL-PIDL | APAC-Net |
| Exp 1: traffic flow | Error of integral difference from 1 | | | / (sample-based) |
| Exp 1: traffic flow | W-dis of μ_t between time steps | | | |
| Exp 2: crowd motion (d = 2) | Error of integral difference from 1 | | | / (sample-based) |
| Exp 2: crowd motion (d = 2) | W-dis of μ_t between time steps | | | |
| Exp 2: crowd motion (d = 50) | Error of integral difference from 1 | | | / (sample-based) |
| Exp 2: crowd motion (d = 50) | W-dis of μ_t between time steps | | | |
Figure 6: Comparison results for distribution volumetric invariance and agent state time-continuity.
The integral error of the NF-MKV Net method over the dynamic region is below and close to the standard value of 1, significantly outperforming the other method, which has an error exceeding in low-dimensional problems. In high-dimensional scenarios, the error of the other method is over ten times larger than that of NF-MKV Net, highlighting the distribution volumetric invariance of NF-MKV Net. NF-MKV Net also has the smallest average Wasserstein distance between adjacent time steps among the three methods, demonstrating smoother evolution and superior agent state time-continuity.
In summary, NF-MKV Net excels in distribution volumetric invariance and agent state time-continuity, making it suitable for solving MFGs problems involving density distributions.
5 Conclusion
This paper investigates MFGs equilibrium solutions using a stochastic process framework, addressing equivalent probability distribution flow fixed-point problems instead of directly solving the coupled MFGs equations. We propose NF-MKV Net, which integrates process-regularized NF with state-policy-connected time-series neural networks. Process-regularized NF frameworks enforce mathematical constraints by regulating the transfer functions to represent flows of probability measures. State-policy-connected time-series neural networks, grounded in MKV FBSDEs, establish relationships among value functions to ensure a time-consistent process. The proposed method is validated in diverse scenarios, demonstrating its effectiveness compared to existing approaches. NF-MKV Net exhibits strong performance in distribution volumetric invariance and agent state time-continuity, making it applicable to MFGs problems involving distributions.
References
- Achdou & Lauriere (2015) Yves Achdou and Mathieu Lauriere. On the System of Partial Differential Equations Arising in Mean Field Type Control, 2015.
- Carmona et al. (2018) René Carmona, François Delarue, et al. Probabilistic Theory of Mean Field Games with Applications I-II. Springer, 2018.
- Chen et al. (2023) Xu Chen, Shuo Liu, and Xuan Di. A Hybrid Framework of Reinforcement Learning and Physics-Informed Deep Learning for Spatiotemporal Mean Field Games. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems. ACM Digital Library, 2023.
- Huang et al. (2023) Han Huang, Jiajia Yu, Jie Chen, and Rongjie Lai. Bridging Mean-Field Games and Normalizing Flows with Trajectory Regularization. Journal of Computational Physics, 487:112155, 2023.
- Huang et al. (2006) Minyi Huang, Roland P Malhamé, and Peter E Caines. Large Population Stochastic Dynamic Games: Closed-Loop McKean-Vlasov Systems and the Nash Certainty Equivalence Principle. Communications in Information & Systems, 6(3):221–252, 2006.
- Lasry & Lions (2007) Jean-Michel Lasry and Pierre-Louis Lions. Mean Field Games. Japanese journal of mathematics, 2(1):229–260, 2007.
- Laurière et al. (2022) Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Elie, Olivier Pietquin, et al. Scalable deep reinforcement learning algorithms for mean field games. In International Conference on Machine Learning, pp. 12078–12095. PMLR, 2022.
- Lin et al. (2021) Alex Tong Lin, Samy Wu Fung, Wuchen Li, Levon Nurbekyan, and Stanley J Osher. Alternating the Population and Control Neural Networks to Solve High-Dimensional Stochastic Mean-Field Games. Proceedings of the National Academy of Sciences, 118(31):e2024713118, 2021.
- O’Brien et al. (2016) Travis A. O’Brien, Karthik Kashinath, Nicholas R. Cavanaugh, William D. Collins, and John P. O’Brien. A fast and objective multidimensional kernel density estimation method: fastkde. Computational Statistics & Data Analysis, 101:148–160, 2016. ISSN 0167-9473. doi: 10.1016/j.csda.2016.02.014.
- Ren et al. (2024) Lu Ren, Yuxin Jin, Zijia Niu, Wang Yao, and Xiao Zhang. Hierarchical Cooperation in LQ Multi-Population Mean Field Game with its Application to Opinion Evolution. IEEE Transactions on Network Science and Engineering, 2024.
- Ruthotto et al. (2020) Lars Ruthotto, Stanley J Osher, Wuchen Li, Levon Nurbekyan, and Samy Wu Fung. A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems. Proceedings of the National Academy of Sciences, 117(17):9183–9193, 2020.
- Tabak & Vanden-Eijnden (2010) Esteban G Tabak and Eric Vanden-Eijnden. Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1):217–233, 2010.
Appendix A Assumption (MFGs Solvability HJB)
(A1) The volatility σ is independent of the control parameter α.
(A2) There exists a constant such that: for all .
(A3) For any , and , the functions and are continuously differentiable in .
(A4) For any , and , the functions , , and are Lipschitz continuous in ; for any and , the functions , , and are continuous in the measure argument with respect to the Wasserstein distance.
(A5) For the same constant and for all ,
(A6) Letting, for all , there exists a unique minimizer , continuous in and Lipschitz continuous in , satisfying: for all .
Appendix B Symbols Table
To ensure clarity, we have included a correspondence table for all symbols to enhance the reading experience.
Symbol | Meaning |
State Process | |
Filtration in the process | |
Filtration at time t | |
Probability space measure | |
Wiener Process in the process | |
Wiener Process at time t | |
Initial Condition | |
Square Integrable Function Space | |
d-dimensional real number space | |
Probability Density Measures Flows in the process | |
Probability Density at time t | |
Control Action Process | |
Control Action Set | |
Loss Functions under Behavioral and Density Flow | |
Expectation | |
Process Loss Function | |
Terminal Loss Function | |
State Control | |
Random Perturbation | |
Initial State under a Particular Behavioral Flow | |
Optimal State of the Representative Agent | |
The marginal density flow corresponding to the optimum | |
Value Function | |
Random Perturbation | |
Hamiltonian function corresponding to state and covariates | |
State at time | |
Value Function in MKV FBSDE at time | |
Covariate in MKV FBSDE at time | |
Intermediate Conversion Functions for each layer in NF | |
Conversion Functions in NF | |
The inverse of the intermediate conversion function for each layer in the NF | |
The inverse of the conversion function of NF | |
The inverse ergodic conjugate transpose of sigma | |
Neural network representation of the value function at time | |
Neural network representation of the gradient of the value function | |
Final output | |
Maximum likehood of function and | |
Training Loss of the MKVFBSDE | |
Neural network parameter representation of NF | |
Training Loss of the HJB equation | |
Number of time segments, number of full-plane samples | |
Terminal Loss | |
Parameter representation of the NF conversion function | |
Sample | |
Covariate in Hamiltonian function | |
takes a partial derivative with respect to |
Appendix C Theoretical Analysis
The theoretical analysis of our designed algorithm primarily rests on using the Representation Theorem for the Strong Formulation to guarantee that the neural network solution corresponds to the equilibrium of Mean-Field Games (MFGs).
In fact, MFGs do not always have an equilibrium; its existence depends on the form of the value function in the equations. According to the conditions for the existence of an equilibrium in MFGs [1] and the Representation Theorem for the Strong Formulation, the objective function must satisfy Lipschitz continuity, continuous differentiability, and convexity conditions (see Appendix A). If an objective function that fails to meet these conditions is used directly as the network’s loss function, the resulting solution cannot be guaranteed to correspond to the equilibrium of the MFGs, even if the network produces a solution. At the same time, solving the two equations requires iteration, as the MFGs equilibrium is reached as a fixed point of their iteration.
We employ two networks to represent the forward and backward equations, respectively. The value function of each network is strongly tied to the objective function of the corresponding equation, following an approach similar to that of Han et al. [2], which effectively represents the equations. By designing the iterative training structure of the two networks, we characterize the iterative solution process of the two equations within the MKV FBSDE solution procedure.
These measures theoretically guarantee the existence and uniqueness of the equilibrium in the MFGs system, ensuring that the solution produced by our algorithm is indeed the equilibrium.
To show the existence of solutions to the MKV FBSDEs and their equivalence with the MFGs under the given Solvability HJB conditions, we give theoretical proofs that illustrate the validity of our proposed transformation to the equilibrium of the MFGs.
The objective is to prove that, for a given initial condition, the FBSDE has a solution with a bounded martingale integrand, and that this solution is unique within the class of solutions with bounded martingale integrands. Meanwhile, we must also construct a decoupling field. The theoretical analysis is given below.
Reference
[1] Lasry J M, Lions P L. Mean field games[J]. Japanese journal of mathematics, 2007, 2(1): 229-260.
[2] Han J, Jentzen A, E W. Solving high-dimensional partial differential equations using deep learning[J]. Proceedings of the National Academy of Sciences, 2018, 115(34): 8505-8510.
Theorem: Representation Theorem for Strong Formulation.
For the same input as above and under assumption MFGs Solvability HJB, the FBSDE with ξ as initial condition at time 0 has a unique solution, with the martingale integrand Z bounded by a deterministic constant, almost everywhere on [0, T].
Moreover, there exists a continuous mapping u, Lipschitz continuous in the space variable uniformly with respect to time and to the input, such that, for any initial condition ξ, the unique solution to the FBSDE with ξ as initial condition at time 0 satisfies Y_t = u(t, X_t) for all t ∈ [0, T].
Also, the process is bounded by the Lipschitz constant of in . Finally, the process is the unique solution of the optimal control problem. In particular, for
Proof. We split the proof into successive steps.
First Step. We first focus on a truncated version of FBSDE, namely:
(20)
for , with the terminal condition, for a cut-off function , equal to on the ball of center and radius , and equal to outside the ball of center and radius , such that . For the time being, is an arbitrary real number. Its value will be fixed later on.
By Ref.1, we know that, for any initial condition , Eq.(20) is uniquely solvable. We denote the unique solution by . Thanks to the cut-off function , the driver of Eq.(20) is indeed Lipschitz-continuous in the variable . Moreover, the solution can be represented through a continuous decoupling field , Lipschitz continuous in the variable , uniformly in time. Also, the martingale integrand is bounded by times the Lipschitz constant of , with as in assumption MFGs Solvability HJB. Therefore, the proof boils down to showing that we can bound the Lipschitz constant of the decoupling field independently of the cut-off function in Eq.(20).
Second Step. In this step, we fix the values of , and we use the notation . We then let be the Doléans-Dade exponential of the stochastic integral:
where As earlier, we write for despite the fact that and do not have the same arguments. Since the integrand is bounded, is a true martingale, and we can define the probability measure Under , the process:
is a d-dimensional Brownian motion. Following the proof of Proposition 4.51, we learn that under is a solution of the forward-backward SDE:
(21)
over the interval , with the same terminal condition as before. Since is bounded, the forward-backward SDE Eq.(21) may be regarded as an FBSDE with Lipschitz-continuous coefficients. By the FBSDE version of Yamada-Watanabe theorem proven in ref.1, any other solution with a bounded martingale integrand, with the same initial condition but constructed with respect to another Brownian motion, has the same distribution. Therefore, we can focus on the version of Eq.(21) obtained by replacing by If, for this version, the backward component can be represented in the form , for all , with being Lipschitz continuous in space, uniformly in time, and with bounded, then must coincide with Repeating the argument for any ,we then have
Third Step. The strategy is now as follows. We consider the same FBSDE as in Eq.(21), but with replaced by the original :
(22)
with . This BSDE may be regarded as a quadratic BSDE. In particular, Ref.1 applies and guarantees that it is uniquely solvable. However, since the driver in the backward equation is not Lipschitz continuous, we shall modify the form of the equation and focus on the following version:
(23)
Notice that the cut-off function now appears on the third line. Our objective being to prove that Eq.(23) admits a solution for which is bounded independently of , when is large, the presence of the cut-off does not make any difference.
Now, Eq.(23) may be regarded as both a quadratic and a Lipschitz FBSDE. For any initial condition , we may again denote the solution by This is the same notation as in the first step although the equation is different. Since the steps are completely separated, there is no risk of confusion. We denote the corresponding decoupling field by By Theorem in Ref.1, it is bounded (the bound possibly depending on at this stage the proof) and is bounded.
For the sake of simplicity, we assume that and we drop the indices and in the We just denote it by Similarly, we just denote by
The goal is then to prove that there exists a constant , independent of and of the cut-off , such that, for all ,
(24)
from which we will deduce that, for all
which is exactly the Lipschitz control we need on the decoupling field.
Fourth Step. We now proceed with the proof of Eq.(24). Fixing the values of and and letting
we can write:
(25)
where is the matrix with entries:
where is the coordinate of and
with:
From the Lipschitz property of in ,the process is bounded by a constant only depending upon in the assumption. Notice that in the notation appears as the inner product of and Because of the presence of the additional indices , we chose not to use the inner product notation in this definition. This warning being out of the way, we may use the inner product notation when convenient.
Indeed, in a similar fashion, the pair satisfies a backward equation of the form:
(26)
where is an -valued random variable bounded by and and are progressively measurable -valued processes, which are bounded, the bounds possibly depending upon the function Here,“” denotes the inner product of Notice that, as a uniform bound on the growth of and ,we have:
(27)
the constant only depending on the constant appearing in the assumption and where we used the assumption sup Since is bounded, we may introduce a probability (again this is not the same as that appearing in the second step, but, since the two steps are completely independent, there is no risk of confusion), equivalent to , under which is a Brownian motion. Then,
(28)
In order to handle the above right-hand side, we need to investigate . This requires to go back to Eq.(27) and to Eq.(23).
Fifth Step. The backward equation in Eq.(23) may be regarded as a BSDE satisfying assumption Quadratic BSDE, uniformly in By Ref.1, the integral is of Bounded Mean Oscillation and its BMO norm is independent of and Without any loss of generality, we may assume that it is less than
Coincidentally, the same holds true if we replace by from Eq.(26), as By Ref.1, we deduce that there exists an exponent , only depending on and , such that (allowing the constant to increase from line to line):
Now Eq.(25) implies that, for any , there exists a constant , independent of the cutoff functions , such that Therefore, applying Hölder’s inequality, Eq.(28) and the bound for the -moment of , we obtain:
for some . In order to estimate the right-hand side, we invoke Ref.(1) again. We deduce that:
for a constant that only depends upon and . This proves the required estimate for the Lipschitz constant of the decoupling field associated with the system (23).
Appendix D Error Analysis
The error due to the discretized MKV FBSDEs is inversely proportional to the number of temporal discretization steps N, i.e., it is O(1/N). Therefore, the denser the temporal discretization, the smaller the resulting discretization error. Meanwhile, the training loss can be expressed as the solution loss of the discretized MFGs, and the remaining error is caused by the parameterized neural network. The details of the error analysis are given below.
D.1 Errors caused by discretization of MKV FBSDEs
For convenience, abbreviations will be used in the following error analysis, that is, will represent .
In the MKV FBSDEs, Y_t represents the value function u(t, X_t). To obtain the value function, we integrate the second term in Eq.(4) in Section 2.1, that is
subtracting the case at from the case at gives:
and can be discretized by Euler method, using to represent the average value in the process, and we can get
where
The item has no error because is a random process, so there is no difference in its value between and the integral form. The item causes the error from the real value . The error can be calculated as (represent by ):
By the first mean value theorem for definite integrals, there exists τ_i ∈ [t_i, t_{i+1}] such that
In the whole process, since t is discretized into N parts, the error of the whole process (represented as E) can be obtained by summing the per-step errors, resulting in E = O(1/N), as written out below.
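For completeness, the per-step and total errors implied by the argument above can be written out explicitly; this assumes the map t ↦ f(X_t, μ_t, α_t) is Lipschitz in time with constant L along the trajectory, which is the hypothesis implicitly used here.

```latex
e_i = \Bigl|\int_{t_i}^{t_{i+1}} f(X_t, \mu_t, \alpha_t)\,dt
      - f(X_{t_i}, \mu_{t_i}, \alpha_{t_i})\,\Delta t\Bigr|
    = \bigl|f(X_{\tau_i}, \mu_{\tau_i}, \alpha_{\tau_i})
      - f(X_{t_i}, \mu_{t_i}, \alpha_{t_i})\bigr|\,\Delta t
    \le L\,\Delta t^{2},
\qquad \tau_i \in [t_i, t_{i+1}],\ \Delta t = T/N,
\qquad
E = \sum_{i=0}^{N-1} e_i \le N\,L\,\Delta t^{2} = \frac{L\,T^{2}}{N} = O\!\left(\frac{1}{N}\right).
```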
In summary, the error caused by discretization is inversely proportional to the number of discretization steps N, and the discretized quantities converge to their continuous counterparts as N → ∞.
D.2 Errors caused by parameterized density flow and value function
In MFGs, the loss function of the entire system can be written as
For fixed control , the form of the optimization loss function becomes
(29)
It can be transformed into a parameterized problem for solving neural networks with
(30)
Thus, the loss consists of two parts, one for process loss and one for terminal loss.
Process loss. As a result of discretizing the MKV FBSDEs and constructing the neural network in this way, the process loss changes from the form of an integral to the form of a cumulative sum of the total loss, passed through the network, and can thus be expressed as .
Terminal Loss. In the iterative solution of the NF network by a network of fixed-value functions, the terminal loss is calculated by substituting the loss into the terminal-value function g after sampling by . Thus it can be expressed as .
Thus the above can represent the total loss of the MFGs .
Regularization term loss. The HJB-FPK equations involved in the MFGs must still be satisfied throughout the process solution, so ℓ_HJB is added as a regularization term.
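With the discretization of Section 3.1 and the notation of Eq.(15) and Eq.(17), one consistent way to write the three loss terms is sketched below; the regularization weight λ is our own notational placeholder, and the exact weighting used in training is not specified in the text above.

```latex
\ell_{\mathrm{process}} \approx \mathbb{E}\Bigl[\sum_{n=0}^{N-1} f\bigl(X_{t_n}, \mu_{t_n}, \alpha_{t_n}\bigr)\,\Delta t\Bigr],
\qquad
\ell_{\mathrm{terminal}} \approx \frac{1}{M}\sum_{m=1}^{M} g\bigl(x^{m}, \mu_{T}\bigr),\quad x^{m}\sim\rho^{\varphi}_{T},
\qquad
\ell_{\mathrm{total}} = \ell_{\mathrm{process}} + \ell_{\mathrm{terminal}} + \lambda\,\ell_{\mathrm{HJB}}.
```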
In summary, the sum of the three loss terms during training can be expressed as the solution loss of the discretized MFGs, and since the error due to discretization has already been analyzed, this part of the error is only due to the error caused by the parameterized Neural Network.