Machine Learning architectures for price formation models with common noise
Abstract
We propose a machine learning method to solve a mean-field game price formation model with common noise. This involves determining the price of a commodity traded among rational agents, subject to a market-clearing condition imposed by a random supply, which presents additional challenges compared to the deterministic counterpart. Our approach uses a dual recurrent neural network architecture encoding the noise dependence and a particle approximation of the mean-field model, with a single loss function optimized by adversarial training. We provide a posteriori estimates for convergence and illustrate our method through numerical experiments.
I INTRODUCTION
In this work, we extend the use of machine learning (ML) techniques for the numerical solution of the mean-field game (MFG) price formation models, introduced in [8], to incorporate the common noise model from [9] (see also [10]). The goal is to determine the price of a commodity with a noisy supply traded among rational agents within a finite time horizon $T$, under a market-clearing condition. More precisely, we assume the supply function $Q = (Q_t)_{t \in [0,T]}$ satisfies the following stochastic differential equation (SDE)
$$\mathrm{d}Q_t = b_S(t, Q_t)\,\mathrm{d}t + \sigma_S(t, Q_t)\,\mathrm{d}W_t, \qquad (1)$$
where $Q_0$ is given and $W = (W_t)_{t \in [0,T]}$ is a one-dimensional Brownian motion acting as common noise. The coefficients $b_S$ and $\sigma_S$ satisfy the usual Lipschitz conditions for existence and uniqueness of solutions (see [7]). Because of (1), our model describes price formation for commodities and assets with continuous and smooth fluctuations, such as stocks, bonds, currencies, and continuously produced or consumed goods such as oil or natural gas. Additional sources of noise can be considered; for instance, sudden and discontinuous fluctuations can be modeled by adding Poisson jumps to (1).
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,T]}, \mathbb{P})$ be a complete filtered probability space supporting $W$. Progressive measurability refers to measurability with respect to this filtration, which we require for all stochastic processes. In this context, the MFG with common noise characterizing the price is the following.
Problem 1
Suppose that $H : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is uniformly convex and differentiable in the second argument, $m_0$ is a probability measure on $\mathbb{R}$, and $u_T : \mathbb{R} \to \mathbb{R}$ is uniformly convex and differentiable. Find $u$, $m$, and $\varpi$, progressively measurable, satisfying $u(\cdot, T) = u_T$ and
$$\begin{cases} \mathrm{d}u_t = H\big(x, \varpi_t + \partial_x u_t\big)\,\mathrm{d}t + Z_t\,\mathrm{d}W_t,\\[2pt] \mathrm{d}m_t = \partial_x\big(H_p\big(x, \varpi_t + \partial_x u_t\big)\,m_t\big)\,\mathrm{d}t, \quad m\big|_{t=0} = m_0,\\[2pt] \displaystyle\int_{\mathbb{R}} H_p\big(x, \varpi_t + \partial_x u_t\big)\,m_t(\mathrm{d}x) = -Q_t. \end{cases} \qquad (2)$$
The previous problem generalizes the one introduced in [11], which corresponds to the case $\sigma_S \equiv 0$. The numerical solution of (2) presents additional challenges compared to the deterministic counterpart, as the state space becomes infinite-dimensional. [10] showed that (2) is well-posed when $b_S$ and $\sigma_S$ are linear and $H$ is quadratic, obtaining semi-explicit solutions. Section II presents the derivation of (2).
In the absence of common noise, several numerical schemes have been proposed: Fourier series [17], semi-Lagrangian schemes [4], fictitious play [12], and variational methods [3]. [6] proposes an ML-based approach to solve bi-level Stackelberg problems between a principal and a mean field of agents by reformulating the problem as a single-level mean-field optimal control problem. [15] and [5] survey deep learning and reinforcement learning methods applied to MFGs and mean-field control problems. However, these methods cannot handle general forms of common noise, as the state space becomes infinite-dimensional. Recent works have circumvented this issue. [2] reduces continuous-time MFGs with finitely many states and common noise to a system of forward-backward (random) ordinary differential equations. [16] used rough path theory and deep learning techniques. However, the coupling in the price formation problem with common noise is given by an integral constraint in infinite-dimensional spaces, which is beyond what standard methods can handle. In [1], the price formation model with common noise was converted into a convex variational problem with constraints and solved using ML, enforcing the constraints by penalization. This approach, however, introduces numerical instabilities. In contrast, our method includes the balance constraint in the loss functional through a Lagrange multiplier instead of a penalization.
Our method employs two recurrent neural networks (RNNs) to approximate the price $\varpi$ and the optimal vector field that agents follow, using a particle approximation and a loss function that the RNNs optimize by adversarial training. We develop a posteriori estimates to confirm the convergence of our method, which is of paramount importance when no benchmarks are available. We introduce the remaining notation in Section III, where we prove the following main result:
Theorem 1
Suppose that $H$ is uniformly concave-convex in $(x,p)$, separable, with Lipschitz continuous derivatives, and $u_T$ is convex with Lipschitz continuous derivative. Let $\boldsymbol{\alpha}$ and $\varpi$ solve the $N$-player price formation problem with common noise, and let $\hat{\boldsymbol{\alpha}}$ and $\hat{\varpi}$ be an approximate solution to the $N$-player problem up to the error terms $\boldsymbol{\epsilon}$ and $\eta$. Then, there exists $C > 0$, depending on problem data, such that
$$\|\boldsymbol{\alpha} - \hat{\boldsymbol{\alpha}}\|^2 + \|\varpi - \hat{\varpi}\|^2 \le C \left( \|\boldsymbol{\epsilon}\|^2 + \mathbb{E}\Big[\tfrac{1}{N}\textstyle\sum_{n=1}^{N} |\epsilon^n_T|^2\Big] + \|\eta\|^2 \right).$$
We present our algorithm in Section IV and numerical results in Section V for the linear-quadratic setting. Nonetheless, our method can handle models outside the linear-quadratic framework. Moreover, the ML approach is well suited for higher-dimensional state spaces, where, for instance, several commodities are priced simultaneously. Section VI contains concluding remarks and sketches future research directions.
II The MFG price problem with common noise
Price formation is a critical aspect of economic systems. One example is load-adaptive pricing in smart grids, which motivates consumers to adjust their energy consumption based on changes in electricity prices. MFGs provide a mathematical framework for studying complex interactions between multiple agents, including buyers and producers in a market. Here, we revisit the underlying optimization problem in Problem 1.
A representative player with an initial quantity $x$ at time $t$ selects a progressively measurable trading rate $\alpha = (\alpha_s)_{s \in [t,T]}$ to minimize the cost functional mapping $\alpha$ to
$$\mathbb{E}\left[\int_t^T \big(L(X_s, \alpha_s) + \varpi_s\,\alpha_s\big)\,\mathrm{d}s + u_T(X_T)\right], \qquad (3)$$
where $X$ solves
$$\mathrm{d}X_s = \alpha_s\,\mathrm{d}s, \qquad (4)$$
with $X_t = x$. The Hamiltonian in (2) is the Legendre transform of the Lagrangian in (3):
$$H(x, p) = \sup_{\alpha \in \mathbb{R}} \big\{ -\alpha\,p - L(x, \alpha) \big\}. \qquad (5)$$
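For illustration, consider the purely quadratic case (an assumption made here for concreteness, not necessarily the cost used in Section V): taking $L(x,\alpha) = \frac{c}{2}\alpha^2$ with $c > 0$,
$$H(x,p) = \sup_{\alpha \in \mathbb{R}}\Big\{-\alpha p - \frac{c}{2}\,\alpha^{2}\Big\} = \frac{p^{2}}{2c}, \qquad \alpha^{*}(p) = -\frac{p}{c},$$
so the optimal trading rate is linear in the price-adjusted momentum, which is the structure underlying the linear-quadratic benchmark of Section V.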
Here, we assume that $\alpha \mapsto L(x, \alpha)$ is uniformly convex for all $x$. Moreover, we assume that $H$ satisfies the assumptions of Theorem 1, which guarantees that the Lagrangian satisfies the convexity requirement. Considering the value function
$$u(x, t) = \inf_{\alpha}\, \mathbb{E}\left[\int_t^T \big(L(X_s, \alpha_s) + \varpi_s\,\alpha_s\big)\,\mathrm{d}s + u_T(X_T) \,\middle|\, \mathcal{F}_t,\; X_t = x\right],$$
we obtain the first equation in (2). The distribution of agents starts from $m_0$ and evolves according to the stochastic flow controlled by $\alpha$, as described by the second equation in (2), while the third equation imposes a market-clearing condition. [1] discusses further details. In our method, we approximate $\varpi$, which allows the decoupling of the equations in (2).
The particle approximation involves a finite population of $N$ players with independent, identically distributed initial positions $x^n_0$, $1 \le n \le N$, with distribution $m_0$. Each player selects a progressively measurable control $\alpha^n$, $1 \le n \le N$, determining its trajectory $X^n$ according to (4) and aiming at minimizing the functional mapping $\alpha^n$ to
$$\mathbb{E}\left[\int_0^T \big(L(X^n_t, \alpha^n_t) + \varpi_t\,\alpha^n_t\big)\,\mathrm{d}t + u_T(X^n_T)\right]. \qquad (6)$$
The existence of $\alpha^n$ minimizing (6) for $1 \le n \le N$ corresponds to the existence of $\boldsymbol{\alpha} = (\alpha^1, \ldots, \alpha^N)$ minimizing the functional
$$\frac{1}{N}\sum_{n=1}^{N} \mathbb{E}\left[\int_0^T L(X^n_t, \alpha^n_t)\,\mathrm{d}t + u_T(X^n_T)\right]$$
subject to the market-clearing constraint
$$\frac{1}{N}\sum_{n=1}^{N} \alpha^n_t = Q_t, \qquad 0 \le t \le T. \qquad (7)$$
We rely on the existence and uniqueness result for the $N$-player price formation model, presented in [10], which determines the price in (6) through the Lagrange multiplier associated with the market-clearing constraint. Our goal is to extend the ML algorithm introduced in [8] to cover the case of common noise, providing a solution to the price formation problem in random environments. Relying on the particle approximation of the model to approximate the price solving (2), we approximate stationary points of the functional mapping $(\boldsymbol{\alpha}, \varpi)$ to
$$\frac{1}{N}\sum_{n=1}^{N} \mathbb{E}\left[\int_0^T \big(L(X^n_t, \alpha^n_t) + \varpi_t\,\alpha^n_t\big)\,\mathrm{d}t + u_T(X^n_T)\right] - \mathbb{E}\left[\int_0^T \varpi_t\, Q_t\,\mathrm{d}t\right] \qquad (8)$$
by minimizing w.r.t. $\boldsymbol{\alpha}$ and maximizing w.r.t. $\varpi$. The approximation is done in the ML framework, and we guarantee its accuracy using a posteriori estimates for the $N$-player model, which we present next.
III A posteriori estimates
In this section, we use optimality conditions for the $N$-player game to obtain a posteriori estimates that verify our approximation's convergence. We extend the proof presented in [8] to the common noise setting with minor modifications.
The optimality conditions for (8) give rise to a Hamiltonian system comprising the following backward-forward stochastic differential equation
$$\begin{cases} \mathrm{d}X^n_t = -H_p\big(X^n_t, \varpi_t + P^n_t\big)\,\mathrm{d}t, & X^n_0 = x^n_0,\\[2pt] \mathrm{d}P^n_t = H_x\big(X^n_t, \varpi_t + P^n_t\big)\,\mathrm{d}t + Z^n_t\,\mathrm{d}W_t, & P^n_T = u_T'(X^n_T),\\[2pt] -\dfrac{1}{N}\displaystyle\sum_{m=1}^{N} H_p\big(X^m_t, \varpi_t + P^m_t\big) = Q_t, \end{cases} \qquad (9)$$
for $1 \le n \le N$, where $H$ is given by (5). Notice that $\varpi$ is part of the unknowns. Moreover, $\boldsymbol{\alpha}$ and $\varpi$ solving the $N$-player price formation problem define a solution of (9) by
$$P^n_t = -L_\alpha\big(X^n_t, \alpha^n_t\big) - \varpi_t, \qquad 0 \le t \le T, \qquad (10)$$
for $1 \le n \le N$, defining a saddle point of (8) that satisfies the market-clearing constraint (7). Let $\hat{\boldsymbol{X}}$, $\hat{\boldsymbol{P}}$, $\hat{\boldsymbol{Z}}$, and $\hat{\varpi}$ satisfy
$$\begin{cases} \mathrm{d}\hat{X}^n_t = -H_p\big(\hat{X}^n_t, \hat{\varpi}_t + \hat{P}^n_t\big)\,\mathrm{d}t, & \hat{X}^n_0 = x^n_0,\\[2pt] \mathrm{d}\hat{P}^n_t = \Big(H_x\big(\hat{X}^n_t, \hat{\varpi}_t + \hat{P}^n_t\big) + \epsilon^n_t\Big)\,\mathrm{d}t + \hat{Z}^n_t\,\mathrm{d}W_t, & \hat{P}^n_T = u_T'(\hat{X}^n_T) + \epsilon^n_T,\\[2pt] -\dfrac{1}{N}\displaystyle\sum_{m=1}^{N} H_p\big(\hat{X}^m_t, \hat{\varpi}_t + \hat{P}^m_t\big) = Q_t + \eta_t, \end{cases} \qquad (11)$$
where $\epsilon^n$ and $\eta$ are progressively measurable error processes, for $1 \le n \le N$. We write $\boldsymbol{X} = (X^1, \ldots, X^N)$, and analogously for all $N$-indexed stochastic processes. We denote $\|\xi\|^2 = \mathbb{E}\big[\int_0^T |\xi_t|^2\,\mathrm{d}t\big]$ and $\|\boldsymbol{\xi}\|^2 = \frac{1}{N}\sum_{n=1}^{N} \|\xi^n\|^2$.
Proposition 2
Proof:
Proposition 3
Proof:
We write $\delta X^n = X^n - \hat{X}^n$, $\delta P^n = P^n - \hat{P}^n$, $\delta \alpha^n = \alpha^n - \hat{\alpha}^n$, and $\delta \varpi = \varpi - \hat{\varpi}$. The uniform concavity-convexity assumption on $H$ and the equations in (9) and (11) give
for $1 \le n \le N$, for some constant $\kappa > 0$. Using the Itô product rule, the initial and terminal conditions in (9) and (11), and the convexity of $u_T$, the previous inequality gives
for some constants $C, \delta > 0$ to be selected. Adding the previous inequality over $1 \le n \le N$, and using the third equation in (9) and (11), we get
(12)
for some to be selected. By the triangle inequality,
(13)
Using (13) on the RHS of (12), taking expectations, and using Proposition 2, we obtain
for some $\delta > 0$ to be selected. Selecting $\delta$ conveniently, the previous expression provides the result. ∎
IV Neural Networks for progressively measurable processes
This section details the RNN architectures we use to estimate $\alpha$ and $\varpi$. Section V presents some numerical experiments. RNNs, commonly used in natural language processing, generate outputs that depend sequentially on inputs. This architecture has a cell that iterates through input sequences and a hidden state tracking historical dependencies; see [14] for details. RNNs have also been used in the context of control problems with delay in [13], but here our motivation comes from the impact of the common noise on the mean-field term.
In our architecture, the RNN takes as input an ordered sequence, such as a discrete realization of the supply $Q$. The RNN features a hidden state $h$, initialized as zero, that captures the temporal dependence. Inside the RNN cell, a weight matrix $w^{\ell}$ and a bias vector $b^{\ell}$ determine layer $\ell$, where $1 \le \ell \le \ell_{\max}$. Their dimensions depend on the number of neurons per layer. The activation function of layer $\ell$ is denoted by $\psi^{\ell}$. The cell parameters (weight matrices and bias vectors) are denoted by $\theta$ or $\phi$, according to the process the RNN approximates.
We use two RNNs to approximate the control variable $\alpha$ and the price $\varpi$. As usual in the ML framework, a trade-off must be made between computational cost and accuracy. Deep RNNs employ several layers and neurons in their cells. After multiple numerical experiments, we fix the number of layers and of neurons per layer for the RNN approximating $\alpha$ and for the RNN approximating $\varpi$. As is common practice for RNNs, the activation function is the hyperbolic tangent for the first layer, which computes the hidden state, and the sigmoid for layers two to four. The last layer has an activation function equal to the identity, so the output can be any real number. Although we do not address it, an interesting research question is how sensitive the results are to the choice of the RNN parameters. Moreover, a comparison in terms of accuracy and computational efficiency against other methods, such as forward-backward SDE methods, could be formulated based on the adaptability of those methods to the price formation MFG problem with common noise.
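The following TensorFlow sketch illustrates a cell of this type; the layer widths and hidden-state dimension are illustrative placeholders, not the values selected in our experiments.

```python
import tensorflow as tf

class DeepRNNCell(tf.keras.layers.Layer):
    """Sketch of the deep RNN cell described above: a tanh layer updating the
    hidden state, sigmoid intermediate layers, and a linear output layer."""

    def __init__(self, hidden_dim=16, widths=(32, 32, 32), **kwargs):
        super().__init__(**kwargs)
        self.state_size = hidden_dim  # required by tf.keras.layers.RNN
        self.hidden_layer = tf.keras.layers.Dense(hidden_dim, activation="tanh")
        self.body = [tf.keras.layers.Dense(w, activation="sigmoid") for w in widths]
        self.out_layer = tf.keras.layers.Dense(1, activation=None)  # identity output

    def call(self, inputs, states):
        # The hidden state combines the current input with the previous state.
        h = self.hidden_layer(tf.concat([inputs, states[0]], axis=-1))
        z = h
        for layer in self.body:
            z = layer(z)
        return self.out_layer(z), [h]

# The price network iterates over a discretized path; its inputs at step k are
# (t_k, Q_{t_k}), so a batch of supply paths has shape (batch, K + 1, 2).
price_net = tf.keras.layers.RNN(DeepRNNCell(), return_sequences=True)
```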
We denote by $\Delta t$ the time-step size and discretize (4) according to
$$X_{t_{k+1}} = X_{t_k} + \alpha_{\theta}\big(t_k, X_{t_k}, \varpi_{t_k}\big)\,\Delta t \qquad (16)$$
for $0 \le k \le K-1$, where $\theta$ is the parameter of the RNN approximating $\alpha$. The second RNN, with parameter $\phi$, computes $\varpi_{t_k}$ for $0 \le k \le K$. More precisely, the inputs and outputs of the two RNNs are as follows. For the RNN computing $\varpi$, the input consists of a supply realization and the time; that is, $(t_k, Q_{t_k})$. The output is $\varpi_{t_k}$. For the RNN computing $\alpha$, the input consists of the time, the state variables (which the RNN updates according to (16) as it iterates in the temporal direction), and the current price approximation; that is, $(t_k, X_{t_k}, \varpi_{t_k})$. The output is $\alpha_{t_k}$. Because we consider a population of $N$ agents, we add the superscript $n$ to denote the position and control sequence of the agent being considered; that is, $X^n_{t_k}$ and $\alpha^n_{t_k}$, for $1 \le n \le N$.
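A minimal sketch of the coupled Euler rollout (16) follows; `control_step` is a hypothetical per-step callable standing in for the control RNN (its hidden-state bookkeeping is omitted for brevity), and the shapes are illustrative assumptions.

```python
import tensorflow as tf

def rollout(control_step, prices, x0, dt):
    """Euler scheme (16): X_{k+1} = X_k + alpha_theta(t_k, X_k, price_k) * dt.

    control_step: callable (t, x, price) -> alpha.
    prices:       (batch, K, 1) price path produced by the price RNN.
    x0:           (batch, N) initial positions of the N agents.
    """
    batch, n_agents = x0.shape
    K = prices.shape[1]
    x, xs, alphas = x0, [x0], []
    for k in range(K):
        t = tf.fill((batch, n_agents), k * dt)
        price_k = tf.broadcast_to(prices[:, k], (batch, n_agents))
        a = control_step(t, x, price_k)  # trading rates alpha^n_{t_k}
        x = x + a * dt                   # Euler update of the agents' states
        alphas.append(a)
        xs.append(x)
    return tf.stack(xs, axis=1), tf.stack(alphas, axis=1)
```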
IV-A Numerical implementation of a posteriori estimates
Let $t_k = k\,\Delta t$ for $0 \le k \le K$. At the discrete level, (11) is equivalent to
$$\hat{P}^n_{t_{k+1}} = \hat{P}^n_{t_k} + \Big(H_x\big(\hat{X}^n_{t_k}, \hat{\varpi}_{t_k} + \hat{P}^n_{t_k}\big) + \epsilon^n_{t_k}\Big)\,\Delta t + \hat{Z}^n_{t_k}\,\Delta W_{t_k}$$
for $0 \le k \le K-1$. By (5), $\hat{\varpi}_{t_k} + \hat{P}^n_{t_k} = -L_\alpha\big(\hat{X}^n_{t_k}, \hat{\alpha}^n_{t_k}\big)$ at the point where the supremum is achieved. Therefore, taking conditional expectations on both sides of the equation in the previous system, and using the martingale property of the stochastic integrals of $\hat{Z}^n$, for $1 \le n \le N$, we get
$$\hat{P}^n_{t_k} = \mathbb{E}\left[u_T'(\hat{X}^n_{t_K}) + \epsilon^n_T - \sum_{j=k}^{K-1}\Big(H_x\big(\hat{X}^n_{t_j}, \hat{\varpi}_{t_j} + \hat{P}^n_{t_j}\big) + \epsilon^n_{t_j}\Big)\,\Delta t \,\middle|\, \mathcal{F}_{t_k}\right], \qquad (17)$$
where $\hat{\alpha}_{\theta}$ drives the process $\hat{X}$ according to (16). While the initial condition is deterministic, the terminal condition is random. We take a Monte Carlo (MC) approximation of (17) with $M$ realizations; that is, we replace the conditional expectation in (17) by an empirical average over $M$ independent samples of the supply and of the initial positions.
Thus, to implement the a posteriori estimate of Theorem 1 numerically, let $\theta$ and $\phi$ be given. Define $\hat{\boldsymbol{X}}$ and $\hat{\boldsymbol{P}}$ according to (16) and (10), respectively, and compute the mean-square error (MSE) of $\boldsymbol{\epsilon}$ and $\eta$ by
$$\mathrm{MSE}_{\boldsymbol{\epsilon}} = \frac{1}{M}\sum_{i=1}^{M} \frac{1}{N}\sum_{n=1}^{N}\Big(|\epsilon^{n,i}_{T}|^2 + \sum_{k=0}^{K-1} |\epsilon^{n,i}_{t_k}|^2\,\Delta t\Big), \qquad \mathrm{MSE}_{\eta} = \frac{1}{M}\sum_{i=1}^{M}\sum_{k=0}^{K-1} |\eta^{i}_{t_k}|^2\,\Delta t. \qquad (18)$$
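A sketch of this computation, assuming arrays gathered from the rollout above (`alphas` of shape $(M, K, N)$, `supply` of shape $(M, K)$, and a running optimality defect `eps` of shape $(M, K, N)$; all names are illustrative):

```python
import numpy as np

def a_posteriori_mse(alphas, supply, eps, dt):
    # Balance error: deviation from the market-clearing constraint (7).
    eta = alphas.mean(axis=-1) - supply          # shape (M, K)
    mse_eta = (eta**2).sum(axis=1).mean() * dt   # time integral, then MC mean
    # Optimality error: running defect in the discrete adjoint equation
    # (the terminal defect of (11) would be added analogously).
    mse_eps = (eps**2).mean(axis=-1).sum(axis=1).mean() * dt
    return mse_eps, mse_eta
```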
We measure (18) as we train the neural networks with the algorithm we introduce next.
Remark 4
Theorem 1 addresses the convergence for finite populations. A complete analysis of the convergence of our method involves three steps, which we identify by writing
$$\varpi - \hat{\varpi}_{\phi} = \big(\varpi - \varpi^{N}\big) + \big(\varpi^{N} - \varpi^{N,M}\big) + \big(\varpi^{N,M} - \hat{\varpi}_{\phi}\big),$$
where $\varpi^{N}$ denotes the price of the $N$-player game and $\varpi^{N,M}$ its MC approximation.
First is the convergence of finite to continuum population games. Second is the convergence of the MC approximation to the finite population game, addressing the dependence of the sample size w.r.t. the population size. Third is the convergence of the ML approximation to the MC approximation, involving the RNN parameters in the estimates. This is the error that our a posteriori estimate controls.
IV-B Training algorithm
In typical ML frameworks, a class of neural networks is trained by minimizing a loss function $\mathcal{L}$. Within a fixed architecture, $\mathcal{L}$ assigns a real number to a parameter $\theta$. The objective is to minimize $\mathcal{L}$ across the parameters $\theta$. For a given realization of the supply, the loss function is
$$\mathcal{L}(\theta, \phi) = \frac{1}{N}\sum_{n=1}^{N}\left[\sum_{k=0}^{K-1}\Big(L\big(X^n_{t_k}, \alpha^n_{t_k}\big) + \varpi_{t_k}\,\alpha^n_{t_k}\Big)\,\Delta t + u_T\big(X^n_{t_K}\big)\right] - \sum_{k=0}^{K-1} \varpi_{t_k}\, Q_{t_k}\,\Delta t. \qquad (19)$$
The training algorithm is the following.
In contrast to the training algorithm in [8], in Algorithm 1 the supply input changes between training steps. The algorithm trains the two neural networks in an adversarial manner. At each step, we generate a sample of the initial distribution $m_0$ and of the supply path. To minimize the agents' cost, we update $\theta$ in the direction of descent while $\phi$ is fixed. Conversely, to penalize deviations from the balance condition, we maximize the cost functional by updating $\phi$ in the direction of ascent while $\theta$ is fixed. This process is repeated multiple times, approximating the saddle point corresponding to the control minimizing the cost functional and its Lagrange multiplier.
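A minimal sketch of one such adversarial step in TensorFlow; here `loss_fn` is a hypothetical closure evaluating the discrete loss (19) on a fresh sample through the two RNNs, whose trainable variables are `theta_vars` and `phi_vars`, and both gradients are taken at the same point (Algorithm 1 may instead alternate the updates):

```python
import tensorflow as tf

opt_theta = tf.keras.optimizers.Adam(1e-3)  # descent for the control network
opt_phi = tf.keras.optimizers.Adam(1e-3)    # ascent for the price network

def train_step(loss_fn, batch, theta_vars, phi_vars):
    """One adversarial step on (19): descend in theta, ascend in phi."""
    with tf.GradientTape(persistent=True) as tape:
        loss = loss_fn(batch)   # evaluates (19) on a freshly sampled supply
        neg_loss = -loss        # ascent in phi == descent on -loss
    opt_theta.apply_gradients(zip(tape.gradient(loss, theta_vars), theta_vars))
    opt_phi.apply_gradients(zip(tape.gradient(neg_loss, phi_vars), phi_vars))
    del tape
    return loss
```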
V Numerical results
Here, we demonstrate how the a posteriori estimate (Theorem 1) ensures that our method delivers accurate price approximations. We validate our findings using the benchmarks provided by the linear-quadratic model. For the numerical implementation, we employ the TensorFlow software library.
We set the terminal time $T$ and the time-step size $\Delta t$ for the time discretization. We assume that the supply follows
$$\mathrm{d}Q_t = b_S(t, Q_t)\,\mathrm{d}t + \sigma_S(t, Q_t)\,\mathrm{d}W_t, \qquad (20)$$
where $b_S$ and $\sigma_S$ are linear, consistent with the linear-quadratic framework. The Brownian noise is applied on a sub-interval of $[0, T]$ and generates deviations from the expected value, as illustrated in Figure 1 with two sample paths of the supply. The initial distribution $m_0$ is a normal distribution with prescribed mean and standard deviation. The sample size for the training is fixed, and we train for several epochs, an epoch consisting of a fixed number of training steps. We compute the MC estimate of the a posteriori bound at the end of each epoch using $M$ supply samples and a population of size $N$. Empirically, these training parameters resolved the trade-off between computational cost and accuracy.
Figure 1: Two sample paths of the supply $Q$.
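A sketch of a supply-path sampler consistent with Figure 1; the mean-reverting drift, noise level, and noise window below are assumptions for illustration, not the coefficients used in our experiments:

```python
import numpy as np

def sample_supply(M, K, T, q0=1.0, kappa=1.0, q_bar=1.0, sigma=0.2,
                  noise_window=(0.25, 0.75), seed=0):
    """Euler-Maruyama samples of (20), with noise active only on a window."""
    rng = np.random.default_rng(seed)
    dt = T / K
    t = np.linspace(0.0, T, K + 1)
    Q = np.full((M, K + 1), q0)
    for k in range(K):
        drift = kappa * (q_bar - Q[:, k])               # mean-reverting drift
        on = noise_window[0] <= t[k] < noise_window[1]  # noise only on a window
        dW = rng.normal(0.0, np.sqrt(dt), size=M) if on else 0.0
        Q[:, k + 1] = Q[:, k] + drift * dt + sigma * dW
    return t, Q
```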
We select the remaining model and cost parameters within the linear-quadratic framework. Figure 2 shows the evolution of the a posteriori estimate in Theorem 1. The balance error reaches adequate accuracy and oscillates slightly with a decreasing tendency. The optimality error also exhibits a decreasing trend, but its accuracy does not improve further, suggesting testing other combinations of training and discretization parameters.
Figure 2: Evolution of the a posteriori estimate (balance and optimality errors) during training.
Furthermore, we use the analytic solution derived in [10] to verify the price approximation’s accuracy. In the linear-quadratic framework, the price follows from the SDE system
$$\mathrm{d}\varpi_t = a(t)\,\mathrm{d}t + c(t)\,\mathrm{d}W_t, \qquad \varpi_0 \text{ given}. \qquad (21)$$
The value $\varpi_0$ and the functions $a$ and $c$ are determined by a system of ordinary differential equations that is explicitly solvable. Figure 3 shows the corresponding price approximation and the exact price (obtained from (21)) for the two supply realizations of Figure 1. The decreasing trend in the errors observed in Figure 2 is reflected in the precise approximation observed in Figure 3. Notice the effect of the noise in the time window where it acts, which, as expected, decreases the accuracy of the approximation compared to the region where no noise is applied.
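Given solutions $a$ and $c$ of that ODE system, the benchmark price can be integrated with the same Brownian increments that drove the supply; a sketch, with `a` and `c` passed as callables and all names illustrative:

```python
import numpy as np

def exact_price(a, c, varpi0, dW, dt):
    """Euler-Maruyama integration of (21), reusing the supply's increments dW."""
    K = len(dW)
    varpi = np.empty(K + 1)
    varpi[0] = varpi0
    for k in range(K):
        varpi[k + 1] = varpi[k] + a(k * dt) * dt + c(k * dt) * dW[k]
    return varpi
```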
As the figures show, the method performs excellently in approximating solutions across various noise regimes. Further research and refinements can enhance its efficiency, speed, and accuracy, leading to even more precise approximations in critical regions.
VI Conclusions and further directions
We extend the ML approach introduced in [8] for the deterministic setting to the common noise scenario, utilizing RNN architectures to represent non-anticipating controls. As in the deterministic case, our approach demonstrates good accuracy and performance.
Future research could explore method robustness concerning RNN and discretization parameter variations, addressing the trade-off between computational cost and accuracy. Comprehensive experiments may identify optimal RNN layer and neuron quantities for specific supply dynamics. Advanced coding methods could further reduce computational costs while maintaining or enhancing accuracy. Extensions could also accommodate additional noise sources, such as jump processes.
ACKNOWLEDGMENT
D. Gomes and J. Gutierrez were supported by King Abdullah University of Science and Technology (KAUST) baseline funds and KAUST OSR-CRG2021-4674.
References
- [1] Y. Ashrafyan, T. Bakaryan, D. Gomes, and J. Gutierrez. The potential method for price-formation models. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 7565–7570, 2022.
- [2] C. Belak, D. Hoffmann, and F. T. Seifried. Continuous-time mean field games with finite state space and common noise. Applied Mathematics & Optimization, 84(3):3173–3216, 2021.
- [3] J.-D. Benamou and G. Carlier. Augmented Lagrangian methods for transport optimization, mean field games and degenerate elliptic equations. Journal of Optimization Theory and Applications, 167(1):1–26, 2015.
- [4] E. Carlini and F. J. Silva. A fully discrete semi-Lagrangian scheme for a first order mean field game problem. SIAM Journal on Numerical Analysis, 52(1):45–67, 2014.
- [5] R. Carmona and M. Laurière. Deep learning for mean field games and mean field control with applications to finance. To appear in Machine Learning and Data Sciences for Financial Markets (arXiv preprint arXiv:2107.04568), 2021.
- [6] G. Dayanikli and M. Laurière. A machine learning method for Stackelberg mean field games, 2023.
- [7] N. El Karoui, S. Peng, and M. C. Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997.
- [8] D. Gomes, J. Gutiérrez, and M. Laurière. Machine learning architectures for price formation models, 2022.
- [9] D. Gomes, J. Gutierrez, and R. Ribeiro. A mean field game price model with noise. Math. Eng., 3(4):Paper No. 028, 14, 2021.
- [10] D. Gomes, J. Gutierrez, and R. Ribeiro. A random-supply mean field game price model, 2021.
- [11] D. Gomes and J. Saúde. A mean-field game approach to price formation. Dyn. Games Appl., 11(1):29–53, 2021.
- [12] S. Hadikhanloo and F. J. Silva. Finite mean field games: Fictitious play and convergence to a first order continuous mean field game. Journal de Mathématiques Pures et Appliquées, 132:369–397, 2019.
- [13] J. Han and R. Hu. Recurrent neural networks for stochastic control problems with delay. Mathematics of Control, Signals, and Systems, 33:775–795, 2021.
- [14] C. Higham and D. Higham. Deep learning: an introduction for applied mathematicians. SIAM Review, 61(4):860–891, Nov. 2019.
- [15] M. Laurière, S. Perrin, M. Geist, and O. Pietquin. Learning mean field games: A survey, 2022.
- [16] M. Min and R. Hu. Signatured deep fictitious play for mean field games with common noise, 2021.
- [17] C. Mou, X. Yang, and C. Zhou. Numerical methods for mean field games based on Gaussian processes and Fourier features, 2021.