CORRESPONDING AUTHOR: Emiliano Dall’Anese (e-mail: [email protected]) \authornoteThis work was supported by the National Science Foundation (NSF) awards 1941896 and 2044946, and by the NSF Mathematical Sciences Graduate Internship Program.
Solving Decision-Dependent Games by Learning from Feedback
Abstract
This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic strongly-monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.
Optimization, Stochastic monotone games, Decision-dependent distribution, Learning.
1 INTRODUCTION
The efficacy of stochastic optimization [nemirovskistochastic] and stochastic games [lei2022stochastic, koshal2010single, liu2011stochastic, lei2020synchronous, franci2021stochastic] generally hinges on the premise that the underlying data distribution is stationary. This means that the distribution of the data that parameterize the problem or the game does not change throughout the execution of the algorithm used to solve the stochastic problem or game, and is neither influenced by nor dependent on time or the optimization variables themselves. This is a common setup in applications of game-theoretic frameworks to, for example, ride hailing [fabiani2022stochastic], routing [bakhshayesh2021decentralized], charging of electric vehicles (EVs) [fele2019scenario, paccagnan2018nash], power markets [kannan2011strategic], power systems [zhou2019online], and several approaches for training neural networks [franci2021training]. However, this assumption can be invalid in a variety of setups in which the cost to be minimized is parameterized by data received from populations or collections of automated control systems, whose response is uncertain and depends on the output of the optimization problem itself. As an example, in a competitive market for EV charging [fele2019scenario, tushar2012economics], the operators seek to find the charging prices (i.e., the optimization variables) that maximize the revenue from EVs; however, the expected demand (i.e., the “data” of the problem) depends on the prices themselves. More broadly, power consumption in power distribution grids depends on electricity prices [mathieu2011examining]. A similar example pertains to ride hailing [fabiani2022stochastic].
To accommodate this scenario, stochastic optimization with decision-dependent distributions (also known as performative prediction [perdomo2020performative]) replaces the stationary data distribution with a distributional map that depends on the decision variables [drusvyatskiy2022stochastic, miller2021outside, narang2022learning, perdomo2020performative, wood2023stochastic]. In this work, we study decision-dependent stochastic games in which each player seeks to minimize their own cost with respect to their own optimization variables, the cost is coupled with the other players’ optimization variables, and the data distribution of each player depends on the actions of all players (we use the terms player and agent interchangeably).
We focus on solving the Nash equilibrium problem of a game, that is, finding a decision from which no agent is incentivized by their own cost to deviate. Formally, the stochastic Nash equilibrium problem with decision-dependent distributions considered in this paper is to find a point $x^\star = (x^\star_1, \ldots, x^\star_n)$ such that
$x^\star_i \in \operatorname*{arg\,min}_{x_i \in \mathcal{X}_i} \; F_i(x_i, x^\star_{-i}), \quad i = 1, \ldots, n, \qquad (1)$
with $F_i$ defined as:
$F_i(x) := \mathbb{E}_{z_i \sim \mathcal{D}_i(x)} \left[ f_i(x, z_i) \right], \qquad (2)$
where: $z_i$ denotes a random variable supported on a set $\mathcal{Z}_i$, $f_i$ is a scalar-valued function that is convex and continuously differentiable in $x_i$, $\mathcal{X}_i$ is a compact convex set, and $\mathcal{D}_i$ is a distributional map whose output is a probability distribution supported on $\mathcal{Z}_i$.
Standard stochastic first-order methods are insufficient for solving problems of this form. As we demonstrate later in the paper, even estimating the expected gradient from samples requires knowledge of the probability density function associated with the distributional map—which is not available in a majority of practical applications.
Hereafter, we use the term “system” to refer to a population or a collection of automated controllers producing a response upon observing the decisions. To illustrate our setup, consider again the example where each agent represents an EV charging provider. Here, $x_i$ represents the charging price at a station managed by provider $i$, expressed in $/kWh. Correspondingly, $z_i$ indicates the demand for the service at that price, while $f_i$ is the service cost (or the negative of the total profit) for provider $i$. This is an example of a competitive market in which the demand for service is a function of the prices of all providers; see, for example, the game-theoretic approaches presented in [mediwaththe2018game, fele2019scenario] and the Stackelberg game presented in [tushar2012economics]. However, compared to existing game-theoretic models for EV markets, the framework proposed in this paper allows for an uncertain response of EV owners to price variations; this randomness is difficult to model, as it is related to the drivers’ preferences and other externalities such as the locations of the charging stations, as explained in, e.g., [mediwaththe2018game, latinopoulos2017response, daina2017electric].
Challenges in solving problems of this form typically stem from the fact that the distributional maps are often unknown [sun2014bayesian, den2015dynamic, cheung2017dynamic, chen2021nonparametric]. To overcome this challenge, we propose a learning-based optimization procedure – in the spirit of the methods proposed for convex optimization in [lin2023plug, miller2021outside] – to tackle the multi-player decision-dependent stochastic game. The key idea behind this framework is that we first propose a parameterization for the distributional map of the system and estimate it from responses. Then, we use the estimated distributional map throughout the game without requiring further interaction with the system.
1.1 RELATED WORK
Our work incorporates themes from games, learning, and stochastic optimization with decision-dependent distributions. We highlight the relationship with the relevant literature below.
Games. Within the context of games, our work is specifically focused on solving Nash equilibrium problems using gradient-based methods and a variational inequality (VI) framework. The literature on stochastic games is extensive; for a comprehensive yet concise review of the subject, we refer the reader to the tutorials [scutari2010convex] and [lei2022stochastic]; see also pertinent references therein. A common denominator of existing frameworks is that the data distribution is stationary. The work of [rosen1965existence] demonstrates that strictly monotone games have a unique solution and that gradient play converges to it. The modern approach of solving Nash equilibrium problems for continuous games via variational inequalities can be attributed to Facchinei and Pang [facchinei2003finite, harker1990finite]. For strongly monotone variational inequalities, the projected gradient method converges linearly.
Our work adds the complexity of minimizing communication between agents; hence, we use a distributed gradient approach in our optimization algorithm. Distributed gradient methods have been explored extensively in the literature on convex optimization, though less so in that of variational inequalities. We refer the reader to [nedic2020distributed] for a review in the convex optimization setting, and to [kovalev2022optimal] for variational inequalities.
Decision-Dependent Data. This paper contributes to the growing body of literature that studies stochastic optimization with decision-dependent data distributions. While the concept of decision-dependent distributions has existed for some time, the recent line of work originates from the “Performative Prediction” framework and its use within the machine learning community [perdomo2020performative]; this work posits the formulation of optimization problems in which the data distribution is explicitly dependent on the optimization variables, and proposes repeated retraining (and the limit points thereof) as a solution. The limit points form a family of points that solve the induced stationary optimization problem. For repeated retraining, convergence of various stochastic gradient algorithms is studied in [drusvyatskiy2022stochastic, mendler2020stochastic] in the batch setting, and in [cutler2021stochastic, wood2021online] in the time-varying setting. The extension of such problems to games includes two-player zero-sum games in [wood2021online], and general multiplayer games in [narang2022learning]. Additional recent extensions to this line of work include distributionally robust optimization [basciftci2021distributionally, luo2020distributionally] and time-varying optimization [wood2021online, cutler2021stochastic].
Recent works have since pivoted towards directly solving the optimization problem by incorporating a model for the distributional map. In [miller2021outside], the authors provide conditions that ensure convexity of the expected cost, alongside a learning-based algorithm for finding solutions to problems with location-scale families. They show that location-scale families can be learned from the system prior to optimization, without requiring further interaction with the system during optimization. The work of [lin2023plug] extends this learning framework beyond location-scale families and derives regret bounds that incorporate the approximation and statistical errors arising from the statistical learning problem. Outside of the distribution learning approach, derivative-free methods have been studied in [miller2021outside, narang2022learning, wood2023stochastic]. While these methods do not depend on a learned model of the distributional map, they generally exhibit slower convergence rates. On the other hand, adaptive learning strategies, particularly for location-scale models, have shown promise, as discussed in [narang2022learning], whereby the model is learned during optimization. However, their exploration remains limited to this specific model category.
This work complements the technical findings in [lin2023plug] by offering a generalization of the so-called “plug-in” approach presented therein to the multi-agent setting. Relative to [narang2022learning], we focus on solving the Nash equilibrium problem for a general class of models by learning the distributional map prior to optimization—thereby reducing the number of interactions with the system required to obtain a reasonable solution.
1.2 CONTRIBUTIONS
In this work, we provide the following contributions to the body of work on stochastic optimization and Nash equilibrium problems with decision-dependent distributions.
(i) We propose an algorithm for finding a Nash equilibrium in stochastic games with decision-dependent distributions where: (a) the distributional map for each player’s cost is estimated from samples, and (b) the estimated distributional map is used in gradient-based strategies.
(ii) We provide guarantees on the approximation error of distributional maps for a class of map learning problems.
(iii) We show that the parameterized cost approximates the ground-truth cost with high probability.
(iv) We propose a stochastic gradient-based algorithm for solving a parameterized strongly-monotone game, and we demonstrate linear convergence in expectation.
(v) Finally, we provide numerical simulations of an EV charging market formulation using real-world data. The EV market formulation is new in the context of energy markets, thus providing a contribution in this area as well.
1.3 ORGANIZATION
In Section 2, we provide necessary notation and background for our analysis. In Section 3, we discuss the proposed learning algorithm in detail and present our primary result. Section 4 discusses the details of the optimization stage. We provide our numerical simulations in Section 5. Proofs of the results are provided in the Appendix.
2 Notation and Preliminaries
Throughout the paper, $\mathbb{R}^d$ denotes the $d$-dimensional Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and Euclidean norm $\|\cdot\|$. For a matrix $A$, $\|A\|$ denotes the spectral norm. For a given integer $n$, $[n]$ denotes the set $\{1, \ldots, n\}$, and $\mathbb{S}^{d-1}$ denotes the Euclidean hypersphere in $d$ dimensions, $\mathbb{S}^{d-1} = \{u \in \mathbb{R}^d : \|u\| = 1\}$. The symbol $\mathbf{1}_d$ is used to denote the $d$-dimensional vector of all ones. Given vectors $u$ and $v$, we let $(u, v)$ denote their concatenation.
For a symmetric positive definite matrix $W$, the weighted inner product is defined by $\langle u, v \rangle_W := \langle W u, v \rangle$, with corresponding weighted norm $\|u\|_W := \sqrt{\langle W u, u \rangle}$, for any vectors $u$ and $v$. The weighted projection onto a closed convex set $\mathcal{X}$ with respect to the symmetric positive definite matrix $W$ is given by the map
$\operatorname{proj}^W_{\mathcal{X}}(x) := \operatorname*{arg\,min}_{y \in \mathcal{X}} \; \|y - x\|_W^2 \qquad (3)$
for any $x$.
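For intuition, the following sketch is our own illustration (not from the paper) of the weighted projection in (3) for the special case of a diagonal weight matrix and a box constraint set; in this case the problem separates across coordinates and reduces to clipping, while general sets require solving a small quadratic program.

```python
import numpy as np

def weighted_projection_box(x, lower, upper, w_diag):
    """Weighted projection onto the box [lower, upper] with respect to a
    diagonal positive definite weight matrix W = diag(w_diag).

    The objective sum_j w_j * (y_j - x_j)**2 separates across coordinates,
    so the minimizer clips each coordinate to the box; the (positive)
    weights do not change the result in this special case."""
    assert np.all(np.asarray(w_diag) > 0), "W must be positive definite"
    return np.clip(x, lower, upper)

# Example: project a vector onto per-coordinate bounds [0.10, 0.40].
x = np.array([0.45, 0.02, 0.30])
print(weighted_projection_box(x, 0.10, 0.40, w_diag=[1.0, 2.0, 0.5]))  # -> [0.4 0.1 0.3]
```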
2.1 Probability measures
Throughout this work, we restrict our focus to random variables drawn from continuous probability distributions supported over the Euclidean space. When random variables $u$ and $v$ are equal in distribution, i.e., $\mathbb{P}(u \leq t) = \mathbb{P}(v \leq t)$ for all $t$, we write $u \stackrel{d}{=} v$.
Our analysis includes the study of sub-exponential random vectors. A univariate random variable $u$ is said to be sub-exponential with modulus $\theta > 0$ provided that the survival function satisfies $\mathbb{P}(|u| \geq t) \leq 2 \exp(-t/\theta)$ for all $t \geq 0$. By extension, a random vector $u$ is sub-exponential provided that $\langle u, v \rangle$ is a sub-exponential random variable for all $v \in \mathbb{S}^{d-1}$.
To compare probability distributions, we will be interested in computing the distance between their associated probability measures—for which we need a complete metric space. We let $\mathcal{P}_1(\mathcal{Z})$ denote the set of probability measures on $\mathcal{Z}$ with finite first moment, and write the Wasserstein-1 distance as $W_1(\mu, \nu) := \sup_{g \in \mathrm{Lip}_1} \; \mathbb{E}_{u \sim \mu}[g(u)] - \mathbb{E}_{v \sim \nu}[g(v)]$
for any $\mu, \nu \in \mathcal{P}_1(\mathcal{Z})$, where $\mathrm{Lip}_1$ is the set of all $1$-Lipschitz continuous functions $g : \mathcal{Z} \to \mathbb{R}$. Under these conditions, the pair $(\mathcal{P}_1(\mathcal{Z}), W_1)$ forms a complete metric space [bogachev2012monge].
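As a concrete illustration (our own, with synthetic data), for scalar distributions represented by two samples of equal size, the empirical Wasserstein-1 distance reduces to the average absolute difference between the sorted samples:

```python
import numpy as np

def empirical_w1_1d(samples_p, samples_q):
    """Empirical Wasserstein-1 distance between two scalar samples of equal
    size: sort both samples and average the absolute differences of the
    matched order statistics."""
    p = np.sort(np.asarray(samples_p))
    q = np.sort(np.asarray(samples_q))
    assert p.shape == q.shape, "this shortcut assumes equal sample sizes"
    return np.mean(np.abs(p - q))

rng = np.random.default_rng(0)
mu_samples = rng.normal(loc=0.0, scale=1.0, size=5000)   # samples from one distribution
nu_samples = rng.normal(loc=0.5, scale=1.0, size=5000)   # samples from a shifted copy
print(empirical_w1_1d(mu_samples, nu_samples))           # close to the shift, 0.5
```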
2.2 Games
We consider a game that consists of $n$ players. Each player $i$ has a cost function $F_i$, distributional map $\mathcal{D}_i$, and decision set $\mathcal{X}_i$. Hence, each player chooses a decision, or strategy, $x_i \in \mathcal{X}_i$. The concatenation of the decision variables is written as $x := (x_1, \ldots, x_n) \in \mathcal{X}$, where $\mathcal{X} := \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$. For a fixed agent $i$, we decompose the decision as $x = (x_i, x_{-i})$, where $x_{-i}$ is the strategy vector of all agents excluding the $i$th one.
The collection of costs and decision sets defines the game
(4) |
A Nash equilibrium of this game is a point $x^\star \in \mathcal{X}$ such that
(5) |
for all . Intuitively, is a strategy such that no agent can be incentivized by its cost to deviate from when all other agents play . Finding Nash equilibria is the primary focus of this work.
Games of this form are commonly cast into a variational inequality framework. This is due, in part, to the observation that the Nash equilibria are the solutions to the variational inequality
where the gradient map is defined as
(6) |
Here, the notation is used to represent the partial gradient . We will denote the set of Nash equilibria of a game with gradient map and domain as . Existence of solutions to variational inequalities of this form is guaranteed provided that the set is convex and compact and the gradient map is monotone; uniqueness is guaranteed when is strongly-monotone [facchinei2003finite]. We say that is -strongly-monotone on provided that there exists such that
(7) |
and monotone when . In this work, we primarily focus on strongly-monotone games. While monotone games are tractable, methods for solving them with decision-dependent distributions require alternative gradient estimators—a topic we leave to future work.
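As a simple numerical sanity check (our own, with a hypothetical gradient map), one can probe the strong-monotonicity modulus in (7) by sampling pairs of points and evaluating the defining inequality; for a linear map $G(x) = Ax$ the modulus equals the smallest eigenvalue of the symmetric part of $A$.

```python
import numpy as np

def estimate_monotonicity_modulus(G, dim, n_pairs=20000, box=1.0, seed=0):
    """Monte-Carlo estimate of the largest mu satisfying
    <G(x) - G(y), x - y> >= mu * ||x - y||**2 over the box [-box, box]^dim.
    The sampled minimum over-estimates the true modulus and tightens as
    n_pairs grows."""
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(n_pairs):
        x = rng.uniform(-box, box, dim)
        y = rng.uniform(-box, box, dim)
        d = x - y
        ratios.append(np.dot(G(x) - G(y), d) / np.dot(d, d))
    return min(ratios)

A = np.array([[2.0, 0.5], [-0.5, 1.0]])                  # hypothetical gradient map G(x) = A x
mu_hat = estimate_monotonicity_modulus(lambda x: A @ x, dim=2)
mu_true = np.linalg.eigvalsh((A + A.T) / 2).min()        # exact modulus, 1.0 here
print(mu_hat, mu_true)
```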
2.3 Monotonicity in Decision-Dependent Games
In this work, we add to the formulation in (4) the additional complexity that the costs are expectations over a distributional map. In particular, we write the cost as
(8) |
This can be written alternatively as the integral
(9) |
where is the probability density function for the distribution . When the conditions of the Dominated Convergence Theorem hold, computing the gradient amounts to differentiating under the integral and using the product rule. We then obtain
(10) |
where we recall that . In short, characterizing the gradient of this decision-dependent game requires assumptions not only on , but also on the properties of the distributional map . Sufficient conditions for strong monotonicity of the game in (4) are due to [narang2022learning] and are stated in terms of the decoupled costs, given by
(11) |
for all , and their associated decoupled partial gradients
(12) |
for all and
(13) |
for all . A key observation used in the proof is that .
Theorem 1.
(Strong Monotonicity, [narang2022learning]) Suppose that,
(i) For all , is -strongly monotone,
(ii) For all , is monotone,
and that for all ,
(iii) For all , is -Lipschitz continuous,
(iv) is -Lipschitz continuous on .
Set . Then if , is -strongly monotone.
3 Learning-based Decision-Dependent Games
In this work, we aim to solve the stochastic Nash equilibrium problem with decision-dependent data distributions as formulated in (1). Methods for finding Nash equilibria in games with decision-dependent data distributions either use derivative-free optimization, at the expense of an extremely slow rate, or use derivative information in conjunction with a learned model of the distributional map [narang2022learning].
In [lin2023plug], it is shown that a “plug-in” optimization approach, whereby a model for the distributional map is learned from samples prior to optimization, yields a bounded excess risk for convex optimization problems with decision-dependent data. In this work, we leverage the properties of the system to simplify the communication structure of our approach, which we depict in Figure 1. We assume that realizations of can be directly observed from the system, and that the decisions can be obtained from a server or are made public (for example, the prices of EV charging of different providers can be observed at the various stations).
To accommodate this setting, our algorithm proposes a multistage approach consisting of the following phases: (i) sampling; (ii) learning; (iii) optimization. It is important to note that, following the learning phase, players only need to participate in gradient play without receiving any additional feedback from the system. This is distinct from existing approaches in which performatively stable points can only be reached after several (even thousands of) rounds of feedback [perdomo2020performative, narang2022learning, wood2023stochastic], and performatively optimal points can only be reached for models known to be location-scale families a priori [miller2021outside, narang2022learning].

Sampling. In the sampling phase, we require that players collaborate by each deploying a set of decisions so that they can collectively receive feedback from the system (in response to their deployed decisions). The result is that each agent has access to a dataset which they can use to learn their distributional map.
Learning. In this procedure, each player will choose a hypothesis class of parameterized functions
(14) |
as well as a suitable criterion or risk function , to formulate their own expected risk minimization problem
(15) |
over the random variable drawn from the coupled distribution . Then, using the set of samples from the previous sampling phase, they can formulate the corresponding empirical risk minimization problem
(16) |
The result is a learned distributional map approximating , which we can now use to solve the approximate Nash equilibrium problem.
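To make the sampling and learning phases concrete, the following sketch is our own illustration with synthetic data and illustrative names: it fits a linear model z ≈ z0 + A x for one agent’s distributional map by ordinary least squares on a dataset of deployed decisions and observed responses, one simple instance of the empirical risk minimization problem in (16).

```python
import numpy as np

def fit_linear_distributional_map(X, Z):
    """Least-squares fit of a location-scale style model z ~ z0 + A x from a
    dataset of deployed decisions X (m x d) and observed responses Z (m x p).
    Returns the intercept z0 and the sensitivity matrix A (p x d)."""
    m = X.shape[0]
    X_aug = np.hstack([np.ones((m, 1)), X])             # prepend an intercept column
    theta, *_ = np.linalg.lstsq(X_aug, Z, rcond=None)   # closed-form ERM solution
    return theta[0], theta[1:].T

# Synthetic sampling phase: deploy random decisions, observe noisy responses.
rng = np.random.default_rng(1)
A_true, z0_true = np.array([[-2.0, 0.5]]), np.array([5.0])
X = rng.uniform(0.1, 0.4, size=(500, 2))                          # deployed decisions
Z = X @ A_true.T + z0_true + 0.1 * rng.standard_normal((500, 1))  # observed responses
z0_hat, A_hat = fit_linear_distributional_map(X, Z)
print(z0_hat, A_hat)   # close to z0_true and A_true
```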
Optimization. Following the learning phase, each player now has a learned model of their distributional map, which can be used to formulate an approximation of the ground-truth cost and hence an approximate Nash equilibrium problem:
(17) |
for all , where
(18) |
Hereafter, we denote the Nash equilibrium of the approximate game as to distinguish it from the ground truth . In Algorithm 1, we write the set of Nash equilibria for the operator with domain as . In practice, we will impose the assumptions needed to guarantee uniqueness of this assignment, in which case the set inclusion is simply an equality.
By solving (17) instead of (1), we have introduced two errors: (i) the approximation error incurred by representing the distributional map with elements of the hypothesis class , and (ii) the estimation or statistical error incurred by solving the ERM problem instead of the expected risk minimization problem. In [lin2023plug], the main result demonstrates that these two sources of error propagate through the optimization problem, and that the resulting excess risk can be bounded in terms of the sample complexity . Our goal is to extend this result and provide additional analysis for our setting.
3.1 Parameter Estimation for Regular Problems
A critical component of our analysis is the estimation or learning of the distributional map and the subsequent characterization of the estimation error. In this section, we outline a class of expected risk minimization problems, which we call regular problems, for which we can characterize the distance between expected risk minimization solutions and empirical risk minimization solutions. Throughout, we write and for to denote the expected and empirical risk, respectively.
Definition 1.
(Map Learning Regularity) A map learning problem, consisting of the optimization problems with costs and over , is regular provided that:
(a) Convexity: The expected risk is -strongly convex, and the empirical risk is convex.
(b) Smoothness: For all realizations of and , is -Lipschitz continuous.
(c) Boundedness: The set is convex and compact.
(d) Sub-Exponential gradient: For all , is a sub-exponential vector with parameter .
Items (a) and (c), taken together, guarantee existence of and uniqueness of as defined in (16) and (15), respectively. Furthermore, the inclusion of item (b) is necessary to guarantee that first-order stochastic gradient methods will converge at least sub-linearly to . Lastly, the sub-exponential tail assumption in item (d) [vershynin2018high] allows us to describe the concentration of the gradient estimates. Together, these assumptions allow us to relate the solutions to the sample complexity in the following lemma.
Lemma 2.
(Uniform Gradient Bound) If the smoothness and sub-exponential gradient assumptions in Definition 1 hold for player , then for any and any such that , we have that:
(19) |
with probability at least , where .
The proof of this result is provided in Appendix 6.1. This result offers a broad generalization of [mou2019diffusion, Equation (19b)] to any risk with Lipschitz-continuous sub-exponential gradients over any convex and compact set. Our result is comparable to the rate that can be found for specific problem instances such as linear least squares regression and logistic regression, but with an additional logarithmic factor. Indeed, the generality of the risk function requires that we enforce compactness of the domain, thus giving rise to this extra logarithmic factor. This gradient estimation result will now allow us to reach our desired bounded-distance result, which we present in the following theorem.
Theorem 3.
(ERM Approximation) If the map learning problem is regular for player (i.e., it satisfies the assumptions in Definition 1), then for any and any such that we have that:
(20) |
with probability at least , where .
The proof of Theorem 3 is provided in Appendix 6.2. The power of this characterization lies in the fact that it holds for any statistical learning problem satisfying the assumptions listed in Definition 1, and is not specific to the setting of learning distributional maps. We note that our Definition 1, which is used in Theorem 3, is different from the one in [lin2023plug] and involves conditions that are easier to check.
As an example, we provide conditions under which a linear least squares problem satisfies the regularity conditions and hence is subject to the above ERM approximation result.
Proposition 4.
(Linear Least Squares Regularity) Consider the linear least squares problem with expected risk minimization problem
and empirical risk minimization problem
Let with zero mean and covariance matrix . If
(i) There exist such that ,
(ii) The entries of and are sub-exponential,
(iii) The constraint set is convex and compact.
Then, the map learning problem is regular.
3.2 Bounding the Approximation Error
Finding a relationship between and will require that we first characterize an appropriate hypothesis class of distributions for learning. Here, we formalize the notions of misspecification and sensitivity for a hypothesis class .
Definition 2.
(Misspecification [lin2023plug]) A hypothesis class is -misspecified provided that there exists a such that
(21) |
for all .
Observe that if , then our approximation error is zero and we incur no bias in our problem formulation. This corresponds to the case where the map is realizable [shalev2014understanding]. However, small nonzero values of are to be expected, as may be complex enough that approximating it with a finite-dimensional parameter space is difficult.
Definition 3.
(Sensitivity [lin2023plug]) The hypothesis class is -sensitive if, for any ,
(22) |
for all .
Sensitivity of is merely a convenient name for the condition that be -Lipschitz continuous for all realizations of . In the result that follows, we demonstrate that an appropriately misspecified and sensitive hypothesis class induces a cost that has bounded distance to the ground-truth cost in (1).
Theorem 5.
(Bounded Approximation) Suppose that the following conditions hold for all :
(i) The hypothesis class is -misspecified, and -sensitive.
(ii) The map learning problem is regular.
(iii) For all , is -Lipschitz continuous.
Then, the bound
(23) |
holds with probability for any , where
(24) |
and where is as in .
The proof of Theorem 5 is provided in Appendix 6.4. By including an additional assumption on the Lipschitz continuity of the cost with respect to the other agents’ decisions, we can provide an analog of the excess risk result in [lin2023plug].
Corollary 6.
Suppose that for all , the function is -Lipschitz on for any . Then,
(25) |
holds with probability for any and , , and is as in (20).
The analysis in this section demonstrates that the estimation procedure in Algorithm 1 yields a cost function that approximates the original cost in (1) with an error that decreases as the number of samples increases. Furthermore, this bound holds independently of the conditioning of the Nash equilibrium problem we solve in the optimization phase. We note that (23) is similar to the result in [lin2023plug], but it is based on a different definition of regular problem (see Definition 1); the bound (25) is unique to this paper.
In the section that follows, we examine a family of hypothesis classes that allows the approximated game to be monotone, and provide suitable algorithms for solving them with convergence guarantees.
4 Solving Strongly-monotone Decision-dependent Games
Since the agents lack full knowledge of the system, and hence of the ground-truth distributional map in (1), we cannot hope to enforce that it satisfies any assumptions that encourage tractability of our optimization problem. We can, however, impose conditions on the hypothesis class , which is chosen by the agents. To successfully find a Nash equilibrium of the approximate problem in (17), it will be crucial that agents choose a class that balances expressiveness of the system (thereby making small) with tractability of the optimization.
Perhaps the simplest model capable of achieving this goal is the location-scale family [miller2021outside, narang2022learning, wood2023stochastic]. In our setting, a location-scale family parameterization for agent is a distributional map having matrix parameter where if and only if
(26) |
for stationary random variable . We note that this parameterization can be written alternatively as , where and are block matrices such that due to linearity. The resulting partial gradient has the form
which is typically much simpler to analyze than alternative models. Intuitively, this model allows us to express as the sum of a stationary random variable from a base distribution and a linear term depending on , where the matrix parameter weights the responsiveness of the population to the agents’ decisions.
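To illustrate, the following sketch is our own two-player example with hypothetical quantities: it draws a response from a learned location-scale map z = z0 + A x and forms the corresponding one-sample partial gradient of the expected cost, where the chain rule contributes the extra term involving the block of A acting on the player’s own decision.

```python
import numpy as np

def stochastic_partial_gradient(i, x, z0_sample, A, grad_x_f, grad_z_f, blocks):
    """One-sample estimator of the partial gradient of E_{z ~ D_i(x)}[f_i(x, z)]
    under a location-scale model z = z0 + A x. Reparameterizing with the
    stationary sample z0, the chain rule gives
        grad_{x_i} f_i(x, z) + A_ii^T grad_z f_i(x, z),
    where A_ii is the block of A acting on player i's own decision x_i."""
    z = z0_sample + A @ x                 # realized response to the decision x
    sl = blocks[i]                        # index slice of x_i inside x
    return grad_x_f(x, z)[sl] + A[:, sl].T @ grad_z_f(x, z)

# Hypothetical two-player pricing example: f_1(x, z) = -x_1 * z (negative revenue).
blocks = [slice(0, 1), slice(1, 2)]
A1 = np.array([[-2.0, 0.5]])              # learned sensitivity matrix for player 1
rng = np.random.default_rng(2)
g1 = stochastic_partial_gradient(
    0, x=np.array([0.30, 0.25]),
    z0_sample=np.array([5.0]) + 0.1 * rng.standard_normal(1),
    A=A1,
    grad_x_f=lambda x, z: np.array([-z[0], 0.0]),   # gradient of -x_1 * z in x
    grad_z_f=lambda x, z: np.array([-x[0]]),        # gradient of -x_1 * z in z
    blocks=blocks,
)
print(g1)
```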
This model is particularly appealing as guarantees for learning are known and established in Proposition 4. Moreover, the matter of expressiveness is due to the fact that location-scale families are a particular instance of strategic regression [perdomo2020performative, lin2023plug], in which members of the population interact with agents by modifying their stationary data (such as features in a learning task) in an optimal way upon observing :
where is a utility function parameterized by corresponding to the utility that members of the population derive from changing their data in response to the decisions in ; and the quadratic term is the cost of changing their data from to . Indeed when for , we recover the form above.
Furthermore, location-scale families immediately satisfy several of the assumptions required for further analysis. In particular, it is known that Sensitivity (Definition 3) holds with , Lipschitz continuity of holds with , and Lipschitz continuity of holds due to the following result.
Lemma 7.
(Lipschitz Gradient, [narang2022learning]) Suppose that is such that with , and that for each there exists such that is -Lipschitz continuous. Then is -Lipschitz continuous with
(27) |
Strong monotonicity will follow from Theorem 1 provided that the remaining hypotheses on the costs are satisfied—which must be checked on a case-by-case basis. We will not require this parameterization in our analysis; however, we can proceed with the knowledge that a model class satisfying our hypotheses does exist.
4.1 Distributed Gradient-based Method
In our optimization phase, we seek to use a gradient-based algorithm that respects the agents’ communication structure with the system. For the sake of readability, we will suppress the subscript and refer to quantities with the understanding that they correspond to the approximate Nash equilibrium problem in (17) with solution .
We will assume that each agent has access to an estimator of the gradient and is capable of projecting onto their decision set . In the constant step-size setup, each agent chooses a rate and performs the update
where is a stochastic gradient estimator for used at iteration , which is then reported to the system and made available to all agents. For the sake of analysis, we will assume without loss of generality that the step-sizes satisfy the ordering
and hence and . The collective update can be written compactly as
(28) |
where and is an estimator for at iteration . A minimal sketch of one round of this update is provided below; convergence of this procedure hinges on the following assumptions.
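The sketch is our own illustration of the constant step-size update (28), with placeholder gradient estimates and step sizes: each agent updates its own block with its own step size and projects onto its decision set, and only the concatenated decision is shared.

```python
import numpy as np

def gradient_play_step(x_blocks, grad_estimates, step_sizes, projections):
    """One round of distributed projected stochastic gradient play: agent i
    updates its own decision block with its own step size and projects onto
    its decision set; only the updated decisions are shared afterwards."""
    return np.concatenate([
        projections[i](x_i - step_sizes[i] * g_i)
        for i, (x_i, g_i) in enumerate(zip(x_blocks, grad_estimates))
    ])

# Example with two agents, scalar decisions constrained to a price interval.
project_interval = lambda v: np.clip(v, 0.10, 0.40)
x = [np.array([0.30]), np.array([0.25])]
grads = [np.array([-1.2]), np.array([0.8])]          # stochastic gradient estimates
x_next = gradient_play_step(x, grads, step_sizes=[0.05, 0.05],
                            projections=[project_interval, project_interval])
print(x_next)   # -> [0.36 0.21]
```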
Assumption 1.
The gradient function is -strongly monotone and -Lipschitz continuous.
Assumption 2.
(Stochastic Framework) Let with elements
(29) |
be the natural filtration of the Borel -algebra over with respect to , and use the short-hand notation as the conditional expectation over the product distribution . There exist bounded sequences such that
(Bias)
(Variance)
where and for all .
Assumption 1 is standard for guaranteeing convergence of gradient play [facchinei2003finite], and the uniformly bounded variance component of Assumption 2 is standard for convergence of stochastic algorithms. As we will show shortly, convergence with bias is possible, and the result reduces to the unbiased case when . The next result quantifies the one-step improvement of (28).
Lemma 8.
The proof of Lemma 8 is provided in Appendix 6.5. We note that setting for some recovers the result in [narang2022learning, Theorem 15]. Following this one-step analysis, we can show convergence to a neighborhood of the Nash equilibrium.
The proof can be found in Appendix 6.6. The result shows that the algorithm converges linearly to a neighborhood of the Nash equilibrium , where the radius of the neighborhood is dictated by the step-size, variance, and bias bounds. When , we retrieve linear convergence. In order to converge to directly, we will require a decaying step-size policy. For example, we consider the following policy:
(31) |
for a fixed constant , which we assume to be shared by all agents. Hence, the decaying step-size update is given by
(32) |
In the theorem that follows, we show that this sequence converges to provided that the bias shares an asymptotic rate with .
Theorem 10.
The proof of this result follows by a standard induction argument, and can be found in Appendix 6.7.
5 Numerical Experiments on Electric Vehicle Charging
In this section, we consider a competitive game between distinct electric vehicle charging station operators, where stations are equipped with renewable power sources. The goal of each player is to set prices that maximize their own profit in a system where demand for their station changes in response to the prices set by the competing stations as well. The cost function (negative profit) takes the form
where for all . The renewable profit and operational cost terms allow us to describe the trade-off between profit from renewable power generation sold to the grid at rate , and surplus power required from the grid to meet demand at rate . To set prices, we can formulate a Nash equilibrium problem over the expected costs for and , where is the interval of price values between the wholesale and retail price.
Since the set of reasonable prices will be quite small, we hypothesize that the price and demand have a linear relationship of the form where with corresponding to the base demand. Since we have a simple model, the first and second derivatives can be computed in closed form, and the relevant constants can be computed directly. Indeed, we find that the hypotheses of Theorem 1 are satisfied with , which we set to , , and . We conclude that is -strongly monotone with , where is the parameter matrix whose columns are .


Our data depict the electricity demand over an hour-long period for 6 ports of varying power profiles for each day in a year. We standardize the data to be zero mean and unit variance for each station. Solutions are calculated by performing expected gradient play with a constant step size; the expectation is estimated via the empirical mean over the data set.
We set and , where we use to simulate learning from samples. Hence, demand for an agent decreases as their own price increases, and increases as the prices of the other agents increase. We run the stochastic gradient play algorithm initialized at with a single sample at each round and a decaying step-size policy for . In Figure 3, we plot the mean error trajectory and a confidence interval over 50 trials of 2000 iterations.
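The following sketch is our own simplified version of this experiment with hypothetical constants: demand follows the assumed linear model z = z0 + A x, each provider uses a single demand sample per round, a negative-revenue cost stands in for the full cost (the renewable and operational terms are omitted), and a decaying step size drives the iterates toward the equilibrium prices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2                                          # two charging providers
A = np.array([[-2.0, 0.5],                     # assumed demand sensitivities: own price lowers
              [0.5, -2.0]])                    # demand, rivals' prices raise it
z0_mean = np.array([1.0, 1.0])                 # base demand (standardized units)
lo, hi = 0.10, 0.40                            # wholesale and retail price bounds
c0 = 1.0                                       # decaying step-size constant

def stoch_grad(x):
    """Single-sample gradient of the simplified cost f_i = -x_i * z_i with
    z = z0 + A x + noise:  grad_i = -(z_i + A_ii * x_i)."""
    z = z0_mean + 0.1 * rng.standard_normal(n) + A @ x
    return -(z + np.diag(A) * x)

x = np.full(n, lo)                             # initialize at the lower price bound
for t in range(2000):
    eta = c0 / (t + 1)                         # decaying step-size policy
    x = np.clip(x - eta * stoch_grad(x), lo, hi)
print(x)                                       # approximate equilibrium prices (about 0.29 each)
```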
6 CONCLUSIONS
In this work, we studied a class of stochastic Nash equilibrium problems characterized by data distributions that are dependent on the decisions of all involved players. We showed that a learning-based approach enables the formulation of an approximate Nash equilibrium problem that is solvable using a stochastic gradient play algorithm. The result of this procedure is a cost that can be related to the cost of the original Nash equilibrium problem via an error that depends on both the approximation and estimation errors. To demonstrate the flexibility of these findings, we simulated these techniques in an electric vehicle charging market problem in which service providers set prices, and users modify their demand based on the prices set by providers. Future research will look at a scenario where the estimate of the distributional map is improved during the operation of the algorithm, based on the feedback received. Future applications will demonstrate the efficacy of more complex models (beyond linear models).
APPENDIX
6.1 Proof of Lemma 2
For the sake of notational convenience and visual clarity, we suppress the index throughout the proof. We denote the gradient error by for all .
To begin, we will generate coverings for the unit sphere in and and use a discretization argument to create bounds over these finite sets. Fix and . Let be an arbitrary -covering of the sphere with respect to the Euclidean norm. From [wainwright_2019, Lemma 5.7], we know that . From our covering, we have that there exists in the covering such that . Hence,
Since this is true for any , then it holds for . Thus the above becomes
(35) |
Now we fix , and choose an -covering for the set , which we will write as . Recall that is bounded, so there exists a constant such that for all , . Hence . From [vershynin2018high, Proposition 4.2.12], we have that
(36) |
Thus, we conclude that .
Now by our discretization argument, there exists such that and hence
We observe that if are such that , then applying our smoothness assumption yields
where the second-to-last inequality uses .
To bound the remaining term, we use the concentration of sub-exponential random variables, due to Bernstein’s Inequality combined with the Union Bound. We have that
for all , and hence
for all , where we used the fact that and . Setting the right hand side equal to yields
(37) |
Next we choose so that
By requiring that satisfy , we enforce that . Combining, we observe that
and the result follows.
6.2 Proof of Theorem 3
We suppress the subscript for notational simplicity. We recall that the -strong convexity of the map implies -strong monotonicity of , and . It follows that
and hence
(38) |
The result now follows by applying Lemma 2.
6.3 Proof of Proposition 4
We suppress the index throughout. The associated risk function is , so that and are the corresponding gradient and Hessian. We observe that enforcing for some ensures -strong convexity and -smoothness of the expected risk. Similarly, the empirical risk has gradient , and Hessian . Since the Hessian is symmetric and positive semi-definite, the empirical risk is convex. Furthermore, smoothness of follows with constant . Lastly, since and have sub-exponential entries, the gradient is sub-exponential and the result follows.
6.4 Proof of Theorem 5
We observe that for any fixed , we have that . The first term describes our statistical error at . We denote as a coupling on so that
By a similar argument, we find that . Combining, we get . Lastly, can be bounded as in Theorem 3.
Regarding the second bound, we have that
where , where is the set of equilibria of
(39) |
where
Then,
where we have used the inequality for some .
6.5 Proof of Lemma 8
Consider the function defined by for all . Then, is -strongly convex over and has a unique minimizer . This implies that:
Since for all , we obtain
It follows that
We now consider the above in the conditional expectation with . We find that
To proceed, we bound the inner product terms. Using strong monotonicity, we have that
Furthermore, we observe that
To bound the remaining terms, we use arguments based on a weighted Young’s inequality. Let be fixed constants. It follows that
and
Additionally, we have that
Combining these estimates yields
and simplifying gives
To proceed, we choose and to ensure that the coefficient on the term is zero. Furthermore, enforcing that guarantees that . Hence the variance term is finite. Substituting these values and simplifying yields the result.
6.6 Proof of Theorem 9
For notational convenience, we will use the short-hand notation , , and
Hence, the result in Lemma 8 can be written compactly as
By recursively applying this result and applying the law of total expectation, we find that
Furthermore, if , then and the geometric series converges and is equal to its limit . Hence .
6.7 Proof of Theorem 10
Fix . For notational convenience, we will denote . Replacing the step-size matrix in Lemma 8 with yields
(40) |
To proceed, we will use the observation that
(41) |
and
(42) |
By substituting our expression for , , and into (40) we obtain
Here, the last steps follow from construction of , and the fact that .
ACKNOWLEDGMENTS
This research was supported in part by the National Science Foundation (NSF) Mathematical Sciences Graduate Internship (MSGI) Program and by NSF awards 1941896 and 2044946.
The MSGI program is administered by the Oak Ridge Institute for Science and Education (ORISE) through an inter-agency agreement between the U.S. Department of Energy (DOE) and NSF. ORISE is managed for DOE by ORAU. All opinions expressed in this paper are the authors’ and do not necessarily reflect the policies and views of NSF, ORAU/ORISE, or DOE.
Killian Wood is a Ph.D. candidate in applied mathematics at the University of Colorado Boulder, advised by Prof. Emiliano Dall’Anese. He received a B.A. degree in mathematics from California State University Fullerton, Fullerton, California, USA, in 2019 and an M.S. degree in applied mathematics from the University of Colorado Boulder, Boulder, Colorado, USA, in 2022. His research interests include the development and analysis of stochastic optimization for decision-dependent data distributions, as well as randomized derivative-free methods in scientific computing.
Ahmed S. Zamzam (Member, IEEE) is a senior research scientist at the National Renewable Energy Laboratory, where he is part of the Power Systems Engineering Center. He received the B.Sc. degree from Cairo University in 2013 and the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, MN, USA, in 2019. He received the Louis John Schnell Fellowship (2015) and the Doctoral Dissertation Fellowship (2018) from the University of Minnesota. His research interests include machine learning and optimization for smart grid applications, large-scale complex energy systems optimization, and grid data analytics.
Emiliano Dall’Anese (Member, IEEE) is an Associate Professor in the Department of Electrical, Computer, and Energy Engineering at the University of Colorado Boulder, where he is also an affiliate Faculty with the Department of Applied Mathematics. He received the Ph.D. in Information Engineering from the Department of Information Engineering, University of Padova, Italy, in 2011. He was with the University of Minnesota as a postdoc (2011-2014) and the National Renewable Energy Laboratory as a senior researcher (2014-2018). His research interests span the areas of optimization, control, and learning; current applications include power systems and autonomous systems. He received the National Science Foundation CAREER Award in 2020, the IEEE PES Prize Paper Award in 2021, and the IEEE Transactions on Control of Network Systems Best Paper Award in 2023.