
Dynamic programming approach for continuous-time Stackelberg games

Camilo Hernández (Princeton University, ORFE department, USA; [email protected]), Nicolás Hernández Santibáñez (Departamento de Matemática, Universidad Técnica Federico Santa María, Chile; [email protected]), Emma Hubert (Princeton University, ORFE department, USA; [email protected]; research partially supported by the NSF grant DMS-2307736), and Dylan Possamaï (ETH Zürich, Department of Mathematics, Rämistrasse 101, 8092 Zürich, Switzerland; [email protected]).
Abstract

In this paper, we provide a general approach to reformulating any continuous-time stochastic Stackelberg differential game under closed-loop strategies as a single-level optimisation problem with target constraints. More precisely, we consider a Stackelberg game in which the leader and the follower can both control the drift and the volatility of a stochastic output process, in order to maximise their respective expected utility. The aim is to characterise the Stackelberg equilibrium when the players adopt ‘closed-loop strategies’, i.e. their decisions are based solely on the historical information of the output process, in particular excluding any direct dependence on the underlying driving noise, which is often unobservable in real-world applications. We first show that, by considering the—second-order—backward stochastic differential equation associated with the continuation utility of the follower as a controlled state variable for the leader, the latter’s unconventional optimisation problem can be reformulated as a more standard stochastic control problem with stochastic target constraints. Thereafter, by adapting the methodology developed by Soner and Touzi [67] and Bouchard, Élie, and Imbert [14], the optimal strategies, as well as the corresponding value of the Stackelberg equilibrium, can be characterised through the solution of a well-specified system of Hamilton–Jacobi–Bellman equations. For a more comprehensive insight, we illustrate our approach through a simple example, facilitating detailed theoretical and numerical comparisons with the solutions under the different information structures studied in the literature.

Key words: Stackelberg games, dynamic programming, second-order backward SDE, stochastic target constraint.

AMS 2000 subject classifications: Primary: 91A65; secondary: 60H30, 93E20, 91A15.

1 Introduction

The concept of a hierarchical, or bi-level, solution for games was originally introduced by von Stackelberg in 1934, to describe market situations in which some firms have power of domination over others, see [76]. In the simple context of a two-player non–zero-sum game, this solution concept, now commonly known as Stackelberg equilibrium, is used to describe a situation where one of the two players—called the leader (she)—announces her strategy first, after which the second player—called the follower (he)—optimally reacts to the leader’s strategy. Therefore, in order to determine her optimal strategy, the leader should naturally anticipate the follower’s reaction to any given strategy and then choose the one that will optimise her reward function, given the follower’s best response. As such, a Stackelberg equilibrium is characterised by the pair of the leader’s optimal action and the follower’s rational response to that action. This type of solution concept is particularly relevant in situations where the players have asymmetrical power, as in the original market situation described by von Stackelberg, or when one player has more information than the other. For example, Stackelberg equilibria naturally arise in games where only one of the two players knows both players’ cost or reward functions, or when one player is more time-efficient than the other at determining her optimal strategy.

Dynamic Stackelberg games.

After its introduction, this equilibrium concept was thoroughly studied in static competitive economics, but the mathematical treatment of its dynamic version was not developed until the 70s, first in discrete-time models by Cruz Jr. [22, 23], Gardner and Cruz Jr. [31], Başar and Selbuz [8, 9], and then, more interestingly for us, in continuous-time ones by Chen and Cruz Jr. [18], Simaan and Cruz Jr. [64, 65, 66], Papavassilopoulos and Cruz Jr. [56, 57], Papavassilopoulos [55], Başar and Olsder [6], Başar [4], and Bagchi [2]. For instance, Chen and Cruz Jr. [18] investigate Stackelberg solutions for a two-player non–zero-sum dynamic game with finite horizon $T>0$, in which both players can observe the state $X$ and its dynamics, but only the leader knows both reward functions.

In this two-player game, the leader first chooses her control $\alpha\in{\cal A}$ to minimise her cost function $J_{\rm L}$, and then the follower wishes to minimise his cost function $J_{\rm F}$ by choosing his own control $\beta\in{\cal B}$, given admissibility sets ${\cal A}$ and ${\cal B}$. In this dynamic setting, the cost functions take the form

\displaystyle J_{\rm L}(\alpha,\beta)\coloneqq g_{\rm L}(X_{T})+\int_{0}^{T}f_{\rm L}(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t,\;\text{and}\;J_{\rm F}(\alpha,\beta)\coloneqq g_{\rm F}(X_{T})+\int_{0}^{T}f_{\rm F}(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t,

and both optimisation problems are subject to the following dynamics for the state process

\displaystyle\mathrm{d}X_{t}=\lambda(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t,\;t\in[0,T],\;X_{0}=x_{0}.

A strategy $(\alpha^{\star},\beta^{\star})$ is called a Stackelberg equilibrium if, for any $\alpha\in{\cal A}$

\displaystyle J_{\rm L}(\alpha^{\star},\beta^{\star})\leq J_{\rm L}(\alpha,b^{\star}(\alpha)),\;\text{where}\;b^{\star}(\alpha)\coloneqq\operatorname*{argmin}_{\beta\in{\cal B}}J_{\rm F}(\alpha,\beta),\;\text{and}\;\beta^{\star}\coloneqq b^{\star}(\alpha^{\star}).

More importantly, they also introduce two different refinements of the notion of Stackelberg solutions, depending on the information available to the players: open-loop, in which the players’ strategies are decided at time 0 as a function of the initial state, and feedback, in which the value of their strategies at time $t$ can only depend on the current state. In particular, they show that these different strategies lead in general to different solutions. This classification of Stackelberg equilibria proved to be crucial in the subsequent literature, especially when studying stochastic dynamic Stackelberg games. Unsurprisingly, it will also be at the crux of our analysis in this paper.

Stochastic Stackelberg games.

The pioneering works dealing with stochastic versions of Stackelberg games also date back to the late 70s, with the discrete-time models of Castanon [17], Başar [3], and Başar and Haurie [5]. Başar and Olsder [7, Chapter 7] give an overview of the theory of Stackelberg games at the time, i.e., static, deterministic discrete- and continuous-time, and stochastic discrete-time. One had to wait for the influential work by Yong [80] to see the literature on Stackelberg equilibria start incorporating continuous-time stochastic models. In this framework, the output process can be defined as the solution to a stochastic differential equation of the following form

\displaystyle\mathrm{d}X_{t}=\sigma(t,X_{t},\alpha_{t},\beta_{t})\big{(}\lambda(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t+\mathrm{d}W_{t}\big{)},\;t\in[0,T],\;X_{0}=x_{0}, (1.1)

where $W$ is a Brownian motion, and the controls $\alpha$ and $\beta$ are chosen by the leader and the follower, respectively. As already mentioned, the information available to the players plays a crucial role when determining the solution concept. In [80], the author relies on the stochastic maximum principle to provide the open-loop solution to a linear–quadratic Stackelberg game, where both players can control the drift and the volatility of the state variable. Open-loop solutions are also studied, for example, by Øksendal, Sandal, and Ubøe [54] and Moon [51] in jump–diffusion models, and by Shi, Wang, and Xiong [62] in a linear–quadratic framework with asymmetric information. In parallel, feedback solutions are investigated using the dynamic programming approach, for instance by He, Prasad, and Sethi [37] in a specific model for cooperative advertising and pricing, or by Bensoussan, Chen, and Sethi [10] in an infinite-horizon model. This approach was further extended by Huang and Shi [38] to a finite-horizon problem with volatility control.

Similar to Nash equilibrium concepts, one can also consider so-called closed-loop Stackelberg solutions, where the strategies of both players can depend in particular on the trajectory of the state variable. However, as mentioned for example in Başar and Olsder [7] and Simaan and Cruz Jr. [64], closed-loop equilibria are notoriously hard to study, even in simple dynamic games. One work in this direction is Bensoussan, Chen, and Sethi [11], which extends the stochastic maximum principle approach to characterise adapted closed-loop memoryless Stackelberg solutions and, in a linear–quadratic framework, provides a comparison with the open-loop equilibrium. Li and Shi [46, 47] also discuss, within a linear–quadratic framework, what they call ‘closed-loop solvability’, but they too restrict to memoryless strategies, and their approach is thus similar to the one developed previously in [11]. Finally, one should also mention the paper by Li, Xu, and Zhang [42], which studies closed-loop strategies, but with one-step memory, in a deterministic and discrete-time setting.

While we defer to Section 2.1 the precise definitions of open-loop, feedback, and closed-loop Stackelberg solutions in a stochastic continuous-time framework, as well as a comparison of these concepts through a simple example, we emphasise that, to the best of our knowledge, there is no literature on stochastic Stackelberg games in which the players’ strategies are allowed to depend on the whole trajectory of the output process. One goal of this paper is precisely to fill the gap in the literature: we develop an approach that allows us to characterise Stackelberg equilibria with general (path-dependent) closed-loop strategies, in the sense that both the leader’s and follower’s strategies can depend on the trajectory of the state variable up to the current time, as opposed to the memoryless strategies considered in [11, 46, 47].

Extensions and applications.

Before describing our approach and results in more detail, one should mention that there are now many extensions and generalisations of the traditional leader–follower game, such as zero-sum solutions, mixed leadership, control of backward SDEs, learning problems, large-scale games, and the mean-field setting, among others. See Sun, Wang, and Wen [72] for zero-sum games; Bensoussan, Chen, Chutani, Sethi, Siu, and Yam [12] for mixed leadership; Zheng and Shi [84, 85] and Feng, Hu, and Huang [29] for the case where the controlled state dynamics is given by a backward SDE; Li and Han [45] and Zheng and Shi [86, 87] for learning games; and Ni, Liu, and Zhang [52] for the study of the time-inconsistency of open-loop solutions. As for larger-scale games, we mention Li and Yu [43] for the study of repeated Stackelberg games, in which a follower is also the leader of another game, and Kang and Shi [39] for a three-level game. The case of one leader and many followers, originally introduced in a static game by Leitmann [41] and in a stochastic framework by Wang, Wang, and Zhang [77] and Vasal [75], has been extended to the mean-field setting in Fu and Horst [30], Aïd, Basei, and Pham [1], Si and Wu [63], Vasal [74], Lv, Xiong, and Zhang [49], Li and Shi [46], Gou, Huang, and Wang [32], Dayanıklı and Laurière [26], and Cong and Shi [20]. Lastly, we remark that Stackelberg games cover a wide range of applications, from the original economic models, as highlighted by Bagchi [2] and Van Long [73], to operations research and management science, as reviewed by Li and Sethi [44] and Dockner, Jorgensen, Van Long, and Sorger [27]. Specific applications in these areas include, but are not limited to, marketing channels as in He, Prasad, Sethi, and Gutierrez [36], cooperative advertising as in Chutani and Sethi [19] and He, Prasad, and Sethi [37], insurance as in Havrylenko, Hinken, and Zagst [35], Han, Landriault, and Li [34], and Guan, Liang, and Song [33], and energy generation as in Aïd, Basei, and Pham [1].

A ‘new’ Stackelberg solution concept.

In this paper, we consider a stochastic continuous-time Stackelberg game with two players, a leader and a follower, both of whom can control the drift and volatility of the output process $X$, whose dynamics take the general form (1.1). Our main theoretical result characterises the Stackelberg equilibrium when the strategies of both players are closed-loop, in the sense that their strategies can only depend on time and on the path of the output process $X$. More precisely, we allow both players to build strategies whose value at time $t\in[0,T]$ can be a function of time $t$ but, more importantly, of the trajectory of the process $X$ up to time $t$, denoted $X_{\cdot\wedge t}$. In particular, under this information concept, the players’ decisions cannot directly depend on the underlying driving noise. As already emphasised, to our knowledge only the four aforementioned papers [11, 46, 47, 42] study Stackelberg equilibria for strategies falling into the ‘closed-loop’ class. However, the first three papers focus on the memoryless case, in the sense that the admissible strategies at time $t$ do not actually depend on the trajectory of the process up to time $t$, but only on the value of the process at that time, namely $X_{t}$. The last paper [42] introduces a notion of memory, but only ‘one-step’, by allowing the strategy at time $t$ to depend on $X_{t}$ and $X_{t-1}$, albeit in a deterministic and discrete-time framework. The authors nevertheless show that strategies with one-step memory may lead, even in simple frameworks, to different equilibria compared to their memoryless counterparts, which thus provides a first motivation to study a form of ‘pathwise’ (as opposed to memoryless) closed-loop strategies.

Beyond the distinction between ‘memoryless’ and ‘pathwise’ closed-loop strategies, another significant difference of our solution concept compared to [11, 46, 47] is the adaptedness of the admissible strategies. In these three papers, the strategies are assumed to be adapted to the filtration generated by the underlying noise. Informally, this implies that the strategies may also depend on the paths of the Brownian motion driving the output process $X$. While this assumption is necessary to develop a resolution approach based on the stochastic maximum principle, one may question its feasibility in practice. Indeed, in real-world applications, it is debatable whether one actually observes the paths of the underlying noise, which is usually a modelling artefact without any physical reality (for a more thorough discussion of this point, which is intimately linked to the question of whether one should adopt the ‘weak’ or ‘strong’ point of view in stochastic optimal control problems, we refer to the illuminating discussion in Zhang [83, Section 9.1.1]). We thus consider in our framework that admissible closed-loop strategies should instead be adapted with respect to the filtration generated by the output process $X$. This different, albeit natural, concept of information for continuous-time stochastic Stackelberg games actually echoes the definition of closed-loop equilibria in the literature on ‘classical’ stochastic differential games (see, for example, Carmona [16, Definition 5.5] for the case of closed-loop Nash equilibria, or Possamaï, Touzi, and Zhang [59] for zero-sum games).

It should also be emphasised that the concept of information studied here, simply labelled closed-loop for convenience, is therefore different from the so-called ‘adapted closed-loop’ concept introduced (but not studied) by Bensoussan, Chen, and Sethi [11], in which the players’ strategies may depend on the whole trajectory of the output process $X$, but are nevertheless adapted with respect to the filtration generated by the underlying Brownian motion. Although it is outside the scope of this paper to study the characterisation of adapted closed-loop solutions for Stackelberg games, our illustrative example suggests that this concept of information may be ‘too broad’. More precisely, we will see in this simple example that if the leader can design a strategy depending on the trajectories of both the output and the underlying driving noise, then she can actually impose the maximum effort on the follower. This observation suggests that the difference between ‘adapted closed-loop’ (in the sense of [11]) and what we coined ‘closed-loop’ is akin to the difference between first-best and second-best equilibria defined in the literature on principal–agent problems, which are themselves specific Stackelberg games. This parallel is further reinforced by the fact that both our solution concept, although surprisingly new in the literature on stochastic Stackelberg games, and the solution approach we propose are in fact strongly inspired by the theory of continuous-time principal–agent problems.

Solution approach via stochastic target.

The main contribution of our paper is therefore to provide a characterisation of the closed-loop equilibrium (in the sense previously discussed) of a general continuous-time stochastic Stackelberg game, in which both players can control the drift and volatility of the output process. Allowing for path-dependent strategies leads to a more sophisticated form of equilibrium which, consequently, is more challenging to solve. Indeed, in this case, the classical approaches used in the literature to characterise open-loop or closed-loop memoryless equilibria, such as the maximum principle, can no longer be used. The approach we develop in this paper is based on the dynamic programming principle and stochastic target problems: the main idea is to use the follower’s value function as a state variable for the leader’s problem. More precisely, by writing forward the dynamics of the value function of the follower, which by the dynamic programming principle solves a backward SDE, we are able to reformulate the leader’s problem as a stochastic control problem of a (forward) SDE system with a stochastic target constraint. We also remark that the idea of considering the forward dynamics of the value function of the follower in a Stackelberg game, but with a continuum of followers, was used independently in Dayanıklı and Laurière [26] to develop a numerical algorithm by means of Lagrange multipliers, i.e. when the target constraint is added to the leader’s objective function as a penalisation term. Our approach is different in that we employ the methodology developed in Bouchard, Élie, and Imbert [14] and Bouchard, Élie, and Touzi [13], which leverages the dynamic programming principle for problems with stochastic target constraints established in Soner and Touzi [67, 68], to provide a theoretical characterisation of the closed-loop solution of a Stackelberg game through a system of Hamilton–Jacobi–Bellman equations.

Overview of the paper.

We first introduce in Section 2 a simple illustrative example, in order to highlight the various concepts of Stackelberg equilibrium and the different approaches available to solve them. More importantly, we informally explain our approach in Section 2.2 through its application to the example under consideration. The rigorous formulation of the general problem is introduced in Section 3. In Section 4, we reformulate the leader’s problem in this general Stackelberg equilibrium as a stochastic control problem with stochastic target constraint, which is then solved in Section 5.

Notations.

We let $\mathbb{N}^{\star}$ be the set of positive integers, $\mathbb{R}_{+}\coloneqq[0,\infty)$ and $\mathbb{R}_{+}^{\star}\coloneqq(0,\infty)$. For $(d,n)\in\mathbb{N}^{\star}\times\mathbb{N}^{\star}$, $\mathbb{R}^{d\times n}$, $\mathbb{S}^{d}$, and $\mathbb{S}^{d}_{+}$ denote the set of $d\times n$ matrices with real entries, $d\times d$ symmetric matrices with real entries, and $d\times d$ positive semi-definite symmetric matrices with real entries, respectively. For any closed convex subset $S\subseteq\mathbb{R}$, we will denote by $\Pi_{S}(x)$ the Euclidean projection of $x\in\mathbb{R}$ on $S$. For $T>0$ and a finite-dimensional Euclidean space $E$, we let ${\cal C}([0,T]\times E,\mathbb{R})$ be the space of continuous functions from $[0,T]\times E$ to $\mathbb{R}$, as well as ${\cal C}^{1,2}([0,T]\times E,\mathbb{R})$ the subset of ${\cal C}([0,T]\times E,\mathbb{R})$ of all continuous functions which are continuously differentiable in time and twice continuously differentiable in space. For every $\varphi\in{\cal C}^{1,2}([0,T]\times E,\mathbb{R})$, we denote by $\partial_{t}\varphi$ its partial derivative with respect to time, and by $\partial_{x}\varphi$ and $\partial_{xx}^{2}\varphi$ its gradient and Hessian with respect to the space variable, respectively. We agree that the supremum over an empty set is $-\infty$. For a stochastic process $X$, we denote by $\mathbb{F}^{X}\coloneqq({\cal F}^{X}_{t})_{t\geq 0}$ the filtration generated by $X$.

2 Illustrative example

As already outlined in the introduction, there exist various concepts of Stackelberg equilibrium. In order to highlight their differences and describe the appropriate methods to compute each of them, we choose to develop in this section a simple illustrative example.

Let $T>0$ be a finite time horizon. For the sake of simplicity in this section, we focus on the strong formulation by fixing a probability space $(\Omega,{\cal F},\mathbb{P})$ supporting a one-dimensional Brownian motion $W$. We slightly abuse notations here and denote by $\mathbb{F}^{W}\coloneqq({\cal F}_{t}^{W})_{t\in[0,T]}$ the natural filtration generated by $W$, $\mathbb{P}$-augmented in order to satisfy the usual hypotheses. We assume that the controlled one-dimensional state process $X$ satisfies the following dynamics

\displaystyle\mathrm{d}X_{t}=(\alpha_{t}+\beta_{t})\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0}\in\mathbb{R}, (2.1)

where the pair $(\alpha,\beta)$ represents the players’ decisions and $\sigma\in\mathbb{R}$ is a given constant. More precisely, the leader first announces her strategy $\alpha\in{\cal A}$ at the beginning of the game, where ${\cal A}$ is an appropriate family of $A$-valued processes for $A\subseteq\mathbb{R}$. With the knowledge of the leader’s action, the follower chooses an optimal response, i.e. a control $\beta\in{\cal B}$ optimising his objective function, for a given set ${\cal B}$ of $B$-valued processes for $B\subseteq\mathbb{R}$. The sets ${\cal A}$ and ${\cal B}$ will be defined subsequently, as they crucially depend on the solution concept considered.
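Although not part of the framework developed below, the following minimal Python sketch may help fix ideas: it simulates the dynamics (2.1) via an Euler–Maruyama scheme, with the strategies passed as generic callables of time and of the simulated path, so that path-dependent (closed-loop) strategies can be plugged in directly. All names and parameter values are illustrative.

```python
import numpy as np

def simulate_state(alpha, beta, x0=0.0, sigma=1.0, T=1.0, n_steps=1000, seed=None):
    """Euler-Maruyama scheme for dX_t = (alpha_t + beta_t) dt + sigma dW_t.

    `alpha` and `beta` are callables (t, path) -> control value, so that
    closed-loop, path-dependent strategies can be plugged in directly.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x0
    for i in range(n_steps):
        drift = alpha(i * dt, X[:i + 1]) + beta(i * dt, X[:i + 1])
        X[i + 1] = X[i] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return X

# Example: constant controls, as in the open-loop equilibrium of Section 2.1.1.
path = simulate_state(lambda t, x: 0.5, lambda t, x: 0.5, sigma=0.3)
```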

We assume that, given $\alpha\in{\cal A}$ chosen by the leader, the follower solves the following optimal stochastic control problem

\displaystyle V_{\rm F}(\alpha)\coloneqq\sup_{\beta\in{\cal B}}J_{\rm F}(\alpha,\beta),\;\text{with}\;J_{\rm F}(\alpha,\beta)\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}, (2.2)

for some $c_{\rm F}>0$. The best response of the follower to a control $\alpha\in{\cal A}$ chosen by the leader is naturally defined by

\displaystyle\beta^{\star}(\alpha)\coloneqq\operatorname*{arg\,max}_{\beta\in{\cal B}}J_{\rm F}(\alpha,\beta), (2.3)

assuming uniqueness of the best response here to simplify.

The leader, anticipating the follower’s optimal response $\beta^{\star}(\alpha)$, chooses $\alpha\in{\cal A}$ that optimises her own performance criterion. More precisely, we assume here that the leader’s optimisation is given by

\displaystyle V_{\rm L}\coloneqq\sup_{\alpha\in{\cal A}}J_{\rm L}\big{(}\alpha,\beta^{\star}(\alpha)\big{)},\;\text{with}\;J_{\rm L}\big{(}\alpha,\beta^{\star}(\alpha)\big{)}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]}, (2.4)

for some $c_{\rm L}>0$, and where the dynamics of $X$ are now driven by the optimal response of the follower, i.e.

\displaystyle\mathrm{d}X_{t}=\big{(}\alpha_{t}+\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0}\in\mathbb{R}.

The leader’s optimal action and the follower’s rational response, namely the couple $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))$ for $\alpha^{\star}$ a maximiser in (2.4), constitute a global Stackelberg solution or equilibrium. To ensure that the value of the Stackelberg game is finite for all the various equilibrium concepts, one should require the sets $A$ and $B$ to be bounded. For the sake of simplicity, we assume here that $A\coloneqq[-a_{\circ},a_{\circ}]$ and $B\coloneqq[0,b_{\circ}]$ for some $a_{\circ}>c_{\rm L}^{-1}$ and $b_{\circ}>c_{\rm F}^{-1}$ (the latter assumption is only intended to ensure that the ‘natural’ open-loop equilibrium can be reached, see Section 2.1.1).

The following section introduces the various notions of equilibrium in continuous-time stochastic Stackelberg games, and compares their solutions. More importantly for our purpose, Section 2.2 illustrates our approach, based on dynamic programming and stochastic target problems, which allows us to characterise a new notion of Stackelberg equilibrium, which we coin closed-loop. Before proceeding, it may be useful to have in mind the optimal—or reference—equilibrium for the leader, i.e. when she chooses both strategies directly. This optimal scenario for the leader, which can be labelled first-best in reference to its counterpart in principal–agent problems (a choice of terminology that is not fortuitous: it is well studied in the contract theory literature, see for instance Cvitanić and Zhang [24], and principal–agent problems are one particular instance of Stackelberg games), should naturally arise when the leader can deduce the follower’s strategy from her observation, and is able to strongly penalise him whenever he deviates from the optimal strategy recommended by the leader. The value of the leader in this first-best problem is naturally defined by

\displaystyle V_{\rm L}^{\rm FB}\coloneqq\sup_{(\alpha,\beta)\in{\cal A}\times{\cal B}}J_{\rm L}(\alpha,\beta), (2.5)

where here, ${\cal A}$ and ${\cal B}$ are the sets of $\mathbb{F}^{W}$-adapted processes taking values in $A$ and $B$, respectively. This corresponds to a standard stochastic control problem, whose solution is provided in the following lemma.

Lemma 2.1 (First-best solution).

The optimal efforts in the first-best scenario are given by $\alpha^{\rm FB}_{t}=c_{\rm L}^{-1}$ and $\beta_{t}^{\rm FB}=b_{\circ}$ for all $t\in[0,T]$, which induce the following values for the leader and the follower, respectively

\displaystyle V_{\rm L}^{\rm FB}=J_{\rm L}\big{(}\alpha^{\rm FB},\beta^{\rm FB}\big{)}=x_{0}+\bigg{(}\dfrac{1}{2c_{\rm L}}+b_{\circ}\bigg{)}T,\;V_{\rm F}^{\rm FB}\coloneqq J_{\rm F}\big{(}\alpha^{\rm FB},\beta^{\rm FB}\big{)}=x_{0}+\bigg{(}\dfrac{1}{c_{\rm L}}+b_{\circ}-\dfrac{1}{2}c_{\rm F}b^{2}_{\circ}\bigg{)}T.

The previous result can be proved through standard stochastic control techniques, but also using our stochastic target approach (see Section A.1).
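As a numerical sanity check of Lemma 2.1 (a sketch with arbitrary illustrative parameters; the variable names are ours), one can estimate $J_{\rm L}$ and $J_{\rm F}$ by Monte Carlo under the constant controls $\alpha^{\rm FB}=c_{\rm L}^{-1}$ and $\beta^{\rm FB}=b_{\circ}$, and compare with the closed-form values of the lemma:

```python
import numpy as np

# Arbitrary illustrative parameters, chosen so that a_o > 1/c_L and b_o > 1/c_F.
c_L, c_F, b_o, sigma, T, x0 = 2.0, 1.5, 1.0, 0.3, 1.0, 0.0
alpha_fb, beta_fb = 1.0 / c_L, b_o          # first-best controls of Lemma 2.1

rng = np.random.default_rng(0)
n_paths = 500_000

# With constant controls, X_T = x0 + (alpha + beta) T + sigma W_T exactly.
X_T = x0 + (alpha_fb + beta_fb) * T + sigma * np.sqrt(T) * rng.standard_normal(n_paths)

J_L = X_T.mean() - 0.5 * c_L * alpha_fb**2 * T   # running costs are deterministic
J_F = X_T.mean() - 0.5 * c_F * beta_fb**2 * T

print(J_L, x0 + (0.5 / c_L + b_o) * T)                       # ~ V_L^FB
print(J_F, x0 + (1.0 / c_L + b_o - 0.5 * c_F * b_o**2) * T)  # ~ V_F^FB
```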

2.1 Various Stackelberg equilibria

There exist various notions of equilibrium in a continuous-time stochastic Stackelberg game. These concepts are related to the information available to both players, the leader and the follower, at the beginning of and during the game. Following the nomenclature in [7] for dynamic Stackelberg games, extended to the stochastic version in [11], we informally define by ${\cal I}_{t}$ the information available to both players at time $t\in[0,T]$, and distinguish four cases. (The definition of the information available to both players is rather informal here, in order to adhere to the concepts introduced in [11]; more rigorously, it could be defined as the filtration generated by the processes observable by both players. Nonetheless, we will define in a rigorous way the sets ${\cal A}$ and ${\cal B}$ of admissible efforts depending on the solution concept considered.)

(i) adapted open-loop (AOL), when ${\cal I}_{t}=\{x_{0},W_{\cdot\wedge t}\}$;

(ii) adapted feedback (AF), when ${\cal I}_{t}=\{X_{t},W_{\cdot\wedge t}\}$;

(iii) adapted closed-loop memoryless (ACLM), when ${\cal I}_{t}=\{x_{0},X_{t},W_{\cdot\wedge t}\}$;

(iv) adapted closed-loop (ACL), when ${\cal I}_{t}=\{x_{0},X_{\cdot\wedge t},W_{\cdot\wedge t}\}$.

As explained in [11], the information structures $(i)$, $(iii)$, and $(iv)$ lead to the concept of global Stackelberg solutions, where the leader actually dominates the follower over the entire duration of the game. In these situations, a Stackelberg equilibrium $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))$ is characterised, as in the illustrative example above, by

\displaystyle J_{\rm F}(\alpha,\beta^{\star}(\alpha))\geq J_{\rm F}(\alpha,\beta),\;\text{and}\;J_{\rm L}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm L}(\alpha,\beta^{\star}(\alpha)),\;\forall(\alpha,\beta)\in{\cal A}\times{\cal B}.

The information structure $(ii)$ leads to a different concept of solution, in which the leader has only an instantaneous advantage over the follower. More precisely, a feedback Stackelberg equilibrium $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))$ should satisfy

\displaystyle J_{\rm F}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm F}(\alpha^{\star},\beta),\;\text{and}\;J_{\rm L}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm L}(\alpha,\beta^{\star}(\alpha)),\;\forall(\alpha,\beta)\in{\cal A}\times{\cal B}.

In the following, we illustrate the existing approaches to computing the equilibrium under the first three information structures in the context of the above example. Even though the last information structure, corresponding to the adapted closed-loop (with memory) case, has not been studied in the literature, we are able to characterise it in this example. Indeed, our analysis establishes a connection between this Stackelberg solution concept and the first-best scenario, already discussed in Lemma 2.1.

However, the real aim of this paper is not to study existing solution concepts, but to introduce a new, albeit natural, concept of information, corresponding to the definition of closed-loop equilibria in the literature on stochastic differential games (see, for example, Carmona [16, Definition 5.5]), in which the information available to both players at time $t\in[0,T]$ is—again informally—defined as

(v) closed-loop (CL), when ${\cal I}_{t}=\{x_{0},X_{\cdot\wedge t}\}$.

In particular, this information concept is different from the adapted closed-loop case introduced in [11] and mentioned above, as we do not assume here that the players have access to the paths of the Brownian motion. As already highlighted in the introduction, considering such an information structure makes sense, especially in real-world applications, as it usually seems unrealistic to believe that players can actually observe the underlying noise driving the output process, the latter being in most cases a modelling artefact. Admissible strategies constructed using this information structure are therefore not assumed to be adapted to the natural filtration generated by the Brownian motion, in contrast to adapted closed-loop strategies, hence we simply refer to them as closed-loop.

More precise specifications of this solution concept, along with an informal description of the methodology we develop to characterise the corresponding Stackelberg equilibrium, are presented separately in Section 2.2. We present below the main results obtained in the context of the example, especially the comparison of the values obtained for both players, depending on the equilibrium considered.

Comparison of the equilibria.

The results we obtain for the different solution concepts are summarised in Table 1 below. Before commenting on our results, we should point out that these findings were obtained for the example introduced at the beginning of this section, and by no means do we claim or expect that they would all be true in a more general context. Nevertheless, given the significance of some of these findings, especially the fact that, from the leader’s point of view (from the follower’s, all the inequalities are naturally reversed), $V_{\rm L}^{\rm AOL}=V_{\rm L}^{\rm AF}<V_{\rm L}^{\rm ACLM},V_{\rm L}^{\rm CL}<V_{\rm L}^{\rm ACL}=V_{\rm L}^{\rm FB}$, investigating the extent to which they hold in greater generality could be the subject of future research.

Table 1: Comparison of the various Stackelberg equilibria.

AOL $(i)$ and AF $(ii)$: $V_{\rm L}=x_{0}+\big(\frac{1}{2c_{\rm L}}+\frac{1}{c_{\rm F}}\big)T$, and $V_{\rm F}=x_{0}+\big(\frac{1}{c_{\rm L}}+\frac{1}{2c_{\rm F}}\big)T$;

ACLM $(iii)$: $V_{\rm L}=x_{0}+\big(\frac{1}{2c_{\rm L}}+\bar{b}\big)T$, and $V_{\rm F}=x_{0}+\big(\frac{1}{c_{\rm L}}+\widetilde{b}\big)T$;

ACL $(iv)$ and FB: $V_{\rm L}=x_{0}+\big(\frac{1}{2c_{\rm L}}+b_{\circ}\big)T$, and $V_{\rm F}=x_{0}+\big(\frac{1}{c_{\rm L}}+b_{\circ}-\frac{1}{2}c_{\rm F}b^{2}_{\circ}\big)T$;

CL $(v)$: $V_{\rm L}=V_{\rm L}^{\rm CL}$, and $V_{\rm F}=V_{\rm F}^{\rm CL}$ (characterised numerically in Section 2.2.4).

In the ACLM case, $\bar{b}\coloneqq\frac{b_{\circ}c_{\rm F}-1}{c_{\rm F}\log(b_{\circ}c_{\rm F})}\in\big(\frac{1}{c_{\rm F}},b_{\circ}\big)$, and $\widetilde{b}\coloneqq\frac{4b_{\circ}c_{\rm F}-b_{\circ}^{2}c_{\rm F}^{2}+3}{c_{\rm F}\log(b_{\circ}c_{\rm F})}\in\big(\frac{1}{2c_{\rm F}},b_{\circ}-\frac{1}{2}c_{\rm F}b^{2}_{\circ}\big)$, for $b_{\circ}>c_{\rm F}^{-1}$.
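The closed-form entries of Table 1 are straightforward to evaluate; the short sketch below (illustrative parameter values only, with $a_{\circ}$ implicitly taken large enough for the ACLM formulas of Lemma A.1 to apply) computes the leader’s values in the AOL/AF, ACLM, and ACL/FB cases, and checks both the containment $\bar{b}\in(1/c_{\rm F},b_{\circ})$ and the strict ordering $V_{\rm L}^{\rm AOL}<V_{\rm L}^{\rm ACLM}<V_{\rm L}^{\rm FB}$:

```python
import numpy as np

# Illustrative parameters with b_o > 1/c_F, as required after Table 1.
c_L, c_F, b_o, T, x0 = 2.0, 1.5, 1.0, 1.0, 0.0

b_bar = (b_o * c_F - 1.0) / (c_F * np.log(b_o * c_F))   # ACLM effort level

V_L_AOL = x0 + (0.5 / c_L + 1.0 / c_F) * T              # AOL (i) and AF (ii)
V_L_ACLM = x0 + (0.5 / c_L + b_bar) * T                 # ACLM (iii)
V_L_FB = x0 + (0.5 / c_L + b_o) * T                     # ACL (iv) and FB

assert 1.0 / c_F < b_bar < b_o                          # containment stated above
assert V_L_AOL < V_L_ACLM < V_L_FB                      # leader's ordering
print(V_L_AOL, V_L_ACLM, V_L_FB)                        # 0.9167, 1.0721, 1.25
```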

First of all, it is obviously expected that, for any concept of Stackelberg equilibrium, the value of the leader will be lower than her value in the first-best case, introduced as a reference in Lemma 2.1, since in this scenario the leader can directly choose the optimal effort of the follower. It is also expected that the more available information the leader can use to implement her strategy, the higher the value she will obtain, which translates mathematically into the following inequalities

\displaystyle V_{\rm L}^{\rm AOL}\leq V_{\rm L}^{\rm ACLM}\leq V_{\rm L}^{\rm ACL},\;V_{\rm L}^{\rm AF}\leq V_{\rm L}^{\rm ACLM},\;\text{and}\;V_{\rm L}^{\rm AOL}\leq V_{\rm L}^{\rm CL}\leq V_{\rm L}^{\rm ACL}.

In the context of our simple example, our first finding is that the Stackelberg equilibrium, and hence the associated values for the leader and the follower, coincide for both the adapted open-loop (Section 2.1.1) and the adapted feedback (Section 2.1.2) information structures. This might reflect how the additional information under the feedback structure is counterbalanced by the global dominance of the open-loop strategies. Regarding the value of the leader in the ACLM information structure (Section 2.1.3), strict inequalities with respect to the values in the AOL and ACL cases can be obtained for specific choices of the parameters $a_{\circ}$, $b_{\circ}$, $c_{\rm L}$, and $c_{\rm F}$. Namely, we assume in Lemma A.1 that

\displaystyle a_{\circ}>\max\bigg{\{}\frac{1}{c_{\rm L}}+b_{\circ}\big{(}b_{\circ}c_{\rm F}-1\big{)},\frac{1}{2c_{\rm F}}\big{(}b_{\circ}^{2}c_{\rm F}^{2}-1\big{)}-\frac{1}{c_{\rm L}}\bigg{\}}, (2.6)

in order to compute explicitly the value of the leader.

On the other hand, our analysis of the Stackelberg game under adapted closed-loop strategies in Section 2.1.4 shows that, as long as the leader can effectively punish the follower at no additional cost, see Equation 2.13, the problem degenerates to the first-best case. More precisely, by observing the trajectory of $X$ as well as that of $W$, the leader can actually deduce the follower’s effort at each time, and thus force him to perform the maximum effort $b_{\circ}$, threatening to significantly penalise him otherwise. This is the case, for instance, if

\displaystyle a_{\circ}\geq\dfrac{1}{2c_{\rm F}}-b_{\circ}+\dfrac{1}{2}c_{\rm F}b_{\circ}^{2}-\dfrac{1}{c_{\rm L}}. (2.7)


Finally, regarding our equilibrium, namely closed-loop, while it is clear that the value for the leader should be higher than in the AOL case, and lower than in the ACL and FB cases, the comparison with the ACLM case is less straightforward. Unfortunately, we are not able to obtain explicit results in this framework, even in the context of this simple example, and we thus rely on numerical results, presented in Section 2.2.4. These numerical results seem to illustrate that the CL equilibrium gives a higher value for the leader compared to the ACLM case, at least when $a_{\circ}$ is chosen sufficiently large so that Equations 2.6 and 2.7 are satisfied. Although we cannot rule out the possibility that these conclusions could be reversed for different sets of parameters, the numerical results nevertheless highlight that these two equilibria are essentially different.

2.1.1 Adapted open-loop strategies

In a Stackelberg game under the adapted open-loop (AOL) information structure, both players have access to the initial value of $X$, namely $x_{0}$, and the trajectory of the Brownian motion $W$. Since the leader first announces her strategy $\alpha$, its value $\alpha_{t}$ at any time $t\in[0,T]$ should only depend on the realisation of the Brownian motion on $[0,t]$, and on the initial value $x_{0}$ of the state. The leader’s strategy space ${\cal A}$ in this case is thus naturally defined by ${\cal A}\coloneqq\{\alpha:[0,T]\times\Omega\times\{x_{0}\}\longrightarrow A:\alpha\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\}$. As the follower makes his decision after the leader announces her whole strategy $\alpha$ on $[0,T]$, his strategy may also depend on the leader’s announced strategy. More precisely, the value $\beta_{t}$ of the follower’s response strategy at time $t\in[0,T]$ is naturally measurable with respect to ${\cal F}_{t}^{W}$, but can also depend on the leader’s strategy $\alpha$. His response strategy space is thus defined by

\displaystyle{\cal B}\coloneqq\big{\{}\beta:[0,T]\times\Omega\times\{x_{0}\}\times{\cal A}\longrightarrow B:(\beta_{t}(\cdot,x_{0},\alpha))_{t\in[0,T]}\;\text{is an}\;\mathbb{F}^{W}\text{-adapted process for all}\;\alpha\in{\cal A}\big{\}}.

Note that, at any time $t\in[0,T]$, since the information available to the leader is also available to the follower, the follower can naturally compute the value of the leader’s strategy at that instant $t$, i.e., $\alpha_{t}$. However, he cannot anticipate the future values of the leader’s strategy $\alpha$.

As described in [11, Section 3], one way to characterise a global Stackelberg equilibrium under the AOL information structure is to rely on the maximum principle. A general result is given, for example, in [11, Proposition 3.1], but we briefly describe this approach through its application to our example. Recall that, given the leader’s strategy $\alpha\in{\cal A}$, the follower’s problem is defined by Equation 2.2, where the dynamics of the state variable $X$ satisfy (2.1). To solve this stochastic optimal control problem through the maximum principle, we first define the appropriate Hamiltonian

\displaystyle h^{\rm F}(t,a,y,z,b)\coloneqq(a+b)y+\sigma z-\dfrac{c_{\rm F}}{2}b^{2},\;(t,a,y,z,b)\in[0,T]\times A\times\mathbb{R}^{2}\times B.

Suppose now that there exists a solution $\beta^{\star}(\alpha)$ to the follower’s problem (2.2) for any $\alpha\in{\cal A}$. Then, the maximum principle states that there exists a pair of real-valued, $\mathbb{F}^{W}$-adapted processes $(Y^{\rm F},Z^{\rm F})$ such that

\displaystyle\begin{cases}\mathrm{d}X_{t}=\big{(}\alpha_{t}+\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0};\\[5.0pt]\mathrm{d}Y_{t}^{\rm F}=Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1;\\[5.0pt]\beta^{\star}_{t}(\alpha)\coloneqq\operatorname*{arg\,max}_{b\in B}\big{\{}h^{\rm F}\big{(}t,\alpha_{t},Y_{t}^{\rm F},Z_{t}^{\rm F},b\big{)}\big{\}},\;\mathrm{d}t\otimes\mathbb{P}\text{--a.e.}\end{cases} (2.8)

Note that the drift in the backward SDE (BSDE for short) in (2.8), commonly called the adjoint process, is equal to 0, because the Hamiltonian $h^{\rm F}$ does not depend on the state variable. Clearly, in this simple example, the pair $(Y^{\rm F},Z^{\rm F})$ satisfying the BSDE is the pair of constant processes $(1,0)$. This leads to the optimal constant control $\beta^{\star}_{t}(\alpha)=1/c_{\rm F}\in B$ for all $t\in[0,T]$. In particular, this control is independent of the leader’s choice of $\alpha$. The leader’s problem defined by (2.4) thus becomes

\displaystyle V_{\rm L}=\sup_{\alpha\in{\cal A}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},\;\text{subject to}\;\mathrm{d}X_{t}=\bigg{(}\alpha_{t}+\dfrac{1}{c_{\rm F}}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T].

This optimal control problem is trivial to solve, and also leads to an optimal constant control for the leader, namely $\alpha^{\star}_{t}=1/c_{\rm L}\in A$ for all $t\in[0,T]$. The open-loop equilibrium is thus given by $(1/c_{\rm L},1/c_{\rm F})$, which is admissible thanks to the assumptions $a_{\circ}\geq 1/c_{\rm L}$ and $b_{\circ}\geq 1/c_{\rm F}$, and one can easily compute the corresponding values for the leader and the follower, given in Table 1.
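As a quick numerical confirmation (again with arbitrary illustrative parameters), simulating the terminal state under the constant equilibrium controls $(1/c_{\rm L},1/c_{\rm F})$ recovers the AOL values of Table 1 up to Monte Carlo error:

```python
import numpy as np

c_L, c_F, sigma, T, x0 = 2.0, 1.5, 0.3, 1.0, 0.0
alpha_star, beta_star = 1.0 / c_L, 1.0 / c_F      # AOL equilibrium controls

rng = np.random.default_rng(1)
n_paths = 500_000
X_T = x0 + (alpha_star + beta_star) * T + sigma * np.sqrt(T) * rng.standard_normal(n_paths)

V_L_mc = X_T.mean() - 0.5 * c_L * alpha_star**2 * T
V_F_mc = X_T.mean() - 0.5 * c_F * beta_star**2 * T

print(V_L_mc, x0 + (0.5 / c_L + 1.0 / c_F) * T)   # Table 1, AOL leader's value
print(V_F_mc, x0 + (1.0 / c_L + 0.5 / c_F) * T)   # Table 1, AOL follower's value
```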

2.1.2 Adapted feedback strategies

A Stackelberg game under the adapted feedback (AF) information structure differs from the other Stackelberg equilibria, not only in the information structure itself, but also in the way the game is played. In this scenario, both players only have access to the current value of $X$ and the trajectory of the Brownian motion $W$. In other words, the leader’s strategy at time $t\in[0,T]$ can only depend on the value $X_{t}$ and the realisation of the Brownian motion on $[0,t]$. Under this information structure, the equilibrium is not global, in the sense that at each time $t\in[0,T]$, the leader first decides her action $\alpha_{t}$, and then the follower makes his decision, immediately after observing the leader’s instant action at time $t$, rather than her whole strategy over $[0,T]$. Therefore, the leader’s and follower’s strategy spaces are respectively defined by

{\cal A}\coloneqq\big{\{}\alpha:[0,T]\times\Omega\times\mathbb{R}\longrightarrow A:\alpha\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\big{\}},\;\text{and}\;{\cal B}\coloneqq\big{\{}\beta:[0,T]\times\Omega\times\mathbb{R}\times A\longrightarrow B:\beta\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\big{\}}.


Recall that an AF Stackelberg solution is a pair $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\in{\cal A}\times{\cal B}$ satisfying $J_{\rm F}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm F}(\alpha^{\star},\beta)$ for all $\beta\in{\cal B}$, and $J_{\rm L}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm L}(\alpha,\beta^{\star}(\alpha))$ for all $\alpha\in{\cal A}$. To compute such a solution, we can rely on the approach in [10], based on the dynamic programming method. More precisely, for $(t,z^{\rm F},z^{\rm L},a,b)\in[0,T]\times\mathbb{R}^{2}\times A\times B$, we introduce the players’ Hamiltonians

\displaystyle h^{\rm F}\big{(}t,z^{\rm F},a,b\big{)}\coloneqq(a+b)z^{\rm F}-\dfrac{c_{\rm F}}{2}b^{2},\;\text{and}\;h^{\rm L}\big{(}t,z^{\rm L},a,b\big{)}\coloneqq(a+b)z^{\rm L}-\dfrac{c_{\rm L}}{2}a^{2}.

For a fixed action of the leader, the follower’s optimal response is given by the maximiser of his Hamiltonian, i.e.

\displaystyle b^{\star}\big{(}t,z^{\rm F},a\big{)}\coloneqq\operatorname*{arg\,max}_{b\in B}\big{\{}h^{\rm F}\big{(}t,z^{\rm F},a,b\big{)}\big{\}}=\Pi_{B}\bigg{(}\dfrac{z^{\rm F}}{c_{\rm F}}\bigg{)},\;(t,z^{\rm F},a)\in[0,T]\times\mathbb{R}\times A,

recalling that, for all $x\in\mathbb{R}$, $\Pi_{B}(x)$ denotes the projection of $x$ on $B$. One should then substitute this optimal response into the leader’s Hamiltonian. Nevertheless, in this example it does not change the maximiser of the leader’s Hamiltonian, given by

\displaystyle a^{\star}\big{(}t,z^{\rm F},z^{\rm L}\big{)}\coloneqq\operatorname*{arg\,max}_{a\in A}\big{\{}h^{\rm L}\big{(}t,z^{\rm L},a,b^{\star}(t,z^{\rm F},a)\big{)}\big{\}}=\Pi_{A}\bigg{(}\dfrac{z^{\rm L}}{c_{\rm L}}\bigg{)},\;(t,z^{\rm F},z^{\rm L})\in[0,T]\times\mathbb{R}^{2}.

To compute the equilibrium, one must solve the following system of coupled Hamilton–Jacobi–Bellman equations

\displaystyle\begin{cases}-\partial_{t}v_{\rm F}(t,x)-\bigg{(}\Pi_{A}\bigg{(}\dfrac{\partial_{x}v_{\rm L}(t,x)}{c_{\rm L}}\bigg{)}+\Pi_{B}\bigg{(}\dfrac{\partial_{x}v_{\rm F}(t,x)}{c_{\rm F}}\bigg{)}\bigg{)}\partial_{x}v_{\rm F}(t,x)+\dfrac{c_{\rm F}}{2}\Pi_{B}^{2}\bigg{(}\dfrac{\partial_{x}v_{\rm F}(t,x)}{c_{\rm F}}\bigg{)}-\dfrac{1}{2}\sigma^{2}\partial_{xx}v_{\rm F}(t,x)=0,\\[8.00003pt]-\partial_{t}v_{\rm L}(t,x)-\bigg{(}\Pi_{A}\bigg{(}\dfrac{\partial_{x}v_{\rm L}(t,x)}{c_{\rm L}}\bigg{)}+\Pi_{B}\bigg{(}\dfrac{\partial_{x}v_{\rm F}(t,x)}{c_{\rm F}}\bigg{)}\bigg{)}\partial_{x}v_{\rm L}(t,x)+\dfrac{c_{\rm L}}{2}\Pi_{A}^{2}\bigg{(}\dfrac{\partial_{x}v_{\rm L}(t,x)}{c_{\rm L}}\bigg{)}-\dfrac{1}{2}\sigma^{2}\partial_{xx}v_{\rm L}(t,x)=0,\end{cases}

for all $(t,x)\in[0,T)\times\mathbb{R}$, with boundary conditions $v_{\rm F}(T,x)=v_{\rm L}(T,x)=x$, $x\in\mathbb{R}$. One can check, using a standard verification theorem, that the appropriate solutions to the previous system are

\displaystyle v_{\rm F}(t,x)=x+\bigg{(}\dfrac{1}{c_{\rm L}}+\dfrac{1}{2c_{\rm F}}\bigg{)}(T-t),\;\text{and}\;v_{\rm L}(t,x)=x+\bigg{(}\dfrac{1}{c_{\rm F}}+\dfrac{1}{2c_{\rm L}}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R},

which correspond to the constant strategies $(1/c_{\rm L},1/c_{\rm F})\in A\times B$. In particular, the feedback Stackelberg equilibrium coincides with the open-loop solution computed before, both in terms of strategy and corresponding value.
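For instance, plugging $v_{\rm F}$ into the first equation of the system is a direct check: since $\partial_{x}v_{\rm F}=1$, $\partial_{xx}v_{\rm F}=0$, and the projections are interior thanks to $a_{\circ}>c_{\rm L}^{-1}$ and $b_{\circ}>c_{\rm F}^{-1}$, one gets

\displaystyle-\partial_{t}v_{\rm F}(t,x)-\bigg{(}\frac{1}{c_{\rm L}}+\frac{1}{c_{\rm F}}\bigg{)}+\frac{c_{\rm F}}{2}\cdot\frac{1}{c_{\rm F}^{2}}=\bigg{(}\frac{1}{c_{\rm L}}+\frac{1}{2c_{\rm F}}\bigg{)}-\bigg{(}\frac{1}{c_{\rm L}}+\frac{1}{c_{\rm F}}\bigg{)}+\frac{1}{2c_{\rm F}}=0,

and the computation for $v_{\rm L}$ is identical upon exchanging the roles of $c_{\rm L}$ and $c_{\rm F}$.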

2.1.3 Adapted closed-loop memoryless strategies

If the information structure is assumed to be adapted closed-loop memoryless (ACLM), then both players have access to the initial and current values of $X$, as well as the trajectory of the Brownian motion $W$. This means that, compared to the AOL information structure, both players can additionally make their decisions at time $t$ contingent on the current state $X_{t}$. The leader’s strategy space and the follower’s response strategy space are then naturally defined by

{\cal A}\coloneqq\big{\{}\alpha:[0,T]\times\Omega\times\mathbb{R}\times\{x_{0}\}\longrightarrow A:(\alpha_{t}(\cdot,X_{t},x_{0}))_{t\in[0,T]}\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\big{\}},

{\cal B}\coloneqq\big{\{}\beta:[0,T]\times\Omega\times\mathbb{R}\times\{x_{0}\}\times{\cal A}\longrightarrow B:(\beta_{t}(\cdot,X_{t},x_{0},\alpha))_{t\in[0,T]}\;\text{is}\;\mathbb{F}^{W}\text{-adapted},\;\forall\alpha\in{\cal A}\big{\}}.

As mentioned above, the main difference between the ACLM and the AOL information structures is that the leader’s control at time $t$ can depend on the value of the state at that time. However, by choosing his strategy $\beta$, the follower will naturally impact the dynamics of the state $X$ and thus its value, which in turn impacts the value of the leader’s control $\alpha$. Therefore, in order to compute his optimal response to a strategy $\alpha$ of the leader, the follower needs to take into account the retroaction of his control on the value of the leader’s control. This leads to a more sophisticated form of equilibrium. In particular, contrary to the AOL case, where the leader is relatively myopic, in the sense that she cannot possibly take into account the choice of the follower, under the ACLM information structure she can now design a strategy indexed on the state, which will therefore take into account the follower’s actions.

In order to characterise the global Stackelberg equilibrium under the ACLM information structure, we can again rely on the maximum principle (see [11, Section 4]). First, to highlight the dependency of the value $\alpha_{t}$ on the current value of the state $X_{t}$, we write $\alpha_{t}\eqqcolon a_{t}(X_{t})$ for $a:[0,T]\times\Omega\times\mathbb{R}\times\{x_{0}\}\longrightarrow A$, whose values at a fixed $(t,\omega)\in[0,T]\times\Omega$ induce the family ${\rm A}$ of mappings ${\rm a}:\mathbb{R}\times\{x_{0}\}\longrightarrow A$. We can then follow the maximum principle approach as before, but taking into account this dependency. More precisely, as before, we fix the leader’s strategy $\alpha\in{\cal A}$, and thus its value $a_{t}(X_{t})$ at time $t$, and consider the follower’s problem given by (2.2), but now subject to the following dynamics for the state

\displaystyle\mathrm{d}X_{t}=(a_{t}(X_{t})+\beta_{t})\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},

where the dependency of the leader’s control on the state appears explicitly. This dependency will thus also appear explicitly in the Hamiltonian

\displaystyle h^{\rm F}(t,{\rm a},x,y,z,b)\coloneqq({\rm a}(x)+b)y+\sigma z-\dfrac{c_{\rm F}}{2}b^{2},\;(t,{\rm a},x,y,z,b)\in[0,T]\times{\rm A}\times\mathbb{R}^{3}\times B.

Suppose that there exists a solution $\beta^{\star}(\alpha)$ to the follower’s problem (2.2) for any $\alpha\in{\cal A}$. Then, the maximum principle states that there exists a pair of $\mathbb{F}^{W}$-adapted processes $(Y^{\rm F},Z^{\rm F})$ satisfying the forward–backward SDE (FBSDE for short)

\displaystyle\begin{cases}\mathrm{d}X_{t}=\big{(}a_{t}(X_{t})+\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[5.0pt]\mathrm{d}Y_{t}^{\rm F}=-\partial_{x}h^{\rm F}\big{(}t,\alpha_{t},X_{t},Y_{t}^{\rm F},Z_{t}^{\rm F},\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1,\\[5.0pt]\beta^{\star}_{t}(\alpha)\coloneqq\operatorname*{arg\,max}_{b\in B}\big{\{}h^{\rm F}\big{(}t,\alpha_{t},X_{t},Y_{t}^{\rm F},Z_{t}^{\rm F},b\big{)}\big{\}},\;t\in[0,T].\end{cases}

Notice that $h^{\rm F}$ now depends explicitly on the state variable, and thus the associated partial derivative is not equal to zero, contrary to the AOL case. By computing the maximiser of $h^{\rm F}$ over $b\in B$, the previous FBSDE system becomes

\displaystyle\begin{cases}\mathrm{d}X_{t}=\bigg{(}a_{t}(X_{t})+\Pi_{B}\bigg{(}\dfrac{Y_{t}^{\rm F}}{c_{\rm F}}\bigg{)}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt]\mathrm{d}Y_{t}^{\rm F}=-\partial_{x}a_{t}(X_{t})Y_{t}^{\rm F}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1.\end{cases} (2.9)

One can then reformulate the leader’s problem defined by (2.4) as

\displaystyle V_{\rm L}=\sup_{\alpha\in{\cal A}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},\;\text{subject to the dynamics in (2.9)}. (2.10)

That is, the leader’s problem is equivalent to a stochastic optimal control problem of an FBSDE. Note however that the presence of the derivative $\partial_{x}a$ of the leader’s strategy in (2.9) results in a non-standard optimal control problem for the leader, which can nevertheless also be solved via the maximum principle, as described in [11, Section 4]. More precisely, the idea to solve the leader’s problem is to look at efforts of the form $a_{t}(X_{t})=a^{2}_{t}X_{t}+a^{1}_{t}$, where $a^{1}$ and $a^{2}$ are $\mathbb{F}^{W}$-adapted, $\mathbb{R}$-valued processes such that $a^{2}_{t}X_{t}+a^{1}_{t}\in A$ for every $t\in[0,T]$, $\mathbb{P}$–a.s. We define ${\cal A}^{2}$ as the space of processes $(a^{1},a^{2})$ satisfying these properties. It then follows from [11, Theorem 4.1] that $V_{\rm L}=\widetilde{V}_{\rm L}$, where

V~Lsup(a1,a2)𝒜2𝔼[XTcL20T(at2Xt+at1)2dt],\displaystyle\widetilde{V}_{\rm L}\coloneqq\sup_{(a^{\text{$1$}},a^{\text{$2$}})\in{\cal A}^{2}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\big{(}a^{2}_{t}X_{t}+a^{1}_{t}\big{)}^{2}\mathrm{d}t\bigg{]}, (2.11)

subject to

{dXt=(at2Xt+at1+ΠB(YtFcF))dt+σdWt,t[0,T],X0=x0,dYtF=at2YtFdt+ZtFdWt,t[0,T],YTF=1.\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg{(}a^{2}_{t}X_{t}+a^{1}_{t}+\Pi_{B}\bigg{(}\dfrac{Y_{t}^{\rm F}}{c_{\rm F}}\bigg{)}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt] \displaystyle\mathrm{d}Y_{t}^{\rm F}=-a^{2}_{t}Y_{t}^{\rm F}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1.\end{cases}

To solve V~L\widetilde{V}_{L}, we define, for (t,x,x,y,y,z,z,a1,a2)[0,T]×8(t,x,x^{\prime},y,y^{\prime},z,z^{\prime},{\rm a}^{1},{\rm a}^{2})\in[0,T]\times\mathbb{R}^{8}, the Hamiltonian

hL(x,x,y,y,z,z,a1,a2)(a2x+a1+ΠB(ycF))y+σza2yxcL2(a2x+a1)2.\displaystyle h^{\rm L}(x,x^{\prime},y,y^{\prime},z,z^{\prime},{\rm a}^{1},{\rm a}^{2})\coloneqq\bigg{(}{\rm a}^{2}x+{\rm a}^{1}+\Pi_{B}\bigg{(}\dfrac{y^{\prime}}{c_{\rm F}}\bigg{)}\bigg{)}y+\sigma z-{\rm a}^{2}y^{\prime}x^{\prime}-\dfrac{c_{\rm L}}{2}({\rm a}^{2}x+{\rm a}^{1})^{2}.

Again by [11, Theorem 4.1], if α^𝒜\hat{\alpha}\in{\cal A} is a solution to the leader’s problem (2.10) with the corresponding state trajectory (X^,Y^F,Z^F)(\hat{X},\hat{Y}^{\rm F},\hat{Z}^{\rm F}), then there exists a triple of 𝔽W\mathbb{F}^{W}-adapted processes (XL,YL,ZL)(X^{\rm L},Y^{\rm L},Z^{\rm L}) such that

{dXtL=yhLdtzhLdWt,t[0,T],X0L=0,dYtL=xhLdt+ZtLdWt,t[0,T],YTL=1,\displaystyle\begin{cases}\displaystyle\mathrm{d}X^{\rm L}_{t}=-\partial_{y^{\text{$\prime$}}}h^{\rm L}\mathrm{d}t-\partial_{z^{\text{$\prime$}}}h^{\rm L}\mathrm{d}W_{t},\;t\in[0,T],\;X^{\rm L}_{0}=0,\\[5.0pt] \displaystyle\mathrm{d}Y_{t}^{\rm L}=-\partial_{x}h^{\rm L}\mathrm{d}t+Z_{t}^{\rm L}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm L}_{T}=1,\end{cases}

where the derivatives of hLh^{\rm L} are evaluated at (X^t,XtL,YtL,Y^tF,ZtL,Z^tF,a^t(X^t)xa^t(X^t)X^t,xa^t(X^t))\big{(}\hat{X}_{t},X^{\rm L}_{t},Y_{t}^{\rm L},\hat{Y}_{t}^{\rm F},Z_{t}^{\rm L},\hat{Z}_{t}^{\rm F},\hat{a}_{t}(\hat{X}_{t})-\partial_{x}\hat{a}_{t}(\hat{X}_{t})\hat{X}_{t},\partial_{x}\hat{a}_{t}(\hat{X}_{t})\big{)}, and

(a^t(X^t)xa^t(X^t)X^t,xa^t(X^t))argmax(a1,a2)A2(X^t){hL(X^t,XtL,YtL,Y^tF,ZtL,Z^tF,a1,a2)},t[0,T],\displaystyle\big{(}\hat{a}_{t}(\hat{X}_{t})-\partial_{x}\hat{a}_{t}(\hat{X}_{t})\hat{X}_{t},\partial_{x}\hat{a}_{t}(\hat{X}_{t})\big{)}\in\operatorname*{arg\,max}_{({\rm a}^{\text{$1$}},{\rm a}^{\text{$2$}})\in A^{\text{$2$}}(\hat{X}_{\text{$t$}})}\big{\{}h^{\rm L}\big{(}\hat{X}_{t},X^{\rm L}_{t},Y_{t}^{\rm L},\hat{Y}_{t}^{\rm F},Z_{t}^{\rm L},\hat{Z}_{t}^{\rm F},{\rm a}^{1},{\rm a}^{2}\big{)}\big{\}},\;t\in[0,T],

where $A^2(x)$ is the set of $(a^1,a^2)\in\mathbb{R}^2$ such that $a^1+a^2x\in A$. Note, however, that the maximiser of $h^{\rm L}$ is not well-defined without further restrictions on the strategy $\alpha\in{\cal A}$. A way to tackle this issue is to impose a priori bounds on $\partial_x a$, as done in [11, Section 5.2], which will later be relaxed so as not to lose generality. We thus assume that $\|a^2\|_\infty\leq k$ for some $k>0$, and we denote the corresponding constrained solution by ACLM-$k$; we will later study its behaviour as $k\to\infty$. Optimising $h^{\rm L}$ with respect to ${\rm a}^1$ gives

\displaystyle\hat{a}^{1}(y,x)\coloneqq\dfrac{y}{c_{\rm L}}-a^{2}x,\;\text{and}\;h^{\rm L}(x,x^{\prime},y,y^{\prime},z,z^{\prime},\hat{a}^{1},a^{2})=\dfrac{1}{2}\dfrac{y^{2}}{c_{\rm L}}+\dfrac{yy^{\prime}}{c_{\rm F}}+\sigma z-a^{2}y^{\prime}x^{\prime},

where we used that, in the regime of interest (see Lemma A.1), the projection $\Pi_B(y^{\prime}/c_{\rm F})$ is not binding, i.e. $\Pi_B(y^{\prime}/c_{\rm F})=y^{\prime}/c_{\rm F}$.

Then, since the only term involving ${\rm a}^2$ is $-{\rm a}^2y^{\prime}x^{\prime}$, the maximisation with respect to ${\rm a}^2$ gives $\hat{a}^{2}\coloneqq-k\,{\rm sign}(y^{\prime}x^{\prime})$. Therefore, by the maximum principle, if $(\hat{a}^{1},\hat{a}^{2})$ is a solution to Problem (2.11), then there exists a tuple of $\mathbb{F}^W$-adapted processes $(X,X^{\rm L},Y^{\rm F},Z^{\rm F},Y^{\rm L},Z^{\rm L})$ such that

\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg(\hat{a}^{2}_{t}X_{t}+\hat{a}^{1}_{t}+\Pi_{B}\bigg(\dfrac{Y_{t}^{\rm F}}{c_{\rm F}}\bigg)\bigg)\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8pt]
\displaystyle\mathrm{d}X^{\rm L}_{t}=-\bigg(\dfrac{Y^{\rm L}_{t}}{c_{\rm F}}-\hat{a}^{2}_{t}X^{\rm L}_{t}\bigg)\mathrm{d}t,\;t\in[0,T],\;X^{\rm L}_{0}=0,\\[8pt]
\displaystyle\mathrm{d}Y_{t}^{\rm F}=-\hat{a}^{2}_{t}Y_{t}^{\rm F}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1,\\[8pt]
\displaystyle\mathrm{d}Y_{t}^{\rm L}=0\,\mathrm{d}t+Z_{t}^{\rm L}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm L}_{T}=1.\end{cases} (2.12)

We can solve this system explicitly, where $\hat{a}^1_t$ and $\hat{a}^2_t$ stand for $\hat{a}^1(Y^{\rm L}_t,X_t)$ and $-k\,{\rm sign}(Y^{\rm F}_tX^{\rm L}_t)$, respectively. Taking $Z^{\rm L}\equiv Z^{\rm F}\equiv0$ yields $Y^{\rm L}\equiv1$; since $X^{\rm L}_0=0$ and $\mathrm{d}X^{\rm L}_t=(-1/c_{\rm F}+\hat{a}^2_tX^{\rm L}_t)\mathrm{d}t$, the process $X^{\rm L}$ is negative on $(0,T]$, so that $\hat{a}^2_t=k$ and, in turn, $Y^{\rm F}_t=\mathrm{e}^{k(T-t)}$, $t\in[0,T]$. Then we have the candidate solution to ACLM-$k$, given for all $t\in[0,T]$ by

α(t,Xt)=ΠA(1cL+k(XtXt)),β(t)=ΠB(ek(Tt)cF), where Xt=x0+tcL+ekTkcF(1ekt)+σWt.\alpha^{\star}(t,X_{t})=\Pi_{A}\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})\bigg{)},\;\beta^{\star}(t)=\Pi_{B}\bigg{(}\frac{\mathrm{e}^{k(T-t)}}{c_{\rm F}}\bigg{)},\text{ where }X^{\star}_{t}=x_{0}+\dfrac{t}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}}{kc_{\rm F}}\big{(}1-\mathrm{e}^{-kt}\big{)}+\sigma W_{t}.

It is proved in Lemma A.1 that such strategies are optimal for small values of $k$ such that no projection is enforced, meaning that for both controls the terms inside the brackets do not leave the corresponding admissible intervals. Moreover, under the right choice of parameters $a_\circ$ and $b_\circ$, for instance if Condition (2.6) is satisfied, the value of the ACLM problem is equal to that of the ACLM-$k$ problem for $k=\frac{1}{T}\log(b_\circ c_{\rm F})$, and we have

VL=x0+T2cL+T(bcF1)cFlog(bcF),VF(α)=x0+T2cL+T(bcF1)cFlog(bcF)T(b2cF21)4cFlog(bcF).V_{\rm L}=x_{0}+\frac{T}{2c_{\rm L}}+\frac{T(b_{\circ}c_{\rm F}-1)}{c_{\rm F}\log(b_{\circ}c_{\rm F})},\leavevmode\nobreak\ V_{\rm F}(\alpha^{\star})=x_{0}+\frac{T}{2c_{\rm L}}+\frac{T(b_{\circ}c_{\rm F}-1)}{c_{\rm F}\log(b_{\circ}c_{\rm F})}-\frac{T(b_{\circ}^{2}c_{\rm F}^{2}-1)}{4c_{\rm F}\log(b_{\circ}c_{\rm F})}.

It follows directly that this value is strictly smaller than the value of the first–best problem.
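As a quick numerical sanity check of these closed-form expressions, the following snippet evaluates $V_{\rm L}$ and $V_{\rm F}(\alpha^\star)$ under the benchmark parameters later used in Section 2.2.4 ($T=1$, $c_{\rm F}=c_{\rm L}=1$, $b_\circ=3$), with the illustrative choice $x_0=0$.

```python
import numpy as np

# Closed-form ACLM values for k = log(b_circ * c_F) / T, under the
# benchmark parameters of the numerical section; x_0 = 0 is illustrative.
T, cF, cL, b, x0 = 1.0, 1.0, 1.0, 3.0, 0.0

k = np.log(b * cF) / T
VL = x0 + T / (2 * cL) + T * (b * cF - 1) / (cF * np.log(b * cF))
VF = VL - T * (b**2 * cF**2 - 1) / (4 * cF * np.log(b * cF))

print(f"k = {k:.4f}, V_L = {VL:.4f}, V_F(alpha*) = {VF:.4f}")
# Output: k = 1.0986, V_L = 2.3205, V_F(alpha*) = 0.5000
```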

2.1.4 Adapted closed-loop strategies

Recall that when the information structure is assumed to be adapted closed-loop (with memory), both the leader and the follower observe the paths of the state $X$ and the underlying Brownian motion, and can use these observations to construct their strategies. Then, the leader's strategy space and the follower's response strategy space are respectively defined by

𝒜\displaystyle{\cal A} {α:[0,T]×Ω×𝒞([0,T],)A:𝔽W-adapted},\displaystyle\coloneqq\big{\{}\alpha:[0,T]\times\Omega\times{\cal C}([0,T],\mathbb{R})\longrightarrow A:\mathbb{F}^{W}\text{-adapted}\big{\}},
\displaystyle{\cal B} {β:[0,T]×Ω×𝒞([0,T],)×𝒜B:β(,α)𝔽W-adapted,α𝒜}.\displaystyle\coloneqq\big{\{}\beta:[0,T]\times\Omega\times{\cal C}([0,T],\mathbb{R})\times{\cal A}\longrightarrow B:\beta(\cdot,\alpha)\;\mathbb{F}^{W}\text{-adapted},\;\forall\alpha\in{\cal A}\big{\}}.

In our example, and under this particular information structure, the leader actually has enough information to deduce the effort of the follower. Therefore, if the leader has enough bargaining power, she may force the follower to undertake a recommended effort. More precisely, for $a_\circ$ sufficiently large, the leader is able to punish the follower if he deviates from the desired action. Indeed, suppose the leader wants to force the follower to perform the action $\hat\beta\in{\cal B}$ while herself playing an action $\hat\alpha\in{\cal A}$. One possible way to induce these strategies is for the leader to play

αtα^tp𝟙βtβ^t,\alpha_{t}\coloneqq\hat{\alpha}_{t}-p\mathds{1}_{\beta_{\text{$t$}}^{\text{$\circ$}}\neq\hat{\beta}_{\text{$t$}}},

for some penalty coefficient p0p\geq 0, and where β\beta^{\circ} represents the ‘reference’ effort, defined by

βt\displaystyle\beta_{t}^{\circ} limsupε0(βtβtεε),withβtXtσWt0tα^sds,t[0,T].\displaystyle\coloneqq\underset{\varepsilon\searrow 0}{\rm{limsup}}\;\bigg{(}\frac{\upbeta_{t}^{\circ}-\upbeta_{t-\varepsilon}^{\circ}}{\varepsilon}\bigg{)},\;\text{with}\;\upbeta_{t}^{\circ}\coloneqq X_{t}-\sigma W_{t}-\int_{0}^{t}\hat{\alpha}_{s}\mathrm{d}s,\;t\in[0,T].

In words, by implementing the strategy $\alpha$ defined above, the leader threatens to punish the follower whenever the observed effort $\beta^\circ$ deviates from the recommended effort $\hat\beta$. Note that the definition of $\beta^\circ$ makes use of the fact that the leader observes the trajectories of both the state and the Brownian motion. In particular, such a strategy $\alpha$ could not be implemented under the previous ACLM information structure.
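To illustrate how the leader can recover the follower's effort from her observations, the following discrete-time sketch simulates the state dynamics and reconstructs $\beta^\circ$ through the difference quotient replacing the limsup above. The time grid, the constant recommended effort, and the specific deviation profile beta_true are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch only).
T, n, sigma, x0 = 1.0, 1000, 1.0, 0.0
dt = T / n
t = np.linspace(0.0, T, n + 1)

alpha_hat = np.ones(n)                                # recommended leader effort
beta_true = 0.5 + 0.2 * np.sin(2 * np.pi * t[:-1])    # follower's actual effort

# Euler simulation of dX_t = (alpha_hat_t + beta_t) dt + sigma dW_t.
dW = rng.normal(0.0, np.sqrt(dt), n)
X = x0 + np.concatenate(([0.0], np.cumsum((alpha_hat + beta_true) * dt + sigma * dW)))
W = np.concatenate(([0.0], np.cumsum(dW)))

# Observed cumulative effort, and its difference quotient: the discrete
# analogue of the limsup defining beta^circ.
cum = X - x0 - sigma * W - np.concatenate(([0.0], np.cumsum(alpha_hat * dt)))
beta_obs = np.diff(cum) / dt

print(np.max(np.abs(beta_obs - beta_true)))   # ~ 0, up to floating-point error
```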

In general, we say the leader can effectively punish the follower for not playing β^\hat{\beta} if

α𝒜,JF(α,β^)JF(α,β),β,andJL(α,β^)JL(α^,β^).\displaystyle\exists\alpha\in{\cal A},J_{\rm F}(\alpha,\hat{\beta})\geq J_{\rm F}(\alpha,\beta),\;\forall\beta\in{\cal B},\;\text{and}\;J_{\rm L}(\alpha,\hat{\beta})\geq J_{\rm L}(\hat{\alpha},\hat{\beta}). (2.13)

In words, there exists an admissible strategy α𝒜\alpha\in{\cal A} such that the optimal response of the follower to α\alpha is to play β^\hat{\beta}, and there is no detriment to the leader’s utility when implementing the strategy α\alpha instead of α^\hat{\alpha}.

We mention that in this example, we actually have the equality JL(α,β^)=JL(α^,β^)J_{\rm L}(\alpha,\hat{\beta})=J_{\rm L}(\hat{\alpha},\hat{\beta}). More precisely, the leader can replicate the first-best solution by choosing α^cL1\hat{\alpha}\equiv c_{\rm L}^{-1} and forcing the follower’s action β^b\hat{\beta}\equiv b_{\circ}. Indeed, given the leader’s strategy αtcL1p𝟙βtb\alpha_{t}\coloneqq c_{\rm L}^{-1}-p\mathds{1}_{\beta_{t}^{\text{$\circ$}}\neq b_{\circ}}, we have for all β\beta\in{\cal B}

JF(α,b)JF(α,β)=𝔼[0T(bcF2b2+p𝟙βtbβt+cF2βt2)dt],\displaystyle J_{\rm F}(\alpha,b_{\circ})-J_{\rm F}(\alpha,\beta)=\mathbb{E}^{\mathbb{P}}\bigg{[}\int_{0}^{T}\bigg{(}b_{\circ}-\dfrac{c_{\rm F}}{2}b_{\circ}^{2}+p\mathds{1}_{\beta_{t}^{\text{$\circ$}}\neq b_{\circ}}-\beta_{t}+\dfrac{c_{\rm F}}{2}\beta_{t}^{2}\bigg{)}\mathrm{d}t\bigg{]},

and therefore, since the follower's most profitable deviation is $\beta_t=1/c_{\rm F}$, for which $\beta_t-c_{\rm F}\beta_t^2/2=(2c_{\rm F})^{-1}$, the effectiveness of the punishment amounts to $p\geq(2c_{\rm F})^{-1}+c_{\rm F}b_\circ^2/2-b_\circ$. This strategy can be implemented if the process $\alpha$ defined above is admissible, in the sense that it takes values in $A$. Therefore, if $a_\circ$ is sufficiently large, for instance if Condition (2.7) holds, then the solution to the ACL Stackelberg equilibrium in this example coincides with that of the first-best problem, given in Lemma 2.1.

Remark 2.2.

Let us remark that the previous argument shows that, for any Stackelberg game under closed-loop strategies in which the leader can punish the follower in a way that is effective and causes no additional cost, i.e. such that (2.13) holds for $(\hat\alpha,\hat\beta)$ being the solution to the first-best problem, the equality $V_{\rm L}=V_{\rm L}^{\rm FB}$ holds.

2.2 Closed-loop strategies

The approach we develop in this paper provides a way of studying and characterising a new, albeit natural, type of Stackelberg equilibrium, in which both players only have access to the trajectory of the state variable $X$. Consistent with the literature on stochastic differential games (see, for example, Carmona [16]), we name this information concept closed-loop (CL). Under this information structure, both players can only take into account the past trajectory of the state $X$ when making their decisions. Then, the leader's strategy space and the follower's response strategy space are respectively given by

𝒜\displaystyle{\cal A} {α:[0,T]×𝒞([0,T],)A:𝔽-adapted},\displaystyle\coloneqq\big{\{}\alpha:[0,T]\times{\cal C}([0,T],\mathbb{R})\longrightarrow A:\mathbb{F}\text{-adapted}\big{\}},
\displaystyle{\cal B} {β:[0,T]×𝒞([0,T],)×𝒜B:β(,α)𝔽-adapted,α𝒜},\displaystyle\coloneqq\big{\{}\beta:[0,T]\times{\cal C}([0,T],\mathbb{R})\times{\cal A}\longrightarrow B:\beta(\cdot,\alpha)\;\mathbb{F}\text{-adapted},\;\forall\alpha\in{\cal A}\big{\}},

where $\mathbb{F}$ denotes the filtration generated by $X$. As already mentioned in the introduction, allowing for path-dependency leads to a more realistic and sophisticated form of equilibrium, which is consequently more challenging to solve. In this case, the difficulty arises because the approaches developed above for solving the Stackelberg open-loop or closed-loop memoryless equilibria, which mostly rely on the maximum principle, can no longer be used. To the best of our knowledge, there is currently no method in the literature for solving Stackelberg games within the framework of this very general, yet quite natural, information structure.

The aim of this paper is, therefore, precisely to propose an approach, based on the dynamic programming principle and stochastic target problems, for characterising the solution for this type of equilibrium. Our methodology, which consists of two main steps, is informally illustrated through the example presented at the top of this section. The first step is to use the follower’s value function as a state variable for the leader’s problem. More precisely, this value function solves a backward SDE, and by writing it in a forward way, we are able to reformulate the leader’s problem as a stochastic control problem of an SDE system with stochastic target constraints. The second step consists in applying the methodology developed by [14] to characterise such a stochastic control problem with target constraints through a system of Hamilton–Jacobi–Bellman equations. Note that the reasoning developed in this section is quite informal, the aim being simply to illustrate our method; the reader is referred to Section 3 onwards for the rigorous description of our approach.

2.2.1 Reformulation as a stochastic target problem

Recall that, given the leader’s strategy α𝒜\alpha\in{\cal A}, the follower’s problem is given by (2.2). The idea of our approach to compute the Stackelberg equilibrium for closed-loop strategies is to consider the BSDE satisfied by the value function of the follower.121212Actually, one should switch to the weak formulation of the problem in order to consider the BSDE representation of the follower’s value. Nevertheless, once again our goal here is simply to illustrate our method, and we refer to Section 3 for the rigorous formulation of the problem. With this in mind, we introduce the dynamic value function of the follower given by

Ytαesssupβ𝔼[XTcF2tTβs2ds|t],t[0,T],\displaystyle Y_{t}^{\alpha}\coloneqq\operatorname*{ess\,sup}_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{t}^{T}\beta_{s}^{2}\mathrm{d}s\bigg{|}{\cal F}_{t}\bigg{]},\;t\in[0,T],

where the state variable XX follows the dynamics given by (2.1). By introducing the appropriate Hamiltonian, i.e.

HF(t,z,a)supbB{(a+b)zcF2b2},(t,z,a)[0,T]××A,\displaystyle H^{\rm F}(t,z,a)\coloneqq\sup_{b\in B}\bigg{\{}(a+b)z-\dfrac{c_{\rm F}}{2}b^{2}\bigg{\}},\;(t,z,a)\in[0,T]\times\mathbb{R}\times A,

it is easy to show that, for a given α𝒜\alpha\in{\cal A}, the value function of the follower is a solution to the following BSDE

dYtα=HF(t,Ztα,αt)dt+ZtαdXt,t[0,T],YTα=XT,\displaystyle\mathrm{d}Y^{\alpha}_{t}=-H^{\rm F}(t,Z^{\alpha}_{t},\alpha_{t})\mathrm{d}t+Z^{\alpha}_{t}\mathrm{d}X_{t},\;t\in[0,T],\;Y^{\alpha}_{T}=X_{T},

for some $Z^{\alpha}\in{\cal Z}$, where ${\cal Z}$ is a set of $\mathbb{F}$-adapted processes taking values in $\mathbb{R}$ and satisfying appropriate integrability conditions. The maximiser of the Hamiltonian is naturally given by the functional $b^{\star}(z)=\Pi_{\tilde B}(z)/c_{\rm F}$, $z\in\mathbb{R}$, where $\Pi_{\tilde B}(z)$ denotes the projection of $z$ on $\tilde B\coloneqq[0,b_\circ c_{\rm F}]$. For a given strategy $\alpha\in{\cal A}$ chosen by the leader, we are thus led to consider the following FBSDE system

{dXt=(αt+1cFΠB~(Ztα))dt+σdWt,t[0,T],X0=x0,dYtα=12cFΠB~2(Ztα)dt+σZtαdWt,t[0,T],YTα=XT.\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg{(}\alpha_{t}+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}\big{(}Z_{t}^{\alpha}\big{)}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt] \displaystyle\mathrm{d}Y_{t}^{\alpha}=\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}\big{(}Z_{t}^{\alpha}\big{)}\mathrm{d}t+\sigma Z^{\alpha}_{t}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\alpha}_{T}=X_{T}.\end{cases} (2.14)
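For the reader's convenience, let us verify the drift of $Y^\alpha$ in (2.14). Since the supremum defining $H^{\rm F}$ is attained at $b^\star(z)=\Pi_{\tilde B}(z)/c_{\rm F}$, we have $H^{\rm F}(t,z,a)=az+z\Pi_{\tilde B}(z)/c_{\rm F}-\Pi^2_{\tilde B}(z)/(2c_{\rm F})$, and therefore, substituting the dynamics of $X$ at the optimum,

\mathrm{d}Y^{\alpha}_{t}=\bigg(-\alpha_{t}Z^{\alpha}_{t}-\dfrac{Z^{\alpha}_{t}\Pi_{\tilde B}(Z^{\alpha}_{t})}{c_{\rm F}}+\dfrac{\Pi^{2}_{\tilde B}(Z^{\alpha}_{t})}{2c_{\rm F}}+Z^{\alpha}_{t}\bigg(\alpha_{t}+\dfrac{\Pi_{\tilde B}(Z^{\alpha}_{t})}{c_{\rm F}}\bigg)\bigg)\mathrm{d}t+\sigma Z^{\alpha}_{t}\mathrm{d}W_{t}=\dfrac{\Pi^{2}_{\tilde B}(Z^{\alpha}_{t})}{2c_{\rm F}}\mathrm{d}t+\sigma Z^{\alpha}_{t}\mathrm{d}W_{t},

which is precisely the second equation in (2.14).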

Consequently, the leader’s problem defined by (2.4) becomes

VL(x0)=supα𝒜𝔼[XTcL20Tαt2dt], subject to the FBSDE system (2.14).\displaystyle V_{\rm L}(x_{0})=\sup_{\alpha\in{\cal A}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},\text{ subject to the FBSDE system \eqref{eq:ACL-follower-example}}.

Unfortunately, the literature on the optimal control of FBSDEs is quite scarce and, to the best of our knowledge, cannot accommodate the scenario described above; see for instance Yong [81] or Wu [79]. Nevertheless, to continue the reformulation of the leader's problem, one can write the BSDE in (2.14) as a forward SDE for a given initial condition $y_0\in\mathbb{R}$, and thus consider the following SDE system

{dXt=(αt+1cFΠB~(Zt))dt+σdWt,t[0,T],X0=x0,dYt=12cFΠB~2(Zt)dt+σZtdWt,t[0,T],Y0=y0,\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg{(}\alpha_{t}+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(Z_{t})\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt] \displaystyle\mathrm{d}Y_{t}=\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(Z_{t})\mathrm{d}t+\sigma Z_{t}\mathrm{d}W_{t},\;t\in[0,T],\;Y_{0}=y_{0},\end{cases} (2.15)

for some (α,Z)𝒜×𝒵(\alpha,Z)\in{\cal A}\times{\cal Z}. However, by doing so, one needs to take into account an additional constraint, namely a stochastic target constraint, in order to ensure that the equality YT=XTY_{T}=X_{T} holds with probability one at the end of the game. More precisely, one of the main results of our paper, stated for the general framework in Theorem 4.6, is that the leader’s problem originally defined here by (2.4) is equivalent to the following stochastic target problem

V^L(x0)supy0sup(Z,α)(x0,y0)𝔼[XTcL20Tαt2dt],\displaystyle\widehat{V}_{\rm L}(x_{0})\coloneqq\sup_{y_{\text{$0$}}\in\mathbb{R}}\sup_{(Z,\alpha)\in{\mathfrak{C}}(x_{\text{$0$}},y_{\text{$0$}})}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},

subject to the system (2.15), and where (x0,y0){(Z,α)𝒵×𝒜:YT=XT,–a.s.}\mathfrak{C}(x_{0},y_{0})\coloneqq\{(Z,\alpha)\in{\cal Z}\times{\cal A}:Y_{T}=X_{T},\;\mathbb{P}\text{\rm--a.s.}\}, for any (x0,y0)2(x_{0},y_{0})\in\mathbb{R}^{2}.

2.2.2 Interpretation of the reformulated problem

The interpretation of the reformulated problem V^L\widehat{V}_{\rm L} is the following. For fixed y0y_{0}\in\mathbb{R}, the leader’s problem is to choose a couple (Z,α)(Z,\alpha) of admissible controls. With this in mind, given the state XX observable in continuous time, she can construct an additional process YY, starting from Y0=y0Y_{0}=y_{0}, with the following dynamics

dYt=HF(t,Zt,αt)dt+ZtdXt,t[0,T].\displaystyle\mathrm{d}Y_{t}=-H^{\rm F}(t,Z_{t},\alpha_{t})\mathrm{d}t+Z_{t}\mathrm{d}X_{t},\;t\in[0,T].

Note that the previous process YY can be constructed based solely on the observation through time of the path of XX, and in particular does not require any knowledge of the follower’s control β\beta nor of the trajectory of the underlying Brownian motion WW. Now, the couple (Z,α)(Z,\alpha) of admissible processes chosen by the leader should be such that the terminal condition YT=XTY_{T}=X_{T} is satisfied \mathbb{P}–a.s. Indeed, under this important condition, the follower’s problem originally defined by (2.2) can be rewritten as

VF(α)supβ𝔼[XTcF20Tβt2dt]=supβ𝔼[YTcF20Tβt2dt].\displaystyle V_{\rm F}(\alpha)\coloneqq\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}=\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}Y_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}.

With the knowledge of the dynamics of $Y$, as well as of the leader's controls $(Z,\alpha)$, the follower can easily solve his own optimisation problem

VF(α)\displaystyle V_{\rm F}(\alpha) =y0+supβ𝔼[0THF(t,Zt,αt)dt+0TZtdXtcF20Tβt2dt]\displaystyle=y_{0}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}-\int_{0}^{T}H^{\rm F}(t,Z_{t},\alpha_{t})\mathrm{d}t+\int_{0}^{T}Z_{t}\mathrm{d}X_{t}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}
=y0+supβ𝔼[0TsupbB{(αt+b)ZtcF2b2}dt+0TZt(αt+βt)dt+0TσZtdWtcF20Tβt2dt]\displaystyle=y_{0}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}-\int_{0}^{T}\sup_{b\in B}\bigg{\{}(\alpha_{t}+b)Z_{t}-\dfrac{c_{\rm F}}{2}b^{2}\bigg{\}}\mathrm{d}t+\int_{0}^{T}Z_{t}(\alpha_{t}+\beta_{t})\mathrm{d}t+\int_{0}^{T}\sigma Z_{t}\mathrm{d}W_{t}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}
\displaystyle=y_{0}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg[-\int_{0}^{T}\sup_{b\in B}\bigg\{bZ_{t}-\dfrac{c_{\rm F}}{2}b^{2}\bigg\}\mathrm{d}t+\int_{0}^{T}\bigg(Z_{t}\beta_{t}-\dfrac{c_{\rm F}}{2}\beta_{t}^{2}\bigg)\mathrm{d}t\bigg],

making it clear, at least heuristically here, that his best response strategy is to choose $\beta_t\coloneqq\Pi_{\tilde B}(Z_t)/c_{\rm F}$, $t\in[0,T]$, as this choice achieves the pointwise supremum inside the first integral. This optimal choice provides him with the maximal value, for any $(\alpha,Z)\in{\cal A}\times{\cal Z}$. In particular, if $Z_t\in\tilde B$ for all $t\in[0,T]$, then $V_{\rm F}(\alpha)=y_0$. To summarise, for a given $y_0\in\mathbb{R}$, which actually coincides with the follower's value, the leader designs her strategy, characterised by the couple $(Z,\alpha)$, so that $Y_T=X_T$ is satisfied $\mathbb{P}$--a.s. for the well-chosen process $Y$, inducing the follower's optimal response $\beta_\cdot\coloneqq\Pi_{\tilde B}(Z_\cdot)/c_{\rm F}$. Note that the leader should not only communicate the couple $(Z,\alpha)$ of controls to the follower, but also indicate how these controls are designed, namely through the construction of the underlying process $Y$: all these ingredients are part of the strategy implemented by the leader.
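To make this interpretation concrete, here is a minimal discrete-time sketch in which the leader constructs $Y$ forward using only the observed increments of $X$. As a sanity check, we take the constant controls $Z\equiv1$ and $\alpha\equiv-a_\circ$, started from $y_0=x_0+(1/(2c_{\rm F})-a_\circ)T$, a choice for which the target constraint is met exactly; these are precisely the controls that will reappear on the boundary $\{y=w^-\}$ in the next subsection. All numerical values are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions for this sketch).
T, n, sigma, cF, a0, b0, x0 = 1.0, 1000, 1.0, 1.0, 10.0, 3.0, 0.0
dt = T / n

proj = lambda z: np.clip(z, 0.0, b0 * cF)    # projection on B_tilde

# Lower-boundary controls Z = 1, alpha = -a0, started from y0 = w^-(0, x0).
Z, a = 1.0, -a0
y0 = x0 + (1.0 / (2 * cF) - a0) * T

X, Y = x0, y0
for i in range(n):
    beta = proj(Z) / cF                      # follower's best response
    HF = (a + beta) * Z - cF / 2 * beta**2   # H^F(t, Z, alpha) at the optimum
    dX = (a + beta) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
    Y += -HF * dt + Z * dX                   # uses only the observed increment of X
    X += dX

print(abs(Y - X))   # ~ 0: the target constraint Y_T = X_T is met
```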

2.2.3 Characterisation of the equilibrium

Given the reformulation of the leader’s problem as a stochastic control problem with stochastic target constraint, the second step consists now in applying the methodology in [14] to solve the latter problem and thus obtain a characterisation of the corresponding Stackelberg equilibrium.

Recall that in our illustrative example, the leader’s reformulated problem takes the following form

V^L(x0)supy0V~L(0,x0,y0),whereV~L(t,x,y)sup(Z,α)(t,x,y)𝔼[XTt,x,Z,αcL2tTαs2ds],\displaystyle\widehat{V}_{\rm L}(x_{0})\coloneqq\sup_{y_{\text{$0$}}\in\mathbb{R}}\widetilde{V}_{\rm L}(0,x_{0},y_{0}),\;\text{where}\;\widetilde{V}_{\rm L}(t,x,y)\coloneqq\sup_{(Z,\alpha)\in{\mathfrak{C}}(t,x,y)}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}^{t,x,Z,\alpha}-\dfrac{c_{\rm L}}{2}\int_{t}^{T}\alpha_{s}^{2}\mathrm{d}s\bigg{]}, (2.16)

where for (t,x,y)[0,T]×2(t,x,y)\in[0,T]\times\mathbb{R}^{2}, the set (t,x,y){\mathfrak{C}}(t,x,y) is defined by

(t,x,y){(Z,α)𝒵×𝒜:YTt,y,Z,α=XTt,x,Z,α,–a.s.},\displaystyle{\mathfrak{C}}(t,x,y)\coloneqq\big{\{}(Z,\alpha)\in{\cal Z}\times{\cal A}:Y_{T}^{t,y,Z,\alpha}=X_{T}^{t,x,Z,\alpha},\;\mathbb{P}\text{\rm--a.s.}\big{\}},

with the controlled state variables XX and YY satisfying the following dynamics

{dXst,x,Z,α=(αs+1cFΠB~(Zs))ds+σdWs,s[t,T],Xtt,x,Z,α=x,dYst,y,Z,α=12cFΠB~2(Zs)ds+σZsdWs,s[t,T],Ytt,y,Z,α=y.\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{s}^{t,x,Z,\alpha}=\bigg{(}\alpha_{s}+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(Z_{s})\bigg{)}\mathrm{d}s+\sigma\mathrm{d}W_{s},\;s\in[t,T],\;X_{t}^{t,x,Z,\alpha}=x,\\[8.00003pt] \displaystyle\mathrm{d}Y_{s}^{t,y,Z,\alpha}=\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(Z_{s})\mathrm{d}s+\sigma Z_{s}\mathrm{d}W_{s},\;s\in[t,T],\;Y_{t}^{t,y,Z,\alpha}=y.\end{cases} (2.17)

In particular, for fixed (t,x,y)[0,T]×2(t,x,y)\in[0,T]\times\mathbb{R}^{2}, V~L(t,x,y)\widetilde{V}_{\rm L}(t,x,y) corresponds to the dynamic value function of an optimal control problem with stochastic target constraints. Thus, we define for any t[0,T]t\in[0,T] the target reachability set

VG(t){(x,y)2:(Z,α)𝒵×𝒜,YTt,y,Z,α=XTt,x,Z,α,–a.s.}.\displaystyle V_{G}(t)\coloneqq\big{\{}(x,y)\in\mathbb{R}^{2}:\exists(Z,\alpha)\in{\cal Z}\times{\cal A},\;Y_{T}^{t,y,Z,\alpha}=X_{T}^{t,x,Z,\alpha},\;\mathbb{P}\text{\rm--a.s.}\big{\}}.

An intermediate but important result for our approach, see Lemma 5.3, is that the closure of the reachability set $V_G(t)$ coincides with the following set

V^G(t){(x,y)2:w(t,x)yw+(t,x)},\displaystyle\hat{V}_{G}(t)\coloneqq\{(x,y)\in\mathbb{R}^{2}:w^{-}(t,x)\leq y\leq w^{+}(t,x)\},

for appropriate auxiliary functions ww^{-} and w+w^{+}. It is then almost straightforward to extend the approach in [14] to characterise the leader’s value function V~L\widetilde{V}_{\rm L} as the solution to a specific system of Hamilton–Jacobi–Bellman (HJB) equations and therefore determine the corresponding optimal strategy. More precisely, this can be achieved in three main steps. First, the auxiliary functions ww^{-} and w+w^{+} can be characterised as solutions (in an appropriate sense) to specific HJB equations. Then, the leader’s value function V~L\widetilde{V}_{\rm L} satisfies another specific HJB equation on each of these boundaries. Finally, in the interior of the domain, V~L\widetilde{V}_{\rm L} is a solution to the classical HJB equation, but with the non-standard boundary conditions obtained in the previous step, see Theorem 5.8. These three steps are described below in the framework of our illustrative example.

The auxiliary functions.

The lower and upper boundaries $w^-$ and $w^+$ can be characterised as the solutions to the following HJB equations, for $(t,x)\in[0,T)\times\mathbb{R}$:

tw+(t,x)H+(t,x,xw+(t,x),xxw+(t,x))=0,tw(t,x)H(t,x,xw(t,x),xxw(t,x))=0,\displaystyle\displaystyle-\partial_{t}w^{+}(t,x)-H^{+}(t,x,\partial_{x}w^{+}(t,x),\partial_{xx}w^{+}(t,x))=0,\;\displaystyle-\partial_{t}w^{-}(t,x)-H^{-}(t,x,\partial_{x}w^{-}(t,x),\partial_{xx}w^{-}(t,x))=0,

with terminal condition w(T,x)=w+(T,x)=xw^{-}(T,x)=w^{+}(T,x)=x, xx\in\mathbb{R}, and where for all (t,x,p,q)[0,T]×3(t,x,p,q)\in[0,T]\times\mathbb{R}^{3}

H+(t,x,p,q)\displaystyle H^{+}(t,x,p,q) sup(z,a)N(t,x,p)hb(p,q,z,a),H(t,x,p,q)inf(z,a)N(t,x,p)hb(p,q,z,a),\displaystyle\coloneqq\sup_{(z,a)\in N(t,x,p)}h^{b}(p,q,z,a),\;H^{-}(t,x,p,q)\coloneqq\inf_{(z,a)\in N(t,x,p)}h^{b}(p,q,z,a),
withhb(p,q,z,a)\displaystyle\text{with}\;h^{b}(p,q,z,a) 12cFΠB~2(z)+(a+1cFΠB~(z))p+12σ2q,for(z,a)N(t,x,p){(z,a)×A:σz=σp}.\displaystyle\coloneqq-\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(z)+\bigg{(}a+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(z)\bigg{)}p+\dfrac{1}{2}\sigma^{2}q,\;\text{for}\;(z,a)\in N(t,x,p)\coloneqq\{(z,a)\in\mathbb{R}\times A:\sigma z=\sigma p\}.

Since σ0\sigma\neq 0, the constraint set NN boils down to N(t,x,p)={(p,a):aA}N(t,x,p)=\{(p,a):a\in A\}, for all (t,x,p)[0,T]×2(t,x,p)\in[0,T]\times\mathbb{R}^{2}. Using in addition the ansatz xw±B~\partial_{x}w^{\pm}\in\tilde{B}, one obtains the following HJB equations on (t,x)[0,T)×(t,x)\in[0,T)\times\mathbb{R}

tw(t,x)12σ2xxw(t,x)12cF(xw(t,x))2infaA{xw(t,x)a}\displaystyle-\partial_{t}w^{-}(t,x)-\dfrac{1}{2}\sigma^{2}\partial_{xx}w^{-}(t,x)-\dfrac{1}{2c_{\rm F}}\big{(}\partial_{x}w^{-}(t,x)\big{)}^{2}-\inf_{a\in A}\big{\{}\partial_{x}w^{-}(t,x)a\big{\}} =0,\displaystyle=0,
tw+(t,x)12σ2xxw+(t,x)12cF(xw+(t,x))2supaA{xw+(t,x)a}\displaystyle-\partial_{t}w^{+}(t,x)-\dfrac{1}{2}\sigma^{2}\partial_{xx}w^{+}(t,x)-\dfrac{1}{2c_{\rm F}}\big{(}\partial_{x}w^{+}(t,x)\big{)}^{2}-\sup_{a\in A}\big{\{}\partial_{x}w^{+}(t,x)a\big{\}} =0,\displaystyle=0,

with terminal condition $w^-(T,x)=w^+(T,x)=x$, $x\in\mathbb{R}$. Recalling that $A=[-a_\circ,a_\circ]$, one can explicitly compute the auxiliary functions, solutions to the previous HJB equations:

w(t,x)=x+(12cFa)(Tt),andw+(t,x)=x+(12cF+a)(Tt),(t,x)[0,T]×.\displaystyle w^{-}(t,x)=x+\bigg{(}\dfrac{1}{2c_{\rm F}}-a_{\circ}\bigg{)}(T-t),\;\text{and}\;w^{+}(t,x)=x+\bigg{(}\dfrac{1}{2c_{\rm F}}+a_{\circ}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R}. (2.18)
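These expressions can be checked directly: assuming $b_\circ c_{\rm F}\geq1$, so that the ansatz $\partial_xw^\pm=1\in\tilde B$ is indeed verified, and since $\partial_{xx}w^\pm=0$, we get for $w^-$

-\partial_tw^-(t,x)-\dfrac{1}{2}\sigma^2\partial_{xx}w^-(t,x)-\dfrac{1}{2c_{\rm F}}\big(\partial_xw^-(t,x)\big)^2-\inf_{a\in A}\big\{a\,\partial_xw^-(t,x)\big\}=\bigg(\dfrac{1}{2c_{\rm F}}-a_\circ\bigg)-\dfrac{1}{2c_{\rm F}}+a_\circ=0,

and similarly for $w^+$, while the terminal condition $w^\pm(T,x)=x$ holds by construction.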
Remark 2.3.

Let us remark that, in the context of this example, the boundedness assumption on $A$ is necessary to obtain meaningful, i.e. finite, solutions. Though the methodology developed in [14] can cover the case of unbounded action sets, this would require imposing growth conditions that, in turn, would rule out the framework of the current example. Moreover, we also remark that the possibility of discontinuous or exploding solutions requires working with solutions to the above PDEs in the viscosity sense.

The value function at the boundaries.

The second step is to determine the HJB equations satisfied by the value function V~L(t,x,y)\widetilde{V}_{\rm L}(t,x,y) on the boundaries, i.e. on {y=w(t,x)}\{y=w^{-}(t,x)\} and {y=w+(t,x)}\{y=w^{+}(t,x)\}, for all (t,x)[0,T]×(t,x)\in[0,T]\times\mathbb{R}. With this in mind, we define for all p(p1,p2)2p\coloneqq(p_{1},p_{2})^{\top}\in\mathbb{R}^{2}, q2×2q\in\mathbb{R}^{2\times 2} and (z,a)×A(z,a)\in\mathbb{R}\times A,

h(p,q,z,a)cL2a2+(a+1cFΠB~(z))p1+12cFΠB~2(z)p2+12σ2q11+12σ2z2q22+σ2zq12.\displaystyle{\rm h}(p,q,z,a)\coloneqq-\dfrac{c_{\rm L}}{2}a^{2}+\bigg{(}a+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(z)\bigg{)}p_{1}+\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(z)p_{2}+\dfrac{1}{2}\sigma^{2}q_{11}+\dfrac{1}{2}\sigma^{2}z^{2}q_{22}+\sigma^{2}zq_{12}.

We then introduce the following Hamiltonians, for all (t,x,p,q)[0,T]××2×3(t,x,p,q)\in[0,T]\times\mathbb{R}\times\mathbb{R}^{2}\times\mathbb{R}^{3},

H(t,x,p,q)sup(z,a)𝒵(t,x)h(p,q,z,a),andH+(t,x,p,q)sup(z,a)𝒵+(t,x)h(p,q,z,a),\displaystyle{\rm H}^{-}(t,x,p,q)\coloneqq\sup_{(z,a)\in{\cal Z}^{\text{$-$}}(t,x)}{\rm h}(p,q,z,a),\;\text{and}\;{\rm H}^{+}(t,x,p,q)\coloneqq\sup_{(z,a)\in{\cal Z}^{\text{$+$}}(t,x)}{\rm h}(p,q,z,a),

in which the sets 𝒵±(t,x){\cal Z}^{\pm}(t,x) are respectively defined by

𝒵(t,x)\displaystyle{\cal Z}^{-}(t,x) {(z,a)×A:σz=σxw(t,x),andtw(t,x)hb(xw(t,x),xxw(t,x),z,a)0},\displaystyle\coloneqq\big{\{}(z,a)\in\mathbb{R}\times A:\sigma z=\sigma\partial_{x}w^{-}(t,x),\;\text{and}\;-\partial_{t}w^{-}(t,x)-h^{b}(\partial_{x}w^{-}(t,x),\partial_{xx}w^{-}(t,x),z,a)\geq 0\big{\}},
𝒵+(t,x)\displaystyle{\cal Z}^{+}(t,x) {(z,a)×A:σz=σxw+(t,x),andtw+(t,x)hb(xw+(t,x),xxw+(t,x),z,a)0}.\displaystyle\coloneqq\big{\{}(z,a)\in\mathbb{R}\times A:\sigma z=\sigma\partial_{x}w^{+}(t,x),\;\text{and}\;-\partial_{t}w^{+}(t,x)-h^{b}(\partial_{x}w^{+}(t,x),\partial_{xx}w^{+}(t,x),z,a)\leq 0\big{\}}.

On the one hand, the value function V~L\widetilde{V}_{\rm L} should satisfy on {y=w(t,x)}\{y=w^{-}(t,x)\} the following equation

tv(t,x,y)H(t,x,xv(t,x,y),x2v(t,x,y))=0,(t,x,y)[0,T)×2,\displaystyle-\partial_{t}v(t,x,y)-{\rm H}^{-}(t,x,\partial_{\rm x}v(t,x,y),\partial^{2}_{\rm x}v(t,x,y))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{2},

with terminal condition $v(T,x,w^-(T,x))=x$, $x\in\mathbb{R}$. (Here, $\partial_{\rm x}v(t,x,y)$ and $\partial^2_{\rm x}v(t,x,y)$ denote respectively the gradient and Hessian of the function $v$ in both space variables ${\rm x}\coloneqq(x,y)$.) Given the previous HJB equation satisfied by $w^-$, it is clear that ${\cal Z}^-(t,x)=\{(1,-a_\circ)\}$, for all $(t,x)\in[0,T)\times\mathbb{R}$. We thus obtain a standard PDE for $\widetilde{V}_{\rm L}$ on $\{y=w^-(t,x)\}$, namely

tv+12cLa2(1cFa)xv12cFyv12σ2xxv12σ2yyvσ2xyv=0,(t,x)[0,T]×,\displaystyle-\partial_{t}v+\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}-\bigg{(}\dfrac{1}{c_{\rm F}}-a_{\circ}\bigg{)}\partial_{x}v-\dfrac{1}{2c_{\rm F}}\partial_{y}v-\dfrac{1}{2}\sigma^{2}\partial_{xx}v-\dfrac{1}{2}\sigma^{2}\partial_{yy}v-\sigma^{2}\partial_{xy}v=0,\;(t,x)\in[0,T]\times\mathbb{R},

which, via the affine ansatz $v(t,x,y)=x+c(T-t)$ for a constant $c$ to be determined, leads to the following solution

V~L(t,x,w(t,x))=x+(a12cLa2+1cF)(Tt),(t,x)[0,T]×.\displaystyle\widetilde{V}_{\rm L}(t,x,w^{-}(t,x))=x+\bigg{(}-a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}+\dfrac{1}{c_{\rm F}}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R}.

On the other hand, on {y=w+(t,x)}\{y=w^{+}(t,x)\}, the value function should be solution to

tv(t,x,y)H+(t,x,xv(t,x,y),x2v(t,x,y))=0,(t,x,y)[0,T)×2,\displaystyle-\partial_{t}v(t,x,y)-{\rm H}^{+}(t,x,\partial_{\rm x}v(t,x,y),\partial^{2}_{\rm x}v(t,x,y))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{2},

with terminal condition v(T,x,w+(T,x))=xv(T,x,w^{+}(T,x))=x, xx\in\mathbb{R}. Through similar computations, one obtains

V~L(t,x,w+(t,x))=x+(a12cLa2+1cF)(Tt),(t,x)[0,T]×.\displaystyle\widetilde{V}_{\rm L}(t,x,w^{+}(t,x))=x+\bigg{(}a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}+\dfrac{1}{c_{\rm F}}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R}.
The value function inside the domain.

Finally, for (t,x)[0,T]×(t,x)\in[0,T]\times\mathbb{R} and y(w(t,x),w+(t,x))y\in(w^{-}(t,x),w^{+}(t,x)), the value function V~L\widetilde{V}_{\rm L} is solution to the classical HJB equation for stochastic control, namely

tv(t,x,y)HL(xv(t,x,y),x2v(t,x,y))=0,whereHL(p,q)sup(z,a)×Ah(p,q,z,a),(p,q)2×2×2,\displaystyle-\partial_{t}v(t,x,y)-H^{\rm L}(\partial_{\rm x}v(t,x,y),\partial^{2}_{\rm x}v(t,x,y))=0,\;\text{where}\;H^{\rm L}(p,q)\coloneqq\sup_{(z,a)\in\mathbb{R}\times A}{\rm h}(p,q,z,a),\;(p,q)\in\mathbb{R}^{2}\times\mathbb{R}^{2\times 2},

but instead of the usual terminal condition, we need to enforce the specific boundary conditions obtained in the previous step, i.e. for (t,x)[0,T]×(t,x)\in[0,T]\times\mathbb{R},

v(t,x,w(t,x))=x+(1cFa12cLa2)(Tt),andv(t,x,w+(t,x))=x+(1cF+a12cLa2)(Tt).\displaystyle v(t,x,w^{-}(t,x))=x+\bigg{(}\dfrac{1}{c_{\rm F}}-a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}\bigg{)}(T-t),\;\text{and}\;v(t,x,w^{+}(t,x))=x+\bigg{(}\dfrac{1}{c_{\rm F}}+a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}\bigg{)}(T-t).

The previous HJB equation can be slightly simplified as follows

tvsupaA{axvcL2a2}supz{1cFΠB~(z)xv+12cFΠB~2(z)yv+12σ2z2yyv+σ2zxyv}12σ2xxv=0.\displaystyle-\partial_{t}v-\sup_{a\in A}\bigg{\{}a\partial_{x}v-\dfrac{c_{\rm L}}{2}a^{2}\bigg{\}}-\sup_{z\in\mathbb{R}}\bigg{\{}\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(z)\partial_{x}v+\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(z)\partial_{y}v+\dfrac{1}{2}\sigma^{2}z^{2}\partial_{yy}v+\sigma^{2}z\partial_{xy}v\bigg{\}}-\dfrac{1}{2}\sigma^{2}\partial_{xx}v=0. (2.19)

Though no explicit solution seems to be available, the previous system can be solved numerically. Once this is achieved, it remains, for fixed $x\in\mathbb{R}$, to maximise $v(0,x,y)$ over $y\in(w^-(0,x),w^+(0,x))$. The optimal $y_0\in[w^-(0,x),w^+(0,x)]$ and the corresponding value $v(0,x,y_0)$ respectively give the follower's and the leader's value functions, namely $V_{\rm F}$ and $V_{\rm L}$, for the initial condition $X_0=x$. The numerical results are presented in the following section.
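To give an idea of how this numerical resolution can be carried out, here is a minimal explicit finite-difference sketch for (2.19) with the boundary data above, under the benchmark parameters of the next subsection. The grid sizes, the truncation of the search grid for $z$, and the use of central differences are assumptions made for illustration only; no claim of monotonicity or convergence of this scheme is made.

```python
import numpy as np

# Explicit finite-difference sketch for the HJB equation (2.19), with the
# Dirichlet data on y = w^{+/-}(t,x) obtained above. Grid sizes, the
# truncation of the z-search and the benchmark parameters are assumptions.
T, sigma, cF, cL, a0, b0 = 1.0, 1.0, 1.0, 1.0, 10.0, 3.0

Nx, Ny, Nt = 61, 113, 1000
x = np.linspace(-3.0, 3.0, Nx)
y = np.linspace(-13.5, 14.5, Ny)
dx, dy, dt = x[1] - x[0], y[1] - y[0], T / Nt
X, Y = np.meshgrid(x, y, indexing="ij")
z_grid = np.linspace(-2.0, 2.0, 21)              # truncated search grid for z
proj = lambda z: min(max(z, 0.0), b0 * cF)       # projection on B_tilde

def apply_boundary(v, t):
    eta = Y - X                                   # y - x
    lo = (1 / (2 * cF) - a0) * (T - t)            # w^-(t,x) - x
    hi = (1 / (2 * cF) + a0) * (T - t)            # w^+(t,x) - x
    v = np.where(eta <= lo, X + (1/cF - a0 - cL*a0**2/2) * (T - t), v)
    v = np.where(eta >= hi, X + (1/cF + a0 - cL*a0**2/2) * (T - t), v)
    return v

v = X.copy()                                      # terminal condition v(T,x,y) = x
for m in range(Nt, 0, -1):
    t = (m - 1) * dt
    vx = np.gradient(v, dx, axis=0); vy = np.gradient(v, dy, axis=1)
    vxx = np.gradient(vx, dx, axis=0); vyy = np.gradient(vy, dy, axis=1)
    vxy = np.gradient(vx, dy, axis=1)
    a_star = np.clip(vx / cL, -a0, a0)            # analytic maximiser in a
    ham = a_star * vx - cL / 2 * a_star**2 + sigma**2 / 2 * vxx
    best_z = np.full_like(v, -np.inf)
    for z in z_grid:                              # brute-force maximisation in z
        p = proj(z)
        best_z = np.maximum(best_z, p/cF*vx + p**2/(2*cF)*vy
                            + sigma**2/2 * z**2 * vyy + sigma**2 * z * vxy)
    v = apply_boundary(v + dt * (ham + best_z), t)

# Leader's value at x0 = 0: maximise v(0, x0, .) over the admissible y-range.
i0 = Nx // 2
inside = (y - x[i0] > (1/(2*cF) - a0) * T) & (y - x[i0] < (1/(2*cF) + a0) * T)
j0 = int(np.argmax(np.where(inside, v[i0], -np.inf)))
print("y0* =", y[j0], "  V_L(x0=0) ~", v[i0, j0])
```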

2.2.4 Comparison with other solution concepts and numerical results

For the numerical results, we first consider a benchmark scenario in Figure 1, with parameters $T=1$, $c_{\rm F}=c_{\rm L}=1$, $\sigma=1$, $a_\circ=10$, and $b_\circ=3$; the cost of effort is therefore identical for the leader and the follower. We then study in Figure 3 a scenario in which the follower's cost of effort increases to $c_{\rm F}=1.25$, and conversely in Figure 2 one in which the leader's cost of effort increases to $c_{\rm L}=1.25$. Finally, we represent in Figure 4 the impact of an increase of $a_\circ$ from $10$ to $20$. Note that in these four scenarios, $a_\circ$ is chosen sufficiently large so that Conditions (2.6) and (2.7) are satisfied.

Figure 1: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=c_{\rm L}=1$, $\sigma=1$, $a_\circ=10$, and $b_\circ=3$.

First of all, we remark that for the four sets of parameters, we always have the following inequalities for the leader’s value function,

VLAOL=VLAF<VLACLM,VLCL<VLACL=VLFB,\displaystyle V_{\rm L}^{\rm AOL}=V_{\rm L}^{\rm AF}<V_{\rm L}^{\rm ACLM},V_{\rm L}^{\rm CL}<V_{\rm L}^{\rm ACL}=V_{\rm L}^{\rm FB},

and the converse inequalities for the follower’s value. In particular, for these chosen sets of parameters, the leader’s value in the closed-loop equilibrium is higher than her value in the adapted closed-loop memoryless scenario.

Comparing Figure 1 with Figure 2, one can observe that the increase in the leader's cost of effort negatively impacts both her and the follower's value in every equilibrium concept. Comparing now Figure 1 with Figure 3, we can observe that when the follower's cost of effort slightly increases, it negatively impacts both his and the leader's value for almost all equilibrium concepts, except in the ACL/first-best case. Indeed, in this scenario, the leader's value function remains unchanged, as the follower will always exert the maximal effort $b_\circ$. Therefore, only the follower's value is impacted by the increase in his cost.

Finally, comparing Figure 4 with the benchmark in Figure 1, one can notice that increasing the parameter $a_\circ$, representing the maximum absolute value of the leader's effort, only impacts the values in the CL case. Indeed, in the FB, ACL, AOL and AF cases, the leader will always exert the optimal effort $1/c_{\rm L}$, independently of $a_\circ$. However, in the closed-loop equilibrium, when $a_\circ$ increases, the leader has more bargaining power to incentivise the follower to exert a higher effort. More precisely, when studying the partial differential equations satisfied by the boundaries $w^\pm$, one can notice that if $a_\circ$ increases, the cone formed by the boundaries becomes wider. The leader should still ensure that the target constraint is satisfied, and therefore set the control $Z$ to $1$ when one of the barriers is hit, but as the cone is wider this constraint becomes less restrictive. Intuitively, if the set $A$ were not bounded, the boundaries $w^-$ and $w^+$ would be at $-\infty$ and $+\infty$ respectively, leading to an unconstrained problem for the leader. With this in mind, the limit of the leader's value as $a_\circ$ goes to infinity should coincide with her value in the first-best case. In other words, the higher $a_\circ$, the longer the leader can force the follower to exert the maximal effort $b_\circ$ instead of his optimal effort $1/c_{\rm F}$.

Figure 2: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=1$, $c_{\rm L}=1.25$, $\sigma=1$, $a_\circ=10$, $b_\circ=3$.
Figure 3: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=1.25$, $c_{\rm L}=1$, $\sigma=1$, $a_\circ=10$, $b_\circ=3$.
Figure 4: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=1$, $c_{\rm L}=1$, $\sigma=1$, $a_\circ=20$, $b_\circ=3$.

3 General problem formulation

Let T>0T>0, Ω𝒞([0,T];d)\Omega\coloneqq{\cal C}([0,T];\mathbb{R}^{d}), topologised by uniform convergence, and XX be the canonical process on Ω\Omega, that is

Xt(x)x(t),xΩ,t[0,T].X_{t}(x)\coloneqq x(t),\;x\in\Omega,\;t\in[0,T].

We denote by $\mathbb{F}=({\cal F}_t)_{t\geq0}$ the canonical filtration, that is, ${\cal F}_t=\sigma(X_s,\,0\leq s\leq t)$ for every $t\in[0,T]$. The process $X$ will represent the output of the game, which will be controlled in weak formulation by both the leader and the follower. We will give the details in the next subsection.

Let $\mathbf{M}(\Omega)$ be the set of all probability measures on $(\Omega,{\cal F}_T)$. A measure $\mathbb{P}\in\mathbf{M}(\Omega)$ is said to be a semi-martingale measure if $X$ is an $(\mathbb{F},\mathbb{P})$–semi-martingale. We denote by ${\cal P}_S$ the set of all semi-martingale measures. By Karandikar [40], there exists an $\mathbb{F}$–progressively measurable process $[X]\coloneqq([X]_t)_{t\in[0,T]}$ coinciding with the quadratic variation of $X$, $\mathbb{P}$--a.s., for any $\mathbb{P}\in{\cal P}_S$. Moreover, its density with respect to the Lebesgue measure is the non-negative symmetric matrix $\widehat{\sigma}^2_t\in\mathbb{S}^d$ defined by

σ^t2limsupε0[X]t[X]tεε,t[0,T].\widehat{\sigma}^{2}_{t}\coloneqq\underset{\varepsilon\searrow 0}{\rm{limsup}}\;\frac{[X]_{t}-[X]_{t-\varepsilon}}{\varepsilon},\;t\in[0,T].
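As an illustration of this pathwise construction in dimension $d=1$, the following snippet estimates $\widehat\sigma^2_t$ along a simulated path by the difference quotient of the realised quadratic variation over a small window; all numerical choices are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pathwise estimate of sigma_hat_t^2 as the difference quotient of the
# realised quadratic variation (d = 1). Parameters are illustrative.
T, n = 1.0, 200_000
dt = T / n
t = np.linspace(0.0, T, n + 1)
sig = 1.0 + 0.5 * (t[:-1] > 0.5)          # true volatility: 1, then 1.5

X = np.concatenate(([0.0], np.cumsum(sig * rng.normal(0.0, np.sqrt(dt), n))))
QV = np.concatenate(([0.0], np.cumsum(np.diff(X) ** 2)))   # realised [X]

eps = 2_000                                # window size, i.e. epsilon = eps * dt
sig2_hat = (QV[eps:] - QV[:-eps]) / (eps * dt)
print(sig2_hat[n // 4], sig2_hat[3 * n // 4])   # ~ 1.0 and ~ 2.25
```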

We also recall the so-called universal filtration $\mathbb{F}^U\coloneqq({\cal F}^U_t)_{0\leq t\leq T}$, given by ${\cal F}^U_t\coloneqq\bigcap_{\mathbb{P}\in\mathbf{M}(\Omega)}{\cal F}^{\mathbb{P}}_t$, where ${\cal F}^{\mathbb{P}}_t$ is the usual $\mathbb{P}$-completion of ${\cal F}_t$. For any subset ${\cal P}\subseteq\mathbf{M}(\Omega)$, letting ${\cal N}^{\cal P}$ denote the collection of ${\cal P}$-polar sets, i.e. the sets which are $\mathbb{P}$-negligible for all $\mathbb{P}\in{\cal P}$, we define the filtration $\mathbb{F}^{\cal P}\coloneqq({\cal F}^{\cal P}_t)_{t\in[0,T]}$ by ${\cal F}^{\cal P}_t\coloneqq{\cal F}^U_t\vee{\cal N}^{\cal P}$, $t\in[0,T]$.

3.1 Controlled state dynamics

Given finite-dimensional Euclidean spaces $A$ and $B$, we describe the state process by means of the coefficients

σ:[0,T]×Ω×A×Bd×n,andλ:[0,T]×Ω×A×Bn,\sigma:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R}^{d\times n},\;\text{and}\;\lambda:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R}^{n},

assumed to be Borel-measurable and non-anticipative in the sense that $\ell_t(x,a,b)=\ell_t(x_{\cdot\wedge t},a,b)$, for all $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$ and $\ell\in\{\sigma,\lambda\}$. Since the product $\sigma\lambda$ will appear often, we abuse notations and write $\sigma\lambda_t(x,a,b)\coloneqq\sigma_t(x,a,b)\lambda_t(x,a,b)$, for all $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$. These functions satisfy the following conditions, which we comment upon in Remark 3.2.

Assumption 3.1.
(i)

The map $\Omega\ni x\longmapsto\sigma_t(x,a,b)$ is continuous for every $(t,a,b)\in[0,T]\times A\times B$, there exists $\ell_\sigma>0$ such that $|\sigma_t(x,a,b)|\leq\ell_\sigma$ for every $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$, and $\sigma\sigma^{\top}_t(x,a,b)\coloneqq\sigma_t(x,a,b)\sigma^{\top}_t(x,a,b)$ is invertible for every $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$.

(ii)

    There exists λ>0\ell_{\lambda}>0 such that |λt(x,a,b)|λ|\lambda_{t}(x,a,b)|\leq\ell_{\lambda}, for every (t,x,a,b)[0,T]×Ω×A×B(t,x,a,b)\in[0,T]\times\Omega\times A\times B.

The actions of the leader are valued in AA, and the actions of the follower are valued in BB. We define the sets of controls 𝒜o{\cal A}_{o} and {\cal B} as the ones containing the 𝔽\mathbb{F}-predictable processes with values in AA and BB, respectively. Let x0dx_{0}\in\mathbb{R}^{d}. For (α,β)𝒜o×(\alpha,\beta)\in{\cal A}_{o}\times{\cal B}, the controlled state equation is given by the SDE

Xt=x0+0tσλs(Xs,αs,βs)ds+0tσs(Xs,αs,βs)dWs,t[0,T],X_{t}=x_{0}+\int_{0}^{t}\sigma\lambda_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+\int_{0}^{t}\sigma_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}W_{s},\;t\in[0,T], (3.1)

where WW denotes an nn-dimensional Brownian motion. We characterise (3.1) in terms of weak solutions. These are elegantly represented in terms of so-called martingale problems and Girsanov’s theorem, see Stroock and Varadhan [71] for details. Indeed, let us consider the SDE

Xt=x0+0tσs(Xs,αs,βs)dWs,t[0,T],\displaystyle X_{t}=x_{0}+\int_{0}^{t}\sigma_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}W_{s},\;t\in[0,T], (3.2)

and denote by 𝒫{\cal P} the set of weak solutions to (3.2). This is

𝒫{𝐌(Ω):W,n-dimensional –Brownian motion,and(α,β)𝒜o× for which (3.2) holds –a.s.}.{\cal P}\coloneqq\{\mathbb{P}\in{\bf M}(\Omega):\exists W^{\mathbb{P}},\;n\text{-dimensional }\mathbb{P}\text{--Brownian motion},\;\text{and}\;(\alpha,\beta)\in{\cal A}_{o}\times{\cal B}\text{ for which }\eqref{eq:X-dynamics-without-drift}\text{ holds }\mathbb{P}\text{\rm--a.s.}\}.

By Girsanov’s theorem, any 𝒫\mathbb{P}\in{\cal P} induces ¯𝐌(Ω)\bar{\mathbb{P}}\in{\bf M}(\Omega) weak solution to (3.1), where ¯\bar{\mathbb{P}} is defined by

d¯dexp(0Tλs(Xs,αs,βs)dWs120Tλs(Xs,αs,βs)2ds).\displaystyle\frac{\mathrm{d}\bar{\mathbb{P}}}{\mathrm{d}\mathbb{P}}\coloneqq\exp\bigg{(}\int_{0}^{T}\lambda_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\cdot\mathrm{d}W_{s}^{\mathbb{P}}-\frac{1}{2}\int_{0}^{T}\|\lambda_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\|^{2}\mathrm{d}s\bigg{)}. (3.3)

For any action α𝒜o\alpha\in{\cal A}_{o} of the leader, we define the set (α){\cal R}(\alpha) of admissible responses of the follower by

(α){(,β)𝒫×: is the unique measure in 𝒫 such that (3.2) holds –a.s. with (α,β)},{\cal R}(\alpha)\coloneqq\{(\mathbb{P},\beta)\in{\cal P}\times{\cal B}:\mathbb{P}\text{ is the unique measure in }{\cal P}\text{ such that }\eqref{eq:X-dynamics-without-drift}\text{ holds }\mathbb{P}\text{\rm--a.s.}\text{ with }(\alpha,\beta)\},

as well as the set of weak solutions 𝒫α{𝒫:(3.2) holds –a.s. with (α,β), for some β(α)}.{\cal P}^{\alpha}\coloneqq\{\mathbb{P}\in{\cal P}:\eqref{eq:X-dynamics-without-drift}\text{ holds }\mathbb{P}\text{\rm--a.s.}\text{ with }(\alpha,\beta),\text{ for some }\beta\in{\cal R}(\alpha)\}.

Remark 3.2.
(i)

    We note that 𝒫{\cal P} is nonempty due to the continuity assumption on σ\sigma, ensuring that solutions do exist for instance for constant controls α\alpha and β\beta, see [71, Theorem 6.1.6]. Concerning the uniqueness of weak solutions, we impose it as a condition for the admissible controls of the follower. That is, for a pair (α,β)(\alpha,\beta) of controls played by the leader and the follower, the law of XX is uniquely determined.

(ii)

    We also stress that in the above formulation, there is no need to enlarge the canonical space. This subtlety is significant in the context of Stackelberg games, as doing so would mean changing the information structure of the game. Indeed, we note that in the definition of 𝒫{\cal P}, WW^{\mathbb{P}} is a Brownian motion in the original canonical space Ω\Omega. Given our assumptions on the volatility σσ\sigma\sigma^{\top}, namely its invertibility and boundedness, we do not need to enlarge Ω\Omega in this setting. In general, if the volatility is allowed to degenerate, one may need to introduce external sources of randomness and define a Brownian motion on an enlarged probability space. We refer the reader to [71, Section 4.5] and [58, Section 2.1.2] for a discussion on these results.

3.2 The closed-loop Stackelberg game between the leader and the follower

The timing of the game is as follows. The leader first chooses a control $\alpha\in{\cal A}_o$, to which the follower responds with some $\beta\in{\cal B}$; the response is, of course, dependent on the control chosen by the leader. Given an action $\alpha\in{\cal A}_o$, the problem of the follower is given by

VF(α)sup(,β)(α)𝔼¯[0Tcs(Xs,αs,βs)ds+g(XT)],V_{\rm F}(\alpha)\coloneqq\sup_{(\mathbb{P},\beta)\in{\cal R}(\alpha)}\mathbb{E}^{\bar{\mathbb{P}}}\bigg{[}\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg{]}, (3.4)

where the functions $c:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R}$ and $g:\Omega\longrightarrow\mathbb{R}$ are continuous and bounded by constants $\ell_c$ and $\ell_g$ respectively, and such that for every $(a,b)\in A\times B$ the process $c_\cdot(\cdot,a,b)$ is $\mathbb{F}$–progressively measurable. We say that $(\mathbb{P},\beta)\in{\cal R}(\alpha)$ is an optimal response to $\alpha\in{\cal A}_o$, and write $(\mathbb{P},\beta)\in{\cal R}^\star(\alpha)$, if $(\mathbb{P},\beta)$ is a solution to Problem (3.4). We define the set ${\cal A}$ as the family of $\alpha\in{\cal A}_o$ for which there exists at least one optimal response, i.e. ${\cal R}^\star(\alpha)\neq\emptyset$.

We assume that the leader chooses a control from the set 𝒜{\cal A} and anticipates the optimal response of the follower. Therefore, the leader faces the following problem

VLsupα𝒜sup(,β)(α)𝔼¯[0TCs(Xs,αs,βs)ds+G(XT)],V_{\rm L}\coloneqq\sup_{\alpha\in{\cal A}}\sup_{(\mathbb{P},\beta)\in{\cal R}^{\text{$\star$}}(\alpha)}\mathbb{E}^{{\bar{\mathbb{P}}}}\bigg{[}\int_{0}^{T}C_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+G(X_{\cdot\wedge T})\bigg{]}, (3.5)

where C:[0,T]×Ω×A×BC:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R} and G:ΩG:\Omega\longrightarrow\mathbb{R} are bounded functions, respectively by the constants C\ell_{C} and G\ell_{G}, such that for every (a,b)A×B(a,b)\in A\times B the process C(,a,b)C_{\cdot}(\cdot,a,b) is 𝔽\mathbb{F}–progressively measurable.

Remark 3.3.
(i)

We assume that the functions in our model are bounded simply to ease the exposition of the results. These assumptions can be relaxed by imposing the usual integrability conditions on the sets of admissible controls of the players, and the results in this section and in Section 4 still hold under such relaxations. The analysis becomes more delicate when studying the so-called target reachability set, defined in Section 5, through its upper and lower boundaries, and when characterising them by our methods.

(ii)

    Let us mention that the existence of optimal responses is fundamental for Stackelberg games and cannot be dropped. Indeed, the main motivation in this game is that the leader plays first by anticipating the response of the follower. On the other hand, we assume that the leader has enough bargaining power to make the follower choose a maximiser that suits her best, or equivalently, we consider the problem of an optimistic leader for whom, if the follower has multiple optimal responses—and thus he is indifferent among all of them—he will choose one that benefits the leader the most. This is consistent with, for instance, Bressan [15, Section 2.1], Zemkoho [82], or Havrylenko, Hinken, and Zagst [35]. Alternatively, one could take an adversarial perspective in which the leader faces the problem

    supα𝒜inf(,β)(α)𝔼¯[0TCs(Xs,αs,βs)ds+G(XT)].\sup_{\alpha\in{\cal A}}\inf_{(\mathbb{P},\beta)\in{\cal R}^{\text{$\star$}}(\alpha)}\mathbb{E}^{{\bar{\mathbb{P}}}}\bigg{[}\int_{0}^{T}C_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+G(X_{\cdot\wedge T})\bigg{]}.

    This is the pessimistic point of view, which has also been coined generalised or weak Stackelberg equilibrium, see Leitmann [41], Başar and Olsder [7], Wiesemann, Tsoukalas, Kleniati, and Rustem [78], or Liu, Fan, Chen, and Zheng [48]. Notice that in this case, existence of equilibria may become problematic, which led part of the literature to consider so-called regularised Stackelberg problems, where, for a fixed ε>0\varepsilon>0, the infimum would now be taken over the set of actions of the follower which give him a value ε\varepsilon-close to his optimal one, see Mallozzi and Morgan [50, Section 3] and the references therein. We point out that our approach allows us to tackle both the optimistic and the pessimistic problems in the same way, the difference being in the resulting Hamiltonians of the HJB equations associated to each one of the two problems. More details will be given below.

Before concluding this section, let us mention that, for technical reasons, we work under the classical ZFC set-theoretic axioms (Zermelo–Fraenkel plus the axiom of choice), augmented with an additional axiom guaranteeing the existence of Mokobodzki's medial limits, for instance the continuum hypothesis. These axioms are needed for the aggregation results that we use for the $K$-component of the solution to second-order BSDEs. More details are given in the next section.

4 Reduction to a target control problem

In this section, we fix a control α𝒜\alpha\in{\cal A} of the leader and characterise the solutions (,β)(α)(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(\alpha) to the continuous-time stochastic control problem (3.4). Our approach is inspired by the dynamic programming approach to principal–agent problems developed in [25].

As standard in the control literature, we introduce the Hamiltonian functions HF:[0,T]×Ω×d×𝕊d×AH^{\rm F}:[0,T]\times\Omega\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\longrightarrow\mathbb{R} and hF:[0,T]×Ω×d×𝕊d×A×Bh^{\rm F}:[0,T]\times\Omega\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\times B\longrightarrow\mathbb{R}

H^{\rm F}_{t}(x,z,\gamma,a)\coloneqq\sup_{b\in B}\big\{h_{t}^{\rm F}(x,z,\gamma,a,b)\big\},\qquad h^{\rm F}_{t}(x,z,\gamma,a,b)\coloneqq c_{t}(x,a,b)+\sigma\lambda_{t}(x,a,b)\cdot z+\frac{1}{2}\mathrm{Tr}[\sigma\sigma_{t}^{\top}(x,a,b)\gamma].   (4.1)

Define now, for $(t,x,\Sigma,a)\in[0,T]\times\Omega\times\mathbb{S}^{d}_{+}\times A$, the set $A_{t}(x,\Sigma,a)\coloneqq\big\{b\in B:\sigma\sigma_{t}^{\top}(x,a,b)=\Sigma\big\}$. For $(\alpha,\mathbb{P})\in{\cal A}\times{\cal P}^{\alpha}$, the set of controls for the follower is given by

{\cal B}(\alpha,\mathbb{P})\coloneqq\{\beta\in{\cal B}:\beta_{t}\in A_{t}(x,\widehat{\sigma}_{t}^{2},\alpha_{t}),\;\mathrm{d}t\otimes\mathbb{P}\text{--a.e.}\}.

With these definitions, we can isolate the partial maximisation with respect to the squared diffusion in $H^{\rm F}$. Indeed, letting $F:[0,T]\times\Omega\times\mathbb{R}^{d}\times\mathbb{S}^{d}_{+}\times A\longrightarrow\mathbb{R}$ be given by

F_{t}(x,z,\Sigma,a)\coloneqq\sup_{b\in A_{t}(x,\Sigma,a)}\big\{c_{t}(x,a,b)+\sigma\lambda_{t}(x,a,b)\cdot z\big\},

we have that $2H^{\rm F}=(-2F)^{\ast}$, where the superscript $\ast$ denotes the Legendre transform, that is

H^{\rm F}_{t}(x,z,\gamma,a)=\sup_{\Sigma\in\mathbb{S}^{d}_{+}}\bigg\{F_{t}(x,z,\Sigma,a)+\frac{1}{2}\mathrm{Tr}[\Sigma\gamma]\bigg\}.
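To fix ideas, here is a minimal worked instance of this duality, under illustrative assumptions of our own which are not part of the general model: $d=1$, $A$ a singleton, the follower controls the volatility directly through $\sigma_{t}(x,a,b)=b$ with $b\in B=[\underline{b},\bar{b}]\subset(0,\infty)$, $\lambda\equiv 0$, and quadratic running cost $c_{t}(x,a,b)=-\frac{1}{2}b^{2}$. Then $A_{t}(x,\Sigma,a)=\{\sqrt{\Sigma}\}$ for $\Sigma\in[\underline{b}^{2},\bar{b}^{2}]$, and is empty otherwise, so that, with the convention that the supremum over an empty set is $-\infty$,

F_{t}(x,z,\Sigma,a)=-\frac{1}{2}\Sigma,\;\Sigma\in[\underline{b}^{2},\bar{b}^{2}],\quad\text{and}\quad H^{\rm F}_{t}(x,z,\gamma,a)=\sup_{b\in[\underline{b},\bar{b}]}\frac{1}{2}b^{2}(\gamma-1)=\sup_{\Sigma\in[\underline{b}^{2},\bar{b}^{2}]}\bigg\{-\frac{1}{2}\Sigma+\frac{1}{2}\Sigma\gamma\bigg\},

both suprema being equal to $\frac{1}{2}\bar{b}^{2}(\gamma-1)^{+}-\frac{1}{2}\underline{b}^{2}(\gamma-1)^{-}$: the maximisation over $b$ and the partial maximisation over $\Sigma$ produce the same convex function of $\gamma$, as claimed.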

Recalling (3.3), we can equivalently write the problem of the follower (3.4) as

V_{\rm F}(\alpha)=\sup_{\mathbb{P}\in{\cal P}^{\alpha}}\sup_{\beta\in{\cal B}(\alpha,\mathbb{P})}\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg],   (4.2)

to which we will associate the following second-order BSDE (2BSDE); we refer the reader to [58, 70] for an introduction to, and extensions of, the theory of such equations:

Y_{t}=g(X_{\cdot\wedge T})+\int_{t}^{T}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s-\int_{t}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\int_{t}^{T}\mathrm{d}K_{s},\;{\cal P}^{\alpha}\text{--q.s.},\;t\in[0,T].   (4.3)

Notice that, similarly to [25], we consider an aggregated version of the non-decreasing process $K$. We require the aggregation of the component $K$, as well as that of the stochastic integral, in order to define later the forward process $Y^{y,Z,\Gamma,\alpha}$ independently of any probability. There are aggregation results for the stochastic integral in [53] which suit our setting and use the notion of medial limits; following this route, one needs to assume ZFC plus some other axioms, and we refer the reader to [58, Footnote 7] for a discussion of the weakest set of axioms known to be sufficient for the existence of medial limits. We then have the following notion of solution to the 2BSDE; the functional spaces mentioned in the definition below can be found in Appendix B. We also use the notation

{\cal P}^{\alpha}[\mathbb{P},\mathbb{F}^{+},t]\coloneqq\{\mathbb{P}^{\prime}\in{\cal P}^{\alpha}:\mathbb{P}[E]=\mathbb{P}^{\prime}[E],\;\forall E\in{\cal F}_{t}^{+}\},

where $\mathbb{F}^{+}=({\cal F}_{t}^{+})_{t\in[0,T]}$ denotes the right-limit filtration associated with $\mathbb{F}$, i.e. ${\cal F}_{t}^{+}\coloneqq\bigcap_{s>t}{\cal F}_{s}$ for $t\in[0,T)$, and ${\cal F}_{T}^{+}\coloneqq{\cal F}_{T}$.
Definition 4.1.

We say that the triple $(Y,Z,K)$ is a solution to the 2BSDE (4.3) if there exists $p>1$ such that $(Y,Z,K)\in\mathbb{S}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})\times\mathbb{H}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})\times\mathbb{I}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})$ satisfies (4.3), and $K$ satisfies the minimality condition

K_{t}=\operatorname*{ess\,inf^{\mathbb{P}}}_{\mathbb{P}^{\prime}\in{\cal P}^{\alpha}[\mathbb{P},\mathbb{F}^{+},t]}\mathbb{E}^{\mathbb{P}^{\prime}}\big[K_{T}\big|{\cal F}_{t}^{\mathbb{P},+}\big],\;t\in[0,T],\;{\cal P}^{\alpha}\text{--q.s.}   (4.4)

As anticipated, the next result connects the problem of the follower with the 2BSDE (4.3).

Proposition 4.2.

There exists a unique solution $(Y,Z,K)$ to the 2BSDE (4.3), for which the value of the follower satisfies $V_{\rm F}(\alpha)=\sup_{\mathbb{P}\in{\cal P}^{\alpha}}\mathbb{E}^{\bar{\mathbb{P}}}[Y_{0}]$. Moreover, $(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(\alpha)$ if and only if $K_{T}=0$, $\mathbb{P}^{\star}$--a.s., and

\beta^{\star}\;\text{is a maximiser in the definition of }F_{\cdot}(X_{\cdot},Z_{\cdot},\widehat{\sigma}^{2}_{\cdot},\alpha_{\cdot}),\;\mathrm{d}t\otimes\mathrm{d}\mathbb{P}^{\star}\text{--a.e.}   (4.5)
Proof.

Notice that the follower’s problem can be seen as the particular problem of an agent who is offered by the principal a terminal remuneration of the form $\xi=g(X_{\cdot\wedge T})$. Since the function $g$ is assumed to be bounded, the result is a direct application of [25, Propositions 4.5 and 4.6]. ∎

For $p>1$ and $(y,\alpha,Z,K)\in\mathbb{R}\times{\cal A}\times\mathbb{H}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})\times\mathbb{I}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})$, with $K$ satisfying (4.4), the process $Y^{y,\alpha,Z,K}$, given by

Y^{y,\alpha,Z,K}_{t}\coloneqq y-\int_{0}^{t}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s+\int_{0}^{t}Z_{s}\cdot\mathrm{d}X_{s}-\int_{0}^{t}\mathrm{d}K_{s},\;t\in[0,T],

is well defined and independent of the probability $\mathbb{P}$, because the stochastic integrals can be defined pathwise (see [25, Definition 3.2] and the paragraph thereafter). The idea is to look at the tuples $(y,\alpha,Z,K)$ for which it holds that $Y^{y,\alpha,Z,K}_{T}=g(X_{\cdot\wedge T})$. However, as argued in [25, Theorem 3.6], the processes $K$ can be approximated by those of the form

\int_{0}^{t}\bigg(H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})-F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}^{2}_{s},\alpha_{s})-\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma_{s}]\bigg)\mathrm{d}s,

for some appropriate control $\Gamma$. With this in mind, we define the following class of processes, which will serve as controls from the point of view of the leader.

Definition 4.3.

For any $\alpha\in{\cal A}$, let ${\cal C}^{\alpha}$ be the class of $\mathbb{F}^{{\cal P}^{\alpha}}$-predictable processes $(Z,\Gamma):[0,T]\times\Omega\longrightarrow\mathbb{R}^{d}\times\mathbb{S}^{d}$ such that

\|Y^{y,\alpha,Z,\Gamma}\|_{\mathbb{S}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})}^{p}+\|Z\|_{\mathbb{H}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})}^{p}<+\infty,

for some $p>1$, where for $y\in\mathbb{R}$ we define, $\mathbb{P}$--a.s. for all $\mathbb{P}\in{\cal P}^{\alpha}$, the process

Y^{y,\alpha,Z,\Gamma}_{t}\coloneqq y-\int_{0}^{t}H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\mathrm{d}s+\int_{0}^{t}Z_{s}\cdot\mathrm{d}X_{s}+\frac{1}{2}\int_{0}^{t}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma_{s}]\mathrm{d}s,\;t\in[0,T].   (4.6)

The next proposition provides an optimality condition for a pair $(\mathbb{P},\beta)$ when the process $Y^{y,\alpha,Z,\Gamma}$ hits the correct terminal condition, i.e., $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s. In such a case, the follower’s value coincides with $y$, and his optimal actions correspond to maximisers of the Hamiltonian $H^{\rm F}$. We will use this characterisation in the next section to obtain a reformulation of the problem of the leader.

Proposition 4.4.

Let $\alpha\in{\cal A}$ and $(y,Z,\Gamma)\in\mathbb{R}\times{\cal C}^{\alpha}$ be such that $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s., for some $(\mathbb{P},\beta)\in{\cal R}(\alpha)$. Then, the following are equivalent:

(i)

$(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$ and $V_{\rm F}(\alpha)=y$;

(ii)

$\beta$ maximises $h^{\rm F}$ on the support of $\mathbb{P}$, that is

H^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t},\alpha_{t})=h^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t},\alpha_{t},\beta_{t}),\;\mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.}   (4.7)
Proof of Proposition 4.4.

Let $(\mathbb{P},\beta)\in{\cal R}(\alpha)$ be such that $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s. Assume $(i)$ holds. Then, the value and utility of the follower satisfy

V_{\rm F}(\alpha)=U_{\rm F}(\mathbb{P},\beta)\coloneqq\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg]=\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+Y_{T}^{y,\alpha,Z,\Gamma}\bigg].

Writing the dynamics of $Y^{y,\alpha,Z,\Gamma}$ and using the fact that $\mathbb{P}$ is a weak solution to (3.1) with $(\alpha,\beta)$, we obtain

\begin{aligned} U_{\rm F}(\mathbb{P},\beta)&=\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+y-\int_{0}^{T}H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\mathrm{d}s+\int_{0}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\frac{1}{2}\int_{0}^{T}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma_{s}]\mathrm{d}s\bigg]\\ &=y+\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}\big(h^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s},\beta_{s})-H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\big)\mathrm{d}s+\int_{0}^{T}Z_{s}\cdot\sigma_{s}(X,\alpha_{s},\beta_{s})\mathrm{d}W^{\bar{\mathbb{P}}}_{s}\bigg]\\ &=y+\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}\big(h^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s},\beta_{s})-H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\big)\mathrm{d}s\bigg], \end{aligned}

since the stochastic integral is a martingale due to the integrability conditions specified in the definition of ${\cal C}^{\alpha}$. By definition of $H^{\rm F}$, see (4.1), the integrand in the last line is non-positive, so that $U_{\rm F}(\mathbb{P},\beta)\leq y$, with equality exactly when (4.7) holds. Since $V_{\rm F}(\alpha)=U_{\rm F}(\mathbb{P},\beta)=y$, we deduce that $(ii)$ holds. Let us now assume $(ii)$. Since $(\mathbb{P},\beta)\in{\cal R}(\alpha)$, it follows from (4.2) that

V_{\rm F}(\alpha)\geq\sup_{\beta\in{\cal B}(\alpha,\mathbb{P})}\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg].   (4.8)

The value on the right corresponds to $\mathbb{E}^{\bar{\mathbb{P}}}[Y_{0}]$, where $(Y,Z,K)$ is the unique solution to the BSDE

Y_{t}=g(X_{\cdot\wedge T})+\int_{t}^{T}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s-\int_{t}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\int_{t}^{T}\mathrm{d}K_{s},\;t\in[0,T],\;\mathbb{P}\text{--a.s.},

and equality in (4.8) holds if $K_{T}=0$, $\mathbb{P}$--a.s. Since $2H^{\rm F}=(-2F)^{\ast}$ and (4.7) hold, together with the condition $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s., we see that $Y^{y,\alpha,Z,\Gamma}$ satisfies

Y_{t}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})+\int_{t}^{T}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s-\int_{t}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\int_{t}^{T}\mathrm{d}K_{s}^{Z,\Gamma,\alpha},\;t\in[0,T],\;\mathbb{P}\text{--a.s.},

where

K_{t}^{Z,\Gamma,\alpha}\coloneqq\int_{0}^{t}\big(H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})-h^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s},\beta_{s})\big)\mathrm{d}s,

which by assumption satisfies $K_{T}^{Z,\Gamma,\alpha}=0$, $\mathbb{P}$--a.s. Hence $(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$ by the previous discussion. Finally, since by uniqueness of the solution we have $y=\mathbb{E}^{\bar{\mathbb{P}}}[Y_{0}]$, the fact that $V_{\rm F}(\alpha)=y$ is argued as in $(i)$. ∎

4.1 A stochastic target reformulation of the problem of the leader

In light of the results from the previous section, we are drawn to reformulate the problem faced by the leader as a stochastic control problem with stochastic target constraints. Indeed, Proposition 4.4 tells us that the value of the follower (given the control $\alpha$ of the leader) is equal to $V_{\rm F}(\alpha)=y$, and that any pair $(\mathbb{P}^{\star},\beta^{\star})$ satisfying (4.7) is a solution to the problem of the follower, as long as $Y^{y,\alpha,Z,\Gamma}$ hits the correct terminal value.

For $(Z,\Gamma,\alpha)\in{\cal C}^{\alpha}\times{\cal A}$ and deterministic $y\in\mathbb{R}$, which represents the value of the follower, let us define the set

{\cal R}^{\star}(y,\alpha,Z,\Gamma)\coloneqq\{(\mathbb{P},\beta)\in{\cal R}(\alpha):Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})\;\text{and (4.7) hold, }\mathbb{P}\text{--a.s.}\}.

We propose then the following reformulation of the problem of the leader

\hat{V}_{\rm L}\coloneqq\sup_{y\in\mathbb{R}}\sup_{(Z,\Gamma,\alpha)\in{\cal C}^{\alpha}\times{\cal A}}\sup_{(\mathbb{P},\beta)\in{\cal R}^{\star}(y,\alpha,Z,\Gamma)}\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}C_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+G(X_{\cdot\wedge T})\bigg].   (4.9)
Remark 4.5.

Let us briefly digress on the nature of (4.9).

(i)

A distinctive feature of (4.9) is that, as described in Section 3.1, the dynamics of the controlled process $X$ are given in weak formulation, whereas those of $Y$ are given in strong formulation as in (4.6). Though the reader might find this atypical, we recall that this feature is common in the dynamic programming approach in contract theory. Since up until this point our approach has borrowed ideas from this literature, it is not surprising to find this feature in (4.9).

(ii)

Let us also digress on our choice to reformulate (3.5) as an optimal control problem with target constraints. This is certainly not the only possible reformulation available. Alternatively, thanks to Proposition 4.2, (3.5) also admits a reformulation as an optimal control problem of FBSDEs. Yet we think that there are some shortcomings in following this route. Though there exists some literature on this class of control problems, because there is no general comparison principle for FBSDEs, results tend to leverage the stochastic maximum principle to derive both necessary and sufficient conditions for optimality. Consequently, most of these works consider continuously differentiable state-dependent data in order to derive necessary conditions, and additional concavity/convexity assumptions are needed to derive sufficient conditions in terms of a system of FBSDEs with twice as many variables as in the initial system, see for instance [24, Chapter 10]. Be that as it may, we believe that the sufficient condition obtained through our approach, see Theorem 5.8 below, is more amenable to analysis and numerical implementation than those in the literature on the control of FBSDEs.

Recall that ${\cal R}^{\star}(y,\alpha,Z,\Gamma)$ is non-empty thanks to Proposition 4.2 and the discussion thereafter. Since we agreed that the supremum over an empty set is $-\infty$, the supremum in the $y$-variable could be taken instead over the set

\mathfrak{T}\coloneqq\{y\in\mathbb{R}:{\cal R}^{\star}(y,\alpha,Z,\Gamma)\neq\emptyset\text{ for some }(Z,\Gamma,\alpha)\in{\cal C}^{\alpha}\times{\cal A}\},

which corresponds to the so-called target reachability set in the language of stochastic target problems, as studied for instance in [69]. By (3.5), the reward of the leader is only computed under optimal responses $(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$, and ${\cal R}^{\star}(y,\alpha,Z,\Gamma)$ provides the optimal responses of the follower.

The interpretation of $\hat{V}_{\rm L}$ is as follows. The leader decides $y\in\mathbb{R}$ and optimal controls $(Z^{\star},\Gamma^{\star},\alpha^{\star})\in{\cal C}^{\alpha^{\star}}\times{\cal A}$. She then announces her control $\alpha^{\star}\in{\cal A}$, for which she knows that the value of the follower is $y$, i.e., $V_{\rm F}(\alpha^{\star})=y$, and that his optimal controls belong to ${\cal R}^{\star}(y,\alpha^{\star},Z^{\star},\Gamma^{\star})$. The leader can then recommend to the follower an optimal response and the corresponding value, and the recommendation is followed since the follower has no better alternative. This holds true for every $y\in\mathfrak{T}$, and the optimal choice of this value is the one that maximises the objective function of the leader. This new problem is indeed a reformulation of the problem of the leader, as the following result shows.

Theorem 4.6.

The reformulated and the original problem of the leader have the same value, that is, $\hat{V}_{\rm L}=V_{\rm L}$.

Proof.

$(i)$ Let $y\in\mathbb{R}$, and assume that $y\in\mathfrak{T}$, since the supremum in the $y$-variable in (4.9) can be reduced to this set. Take next $(\alpha,Z,\Gamma)\in{\cal A}\times{\cal C}^{\alpha}$, $(\mathbb{P},\beta)\in{\cal R}^{\star}(y,\alpha,Z,\Gamma)$, and let $Y^{y,\alpha,Z,\Gamma}$ be the process given by (4.6). By Proposition 4.4, $y=Y_{0}^{y,\alpha,Z,\Gamma}=V_{\rm F}(\alpha)$ and $(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$. This means that the optimal response of the follower to the action $\alpha$ is given by $(\mathbb{P},\beta)$. Therefore, the objective function in problem $\hat{V}_{\rm L}$ at $(y,\alpha,Z,\Gamma,\mathbb{P},\beta)$ is matched by the objective function in $V_{\rm L}$ at $(\alpha,\mathbb{P},\beta)$. This implies $\hat{V}_{\rm L}\leq V_{\rm L}$.

$(ii)$ We show that the leader’s objective function in $V_{\rm L}$ can be approximated by elements of ${\cal C}^{\alpha}$. Let $\alpha\in{\cal A}$ and $(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(\alpha)$. By Proposition 4.2, there is a solution $(Y,Z,K)$ to the 2BSDE (4.3). We argue in two steps.

Step 1. We construct an approximate solution to (4.3). Let $\varepsilon>0$, $y\coloneqq\mathbb{E}^{\mathbb{P}^{\star}}[Y_{0}]$, and define

K_{t}^{\varepsilon}\coloneqq\frac{1}{\varepsilon}\int_{(t-\varepsilon)^{+}}^{t}K_{s}\mathrm{d}s,\qquad Y^{\varepsilon}_{t}\coloneqq y-\int_{0}^{t}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s+\int_{0}^{t}Z_{s}\cdot\mathrm{d}X_{s}+\int_{0}^{t}\mathrm{d}K_{s}^{\varepsilon}.

Note that $K^{\varepsilon}$ is absolutely continuous, $\mathbb{F}^{{\cal P}^{\alpha}}$-predictable, non-decreasing ${\cal P}^{\alpha}$--q.s., and $K_{T}^{\varepsilon}=0$, $\mathbb{P}^{\star}$--a.s. Since $K_{T}^{\varepsilon}\leq K_{T}$, we have $K^{\varepsilon}\in\mathbb{I}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})$, it satisfies (4.4), and $Y_{T}^{\varepsilon}$ has the required integrability. Moreover, by standard a priori estimates, see [58, Theorem 4.4], we have $\|Y^{\varepsilon}\|_{\mathbb{S}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})}<\infty$. All in all, we deduce that $(Y^{\varepsilon},Z,K^{\varepsilon})$ is a solution to the 2BSDE (4.3) with terminal condition $Y_{T}^{\varepsilon}$.

Step 2. We show that the approximation can be given in terms of elements of ${\cal C}^{\alpha}$. Let $\dot{K}^{\varepsilon}$ be the density, with respect to Lebesgue measure, of $K^{\varepsilon}$. We claim that there is an $\mathbb{F}$-predictable process $\Gamma^{\varepsilon}$ such that

\dot{K}_{t}^{\varepsilon}=H^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t}^{\varepsilon},\alpha_{t})-F_{t}(X_{\cdot\wedge t},Z_{t},\widehat{\sigma}^{2}_{t},\alpha_{t})-\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{t}\Gamma^{\varepsilon}_{t}].

Indeed, we argue as in the proof of [60, Theorem 4.3]. Let us first note that the map $\gamma\longmapsto H^{\rm F}_{t}(x,z,\gamma,a)$ has domain $\mathbb{S}^{d}$, and is convex, continuous, and coercive by the boundedness of $c$, $\lambda$, and $\sigma$. From the coercivity, it follows that $\sup_{\gamma\in\mathbb{S}^{d}}\big\{\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{t}(x)\gamma]-H^{\rm F}_{t}(x,z,\gamma,a)\big\}$ has a maximiser in $\mathbb{S}^{d}$. Thus, since $2H^{\rm F}=(-2F)^{\ast}$, it follows from standard results in convex analysis, see [61, Theorem 23.5], that we can find a (measurable) process $\Gamma$ such that the equality $H^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t},\alpha_{t})=F_{t}(X_{\cdot\wedge t},Z_{t},\widehat{\sigma}^{2}_{t},\alpha_{t})+\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{t}\Gamma_{t}]$ holds, and a (measurable) process $\Gamma^{\prime}$ (we omit its dependence on $\varepsilon$) such that one has strict inequality if $\Gamma$ is replaced by $\Gamma^{\prime}$ in the previous formula. The claim follows by taking $\Gamma^{\varepsilon}\coloneqq\Gamma\mathbf{1}_{\{K^{\varepsilon}=0\}}+\Gamma^{\prime}\mathbf{1}_{\{K^{\varepsilon}>0\}}$. We then find that $(Z,\Gamma^{\varepsilon})\in{\cal C}^{\alpha}$ since

Y^{\varepsilon}_{T}=y-\int_{0}^{T}H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s}^{\varepsilon},\alpha_{s})\mathrm{d}s+\int_{0}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\frac{1}{2}\int_{0}^{T}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma^{\varepsilon}_{s}]\mathrm{d}s=Y_{T}^{y,\alpha,Z,\Gamma^{\varepsilon}},

and, recalling that $K=K^{\varepsilon}=0$, $\mathbb{P}^{\star}$--a.s., we see that $\Gamma^{\varepsilon}=\Gamma$, $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}^{\star}$--a.e., and deduce that $Y=Y^{\varepsilon}$, $\mathbb{P}^{\star}$--a.s. In particular, $Y^{\varepsilon}_{T}=g(X_{\cdot\wedge T})$, $\mathbb{P}^{\star}$--a.s. Thus, from Proposition 4.4, we deduce that $(\mathbb{P}^{\star},\beta^{\star})$ satisfies (4.7), and thus $(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(y,\alpha,Z,\Gamma^{\varepsilon})$. Similarly to the conclusion in part $(i)$, this implies that $\hat{V}_{\rm L}\geq V_{\rm L}$. ∎

5 Solving the problem of the leader: strong formulation

In this section, we use the techniques developed in [14, 13], based on the geometric dynamic programming principle, see [67, 68], to study Markovian stochastic target control problems. For this reason, and to take full advantage of the standard tools from stochastic target problems, we place ourselves in a Markovian setting and study the strong formulation of (4.9). We expect it to be equivalent to $\hat{V}_{\rm L}$, see Remark 5.2.

In this setting, $(\Omega,{\cal F}_{T},\mathbb{F},\mathbb{P})$ denotes an abstract complete probability space supporting a $\mathbb{P}$--Brownian motion, which we still denote $W$, and $\mathbb{F}$ denotes the filtration generated by $W$, augmented under $\mathbb{P}$ so that it satisfies the usual conditions. In addition, the dependence of the data of the problem on $(t,x)\in[0,T]\times{\cal C}([0,T];\mathbb{R}^{d})$ is only through $(t,x(t))\in[0,T]\times\mathbb{R}^{d}$. With a slight abuse of notation, we now write $c(t,x(t),a,b)$ instead of $c_{t}(x,a,b)$—and similarly for all the other mappings introduced in the previous sections—and thus, without any risk of misunderstanding, consider the maps as defined on $\mathbb{R}^{d}$ instead of ${\cal C}([0,T];\mathbb{R}^{d})$.

In light of Proposition 4.4, by a measurable selection argument, we introduce ${\cal B}^{\star}$ as the set of Borel-measurable maps $b^{\star}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\longrightarrow B$ which are uniformly Lipschitz-continuous in $(x,z)$ and such that

H^{\rm F}(t,x,z,\gamma,a)=h^{\rm F}(t,x,z,\gamma,a,b^{\star}(t,x,z,\gamma,a)),\;(t,x,z,\gamma,a)\in[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A.

We now topologise ${\cal B}^{\star}$. Consider the measurable space $(O,{\cal B}_{O},\lambda)$, where $O\coloneqq[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A$, and ${\cal B}_{O}$ and $\lambda$ denote the Borel $\sigma$-algebra and the Lebesgue measure on $O$, respectively (we write ${\cal B}_{O}$ rather than ${\cal O}$, since the latter symbol is used for a domain in Section 5.1). We see ${\cal B}^{\star}$ as a subspace of $\mathbb{L}^{1}(O,\nu)$, where $\mathrm{d}\nu=C{\rm e}^{-\|\cdot\|}\mathrm{d}\lambda$ and $C>0$ is a normalising constant. In this way, as a subspace of a separable metric space, ${\cal B}^{\star}$ is separable. Lastly, for any $b^{\star}\in{\cal B}^{\star}$ and $\varphi\in\{C,c,\lambda,\sigma,\lambda\sigma,\sigma\sigma^{\top}\}$ we define

\varphi^{b^{\star}}(t,x,a,z,\gamma)\coloneqq\varphi(t,x,a,b^{\star}(t,x,z,\gamma,a)).   (5.1)

When the choice of $b^{\star}\in{\cal B}^{\star}$ is clear from the context, we drop the superscript and simply write $\varphi(t,x,a,z,\gamma)$. Note that, except for $h^{\rm F}$, whose maximisers define ${\cal B}^{\star}$, the value of $\varphi^{b^{\star}}$ does depend on the choice of $b^{\star}$.

With this, we introduce the following set of assumptions.

Assumption 5.1.

In addition to Assumption 3.1, we assume that $c$, $\sigma$, and $\sigma\lambda$ are Lipschitz-continuous in $(x,b)$, uniformly in $(t,a)$.

We let $\mathfrak{C}$ be the family of tuples $(\alpha,Z,\Gamma,b^{\star})$ consisting of $\mathbb{F}$-predictable processes $(\alpha,Z,\Gamma):[0,T]\times\Omega\longrightarrow A\times\mathbb{R}^{d}\times\mathbb{S}^{d}$ and $b^{\star}\in{\cal B}^{\star}$ such that, for some $p>1$,

\|Z\|_{\mathbb{H}^{p}}^{p}+\|\Gamma\|_{\mathbb{G}^{p}}^{p}+\|b^{\star}\|_{\mathbb{L}^{p}}^{p}<+\infty.

To alleviate the notation, we use $\upsilon$ to denote a generic element of $\mathfrak{C}$ and $\hat{\upsilon}=(\alpha,Z,\Gamma)$ its first three components. With this, given $t\in[0,T]$, $(x,y)\in\mathbb{R}^{d+1}$ and $\upsilon\in\mathfrak{C}$, the controlled state processes are given by

\begin{aligned} X_{u}^{t,x,\upsilon}&=x+\int_{t}^{u}\sigma\lambda(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}s+\int_{t}^{u}\sigma(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}W_{s},\;u\in[t,T],\\ Y_{u}^{t,x,y,\upsilon}&=y-\int_{t}^{u}c(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}s+\int_{t}^{u}Z_{s}\cdot\sigma(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}W_{s},\;u\in[t,T]. \end{aligned}   (5.2)
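Before turning to the target problem itself, the following self-contained sketch may help the reader visualise the strong formulation (5.2): it simulates the controlled pair $(X,Y)$ by an Euler–Maruyama scheme and inspects the terminal gap $Y_{T}-g(X_{T})$ that the target constraint below sets to zero. All coefficients and feedback controls in the snippet are toy choices of ours, not the model's data.

```python
import numpy as np

# A minimal Euler--Maruyama sketch of the controlled pair (X, Y) in (5.2) for d = 1.
# Every ingredient below (g, sigma, lam, c, b_star, alpha, Z) is an illustrative
# placeholder of ours, not the paper's data.

T, N, M = 1.0, 200, 10_000                      # horizon, time steps, sample paths
dt = T / N

g      = lambda x: np.tanh(x)                   # follower's terminal reward
sigma  = lambda t, x, a, b: 1.0 + 0.5 * b**2    # diffusion coefficient
lam    = lambda t, x, a, b: a                   # drift factor, so drift = sigma*lam
c      = lambda t, x, a, b: -0.5 * b**2         # follower's running reward
b_star = lambda t, x, z, gam, a: np.clip(z, -1.0, 1.0)   # toy best-response map

alpha = lambda t, x: 0.1 * np.ones_like(x)      # leader's feedback control
Z     = lambda t, x: np.tanh(x)                 # follower's continuation gradient

def simulate(x0, y0, seed=0):
    rng = np.random.default_rng(seed)
    X, Y = np.full(M, x0), np.full(M, y0)
    for i in range(N):
        t = i * dt
        a, z = alpha(t, X), Z(t, X)
        b = b_star(t, X, z, 0.0, a)             # Gamma plays no role in this toy model
        s = sigma(t, X, a, b)
        dW = np.sqrt(dt) * rng.standard_normal(M)
        # dX = sigma*lam dt + sigma dW ;  dY = -c dt + Z*sigma dW, cf. (5.2)
        X, Y = X + s * lam(t, X, a, b) * dt + s * dW, Y - c(t, X, a, b) * dt + z * s * dW
    return X, Y

X_T, Y_T = simulate(x0=0.0, y0=0.3)
# The target constraint below asks for Y_T = g(X_T); here we only inspect the gap.
print("mean terminal gap Y_T - g(X_T):", float(np.mean(Y_T - g(X_T))))
```

For generic controls, the printed gap is of course non-zero; the point of the target formulation below is precisely to characterise the initial values $y$ and the controls for which it vanishes.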

With this, we define the problem

\tilde{V}_{\rm L}\coloneqq\sup_{y\in\mathbb{R}}V(0,x_{0},y),   (5.3)

where

V(t,x,y)\coloneqq\sup_{\upsilon\in\mathfrak{C}(t,x,y)}\mathbb{E}^{\mathbb{P}}\bigg[\int_{t}^{T}C(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}s+G(X_{T}^{t,x,\upsilon})\bigg|{\cal F}_{t}\bigg],   (5.4)

and, for $(t,x,y)\in[0,T]\times\mathbb{R}^{d+1}$,

\mathfrak{C}(t,x,y)\coloneqq\big\{\upsilon\in\mathfrak{C}:\hat{\upsilon}\text{ is independent of }{\cal F}_{t},\text{ and }Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\}.
Note that, since $\hat{\upsilon}$ is required to be independent of ${\cal F}_{t}$ and the state processes in (5.2) start at time $t$, the conditional expectation in (5.4) is in fact deterministic.
Remark 5.2.

Let us comment on the previous formulation.

(i)

We remind the reader that in the strong formulation the background probability measure $\mathbb{P}$ is fixed, and thus the norms in the definition of $\mathfrak{C}$ coincide with those in the standard literature. In particular, contrary to the weak formulation, the family $\mathfrak{C}$ does not depend on the choice of $\alpha\in{\cal A}$. We also remark that $\mathfrak{C}$ is a separable topological space. This guarantees that the geometric dynamic programming principle of [14], based on [67], holds.

(ii)

The Lipschitz-continuity in $(x,b)$ of $\sigma$ and $\lambda\sigma$ in Assumption 5.1, and in $x$ of $b^{\star}$ in the definition of ${\cal B}^{\star}$, ensure that the process $X^{t,x,\upsilon}$ is well defined, and provide sufficient regularity to conduct our upcoming analysis. Notice that $Y^{t,x,y,\upsilon}$ is defined directly by the second equation in (5.2). Note also that we do not assume uniqueness of the maximisers of $h^{\rm F}$ in $b$. The Lipschitz-continuity in $(x,b)$ of $c$ in Assumption 5.1, together with the Lipschitz-continuity of $b^{\star}\in{\cal B}^{\star}$ in $z$, will be used to establish a comparison principle for the target boundaries in Section 5.1.

(iii)

Let us also digress on the equivalence of the strong and weak formulations. A potential roadmap to obtain this result uses [28]. Indeed, to handle the constraint in both formulations, it is natural to embed it in the reward by means of a Lagrange multiplier $k\geq 0$ and the continuous penalty function $\Phi(y,x)\coloneqq|g(x)-y|^{2}$. In this way, after establishing that strong duality holds, the results in [28] would allow us to obtain the equivalence of the strong and weak formulations for each element of a family of penalised problems, obtained by fixing $k$ and optimising over the corresponding controls. The only work needed to complete this argument is to establish strong duality for the Lagrangian versions of both $\hat{V}_{\rm L}$ and $\tilde{V}_{\rm L}$. We have refrained from writing such arguments, as this would require, for instance, introducing the so-called relaxed formulation of $\hat{V}_{\rm L}$, which would unnecessarily encumber our analysis.

As usual in stochastic target problems, we define the target reachability set as the set of triplets $(t,x,y)$ for which $\mathfrak{C}(t,x,y)$ is non-empty. That is,

V_{g}(t)\coloneqq\big\{(x,y)\in\mathbb{R}^{d+1}:\exists\upsilon\in\mathfrak{C}(t,x,y),\;Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\}.

We are interested in characterising $V_{g}(t)$ through the auxiliary sets

\begin{aligned} V_{g}^{-}(t)&\coloneqq\big\{(x,y)\in\mathbb{R}^{d+1}:\exists\upsilon\in\mathfrak{C}(t,x,y),\;Y_{T}^{t,x,y,\upsilon}\geq g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\},\\ V_{g}^{+}(t)&\coloneqq\big\{(x,y)\in\mathbb{R}^{d+1}:\exists\upsilon\in\mathfrak{C}(t,x,y),\;Y_{T}^{t,x,y,\upsilon}\leq g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\}. \end{aligned}

Notice that the inclusion $V_{g}(t)\subseteq V_{g}^{-}(t)\cap V_{g}^{+}(t)$ is immediate. The set $V_{g}^{-}(t)$ has been studied in [68, 13], and its boundary can be characterised through the auxiliary value function defined below:

w^{-}(t,x)\coloneqq\inf\{y\in\mathbb{R}:(x,y)\in V_{g}^{-}(t)\}.   (5.5)

It is known, see for instance [14, Corollary 2.1], that the closure of $V_{g}^{-}(t)$ is given by

\overline{V_{g}^{-}(t)}=\big\{(x,y):y\geq w^{-}(t,x)\big\}.

Moreover, $w^{-}$ is a discontinuous viscosity solution of the following PDE:

-\partial_{t}w(t,x)-H^{-}\big(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x)\big)=0,\;(t,x)\in[0,T)\times\mathbb{R}^{d},\quad w(T^{-},x)=g(x),\;x\in\mathbb{R}^{d},   (5.6)

where $H^{-}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\longrightarrow\mathbb{R}$ and $h^{\rm b}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\longrightarrow\mathbb{R}$ are given by

\begin{aligned} H^{-}(t,x,p,Q)&\coloneqq\inf_{(a,z,\gamma,b^{\star})\in N(t,x,p)}\big\{h^{\rm b}(t,x,p,Q,a,z,\gamma)\big\},\\ h^{\rm b}(t,x,p,Q,a,z,\gamma)&\coloneqq c(t,x,a,z,\gamma)+\sigma\lambda(t,x,a,z,\gamma)\cdot p+\frac{1}{2}\mathrm{Tr}[\sigma\sigma^{\top}(t,x,a,z,\gamma)Q], \end{aligned}   (5.7)

and, since $\sigma\sigma^{\top}$ is invertible by assumption,

N(t,x,p)\coloneqq\{(a,z,\gamma,b^{\star})\in A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times{\cal B}^{\star}:\sigma^{\top}(t,x,a,z,\gamma)(z-p)=0\}=A\times\{p\}\times\mathbb{S}^{d}\times{\cal B}^{\star}.   (5.8)

Similarly, by performing a change of variables and following the same ideas, the closure of $V_{g}^{+}(t)$ can be characterised as follows:

\overline{V_{g}^{+}(t)}=\big\{(x,y):y\leq w^{+}(t,x)\big\},

where the auxiliary value function $w^{+}$ is defined by

w^{+}(t,x)\coloneqq\sup\{y\in\mathbb{R}:(x,y)\in V_{g}^{+}(t)\},   (5.9)

and it is a discontinuous viscosity solution of the PDE

-\partial_{t}w(t,x)-H^{+}\big(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x)\big)=0,\;(t,x)\in[0,T)\times\mathbb{R}^{d},\quad w(T^{-},x)=g(x),\;x\in\mathbb{R}^{d},   (5.10)

where $H^{+}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\longrightarrow\mathbb{R}$ is given by

H^{+}(t,x,p,Q)\coloneqq\sup_{(a,z,\gamma,b^{\star})\in N(t,x,p)}\big\{h^{\rm b}(t,x,p,Q,a,z,\gamma)\big\}.
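Since the characterisation through (5.6) and (5.10) is one of the features making our approach amenable to numerical implementation (see the discussion around Theorem 5.8), we sketch below, in a deliberately simplified and hedged form, how the two boundary PDEs could be approximated: an explicit finite-difference scheme marching backward from the terminal condition, for $d=1$, with toy coefficients and finite control grids of our own choosing; in particular, the reduction $z=p$ from (5.8) is hard-coded.

```python
import numpy as np

# Hedged finite-difference sketch for the boundary PDEs (5.6) and (5.10) in d = 1:
#   -dw/dt - H(t, x, w_x, w_xx) = 0,  w(T, x) = g(x),
# marched backward in time by an explicit scheme. Coefficients and control grids
# are toy assumptions of ours; the CFL-type condition dt <= dx^2 / max(sigma^2) applies.

Nx, Nt, L, T = 201, 4000, 3.0, 1.0
x = np.linspace(-L, L, Nx); dx = x[1] - x[0]; dt = T / Nt

g = lambda xx: np.tanh(xx)
A_grid = np.linspace(-1.0, 1.0, 5)    # leader's control grid (assumption)
B_grid = np.linspace(0.5, 1.5, 5)     # follower's control grid (assumption)

def hamiltonian(p, Q, lower):
    # h^b = c + (sigma*lam)*p + 0.5*sigma^2*Q over the fibre N(t, x, p), cf. (5.7)-(5.8):
    # z = p is imposed, and (a, b) range over the finite control grids.
    best = np.full_like(p, np.inf if lower else -np.inf)
    for a in A_grid:
        for b in B_grid:
            sig = b
            val = -0.5 * b**2 + sig * a * p + 0.5 * sig**2 * Q
            best = np.minimum(best, val) if lower else np.maximum(best, val)
    return best

def solve(lower=True):
    w = g(x).copy()                                   # terminal condition at t = T
    for _ in range(Nt):                               # march backward from T to 0
        p = np.gradient(w, dx)
        Q = np.zeros_like(w)
        Q[1:-1] = (w[2:] - 2.0 * w[1:-1] + w[:-2]) / dx**2
        w = w + dt * hamiltonian(p, Q, lower)         # w(t - dt) = w(t) + dt * H
        w[0], w[-1] = g(x[0]), g(x[-1])               # crude Dirichlet boundary (assumption)
    return w

w_minus, w_plus = solve(lower=True), solve(lower=False)
print("gap w^+ - w^- at x = 0:", w_plus[Nx // 2] - w_minus[Nx // 2])
```

Running such a sketch also provides a crude numerical check of the separation between the two boundaries, which is precisely the quantity $\delta_{\varepsilon}$ discussed below.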

We propose the two auxiliary value functions as the upper and lower boundaries of $V_{g}(t)$, and thus define the set

\hat{V}_{g}(t)\coloneqq\{(x,y):w^{-}(t,x)\leq y\leq w^{+}(t,x)\},

which, provided the upper and lower boundaries are sufficiently separated before $T$, corresponds to the closure of the reachability set $V_{g}(t)$, as we prove next. For this, we introduce

\delta_{\varepsilon}\coloneqq\inf_{(t,x)\in[0,T-\varepsilon]\times\mathbb{R}^{d}}|w^{-}(t,x)-w^{+}(t,x)|,\;\varepsilon>0.
Lemma 5.3.

Let $t\in[0,T]$. The following hold:

(i)

$V_{g}(t)\subseteq\hat{V}_{g}(t)$.

(ii)

If in addition $\delta_{\varepsilon}>0$ for any $\varepsilon>0$, and $w^{-}$ and $w^{+}$ are continuous, then ${\rm int}\big(\hat{V}_{g}(t)\big)\subseteq V_{g}(t)$ and $\overline{V_{g}(t)}=\hat{V}_{g}(t)$.

Remark 5.4.

Let us provide a sufficient structural condition for the assumption $\delta_{\varepsilon}>0$ for any $\varepsilon>0$, before presenting the proof of Lemma 5.3. We claim that it holds if PDE (5.10) satisfies a comparison principle, as will follow from the analysis in Section 5.1, and if there is $\eta>0$ such that

H^{+}(t,x,p,Q)\geq H^{-}(t,x,p,Q)+\eta,\;\forall(t,x,p,Q)\in[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}.   (5.11)

Indeed, under this condition, it is easy to see that the function $\hat{w}^{-}(t,x)\coloneqq w^{-}(t,x)+\eta(T-t)$ is a discontinuous viscosity sub-solution to PDE (5.10): at any test point for $w^{-}$, $-\partial_{t}\hat{w}^{-}-H^{+}=-\partial_{t}w^{-}+\eta-H^{+}\leq-\partial_{t}w^{-}-H^{-}\leq 0$, by (5.11). Therefore, from the comparison principle we have $\hat{w}^{-}\leq w^{+}$, i.e. $w^{+}-w^{-}\geq\eta(T-t)$, which implies $\delta_{\varepsilon}\geq\eta\varepsilon>0$ for any $\varepsilon>0$. A similar argument works if PDE (5.6) satisfies a comparison principle instead.

Proof of Lemma 5.3.

Let us argue $(i)$. Let $(x,y)\in V_{g}(t)$; then there exists $\upsilon\in\mathfrak{C}(t,x,y)$ such that $Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon})$, $\mathbb{P}$--a.s. Then it is clear that $(x,y)$ belongs to both auxiliary sets $V_{g}^{-}(t)$ and $V_{g}^{+}(t)$, that is, $(x,y)\in V_{g}^{-}(t)\cap V_{g}^{+}(t)$. Since $\hat{V}_{g}(t)=\overline{V_{g}^{-}(t)}\cap\overline{V_{g}^{+}(t)}$, it follows that $V_{g}(t)\subseteq\hat{V}_{g}(t)$.

As for $(ii)$, we first note that the second part of the statement, i.e. $\overline{V_{g}(t)}=\hat{V}_{g}(t)$, follows from the inclusions ${\rm int}(\hat{V}_{g}(t))\subseteq V_{g}(t)\subseteq\hat{V}_{g}(t)$ by taking closures. Let us now argue ${\rm int}(\hat{V}_{g}(t))\subseteq V_{g}(t)$. To increase the readability of the proof, given $(t,x,y)\in[0,T]\times\mathbb{R}^{d+1}$ and $\upsilon\in\mathfrak{C}(t,x,y)$, we will say that $\upsilon$ satisfies $(U)$ or $(L)$ whenever $Y_{T}^{t,x,y,\upsilon}\geq g(X_{T}^{t,x,\upsilon})$, $\mathbb{P}$--a.s., or $Y_{T}^{t,x,y,\upsilon}\leq g(X_{T}^{t,x,\upsilon})$, $\mathbb{P}$--a.s., respectively. Let $t\in[0,T]$ and $(x,y)\in{\rm int}(\hat{V}_{g}(t))$. We argue in two steps.

Step 1. We fix $n\in\mathbb{N}^{\star}$ and construct an admissible control up to $T_{n}\coloneqq T-n^{-1}$. Since $(x,y)\in{\rm int}(\hat{V}_{g}(t))$, by continuity we have that $w^{-}(t,x)<y<w^{+}(t,x)$. Thus, in particular, there is $\upsilon^{0,n}\in\mathfrak{C}(t,x,y)$ satisfying $(U)$. Let $X^{0,n}\coloneqq X^{t,x,\upsilon^{0,n}}$, $Y^{0,n}\coloneqq Y^{t,x,y,\upsilon^{0,n}}$. By [14, Corollary 2.1], $Y^{0,n}_{s}\geq w^{-}(s,X^{0,n}_{s})$, $s\in[t,T]$. We have two cases. If $Y^{0,n}_{s}=w^{-}(s,X^{0,n}_{s})$ for some $s\in[t,T]$, by the Markov property and the continuity of $w^{-}$, we find that $(x,y)\in V_{g}(t)$ as desired and conclude the proof. Otherwise, we have that $Y^{0,n}_{s}>w^{-}(s,X^{0,n}_{s})$, $s\in[t,T]$. We then define the sequence of $\mathbb{F}$--stopping times $(\tau_{k}^{n})_{k\in\{0,\dots,k(n)\}}$, with $k(n)\in\mathbb{N}$ to be defined, recursively as follows

\tau_{0}^{n}\coloneqq\inf\big\{s\geq t:w^{+}\big(s,X^{0,n}_{s}\big)-Y^{0,n}_{s}\leq\delta_{n^{-1}}/3\big\}\wedge T_{n}.

If $\tau_{0}^{n}=T_{n}$, we set $k(n)=0$ and conclude the construction. Otherwise, by continuity, we have that $w^{+}\big(\tau_{0}^{n},X_{\tau_{0}^{n}}^{0,n}\big)-Y_{\tau_{0}^{n}}^{0,n}=\delta_{n^{-1}}/3$. By definition of $\delta_{\varepsilon}$, we have that

\big(X_{\tau_{0}^{n}}^{0,n},Y_{\tau_{0}^{n}}^{0,n}\big)\in{\rm int}\big(\hat{V}_{g}(\tau_{0}^{n})\big),\;\mathbb{P}\text{--a.s.},\;\text{i.e.}\;w^{-}\big(\tau_{0}^{n},X_{\tau_{0}^{n}}^{0,n}\big)<Y_{\tau_{0}^{n}}^{0,n}<w^{+}\big(\tau_{0}^{n},X_{\tau_{0}^{n}}^{0,n}\big),\;\mathbb{P}\text{--a.s.}   (5.12)

Thus, by [14, Corollary 2.1], there is $\upsilon^{1,n}\in\mathfrak{C}(t,x,y)$ satisfying $(L)$ and $\upsilon^{1,n}=\upsilon^{0,n}$ on $[t,\tau_{0}^{n})$. Let now

\tau_{1}^{n}\coloneqq\inf\big\{s\geq\tau_{0}^{n}:Y_{s}^{1,n}-w^{-}(s,X_{s}^{1,n})\leq\delta_{n^{-1}}/3\big\}\wedge T_{n},\;\text{for}\;X^{1,n}\coloneqq X^{t,x,\upsilon^{1,n}},\;Y^{1,n}\coloneqq Y^{t,x,y,\upsilon^{1,n}}.

Arguing as above, by definition of $\tau_{1}^{n}$, we find that $(X_{\tau_{1}^{n}}^{1,n},Y_{\tau_{1}^{n}}^{1,n})\in{\rm int}\big(\hat{V}_{g}(\tau_{1}^{n})\big)$, $\mathbb{P}$--a.s. Thus, again by [14, Corollary 2.1], there is $\upsilon^{2,n}\in\mathfrak{C}(t,x,y)$ such that $(U)$ holds and $\upsilon^{2,n}=\upsilon^{1,n}$ on $[\tau_{0}^{n},\tau_{1}^{n})$. Recursively, for $k\in\mathbb{N}^{\star}$, we let $X^{k,n}\coloneqq X^{t,x,\upsilon^{k,n}}$, $Y^{k,n}\coloneqq Y^{t,x,y,\upsilon^{k,n}}$, and

\begin{aligned} \tau_{2k}^{n}&\coloneqq\inf\big\{s\geq\tau_{2k-1}^{n}:w^{+}(\tau_{k-1}^{n},X_{s}^{k,n})-Y_{s}^{k,n}\leq\delta_{n^{-1}}/3\big\}\wedge T_{n},\\ \tau_{2k+1}^{n}&\coloneqq\inf\big\{s\geq\tau_{2k}^{n}:Y_{s}^{k,n}-w^{-}(\tau_{k-1}^{n},X_{s}^{k,n})\leq\delta_{n^{-1}}/3\big\}\wedge T_{n}, \end{aligned}

and find $\upsilon^{k+1,n}\in\mathfrak{C}(t,x,y)$ for which $(X_{\tau_{k}^{n}}^{k,n},Y_{\tau_{k}^{n}}^{k,n})\in{\rm int}\big(\hat{V}_{g}(\tau_{k}^{n})\big)$, $\mathbb{P}$--a.s.

We now claim that there is an $\mathbb{N}$-valued random variable $k(n)$ such that $\tau^{n}_{k(n)}=T_{n}$, $\mathbb{P}$--a.s. Indeed, by continuity of $w^{-}$ and $w^{+}$, the mappings

[t,T_{n}]\ni s\longmapsto w^{+}\big(s,X_{s}^{t,x,\upsilon}\big)-Y_{s}^{t,x,y,\upsilon},\;\text{and}\;[t,T_{n}]\ni s\longmapsto Y_{s}^{t,x,y,\upsilon}-w^{-}\big(s,X_{s}^{t,x,\upsilon}\big),

are, $\omega$-by-$\omega$, uniformly continuous for any $\upsilon\in\mathfrak{C}(t,x,y)$. Hence, there exist a constant $\bar{\gamma}_{n}>0$ and a $[\bar{\gamma}_{n},T_{n}]$-valued random variable $\gamma_{n}$ such that $\tau_{k}^{n}-\tau_{k-1}^{n}>\gamma_{n}$, $\mathbb{P}$--a.s., $k\in\mathbb{N}$. This proves the claim.

At the end of this construction, we set $\upsilon^{n}\coloneqq\upsilon^{k(n),n}$, and notice that $\upsilon^{n}\in\mathfrak{C}(t,x,y)$ and

w^{-}\big(T_{n},X_{T_{n}}^{n}\big)<Y_{T_{n}}^{n}<w^{+}\big(T_{n},X_{T_{n}}^{n}\big),\;\mathbb{P}\text{--a.s., for }X^{n}\coloneqq X^{t,x,\upsilon^{n}},\;Y^{n}\coloneqq Y^{t,x,y,\upsilon^{n}}.   (5.13)

Step 2. We iterate the previous construction. From here on, we can repeat Step 1, with $(T_{n},X_{T_{n}}^{n})$, the control $\upsilon^{n}$, and $n+1$ playing the roles of $(t,x)$, $\upsilon^{0,n}$, and $n$, respectively. With this, we obtain the existence of $\upsilon^{n+1}\in\mathfrak{C}(t,x,y)$ such that, by uniform continuity, (5.13) holds at $(T_{n+1},X^{n+1}_{T_{n+1}})$ and $Y^{n+1}_{T_{n+1}}$. Iterating this construction, we find $\upsilon$ which is well defined $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$--a.e. on $[0,T]\times\Omega$ (indeed, the construction allows us to define said process $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$--a.e. on $[t,T)\times\Omega$, and consequently $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$--a.e. on $[0,T]\times\Omega$).

To conclude that $(x,y)\in V_{g}(t)$, let $n\longrightarrow\infty$ in (5.13), and notice that by continuity of $w^{-}$ and $w^{+}$ we have $Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon})$, as desired. ∎

5.1 Boundary PDEs: comparison and verification

In this section, we prove the following verification theorem for the solutions to PDEs (5.6) and (5.10). We remind the reader that Assumption 5.1 is in place.

Theorem 5.5.

Let $u$ and $v$ be continuous viscosity solutions to (5.6) and (5.10), respectively, with linear growth. Then $u=w^{-}$ and $v=w^{+}$.

We conduct the analysis for $w^{-}$, the argument for $w^{+}$ being analogous. We start by establishing a comparison result for viscosity solutions to (5.6). Let us recall that $w^{-}$ is a discontinuous viscosity solution of such an equation.

Lemma 5.6.

Let $u$ and $v$ be respectively an upper–semi-continuous viscosity sub-solution and a lower–semi-continuous viscosity super-solution of (5.6), such that for $\varphi\in\{u,v\}$ and some $C>0$, $\varphi(y)\leq C(1+\|y\|)$, $y\in[0,T]\times\mathbb{R}^{d}$. If $u(T,x)\leq v(T,x)$, $x\in\mathbb{R}^{d}$, then $u\leq v$ on ${\cal O}\coloneqq(0,T)\times\mathbb{R}^{d}$.

Proof.

Step 1. Fix positive constants $\alpha$, $\beta$, $\eta$, and $\varepsilon$, and define $\phi(t,x,y)\coloneqq u^{\eta}(t,x)-v(t,y)$, where $u^{\eta}(t,x)\coloneqq u(t,x)-\frac{\eta}{t}$, $(t,x)\in{\cal O}$. Note that since $\frac{\partial}{\partial t}(-\eta t^{-1})=\eta t^{-2}>0$, $u^{\eta}$ is a viscosity sub-solution of (5.6) in ${\cal O}$. Define as well

\psi_{\alpha,\beta,\varepsilon}(t,x,y)\coloneqq\alpha|x-y|^{2}/2+\varepsilon|x|^{2}+\varepsilon|y|^{2}-\beta(t-T).

Now, let

M_{\alpha,\beta,\varepsilon}\coloneqq\sup_{(t,x,y)\in(0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}}\big(\phi-\psi_{\alpha,\beta,\varepsilon}\big)(t,x,y)=(\phi-\psi_{\alpha,\beta,\varepsilon})(t_{\alpha,\beta,\varepsilon},x_{\alpha,\beta,\varepsilon},y_{\alpha,\beta,\varepsilon}),

for some $(t_{\alpha,\beta,\varepsilon},x_{\alpha,\beta,\varepsilon},y_{\alpha,\beta,\varepsilon})\in(0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}$, thanks to the upper–semi-continuity of $u^{\eta}-v$, the growth assumptions on $u$ and $v$, and the penalisation terms in $\psi_{\alpha,\beta,\varepsilon}$ and $-\eta t^{-1}$. Moreover, we have that $-\infty<\lim_{\alpha\to\infty}M_{\alpha,\beta,\varepsilon}<\infty$, meaning that the supremum is attained on a compact subset of $(0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}$. Consequently, there is a subsequence $(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon})\coloneqq(t_{\alpha_{n},\beta,\varepsilon},x_{\alpha_{n},\beta,\varepsilon},y_{\alpha_{n},\beta,\varepsilon})$ that converges to some $(\hat{t}^{\beta,\varepsilon},\hat{x}^{\beta,\varepsilon},\hat{y}^{\beta,\varepsilon})$. It then follows from [21, Proposition 3.7] that

\hat{x}^{\beta,\varepsilon}=\hat{y}^{\beta,\varepsilon},\quad\lim_{n\to\infty}\alpha_{n}|x^{\beta,\varepsilon}_{n}-y^{\beta,\varepsilon}_{n}|^{2}=0,\quad M_{\beta,\varepsilon}\coloneqq\lim_{n\to\infty}M_{\alpha_{n},\beta,\varepsilon}=\sup_{(t,x)\in\overline{\cal O}}(u^{\eta}-v)(t,x)-2\varepsilon|\hat{x}^{\beta,\varepsilon}|^{2}+\beta(\hat{t}^{\beta,\varepsilon}-T).   (5.14)

Step 2. To prove the statement, as is standard in the literature, let us assume by contradiction that there is $(t_{o},x_{o})\in{\cal O}$ such that $\gamma_{o}\coloneqq(u-v)(t_{o},x_{o})>0$. We claim that there are positive $\beta_{o}$, $\eta_{o}$, and $\varepsilon_{o}$ such that for any $\beta_{o}\geq\beta>0$, $\eta_{o}\geq\eta>0$, $\varepsilon_{o}\geq\varepsilon>0$, $(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon})$ is a local maximiser of $\phi(t,x,y)-\psi_{\alpha_{n},\beta,\varepsilon}(t,x,y)$ on $(0,T)\times{\cal K}^{2}$ for some compact ${\cal K}\subseteq\mathbb{R}^{d}$. We first note that the existence of ${\cal K}$ is clear, since the supremum is attained on a compact set. It remains to show that $t_{n}^{\beta,\varepsilon}<T$ for all $n\in\mathbb{N}$.

Suppose by contradiction that $t_{n}^{\beta,\varepsilon}=T$ for some $n$. Thanks to the first step, for any positive $\beta$, $\varepsilon$, and $\eta$ we have that

\gamma_{o}-\frac{\eta}{t_{o}}+\beta(t_{o}-T)-2\varepsilon|x_{o}|^{2}\leq M_{\alpha_{n},\beta,\varepsilon}=\sup_{(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}}\big\{u(T,x)-v(T,y)-\alpha_{n}|x-y|^{2}/2-\varepsilon|x|^{2}-\varepsilon|y|^{2}\big\}-\frac{\eta}{T}\leq-\frac{\eta}{T},

where the rightmost inequality follows from the assumption $u(T,x)\leq v(T,x)$, $x\in\mathbb{R}^{d}$. Consequently,

\gamma_{o}\leq\frac{\eta}{t_{o}}-\frac{\eta}{T}+\beta(T-t_{o})+2\varepsilon|x_{o}|^{2},

so that for $\beta$, $\varepsilon$, and $\eta$ sufficiently small the right-hand side is arbitrarily small, which contradicts $\gamma_{o}>0$. This proves the claim.

Step 3. In light of the second step, it follows from Crandall–Ishii’s lemma for parabolic problems, [21, Theorem 8.3], applied to $u^{\eta}$ and $v$, that we can find $(q_{n},\hat{q}_{n})$ with $q_{n}-\hat{q}_{n}=\partial_{t}\psi_{\alpha,\beta,\varepsilon}(t,x,y)=-\beta$, and symmetric matrices $(X_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon})$, such that

\big(q_{n},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})+\varepsilon x_{n}^{\beta,\varepsilon},X_{n}^{\beta,\varepsilon}\big)\in\overline{\cal P}^{1,2,+}u^{\eta}(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon}),\quad\big(\hat{q}_{n},-\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})+\varepsilon y_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon}\big)\in\overline{\cal P}^{1,2,-}v(t_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon}),

and, for $C_{n}\coloneqq\alpha_{n}\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}+\varepsilon I_{2d}$, we have that

-\bigg(\frac{1}{\lambda}+\|C_{n}\|\bigg)I_{2d}\leq\begin{pmatrix}X_{n}^{\beta,\varepsilon}&0\\ 0&-Y_{n}^{\beta,\varepsilon}\end{pmatrix}\leq C_{n}(I_{2d}+\lambda C_{n}),\;\text{for all }\lambda>0.

Taking $\lambda=(\alpha_{n}+\varepsilon)^{-1}$ leads to

-\big(\alpha_{n}+\varepsilon+\|C_{n}\|\big)I_{2d}\leq\begin{pmatrix}X_{n}^{\beta,\varepsilon}&0\\ 0&-Y_{n}^{\beta,\varepsilon}\end{pmatrix}\leq 3\alpha_{n}\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}+2\varepsilon I_{2d}.   (5.15)
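For the reader’s convenience, here is the elementary computation behind the right-hand side of (5.15). Writing $J\coloneqq\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}$, so that $J^{2}=2J$ and $C_{n}=\alpha_{n}J+\varepsilon I_{2d}$, we have, for $\lambda=(\alpha_{n}+\varepsilon)^{-1}$,

C_{n}(I_{2d}+\lambda C_{n})=C_{n}+\lambda C_{n}^{2}=\alpha_{n}J+\varepsilon I_{2d}+\frac{2\alpha_{n}(\alpha_{n}+\varepsilon)J+\varepsilon^{2}I_{2d}}{\alpha_{n}+\varepsilon}=3\alpha_{n}J+\bigg(\varepsilon+\frac{\varepsilon^{2}}{\alpha_{n}+\varepsilon}\bigg)I_{2d}\leq 3\alpha_{n}J+2\varepsilon I_{2d},

since $\varepsilon^{2}/(\alpha_{n}+\varepsilon)\leq\varepsilon$.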

Step 4. With the notation $(t_{n},x_{n},y_{n})\coloneqq(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon})$, $p_{n}^{x}\coloneqq\alpha_{n}(x_{n}-y_{n})-\varepsilon x_{n}$, and $p_{n}^{y}\coloneqq\alpha_{n}(x_{n}-y_{n})-\varepsilon y_{n}$, under the above assumptions we claim that there exists a universal constant $C>0$ such that

H^{-}(t_{n},y_{n},p_{n}^{y},Q_{2})-H^{-}(t_{n},x_{n},p_{n}^{x},Q_{1})\leq C\big(1+\varepsilon^{2}\|x_{n}\|+\varepsilon^{2}\|y_{n}\|+\varepsilon\big)\big(\alpha_{n}\|x_{n}-y_{n}\|^{2}+\|x_{n}-y_{n}\|+\varepsilon\big),

for matrices $Q_{1}$, $Q_{2}$ satisfying (5.15). We consider each term in $h^{\rm b}$ separately, recalling (5.7) and (5.8); the following estimates hold for arbitrary, but fixed, $(a,\gamma,b^{\star})$.

Letting $\Sigma^{x}\coloneqq\sigma(t_{n},x_{n},a,p_{n}^{x},\gamma)$ and $\Sigma^{y}\coloneqq\sigma(t_{n},y_{n},a,p_{n}^{y},\gamma)$, note that there is $C>0$ such that

\begin{aligned} &\mathrm{Tr}[\sigma\sigma^{\top}(t_{n},y_{n},a,b^{\star}(t_{n},y_{n},p_{n}^{y},\gamma,a))Q_{2}]-\mathrm{Tr}[\sigma\sigma^{\top}(t_{n},x_{n},a,b^{\star}(t_{n},x_{n},p_{n}^{x},\gamma,a))Q_{1}]\\ &=\mathrm{Tr}\bigg[\begin{pmatrix}\Sigma^{x}{\Sigma^{x}}^{\top}&\Sigma^{x}{\Sigma^{y}}^{\top}\\ \Sigma^{y}{\Sigma^{x}}^{\top}&\Sigma^{y}{\Sigma^{y}}^{\top}\end{pmatrix}\begin{pmatrix}Q_{2}&0\\ 0&-Q_{1}\end{pmatrix}\bigg]\\ &\leq 3\alpha_{n}\mathrm{Tr}\bigg[\begin{pmatrix}\Sigma^{x}{\Sigma^{x}}^{\top}&\Sigma^{x}{\Sigma^{y}}^{\top}\\ \Sigma^{y}{\Sigma^{x}}^{\top}&\Sigma^{y}{\Sigma^{y}}^{\top}\end{pmatrix}\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}\bigg]+2\varepsilon\mathrm{Tr}\bigg[\begin{pmatrix}\Sigma^{x}{\Sigma^{x}}^{\top}&\Sigma^{x}{\Sigma^{y}}^{\top}\\ \Sigma^{y}{\Sigma^{x}}^{\top}&\Sigma^{y}{\Sigma^{y}}^{\top}\end{pmatrix}I_{2d}\bigg]\\ &=3\alpha_{n}\mathrm{Tr}\big[(\Sigma^{x}-\Sigma^{y})(\Sigma^{x}-\Sigma^{y})^{\top}\big]+2\varepsilon\mathrm{Tr}\big[\Sigma^{x}{\Sigma^{x}}^{\top}+\Sigma^{y}{\Sigma^{y}}^{\top}\big]\\ &=3\alpha_{n}\|\Sigma^{x}-\Sigma^{y}\|^{2}+2\varepsilon\mathrm{Tr}\big[\Sigma^{x}{\Sigma^{x}}^{\top}+\Sigma^{y}{\Sigma^{y}}^{\top}\big]\\ &\leq 3\alpha_{n}\|\sigma(t_{n},x_{n},a,p_{n}^{x},\gamma)-\sigma(t_{n},y_{n},a,p_{n}^{y},\gamma)\|^{2}+4\varepsilon C_{\sigma\sigma^{\top}}\leq C\big((1+\varepsilon)\alpha_{n}\|x_{n}-y_{n}\|^{2}+\varepsilon\big), \end{aligned}

where the first inequality follows from the right-hand side of (5.15), $C_{\sigma\sigma^{\top}}$ denotes the bound, assumed to exist, on $\sigma\sigma^{\top}$, and the last inequality follows from Assumption 5.1. Similarly, note that there is a constant $C>0$ such that

\begin{aligned} c(t_{n},y_{n},a,p_{n}^{y},\gamma)-c(t_{n},x_{n},a,p_{n}^{x},\gamma)&\leq C\big(\|x_{n}-y_{n}\|+\|b^{\star}(t_{n},y_{n},p_{n}^{y},\gamma,a)-b^{\star}(t_{n},x_{n},p_{n}^{x},\gamma,a)\|\big)\\ &\leq C\big(\|x_{n}-y_{n}\|+\|p_{n}^{y}-p_{n}^{x}\|\big)\leq C\big(\|x_{n}-y_{n}\|+\varepsilon\big), \end{aligned}

and

\displaystyle\sigma\lambda(t_{n},x_{n},p_{n}^{x},\gamma,a)\cdot p_{n}^{x}-\sigma\lambda(t_{n},y_{n},p_{n}^{y},\gamma,a)\cdot p_{n}^{y}
\leq\|\sigma\lambda(t_{n},x_{n},p_{n}^{x},\gamma,a)\|\|p_{n}^{x}-p_{n}^{y}\|+\|\sigma\lambda(t_{n},x_{n},p_{n}^{x},\gamma,a)-\sigma\lambda(t_{n},y_{n},p_{n}^{y},\gamma,a)\|\|p_{n}^{y}\|
\leq\varepsilon C\|x_{n}-y_{n}\|+C\|p_{n}^{y}\|(1+\varepsilon)\|x_{n}-y_{n}\|\leq C(1+\varepsilon+\varepsilon^{2}\|y_{n}\|)\big{(}\|x_{n}-y_{n}\|+\alpha_{n}\|x_{n}-y_{n}\|^{2}\big{)}.

The result follows from substituting these estimates back into the Hamiltonian.

Step 5. We conclude. By Step 3 and the viscosity properties of u^{\eta} and v, we have that

-q_{n}+H^{-}\big{(}t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon x_{n}^{\beta,\varepsilon},X_{n}^{\beta,\varepsilon}\big{)}\leq 0\leq-\hat{q}_{n}+H^{-}\big{(}t_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon y_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon}\big{)}.

Subtracting, we find from Step 4 that

\displaystyle\beta=\hat{q}_{n}-q_{n}\leq H^{-}\big{(}t_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon y_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon}\big{)}-H^{-}\big{(}t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon x_{n}^{\beta,\varepsilon},X_{n}^{\beta,\varepsilon}\big{)}
\leq C(1+\varepsilon^{2}\|x_{n}^{\beta,\varepsilon}\|+\varepsilon^{2}\|y_{n}^{\beta,\varepsilon}\|+\varepsilon)\big{(}\alpha_{n}\|x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon}\|^{2}+\|x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon}\|+\varepsilon\big{)}.

Passing to the limit n\longrightarrow\infty and \varepsilon\longrightarrow 0, thanks to (5.14), we find that \beta\leq 0, which is a contradiction. ∎

The next lemma proves, in particular, that the auxiliary value function satisfies the hypotheses of Lemma 5.6.

Lemma 5.7.

Suppose the functions H^{+} and H^{-} are continuous. Then the functions w^{-} and w^{+} from [0,T]\times\mathbb{R}^{d} to \mathbb{R}, defined in (5.9) and (5.5) respectively, are bounded and continuous.

To complete the last step in the verification result, we have assumed the continuity of the Hamiltonian functions. We remark that this assumption holds, for instance, if the optimisation over \gamma in the definition of H^{+} and H^{-} can be reduced to a compact set, continuously with respect to (t,x,p,Q).

Proof of Lemma 5.7.

We prove the result for w^{-}, the other case being analogous. We first argue that w^{-} is bounded. Let (t,x)\in[0,T]\times\mathbb{R}^{d} and y>T\ell_{c}+\ell_{g}. We claim that (x,y)\in V_{g}(t). Indeed, taking the control Z\equiv 0, \Gamma\equiv 0 and any (\alpha,b^{\star})\in{\cal A}\times{\cal B}^{\star}, we have

Y_{T}^{t,x,y,\upsilon}=y-\int_{t}^{T}c\big{(}s,X_{s}^{t,x,\upsilon},Z_{s},\hat{\upsilon}\big{)}\mathrm{d}s\geq y-T\ell_{c}>\ell_{g}\geq g(X_{T}^{t,x,\upsilon}).

That is, w^{-}(t,x)\leq T\ell_{c}+\ell_{g}. To obtain a lower bound, take again (t,x)\in[0,T]\times\mathbb{R}^{d} and y<-T\ell_{c}-\ell_{g}. Then, it is easy to check that, for any M\in\mathbb{R} and any \upsilon\in{\mathfrak{C}}, the following process is an (\mathbb{F},\mathbb{P})–super-martingale

A_{s}\coloneqq Y_{s}^{t,x,y,\upsilon}-s\ell_{c}+M,\;s\in[0,T].

Thus, choosing M=T\ell_{c}+\ell_{g}, we have that \mathbb{E}^{\mathbb{P}}[Y_{T}^{t,x,y,\upsilon}-T\ell_{c}+M]\leq y+M<0, which implies \mathbb{P}[Y_{T}^{t,x,y,\upsilon}+\ell_{g}<0]>0. Therefore, for any \upsilon\in{\mathfrak{C}}

\mathbb{P}\big{[}Y_{T}^{t,x,y,\upsilon}<g(X_{T}^{t,x,\upsilon})\big{]}\geq\mathbb{P}\big{[}Y_{T}^{t,x,y,\upsilon}+\ell_{g}<0\big{]}>0,

which means that the pair (x,y)\not\in V_{g}^{1}(t). Thus, w^{-}(t,x)\geq-T\ell_{c}-\ell_{g}.

Let us now prove the continuity. By [13, Theorem 2.1], w^{-} is a discontinuous viscosity solution to PDE (5.10) as long as we verify Assumption 2.1 therein. Indeed, the continuity condition on the set N(t,x,p) holds in our case given the explicit form obtained in (5.8). Since H^{-} is continuous, the lower– and upper–semicontinuous envelopes w^{-}_{\star} and w^{-,\star} are viscosity super-solution and sub-solution, respectively, of (5.6). From [13, Theorem 2.2], which in our case is not subject to the gradient constraints (see [13, Remark 2.1] and notice that in our setting their set N_{c} is empty), we conclude that w^{-,\star}(T,\cdot)\leq g\leq w^{-}_{\star}(T,\cdot). Finally, from Lemma 5.6, we therefore have that w^{-,\star}\leq w^{-}_{\star} on [0,T)\times\mathbb{R}^{d}. Since the reverse inequality holds by definition, we conclude the equality of the semicontinuous envelopes, and thus the continuity of w^{-}. ∎

Proof of Theorem 5.5.

The result is an immediate consequence of Lemmata 5.6 and 5.7. ∎

5.2 PDE characterisation for the problem of the leader

Having conducted the analysis of the auxiliary boundary functions w^{-} and w^{+}, we are now in a position to provide a verification theorem for Problem 5.4. Theorem 5.8 below provides a PDE characterisation for the intermediate problem of the leader under the CL information structure. Let us remark that once V(t,x,y) is found, it only remains to optimise over y\in\mathbb{R}.

To ease the notation, we will use {\rm x}\in\mathbb{R}^{d+1} and u\in A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times{\cal B}^{\star}\eqqcolon U to denote the values of the state and control processes associated with Problem 5.4, that is, we make the convention {\rm x}=(x,y) and u=(a,z,\gamma,b^{\star}). In this way, recalling (5.1), we let C(t,{\rm x},u)\coloneqq C(t,x,a,z,\gamma), and similarly for the other functions in the analysis. Moreover, we denote the drift and volatility coefficients (\mu,\vartheta):[0,T]\times\mathbb{R}^{d+1}\times U\longrightarrow\mathbb{R}^{d+1}\times\mathbb{R}^{(d+1)\times n} associated with the state process {\rm X}\coloneqq(X,Y) by

\displaystyle\mu(t,{\rm x},u)\coloneqq\begin{pmatrix}\sigma\lambda(t,x,u)\\ -c(t,x,u)\end{pmatrix},\;\vartheta(t,{\rm x},u)\coloneqq\begin{pmatrix}\sigma(t,x,u)\\ z\cdot\sigma(t,x,u)\end{pmatrix}.
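For concreteness, the stacking in the definition of (\mu,\vartheta) can be spelled out in code. The following Python sketch is purely illustrative: sigma_lam, c_fun and sigma are hypothetical callables standing for \sigma\lambda, c and \sigma, and z is the corresponding component of the control.

import numpy as np

def mu(t, x, y, u, sigma_lam, c_fun):
    # Drift of X = (X, Y): the vector sigma*lambda stacked with -c, in R^{d+1}
    return np.concatenate([sigma_lam(t, x, u), [-c_fun(t, x, u)]])

def vartheta(t, x, y, u, z, sigma):
    # Volatility of X = (X, Y): the d x n matrix sigma with the extra row z . sigma
    s = sigma(t, x, u)             # shape (d, n)
    return np.vstack([s, z @ s])   # shape (d + 1, n)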

Given w\in C^{1,2}([0,T)\times\mathbb{R}^{d}), we define the sets

U^{-}(t,x,w)\coloneqq\big{\{}u\in U:\sigma^{\top}\!(t,x,u)(z-\partial_{x}w(t,x))=0,\;-\partial_{t}w(t,x)-h^{\rm b}(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x),u)\geq 0\big{\}},
U^{+}(t,x,w)\coloneqq\big{\{}u\in U:\sigma^{\top}\!(t,x,u)(z-\partial_{x}w(t,x))=0,\;-\partial_{t}w(t,x)-h^{\rm b}(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x),u)\leq 0\big{\}},

and, for i\in\{-,+\}, introduce the Hamiltonians (H^{\rm L},H^{i,w}):[0,T]\times\mathbb{R}^{d+1}\times\mathbb{R}^{d+1}\times\mathbb{S}^{d+1}\longrightarrow\mathbb{R}, given by

{H}^{\rm L}(t,{\rm x},{\rm p},{\rm Q})\coloneqq\sup_{u\in U}\big{\{}{h}^{\rm L}(t,{\rm x},{\rm p},{\rm Q},u)\big{\}},\;{\rm H}^{i,w}(t,{\rm x},{\rm p},{\rm Q})\coloneqq\sup_{u\in U^{i}(t,x,w)}\big{\{}{h}^{\rm L}(t,{\rm x},{\rm p},{\rm Q},u)\big{\}}, (5.16)

where

{h}^{\rm L}(t,{\rm x},{\rm p},{\rm Q},u)\coloneqq C(t,x,u)+\mu(t,{\rm x},u)\cdot{\rm p}+\frac{1}{2}{\rm Tr}[\vartheta\vartheta^{\top}\!(t,{\rm x},u){\rm Q}].
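Since the suprema in (5.16) are rarely available in closed form, it may help to record how they can be approximated numerically. The following Python sketch evaluates h^{\rm L} pointwise over a finite grid of controls replacing U and takes the maximum; the callables C, mu and vartheta are hypothetical coefficient functions supplied by the user, and nothing here is specific to a particular model.

import numpy as np

def hamiltonian_leader(t, x, p, Q, control_grid, C, mu, vartheta):
    # Brute-force approximation of H^L(t, x, p, Q) in (5.16): evaluate
    # h^L at every control of a finite grid replacing U, keep the largest.
    best = -np.inf
    for u in control_grid:
        theta = vartheta(t, x, u)                    # (d+1) x n matrix
        h = (C(t, x, u) + mu(t, x, u) @ p            # C + mu . p
             + 0.5 * np.trace(theta @ theta.T @ Q))  # + 1/2 Tr[theta theta^T Q]
        best = max(best, h)
    return best

The same routine, restricted to controls satisfying the constraints defining U^{-} or U^{+}, approximates H^{i,w}.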

Below, {\cal T}_{T} denotes the family of \mathbb{F}–stopping times with values in [0,T]. With this, we have all the elements necessary to state our main result, which is the following verification theorem.

Theorem 5.8.

Let w^{i}\in C^{1,2}([0,T)\times\mathbb{R}^{d})\cap C^{0}([0,T]\times\mathbb{R}^{d}), i\in\{-,+\}, be solutions to (5.6) and (5.10), and let v\in C^{1,2}([0,T)\times\mathbb{R}^{d+1})\cap C^{0}([0,T]\times\mathbb{R}^{d+1}) satisfy

\begin{cases}-\partial_{t}v(t,{\rm x})-{\rm H}^{\rm L}(t,{\rm x},\partial_{\rm x}v(t,{\rm x}),\partial_{\rm xx}^{2}v(t,{\rm x}))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{d}\times(w^{-}(t,x),w^{+}(t,x)),\\ -\partial_{t}v(t,{\rm x})-{\rm H}^{i,w^{i}}(t,{\rm x},\partial_{\rm x}v(t,{\rm x}),\partial_{\rm xx}^{2}v(t,{\rm x}))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{d}\times\{w^{i}(t,x)\},\;i\in\{-,+\},\\ v(T^{-},{\rm x})=G(x),\;(x,y)\in\mathbb{R}^{d}\times\{g(x)\}.\end{cases} (5.17)

Moreover, suppose that

  • the family \{v(\tau,X_{\tau}^{\upsilon},Y_{\tau}^{\upsilon})\}_{\tau\in{\cal T}_{T}} is uniformly integrable for all controls \upsilon\in{\mathfrak{C}};

  • there exists \upsilon^{\star}:[0,T]\times\mathbb{R}^{d}\times[w^{-},w^{+}]\longrightarrow A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times{\cal B}^{\star} attaining the maximisers in H^{\rm L} and H^{i,w^{i}}, i\in\{+,-\} (here, [w^{-},w^{+}]\coloneqq\{y\in\mathbb{R}:w^{-}(t,x)\leq y\leq w^{+}(t,x),\text{ for some }(t,x)\in[0,T]\times\mathbb{R}^{d}\});

  • there is a strong solution to the system (5.2) with control (\alpha^{\star}_{\cdot},Z_{\cdot}^{\star},\Gamma_{\cdot}^{\star},b^{\star})\coloneqq\upsilon^{\star}(\cdot,X_{\cdot},Y_{\cdot});

  • (\alpha^{\star},Z^{\star},\Gamma^{\star},b^{\star})\in{\mathfrak{C}}.

Then, V(t,x,y)=v(t,x,y), and (\alpha^{\star},Z^{\star},\Gamma^{\star},b^{\star}) is an optimal control for the problem V(t,x,y).

Remark 5.9.

We remark that we could build upon one of the main results of [14] to characterise the functions V, w^{+}, and w^{-} given by (5.4), (5.5), and (5.9), respectively, as viscosity solutions to—a relaxed version of—(5.6), (5.10) and (5.17), respectively. In particular, if one can show that V, w^{+}, and w^{-} are smooth and the associated Hamiltonians are continuous, the relaxation reduces to the above system. We refer to [14] for details. We have refrained from doing so, as the above verification theorem gives the result most useful for solving examples in practice. In Section 2.2, we use the above result and search for a solution to the above system directly.

Proof.

Let t\in[0,T], (x,y)\in V_{g}(t), \upsilon\in\mathfrak{C}(t,x,y), and (X,Y)\coloneqq(X^{t,x,\upsilon},Y^{t,x,y,\upsilon}) be given by (5.2). We set {\rm X}\coloneqq(X,Y). By Lemma 5.3, we have that w^{-}(t,x)\leq y\leq w^{+}(t,x), and by a comparison argument we find that w^{-}(s,X_{s})\leq Y_{s}\leq w^{+}(s,X_{s}), s\in[t,T], \mathbb{P}\text{--a.s.} Let \tau be given by

\tau\coloneqq\inf\{s>t:Y_{s}=w^{-}(s,X_{s})\;\text{or}\;Y_{s}=w^{+}(s,X_{s})\}\wedge T.
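On a discrete time grid, this stopping time is simply the first grid time at which Y touches one of the moving boundaries. A minimal sketch with hypothetical inputs—arrays t, X, Y for a discretised path and callables w_minus, w_plus for w^{-} and w^{+}:

def first_boundary_hit(t, X, Y, w_minus, w_plus, tol=1e-8):
    # Discrete analogue of tau: first time Y_s = w^-(s, X_s) or w^+(s, X_s),
    # capped at the terminal time if the band (w^-, w^+) is never left.
    for i in range(len(t)):
        if (abs(Y[i] - w_minus(t[i], X[i])) <= tol
                or abs(Y[i] - w_plus(t[i], X[i])) <= tol):
            return t[i]
    return t[-1]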

We now consider the process v(t,{\rm X}_{t})\coloneqq v(t,X_{t},Y_{t}) and compute v(t,{\rm X}_{t})-v(\theta,{\rm X}_{\theta})=v(t,{\rm X}_{t})-v(\tau,{\rm X}_{\tau})+v(\tau,{\rm X}_{\tau})-v(\theta,{\rm X}_{\theta})\eqqcolon I_{1}+I_{2}, for \theta\in{\cal T}_{T}, \tau\leq\theta. It follows from Itô's formula that

\displaystyle I_{1}=-\int_{t}^{\tau}\Big{(}\partial_{t}v(s,{\rm X}_{s})\mathrm{d}s+\frac{1}{2}{\rm Tr}[\partial_{\rm xx}^{2}v(s,{\rm X}_{s})\mathrm{d}[{\rm X}]_{s}]\Big{)}-\int_{t}^{\tau}\partial_{\rm x}v(s,{\rm X}_{s})\cdot\mathrm{d}{\rm X}_{s}
=\int_{t}^{\tau}\Big{(}{\rm H}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s})\big{)}-{\rm h}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s}),\upsilon_{s}\big{)}\Big{)}\mathrm{d}s
\quad+\int_{t}^{\tau}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{t}^{\tau}\big{(}\partial_{x}v(s,{\rm X}_{s}),\partial_{y}v(s,{\rm X}_{s})\big{)}^{\top}\cdot\big{(}\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s},Z_{s}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}\big{)}^{\top}
\geq\int_{t}^{\tau}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{t}^{\tau}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})Z_{s}\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}, (5.18)

where we used the fact that, on [t,\tau), v satisfies the first equation in (5.17), computed the dynamics of {\rm X}, and added and subtracted C to complete the term h^{\rm L}. The inequality follows from the definition of H^{\rm L}.

We now consider I_{2}. Without loss of generality, we assume that Y_{\tau}=w^{-}(\tau,X_{\tau}), and thus, by the Markov property, Y_{s}=w^{-}(s,X_{s}) for s\in[\tau,T], \mathbb{P}\text{--a.s.}, that is, the boundary is absorbing. By the uniqueness of the Itô decomposition, we deduce that Z_{t}=\partial_{x}w^{-}(t,X_{t}) and \upsilon_{t}\in N(t,X_{t},\partial_{x}w^{-}(t,X_{t})), \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.} With this, applying Itô's formula to w^{-}(\tau,X_{\tau}), we find that \upsilon attains the infimum in (5.7); in particular, \upsilon_{t}\in U^{-}(t,X_{t}), \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.} Let \bar{v}(t,x)\coloneqq v(t,x,w^{-}(t,x)), so that

\displaystyle I_{2}=-\int_{\tau}^{\theta}\bigg{(}\partial_{t}\bar{v}(s,X_{s})\mathrm{d}s+\frac{1}{2}{\rm Tr}[\partial_{xx}^{2}\bar{v}(s,X_{s})\mathrm{d}[X]_{s}]\bigg{)}-\int_{\tau}^{\theta}\partial_{x}\bar{v}(s,X_{s})\cdot\mathrm{d}X_{s}
=-\int_{\tau}^{\theta}\Big{(}\partial_{t}v(s,X_{s},w^{-}(s,X_{s}))+\sigma\lambda(s,X_{s},\upsilon_{s})\cdot\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))\Big{)}\mathrm{d}s-\int_{\tau}^{\theta}\frac{1}{2}{\rm Tr}[\partial_{xx}^{2}\bar{v}(s,X_{s})\mathrm{d}[X]_{s}]
\quad-\int_{\tau}^{\theta}\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\Big{(}\partial_{t}w^{-}(s,X_{s})+\sigma\lambda(s,X_{s},\upsilon_{s})\cdot\partial_{x}w^{-}(s,X_{s})\Big{)}\mathrm{d}s
\quad-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))+\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}
=-\int_{\tau}^{\theta}\Big{(}\partial_{t}v(s,X_{s},w^{-}(s,X_{s}))+\sigma\lambda(s,X_{s},\upsilon_{s})\cdot\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))-\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))c(s,X_{s},\upsilon_{s})\Big{)}\mathrm{d}s
\quad-\int_{\tau}^{\theta}\bigg{(}\frac{1}{2}{\rm Tr}[\partial_{xx}^{2}\bar{v}(s,X_{s})\mathrm{d}[X]_{s}]-\frac{1}{2}\partial_{y}v(s,X_{s},w^{-}(s,X_{s})){\rm Tr}[\partial^{2}_{xx}w^{-}(s,X_{s})\mathrm{d}[X]_{s}]\bigg{)}
\quad-\int_{\tau}^{\theta}\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\bigg{(}\partial_{t}w^{-}(s,X_{s})+h^{\rm b}(s,X_{s},\partial_{x}w^{-}(s,X_{s}),\partial^{2}_{xx}w^{-}(s,X_{s}),\upsilon_{s})\bigg{)}\mathrm{d}s
\quad-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))+\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s},

where, in the first equality, we computed the time and space derivatives of \bar{v} and the dynamics of X, and, in the second equality, we added and subtracted \partial_{y}v\big{(}c+\frac{1}{2}{\rm Tr}[\sigma\sigma^{\top}\!\partial^{2}_{xx}w^{-}]\big{)} and used the fact that Z_{\cdot}=\partial_{x}w^{-}(\cdot,X_{\cdot}) to complete the term h^{\rm b} in the third line.

Recalling that \upsilon attains the infimum in (5.7), we see that the term \partial_{t}w^{-}+h^{\rm b} vanishes. Moreover, since Z_{\cdot}=\partial_{x}w^{-}(\cdot,X_{\cdot}), we have {\rm Tr}\big{[}\partial^{2}_{\rm xx}v(t,{\rm X}_{t})\mathrm{d}[{\rm X}]_{t}\big{]}={\rm Tr}\big{[}\partial^{2}_{xx}\bar{v}(t,X_{t})\mathrm{d}[X]_{t}\big{]}-\partial_{y}v(t,X_{t},w^{-}(t,X_{t})){\rm Tr}\big{[}\partial^{2}_{xx}w^{-}(t,X_{t})\mathrm{d}[X]_{t}\big{]}, \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.} Consequently,

\displaystyle I_{2}=-\int_{\tau}^{\theta}\Big{(}\partial_{t}v(s,{\rm X}_{s})+{\rm h}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s}),\upsilon_{s}\big{)}\Big{)}\mathrm{d}s
\quad+\int_{\tau}^{\theta}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}
=\int_{\tau}^{\theta}\Big{(}{\rm H}^{-,w^{-}}\!\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s})\big{)}-{\rm h}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s}),\upsilon_{s}\big{)}\Big{)}\mathrm{d}s
\quad+\int_{\tau}^{\theta}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}
\geq\int_{\tau}^{\theta}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}, (5.19)

where in the first equality we added and subtracted C to complete the expression for {\rm h}^{\rm L}, and in the second equality we used the fact that v satisfies the second equation in (5.17) for {\rm X}_{\cdot}=(X_{\cdot},w^{-}(\cdot,X_{\cdot})). The inequality follows from the definition of {\rm H}^{-,w^{-}} and the fact that \upsilon_{\cdot}\in N(\cdot,X_{\cdot},\partial_{x}w^{-}(\cdot,X_{\cdot})), \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.}

Let (\theta_{n})_{n\geq 1}\subseteq{\cal T}_{T}, with t\leq\theta_{n}\leq\theta_{n+1}, n\geq 1, and \theta_{n}\longrightarrow T, \mathbb{P}\text{--a.s.}, be such that (X,Y) is bounded on [t,\theta_{n}]. We now add (5.18) and (5.19), and take \theta=\theta_{n}. (Note that whenever \tau=T, we have I_{2}=0 and \theta_{n}\longrightarrow\tau=T.) By continuity, the terms v, w^{+}, and their derivatives are bounded on [t,\theta_{n}]. Thus, since \sigma is bounded and \|Z\|_{\mathbb{H}^{p}(\mathbb{F},\mathbb{P})}^{p}<\infty, the stochastic integrals in (5.18) and (5.19) are martingales. Consequently,

\displaystyle v(t,x,y)\geq\mathbb{E}\bigg{[}v(\theta_{n},X_{\theta_{n}},Y_{\theta_{n}})+\int_{t}^{\theta_{n}}C(s,X_{s},\upsilon_{s})\mathrm{d}s\bigg{|}{\cal F}_{t}\bigg{]}.

Thus, the uniform integrability of the family \{v(\theta_{n},X_{\theta_{n}},Y_{\theta_{n}})\}_{n\geq 1} and the boundedness of C, together with an application of the dominated convergence theorem, give

\displaystyle v(t,x,y)\geq\mathbb{E}\bigg{[}G(X_{T})+\int_{t}^{T}C(s,X_{s},\upsilon_{s})\mathrm{d}s\bigg{|}{\cal F}_{t}\bigg{]}, (5.20)

where we used the boundary condition in time in (5.17) and the fact that, by the feasibility of \upsilon, if \tau=T we have Y_{T}=g(X_{T}), while if \tau<T we have w^{-}(T^{-},x)=g(x), see (5.10). The arbitrariness of \upsilon gives v\geq V. To conclude, note that for (Z^{\star},\Gamma^{\star},\alpha^{\star}) as in the statement, the inequalities in (5.18) and (5.19) are tight. ∎

Remark 5.10. The analysis above corresponds to the optimistic formulation. In the pessimistic case, the leader's value informally coincides with the lower value of a zero-sum game in which the leader maximises over her controls, while the follower minimises the leader's criterion over his best responses; one would then replace the first equation in (5.17) by the corresponding lower Hamilton–Jacobi–Bellman–Isaacs equation.

References

  • Aïd et al. [2020] R. Aïd, M. Basei, and H. Pham. A McKean–Vlasov approach to distributed electricity generation development. Mathematical Methods of Operations Research, 91:269–310, 2020.
  • Bagchi [1984] A. Bagchi. Stackelberg differential games in economic models, volume 64 of Lecture notes in control and information sciences. Springer Berlin, Heidelberg, 1984.
  • Başar [1979] T. Başar. Stochastic stagewise Stackelberg strategies for linear quadratic systems. In M. Kohlmann and W. Vogel, editors, Stochastic control theory and stochastic differential systems. Proceedings of a workshop of the „Sonderforschungsbereich 72 der Deutschen Forschungsgemeinschaft an der Universität Bonn” which took place in January 1979 at Bad Honnef, volume 16 of Lecture notes in control and information sciences, pages 264–276. Springer Berlin, Heidelberg, 1979.
  • Başar [1981] T. Başar. A new method for the Stackelberg solution of differential games with sampled-data state information. IFAC Proceedings Volumes, 14(2):1365–1370, 1981.
  • Başar and Haurie [1984] T. Başar and A. Haurie. Feedback equilibria in differential games with structural and modal uncertainties. In J.B. Cruz, Jr., editor, Advances in large scale systems, volume 1, pages 163–201. 1984.
  • Başar and Olsder [1980] T. Başar and G.J. Olsder. Team-optimal closed-loop Stackelberg strategies in hierarchical control problems. Automatica, 16(4):409–414, 1980.
  • Başar and Olsder [1999] T. Başar and G.J. Olsder. Dynamic noncooperative game theory. SIAM, 2nd revised edition, 1999.
  • Başar and Selbuz [1978] T. Başar and H. Selbuz. A new approach for derivation of closed-loop Stackelberg strategies. In R.E. Larson and A.S. Willsky, editors, 1978 IEEE conference on decision and control including the 17th symposium on adaptive processes, pages 1113–1118, 1978.
  • Başar and Selbuz [1979] T. Başar and H. Selbuz. Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems. IEEE Transactions on Automatic Control, 24(2):166–179, 1979.
  • Bensoussan et al. [2014] A. Bensoussan, S. Chen, and S.P. Sethi. Feedback Stackelberg solutions of infinite-horizon stochastic differential games. In F. El Ouardighi and K. Kogan, editors, Models and methods in economics and management science: essays in honor of Charles S. Tapiero, volume 198 of International series in operations research & management science, pages 3–15. Springer Cham, 2014.
  • Bensoussan et al. [2015] A. Bensoussan, S. Chen, and S.P. Sethi. The maximum principle for global solutions of stochastic Stackelberg differential games. SIAM Journal on Control and Optimization, 53(4):1956–1981, 2015.
  • Bensoussan et al. [2019] A. Bensoussan, S. Chen, A. Chutani, S.P. Sethi, C.C. Siu, and S.C.P. Yam. Feedback Stackelberg–Nash equilibria in mixed leadership games with an application to cooperative advertising. SIAM Journal on Control and Optimization, 57(5):3413–3444, 2019.
  • Bouchard et al. [2009] B. Bouchard, R. Élie, and N. Touzi. Stochastic target problems with controlled loss. SIAM Journal on Control and Optimization, 48(5):3123–3150, 2009.
  • Bouchard et al. [2010] B. Bouchard, R. Élie, and C. Imbert. Optimal control under stochastic target constraints. SIAM Journal on Control and Optimization, 48(5):3501–3531, 2010.
  • Bressan [2011] A. Bressan. Noncooperative differential games. Milan Journal of Mathematics, 79:357–427, 2011.
  • Carmona [2016] R. Carmona. Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications, volume 1 of Financial mathematics. SIAM, 2016.
  • Castanon [1976] D.A. Castanon. Equilibria in stochastic dynamic games of Stackelberg type. PhD thesis, Massachusetts Institute of Technology, 1976.
  • Chen and Cruz Jr. [1972] C.I. Chen and J.B. Cruz Jr. Stackelberg solution for two-person games with biased information patterns. IEEE Transactions on Automatic Control, 17(6):791–798, 1972.
  • Chutani and Sethi [2014] A. Chutani and S.P. Sethi. A feedback Stackelberg game of cooperative advertising in a durable goods oligopoly. In J. Haunschmied, V.M. Veliov, and S. Wrzaczek, editors, Dynamic games in economics, volume 16 of Dynamic modeling and econometrics in economics and finance, pages 89–114. Springer, 2014.
  • Cong and Shi [2024] W. Cong and J. Shi. Direct approach of linear–quadratic Stackelberg mean field games of backward–forward stochastic systems. ArXiv preprint arXiv:2401.15835, 2024.
  • Crandall et al. [1992] M.G. Crandall, H. Ishii, and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992.
  • Cruz Jr. [1975] J.B. Cruz Jr. Survey of Nash and Stackelberg equilibrium strategies in dynamic games. In Annals of economic and social measurement, volume 4, pages 339–344. National Bureau of Economic Research, 1975.
  • Cruz Jr. [1976] J.B. Cruz Jr. Stackelberg strategies for multilevel systems. In Y.C. Ho and S.K. Mitter, editors, Directions in large-scale systems, pages 139–147. Springer New York, NY, 1976.
  • Cvitanić and Zhang [2012] J. Cvitanić and J. Zhang. Contract theory in continuous-time models. Springer, 2012.
  • Cvitanić et al. [2018] J. Cvitanić, D. Possamaï, and N. Touzi. Dynamic programming approach to principal–agent problems. Finance and Stochastics, 22(1):1–37, 2018.
  • Dayanıklı and Laurière [2023] G. Dayanıklı and M. Laurière. A machine learning method for Stackelberg mean field games. ArXiv preprint arXiv:2302.10440, 2023.
  • Dockner et al. [2000] E.J. Dockner, S. Jorgensen, N. Van Long, and G. Sorger. Differential games in economics and management science. Cambridge University Press, 2000.
  • El Karoui and Tan [2013] N. El Karoui and X. Tan. Capacities, measurable selection and dynamic programming part II: application in stochastic control problems. Technical report, École Polytechnique and université Paris-Dauphine, 2013.
  • Feng et al. [2022] X. Feng, Y. Hu, and J. Huang. Backward Stackelberg differential game with constraints: a mixed terminal-perturbation and linear–quadratic approach. SIAM Journal on Control and Optimization, 60(3):1488–1518, 2022.
  • Fu and Horst [2020] G. Fu and U. Horst. Mean-field leader–follower games with terminal state constraint. SIAM Journal on Control and Optimization, 58(4):2078–2113, 2020.
  • Gardner and Cruz Jr. [1977] B. Gardner and J.B. Cruz Jr. Feedback Stackelberg strategy for a two player game. IEEE Transactions on Automatic Control, 22(2):270–271, 1977.
  • Gou et al. [2023] Z. Gou, N.-J. Huang, and M.-H. Wang. A linear–quadratic mean-field stochastic Stackelberg differential game with random exit time. International Journal of Control, 96(3):731–745, 2023.
  • Guan et al. [2023] G. Guan, Z. Liang, and Y. Song. A Stackelberg reinsurance–investment game under α\alpha-maxmin mean–variance criterion and stochastic volatility. Scandinavian Actuarial Journal, to appear, 2023.
  • Han et al. [2023] X. Han, D. Landriault, and D. Li. Optimal reinsurance contract in a Stackelberg game framework: a view of social planner. Scandinavian Actuarial Journal, to appear, 2023.
  • Havrylenko et al. [2022] Y. Havrylenko, M. Hinken, and R. Zagst. Risk sharing in equity-linked insurance products: Stackelberg equilibrium between an insurer and a reinsurer. ArXiv preprint arXiv:2203.04053, 2022.
  • He et al. [2007] X. He, A. Prasad, S.P. Sethi, and G.J. Gutierrez. A survey of Stackelberg differential game models in supply and marketing channels. Journal of Systems Science and Systems Engineering, 16:385–413, 2007.
  • He et al. [2008] X. He, A. Prasad, and S.P. Sethi. Cooperative advertising and pricing in a dynamic stochastic supply chain: feedback Stackelberg strategies. In D.F. Kocaoglu, T.R. Anderson, and T.U. Daim, editors, PICMET ’08, Portland international conference on management of engineering & technology, pages 1634–1649, 2008.
  • Huang and Shi [2021] Q. Huang and J. Shi. A verification theorem for Stackelberg stochastic differential games in feedback information pattern. ArXiv preprint arXiv:2108.06498, 2021.
  • Kang and Shi [2022] K. Kang and J. Shi. A three-level stochastic linear–quadratic Stackelberg differential game with asymmetric information. ArXiv preprint arXiv:2210.11808, 2022.
  • Karandikar [1995] R.L. Karandikar. On pathwise stochastic integration. Stochastic Processes and their Applications, 57(1):11–18, 1995.
  • Leitmann [1978] G. Leitmann. On generalized Stackelberg strategies. Journal of Optimization Theory and Applications, 26(4):637–643, 1978.
  • Li et al. [2022] H. Li, J. Xu, and H. Zhang. Closed–loop Stackelberg strategy for linear–quadratic leader–follower game. ArXiv preprint arXiv:2212.08977, 2022.
  • Li and Yu [2018] N. Li and Z. Yu. Forward–backward stochastic differential equations and linear–quadratic generalized Stackelberg games. SIAM Journal on Control and Optimization, 56(6):4148–4180, 2018.
  • Li and Sethi [2017] T. Li and S.P. Sethi. A review of dynamic Stackelberg game models. Discrete & Continuous Dynamical Systems–B, 22(1):125–129, 2017.
  • Li and Han [2023] Y. Li and S. Han. Solving strongly convex and smooth Stackelberg games without modeling the follower. ArXiv preprint arXiv:2303.06192, 2023.
  • Li and Shi [2023a] Z. Li and J. Shi. Closed-loop solvability of linear quadratic mean-field type Stackelberg stochastic differential games. ArXiv preprint arXiv:2303.07544, 2023a.
  • Li and Shi [2023b] Z. Li and J. Shi. Linear quadratic leader–follower stochastic differential games: closed-loop solvability. Journal of Systems Science and Complexity, 36(4):1373–1406, 2023b.
  • Liu et al. [2018] J. Liu, Y. Fan, Z. Chen, and Y. Zheng. Pessimistic bilevel optimization: a survey. International Journal of Computational Intelligence Systems, 11(1):725–736, 2018.
  • Lv et al. [2023] S. Lv, J. Xiong, and X. Zhang. Linear quadratic leader–follower stochastic differential games for mean-field switching diffusions. Automatica, 154(111072):1–9, 2023.
  • Mallozzi and Morgan [1995] L. Mallozzi and J. Morgan. Weak Stackelberg problem and mixed solutions under data perturbations. Optimization, 32(3):269–290, 1995.
  • Moon [2021] J. Moon. Linear–quadratic stochastic Stackelberg differential games for jump–diffusion systems. SIAM Journal on Control and Optimization, 59(2):954–976, 2021.
  • Ni et al. [2023] Y.-H. Ni, L. Liu, and X. Zhang. Deterministic dynamic Stackelberg games: time-consistent open-loop solution. Automatica, 148(110757):1–9, 2023.
  • Nutz [2012] M. Nutz. Pathwise construction of stochastic integrals. Electronic Communications in Probability, 17(24):1–7, 2012.
  • Øksendal et al. [2013] B. Øksendal, L. Sandal, and J. Ubøe. Stochastic Stackelberg equilibria with applications to time-dependent newsvendor models. Journal of Economic Dynamics and Control, 37(7):1284–1299, 2013.
  • Papavassilopoulos [1979] G.P. Papavassilopoulos. Leader–follower and Nash strategies with state information. PhD thesis, University of Illinois at Urbana-Champaign, 1979.
  • Papavassilopoulos and Cruz Jr. [1979] G.P. Papavassilopoulos and J.B. Cruz Jr. Nonclassical control problems and Stackelberg games. IEEE Transactions on Automatic Control, 24(2):155–166, 1979.
  • Papavassilopoulos and Cruz Jr. [1980] G.P. Papavassilopoulos and J.B. Cruz Jr. Sufficient conditions for Stackelberg and Nash strategies with memory. Journal of Optimization Theory and Applications, 31(2):233–260, 1980.
  • Possamaï et al. [2018] D. Possamaï, X. Tan, and C. Zhou. Stochastic control for a class of nonlinear kernels and applications. The Annals of Probability, 46(1):551–603, 2018.
  • Possamaï et al. [2020] D. Possamaï, N. Touzi, and J. Zhang. Zero-sum path-dependent stochastic differential games in weak formulation. The Annals of Applied Probability, 30(3):1415–1457, 2020.
  • Ren et al. [2023] Z. Ren, X. Tan, N. Touzi, and J. Yang. Entropic optimal planning for path-dependent mean field games. SIAM Journal on Control and Optimization, 61(3):1415–1437, 2023.
  • Rockafellar [1970] R.T. Rockafellar. Convex analysis. Princeton University Press, 1970.
  • Shi et al. [2016] J. Shi, G. Wang, and J. Xiong. Leader–follower stochastic differential game with asymmetric information and applications. Automatica, 63:60–73, 2016.
  • Si and Wu [2021] K. Si and Z. Wu. Backward–forward linear–quadratic mean-field Stackelberg games. Advances in Difference Equations, 2021(73):1–23, 2021.
  • Simaan and Cruz Jr. [1973a] M. Simaan and J.B. Cruz Jr. Additional aspects of the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11(6):613–626, 1973a.
  • Simaan and Cruz Jr. [1973b] M. Simaan and J.B. Cruz Jr. On the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11(5):533–555, 1973b.
  • Simaan and Cruz Jr. [1976] M. Simaan and J.B. Cruz Jr. On the Stackelberg strategy in nonzero-sum games. In G. Leitmann, editor, Multicriteria decision making and differential games, Mathematical concepts and methods in science and engineering, pages 173–195. Springer New York, NY, 1976.
  • Soner and Touzi [2002a] H.M. Soner and N. Touzi. Dynamic programming for stochastic target problems and geometric flows. Journal of the European Mathematical Society, 4(3):201–236, 2002a.
  • Soner and Touzi [2002b] H.M. Soner and N. Touzi. Stochastic target problems, dynamic programming, and viscosity solutions. SIAM Journal on Control and Optimization, 41(2):404–424, 2002b.
  • Soner and Touzi [2003] H.M. Soner and N. Touzi. A stochastic representation for mean curvature type geometric flows. The Annals of Probability, 31(3):1145–1165, 2003.
  • Soner et al. [2012] H.M. Soner, N. Touzi, and J. Zhang. Wellposedness of second order backward SDEs. Probability Theory and Related Fields, 153(1–2):149–190, 2012.
  • Stroock and Varadhan [1997] D.W. Stroock and S.R.S. Varadhan. Multidimensional diffusion processes, volume 233 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 1997.
  • Sun et al. [2023] J. Sun, H. Wang, and J. Wen. Zero-sum Stackelberg stochastic linear–quadratic differential games. SIAM Journal on Control and Optimization, 61(1):252–284, 2023.
  • Van Long [2010] N. Van Long. A survey of dynamic games in economics, volume 1 of Surveys on theories in economics and business administration. World Scientific, 2010.
  • Vasal [2022a] D. Vasal. Master equation of discrete-time Stackelberg mean field games with multiple leaders. ArXiv preprint arXiv:2209.03186, 2022a.
  • Vasal [2022b] D. Vasal. Sequential decomposition of stochastic Stackelberg games. In B. Ferri and F. Zhang, editors, 2022 American control conference, pages 1266–1271. IEEE, 2022b.
  • von Stackelberg [1934] H. von Stackelberg. Marktform und Gleichgewicht. Springer-Verlag Wien New York, 1934.
  • Wang et al. [2020] G. Wang, Y. Wang, and S. Zhang. An asymmetric information mean-field type linear–quadratic stochastic Stackelberg differential game with one leader and two followers. Optimal Control Applications and Methods, 41(4):1034–1051, 2020.
  • Wiesemann et al. [2013] W. Wiesemann, A. Tsoukalas, P.-M. Kleniati, and B. Rustem. Pessimistic bilevel optimization. SIAM Journal on Optimization, 23(1):353–380, 2013.
  • Wu [2013] Z. Wu. A general maximum principle for optimal control of forward–backward stochastic systems. Automatica, 49(5):1473–1480, 2013.
  • Yong [2002] J. Yong. A leader–follower stochastic linear quadratic differential game. SIAM Journal on Control and Optimization, 41(4):1015–1041, 2002.
  • Yong [2010] J. Yong. Optimality variational principle for controlled forward–backward stochastic differential equations with mixed initial–terminal conditions. SIAM Journal on Control and Optimization, 48(6):3675–4179, 2010.
  • Zemkoho [2016] A.B. Zemkoho. Solving ill-posed bilevel programs. Set-Valued and Variational Analysis, 24(3):423–448, 2016.
  • Zhang [2017] J. Zhang. Backward stochastic differential equations—from linear to fully nonlinear theory, volume 86 of Probability theory and stochastic modelling. Springer-Verlag New York, 2017.
  • Zheng and Shi [2020] Y. Zheng and J. Shi. A Stackelberg game of backward stochastic differential equations with applications. Dynamic Games and Applications, 10(4):968–992, 2020.
  • Zheng and Shi [2021] Y. Zheng and J. Shi. A Stackelberg game of backward stochastic differential equations with partial information. Mathematical Control and Related Fields, 11(4):797–828, 2021.
  • Zheng and Shi [2022a] Y. Zheng and J. Shi. A linear–quadratic partially observed Stackelberg stochastic differential game with application. Applied Mathematics and Computation, 420(126819):1–22, 2022a.
  • Zheng and Shi [2022b] Y. Zheng and J. Shi. Stackelberg stochastic differential game with asymmetric noisy observations. International Journal of Control, 95(9):2510–2530, 2022b.

Appendix A Illustrative example: additional proofs

A.1 The first-best case

The proof of Lemma 2.1 is straightforward using standard HJB techniques, or even by pointwise optimisation, as one can compute

\displaystyle J_{\rm L}(\alpha,\beta)\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]}=\mathbb{E}^{\mathbb{P}}\bigg{[}\int_{0}^{T}\bigg{(}\alpha_{t}+\beta_{t}-\dfrac{c_{\rm L}}{2}\alpha_{t}^{2}\bigg{)}\mathrm{d}t\bigg{]},

and directly verify that the optimal efforts are the ones defined in Lemma 2.1.

A.2 ACLM formulation

Lemma A.1.

For k>0, consider the closed-loop memoryless strategy a_{k}\in{\cal A} defined for all t\in[0,T] by

\displaystyle a_{k}(t,X_{t})\coloneqq\Pi_{[0,a_{\circ}]}\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})\bigg{)},\;\text{where}\;X^{\star}_{t}=x_{0}+\dfrac{t}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}}{kc_{\rm F}}\big{(}1-\mathrm{e}^{-kt}\big{)}+\sigma W_{t},\;t\in[0,T].

Define \bar{K}\coloneqq\frac{1}{T}\log(b_{\circ}c_{\rm F}). Then, for a fixed k\in(0,\bar{K}] and the associated strategy a_{k}, the leader obtains the following reward, which is higher than her value in the AOL information case

f(k)=x_{0}+\dfrac{T}{2c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}.

Moreover, if a_{\circ}>\frac{1}{c_{\rm L}}+b_{\circ}(b_{\circ}c_{\rm F}-1) and a_{\circ}>\frac{1}{2c_{\rm F}}(b_{\circ}^{2}c_{\rm F}^{2}-1)-\frac{1}{c_{\rm L}}, then the solution to the ACLM problem is equal to the solution to the ACLM-\bar{K} problem, with value f(\bar{K}).

Proof of Lemma A.1.

(i) To provide the main intuition, suppose first that the leader's actions are unrestricted, that is, A=\mathbb{R}. This is the usual setting for the ACLM problems that are solved explicitly in the literature. Then, the leader announces her strategy \alpha_{k}\in{\cal A} defined by

a_{k}(t,X_{t})=\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star}),\;t\in[0,T].

Then, the follower’s optimisation problem originally defined in (2.2) is the following

\displaystyle V_{\rm F}(\alpha_{k})\coloneqq\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]},\;\text{subject to}\;\mathrm{d}X_{t}=\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})+\beta_{t}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T]. (A.1)

As described in Section 2.1.3, one can use the stochastic maximum principle to obtain, after solving the appropriate FBSDE system, that the optimal response of the follower is given by

\beta^{\star}_{t}=\Pi_{[0,b_{\circ}]}\bigg{(}\frac{\mathrm{e}^{k(T-t)}}{c_{\rm F}}\bigg{)},\;t\in[0,T]. (A.2)

Alternatively, one can solve this stochastic control problem in a more straightforward way, by noticing that the follower’s problem defined above by (A.1) can be rewritten as

\displaystyle V_{\rm F}(\alpha_{k})=\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X^{\star}_{T}+\widetilde{X}_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}\sigma W_{T}+\widetilde{X}_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]},

where the process \widetilde{X}\coloneqq X-X^{\star}, corresponding to the only state variable of the previous control problem, satisfies the following controlled ODE

\mathrm{d}\widetilde{X}_{t}=\bigg{(}k\widetilde{X}_{t}+\beta_{t}-\dfrac{1}{c_{\rm F}}\mathrm{e}^{k(T-t)}\bigg{)}\mathrm{d}t,\;t\in[0,T],\;\widetilde{X}_{0}=0, (A.3)

whose solution is given by

\widetilde{X}_{t}\coloneqq\mathrm{e}^{kt}\int_{0}^{t}\mathrm{e}^{-ks}\bigg{(}\beta_{s}-\frac{\mathrm{e}^{k(T-s)}}{c_{\rm F}}\bigg{)}\mathrm{d}s=\mathrm{e}^{kt}\int_{0}^{t}\mathrm{e}^{-ks}\beta_{s}\mathrm{d}s-\frac{1}{2kc_{\rm F}}\mathrm{e}^{kT}\big{(}\mathrm{e}^{kt}-\mathrm{e}^{-kt}\big{)},\;\forall t\in[0,T].
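One can verify this directly: differentiating the candidate solution and using the Leibniz rule gives

\frac{\mathrm{d}}{\mathrm{d}t}\bigg{(}\mathrm{e}^{kt}\int_{0}^{t}\mathrm{e}^{-ks}\bigg{(}\beta_{s}-\frac{\mathrm{e}^{k(T-s)}}{c_{\rm F}}\bigg{)}\mathrm{d}s\bigg{)}=k\widetilde{X}_{t}+\beta_{t}-\frac{1}{c_{\rm F}}\mathrm{e}^{k(T-t)},

which is precisely the dynamics (A.3), together with the initial condition \widetilde{X}_{0}=0.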

The follower’s optimisation problem thus becomes

\displaystyle V_{\rm F}(\alpha_{k})=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}+\sup_{\beta\in{\cal B}}\bigg{\{}\mathrm{e}^{kT}\int_{0}^{T}\mathrm{e}^{-kt}\beta_{t}\mathrm{d}t-\frac{1}{2kc_{\rm F}}\mathrm{e}^{kT}\big{(}\mathrm{e}^{kT}-\mathrm{e}^{-kT}\big{)}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{\}}
=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}-\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{2kT}-1\big{)}+\sup_{\beta\in{\cal B}}\int_{0}^{T}\bigg{(}\mathrm{e}^{k(T-t)}\beta_{t}-\dfrac{c_{\rm F}}{2}\beta_{t}^{2}\bigg{)}\mathrm{d}t.

The optimal effort \beta^{\star} introduced above in (A.2) is deduced by pointwise optimisation: for each fixed t\in[0,T], the map \beta\longmapsto\mathrm{e}^{k(T-t)}\beta-\frac{c_{\rm F}}{2}\beta^{2} is strictly concave with unconstrained maximiser \mathrm{e}^{k(T-t)}/c_{\rm F}, so that its maximum over [0,b_{\circ}] is attained at the projection appearing in (A.2).

(i.1) Suppose first that k<\bar{K}, so that \beta^{\star}_{t}=\frac{1}{c_{\rm F}}\mathrm{e}^{k(T-t)}. The associated value for the follower is given by

\displaystyle V_{\rm F}(\alpha_{k})=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}-\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{2kT}-1\big{)}+\frac{1}{2c_{\rm F}}\int_{0}^{T}\mathrm{e}^{2k(T-t)}\mathrm{d}t=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}-\frac{1}{4kc_{\rm F}}\big{(}\mathrm{e}^{2kT}-1\big{)}.

Remark that, for the optimal control \beta^{\star}, the controlled ODE (A.3) simplifies and gives the trivial solution \widetilde{X}_{t}=0, i.e. X_{t}=X_{t}^{\star}, for all t\in[0,T]. In other words, the best choice for the follower is to choose \beta so that the process X coincides with the process X^{\star}. Given the follower's optimal response, the objective value of the leader for the strategy \alpha_{k} simplifies to

\displaystyle\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\big{(}a_{k}(t,X_{t})\big{)}^{2}\mathrm{d}t\bigg{]}=\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}^{\star}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})\bigg{)}^{2}\mathrm{d}t\bigg{]}=x_{0}+\dfrac{T}{2c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}\eqqcolon f(k).

Notice that f is an increasing function of k, and that its limit as k goes to 0 is given by

\displaystyle\lim_{k\rightarrow 0}f(k)=x_{0}+\dfrac{T}{2c_{\rm L}}+\lim_{k\rightarrow 0}\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}=x_{0}+\bigg{(}\dfrac{1}{2c_{\rm L}}+\dfrac{1}{c_{\rm F}}\bigg{)}T.

As this value corresponds to the leader's value function in the AOL case, we immediately conclude that the closed-loop memoryless strategy \alpha_{k} provides her with a strictly better value as soon as k\in(0,\frac{1}{T}\log(b_{\circ}c_{\rm F})). Similarly, we have

\displaystyle\lim_{k\rightarrow 0}V_{\rm F}(\alpha_{k})=x_{0}+\bigg{(}\dfrac{1}{c_{\rm L}}+\dfrac{1}{2c_{\rm F}}\bigg{)}T.
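These comparisons are easy to reproduce numerically. The following Python sketch, with purely illustrative parameter values not taken from the paper, checks that f is increasing on (0,\bar{K}] and exceeds the AOL benchmark computed above.

import numpy as np

# Illustrative parameters
x0, T, cL, cF, b_circ = 0.0, 1.0, 2.0, 2.0, 3.0
K_bar = np.log(b_circ * cF) / T                 # K_bar = log(b cF) / T

def f(k):
    # Leader's reward f(k) = x0 + T/(2 cL) + (e^{kT} - 1)/(k cF)
    return x0 + T / (2 * cL) + np.expm1(k * T) / (k * cF)

aol_value = x0 + (1 / (2 * cL) + 1 / cF) * T    # limit of f(k) as k -> 0

ks = np.linspace(1e-6, K_bar, 1000)
vals = f(ks)
assert np.all(np.diff(vals) > 0)                # f is increasing on (0, K_bar]
assert vals[-1] > aol_value                     # strict improvement over AOL
print(f"f(K_bar) = {vals[-1]:.4f} > AOL value = {aol_value:.4f}")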

(i.2) Take now k>\bar{K}, so that

\beta^{\star}_{t}=b_{\circ}{\bf 1}_{\{t<t_{0}\}}+\frac{\mathrm{e}^{k(T-t)}}{c_{\rm F}}{\bf 1}_{\{t\in[t_{0},T]\}},\;\text{where}\;t_{0}\coloneqq T-\frac{\log(b_{\circ}c_{\rm F})}{k}.

In this case, the follower is not able to keep the process X equal to X^{\star}. Notice that \beta^{\star}_{t}c_{\rm F}<\mathrm{e}^{k(T-t)} for every t\in[0,t_{0}), so that \widetilde{X}_{t} is negative over (0,t_{0}] and remains so over [t_{0},T]. Then, we have that a_{k}(t,X_{t})<\frac{1}{c_{\rm L}} for every t>0. In fact, we can compute explicitly

\widetilde{X}_{t}=\begin{cases}\displaystyle\frac{b_{\circ}}{k}(\mathrm{e}^{kt}-1)-\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{k(t+t_{0})}-\mathrm{e}^{k(t-t_{0})}\big{)},\;t\in[0,t_{0}],\\[5.0pt] \displaystyle\mathrm{e}^{k(t-t_{0})}\widetilde{X}_{t_{0}},\;t\in[t_{0},T].\end{cases}

The utility for the leader from this strategy is given by

\displaystyle J_{\rm L}(a_{k},\beta^{\star}(a_{k}))=\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]}
=x_{0}+\frac{T}{c_{\rm L}}+\frac{\mathrm{e}^{kT}-1}{kc_{\rm F}}+\frac{b_{\circ}\mathrm{e}^{kT}}{k}-\frac{b_{\circ}^{2}c_{\rm F}}{2k}-\frac{\mathrm{e}^{2kT}}{2kc_{\rm F}}-\dfrac{c_{\rm L}}{2}\mathbb{E}^{\mathbb{P}}\bigg{[}\int_{0}^{T}\bigg{(}\frac{1}{c_{\rm L}}+k\widetilde{X}_{t}\bigg{)}^{2}\mathrm{d}t\bigg{]}.

Note that both \widetilde{X} and t_{0} above depend on k, and that the expression is purely deterministic. Numerical evaluation indicates that it is decreasing in k (see the sketch below). Combining this with the deductions from part (i.1), the (unrestricted) leader should choose the value k=\bar{K}, which provides her with the highest utility.
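The following Python sketch makes this numerical observation reproducible; parameter values are purely illustrative, the deterministic time integral is evaluated with a trapezoidal rule, and the explicit piecewise expression for \widetilde{X} above is used.

import numpy as np

x0, T, cL, cF, b_circ = 0.0, 1.0, 2.0, 2.0, 3.0
K_bar = np.log(b_circ * cF) / T

def J_leader(k, n=20000):
    # Leader's utility J_L(a_k, beta*(a_k)) for k > K_bar (deterministic)
    t0 = T - np.log(b_circ * cF) / k
    t = np.linspace(0.0, T, n)
    X1 = (b_circ / k) * np.expm1(k * t) \
         - (np.exp(k * (t + t0)) - np.exp(k * (t - t0))) / (2 * k * cF)
    Xt0 = (b_circ / k) * np.expm1(k * t0) \
          - (np.exp(2 * k * t0) - 1.0) / (2 * k * cF)
    Xtil = np.where(t <= t0, X1, np.exp(k * (t - t0)) * Xt0)
    integrand = (1.0 / cL + k * Xtil) ** 2
    cost = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return (x0 + T / cL + np.expm1(k * T) / (k * cF)
            + b_circ * np.exp(k * T) / k - b_circ ** 2 * cF / (2 * k)
            - np.exp(2 * k * T) / (2 * k * cF) - cL * cost / 2)

ks = np.linspace(1.01 * K_bar, 3 * K_bar, 50)
vals = np.array([J_leader(k) for k in ks])
print("decreasing on the grid:", bool(np.all(np.diff(vals) < 0)))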


(ii) We now discuss the problem in our setting, when A=[0,a_{\circ}]. To simplify the solution of this example, we can choose the parameters a_{\circ} and b_{\circ} appropriately, so that the solution of case (i) is admissible in the restricted problem. This implies that the solution found in case (i) is also the solution to the ACLM problem with bounded effort for the leader. Indeed, note that the process \widetilde{X}_{t} defined in (A.3) is bounded since B=[0,b_{\circ}]. Namely, we have

\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{k(T-t)}-\mathrm{e}^{k(T+t)}\big{)}\leq\widetilde{X}_{t}\leq\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{k(T-t)}-\mathrm{e}^{k(T+t)}\big{)}+\frac{b_{\circ}}{k}\big{(}\mathrm{e}^{kt}-1\big{)},\;t\in[0,T].

Therefore, the strategy a_{k} is guaranteed to take values in [-a_{\circ},a_{\circ}] for every response \beta\in{\cal B} as soon as, for instance, a_{\circ}>\frac{1}{c_{\rm L}}+b_{\circ}(b_{\circ}c_{\rm F}-1) and a_{\circ}>\frac{1}{2c_{\rm F}}(b_{\circ}^{2}c_{\rm F}^{2}-1)-\frac{1}{c_{\rm L}}.

Appendix B Functional spaces

We introduce the spaces used in this paper, following [58]. Let (t,x)\in[0,T]\times\Omega, and let (\mathcal{P}(t,x))_{(t,x)\in[0,T]\times\Omega} be a family of sets of probability measures on (\Omega,\mathcal{F}_{T}). In this section, we denote by \mathbb{X}\coloneqq(\mathcal{X}_{s})_{s\in[0,T]} a general filtration on (\Omega,\mathcal{F}_{T}). Let p\geq 1, \mathbb{P}\in\mathcal{P}(t,x), and let \mathbb{X}_{\mathbb{P}} be the usual \mathbb{P}-augmented filtration associated with \mathbb{X}.

  • \mathbb{H}^{p}_{t,x}(\mathbb{X},\mathbb{P}) (resp. \mathbb{H}^{p}_{t,x}(\mathbb{X},\mathcal{P})) denotes the space of \mathbb{X}-predictable \mathbb{R}^{d}-valued processes Z such that

    \|Z\|_{\mathbb{H}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}\bigg{(}\int_{t}^{T}\|\widehat{\sigma}_{s}^{\top}\!Z_{s}\|^{2}\mathrm{d}s\bigg{)}^{\frac{p}{2}}\bigg{]}<+\infty,\;\bigg{(}\text{resp. }\|Z\|_{\mathbb{H}^{p}_{t,x}(\mathbb{X},\mathcal{P})}^{p}\coloneqq\sup_{\mathbb{P}\in\mathcal{P}(t,x)}\|Z\|_{\mathbb{H}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}<+\infty\bigg{)}.
  • \mathbb{S}^{p}_{t,x}(\mathbb{X},\mathbb{P}) (resp. \mathbb{S}^{p}_{t,x}(\mathbb{X},\mathcal{P})) denotes the space of \mathbb{X}-progressively measurable \mathbb{R}-valued processes Y such that

    \|Y\|_{\mathbb{S}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}\sup_{s\in[t,T]}|Y_{s}|^{p}\bigg{]}<+\infty,\;\bigg{(}\text{resp. }\|Y\|_{\mathbb{S}^{p}_{t,x}(\mathbb{X},\mathcal{P})}^{p}\coloneqq\sup_{\mathbb{P}\in\mathcal{P}(t,x)}\|Y\|_{\mathbb{S}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}<+\infty\bigg{)}.
  • \mathbb{I}^{p}_{t,x}(\mathbb{X},\mathbb{P}) (resp. \mathbb{I}^{p}_{t,x}(\mathbb{X},\mathcal{P})) denotes the space of \mathbb{X}-optional \mathbb{R}-valued processes K with \mathbb{P}\text{--a.s.} càdlàg and non-decreasing paths on [t,T], with K_{t}=0, \mathbb{P}\text{--a.s.}, and

    \|K\|_{\mathbb{I}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}[K_{T}^{p}]<+\infty,\;\bigg{(}\text{resp. }\|K\|_{\mathbb{I}^{p}_{t,x}(\mathbb{X},\mathcal{P})}^{p}\coloneqq\sup_{\mathbb{P}\in\mathcal{P}(t,x)}\|K\|_{\mathbb{I}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}<+\infty\bigg{)}.
  • \mathbb{G}^{p}_{t,x}(\mathbb{X},\mathbb{P}) denotes the space of \mathbb{X}-predictable \mathbb{S}^{d}-valued processes \Gamma such that

    \|\Gamma\|_{\mathbb{G}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}\bigg{(}\int_{t}^{T}\big{\|}\widehat{\sigma}^{2}_{s}\Gamma_{s}\big{\|}^{2}\mathrm{d}s\bigg{)}^{\frac{p}{2}}\bigg{]}<+\infty.

When t=0, we simplify the previous notations by omitting the dependence on both t and x.
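For numerical experiments, such norms can be estimated by Monte Carlo simulation. A minimal sketch under hypothetical inputs: sigZ is an array of simulated values of \widehat{\sigma}^{\top}\!Z on a uniform time grid, and the time integral is discretised with left endpoints.

import numpy as np

def estimate_H_p_norm(sigZ, dt, p=2):
    # Monte Carlo estimate of ||Z||^p = E[ ( int_t^T ||sighat^T Z_s||^2 ds )^{p/2} ]
    # sigZ has shape (n_paths, n_steps, d); dt is the grid step.
    sq = np.sum(sigZ ** 2, axis=2)        # ||sighat^T Z_s||^2, path by path
    integral = np.sum(sq, axis=1) * dt    # discretised time integral
    return np.mean(integral ** (p / 2))   # empirical expectation

# Toy usage with Brownian placeholders for sighat^T Z
rng = np.random.default_rng(0)
n_paths, n_steps, d = 1000, 200, 2
dt = 1.0 / n_steps
sigZ = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps, d)), axis=1)
print(estimate_H_p_norm(sigZ, dt))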