
Dynamic programming approach for continuous-time Stackelberg games

Camilo Hernández (Princeton University, ORFE department, USA; [email protected]), Nicolás Hernández Santibáñez (Departamento de Matemática, Universidad Técnica Federico Santa María, Chile; [email protected]), Emma Hubert (Princeton University, ORFE department, USA; [email protected]; research partially supported by the NSF grant DMS-2307736), and Dylan Possamaï (ETH Zürich, Department of Mathematics, Rämistrasse 101, 8092 Zürich, Switzerland; [email protected]).
Abstract

In this paper, we provide a general approach to reformulating any continuous-time stochastic Stackelberg differential game under closed-loop strategies as a single-level optimisation problem with target constraints. More precisely, we consider a Stackelberg game in which the leader and the follower can both control the drift and the volatility of a stochastic output process, in order to maximise their respective expected utility. The aim is to characterise the Stackelberg equilibrium when the players adopt ‘closed-loop strategies’, i.e. their decisions are based solely on the historical information of the output process, in particular excluding any direct dependence on the underlying driving noise, which is often unobservable in real-world applications. We first show that, by considering the—second-order—backward stochastic differential equation associated with the continuation utility of the follower as a controlled state variable for the leader, the latter’s unconventional optimisation problem can be reformulated as a more standard stochastic control problem with stochastic target constraints. Thereafter, by adapting the methodology developed by Soner and Touzi [67] and Bouchard, Élie, and Imbert [14], the optimal strategies, as well as the corresponding value of the Stackelberg equilibrium, can be characterised through the solution of a well-specified system of Hamilton–Jacobi–Bellman equations. For a more comprehensive insight, we illustrate our approach through a simple example, facilitating detailed theoretical and numerical comparisons with the solutions under the different information structures studied in the literature.

Key words: Stackelberg games, dynamic programming, second-order backward SDE, stochastic target constraint.

AMS 2000 subject classifications: Primary: 91A65; secondary: 60H30, 93E20, 91A15.

1 Introduction

The concept of a hierarchical, or bi-level, solution for games was originally introduced by von Stackelberg in 1934, to describe market situations in which some firms have power of domination over others, see [76]. In the simple context of a two-player non–zero-sum game, this solution concept, now commonly known as Stackelberg equilibrium, is used to describe a situation where one of the two players—called the leader (she)—announces her strategy first, after which the second player—called the follower (he)—optimally reacts to the leader’s strategy. Therefore, in order to determine her optimal strategy, the leader should naturally anticipate the follower’s reaction to any given strategy and then choose the one that will optimise her reward function, given the follower’s best response. As such, a Stackelberg equilibrium is characterised by the pair of the leader’s optimal action and the follower’s rational response to that action. This type of solution concept is particularly relevant in situations where the players have asymmetrical power, as in the original market situation described by von Stackelberg, or when one player has more information than the other. For example, Stackelberg equilibria naturally arise in games where only one of the two players knows both players’ cost or reward functions, or when one player is more time-efficient than the other at determining her optimal strategy.

Dynamic Stackelberg games.

After its introduction, this equilibrium concept was thoroughly studied in static competitive economics, but the mathematical treatment of its dynamic version was not developed until the 70s, first in discrete-time models by Cruz Jr. [22, 23], Gardner and Cruz Jr. [31], Başar and Selbuz [8, 9], and then, more interestingly for us, in continuous-time ones by Chen and Cruz Jr. [18], Simaan and Cruz Jr. [64, 65, 66], Papavassilopoulos and Cruz Jr. [56, 57], Papavassilopoulos [55], Başar and Olsder [6], Başar [4], and Bagchi [2]. For instance, Chen and Cruz Jr. [18] investigate Stackelberg solutions for a two-player non–zero-sum dynamic game with finite horizon $T>0$, in which both players can observe the state $X$ and its dynamics, but only the leader knows both reward functions.

In this two-player game, the leader first chooses her control $\alpha\in{\cal A}$ to minimise her cost function $J_{\rm L}$, and then the follower wishes to minimise his cost function $J_{\rm F}$ by choosing his own control $\beta\in{\cal B}$, given admissibility sets ${\cal A}$ and ${\cal B}$. In this dynamic setting, the cost functions take the form

\displaystyle J_{\rm L}(\alpha,\beta)\coloneqq g_{\rm L}(X_{T})+\int_{0}^{T}f_{\rm L}(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t,\;\text{and}\;J_{\rm F}(\alpha,\beta)\coloneqq g_{\rm F}(X_{T})+\int_{0}^{T}f_{\rm F}(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t,

and both optimisation problems are subject to the following dynamics for the state process

\displaystyle\mathrm{d}X_{t}=\lambda(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t,\;t\in[0,T],\;X_{0}=x_{0}.

A strategy $(\alpha^{\star},\beta^{\star})$ is called a Stackelberg equilibrium if, for any $\alpha\in{\cal A}$

\displaystyle J_{\rm L}(\alpha^{\star},\beta^{\star})\leq J_{\rm L}(\alpha,b^{\star}(\alpha)),\;\text{where}\;b^{\star}(\alpha)\coloneqq\operatorname*{argmin}_{\beta\in{\cal B}}J_{\rm F}(\alpha,\beta),\;\text{and}\;\beta^{\star}\coloneqq b^{\star}(\alpha^{\star}).

More importantly, they also introduce two different refinements of the notion of Stackelberg solutions, depending on the information available to the players: open-loop, in which the players’ strategies are decided at time 0 as a function of the initial state, and feedback, in which the value of their strategies at time $t$ can only depend on the current state. In particular, they show that these different strategies lead in general to different solutions. This classification of Stackelberg equilibria proved to be crucial in the subsequent literature, especially when studying stochastic dynamic Stackelberg games. Unsurprisingly, it will also be at the crux of our analysis in this paper.

Stochastic Stackelberg games.

The pioneering works dealing with stochastic versions of Stackelberg games also date back to the late 70s, with the discrete-time models of Castanon [17], Başar [3], and Başar and Haurie [5]. Başar and Olsder [7, Chapter 7] give an overview of the theory of Stackelberg games at the time, i.e., static, deterministic discrete- and continuous-time, and stochastic discrete-time. One had to wait for the influential work by Yong [80] to see the literature on Stackelberg equilibria start incorporating continuous-time stochastic models. In this framework, the output process can be defined as the solution to a stochastic differential equation of the following form

\displaystyle\mathrm{d}X_{t}=\sigma(t,X_{t},\alpha_{t},\beta_{t})\big{(}\lambda(t,X_{t},\alpha_{t},\beta_{t})\mathrm{d}t+\mathrm{d}W_{t}\big{)},\;t\in[0,T],\;X_{0}=x_{0}, (1.1)

where $W$ is a Brownian motion, and the controls $\alpha$ and $\beta$ are chosen by the leader and the follower, respectively. As already mentioned, the information available to the players plays a crucial role when determining the solution concept. In [80], the author relies on the stochastic maximum principle to provide the open-loop solution to a linear–quadratic Stackelberg game, where both players can control the drift and the volatility of the state variable. Open-loop solutions are also studied, for example, by Øksendal, Sandal, and Ubøe [54] and Moon [51] in jump–diffusion models, and by Shi, Wang, and Xiong [62] in a linear–quadratic framework with asymmetric information. In parallel, feedback solutions are investigated using the dynamic programming approach, for instance by He, Prasad, and Sethi [37] in a specific model for cooperative advertising and pricing, or by Bensoussan, Chen, and Sethi [10] in an infinite-horizon model. This approach was further extended by Huang and Shi [38] to a finite-horizon problem with volatility control.

Similar to Nash equilibrium concepts, one can also consider so-called closed-loop Stackelberg solutions, where the strategies of both players can depend in particular on the trajectory of the state variable. However, as mentioned for example in Başar and Olsder [7] and Simaan and Cruz Jr. [64], closed-loop equilibria are notoriously hard to study, even in simple dynamic games. One work in this direction is Bensoussan, Chen, and Sethi [11], which extends the stochastic maximum principle approach to characterise adapted closed-loop memoryless Stackelberg solutions and, in a linear–quadratic framework, provides a comparison with the open-loop equilibrium. Li and Shi [46, 47] also discuss, within a linear–quadratic framework, what they call ‘closed-loop solvability’, but they too restrict to memoryless strategies, and their approach is thus similar to the one developed previously in [11]. Finally, one should also mention the paper by Li, Xu, and Zhang [42], which studies closed-loop strategies, but with one-step memory, in a deterministic and discrete-time setting.

While we defer to Section 2.1 the precise definitions of open-loop, feedback, and closed-loop Stackelberg solutions in a stochastic continuous-time framework, as well as a comparison of these concepts through a simple example, we emphasise that, to the best of our knowledge, there is no literature on stochastic Stackelberg games in which the players’ strategies are allowed to depend on the whole trajectory of the output process. One goal of this paper is precisely to fill the gap in the literature: we develop an approach that allows us to characterise Stackelberg equilibria with general (path-dependent) closed-loop strategies, in the sense that both the leader’s and follower’s strategies can depend on the trajectory of the state variable up to the current time, as opposed to the memoryless strategies considered in [11, 46, 47].

Extensions and applications.

Before describing our approach and results in more detail, one should mention that there are now many extensions and generalisations of the traditional leader–follower game, such as zero-sum solutions, mixed leadership, control of backward SDEs, learning problems, large-scale games, and the mean-field setting, among others. See Sun, Wang, and Wen [72] for zero-sum games; Bensoussan, Chen, Chutani, Sethi, Siu, and Yam [12] for mixed leadership; Zheng and Shi [84, 85] and Feng, Hu, and Huang [29] for the case where the controlled state dynamics is given by a backward SDE; Li and Han [45] and Zheng and Shi [86, 87] for learning games; and Ni, Liu, and Zhang [52] for the study of the time-inconsistency of open-loop solutions. As for larger-scale games, we mention Li and Yu [43] for the study of repeated Stackelberg games, in which a follower is also the leader of another game, and Kang and Shi [39] for a three-level game. The case of one leader and many followers, originally introduced in a static game by Leitmann [41] and in a stochastic framework by Wang, Wang, and Zhang [77] and Vasal [75], has been extended to the mean-field setting in Fu and Horst [30], Aïd, Basei, and Pham [1], Si and Wu [63], Vasal [74], Lv, Xiong, and Zhang [49], Li and Shi [46], Gou, Huang, and Wang [32], Dayanıklı and Laurière [26], and Cong and Shi [20]. Lastly, we remark that Stackelberg games cover a wide range of applications, from the original economic models, as highlighted by Bagchi [2] and Van Long [73], to operations research and management science, as reviewed by Li and Sethi [44] and Dockner, Jorgensen, Van Long, and Sorger [27]. Specific applications in these areas include, but are not limited to, marketing channels as in He, Prasad, Sethi, and Gutierrez [36], cooperative advertising as in Chutani and Sethi [19] and He, Prasad, and Sethi [37], insurance as in Havrylenko, Hinken, and Zagst [35], Han, Landriault, and Li [34], and Guan, Liang, and Song [33], and energy generation as in Aïd, Basei, and Pham [1].

A ‘new’ Stackelberg solution concept.

In this paper, we consider a stochastic continuous-time Stackelberg game with two players, a leader and a follower, both of whom can control the drift and volatility of the output process $X$, whose dynamics take the general form (1.1). Our main theoretical result characterises the Stackelberg equilibrium when the strategies of both players are closed-loop, in the sense that their strategies can only depend on time and on the path of the output process $X$. More precisely, we allow both players to build strategies whose value at time $t\in[0,T]$ can be a function of time $t$ but, more importantly, of the trajectory of the process $X$ up to time $t$, denoted $X_{\cdot\wedge t}$. In particular, under this information concept, the players’ decisions cannot directly depend on the underlying driving noise. As already emphasised, to our knowledge only the four aforementioned papers [11, 46, 47, 42] study Stackelberg equilibria for strategies falling into the ‘closed-loop’ class. However, the first three papers focus on the memoryless case, in the sense that the admissible strategies at time $t$ do not actually depend on the trajectory of the process up to time $t$, but only on the value of the process at that time, namely $X_{t}$. The last paper [42] introduces a notion of memory, but only ‘one-step’, by allowing the strategy at time $t$ to depend on $X_{t}$ and $X_{t-1}$, albeit in a deterministic and discrete-time framework. The authors nevertheless show that strategies with one-step memory may lead, even in simple frameworks, to different equilibria compared to their memoryless counterparts, which thus provides a first motivation to study a form of ‘pathwise’ (as opposed to memoryless) closed-loop strategies.

Beyond the distinction between ‘memoryless’ and ‘pathwise’ closed-loop strategies, another significant difference of our solution concept compared to [11, 46, 47] is the adaptedness of the admissible strategies. In these three papers, the strategies are assumed to be adapted to the filtration generated by the underlying noise. Informally, this implies that the strategies may also depend on the paths of the Brownian motion driving the output process $X$. While this assumption is necessary to develop a resolution approach based on the stochastic maximum principle, one may question its feasibility in practice. Indeed, in real-world applications, it is debatable whether one actually observes the paths of the underlying noise, which is usually a modelling artefact without any physical reality (for a more thorough discussion of this point, which is intimately linked to the question of whether one should adopt the ‘weak’ or ‘strong’ point of view in stochastic optimal control problems, we refer to the illuminating discussion in Zhang [83, Section 9.1.1]). We thus consider in our framework that admissible closed-loop strategies should instead be adapted with respect to the filtration generated by the output process $X$. This different, albeit natural, concept of information for continuous-time stochastic Stackelberg games actually echoes the definition of closed-loop equilibria in the literature on ‘classical’ stochastic differential games (see, for example, Carmona [16, Definition 5.5] for the case of closed-loop Nash equilibria, or Possamaï, Touzi, and Zhang [59] for zero-sum games).

It should also be emphasised that the concept of information studied here, simply labelled closed-loop for convenience, is therefore different from the so-called ‘adapted closed-loop’ concept introduced (but not studied) by Bensoussan, Chen, and Sethi [11], in which the players’ strategies may depend on the whole trajectory of the output process $X$, but are nevertheless adapted with respect to the filtration generated by the underlying Brownian motion. Although it is outside the scope of this paper to study the characterisation of adapted closed-loop solutions for Stackelberg games, our illustrative example suggests that this concept of information may be ‘too broad’. More precisely, we will see in this simple example that if the leader can design a strategy depending on the trajectories of both the output and the underlying driving noise, then she can actually impose the maximum effort on the follower. This observation suggests that the difference between ‘adapted closed-loop’ (in the sense of [11]) and what we coined ‘closed-loop’ is akin to the difference between first-best and second-best equilibria defined in the literature on principal–agent problems, which are themselves specific Stackelberg games. This parallel is further reinforced by the fact that both our solution concept, although surprisingly new in the literature on stochastic Stackelberg games, and the solution approach we propose are in fact strongly inspired by the theory of continuous-time principal–agent problems.

Solution approach via stochastic target.

The main contribution of our paper is therefore to provide a characterisation of the closed-loop equilibrium (in the sense previously discussed) of a general continuous-time stochastic Stackelberg game, in which both players can control the drift and volatility of the output process. Allowing for path-dependent strategies leads to a more sophisticated form of equilibrium which, consequently, is more challenging to solve. Indeed, in this case, the classical approaches used in the literature to characterise open-loop or closed-loop memoryless equilibria, such as the maximum principle, can no longer be used. The approach we develop in this paper is based on the dynamic programming principle and stochastic target problems: the main idea is to use the follower’s value function as a state variable for the leader’s problem. More precisely, by writing forward the dynamics of the value function of the follower, which by the dynamic programming principle solves a backward SDE, we are able to reformulate the leader’s problem as a stochastic control problem of a (forward) SDE system with a stochastic target constraint. We also remark that the idea of considering the forward dynamics of the value function of the follower in a Stackelberg game, but with a continuum of followers, was used independently in Dayanıklı and Laurière [26] to develop a numerical algorithm by means of Lagrange multipliers, i.e. when the target constraint is added to the leader’s objective function as a penalisation term. Our approach is different in that we employ the methodology developed in Bouchard, Élie, and Imbert [14] and Bouchard, Élie, and Touzi [13], which leverages the dynamic programming principle for problems with stochastic target constraints established in Soner and Touzi [67, 68], to provide a theoretical characterisation of the closed-loop solution of a Stackelberg game through a system of Hamilton–Jacobi–Bellman equations.

Overview of the paper.

We first introduce in Section 2 a simple illustrative example, in order to highlight the various concepts of Stackelberg equilibrium and the different approaches available to solve them. More importantly, we informally explain our approach in Section 2.2 through its application to the example under consideration. The rigorous formulation of the general problem is introduced in Section 3. In Section 4, we reformulate the leader’s problem in this general Stackelberg equilibrium as a stochastic control problem with stochastic target constraint, which is then solved in Section 5.

Notations.

We let $\mathbb{N}^{\star}$ be the set of positive integers, $\mathbb{R}_{+}\coloneqq[0,\infty)$ and $\mathbb{R}_{+}^{\star}\coloneqq(0,\infty)$. For $(d,n)\in\mathbb{N}^{\star}\times\mathbb{N}^{\star}$, $\mathbb{R}^{d\times n}$, $\mathbb{S}^{d}$, and $\mathbb{S}^{d}_{+}$ denote the set of $d\times n$ matrices with real entries, $d\times d$ symmetric matrices with real entries, and $d\times d$ positive semi-definite symmetric matrices with real entries, respectively. For any closed convex subset $S\subseteq\mathbb{R}$, we will denote by $\Pi_{S}(x)$ the Euclidean projection of $x\in\mathbb{R}$ on $S$. For $T>0$ and a finite-dimensional Euclidean space $E$, we let ${\cal C}([0,T]\times E,\mathbb{R})$ be the space of continuous functions from $[0,T]\times E$ to $\mathbb{R}$, as well as ${\cal C}^{1,2}([0,T]\times E,\mathbb{R})$ the subset of ${\cal C}([0,T]\times E,\mathbb{R})$ of all continuous functions which are continuously differentiable in time and twice continuously differentiable in space. For every $\varphi\in{\cal C}^{1,2}([0,T]\times E,\mathbb{R})$, we denote by $\partial_{t}\varphi$ its partial derivative with respect to time, and by $\partial_{x}\varphi$ and $\partial_{xx}^{2}\varphi$ its gradient and Hessian with respect to the space variable, respectively. We agree that the supremum over an empty set is $-\infty$. For a stochastic process $X$, we denote by $\mathbb{F}^{X}\coloneqq({\cal F}^{X}_{t})_{t\geq 0}$ the filtration generated by $X$.

2 Illustrative example

As already outlined in the introduction, there exist various concepts of Stackelberg equilibrium. In order to highlight their differences and describe the appropriate methods to compute each of them, we choose to develop in this section a simple illustrative example.

Let $T>0$ be a finite time horizon. For the sake of simplicity in this section, we focus on the strong formulation by fixing a probability space $(\Omega,{\cal F},\mathbb{P})$ supporting a one-dimensional Brownian motion $W$. We slightly abuse notations here and denote by $\mathbb{F}^{W}\coloneqq({\cal F}_{t}^{W})_{t\in[0,T]}$ the natural filtration generated by $W$, $\mathbb{P}$-augmented in order to satisfy the usual hypotheses. We assume that the controlled one-dimensional state process $X$ satisfies the following dynamics

\displaystyle\mathrm{d}X_{t}=(\alpha_{t}+\beta_{t})\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0}\in\mathbb{R}, (2.1)

where the pair $(\alpha,\beta)$ represents the players’ decisions and $\sigma\in\mathbb{R}$ is a given constant. More precisely, the leader first announces her strategy $\alpha\in{\cal A}$ at the beginning of the game, where ${\cal A}$ is an appropriate family of $A$-valued processes for $A\subseteq\mathbb{R}$. With the knowledge of the leader’s action, the follower chooses an optimal response, i.e. a control $\beta\in{\cal B}$ optimising his objective function, for a given set ${\cal B}$ of $B$-valued processes for $B\subseteq\mathbb{R}$. The sets ${\cal A}$ and ${\cal B}$ will be defined subsequently, as they crucially depend on the solution concept considered.
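Although not part of the framework developed below, the following minimal Python sketch may help fix ideas: it simulates the dynamics (2.1) via an Euler–Maruyama scheme, with the strategies passed as generic callables of time and of the simulated path, so that path-dependent (closed-loop) strategies can be plugged in directly. All names and parameter values are illustrative.

```python
import numpy as np

def simulate_state(alpha, beta, x0=0.0, sigma=1.0, T=1.0, n_steps=1000, seed=None):
    """Euler-Maruyama scheme for dX_t = (alpha_t + beta_t) dt + sigma dW_t.

    `alpha` and `beta` are callables (t, path) -> control value, so that
    closed-loop, path-dependent strategies can be plugged in directly.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x0
    for i in range(n_steps):
        drift = alpha(i * dt, X[:i + 1]) + beta(i * dt, X[:i + 1])
        X[i + 1] = X[i] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return X

# Example: constant controls, as in the open-loop equilibrium of Section 2.1.1.
path = simulate_state(lambda t, x: 0.5, lambda t, x: 0.5, sigma=0.3)
```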

We assume that, given $\alpha\in{\cal A}$ chosen by the leader, the follower solves the following optimal stochastic control problem

\displaystyle V_{\rm F}(\alpha)\coloneqq\sup_{\beta\in{\cal B}}J_{\rm F}(\alpha,\beta),\;\text{with}\;J_{\rm F}(\alpha,\beta)\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}, (2.2)

for some $c_{\rm F}>0$. The best response of the follower to a control $\alpha\in{\cal A}$ chosen by the leader is naturally defined by

\displaystyle\beta^{\star}(\alpha)\coloneqq\operatorname*{arg\,max}_{\beta\in{\cal B}}J_{\rm F}(\alpha,\beta), (2.3)

assuming uniqueness of the best response here to simplify.

The leader, anticipating the follower’s optimal response $\beta^{\star}(\alpha)$, chooses $\alpha\in{\cal A}$ that optimises her own performance criterion. More precisely, we assume here that the leader’s optimisation is given by

\displaystyle V_{\rm L}\coloneqq\sup_{\alpha\in{\cal A}}J_{\rm L}\big{(}\alpha,\beta^{\star}(\alpha)\big{)},\;\text{with}\;J_{\rm L}\big{(}\alpha,\beta^{\star}(\alpha)\big{)}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]}, (2.4)

for some $c_{\rm L}>0$, and where the dynamics of $X$ are now driven by the optimal response of the follower, i.e.

\displaystyle\mathrm{d}X_{t}=\big{(}\alpha_{t}+\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0}\in\mathbb{R}.

The leader’s optimal action and the follower’s rational response, namely the couple $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))$ for $\alpha^{\star}$ a maximiser in (2.4), constitute a global Stackelberg solution or equilibrium. To ensure that the value of the Stackelberg game is finite for all the various equilibrium concepts, one should require the sets $A$ and $B$ to be bounded. For the sake of simplicity, we assume here that $A\coloneqq[-a_{\circ},a_{\circ}]$ and $B\coloneqq[0,b_{\circ}]$ for some $a_{\circ}>c_{\rm L}^{-1}$ and $b_{\circ}>c_{\rm F}^{-1}$ (the latter assumption is only intended to ensure that the ‘natural’ open-loop equilibrium can be reached, see Section 2.1.1).

The following section introduces the various notions of equilibrium in continuous-time stochastic Stackelberg games, and compares their solutions. More importantly for our purpose, Section 2.2 illustrates our approach, based on dynamic programming and stochastic target problems, which allows us to characterise a new notion of Stackelberg equilibrium, which we coin closed-loop. Before proceeding, it may be useful to have in mind the optimal—or reference—equilibrium for the leader, i.e. when she chooses both strategies directly. This optimal scenario for the leader, which can be labelled first-best in reference to its counterpart in principal–agent problems (a choice of terminology that is not fortuitous: it is well studied in the contract theory literature, see for instance Cvitanić and Zhang [24], and principal–agent problems are one particular instance of Stackelberg games), should naturally arise when the leader can deduce the follower’s strategy from her observation, and is able to strongly penalise him whenever he deviates from the optimal strategy recommended by the leader. The value of the leader in this first-best problem is naturally defined by

\displaystyle V_{\rm L}^{\rm FB}\coloneqq\sup_{(\alpha,\beta)\in{\cal A}\times{\cal B}}J_{\rm L}(\alpha,\beta), (2.5)

where here, ${\cal A}$ and ${\cal B}$ are the sets of $\mathbb{F}^{W}$-adapted processes taking values in $A$ and $B$, respectively. This corresponds to a standard stochastic control problem, whose solution is provided in the following lemma.

Lemma 2.1 (First-best solution).

The optimal efforts in the first-best scenario are given by $\alpha^{\rm FB}_{t}=c_{\rm L}^{-1}$ and $\beta_{t}^{\rm FB}=b_{\circ}$ for all $t\in[0,T]$, which induce the following values for the leader and the follower, respectively

\displaystyle V_{\rm L}^{\rm FB}=J_{\rm L}\big{(}\alpha^{\rm FB},\beta^{\rm FB}\big{)}=x_{0}+\bigg{(}\dfrac{1}{2c_{\rm L}}+b_{\circ}\bigg{)}T,\;V_{\rm F}^{\rm FB}\coloneqq J_{\rm F}\big{(}\alpha^{\rm FB},\beta^{\rm FB}\big{)}=x_{0}+\bigg{(}\dfrac{1}{c_{\rm L}}+b_{\circ}-\dfrac{1}{2}c_{\rm F}b^{2}_{\circ}\bigg{)}T.

The previous result can be proved through standard stochastic control techniques, but also using our stochastic target approach (see Section A.1).
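As a numerical sanity check of Lemma 2.1 (a sketch with arbitrary illustrative parameters; the variable names are ours), one can estimate $J_{\rm L}$ and $J_{\rm F}$ by Monte Carlo under the constant controls $\alpha^{\rm FB}=c_{\rm L}^{-1}$ and $\beta^{\rm FB}=b_{\circ}$, and compare with the closed-form values of the lemma:

```python
import numpy as np

# Arbitrary illustrative parameters, chosen so that a_o > 1/c_L and b_o > 1/c_F.
c_L, c_F, b_o, sigma, T, x0 = 2.0, 1.5, 1.0, 0.3, 1.0, 0.0
alpha_fb, beta_fb = 1.0 / c_L, b_o          # first-best controls of Lemma 2.1

rng = np.random.default_rng(0)
n_paths = 500_000

# With constant controls, X_T = x0 + (alpha + beta) T + sigma W_T exactly.
X_T = x0 + (alpha_fb + beta_fb) * T + sigma * np.sqrt(T) * rng.standard_normal(n_paths)

J_L = X_T.mean() - 0.5 * c_L * alpha_fb**2 * T   # running costs are deterministic
J_F = X_T.mean() - 0.5 * c_F * beta_fb**2 * T

print(J_L, x0 + (0.5 / c_L + b_o) * T)                       # ~ V_L^FB
print(J_F, x0 + (1.0 / c_L + b_o - 0.5 * c_F * b_o**2) * T)  # ~ V_F^FB
```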

2.1 Various Stackelberg equilibria

There exist various notions of equilibrium in a continuous-time stochastic Stackelberg game. These concepts are related to the information available to both players, the leader and the follower, at the beginning of and during the game. Following the nomenclature in [7] for dynamic Stackelberg games, extended to the stochastic version in [11], we informally define by ${\cal I}_{t}$ the information available to both players at time $t\in[0,T]$, and distinguish four cases. (The definition of the information available to both players is rather informal here, in order to adhere to the concepts introduced in [11]; more rigorously, it could be defined as the filtration generated by the processes observable by both players. Nonetheless, we will define in a rigorous way the sets ${\cal A}$ and ${\cal B}$ of admissible efforts depending on the solution concept considered.)

(i) adapted open-loop (AOL), when ${\cal I}_{t}=\{x_{0},W_{\cdot\wedge t}\}$;

(ii) adapted feedback (AF), when ${\cal I}_{t}=\{X_{t},W_{\cdot\wedge t}\}$;

(iii) adapted closed-loop memoryless (ACLM), when ${\cal I}_{t}=\{x_{0},X_{t},W_{\cdot\wedge t}\}$;

(iv) adapted closed-loop (ACL), when ${\cal I}_{t}=\{x_{0},X_{\cdot\wedge t},W_{\cdot\wedge t}\}$.

As explained in [11], the information structures $(i)$, $(iii)$, and $(iv)$ lead to the concept of global Stackelberg solutions, where the leader actually dominates the follower over the entire duration of the game. In these situations, a Stackelberg equilibrium $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))$ is characterised, as in the illustrative example above, by

\displaystyle J_{\rm F}(\alpha,\beta^{\star}(\alpha))\geq J_{\rm F}(\alpha,\beta),\;\text{and}\;J_{\rm L}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm L}(\alpha,\beta^{\star}(\alpha)),\;\forall(\alpha,\beta)\in{\cal A}\times{\cal B}.

The information structure $(ii)$ leads to a different concept of solution, in which the leader has only an instantaneous advantage over the follower. More precisely, a feedback Stackelberg equilibrium $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))$ should satisfy

\displaystyle J_{\rm F}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm F}(\alpha^{\star},\beta),\;\text{and}\;J_{\rm L}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm L}(\alpha,\beta^{\star}(\alpha)),\;\forall(\alpha,\beta)\in{\cal A}\times{\cal B}.

In the following, we illustrate the existing approaches to computing the equilibrium under the first three information structures in the context of the above example. Even though the last information structure, corresponding to the adapted closed-loop (with memory) case, has not been studied in the literature, we are able to characterise it in this example. Indeed, our analysis establishes a connection between this Stackelberg solution concept and the first-best scenario, already discussed in Lemma 2.1.

However, the real aim of this paper is not to study existing solution concepts, but to introduce a new, albeit natural, concept of information, corresponding to the definition of closed-loop equilibria in the literature on stochastic differential games (see, for example, Carmona [16, Definition 5.5]), in which the information available to both players at time $t\in[0,T]$ is—again informally—defined as

(v) closed-loop (CL), when ${\cal I}_{t}=\{x_{0},X_{\cdot\wedge t}\}$.

In particular, this information concept is different from the adapted closed-loop case introduced in [11] and mentioned above, as we do not assume here that the players have access to the paths of the Brownian motion. As already highlighted in the introduction, considering such an information structure makes sense, especially in real-world applications, as it usually seems unrealistic to believe that players can actually observe the underlying noise driving the output process, the latter being in most cases a modelling artefact. Admissible strategies constructed using this information structure are therefore not assumed to be adapted to the natural filtration generated by the Brownian motion, in contrast to adapted closed-loop strategies, hence we simply refer to them as closed-loop.

More precise specifications of this solution concept, along with an informal description of the methodology we develop to characterise the corresponding Stackelberg equilibrium, are presented separately in Section 2.2. We present below the main results obtained in the context of the example, especially the comparison of the values obtained for both players, depending on the equilibrium considered.

Comparison of the equilibria.

The results we obtain for the different solution concepts are summarised in Table 1 below. Before commenting on our results, we should point out that these findings were obtained for the example introduced at the beginning of this section, and by no means do we claim or expect that they would all be true in a more general context. Nevertheless, given the significance of some of these findings, especially the fact that, from the leader’s point of view (from the follower’s, all the inequalities are naturally reversed), $V_{\rm L}^{\rm AOL}=V_{\rm L}^{\rm AF}<V_{\rm L}^{\rm ACLM},V_{\rm L}^{\rm CL}<V_{\rm L}^{\rm ACL}=V_{\rm L}^{\rm FB}$, investigating the extent to which they hold in greater generality could be the subject of future research.

Table 1: Comparison of the various Stackelberg equilibria.

AOL $(i)$ and AF $(ii)$: $V_{\rm L}=x_{0}+\big(\frac{1}{2c_{\rm L}}+\frac{1}{c_{\rm F}}\big)T$, and $V_{\rm F}=x_{0}+\big(\frac{1}{c_{\rm L}}+\frac{1}{2c_{\rm F}}\big)T$;

ACLM $(iii)$: $V_{\rm L}=x_{0}+\big(\frac{1}{2c_{\rm L}}+\bar{b}\big)T$, and $V_{\rm F}=x_{0}+\big(\frac{1}{c_{\rm L}}+\widetilde{b}\big)T$;

ACL $(iv)$ and FB: $V_{\rm L}=x_{0}+\big(\frac{1}{2c_{\rm L}}+b_{\circ}\big)T$, and $V_{\rm F}=x_{0}+\big(\frac{1}{c_{\rm L}}+b_{\circ}-\frac{1}{2}c_{\rm F}b^{2}_{\circ}\big)T$;

CL $(v)$: $V_{\rm L}=V_{\rm L}^{\rm CL}$, and $V_{\rm F}=V_{\rm F}^{\rm CL}$ (characterised numerically in Section 2.2.4).

In the ACLM case, $\bar{b}\coloneqq\frac{b_{\circ}c_{\rm F}-1}{c_{\rm F}\log(b_{\circ}c_{\rm F})}\in\big(\frac{1}{c_{\rm F}},b_{\circ}\big)$, and $\widetilde{b}\coloneqq\frac{4b_{\circ}c_{\rm F}-b_{\circ}^{2}c_{\rm F}^{2}+3}{c_{\rm F}\log(b_{\circ}c_{\rm F})}\in\big(\frac{1}{2c_{\rm F}},b_{\circ}-\frac{1}{2}c_{\rm F}b^{2}_{\circ}\big)$, for $b_{\circ}>c_{\rm F}^{-1}$.
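The closed-form entries of Table 1 are straightforward to evaluate; the short sketch below (illustrative parameter values only, with $a_{\circ}$ implicitly taken large enough for the ACLM formulas of Lemma A.1 to apply) computes the leader’s values in the AOL/AF, ACLM, and ACL/FB cases, and checks both the containment $\bar{b}\in(1/c_{\rm F},b_{\circ})$ and the strict ordering $V_{\rm L}^{\rm AOL}<V_{\rm L}^{\rm ACLM}<V_{\rm L}^{\rm FB}$:

```python
import numpy as np

# Illustrative parameters with b_o > 1/c_F, as required after Table 1.
c_L, c_F, b_o, T, x0 = 2.0, 1.5, 1.0, 1.0, 0.0

b_bar = (b_o * c_F - 1.0) / (c_F * np.log(b_o * c_F))   # ACLM effort level

V_L_AOL = x0 + (0.5 / c_L + 1.0 / c_F) * T              # AOL (i) and AF (ii)
V_L_ACLM = x0 + (0.5 / c_L + b_bar) * T                 # ACLM (iii)
V_L_FB = x0 + (0.5 / c_L + b_o) * T                     # ACL (iv) and FB

assert 1.0 / c_F < b_bar < b_o                          # containment stated above
assert V_L_AOL < V_L_ACLM < V_L_FB                      # leader's ordering
print(V_L_AOL, V_L_ACLM, V_L_FB)                        # 0.9167, 1.0721, 1.25
```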

First of all, it is obviously expected that, for any concept of Stackelberg equilibrium, the value of the leader will be lower than her value in the first-best case, introduced as a reference in Lemma 2.1, since in this scenario the leader can directly choose the optimal effort of the follower. It is also expected that the more available information the leader can use to implement her strategy, the higher the value she will obtain, which translates mathematically into the following inequalities

\displaystyle V_{\rm L}^{\rm AOL}\leq V_{\rm L}^{\rm ACLM}\leq V_{\rm L}^{\rm ACL},\;V_{\rm L}^{\rm AF}\leq V_{\rm L}^{\rm ACLM},\;\text{and}\;V_{\rm L}^{\rm AOL}\leq V_{\rm L}^{\rm CL}\leq V_{\rm L}^{\rm ACL}.

In the context of our simple example, our first finding is that the Stackelberg equilibrium, and hence the associated values for the leader and the follower, coincide for both the adapted open-loop (Section 2.1.1) and the adapted feedback (Section 2.1.2) information structures. This might reflect how the additional information under the feedback structure is counterbalanced by the global dominance of the open-loop strategies. Regarding the value of the leader in the ACLM information structure (Section 2.1.3), strict inequalities with respect to the values in the AOL and ACL cases can be obtained for specific choices of the parameters $a_{\circ}$, $b_{\circ}$, $c_{\rm L}$, and $c_{\rm F}$. Namely, we assume in Lemma A.1 that

\displaystyle a_{\circ}>\max\bigg{\{}\frac{1}{c_{\rm L}}+b_{\circ}\big{(}b_{\circ}c_{\rm F}-1\big{)},\frac{1}{2c_{\rm F}}\big{(}b_{\circ}^{2}c_{\rm F}^{2}-1\big{)}-\frac{1}{c_{\rm L}}\bigg{\}}, (2.6)

in order to compute explicitly the value of the leader.

On the other hand, our analysis of the Stackelberg game under adapted closed-loop strategies in Section 2.1.4 shows that, as long as the leader can effectively punish the follower at no additional cost, see Equation 2.13, the problem degenerates to the first-best case. More precisely, by observing the trajectory of $X$ as well as that of $W$, the leader can actually deduce the follower’s effort at each time, and thus force him to perform the maximum effort $b_{\circ}$, threatening to significantly penalise him otherwise. This is the case, for instance, if

\displaystyle a_{\circ}\geq\dfrac{1}{2c_{\rm F}}-b_{\circ}+\dfrac{1}{2}c_{\rm F}b_{\circ}^{2}-\dfrac{1}{c_{\rm L}}. (2.7)


Finally, regarding our equilibrium, namely closed-loop, while it is clear that the value for the leader should be higher than in the AOL case, and lower than in the ACL and FB cases, the comparison with the ACLM case is less straightforward. Unfortunately, we are not able to obtain explicit results in this framework, even in the context of this simple example, and we thus rely on numerical results, presented in Section 2.2.4. These numerical results seem to illustrate that the CL equilibrium gives a higher value for the leader compared to the ACLM case, at least when $a_{\circ}$ is chosen sufficiently large so that Equations 2.6 and 2.7 are satisfied. Although we cannot rule out the possibility that these conclusions could be reversed for different sets of parameters, the numerical results nevertheless highlight that these two equilibria are essentially different.

2.1.1 Adapted open-loop strategies

In a Stackelberg game under the adapted open-loop (AOL) information structure, both players have access to the initial value of $X$, namely $x_{0}$, and the trajectory of the Brownian motion $W$. Since the leader first announces her strategy $\alpha$, its value $\alpha_{t}$ at any time $t\in[0,T]$ should only depend on the realisation of the Brownian motion on $[0,t]$, and on the initial value $x_{0}$ of the state. The leader’s strategy space ${\cal A}$ in this case is thus naturally defined by ${\cal A}\coloneqq\{\alpha:[0,T]\times\Omega\times\{x_{0}\}\longrightarrow A:\alpha\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\}$. As the follower makes his decision after the leader announces her whole strategy $\alpha$ on $[0,T]$, his strategy may also depend on the leader’s announced strategy. More precisely, the value $\beta_{t}$ of the follower’s response strategy at time $t\in[0,T]$ is naturally measurable with respect to ${\cal F}_{t}^{W}$, but can also depend on the leader’s strategy $\alpha$. His response strategy space is thus defined by

\displaystyle{\cal B}\coloneqq\big{\{}\beta:[0,T]\times\Omega\times\{x_{0}\}\times{\cal A}\longrightarrow B:(\beta_{t}(\cdot,x_{0},\alpha))_{t\in[0,T]}\;\text{is an}\;\mathbb{F}^{W}\text{-adapted process for all}\;\alpha\in{\cal A}\big{\}}.

Note that, at any time $t\in[0,T]$, since the information available to the leader is also available to the follower, the follower can naturally compute the value of the leader’s strategy at that instant $t$, i.e., $\alpha_{t}$. However, he cannot anticipate the future values of the leader’s strategy $\alpha$.

As described in [11, Section 3], one way to characterise a global Stackelberg equilibrium under the AOL information structure is to rely on the maximum principle. A general result is given, for example, in [11, Proposition 3.1], but we briefly describe this approach through its application to our example. Recall that, given the leader’s strategy $\alpha\in{\cal A}$, the follower’s problem is defined by Equation 2.2, where the dynamics of the state variable $X$ satisfy (2.1). To solve this stochastic optimal control problem through the maximum principle, we first define the appropriate Hamiltonian

\displaystyle h^{\rm F}(t,a,y,z,b)\coloneqq(a+b)y+\sigma z-\dfrac{c_{\rm F}}{2}b^{2},\;(t,a,y,z,b)\in[0,T]\times A\times\mathbb{R}^{2}\times B.

Suppose now that there exists a solution $\beta^{\star}(\alpha)$ to the follower’s problem (2.2) for any $\alpha\in{\cal A}$. Then, the maximum principle states that there exists a pair of real-valued, $\mathbb{F}^{W}$-adapted processes $(Y^{\rm F},Z^{\rm F})$ such that

\displaystyle\begin{cases}\mathrm{d}X_{t}=\big{(}\alpha_{t}+\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0};\\[5.0pt]\mathrm{d}Y_{t}^{\rm F}=Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1;\\[5.0pt]\beta^{\star}_{t}(\alpha)\coloneqq\operatorname*{arg\,max}_{b\in B}\big{\{}h^{\rm F}\big{(}t,\alpha_{t},Y_{t}^{\rm F},Z_{t}^{\rm F},b\big{)}\big{\}},\;\mathrm{d}t\otimes\mathbb{P}\text{--a.e.}\end{cases} (2.8)

Note that the drift in the backward SDE (BSDE for short) in (2.8), commonly called the adjoint process, is equal to 0, because the Hamiltonian $h^{\rm F}$ does not depend on the state variable. Clearly, in this simple example, the pair $(Y^{\rm F},Z^{\rm F})$ satisfying the BSDE is the pair of constant processes $(1,0)$. This leads to the optimal constant control $\beta^{\star}_{t}(\alpha)=1/c_{\rm F}\in B$ for all $t\in[0,T]$. In particular, this control is independent of the leader’s choice of $\alpha$. The leader’s problem defined by (2.4) thus becomes

\displaystyle V_{\rm L}=\sup_{\alpha\in{\cal A}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},\;\text{subject to}\;\mathrm{d}X_{t}=\bigg{(}\alpha_{t}+\dfrac{1}{c_{\rm F}}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T].

This optimal control problem is trivial to solve, and also leads to an optimal constant control for the leader, namely $\alpha^{\star}_{t}=1/c_{\rm L}\in A$ for all $t\in[0,T]$. The open-loop equilibrium is thus given by $(1/c_{\rm L},1/c_{\rm F})$, which is admissible thanks to the assumptions $a_{\circ}\geq 1/c_{\rm L}$ and $b_{\circ}\geq 1/c_{\rm F}$, and one can easily compute the corresponding values for the leader and the follower, given in Table 1.
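As a quick numerical confirmation (again with arbitrary illustrative parameters), simulating the terminal state under the constant equilibrium controls $(1/c_{\rm L},1/c_{\rm F})$ recovers the AOL values of Table 1 up to Monte Carlo error:

```python
import numpy as np

c_L, c_F, sigma, T, x0 = 2.0, 1.5, 0.3, 1.0, 0.0
alpha_star, beta_star = 1.0 / c_L, 1.0 / c_F      # AOL equilibrium controls

rng = np.random.default_rng(1)
n_paths = 500_000
X_T = x0 + (alpha_star + beta_star) * T + sigma * np.sqrt(T) * rng.standard_normal(n_paths)

V_L_mc = X_T.mean() - 0.5 * c_L * alpha_star**2 * T
V_F_mc = X_T.mean() - 0.5 * c_F * beta_star**2 * T

print(V_L_mc, x0 + (0.5 / c_L + 1.0 / c_F) * T)   # Table 1, AOL leader's value
print(V_F_mc, x0 + (1.0 / c_L + 0.5 / c_F) * T)   # Table 1, AOL follower's value
```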

2.1.2 Adapted feedback strategies

A Stackelberg game under the adapted feedback (AF) information structure differs from the other Stackelberg equilibria, not only in the information structure itself, but also in the way the game is played. In this scenario, both players only have access to the current value of $X$ and the trajectory of the Brownian motion $W$. In other words, the leader’s strategy at time $t\in[0,T]$ can only depend on the value $X_{t}$ and the realisation of the Brownian motion on $[0,t]$. Under this information structure, the equilibrium is not global, in the sense that at each time $t\in[0,T]$, the leader first decides her action $\alpha_{t}$, and then the follower makes his decision, immediately after observing the leader’s instant action at time $t$, rather than her whole strategy over $[0,T]$. Therefore, the leader’s and follower’s strategy spaces are respectively defined by

{\cal A}\coloneqq\big{\{}\alpha:[0,T]\times\Omega\times\mathbb{R}\longrightarrow A:\alpha\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\big{\}},\;\text{and}\;{\cal B}\coloneqq\big{\{}\beta:[0,T]\times\Omega\times\mathbb{R}\times A\longrightarrow B:\beta\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\big{\}}.


Recall that an AF Stackelberg solution is a pair $(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\in{\cal A}\times{\cal B}$ satisfying $J_{\rm F}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm F}(\alpha^{\star},\beta)$ for all $\beta\in{\cal B}$, and $J_{\rm L}(\alpha^{\star},\beta^{\star}(\alpha^{\star}))\geq J_{\rm L}(\alpha,\beta^{\star}(\alpha))$ for all $\alpha\in{\cal A}$. To compute such a solution, we can rely on the approach in [10], based on the dynamic programming method. More precisely, for $(t,z^{\rm F},z^{\rm L},a,b)\in[0,T]\times\mathbb{R}^{2}\times A\times B$, we introduce the players’ Hamiltonians

\displaystyle h^{\rm F}\big{(}t,z^{\rm F},a,b\big{)}\coloneqq(a+b)z^{\rm F}-\dfrac{c_{\rm F}}{2}b^{2},\;\text{and}\;h^{\rm L}\big{(}t,z^{\rm L},a,b\big{)}\coloneqq(a+b)z^{\rm L}-\dfrac{c_{\rm L}}{2}a^{2}.

For a fixed action of the leader, the follower’s optimal response is given by the maximiser of his Hamiltonian, i.e.

\displaystyle b^{\star}\big{(}t,z^{\rm F},a\big{)}\coloneqq\operatorname*{arg\,max}_{b\in B}\big{\{}h^{\rm F}\big{(}t,z^{\rm F},a,b\big{)}\big{\}}=\Pi_{B}\bigg{(}\dfrac{z^{\rm F}}{c_{\rm F}}\bigg{)},\;(t,z^{\rm F},a)\in[0,T]\times\mathbb{R}\times A,

recalling that, for all $x\in\mathbb{R}$, $\Pi_{B}(x)$ denotes the projection of $x$ on $B$. One should then substitute this optimal response into the leader’s Hamiltonian. Nevertheless, in this example it does not change the maximiser of the leader’s Hamiltonian, given by

\displaystyle a^{\star}\big{(}t,z^{\rm F},z^{\rm L}\big{)}\coloneqq\operatorname*{arg\,max}_{a\in A}\big{\{}h^{\rm L}\big{(}t,z^{\rm L},a,b^{\star}(t,z^{\rm F},a)\big{)}\big{\}}=\Pi_{A}\bigg{(}\dfrac{z^{\rm L}}{c_{\rm L}}\bigg{)},\;(t,z^{\rm F},z^{\rm L})\in[0,T]\times\mathbb{R}^{2}.

To compute the equilibrium, one must solve the following system of coupled Hamilton–Jacobi–Bellman equations

\displaystyle\begin{cases}-\partial_{t}v_{\rm F}(t,x)-\bigg{(}\Pi_{A}\bigg{(}\dfrac{\partial_{x}v_{\rm L}(t,x)}{c_{\rm L}}\bigg{)}+\Pi_{B}\bigg{(}\dfrac{\partial_{x}v_{\rm F}(t,x)}{c_{\rm F}}\bigg{)}\bigg{)}\partial_{x}v_{\rm F}(t,x)+\dfrac{c_{\rm F}}{2}\Pi_{B}^{2}\bigg{(}\dfrac{\partial_{x}v_{\rm F}(t,x)}{c_{\rm F}}\bigg{)}-\dfrac{1}{2}\sigma^{2}\partial_{xx}v_{\rm F}(t,x)=0,\\[8.00003pt]-\partial_{t}v_{\rm L}(t,x)-\bigg{(}\Pi_{A}\bigg{(}\dfrac{\partial_{x}v_{\rm L}(t,x)}{c_{\rm L}}\bigg{)}+\Pi_{B}\bigg{(}\dfrac{\partial_{x}v_{\rm F}(t,x)}{c_{\rm F}}\bigg{)}\bigg{)}\partial_{x}v_{\rm L}(t,x)+\dfrac{c_{\rm L}}{2}\Pi_{A}^{2}\bigg{(}\dfrac{\partial_{x}v_{\rm L}(t,x)}{c_{\rm L}}\bigg{)}-\dfrac{1}{2}\sigma^{2}\partial_{xx}v_{\rm L}(t,x)=0,\end{cases}

for all $(t,x)\in[0,T)\times\mathbb{R}$, with boundary conditions $v_{\rm F}(T,x)=v_{\rm L}(T,x)=x$, $x\in\mathbb{R}$. One can check, using a standard verification theorem, that the appropriate solutions to the previous system are

\displaystyle v_{\rm F}(t,x)=x+\bigg{(}\dfrac{1}{c_{\rm L}}+\dfrac{1}{2c_{\rm F}}\bigg{)}(T-t),\;\text{and}\;v_{\rm L}(t,x)=x+\bigg{(}\dfrac{1}{c_{\rm F}}+\dfrac{1}{2c_{\rm L}}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R},

which correspond to the constant strategies $(1/c_{\rm L},1/c_{\rm F})\in A\times B$. In particular, the feedback Stackelberg equilibrium coincides with the open-loop solution computed before, both in terms of strategy and corresponding value.
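For instance, plugging $v_{\rm F}$ into the first equation of the system is a direct check: since $\partial_{x}v_{\rm F}=1$, $\partial_{xx}v_{\rm F}=0$, and the projections are interior thanks to $a_{\circ}>c_{\rm L}^{-1}$ and $b_{\circ}>c_{\rm F}^{-1}$, one gets

\displaystyle-\partial_{t}v_{\rm F}(t,x)-\bigg{(}\frac{1}{c_{\rm L}}+\frac{1}{c_{\rm F}}\bigg{)}+\frac{c_{\rm F}}{2}\cdot\frac{1}{c_{\rm F}^{2}}=\bigg{(}\frac{1}{c_{\rm L}}+\frac{1}{2c_{\rm F}}\bigg{)}-\bigg{(}\frac{1}{c_{\rm L}}+\frac{1}{c_{\rm F}}\bigg{)}+\frac{1}{2c_{\rm F}}=0,

and the computation for $v_{\rm L}$ is identical upon exchanging the roles of $c_{\rm L}$ and $c_{\rm F}$.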

2.1.3 Adapted closed-loop memoryless strategies

If the information structure is assumed to be adapted closed-loop memoryless (ACLM), then both players have access to the initial and current values of $X$, as well as the trajectory of the Brownian motion $W$. This means that, compared to the AOL information structure, both players can additionally make their decisions at time $t$ contingent on the current state $X_{t}$. The leader’s strategy space and the follower’s response strategy space are then naturally defined by

{\cal A}\coloneqq\big{\{}\alpha:[0,T]\times\Omega\times\mathbb{R}\times\{x_{0}\}\longrightarrow A:(\alpha_{t}(\cdot,X_{t},x_{0}))_{t\in[0,T]}\;\text{is}\;\mathbb{F}^{W}\text{-adapted}\big{\}},

{\cal B}\coloneqq\big{\{}\beta:[0,T]\times\Omega\times\mathbb{R}\times\{x_{0}\}\times{\cal A}\longrightarrow B:(\beta_{t}(\cdot,X_{t},x_{0},\alpha))_{t\in[0,T]}\;\text{is}\;\mathbb{F}^{W}\text{-adapted},\;\forall\alpha\in{\cal A}\big{\}}.

As mentioned above, the main difference between the ACLM and the AOL information structures is that the leader’s control at time $t$ can depend on the value of the state at that time. However, by choosing his strategy $\beta$, the follower will naturally impact the dynamics of the state $X$ and thus its value, which in turn impacts the value of the leader’s control $\alpha$. Therefore, in order to compute his optimal response to a strategy $\alpha$ of the leader, the follower needs to take into account the retroaction of his control on the value of the leader’s control. This leads to a more sophisticated form of equilibrium. In particular, contrary to the AOL case, where the leader is relatively myopic, in the sense that she cannot possibly take into account the choice of the follower, under the ACLM information structure she can now design a strategy indexed on the state, which will therefore take into account the follower’s actions.

In order to characterise the global Stackelberg equilibrium under the ACLM information structure, we can again rely on the maximum principle (see [11, Section 4]). First, to highlight the dependency of the value $\alpha_{t}$ on the current value of the state $X_{t}$, we write $\alpha_{t}\eqqcolon a_{t}(X_{t})$ for $a:[0,T]\times\Omega\times\mathbb{R}\times\{x_{0}\}\longrightarrow A$, whose values at a fixed $(t,\omega)\in[0,T]\times\Omega$ induce the family ${\rm A}$ of mappings ${\rm a}:\mathbb{R}\times\{x_{0}\}\longrightarrow A$. We can then follow the maximum principle approach as before, but taking into account this dependency. More precisely, as before, we fix the leader’s strategy $\alpha\in{\cal A}$, and thus its value $a_{t}(X_{t})$ at time $t$, and consider the follower’s problem given by (2.2), but now subject to the following dynamics for the state

\displaystyle\mathrm{d}X_{t}=(a_{t}(X_{t})+\beta_{t})\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},

where the dependency of the leader’s control on the state appears explicitly. This dependency will thus also appear explicitly in the Hamiltonian

\displaystyle h^{\rm F}(t,{\rm a},x,y,z,b)\coloneqq({\rm a}(x)+b)y+\sigma z-\dfrac{c_{\rm F}}{2}b^{2},\;(t,{\rm a},x,y,z,b)\in[0,T]\times{\rm A}\times\mathbb{R}^{3}\times B.

Suppose that there exists a solution $\beta^{\star}(\alpha)$ to the follower’s problem (2.2) for any $\alpha\in{\cal A}$. Then, the maximum principle states that there exists a pair of $\mathbb{F}^{W}$-adapted processes $(Y^{\rm F},Z^{\rm F})$ satisfying the forward–backward SDE (FBSDE for short)

\displaystyle\begin{cases}\mathrm{d}X_{t}=\big{(}a_{t}(X_{t})+\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[5.0pt]\mathrm{d}Y_{t}^{\rm F}=-\partial_{x}h^{\rm F}\big{(}t,\alpha_{t},X_{t},Y_{t}^{\rm F},Z_{t}^{\rm F},\beta^{\star}_{t}(\alpha)\big{)}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1,\\[5.0pt]\beta^{\star}_{t}(\alpha)\coloneqq\operatorname*{arg\,max}_{b\in B}\big{\{}h^{\rm F}\big{(}t,\alpha_{t},X_{t},Y_{t}^{\rm F},Z_{t}^{\rm F},b\big{)}\big{\}},\;t\in[0,T].\end{cases}

Notice that $h^{\rm F}$ now depends explicitly on the state variable, and thus the associated partial derivative is not equal to zero, contrary to the AOL case. By computing the maximiser of $h^{\rm F}$ over $b\in B$, the previous FBSDE system becomes

\displaystyle\begin{cases}\mathrm{d}X_{t}=\bigg{(}a_{t}(X_{t})+\Pi_{B}\bigg{(}\dfrac{Y_{t}^{\rm F}}{c_{\rm F}}\bigg{)}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt]\mathrm{d}Y_{t}^{\rm F}=-\partial_{x}a_{t}(X_{t})Y_{t}^{\rm F}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1.\end{cases} (2.9)

One can then reformulate the leader’s problem defined by (2.4) as

\displaystyle V_{\rm L}=\sup_{\alpha\in{\cal A}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},\;\text{subject to the dynamics in (2.9)}. (2.10)

That is, the leader’s problem is equivalent to a stochastic optimal control problem of an FBSDE. Note however that the presence of the derivative $\partial_{x}a$ of the leader’s strategy in (2.9) results in a non-standard optimal control problem for the leader, which can nevertheless also be solved via the maximum principle, as described in [11, Section 4]. More precisely, the idea to solve the leader’s problem is to look at efforts of the form $a_{t}(X_{t})=a^{2}_{t}X_{t}+a^{1}_{t}$, where $a^{1}$ and $a^{2}$ are $\mathbb{F}^{W}$-adapted, $\mathbb{R}$-valued processes such that $a^{2}_{t}X_{t}+a^{1}_{t}\in A$ for every $t\in[0,T]$, $\mathbb{P}$–a.s. We define ${\cal A}^{2}$ as the space of processes $(a^{1},a^{2})$ satisfying these properties. It then follows from [11, Theorem 4.1] that $V_{\rm L}=\widetilde{V}_{\rm L}$, where

V~Lsup(a1,a2)𝒜2𝔼[XTcL20T(at2Xt+at1)2dt],\displaystyle\widetilde{V}_{\rm L}\coloneqq\sup_{(a^{\text{$1$}},a^{\text{$2$}})\in{\cal A}^{2}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\big{(}a^{2}_{t}X_{t}+a^{1}_{t}\big{)}^{2}\mathrm{d}t\bigg{]}, (2.11)

subject to

{dXt=(at2Xt+at1+ΠB(YtFcF))dt+σdWt,t[0,T],X0=x0,dYtF=at2YtFdt+ZtFdWt,t[0,T],YTF=1.\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg{(}a^{2}_{t}X_{t}+a^{1}_{t}+\Pi_{B}\bigg{(}\dfrac{Y_{t}^{\rm F}}{c_{\rm F}}\bigg{)}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt] \displaystyle\mathrm{d}Y_{t}^{\rm F}=-a^{2}_{t}Y_{t}^{\rm F}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1.\end{cases}

To solve V~L\widetilde{V}_{L}, we define, for (t,x,x,y,y,z,z,a1,a2)[0,T]×8(t,x,x^{\prime},y,y^{\prime},z,z^{\prime},{\rm a}^{1},{\rm a}^{2})\in[0,T]\times\mathbb{R}^{8}, the Hamiltonian

hL(x,x,y,y,z,z,a1,a2)(a2x+a1+ΠB(ycF))y+σza2yxcL2(a2x+a1)2.\displaystyle h^{\rm L}(x,x^{\prime},y,y^{\prime},z,z^{\prime},{\rm a}^{1},{\rm a}^{2})\coloneqq\bigg{(}{\rm a}^{2}x+{\rm a}^{1}+\Pi_{B}\bigg{(}\dfrac{y^{\prime}}{c_{\rm F}}\bigg{)}\bigg{)}y+\sigma z-{\rm a}^{2}y^{\prime}x^{\prime}-\dfrac{c_{\rm L}}{2}({\rm a}^{2}x+{\rm a}^{1})^{2}.

Again by [11, Theorem 4.1], if α^𝒜\hat{\alpha}\in{\cal A} is a solution to the leader’s problem (2.10) with the corresponding state trajectory (X^,Y^F,Z^F)(\hat{X},\hat{Y}^{\rm F},\hat{Z}^{\rm F}), then there exists a triple of 𝔽W\mathbb{F}^{W}-adapted processes (XL,YL,ZL)(X^{\rm L},Y^{\rm L},Z^{\rm L}) such that

{dXtL=yhLdtzhLdWt,t[0,T],X0L=0,dYtL=xhLdt+ZtLdWt,t[0,T],YTL=1,\displaystyle\begin{cases}\displaystyle\mathrm{d}X^{\rm L}_{t}=-\partial_{y^{\text{$\prime$}}}h^{\rm L}\mathrm{d}t-\partial_{z^{\text{$\prime$}}}h^{\rm L}\mathrm{d}W_{t},\;t\in[0,T],\;X^{\rm L}_{0}=0,\\[5.0pt] \displaystyle\mathrm{d}Y_{t}^{\rm L}=-\partial_{x}h^{\rm L}\mathrm{d}t+Z_{t}^{\rm L}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm L}_{T}=1,\end{cases}

where the derivatives of hLh^{\rm L} are evaluated at (X^t,XtL,YtL,Y^tF,ZtL,Z^tF,a^t(X^t)xa^t(X^t)X^t,xa^t(X^t))\big{(}\hat{X}_{t},X^{\rm L}_{t},Y_{t}^{\rm L},\hat{Y}_{t}^{\rm F},Z_{t}^{\rm L},\hat{Z}_{t}^{\rm F},\hat{a}_{t}(\hat{X}_{t})-\partial_{x}\hat{a}_{t}(\hat{X}_{t})\hat{X}_{t},\partial_{x}\hat{a}_{t}(\hat{X}_{t})\big{)}, and

(a^t(X^t)xa^t(X^t)X^t,xa^t(X^t))argmax(a1,a2)A2(X^t){hL(X^t,XtL,YtL,Y^tF,ZtL,Z^tF,a1,a2)},t[0,T],\displaystyle\big{(}\hat{a}_{t}(\hat{X}_{t})-\partial_{x}\hat{a}_{t}(\hat{X}_{t})\hat{X}_{t},\partial_{x}\hat{a}_{t}(\hat{X}_{t})\big{)}\in\operatorname*{arg\,max}_{({\rm a}^{\text{$1$}},{\rm a}^{\text{$2$}})\in A^{\text{$2$}}(\hat{X}_{\text{$t$}})}\big{\{}h^{\rm L}\big{(}\hat{X}_{t},X^{\rm L}_{t},Y_{t}^{\rm L},\hat{Y}_{t}^{\rm F},Z_{t}^{\rm L},\hat{Z}_{t}^{\rm F},{\rm a}^{1},{\rm a}^{2}\big{)}\big{\}},\;t\in[0,T],

where $A^2(x)$ is the set of $(a^1,a^2)\in\mathbb{R}^2$ such that $a^1+a^2x\in A$. Note, however, that the maximiser of $h^{\rm L}$ is not well-defined without further restrictions on the strategy $\alpha\in{\cal A}$. A way to tackle this issue is to impose a priori bounds on $\partial_x a$, as done in [11, Section 5.2], which will later be relaxed so as not to lose generality. We thus assume that $\|a^2\|_\infty\leq k$ for some $k>0$, and we denote the corresponding constrained solution by ACLM-$k$; we will later study its behaviour as $k\to\infty$. Optimising $h^{\rm L}$ with respect to ${\rm a}^1$ gives

\displaystyle\hat{a}^{1}(y,x)\coloneqq\dfrac{y}{c_{\rm L}}-a^{2}x,\;\text{and}\;h^{\rm L}(x,x^{\prime},y,y^{\prime},z,z^{\prime},\hat{a}^{1},a^{2})=\dfrac{1}{2}\dfrac{y^{2}}{c_{\rm L}}+\dfrac{yy^{\prime}}{c_{\rm F}}+\sigma z-a^{2}y^{\prime}x^{\prime},

where we used that, in the regime of interest (see Lemma A.1), the projection $\Pi_B(y^{\prime}/c_{\rm F})$ is not binding, i.e. $\Pi_B(y^{\prime}/c_{\rm F})=y^{\prime}/c_{\rm F}$.

Then, since the only term involving ${\rm a}^2$ is $-{\rm a}^2y^{\prime}x^{\prime}$, the maximisation with respect to ${\rm a}^2$ gives $\hat{a}^{2}\coloneqq-k\,{\rm sign}(y^{\prime}x^{\prime})$. Therefore, by the maximum principle, if $(\hat{a}^{1},\hat{a}^{2})$ is a solution to Problem (2.11), then there exists a tuple of $\mathbb{F}^W$-adapted processes $(X,X^{\rm L},Y^{\rm F},Z^{\rm F},Y^{\rm L},Z^{\rm L})$ such that

\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg(\hat{a}^{2}_{t}X_{t}+\hat{a}^{1}_{t}+\Pi_{B}\bigg(\dfrac{Y_{t}^{\rm F}}{c_{\rm F}}\bigg)\bigg)\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8pt]
\displaystyle\mathrm{d}X^{\rm L}_{t}=-\bigg(\dfrac{Y^{\rm L}_{t}}{c_{\rm F}}-\hat{a}^{2}_{t}X^{\rm L}_{t}\bigg)\mathrm{d}t,\;t\in[0,T],\;X^{\rm L}_{0}=0,\\[8pt]
\displaystyle\mathrm{d}Y_{t}^{\rm F}=-\hat{a}^{2}_{t}Y_{t}^{\rm F}\mathrm{d}t+Z_{t}^{\rm F}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm F}_{T}=1,\\[8pt]
\displaystyle\mathrm{d}Y_{t}^{\rm L}=0\,\mathrm{d}t+Z_{t}^{\rm L}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\rm L}_{T}=1.\end{cases} (2.12)

We can solve this system explicitly, where $\hat{a}^1_t$ and $\hat{a}^2_t$ stand for $\hat{a}^1(Y^{\rm L}_t,X_t)$ and $-k\,{\rm sign}(Y^{\rm F}_tX^{\rm L}_t)$, respectively. Taking $Z^{\rm L}\equiv Z^{\rm F}\equiv0$ yields $Y^{\rm L}\equiv1$; since $X^{\rm L}_0=0$ and $\mathrm{d}X^{\rm L}_t=(-1/c_{\rm F}+\hat{a}^2_tX^{\rm L}_t)\mathrm{d}t$, the process $X^{\rm L}$ is negative on $(0,T]$, so that $\hat{a}^2_t=k$ and, in turn, $Y^{\rm F}_t=\mathrm{e}^{k(T-t)}$, $t\in[0,T]$. Then we have the candidate solution to ACLM-$k$, given for all $t\in[0,T]$ by

α(t,Xt)=ΠA(1cL+k(XtXt)),β(t)=ΠB(ek(Tt)cF), where Xt=x0+tcL+ekTkcF(1ekt)+σWt.\alpha^{\star}(t,X_{t})=\Pi_{A}\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})\bigg{)},\;\beta^{\star}(t)=\Pi_{B}\bigg{(}\frac{\mathrm{e}^{k(T-t)}}{c_{\rm F}}\bigg{)},\text{ where }X^{\star}_{t}=x_{0}+\dfrac{t}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}}{kc_{\rm F}}\big{(}1-\mathrm{e}^{-kt}\big{)}+\sigma W_{t}.

It is proved in Lemma A.1 that such strategies are optimal for small values of $k$ such that no projection is enforced, meaning that for both controls the terms inside the brackets do not leave the corresponding admissible intervals. Moreover, under the right choice of parameters $a_\circ$ and $b_\circ$, for instance if Condition (2.6) is satisfied, the value of the ACLM problem is equal to that of the ACLM-$k$ problem for $k=\frac{1}{T}\log(b_\circ c_{\rm F})$, and we have

VL=x0+T2cL+T(bcF1)cFlog(bcF),VF(α)=x0+T2cL+T(bcF1)cFlog(bcF)T(b2cF21)4cFlog(bcF).V_{\rm L}=x_{0}+\frac{T}{2c_{\rm L}}+\frac{T(b_{\circ}c_{\rm F}-1)}{c_{\rm F}\log(b_{\circ}c_{\rm F})},\leavevmode\nobreak\ V_{\rm F}(\alpha^{\star})=x_{0}+\frac{T}{2c_{\rm L}}+\frac{T(b_{\circ}c_{\rm F}-1)}{c_{\rm F}\log(b_{\circ}c_{\rm F})}-\frac{T(b_{\circ}^{2}c_{\rm F}^{2}-1)}{4c_{\rm F}\log(b_{\circ}c_{\rm F})}.

It follows directly that this value is strictly smaller than the value of the first–best problem.
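As a quick numerical sanity check of these closed-form expressions, the following snippet evaluates $V_{\rm L}$ and $V_{\rm F}(\alpha^\star)$ under the benchmark parameters later used in Section 2.2.4 ($T=1$, $c_{\rm F}=c_{\rm L}=1$, $b_\circ=3$), with the illustrative choice $x_0=0$.

```python
import numpy as np

# Closed-form ACLM values for k = log(b_circ * c_F) / T, under the
# benchmark parameters of the numerical section; x_0 = 0 is illustrative.
T, cF, cL, b, x0 = 1.0, 1.0, 1.0, 3.0, 0.0

k = np.log(b * cF) / T
VL = x0 + T / (2 * cL) + T * (b * cF - 1) / (cF * np.log(b * cF))
VF = VL - T * (b**2 * cF**2 - 1) / (4 * cF * np.log(b * cF))

print(f"k = {k:.4f}, V_L = {VL:.4f}, V_F(alpha*) = {VF:.4f}")
# Output: k = 1.0986, V_L = 2.3205, V_F(alpha*) = 0.5000
```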

2.1.4 Adapted closed-loop strategies

Recall that when the information structure is assumed to be adapted closed-loop (with memory), both the leader and the follower observe the paths of the state $X$ and the underlying Brownian motion, and can use these observations to construct their strategies. Then, the leader's strategy space and the follower's response strategy space are respectively defined by

𝒜\displaystyle{\cal A} {α:[0,T]×Ω×𝒞([0,T],)A:𝔽W-adapted},\displaystyle\coloneqq\big{\{}\alpha:[0,T]\times\Omega\times{\cal C}([0,T],\mathbb{R})\longrightarrow A:\mathbb{F}^{W}\text{-adapted}\big{\}},
\displaystyle{\cal B} {β:[0,T]×Ω×𝒞([0,T],)×𝒜B:β(,α)𝔽W-adapted,α𝒜}.\displaystyle\coloneqq\big{\{}\beta:[0,T]\times\Omega\times{\cal C}([0,T],\mathbb{R})\times{\cal A}\longrightarrow B:\beta(\cdot,\alpha)\;\mathbb{F}^{W}\text{-adapted},\;\forall\alpha\in{\cal A}\big{\}}.

In our example, and under this particular information structure, the leader actually has enough information to deduce the effort of the follower. Therefore, if the leader has enough bargaining power, she may force the follower to undertake a recommended effort. More precisely, for $a_\circ$ sufficiently large, the leader is able to punish the follower if he deviates from the desired action. Indeed, suppose the leader wants to force the follower to perform the action $\hat\beta\in{\cal B}$ while herself playing an action $\hat\alpha\in{\cal A}$. One possible way to induce these strategies is for the leader to play

αtα^tp𝟙βtβ^t,\alpha_{t}\coloneqq\hat{\alpha}_{t}-p\mathds{1}_{\beta_{\text{$t$}}^{\text{$\circ$}}\neq\hat{\beta}_{\text{$t$}}},

for some penalty coefficient p0p\geq 0, and where β\beta^{\circ} represents the ‘reference’ effort, defined by

βt\displaystyle\beta_{t}^{\circ} limsupε0(βtβtεε),withβtXtσWt0tα^sds,t[0,T].\displaystyle\coloneqq\underset{\varepsilon\searrow 0}{\rm{limsup}}\;\bigg{(}\frac{\upbeta_{t}^{\circ}-\upbeta_{t-\varepsilon}^{\circ}}{\varepsilon}\bigg{)},\;\text{with}\;\upbeta_{t}^{\circ}\coloneqq X_{t}-\sigma W_{t}-\int_{0}^{t}\hat{\alpha}_{s}\mathrm{d}s,\;t\in[0,T].

In words, by implementing the strategy $\alpha$ defined above, the leader threatens to punish the follower whenever the observed effort $\beta^\circ$ deviates from the recommended effort $\hat\beta$. Note that the definition of $\beta^\circ$ makes use of the fact that the leader observes the trajectories of both the state and the Brownian motion. In particular, such a strategy $\alpha$ could not be implemented under the previous ACLM information structure.
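To illustrate how the leader can recover the follower's effort from her observations, the following discrete-time sketch simulates the state dynamics and reconstructs $\beta^\circ$ through the difference quotient replacing the limsup above. The time grid, the constant recommended effort, and the specific deviation profile beta_true are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch only).
T, n, sigma, x0 = 1.0, 1000, 1.0, 0.0
dt = T / n
t = np.linspace(0.0, T, n + 1)

alpha_hat = np.ones(n)                                # recommended leader effort
beta_true = 0.5 + 0.2 * np.sin(2 * np.pi * t[:-1])    # follower's actual effort

# Euler simulation of dX_t = (alpha_hat_t + beta_t) dt + sigma dW_t.
dW = rng.normal(0.0, np.sqrt(dt), n)
X = x0 + np.concatenate(([0.0], np.cumsum((alpha_hat + beta_true) * dt + sigma * dW)))
W = np.concatenate(([0.0], np.cumsum(dW)))

# Observed cumulative effort, and its difference quotient: the discrete
# analogue of the limsup defining beta^circ.
cum = X - x0 - sigma * W - np.concatenate(([0.0], np.cumsum(alpha_hat * dt)))
beta_obs = np.diff(cum) / dt

print(np.max(np.abs(beta_obs - beta_true)))   # ~ 0, up to floating-point error
```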

In general, we say the leader can effectively punish the follower for not playing β^\hat{\beta} if

α𝒜,JF(α,β^)JF(α,β),β,andJL(α,β^)JL(α^,β^).\displaystyle\exists\alpha\in{\cal A},J_{\rm F}(\alpha,\hat{\beta})\geq J_{\rm F}(\alpha,\beta),\;\forall\beta\in{\cal B},\;\text{and}\;J_{\rm L}(\alpha,\hat{\beta})\geq J_{\rm L}(\hat{\alpha},\hat{\beta}). (2.13)

In words, there exists an admissible strategy α𝒜\alpha\in{\cal A} such that the optimal response of the follower to α\alpha is to play β^\hat{\beta}, and there is no detriment to the leader’s utility when implementing the strategy α\alpha instead of α^\hat{\alpha}.

We mention that in this example, we actually have the equality JL(α,β^)=JL(α^,β^)J_{\rm L}(\alpha,\hat{\beta})=J_{\rm L}(\hat{\alpha},\hat{\beta}). More precisely, the leader can replicate the first-best solution by choosing α^cL1\hat{\alpha}\equiv c_{\rm L}^{-1} and forcing the follower’s action β^b\hat{\beta}\equiv b_{\circ}. Indeed, given the leader’s strategy αtcL1p𝟙βtb\alpha_{t}\coloneqq c_{\rm L}^{-1}-p\mathds{1}_{\beta_{t}^{\text{$\circ$}}\neq b_{\circ}}, we have for all β\beta\in{\cal B}

JF(α,b)JF(α,β)=𝔼[0T(bcF2b2+p𝟙βtbβt+cF2βt2)dt],\displaystyle J_{\rm F}(\alpha,b_{\circ})-J_{\rm F}(\alpha,\beta)=\mathbb{E}^{\mathbb{P}}\bigg{[}\int_{0}^{T}\bigg{(}b_{\circ}-\dfrac{c_{\rm F}}{2}b_{\circ}^{2}+p\mathds{1}_{\beta_{t}^{\text{$\circ$}}\neq b_{\circ}}-\beta_{t}+\dfrac{c_{\rm F}}{2}\beta_{t}^{2}\bigg{)}\mathrm{d}t\bigg{]},

and therefore, since the follower's most profitable deviation is $\beta_t=1/c_{\rm F}$, for which $\beta_t-c_{\rm F}\beta_t^2/2=(2c_{\rm F})^{-1}$, the effectiveness of the punishment amounts to $p\geq(2c_{\rm F})^{-1}+c_{\rm F}b_\circ^2/2-b_\circ$. This strategy can be implemented if the process $\alpha$ defined above is admissible, in the sense that it takes values in $A$. Therefore, if $a_\circ$ is sufficiently large, for instance if Condition (2.7) holds, then the solution to the ACL Stackelberg equilibrium in this example coincides with that of the first-best problem, given in Lemma 2.1.

Remark 2.2.

Let us remark that the previous argument shows that, for any Stackelberg game under closed-loop strategies in which the leader can punish the follower in a way that is effective and causes no additional cost, i.e. such that (2.13) holds for $(\hat\alpha,\hat\beta)$ being the solution to the first-best problem, the equality $V_{\rm L}=V_{\rm L}^{\rm FB}$ holds.

2.2 Closed-loop strategies

The approach we develop in this paper provides a way of studying and characterising a new, albeit natural, type of Stackelberg equilibrium, in which both players only have access to the trajectory of the state variable $X$. Consistent with the literature on stochastic differential games (see, for example, Carmona [16]), we name this information concept closed-loop (CL). Under this information structure, both players can only take into account the past trajectory of the state $X$ when making their decisions. Then, the leader's strategy space and the follower's response strategy space are respectively given by

𝒜\displaystyle{\cal A} {α:[0,T]×𝒞([0,T],)A:𝔽-adapted},\displaystyle\coloneqq\big{\{}\alpha:[0,T]\times{\cal C}([0,T],\mathbb{R})\longrightarrow A:\mathbb{F}\text{-adapted}\big{\}},
\displaystyle{\cal B} {β:[0,T]×𝒞([0,T],)×𝒜B:β(,α)𝔽-adapted,α𝒜},\displaystyle\coloneqq\big{\{}\beta:[0,T]\times{\cal C}([0,T],\mathbb{R})\times{\cal A}\longrightarrow B:\beta(\cdot,\alpha)\;\mathbb{F}\text{-adapted},\;\forall\alpha\in{\cal A}\big{\}},

where $\mathbb{F}$ denotes the filtration generated by $X$. As already mentioned in the introduction, allowing for path-dependency leads to a more realistic and sophisticated form of equilibrium, which is consequently more challenging to solve. In this case, the difficulty arises because the approaches developed above for solving the Stackelberg open-loop or closed-loop memoryless equilibria, which mostly rely on the maximum principle, can no longer be used. To the best of our knowledge, there is currently no method in the literature for solving Stackelberg games within the framework of this very general, yet quite natural, information structure.

The aim of this paper is, therefore, precisely to propose an approach, based on the dynamic programming principle and stochastic target problems, for characterising the solution for this type of equilibrium. Our methodology, which consists of two main steps, is informally illustrated through the example presented at the top of this section. The first step is to use the follower’s value function as a state variable for the leader’s problem. More precisely, this value function solves a backward SDE, and by writing it in a forward way, we are able to reformulate the leader’s problem as a stochastic control problem of an SDE system with stochastic target constraints. The second step consists in applying the methodology developed by [14] to characterise such a stochastic control problem with target constraints through a system of Hamilton–Jacobi–Bellman equations. Note that the reasoning developed in this section is quite informal, the aim being simply to illustrate our method; the reader is referred to Section 3 onwards for the rigorous description of our approach.

2.2.1 Reformulation as a stochastic target problem

Recall that, given the leader’s strategy α𝒜\alpha\in{\cal A}, the follower’s problem is given by (2.2). The idea of our approach to compute the Stackelberg equilibrium for closed-loop strategies is to consider the BSDE satisfied by the value function of the follower.121212Actually, one should switch to the weak formulation of the problem in order to consider the BSDE representation of the follower’s value. Nevertheless, once again our goal here is simply to illustrate our method, and we refer to Section 3 for the rigorous formulation of the problem. With this in mind, we introduce the dynamic value function of the follower given by

Ytαesssupβ𝔼[XTcF2tTβs2ds|t],t[0,T],\displaystyle Y_{t}^{\alpha}\coloneqq\operatorname*{ess\,sup}_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{t}^{T}\beta_{s}^{2}\mathrm{d}s\bigg{|}{\cal F}_{t}\bigg{]},\;t\in[0,T],

where the state variable XX follows the dynamics given by (2.1). By introducing the appropriate Hamiltonian, i.e.

HF(t,z,a)supbB{(a+b)zcF2b2},(t,z,a)[0,T]××A,\displaystyle H^{\rm F}(t,z,a)\coloneqq\sup_{b\in B}\bigg{\{}(a+b)z-\dfrac{c_{\rm F}}{2}b^{2}\bigg{\}},\;(t,z,a)\in[0,T]\times\mathbb{R}\times A,

it is easy to show that, for a given α𝒜\alpha\in{\cal A}, the value function of the follower is a solution to the following BSDE

dYtα=HF(t,Ztα,αt)dt+ZtαdXt,t[0,T],YTα=XT,\displaystyle\mathrm{d}Y^{\alpha}_{t}=-H^{\rm F}(t,Z^{\alpha}_{t},\alpha_{t})\mathrm{d}t+Z^{\alpha}_{t}\mathrm{d}X_{t},\;t\in[0,T],\;Y^{\alpha}_{T}=X_{T},

for some $Z^{\alpha}\in{\cal Z}$, where ${\cal Z}$ is a set of $\mathbb{F}$-adapted processes taking values in $\mathbb{R}$ and satisfying appropriate integrability conditions. The maximiser of the Hamiltonian is naturally given by the functional $b^{\star}(z)=\Pi_{\tilde B}(z)/c_{\rm F}$, $z\in\mathbb{R}$, where $\Pi_{\tilde B}(z)$ denotes the projection of $z$ on $\tilde B\coloneqq[0,b_\circ c_{\rm F}]$. For a given strategy $\alpha\in{\cal A}$ chosen by the leader, we are thus led to consider the following FBSDE system

{dXt=(αt+1cFΠB~(Ztα))dt+σdWt,t[0,T],X0=x0,dYtα=12cFΠB~2(Ztα)dt+σZtαdWt,t[0,T],YTα=XT.\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg{(}\alpha_{t}+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}\big{(}Z_{t}^{\alpha}\big{)}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt] \displaystyle\mathrm{d}Y_{t}^{\alpha}=\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}\big{(}Z_{t}^{\alpha}\big{)}\mathrm{d}t+\sigma Z^{\alpha}_{t}\mathrm{d}W_{t},\;t\in[0,T],\;Y^{\alpha}_{T}=X_{T}.\end{cases} (2.14)
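For the reader's convenience, let us verify the drift of $Y^\alpha$ in (2.14). Since the supremum defining $H^{\rm F}$ is attained at $b^\star(z)=\Pi_{\tilde B}(z)/c_{\rm F}$, we have $H^{\rm F}(t,z,a)=az+z\Pi_{\tilde B}(z)/c_{\rm F}-\Pi^2_{\tilde B}(z)/(2c_{\rm F})$, and therefore, substituting the dynamics of $X$ at the optimum,

\mathrm{d}Y^{\alpha}_{t}=\bigg(-\alpha_{t}Z^{\alpha}_{t}-\dfrac{Z^{\alpha}_{t}\Pi_{\tilde B}(Z^{\alpha}_{t})}{c_{\rm F}}+\dfrac{\Pi^{2}_{\tilde B}(Z^{\alpha}_{t})}{2c_{\rm F}}+Z^{\alpha}_{t}\bigg(\alpha_{t}+\dfrac{\Pi_{\tilde B}(Z^{\alpha}_{t})}{c_{\rm F}}\bigg)\bigg)\mathrm{d}t+\sigma Z^{\alpha}_{t}\mathrm{d}W_{t}=\dfrac{\Pi^{2}_{\tilde B}(Z^{\alpha}_{t})}{2c_{\rm F}}\mathrm{d}t+\sigma Z^{\alpha}_{t}\mathrm{d}W_{t},

which is precisely the second equation in (2.14).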

Consequently, the leader’s problem defined by (2.4) becomes

VL(x0)=supα𝒜𝔼[XTcL20Tαt2dt], subject to the FBSDE system (2.14).\displaystyle V_{\rm L}(x_{0})=\sup_{\alpha\in{\cal A}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},\text{ subject to the FBSDE system \eqref{eq:ACL-follower-example}}.

Unfortunately, the literature on the optimal control of FBSDEs is quite scarce and, to the best of our knowledge, cannot accommodate the scenario described above; see for instance Yong [81] or Wu [79]. Nevertheless, to continue the reformulation of the leader's problem, one can write the BSDE in (2.14) as a forward SDE for a given initial condition $y_0\in\mathbb{R}$, and thus consider the following SDE system

{dXt=(αt+1cFΠB~(Zt))dt+σdWt,t[0,T],X0=x0,dYt=12cFΠB~2(Zt)dt+σZtdWt,t[0,T],Y0=y0,\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{t}=\bigg{(}\alpha_{t}+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(Z_{t})\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T],\;X_{0}=x_{0},\\[8.00003pt] \displaystyle\mathrm{d}Y_{t}=\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(Z_{t})\mathrm{d}t+\sigma Z_{t}\mathrm{d}W_{t},\;t\in[0,T],\;Y_{0}=y_{0},\end{cases} (2.15)

for some (α,Z)𝒜×𝒵(\alpha,Z)\in{\cal A}\times{\cal Z}. However, by doing so, one needs to take into account an additional constraint, namely a stochastic target constraint, in order to ensure that the equality YT=XTY_{T}=X_{T} holds with probability one at the end of the game. More precisely, one of the main results of our paper, stated for the general framework in Theorem 4.6, is that the leader’s problem originally defined here by (2.4) is equivalent to the following stochastic target problem

V^L(x0)supy0sup(Z,α)(x0,y0)𝔼[XTcL20Tαt2dt],\displaystyle\widehat{V}_{\rm L}(x_{0})\coloneqq\sup_{y_{\text{$0$}}\in\mathbb{R}}\sup_{(Z,\alpha)\in{\mathfrak{C}}(x_{\text{$0$}},y_{\text{$0$}})}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]},

subject to the system (2.15), and where (x0,y0){(Z,α)𝒵×𝒜:YT=XT,–a.s.}\mathfrak{C}(x_{0},y_{0})\coloneqq\{(Z,\alpha)\in{\cal Z}\times{\cal A}:Y_{T}=X_{T},\;\mathbb{P}\text{\rm--a.s.}\}, for any (x0,y0)2(x_{0},y_{0})\in\mathbb{R}^{2}.

2.2.2 Interpretation of the reformulated problem

The interpretation of the reformulated problem V^L\widehat{V}_{\rm L} is the following. For fixed y0y_{0}\in\mathbb{R}, the leader’s problem is to choose a couple (Z,α)(Z,\alpha) of admissible controls. With this in mind, given the state XX observable in continuous time, she can construct an additional process YY, starting from Y0=y0Y_{0}=y_{0}, with the following dynamics

dYt=HF(t,Zt,αt)dt+ZtdXt,t[0,T].\displaystyle\mathrm{d}Y_{t}=-H^{\rm F}(t,Z_{t},\alpha_{t})\mathrm{d}t+Z_{t}\mathrm{d}X_{t},\;t\in[0,T].

Note that the previous process YY can be constructed based solely on the observation through time of the path of XX, and in particular does not require any knowledge of the follower’s control β\beta nor of the trajectory of the underlying Brownian motion WW. Now, the couple (Z,α)(Z,\alpha) of admissible processes chosen by the leader should be such that the terminal condition YT=XTY_{T}=X_{T} is satisfied \mathbb{P}–a.s. Indeed, under this important condition, the follower’s problem originally defined by (2.2) can be rewritten as

VF(α)supβ𝔼[XTcF20Tβt2dt]=supβ𝔼[YTcF20Tβt2dt].\displaystyle V_{\rm F}(\alpha)\coloneqq\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}=\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}Y_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}.

With the knowledge of the dynamics of $Y$, as well as of the leader's controls $(Z,\alpha)$, the follower can easily solve his own optimisation problem

VF(α)\displaystyle V_{\rm F}(\alpha) =y0+supβ𝔼[0THF(t,Zt,αt)dt+0TZtdXtcF20Tβt2dt]\displaystyle=y_{0}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}-\int_{0}^{T}H^{\rm F}(t,Z_{t},\alpha_{t})\mathrm{d}t+\int_{0}^{T}Z_{t}\mathrm{d}X_{t}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}
=y0+supβ𝔼[0TsupbB{(αt+b)ZtcF2b2}dt+0TZt(αt+βt)dt+0TσZtdWtcF20Tβt2dt]\displaystyle=y_{0}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}-\int_{0}^{T}\sup_{b\in B}\bigg{\{}(\alpha_{t}+b)Z_{t}-\dfrac{c_{\rm F}}{2}b^{2}\bigg{\}}\mathrm{d}t+\int_{0}^{T}Z_{t}(\alpha_{t}+\beta_{t})\mathrm{d}t+\int_{0}^{T}\sigma Z_{t}\mathrm{d}W_{t}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}
\displaystyle=y_{0}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg[-\int_{0}^{T}\sup_{b\in B}\bigg\{bZ_{t}-\dfrac{c_{\rm F}}{2}b^{2}\bigg\}\mathrm{d}t+\int_{0}^{T}\bigg(Z_{t}\beta_{t}-\dfrac{c_{\rm F}}{2}\beta_{t}^{2}\bigg)\mathrm{d}t\bigg],

making it clear, at least heuristically here, that his best response strategy is to choose $\beta_t\coloneqq\Pi_{\tilde B}(Z_t)/c_{\rm F}$, $t\in[0,T]$, as this choice achieves the pointwise supremum inside the first integral. This optimal choice provides him with the maximal value, for any $(\alpha,Z)\in{\cal A}\times{\cal Z}$. In particular, if $Z_t\in\tilde B$ for all $t\in[0,T]$, then $V_{\rm F}(\alpha)=y_0$. To summarise, for a given $y_0\in\mathbb{R}$, which actually coincides with the follower's value, the leader designs her strategy, characterised by the couple $(Z,\alpha)$, so that $Y_T=X_T$ is satisfied $\mathbb{P}$--a.s. for the well-chosen process $Y$, inducing the follower's optimal response $\beta_\cdot\coloneqq\Pi_{\tilde B}(Z_\cdot)/c_{\rm F}$. Note that the leader should not only communicate the couple $(Z,\alpha)$ of controls to the follower, but also indicate how these controls are designed, namely through the construction of the underlying process $Y$: all these ingredients are part of the strategy implemented by the leader.
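To make this interpretation concrete, here is a minimal discrete-time sketch in which the leader constructs $Y$ forward using only the observed increments of $X$. As a sanity check, we take the constant controls $Z\equiv1$ and $\alpha\equiv-a_\circ$, started from $y_0=x_0+(1/(2c_{\rm F})-a_\circ)T$, a choice for which the target constraint is met exactly; these are precisely the controls that will reappear on the boundary $\{y=w^-\}$ in the next subsection. All numerical values are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions for this sketch).
T, n, sigma, cF, a0, b0, x0 = 1.0, 1000, 1.0, 1.0, 10.0, 3.0, 0.0
dt = T / n

proj = lambda z: np.clip(z, 0.0, b0 * cF)    # projection on B_tilde

# Lower-boundary controls Z = 1, alpha = -a0, started from y0 = w^-(0, x0).
Z, a = 1.0, -a0
y0 = x0 + (1.0 / (2 * cF) - a0) * T

X, Y = x0, y0
for i in range(n):
    beta = proj(Z) / cF                      # follower's best response
    HF = (a + beta) * Z - cF / 2 * beta**2   # H^F(t, Z, alpha) at the optimum
    dX = (a + beta) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
    Y += -HF * dt + Z * dX                   # uses only the observed increment of X
    X += dX

print(abs(Y - X))   # ~ 0: the target constraint Y_T = X_T is met
```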

2.2.3 Characterisation of the equilibrium

Given the reformulation of the leader’s problem as a stochastic control problem with stochastic target constraint, the second step consists now in applying the methodology in [14] to solve the latter problem and thus obtain a characterisation of the corresponding Stackelberg equilibrium.

Recall that in our illustrative example, the leader’s reformulated problem takes the following form

V^L(x0)supy0V~L(0,x0,y0),whereV~L(t,x,y)sup(Z,α)(t,x,y)𝔼[XTt,x,Z,αcL2tTαs2ds],\displaystyle\widehat{V}_{\rm L}(x_{0})\coloneqq\sup_{y_{\text{$0$}}\in\mathbb{R}}\widetilde{V}_{\rm L}(0,x_{0},y_{0}),\;\text{where}\;\widetilde{V}_{\rm L}(t,x,y)\coloneqq\sup_{(Z,\alpha)\in{\mathfrak{C}}(t,x,y)}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}^{t,x,Z,\alpha}-\dfrac{c_{\rm L}}{2}\int_{t}^{T}\alpha_{s}^{2}\mathrm{d}s\bigg{]}, (2.16)

where for (t,x,y)[0,T]×2(t,x,y)\in[0,T]\times\mathbb{R}^{2}, the set (t,x,y){\mathfrak{C}}(t,x,y) is defined by

(t,x,y){(Z,α)𝒵×𝒜:YTt,y,Z,α=XTt,x,Z,α,–a.s.},\displaystyle{\mathfrak{C}}(t,x,y)\coloneqq\big{\{}(Z,\alpha)\in{\cal Z}\times{\cal A}:Y_{T}^{t,y,Z,\alpha}=X_{T}^{t,x,Z,\alpha},\;\mathbb{P}\text{\rm--a.s.}\big{\}},

with the controlled state variables XX and YY satisfying the following dynamics

{dXst,x,Z,α=(αs+1cFΠB~(Zs))ds+σdWs,s[t,T],Xtt,x,Z,α=x,dYst,y,Z,α=12cFΠB~2(Zs)ds+σZsdWs,s[t,T],Ytt,y,Z,α=y.\displaystyle\begin{cases}\displaystyle\mathrm{d}X_{s}^{t,x,Z,\alpha}=\bigg{(}\alpha_{s}+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(Z_{s})\bigg{)}\mathrm{d}s+\sigma\mathrm{d}W_{s},\;s\in[t,T],\;X_{t}^{t,x,Z,\alpha}=x,\\[8.00003pt] \displaystyle\mathrm{d}Y_{s}^{t,y,Z,\alpha}=\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(Z_{s})\mathrm{d}s+\sigma Z_{s}\mathrm{d}W_{s},\;s\in[t,T],\;Y_{t}^{t,y,Z,\alpha}=y.\end{cases} (2.17)

In particular, for fixed (t,x,y)[0,T]×2(t,x,y)\in[0,T]\times\mathbb{R}^{2}, V~L(t,x,y)\widetilde{V}_{\rm L}(t,x,y) corresponds to the dynamic value function of an optimal control problem with stochastic target constraints. Thus, we define for any t[0,T]t\in[0,T] the target reachability set

VG(t){(x,y)2:(Z,α)𝒵×𝒜,YTt,y,Z,α=XTt,x,Z,α,–a.s.}.\displaystyle V_{G}(t)\coloneqq\big{\{}(x,y)\in\mathbb{R}^{2}:\exists(Z,\alpha)\in{\cal Z}\times{\cal A},\;Y_{T}^{t,y,Z,\alpha}=X_{T}^{t,x,Z,\alpha},\;\mathbb{P}\text{\rm--a.s.}\big{\}}.

An intermediate but important result for our approach, see Lemma 5.3, is that the closure of the reachability set $V_G(t)$ coincides with the following set

V^G(t){(x,y)2:w(t,x)yw+(t,x)},\displaystyle\hat{V}_{G}(t)\coloneqq\{(x,y)\in\mathbb{R}^{2}:w^{-}(t,x)\leq y\leq w^{+}(t,x)\},

for appropriate auxiliary functions ww^{-} and w+w^{+}. It is then almost straightforward to extend the approach in [14] to characterise the leader’s value function V~L\widetilde{V}_{\rm L} as the solution to a specific system of Hamilton–Jacobi–Bellman (HJB) equations and therefore determine the corresponding optimal strategy. More precisely, this can be achieved in three main steps. First, the auxiliary functions ww^{-} and w+w^{+} can be characterised as solutions (in an appropriate sense) to specific HJB equations. Then, the leader’s value function V~L\widetilde{V}_{\rm L} satisfies another specific HJB equation on each of these boundaries. Finally, in the interior of the domain, V~L\widetilde{V}_{\rm L} is a solution to the classical HJB equation, but with the non-standard boundary conditions obtained in the previous step, see Theorem 5.8. These three steps are described below in the framework of our illustrative example.

The auxiliary functions.

The lower and upper boundaries $w^-$ and $w^+$ can be characterised as the solutions to the following HJB equations, for $(t,x)\in[0,T)\times\mathbb{R}$:

tw+(t,x)H+(t,x,xw+(t,x),xxw+(t,x))=0,tw(t,x)H(t,x,xw(t,x),xxw(t,x))=0,\displaystyle\displaystyle-\partial_{t}w^{+}(t,x)-H^{+}(t,x,\partial_{x}w^{+}(t,x),\partial_{xx}w^{+}(t,x))=0,\;\displaystyle-\partial_{t}w^{-}(t,x)-H^{-}(t,x,\partial_{x}w^{-}(t,x),\partial_{xx}w^{-}(t,x))=0,

with terminal condition w(T,x)=w+(T,x)=xw^{-}(T,x)=w^{+}(T,x)=x, xx\in\mathbb{R}, and where for all (t,x,p,q)[0,T]×3(t,x,p,q)\in[0,T]\times\mathbb{R}^{3}

H+(t,x,p,q)\displaystyle H^{+}(t,x,p,q) sup(z,a)N(t,x,p)hb(p,q,z,a),H(t,x,p,q)inf(z,a)N(t,x,p)hb(p,q,z,a),\displaystyle\coloneqq\sup_{(z,a)\in N(t,x,p)}h^{b}(p,q,z,a),\;H^{-}(t,x,p,q)\coloneqq\inf_{(z,a)\in N(t,x,p)}h^{b}(p,q,z,a),
withhb(p,q,z,a)\displaystyle\text{with}\;h^{b}(p,q,z,a) 12cFΠB~2(z)+(a+1cFΠB~(z))p+12σ2q,for(z,a)N(t,x,p){(z,a)×A:σz=σp}.\displaystyle\coloneqq-\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(z)+\bigg{(}a+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(z)\bigg{)}p+\dfrac{1}{2}\sigma^{2}q,\;\text{for}\;(z,a)\in N(t,x,p)\coloneqq\{(z,a)\in\mathbb{R}\times A:\sigma z=\sigma p\}.

Since σ0\sigma\neq 0, the constraint set NN boils down to N(t,x,p)={(p,a):aA}N(t,x,p)=\{(p,a):a\in A\}, for all (t,x,p)[0,T]×2(t,x,p)\in[0,T]\times\mathbb{R}^{2}. Using in addition the ansatz xw±B~\partial_{x}w^{\pm}\in\tilde{B}, one obtains the following HJB equations on (t,x)[0,T)×(t,x)\in[0,T)\times\mathbb{R}

tw(t,x)12σ2xxw(t,x)12cF(xw(t,x))2infaA{xw(t,x)a}\displaystyle-\partial_{t}w^{-}(t,x)-\dfrac{1}{2}\sigma^{2}\partial_{xx}w^{-}(t,x)-\dfrac{1}{2c_{\rm F}}\big{(}\partial_{x}w^{-}(t,x)\big{)}^{2}-\inf_{a\in A}\big{\{}\partial_{x}w^{-}(t,x)a\big{\}} =0,\displaystyle=0,
tw+(t,x)12σ2xxw+(t,x)12cF(xw+(t,x))2supaA{xw+(t,x)a}\displaystyle-\partial_{t}w^{+}(t,x)-\dfrac{1}{2}\sigma^{2}\partial_{xx}w^{+}(t,x)-\dfrac{1}{2c_{\rm F}}\big{(}\partial_{x}w^{+}(t,x)\big{)}^{2}-\sup_{a\in A}\big{\{}\partial_{x}w^{+}(t,x)a\big{\}} =0,\displaystyle=0,

with terminal condition $w^-(T,x)=w^+(T,x)=x$, $x\in\mathbb{R}$. Recalling that $A=[-a_\circ,a_\circ]$, one can explicitly compute the auxiliary functions, solutions to the previous HJB equations:

w(t,x)=x+(12cFa)(Tt),andw+(t,x)=x+(12cF+a)(Tt),(t,x)[0,T]×.\displaystyle w^{-}(t,x)=x+\bigg{(}\dfrac{1}{2c_{\rm F}}-a_{\circ}\bigg{)}(T-t),\;\text{and}\;w^{+}(t,x)=x+\bigg{(}\dfrac{1}{2c_{\rm F}}+a_{\circ}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R}. (2.18)
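These expressions can be checked directly: assuming $b_\circ c_{\rm F}\geq1$, so that the ansatz $\partial_xw^\pm=1\in\tilde B$ is indeed verified, and since $\partial_{xx}w^\pm=0$, we get for $w^-$

-\partial_tw^-(t,x)-\dfrac{1}{2}\sigma^2\partial_{xx}w^-(t,x)-\dfrac{1}{2c_{\rm F}}\big(\partial_xw^-(t,x)\big)^2-\inf_{a\in A}\big\{a\,\partial_xw^-(t,x)\big\}=\bigg(\dfrac{1}{2c_{\rm F}}-a_\circ\bigg)-\dfrac{1}{2c_{\rm F}}+a_\circ=0,

and similarly for $w^+$, while the terminal condition $w^\pm(T,x)=x$ holds by construction.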
Remark 2.3.

Let us remark that, in the context of this example, the boundedness assumption on $A$ is necessary to obtain meaningful, i.e. finite, solutions. Though the methodology developed in [14] can cover the case of unbounded action sets, this would require imposing growth conditions that, in turn, would rule out the framework of the current example. Moreover, we also remark that the possibility of discontinuous or exploding solutions requires working with solutions to the above PDEs in the viscosity sense.

The value function at the boundaries.

The second step is to determine the HJB equations satisfied by the value function V~L(t,x,y)\widetilde{V}_{\rm L}(t,x,y) on the boundaries, i.e. on {y=w(t,x)}\{y=w^{-}(t,x)\} and {y=w+(t,x)}\{y=w^{+}(t,x)\}, for all (t,x)[0,T]×(t,x)\in[0,T]\times\mathbb{R}. With this in mind, we define for all p(p1,p2)2p\coloneqq(p_{1},p_{2})^{\top}\in\mathbb{R}^{2}, q2×2q\in\mathbb{R}^{2\times 2} and (z,a)×A(z,a)\in\mathbb{R}\times A,

h(p,q,z,a)cL2a2+(a+1cFΠB~(z))p1+12cFΠB~2(z)p2+12σ2q11+12σ2z2q22+σ2zq12.\displaystyle{\rm h}(p,q,z,a)\coloneqq-\dfrac{c_{\rm L}}{2}a^{2}+\bigg{(}a+\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(z)\bigg{)}p_{1}+\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(z)p_{2}+\dfrac{1}{2}\sigma^{2}q_{11}+\dfrac{1}{2}\sigma^{2}z^{2}q_{22}+\sigma^{2}zq_{12}.

We then introduce the following Hamiltonians, for all (t,x,p,q)[0,T]××2×3(t,x,p,q)\in[0,T]\times\mathbb{R}\times\mathbb{R}^{2}\times\mathbb{R}^{3},

H(t,x,p,q)sup(z,a)𝒵(t,x)h(p,q,z,a),andH+(t,x,p,q)sup(z,a)𝒵+(t,x)h(p,q,z,a),\displaystyle{\rm H}^{-}(t,x,p,q)\coloneqq\sup_{(z,a)\in{\cal Z}^{\text{$-$}}(t,x)}{\rm h}(p,q,z,a),\;\text{and}\;{\rm H}^{+}(t,x,p,q)\coloneqq\sup_{(z,a)\in{\cal Z}^{\text{$+$}}(t,x)}{\rm h}(p,q,z,a),

in which the sets 𝒵±(t,x){\cal Z}^{\pm}(t,x) are respectively defined by

𝒵(t,x)\displaystyle{\cal Z}^{-}(t,x) {(z,a)×A:σz=σxw(t,x),andtw(t,x)hb(xw(t,x),xxw(t,x),z,a)0},\displaystyle\coloneqq\big{\{}(z,a)\in\mathbb{R}\times A:\sigma z=\sigma\partial_{x}w^{-}(t,x),\;\text{and}\;-\partial_{t}w^{-}(t,x)-h^{b}(\partial_{x}w^{-}(t,x),\partial_{xx}w^{-}(t,x),z,a)\geq 0\big{\}},
𝒵+(t,x)\displaystyle{\cal Z}^{+}(t,x) {(z,a)×A:σz=σxw+(t,x),andtw+(t,x)hb(xw+(t,x),xxw+(t,x),z,a)0}.\displaystyle\coloneqq\big{\{}(z,a)\in\mathbb{R}\times A:\sigma z=\sigma\partial_{x}w^{+}(t,x),\;\text{and}\;-\partial_{t}w^{+}(t,x)-h^{b}(\partial_{x}w^{+}(t,x),\partial_{xx}w^{+}(t,x),z,a)\leq 0\big{\}}.

On the one hand, the value function V~L\widetilde{V}_{\rm L} should satisfy on {y=w(t,x)}\{y=w^{-}(t,x)\} the following equation

tv(t,x,y)H(t,x,xv(t,x,y),x2v(t,x,y))=0,(t,x,y)[0,T)×2,\displaystyle-\partial_{t}v(t,x,y)-{\rm H}^{-}(t,x,\partial_{\rm x}v(t,x,y),\partial^{2}_{\rm x}v(t,x,y))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{2},

with terminal condition $v(T,x,w^-(T,x))=x$, $x\in\mathbb{R}$. (Here, $\partial_{\rm x}v(t,x,y)$ and $\partial^2_{\rm x}v(t,x,y)$ denote respectively the gradient and Hessian of the function $v$ in both space variables ${\rm x}\coloneqq(x,y)$.) Given the previous HJB equation satisfied by $w^-$, it is clear that ${\cal Z}^-(t,x)=\{(1,-a_\circ)\}$, for all $(t,x)\in[0,T)\times\mathbb{R}$. We thus obtain a standard PDE for $\widetilde{V}_{\rm L}$ on $\{y=w^-(t,x)\}$, namely

tv+12cLa2(1cFa)xv12cFyv12σ2xxv12σ2yyvσ2xyv=0,(t,x)[0,T]×,\displaystyle-\partial_{t}v+\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}-\bigg{(}\dfrac{1}{c_{\rm F}}-a_{\circ}\bigg{)}\partial_{x}v-\dfrac{1}{2c_{\rm F}}\partial_{y}v-\dfrac{1}{2}\sigma^{2}\partial_{xx}v-\dfrac{1}{2}\sigma^{2}\partial_{yy}v-\sigma^{2}\partial_{xy}v=0,\;(t,x)\in[0,T]\times\mathbb{R},

which, via the affine ansatz $v(t,x,y)=x+c(T-t)$ for a constant $c$ to be determined, leads to the following solution

V~L(t,x,w(t,x))=x+(a12cLa2+1cF)(Tt),(t,x)[0,T]×.\displaystyle\widetilde{V}_{\rm L}(t,x,w^{-}(t,x))=x+\bigg{(}-a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}+\dfrac{1}{c_{\rm F}}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R}.

On the other hand, on {y=w+(t,x)}\{y=w^{+}(t,x)\}, the value function should be solution to

tv(t,x,y)H+(t,x,xv(t,x,y),x2v(t,x,y))=0,(t,x,y)[0,T)×2,\displaystyle-\partial_{t}v(t,x,y)-{\rm H}^{+}(t,x,\partial_{\rm x}v(t,x,y),\partial^{2}_{\rm x}v(t,x,y))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{2},

with terminal condition v(T,x,w+(T,x))=xv(T,x,w^{+}(T,x))=x, xx\in\mathbb{R}. Through similar computations, one obtains

V~L(t,x,w+(t,x))=x+(a12cLa2+1cF)(Tt),(t,x)[0,T]×.\displaystyle\widetilde{V}_{\rm L}(t,x,w^{+}(t,x))=x+\bigg{(}a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}+\dfrac{1}{c_{\rm F}}\bigg{)}(T-t),\;(t,x)\in[0,T]\times\mathbb{R}.
The value function inside the domain.

Finally, for (t,x)[0,T]×(t,x)\in[0,T]\times\mathbb{R} and y(w(t,x),w+(t,x))y\in(w^{-}(t,x),w^{+}(t,x)), the value function V~L\widetilde{V}_{\rm L} is solution to the classical HJB equation for stochastic control, namely

tv(t,x,y)HL(xv(t,x,y),x2v(t,x,y))=0,whereHL(p,q)sup(z,a)×Ah(p,q,z,a),(p,q)2×2×2,\displaystyle-\partial_{t}v(t,x,y)-H^{\rm L}(\partial_{\rm x}v(t,x,y),\partial^{2}_{\rm x}v(t,x,y))=0,\;\text{where}\;H^{\rm L}(p,q)\coloneqq\sup_{(z,a)\in\mathbb{R}\times A}{\rm h}(p,q,z,a),\;(p,q)\in\mathbb{R}^{2}\times\mathbb{R}^{2\times 2},

but instead of the usual terminal condition, we need to enforce the specific boundary conditions obtained in the previous step, i.e. for (t,x)[0,T]×(t,x)\in[0,T]\times\mathbb{R},

v(t,x,w(t,x))=x+(1cFa12cLa2)(Tt),andv(t,x,w+(t,x))=x+(1cF+a12cLa2)(Tt).\displaystyle v(t,x,w^{-}(t,x))=x+\bigg{(}\dfrac{1}{c_{\rm F}}-a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}\bigg{)}(T-t),\;\text{and}\;v(t,x,w^{+}(t,x))=x+\bigg{(}\dfrac{1}{c_{\rm F}}+a_{\circ}-\dfrac{1}{2}c_{\rm L}a_{\circ}^{2}\bigg{)}(T-t).

The previous HJB equation can be slightly simplified as follows

tvsupaA{axvcL2a2}supz{1cFΠB~(z)xv+12cFΠB~2(z)yv+12σ2z2yyv+σ2zxyv}12σ2xxv=0.\displaystyle-\partial_{t}v-\sup_{a\in A}\bigg{\{}a\partial_{x}v-\dfrac{c_{\rm L}}{2}a^{2}\bigg{\}}-\sup_{z\in\mathbb{R}}\bigg{\{}\dfrac{1}{c_{\rm F}}\Pi_{\tilde{B}}(z)\partial_{x}v+\dfrac{1}{2c_{\rm F}}\Pi^{2}_{\tilde{B}}(z)\partial_{y}v+\dfrac{1}{2}\sigma^{2}z^{2}\partial_{yy}v+\sigma^{2}z\partial_{xy}v\bigg{\}}-\dfrac{1}{2}\sigma^{2}\partial_{xx}v=0. (2.19)

Though no explicit solution seems to be available, the previous system can be solved numerically. Once this is achieved, it remains, for fixed $x\in\mathbb{R}$, to maximise $v(0,x,y)$ over $y\in(w^-(0,x),w^+(0,x))$. The optimal $y_0\in[w^-(0,x),w^+(0,x)]$ and the corresponding value $v(0,x,y_0)$ respectively give the follower's and the leader's value functions, namely $V_{\rm F}$ and $V_{\rm L}$, for the initial condition $X_0=x$. The numerical results are presented in the following section.
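To give an idea of how this numerical resolution can be carried out, here is a minimal explicit finite-difference sketch for (2.19) with the boundary data above, under the benchmark parameters of the next subsection. The grid sizes, the truncation of the search grid for $z$, and the use of central differences are assumptions made for illustration only; no claim of monotonicity or convergence of this scheme is made.

```python
import numpy as np

# Explicit finite-difference sketch for the HJB equation (2.19), with the
# Dirichlet data on y = w^{+/-}(t,x) obtained above. Grid sizes, the
# truncation of the z-search and the benchmark parameters are assumptions.
T, sigma, cF, cL, a0, b0 = 1.0, 1.0, 1.0, 1.0, 10.0, 3.0

Nx, Ny, Nt = 61, 113, 1000
x = np.linspace(-3.0, 3.0, Nx)
y = np.linspace(-13.5, 14.5, Ny)
dx, dy, dt = x[1] - x[0], y[1] - y[0], T / Nt
X, Y = np.meshgrid(x, y, indexing="ij")
z_grid = np.linspace(-2.0, 2.0, 21)              # truncated search grid for z
proj = lambda z: min(max(z, 0.0), b0 * cF)       # projection on B_tilde

def apply_boundary(v, t):
    eta = Y - X                                   # y - x
    lo = (1 / (2 * cF) - a0) * (T - t)            # w^-(t,x) - x
    hi = (1 / (2 * cF) + a0) * (T - t)            # w^+(t,x) - x
    v = np.where(eta <= lo, X + (1/cF - a0 - cL*a0**2/2) * (T - t), v)
    v = np.where(eta >= hi, X + (1/cF + a0 - cL*a0**2/2) * (T - t), v)
    return v

v = X.copy()                                      # terminal condition v(T,x,y) = x
for m in range(Nt, 0, -1):
    t = (m - 1) * dt
    vx = np.gradient(v, dx, axis=0); vy = np.gradient(v, dy, axis=1)
    vxx = np.gradient(vx, dx, axis=0); vyy = np.gradient(vy, dy, axis=1)
    vxy = np.gradient(vx, dy, axis=1)
    a_star = np.clip(vx / cL, -a0, a0)            # analytic maximiser in a
    ham = a_star * vx - cL / 2 * a_star**2 + sigma**2 / 2 * vxx
    best_z = np.full_like(v, -np.inf)
    for z in z_grid:                              # brute-force maximisation in z
        p = proj(z)
        best_z = np.maximum(best_z, p/cF*vx + p**2/(2*cF)*vy
                            + sigma**2/2 * z**2 * vyy + sigma**2 * z * vxy)
    v = apply_boundary(v + dt * (ham + best_z), t)

# Leader's value at x0 = 0: maximise v(0, x0, .) over the admissible y-range.
i0 = Nx // 2
inside = (y - x[i0] > (1/(2*cF) - a0) * T) & (y - x[i0] < (1/(2*cF) + a0) * T)
j0 = int(np.argmax(np.where(inside, v[i0], -np.inf)))
print("y0* =", y[j0], "  V_L(x0=0) ~", v[i0, j0])
```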

2.2.4 Comparison with other solution concepts and numerical results

For the numerical results, we first consider a benchmark scenario in Figure 1, with parameters $T=1$, $c_{\rm F}=c_{\rm L}=1$, $\sigma=1$, $a_\circ=10$, and $b_\circ=3$; the cost of effort is therefore identical for the leader and the follower. We then study in Figure 3 a scenario in which the follower's cost of effort increases to $c_{\rm F}=1.25$, and conversely in Figure 2 one in which the leader's cost of effort increases to $c_{\rm L}=1.25$. Finally, we represent in Figure 4 the impact of an increase of $a_\circ$ from $10$ to $20$. Note that in these four scenarios, $a_\circ$ is chosen sufficiently large so that Conditions (2.6) and (2.7) are satisfied.

Figure 1: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=c_{\rm L}=1$, $\sigma=1$, $a_\circ=10$, and $b_\circ=3$.

First of all, we remark that for the four sets of parameters, we always have the following inequalities for the leader’s value function,

VLAOL=VLAF<VLACLM,VLCL<VLACL=VLFB,\displaystyle V_{\rm L}^{\rm AOL}=V_{\rm L}^{\rm AF}<V_{\rm L}^{\rm ACLM},V_{\rm L}^{\rm CL}<V_{\rm L}^{\rm ACL}=V_{\rm L}^{\rm FB},

and the converse inequalities for the follower’s value. In particular, for these chosen sets of parameters, the leader’s value in the closed-loop equilibrium is higher than her value in the adapted closed-loop memoryless scenario.

Comparing Figure 1 with Figure 2, one can observe that the increase in the leader's cost of effort negatively impacts both her and the follower's value in every equilibrium concept. Comparing now Figure 1 with Figure 3, we can observe that when the follower's cost of effort slightly increases, it negatively impacts both his and the leader's value for almost all equilibrium concepts, except in the ACL/first-best case. Indeed, in this scenario, the leader's value function remains unchanged, as the follower will always exert the maximal effort $b_\circ$. Therefore, only the follower's value is impacted by the increase in his cost.

Finally, comparing Figure 4 with the benchmark in Figure 1, one can notice that increasing the parameter $a_\circ$, representing the maximum absolute value of the leader's effort, only impacts the values in the CL case. Indeed, in the FB, ACL, AOL and AF cases, the leader will always exert the optimal effort $1/c_{\rm L}$, independently of $a_\circ$. However, in the closed-loop equilibrium, when $a_\circ$ increases, the leader has more bargaining power to incentivise the follower to exert a higher effort. More precisely, when studying the partial differential equations satisfied by the boundaries $w^\pm$, one can notice that if $a_\circ$ increases, the cone formed by the boundaries becomes wider. The leader should still ensure that the target constraint is satisfied, and therefore set the control $Z$ to $1$ when one of the barriers is hit, but as the cone is wider this constraint becomes less restrictive. Intuitively, if the set $A$ were not bounded, the boundaries $w^-$ and $w^+$ would be at $-\infty$ and $+\infty$ respectively, leading to an unconstrained problem for the leader. With this in mind, the limit of the leader's value as $a_\circ$ goes to infinity should coincide with her value in the first-best case. In other words, the higher $a_\circ$, the longer the leader can force the follower to exert the maximal effort $b_\circ$ instead of his optimal effort $1/c_{\rm F}$.

Figure 2: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=1$, $c_{\rm L}=1.25$, $\sigma=1$, $a_\circ=10$, $b_\circ=3$.
Figure 3: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=1.25$, $c_{\rm L}=1$, $\sigma=1$, $a_\circ=10$, $b_\circ=3$.
Figure 4: Comparison of the value functions for various information concepts — (a) leader's value function; (b) follower's value function. Parameters: $T=1$, $c_{\rm F}=1$, $c_{\rm L}=1$, $\sigma=1$, $a_\circ=20$, $b_\circ=3$.

3 General problem formulation

Let T>0T>0, Ω𝒞([0,T];d)\Omega\coloneqq{\cal C}([0,T];\mathbb{R}^{d}), topologised by uniform convergence, and XX be the canonical process on Ω\Omega, that is

Xt(x)x(t),xΩ,t[0,T].X_{t}(x)\coloneqq x(t),\;x\in\Omega,\;t\in[0,T].

We denote by $\mathbb{F}=({\cal F}_t)_{t\geq0}$ the canonical filtration, that is, ${\cal F}_t=\sigma(X_s,\,0\leq s\leq t)$ for every $t\in[0,T]$. The process $X$ will represent the output of the game, which will be controlled in weak formulation by both the leader and the follower. We will give the details in the next subsection.

Let $\mathbf{M}(\Omega)$ be the set of all probability measures on $(\Omega,{\cal F}_T)$. A measure $\mathbb{P}\in\mathbf{M}(\Omega)$ is said to be a semi-martingale measure if $X$ is an $(\mathbb{F},\mathbb{P})$–semi-martingale. We denote by ${\cal P}_S$ the set of all semi-martingale measures. By Karandikar [40], there exists an $\mathbb{F}$–progressively measurable process $[X]\coloneqq([X]_t)_{t\in[0,T]}$ coinciding with the quadratic variation of $X$, $\mathbb{P}$--a.s., for any $\mathbb{P}\in{\cal P}_S$. Moreover, its density with respect to the Lebesgue measure is the non-negative symmetric matrix $\widehat{\sigma}^2_t\in\mathbb{S}^d$ defined by

σ^t2limsupε0[X]t[X]tεε,t[0,T].\widehat{\sigma}^{2}_{t}\coloneqq\underset{\varepsilon\searrow 0}{\rm{limsup}}\;\frac{[X]_{t}-[X]_{t-\varepsilon}}{\varepsilon},\;t\in[0,T].
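As an illustration of this pathwise construction in dimension $d=1$, the following snippet estimates $\widehat\sigma^2_t$ along a simulated path by the difference quotient of the realised quadratic variation over a small window; all numerical choices are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pathwise estimate of sigma_hat_t^2 as the difference quotient of the
# realised quadratic variation (d = 1). Parameters are illustrative.
T, n = 1.0, 200_000
dt = T / n
t = np.linspace(0.0, T, n + 1)
sig = 1.0 + 0.5 * (t[:-1] > 0.5)          # true volatility: 1, then 1.5

X = np.concatenate(([0.0], np.cumsum(sig * rng.normal(0.0, np.sqrt(dt), n))))
QV = np.concatenate(([0.0], np.cumsum(np.diff(X) ** 2)))   # realised [X]

eps = 2_000                                # window size, i.e. epsilon = eps * dt
sig2_hat = (QV[eps:] - QV[:-eps]) / (eps * dt)
print(sig2_hat[n // 4], sig2_hat[3 * n // 4])   # ~ 1.0 and ~ 2.25
```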

We also recall the so-called universal filtration $\mathbb{F}^U\coloneqq({\cal F}^U_t)_{0\leq t\leq T}$, given by ${\cal F}^U_t\coloneqq\bigcap_{\mathbb{P}\in\mathbf{M}(\Omega)}{\cal F}^{\mathbb{P}}_t$, where ${\cal F}^{\mathbb{P}}_t$ is the usual $\mathbb{P}$-completion of ${\cal F}_t$. For any subset ${\cal P}\subseteq\mathbf{M}(\Omega)$, letting ${\cal N}^{\cal P}$ denote the collection of ${\cal P}$-polar sets, i.e. the sets which are $\mathbb{P}$-negligible for all $\mathbb{P}\in{\cal P}$, we define the filtration $\mathbb{F}^{\cal P}\coloneqq({\cal F}^{\cal P}_t)_{t\in[0,T]}$ by ${\cal F}^{\cal P}_t\coloneqq{\cal F}^U_t\vee{\cal N}^{\cal P}$, $t\in[0,T]$.

3.1 Controlled state dynamics

Given finite-dimensional Euclidean spaces $A$ and $B$, we describe the state process by means of the coefficients

σ:[0,T]×Ω×A×Bd×n,andλ:[0,T]×Ω×A×Bn,\sigma:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R}^{d\times n},\;\text{and}\;\lambda:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R}^{n},

assumed to be Borel-measurable and non-anticipative in the sense that $\ell_t(x,a,b)=\ell_t(x_{\cdot\wedge t},a,b)$, for all $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$ and $\ell\in\{\sigma,\lambda\}$. Since the product $\sigma\lambda$ will appear often, we abuse notations and write $\sigma\lambda_t(x,a,b)\coloneqq\sigma_t(x,a,b)\lambda_t(x,a,b)$, for all $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$. These functions satisfy the following conditions, which we comment upon in Remark 3.2.

Assumption 3.1.
(i)

The map $\Omega\ni x\longmapsto\sigma_t(x,a,b)$ is continuous for every $(t,a,b)\in[0,T]\times A\times B$, there exists $\ell_\sigma>0$ such that $|\sigma_t(x,a,b)|\leq\ell_\sigma$ for every $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$, and $\sigma\sigma^{\top}_t(x,a,b)\coloneqq\sigma_t(x,a,b)\sigma^{\top}_t(x,a,b)$ is invertible for every $(t,x,a,b)\in[0,T]\times\Omega\times A\times B$.

(ii)

    There exists λ>0\ell_{\lambda}>0 such that |λt(x,a,b)|λ|\lambda_{t}(x,a,b)|\leq\ell_{\lambda}, for every (t,x,a,b)[0,T]×Ω×A×B(t,x,a,b)\in[0,T]\times\Omega\times A\times B.

The actions of the leader are valued in AA, and the actions of the follower are valued in BB. We define the sets of controls 𝒜o{\cal A}_{o} and {\cal B} as the ones containing the 𝔽\mathbb{F}-predictable processes with values in AA and BB, respectively. Let x0dx_{0}\in\mathbb{R}^{d}. For (α,β)𝒜o×(\alpha,\beta)\in{\cal A}_{o}\times{\cal B}, the controlled state equation is given by the SDE

Xt=x0+0tσλs(Xs,αs,βs)ds+0tσs(Xs,αs,βs)dWs,t[0,T],X_{t}=x_{0}+\int_{0}^{t}\sigma\lambda_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+\int_{0}^{t}\sigma_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}W_{s},\;t\in[0,T], (3.1)

where WW denotes an nn-dimensional Brownian motion. We characterise (3.1) in terms of weak solutions. These are elegantly represented in terms of so-called martingale problems and Girsanov’s theorem, see Stroock and Varadhan [71] for details. Indeed, let us consider the SDE

Xt=x0+0tσs(Xs,αs,βs)dWs,t[0,T],\displaystyle X_{t}=x_{0}+\int_{0}^{t}\sigma_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}W_{s},\;t\in[0,T], (3.2)

and denote by 𝒫{\cal P} the set of weak solutions to (3.2). This is

𝒫{𝐌(Ω):W,n-dimensional –Brownian motion,and(α,β)𝒜o× for which (3.2) holds –a.s.}.{\cal P}\coloneqq\{\mathbb{P}\in{\bf M}(\Omega):\exists W^{\mathbb{P}},\;n\text{-dimensional }\mathbb{P}\text{--Brownian motion},\;\text{and}\;(\alpha,\beta)\in{\cal A}_{o}\times{\cal B}\text{ for which }\eqref{eq:X-dynamics-without-drift}\text{ holds }\mathbb{P}\text{\rm--a.s.}\}.

By Girsanov’s theorem, any 𝒫\mathbb{P}\in{\cal P} induces ¯𝐌(Ω)\bar{\mathbb{P}}\in{\bf M}(\Omega) weak solution to (3.1), where ¯\bar{\mathbb{P}} is defined by

d¯dexp(0Tλs(Xs,αs,βs)dWs120Tλs(Xs,αs,βs)2ds).\displaystyle\frac{\mathrm{d}\bar{\mathbb{P}}}{\mathrm{d}\mathbb{P}}\coloneqq\exp\bigg{(}\int_{0}^{T}\lambda_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\cdot\mathrm{d}W_{s}^{\mathbb{P}}-\frac{1}{2}\int_{0}^{T}\|\lambda_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\|^{2}\mathrm{d}s\bigg{)}. (3.3)

For any action α𝒜o\alpha\in{\cal A}_{o} of the leader, we define the set (α){\cal R}(\alpha) of admissible responses of the follower by

(α){(,β)𝒫×: is the unique measure in 𝒫 such that (3.2) holds –a.s. with (α,β)},{\cal R}(\alpha)\coloneqq\{(\mathbb{P},\beta)\in{\cal P}\times{\cal B}:\mathbb{P}\text{ is the unique measure in }{\cal P}\text{ such that }\eqref{eq:X-dynamics-without-drift}\text{ holds }\mathbb{P}\text{\rm--a.s.}\text{ with }(\alpha,\beta)\},

as well as the set of weak solutions 𝒫α{𝒫:(3.2) holds –a.s. with (α,β), for some β(α)}.{\cal P}^{\alpha}\coloneqq\{\mathbb{P}\in{\cal P}:\eqref{eq:X-dynamics-without-drift}\text{ holds }\mathbb{P}\text{\rm--a.s.}\text{ with }(\alpha,\beta),\text{ for some }\beta\in{\cal R}(\alpha)\}.

Remark 3.2.
(i)

    We note that 𝒫{\cal P} is nonempty due to the continuity assumption on σ\sigma, ensuring that solutions do exist for instance for constant controls α\alpha and β\beta, see [71, Theorem 6.1.6]. Concerning the uniqueness of weak solutions, we impose it as a condition for the admissible controls of the follower. That is, for a pair (α,β)(\alpha,\beta) of controls played by the leader and the follower, the law of XX is uniquely determined.

(ii)

    We also stress that in the above formulation, there is no need to enlarge the canonical space. This subtlety is significant in the context of Stackelberg games, as doing so would mean changing the information structure of the game. Indeed, we note that in the definition of 𝒫{\cal P}, WW^{\mathbb{P}} is a Brownian motion in the original canonical space Ω\Omega. Given our assumptions on the volatility σσ\sigma\sigma^{\top}, namely its invertibility and boundedness, we do not need to enlarge Ω\Omega in this setting. In general, if the volatility is allowed to degenerate, one may need to introduce external sources of randomness and define a Brownian motion on an enlarged probability space. We refer the reader to [71, Section 4.5] and [58, Section 2.1.2] for a discussion on these results.

3.2 The closed-loop Stackelberg game between the leader and the follower

The timing of the game is as follows. The leader first chooses a control $\alpha\in{\cal A}_o$, to which the follower responds with some $\beta\in{\cal B}$; the response is, of course, dependent on the control chosen by the leader. Given an action $\alpha\in{\cal A}_o$, the problem of the follower is given by

VF(α)sup(,β)(α)𝔼¯[0Tcs(Xs,αs,βs)ds+g(XT)],V_{\rm F}(\alpha)\coloneqq\sup_{(\mathbb{P},\beta)\in{\cal R}(\alpha)}\mathbb{E}^{\bar{\mathbb{P}}}\bigg{[}\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg{]}, (3.4)

where the functions $c:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R}$ and $g:\Omega\longrightarrow\mathbb{R}$ are continuous and bounded by constants $\ell_c$ and $\ell_g$ respectively, and such that for every $(a,b)\in A\times B$ the process $c_\cdot(\cdot,a,b)$ is $\mathbb{F}$–progressively measurable. We say that $(\mathbb{P},\beta)\in{\cal R}(\alpha)$ is an optimal response to $\alpha\in{\cal A}_o$, and write $(\mathbb{P},\beta)\in{\cal R}^\star(\alpha)$, if $(\mathbb{P},\beta)$ is a solution to Problem (3.4). We define the set ${\cal A}$ as the family of $\alpha\in{\cal A}_o$ for which there exists at least one optimal response, i.e. ${\cal R}^\star(\alpha)\neq\emptyset$.

We assume that the leader chooses a control from the set 𝒜{\cal A} and anticipates the optimal response of the follower. Therefore, the leader faces the following problem

VLsupα𝒜sup(,β)(α)𝔼¯[0TCs(Xs,αs,βs)ds+G(XT)],V_{\rm L}\coloneqq\sup_{\alpha\in{\cal A}}\sup_{(\mathbb{P},\beta)\in{\cal R}^{\text{$\star$}}(\alpha)}\mathbb{E}^{{\bar{\mathbb{P}}}}\bigg{[}\int_{0}^{T}C_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+G(X_{\cdot\wedge T})\bigg{]}, (3.5)

where C:[0,T]×Ω×A×BC:[0,T]\times\Omega\times A\times B\longrightarrow\mathbb{R} and G:ΩG:\Omega\longrightarrow\mathbb{R} are bounded functions, respectively by the constants C\ell_{C} and G\ell_{G}, such that for every (a,b)A×B(a,b)\in A\times B the process C(,a,b)C_{\cdot}(\cdot,a,b) is 𝔽\mathbb{F}–progressively measurable.

Remark 3.3.
(i)

We assume that the functions in our model are bounded simply to ease the exposition of the results. These assumptions can be relaxed by imposing the usual integrability conditions on the sets of admissible controls of the players, and the results in this section and in Section 4 still hold under such relaxations. The analysis becomes more delicate when studying the so-called target reachability set, defined in Section 5, through its upper and lower boundaries, and when characterising them by our methods.

(ii)

    Let us mention that the existence of optimal responses is fundamental for Stackelberg games and cannot be dropped. Indeed, the main motivation in this game is that the leader plays first by anticipating the response of the follower. On the other hand, we assume that the leader has enough bargaining power to make the follower choose a maximiser that suits her best, or equivalently, we consider the problem of an optimistic leader for whom, if the follower has multiple optimal responses—and thus he is indifferent among all of them—he will choose one that benefits the leader the most. This is consistent with, for instance, Bressan [15, Section 2.1], Zemkoho [82], or Havrylenko, Hinken, and Zagst [35]. Alternatively, one could take an adversarial perspective in which the leader faces the problem

    supα𝒜inf(,β)(α)𝔼¯[0TCs(Xs,αs,βs)ds+G(XT)].\sup_{\alpha\in{\cal A}}\inf_{(\mathbb{P},\beta)\in{\cal R}^{\text{$\star$}}(\alpha)}\mathbb{E}^{{\bar{\mathbb{P}}}}\bigg{[}\int_{0}^{T}C_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+G(X_{\cdot\wedge T})\bigg{]}.

    This is the pessimistic point of view, which has also been coined generalised or weak Stackelberg equilibrium, see Leitmann [41], Başar and Olsder [7], Wiesemann, Tsoukalas, Kleniati, and Rustem [78], or Liu, Fan, Chen, and Zheng [48]. Notice that in this case, existence of equilibria may become problematic, which led part of the literature to consider so-called regularised Stackelberg problems, where, for a fixed ε>0\varepsilon>0, the infimum would now be taken over the set of actions of the follower which give him a value ε\varepsilon-close to his optimal one, see Mallozzi and Morgan [50, Section 3] and the references therein. We point out that our approach allows us to tackle both the optimistic and the pessimistic problems in the same way, the difference being in the resulting Hamiltonians of the HJB equations associated to each one of the two problems. More details will be given below.

Before concluding this section, let us mention that, for technical reasons, we work under the classical ZFC set-theoretic axioms (Zermelo–Fraenkel plus the axiom of choice), augmented with an additional axiom guaranteeing the existence of Mokobodzki's medial limits, for instance the continuum hypothesis. These axioms are needed for the aggregation results that we use for the $K$-component of the solution to second-order BSDEs. More details are given in the next section.

4 Reduction to a target control problem

In this section, we fix a control α𝒜\alpha\in{\cal A} of the leader and characterise the solutions (,β)(α)(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(\alpha) to the continuous-time stochastic control problem (3.4). Our approach is inspired by the dynamic programming approach to principal–agent problems developed in [25].

As standard in the control literature, we introduce the Hamiltonian functions HF:[0,T]×Ω×d×𝕊d×AH^{\rm F}:[0,T]\times\Omega\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\longrightarrow\mathbb{R} and hF:[0,T]×Ω×d×𝕊d×A×Bh^{\rm F}:[0,T]\times\Omega\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\times B\longrightarrow\mathbb{R}

H^{\rm F}_{t}(x,z,\gamma,a)\coloneqq\sup_{b\in B}\big\{h_{t}^{\rm F}(x,z,\gamma,a,b)\big\},\qquad h^{\rm F}_{t}(x,z,\gamma,a,b)\coloneqq c_{t}(x,a,b)+\sigma\lambda_{t}(x,a,b)\cdot z+\frac{1}{2}\mathrm{Tr}[\sigma\sigma_{t}^{\top}(x,a,b)\gamma].   (4.1)

Define now, for $(t,x,\Sigma,a)\in[0,T]\times\Omega\times\mathbb{S}^{d}_{+}\times A$, the set $A_{t}(x,\Sigma,a)\coloneqq\big\{b\in B:\sigma\sigma_{t}^{\top}(x,a,b)=\Sigma\big\}$. For $(\alpha,\mathbb{P})\in{\cal A}\times{\cal P}^{\alpha}$, the set of controls for the follower is given by

{\cal B}(\alpha,\mathbb{P})\coloneqq\{\beta\in{\cal B}:\beta_{t}\in A_{t}(x,\widehat{\sigma}_{t}^{2},\alpha_{t}),\;\mathrm{d}t\otimes\mathbb{P}\text{--a.e.}\}.

With these definitions, we can isolate the partial maximisation with respect to the squared diffusion in $H^{\rm F}$. Indeed, letting $F:[0,T]\times\Omega\times\mathbb{R}^{d}\times\mathbb{S}^{d}_{+}\times A\longrightarrow\mathbb{R}$ be given by

F_{t}(x,z,\Sigma,a)\coloneqq\sup_{b\in A_{t}(x,\Sigma,a)}\big\{c_{t}(x,a,b)+\sigma\lambda_{t}(x,a,b)\cdot z\big\},

we have that $2H^{\rm F}=(-2F)^{\ast}$, where the superscript $\ast$ denotes the Legendre transform, that is

H^{\rm F}_{t}(x,z,\gamma,a)=\sup_{\Sigma\in\mathbb{S}^{d}_{+}}\bigg\{F_{t}(x,z,\Sigma,a)+\frac{1}{2}\mathrm{Tr}[\Sigma\gamma]\bigg\}.
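To fix ideas, here is a minimal worked instance of this duality, under illustrative assumptions of our own which are not part of the general model: $d=1$, $A$ a singleton, the follower controls the volatility directly through $\sigma_{t}(x,a,b)=b$ with $b\in B=[\underline{b},\bar{b}]\subset(0,\infty)$, $\lambda\equiv 0$, and quadratic running cost $c_{t}(x,a,b)=-\frac{1}{2}b^{2}$. Then $A_{t}(x,\Sigma,a)=\{\sqrt{\Sigma}\}$ for $\Sigma\in[\underline{b}^{2},\bar{b}^{2}]$, and is empty otherwise, so that, with the convention that the supremum over an empty set is $-\infty$,

F_{t}(x,z,\Sigma,a)=-\frac{1}{2}\Sigma,\;\Sigma\in[\underline{b}^{2},\bar{b}^{2}],\quad\text{and}\quad H^{\rm F}_{t}(x,z,\gamma,a)=\sup_{b\in[\underline{b},\bar{b}]}\frac{1}{2}b^{2}(\gamma-1)=\sup_{\Sigma\in[\underline{b}^{2},\bar{b}^{2}]}\bigg\{-\frac{1}{2}\Sigma+\frac{1}{2}\Sigma\gamma\bigg\},

both suprema being equal to $\frac{1}{2}\bar{b}^{2}(\gamma-1)^{+}-\frac{1}{2}\underline{b}^{2}(\gamma-1)^{-}$: the maximisation over $b$ and the partial maximisation over $\Sigma$ produce the same convex function of $\gamma$, as claimed.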

Recalling (3.3), we can equivalently write the problem of the follower (3.4) as

V_{\rm F}(\alpha)=\sup_{\mathbb{P}\in{\cal P}^{\alpha}}\sup_{\beta\in{\cal B}(\alpha,\mathbb{P})}\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg],   (4.2)

to which we will associate the following second-order BSDE (2BSDE); we refer the reader to [58, 70] for an introduction to, and extensions of, the theory of such equations:

Y_{t}=g(X_{\cdot\wedge T})+\int_{t}^{T}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s-\int_{t}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\int_{t}^{T}\mathrm{d}K_{s},\;{\cal P}^{\alpha}\text{--q.s.},\;t\in[0,T].   (4.3)

Notice that, similarly to [25], we consider an aggregated version of the non-decreasing process $K$. We require the aggregation of the component $K$, as well as that of the stochastic integral, in order to define later the forward process $Y^{y,Z,\Gamma,\alpha}$ independently of any probability. There are aggregation results for the stochastic integral in [53] which suit our setting and use the notion of medial limits; following this route, one needs to assume ZFC plus some other axioms, and we refer the reader to [58, Footnote 7] for a discussion of the weakest set of axioms known to be sufficient for the existence of medial limits. We then have the following notion of solution to the 2BSDE; the functional spaces mentioned in the definition below can be found in Appendix B. We also use the notation

{\cal P}^{\alpha}[\mathbb{P},\mathbb{F}^{+},t]\coloneqq\{\mathbb{P}^{\prime}\in{\cal P}^{\alpha}:\mathbb{P}[E]=\mathbb{P}^{\prime}[E],\;\forall E\in{\cal F}_{t}^{+}\},

where $\mathbb{F}^{+}=({\cal F}_{t}^{+})_{t\in[0,T]}$ denotes the right-limit filtration associated with $\mathbb{F}$, i.e. ${\cal F}_{t}^{+}\coloneqq\bigcap_{s>t}{\cal F}_{s}$ for $t\in[0,T)$, and ${\cal F}_{T}^{+}\coloneqq{\cal F}_{T}$.
Definition 4.1.

We say that the triple $(Y,Z,K)$ is a solution to the 2BSDE (4.3) if there exists $p>1$ such that $(Y,Z,K)\in\mathbb{S}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})\times\mathbb{H}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})\times\mathbb{I}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})$ satisfies (4.3), and $K$ satisfies the minimality condition

K_{t}=\operatorname*{ess\,inf^{\mathbb{P}}}_{\mathbb{P}^{\prime}\in{\cal P}^{\alpha}[\mathbb{P},\mathbb{F}^{+},t]}\mathbb{E}^{\mathbb{P}^{\prime}}\big[K_{T}\big|{\cal F}_{t}^{\mathbb{P},+}\big],\;t\in[0,T],\;{\cal P}^{\alpha}\text{--q.s.}   (4.4)

As anticipated, the next result connects the problem of the follower with the 2BSDE (4.3).

Proposition 4.2.

There exists a unique solution $(Y,Z,K)$ to the 2BSDE (4.3), for which the value of the follower satisfies $V_{\rm F}(\alpha)=\sup_{\mathbb{P}\in{\cal P}^{\alpha}}\mathbb{E}^{\bar{\mathbb{P}}}[Y_{0}]$. Moreover, $(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(\alpha)$ if and only if $K_{T}=0$, $\mathbb{P}^{\star}$--a.s., and

\beta^{\star}\;\text{is a maximiser in the definition of }F_{\cdot}(X_{\cdot},Z_{\cdot},\widehat{\sigma}^{2}_{\cdot},\alpha_{\cdot}),\;\mathrm{d}t\otimes\mathrm{d}\mathbb{P}^{\star}\text{--a.e.}   (4.5)
Proof.

Notice that the follower’s problem can be seen as the particular problem of an agent who is offered by the principal a terminal remuneration of the form $\xi=g(X_{\cdot\wedge T})$. Since the function $g$ is assumed to be bounded, the result is a direct application of [25, Propositions 4.5 and 4.6]. ∎

For $p>1$ and $(y,\alpha,Z,K)\in\mathbb{R}\times{\cal A}\times\mathbb{H}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})\times\mathbb{I}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})$, with $K$ satisfying (4.4), the process $Y^{y,\alpha,Z,K}$, given by

Y^{y,\alpha,Z,K}_{t}\coloneqq y-\int_{0}^{t}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s+\int_{0}^{t}Z_{s}\cdot\mathrm{d}X_{s}-\int_{0}^{t}\mathrm{d}K_{s},\;t\in[0,T],

is well defined and independent of the probability $\mathbb{P}$, because the stochastic integrals can be defined pathwise (see [25, Definition 3.2] and the paragraph thereafter). The idea is to look at the tuples $(y,\alpha,Z,K)$ for which it holds that $Y^{y,\alpha,Z,K}_{T}=g(X_{\cdot\wedge T})$. However, as argued in [25, Theorem 3.6], the processes $K$ can be approximated by those of the form

\int_{0}^{t}\bigg(H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})-F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}^{2}_{s},\alpha_{s})-\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma_{s}]\bigg)\mathrm{d}s,

for some appropriate control $\Gamma$. With this in mind, we define the following class of processes, which will serve as controls from the point of view of the leader.

Definition 4.3.

For any $\alpha\in{\cal A}$, let ${\cal C}^{\alpha}$ be the class of $\mathbb{F}^{{\cal P}^{\alpha}}$-predictable processes $(Z,\Gamma):[0,T]\times\Omega\longrightarrow\mathbb{R}^{d}\times\mathbb{S}^{d}$ such that

\|Y^{y,\alpha,Z,\Gamma}\|_{\mathbb{S}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})}^{p}+\|Z\|_{\mathbb{H}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})}^{p}<+\infty,

for some $p>1$, where for $y\in\mathbb{R}$ we define, $\mathbb{P}$--a.s. for all $\mathbb{P}\in{\cal P}^{\alpha}$, the process

Y^{y,\alpha,Z,\Gamma}_{t}\coloneqq y-\int_{0}^{t}H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\mathrm{d}s+\int_{0}^{t}Z_{s}\cdot\mathrm{d}X_{s}+\frac{1}{2}\int_{0}^{t}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma_{s}]\mathrm{d}s,\;t\in[0,T].   (4.6)

The next proposition provides an optimality condition for a pair $(\mathbb{P},\beta)$ when the process $Y^{y,\alpha,Z,\Gamma}$ hits the correct terminal condition, i.e., $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s. In such a case, the follower’s value coincides with $y$, and his optimal actions correspond to maximisers of the Hamiltonian $H^{\rm F}$. We will use this characterisation in the next section to obtain a reformulation of the problem of the leader.

Proposition 4.4.

Let $\alpha\in{\cal A}$ and $(y,Z,\Gamma)\in\mathbb{R}\times{\cal C}^{\alpha}$ be such that $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s., for some $(\mathbb{P},\beta)\in{\cal R}(\alpha)$. Then, the following are equivalent:

(i)

$(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$ and $V_{\rm F}(\alpha)=y$;

(ii)

$\beta$ maximises $h^{\rm F}$ on the support of $\mathbb{P}$, that is

H^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t},\alpha_{t})=h^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t},\alpha_{t},\beta_{t}),\;\mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.}   (4.7)
Proof of Proposition 4.4.

Let $(\mathbb{P},\beta)\in{\cal R}(\alpha)$ be such that $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s. Assume $(i)$ holds. Then, the value and utility of the follower satisfy

V_{\rm F}(\alpha)=U_{\rm F}(\mathbb{P},\beta)\coloneqq\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg]=\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+Y_{T}^{y,\alpha,Z,\Gamma}\bigg].

Writing the dynamics of $Y^{y,\alpha,Z,\Gamma}$ and using the fact that $\mathbb{P}$ is a weak solution to (3.1) with $(\alpha,\beta)$, we obtain

\begin{aligned} U_{\rm F}(\mathbb{P},\beta)&=\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+y-\int_{0}^{T}H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\mathrm{d}s+\int_{0}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\frac{1}{2}\int_{0}^{T}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma_{s}]\mathrm{d}s\bigg]\\ &=y+\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}\big(h^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s},\beta_{s})-H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\big)\mathrm{d}s+\int_{0}^{T}Z_{s}\cdot\sigma_{s}(X,\alpha_{s},\beta_{s})\mathrm{d}W^{\bar{\mathbb{P}}}_{s}\bigg]\\ &=y+\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}\big(h^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s},\beta_{s})-H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})\big)\mathrm{d}s\bigg], \end{aligned}

since the stochastic integral is a martingale due to the integrability conditions specified in the definition of ${\cal C}^{\alpha}$. By definition of $H^{\rm F}$, see (4.1), the integrand in the last line is non-positive, so that $U_{\rm F}(\mathbb{P},\beta)\leq y$, with equality exactly when (4.7) holds. Since $V_{\rm F}(\alpha)=U_{\rm F}(\mathbb{P},\beta)=y$, we deduce that $(ii)$ holds. Let us now assume $(ii)$. Since $(\mathbb{P},\beta)\in{\cal R}(\alpha)$, it follows from (4.2) that

V_{\rm F}(\alpha)\geq\sup_{\beta\in{\cal B}(\alpha,\mathbb{P})}\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}c_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+g(X_{\cdot\wedge T})\bigg].   (4.8)

The value on the right corresponds to $\mathbb{E}^{\bar{\mathbb{P}}}[Y_{0}]$, where $(Y,Z,K)$ is the unique solution to the BSDE

Y_{t}=g(X_{\cdot\wedge T})+\int_{t}^{T}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s-\int_{t}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\int_{t}^{T}\mathrm{d}K_{s},\;t\in[0,T],\;\mathbb{P}\text{--a.s.},

and equality in (4.8) holds if $K_{T}=0$, $\mathbb{P}$--a.s. Since $2H^{\rm F}=(-2F)^{\ast}$ and (4.7) hold, together with the condition $Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})$, $\mathbb{P}$--a.s., we see that $Y^{y,\alpha,Z,\Gamma}$ satisfies

Y_{t}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})+\int_{t}^{T}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s-\int_{t}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\int_{t}^{T}\mathrm{d}K_{s}^{Z,\Gamma,\alpha},\;t\in[0,T],\;\mathbb{P}\text{--a.s.},

where

K_{t}^{Z,\Gamma,\alpha}\coloneqq\int_{0}^{t}\big(H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s})-h^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s},\alpha_{s},\beta_{s})\big)\mathrm{d}s,

which by assumption satisfies $K_{T}^{Z,\Gamma,\alpha}=0$, $\mathbb{P}$--a.s. Hence $(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$ by the previous discussion. Finally, since by uniqueness of the solution we have $y=\mathbb{E}^{\bar{\mathbb{P}}}[Y_{0}]$, the fact that $V_{\rm F}(\alpha)=y$ is argued as in $(i)$. ∎

4.1 A stochastic target reformulation of the problem of the leader

In light of the results from the previous section, we are drawn to reformulate the problem faced by the leader as a stochastic control problem with stochastic target constraints. Indeed, Proposition 4.4 tells us that the value of the follower (given the control $\alpha$ of the leader) is equal to $V_{\rm F}(\alpha)=y$, and that any pair $(\mathbb{P}^{\star},\beta^{\star})$ satisfying (4.7) is a solution to the problem of the follower, as long as $Y^{y,\alpha,Z,\Gamma}$ hits the correct terminal value.

For $(Z,\Gamma,\alpha)\in{\cal C}^{\alpha}\times{\cal A}$ and deterministic $y\in\mathbb{R}$, which represents the value of the follower, let us define the set

{\cal R}^{\star}(y,\alpha,Z,\Gamma)\coloneqq\{(\mathbb{P},\beta)\in{\cal R}(\alpha):Y_{T}^{y,\alpha,Z,\Gamma}=g(X_{\cdot\wedge T})\;\text{and (4.7) hold, }\mathbb{P}\text{--a.s.}\}.

We propose then the following reformulation of the problem of the leader

\hat{V}_{\rm L}\coloneqq\sup_{y\in\mathbb{R}}\sup_{(Z,\Gamma,\alpha)\in{\cal C}^{\alpha}\times{\cal A}}\sup_{(\mathbb{P},\beta)\in{\cal R}^{\star}(y,\alpha,Z,\Gamma)}\mathbb{E}^{\bar{\mathbb{P}}}\bigg[\int_{0}^{T}C_{s}(X_{\cdot\wedge s},\alpha_{s},\beta_{s})\mathrm{d}s+G(X_{\cdot\wedge T})\bigg].   (4.9)
Remark 4.5.

Let us briefly digress on the nature of (4.9).

(i)

A distinctive feature of (4.9) is that, as described in Section 3.1, the dynamics of the controlled process $X$ are given in weak formulation, whereas those of $Y$ are given in strong formulation as in (4.6). Though the reader might find this atypical, we recall that this feature is common in the dynamic programming approach in contract theory. Since up until this point our approach has borrowed ideas from this literature, it is not surprising to find this feature in (4.9).

(ii)

Let us also digress on our choice to reformulate (3.5) as an optimal control problem with target constraints. This is certainly not the only possible reformulation available. Alternatively, thanks to Proposition 4.2, (3.5) also admits a reformulation as an optimal control problem of FBSDEs. Yet we think that there are some shortcomings in following this route. Though there exists some literature on this class of control problems, because there is no general comparison principle for FBSDEs, results tend to leverage the stochastic maximum principle to derive both necessary and sufficient conditions for optimality. Consequently, most of these works consider continuously differentiable state-dependent data in order to derive necessary conditions, and additional concavity/convexity assumptions are needed to derive sufficient conditions in terms of a system of FBSDEs with twice as many variables as in the initial system, see for instance [24, Chapter 10]. Be that as it may, we believe that the sufficient condition obtained through our approach, see Theorem 5.8 below, is more amenable to analysis and numerical implementation than those in the literature on the control of FBSDEs.

Recall that ${\cal R}^{\star}(y,\alpha,Z,\Gamma)$ is non-empty thanks to Proposition 4.2 and the discussion thereafter. Since we agreed that the supremum over an empty set is $-\infty$, the supremum in the $y$-variable could be taken instead over the set

\mathfrak{T}\coloneqq\{y\in\mathbb{R}:{\cal R}^{\star}(y,\alpha,Z,\Gamma)\neq\emptyset\text{ for some }(Z,\Gamma,\alpha)\in{\cal C}^{\alpha}\times{\cal A}\},

which corresponds to the so-called target reachability set in the language of stochastic target problems, as studied for instance in [69]. By (3.5), the reward of the leader is only computed under optimal responses $(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$, and ${\cal R}^{\star}(y,\alpha,Z,\Gamma)$ provides the optimal responses of the follower.

The interpretation of $\hat{V}_{\rm L}$ is as follows. The leader decides $y\in\mathbb{R}$ and optimal controls $(Z^{\star},\Gamma^{\star},\alpha^{\star})\in{\cal C}^{\alpha^{\star}}\times{\cal A}$. She then announces her control $\alpha^{\star}\in{\cal A}$, for which she knows that the value of the follower is $y$, i.e., $V_{\rm F}(\alpha^{\star})=y$, and that his optimal controls belong to ${\cal R}^{\star}(y,\alpha^{\star},Z^{\star},\Gamma^{\star})$. The leader can then recommend to the follower an optimal response and the corresponding value, and the recommendation is followed since the follower has no better alternative. This holds true for every $y\in\mathfrak{T}$, and the optimal choice of this value is the one that maximises the objective function of the leader. This new problem is indeed a reformulation of the problem of the leader, as the following result shows.

Theorem 4.6.

The reformulated and the original problem of the leader have the same value, that is, $\hat{V}_{\rm L}=V_{\rm L}$.

Proof.

$(i)$ Let $y\in\mathbb{R}$, and assume that $y\in\mathfrak{T}$, since the supremum in the $y$-variable in (4.9) can be reduced to this set. Take next $(\alpha,Z,\Gamma)\in{\cal A}\times{\cal C}^{\alpha}$, $(\mathbb{P},\beta)\in{\cal R}^{\star}(y,\alpha,Z,\Gamma)$, and let $Y^{y,\alpha,Z,\Gamma}$ be the process given by (4.6). By Proposition 4.4, $y=Y_{0}^{y,\alpha,Z,\Gamma}=V_{\rm F}(\alpha)$ and $(\mathbb{P},\beta)\in{\cal R}^{\star}(\alpha)$. This means that the optimal response of the follower to the action $\alpha$ is given by $(\mathbb{P},\beta)$. Therefore, the objective function in problem $\hat{V}_{\rm L}$ at $(y,\alpha,Z,\Gamma,\mathbb{P},\beta)$ is matched by the objective function in $V_{\rm L}$ at $(\alpha,\mathbb{P},\beta)$. This implies $\hat{V}_{\rm L}\leq V_{\rm L}$.

$(ii)$ We show that the leader’s objective function in $V_{\rm L}$ can be approximated by elements of ${\cal C}^{\alpha}$. Let $\alpha\in{\cal A}$ and $(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(\alpha)$. By Proposition 4.2, there is a solution $(Y,Z,K)$ to the 2BSDE (4.3). We argue in two steps.

Step 1. We construct an approximate solution to (4.3). Let $\varepsilon>0$, $y\coloneqq\mathbb{E}^{\mathbb{P}^{\star}}[Y_{0}]$, and define

K_{t}^{\varepsilon}\coloneqq\frac{1}{\varepsilon}\int_{(t-\varepsilon)^{+}}^{t}K_{s}\mathrm{d}s,\qquad Y^{\varepsilon}_{t}\coloneqq y-\int_{0}^{t}F_{s}(X_{\cdot\wedge s},Z_{s},\widehat{\sigma}_{s}^{2},\alpha_{s})\mathrm{d}s+\int_{0}^{t}Z_{s}\cdot\mathrm{d}X_{s}+\int_{0}^{t}\mathrm{d}K_{s}^{\varepsilon}.

Note that $K^{\varepsilon}$ is absolutely continuous, $\mathbb{F}^{{\cal P}^{\alpha}}$-predictable, non-decreasing ${\cal P}^{\alpha}$--q.s., and $K_{T}^{\varepsilon}=0$, $\mathbb{P}^{\star}$--a.s. Since $K_{T}^{\varepsilon}\leq K_{T}$, we have $K^{\varepsilon}\in\mathbb{I}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})$, it satisfies (4.4), and $Y_{T}^{\varepsilon}$ has the required integrability. Moreover, by standard a priori estimates, see [58, Theorem 4.4], we have $\|Y^{\varepsilon}\|_{\mathbb{S}^{p}(\mathbb{F}^{{\cal P}^{\alpha}},{\cal P}^{\alpha})}<\infty$. All in all, we deduce that $(Y^{\varepsilon},Z,K^{\varepsilon})$ is a solution to the 2BSDE (4.3) with terminal condition $Y_{T}^{\varepsilon}$.

Step 2. We show that the approximation can be given in terms of elements of ${\cal C}^{\alpha}$. Let $\dot{K}^{\varepsilon}$ be the density, with respect to Lebesgue measure, of $K^{\varepsilon}$. We claim that there is an $\mathbb{F}$-predictable process $\Gamma^{\varepsilon}$ such that

\dot{K}_{t}^{\varepsilon}=H^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t}^{\varepsilon},\alpha_{t})-F_{t}(X_{\cdot\wedge t},Z_{t},\widehat{\sigma}^{2}_{t},\alpha_{t})-\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{t}\Gamma^{\varepsilon}_{t}].

Indeed, we argue as in the proof of [60, Theorem 4.3]. Let us first note that the map $\gamma\longmapsto H^{\rm F}_{t}(x,z,\gamma,a)$ has domain $\mathbb{S}^{d}$, and is convex, continuous, and coercive by the boundedness of $c$, $\lambda$, and $\sigma$. From the coercivity, it follows that $\sup_{\gamma\in\mathbb{S}^{d}}\big\{\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{t}(x)\gamma]-H^{\rm F}_{t}(x,z,\gamma,a)\big\}$ has a maximiser in $\mathbb{S}^{d}$. Thus, since $2H^{\rm F}=(-2F)^{\ast}$, it follows from standard results in convex analysis, see [61, Theorem 23.5], that we can find a (measurable) process $\Gamma$ such that the equality $H^{\rm F}_{t}(X_{\cdot\wedge t},Z_{t},\Gamma_{t},\alpha_{t})=F_{t}(X_{\cdot\wedge t},Z_{t},\widehat{\sigma}^{2}_{t},\alpha_{t})+\frac{1}{2}\mathrm{Tr}[\widehat{\sigma}^{2}_{t}\Gamma_{t}]$ holds, and a (measurable) process $\Gamma^{\prime}$ (we omit its dependence on $\varepsilon$) such that one has strict inequality if $\Gamma$ is replaced by $\Gamma^{\prime}$ in the previous formula. The claim follows by taking $\Gamma^{\varepsilon}\coloneqq\Gamma\mathbf{1}_{\{K^{\varepsilon}=0\}}+\Gamma^{\prime}\mathbf{1}_{\{K^{\varepsilon}>0\}}$. We then find that $(Z,\Gamma^{\varepsilon})\in{\cal C}^{\alpha}$ since

Y^{\varepsilon}_{T}=y-\int_{0}^{T}H^{\rm F}_{s}(X_{\cdot\wedge s},Z_{s},\Gamma_{s}^{\varepsilon},\alpha_{s})\mathrm{d}s+\int_{0}^{T}Z_{s}\cdot\mathrm{d}X_{s}+\frac{1}{2}\int_{0}^{T}\mathrm{Tr}[\widehat{\sigma}^{2}_{s}\Gamma^{\varepsilon}_{s}]\mathrm{d}s=Y_{T}^{y,\alpha,Z,\Gamma^{\varepsilon}},

and, recalling that $K=K^{\varepsilon}=0$, $\mathbb{P}^{\star}$--a.s., we see that $\Gamma^{\varepsilon}=\Gamma$, $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}^{\star}$--a.e., and deduce that $Y=Y^{\varepsilon}$, $\mathbb{P}^{\star}$--a.s. In particular, $Y^{\varepsilon}_{T}=g(X_{\cdot\wedge T})$, $\mathbb{P}^{\star}$--a.s. Thus, from Proposition 4.4, we deduce that $(\mathbb{P}^{\star},\beta^{\star})$ satisfies (4.7), and thus $(\mathbb{P}^{\star},\beta^{\star})\in{\cal R}^{\star}(y,\alpha,Z,\Gamma^{\varepsilon})$. Similarly to the conclusion in part $(i)$, this implies that $\hat{V}_{\rm L}\geq V_{\rm L}$. ∎

5 Solving the problem of the leader: strong formulation

In this section, we use the techniques developed in [14, 13], based on the geometric dynamic programming principle, see [67, 68], to study Markovian stochastic target control problems. For this reason, and to take full advantage of the standard tools from stochastic target problems, we place ourselves in a Markovian setting and study the strong formulation of (4.9). We expect it to be equivalent to $\hat{V}_{\rm L}$, see Remark 5.2.

In this setting, $(\Omega,{\cal F}_{T},\mathbb{F},\mathbb{P})$ denotes an abstract complete probability space supporting a $\mathbb{P}$--Brownian motion, which we still denote $W$, and $\mathbb{F}$ denotes the filtration generated by $W$, augmented under $\mathbb{P}$ so that it satisfies the usual conditions. In addition, the dependence of the data of the problem on $(t,x)\in[0,T]\times{\cal C}([0,T];\mathbb{R}^{d})$ is only through $(t,x(t))\in[0,T]\times\mathbb{R}^{d}$. With a slight abuse of notation, we now write $c(t,x(t),a,b)$ instead of $c_{t}(x,a,b)$—and similarly for all the other mappings introduced in the previous sections—and thus, without any risk of misunderstanding, consider the maps as defined on $\mathbb{R}^{d}$ instead of ${\cal C}([0,T];\mathbb{R}^{d})$.

In light of Proposition 4.4, by a measurable selection argument, we introduce ${\cal B}^{\star}$ as the set of Borel-measurable maps $b^{\star}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\longrightarrow B$ which are uniformly Lipschitz-continuous in $(x,z)$ and such that

H^{\rm F}(t,x,z,\gamma,a)=h^{\rm F}(t,x,z,\gamma,a,b^{\star}(t,x,z,\gamma,a)),\;(t,x,z,\gamma,a)\in[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A.

We now topologise ${\cal B}^{\star}$. Consider the measurable space $(O,{\cal B}_{O},\lambda)$, where $O\coloneqq[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A$, and ${\cal B}_{O}$ and $\lambda$ denote the Borel $\sigma$-algebra and the Lebesgue measure on $O$, respectively (we write ${\cal B}_{O}$ rather than ${\cal O}$, since the latter symbol is used for a domain in Section 5.1). We see ${\cal B}^{\star}$ as a subspace of $\mathbb{L}^{1}(O,\nu)$, where $\mathrm{d}\nu=C{\rm e}^{-\|\cdot\|}\mathrm{d}\lambda$ and $C>0$ is a normalising constant. In this way, as a subspace of a separable metric space, ${\cal B}^{\star}$ is separable. Lastly, for any $b^{\star}\in{\cal B}^{\star}$ and $\varphi\in\{C,c,\lambda,\sigma,\lambda\sigma,\sigma\sigma^{\top}\}$ we define

\varphi^{b^{\star}}(t,x,a,z,\gamma)\coloneqq\varphi(t,x,a,b^{\star}(t,x,z,\gamma,a)).   (5.1)

When the choice of $b^{\star}\in{\cal B}^{\star}$ is clear from the context, we drop the superscript and simply write $\varphi(t,x,a,z,\gamma)$. Note that, except for $h^{\rm F}$, whose maximisers define ${\cal B}^{\star}$, the value of $\varphi^{b^{\star}}$ does depend on the choice of $b^{\star}$.

With this, we introduce the following set of assumptions.

Assumption 5.1.

In addition to Assumption 3.1, we assume that $c$, $\sigma$, and $\sigma\lambda$ are Lipschitz-continuous in $(x,b)$, uniformly in $(t,a)$.

We let $\mathfrak{C}$ be the family of tuples $(\alpha,Z,\Gamma,b^{\star})$ consisting of $\mathbb{F}$-predictable processes $(\alpha,Z,\Gamma):[0,T]\times\Omega\longrightarrow A\times\mathbb{R}^{d}\times\mathbb{S}^{d}$ and $b^{\star}\in{\cal B}^{\star}$ such that, for some $p>1$,

\|Z\|_{\mathbb{H}^{p}}^{p}+\|\Gamma\|_{\mathbb{G}^{p}}^{p}+\|b^{\star}\|_{\mathbb{L}^{p}}^{p}<+\infty.

To alleviate the notation, we use $\upsilon$ to denote a generic element of $\mathfrak{C}$ and $\hat{\upsilon}=(\alpha,Z,\Gamma)$ its first three components. With this, given $t\in[0,T]$, $(x,y)\in\mathbb{R}^{d+1}$ and $\upsilon\in\mathfrak{C}$, the controlled state processes are given by

\begin{aligned} X_{u}^{t,x,\upsilon}&=x+\int_{t}^{u}\sigma\lambda(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}s+\int_{t}^{u}\sigma(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}W_{s},\;u\in[t,T],\\ Y_{u}^{t,x,y,\upsilon}&=y-\int_{t}^{u}c(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}s+\int_{t}^{u}Z_{s}\cdot\sigma(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}W_{s},\;u\in[t,T]. \end{aligned}   (5.2)
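Before turning to the target problem itself, the following self-contained sketch may help the reader visualise the strong formulation (5.2): it simulates the controlled pair $(X,Y)$ by an Euler–Maruyama scheme and inspects the terminal gap $Y_{T}-g(X_{T})$ that the target constraint below sets to zero. All coefficients and feedback controls in the snippet are toy choices of ours, not the model's data.

```python
import numpy as np

# A minimal Euler--Maruyama sketch of the controlled pair (X, Y) in (5.2) for d = 1.
# Every ingredient below (g, sigma, lam, c, b_star, alpha, Z) is an illustrative
# placeholder of ours, not the paper's data.

T, N, M = 1.0, 200, 10_000                      # horizon, time steps, sample paths
dt = T / N

g      = lambda x: np.tanh(x)                   # follower's terminal reward
sigma  = lambda t, x, a, b: 1.0 + 0.5 * b**2    # diffusion coefficient
lam    = lambda t, x, a, b: a                   # drift factor, so drift = sigma*lam
c      = lambda t, x, a, b: -0.5 * b**2         # follower's running reward
b_star = lambda t, x, z, gam, a: np.clip(z, -1.0, 1.0)   # toy best-response map

alpha = lambda t, x: 0.1 * np.ones_like(x)      # leader's feedback control
Z     = lambda t, x: np.tanh(x)                 # follower's continuation gradient

def simulate(x0, y0, seed=0):
    rng = np.random.default_rng(seed)
    X, Y = np.full(M, x0), np.full(M, y0)
    for i in range(N):
        t = i * dt
        a, z = alpha(t, X), Z(t, X)
        b = b_star(t, X, z, 0.0, a)             # Gamma plays no role in this toy model
        s = sigma(t, X, a, b)
        dW = np.sqrt(dt) * rng.standard_normal(M)
        # dX = sigma*lam dt + sigma dW ;  dY = -c dt + Z*sigma dW, cf. (5.2)
        X, Y = X + s * lam(t, X, a, b) * dt + s * dW, Y - c(t, X, a, b) * dt + z * s * dW
    return X, Y

X_T, Y_T = simulate(x0=0.0, y0=0.3)
# The target constraint below asks for Y_T = g(X_T); here we only inspect the gap.
print("mean terminal gap Y_T - g(X_T):", float(np.mean(Y_T - g(X_T))))
```

For generic controls, the printed gap is of course non-zero; the point of the target formulation below is precisely to characterise the initial values $y$ and the controls for which it vanishes.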

With this, we define the problem

\tilde{V}_{\rm L}\coloneqq\sup_{y\in\mathbb{R}}V(0,x_{0},y),   (5.3)

where

V(t,x,y)\coloneqq\sup_{\upsilon\in\mathfrak{C}(t,x,y)}\mathbb{E}^{\mathbb{P}}\bigg[\int_{t}^{T}C(s,X_{s}^{t,x,\upsilon},\hat{\upsilon}_{s})\mathrm{d}s+G(X_{T}^{t,x,\upsilon})\bigg|{\cal F}_{t}\bigg],   (5.4)

and, for $(t,x,y)\in[0,T]\times\mathbb{R}^{d+1}$,

\mathfrak{C}(t,x,y)\coloneqq\big\{\upsilon\in\mathfrak{C}:\hat{\upsilon}\text{ is independent of }{\cal F}_{t},\text{ and }Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\}.
Note that, since $\hat{\upsilon}$ is required to be independent of ${\cal F}_{t}$ and the state processes in (5.2) start at time $t$, the conditional expectation in (5.4) is in fact deterministic.
Remark 5.2.

Let us comment on the previous formulation.

(i)

We remind the reader that in the strong formulation the background probability measure $\mathbb{P}$ is fixed, and thus the norms in the definition of $\mathfrak{C}$ coincide with those in the standard literature. In particular, contrary to the weak formulation, the family $\mathfrak{C}$ does not depend on the choice of $\alpha\in{\cal A}$. We also remark that $\mathfrak{C}$ is a separable topological space. This guarantees that the geometric dynamic programming principle of [14], based on [67], holds.

(ii)

The Lipschitz-continuity in $(x,b)$ of $\sigma$ and $\lambda\sigma$ in Assumption 5.1, and in $x$ of $b^{\star}$ in the definition of ${\cal B}^{\star}$, ensure that the process $X^{t,x,\upsilon}$ is well defined, and provide sufficient regularity to conduct our upcoming analysis. Notice that $Y^{t,x,y,\upsilon}$ is defined directly by the second equation in (5.2). Note also that we do not assume uniqueness of the maximisers of $h^{\rm F}$ in $b$. The Lipschitz-continuity in $(x,b)$ of $c$ in Assumption 5.1, together with the Lipschitz-continuity of $b^{\star}\in{\cal B}^{\star}$ in $z$, will be used to establish a comparison principle for the target boundaries in Section 5.1.

(iii)

Let us also digress on the equivalence of the strong and weak formulations. A potential roadmap to obtain this result uses [28]. Indeed, to handle the constraint in both formulations, it is natural to embed it in the reward by means of a Lagrange multiplier $k\geq 0$ and the continuous penalty function $\Phi(y,x)\coloneqq|g(x)-y|^{2}$. In this way, after establishing that strong duality holds, the results in [28] would allow us to obtain the equivalence of the strong and weak formulations for each element of a family of penalised problems, obtained by fixing $k$ and optimising over the corresponding controls. The only work needed to complete this argument is to establish strong duality for the Lagrangian versions of both $\hat{V}_{\rm L}$ and $\tilde{V}_{\rm L}$. We have refrained from writing such arguments, as this would require, for instance, introducing the so-called relaxed formulation of $\hat{V}_{\rm L}$, which would unnecessarily encumber our analysis.

As usual in stochastic target problems, we define the target reachability set as the set of triplets $(t,x,y)$ for which $\mathfrak{C}(t,x,y)$ is non-empty. That is,

V_{g}(t)\coloneqq\big\{(x,y)\in\mathbb{R}^{d+1}:\exists\upsilon\in\mathfrak{C}(t,x,y),\;Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\}.

We are interested in characterising $V_{g}(t)$ through the auxiliary sets

\begin{aligned} V_{g}^{-}(t)&\coloneqq\big\{(x,y)\in\mathbb{R}^{d+1}:\exists\upsilon\in\mathfrak{C}(t,x,y),\;Y_{T}^{t,x,y,\upsilon}\geq g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\},\\ V_{g}^{+}(t)&\coloneqq\big\{(x,y)\in\mathbb{R}^{d+1}:\exists\upsilon\in\mathfrak{C}(t,x,y),\;Y_{T}^{t,x,y,\upsilon}\leq g(X_{T}^{t,x,\upsilon}),\;\mathbb{P}\text{--a.s.}\big\}. \end{aligned}

Notice that the inclusion $V_{g}(t)\subseteq V_{g}^{-}(t)\cap V_{g}^{+}(t)$ is immediate. The set $V_{g}^{-}(t)$ has been studied in [68, 13], and its boundary can be characterised through the auxiliary value function defined below:

w^{-}(t,x)\coloneqq\inf\{y\in\mathbb{R}:(x,y)\in V_{g}^{-}(t)\}.   (5.5)

It is known, see for instance [14, Corollary 2.1], that the closure of $V_{g}^{-}(t)$ is given by

\overline{V_{g}^{-}(t)}=\big\{(x,y):y\geq w^{-}(t,x)\big\}.

Moreover, $w^{-}$ is a discontinuous viscosity solution of the following PDE:

-\partial_{t}w(t,x)-H^{-}\big(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x)\big)=0,\;(t,x)\in[0,T)\times\mathbb{R}^{d},\quad w(T^{-},x)=g(x),\;x\in\mathbb{R}^{d},   (5.6)

where $H^{-}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\longrightarrow\mathbb{R}$ and $h^{\rm b}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\longrightarrow\mathbb{R}$ are given by

\begin{aligned} H^{-}(t,x,p,Q)&\coloneqq\inf_{(a,z,\gamma,b^{\star})\in N(t,x,p)}\big\{h^{\rm b}(t,x,p,Q,a,z,\gamma)\big\},\\ h^{\rm b}(t,x,p,Q,a,z,\gamma)&\coloneqq c(t,x,a,z,\gamma)+\sigma\lambda(t,x,a,z,\gamma)\cdot p+\frac{1}{2}\mathrm{Tr}[\sigma\sigma^{\top}(t,x,a,z,\gamma)Q], \end{aligned}   (5.7)

and, since $\sigma\sigma^{\top}$ is invertible by assumption,

N(t,x,p)\coloneqq\{(a,z,\gamma,b^{\star})\in A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times{\cal B}^{\star}:\sigma^{\top}(t,x,a,z,\gamma)(z-p)=0\}=A\times\{p\}\times\mathbb{S}^{d}\times{\cal B}^{\star}.   (5.8)

Similarly, by performing a change of variables and following the same ideas, the closure of $V_{g}^{+}(t)$ can be characterised as follows:

\overline{V_{g}^{+}(t)}=\big\{(x,y):y\leq w^{+}(t,x)\big\},

where the auxiliary value function $w^{+}$ is defined by

w^{+}(t,x)\coloneqq\sup\{y\in\mathbb{R}:(x,y)\in V_{g}^{+}(t)\},   (5.9)

and it is a discontinuous viscosity solution of the PDE

-\partial_{t}w(t,x)-H^{+}\big(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x)\big)=0,\;(t,x)\in[0,T)\times\mathbb{R}^{d},\quad w(T^{-},x)=g(x),\;x\in\mathbb{R}^{d},   (5.10)

where $H^{+}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}\longrightarrow\mathbb{R}$ is given by

H^{+}(t,x,p,Q)\coloneqq\sup_{(a,z,\gamma,b^{\star})\in N(t,x,p)}\big\{h^{\rm b}(t,x,p,Q,a,z,\gamma)\big\}.
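Since the characterisation through (5.6) and (5.10) is one of the features making our approach amenable to numerical implementation (see the discussion around Theorem 5.8), we sketch below, in a deliberately simplified and hedged form, how the two boundary PDEs could be approximated: an explicit finite-difference scheme marching backward from the terminal condition, for $d=1$, with toy coefficients and finite control grids of our own choosing; in particular, the reduction $z=p$ from (5.8) is hard-coded.

```python
import numpy as np

# Hedged finite-difference sketch for the boundary PDEs (5.6) and (5.10) in d = 1:
#   -dw/dt - H(t, x, w_x, w_xx) = 0,  w(T, x) = g(x),
# marched backward in time by an explicit scheme. Coefficients and control grids
# are toy assumptions of ours; the CFL-type condition dt <= dx^2 / max(sigma^2) applies.

Nx, Nt, L, T = 201, 4000, 3.0, 1.0
x = np.linspace(-L, L, Nx); dx = x[1] - x[0]; dt = T / Nt

g = lambda xx: np.tanh(xx)
A_grid = np.linspace(-1.0, 1.0, 5)    # leader's control grid (assumption)
B_grid = np.linspace(0.5, 1.5, 5)     # follower's control grid (assumption)

def hamiltonian(p, Q, lower):
    # h^b = c + (sigma*lam)*p + 0.5*sigma^2*Q over the fibre N(t, x, p), cf. (5.7)-(5.8):
    # z = p is imposed, and (a, b) range over the finite control grids.
    best = np.full_like(p, np.inf if lower else -np.inf)
    for a in A_grid:
        for b in B_grid:
            sig = b
            val = -0.5 * b**2 + sig * a * p + 0.5 * sig**2 * Q
            best = np.minimum(best, val) if lower else np.maximum(best, val)
    return best

def solve(lower=True):
    w = g(x).copy()                                   # terminal condition at t = T
    for _ in range(Nt):                               # march backward from T to 0
        p = np.gradient(w, dx)
        Q = np.zeros_like(w)
        Q[1:-1] = (w[2:] - 2.0 * w[1:-1] + w[:-2]) / dx**2
        w = w + dt * hamiltonian(p, Q, lower)         # w(t - dt) = w(t) + dt * H
        w[0], w[-1] = g(x[0]), g(x[-1])               # crude Dirichlet boundary (assumption)
    return w

w_minus, w_plus = solve(lower=True), solve(lower=False)
print("gap w^+ - w^- at x = 0:", w_plus[Nx // 2] - w_minus[Nx // 2])
```

Running such a sketch also provides a crude numerical check of the separation between the two boundaries, which is precisely the quantity $\delta_{\varepsilon}$ discussed below.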

We propose the two auxiliary value functions as the upper and lower boundaries of $V_{g}(t)$, and thus define the set

\hat{V}_{g}(t)\coloneqq\{(x,y):w^{-}(t,x)\leq y\leq w^{+}(t,x)\},

which, provided the upper and lower boundaries are sufficiently separated before $T$, corresponds to the closure of the reachability set $V_{g}(t)$, as we prove next. For this, we introduce

\delta_{\varepsilon}\coloneqq\inf_{(t,x)\in[0,T-\varepsilon]\times\mathbb{R}^{d}}|w^{-}(t,x)-w^{+}(t,x)|,\;\varepsilon>0.
Lemma 5.3.

Let $t\in[0,T]$. The following hold:

(i)

$V_{g}(t)\subseteq\hat{V}_{g}(t)$.

(ii)

If in addition $\delta_{\varepsilon}>0$ for any $\varepsilon>0$, and $w^{-}$ and $w^{+}$ are continuous, then ${\rm int}\big(\hat{V}_{g}(t)\big)\subseteq V_{g}(t)$ and $\overline{V_{g}(t)}=\hat{V}_{g}(t)$.

Remark 5.4.

Let us provide a sufficient structural condition for the assumption $\delta_{\varepsilon}>0$ for any $\varepsilon>0$, before presenting the proof of Lemma 5.3. We claim that it holds if PDE (5.10) satisfies a comparison principle, as will follow from the analysis in Section 5.1, and if there is $\eta>0$ such that

H^{+}(t,x,p,Q)\geq H^{-}(t,x,p,Q)+\eta,\;\forall(t,x,p,Q)\in[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{S}^{d}.   (5.11)

Indeed, under this condition, it is easy to see that the function $\hat{w}^{-}(t,x)\coloneqq w^{-}(t,x)+\eta(T-t)$ is a discontinuous viscosity sub-solution to PDE (5.10): at any test point for $w^{-}$, $-\partial_{t}\hat{w}^{-}-H^{+}=-\partial_{t}w^{-}+\eta-H^{+}\leq-\partial_{t}w^{-}-H^{-}\leq 0$, by (5.11). Therefore, from the comparison principle we have $\hat{w}^{-}\leq w^{+}$, i.e. $w^{+}-w^{-}\geq\eta(T-t)$, which implies $\delta_{\varepsilon}\geq\eta\varepsilon>0$ for any $\varepsilon>0$. A similar argument works if PDE (5.6) satisfies a comparison principle instead.

Proof of Lemma 5.3.

Let us argue $(i)$. Let $(x,y)\in V_{g}(t)$; then there exists $\upsilon\in\mathfrak{C}(t,x,y)$ such that $Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon})$, $\mathbb{P}$--a.s. Then it is clear that $(x,y)$ belongs to both auxiliary sets $V_{g}^{-}(t)$ and $V_{g}^{+}(t)$, that is, $(x,y)\in V_{g}^{-}(t)\cap V_{g}^{+}(t)$. Since $\hat{V}_{g}(t)=\overline{V_{g}^{-}(t)}\cap\overline{V_{g}^{+}(t)}$, it follows that $V_{g}(t)\subseteq\hat{V}_{g}(t)$.

As for $(ii)$, we first note that the second part of the statement, i.e. $\overline{V_{g}(t)}=\hat{V}_{g}(t)$, follows from the inclusions ${\rm int}(\hat{V}_{g}(t))\subseteq V_{g}(t)\subseteq\hat{V}_{g}(t)$ by taking closures. Let us now argue ${\rm int}(\hat{V}_{g}(t))\subseteq V_{g}(t)$. To increase the readability of the proof, given $(t,x,y)\in[0,T]\times\mathbb{R}^{d+1}$ and $\upsilon\in\mathfrak{C}(t,x,y)$, we will say that $\upsilon$ satisfies $(U)$ or $(L)$ whenever $Y_{T}^{t,x,y,\upsilon}\geq g(X_{T}^{t,x,\upsilon})$, $\mathbb{P}$--a.s., or $Y_{T}^{t,x,y,\upsilon}\leq g(X_{T}^{t,x,\upsilon})$, $\mathbb{P}$--a.s., respectively. Let $t\in[0,T]$ and $(x,y)\in{\rm int}(\hat{V}_{g}(t))$. We argue in two steps.

Step 1. We fix $n\in\mathbb{N}^{\star}$ and construct an admissible control up to $T_{n}\coloneqq T-n^{-1}$. Since $(x,y)\in{\rm int}(\hat{V}_{g}(t))$, by continuity we have that $w^{-}(t,x)<y<w^{+}(t,x)$. Thus, in particular, there is $\upsilon^{0,n}\in\mathfrak{C}(t,x,y)$ satisfying $(U)$. Let $X^{0,n}\coloneqq X^{t,x,\upsilon^{0,n}}$, $Y^{0,n}\coloneqq Y^{t,x,y,\upsilon^{0,n}}$. By [14, Corollary 2.1], $Y^{0,n}_{s}\geq w^{-}(s,X^{0,n}_{s})$, $s\in[t,T]$. We have two cases. If $Y^{0,n}_{s}=w^{-}(s,X^{0,n}_{s})$ for some $s\in[t,T]$, by the Markov property and the continuity of $w^{-}$, we find that $(x,y)\in V_{g}(t)$ as desired and conclude the proof. Otherwise, we have that $Y^{0,n}_{s}>w^{-}(s,X^{0,n}_{s})$, $s\in[t,T]$. We then define the sequence of $\mathbb{F}$--stopping times $(\tau_{k}^{n})_{k\in\{0,\dots,k(n)\}}$, with $k(n)\in\mathbb{N}$ to be defined, recursively as follows

\tau_{0}^{n}\coloneqq\inf\big\{s\geq t:w^{+}\big(s,X^{0,n}_{s}\big)-Y^{0,n}_{s}\leq\delta_{n^{-1}}/3\big\}\wedge T_{n}.

If $\tau_{0}^{n}=T_{n}$, we set $k(n)=0$ and conclude the construction. Otherwise, by continuity, we have that $w^{+}\big(\tau_{0}^{n},X_{\tau_{0}^{n}}^{0,n}\big)-Y_{\tau_{0}^{n}}^{0,n}=\delta_{n^{-1}}/3$. By definition of $\delta_{\varepsilon}$, we have that

\big(X_{\tau_{0}^{n}}^{0,n},Y_{\tau_{0}^{n}}^{0,n}\big)\in{\rm int}\big(\hat{V}_{g}(\tau_{0}^{n})\big),\;\mathbb{P}\text{--a.s.},\;\text{i.e.}\;w^{-}\big(\tau_{0}^{n},X_{\tau_{0}^{n}}^{0,n}\big)<Y_{\tau_{0}^{n}}^{0,n}<w^{+}\big(\tau_{0}^{n},X_{\tau_{0}^{n}}^{0,n}\big),\;\mathbb{P}\text{--a.s.}   (5.12)

Thus, by [14, Corollary 2.1], there is $\upsilon^{1,n}\in\mathfrak{C}(t,x,y)$ satisfying $(L)$ and $\upsilon^{1,n}=\upsilon^{0,n}$ on $[t,\tau_{0}^{n})$. Let now

\tau_{1}^{n}\coloneqq\inf\big\{s\geq\tau_{0}^{n}:Y_{s}^{1,n}-w^{-}(s,X_{s}^{1,n})\leq\delta_{n^{-1}}/3\big\}\wedge T_{n},\;\text{for}\;X^{1,n}\coloneqq X^{t,x,\upsilon^{1,n}},\;Y^{1,n}\coloneqq Y^{t,x,y,\upsilon^{1,n}}.

Arguing as above, by definition of $\tau_{1}^{n}$, we find that $(X_{\tau_{1}^{n}}^{1,n},Y_{\tau_{1}^{n}}^{1,n})\in{\rm int}\big(\hat{V}_{g}(\tau_{1}^{n})\big)$, $\mathbb{P}$--a.s. Thus, again by [14, Corollary 2.1], there is $\upsilon^{2,n}\in\mathfrak{C}(t,x,y)$ such that $(U)$ holds and $\upsilon^{2,n}=\upsilon^{1,n}$ on $[\tau_{0}^{n},\tau_{1}^{n})$. Recursively, for $k\in\mathbb{N}^{\star}$, we let $X^{k,n}\coloneqq X^{t,x,\upsilon^{k,n}}$, $Y^{k,n}\coloneqq Y^{t,x,y,\upsilon^{k,n}}$, and

\begin{aligned} \tau_{2k}^{n}&\coloneqq\inf\big\{s\geq\tau_{2k-1}^{n}:w^{+}(\tau_{k-1}^{n},X_{s}^{k,n})-Y_{s}^{k,n}\leq\delta_{n^{-1}}/3\big\}\wedge T_{n},\\ \tau_{2k+1}^{n}&\coloneqq\inf\big\{s\geq\tau_{2k}^{n}:Y_{s}^{k,n}-w^{-}(\tau_{k-1}^{n},X_{s}^{k,n})\leq\delta_{n^{-1}}/3\big\}\wedge T_{n}, \end{aligned}

and find $\upsilon^{k+1,n}\in\mathfrak{C}(t,x,y)$ for which $(X_{\tau_{k}^{n}}^{k,n},Y_{\tau_{k}^{n}}^{k,n})\in{\rm int}\big(\hat{V}_{g}(\tau_{k}^{n})\big)$, $\mathbb{P}$--a.s.

We now claim that there is an $\mathbb{N}$-valued random variable $k(n)$ such that $\tau^{n}_{k(n)}=T_{n}$, $\mathbb{P}$--a.s. Indeed, by continuity of $w^{-}$ and $w^{+}$, the mappings

[t,T_{n}]\ni s\longmapsto w^{+}\big(s,X_{s}^{t,x,\upsilon}\big)-Y_{s}^{t,x,y,\upsilon},\;\text{and}\;[t,T_{n}]\ni s\longmapsto Y_{s}^{t,x,y,\upsilon}-w^{-}\big(s,X_{s}^{t,x,\upsilon}\big),

are, $\omega$-by-$\omega$, uniformly continuous for any $\upsilon\in\mathfrak{C}(t,x,y)$. Hence, there exist a constant $\bar{\gamma}_{n}>0$ and a $[\bar{\gamma}_{n},T_{n}]$-valued random variable $\gamma_{n}$ such that $\tau_{k}^{n}-\tau_{k-1}^{n}>\gamma_{n}$, $\mathbb{P}$--a.s., $k\in\mathbb{N}$. This proves the claim.

At the end of this construction, we set $\upsilon^{n}\coloneqq\upsilon^{k(n),n}$, and notice that $\upsilon^{n}\in\mathfrak{C}(t,x,y)$ and

w^{-}\big(T_{n},X_{T_{n}}^{n}\big)<Y_{T_{n}}^{n}<w^{+}\big(T_{n},X_{T_{n}}^{n}\big),\;\mathbb{P}\text{--a.s., for }X^{n}\coloneqq X^{t,x,\upsilon^{n}},\;Y^{n}\coloneqq Y^{t,x,y,\upsilon^{n}}.   (5.13)

Step 2. We iterate the previous construction. From here on, we can repeat Step 1, with $(T_{n},X_{T_{n}}^{n})$, the control $\upsilon^{n}$, and $n+1$ playing the roles of $(t,x)$, $\upsilon^{0,n}$, and $n$, respectively. With this, we obtain the existence of $\upsilon^{n+1}\in\mathfrak{C}(t,x,y)$ such that, by uniform continuity, (5.13) holds at $(T_{n+1},X^{n+1}_{T_{n+1}})$ and $Y^{n+1}_{T_{n+1}}$. Iterating this construction, we find $\upsilon$ which is well defined $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$--a.e. on $[0,T]\times\Omega$ (indeed, the construction allows us to define said process $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$--a.e. on $[t,T)\times\Omega$, and consequently $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$--a.e. on $[0,T]\times\Omega$).

To conclude that $(x,y)\in V_{g}(t)$, let $n\longrightarrow\infty$ in (5.13), and notice that by continuity of $w^{-}$ and $w^{+}$ we have $Y_{T}^{t,x,y,\upsilon}=g(X_{T}^{t,x,\upsilon})$, as desired. ∎

5.1 Boundary PDEs: comparison and verification

In this section, we prove the following verification theorem for the solutions to PDEs (5.6) and (5.10). We remind the reader that Assumption 5.1 is in place.

Theorem 5.5.

Let $u$ and $v$ be continuous viscosity solutions to (5.6) and (5.10), respectively, with linear growth. Then $u=w^{-}$ and $v=w^{+}$.

We conduct the analysis for $w^{-}$, the argument for $w^{+}$ being analogous. We start by establishing a comparison result for viscosity solutions to (5.6). Let us recall that $w^{-}$ is a discontinuous viscosity solution of such an equation.

Lemma 5.6.

Let $u$ and $v$ be respectively an upper–semi-continuous viscosity sub-solution and a lower–semi-continuous viscosity super-solution of (5.6), such that for $\varphi\in\{u,v\}$ and some $C>0$, $\varphi(y)\leq C(1+\|y\|)$, $y\in[0,T]\times\mathbb{R}^{d}$. If $u(T,x)\leq v(T,x)$, $x\in\mathbb{R}^{d}$, then $u\leq v$ on ${\cal O}\coloneqq(0,T)\times\mathbb{R}^{d}$.

Proof.

Step 1. Fix positive constants $\alpha$, $\beta$, $\eta$, and $\varepsilon$, and define $\phi(t,x,y)\coloneqq u^{\eta}(t,x)-v(t,y)$, where $u^{\eta}(t,x)\coloneqq u(t,x)-\frac{\eta}{t}$, $(t,x)\in{\cal O}$. Note that since $\frac{\partial}{\partial t}(-\eta t^{-1})=\eta t^{-2}>0$, $u^{\eta}$ is a viscosity sub-solution of (5.6) in ${\cal O}$. Define as well

\psi_{\alpha,\beta,\varepsilon}(t,x,y)\coloneqq\alpha|x-y|^{2}/2+\varepsilon|x|^{2}+\varepsilon|y|^{2}-\beta(t-T).

Now, let

M_{\alpha,\beta,\varepsilon}\coloneqq\sup_{(t,x,y)\in(0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}}\big(\phi-\psi_{\alpha,\beta,\varepsilon}\big)(t,x,y)=(\phi-\psi_{\alpha,\beta,\varepsilon})(t_{\alpha,\beta,\varepsilon},x_{\alpha,\beta,\varepsilon},y_{\alpha,\beta,\varepsilon}),

for some $(t_{\alpha,\beta,\varepsilon},x_{\alpha,\beta,\varepsilon},y_{\alpha,\beta,\varepsilon})\in(0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}$, thanks to the upper–semi-continuity of $u^{\eta}-v$, the growth assumptions on $u$ and $v$, and the penalisation terms in $\psi_{\alpha,\beta,\varepsilon}$ and $-\eta t^{-1}$. Moreover, we have that $-\infty<\lim_{\alpha\to\infty}M_{\alpha,\beta,\varepsilon}<\infty$, meaning that the supremum is attained on a compact subset of $(0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}$. Consequently, there is a subsequence $(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon})\coloneqq(t_{\alpha_{n},\beta,\varepsilon},x_{\alpha_{n},\beta,\varepsilon},y_{\alpha_{n},\beta,\varepsilon})$ that converges to some $(\hat{t}^{\beta,\varepsilon},\hat{x}^{\beta,\varepsilon},\hat{y}^{\beta,\varepsilon})$. It then follows from [21, Proposition 3.7] that

\hat{x}^{\beta,\varepsilon}=\hat{y}^{\beta,\varepsilon},\quad\lim_{n\to\infty}\alpha_{n}|x^{\beta,\varepsilon}_{n}-y^{\beta,\varepsilon}_{n}|^{2}=0,\quad M_{\beta,\varepsilon}\coloneqq\lim_{n\to\infty}M_{\alpha_{n},\beta,\varepsilon}=\sup_{(t,x)\in\overline{\cal O}}(u^{\eta}-v)(t,x)-2\varepsilon|\hat{x}^{\beta,\varepsilon}|^{2}+\beta(\hat{t}^{\beta,\varepsilon}-T).   (5.14)

Step 2. To prove the statement, as is standard in the literature, let us assume by contradiction that there is $(t_{o},x_{o})\in{\cal O}$ such that $\gamma_{o}\coloneqq(u-v)(t_{o},x_{o})>0$. We claim that there are positive $\beta_{o}$, $\eta_{o}$, and $\varepsilon_{o}$ such that for any $\beta_{o}\geq\beta>0$, $\eta_{o}\geq\eta>0$, $\varepsilon_{o}\geq\varepsilon>0$, $(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon})$ is a local maximiser of $\phi(t,x,y)-\psi_{\alpha_{n},\beta,\varepsilon}(t,x,y)$ on $(0,T)\times{\cal K}^{2}$ for some compact ${\cal K}\subseteq\mathbb{R}^{d}$. We first note that the existence of ${\cal K}$ is clear, since the supremum is attained on a compact set. It remains to show that $t_{n}^{\beta,\varepsilon}<T$ for all $n\in\mathbb{N}$.

Suppose by contradiction that $t_{n}^{\beta,\varepsilon}=T$ for some $n$. Thanks to the first step, for any positive $\beta$, $\varepsilon$, and $\eta$ we have that

\gamma_{o}-\frac{\eta}{t_{o}}+\beta(t_{o}-T)-2\varepsilon|x_{o}|^{2}\leq M_{\alpha_{n},\beta,\varepsilon}=\sup_{(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}}\big\{u(T,x)-v(T,y)-\alpha_{n}|x-y|^{2}/2-\varepsilon|x|^{2}-\varepsilon|y|^{2}\big\}-\frac{\eta}{T}\leq-\frac{\eta}{T},

where the rightmost inequality follows from the assumption $u(T,x)\leq v(T,x)$, $x\in\mathbb{R}^{d}$. Consequently,

\gamma_{o}\leq\frac{\eta}{t_{o}}-\frac{\eta}{T}+\beta(T-t_{o})+2\varepsilon|x_{o}|^{2},

so that for $\beta$, $\varepsilon$, and $\eta$ sufficiently small the right-hand side is arbitrarily small, which contradicts $\gamma_{o}>0$. This proves the claim.

Step 3. In light of the second step, it follows from Crandall–Ishii’s lemma for parabolic problems, [21, Theorem 8.3], applied to $u^{\eta}$ and $v$, that we can find $(q_{n},\hat{q}_{n})$ with $q_{n}-\hat{q}_{n}=\partial_{t}\psi_{\alpha,\beta,\varepsilon}(t,x,y)=-\beta$, and symmetric matrices $(X_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon})$, such that

\big(q_{n},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})+\varepsilon x_{n}^{\beta,\varepsilon},X_{n}^{\beta,\varepsilon}\big)\in\overline{\cal P}^{1,2,+}u^{\eta}(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon}),\quad\big(\hat{q}_{n},-\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})+\varepsilon y_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon}\big)\in\overline{\cal P}^{1,2,-}v(t_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon}),

and, for $C_{n}\coloneqq\alpha_{n}\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}+\varepsilon I_{2d}$, we have that

-\bigg(\frac{1}{\lambda}+\|C_{n}\|\bigg)I_{2d}\leq\begin{pmatrix}X_{n}^{\beta,\varepsilon}&0\\ 0&-Y_{n}^{\beta,\varepsilon}\end{pmatrix}\leq C_{n}(I_{2d}+\lambda C_{n}),\;\text{for all }\lambda>0.

Taking $\lambda=(\alpha_{n}+\varepsilon)^{-1}$ leads to

-\big(\alpha_{n}+\varepsilon+\|C_{n}\|\big)I_{2d}\leq\begin{pmatrix}X_{n}^{\beta,\varepsilon}&0\\ 0&-Y_{n}^{\beta,\varepsilon}\end{pmatrix}\leq 3\alpha_{n}\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}+2\varepsilon I_{2d}.   (5.15)
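For the reader’s convenience, here is the elementary computation behind the right-hand side of (5.15). Writing $J\coloneqq\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}$, so that $J^{2}=2J$ and $C_{n}=\alpha_{n}J+\varepsilon I_{2d}$, we have, for $\lambda=(\alpha_{n}+\varepsilon)^{-1}$,

C_{n}(I_{2d}+\lambda C_{n})=C_{n}+\lambda C_{n}^{2}=\alpha_{n}J+\varepsilon I_{2d}+\frac{2\alpha_{n}(\alpha_{n}+\varepsilon)J+\varepsilon^{2}I_{2d}}{\alpha_{n}+\varepsilon}=3\alpha_{n}J+\bigg(\varepsilon+\frac{\varepsilon^{2}}{\alpha_{n}+\varepsilon}\bigg)I_{2d}\leq 3\alpha_{n}J+2\varepsilon I_{2d},

since $\varepsilon^{2}/(\alpha_{n}+\varepsilon)\leq\varepsilon$.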

Step 4. With the notation $(t_{n},x_{n},y_{n})\coloneqq(t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon})$, $p_{n}^{x}\coloneqq\alpha_{n}(x_{n}-y_{n})-\varepsilon x_{n}$, and $p_{n}^{y}\coloneqq\alpha_{n}(x_{n}-y_{n})-\varepsilon y_{n}$, under the above assumptions we claim that there exists a universal constant $C>0$ such that

H^{-}(t_{n},y_{n},p_{n}^{y},Q_{2})-H^{-}(t_{n},x_{n},p_{n}^{x},Q_{1})\leq C\big(1+\varepsilon^{2}\|x_{n}\|+\varepsilon^{2}\|y_{n}\|+\varepsilon\big)\big(\alpha_{n}\|x_{n}-y_{n}\|^{2}+\|x_{n}-y_{n}\|+\varepsilon\big),

for matrices $Q_{1}$, $Q_{2}$ satisfying (5.15). We consider each term in $h^{\rm b}$ separately, recalling (5.7) and (5.8); the following estimates hold for arbitrary, but fixed, $(a,\gamma,b^{\star})$.

Letting $\Sigma^{x}\coloneqq\sigma(t_{n},x_{n},a,p_{n}^{x},\gamma)$ and $\Sigma^{y}\coloneqq\sigma(t_{n},y_{n},a,p_{n}^{y},\gamma)$, note that there is $C>0$ such that

\begin{aligned} &\mathrm{Tr}[\sigma\sigma^{\top}(t_{n},y_{n},a,b^{\star}(t_{n},y_{n},p_{n}^{y},\gamma,a))Q_{2}]-\mathrm{Tr}[\sigma\sigma^{\top}(t_{n},x_{n},a,b^{\star}(t_{n},x_{n},p_{n}^{x},\gamma,a))Q_{1}]\\ &=\mathrm{Tr}\bigg[\begin{pmatrix}\Sigma^{x}{\Sigma^{x}}^{\top}&\Sigma^{x}{\Sigma^{y}}^{\top}\\ \Sigma^{y}{\Sigma^{x}}^{\top}&\Sigma^{y}{\Sigma^{y}}^{\top}\end{pmatrix}\begin{pmatrix}Q_{2}&0\\ 0&-Q_{1}\end{pmatrix}\bigg]\\ &\leq 3\alpha_{n}\mathrm{Tr}\bigg[\begin{pmatrix}\Sigma^{x}{\Sigma^{x}}^{\top}&\Sigma^{x}{\Sigma^{y}}^{\top}\\ \Sigma^{y}{\Sigma^{x}}^{\top}&\Sigma^{y}{\Sigma^{y}}^{\top}\end{pmatrix}\begin{pmatrix}I_{d}&-I_{d}\\ -I_{d}&I_{d}\end{pmatrix}\bigg]+2\varepsilon\mathrm{Tr}\bigg[\begin{pmatrix}\Sigma^{x}{\Sigma^{x}}^{\top}&\Sigma^{x}{\Sigma^{y}}^{\top}\\ \Sigma^{y}{\Sigma^{x}}^{\top}&\Sigma^{y}{\Sigma^{y}}^{\top}\end{pmatrix}I_{2d}\bigg]\\ &=3\alpha_{n}\mathrm{Tr}\big[(\Sigma^{x}-\Sigma^{y})(\Sigma^{x}-\Sigma^{y})^{\top}\big]+2\varepsilon\mathrm{Tr}\big[\Sigma^{x}{\Sigma^{x}}^{\top}+\Sigma^{y}{\Sigma^{y}}^{\top}\big]\\ &=3\alpha_{n}\|\Sigma^{x}-\Sigma^{y}\|^{2}+2\varepsilon\mathrm{Tr}\big[\Sigma^{x}{\Sigma^{x}}^{\top}+\Sigma^{y}{\Sigma^{y}}^{\top}\big]\\ &\leq 3\alpha_{n}\|\sigma(t_{n},x_{n},a,p_{n}^{x},\gamma)-\sigma(t_{n},y_{n},a,p_{n}^{y},\gamma)\|^{2}+4\varepsilon C_{\sigma\sigma^{\top}}\leq C\big((1+\varepsilon)\alpha_{n}\|x_{n}-y_{n}\|^{2}+\varepsilon\big), \end{aligned}

where the first inequality follows from the right-hand side of (5.15), $C_{\sigma\sigma^{\top}}$ denotes the bound, assumed to exist, on $\sigma\sigma^{\top}$, and the last inequality follows from Assumption 5.1. Similarly, note that there is a constant $C>0$ such that

\begin{aligned} c(t_{n},y_{n},a,p_{n}^{y},\gamma)-c(t_{n},x_{n},a,p_{n}^{x},\gamma)&\leq C\big(\|x_{n}-y_{n}\|+\|b^{\star}(t_{n},y_{n},p_{n}^{y},\gamma,a)-b^{\star}(t_{n},x_{n},p_{n}^{x},\gamma,a)\|\big)\\ &\leq C\big(\|x_{n}-y_{n}\|+\|p_{n}^{y}-p_{n}^{x}\|\big)\leq C\big(\|x_{n}-y_{n}\|+\varepsilon\big), \end{aligned}

and

\displaystyle\sigma\lambda(t_{n},x_{n},p_{n}^{x},\gamma,a)\cdot p_{n}^{x}-\sigma\lambda(t_{n},y_{n},p_{n}^{y},\gamma,a)\cdot p_{n}^{y}
\leq\|\sigma\lambda(t_{n},x_{n},p_{n}^{x},\gamma,a)\|\|p_{n}^{x}-p_{n}^{y}\|+\|\sigma\lambda(t_{n},x_{n},p_{n}^{x},\gamma,a)-\sigma\lambda(t_{n},y_{n},p_{n}^{y},\gamma,a)\|\|p_{n}^{y}\|
\leq\varepsilon C\|x_{n}-y_{n}\|+C\|p_{n}^{y}\|(1+\varepsilon)\|x_{n}-y_{n}\|\leq C(1+\varepsilon+\varepsilon^{2}\|y_{n}\|)\big{(}\|x_{n}-y_{n}\|+\alpha_{n}\|x_{n}-y_{n}\|^{2}\big{)}.

The result follows from substituting these estimates back into the Hamiltonian.

Step 5. We conclude. By Step 3 and the viscosity properties of u^{\eta} and v, we have that

-q_{n}+H^{-}\big{(}t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon x_{n}^{\beta,\varepsilon},X_{n}^{\beta,\varepsilon}\big{)}\leq 0\leq-\hat{q}_{n}+H^{-}\big{(}t_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon y_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon}\big{)}.

Subtracting, we find from Step 4 that

\displaystyle\beta=\hat{q}_{n}-q_{n}\leq H^{-}\big{(}t_{n}^{\beta,\varepsilon},y_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon y_{n}^{\beta,\varepsilon},Y_{n}^{\beta,\varepsilon}\big{)}-H^{-}\big{(}t_{n}^{\beta,\varepsilon},x_{n}^{\beta,\varepsilon},\alpha_{n}(x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon})-\varepsilon x_{n}^{\beta,\varepsilon},X_{n}^{\beta,\varepsilon}\big{)}
\leq C(1+\varepsilon^{2}\|x_{n}^{\beta,\varepsilon}\|+\varepsilon^{2}\|y_{n}^{\beta,\varepsilon}\|+\varepsilon)\big{(}\alpha_{n}\|x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon}\|^{2}+\|x_{n}^{\beta,\varepsilon}-y_{n}^{\beta,\varepsilon}\|+\varepsilon\big{)}.

Passing to the limit n\longrightarrow\infty and \varepsilon\longrightarrow 0, thanks to (5.14), we find that \beta\leq 0, which is a contradiction. ∎

The next lemma proves, in particular, that the auxiliary value function satisfies the hypotheses of Lemma 5.6.

Lemma 5.7.

Suppose the functions H^{+} and H^{-} are continuous. Then the functions w^{-} and w^{+} from [0,T]\times\mathbb{R}^{d} to \mathbb{R}, defined in (5.9) and (5.5) respectively, are bounded and continuous.

To complete the last step in the verification result, we have assumed the continuity of the Hamiltonian functions. We remark that this assumption holds, for instance, if the optimisation over \gamma in the definition of H^{+} and H^{-} can be reduced to a compact set, continuously with respect to (t,x,p,Q).

Proof of Lemma 5.7.

We prove the result for w^{-}, the other case being analogous. We first argue that w^{-} is bounded. Let (t,x)\in[0,T]\times\mathbb{R}^{d} and y>T\ell_{c}+\ell_{g}. We claim that (x,y)\in V_{g}(t). Indeed, taking the control Z\equiv 0, \Gamma\equiv 0 and any (\alpha,b^{\star})\in{\cal A}\times{\cal B}^{\star}, we have

Y_{T}^{t,x,y,\upsilon}=y-\int_{t}^{T}c\big{(}s,X_{s}^{t,x,\upsilon},Z_{s},\hat{\upsilon}\big{)}\mathrm{d}s\geq y-T\ell_{c}>\ell_{g}\geq g(X_{T}^{t,x,\upsilon}).

That is, w^{-}(t,x)\leq T\ell_{c}+\ell_{g}. To obtain a lower bound, take again (t,x)\in[0,T]\times\mathbb{R}^{d} and y<-T\ell_{c}-\ell_{g}. Then, it is easy to check that, for any M\in\mathbb{R} and any \upsilon\in{\mathfrak{C}}, the following process is an (\mathbb{F},\mathbb{P})–super-martingale

A_{s}\coloneqq Y_{s}^{t,x,y,\upsilon}-s\ell_{c}+M,\;s\in[0,T].

Thus, choosing M=T\ell_{c}+\ell_{g}, we have that \mathbb{E}^{\mathbb{P}}[Y_{T}^{t,x,y,\upsilon}-T\ell_{c}+M]\leq y+M<0, which implies \mathbb{P}[Y_{T}^{t,x,y,\upsilon}+\ell_{g}<0]>0. Therefore, for any \upsilon\in{\mathfrak{C}}

\mathbb{P}\big{[}Y_{T}^{t,x,y,\upsilon}<g(X_{T}^{t,x,\upsilon})\big{]}\geq\mathbb{P}\big{[}Y_{T}^{t,x,y,\upsilon}+\ell_{g}<0\big{]}>0,

which means that the pair (x,y)\not\in V_{g}^{1}(t). Thus, w^{-}(t,x)\geq-T\ell_{c}-\ell_{g}.

Let us now prove the continuity. By [13, Theorem 2.1], w^{-} is a discontinuous viscosity solution to PDE (5.10) as long as we verify Assumption 2.1 therein. Indeed, the continuity condition on the set N(t,x,p) holds in our case given the explicit form obtained in (5.8). Since H^{-} is continuous, the lower– and upper–semicontinuous envelopes w^{-}_{\star} and w^{-,\star} are viscosity super-solution and sub-solution, respectively, of (5.6). From [13, Theorem 2.2], which in our case is not subject to the gradient constraints (see [13, Remark 2.1] and notice that in our setting their set N_{c} is empty), we conclude that w^{-,\star}(T,\cdot)\leq g\leq w^{-}_{\star}(T,\cdot). Finally, from Lemma 5.6, we therefore have that w^{-,\star}\leq w^{-}_{\star} on [0,T)\times\mathbb{R}^{d}. Since the reverse inequality holds by definition, we conclude the equality of the semicontinuous envelopes, and thus the continuity of w^{-}. ∎

Proof of Theorem 5.5.

The result is an immediate consequence of Lemmata 5.6 and 5.7. ∎

5.2 PDE characterisation for the problem of the leader

Having conducted the analysis of the auxiliary boundary functions w^{-} and w^{+}, we are now in a position to provide a verification theorem for Problem 5.4. Theorem 5.8 below provides a PDE characterisation for the intermediate problem of the leader under the CL information structure. Let us remark that once V(t,x,y) is found, it only remains to optimise over y\in\mathbb{R}.

To ease the notation, we will use {\rm x}\in\mathbb{R}^{d+1} and u\in A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times{\cal B}^{\star}\eqqcolon U to denote the values of the state and control processes associated with Problem 5.4, that is, we make the convention {\rm x}=(x,y) and u=(a,z,\gamma,b^{\star}). In this way, recalling (5.1), we let C(t,{\rm x},u)\coloneqq C(t,x,a,z,\gamma), and similarly for the other functions in the analysis. Moreover, we denote the drift and volatility coefficients (\mu,\vartheta):[0,T]\times\mathbb{R}^{d+1}\times U\longrightarrow\mathbb{R}^{d+1}\times\mathbb{R}^{(d+1)\times n} associated with the state process {\rm X}\coloneqq(X,Y) by

\displaystyle\mu(t,{\rm x},u)\coloneqq\begin{pmatrix}\sigma\lambda(t,x,u)\\ -c(t,x,u)\end{pmatrix},\;\vartheta(t,{\rm x},u)\coloneqq\begin{pmatrix}\sigma(t,x,u)\\ z\cdot\sigma(t,x,u)\end{pmatrix}.
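For concreteness, the stacking in the definition of (\mu,\vartheta) can be spelled out in code. The following Python sketch is purely illustrative: sigma_lam, c_fun and sigma are hypothetical callables standing for \sigma\lambda, c and \sigma, and z is the corresponding component of the control.

import numpy as np

def mu(t, x, y, u, sigma_lam, c_fun):
    # Drift of X = (X, Y): the vector sigma*lambda stacked with -c, in R^{d+1}
    return np.concatenate([sigma_lam(t, x, u), [-c_fun(t, x, u)]])

def vartheta(t, x, y, u, z, sigma):
    # Volatility of X = (X, Y): the d x n matrix sigma with the extra row z . sigma
    s = sigma(t, x, u)             # shape (d, n)
    return np.vstack([s, z @ s])   # shape (d + 1, n)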

Given w\in C^{1,2}([0,T)\times\mathbb{R}^{d}), we define the sets

U^{-}(t,x,w)\coloneqq\big{\{}u\in U:\sigma^{\top}\!(t,x,u)(z-\partial_{x}w(t,x))=0,\;-\partial_{t}w(t,x)-h^{\rm b}(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x),u)\geq 0\big{\}},
U^{+}(t,x,w)\coloneqq\big{\{}u\in U:\sigma^{\top}\!(t,x,u)(z-\partial_{x}w(t,x))=0,\;-\partial_{t}w(t,x)-h^{\rm b}(t,x,\partial_{x}w(t,x),\partial_{xx}^{2}w(t,x),u)\leq 0\big{\}},

and, for i\in\{-,+\}, introduce the Hamiltonians (H^{\rm L},H^{i,w}):[0,T]\times\mathbb{R}^{d+1}\times\mathbb{R}^{d+1}\times\mathbb{S}^{d+1}\longrightarrow\mathbb{R}, given by

{H}^{\rm L}(t,{\rm x},{\rm p},{\rm Q})\coloneqq\sup_{u\in U}\big{\{}{h}^{\rm L}(t,{\rm x},{\rm p},{\rm Q},u)\big{\}},\;{\rm H}^{i,w}(t,{\rm x},{\rm p},{\rm Q})\coloneqq\sup_{u\in U^{i}(t,x,w)}\big{\{}{h}^{\rm L}(t,{\rm x},{\rm p},{\rm Q},u)\big{\}}, (5.16)

where

{h}^{\rm L}(t,{\rm x},{\rm p},{\rm Q},u)\coloneqq C(t,x,u)+\mu(t,{\rm x},u)\cdot{\rm p}+\frac{1}{2}{\rm Tr}[\vartheta\vartheta^{\top}\!(t,{\rm x},u){\rm Q}].
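Since the suprema in (5.16) are rarely available in closed form, it may help to record how they can be approximated numerically. The following Python sketch evaluates h^{\rm L} pointwise over a finite grid of controls replacing U and takes the maximum; the callables C, mu and vartheta are hypothetical coefficient functions supplied by the user, and nothing here is specific to a particular model.

import numpy as np

def hamiltonian_leader(t, x, p, Q, control_grid, C, mu, vartheta):
    # Brute-force approximation of H^L(t, x, p, Q) in (5.16): evaluate
    # h^L at every control of a finite grid replacing U, keep the largest.
    best = -np.inf
    for u in control_grid:
        theta = vartheta(t, x, u)                    # (d+1) x n matrix
        h = (C(t, x, u) + mu(t, x, u) @ p            # C + mu . p
             + 0.5 * np.trace(theta @ theta.T @ Q))  # + 1/2 Tr[theta theta^T Q]
        best = max(best, h)
    return best

The same routine, restricted to controls satisfying the constraints defining U^{-} or U^{+}, approximates H^{i,w}.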

Below, {\cal T}_{T} denotes the family of \mathbb{F}–stopping times with values in [0,T]. With this, we have all the elements necessary to state our main result, which is the following verification theorem.

Theorem 5.8.

Let w^{i}\in C^{1,2}([0,T)\times\mathbb{R}^{d})\cap C^{0}([0,T]\times\mathbb{R}^{d}), i\in\{-,+\}, be solutions to (5.6) and (5.10), and let v\in C^{1,2}([0,T)\times\mathbb{R}^{d+1})\cap C^{0}([0,T]\times\mathbb{R}^{d+1}) satisfy

\begin{cases}-\partial_{t}v(t,{\rm x})-{\rm H}^{\rm L}(t,{\rm x},\partial_{\rm x}v(t,{\rm x}),\partial_{\rm xx}^{2}v(t,{\rm x}))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{d}\times(w^{-}(t,x),w^{+}(t,x)),\\ -\partial_{t}v(t,{\rm x})-{\rm H}^{i,w^{i}}(t,{\rm x},\partial_{\rm x}v(t,{\rm x}),\partial_{\rm xx}^{2}v(t,{\rm x}))=0,\;(t,x,y)\in[0,T)\times\mathbb{R}^{d}\times\{w^{i}(t,x)\},\;i\in\{-,+\},\\ v(T^{-},{\rm x})=G(x),\;(x,y)\in\mathbb{R}^{d}\times\{g(x)\}.\end{cases} (5.17)

Moreover, suppose that

  • the family \{v(\tau,X_{\tau}^{\upsilon},Y_{\tau}^{\upsilon})\}_{\tau\in{\cal T}_{T}} is uniformly integrable for all controls \upsilon\in{\mathfrak{C}};

  • there exists \upsilon^{\star}:[0,T]\times\mathbb{R}^{d}\times[w^{-},w^{+}]\longrightarrow A\times\mathbb{R}^{d}\times\mathbb{S}^{d}\times{\cal B}^{\star} attaining the maximisers in H^{\rm L} and H^{i,w^{i}}, i\in\{+,-\} (here, [w^{-},w^{+}]\coloneqq\{y\in\mathbb{R}:w^{-}(t,x)\leq y\leq w^{+}(t,x),\text{ for some }(t,x)\in[0,T]\times\mathbb{R}^{d}\});

  • there is a strong solution to the system (5.2) with control (\alpha^{\star}_{\cdot},Z_{\cdot}^{\star},\Gamma_{\cdot}^{\star},b^{\star})\coloneqq\upsilon^{\star}(\cdot,X_{\cdot},Y_{\cdot});

  • (\alpha^{\star},Z^{\star},\Gamma^{\star},b^{\star})\in{\mathfrak{C}}.

Then, V(t,x,y)=v(t,x,y), and (\alpha^{\star},Z^{\star},\Gamma^{\star},b^{\star}) is an optimal control for the problem V(t,x,y).

Remark 5.9.

We remark that we could build upon one of the main results of [14] to characterise the functions V, w^{+}, and w^{-} given by (5.4), (5.5), and (5.9), respectively, as viscosity solutions to—a relaxed version of—(5.6), (5.10) and (5.17), respectively. In particular, if one can show that V, w^{+}, and w^{-} are smooth and the associated Hamiltonians are continuous, the relaxation reduces to the above system. We refer to [14] for details. We have refrained from doing so, as the above verification theorem gives the result most useful for solving examples in practice. In Section 2.2, we use the above result and search for a solution to the above system directly.

Proof.

Let t\in[0,T], (x,y)\in V_{g}(t), \upsilon\in\mathfrak{C}(t,x,y), and (X,Y)\coloneqq(X^{t,x,\upsilon},Y^{t,x,y,\upsilon}) be given by (5.2). We set {\rm X}\coloneqq(X,Y). By Lemma 5.3, we have that w^{-}(t,x)\leq y\leq w^{+}(t,x), and by a comparison argument we find that w^{-}(s,X_{s})\leq Y_{s}\leq w^{+}(s,X_{s}), s\in[t,T], \mathbb{P}\text{--a.s.} Let \tau be given by

\tau\coloneqq\inf\{s>t:Y_{s}=w^{-}(s,X_{s})\;\text{or}\;Y_{s}=w^{+}(s,X_{s})\}\wedge T.
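On a discrete time grid, this stopping time is simply the first grid time at which Y touches one of the moving boundaries. A minimal sketch with hypothetical inputs—arrays t, X, Y for a discretised path and callables w_minus, w_plus for w^{-} and w^{+}:

def first_boundary_hit(t, X, Y, w_minus, w_plus, tol=1e-8):
    # Discrete analogue of tau: first time Y_s = w^-(s, X_s) or w^+(s, X_s),
    # capped at the terminal time if the band (w^-, w^+) is never left.
    for i in range(len(t)):
        if (abs(Y[i] - w_minus(t[i], X[i])) <= tol
                or abs(Y[i] - w_plus(t[i], X[i])) <= tol):
            return t[i]
    return t[-1]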

We now consider the process v(t,{\rm X}_{t})\coloneqq v(t,X_{t},Y_{t}) and compute v(t,{\rm X}_{t})-v(\theta,{\rm X}_{\theta})=v(t,{\rm X}_{t})-v(\tau,{\rm X}_{\tau})+v(\tau,{\rm X}_{\tau})-v(\theta,{\rm X}_{\theta})\eqqcolon I_{1}+I_{2}, for \theta\in{\cal T}_{T}, \tau\leq\theta. It follows from Itô's formula that

\displaystyle I_{1}=-\int_{t}^{\tau}\Big{(}\partial_{t}v(s,{\rm X}_{s})\mathrm{d}s+\frac{1}{2}{\rm Tr}[\partial_{\rm xx}^{2}v(s,{\rm X}_{s})\mathrm{d}[{\rm X}]_{s}]\Big{)}-\int_{t}^{\tau}\partial_{\rm x}v(s,{\rm X}_{s})\cdot\mathrm{d}{\rm X}_{s}
=\int_{t}^{\tau}\Big{(}{\rm H}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s})\big{)}-{\rm h}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s}),\upsilon_{s}\big{)}\Big{)}\mathrm{d}s
\quad+\int_{t}^{\tau}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{t}^{\tau}\big{(}\partial_{x}v(s,{\rm X}_{s}),\partial_{y}v(s,{\rm X}_{s})\big{)}^{\top}\cdot\big{(}\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s},Z_{s}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}\big{)}^{\top}
\geq\int_{t}^{\tau}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{t}^{\tau}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})Z_{s}\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}, (5.18)

where we used the fact that, on [t,\tau), v satisfies the first equation in (5.17), computed the dynamics of {\rm X}, and added and subtracted C to complete the term h^{\rm L}. The inequality follows from the definition of H^{\rm L}.

We now consider I_{2}. Without loss of generality, we assume that Y_{\tau}=w^{-}(\tau,X_{\tau}), and thus, by the Markov property, Y_{s}=w^{-}(s,X_{s}) for s\in[\tau,T], \mathbb{P}\text{--a.s.}, that is, the boundary is absorbing. By the uniqueness of the Itô decomposition, we deduce that Z_{t}=\partial_{x}w^{-}(t,X_{t}) and \upsilon_{t}\in N(t,X_{t},\partial_{x}w^{-}(t,X_{t})), \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.} With this, applying Itô's formula to w^{-}(\tau,X_{\tau}), we find that \upsilon attains the infimum in (5.7); in particular, \upsilon_{t}\in U^{-}(t,X_{t}), \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.} Let \bar{v}(t,x)\coloneqq v(t,x,w^{-}(t,x)), so that

\displaystyle I_{2}=-\int_{\tau}^{\theta}\bigg{(}\partial_{t}\bar{v}(s,X_{s})\mathrm{d}s+\frac{1}{2}{\rm Tr}[\partial_{xx}^{2}\bar{v}(s,X_{s})\mathrm{d}[X]_{s}]\bigg{)}-\int_{\tau}^{\theta}\partial_{x}\bar{v}(s,X_{s})\cdot\mathrm{d}X_{s}
=-\int_{\tau}^{\theta}\Big{(}\partial_{t}v(s,X_{s},w^{-}(s,X_{s}))+\sigma\lambda(s,X_{s},\upsilon_{s})\cdot\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))\Big{)}\mathrm{d}s-\int_{\tau}^{\theta}\frac{1}{2}{\rm Tr}[\partial_{xx}^{2}\bar{v}(s,X_{s})\mathrm{d}[X]_{s}]
\quad-\int_{\tau}^{\theta}\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\Big{(}\partial_{t}w^{-}(s,X_{s})+\sigma\lambda(s,X_{s},\upsilon_{s})\cdot\partial_{x}w^{-}(s,X_{s})\Big{)}\mathrm{d}s
\quad-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))+\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}
=-\int_{\tau}^{\theta}\Big{(}\partial_{t}v(s,X_{s},w^{-}(s,X_{s}))+\sigma\lambda(s,X_{s},\upsilon_{s})\cdot\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))-\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))c(s,X_{s},\upsilon_{s})\Big{)}\mathrm{d}s
\quad-\int_{\tau}^{\theta}\bigg{(}\frac{1}{2}{\rm Tr}[\partial_{xx}^{2}\bar{v}(s,X_{s})\mathrm{d}[X]_{s}]-\frac{1}{2}\partial_{y}v(s,X_{s},w^{-}(s,X_{s})){\rm Tr}[\partial^{2}_{xx}w^{-}(s,X_{s})\mathrm{d}[X]_{s}]\bigg{)}
\quad-\int_{\tau}^{\theta}\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\bigg{(}\partial_{t}w^{-}(s,X_{s})+h^{\rm b}(s,X_{s},\partial_{x}w^{-}(s,X_{s}),\partial^{2}_{xx}w^{-}(s,X_{s}),\upsilon_{s})\bigg{)}\mathrm{d}s
\quad-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,X_{s},w^{-}(s,X_{s}))+\partial_{y}v(s,X_{s},w^{-}(s,X_{s}))\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s},

where, in the first equality, we computed the time and space derivatives of \bar{v} and the dynamics of X, and, in the second equality, we added and subtracted \partial_{y}v\big{(}c+\frac{1}{2}{\rm Tr}[\sigma\sigma^{\top}\!\partial^{2}_{xx}w^{-}]\big{)} and used the fact that Z_{\cdot}=\partial_{x}w^{-}(\cdot,X_{\cdot}) to complete the term h^{\rm b} in the third line.

Recalling that \upsilon attains the infimum in (5.7), we see that the term \partial_{t}w^{-}+h^{\rm b} vanishes. Moreover, since Z_{\cdot}=\partial_{x}w^{-}(\cdot,X_{\cdot}), we have {\rm Tr}\big{[}\partial^{2}_{\rm xx}v(t,{\rm X}_{t})\mathrm{d}[{\rm X}]_{t}\big{]}={\rm Tr}\big{[}\partial^{2}_{xx}\bar{v}(t,X_{t})\mathrm{d}[X]_{t}\big{]}-\partial_{y}v(t,X_{t},w^{-}(t,X_{t})){\rm Tr}\big{[}\partial^{2}_{xx}w^{-}(t,X_{t})\mathrm{d}[X]_{t}\big{]}, \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.} Consequently,

\displaystyle I_{2}=-\int_{\tau}^{\theta}\Big{(}\partial_{t}v(s,{\rm X}_{s})+{\rm h}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s}),\upsilon_{s}\big{)}\Big{)}\mathrm{d}s
\quad+\int_{\tau}^{\theta}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}
=\int_{\tau}^{\theta}\Big{(}{\rm H}^{-,w^{-}}\!\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s})\big{)}-{\rm h}^{\rm L}\big{(}s,{\rm X}_{s},\partial_{\rm x}v(s,{\rm X}_{s}),\partial_{\rm xx}^{2}v(s,{\rm X}_{s}),\upsilon_{s}\big{)}\Big{)}\mathrm{d}s
\quad+\int_{\tau}^{\theta}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}
\geq\int_{\tau}^{\theta}C(s,X_{s},\upsilon_{s})\mathrm{d}s-\int_{\tau}^{\theta}\Big{(}\partial_{x}v(s,{\rm X}_{s})+\partial_{y}v(s,{\rm X}_{s})\partial_{x}w^{-}(s,X_{s})\Big{)}\cdot\sigma(s,X_{s},\upsilon_{s})\mathrm{d}W_{s}, (5.19)

where in the first equality we added and subtracted C to complete the expression for {\rm h}^{\rm L}, and in the second equality we used the fact that v satisfies the second equation in (5.17) for {\rm X}_{\cdot}=(X_{\cdot},w^{-}(\cdot,X_{\cdot})). The inequality follows from the definition of {\rm H}^{-,w^{-}} and the fact that \upsilon_{\cdot}\in N(\cdot,X_{\cdot},\partial_{x}w^{-}(\cdot,X_{\cdot})), \mathrm{d}t\otimes\mathrm{d}\mathbb{P}\text{--a.e.}

Let (\theta_{n})_{n\geq 1}\subseteq{\cal T}_{T}, with t\leq\theta_{n}\leq\theta_{n+1}, n\geq 1, and \theta_{n}\longrightarrow T, \mathbb{P}\text{--a.s.}, be such that (X,Y) is bounded on [t,\theta_{n}]. We now add (5.18) and (5.19), and take \theta=\theta_{n}. (Note that whenever \tau=T, we have I_{2}=0 and \theta_{n}\longrightarrow\tau=T.) By continuity, the terms v, w^{+}, and their derivatives are bounded on [t,\theta_{n}]. Thus, since \sigma is bounded and \|Z\|_{\mathbb{H}^{p}(\mathbb{F},\mathbb{P})}^{p}<\infty, the stochastic integrals in (5.18) and (5.19) are martingales. Consequently,

\displaystyle v(t,x,y)\geq\mathbb{E}\bigg{[}v(\theta_{n},X_{\theta_{n}},Y_{\theta_{n}})+\int_{t}^{\theta_{n}}C(s,X_{s},\upsilon_{s})\mathrm{d}s\bigg{|}{\cal F}_{t}\bigg{]}.

Thus, the uniform integrability of the family \{v(\theta_{n},X_{\theta_{n}},Y_{\theta_{n}})\}_{n\geq 1} and the boundedness of C, together with an application of the dominated convergence theorem, give

\displaystyle v(t,x,y)\geq\mathbb{E}\bigg{[}G(X_{T})+\int_{t}^{T}C(s,X_{s},\upsilon_{s})\mathrm{d}s\bigg{|}{\cal F}_{t}\bigg{]}, (5.20)

where we used the boundary condition in time in (5.17) and the fact that, by the feasibility of \upsilon, if \tau=T we have Y_{T}=g(X_{T}), while if \tau<T we have w^{-}(T^{-},x)=g(x), see (5.10). The arbitrariness of \upsilon gives v\geq V. To conclude, note that for (Z^{\star},\Gamma^{\star},\alpha^{\star}) as in the statement, the inequalities in (5.18) and (5.19) are tight. ∎

Remark 5.10. The analysis above corresponds to the optimistic formulation. In the pessimistic case, the leader's value informally coincides with the lower value of a zero-sum game in which the leader maximises over her controls, while the follower minimises the leader's criterion over his best responses; one would then replace the first equation in (5.17) by the corresponding lower Hamilton–Jacobi–Bellman–Isaacs equation.

References

  • Aïd et al. [2020] R. Aïd, M. Basei, and H. Pham. A McKean–Vlasov approach to distributed electricity generation development. Mathematical Methods of Operations Research, 91:269–310, 2020.
  • Bagchi [1984] A. Bagchi. Stackelberg differential games in economic models, volume 64 of Lecture notes in control and information sciences. Springer Berlin, Heidelberg, 1984.
  • Başar [1979] T. Başar. Stochastic stagewise Stackelberg strategies for linear quadratic systems. In M. Kohlmann and W. Vogel, editors, Stochastic control theory and stochastic differential systems. Proceedings of a workshop of the „Sonderforschungsbereich 72 der Deutschen Forschungsgemeinschaft an der Universität Bonn” which took place in January 1979 at Bad Honnef, volume 16 of Lecture notes in control and information sciences, pages 264–276. Springer Berlin, Heidelberg, 1979.
  • Başar [1981] T. Başar. A new method for the Stackelberg solution of differential games with sampled-data state information. IFAC Proceedings Volumes, 14(2):1365–1370, 1981.
  • Başar and Haurie [1984] T. Başar and A. Haurie. Feedback equilibria in differential games with structural and modal uncertainties. In J.B. Cruz, Jr., editor, Advances in large scale systems, volume 1, pages 163–201. 1984.
  • Başar and Olsder [1980] T. Başar and G.J. Olsder. Team-optimal closed-loop Stackelberg strategies in hierarchical control problems. Automatica, 16(4):409–414, 1980.
  • Başar and Olsder [1999] T. Başar and G.J. Olsder. Dynamic noncooperative game theory. SIAM, 2nd revised edition, 1999.
  • Başar and Selbuz [1978] T. Başar and H. Selbuz. A new approach for derivation of closed-loop Stackelberg strategies. In R.E. Larson and A.S. Willsky, editors, 1978 IEEE conference on decision and control including the 17th symposium on adaptive processes, pages 1113–1118, 1978.
  • Başar and Selbuz [1979] T. Başar and H. Selbuz. Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems. IEEE Transactions on Automatic Control, 24(2):166–179, 1979.
  • Bensoussan et al. [2014] A. Bensoussan, S. Chen, and S.P. Sethi. Feedback Stackelberg solutions of infinite-horizon stochastic differential games. In F. El Ouardighi and K. Kogan, editors, Models and methods in economics and management science: essays in honor of Charles S. Tapiero, volume 198 of International series in operations research & management science, pages 3–15. Springer Cham, 2014.
  • Bensoussan et al. [2015] A. Bensoussan, S. Chen, and S.P. Sethi. The maximum principle for global solutions of stochastic Stackelberg differential games. SIAM Journal on Control and Optimization, 53(4):1956–1981, 2015.
  • Bensoussan et al. [2019] A. Bensoussan, S. Chen, A. Chutani, S.P. Sethi, C.C. Siu, and S.C.P. Yam. Feedback Stackelberg–Nash equilibria in mixed leadership games with an application to cooperative advertising. SIAM Journal on Control and Optimization, 57(5):3413–3444, 2019.
  • Bouchard et al. [2009] B. Bouchard, R. Élie, and N. Touzi. Stochastic target problems with controlled loss. SIAM Journal on Control and Optimization, 48(5):3123–3150, 2009.
  • Bouchard et al. [2010] B. Bouchard, R. Élie, and C. Imbert. Optimal control under stochastic target constraints. SIAM Journal on Control and Optimization, 48(5):3501–3531, 2010.
  • Bressan [2011] A. Bressan. Noncooperative differential games. Milan Journal of Mathematics, 79:357–427, 2011.
  • Carmona [2016] R. Carmona. Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications, volume 1 of Financial mathematics. SIAM, 2016.
  • Castanon [1976] D.A. Castanon. Equilibria in stochastic dynamic games of Stackelberg type. PhD thesis, Massachusetts Institute of Technology, 1976.
  • Chen and Cruz Jr. [1972] C.I. Chen and J.B. Cruz Jr. Stackelberg solution for two-person games with biased information patterns. IEEE Transactions on Automatic Control, 17(6):791–798, 1972.
  • Chutani and Sethi [2014] A. Chutani and S.P. Sethi. A feedback Stackelberg game of cooperative advertising in a durable goods oligopoly. In J. Haunschmied, V.M. Veliov, and S. Wrzaczek, editors, Dynamic games in economics, volume 16 of Dynamic modeling and econometrics in economics and finance, pages 89–114. Springer, 2014.
  • Cong and Shi [2024] W. Cong and J. Shi. Direct approach of linear–quadratic Stackelberg mean field games of backward–forward stochastic systems. ArXiv preprint arXiv:2401.15835, 2024.
  • Crandall et al. [1992] M.G. Crandall, H. Ishii, and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992.
  • Cruz Jr. [1975] J.B. Cruz Jr. Survey of Nash and Stackelberg equilibrium strategies in dynamic games. In Annals of economic and social measurement, volume 4, pages 339–344. National Bureau of Economic Research, 1975.
  • Cruz Jr. [1976] J.B. Cruz Jr. Stackelberg strategies for multilevel systems. In Y.C. Ho and S.K. Mitter, editors, Directions in large-scale systems, pages 139–147. Springer New York, NY, 1976.
  • Cvitanić and Zhang [2012] J. Cvitanić and J. Zhang. Contract theory in continuous-time models. Springer, 2012.
  • Cvitanić et al. [2018] J. Cvitanić, D. Possamaï, and N. Touzi. Dynamic programming approach to principal–agent problems. Finance and Stochastics, 22(1):1–37, 2018.
  • Dayanıklı and Laurière [2023] G. Dayanıklı and M. Laurière. A machine learning method for Stackelberg mean field games. ArXiv preprint arXiv:2302.10440, 2023.
  • Dockner et al. [2000] E.J. Dockner, S. Jorgensen, N. Van Long, and G. Sorger. Differential games in economics and management science. Cambridge University Press, 2000.
  • El Karoui and Tan [2013] N. El Karoui and X. Tan. Capacities, measurable selection and dynamic programming part II: application in stochastic control problems. Technical report, École Polytechnique and université Paris-Dauphine, 2013.
  • Feng et al. [2022] X. Feng, Y. Hu, and J. Huang. Backward Stackelberg differential game with constraints: a mixed terminal-perturbation and linear–quadratic approach. SIAM Journal on Control and Optimization, 60(3):1488–1518, 2022.
  • Fu and Horst [2020] G. Fu and U. Horst. Mean-field leader–follower games with terminal state constraint. SIAM Journal on Control and Optimization, 58(4):2078–2113, 2020.
  • Gardner and Cruz Jr. [1977] B. Gardner and J.B. Cruz Jr. Feedback Stackelberg strategy for a two player game. IEEE Transactions on Automatic Control, 22(2):270–271, 1977.
  • Gou et al. [2023] Z. Gou, N.-J. Huang, and M.-H. Wang. A linear–quadratic mean-field stochastic Stackelberg differential game with random exit time. International Journal of Control, 96(3):731–745, 2023.
  • Guan et al. [2023] G. Guan, Z. Liang, and Y. Song. A Stackelberg reinsurance–investment game under α\alpha-maxmin mean–variance criterion and stochastic volatility. Scandinavian Actuarial Journal, to appear, 2023.
  • Han et al. [2023] X. Han, D. Landriault, and D. Li. Optimal reinsurance contract in a Stackelberg game framework: a view of social planner. Scandinavian Actuarial Journal, to appear, 2023.
  • Havrylenko et al. [2022] Y. Havrylenko, M. Hinken, and R. Zagst. Risk sharing in equity-linked insurance products: Stackelberg equilibrium between an insurer and a reinsurer. ArXiv preprint arXiv:2203.04053, 2022.
  • He et al. [2007] X. He, A. Prasad, S.P. Sethi, and G.J. Gutierrez. A survey of Stackelberg differential game models in supply and marketing channels. Journal of Systems Science and Systems Engineering, 16:385–413, 2007.
  • He et al. [2008] X. He, A. Prasad, and S.P. Sethi. Cooperative advertising and pricing in a dynamic stochastic supply chain: feedback Stackelberg strategies. In D.F. Kocaoglu, T.R. Anderson, and T.U. Daim, editors, PICMET ’08, Portland international conference on management of engineering & technology, pages 1634–1649, 2008.
  • Huang and Shi [2021] Q. Huang and J. Shi. A verification theorem for Stackelberg stochastic differential games in feedback information pattern. ArXiv preprint arXiv:2108.06498, 2021.
  • Kang and Shi [2022] K. Kang and J. Shi. A three-level stochastic linear–quadratic Stackelberg differential game with asymmetric information. ArXiv preprint arXiv:2210.11808, 2022.
  • Karandikar [1995] R.L. Karandikar. On pathwise stochastic integration. Stochastic Processes and their Applications, 57(1):11–18, 1995.
  • Leitmann [1978] G. Leitmann. On generalized Stackelberg strategies. Journal of Optimization Theory and Applications, 26(4):637–643, 1978.
  • Li et al. [2022] H. Li, J. Xu, and H. Zhang. Closed–loop Stackelberg strategy for linear–quadratic leader–follower game. ArXiv preprint arXiv:2212.08977, 2022.
  • Li and Yu [2018] N. Li and Z. Yu. Forward–backward stochastic differential equations and linear–quadratic generalized Stackelberg games. SIAM Journal on Control and Optimization, 56(6):4148–4180, 2018.
  • Li and Sethi [2017] T. Li and S.P. Sethi. A review of dynamic Stackelberg game models. Discrete & Continuous Dynamical Systems–B, 22(1):125–129, 2017.
  • Li and Han [2023] Y. Li and S. Han. Solving strongly convex and smooth Stackelberg games without modeling the follower. ArXiv preprint arXiv:2303.06192, 2023.
  • Li and Shi [2023a] Z. Li and J. Shi. Closed-loop solvability of linear quadratic mean-field type Stackelberg stochastic differential games. ArXiv preprint arXiv:2303.07544, 2023a.
  • Li and Shi [2023b] Z. Li and J. Shi. Linear quadratic leader–follower stochastic differential games: closed-loop solvability. Journal of Systems Science and Complexity, 36(4):1373–1406, 2023b.
  • Liu et al. [2018] J. Liu, Y. Fan, Z. Chen, and Y. Zheng. Pessimistic bilevel optimization: a survey. International Journal of Computational Intelligence Systems, 11(1):725–736, 2018.
  • Lv et al. [2023] S. Lv, J. Xiong, and X. Zhang. Linear quadratic leader–follower stochastic differential games for mean-field switching diffusions. Automatica, 154(111072):1–9, 2023.
  • Mallozzi and Morgan [1995] L. Mallozzi and J. Morgan. Weak Stackelberg problem and mixed solutions under data perturbations. Optimization, 32(3):269–290, 1995.
  • Moon [2021] J. Moon. Linear–quadratic stochastic Stackelberg differential games for jump–diffusion systems. SIAM Journal on Control and Optimization, 59(2):954–976, 2021.
  • Ni et al. [2023] Y.-H. Ni, L. Liu, and X. Zhang. Deterministic dynamic Stackelberg games: time-consistent open-loop solution. Automatica, 148(110757):1–9, 2023.
  • Nutz [2012] M. Nutz. Pathwise construction of stochastic integrals. Electronic Communications in Probability, 17(24):1–7, 2012.
  • Øksendal et al. [2013] B. Øksendal, L. Sandal, and J. Ubøe. Stochastic Stackelberg equilibria with applications to time-dependent newsvendor models. Journal of Economic Dynamics and Control, 37(7):1284–1299, 2013.
  • Papavassilopoulos [1979] G.P. Papavassilopoulos. Leader–follower and Nash strategies with state information. PhD thesis, University of Illinois at Urbana-Champaign, 1979.
  • Papavassilopoulos and Cruz Jr. [1979] G.P. Papavassilopoulos and J.B. Cruz Jr. Nonclassical control problems and Stackelberg games. IEEE Transactions on Automatic Control, 24(2):155–166, 1979.
  • Papavassilopoulos and Cruz Jr. [1980] G.P. Papavassilopoulos and J.B. Cruz Jr. Sufficient conditions for Stackelberg and Nash strategies with memory. Journal of Optimization Theory and Applications, 31(2):233–260, 1980.
  • Possamaï et al. [2018] D. Possamaï, X. Tan, and C. Zhou. Stochastic control for a class of nonlinear kernels and applications. The Annals of Probability, 46(1):551–603, 2018.
  • Possamaï et al. [2020] D. Possamaï, N. Touzi, and J. Zhang. Zero-sum path-dependent stochastic differential games in weak formulation. The Annals of Applied Probability, 30(3):1415–1457, 2020.
  • Ren et al. [2023] Z. Ren, X. Tan, N. Touzi, and J. Yang. Entropic optimal planning for path-dependent mean field games. SIAM Journal on Control and Optimization, 61(3):1415–1437, 2023.
  • Rockafellar [1970] R.T. Rockafellar. Convex analysis. Princeton University Press, 1970.
  • Shi et al. [2016] J. Shi, G. Wang, and J. Xiong. Leader–follower stochastic differential game with asymmetric information and applications. Automatica, 63:60–73, 2016.
  • Si and Wu [2021] K. Si and Z. Wu. Backward–forward linear–quadratic mean-field Stackelberg games. Advances in Difference Equations, 2021(73):1–23, 2021.
  • Simaan and Cruz Jr. [1973a] M. Simaan and J.B. Cruz Jr. Additional aspects of the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11(6):613–626, 1973a.
  • Simaan and Cruz Jr. [1973b] M. Simaan and J.B. Cruz Jr. On the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11(5):533–555, 1973b.
  • Simaan and Cruz Jr. [1976] M. Simaan and J.B. Cruz Jr. On the Stackelberg strategy in nonzero-sum games. In G. Leitmann, editor, Multicriteria decision making and differential games, Mathematical concepts and methods in science and engineering, pages 173–195. Springer New York, NY, 1976.
  • Soner and Touzi [2002a] H.M. Soner and N. Touzi. Dynamic programming for stochastic target problems and geometric flows. Journal of the European Mathematical Society, 4(3):201–236, 2002a.
  • Soner and Touzi [2002b] H.M. Soner and N. Touzi. Stochastic target problems, dynamic programming, and viscosity solutions. SIAM Journal on Control and Optimization, 41(2):404–424, 2002b.
  • Soner and Touzi [2003] H.M. Soner and N. Touzi. A stochastic representation for mean curvature type geometric flows. The Annals of Probability, 31(3):1145–1165, 2003.
  • Soner et al. [2012] H.M. Soner, N. Touzi, and J. Zhang. Wellposedness of second order backward SDEs. Probability Theory and Related Fields, 153(1–2):149–190, 2012.
  • Stroock and Varadhan [1997] D.W. Stroock and S.R.S. Varadhan. Multidimensional diffusion processes, volume 233 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 1997.
  • Sun et al. [2023] J. Sun, H. Wang, and J. Wen. Zero-sum Stackelberg stochastic linear–quadratic differential games. SIAM Journal on Control and Optimization, 61(1):252–284, 2023.
  • Van Long [2010] N. Van Long. A survey of dynamic games in economics, volume 1 of Surveys on theories in economics and business administration. World Scientific, 2010.
  • Vasal [2022a] D. Vasal. Master equation of discrete-time Stackelberg mean field games with multiple leaders. ArXiv preprint arXiv:2209.03186, 2022a.
  • Vasal [2022b] D. Vasal. Sequential decomposition of stochastic Stackelberg games. In B. Ferri and F. Zhang, editors, 2022 American control conference, pages 1266–1271. IEEE, 2022b.
  • von Stackelberg [1934] H. von Stackelberg. Marktform und Gleichgewicht. Springer-Verlag Wien New York, 1934.
  • Wang et al. [2020] G. Wang, Y. Wang, and S. Zhang. An asymmetric information mean-field type linear–quadratic stochastic Stackelberg differential game with one leader and two followers. Optimal Control Applications and Methods, 41(4):1034–1051, 2020.
  • Wiesemann et al. [2013] W. Wiesemann, A. Tsoukalas, P.-M. Kleniati, and B. Rustem. Pessimistic bilevel optimization. SIAM Journal on Optimization, 23(1):353–380, 2013.
  • Wu [2013] Z. Wu. A general maximum principle for optimal control of forward–backward stochastic systems. Automatica, 49(5):1473–1480, 2013.
  • Yong [2002] J. Yong. A leader–follower stochastic linear quadratic differential game. SIAM Journal on Control and Optimization, 41(4):1015–1041, 2002.
  • Yong [2010] J. Yong. Optimality variational principle for controlled forward–backward stochastic differential equations with mixed initial–terminal conditions. SIAM Journal on Control and Optimization, 48(6):3675–4179, 2010.
  • Zemkoho [2016] A.B. Zemkoho. Solving ill-posed bilevel programs. Set-Valued and Variational Analysis, 24(3):423–448, 2016.
  • Zhang [2017] J. Zhang. Backward stochastic differential equations—from linear to fully nonlinear theory, volume 86 of Probability theory and stochastic modelling. Springer-Verlag New York, 2017.
  • Zheng and Shi [2020] Y. Zheng and J. Shi. A Stackelberg game of backward stochastic differential equations with applications. Dynamic Games and Applications, 10(4):968–992, 2020.
  • Zheng and Shi [2021] Y. Zheng and J. Shi. A Stackelberg game of backward stochastic differential equations with partial information. Mathematical Control and Related Fields, 11(4):797–828, 2021.
  • Zheng and Shi [2022a] Y. Zheng and J. Shi. A linear–quadratic partially observed Stackelberg stochastic differential game with application. Applied Mathematics and Computation, 420(126819):1–22, 2022a.
  • Zheng and Shi [2022b] Y. Zheng and J. Shi. Stackelberg stochastic differential game with asymmetric noisy observations. International Journal of Control, 95(9):2510–2530, 2022b.

Appendix A Illustrative example: additional proofs

A.1 The first-best case

The proof of Lemma 2.1 is straightforward using standard HJB techniques, or even by pointwise optimisation, as one can compute

\displaystyle J_{\rm L}(\alpha,\beta)\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]}=\mathbb{E}^{\mathbb{P}}\bigg{[}\int_{0}^{T}\bigg{(}\alpha_{t}+\beta_{t}-\dfrac{c_{\rm L}}{2}\alpha_{t}^{2}\bigg{)}\mathrm{d}t\bigg{]},

and directly verify that the optimal efforts are the ones defined in Lemma 2.1.

A.2 ACLM formulation

Lemma A.1.

For k>0, consider the closed-loop memoryless strategy a_{k}\in{\cal A} defined for all t\in[0,T] by

\displaystyle a_{k}(t,X_{t})\coloneqq\Pi_{[0,a_{\circ}]}\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})\bigg{)},\;\text{where}\;X^{\star}_{t}=x_{0}+\dfrac{t}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}}{kc_{\rm F}}\big{(}1-\mathrm{e}^{-kt}\big{)}+\sigma W_{t},\;t\in[0,T].

Define \bar{K}\coloneqq\frac{1}{T}\log(b_{\circ}c_{\rm F}). Then, for a fixed k\in(0,\bar{K}] and the associated strategy a_{k}, the leader obtains the following reward, which is higher than her value in the AOL information case

f(k)=x_{0}+\dfrac{T}{2c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}.

Moreover, if a_{\circ}>\frac{1}{c_{\rm L}}+b_{\circ}(b_{\circ}c_{\rm F}-1) and a_{\circ}>\frac{1}{2c_{\rm F}}(b_{\circ}^{2}c_{\rm F}^{2}-1)-\frac{1}{c_{\rm L}}, then the solution to the ACLM problem is equal to the solution to the ACLM-\bar{K} problem, with value f(\bar{K}).

Proof of Lemma A.1.

(i) To provide the main intuition, suppose first that the leader's actions are unrestricted, that is, A=\mathbb{R}. This is the usual setting for the ACLM problems that are solved explicitly in the literature. Then, the leader announces her strategy \alpha_{k}\in{\cal A} defined by

a_{k}(t,X_{t})=\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star}),\;t\in[0,T].

Then, the follower’s optimisation problem originally defined in (2.2) is the following

\displaystyle V_{\rm F}(\alpha_{k})\coloneqq\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]},\;\text{subject to}\;\mathrm{d}X_{t}=\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})+\beta_{t}\bigg{)}\mathrm{d}t+\sigma\mathrm{d}W_{t},\;t\in[0,T]. (A.1)

As described in Section 2.1.3, one can use the stochastic maximum principle to obtain, after solving the appropriate FBSDE system, that the optimal response of the follower is given by

\beta^{\star}_{t}=\Pi_{[0,b_{\circ}]}\bigg{(}\frac{\mathrm{e}^{k(T-t)}}{c_{\rm F}}\bigg{)},\;t\in[0,T]. (A.2)

Alternatively, one can solve this stochastic control problem in a more straightforward way, by noticing that the follower’s problem defined above by (A.1) can be rewritten as

\displaystyle V_{\rm F}(\alpha_{k})=\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}X^{\star}_{T}+\widetilde{X}_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]}=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}+\sup_{\beta\in{\cal B}}\mathbb{E}^{\mathbb{P}}\bigg{[}\sigma W_{T}+\widetilde{X}_{T}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{]},

where the process \widetilde{X}\coloneqq X-X^{\star}, corresponding to the only state variable of the previous control problem, satisfies the following controlled ODE

\mathrm{d}\widetilde{X}_{t}=\bigg{(}k\widetilde{X}_{t}+\beta_{t}-\dfrac{1}{c_{\rm F}}\mathrm{e}^{k(T-t)}\bigg{)}\mathrm{d}t,\;t\in[0,T],\;\widetilde{X}_{0}=0, (A.3)

whose solution is given by

\widetilde{X}_{t}\coloneqq\mathrm{e}^{kt}\int_{0}^{t}\mathrm{e}^{-ks}\bigg{(}\beta_{s}-\frac{\mathrm{e}^{k(T-s)}}{c_{\rm F}}\bigg{)}\mathrm{d}s=\mathrm{e}^{kt}\int_{0}^{t}\mathrm{e}^{-ks}\beta_{s}\mathrm{d}s-\frac{1}{2kc_{\rm F}}\mathrm{e}^{kT}\big{(}\mathrm{e}^{kt}-\mathrm{e}^{-kt}\big{)},\;\forall t\in[0,T].
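One can verify this directly: differentiating the candidate solution and using the Leibniz rule gives

\frac{\mathrm{d}}{\mathrm{d}t}\bigg{(}\mathrm{e}^{kt}\int_{0}^{t}\mathrm{e}^{-ks}\bigg{(}\beta_{s}-\frac{\mathrm{e}^{k(T-s)}}{c_{\rm F}}\bigg{)}\mathrm{d}s\bigg{)}=k\widetilde{X}_{t}+\beta_{t}-\frac{1}{c_{\rm F}}\mathrm{e}^{k(T-t)},

which is precisely the dynamics (A.3), together with the initial condition \widetilde{X}_{0}=0.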

The follower’s optimisation problem thus becomes

\displaystyle V_{\rm F}(\alpha_{k})=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}+\sup_{\beta\in{\cal B}}\bigg{\{}\mathrm{e}^{kT}\int_{0}^{T}\mathrm{e}^{-kt}\beta_{t}\mathrm{d}t-\frac{1}{2kc_{\rm F}}\mathrm{e}^{kT}\big{(}\mathrm{e}^{kT}-\mathrm{e}^{-kT}\big{)}-\dfrac{c_{\rm F}}{2}\int_{0}^{T}\beta_{t}^{2}\mathrm{d}t\bigg{\}}
=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}-\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{2kT}-1\big{)}+\sup_{\beta\in{\cal B}}\int_{0}^{T}\bigg{(}\mathrm{e}^{k(T-t)}\beta_{t}-\dfrac{c_{\rm F}}{2}\beta_{t}^{2}\bigg{)}\mathrm{d}t.

The optimal effort \beta^{\star} introduced above in (A.2) is deduced by pointwise optimisation: for each fixed t\in[0,T], the map \beta\longmapsto\mathrm{e}^{k(T-t)}\beta-\frac{c_{\rm F}}{2}\beta^{2} is strictly concave with unconstrained maximiser \mathrm{e}^{k(T-t)}/c_{\rm F}, so that its maximum over [0,b_{\circ}] is attained at the projection appearing in (A.2).

(i.1) Suppose first that k<\bar{K}, so that \beta^{\star}_{t}=\frac{1}{c_{\rm F}}\mathrm{e}^{k(T-t)}. The associated value for the follower is given by

\displaystyle V_{\rm F}(\alpha_{k})=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}-\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{2kT}-1\big{)}+\frac{1}{2c_{\rm F}}\int_{0}^{T}\mathrm{e}^{2k(T-t)}\mathrm{d}t=x_{0}+\dfrac{T}{c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}-\frac{1}{4kc_{\rm F}}\big{(}\mathrm{e}^{2kT}-1\big{)}.

Remark that, for the optimal control \beta^{\star}, the controlled ODE (A.3) simplifies and gives the trivial solution \widetilde{X}_{t}=0, i.e. X_{t}=X_{t}^{\star}, for all t\in[0,T]. In other words, the best choice for the follower is to choose \beta so that the process X coincides with the process X^{\star}. Given the follower's optimal response, the objective value of the leader for the strategy \alpha_{k} simplifies to

\displaystyle\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\big{(}a_{k}(t,X_{t})\big{)}^{2}\mathrm{d}t\bigg{]}=\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}^{\star}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\bigg{(}\frac{1}{c_{\rm L}}+k(X_{t}-X_{t}^{\star})\bigg{)}^{2}\mathrm{d}t\bigg{]}=x_{0}+\dfrac{T}{2c_{\rm L}}+\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}\eqqcolon f(k).

Notice that f is an increasing function of k, and that its limit as k goes to 0 is given by

\displaystyle\lim_{k\rightarrow 0}f(k)=x_{0}+\dfrac{T}{2c_{\rm L}}+\lim_{k\rightarrow 0}\dfrac{\mathrm{e}^{kT}-1}{kc_{\rm F}}=x_{0}+\bigg{(}\dfrac{1}{2c_{\rm L}}+\dfrac{1}{c_{\rm F}}\bigg{)}T.

As this value corresponds to the leader's value function in the AOL case, we immediately conclude that the closed-loop memoryless strategy \alpha_{k} provides her with a strictly better value as soon as k\in(0,\frac{1}{T}\log(b_{\circ}c_{\rm F})). Similarly, we have

\displaystyle\lim_{k\rightarrow 0}V_{\rm F}(\alpha_{k})=x_{0}+\bigg{(}\dfrac{1}{c_{\rm L}}+\dfrac{1}{2c_{\rm F}}\bigg{)}T.
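These comparisons are easy to reproduce numerically. The following Python sketch, with purely illustrative parameter values not taken from the paper, checks that f is increasing on (0,\bar{K}] and exceeds the AOL benchmark computed above.

import numpy as np

# Illustrative parameters
x0, T, cL, cF, b_circ = 0.0, 1.0, 2.0, 2.0, 3.0
K_bar = np.log(b_circ * cF) / T                 # K_bar = log(b cF) / T

def f(k):
    # Leader's reward f(k) = x0 + T/(2 cL) + (e^{kT} - 1)/(k cF)
    return x0 + T / (2 * cL) + np.expm1(k * T) / (k * cF)

aol_value = x0 + (1 / (2 * cL) + 1 / cF) * T    # limit of f(k) as k -> 0

ks = np.linspace(1e-6, K_bar, 1000)
vals = f(ks)
assert np.all(np.diff(vals) > 0)                # f is increasing on (0, K_bar]
assert vals[-1] > aol_value                     # strict improvement over AOL
print(f"f(K_bar) = {vals[-1]:.4f} > AOL value = {aol_value:.4f}")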

(i.2) Take now k>\bar{K}, so that

\beta^{\star}_{t}=b_{\circ}{\bf 1}_{\{t<t_{0}\}}+\frac{\mathrm{e}^{k(T-t)}}{c_{\rm F}}{\bf 1}_{\{t\in[t_{0},T]\}},\;\text{where}\;t_{0}\coloneqq T-\frac{\log(b_{\circ}c_{\rm F})}{k}.

In this case, the follower is not able to keep the process X equal to X^{\star}. Notice that \beta^{\star}_{t}c_{\rm F}<\mathrm{e}^{k(T-t)} for every t\in[0,t_{0}), so that \widetilde{X}_{t} is negative over (0,t_{0}] and remains so over [t_{0},T]. Then, we have that a_{k}(t,X_{t})<\frac{1}{c_{\rm L}} for every t>0. In fact, we can compute explicitly

\widetilde{X}_{t}=\begin{cases}\displaystyle\frac{b_{\circ}}{k}(\mathrm{e}^{kt}-1)-\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{k(t+t_{0})}-\mathrm{e}^{k(t-t_{0})}\big{)},\;t\in[0,t_{0}],\\[5.0pt] \displaystyle\mathrm{e}^{k(t-t_{0})}\widetilde{X}_{t_{0}},\;t\in[t_{0},T].\end{cases}

The utility for the leader from this strategy is given by

\displaystyle J_{\rm L}(a_{k},\beta^{\star}(a_{k}))=\mathbb{E}^{\mathbb{P}}\bigg{[}X_{T}-\dfrac{c_{\rm L}}{2}\int_{0}^{T}\alpha_{t}^{2}\mathrm{d}t\bigg{]}
=x_{0}+\frac{T}{c_{\rm L}}+\frac{\mathrm{e}^{kT}-1}{kc_{\rm F}}+\frac{b_{\circ}\mathrm{e}^{kT}}{k}-\frac{b_{\circ}^{2}c_{\rm F}}{2k}-\frac{\mathrm{e}^{2kT}}{2kc_{\rm F}}-\dfrac{c_{\rm L}}{2}\mathbb{E}^{\mathbb{P}}\bigg{[}\int_{0}^{T}\bigg{(}\frac{1}{c_{\rm L}}+k\widetilde{X}_{t}\bigg{)}^{2}\mathrm{d}t\bigg{]}.

Note that both \widetilde{X} and t_{0} above depend on k, and that the expression is purely deterministic. Numerical evaluation indicates that it is decreasing in k (see the sketch below). Combining this with the deductions from part (i.1), the (unrestricted) leader should choose the value k=\bar{K}, which provides her with the highest utility.
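The following Python sketch makes this numerical observation reproducible; parameter values are purely illustrative, the deterministic time integral is evaluated with a trapezoidal rule, and the explicit piecewise expression for \widetilde{X} above is used.

import numpy as np

x0, T, cL, cF, b_circ = 0.0, 1.0, 2.0, 2.0, 3.0
K_bar = np.log(b_circ * cF) / T

def J_leader(k, n=20000):
    # Leader's utility J_L(a_k, beta*(a_k)) for k > K_bar (deterministic)
    t0 = T - np.log(b_circ * cF) / k
    t = np.linspace(0.0, T, n)
    X1 = (b_circ / k) * np.expm1(k * t) \
         - (np.exp(k * (t + t0)) - np.exp(k * (t - t0))) / (2 * k * cF)
    Xt0 = (b_circ / k) * np.expm1(k * t0) \
          - (np.exp(2 * k * t0) - 1.0) / (2 * k * cF)
    Xtil = np.where(t <= t0, X1, np.exp(k * (t - t0)) * Xt0)
    integrand = (1.0 / cL + k * Xtil) ** 2
    cost = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return (x0 + T / cL + np.expm1(k * T) / (k * cF)
            + b_circ * np.exp(k * T) / k - b_circ ** 2 * cF / (2 * k)
            - np.exp(2 * k * T) / (2 * k * cF) - cL * cost / 2)

ks = np.linspace(1.01 * K_bar, 3 * K_bar, 50)
vals = np.array([J_leader(k) for k in ks])
print("decreasing on the grid:", bool(np.all(np.diff(vals) < 0)))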


(ii) We now discuss the problem in our setting, when A=[0,a_{\circ}]. To simplify the solution of this example, we can choose the parameters a_{\circ} and b_{\circ} appropriately, so that the solution of case (i) is admissible in the restricted problem. This implies that the solution found in case (i) is also the solution to the ACLM problem with bounded effort for the leader. Indeed, note that the process \widetilde{X}_{t} defined in (A.3) is bounded since B=[0,b_{\circ}]. Namely, we have

\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{k(T-t)}-\mathrm{e}^{k(T+t)}\big{)}\leq\widetilde{X}_{t}\leq\frac{1}{2kc_{\rm F}}\big{(}\mathrm{e}^{k(T-t)}-\mathrm{e}^{k(T+t)}\big{)}+\frac{b_{\circ}}{k}\big{(}\mathrm{e}^{kt}-1\big{)},\;t\in[0,T].

Therefore, the strategy a_{k} is guaranteed to take values in [-a_{\circ},a_{\circ}] for every response \beta\in{\cal B} as soon as, for instance, a_{\circ}>\frac{1}{c_{\rm L}}+b_{\circ}(b_{\circ}c_{\rm F}-1) and a_{\circ}>\frac{1}{2c_{\rm F}}(b_{\circ}^{2}c_{\rm F}^{2}-1)-\frac{1}{c_{\rm L}}.

Appendix B Functional spaces

We introduce the spaces used in this paper, following [58]. Let (t,x)\in[0,T]\times\Omega, and let (\mathcal{P}(t,x))_{(t,x)\in[0,T]\times\Omega} be a family of sets of probability measures on (\Omega,\mathcal{F}_{T}). In this section, we denote by \mathbb{X}\coloneqq(\mathcal{X}_{s})_{s\in[0,T]} a general filtration on (\Omega,\mathcal{F}_{T}). Let p\geq 1, \mathbb{P}\in\mathcal{P}(t,x), and let \mathbb{X}_{\mathbb{P}} be the usual \mathbb{P}-augmented filtration associated with \mathbb{X}.

  • \mathbb{H}^{p}_{t,x}(\mathbb{X},\mathbb{P}) (resp. \mathbb{H}^{p}_{t,x}(\mathbb{X},\mathcal{P})) denotes the space of \mathbb{X}-predictable \mathbb{R}^{d}-valued processes Z such that

    \|Z\|_{\mathbb{H}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}\bigg{(}\int_{t}^{T}\|\widehat{\sigma}_{s}^{\top}\!Z_{s}\|^{2}\mathrm{d}s\bigg{)}^{\frac{p}{2}}\bigg{]}<+\infty,\;\bigg{(}\text{resp. }\|Z\|_{\mathbb{H}^{p}_{t,x}(\mathbb{X},\mathcal{P})}^{p}\coloneqq\sup_{\mathbb{P}\in\mathcal{P}(t,x)}\|Z\|_{\mathbb{H}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}<+\infty\bigg{)}.
  • \mathbb{S}^{p}_{t,x}(\mathbb{X},\mathbb{P}) (resp. \mathbb{S}^{p}_{t,x}(\mathbb{X},\mathcal{P})) denotes the space of \mathbb{X}-progressively measurable \mathbb{R}-valued processes Y such that

    \|Y\|_{\mathbb{S}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}\sup_{s\in[t,T]}|Y_{s}|^{p}\bigg{]}<+\infty,\;\bigg{(}\text{resp. }\|Y\|_{\mathbb{S}^{p}_{t,x}(\mathbb{X},\mathcal{P})}^{p}\coloneqq\sup_{\mathbb{P}\in\mathcal{P}(t,x)}\|Y\|_{\mathbb{S}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}<+\infty\bigg{)}.
  • \mathbb{I}^{p}_{t,x}(\mathbb{X},\mathbb{P}) (resp. \mathbb{I}^{p}_{t,x}(\mathbb{X},\mathcal{P})) denotes the space of \mathbb{X}-optional \mathbb{R}-valued processes K with \mathbb{P}\text{--a.s.} càdlàg and non-decreasing paths on [t,T], with K_{t}=0, \mathbb{P}\text{--a.s.}, and

    \|K\|_{\mathbb{I}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}[K_{T}^{p}]<+\infty,\;\bigg{(}\text{resp. }\|K\|_{\mathbb{I}^{p}_{t,x}(\mathbb{X},\mathcal{P})}^{p}\coloneqq\sup_{\mathbb{P}\in\mathcal{P}(t,x)}\|K\|_{\mathbb{I}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}<+\infty\bigg{)}.
  • \mathbb{G}^{p}_{t,x}(\mathbb{X},\mathbb{P}) denotes the space of \mathbb{X}-predictable \mathbb{S}^{d}-valued processes \Gamma such that

    \|\Gamma\|_{\mathbb{G}^{p}_{t,x}(\mathbb{X},\mathbb{P})}^{p}\coloneqq\mathbb{E}^{\mathbb{P}}\bigg{[}\bigg{(}\int_{t}^{T}\big{\|}\widehat{\sigma}^{2}_{s}\Gamma_{s}\big{\|}^{2}\mathrm{d}s\bigg{)}^{\frac{p}{2}}\bigg{]}<+\infty.

When t=0, we simplify the previous notations by omitting the dependence on both t and x.
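For numerical experiments, such norms can be estimated by Monte Carlo simulation. A minimal sketch under hypothetical inputs: sigZ is an array of simulated values of \widehat{\sigma}^{\top}\!Z on a uniform time grid, and the time integral is discretised with left endpoints.

import numpy as np

def estimate_H_p_norm(sigZ, dt, p=2):
    # Monte Carlo estimate of ||Z||^p = E[ ( int_t^T ||sighat^T Z_s||^2 ds )^{p/2} ]
    # sigZ has shape (n_paths, n_steps, d); dt is the grid step.
    sq = np.sum(sigZ ** 2, axis=2)        # ||sighat^T Z_s||^2, path by path
    integral = np.sum(sq, axis=1) * dt    # discretised time integral
    return np.mean(integral ** (p / 2))   # empirical expectation

# Toy usage with Brownian placeholders for sighat^T Z
rng = np.random.default_rng(0)
n_paths, n_steps, d = 1000, 200, 2
dt = 1.0 / n_steps
sigZ = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps, d)), axis=1)
print(estimate_H_p_norm(sigZ, dt))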