Delegated portfolio management with random default
Abstract
We consider the problem of optimal portfolio delegation between an investor and a portfolio manager under a random default time, focusing on a novel variation of the Principal-Agent problem adapted to this framework. We address the challenge of an uncertain investment horizon caused by an exogenous random default time, after which neither the agent nor the principal can access the market. This uncertainty introduces significant complexity and requires distinct mathematical approaches for two cases: when the random default time may extend beyond the initial time frame, and when it falls within it. We develop a theoretical framework to model the stochastic dynamics of the investment process, incorporating the random default time, and then analyze the portfolio manager's investment decisions and compensation mechanisms in both scenarios. In the first case, where the default time can be unbounded, we apply traditional results from Backward Stochastic Differential Equations (BSDEs) and control theory to address the agent's problem. In the second case, where the default time is bounded by the horizon, the problem becomes more intricate due to the degeneracy of the BSDE's driver. In both scenarios, we show that the contracting problem can be resolved by examining the existence of solutions to integro-partial Hamilton-Jacobi-Bellman (HJB) equations. Finally, we develop a deep-learning algorithm to solve the problem in high dimension without access to the optimizer of the Hamiltonian function.
Keywords: Stochastic control with random horizon, Principal-Agent problem, enlargement of filtration, BSDE, HJB equation and deep learning.
1 Introduction
Delegating portfolio management from an investor to a professional fund manager is increasingly seen as a strategic move due to the growing complexity of financial markets and their fragmentation [28, 7, 45]. The financial landscape today is marked by rapid market fluctuations, changing regulations, and an extensive range of investment opportunities, all of which necessitate not only significant time and effort but also in-depth knowledge and experience to navigate effectively. For many investors, handing over portfolio management to a professional allows them to leverage the fund manager’s expertise in areas such as professional oversight, diversification strategies, and sophisticated risk management techniques—capabilities that are often difficult to achieve independently.
In this context, fund managers are expected to deliver superior performance by utilizing their specialized skills and tools. Traditionally, the compensation structure for fund managers has been a blend of a fixed fee and a performance-based component. The fixed fee provides a stable income for the manager, while the performance-based component is designed to incentivize the manager to achieve better returns by aligning their interests with those of the investor. However, this conventional compensation structure raises an important question: Is it optimally designed to align the incentives of both the investor and the fund manager? The challenge lies in ensuring that the performance-based component effectively motivates the fund manager to act in the best interests of the investor, while also accounting for the inherent uncertainties and risks associated with market fluctuations.
To address this, it is crucial to evaluate whether the existing compensation structures are adequately aligned with the investor’s objectives and whether alternative models could offer better alignment. This involves exploring various compensation schemes and their impact on fund performance, risk management, and overall investor satisfaction. Ultimately, a well-designed compensation structure should not only incentivize fund managers to maximize returns but also to manage risks prudently, ensuring that both the investor’s and the manager’s goals are harmoniously aligned in the pursuit of financial success.
From a mathematical perspective, this issue can be framed as a variation of the principal-agent (PA) problem in wealth management; see for example [16, 36, 45, 17, 33, 32, 14]. The PA framework in continuous time is a game-theoretical model designed to address problems of stochastic control in which one party (the principal) delegates decision-making authority to another party (the agent), whose actions are not directly observable and evolve in a stochastic environment. The agent controls the system by choosing a strategy that influences outcomes, but because of the information asymmetry, the principal cannot directly observe the agent's actions. Instead, the principal must design a contract based on the observed outcomes, which depend only indirectly on the agent's actions. The goal is to structure this contract in a way that motivates the agent to exert optimal effort, while managing the inherent trade-offs between risk-sharing and incentives. Here, the investor (the principal) has an initial capital and seeks an agent to invest on their behalf. The principal is willing to negotiate a compensation scheme that incentivizes the agent based on portfolio performance and risks. The PA problem has been extensively studied in simpler settings: in the seminal work of [22], the agents control the drift of the process, but the utility is drawn only from the terminal value of the controlled process. In [43] the control still acts only on the drift, but the utility now depends directly on inter-temporal payments. In both cases, because of the modeling choice of a single Brownian motion, there is no moral hazard with respect to the control of the volatility. Moral hazard in volatility control, a concept introduced by [14], becomes important when multiple sources of risk arise in the control problem.
In that work, the authors proved, thanks to mathematical advances on singular changes of measure, that optimal contracts must include incentives with respect to the quadratic variation of the controlled process and its covariation with the risk factors. Furthermore, instead of working from a purely probabilistic perspective, they cast the problem as a stochastic control problem and restrict the analysis to an admissible family of contracts, proving no loss of generality in doing so, which paved a simpler way for the recent literature on the subject. In a follow-up paper [15], the same result was proven using second-order BSDEs, and a recent work [11] simplified the theoretical guarantees for volatility-controlled problems even further, obtaining the same results with standard BSDE theory. To summarise, these works define a class of admissible contracts, prove that there is no loss of generality in considering such a form, and find the optimal contract within this set. In practice, they rely on the work of [38, 5, 19, 39, 31, 37] on the existence of solutions to BSDEs in the Lipschitz and quadratic cases, with or without jump terms and random horizon. This will also be the building block of the present work.
Portfolio optimization has been extensively studied in the literature, beginning with the pioneering work of [34]. More recent studies have explored continuous-time versions of the framework, as well as settings that incorporate jumps, as in [35]. The PA problem captures the scenario of a fund manager investing on behalf of a client. Despite its practical relevance, the existing literature has not fully addressed a crucial aspect of this problem: the randomness of the investment horizon. More often than not, an investment in financial markets does not have a precise duration, even though the horizon is typically used as a reference for setting the investor's risk aversion. Yet time is crucial in control problems. Our contribution aims to fill this gap by examining how decision-making strategies—both the agent's investment strategy and the principal's compensation scheme—are affected by the introduction of default times, which add uncertainty to the investment horizon. Adding a default time makes the problem mathematically more challenging. First of all, using a general default time forces us to delve into the theory of information flows. To treat the problem, we will need to enlarge the filtration, adding to the one generated by the financial market the filtration resulting from the random default (see [27, 8, 1, 21] and the references therein). Furthermore, as highlighted by [25], there are two distinct cases to consider for default times, each with different implications:
• Unbounded Case: If the maximum possible default time exceeds the investment horizon (or is infinite), it is uncertain whether a default will occur within the investment period, that is . In this case, the investing problem reduces to a utility maximization under random horizon. It has been solved, for example, in [29], where the solution is shown to be related to a system of BSDEs with a jump admitting a solution via a decomposition approach coming from filtration enlargement theory.
• Bounded Case: If the support of the default time is included in the investment horizon, a default occurs before the terminal date with probability one, so both parties know the contract will be interrupted.
These two cases not only have different interpretations but also require distinct mathematical tools. The unbounded case can be seen as a default caused by a black swan event [40], a crash that forces authorities to close the market (the Flash Crash of May 2010, see [30]), a hacker attack in the blockchain setting, or, in a more structured deal with investors, a fund seeing its money withdrawn with little or no notice. This often complicates the investment strategies of the funds, and for this reason certain funds (e.g., hedge funds) have very strict policies on funding withdrawal. Mathematically, the BSDE related to this case is better behaved than in the other case. The bounded case can instead be representative of the well-known and well-studied life insurance market: here, the insurance policy can have a very long time horizon, so that we can claim that, with probability one, the investor will pass away before the natural termination of the contract. This means that, when calibrating the contract, both the agent and the investor are aware that the horizon will not be respected, and this is taken into account in both the trading strategies and the insurance payments. Mathematically, this formulation introduces a difficulty in the family of proposed contracts, as it generates a BSDE with a singular driver (see [25] and references therein), and it also poses some extra difficulties for the convergence of numerical methods.
The problem's structure also presents challenges for numerical solutions. The partial differential equation (PDE) resulting from the Hamilton-Jacobi-Bellman (HJB) control problem has a varying coefficient that depends on the solution of a maximization problem involving the solution itself. Addressing this requires a specialized approach using an "actor-critic" iterative algorithm, where the actor solves the PDE for a fixed coefficient and the critic updates the maximization problem based on the actor's latest guess. While several schemes could tackle this iterative process, the most effective have proven to be in the domain of Physics-Informed Neural Networks (PINNs). PINNs are a powerful machine learning framework that blends neural networks with principles from physics to solve complex differential equations, especially in situations where traditional methods may struggle. From a technical perspective, the surge of this methodology was made possible by one of the most useful but perhaps underused techniques in scientific computing, automatic differentiation. We refer to the survey [6] for a comprehensive study of this topic. The simple idea behind it is to differentiate neural networks with respect to both their input coordinates and their model parameters: the former allows us to include derivatives in the loss function, the latter is the standard way to train the network. Introduced and expanded by works such as [42] and [44], PINNs leverage the underlying physical laws, typically encoded as partial differential equations (PDEs), to guide the learning process. Rather than relying purely on data, PINNs incorporate these governing equations into the loss function, ensuring that the neural network solutions respect known physical constraints. This approach is particularly effective for solving high-dimensional partial (integro-)differential equations arising from (stochastic and continuous-time) problems in virtually every field, such as fluid dynamics, electromagnetism, biology, or finance (see the work of [3], [4]). By incorporating physics directly into the architecture, PINNs enable the modeling of complex systems while reducing reliance on large datasets, bridging the gap between traditional numerical solvers and modern machine learning techniques. Despite all these difficulties, the default-time formulation is crucial for practical applications, as it makes both contract incentives and trading strategies more robust. This study contributes to the literature on principal-agent problems, extending its applicability to real-world financial scenarios by providing insights into the effects of random investment horizons and default times.
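As a toy illustration of the physics-informed idea (and only that: the neural network is replaced here by a polynomial ansatz so the sketch stays dependency-free, and the ODE below is a stand-in, not the paper's HJB equation), one can minimize the squared equation residual at collocation points plus a boundary penalty:

```python
import numpy as np

# Physics-informed least-squares sketch: fit u(t) = sum_k c_k t^k to the ODE
# u'(t) = -u(t), u(0) = 1 on [0, 1] by minimizing the squared equation
# residual at collocation points plus a weighted boundary penalty -- the same
# loss structure a PINN uses, with the network replaced by a polynomial.
deg = 6
t = np.linspace(0.0, 1.0, 50)                    # collocation points
Phi = np.vander(t, deg + 1, increasing=True)     # features [1, t, ..., t^deg]
dPhi = np.hstack([np.zeros((t.size, 1)),
                  Phi[:, :-1] * np.arange(1, deg + 1)])  # d/dt of features

A = np.vstack([dPhi + Phi, 100.0 * Phi[:1]])     # residual rows + boundary row
b = np.concatenate([np.zeros(t.size), [100.0]])  # residual = 0, u(0) = 1
c, *_ = np.linalg.lstsq(A, b, rcond=None)

err = abs(Phi[-1] @ c - np.exp(-1.0))            # compare u(1) with exp(-1)
print(err)                                       # small: ansatz tracks exp(-t)
```

In an actual PINN, the polynomial features are replaced by a neural network and the derivatives in the residual are obtained by automatic differentiation rather than in closed form.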
The structure of this paper is organized as follows. In Section 2, we present the mathematical formulation of the Principal-Agent problem under time uncertainty, describing the underlying stochastic framework, the controlled wealth process governing the system dynamics, and all the assumptions necessary to make the problem tractable. We further define classes of admissible contracts and the optimisation problems for both the principal and the agent. In Section 3, we solve the problem sequentially, first focusing on the agent's optimal strategy, which is the same in both the bounded and unbounded default cases. Despite the fundamental difference in the proof of existence, this trading strategy has the same form in both cases and is plugged into the principal's problem. We then derive the Hamilton-Jacobi-Bellman (HJB) equation for the principal and establish, via a verification theorem, the existence of the solution to the Partial Differential Equation (PDE) that encapsulates the principal's optimization problem in the two proposed settings. Section 3.2 provides numerical examples demonstrating the implementation of the theoretical results in concrete scenarios, for both cases, using default times from the beta and exponential families of distributions. The goals are multiple: to show the differences arising within the same case, but also across the two cases, and to draw comparisons with the no-default case. Finally, we highlight the sub-optimal behaviour of a subset of contracts that mimic real-world compensation schemes.
2 The model and the optimization problem
2.1 Risky assets and portfolio dynamics
We consider a financial market represented by a probability space endowed with a -dimensional Brownian motion denoted by and a finite horizon . We denote by the natural filtration of this Brownian motion. This market consists of risky assets with vector price at time and no risk-free rate. The risky assets follow the dynamics:
where and are respectively -valued and -valued bounded predictable processes. We define the -valued covariance matrix by , where is the -th component of . We assume that is an invertible matrix, that is, is a.e. elliptic. We define . Let be a vector in representing the fraction of money invested in each asset at time . We refer to it as the investment strategy of the portfolio manager. We set . Note that we can also refer to or as the investment strategy interchangeably, up to the volatility factor . For every , one can define a probability measure such that the dynamics of the value of the portfolio starting with is then given by (we refer to Appendix A in [4] for the rigorous formulation of the problem and the choice of the probability )
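A minimal simulation sketch of such wealth dynamics (with assumed illustrative parameters: a single risky asset, constant drift and volatility, and a constant-fraction strategy; none of these values come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not from the paper): one risky asset with
# constant drift mu and volatility sigma, no risk-free rate, and a manager
# holding a constant fraction pi of wealth in the risky asset.
mu, sigma, T, n_steps = 0.05, 0.2, 1.0, 250
dt = T / n_steps

def simulate_wealth(v0, pi, n_paths=10_000):
    """Euler scheme for dV_t = pi * V_t * (mu dt + sigma dW_t)."""
    V = np.full(n_paths, v0)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        V += pi * V * (mu * dt + sigma * dW)
    return V

V_T = simulate_wealth(v0=1.0, pi=0.5)
print(V_T.mean())  # close to exp(pi * mu * T) = exp(0.025)
```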
or equivalently
2.2 Default time and enlargement of filtration
The default time is represented by a random variable taking values in . We define the default process by . Note that this process is not necessarily measurable, since the default time is assumed to be potentially exogenous to the system and thus independent of . We therefore enlarge the available information, i.e., the filtration, to take into account the information generated by the occurrence of the default time.
Definition 1.
Let be the σ-algebra generated by until time . Given a filtered space , the enlarged filtration
is the smallest enlargement of such that is a -stopping time.
Remark 1.
is not measurable but it is a -measurable stochastic process.
The goal is to ensure that the inaccessible default time enables us to enlarge the filtration to while transferring the martingale property from to , a property known as the immersion property or H-hypothesis. The first fundamental hypothesis posits the existence of a (conditional) density for the default time with a certain property.
Hypothesis (Density Hypothesis).
For any , there exists a measurable map such that
and
As a consequence of this assumption (see for example [10, 18]), any -martingale is also a -martingale. Furthermore, still under the density hypothesis, the process admits an absolutely continuous compensator, i.e., there exists a non-negative -predictable process such that the compensated process defined by
is a -martingale. The compensator vanishes after time (therefore ) and
is a -predictable process. For a complete and deeper discussion on the properties of enlarged filtrations, we refer the reader to [21]. We set .
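For a default time with a deterministic density, the intensity entering the compensator is the classical hazard rate: the density divided by the survival probability. A small sketch with an assumed exponential default law (the rate is illustrative):

```python
import numpy as np

# Sketch (assumed deterministic density): for a default time tau with density
# f and distribution F, the intensity of the compensated default process is
# lambda(t) = f(t) / (1 - F(t)) before default and 0 afterwards.
# Example: exponential default with (illustrative) rate 0.3, so lambda == rate.
rate = 0.3
f = lambda t: rate * np.exp(-rate * t)
F = lambda t: 1.0 - np.exp(-rate * t)
intensity = lambda t: f(t) / (1.0 - F(t))

t_grid = np.linspace(0.0, 5.0, 11)
print(np.allclose(intensity(t_grid), rate))  # constant hazard: prints True
```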
As a consequence of Proposition 4.4 in [18]
We now turn to the integrability of the process and the support of the default time . Denoting by the set of -stopping times (so we will have or ), we consider two cases
Hypothesis A - unbounded default.
(HA) |
As a direct consequence of the tower property and since , Hypothesis A leads to
Consequently, , the support of strictly contains .
Hypothesis B - bounded default.
(HB) |
Hence,
Consequently, , the support of is included in .
2.3 Admissible strategy and contracts
Admissible strategies may be restricted to a closed or compact subset of . We set the rigorous definition of an admissible strategy below, following [23, Definition 1].
Definition 2 (Admissible strategy with constraints).
Let be a closed set in . The set of admissible strategies, denoted by , consists of -dimensional predictable processes such that and , a.e.
Note that due to the nature of the problem considered and by considering a compensation , other integrability conditions are transferred to the admissibility of the contract below. Some examples of sets of admissible strategies include:
• , that is, is a proportion of the total wealth invested in the portfolio, with no possibility to borrow or spend more than the actual value of . It does not permit shorting stocks (i.e., selling borrowed stocks).
• , which permits leveraging positions only up to a certain threshold.
• , assuming that the investor can spend or borrow as much money as needed, limited to a fraction of the total wealth (possibly greater or less than 1 or ).
For a symmetric positive definite matrix , we define the norm of a column vector by
This norm is equivalent to the Euclidean norm in , with equivalence constants given by the smallest and largest eigenvalues of the matrix . From this definition of the -norm, we define the -distance of to the set as
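A quick numerical check of this norm equivalence (the matrix below is an arbitrary illustrative choice):

```python
import numpy as np

# Check that sqrt(lam_min) * |v| <= ||v||_A <= sqrt(lam_max) * |v| for a
# symmetric positive-definite A (the matrix below is an arbitrary example).
def weighted_norm(v, A):
    return float(np.sqrt(v @ A @ v))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
v = np.array([1.0, -1.0])
lam = np.linalg.eigvalsh(A)           # eigenvalues in ascending order
n, e = weighted_norm(v, A), float(np.linalg.norm(v))
print(np.sqrt(lam[0]) * e <= n <= np.sqrt(lam[-1]) * e)  # prints True
```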
The contract proposed by the investor follows the idea in the article [14]. We denote by a risk aversion parameter for the portfolio manager with CARA exponential utility function
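For a CARA utility, certainty equivalents admit a convenient closed form for Gaussian payoffs; the following sketch (with illustrative parameters, not values from the paper) checks the Monte Carlo certainty equivalent against it:

```python
import numpy as np

rng = np.random.default_rng(2)

# CARA utility U(x) = -exp(-gamma * x). For a Gaussian payoff X ~ N(m, s^2),
# the certainty equivalent CE = -log(E[exp(-gamma X)]) / gamma equals
# m - gamma * s^2 / 2 in closed form. Parameters below are illustrative.
gamma, m, s = 2.0, 1.0, 0.5
X = rng.normal(m, s, 1_000_000)
ce_mc = -np.log(np.mean(np.exp(-gamma * X))) / gamma
ce_exact = m - gamma * s**2 / 2
print(abs(ce_mc - ce_exact))  # small Monte Carlo error
```

The penalty term `gamma * s**2 / 2` is the risk-aversion correction that also drives the quadratic-variation compensation appearing in the contracts below.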
Definition 3 (Admissible contract with contractible variables).
We denote by the set of admissible contracts composed of measurable random variables , controlled by predictable real-valued processes such that is a positive definite matrix ( denotes the identity matrix in dimension ) and
where
with by
so that
(1) |
where
and there exists such that
The set of processes satisfying this integrability condition is denoted by , while their restriction to is denoted by for any .
Economic interpretation.
• is a fixed compensation determined by the reservation utility of the portfolio manager;
• the integrand process represents a compensation with respect to the evolution of the -th asset , if it is observable by the investor and therefore contractible;
• is a compensation term with respect to the portfolio dynamics, always observable by the investor;
• is a compensation with respect to the default risk of the market;
• is a compensation with respect to the covariation of and , while the term is a compensation driven by the manager's risk aversion with respect to the quadratic variation of the portfolio;
• is the certainty equivalent utility gained by the portfolio manager when solving her optimization problem. The gain resulting from this optimization is transferred into the contract.
Remark 2.
We can refine the set of contracts depending on the information available to the investor (see [14]).
• We denote by the set of random variables with , corresponding to the case where is not observable by the investor and hence not contractible.
• The set of linear contracts defined by
with contractible . The idea behind this contract is that, in practice, most fund managers ask their clients, as compensation, for a fixed percentage of the terminal wealth, forcing in , where is the fixed percentage the agent is going to receive.
In all these cases, are predictable processes such that and all the stochastic integrals are martingales.
2.4 Delegated portfolio management and bi-level stochastic programming
We assume that the portfolio manager receives the contract and optimally chooses a strategy in order to stay close to a benchmark strategy à la Almgren-Chriss (see [2]), so that the objective of the manager is to solve, for any fixed contract ,
(A) |
where
In our model, the investor fully delegates the portfolio management to the manager and thus lets the manager choose the optimal strategy to optimize the terminal value of the portfolio under default. This is the second-best case, and the contracting problem reduces to the following bi-level optimization under constraints when is contractible:
(P) |
subject to
• (R):
• (IC): .
We will refer to problem (A) as the Problem of the Agent, and to the bi-level program (P) as the Problem of the Principal. Furthermore, we want to emphasize that does not need to be a stopping time in the natural filtration generated by the assets' dynamics. Nevertheless, most of the results regarding default times are based on stopping time theory, so we want to work with a filtration in which can be a stopping time.
3 Optimal contract and investment strategy
3.1 Optimal investment with random horizon: solving the agent problem
A common approach in the continuous stochastic optimisation literature is based on the solution of Backward Stochastic Differential Equations (BSDEs) through a martingale optimality principle. We refer the reader to [23] for a detailed explanation of the method in a continuous setting and to [35, 29, 25] for extensions to discontinuous processes or default times. The general idea is to generate a family of super-martingales indexed by the control variable, in our case the investment strategy , whose terminal condition is the objective function of the agent at time . If we are able to find a specific control such that is a martingale, this control is optimal for (A), and the optimal value is given by the process indexed by the optimal control at time .
Lemma 1 (Martingale Optimality Principle).
Let and let be a family of stochastic processes indexed by the strategy such that
(i) ,
(ii) is a -supermartingale and is constant for all ,
(iii) there exists such that is a -martingale.
Then, is a solution to the maximization problem (A).
Proof.
Take . Then, we have
∎
Let be fixed. Independently of the boundedness of the default time, that is, either under Hypothesis A or Hypothesis B, we define
where is defined by
Theorem 1.
Assume that Hypothesis and either Hypothesis A or Hypothesis B are satisfied. For any , the optimal strategy solving (A) is
(2) |
and the optimal value is given by , where
and
Proof.
The proof follows the same ideas as [23] for the continuous case and [25, 35] for the discontinuous case, extending them to the multi-dimensional setting and to the contract fixed by the principal. Note that and are already stochastic (row) vectors and that we will denote by the set of admissible trading strategies . The proof is based on Ito's formula with Poisson jumps (as the intensity of our jump is the same as that of a simple Poisson process with varying intensity ); we refer to [41, 24] for more details on stochastic calculus with semi-martingales and on the notion of Doléans-Dade exponential (DDE) process. Given a semi-martingale , we denote its DDE by
where is the continuous component of the path of while .
Note that the DDE of is a solution of the following SDE
and that, provided is a -martingale, the resulting DDE is a (local) martingale as well.
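The mean-one (martingale) property of the Doléans-Dade exponential of a continuous martingale can be checked numerically; the sketch below uses an assumed constant volatility and omits the jump component of the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)

# For a continuous martingale X = sigma * W, the Doleans-Dade exponential is
# E(X)_t = exp(X_t - <X>_t / 2); it solves dZ = Z dX and has expectation one.
# Parameters are illustrative; the jump component of the paper is omitted.
sigma, T, n_steps, n_paths = 0.5, 1.0, 100, 100_000
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
X_T = sigma * dW.sum(axis=1)                 # X_T = sigma * W_T
Z_T = np.exp(X_T - 0.5 * sigma**2 * T)       # stochastic exponential at T
print(abs(Z_T.mean() - 1.0))                 # small: E[Z_T] = Z_0 = 1
```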
To find the optimal solution, we apply the so-called martingale optimality principle. We define the following family of stochastic processes, indexed by the strategy :
We set , so that . Note that Ito's decomposition of is given by
Hence,
where
We thus deduce that
where we used the fact that . Note that and , and equals zero when is chosen by maximizing
with
and
Therefore,
so that is optimal and unique because our set is compact. ∎
Remark 4.
Note that if , , , and are constant, we get
Remark 5.
As a consequence of the constraint (IC), we are implicitly looking for contracts such that . This requires considering such that ; otherwise (IC) is not satisfied and there is no optimal contract.
3.2 The optimal contract and verification results
In the previous section, we found the optimal strategy for the agent under the best optimal contract with contractible . Given that strategy, we now want to solve the principal's problem (P), i.e., we want to find the compensation maximising the principal's utility:
(3) |
where we used to stress that now the object of our maximisation problem is dependent on the best agent strategy . Before diving into the intuition behind how to solve this problem, it is important to show how this becomes a control problem so that an HJB equation can be derived. To do so, we first want to work on . We recall that , so that
Moreover,
Therefore, we can rewrite (3) as
(4) |
where . In order to derive the HJB equation with control process , we will use the results from [9], assuming that has a probability density function and a cumulative distribution function, and treating in the next two subsections the cases where the support of is bounded or unbounded. We recall that the problem of the principal reduces to solving
To derive the HJB equation from this problem, we set an additional assumption enforcing Markovian properties of the drift and volatility processes.
Assumption (M). We assume that there exist two progressively measurable functions such that and .
We define the following system of coupled SDEs with jumps, with solution controlled by
For the sake of computational simplicity and to avoid overwhelming notation, we will assume from now on that the principal is risk neutral, so that .
3.2.1 Bounded default time
Under Hypothesis B, the support of is included in . Recalling the results in [9], we deduce that
where is the density of . We introduce the following Hamilton-Jacobi-Bellman integro-partial differential equation.
where is a differential operator given by
We define
where optimizes .
Theorem 2 (Verification Theorem - bounded case).
Assume that there exists a function twice continuously differentiable in space and differentiable in time, such that solves (bHJB). Furthermore, assume that has a quadratic growth in and polynomial growth in such that
Then, for each , the strategy is an optimal strategy for the control problem and
The optimal contract is given by
Remark 6.
If we assume that and , we note that the solution to the HJB equations (bHJB) does not depend on . It can thus be rewritten as
where
where
Proof.
The proof of the theorem relies on a localization procedure. We first assume that there exists a solution to (bHJB) satisfying
Let . We introduce the following stopping time
where is the centered ball of radius in . Applying Ito’s formula, we get
By the localisation procedure and taking expectations, we get
Since satisfies (bHJB),
where the equality holds for . Recall that is bounded by a polynomial function in the variables . We note that
By applying the dominated convergence theorem, we deduce that
with equality when = .
∎
Viscosity solution and dynamic programming.
The assumption in Theorem 2 that a solution to (bHJB) exists can be relaxed by showing that the value function of the problem is a viscosity solution of the PDE (bHJB). This relaxed notion of regularity for solutions to PDEs was developed by Crandall and Lions in [13, 12]. We refer to [20, 46] for more details on stochastic control with viscosity solutions. We start by recalling the dynamic programming principle, and we define the continuation value objective of the principal, for any control , by
so that where and denotes the flow processes starting at time with respective initial values and in (SDE). The dynamic programming principle states that for any stopping time where we have
(5) |
The notion of weak solutions of (bHJB) results from this dynamic programming principle and is defined as follows.
Definition 4 (Viscosity solution).
We say that is a lower (resp. upper) semi-continuous super-solution (resp. sub-solution) of (bHJB) on if for all functions and satisfying
we have
If is both a super- and sub-solution, we say that is a viscosity solution to (bHJB).
As a consequence of the dynamic programming principle, we have the following theorem
Theorem 3.
Assume that the value function is locally bounded on . Then is a viscosity solution to (bHJB).
Remark 7.
Note that the infinitesimal generator contains a degenerate term through the process. This term explodes as approaches when the support of is bounded, which requires proving Theorem 3 with a localisation technique applied to both the state variables of the problem and the process.
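This explosion is easy to see on a concrete bounded default law; for instance, with an (assumed, purely illustrative) Beta(2, 2) density on [0, 1], the hazard rate f/(1 - F) grows without bound as t approaches the right endpoint:

```python
# Beta(2, 2) default density on [0, 1] (an assumed illustrative law):
# pdf f(t) = 6 t (1 - t), survival S(t) = 1 - (3 t^2 - 2 t^3).
f = lambda t: 6.0 * t * (1.0 - t)
S = lambda t: 1.0 - (3.0 * t**2 - 2.0 * t**3)
hazard = lambda t: f(t) / S(t)          # the degenerate term in the generator

print(hazard(0.1), hazard(0.5), hazard(0.999))  # blows up toward t = 1
```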
Proof of Theorem 3.
We denote by the lower and upper semi-continuous envelopes of , respectively.
Step 1. Proof of the super-solution property. Let be a function. Let be such that
and a sequence such that
with . We define . Note that for large . We also define the solution to (SDE) starting at time with respective values and , controlled by such that , so that is constant at time , and denotes the price process starting at the price vector at time . In other words, and . We also define
and the stopping time
where is the unit ball on centered at the point . We note that when goes to . According to the dynamic programming principle (5) we have
or equivalently
Applying Ito's formula to the function and using the localisation procedure before the stopping time , we get
Since the function is assumed to be and its derivatives are essentially locally bounded around , uniformly in for , taking the limit as goes to , we obtain
We deduce that is a viscosity super-solution in the sense of Definition 4 to (bHJB).
Step 2. Proof of the sub-solution property. Let be a function. Let be such that
We assume by contradiction that
Since the function is continuous, we deduce that there exists a ball around with norm small enough such that
Note that there exists independent of such that
Let converging to such that and for any . We define
Applying Ito’s formula, we get for any control
Since is independent of the control , it contradicts the dynamic programming principle (5). We thus deduce that
hence is a sub-solution to (bHJB) in the sense of Definition 4. ∎
3.2.2 Unbounded default time
Even in the unbounded case, we start by getting rid of the term appearing in (4), as it plays no role in the optimisation routine: again, and differ only by a constant, so they share the same governing SDE. Our starting point is therefore the following optimisation problem
The first difference is that now, in reformulating the problem using the default density, we will still have a terminal part . Overall, our problem is now
Note that this problem differs from the bounded default time case only through the terminal condition given by
We introduce the following integro-partial PDE and the verification theorem follows.
Theorem 4 (Verification Theorem - unbounded case).
Assume that there exists a function twice continuously differentiable in space and differentiable in time, such that solves (uHJB). We denote by the optimizers in the supremum. Furthermore, assume that has a quadratic growth in and polynomial growth in such that
Then, for each , the strategy is an optimal strategy for the control problem and
The optimal contract is given by
Remark 8.
The proof of this theorem follows the same lines as that of Theorem 2, without requiring the localisation of the term.
Remark 9.
Similarly to Theorem 3, the value function of the problem is a viscosity solution to (uHJB).
4 Numerical solutions
In this section, we explore the numerical analysis of the HJB equations (bHJB) and (uHJB). At a first level, the aim is mainly to understand the differences between the bounded and unbounded default cases and to compare them against the no-default case. At a second level, for the bounded default, we are interested in capturing the following features:
•
understand how the default time affects the trading strategy and the incentive compensation scheme;
•
investigate how the skewness of the default with bounded support in impacts the compensation scheme;
•
compare the compensation scheme, especially the compensation with respect to the default given by the process , in the linear case and in the general case ;
•
give a qualitative explanation of the average patterns of the various incentives related to the different kinds of risk involved in the compensation scheme.
The problem already poses several numerical challenges, in particular because we have no explicit formula for the optimizer in Theorem 2, and because of the high dimensionality of the state variables. Both (bHJB) and (uHJB) require an iterative approach, as one of the coefficients involves the value of the solution at a point different from the one currently being evaluated. In the recent literature, neural networks have shown great potential as state-of-the-art numerical schemes for PDEs, so the backbone of our algorithm is a simple feed-forward neural network, deployed in a deep-learning scheme similar to the one used in [3], where an actor-critic approach is used to find the optimal policy for market making.
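For concreteness, such a feed-forward backbone can be sketched in plain NumPy (the layer sizes follow the architecture reported in Section 4.1; the input ordering and the Xavier-style initialisation are our own illustrative assumptions):

```python
import numpy as np

def init_mlp(sizes, rng):
    """Initialise weights/biases of a fully-connected net (Xavier-style scaling)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def mlp_forward(params, x):
    """Forward pass with tanh activations and a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
# 3-dimensional input, 8 hidden layers of 32 neurons, scalar output
net = init_mlp([3] + [32] * 8 + [1], rng)
batch = rng.uniform(size=(64, 3))    # 64 sampled collocation points
values = mlp_forward(net, batch)     # candidate value-function evaluations
```

In the actual scheme, the derivatives of this network with respect to its inputs are obtained by auto-differentiation rather than by hand.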
We recall that the PDE we are trying to solve is
coupled with some boundary and terminal conditions dependent on which case we are considering. The simple idea of our algorithm is to have an initial guess for the optimizers
and an initial untrained neural network serving as the initial value function . For this fixed set of values, we find the solution of the above PDE by training the neural network. Training a so-called Physics-Informed Neural Network consists in using the auto-differentiation tools of deep-learning packages to compute the required derivatives, building a loss function given by the residual of the PDE, and adding the boundary/terminal/initial conditions as additional loss terms. A very detailed description can be found in the seminal work of [42]. The advantage of using a neural network is that it yields a parametrized solution, with which we can numerically solve again the optimisation problem
finding new optimal variables denoted by , and solving the new PDE associated with these new parameters, iterating until convergence to the solution of the PDE. The pseudo-code can be found in Algorithm 1 and Algorithm 2.
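Since Algorithms 1 and 2 are stated in pseudo-code, the outer iteration can be illustrated on a deliberately simplified toy problem; both `solve_pde` and the surrogate Hamiltonian below are stand-ins of our choosing, not the paper's actual operators:

```python
def solve_pde(a):
    """Stand-in for the PINN training step: returns the 'value function'
    (here collapsed to a scalar) associated with a frozen control a."""
    return 0.5 * a + 1.0

def maximise_hamiltonian(v, grid):
    """Stand-in for the grid search over the Hamiltonian; the surrogate
    Hamiltonian -(a - v)^2 is maximised at a = v."""
    return max(grid, key=lambda a: -(a - v) ** 2)

grid = [i / 100.0 for i in range(-500, 501)]   # candidate controls
a = 0.0                                        # initial guess for the optimizer
for _ in range(50):                            # outer iteration of the scheme
    v = solve_pde(a)                           # "critic": solve the PDE for frozen a
    a = maximise_hamiltonian(v, grid)          # "actor": re-optimise the Hamiltonian
# the iteration converges to a grid approximation of the fixed point a* = 2
```

The real algorithm replaces `solve_pde` by PINN training on the residual of (bHJB) or (uHJB) and `maximise_hamiltonian` by the grid search described below.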
Main numerical challenges and remedies.
Several numerical challenges arise from the training phase and the solution of (bHJB) or (uHJB). First, the neural network can drift towards a constant solution, because the residual can then be kept at a low value while the boundary/terminal conditions are still met. To prevent the solution from getting stuck at a constant, we implemented a regularisation term penalising first derivatives that are too close to 0. This term is progressively cancelled once sufficient training has been done. The second challenge is to select the weights of the boundary-condition losses optimally. We use scaling weights for the terminal/boundary losses at the beginning of training so that the loss focuses on the PDE residual; these weights return to their nominal value once sufficient training has been done. Finally, note that the convergence of this algorithm is especially difficult to study in the first few iterations, where the loss can potentially explode, due to the exponential term and, in the degenerate case, the exploding term involved in (bHJB). To tackle the iterative nature of the problem, the optimizers were initialised uniformly on large intervals so that the updates were not biased in any way. Moreover, instead of taking the current maximum, a weighted average was taken to mollify the optimization process, with weights decaying so that, after sufficient training, the newly found optimal solution receives essentially all the weight.
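The three remedies above (derivative penalty, boundary-loss warm-up, mollified optimizer updates) can be sketched as follows; the functional forms and decay constants are illustrative choices of ours, not the exact ones used in the experiments:

```python
import math

def mollified_update(a_old, a_new, step, decay=0.05):
    """Weighted average between the current optimizer and the newly found one;
    the weight on the new value grows towards 1 as training progresses."""
    w = 1.0 - math.exp(-decay * step)
    return (1.0 - w) * a_old + w * a_new

def boundary_weight(step, warmup=100):
    """Scaling weight for the terminal/boundary losses: small at first, so the
    loss focuses on the PDE residual, then back to its nominal value."""
    return min(1.0, step / warmup)

def gradient_penalty(grad_norms, step, warmup=200):
    """Penalise first derivatives close to 0 (to avoid constant solutions);
    the penalty is progressively cancelled as training advances."""
    strength = max(0.0, 1.0 - step / warmup)
    return strength * sum(math.exp(-g) for g in grad_norms) / len(grad_norms)
```

Early in training the update keeps the old optimizer, the boundary losses are down-weighted and the derivative penalty is active; all three effects fade out as `step` grows.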
In practice, we used a simple grid search algorithm for the maximisation over the variables ( ), and more efficient approaches could be tried. From our experience with the problem, a neural-network approach, which would turn the whole algorithm into a pure deep-learning actor-critic, was not best suited: the maximisation task is hard to handle early in training and gave rise to non-smooth results that made the optimisation problem much tougher. Our solution, coupled with an interpolation mechanism in the style of -nearest-neighbours, worked fairly well in practice for our purposes. For the sake of simplicity, the numerical simulations are performed in low dimension. Note however that our algorithm remains tractable in higher dimension and is still required, due to the absence of an explicit formula for , to solve the integro-partial PDE (bHJB) or (uHJB).
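A minimal version of the grid search, together with a nearest-neighbour interpolation used to extend its results to unseen states, could look as follows (the inverse-distance weighting is one possible choice of interpolation, not necessarily the one used in our implementation):

```python
import numpy as np

def grid_search_max(h, grids):
    """Exhaustive maximisation of h over the Cartesian product of 1-D grids."""
    mesh = np.meshgrid(*grids, indexing="ij")
    points = np.stack([m.ravel() for m in mesh], axis=-1)
    values = np.array([h(*p) for p in points])
    best = int(np.argmax(values))
    return points[best], values[best]

def knn_interpolate(x, sites, values, k=3):
    """Inverse-distance weighted average of the k nearest stored optimisers,
    extending grid-search results to states not on the grid."""
    d = np.linalg.norm(sites - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-8)
    return float(w @ values[idx] / w.sum())
```

The grid search is robust but exponential in the number of control variables, which is why more efficient maximisation routines are worth exploring in higher dimension.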
4.1 Numerical simulations with bounded default times
In this subsection we discuss the experiments with a bounded default time. We deployed a simple feed-forward NN with 3-dimensional input, 1-dimensional output and 8 hidden layers of 32 neurons each. The learning rate schedule was piece-wise linear, and we sampled points uniformly and between terminal and boundary points. In order to capture the effect of skewness, we chose default times from the Beta family, with varying parameters capturing different skewness levels: we worked with Beta(2,4), Beta(1,1) (i.e. a uniform distribution) and Beta(4,2). The exploding compensator was computed but capped: since the interval of our study in terms of wealth ranged from , we opted to cap the compensator process at . The market has a positive drift of with a volatility of . As an example, we show the calculations for the symmetric Beta random variable, which is in fact a uniform distribution. Consider for instance being uniformly distributed on . We recall that
Since we assume independence of from the filtration, we have
as . Consequently,
so that
matching the survival function of the uniform distribution. In all the experiments with an independent default time, has a probability density function and an associated cumulative distribution function such that


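The uniform-default computations above can be checked numerically; in the sketch below, the unit horizon and the cap on the exploding compensator are placeholder values of our choosing:

```python
import math

T = 1.0  # horizon, a placeholder value for illustration

def survival(t, T=T):
    """P(tau > t) for a uniform default time on [0, T]."""
    return 1.0 - t / T

def hazard(t, T=T, cap=50.0):
    """Hazard rate f(t)/(1 - F(t)) = 1/(T - t); it explodes as t -> T and is
    therefore capped (the cap value here is an arbitrary placeholder)."""
    return min(1.0 / (T - t), cap) if t < T else cap

# consistency check: survival(t) = exp(-integral of the uncapped hazard)
t = 0.5
cumulative_hazard = -math.log(1.0 - t / T)  # equals int_0^t ds / (T - s)
```

The explosion of the hazard near the horizon is precisely the degeneracy discussed for (bHJB), and the cap mirrors the one applied to the compensator process in the experiments.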
Figure 1 gives a comparative study of the average portfolio evolution and average optimal strategy when considering a uniform distribution on , Beta distributions with opposite skews on , and the case without any jump. Since at the default time the trader exits the market, which has a positive drift, the total terminal wealth in the no-default case is larger, and in general the three models in which a default occurs are on average the least profitable ones, as expected. Note that the wealth obtained with the Beta distribution is higher than with the uniform, which is in turn higher than with the Beta distribution. If the default is more likely to happen towards the end, the trader is less at risk of default at the beginning and thus benefits from better decisions and more time to invest in the assets. The impact of the default on the behaviour of the investment strategy is quite different and reflects interesting features. While at the beginning all four strategies mirror each other, knowing that no default will happen makes the no-default investor more careful, whereas in all three default cases the strategy flattens out at a certain value. The default thus makes the investor more aggressive in trading compared with the no-default case.




Figure 2 gives the average total compensation evolution (top left), average compensation with respect to the default time (top right), average incentive with respect to the wealth of the portfolio (bottom left) and average compensation with respect to the variability (quadratic variation) of the portfolio value (bottom right). We note that
for a right-skewed default time distribution, the total compensation is reduced. The average compensation also increases over time. Intuitively, the investor needs to pay the broker more to make the investments when she knows that the conditional probability of a default occurring is greater.
Regarding the incentive related to the default , the uniform default distributes the incentive uniformly. The left-skewed distribution, Beta(2,4), shows that this compensation increases with time, since the default is even more likely to happen for small values of time. The right-skewed Beta(4,2) shows an interesting bimodal pattern: at the beginning, both the principal and the agent know in advance that the default will happen near the end, and the first local maximum of the green curve can be seen as a preventive compensation for the future default, so that there is no “wealth shock” at the anticipated default. The compensation then stays low until, approaching the end, the default becomes more likely to happen.
The incentive related to the portfolio performance grows over time, responding to the growth of the portfolio wealth. Note that the compensation for the uniform distribution has the least growth, while a skew on the left provides a higher compensation.
The compensation with respect to the variability of the wealth is almost constant for the uniform distribution, while decreasing then increasing for a skewed distribution. Finally, it is worth mentioning that, for the plots of , the values are, on average, such that the matrix is positive definite; they thus meet the requirement for the optimal contract to be admissible, so that the whole numerical scheme is well-posed.
Figure 3 investigates the difference between the solution of the problem for contracts restricted to the set of linear contracts in and for the set of general contracts . The optimality gap in the portfolio performance is small but noticeable (top left), and the linear strategy is more conservative, shifted down after half the interval, while closely mirroring the more general contract (top right). Since the contract is not optimal, the compensation to hedge against the default is very different, growing in time but remaining small with respect to the other variables (bottom center).



4.2 Numerical simulations with unbounded default times and comparisons
We turn to a comparative study of both the trading strategies and incentives when the default is unbounded or bounded on . In this section, we opted for an exponential default time for the unbounded case, with parameter , and compared it against the uniform distribution, whose mean is the same.
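The mean matching and the qualitative difference between the two hazard rates can be summarised in a few lines; the unit horizon below is a placeholder, and for a uniform mean of T/2 the matching exponential rate is 2/T:

```python
T = 1.0            # assumed horizon of the bounded (uniform) case
lam = 2.0 / T      # exponential rate matching the uniform mean T / 2

def hazard_uniform(t, T=T):
    """Hazard of a uniform default on [0, T]: explodes as t -> T,
    so the default is certain by the end of the horizon."""
    return 1.0 / (T - t)

def hazard_exponential(t, lam=lam):
    """Constant hazard of an exponential default: the default never
    becomes certain, even near the end of the horizon."""
    return lam
```

The constant versus exploding hazard is exactly what drives the in-between behavior of the exponential-case strategy observed in Figure 4.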
Figure 4 presents the portfolio evolution, optimal strategy, incentive with respect to the jump and average incentive with respect to the portfolio value in the three typical cases: uniform default distribution, exponential default distribution, or no default time. We see the effect of reaching the pre-planned investment horizon, as the strategy in the unbounded case reaches, on average, a larger terminal wealth. Note that the trading strategy in the exponential case exhibits an in-between behavior, between the no-default and the uniform-default cases: it is similar to the uniform case at the beginning, but gets closer to the no-default case as the end of the horizon approaches. This can be intuitively explained by the fact that there is no certainty that the default will happen, so the strategy does not have to stay as aggressive as in the uniform case.
Furthermore, and perhaps the most interesting takeaway from this comparison, as evident from Figure 4, the incentive for the default changes quite dramatically from the uniform case: there is now uncertainty about whether the default will actually happen, so the incentive for this risk is not intrinsically present in the contract; rather, the principal has to reward the agent for entering the market under the threat of default. In general, we thus notice that the various incentives differ from the bounded case, highlighting the difference between the two settings and the impact that the certainty of a default has on structuring the contract.




References
- [1] Anna Aksamit and Monique Jeanblanc. Enlargement of filtration with finance in view. Springer, 2017.
- [2] Robert Almgren and Neil Chriss. Optimal execution of portfolio transactions. Journal of Risk, 3:5–40, 2001.
- [3] Bastien Baldacci, Iuliia Manziuk, Thibaut Mastrolia, and Mathieu Rosenbaum. Market making and incentives design in the presence of a dark pool: a deep reinforcement learning approach. arXiv preprint arXiv:1912.01129, 2019.
- [4] Bastien Baldacci and Dylan Possamaï. Governmental incentives for green bonds investment. Mathematics and Financial Economics, 16(3):539–585, 2022.
- [5] Guy Barles, Rainer Buckdahn, and Etienne Pardoux. Backward stochastic differential equations and integral-partial differential equations. Stochastics: An International Journal of Probability and Stochastic Processes, 60(1-2):57–83, 1997.
- [6] Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. Journal of machine learning research, 18(153):1–43, 2018.
- [7] Peter L Bernstein and Aswath Damodaran. Investment management. J. Wiley, 1998.
- [8] Tomasz R Bielecki and Marek Rutkowski. Credit risk: modeling, valuation and hedging. Springer Science & Business Media, 2013.
- [9] Christophette Blanchet-Scalliet, Nicole El Karoui, Monique Jeanblanc, and Lionel Martellini. Optimal investment decisions when time-horizon is uncertain. Journal of Mathematical Economics, 44(11):1100–1113, 2008.
- [10] Pierre Brémaud and Marc Yor. Changes of filtrations and of probability measures. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 45(4):269–295, 1978.
- [11] Alessandro Chiusolo and Emma Hubert. A new approach to principal-agent problems with volatility control. arXiv preprint arXiv:2407.09471, 2024.
- [12] Michael G Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American mathematical society, 27(1):1–67, 1992.
- [13] Michael G Crandall and Pierre-Louis Lions. Viscosity solutions of hamilton-jacobi equations. Transactions of the American mathematical society, 277(1):1–42, 1983.
- [14] Jakša Cvitanić, Dylan Possamaï, and Nizar Touzi. Moral hazard in dynamic risk management. Management Science, 63(10):3328–3346, 2017.
- [15] Jakša Cvitanić, Dylan Possamaï, and Nizar Touzi. Dynamic programming approach to principal–agent problems. Finance and Stochastics, 22:1–37, 2018.
- [16] Flávia Zóboli Dalmácio, Valcemiro Nossa, et al. The agency theory applied to the investment funds. Brazilian Business Review, 1(1):31–44, 2004.
- [17] Peter M DeMarzo and Yuliy Sannikov. Optimal security design and dynamic capital structure in a continuous-time agency model. The journal of Finance, 61(6):2681–2724, 2006.
- [18] Nicole El Karoui, Monique Jeanblanc, and Ying Jiao. What happens after a default: the conditional density approach. Stochastic processes and their applications, 120(7):1011–1032, 2010.
- [19] Nicole El Karoui, Shige Peng, and Marie Claire Quenez. Backward stochastic differential equations in finance. Mathematical finance, 7(1):1–71, 1997.
- [20] Wendell H Fleming and Halil Mete Soner. Controlled Markov processes and viscosity solutions, volume 25. Springer Science & Business Media, 2006.
- [21] Xin Guo and Yan Zeng. Intensity process and compensator: A new filtration expansion approach and the jeulin–yor theorem. The Annals of Applied Probability, 18(1), 2008.
- [22] Bengt Holmstrom and Paul Milgrom. Aggregation and linearity in the provision of intertemporal incentives. Econometrica: Journal of the Econometric Society, pages 303–328, 1987.
- [23] Ying Hu, Peter Imkeller, and Matthias Müller. Utility maximization in incomplete markets. The Annals of Applied Probability, pages 1691 – 1712, 2005.
- [24] Nobuyuki Ikeda and Shinzo Watanabe. Stochastic differential equations and diffusion processes. Elsevier, 2014.
- [25] Monique Jeanblanc, Thibaut Mastrolia, Dylan Possamaï, and Anthony Réveillac. Utility maximization with random horizon: a bsde approach. International Journal of Theoretical and Applied Finance, 18(07):1550045, 2015.
- [26] Monique Jeanblanc and Anthony Réveillac. A note on bsdes with singular driver coefficients. In Arbitrage, credit and informational risks, pages 207–224. World Scientific, 2014.
- [27] Monique Jeanblanc, Marc Yor, and Marc Chesney. Mathematical methods for financial markets. Springer Science & Business Media, 2009.
- [28] Michael C Jensen. The performance of mutual funds in the period 1945-1964. The Journal of finance, 23(2):389–416, 1968.
- [29] Idris Kharroubi, Thomas Lim, and Armand Ngoupeyou. Mean-variance hedging on uncertain time horizon in a market with a jump. Applied Mathematics & Optimization, 68:413–444, 2013.
- [30] Andrei Kirilenko, Albert S Kyle, Mehrdad Samadi, and Tugkan Tuzun. The flash crash: High-frequency trading in an electronic market. The Journal of Finance, 72(3):967–998, 2017.
- [31] Magdalena Kobylanski. Backward stochastic differential equations and partial differential equations with quadratic growth. The annals of probability, 28(2):558–602, 2000.
- [32] Raymond CW Leung. Continuous-time principal-agent problem with drift and stochastic volatility control: with applications to delegated portfolio management. Available at SSRN, 2014.
- [33] C Wei Li and Ashish Tiwari. Incentive contracts in delegated portfolio management. The Review of Financial Studies, 22(11):4681–4714, 2009.
- [34] Harry M Markowitz. Foundations of portfolio theory. The journal of finance, 46(2):469–477, 1991.
- [35] Marie-Amelie Morlais. Utility maximization in a jump market model. Stochastics: An International Journal of Probability and Stochastics Processes, 81(1):1–27, 2009.
- [36] Hui Ou-Yang. Optimal contracts in a continuous-time delegated portfolio management problem. The Review of Financial Studies, 16(1):173–208, 2003.
- [37] Antonis Papapantoleon, Dylan Possamaï, and Alexandros Saplaouras. Existence and uniqueness results for bsde with jumps: the whole nine yards. Electronic Journal of Probability, 23:1 – 68, 2018.
- [38] Etienne Pardoux and Shige Peng. Adapted solution of a backward stochastic differential equation. Systems & control letters, 14(1):55–61, 1990.
- [39] Etienne Pardoux and Shige Peng. Backward stochastic differential equations and quasilinear parabolic partial differential equations. In Stochastic Partial Differential Equations and Their Applications: Proceedings of IFIP WG 7/1 International Conference University of North Carolina at Charlotte, NC June 6–8, 1991, pages 200–217. Springer, 2005.
- [40] Elisabeth Paté-Cornell. On “black swans” and “perfect storms”: Risk analysis and management when statistics are not enough. Risk Analysis: An International Journal, 32(11):1823–1833, 2012.
- [41] Nicolas Privault. Introduction to stochastic finance with market examples. Chapman and Hall/CRC, 2022.
- [42] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.
- [43] Yuliy Sannikov. A continuous-time version of the principal-agent problem. The Review of Economic Studies, 75(3):957–984, 2008.
- [44] Justin Sirignano and Konstantinos Spiliopoulos. Dgm: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
- [45] Livio Stracca. Delegated portfolio management: A survey of the theoretical literature. Journal of Economic surveys, 20(5):823–848, 2006.
- [46] Nizar Touzi. Optimal stochastic control, stochastic target problems, and backward SDE, volume 29. Springer Science & Business Media, 2012.