Parameter learning: stochastic optimal control approach with reinforcement learning
Abstract
In this study, we develop a stochastic optimal control approach with a reinforcement learning structure to learn the unknown parameters appearing in the drift and diffusion terms of a stochastic differential equation. By choosing an appropriate cost functional and starting from a classical optimal feedback control, we transform the original optimal control problem into a new control problem in which the unknown parameter plays the role of the control, and the related optimal control can be used to estimate the unknown parameter. We establish the mathematical framework of the dynamic equation for the exploratory state, which is consistent with existing results. Furthermore, we consider the linear stochastic differential equation case where the drift or the diffusion term contains an unknown parameter. Then, we investigate the general case where both the drift and diffusion terms contain unknown parameters. For the above cases, we show that the optimal density function is Gaussian and can be used to estimate the unknown parameters. Based on the obtained parameter estimates, we can carry out empirical analysis for a given model.
KEYWORDS: Parameter estimation; Stochastic optimal control; Reinforcement learning; Linear SDE; Exploratory
1 Introduction
Zhou (2023) commented that “We strive to seek optimality, but often find ourselves trapped in bad ‘optimal’ solutions that are either local optimizers, or are too rigid to leave any room for errors”. Indeed, exploration through randomization offers a way to break this curse of optimality. In the present study, we try to balance model-based and model-free methods; that is, we aim to use a reinforcement learning structure to explore the unknown parameters appearing in the model-based method.
Recently, Wang et al. (2020) were the first to consider reinforcement learning in continuous time and space, introducing an exploratory formulation of the state dynamics together with an entropy-regularized reward function. Then, Wang and Zhou (2020) solved the continuous-time mean-variance portfolio selection problem under this reinforcement learning stochastic optimal control framework. Tang et al. (2022) considered the exploratory Hamilton-Jacobi-Bellman (HJB) equation formulated by Wang et al. (2020) in the context of reinforcement learning in continuous time, and established the well-posedness and regularity of the viscosity solution to the HJB equation. Gao et al. (2022) considered the temperature control problem for Langevin diffusions in the context of nonconvex optimization under the regularized exploratory formulation developed by Wang et al. (2020). Differing from the continuous-time entropy-regularized reinforcement learning problem of Wang et al. (2020), Han et al. (2023) proposed a Choquet regularizer to measure and manage the level of exploration in reinforcement learning.
For classical stochastic optimal control theory and related applications, we refer the reader to the monographs Fleming and Rishel (1975); Yong and Zhou (1999); Fleming and Soner (2006). For the recursive stochastic optimal control problem and the related dynamic programming principle, see Pardoux and Peng (1990); Peng (1990, 1992). It is well known that classical optimal control theory can be applied to solve the mean-variance model; see Zhou and Li (2000); Basak and Chabakauri (2010); Björk et al. (2014); Dai et al. (2021). When we consider applications of the classical optimal control problem, for example the mean-variance investment portfolio problem, we first need to estimate the parameters appearing in the model. Statistical methods such as moment estimation and maximum likelihood estimation can be employed, but they rely heavily on stringent assumptions on the observed samples. Furthermore, based on the observations (historical data), it is difficult to estimate time-varying parameters appearing in model-based problems. Reinforcement learning algorithms have been widely used in optimization, engineering, finance, and other fields. In particular, Wang et al. (2020) first introduced reinforcement learning into the continuous-time stochastic optimal control problem.
It is therefore important to find a better estimate of the parameters appearing in the model; based on such estimates, we can carry out empirical and time-trend analysis. In this study, we first show how to use a stochastic optimal control approach with a reinforcement learning structure to learn the unknown parameters appearing in the dynamics. We consider the following stochastic optimal control problem with an unknown deterministic parameter ,
(1.1) |
where and are given deterministic functions, is a Brownian motion, and is an input control. Given an input control , we can observe the output state . In the classical mean-variance investment portfolio framework, the parameter in (1.1) can be used to describe the mean and volatility of the risky asset. Thus, it is useful to obtain an estimate of the unknown parameter .
Now, we describe the stochastic optimal control approach with a reinforcement learning structure for learning the unknown parameters. Step 1: Based on equation (1.1), by choosing an appropriate cost functional, we obtain a feedback optimal control which contains the unknown parameter , denoted by . Step 2: We replace the unknown parameter in the optimal control with a new deterministic control , and denote the resulting input control by . Step 3: We substitute the new control into equation (1.1) and the related cost functional, and establish a new optimal control problem. Indeed, the optimal control of the new exploratory control problem can be used to estimate the unknown parameters. We therefore consider the linear stochastic differential equation case where the drift or the diffusion term contains the unknown parameter, and the general case where both the drift and diffusion terms contain unknown parameters. For the above cases, we find that the optimal density function can be used to learn the unknown parameters.
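To fix ideas, the following Python skeleton summarizes the three steps. Every name in it (observe_system, feedback_law, empirical_cost) is a placeholder of our own and not notation from the paper, and the naive search over candidate values only stands in for the exploratory, entropy-regularized formulation developed in the following sections.

    # A high-level sketch of the three-step procedure; all names are placeholders.
    def estimate_parameter(observe_system, feedback_law, empirical_cost, candidates):
        # Step 1: feedback_law(t, x, theta) is the classical optimal feedback
        # control for the chosen cost functional; it contains theta explicitly.
        # Step 2: replace theta by a candidate value and use the resulting
        # feedback law as the input control to the (black-box) system.
        # Step 3: the candidate minimizing the observed cost estimates theta
        # (cf. Theorem 2.2 and Remark 2.2); in the paper this minimization is
        # carried out through an exploratory, entropy-regularized control
        # problem rather than the naive search shown here.
        best_candidate, best_cost = None, float("inf")
        for cand in candidates:
            trajectory = observe_system(lambda t, x: feedback_law(t, x, cand))
            cost = empirical_cost(trajectory)
            if cost < best_cost:
                best_candidate, best_cost = cand, cost
        return best_candidate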
In the present paper, we focus on developing the theory of a stochastic optimal control approach with a reinforcement learning structure to learn the unknown parameters appearing in equation (1.1). We leave applications of this new theory to future work. However, we refer the reader to the following references for algorithms for learning the optimal control. A unified framework to study policy evaluation and the associated temporal-difference methods in continuous time was investigated in Jia and Zhou (2022a). Jia and Zhou (2022b) studied policy gradients for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). Dong (2022) studied the optimal stopping problem under the exploratory framework, and established the related HJB equation and algorithm. Furthermore, Jia and Zhou (2022c) introduced q-learning in continuous time and space. For policy evaluation, we follow Doya (2000) for learning the value function. An introduction to reinforcement learning can be found in Sutton and Barto (2018); for deep learning and related topics, see Goodfellow et al. (2016).
The main contributions of this study are threefold:
(i). To learn the unknown parameters appearing in the dynamics of the state, we develop a stochastic optimal control approach with a reinforcement learning structure.
(ii). By choosing an appropriate cost functional and starting from a classical optimal feedback control, we transform the original optimal control problem into a new control problem, in which the related optimal control is used to estimate the unknown parameter. Based on the obtained parameter estimates, we can carry out empirical analysis for a given model.
(iii). We consider the linear stochastic differential equation case where the drift or the diffusion term contains an unknown parameter, and the general case where both the drift and diffusion terms contain unknown parameters. We show that the optimal density function is Gaussian and can be used to learn the unknown parameters.
The remainder of this study is organized as follows. In Section 2, we formulate the stochastic optimal control approach with a reinforcement learning structure, and show how to learn the unknown parameter. We then investigate the exploratory HJB equation for this approach in Section 3. In Section 4, we consider the linear SDE case, where the drift or the diffusion term is allowed to contain the unknown parameter. Furthermore, in Section 5, we generalize the model of Section 4 to the case where both the drift and diffusion terms contain unknown parameters, and give an example to verify the main results. Finally, we conclude this study in Section 6.
2 Formulation of the model
Given a probability space , a Brownian motion , and the filtration generated by , where with a given terminal time , we introduce the following stochastic differential equation with the deterministic unknown parameter and control ,
(2.1) |
where is the set of all progressively measurable, square-integrable processes on taking values in a Euclidean space. Based on the state , we consider the running and terminal cost functional,
(2.2) |
and the goal is to minimize over , that is,
The conditions on the functions and in equations (2.1) and (2.2) will be given later. Under mild conditions, one can show that there exists a feedback optimal control (see Yong and Zhou (1999) for further details), where is the optimal state under the optimal control .
Remark 2.1.
Now, we give the assumptions of this study. The functions and are continuous in all of their variables.
Assumption 2.1.
The functions and are of linear growth and Lipschitz continuous in the second and third variables, with Lipschitz constant .
Assumption 2.2.
The functions and are of polynomial growth in and .
Assumption 2.3.
is a deterministic piecewise continuous function on . The feedback optimal control is uniformly Lipschitz continuous on , with Lipschitz constant .
In practical analysis, we typically assume the forms of the functions and and leave the unknown parameter to be estimated. Classical statistical estimators, for example the moment estimator or the maximum likelihood estimator, can be implemented; however, these estimation methods rely heavily on the observations. In this study, we aim to develop a new estimation method based on the explicit feedback control .
It should be noted that we can observe the value of the state under a given control , but not the unknown parameter . In the following, we develop the model used to estimate the parameter . Since we do not know the value of the parameter , we replace it with a new deterministic control process in the feedback optimal control , denoted by , where satisfies the following stochastic differential equation,
(2.3) |
where and . Based on the above assumptions, equation (2.3) admits a unique solution.
Furthermore, we rewrite the cost functional (2.2) as follows,
(2.4) |
where , , and , . We denote = {all deterministic piecewise continuous functions on }. Obviously, we have the following equivalence between the cost functionals and .
Theorem 2.2.
Proof.
Note that is a deterministic piecewise continuous process. From Assumption 2.3, it is easy to verify that . Thus,
Conversely, since and
thus, we have
This completes the proof. ∎
Remark 2.2.
Based on Theorem 2.2, if the cost functional admits a unique optimal control , we have that , which means that we can use the above new optimal control problem to estimate the parameter in the original optimal control problem. Indeed, we can choose a cost functional for which this conclusion holds; further details can be found in Sections 4 and 5.
From equation (2.3), we can observe the value of the state under an input control . Following the idea investigated in Wang et al. (2020), we use a reinforcement learning (RL) structure to learn the optimal control . Motivated by the law of large numbers applied to samples of the state under the density function , Wang et al. (2020) introduced the following exploratory dynamic equation for the state ,
(2.5) |
where is the density function of the input control ,
and
for .
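Although the explicit formulas are not reproduced here, in the exploratory formulation of Wang et al. (2020) the drift of the exploratory state is the average of the original drift under the density , and the diffusion is the square root of the average of the squared original diffusion under . The short sketch below evaluates these averages numerically for an assumed Gaussian density and assumed linear coefficients; all concrete choices are illustrative.

    # Numerical evaluation of the exploratory coefficients under a density pi,
    # assuming the Wang et al. (2020) form: the exploratory drift is the
    # pi-average of b, and the exploratory diffusion is the square root of the
    # pi-average of sigma^2.  The Gaussian pi and linear b, sigma are illustrative.
    import numpy as np

    def exploratory_coefficients(b, sigma, mu, var, n_quad=64):
        # Gauss-Hermite quadrature for integrals against N(mu, var).
        nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
        u = mu + np.sqrt(var) * nodes
        w = weights / weights.sum()
        b_tilde = np.sum(w * b(u))
        sigma_tilde = np.sqrt(np.sum(w * sigma(u) ** 2))
        return b_tilde, sigma_tilde

    # Example: b(u) = u * x and sigma(u) = u * x at a fixed (t, x), pi = N(0.5, 0.1).
    x = 2.0
    b_tilde, sigma_tilde = exploratory_coefficients(
        lambda u: u * x, lambda u: u * x, mu=0.5, var=0.1)
    # For these linear coefficients, b_tilde = 0.5 * x and
    # sigma_tilde = sqrt(0.5**2 + 0.1) * x, matching the closed-form averages.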
It should be noted that the observations of the exploratory state should satisfy equation (2.3). Based on this fact, we can prove that satisfies equation (2.5).
Theorem 2.3.
Proof.
We assume that the state satisfies the following diffusion process,
(2.6) |
where the functions and are to be determined.
Let us consider a sequence of samples which are the observations of with and , where are independent sample paths of the Brownian motion , and are controls drawn from the density function , . For a given and sufficiently small , satisfies
(2.7) |
which leads to
Applying the classical law of large numbers, we have the following convergence results in probability:
where the last equality follows from the continuous dependence property of the stochastic differential equation, and is a higher-order infinitesimal of ; thus,
(2.8) |
Dividing both sides of equation (2.8) by and letting , it follows that
(2.9) |
Combining equations (2.6) and (2.9), we have that
(2.10) |
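A quick Monte Carlo experiment illustrates the averaging argument used in the proof above: one-step increments generated with controls drawn from the density have sample mean close to the exploratory drift times the step size, and sample variance close to the squared exploratory diffusion times the step size. The scalar coefficients and the Gaussian density below are our own illustrative assumptions.

    # Monte Carlo illustration of the law-of-large-numbers argument behind
    # Theorem 2.3.  The placeholder b, sigma and the Gaussian pi are assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    x, dt, n_samples = 2.0, 1e-3, 200_000
    mu, var = 0.5, 0.1                      # assumed Gaussian density pi = N(mu, var)

    def b(xx, u):      return u * xx        # placeholder drift b(t, x, u)
    def sigma(xx, u):  return u * xx        # placeholder diffusion sigma(t, x, u)

    u = rng.normal(mu, np.sqrt(var), n_samples)      # controls u_i drawn from pi
    dW = rng.normal(0.0, np.sqrt(dt), n_samples)     # independent Brownian increments
    dx = b(x, u) * dt + sigma(x, u) * dW             # increments of equation (2.3)

    b_tilde = mu * x                                 # pi-average of b
    sigma_tilde2 = (mu ** 2 + var) * x ** 2          # pi-average of sigma^2
    print(dx.mean(), b_tilde * dt)                   # approximately equal
    print(dx.var(), sigma_tilde2 * dt)               # equal up to O(dt^2) and sampling error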
Remark 2.3.
Furthermore, in the exploratory framework, we rewrite the cost functional (2.4) as follows,
(2.16) |
where the term
denotes Shannon's differential entropy, which is used to measure the level of exploration, and is the temperature constant balancing exploitation and exploration. We denote the set of all admissible density-function controls on by . Thus, our goal is to minimize the cost functional (2.16) over .
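The role of the temperature constant can already be seen in a one-dimensional toy problem. If a quadratic stage cost is penalized by the entropy term and the density is restricted to Gaussians (both restrictions are assumptions made only for this illustration), the optimal mean is the cost minimizer while the optimal variance is proportional to the temperature, so the density shrinks to a Dirac measure as the temperature tends to zero.

    # Exploitation-exploration trade-off in a static toy version of (2.16):
    # minimize E_{u~N(m,v)}[a (u - c)^2] - lam * entropy over Gaussian densities.
    # The quadratic cost and the Gaussian restriction are illustrative assumptions.
    import numpy as np
    from scipy.optimize import minimize

    a, c, lam = 2.0, 0.7, 0.1

    def objective(params):
        m, log_v = params
        v = np.exp(log_v)
        expected_cost = a * ((m - c) ** 2 + v)          # E_{u~N(m,v)}[a (u - c)^2]
        entropy = 0.5 * np.log(2.0 * np.pi * np.e * v)  # differential entropy of N(m, v)
        return expected_cost - lam * entropy

    res = minimize(objective, x0=[0.0, 0.0])
    m_opt, v_opt = res.x[0], np.exp(res.x[1])
    # m_opt is close to c = 0.7 and v_opt is close to lam / (2 * a) = 0.025:
    # the mean exploits, the variance (exploration) scales with the temperature.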
3 Hamilton-Jacobi-Bellman Approach
Now, we consider equation (2.5) with cost functional (2.16). Similar to the approach in Wang et al. (2020), employing the classical dynamic programming principle, one obtains that
and the related Hamilton-Jacobi-Bellman (HJB) equation is
(3.1) |
with terminal condition .
It is worth noting that equation (3.1) has an optimal control which satisfies
(3.2) |
Letting , reduces to a Dirac measure, where
and solves the following equation of ,
(3.3) |
By Theorem 2.2, we have that equation (3.3) has at least one solution , and thus
Furthermore, the function of , takes the minimum value at , where
Since , takes its maximum value at . The above observations motivate us to find the optimal estimate of the parameter from the optimal density function .
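In other words, the optimal density in (3.2) has a Gibbs form, proportional to the exponential of minus a Hamiltonian-type quantity divided by the temperature. The sketch below uses a generic smooth function with a unique interior minimizer (mirroring condition (3.4) in the theorem stated next) and shows numerically that the density concentrates at that minimizer as the temperature decreases; the specific function and grid are assumptions for illustration only.

    # A Gibbs density proportional to exp(-h(u)/lambda) concentrates at the
    # minimizer of h as lambda -> 0 (the Dirac limit described after (3.2)).
    import numpy as np

    def gibbs_density(h_values, lam, du):
        w = np.exp(-(h_values - h_values.min()) / lam)   # subtract min for stability
        return w / (w.sum() * du)

    u = np.linspace(-2.0, 2.0, 2001)
    du = u[1] - u[0]
    h = (u - 0.6) ** 2 + 0.1 * u ** 4                    # assumed h, unique interior minimizer

    for lam in (1.0, 0.1, 0.01):
        pi = gibbs_density(h, lam, du)
        mode = u[np.argmax(pi)]
        mean = np.sum(u * pi) * du
        print(lam, mode, mean)    # mode and mean approach argmin h as lam decreases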
Based on the above analysis, we conclude the following results.
Theorem 3.1.
If satisfies for any ,
(3.4) |
then we have that
Proof.
From condition (3.4), we have that takes the unique minimum value at , and takes the unique maximum value at . Obviously, is equal to the parameter . ∎
In the following section, we consider a linear stochastic differential equation (SDE) model to verify the results given in Theorem 3.1, where the drift or the diffusion term contains an unknown parameter.
4 Linear SDE problem
We now consider the linear SDE case, where the drift or the diffusion term is allowed to contain the unknown parameter. We first study the case in which the diffusion term contains the unknown parameter, and then investigate the case in which the drift term contains the unknown parameter.
4.1 The diffusion term with unknown parameter
We consider the following linear SDE,
(4.1) |
with initial condition , where the parameter needs to be estimated. The cost functional is given as follows,
(4.2) |
where . Based on the classical optimal control theory, the related HJB equation is given by
The optimal control is given by
We replace the parameter with , and obtain
Remark 4.1.
In the present paper, we aim to establish a theory for estimating the parameter appearing in the state dynamics. To derive the explicit solution of the optimal control , here we replace the parameter with , which is essentially the same as the method developed in Section 3. Note that once we obtain the optimal control , we can divide it by the state and then obtain the optimal estimate of the parameter . Indeed, when dealing with specific problems, this kind of structural device should be chosen according to the properties of the problem.
Then equation (2.3) becomes
(4.3) |
Now, applying Theorem 2.3, we can formulate the RL stochastic optimal control problem. The exploratory dynamic equation is,
(4.4) |
where
and
The exploratory cost functional becomes
(4.5) |
By the formula of the optimal control in (3.2), we have that
(4.6) |
and thus
where
Substituting into equation (3.1), and by a simple calculation, one obtains
(4.7) |
We assume that the classical solution of equation (4.7) satisfies
Then, it follows that
Now, we put the formula of into equation (4.7) and obtain that
(4.8) |
Thus, satisfy the following equations
and
which determine the value function , with the optimal control following the Gaussian density,
For notational simplicity, we denote by
where
We summarize the above results in the following theorem.
Theorem 4.1.
Remark 4.2.
In Theorem 4.1, we give the explicit formula of the optimal control . From this control, we can use the mean to estimate the parameter . In this exploratory stochastic optimal control problem, the variance of the optimal control reflects the level of exploration of the RL procedure. It is worth noting that, letting , reduces to the Dirac measure .
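The following sketch illustrates how the parameter estimate is read off from the learned Gaussian policy, in the spirit of Remarks 4.1 and 4.2: controls are drawn from the learned density and divided by the corresponding state values. The state-linear mean and the fixed variance assumed below are purely illustrative, since the explicit formulas are not reproduced here.

    # Reading a parameter estimate off a learned Gaussian control density.
    # The state-linear mean m(t, x) = a(t) * x and the constant variance are
    # assumptions made only for illustration.
    import numpy as np

    rng = np.random.default_rng(2)

    def learned_policy_sample(t, x, a_of_t, v, rng):
        # Gaussian density with an (assumed) state-linear mean and variance v.
        return rng.normal(a_of_t(t) * x, np.sqrt(v))

    a_of_t = lambda t: 0.8                     # stands in for the learned mean coefficient
    v = 0.05                                   # variance, proportional to the temperature
    states = rng.uniform(0.5, 2.0, size=1000)  # observed (nonzero) state values
    controls = np.array([learned_policy_sample(0.0, x, a_of_t, v, rng) for x in states])

    theta_hat = np.mean(controls / states)     # close to 0.8, the coefficient to be learned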
Now, we consider the following classical optimal control problem. The state satisfies
(4.9) |
and the cost functional is
(4.10) |
The value function is defined by
and satisfies the following HJB equation,
(4.11) |
The optimal control of HJB equation (4.11) is
Then, substituting into equation (4.11), we assume that the value function has the form
Thus, satisfy the following equations
and
The value function is given as
with the optimal control .
Thus, the optimal control of the classical optimal control problem is consistent with the mean of the optimal density function . Therefore, based on the reinforcement learning (RL) stochastic optimal control structure presented in this study, we can use the RL method to learn the optimal density function and then obtain the estimate of the parameter .
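To illustrate this identification idea end to end, the following self-contained sketch uses a toy LQ model of our own choosing (not the specification of this subsection, whose explicit formulas are omitted above): the state follows dX = u dt + sigma X dW with cost E[integral of u^2 dt + X_T^2], the classical feedback gain depends on sigma through a Riccati equation, and sweeping a candidate value inside that feedback law while observing the cost under the true dynamics recovers sigma.

    # Toy LQ illustration (our own model and cost, not the paper's): the
    # diffusion parameter enters the classical feedback gain, and minimizing the
    # sampled cost over candidate values identifies it.
    import numpy as np

    T, n = 1.0, 200
    dt = T / n
    sigma_true, x0, n_paths = 0.5, 1.0, 50_000

    def riccati_gain(sig):
        # Backward Euler for P'(t) = P^2 - sig^2 * P with P(T) = 1; the classical
        # feedback law is u*(t, x) = -P(t) x.
        P = np.empty(n + 1)
        P[n] = 1.0
        for k in range(n, 0, -1):
            P[k - 1] = P[k] - dt * (P[k] ** 2 - sig ** 2 * P[k])
        return P

    rng = np.random.default_rng(3)
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))   # common random numbers

    def sampled_cost(sig_tilde):
        # Run the TRUE dynamics dX = u dt + sigma_true * X dW under the feedback
        # law computed with the candidate value sig_tilde.
        P = riccati_gain(sig_tilde)
        x = np.full(n_paths, x0)
        running = np.zeros(n_paths)
        for k in range(n):
            u = -P[k] * x
            running += u ** 2 * dt
            x = x + u * dt + sigma_true * x * dW[:, k]
        return float(np.mean(running + x ** 2))

    candidates = np.linspace(0.1, 1.0, 10)
    costs = [sampled_cost(s) for s in candidates]
    sigma_hat = candidates[int(np.argmin(costs))]
    # With enough sample paths, sigma_hat lands at (or next to) sigma_true = 0.5.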
4.2 The drift term with unknown parameter
We could also consider the following model where the state satisfies
(4.12) |
with initial condition , where the parameter needs to be estimated. Note that the parameter appears in the drift term of the state equation (4.12). We construct the following cost functional, which differs from (4.10),
(4.13) |
Remark 4.3.
Indeed, when we consider the cost functional (4.10), the problem admits the unique optimal control , which cannot be used to estimate the parameter . From the cost functional (4.13), the related optimal control is given as follows:
We replace with , and thus
Then, equation (4.12) becomes
(4.14) |
Based on Theorem 2.3, the exploratory dynamic equation is,
(4.15) |
where
and
The exploratory cost functional is,
(4.16) |
By the formula of the optimal control in (3.2), we have that
(4.17) |
which is a Gaussian density with mean
and variance
We assume
The related HJB equation becomes
(4.18) |
from which we derive that
with , and thus
which gives the value function together with the optimal control ,
For notational simplicity, we denote by
Based on RL, we can learn the mean from the observed data. Furthermore, we can use the variance to adjust the exploration rate.
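As one concrete and deliberately generic possibility, the mean of a Gaussian control density can be learned from observed rewards with a score-function policy-gradient update, in the spirit of the continuous-time policy-gradient methods cited in the introduction (Jia and Zhou, 2022b). The one-step reward, the fixed variance, and all numerical choices below are illustrative assumptions, not the paper's algorithm.

    # A minimal score-function sketch for learning the mean of a Gaussian control
    # density from observed rewards; the variance v plays the role of the
    # exploration level.  All choices here are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(4)
    theta_true = 0.3             # the quantity the policy mean should learn (hidden from the learner)
    m, v, lr = 0.0, 0.05, 0.01   # initial mean, fixed exploration variance, learning rate
    baseline = 0.0               # running-average baseline to reduce gradient variance

    for _ in range(20_000):
        u = rng.normal(m, np.sqrt(v))                               # explore: sample a control
        r = -(u - theta_true) ** 2 + 0.01 * rng.standard_normal()   # observed noisy reward
        baseline += 0.01 * (r - baseline)
        score = (u - m) / v                                         # d/dm of log N(u; m, v)
        m += lr * (r - baseline) * score                            # stochastic gradient ascent on E[r]

    # m now fluctuates around theta_true; shrinking v (and the learning rate)
    # lowers the exploration noise, mirroring the role of the variance above.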
In this section, we have considered a parameter appearing in either the drift or the diffusion term. In Section 5, we consider a more general case where both the drift and diffusion terms contain unknown parameters.
5 Extension
Now, we generalize the model of Section 4 to the case where both the drift and diffusion terms contain unknown parameters. We show in detail how to estimate the parameters in the drift and diffusion terms. Then, we give an example to verify the main results.
5.1 General model
We consider the following stochastic differential equation with the deterministic parameters , and control ,
(5.1) |
We construct two cost functionals,
(5.2) |
and
(5.3) |
to estimate the parameters and , respectively.
First, we show how to estimate from the cost functional . By choosing appropriate functions and , we obtain a feedback optimal control from the cost functional , where is the optimal state associated with the optimal control . However, we require that contains the parameter only; indeed, if contained the parameter , we would not know the value of the new input control (see ). We replace with a deterministic process in the feedback optimal control , denoted by , where satisfies the following stochastic differential equation,
(5.4) |
where and . Furthermore, we rewrite the cost functional (5.2) as follows,
(5.5) |
where . Based on Theorem 2.3, we consider the following exploratory dynamic equation,
(5.6) |
where is the density function of the input control ,
and
for . The cost functional (5.5) is rewritten as follows,
(5.7) |
Second, we estimate via the cost functional . We choose appropriate functions and such that the feedback optimal control of the cost functional contains the parameter , where is the optimal state associated with the optimal control . We replace with a deterministic process in the feedback optimal control , denoted by , where satisfies the following stochastic differential equation,
(5.8) |
where and . Furthermore, we rewrite the cost functional (5.3) as follows,
(5.9) |
where . Then, based on Theorem 2.3, we can introduce exploratory dynamic equations almost identical to (5.6) and (5.7). For notational simplicity, we omit them.
5.2 Linear SDE with unknown parameters
Now, we consider the following example.
(5.10) |
where and need to be estimated. Note that equation (5.10) contains both parameters and . Thus, we cannot directly use the cost functional developed in Subsection 4.2 to estimate . Therefore, we first estimate the parameter ; based on this estimate, we can then use the cost functional constructed in Subsection 4.2 to estimate the parameter .
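The two-stage logic can be summarized by the following skeleton; the learner objects and attribute names are placeholders of our own, standing in for the exploratory control problems constructed in Steps 1 and 2 below.

    # Outline of the two-stage estimation procedure described above; all names
    # are placeholders, not the paper's notation.
    def estimate_both_parameters(learn_density_stage_1, learn_density_stage_2):
        # Stage 1: the first cost functional is chosen so that its feedback
        # optimal control involves only the diffusion-type parameter; learning
        # the corresponding exploratory optimal (Gaussian) density and reading
        # off its mean yields an estimate of that parameter.
        pi_1 = learn_density_stage_1()
        diffusion_param_hat = pi_1.mean
        # Stage 2: with that parameter fixed at its estimate, the cost functional
        # of Subsection 4.2 yields a feedback control involving the drift-type
        # parameter; the mean of the second learned density estimates it.
        pi_2 = learn_density_stage_2(diffusion_param_hat)
        drift_param_hat = pi_2.mean
        return diffusion_param_hat, drift_param_hat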
Step 1: We first construct the cost functional used to estimate the parameter , where
(5.11) |
Based on the classical optimal control theory, the related HJB equation is given by
The related optimal control is
We replace with , and have that
Equation (5.10) becomes
(5.12) |
From Theorem 2.3, the exploratory dynamic equation is,
(5.13) |
where
and
The exploratory cost functional is,
(5.14) |
By the formula of the optimal control in (3.2), we have that
(5.15) |
which is a Gaussian density with mean
and variance
We assume
The related HJB equation becomes
(5.16) |
from which we derive that
which gives the value function , with the optimal control following a Gaussian density,
For notational simplicity, we denote by
where
Step 2: We then construct the cost functional used to estimate the parameter , where
(5.17) |
Minimizing the cost functional (5.17), the related optimal control is given by
We replace with , and thus
Then, equation (5.10) becomes
(5.18) |
From Theorem 2.3, the exploratory dynamic equation is,
(5.19) |
where
and
The exploratory cost functional is,
(5.20) |
By the formula of the optimal control in (3.2), we have that
(5.21) |
which is a Gaussian density with mean
and variance
We assume
The related HJB equation becomes
(5.22) |
from which we derive that
which gives the value function , with the optimal control following a Gaussian density,
For notational simplicity, we denote by
We conclude the above main results in the following theorem.
Theorem 5.1.
When both the drift and diffusion terms of the linear SDE (5.10) contain the unknown parameters and , based on the exploratory equations (5.13) and (5.19) and the related cost functionals (5.14) and (5.20), we obtain the optimal density functions for the parameters and , respectively,
where
Based on and , we can learn the unknown parameters and .
Remark 5.1.
Wang and Zhou (2020) investigated a continuous-time mean-variance portfolio selection model with reinforcement learning, and developed an implementable reinforcement learning algorithm. Based on the results in Wang and Zhou (2020), we can obtain the parameters appearing in the density functions and , which can be used to estimate the parameters and . Based on the estimates of and , we can carry out empirical analysis for the classical optimal control problem, and so on.
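As a final illustration of this remark, once the two Gaussian densities have been learned, their means give the parameter estimates and the fitted model can be simulated for empirical analysis. The state-linear means, the numerical values, and the simulation settings in the sketch below are illustrative assumptions.

    # From learned Gaussian densities to parameter estimates and a fitted model.
    # The assumed learned densities are pi_1*(. | x) = N(a1 * x, v1) for the
    # diffusion-type parameter and pi_2*(. | x) = N(a2 * x, v2) for the
    # drift-type parameter; all values here are illustrative.
    import numpy as np

    rng = np.random.default_rng(5)

    a1, v1 = 0.25, 0.02
    a2, v2 = 0.08, 0.01
    sigma_hat, b_hat = a1, a2     # estimates read off the (assumed state-linear) means

    # Empirical analysis with the fitted linear SDE dX = b_hat * X dt + sigma_hat * X dW.
    T, n, n_paths = 1.0, 250, 10_000
    dt = T / n
    x = np.ones(n_paths)
    for _ in range(n):
        x *= 1.0 + b_hat * dt + sigma_hat * np.sqrt(dt) * rng.standard_normal(n_paths)

    print(x.mean(), np.exp(b_hat * T))   # simulated mean vs. model-implied mean, approximately equal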
6 Conclusion
Combining the stochastic optimal control model with a reinforcement learning structure, we have developed a novel approach to learn the unknown parameters appearing in the drift and diffusion terms of a stochastic differential equation. By choosing an appropriate cost functional, we can obtain a feedback optimal control which contains the unknown parameter , denoted by . We replace the unknown parameter in the optimal control with a new deterministic control , and obtain a new input control . Substituting the new control into the related stochastic differential equation and cost functional, we establish the mathematical framework of the exploratory dynamic equation and cost functional, which is consistent with the structure introduced in Wang et al. (2020).
Indeed, the optimal control of the new exploratory control problem can be used to estimate the unknown parameters. Therefore, we consider the linear stochastic differential equation case where the drift or the diffusion term contains an unknown parameter. Then, we investigate the general case where both the drift and diffusion terms contain unknown parameters. For the above cases, we show that the optimal density function is Gaussian and can be used to estimate the unknown parameters. When both the drift and diffusion terms of the linear SDE contain the unknown parameters and , based on the exploratory equations and the related cost functionals, we obtain the optimal density functions for the parameters and , respectively. The optimal density functions are given by
where
Based on and , we can learn the unknown parameters and .
In the present paper, we have focused on developing the theory of a stochastic optimal control approach with a reinforcement learning structure to estimate the unknown parameters appearing in the model. Indeed, based on existing optimal control learning methods, for example the policy evaluation and policy improvement developed in Wang and Zhou (2020), we can learn the optimal density functions and then obtain estimates of the unknown parameters. These estimates are useful in related investment portfolio problems, for example the mean-variance investment portfolio. We will further consider these problems in future work.
References
- Basak and Chabakauri (2010) S. Basak and G. Chabakauri. Dynamic mean-variance asset allocation. The Review of Financial Studies, 23(8):2970–3016, 2010.
- Björk et al. (2014) T. Björk, A. Murgoci, and X. Y. Zhou. Mean-variance portfolio optimization with state-dependent risk aversion. Mathematical Finance, 24:1–24, 2014.
- Dai et al. (2021) M. Dai, H. Jin, S. Kou, and Y. Xu. A dynamic mean-variance analysis for log returns. Management Science, 67(2):1093–1108, 2021.
- Dong (2022) Y. C. Dong. Randomized optimal stopping problem in continuous time and reinforcement learning algorithm. arxiv.org/abs/2208.02409, pages 1–19, 2022.
- Doya (2000) K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12(1):219–245, 2000.
- Fleming and Rishel (1975) W. H. Fleming and R. W. Rishel. Deterministic and stochastic optimal control. Springer-Verlag, New York, 1975.
- Fleming and Soner (2006) W. H. Fleming and H. M. Soner. Controlled Markov processes and viscosity solutions. Springer, New York, 2006.
- Gao et al. (2022) X. Gao, Z. Q. Xu, and X. Y. Zhou. State-dependent temperature control for Langevin diffusions. SIAM Journal on Control and Optimization, 60(3):1250–1268, 2022.
- Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press, http://www.deeplearningbook.org, 2016.
- Han et al. (2023) X. Han, R. Wang, and X. Y. Zhou. Choquet regularization for continuous-time reinforcement learning. pages 1–35, 2023.
- Jia and Zhou (2022a) Y. Jia and X. Y. Zhou. Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. The Journal of Machine Learning Research, 23(154):1–55, 2022a.
- Jia and Zhou (2022b) Y. Jia and X. Y. Zhou. Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. The Journal of Machine Learning Research, 23(275):1–50, 2022b.
- Jia and Zhou (2022c) Y. Jia and X. Y. Zhou. q-learning in continuous time. arXiv: 2207.00713, 2022c.
- Pardoux and Peng (1990) E. Pardoux and S. Peng. Adapted solution of a backward stochastic differential equation. Systems & Control Letters, 14(1):55–61, 1990.
- Peng (1990) S. Peng. A general stochastic maximum principle for optimal control problems. SIAM Journal on Control and Optimization, 28(4):966–979, 1990.
- Peng (1992) S. Peng. A generalized dynamic programming principle and Hamilton-Jacobi-Bellman equation. Stochastics and Stochastic Reports, 38:119–134, 1992.
- Sutton and Barto (2018) R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
- Tang et al. (2022) W. Tang, Y. P. Zhang, and X. Y. Zhou. Exploratory HJB equations and their convergence. SIAM Journal on Control and Optimization, 60(6):3191–3216, 2022.
- Wang and Zhou (2020) H. Wang and X. Y. Zhou. Continuous-time mean-variance portfolio selection: A reinforcement learning framework. Mathematical Finance, 30(4):1273–1308, 2020.
- Wang et al. (2020) H. Wang, T. Zariphopoulou, and X. Y. Zhou. Exploration versus exploitation in reinforcement learning: A stochastic control approach. Journal of Machine Learning Research, 21(198):1–34, 2020.
- Yong and Zhou (1999) J. Yong and X. Y. Zhou. Stochastic controls: Hamiltonian systems and HJB equations. Springer, New York, 1999.
- Zhou (2023) X. Y. Zhou. The curse of optimality, and how to break it? Part V: New Frontiers for Stochastic Control in Finance, Cambridge University Press, 2023.
- Zhou and Li (2000) X. Y. Zhou and D. Li. Continuous-time mean-variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization, 42:19–33, 2000.