
Simple method for efficiently solving dynamic models with continuous actions using policy gradient

Takeshi Fukasawa Graduate School of Public Policy, The University of Tokyo. Email: [email protected]
The replication code of the numerical experiments in this article is available at https://github.com/takeshi-fukasawa/dynamic_opt_conti_action.
Abstract

This study proposes the Value Function-Policy Gradient Iteration-Spectral (VF-PGI-Spectral) algorithm, which efficiently solves discrete-time infinite-horizon dynamic models with continuous actions. It incorporates the spectral algorithm to accelerate convergence. The method is applicable not only to single-agent dynamic optimization problems, but also to multi-agent dynamic games, which previously proposed methods cannot handle. Moreover, the proposed algorithm is not limited to models with specific functional forms, and it applies to models with multiple continuous actions. Numerical experiments demonstrate the effective performance of the proposed algorithm.

Keywords: Dynamic optimization with continuous actions; Single-agent dynamic models; Multi-agent dynamic games; VF-PGI-Spectral algorithm

1 Introduction

Dynamic optimization problems involving continuous actions are widespread in economics and other fields. Examples include consumption, investment, R&D, advertising, and dynamic pricing decisions, which are extensively studied in the literature of macroeconomics, industrial organization, labor economics, and others. Although such problems are common and important, one challenge to conducting quantitative analyses is the computational burden. When we apply standard value function iteration (VFI) to solve a discrete-time infinite-horizon dynamic model, the computational cost is generally determined by the number of iterations until convergence and the computational cost per iteration. When applying VFI to solve a dynamic model, we must solve for the optimal continuous actions $a$ at each state given the candidate values of the value function $V$. This optimization problem is typically nonlinear, and the computational step can be time-consuming.[1]

[1] Discretizing the possible actions may help avoid the nonlinear optimization problem, but a coarse discretization of actions can reduce the precision of the approximation. Conversely, when the number of possible actions is large, evaluating long-term profits for all possible actions can be time-consuming. Hence, this approach has limitations. See also the discussion in Section 12 of Judd (1998).

This study introduces a new, simple, computationally efficient “Value Function-Policy Gradient Iteration-Spectral (VF-PGI-Spectral) algorithm” for quantitatively solving discrete-time infinite-horizon dynamic models with continuous actions. Though the proposed algorithm inherits the idea of VFI, we jointly update value functions $V$ and action variables $a$ (policy functions) in each iteration. Value functions $V$ are updated as in value function iteration, using the candidate values of the actions $a$. Continuous actions $a$ are updated so that they step in the direction of the gradient of the action value function $Q$, given the candidate value functions $V$. Action value functions are the dynamic version of “profit.” The computational cost per iteration is low because these updating steps do not require nonlinear root-finding or optimization procedures. Note that we can easily introduce box constraints on the domain of actions, such as nonnegativity constraints.

This study combines the proposed iteration with the spectral algorithm, which is known, mainly in the field of numerical analysis, for its effectiveness in reducing the number of iterations needed to solve nonlinear equations. The spectral algorithm shares similarities with Newton’s method, which is well known for its rapid local convergence. Unlike Newton’s method, the spectral algorithm does not require specifying the first derivative of the function to be solved, which is its advantage. This study shows the results of numerical experiments, in which the VF-PGI-Spectral algorithm is dozens of times faster than the traditional VFI algorithm and previously proposed methods.

The proposed VF-PGI-Spectral algorithm offers the following advantages. First, the algorithm is intuitive and easy to code. Second, the algorithm is not limited to models with specific functional forms, and it applies to both single-agent dynamic models and multi-agent dynamic games; it is also applicable to dynamic models with multiple continuous actions. Third, it attains fast convergence because, unlike the traditional VFI, it does not require solving nonlinear optimization problems within the iterations.

The remainder of this paper is organized as follows. Section 3 discusses the proposed algorithm. In Section 4, we examine how to incorporate box constraints on continuous actions, such as nonnegativity constraints, into the proposed algorithm. Section 5 shows the results of numerical experiments. Sections 2 and 6 discuss the relationships with previous studies and previously proposed methods. Section 7 concludes. Appendix A contains all proofs of mathematical statements. Appendix B describes the details of the numerical experiments, and Appendix C shows additional numerical results.

2 Literature

This study contributes to the literature on computationally efficient algorithms for solving dynamic optimization problems with continuous actions. Carroll (2006) proposed the Endogenous Gridpoint Method (EGM), which avoids time-consuming nonlinear optimization steps, for solving single-agent dynamic optimization problems with a continuous action. Maliar and Maliar (2013) and Arellano et al. (2016) proposed the Envelope Condition Method (ECM) for single-agent dynamic models with continuous actions.

Though useful, the applicability of these methods largely depends on the functional form of the model. Moreover, as discussed in Section 6, they are not directly applicable to multi-agent dynamic games because of strategic interactions. Furthermore, when extending these methods to single-agent dynamic models with multiple continuous actions, we additionally need to solve time-consuming nonlinear equations in some models.[2] Unlike these methods, the proposed VF-PGI-Spectral algorithm is not restricted to models with specific functional forms, and it applies to both single-agent and multi-agent dynamic models with single and multiple continuous actions.

[2] Iskhakov (2015) discusses the class of models with multiple continuous actions where EGM can be applied without additional nonlinear problems.

This study also builds on the existing numerical-analysis literature on the spectral algorithm. The spectral algorithm was first proposed to solve nonlinear continuous optimization problems (Barzilai and Borwein, 1988) and was later extended to solve nonlinear equations (La Cruz et al., 2006; Varadhan and Roland, 2008). When dealing with dynamic optimization problems with continuous actions, we essentially need to solve two types of problems: nonlinear equations related to value functions, and nonlinear optimization problems related to continuous actions. The current study builds on this literature and proposes a new algorithm for efficiently solving dynamic optimization models with continuous actions.

The idea of introducing the spectral algorithm to efficiently solve nonlinear problems is not new in the economics literature. In studies on industrial organization, Reynaerts et al. (2012) and Conlon and Gortmaker (2020) examined the use of spectral-type algorithms to reduce the number of iterations in the inner loop of estimating static random-coefficient logit models with aggregate data (the BLP method proposed by Berry et al., 1995).[3] Aguirregabiria and Marcoux (2021) introduced the spectral algorithm in the context of nested pseudo-likelihood estimation of dynamic discrete games. Fukasawa (2024), who proposed novel, fast, and simple algorithms for efficiently solving the inner loop of static/dynamic BLP estimations, also introduced spectral-type algorithms and emphasized the cautious selection of step sizes. The current study builds on these findings and discusses computationally efficient algorithms for solving dynamic models with continuous actions.

[3] These studies also introduced the SQUAREM algorithm, which can be considered a minor extension of the spectral algorithm. See Fukasawa (2024) for details.

The concept of using gradients as an updating direction[4] can be found in the reinforcement learning literature (deterministic policy gradient method; Silver et al., 2014). However, the settings may differ, and we discuss these differences in Section 6.

[4] In standard continuous optimization problems, gradient descent-type methods also use gradients as an updating direction.

3 Model and Algorithms

In this section, we first examine solution methods for single-agent dynamic optimization problems (Section 3.1), and subsequently examine multi-agent dynamic games (Section 3.2). Although single-agent dynamic models are special cases of multi-agent dynamic games, we first discuss the former for clarity. Readers who are interested only in single-agent models can skip Section 3.2. Section 3.3 discusses the spectral algorithm, which effectively reduces the number of iterations until convergence. In the following sections, primes on variables denote next-period values.

3.1 Single-agent dynamic model

Consider a single-agent infinite-horizon dynamic optimization problem with continuous action variables $a(s)$ characterized by the following Bellman equation:

$V(s)=\max_{a(s)\in\mathcal{A}(s)}Q(a(s),s;V)\equiv r(a^{*}(s),s)+\beta\int V\left(s^{\prime}\right)p\left(s^{\prime}|s,a^{*}(s)\right)ds^{\prime},$

where $a(s)\equiv\left(a^{d}(s)\right)_{d=1,\cdots,D}$ denotes the action of the agent at state $s$. Note that the action can be multi-dimensional, and let $D$ be the dimension of the actions. Let $a^{*}(s)$ be the solution to the maximization problem. $\mathcal{S}$ denotes the state space, which can be either discrete or continuous. $\mathcal{A}(s)$ denotes the convex closed set of possible actions at state $s$. For now, we assume $\mathcal{A}(s)=\mathbb{R}^{D}$, but we can easily introduce box constraints, as discussed in Section 4. $p\left(s^{\prime}|s,a(s)\right)$ denotes the state transition probability. $Q(a(s),s;V)$ denotes the action value function, and $V(s)$ denotes the value function. Note that $Q$ depends on the value function $V$, and $Q\left(a^{*}(s),s;V\right)=V(s)$ holds.

Assuming that $Q$ is differentiable with respect to $a$, the following equations hold:

$0=\frac{\partial Q(a^{*}(s),s;V)}{\partial a^{*}(s)}=\frac{\partial r\left(a^{*}(s),s\right)}{\partial a^{*}(s)}+\beta\frac{\partial\left[\int V\left(s^{\prime}\right)p\left(s^{\prime}|s,a^{*}(s)\right)ds^{\prime}\right]}{\partial a^{*}(s)},\qquad(1)$

$V(s)=r(a^{*}(s),s)+\beta\int V\left(s^{\prime}\right)p\left(s^{\prime}|s,a^{*}(s)\right)ds^{\prime}.\qquad(2)$

Then, the problem reduces to solving the nonlinear equations (1) and (2) for $\left\{a^{*}(s)\right\}_{s\in\mathcal{S}}$ and $\left\{V(s)\right\}_{s\in\mathcal{S}}$. In general, the state space of $s$ is not finite. In this case, we should take a suitable finite set of grid points $\widehat{\mathcal{S}}\subset\mathcal{S}$ to approximate the solution well. If the state space $\mathcal{S}$ is finite, let $\widehat{\mathcal{S}}=\mathcal{S}$. The alternative equations that we should solve are as follows:

$0=\frac{\partial Q(a^{*}(\widehat{s}),\widehat{s};V)}{\partial a^{*}(\widehat{s})}=\frac{\partial r\left(a^{*}(\widehat{s}),\widehat{s}\right)}{\partial a^{*}(\widehat{s})}+\beta\frac{\partial\left[\int\overline{V}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{*}(\widehat{s})\right)ds^{\prime}\right]}{\partial a^{*}(\widehat{s})},\qquad(3)$

$V\left(\widehat{s}\right)=r\left(a^{*}(\widehat{s}),\widehat{s}\right)+\beta\int\overline{V}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{*}(\widehat{s})\right)ds^{\prime}.\qquad(4)$

Here, $\overline{V}$ denotes the interpolated values of $V$.[5] If $\widehat{\mathcal{S}}\subsetneq\mathcal{S}$, we cannot directly evaluate the values of $V(s)$ for $s\notin\widehat{\mathcal{S}}$, and we interpolate these values based on the values of $\left\{V(\widehat{s})\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$. We solve these equations for $\left\{a^{*}(\widehat{s})\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$ and $\left\{V(\widehat{s})\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$.

[5] For example, when using Chebyshev polynomials as basis functions, $\overline{V}$ can be constructed as $\overline{V}(s)=\sum_{m=0}^{M-1}\theta_{m}T_{m}(s)$, where $T_{m}(s)$ denotes the order-$m$ Chebyshev basis function. $\theta_{m=0,1,\cdots,M-1}$ are unknown coefficients, and their values are computed based on the values of $\left\{V(\widehat{s})\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$. Note that $T_{m}$ satisfies $T_{0}(s)=1$, $T_{1}(s)=s$, and $T_{m}(s)=2sT_{m-1}(s)-T_{m-2}(s)\ (m\geq 2)$.
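To make the interpolation step concrete, the following is a minimal Python sketch of the Chebyshev construction in footnote [5], assuming a one-dimensional state already rescaled to $[-1,1]$ and coefficients fitted by least squares; the function names are illustrative and are not taken from the replication code.

```python
import numpy as np

def chebyshev_basis(s, M):
    """Evaluate T_0(s), ..., T_{M-1}(s) via T_m(s) = 2 s T_{m-1}(s) - T_{m-2}(s)."""
    s = np.atleast_1d(s)
    T = np.empty((s.size, M))
    T[:, 0] = 1.0
    if M > 1:
        T[:, 1] = s
    for m in range(2, M):
        T[:, m] = 2.0 * s * T[:, m - 1] - T[:, m - 2]
    return T

def fit_value_coefficients(s_grid, V_grid, M):
    """Fit theta in Vbar(s) = sum_m theta_m T_m(s) to grid values by least squares."""
    theta, *_ = np.linalg.lstsq(chebyshev_basis(s_grid, M), V_grid, rcond=None)
    return theta

def vbar(s, theta):
    """Interpolated value function at (possibly off-grid) states s."""
    return chebyshev_basis(s, len(theta)) @ theta

# Usage: fit V(s) = s**2 on 7 Chebyshev nodes in [-1, 1], then evaluate off-grid.
nodes = np.cos((2 * np.arange(7) + 1) * np.pi / 14)
theta = fit_value_coefficients(nodes, nodes**2, M=5)
print(vbar(0.3, theta))   # approximately [0.09]
```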

Algorithm 1 shows the proposed Value Function-Policy Gradient Iteration (VF-PGI) algorithm.

1. Take grid points $\widehat{s}\in\widehat{\mathcal{S}}$. Set initial values $V^{(0)}\equiv\left\{V^{(0)}\left(\widehat{s}\right)\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$ and $a^{(0)}\equiv\left\{a^{(0)}\left(\widehat{s}\right)\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$.

2. Iterate the following ($n=0,1,\cdots$):

   (a) Find $a^{(n*)}\left(\widehat{s}\right)$ by:

   $a^{(n*)}\left(\widehat{s}\right)=\Phi_{a}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)\equiv a^{(n)}\left(\widehat{s}\right)+\lambda\cdot\frac{\partial Q\left(a^{(n)}\left(\widehat{s}\right),\widehat{s};V^{(n)}\right)}{\partial a^{(n)}\left(\widehat{s}\right)},$

   where

   $\frac{\partial Q\left(a^{(n)}\left(\widehat{s}\right),\widehat{s};V^{(n)}\right)}{\partial a^{(n)}\left(\widehat{s}\right)}=\frac{\partial r\left(a^{(n)}\left(\widehat{s}\right),\widehat{s}\right)}{\partial a^{(n)}\left(\widehat{s}\right)}+\beta\frac{\partial\left[\int\overline{V}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{(n)}\left(\widehat{s}\right)\right)ds^{\prime}\right]}{\partial a^{(n)}\left(\widehat{s}\right)}.$

   (b) Find

   $V^{(n*)}\left(\widehat{s}\right)=\Phi_{V}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)\equiv r\left(a^{(n)}\left(\widehat{s}\right),\widehat{s}\right)+\beta\int\overline{V}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{(n)}\left(\widehat{s}\right)\right)ds^{\prime}$

   for $\widehat{s}\in\widehat{\mathcal{S}}$.

   (c) Update $V$ and $a$ in either of the following ways:

   • Simple: $V^{(n+1)}\left(\widehat{s}\right)\leftarrow V^{(n*)}\left(\widehat{s}\right)$, $a^{(n+1)}\left(\widehat{s}\right)\leftarrow a^{(n*)}\left(\widehat{s}\right)$

   • Spectral algorithm: For $z\in\left\{V,a^{d=1,\cdots,D}\right\}$, $z^{(n+1)}\left(\widehat{s}\right)\leftarrow\alpha_{z}^{(n)}z^{(n*)}\left(\widehat{s}\right)+(1-\alpha_{z}^{(n)})z^{(n)}\left(\widehat{s}\right)$, where $\alpha_{z}^{(n\geq 1)}\equiv\frac{\left\|z^{(n)}-z^{(n-1)}\right\|_{2}}{\left\|F_{z}\left(V^{(n)},a^{(n)}\right)-F_{z}\left(V^{(n-1)},a^{(n-1)}\right)\right\|_{2}}$ and $F_{z}(V,a)\equiv\Phi_{z}\left(V,a\right)-z$.

   (d) Exit the iteration if $\left\|V^{(n+1)}-V^{(n)}\right\|<\epsilon_{V}$ and $\left\|a^{d(n+1)}-a^{d(n)}\right\|<\epsilon_{a^{d}}\ (d=1,\cdots,D)$.

Algorithm 1 VF-PGI algorithm (Single-agent dynamic model)

Regarding the VF-PGI algorithm (Algorithm 1), we can explain the steps intuitively and treat the iterations as a learning process. We first set initial values of the value function $V$ and the actions $a$. If the gradient of the “long-run profit” $Q$ with respect to the action in a state, given the current value functions and actions, is positive, we increase the value of the action, expecting a higher long-term profit. If it is negative, we reduce the value of the action. When the gradient is close to zero, the action is likely to be optimal given the value functions, and we rarely change the value of the action. As for value functions, we update their values so that they are consistent with the Bellman equations given the current value functions and actions. Note that the actions used in this step might not be optimal. We repeat these steps until the current and updated values are sufficiently close to each other, and then terminate the process.

In the algorithm, we can incorporate the spectral algorithm, which has been applied to accelerate iterations in solving continuous optimization problems and nonlinear equations. When introducing the spectral algorithm, the values of $V,a$ are updated by $z^{(n+1)}\left(\widehat{s}\right)\leftarrow\alpha_{z}^{(n)}z^{(n*)}\left(\widehat{s}\right)+(1-\alpha_{z}^{(n)})z^{(n)}\left(\widehat{s}\right)$ for $z\in\left\{V,a^{d=1,\cdots,D}\right\}$, where $\alpha_{z}^{(n)}\equiv\frac{\left\|z^{(n)}-z^{(n-1)}\right\|_{2}}{\left\|F_{z}\left(V^{(n)},a^{(n)}\right)-F_{z}\left(V^{(n-1)},a^{(n-1)}\right)\right\|_{2}}$. Here, $\alpha_{z}^{(n)}$ denote step sizes. The step sizes work as learning rates, and the spectral algorithm accelerates the learning by automatically choosing their values according to a predetermined formula. For details of the spectral algorithm, see Section 3.3. Note that without the spectral algorithm, the VF-PGI algorithm works poorly due to slow convergence, so introducing the spectral algorithm is essential.
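As an illustration, here is a minimal Python sketch of the VF-PGI-Spectral iteration of Algorithm 1 (the paper's replication code itself is in MATLAB). The model-specific pieces are assumed to be supplied by the user: `bellman_rhs(V, a)` returns the right-hand side of the Bellman equation at every grid point, and `dQ_da(V, a)` returns $\partial Q/\partial a$; these names, the default values, and the residual-based stopping rule are illustrative.

```python
import numpy as np

def vf_pgi_spectral(V0, a0, bellman_rhs, dQ_da, lam=1e-3, alpha0=1.0,
                    tol=1e-10, max_iter=10_000):
    """Jointly update V (toward the Bellman right-hand side) and a (a gradient
    step on Q), with variable-type-specific spectral step sizes as in step 2(c)."""
    V, a = V0.copy(), a0.copy()
    alpha = {"V": alpha0, "a": alpha0}
    V_old = a_old = F_old = None
    for n in range(max_iter):
        # Residuals F_z(V, a) = Phi_z(V, a) - z
        F = {"V": bellman_rhs(V, a) - V,      # Phi_V(V, a) - V
             "a": lam * dQ_da(V, a)}          # Phi_a(V, a) - a = lambda * dQ/da
        if max(np.max(np.abs(F["V"])), np.max(np.abs(F["a"]))) < tol:
            return V, a, n
        if n >= 1:
            # Spectral step sizes alpha_z = ||z_n - z_{n-1}|| / ||F_z_n - F_z_{n-1}||
            alpha["V"] = np.linalg.norm(V - V_old) / np.linalg.norm(F["V"] - F_old["V"])
            alpha["a"] = np.linalg.norm(a - a_old) / np.linalg.norm(F["a"] - F_old["a"])
        V_old, a_old, F_old = V.copy(), a.copy(), F
        V = V + alpha["V"] * F["V"]           # alpha_V * Phi_V + (1 - alpha_V) * V
        a = a + alpha["a"] * F["a"]
    return V, a, max_iter
```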

To compare with the traditional Value Function Iteration (VFI) method, I also show the VFI algorithm in Algorithm 2. In VFI, we need to solve a nonlinear equation for the optimal actions at each grid point, as shown in Step 2(a). Because solving nonlinear equations usually takes nonnegligible time, this step is computationally costly. In contrast, in VF-PGI, we do not have to solve nonlinear equations to update the values of actions in Step 2(a). Though we have to update value functions and actions jointly, we can expect the computational cost per iteration to be lower.

1. Take grid points $\widehat{s}\in\widehat{\mathcal{S}}$. Set initial values $V^{(0)}\equiv\left\{V^{(0)}\left(\widehat{s}\right)\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$.

2. Iterate the following ($n=0,1,\cdots$):

   (a) Solve for $a^{(n*)}\left(\widehat{s}\right)=\Phi_{a}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)$ satisfying the following nonlinear equation for each $\widehat{s}\in\widehat{\mathcal{S}}$:

   $0=\frac{\partial Q(a^{(n*)}\left(\widehat{s}\right),\widehat{s};V^{(n)})}{\partial a^{(n*)}\left(\widehat{s}\right)}=\frac{\partial r\left(a^{(n*)}\left(\widehat{s}\right),\widehat{s}\right)}{\partial a^{(n*)}\left(\widehat{s}\right)}+\beta\frac{\partial\left[\int\overline{V}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{(n*)}\left(\widehat{s}\right)\right)ds^{\prime}\right]}{\partial a^{(n*)}\left(\widehat{s}\right)}.$

   (b) Find

   $V^{(n*)}\left(\widehat{s}\right)=r\left(a^{(n*)}\left(\widehat{s}\right),\widehat{s}\right)+\beta\int\overline{V}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{(n*)}\left(\widehat{s}\right)\right)ds^{\prime}$

   for $\widehat{s}\in\widehat{\mathcal{S}}$.

   (c) Update $V$ in either of the following ways:

   • Simple: $V^{(n+1)}\left(\widehat{s}\right)\leftarrow V^{(n*)}\left(\widehat{s}\right)$

   • Spectral algorithm: $V^{(n+1)}\left(\widehat{s}\right)\leftarrow\alpha_{V}^{(n)}V^{(n*)}\left(\widehat{s}\right)+(1-\alpha_{V}^{(n)})V^{(n)}\left(\widehat{s}\right)$, where $\alpha_{V}^{(n\geq 1)}\equiv\frac{\left\|V^{(n)}-V^{(n-1)}\right\|_{2}}{\left\|F_{V}\left(V^{(n)},a^{(n)}\right)-F_{V}\left(V^{(n-1)},a^{(n-1)}\right)\right\|_{2}}$ and $F_{V}(V,a)\equiv\Phi_{V}\left(V,a\right)-V$.

   (d) Exit the iteration if $\left\|V^{(n+1)}-V^{(n)}\right\|<\epsilon_{V}$.

Algorithm 2 VFI algorithm (Single-agent dynamic model)
Remark 1.

In VF-PGI, the parameter $\lambda$ corresponds to the learning rate, and the iteration can be unstable when $\lambda$ is large. In contrast, a small $\lambda$ can lead to slow convergence. Note that when we combine the spectral algorithm, the convergence speed is not very sensitive to the value of $\lambda$ as long as $\lambda$ is relatively small. See also the numerical results in Section 5.1.

Remark 2.

Unlike VFI, which is a contraction under regularity conditions, there is no guarantee that VF-PGI is a contraction. However, in the numerical experiments, VF-PGI-Spectral converges for appropriate values of $\lambda$.

3.2 Multi-agent dynamic game

Next, we consider a dynamic model in which multiple agents interact over time. Let $\pi_{j}\left(a_{t+\tau},s_{t+\tau}\right)$ be agent $j$'s profit when agents choose actions $a_{t+\tau}\equiv\left(a_{jt+\tau}\right)_{j\in\mathcal{J}}$ at state $s_{t+\tau}$. Here, $\mathcal{J}$ denotes the set of agents. Given a current state $s_{t}$, agent $j$'s expected future profit is $E\left[\sum_{\tau=0}^{\infty}\beta^{\tau}\pi_{j}\left(a_{t+\tau},s_{t+\tau}\right)\,\middle|\,s_{t}\right]$. Actions can be multidimensional, and let $a_{j}\equiv\left(a_{j}^{d}\right)_{d=1,\cdots,D}$.

Here, we focus on pure-strategy Markov perfect equilibria. We assume the existence of such an equilibrium, and let $a_{j}^{*}:\mathcal{S}\rightarrow\mathcal{A}_{j}$ be the Markov strategy of agent $j$. $a^{*}\equiv\left(a_{j}^{*}\right)_{j\in\mathcal{J}}$ denotes the profile of these strategies, where $a^{*}:\mathcal{S}\rightarrow\mathcal{A}$.

Then, the value function of firm $j$ given a state $s$ can be written recursively:

$V_{j}(s)=\max_{a_{j}(s)\in\mathcal{A}_{j}(s)}Q_{j}\left(a_{j}(s),a_{-j}^{*}(s),s;V_{j}\right)\equiv r_{j}(a_{j}^{*}(s),a_{-j}^{*}(s),s)+\beta\int V_{j}\left(s^{\prime}\right)p(s^{\prime}|s,a_{j}^{*}(s),a_{-j}^{*}(s))ds^{\prime},$

where $a_{j}(s)$ is the action of agent $j$ at state $s$. $Q_{j}\left(a_{j}(s),a_{-j}(s),s;V_{j}\right)$ is the action value function of firm $j$, and $V_{j}(s)$ is the value function. $p\left(s^{\prime}|s,a(s)\right)$ is the state transition probability. Note that $Q_{j}\left(a_{j}^{*}(s),a_{-j}^{*}(s),s;V_{j}\right)=V_{j}(s)$ holds.

If the action $a_{j}(s)$ is a continuous variable and $Q_{j}$ is differentiable with respect to $a_{j}(s)$, the following equations hold:

$0=\frac{\partial Q_{j}(a_{j}^{*}(s),a_{-j}^{*}(s),s;V_{j})}{\partial a_{j}^{*}(s)}=\frac{\partial r_{j}\left(a_{j}^{*}(s),a_{-j}^{*}(s),s\right)}{\partial a_{j}^{*}(s)}+\beta\frac{\partial\left[\int V_{j}\left(s^{\prime}\right)p(s^{\prime}|s,a_{j}^{*}(s),a_{-j}^{*}(s))ds^{\prime}\right]}{\partial a_{j}^{*}(s)},\qquad(5)$

$V_{j}(s)=r_{j}(a^{*}(s),s)+\beta\int V_{j}\left(s^{\prime}\right)p\left(s^{\prime}|s,a^{*}(s)\right)ds^{\prime}.\qquad(6)$

Then, the problem reduces to solving the nonlinear equations (5) and (6) for $\left\{a_{j}^{*}(s)\right\}_{j\in\mathcal{J},s\in\mathcal{S}}$ and $\left\{V_{j}(s)\right\}_{j\in\mathcal{J},s\in\mathcal{S}}$.

In general, the state space of $s$ is not finite. In this case, we should take a suitable finite set of grid points $\widehat{s}\in\widehat{\mathcal{S}}\subset\mathcal{S}$ to approximate the solution well. Then, we should alternatively solve the following system of equations:

$0=\frac{\partial Q_{j}\left(a_{j}^{*}\left(\widehat{s}\right),a_{-j}^{*}\left(\widehat{s}\right),\widehat{s};V_{j}\right)}{\partial a_{j}^{*}\left(\widehat{s}\right)}=\frac{\partial r_{j}\left(a_{j}^{*}\left(\widehat{s}\right),a_{-j}^{*}\left(\widehat{s}\right),\widehat{s}\right)}{\partial a_{j}^{*}\left(\widehat{s}\right)}+\beta\frac{\partial\left[\int\overline{V_{j}}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a_{j}^{*}\left(\widehat{s}\right),a_{-j}^{*}\left(\widehat{s}\right)\right)ds^{\prime}\right]}{\partial a_{j}^{*}\left(\widehat{s}\right)},\qquad(7)$

$V_{j}\left(\widehat{s}\right)=r_{j}\left(a^{*}\left(\widehat{s}\right),\widehat{s}\right)+\beta\int\overline{V_{j}}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{*}\left(\widehat{s}\right)\right)ds^{\prime}.\qquad(8)$

Algorithm 3 shows the proposed VF-PGI algorithm for the dynamic game.

1. Take grid points $\widehat{s}\in\widehat{\mathcal{S}}$. Set initial values $V^{(0)}\equiv\left\{V_{j}^{(0)}\left(\widehat{s}\right)\right\}_{j\in\mathcal{J},\widehat{s}\in\widehat{\mathcal{S}}}$ and $a^{(0)}\equiv\left\{a_{j}^{(0)}\left(\widehat{s}\right)\right\}_{j\in\mathcal{J},\widehat{s}\in\widehat{\mathcal{S}}}$.

2. Iterate the following ($n=0,1,\cdots$):

   (a) Solve for $\left\{a_{j}^{(n*)}\left(\widehat{s}\right)\right\}_{j\in\mathcal{J},\widehat{s}\in\widehat{\mathcal{S}}}$:

   $a_{j}^{(n*)}\left(\widehat{s}\right)=\Phi_{a,j}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)\equiv a_{j}^{(n)}\left(\widehat{s}\right)+\lambda\cdot\frac{\partial Q_{j}\left(a_{j}^{(n)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right),\widehat{s};V_{j}^{(n)}\right)}{\partial a_{j}^{(n)}\left(\widehat{s}\right)},$

   where

   $\frac{\partial Q_{j}\left(a_{j}^{(n)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right),\widehat{s};V_{j}^{(n)}\right)}{\partial a_{j}^{(n)}\left(\widehat{s}\right)}\equiv\frac{\partial r_{j}\left(a_{j}^{(n)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right),\widehat{s}\right)}{\partial a_{j}^{(n)}\left(\widehat{s}\right)}+\beta\frac{\partial\left[\int\overline{V_{j}}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a_{j}^{(n)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right)\right)ds^{\prime}\right]}{\partial a_{j}^{(n)}\left(\widehat{s}\right)}.$

   (b) Find

   $V_{j}^{(n*)}\left(\widehat{s}\right)\equiv\Phi_{V,j}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)=r_{j}\left(a^{(n)}\left(\widehat{s}\right),\widehat{s}\right)+\beta\int\overline{V_{j}}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{(n)}\left(\widehat{s}\right)\right)ds^{\prime}$

   for $j\in\mathcal{J},\ \widehat{s}\in\widehat{\mathcal{S}}$.

   (c) Update $a$ and $V$ in either of the following ways:

   • Simple: $a_{j}^{(n+1)}\left(\widehat{s}\right)\leftarrow a_{j}^{(n*)}\left(\widehat{s}\right)$, $V_{j}^{(n+1)}\left(\widehat{s}\right)\leftarrow V_{j}^{(n*)}\left(\widehat{s}\right)$

   • Spectral algorithm: For $z_{j}\in\left\{V_{j},a_{j}^{d=1,\cdots,D}\right\}$, $z_{j}^{(n+1)}\left(\widehat{s}\right)\leftarrow\alpha_{z}^{(n)}z_{j}^{(n*)}\left(\widehat{s}\right)+(1-\alpha_{z}^{(n)})z_{j}^{(n)}\left(\widehat{s}\right)$, where $\alpha_{z}^{(n\geq 1)}\equiv\frac{\left\|z^{(n)}-z^{(n-1)}\right\|_{2}}{\left\|F_{z}\left(V^{(n)},a^{(n)}\right)-F_{z}\left(V^{(n-1)},a^{(n-1)}\right)\right\|_{2}}$ and $F_{z}(V,a)\equiv\Phi_{z}\left(V,a\right)-z$.

   (d) Exit the iteration if $\left\|V^{(n+1)}-V^{(n)}\right\|<\epsilon_{V}$ and $\left\|a^{d(n+1)}-a^{d(n)}\right\|<\epsilon_{a^{d}}\ (d=1,\cdots,D)$.


Algorithm 3 VF-PGI algorithm (Multi-agent dynamic game)

Intuitively, in the VF-PGI algorithm, we first set initial values of all agents' value functions $V$ and actions $a$. Then, if the gradient $\frac{\partial Q_{j}}{\partial a_{j}}$ of agent $j$'s “long-run profit”, given their initial value functions and all agents' actions, is positive, we increase the value of agent $j$'s action, expecting a larger long-run profit for agent $j$. If it is negative, we decrease the value. When the gradient is close to zero, the action is close to optimal given the value functions and all agents' actions, and we rarely change the value of agent $j$'s action.

Regarding value functions, we update the values of agent $j$'s value function so that they align with agent $j$'s Bellman equations given their own initial value functions and all agents' actions. Note that the initial actions in this step might not be optimal. We repeat the procedure for all agents, and iterate these steps until the initial and updated values are sufficiently close.

As in the single-agent version, we need not solve nonlinear problems in each iteration, which contributes to a smaller computational burden.
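As a brief illustration of this point, the sketch below performs one simple (non-spectral) pass of steps 2(a)-(b) of Algorithm 3; `bellman_rhs_j` and `dQj_daj` are hypothetical user-supplied functions that take agent $j$'s value function together with all agents' current actions, which is where the strategic interaction enters. The spectral update would then be applied exactly as in the single-agent sketch.

```python
def vf_pgi_game_step(V, a, bellman_rhs_j, dQj_daj, lam):
    """One simple update of Algorithm 3, steps 2(a)-(b): each agent's action takes a
    gradient step on its own Q_j, evaluated at ALL agents' current actions, and each
    value function moves to its Bellman right-hand side. V[j], a[j] are NumPy arrays
    over grid points for agent j."""
    J = len(V)
    a_new = [a[j] + lam * dQj_daj(j, V[j], a) for j in range(J)]
    V_new = [bellman_rhs_j(j, V[j], a) for j in range(J)]
    return V_new, a_new
```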

In previous studies, an algorithm that is a slight extension of the single-agent VFI algorithm has been applied; we term it the VFI* algorithm.[6] I show the algorithm in Algorithm 4. The difference between VF-PGI (Algorithm 3) and VFI* (Algorithm 4) lies in Step 2(a). In VFI*, we must solve nonlinear equations many times; in contrast, in VF-PGI, we need not solve nonlinear problems in this step.

[6] Corresponding algorithms have been applied in Pakes and McGuire (1994), Miranda and Fackler (2004), and subsequent studies. Note that the original algorithms in these studies update $V$ by using $a^{(n+1)}$, rather than $a^{(n)}$, in Step 2(b). To make the comparison with VF-PGI easier, I present the algorithm based on the latter specification.

1. Take grid points $\widehat{s}\in\widehat{\mathcal{S}}$. Set initial values $V^{(0)}\equiv\left\{V_{j}^{(0)}\left(\widehat{s}\right)\right\}_{j\in\mathcal{J},\widehat{s}\in\widehat{\mathcal{S}}}$ and $a^{(0)}\equiv\left\{a_{j}^{(0)}\left(\widehat{s}\right)\right\}_{j\in\mathcal{J},\widehat{s}\in\widehat{\mathcal{S}}}$.

2. Iterate the following ($n=0,1,\cdots$):

   (a) Solve for $a_{j}^{(n*)}\left(\widehat{s}\right)=\Phi_{a}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)$ satisfying the following nonlinear equation for each $j\in\mathcal{J},\ \widehat{s}\in\widehat{\mathcal{S}}$:

   $0=\frac{\partial Q_{j}(a_{j}^{(n*)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right),\widehat{s};V_{j}^{(n)})}{\partial a_{j}^{(n*)}\left(\widehat{s}\right)}=\frac{\partial r_{j}\left(a_{j}^{(n*)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right),\widehat{s}\right)}{\partial a_{j}^{(n*)}\left(\widehat{s}\right)}+\beta\frac{\partial\left[\int\overline{V_{j}}^{(n)}\left(s^{\prime}\right)p(s^{\prime}|\widehat{s},a_{j}^{(n*)}\left(\widehat{s}\right),a_{-j}^{(n)}\left(\widehat{s}\right))ds^{\prime}\right]}{\partial a_{j}^{(n*)}\left(\widehat{s}\right)}.$

   (b) Find

   $V_{j}^{(n*)}\left(\widehat{s}\right)\equiv\Phi_{V,j}\left(V^{(n)},a^{(n)}\right)\left(\widehat{s}\right)=r_{j}\left(a^{(n)}\left(\widehat{s}\right),\widehat{s}\right)+\beta\int\overline{V_{j}}^{(n)}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a^{(n)}\left(\widehat{s}\right)\right)ds^{\prime}$

   for $j\in\mathcal{J},\ \widehat{s}\in\widehat{\mathcal{S}}$.

   (c) Update $a$ and $V$:

   • Simple: $a_{j}^{(n+1)}\left(\widehat{s}\right)\leftarrow a_{j}^{(n*)}\left(\widehat{s}\right)$, $V_{j}^{(n+1)}\left(\widehat{s}\right)\leftarrow V_{j}^{(n*)}\left(\widehat{s}\right)$

   • Spectral algorithm: For $z_{j}\in\left\{V_{j},a_{j}^{d=1,\cdots,D}\right\}$, $z_{j}^{(n+1)}\left(\widehat{s}\right)\leftarrow\alpha_{z}^{(n)}z_{j}^{(n*)}\left(\widehat{s}\right)+(1-\alpha_{z}^{(n)})z_{j}^{(n)}\left(\widehat{s}\right)$, where $\alpha_{z}^{(n\geq 1)}\equiv\frac{\left\|z^{(n)}-z^{(n-1)}\right\|_{2}}{\left\|F_{z}\left(V^{(n)},a^{(n)}\right)-F_{z}\left(V^{(n-1)},a^{(n-1)}\right)\right\|_{2}}$ and $F_{z}(V,a)\equiv\Phi_{z}\left(V,a\right)-z$.

   (d) Exit the iteration if $\left\|V^{(n+1)}-V^{(n)}\right\|<\epsilon_{V}$ and $\left\|a^{d(n+1)}-a^{d(n)}\right\|<\epsilon_{a^{d}}\ (d=1,\cdots,D)$.

Algorithm 4 VFI* algorithm (Multi-agent dynamic game)
Remark 3.

As in the VFI* algorithm, there is no guarantee that the iteration is a contraction. Furthermore, given the existence of multiple equilibria, there is no guarantee that we can find all equilibria even when starting from several initial values.

3.3 Spectral algorithm

We can speed up the convergence of the algorithm by introducing the spectral algorithm. To introduce the idea, we first discuss the fixed-point-iteration representation of the proposed VF-PGI algorithm. We then briefly discuss the spectral algorithm, summarizing the detailed discussion in Fukasawa (2024).

Fixed point iteration representation of the algorithm

VF-PGI (without the spectral algorithm) can be considered a fixed point iteration. To see this, consider the single-agent dynamic model discussed in Section 3.1. Let $x=\left(V,a\right)$, where $V\equiv\left\{V\left(\widehat{s}\right)\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$ and $a\equiv\left\{a\left(\widehat{s}\right)\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$. We define a mapping $\Phi:B_{V}\times B_{a}\rightarrow B_{V}\times B_{a}$ such that $\Phi(x)\equiv\Phi(V,a)\equiv\left(\Phi_{V}(V,a),\Phi_{a}(V,a)\right)$. Here, we define $\Phi_{V}:B_{V}\times B_{a}\rightarrow B_{V}$ such that

$\Phi_{V}(V,a)\left(\widehat{s}\right)\equiv r(a\left(\widehat{s}\right),\widehat{s})+\beta\int\overline{V}\left(s^{\prime}\right)p\left(s^{\prime}|\widehat{s},a\left(\widehat{s}\right)\right)ds^{\prime},$

and $\Phi_{a}:B_{V}\times B_{a}\rightarrow B_{a}$ such that

$\Phi_{a}(V,a)\left(\widehat{s}\right)=a\left(\widehat{s}\right)+\lambda\cdot\left[\frac{\partial r\left(a\left(\widehat{s}\right),\widehat{s}\right)}{\partial a\left(\widehat{s}\right)}+\beta\frac{\partial\left[\int\overline{V}\left(s^{\prime}\right)p(s^{\prime}|\widehat{s},a\left(\widehat{s}\right))ds^{\prime}\right]}{\partial a\left(\widehat{s}\right)}\right].$

$\overline{V}$ is an interpolated value based on the values of $V\equiv\left\{V\left(\widehat{s}\right)\right\}_{\widehat{s}\in\widehat{\mathcal{S}}}$. $B_{V}$ and $B_{a}$ denote the domains of $V$ and $a$, respectively. Hence, the proposed algorithm is equivalent to repeating the fixed point iteration $x^{(n+1)}=\Phi\left(x^{(n)}\right)$ until convergence, starting from appropriate initial values $x^{(0)}\equiv\left(V^{(0)},a^{(0)}\right)$. An analogous representation is possible when we consider multi-agent dynamic games.

Spectral algorithm

The spectral algorithm is designed to solve nonlinear equations and nonlinear continuous optimization problems. When we want to solve a nonlinear equation $F(x)=0$, $x^{(n)}$ is iteratively updated as follows, given initial values $x^{(0)}$:[7]

$x^{(n+1)}=x^{(n)}+\alpha^{(n)}F\left(x^{(n)}\right)\ (n=0,1,2,\cdots)$

Here, $\alpha^{(n)}$ is generally chosen based on the values of $s^{(n)}\equiv x^{(n)}-x^{(n-1)}$ and $y^{(n)}\equiv F(x^{(n)})-F(x^{(n-1)})$, and it can vary across $n=0,1,2,\cdots$.

[7] Newton's method, which uses the updating equation $x^{(n+1)}=x^{(n)}-\left(\nabla F(x^{(n)})\right)^{-1}F\left(x^{(n)}\right)\ (n=0,1,2,\cdots)$, attains fast convergence around the solution. Although Newton's method can be applied to solve the equation, computing $\nabla F(x^{(n)})$ requires coding analytical first derivatives of $F$, which is not an easy task, especially when the function $F$ is complicated. Moreover, especially when $n_{x}$, the dimension of $x$, is large, computing the inverse of the $n_{x}\times n_{x}$ matrix $\nabla F(x^{(n)})$ is computationally costly. Hence, the spectral algorithm is attractive from the perspective of simplicity and computational cost.

The idea of the spectral algorithm can also be utilized in the context of fixed point iterations. Suppose we want to solve a fixed point constraint $x=\Phi(x)$. Then, by letting $F(x)=\Phi(x)-x$, we can solve $x=\Phi(x)$ by iteratively updating the values of $x$ by:[8]

$x^{(n+1)}=x^{(n)}+\alpha^{(n)}\left(\Phi\left(x^{(n)}\right)-x^{(n)}\right)=\alpha^{(n)}\Phi(x^{(n)})+(1-\alpha^{(n)})x^{(n)}\ (n=0,1,2,\cdots).$

[8] The spectral algorithm can be thought of as a generalization of the extrapolation method (cf. Judd, 1998) with varying values of $\alpha$.

As discussed in detail in Varadhan and Roland (2008) and Fukasawa (2024),[9] the step size $\alpha^{(n)}=\alpha_{S3}\equiv\frac{\left\|s^{(n)}\right\|_{2}}{\left\|y^{(n)}\right\|_{2}}>0$ works well. Algorithm 5 shows the detailed steps of the spectral algorithm under the step size $\alpha_{S3}$.[10]

[9] The step size $\alpha_{S3}\equiv\frac{\left\|s^{(n)}\right\|_{2}}{\left\|y^{(n)}\right\|_{2}}$ can be derived from a simple optimization problem $\min_{\alpha^{(n)}}\frac{\left\|x^{(n+1)}-x^{(n)}\right\|_{2}^{2}}{\left|\alpha^{(n)}\right|}=\frac{\left\|s^{(n)}+\alpha^{(n)}y^{(n)}\right\|_{2}^{2}}{\left|\alpha^{(n)}\right|}$. For details, see Section 5 of Fukasawa (2024).

[10] Previous studies (e.g., La Cruz et al., 2006) in the numerical literature have proposed globalization strategies to guarantee global convergence. However, the iteration without globalization strategies works well, as shown in Section 5, and I do not discuss them here.

1. Set initial values $x^{(0)}$. Choose $\alpha^{(0)}$ and the tolerance level $\epsilon$.

2. Iterate the following ($n=0,1,2,\cdots$):

   (a) Compute $\Phi\left(x^{(n)}\right)$ and $F\left(x^{(n)}\right)=\Phi\left(x^{(n)}\right)-x^{(n)}$.

   (b) Compute $s^{(n)}\equiv x^{(n)}-x^{(n-1)}$, $y^{(n)}\equiv F(x^{(n)})-F(x^{(n-1)})$, and $\alpha^{(n)}=\frac{\left\|s^{(n)}\right\|_{2}}{\left\|y^{(n)}\right\|_{2}}$ if $n\geq 1$.

   (c) Compute $x^{(n+1)}=x^{(n)}+\alpha^{(n)}F(x^{(n)})$.

   (d) If $\left\|\Phi(x^{(n)})-x^{(n)}\right\|<\epsilon$, exit the iteration. Otherwise, go back to Step 2(a).

Algorithm 5 Spectral algorithm with fixed point mapping
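For concreteness, the following is a minimal Python sketch of Algorithm 5 for a generic user-supplied fixed-point mapping `phi`, using the step size $\alpha_{S3}$; the square-root example at the end is purely illustrative.

```python
import numpy as np

def spectral_fixed_point(phi, x0, alpha0=1.0, tol=1e-10, max_iter=10_000):
    """Solve x = phi(x) via x_{n+1} = x_n + alpha_n F(x_n), F(x) = phi(x) - x,
    with alpha_n = ||s_n||_2 / ||y_n||_2 (step size alpha_S3) for n >= 1."""
    x = np.asarray(x0, dtype=float)
    F = phi(x) - x
    alpha = alpha0
    for n in range(max_iter):
        if np.linalg.norm(F) < tol:
            return x, n
        x_new = x + alpha * F
        F_new = phi(x_new) - x_new
        s, y = x_new - x, F_new - F
        alpha = np.linalg.norm(s) / np.linalg.norm(y)
        x, F = x_new, F_new
    return x, max_iter

# Example: x = 0.5 * (x + c / x) (Babylonian map), whose fixed point is sqrt(c).
c = 2.0
x_star, iters = spectral_fixed_point(lambda x: 0.5 * (x + c / x), np.array([1.0]))
print(x_star, iters)   # approximately [1.41421356]
```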

Variable-type-specific step size

Although the standard spectral algorithm uses a scalar step size $\alpha^{(n)}$, I introduce variable-type-specific step sizes into the spectral algorithm.

To introduce the idea, suppose we would like to solve a nonlinear equation $F(x)=\left(F_{1}(x),F_{2}(x)\right)=0\in\mathbb{R}^{n_{1}+n_{2}}$, where $x=\left(x_{1},x_{2}\right)\in\mathbb{R}^{n_{1}+n_{2}}$. When using the standard step size $\alpha_{S3}$, we set $\alpha^{(n)}=\frac{\left\|s^{(n)}\right\|_{2}}{\left\|y^{(n)}\right\|_{2}}$, where $s^{(n)}\equiv x^{(n)}-x^{(n-1)}$ and $y^{(n)}\equiv F(x^{(n)})-F(x^{(n-1)})$, and we update $x$ by $x^{(n+1)}=x^{(n)}+\alpha^{(n)}F\left(x^{(n)}\right)$.

Here, we can alternatively update the values of $x=\left(x_{1},x_{2}\right)$ block by block: $x_{m}^{(n+1)}=x_{m}^{(n)}+\alpha_{m}^{(n)}F_{m}(x^{(n)})$, where $\alpha_{m}^{(n)}=\frac{\left\|s_{m}^{(n)}\right\|_{2}}{\left\|y_{m}^{(n)}\right\|_{2}}$, $s_{m}^{(n)}\equiv x_{m}^{(n)}-x_{m}^{(n-1)}$, and $y_{m}^{(n)}\equiv F_{m}(x^{(n)})-F_{m}(x^{(n-1)})\ (m=1,2)$. The idea can be generalized to the case where $N\in\mathbb{N}$ types of variables exist.

As discussed earlier, $\alpha^{(n)}$ is chosen to accelerate convergence. In principle, any choice of $\alpha^{(n)}$ is allowed in the spectral algorithm provided it works, and its numerical performance is what matters most.[11]

[11] We can justify the strategy with a simple thought experiment. Suppose $x_{1}^{(n)}$ is fixed at the true value $x_{1}^{*}$, and $x_{2}^{(n)}$ is not. Then, we should solve the equation $F_{2}(x_{2};x_{1}^{*})=0$ as an equation of $x_{2}$, and setting $\alpha_{2}^{(n)}=\frac{\left\|s_{2}^{(n)}\right\|_{2}}{\left\|y_{2}^{(n)}\right\|_{2}}$ is desirable. If we instead impose $\alpha_{1}^{(n)}=\alpha_{2}^{(n)}=\alpha^{(n)}$, we should set $\alpha^{(n)}=\frac{\left\|s^{(n)}\right\|_{2}}{\left\|y^{(n)}\right\|_{2}}=\frac{\sqrt{\left\|s_{1}^{(n)}\right\|_{2}^{2}+\left\|s_{2}^{(n)}\right\|_{2}^{2}}}{\sqrt{\left\|y_{1}^{(n)}\right\|_{2}^{2}+\left\|y_{2}^{(n)}\right\|_{2}^{2}}}$, which may not be equal to $\frac{\left\|s_{2}^{(n)}\right\|_{2}}{\left\|y_{2}^{(n)}\right\|_{2}}$.
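The block-wise rule described above amounts to the following small sketch, assuming the current and previous iterates and residual blocks have already been computed; all names are generic.

```python
import numpy as np

def blockwise_spectral_update(x1, x2, x1_prev, x2_prev, F1, F2, F1_prev, F2_prev):
    """Variable-type-specific step sizes alpha_m = ||s_m||_2 / ||y_m||_2 (m = 1, 2),
    then the block-wise update x_m + alpha_m * F_m(x)."""
    alpha1 = np.linalg.norm(x1 - x1_prev) / np.linalg.norm(F1 - F1_prev)
    alpha2 = np.linalg.norm(x2 - x2_prev) / np.linalg.norm(F2 - F2_prev)
    return x1 + alpha1 * F1, x2 + alpha2 * F2
```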

4 Extension: Box constraint

So far, we have only considered settings in which the first-order equality conditions of the optimization problems hold exactly. However, in some applications, introducing box constraints on the domain of actions, such as nonnegativity constraints, may be suitable. For example, in the investment competition model discussed by Pakes and McGuire (1994), each firm's investment should be nonnegative, and the first-order equality conditions do not necessarily hold at boundary points. This section shows that we can easily incorporate box constraints into the proposed algorithm.

As discussed in Section 3.3, the proposed algorithm under no binding box constraints corresponds to solving a fixed point problem $x=\Phi(x)\equiv\left(\Phi_{V}(V,a),\Phi_{a}(V,a)\right)$, where $x\equiv(V,a)$. In contrast, in the setting where some of the box constraints are binding, we cannot directly derive fixed-point representations. For instance, in the case of the single-agent dynamic optimization problem, let $l$ and $u$ be the lower and upper bounds of the actions. Then, the following should hold:

$\frac{\partial Q(a^{*}\left(s\right),s;V)}{\partial a^{*}\left(s\right)}\begin{cases}\leq 0&\text{if }a^{*}\left(s\right)=l\\ =0&\text{if }a^{*}\left(s\right)\in(l,u)\\ \geq 0&\text{if }a^{*}\left(s\right)=u\end{cases}$

Motivated by this, suppose that we want to solve the following general mathematical problem:

$f_{i}(x)\begin{cases}\leq 0&\text{if }x_{i}=l_{i}\\ =0&\text{if }x_{i}\in(l_{i},u_{i})\\ \geq 0&\text{if }x_{i}=u_{i}\end{cases}\qquad(10)$

where $f:B\subset\mathbb{R}^{I}\rightarrow\mathbb{R}^{I}$ is a continuous function. Here, $B\equiv\Pi_{i\in\{1,\cdots,I\}}B_{i}$ and $B_{i}\equiv[l_{i},u_{i}]$.

Suppose that a function $\Phi:B\rightarrow\mathbb{R}^{I}$ satisfies:

$f_{i}(x)=0\iff x_{i}=\Phi_{i}(x)\qquad(11)$

Here, we define a function $\widehat{\Phi}:B\rightarrow B$ such that:

$\widehat{\Phi}_{i}(x)\equiv\begin{cases}l_{i}&\text{if }f_{i}(x)\leq 0\ \&\ x_{i}=l_{i}\\ u_{i}&\text{if }f_{i}(x)\geq 0\ \&\ x_{i}=u_{i}\\ P_{B_{i}}\left(\Phi_{i}(x)\right)&\text{otherwise}\end{cases}$

where $P_{B}(x)$ denotes the projection of $x$ onto a convex set $B$.

Then, the following statement holds:

Proposition 1.

$x^{*}$ is the solution to (10) $\iff$ $x^{*}$ is the solution to $\widehat{\Phi}(x)=x$.

Regarding the dynamic optimization problems with continuous actions, we can construct an alternative mapping $\widehat{\Phi}$ from the fixed point mapping $\Phi$ under no binding constraints and the optimality conditions using $\frac{\partial Q}{\partial a}$. Then, the problem is equivalent to solving $x=\widehat{\Phi}(x)$, and we can solve the problem using fixed-point iterations $x^{(n+1)}=\widehat{\Phi}(x^{(n)})$. Note that we can also introduce the spectral algorithm.
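A minimal sketch of the mapping $\widehat{\Phi}$ in Python, assuming the vectors $f(x)$ and $\Phi(x)$ have already been evaluated at the current iterate; `np.clip` plays the role of the projection $P_{B_i}$, and the element-wise logic mirrors the case distinction above.

```python
import numpy as np

def phi_hat(x, f_vals, phi_vals, l, u):
    """Box-constrained mapping: keep x_i at a bound when the optimality condition
    already holds there; otherwise project Phi_i(x) back onto [l_i, u_i]."""
    out = np.clip(phi_vals, l, u)                      # P_{B_i}(Phi_i(x))
    out = np.where((x == l) & (f_vals <= 0), l, out)   # lower bound binding
    out = np.where((x == u) & (f_vals >= 0), u, out)   # upper bound binding
    return out
```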

Regarding the continuity of the function $\widehat{\Phi}$, the following statement applies.

Proposition 2.

$\widehat{\Phi}$ is a continuous function if $\Phi$ is continuous, $\lim_{x_{i}\rightarrow l_{i}+0,\ f_{i}(x)\leq 0}\Phi_{i}(x)\leq l_{i}$, and $\lim_{x_{i}\rightarrow u_{i}-0,\ f_{i}(x)\geq 0}\Phi_{i}(x)\geq u_{i}$.

Then, we can easily derive the following statement for the mapping of VF-PGI, which we denote $\Phi^{VF\text{-}PGI}$:

Corollary 1.

Suppose that $\Phi^{VF\text{-}PGI}$ is a continuous function. Then, $\widehat{\Phi^{VF\text{-}PGI}}$, derived from $\Phi^{VF\text{-}PGI}$ by applying the box constraint on the domain of actions $a$, is continuous.

5 Numerical experiments

This section uses numerical experiments to show the performance of the proposed VF-PGI-Spectral algorithm in three settings. Section 5.1 shows results for the single-agent neoclassical growth model with elastic labor supply, which is extensively considered in the macroeconomics literature. Section 5.2 considers a dynamic investment competition model with continuous states. Section 5.3 considers a dynamic investment competition model with discrete states, examined by Pakes and McGuire (1994). All experiments were conducted on a laptop computer with an AMD Ryzen 5 6600H 3.30 GHz CPU, 16.0 GB of RAM, Windows 11 64-bit, and MATLAB 2022b. When applying the spectral algorithm, we use $\alpha_0=1$ in this section. For details of each experiment, see Appendix B.

5.1 Single-agent neoclassical growth model with elastic labor supply

Settings

As in Maliar and Maliar (2013), we consider a standard single-agent neoclassical growth model with elastic labor supply. The dynamic optimization problem is characterized by:

$V(k,z)=\max_{l,c}\ u(c,l)+\beta E\left[V(k^{\prime},z^{\prime})\right]$

$\text{s.t.}\quad k^{\prime}=(1-\delta)k+zr(k,l)-c$

$\ln z^{\prime}=\rho\ln z+\epsilon^{\prime},\ \epsilon^{\prime}\sim N(0,\sigma^{2})$

$u(c,l)=\frac{c^{1-\gamma}-1}{1-\gamma}+B\frac{(1-l)^{1-\mu}-1}{1-\mu}$

$r(k,l)=Ak^{\alpha}l^{1-\alpha},$

where $k$, $c$, $l$, and $z$ denote capital, consumption, labor, and the productivity level, respectively.

The first-order conditions with respect to $l$ and $c$ are:

$0=\frac{\partial Q(l,c,k,z;V)}{\partial l(k,z)}=u_{l}(c,l)+Azf_{l}\cdot\beta\frac{\partial E\left[V(k^{\prime},z^{\prime})\right]}{\partial k^{\prime}},\qquad(13)$

$0=\frac{\partial Q(l,c,k,z;V)}{\partial c(k,z)}=u_{c}(c,l)-\beta\frac{\partial E\left[V(k^{\prime},z^{\prime})\right]}{\partial k^{\prime}},\qquad(14)$

where $f_{l}$ denotes the partial derivative of $f(k,l)\equiv k^{\alpha}l^{1-\alpha}$ with respect to $l$.

In this study, we try the following algorithms: VF-PGI-Spectral, VFI-Spectral, ECM-Spectral, EGM-Spectral, VFI, ECM, and EGM. In the VF-PGI-Spectral algorithm, we jointly update the values of the value function $V$ and the actions $l,c$ until convergence, based on the Bellman equation above and equations (13) and (14), combined with the spectral algorithm.

It should be noted that $c=\left(\frac{B(1-l)^{-\mu}}{(1-\alpha)Azk^{\alpha}l^{-\alpha}}\right)^{-\frac{1}{\gamma}}$ holds by equations (13) and (14). Hence, given the value of $l$, we can compute the value of $c$ analytically. Therefore, we can alternatively consider an algorithm that jointly updates the value function $V$ and the action $l$ until convergence, recovering the values of $c$ within each iteration by this analytical formula. We denote this algorithm VF-PGI*-Spectral, and we also validate its performance.
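As a small illustration, the analytical recovery of $c$ from $l$ used in VF-PGI*-Spectral can be coded directly from the formula above; the parameter names follow the model, and the function name is illustrative.

```python
def consumption_from_labor(l, k, z, A, B, alpha, gamma, mu):
    """Recover c from l by combining the two first-order conditions:
       B*(1-l)**(-mu) = (1-alpha)*A*z*k**alpha*l**(-alpha) * c**(-gamma),
       so c = (B*(1-l)**(-mu) / ((1-alpha)*A*z*k**alpha*l**(-alpha)))**(-1/gamma)."""
    marginal_product_l = (1.0 - alpha) * A * z * k**alpha * l**(-alpha)
    return (B * (1.0 - l)**(-mu) / marginal_product_l) ** (-1.0 / gamma)
```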

The experiments are conducted by modifying and extending the replication code of Maliar and Maliar (2013). As in Maliar and Maliar (2013), we choose the steady-state capital-output ratio $\pi_{k}=10$, steady-state consumption-output ratio $\pi_{c}=3/4$, steady-state labor $\overline{l}=1/3$, and parameters $A=\frac{1/\beta-(1-\delta)}{\alpha}$, $B=(1-\alpha)\pi_{k}^{(1-\gamma)\alpha/(1-\alpha)}\pi_{c}^{-\gamma}(1-\overline{l})^{\mu}\overline{l}^{-\mu}$, $\gamma=2$, $\mu=2$, $\alpha=1/3$, $\beta=0.99$, $\delta=0.025$, $\rho=0.95$, $\sigma=0.01$.

Results

Table 1 shows the performance of the algorithms. The VF-PGI-Spectral algorithm converges significantly faster than the other methods, including VFI, EGM, and ECM. We can also see that introducing the spectral algorithm significantly accelerates the convergence of VFI, EGM, and ECM. Still, VF-PGI-Spectral is superior. As noted in Maliar and Maliar (2013), in the neoclassical growth model with elastic labor supply, neither EGM nor ECM can completely avoid solving nonlinear equations within the iterations. In contrast, VF-PGI-Spectral avoids such a problem. As shown in the last column, the CPU time per iteration for VF-PGI-Spectral is significantly less than that of the other methods. Note that regardless of the choice of algorithm, the numerical accuracy hardly changes.

The table also shows the performance of VF-PGI*-Spectral, which uses the analytical formula relating $l$ and $c$. The results show that VF-PGI*-Spectral requires fewer iterations than VF-PGI-Spectral and converges slightly faster. The findings indicate that convergence might be accelerated by using analytical formulas to reduce the number of variables to be solved.

Table 1: Speed and accuracy of algorithms (Single-agent neoclassical growth model with elastic labor supply)
Method  CPU (sec)  $L_1$  $L_\infty$  Iter.  CPU / Iter.
VF-PGI-Spectral 0.261 -5.425 -3.983 138 0.002
VF-PGI*-Spectral 0.153 -5.425 -3.983 98 0.002
VFI-Spectral 1.171 -5.426 -3.973 52 0.023
ECM-Spectral 0.650 -5.412 -3.958 39 0.017
EGM-Spectral 0.492 -5.345 -3.825 37 0.013
VFI 15.577 -5.425 -3.983 844 0.018
ECM 9.580 -5.406 -3.960 844 0.011
EGM 4.607 -5.345 -3.825 413 0.011

Notes. $\alpha_0=1$, $\lambda=10^{-7}$.

$L_1$ and $L_\infty$ denote the average and the maximum absolute values of Euler residuals in $\log_{10}$ units on a stochastic simulation of 10,000 observations.

Regarding VF-PGI-Spectral, we must specify an exogenous tuning parameter $\lambda$. In the setting shown in Table 1, we chose $\lambda=10^{-7}$.[12] Table 2 shows how the results vary with $\lambda$.

[12] Under $\alpha_0=1$, $a^{(n=1)}=a^{(n=0)}+\lambda\frac{\partial Q}{\partial a}\left(a^{(n=0)}\right)$ holds. If $\lambda$ is excessively large, $a^{(n=1)}$ can be too far from $a^{(n=0)}$ and the true solution, and the iteration may become unstable. In the current setting, $Q$ is of order $10^{5}$, and $\left|\frac{\partial Q}{\partial a}\right|$ can be very large. Hence, we need to choose a small $\lambda$, such as $10^{-7}$.

The results suggest that taking a sufficiently small value of $\lambda$ is essential for the convergence of VF-PGI-Spectral. Note that the size of $\lambda$ has no significant effect on the convergence speed of the algorithm as long as it is sufficiently small, and fine-tuning the value of $\lambda$ is not critical as long as it leads to convergence.

Table 2: Effect of λ\lambda in the VF-PGI-Spectral algorithm (Single-agent neoclassical growth model with elastic labor supply)
Method  $\log_{10}(\lambda)$  CPU (sec)  $L_1$  $L_\infty$  Iter.  Conv.
VF-PGI-Spectral -10 0.22 -5.425 -3.983 108 1
VF-PGI-Spectral -9 0.278 -5.425 -3.983 145 1
VF-PGI-Spectral -8 0.264 -5.425 -3.983 135 1
VF-PGI-Spectral -7 0.279 -5.425 -3.983 140 1
VF-PGI-Spectral -6 0.092 NaN NaN 38 0
VF-PGI-Spectral -5 0.016 NaN NaN 5 0

Notes. $\alpha_0=1$.

$L_1$ and $L_\infty$ denote the average and the maximum absolute values of Euler residuals in $\log_{10}$ units on a stochastic simulation of 10,000 observations.

We set Conv. = 0 if the iteration diverges.

Remark 4.

Note that even when $\lambda=1$, we can successfully solve the dynamic model by setting a sufficiently small value of $\alpha_0$, which represents the initial step size of the spectral algorithm. As in the case of $\lambda$, the size of $\alpha_0$ does not significantly affect the convergence speed of the algorithm as long as the iteration converges. Appendix C.1 shows the numerical results.

Remark 5.

As shown in Appendix C.2, ECM-Spectral slightly outperforms VF-PGI-Spectral in the model with inelastic labor supply. In that model, both VF-PGI-Spectral and ECM-Spectral completely avoid solving nonlinear equations inside the iterations. Unlike VF-PGI-Spectral, which essentially solves for two types of variables (value functions and actions), ECM-Spectral requires solving only for value functions. Thus, using analytical formulas to reduce the number of variables to be solved may lead to faster convergence. For general models where deriving precise analytical formulas is challenging, VF-PGI-Spectral would outperform other methods, despite the potential increase in the number of iterations.

5.2 Dynamic investment competition model with continuous states

Settings

We examine a simple stationary dynamic investment competition model.131313A nonstationary version of the model was empirically applied in Fukasawa and Ohashi, (2023), and an analogous model was also considered in Miranda and Fackler, (2004). Firms j=1,,Jj=1,\cdots,J produce and sell a homogeneous product in a single market in each period. Each firm makes investment decisions to lower the marginal production costs. Firm jj’s dynamic optimization problem related to its investment is given by:

maxij(s)\displaystyle\max_{i_{j}(s)\in\mathbb{R}} πj(s)c(kj,ij(s))+βEzVj(kj,kj,z)\displaystyle\pi_{j}(s)-c\left(k_{j},i_{j}(s)\right)+\beta E_{z}V_{j}\left(k_{j}^{\prime},k_{-j}^{\prime},z^{\prime}\right)
s.t.kj=(1δ)kj+ij\displaystyle s.t.\ k_{j}^{\prime}=(1-\delta)k_{j}+i_{j}

Here, π_j(s) denotes firm j's per-period profit from producing and selling the product given states s≡(k,z). k≡(k_j,k_{-j}) denotes all firms' capital stocks, and z≡(ξ,μ) denotes the exogenous state variables. c(k_j,i_j(s)) denotes firm j's investment cost given its capital k_j and investment i_j(s), and we assume c(k,i)=θ_k i+θ_a i²/k. k_j'=(1-δ)k_j+i_j represents the deterministic state transition of firm j's capital stock, and δ denotes the depreciation rate of capital.

Assuming constant marginal production costs, we define mc_j(k_j,μ)=k_j^{-γ}·exp(μ) as the marginal cost of firm j given its capital stock k_j, where μ denotes a stochastic cost shock. Firms choose outputs in the market, and we assume static Cournot competition with respect to the output choice. Let Q(P)=P^{-η}·exp(ξ) be the demand function of the product, where Q denotes the aggregate quantity, P denotes the price, and ξ denotes a stochastic demand shock. η is a parameter corresponding to the price elasticity of demand. Then, firm j's profit function can be represented as π_j(s)=(P(s)-mc_j(s))q_j(s). Given the values of the state variables s≡(k_j,k_{-j};ξ,μ) and parameters, we can compute the firms' profits.
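For concreteness, the following is a minimal sketch of how the static Cournot profits π_j(s) could be computed numerically given the state (k, ξ, μ); the function name and the example parameter values are illustrative and are not those of the replication code.

```python
import numpy as np
from scipy.optimize import fsolve

def cournot_profits(k, xi, mu, gamma, eta):
    """Static Cournot profits pi_j(s) = (P - mc_j) * q_j given capital stocks k
    and shocks (xi, mu); a sketch with illustrative names."""
    k = np.asarray(k, dtype=float)
    mc = k ** (-gamma) * np.exp(mu)               # mc_j(k_j, mu) = k_j^{-gamma} exp(mu)

    def foc(x):                                   # Cournot first-order conditions
        q = np.exp(x)                             # work in logs to keep quantities positive
        Q = q.sum()
        P = (Q / np.exp(xi)) ** (-1.0 / eta)      # inverse demand from Q(P) = P^{-eta} exp(xi)
        dPdQ = -P / (eta * Q)
        return P + q * dPdQ - mc

    q = np.exp(fsolve(foc, x0=np.zeros_like(k)))  # equilibrium quantities
    P = (q.sum() / np.exp(xi)) ** (-1.0 / eta)
    return (P - mc) * q

# Illustrative call (parameter values are not those used in the experiments):
# cournot_profits(k=[1.0, 1.2], xi=4.0, mu=2.0, gamma=0.5, eta=1.5)
```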

Regarding the exogenous shocks, we assume stationary AR(1) transitions ξ=ρξ(ξξ¯)+ξ¯+uξ\xi^{\prime}=\rho^{\xi}(\xi-\overline{\xi})+\overline{\xi}+u^{\xi} where uξN(0,σξ2)u^{\xi}\sim N(0,\sigma_{\xi}^{2}), and μ=ρμ(μμ¯)+μ¯+uμ\mu^{\prime}=\rho^{\mu}(\mu-\overline{\mu})+\overline{\mu}+u^{\mu} where uμN(0,σμ2)u^{\mu}\sim N(0,\sigma_{\mu}^{2}).

The parameters are specified as follows: α=1.5, β=0.9, δ=0.08, ρ^ξ=ρ^μ=0.9, σ_ξ=σ_μ=0.01, ξ̄=4, μ̄=2, θ_k=0.12, and θ_a=0.3. Regarding VF-PGI-Spectral, we let α_0=1 and λ=0.001.

Results

Table 3 shows the results. Regardless of the number of firms, VF-PGI-Spectral outperforms the VFI*-based methods. Furthermore, the numerical accuracy of the solutions is rarely affected by the choice of algorithm.

Table 3: Speed and accuracy of algorithms (Dynamic investment competition model with continuous states)
JJ Method CPU (sec) L1L_{1} LL_{\infty} Iter. CPU / Iter.
1 VF-PGI-Spectral 0.034 -5.924 -5.38 48 0.001
VFI*-Spectral 0.083 -5.924 -5.38 39 0.002
VFI* 0.191 -5.924 -5.38 98 0.002
2 VF-PGI-Spectral 0.103 -6.697 -5.958 84 0.001
VFI*-Spectral 0.543 -6.697 -5.958 89 0.006
VFI* 0.689 -6.697 -5.958 141 0.005
3 VF-PGI-Spectral 0.741 -7.378 -6.245 86 0.009
VFI*-Spectral 3.765 -7.378 -6.245 63 0.06
VFI* 5.957 -7.378 -6.245 141 0.042

L_{1} and L_{\infty} denote the average and the maximum of \log_{10}\left(\left\|\frac{\partial Q_{j}(a_{j}(\widehat{s}),a_{-j}(\widehat{s}),\widehat{s};V_{j})}{\partial a_{j}(\widehat{s})}\right\|_{\infty}\right) on a stochastic simulation of 100 observations.

5.3 Dynamic investment competition model with discrete states (Pakes and McGuire,, 1994)

Settings

We consider the dynamic investment competition model with discrete states following Pakes and McGuire, (1994).141414To clarify the notations, we slightly alter the notations in Pakes and McGuire, (1994). Investment iji_{j} in the current paper corresponds to xjx_{j} in Pakes and McGuire, (1994), and state transition parameter γ\gamma in the current paper corresponds to aa in Pakes and McGuire, (1994). Single product firms j=1,,Jj=1,\cdots,J produce and sell differentiated products in a single market in each period. Firms make investment decisions to improve their product quality. Firm jj’s dynamic optimization problem related to its investment is given by:

Vj(w)=maxij(w)0Qj(ij(w),ij(w),w;Vj)πj(w)c(ij(w))+βEw[Vj(w)|w,ij(w),ij(w)]V_{j}(w)=\max_{i_{j}(w)\geq 0}Q_{j}(i_{j}(w),i_{-j}^{*}(w),w;V_{j})\equiv\pi_{j}(w)-c\left(i_{j}(w)\right)+\beta E_{w^{\prime}}\left[V_{j}(w^{\prime})|w,i_{j}(w),i_{-j}^{*}(w)\right]

Here, πj(w)\pi_{j}(w) denotes firm jj’s per-period profit function given states w(wj,wj)w\equiv(w_{j},w_{-j}), and c(i)c\left(i\right) denotes the investment cost function. wj{1,2,,K¯}w_{j}\in\{1,2,\cdots,\overline{K}\} denotes firm jj’s capital stock.

State transition of ww is given by wj=wj+τjνw_{j}^{\prime}=w_{j}+\tau_{j}-\nu. Here, τj\tau_{j} represents the output of firm jj’s own investment process, and its probability distribution is given by:

Pr(τj)={γij(w)1+γij(w)ifτj=111+γij(w)ifτj=0.Pr(\tau_{j})=\begin{cases}\frac{\gamma i_{j}(w)}{1+\gamma i_{j}(w)}&\text{if}\ \tau_{j}=1\\ \frac{1}{1+\gamma i_{j}(w)}&\text{if}\ \tau_{j}=0\end{cases}.

ν\nu denotes the industry-wide depreciation shock, and its probability distribution is given by:

Pr(ν)={δifν=11δifν=0.Pr(\nu)=\begin{cases}\delta&\text{if}\ \nu=1\\ 1-\delta&\text{if}\ \nu=0\end{cases}.
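Combining the two distributions, the expected continuation value E_{w'}[V_j(w')|w, i_j(w), i_{-j}^*(w)] is a sum over the realizations of (τ_j, ν) (and, with multiple firms, over competitors' τ's). A minimal single-firm sketch is given below; the function and argument names are illustrative.

```python
def expected_continuation_value(V, w, i, gamma, delta, K_bar):
    """E[V(w') | w, i] for a single firm, with w' = w + tau - nu kept on the
    state grid {1, ..., K_bar}; a sketch with illustrative names.
    V is an array over states, indexed so that V[w - 1] is the value at state w."""
    p_up = gamma * i / (1.0 + gamma * i)               # Pr(tau = 1)
    ev = 0.0
    for tau, p_tau in ((1, p_up), (0, 1.0 - p_up)):
        for nu, p_nu in ((1, delta), (0, 1.0 - delta)):
            w_next = min(max(w + tau - nu, 1), K_bar)  # clip to the grid boundaries
            ev += p_tau * p_nu * V[w_next - 1]
    return ev
```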

In this study, we assume the following functional form of investment cost function c(i)c\left(i\right): c(i)=i+θ2i2c\left(i\right)=i+\theta_{2}i^{2}. The term θ2i2\theta_{2}i^{2} corresponds to the quadratic investment cost.151515Adjustment costs of investment, in the form of quadratic investment costs, have been thoroughly examined in the macroeconomics literature (e.g., Cooper and Haltiwanger,, 2006). See also the discussions in Mermelstein et al., (2020).

We assume that each consumer purchases at most one product in each period. Consumer ii’s utility when purchasing firm jj’s product is Uij=g(wj)pj+ϵijU_{ij}=g(w_{j})-p_{j}+\epsilon_{ij}, where g(wj)g(w_{j}) denotes the quality of the product produced by firm jj at state wjw_{j}. Consumer ii’s utility from buying nothing is ϵi0\epsilon_{i0}. pjp_{j} denotes firm jj’s product price, and {ϵij}j𝒥{0}\left\{\epsilon_{ij}\right\}_{j\in\mathcal{J}\cup\{0\}} denotes idiosyncratic utility shocks following i.i.d. Gumbel distribution. Then, the demand for firm jj’s product is qj(w;p)=Mexp(g(wj)pj)1+k=1Jexp(g(wk)pk)q_{j}(w;p)=M\frac{\exp\left(g(w_{j})-p_{j}\right)}{1+\sum_{k=1}^{J}\exp\left(g(w_{k})-p_{k}\right)}, where MM denotes the market size. Regarding g(w)g(w), we assume g(w)=wg(w)=w for www\leq w^{*} and g(w)=w+log(2exp(w+w))g(w)=w+\log\left(2-\exp\left(-w+w^{*}\right)\right) for w>ww>w^{*}.

Firms choose their product prices in each period, observing the quality of competitors’ products. Let pj(w)p_{j}^{*}(w) be the equilibrium price of firm jj’s product. Then, the profit function of firm jj can be represented as πj(w)=(pj(w)mc)qj(w;p)\pi_{j}(w)=\left(p_{j}^{*}(w)-mc\right)q_{j}\left(w;p^{*}\right), where mcmc denotes the marginal cost of products, which we assume to be common for all products. Given the state variables w(wj,wj)w\equiv\left(w_{j},w_{-j}\right) and parameters, we can compute the firms’ profits. As in Pakes and McGuire, (1994), we use the following parameter values: δ=0.7,mc=5,M=5,K¯=19,w=12,β=0.925.\delta=0.7,mc=5,M=5,\overline{K}=19,w^{*}=12,\beta=0.925.

The experiments are conducted by modifying the replication code of Pakes and McGuire, (1994).161616The replication code of Pakes and McGuire, (1994) was downloaded from https://scholar.harvard.edu/pakes/pages/pakes-maguire-algorithm-0. To simplify the setting, I modified the original code so that no entry/exit decisions exist. Note that the relative performance of the VF-PGI-Spectral algorithm rarely changes even when firms' endogenous entry/exit decisions are introduced. Regarding VF-PGI-Spectral, we let λ=0.01.

We assume nonnegative investments by firms. Combining this constraint with the discussion in Section 4, in VF-PGI-Spectral we update the values of i_j as follows:

i_{j}^{(n+1)}(w)=\begin{cases}i_{j}^{(n)}(w)&\text{if}\ i_{j}^{(n)}(w)=0\ \&\ \frac{\partial Q_{j}(i^{(n)}(w),w;V^{(n)})}{\partial i_{j}^{(n)}(w)}<0\\ i_{j}^{(n)}(w)+\lambda\cdot\frac{\partial Q_{j}(i^{(n)}(w),w;V^{(n)})}{\partial i_{j}^{(n)}(w)}&\text{otherwise}\end{cases}
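A minimal sketch of this update in code, vectorized over states; the function name is illustrative, and the final projection back onto i ≥ 0 is an addition of this sketch that follows the box-constraint treatment in Section 4.

```python
import numpy as np

def update_investment(i_n, dQ_di, lam):
    """One VF-PGI gradient step for investments under the nonnegativity
    constraint, following the piecewise rule above.
    i_n, dQ_di: arrays over states w; lam: tuning parameter lambda."""
    stay = (i_n == 0.0) & (dQ_di < 0.0)        # at the boundary with an outward-pointing gradient
    i_next = np.where(stay, i_n, i_n + lam * dQ_di)
    return np.maximum(i_next, 0.0)             # projection onto i >= 0 (added in this sketch)
```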

Results

First, Table 4 shows the findings in the setting without quadratic investment costs (θ_2=0). In this specific setting, as discussed in Pakes et al., (1993), we can show that the optimal i_j given value functions V and competitors' actions i_{-j}^* satisfies i_j^*(w)=\max\left\{\frac{-1+\sqrt{\beta\gamma\left(v_{1j}(w)-v_{2j}(w)\right)}}{\gamma},0\right\}\ (j=1,\cdots,J;w\in\Omega), where v_{1j}(w)\equiv E_{w^{\prime}}\left[V_{j}(w^{\prime})|w,\tau_{j}=1,i_{-j}^{*}(w)\right] and v_{2j}(w)\equiv E_{w^{\prime}}\left[V_{j}(w^{\prime})|w,\tau_{j}=0,i_{-j}^{*}(w)\right]. This implies that nonlinear optimization problems inside the iterations can be avoided by using the analytical solution, which leads to a lower computational burden per iteration in the VFI* algorithm. As shown in the table, VF-PGI-Spectral and VFI*-Spectral perform comparably.
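A one-line sketch of this closed-form solution (names are illustrative; v1 and v2 correspond to v_{1j}(w) and v_{2j}(w) above):

```python
import numpy as np

def optimal_investment_closed_form(v1, v2, beta, gamma):
    """i*(w) = max{(-1 + sqrt(beta * gamma * (v1 - v2))) / gamma, 0}, the
    closed-form investment when theta_2 = 0 (Pakes et al., 1993)."""
    inside = np.maximum(beta * gamma * (v1 - v2), 0.0)   # guard against a negative argument
    return np.maximum((-1.0 + np.sqrt(inside)) / gamma, 0.0)
```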

In contrast, in the general setting where quadratic adjustment costs exist (θ2>0)(\theta_{2}>0), there are no analytical formulas related to optimal investments. Table 5 shows the results under θ2=1\theta_{2}=1, and we can see that VF-PGI-Spectral outperforms VFI*-Spectral.

Table 4: Speed and accuracy of algorithms (Dynamic investment competition model with discrete states; θ2=0\theta_{2}=0; Without quadratic investment cost)
JJ Method CPU(secs) Iter. L(V)L_{\infty}(V) L(i)L_{\infty}(i)
1 VF-PGI-Spectral 0.28 74 -6.09 -6.41
VFI*-Spectral (analytical sol) 0.28 95 -6.05 -6.26
VFI* (analytical sol) 0.42 219 -6.02 -7.53
2 VF-PGI-Spectral 3.41 103 -6.11 -6.6
VFI*-Spectral (analytical sol) 3.06 99 -6.03 -6.93
VFI* (analytical sol) 5.74 196 -6.02 -7.36
3 VF-PGI-Spectral 74.11 129 -6.02 -6.25
VFI*-Spectral (analytical sol) 59.63 102 -6.02 -6.73
VFI* (analytical sol) 124.67 206 -6.02 -7.59

Notes. L(V)L_{\infty}(V) denotes the value of log10(ΦV(V,a)V)\log_{10}\left(\left\|\Phi_{V}(V,a)-V\right\|_{\infty}\right). L(i)L_{\infty}(i) is defined analogously.

Table 5: Speed and accuracy of algorithms (Dynamic investment competition model with discrete states; θ2=1\theta_{2}=1; With quadratic investment cost)
JJ Method CPU(secs) Iter. L(V)L_{\infty}(V) L(i)L_{\infty}(i)
1 VF-PGI-Spectral 0.25 66 -6.94 -6.26
VFI*-Spectral 0.46 53 -7.26 -7.03
2 VF-PGI-Spectral 3.76 111 -6.17 -6.88
VFI*-Spectral 12.3 98 -6.2 -7.5
3 VF-PGI-Spectral 63.08 97 -6.59 -6.63
VFI*-Spectral 191.63 77 -6.1 -7.01

Notes. L(V)L_{\infty}(V) denotes the value of log10(ΦV(V,a)V)\log_{10}\left(\left\|\Phi_{V}(V,a)-V\right\|_{\infty}\right). L(i)L_{\infty}(i) is defined analogously.

6 Relationships with previously proposed methods

6.1 ECM, EE, and EGM

To ease the computation of single-agent dynamic optimization problems, several methods (ECM, EE, EGM) have been proposed, mainly in the macroeconomics literature. These methods avoid nonlinear optimization problems in the iterations, leading to a lower computational burden. However, they are not directly applicable to multi-agent dynamic games. This section clarifies this point using the dynamic investment competition model with continuous states considered in Section 5.2 as an example. To simplify the equations, we consider the setting c(i)=θ_1 i+θ_2 i². Moreover, we assume that the exogenous states z_t are constant over time, and we do not explicitly treat them as states.

6.1.1 Envelope Condition Method (ECM; Maliar and Maliar,, 2013; Arellano et al.,, 2016)

In the dynamic investment competition model, the envelope condition is:171717See Appendix A.3 for the derivation.

Vj(kj,kj)kj=πj(kj,kj)kj+(1δ)c(kj(1δ)kj)+βkjkjVj(kj,kj)kj\frac{\partial V_{j}\left(k_{j},k_{-j}\right)}{\partial k_{j}}=\frac{\partial\pi_{j}\left(k_{j},k_{-j}\right)}{\partial k_{j}}+(1-\delta)c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+\beta\frac{\partial k_{-j}^{\prime}}{\partial k_{j}}\frac{\partial V_{j}\left(k_{j}^{\prime},k_{-j}^{\prime}\right)}{\partial k_{-j}^{\prime}} (15)

If firm jj alone exists, the last term disappears, and we have:

Vj(kj,kj)kj=πj(kj,kj)kj+(1δ)(θ1+2θ2ij)\frac{\partial V_{j}(k_{j},k_{-j})}{\partial k_{j}}=\frac{\partial\pi_{j}(k_{j},k_{-j})}{\partial k_{j}}+(1-\delta)\left(\theta_{1}+2\theta_{2}i_{j}\right)

It implies that we can analytically solve for i_j given the value of V_j(k_j,k_{-j}); a code sketch of this inversion is given after the algorithm below. Hence, we can consider the following algorithm to solve for V when only firm j exists:

  1. 1.

    Take grid points kjk_{j}. Set initial values of Vj(0)(kj)V_{j}^{(0)}(k_{j}). Iterate the following until convergence (n=0,1,)(n=0,1,\cdots):

  2. 2.

    Given Vj(n)V_{j}^{(n)},

    1. (a)

      Analytically solve for ij(n+1)(kj)i_{j}^{(n+1)}(k_{j}) satisfying Vj(n)(kj)kj=πj(kj)kj+(1δ)(θ1+2θ2ij(n+1)(kj))\frac{\partial V_{j}^{(n)}(k_{j})}{\partial k_{j}}=\frac{\partial\pi_{j}(k_{j})}{\partial k_{j}}+(1-\delta)\left(\theta_{1}+2\theta_{2}i_{j}^{(n+1)}\left(k_{j}\right)\right)

    2. (b)

      Update Vj(kj)V_{j}(k_{j}) by : Vj(n+1)(kj)=πj(kj)c(ij(n+1)(kj))+βVj(n)((1δ)kj+ij(n+1)(kj))V_{j}^{(n+1)}(k_{j})=\pi_{j}(k_{j})-c\left(i_{j}^{(n+1)}(k_{j})\right)+\beta V_{j}^{(n)}\left((1-\delta)k_{j}+i_{j}^{(n+1)}(k_{j})\right)
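As noted above, step 2(a) is a simple analytical inversion of the envelope condition. A minimal sketch, assuming the derivatives ∂V/∂k and ∂π/∂k are available from the current approximations (names are illustrative):

```python
def invest_from_envelope(dV_dk, dpi_dk, delta, theta1, theta2):
    """Invert the single-firm envelope condition
    dV/dk = dpi/dk + (1 - delta) * (theta1 + 2 * theta2 * i)
    for investment i (step 2(a) above); a sketch with illustrative names."""
    return ((dV_dk - dpi_dk) / (1.0 - delta) - theta1) / (2.0 * theta2)
```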

In contrast, in the model involving multiple firms, the term βkjkjVj(kj,kj)kj\beta\frac{\partial k_{-j}^{\prime}}{\partial k_{j}}\frac{\partial V_{j}(k_{j}^{\prime},k_{-j}^{\prime})}{\partial k_{-j}^{\prime}} does not disappear, and we need knowledge of the values of the term kjkj\frac{\partial k_{-j}^{\prime}}{\partial k_{j}} to analytically solve for ij(n+1)(kj)i_{j}^{(n+1)}(k_{j}). However, the term cannot be directly recovered from VV, and the idea of ECM is not directly applicable.

6.1.2 Euler Equation (EE) method (cf. Judd,, 1998)

The Euler equation for the model is:181818See Appendix A.3 for the derivation. kj′′k_{j}^{\prime\prime} denotes firm jj’s capital two periods later than the current period.

0=c(kj(1δ)kj)+βπj(kj,kj)kj+β(1δ)c(kj′′(1δ)kj)+β2kj′′kjVj(kj′′,kj′′)kj′′0=-c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+\beta\frac{\partial\pi_{j}(k_{j}^{\prime},k_{-j}^{\prime})}{\partial k_{j}^{\prime}}+\beta(1-\delta)c^{\prime}\left(k_{j}^{\prime\prime}-(1-\delta)k_{j}^{\prime}\right)+\beta^{2}\frac{\partial k_{-j}^{\prime\prime}}{\partial k_{j}^{\prime}}\frac{\partial V_{j}(k_{j}^{\prime\prime},k_{-j}^{\prime\prime})}{\partial k_{-j}^{\prime\prime}} (16)

If only one firm exists and c(i)=θ1i+θ2i2c(i)=\theta_{1}i+\theta_{2}i^{2}, the last term disappears, and

0=-\theta_{1}-2\theta_{2}i_{j}(k_{j})+\beta\frac{\partial\pi_{j}\left((1-\delta)k_{j}+i_{j}(k_{j})\right)}{\partial i_{j}}+\beta(1-\delta)\left[\theta_{1}+2\theta_{2}i_{j}^{\prime}\right]

holds.

Hence, we can consider the following algorithm (step 2 is sketched in code after the list):

  1. 1.

    Take grid points kjk_{j}. Set initial values of ij(0)(kj)i_{j}^{(0)}(k_{j}). Iterate the following until convergence (n=0,1,)(n=0,1,\cdots):

  2. 2.

    Given i_j^{(n)}(k_j), analytically solve for i_j^{(n+1)}(k_j) satisfying 0=-\theta_{1}-2\theta_{2}i_{j}^{(n+1)}(k_{j})+\beta\frac{\partial\pi_{j}\left(k_{j}^{\prime}(k_{j})\right)}{\partial k_{j}^{\prime}(k_{j})}+\beta(1-\delta)\left[\theta_{1}+2\theta_{2}\overline{i_{j}^{(n)}}(k_{j}^{\prime}(k_{j}))\right],

    where kj(kj)=(1δ)kj+ij(n)(kj)k_{j}^{\prime}(k_{j})=(1-\delta)k_{j}+i_{j}^{(n)}(k_{j}), and ij(n)¯\overline{i_{j}^{(n)}} denotes the interpolated value of iji_{j} using the values of ij(n)(k)i_{j}^{(n)}(k).
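Step 2 is again an analytical inversion, this time of the Euler equation with c'(i) = θ_1 + 2θ_2 i. A minimal sketch (names are illustrative; dpi_dkprime stands for ∂π_j/∂k_j' evaluated at k_j'(k_j), and i_next for the interpolated value of i_j^{(n)} at k_j'(k_j)):

```python
def ee_update_investment(dpi_dkprime, i_next, beta, delta, theta1, theta2):
    """Solve 0 = -theta1 - 2*theta2*i + beta*dpi_dkprime
                 + beta*(1 - delta)*(theta1 + 2*theta2*i_next)
    for i, the single-firm Euler-equation update; illustrative names."""
    rhs = -theta1 + beta * dpi_dkprime + beta * (1.0 - delta) * (theta1 + 2.0 * theta2 * i_next)
    return rhs / (2.0 * theta2)
```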

However, for dynamic games, the last term of equation (16), \beta^{2}\frac{\partial k_{-j}^{\prime\prime}}{\partial k_{j}^{\prime}}\frac{\partial V_{j}(k_{j}^{\prime\prime},k_{-j}^{\prime\prime})}{\partial k_{-j}^{\prime\prime}}, remains. This term corresponds to the effect of strategic interactions under the Markov perfect equilibrium, and it cannot be directly computed without introducing V_j. This implies that the EE method is not directly applicable to dynamic games.

6.1.3 Endogenous Gridpoint Method (EGM; Carroll,, 2006)

The idea behind EGM is to solve the problem backward rather than forward. If we directly applied the EGM to the dynamic investment competition model, the algorithm would be as follows:

  1. 1.

    Take grid points k^(kj^)j𝒥𝒮E^\widehat{k^{\prime}}\equiv\left(\widehat{k_{j}^{\prime}}\right)_{j\in\mathcal{J}}\in\widehat{\mathcal{S}_{E}}. Set initial values of (Vj(0)(k(k^)))j𝒥,k^𝒮E^\left(V_{j}^{(0)}\left(k\left(\widehat{k^{\prime}}\right)\right)\right)_{j\in\mathcal{J},\widehat{k^{\prime}}\in\widehat{\mathcal{S}_{E}}}. Iterate the following until convergence (n=0,1,,)(n=0,1,\cdots,):

  2. 2.

    Given V(n)V^{(n)},

    1. (a)

      Analytically solve for i_j^{(n+1)}\left(\widehat{k^{\prime}}\right) satisfying -\theta_{1}-2\theta_{2}i_{j}^{(n+1)}\left(\widehat{k^{\prime}}\right)+\beta\frac{\partial V_{j}^{(n)}\left(\widehat{k_{j}^{\prime}},\widehat{k_{-j}^{\prime}}\right)}{\partial\widehat{k_{j}^{\prime}}}=0

    2. (b)

      Compute endogenous grid point kj(n+1)(k^)=kj^ij(n+1)(k^)1δk_{j}^{(n+1)}\left(\widehat{k^{\prime}}\right)=\frac{\widehat{k_{j}^{\prime}}-i_{j}^{(n+1)}\left(\widehat{k^{\prime}}\right)}{1-\delta}

    3. (c)

      Update Vj(k(k^))V_{j}\left(k\left(\widehat{k^{\prime}}\right)\right) by Vj(n+1)(k(k^))=πj(k(k^))c(ij(n+1)(k^))+βVj(n)(k^)V_{j}^{(n+1)}\left(k\left(\widehat{k^{\prime}}\right)\right)=\pi_{j}\left(k\left(\widehat{k^{\prime}}\right)\right)-c\left(i_{j}^{(n+1)}\left(\widehat{k^{\prime}}\right)\right)+\beta V_{j}^{(n)}\left(\widehat{k^{\prime}}\right)

Here, suppose that two equilibria i^{(1)}(k^*)≡(i_j^{(1)}(k^*))_{j∈𝒥} and i^{(2)}(k^*)≡(i_j^{(2)}(k^*))_{j∈𝒥} exist at a state k^*. Let k_j'^{(m)}=(1-δ)k_j^*+i_j^{(m)}(k^*) (m=1,2) be the associated state of firm j in the next period. We consider the case where we choose k'^{(1)}≡(k_j'^{(1)})_{j∈𝒥} and k'^{(2)}≡(k_j'^{(2)})_{j∈𝒥} as grid points. Then, if we use the first endogenous grid point, we evaluate V_j(k^*) as π_j(k^*)-c(i_j^{(1)}(k^*))+βV_j(k'^{(1)}). If we use the second endogenous grid point, we evaluate V_j(k^*) as π_j(k^*)-c(i_j^{(2)}(k^*))+βV_j(k'^{(2)}). If we use both, we obtain two distinct values of V_j(k^*), which are hard to handle in the algorithm above.

6.2 Approach using cubic splines

Goettler and Gordon, (2011) and others191919Goettler and Gordon, (2011) solved a dynamic oligopoly model of durable goods firms introducing dynamic demand structure and firms' continuous investment decisions. use cubic splines to approximate the shape of the action value functions Q with respect to a. Because cubic splines are piecewise cubic functions, their derivatives are quadratic, so the counterpart of ∂Q/∂a=0 is a quadratic equation with an analytical solution. Hence, analysts need not solve nonlinear equations numerically. However, cubic splines require many grid points at which Q is evaluated to attain a good approximation, thereby requiring many function evaluations.
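For instance, on each spline segment Q(a) ≈ a_3 a³ + a_2 a² + a_1 a + a_0, the candidate optima solve the quadratic equation Q'(a)=0; a minimal sketch with illustrative coefficient names:

```python
import numpy as np

def critical_points_of_spline_segment(a3, a2, a1):
    """Real roots of Q'(a) = 3*a3*a**2 + 2*a2*a + a1 = 0, where
    Q(a) = a3*a**3 + a2*a**2 + a1*a + a0 on one spline segment
    (assumes a3 != 0); a sketch with illustrative names."""
    disc = (2.0 * a2) ** 2 - 4.0 * (3.0 * a3) * a1
    if disc < 0.0:
        return []                        # no interior critical point on this segment
    root = np.sqrt(disc)
    return [(-2.0 * a2 + s * root) / (6.0 * a3) for s in (1.0, -1.0)]
```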

6.3 Projection method and deep learning-based methods

The projection method (e.g., Judd,, 1992) parameterizes variables, such as value functions and actions, as functions of states, and solves for the parameters that satisfy the nonlinear equations or minimize their residuals. The method has recently been extended by introducing deep learning techniques (e.g., Azinovic et al.,, 2022). As discussed in Section 3.3, the VF-PGI method treats the Bellman equations and the optimality conditions concerning actions as the nonlinear equations to be solved, so the proposed method resembles projection-based methods. Moreover, actions are updated in the direction of the gradient in the proposed algorithm, and deep learning techniques also rely on gradient-descent-type algorithms. Note that our proposed algorithm might be more intuitive than projection-based algorithms, because we can provide an intuitive explanation of the iteration process until convergence, as discussed in Section 3.

6.4 Reinforcement learning

The concept of using the gradient of “long-term reward” concerning actions for updating actions is similar to the Deterministic Policy Gradient method (DPG; Silver et al.,, 2014), which has been widely applied in the literature of reinforcement learning.202020The DPG method utilizing deep learning techniques is known as DDPG (Lillicrap et al.,, 2015). DDPG was extended to the multi-agent deterministic policy gradient method (MADDPG) by Lowe et al., (2017). However, the settings and ideas are slightly different, and I clarify the point here by considering a single-agent dynamic model.

Unlike the dynamic models examined in the current study, DPG was developed for model-free settings, where agents do not know the parametric forms of per-period profits and state transitions. DPG approximates the action value function Q(a,s) by \widetilde{Q}(a,s;\theta_{Q}) and the action a(s) by \widetilde{a}(s;\theta_{a}). The method iterates joint updates of the parameters θ_Q and θ_a until convergence. The parameter θ_a is updated by \theta_{a}^{(n+1)}=\theta_{a}^{(n)}+\lambda_{a}\nabla_{\theta_{a}}\widetilde{a}\left(s;\theta_{a}\right)\nabla_{a}\widetilde{Q}\left(s,a\right)|_{a=\widetilde{a}\left(s;\theta_{a}\right)}, where λ_a denotes the learning rate.212121See Silver et al., (2014) for details. The updating equation implies that θ_a is updated so that it steps in the direction of the gradient of the action value function Q, an idea analogous to the VF-PGI algorithm. Note that VF-PGI is more intuitive in that it directly updates actions rather than parameters related to the actions.
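To make the comparison concrete, the following is a toy sketch of the DPG-style actor update for a policy that is linear in parameters, a(s) = θ_a·φ(s), so that ∇_{θ_a}a(s) = φ(s); all names are illustrative, and this is not the DDPG implementation of Lillicrap et al., (2015).

```python
def dpg_actor_update(theta_a, phi_s, dQ_da, lr_a):
    """theta_a <- theta_a + lr_a * grad_theta a(s) * dQ/da for a scalar action
    a(s) = theta_a . phi(s), so that grad_theta a(s) = phi(s); a toy sketch.
    theta_a, phi_s: parameter and feature vectors; dQ_da: an estimate of dQ/da."""
    return theta_a + lr_a * phi_s * dQ_da
```

With NumPy arrays for theta_a and phi_s, this matches the updating equation above for the scalar-action case.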

7 Conclusions

This study proposes the Value Function-Policy Gradient Iteration-Spectral (VF-PGI-Spectral) algorithm for computationally efficiently solving dynamic models with continuous actions. The VF-PGI-Spectral algorithm is general, in that it is not limited to models with specific functional forms, and it is applicable to single-agent/multi-agent dynamic models with single/multiple continuous actions. The algorithm is intuitive, easy to code, and substantially faster than the traditional value function iteration algorithm in some settings.

Although the VF-PGI-Spectral algorithm performs numerically well in this study, more theoretical investigation of the convergence properties of the algorithm would be interesting.

Furthermore, although the current study has employed the spectral algorithm to accelerate the convergence and it works well, it is unclear whether the spectral algorithm is the best in terms of convergence speed. Previous studies in operation research have investigated other methods to accelerate value function iterations and policy iterations, such as Anderson acceleration and the idea of momentum.222222For recent studies, see the literature review in Grand-Clément, (2021). Though they have implicitly considered dynamic optimization problems with discrete actions, their ideas would also provide insights into the dynamic models with continuous actions. I leave the investigation of these alternative acceleration methods for further research.

Appendix A Proof

A.1 Proof of Proposition 1

Proof.

(Proof of \Rightarrow)

  • If fi(x)0f_{i}(x^{*})\leq 0 and xi=lix_{i}^{*}=l_{i}, Φ^i(x)xi=lili=0\widehat{\Phi}_{i}(x^{*})-x_{i}^{*}=l_{i}-l_{i}=0.

  • If fi(x)0f_{i}(x^{*})\geq 0 and xi=uix_{i}^{*}=u_{i}, Φ^i(x)xi=uiui=0\widehat{\Phi}_{i}(x^{*})-x_{i}^{*}=u_{i}-u_{i}=0.

  • If x_i^*∈(l_i,u_i), f_i(x^*)=0 implies x_i^*=Φ_i(x^*). Hence, Φ̂_i(x^*)-x_i^*=P_{B_i}(Φ_i(x^*))-x_i^*=P_{B_i}(x_i^*)-x_i^*=0.

Note that neither (f_i(x^*)>0 & x_i^*=l_i) nor (f_i(x^*)<0 & x_i^*=u_i) holds under (10).

Therefore, \Rightarrow holds.

(Proof of \Leftarrow)

  • Case of xi=lix_{i}^{*}=l_{i}:

    Suppose f_i(x^*)>0. Then, by the definition of Φ̂, l_i=x_i^*=P_{B_i}(Φ_i(x^*)) holds. Since l_i∈B_i, l_i=P_{B_i}(Φ_i(x^*))=Φ_i(x^*) holds. It implies x_i^*=Φ_i(x^*), and f_i(x^*)=0 holds by (11). This is a contradiction. Hence, f_i(x^*)≤0 holds.

  • Case of x_i^*=u_i: We can show f_i(x^*)≥0 as in the case of x_i^*=l_i.

  • Case of xi(li,ui)x_{i}^{*}\in(l_{i},u_{i}): Under xi(li,ui)x_{i}^{*}\in(l_{i},u_{i}), xi=Φ^i(x)=PBi(Φi(x))=Φi(x)x_{i}^{*}=\widehat{\Phi}_{i}(x^{*})=P_{B_{i}}\left(\Phi_{i}(x^{*})\right)=\Phi_{i}(x^{*}) holds. Hence, by (11), fi(x)=0f_{i}(x^{*})=0 holds.

A.2 Proof of Proposition 2

Proof.

It is sufficient to show the following.

  • limxili+0,fi(x)0Φ^i(x)=li\lim_{x_{i}\rightarrow l_{i}+0,f_{i}(x)\leq 0}\widehat{\Phi}_{i}(x)=l_{i}:

    limxili+0,fi(x)0Φ^i(x)\displaystyle\lim_{x_{i}\rightarrow l_{i}+0,f_{i}(x)\leq 0}\widehat{\Phi}_{i}(x) =\displaystyle= limxili+0,fi(x)0PBi(Φi(x))\displaystyle\lim_{x_{i}\rightarrow l_{i}+0,f_{i}(x)\leq 0}P_{B_{i}}\left(\Phi_{i}(x)\right)
    =\displaystyle= PBi(limxili+0,fi(x)0Φi(x))\displaystyle P_{B_{i}}\left(\lim_{x_{i}\rightarrow l_{i}+0,f_{i}(x)\leq 0}\Phi_{i}(x)\right)
    =\displaystyle= PBi(li)=li\displaystyle P_{B_{i}}(l_{i})=l_{i}
  • limxixis.t.fi(li,xi)>0,fi(li,xi)=0Φ^i(li,xi)=li\lim_{x_{-i}\rightarrow x_{-i}^{*}\ s.t.\ f_{i}(l_{i},x_{-i})>0,\ f_{i}(l_{i},x_{-i}^{*})=0}\widehat{\Phi}_{i}(l_{i},x_{-i})=l_{i}:

    Let xix_{-i}^{*} be such that fi(li,xi)=0f_{i}(l_{i},x_{-i}^{*})=0. Because Φi(li,xi)\Phi_{i}(l_{i},x_{-i}) is continuous with respect to xix_{-i}, for any ϵ>0\epsilon>0, there exists δ>0\delta>0 such that |xixi|<δ|Φi(li,xi)Φi(li,xi)|<ϵ|x_{-i}-x_{-i}^{*}|<\delta\Rightarrow\left|\Phi_{i}(l_{i},x_{-i})-\Phi_{i}(l_{i},x_{-i}^{*})\right|<\epsilon. Then, for xix_{-i} such that fi(li,xi)>0f_{i}(l_{i},x_{-i})>0,

    |Φi^(li,xi)li|\displaystyle\left|\widehat{\Phi_{i}}(l_{i},x_{-i})-l_{i}\right| =\displaystyle= |PBi(Φi(li,xi))li|\displaystyle\left|P_{B_{i}}\left(\Phi_{i}(l_{i},x_{-i})\right)-l_{i}\right|
    \displaystyle\leq |Φi(li,xi)li|(PBi(Φi(li,xi))li)\displaystyle\left|\Phi_{i}(l_{i},x_{-i})-l_{i}\right|\ \left(\because P_{B_{i}}\left(\Phi_{i}(l_{i},x_{-i})\right)\geq l_{i}\right)
    =\displaystyle= |Φi(li,xi)Φi(li,xi)|\displaystyle\left|\Phi_{i}(l_{i},x_{-i})-\Phi_{i}(l_{i},x_{-i}^{*})\right|
    \displaystyle\leq \displaystyle\epsilon\ \text{for}\ |x_{-i}-x_{-i}^{*}|<\delta
  • limxixis.t.fi(ui,xi)<0,fi(ui,xi)=0Φ^i(ui,xi)=ui\lim_{x_{-i}\rightarrow x_{-i}^{*}\ s.t.\ f_{i}(u_{i},x_{-i})<0,\ f_{i}(u_{i},x_{-i}^{*})=0}\widehat{\Phi}_{i}(u_{i},x_{-i})=u_{i}: Similar to the case above.

A.3 Derivation of the envelope condition and Euler equation in the dynamic investment competition model with continuous states

In this section, we derive equations (15) and (16).

Let kj(k)k_{j}^{\prime}(k) be the optimal next period capital stock of firm jj given the current state kk. By differentiating Vj(kj,kj)=πj(kj,kj)c(kj(k)(1δ)kj)+βVj(kj(k),kj(k))V_{j}(k_{j},k_{-j})=\pi_{j}(k_{j},k_{-j})-c\left(k_{j}^{\prime}(k)-(1-\delta)k_{j}\right)+\beta V_{j}\left(k_{j}^{\prime}(k),k_{-j}^{\prime}(k)\right) with respect to kjk_{j}, we have:

Vj(kj,kj)kj\displaystyle\frac{\partial V_{j}(k_{j},k_{-j})}{\partial k_{j}} =\displaystyle= πj(kj,kj)kj+(1δ)c(kj(1δ)kj)+\displaystyle\frac{\partial\pi_{j}(k_{j},k_{-j})}{\partial k_{j}}+(1-\delta)c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+ (17)
kjkj(c(kj(1δ)kj)+βVj(kj,kj)kj)+βkjkjV(kj,kj)kj\displaystyle\frac{\partial k_{j}^{\prime}}{\partial k_{j}}\left(-c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+\beta\frac{\partial V_{j}(k_{j}^{\prime},k_{-j}^{\prime})}{\partial k_{j}^{\prime}}\right)+\beta\frac{\partial k_{-j}^{\prime}}{\partial k_{j}}\frac{\partial V(k_{j}^{\prime},k_{-j}^{\prime})}{\partial k_{-j}^{\prime}}

By firm jj’s optimality condition regarding the choice of kjk_{j}^{\prime},

c(kj(1δ)kj)+βV(kj,kj)kj=0-c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+\beta\frac{\partial V(k_{j}^{\prime},k_{-j}^{\prime})}{\partial k_{j}^{\prime}}=0 (18)

holds, and equations (17) and (18) imply

Vj(kj,kj)kj=πj(kj,kj)kj+(1δ)c(kj(1δ)kj)+βkjkjVj(kj,kj)kj,\frac{\partial V_{j}\left(k_{j},k_{-j}\right)}{\partial k_{j}}=\frac{\partial\pi_{j}\left(k_{j},k_{-j}\right)}{\partial k_{j}}+(1-\delta)c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+\beta\frac{\partial k_{-j}^{\prime}}{\partial k_{j}}\frac{\partial V_{j}\left(k_{j}^{\prime},k_{-j}^{\prime}\right)}{\partial k_{-j}^{\prime}},

namely, equation (15).

Next, equations (15) and (18) imply

0\displaystyle 0 =\displaystyle= c(kj(1δ)kj)+βπj(kj,kj)kj+β(1δ)c(kj′′(1δ)kj)+β2kj′′kjVj(kj′′,kj′′)kj′′,\displaystyle-c^{\prime}\left(k_{j}^{\prime}-(1-\delta)k_{j}\right)+\beta\frac{\partial\pi_{j}(k_{j}^{\prime},k_{-j}^{\prime})}{\partial k_{j}^{\prime}}+\beta(1-\delta)c^{\prime}\left(k_{j}^{\prime\prime}-(1-\delta)k_{j}^{\prime}\right)+\beta^{2}\frac{\partial k_{-j}^{\prime\prime}}{\partial k_{j}^{\prime}}\frac{\partial V_{j}(k_{j}^{\prime\prime},k_{-j}^{\prime\prime})}{\partial k_{-j}^{\prime\prime}},

namely, equation (16).

Appendix B Details of the Numerical experiments

B.1 Details of the numerical experiments in Section 5.1

The code for the experiment is based on the replication code of Maliar and Maliar, (2013). As in Maliar and Maliar, (2013), we use a rectangular, uniformly spaced grid of 10 × 10 points for capital k and productivity z within an ergodic range as the solution domain. Expectations concerning future exogenous variables are approximated using a 3-node Gauss-Hermite quadrature rule. We parameterize value functions with complete ordinary polynomials of degree 4. We assume the iteration has converged when the unit-free norms of the variables are less than 10^{-6}; for instance, we assume VFI has converged when \left\|\frac{V^{(n+1)}(s)}{V^{(n)}(s)}-1\right\|_{\infty}\leq 10^{-6}.
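A minimal sketch of this unit-free convergence check (the function name is illustrative):

```python
import numpy as np

def converged(V_new, V_old, tol=1e-6):
    """Unit-free sup-norm criterion || V_new / V_old - 1 ||_inf <= tol."""
    return np.max(np.abs(V_new / V_old - 1.0)) <= tol
```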

As in Maliar and Maliar, (2013), we use z(1l¯)z(1-\overline{l}) as the initial labor function. Our initial guess on the consumption function is that a constant fraction πc\pi_{c} of the period output goes to consumption and the rest goes to investment. As initial value functions, I use values consistent with the labor and consumption functions.

Regarding the ECM and EGM algorithms, I show their steps in Algorithms 6 and 7.

  1. 1.

    Set grid points (k,z)(k,z) and initial V(k,z)V(k,z).

  2. 2.

    Iterate the following until convergence

    1. (a)

      Solve for ll satisfying Vk(k,z)=B(1l)μa(1α)kαlα[1δ+aαkα1l1α]V_{k}(k,z)=\frac{B(1-l)^{-\mu}}{a(1-\alpha)k^{\alpha}l^{-\alpha}}\left[1-\delta+a\alpha k^{\alpha-1}l^{1-\alpha}\right].

    2. (b)

      Compute c=(B(1l)μ(1α)Azkαlα)1γc=\left(\frac{B(1-l)^{-\mu}}{(1-\alpha)Azk^{\alpha}l^{-\alpha}}\right)^{\frac{-1}{\gamma}} and k=(1δ)k+zr(k,l)ck^{\prime}=(1-\delta)k+zr(k,l)-c

    3. (c)

      Update V(k,z)V(k,z) by V(k,z)u(c,l)+βE[V¯(k,z)]V(k,z)\leftarrow u(c,l)+\beta E\left[\overline{V}(k^{\prime},z^{\prime})\right]

Algorithm 6 ECM algorithm for the single-agent growth model with elastic labor supply
  1. 1.

    Set grid points (k,z)(k^{\prime},z) and initial V(k(k),z)V\left(k(k^{\prime}),z\right).

  2. 2.

    Iterate the following until convergence. Here, let W(k,z)Ez[V(k,z)|z]W\left(k^{\prime},z\right)\equiv E_{z^{\prime}}\left[V(k^{\prime},z^{\prime})|z\right].

    1. (a)

      Compute c=[βWk(k,z)]1/γc=\left[\beta W_{k^{\prime}}\left(k^{\prime},z\right)\right]^{-1/\gamma}

    2. (b)

      Solve for ll satisfying k=(1δ)(B(1l)μβWk(k,z)z(1α))1/αl+B(1l)μlβWk(k,z)(1α)ck^{\prime}=(1-\delta)\left(\frac{B(1-l)^{-\mu}}{\beta W_{k}(k^{\prime},z)z(1-\alpha)}\right)^{1/\alpha}l+\frac{B(1-l)^{-\mu}l}{\beta W_{k}(k^{\prime},z)(1-\alpha)}-c.

    3. (c)

      Compute c=(B(1l)μ(1α)Azkαlα)1γc=\left(\frac{B(1-l)^{-\mu}}{(1-\alpha)Azk^{\alpha}l^{-\alpha}}\right)^{\frac{-1}{\gamma}} and k=k+czr(k,l)1δk=\frac{k^{\prime}+c-zr(k,l)}{1-\delta}

    4. (d)

      Update V(k(k),z)V\left(k(k^{\prime}),z\right) by V(k(k),z)u(c,l)+βW(k,z)V\left(k(k^{\prime}),z\right)\leftarrow u(c,l)+\beta W(k^{\prime},z)

Algorithm 7 EGM algorithm for the single-agent growth model with elastic labor supply

B.2 Details of the numerical experiments in Section 5.2

I compute the expectations concerning future exogenous variables by using a 10-node Gauss Hermite quadrature rule. To ease the computation of the expectation, I apply the “precomputation of integrals” technique in Judd et al., (2017). The technique reduces the computational cost per iteration.

Regarding basis functions, I use Chebyshev polynomials. As the solution domain, I choose a rectangular grid. The range of each firm's capital stock k_{jt} is [1/1.5, 1×1.5]. The range of the exogenous demand shock ξ is [4/1.03, 4×1.03], and the range of the exogenous cost shock μ is [2/1.03, 2×1.03]. To reduce the number of grid points without largely losing numerical accuracy, I apply the Smolyak method (Smolyak,, 1963; Judd et al.,, 2014). The approximation level, denoted as μ in Judd et al., (2014), is set to 3. Note that the relative performance of the algorithms rarely changes even when not introducing such additional techniques.

As initial value functions, we use values assuming that exogenous states are constant over time and firms make constant investments so that the current capital stocks remain constant over time. As initial investment values, we use 0. The tolerance level for evaluating convergence is set to 1E-10.

When applying the VFI* algorithm, we need to solve nonlinear optimization problems for each firm and state. To solve the problem, I apply Newton’s method.

B.3 Details of the numerical experiments in Section 5.3

The code for the experiment is based on the replication code of Pakes and McGuire, (1994). As in Pakes and McGuire, (1994), we use 0 as the initial value functions and investments. To make the comparison easier, I modified the original code so that the steps are consistent with the VF-PGI* algorithm shown in Algorithm 4. Note that the modification rarely affected the performance of the algorithm. The tolerance level for evaluating convergence is set to 1E-10.

In addition, because some global variables made the iterations more computationally costly, I removed these global variables and passed them as function arguments instead. When solving nonlinear optimization problems in the absence of analytical formulas for the solutions, I use the fminbnd function in MATLAB, setting the lower and upper bounds to 0 and 1.

Appendix C Additional results

C.1 Single-agent neoclassical growth model with elastic labor supply

C.1.1 Effect of the value of λ\lambda on the performance of the VF-PGI algorithm

Table 6 shows the numerical results when we do not apply the spectral algorithm to VF-PGI. The results imply that the iteration does not diverge when we set λ≤10^{-8}. Nevertheless, convergence is very slow, and the iterations terminated before reaching the convergence criterion. In contrast, when λ≥10^{-7}, the iterations diverge. Hence, though VF-PGI-Spectral works well, VF-PGI itself is not practical.

Table 6: Effect of the value of λ\lambda in the VF-PGI algorithm (Single agent neoclassical growth model with elastic labor supply)
Method log10(λ)\log_{10}(\lambda) CPU (sec) L1L_{1} LL_{\infty} Iter. Conv.
VF-PGI -10 3.495 -3.266 -2.394 3000 0
VF-PGI -9 3.925 -3.852 -3.256 3000 0
VF-PGI -8 5.322 -5.425 -3.983 3000 0
VF-PGI -7 0.046 NaN NaN 8 0
VF-PGI -6 0.015 NaN NaN 4 0
VF-PGI -5 0.021 NaN NaN 6 0

Notes. α0=1\alpha_{0}=1.

L_{1} and L_{\infty} denote the average and the maximum absolute values of Euler residuals in log10 units on a stochastic simulation of 10,000 observations.

Table 7 shows the results when we set λ=1. The results show that when we set α_0 to a sufficiently small value, the iterations converge, and the value of α_0 does not largely affect the convergence speed as long as the iterations converge. For relatively large values of α_0, the iterations diverge.

Table 7: Effect of the value of α0\alpha_{0} in the VF-PGI-Spectral algorithm (Single agent neoclassical growth model with elastic labor supply)
Method log10(α0)\log_{10}(\alpha_{0}) CPU (sec) L1L_{1} LL_{\infty} Iter. Conv.
VF-PGI-Spectral -10 0.169 -5.425 -3.983 136 1
VF-PGI-Spectral -9 0.129 -5.425 -3.983 115 1
VF-PGI-Spectral -8 0.12 -5.425 -3.983 95 1
VF-PGI-Spectral -7 0.137 -5.425 -3.983 112 1
VF-PGI-Spectral -6 0.022 NaN NaN 14 0
VF-PGI-Spectral -5 0.008 NaN NaN 4 0

Notes. λ=1\lambda=1.

L_{1} and L_{\infty} denote the average and the maximum absolute values of Euler residuals in log10 units on a stochastic simulation of 10,000 observations.

C.1.2 Alternative specification: Same αz(z{V,ad=1,,D})\alpha_{z}\ \left(z\in\left\{V,a^{d=1,\cdots,D}\right\}\right)

Table 8 shows the results under the alternative setting where we assume that αz(z{V,ad=1,,D})\alpha_{z}\ \left(z\in\left\{V,a^{d=1,\cdots,D}\right\}\right) are common. The results show that the iteration becomes unstable when choosing a common value.

Table 8: Effect of the choice of αz(z{V,ad=1,,D})\alpha_{z}\ \left(z\in\left\{V,a^{d=1,\cdots,D}\right\}\right) in the VF-PGI-Spectral algorithm (Single agent neoclassical growth model with elastic labor supply)
Method Same αz\alpha_{z} CPU (sec) L1L_{1} LL_{\infty} Iter. Conv.
VF-PGI-Spectral No 0.156 -5.425 -3.983 138 1
VF-PGI-Spectral Yes 0.033 NaN NaN 19 0

Notes. α0=1,λ=107\alpha_{0}=1,\lambda=10^{-7}.

L_{1} and L_{\infty} denote the average and the maximum absolute values of Euler residuals in log10 units on a stochastic simulation of 10,000 observations.

C.2 Single-agent neoclassical growth model with inelastic labor supply

Settings

As in Arellano et al., (2016) and Coleman et al., (2021), we consider a standard single-agent neoclassical growth model with inelastic labor supply. The dynamic optimization problem is characterized by:

V(k,z)\displaystyle V(k,z) =\displaystyle= maxc,ku(c)+βE[V(k,z)]\displaystyle\max_{c,k^{\prime}}u(c)+\beta E\left[V(k^{\prime},z^{\prime})\right]
s.t.\displaystyle s.t. k=(1δ)k+zr(k)c\displaystyle k^{\prime}=(1-\delta)k+zr(k)-c
lnz=ρlnz+ϵ,ϵN(0,σ2)\displaystyle\ln z^{\prime}=\rho\ln z+\epsilon^{\prime},\ \epsilon^{\prime}\sim N(0,\sigma^{2})
u(c)=c1γ11γ\displaystyle u(c)=\frac{c^{1-\gamma}-1}{1-\gamma}
r(k)=Akα\displaystyle r(k)=Ak^{\alpha}

where k, c, and z denote capital, consumption, and the productivity level. We choose the parameters A=\frac{1/\beta-(1-\delta)}{\alpha}, \alpha=1/3, \beta=0.99, \delta=0.025, \rho=0.95, \sigma=0.01. The experiments are run by modifying the replication code of Coleman et al., (2021).232323Coleman et al., (2021)'s code is based on the one in Arellano et al., (2016).
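For reference, the Euler residuals reported in the accuracy columns can be computed from the model's first-order and envelope conditions, u'(c) = βE[u'(c')(1-δ+z'Aαk'^{α-1})]. The sketch below assumes the expectation over z' is approximated with quadrature nodes and weights; all names are illustrative.

```python
import numpy as np

def euler_residual(c, k_next, c_next, z_next, weights, beta, delta, gamma, alpha, A):
    """Unit-free Euler-equation residual
    beta * E[u'(c') * (1 - delta + z' * A * alpha * k'**(alpha - 1))] / u'(c) - 1
    for the inelastic-labor growth model; zero at the exact solution.
    c, k_next: scalars; c_next, z_next, weights: arrays over quadrature nodes."""
    mu_c = c ** (-gamma)                                # u'(c) for CRRA utility
    mu_c_next = c_next ** (-gamma)
    rhs = beta * np.sum(weights * mu_c_next
                        * (1.0 - delta + z_next * A * alpha * k_next ** (alpha - 1.0)))
    return rhs / mu_c - 1.0
```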

Results

Table 9 shows the results. For details of the 7 methods without the spectral algorithm, see Arellano et al., (2016). The results show that VF-PGI-Spectral works well, but ECM-VF-Spectral, ECM-DVF-Spectral, and EE-Spectral work better. In these three methods, we can entirely avoid solving nonlinear problems inside the iterations, and only one type of variable (value function or action) has to be updated. In contrast, in VF-PGI-Spectral, we have to update two types of variables (value functions and actions) separately, which might lead to slower convergence. Hence, when researchers can find good analytical representations that avoid solving nonlinear problems inside the iterations, these methods can be good alternatives to VF-PGI-Spectral.

Table 9: Speed and accuracy of algorithms (Single agent neoclassical growth model with inelastic labor supply)
Method CPU (sec) L1L_{1} LL_{\infty} Iter.
VF-PGI-Spectral 0.121 -6.089 -4.099 92
ECM-VF-Spectral 0.060 -6.064 -4.056 59
VFI-Spectral 4.921 -6.089 -4.099 66
EGM-Spectral 2.151 -6.091 -4.124 40
ECM-PI-Spectral 0.754 -5.465 -3.944 5
PI-Spectral 1.202 -5.396 -3.689 5
ECM-DVF-Spectral 0.046 -6.071 -4.137 31
EE-Spectral 0.076 -7.79 -5.936 48
ECM-VF 0.831 -6.064 -4.056 1072
VFI 106.812 -6.089 -4.099 1072
EGM 57.741 -6.091 -4.124 768
ECM-PI 0.384 -5.938 -4.044 5
PI 0.898 -5.952 -4.086 5
ECM-DVF 0.385 -6.071 -4.137 292
EE 0.445 -7.784 -5.934 324

Notes. α0=1,λ=0.001\alpha_{0}=1,\lambda=0.001.

L_{1} and L_{\infty} denote the average and the maximum absolute values of Euler residuals in log10 units on a stochastic simulation of 10,000 observations.

The results also show that combining the spectral algorithm does not lead to a large speed-up for PI (policy iteration)-type algorithms. As formally discussed in Puterman and Brumelle, (1979) and Santos and Rust, (2004), policy iteration is mathematically equivalent to applying Newton's method to the nonlinear Bellman equation. Because the role of the spectral algorithm is similar to that of Newton's method, the additional gain from introducing the spectral algorithm might not be large for policy iteration-type algorithms.

References

  • Aguirregabiria and Marcoux, (2021) Aguirregabiria, V. and Marcoux, M. (2021). Imposing equilibrium restrictions in the estimation of dynamic discrete games. Quantitative Economics, 12(4):1223–1271.
  • Arellano et al., (2016) Arellano, C., Maliar, L., Maliar, S., and Tsyrennikov, V. (2016). Envelope condition method with an application to default risk models. Journal of Economic Dynamics and Control, 69:436–459.
  • Azinovic et al., (2022) Azinovic, M., Gaegauf, L., and Scheidegger, S. (2022). Deep equilibrium nets. International Economic Review, 63(4):1471–1525.
  • Barzilai and Borwein, (1988) Barzilai, J. and Borwein, J. M. (1988). Two-point step size gradient methods. IMA journal of numerical analysis, 8(1):141–148.
  • Berry et al., (1995) Berry, S., Levinsohn, J., and Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63(4):841–890.
  • Carroll, (2006) Carroll, C. D. (2006). The method of endogenous gridpoints for solving dynamic stochastic optimization problems. Economics Letters, 91(3):312–320.
  • Coleman et al., (2021) Coleman, C., Lyon, S., Maliar, L., and Maliar, S. (2021). Matlab, python, julia: What to choose in economics? Computational Economics, 58:1263–1288.
  • Conlon and Gortmaker, (2020) Conlon, C. and Gortmaker, J. (2020). Best practices for differentiated products demand estimation with PyBLP. The RAND Journal of Economics, 51(4):1108–1161.
  • Cooper and Haltiwanger, (2006) Cooper, R. W. and Haltiwanger, J. C. (2006). On the nature of capital adjustment costs. The Review of Economic Studies, 73(3):611–633.
  • Fukasawa, (2024) Fukasawa, T. (2024). Fast and simple inner-loop algorithms of static/dynamic BLP estimations. arXiv preprint arXiv:2404.04494.
  • Fukasawa and Ohashi, (2023) Fukasawa, T. and Ohashi, H. (2023). Long-run Effect of a Horizontal Merger and Its Remedial Standards. Technical report, Research Institute of Economy, Trade and Industry (RIETI).
  • Goettler and Gordon, (2011) Goettler, R. L. and Gordon, B. R. (2011). Does AMD spur intel to innovate more? Journal of Political Economy, 119(6):1141–1200.
  • Grand-Clément, (2021) Grand-Clément, J. (2021). From convex optimization to MDPs: A review of first-order, second-order and quasi-Newton methods for MDPs. arXiv preprint arXiv:2104.10677.
  • Iskhakov, (2015) Iskhakov, F. (2015). Multidimensional endogenous gridpoint method: Solving triangular dynamic stochastic optimization problems without root-finding operations. Economics Letters, 135:72–76.
  • Judd, (1992) Judd, K. L. (1992). Projection methods for solving aggregate growth models. Journal of Economic theory, 58(2):410–452.
  • Judd, (1998) Judd, K. L. (1998). Numerical methods in economics. MIT press.
  • Judd et al., (2017) Judd, K. L., Maliar, L., Maliar, S., and Tsener, I. (2017). How to solve dynamic stochastic models computing expectations just once. Quantitative Economics, 8(3):851–893.
  • Judd et al., (2014) Judd, K. L., Maliar, L., Maliar, S., and Valero, R. (2014). Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain. Journal of Economic Dynamics and Control, 44:92–123.
  • La Cruz et al., (2006) La Cruz, W., Martínez, J., and Raydan, M. (2006). Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Mathematics of Computation, 75(255):1429–1448.
  • Lillicrap et al., (2015) Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
  • Lowe et al., (2017) Lowe, R., Wu, Y. I., Tamar, A., Harb, J., Pieter Abbeel, O., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems, 30.
  • Maliar and Maliar, (2013) Maliar, L. and Maliar, S. (2013). Envelope condition method versus endogenous grid method for solving dynamic programming problems. Economics Letters, 120(2):262–266.
  • Mermelstein et al., (2020) Mermelstein, B., Nocke, V., Satterthwaite, M. A., and Whinston, M. D. (2020). Internal versus external growth in industries with scale economies: A computational model of optimal merger policy. Journal of Political Economy, 128(1):301–341.
  • Miranda and Fackler, (2004) Miranda, M. J. and Fackler, P. L. (2004). Applied computational economics and finance. MIT press.
  • Pakes et al., (1993) Pakes, A., Gowrisankaran, G., and McGuire, P. (1993). Implementing the Pakes-McGuire algorithm for computing Markov perfect equilibria in Gauss. Technical report, Working paper, Yale University, New Haven.
  • Pakes and McGuire, (1994) Pakes, A. and McGuire, P. (1994). Computing Markov-Perfect Nash Equilibria: Numerical Implications of a Dynamic Differentiated Product Model. The RAND Journal of Economics, 25(4):555–589.
  • Puterman and Brumelle, (1979) Puterman, M. L. and Brumelle, S. L. (1979). On the convergence of policy iteration in stationary dynamic programming. Mathematics of Operations Research, 4(1):60–69.
  • Reynaerts et al., (2012) Reynaerts, J., Varadhan, R., and Nash, J. C. (2012). Enhancing the convergence properties of the BLP (1995) contraction mapping.
  • Santos and Rust, (2004) Santos, M. S. and Rust, J. (2004). Convergence properties of policy iteration. SIAM Journal on Control and Optimization, 42(6):2094–2115.
  • Silver et al., (2014) Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic policy gradient algorithms. In International conference on machine learning, pages 387–395. PMLR.
  • Smolyak, (1963) Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of functions. In Doklady Akademii Nauk, volume 148, pages 1042–1045. Russian Academy of Sciences.
  • Varadhan and Roland, (2008) Varadhan, R. and Roland, C. (2008). Simple and globally convergent methods for accelerating the convergence of any EM algorithm. Scandinavian Journal of Statistics, 35(2):335–353.