
The Block-Correlated Pseudo Marginal Sampler for State Space Models

David Gunawan\star, Pratiti Chatterjee\ddagger, and Robert Kohn\star\star
Abstract

Particle Marginal Metropolis-Hastings (PMMH) is a general approach to Bayesian inference when the likelihood is intractable but can be estimated unbiasedly. Our article develops an efficient PMMH method that scales up better to higher dimensional state vectors than previous approaches. The improvement is achieved by the following innovations. First, the trimmed mean of the unbiased likelihood estimates from multiple particle filters is used. Second, a novel block version of PMMH that works with multiple particle filters is proposed. Third, the article develops an efficient auxiliary disturbance particle filter, which is necessary when the bootstrap disturbance filter is inefficient but the state transition density cannot be expressed in closed form. Fourth, a novel sorting algorithm, which is as effective as previous approaches but significantly faster than them, is developed to preserve the correlation between the logs of the likelihood estimates at the current and proposed parameter values. The performance of the sampler is investigated empirically by applying it to non-linear Dynamic Stochastic General Equilibrium models with relatively high state dimensions and intractable state transition densities, and to multivariate stochastic volatility in the mean models. Although our focus is on applying the method to state space models, the approach will be useful in a wide range of applications, such as large panel data models and stochastic differential equation models with mixed effects.

Keywords: Block PMMH; correlated PMMH; dynamic stochastic general equilibrium (DSGE) model; trimmed mean likelihood estimate

\ddagger Level 4, West Lobby, School of Economics, University of New South Wales Business School – building E-12, Kensington Campus, UNSW Sydney – 2052, Email: [email protected], Phone Number: (+61) 293852150. Website: http://www.pratitichatterjee.com
\star 39C. 164, School of Mathematics and Applied Statistics (SMAS), University of Wollongong, Wollongong, 2522; Australian Center of Excellence for Mathematical and Statistical Frontiers (ACEMS); National Institute for Applied Statistics Research Australia (NIASRA); Email: [email protected]. Phone Number: (+61) 424379015.
\star\star Level 4, West Lobby, School of Economics, University of New South Wales Business School – Building E-12, Kensington Campus, UNSW Sydney – 2052, and ACEMS Email: [email protected], Phone Number: (+61) 424802159.

1 Introduction

Particle marginal Metropolis-Hastings (PMMH) (Andrieu and Roberts, 2009) is a Bayesian inference method for statistical models having an intractable likelihood, when the likelihood can be estimated unbiasedly. Our article develops a PMMH sampling method for efficiently estimating the parameters of complex state space models. The method scales better than current approaches in the number of observations and latent states and can handle state transition densities that cannot be expressed in closed form; e.g., many dynamic stochastic general equilibrium (DSGE) models, a popular class of macroeconomic time series state space models, do not have closed-form transition densities.

A key issue in efficiently estimating statistical models using a PMMH approach is that the variance of the log of the estimated likelihood grows with the number of observations and the dimension of the latent states (Deligiannidis et al., 2018). Pitt et al. (2012) show that, to balance computational time against the mixing of the Markov chain Monte Carlo (MCMC) chain, the number of particles used in the particle filter should be such that the variance of the log of the estimated likelihood is in the range 1 to 3, depending on the efficiency of the proposal for $\theta$. Pitt et al. (2012) also show that the efficiency of PMMH schemes deteriorates exponentially as that variance increases; we further note that in many complex statistical applications it is computationally very expensive to ensure that the variance of the log of the estimated likelihood is within the required range.

Deligiannidis et al. (2018) propose a more efficient PMMH scheme, called the correlated pseudo-marginal (CPM) method, which correlates the random numbers in the (single) particle filters used to construct the estimated likelihoods at the current and proposed values of the parameters. This dependence induces a positive correlation between the estimated likelihoods and reduces the variance of the difference in the logs of the estimated likelihoods that appears in the Metropolis-Hastings (MH) acceptance ratio. They show that the CPM scales up better with the number of observations than the standard pseudo-marginal method of Andrieu et al. (2010) when the state dimension is small.

Tran et al. (2016) propose an alternative correlated pseudo-marginal approach to that of Deligiannidis et al. (2018), calling it the block pseudo-marginal (BPM) method; the BPM divides the random numbers used in constructing the likelihood estimator into blocks and then updates the parameters jointly with one randomly chosen block of the random numbers in each MCMC iteration; this induces a positive correlation between the numerator and denominator of the MH acceptance ratio, similarly to the CPM. They show that for large samples the correlation between the logs of the estimated likelihoods at the current and proposed values is close to $1-1/S$, where $S$ is the number of blocks. However, they do not apply the BPM method to estimating time series state space models using the particle filter.

Our article introduces a new PMMH sampler, referred to as the multiple PMMH algorithm (MPM), which extends the CPM method of Deligiannidis et al. (2018) and the BPM method of Tran et al. (2016). The MPM sampler addresses the issues above in the following ways. (a) The likelihood estimator is a trimmed mean of unbiased likelihood estimators from multiple independent particle filters (PFs), which can be run in parallel; this reduces the variance of the likelihood estimator compared to using a single particle filter. The algorithm is exact when there is no trimming and approximate otherwise, but our empirical results suggest that the approximation is very close to exact. (b) The unknown parameters are updated jointly with the random numbers used in only one of the PFs. This is a novel block version of PMMH that works with multiple particle filters; see section 2.3 for details. (c) An auxiliary disturbance particle filter (ADPF) is proposed to estimate the likelihood efficiently for state space models, such as DSGE models, where the bootstrap filter is very inefficient and the state transition density does not have a closed form, so that an auxiliary PF cannot be constructed; see section 2.5 for details. (d) A novel sorting algorithm, which is as effective as previous approaches but significantly faster, is proposed to maintain the correlation between the logs of the likelihood estimates at the current and proposed values. The standard resampling step in the particle filter introduces discontinuities that break down this correlation even when the current and proposed parameters are close. Section 3.2 shows that the MPM sampler with the proposed sorting algorithm can maintain the correlation between the logs of the estimated likelihoods for relatively high dimensional state space models. Note that the proposed PMMH sampler works with any particle filter algorithm, including the tempered particle filter of Herbst and Schorfheide (2019), the standard bootstrap filter of Gordon et al. (1993), the disturbance particle filter of Murray et al. (2013), and the proposed auxiliary disturbance particle filter. When the state dimension is greater than the disturbance dimension, a disturbance particle filter is more efficient than a state-based particle filter; the disturbance particle filter is also useful for state space models with an intractable state transition density.

We illustrate the MPM sampler empirically, and compare its performance to the CPM method, using a standard linear Gaussian state space model, a multivariate stochastic volatility in the mean model, and two non-linear DSGE models, with both simulated and real data. Our work on estimating non-linear DSGE models is also related to Fernández-Villaverde and Rubio-Ramírez (2007) and Hall et al. (2014), who use standard PMMH methods. We note that the MPM sampler will also be useful for other complex statistical models, such as panel data models and stochastic differential equation mixed effects models.

The rest of the article is organised as follows. Section 2 introduces the state space model and gives some examples. Section 2.3 discusses the MPM sampler. Section 3 presents results from both simulated and real data. Section 4 concludes with a discussion of our major results and findings. The paper has an online supplement containing some further technical and empirical results.

2 Bayesian Inference for State Space Models

Notation

We use the colon notation for collections of variables, i.e., $a_{t}^{r:s}:=(a_{t}^{r},\dots,a_{t}^{s})$ and, for $t\leq u$, $a_{t:u}^{r:s}:=(a_{t}^{r:s},\dots,a_{u}^{r:s})$. Suppose $\{(Z_{t},Y_{t}),t\geq 0\}$ is a stochastic process, with parameter $\theta$, where the $Y_{t}$ are the observations and the $Z_{t}$ are the latent state vectors; random variables are denoted by capital letters and their realizations by lower case letters. We consider the state space model with $p(z_{0}|\theta)$ the density of $Z_{0}$, $p(z_{t}|z_{t-1},\theta)$ the density of $Z_{t}$ given $Z_{0:t-1}=z_{0:t-1}$ for $t\geq 1$, and $p(y_{t}|z_{t},\theta)$ the density of $Y_{t}$ given $Z_{0:t}=z_{0:t}$, $Y_{1:t-1}=y_{1:t-1}$.

2.1 Bayesian Inference

The objective of Bayesian inference is to obtain the posterior distribution of the model parameters $\theta$ and the latent states $z_{0:T}$, given the observations $y_{1:T}$ and a prior distribution $p(\theta)$; i.e.,

p\left(\theta,z_{0:T}|y_{1:T}\right)=p\left(y_{1:T}|\theta,z_{0:T}\right)p\left(z_{0:T}|\theta\right)p\left(\theta\right)/p\left(y_{1:T}\right), \quad (1)

where

p\left(y_{1:T}\right)=\int\int p\left(y_{1:T}|\theta,z_{0:T}\right)p\left(z_{0:T}|\theta\right)p\left(\theta\right)\,dz_{0:T}\,d\theta

is the marginal likelihood. The likelihood

p\left(y_{1:T}|\theta\right)=\int p\left(y_{1:T}|\theta,z_{0:T}\right)p\left(z_{0:T}|\theta\right)\,dz_{0:T}

can be computed exactly using the Kalman filter for linear Gaussian state space (LGSS) models and for linear DSGE models, so that posterior samples can be obtained using an MCMC sampling scheme. However, for the non-linear and non-Gaussian state space models and the non-linear DSGE models described in section 3.4, the likelihood can be estimated unbiasedly but cannot be computed exactly.

The bootstrap particle filter (Gordon et al., 1993) provides an unbiased estimator of the likelihood for a general state space model. Andrieu and Roberts (2009) and Andrieu et al. (2010) show that it is possible to use this unbiased estimator of the likelihood to carry out exact Bayesian inference for the parameters of the general state space model. They call this MCMC approach pseudo marginal Metropolis-Hastings (PMMH).

The non-linear (second-order) DSGE models considered in section 3.4 lie within the class of general non-linear state space models whose state transition density is difficult to work with or cannot be expressed in closed form. In such cases, it is useful to express the model in terms of the density of its latent disturbance variables, which can be expressed in closed form. The posterior in Eq. (1) becomes

p\left(\theta,\epsilon_{1:T},z_{0}|y_{1:T}\right)\propto\prod_{t=1}^{T}p\left(y_{t}|\epsilon_{1:t},\theta,z_{0}\right)p\left(\epsilon_{t}\right)p\left(z_{0}|\theta\right)p\left(\theta\right), \quad (2)

which Murray et al. (2013) call the disturbance state-space model. The standard state space model is recovered from the disturbance state space model through the deterministic mapping $z_{t}=F(z_{t-1},\epsilon_{t};\theta)$, which gives a state trajectory $z_{0:T}$ from any sample $(\epsilon_{1:T},z_{0})$. In the disturbance state-space model the target becomes the posterior distribution over the parameters $\theta$, the latent noise variables $\epsilon_{1:T}$, and the initial state $z_{0}$, rather than over $(\theta,z_{0:T})$.

2.2 Standard Pseudo Marginal Metropolis-Hastings

This section outlines the standard PMMH scheme, which carries out MCMC on an expanded space using an unbiased estimate of the likelihood. Our paper focuses on MCMC for the parameters, but it is straightforward to also obtain posterior inference for the unobserved states; see, e.g., Andrieu et al. (2010). Let $u$ consist of all the random variables required to compute the unbiased likelihood estimate $\widehat{p}_{N}(y|\theta,u)$, with $p(u)$ the density of $u$; let $p(\theta)$ be the prior of $\theta$. The joint posterior density of $\theta$ and $u$ is

p\left(\theta,u|y_{1:T}\right)=\widehat{p}_{N}\left(y_{1:T}|\theta,u\right)p\left(\theta\right)p\left(u\right)/p\left(y_{1:T}\right),

so that

p\left(\theta|y_{1:T}\right)=\int p\left(\theta,u|y_{1:T}\right)du=p\left(y_{1:T}|\theta\right)p\left(\theta\right)/p\left(y_{1:T}\right)

is the posterior of $\theta$, and $\int\widehat{p}_{N}(y_{1:T}|\theta,u)p(u)\,du=p(y_{1:T}|\theta)$ because the likelihood estimate is unbiased. We can therefore sample from the posterior density $p(\theta|y_{1:T})$ by sampling $\theta$ and $u$ from $p(\theta,u|y_{1:T})$. The subscript $N$ indicates the number of particles used to estimate the likelihood.

Let $q(\theta^{\prime}|\theta)$ be the proposal density for $\theta^{\prime}$ given the current value $\theta$, and $q(u^{\prime}|u)$ the proposal density for $u^{\prime}$ given $u$. We always assume that $q(u^{\prime}|u)$ satisfies the reversibility condition

q\left(u^{\prime}|u\right)p\left(u\right)=q\left(u|u^{\prime}\right)p\left(u^{\prime}\right);

it is clearly satisfied by the standard PMMH, where $q(u^{\prime}|u)=p(u^{\prime})$. We generate a proposal $\theta^{\prime}$ from $q(\theta^{\prime}|\theta)$ and $u^{\prime}$ from $q(u^{\prime}|u)$, and accept both proposals with probability

\alpha\left(\theta,u;\theta^{\prime},u^{\prime}\right)=\min\left(1,\frac{\widehat{p}_{N}\left(y|\theta^{\prime},u^{\prime}\right)p\left(\theta^{\prime}\right)p\left(u^{\prime}\right)q\left(\theta|\theta^{\prime}\right)q\left(u|u^{\prime}\right)}{\widehat{p}_{N}\left(y|\theta,u\right)p\left(\theta\right)p\left(u\right)q\left(\theta^{\prime}|\theta\right)q\left(u^{\prime}|u\right)}\right)=\min\left(1,\frac{\widehat{p}_{N}\left(y|\theta^{\prime},u^{\prime}\right)p\left(\theta^{\prime}\right)q\left(\theta|\theta^{\prime}\right)}{\widehat{p}_{N}\left(y|\theta,u\right)p\left(\theta\right)q\left(\theta^{\prime}|\theta\right)}\right). \quad (3)

The expression in Eq. (3) is identical to that of a standard Metropolis-Hastings algorithm, except that estimates of the likelihood at the current and proposed parameters are used. Andrieu and Roberts (2009) show that the resulting PMMH algorithm has the correct invariant distribution regardless of the variance of the estimated likelihoods. However, the performance of the PMMH approach depends crucially on the number of particles $N$ used to estimate the likelihood. The variance of the log of the estimated likelihood should be between 1 and 3, depending on the quality of the proposal for $\theta$; see Pitt et al. (2012) and Sherlock et al. (2015). In many applications of the non-linear state space models considered in this paper, it is computationally very expensive to ensure that the variance of the log of the estimated likelihood is within this range. Section 2.3 discusses the new PMMH sampler, which we call the multiple PMMH algorithm (MPM), that builds on and extends the CPM of Deligiannidis et al. (2018) and the BPM of Tran et al. (2016).
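To make the acceptance step concrete, the following Python sketch implements one standard PMMH iteration of Eq. (3) on the log scale, with $u^{\prime}$ drawn fresh from $p(u)$ so that the $u$-terms cancel; `log_lik_hat`, `propose_theta`, `log_prior`, and `log_q_ratio` are hypothetical user-supplied functions, not part of the paper.

```python
import numpy as np

def pmmh_step(theta, u, log_lik, log_lik_hat, log_prior,
              propose_theta, log_q_ratio, rng):
    """One standard PMMH iteration (Eq. 3) on the log scale.

    log_lik is log p_N(y | theta, u) carried over from the previous
    iteration; u' ~ p(u), so q(u'|u) = p(u') and the u-terms cancel.
    """
    theta_prop = propose_theta(theta, rng)      # theta' ~ q(. | theta)
    u_prop = rng.standard_normal(u.shape)       # u' ~ p(u), here N(0, I)
    log_lik_prop = log_lik_hat(theta_prop, u_prop)

    log_alpha = (log_lik_prop + log_prior(theta_prop)
                 - log_lik - log_prior(theta)
                 + log_q_ratio(theta, theta_prop))  # log q(theta|theta') - log q(theta'|theta)
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, u_prop, log_lik_prop     # accept
    return theta, u, log_lik                        # reject
```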

2.3 Multiple PMMH (MPM)

This section discusses Algorithm 1, the proposed multiple PMMH (MPM) method, which uses multiple particle filters to obtain the estimate of the likelihood. It also discusses the novel version of block-correlated PMMH that works with multiple particle filters to induce a high correlation between successive logs of the estimated likelihoods; a code sketch of one MPM iteration is given after Algorithm 1. We define the joint target density of $\theta$ and $\widetilde{u}=(u_{1},\dots,u_{S})$ as

p\left(\theta,\widetilde{u}|y_{1:T}\right)\propto\overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}\right)p\left(\theta\right)\prod_{s=1}^{S}p\left(u_{s}\right), \quad (4)

where $\overline{\widehat{p}}_{N}(y|\theta,\widetilde{u})$ is the likelihood estimate obtained from the $S$ particle filters discussed below. We update the parameters $\theta$ jointly with a randomly selected block $u_{s}$ in each MCMC iteration, with $\Pr(S=s)=1/S$ for each $s=1,\dots,S$. The selected block $u_{s}$ is updated using $u_{s}^{\prime}=\rho_{u}u_{s}+\sqrt{1-\rho_{u}^{2}}\,\eta_{u}$, where $\rho_{u}$ is the non-negative correlation between the random numbers $u_{s}$ and $u_{s}^{\prime}$ and $\eta_{u}$ is a standard normal vector of the same length as $u_{s}$. This is similar to component-wise MCMC, whose convergence is well established in the literature; see, e.g., Johnson et al. (2013). Under this scheme, the acceptance probability is

\alpha\left(\theta,\widetilde{u};\theta^{\prime},\widetilde{u}^{\prime}\right)=\min\left(1,\frac{\overline{\widehat{p}}_{N}\left(y|\theta^{\prime},\widetilde{u}^{\prime}=\left(u_{1},\dots,u_{s-1},u_{s}^{\prime},u_{s+1},\dots,u_{S}\right)\right)p\left(\theta^{\prime}\right)q\left(\theta|\theta^{\prime}\right)}{\overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}=\left(u_{1},\dots,u_{s-1},u_{s},u_{s+1},\dots,u_{S}\right)\right)p\left(\theta\right)q\left(\theta^{\prime}|\theta\right)}\right). \quad (5)
Algorithm 1 The Multiple PMMH (MPM) algorithm
  • Set the initial value of $\theta^{(0)}$ arbitrarily.

  • Sample $u_{s}\sim N(0,I)$ for $s=1,\dots,S$, and run $S$ particle filters to estimate the likelihood $\overline{\widehat{p}}_{N}(y|\theta,\widetilde{u})$ as the trimmed mean of $\widehat{p}_{N}(y|\theta,u_{s})$, $s=1,\dots,S$; a 0% trimmed mean is the mean and a 50% trimmed mean is the median.

  • For each MCMC iteration $p$, $p=1,\dots,P$,

    • Sample $\theta^{\prime}$ from the proposal density $q(\theta^{\prime}|\theta)$.

    • Choose index $s$ with probability $1/S$, sample $\eta_{u}\sim N(0,I)$, and set $u_{s}^{\prime}=\rho_{u}u_{s}+\sqrt{1-\rho_{u}^{2}}\,\eta_{u}$.

    • Run $S$ particle filters to compute the likelihood estimate $\overline{\widehat{p}}_{N}(y|\theta^{\prime},\widetilde{u}^{\prime})$.

    • With the probability in Eq. (5), set $\overline{\widehat{p}}(y_{1:T}|\theta,\widetilde{u})^{(p)}=\overline{\widehat{p}}(y_{1:T}|\theta^{\prime},\widetilde{u}^{\prime})$, $\widetilde{u}^{(p)}=\widetilde{u}^{\prime}$, and $\theta^{(p)}=\theta^{\prime}$; otherwise, set $\overline{\widehat{p}}(y_{1:T}|\theta,\widetilde{u})^{(p)}=\overline{\widehat{p}}(y_{1:T}|\theta,\widetilde{u})^{(p-1)}$, $\widetilde{u}^{(p)}=\widetilde{u}^{(p-1)}$, and $\theta^{(p)}=\theta^{(p-1)}$.
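The following Python sketch shows one MPM iteration under a symmetric random walk proposal for $\theta$ (so the $q$-ratio cancels); `log_lik_hat` denotes a hypothetical single particle filter returning a log-likelihood estimate, and `combine` is a trimmed-mean combiner such as the one sketched after Eq. (6) below.

```python
import numpy as np

def mpm_step(theta, U, bar_log_lik, rho_u, log_lik_hat,
             log_prior, propose_theta, combine, rng):
    """One MPM iteration: update theta jointly with one random block u_s.

    U is an (S, dim_u) array holding the random numbers of the S particle
    filters; bar_log_lik is the log of the current trimmed-mean estimate.
    """
    S = U.shape[0]
    theta_prop = propose_theta(theta, rng)  # symmetric proposal assumed

    # Correlated update of one randomly chosen block only.
    s = rng.integers(S)
    U_prop = U.copy()
    eta = rng.standard_normal(U.shape[1])
    U_prop[s] = rho_u * U[s] + np.sqrt(1.0 - rho_u**2) * eta

    # Re-run the S particle filters (these can run in parallel).
    log_liks = np.array([log_lik_hat(theta_prop, U_prop[k]) for k in range(S)])
    bar_prop = combine(log_liks)            # log of the trimmed-mean likelihood

    log_alpha = bar_prop + log_prior(theta_prop) - bar_log_lik - log_prior(theta)
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, U_prop, bar_prop
    return theta, U, bar_log_lik
```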

We now discuss approaches for obtaining a likelihood estimate from multiple particle filters, which we then use in the MPM algorithm. Suppose that $S$ particle filters are run in parallel. Let $\widehat{p}_{N}(y|\theta,u_{s})$ be the unbiased estimate of the likelihood obtained from the $s$th particle filter, for $s=1,\dots,S$. The first approach takes the average of the $S$ unbiased likelihood estimates from the particle filters. The resulting likelihood estimate

\overline{\widehat{p}}_{N}\left(y|\theta,u\right)=\sum_{s=1}^{S}\widehat{p}_{N}\left(y|\theta,u_{s}\right)/S \quad (6)

is also unbiased (Sherlock et al., 2017b). The second approach takes a trimmed mean as the likelihood estimate in MPM; for example, the 20% trimmed mean averages the middle 60% of the likelihood values. Although the trimmed mean does not give an unbiased estimate of the likelihood, we show in section 3 that the posterior based on the trimmed mean approximates the exact posterior well.
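A minimal sketch of the trimmed-mean combiner, assuming the $S$ estimates are stored on the log scale; subtracting the maximum before exponentiating keeps the computation numerically stable, and the function name is our own.

```python
import numpy as np

def trimmed_mean_loglik(log_liks, trim=0.25):
    """Log of the trimmed mean of likelihood estimates given on the log scale.

    trim=0.0 gives the plain average of Eq. (6); trim=0.5 gives the median
    (for even S, the upper of the two middle values).
    """
    x = np.sort(np.asarray(log_liks))
    S = x.size
    k = int(np.floor(trim * S))                    # number trimmed from each tail
    x = x[k:S - k] if k < S - k else x[[S // 2]]   # keep the middle block
    m = x.max()                                    # stabilising constant
    return m + np.log(np.mean(np.exp(x - m)))
```

For example, with $S=100$ and `trim=0.2`, the middle 60 estimates are averaged, matching the 20% trimmed mean described above.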

2.4 Multidimensional Euclidean Sorting Algorithm

It is important that the logs of the likelihood estimates evaluated at the current and proposed values of $\theta$ and $\widetilde{u}$ are highly correlated, to reduce the variance of $\log\overline{\widehat{p}}_{N}(y|\theta^{\prime},\widetilde{u}^{\prime})-\log\overline{\widehat{p}}_{N}(y|\theta,\widetilde{u})$, because this helps the Markov chain to mix well. However, the standard resampling step in the particle filter introduces discontinuities which break down this correlation even when the current parameters $\theta$ and the proposed parameters $\theta^{\prime}$ are close. Sorting the particles from smallest to largest before resampling helps to preserve the correlation between the logs of the likelihood estimates at the current and proposed values (Deligiannidis et al., 2018). However, such simple sorting is unavailable for multidimensional state particles.

This section discusses Algorithm 2, the fast multidimensional Euclidean sorting algorithm used to sort the multidimensional state or disturbance particles in the particle filter algorithm described in Algorithm S1 in section S1 of the online supplement. The algorithm is written in terms of state particles, but similar steps apply to the disturbance particles. The algorithm takes the particles $z_{t}^{1:N}$ and the normalised weights $\overline{w}_{t}^{1:N}$ as inputs; it outputs the sorted particles $\widetilde{z}_{t}^{1:N}$, sorted weights $\widetilde{\overline{w}}_{t}^{1:N}$, and sorted indices $\zeta_{1:N}$. Let $z_{t}^{i}=(z_{t,1}^{i},\dots,z_{t,d}^{i})^{\top}$ be the $d$-dimensional $i$th particle at time $t$, and let $d(z_{t}^{j},z_{t}^{i})$ be the Euclidean distance between two multidimensional particles. The first step is to calculate the means of the multidimensional particles, $\overline{z}_{t}^{i}=(1/d)\sum_{k=1}^{d}z_{t,k}^{i}$, at time $t$ for $i=1,\dots,N$. The first sorting index, $\zeta_{1}$, is the index of the multidimensional particle having the smallest value of $\overline{z}_{t}^{i}$ for $i=1,\dots,N$; the first sorted particle $\widetilde{z}_{t}^{1}$ is therefore $z_{t}^{\zeta_{1}}$, with associated weight $\widetilde{\overline{w}}_{t}^{1}=\overline{w}_{t}^{\zeta_{1}}$. We then calculate the Euclidean distance between this selected particle and each remaining particle, $d(\widetilde{z}_{t}^{1},z_{t}^{i})$ for all $i\neq\zeta_{1}$. The final step sorts the particles and weights by this Euclidean distance, from smallest to largest, to obtain the sorted particles $\widetilde{z}_{t}^{2:N}$, sorted weights $\widetilde{\overline{w}}_{t}^{2:N}$, and sorted indices $\zeta_{2:N}$. A code sketch implementing these steps is given after Algorithm 2.

Algorithm 2 simplifies the sorting algorithm proposed by Choppala et al. (2016). In Choppala et al. (2016), the first sorting index is the index of the multidimensional particle having the smallest value along its first dimension. To select the second sorted index, $\zeta_{2}$, their method calculates the Euclidean distance between the first selected particle and the remaining multidimensional particles; similarly, to select the third sorted index, $\zeta_{3}$, it calculates the Euclidean distance between the second selected particle and the remaining multidimensional particles. This process is repeated $N$ times, which is expensive for a large number of particles and long time series. In contrast, Algorithm 2 only needs to calculate the Euclidean distances between the first selected particle and the remaining particles once. Deligiannidis et al. (2018) use the Hilbert sorting method of Skilling (2004) to order the multidimensional state particles, which requires calculating the Hilbert index for each multidimensional particle and is much more expensive than Algorithm 2. Table 1 shows the CPU time (in seconds) of the three sorting methods for different numbers of particles $N$ and state dimensions $d$. The table shows that the proposed sorting algorithm is much faster than both the Choppala et al. (2016) method and the Hilbert sorting method. For example, for state dimension $d=30$ and $N=2000$ particles, Algorithm 2 is $290$ and $335$ times faster than the Choppala et al. (2016) and Hilbert sorting methods, respectively.

Algorithm 2 Multidimensional Euclidean Sorting Algorithm

Input: $z_{t}^{1:N}$, $\overline{w}_{t}^{1:N}$

Output: sorted particles $\widetilde{z}_{t}^{1:N}$, sorted weights $\widetilde{\overline{w}}_{t}^{1:N}$, sorted indices $\zeta_{1:N}$

  • Calculate the means of the multidimensional particles, $\overline{z}_{t}^{i}=\frac{1}{d}\sum_{k=1}^{d}z_{t,k}^{i}$, at time $t$ for $i=1,\dots,N$. The first sorting index, $\zeta_{1}$, is the index of the particle with the smallest value of $\overline{z}_{t}^{i}$; the first selected particle $\widetilde{z}_{t}^{1}$ is $z_{t}^{\zeta_{1}}$, with associated weight $\widetilde{\overline{w}}_{t}^{1}=\overline{w}_{t}^{\zeta_{1}}$.

  • Calculate the Euclidean distance between the selected particle and each remaining particle, $d(\widetilde{z}_{t}^{1},z_{t}^{i})$, for all $i\neq\zeta_{1}$.

  • Sort the particles and weights according to this Euclidean distance, from smallest to largest, to obtain $\zeta_{i}$ for $i=2,\dots,N$; the sorted particles are $\widetilde{z}_{t}^{i}=z_{t}^{\zeta_{i}}$ with weights $\widetilde{\overline{w}}_{t}^{i}=\overline{w}_{t}^{\zeta_{i}}$ for $i=2,\dots,N$.
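A direct Python sketch of Algorithm 2; the distance computation is vectorised over the $N$ particles, so only a single pass is needed.

```python
import numpy as np

def euclidean_sort(z, w):
    """Algorithm 2: sort d-dimensional particles by Euclidean distance.

    z : (N, d) array of particles at time t; w : (N,) normalised weights.
    Returns the sorted particles, sorted weights, and sorting indices zeta.
    """
    zbar = z.mean(axis=1)                        # mean of each particle's d coordinates
    first = int(np.argmin(zbar))                 # zeta_1: particle with smallest mean
    dist = np.linalg.norm(z - z[first], axis=1)  # distances to the first sorted particle
    dist[first] = -1.0                           # force zeta_1 to sort first
    zeta = np.argsort(dist)                      # zeta_1, then increasing distance
    return z[zeta], w[zeta], zeta
```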

Table 1: Comparing different sorting methods for various combinations of $N$ and $d$ in terms of CPU time (in seconds). I: the fast Euclidean sorting algorithm; II: the Choppala et al. (2016) sorting method; III: the Hilbert sorting method. The entries in columns I to III are multiples of the time in column I. The column headed “I actual” is the actual time for column I; e.g., the 30.50 in column II, row 1, means that I is 30.5 times faster than the Choppala et al. (2016) sorting, and the corresponding entry in the “I actual” column is 0.0002.
$d$    $N$      I       II        III       I actual
10     500      1.00    30.50     87.50     0.0002
10     1000     1.00    66.67     115.00    0.0003
10     2000     1.00    172.40    141.00    0.0005
30     500      1.00    39.67     141.00    0.0003
30     1000     1.00    113.00    214.00    0.0004
30     2000     1.00    289.80    335.00    0.0005

2.5 The Auxiliary Disturbance Particle Filter

This section discusses the auxiliary disturbance particle filter (ADPF) that we use to obtain the estimates of the likelihood in the MPM sampler described in section 2.3. It is particularly useful for state space models where the state dimension is substantially bigger than the disturbance dimension and the state transition density is intractable. Suppose that $z_{1}=\Phi(z_{0},\epsilon_{1};\theta)$, where $z_{0}$ is the initial state vector with density $p(z_{0}|\theta)$, and $z_{t}=F(z_{t-1},\epsilon_{t};\theta)$ for $t\geq 2$, where $\epsilon_{t}$ is an $n_{e}\times 1$ vector of normally distributed latent noise with density $p(\epsilon_{t})$. Murray et al. (2013) express the standard state-space model in terms of the latent noise variables $\epsilon_{1:T}$, and call

\epsilon_{t}\sim p\left(\epsilon_{t}\right),\quad y_{t}|\epsilon_{1:t}\sim p\left(y_{t}|\epsilon_{1:t},z_{0};\theta\right)=p\left(y_{t}|z_{t};\theta\right),\quad t=1,\dots,T,

the disturbance state-space model. We note that the conditional distribution of $y_{t}$ depends on all the latent error variables $\epsilon_{1:t}$.

We now discuss the proposal for $\epsilon_{t}$ used in the disturbance filter. The defensive mixture proposal density (Hesterberg, 1995) is

m\left(\epsilon_{t}|u_{\epsilon,t},\theta\right)=\pi\,p\left(\epsilon_{t}|\theta\right)+(1-\pi)\,q\left(\epsilon_{t}|\theta,y_{1:T}\right),\quad\text{with}\quad 0<\pi\ll 1, \quad (7)

where $u_{\epsilon,t}$ is the vector random variable used to generate the particles $\epsilon_{t}$ given $\theta$. If the observation density $p(y_{t}|z_{t},\theta)$ is bounded, then Eq. (7) guarantees that the weights are bounded in the disturbance particle filter algorithm defined in Eq. (S2) of section S1 of the online supplement. In practice, we take $q(\epsilon_{t}|\theta,y_{1:T})=N(\epsilon_{t}|\widehat{\mu}_{t},\widehat{\Sigma}_{t})$ and set $\pi=0.05$ in all the examples in section 3; see Algorithm 4. If $\pi=1$ in Eq. (7), then $m(\epsilon_{t})=p(\epsilon_{t})$ and we obtain the bootstrap disturbance particle filter; however, the empirical performance of this filter is usually poor because the resulting likelihood estimate is too variable.
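A sketch of drawing one particle from the defensive mixture in Eq. (7) and computing the corresponding log importance ratio $\log p(\epsilon_{t})-\log m(\epsilon_{t})$, which multiplies the observation density in the filter weights; the moments `mu_t` and `Sigma_t` come from Algorithm 4, and the function name is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def sample_defensive_mixture(mu_t, Sigma_t, n_e, rng, pi=0.05):
    """Draw eps_t ~ m(.) = pi N(0, I) + (1 - pi) N(mu_t, Sigma_t) and
    return (eps_t, log p(eps_t) - log m(eps_t))."""
    if rng.uniform() < pi:
        eps = rng.standard_normal(n_e)    # prior component p(eps) = N(0, I)
    else:
        eps = mvn.rvs(mean=mu_t, cov=Sigma_t, random_state=rng)
    log_p = mvn.logpdf(eps, mean=np.zeros(n_e), cov=np.eye(n_e))
    log_m = np.logaddexp(np.log(pi) + log_p,
                         np.log1p(-pi) + mvn.logpdf(eps, mean=mu_t, cov=Sigma_t))
    return eps, log_p - log_m
```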

Algorithm 4 shows how we obtain the mean $\widehat{\mu}_{t}$ and the covariance matrix $\widehat{\Sigma}_{t}$, for $t=1,\dots,T$, at each iteration of the MPM algorithm. They are computationally cheap to obtain because they are computed from the output of the $S$ disturbance particle filters run at each MCMC iteration, and the ancestral tracing method is fast. The proposal mean $\widehat{\mu}_{t}$ and covariance matrix $\widehat{\Sigma}_{t}$ obtained at iteration $p$ are used to estimate the likelihood at iteration $p+1$. At the first iteration, we use the bootstrap filter to initialise $\widehat{\mu}_{t}$ and $\widehat{\Sigma}_{t}$ for $t=1,\dots,T$. Algorithm 3 gives the MPM algorithm with the ADPF; a code sketch of Algorithm 4 is given after it.

Algorithm 3 The Multiple PMMH (MPM) with ADPF algorithm
  • Set the initial value of $\theta^{(0)}$ arbitrarily.

  • Sample $u_{s}\sim N(0,I)$ for $s=1,\dots,S$, and run $S$ (disturbance) particle filters to estimate the likelihood $\overline{\widehat{p}}_{N}(y|\theta,\widetilde{u})$ as the trimmed mean of $\widehat{p}_{N}(y|\theta,u_{s})$, $s=1,\dots,S$ (a 0% trimmed mean is the mean and a 50% trimmed mean is the median); then run Algorithm 4 to construct the initial proposal for the auxiliary disturbance particle filter.

  • For each MCMC iteration $p$, $p=1,\dots,P$,

    • Sample $\theta^{\prime}$ from the proposal density $q(\theta^{\prime}|\theta)$.

    • Choose index $s$ with probability $1/S$, sample $\eta_{u}\sim N(0,I)$, and set $u_{s}^{\prime}=\rho_{u}u_{s}+\sqrt{1-\rho_{u}^{2}}\,\eta_{u}$.

    • Run $S$ (disturbance) particle filters to compute the likelihood estimate $\overline{\widehat{p}}_{N}(y|\theta^{\prime},\widetilde{u}^{\prime})$.

    • Run Algorithm 4 to construct the proposal for the auxiliary disturbance particle filter at iteration $p+1$.

    • With the probability in Eq. (5), set $\overline{\widehat{p}}(y_{1:T}|\theta,\widetilde{u})^{(p)}=\overline{\widehat{p}}(y_{1:T}|\theta^{\prime},\widetilde{u}^{\prime})$, $\widetilde{u}^{(p)}=\widetilde{u}^{\prime}$, and $\theta^{(p)}=\theta^{\prime}$; otherwise, set $\overline{\widehat{p}}(y_{1:T}|\theta,\widetilde{u})^{(p)}=\overline{\widehat{p}}(y_{1:T}|\theta,\widetilde{u})^{(p-1)}$, $\widetilde{u}^{(p)}=\widetilde{u}^{(p-1)}$, and $\theta^{(p)}=\theta^{(p-1)}$.

Algorithm 4 Constructing proposal for the auxiliary disturbance particle filter (ADPF)

Input: the number of particle filters $S$, the number of particles in each particle filter $N$, particles $\epsilon_{1:S,1:T}^{1:N}$, weights $\overline{w}_{1:S,1:T}^{1:N}$, and ancestor indices $a_{1:S,1:T-1}^{1:N}$.

Output: the mean $\widehat{\mu}_{t}$ and the covariance matrix $\widehat{\Sigma}_{t}$, for $t=1,\dots,T$.

  1. Given the particles $\epsilon_{1:S,1:T}^{1:N}$ with weights $\overline{w}_{1:S,1:T}^{1:N}$ and ancestor indices $a_{1:S,1:T-1}^{1:N}$ from the output of the disturbance particle filters, the ancestral tracing algorithm of Kitagawa (1996) is used to sample from particle approximations to the smoothing distribution $p(\epsilon_{1:T}|\theta,y_{1:T})$. This consists of sampling one particle trajectory from each of the $S$ particle filters in parallel. For each particle filter, we first sample an index $j_{s}$ with probability $\overline{w}_{s,T}^{j_{s}}$, trace back its ancestral lineage $b_{s,1:T}^{j_{s}}$ (where $b_{s,T}^{j_{s}}=j_{s}$ and $b_{s,t-1}^{j_{s}}=a_{s,t-1}^{b_{s,t}^{j_{s}}}$), and choose the particle trajectory $\epsilon_{s,1:T}^{j_{s}}=(\epsilon_{s,1}^{b_{s,1}^{j_{s}}},\dots,\epsilon_{s,T}^{b_{s,T}^{j_{s}}})$, for $s=1,\dots,S$.

  2. The mean $\widehat{\mu}_{t}$ and the covariance matrix $\widehat{\Sigma}_{t}$ are

\widehat{\mu}_{t}:=\frac{1}{S}\sum_{s=1}^{S}\epsilon_{s,t}^{b_{s,t}^{j_{s}}},\quad\text{and}\quad\widehat{\Sigma}_{t}:=\frac{1}{S-1}\sum_{s=1}^{S}\left(\epsilon_{s,t}^{b_{s,t}^{j_{s}}}-\widehat{\mu}_{t}\right)\left(\epsilon_{s,t}^{b_{s,t}^{j_{s}}}-\widehat{\mu}_{t}\right)^{\top}.
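A Python sketch of Algorithm 4 under an assumed array layout for the particle filter output (our own convention, not the paper's): `eps[s, t, i]` holds the disturbances, `w_T[s, i]` the normalised weights at time $T$, and `anc[s, t-1, i]` the time-$(t-1)$ parent of particle $i$ at time $t$.

```python
import numpy as np

def adpf_proposal_moments(eps, w_T, anc, rng):
    """Trace one trajectory per filter by ancestral tracing, then compute
    the per-time-step mean and covariance of the S traced disturbances.

    eps : (S, T, N, n_e); w_T : (S, N); anc : (S, T-1, N).
    """
    S, T, N, n_e = eps.shape
    traj = np.empty((S, T, n_e))
    for s in range(S):
        b = rng.choice(N, p=w_T[s])          # sample final index j_s ~ w_T
        for t in range(T - 1, -1, -1):       # trace the lineage backwards
            traj[s, t] = eps[s, t, b]
            if t > 0:
                b = anc[s, t - 1, b]
    mu = traj.mean(axis=0)                             # (T, n_e)
    c = traj - mu
    Sigma = np.einsum('sti,stj->tij', c, c) / (S - 1)  # (T, n_e, n_e)
    return mu, Sigma
```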

3 Examples

Section 3.1 discusses the inefficiency measures used to compare the performance of the different particle filters and PMMH samplers in our article. Section 3.2 investigates empirically the performance of the proposed methods for estimating a high-dimensional linear Gaussian state space model. Section 3.3 discusses a multivariate stochastic volatility in the mean model with GARCH diffusion processes. Sections 3.4.1 and 3.4.2 apply the proposed samplers to estimate non-linear small-scale and medium-scale DSGE models, respectively.

3.1 Definitions of Inefficiency

We use the inefficiency factor (IF), also called the integrated autocorrelation time,

\textrm{IF}_{\psi}:=1+2\sum_{j=1}^{\infty}\rho_{\psi}\left(j\right),

to measure the inefficiency of a PMMH sampler at estimating the posterior expectation of a univariate function $\psi(\theta)$ of $\theta$; here, $\rho_{\psi}(j)$ is the $j$th autocorrelation of the iterates $\psi(\theta)$ in the MCMC chain after it has converged to its stationary distribution. We estimate $\textrm{IF}_{\psi}$ using the CODA R package of Plummer et al. (2006); a low value of the $\textrm{IF}_{\psi}$ estimate suggests that the Markov chain mixes well. Our measure of the inefficiency of a PMMH sampler that takes computing time (CT) into account, for a given parameter $\theta$, is the time-normalised inefficiency factor, defined as $\textrm{TNIF}_{\psi}:=\textrm{IF}_{\psi}\times\textrm{CT}$. For a given sampler, let $\textrm{IF}_{\psi,\textrm{MAX}}$ and $\textrm{IF}_{\psi,\textrm{MEAN}}$ be the maximum and mean of the IF values over all the parameters in the model. The relative time-normalised inefficiency factor (RTNIF) is the TNIF relative to a benchmark method, where the benchmark depends on the example.
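The IF can be estimated by truncating the autocorrelation sum; the sketch below truncates at the first non-positive autocorrelation, which is cruder than the spectral estimator used by CODA and is only illustrative.

```python
import numpy as np

def inefficiency_factor(psi, max_lag=None):
    """Estimate IF = 1 + 2 * sum_j rho(j) from post-burn-in MCMC draws."""
    x = np.asarray(psi, dtype=float)
    x = x - x.mean()
    n = x.size
    acf = np.correlate(x, x, mode='full')[n - 1:] / (x @ x)  # acf[0] = 1
    max_lag = max_lag or n // 2
    total = 0.0
    for j in range(1, max_lag):
        if acf[j] <= 0:          # truncate at first non-positive autocorrelation
            break
        total += acf[j]
    return 1.0 + 2.0 * total
```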

3.2 Linear Gaussian State Space Model

This section investigates (1) the efficiency of different approaches for obtaining likelihood estimates from $S$ particle filters, and (2) the performance of different PMMH samplers for estimating a single parameter $\theta$, regardless of the dimension of the states. We consider the linear Gaussian state space model discussed in Guarniero et al. (2017) and Deligiannidis et al. (2018), where $\{X_{t};t\geq 1\}$ and $\{Y_{t};t\geq 1\}$ are $\mathcal{R}^{d}$-valued with

Y_{t}=X_{t}+W_{t},\qquad X_{t+1}=A_{\theta}X_{t}+V_{t+1},

with $X_{1}\sim\mathcal{N}(0_{d},I_{d})$, $V_{t}\sim N(0_{d},I_{d})$, $W_{t}\sim N(0_{d},I_{d})$, and $A_{\theta}^{i,j}=\theta^{|i-j|+1}$ for $i<j$; the true value of $\theta$ is $0.4$. Although it is possible to use the more efficient fully adapted particle filter (Pitt et al., 2012), we use the bootstrap filter for all methods to show that the proposed methods are useful for models where it is difficult to implement better particle filter algorithms.
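For concreteness, the sketch below simulates this model. The paper only states the $i<j$ case of $A_{\theta}$; we assume the completion $A_{\theta}^{i,j}=\theta^{|i-j|+1}$ for all $i,j$, the form used in Guarniero et al. (2017).

```python
import numpy as np

def simulate_lgss(theta=0.4, d=10, T=300, rng=None):
    """Simulate the linear Gaussian state space model of section 3.2,
    assuming A[i, j] = theta**(|i - j| + 1) for all i, j."""
    rng = rng or np.random.default_rng(0)
    idx = np.arange(d)
    A = theta ** (np.abs(idx[:, None] - idx[None, :]) + 1)
    x = rng.standard_normal(d)              # X_1 ~ N(0_d, I_d)
    X, Y = np.empty((T, d)), np.empty((T, d))
    for t in range(T):
        X[t] = x
        Y[t] = x + rng.standard_normal(d)   # Y_t = X_t + W_t
        x = A @ x + rng.standard_normal(d)  # X_{t+1} = A X_t + V_{t+1}
    return X, Y
```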

The first study compares the 0%, 5%, 10%, 25%, and 50% trimmed means of the likelihood estimates obtained from $S$ particle filters. The simulated data are generated from the model above with $T=200$ and $T=300$ time periods and $d=1$, $5$, and $10$ dimensions.

Table 2: Comparing the variance of the log of the estimated likelihood for five estimators of the likelihood: I: averaging the likelihoods (0% trimmed mean), II: 5% trimmed mean, III: 10% trimmed mean, IV: 25% trimmed mean, and V: 50% trimmed mean, for the $d=10$ dimensional linear Gaussian state space model with $T=300$. The variance of the log of the estimated likelihood of a single particle filter is reported in column VI. The results are based on $1000$ independent runs. The entries in columns I to V are relative to the variance in column V; the entries in the column headed “V actual” are the actual variances for estimator V. For example, the entry in row 4, column I is 70.45, which means that the variance of the log of the likelihood estimate is 70.45 times the corresponding variance in column V. The entries in column VI are relative to column V with $S=1000$ particle filters.
$N$     $S$     I       II     III    IV     V      VI       V actual
100     1                                           563.65
        20      3.26    2.02   1.48   1.11   1.00            29.30
        100     11.47   2.60   1.80   1.22   1.00            5.73
        1000    70.45   2.32   1.58   1.01   1.00            0.66
250     1                                           629.83
        20      3.60    2.03   1.67   1.12   1.00            15.75
        100     11.90   2.48   1.56   1.12   1.00            3.20
        1000    76.55   2.24   1.62   1.12   1.00            0.32
1000    1                                           620.74
        20      3.69    1.94   1.51   0.99   1.00            6.14
        100     13.75   2.50   1.79   1.12   1.00            1.16
        1000    82.18   2.26   1.59   1.08   1.00            0.13

Table 2 shows the variance of the log of the estimated likelihood obtained using the five estimators of the likelihood for the $d=10$ dimensional linear Gaussian state space model with $T=300$ time periods. The table shows that: (a) there is no substantial reduction in the variance of the log of the estimated likelihood for the mean (0% trimmed mean); for example, the variance decreases by a factor of only $2.31$ when $S$ increases from $20$ to $1000$ with $N=250$; (b) the 5%, 10%, 25%, and 50% trimmed means of the individual likelihood estimates decrease the variance substantially as $S$ and/or $N$ increases. The 25% and 50% trimmed mean estimates have the smallest variance in all cases. Similar results are obtained for $d=5$ with $T=200$ and $T=300$, and for $d=10$ with $T=200$; see Tables S3, S4, and S5 in section S4 of the online supplement.

We now examine empirically the ability of the proposed MPM samplers to maintain the correlation between successive values of the log of the estimated likelihood for the 10 dimensional linear Gaussian state space model with $T=300$. We ran the different MPM approaches for $1000$ iterations, holding the current parameter at $\theta=0.4$ and setting the proposed parameter to $\theta^{\prime}=0.4$, $0.399$, and $0.385$. At each iteration we generated $\widetilde{u}=(u_{1},\dots,u_{S})$ and $\widetilde{u}^{\prime}$, obtained $\log\overline{\widehat{p}}_{N}(y|\theta,\widetilde{u}^{(s)})$ and $\log\overline{\widehat{p}}_{N}(y|\theta^{\prime},\widetilde{u}^{(s)\prime})$ for the MPM approaches, and computed their sample correlations.

Figure S2 in section S4 of the online supplement reports the correlation estimates of the logs of the estimated likelihoods obtained using the different MPM approaches. The figure shows that: (1) when the current and proposed values of the parameters are both equal to $0.4$, all MPM methods maintain a high correlation between the logs of the estimated likelihoods; the estimated correlations when the number of particle filters $S$ is 100 are about 0.99; (2) when the current parameter $\theta$ and the proposed parameter $\theta^{\prime}$ are (slightly) different, the MPM methods still maintain some of the correlation between the logs of the estimated likelihoods. The MPM methods with 25% and 50% trimmed means of the likelihood are best at maintaining this correlation.

We now compare the efficiency of different sampling schemes for estimating the parameter $\theta$ of the linear Gaussian state space model. In all examples, we run the samplers for $25000$ iterations, with the initial $5000$ iterations discarded as warm up. The particle filter and the parameter samplers are implemented in Matlab running on $20$ CPU cores of a high performance computer cluster. The optimal number of CPU cores for the MPM algorithm equals the number of particle filters $S$, which means the sampler can easily be tuned to provide maximum parallel efficiency on a large range of hardware. The adaptive random walk proposal of Roberts and Rosenthal (2009) is used for all samplers.

Figure 2 shows the trace plots of the parameter $\theta$ estimated using the MPM algorithm with the different trimmed means for the 10 dimensional linear Gaussian state space model with $T=200$. The figure shows that: (1) the MCMC chain gets stuck for the 5% trimmed mean approach with $S=100$ and $N=250$ or $N=500$, and for the 10% trimmed mean approach with $S=100$ and $N=100$; (2) the 10% trimmed mean approach requires at least $S=100$ and $N=250$ for the MCMC chain to mix well; (3) the 25% trimmed mean approach is the most efficient sampler, as its MCMC chain does not get stuck even with $S=100$ and $N=100$. Figure 1 compares the MPM algorithms with and without blocking, with $S=100$ and $N=250$, for estimating the 10 dimensional linear Gaussian state space model with $T=300$. The figure shows that the MPM (no blocking) with 5% and 10% trimmed means gets stuck and the MPM (no blocking) with 25% trimmed mean mixes poorly. The $\widehat{\textrm{IF}}$ of the MPM (blocking) with 25% trimmed mean is $9.624$ times smaller than that of the MPM (no blocking) with 25% trimmed mean. All three panels show that the MPM (blocking) algorithms using trimmed means perform better than the MPM (no blocking), which suggests the usefulness of the blocking scheme.

Table 3 shows the $\widehat{\textrm{IF}}$, $\widehat{\textrm{TNIF}}$, and $\widehat{\textrm{RTNIF}}$ values for the parameter $\theta$ in the linear Gaussian state space model with $d=10$ dimensions and $T=300$ time periods, estimated using five different MCMC samplers: (1) the correlated PMMH of Deligiannidis et al. (2018); (2) the MPM with 5% trimmed mean; (3) the MPM with 10% trimmed mean; (4) the MPM with 25% trimmed mean; and (5) the MPM with 50% trimmed mean. In this paper, instead of the Hilbert sorting method, we implement the correlated PMMH using the fast multidimensional Euclidean sorting method of section 2.4. The computing time reported in the table is the time to run a single particle filter for the CPM and $S$ particle filters for the MPM approach. The table shows the following. (1) The correlated PMMH requires more than $50000$ particles to make the MCMC chain for the parameter $\theta$ mix adequately. (2) The CPU time for running a single particle filter with $N=50000$ particles is $4.09$ times that of running multiple particle filters with $S=100$ and $N=500$; the MPM method can be much faster than the CPM method if it is run using high-performance computing with a large number of cores. (3) The MPM allows a much smaller number of particles for each independent PF, and these multiple PFs can be run independently. (4) The $\widehat{\textrm{IF}}$ value for the parameter $\theta$ estimated using the correlated PMMH with $N=50000$ particles is $11.05$, $41.93$, $46.12$, and $56.37$ times larger than for the 5%, 10%, 25%, and 50% trimmed mean approaches with $S=100$ and $N=500$ particles, respectively. (5) In terms of $\widehat{\textrm{RTNIF}}$, the 5%, 10%, 25%, and 50% trimmed means with $S=100$ and $N=500$ are $45.27$, $171.52$, $188.67$, and $230.60$ times smaller than the correlated PMMH with $N=50000$ particles. (6) The best sampler for this example is the 50% trimmed mean approach with $S=100$ and $N=250$ particles. The corresponding table in section S4 of the online supplement gives similar results for the 10 dimensional linear Gaussian state space model with $T=200$. Figure 3 shows the kernel density estimates of the posterior of $\theta$ obtained using the Metropolis-Hastings algorithm with the (exact) Kalman filter likelihood and the MPM algorithm with 5%, 10%, 25%, and 50% trimmed means of the likelihood. The figure shows that the approximate posteriors obtained using trimmed means are very close to the true posterior. Similar results are obtained for $d=10$ dimensions and $T=200$ time periods; see Figure S1 in section S4 of the online supplement. Figure S1 also shows that the approximate posterior obtained using the MPM with a trimmed mean of the likelihood is very close to the exact posterior obtained using the correlated PMMH.

In summary, this example suggests the following. (1) For a high dimensional state space model, there is no substantial reduction in the variance of the log of the estimated likelihood when the average of the likelihoods from $S$ particle filters is used, even as $S$ and/or $N$ increases; methods that use trimmed means of the likelihood reduce the variance substantially as $S$ and/or $N$ increases. (2) The 25% and 50% trimmed mean approaches give the smallest variance of the log of the estimated likelihood in all cases. (3) The MPM method with a 50% trimmed mean is best at maintaining the correlation between the logs of the estimated likelihoods in successive iterates. (4) The MPM (blocking) method is more efficient than the MPM (no blocking) for the same values of $S$ and $N$. (5) The approximate approaches that use trimmed means of the likelihood give accurate approximations to the true posterior. (6) The best sampler for estimating the 10 dimensional linear Gaussian state space model with $T=300$ is the 50% trimmed mean approach with $S=100$ and $N=250$ particles. (7) The MPM allows a much smaller number of particles for each independent PF, and these multiple PFs can be run independently; these approaches can be made much faster by running them on a high-performance computer cluster with a large number of cores.

Table 3: Comparing the performance of different PMMH samplers, with different numbers of particle filters $S$ and particles $N$ in each particle filter, for estimating the linear Gaussian state space model using a simulated dataset with $T=300$ and $d=10$ dimensions. Sampler I: correlated PMMH of Deligiannidis et al. (2018). Sampler II: MPM with 5% trimmed mean. Sampler III: MPM with 10% trimmed mean. Sampler IV: MPM with 25% trimmed mean. Sampler V: MPM with 50% trimmed mean. Time denotes the time taken in seconds for one iteration of the method. The $\widehat{\textrm{IF}}$, CT, and $\widehat{\textrm{RTNIF}}$ entries in columns I to V are relative to the entries in column V (MPM with 50% trimmed mean, $N=250$ and $S=100$); the entries in the column headed “V actual” are the actual values for estimator V. For example, the entry in row 5, column I is 415.08, which means the $\widehat{\textrm{RTNIF}}$ is 415.08 times the corresponding $\widehat{\textrm{RTNIF}}$ in column V.


                            I        II             III            IV             V              V actual
$N$                         50000    250     500    250     500    250     500    250     500    250     500
$S$                         1        100     100    100     100    100     100    100     100    100     100
$\widehat{\textrm{IF}}$     50.73    88.09   4.59   3.03    1.21   1.40    1.10   1.00    0.90   9.98    9.00
CT                          8.18     1.00    2.00   1.00    2.00   1.00    2.00   1.00    2.00   0.99    1.98
$\widehat{\textrm{RTNIF}}$  415.08   88.10   9.17   3.03    2.42   1.40    2.20   1.00    1.80   9.88    17.82
Figure 1: Trace plots of the parameter $\theta$ estimated using MPM (blocking) and MPM (no blocking) with a 5% trimmed mean (left), a 10% trimmed mean (middle), and a 25% trimmed mean (right). Linear Gaussian state space model with $d=10$ dimensions, $T=300$ time periods, $S=100$ particle filters, and $N=250$ particles in each particle filter.
Figure 2: Trace plots of the parameter $\theta$ of the linear Gaussian state space model with $d=10$ dimensions and $T=200$ time periods, estimated using MPM samplers with different numbers of particle filters $S$ and particles $N$ in each particle filter: (I) MPM with the mean likelihood estimate; (II) MPM with the 5% trimmed mean estimate; (III) MPM with the 10% trimmed mean estimate; (IV) MPM with the 25% trimmed mean estimate.
Figure 3: Linear Gaussian state space model with $d=10$, $T=300$. Kernel density estimates of the posterior density of $\theta$ estimated using Metropolis-Hastings with the exact likelihood and MPM with trimmed mean estimates of the likelihood: 5% trimmed mean (top left), 10% trimmed mean (top right), and 25% trimmed mean (bottom left), each with $S=100$ and $N=500$ or $N=1000$; and 50% trimmed mean (bottom right) with $S=101$ and $N=500$ or $N=1000$.

3.3 Multivariate Stochastic Volatility in Mean Model

This section investigates the performance of the proposed MPM samplers for estimating large multivariate stochastic volatility in mean models (see, e.g., Carriero et al., 2018; Cross et al., 2021), where each log volatility process follows a GARCH (generalized autoregressive conditional heteroskedasticity) diffusion process (Shephard et al., 2004). The GARCH diffusion model does not have a closed-form state transition density, making it challenging to estimate using the Gibbs-type MCMC sampler of Cross et al. (2021). It is well known that the Gibbs sampler is inefficient for generating the parameters of a diffusion process, especially the variance parameter $\tau^{2}$ (Stramer and Bognar, 2011). Conversely, MPM samplers can efficiently estimate models with intractable state transition densities, making them well suited to the GARCH diffusion model; this section demonstrates that they estimate this model efficiently and overcome the limitations of the Gibbs sampler.

Suppose that $P_{t}$ is a $d\times 1$ vector of daily stock prices and define $y_{t}=\log P_{t}-\log P_{t-1}$ as the log-return of the stocks. Let $h_{i,t}$ be the log-volatility process of the $i$th stock at time $t$, and define $h_{\cdot,t}=(h_{1,t},\dots,h_{d,t})^{\top}$ and $h_{i,\cdot}=(h_{i,1},\dots,h_{i,T})^{\top}$. The model for $y_{t}$ is

y_{t}=Ah_{\cdot,t}+\epsilon_{t},\quad\epsilon_{t}\sim N\left(0,\Sigma_{t}\right), \quad (8)

where $A$ is a $d\times d$ matrix that captures the effects of the log-volatilities on the levels of the variables. The time-varying error covariance matrix $\Sigma_{t}$ depends on the unobserved latent variables $h_{\cdot,t}$ through

\Sigma_{t}:=\textrm{diag}\left(\exp\left(h_{1,t}\right),\dots,\exp\left(h_{d,t}\right)\right). \quad (9)

Each log-volatility $h_{i,t}$ is assumed to follow a continuous time GARCH diffusion process (Shephard et al., 2004) satisfying

dh_{i,t}=\left\{\alpha\left(\mu-\exp\left(h_{i,t}\right)\right)\exp\left(-h_{i,t}\right)-\frac{\tau^{2}}{2}\right\}dt+\tau\,dW_{i,t}, \quad (10)

for $i=1,\dots,d$, where the $W_{i,t}$ are independent Wiener processes. The following Euler scheme approximates the evolution of the log-volatilities $h_{i,t}$ in (10) by placing $M-1$ evenly spaced points between times $t$ and $t+1$. We denote the intermediate volatility components by $h_{i,t,1},\dots,h_{i,t,M-1}$, and it is convenient to set $h_{i,t,0}=h_{i,t}$ and $h_{i,t,M}=h_{i,t+1}$. The Euler evolution, starting at $h_{i,t,0}$, is (see, for example, Stramer and Bognar, 2011, p. 234)

h_{i,t,j+1}|h_{i,t,j}\sim N\left(h_{i,t,j}+\left\{\alpha\left(\mu-\exp\left(h_{i,t,j}\right)\right)\exp\left(-h_{i,t,j}\right)-\frac{\tau^{2}}{2}\right\}\delta,\;\tau^{2}\delta\right), \quad (11)

for j=0,\dots,M-1, where \delta=1/M. The initial state of h_{i,t} is assumed to be N(0,1) distributed for i=1,\dots,d. For this example, we assume for simplicity that the parameters \mu, \alpha, and \tau^{2} are the same across all stocks and that A^{i,j}=\psi^{|i-j|+1} for i<j. This keeps the number of parameters the same regardless of the dimension of the stock returns, so that we can focus on comparing the different PMMH samplers.
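The transition in Eq. (11) is straightforward to simulate. The following minimal sketch (Python with NumPy; euler_transition is a hypothetical helper, not code from the paper) propagates the d log-volatilities one period by composing M Euler steps:

import numpy as np

def euler_transition(h_t, alpha, mu, tau2, M, rng):
    # Propagate h_{i,t} to h_{i,t+1} through M Euler steps of size delta = 1/M,
    # following Eq. (11); h_t holds the d current log-volatilities.
    delta = 1.0 / M
    h = np.asarray(h_t, dtype=float)
    for _ in range(M):
        drift = alpha * (mu - np.exp(h)) * np.exp(-h) - 0.5 * tau2
        h = h + drift * delta + np.sqrt(tau2 * delta) * rng.standard_normal(h.shape)
    return h

rng = np.random.default_rng(0)
h0 = rng.standard_normal(20)                       # d = 20 initial states, N(0,1)
h1 = euler_transition(h0, alpha=2.0, mu=1.81, tau2=0.38, M=3, rng=rng)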

We first report on a simulation study for the above model. We simulated ten independent datasets from the model with d=20 dimensions and T=100 observations. The true parameters are \alpha=2, \mu=1.81, \tau^{2}=0.38, and \psi=0.01. The priors are \alpha\sim G\left(1,1\right), \tau^{2}\sim G\left(0.5,0.5\right), \mu\sim G\left(1,1\right), and \psi\sim N\left(0,1\right). These priors are non-informative and cover almost all values plausible in practice. We use M=3 latent points for the Euler approximation of the state transition density. For each dataset, we run the MPM sampler with the 25% trimmed mean approach for 25000 iterations and discard the initial 5000 iterations as warm-up. The particle filters and the parameter samplers are implemented in Matlab and run on 20 CPU cores of a high performance computer cluster. The adaptive random walk proposal of Roberts and Rosenthal (2009) is used for all samplers.
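For concreteness, the trimmed-mean combination of the S likelihood estimates can be computed stably in log space as in the sketch below. The paper does not spell out the trimming convention, so this assumes a fraction trim is removed from each tail, with trim = 0 giving the mean and trim = 0.5 (with S odd) the median, matching the "0% trimmed mean" and "50% trimmed mean" terminology used throughout:

import numpy as np

def log_trimmed_mean_likelihood(log_like, trim):
    # log_like: S log-likelihood estimates from independent particle filters.
    # trim: fraction removed from each tail (capped so at least one estimate
    # is kept); 0 gives the mean of the likelihood estimates and 0.5 (odd S)
    # gives their median.
    x = np.sort(np.asarray(log_like, dtype=float))
    k = min(int(np.floor(trim * x.size)), (x.size - 1) // 2)
    kept = x[k:x.size - k]
    # Exponentiate relative to the maximum so that very small likelihoods
    # do not underflow before averaging.
    m = kept.max()
    return m + np.log(np.mean(np.exp(kept - m)))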

Figure 4 and Figures S3 to S11 in section S5 of the online supplement show the kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated by the MPM sampler (S=100 and N=250) using the 25% trimmed mean; the vertical lines show the true parameter values. The figures show that the model parameters are estimated accurately. Figure 5 shows the inefficiency factors (\widehat{\textrm{IF}}) of the parameters over the 10 simulated datasets with d=20 dimensions and T=100 time periods. The reasonably small \widehat{\textrm{IF}} values of the parameters suggest that the MPM sampler is efficient with only N=250 particles and S=100 particle filters.

Figure 4: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using MPM with the 25% trimmed mean estimate of the likelihood, based on a simulated dataset (data 1) with d=20 and T=100.
Figure 5: The inefficiency factors (\widehat{\textrm{IF}}) of the parameters of the multivariate stochastic volatility in mean model estimated using MPM with the 25% trimmed mean estimate of the likelihood, for 10 simulated datasets with d=20 and T=100.

We now apply the methods to a sample of daily US industry stock returns obtained from the Kenneth French data library (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/datalibrary.html), using the sample from January 4, 2021 to May 26, 2021, a total of 100 observations.

Table 4 shows the \widehat{\textrm{IF}}, \widehat{\textrm{TNIF}}, and \widehat{\textrm{RTNIF}} for the parameters of the 20 dimensional multivariate stochastic volatility in mean model estimated using (1) the correlated PMMH, (2) the MPM sampler with the 25% trimmed mean of the likelihood estimates, and (3) the MPM sampler with the 50% trimmed mean of the likelihood estimates. The table shows that: (1) the MCMC chain obtained using the correlated PMMH still gets stuck, even with N=100000 particles; (2) one iteration of the correlated PMMH with N=100000 particles takes 26 times as long as one iteration of the MPM algorithm with S=250 and N=100; (3) the MPM sampler with N=250 and S=100 is less efficient than the MPM sampler with N=100 and S=250; the latter can be made more efficient still by running it on a high-performance computer cluster with many cores; (4) the best sampler in terms of \widehat{\textrm{TNIF}}_{\textrm{MEAN}} and \widehat{\textrm{TNIF}}_{\textrm{MAX}} for estimating the 20 dimensional multivariate stochastic volatility in mean model is the MPM sampler with the 50% trimmed mean, S=250 and N=100.

Table 5 shows the \widehat{\textrm{IF}}, \widehat{\textrm{TNIF}}, and \widehat{\textrm{RTNIF}} for the parameters of the d=30 dimensional multivariate stochastic volatility in mean model estimated using: (1) the MPM sampler with the 25% trimmed mean of the likelihood estimates, and (2) the MPM sampler with the 50% trimmed mean of the likelihood estimates. We do not consider the correlated PMMH because it is very expensive to run for this high dimensional model. The table shows that: (1) increasing the number of particle filters from S=100 to S=200 reduces the \widehat{\textrm{IF}} but increases the computational time substantially; the computational time can again be reduced by using a high performance computer cluster with a larger number of cores; (2) the MPM sampler with N=500 and S=100 is less efficient than the MPM sampler with N=250 and S=200. We therefore recommend that, given access to large computing resources, the number of particle filters S be increased while keeping the number of particles N in each particle filter reasonably small.

Table 4: Comparing the performance of different PMMH samplers with different numbers S of particle filters and different numbers N of particles in each particle filter for estimating the multivariate stochastic volatility in mean model with GARCH diffusion processes, using a real dataset with T=100 and d=20. The model parameters are \psi, \mu, a, and \tau^{2}. Sampler I: correlated PMMH of Deligiannidis et al. (2018). Sampler II: MPM with the 25% trimmed mean. Sampler III: MPM with the 50% trimmed mean. Time denotes the time in seconds for one iteration of the method. The RTNIF is the TNIF relative to the MPM with the 25% trimmed mean, S=100 and N=250.
Param. I II III
N 100000 250 100 250 100
S 1 100 250 100 250
\psi NA 97.621 67.459 83.523 62.680
\mu NA 65.678 42.299 52.293 39.390
a NA 20.037 10.409 12.857 9.490
\tau^{2} NA 56.617 32.900 43.705 37.928
\widehat{\textrm{IF}_{\psi,\textrm{MAX}}} NA 97.621 67.459 83.523 62.680
\widehat{\textrm{TNIF}}_{\textrm{MAX}} NA 173.765 124.799 148.671 115.958
\widehat{\textrm{RTNIF}}_{\textrm{MAX}} NA 1 0.718 0.856 0.667
\widehat{\textrm{IF}_{\psi,\textrm{MEAN}}} NA 59.988 38.267 48.095 37.372
\widehat{\textrm{TNIF}}_{\textrm{MEAN}} NA 106.779 70.794 85.609 69.138
\widehat{\textrm{RTNIF}}_{\textrm{MEAN}} NA 1 0.663 0.802 0.647
Time 47.65 1.78 1.85 1.78 1.85
Table 5: Comparing the performance of different PMMH samplers with different numbers S of particle filters and different numbers N of particles in each particle filter for estimating the multivariate stochastic volatility in mean model with GARCH diffusion processes, using a real dataset with T=100 and d=30. The model parameters are \psi, \mu, a, and \tau^{2}. Sampler I: MPM with the 25% trimmed mean. Sampler II: MPM with the 50% trimmed mean. Time is the time in seconds for one iteration of the method. The RTNIF is the TNIF relative to the MPM with the 25% trimmed mean, S=100 and N=250.
Param. I II
N 250 250 500 250 250
S 100 200 100 100 200
\psi 183.559 124.258 227.685 151.745 80.073
\mu 74.070 49.250 384.767 74.485 45.554
a 46.159 45.662 126.968 32.721 14.753
\tau^{2} 102.234 70.255 218.866 78.171 42.075
\widehat{\textrm{IF}_{\psi,\textrm{MAX}}} 183.559 124.258 384.767 151.745 80.073
\widehat{\textrm{TNIF}}_{\textrm{MAX}} 523.143 877.262 3351.321 432.473 565.315
\widehat{\textrm{RTNIF}}_{\textrm{MAX}} 1 1.677 6.406 0.827 1.081
\widehat{\textrm{IF}_{\psi,\textrm{MEAN}}} 101.505 72.356 239.571 84.280 45.614
\widehat{\textrm{TNIF}}_{\textrm{MEAN}} 289.289 510.833 2077.953 240.198 322.035
\widehat{\textrm{RTNIF}}_{\textrm{MEAN}} 1 1.766 7.183 0.830 1.113
Time 2.85 7.06 8.71 2.85 7.06

3.4 DSGE Models

In this section, we evaluate the effectiveness of the MPM samplers for estimating two non-linear DSGE models: the small scale and medium scale DSGE models detailed in sections S8 and S9 of the online supplement. Because the dimension of the states is large compared to the dimension of the disturbances in these state space models, we employ the disturbance particle filter method introduced in section 2.5.

The state transition densities of the two non-linear DSGE models are intractable. Section S3 of the online supplement describes the state space representation of these models, which are solved using Dynare.

3.4.1 Nonlinear Small Scale DSGE Model

This section reports on how well a number of PMMH samplers of interest estimate the parameters of the nonlinear (second order) small scale DSGE model. The small scale DSGE model follows the setting in Herbst and Schorfheide (2019), with nominal rigidities through price adjustment costs following Rotemberg (1982), and three structural shocks. There are 4 state variables, 4 control variables (8 variables combining the state and control variables) and 3 exogenous shocks in the model. We estimate this model using 3 observables, which means that the dimensions of y_{t}, the state vector, and the disturbance vector are 3, 8, and 3, respectively. Section S8 of the online supplement describes the model.

In this section, we use the delayed acceptance version of the MPM algorithm, called the “delayed acceptance multiple PMMH” (DA-MPM), to speed up the computation. The delayed acceptance sampler (Christen and Fox, 2005) avoids computing the expensive likelihood estimate if it is likely that the proposed draw will ultimately be rejected. The first accept-reject stage uses a cheap (or deterministic) approximation to the likelihood instead of the expensive likelihood estimate in the MH acceptance ratio. The particle filter is then used to estimate the likelihood only for a proposal that is accepted in the first stage; a second accept-reject stage ensures that detailed balance is satisfied with respect to the true posterior. We use the likelihood obtained from the central difference Kalman filter (CDKF) of Norgaard et al. (2000) in the first accept-reject stage of the delayed acceptance scheme; section S2 of the online supplement gives further details. Because the dimension of the states is usually much larger than the dimension of the disturbances in DSGE models, two different sorting methods are considered: the first sorts the state particles and the second sorts the disturbance particles, both using the sorting algorithm in section 2.4.

The PMMH samplers considered are: (1) the MPM (0% trimmed mean) with the auxiliary disturbance particle filter (ADPF), disturbance sorting, and the correlation between the random numbers used in estimating the logs of the estimated likelihoods set to \rho_{u}=0.9; (2) the MPM (10% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.9; (3) the MPM (25% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.9; (4) the MPM (25% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0; (5) the MPM (25% trimmed mean) with the ADPF, state sorting, and \rho_{u}=0.9; (6) the delayed acceptance MPM (DA-MPM) (25% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.9; (7) the MPM (50% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.9; (8) the MPM (0% trimmed mean) with the bootstrap particle filter, disturbance sorting, and \rho_{u}=0.9; (9) the correlated PMMH with \rho_{u}=0.9, the bootstrap particle filter, and disturbance sorting. Each sampler ran for 25000 iterations, with the initial 5000 iterations discarded as burn-in. We use the adaptive random walk proposal of Roberts and Rosenthal (2009) for q\left(\theta^{\prime}|\theta\right) and the adaptive scaling approach of Garthwaite et al. (2015) to tune the Metropolis-Hastings acceptance probability to 20%. Section S8 of the online supplement gives details on the model specification and the model parameters. We include measurement error in the observation equation; the measurement error variances are estimated together with the other parameters. All the samplers ran on a high performance computer cluster with 20 CPU cores. The real dataset is from Herbst and Schorfheide (2019) and runs from 1983Q1 to 2013Q4, which includes the Great Recession, giving a total of 124 observations for each series.

Table 6 reports the relative time normalised inefficiency factors (\widehat{\textrm{RTNIF}}) of the PMMH samplers relative to the DA-MPM (25% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.9 for the non-linear small scale model with T=124 observations. The computing time reported in the table is the time to run a single particle filter for the CPM and S particle filters for the MPM approach. The table shows that: (1) the MPM sampler (0% trimmed mean) with the ADPF, disturbance sorting, \rho_{u}=0.9, and N=250 particles is more efficient than the MPM sampler (0% trimmed mean) with the bootstrap particle filter, disturbance sorting, \rho_{u}=0.9, and N=250 particles; the MCMC chain obtained with the latter gets stuck. (2) The MPM method with S=100 particle filters, each with N=100 particles, and disturbance sorting runs 9.73 times faster than the CPM with N=20000 particles. (3) In terms of \widehat{\textrm{RTNIF}}_{\textrm{MAX}}, the MPM samplers with 0%, 10%, 25%, and 50% trimmed means, the ADPF, disturbance sorting, and \rho_{u}=0.9 are 3.15, 90.77, 103.04, and 91.50 times more efficient, respectively, than the correlated PMMH with N=20000 particles. (4) In terms of \widehat{\textrm{RTNIF}}_{\textrm{MAX}}, the MPM (25% trimmed mean) with disturbance sorting and \rho_{u}=0.9 is 1.38 times more efficient than the MPM (25% trimmed mean) with state sorting, and performs similarly to the MPM (25% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0. (5) The delayed acceptance MPM (25% trimmed mean) is slightly more efficient than the MPM (25% trimmed mean). The delayed acceptance algorithm is 5 times faster on average because the target Metropolis-Hastings acceptance probability is set to 20% using the Garthwaite et al. (2015) approach, but it has a higher maximum \widehat{\textrm{IF}} value; Table S8 in section S6 of the online supplement gives the details. (6) It is possible to use the tempered particle filter (TPF) of Herbst and Schorfheide (2019) within the MPM algorithm; however, the TPF is computationally expensive because of the tempering iterations and the random walk Metropolis-Hastings mutation steps. In summary, the best sampler, with the smallest \widehat{\textrm{RTNIF}}_{\textrm{MAX}} and \widehat{\textrm{RTNIF}}_{\textrm{MEAN}}, is the delayed acceptance MPM (DA-MPM) (25% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.9.

Table 6: Comparing the performance of different PMMH samplers with different numbers S of particle filters and different numbers N of particles in each particle filter for estimating the small scale DSGE model using a real dataset with T=124 observations. Sampler I: MPM (0% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler II: MPM (10% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler III: MPM (25% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler IV: MPM (25% trimmed mean, \rho_{u}=0, disturbance sorting, ADPF). Sampler V: MPM (25% trimmed mean, \rho_{u}=0.9, state sorting, ADPF). Sampler VI: DA-MPM (25% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler VII: MPM (50% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler VIII: MPM (0% trimmed mean, \rho_{u}=0.9, disturbance sorting, bootstrap). Sampler IX: correlated PMMH of Deligiannidis et al. (2018). Time denotes the time in seconds for one iteration of the method. The \widehat{\textrm{IF}}_{\psi,\textrm{MAX}}, \widehat{\textrm{RTNIF}}_{\textrm{MAX}}, \widehat{\textrm{IF}}_{\psi,\textrm{MEAN}}, and \widehat{\textrm{RTNIF}}_{\textrm{MEAN}} entries in the columns headed I to IX are relative to the entries in column VI; the entries in the column "VI actual" are the actual values for sampler VI. The numbers in bold are for the samplers with similar values of \widehat{\textrm{RTNIF}}_{\textrm{MAX}} and \widehat{\textrm{RTNIF}}_{\textrm{MEAN}}.
I II III IV V VI VII VIII IX VI actual
S 100 100 100 100 100 100 100 100 1 100
N 250 100 100 100 100 100 100 250 20000 100
\widehat{\textrm{IF}}_{\psi,\textrm{MAX}} 3.73 0.25 0.22 0.25 0.28 1.00 0.39 NA 2.42 604.72
\widehat{\textrm{RTNIF}}_{\textrm{MAX}} 36.27 1.26 1.11 1.25 1.54 1.00 1.96 NA 114.37 66.52
\widehat{\textrm{IF}}_{\psi,\textrm{MEAN}} 5.88 0.49 0.50 0.49 0.61 1.00 0.59 NA 3.32 190.74
\widehat{\textrm{RTNIF}}_{\textrm{MEAN}} 57.23 2.47 2.49 2.44 3.41 1.00 2.93 NA 157.02 20.98
Time 9.73 5.00 5.00 5.00 5.55 1.00 5.00 6.82 47.27 0.11

3.4.2 Nonlinear Medium Scale DSGE Model

This section applies the MPM sampler with the 50% trimmed mean to estimating the parameters of the second-order medium scale DSGE model of Gust et al. (2017) discussed in section S9 of the online supplement. The medium scale DSGE model is similar to the models in Christiano et al. (2005) and Smets and Wouters (2007), with nominal rigidities in both wages and prices, habit formation in consumption, and investment adjustment costs, along with five structural shocks. We do not consider the zero lower bound (ZLB) on the nominal interest rate because estimating such a model requires solving the fully nonlinear model using global methods instead of local perturbation techniques; the time taken to solve the fully nonlinear model depends on the choice of solution technique and can vary across methods such as value function iteration and time iteration. Since the paper focuses on demonstrating the strength of the estimation algorithm, we restrict our sample to the period before the zero lower bound constraint on the nominal interest rate binds, so that the model can be solved (at a second order approximation) using perturbation methods. The estimation algorithm, however, can also be applied to fully nonlinear models, since the estimation step only requires the decision rules produced by the model solution (global or local).

The baseline model in Gust et al. (2017) consists of 12 state variables, 18 control variables (30 variables combining the state and control variables) and 5 exogenous shocks. We estimate this model using 5 observables, which means that the dimensions of y_{t}, the state vector, and the disturbance vector are 5, 30, and 5, respectively. We include measurement error in the observation equation and consider two cases: the measurement error variances are (1) estimated and (2) fixed at 25% of the sample variance of the observables over the estimation period. We use quarterly US data from 1983Q1 to 2007Q4, a total of 100 observations for each of the five series.

The PMMH samplers considered are: (1) the MPM (50% trimmed mean) with the auxiliary disturbance particle filter (ADPF), disturbance sorting, and the correlation between the random numbers used in estimating the logs of the estimated likelihoods set to \rho_{u}=0.9; (2) the correlated PMMH of Deligiannidis et al. (2018). Each sampler ran for 60000 iterations, with the initial 30000 iterations discarded as burn-in. We use the adaptive random walk proposal of Roberts and Rosenthal (2009) for q\left(\theta^{\prime}|\theta\right). All the samplers ran on a high performance computer cluster with 20 CPU cores.

Table 7 shows the \widehat{\textrm{RTNIF}} of the correlated PMMH relative to the MPM sampler with the 50% trimmed mean (ADPF, disturbance sorting, and \rho_{u}=0.9) for the non-linear medium scale DSGE model, for the cases where the measurement error variances are estimated and fixed at 25% of the sample variance of the observables. The computing time reported in the table is the time to run a single particle filter for the CPM and S=250 particle filters for the MPM approach. The table shows that, in terms of \widehat{\textrm{RTNIF}}_{\textrm{MAX}}, the MPM sampler is 26 and 14 times more efficient than the correlated PMMH for the estimated and fixed measurement error variance cases, respectively. The full details of the inefficiency factors for each parameter are given in Table S11 in section S7 of the online supplement.

Table 8 gives the posterior means and the 2.5% and 97.5% quantile estimates of some of the parameters, with standard errors in brackets, for the medium scale DSGE model estimated using the MPM sampler with the 50% trimmed mean (\rho_{u}=0.9, disturbance sorting, ADPF). The measurement error variances are fixed at 25% of the sample variance of the observables. The table shows that the standard errors are relatively small for all parameters, indicating that the parameters are estimated accurately. Table S10 in section S7 of the online supplement gives the posterior means and the 2.5% and 97.5% quantile estimates of all the parameters in the medium scale DSGE model.

Figure 6 shows the kernel density estimates of some of the parameters of the medium scale DSGE model estimated using the MPM with 50% trimmed mean (ADPF, disturbance sorting, and ρu=0.9\rho_{u}=0.9) and the correlated PMMH. The figure shows that the approximate posterior obtained using the MPM with 50% trimmed mean is very close to the exact posterior obtained using the correlated PMMH.

We also consider a variation of the medium scale DSGE model of Gust et al. (2017) described in section S9 of the online supplement. To study how well our approach performs as the state dimension increases, we estimate the model with the canonical definition of the output gap, defined as the difference between output in the model with nominal rigidities and output in the model with flexible prices and wages. This variation is identical to the baseline model in Gust et al. (2017) except for the Taylor rule specification, which now uses the canonical definition of the output gap. This extension of the medium scale DSGE model consists of 21 state variables, 30 control variables (51 variables combining the state and control variables) and 5 exogenous shocks. We estimate this model using 5 observables, which means that the dimensions of y_{t}, the state vector, and the disturbance vector are 5, 51, and 5, respectively. The measurement error variances are fixed at 25% of the variance of the observables over the estimation period.

Table S12 of the online supplement shows the posterior means and the 2.5% and 97.5% quantiles of the parameters of the extended medium scale DSGE model estimated using the MPM algorithm with the 50% trimmed mean (ADPF, disturbance sorting, and \rho_{u}=0.9) with S=250 and N=100. The table shows that the \widehat{\textrm{IF}}s of the parameters are similar to those for the simpler medium scale DSGE model, indicating that the performance of the MPM algorithm does not deteriorate for the extended model with its higher state dimension.

Table 7: Comparing the performance of different PMMH samplers with different numbers S of particle filters and different numbers N of particles in each particle filter for estimating the medium scale DSGE model using the US quarterly dataset from 1983Q1 to 2007Q4. Sampler I: MPM (50% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler II: correlated PMMH. The first three columns are for the case when the measurement error variances are estimated; the last three columns are for the case when the measurement error variances are fixed at 25% of the sample variance of the observables. In each case, the \widehat{\textrm{IF}}_{\textrm{MAX}}, \widehat{\textrm{RTNIF}}_{\textrm{MAX}}, \widehat{\textrm{IF}}_{\textrm{MEAN}}, and \widehat{\textrm{RTNIF}}_{\textrm{MEAN}} entries in the columns headed I and II are relative to the entries in column I, and the entries in the column "I actual" are the actual values for sampler I.
I II I actual I II I actual
\widehat{\textrm{IF}}_{\textrm{MAX}} 1.00 2.74 796.82 1.00 1.48 383.31
\widehat{\textrm{RTNIF}}_{\textrm{MAX}} 1.00 26.51 1338.66 1.00 14.37 643.96
\widehat{\textrm{IF}}_{\textrm{MEAN}} 1.00 2.78 428.71 1.00 0.96 275.34
\widehat{\textrm{RTNIF}}_{\textrm{MEAN}} 1.00 26.92 720.22 1.00 9.31 462.57
Time 1.00 9.68 1.68 1.00 9.68 1.68
Table 8: Posterior means and 2.5% and 97.5% quantiles of some of the parameters in the medium scale DSGE model estimated using the MPM sampler with the 50% trimmed mean (\rho_{u}=0.9, disturbance sorting, ADPF) for the US quarterly dataset from 1983Q1 to 2007Q4. The measurement error variances are fixed at 25% of the variance of the observables. The standard errors in brackets are calculated from 10 independent runs of the MPM sampler.
Param. Estimates 2.5% 97.5%
\rho_{R} 0.4329 (0.0178) 0.2097 (0.0365) 0.6490 (0.0220)
\rho_{g} 0.9883 (0.0008) 0.9770 (0.0020) 0.9965 (0.0005)
\rho_{\mu} 0.9883 (0.0191) 0.4136 (0.0309) 0.7598 (0.0410)
100\sigma_{g} 0.2188 (0.0104) 0.0973 (0.0129) 0.3992 (0.0182)
100\sigma_{\mu} 4.1832 (0.0978) 2.1935 (0.2905) 7.0727 (0.2697)
100\sigma_{\eta} 0.3614 (0.0096) 0.2706 (0.0082) 0.4837 (0.0151)
100\sigma_{Z} 0.6240 (0.0118) 0.4707 (0.0168) 0.7954 (0.0121)
100\sigma_{R} 0.0462 (0.0032) 0.0059 (0.0002) 0.1570 (0.0123)
Figure 6: Kernel density estimates of the posterior density of some of the parameters of the medium scale DSGE model for the US quarterly dataset from 1983Q1 to 2007Q4 estimated using: (1) MPM with the 50% trimmed mean (\rho_{u}=0.9, disturbance sorting, ADPF) with S=250 and N=100; (2) correlated PMMH with N=50000. The measurement error variances are fixed at 25% of the variance of the observables.

4 Summary and conclusions

The article proposes a general multiple PMMH (MPM) approach for estimating the posterior density of the parameters of complex and high-dimensional state space models. It is especially useful when a single bootstrap filter is inefficient but the more efficient auxiliary particle filter cannot be used because the state transition density is computationally intractable. The MPM method is a general extension of the PMMH method and makes four innovations: (a) it is based on the mean or trimmed mean of unbiased likelihood estimators; (b) it uses a novel block version of PMMH that works with multiple particle filters; (c) it proposes an auxiliary disturbance particle filter to estimate the likelihood, which is especially useful when the dimension of the states is much larger than the dimension of the disturbances; (d) it proposes a fast Euclidean sorting algorithm to preserve the correlation between the logs of the estimated likelihoods at the current and proposed parameter values. The MPM samplers are applied to complex multivariate stochastic volatility and DSGE models and shown to work well. Future research will consider developing better proposals for the parameter \theta.

Online supplement for “The Block-Correlated Pseudo Marginal Sampler for State Space Models”


S1 The Disturbance Particle Filter Algorithm

This section discusses the disturbance particle filter algorithm. Let u be the random vector used to obtain the unbiased estimate of the likelihood. It has the two components u_{\epsilon,1:T}^{1:N} and u_{A,1:T-1}^{1:N}; u_{\epsilon,t}^{i} is the vector random variable used to generate the particle \epsilon_{t}^{i} given \theta. We can write

\epsilon_{1}^{i}\sim m\left(\epsilon_{1}^{i}|u_{\epsilon,1}^{i},\theta\right),\;z_{1}^{i}=F\left(z_{0},\epsilon_{1}^{i};\theta\right)\quad\textrm{and}\quad\epsilon_{t}^{i}\sim m\left(\epsilon_{t}^{i}|u_{\epsilon,t}^{i},\theta\right),\;z_{t}^{i}=F\left(z_{t-1}^{a_{t-1}^{i}},\epsilon_{t}^{i};\theta\right),\;t\geq 2, \qquad (S1)

where m\left(\epsilon_{t}^{i}|u_{\epsilon,t}^{i},\theta\right) is the proposal density used to generate \epsilon_{t}^{i}, and z_{0} is the initial state vector with density p(z_{0}|\theta). Denote the distribution of u_{\epsilon,t}^{i} by \psi_{\epsilon t}\left(\cdot\right). For t\geq 2, let u_{A,t-1} be the vector of random variables used to generate the ancestor indices a_{t-1}^{1:N} using the resampling scheme M\left(a_{t-1}^{1:N}|\overline{w}_{t-1}^{1:N},z_{t-1}^{1:N}\right), and define \psi_{At-1}\left(\cdot\right) as the distribution of u_{A,t-1}. Common choices for \psi_{\epsilon t}\left(\cdot\right) and \psi_{At-1}\left(\cdot\right) are i.i.d. N\left(0,1\right) and i.i.d. U\left(0,1\right) random variables, respectively.
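In the correlated samplers, all of u is kept in Gaussian form and refreshed with a Crank-Nicolson step. The small sketch below illustrates this; mapping the resampling block to U(0,1) through the standard normal CDF is our assumption of a common implementation, not a prescription from the paper, and the array sizes are purely illustrative:

import numpy as np
from scipy.stats import norm

def correlated_update(u, rho_u, rng):
    # u' = rho_u * u + sqrt(1 - rho_u^2) * eta with eta ~ N(0, I);
    # this update leaves the N(0, I) distribution of u invariant.
    return rho_u * u + np.sqrt(1.0 - rho_u ** 2) * rng.standard_normal(u.shape)

rng = np.random.default_rng(1)
u_eps = rng.standard_normal((100, 200))   # T = 100 periods, N = 200 particles
u_A = rng.standard_normal((99, 200))      # T - 1 resampling blocks
u_A_uniform = norm.cdf(u_A)               # mapped to U(0,1) for the resampling scheme
u_eps_prime = correlated_update(u_eps, rho_u=0.9, rng=rng)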

Algorithm S1 takes the number of particles N, the parameters \theta, the random variables used to generate the disturbance particles u_{\epsilon,1:T}^{1:N}, and the random variables used in the resampling steps u_{A,1:T-1}^{1:N} as inputs; it outputs the set of state particles z_{1:T}^{1:N}, the disturbance particles \epsilon_{1:T}^{1:N}, the ancestor indices A_{1:T-1}^{1:N}, and the weights \overline{w}_{1:T}^{1:N}. At t=1, the disturbance particles \epsilon_{1}^{1:N} are obtained as a function of the random numbers u_{\epsilon,1}^{1:N} using Eq. (S1) in step (1a), and the state particles are obtained from z_{1}^{i}=F\left(z_{0},\epsilon_{1}^{i};\theta\right) for i=1,\dots,N; the weights for all particles are then computed in steps (1b) and (1c).

Step (2a) sorts the state or disturbance particles from smallest to largest using the Euclidean sorting procedure described in section 2.4 to obtain the sorted disturbance particles, sorted state particles, and sorted weights. Algorithm S2 then resamples the particles using correlated multinomial resampling to obtain the sorted ancestor indices \widetilde{a}_{t-1}^{1:N}. Step (2d) generates the disturbance particles \epsilon_{t}^{1:N} as a function of the random numbers u_{\epsilon,t}^{1:N} using Eq. (S1), and the state particles are obtained from z_{t}^{i}=F\left(z_{t-1}^{a_{t-1}^{i}},\epsilon_{t}^{i};\theta\right) for i=1,\dots,N; the weights for all particles are then computed in steps (2e) and (2f).

The disturbance particle filter provides the unbiased estimate of the likelihood

\widehat{p}_{N}\left(y|\theta,u\right):=\prod_{t=1}^{T}\left(N^{-1}\sum_{i=1}^{N}w_{t}^{i}\right),

where

w_{t}^{i}=\frac{p\left(y_{t}|z_{t}^{i},\theta\right)p\left(\epsilon_{t}^{i}\right)}{m\left(\epsilon_{t}^{i}|u_{\epsilon,t}^{i},\theta\right)}\quad\textrm{for }t=1,\dots,T,\quad\textrm{and}\quad\overline{w}_{t}^{i}=\frac{w_{t}^{i}}{\sum_{j=1}^{N}w_{t}^{j}}. \qquad (S2)
Algorithm S1 The Correlated Disturbance Particle Filter

Input: u_{\epsilon,1:T}^{1:N}, u_{A,1:T-1}^{1:N}, \theta and N

Output: \epsilon_{1:T}^{1:N}, z_{1:T}^{1:N}, A_{1:T-1}^{1:N}, and \overline{w}_{1:T}^{1:N}

For t=1:

  • (1a) Generate \epsilon_{1}^{i} from m\left(\epsilon_{1}^{i}|u_{\epsilon,1}^{i},\theta\right) and set z_{1}^{i}=F\left(z_{0},\epsilon_{1}^{i};\theta\right) for i=1,\dots,N.

  • (1b) Compute the unnormalised weights w_{1}^{i} for i=1,\dots,N.

  • (1c) Compute the normalised weights \overline{w}_{1}^{i} for i=1,\dots,N.

For t\geq 2:

  • (2a) Sort the state particles z_{t-1}^{i} or the disturbance particles \epsilon_{t-1}^{i} using the Euclidean sorting method of Choppala et al. (2016) to obtain the sorted index \zeta_{i} for i=1,\dots,N, and the sorted state particles, disturbance particles, and weights \widetilde{z}_{t-1}^{i}=z_{t-1}^{\zeta_{i}}, \widetilde{\epsilon}_{t-1}^{i}=\epsilon_{t-1}^{\zeta_{i}}, and \widetilde{\overline{w}}_{t-1}^{i}=\overline{w}_{t-1}^{\zeta_{i}}, for i=1,\dots,N.

  • (2b) Obtain the ancestor indices based on the sorted particles, \widetilde{a}_{t-1}^{1:N}, using the correlated multinomial resampling in Algorithm S2.

  • (2c) Obtain the ancestor indices a_{t-1}^{i} based on the original order of the particles for i=1,\dots,N.

  • (2d) Generate \epsilon_{t}^{i} from m\left(\epsilon_{t}^{i}|u_{\epsilon,t}^{i},\theta\right) and set z_{t}^{i}=F\left(z_{t-1}^{a_{t-1}^{i}},\epsilon_{t}^{i};\theta\right) for i=1,\dots,N.

  • (2e) Compute the unnormalised weights w_{t}^{i} for i=1,\dots,N.

  • (2f) Compute the normalised weights \overline{w}_{t}^{i} for i=1,\dots,N.
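The following minimal univariate sketch of Algorithm S1 (Python with NumPy/SciPy) uses a bootstrap proposal, so that m(\epsilon|u,\theta)=p(\epsilon) and the weights in Eq. (S2) reduce to p(y_{t}|z_{t}^{i},\theta). F and obs_logpdf are hypothetical model callables, and the one-dimensional sort stands in for the multivariate Euclidean sorting of section 2.4:

import numpy as np
from scipy.stats import norm

def disturbance_pf(y, F, obs_logpdf, z0, u_eps, u_A):
    # y: length-T array of observations; F(z, eps) -> next state (vectorised);
    # obs_logpdf(y_t, z) -> log p(y_t | z_t, theta) evaluated at each particle.
    # u_eps: (T, N) standard normals driving the disturbance particles;
    # u_A: (T-1, N) standard normals driving the resampling steps.
    # Returns log p_hat_N(y | theta, u) = sum_t log( mean_i w_t^i ).
    T, N = u_eps.shape
    loglike = 0.0
    z = F(np.full(N, z0, dtype=float), u_eps[0])        # step (1a)
    logw = obs_logpdf(y[0], z)                          # bootstrap weights
    for t in range(1, T):
        m = logw.max()
        loglike += m + np.log(np.mean(np.exp(logw - m)))
        wbar = np.exp(logw - m)
        wbar /= wbar.sum()                              # steps (1c)/(2f)
        order = np.argsort(z)                           # (2a): 1-d stand-in for the Euclidean sort
        z_sorted, w_sorted = z[order], wbar[order]
        cdf = np.cumsum(w_sorted)                       # (2b): Algorithm S2 on the sorted particles
        a = np.minimum(np.searchsorted(cdf, norm.cdf(u_A[t - 1])), N - 1)
        z = F(z_sorted[a], u_eps[t])                    # (2d)
        logw = obs_logpdf(y[t], z)                      # (2e)
    m = logw.max()
    loglike += m + np.log(np.mean(np.exp(logw - m)))
    return loglike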

Algorithm S2 Multinomial Resampling Algorithm

Input: u_{A,t-1}, sorted states \widetilde{z}_{t-1}^{1:N}, sorted disturbances \widetilde{\epsilon}_{t-1}^{1:N}, and sorted weights \widetilde{\overline{w}}_{t-1}^{1:N}

Output: \widetilde{a}_{t-1}^{1:N}

  1. Compute the cumulative weights

    \widehat{F}_{t-1}^{N}\left(j\right)=\sum_{i=1}^{j}\widetilde{\overline{w}}_{t-1}^{i}

    based on the sorted state particles \left\{\widetilde{z}_{t-1}^{1:N},\widetilde{\overline{w}}_{t-1}^{1:N}\right\} or the sorted disturbance particles \left\{\widetilde{\epsilon}_{t-1}^{1:N},\widetilde{\overline{w}}_{t-1}^{1:N}\right\}.

  2. Set \widetilde{a}_{t-1}^{i}=\min\left\{j:\widehat{F}_{t-1}^{N}\left(j\right)\geq u_{A,t-1}^{i}\right\} for i=1,\dots,N. Note that \widetilde{a}_{t-1}^{i}, i=1,\dots,N, are the ancestor indices based on the sorted states or the sorted disturbances.
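Steps 1 and 2 amount to inverting the empirical CDF of the sorted weights at the fixed uniforms; a sketch, with a guard against the cumulative sum rounding slightly below one:

import numpy as np

def correlated_multinomial_resample(w_sorted, u_uniform):
    # w_sorted: normalised weights in sorted-particle order;
    # u_uniform: the N fixed uniforms derived from u_{A,t-1}.
    # Returns a_tilde[i] = min{ j : F_hat(j) >= u_uniform[i] } (0-based indices).
    cdf = np.cumsum(w_sorted)
    idx = np.searchsorted(cdf, u_uniform, side="left")
    return np.minimum(idx, w_sorted.size - 1)  # guard: cumsum can round to < 1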

S2 Delayed Acceptance Multiple PMMH (DA-MPM)

Delayed acceptance MCMC can be used to speed up computation for models with expensive likelihoods (Christen and Fox, 2005). The motivation for delayed acceptance is to avoid computing the expensive likelihood if it is likely that the proposed draw will ultimately be rejected. A first accept-reject stage is applied with a cheap (or deterministic) approximation substituted for the expensive likelihood in the Metropolis-Hastings acceptance ratio. Then, only for a proposal that is accepted in the first stage, the computationally expensive likelihood is calculated, with a second accept-reject stage ensuring that detailed balance is satisfied with respect to the true posterior. This section discusses the delayed acceptance multiple PMMH (DA-MPM) algorithm.

Given the current parameter value \theta, the delayed acceptance Metropolis-Hastings (MH) algorithm proposes a new value \theta^{\prime} from the proposal density q\left(\theta^{\prime}|\theta\right) and uses a cheap approximation \widehat{p}^{c}\left(y|\theta^{\prime}\right) to the likelihood in the first-stage MH acceptance probability

\alpha_{1}\left(\theta,\widetilde{u};\theta^{\prime},\widetilde{u}^{\prime}\right)=\min\left(1,\frac{\widehat{p}^{c}\left(y|\theta^{\prime}\right)p\left(\theta^{\prime}\right)q\left(\theta|\theta^{\prime}\right)}{\widehat{p}^{c}\left(y|\theta\right)p\left(\theta\right)q\left(\theta^{\prime}|\theta\right)}\right). \qquad (S3)

As an alternative to particle filtering, Norgaard et al. (2000) develop the central difference Kalman filter (CDKF) for estimating the state in general non-linear and non-Gaussian state space models. Andreasen (2013) shows that the CDKF frequently outperforms the extended Kalman filter (EKF) for such models. We use the likelihood approximation \widehat{p}^{c}(y|\theta) obtained from the CDKF in the first accept-reject stage in Eq. (S3).

A second accept-reject stage is applied to a proposal that is accepted in the first stage, with the second acceptance probability

\alpha_{2}\left(\theta,\widetilde{u};\theta^{\prime},\widetilde{u}^{\prime}\right)=\min\left(1,\frac{\overline{\widehat{p}}_{N}\left(y|\theta^{\prime},\widetilde{u}^{\prime}\right)\widehat{p}^{c}\left(y|\theta\right)}{\overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}\right)\widehat{p}^{c}\left(y|\theta^{\prime}\right)}\right). \qquad (S4)

The overall acceptance probability \alpha_{1}\left(\theta,\widetilde{u};\theta^{\prime},\widetilde{u}^{\prime}\right)\alpha_{2}\left(\theta,\widetilde{u};\theta^{\prime},\widetilde{u}^{\prime}\right) ensures that detailed balance is satisfied with respect to the true posterior. If a rejection occurs at the first stage, then the expensive evaluation of the likelihood at the second stage is avoided (Sherlock et al., 2017a). Algorithm S3 describes the delayed acceptance MPM algorithm.

Algorithm S3 The Delayed Acceptance Multiple PMMH (DA-MPM) algorithm

  • Set the initial values of \theta^{(0)} arbitrarily.

  • Sample u_{s}\sim N\left(0,I\right) for s=1,\dots,S, run S (disturbance) particle filters to estimate the likelihood \overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}\right) as the trimmed mean of \widehat{p}_{N}\left(y|\theta,u_{s}\right), s=1,\dots,S (a 0% trimmed mean is the mean and a 50% trimmed mean is the median), and run Algorithm 4 in section 2.5 to construct the (initial) proposal for the auxiliary disturbance particle filter.

  • For each MCMC iteration p, p=1,\dots,P:

    • (1) Sample \theta^{\prime} from the proposal density q\left(\theta^{\prime}|\theta\right).

    • (2) Compute the likelihood approximation \widehat{p}^{c}\left(y|\theta^{\prime}\right) using the central difference Kalman filter (CDKF).

    • (3) Accept the first stage proposal with the acceptance probability in Eq. (S3). If the proposal is accepted, go to step (4); otherwise go to step (8).

    • (4) Choose index s with probability 1/S, sample \eta_{u}\sim N\left(0,I\right), and set u_{s}^{\prime}=\rho_{u}u_{s}+\sqrt{1-\rho_{u}^{2}}\eta_{u}.

    • (5) Run S particle filters to compute the estimate of the likelihood \overline{\widehat{p}}_{N}\left(y|\theta^{\prime},\widetilde{u}^{\prime}\right).

    • (6) Run Algorithm 4 in section 2.5 to construct the proposal for the auxiliary disturbance particle filter for iteration p+1.

    • (7) Accept the second stage proposal with the acceptance probability in Eq. (S4). If the proposal is accepted, set \overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}\right)^{(p)}=\overline{\widehat{p}}_{N}\left(y|\theta^{\prime},\widetilde{u}^{\prime}\right), \widehat{p}^{c}\left(y|\theta\right)^{(p)}=\widehat{p}^{c}\left(y|\theta^{\prime}\right), \widetilde{u}^{(p)}=\widetilde{u}^{\prime}, and \theta^{(p)}=\theta^{\prime}; otherwise go to step (8).

    • (8) Otherwise, set \overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}\right)^{(p)}=\overline{\widehat{p}}_{N}\left(y|\theta,\widetilde{u}\right)^{(p-1)}, \widehat{p}^{c}\left(y|\theta\right)^{(p)}=\widehat{p}^{c}\left(y|\theta\right)^{(p-1)}, \widetilde{u}^{(p)}=\widetilde{u}^{(p-1)}, and \theta^{(p)}=\theta^{(p-1)}.
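A compact sketch of one DA-MPM iteration follows, assuming a symmetric proposal so that the q terms in Eq. (S3) cancel; cheap_loglike, expensive_loglike, propose, and log_prior are hypothetical callables standing in for the CDKF approximation, the trimmed-mean particle filter estimate, the random walk proposal, and the prior:

import numpy as np

def da_mpm_step(theta, state, propose, log_prior, cheap_loglike, expensive_loglike, rng):
    # state caches (log p^c, log p_hat) at the current theta.
    logc, logp = state
    theta_p = propose(theta, rng)
    logc_p = cheap_loglike(theta_p)
    # Stage 1 (Eq. S3): screen with the cheap approximation only.
    if np.log(rng.uniform()) >= (logc_p + log_prior(theta_p)) - (logc + log_prior(theta)):
        return theta, state                       # rejected cheaply; no particle filters run
    logp_p = expensive_loglike(theta_p)           # only now pay for the S particle filters
    # Stage 2 (Eq. S4): correct with the expensive estimate so that the
    # product of the two stage acceptance probabilities satisfies detailed balance.
    if np.log(rng.uniform()) < (logp_p + logc) - (logp + logc_p):
        return theta_p, (logc_p, logp_p)
    return theta, state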

S3 Overview of DSGE models

This section briefly overviews DSGE models. The equilibrium conditions for a wide variety of DSGE models can be summarized by

E_{t}G(z_{t+1},z_{t},\epsilon_{t+1};\theta)=0; \qquad (S5)

E_{t} is the expectation conditional on date t information and G:\mathbb{R}^{2n+m}\mapsto\mathbb{R}^{n}; z_{t} is an n\times 1 vector containing all variables known at time t; \epsilon_{t+1} is an m\times 1 vector of serially independent innovations. The solution to Eq. (S5) for z_{t+1} can be written as

z_{t+1}=F(z_{t},\epsilon_{t+1},\zeta;\theta) \qquad (S6)

such that

E_{t}G(F(z_{t},\epsilon_{t+1},\zeta;\theta),z_{t},\epsilon_{t+1};\theta)=0\quad\text{for any}\;t.

For our applications, we assume that \epsilon_{t}\sim N(0,\zeta^{2}\Sigma_{\epsilon}), where \zeta is a scalar perturbation parameter. Under suitable differentiability assumptions, and using the notation in Kollmann (2015), we now describe the first and second order accurate solutions of DSGE models.

For most applications, the function F(\cdot) in Eq. (S6) is analytically intractable and is approximated using local solution techniques. We use first and second order Taylor series approximations around the deterministic steady state with \zeta=0 to approximate F. The deterministic steady state z^{s} satisfies z^{s}=F(z^{s},0,0;\theta).

A first order accurate approximation around the deterministic steady state is

z_{t+1}^{d}=F_{1}(\theta)z_{t}^{d}+F_{2}(\theta)\epsilon_{t+1},\quad z_{0}^{d}=0. \qquad (S7)

For Eq. (S7) to be stable, all the eigenvalues of F_{1}(\theta) must be less than 1 in absolute value. The initial value is z_{0}^{d}=0 because we work with approximations around the deterministic steady state. For a given set of parameters \theta, the matrices F_{1}(\theta) and F_{2}(\theta) can be obtained using existing software; our applications use Dynare.

The second order accurate approximation (around the deterministic steady state) is

z_{t+1}^{d}=F_{0}(\theta)\zeta^{2}+F_{1}(\theta)z_{t}^{d}+F_{2}(\theta)\epsilon_{t+1}+F_{11}(\theta)P(z_{t}^{d})+F_{12}(\theta)(z_{t}^{d}\otimes\epsilon_{t+1})+F_{22}(\theta)P(\epsilon_{t+1}); \qquad (S8)

where z_{t}^{d}=(z_{t}-z^{s}) and, for any vector x:=(x_{1},\dots,x_{m})^{\top}, we define P(x):=\text{vech}(xx^{\top}), where \text{vech}(xx^{\top}) stacks the upper triangle and diagonal of xx^{\top}, so that \text{vech}(xx^{\top}):=(x_{1}^{2},x_{1}x_{2},x_{2}^{2},x_{1}x_{3},\dots,x_{m}^{2})^{\top}. The term F_{0}(\theta)\zeta^{2} captures the level correction due to uncertainty from taking a second-order approximation. For our analysis we normalise the perturbation parameter \zeta to 1. As before, for a given set of parameters \theta, the matrices F_{1}(\theta),F_{2}(\theta),F_{11}(\theta),F_{12}(\theta),F_{22}(\theta) can be obtained using Dynare.
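The map P(\cdot) is easy to implement directly from this ordering; a small sketch with a worked check for m=3:

import numpy as np

def P(x):
    # vech(x x^T) in the column ordering used above:
    # (x1^2, x1 x2, x2^2, x1 x3, x2 x3, x3^2, ..., xm^2), length m(m+1)/2.
    x = np.asarray(x, dtype=float)
    return np.concatenate([x[: j + 1] * x[j] for j in range(x.size)])

# worked check for m = 3:
assert np.allclose(P([1.0, 2.0, 3.0]), [1.0, 2.0, 4.0, 3.0, 6.0, 9.0])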

The measurement (observation) equation for the DSGE model and its approximations is

y_{t}=Hz_{t}+\eta_{t},\quad\eta_{t}\sim N\left(0,\Sigma_{\eta}\right),

where H is a known matrix; \{\eta_{t}\} is an i.i.d. N\left(0,\Sigma_{\eta}\right) sequence that is independent of the \{\epsilon_{t}\} sequence. The matrix \Sigma_{\eta} is usually unknown and is estimated from the data.

S3.1 Pruned State Space Representation of DSGE Models

To obtain a stable solution we use the pruning approach recommended in Kim et al. (2008); for details of how pruning is implemented in DSGE models, see Kollmann (2015) and Andreasen et al. (2018). The pruned second order solution preserves only second order accurate terms by using the first order accurate solution to calculate P(z_{t}^{(2)}) and (z_{t}^{(2)}\otimes\epsilon_{t+1}), with P(\cdot) defined in section S3. That is, we obtain the solution preserving only the second order effects by using P(z_{t}^{(1)}) instead of P(z_{t}^{(2)}) and (z_{t}^{(1)}\otimes\epsilon_{t+1}) instead of (z_{t}^{(2)}\otimes\epsilon_{t+1}) in Eq. (S8). The variables are still expressed as deviations from the steady state; however, for notational simplicity, we drop the superscript d.

The evolution of P(z_{t}^{(1)}) is

P(z_{t+1}^{(1)})=K_{11}P(z_{t}^{(1)})+K_{12}(z_{t}^{(1)}\otimes\epsilon_{t+1})+K_{22}P(\epsilon_{t+1}),

where K_{11}, K_{12}, and K_{22} are functions of F_{1} and F_{2}. A consequence of using pruning to preserve only second order effects is that it increases the dimension of the state space. The pruning-augmented solution of the DSGE model is given by

\begin{bmatrix}z^{(2)}_{t+1}\\ P(z_{t+1}^{(1)})\\ z_{t+1}^{(1)}\end{bmatrix}=\begin{bmatrix}F_{0}\zeta^{2}\\ 0\\ 0\end{bmatrix}+\begin{bmatrix}F_{1}&F_{11}&0\\ 0&K_{11}&0\\ 0&0&F_{1}\end{bmatrix}\begin{bmatrix}z^{(2)}_{t}\\ P(z_{t}^{(1)})\\ z_{t}^{(1)}\end{bmatrix}+\begin{bmatrix}F_{2}\\ 0\\ F_{2}\end{bmatrix}\epsilon_{t+1}+\begin{bmatrix}F_{12}\\ K_{12}\\ 0\end{bmatrix}(z_{t}^{(1)}\otimes\epsilon_{t+1})+\begin{bmatrix}F_{22}\\ K_{22}\\ 0\end{bmatrix}P(\epsilon_{t+1}).

The augmented state representation of the pruned second order accurate system becomes

\widetilde{z}_{t+1}=g_{0}+G_{1}\widetilde{z}_{t}+G_{2}\epsilon_{t+1}+G_{12}(z_{t}^{(1)}\otimes\epsilon_{t+1})+G_{22}P(\epsilon_{t+1}),

where \widetilde{z}_{t}=[z^{(2)}_{t},P(z_{t}^{(1)}),z_{t}^{(1)}]^{\top}. We now summarise the state transition and measurement equations for the pruned second order accurate system as

y_{t+1}=H\widetilde{z}_{t+1}+\eta_{t+1},\quad\eta_{t+1}\sim N\left(0,\Sigma_{\eta}\right),
\widetilde{z}_{t+1}=g_{0}+G_{1}\widetilde{z}_{t}+G_{2}\epsilon_{t+1}+G_{12}(z_{t}^{(1)}\otimes\epsilon_{t+1})+G_{22}P(\epsilon_{t+1}),\quad\epsilon_{t+1}\sim N\left(0,\zeta^{2}\Sigma_{\epsilon}\right). \qquad (S9)

The matrix H selects the observables from the pruning-augmented state space representation of the system. If \dim(z_{t}^{(2)})=n\times 1 and \dim(\epsilon_{t})=m\times 1, then \dim(P(z_{t}^{(1)}))=n(n+1)/2\times 1 and \dim(\widetilde{z}_{t})=[n+n(n+1)/2+n]\times 1. If n_{y}\leq n denotes the number of observables, then the selection matrix is H=[H_{n_{y}};\mathbf{0}], where \dim(H_{n_{y}})=n_{y}\times[n+n(n+1)/2+n] and \dim(\mathbf{0})=(n-n_{y})\times[n+n(n+1)/2+n]. The selection matrix H_{n_{y}} consists only of zeros and ones.
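Given solver output, simulating the pruned system in Eq. (S9) is a single recursion on the augmented state. The sketch below assumes the matrices g0, G1, G2, G12, G22, H and the covariance matrices come from a solver such as Dynare, and that the first order block z^{(1)} occupies the last n entries of the augmented state:

import numpy as np

def simulate_pruned(T, g0, G1, G2, G12, G22, H, Sigma_eps, Sigma_eta, n, rng):
    # One draw from the pruned system in Eq. (S9), started at the deterministic
    # steady state (augmented state equal to zero); P is the vech map of section S3.
    P = lambda x: np.concatenate([x[: j + 1] * x[j] for j in range(x.size)])
    z = np.zeros(G1.shape[0])
    ys = []
    for _ in range(T):
        eps = rng.multivariate_normal(np.zeros(Sigma_eps.shape[0]), Sigma_eps)
        z1 = z[-n:]                                  # first order block z_t^{(1)}
        z = g0 + G1 @ z + G2 @ eps + G12 @ np.kron(z1, eps) + G22 @ P(eps)
        eta = rng.multivariate_normal(np.zeros(Sigma_eta.shape[0]), Sigma_eta)
        ys.append(H @ z + eta)
    return np.asarray(ys)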

S4 Additional tables and figures for the linear Gaussian state space model example

This section gives additional tables and figures for the linear Gaussian state space model in Section 3.2.

Tables S1 and S2 show the variance of the log of the estimated likelihood obtained using five different approaches for the d=1 dimensional linear Gaussian state space model with T=200 and T=300 time periods, respectively. The tables show that, for all approaches, there is a substantial reduction in the variance of the log of the estimated likelihood as the number of particle filters S and the number of particles N within each particle filter increase.

Tables S3 to S5 show the variance of the log of the estimated likelihood obtained using the five estimators of the likelihood for the d=5 dimensional linear Gaussian state space model with T=200 and T=300 time periods and for the d=10 dimensional model with T=200 time periods. The tables show that: (1) there is no substantial reduction in the variance of the log of the estimated likelihood for the 0% trimmed mean; (2) the 5%, 10%, 25%, and 50% trimmed means decrease the variance substantially as S and/or N increase. The 25% and 50% trimmed means have the smallest variance in all cases.
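These variance reductions are easy to reproduce qualitatively. The following sketch assumes, purely for illustration, that the S log-likelihood estimates are independent N(-\sigma^{2}/2, \sigma^{2}) draws (so the implied likelihood estimates are unbiased) and Monte Carlo estimates the variance of the log of the trimmed-mean estimator; it is a stylised stand-in for the particle filter output, not a replication of the tables:

import numpy as np

rng = np.random.default_rng(2)

def var_log_trimmed_mean(S, trim, sigma2=1.0, reps=1000):
    # trim is removed from each tail, capped so at least one estimate is kept.
    loglike = rng.normal(-0.5 * sigma2, np.sqrt(sigma2), size=(reps, S))
    x = np.sort(loglike, axis=1)
    k = min(int(np.floor(trim * S)), (S - 1) // 2)
    kept = x[:, k:S - k]
    m = kept.max(axis=1, keepdims=True)
    est = m[:, 0] + np.log(np.exp(kept - m).mean(axis=1))
    return est.var()

for trim in (0.0, 0.05, 0.10, 0.25, 0.50):
    print(f"trim = {trim:.2f}: variance = {var_log_trimmed_mean(100, trim):.4f}")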

Table S1: Comparing the variance of the log of the estimated likelihood for five different estimators of the likelihood: I: averaging the likelihood (0% trimmed mean); II: averaging the likelihood (5% trimmed mean); III: averaging the likelihood (10% trimmed mean); IV: averaging the likelihood (25% trimmed mean); and V: averaging the likelihood (50% trimmed mean), for the d=1 dimensional linear Gaussian state space model with T=200. The variance of the log of the estimated likelihood of a single particle filter is reported under the column “Single”. The results are based on 1000 independent runs.
N S I II III IV V Single
100 1 2.881
 20 0.271 0.191 0.175 0.166 0.191
 50 0.131 0.077 0.068 0.064 0.080
 100 0.081 0.040 0.036 0.035 0.043
 500 0.018 0.007 0.007 0.007 0.008
 1000 0.011 0.004 0.003 0.003 0.004
250 1 1.066
 20 0.073 0.061 0.058 0.061 0.077
 50 0.030 0.024 0.022 0.023 0.030
 100 0.016 0.012 0.011 0.012 0.015
 500 0.003 0.002 0.002 0.002 0.003
 1000 0.002 0.001 0.001 0.001 0.002
500 1 0.493
 20 0.032 0.029 0.028 0.030 0.037
 50 0.013 0.012 0.012 0.013 0.017
 100 0.007 0.006 0.006 0.006 0.008
 500 0.001 0.001 0.001 0.001 0.002
 1000 0.001 0.001 0.001 0.001 0.001
1000 1 0.262
 20 0.014 0.014 0.014 0.016 0.020
 50 0.006 0.006 0.006 0.006 0.008
 100 0.003 0.003 0.003 0.003 0.004
 500 0.001 0.001 0.001 0.001 0.001
 1000 0.000 0.000 0.000 0.000 0.000
Table S2: Comparing the variance of the log of the estimated likelihood for five different estimators of the likelihood: I: averaging the likelihood (0% trimmed mean); II: averaging the likelihood (5% trimmed mean); III: averaging the likelihood (10% trimmed mean); IV: averaging the likelihood (25% trimmed mean); and V: averaging the likelihood (50% trimmed mean), for the d=1 dimensional linear Gaussian state space model with T=300. The variance of the log of the estimated likelihood of a single particle filter is reported under the column “Single”. The results are based on 1000 independent runs.
N S I II III IV V Single
100 1 3.277
 20 0.455 0.298 0.255 0.229 0.275
 50 0.251 0.123 0.098 0.088 0.109
 100 0.149 0.056 0.048 0.043 0.054
 500 0.041 0.013 0.011 0.010 0.012
 1000 0.023 0.005 0.005 0.004 0.005
250 1 1.509
 20 0.121 0.096 0.091 0.092 0.110
 50 0.054 0.038 0.035 0.036 0.047
 100 0.027 0.019 0.018 0.018 0.023
 500 0.006 0.004 0.003 0.004 0.005
 1000 0.003 0.002 0.002 0.002 0.002
500 1 0.686
 20 0.048 0.041 0.040 0.041 0.050
 50 0.021 0.018 0.018 0.018 0.023
 100 0.011 0.009 0.009 0.009 0.012
 500 0.002 0.002 0.002 0.002 0.002
 1000 0.001 0.001 0.001 0.001 0.001
1000 1 0.344
 20 0.020 0.019 0.019 0.021 0.027
 50 0.008 0.008 0.008 0.008 0.010
 100 0.004 0.004 0.004 0.004 0.006
 500 0.001 0.001 0.001 0.001 0.001
 1000 0.000 0.000 0.000 0.000 0.001
Table S3: Comparing the variance of the log of the estimated likelihood for five different estimators of the likelihood: I: averaging the likelihood (0% trimmed mean); II: averaging the likelihood (5% trimmed mean); III: averaging the likelihood (10% trimmed mean); IV: averaging the likelihood (25% trimmed mean); and V: averaging the likelihood (50% trimmed mean), for the d=5 dimensional linear Gaussian state space model with T=200. The variance of the log of the estimated likelihood of a single particle filter is reported under the column “Single”. The results are based on 1000 independent runs.
N S I II III IV V Single
100 1 44.224
 20 11.611 5.939 4.797 3.612 3.399
 50 8.280 2.935 1.88 1.523 1.515
 100 6.681 1.408 0.997 0.712 0.725
 500 4.871 0.238 0.177 0.136 0.148
 1000 4.358 0.134 0.099 0.068 0.068
250 1 21.175
 20 4.465 2.627 2.099 1.556 1.517
 50 3.470 1.212 0.781 0.551 0.588
 100 2.677 0.527 0.411 0.295 0.290
 500 1.652 0.111 0.081 0.059 0.063
 1000 1.511 0.054 0.039 0.029 0.032
500 1 9.981
 20 2.415 1.342 1.023 0.804 0.819
 50 1.607 0.550 0.371 0.294 0.326
 100 1.216 0.244 0.191 0.154 0.175
 500 0.691 0.053 0.039 0.030 0.034
 1000 0.516 0.026 0.020 0.015 0.018
1000 1 6.090
 20 1.127 0.631 0.497 0.410 0.463
 50 0.661 0.255 0.181 0.151 0.167
 100 0.502 0.119 0.094 0.076 0.085
 500 0.201 0.023 0.020 0.016 0.019
 1000 0.141 0.011 0.009 0.007 0.009
Table S4: Comparing the variance of the log of the estimated likelihood for five different estimators of the likelihood: I: averaging the likelihood (0% trimmed mean); II: averaging the likelihood (5% trimmed mean); III: averaging the likelihood (10% trimmed mean); IV: averaging the likelihood (25% trimmed mean); and V: averaging the likelihood (50% trimmed mean); for the d=5 dimensional linear Gaussian state space model with T=300. The variance of the log of the estimated likelihood of a single particle filter is reported under the column "Single". The results are based on 1000 independent runs.
N S I II III IV V Single
100 1 56.287
20 16.252 8.660 7.130 4.796 4.477
50 12.211 4.524 2.866 2.153 1.953
100 10.481 2.174 1.456 0.956 0.970
500 7.432 0.402 0.285 0.203 0.192
1000 5.960 0.196 0.137 0.096 0.108
250 1 28.276
20 6.416 3.659 2.894 1.986 1.994
50 4.719 1.718 1.182 0.831 0.840
100 3.897 0.744 0.564 0.401 0.419
500 2.653 0.161 0.114 0.078 0.081
1000 2.092 0.077 0.057 0.040 0.044
500 1 14.327
20 3.261 1.881 1.326 0.990 0.996
50 2.222 0.768 0.512 0.393 0.422
100 1.685 0.338 0.259 0.188 0.219
500 1.057 0.070 0.054 0.039 0.045
1000 0.784 0.036 0.027 0.020 0.022
1000 1 7.269
20 1.432 0.782 0.662 0.544 0.586
50 0.887 0.332 0.257 0.219 0.239
100 0.648 0.154 0.124 0.103 0.117
500 0.291 0.030 0.024 0.019 0.022
1000 0.255 0.016 0.012 0.010 0.012
Table S5: Comparing the variance of the log of the estimated likelihood for five different estimators of the likelihood: I: averaging the likelihood (0% trimmed mean); II: averaging the likelihood (5% trimmed mean); III: averaging the likelihood (10% trimmed mean); IV: averaging the likelihood (25% trimmed mean); and V: averaging the likelihood (50% trimmed mean); for the d=10 dimensional linear Gaussian state space model with T=200. The variance of the log of the estimated likelihood of a single particle filter is reported under the column "Single". The results are based on 1000 independent runs.
N S I II III IV V Single
100 1 253.715
20 63.021 37.828 28.792 21.095 19.567
50 48.295 19.969 12.351 7.951 7.221
100 43.519 9.194 6.272 3.771 3.664
500 31.729 1.882 1.310 0.830 0.725
1000 27.667 0.940 0.586 0.382 0.362
250 1 128.193
20 35.400 20.403 15.420 10.949 9.358
50 28.430 10.005 6.600 4.623 4.031
100 24.810 4.921 3.435 2.154 1.899
500 16.666 0.889 0.599 0.423 0.387
1000 15.061 0.475 0.312 0.219 0.204
500 1 77.094
20 22.465 12.918 8.891 6.612 6.269
50 17.785 6.131 3.533 2.465 2.450
100 14.073 2.767 1.861 1.276 1.271
500 10.376 0.576 0.415 0.270 0.243
1000 8.640 0.290 0.209 0.131 0.121
1000 1 50.319
20 12.620 6.955 5.598 3.803 3.663
50 9.230 3.436 2.406 1.663 1.515
100 8.070 1.645 1.144 0.759 0.738
500 5.825 0.308 0.220 0.147 0.150
1000 5.255 0.153 0.116 0.076 0.076
Table S6: Comparing the performance of different PMMH samplers with different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the linear Gaussian state space model using a simulated dataset with T=200 and d=10 dimensions. Sampler I: correlated PMMH of Deligiannidis et al., (2018). Sampler II: MPM with averaging likelihood (5% trimmed mean). Sampler III: MPM with averaging likelihood (10% trimmed mean). Sampler IV: MPM with averaging likelihood (25% trimmed mean). CT denotes the time taken in seconds for one iteration of the method.
I II III IV
N 20000 100 250 500 100 250 500 100 250 500
S 1 100 100 100 100 100 100 100 100 100
\widehat{\textrm{IF}} 42.076 NA 137.725 22.507 253.251 17.065 10.658 18.901 9.748 7.899
CT 2.44 0.32 0.71 1.21 0.32 0.71 1.21 0.32 0.71 1.21
\widehat{\textrm{TNIF}} 102.665 NA 97.785 27.234 81.040 12.116 12.896 6.048 6.921 9.558
\widehat{\textrm{RTNIF}} 1 NA 0.952 0.265 0.789 0.119 0.126 0.059 0.067 0.093

Table S6 reports the \widehat{\textrm{IF}}, \widehat{\textrm{TNIF}}, and \widehat{\textrm{RTNIF}} values for the parameter \theta in the linear Gaussian state space model with d=10 dimensions and T=200 time periods, estimated using four different MCMC samplers: (1) the correlated PMMH of Deligiannidis et al., (2018); (2) the MPM with 5% trimmed mean; (3) the MPM with 10% trimmed mean; and (4) the MPM with 25% trimmed mean. The computing time reported in the table is the time to run a single particle filter for the CPM and S particle filters for the MPM approach. The table shows that: (1) the correlated PMMH requires more than 20000 particles to improve the mixing of the MCMC chain for the parameter \theta; (2) the CPU time for running a single particle filter with N=20000 particles is 3.43 times higher than for running multiple particle filters with S=100 and N=250; the MPM method can be much faster than the CPM method if it is run using high-performance computing with a large number of cores; (3) the MPM allows a much smaller number of particles for each independent PF, and these multiple PFs can be run independently; (4) in terms of \widehat{\textrm{RTNIF}}, the 5%, 10%, and 25% trimmed means with S=100 and N=250 are 1.05, 8.40, and 14.93 times smaller, respectively, than the correlated PMMH with N=20000 particles; and (5) the best sampler for this example is the 25% trimmed mean approach with S=100 and N=100 particles. Figure S1 shows the kernel density estimates of the parameter \theta estimated using the Metropolis-Hastings algorithm with the (exact) Kalman filter method and the MPM algorithm with 5%, 10%, and 25% trimmed means. The figure shows that the approximate posteriors obtained by the various approaches using trimmed means are very close to the true posterior. Figure S1 also shows that the approximate posterior obtained using the MPM with the trimmed mean of the likelihood is very close to the exact posterior obtained using the correlated PMMH.
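To make the trimmed-mean estimator concrete, the following is a minimal Python sketch (not the authors' implementation) of combining the S log-likelihood estimates from the independent particle filters into the log of a trimmed mean of the likelihood estimates; the choice to split the trimming fraction evenly across the two tails is our assumption.

import numpy as np
from scipy.special import logsumexp

def log_trimmed_mean_likelihood(loglik, trim):
    # loglik: S log-likelihood estimates, one per independent particle filter.
    # trim: total fraction of estimates discarded (assumed split evenly across tails).
    loglik = np.sort(np.asarray(loglik))
    k = int(np.floor(trim * loglik.size / 2))  # estimates dropped from each tail
    kept = loglik[k:loglik.size - k]
    # log of the arithmetic mean of the kept likelihoods, computed stably on the log scale
    return logsumexp(kept) - np.log(kept.size)

# Example: S = 100 synthetic estimates around a log-likelihood of -550.
rng = np.random.default_rng(0)
loglik_hat = -550.0 + rng.normal(0.0, 1.0, size=100)
for trim in (0.0, 0.05, 0.10, 0.25, 0.50):  # estimators I-V
    print(trim, log_trimmed_mean_likelihood(loglik_hat, trim))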

Figure S1: Left: Kernel density estimates of the parameter \theta estimated using (1) the Metropolis-Hastings algorithm with the (exact) Kalman filter method; (2) correlated PMMH with N=20000 particles; (3) MPM with averaging the likelihood (5% trimmed mean, S=100, N=250); (4) MPM with averaging the likelihood (5% trimmed mean, S=100, N=500). Middle: the same four methods, with 10% trimmed means in (3) and (4). Right: the same four methods, with 25% trimmed means in (3) and (4). All panels are for estimating the d=10 dimensional linear Gaussian state space model with T=200.
Figure S2: The estimated correlation of the logs of the estimated likelihoods evaluated at \theta=0.4=\theta^{\prime} (top), \theta=0.4 and \theta^{\prime}=0.399 (middle), and \theta=0.4 and \theta^{\prime}=0.385 (bottom) for the 10 dimensional linear Gaussian state space model with T=300 using the five estimators: (1) MPM (blocking) with 0% trimmed mean of the likelihood, (2) MPM (blocking) with 5% trimmed mean, (3) MPM (blocking) with 10% trimmed mean, (4) MPM (blocking) with 25% trimmed mean, and (5) MPM (blocking) with 50% trimmed mean.

Figure S2 reports the correlation estimates, at the current and proposed values of \theta, of the log of the estimated likelihood obtained using the different MPM approaches. The figure shows that: (1) when the current and proposed values of the parameters are both equal to 0.4, all MPM methods maintain a high correlation between the logs of the estimated likelihoods; with S=100 particle filters the estimated correlations are about 0.99. (2) When the current parameter \theta and the proposed parameter \theta^{\prime} are (slightly) different, the MPM methods can still maintain some of the correlation between the logs of the estimated likelihoods; the MPM methods with 25% and 50% trimmed means of the likelihood are best at maintaining this correlation.
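The correlations plotted in Figure S2 can be estimated by evaluating the likelihood estimator repeatedly at the two parameter values and computing the sample correlation of the resulting logs. A minimal sketch with synthetic replications (the arrays below are placeholders, not output from the paper's particle filters):

import numpy as np

rng = np.random.default_rng(1)
R = 1000                               # independent replications
common = rng.normal(size=R)            # randomness shared across the pair of evaluations
loglik_curr = -550.0 + 0.9 * common + 0.1 * rng.normal(size=R)  # estimates at theta
loglik_prop = -551.2 + 0.9 * common + 0.1 * rng.normal(size=R)  # estimates at theta'
print(np.corrcoef(loglik_curr, loglik_prop)[0, 1])              # approximately 0.99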

S5 Additional tables and figures for the multivariate stochastic volatility in mean model example

This section gives additional tables and figures for the multivariate stochastic volatility in mean example in Section 3.2.

Figures S3 to S11 show the kernel density estimates of the parameters of the multivariate stochastic volatility in mean model described above, estimated using the MPM sampler (S=100, N=250) with a 25% trimmed mean; the vertical lines show the true parameter values. The figures show that the model parameters are estimated well.

Figure S3: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 2) with d=20 dimensions and T=100.
Figure S4: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 3) with d=20 dimensions and T=100.
Figure S5: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 4) with d=20 dimensions and T=100.
Figure S6: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 5) with d=20 dimensions and T=100.
Figure S7: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 6) with d=20 dimensions and T=100.
Figure S8: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 7) with d=20 dimensions and T=100.
Figure S9: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 8) with d=20 dimensions and T=100.
Figure S10: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 9) with d=20 dimensions and T=100.
Figure S11: Kernel density estimates of the parameters of the multivariate stochastic volatility in mean model estimated using the MPM with averaging the likelihood (25% trimmed mean) for a simulated dataset (data 10) with d=20 dimensions and T=100.

S6 Additional tables and figures for the small scale DSGE model example

This section gives additional tables and figures for the small scale DSGE model example in Section 3.4.1.

Table S7 gives the prior distributions for each model parameter for the non-linear small scale DSGE model in Section 3.4.1. Table S8 reports the inefficiency factors for each parameter and the relative time normalised inefficiency factor (RTNIF) of each PMMH sampler relative to the MPM (0% trimmed mean) with the ADPF, disturbance sorting, and \rho_{u}=0.99 for the non-linear small scale model with T=124 time periods. Section 3.4.1 discusses the results.

Table S7: Marginal prior distributions for each model parameter for the non-linear small scale DSGE model in Section 3.4.1. The Param (1) and Param (2) columns list the mean and the standard deviation for the truncated normal (TN) and normal (N) distributions, and v and s for the shape and scale parameters of the inverse gamma (IG) and gamma (G) distributions. The domain R^{+}:=(0,\infty). TN_{(0,\infty)} denotes the normal distribution truncated to the interval (0,\infty); TN_{(0,1)} is the normal distribution truncated to the interval (0,1). \sigma_{r}^{m}, \sigma_{g}^{m}, and \sigma_{m}^{m} are the measurement error variances.
Param. Domain Density Param (1) Param (2)
r^{(A)} R^{+} TN_{(0,\infty)} 0.7 0.5
\pi^{(A)} R^{+} TN_{(0,\infty)} 3.12 0.5
\gamma^{(Q)} R N 0.60 0.5
\tau R^{+} TN_{(0,\infty)} 1.50 0.5
\psi_{1} R^{+} TN_{(0,\infty)} 2.41 0.5
\psi_{2} R^{+} TN_{(0,\infty)} 0.39 0.5
\rho_{r} [0,1] TN_{(0,1)} 0.5 0.2
\rho_{g} [0,1] TN_{(0,1)} 0.5 0.2
\rho_{m} [0,1] TN_{(0,1)} 0.5 0.2
\sigma_{r} R^{+} IG 5 0.5
\sigma_{g} R^{+} IG 5 0.5
\sigma_{m} R^{+} IG 5 0.5
\sigma_{r}^{m} R^{+} G 1 1
\sigma_{g}^{m} R^{+} G 1 1
\sigma_{m}^{m} R^{+} G 1 1
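As an illustration, prior draws matching Table S7 can be generated with scipy; the mapping of the IG(v, s) and G parameters onto scipy's shape and scale arguments is our assumption rather than a convention stated in the paper:

import numpy as np
from scipy.stats import truncnorm, invgamma, gamma

rng = np.random.default_rng(2)

def tn(mean, sd, lower, upper, size):
    # scipy expresses truncation limits on the standard-normal scale
    a, b = (lower - mean) / sd, (upper - mean) / sd
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=size, random_state=rng)

n = 10_000
draws = {
    "tau": tn(1.50, 0.5, 0.0, np.inf, n),        # TN_(0,inf) prior
    "rho_r": tn(0.5, 0.2, 0.0, 1.0, n),          # TN_(0,1) prior
    "sigma_r": invgamma.rvs(a=5, scale=0.5, size=n, random_state=rng),  # IG(v, s), assumed shape/scale
    "sigma_r_m": gamma.rvs(a=1, scale=1, size=n, random_state=rng),     # G(1, 1), assumed shape/scale
}
print({name: round(float(x.mean()), 3) for name, x in draws.items()})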
Table S8: Comparing the performance of different PMMH samplers with different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the small scale DSGE model for the real dataset obtained from Herbst and Schorfheide, (2019) with T=124. Sampler I: MPM (averaging likelihood, \rho_{u}=0.99, disturbance sorting, ADPF). Sampler II: MPM (averaging likelihood, 10% trimmed mean, \rho_{u}=0.99, disturbance sorting, ADPF). Sampler III: MPM (averaging likelihood, 25% trimmed mean, \rho_{u}=0.99, disturbance sorting, ADPF). Sampler IV: MPM (averaging likelihood, 25% trimmed mean, \rho_{u}=0, disturbance sorting, ADPF). Sampler V: MPM (averaging likelihood, 25% trimmed mean, \rho_{u}=0.99, state sorting, ADPF). Sampler VI: DA-MPM (averaging likelihood, 25% trimmed mean, \rho_{u}=0.99, state sorting, ADPF). Sampler VII: MPM (averaging likelihood, 50% trimmed mean, \rho_{u}=0.99, disturbance sorting, ADPF). Sampler VIII: MPM (averaging likelihood, \rho_{u}=0.99, disturbance sorting, bootstrap). Sampler IX: correlated PMMH. Time is the time in seconds for one iteration of the method.
Param I II III IV V VI VII VIII IX
S 100 100 100 100 100 100 100 100 1
N 250 100 100 100 100 100 100 250 20000
\pi^{(A)} 1477.673 87.796 87.531 86.956 97.641 127.836 95.566 NA 184.817
\tau 689.320 57.260 59.924 60.839 92.847 108.051 77.129 NA 486.328
\psi_{1} 1074.308 84.835 83.481 82.317 112.780 157.470 95.780 NA 313.848
\psi_{2} 645.963 90.750 96.507 105.948 148.919 234.819 116.554 NA 931.106
\gamma^{(Q)} 662.909 74.881 134.589 105.032 131.219 128.182 88.105 NA 138.769
r^{(A)} 617.848 110.671 108.337 79.477 118.001 156.412 84.812 NA 1463.057
\rho_{r} 651.307 116.273 113.495 93.525 135.521 194.771 118.165 NA 386.218
\rho_{g} 1145.901 152.711 123.128 99.316 167.833 604.722 132.060 NA 1064.070
\rho_{m} 864.234 62.183 110.929 151.060 96.787 200.407 236.684 NA 1407.105
\sigma_{r} 2254.695 88.552 104.442 89.854 142.248 148.776 94.689 NA 556.250
\sigma_{g} 1728.309 149.187 91.283 114.929 140.916 212.246 148.838 NA 676.259
\sigma_{m} 1213.429 87.724 80.627 72.743 120.432 195.319 97.566 NA 213.740
\sigma_{r}^{m} 1242.351 82.957 94.455 81.547 83.689 122.662 101.957 NA 199.629
\sigma_{g}^{m} 1078.089 82.678 66.751 76.764 81.131 98.441 79.810 NA 104.288
\sigma_{m}^{m} 1487.408 85.196 69.188 97.438 87.323 170.904 110.041 NA 1377.649
\widehat{\textrm{IF}}_{\psi,\textrm{MAX}} 2254.695 152.711 134.589 151.060 167.833 604.723 236.684 NA 1463.057
\widehat{\textrm{TNIF}}_{\textrm{MAX}} 2412.524 83.991 74.024 83.083 102.378 66.520 130.176 NA 7607.896
\widehat{\textrm{RTNIF}}_{\textrm{MAX}} 1 0.035 0.031 0.034 0.042 0.028 0.054 NA 3.154
\widehat{\textrm{IF}}_{\psi,\textrm{MEAN}} 1122.250 94.244 94.978 93.183 117.153 190.735 111.852 NA 633.542
\widehat{\textrm{TNIF}}_{\textrm{MEAN}} 1200.808 51.834 52.238 51.251 71.463 20.981 61.519 NA 3294.418
\widehat{\textrm{RTNIF}}_{\textrm{MEAN}} 1 0.043 0.044 0.043 0.060 0.017 0.051 NA 2.744
Time 1.07 0.55 0.55 0.55 0.61 0.11 0.55 0.75 5.20
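For reference, the following is a minimal sketch of one common way to estimate an inefficiency factor from MCMC draws, IF = 1 + 2 \sum_{k} \rho_{k} with the sum truncated at the first negative autocorrelation; the paper's exact estimator and truncation rule may differ. TNIF then multiplies IF by the computing time per iteration, and RTNIF is TNIF relative to a baseline sampler.

import numpy as np

def inefficiency_factor(draws, max_lag=500):
    # IF = 1 + 2 * sum of autocorrelations, truncated at the first negative lag.
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = x @ x
    rho = np.array([(x[:n - k] @ x[k:]) / denom for k in range(1, max_lag + 1)])
    neg = np.where(rho < 0)[0]
    if neg.size:
        rho = rho[:neg[0]]
    return 1.0 + 2.0 * rho.sum()

# Sanity check on an AR(1) chain with coefficient 0.9; the theoretical IF is 19.
rng = np.random.default_rng(3)
chain = np.zeros(50_000)
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(inefficiency_factor(chain))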

S7 Additional tables and figures for the medium scale DSGE model example

This section gives additional tables and figures for the medium scale DSGE model example in Section 3.4.2. Table S9 gives the prior distributions for each model parameter for the non-linear medium scale DSGE model in Section 3.4.2. Table S10 gives the mean, 2.5%, and 97.5% quantile estimates of each parameter in the medium scale DSGE model estimated using the MPM sampler with a 50% trimmed mean (\rho_{u}=0.99, disturbance sorting, ADPF) for the US quarterly dataset from 1983Q1 to 2007Q4. The measurement error variances are fixed at 25% of the variance of the observables. The standard errors are relatively small for all parameters, indicating that the parameters are estimated accurately.

Table S11 gives the inefficiency factor for each parameter in the medium scale DSGE model estimated using the MPM with 50% trimmed mean (ADPF, disturbance sorting, \rho_{u}=0.9) and the correlated PMMH samplers. Table S12 gives the mean, 2.5%, and 97.5% quantile estimates of each parameter in the extended version of the medium scale DSGE model estimated using the MPM sampler with a 50% trimmed mean (\rho_{u}=0.99, disturbance sorting, ADPF) for the US quarterly dataset from 1983Q1 to 2007Q4. The measurement error variances are fixed at 25% of the variance of the observables.

Table S9: Prior distributions for each model parameter for the non-linear medium scale DSGE model. The Param. (1) and Param. (2) columns list the mean and the standard deviation for the truncated normal (TN) and normal (N) distributions, and v and s for the shape and scale parameters of the inverse gamma (IG) and gamma (G) distributions. The domain R^{+}:=(0,\infty). TN_{[0,1]} denotes the normal distribution truncated to the interval [0,1]. \sigma_{gdp}, \sigma_{con}, \sigma_{inv}, \sigma_{inf}, and \sigma_{ffr} govern the measurement error variances.
Param. Domain Density Param. (1) Param. (2)
\rho_{R} [0,1] TN_{[0,1]} 0.6 0.2
\rho_{g} [0,1] TN_{[0,1]} 0.6 0.2
\rho_{\mu} [0,1] TN_{[0,1]} 0.6 0.2
100\sigma_{g} R^{+} G 0.5 0.5
100\sigma_{\mu} R^{+} G 0.5 0.5
100\sigma_{\eta} R^{+} G 0.5 0.5
100\sigma_{Z} R^{+} G 0.5 0.5
100\sigma_{R} R^{+} G 0.5 0.5
100\sigma_{gdp} R^{+} G 0.5 0.5
100\sigma_{con} R^{+} G 0.5 0.5
100\sigma_{inv} R^{+} G 0.5 0.5
100\sigma_{inf} R^{+} G 0.5 0.5
100\sigma_{ffr} R^{+} G 0.5 0.5
\gamma_{g} R N 0.4 0.2
\gamma_{\Pi} R N 1.7 0.3
\gamma_{x} R N 0.4 0.3
\gamma [0,1] TN_{[0,1]} 0.6 0.1
\sigma_{a} R^{+} G 25 0.2
\varphi_{p}/100 R N 1 0.25
\varphi_{w}/1000 R N 3 5
\varphi_{I} R^{+} G 16 0.25
\sigma_{L} R^{+} G 5.34 0.37
1-a [0,1] TN_{[0,1]} 0.5 0.15
1-a_{w} [0,1] TN_{[0,1]} 0.5 0.15
100(\beta^{-1}-1) R^{+} G 0.625 4
100\log(G_{z}) R N 0.5 0.03
100(\overline{\Pi}-1) R N 0.62 0.1
\alpha R N 0.3 0.05
Table S10: Posterior means, 2.5%, and 97.5% quantiles of the posterior distributions of each of the parameters in the medium scale DSGE model estimated using the MPM sampler with a 50% trimmed mean (\rho_{u}=0.99, disturbance sorting, ADPF) for the US quarterly dataset from 1983Q1 to 2007Q4. The measurement error variances are fixed at 25% of the variance of the observables. The standard errors in brackets are calculated from 10 independent runs of the MPM sampler.
Param. Estimates 2.5% 97.5%
\rho_{R} 0.4329 (0.0178) 0.2097 (0.0365) 0.6490 (0.0220)
\rho_{g} 0.9883 (0.0008) 0.9770 (0.0020) 0.9965 (0.0005)
\rho_{\mu} 0.5975 (0.0191) 0.4136 (0.0309) 0.7598 (0.0410)
100\sigma_{g} 0.2188 (0.0104) 0.0973 (0.0129) 0.3992 (0.0182)
100\sigma_{\mu} 4.1832 (0.0978) 2.1935 (0.2905) 7.0727 (0.2697)
100\sigma_{\eta} 0.3614 (0.0096) 0.2706 (0.0082) 0.4837 (0.0151)
100\sigma_{Z} 0.6240 (0.0118) 0.4707 (0.0168) 0.7954 (0.0121)
100\sigma_{R} 0.0462 (0.0032) 0.0059 (0.0002) 0.1570 (0.0123)
\gamma_{g} 0.3051 (0.0162) 0.0435 (0.0130) 0.7360 (0.0541)
\gamma_{\Pi} 0.9897 (0.0039) 0.9671 (0.0146) 1.0072 (0.0016)
\gamma_{x} 0.7649 (0.0115) 0.4726 (0.0106) 1.0829 (0.0428)
\gamma 0.4606 (0.0086) 0.3506 (0.0131) 0.5727 (0.0128)
\sigma_{a} 4.9188 (0.1244) 3.4442 (0.1777) 6.7495 (0.2509)
\varphi_{p} 86.7032 (3.6050) 37.4987 (15.1263) 138.1528 (3.4207)
\varphi_{w} 2879.123 (289.8829) 1230.612 (110.8468) 5076.731 (856.0024)
\varphi_{I} 2.4537 (0.0798) 1.4166 (0.0887) 3.9636 (0.2252)
\sigma_{L} 3.8518 (0.1389) 2.0824 (0.1179) 6.1554 (0.2504)
1-a 0.4386 (0.0140) 0.2153 (0.0351) 0.7024 (0.0355)
1-a_{w} 0.6493 (0.0119) 0.3875 (0.0236) 0.8285 (0.0074)
100(\beta^{-1}-1) 0.0651 (0.0187) 0.0052 (0.0034) 0.2034 (0.0866)
100\log(G_{z}) 0.5144 (0.0035) 0.4650 (0.0039) 0.5708 (0.0042)
100(\overline{\Pi}-1) 0.6150 (0.0069) 0.4213 (0.0132) 0.7977 (0.0164)
\alpha 0.1738 (0.0032) 0.1284 (0.0064) 0.2151 (0.0034)
Table S11: Comparing the performance of different PMMH samplers with different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the medium scale DSGE model using the US quarterly dataset from 1983Q1 to 2007Q4. Sampler I: MPM (50% trimmed mean, \rho_{u}=0.9, disturbance sorting, ADPF). Sampler II: correlated PMMH. The first two columns are the inefficiency factors of each parameter when the measurement error variances are estimated; the last two columns are the inefficiency factors when the measurement error variances are fixed at 25% of the variance of each observable over the estimation period.
Param. I II I II
\rho_{R} 450.704 1886.791 318.305 277.221
\rho_{g} 523.660 653.350 205.736 292.167
\rho_{\mu} 618.928 1354.582 233.298 238.901
100\sigma_{g} 593.827 1731.764 286.744 302.512
100\sigma_{\mu} 396.348 1239.142 335.987 173.760
100\sigma_{\eta} 380.984 634.356 234.383 203.419
100\sigma_{Z} 456.971 1035.444 263.295 226.536
100\sigma_{R} 412.131 1008.377 247.562 214.989
100\sigma_{gdp} 372.244 621.601 NA NA
100\sigma_{con} 364.200 1272.367 NA NA
100\sigma_{inv} 287.053 1024.054 NA NA
100\sigma_{inf} 353.456 1395.761 NA NA
100\sigma_{ffr} 292.713 1309.101 NA NA
\gamma_{g} 279.772 1512.739 243.631 244.901
\gamma_{\Pi} 794.587 842.370 367.625 441.666
\gamma_{x} 361.702 1210.535 308.195 280.568
\gamma 365.017 1175.542 208.171 269.869
\sigma_{a} 379.348 1939.116 241.076 247.873
\varphi_{p}/100 393.882 1036.088 245.650 175.856
\varphi_{w}/1000 462.727 1222.371 341.178 568.954
\varphi_{I} 371.357 1004.829 299.040 242.744
\sigma_{L} 426.799 926.466 230.927 242.741
1-a 434.454 1566.728 251.796 237.491
1-a_{w} 359.744 839.328 255.983 229.476
100(\beta^{-1}-1) 796.824 660.065 383.310 343.446
100\log(G_{z}) 336.080 2181.566 252.953 203.629
100(\overline{\Pi}-1) 295.468 967.840 259.442 202.170
\alpha 442.753 1120.534 318.548 224.703
Table S12: Posterior means, 2.5%, and 97.5% quantiles of the posterior distributions, and the inefficiency factors of each of the parameters in the extended version of the medium scale DSGE model estimated using the MPM sampler with a 50% trimmed mean (\rho_{u}=0.99, disturbance sorting, ADPF) for the US quarterly dataset from 1983Q1 to 2007Q4. The measurement error variances are fixed at 25% of the variance of the observables.
Param. Estimates 2.5% 97.5% \widehat{\textrm{IF}}
\rho_{R} 0.522 0.204 0.759 229.752
\rho_{g} 0.608 0.241 0.929 201.565
\rho_{\mu} 0.560 0.383 0.723 170.293
100\sigma_{g} 0.052 0.006 0.157 151.086
100\sigma_{\mu} 4.085 2.494 6.047 169.246
100\sigma_{\eta} 0.350 0.262 0.468 220.044
100\sigma_{Z} 0.628 0.483 0.800 203.168
100\sigma_{R} 0.102 0.007 0.238 185.973
\gamma_{g} 0.354 -0.019 0.856 209.208
\gamma_{\Pi} 1.011 1.001 1.032 301.381
\gamma_{x} 0.878 0.506 1.314 266.684
\gamma 0.457 0.330 0.588 199.630
\sigma_{a} 5.596 3.798 7.694 167.707
\varphi_{p} 65.973 17.465 118.752 250.439
\varphi_{w} 814.476 263.520 1911.136 310.302
\varphi_{I} 2.292 1.223 3.663 193.898
\sigma_{L} 5.212 2.946 7.784 203.591
1-a 0.326 0.087 0.545 164.680
1-a_{w} 0.613 0.273 0.862 257.090
100(\beta^{-1}-1) 0.094 0.006 0.315 223.610
100\log(G_{z}) 0.518 0.468 0.568 176.914
100(\overline{\Pi}-1) 0.611 0.429 0.795 154.344
\alpha 0.181 0.137 0.221 165.125

S8 Description: Small scale DSGE model

The small scale DSGE model specification follows Herbst and Schorfheide, (2016), and is also used by Herbst and Schorfheide, (2019). The small scale DSGE model follows the environment in Herbst and Schorfheide, (2019), with nominal rigidities through price adjustment costs following Rotemberg, (1982), and three structural shocks. There are altogether 4 state variables, 4 control variables (8 variables combining the state and control variables) and 3 exogenous shocks in the model. We estimate this model using 3 observables; that means that the dimensions of y_{t}, the state vector and the disturbance vector are 3, 8, and 3, respectively.

Households

Time is discrete and households live forever. The representative household derives utility from consumption C_{t} relative to a habit stock, which is approximated by the level of technology A_{t} (this assumption ensures that the economy evolves along a balanced growth path even if the utility function is additively separable in consumption, real money balances, and hours), from real money balances \frac{M_{t}}{P_{t}}, and disutility from hours worked N_{t}. The household maximizes

E_{0}\sum_{t=0}^{\infty}\beta^{t}\left[\frac{\left(C_{t}/A_{t}\right)^{1-\tau}-1}{1-\tau}+\chi_{M}\log\frac{M_{t}}{P_{t}}-\chi_{N}N_{t}\right]

subject to the budget constraint

P_{t}C_{t}+B_{t}+M_{t}+T_{t}=P_{t}W_{t}N_{t}+R_{t-1}B_{t-1}+M_{t-1}+P_{t}D_{t}+P_{t}SC_{t};

T_{t}, D_{t}, and SC_{t} denote lump-sum taxes, aggregate residual profits, and net cash inflow from trading a full set of state contingent securities; P_{t} is the aggregate price index, and W_{t} is the real wage; \beta is the discount factor, \tau is the coefficient of relative risk aversion, and \chi_{M} and \chi_{N} are scale factors determining the steady state money balance holdings and hours. We set \chi_{N}=1. A_{t} is the level of aggregate productivity.

Firms

Final output is produced by a perfectly competitive representative firm which uses a continuum of intermediate goods Y_{t}(i) and the production function

Y_{t}=\left(\int_{0}^{1}Y_{t}(i)^{1-\nu}\,di\right)^{\frac{1}{1-\nu}},

with \nu<1. The demand for intermediate good i is

Y_{t}(i)=\left(\frac{P_{t}(i)}{P_{t}}\right)^{-\frac{1}{\nu}}Y_{t},

and the aggregate price index is

P_{t}=\left(\int_{0}^{1}P_{t}(i)^{\frac{\nu-1}{\nu}}\,di\right)^{\frac{\nu}{\nu-1}}.

Intermediate good i is produced using the linear production technology

Y_{t}(i)=A_{t}N_{t}(i);

A_{t} is an exogenous productivity process common to all firms, and N_{t}(i) is the labor input of firm i. Intermediate firms set prices P_{t}(i) and labor input N_{t}(i) by maximizing the net present value of future profits:

E_{t}\sum_{s=0}^{\infty}\beta^{s}Q_{t,t+s}\left[\frac{P_{t+s}(i)}{P_{t+s}}Y_{t+s}(i)-W_{t+s}N_{t+s}(i)-AC_{t+s}(i)\right]

with

AC_{t}(i)=\frac{\phi}{2}\left(\frac{P_{t}(i)}{P_{t-1}(i)}-\pi\right)^{2}Y_{t}(i).

The parameter \phi governs the extent of price rigidity in the economy and \pi is the steady state inflation rate associated with the final good. In equilibrium, households and firms use the same stochastic discount factor Q_{t,t+s}, where

Q_{t,t+s}=\left(\frac{C_{t+s}}{C_{t}}\right)^{-\tau}\left(\frac{A_{t}}{A_{t+s}}\right)^{1-\tau}.

In a symmetric equilibrium, all firms choose the same price.

Monetary and Fiscal Policy

The central bank conducts monetary policy following an interest rate feedback rule given by

R_{t}={R_{t}^{*}}^{1-\rho_{R}}R_{t-1}^{\rho_{R}}\epsilon_{t}^{R},

where \epsilon_{t}^{R}\sim IID(0,\sigma_{r}) is an iid shock to the nominal interest rate, R_{t}^{*} is the nominal target, and \rho_{R} captures interest rate smoothing in the conduct of policy,

R_{t}^{*}=r\pi^{*}\left(\frac{\pi_{t}}{\pi^{*}}\right)^{\psi_{1}}\left(\frac{Y_{t}}{Y_{t}^{*}}\right)^{\psi_{2}};

\pi_{t}:=P_{t}/P_{t-1} is the gross inflation rate and \pi^{*} is the target inflation rate. Y_{t}^{*} is the output in the absence of nominal rigidities. The parameters \psi_{1} and \psi_{2} capture the intensity with which the central bank responds to inflation and the output gap in the model. Government expenditure accounts for a fraction \upsilon_{t}\in[0,1] of final output such that G_{t}=\upsilon_{t}Y_{t}. The government budget constraint is

P_{t}G_{t}+R_{t-1}B_{t-1}+M_{t-1}=T_{t}+B_{t}+M_{t}.
Aggregation

Combining the household budget constraint with the government budget constraint gives

C_{t}+G_{t}+AC_{t}=Y_{t}

where, in equilibrium, AC_{t}=\frac{\phi}{2}(\pi_{t}-\pi)^{2}Y_{t}.

Exogenous Processes

Aggregate technology grows at the rate \gamma and m_{t} is the shock to aggregate demand such that

\log A_{t}=\log\gamma+\log A_{t-1}+\log m_{t}
\log m_{t}=(1-\rho_{m})\log m+\rho_{m}\log m_{t-1}+\epsilon_{t}^{m}

with \epsilon_{t}^{m}\sim IID(0,\sigma_{m}). We define g_{t}:=1/(1-\upsilon_{t}), which evolves as

\log g_{t}=(1-\rho_{g})\log g+\rho_{g}\log g_{t-1}+\epsilon_{t}^{g},

with \epsilon_{t}^{g}\sim IID(0,\sigma_{g}). We summarize the nonlinear equilibrium conditions after detrending C_{t}, Y_{t}, G_{t} by deterministic technology, i.e., defining \tilde{X}_{t}:=X_{t}/A_{t}:

1=\beta E_{t}\left[\left(\frac{\tilde{C}_{t+1}}{\tilde{C}_{t}}\right)^{-\tau}z_{t}\frac{R_{t}}{\pi_{t+1}}\right] (S10)
1=\phi(\pi_{t}-\pi)\left[\left(1-\frac{1}{2\nu}\right)\pi_{t}+\frac{\pi}{2\nu}\right]-\phi E_{t}\left[\left(\frac{\tilde{C}_{t+1}}{\tilde{C}_{t}}\right)^{-\tau}\frac{\tilde{Y}_{t+1}}{\tilde{Y}_{t}}(\pi_{t+1}-\pi)\pi_{t+1}\right]+\frac{1}{\nu}\left[1-\tilde{C}_{t}^{\tau}\right]
R_{t}={R_{t}^{*}}^{1-\rho_{R}}R_{t-1}^{\rho_{R}}\exp(\epsilon_{t}^{R})
R_{t}^{*}=r\pi^{*}\left(\frac{\pi_{t}}{\pi^{*}}\right)^{\psi_{1}}\left(\frac{\tilde{Y}_{t}}{\tilde{Y}_{t}^{*}}\right)^{\psi_{2}}
\tilde{C}_{t}+\tilde{G}_{t}+\frac{\phi}{2}(\pi_{t}-\pi)^{2}\tilde{Y}_{t}=\tilde{Y}_{t}
\log A_{t}=\log\gamma+\log A_{t-1}+\log m_{t}
\log m_{t}=(1-\rho_{m})\log m+\rho_{m}\log m_{t-1}+\epsilon_{t}^{m} (S11)
\log g_{t}=(1-\rho_{g})\log g+\rho_{g}\log g_{t-1}+\epsilon_{t}^{g}. (S12)
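The exogenous processes (S11) and (S12) are simple AR(1) recursions in logs and can be simulated directly; a minimal sketch in Python, with illustrative parameter values and the steady state \log g=\log(1/(1-\upsilon))=\log 1.25 implied by the calibration \upsilon=0.2 used below:

import numpy as np

def simulate_shocks(T, rho_m, sigma_m, rho_g, sigma_g,
                    log_m_bar=0.0, log_g_bar=np.log(1.25), seed=0):
    # AR(1) recursions (S11)-(S12), started at their unconditional means.
    rng = np.random.default_rng(seed)
    log_m = np.full(T, log_m_bar)
    log_g = np.full(T, log_g_bar)
    for t in range(1, T):
        log_m[t] = (1 - rho_m) * log_m_bar + rho_m * log_m[t - 1] + sigma_m * rng.normal()
        log_g[t] = (1 - rho_g) * log_g_bar + rho_g * log_g[t - 1] + sigma_g * rng.normal()
    return log_m, log_g

log_m, log_g = simulate_shocks(T=124, rho_m=0.5, sigma_m=0.005, rho_g=0.5, sigma_g=0.005)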

After detrending, the steady state solution of the model is \pi=\pi^{*}, \tilde{R}=\frac{\gamma\pi}{\beta}, \tilde{C}=(1-\nu)^{1/\tau}, and \tilde{Y}=g\tilde{C}=\tilde{Y}^{*}.
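For example, the expression for \tilde{C} follows by setting \pi_{t}=\pi_{t+1}=\pi in the pricing condition above, so that the price adjustment terms vanish, and the expression for \tilde{Y} follows from the resource constraint together with G_{t}=\upsilon_{t}Y_{t} and g:=1/(1-\upsilon):

1=\frac{1}{\nu}\left[1-\tilde{C}^{\tau}\right]\;\Rightarrow\;\tilde{C}=(1-\nu)^{1/\tau};\qquad\tilde{C}+\tilde{G}=\tilde{Y},\quad\tilde{G}=\left(1-\frac{1}{g}\right)\tilde{Y}\;\Rightarrow\;\tilde{Y}=g\tilde{C}.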

Equations (S10)-(S12) can now be rewritten as

E_{t}G(z_{t+1},z_{t},\epsilon_{t+1})=0,

where

z_{t}=[\tilde{C}_{t},\tilde{Y}_{t},\tilde{G}_{t},R_{t},R_{t}^{*},A_{t},m_{t},g_{t}]\quad\text{and}\quad\epsilon_{t}=[\epsilon_{t}^{R},\epsilon_{t}^{m},\epsilon_{t}^{g}],

and solved using the methods in Section S3.

We solve the model in log deviations from steady state, i.e., \widehat{x}_{t}:=\log\left(\tilde{X}_{t}/\tilde{X}\right), using Dynare. The small scale DSGE model therefore consists of a consumption Euler equation, a new Keynesian Phillips curve, a monetary policy rule, a fiscal policy rule, three exogenous shock processes, and eight endogenous latent variables. We estimate the model using both first and second order approximations to assess the performance of our proposed method relative to existing methods. As explained, the source of nonlinearity in our case stems from taking a second order approximation of the equilibrium conditions.

The observables used in estimating the model consist of the per capita GDP growth rate (YGR_{t}), the annualized quarter on quarter inflation rate (Infl_{t}), and the annualized nominal interest rate (Int_{t}). The observed data are measured in percentages and, after applying a log transformation to the endogenous variables, the measurement equations in terms of the transformed endogenous variables are given by (a hatted variable denotes the log transformation):

YGR_{t}=\gamma^{(Q)}+100(\hat{y}_{t}-\hat{y}_{t-1}+\hat{m}_{t}),
Infl_{t}=\pi^{(A)}+400\hat{\pi}_{t},
Int_{t}=\pi^{(A)}+r^{(A)}+4\gamma^{(Q)}+400\hat{R}_{t}.

Here, \hat{y}_{t}=\log\left(\tilde{Y}_{t}/\tilde{Y}\right), \hat{\pi}_{t}=\log\left(\pi_{t}/\pi\right), \hat{R}_{t}=\log\left(R_{t}/\overline{R}\right), and \hat{m}_{t}=\log\left(m_{t}/\overline{m}\right)=\log m_{t}, since \log(\overline{m})=0 in steady state, as is evident from Equation (S11).
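A minimal sketch of these measurement equations in Python; the state values below are arbitrary illustrations and the parameter values are the prior means from Table S7:

import numpy as np

def measurement(y_hat, y_hat_lag, m_hat, pi_hat, R_hat, gamma_Q, pi_A, r_A):
    # Observables in percentages, following the three equations above.
    ygr = gamma_Q + 100.0 * (y_hat - y_hat_lag + m_hat)   # YGR_t
    infl = pi_A + 400.0 * pi_hat                          # Infl_t
    intr = pi_A + r_A + 4.0 * gamma_Q + 400.0 * R_hat     # Int_t
    return np.array([ygr, infl, intr])

print(measurement(0.010, 0.005, 0.002, 0.001, 0.001, gamma_Q=0.60, pi_A=3.12, r_A=0.70))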

The model has the 15 parameters

\left\{\upsilon,\nu,\phi,\tau,\pi^{(A)},r^{(A)},\gamma^{(Q)},\psi_{1},\psi_{2},\rho_{r},\rho_{g},\rho_{m},\sigma_{r},\sigma_{g},\sigma_{m}\right\}.

We calibrate the parameters characterizing the share of fiscal expenditure in GDP, \upsilon=0.2, the elasticity of substitution across varieties, 1/\nu=11, and the parameter governing the extent of nominal rigidities, \phi=100. The remaining parameters, along with the measurement errors in the observation equation, are estimated. The steady-state inflation rate \pi in the model relates to the estimated parameter for annualized inflation \pi^{(A)} such that \pi=\pi^{(A)}/400, and the discount factor \beta relates to the estimated parameter for the annualized interest rate r^{(A)} such that \beta=\left(1+r^{(A)}/400\right)^{-1}. The sample used for estimation is 1983Q1-2002Q4. The data set and the variables are identical to those used in Herbst and Schorfheide, (2019). (Data on all three observables are sourced from FRED: the per capita GDP growth rate is calculated using real gross domestic product (FRED mnemonic 'GDPC1') and the civilian non-institutional population (FRED mnemonic 'CNP16OV' / BLS series 'LNS10000000'); annualized inflation is calculated using the CPI price level (FRED mnemonic 'CPIAUCSL'); and the nominal interest rate is the Federal Funds Rate (FRED mnemonic 'FEDFUNDS').)
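These transformations are straightforward to compute; a small Python sketch, again using the prior means from Table S7 as illustrative inputs (note that some statements of this model work with the gross rate 1+\pi^{(A)}/400 for \pi instead):

def steady_state_from_annualized(pi_A, r_A):
    pi = pi_A / 400.0                 # steady-state inflation as defined in the text
    beta = 1.0 / (1.0 + r_A / 400.0)  # quarterly discount factor
    return pi, beta

print(steady_state_from_annualized(pi_A=3.12, r_A=0.70))  # (0.0078, ~0.9983)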

S9 Description: Medium scale DSGE model

The specification of the medium-scale model is similar to Smets and Wouters, (2007) and follows the specification in Gust et al., (2017). The model allows for habit formation in consumption, nominal rigidities in wages and prices, indexation in wages and prices, and investment adjustment costs. The model environment consists of monopolistically competitive intermediate goods-producing firms, a perfectly competitive final good-producing firm, households, and the government conducting fiscal and monetary policy. We describe the behavior of each agent in detail below. The baseline model in Gust et al., (2017) consists of 12 state variables, 18 control variables (30 variables combining the state and control variables) and 5 exogenous shocks. We estimate this model using 5 observables; that means that the dimensions of y_{t}, the state vector and the disturbance vector are 5, 30, and 5, respectively. We also consider a variation of the medium scale DSGE model of Gust et al., (2017). To study how well our approach performs with increasing state dimension, we estimate the model with the canonical definition of the output gap, defined as the difference between output in the model with nominal rigidities and output in the model with flexible prices and wages. This variation is identical to the baseline model in Gust et al., (2017) except for the Taylor rule specification, which now uses the canonical definition of the output gap. This extension of the medium scale DSGE model consists of 21 state variables, 30 control variables (51 variables combining the state and control variables) and 5 exogenous shocks. We estimate this model using 5 observables; that means that the dimensions of y_{t}, the state vector and the disturbance vector are 5, 51, and 5, respectively.

S9.1 Firms

There is a continuum of monopolistically competitive firms producing differentiated intermediate goods. The final good Y_{t} is produced by a perfectly competitive representative firm that uses a continuum of differentiated intermediate goods Y_{t}(j) such that

Y_{t}=\left(\int_{0}^{1}Y_{t}(j)^{\frac{\epsilon_{p}-1}{\epsilon_{p}}}\,dj\right)^{\frac{\epsilon_{p}}{\epsilon_{p}-1}}.

Under the constant elasticity of substitution aggregator, the demand for the good produced by the jth firm is

Y_{t}(j)=\left(\frac{P_{t}(j)}{P_{t}}\right)^{-\epsilon_{p}}Y_{t}.

The production function for the jth intermediate good producer is

Y_{t}(j)=K_{t}(j)^{\alpha}(Z_{t}N_{t}(j))^{1-\alpha}.

N_{t}(j) and K_{t}(j) denote labor and capital inputs for the jth intermediate good producer, and Z_{t} denotes labor augmenting technological progress. Z_{t} evolves as

Z_{t}=Z_{t-1}G_{z}\exp(\epsilon_{z,t});\quad\epsilon_{z,t}\overset{iid}{\sim}N(0,\sigma_{z}^{2});

G_{z} is the deterministic growth rate of technology and \epsilon_{z,t} introduces deviations about this deterministic trend. Following Rotemberg, (1982), intermediate good producing firm j faces quadratic costs in adjusting its nominal price P_{t}(j), given by

\frac{\phi_{p}}{2}\left(\frac{P_{t}(j)}{\tilde{\pi}_{t-1}P_{t-1}(j)}-1\right)^{2}Y_{t}

with \phi_{p} describing the size of the adjustment cost. The change in price is indexed to \tilde{\pi}_{t-1}, with

\tilde{\pi}_{t-1}=\overline{\pi}^{a}\pi_{t-1}^{1-a}.

Here \overline{\pi} is the inflation target set by the central bank and \pi_{t-1}:=P_{t-1}/P_{t-2} is the lagged gross inflation in period t-1. The parameter a quantifies the extent of indexation to the central bank inflation target or lagged gross inflation, with a\in[0,1]. The optimal price is set by each intermediate good producing firm by maximizing the expected present discounted value of profits

E_{t}\sum_{s=0}^{\infty}\frac{\Lambda_{t+s}}{\Lambda_{t}}\left[\left(\frac{P_{t+s}(j)}{P_{t+s}}-mc_{t+s}\right)Y_{t+s}(j)-\frac{\psi_{p}}{2}\left(\frac{P_{t+s}(j)}{\tilde{\pi}_{t+s-1}P_{t+s-1}(j)}-1\right)^{2}Y_{t+s}\right];

mc_{t} is the real marginal cost, with mc_{t}:=\frac{(W_{t}/P_{t})^{1-\alpha}(R_{t}^{K}/P_{t})^{\alpha}}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}. W_{t} and R_{t}^{K} denote the nominal wage and nominal rental rate of capital services, respectively. \frac{\Lambda_{t+s}}{\Lambda_{t}} is the stochastic discount factor from the optimization exercise of households.

S9.2 Households

There exists a continuum of infinitely lived monopolistically competitive households indexed by i\in[0,1] supplying differentiated labor service N_{t}(i). Household i sells labor service N_{t}(i) to a representative employment agency which combines the differentiated labor services across the spectrum of households into a composite N_{t}, which in turn is supplied to the intermediate goods producers for production in a perfectly competitive market. Differentiated labor services are aggregated via a Dixit-Stiglitz aggregator given by

N_{t}=\left(\int_{0}^{1}N_{t}(i)^{\frac{\epsilon_{w}-1}{\epsilon_{w}}}\,di\right)^{\frac{\epsilon_{w}}{\epsilon_{w}-1}}.

The parameter \epsilon_{w} is the constant elasticity of substitution across different varieties of labor services. Households maximize lifetime utility,

E_{0}\sum_{t=0}^{\infty}\beta^{t}\left[\left(C_{t}(i)-\gamma C_{t-1}(i)\right)-\psi_{L}\frac{N_{t}(i)^{1+\sigma_{L}}}{1+\sigma_{L}}-\psi_{t}^{w}(i)\right],

where C_{t}(i) is consumption, N_{t}(i) is labor services, and \psi_{t}^{w}(i) is the loss in utility due to adjusting nominal wages W_{t}(i). The utility cost of adjusting nominal wages is quadratic:

\psi_{t}^{w}(i)=\frac{\phi_{w}}{2}\left(\frac{W_{t}(i)}{\tilde{\pi}_{t-1}^{w}W_{t-1}(i)}-1\right)^{2}.

The parameter \phi_{w} governs the size of wage adjustment costs. Wage contracts are indexed to productivity and inflation with

\tilde{\pi}_{t}^{w}=G_{z}\overline{\pi}^{a_{w}}\left(\exp(\epsilon_{z,t})\pi_{t-1}\right)^{1-a_{w}}.

The parameter \gamma governs the degree of habit formation in household utility (habits are internal to households). Households also save, by investing in risk-free nominal bonds B_{t+s}(i) and by owning the physical capital stock \overline{K}_{t}(i). Additionally, households can choose the extent of capital utilization u_{t}(i). Utilization is combined with the physical capital stock to transform physical capital into capital services, K_{t}(i)=u_{t}(i)\overline{K}_{t}(i). In the process of transforming physical capital into capital services via utilization u_{t}(i), households incur costs of utilization

a(u_{t}(i)):=\frac{r^{k}}{\sigma_{a}}\left[\exp\left(\sigma_{a}(u_{t}(i)-1)\right)-1\right].
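Two properties of this functional form are worth noting, both following from elementary differentiation:

a(1)=0,\qquad a^{\prime}(u)=r^{k}\exp\left(\sigma_{a}(u-1)\right),\qquad a^{\prime}(1)=r^{k};

so utilization costs vanish at the non-stochastic steady-state utilization rate u=1, the marginal cost of utilization there equals the steady-state rental rate r^{k}, and \sigma_{a} controls the curvature of the cost schedule.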

Households choose C_{t}(i), N_{t}(i), B_{t+1}(i), investment I_{t}(i), \overline{K}_{t}(i) and u_{t}(i) subject to the budget constraint

C_{t}(i)+I_{t}(i)+\frac{B_{t+1}(i)/P_{t}}{R_{t}\eta_{t}}+a(u_{t}(i))\overline{K}_{t}(i)=\frac{W_{t}(i)}{P_{t}}N_{t}(i)+\frac{R_{t}^{K}}{P_{t}}K_{t-1}(i)+\frac{B_{t}(i)}{P_{t}}+\frac{D_{t}}{P_{t}}-T_{t},\quad\text{for any }t.

The law of motion of capital is

\overline{K}_{t}(i)=(1-\delta)\overline{K}_{t-1}(i)+\mu_{t}(1-S_{t}(i))I_{t}(i).

Accumulation of capital is subject to the investment adjustment costs

S_{t}(i)=\frac{\psi_{I}}{2}\left[\frac{I_{t}(i)}{G_{Z}I_{t-1}(i)}-1\right]^{2}.

Consistent with Smets and Wouters, (2007), \eta_{t} in the budget constraint and \mu_{t} in the law of motion of capital represent, respectively, shocks to the demand for risk-free assets and an exogenous disturbance to the marginal efficiency of transforming final goods in t to physical capital in t+1, with

\ln(\eta_{t})=\rho_{\eta}\ln(\eta_{t-1})+\epsilon_{\eta,t},\quad\epsilon_{\eta,t}\overset{iid}{\sim}N(0,\sigma_{\eta}^{2})

and

\ln(\mu_{t})=\rho_{\mu}\ln(\mu_{t-1})+\epsilon_{\mu,t},\quad\epsilon_{\mu,t}\overset{iid}{\sim}N(0,\sigma_{\mu}^{2}).

S9.3 Monetary and fiscal policy

The central bank sets the nominal interest rate R_{t} according to a Taylor rule with

R_{t}=\left(R_{t-1}\right)^{\rho_{R}}\left[R\left(\frac{\pi_{t}}{\overline{\pi}}\right)^{\gamma_{\pi}}x_{g,t}^{\gamma_{x}}\left(\frac{Y_{t}}{G_{z}Y_{t-1}}\right)^{\gamma_{g}}\right]^{1-\rho_{R}}\epsilon_{R,t}.

The parameter \rho_{R}\in(0,1) governs the extent of interest rate smoothing; R denotes the interest rate in the non-stochastic steady state, with R=\beta^{-1}G_{z}\overline{\pi}; \overline{\pi} is the inflation target set by the central bank; x_{g,t} in the Taylor rule is a proxy quantifying the output gap in the economy with nominal rigidities relative to the flexible price benchmark. Consistent with Gust et al., (2017), we deviate from Smets and Wouters, (2007) and specify the output gap x_{g,t} as

x_{g,t}=u_{t}^{\alpha}\left(\frac{N_{t}}{N}\right)^{1-\alpha},

where u_{t} is the aggregate level of capital utilization (with a non-stochastic steady state of 1) and N is the non-stochastic steady state level of aggregate labor. We estimate both versions of the model: one where the output gap is proxied by x_{g,t}, and one using the canonical definition of the output gap, characterized as deviations of output in the model with nominal rigidities from output in the model without nominal rigidities, with the Taylor rule being

R_{t}=\left(R_{t-1}\right)^{\rho_{R}}\left[R\left(\frac{\pi_{t}}{\overline{\pi}}\right)^{\gamma_{\pi}}(x_{t})^{\gamma_{x}}\left(\frac{Y_{t}}{G_{z}Y_{t-1}}\right)^{\gamma_{g}}\right]^{1-\rho_{R}}\epsilon_{R,t},

with x_{t}=Y_{t}/Y_{t}^{N}, where Y_{t}^{N} is the output in the flexible price model. The parameters \gamma_{\pi}\geq 0, \gamma_{g}\geq 0 and \gamma_{x}\geq 0 quantify the extent of the central bank's reaction to inflation, the output gap, and output growth; \epsilon_{R,t} is the exogenous disturbance to monetary policy, with \epsilon_{R,t}\overset{iid}{\sim}N(0,\sigma_{R}^{2}). Government spending evolves exogenously as a time-varying function of aggregate output Y_{t}, with

G_{t}=\left(1-\frac{1}{g_{t}}\right)Y_{t},

such that

\ln(g_{t})=(1-\rho_{g})\ln(g)+\rho_{g}\ln(g_{t-1})+\epsilon_{g,t};\quad\epsilon_{g,t}\overset{iid}{\sim}N(0,\sigma_{g}^{2}).

The government budget constraint satisfies G_{t}=T_{t} in each period.

S9.4 Market clearing

In a symmetric equilibrium, all intermediate goods firms choose the same price, P_{t}(i)=P_{t} for all i, and all households choose the same wage, W_{t}(j)=W_{t} for all j. The aggregate production function is

Y_{t}=K_{t}^{\alpha}[Z_{t}N_{t}]^{1-\alpha}.

Market clearing implies

Y_{t}=C_{t}+I_{t}+G_{t}+\frac{\phi_{p}}{2}\left(\frac{P_{t}}{\tilde{\pi}_{t-1}P_{t-1}}-1\right)^{2}Y_{t}+a(u_{t})\overline{K}_{t}.

S9.5 First-order conditions

The detailed first order conditions are summarized below:

\left[\frac{\pi_{t}}{\tilde{\pi}_{t-1}}-1\right]\frac{\pi_{t}}{\tilde{\pi}_{t-1}}=\beta E_{t}\left[\frac{\Lambda_{t+1}}{\Lambda_{t}}\right]\left[\frac{\pi_{t+1}}{\tilde{\pi}_{t}}-1\right]\frac{\pi_{t+1}}{\tilde{\pi}_{t}}\frac{Y_{t+1}}{Y_{t}}+\frac{\epsilon_{p}}{\psi_{p}}\left[mc_{t}-\frac{\epsilon_{p}-1}{\epsilon_{p}}\right] (S13)
(1-\alpha)mc_{t}=\frac{W_{t}N_{t}}{P_{t}Y_{t}} (S14)
P_{t}r_{t}^{K}=\frac{\alpha}{1-\alpha}\frac{W_{t}N_{t}}{u_{t}\overline{K}_{t}} (S15)
\tilde{\pi}_{t-1}=\overline{\pi}^{a}\pi_{t-1}^{1-a} (S16)
Y_{t}=(u_{t}\overline{K}_{t})^{\alpha}\left(Z_{t}N_{t}\right)^{1-\alpha} (S17)
\Lambda_{t}=\left[C_{t}-\gamma C_{t-1}\right]^{-1}-\gamma\beta E_{t}\left[C_{t+1}-\gamma C_{t}\right]^{-1} (S18)
1=\beta R_{t}\eta_{t}E_{t}\frac{\Lambda_{t+1}}{\Lambda_{t}}/\pi_{t+1} (S19)
\left[\frac{\pi_{w,t}}{\tilde{\pi}_{w,t}}-1\right]\frac{\pi_{w,t}}{\tilde{\pi}_{w,t}}=\beta E_{t}\left[\frac{\Lambda_{t+1}}{\Lambda_{t}}\right]\left[\frac{\pi_{w,t+1}}{\tilde{\pi}_{w,t+1}}-1\right]\frac{\pi_{w,t+1}}{\tilde{\pi}_{w,t+1}}+N_{t}\Lambda_{t}\frac{\epsilon_{w}}{\psi_{w}}\left[\psi_{L}\frac{N_{t}^{\sigma_{L}}}{\Lambda_{t}}-\frac{\epsilon_{w}-1}{\epsilon_{w}}\frac{W_{t}}{P_{t}}\right] (S20)
\pi_{w,t}=\frac{W_{t}}{W_{t-1}} (S21)
\tilde{\pi}_{w,t}=G_{z}\overline{\pi}^{a_{w}}\left(\exp(\epsilon_{z,t})\pi_{t-1}\right)^{1-a_{w}} (S22)
q_{t}=\beta E_{t}\left[\frac{\Lambda_{t+1}}{\Lambda_{t}}\left(r_{t+1}^{K}-a(u_{t+1})+(1-\delta)q_{t+1}\right)\right] (S23)
1=q_{t}\mu_{t}\left[1-\frac{\psi_{I}}{2}\left(\frac{I_{t}}{G_{z}I_{t-1}}-1\right)^{2}-\psi_{I}\left(\frac{I_{t}}{G_{z}I_{t-1}}-1\right)\frac{I_{t}}{G_{z}I_{t-1}}\right]+\beta\psi_{I}E_{t}\left[q_{t+1}\frac{\Lambda_{t+1}}{\Lambda_{t}}\mu_{t+1}\left(\frac{I_{t+1}}{G_{z}I_{t}}-1\right)\left(\frac{I_{t+1}}{G_{z}I_{t}}\right)^{2}\right] (S24)
r_{t}^{K}=r^{k}\exp\left(\sigma_{a}(u_{t}-1)\right) (S25)
a(u_{t})=\frac{r^{k}}{\sigma_{a}}\left(\exp\left(\sigma_{a}(u_{t}-1)\right)-1\right) (S26)
\overline{K}_{t+1}=(1-\delta)\overline{K}_{t}+\mu_{t}\left(1-\frac{\psi_{I}}{2}\left[\frac{I_{t}}{G_{Z}I_{t-1}}-1\right]^{2}\right)I_{t} (S27)
R_{t}=\left(R_{t-1}\right)^{\rho_{R}}\left[R\left(\frac{\pi_{t}}{\overline{\pi}}\right)^{\gamma_{\pi}}(x_{g,t})^{\gamma_{x}}\left(\frac{Y_{t}}{G_{z}Y_{t-1}}\right)^{\gamma_{g}}\right]^{1-\rho_{R}}\epsilon_{R,t}. (S28)

When solving the model using the canonical definition of the output gap, the Taylor rule is specified as follows, where Y_{t}^{N} is output in the model with flexible prices and wages.

R_{t}=\big{(}R_{t-1}\big{)}^{\rho_{R}}\Big{[}R\Big{(}\frac{\pi_{t}}{\overline{\pi}}\Big{)}^{\gamma_{\pi}}(\frac{Y_{t}}{Y_{t}^{N}})^{\gamma_{x}}\Big{(}\frac{Y_{t}}{G_{z}Y_{t-1}}\Big{)}^{\gamma_{g}}\Big{]}^{1-\rho_{R}}{\epsilon_{R,t}} (S29)
Y_{t}=C_{t}+I_{t}+G_{t}+\frac{\phi_{p}}{2}\Bigg{(}\frac{P_{t}}{\tilde{\pi}_{t-1}P_{t-1}}-1\Bigg{)}^{2}{Y_{t}}+a(u_{t})\overline{K}_{t} (S30)
x_{g,t}=u_{t}^{\alpha}\Big{(}\frac{N_{t}}{N}\Big{)}^{1-\alpha} (S31)
\ln(\eta_{t})=\rho_{\eta}\ln(\eta_{t-1})+\epsilon_{\eta,t};\text{ }\epsilon_{\eta,t}\overset{iid}{\sim}N(0,\sigma_{\eta}^{2}) (S32)
\ln(\mu_{t})=\rho_{\mu}\ln(\mu_{t-1})+\epsilon_{\mu,t};\text{ }\epsilon_{\mu,t}\overset{iid}{\sim}N(0,{\sigma_{\mu}^{2}}) (S33)
\ln(g_{t})=(1-\rho_{g})\ln(g)+\rho_{g}\ln(g_{t-1})+\epsilon_{g,t};\text{ }{\epsilon_{g,t}}\overset{iid}{\sim}N(0,\sigma_{g}^{2}) (S34)
Z_{t}=Z_{t-1}G_{z}\exp(\epsilon_{z,t});\text{ }\epsilon_{z,t}\overset{iid}{\sim}N(0,\sigma_{z}^{2}) (S35)

S9.6 De-trended equilibrium conditions

Labor-augmenting technological progress Z_{t} evolves as

Z_{t}=Z_{t-1}G_{z}\exp(\epsilon_{z,t});\text{ }\epsilon_{z,t}\overset{iid}{\sim}N(0,\sigma_{z}^{2}).

The deterministic component G_{z} of the labor-augmenting technological process is the source of deterministic growth in the model. We summarize the de-trended stationary equilibrium conditions below, where the lower-case variables y_{t}:=\frac{Y_{t}}{Z_{t}}, c_{t}:=\frac{C_{t}}{Z_{t}}, i_{t}:=\frac{I_{t}}{Z_{t}}, \overline{k}_{t}:=\frac{\overline{K}_{t}}{Z_{t-1}}, w_{t}:=\frac{W_{t}}{P_{t}Z_{t}}, \tilde{g}_{t}:=(1-\frac{1}{g_{t}})\frac{Y_{t}}{Z_{t}}, and \lambda_{t}:=Z_{t}\Lambda_{t} denote the stationary transformations, and G_{z,t}:=\frac{Z_{t}}{Z_{t-1}}=G_{z}\exp(\epsilon_{z,t}) denotes the gross growth rate of technology.
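As an illustration of the de-trending, consider the bond Euler equation (S19). Substituting \Lambda_{t}=\lambda_{t}/Z_{t} and \Lambda_{t+1}=\lambda_{t+1}/Z_{t+1}, and using Z_{t+1}/Z_{t}=G_{z,t+1}, gives

1=\beta R_{t}\eta_{t}E_{t}\frac{\Lambda_{t+1}}{\Lambda_{t}\pi_{t+1}}=\beta R_{t}\eta_{t}E_{t}\frac{\lambda_{t+1}}{\lambda_{t}G_{z,t+1}\pi_{t+1}},

which is Equation S42; the remaining conditions below follow from the same substitutions.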

[\frac{\pi_{t}}{\tilde{\pi}_{t-1}}-1]\frac{\pi_{t}}{\tilde{\pi}_{t-1}}=\beta E_{t}\Big{[}\frac{\lambda_{t+1}}{\lambda_{t}}\Big{]}[\frac{\pi_{t+1}}{\tilde{\pi}_{t}}-1]\frac{\pi_{t+1}}{\tilde{\pi}_{t}}\frac{y_{t+1}}{y_{t}}+\frac{\epsilon_{p}}{\psi_{p}}[mc_{t}-\frac{\epsilon_{p}-1}{\epsilon_{p}}] (S36)
(1-\alpha)mc_{t}=\frac{w_{t}N_{t}}{y_{t}} (S37)
r_{t}^{K}=\frac{\alpha}{1-\alpha}\frac{G_{z,t}w_{t}N_{t}}{u_{t}\overline{k}_{t}} (S38)
\tilde{\pi}_{t-1}=\overline{\pi}^{a}{{\pi}_{t-1}^{1-a}} (S39)
y_{t}={G_{z,t}}^{-\alpha}(u_{t}\overline{k}_{t})^{\alpha}{N_{t}}^{1-\alpha} (S40)
\lambda_{t}=\big{[}c_{t}-\gamma c_{t-1}\big{]}^{-1}-\gamma\beta E_{t}\big{[}c_{t+1}-\gamma c_{t}\big{]}^{-1} (S41)
1=\beta R_{t}\eta_{t}E_{t}\frac{\lambda_{t+1}}{\lambda_{t}}/\big{(}{G_{z,t+1}}\pi_{t+1}\big{)} (S42)
[\frac{\pi_{w,t}}{\tilde{\pi}_{w,t}}-1]\frac{\pi_{w,t}}{\tilde{\pi}_{w,t}}=\beta E_{t}\Big{[}\frac{\lambda_{t+1}}{\lambda_{t}}\Big{]}[\frac{\pi_{w,t+1}}{\tilde{\pi}_{w,t+1}}-1]\frac{\pi_{w,t+1}}{\tilde{\pi}_{w,t+1}}+N_{t}\lambda_{t}\frac{\epsilon_{w}}{\psi_{w}}\Big{[}\psi_{L}\frac{N_{t}^{\sigma_{L}}}{\lambda_{t}}-\frac{\epsilon_{w}-1}{\epsilon_{w}}w_{t}\Big{]} (S43)
\pi_{w,t}=\frac{w_{t}}{w_{t-1}}{G_{z,t}}{\pi}_{t} (S44)
\tilde{\pi}_{w,t}=G_{z}\overline{\pi}^{a_{w}}\big{(}\exp(\epsilon_{z,t})\pi_{t-1}\big{)}^{1-a_{w}} (S45)
q_{t}=\beta E_{t}\Big{[}\frac{\lambda_{t+1}}{{G_{z,t+1}}\lambda_{t}}\big{(}r_{t+1}^{K}-a(u_{t+1})+(1-\delta)q_{t+1}\big{)}\Big{]} (S46)
1=q_{t}\mu_{t}\Big{[}1-\frac{\psi_{I}}{2}\big{(}\frac{i_{t}\exp(\epsilon_{z,t})}{i_{t-1}}-1\big{)}^{2}-{\psi_{I}}\big{(}\frac{i_{t}\exp(\epsilon_{z,t})}{i_{t-1}}-1\big{)}\frac{i_{t}\exp(\epsilon_{z,t})}{i_{t-1}}\Big{]}\\ +\beta\psi_{I}E_{t}\Big{[}q_{t+1}\frac{\lambda_{t+1}}{\lambda_{t}}\mu_{t+1}\big{(}\frac{i_{t+1}\exp(\epsilon_{z,t+1})}{i_{t}}-1\big{)}\big{(}\frac{i_{t+1}}{i_{t}}\big{)}^{2}\exp(\epsilon_{z,t+1})\Big{]} (S47)
r_{t}^{K}=r^{k}{\exp(\sigma_{a}(u_{t}-1))} (S48)
a(u_{t})=\frac{r^{k}}{\sigma_{a}}\Big{(}\exp(\sigma_{a}(u_{t}-1))-1\Big{)} (S49)
\overline{k}_{t+1}=(1-\delta){G_{z,t}^{-1}}\overline{k}_{t}+\mu_{t}(1-\frac{\psi_{I}}{2}\Big{[}\frac{i_{t}\exp(\epsilon_{z,t})}{i_{t-1}}-1\Big{]}^{2})i_{t} (S50)
R_{t}=\big{(}R_{t-1}\big{)}^{\rho_{R}}\Big{[}R\Big{(}\frac{\pi_{t}}{\overline{\pi}}\Big{)}^{\gamma_{\pi}}(x_{g,t})^{\gamma_{x}}\Big{(}\frac{y_{t}\exp(\epsilon_{z,t})}{y_{t-1}}\Big{)}^{\gamma_{g}}\Big{]}^{1-\rho_{R}}{\epsilon_{R,t}} (S51)
y_{t}=c_{t}+i_{t}+\tilde{g}_{t}+\frac{\phi_{p}}{2}\Bigg{(}\frac{P_{t}}{\tilde{\pi}_{t-1}P_{t-1}}-1\Bigg{)}^{2}{y_{t}}+a(u_{t}){G_{z,t}^{-1}}\overline{k}_{t} (S52)
x_{g,t}=u_{t}^{\alpha}\Big{(}\frac{N_{t}}{N}\Big{)}^{1-\alpha} (S53)
\ln(\eta_{t})=\rho_{\eta}\ln(\eta_{t-1})+\epsilon_{\eta,t};\text{ }\epsilon_{\eta,t}\overset{iid}{\sim}N(0,\sigma_{\eta}^{2}) (S54)
\ln(\mu_{t})=\rho_{\mu}\ln(\mu_{t-1})+\epsilon_{\mu,t};\text{ }\epsilon_{\mu,t}\overset{iid}{\sim}N(0,{\sigma_{\mu}^{2}}) (S55)
\ln(g_{t})=(1-\rho_{g})\ln(g)+\rho_{g}\ln(g_{t-1})+\epsilon_{g,t};\text{ }{\epsilon_{g,t}}\overset{iid}{\sim}N(0,\sigma_{g}^{2}). (S56)

Equations S36-S56 summarize the de-trended equilibrium conditions of the model, which we solve by taking a second-order approximation. To show that the solution algorithm can handle a higher-dimensional state space, we also solve a version of the model that keeps all the conditions unchanged except for the Taylor rule: Equation S51, which approximates the output gap by the proxy in Equation S53, is replaced by

R_{t}=\big{(}R_{t-1}\big{)}^{\rho_{R}}\Big{[}R\Big{(}\frac{\pi_{t}}{\overline{\pi}}\Big{)}^{\gamma_{\pi}}(\frac{y_{t}}{y_{t}^{N}})^{\gamma_{x}}\Big{(}\frac{y_{t}\exp(\epsilon_{z,t})}{y_{t-1}}\Big{)}^{\gamma_{g}}\Big{]}^{1-\rho_{R}}{\epsilon_{R,t}}. (S57)

The specification of the Taylor rule in Equation S57 follows the canonical definition of the output gap, defined as the deviation of output in the model with nominal rigidities (y_{t}) from output in the model with flexible prices and wages (y_{t}^{N}). The number of state variables in the DSGE model increases from 12 to 21 under this canonical definition, which is the one used in Smets and Wouters, (2007).
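A second-order accurate solution makes the state transition quadratic in the lagged states, so the transition density generally has no closed form; this is the setting in which the auxiliary disturbance particle filter of the main text is used. The sketch below shows a generic pruned second-order recursion in the style of Andreasen et al., (2018); it is a minimal illustration in which the matrices h_x, eta, H_xx and the constant h_ss are placeholders standing in for the solver output, not the solved model.

```python
import numpy as np

def pruned_step(xf, xs, eps, h_x, eta, H_xx, h_ss):
    """One step of a pruned second-order state transition, in the style of
    Andreasen et al., (2018). All matrices are placeholder solver output.

    xf  : first-order component of the state vector
    xs  : second-order component of the state vector
    eps : structural disturbances for period t
    """
    # First-order recursion: x^f_t = h_x x^f_{t-1} + eta eps_t.
    xf_new = h_x @ xf + eta @ eps
    # Pruned second-order recursion: the quadratic term uses only the
    # first-order component,
    # x^s_t = h_x x^s_{t-1} + 0.5 H_xx (x^f_{t-1} kron x^f_{t-1}) + 0.5 h_ss.
    xs_new = h_x @ xs + 0.5 * H_xx @ np.kron(xf, xf) + 0.5 * h_ss
    # The state used in the measurement equation is the sum of the two parts.
    return xf_new, xs_new, xf_new + xs_new

# Illustrative dimensions only: two states, one shock (placeholder values).
rng = np.random.default_rng(1)
h_x = np.array([[0.9, 0.0], [0.1, 0.8]])
eta = np.array([[1.0], [0.5]])
H_xx = 0.05 * rng.standard_normal((2, 4))
h_ss = 0.01 * np.ones(2)

xf, xs = np.zeros(2), np.zeros(2)
for t in range(5):
    xf, xs, x = pruned_step(xf, xs, rng.standard_normal(1), h_x, eta, H_xx, h_ss)
```

Because the state is driven by the disturbances eps rather than by an invertible transition density, a disturbance formulation of the particle filter can operate on eps directly, which is convenient when the implied state transition density is intractable.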

References

  • Andreasen, (2013) Andreasen, M. M. (2013). Non-linear DSGE models and the central difference Kalman filter. Journal of Applied Econometrics, 28(6):929–955.
  • Andreasen et al., (2018) Andreasen, M. M., Fernández-Villaverde, J., and Rubio-Ramírez, J. F. (2018). The pruned state-space system for non-linear DSGE models: Theory and empirical applications. Review of Economic Studies, 85(1):1–49.
  • Andrieu et al., (2010) Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 72(3):269–342.
  • Andrieu and Roberts, (2009) Andrieu, C. and Roberts, G. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2):697–725.
  • Carriero et al., (2018) Carriero, A., Clark, T. E., and Marcellino, M. (2018). Measuring uncertainty and its impact on the economy. Review of Economics and Statistics, 100(5):799–815.
  • Choppala et al., (2016) Choppala, P., Gunawan, D., Chen, J., Tran, M. N., and Kohn, R. (2016). Bayesian inference for state-space models using block and correlated pseudo marginal methods. arXiv:1612.07072v1.
  • Christen and Fox, (2005) Christen, J. A. and Fox, C. (2005). Markov chain Monte Carlo using an approximation. Journal of Computational and Graphical Statistics, 14(4):795–810.
  • Christiano et al., (2005) Christiano, L. J., Eichenbaum, M., and Evans, C. L. (2005). Nominal rigidities and the dynamic effects of a shock to monetary policy. Journal of Political Economy, 113(1):1–45.
  • Cross et al., (2021) Cross, J. L., Hou, C., Koop, G., and Poon, A. (2021). Macroeconomic forecasting with large stochastic volatility in mean VARs. BI Norwegian Business School, https://www.oru.se/contentassets/b70754c1297e4b65993aa6115c0f5d0c/cross—macroeconomic-forecasting-with-large-stochastic-volatility-in-mean-vars.pdf.
  • Deligiannidis et al., (2018) Deligiannidis, G., Doucet, A., and Pitt, M. (2018). The correlated pseudo-marginal method. Journal of the Royal Statistical Society, Series B, 80(5):839–870.
  • Fernández-Villaverde and Rubio-Ramírez, (2007) Fernández-Villaverde, J. and Rubio-Ramírez, J. F. (2007). Estimating macroeconomic models: A likelihood approach. Review of Economic Studies, 74(4):1059–1087.
  • Garthwaite et al., (2015) Garthwaite, P. H., Fan, Y., and Sisson, S. A. (2015). Adaptive optimal scaling of Metropolis-Hastings algorithms using the Robbins-Monro process. Communications in Statistics - Theory and Methods, 45(17):5098–5111.
  • Gordon et al., (1993) Gordon, N., Salmond, D., and Smith, A. (1993). A novel approach to nonlinear and non-Gaussian Bayesian state estimation. IEE-Proceedings F, 140(2):107–113.
  • Guarniero et al., (2017) Guarniero, P., Johansen, A. M., and Lee, A. (2017). The iterated auxiliary particle filter. Journal of the American Statistical Association, 112(520):1636–1647.
  • Gust et al., (2017) Gust, C., Herbst, E., López-Salido, D., and Smith, M. E. (2017). The empirical implications of the interest-rate lower bound. American Economic Review, 107(7):1971–2006.
  • Hall et al., (2014) Hall, J., Pitt, M. K., and Kohn, R. (2014). Bayesian inference for nonlinear structural time series models. Journal of Econometrics, 179(2):99–111.
  • Herbst and Schorfheide, (2016) Herbst, E. and Schorfheide, F. (2016). Bayesian estimation of DSGE models (1st ed.). Princeton University Press.
  • Herbst and Schorfheide, (2019) Herbst, E. and Schorfheide, F. (2019). Tempered particle filtering. Journal of Econometrics, 210(1):26–44.
  • Hesterberg, (1995) Hesterberg, T. (1995). Weighted average importance sampling and defensive mixture distributions. Technometrics, 37(2):185–194.
  • Johnson et al., (2013) Johnson, A. A., Jones, G. L., and Neath, R. C. (2013). Component-wise Markov chain Monte Carlo: Uniform and geometric ergodicity under mixing and composition. Statistical Science, 28(3):360–375.
  • Kim et al., (2008) Kim, J., Kim, S., Schaumburg, E., and Sims, C. A. (2008). Calculating and using second-order accurate solutions of discrete time dynamic equilibrium models. Journal of Economic Dynamics and Control, 32(11):3397–3414.
  • Kitagawa, (1996) Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1–25.
  • Kollmann, (2015) Kollmann, R. (2015). Tractable latent state filtering for non-linear DSGE models using a second-order approximation and pruning. Computational Economics, 45:239–260.
  • Murray et al., (2013) Murray, L. M., Jones, E. M., and Parslow, J. (2013). On disturbance state-space models and the particle marginal Metropolis-Hastings sampler. SIAM/ASA Journal on Uncertainty Quantification, 1(1):494–521.
  • Norgaard et al., (2000) Norgaard, M., Poulsen, N. K., and Ravn, O. (2000). New developments in state estimation for nonlinear systems. Automatica, 36(11):1627–1638.
  • Pitt et al., (2012) Pitt, M. K., Silva, R. S., Giordani, P., and Kohn, R. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Journal of Econometrics, 171(2):134–151.
  • Plummer et al., (2006) Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis of MCMC. R News, 6(1):7–11.
  • Roberts and Rosenthal, (2009) Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349–367.
  • Rotemberg, (1982) Rotemberg, J. J. (1982). Sticky prices in the United States. Journal of Political Economy, 90(6):1187–1211.
  • Shephard et al., (2004) Shephard, N., Chib, S., Pitt, M. K., et al. (2004). Likelihood-based inference for diffusion-driven models. Technical report, https://www.nuff.ox.ac.uk/economics/papers/2004/w20/chibpittshephard.pdf.
  • Sherlock et al., (2017a) Sherlock, C., Golightly, A., and Henderson, D. A. (2017a). Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods. Journal of Computational and Graphical Statistics, 26(2):434–444.
  • Sherlock et al., (2015) Sherlock, C., Thiery, A., Roberts, G., and Rosenthal, J. (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. The Annals of Statistics, 43(1):238–275.
  • Sherlock et al., (2017b) Sherlock, C., Thiery, A. H., and Lee, A. (2017b). Pseudo-marginal Metropolis-Hastings sampling using averages of unbiased estimators. Biometrika, 104(3):727–734.
  • Skilling, (2004) Skilling, J. (2004). Programming the Hilbert curve. AIP Conference Proceedings, 707(1):381–387.
  • Smets and Wouters, (2007) Smets, F. and Wouters, R. (2007). Shocks and frictions in US business cycles: A Bayesian DSGE approach. The American Economic Review, 97(3):586–606.
  • Stramer and Bognar, (2011) Stramer, O. and Bognar, M. (2011). Bayesian inference for irreducible diffusion processes using the pseudo-marginal approach. Bayesian Analysis, 6(2):231–258.
  • Tran et al., (2016) Tran, M. N., Kohn, R., Quiroz, M., and Villani, M. (2016). Block-wise pseudo-marginal Metropolis-Hastings. arXiv:1603.02485v2.