
High-dimensional Asymptotic Theory of Bayesian Multiple Testing Procedures Under General Dependent Setup and Possible Misspecification

Noirrit Kiran Chandra and Sourabh Bhattacharya Noirrit Kiran Chandra is a postdoctoral researcher at Department of Statistical Science, Duke University, USA, and Sourabh Bhattacharya is an Associate Professor in Interdisciplinary Statistical Research Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108. Corresponding e-mail: noirritchandra@gmail.com.
Abstract

In this article, we investigate the asymptotic properties of Bayesian multiple testing procedures under a general dependent setup, when the sample size and the number of hypotheses both tend to infinity. Specifically, we investigate strong consistency of the procedures and asymptotic properties of different versions of false discovery and false non-discovery rates under the high-dimensional setup. We particularly focus on a novel Bayesian non-marginal multiple testing procedure and its associated error rates in this regard. Our results show that the asymptotic convergence rates of the error rates are directly associated with the Kullback-Leibler divergence from the true model, and that the results hold even when the postulated class of models is misspecified.

For illustration of our high-dimensional asymptotic theory, we consider a Bayesian variable selection problem in a time-varying covariate selection framework, with autoregressive response variables. We particularly focus on the setup where the number of hypotheses increases faster than the sample size, the so-called ultra-high dimensional situation.
MSC 2010 subject classifications: Primary 62F05, 62F15; secondary 62C10, 62J07.
Keywords: Bayesian multiple testing, Dependence, False discovery rate, Kullback-Leibler, Posterior convergence, Ultra high dimension.

1 Introduction

The area of multiple hypothesis testing has gained much importance and popularity, particularly in this era of big data, where often a very large number of hypotheses need to be tested simultaneously. Applications abound in the fields of statistical genetics, spatio-temporal statistics and brain imaging, to name a few. On the theoretical side, it is important to establish the validity of a multiple testing procedure in the sense that the procedure controls the false discovery rate ($FDR$) at some pre-specified level, or attains the oracle, as the number of tests grows to infinity.

Although there is considerable literature addressing these issues, the important factor of dependence among the tests seems to have received less attention. Indeed, realistically, the test statistics or the parameters cannot be expected to be independent. In this regard, Chandra and Bhattacharya (2019) introduced a novel Bayesian multiple testing procedure that coherently accounts for such dependence and yields joint decision rules that are functions of appropriate joint posterior probabilities. As demonstrated in Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020), the new Bayesian method significantly outperforms existing popular multiple testing methods by proper utilization of the dependence structures. Since in the new method the decisions are taken jointly, the method is referred to as the Bayesian non-marginal multiple testing procedure.

Chandra and Bhattacharya (2020) investigated in detail the asymptotic theory of the non-marginal procedure, and indeed of general Bayesian multiple testing methods under additive loss, for a fixed number of hypotheses, when the sample size tends to infinity. In particular, they provided sufficient conditions for strong consistency of such procedures and also showed that the asymptotic convergence rates of the versions of $FDR$ and the false non-discovery rate ($FNR$) are directly related to the Kullback-Leibler (KL) divergence from the true model. Interestingly, their results continue to hold even under misspecification, that is, if the class of postulated models does not include the true model. In this work, we investigate the asymptotic properties of the Bayesian non-marginal procedure in particular, and of Bayesian multiple testing methods under additive loss in general, when the sample size, as well as the number of hypotheses, tend to infinity.

As mentioned earlier, asymptotic works in multiple testing when the number of hypotheses grows to infinity are not rare. Xie et al. (2011) proposed an asymptotically optimal decision rule for short-range dependent data with dependent test statistics. Bogdan et al. (2011) studied the oracle properties and Bayes risk of several multiple testing methods under sparsity in a Bayesian decision-theoretic setup. Datta and Ghosh (2013) studied oracle properties of the horseshoe prior when the number of tests grows to infinity. However, in the aforementioned works, the test statistics are independent and follow the Gaussian distribution. Fan et al. (2012) proposed a method for dealing with correlated test statistics where the covariance structure is known. Their method is based on the principal eigenvalues of the covariance matrix, which they termed principal factors. Using those principal factors, their method dilutes the association between correlated statistics to deal with an arbitrary dependence structure. They also derived an approximately consistent estimator of the false discovery proportion (FDP) in large-scale multiple testing. Fan and Han (2017) extended this work to the case where the dependence structure is unknown. In these approaches, the decision rules are marginal and the test statistics jointly follow the multivariate Gaussian distribution. Chandra and Bhattacharya (2019) argue that when the decision rules corresponding to different hypotheses are marginal, the full potential of the dependence structure is not properly exploited. Results of extensive simulation studies reported in Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020), demonstrating superior performance of the Bayesian non-marginal method compared to popular marginal methods even for large numbers of hypotheses, seem to corroborate this point. This makes asymptotic analysis of the Bayesian non-marginal method with increasing number of hypotheses all the more important.

To be more specific, we investigate the asymptotic theory of the Bayesian non-marginal procedure in the general dependence setup, without any particular model assumption, when the sample size ($n$) and the number of hypotheses ($m_{n}$, which may be a function of $n$) both tend to infinity. We establish strong consistency of the procedure and show that, even in this setup, the convergence rates of versions of $FDR$ and $FNR$ are directly related to the KL-divergence from the true model. We show that our results continue to hold for general Bayesian procedures under the additive loss function. In the Bayesian non-marginal context, we illustrate the theory with the time-varying covariate selection problem, where the number of covariates tends to infinity with the sample size $n$. We distinguish between two setups: the ultra high-dimensional case, where $\frac{m_{n}}{n}\rightarrow\infty$ (or some constant) as $n\rightarrow\infty$, and the high-dimensional but not ultra high-dimensional case, where $m_{n}\rightarrow\infty$ and $\frac{m_{n}}{n}\rightarrow 0$ as $n\rightarrow\infty$. We particularly focus on the ultra high-dimensional setup because of its much more challenging nature.

2 A brief overview of the Bayesian non-marginal procedure

Let $\boldsymbol{X}_{n}=\{X_{1},\ldots,X_{n}\}$ denote the available data set. Suppose the data is modelled by the family of distributions $P_{\boldsymbol{X}_{n}|\boldsymbol{\theta}}$ (which may also be non-parametric). For $M>1$, let us denote by $\boldsymbol{\Theta}=\Theta_{1}\times\cdots\times\Theta_{M}$ the relevant parameter space associated with $\boldsymbol{\theta}=(\theta_{1},\ldots,\theta_{M})$, where we allow $M$ to be infinite as well. Let $P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(\cdot)$ and $E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(\cdot)$ denote the posterior distribution and expectation, respectively, of $\boldsymbol{\theta}$ given $\boldsymbol{X}_{n}$, and let $P_{\boldsymbol{X}_{n}}(\cdot)$ and $E_{\boldsymbol{X}_{n}}(\cdot)$ denote the marginal distribution and expectation of $\boldsymbol{X}_{n}$, respectively. Let us consider the problem of testing $m$ hypotheses simultaneously corresponding to the actual parameters of interest, where $1<m\leq M$.

Without loss of generality, let us consider testing the parameters associated with $\Theta_{i}$; $i=1,\ldots,m$, formalized as:

H_{0i}:\theta_{i}\in\Theta_{0i}\hbox{ versus }H_{1i}:\theta_{i}\in\Theta_{1i},

where $\Theta_{0i}\bigcap\Theta_{1i}=\emptyset$ and $\Theta_{0i}\bigcup\Theta_{1i}=\Theta_{i}$, for $i=1,\ldots,m$.

Let

d_{i}=\begin{cases}1&\text{if the $i$-th hypothesis is rejected;}\\ 0&\text{otherwise;}\end{cases} (2.1)
r_{i}=\begin{cases}1&\text{if $H_{1i}$ is true;}\\ 0&\text{if $H_{0i}$ is true.}\end{cases} (2.2)

Following Chandra and Bhattacharya (2019), we define $G_{i}$ to be the set of hypotheses, including the $i$-th one, which are highly dependent, and define

z_{i}=\begin{cases}1&\mbox{if $H_{d_{j},j}$ is true for all $j\in G_{i}\setminus\{i\}$;}\\ 0&\mbox{otherwise.}\end{cases} (2.3)

If, for any $i\in\{1,\ldots,m\}$, $G_{i}=\{i\}$ is a singleton, then we define $z_{i}=1$. Chandra and Bhattacharya (2019) maximize the posterior expectation of the number of true positives

TP=\sum_{i=1}^{m}d_{i}r_{i}z_{i}, (2.4)

subject to controlling the posterior expectation of the error term

E=\sum_{i=1}^{m}d_{i}(1-r_{i}z_{i}), (2.5)

which is actually the posterior mean of the sum of the three error terms $E_{1}=\sum_{i=1}^{m}d_{i}(1-r_{i})z_{i}$, $E_{2}=\sum_{i=1}^{m}d_{i}(1-r_{i})(1-z_{i})$ and $E_{3}=\sum_{i=1}^{m}d_{i}r_{i}(1-z_{i})$. For a detailed discussion regarding these, see Chandra and Bhattacharya (2019).

It follows that the optimal decision configuration can be obtained by minimizing the function

\xi(\boldsymbol{d})=-\sum_{i=1}^{m}d_{i}E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(r_{i}z_{i})+\lambda_{n}\sum_{i=1}^{m}d_{i}E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(1-r_{i}z_{i})
=-(1+\lambda_{n})\sum_{i=1}^{m}d_{i}\left(w_{in}(\boldsymbol{d})-\frac{\lambda_{n}}{1+\lambda_{n}}\right),

with respect to all possible decision configurations of the form $\boldsymbol{d}=\{d_{1},\ldots,d_{m}\}$, where $\lambda_{n}>0$, and

w_{in}(\boldsymbol{d})=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(r_{i}z_{i})=P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(H_{1i}\cap\left\{\cap_{j\neq i,j\in G_{i}}H_{d_{j},j}\right\}\right), (2.6)

is the posterior probability of the decision configuration $\{d_{1},\ldots,d_{i-1},1,d_{i+1},\ldots,d_{m}\}$ being correct. Letting $\beta_{n}=\lambda_{n}/(1+\lambda_{n})$, one can equivalently maximize

f_{\beta_{n}}(\boldsymbol{d})=\sum_{i=1}^{m}d_{i}\left(w_{in}(\boldsymbol{d})-\beta_{n}\right) (2.7)

with respect to $\boldsymbol{d}$ and obtain the optimal decision configuration.

Definition 1

Let $\mathbb{D}$ be the set of all $m$-dimensional binary vectors denoting all possible decision configurations. Define

\widehat{\boldsymbol{d}}=\operatorname*{argmax}_{\boldsymbol{d}\in\mathbb{D}}f_{\beta}(\boldsymbol{d})

where $0<\beta<1$. Then $\widehat{\boldsymbol{d}}$ is the optimal decision configuration obtained as the solution of the non-marginal multiple testing method.

For a detailed discussion regarding the choice of the $G_{i}$'s in (2.3), see Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020). In particular, Chandra and Bhattacharya (2020) show that asymptotically, the Bayesian non-marginal method is robust with respect to the $G_{i}$'s, in the sense that it is consistent under any choice of the grouping structure. As will be shown in this article, the same holds even in the high-dimensional asymptotic setup.
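To make the optimization in Definition 1 concrete, the following minimal Python sketch estimates $w_{in}(\boldsymbol{d})$ in (2.6) by Monte Carlo averaging over posterior samples and obtains $\widehat{\boldsymbol{d}}$ by exhaustive search. The function name, array layout and the exhaustive search (feasible only for small $m$, since there are $2^{m}$ configurations) are our own illustrative assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

def optimal_decision(r_samples, groups, beta):
    """Exhaustive maximization of f_beta(d) in (2.7).

    r_samples : (S, m) binary array; r_samples[s, i] = 1 if H_{1i} holds
                under the s-th posterior draw of theta.
    groups    : list of sets; groups[i] = G_i (contains i).
    beta      : penalizing constant in (0, 1).
    """
    S, m = r_samples.shape
    best_d, best_f = None, -np.inf
    for d_tuple in itertools.product([0, 1], repeat=m):
        d = np.array(d_tuple)
        f = 0.0
        for i in np.flatnonzero(d):
            # z_i = 1 iff the decisions d_j are correct for all j in G_i \ {i}
            others = sorted(groups[i] - {i})
            z = np.all(r_samples[:, others] == d[others], axis=1)
            # Monte Carlo estimate of w_{in}(d) = E(r_i z_i | X_n), cf. (2.6)
            f += np.mean(r_samples[:, i] * z) - beta
        if f > best_f:
            best_d, best_f = d, f
    return best_d
```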

2.1 Error measures in multiple testing

Storey (2003) advocated the positive False Discovery Rate ($pFDR$) as a measure of Type-I error in multiple testing. Let $\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n})$ be the probability of choosing $\boldsymbol{d}$ as the optimal decision configuration given data $\boldsymbol{X}_{n}$, when a multiple testing method $\mathcal{M}$ is employed. Then $pFDR$ is defined as:

pFDR=E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-r_{i})}{\sum_{i=1}^{m}d_{i}}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n})\bigg{|}\delta_{\mathcal{M}}(\boldsymbol{d}=\mathbf{0}|\boldsymbol{X}_{n})=0\right]. (2.8)

Analogous to the Type-II error, the positive False Non-discovery Rate ($pFNR$) is defined as

pFNR=E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})r_{i}}{\sum_{i=1}^{m}(1-d_{i})}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\bigg{|}\delta_{\mathcal{M}}\left(\boldsymbol{d}=\boldsymbol{1}|\boldsymbol{X}_{n}\right)=0\right]. (2.9)

Under prior $\pi(\cdot)$, Sarkar et al. (2008) defined the posterior $FDR$ and $FNR$, given as follows:

posterior~FDR=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-r_{i})}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-v_{in})}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}); (2.10)
posterior~FNR=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})r_{i}}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})v_{in}}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}), (2.11)

where $v_{in}=P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(\Theta_{1i})$. Also, under any non-randomized decision rule $\mathcal{M}$, $\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n})$ is either 1 or 0, depending on the data $\boldsymbol{X}_{n}$. Given $\boldsymbol{X}_{n}$, we denote these posterior error measures by $FDR_{\boldsymbol{X}_{n}}$ and $FNR_{\boldsymbol{X}_{n}}$, respectively.
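For a non-randomized rule, $\delta_{\mathcal{M}}$ selects a single configuration $\boldsymbol{d}$, so (2.10) and (2.11) reduce to simple averages over that configuration. A minimal sketch, assuming the marginal posterior probabilities $v_{in}$ have already been estimated (the function name and inputs are our own):

```python
import numpy as np

def posterior_fdr_fnr(d, v):
    """FDR_{X_n} and FNR_{X_n} of (2.10)-(2.11) for a fixed decision d,
    where v[i] estimates v_in = P(theta_i in Theta_{1i} | X_n)."""
    d, v = np.asarray(d), np.asarray(v)
    fdr = np.sum(d * (1.0 - v)) / max(d.sum(), 1)      # denominator: sum(d_i) v 1
    fnr = np.sum((1 - d) * v) / max((1 - d).sum(), 1)  # denominator: sum(1-d_i) v 1
    return fdr, fnr
```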

With respect to the new notions of errors in (2.4) and (2.5), Chandra and Bhattacharya (2019) modified $FDR_{\boldsymbol{X}_{n}}$ as

modified~FDR_{\boldsymbol{X}_{n}}=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-r_{i}z_{i})}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-w_{in}(\boldsymbol{d}))}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}), (2.12)

and $FNR_{\boldsymbol{X}_{n}}$ as

modified~FNR_{\boldsymbol{X}_{n}}=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})r_{i}z_{i}}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})w_{in}(\boldsymbol{d})}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}). (2.13)

We denote $modified~FDR_{\boldsymbol{X}_{n}}$ and $modified~FNR_{\boldsymbol{X}_{n}}$ by $mFDR_{\boldsymbol{X}_{n}}$ and $mFNR_{\boldsymbol{X}_{n}}$, respectively. Notably, the expectations of $FDR_{\boldsymbol{X}_{n}}$ and $FNR_{\boldsymbol{X}_{n}}$ with respect to $\boldsymbol{X}_{n}$, conditional on their respective denominators being positive, yield the positive Bayesian $FDR$ ($pBFDR$) and $FNR$ ($pBFNR$), respectively. The same expectations of $mFDR_{\boldsymbol{X}_{n}}$ and $mFNR_{\boldsymbol{X}_{n}}$ yield the modified positive $BFDR$ ($mpBFDR$) and modified positive $BFNR$ ($mpBFNR$), respectively.

Müller et al. (2004) (see also Sun and Cai, 2009; Xie et al., 2011) considered the following additive loss function:

L(\boldsymbol{d},\boldsymbol{\theta})=c\sum_{i=1}^{m}d_{i}(1-r_{i})+\sum_{i=1}^{m}(1-d_{i})r_{i}, (2.14)

where $c$ is a positive constant. The decision rule that minimizes the posterior risk of the above loss is $d_{i}=I\left(v_{i}>\frac{c}{1+c}\right)$ for all $i=1,\ldots,m$, where $I(\cdot)$ is the indicator function. Observe that the non-marginal method boils down to this additive loss function based approach when $G_{i}=\{i\}$, that is, when the information regarding dependence between hypotheses is not available or is overlooked. Hence, the convergence properties of the additive loss function based methods can be easily derived from our theories.
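In code, the additive-loss Bayes rule is a simple thresholding of the marginal posterior probabilities; a sketch under the same assumptions as above:

```python
import numpy as np

def additive_loss_decision(v, c=1.0):
    """Bayes rule for the additive loss (2.14): reject H_{0i} iff v_i > c/(1+c)."""
    return (np.asarray(v) > c / (1.0 + c)).astype(int)
```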

Note that multiple testing problems can be regarded as model selection problems where the task is to choose the correct specification for the parameters under consideration. The model is misspecified even if one decision is taken incorrectly. Under quite general conditions, Shalizi (2009) investigated asymptotic behaviour of misspecified models. We adopt his basic assumptions and some of his convergence results to build a general asymptotic theory for our Bayesian non-marginal multiple testing method in high dimensions.

In Section 3, we provide the setup, assumptions and the main result of Shalizi (2009) which we adopt for our purpose. In Section 4 we address consistency of the Bayesian non-marginal method and convergence of the associated error terms in the high-dimensional setup. High-dimensional asymptotic analyses of versions of $FDR$ and $FNR$ are detailed in Sections 5 and 6, respectively. In Section 7, we establish the high-dimensional asymptotic theory for $FNR_{\boldsymbol{X}_{n}}$ and $BFNR$ when versions of $BFDR$ are $\alpha$-controlled asymptotically. We illustrate the asymptotic properties of the non-marginal method in a multiple testing setup associated with an autoregressive model involving time-varying covariates in Section 8, in high-dimensional contexts. Finally, in Section 9 we summarize our contributions and provide concluding remarks.

3 Preliminaries for ensuring posterior convergence under general setup

Following Shalizi (2009), we consider a probability space $(\Omega,\mathcal{F},P)$ and a sequence of random variables $X_{1},X_{2},\ldots$, taking values in some measurable space $(\Xi,\mathcal{X})$, whose infinite-dimensional distribution is $P$. The natural filtration of this process is $\sigma(\boldsymbol{X}_{n})$.

We denote the distributions of processes adapted to $\sigma(\boldsymbol{X}_{n})$ by $P_{\boldsymbol{X}_{n}|\boldsymbol{\theta}}$, where $\boldsymbol{\theta}$ is associated with a measurable space $(\boldsymbol{\Theta},\mathcal{T})$ and is generally infinite-dimensional. For the sake of convenience, we assume, as in Shalizi (2009), that $P$ and all the $P_{\boldsymbol{X}_{n}|\boldsymbol{\theta}}$ are dominated by a common reference measure, with respective densities $p$ and $f_{\boldsymbol{\theta}}$. The usual assumptions that $P\in\boldsymbol{\Theta}$, or even that $P$ lies in the support of the prior on $\boldsymbol{\Theta}$, are not required for Shalizi's result, rendering it very general indeed. We put the prior distribution $\pi(\cdot)$ on the parameter space $\boldsymbol{\Theta}$.

3.1 Assumptions and theorem of Shalizi

  1. (S1)

    Consider the following likelihood ratio:

R_{n}(\boldsymbol{\theta})=\frac{f_{\boldsymbol{\theta}}(\boldsymbol{X}_{n})}{p(\boldsymbol{X}_{n})}. (3.1)

Assume that $R_{n}(\boldsymbol{\theta})$ is $\sigma(\boldsymbol{X}_{n})\times\mathcal{T}$-measurable for all $n>0$.

  2. (S2)

For each $\boldsymbol{\theta}\in\Theta$, the generalized or relative asymptotic equipartition property holds, and so, almost surely,

\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}\log R_{n}(\boldsymbol{\theta})=-h(\boldsymbol{\theta}),

where $h(\boldsymbol{\theta})$ is given in (S3) below.

  3. (S3)

For every $\boldsymbol{\theta}\in\Theta$, the KL-divergence rate

h(\boldsymbol{\theta})=\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}E\left(\log\frac{p(\boldsymbol{X}_{n})}{f_{\boldsymbol{\theta}}(\boldsymbol{X}_{n})}\right) (3.2)

exists (possibly being infinite) and is $\mathcal{T}$-measurable.

  4. (S4)

Let $I=\left\{\boldsymbol{\theta}:h(\boldsymbol{\theta})=\infty\right\}$. The prior $\pi$ satisfies $\pi(I)<1$.

Following the notation of Shalizi (2009), for $A\subseteq\Theta$, let

h\left(A\right)=\underset{\boldsymbol{\theta}\in A}{\mbox{ess~inf}}~h(\boldsymbol{\theta}); (3.3)
J(\boldsymbol{\theta})=h(\boldsymbol{\theta})-h(\Theta); (3.4)
J(A)=\underset{\boldsymbol{\theta}\in A}{\mbox{ess~inf}}~J(\boldsymbol{\theta}). (3.5)
  5. (S5)

There exists a sequence of sets $\mathcal{G}_{n}\rightarrow\Theta$ as $n\rightarrow\infty$ such that:

    1. (1)
\pi\left(\mathcal{G}_{n}\right)\geq 1-\alpha\exp\left(-\varsigma n\right),~\mbox{for some}~\alpha>0,~\varsigma>2h(\Theta); (3.6)
    2. (2)

The convergence in (S3) is uniform in $\theta$ over $\mathcal{G}_{n}\setminus I$.

    3. (3)

$h\left(\mathcal{G}_{n}\right)\rightarrow h\left(\Theta\right)$, as $n\rightarrow\infty$.

For each measurable $A\subseteq\Theta$ and for every $\delta>0$, there exists a random natural number $\tau(A,\delta)$ such that

n^{-1}\log\int_{A}R_{n}(\boldsymbol{\theta})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}\leq\delta+\underset{n\rightarrow\infty}{\limsup}~n^{-1}\log\int_{A}R_{n}(\boldsymbol{\theta})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}, (3.7)

for all $n>\tau(A,\delta)$, provided $\underset{n\rightarrow\infty}{\limsup}~n^{-1}\log\pi\left(\mathbb{I}_{A}R_{n}\right)<\infty$. Regarding this, the following assumption has been made by Shalizi:

  6. (S6)

The sets $\mathcal{G}_{n}$ of (S5) can be chosen such that for every $\delta>0$, the inequality $n>\tau(\mathcal{G}_{n},\delta)$ holds almost surely for all sufficiently large $n$.

  7. (S7)

The sets $\mathcal{G}_{n}$ of (S5) and (S6) can be chosen such that, for any set $A$ with $\pi(A)>0$,

h\left(\mathcal{G}_{n}\cap A\right)\rightarrow h\left(A\right), (3.8)

as $n\rightarrow\infty$.

Under the above assumptions, the following version of the theorem of Shalizi (2009) can be seen to hold.

Theorem 2 (Shalizi, 2009)

Consider assumptions (S1)-(S7) and any set $A\in\mathcal{T}$ with $\pi(A)>0$. If $\varsigma>2h(A)$, where $\varsigma$ is given in (3.6) under assumption (S5), then

\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}\log P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(A|\boldsymbol{X}_{n})=-J(A). (3.9)

We shall make frequent use of this theorem. Throughout this article, we establish consistency results for general models satisfying (S1)-(S7); all our results assume these conditions.
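As a quick illustration of Theorem 2 (a worked example of our own, not taken from Shalizi (2009)), suppose $X_{1},X_{2},\ldots$ are i.i.d. $N(\theta_{0},1)$ and the postulated model is $N(\theta,1)$, $\theta\in\Theta=\mathbb{R}$, with a prior of full support. Then

h(\theta)=\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}E\left(\log\frac{p(\boldsymbol{X}_{n})}{f_{\theta}(\boldsymbol{X}_{n})}\right)=\frac{(\theta-\theta_{0})^{2}}{2},\qquad h(\Theta)=0,\qquad J(\theta)=\frac{(\theta-\theta_{0})^{2}}{2}.

For $A=\{\theta:|\theta-\theta_{0}|\geq\epsilon\}$ we have $J(A)=\epsilon^{2}/2$, so (3.9) implies that the posterior probability of $A$ decays like $e^{-n\epsilon^{2}/2}$: posterior mass outside any neighbourhood of the KL-minimizer vanishes exponentially fast, with the KL-divergence rate as the exponent.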

4 Consistency of multiple testing procedures when the number of hypotheses tends to infinity

In this section we show that the non-marginal procedure is asymptotically consistent under any general dependency model satisfying the conditions in Section 3.1. Since one of our main goals is to allow for misspecification, we must define consistency of multiple testing methods encompassing misspecification, while also allowing for $m_{n}$ hypotheses where $m_{n}/n\rightarrow c$, with $c\geq 0$ or $c=\infty$. We formalize this below by introducing appropriate notions.

4.1 Consistency of multiple testing procedures under misspecification

Let $\boldsymbol{\Theta}^{\infty}$ be the infinite-dimensional parameter space of the countably infinite set of parameters $\{\theta_{1},\theta_{2},\ldots\}$. In this case, any decision configuration $\boldsymbol{d}$ is also an infinite-dimensional vector of 0's and 1's. Define $\boldsymbol{\Theta}^{t}=\otimes_{i=1}^{\infty}\boldsymbol{\Theta}_{d_{i}^{t},i}$, where "$\otimes$" denotes the cartesian product, and $\boldsymbol{d}^{t}=(d^{t}_{1},d^{t}_{2},\ldots)$ denotes the actual infinite-dimensional decision configuration satisfying $J\left(\boldsymbol{\Theta}^{t}\right)=J\left(\boldsymbol{\Theta}^{\infty}\right)$. This definition of $\boldsymbol{d}^{t}$ accounts for misspecification in the sense that $\boldsymbol{d}^{t}$ is the minimizer of the KL-divergence from the true data-generating model. For any decision $\boldsymbol{d}$, let $\boldsymbol{d}(m_{n})$ denote the first $m_{n}$ components of $\boldsymbol{d}$. Let $\mathbb{D}_{m_{n}}$ denote the set of all possible decision configurations corresponding to $m_{n}$ hypotheses. With the aforementioned notions, we now define consistency of multiple testing procedures.

Definition 3

Let $\boldsymbol{d}^{t}(m_{n})$ be the true decision configuration among all possible decision configurations in $\mathbb{D}_{m_{n}}$. Then a multiple testing method $\mathcal{M}$ is said to be asymptotically consistent if, almost surely,

\lim_{n\rightarrow\infty}\delta_{\mathcal{M}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})=1. (4.1)

Recall the constant $\beta_{n}$ in (2.7), which is the penalizing constant between the error $E$ and the true positives $TP$. For consistency of the non-marginal procedure, we need certain conditions on $\beta_{n}$, which we state below. These conditions will also play important roles in the asymptotic studies of the different versions of $FDR$ and $FNR$ that we consider.

  1. (A1)

We assume that the sequence $\beta_{n}$ is neither too small nor too large, that is,

\underline{\beta}=\underset{n\geq 1}{\liminf}~\beta_{n}>0; (4.2)
\overline{\beta}=\underset{n\geq 1}{\limsup}~\beta_{n}<1. (4.3)
  2. (A2)

We assume that neither all the null hypotheses are true, nor all of them false, among the $m_{n}$ hypotheses being considered; that is, $\boldsymbol{d}^{t}(m_{n})\neq\boldsymbol{0}$ and $\boldsymbol{d}^{t}(m_{n})\neq\boldsymbol{1}$, where $\boldsymbol{0}$ and $\boldsymbol{1}$ are vectors of 0's and 1's, respectively.

Condition (A1) is necessary for the asymptotic consistency of both the non-marginal method and the additive loss function based method. It ensures that the penalizing constant is asymptotically bounded away from 0 and 1, that is, neither too small nor too large. Notably, (A2) is not required for the consistency results. The role of (A2) is to ensure that the denominator terms in the multiple testing error measures (defined in Section 2.1) do not become 0.

4.2 Main results on consistency in the infinite-dimensional setup

In this section we investigate the asymptotic properties of the Bayesian non-marginal method and that of Müller et al. (2004) when $m_{n}/n$ tends to infinity or some positive constant. It is to be noted that result (3.9) of Shalizi (2009) holds even for infinite-dimensional parameter spaces. Exploiting this fact, we derive the results of this section.

Note that if there exists a value $\boldsymbol{\theta}^{t}$ of $\boldsymbol{\theta}$ that minimizes the KL-divergence, then $\boldsymbol{\theta}^{t}$ is in the set $\boldsymbol{\Theta}^{t}$. Let us denote by $\boldsymbol{\Theta}^{tc}$ the complement of $\boldsymbol{\Theta}^{t}$. Observe that if $\boldsymbol{\theta}^{t}$ lies in the interior of $\boldsymbol{\Theta}^{t}$, then $J\left(\boldsymbol{\Theta}^{tc}\right)>0$. It then holds that

\lim_{n\rightarrow\infty}\frac{1}{n}\log P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)=-J\left(\boldsymbol{\Theta}^{tc}\right), (4.4)

which implies that for any $\epsilon>0$, there exists $n_{0}(\epsilon)$ such that for all $n>n_{0}(\epsilon)$,

\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)+\epsilon\right)\right]<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)<\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)-\epsilon\right)\right] (4.5)
\Rightarrow 1-\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)-\epsilon\right)\right]<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{t}\right)<1-\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)+\epsilon\right)\right]. (4.6)

For notational convenience, we shall henceforth denote $J\left(\boldsymbol{\Theta}^{tc}\right)$ by $J$.

Note that the groups $G_{i}$ also depend upon $m_{n}$ in our setup; hence, we denote them by $G_{i,m_{n}}$. For any decision configuration $\boldsymbol{d}(m_{n})$ and group $G_{m_{n}}$, let $\boldsymbol{d}_{G_{m_{n}}}=\{d_{j}:j\in G_{m_{n}}\}$. Define

\mathbb{D}_{i,m_{n}}=\left\{\boldsymbol{d}(m_{n}):~\mbox{all decisions in}~\boldsymbol{d}_{G_{i,m_{n}}}~\mbox{are correct}\right\}.

Here $\mathbb{D}_{i,m_{n}}$ is the set of all decision configurations in which at least the decisions corresponding to the hypotheses in $G_{i,m_{n}}$ are correct. Clearly $\mathbb{D}_{i,m_{n}}$ contains $\boldsymbol{d}^{t}(m_{n})$ for all $i=1,2,\ldots,m_{n}$.

Hence, $\mathbb{D}^{c}_{i,m_{n}}=\left\{\boldsymbol{d}(m_{n}):~\mbox{at least one decision in}~\boldsymbol{d}_{G_{i,m_{n}}}~\mbox{is incorrect}\right\}$. Observe that if $\boldsymbol{d}(m_{n})\in\mathbb{D}^{c}_{i,m_{n}}$, at least one decision is wrong corresponding to some parameter in $G_{i,m_{n}}$. As $P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)$ is the posterior probability of at least one wrong decision in the infinite-dimensional parameter space, we have

w_{in}(\boldsymbol{d}(m_{n}))\leq w_{in}(\boldsymbol{d})<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)<\exp\left[-n\left(J-\epsilon\right)\right]. (4.7)

Also, if $H_{0i}$ is true, then

v_{in}\leq w_{in}(\boldsymbol{d})<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)<\exp\left[-n\left(J-\epsilon\right)\right]. (4.8)

Similarly, for $\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}$ and for false $H_{0i}$,

w_{in}(\boldsymbol{d}(m_{n}))\geq w_{in}(\boldsymbol{d}^{t})>P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{t}\right)>1-\exp\left[-n\left(J-\epsilon\right)\right]; (4.9)
v_{in}\geq w_{in}(\boldsymbol{d}^{t})>P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{t}\right)>1-\exp\left[-n\left(J-\epsilon\right)\right]. (4.10)

It is important to note that the inequalities (4.7)-(4.10) hold for all $n>n_{0}$, and this $n_{0}$ is the same for all $i$, thanks to the validity of Shalizi's result in the infinite-dimensional parameter space. Exploiting the properties of Shalizi's theorem, we will now establish consistency of the Bayesian non-marginal method for an increasing number of hypotheses.

Theorem 4

Let $\delta_{\mathcal{NM}}$ denote the decision rule corresponding to the Bayesian non-marginal procedure for $m_{n}$ hypotheses being tested using samples of size $n$, where $m_{n}\rightarrow\infty$ as $n\rightarrow\infty$. Assume Shalizi's conditions and assumption (A1). Also assume that $J\left(\boldsymbol{\Theta}^{tc}\right)>0$. Then,

\lim_{n\rightarrow\infty}\delta_{\mathcal{NM}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})=1,~\mbox{almost surely, and} (4.11)
\lim_{n\rightarrow\infty}E\left[\delta_{\mathcal{NM}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})\right]=1. (4.12)
Corollary 5

Assuming condition (A1), the optimal decision rule corresponding to the additive loss function (2.14) is asymptotically consistent. The proof follows in the same way as that of Theorem 4 using (4.8) and (4.10).

Remark 6

Note that Theorem 4 does not require any condition regarding the growth of $m_{n}$ with respect to $n$, and holds if $m_{n}/n\rightarrow c$ as $n\rightarrow\infty$, where $c\geq 0$ is some constant or infinity. Thus, the result seems to be extremely satisfactory. However, restrictions on the growth of $m_{n}$ generally need to be imposed to satisfy the conditions of Shalizi. An illustration in this regard is provided in Section 8.

5 High-dimensional asymptotic analyses of versions of $FDR$

For a fixed number of hypotheses $m$, Chandra and Bhattacharya (2020) investigated convergence of different versions of $FDR$ as the sample size $n$ tends to infinity. They show that the convergence rates of the posterior error measures $mFDR_{\boldsymbol{X}_{n}}$ and $FDR_{\boldsymbol{X}_{n}}$ are directly associated with the KL-divergence from the true model. Indeed, they were able to obtain the exact limits of $\frac{1}{n}\log mFDR_{\boldsymbol{X}_{n}}$ and $\frac{1}{n}\log FDR_{\boldsymbol{X}_{n}}$ in terms of the relevant $m$-dimensional KL-divergence rate.

In the current high-dimensional setup, however, such an exact KL-divergence rate cannot be expected to be available, since the number of hypotheses $m_{n}$ is not fixed. As $m_{n}\rightarrow\infty$, it is plausible to expect that the convergence rates depend upon the infinite-dimensional KL-divergence $J$. We show that this is indeed the case, but the exact limit is not available, which is again to be expected, since $m_{n}$ approaches infinity but is never actually infinite. Here, in the high-dimensional setup, we obtain $-J$ as an upper bound of the limit suprema. It is easy to observe that the limits in the finite-dimensional setup are bounded above by $-J$, thus providing evidence of internal consistency as we move from the fixed-dimensional to the high-dimensional setup.

We also show that $mpBFDR$ and $pBFDR$ approach zero, even though the rates of convergence are not available. Recall that even in the fixed-dimensional setup, the convergence rates of $mpBFDR$ and $pBFDR$ were not available. As in the consistency result, these results too do not require any restriction on the growth rate of $m_{n}$, except that required for Shalizi's conditions to hold.

We present our results below; the proofs are provided in the supplement.

Theorem 7

Assume the setup and conditions of Theorem 4. Then, for any $\epsilon>0$, there exists $n_{0}(\epsilon)\geq 1$ such that for $n\geq n_{0}(\epsilon)$, the following hold almost surely:

mFDR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}; (5.1)
FDR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}. (5.2)

The above theorem shows that $mFDR_{\boldsymbol{X}_{n}}$ and $FDR_{\boldsymbol{X}_{n}}$ converge to 0 at an exponential rate for arbitrarily large numbers of hypotheses, for an arbitrary growth rate of $m_{n}$ with respect to $n$. However, Shalizi's conditions would again require restrictions on the growth rate of $m_{n}$.

Corollary 8

Under the setup and assumptions of Theorem 4,

\limsup_{n\rightarrow\infty}\frac{1}{n}\log mFDR_{\boldsymbol{X}_{n}}\leq-J; (5.3)
\limsup_{n\rightarrow\infty}\frac{1}{n}\log FDR_{\boldsymbol{X}_{n}}\leq-J. (5.4)
Theorem 9

Assume the setup and conditions of Theorem 4, along with assumption (A2). Then

\lim_{n\rightarrow\infty}mpBFDR=0; (5.5)
\lim_{n\rightarrow\infty}pBFDR=0. (5.6)

6 High-dimensional asymptotic analyses of versions of $FNR$

High-dimensional asymptotic treatments of versions of $FNR$ are similar to those for versions of $FDR$. In particular, the limit suprema of both $\frac{1}{n}\log mFNR_{\boldsymbol{X}_{n}}$ and $\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}$ are bounded above by $-J$, and both $mpBFNR$ and $pBFNR$ converge to zero. The proofs of these results are also similar to those for the respective $FDR$ versions. Internal consistency of these results is again evident, as the limits of $\frac{1}{n}\log mFNR_{\boldsymbol{X}_{n}}$ and $\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}$ in the finite-dimensional setup are bounded above by $-J$, and $mpBFNR$ and $pBFNR$ converge to zero for a fixed number of hypotheses. In the latter cases, convergence rates are not available in either the fixed-dimensional or the high-dimensional case. Below we provide the relevant results on versions of $FNR$, with proofs in the supplement.

Theorem 10

Assume the setup and conditions of Theorem 4. Then, for any $\epsilon>0$, there exists $n_{0}(\epsilon)\geq 1$ such that for $n\geq n_{0}(\epsilon)$, the following hold almost surely:

mFNR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}; (6.1)
FNR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}. (6.2)

The above theorem shows that $mFNR_{\boldsymbol{X}_{n}}$ and $FNR_{\boldsymbol{X}_{n}}$ converge to 0 at an exponential rate for arbitrarily large numbers of hypotheses, for an arbitrary growth rate of $m_{n}$ with respect to $n$. However, Shalizi's conditions would again require restrictions on the growth rate of $m_{n}$.

Corollary 11

Under the setup and assumptions of Theorem 4,

\limsup_{n\rightarrow\infty}\frac{1}{n}\log mFNR_{\boldsymbol{X}_{n}}\leq-J; (6.3)
\limsup_{n\rightarrow\infty}\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}\leq-J. (6.4)
Theorem 12

Assume the setup and conditions of Theorem 4, along with assumption (A2). Then

\lim_{n\rightarrow\infty}mpBFNR=0; (6.5)
\lim_{n\rightarrow\infty}pBFNR=0. (6.6)

7 High-dimensional asymptotics for $FNR_{\boldsymbol{X}_{n}}$ and $BFNR$ when versions of $BFDR$ are $\alpha$-controlled

It has been proved in Chandra and Bhattacharya (2019) that for the non-marginal multiple testing procedure and additive loss function based methods, $mpBFDR$ and $pBFDR$ are continuous and non-increasing in $\beta$. Consequently, for suitable values of $\beta$, any $\alpha\in(0,1)$ can be achieved by these errors. For suitably chosen positive values of $\alpha$, one can hope to reduce the corresponding $BFNR$. This is standard practice even in the single hypothesis testing literature, where the Type-I error is controlled at some positive value so that a reduced Type-II error may be incurred. However, as shown in Chandra and Bhattacharya (2020) in the fixed-dimensional setup, for the non-marginal multiple testing procedure and additive loss function based methods, values of $\alpha$ as close to 1 as desired cannot be attained by versions of $FDR$ as the sample size $n$ tends to infinity. This is not surprising, however, since consistent procedures are not expected to incur large errors asymptotically, at least when the number of hypotheses is fixed. Indeed, in the fixed-dimensional setup, Chandra and Bhattacharya (2020) provided an interval of the form $(a,b)$, where $0<a<b<1$, in which the maximum values of the versions of $FDR$ can lie asymptotically, and obtained asymptotic results for $FNR$ for such $\alpha$-controlled versions of $FDR$.

In this section we investigate the asymptotic theory for $\alpha$-control in the high-dimensional context, that is, when $m_{n}\rightarrow\infty$ as $n\rightarrow\infty$. Although none of our previous high-dimensional results required any explicit restriction on the growth rate of $m_{n}$, given that the posterior convergence result of Shalizi holds, here we need the very mild condition that $m_{n}$ grows at a sub-exponential rate in $n$. We also need to fix the proportion ($p$) of true alternatives as $m_{n}\rightarrow\infty$, and the proportion ($q$) of groups associated with at least one false null hypothesis. As we show, these two proportions define an interval of the form $(0,b)$, with $b=\frac{1-q}{1+p-q}<1$, in which the maximum of the versions of $FDR$ lies as $m_{n}\rightarrow\infty$ with $n$. In contrast with the fixed-dimensional asymptotics of Chandra and Bhattacharya (2020), the lower bound of the interval is zero for high dimensions, not strictly positive. To explain, for fixed dimension $m$, the lower bound was $a=\frac{1}{\sum_{i=1}^{m}d^{t}_{i}+1}$. Intuitively, replacing $a$ and $m$ with $a_{m_{n}}$ and $m_{n}$ respectively, dividing both the numerator and the denominator of $a_{m_{n}}$ by $m_{n}$, and taking the limit (the scaled denominator tends to $p$), we obtain $a_{m_{n}}\rightarrow 0$ as $n\rightarrow\infty$, as displayed below. Similar intuition can be used to verify that the upper bound $b$ of the fixed-dimensional case converges to $\frac{1-q}{1+p-q}$ in the high-dimensional setup. As in our previous results, these provide a verification of internal consistency in the transition from the fixed-dimensional to the high-dimensional situation.
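In symbols, this heuristic for the lower bound reads as follows (our own display of the limit just described, using (7.2)):

a_{m_{n}}=\frac{1}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}=\frac{1/m_{n}}{\frac{1}{m_{n}}\sum_{i=1}^{m_{n}}d^{t}_{i}+\frac{1}{m_{n}}}\rightarrow\frac{0}{p+0}=0,\qquad\mbox{as}~n\rightarrow\infty.

For instance, with $p=0.5$ and $q=0.5$, the maximum asymptotic error lies below $b=\frac{1-0.5}{1+0.5-0.5}=0.5$.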

Our results regarding asymptotic $\alpha$-control of versions of $FDR$ and the corresponding convergence of versions of $FNR$ are detailed in Sections 7.1 and 7.2.

7.1 High-dimensional $\alpha$-control of $mpBFDR$ and $pBFDR$ for the non-marginal method

The following theorem provides the interval for the maximum $mpBFDR$ that can be incurred asymptotically in the high-dimensional setup.

Theorem 13

In addition to (A1)-(A2), assume the following:

  1. (B)

For each $n>1$, let each group in a particular set of $m_{1n}~(<m_{n})$ groups out of the total $m_{n}$ groups be associated with at least one false null hypothesis, and let all the null hypotheses associated with the remaining $m_{n}-m_{1n}$ groups be true. Let us further assume that the latter $m_{n}-m_{1n}$ groups do not have any overlap with the remaining $m_{1n}$ groups. Without loss of generality, assume that $G_{1n},\ldots,G_{m_{1n}}$ are the groups each consisting of at least one false null, and $G_{m_{1n}+1},G_{m_{1n}+2},\ldots,G_{m_{n}}$ are the groups where all the null hypotheses are true. Assume further the following limits:

\lim_{n\rightarrow\infty}\frac{m_{1n}}{m_{n}}=q\in(0,1); (7.1)
\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}d^{t}_{i}}{m_{n}}=p\in(0,1); (7.2)
\lim_{n\rightarrow\infty}m_{n}e^{-nc}=0~\mbox{for all}~c>0. (7.3)

Then the maximum $mpBFDR$ that can be incurred asymptotically lies in $\left(0,\frac{1-q}{1+p-q}\right)$.

Remark 14

If $p$ is close to zero, that is, if all but a finite number of null hypotheses are true, then $\frac{1-q}{1+p-q}\approx 1$, showing that in such cases, better $\alpha$-control can be exercised. Indeed, as the proof of the theorem shows, the optimal decision in this case will be given by all but a finite set of one's, so that all but a finite number of decisions are correct. Hence, maximum error occurs in this case. Also, if $q$ is close to $1$, then $\frac{1-q}{1+p-q}\approx 0$. In other words, if all but a finite number of groups are associated with at least one false null hypothesis, then almost no error can be incurred. As the proof of Theorem 13 shows, this is the case where all but a finite number of decisions are correct, and hence, it is not surprising that almost no error can be incurred in this case.

Remark 15

Also, as in the fixed-dimensional case, Theorem 13 holds if, for at least one $i\in\{1,\ldots,m_{n}\}$, $G_{i}\subset\{1,\ldots,m_{n}\}$. But if $G_{i}=\{1,\ldots,m_{n}\}$ for $i=1,\ldots,m_{n}$, then $mpBFDR\rightarrow 0$ as $n\rightarrow\infty$, for any sequence $\beta_{n}\in[0,1]$.

Remark 16

Note that, in the same way as in the fixed-dimensional setup, Theorem 13 remains valid even for $mFDR_{\boldsymbol{X}_{n}}$, thanks to its monotonicity with respect to $\beta$, the property crucially used to prove Theorem 13.

The following theorem shows that for feasible values of $\alpha$ attained asymptotically by the maximum of $mpBFDR$, for appropriate sequences of penalizing constants $\beta_{n}$, it is possible to asymptotically approach such $\alpha$ through $mpBFDR_{\beta_{n}}$, where $mpBFDR_{\beta}$ denotes the $mpBFDR$ for the non-marginal procedure with penalizing constant $\beta$.

Theorem 17

Suppose that

\lim_{n\rightarrow\infty}mpBFDR_{\beta=0}=E. (7.4)

Then, for any $\alpha<E$ with $\alpha\in\left(0,\frac{1-q}{1+p-q}\right)$, under condition (B), there exists a sequence $\beta_{n}\rightarrow 0$ such that $mpBFDR_{\beta_{n}}\rightarrow\alpha$ as $n\rightarrow\infty$.

From the proofs of Theorems 13 and 17, it can be seen that replacing $w_{in}(\hat{\boldsymbol{d}}(m_{n}))$ by $v_{in}$ does not affect the results. Hence we state the following corollary.

Corollary 18

Let $pBFDR_{\beta}$ denote the $pBFDR$ corresponding to the non-marginal procedure with penalizing constant $\beta$. Suppose that

\lim_{n\rightarrow\infty}pBFDR_{\beta=0}=E^{\prime}.

Then, for any $\alpha<E^{\prime}$ with $\alpha\in\left(0,\frac{1-q}{1+p-q}\right)$, under condition (B), there exists a sequence $\beta_{n}\rightarrow 0$ such that $pBFDR_{\beta_{n}}\rightarrow\alpha$ as $n\rightarrow\infty$.

As in the fixed-dimensional setup, we see that for $\alpha$-control we must have $\lim_{n\rightarrow\infty}\beta_{n}=0$, and that for $\liminf_{n\rightarrow\infty}\beta_{n}>0$, $mpBFDR$ tends to zero. In other words, even in the high-dimensional setup, $\alpha$-control requires a sequence $\beta_{n}$ that is smaller than that for which $mpBFDR$ tends to zero.

Since the additive loss function based methods are special cases of the non-marginal procedure with $G_{i}=\{i\}$ for all $i$ (see Chandra and Bhattacharya (2019), Chandra and Bhattacharya (2020)), and since in such cases $mpBFDR$ reduces to $pBFDR$, it is important to investigate asymptotic $\alpha$-control of $pBFDR$ in this situation. Our result in this direction is provided in Theorem 19.

Theorem 19

Let $m_{0n}~(<m_{n})$ be the number of true null hypotheses such that $m_{0n}/m_{n}\rightarrow p_{0}\in(0,1)$, as $n\rightarrow\infty$. Then for any $0<\alpha<p_{0}$, there exists a sequence $\beta_{n}\rightarrow 0$ as $n\rightarrow\infty$ such that for the additive loss function based methods

\underset{n\rightarrow\infty}{\lim}~pBFDR_{\beta_{n}}=\alpha.

The result is similar in spirit to that obtained by Chandra and Bhattacharya (2020) in the corresponding fixed-dimensional situation. The limit of $m_{0n}/m_{n}$ in the high-dimensional setup, instead of $m_{0}/m$ in the fixed-dimensional case, plays the central role here.

Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020) noted that even for additive loss function based multiple testing procedures, $mpBFDR$ may be a more desirable candidate than $pBFDR$, since it can yield non-marginal decisions even if the multiple testing criterion to be optimized is a simple sum of loss functions designed to yield marginal decisions. The following theorem shows that the same high-dimensional asymptotic result as Theorem 19 also holds for $mpBFDR$ in the case of additive loss functions, without requiring condition (B). Non-requirement of condition (B) even in the high-dimensional setup can be attributed to the fact that $mpBFDR(\mathcal{M})\geq pBFDR(\mathcal{M})$ for any multiple testing method $\mathcal{M}$, for arbitrary sample size.

Theorem 20

Let $m_{0n}~(<m_{n})$ be the number of true null hypotheses such that $m_{0n}/m_{n}\rightarrow p_{0}\in(0,1)$, as $n\rightarrow\infty$. Let $\alpha$ be the desired level of significance, where $0<\alpha<p_{0}$. Then there exists a sequence $\beta_{n}\rightarrow 0$ as $n\rightarrow\infty$ such that for the additive loss function based method

\underset{n\rightarrow\infty}{\lim}~mpBFDR_{\beta_{n}}=\alpha.

Note that Bayesian versions of $FDR$ (conditional on the data) need not be continuous with respect to $\beta$, and so results for such Bayesian versions analogous to Theorem 17, Corollary 18 and Theorems 19 and 20, which heavily use this continuity property, could not be established.

Thus, interestingly, all the asymptotic results for $\alpha$-control of versions of $FDR$ in the fixed-dimensional setup admit simple extensions to the high-dimensional setup, with minimal assumptions regarding the growth rate of $m_{n}$, given that Shalizi's conditions hold. Since Shalizi's conditions are meant for posterior consistency, from the multiple testing perspective our high-dimensional results are very interesting in the sense that almost no extra assumptions beyond Shalizi's conditions are required for our multiple testing results to carry over from fixed dimensions to high dimensions.

7.2 High-dimensional properties of Type-II errors when $mpBFDR$ and $pBFDR$ are asymptotically controlled at $\alpha$

In this section, we investigate the high-dimensional asymptotic theory for $FNR_{\boldsymbol{X}_{n}}$ and $pBFNR$ associated with $\alpha$-control of versions of $FDR$. Our results in this regard are provided as Theorem 21 and Corollary 22.

Theorem 21

Assume condition (B) and that $n^{-1}\log m_{n}\rightarrow 0$ as $n\rightarrow\infty$. Then, for asymptotic $\alpha$-control of $mpBFDR$ in the non-marginal procedure, the following holds almost surely:

\limsup_{n\rightarrow\infty}\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}\leq-J.

The above theorem requires only the very mild assumption that $n^{-1}\log m_{n}\rightarrow 0$, as $n\rightarrow\infty$, in addition to (B). The result shows that $FNR_{\boldsymbol{X}_{n}}$ converges to zero at an exponential rate, but again the exact limit of $\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}$ is not available in this high-dimensional setup. This is slightly disconcerting in the sense that we are now unable to compare the rates of convergence of $FNR_{\boldsymbol{X}_{n}}$ between the cases where $\alpha$-control is and is not imposed. Indeed, for the fixed-dimensional setup, Chandra and Bhattacharya (2020) could obtain exact limits and consequently show that $FNR_{\boldsymbol{X}_{n}}$ converges to zero at a rate faster than or equal to that in the case where $\alpha$-control is not exercised. However, as we already argued in the context of versions of $FDR$, exact limits are not expected to be available in these cases for high dimensions.

Corollary 22

Assume condition (B) and that $n^{-1}\log m_{n}\rightarrow 0$ as $n\rightarrow\infty$. Then, for asymptotic $\alpha$-control of $mpBFDR$ in the non-marginal procedure, the following holds:

\lim_{n\rightarrow\infty}pBFNR=0.

Thus, as in the fixed-dimensional setup, Corollary 22 shows that, corresponding to $\alpha$-control, $pBFNR$ converges to zero even in the high-dimensional setup, though the rate of convergence to zero is unavailable.

8 Illustration of consistency of our non-marginal multiple testing procedure in time-varying covariate selection in an autoregressive process

Let the true model $P$ stand for the following $AR(1)$ model with time-varying covariates:

x_{t}=\rho_{0}x_{t-1}+\sum_{i=0}^{m}\beta_{i0}z_{it}+\epsilon_{t},~t=1,2,\ldots,n, (8.1)

where $x_{0}\equiv 0$, $|\rho_{0}|<1$ and $\epsilon_{t}\stackrel{iid}{\sim}N(0,\sigma^{2}_{0})$, for $t=1,2,\ldots,n$. In (8.1), $m\equiv m_{n}\rightarrow\infty$ as $n\rightarrow\infty$. Here $\left\{z_{it}:t=1,2,\ldots\right\}$ are relevant time-varying covariates. We set $z_{0t}\equiv 1$ for all $t$.

Now let the data be modeled by the same model as $P$, but with $\rho_{0}$, $\beta_{i0}$ and $\sigma^{2}_{0}$ replaced by the unknown quantities $\rho$, $\beta_{i}$ and $\sigma^{2}$, respectively; that is,

x_{t}=\rho x_{t-1}+\sum_{i=0}^{m}\beta_{i}z_{it}+\epsilon_{t},~t=1,2,\ldots,n, (8.2)

where we set $x_{0}\equiv 0$ and $\epsilon_{t}\stackrel{iid}{\sim}N(0,\sigma^{2})$, for $t=1,2,\ldots,n$.

For notational purposes, we let $\boldsymbol{z}_{mt}=(z_{0t},z_{1t},\ldots,z_{mt})^{\prime}$, $\boldsymbol{z}_{t}=(z_{0t},z_{1t},\ldots)^{\prime}$, $\boldsymbol{\beta}_{m0}=(\beta_{00},\beta_{10},\ldots,\beta_{m0})^{\prime}$, $\boldsymbol{\beta}_{m}=(\beta_{0},\beta_{1},\ldots,\beta_{m})^{\prime}$ and $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\ldots)^{\prime}$.
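To fix ideas, the following minimal Python sketch simulates data from the true model (8.1); the covariate distribution, the dimensions, the sparse coefficient pattern and the seed are illustrative assumptions of ours, not prescribed by the theory.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 500                              # m = m_n >> n: ultra high-dimensional regime
rho0, sigma0 = 0.5, 1.0

beta0 = np.zeros(m + 1)                      # sparse truth; beta_{00} is the intercept
beta0[[0, 3, 7]] = [1.0, -0.8, 0.6]

z = rng.uniform(-1.0, 1.0, size=(n, m + 1))  # time-varying covariates z_{it}
z[:, 0] = 1.0                                # z_{0t} = 1 for all t

x = np.zeros(n + 1)                          # x[0] = x_0 = 0
for t in range(1, n + 1):
    x[t] = rho0 * x[t - 1] + z[t - 1] @ beta0 + sigma0 * rng.normal()
```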

8.1 The ultra high-dimensional setup

Let us first consider the setup where $\frac{m_{n}}{n}\rightarrow\infty$ as $n\rightarrow\infty$. This is a challenging problem, and we require notions of sparsity to address it. As will be shown subsequently in Section 8.2, a precise notion of sparsity is available for our problem in the context of the equipartition property. Specifically, sparsity in our problem entails controlling relevant quadratic forms of $\boldsymbol{\beta}$. For such sparsity, we must devise a prior for $\boldsymbol{\beta}$ such that $\|\boldsymbol{\beta}\|<\infty$. We also assume that $\|\boldsymbol{\beta}_{0}\|<\infty$.

For appropriate prior structures for $\boldsymbol{\beta}$, let us consider the following strategy. First, consider an almost surely continuously differentiable random function $\tilde{\eta}(\cdot)$ on a compact space $\mathcal{X}$, such that

\|\tilde{\eta}\|=\underset{\tilde{\mathbf{x}}\in\mathcal{X}}{\sup}~|\tilde{\eta}(\tilde{\mathbf{x}})|<\infty,~\mbox{almost surely.} (8.3)

We denote the class of such functions by $\mathcal{C}^{\prime}(\mathcal{X})$. A popular prior for $\mathcal{C}^{\prime}(\mathcal{X})$ is the Gaussian process prior with sufficiently smooth covariance function, in which case both $\tilde{\eta}$ and $\tilde{\eta}^{\prime}$ are Gaussian processes; see, for example, Cramer and Leadbetter (1967). Let us now consider an arbitrary sequence $\left\{\tilde{\mathbf{x}}_{i}:i=1,2,\ldots\right\}$, and let $\tilde{\boldsymbol{\beta}}=\left(\tilde{\beta}_{1},\tilde{\beta}_{2},\ldots\right)^{\prime}$, where, for $i=1,2,\ldots$, $\tilde{\beta}_{i}=\tilde{\eta}(\tilde{\mathbf{x}}_{i})$. We then define $\beta_{i}=\gamma_{i}\tilde{\beta}_{i}$, where, for $i=1,2,\ldots$, the $\gamma_{i}$ are independent (but non-identical) random variables such that $0<|\gamma_{i}|<L<\infty$ for $i\geq 1$, and

i=1|γi|<,almost surely.\sum_{i=1}^{\infty}|\gamma_{i}|<\infty,~{}\mbox{almost surely.} (8.4)

Also, let $\rho\in\mathbb{R}$ and $\sigma\in(0,\infty)=\mathbb{R}^{+}$. Thus, $\boldsymbol{\theta}=(\tilde{\eta},\boldsymbol{\gamma},\rho,\sigma)$, where $\boldsymbol{\gamma}=(\gamma_1,\gamma_2,\ldots)^{\prime}$, and the parameter space is $\boldsymbol{\Theta}=\mathcal{C}^{\prime}(\mathcal{X})\times\mathbb{R}^{\infty}\times\mathbb{R}\times\mathbb{R}^{+}$. For our asymptotic theories regarding the multiple testing methods that we consider, we must verify the assumptions of Shalizi for the modeling setups (8.1) and (8.2), with this parameter space.
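A minimal sketch of this prior construction follows, assuming a squared-exponential Gaussian process kernel on $\mathcal{X}=[0,1]$ and geometrically damped $\gamma_i$; both are illustrative choices, as any kernel smooth enough for (8.3) and any $\gamma_i$ satisfying (8.4) would do.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50
xs = np.linspace(0.0, 1.0, m)          # arbitrary sequence of points in the compact space [0, 1]
# Squared-exponential kernel: draws are almost surely smooth, so (8.3) holds
K = np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / 0.1 ** 2)
eta = rng.multivariate_normal(np.zeros(m), K + 1e-10 * np.eye(m))  # eta_tilde at the points x_i

L = 1.0
gamma = rng.uniform(0.5, 1.0, m) * 0.8 ** np.arange(1, m + 1)  # 0 < gamma_i < L, summable, so (8.4) holds
beta = gamma * eta                      # beta_i = gamma_i * eta_tilde(x_i)
```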

With respect to the above ultra high-dimensional setup, we consider the following multiple-testing framework:

H01:|ρ|<1 versus H11:|ρ|1 and\displaystyle H_{01}:|\rho|<1\text{ versus }H_{11}:|\rho|\geq 1\text{ and}
H0,i+2:βi𝒩0 versus H1,i+2:βi𝒩0c, for i=0,,m,\displaystyle H_{0,i+2}:\beta_{i}\in\mathcal{N}_{0}\text{ versus }H_{1,i+2}:\beta_{i}\in\mathcal{N}^{c}_{0},~{}\text{ for }~{}i=0,\ldots,m, (8.5)

where 𝒩0\mathcal{N}_{0} is some neighborhood of zero and 𝒩0c\mathcal{N}^{c}_{0} is the complement of the neighborhood in the relevant parameter space.

Verification of consistency of our non-marginal procedure amounts to verification of assumptions (S1)(S7) of Shalizi for the above setup. In this regard, we make the following assumptions:

  • (B1)

    supt1zt<\underset{t\geq 1}{\sup}~{}\|z_{t}\|<\infty, where, for t1t\geq 1, zt=supi1|zit|\|z_{t}\|=\underset{i\geq 1}{\sup}~{}|z_{it}|.

  • (B2)

    For k>1k>1, let λ~nk\tilde{\lambda}_{nk} be the largest eigenvalue of t=1n𝒛m,t+k𝒛mtn\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}. We assume that λ~nk0\tilde{\lambda}_{nk}\rightarrow 0, as nn\rightarrow\infty, for k>1k>1.

  • (B3)

Let $\lambda_n$ be the largest eigenvalue of $\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}$. We assume that $\sup_{n\geq 1}\lambda_n\leq K<\infty$. (A numerical check of (B2) and (B3) is sketched after this list.)

  • (B4)
    1nt=1n𝜷m𝒛mt0almost surely;1nt=1n𝜷m0𝒛mt0;\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\rightarrow 0~{}\mbox{almost surely};~{}~{}\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m0}\boldsymbol{z}_{mt}\rightarrow 0; (8.6)
    1nt=1n𝜷m𝒛mt𝒛mt𝜷mc(𝜷)almost surely;1nt=1n𝜷m0𝒛mt𝒛mt𝜷m0c(𝜷0),\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}\rightarrow c(\boldsymbol{\beta})~{}\mbox{almost surely};~{}~{}\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m0}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}\rightarrow c(\boldsymbol{\beta}_{0}), (8.7)
    1nt=1n𝜷m𝒛mt𝒛mt𝜷m0c10(𝜷,𝜷0)almost surely,\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}\rightarrow c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})~{}\mbox{almost surely}, (8.8)

    as nn\rightarrow\infty. In the above, c(𝜷0)(>0)c(\boldsymbol{\beta}_{0})~{}(>0) is a finite constant; c(𝜷)(>0)c(\boldsymbol{\beta})~{}(>0) and c10(𝜷,𝜷0)c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) are finite quantities that depend upon the choice of the sequence {𝜷m;n=1,2,}\left\{\boldsymbol{\beta}_{m};n=1,2,\ldots\right\}.

  • (B5)

    The limits of the quantities 𝒛t𝜷\boldsymbol{z}^{\prime}_{t}\boldsymbol{\beta} for almost all 𝜷\boldsymbol{\beta}, 𝒛t𝜷0\boldsymbol{z}^{\prime}_{t}\boldsymbol{\beta}_{0} and ϱ^t=k=1tρ0tk𝒛k𝜷0\hat{\varrho}_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{k}\boldsymbol{\beta}_{0} exist as tt\rightarrow\infty.

  • (B6)

    There exist positive constants α\alpha, cρc_{\rho}, cσc_{\sigma}, cη~c_{\tilde{\eta}}, cη~c_{\tilde{\eta}^{\prime}} and cγc_{\gamma} such that the following hold for sufficiently large nn:

    π(|ρ|>exp((αn)1/16))\displaystyle\pi\left(|\rho|>\exp(\left(\alpha n\right)^{1/16})\right) cρexp(αn);\displaystyle\leq c_{\rho}\exp\left(-\alpha n\right);
    π(exp((αn)1/16)σexp((αn)1/16))\displaystyle\pi\left(\exp(-\left(\alpha n\right)^{1/16})\leq\sigma\leq\exp(\left(\alpha n\right)^{1/16})\right) 1cσexp(αn);\displaystyle\geq 1-c_{\sigma}\exp\left(-\alpha n\right);
    π(η~exp((αn)1/16))\displaystyle\pi\left(\|\tilde{\eta}\|\geq\exp(\left(\alpha n\right)^{1/16})\right) cη~exp(αn);\displaystyle\leq c_{\tilde{\eta}}\exp\left(-\alpha n\right);
    π(η~exp((αn)1/16))\displaystyle\pi\left(\|\tilde{\eta}^{\prime}\|\geq\exp(\left(\alpha n\right)^{1/16})\right) cη~exp(αn);\displaystyle\leq c_{\tilde{\eta}^{\prime}}\exp\left(-\alpha n\right);
    π(i=1|γi|exp((αn)1/16))\displaystyle\pi\left(\sum_{i=1}^{\infty}|\gamma_{i}|\geq\exp(\left(\alpha n\right)^{1/16})\right) cγexp(αn),\displaystyle\leq c_{\gamma}\exp\left(-\alpha n\right),
  • (B7)

    L(mn+1mn)exp((α(n+1))1/16)exp((αn)1/16)L(m_{n+1}-m_{n})\leq\exp(\left(\alpha(n+1)\right)^{1/16})-\exp(\left(\alpha n\right)^{1/16}), for nn0n\geq n_{0}, for some n01n_{0}\geq 1.
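Conditions (B2) and (B3) are straightforward to probe numerically for a given design. The following sketch, assuming iid standard normal covariates purely for illustration, computes the two eigenvalue quantities; for the non-symmetric lagged matrix in (B2), the operator norm, which bounds the largest eigenvalue in modulus, is used.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 2000, 50, 2
z = rng.standard_normal((n + k, m))     # illustrative iid design

S0 = z[:n].T @ z[:n] / n                # (1/n) sum_t z_mt z_mt'
lam_n = np.linalg.eigvalsh(S0).max()    # (B3): stays bounded as n grows

Sk = z[k:n + k].T @ z[:n] / n           # (1/n) sum_t z_{m,t+k} z_mt'
lam_nk = np.linalg.svd(Sk, compute_uv=False).max()  # operator norm -> 0, consistent with (B2)
print(lam_n, lam_nk)
```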

8.2 Discussion of the assumptions in the light of the ultra high-dimensional setup

Condition (B1) holds if the covariates $\{z_{it}:i\geq 1,t\geq 1\}$ are a realization of some stochastic process with almost surely finite sup-norm, for example, a Gaussian process. Assumption (B1), along with (8.3) and (8.4), leads to the following result:

|𝒛mt𝜷m0|<C,|\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}|<C, (8.9)

for some $C>0$. To see this, first let $\boldsymbol{\beta}_0$ correspond to the true quantities $\boldsymbol{\gamma}_0$ and $\tilde{\eta}_0$. Then observe that $|\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}|\leq\sum_{i=1}^{m}|z_{it}||\beta_{i0}|\leq\sup_{t\geq 1}\|z_t\|\,\|\tilde{\eta}_0\|\sum_{i=1}^{\infty}|\gamma_{i0}|<C$, since $\sup_{t\geq 1}\|z_t\|<\infty$ by (B1), $\|\tilde{\eta}_0\|<\infty$ by (8.3) and $\sum_{i=1}^{\infty}|\gamma_{i0}|<\infty$ by (8.4). Condition (B1) is required for some limit calculations and for boundedness of some norms associated with concentration inequalities.

Condition (B2) says that the covariates at different time points, after scaling by $\sqrt{n}$, are asymptotically orthogonal. This condition also implies the following:

1nt=1n𝜷m𝒛m,t+k𝒛mt𝜷m0almost surely, and1nt=1n𝜷m0𝒛m,t+k𝒛mt𝜷m00for anyk>1;\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}\rightarrow 0~{}\mbox{almost surely, and}~{}~{}\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m0}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}\rightarrow 0~{}\mbox{for any}~{}k>1; (8.10)

To see (8.10), observe that

1nt=1n𝜷m𝒛m,t+k𝒛mt𝜷m=𝜷m(t=1n𝒛m,t+k𝒛mtn)𝜷m𝜷m2(t=1n𝒛m,t+k𝒛mtn)op.\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}=\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}\leq\|\boldsymbol{\beta}_{m}\|^{2}\left\|\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\right\|_{op}. (8.11)

In (8.11), 𝜷m\|\boldsymbol{\beta}_{m}\| denotes the Euclidean norm of 𝜷m\boldsymbol{\beta}_{m} and for any matrix 𝑨\boldsymbol{A}, 𝑨op\|\boldsymbol{A}\|_{op} denotes the operator norm of 𝑨\boldsymbol{A} given by 𝑨op=sup𝒖=1𝑨𝒖\|\boldsymbol{A}\|_{op}=\underset{\|\boldsymbol{u}\|=1}{\sup}~{}\|\boldsymbol{A}\boldsymbol{u}\|. By (B2), (t=1n𝒛m,t+k𝒛mtn)op0\left\|\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\right\|_{op}\rightarrow 0 as nn\rightarrow\infty. Also,

𝜷m2i=1γi2β~i2η~2i=1γi2<,almost surely,\|\boldsymbol{\beta}_{m}\|^{2}\leq\sum_{i=1}^{\infty}\gamma^{2}_{i}\tilde{\beta}^{2}_{i}\leq\|\tilde{\eta}\|^{2}\sum_{i=1}^{\infty}\gamma^{2}_{i}<\infty,~{}\mbox{almost surely}, (8.12)

by (8.3) and (8.4). It follows from (8.12) that (8.11) is almost surely finite. This and (B2) together imply the first part of the limit (8.10). Since $\|\boldsymbol{\beta}_0\|<\infty$, the second limit of (8.10) follows in the same way.

As shown in Section 8.3, $\lambda_n\rightarrow 0$ as $n\rightarrow\infty$ even when $\sup_{t=1,\ldots,n}\|\boldsymbol{z}_{mt}\|=O(n^r)$ for some $r<1$, that is, even if (B1) does not hold. Since we assume only that $\lambda_n$ is bounded above, (B3) is a reasonably mild assumption.

In (B4), (8.6) can be made to hold in practice by centering the covariates, that is, by setting $\tilde{\boldsymbol{z}}_{mt}=\boldsymbol{z}_{mt}-\bar{\boldsymbol{z}}_m$, where $\bar{\boldsymbol{z}}_m=\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{z}_{mt}$. In (8.7) of (B4) we assume that $c(\boldsymbol{\beta})$ and $c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_0)$ remain finite for any choice of the sequence $\{\boldsymbol{\beta}_m;n=1,2,\ldots\}$. To see that finiteness holds, first note that

$\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}=\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}\leq\|\boldsymbol{\beta}_{m}\|^{2}\left\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right\|_{op}.$ (8.13)

In (8.13), $\|\boldsymbol{\beta}_m\|<\infty$ almost surely, by (8.12), and $\left\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right\|_{op}<\infty$ by (B3). Hence, (8.13) is almost surely finite. Similarly, $\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_m\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}=\boldsymbol{\beta}^{\prime}_m\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m0}\leq\|\boldsymbol{\beta}_m\|\|\boldsymbol{\beta}_{m0}\|\left\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right\|_{op}$, which is again almost surely finite due to (8.3), (8.4) and (B3). Thus, (8.3) and (8.4) are precisely the conditions that induce sparsity within our model, in the sense of controlling the quadratic forms involving $\boldsymbol{\beta}_m$ and $\boldsymbol{\beta}_{m0}$, given that (B4) holds. The assumptions on the existence of the limits are required for conditions (S2) and (S3) of Shalizi. As can be observed from Section 8.3, $\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_m\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_m\rightarrow 0$ almost surely as $n\rightarrow\infty$ if the asymptotically orthogonal covariates satisfy $\sup_{t=1,\ldots,n}\|\boldsymbol{z}_{mt}\|=O(n^r)$ with $r<1$, that is, even if (B1) does not hold. Hence, in this situation, the required limits of the quadratic forms exist and are zero, under very mild conditions.

Again, the limit existence assumption (B5) is required for verification of conditions (S2) and (S3) of Shalizi.

Assumption (B6), required to satisfy condition (S5) of Shalizi, is reasonably mild. The threshold exp((αn)1/16)\exp(\left(\alpha n\right)^{1/16}) for the probabilities involving η~\|\tilde{\eta}\| and η~\|\tilde{\eta}^{\prime}\| can be replaced with the order of n\sqrt{n} for Gaussian process priors or for independent sub-Gaussian components of 𝜷\boldsymbol{\beta}. However, note that priors such as gamma or inverse gamma for σ\sigma do not necessarily satisfy the condition. In such cases, one can modify the prior by replacing the tail part of the prior, after an arbitrarily large positive value, with a thin-tailed prior, such as normal. In practice, such modified priors would be effectively the same as gamma or inverse gamma priors, and yet would satisfy the conditions of (B6).

Assumption (B7), in conjunction with the boundedness of $|\gamma_i|$ by $L$ for all $i$, is a mild condition ensuring that the $\mathcal{G}_n$ are increasing in $n$ for $n\geq n_0$, for some $n_0\geq 1$.

8.3 High-dimensional but not ultra high-dimensional setup

The setup discussed so far deals with the so-called ultra high-dimensional problem, in the sense that $m_n/n\rightarrow\infty$ as $n\rightarrow\infty$. This is a challenging problem to address, and we required a prior for $\boldsymbol{\beta}$ satisfying $\|\boldsymbol{\beta}\|<\infty$ almost surely. However, if we are only interested in the problem where $m_n/n\rightarrow 0$ as $n\rightarrow\infty$, then it is not necessary to insist on priors ensuring finiteness of $\|\boldsymbol{\beta}\|$. For example, if the covariates $\boldsymbol{z}_{mt}$ are orthogonal, then assuming that

supt=1,,n𝒛mt=O(nr),wherer<1,\underset{t=1,\ldots,n}{\sup}~{}\|\boldsymbol{z}_{mt}\|=O(n^{r}),~{}\mbox{where}~{}r<1, (8.14)

1nt=1n𝒛mt𝒛mt\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt} has maximum eigenvalue O(nr1)O(n^{r-1}), so that (8.11) entails

1nt=1n𝜷m𝒛mt𝒛mt𝜷m=O(𝜷m2nr1).\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}=O\left(\|\boldsymbol{\beta}_{m}\|^{2}n^{r-1}\right). (8.15)

Now, if the components of 𝜷m\boldsymbol{\beta}_{m} are independent and sub-Gaussian with mean zero, then by the Hanson-Wright inequality (see, for example, Rudelson and Vershynin (2013)) we have

P(|t=1mβt2t=1mE(βt2)|>n1rt=1mE(βt2))\displaystyle P\left(\left|\sum_{t=1}^{m}\beta^{2}_{t}-\sum_{t=1}^{m}E(\beta^{2}_{t})\right|>n^{1-r}-\sum_{t=1}^{m}E(\beta^{2}_{t})\right)
2exp(L1min{(n1rt=1mE(βt2))2L24m,n1rt=1mE(βt2)L22}),\displaystyle\qquad\leq 2\exp\left(-L_{1}\min\left\{\frac{\left(n^{1-r}-\sum_{t=1}^{m}E(\beta^{2}_{t})\right)^{2}}{L^{4}_{2}m},\frac{n^{1-r}-\sum_{t=1}^{m}E(\beta^{2}_{t})}{L^{2}_{2}}\right\}\right), (8.16)

where $L_1>0$ is some constant and $L_2$ is the upper bound of the sub-Gaussian norms. Let $\tilde{m}=\sum_{t=1}^{m}E(\beta^2_t)$. If $\frac{n^{1-r}-\tilde{m}}{\sqrt{\tilde{m}}}\rightarrow\tilde{c}~(>0)$, where $\tilde{c}$ is finite or infinite, then (8.16) is summable. Hence, by the Borel-Cantelli lemma, $\sum_{t=1}^{m}\beta^2_t\leq n^{1-r}$ almost surely for all sufficiently large $n$. It then follows from (8.15) that $\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_m\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_m<\infty$ almost surely as $n\rightarrow\infty$.
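The Borel-Cantelli argument above is easy to check by simulation. A minimal sketch follows, assuming iid standard normal (hence sub-Gaussian) coefficients and the illustrative growth rate $m_n=n^{0.4}$ with $r=0.5$, so that $(n^{1-r}-\tilde{m})/\sqrt{\tilde{m}}\rightarrow\infty$.

```python
import numpy as np

rng = np.random.default_rng(3)
r = 0.5
for n in [10**3, 10**4, 10**5, 10**6]:
    m = int(n ** 0.4)                 # m_n / n -> 0 (illustrative growth rate)
    beta = rng.standard_normal(m)     # iid sub-Gaussian coefficients
    # The event sum beta_t^2 > n^{1-r} should (eventually) never occur
    print(n, m, (beta ** 2).sum() <= n ** (1 - r))
```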

For the non-ultra high-dimensional setup, the problem is largely simplified. Indeed, introduction of $\tilde{\eta}$ and $\tilde{\eta}^{\prime}$ is not required, as we can directly consider sub-Gaussian priors for $\boldsymbol{\beta}$ as detailed above. Consequently, in (B6), only the first two inequalities are needed, and assumption (B7) is no longer required. Since the ultra high-dimensional setup is far more challenging than the non-ultra high-dimensional one, we consider only the former setup for our purpose, and note that the latter can be dealt with using almost the same ideas but with much less effort.

Assumptions (B1)–(B6) lead to the following results that are the main ingredients in proving our posterior convergence in the ultra high-dimensional setup.

Lemma 23

Under (B1), (B2) and (B5), the KL-divergence rate h(𝛉)h(\boldsymbol{\theta}) exists for each 𝛉𝚯\boldsymbol{\theta}\in\boldsymbol{\Theta} and is given by

h(𝜽)=log(σσ0)+(12σ212σ02)(σ021ρ02+c(𝜷0)1ρ02)+(ρ22σ2ρ022σ02)(σ021ρ02+c(𝜷0)1ρ02)+c(𝜷)2σ2c(𝜷0)2σ02(ρσ2ρ0σ02)(ρ0σ021ρ02+ρ0c(𝜷0)1ρ02)(c10(𝜷,𝜷0)σ2c(𝜷0)σ02).h(\boldsymbol{\theta})=\log\left(\frac{\sigma}{\sigma_{0}}\right)+\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)\\ +\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)+\frac{c(\boldsymbol{\beta})}{2\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{2\sigma^{2}_{0}}\\ -\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)-\left(\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right). (8.17)
Theorem 24

Under (B1), (B2) and (B5), the asymptotic equipartition property holds and is given by

limn1nlogRn(𝜽)=h(𝜽).\underset{n\rightarrow\infty}{\lim}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})=-h(\boldsymbol{\theta}).

Furthermore, the convergence is uniform on any compact subset of 𝚯\boldsymbol{\Theta}.
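Lemma 23 and Theorem 24 can also be probed by simulation. The following Monte Carlo sketch (no part of the formal verification; the iid Gaussian covariate design and all numerical values are assumptions for illustration) compares $\frac{1}{n}\log R_n(\boldsymbol{\theta})$, computed from data simulated under $P$, with $-h(\boldsymbol{\theta})$ from (8.17). Under this design, by the strong law of large numbers, $c(\boldsymbol{\beta})=\|\boldsymbol{\beta}_m\|^2$, $c(\boldsymbol{\beta}_0)=\|\boldsymbol{\beta}_{m0}\|^2$ and $c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_0)=\boldsymbol{\beta}^{\prime}_m\boldsymbol{\beta}_{m0}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100_000, 5
rho0, sigma0, rho, sigma = 0.5, 1.0, 0.3, 1.2        # true and postulated parameters (illustrative)
beta0 = np.array([0.5, -0.3, 0.2, 0.1, -0.1])
beta = np.array([0.4, -0.2, 0.3, 0.0, -0.1])

z = rng.standard_normal((n, m))                      # iid covariates: c(beta) = ||beta||^2, etc.
x = np.zeros(n + 1)
for t in range(n):
    x[t + 1] = rho0 * x[t] + z[t] @ beta0 + sigma0 * rng.standard_normal()

# (1/n) log R_n(theta): average log likelihood ratio of theta against theta_0
res = x[1:] - rho * x[:-1] - z @ beta
res0 = x[1:] - rho0 * x[:-1] - z @ beta0
logRn = (-n * np.log(sigma / sigma0)
         - (res ** 2).sum() / (2 * sigma ** 2)
         + (res0 ** 2).sum() / (2 * sigma0 ** 2))

# h(theta) from (8.17), with V0 = (sigma0^2 + c(beta0)) / (1 - rho0^2)
V0 = (sigma0 ** 2 + beta0 @ beta0) / (1 - rho0 ** 2)
h = (np.log(sigma / sigma0)
     + (0.5 / sigma ** 2 - 0.5 / sigma0 ** 2) * V0
     + (rho ** 2 / (2 * sigma ** 2) - rho0 ** 2 / (2 * sigma0 ** 2)) * V0
     - (rho / sigma ** 2 - rho0 / sigma0 ** 2) * rho0 * V0
     + beta @ beta / (2 * sigma ** 2) - beta0 @ beta0 / (2 * sigma0 ** 2)
     - (beta @ beta0 / sigma ** 2 - beta0 @ beta0 / sigma0 ** 2))
print(logRn / n, -h)                                 # the two numbers should nearly agree
```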

Lemma 23 and Theorem 24 ensure that (S1)(S3) hold, and (S4) holds since h(𝜽)h(\boldsymbol{\theta}) is almost surely finite. (B6) implies that 𝒢n\mathcal{G}_{n} increases to 𝚯\boldsymbol{\Theta}. In Section S-13.5 we verify (S5).

Now observe that the aim of assumption (S6) is to ensure that (see the proof of Lemma 7 of Shalizi (2009)) for every ε>0\varepsilon>0 and for all nn sufficiently large,

1nlog𝒢nRn(𝜽)𝑑π(𝜽)h(𝒢n)+ε,almost surely.\frac{1}{n}\log\int_{\mathcal{G}_{n}}R_{n}(\boldsymbol{\theta})d\pi(\boldsymbol{\theta})\leq-h\left(\mathcal{G}_{n}\right)+\varepsilon,~{}\mbox{almost surely}.

Since h(𝒢n)h(𝚯)h\left(\mathcal{G}_{n}\right)\rightarrow h\left(\boldsymbol{\Theta}\right) as nn\rightarrow\infty, it is enough to verify that for every ε>0\varepsilon>0 and for all nn sufficiently large,

1nlog𝒢nRn(𝜽)𝑑π(𝜽)h(𝚯)+ε,almost surely.\frac{1}{n}\log\int_{\mathcal{G}_{n}}R_{n}(\boldsymbol{\theta})d\pi(\boldsymbol{\theta})\leq-h\left(\boldsymbol{\Theta}\right)+\varepsilon,~{}\mbox{almost surely}. (8.18)

In this regard, first observe that

1nlog𝒢nRn(𝜽)𝑑π(𝜽)\displaystyle\frac{1}{n}\log\int_{\mathcal{G}_{n}}R_{n}(\boldsymbol{\theta})d\pi(\boldsymbol{\theta}) 1nlog[sup𝜽𝒢nRn(𝜽)π(𝒢n)]\displaystyle\leq\frac{1}{n}\log\left[\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}R_{n}(\boldsymbol{\theta})\pi(\mathcal{G}_{n})\right]
=1nlog[sup𝜽𝒢nRn(𝜽)]+1nlogπ(𝒢n)\displaystyle=\frac{1}{n}\log\left[\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}R_{n}(\boldsymbol{\theta})\right]+\frac{1}{n}\log\pi(\mathcal{G}_{n})
=sup𝜽𝒢n1nlogRn(𝜽)+1nlogπ(𝒢n)\displaystyle=\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+\frac{1}{n}\log\pi(\mathcal{G}_{n})
1nsup𝜽𝒢nlogRn(𝜽),\displaystyle\leq\frac{1}{n}\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}\log R_{n}(\boldsymbol{\theta}), (8.19)

where the last inequality holds since $\frac{1}{n}\log\pi(\mathcal{G}_n)\leq 0$. Now, letting $\mathcal{S}=\{\boldsymbol{\theta}:h(\boldsymbol{\theta})\leq\kappa\}$, where $\kappa>h(\boldsymbol{\Theta})$ is as large as desired,

sup𝜽𝒢n1nlogRn(𝜽)\displaystyle\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta}) sup𝜽𝚯1nlogRn(𝜽)=sup𝜽𝒮𝒮c1nlogRn(𝜽)\displaystyle\leq\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})=\underset{\boldsymbol{\theta}\in\mathcal{S}\cup\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})
max{sup𝜽𝒮1nlogRn(𝜽),sup𝜽𝒮c1nlogRn(𝜽)}.\displaystyle\leq\max\left\{\underset{\boldsymbol{\theta}\in\mathcal{S}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta}),\underset{\boldsymbol{\theta}\in\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})\right\}. (8.20)

From (8.17) it is clear that $h(\boldsymbol{\theta})$ is continuous in $\boldsymbol{\theta}$ and that $h(\boldsymbol{\theta})\rightarrow\infty$ as $\|\boldsymbol{\theta}\|\rightarrow\infty$. In other words, $h(\boldsymbol{\theta})$ is a continuous, coercive function. Hence, $\mathcal{S}$ is a compact set (see, for example, Lange (2010)). It then easily follows (see Chatterjee and Bhattacharya (2020)) that

sup𝜽𝒮1nlogRn(𝜽)sup𝜽𝒮h(𝜽)=h(𝒮),almost surely, asn.\underset{\boldsymbol{\theta}\in\mathcal{S}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})\rightarrow\underset{\boldsymbol{\theta}\in\mathcal{S}}{\sup}~{}-h(\boldsymbol{\theta})=-h\left(\mathcal{S}\right),~{}\mbox{almost surely, as}~{}n\rightarrow\infty. (8.21)

We now show that

sup𝜽𝒮c1nlogRn(𝜽)h(𝚯)almost surely, asn.\underset{\boldsymbol{\theta}\in\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})\leq-h\left(\boldsymbol{\Theta}\right)~{}\mbox{almost surely, as}~{}n\rightarrow\infty. (8.22)

First note that if sup𝜽𝒮c1nlogRn(𝜽)>h(𝚯)\underset{\boldsymbol{\theta}\in\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})>-h\left(\boldsymbol{\Theta}\right) infinitely often, then 1nlogRn(𝜽)>h(𝚯)\frac{1}{n}\log R_{n}(\boldsymbol{\theta})>-h\left(\boldsymbol{\Theta}\right) for some 𝜽𝒮c\boldsymbol{\theta}\in\mathcal{S}^{c} infinitely often. But 1nlogRn(𝜽)>h(𝚯)\frac{1}{n}\log R_{n}(\boldsymbol{\theta})>-h\left(\boldsymbol{\Theta}\right) if and only if 1nlogRn(𝜽)+h(𝜽)>h(𝜽)h(𝚯),for𝜽𝒮c.\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})>h(\boldsymbol{\theta})-h\left(\boldsymbol{\Theta}\right),~{}\mbox{for}~{}\boldsymbol{\theta}\in\mathcal{S}^{c}. Hence, if we can show that

P(|1nlogRn(𝜽)+h(𝜽)|>κh(𝚯),for𝜽𝒮cinfinitely often)=0,P\left(\left|\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\right|>\kappa-h\left(\boldsymbol{\Theta}\right),~{}\mbox{for}~{}\boldsymbol{\theta}\in\mathcal{S}^{c}~{}\mbox{infinitely often}\right)=0, (8.23)

then (8.22) will be proved. We use the Borel-Cantelli lemma to prove (8.23); specifically, we prove the following result.

Theorem 25

Under (B1), (8.3) and (8.4),

n=1𝒮cP(|1nlogRn(𝜽)+h(𝜽)|>κh(𝚯))𝑑π(𝜽)<.\sum_{n=1}^{\infty}\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\right|>\kappa-h\left(\boldsymbol{\Theta}\right)\right)d\pi(\boldsymbol{\theta})<\infty. (8.24)

The proof of Theorem 25 heavily uses (8.9), which is ensured by (B1), (8.3) and (8.4). Since $h(\boldsymbol{\theta})$ is continuous, (S7) holds trivially.

We provide detailed verification of the seven assumptions of Shalizi in the supplement, which leads to the following result:

Theorem 26

Under assumptions (B1) – (B6), the non-marginal multiple testing procedure for testing (8.5) is consistent.

Needless to mention, all the results on error convergence of the non-marginal method also continue to hold for this setup under (B1) – (B6), thanks to verification of Shalizi’s conditions.

8.4 Remark on identifiability of our model and posterior consistency

Note that we have modeled $\boldsymbol{\beta}$ in terms of $\boldsymbol{\gamma}$ and $\tilde{\eta}$. From the likelihood it is evident that although $\boldsymbol{\beta}$ is identifiable, $\boldsymbol{\gamma}$ and $\tilde{\eta}$ are not. This is not an issue, however, since our interest is in the posterior of $\boldsymbol{\beta}$, not of $\boldsymbol{\gamma}$ or $\tilde{\eta}$. Indeed, Theorem 3 of Shalizi guarantees that the posterior of the set $\{\boldsymbol{\theta}:h(\boldsymbol{\theta})\leq h(\boldsymbol{\Theta})+\varepsilon\}$ tends to 1 as $n\rightarrow\infty$, for any $\varepsilon>0$. We show in the supplement that $h(\boldsymbol{\Theta})=0$ in our case. Since $h(\boldsymbol{\theta}_0)=0$, where $\boldsymbol{\theta}_0$ is the true parameter (which includes $\boldsymbol{\beta}_0$) and hence lies in $\{\boldsymbol{\theta}:h(\boldsymbol{\theta})<\varepsilon\}$ for any $\varepsilon>0$, it follows that the posterior of $\boldsymbol{\beta}$ is consistent.

9 Summary and conclusion

In this article, we have investigated asymptotic properties of the Bayesian non-marginal procedure under the general dependence structure, when the number of hypotheses also tends to infinity with the sample size. We have specifically shown that our method is consistent even in this setup, that the different Bayesian versions of the error rates converge to zero exponentially fast, and that the expectations of the Bayesian versions with respect to the data also tend to zero. Since our results hold for any choice of the groups, they hold even for singleton groups, that is, for marginal decision rules. The results associated with $\alpha$-control also continue to hold in the same spirit as in the finite-dimensional setup developed in Chandra and Bhattacharya (2020). Interestingly, provided that Shalizi's conditions hold, almost no assumption is required on the growth rate of the number of hypotheses to establish the results of the multiple testing procedures in high dimensions. Although in several cases, unlike the exact fixed-dimensional limits established in Chandra and Bhattacharya (2020), the exact high-dimensional limits associated with the error rates could not be established, exponential convergence to zero in high dimensions could still be achieved. Moreover, internal consistency of our results, as we make the transition from fixed dimension to high dimensions, is always ensured.

An important objective of this research is to show that the finite-dimensional time-varying variable selection problem in the autoregressive setup introduced in Chandra and Bhattacharya (2020) admits extension to the setup where the number of covariates to be selected by our Bayesian non-marginal procedure grows with the sample size. Indeed, we have shown that under reasonable assumptions, our asymptotic theories remain valid for this problem in both the high-dimensional and ultra high-dimensional situations. Different priors for the regression coefficients are of course warranted, and we have discussed the classes of relevant priors for the two setups. As far as we are aware, at least in the time series context, such high-dimensional multiple hypothesis testing has not hitherto been dealt with. The priors that we introduce, particularly in the ultra high-dimensional context, also do not seem to have been considered before. These priors, in conjunction with the equipartition property, help control the sparsity of the model quite precisely. As such, these ideas seem to be of independent interest for general high-dimensional asymptotics.

Supplementary Material

S-10 Proof of Theorem 4

Proof. From conditions (4.2) and (4.3), it follows that there exists n1n_{1} such that for all n>n1n>n_{1}

βn\displaystyle\beta_{n} >β¯δ,\displaystyle>\underline{\beta}-\delta, (S-10.1)
βn\displaystyle\beta_{n} <1δ, such that\displaystyle<1-\delta,\text{ such that} (S-10.2)

β¯δ>0\underline{\beta}-\delta>0 and 1β¯>δ1-\bar{\beta}>\delta, for some δ>0\delta>0. It follows using this, (4.7) and (4.9), that for n>n1n>n_{1},

i:𝒅(mn)𝔻i,mncmnditwin(𝒅t(mn))i:𝒅(mn)𝔻i,mncmndiwin(𝒅(mn))\displaystyle\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) (S-10.3)
>(1en(Jϵ))i:𝒅(mn)𝔻i,mncditen(Jϵ)i:𝒅(mn)𝔻i,mncdi,and\displaystyle\qquad\qquad>\left(1-e^{-n(J-\epsilon)}\right)\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}d^{t}_{i}-e^{-n(J-\epsilon)}\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}d_{i},~{}\mbox{and}
βn(i:𝒅𝔻i,mncmnditi:𝒅(mn)𝔻i,mncmndi)<(1δ)i:𝒅𝔻i,mncmndit(β¯δ)i:𝒅(mn)𝔻i,mncmndi.\displaystyle\beta_{n}\left(\sum_{i:\boldsymbol{d}\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}\right)<(1-\delta)\sum_{i:\boldsymbol{d}\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}-(\underline{\beta}-\delta)\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}. (S-10.4)

Now n1n_{1} can be appropriately chosen such that en(Jϵ)<min{δ,β¯δ}e^{-n(J-\epsilon)}<\min\{\delta,\underline{\beta}-\delta\}. Hence, for n>max{n0,n1}n>\max\{n_{0},n_{1}\},

i:𝒅𝔻i,mncmnditwin(𝒅t(mn))i:𝒅(mn)𝔻i,mncmndiwin(𝒅(mn))>βn(i:𝒅(mn)𝔻i,mncmnditi:𝒅(mn)𝔻i,mncmndi),\displaystyle\sum_{i:\boldsymbol{d}\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))>\beta_{n}\left(\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}\right),
for all𝒅(mn)𝒅t(mn), almost surely;\displaystyle\qquad~{}\mbox{for all}~{}\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n}),\text{ almost surely};
\displaystyle\Rightarrow i=1mndit(win(𝒅t(mn))βn)>i=1mndi(win(𝒅(mn))βn),for all𝒅(mn)𝒅t(mn), almost surely;\displaystyle\sum_{i=1}^{m_{n}}d^{t}_{i}(w_{in}(\boldsymbol{d}^{t}(m_{n}))-\beta_{n})>\sum_{i=1}^{m_{n}}d_{i}(w_{in}(\boldsymbol{d}(m_{n}))-\beta_{n}),~{}\mbox{for all}~{}\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n}),\text{ almost surely};
\displaystyle\Rightarrow limnδ𝒩(𝒅t(mn)|𝑿n)=1,almost surely.\displaystyle\lim_{n\rightarrow\infty}\delta_{\mathcal{NM}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})=1,~{}\mbox{almost surely}.

Hence, (4.11) holds, and by the dominated convergence theorem, (4.12) also follows.   

S-11 Proof of Theorem 7

Proof.

𝒅(mn)𝟎i=1mndi(1win(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)\displaystyle\sum_{\boldsymbol{d}(m_{n})\neq\boldsymbol{0}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right)
=\displaystyle= i=1mndit(1win(𝒅t(mn)))i=1mnditδ𝒩(𝒅t(mn)|𝑿n)+𝒅(mn)𝒅t(mn)𝟎i=1mndi(1win(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n).\displaystyle\frac{\sum_{i=1}^{m_{n}}d_{i}^{t}(1-w_{in}(\boldsymbol{d}^{t}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}^{t}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n}\right)+\sum_{\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n})\neq\boldsymbol{0}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right).

Following Theorem 4, it holds, almost surely, that there exists N1N\geq 1 such that for all n>Nn>N, δ𝒩(𝒅(mn)|𝑿n)=0\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right)=0 for all 𝒅(mn)𝒅t(mn)\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n}). Therefore, for n>Nn>N,

𝒅(mn)𝟎i=1mndi(1win(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)\displaystyle\sum_{\boldsymbol{d}(m_{n})\neq\boldsymbol{0}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right)
=\displaystyle= i=1mndit(1win(𝒅t(mn)))i=1mnditδ𝒩(𝒅t(mn)|𝑿n)\displaystyle\frac{\sum_{i=1}^{m_{n}}d_{i}^{t}(1-w_{in}(\boldsymbol{d}^{t}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}^{t}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n}\right)
\displaystyle\leq i=1mnditen(Jϵ)i=1mndit\displaystyle\frac{\sum_{i=1}^{m_{n}}d_{i}^{t}e^{-n(J-\epsilon)}}{\sum_{i=1}^{m_{n}}d_{i}^{t}}
=\displaystyle= en(Jϵ).\displaystyle e^{-n(J-\epsilon)}.

Thus, (5.1) is established. Using (4.10) and Corollary 5, (5.2) follows in the same way.   

S-11.1 Proof of Theorem 9

Proof. Note that

mpBFDR\displaystyle mpBFDR
=E𝑿n[𝒅(mn)𝔻mni=1mndi(1wi(𝒅(mn)))i=1mndiδβ(𝒅(mn)|𝑿n)|δ𝒩(𝒅(mn)=𝟎|𝑿n)=0]\displaystyle=E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\beta}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\bigg{|}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]
=\displaystyle= E𝑿n[𝒅(mn)𝔻mni=1mndi(1wi(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)|δ𝒩(𝒅(mn)=𝟎|𝑿n)=0]\displaystyle E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\bigg{|}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]
=\displaystyle= E𝑿n[𝒅(mn)𝔻mni=1mndi(1wi(𝒅(mn)))i=1mndiI(i=1mndi>0)δ𝒩(𝒅(mn)|𝑿n)]1P𝑿n[δ𝒩(𝒅(mn)=𝟎|𝑿n)=0]\displaystyle E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}I\left(\sum_{i=1}^{m_{n}}d_{i}>0\right)\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\right]\frac{1}{P_{\boldsymbol{X}_{n}}\left[\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]}
=\displaystyle= E𝑿n[𝒅(mn)𝔻mn{𝟎}i=1mndi(1wi(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)]1P𝑿n[δ𝒩(𝒅(mn)=𝟎|𝑿n)=0].\displaystyle E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}\setminus\left\{\boldsymbol{0}\right\}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\right]\frac{1}{P_{\boldsymbol{X}_{n}}\left[\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]}.

From Theorem 7, mFDR𝑿n0mFDR_{\boldsymbol{X}_{n}}\rightarrow 0, as nn\rightarrow\infty. Also we have

0𝒅(mn)𝔻mn{𝟎}i=1mndi(1wi(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)mFDR𝑿n1.0\leq\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}\setminus\left\{\boldsymbol{0}\right\}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\leq mFDR_{\boldsymbol{X}_{n}}\leq 1.

Therefore, by the dominated convergence theorem, $E_{\boldsymbol{X}_n}\left[\sum_{\boldsymbol{d}(m_n)\in\mathbb{D}_{m_n}\setminus\{\boldsymbol{0}\}}\frac{\sum_{i=1}^{m_n}d_i(1-w_i(\boldsymbol{d}(m_n)))}{\sum_{i=1}^{m_n}d_i}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_n)|\boldsymbol{X}_n)\right]\rightarrow 0$, as $n\rightarrow\infty$. From (A2) we have $\boldsymbol{d}^t(m_n)\neq\boldsymbol{0}$, and from Theorem 4 we have $E_{\boldsymbol{X}_n}[\delta_{\mathcal{NM}}(\boldsymbol{d}^t(m_n)|\boldsymbol{X}_n)]\rightarrow 1$. Thus $P_{\boldsymbol{X}_n}\left[\delta_{\mathcal{NM}}(\boldsymbol{d}(m_n)=\boldsymbol{0}|\boldsymbol{X}_n)=0\right]\rightarrow 1$, as $n\rightarrow\infty$. This proves the result.

It can be similarly shown that pBFDR0pBFDR\rightarrow 0, as nn\rightarrow\infty.   

S-12 Proof of Theorem 10

Proof. The proof follows in the same way as that of Theorem 7, using (A2) in addition.   

S-12.1 Proof of Theorem 12

Proof. The proof follows in the same way as that of Theorem 9, using (A2) in addition.   

S-12.2 Proof of Theorem 13

Proof. Theorem 3.4 of Chandra and Bhattacharya (2019) shows that mpBFDRmpBFDR is non-increasing in β\beta. Hence, for every n>1n>1, the maximum error that can be incurred is at β=0\beta=0 where we actually maximize i=1mndiwin(𝒅(mn))\sum_{i=1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})). Let

𝒅^(mn)\displaystyle\hat{\boldsymbol{d}}(m_{n}) =argmax𝒅(mn)𝔻mni=1mndiwin(𝒅(mn))=argmax𝒅(mn)𝔻mn[i=1m1ndiwin(𝒅(mn))+i=m1n+1mndiwin(𝒅(mn))]\displaystyle=\operatorname*{argmax}_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\sum_{i=1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))=\operatorname*{argmax}_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\left[\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))+\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))\right]

Since the groups in {Gi,mn:i=1,,m1n}\{G_{i,m_{n}}:i=1,\ldots,m_{1n}\} have no overlap with those in {Gi,mn:i=m1n+1,,mn}\{G_{i,m_{n}}:i=m_{1n}+1,\ldots,m_{n}\}, i=1m1ndiwin(𝒅(mn))\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) and i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) can be maximized separately.

Let us define the following notations:

Q𝒅(mn)={i{1,,mn}:all elements of𝒅Gi,mnare correct};\displaystyle Q_{\boldsymbol{d}(m_{n})}=\left\{i\in\{1,\ldots,m_{n}\}:\mbox{all elements of}~{}\boldsymbol{d}_{G_{i,m_{n}}}~{}\mbox{are correct}\right\};
Q𝒅(mn)m1n=Q𝒅(mn){1,2,,m1n},Q𝒅(mn)m1nc={1,2,,m1n}Q𝒅(mn)m1n.\displaystyle Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}=Q_{\boldsymbol{d}(m_{n})}\cap\{1,2,\ldots,m_{1n}\},~{}Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}=\{1,2,\cdots,m_{1n}\}\setminus Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}.

Now,

i=1m1ndiwin(𝒅(mn))i=1m1nditwin(𝒅t(mn))\displaystyle\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i=1}^{m_{1n}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))
=\displaystyle= [iQ𝒅(mn)m1ndiwin(𝒅(mn))iQ𝒅(mn)m1nditwin(𝒅t(mn))]+[iQ𝒅(mn)m1ncdiwin(𝒅(mn))iQ𝒅(mn)m1ncditwin(𝒅t(mn))]\displaystyle\left[\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))\right]+\left[\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))\right]
=\displaystyle= iQ𝒅(mn)m1ncdiwin(𝒅(mn))iQ𝒅(mn)m1ncditwin(𝒅t(mn)),\displaystyle\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n})),

since for any 𝒅(mn)\boldsymbol{d}(m_{n}), iQ𝒅(mn)m1ndiwin(𝒅(mn))=iQ𝒅(mn)m1nditwin(𝒅t(mn))\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))=\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n})) by definition of Q𝒅(mn)m1nQ_{\boldsymbol{d}(m_{n})}^{m_{1n}}.

Note that $\sum_{i\in Q_{\boldsymbol{d}(m_n)}^{m_{1n}c}}d_i^t w_{in}(\boldsymbol{d}^t(m_n))$ cannot be zero, as that would contradict (B), by which $\{G_{i,m_n}:i=1,\ldots,m_{1n}\}$ contain at least one false null hypothesis.

Now, from (4.7) and (4.9), we obtain for nn0(ϵ)n\geq n_{0}(\epsilon),

iQ𝒅(mn)m1ncdiwin(𝒅(mn))iQ𝒅(mn)m1ncditwin(𝒅t(mn))\displaystyle\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))
<en(Jϵ)iQ𝒅(mn)m1nc(di+dit)iQ𝒅(mn)m1ncdit\displaystyle\qquad<e^{-n(J-\epsilon)}\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}\left(d_{i}+d^{t}_{i}\right)-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d^{t}_{i}
<2m1nen(Jϵ)iQ𝒅(mn)m1ncdit.\displaystyle\qquad<2m_{1n}e^{-n(J-\epsilon)}-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d^{t}_{i}. (S-12.1)

By our assumption (7.3), $m_n e^{-n(J-\epsilon)}\rightarrow 0$ as $n\rightarrow\infty$, so that $m_{1n}e^{-n(J-\epsilon)}\rightarrow 0$ as well. Also, $\sum_{i\in Q_{\boldsymbol{d}(m_n)}^{m_{1n}c}}d^t_i>0$. Hence, (S-12.1) is negative for sufficiently large $n$. In other words, $\boldsymbol{d}^t(m_n)$ maximizes $\sum_{i=1}^{m_{1n}}d_i w_{in}(\boldsymbol{d}(m_n))$ for sufficiently large $n$.

Let us now consider the term i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})). Note that i=m1n+1mnditwin(𝒅t(mn))=0\sum_{i=m_{1n}+1}^{m_{n}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))=0 by (B). For any finite nn, i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) is maximized for some decision configuration 𝒅~(mn)\tilde{\boldsymbol{d}}(m_{n}) where d~i=1\tilde{d}_{i}=1 for at least one i{m1n+1,,mn}i\in\{m_{1n}+1,\ldots,m_{n}\}. In that case,

𝒅^t(mn)=(d1t,,dm1nt,d~m1n+1,d~m1n+2,,d~mn),\hat{\boldsymbol{d}}^{t}(m_{n})=(d^{t}_{1},\ldots,d^{t}_{m_{1n}},\tilde{d}_{m_{1n}+1},\tilde{d}_{m_{1n}+2},\ldots,\tilde{d}_{m_{n}}),

so that for sufficiently large nn,

i=1mnd^i(1win(𝒅^(mn)))i=1mnd^i1i=1m1nditwin(𝒅t(mn))+(mnm1n)en(Jϵ)i=1mndit+1\displaystyle\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}\geq 1-\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))+(m_{n}-m_{1n})e^{-n(J-\epsilon)}}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}
=1+i=1m1ndit(1win(𝒅t))i=1mndit+1(mnm1n)en(Jϵ)i=1mndit+1.\displaystyle\qquad=\frac{1+\sum_{i=1}^{m_{1n}}d^{t}_{i}\left(1-w_{in}(\boldsymbol{d}^{t})\right)}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}-\frac{(m_{n}-m_{1n})e^{-n(J-\epsilon)}}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}. (S-12.2)

Now note that

0<i=1m1ndit(1win(𝒅t))mn<en(Jϵ)i=1m1nditmn<m1nmnen(Jϵ).0<\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}\left(1-w_{in}(\boldsymbol{d}^{t})\right)}{m_{n}}<e^{-n(J-\epsilon)}\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}<\frac{m_{1n}}{m_{n}}e^{-n(J-\epsilon)}. (S-12.3)

Since the right most side of (S-12.3) tends to zero as nn\rightarrow\infty due to (7.1), it follows that i=1m1ndit(1win(𝒅t))mn0\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}\left(1-w_{in}(\boldsymbol{d}^{t})\right)}{m_{n}}\rightarrow 0 as nn\rightarrow\infty. Hence, dividing the numerators and denominators of the right hand side of (S-12.2) by mnm_{n} and taking limit as nn\rightarrow\infty shows that

limni=1mnd^i(1win(𝒅^(mn)))i=1mnd^i0.\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}\geq 0. (S-12.4)

almost surely, for all data sequences. Boundedness of i=1mndi(1win(𝒅(mn)))i=1mndi\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}} for all 𝒅(mn)\boldsymbol{d}(m_{n}) and 𝑿n\boldsymbol{X}_{n} ensures uniform integrability, which, in conjunction with the simple observation that for β=0\beta=0,

P(δ𝒩(𝒅(mn)=𝟎|𝑿n)=0)=1P\left(\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right)=1

for all n1n\geq 1, guarantees that under (B), limnmpBFDR0\underset{n\rightarrow\infty}{\lim}~{}mpBFDR\geq 0.

Now, if Gm1n+1,,GmnG_{m_{1n}+1},\ldots,G_{m_{n}} are all disjoint, each consisting of only one true null hypothesis, then i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) will be maximized by 𝒅~(mn)\tilde{\boldsymbol{d}}(m_{n}) where d~i=1\tilde{d}_{i}=1 for all i{m1n+1,,mn}i\in\{m_{1n}+1,\ldots,m_{n}\}. Since ditd^{t}_{i}; i=1,,m1ni=1,\ldots,m_{1n} maximizes i=1m1ndiwin(𝒅(mn))\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) for large nn, it follows that 𝒅^(mn)=(d1t,,dm1nt,1,1,,1)\hat{\boldsymbol{d}}(m_{n})=(d^{t}_{1},\ldots,d^{t}_{m_{1n}},1,1,\ldots,1) is the maximizer of i=1mndiwin(𝒅(mn))\sum_{i=1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) for large nn. In this case,

i=1mnd^i(1win(𝒅^(mn)))i=1mnd^i=1i=1m1nditwin(𝒅t(mn))+i=m1n+1mnwin(𝟏)i=1mndit+mnm1n.\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=1-\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))+\sum_{i=m_{1n}+1}^{m_{n}}w_{in}(\boldsymbol{1})}{\sum_{i=1}^{m_{n}}d^{t}_{i}+m_{n}-m_{1n}}. (S-12.5)

Now, for large enough nn,

(1en(Jϵ))i=1m1nditmn<i=1m1nditwin(𝒅t(mn))mn<i=1m1nditmn.\left(1-e^{-n(J-\epsilon)}\right)\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}<\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))}{m_{n}}<\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}. (S-12.6)

Since due to (7.2), i=1m1nditmnp\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}\rightarrow p, as nn\rightarrow\infty, it follows from (S-12.6) that

i=1m1nditwin(𝒅t(mn))mnp,asn.\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))}{m_{n}}\rightarrow p,~{}\mbox{as}~{}n\rightarrow\infty. (S-12.7)

Also, since for large enough nn,

0<i=m1n+1mnwin(𝟏)mn<(mnm1n)mnen(Jϵ),0<\frac{\sum_{i=m_{1n}+1}^{m_{n}}w_{in}(\boldsymbol{1})}{m_{n}}<\frac{(m_{n}-m_{1n})}{m_{n}}e^{-n(J-\epsilon)},

it follows using (7.1) that

i=m1n+1mnwin(𝟏)mn0,asn.\frac{\sum_{i=m_{1n}+1}^{m_{n}}w_{in}(\boldsymbol{1})}{m_{n}}\rightarrow 0,~{}\mbox{as}~{}n\rightarrow\infty. (S-12.8)

Hence, dividing the numerator and denominator in the ratio on the right hand side of (S-12.5) by mnm_{n} and using the limits (S-12.7), (S-12.8) and (7.1) as nn\rightarrow\infty, yields

limni=1mnd^i(1win(𝒅^(mn)))i=1mnd^i=1q1+pq.\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=\frac{1-q}{1+p-q}. (S-12.9)

Hence, in this case, the maximum mpBFDRmpBFDR (that can be incurred at β=0\beta=0) for nn\rightarrow\infty is given by

limnmpBFDRβ=0=1q1+pq.\lim_{n\rightarrow\infty}mpBFDR_{\beta=0}=\frac{1-q}{1+p-q}.

Note that this is also the maximum asymptotic mpBFDRmpBFDR that can be incurred among all possible configurations of Gm1n+1,,GmnG_{m_{1n}+1},\ldots,G_{m_{n}}. Hence, for any arbitrary configuration of groups, the maximum asymptotic mpBFDRmpBFDR that can be incurred lies in the interval (0,1q1+pq)\left(0,\frac{1-q}{1+p-q}\right).   

S-12.3 Proof of Theorem 17

Proof. Using the facts that mpBFDRmpBFDR is continuous and decreasing in β\beta (Chandra and Bhattacharya (2019)) and that mpBFDRmpBFDR tends to 0 (Theorem 9), the proof follows in the same way as that of Theorem 8 of Chandra and Bhattacharya (2020).   

S-12.4 Proof of Theorem 19

Proof. From Chandra and Bhattacharya (2019) it is known that mpBFDRmpBFDR and pBFDRpBFDR are continuous and non-increasing in β\beta. If 𝒅^(mn)\hat{\boldsymbol{d}}(m_{n}) denotes the optimal decision configuration with respect to the additive loss function, d^i=1\hat{d}_{i}=1 for all ii, for β=0\beta=0. Thus, assuming without loss of generality that the first m0nm_{0n} null hypotheses are true,

i=1mnd^i(1vin)i=1mnd^i=1i=1m0nvin+i=m0n+1mnvinmn.\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-v_{in})}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=1-\frac{\sum_{i=1}^{m_{0n}}v_{in}+\sum_{i=m_{0n}+1}^{m_{n}}v_{in}}{m_{n}}. (S-12.10)

Now, $0<\frac{\sum_{i=1}^{m_{0n}}v_{in}}{m_n}<\frac{m_{0n}}{m_n}e^{-n(J-\epsilon)}$, so that $\frac{\sum_{i=1}^{m_{0n}}v_{in}}{m_n}\rightarrow 0$ as $n\rightarrow\infty$. Also, $\left(1-e^{-n(J-\epsilon)}\right)\left(1-\frac{m_{0n}}{m_n}\right)<\frac{\sum_{i=m_{0n}+1}^{m_n}v_{in}}{m_n}<1-\frac{m_{0n}}{m_n}$, so that $\frac{\sum_{i=m_{0n}+1}^{m_n}v_{in}}{m_n}\rightarrow 1-p_0$ as $n\rightarrow\infty$, where $p_0$ denotes the limiting proportion $\lim_{n\rightarrow\infty}m_{0n}/m_n$ of true null hypotheses. Hence, taking limits on both sides of (S-12.10), we obtain

limni=1mnd^i(1vin)i=1mnd^i=p0.\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-v_{in})}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=p_{0}.

The remaining part of the proof follows in the same way as that of Theorem 17.   

S-12.5 Proof of Theorem 20

Proof. The proof follows in the same way as that of Theorem 10 of Chandra and Bhattacharya (2020) using the facts mpBFDRβ>pBFDRβmpBFDR_{\beta}>pBFDR_{\beta} for any multiple testing procedure, limnpBFDRβ=0=p0\underset{n\rightarrow\infty}{\lim}~{}pBFDR_{\beta=0}=p_{0} (due to Theorem 19), and that mpBFDRmpBFDR is continuous and non-increasing in β\beta and tends to zero as nn\rightarrow\infty.   

S-12.6 Proof of Theorem 21

Proof. Note that by Theorem 17, there exists a sequence $\{\beta_n\}$ such that $\lim_{n\rightarrow\infty}mpBFDR_{\beta_n}=\alpha$, where $\alpha\in\left(0,\frac{1-q}{1+p-q}\right)$. Let $\hat{\boldsymbol{d}}(m_n)$ be the optimal decision configuration associated with the sequence $\{\beta_n\}$. The proofs of Theorems 13 and 17 show that $\hat{d}_{in}=d_i^t$ for $i=1,\ldots,m_{1n}$ and $\sum_{i=m_{1n}+1}^{m_n}\hat{d}_{in}>0$. Hence, using (4.8) we obtain

$\frac{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})v_{in}}{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})}\leq\frac{\sum_{i=1}^{m_{n}}(1-d_{i}^{t})v_{in}}{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})}<e^{-n(J-\epsilon)}\times\frac{\sum_{i=1}^{m_{n}}(1-d_{i}^{t})}{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})}$ (S-12.11)
\displaystyle\Rightarrow~{} 1nlog(FNR𝑿n)<J+ϵ+1nlog[i=1mn(1dit)]1nlog[i=1mn(1d^in)].\displaystyle\frac{1}{n}\log\left(FNR_{\boldsymbol{X}_{n}}\right)<-J+\epsilon+\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-d_{i}^{t})\right]-\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})\right]. (S-12.12)

Now,

01nlog[i=1mn(1dit)]logmnn;\displaystyle 0\leq\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-d^{t}_{i})\right]\leq\frac{\log m_{n}}{n};
01nlog[i=1mn(1d^in)]logmnn.\displaystyle 0\leq\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})\right]\leq\frac{\log m_{n}}{n}.

Since logmnn0\frac{\log m_{n}}{n}\rightarrow 0, as nn\rightarrow\infty,

$\lim_{n\rightarrow\infty}\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-d^{t}_{i})\right]=0,~\mbox{and}$ (S-12.13)
$\lim_{n\rightarrow\infty}\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})\right]=0.$ (S-12.14)

As ϵ\epsilon is any arbitrary positive quantity we have from (S-12.12), (S-12.13) and (S-12.14) that

lim supn1nlog(FNR𝑿n)J.\limsup_{n\rightarrow\infty}\frac{1}{n}\log\left(FNR_{\boldsymbol{X}_{n}}\right)\leq-J.

 

S-13 Verification of (S1)-(S7) in AR(1)AR(1) model with time-varying covariates and proofs of the relevant theorems

All the probabilities and expectations below are with respect to the true model PP.

S-13.1 Verification of (S1)

We obtain

logRn(𝜽)\displaystyle-\log R_{n}(\boldsymbol{\theta}) =nlog(σσ0)+(12σ212σ02)t=1nxt2+(ρ22σ2ρ022σ02)t=1nxt12\displaystyle=n\log\left(\frac{\sigma}{\sigma_{0}}\right)+\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\sum_{t=1}^{n}x^{2}_{t}+\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\sum_{t=1}^{n}x^{2}_{t-1}
+12σ2𝜷m(t=1n𝒛mt𝒛mt)𝜷m12σ02𝜷m0(t=1n𝒛mt𝒛mt)𝜷m0\displaystyle\qquad+\frac{1}{2\sigma^{2}}\boldsymbol{\beta}^{\prime}_{m}\left(\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\right)\boldsymbol{\beta}_{m}-\frac{1}{2\sigma^{2}_{0}}\boldsymbol{\beta}^{\prime}_{m0}\left(\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\right)\boldsymbol{\beta}_{m0}
(ρσ2ρ0σ02)t=1nxtxt1(𝜷mσ2𝜷m0σ02)t=1n𝒛mtxt\displaystyle\qquad-\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\sum_{t=1}^{n}x_{t}x_{t-1}-\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}
$\qquad+\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}.$ (S-13.1)

It is easily seen that logRn(𝜽)-\log R_{n}(\boldsymbol{\theta}) is continuous in 𝑿n\boldsymbol{X}_{n} and 𝜽\boldsymbol{\theta}. Hence, Rn(𝜽)R_{n}(\boldsymbol{\theta}) is n×𝒯\mathcal{F}_{n}\times\mathcal{T} measurable. In other words, (S1) holds.

S-13.2 Proof of Lemma 23

It is easy to see that under the true model PP,

E(xt)\displaystyle E(x_{t}) =k=1tρ0tk𝒛mk𝜷m0;\displaystyle=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}; (S-13.2)
E(xt+hxt)\displaystyle E(x_{t+h}x_{t}) σ02ρ0h1ρ02+E(xt+h)E(xt);h0,\displaystyle\sim\frac{\sigma^{2}_{0}\rho^{h}_{0}}{1-\rho^{2}_{0}}+E(x_{t+h})E(x_{t});~{}h\geq 0, (S-13.3)

where for any two sequences {at}t=1\{a_{t}\}_{t=1}^{\infty} and {bt}t=1\{b_{t}\}_{t=1}^{\infty}, atbta_{t}\sim b_{t} stands for at/bt1a_{t}/b_{t}\rightarrow 1 as tt\rightarrow\infty. Hence,

E(xt2)σ021ρ02+(k=1tρ0tk𝒛mk𝜷m0)2.E(x^{2}_{t})\sim\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\left(\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}\right)^{2}. (S-13.4)

Now let

ϱt=k=1tρ0tk𝒛mk𝜷m0\varrho_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0} (S-13.5)

and for t>t0t>t_{0},

ϱ~t=k=tt0tρ0tk𝒛mk𝜷m0,\tilde{\varrho}_{t}=\sum_{k=t-t_{0}}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}, (S-13.6)

where, for any ε>0\varepsilon>0, t0t_{0} is so large that

$\frac{C\left|\rho_{0}\right|^{t_{0}+1}}{1-\left|\rho_{0}\right|}\leq\varepsilon.$ (S-13.7)

It follows, using (8.9) and (S-13.7), that for t>t0t>t_{0},

$\left|\varrho_{t}-\tilde{\varrho}_{t}\right|\leq\sum_{k=1}^{t-t_{0}-1}|\rho_{0}|^{t-k}\left|\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}\right|\leq\frac{C|\rho_{0}|^{t_{0}+1}\left(1-|\rho_{0}|^{t-t_{0}-1}\right)}{1-|\rho_{0}|}\leq\varepsilon.$ (S-13.8)

Hence, for t>t0t>t_{0},

ϱ~tεϱtϱ~t+ε.\tilde{\varrho}_{t}-\varepsilon\leq\varrho_{t}\leq\tilde{\varrho}_{t}+\varepsilon. (S-13.9)

Now,

t=1nϱ~tn\displaystyle\frac{\sum_{t=1}^{n}\tilde{\varrho}_{t}}{n} =ρ0t0(t=1n𝒛mtn)𝜷m0+ρ0t01(t=2n𝒛mtn)𝜷m0+ρ0t02(t=3n𝒛mtn)𝜷m0+\displaystyle=\rho^{t_{0}}_{0}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\rho^{t_{0}-1}_{0}\left(\frac{\sum_{t=2}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\rho^{t_{0}-2}_{0}\left(\frac{\sum_{t=3}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\cdots
+ρ0(t=t0n𝒛mtn)𝜷m0+(t=t0+1n𝒛mtn)𝜷m0\displaystyle\qquad\qquad\cdots+\rho_{0}\left(\frac{\sum_{t=t_{0}}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\left(\frac{\sum_{t=t_{0}+1}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}
0,asn,by virtue of (B4) (8.6).\displaystyle\rightarrow 0,~{}\mbox{as}~{}n\rightarrow\infty,~{}\mbox{by virtue of (B4) (\ref{eq:ass1})}. (S-13.10)

Similarly, it is easily seen, using (B4), that

t=1nϱ~t2n(1ρ02(2t0+1)1ρ02)c(𝜷0),asn.\frac{\sum_{t=1}^{n}\tilde{\varrho}^{2}_{t}}{n}\rightarrow\left(\frac{1-\rho^{2(2t_{0}+1)}_{0}}{1-\rho^{2}_{0}}\right)c(\boldsymbol{\beta}_{0}),~{}\mbox{as}~{}n\rightarrow\infty. (S-13.11)

Since (S-13.8) implies that for t>t0t>t_{0}, ϱ~t2+ε22εϱ~tϱt2ϱ~t2+ε2+2εϱ~t\tilde{\varrho}^{2}_{t}+\varepsilon^{2}-2\varepsilon\tilde{\varrho}_{t}\leq\varrho^{2}_{t}\leq\tilde{\varrho}^{2}_{t}+\varepsilon^{2}+2\varepsilon\tilde{\varrho}_{t}, it follows that

limnt=1nϱt2n=limnt=1nϱ~t2n+ε2=(1ρ02(2t0+1)1ρ02)c(𝜷0)+ε2,\underset{n\rightarrow\infty}{\lim}~{}\frac{\sum_{t=1}^{n}\varrho^{2}_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~{}\frac{\sum_{t=1}^{n}\tilde{\varrho}^{2}_{t}}{n}+\varepsilon^{2}=\left(\frac{1-\rho^{2(2t_{0}+1)}_{0}}{1-\rho^{2}_{0}}\right)c(\boldsymbol{\beta}_{0})+\varepsilon^{2}, (S-13.12)

and since $\varepsilon>0$ is arbitrary, with $t_0\rightarrow\infty$ as $\varepsilon\rightarrow 0$ by (S-13.7), it follows that

limnt=1nϱt2n=c(𝜷0)1ρ02.\underset{n\rightarrow\infty}{\lim}~{}\frac{\sum_{t=1}^{n}\varrho^{2}_{t}}{n}=\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.13)

Hence, it also follows from (S-13.2), (S-13.4), (B4) and (S-13.13), that

t=1nE(xt2)nσ021ρ02+c(𝜷0)1ρ02,asn\frac{\sum_{t=1}^{n}E(x^{2}_{t})}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}},~{}\mbox{as}~{}n\rightarrow\infty (S-13.14)

and

t=1nE(xt12)nσ021ρ02+c(𝜷0)1ρ02,asn.\frac{\sum_{t=1}^{n}E(x^{2}_{t-1})}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}},~{}\mbox{as}~{}n\rightarrow\infty. (S-13.15)

Now note that

$x_{t}x_{t-1}=\rho_{0}x^{2}_{t-1}+\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}x_{t-1}+\epsilon_{t}x_{t-1}.$ (S-13.16)

Using (8.10), (S-13.9) and arbitrariness of ε>0\varepsilon>0 it is again easy to see that

t=1n𝒛mt𝜷m0E(xt1)n0,asn.\frac{\sum_{t=1}^{n}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}E(x_{t-1})}{n}\rightarrow 0,~{}\mbox{as}~{}n\rightarrow\infty. (S-13.17)

Also, since for t=1,2,,t=1,2,\ldots, E(ϵtxt1)=E(ϵt)E(xt1)E(\epsilon_{t}x_{t-1})=E(\epsilon_{t})E(x_{t-1}) by independence, and since E(ϵt)=0E(\epsilon_{t})=0 for t=1,2,t=1,2,\ldots, it holds that

t=1nE(ϵtxt1)n=0,for alln=1,2,.\frac{\sum_{t=1}^{n}E\left(\epsilon_{t}x_{t-1}\right)}{n}=0,~{}\mbox{for all}~{}n=1,2,\ldots. (S-13.18)

Combining (S-13.16), (S-13.15), (S-13.17) and (S-13.18) we obtain

t=1nE(xtxt1)nρ0σ021ρ02+ρ0c(𝜷0)1ρ02.\frac{\sum_{t=1}^{n}E\left(x_{t}x_{t-1}\right)}{n}\rightarrow\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.19)

Using (B4), (8.9) and the arbitrariness of \varepsilon>0, it follows that

h(\boldsymbol{\theta})=\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}E\left[-\log R_{n}(\boldsymbol{\theta})\right]=\log\left(\frac{\sigma}{\sigma_{0}}\right)+\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
+\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)+\frac{c(\boldsymbol{\beta})}{2\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{2\sigma^{2}_{0}}
-\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)-\left(\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right).

In other words, (S2) holds, with h(\boldsymbol{\theta}) given by (8.17).

S-13.3 Proof of Theorem 24

Note that

x_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}+\sum_{k=1}^{t}\rho^{t-k}_{0}\epsilon_{k}, (S-13.20)

where \tilde{\epsilon}_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\epsilon_{k} is an asymptotically stationary Gaussian process with mean zero and covariance

cov(\tilde{\epsilon}_{t+h},\tilde{\epsilon}_{t})\rightarrow\frac{\sigma^{2}_{0}\rho^{h}_{0}}{1-\rho^{2}_{0}},~\mbox{as}~t\rightarrow\infty,~\mbox{for each fixed}~h\geq 0. (S-13.21)
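This covariance limit admits a direct simulation check (an illustration only, with arbitrarily chosen \rho_{0}, \sigma_{0} and lag h):

import numpy as np

rng = np.random.default_rng(1)
rho0, sigma0, h = 0.7, 2.0, 3
reps, t = 50_000, 300                        # t large enough for near-stationarity
e = np.zeros(reps)                           # e holds epsilon-tilde across replications
e_t = None
for s in range(1, t + h + 1):                # recursion: e_s = rho0*e_{s-1} + eps_s
    e = rho0 * e + sigma0 * rng.standard_normal(reps)
    if s == t:
        e_t = e.copy()                       # snapshot of epsilon-tilde_t
print(np.mean(e_t * e))                      # empirical cov(eps-tilde_{t+h}, eps-tilde_t)
print(sigma0**2 * rho0**h / (1 - rho0**2))   # the limit in (S-13.21)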

Then

\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}=\frac{\sum_{t=1}^{n}\varrho^{2}_{t}}{n}+\frac{\sum_{t=1}^{n}\tilde{\epsilon}^{2}_{t}}{n}+\frac{2\sum_{t=1}^{n}\tilde{\epsilon}_{t}\varrho_{t}}{n}. (S-13.22)

By (S-13.13), the first term on the right hand side of (S-13.22) converges to \frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}} as n\rightarrow\infty, and since \tilde{\epsilon}_{t}; t=1,2,\ldots, is also an irreducible and aperiodic Markov chain, by the ergodic theorem the second term of (S-13.22) converges to \sigma^{2}_{0}/(1-\rho^{2}_{0}) almost surely, as n\rightarrow\infty. For the third term, we observe that

|\boldsymbol{z}^{\prime}_{k}\boldsymbol{\beta}_{0}-\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}|<\delta, (S-13.23)

for k>n_{0}, where n_{0}, depending upon \delta~(>0), is sufficiently large. Recalling from (B5) that \hat{\varrho}_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{k}\boldsymbol{\beta}_{0}, we then see that for t>n_{0},

|\varrho_{t}-\hat{\varrho}_{t}|<\frac{\delta}{1-|\rho_{0}|}<\varepsilon, (S-13.24)

for \delta<(1-|\rho_{0}|)\varepsilon. From (S-13.24) it follows that

\underset{n\rightarrow\infty}{\lim}~\frac{2\sum_{t=1}^{n}\tilde{\epsilon}_{t}\varrho_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{2\sum_{t=n_{0}+1}^{n}\tilde{\epsilon}_{t}\hat{\varrho}_{t}}{n-n_{0}}. (S-13.25)

Since by (B5) the limit of \hat{\varrho}_{t} exists as t\rightarrow\infty, it follows that \tilde{\epsilon}_{t}\hat{\varrho}_{t} is still an irreducible and aperiodic Markov chain whose asymptotically stationary distribution is that of a zero-mean Gaussian process. Hence, by the ergodic theorem, the third term of (S-13.22) converges to zero, almost surely, as n\rightarrow\infty. It follows that

\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}, (S-13.26)

and similarly,

\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.27)

Now, since x_{t}=\varrho_{t}+\tilde{\epsilon}_{t}, it follows using (B2) (orthogonality) and (S-13.9) that for \tilde{\boldsymbol{\beta}}_{m}=\boldsymbol{\beta}_{m} or \tilde{\boldsymbol{\beta}}_{m}=\boldsymbol{\beta}_{m0},

\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}x_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}\tilde{\epsilon}_{t}}{n}. (S-13.28)

By (B4), the first term on the right hand side of (S-13.28) is \tilde{c}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}), where \tilde{c}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) is c(\boldsymbol{\beta}_{0}) or c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) according as \tilde{\boldsymbol{\beta}}_{m} is \boldsymbol{\beta}_{m0} or \boldsymbol{\beta}_{m}. For the second term, due to (S-13.23), \underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}\tilde{\epsilon}_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}\boldsymbol{z}_{t}\tilde{\epsilon}_{t}}{n}, where \tilde{\boldsymbol{\beta}} is either \boldsymbol{\beta} or \boldsymbol{\beta}_{0}. By (B5) the limit of \tilde{\boldsymbol{\beta}}^{\prime}\boldsymbol{z}_{t} exists as t\rightarrow\infty, and hence \tilde{\boldsymbol{\beta}}^{\prime}\boldsymbol{z}_{t}\tilde{\epsilon}_{t} remains an irreducible, aperiodic Markov chain with a zero-mean Gaussian stationary distribution. Hence, by the ergodic theorem, the second term of (S-13.28) is zero, almost surely. In other words, almost surely,

\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}x_{t}}{n}\rightarrow\tilde{c}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}),~\mbox{as}~n\rightarrow\infty, (S-13.29)

and similar arguments show that, almost surely,

\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}x_{t-1}}{n}\rightarrow 0,~\mbox{as}~n\rightarrow\infty. (S-13.30)

We now calculate the limit of \sum_{t=1}^{n}x_{t}x_{t-1}/n, as n\rightarrow\infty. By (S-13.16),

\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}x_{t}x_{t-1}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}. (S-13.31)

By (S-13.27), the first term on the right hand side of (S-13.31) is given, almost surely, by \frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}, and the second term is almost surely zero due to (S-13.30). For the third term, note that \epsilon_{t}x_{t-1}=\epsilon_{t}\varrho_{t-1}+\epsilon_{t}\tilde{\epsilon}_{t-1}, and hence, using (S-13.23), \underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}\hat{\varrho}_{t-1}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}\tilde{\epsilon}_{t-1}}{n}. Both \epsilon_{t}\hat{\varrho}_{t-1}; t=1,2,\ldots, and \epsilon_{t}\tilde{\epsilon}_{t-1}; t=1,2,\ldots, are sample paths of irreducible and aperiodic Markov chains having stationary distributions with mean zero. Hence, by the ergodic theorem, the third term of (S-13.31) is zero, almost surely. That is,

\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}x_{t}x_{t-1}}{n}=\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.32)

The limits (S-13.26), (S-13.27), (S-13.29), (S-13.30) and (S-13.32), applied to \log R_{n}(\boldsymbol{\theta}) given by (S-13.1), show that \frac{\log R_{n}(\boldsymbol{\theta})}{n} converges to -h(\boldsymbol{\theta}) almost surely as n\rightarrow\infty. In other words, (S3) holds.
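The almost sure convergence just established lends itself to a numerical sanity check, sketched below in Python. The sketch makes simplifying assumptions that are not part of the formal setup: a single Rademacher covariate stands in for \boldsymbol{z}_{mt} (so that c(\boldsymbol{\beta})=\beta^{2}, c(\boldsymbol{\beta}_{0})=\beta^{2}_{0} and c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})=\beta\beta_{0}), and R_{n}(\boldsymbol{\theta}) is taken to be the ratio of the likelihood at \boldsymbol{\theta} to that at \boldsymbol{\theta}_{0}. One long simulated path then makes \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) visibly close to -h(\boldsymbol{\theta}) computed from (8.17).

import numpy as np

rng = np.random.default_rng(2)
n = 500_000
rho0, beta0, sigma0 = 0.5, 1.0, 1.0        # true parameters
rho, beta, sigma = 0.3, 0.5, 1.2           # a fixed alternative theta
z = rng.choice([-1.0, 1.0], size=n)        # hypothetical scalar covariate
eps = sigma0 * rng.standard_normal(n)
x = np.empty(n)
prev = 0.0
for t in range(n):                         # x_t = rho0*x_{t-1} + beta0*z_t + eps_t
    x[t] = rho0 * prev + beta0 * z[t] + eps[t]
    prev = x[t]
xlag = np.concatenate(([0.0], x[:-1]))

def loglik(r, b, s):                       # Gaussian AR log-likelihood, constants dropped
    resid = x - r * xlag - b * z
    return -n * np.log(s) - 0.5 * np.sum(resid**2) / s**2

log_Rn = loglik(rho, beta, sigma) - loglik(rho0, beta0, sigma0)
# Closed-form h(theta) from (8.17), with c(beta)=beta^2 and c10=beta*beta0:
S = (sigma0**2 + beta0**2) / (1 - rho0**2)
h = (np.log(sigma / sigma0)
     + (1 / (2 * sigma**2) - 1 / (2 * sigma0**2)) * S
     + (rho**2 / (2 * sigma**2) - rho0**2 / (2 * sigma0**2)) * S
     + beta**2 / (2 * sigma**2) - beta0**2 / (2 * sigma0**2)
     - (rho / sigma**2 - rho0 / sigma0**2) * rho0 * S
     - (beta * beta0 / sigma**2 - beta0**2 / sigma0**2))
print(log_Rn / n, -h)                      # the two numbers should nearly agree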

S-13.4 Verification of (S4)

In the expression for h(\boldsymbol{\theta}) given by (8.17), note that c(\boldsymbol{\beta}) and c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) are almost surely finite. Hence, for any prior on \sigma and \rho under which these are almost surely finite, (S4) clearly holds. In particular, this holds for any proper priors on \sigma and \rho.

S-13.5 Verification of (S5)

S-13.5.1 Verification of (S5) (1)

Since \Theta=\mathcal{C}^{\prime}(\mathcal{X})\times\mathbb{R}^{\infty}\times\mathbb{R}\times\mathbb{R}^{+}, it is easy to see that h(\Theta)=0. Let \boldsymbol{\gamma}_{m}=(\gamma_{1},\ldots,\gamma_{m})^{\prime}, \tilde{\gamma}_{m}=\sum_{i=1}^{m}|\gamma_{i}|, \boldsymbol{\theta}_{m}=(\tilde{\eta},\boldsymbol{\gamma}_{m},\rho,\sigma) and \Theta_{m}=\mathcal{C}^{\prime}(\mathcal{X})\times\mathbb{R}^{m}\times\mathbb{R}\times\mathbb{R}^{+}. We now define

\mathcal{G}_{n}=\left\{\boldsymbol{\theta}_{m}\in\Theta_{m}:|\rho|\leq\exp\left((\alpha n)^{1/16}\right),~\tilde{\gamma}_{m}\leq\exp\left((\alpha n)^{1/16}\right),~\|\tilde{\eta}\|\leq\exp\left((\alpha n)^{1/16}\right),~\|\tilde{\eta}^{\prime}\|\leq\exp\left((\alpha n)^{1/16}\right),~\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right\},

where \alpha>0.

Since |\gamma_{i}|<L<\infty for all i, it follows that \mathcal{G}_{n} is increasing in n for n\geq n_{0}, for some n_{0}\geq 1. To see this, note that if \tilde{\gamma}_{m_{n}}\leq\exp((\alpha n)^{1/16}), then \tilde{\gamma}_{m_{n+1}}=\tilde{\gamma}_{m_{n}}+\sum_{i=m_{n}+1}^{m_{n+1}}|\gamma_{i}|<\exp((\alpha(n+1))^{1/16}) provided \sum_{i=m_{n}+1}^{m_{n+1}}|\gamma_{i}|<L(m_{n+1}-m_{n})<\exp((\alpha(n+1))^{1/16})-\exp((\alpha n)^{1/16}), which holds by assumption (B7). Since \mathcal{G}_{n}\rightarrow\Theta as n\rightarrow\infty, there exists n_{1} such that \mathcal{G}_{n_{1}} contains \boldsymbol{\theta}_{0}. Hence, h(\mathcal{G}_{n})=0 for all n\geq n_{1}. In other words, h(\mathcal{G}_{n})\rightarrow h(\Theta), as n\rightarrow\infty. Now observe that

\pi\left(\mathcal{G}_{n}\right)
=\pi\left(\tilde{\gamma}_{m}\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}\|\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}^{\prime}\|\leq\exp((\alpha n)^{1/16}),~\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right)
\quad-\pi\left(|\rho|>\exp\left((\alpha n)^{1/16}\right),~\tilde{\gamma}_{m}\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}\|\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}^{\prime}\|\leq\exp((\alpha n)^{1/16}),~\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right)
\geq 1-\pi\left(|\rho|>\exp\left((\alpha n)^{1/16}\right)\right)-\pi\left(\tilde{\gamma}_{m}>\exp\left((\alpha n)^{1/16}\right)\right)-\pi\left(\|\tilde{\eta}\|>\exp((\alpha n)^{1/16})\right)
\quad-\pi\left(\|\tilde{\eta}^{\prime}\|>\exp((\alpha n)^{1/16})\right)-\pi\left(\left\{\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right\}^{c}\right)
\geq 1-(c_{\rho}+c_{\gamma}+c_{\tilde{\eta}}+c_{\tilde{\eta}^{\prime}}+c_{\sigma})\exp(-\alpha n),

where the last step is due to (B6).

S-13.5.2 Verification of (S5) (2)

First, we note that \mathcal{G}_{n} is compact, which can be proved using the Arzela-Ascoli lemma, in almost the same way as in Chatterjee and Bhattacharya (2020). Since \mathcal{G}_{n} is compact for all n\geq 1, the required uniform convergence will be proven if we can show that \frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta}) is stochastically equicontinuous almost surely in \boldsymbol{\theta}\in\mathcal{G} for any \mathcal{G}\in\left\{\mathcal{G}_{n}:n=1,2,\ldots\right\}, and that \frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\rightarrow 0, almost surely, for all \boldsymbol{\theta}\in\mathcal{G} (see Newey (1991) for the general theory of uniform convergence on compact sets under stochastic equicontinuity). Since we have already verified pointwise convergence for all \boldsymbol{\theta}\in\boldsymbol{\Theta} while verifying (S3), it remains to prove stochastic equicontinuity of \frac{1}{n}\log R_{n}(\cdot)+h(\cdot). Stochastic equicontinuity usually follows easily if the function concerned can be shown to be almost surely Lipschitz continuous. In our case, we first verify Lipschitz continuity of \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) by showing that its first partial derivatives with respect to the components of \boldsymbol{\theta} are almost surely bounded. With respect to \rho and \sigma, the boundedness of the parameters in \mathcal{G}, (8.9) and the limit results (S-13.26), (S-13.27), (S-13.29), (S-13.30) and (S-13.32) readily show boundedness of the partial derivatives. With respect to \boldsymbol{\beta}_{m}, note that the derivative of \frac{1}{2\sigma^{2}}\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}, a relevant expression of \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) (see (S-13.1)), is \frac{1}{\sigma^{2}}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}, whose Euclidean norm is bounded above by \sigma^{-2}\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\|_{op}\times\|\boldsymbol{\beta}_{m}\|. In our case, \|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\|_{op}\leq K<\infty by (B3). Moreover, \sigma^{-2} is bounded in \mathcal{G} and \|\boldsymbol{\beta}_{m}\|\leq\|\tilde{\eta}\|\times\sqrt{\sum_{i=1}^{m}\gamma^{2}_{i}}, which is also bounded in \mathcal{G}. Boundedness of the partial derivatives with respect to \boldsymbol{\beta}_{m} of the other terms of \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) involving \boldsymbol{\beta}_{m} is easy to observe. In other words, \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) is stochastically equicontinuous.
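The operator-norm bound underlying the Lipschitz argument, \|\left(\frac{\sum_{t}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}\|\leq\|\cdot\|_{op}\|\boldsymbol{\beta}_{m}\|, can be illustrated with a small numerical sketch (hypothetical bounded covariates; an illustration only):

import numpy as np

rng = np.random.default_rng(3)
n, m = 5000, 20
Z = rng.choice([-1.0, 1.0], size=(n, m))            # hypothetical bounded covariates z_mt
A = Z.T @ Z / n                                     # (sum_t z_mt z_mt')/n
beta = rng.standard_normal(m)                       # an arbitrary beta_m
lhs = np.linalg.norm(A @ beta)                      # norm of the partial derivative factor
rhs = np.linalg.norm(A, 2) * np.linalg.norm(beta)   # the operator-norm bound from the text
print(lhs, rhs)                                     # lhs <= rhs, as claimed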

To see that h(\boldsymbol{\theta}) is equicontinuous, first note that in the expression (8.17), except for the terms involving c(\boldsymbol{\beta}) and c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}), the other terms are easily seen to be Lipschitz, using boundedness of the partial derivatives. Let us now focus on the term \frac{c(\boldsymbol{\beta})}{2\sigma^{2}}. For our purpose, let us consider two different sequences \boldsymbol{\beta}_{1m} and \boldsymbol{\beta}_{2m} associated with (\gamma_{1},\tilde{\eta}_{1}) and (\gamma_{2},\tilde{\eta}_{2}), respectively, such that \boldsymbol{\beta}^{\prime}_{1m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{1m}\rightarrow c(\boldsymbol{\beta}_{1}) and \boldsymbol{\beta}^{\prime}_{2m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{2m}\rightarrow c(\boldsymbol{\beta}_{2}). As we have already shown that \boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m} is Lipschitz in \boldsymbol{\beta}_{m}, we must have \|\boldsymbol{\beta}^{\prime}_{1m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{1m}-\boldsymbol{\beta}^{\prime}_{2m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{2m}\|\leq L\|\boldsymbol{\beta}_{1m}-\boldsymbol{\beta}_{2m}\|\leq L\|\gamma_{1}\tilde{\eta}_{1}-\gamma_{2}\tilde{\eta}_{2}\|, for some Lipschitz constant L>0. Taking the limit of both sides as n\rightarrow\infty shows that |c(\boldsymbol{\beta}_{1})-c(\boldsymbol{\beta}_{2})|\leq L\|\gamma_{1}\tilde{\eta}_{1}-\gamma_{2}\tilde{\eta}_{2}\|, proving that \frac{c(\boldsymbol{\beta})}{2\sigma^{2}} is Lipschitz in \eta=\gamma\tilde{\eta}, when \sigma is held fixed. The bounded partial derivative with respect to \sigma then shows that \frac{c(\boldsymbol{\beta})}{2\sigma^{2}} is Lipschitz in both \eta and \sigma. Similarly, the term \frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}} in (8.17) is also Lipschitz continuous.

In other words, \frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta}) is stochastically equicontinuous almost surely in \boldsymbol{\theta}\in\mathcal{G}. Hence, the required uniform convergence is satisfied.

S-13.5.3 Verification of (S5) (3)

Continuity of h(\boldsymbol{\theta}) and compactness of \mathcal{G}_{n}, along with its non-decreasing nature with respect to n, imply that h\left(\mathcal{G}_{n}\right)\rightarrow h\left(\boldsymbol{\Theta}\right), as n\rightarrow\infty. Hence, (S5) holds.

S-13.6 Verification of (S6) and proof of Theorem 25

Note that in our case,

\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})
=\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
+\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
+\frac{1}{2\sigma^{2}}\left(\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}-c(\boldsymbol{\beta})\right)-\frac{1}{2\sigma^{2}_{0}}\left(\boldsymbol{\beta}^{\prime}_{m0}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m0}-c(\boldsymbol{\beta}_{0})\right)
-\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
-\left[\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right]
+\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}
+\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right). (S-13.33)

Let \kappa_{1}=(\kappa-h\left(\boldsymbol{\Theta}\right))/7, \boldsymbol{\mu}_{n}=E(\mathbf{x}_{n}) and \boldsymbol{\Sigma}_{n}=Var(\mathbf{x}_{n}); let \boldsymbol{\Sigma}_{n}=\boldsymbol{C}_{n}\boldsymbol{C}^{\prime}_{n} be the Cholesky decomposition. Also let \boldsymbol{y}_{n}\sim N_{n}\left(\boldsymbol{0}_{n},\boldsymbol{I}_{n}\right), the n-dimensional normal distribution with mean \boldsymbol{0}_{n}, the n-dimensional zero vector, and variance \boldsymbol{I}_{n}, the n-dimensional identity matrix. Then

P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)
=P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\mu}_{n}+2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}+\boldsymbol{y}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{y}_{n}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)+P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\mu}_{n}}{n}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right) (S-13.34)
+P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{y}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{y}_{n}}{n}-tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)\right|>\frac{\kappa_{1}}{4}\right)+P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right). (S-13.35)

To deal with the first term of (S-13.34), first note that 2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n} is Lipschitz in \boldsymbol{y}_{n}, with the square of the Lipschitz constant being 4\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{\mu}_{n}, which is again bounded above by K_{1}n, for some constant K_{1}>0, due to (8.9). It then follows using the Gaussian concentration inequality (see, for example, Giraud (2015)) that

P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)=P\left(\left|2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-1}\right)\leq 2\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right). (S-13.36)
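The Gaussian concentration inequality used here can be visualized numerically. In the sketch below — an illustration with an arbitrary mean vector and covariance matrix, not those of the model — the empirical tail of the Lipschitz functional 2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n} indeed stays below 2\exp(-t^{2}/(2L^{2})), with L^{2}=4\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{\mu}_{n}.

import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 50_000
mu = rng.uniform(-1, 1, size=n)               # an arbitrary fixed mean vector
B = rng.standard_normal((n, n)) / np.sqrt(n)
Sigma = B @ B.T + np.eye(n)                   # an arbitrary covariance matrix
C = np.linalg.cholesky(Sigma)                 # Sigma = C C'
L2 = 4 * mu @ Sigma @ mu                      # squared Lipschitz constant of y -> 2 mu'C y
w = 2 * (C.T @ mu)                            # 2 mu'C y = w'y
samples = rng.standard_normal((reps, n)) @ w
for c in (1.0, 2.0, 3.0):
    t = c * np.sqrt(L2)
    # empirical tail vs the Gaussian concentration bound 2*exp(-t^2/(2*L2))
    print(t, np.mean(np.abs(samples) > t), 2 * np.exp(-t**2 / (2 * L2)))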

Now, for large enough n, noting that \pi\left(\mathcal{G}^{c}_{n}\right)\leq\exp(-\alpha n) up to some positive constant, we have

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq 2\int_{\mathcal{S}^{c}}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)d\pi(\boldsymbol{\theta}) (S-13.37)
\leq 2\int_{\mathcal{G}_{n}}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)d\pi(\boldsymbol{\theta})+2\int_{\mathcal{G}^{c}_{n}}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)d\pi(\boldsymbol{\theta})
\leq 2\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)\pi(\sigma^{2})d\sigma^{2}+2\pi\left(\mathcal{G}^{c}_{n}\right)
\leq 2\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left(-C_{1}\kappa^{2}_{1}nu^{-2}\right)(u+\sigma^{-2}_{0})^{-2}\pi\left(\frac{1}{u+\sigma^{-2}_{0}}\right)du+\tilde{C}\exp(-\alpha n), (S-13.38)

for some positive constants C_{1} and \tilde{C}.

Now, the prior (u+\sigma^{-2}_{0})^{-2}\pi\left(\frac{1}{u+\sigma^{-2}_{0}}\right) is such that large values of u receive small probabilities. Hence, if this prior is replaced by an appropriate function with a thicker tail, then the resultant integral provides an upper bound for the first term of (S-13.38). We consider a function \tilde{\pi}_{n}(u) of a mixture form depending upon n; that is, we let \tilde{\pi}_{n}(u)=c_{3}\sum_{r=1}^{M_{n}}\psi^{\zeta_{rn}}_{rn}\exp(-\psi_{rn}u^{2})u^{2(\zeta_{rn}-1)}\boldsymbol{I}_{B_{n}}(u), where B_{n}=\left[\exp\left(-2(\alpha n)^{1/16}\right)-\sigma^{-2}_{0},\exp\left(2(\alpha n)^{1/16}\right)-\sigma^{-2}_{0}\right], M_{n}\leq\exp((\alpha n)^{1/16}) is the number of mixture components, c_{3}>0, for r=1,\ldots,M_{n}, \frac{1}{2}<\zeta_{rn}\leq c_{4}n^{q}, for 0<q<1/16 and n\geq 1, where c_{4}>0, and 0<\psi_{1}\leq\psi_{rn}<c_{5}<\infty, for all r and n. In this case,

\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left(-C_{1}\kappa^{2}_{1}nu^{-2}\right)(u+\sigma^{-2}_{0})^{-2}\pi\left(\frac{1}{u+\sigma^{-2}_{0}}\right)du
\leq c_{3}\sum_{r=1}^{M_{n}}\psi^{\zeta_{rn}}_{rn}\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left[-\left(C_{1}\kappa^{2}_{1}nu^{-2}+\psi_{rn}u^{2}\right)\right]\left(u^{2}\right)^{\zeta_{rn}-1}du. (S-13.39)

Now the exponent of the r-th integrand of (S-13.39), namely C_{1}\kappa^{2}_{1}nu^{-2}+\psi_{rn}u^{2}-(\zeta_{rn}-1)\log u^{2}, is minimized at \tilde{u}^{2}_{rn}=\frac{\zeta_{rn}-1+\sqrt{(\zeta_{rn}-1)^{2}+4C_{1}\psi_{rn}\kappa^{2}_{1}n}}{2\psi_{rn}}, so that for sufficiently large n, c_{1}\kappa_{1}\sqrt{\frac{n}{\psi_{rn}}}\leq\tilde{u}^{2}_{rn}\leq\tilde{c}_{1}\kappa_{1}\sqrt{\frac{n}{\psi_{rn}}}, for some positive constants c_{1} and \tilde{c}_{1}. Moreover, for sufficiently large n, we have \frac{\tilde{u}^{2}_{rn}}{\log\tilde{u}^{2}_{rn}}\geq\frac{\zeta_{rn}-1}{\psi_{rn}(1-c_{2})}, for 0<c_{2}<1. Hence, for sufficiently large n, C_{1}\kappa^{2}_{1}n\tilde{u}^{-2}_{rn}+\psi_{rn}\tilde{u}^{2}_{rn}-(\zeta_{rn}-1)\log(\tilde{u}^{2}_{rn})\geq c_{2}\psi_{1}\tilde{u}^{2}_{rn}\geq C_{2}\kappa_{1}\sqrt{\psi_{rn}n}, for some positive constant C_{2}. From these and (S-13.38) it follows that

c_{3}\sum_{r=1}^{M_{n}}\psi^{\zeta_{rn}}_{rn}\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left[-\left(C_{1}\kappa^{2}_{1}nu^{-2}+\psi_{1}u^{2}\right)\right]\left(u^{2}\right)^{\zeta_{rn}-1}du
\leq c_{3}M_{n}\exp\left[-\left(C_{2}\kappa_{1}\sqrt{n\psi_{1}}-2(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]
\leq c_{3}\exp\left[-\left(C_{2}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right], (S-13.40)

for some constant \tilde{c}_{5}. Combining (S-13.38), (S-13.39) and (S-13.40) we obtain

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq K_{2}\exp\left[-\left(C_{2}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n). (S-13.41)
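The closed-form minimizer \tilde{u}^{2}_{rn} used in the step above is elementary calculus; a short numerical check, with arbitrary hypothetical values of the constants, confirms it:

import numpy as np

A, psi, zeta = 1.5**2 * 1e4, 0.8, 4.0        # hypothetical values; A plays C1*kappa1^2*n
g = lambda v: A / v + psi * v - (zeta - 1) * np.log(v)   # the exponent, with v = u^2
v_star = (zeta - 1 + np.sqrt((zeta - 1) ** 2 + 4 * psi * A)) / (2 * psi)
grid = np.linspace(0.5 * v_star, 2.0 * v_star, 2_000_001)
print(v_star, grid[np.argmin(g(grid))])      # grid minimizer agrees with the closed form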

For the second term of (S-13.34), since \boldsymbol{\mu}_{n} is non-random, we can view its components as a set of independent realizations from a suitable zero-mean process with variance \frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}, supported on a compact set (due to (8.9)). In that case, by Hoeffding's inequality (Hoeffding, 1963) we obtain

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\mu}_{n}}{n}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq 2\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left(-K_{3}\kappa^{2}_{1}n\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)\pi(\sigma^{2})d\sigma^{2}+\tilde{C}\exp(-\alpha n)
\leq K_{3}\exp\left[-\left(C_{3}\kappa_{1}\sqrt{n\psi_{2}}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.42)

for some positive constants K_{3} and C_{3}; the last step follows in the same way as (S-13.41).
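Hoeffding's inequality, as invoked here, is easily illustrated by simulation. In the sketch below, bounded uniform summands serve as arbitrary stand-ins; the empirical tail of the sample mean stays below 2\exp(-nt^{2}/(2a^{2})) for zero-mean variables with range 2a.

import numpy as np

rng = np.random.default_rng(5)
n, reps, a = 500, 10_000, 1.0
X = rng.uniform(-a, a, size=(reps, n))          # bounded, zero-mean stand-in summands
means = X.mean(axis=1)
for t in (0.05, 0.08, 0.11):
    emp = np.mean(np.abs(means) > t)
    bound = 2 * np.exp(-n * t**2 / (2 * a**2))  # Hoeffding bound for range-2a variables
    print(t, emp, bound)                        # empirical tail <= bound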

We now deal with the first term of (S-13.35). First note that \|\boldsymbol{\Sigma}_{n}\|^{2}_{F}\leq K_{4}n, for some K_{4}>0, where \|\cdot\|_{F} denotes the Frobenius norm. Also, any eigenvalue \lambda of any matrix \boldsymbol{A}=(a_{ij}) satisfies |\lambda-a_{ii}|\leq\sum_{j\neq i}|a_{ij}|, for some i, by Gerschgorin's circle theorem (see, for example, Lange (2010)). In our case, the rows of \boldsymbol{\Sigma}_{n} are summable and the diagonal elements are bounded for any n; the maximum row sum is attained by the middle row when n is odd and the two middle rows when n is even. In other words, the maximum eigenvalue of \boldsymbol{\Sigma}_{n} remains bounded for all n\geq 1, that is, \underset{n\geq 1}{\sup}~\|\boldsymbol{\Sigma}_{n}\|_{op}<K_{5}, for some positive constant K_{5}. Now observe that for integrals of the form \int_{\sigma^{2}\in\tilde{\mathcal{G}}_{n}}\exp\left(-C_{5}\kappa^{2}_{1}n\left|\sigma^{-2}-\sigma^{-2}_{0}\right|^{-1}\right)\pi(\sigma^{2})d\sigma^{2}, where \tilde{\mathcal{G}}_{n}\subseteq\mathcal{G}_{n}, we can obtain, using the same technique pertaining to (S-13.41), that

\int_{\sigma^{2}\in\tilde{\mathcal{G}}_{n}}\exp\left(-C_{5}\kappa^{2}_{1}n\left|\sigma^{-2}-\sigma^{-2}_{0}\right|^{-1}\right)\pi(\sigma^{2})d\sigma^{2}\leq C_{7}\exp\left[-\left(C_{6}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right], (S-13.43)

for relevant positive constants C_{6}, C_{7} and \tilde{c}_{5}. Then by the Hanson-Wright inequality, (S-13.43) and the same method for obtaining (S-13.41), we obtain the following bound for the first term of (S-13.35):

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{y}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{y}_{n}}{n}-tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq E_{\pi}\left[\exp\left[-K_{6}\min\left\{\frac{\frac{\kappa^{2}_{1}}{16}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}}{\|\frac{\boldsymbol{\Sigma}_{n}}{n}\|^{2}_{F}},\frac{\frac{\kappa_{1}}{4}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-1}}{\|\frac{\boldsymbol{\Sigma}_{n}}{n}\|_{op}}\right\}\right]\boldsymbol{I}_{\mathcal{G}_{n}}(\boldsymbol{\theta})\right]+\tilde{C}\exp(-\alpha n)
\leq K_{7}\exp\left[-\left(C_{8}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.44)

for relevant positive constants K_{7}, C_{8} and \tilde{c}_{5}.
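The claim \underset{n\geq 1}{\sup}~\|\boldsymbol{\Sigma}_{n}\|_{op}<\infty, which controls the denominators \|\boldsymbol{\Sigma}_{n}/n\|^{2}_{F} and \|\boldsymbol{\Sigma}_{n}/n\|_{op} above, can be checked concretely for the stationary AR(1)-type covariance \boldsymbol{\Sigma}_{n}(i,j)=\sigma^{2}_{0}\rho^{|i-j|}_{0}/(1-\rho^{2}_{0}) — a simplified stand-in for the exact Var(\mathbf{x}_{n}), used here purely for illustration:

import numpy as np

rho0, sigma0 = 0.6, 1.0
for n in (50, 200, 800):
    idx = np.arange(n)
    Sigma = sigma0**2 * rho0 ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho0**2)
    gersh = np.max(np.abs(Sigma).sum(axis=1))   # Gerschgorin bound: max absolute row sum
    top = np.linalg.eigvalsh(Sigma)[-1]         # actual largest eigenvalue
    print(n, top, gersh)                        # both stabilize as n grows

Both quantities stabilize (here near \sigma^{2}_{0}(1+\rho_{0})/((1-\rho_{0})(1-\rho^{2}_{0}))), illustrating that the top eigenvalue stays bounded in n.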

Using the same technique involving Hoeffding's bound as for the second term of (S-13.34), it is easy to see that the second term of (S-13.35) satisfies the following:

E_{\pi}\left[P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]\leq\tilde{K}_{3}\exp\left[-\left(\tilde{C}_{3}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.45)

for relevant positive constants \tilde{K}_{3}, \tilde{C}_{3} and \tilde{c}_{5}.

Hence, combining (S-13.34), (S-13.35), (S-13.41), (S-13.42), (S-13.44) and (S-13.45), we obtain

E_{\pi}\left[P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{8}\exp\left[-\left(C_{9}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.46)

for relevant positive constants.

Let us now obtain a bound for E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]. Proceeding as above, we obtain, by first taking the expectation with respect to \sigma^{2}\in\mathcal{G}_{n}, the following:

E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq C_{10}\int_{\rho\in\mathcal{G}_{n}}\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left[-C_{11}\kappa^{2}_{1}n\left(\frac{\rho^{2}}{\sigma^{2}}-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\right]\pi(\sigma^{2})d\sigma^{2}\,\pi(\rho)d\rho+\tilde{C}\exp(-\alpha n)
=C_{10}\int_{\rho\in\mathcal{G}_{n}}\rho^{2}\int_{\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}^{\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\exp\left(-C_{11}\kappa^{2}_{1}nu^{-2}\right)\left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right)du\,\pi(\rho)d\rho+\tilde{C}\exp(-\alpha n), (S-13.47)

for relevant positive constants. Since \pi\left(\sigma^{2}>\exp(2(\alpha n)^{1/16})\right)\leq\exp(-\alpha n), it is evident that much of the mass of \left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right) is concentrated around zero, where the function \exp\left(-C_{11}nu^{-2}\right) is small. To give greater weight to the function, we can replace \left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right) with a mixture function of the form \tilde{\pi}_{\rho^{2},n}(u)=c_{3}\sum_{r=1}^{M_{n}}\rho^{2\zeta_{rn}}\psi^{\zeta_{rn}}_{rn}\exp\left(-u^{2}\psi_{rn}\rho^{2}\right)\left(u^{2}\right)^{\zeta_{rn}-1}\boldsymbol{I}_{B_{n,\rho^{2}}}(u), for positive constants 0<\psi_{2}\leq\psi_{rn}<c_{5}<\infty and 1/2<\zeta_{rn}<c_{4}n^{q}. Here

B_{n,\rho^{2}}=\left[\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}},\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right].

As before, 0<q<1/16 and M_{n}\leq\exp\left((\alpha n)^{1/16}\right). Hence, up to some positive constant,

\int_{\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}^{\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\exp\left(-C_{11}\kappa^{2}_{1}nu^{-2}\right)\left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right)du
\leq\sum_{r=1}^{M_{n}}\rho^{2\zeta_{rn}}\psi^{\zeta_{rn}}_{rn}\int_{\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}^{\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\exp\left[-\left(C_{11}\kappa^{2}_{1}nu^{-2}+\psi_{rn}\rho^{2}u^{2}-(\zeta_{rn}-1)\log u^{2}\right)\right]du. (S-13.48)

The term within the parenthesis in the exponent of (S-13.48) is minimized at \tilde{u}^{2}_{rn}=\frac{\zeta_{rn}-1+\sqrt{(\zeta_{rn}-1)^{2}+4\psi_{rn}\rho^{2}C_{11}\kappa^{2}_{1}n}}{2\psi_{rn}\rho^{2}}. Note that \tilde{C}_{01}\frac{\kappa_{1}}{|\rho|}\sqrt{\frac{n}{\psi_{rn}}}\leq\tilde{u}^{2}_{rn}\leq\tilde{C}_{11}\frac{\kappa_{1}}{|\rho|}\sqrt{\frac{n}{\psi_{rn}}}, for large enough n. Hence, for large n, the term within the parenthesis in the exponent of (S-13.48) exceeds a constant multiple of \psi_{rn}\rho^{2}\tilde{u}^{2}_{rn}\geq\tilde{C}_{02}\times|\rho|\kappa_{1}\sqrt{\psi_{rn}n}, for some \tilde{C}_{02}>0. Thus, (S-13.48) is bounded above by a constant times \rho^{2(1+\zeta_{rn})}\exp\left(-\tilde{C}_{02}\times\kappa_{1}|\rho|\sqrt{\psi_{6}n}+3(\alpha n)^{1/16}+\tilde{c}_{5}n^{q}\right). Combining this with (S-13.47) we see that

E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq\int_{\rho\in\mathcal{G}_{n}}\rho^{2(2+\zeta_{rn})}\exp\left[-\left(\tilde{C}_{02}\times\kappa_{1}|\rho|\sqrt{\psi_{6}n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]\pi(\rho)d\rho+\tilde{C}\exp(-\alpha n)
=\int_{\exp\left(-(\alpha n)^{1/16}\right)}^{\exp\left((\alpha n)^{1/16}\right)}\exp\left[-\left(\tilde{C}_{02}\times\kappa_{1}u^{-1}\sqrt{\psi_{6}n}+2(2+\zeta_{rn})\log u-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]\pi_{1}(u)du
+\tilde{C}\exp(-\alpha n), (S-13.49)

where \pi_{1}(u)du is the appropriate modification of \pi(\rho)d\rho in view of the transformation |\rho|\mapsto u^{-1}. Replacing \pi_{1}(u) with a mixture function of the form \tilde{\pi}_{n}(u)=c_{3}\sum_{r=1}^{M_{n}}\psi^{\tilde{\zeta}_{rn}}_{rn}\exp\left(-u\psi_{rn}\right)u^{\tilde{\zeta}_{rn}-1}, for positive constants 0<\psi_{2}\leq\psi_{rn}<\tilde{c}_{5}<\infty and 0<\tilde{\zeta}_{rn}<c_{4}n^{q}, with 0<q<1/16, and M_{n}\leq\exp\left((\alpha n)^{1/16}\right), and applying the same techniques as before, we see from (S-13.49) that

E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq C_{14}\exp\left(3(\alpha n)^{1/16}+\tilde{c}_{5}n^{q}\right)
\times\sum_{r=1}^{M_{n}}\psi^{\tilde{\zeta}_{rn}}_{rn}\int_{\exp\left(-(\alpha n)^{1/16}\right)}^{\exp\left((\alpha n)^{1/16}\right)}\exp\left[-\left(\tilde{C}_{02}\times\kappa_{1}u^{-1}\sqrt{\psi_{6}n}+u\psi_{rn}-(\tilde{\zeta}_{rn}-2\zeta_{rn}-5)\log u\right)\right]du
+\tilde{C}\exp(-\alpha n)
\leq C_{14}\exp\left[-\left(C_{15}\sqrt{\kappa_{1}}n^{1/4}-4(\alpha n)^{1/16}-2n^{q}\log\tilde{c}_{5}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.50)

for relevant positive constants.

Let us now deal with \frac{1}{2\sigma^{2}}\left(\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}-c(\boldsymbol{\beta})\right)=\frac{1}{2\sigma^{2}}\left(\frac{\sum_{t=1}^{n}\left(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}\right)^{2}}{n}-c(\boldsymbol{\beta})\right). Again, we assume as before that \boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}; t=1,2,\ldots,n, is a realization from some independent zero-mean process with variance c(\boldsymbol{\beta}). Note that |\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}|\leq\sum_{i=1}^{m}|z_{it}||\beta_{i}|=\sum_{i=1}^{m}|z_{it}||\gamma_{i}||\tilde{\eta}_{i}|\leq\underset{t\geq 1}{\sup}~\|z_{t}\|\|\tilde{\eta}\|\sum_{i=1}^{m}|\gamma_{i}|, and by (B1), \underset{t\geq 1}{\sup}~\|z_{t}\|<\infty. Recalling that \tilde{\gamma}_{m}=\sum_{i=1}^{m}|\gamma_{i}| and using Hoeffding's inequality in conjunction with (8.9), we obtain

P\left(\frac{1}{2\sigma^{2}}\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})^{2}}{n}-c(\boldsymbol{\beta})\right|>\kappa_{1}\right)<2\exp\left(-\frac{n\kappa^{2}_{1}\sigma^{4}}{C^{2}\tilde{\gamma}^{4}_{m}\|\tilde{\eta}\|^{4}}\right). (S-13.51)

Then, first integrating with respect to u=\sigma^{-2}, then with respect to v=\|\tilde{\eta}\|, and finally with respect to w=\tilde{\gamma}_{m}, in each case using the gamma mixture form \tilde{\pi}_{n}(x)=c_{3}\sum_{r=1}^{M_{n}}\psi^{\tilde{\zeta}_{rn}}_{rn}\exp\left(-x\psi_{rn}\right)x^{\tilde{\zeta}_{rn}-1}, for positive constants 0<\psi_{2}\leq\psi_{rn}<\tilde{c}_{5}<\infty and 0<\tilde{\zeta}_{rn}<c_{4}n^{q}, with 0<q<1/16, and M_{n}\leq\exp\left((\alpha n)^{1/16}\right), we find that

E_{\pi}\left[P\left(\frac{1}{2\sigma^{2}}\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})^{2}}{n}-c(\boldsymbol{\beta})\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]\leq K_{9}\exp\left[-\left(C_{16}\kappa^{1/4}_{1}\left(n\psi_{7}\right)^{1/8}-C_{17}(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.52)

for relevant positive constants. It is also easy to see, using Hoeffding's inequality together with (8.9), that

E_{\pi}\left[P\left(\frac{1}{2\sigma^{2}_{0}}\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0})^{2}}{n}-c(\boldsymbol{\beta}_{0})\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]\leq\tilde{K}_{9}\exp\left(-\tilde{C}_{16}\kappa^{2}_{1}n\right), (S-13.53)

for relevant constants.

We next consider P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right). Note that

P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{2|\rho_{0}|}\right) (S-13.54)
+P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right|>\frac{\kappa_{1}}{2}\right). (S-13.55)

Note that the expectation of (S-13.54) admits the same upper bound as (S-13.50). To deal with (S-13.55), we let \tilde{x}_{t}=(\boldsymbol{z}^{\prime}_{t}\boldsymbol{\beta}_{0})x_{t-1} and \tilde{\mathbf{x}}_{n}=(\tilde{x}_{1},\ldots,\tilde{x}_{n})^{\prime}. Then \tilde{\mathbf{x}}_{n}\sim N_{n}\left(\tilde{\boldsymbol{\mu}}_{n},\tilde{\boldsymbol{\Sigma}}_{n}\right), where \tilde{\boldsymbol{\mu}}_{n} and \tilde{\boldsymbol{\Sigma}}_{n}=\tilde{\boldsymbol{C}}_{n}\tilde{\boldsymbol{C}}^{\prime}_{n} are appropriate modifications of \boldsymbol{\mu}_{n} and \boldsymbol{\Sigma}_{n}=\boldsymbol{C}_{n}\boldsymbol{C}^{\prime}_{n} associated with (S-13.36). Note that \tilde{\mathbf{x}}_{n}=\tilde{\boldsymbol{\mu}}_{n}+\tilde{\boldsymbol{C}}_{n}\boldsymbol{y}_{n}, where \boldsymbol{y}_{n}\sim N_{n}\left(\boldsymbol{0}_{n},\boldsymbol{I}_{n}\right). Using (8.9) we obtain the same form of the bound for (S-13.55) as (S-13.36). That is, we have

P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right|>\frac{\kappa_{1}}{2}\right)
\leq P\left(\left|\boldsymbol{1}^{\prime}_{n}\tilde{\boldsymbol{C}}_{n}\boldsymbol{y}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)+P\left(\left|\tilde{\boldsymbol{\mu}}^{\prime}_{n}\boldsymbol{1}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)
\leq 2\exp\left(-K_{10}\kappa^{2}_{1}n\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-2}\right)+P\left(\left|\tilde{\boldsymbol{\mu}}^{\prime}_{n}\boldsymbol{1}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right), (S-13.56)

where K_{10} is some positive constant. Using the same method as before, we again obtain a bound for the expectation of the first part of (S-13.56) of the form \exp\left[-\left(\tilde{C}_{16}\sqrt{\kappa_{1}}n^{1/4}-\tilde{C}_{17}(\alpha n)^{1/16}-\tilde{\alpha}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), for relevant positive constants; as before, here 0<q<1/16. For the second part of (S-13.56) we apply the method involving Hoeffding's inequality as before, and obtain a bound of the above-mentioned form. Hence, combining the bounds for the expectations of (S-13.54) and (S-13.55), we see that

E_{\pi}\left[P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{12}\exp\left[-\left(C_{18}\sqrt{\kappa_{1}}n^{1/4}-C_{19}(\alpha n)^{1/16}-\tilde{\alpha}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.57)

for relevant positive constants.

Now let us bound the probability P\left(\left|\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right|>\kappa_{1}\right). Observe that

P\left(\left|\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})x_{t}}{n}-c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})\right|>\frac{\kappa_{1}\sigma^{2}}{2}\right)+P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0})x_{t}}{n}-c(\boldsymbol{\beta}_{0})\right|>\frac{\kappa_{1}\sigma^{2}_{0}}{2}\right). (S-13.58)

Using the Gaussian concentration inequality as before, it is easily seen that

E_{\pi}\left[P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})x_{t}}{n}-c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})\right|>\frac{\kappa_{1}\sigma^{2}}{2}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq 2\int_{\boldsymbol{\gamma}_{m},\tilde{\eta}\in\mathcal{G}_{n}}\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left(-\frac{K_{13}\kappa^{2}_{1}n\sigma^{4}}{\|\boldsymbol{\beta}\|^{2}}\right)d\pi(\boldsymbol{\beta},\sigma^{2})+\tilde{C}\exp(-\alpha n)
\leq C_{20}\exp\left[-\left(C_{21}\sqrt{\kappa_{1}}n^{1/4}-C_{22}(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.59)

for relevant positive constants.

The Gaussian concentration inequality also ensures that the second term of (S-13.58) is bounded above by 2\exp(-K_{13}\kappa^{2}_{1}n), for some K_{13}>0. Combining this with (S-13.58) and (S-13.59) we obtain

E_{\pi}\left[P\left(\left|\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{14}\exp\left[-\left(C_{23}\sqrt{\kappa_{1}}n^{1/4}-C_{24}(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n)+2\exp(-K_{13}\kappa^{2}_{1}n), (S-13.60)

for relevant positive constants; note that here, again, 0<q<1/16.

For P\left(\left|\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right)\right|>\kappa_{1}\right), we note that

P\left(\left|\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})x_{t-1}}{n}\right|>\frac{\kappa_{1}\sigma^{2}}{2|\rho|}\right)+P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0})x_{t-1}}{n}\right|>\frac{\kappa_{1}\sigma^{2}_{0}}{2|\rho_{0}|}\right). (S-13.61)

For the first term of (S-13.61) we apply the Gaussian concentration inequality, followed by taking expectations with respect to \sigma^{2}, |\rho|, \tilde{\gamma}_{m} and \|\tilde{\eta}\|. This yields the bound

K_{15}\exp\left[-\left(C_{25}\kappa^{1/8}_{1}n^{1/16}-C_{26}(\alpha n)^{1/16}-n^{q}\log\tilde{c}_{5}\right)\right]+\tilde{C}\exp(-\alpha n),

for relevant positive constants. The bound for the second term is given by 2\exp(-K_{16}\kappa^{2}_{1}n). Together we thus obtain

E_{\pi}\left[P\left(\left|\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq\tilde{K}_{16}\exp\left[-\left(C_{26}\kappa^{1/8}_{1}n^{1/16}-C_{27}(\alpha n)^{1/16}-n^{q}\log\tilde{c}_{5}\right)\right]+2\exp(-K_{16}\kappa^{2}_{1}n)+\tilde{C}\exp(-\alpha n). (S-13.62)

We now deal with the last term $P\left(\left|\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)$. Recall that $\mathbf{x}_{n}=\boldsymbol{\mu}_{n}+\boldsymbol{C}_{n}\boldsymbol{y}_{n}$, where $\boldsymbol{C}_{n}\boldsymbol{C}^{\prime}_{n}=\boldsymbol{\Sigma}_{n}$ and $\boldsymbol{y}_{n}\sim N_{n}\left(\boldsymbol{0},\boldsymbol{I}_{n}\right)$. Let $\boldsymbol{\epsilon}_{n-1}=(\epsilon_{2},\ldots,\epsilon_{n})^{\prime}$. Then $\sum_{t=1}^{n}\epsilon_{t}x_{t-1}=\boldsymbol{\epsilon}^{\prime}_{n-1}\mathbf{x}_{n-1}=\sigma_{0}\left(\boldsymbol{y}^{\prime}_{n}\boldsymbol{\mu}_{n}+\boldsymbol{y}^{\prime}_{n-1}\boldsymbol{C}_{n-1}\boldsymbol{y}_{n-1}\right)$. Applying the Gaussian concentration inequality and the Hanson-Wright inequality (Rudelson and Vershynin, 2013), we find that

\displaystyle P\left(\left|\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)
\displaystyle\leq P\left(\frac{\left|\boldsymbol{y}^{\prime}_{n}\boldsymbol{\mu}_{n}\right|}{n}>\frac{\kappa_{1}}{\sigma_{0}}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)+P\left(\frac{\boldsymbol{y}^{\prime}_{n-1}\boldsymbol{C}_{n-1}\boldsymbol{y}_{n-1}}{n}>\frac{\kappa_{1}}{\sigma_{0}}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)
\displaystyle\leq K_{17}\exp\left(-K_{18}\kappa^{2}_{1}n\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-2}\right), (S-13.63)

for some positive constants $K_{17}$ and $K_{18}$. Taking expectation of (S-13.63) with respect to $\pi$ we obtain, as before,

E_{\pi}\left[P\left(\left|\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{19}\exp\left[-\left(K_{20}\sqrt{\kappa_{1}}n^{1/4}-K_{21}\left(\alpha n\right)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.64)

for relevant positive constants. Recall that $0<q<1/16$.
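For completeness, we recall the Hanson-Wright inequality in the Gaussian form used for the quadratic term in (S-13.63); see Rudelson and Vershynin (2013). The vector $\boldsymbol{Z}$, matrix $\boldsymbol{A}$ and threshold $t$ below are generic placeholders. For $\boldsymbol{Z}\sim N_{n}\left(\boldsymbol{0},\boldsymbol{I}_{n}\right)$, any $n\times n$ matrix $\boldsymbol{A}$, and all $t>0$,
P\left(\left|\boldsymbol{Z}^{\prime}\boldsymbol{A}\boldsymbol{Z}-E\left[\boldsymbol{Z}^{\prime}\boldsymbol{A}\boldsymbol{Z}\right]\right|>t\right)\leq 2\exp\left[-c\min\left(\frac{t^{2}}{\|\boldsymbol{A}\|_{F}^{2}},\frac{t}{\|\boldsymbol{A}\|}\right)\right],
where $c>0$ is a universal constant, and $\|\boldsymbol{A}\|_{F}$ and $\|\boldsymbol{A}\|$ denote the Frobenius and operator norms, respectively. In (S-13.63) this is applied with $\boldsymbol{A}=\boldsymbol{C}_{n-1}$, while the linear term $\boldsymbol{y}^{\prime}_{n}\boldsymbol{\mu}_{n}$ is handled by Gaussian concentration.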

Combining (S-13.46), (S-13.50), (S-13.52), (S-13.57), (S-13.60), (S-13.62) and (S-13.64), we see that

\sum_{n=1}^{\infty}E_{\pi}\left[P\left(\left|\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\right|>\delta\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]<\infty.

This verifies (8.24) and hence (S6).

S-13.7 Verification of (S7)

Since $\mathcal{G}_{n}\rightarrow\boldsymbol{\Theta}$ as $n\rightarrow\infty$, it follows that for any set $A$ with $\pi(A)>0$, $\mathcal{G}_{n}\cap A\rightarrow\boldsymbol{\Theta}\cap A=A$, as $n\rightarrow\infty$. In our case, $\mathcal{G}_{n}$, and hence $\mathcal{G}_{n}\cap A$, are increasing in $n$, so that $h\left(\mathcal{G}_{n}\cap A\right)$ must be non-increasing in $n$. Moreover, for any $n\geq 1$, $\mathcal{G}_{n}\cap A\subseteq A$, so that $h\left(\mathcal{G}_{n}\cap A\right)\geq h(A)$, for all $n\geq 1$. Hence, continuity of $h$ implies that $h\left(\mathcal{G}_{n}\cap A\right)\rightarrow h(A)$, as $n\rightarrow\infty$, and (S7) is satisfied. The sandwich argument is summarized below.
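Indeed, writing $h(B)$ for the essential infimum of $h$ over a set $B$, as in Shalizi (2009) (we make this notational convention explicit since it is only implicit above), monotonicity of the sets gives
h(A)\leq h\left(\mathcal{G}_{n+1}\cap A\right)\leq h\left(\mathcal{G}_{n}\cap A\right),\quad\text{for all }n\geq 1,
so that $h\left(\mathcal{G}_{n}\cap A\right)$ is a non-increasing sequence bounded below by $h(A)$; continuity of $h$ along $\mathcal{G}_{n}\cap A\uparrow A$ then identifies the limit as $h(A)$.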

Thus (S1)–(S7) are satisfied, so that Shalizi’s result stated in the main manuscript holds. It follows that all the asymptotic results of our main manuscript apply to this multiple testing problem.

References

  • Bogdan, M., Chakrabarti, A., Frommlet, F., and Ghosh, J. K. (2011). Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist., 39(3), 1551–1579.
  • Chandra, N. K. and Bhattacharya, S. (2019). Non-marginal Decisions: A Novel Bayesian Multiple Testing Procedure. Electronic Journal of Statistics, 13(1), 489–535.
  • Chandra, N. K. and Bhattacharya, S. (2020). Asymptotic Theory of Dependent Bayesian Multiple Testing Procedures Under Possible Model Misspecification. arXiv preprint arXiv:1611.01369.
  • Chatterjee, D. and Bhattacharya, S. (2020). Posterior Convergence of Gaussian Process Regression Under Possible Misspecifications. arXiv preprint.
  • Cramer, H. and Leadbetter, M. R. (1967). Stationary and Related Stochastic Processes. Wiley, New York.
  • Datta, J. and Ghosh, J. K. (2013). Asymptotic Properties of Bayes Risk for the Horseshoe Prior. Bayesian Anal., 8(1), 111–132.
  • Fan, J. and Han, X. (2017). Estimation of the false discovery proportion with unknown dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(4), 1143–1164.
  • Fan, J., Han, X., and Gu, W. (2012). Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association, 107(499), 1019–1035.
  • Giraud, C. (2015). Introduction to High-Dimensional Statistics. CRC Press, Boca Raton.
  • Hoeffding, W. (1963). Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 58, 13–30.
  • Lange, K. (2010). Numerical Analysis for Statisticians. Springer, New York.
  • Müller, P., Parmigiani, G., Robert, C., and Rousseau, J. (2004). Optimal sample size for multiple testing: the case of gene expression microarrays. Journal of the American Statistical Association, 99(468), 990–1001.
  • Newey, W. K. (1991). Uniform Convergence in Probability and Stochastic Equicontinuity. Econometrica, 59, 1161–1167.
  • Rudelson, M. and Vershynin, R. (2013). Hanson-Wright Inequality and Sub-Gaussian Concentration. Electronic Communications in Probability, 18, 9.
  • Sarkar, S. K., Zhou, T., and Ghosh, D. (2008). A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statistica Sinica, 18(3), 925–945.
  • Shalizi, C. R. (2009). Dynamics of Bayesian Updating with Dependent Data and Misspecified Models. Electron. J. Statist., 3, 1039–1074.
  • Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Statist., 31(6), 2013–2035.
  • Sun, W. and Cai, T. T. (2009). Large-scale multiple testing under dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 393–424.
  • Xie, J., Cai, T. T., Maris, J., and Li, H. (2011). Optimal false discovery rate control for dependent data. Statistics and its Interface, 4(4), 417.