
High-dimensional Asymptotic Theory of Bayesian Multiple Testing Procedures Under General Dependent Setup and Possible Misspecification

Noirrit Kiran Chandra and Sourabh Bhattacharya Noirrit Kiran Chandra is a postdoctoral researcher at Department of Statistical Science, Duke University, USA, and Sourabh Bhattacharya is an Associate Professor in Interdisciplinary Statistical Research Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108. Corresponding e-mail: noirritchandra@gmail.com.
Abstract

In this article, we investigate the asymptotic properties of Bayesian multiple testing procedures under a general dependent setup, when the sample size and the number of hypotheses both tend to infinity. Specifically, we investigate strong consistency of the procedures and asymptotic properties of different versions of false discovery and false non-discovery rates under the high-dimensional setup. We particularly focus on a novel Bayesian non-marginal multiple testing procedure and its associated error rates in this regard. Our results show that the asymptotic convergence rates of the error rates are directly associated with the Kullback-Leibler divergence from the true model, and that the results hold even when the postulated class of models is misspecified.

For illustration of our high-dimensional asymptotic theory, we consider a Bayesian variable selection problem in a time-varying covariate selection framework, with autoregressive response variables. We particularly focus on the setup where the number of hypotheses increases faster than the sample size, the so-called ultra-high dimensional situation.
MSC 2010 subject classifications: Primary 62F05, 62F15; secondary 62C10, 62J07.
Keywords: Bayesian multiple testing, Dependence, False discovery rate, Kullback-Leibler, Posterior convergence, Ultra high dimension.

1 Introduction

The area of multiple hypothesis testing has gained much importance and popularity, particularly in this era of big data, where often a very large number of hypotheses need to be tested simultaneously. Applications abound in the fields of statistical genetics, spatio-temporal statistics and brain imaging, to name a few. On the theoretical side, it is important to establish the validity of a multiple testing procedure in the sense that the procedure controls the false discovery rate ($FDR$) at some pre-specified level, or attains the oracle, as the number of tests grows to infinity.

Although there is considerable literature addressing these issues, the important factor of dependence among the tests seems to have received less attention. Indeed, realistically, the test statistics or the parameters cannot be expected to be independent. In this regard, Chandra and Bhattacharya (2019) introduced a novel Bayesian multiple testing procedure that coherently accounts for such dependence and yields joint decision rules that are functions of appropriate joint posterior probabilities. As demonstrated in Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020), the new Bayesian method significantly outperforms existing popular multiple testing methods by proper utilization of the dependence structures. Since in the new method the decisions are taken jointly, the method is referred to as the Bayesian non-marginal multiple testing procedure.

Chandra and Bhattacharya (2020) investigated in detail the asymptotic theory of the non-marginal procedure, and indeed of general Bayesian multiple testing methods under additive loss, for a fixed number of hypotheses, when the sample size tends to infinity. In particular, they provided sufficient conditions for strong consistency of such procedures and also showed that the asymptotic convergence rates of the versions of $FDR$ and the false non-discovery rate ($FNR$) are directly related to the Kullback-Leibler (KL) divergence from the true model. Interestingly, their results continue to hold even under misspecification, that is, if the class of postulated models does not include the true model. In this work, we investigate the asymptotic properties of the Bayesian non-marginal procedure in particular, and of Bayesian multiple testing methods under additive loss in general, when the sample size, as well as the number of hypotheses, tend to infinity.

As mentioned earlier, asymptotic works in multiple testing when the number of hypotheses grows to infinity are not rare. Xie et al. (2011) proposed an asymptotically optimal decision rule for short-range dependent data with dependent test statistics. Bogdan et al. (2011) studied the oracle properties and Bayes risk of several multiple testing methods under sparsity in a Bayesian decision-theoretic setup. Datta and Ghosh (2013) studied oracle properties of the horseshoe prior when the number of tests grows to infinity. However, in the aforementioned works, the test statistics are independent and follow the Gaussian distribution. Fan et al. (2012) proposed a method for dealing with correlated test statistics where the covariance structure is known. Their method is based on the principal eigenvalues of the covariance matrix, which they termed principal factors. Using those principal factors, their method dilutes the association between correlated statistics to deal with an arbitrary dependence structure. They also derived an approximately consistent estimator of the false discovery proportion (FDP) in large-scale multiple testing. Fan and Han (2017) extended this work to the case where the dependence structure is unknown. In these approaches, the decision rules are marginal and the test statistics jointly follow the multivariate Gaussian distribution. Chandra and Bhattacharya (2019) argue that when the decision rules corresponding to different hypotheses are marginal, the full potential of the dependence structure is not properly exploited. Results of extensive simulation studies reported in Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020), demonstrating superior performance of the Bayesian non-marginal method compared to popular marginal methods even for large numbers of hypotheses, seem to corroborate this point. This makes asymptotic analysis of the Bayesian non-marginal method with increasing number of hypotheses all the more important.

To be more specific, we investigate the asymptotic theory of the Bayesian non-marginal procedure in the general dependence setup, without any particular model assumption, when the sample size ($n$) and the number of hypotheses ($m_{n}$, which may be a function of $n$) both tend to infinity. We establish strong consistency of the procedure and show that, even in this setup, the convergence rates of versions of $FDR$ and $FNR$ are directly related to the KL-divergence from the true model. We show that our results continue to hold for general Bayesian procedures under the additive loss function. In the Bayesian non-marginal context, we illustrate the theory with the time-varying covariate selection problem, where the number of covariates tends to infinity with the sample size $n$. We distinguish between two setups: the ultra high-dimensional case, where $\frac{m_{n}}{n}\rightarrow\infty$ (or some constant) as $n\rightarrow\infty$, and the high-dimensional but not ultra high-dimensional case, where $m_{n}\rightarrow\infty$ and $\frac{m_{n}}{n}\rightarrow 0$ as $n\rightarrow\infty$. We particularly focus on the ultra high-dimensional setup because of its much more challenging nature.

2 A brief overview of the Bayesian non-marginal procedure

Let $\boldsymbol{X}_{n}=\{X_{1},\ldots,X_{n}\}$ denote the available data set. Suppose the data is modelled by the family of distributions $P_{\boldsymbol{X}_{n}|\boldsymbol{\theta}}$ (which may also be non-parametric). For $M>1$, let us denote by $\boldsymbol{\Theta}=\Theta_{1}\times\cdots\times\Theta_{M}$ the relevant parameter space associated with $\boldsymbol{\theta}=(\theta_{1},\ldots,\theta_{M})$, where we allow $M$ to be infinite as well. Let $P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(\cdot)$ and $E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(\cdot)$ denote the posterior distribution and expectation, respectively, of $\boldsymbol{\theta}$ given $\boldsymbol{X}_{n}$, and let $P_{\boldsymbol{X}_{n}}(\cdot)$ and $E_{\boldsymbol{X}_{n}}(\cdot)$ denote the marginal distribution and expectation of $\boldsymbol{X}_{n}$, respectively. Let us consider the problem of testing $m$ hypotheses simultaneously corresponding to the actual parameters of interest, where $1<m\leq M$.

Without loss of generality, let us consider testing the parameters associated with $\Theta_{i}$; $i=1,\ldots,m$, formalized as:

H_{0i}:\theta_{i}\in\Theta_{0i}\hbox{ versus }H_{1i}:\theta_{i}\in\Theta_{1i},

where $\Theta_{0i}\bigcap\Theta_{1i}=\emptyset$ and $\Theta_{0i}\bigcup\Theta_{1i}=\Theta_{i}$, for $i=1,\ldots,m$.

Let

d_{i}=\begin{cases}1&\text{if the $i$-th hypothesis is rejected;}\\ 0&\text{otherwise;}\end{cases} (2.1)
r_{i}=\begin{cases}1&\text{if $H_{1i}$ is true;}\\ 0&\text{if $H_{0i}$ is true.}\end{cases} (2.2)

Following Chandra and Bhattacharya (2019), we define $G_{i}$ to be the set of hypotheses, including the $i$-th one, which are highly dependent, and define

z_{i}=\begin{cases}1&\mbox{if $H_{d_{j},j}$ is true for all $j\in G_{i}\setminus\{i\}$;}\\ 0&\mbox{otherwise.}\end{cases} (2.3)

If, for any $i\in\{1,\ldots,m\}$, $G_{i}=\{i\}$ is a singleton, then we define $z_{i}=1$. Chandra and Bhattacharya (2019) maximize the posterior expectation of the number of true positives

TP=\sum_{i=1}^{m}d_{i}r_{i}z_{i}, (2.4)

subject to controlling the posterior expectation of the error term

E=\sum_{i=1}^{m}d_{i}(1-r_{i}z_{i}), (2.5)

which is actually the posterior mean of the sum of the three error terms $E_{1}=\sum_{i=1}^{m}d_{i}(1-r_{i})z_{i}$, $E_{2}=\sum_{i=1}^{m}d_{i}(1-r_{i})(1-z_{i})$ and $E_{3}=\sum_{i=1}^{m}d_{i}r_{i}(1-z_{i})$. For a detailed discussion regarding these, see Chandra and Bhattacharya (2019).

It follows that the optimal decision configuration can be obtained by minimizing the function

\xi(\boldsymbol{d})=-\sum_{i=1}^{m}d_{i}E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(r_{i}z_{i})+\lambda_{n}\sum_{i=1}^{m}d_{i}E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(1-r_{i}z_{i})
=-(1+\lambda_{n})\sum_{i=1}^{m}d_{i}\left(w_{in}(\boldsymbol{d})-\frac{\lambda_{n}}{1+\lambda_{n}}\right),

with respect to all possible decision configurations of the form $\boldsymbol{d}=\{d_{1},\ldots,d_{m}\}$, where $\lambda_{n}>0$, and

w_{in}(\boldsymbol{d})=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(r_{i}z_{i})=P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(H_{1i}\cap\left\{\cap_{j\neq i,j\in G_{i}}H_{d_{j},j}\right\}\right), (2.6)

is the posterior probability of the decision configuration $\{d_{1},\ldots,d_{i-1},1,d_{i+1},\ldots,d_{m}\}$ being correct. Letting $\beta_{n}=\lambda_{n}/(1+\lambda_{n})$, one can equivalently maximize

f_{\beta_{n}}(\boldsymbol{d})=\sum_{i=1}^{m}d_{i}\left(w_{in}(\boldsymbol{d})-\beta_{n}\right) (2.7)

with respect to $\boldsymbol{d}$ and obtain the optimal decision configuration.

Definition 1

Let $\mathbb{D}$ be the set of all $m$-dimensional binary vectors denoting all possible decision configurations. Define

\widehat{\boldsymbol{d}}=\operatorname*{argmax}_{\boldsymbol{d}\in\mathbb{D}}f_{\beta}(\boldsymbol{d})

where $0<\beta<1$. Then $\widehat{\boldsymbol{d}}$ is the optimal decision configuration obtained as the solution of the non-marginal multiple testing method.

For a detailed discussion regarding the choice of the $G_{i}$'s in (2.3), see Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020). In particular, Chandra and Bhattacharya (2020) show that asymptotically, the Bayesian non-marginal method is robust with respect to the $G_{i}$'s, in the sense that it is consistent under any choice of the grouping structure. As will be shown in this article, the same holds even in the high-dimensional asymptotic setup.
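To make the optimization in Definition 1 concrete, the following minimal Python sketch estimates $w_{in}(\boldsymbol{d})$ in (2.6) by Monte Carlo averaging over posterior samples and obtains $\widehat{\boldsymbol{d}}$ by exhaustive search. The function name, array layout and the exhaustive search (feasible only for small $m$, since there are $2^{m}$ configurations) are our own illustrative assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

def optimal_decision(r_samples, groups, beta):
    """Exhaustive maximization of f_beta(d) in (2.7).

    r_samples : (S, m) binary array; r_samples[s, i] = 1 if H_{1i} holds
                under the s-th posterior draw of theta.
    groups    : list of sets; groups[i] = G_i (contains i).
    beta      : penalizing constant in (0, 1).
    """
    S, m = r_samples.shape
    best_d, best_f = None, -np.inf
    for d_tuple in itertools.product([0, 1], repeat=m):
        d = np.array(d_tuple)
        f = 0.0
        for i in np.flatnonzero(d):
            # z_i = 1 iff the decisions d_j are correct for all j in G_i \ {i}
            others = sorted(groups[i] - {i})
            z = np.all(r_samples[:, others] == d[others], axis=1)
            # Monte Carlo estimate of w_{in}(d) = E(r_i z_i | X_n), cf. (2.6)
            f += np.mean(r_samples[:, i] * z) - beta
        if f > best_f:
            best_d, best_f = d, f
    return best_d
```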

2.1 Error measures in multiple testing

Storey (2003) advocated the positive False Discovery Rate ($pFDR$) as a measure of Type-I error in multiple testing. Let $\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n})$ be the probability of choosing $\boldsymbol{d}$ as the optimal decision configuration given data $\boldsymbol{X}_{n}$, when a multiple testing method $\mathcal{M}$ is employed. Then $pFDR$ is defined as:

pFDR=E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-r_{i})}{\sum_{i=1}^{m}d_{i}}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n})\bigg{|}\delta_{\mathcal{M}}(\boldsymbol{d}=\mathbf{0}|\boldsymbol{X}_{n})=0\right]. (2.8)

Analogous to the Type-II error, the positive False Non-discovery Rate ($pFNR$) is defined as

pFNR=E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})r_{i}}{\sum_{i=1}^{m}(1-d_{i})}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\bigg{|}\delta_{\mathcal{M}}\left(\boldsymbol{d}=\boldsymbol{1}|\boldsymbol{X}_{n}\right)=0\right]. (2.9)

Under prior $\pi(\cdot)$, Sarkar et al. (2008) defined the posterior $FDR$ and $FNR$, given as follows:

posterior~FDR=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-r_{i})}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-v_{in})}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}); (2.10)
posterior~FNR=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})r_{i}}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})v_{in}}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}), (2.11)

where $v_{in}=P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(\Theta_{1i})$. Also, under any non-randomized decision rule $\mathcal{M}$, $\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n})$ is either 1 or 0, depending on the data $\boldsymbol{X}_{n}$. Given $\boldsymbol{X}_{n}$, we denote these posterior error measures by $FDR_{\boldsymbol{X}_{n}}$ and $FNR_{\boldsymbol{X}_{n}}$, respectively.
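For a non-randomized rule, $\delta_{\mathcal{M}}$ selects a single configuration $\boldsymbol{d}$, so (2.10) and (2.11) reduce to simple averages over that configuration. A minimal sketch, assuming the marginal posterior probabilities $v_{in}$ have already been estimated (the function name and inputs are our own):

```python
import numpy as np

def posterior_fdr_fnr(d, v):
    """FDR_{X_n} and FNR_{X_n} of (2.10)-(2.11) for a fixed decision d,
    where v[i] estimates v_in = P(theta_i in Theta_{1i} | X_n)."""
    d, v = np.asarray(d), np.asarray(v)
    fdr = np.sum(d * (1.0 - v)) / max(d.sum(), 1)      # denominator: sum(d_i) v 1
    fnr = np.sum((1 - d) * v) / max((1 - d).sum(), 1)  # denominator: sum(1-d_i) v 1
    return fdr, fnr
```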

With respect to the new notions of errors in (2.4) and (2.5), Chandra and Bhattacharya (2019) modified $FDR_{\boldsymbol{X}_{n}}$ as

modified~FDR_{\boldsymbol{X}_{n}}=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-r_{i}z_{i})}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}d_{i}(1-w_{in}(\boldsymbol{d}))}{\sum_{i=1}^{m}d_{i}\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}), (2.12)

and $FNR_{\boldsymbol{X}_{n}}$ as

modified~FNR_{\boldsymbol{X}_{n}}=E_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})r_{i}z_{i}}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}\left(\boldsymbol{d}|\boldsymbol{X}_{n}\right)\right]=\sum_{\boldsymbol{d}\in\mathbb{D}}\frac{\sum_{i=1}^{m}(1-d_{i})w_{in}(\boldsymbol{d})}{\sum_{i=1}^{m}(1-d_{i})\vee 1}\delta_{\mathcal{M}}(\boldsymbol{d}|\boldsymbol{X}_{n}). (2.13)

We denote $modified~FDR_{\boldsymbol{X}_{n}}$ and $modified~FNR_{\boldsymbol{X}_{n}}$ by $mFDR_{\boldsymbol{X}_{n}}$ and $mFNR_{\boldsymbol{X}_{n}}$, respectively. Notably, the expectations of $FDR_{\boldsymbol{X}_{n}}$ and $FNR_{\boldsymbol{X}_{n}}$ with respect to $\boldsymbol{X}_{n}$, conditional on their respective denominators being positive, yield the positive Bayesian $FDR$ ($pBFDR$) and $FNR$ ($pBFNR$), respectively. The same expectations of $mFDR_{\boldsymbol{X}_{n}}$ and $mFNR_{\boldsymbol{X}_{n}}$ yield the modified positive $BFDR$ ($mpBFDR$) and modified positive $BFNR$ ($mpBFNR$), respectively.

Müller et al. (2004) (see also Sun and Cai, 2009; Xie et al., 2011) considered the following additive loss function:

L(\boldsymbol{d},\boldsymbol{\theta})=c\sum_{i=1}^{m}d_{i}(1-r_{i})+\sum_{i=1}^{m}(1-d_{i})r_{i}, (2.14)

where $c$ is a positive constant. The decision rule that minimizes the posterior risk of the above loss is $d_{i}=I\left(v_{i}>\frac{c}{1+c}\right)$ for all $i=1,\ldots,m$, where $I(\cdot)$ is the indicator function. Observe that the non-marginal method boils down to this additive loss function based approach when $G_{i}=\{i\}$, that is, when the information regarding dependence between hypotheses is not available or is overlooked. Hence, the convergence properties of the additive loss function based methods can be easily derived from our theories.
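In code, the additive-loss Bayes rule is a simple thresholding of the marginal posterior probabilities; a sketch under the same assumptions as above:

```python
import numpy as np

def additive_loss_decision(v, c=1.0):
    """Bayes rule for the additive loss (2.14): reject H_{0i} iff v_i > c/(1+c)."""
    return (np.asarray(v) > c / (1.0 + c)).astype(int)
```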

Note that multiple testing problems can be regarded as model selection problems where the task is to choose the correct specification for the parameters under consideration. The model is misspecified even if one decision is taken incorrectly. Under quite general conditions, Shalizi (2009) investigated asymptotic behaviour of misspecified models. We adopt his basic assumptions and some of his convergence results to build a general asymptotic theory for our Bayesian non-marginal multiple testing method in high dimensions.

In Section 3, we provide the setup, assumptions and the main result of Shalizi (2009) which we adopt for our purpose. In Section 4 we address consistency of the Bayesian non-marginal method and convergence of the associated error terms in the high-dimensional setup. High-dimensional asymptotic analyses of versions of $FDR$ and $FNR$ are detailed in Sections 5 and 6, respectively. In Section 7, we establish the high-dimensional asymptotic theory for $FNR_{\boldsymbol{X}_{n}}$ and $BFNR$ when versions of $BFDR$ are $\alpha$-controlled asymptotically. We illustrate the asymptotic properties of the non-marginal method in a multiple testing setup associated with an autoregressive model involving time-varying covariates in Section 8, in high-dimensional contexts. Finally, in Section 9 we summarize our contributions and provide concluding remarks.

3 Preliminaries for ensuring posterior convergence under general setup

Following Shalizi (2009), we consider a probability space $(\Omega,\mathcal{F},P)$ and a sequence of random variables $X_{1},X_{2},\ldots$, taking values in some measurable space $(\Xi,\mathcal{X})$, whose infinite-dimensional distribution is $P$. The natural filtration of this process is $\sigma(\boldsymbol{X}_{n})$.

We denote the distributions of processes adapted to $\sigma(\boldsymbol{X}_{n})$ by $P_{\boldsymbol{X}_{n}|\boldsymbol{\theta}}$, where $\boldsymbol{\theta}$ is associated with a measurable space $(\boldsymbol{\Theta},\mathcal{T})$ and is generally infinite-dimensional. For the sake of convenience, we assume, as in Shalizi (2009), that $P$ and all the $P_{\boldsymbol{X}_{n}|\boldsymbol{\theta}}$ are dominated by a common reference measure, with respective densities $p$ and $f_{\boldsymbol{\theta}}$. The usual assumptions that $P\in\boldsymbol{\Theta}$, or even that $P$ lies in the support of the prior on $\boldsymbol{\Theta}$, are not required for Shalizi's result, rendering it very general indeed. We put the prior distribution $\pi(\cdot)$ on the parameter space $\boldsymbol{\Theta}$.

3.1 Assumptions and theorem of Shalizi

  1. (S1)

    Consider the following likelihood ratio:

R_{n}(\boldsymbol{\theta})=\frac{f_{\boldsymbol{\theta}}(\boldsymbol{X}_{n})}{p(\boldsymbol{X}_{n})}. (3.1)

Assume that $R_{n}(\boldsymbol{\theta})$ is $\sigma(\boldsymbol{X}_{n})\times\mathcal{T}$-measurable for all $n>0$.

  2. (S2)

For each $\boldsymbol{\theta}\in\Theta$, the generalized or relative asymptotic equipartition property holds, and so, almost surely,

\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}\log R_{n}(\boldsymbol{\theta})=-h(\boldsymbol{\theta}),

where $h(\boldsymbol{\theta})$ is given in (S3) below.

  3. (S3)

For every $\boldsymbol{\theta}\in\Theta$, the KL-divergence rate

h(\boldsymbol{\theta})=\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}E\left(\log\frac{p(\boldsymbol{X}_{n})}{f_{\boldsymbol{\theta}}(\boldsymbol{X}_{n})}\right) (3.2)

exists (possibly being infinite) and is $\mathcal{T}$-measurable.

  4. (S4)

Let $I=\left\{\boldsymbol{\theta}:h(\boldsymbol{\theta})=\infty\right\}$. The prior $\pi$ satisfies $\pi(I)<1$.

Following the notation of Shalizi (2009), for $A\subseteq\Theta$, let

h\left(A\right)=\underset{\boldsymbol{\theta}\in A}{\mbox{ess~inf}}~h(\boldsymbol{\theta}); (3.3)
J(\boldsymbol{\theta})=h(\boldsymbol{\theta})-h(\Theta); (3.4)
J(A)=\underset{\boldsymbol{\theta}\in A}{\mbox{ess~inf}}~J(\boldsymbol{\theta}). (3.5)
  5. (S5)

There exists a sequence of sets $\mathcal{G}_{n}\rightarrow\Theta$ as $n\rightarrow\infty$ such that:

    1. (1)
\pi\left(\mathcal{G}_{n}\right)\geq 1-\alpha\exp\left(-\varsigma n\right),~\mbox{for some}~\alpha>0,~\varsigma>2h(\Theta); (3.6)
    2. (2)

The convergence in (S3) is uniform in $\theta$ over $\mathcal{G}_{n}\setminus I$.

    3. (3)

$h\left(\mathcal{G}_{n}\right)\rightarrow h\left(\Theta\right)$, as $n\rightarrow\infty$.

For each measurable $A\subseteq\Theta$ and for every $\delta>0$, there exists a random natural number $\tau(A,\delta)$ such that

n^{-1}\log\int_{A}R_{n}(\boldsymbol{\theta})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}\leq\delta+\underset{n\rightarrow\infty}{\limsup}~n^{-1}\log\int_{A}R_{n}(\boldsymbol{\theta})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}, (3.7)

for all $n>\tau(A,\delta)$, provided $\underset{n\rightarrow\infty}{\limsup}~n^{-1}\log\pi\left(\mathbb{I}_{A}R_{n}\right)<\infty$. Regarding this, the following assumption has been made by Shalizi:

  6. (S6)

The sets $\mathcal{G}_{n}$ of (S5) can be chosen such that for every $\delta>0$, the inequality $n>\tau(\mathcal{G}_{n},\delta)$ holds almost surely for all sufficiently large $n$.

  7. (S7)

The sets $\mathcal{G}_{n}$ of (S5) and (S6) can be chosen such that, for any set $A$ with $\pi(A)>0$,

h\left(\mathcal{G}_{n}\cap A\right)\rightarrow h\left(A\right), (3.8)

as $n\rightarrow\infty$.

Under the above assumptions, the following version of the theorem of Shalizi (2009) can be seen to hold.

Theorem 2 (Shalizi, 2009)

Consider assumptions (S1)-(S7) and any set $A\in\mathcal{T}$ with $\pi(A)>0$. If $\varsigma>2h(A)$, where $\varsigma$ is given in (3.6) under assumption (S5), then

\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}\log P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}(A|\boldsymbol{X}_{n})=-J(A). (3.9)

We shall make frequent use of this theorem. Throughout this article, we establish consistency results for general models satisfying (S1)-(S7); all our results assume these conditions.
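As a quick illustration of Theorem 2 (a worked example of our own, not taken from Shalizi (2009)), suppose $X_{1},X_{2},\ldots$ are i.i.d. $N(\theta_{0},1)$ and the postulated model is $N(\theta,1)$, $\theta\in\Theta=\mathbb{R}$, with a prior of full support. Then

h(\theta)=\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}E\left(\log\frac{p(\boldsymbol{X}_{n})}{f_{\theta}(\boldsymbol{X}_{n})}\right)=\frac{(\theta-\theta_{0})^{2}}{2},\qquad h(\Theta)=0,\qquad J(\theta)=\frac{(\theta-\theta_{0})^{2}}{2}.

For $A=\{\theta:|\theta-\theta_{0}|\geq\epsilon\}$ we have $J(A)=\epsilon^{2}/2$, so (3.9) implies that the posterior probability of $A$ decays like $e^{-n\epsilon^{2}/2}$: posterior mass outside any neighbourhood of the KL-minimizer vanishes exponentially fast, with the KL-divergence rate as the exponent.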

4 Consistency of multiple testing procedures when the number of hypotheses tends to infinity

In this section we show that the non-marginal procedure is asymptotically consistent under any general dependency model satisfying the conditions in Section 3.1. Since one of our main goals is to allow for misspecification, we must define consistency of multiple testing methods encompassing misspecification, while also allowing for $m_{n}$ hypotheses where $m_{n}/n\rightarrow c$, with $c\geq 0$ or $c=\infty$. We formalize this below by introducing appropriate notions.

4.1 Consistency of multiple testing procedures under misspecification

Let $\boldsymbol{\Theta}^{\infty}$ be the infinite-dimensional parameter space of the countably infinite set of parameters $\{\theta_{1},\theta_{2},\ldots\}$. In this case, any decision configuration $\boldsymbol{d}$ is also an infinite-dimensional vector of 0's and 1's. Define $\boldsymbol{\Theta}^{t}=\otimes_{i=1}^{\infty}\boldsymbol{\Theta}_{d_{i}^{t},i}$, where "$\otimes$" denotes the cartesian product, and $\boldsymbol{d}^{t}=(d^{t}_{1},d^{t}_{2},\ldots)$ denotes the actual infinite-dimensional decision configuration satisfying $J\left(\boldsymbol{\Theta}^{t}\right)=J\left(\boldsymbol{\Theta}^{\infty}\right)$. This definition of $\boldsymbol{d}^{t}$ accounts for misspecification in the sense that $\boldsymbol{d}^{t}$ is the minimizer of the KL-divergence from the true data-generating model. For any decision $\boldsymbol{d}$, let $\boldsymbol{d}(m_{n})$ denote the first $m_{n}$ components of $\boldsymbol{d}$. Let $\mathbb{D}_{m_{n}}$ denote the set of all possible decision configurations corresponding to $m_{n}$ hypotheses. With the aforementioned notions, we now define consistency of multiple testing procedures.

Definition 3

Let $\boldsymbol{d}^{t}(m_{n})$ be the true decision configuration among all possible decision configurations in $\mathbb{D}_{m_{n}}$. Then a multiple testing method $\mathcal{M}$ is said to be asymptotically consistent if, almost surely,

\lim_{n\rightarrow\infty}\delta_{\mathcal{M}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})=1. (4.1)

Recall the constant $\beta_{n}$ in (2.7), which is the penalizing constant between the error $E$ and the true positives $TP$. For consistency of the non-marginal procedure, we need certain conditions on $\beta_{n}$, which we state below. These conditions will also play important roles in the asymptotic studies of the different versions of $FDR$ and $FNR$ that we consider.

  1. (A1)

We assume that the sequence $\beta_{n}$ is neither too small nor too large, that is,

\underline{\beta}=\underset{n\geq 1}{\liminf}~\beta_{n}>0; (4.2)
\overline{\beta}=\underset{n\geq 1}{\limsup}~\beta_{n}<1. (4.3)
  2. (A2)

We assume that neither all the null hypotheses are true, nor all of them false, among the $m_{n}$ hypotheses being considered; that is, $\boldsymbol{d}^{t}(m_{n})\neq\boldsymbol{0}$ and $\boldsymbol{d}^{t}(m_{n})\neq\boldsymbol{1}$, where $\boldsymbol{0}$ and $\boldsymbol{1}$ are vectors of 0's and 1's, respectively.

Condition (A1) is necessary for the asymptotic consistency of both the non-marginal method and the additive loss function based method. It ensures that the penalizing constant is asymptotically bounded away from 0 and 1, that is, neither too small nor too large. Notably, (A2) is not required for the consistency results. The role of (A2) is to ensure that the denominator terms in the multiple testing error measures (defined in Section 2.1) do not become 0.

4.2 Main results on consistency in the infinite-dimensional setup

In this section we investigate the asymptotic properties of the Bayesian non-marginal method and that of Müller et al. (2004) when $m_{n}/n$ tends to infinity or some positive constant. It is to be noted that result (3.9) of Shalizi (2009) holds even for infinite-dimensional parameter spaces. Exploiting this fact, we derive the results of this section.

Note that if there exists a value $\boldsymbol{\theta}^{t}$ of $\boldsymbol{\theta}$ that minimizes the KL-divergence, then $\boldsymbol{\theta}^{t}$ is in the set $\boldsymbol{\Theta}^{t}$. Let us denote by $\boldsymbol{\Theta}^{tc}$ the complement of $\boldsymbol{\Theta}^{t}$. Observe that if $\boldsymbol{\theta}^{t}$ lies in the interior of $\boldsymbol{\Theta}^{t}$, then $J\left(\boldsymbol{\Theta}^{tc}\right)>0$. It then holds that

\lim_{n\rightarrow\infty}\frac{1}{n}\log P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)=-J\left(\boldsymbol{\Theta}^{tc}\right), (4.4)

which implies that for any $\epsilon>0$, there exists $n_{0}(\epsilon)$ such that for all $n>n_{0}(\epsilon)$,

\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)+\epsilon\right)\right]<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)<\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)-\epsilon\right)\right] (4.5)
\Rightarrow 1-\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)-\epsilon\right)\right]<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{t}\right)<1-\exp\left[-n\left(J\left(\boldsymbol{\Theta}^{tc}\right)+\epsilon\right)\right]. (4.6)

For notational convenience, we shall henceforth denote $J\left(\boldsymbol{\Theta}^{tc}\right)$ by $J$.

Note that the groups $G_{i}$ also depend upon $m_{n}$ in our setup; hence, we denote them by $G_{i,m_{n}}$. For any decision configuration $\boldsymbol{d}(m_{n})$ and group $G_{m_{n}}$, let $\boldsymbol{d}_{G_{m_{n}}}=\{d_{j}:j\in G_{m_{n}}\}$. Define

\mathbb{D}_{i,m_{n}}=\left\{\boldsymbol{d}(m_{n}):~\mbox{all decisions in}~\boldsymbol{d}_{G_{i,m_{n}}}~\mbox{are correct}\right\}.

Here $\mathbb{D}_{i,m_{n}}$ is the set of all decision configurations in which at least the decisions corresponding to the hypotheses in $G_{i,m_{n}}$ are correct. Clearly $\mathbb{D}_{i,m_{n}}$ contains $\boldsymbol{d}^{t}(m_{n})$ for all $i=1,2,\ldots,m_{n}$.

Hence, $\mathbb{D}^{c}_{i,m_{n}}=\left\{\boldsymbol{d}(m_{n}):~\mbox{at least one decision in}~\boldsymbol{d}_{G_{i,m_{n}}}~\mbox{is incorrect}\right\}$. Observe that if $\boldsymbol{d}(m_{n})\in\mathbb{D}^{c}_{i,m_{n}}$, at least one decision is wrong corresponding to some parameter in $G_{i,m_{n}}$. As $P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)$ is the posterior probability of at least one wrong decision in the infinite-dimensional parameter space, we have

w_{in}(\boldsymbol{d}(m_{n}))\leq w_{in}(\boldsymbol{d})<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)<\exp\left[-n\left(J-\epsilon\right)\right]. (4.7)

Also, if $H_{0i}$ is true, then

v_{in}\leq w_{in}(\boldsymbol{d})<P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{tc}\right)<\exp\left[-n\left(J-\epsilon\right)\right]. (4.8)

Similarly, for $\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}$ and for false $H_{0i}$,

w_{in}(\boldsymbol{d}(m_{n}))\geq w_{in}(\boldsymbol{d}^{t})>P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{t}\right)>1-\exp\left[-n\left(J-\epsilon\right)\right]; (4.9)
v_{in}\geq w_{in}(\boldsymbol{d}^{t})>P_{\boldsymbol{\theta}|\boldsymbol{X}_{n}}\left(\boldsymbol{\Theta}^{t}\right)>1-\exp\left[-n\left(J-\epsilon\right)\right]. (4.10)

It is important to note that the inequalities (4.7)-(4.10) hold for all $n>n_{0}$, and this $n_{0}$ is the same for all $i$, thanks to the validity of Shalizi's result in the infinite-dimensional parameter space. Exploiting the properties of Shalizi's theorem, we will now establish consistency of the Bayesian non-marginal method for an increasing number of hypotheses.

Theorem 4

Let $\delta_{\mathcal{NM}}$ denote the decision rule corresponding to the Bayesian non-marginal procedure for $m_{n}$ hypotheses being tested using samples of size $n$, where $m_{n}\rightarrow\infty$ as $n\rightarrow\infty$. Assume Shalizi's conditions and assumption (A1). Also assume that $J\left(\boldsymbol{\Theta}^{tc}\right)>0$. Then,

\lim_{n\rightarrow\infty}\delta_{\mathcal{NM}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})=1,~\mbox{almost surely, and} (4.11)
\lim_{n\rightarrow\infty}E\left[\delta_{\mathcal{NM}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})\right]=1. (4.12)
Corollary 5

Assuming condition (A1), the optimal decision rule corresponding to the additive loss function (2.14) is asymptotically consistent. The proof follows in the same way as that of Theorem 4 using (4.8) and (4.10).

Remark 6

Note that Theorem 4 does not require any condition regarding the growth of $m_{n}$ with respect to $n$, and holds if $m_{n}/n\rightarrow c$ as $n\rightarrow\infty$, where $c\geq 0$ is some constant or infinity. Thus, the result seems to be extremely satisfactory. However, restrictions on the growth of $m_{n}$ generally need to be imposed to satisfy the conditions of Shalizi. An illustration in this regard is provided in Section 8.

5 High-dimensional asymptotic analyses of versions of $FDR$

For a fixed number of hypotheses $m$, Chandra and Bhattacharya (2020) investigated convergence of different versions of $FDR$ as the sample size $n$ tends to infinity. They show that the convergence rates of the posterior error measures $mFDR_{\boldsymbol{X}_{n}}$ and $FDR_{\boldsymbol{X}_{n}}$ are directly associated with the KL-divergence from the true model. Indeed, they were able to obtain the exact limits of $\frac{1}{n}\log mFDR_{\boldsymbol{X}_{n}}$ and $\frac{1}{n}\log FDR_{\boldsymbol{X}_{n}}$ in terms of the relevant $m$-dimensional KL-divergence rate.

In the current high-dimensional setup, however, such an exact KL-divergence rate cannot be expected to be available, since the number of hypotheses $m_{n}$ is not fixed. As $m_{n}\rightarrow\infty$, it is plausible to expect that the convergence rates depend upon the infinite-dimensional KL-divergence $J$. We show that this is indeed the case, but the exact limit is not available, which is again to be expected, since $m_{n}$ approaches infinity but is never actually infinite. Here, in the high-dimensional setup, we obtain $-J$ as an upper bound of the limit suprema. It is easy to observe that the limits in the finite-dimensional setup are bounded above by $-J$, thus providing evidence of internal consistency as we move from the fixed-dimensional to the high-dimensional setup.

We also show that $mpBFDR$ and $pBFDR$ approach zero, even though the rates of convergence are not available. Recall that even in the fixed-dimensional setup, the convergence rates of $mpBFDR$ and $pBFDR$ were not available. As in the consistency result, these results too do not require any restriction on the growth rate of $m_{n}$, except that required for Shalizi's conditions to hold.

We present our results below; the proofs are provided in the supplement.

Theorem 7

Assume the setup and conditions of Theorem 4. Then, for any $\epsilon>0$, there exists $n_{0}(\epsilon)\geq 1$ such that for $n\geq n_{0}(\epsilon)$, the following hold almost surely:

mFDR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}; (5.1)
FDR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}. (5.2)

The above theorem shows that $mFDR_{\boldsymbol{X}_{n}}$ and $FDR_{\boldsymbol{X}_{n}}$ converge to 0 at an exponential rate for arbitrarily large numbers of hypotheses, for an arbitrary growth rate of $m_{n}$ with respect to $n$. However, Shalizi's conditions would again require restrictions on the growth rate of $m_{n}$.

Corollary 8

Under the setup and assumptions of Theorem 4,

\limsup_{n\rightarrow\infty}\frac{1}{n}\log mFDR_{\boldsymbol{X}_{n}}\leq-J; (5.3)
\limsup_{n\rightarrow\infty}\frac{1}{n}\log FDR_{\boldsymbol{X}_{n}}\leq-J. (5.4)
Theorem 9

Assume the setup and conditions of Theorem 4, along with assumption (A2). Then

\lim_{n\rightarrow\infty}mpBFDR=0; (5.5)
\lim_{n\rightarrow\infty}pBFDR=0. (5.6)

6 High-dimensional asymptotic analyses of versions of $FNR$

High-dimensional asymptotic treatments of versions of $FNR$ are similar to those for versions of $FDR$. In particular, the limit suprema of both $\frac{1}{n}\log mFNR_{\boldsymbol{X}_{n}}$ and $\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}$ are bounded above by $-J$, and both $mpBFNR$ and $pBFNR$ converge to zero. The proofs of these results are also similar to those for the respective $FDR$ versions. Internal consistency of these results is again evident, as the limits of $\frac{1}{n}\log mFNR_{\boldsymbol{X}_{n}}$ and $\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}$ in the finite-dimensional setup are bounded above by $-J$, and $mpBFNR$ and $pBFNR$ converge to zero for a fixed number of hypotheses. In the latter cases, convergence rates are not available in either the fixed-dimensional or the high-dimensional case. Below we provide the relevant results on versions of $FNR$, with proofs in the supplement.

Theorem 10

Assume the setup and conditions of Theorem 4. Then, for any $\epsilon>0$, there exists $n_{0}(\epsilon)\geq 1$ such that for $n\geq n_{0}(\epsilon)$, the following hold almost surely:

mFNR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}; (6.1)
FNR_{\boldsymbol{X}_{n}}\leq e^{-n(J-\epsilon)}. (6.2)

The above theorem shows that $mFNR_{\boldsymbol{X}_{n}}$ and $FNR_{\boldsymbol{X}_{n}}$ converge to 0 at an exponential rate for arbitrarily large numbers of hypotheses, for an arbitrary growth rate of $m_{n}$ with respect to $n$. However, Shalizi's conditions would again require restrictions on the growth rate of $m_{n}$.

Corollary 11

Under the setup and assumptions of Theorem 4,

\limsup_{n\rightarrow\infty}\frac{1}{n}\log mFNR_{\boldsymbol{X}_{n}}\leq-J; (6.3)
\limsup_{n\rightarrow\infty}\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}\leq-J. (6.4)
Theorem 12

Assume the setup and conditions of Theorem 4, along with assumption (A2). Then

\lim_{n\rightarrow\infty}mpBFNR=0; (6.5)
\lim_{n\rightarrow\infty}pBFNR=0. (6.6)

7 High-dimensional asymptotics for $FNR_{\boldsymbol{X}_{n}}$ and $BFNR$ when versions of $BFDR$ are $\alpha$-controlled

It has been proved in Chandra and Bhattacharya (2019) that for the non-marginal multiple testing procedure and additive loss function based methods, $mpBFDR$ and $pBFDR$ are continuous and non-increasing in $\beta$. Consequently, for suitable values of $\beta$, any $\alpha\in(0,1)$ can be achieved by these errors. For suitably chosen positive values of $\alpha$, one can hope to reduce the corresponding $BFNR$. This is standard practice even in the single hypothesis testing literature, where the Type-I error is controlled at some positive value so that a reduced Type-II error may be incurred. However, as shown in Chandra and Bhattacharya (2020) in the fixed-dimensional setup, for the non-marginal multiple testing procedure and additive loss function based methods, values of $\alpha$ as close to 1 as desired cannot be attained by versions of $FDR$ as the sample size $n$ tends to infinity. This is not surprising, however, since consistent procedures are not expected to incur large errors asymptotically, at least when the number of hypotheses is fixed. Indeed, in the fixed-dimensional setup, Chandra and Bhattacharya (2020) provided an interval of the form $(a,b)$, where $0<a<b<1$, in which the maximum values of the versions of $FDR$ can lie asymptotically, and obtained asymptotic results for $FNR$ for such $\alpha$-controlled versions of $FDR$.

In this section we investigate the asymptotic theory for $\alpha$-control in the high-dimensional context, that is, when $m_{n}\rightarrow\infty$ as $n\rightarrow\infty$. Although none of our previous high-dimensional results required any explicit restriction on the growth rate of $m_{n}$, given that the posterior convergence result of Shalizi holds, here we need the very mild condition that $m_{n}$ grows at a sub-exponential rate in $n$. We also need to fix the proportion ($p$) of true alternatives as $m_{n}\rightarrow\infty$, and the proportion ($q$) of groups associated with at least one false null hypothesis. As we show, these two proportions define an interval of the form $(0,b)$, with $b=\frac{1-q}{1+p-q}<1$, in which the maximum of the versions of $FDR$ lies as $m_{n}\rightarrow\infty$ with $n$. In contrast with the fixed-dimensional asymptotics of Chandra and Bhattacharya (2020), the lower bound of the interval is zero for high dimensions, not strictly positive. To explain, for fixed dimension $m$, the lower bound was $a=\frac{1}{\sum_{i=1}^{m}d^{t}_{i}+1}$. Intuitively, replacing $a$ and $m$ with $a_{m_{n}}$ and $m_{n}$ respectively, dividing both the numerator and the denominator of $a_{m_{n}}$ by $m_{n}$, and taking the limit (the scaled denominator tends to $p$), we obtain $a_{m_{n}}\rightarrow 0$ as $n\rightarrow\infty$, as displayed below. Similar intuition can be used to verify that the upper bound $b$ of the fixed-dimensional case converges to $\frac{1-q}{1+p-q}$ in the high-dimensional setup. As in our previous results, these provide a verification of internal consistency in the transition from the fixed-dimensional to the high-dimensional situation.
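In symbols, this heuristic for the lower bound reads as follows (our own display of the limit just described, using (7.2)):

a_{m_{n}}=\frac{1}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}=\frac{1/m_{n}}{\frac{1}{m_{n}}\sum_{i=1}^{m_{n}}d^{t}_{i}+\frac{1}{m_{n}}}\rightarrow\frac{0}{p+0}=0,\qquad\mbox{as}~n\rightarrow\infty.

For instance, with $p=0.5$ and $q=0.5$, the maximum asymptotic error lies below $b=\frac{1-0.5}{1+0.5-0.5}=0.5$.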

Our results regarding asymptotic $\alpha$-control of versions of $FDR$ and the corresponding convergence of versions of $FNR$ are detailed in Sections 7.1 and 7.2.

7.1 High-dimensional $\alpha$-control of $mpBFDR$ and $pBFDR$ for the non-marginal method

The following theorem provides the interval for the maximum $mpBFDR$ that can be incurred asymptotically in the high-dimensional setup.

Theorem 13

In addition to (A1)-(A2), assume the following:

  1. (B)

For each $n>1$, let each group in a particular set of $m_{1n}~(<m_{n})$ groups out of the total $m_{n}$ groups be associated with at least one false null hypothesis, and let all the null hypotheses associated with the remaining $m_{n}-m_{1n}$ groups be true. Let us further assume that the latter $m_{n}-m_{1n}$ groups do not have any overlap with the remaining $m_{1n}$ groups. Without loss of generality, assume that $G_{1n},\ldots,G_{m_{1n}}$ are the groups each consisting of at least one false null, and $G_{m_{1n}+1},G_{m_{1n}+2},\ldots,G_{m_{n}}$ are the groups where all the null hypotheses are true. Assume further the following limits:

\lim_{n\rightarrow\infty}\frac{m_{1n}}{m_{n}}=q\in(0,1); (7.1)
\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}d^{t}_{i}}{m_{n}}=p\in(0,1); (7.2)
\lim_{n\rightarrow\infty}m_{n}e^{-nc}=0~\mbox{for all}~c>0. (7.3)

Then the maximum $mpBFDR$ that can be incurred asymptotically lies in $\left(0,\frac{1-q}{1+p-q}\right)$.

Remark 14

If $p$ is close to zero, that is, if all but a finite number of null hypotheses are true, then $\frac{1-q}{1+p-q}\approx 1$, showing that in such cases, better $\alpha$-control can be exercised. Indeed, as the proof of the theorem shows, the optimal decision in this case will be given by all but a finite set of one's, so that all but a finite number of decisions are correct. Hence, maximum error occurs in this case. Also, if $q$ is close to $1$, then $\frac{1-q}{1+p-q}\approx 0$. In other words, if all but a finite number of groups are associated with at least one false null hypothesis, then almost no error can be incurred. As the proof of Theorem 13 shows, this is the case where all but a finite number of decisions are correct, and hence, it is not surprising that almost no error can be incurred in this case.

Remark 15

Also, as in the fixed-dimensional case, Theorem 13 holds if, for at least one $i\in\{1,\ldots,m_{n}\}$, $G_{i}\subset\{1,\ldots,m_{n}\}$. But if $G_{i}=\{1,\ldots,m_{n}\}$ for $i=1,\ldots,m_{n}$, then $mpBFDR\rightarrow 0$ as $n\rightarrow\infty$, for any sequence $\beta_{n}\in[0,1]$.

Remark 16

Note that, in the same way as in the fixed-dimensional setup, Theorem 13 remains valid even for $mFDR_{\boldsymbol{X}_{n}}$, thanks to its monotonicity with respect to $\beta$, the property crucially used to prove Theorem 13.

The following theorem shows that for feasible values of $\alpha$ attained asymptotically by the maximum of $mpBFDR$, for appropriate sequences of penalizing constants $\beta_{n}$, it is possible to asymptotically approach such $\alpha$ through $mpBFDR_{\beta_{n}}$, where $mpBFDR_{\beta}$ denotes the $mpBFDR$ for the non-marginal procedure with penalizing constant $\beta$.

Theorem 17

Suppose that

\lim_{n\rightarrow\infty}mpBFDR_{\beta=0}=E. (7.4)

Then, for any $\alpha<E$ with $\alpha\in\left(0,\frac{1-q}{1+p-q}\right)$, under condition (B), there exists a sequence $\beta_{n}\rightarrow 0$ such that $mpBFDR_{\beta_{n}}\rightarrow\alpha$ as $n\rightarrow\infty$.

From the proofs of Theorems 13 and 17, it can be seen that replacing $w_{in}(\hat{\boldsymbol{d}}(m_{n}))$ by $v_{in}$ does not affect the results. Hence we state the following corollary.

Corollary 18

Let $pBFDR_{\beta}$ denote the $pBFDR$ corresponding to the non-marginal procedure with penalizing constant $\beta$. Suppose that

\lim_{n\rightarrow\infty}pBFDR_{\beta=0}=E^{\prime}.

Then, for any $\alpha<E^{\prime}$ with $\alpha\in\left(0,\frac{1-q}{1+p-q}\right)$, under condition (B), there exists a sequence $\beta_{n}\rightarrow 0$ such that $pBFDR_{\beta_{n}}\rightarrow\alpha$ as $n\rightarrow\infty$.

As in the fixed-dimensional setup, we see that for $\alpha$-control we must have $\lim_{n\rightarrow\infty}\beta_{n}=0$, and that for $\liminf_{n\rightarrow\infty}\beta_{n}>0$, $mpBFDR$ tends to zero. In other words, even in the high-dimensional setup, $\alpha$-control requires a sequence $\beta_{n}$ that is smaller than that for which $mpBFDR$ tends to zero.

Since the additive loss function based methods are special cases of the non-marginal procedure with $G_{i}=\{i\}$ for all $i$ (see Chandra and Bhattacharya (2019), Chandra and Bhattacharya (2020)), and since in such cases $mpBFDR$ reduces to $pBFDR$, it is important to investigate asymptotic $\alpha$-control of $pBFDR$ in this situation. Our result in this direction is provided in Theorem 19.

Theorem 19

Let $m_{0n}~(<m_{n})$ be the number of true null hypotheses such that $m_{0n}/m_{n}\rightarrow p_{0}\in(0,1)$, as $n\rightarrow\infty$. Then for any $0<\alpha<p_{0}$, there exists a sequence $\beta_{n}\rightarrow 0$ as $n\rightarrow\infty$ such that for the additive loss function based methods

\underset{n\rightarrow\infty}{\lim}~pBFDR_{\beta_{n}}=\alpha.

The result is similar in spirit to that obtained by Chandra and Bhattacharya (2020) in the corresponding fixed-dimensional situation. The limit of $m_{0n}/m_{n}$ in the high-dimensional setup, instead of $m_{0}/m$ in the fixed-dimensional case, plays the central role here.

Chandra and Bhattacharya (2019) and Chandra and Bhattacharya (2020) noted that even for additive loss function based multiple testing procedures, $mpBFDR$ may be a more desirable candidate than $pBFDR$, since it can yield non-marginal decisions even if the multiple testing criterion to be optimized is a simple sum of loss functions designed to yield marginal decisions. The following theorem shows that the same high-dimensional asymptotic result as Theorem 19 also holds for $mpBFDR$ in the case of additive loss functions, without requiring condition (B). Non-requirement of condition (B) even in the high-dimensional setup can be attributed to the fact that $mpBFDR(\mathcal{M})\geq pBFDR(\mathcal{M})$ for any multiple testing method $\mathcal{M}$, for arbitrary sample size.

Theorem 20

Let $m_{0n}~(<m_{n})$ be the number of true null hypotheses such that $m_{0n}/m_{n}\rightarrow p_{0}\in(0,1)$, as $n\rightarrow\infty$. Let $\alpha$ be the desired level of significance, where $0<\alpha<p_{0}$. Then there exists a sequence $\beta_{n}\rightarrow 0$ as $n\rightarrow\infty$ such that for the additive loss function based method

\underset{n\rightarrow\infty}{\lim}~mpBFDR_{\beta_{n}}=\alpha.

Note that Bayesian versions of $FDR$ (conditional on the data) need not be continuous with respect to $\beta$, and so results for such Bayesian versions analogous to Theorem 17, Corollary 18 and Theorems 19 and 20, which heavily use this continuity property, could not be established.

Thus, interestingly, all the asymptotic results for $\alpha$-control of versions of $FDR$ in the fixed-dimensional setup admit simple extensions to the high-dimensional setup, with minimal assumptions regarding the growth rate of $m_{n}$, given that Shalizi's conditions hold. Since Shalizi's conditions are meant for posterior consistency, from the multiple testing perspective our high-dimensional results are very interesting in the sense that almost no extra assumptions beyond Shalizi's conditions are required for our multiple testing results to carry over from fixed dimensions to high dimensions.

7.2 High-dimensional properties of Type-II errors when $mpBFDR$ and $pBFDR$ are asymptotically controlled at $\alpha$

In this section, we investigate the high-dimensional asymptotic theory for $FNR_{\boldsymbol{X}_{n}}$ and $pBFNR$ associated with $\alpha$-control of versions of $FDR$. Our results in this regard are provided as Theorem 21 and Corollary 22.

Theorem 21

Assume condition (B) and that $n^{-1}\log m_{n}\rightarrow 0$ as $n\rightarrow\infty$. Then, for asymptotic $\alpha$-control of $mpBFDR$ in the non-marginal procedure, the following holds almost surely:

\limsup_{n\rightarrow\infty}\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}\leq-J.

The above theorem requires only the very mild assumption that $n^{-1}\log m_{n}\rightarrow 0$, as $n\rightarrow\infty$, in addition to (B). The result shows that $FNR_{\boldsymbol{X}_{n}}$ converges to zero at an exponential rate, but again the exact limit of $\frac{1}{n}\log FNR_{\boldsymbol{X}_{n}}$ is not available in this high-dimensional setup. This is slightly disconcerting in the sense that we are now unable to compare the rates of convergence of $FNR_{\boldsymbol{X}_{n}}$ between the cases where $\alpha$-control is and is not imposed. Indeed, for the fixed-dimensional setup, Chandra and Bhattacharya (2020) could obtain exact limits and consequently show that $FNR_{\boldsymbol{X}_{n}}$ converges to zero at a rate faster than or equal to that in the case where $\alpha$-control is not exercised. However, as we already argued in the context of versions of $FDR$, exact limits are not expected to be available in these cases for high dimensions.

Corollary 22

Assume condition (B) and that $n^{-1}\log m_{n}\rightarrow 0$ as $n\rightarrow\infty$. Then, for asymptotic $\alpha$-control of $mpBFDR$ in the non-marginal procedure, the following holds:

\lim_{n\rightarrow\infty}pBFNR=0.

Thus, as in the fixed-dimensional setup, Corollary 22 shows that, corresponding to $\alpha$-control, $pBFNR$ converges to zero even in the high-dimensional setup, though the rate of convergence to zero is unavailable.

8 Illustration of consistency of our non-marginal multiple testing procedure in time-varying covariate selection in an autoregressive process

Let the true model $P$ stand for the following $AR(1)$ model with time-varying covariates:

x_{t}=\rho_{0}x_{t-1}+\sum_{i=0}^{m}\beta_{i0}z_{it}+\epsilon_{t},~t=1,2,\ldots,n, (8.1)

where $x_{0}\equiv 0$, $|\rho_{0}|<1$ and $\epsilon_{t}\stackrel{iid}{\sim}N(0,\sigma^{2}_{0})$, for $t=1,2,\ldots,n$. In (8.1), $m\equiv m_{n}\rightarrow\infty$ as $n\rightarrow\infty$. Here $\left\{z_{it}:t=1,2,\ldots\right\}$ are relevant time-varying covariates. We set $z_{0t}\equiv 1$ for all $t$.

Now let the data be modeled by the same model as $P$, but with $\rho_{0}$, $\beta_{i0}$ and $\sigma^{2}_{0}$ replaced by the unknown quantities $\rho$, $\beta_{i}$ and $\sigma^{2}$, respectively; that is,

x_{t}=\rho x_{t-1}+\sum_{i=0}^{m}\beta_{i}z_{it}+\epsilon_{t},~t=1,2,\ldots,n, (8.2)

where we set $x_{0}\equiv 0$ and $\epsilon_{t}\stackrel{iid}{\sim}N(0,\sigma^{2})$, for $t=1,2,\ldots,n$.

For notational purposes, we let $\boldsymbol{z}_{mt}=(z_{0t},z_{1t},\ldots,z_{mt})^{\prime}$, $\boldsymbol{z}_{t}=(z_{0t},z_{1t},\ldots)^{\prime}$, $\boldsymbol{\beta}_{m0}=(\beta_{00},\beta_{10},\ldots,\beta_{m0})^{\prime}$, $\boldsymbol{\beta}_{m}=(\beta_{0},\beta_{1},\ldots,\beta_{m})^{\prime}$ and $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\ldots)^{\prime}$.
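To fix ideas, the following minimal Python sketch simulates data from the true model (8.1); the covariate distribution, the dimensions, the sparse coefficient pattern and the seed are illustrative assumptions of ours, not prescribed by the theory.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 500                              # m = m_n >> n: ultra high-dimensional regime
rho0, sigma0 = 0.5, 1.0

beta0 = np.zeros(m + 1)                      # sparse truth; beta_{00} is the intercept
beta0[[0, 3, 7]] = [1.0, -0.8, 0.6]

z = rng.uniform(-1.0, 1.0, size=(n, m + 1))  # time-varying covariates z_{it}
z[:, 0] = 1.0                                # z_{0t} = 1 for all t

x = np.zeros(n + 1)                          # x[0] = x_0 = 0
for t in range(1, n + 1):
    x[t] = rho0 * x[t - 1] + z[t - 1] @ beta0 + sigma0 * rng.normal()
```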

8.1 The ultra high-dimensional setup

Let us first consider the setup where $\frac{m_{n}}{n}\rightarrow\infty$ as $n\rightarrow\infty$. This is a challenging problem, and we require notions of sparsity to address it. As will be shown subsequently in Section 8.2, a precise notion of sparsity is available for our problem in the context of the equipartition property. Specifically, sparsity in our problem entails controlling relevant quadratic forms of $\boldsymbol{\beta}$. For such sparsity, we must devise a prior for $\boldsymbol{\beta}$ such that $\|\boldsymbol{\beta}\|<\infty$. We also assume that $\|\boldsymbol{\beta}_{0}\|<\infty$.

For appropriate prior structures for $\boldsymbol{\beta}$, let us consider the following strategy. First, consider an almost surely continuously differentiable random function $\tilde{\eta}(\cdot)$ on a compact space $\mathcal{X}$, such that

\|\tilde{\eta}\|=\underset{\tilde{\mathbf{x}}\in\mathcal{X}}{\sup}~|\tilde{\eta}(\tilde{\mathbf{x}})|<\infty,~\mbox{almost surely.} (8.3)

We denote the class of such functions by $\mathcal{C}^{\prime}(\mathcal{X})$. A popular prior for $\mathcal{C}^{\prime}(\mathcal{X})$ is the Gaussian process prior with sufficiently smooth covariance function, in which case both $\tilde{\eta}$ and $\tilde{\eta}^{\prime}$ are Gaussian processes; see, for example, Cramer and Leadbetter (1967). Let us now consider an arbitrary sequence $\left\{\tilde{\mathbf{x}}_{i}:i=1,2,\ldots\right\}$, and let $\tilde{\boldsymbol{\beta}}=\left(\tilde{\beta}_{1},\tilde{\beta}_{2},\ldots\right)^{\prime}$, where, for $i=1,2,\ldots$, $\tilde{\beta}_{i}=\tilde{\eta}(\tilde{\mathbf{x}}_{i})$. We then define $\beta_{i}=\gamma_{i}\tilde{\beta}_{i}$, where, for $i=1,2,\ldots$, the $\gamma_{i}$ are independent (but non-identical) random variables such that $0<|\gamma_{i}|<L<\infty$ for $i\geq 1$, and

i=1|γi|<,almost surely.\sum_{i=1}^{\infty}|\gamma_{i}|<\infty,~{}\mbox{almost surely.} (8.4)

Also, let $\rho\in\mathbb{R}$ and $\sigma\in(0,\infty)=\mathbb{R}^{+}$. Thus, $\boldsymbol{\theta}=(\tilde{\eta},\boldsymbol{\gamma},\rho,\sigma)$, where $\boldsymbol{\gamma}=(\gamma_1,\gamma_2,\ldots)^{\prime}$, and the parameter space is $\boldsymbol{\Theta}=\mathcal{C}^{\prime}(\mathcal{X})\times\mathbb{R}^{\infty}\times\mathbb{R}\times\mathbb{R}^{+}$. For our asymptotic theories regarding the multiple testing methods that we consider, we must verify the assumptions of Shalizi for the modeling setups (8.1) and (8.2), with this parameter space.
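A minimal sketch of this prior construction follows, assuming a squared-exponential Gaussian process kernel on $\mathcal{X}=[0,1]$ and geometrically damped $\gamma_i$; both are illustrative choices, as any kernel smooth enough for (8.3) and any $\gamma_i$ satisfying (8.4) would do.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50
xs = np.linspace(0.0, 1.0, m)          # arbitrary sequence of points in the compact space [0, 1]
# Squared-exponential kernel: draws are almost surely smooth, so (8.3) holds
K = np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / 0.1 ** 2)
eta = rng.multivariate_normal(np.zeros(m), K + 1e-10 * np.eye(m))  # eta_tilde at the points x_i

L = 1.0
gamma = rng.uniform(0.5, 1.0, m) * 0.8 ** np.arange(1, m + 1)  # 0 < gamma_i < L, summable, so (8.4) holds
beta = gamma * eta                      # beta_i = gamma_i * eta_tilde(x_i)
```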

With respect to the above ultra high-dimensional setup, we consider the following multiple-testing framework:

H01:|ρ|<1 versus H11:|ρ|1 and\displaystyle H_{01}:|\rho|<1\text{ versus }H_{11}:|\rho|\geq 1\text{ and}
H0,i+2:βi𝒩0 versus H1,i+2:βi𝒩0c, for i=0,,m,\displaystyle H_{0,i+2}:\beta_{i}\in\mathcal{N}_{0}\text{ versus }H_{1,i+2}:\beta_{i}\in\mathcal{N}^{c}_{0},~{}\text{ for }~{}i=0,\ldots,m, (8.5)

where 𝒩0\mathcal{N}_{0} is some neighborhood of zero and 𝒩0c\mathcal{N}^{c}_{0} is the complement of the neighborhood in the relevant parameter space.

Verification of consistency of our non-marginal procedure amounts to verification of assumptions (S1)(S7) of Shalizi for the above setup. In this regard, we make the following assumptions:

  • (B1)

    supt1zt<\underset{t\geq 1}{\sup}~{}\|z_{t}\|<\infty, where, for t1t\geq 1, zt=supi1|zit|\|z_{t}\|=\underset{i\geq 1}{\sup}~{}|z_{it}|.

  • (B2)

    For k>1k>1, let λ~nk\tilde{\lambda}_{nk} be the largest eigenvalue of t=1n𝒛m,t+k𝒛mtn\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}. We assume that λ~nk0\tilde{\lambda}_{nk}\rightarrow 0, as nn\rightarrow\infty, for k>1k>1.

  • (B3)

Let $\lambda_n$ be the largest eigenvalue of $\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}$. We assume that $\sup_{n\geq 1}\lambda_n\leq K<\infty$. (A numerical check of (B2) and (B3) is sketched after this list.)

  • (B4)
    1nt=1n𝜷m𝒛mt0almost surely;1nt=1n𝜷m0𝒛mt0;\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\rightarrow 0~{}\mbox{almost surely};~{}~{}\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m0}\boldsymbol{z}_{mt}\rightarrow 0; (8.6)
    1nt=1n𝜷m𝒛mt𝒛mt𝜷mc(𝜷)almost surely;1nt=1n𝜷m0𝒛mt𝒛mt𝜷m0c(𝜷0),\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}\rightarrow c(\boldsymbol{\beta})~{}\mbox{almost surely};~{}~{}\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m0}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}\rightarrow c(\boldsymbol{\beta}_{0}), (8.7)
    1nt=1n𝜷m𝒛mt𝒛mt𝜷m0c10(𝜷,𝜷0)almost surely,\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}\rightarrow c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})~{}\mbox{almost surely}, (8.8)

    as nn\rightarrow\infty. In the above, c(𝜷0)(>0)c(\boldsymbol{\beta}_{0})~{}(>0) is a finite constant; c(𝜷)(>0)c(\boldsymbol{\beta})~{}(>0) and c10(𝜷,𝜷0)c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) are finite quantities that depend upon the choice of the sequence {𝜷m;n=1,2,}\left\{\boldsymbol{\beta}_{m};n=1,2,\ldots\right\}.

  • (B5)

    The limits of the quantities 𝒛t𝜷\boldsymbol{z}^{\prime}_{t}\boldsymbol{\beta} for almost all 𝜷\boldsymbol{\beta}, 𝒛t𝜷0\boldsymbol{z}^{\prime}_{t}\boldsymbol{\beta}_{0} and ϱ^t=k=1tρ0tk𝒛k𝜷0\hat{\varrho}_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{k}\boldsymbol{\beta}_{0} exist as tt\rightarrow\infty.

  • (B6)

    There exist positive constants α\alpha, cρc_{\rho}, cσc_{\sigma}, cη~c_{\tilde{\eta}}, cη~c_{\tilde{\eta}^{\prime}} and cγc_{\gamma} such that the following hold for sufficiently large nn:

    π(|ρ|>exp((αn)1/16))\displaystyle\pi\left(|\rho|>\exp(\left(\alpha n\right)^{1/16})\right) cρexp(αn);\displaystyle\leq c_{\rho}\exp\left(-\alpha n\right);
    π(exp((αn)1/16)σexp((αn)1/16))\displaystyle\pi\left(\exp(-\left(\alpha n\right)^{1/16})\leq\sigma\leq\exp(\left(\alpha n\right)^{1/16})\right) 1cσexp(αn);\displaystyle\geq 1-c_{\sigma}\exp\left(-\alpha n\right);
    π(η~exp((αn)1/16))\displaystyle\pi\left(\|\tilde{\eta}\|\geq\exp(\left(\alpha n\right)^{1/16})\right) cη~exp(αn);\displaystyle\leq c_{\tilde{\eta}}\exp\left(-\alpha n\right);
    π(η~exp((αn)1/16))\displaystyle\pi\left(\|\tilde{\eta}^{\prime}\|\geq\exp(\left(\alpha n\right)^{1/16})\right) cη~exp(αn);\displaystyle\leq c_{\tilde{\eta}^{\prime}}\exp\left(-\alpha n\right);
    π(i=1|γi|exp((αn)1/16))\displaystyle\pi\left(\sum_{i=1}^{\infty}|\gamma_{i}|\geq\exp(\left(\alpha n\right)^{1/16})\right) cγexp(αn),\displaystyle\leq c_{\gamma}\exp\left(-\alpha n\right),
  • (B7)

    L(mn+1mn)exp((α(n+1))1/16)exp((αn)1/16)L(m_{n+1}-m_{n})\leq\exp(\left(\alpha(n+1)\right)^{1/16})-\exp(\left(\alpha n\right)^{1/16}), for nn0n\geq n_{0}, for some n01n_{0}\geq 1.
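Conditions (B2) and (B3) are straightforward to probe numerically for a given design. The following sketch, assuming iid standard normal covariates purely for illustration, computes the two eigenvalue quantities; for the non-symmetric lagged matrix in (B2), the operator norm, which bounds the largest eigenvalue in modulus, is used.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 2000, 50, 2
z = rng.standard_normal((n + k, m))     # illustrative iid design

S0 = z[:n].T @ z[:n] / n                # (1/n) sum_t z_mt z_mt'
lam_n = np.linalg.eigvalsh(S0).max()    # (B3): stays bounded as n grows

Sk = z[k:n + k].T @ z[:n] / n           # (1/n) sum_t z_{m,t+k} z_mt'
lam_nk = np.linalg.svd(Sk, compute_uv=False).max()  # operator norm -> 0, consistent with (B2)
print(lam_n, lam_nk)
```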

8.2 Discussion of the assumptions in the light of the ultra high-dimensional setup

Condition (B1) holds if the covariates $\{z_{it}:i\geq 1,t\geq 1\}$ are a realization of some stochastic process with almost surely finite sup-norm, for example, a Gaussian process. Assumption (B1), along with (8.3) and (8.4), leads to the following result:

|𝒛mt𝜷m0|<C,|\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}|<C, (8.9)

for some $C>0$. To see this, first let $\boldsymbol{\beta}_0$ correspond to the true quantities $\boldsymbol{\gamma}_0$ and $\tilde{\eta}_0$. Then observe that $|\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}|\leq\sum_{i=1}^{m}|z_{it}||\beta_{i0}|\leq\sup_{t\geq 1}\|z_t\|\,\|\tilde{\eta}_0\|\sum_{i=1}^{\infty}|\gamma_{i0}|<C$, since $\sup_{t\geq 1}\|z_t\|<\infty$ by (B1), $\|\tilde{\eta}_0\|<\infty$ by (8.3) and $\sum_{i=1}^{\infty}|\gamma_{i0}|<\infty$ by (8.4). Condition (B1) is required for some limit calculations and for boundedness of some norms associated with concentration inequalities.

Condition (B2) says that the covariates at different time points, after scaling by $\sqrt{n}$, are asymptotically orthogonal. This condition also implies the following:

1nt=1n𝜷m𝒛m,t+k𝒛mt𝜷m0almost surely, and1nt=1n𝜷m0𝒛m,t+k𝒛mt𝜷m00for anyk>1;\displaystyle\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}\rightarrow 0~{}\mbox{almost surely, and}~{}~{}\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m0}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}\rightarrow 0~{}\mbox{for any}~{}k>1; (8.10)

To see (8.10), observe that

1nt=1n𝜷m𝒛m,t+k𝒛mt𝜷m=𝜷m(t=1n𝒛m,t+k𝒛mtn)𝜷m𝜷m2(t=1n𝒛m,t+k𝒛mtn)op.\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}=\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}\leq\|\boldsymbol{\beta}_{m}\|^{2}\left\|\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\right\|_{op}. (8.11)

In (8.11), 𝜷m\|\boldsymbol{\beta}_{m}\| denotes the Euclidean norm of 𝜷m\boldsymbol{\beta}_{m} and for any matrix 𝑨\boldsymbol{A}, 𝑨op\|\boldsymbol{A}\|_{op} denotes the operator norm of 𝑨\boldsymbol{A} given by 𝑨op=sup𝒖=1𝑨𝒖\|\boldsymbol{A}\|_{op}=\underset{\|\boldsymbol{u}\|=1}{\sup}~{}\|\boldsymbol{A}\boldsymbol{u}\|. By (B2), (t=1n𝒛m,t+k𝒛mtn)op0\left\|\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{m,t+k}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\right\|_{op}\rightarrow 0 as nn\rightarrow\infty. Also,

𝜷m2i=1γi2β~i2η~2i=1γi2<,almost surely,\|\boldsymbol{\beta}_{m}\|^{2}\leq\sum_{i=1}^{\infty}\gamma^{2}_{i}\tilde{\beta}^{2}_{i}\leq\|\tilde{\eta}\|^{2}\sum_{i=1}^{\infty}\gamma^{2}_{i}<\infty,~{}\mbox{almost surely}, (8.12)

by (8.3) and (8.4). It follows from (8.12) that (8.11) is almost surely finite. This and (B2) together imply the first part of the limit (8.10). Since $\|\boldsymbol{\beta}_0\|<\infty$, the second limit of (8.10) follows in the same way.

As shown in Section 8.3, $\lambda_n\rightarrow 0$ as $n\rightarrow\infty$ even when $\sup_{t=1,\ldots,n}\|\boldsymbol{z}_{mt}\|=O(n^r)$ for some $r<1$, that is, even if (B1) does not hold. Since we assume only that $\lambda_n$ is bounded above, (B3) is a reasonably mild assumption.

In (B4), (8.6) can be made to hold in practice by centering the covariates, that is, by setting $\tilde{\boldsymbol{z}}_{mt}=\boldsymbol{z}_{mt}-\bar{\boldsymbol{z}}_m$, where $\bar{\boldsymbol{z}}_m=\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{z}_{mt}$. In (8.7) of (B4) we assume that $c(\boldsymbol{\beta})$ and $c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_0)$ remain finite for any choice of the sequence $\{\boldsymbol{\beta}_m;n=1,2,\ldots\}$. To see that finiteness holds, first note that

$\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}=\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}\leq\|\boldsymbol{\beta}_{m}\|^{2}\left\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right\|_{op}.$ (8.13)

In (8.13), $\|\boldsymbol{\beta}_m\|<\infty$ almost surely, by (8.12), and $\left\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right\|_{op}<\infty$ by (B3). Hence, (8.13) is almost surely finite. Similarly, $\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_m\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}=\boldsymbol{\beta}^{\prime}_m\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m0}\leq\|\boldsymbol{\beta}_m\|\|\boldsymbol{\beta}_{m0}\|\left\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right\|_{op}$, which is again almost surely finite due to (8.3), (8.4) and (B3). Thus, (8.3) and (8.4) are precisely the conditions that induce sparsity within our model, in the sense of controlling the quadratic forms involving $\boldsymbol{\beta}_m$ and $\boldsymbol{\beta}_{m0}$, given that (B4) holds. The assumptions on the existence of the limits are required for conditions (S2) and (S3) of Shalizi. As can be observed from Section 8.3, $\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_m\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_m\rightarrow 0$ almost surely as $n\rightarrow\infty$ if the asymptotically orthogonal covariates satisfy $\sup_{t=1,\ldots,n}\|\boldsymbol{z}_{mt}\|=O(n^r)$ with $r<1$, that is, even if (B1) does not hold. Hence, in this situation, the required limits of the quadratic forms exist and are zero, under very mild conditions.

Again, the limit existence assumption (B5) is required for verification of conditions (S2) and (S3) of Shalizi.

Assumption (B6), required to satisfy condition (S5) of Shalizi, is reasonably mild. The threshold exp((αn)1/16)\exp(\left(\alpha n\right)^{1/16}) for the probabilities involving η~\|\tilde{\eta}\| and η~\|\tilde{\eta}^{\prime}\| can be replaced with the order of n\sqrt{n} for Gaussian process priors or for independent sub-Gaussian components of 𝜷\boldsymbol{\beta}. However, note that priors such as gamma or inverse gamma for σ\sigma do not necessarily satisfy the condition. In such cases, one can modify the prior by replacing the tail part of the prior, after an arbitrarily large positive value, with a thin-tailed prior, such as normal. In practice, such modified priors would be effectively the same as gamma or inverse gamma priors, and yet would satisfy the conditions of (B6).

Assumption (B7), in conjunction with the boundedness of $|\gamma_i|$ by $L$ for all $i$, is a mild condition ensuring that the $\mathcal{G}_n$ are increasing in $n$ for $n\geq n_0$, for some $n_0\geq 1$.

8.3 High-dimensional but not ultra high-dimensional setup

The setup discussed so far deals with the so-called ultra high-dimensional problem, in the sense that $m_n/n\rightarrow\infty$ as $n\rightarrow\infty$. This is a challenging problem to address, and we required a prior for $\boldsymbol{\beta}$ satisfying $\|\boldsymbol{\beta}\|<\infty$ almost surely. However, if we are only interested in the problem where $m_n/n\rightarrow 0$ as $n\rightarrow\infty$, then it is not necessary to insist on priors ensuring finiteness of $\|\boldsymbol{\beta}\|$. For example, if the covariates $\boldsymbol{z}_{mt}$ are orthogonal, then assuming that

supt=1,,n𝒛mt=O(nr),wherer<1,\underset{t=1,\ldots,n}{\sup}~{}\|\boldsymbol{z}_{mt}\|=O(n^{r}),~{}\mbox{where}~{}r<1, (8.14)

1nt=1n𝒛mt𝒛mt\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt} has maximum eigenvalue O(nr1)O(n^{r-1}), so that (8.11) entails

1nt=1n𝜷m𝒛mt𝒛mt𝜷m=O(𝜷m2nr1).\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}=O\left(\|\boldsymbol{\beta}_{m}\|^{2}n^{r-1}\right). (8.15)

Now, if the components of 𝜷m\boldsymbol{\beta}_{m} are independent and sub-Gaussian with mean zero, then by the Hanson-Wright inequality (see, for example, Rudelson and Vershynin (2013)) we have

P(|t=1mβt2t=1mE(βt2)|>n1rt=1mE(βt2))\displaystyle P\left(\left|\sum_{t=1}^{m}\beta^{2}_{t}-\sum_{t=1}^{m}E(\beta^{2}_{t})\right|>n^{1-r}-\sum_{t=1}^{m}E(\beta^{2}_{t})\right)
2exp(L1min{(n1rt=1mE(βt2))2L24m,n1rt=1mE(βt2)L22}),\displaystyle\qquad\leq 2\exp\left(-L_{1}\min\left\{\frac{\left(n^{1-r}-\sum_{t=1}^{m}E(\beta^{2}_{t})\right)^{2}}{L^{4}_{2}m},\frac{n^{1-r}-\sum_{t=1}^{m}E(\beta^{2}_{t})}{L^{2}_{2}}\right\}\right), (8.16)

where $L_1>0$ is some constant and $L_2$ is the upper bound of the sub-Gaussian norms. Let $\tilde{m}=\sum_{t=1}^{m}E(\beta^2_t)$. If $\frac{n^{1-r}-\tilde{m}}{\sqrt{\tilde{m}}}\rightarrow\tilde{c}~(>0)$, where $\tilde{c}$ is finite or infinite, then (8.16) is summable. Hence, by the Borel-Cantelli lemma, $\sum_{t=1}^{m}\beta^2_t\leq n^{1-r}$ almost surely for all sufficiently large $n$. It then follows from (8.15) that $\frac{1}{n}\sum_{t=1}^{n}\boldsymbol{\beta}^{\prime}_m\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_m<\infty$ almost surely as $n\rightarrow\infty$.
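The Borel-Cantelli argument above is easy to check by simulation. A minimal sketch follows, assuming iid standard normal (hence sub-Gaussian) coefficients and the illustrative growth rate $m_n=n^{0.4}$ with $r=0.5$, so that $(n^{1-r}-\tilde{m})/\sqrt{\tilde{m}}\rightarrow\infty$.

```python
import numpy as np

rng = np.random.default_rng(3)
r = 0.5
for n in [10**3, 10**4, 10**5, 10**6]:
    m = int(n ** 0.4)                 # m_n / n -> 0 (illustrative growth rate)
    beta = rng.standard_normal(m)     # iid sub-Gaussian coefficients
    # The event sum beta_t^2 > n^{1-r} should (eventually) never occur
    print(n, m, (beta ** 2).sum() <= n ** (1 - r))
```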

For the non-ultra high-dimensional setup, the problem is largely simplified. Indeed, introduction of $\tilde{\eta}$ and $\tilde{\eta}^{\prime}$ is not required, as we can directly consider sub-Gaussian priors for $\boldsymbol{\beta}$ as detailed above. Consequently, in (B6), only the first two inequalities are needed, and assumption (B7) is no longer required. Since the ultra high-dimensional setup is far more challenging than the non-ultra high-dimensional one, we consider only the former setup for our purpose, and note that the latter can be dealt with using almost the same ideas but with much less effort.

Assumptions (B1)–(B6) lead to the following results that are the main ingredients in proving our posterior convergence in the ultra high-dimensional setup.

Lemma 23

Under (B1), (B2) and (B5), the KL-divergence rate h(𝛉)h(\boldsymbol{\theta}) exists for each 𝛉𝚯\boldsymbol{\theta}\in\boldsymbol{\Theta} and is given by

h(𝜽)=log(σσ0)+(12σ212σ02)(σ021ρ02+c(𝜷0)1ρ02)+(ρ22σ2ρ022σ02)(σ021ρ02+c(𝜷0)1ρ02)+c(𝜷)2σ2c(𝜷0)2σ02(ρσ2ρ0σ02)(ρ0σ021ρ02+ρ0c(𝜷0)1ρ02)(c10(𝜷,𝜷0)σ2c(𝜷0)σ02).h(\boldsymbol{\theta})=\log\left(\frac{\sigma}{\sigma_{0}}\right)+\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)\\ +\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)+\frac{c(\boldsymbol{\beta})}{2\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{2\sigma^{2}_{0}}\\ -\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)-\left(\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right). (8.17)
Theorem 24

Under (B1), (B2) and (B5), the asymptotic equipartition property holds and is given by

limn1nlogRn(𝜽)=h(𝜽).\underset{n\rightarrow\infty}{\lim}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})=-h(\boldsymbol{\theta}).

Furthermore, the convergence is uniform on any compact subset of 𝚯\boldsymbol{\Theta}.
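Lemma 23 and Theorem 24 can also be probed by simulation. The following Monte Carlo sketch (no part of the formal verification; the iid Gaussian covariate design and all numerical values are assumptions for illustration) compares $\frac{1}{n}\log R_n(\boldsymbol{\theta})$, computed from data simulated under $P$, with $-h(\boldsymbol{\theta})$ from (8.17). Under this design, by the strong law of large numbers, $c(\boldsymbol{\beta})=\|\boldsymbol{\beta}_m\|^2$, $c(\boldsymbol{\beta}_0)=\|\boldsymbol{\beta}_{m0}\|^2$ and $c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_0)=\boldsymbol{\beta}^{\prime}_m\boldsymbol{\beta}_{m0}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100_000, 5
rho0, sigma0, rho, sigma = 0.5, 1.0, 0.3, 1.2        # true and postulated parameters (illustrative)
beta0 = np.array([0.5, -0.3, 0.2, 0.1, -0.1])
beta = np.array([0.4, -0.2, 0.3, 0.0, -0.1])

z = rng.standard_normal((n, m))                      # iid covariates: c(beta) = ||beta||^2, etc.
x = np.zeros(n + 1)
for t in range(n):
    x[t + 1] = rho0 * x[t] + z[t] @ beta0 + sigma0 * rng.standard_normal()

# (1/n) log R_n(theta): average log likelihood ratio of theta against theta_0
res = x[1:] - rho * x[:-1] - z @ beta
res0 = x[1:] - rho0 * x[:-1] - z @ beta0
logRn = (-n * np.log(sigma / sigma0)
         - (res ** 2).sum() / (2 * sigma ** 2)
         + (res0 ** 2).sum() / (2 * sigma0 ** 2))

# h(theta) from (8.17), with V0 = (sigma0^2 + c(beta0)) / (1 - rho0^2)
V0 = (sigma0 ** 2 + beta0 @ beta0) / (1 - rho0 ** 2)
h = (np.log(sigma / sigma0)
     + (0.5 / sigma ** 2 - 0.5 / sigma0 ** 2) * V0
     + (rho ** 2 / (2 * sigma ** 2) - rho0 ** 2 / (2 * sigma0 ** 2)) * V0
     - (rho / sigma ** 2 - rho0 / sigma0 ** 2) * rho0 * V0
     + beta @ beta / (2 * sigma ** 2) - beta0 @ beta0 / (2 * sigma0 ** 2)
     - (beta @ beta0 / sigma ** 2 - beta0 @ beta0 / sigma0 ** 2))
print(logRn / n, -h)                                 # the two numbers should nearly agree
```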

Lemma 23 and Theorem 24 ensure that (S1)(S3) hold, and (S4) holds since h(𝜽)h(\boldsymbol{\theta}) is almost surely finite. (B6) implies that 𝒢n\mathcal{G}_{n} increases to 𝚯\boldsymbol{\Theta}. In Section S-13.5 we verify (S5).

Now observe that the aim of assumption (S6) is to ensure that (see the proof of Lemma 7 of Shalizi (2009)) for every ε>0\varepsilon>0 and for all nn sufficiently large,

1nlog𝒢nRn(𝜽)𝑑π(𝜽)h(𝒢n)+ε,almost surely.\frac{1}{n}\log\int_{\mathcal{G}_{n}}R_{n}(\boldsymbol{\theta})d\pi(\boldsymbol{\theta})\leq-h\left(\mathcal{G}_{n}\right)+\varepsilon,~{}\mbox{almost surely}.

Since h(𝒢n)h(𝚯)h\left(\mathcal{G}_{n}\right)\rightarrow h\left(\boldsymbol{\Theta}\right) as nn\rightarrow\infty, it is enough to verify that for every ε>0\varepsilon>0 and for all nn sufficiently large,

1nlog𝒢nRn(𝜽)𝑑π(𝜽)h(𝚯)+ε,almost surely.\frac{1}{n}\log\int_{\mathcal{G}_{n}}R_{n}(\boldsymbol{\theta})d\pi(\boldsymbol{\theta})\leq-h\left(\boldsymbol{\Theta}\right)+\varepsilon,~{}\mbox{almost surely}. (8.18)

In this regard, first observe that

1nlog𝒢nRn(𝜽)𝑑π(𝜽)\displaystyle\frac{1}{n}\log\int_{\mathcal{G}_{n}}R_{n}(\boldsymbol{\theta})d\pi(\boldsymbol{\theta}) 1nlog[sup𝜽𝒢nRn(𝜽)π(𝒢n)]\displaystyle\leq\frac{1}{n}\log\left[\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}R_{n}(\boldsymbol{\theta})\pi(\mathcal{G}_{n})\right]
=1nlog[sup𝜽𝒢nRn(𝜽)]+1nlogπ(𝒢n)\displaystyle=\frac{1}{n}\log\left[\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}R_{n}(\boldsymbol{\theta})\right]+\frac{1}{n}\log\pi(\mathcal{G}_{n})
=sup𝜽𝒢n1nlogRn(𝜽)+1nlogπ(𝒢n)\displaystyle=\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+\frac{1}{n}\log\pi(\mathcal{G}_{n})
1nsup𝜽𝒢nlogRn(𝜽),\displaystyle\leq\frac{1}{n}\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}\log R_{n}(\boldsymbol{\theta}), (8.19)

where the last inequality holds since $\frac{1}{n}\log\pi(\mathcal{G}_n)\leq 0$. Now, letting $\mathcal{S}=\{\boldsymbol{\theta}:h(\boldsymbol{\theta})\leq\kappa\}$, where $\kappa>h(\boldsymbol{\Theta})$ is as large as desired,

sup𝜽𝒢n1nlogRn(𝜽)\displaystyle\underset{\boldsymbol{\theta}\in\mathcal{G}_{n}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta}) sup𝜽𝚯1nlogRn(𝜽)=sup𝜽𝒮𝒮c1nlogRn(𝜽)\displaystyle\leq\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})=\underset{\boldsymbol{\theta}\in\mathcal{S}\cup\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})
max{sup𝜽𝒮1nlogRn(𝜽),sup𝜽𝒮c1nlogRn(𝜽)}.\displaystyle\leq\max\left\{\underset{\boldsymbol{\theta}\in\mathcal{S}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta}),\underset{\boldsymbol{\theta}\in\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})\right\}. (8.20)

From (8.17) it is clear that $h(\boldsymbol{\theta})$ is continuous in $\boldsymbol{\theta}$ and that $h(\boldsymbol{\theta})\rightarrow\infty$ as $\|\boldsymbol{\theta}\|\rightarrow\infty$. In other words, $h(\boldsymbol{\theta})$ is a continuous, coercive function. Hence, $\mathcal{S}$ is a compact set (see, for example, Lange (2010)). It then easily follows (see Chatterjee and Bhattacharya (2020)) that

sup𝜽𝒮1nlogRn(𝜽)sup𝜽𝒮h(𝜽)=h(𝒮),almost surely, asn.\underset{\boldsymbol{\theta}\in\mathcal{S}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})\rightarrow\underset{\boldsymbol{\theta}\in\mathcal{S}}{\sup}~{}-h(\boldsymbol{\theta})=-h\left(\mathcal{S}\right),~{}\mbox{almost surely, as}~{}n\rightarrow\infty. (8.21)

We now show that

sup𝜽𝒮c1nlogRn(𝜽)h(𝚯)almost surely, asn.\underset{\boldsymbol{\theta}\in\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})\leq-h\left(\boldsymbol{\Theta}\right)~{}\mbox{almost surely, as}~{}n\rightarrow\infty. (8.22)

First note that if sup𝜽𝒮c1nlogRn(𝜽)>h(𝚯)\underset{\boldsymbol{\theta}\in\mathcal{S}^{c}}{\sup}~{}\frac{1}{n}\log R_{n}(\boldsymbol{\theta})>-h\left(\boldsymbol{\Theta}\right) infinitely often, then 1nlogRn(𝜽)>h(𝚯)\frac{1}{n}\log R_{n}(\boldsymbol{\theta})>-h\left(\boldsymbol{\Theta}\right) for some 𝜽𝒮c\boldsymbol{\theta}\in\mathcal{S}^{c} infinitely often. But 1nlogRn(𝜽)>h(𝚯)\frac{1}{n}\log R_{n}(\boldsymbol{\theta})>-h\left(\boldsymbol{\Theta}\right) if and only if 1nlogRn(𝜽)+h(𝜽)>h(𝜽)h(𝚯),for𝜽𝒮c.\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})>h(\boldsymbol{\theta})-h\left(\boldsymbol{\Theta}\right),~{}\mbox{for}~{}\boldsymbol{\theta}\in\mathcal{S}^{c}. Hence, if we can show that

P(|1nlogRn(𝜽)+h(𝜽)|>κh(𝚯),for𝜽𝒮cinfinitely often)=0,P\left(\left|\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\right|>\kappa-h\left(\boldsymbol{\Theta}\right),~{}\mbox{for}~{}\boldsymbol{\theta}\in\mathcal{S}^{c}~{}\mbox{infinitely often}\right)=0, (8.23)

then (8.22) will be proved. We use the Borel-Cantelli lemma to prove (8.23); specifically, we prove the following result.

Theorem 25

Under (B1), (8.3) and (8.4),

n=1𝒮cP(|1nlogRn(𝜽)+h(𝜽)|>κh(𝚯))𝑑π(𝜽)<.\sum_{n=1}^{\infty}\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\right|>\kappa-h\left(\boldsymbol{\Theta}\right)\right)d\pi(\boldsymbol{\theta})<\infty. (8.24)

The proof of Theorem 25 heavily uses (8.9), which is ensured by (B1), (8.3) and (8.4). Since $h(\boldsymbol{\theta})$ is continuous, (S7) holds trivially.

We provide detailed verification of the seven assumptions of Shalizi in the supplement, which leads to the following result:

Theorem 26

Under assumptions (B1) – (B6), the non-marginal multiple testing procedure for testing (8.5) is consistent.

Needless to mention, all the results on error convergence of the non-marginal method also continue to hold for this setup under (B1) – (B6), thanks to verification of Shalizi’s conditions.

8.4 Remark on identifiability of our model and posterior consistency

Note that we have modeled $\boldsymbol{\beta}$ in terms of $\boldsymbol{\gamma}$ and $\tilde{\eta}$. From the likelihood it is evident that although $\boldsymbol{\beta}$ is identifiable, $\boldsymbol{\gamma}$ and $\tilde{\eta}$ are not. This is not an issue, however, since our interest is in the posterior of $\boldsymbol{\beta}$, not of $\boldsymbol{\gamma}$ or $\tilde{\eta}$. Indeed, Theorem 3 of Shalizi guarantees that the posterior of the set $\{\boldsymbol{\theta}:h(\boldsymbol{\theta})\leq h(\boldsymbol{\Theta})+\varepsilon\}$ tends to 1 as $n\rightarrow\infty$, for any $\varepsilon>0$. We show in the supplement that $h(\boldsymbol{\Theta})=0$ in our case. Since $h(\boldsymbol{\theta}_0)=0$, where $\boldsymbol{\theta}_0$ is the true parameter (which includes $\boldsymbol{\beta}_0$) and hence lies in $\{\boldsymbol{\theta}:h(\boldsymbol{\theta})<\varepsilon\}$ for any $\varepsilon>0$, it follows that the posterior of $\boldsymbol{\beta}$ is consistent.

9 Summary and conclusion

In this article, we have investigated asymptotic properties of the Bayesian non-marginal procedure under the general dependence structure, when the number of hypotheses also tends to infinity with the sample size. We have specifically shown that our method is consistent even in this setup, that the different Bayesian versions of the error rates converge to zero exponentially fast, and that the expectations of the Bayesian versions with respect to the data also tend to zero. Since our results hold for any choice of the groups, they hold even for singleton groups, that is, for marginal decision rules. The results associated with $\alpha$-control also continue to hold in the same spirit as in the finite-dimensional setup developed in Chandra and Bhattacharya (2020). Interestingly, provided that Shalizi's conditions hold, almost no assumption is required on the growth rate of the number of hypotheses to establish the results of the multiple testing procedures in high dimensions. Although in several cases, unlike the exact fixed-dimensional limits established in Chandra and Bhattacharya (2020), the exact high-dimensional limits associated with the error rates could not be established, exponential convergence to zero in high dimensions could still be achieved. Moreover, internal consistency of our results, as we make the transition from fixed dimension to high dimensions, is always ensured.

An important objective of this research is to show that the finite-dimensional time-varying variable selection problem in the autoregressive setup introduced in Chandra and Bhattacharya (2020) admits extension to the setup where the number of covariates to be selected by our Bayesian non-marginal procedure grows with the sample size. Indeed, we have shown that under reasonable assumptions, our asymptotic theories remain valid for this problem in both the high-dimensional and ultra high-dimensional situations. Different priors for the regression coefficients are of course warranted, and we have discussed the classes of relevant priors for the two setups. As far as we are aware, at least in the time series context, such high-dimensional multiple hypothesis testing has not hitherto been dealt with. The priors that we introduce, particularly in the ultra high-dimensional context, also do not seem to have been considered before. These priors, in conjunction with the equipartition property, help control the sparsity of the model quite precisely. As such, these ideas seem to be of independent interest for general high-dimensional asymptotics.

Supplementary Material

S-10 Proof of Theorem 4

Proof. From conditions (4.2) and (4.3), it follows that there exists n1n_{1} such that for all n>n1n>n_{1}

βn\displaystyle\beta_{n} >β¯δ,\displaystyle>\underline{\beta}-\delta, (S-10.1)
βn\displaystyle\beta_{n} <1δ, such that\displaystyle<1-\delta,\text{ such that} (S-10.2)

β¯δ>0\underline{\beta}-\delta>0 and 1β¯>δ1-\bar{\beta}>\delta, for some δ>0\delta>0. It follows using this, (4.7) and (4.9), that for n>n1n>n_{1},

i:𝒅(mn)𝔻i,mncmnditwin(𝒅t(mn))i:𝒅(mn)𝔻i,mncmndiwin(𝒅(mn))\displaystyle\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) (S-10.3)
>(1en(Jϵ))i:𝒅(mn)𝔻i,mncditen(Jϵ)i:𝒅(mn)𝔻i,mncdi,and\displaystyle\qquad\qquad>\left(1-e^{-n(J-\epsilon)}\right)\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}d^{t}_{i}-e^{-n(J-\epsilon)}\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}d_{i},~{}\mbox{and}
βn(i:𝒅𝔻i,mncmnditi:𝒅(mn)𝔻i,mncmndi)<(1δ)i:𝒅𝔻i,mncmndit(β¯δ)i:𝒅(mn)𝔻i,mncmndi.\displaystyle\beta_{n}\left(\sum_{i:\boldsymbol{d}\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}\right)<(1-\delta)\sum_{i:\boldsymbol{d}\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}-(\underline{\beta}-\delta)\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}. (S-10.4)

Now n1n_{1} can be appropriately chosen such that en(Jϵ)<min{δ,β¯δ}e^{-n(J-\epsilon)}<\min\{\delta,\underline{\beta}-\delta\}. Hence, for n>max{n0,n1}n>\max\{n_{0},n_{1}\},

i:𝒅𝔻i,mncmnditwin(𝒅t(mn))i:𝒅(mn)𝔻i,mncmndiwin(𝒅(mn))>βn(i:𝒅(mn)𝔻i,mncmnditi:𝒅(mn)𝔻i,mncmndi),\displaystyle\sum_{i:\boldsymbol{d}\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))>\beta_{n}\left(\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d^{t}_{i}-\sum_{i:\boldsymbol{d}(m_{n})\in\mathbb{D}_{i,m_{n}}^{c}}^{m_{n}}d_{i}\right),
for all𝒅(mn)𝒅t(mn), almost surely;\displaystyle\qquad~{}\mbox{for all}~{}\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n}),\text{ almost surely};
\displaystyle\Rightarrow i=1mndit(win(𝒅t(mn))βn)>i=1mndi(win(𝒅(mn))βn),for all𝒅(mn)𝒅t(mn), almost surely;\displaystyle\sum_{i=1}^{m_{n}}d^{t}_{i}(w_{in}(\boldsymbol{d}^{t}(m_{n}))-\beta_{n})>\sum_{i=1}^{m_{n}}d_{i}(w_{in}(\boldsymbol{d}(m_{n}))-\beta_{n}),~{}\mbox{for all}~{}\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n}),\text{ almost surely};
\displaystyle\Rightarrow limnδ𝒩(𝒅t(mn)|𝑿n)=1,almost surely.\displaystyle\lim_{n\rightarrow\infty}\delta_{\mathcal{NM}}(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n})=1,~{}\mbox{almost surely}.

Hence, (4.11) holds, and by the dominated convergence theorem, (4.12) also follows.   

S-11 Proof of Theorem 7

Proof.

𝒅(mn)𝟎i=1mndi(1win(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)\displaystyle\sum_{\boldsymbol{d}(m_{n})\neq\boldsymbol{0}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right)
=\displaystyle= i=1mndit(1win(𝒅t(mn)))i=1mnditδ𝒩(𝒅t(mn)|𝑿n)+𝒅(mn)𝒅t(mn)𝟎i=1mndi(1win(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n).\displaystyle\frac{\sum_{i=1}^{m_{n}}d_{i}^{t}(1-w_{in}(\boldsymbol{d}^{t}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}^{t}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n}\right)+\sum_{\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n})\neq\boldsymbol{0}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right).

Following Theorem 4, it holds, almost surely, that there exists N1N\geq 1 such that for all n>Nn>N, δ𝒩(𝒅(mn)|𝑿n)=0\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right)=0 for all 𝒅(mn)𝒅t(mn)\boldsymbol{d}(m_{n})\neq\boldsymbol{d}^{t}(m_{n}). Therefore, for n>Nn>N,

𝒅(mn)𝟎i=1mndi(1win(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)\displaystyle\sum_{\boldsymbol{d}(m_{n})\neq\boldsymbol{0}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n}\right)
=\displaystyle= i=1mndit(1win(𝒅t(mn)))i=1mnditδ𝒩(𝒅t(mn)|𝑿n)\displaystyle\frac{\sum_{i=1}^{m_{n}}d_{i}^{t}(1-w_{in}(\boldsymbol{d}^{t}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}^{t}}\delta_{\mathcal{NM}}\left(\boldsymbol{d}^{t}(m_{n})|\boldsymbol{X}_{n}\right)
\displaystyle\leq i=1mnditen(Jϵ)i=1mndit\displaystyle\frac{\sum_{i=1}^{m_{n}}d_{i}^{t}e^{-n(J-\epsilon)}}{\sum_{i=1}^{m_{n}}d_{i}^{t}}
=\displaystyle= en(Jϵ).\displaystyle e^{-n(J-\epsilon)}.

Thus, (5.1) is established. Using (4.10) and Corollary 5, (5.2) follows in the same way.   

S-11.1 Proof of Theorem 9

Proof. Note that

mpBFDR\displaystyle mpBFDR
=E𝑿n[𝒅(mn)𝔻mni=1mndi(1wi(𝒅(mn)))i=1mndiδβ(𝒅(mn)|𝑿n)|δ𝒩(𝒅(mn)=𝟎|𝑿n)=0]\displaystyle=E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\beta}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\bigg{|}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]
=\displaystyle= E𝑿n[𝒅(mn)𝔻mni=1mndi(1wi(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)|δ𝒩(𝒅(mn)=𝟎|𝑿n)=0]\displaystyle E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\bigg{|}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]
=\displaystyle= E𝑿n[𝒅(mn)𝔻mni=1mndi(1wi(𝒅(mn)))i=1mndiI(i=1mndi>0)δ𝒩(𝒅(mn)|𝑿n)]1P𝑿n[δ𝒩(𝒅(mn)=𝟎|𝑿n)=0]\displaystyle E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}I\left(\sum_{i=1}^{m_{n}}d_{i}>0\right)\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\right]\frac{1}{P_{\boldsymbol{X}_{n}}\left[\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]}
=\displaystyle= E𝑿n[𝒅(mn)𝔻mn{𝟎}i=1mndi(1wi(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)]1P𝑿n[δ𝒩(𝒅(mn)=𝟎|𝑿n)=0].\displaystyle E_{\boldsymbol{X}_{n}}\left[\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}\setminus\left\{\boldsymbol{0}\right\}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\right]\frac{1}{P_{\boldsymbol{X}_{n}}\left[\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right]}.

From Theorem 7, mFDR𝑿n0mFDR_{\boldsymbol{X}_{n}}\rightarrow 0, as nn\rightarrow\infty. Also we have

0𝒅(mn)𝔻mn{𝟎}i=1mndi(1wi(𝒅(mn)))i=1mndiδ𝒩(𝒅(mn)|𝑿n)mFDR𝑿n1.0\leq\sum_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}\setminus\left\{\boldsymbol{0}\right\}}\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{i}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})|\boldsymbol{X}_{n})\leq mFDR_{\boldsymbol{X}_{n}}\leq 1.

Therefore, by the dominated convergence theorem, $E_{\boldsymbol{X}_n}\left[\sum_{\boldsymbol{d}(m_n)\in\mathbb{D}_{m_n}\setminus\{\boldsymbol{0}\}}\frac{\sum_{i=1}^{m_n}d_i(1-w_i(\boldsymbol{d}(m_n)))}{\sum_{i=1}^{m_n}d_i}\delta_{\mathcal{NM}}(\boldsymbol{d}(m_n)|\boldsymbol{X}_n)\right]\rightarrow 0$, as $n\rightarrow\infty$. From (A2) we have $\boldsymbol{d}^t(m_n)\neq\boldsymbol{0}$, and from Theorem 4 we have $E_{\boldsymbol{X}_n}[\delta_{\mathcal{NM}}(\boldsymbol{d}^t(m_n)|\boldsymbol{X}_n)]\rightarrow 1$. Thus $P_{\boldsymbol{X}_n}\left[\delta_{\mathcal{NM}}(\boldsymbol{d}(m_n)=\boldsymbol{0}|\boldsymbol{X}_n)=0\right]\rightarrow 1$, as $n\rightarrow\infty$. This proves the result.

It can be similarly shown that pBFDR0pBFDR\rightarrow 0, as nn\rightarrow\infty.   

S-12 Proof of Theorem 10

Proof. The proof follows in the same way as that of Theorem 7, using (A2) in addition.   

S-12.1 Proof of Theorem 12

Proof. The proof follows in the same way as that of Theorem 9, using (A2) in addition.   

S-12.2 Proof of Theorem 13

Proof. Theorem 3.4 of Chandra and Bhattacharya (2019) shows that mpBFDRmpBFDR is non-increasing in β\beta. Hence, for every n>1n>1, the maximum error that can be incurred is at β=0\beta=0 where we actually maximize i=1mndiwin(𝒅(mn))\sum_{i=1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})). Let

𝒅^(mn)\displaystyle\hat{\boldsymbol{d}}(m_{n}) =argmax𝒅(mn)𝔻mni=1mndiwin(𝒅(mn))=argmax𝒅(mn)𝔻mn[i=1m1ndiwin(𝒅(mn))+i=m1n+1mndiwin(𝒅(mn))]\displaystyle=\operatorname*{argmax}_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\sum_{i=1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))=\operatorname*{argmax}_{\boldsymbol{d}(m_{n})\in\mathbb{D}_{m_{n}}}\left[\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))+\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))\right]

Since the groups in {Gi,mn:i=1,,m1n}\{G_{i,m_{n}}:i=1,\ldots,m_{1n}\} have no overlap with those in {Gi,mn:i=m1n+1,,mn}\{G_{i,m_{n}}:i=m_{1n}+1,\ldots,m_{n}\}, i=1m1ndiwin(𝒅(mn))\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) and i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) can be maximized separately.

Let us define the following notations:

Q𝒅(mn)={i{1,,mn}:all elements of𝒅Gi,mnare correct};\displaystyle Q_{\boldsymbol{d}(m_{n})}=\left\{i\in\{1,\ldots,m_{n}\}:\mbox{all elements of}~{}\boldsymbol{d}_{G_{i,m_{n}}}~{}\mbox{are correct}\right\};
Q𝒅(mn)m1n=Q𝒅(mn){1,2,,m1n},Q𝒅(mn)m1nc={1,2,,m1n}Q𝒅(mn)m1n.\displaystyle Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}=Q_{\boldsymbol{d}(m_{n})}\cap\{1,2,\ldots,m_{1n}\},~{}Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}=\{1,2,\cdots,m_{1n}\}\setminus Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}.

Now,

i=1m1ndiwin(𝒅(mn))i=1m1nditwin(𝒅t(mn))\displaystyle\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i=1}^{m_{1n}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))
=\displaystyle= [iQ𝒅(mn)m1ndiwin(𝒅(mn))iQ𝒅(mn)m1nditwin(𝒅t(mn))]+[iQ𝒅(mn)m1ncdiwin(𝒅(mn))iQ𝒅(mn)m1ncditwin(𝒅t(mn))]\displaystyle\left[\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))\right]+\left[\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))\right]
=\displaystyle= iQ𝒅(mn)m1ncdiwin(𝒅(mn))iQ𝒅(mn)m1ncditwin(𝒅t(mn)),\displaystyle\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n})),

since for any 𝒅(mn)\boldsymbol{d}(m_{n}), iQ𝒅(mn)m1ndiwin(𝒅(mn))=iQ𝒅(mn)m1nditwin(𝒅t(mn))\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))=\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n})) by definition of Q𝒅(mn)m1nQ_{\boldsymbol{d}(m_{n})}^{m_{1n}}.

Note that $\sum_{i\in Q_{\boldsymbol{d}(m_n)}^{m_{1n}c}}d_i^t w_{in}(\boldsymbol{d}^t(m_n))$ cannot be zero, as that would contradict (B), by which $\{G_{i,m_n}:i=1,\ldots,m_{1n}\}$ contain at least one false null hypothesis.

Now, from (4.7) and (4.9), we obtain for nn0(ϵ)n\geq n_{0}(\epsilon),

iQ𝒅(mn)m1ncdiwin(𝒅(mn))iQ𝒅(mn)m1ncditwin(𝒅t(mn))\displaystyle\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}w_{in}(\boldsymbol{d}(m_{n}))-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))
<en(Jϵ)iQ𝒅(mn)m1nc(di+dit)iQ𝒅(mn)m1ncdit\displaystyle\qquad<e^{-n(J-\epsilon)}\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}\left(d_{i}+d^{t}_{i}\right)-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d^{t}_{i}
<2m1nen(Jϵ)iQ𝒅(mn)m1ncdit.\displaystyle\qquad<2m_{1n}e^{-n(J-\epsilon)}-\sum_{i\in Q_{\boldsymbol{d}(m_{n})}^{m_{1n}c}}d^{t}_{i}. (S-12.1)

By our assumption (7.3), $m_n e^{-n(J-\epsilon)}\rightarrow 0$ as $n\rightarrow\infty$, so that $m_{1n}e^{-n(J-\epsilon)}\rightarrow 0$ as well. Also, $\sum_{i\in Q_{\boldsymbol{d}(m_n)}^{m_{1n}c}}d^t_i>0$. Hence, (S-12.1) is negative for sufficiently large $n$. In other words, $\boldsymbol{d}^t(m_n)$ maximizes $\sum_{i=1}^{m_{1n}}d_i w_{in}(\boldsymbol{d}(m_n))$ for sufficiently large $n$.

Let us now consider the term i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})). Note that i=m1n+1mnditwin(𝒅t(mn))=0\sum_{i=m_{1n}+1}^{m_{n}}d_{i}^{t}w_{in}(\boldsymbol{d}^{t}(m_{n}))=0 by (B). For any finite nn, i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) is maximized for some decision configuration 𝒅~(mn)\tilde{\boldsymbol{d}}(m_{n}) where d~i=1\tilde{d}_{i}=1 for at least one i{m1n+1,,mn}i\in\{m_{1n}+1,\ldots,m_{n}\}. In that case,

𝒅^t(mn)=(d1t,,dm1nt,d~m1n+1,d~m1n+2,,d~mn),\hat{\boldsymbol{d}}^{t}(m_{n})=(d^{t}_{1},\ldots,d^{t}_{m_{1n}},\tilde{d}_{m_{1n}+1},\tilde{d}_{m_{1n}+2},\ldots,\tilde{d}_{m_{n}}),

so that for sufficiently large nn,

i=1mnd^i(1win(𝒅^(mn)))i=1mnd^i1i=1m1nditwin(𝒅t(mn))+(mnm1n)en(Jϵ)i=1mndit+1\displaystyle\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}\geq 1-\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))+(m_{n}-m_{1n})e^{-n(J-\epsilon)}}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}
=1+i=1m1ndit(1win(𝒅t))i=1mndit+1(mnm1n)en(Jϵ)i=1mndit+1.\displaystyle\qquad=\frac{1+\sum_{i=1}^{m_{1n}}d^{t}_{i}\left(1-w_{in}(\boldsymbol{d}^{t})\right)}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}-\frac{(m_{n}-m_{1n})e^{-n(J-\epsilon)}}{\sum_{i=1}^{m_{n}}d^{t}_{i}+1}. (S-12.2)

Now note that

0<i=1m1ndit(1win(𝒅t))mn<en(Jϵ)i=1m1nditmn<m1nmnen(Jϵ).0<\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}\left(1-w_{in}(\boldsymbol{d}^{t})\right)}{m_{n}}<e^{-n(J-\epsilon)}\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}<\frac{m_{1n}}{m_{n}}e^{-n(J-\epsilon)}. (S-12.3)

Since the right most side of (S-12.3) tends to zero as nn\rightarrow\infty due to (7.1), it follows that i=1m1ndit(1win(𝒅t))mn0\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}\left(1-w_{in}(\boldsymbol{d}^{t})\right)}{m_{n}}\rightarrow 0 as nn\rightarrow\infty. Hence, dividing the numerators and denominators of the right hand side of (S-12.2) by mnm_{n} and taking limit as nn\rightarrow\infty shows that

limni=1mnd^i(1win(𝒅^(mn)))i=1mnd^i0.\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}\geq 0. (S-12.4)

almost surely, for all data sequences. Boundedness of i=1mndi(1win(𝒅(mn)))i=1mndi\frac{\sum_{i=1}^{m_{n}}d_{i}(1-w_{in}(\boldsymbol{d}(m_{n})))}{\sum_{i=1}^{m_{n}}d_{i}} for all 𝒅(mn)\boldsymbol{d}(m_{n}) and 𝑿n\boldsymbol{X}_{n} ensures uniform integrability, which, in conjunction with the simple observation that for β=0\beta=0,

P(δ𝒩(𝒅(mn)=𝟎|𝑿n)=0)=1P\left(\delta_{\mathcal{NM}}(\boldsymbol{d}(m_{n})=\boldsymbol{0}|\boldsymbol{X}_{n})=0\right)=1

for all n1n\geq 1, guarantees that under (B), limnmpBFDR0\underset{n\rightarrow\infty}{\lim}~{}mpBFDR\geq 0.

Now, if Gm1n+1,,GmnG_{m_{1n}+1},\ldots,G_{m_{n}} are all disjoint, each consisting of only one true null hypothesis, then i=m1n+1mndiwin(𝒅(mn))\sum_{i=m_{1n}+1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) will be maximized by 𝒅~(mn)\tilde{\boldsymbol{d}}(m_{n}) where d~i=1\tilde{d}_{i}=1 for all i{m1n+1,,mn}i\in\{m_{1n}+1,\ldots,m_{n}\}. Since ditd^{t}_{i}; i=1,,m1ni=1,\ldots,m_{1n} maximizes i=1m1ndiwin(𝒅(mn))\sum_{i=1}^{m_{1n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) for large nn, it follows that 𝒅^(mn)=(d1t,,dm1nt,1,1,,1)\hat{\boldsymbol{d}}(m_{n})=(d^{t}_{1},\ldots,d^{t}_{m_{1n}},1,1,\ldots,1) is the maximizer of i=1mndiwin(𝒅(mn))\sum_{i=1}^{m_{n}}d_{i}w_{in}(\boldsymbol{d}(m_{n})) for large nn. In this case,

i=1mnd^i(1win(𝒅^(mn)))i=1mnd^i=1i=1m1nditwin(𝒅t(mn))+i=m1n+1mnwin(𝟏)i=1mndit+mnm1n.\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=1-\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))+\sum_{i=m_{1n}+1}^{m_{n}}w_{in}(\boldsymbol{1})}{\sum_{i=1}^{m_{n}}d^{t}_{i}+m_{n}-m_{1n}}. (S-12.5)

Now, for large enough nn,

(1en(Jϵ))i=1m1nditmn<i=1m1nditwin(𝒅t(mn))mn<i=1m1nditmn.\left(1-e^{-n(J-\epsilon)}\right)\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}<\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))}{m_{n}}<\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}. (S-12.6)

Since due to (7.2), i=1m1nditmnp\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}}{m_{n}}\rightarrow p, as nn\rightarrow\infty, it follows from (S-12.6) that

i=1m1nditwin(𝒅t(mn))mnp,asn.\frac{\sum_{i=1}^{m_{1n}}d^{t}_{i}w_{in}(\boldsymbol{d}^{t}(m_{n}))}{m_{n}}\rightarrow p,~{}\mbox{as}~{}n\rightarrow\infty. (S-12.7)

Also, since for large enough nn,

0<i=m1n+1mnwin(𝟏)mn<(mnm1n)mnen(Jϵ),0<\frac{\sum_{i=m_{1n}+1}^{m_{n}}w_{in}(\boldsymbol{1})}{m_{n}}<\frac{(m_{n}-m_{1n})}{m_{n}}e^{-n(J-\epsilon)},

it follows using (7.1) that

i=m1n+1mnwin(𝟏)mn0,asn.\frac{\sum_{i=m_{1n}+1}^{m_{n}}w_{in}(\boldsymbol{1})}{m_{n}}\rightarrow 0,~{}\mbox{as}~{}n\rightarrow\infty. (S-12.8)

Hence, dividing the numerator and denominator in the ratio on the right hand side of (S-12.5) by mnm_{n} and using the limits (S-12.7), (S-12.8) and (7.1) as nn\rightarrow\infty, yields

limni=1mnd^i(1win(𝒅^(mn)))i=1mnd^i=1q1+pq.\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-w_{in}(\hat{\boldsymbol{d}}(m_{n})))}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=\frac{1-q}{1+p-q}. (S-12.9)

Hence, in this case, the maximum mpBFDRmpBFDR (that can be incurred at β=0\beta=0) for nn\rightarrow\infty is given by

limnmpBFDRβ=0=1q1+pq.\lim_{n\rightarrow\infty}mpBFDR_{\beta=0}=\frac{1-q}{1+p-q}.

Note that this is also the maximum asymptotic mpBFDRmpBFDR that can be incurred among all possible configurations of Gm1n+1,,GmnG_{m_{1n}+1},\ldots,G_{m_{n}}. Hence, for any arbitrary configuration of groups, the maximum asymptotic mpBFDRmpBFDR that can be incurred lies in the interval (0,1q1+pq)\left(0,\frac{1-q}{1+p-q}\right).   

S-12.3 Proof of Theorem 17

Proof. Using the facts that mpBFDRmpBFDR is continuous and decreasing in β\beta (Chandra and Bhattacharya (2019)) and that mpBFDRmpBFDR tends to 0 (Theorem 9), the proof follows in the same way as that of Theorem 8 of Chandra and Bhattacharya (2020).   

S-12.4 Proof of Theorem 19

Proof. From Chandra and Bhattacharya (2019) it is known that mpBFDRmpBFDR and pBFDRpBFDR are continuous and non-increasing in β\beta. If 𝒅^(mn)\hat{\boldsymbol{d}}(m_{n}) denotes the optimal decision configuration with respect to the additive loss function, d^i=1\hat{d}_{i}=1 for all ii, for β=0\beta=0. Thus, assuming without loss of generality that the first m0nm_{0n} null hypotheses are true,

i=1mnd^i(1vin)i=1mnd^i=1i=1m0nvin+i=m0n+1mnvinmn.\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-v_{in})}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=1-\frac{\sum_{i=1}^{m_{0n}}v_{in}+\sum_{i=m_{0n}+1}^{m_{n}}v_{in}}{m_{n}}. (S-12.10)

Now, $0<\frac{\sum_{i=1}^{m_{0n}}v_{in}}{m_n}<\frac{m_{0n}}{m_n}e^{-n(J-\epsilon)}$, so that $\frac{\sum_{i=1}^{m_{0n}}v_{in}}{m_n}\rightarrow 0$ as $n\rightarrow\infty$. Also, $\left(1-e^{-n(J-\epsilon)}\right)\left(1-\frac{m_{0n}}{m_n}\right)<\frac{\sum_{i=m_{0n}+1}^{m_n}v_{in}}{m_n}<1-\frac{m_{0n}}{m_n}$, so that $\frac{\sum_{i=m_{0n}+1}^{m_n}v_{in}}{m_n}\rightarrow 1-p_0$ as $n\rightarrow\infty$, where $p_0$ denotes the limiting proportion $\lim_{n\rightarrow\infty}m_{0n}/m_n$ of true null hypotheses. Hence, taking limits on both sides of (S-12.10), we obtain

limni=1mnd^i(1vin)i=1mnd^i=p0.\lim_{n\rightarrow\infty}\frac{\sum_{i=1}^{m_{n}}\hat{d}_{i}(1-v_{in})}{\sum_{i=1}^{m_{n}}\hat{d}_{i}}=p_{0}.

The remaining part of the proof follows in the same way as that of Theorem 17.   

S-12.5 Proof of Theorem 20

Proof. The proof follows in the same way as that of Theorem 10 of Chandra and Bhattacharya (2020) using the facts mpBFDRβ>pBFDRβmpBFDR_{\beta}>pBFDR_{\beta} for any multiple testing procedure, limnpBFDRβ=0=p0\underset{n\rightarrow\infty}{\lim}~{}pBFDR_{\beta=0}=p_{0} (due to Theorem 19), and that mpBFDRmpBFDR is continuous and non-increasing in β\beta and tends to zero as nn\rightarrow\infty.   

S-12.6 Proof of Theorem 21

Proof. Note that by Theorem 17, there exists a sequence $\{\beta_n\}$ such that $\lim_{n\rightarrow\infty}mpBFDR_{\beta_n}=\alpha$, where $\alpha\in\left(0,\frac{1-q}{1+p-q}\right)$. Let $\hat{\boldsymbol{d}}(m_n)$ be the optimal decision configuration associated with the sequence $\{\beta_n\}$. The proofs of Theorems 13 and 17 show that $\hat{d}_{in}=d_i^t$ for $i=1,\ldots,m_{1n}$ and $\sum_{i=m_{1n}+1}^{m_n}\hat{d}_{in}>0$. Hence, using (4.8) we obtain

$\frac{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})v_{in}}{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})}\leq\frac{\sum_{i=1}^{m_{n}}(1-d_{i}^{t})v_{in}}{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})}<e^{-n(J-\epsilon)}\times\frac{\sum_{i=1}^{m_{n}}(1-d_{i}^{t})}{\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})}$ (S-12.11)
\displaystyle\Rightarrow~{} 1nlog(FNR𝑿n)<J+ϵ+1nlog[i=1mn(1dit)]1nlog[i=1mn(1d^in)].\displaystyle\frac{1}{n}\log\left(FNR_{\boldsymbol{X}_{n}}\right)<-J+\epsilon+\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-d_{i}^{t})\right]-\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})\right]. (S-12.12)

Now,

01nlog[i=1mn(1dit)]logmnn;\displaystyle 0\leq\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-d^{t}_{i})\right]\leq\frac{\log m_{n}}{n};
01nlog[i=1mn(1d^in)]logmnn.\displaystyle 0\leq\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})\right]\leq\frac{\log m_{n}}{n}.

Since logmnn0\frac{\log m_{n}}{n}\rightarrow 0, as nn\rightarrow\infty,

$\lim_{n\rightarrow\infty}\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-d^{t}_{i})\right]=0,~\mbox{and}$ (S-12.13)
$\lim_{n\rightarrow\infty}\frac{1}{n}\log\left[\sum_{i=1}^{m_{n}}(1-\hat{d}_{in})\right]=0.$ (S-12.14)

As ϵ\epsilon is any arbitrary positive quantity we have from (S-12.12), (S-12.13) and (S-12.14) that

lim supn1nlog(FNR𝑿n)J.\limsup_{n\rightarrow\infty}\frac{1}{n}\log\left(FNR_{\boldsymbol{X}_{n}}\right)\leq-J.

 

S-13 Verification of (S1)-(S7) in AR(1)AR(1) model with time-varying covariates and proofs of the relevant theorems

All the probabilities and expectations below are with respect to the true model PP.

S-13.1 Verification of (S1)

We obtain

logRn(𝜽)\displaystyle-\log R_{n}(\boldsymbol{\theta}) =nlog(σσ0)+(12σ212σ02)t=1nxt2+(ρ22σ2ρ022σ02)t=1nxt12\displaystyle=n\log\left(\frac{\sigma}{\sigma_{0}}\right)+\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\sum_{t=1}^{n}x^{2}_{t}+\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\sum_{t=1}^{n}x^{2}_{t-1}
+12σ2𝜷m(t=1n𝒛mt𝒛mt)𝜷m12σ02𝜷m0(t=1n𝒛mt𝒛mt)𝜷m0\displaystyle\qquad+\frac{1}{2\sigma^{2}}\boldsymbol{\beta}^{\prime}_{m}\left(\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\right)\boldsymbol{\beta}_{m}-\frac{1}{2\sigma^{2}_{0}}\boldsymbol{\beta}^{\prime}_{m0}\left(\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\right)\boldsymbol{\beta}_{m0}
(ρσ2ρ0σ02)t=1nxtxt1(𝜷mσ2𝜷m0σ02)t=1n𝒛mtxt\displaystyle\qquad-\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\sum_{t=1}^{n}x_{t}x_{t-1}-\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}
$\qquad+\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}.$ (S-13.1)

It is easily seen that logRn(𝜽)-\log R_{n}(\boldsymbol{\theta}) is continuous in 𝑿n\boldsymbol{X}_{n} and 𝜽\boldsymbol{\theta}. Hence, Rn(𝜽)R_{n}(\boldsymbol{\theta}) is n×𝒯\mathcal{F}_{n}\times\mathcal{T} measurable. In other words, (S1) holds.

S-13.2 Proof of Lemma 23

It is easy to see that under the true model PP,

E(xt)\displaystyle E(x_{t}) =k=1tρ0tk𝒛mk𝜷m0;\displaystyle=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}; (S-13.2)
E(xt+hxt)\displaystyle E(x_{t+h}x_{t}) σ02ρ0h1ρ02+E(xt+h)E(xt);h0,\displaystyle\sim\frac{\sigma^{2}_{0}\rho^{h}_{0}}{1-\rho^{2}_{0}}+E(x_{t+h})E(x_{t});~{}h\geq 0, (S-13.3)

where for any two sequences {at}t=1\{a_{t}\}_{t=1}^{\infty} and {bt}t=1\{b_{t}\}_{t=1}^{\infty}, atbta_{t}\sim b_{t} stands for at/bt1a_{t}/b_{t}\rightarrow 1 as tt\rightarrow\infty. Hence,

E(xt2)σ021ρ02+(k=1tρ0tk𝒛mk𝜷m0)2.E(x^{2}_{t})\sim\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\left(\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}\right)^{2}. (S-13.4)

Now let

ϱt=k=1tρ0tk𝒛mk𝜷m0\varrho_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0} (S-13.5)

and for t>t0t>t_{0},

ϱ~t=k=tt0tρ0tk𝒛mk𝜷m0,\tilde{\varrho}_{t}=\sum_{k=t-t_{0}}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}, (S-13.6)

where, for any ε>0\varepsilon>0, t0t_{0} is so large that

$\frac{C\left|\rho_{0}\right|^{t_{0}+1}}{1-\left|\rho_{0}\right|}\leq\varepsilon.$ (S-13.7)

It follows, using (8.9) and (S-13.7), that for t>t0t>t_{0},

$\left|\varrho_{t}-\tilde{\varrho}_{t}\right|\leq\sum_{k=1}^{t-t_{0}-1}|\rho_{0}|^{t-k}\left|\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}\right|\leq\frac{C|\rho_{0}|^{t_{0}+1}\left(1-|\rho_{0}|^{t-t_{0}-1}\right)}{1-|\rho_{0}|}\leq\varepsilon.$ (S-13.8)

Hence, for t>t0t>t_{0},

ϱ~tεϱtϱ~t+ε.\tilde{\varrho}_{t}-\varepsilon\leq\varrho_{t}\leq\tilde{\varrho}_{t}+\varepsilon. (S-13.9)

Now,

t=1nϱ~tn\displaystyle\frac{\sum_{t=1}^{n}\tilde{\varrho}_{t}}{n} =ρ0t0(t=1n𝒛mtn)𝜷m0+ρ0t01(t=2n𝒛mtn)𝜷m0+ρ0t02(t=3n𝒛mtn)𝜷m0+\displaystyle=\rho^{t_{0}}_{0}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\rho^{t_{0}-1}_{0}\left(\frac{\sum_{t=2}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\rho^{t_{0}-2}_{0}\left(\frac{\sum_{t=3}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\cdots
+ρ0(t=t0n𝒛mtn)𝜷m0+(t=t0+1n𝒛mtn)𝜷m0\displaystyle\qquad\qquad\cdots+\rho_{0}\left(\frac{\sum_{t=t_{0}}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}+\left(\frac{\sum_{t=t_{0}+1}^{n}\boldsymbol{z}_{mt}}{n}\right)^{\prime}\boldsymbol{\beta}_{m0}
0,asn,by virtue of (B4) (8.6).\displaystyle\rightarrow 0,~{}\mbox{as}~{}n\rightarrow\infty,~{}\mbox{by virtue of (B4) (\ref{eq:ass1})}. (S-13.10)

Similarly, it is easily seen, using (B4), that

t=1nϱ~t2n(1ρ02(2t0+1)1ρ02)c(𝜷0),asn.\frac{\sum_{t=1}^{n}\tilde{\varrho}^{2}_{t}}{n}\rightarrow\left(\frac{1-\rho^{2(2t_{0}+1)}_{0}}{1-\rho^{2}_{0}}\right)c(\boldsymbol{\beta}_{0}),~{}\mbox{as}~{}n\rightarrow\infty. (S-13.11)

Since (S-13.8) implies that for t>t0t>t_{0}, ϱ~t2+ε22εϱ~tϱt2ϱ~t2+ε2+2εϱ~t\tilde{\varrho}^{2}_{t}+\varepsilon^{2}-2\varepsilon\tilde{\varrho}_{t}\leq\varrho^{2}_{t}\leq\tilde{\varrho}^{2}_{t}+\varepsilon^{2}+2\varepsilon\tilde{\varrho}_{t}, it follows that

limnt=1nϱt2n=limnt=1nϱ~t2n+ε2=(1ρ02(2t0+1)1ρ02)c(𝜷0)+ε2,\underset{n\rightarrow\infty}{\lim}~{}\frac{\sum_{t=1}^{n}\varrho^{2}_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~{}\frac{\sum_{t=1}^{n}\tilde{\varrho}^{2}_{t}}{n}+\varepsilon^{2}=\left(\frac{1-\rho^{2(2t_{0}+1)}_{0}}{1-\rho^{2}_{0}}\right)c(\boldsymbol{\beta}_{0})+\varepsilon^{2}, (S-13.12)

and since $\varepsilon>0$ is arbitrary, with $t_0\rightarrow\infty$ as $\varepsilon\rightarrow 0$ by (S-13.7), it follows that

limnt=1nϱt2n=c(𝜷0)1ρ02.\underset{n\rightarrow\infty}{\lim}~{}\frac{\sum_{t=1}^{n}\varrho^{2}_{t}}{n}=\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.13)

Hence, it also follows from (S-13.2), (S-13.4), (B4) and (S-13.13), that

t=1nE(xt2)nσ021ρ02+c(𝜷0)1ρ02,asn\frac{\sum_{t=1}^{n}E(x^{2}_{t})}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}},~{}\mbox{as}~{}n\rightarrow\infty (S-13.14)

and

t=1nE(xt12)nσ021ρ02+c(𝜷0)1ρ02,asn.\frac{\sum_{t=1}^{n}E(x^{2}_{t-1})}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}},~{}\mbox{as}~{}n\rightarrow\infty. (S-13.15)

Now note that

$x_{t}x_{t-1}=\rho_{0}x^{2}_{t-1}+\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}x_{t-1}+\epsilon_{t}x_{t-1}.$ (S-13.16)

Using (8.10), (S-13.9) and arbitrariness of ε>0\varepsilon>0 it is again easy to see that

t=1n𝒛mt𝜷m0E(xt1)n0,asn.\frac{\sum_{t=1}^{n}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}E(x_{t-1})}{n}\rightarrow 0,~{}\mbox{as}~{}n\rightarrow\infty. (S-13.17)

Also, since for t=1,2,,t=1,2,\ldots, E(ϵtxt1)=E(ϵt)E(xt1)E(\epsilon_{t}x_{t-1})=E(\epsilon_{t})E(x_{t-1}) by independence, and since E(ϵt)=0E(\epsilon_{t})=0 for t=1,2,t=1,2,\ldots, it holds that

t=1nE(ϵtxt1)n=0,for alln=1,2,.\frac{\sum_{t=1}^{n}E\left(\epsilon_{t}x_{t-1}\right)}{n}=0,~{}\mbox{for all}~{}n=1,2,\ldots. (S-13.18)

Combining (S-13.16), (S-13.15), (S-13.17) and (S-13.18) we obtain

t=1nE(xtxt1)nρ0σ021ρ02+ρ0c(𝜷0)1ρ02.\frac{\sum_{t=1}^{n}E\left(x_{t}x_{t-1}\right)}{n}\rightarrow\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.19)

Using (B4), (8.9) and the arbitrariness of \varepsilon>0, it follows that

h(\boldsymbol{\theta})=\underset{n\rightarrow\infty}{\lim}~\frac{1}{n}E\left[-\log R_{n}(\boldsymbol{\theta})\right]=\log\left(\frac{\sigma}{\sigma_{0}}\right)+\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
+\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\left(\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)+\frac{c(\boldsymbol{\beta})}{2\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{2\sigma^{2}_{0}}
-\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)-\left(\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}-\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right).

In other words, (S2) holds, with h(\boldsymbol{\theta}) given by (8.17).

S-13.3 Proof of Theorem 24

Note that

x_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}+\sum_{k=1}^{t}\rho^{t-k}_{0}\epsilon_{k}, (S-13.20)

where \tilde{\epsilon}_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\epsilon_{k} is an asymptotically stationary Gaussian process with mean zero and covariance

cov(\tilde{\epsilon}_{t+h},\tilde{\epsilon}_{t})\rightarrow\frac{\sigma^{2}_{0}\rho^{h}_{0}}{1-\rho^{2}_{0}},~\mbox{as}~t\rightarrow\infty,~\mbox{for each fixed}~h\geq 0. (S-13.21)
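This covariance limit admits a direct simulation check (an illustration only, with arbitrarily chosen \rho_{0}, \sigma_{0} and lag h):

import numpy as np

rng = np.random.default_rng(1)
rho0, sigma0, h = 0.7, 2.0, 3
reps, t = 50_000, 300                        # t large enough for near-stationarity
e = np.zeros(reps)                           # e holds epsilon-tilde across replications
e_t = None
for s in range(1, t + h + 1):                # recursion: e_s = rho0*e_{s-1} + eps_s
    e = rho0 * e + sigma0 * rng.standard_normal(reps)
    if s == t:
        e_t = e.copy()                       # snapshot of epsilon-tilde_t
print(np.mean(e_t * e))                      # empirical cov(eps-tilde_{t+h}, eps-tilde_t)
print(sigma0**2 * rho0**h / (1 - rho0**2))   # the limit in (S-13.21)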

Then

\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}=\frac{\sum_{t=1}^{n}\varrho^{2}_{t}}{n}+\frac{\sum_{t=1}^{n}\tilde{\epsilon}^{2}_{t}}{n}+\frac{2\sum_{t=1}^{n}\tilde{\epsilon}_{t}\varrho_{t}}{n}. (S-13.22)

By (S-13.13), the first term on the right hand side of (S-13.22) converges to \frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}} as n\rightarrow\infty, and since \tilde{\epsilon}_{t}; t=1,2,\ldots, is also an irreducible and aperiodic Markov chain, by the ergodic theorem the second term of (S-13.22) converges to \sigma^{2}_{0}/(1-\rho^{2}_{0}) almost surely, as n\rightarrow\infty. For the third term, we observe that

|\boldsymbol{z}^{\prime}_{k}\boldsymbol{\beta}_{0}-\boldsymbol{z}^{\prime}_{mk}\boldsymbol{\beta}_{m0}|<\delta, (S-13.23)

for k>n_{0}, where n_{0}, depending upon \delta~(>0), is sufficiently large. Recalling from (B5) that \hat{\varrho}_{t}=\sum_{k=1}^{t}\rho^{t-k}_{0}\boldsymbol{z}^{\prime}_{k}\boldsymbol{\beta}_{0}, we then see that for t>n_{0},

|\varrho_{t}-\hat{\varrho}_{t}|<\frac{\delta}{1-|\rho_{0}|}<\varepsilon, (S-13.24)

for \delta<(1-|\rho_{0}|)\varepsilon. From (S-13.24) it follows that

\underset{n\rightarrow\infty}{\lim}~\frac{2\sum_{t=1}^{n}\tilde{\epsilon}_{t}\varrho_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{2\sum_{t=n_{0}+1}^{n}\tilde{\epsilon}_{t}\hat{\varrho}_{t}}{n-n_{0}}. (S-13.25)

Since by (B5) the limit of \hat{\varrho}_{t} exists as t\rightarrow\infty, it follows that \tilde{\epsilon}_{t}\hat{\varrho}_{t} is still an irreducible and aperiodic Markov chain whose asymptotically stationary distribution is that of a zero-mean Gaussian process. Hence, by the ergodic theorem, the third term of (S-13.22) converges to zero, almost surely, as n\rightarrow\infty. It follows that

\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}, (S-13.26)

and similarly,

\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}\rightarrow\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.27)

Now, since x_{t}=\varrho_{t}+\tilde{\epsilon}_{t}, it follows using (B2) (orthogonality) and (S-13.9) that for \tilde{\boldsymbol{\beta}}_{m}=\boldsymbol{\beta}_{m} or \tilde{\boldsymbol{\beta}}_{m}=\boldsymbol{\beta}_{m0},

\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}x_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}\tilde{\epsilon}_{t}}{n}. (S-13.28)

By (B4), the first term on the right hand side of (S-13.28) is \tilde{c}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}), where \tilde{c}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) is c(\boldsymbol{\beta}_{0}) or c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) according as \tilde{\boldsymbol{\beta}}_{m} is \boldsymbol{\beta}_{m0} or \boldsymbol{\beta}_{m}. For the second term, due to (S-13.23), \underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}\tilde{\epsilon}_{t}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}\boldsymbol{z}_{t}\tilde{\epsilon}_{t}}{n}, where \tilde{\boldsymbol{\beta}} is either \boldsymbol{\beta} or \boldsymbol{\beta}_{0}. By (B5) the limit of \tilde{\boldsymbol{\beta}}^{\prime}\boldsymbol{z}_{t} exists as t\rightarrow\infty, and hence \tilde{\boldsymbol{\beta}}^{\prime}\boldsymbol{z}_{t}\tilde{\epsilon}_{t} remains an irreducible, aperiodic Markov chain with a zero-mean Gaussian stationary distribution. Hence, by the ergodic theorem, the second term of (S-13.28) is zero, almost surely. In other words, almost surely,

\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}x_{t}}{n}\rightarrow\tilde{c}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}),~\mbox{as}~n\rightarrow\infty, (S-13.29)

and similar arguments show that, almost surely,

\frac{\sum_{t=1}^{n}\tilde{\boldsymbol{\beta}}^{\prime}_{m}\boldsymbol{z}_{mt}x_{t-1}}{n}\rightarrow 0,~\mbox{as}~n\rightarrow\infty. (S-13.30)

We now calculate the limit of \sum_{t=1}^{n}x_{t}x_{t-1}/n, as n\rightarrow\infty. By (S-13.16),

\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}x_{t}x_{t-1}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}. (S-13.31)

By (S-13.27), the first term on the right hand side of (S-13.31) is given, almost surely, by \frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}, and the second term is almost surely zero due to (S-13.30). For the third term, note that \epsilon_{t}x_{t-1}=\epsilon_{t}\varrho_{t-1}+\epsilon_{t}\tilde{\epsilon}_{t-1}, and hence, using (S-13.23), \underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}=\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}\hat{\varrho}_{t-1}}{n}+\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}\epsilon_{t}\tilde{\epsilon}_{t-1}}{n}. Both \epsilon_{t}\hat{\varrho}_{t-1}; t=1,2,\ldots, and \epsilon_{t}\tilde{\epsilon}_{t-1}; t=1,2,\ldots, are sample paths of irreducible and aperiodic Markov chains having stationary distributions with mean zero. Hence, by the ergodic theorem, the third term of (S-13.31) is zero, almost surely. That is,

\underset{n\rightarrow\infty}{\lim}~\frac{\sum_{t=1}^{n}x_{t}x_{t-1}}{n}=\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}+\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}. (S-13.32)

The limits (S-13.26), (S-13.27), (S-13.29), (S-13.30) and (S-13.32), applied to \log R_{n}(\boldsymbol{\theta}) given by (S-13.1), show that \frac{\log R_{n}(\boldsymbol{\theta})}{n} converges to -h(\boldsymbol{\theta}) almost surely as n\rightarrow\infty. In other words, (S3) holds.
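The almost sure convergence just established lends itself to a numerical sanity check, sketched below in Python. The sketch makes simplifying assumptions that are not part of the formal setup: a single Rademacher covariate stands in for \boldsymbol{z}_{mt} (so that c(\boldsymbol{\beta})=\beta^{2}, c(\boldsymbol{\beta}_{0})=\beta^{2}_{0} and c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})=\beta\beta_{0}), and R_{n}(\boldsymbol{\theta}) is taken to be the ratio of the likelihood at \boldsymbol{\theta} to that at \boldsymbol{\theta}_{0}. One long simulated path then makes \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) visibly close to -h(\boldsymbol{\theta}) computed from (8.17).

import numpy as np

rng = np.random.default_rng(2)
n = 500_000
rho0, beta0, sigma0 = 0.5, 1.0, 1.0        # true parameters
rho, beta, sigma = 0.3, 0.5, 1.2           # a fixed alternative theta
z = rng.choice([-1.0, 1.0], size=n)        # hypothetical scalar covariate
eps = sigma0 * rng.standard_normal(n)
x = np.empty(n)
prev = 0.0
for t in range(n):                         # x_t = rho0*x_{t-1} + beta0*z_t + eps_t
    x[t] = rho0 * prev + beta0 * z[t] + eps[t]
    prev = x[t]
xlag = np.concatenate(([0.0], x[:-1]))

def loglik(r, b, s):                       # Gaussian AR log-likelihood, constants dropped
    resid = x - r * xlag - b * z
    return -n * np.log(s) - 0.5 * np.sum(resid**2) / s**2

log_Rn = loglik(rho, beta, sigma) - loglik(rho0, beta0, sigma0)
# Closed-form h(theta) from (8.17), with c(beta)=beta^2 and c10=beta*beta0:
S = (sigma0**2 + beta0**2) / (1 - rho0**2)
h = (np.log(sigma / sigma0)
     + (1 / (2 * sigma**2) - 1 / (2 * sigma0**2)) * S
     + (rho**2 / (2 * sigma**2) - rho0**2 / (2 * sigma0**2)) * S
     + beta**2 / (2 * sigma**2) - beta0**2 / (2 * sigma0**2)
     - (rho / sigma**2 - rho0 / sigma0**2) * rho0 * S
     - (beta * beta0 / sigma**2 - beta0**2 / sigma0**2))
print(log_Rn / n, -h)                      # the two numbers should nearly agree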

S-13.4 Verification of (S4)

In the expression for h(\boldsymbol{\theta}) given by (8.17), note that c(\boldsymbol{\beta}) and c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}) are almost surely finite. Hence, for any prior on \sigma and \rho under which these are almost surely finite, (S4) clearly holds. In particular, this holds for any proper priors on \sigma and \rho.

S-13.5 Verification of (S5)

S-13.5.1 Verification of (S5) (1)

Since \Theta=\mathcal{C}^{\prime}(\mathcal{X})\times\mathbb{R}^{\infty}\times\mathbb{R}\times\mathbb{R}^{+}, it is easy to see that h(\Theta)=0. Let \boldsymbol{\gamma}_{m}=(\gamma_{1},\ldots,\gamma_{m})^{\prime}, \tilde{\gamma}_{m}=\sum_{i=1}^{m}|\gamma_{i}|, \boldsymbol{\theta}_{m}=(\tilde{\eta},\boldsymbol{\gamma}_{m},\rho,\sigma) and \Theta_{m}=\mathcal{C}^{\prime}(\mathcal{X})\times\mathbb{R}^{m}\times\mathbb{R}\times\mathbb{R}^{+}. We now define

\mathcal{G}_{n}=\left\{\boldsymbol{\theta}_{m}\in\Theta_{m}:|\rho|\leq\exp\left((\alpha n)^{1/16}\right),~\tilde{\gamma}_{m}\leq\exp\left((\alpha n)^{1/16}\right),~\|\tilde{\eta}\|\leq\exp\left((\alpha n)^{1/16}\right),~\|\tilde{\eta}^{\prime}\|\leq\exp\left((\alpha n)^{1/16}\right),~\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right\},

where \alpha>0.

Since |\gamma_{i}|<L<\infty for all i, it follows that \mathcal{G}_{n} is increasing in n for n\geq n_{0}, for some n_{0}\geq 1. To see this, note that if \tilde{\gamma}_{m_{n}}\leq\exp((\alpha n)^{1/16}), then \tilde{\gamma}_{m_{n+1}}=\tilde{\gamma}_{m_{n}}+\sum_{i=m_{n}+1}^{m_{n+1}}|\gamma_{i}|<\exp((\alpha(n+1))^{1/16}) provided \sum_{i=m_{n}+1}^{m_{n+1}}|\gamma_{i}|<L(m_{n+1}-m_{n})<\exp((\alpha(n+1))^{1/16})-\exp((\alpha n)^{1/16}), which holds by assumption (B7). Since \mathcal{G}_{n}\rightarrow\Theta as n\rightarrow\infty, there exists n_{1} such that \mathcal{G}_{n_{1}} contains \boldsymbol{\theta}_{0}. Hence, h(\mathcal{G}_{n})=0 for all n\geq n_{1}. In other words, h(\mathcal{G}_{n})\rightarrow h(\Theta), as n\rightarrow\infty. Now observe that

\pi\left(\mathcal{G}_{n}\right)
=\pi\left(\tilde{\gamma}_{m}\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}\|\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}^{\prime}\|\leq\exp((\alpha n)^{1/16}),~\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right)
\quad-\pi\left(|\rho|>\exp\left((\alpha n)^{1/16}\right),~\tilde{\gamma}_{m}\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}\|\leq\exp((\alpha n)^{1/16}),~\|\tilde{\eta}^{\prime}\|\leq\exp((\alpha n)^{1/16}),~\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right)
\geq 1-\pi\left(|\rho|>\exp\left((\alpha n)^{1/16}\right)\right)-\pi\left(\tilde{\gamma}_{m}>\exp\left((\alpha n)^{1/16}\right)\right)-\pi\left(\|\tilde{\eta}\|>\exp((\alpha n)^{1/16})\right)
\quad-\pi\left(\|\tilde{\eta}^{\prime}\|>\exp((\alpha n)^{1/16})\right)-\pi\left(\left\{\exp\left(-(\alpha n)^{1/16}\right)\leq\sigma\leq\exp\left((\alpha n)^{1/16}\right)\right\}^{c}\right)
\geq 1-(c_{\rho}+c_{\gamma}+c_{\tilde{\eta}}+c_{\tilde{\eta}^{\prime}}+c_{\sigma})\exp(-\alpha n),

where the last step is due to (B6).

S-13.5.2 Verification of (S5) (2)

First, we note that \mathcal{G}_{n} is compact, which can be proved using the Arzela-Ascoli lemma, in almost the same way as in Chatterjee and Bhattacharya (2020). Since \mathcal{G}_{n} is compact for all n\geq 1, the required uniform convergence will be proven if we can show that \frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta}) is stochastically equicontinuous almost surely in \boldsymbol{\theta}\in\mathcal{G} for any \mathcal{G}\in\left\{\mathcal{G}_{n}:n=1,2,\ldots\right\}, and that \frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\rightarrow 0, almost surely, for all \boldsymbol{\theta}\in\mathcal{G} (see Newey (1991) for the general theory of uniform convergence on compact sets under stochastic equicontinuity). Since we have already verified pointwise convergence for all \boldsymbol{\theta}\in\boldsymbol{\Theta} while verifying (S3), it remains to prove stochastic equicontinuity of \frac{1}{n}\log R_{n}(\cdot)+h(\cdot). Stochastic equicontinuity usually follows easily if the function concerned can be shown to be almost surely Lipschitz continuous. In our case, we first verify Lipschitz continuity of \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) by showing that its first partial derivatives with respect to the components of \boldsymbol{\theta} are almost surely bounded. With respect to \rho and \sigma, the boundedness of the parameters in \mathcal{G}, (8.9) and the limit results (S-13.26), (S-13.27), (S-13.29), (S-13.30) and (S-13.32) readily show boundedness of the partial derivatives. With respect to \boldsymbol{\beta}_{m}, note that the derivative of \frac{1}{2\sigma^{2}}\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}, a relevant expression of \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) (see (S-13.1)), is \frac{1}{\sigma^{2}}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}, whose Euclidean norm is bounded above by \sigma^{-2}\|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\|_{op}\times\|\boldsymbol{\beta}_{m}\|. In our case, \|\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\|_{op}\leq K<\infty by (B3). Moreover, \sigma^{-2} is bounded in \mathcal{G} and \|\boldsymbol{\beta}_{m}\|\leq\|\tilde{\eta}\|\times\sqrt{\sum_{i=1}^{m}\gamma^{2}_{i}}, which is also bounded in \mathcal{G}. Boundedness of the partial derivatives with respect to \boldsymbol{\beta}_{m} of the other terms of \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) involving \boldsymbol{\beta}_{m} is easy to observe. In other words, \frac{1}{n}\log R_{n}(\boldsymbol{\theta}) is stochastically equicontinuous.
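The operator-norm bound underlying the Lipschitz argument, \|\left(\frac{\sum_{t}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}\|\leq\|\cdot\|_{op}\|\boldsymbol{\beta}_{m}\|, can be illustrated with a small numerical sketch (hypothetical bounded covariates; an illustration only):

import numpy as np

rng = np.random.default_rng(3)
n, m = 5000, 20
Z = rng.choice([-1.0, 1.0], size=(n, m))            # hypothetical bounded covariates z_mt
A = Z.T @ Z / n                                     # (sum_t z_mt z_mt')/n
beta = rng.standard_normal(m)                       # an arbitrary beta_m
lhs = np.linalg.norm(A @ beta)                      # norm of the partial derivative factor
rhs = np.linalg.norm(A, 2) * np.linalg.norm(beta)   # the operator-norm bound from the text
print(lhs, rhs)                                     # lhs <= rhs, as claimed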

To see that h(\boldsymbol{\theta}) is equicontinuous, first note that in the expression (8.17), except for the terms involving c(\boldsymbol{\beta}) and c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0}), the other terms are easily seen to be Lipschitz, using boundedness of the partial derivatives. Let us now focus on the term \frac{c(\boldsymbol{\beta})}{2\sigma^{2}}. For our purpose, let us consider two different sequences \boldsymbol{\beta}_{1m} and \boldsymbol{\beta}_{2m} associated with (\gamma_{1},\tilde{\eta}_{1}) and (\gamma_{2},\tilde{\eta}_{2}), respectively, such that \boldsymbol{\beta}^{\prime}_{1m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{1m}\rightarrow c(\boldsymbol{\beta}_{1}) and \boldsymbol{\beta}^{\prime}_{2m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{2m}\rightarrow c(\boldsymbol{\beta}_{2}). As we have already shown that \boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m} is Lipschitz in \boldsymbol{\beta}_{m}, we must have \|\boldsymbol{\beta}^{\prime}_{1m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{1m}-\boldsymbol{\beta}^{\prime}_{2m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{2m}\|\leq L\|\boldsymbol{\beta}_{1m}-\boldsymbol{\beta}_{2m}\|\leq L\|\gamma_{1}\tilde{\eta}_{1}-\gamma_{2}\tilde{\eta}_{2}\|, for some Lipschitz constant L>0. Taking the limit of both sides as n\rightarrow\infty shows that |c(\boldsymbol{\beta}_{1})-c(\boldsymbol{\beta}_{2})|\leq L\|\gamma_{1}\tilde{\eta}_{1}-\gamma_{2}\tilde{\eta}_{2}\|, proving that \frac{c(\boldsymbol{\beta})}{2\sigma^{2}} is Lipschitz in \eta=\gamma\tilde{\eta}, when \sigma is held fixed. The bounded partial derivative with respect to \sigma then shows that \frac{c(\boldsymbol{\beta})}{2\sigma^{2}} is Lipschitz in both \eta and \sigma. Similarly, the term \frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}} in (8.17) is also Lipschitz continuous.

In other words, \frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta}) is stochastically equicontinuous almost surely in \boldsymbol{\theta}\in\mathcal{G}. Hence, the required uniform convergence is satisfied.

S-13.5.3 Verification of (S5) (3)

Continuity of h(\boldsymbol{\theta}) and compactness of \mathcal{G}_{n}, along with its non-decreasing nature with respect to n, imply that h\left(\mathcal{G}_{n}\right)\rightarrow h\left(\boldsymbol{\Theta}\right), as n\rightarrow\infty. Hence, (S5) holds.

S-13.6 Verification of (S6) and proof of Theorem 25

Note that in our case,

\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})
=\left(\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
+\left(\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
+\frac{1}{2\sigma^{2}}\left(\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}-c(\boldsymbol{\beta})\right)-\frac{1}{2\sigma^{2}_{0}}\left(\boldsymbol{\beta}^{\prime}_{m0}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m0}-c(\boldsymbol{\beta}_{0})\right)
-\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right)
-\left[\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right]
+\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}
+\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right). (S-13.33)

Let \kappa_{1}=(\kappa-h\left(\boldsymbol{\Theta}\right))/7, \boldsymbol{\mu}_{n}=E(\mathbf{x}_{n}) and \boldsymbol{\Sigma}_{n}=Var(\mathbf{x}_{n}); let \boldsymbol{\Sigma}_{n}=\boldsymbol{C}_{n}\boldsymbol{C}^{\prime}_{n} be the Cholesky decomposition. Also let \boldsymbol{y}_{n}\sim N_{n}\left(\boldsymbol{0}_{n},\boldsymbol{I}_{n}\right), the n-dimensional normal distribution with mean \boldsymbol{0}_{n}, the n-dimensional zero vector, and variance \boldsymbol{I}_{n}, the n-dimensional identity matrix. Then

P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)
=P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\mu}_{n}+2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}+\boldsymbol{y}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{y}_{n}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)+P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\mu}_{n}}{n}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right) (S-13.34)
+P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{y}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{y}_{n}}{n}-tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)\right|>\frac{\kappa_{1}}{4}\right)+P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right). (S-13.35)

To deal with the first term of (S-13.34), first note that 2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n} is Lipschitz in \boldsymbol{y}_{n}, with the square of the Lipschitz constant being 4\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{\mu}_{n}, which is again bounded above by K_{1}n, for some constant K_{1}>0, due to (8.9). It then follows using the Gaussian concentration inequality (see, for example, Giraud (2015)) that

P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)=P\left(\left|2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-1}\right)\leq 2\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right). (S-13.36)
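The Gaussian concentration inequality used here can be visualized numerically. In the sketch below — an illustration with an arbitrary mean vector and covariance matrix, not those of the model — the empirical tail of the Lipschitz functional 2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n} indeed stays below 2\exp(-t^{2}/(2L^{2})), with L^{2}=4\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{\mu}_{n}.

import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 50_000
mu = rng.uniform(-1, 1, size=n)               # an arbitrary fixed mean vector
B = rng.standard_normal((n, n)) / np.sqrt(n)
Sigma = B @ B.T + np.eye(n)                   # an arbitrary covariance matrix
C = np.linalg.cholesky(Sigma)                 # Sigma = C C'
L2 = 4 * mu @ Sigma @ mu                      # squared Lipschitz constant of y -> 2 mu'C y
w = 2 * (C.T @ mu)                            # 2 mu'C y = w'y
samples = rng.standard_normal((reps, n)) @ w
for c in (1.0, 2.0, 3.0):
    t = c * np.sqrt(L2)
    # empirical tail vs the Gaussian concentration bound 2*exp(-t^2/(2*L2))
    print(t, np.mean(np.abs(samples) > t), 2 * np.exp(-t**2 / (2 * L2)))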

Now, for large enough n, noting that \pi\left(\mathcal{G}^{c}_{n}\right)\leq\exp(-\alpha n) up to some positive constant, we have

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq 2\int_{\mathcal{S}^{c}}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)d\pi(\boldsymbol{\theta}) (S-13.37)
\leq 2\int_{\mathcal{G}_{n}}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)d\pi(\boldsymbol{\theta})+2\int_{\mathcal{G}^{c}_{n}}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)d\pi(\boldsymbol{\theta})
\leq 2\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left(-\frac{n\kappa^{2}_{1}}{32K_{1}}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)\pi(\sigma^{2})d\sigma^{2}+2\pi\left(\mathcal{G}^{c}_{n}\right)
\leq 2\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left(-C_{1}\kappa^{2}_{1}nu^{-2}\right)(u+\sigma^{-2}_{0})^{-2}\pi\left(\frac{1}{u+\sigma^{-2}_{0}}\right)du+\tilde{C}\exp(-\alpha n), (S-13.38)

for some positive constants C_{1} and \tilde{C}.

Now, the prior (u+\sigma^{-2}_{0})^{-2}\pi\left(\frac{1}{u+\sigma^{-2}_{0}}\right) is such that large values of u receive small probabilities. Hence, if this prior is replaced by an appropriate function with a thicker tail, then the resultant integral provides an upper bound for the first term of (S-13.38). We consider a function \tilde{\pi}_{n}(u) of a mixture form depending upon n; that is, we let \tilde{\pi}_{n}(u)=c_{3}\sum_{r=1}^{M_{n}}\psi^{\zeta_{rn}}_{rn}\exp(-\psi_{rn}u^{2})u^{2(\zeta_{rn}-1)}\boldsymbol{I}_{B_{n}}(u), where B_{n}=\left[\exp\left(-2(\alpha n)^{1/16}\right)-\sigma^{-2}_{0},\exp\left(2(\alpha n)^{1/16}\right)-\sigma^{-2}_{0}\right], M_{n}\leq\exp((\alpha n)^{1/16}) is the number of mixture components, c_{3}>0, for r=1,\ldots,M_{n}, \frac{1}{2}<\zeta_{rn}\leq c_{4}n^{q}, for 0<q<1/16 and n\geq 1, where c_{4}>0, and 0<\psi_{1}\leq\psi_{rn}<c_{5}<\infty, for all r and n. In this case,

\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left(-C_{1}\kappa^{2}_{1}nu^{-2}\right)(u+\sigma^{-2}_{0})^{-2}\pi\left(\frac{1}{u+\sigma^{-2}_{0}}\right)du
\leq c_{3}\sum_{r=1}^{M_{n}}\psi^{\zeta_{rn}}_{rn}\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left[-\left(C_{1}\kappa^{2}_{1}nu^{-2}+\psi_{rn}u^{2}\right)\right]\left(u^{2}\right)^{\zeta_{rn}-1}du. (S-13.39)

Now the exponent of the r-th integrand of (S-13.39), namely C_{1}\kappa^{2}_{1}nu^{-2}+\psi_{rn}u^{2}-(\zeta_{rn}-1)\log u^{2}, is minimized at \tilde{u}^{2}_{rn}=\frac{\zeta_{rn}-1+\sqrt{(\zeta_{rn}-1)^{2}+4C_{1}\psi_{rn}\kappa^{2}_{1}n}}{2\psi_{rn}}, so that for sufficiently large n, c_{1}\kappa_{1}\sqrt{\frac{n}{\psi_{rn}}}\leq\tilde{u}^{2}_{rn}\leq\tilde{c}_{1}\kappa_{1}\sqrt{\frac{n}{\psi_{rn}}}, for some positive constants c_{1} and \tilde{c}_{1}. Moreover, for sufficiently large n, we have \frac{\tilde{u}^{2}_{rn}}{\log\tilde{u}^{2}_{rn}}\geq\frac{\zeta_{rn}-1}{\psi_{rn}(1-c_{2})}, for 0<c_{2}<1. Hence, for sufficiently large n, C_{1}\kappa^{2}_{1}n\tilde{u}^{-2}_{rn}+\psi_{rn}\tilde{u}^{2}_{rn}-(\zeta_{rn}-1)\log(\tilde{u}^{2}_{rn})\geq c_{2}\psi_{1}\tilde{u}^{2}_{rn}\geq C_{2}\kappa_{1}\sqrt{\psi_{rn}n}, for some positive constant C_{2}. From these and (S-13.38) it follows that

c_{3}\sum_{r=1}^{M_{n}}\psi^{\zeta_{rn}}_{rn}\int_{\exp(-2(\alpha n)^{1/16})-\sigma^{-2}_{0}}^{\exp(2(\alpha n)^{1/16})-\sigma^{-2}_{0}}\exp\left[-\left(C_{1}\kappa^{2}_{1}nu^{-2}+\psi_{1}u^{2}\right)\right]\left(u^{2}\right)^{\zeta_{rn}-1}du
\leq c_{3}M_{n}\exp\left[-\left(C_{2}\kappa_{1}\sqrt{n\psi_{1}}-2(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]
\leq c_{3}\exp\left[-\left(C_{2}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right], (S-13.40)

for some constant \tilde{c}_{5}. Combining (S-13.38), (S-13.39) and (S-13.40) we obtain

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{2\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{C}_{n}\boldsymbol{y}_{n}}{n}\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq K_{2}\exp\left[-\left(C_{2}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n). (S-13.41)
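The closed-form minimizer \tilde{u}^{2}_{rn} used in the step above is elementary calculus; a short numerical check, with arbitrary hypothetical values of the constants, confirms it:

import numpy as np

A, psi, zeta = 1.5**2 * 1e4, 0.8, 4.0        # hypothetical values; A plays C1*kappa1^2*n
g = lambda v: A / v + psi * v - (zeta - 1) * np.log(v)   # the exponent, with v = u^2
v_star = (zeta - 1 + np.sqrt((zeta - 1) ** 2 + 4 * psi * A)) / (2 * psi)
grid = np.linspace(0.5 * v_star, 2.0 * v_star, 2_000_001)
print(v_star, grid[np.argmin(g(grid))])      # grid minimizer agrees with the closed form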

For the second term of (S-13.34), since \boldsymbol{\mu}_{n} is non-random, we can view its components as a set of independent realizations from a suitable zero-mean process with variance \frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}, supported on a compact set (due to (8.9)). In that case, by Hoeffding's inequality (Hoeffding, 1963) we obtain

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\mu}^{\prime}_{n}\boldsymbol{\mu}_{n}}{n}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq 2\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left(-K_{3}\kappa^{2}_{1}n\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}\right)\pi(\sigma^{2})d\sigma^{2}+\tilde{C}\exp(-\alpha n)
\leq K_{3}\exp\left[-\left(C_{3}\kappa_{1}\sqrt{n\psi_{2}}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.42)

for some positive constants K_{3} and C_{3}; the last step follows in the same way as (S-13.41).
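Hoeffding's inequality, as invoked here, is easily illustrated by simulation. In the sketch below, bounded uniform summands serve as arbitrary stand-ins; the empirical tail of the sample mean stays below 2\exp(-nt^{2}/(2a^{2})) for zero-mean variables with range 2a.

import numpy as np

rng = np.random.default_rng(5)
n, reps, a = 500, 10_000, 1.0
X = rng.uniform(-a, a, size=(reps, n))          # bounded, zero-mean stand-in summands
means = X.mean(axis=1)
for t in (0.05, 0.08, 0.11):
    emp = np.mean(np.abs(means) > t)
    bound = 2 * np.exp(-n * t**2 / (2 * a**2))  # Hoeffding bound for range-2a variables
    print(t, emp, bound)                        # empirical tail <= bound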

We now deal with the first term of (S-13.35). First note that \|\boldsymbol{\Sigma}_{n}\|^{2}_{F}\leq K_{4}n, for some K_{4}>0, where \|\cdot\|_{F} denotes the Frobenius norm. Also, any eigenvalue \lambda of any matrix \boldsymbol{A}=(a_{ij}) satisfies |\lambda-a_{ii}|\leq\sum_{j\neq i}|a_{ij}|, for some i, by Gerschgorin's circle theorem (see, for example, Lange (2010)). In our case, the rows of \boldsymbol{\Sigma}_{n} are summable and the diagonal elements are bounded for any n; the maximum row sum is attained by the middle row when n is odd and the two middle rows when n is even. In other words, the maximum eigenvalue of \boldsymbol{\Sigma}_{n} remains bounded for all n\geq 1, that is, \underset{n\geq 1}{\sup}~\|\boldsymbol{\Sigma}_{n}\|_{op}<K_{5}, for some positive constant K_{5}. Now observe that for integrals of the form \int_{\sigma^{2}\in\tilde{\mathcal{G}}_{n}}\exp\left(-C_{5}\kappa^{2}_{1}n\left|\sigma^{-2}-\sigma^{-2}_{0}\right|^{-1}\right)\pi(\sigma^{2})d\sigma^{2}, where \tilde{\mathcal{G}}_{n}\subseteq\mathcal{G}_{n}, we can obtain, using the same technique pertaining to (S-13.41), that

\int_{\sigma^{2}\in\tilde{\mathcal{G}}_{n}}\exp\left(-C_{5}\kappa^{2}_{1}n\left|\sigma^{-2}-\sigma^{-2}_{0}\right|^{-1}\right)\pi(\sigma^{2})d\sigma^{2}\leq C_{7}\exp\left[-\left(C_{6}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right], (S-13.43)

for relevant positive constants C_{6}, C_{7} and \tilde{c}_{5}. Then by the Hanson-Wright inequality, (S-13.43) and the same method for obtaining (S-13.41), we obtain the following bound for the first term of (S-13.35):

\int_{\mathcal{S}^{c}}P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{y}^{\prime}_{n}\boldsymbol{\Sigma}_{n}\boldsymbol{y}_{n}}{n}-tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)\right|>\frac{\kappa_{1}}{4}\right)d\pi(\boldsymbol{\theta})
\leq E_{\pi}\left[\exp\left[-K_{6}\min\left\{\frac{\frac{\kappa^{2}_{1}}{16}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-2}}{\|\frac{\boldsymbol{\Sigma}_{n}}{n}\|^{2}_{F}},\frac{\frac{\kappa_{1}}{4}\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|^{-1}}{\|\frac{\boldsymbol{\Sigma}_{n}}{n}\|_{op}}\right\}\right]\boldsymbol{I}_{\mathcal{G}_{n}}(\boldsymbol{\theta})\right]+\tilde{C}\exp(-\alpha n)
\leq K_{7}\exp\left[-\left(C_{8}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.44)

for relevant positive constants K_{7}, C_{8} and \tilde{c}_{5}.
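The claim \underset{n\geq 1}{\sup}~\|\boldsymbol{\Sigma}_{n}\|_{op}<\infty, which controls the denominators \|\boldsymbol{\Sigma}_{n}/n\|^{2}_{F} and \|\boldsymbol{\Sigma}_{n}/n\|_{op} above, can be checked concretely for the stationary AR(1)-type covariance \boldsymbol{\Sigma}_{n}(i,j)=\sigma^{2}_{0}\rho^{|i-j|}_{0}/(1-\rho^{2}_{0}) — a simplified stand-in for the exact Var(\mathbf{x}_{n}), used here purely for illustration:

import numpy as np

rho0, sigma0 = 0.6, 1.0
for n in (50, 200, 800):
    idx = np.arange(n)
    Sigma = sigma0**2 * rho0 ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho0**2)
    gersh = np.max(np.abs(Sigma).sum(axis=1))   # Gerschgorin bound: max absolute row sum
    top = np.linalg.eigvalsh(Sigma)[-1]         # actual largest eigenvalue
    print(n, top, gersh)                        # both stabilize as n grows

Both quantities stabilize (here near \sigma^{2}_{0}(1+\rho_{0})/((1-\rho_{0})(1-\rho^{2}_{0}))), illustrating that the top eigenvalue stays bounded in n.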

Using the same technique involving Hoeffding's bound as for the second term of (S-13.34), it is easy to see that the second term of (S-13.35) satisfies the following:

E_{\pi}\left[P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|tr\left(\frac{\boldsymbol{\Sigma}_{n}}{n}\right)-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{4}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]\leq\tilde{K}_{3}\exp\left[-\left(\tilde{C}_{3}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.45)

for relevant positive constants \tilde{K}_{3}, \tilde{C}_{3} and \tilde{c}_{5}.

Hence, combining (S-13.34), (S-13.35), (S-13.41), (S-13.42), (S-13.44) and (S-13.45), we obtain

E_{\pi}\left[P\left(\left|\frac{1}{2\sigma^{2}}-\frac{1}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{8}\exp\left[-\left(C_{9}\kappa_{1}\sqrt{n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.46)

for relevant positive constants.

Let us now obtain a bound for E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]. Proceeding as above, we obtain, by first taking the expectation with respect to \sigma^{2}\in\mathcal{G}_{n}, the following:

E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq C_{10}\int_{\rho\in\mathcal{G}_{n}}\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left[-C_{11}\kappa^{2}_{1}n\left(\frac{\rho^{2}}{\sigma^{2}}-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\right]\pi(\sigma^{2})d\sigma^{2}\,\pi(\rho)d\rho+\tilde{C}\exp(-\alpha n)
=C_{10}\int_{\rho\in\mathcal{G}_{n}}\rho^{2}\int_{\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}^{\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\exp\left(-C_{11}\kappa^{2}_{1}nu^{-2}\right)\left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right)du\,\pi(\rho)d\rho+\tilde{C}\exp(-\alpha n), (S-13.47)

for relevant positive constants. Since \pi\left(\sigma^{2}>\exp(2(\alpha n)^{1/16})\right)\leq\exp(-\alpha n), it is evident that much of the mass of \left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right) is concentrated around zero, where the function \exp\left(-C_{11}nu^{-2}\right) is small. To give greater weight to the function, we can replace \left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right) with a mixture function of the form \tilde{\pi}_{\rho^{2},n}(u)=c_{3}\sum_{r=1}^{M_{n}}\rho^{2\zeta_{rn}}\psi^{\zeta_{rn}}_{rn}\exp\left(-u^{2}\psi_{rn}\rho^{2}\right)\left(u^{2}\right)^{\zeta_{rn}-1}\boldsymbol{I}_{B_{n,\rho^{2}}}(u), for positive constants 0<\psi_{2}\leq\psi_{rn}<c_{5}<\infty and 1/2<\zeta_{rn}<c_{4}n^{q}. Here

B_{n,\rho^{2}}=\left[\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}},\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right].

As before, 0<q<1/16 and M_{n}\leq\exp\left((\alpha n)^{1/16}\right). Hence, up to some positive constant,

\int_{\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}^{\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\exp\left(-C_{11}\kappa^{2}_{1}nu^{-2}\right)\left(u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}\right)^{-2}\pi\left(\frac{\rho^{2}}{u+\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\right)du
\leq\sum_{r=1}^{M_{n}}\rho^{2\zeta_{rn}}\psi^{\zeta_{rn}}_{rn}\int_{\rho^{2}\exp(-2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}^{\rho^{2}\exp(2(\alpha n)^{1/16})-\frac{\rho^{2}_{0}}{\sigma^{2}_{0}}}\exp\left[-\left(C_{11}\kappa^{2}_{1}nu^{-2}+\psi_{rn}\rho^{2}u^{2}-(\zeta_{rn}-1)\log u^{2}\right)\right]du. (S-13.48)

The term within the parenthesis in the exponent of (S-13.48) is minimized at \tilde{u}^{2}_{rn}=\frac{\zeta_{rn}-1+\sqrt{(\zeta_{rn}-1)^{2}+4\psi_{rn}\rho^{2}C_{11}\kappa^{2}_{1}n}}{2\psi_{rn}\rho^{2}}. Note that \tilde{C}_{01}\frac{\kappa_{1}}{|\rho|}\sqrt{\frac{n}{\psi_{rn}}}\leq\tilde{u}^{2}_{rn}\leq\tilde{C}_{11}\frac{\kappa_{1}}{|\rho|}\sqrt{\frac{n}{\psi_{rn}}}, for large enough n. Hence, for large n, the term within the parenthesis in the exponent of (S-13.48) exceeds a constant multiple of \psi_{rn}\rho^{2}\tilde{u}^{2}_{rn}\geq\tilde{C}_{02}\times|\rho|\kappa_{1}\sqrt{\psi_{rn}n}, for some \tilde{C}_{02}>0. Thus, (S-13.48) is bounded above by a constant times \rho^{2(1+\zeta_{rn})}\exp\left(-\tilde{C}_{02}\times\kappa_{1}|\rho|\sqrt{\psi_{6}n}+3(\alpha n)^{1/16}+\tilde{c}_{5}n^{q}\right). Combining this with (S-13.47) we see that

E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq\int_{\rho\in\mathcal{G}_{n}}\rho^{2(2+\zeta_{rn})}\exp\left[-\left(\tilde{C}_{02}\times\kappa_{1}|\rho|\sqrt{\psi_{6}n}-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]\pi(\rho)d\rho+\tilde{C}\exp(-\alpha n)
=\int_{\exp\left(-(\alpha n)^{1/16}\right)}^{\exp\left((\alpha n)^{1/16}\right)}\exp\left[-\left(\tilde{C}_{02}\times\kappa_{1}u^{-1}\sqrt{\psi_{6}n}+2(2+\zeta_{rn})\log u-3(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]\pi_{1}(u)du
+\tilde{C}\exp(-\alpha n), (S-13.49)

where \pi_{1}(u)du is the appropriate modification of \pi(\rho)d\rho in view of the transformation |\rho|\mapsto u^{-1}. Replacing \pi_{1}(u) with a mixture function of the form \tilde{\pi}_{n}(u)=c_{3}\sum_{r=1}^{M_{n}}\psi^{\tilde{\zeta}_{rn}}_{rn}\exp\left(-u\psi_{rn}\right)u^{\tilde{\zeta}_{rn}-1}, for positive constants 0<\psi_{2}\leq\psi_{rn}<\tilde{c}_{5}<\infty and 0<\tilde{\zeta}_{rn}<c_{4}n^{q}, with 0<q<1/16, and M_{n}\leq\exp\left((\alpha n)^{1/16}\right), and applying the same techniques as before, we see from (S-13.49) that

E_{\pi}\left[P\left(\left|\frac{\rho^{2}}{2\sigma^{2}}-\frac{\rho^{2}_{0}}{2\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq C_{14}\exp\left(3(\alpha n)^{1/16}+\tilde{c}_{5}n^{q}\right)
\times\sum_{r=1}^{M_{n}}\psi^{\tilde{\zeta}_{rn}}_{rn}\int_{\exp\left(-(\alpha n)^{1/16}\right)}^{\exp\left((\alpha n)^{1/16}\right)}\exp\left[-\left(\tilde{C}_{02}\times\kappa_{1}u^{-1}\sqrt{\psi_{6}n}+u\psi_{rn}-(\tilde{\zeta}_{rn}-2\zeta_{rn}-5)\log u\right)\right]du
+\tilde{C}\exp(-\alpha n)
\leq C_{14}\exp\left[-\left(C_{15}\sqrt{\kappa_{1}}n^{1/4}-4(\alpha n)^{1/16}-2n^{q}\log\tilde{c}_{5}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.50)

for relevant positive constants.

Let us now deal with \frac{1}{2\sigma^{2}}\left(\boldsymbol{\beta}^{\prime}_{m}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}\boldsymbol{z}^{\prime}_{mt}}{n}\right)\boldsymbol{\beta}_{m}-c(\boldsymbol{\beta})\right)=\frac{1}{2\sigma^{2}}\left(\frac{\sum_{t=1}^{n}\left(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}\right)^{2}}{n}-c(\boldsymbol{\beta})\right). Again, we assume as before that \boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}; t=1,2,\ldots,n, is a realization from some independent zero-mean process with variance c(\boldsymbol{\beta}). Note that |\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m}|\leq\sum_{i=1}^{m}|z_{it}||\beta_{i}|=\sum_{i=1}^{m}|z_{it}||\gamma_{i}||\tilde{\eta}_{i}|\leq\underset{t\geq 1}{\sup}~\|z_{t}\|\|\tilde{\eta}\|\sum_{i=1}^{m}|\gamma_{i}|, and by (B1), \underset{t\geq 1}{\sup}~\|z_{t}\|<\infty. Recalling that \tilde{\gamma}_{m}=\sum_{i=1}^{m}|\gamma_{i}| and using Hoeffding's inequality in conjunction with (8.9), we obtain

P\left(\frac{1}{2\sigma^{2}}\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})^{2}}{n}-c(\boldsymbol{\beta})\right|>\kappa_{1}\right)<2\exp\left(-\frac{n\kappa^{2}_{1}\sigma^{4}}{C^{2}\tilde{\gamma}^{4}_{m}\|\tilde{\eta}\|^{4}}\right). (S-13.51)

Then, first integrating with respect to u=\sigma^{-2}, then with respect to v=\|\tilde{\eta}\|, and finally with respect to w=\tilde{\gamma}_{m}, in each case using the gamma mixture form \tilde{\pi}_{n}(x)=c_{3}\sum_{r=1}^{M_{n}}\psi^{\tilde{\zeta}_{rn}}_{rn}\exp\left(-x\psi_{rn}\right)x^{\tilde{\zeta}_{rn}-1}, for positive constants 0<\psi_{2}\leq\psi_{rn}<\tilde{c}_{5}<\infty and 0<\tilde{\zeta}_{rn}<c_{4}n^{q}, with 0<q<1/16, and M_{n}\leq\exp\left((\alpha n)^{1/16}\right), we find that

E_{\pi}\left[P\left(\frac{1}{2\sigma^{2}}\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})^{2}}{n}-c(\boldsymbol{\beta})\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]\leq K_{9}\exp\left[-\left(C_{16}\kappa^{1/4}_{1}\left(n\psi_{7}\right)^{1/8}-C_{17}(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.52)

for relevant positive constants. It is also easy to see, using Hoeffding's inequality together with (8.9), that

E_{\pi}\left[P\left(\frac{1}{2\sigma^{2}_{0}}\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0})^{2}}{n}-c(\boldsymbol{\beta}_{0})\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]\leq\tilde{K}_{9}\exp\left(-\tilde{C}_{16}\kappa^{2}_{1}n\right), (S-13.53)

for relevant constants.

We next consider P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right). Note that

P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\sum_{t=1}^{n}x^{2}_{t-1}}{n}-\frac{\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\frac{\kappa_{1}}{2|\rho_{0}|}\right) (S-13.54)
+P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right|>\frac{\kappa_{1}}{2}\right). (S-13.55)

Note that the expectation of (S-13.54) admits the same upper bound as (S-13.50). To deal with (S-13.55), we let \tilde{x}_{t}=(\boldsymbol{z}^{\prime}_{t}\boldsymbol{\beta}_{0})x_{t-1} and \tilde{\mathbf{x}}_{n}=(\tilde{x}_{1},\ldots,\tilde{x}_{n})^{\prime}. Then \tilde{\mathbf{x}}_{n}\sim N_{n}\left(\tilde{\boldsymbol{\mu}}_{n},\tilde{\boldsymbol{\Sigma}}_{n}\right), where \tilde{\boldsymbol{\mu}}_{n} and \tilde{\boldsymbol{\Sigma}}_{n}=\tilde{\boldsymbol{C}}_{n}\tilde{\boldsymbol{C}}^{\prime}_{n} are appropriate modifications of \boldsymbol{\mu}_{n} and \boldsymbol{\Sigma}_{n}=\boldsymbol{C}_{n}\boldsymbol{C}^{\prime}_{n} associated with (S-13.36). Note that \tilde{\mathbf{x}}_{n}=\tilde{\boldsymbol{\mu}}_{n}+\tilde{\boldsymbol{C}}_{n}\boldsymbol{y}_{n}, where \boldsymbol{y}_{n}\sim N_{n}\left(\boldsymbol{0}_{n},\boldsymbol{I}_{n}\right). Using (8.9) we obtain the same form of the bound for (S-13.55) as (S-13.36). That is, we have

P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right|>\frac{\kappa_{1}}{2}\right)
\leq P\left(\left|\boldsymbol{1}^{\prime}_{n}\tilde{\boldsymbol{C}}_{n}\boldsymbol{y}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)+P\left(\left|\tilde{\boldsymbol{\mu}}^{\prime}_{n}\boldsymbol{1}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)
\leq 2\exp\left(-K_{10}\kappa^{2}_{1}n\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-2}\right)+P\left(\left|\tilde{\boldsymbol{\mu}}^{\prime}_{n}\boldsymbol{1}_{n}\right|>\frac{n\kappa_{1}}{4}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right), (S-13.56)

where K_{10} is some positive constant. Using the same method as before, we again obtain a bound for the expectation of the first part of (S-13.56) of the form \exp\left[-\left(\tilde{C}_{16}\sqrt{\kappa_{1}}n^{1/4}-\tilde{C}_{17}(\alpha n)^{1/16}-\tilde{\alpha}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), for relevant positive constants; as before, here 0<q<1/16. For the second part of (S-13.56) we apply the method involving Hoeffding's inequality as before, and obtain a bound of the above-mentioned form. Hence, combining the bounds for the expectations of (S-13.54) and (S-13.55), we see that

E_{\pi}\left[P\left(\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|\left|\frac{\rho_{0}\sum_{t=1}^{n}x^{2}_{t-1}}{n}+\frac{\boldsymbol{\beta}^{\prime}_{m0}\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}-\frac{\rho_{0}\sigma^{2}_{0}}{1-\rho^{2}_{0}}-\frac{\rho_{0}c(\boldsymbol{\beta}_{0})}{1-\rho^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{12}\exp\left[-\left(C_{18}\sqrt{\kappa_{1}}n^{1/4}-C_{19}(\alpha n)^{1/16}-\tilde{\alpha}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.57)

for relevant positive constants.

Now let us bound the probability P\left(\left|\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right|>\kappa_{1}\right). Observe that

P\left(\left|\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})x_{t}}{n}-c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})\right|>\frac{\kappa_{1}\sigma^{2}}{2}\right)+P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0})x_{t}}{n}-c(\boldsymbol{\beta}_{0})\right|>\frac{\kappa_{1}\sigma^{2}_{0}}{2}\right). (S-13.58)

Using the Gaussian concentration inequality as before, it is easily seen that

E_{\pi}\left[P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})x_{t}}{n}-c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})\right|>\frac{\kappa_{1}\sigma^{2}}{2}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq 2\int_{\boldsymbol{\gamma}_{m},\tilde{\eta}\in\mathcal{G}_{n}}\int_{\exp(-2(\alpha n)^{1/16})}^{\exp(2(\alpha n)^{1/16})}\exp\left(-\frac{K_{13}\kappa^{2}_{1}n\sigma^{4}}{\|\boldsymbol{\beta}\|^{2}}\right)d\pi(\boldsymbol{\beta},\sigma^{2})+\tilde{C}\exp(-\alpha n)
\leq C_{20}\exp\left[-\left(C_{21}\sqrt{\kappa_{1}}n^{1/4}-C_{22}(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.59)

for relevant positive constants.

The Gaussian concentration inequality also ensures that the second term of (S-13.58) is bounded above by 2\exp(-K_{13}\kappa^{2}_{1}n), for some K_{13}>0. Combining this with (S-13.58) and (S-13.59) we obtain

E_{\pi}\left[P\left(\left|\left(\frac{\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t}}{n}\right)-\frac{c_{10}(\boldsymbol{\beta},\boldsymbol{\beta}_{0})}{\sigma^{2}}+\frac{c(\boldsymbol{\beta}_{0})}{\sigma^{2}_{0}}\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{14}\exp\left[-\left(C_{23}\sqrt{\kappa_{1}}n^{1/4}-C_{24}(\alpha n)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n)+2\exp(-K_{13}\kappa^{2}_{1}n), (S-13.60)

for relevant positive constants; note that here, again, 0<q<1/16.

For P\left(\left|\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right)\right|>\kappa_{1}\right), we note that

P\left(\left|\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)
\leq P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m})x_{t-1}}{n}\right|>\frac{\kappa_{1}\sigma^{2}}{2|\rho|}\right)+P\left(\left|\frac{\sum_{t=1}^{n}(\boldsymbol{z}^{\prime}_{mt}\boldsymbol{\beta}_{m0})x_{t-1}}{n}\right|>\frac{\kappa_{1}\sigma^{2}_{0}}{2|\rho_{0}|}\right). (S-13.61)

For the first term of (S-13.61) we apply the Gaussian concentration inequality, followed by taking expectations with respect to \sigma^{2}, |\rho|, \tilde{\gamma}_{m} and \|\tilde{\eta}\|. This yields the bound

K_{15}\exp\left[-\left(C_{25}\kappa^{1/8}_{1}n^{1/16}-C_{26}(\alpha n)^{1/16}-n^{q}\log\tilde{c}_{5}\right)\right]+\tilde{C}\exp(-\alpha n),

for relevant positive constants. The bound for the second term is given by 2\exp(-K_{16}\kappa^{2}_{1}n). Together we thus obtain

E_{\pi}\left[P\left(\left|\left(\frac{\rho\boldsymbol{\beta}_{m}}{\sigma^{2}}-\frac{\rho_{0}\boldsymbol{\beta}_{m0}}{\sigma^{2}_{0}}\right)^{\prime}\left(\frac{\sum_{t=1}^{n}\boldsymbol{z}_{mt}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq\tilde{K}_{16}\exp\left[-\left(C_{26}\kappa^{1/8}_{1}n^{1/16}-C_{27}(\alpha n)^{1/16}-n^{q}\log\tilde{c}_{5}\right)\right]+2\exp(-K_{16}\kappa^{2}_{1}n)+\tilde{C}\exp(-\alpha n). (S-13.62)

We now deal with the last term $P\left(\left|\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)$. Recall that $\mathbf{x}_{n}=\boldsymbol{\mu}_{n}+\boldsymbol{C}_{n}\boldsymbol{y}_{n}$, where $\boldsymbol{C}_{n}\boldsymbol{C}^{\prime}_{n}=\boldsymbol{\Sigma}_{n}$ and $\boldsymbol{y}_{n}\sim N_{n}\left(\boldsymbol{0},\boldsymbol{I}_{n}\right)$. Let $\boldsymbol{\epsilon}_{n-1}=(\epsilon_{2},\ldots,\epsilon_{n})^{\prime}$. Then $\sum_{t=1}^{n}\epsilon_{t}x_{t-1}=\boldsymbol{\epsilon}^{\prime}_{n-1}\mathbf{x}_{n-1}=\sigma_{0}\left(\boldsymbol{y}^{\prime}_{n}\boldsymbol{\mu}_{n}+\boldsymbol{y}^{\prime}_{n-1}\boldsymbol{C}_{n-1}\boldsymbol{y}_{n-1}\right)$. Applying the Gaussian concentration inequality and the Hanson-Wright inequality (Rudelson and Vershynin, 2013), we find that

\displaystyle P\left(\left|\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)
\displaystyle\leq P\left(\frac{\left|\boldsymbol{y}^{\prime}_{n}\boldsymbol{\mu}_{n}\right|}{n}>\frac{\kappa_{1}}{\sigma_{0}}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)+P\left(\frac{\boldsymbol{y}^{\prime}_{n-1}\boldsymbol{C}_{n-1}\boldsymbol{y}_{n-1}}{n}>\frac{\kappa_{1}}{\sigma_{0}}\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-1}\right)
\displaystyle\leq K_{17}\exp\left(-K_{18}\kappa^{2}_{1}n\left|\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right|^{-2}\right), (S-13.63)

for some positive constants $K_{17}$ and $K_{18}$. Taking expectation of (S-13.63) with respect to $\pi$ we obtain, as before,

E_{\pi}\left[P\left(\left|\left(\frac{\rho}{\sigma^{2}}-\frac{\rho_{0}}{\sigma^{2}_{0}}\right)\left(\frac{\sum_{t=1}^{n}\epsilon_{t}x_{t-1}}{n}\right)\right|>\kappa_{1}\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]
\leq K_{19}\exp\left[-\left(K_{20}\sqrt{\kappa_{1}}n^{1/4}-K_{21}\left(\alpha n\right)^{1/16}-\tilde{c}_{5}n^{q}\right)\right]+\tilde{C}\exp(-\alpha n), (S-13.64)

for relevant positive constants. Recall that $0<q<1/16$.
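For completeness, we recall the Hanson-Wright inequality in the Gaussian form used for the quadratic term in (S-13.63); see Rudelson and Vershynin (2013). The vector $\boldsymbol{Z}$, matrix $\boldsymbol{A}$ and threshold $t$ below are generic placeholders. For $\boldsymbol{Z}\sim N_{n}\left(\boldsymbol{0},\boldsymbol{I}_{n}\right)$, any $n\times n$ matrix $\boldsymbol{A}$, and all $t>0$,
P\left(\left|\boldsymbol{Z}^{\prime}\boldsymbol{A}\boldsymbol{Z}-E\left[\boldsymbol{Z}^{\prime}\boldsymbol{A}\boldsymbol{Z}\right]\right|>t\right)\leq 2\exp\left[-c\min\left(\frac{t^{2}}{\|\boldsymbol{A}\|_{F}^{2}},\frac{t}{\|\boldsymbol{A}\|}\right)\right],
where $c>0$ is a universal constant, and $\|\boldsymbol{A}\|_{F}$ and $\|\boldsymbol{A}\|$ denote the Frobenius and operator norms, respectively. In (S-13.63) this is applied with $\boldsymbol{A}=\boldsymbol{C}_{n-1}$, while the linear term $\boldsymbol{y}^{\prime}_{n}\boldsymbol{\mu}_{n}$ is handled by Gaussian concentration.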

Combining (S-13.46), (S-13.50), (S-13.52), (S-13.57), (S-13.60), (S-13.62) and (S-13.64), we see that

\sum_{n=1}^{\infty}E_{\pi}\left[P\left(\left|\frac{1}{n}\log R_{n}(\boldsymbol{\theta})+h(\boldsymbol{\theta})\right|>\delta\right)\boldsymbol{I}_{\mathcal{S}^{c}}(\boldsymbol{\theta})\right]<\infty.

This verifies (8.24) and hence (S6).

S-13.7 Verification of (S7)

Since $\mathcal{G}_{n}\rightarrow\boldsymbol{\Theta}$ as $n\rightarrow\infty$, it follows that for any set $A$ with $\pi(A)>0$, $\mathcal{G}_{n}\cap A\rightarrow\boldsymbol{\Theta}\cap A=A$, as $n\rightarrow\infty$. In our case, $\mathcal{G}_{n}$, and hence $\mathcal{G}_{n}\cap A$, are increasing in $n$, so that $h\left(\mathcal{G}_{n}\cap A\right)$ must be non-increasing in $n$. Moreover, for any $n\geq 1$, $\mathcal{G}_{n}\cap A\subseteq A$, so that $h\left(\mathcal{G}_{n}\cap A\right)\geq h(A)$, for all $n\geq 1$. Hence, continuity of $h$ implies that $h\left(\mathcal{G}_{n}\cap A\right)\rightarrow h(A)$, as $n\rightarrow\infty$, and (S7) is satisfied. The sandwich argument is summarized below.
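Indeed, writing $h(B)$ for the essential infimum of $h$ over a set $B$, as in Shalizi (2009) (we make this notational convention explicit since it is only implicit above), monotonicity of the sets gives
h(A)\leq h\left(\mathcal{G}_{n+1}\cap A\right)\leq h\left(\mathcal{G}_{n}\cap A\right),\quad\text{for all }n\geq 1,
so that $h\left(\mathcal{G}_{n}\cap A\right)$ is a non-increasing sequence bounded below by $h(A)$; continuity of $h$ along $\mathcal{G}_{n}\cap A\uparrow A$ then identifies the limit as $h(A)$.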

Thus (S1)–(S7) are satisfied, so that Shalizi’s result stated in the main manuscript holds. It follows that all the asymptotic results of our main manuscript apply to this multiple testing problem.

References

  • Bogdan, M., Chakrabarti, A., Frommlet, F., and Ghosh, J. K. (2011). Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist., 39(3), 1551–1579.
  • Chandra, N. K. and Bhattacharya, S. (2019). Non-marginal Decisions: A Novel Bayesian Multiple Testing Procedure. Electronic Journal of Statistics, 13(1), 489–535.
  • Chandra, N. K. and Bhattacharya, S. (2020). Asymptotic Theory of Dependent Bayesian Multiple Testing Procedures Under Possible Model Misspecification. arXiv preprint arXiv:1611.01369.
  • Chatterjee, D. and Bhattacharya, S. (2020). Posterior Convergence of Gaussian Process Regression Under Possible Misspecifications. arXiv preprint.
  • Cramer, H. and Leadbetter, M. R. (1967). Stationary and Related Stochastic Processes. Wiley, New York.
  • Datta, J. and Ghosh, J. K. (2013). Asymptotic Properties of Bayes Risk for the Horseshoe Prior. Bayesian Anal., 8(1), 111–132.
  • Fan, J. and Han, X. (2017). Estimation of the false discovery proportion with unknown dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(4), 1143–1164.
  • Fan, J., Han, X., and Gu, W. (2012). Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association, 107(499), 1019–1035.
  • Giraud, C. (2015). Introduction to High-Dimensional Statistics. CRC Press, Boca Raton.
  • Hoeffding, W. (1963). Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 58, 13–30.
  • Lange, K. (2010). Numerical Analysis for Statisticians. Springer, New York.
  • Müller, P., Parmigiani, G., Robert, C., and Rousseau, J. (2004). Optimal sample size for multiple testing: the case of gene expression microarrays. Journal of the American Statistical Association, 99(468), 990–1001.
  • Newey, W. K. (1991). Uniform Convergence in Probability and Stochastic Equicontinuity. Econometrica, 59, 1161–1167.
  • Rudelson, M. and Vershynin, R. (2013). Hanson-Wright Inequality and Sub-Gaussian Concentration. Electronic Communications in Probability, 18, 9.
  • Sarkar, S. K., Zhou, T., and Ghosh, D. (2008). A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statistica Sinica, 18(3), 925–945.
  • Shalizi, C. R. (2009). Dynamics of Bayesian Updating with Dependent Data and Misspecified Models. Electron. J. Statist., 3, 1039–1074.
  • Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Statist., 31(6), 2013–2035.
  • Sun, W. and Cai, T. T. (2009). Large-scale multiple testing under dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 393–424.
  • Xie, J., Cai, T. T., Maris, J., and Li, H. (2011). Optimal false discovery rate control for dependent data. Statistics and its Interface, 4(4), 417.