
High-dimensional inference for single-index model with latent factors

Yanmei Shi1, Meiling Hao2*, Yanlin Tang3, Heng Lian4 and Xu Guo1*

*Corresponding authors: Meiling Hao and Xu Guo; email: [email protected] and [email protected]
 
1 School of Statistics, Beijing Normal University, Beijing, China
2 School of Statistics, University of International Business and Economics, Beijing, China
3 Key Laboratory of Advanced Theory and Application in Statistics and Data Science–MOE, School of Statistics, East China Normal University, Shanghai, China
4 Department of Mathematics, City University of Hong Kong, Hong Kong, China
Abstract

Models with latent factors have recently attracted considerable attention. However, most investigations focus on linear regression models and thus cannot capture nonlinearity. To address this issue, we propose a novel Factor Augmented Single-Index Model. We first address the concern of whether it is necessary to consider the augmented part by introducing a score-type test statistic. Compared with previous test statistics, ours requires estimating neither the high-dimensional regression coefficients nor the high-dimensional precision matrix, making it simpler to implement. We also propose a Gaussian multiplier bootstrap to determine the critical value. The validity of our procedure is theoretically established under suitable conditions. We further investigate the penalized estimation of the regression model. With estimated latent factors, we establish the error bounds of the estimators. Lastly, we introduce a debiased estimator and construct confidence intervals for individual coefficients based on asymptotic normality. No moment condition is imposed on the error term, so our procedures work well when the random error follows heavy-tailed distributions or when outliers are present. We demonstrate the finite sample performance of the proposed method through comprehensive numerical studies and an application to the FRED-MD macroeconomic dataset.

Keywords: Factor augmented regression, latent factor regression, robustness, single-index model, score-type test statistic

1 Introduction

The rapid advancement of information technology has brought significant changes in both data collection and analysis. Various disciplines, such as economics, social sciences, and genetics, increasingly collect high-dimensional data for comprehensive research and analysis (Belloni et al., 2012; Bühlmann et al., 2014; Fan et al., 2020). Most existing high-dimensional procedures (for a recent comprehensive review, see for instance Fan et al., (2020)) were developed under the assumption that there are no latent factors associated with both the response variables and covariates. However, this assumption is often violated in the real world. For instance, in genetic studies, the influence of specific DNA segments on gene expression can be confounded by population structure and artifacts in microarray expression (Listgarten et al., 2010). Similarly, in healthcare research, disentangling the impact of nutrient intake on cancer risk can be confounded by factors such as physical health, social class, and behavioral aspects (Fewell et al., 2007). Failure to account for latent factors in the model may lead to biased inferences. These examples underscore the importance of considering latent factors within models to ensure accurate and reliable conclusions.

To tackle the issues posed by latent factors, Fan et al., (2024) proposed the following Factor Augmented sparse linear Regression Model (FARM):

Y = \bm{f}^{\top}\bm{\gamma} + \bm{u}^{\top}\bm{\beta} + \varepsilon,   (1.1)
\text{with}\ \ \bm{x} = \bm{B}\bm{f} + \bm{u}.   (1.2)

Here, $Y$ is a response variable, $\bm{x}$ is a $p$-dimensional covariate vector, $\bm{f}$ is a $K$-dimensional vector of latent factors, $\bm{B}\in\mathbb{R}^{p\times K}$ is the corresponding factor loading matrix, and $\bm{u}$ is a $p$-dimensional vector of idiosyncratic components, which is uncorrelated with $\bm{f}$. The vectors $\bm{\beta}=(\beta_{1},\ldots,\beta_{p})^{\top}\in\mathbb{R}^{p}$ and $\bm{\gamma}=(\gamma_{1},\ldots,\gamma_{K})^{\top}\in\mathbb{R}^{K}$ are regression parameters quantifying the contributions of $\bm{u}$ and $\bm{f}$, respectively. The random error $\varepsilon$ satisfies $\mathbb{E}(\varepsilon)=0$ and is independent of $\bm{u}$ and $\bm{f}$. In fact, model (1.2) is a commonly used structure for characterizing the interdependence among features: the variables are intercorrelated through a shared set of latent factors.

Numerous methodologies have been proposed to enable statistical analysis of models with latent factors. Guo et al., (2022) introduced a deconfounding approach for conducting statistical inference on individual regression coefficients $\beta_{j}$, $j=1,\ldots,p$, integrating the findings of Ćevid et al., (2020) with the debiased Lasso method (Zhang and Zhang, 2014; Van de Geer et al., 2014). Ouyang et al., (2023) investigated the inference problem of generalized linear regression models (GLM) with latent factors. Sun et al., (2023) considered the multiple testing problem for GLM with latent factors. Bing et al., (2023) focused on inferring high-dimensional multivariate response regression models with latent factors.

Although the above FARM is powerful in dealing with latent factors, it may not be flexible enough to handle a nonlinear relationship between the covariates and the response. To capture nonlinearity, the single-index model (SIM) is usually adopted due to its flexibility and interpretability. As a result, the SIM has been the subject of extensive attention and in-depth research in the past decade. Several approaches have been explored for the variable selection problem in high-dimensional SIMs; examples include Kong and Xia, (2007), Zhu and Zhu, (2009), Wang et al., (2012), Radchenko, (2015), Plan and Vershynin, (2016), and Rejchel and Bogdan, (2020). However, limited attention has been paid to SIMs with latent factors. This motivates us to investigate the high-dimensional single-index model with latent factors.

To this end, we consider the following Factor Augmented sparse Single Index Model (FASIM), which integrates both the latent factors and the covariates,

Y = g(\bm{f}^{\top}\bm{\gamma} + \bm{u}^{\top}\bm{\beta}, \varepsilon),
\text{with}\ \ \bm{x} = \bm{B}\bm{f} + \bm{u}.   (1.3)

Here the link function $g(\cdot,\cdot)$ is unknown, and the other variables and parameters in model (1.3) remain as defined in models (1.1) and (1.2). When $g(a,b)=a+b$, the above FASIM reduces to the FARM introduced by Fan et al., (2024). Since $g(\cdot,\cdot)$ can be unknown, the FASIM is very flexible and can capture nonlinearity.

For the FASIM, the first concern is whether $\bm{\beta}$ is zero or not. If $\bm{\beta}=\bm{0}$, the model reduces to a single-index factor regression model, which was also considered by Fan et al., (2017), Jiang et al., (2019), and Luo et al., (2022). For this important problem, Fan et al., (2024) considered the maximum of the debiased lasso estimator of $\bm{\beta}$ under FARM. However, their procedure is computationally expensive since it requires estimating both high-dimensional regression coefficients and a high-dimensional precision matrix. In this paper, we first introduce a score-type test statistic, which needs to estimate neither high-dimensional regression coefficients nor a high-dimensional precision matrix. Hence, our procedure is very simple to implement. We also propose a Gaussian multiplier bootstrap to determine our test statistic's critical value. The validity of our procedure is theoretically established under suitable conditions. We also give a power analysis for our test procedure.

When the FASIM is adequate, it is then of importance to estimate its parameters. For the SIM, Rejchel and Bogdan, (2020) considered a distribution function transformation of the responses, and then estimated the unknown parameters using the Lasso method. Similar procedures have also been investigated by Zhu and Zhu, (2009) and Wang et al., (2012). This transformation enhances the model's robustness in scenarios where random errors follow heavy-tailed distributions or when outliers are present. Motivated by their procedures, we also consider the distribution function transformation of the responses and then introduce penalized estimation of the unknown parameters. It should be emphasized, however, that unlike Zhu and Zhu, (2009), Wang et al., (2012) and Rejchel and Bogdan, (2020), in our model $\bm{f}$ and $\bm{u}$ are unobserved and must be estimated first. This introduces additional technical difficulty. In this paper, we establish the estimation error bounds of our penalized estimators under mild assumptions. Notably, no moment condition is required for the error term in the FASIM.

Lastly, we investigate the construction of confidence intervals for each regression coefficient in the FASIM. Due to its inherent bias, the penalized estimator cannot be directly used for statistical inference. Eftekhari et al., (2021) investigated the inference problem of the SIM by adopting the debiasing technique (Zhang and Zhang, 2014; Van de Geer et al., 2014). However, their procedure cannot handle latent factors. To this end, we introduce a debiased estimator for the FASIM and establish its corresponding asymptotic normality. Compared with Eftekhari et al., (2021), our procedure does not need sample-splitting and is robust to outliers.

The remainder of this paper is structured as follows. In Section 2, we delve into the reformulation of the FASIM and the estimation of the latent factors. In Section 3, we develop a powerful procedure for testing whether $\bm{\beta}=\bm{0}$. In Section 4, we consider the regularized estimation of $\bm{\beta}$ and establish the $\ell_1$ and $\ell_2$-estimation error bounds for this estimator. Further, we introduce a debiased estimator and construct confidence intervals for each coefficient. We present the findings of our simulation studies in Section 5 and provide an analysis of real data in Section 6 to assess the performance and effectiveness of the proposed approach. Conclusions and discussions are presented in Section 7. Proofs of the main theorems are provided in the Appendix. Proofs of related technical lemmas are given in the Supplementary Material.

Notation. Let $\mathbb{I}(\cdot)$ denote the indicator function. For a vector $\bm{a}=(a_{1},\ldots,a_{m})^{\top}\in\mathbb{R}^{m}$, we denote its $\ell_q$ norm as $\|\bm{a}\|_{q}=(\sum_{\ell=1}^{m}|a_{\ell}|^{q})^{1/q}$ for $1\leq q<\infty$, $\|\bm{a}\|_{\infty}=\max_{1\leq\ell\leq m}|a_{\ell}|$, and $\|\bm{a}\|_{0}=\sum_{\ell=1}^{m}\mathbb{I}(a_{\ell}\neq 0)$. For any integer $m$, we define $[m]=\{1,\ldots,m\}$. The Orlicz norm of a scalar random variable $X$ is defined as $\|X\|_{\psi_{2}}=\inf\{c>0:\mathbb{E}\exp(X^{2}/c^{2})\leq 2\}$. For a random vector $\bm{x}\in\mathbb{R}^{m}$, we define its Orlicz norm as $\|\bm{x}\|_{\psi_{2}}=\sup_{\|\bm{c}\|_{2}=1}\|\bm{c}^{\top}\bm{x}\|_{\psi_{2}}$. Furthermore, we use $\bm{I}_{K}$, $\bm{1}_{K}$ and $\bm{0}_{K}$ to denote the identity matrix in $\mathbb{R}^{K\times K}$, the $K$-dimensional vector with all elements equal to $1$, and the $K$-dimensional vector with all elements equal to $0$, respectively. For a matrix $\bm{A}=(A_{jk})$, $\bm{a}_{i}$ represents the $i$-th row of $\bm{A}$, and $\bm{A}_{j}$ represents the $j$-th column of $\bm{A}$. We define $\|\bm{A}\|_{\mathbb{F}}=\sqrt{\sum_{jk}A_{jk}^{2}}$, $\|\bm{A}\|_{\max}=\max_{j,k}|A_{jk}|$, $\|\bm{A}\|_{\infty}=\max_{j}\sum_{k}|A_{jk}|$, $\|\bm{A}\|_{1}=\max_{k}\sum_{j}|A_{jk}|$ and $\|\bm{A}\|_{\text{sum}}=\sum_{j,k}|A_{jk}|$ to be its Frobenius norm, element-wise max-norm, matrix $\ell_{\infty}$-norm, matrix $\ell_{1}$-norm and element-wise sum-norm, respectively. Besides, we use $\lambda_{\min}(\bm{A})$ and $\lambda_{\max}(\bm{A})$ to denote the minimal and maximal eigenvalues of $\bm{A}$, respectively. We use $|\mathcal{A}|$ to denote the cardinality of a set $\mathcal{A}$. For two positive sequences $\{a_{n}\}_{n\geq 1}$, $\{b_{n}\}_{n\geq 1}$, we write $a_{n}=O(b_{n})$ if there exists a positive constant $C$ such that $a_{n}\leq C b_{n}$, and we write $a_{n}=o(b_{n})$ if $a_{n}/b_{n}\rightarrow 0$. Furthermore, if $a_{n}=O(b_{n})$ is satisfied, we write $a_{n}\lesssim b_{n}$. If $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$, we write $a_{n}\asymp b_{n}$ for short. In addition, $a_{n}=O_{\mathbb{P}}(b_{n})$ and $a_{n}=o_{\mathbb{P}}(b_{n})$ have similar meanings except that the relation for $a_{n}/b_{n}$ holds with high probability. The parameters $c,c_{0},C,C_{1},C_{2}$ and $K'$ appearing in this paper are all positive constants.

2 Factor augmented single-index model

In this section, we investigate the reformulation of the FASIM and the estimation of the latent factors, as well as their properties.

2.1 Reformulation of FASIM

The unknown smooth function $g(\cdot,\cdot)$ makes the estimation of the FASIM challenging. To address this concern, we reformulate the FASIM as a linear regression model with a transformed response. With this transformation, we can estimate the parameters without estimating the unknown function $g(\cdot,\cdot)$.

Without loss of generality, we assume that $\mathbb{E}(\bm{x})=\bm{0}$ and that $\bm{\Sigma}=\mathrm{Cov}(\bm{x})$ is a positive definite matrix. Let $\bm{\sigma}_{\bm{u}h}=\mathrm{Cov}\{\bm{u},h(Y)\}$ and $\bm{\sigma}_{\bm{f}h}=\mathrm{Cov}\{\bm{f},h(Y)\}$ for a given transformation function $h(\cdot)$ of the response. Define $\bm{\beta}_{h}=\mathbb{E}(\bm{u}\bm{u}^{\top})^{-1}\bm{\sigma}_{\bm{u}h}$, $\bm{\gamma}_{h}=\mathbb{E}(\bm{f}\bm{f}^{\top})^{-1}\bm{\sigma}_{\bm{f}h}$, $\bm{\eta}_{h}=(\bm{\beta}_{h}^{\top},\bm{\gamma}_{h}^{\top})^{\top}$, $\bm{\eta}=(\bm{\beta}^{\top},\bm{\gamma}^{\top})^{\top}$ and $\bm{v}=(\bm{u}^{\top},\bm{f}^{\top})^{\top}$. In the SIM framework, the following linear expectation condition is commonly assumed.

Proposition 2.1.

Assume that $\mathbb{E}(\bm{v}\mid\bm{v}^{\top}\bm{\eta})$ is a linear function of $\bm{v}^{\top}\bm{\eta}$. Then $\bm{\eta}_{h}$ is proportional to $\bm{\eta}$, that is, $\bm{\eta}_{h}=\kappa_{h}\bm{\eta}$ for some constant $\kappa_{h}$.

Proposition 2.1 follows from Theorem 2.1 of Li and Duan, (1989). The condition in this proposition is referred to as the linearity condition (LC) for predictors. It is satisfied when $\bm{v}$ follows an elliptical distribution, a commonly used assumption in the sufficient dimension reduction literature (Li, 1991; Cook and Ni, 2005). Hall and Li, (1993) showed that the LC holds approximately in many high-dimensional settings. Throughout the paper, we assume $\kappa_{h}\neq 0$, which is a relatively mild assumption. In fact, when $h(\cdot)$ is monotone and $g(\cdot,\cdot)$ is monotonic in its first argument, this assumption is automatically satisfied.

From the definitions of $\bm{\beta}_{h}$ and $\bm{\gamma}_{h}$, our model can be rewritten as:

h(Y) = \bm{x}^{\top}\bm{\beta}_{h} + \bm{f}^{\top}\bm{\varphi}_{h} + e_{h},   (2.1)

where $\bm{\varphi}_{h}=\bm{\gamma}_{h}-\bm{B}^{\top}\bm{\beta}_{h}$, and the error term $e_{h}$ satisfies $\mathbb{E}(e_{h})=0$, $\mathbb{E}(e_{h}\bm{u})=\bm{0}_{p}$ and $\mathbb{E}(e_{h}\bm{f})=\bm{0}_{K}$. This means we have recast the FASIM as a FARM with a transformed response. For identification of the FASIM, it is usually assumed that $\|\bm{\eta}\|_{2}=1$ and that its first element is positive; thus estimating the direction of $\bm{\eta}$ is sufficient for the FASIM. The proportionality between $\bm{\eta}$ and $\bm{\eta}_{h}$, along with the transformed linear regression model (2.1), reduces the difficulty of the analysis.

In practice, we need to choose an appropriate transformation function $h(\cdot)$. It is worth mentioning that the procedure in Eftekhari et al., (2021) essentially works with $h(Y)=Y$. Motivated by Zhu and Zhu, (2009) and Rejchel and Bogdan, (2020), we set $h(Y)=F(Y)-1/2$, where $F(\cdot)$ is the distribution function of $Y$. This specific choice makes our procedures robust against outliers and heavy tails. Indeed, with equation (2.1) and the widely imposed sub-Gaussian assumption on the predictors, the boundedness of $F(Y)-1/2$ ensures that the transformed error term $e_{h}$ is sub-Gaussian, even if the original error term $\varepsilon$ follows a Cauchy distribution. The response-distribution transformation is also preferred for additional reasons discussed later.
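In sample form, this transform is nothing more than a centered rank. Below is a minimal sketch in Python/NumPy (our illustration language throughout; the function name is ours, and ties are ignored):

```python
import numpy as np

def ecdf_transform(y):
    """Response-distribution transform h(Y_i) = F_n(Y_i) - 1/2.

    F_n(Y_i) = rank(Y_i)/n, so the transformed responses lie in
    (-1/2, 1/2] no matter how heavy-tailed Y is.
    """
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1  # rank of Y_i among Y_1, ..., Y_n
    return ranks / n - 0.5
```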

2.2 Factor estimation

Throughout the paper, we assume that the data $\{\bm{x}_{i},\bm{f}_{i},Y_{i},e_{hi}\}_{i=1}^{n}$ are independent and identically distributed (i.i.d.) copies of $\{\bm{x},\bm{f},Y,e_{h}\}$. Let $\bm{X}=(\bm{x}_{1},\ldots,\bm{x}_{n})^{\top}$, $\bm{F}=(\bm{f}_{1},\ldots,\bm{f}_{n})^{\top}$, $\bm{Y}=(Y_{1},\ldots,Y_{n})^{\top}$ and $\bm{e}_{h}=(e_{h1},\ldots,e_{hn})^{\top}$. We consider the high-dimensional scenario where the dimension $p$ of the observed covariate vector $\bm{x}$ can be much larger than the sample size $n$. For the FASIM, we first need to estimate the latent factor vector $\bm{f}$ since only the predictor vector $\bm{x}$ and the response $Y$ are observable. To address this issue, we impose an identifiability assumption similar to that in Fan et al., (2013). That is,

\mathrm{Cov}(\bm{f}) = \bm{I}_{K}, \ \ \text{and}\ \ \bm{B}^{\top}\bm{B}\ \ \text{is diagonal}.

Consequently, the constrained least squares estimator of (𝑭,𝑩)(\bm{F},\bm{B}) based on 𝑿{\bm{X}} is given as

(\widehat{\bm{F}},\widehat{\bm{B}}) = \arg\min_{\bm{F}\in\mathbb{R}^{n\times K},\,\bm{B}\in\mathbb{R}^{p\times K}} \|\bm{X}-\bm{F}\bm{B}^{\top}\|_{\mathbb{F}}^{2},   (2.2)
\text{subject to}\ \ \frac{1}{n}\bm{F}^{\top}\bm{F}=\bm{I}_{K}\ \ \text{and}\ \ \bm{B}^{\top}\bm{B}\ \ \text{diagonal}.

Let $\bm{U}=(\bm{u}_{1},\ldots,\bm{u}_{n})^{\top}$, with $\bm{u}_{i}=\bm{x}_{i}-\bm{B}\bm{f}_{i}$, $i=1,\ldots,n$. Denote $\bm{\Sigma}_{\bm{u}}=\mathrm{Cov}(\bm{u})$.

Elementary manipulation yields that the columns of $\widehat{\bm{F}}/\sqrt{n}$ are the eigenvectors corresponding to the largest $K$ eigenvalues of the matrix $\bm{X}\bm{X}^{\top}$ and $\widehat{\bm{B}}=\bm{X}^{\top}\widehat{\bm{F}}(\widehat{\bm{F}}^{\top}\widehat{\bm{F}})^{-1}=n^{-1}\bm{X}^{\top}\widehat{\bm{F}}$. Then the estimator of $\bm{U}$ is

\widehat{\bm{U}} = \bm{X}-\widehat{\bm{F}}\widehat{\bm{B}}^{\top} = \left(\bm{I}_{n}-\frac{1}{n}\widehat{\bm{F}}\widehat{\bm{F}}^{\top}\right)\bm{X},

see Fan et al., (2013). Since $K$ is related to the number of spiked eigenvalues of $\bm{X}\bm{X}^{\top}$, it is usually small. Therefore, we treat $K$ as a fixed constant, as suggested by Fan et al., (2024). Additionally, let $\widehat{\bm{\Sigma}}$, $\widehat{\bm{\Lambda}}=\mathrm{diag}(\widehat{\lambda}_{1},\ldots,\widehat{\lambda}_{K})$ and $\widehat{\bm{\Gamma}}=(\widehat{\bm{\zeta}}_{1},\ldots,\widehat{\bm{\zeta}}_{K})$ be suitable estimators of the covariance matrix $\bm{\Sigma}$ of $\bm{x}$, the matrix $\bm{\Lambda}=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{K})$ of its leading $K$ eigenvalues, and the matrix $\bm{\Gamma}=(\bm{\zeta}_{1},\ldots,\bm{\zeta}_{K})$ of the corresponding $K$ orthonormalized eigenvectors, respectively.
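The estimators $\widehat{\bm{F}}$, $\widehat{\bm{B}}$ and $\widehat{\bm{U}}$ admit a direct implementation via a single eigen-decomposition of the $n\times n$ Gram matrix; here is a minimal sketch (function and variable names are ours):

```python
import numpy as np

def estimate_factors(X, K):
    """PCA estimators of (F, B, U): the columns of F_hat/sqrt(n) are the top-K
    eigenvectors of X X^T, B_hat = X^T F_hat / n, U_hat = (I - F_hat F_hat^T / n) X."""
    n = X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)   # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:K]          # indices of the K largest
    F_hat = np.sqrt(n) * eigvecs[:, idx]         # so that F_hat^T F_hat / n = I_K
    B_hat = X.T @ F_hat / n
    U_hat = X - F_hat @ B_hat.T
    return F_hat, B_hat, U_hat
```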

We proceed by presenting the regularity assumptions imposed in seminal works about factor analysis, such as Bai, (2003) and Fan et al., (2013, 2024).

Assumption 1.

We make the following assumptions.

  • (i)

    There exists a positive constant $c_{0}$ such that $\|\bm{f}\|_{\psi_{2}}\leq c_{0}$ and $\|\bm{u}\|_{\psi_{2}}\leq c_{0}$. In addition, $\mathbb{E}(U_{ij})=\mathbb{E}(F_{ik})=0$, $i=1,\ldots,n$, $j=1,\ldots,p$, $k=1,\ldots,K$.

  • (ii)

    There exists a constant $\tau>1$ such that $p/\tau\leq\lambda_{\min}(\bm{B}^{\top}\bm{B})\leq\lambda_{\max}(\bm{B}^{\top}\bm{B})\leq p\tau$.

  • (iii)

    There exists a constant $\Upsilon>0$ such that $\|\bm{B}\|_{\max}\leq\Upsilon$ and $\mathbb{E}|\bm{u}^{\top}\bm{u}-\mathrm{tr}(\bm{\Sigma}_{\bm{u}})|^{4}\leq\Upsilon p^{2}$.
  • (iv)

    There exists a positive constant $\kappa<1$ such that $\kappa\leq\lambda_{\min}(\bm{\Sigma}_{\bm{u}})$, $\lambda_{\max}(\bm{\Sigma}_{\bm{u}}),\|\bm{\Sigma}_{\bm{u}}\|_{1}\leq 1/\kappa$, and $\min_{1\leq k,l\leq p}\mathrm{Var}(U_{ik}U_{il})\geq\kappa$.

Assumption 2.

(Initial pilot estimators). Assume that $\widehat{\bm{\Sigma}}$, $\widehat{\bm{\Lambda}}$ and $\widehat{\bm{\Gamma}}$ satisfy $\|\widehat{\bm{\Sigma}}-\bm{\Sigma}\|_{\max}=O_{\mathbb{P}}\{\sqrt{(\log p)/n}\}$, $\|(\widehat{\bm{\Lambda}}-\bm{\Lambda})\widehat{\bm{\Lambda}}^{-1}\|_{\max}=O_{\mathbb{P}}\{\sqrt{(\log p)/n}\}$, and $\|\widehat{\bm{\Gamma}}-\bm{\Gamma}\|_{\max}=O_{\mathbb{P}}\{\sqrt{(\log p)/(np)}\}$.

Remark 1.

Assumption 2 is taken from Bayle and Fan, (2022). This assumption holds in various scenarios of interest, such as for the sample covariance matrix under sub-Gaussian distributions (Fan et al., 2013). Moreover, estimators such as the marginal and spatial Kendall's tau (Fan et al., 2018), and the elementwise adaptive Huber estimator (Fan et al., 2019) satisfy this assumption.

We summarize the theoretical results related to consistent factor estimation in Lemma 1 in Supplementary Material, which directly follows from Proposition 2.1 in Fan et al., (2024) and Lemma 3.1 in Bayle and Fan, (2022).

In practice, the number of latent factors $K$ is often unknown, and determining $K$ in a data-driven way is a crucial challenge. Numerous methods have been introduced in the literature to estimate $K$ (Bai and Ng, 2002; Lam and Yao, 2012; Ahn and Horenstein, 2013; Fan et al., 2022). In this paper, we employ the ratio method in our numerical studies (Luo et al., 2009; Lam and Yao, 2012; Ahn and Horenstein, 2013). Let $\lambda_{k}(\bm{X}\bm{X}^{\top})$ be the $k$-th largest eigenvalue of $\bm{X}\bm{X}^{\top}$. The number of factors can be consistently estimated by

\widehat{K} = \arg\max_{k\leq K_{\max}}\frac{\lambda_{k}(\bm{X}\bm{X}^{\top})}{\lambda_{k+1}(\bm{X}\bm{X}^{\top})},

where $1\leq K_{\max}\leq n$ is a prescribed upper bound for $K$. In our subsequent theoretical analysis, we treat $K$ as known. All the theoretical results remain valid conditional on the event that $\widehat{K}$ consistently estimates $K$.
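Given the spectrum of $\bm{X}\bm{X}^{\top}$, the ratio estimator is essentially a one-liner; a sketch (the default $K_{\max}=10$ is our illustrative choice):

```python
import numpy as np

def estimate_K(X, K_max=10):
    """Eigenvalue-ratio estimator of the number of factors K."""
    eigvals = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]   # descending order
    ratios = eigvals[:K_max] / eigvals[1:K_max + 1]        # lambda_k / lambda_{k+1}
    return int(np.argmax(ratios)) + 1                      # 0-based index -> k
```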

3 Adequacy test of factor model

In this section, we aim to assess the adequacy of the factor model and determine whether FASIM (1.3) should serve as an alternative to $Y=g(\bm{f}^{\top}\bm{\gamma},\varepsilon)$. The primary question of interest pertains to the following hypothesis:

H_{0}: \bm{\beta}_{h}=\bm{0}\ \ \text{versus}\ \ H_{1}: \bm{\beta}_{h}\neq\bm{0}.   (3.1)

From Proposition 2.1, the null hypothesis is equivalent to $H_{0}:\bm{\beta}=\bm{0}$.

3.1 Factor-adjusted score-type test

In this subsection, we develop a factor-adjusted score-type test (FAST) and derive its Gaussian approximation result. For the adequacy testing problem, Fan et al., (2024) considered the maximum of the debiased lasso estimator of $\bm{\beta}$ under FARM. However, their procedure requires estimating both high-dimensional regression coefficients and a high-dimensional precision matrix, and is thus computationally expensive. In contrast, our proposed FAST needs neither. Hence, our procedure is straightforward to implement, saving both computation time and computational resources.

Under the null hypothesis in (3.1), we have

\mathbb{E}\left[\left\{F(Y)-\frac{1}{2}-\bm{f}^{\top}\bm{\gamma}_{h}\right\}\bm{u}\right] = \mathbb{E}[e_{h}\bm{u}] = \bm{0},

while under the alternative hypothesis $H_{1}$, we have

\mathbb{E}\left[\left\{F(Y)-\frac{1}{2}-\bm{f}^{\top}\bm{\gamma}_{h}\right\}\bm{u}\right] = \mathbb{E}\{\bm{u}(e_{h}+\bm{u}^{\top}\bm{\beta}_{h})\} = \bm{\Sigma}_{\bm{u}}\bm{\beta}_{h}\neq\bm{0}.

This observation motivates us to consider the following individual test statistic

T_{nj} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\left\{F_{n}(Y_{i})-\frac{1}{2}-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{\gamma}}_{h}\right\}\widehat{U}_{ij}\right].   (3.2)

Here we utilize the empirical distribution function $F_{n}(y)=n^{-1}\sum_{i=1}^{n}\mathbb{I}(Y_{i}\leq y)$ as an estimator of $F(y)$. In the empirical distribution function, the term $\sum_{j=1}^{n}\mathbb{I}(Y_{j}\leq Y_{i})$ is the rank of $Y_{i}$. Since rank-based statistics, such as the Wilcoxon test and the Kruskal-Wallis ANOVA test, are well known to be robust, this intuitively explains why our procedures with the response-distribution transformation are robust to outliers in the response. We consider the least squares estimator $\widehat{\bm{\gamma}}_{h}=(\widehat{\bm{F}}^{\top}\widehat{\bm{F}})^{-1}\widehat{\bm{F}}^{\top}\{F_{n}(\bm{Y})-1/2\}$. Denote $\bm{T}_{n}=(T_{n1},\ldots,T_{np})^{\top}$. To test the null hypothesis $H_{0}:\bm{\beta}_{h}=\bm{0}$, we consider the $\ell_{\infty}$ norm of $\bm{T}_{n}$. That is,

M_{n} = \|\bm{T}_{n}\|_{\infty}.   (3.3)

It is clear that for the FAST statistic $M_{n}$ defined above, we only need to estimate the low-dimensional parameter $\bm{\gamma}_{h}$; no high-dimensional parameters are required. In addition, since FAST does not rely on the estimation of the precision matrix, we can avoid assumptions on the $\ell_{\infty}$ norm of the precision matrix, such as those imposed in Fan et al., (2024). Further define $S_{nj}$ as follows:

S_{nj} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\left\{F(Y_{i})-\frac{1}{2}-\bm{f}_{i}^{\top}\bm{\gamma}_{h}\right\}U_{ij}+m_{j}(Y_{i})\right],   (3.4)

where $m_{j}(y)=\mathbb{E}[(X_{1j}-\bm{f}_{1}^{\top}\bm{b}_{j})\{\mathbb{I}(Y\geq y)-F(Y)\}]$. Denote $\bm{S}_{n}=(S_{n1},\ldots,S_{np})^{\top}$ and $\bm{\Omega}^{*}=\mathrm{Cov}(\bm{S}_{n})$. Let $\sigma_{j}^{2}=\bm{\Omega}_{jj}^{*}$, $j=1,\ldots,p$. It can be shown that $T_{nj}=S_{nj}+o_{\mathbb{P}}(1)$.
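Combining the pieces above, $M_{n}$ is computable in a few lines; a sketch reusing ecdf_transform and estimate_factors from our Section 2 sketches (names are ours):

```python
import numpy as np

def fast_statistic(X, y, K):
    """FAST statistic M_n = ||T_n||_inf of (3.2)-(3.3)."""
    n = X.shape[0]
    F_hat, B_hat, U_hat = estimate_factors(X, K)
    h = ecdf_transform(y)                   # F_n(Y_i) - 1/2
    gamma_hat = F_hat.T @ h / n             # least squares; F_hat^T F_hat = n I_K
    resid = h - F_hat @ gamma_hat           # F_n(Y_i) - 1/2 - f_hat_i' gamma_hat
    T_n = resid @ U_hat / np.sqrt(n)        # vector (T_n1, ..., T_np)
    return np.max(np.abs(T_n)), resid, U_hat
```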

Assumption 3.

(i) $\{\log(np)\}^{7}/n=o(1)$.

(ii) $\min_{1\leq j\leq p}\sigma_{j}^{2}$ is bounded away from zero.

Assumption 3 is mild and frequently employed in high-dimensional settings; it is the technical requirement for bounding the difference between $M_{n}$ and $\|\bm{\mathrm{Z}}\|_{\infty}$, where $\bm{\mathrm{Z}}\sim\mathrm{N}_{p}(\bm{0},\bm{\Omega}^{*})$. Specifically, condition (i) imposes a suitable restriction on the growth rate of $p$ and is commonly used in the high-dimensional inference literature; see Zhang and Cheng, (2017) and Dezeure et al., (2017). Condition (ii) imposes a boundedness restriction on the second moment of $S_{nj}$, which is also assumed in Ning and Liu, (2017).

Theorem 3.1.

Suppose that Assumptions 1-3 and the LC hold. Under the null hypothesis, that is, $\bm{\beta}_{h}=\bm{0}$, we have

\lim_{n\rightarrow\infty}\sup_{t\in\mathbb{R}}\left|\Pr(M_{n}\leq t)-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq t\right)\right| = 0,   (3.5)

where $\bm{\mathrm{Z}}\sim\mathrm{N}_{p}(\bm{0},\bm{\Omega}^{*})$.

Theorem 3.1 indicates that our test statistic can be approximated by the maximum of a high-dimensional Gaussian vector under mild conditions. Based on this, we reject the null hypothesis $H_{0}$ at the significance level $\alpha$ if and only if $M_{n}>c_{1-\alpha}$, where $c_{1-\alpha}$ is the $(1-\alpha)$-th quantile of the distribution of $\|\bm{\mathrm{Z}}\|_{\infty}$.

As demonstrated by many authors, such as Cai et al., (2014) and Ma et al., (2021), the distribution of $\|\bm{\mathrm{Z}}\|_{\infty}$ can be asymptotically approximated by the Gumbel distribution. However, this generally requires restrictions on the covariance matrix of $\bm{\mathrm{Z}}$; for instance, that the eigenvalues of $\bm{\Omega}^{*}$ are uniformly bounded. In addition, the critical value obtained from the Gumbel distribution may not work well in practice since this weak convergence is typically slow (Zhang and Cheng, 2017). Instead of adopting the Gumbel approximation, we use the bootstrap to approximate the distribution of $M_{n}$.

3.2 Gaussian multiplier bootstrap

Given that $\bm{\Omega}^{*}$ is unknown, we employ the Gaussian multiplier bootstrap to derive the critical value $c_{1-\alpha}$. The procedure and its theoretical properties are outlined in this subsection.

  • 1.

    Generate i.i.d. random variables $\mathrm{N}_{1},\ldots,\mathrm{N}_{n}\sim\mathrm{N}(0,1)$ independent of the observed dataset $\mathcal{D}=\{\bm{x}_{1},\ldots,\bm{x}_{n},Y_{1},\ldots,Y_{n}\}$, and compute

    \widehat{\mathcal{G}} = \max_{j\in[p]}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\left\{F_{n}(Y_{i})-\frac{1}{2}-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{\gamma}}_{h}\right\}\widehat{U}_{ij}+\widehat{m}_{j}(Y_{i})\right]\mathrm{N}_{i}\right|.   (3.6)

    Here $\widehat{m}_{j}(y)=n^{-1}\sum_{i=1}^{n}(X_{ij}-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{b}}_{j})\{\mathbb{I}(Y_{i}\geq y)-F_{n}(Y_{i})\}$.

  • 2.

    Repeat the first step independently $B$ times to obtain $\widehat{\mathcal{G}}_{1},\ldots,\widehat{\mathcal{G}}_{B}$. Approximate the critical value $c_{1-\alpha}$ via the $(1-\alpha)$-th quantile of the empirical distribution of the bootstrap statistics:

    \widehat{c}_{1-\alpha} = \inf\left\{t\geq 0:\frac{1}{B}\sum_{b=1}^{B}\mathbb{I}(\widehat{\mathcal{G}}_{b}\leq t)\geq 1-\alpha\right\}.   (3.7)
  • 3.

    Reject the null hypothesis $H_{0}$ if and only if

    M_{n} \geq \widehat{c}_{1-\alpha}.   (3.8)
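The three steps above are short to implement; the following sketch (reusing fast_statistic from Section 3.1; the vectorized computation of $\widehat{m}_{j}(Y_{i})$ uses $\widehat{U}_{ij}=X_{ij}-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{b}}_{j}$, and the returned bootstrap $p$-value is equivalent to comparing $M_{n}$ with $\widehat{c}_{1-\alpha}$):

```python
import numpy as np

def fast_bootstrap(X, y, K, B=2000, seed=0):
    """Gaussian multiplier bootstrap for FAST (steps 1-3 above, a sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    M_n, resid, U_hat = fast_statistic(X, y, K)
    Fn = (np.argsort(np.argsort(y)) + 1) / n               # F_n(Y_i), no ties
    A = (y[None, :] >= y[:, None]).astype(float) - Fn[None, :]
    m_hat = A @ U_hat / n                                  # m_hat[k, j] = m_hat_j(Y_k)
    xi = resid[:, None] * U_hat + m_hat                    # per-observation summands
    G = np.abs(rng.standard_normal((B, n)) @ xi).max(axis=1) / np.sqrt(n)
    return M_n, np.mean(G >= M_n)                          # statistic, bootstrap p-value
```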
Theorem 3.2.

Suppose that the conditions in Theorem 3.1 are satisfied. Under the null hypothesis, we have

\sup_{x>0}\left|\Pr(M_{n}\leq x)-\Pr(\widehat{\mathcal{G}}\leq x\mid\mathcal{D})\right| \rightarrow 0.

Theorem 3.2 demonstrates the validity of the proposed bootstrap procedure, grounded in the Gaussian approximation theory established in Theorem 3.1. Based on this result, we can define our decision rule as follows:

\psi_{\infty,\alpha} = \mathbb{I}\left(M_{n}>\widehat{c}_{1-\alpha}\right).

The null hypothesis $H_{0}$ is rejected if and only if $\psi_{\infty,\alpha}=1$.

3.3 Power analysis

We next conduct the asymptotic power analysis of $M_{n}$. To demonstrate the efficiency of the test statistic, we consider the following local alternative parameter family for $\bm{\beta}_{h}$ under $H_{1}$:

\bm{\Delta}(c) = \left\{\bm{\beta}_{h}\in\mathbb{R}^{p}:\max_{j\in[p]}|\beta_{hj}|\geq\sqrt{c\,\frac{\log p}{n}}\right\},   (3.9)

where $c$ is a positive constant. Here $s=\|\bm{\beta}_{h}\|_{0}$ quantifies the sparsity of the parameter $\bm{\beta}_{h}$. Denote $\mu_{j}=\mathbb{E}(U_{ij}^{2})$.

Assumption 4.

Suppose that $\bm{\Sigma}_{\bm{u}}=\mathrm{Cov}(\bm{u})$ is a diagonal matrix and $\mu_{j}$ is bounded away from zero.

Assumption 4 imposes a diagonal structure on the covariance matrix $\bm{\Sigma}_{\bm{u}}$, a common assumption in factor analysis (Kim and Mueller, 1978). We note that the diagonality of $\bm{\Sigma}_{\bm{u}}$ is imposed here only to simplify the power analysis.

Theorem 3.3.

Suppose that Assumptions 1-4 and the LC hold. For the test statistic $M_{n}$ defined in (3.3), if $s=o(n/\log p)$, we have

\lim_{(n,p)\rightarrow\infty}\inf_{\bm{\beta}_{h}\in\bm{\Delta}(2+\varrho_{0})}\Pr\left(M_{n}\geq\widehat{c}_{1-\alpha}\right) = 1,   (3.10)

where $\varrho_{0}$ is a positive constant.

Theorem 3.3 suggests that our test procedure maintains high power even when only a few components of $\bm{\beta}_{h}$ have magnitudes larger than $\sqrt{(2+\varrho_{0})(\log p)/n}$. Thus, our testing procedure is powerful against sparse alternatives. Notably, this separation rate is the minimax optimal rate for local alternative hypotheses, as discussed in Verzelen, (2012), Cai et al., (2014), Zhang and Cheng, (2017), and Ma et al., (2021).

4 Estimation and inference of FASIM

When the null hypothesis is rejected, indicating that the factor model is inadequate, we need to consider FASIM. In this section, we aim to investigate the estimation and inference of the FASIM.

4.1 Regularization estimation

In the high-dimensional regime, we employ the $\ell_{1}$ penalty (Tibshirani, 1996) to estimate the unknown parameter vectors $\bm{\beta}_{h}$ and $\bm{\gamma}_{h}$ in model (2.1):

(\widehat{\bm{\beta}}_{h},\widehat{\bm{\gamma}}_{h}) = \arg\min_{\bm{\beta}\in\mathbb{R}^{p},\,\bm{\gamma}\in\mathbb{R}^{K}}\left[\frac{1}{2n}\sum_{i=1}^{n}\left\{F_{n}(Y_{i})-\frac{1}{2}-\widehat{\bm{u}}_{i}^{\top}\bm{\beta}-\widehat{\bm{f}}_{i}^{\top}\bm{\gamma}\right\}^{2}+\lambda\|\bm{\beta}\|_{1}\right],   (4.1)

where λ>0\lambda>0 is a tuning parameter.

By utilizing the pseudo response observations $F_{n}(Y_{i})$ instead of $Y_{i}$, our approach is inherently robust when confronted with outliers. We note that for the SIM without latent factors, the distribution function transformation has also been considered by many other authors, such as Zhu and Zhu, (2009), Wang et al., (2012), and Rejchel and Bogdan, (2020). However, in our model $\bm{f}$ and $\bm{u}$ are unobserved and must be estimated first, so additional effort is required to derive the theoretical properties of $\widehat{\bm{\beta}}_{h}$.

Let $\widetilde{F}_{n}(\bm{Y})=(\bm{I}_{n}-\widehat{\bm{P}})\{F_{n}(\bm{Y})-1/2\}$ represent the residuals of the response vector $F_{n}(\bm{Y})-1/2$ after it has been projected onto the column space of $\widehat{\bm{F}}$, where $\widehat{\bm{P}}=n^{-1}\widehat{\bm{F}}\widehat{\bm{F}}^{\top}$ is the corresponding projection matrix. Since $\widehat{\bm{U}}=(\bm{I}_{n}-\widehat{\bm{P}})\bm{X}$, we have $\widehat{\bm{F}}^{\top}\widehat{\bm{U}}=\bm{0}_{K\times p}$. Then direct calculations yield that the solution of (4.1) is equivalent to

\widehat{\bm{\beta}}_{h} = \arg\min_{\bm{\beta}\in\mathbb{R}^{p}}\left\{\frac{1}{2n}\left\|\widetilde{F}_{n}(\bm{Y})-\widehat{\bm{U}}\bm{\beta}\right\|_{2}^{2}+\lambda\|\bm{\beta}\|_{1}\right\},   (4.2)
\widehat{\bm{\gamma}}_{h} = (\widehat{\bm{F}}^{\top}\widehat{\bm{F}})^{-1}\widehat{\bm{F}}^{\top}\left\{F_{n}(\bm{Y})-\frac{1}{2}\right\}.   (4.3)
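A sketch of (4.2)-(4.3) using scikit-learn's coordinate-descent Lasso, whose objective $(1/2n)\|r-\bm{U}\bm{\beta}\|_{2}^{2}+\alpha\|\bm{\beta}\|_{1}$ matches (4.2) with $\alpha=\lambda$ (in practice $\lambda$ would be tuned, e.g., by cross-validation; names are ours):

```python
import numpy as np
from sklearn.linear_model import Lasso

def fasim_lasso(X, y, K, lam):
    """Penalized estimators (4.2)-(4.3), a sketch."""
    n = X.shape[0]
    F_hat, _, U_hat = estimate_factors(X, K)
    h = ecdf_transform(y)
    gamma_hat = F_hat.T @ h / n               # (4.3), since F_hat^T F_hat = n I_K
    h_tilde = h - F_hat @ gamma_hat           # (I_n - P_hat){F_n(Y) - 1/2}
    fit = Lasso(alpha=lam, fit_intercept=False).fit(U_hat, h_tilde)
    return fit.coef_, gamma_hat
```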

For any vector $\widehat{\bm{\beta}}_{h}$ defined in (4.2), we have the following result.

Theorem 4.1.

Under Assumptions 1-2 and the LC, assume that $\log p=o(n)$. If $s=o(n/\log p)$ and $\lambda\asymp\sqrt{(\log p)/n}$, then $\widehat{\bm{\beta}}_{h}$ defined in (4.2) satisfies

\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2} = O_{\mathbb{P}}(\lambda\sqrt{s}), \quad \text{and} \quad \|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{1} = O_{\mathbb{P}}(\lambda s).

Based on the empirical distribution function of the response, Theorem 4.1 establishes the $\ell_{1}$ and $\ell_{2}$-error bounds for our estimator $\widehat{\bm{\beta}}_{h}$. Compared with Zhu and Zhu, (2009), Wang et al., (2012), and Rejchel and Bogdan, (2020), in our model $\bm{f}$ and $\bm{u}$ are unobserved and must be estimated first, which adds new technical difficulties. Further, compared with Fan et al., (2024), our procedure does not need any moment condition on the error term $\varepsilon$ in the model.

4.2 Inference of FASIM

Due to the bias inherent in penalized estimation, $\widehat{\bm{\beta}}_{h}$ is unsuitable for direct use in statistical inference. To overcome this obstacle, Zhang and Zhang, (2014) and Van de Geer et al., (2014) proposed the debiasing technique for the Lasso estimator in linear regression. Eftekhari et al., (2021) extended the technique to the SIM. Han et al., (2023) and Yang et al., (2024) further considered robust inference for high-dimensional SIMs. However, their procedures cannot handle latent factors. To this end, motivated by Zhang and Zhang, (2014) and Van de Geer et al., (2014), we construct the following debiased estimator for the FASIM:

\widetilde{\bm{\beta}}_{h} = \widehat{\bm{\beta}}_{h}+\frac{1}{n}\widehat{\bm{\Theta}}\widehat{\bm{U}}^{\top}\left\{F_{n}(\bm{Y})-\frac{1}{2}-\widehat{\bm{U}}\widehat{\bm{\beta}}_{h}\right\},   (4.4)

where $\widehat{\bm{\Theta}}$ is a sparse estimator of $\bm{\Sigma}_{u}^{-1}$, with $\bm{\Sigma}_{u}=\mathbb{E}(\bm{u}_{i}\bm{u}_{i}^{\top})$. Denote $\widehat{\bm{\Sigma}}_{u}=n^{-1}\sum_{i=1}^{n}\widehat{\bm{u}}_{i}\widehat{\bm{u}}_{i}^{\top}$. We construct $\widehat{\bm{\Theta}}$ by employing the approach of Cai et al., (2011). Concretely, $\widehat{\bm{\Theta}}$ is the solution to the following constrained optimization problem:

\widehat{\bm{\Theta}} = \arg\min_{\bm{\Theta}\in\mathbb{R}^{p\times p}}\|\bm{\Theta}\|_{\text{sum}}, \quad \text{s.t.}\ \|\bm{\Theta}\widehat{\bm{\Sigma}}_{u}-\bm{I}_{p}\|_{\max}\leq\delta_{n},   (4.5)

where $\delta_{n}$ is a predetermined tuning parameter. In general, $\widehat{\bm{\Theta}}$ is not symmetric since there is no symmetry constraint in (4.5). Symmetry can be enforced through additional operations. Denote $\widehat{\bm{\Theta}}=(\widehat{\bm{\xi}}_{1},\ldots,\widehat{\bm{\xi}}_{p})^{\top}=(\widehat{\xi}_{ij})_{1\leq i,j\leq p}$. Write $\widehat{\bm{\Theta}}^{\text{sym}}=(\widehat{\xi}_{ij}^{\text{sym}})_{1\leq i,j\leq p}$, where

\widehat{\xi}_{ij}^{\text{sym}} = \widehat{\xi}_{ij}\mathbb{I}\left(|\widehat{\xi}_{ij}|\leq|\widehat{\xi}_{ji}|\right)+\widehat{\xi}_{ji}\mathbb{I}\left(|\widehat{\xi}_{ij}|>|\widehat{\xi}_{ji}|\right).

Clearly, $\widehat{\bm{\Theta}}^{\text{sym}}$ is a symmetric matrix. For simplicity, we write $\widehat{\bm{\Theta}}$ for the symmetric estimator in the rest of the paper. Next, we consider the estimation error bound of $\widehat{\bm{\Theta}}$. To this end, we introduce the following assumption on $\bm{\Sigma}_{u}^{-1}$.

Assumption 5.

There exists a positive constant $M^{*}$ such that $\|\bm{\Sigma}_{u}^{-1}\|_{\infty}\leq M^{*}$. Moreover, $\bm{\Sigma}_{u}^{-1}=(\bm{\xi}_{1},\ldots,\bm{\xi}_{p})^{\top}=(\xi_{ij})_{1\leq i,j\leq p}$ is row-wise sparse, i.e., $\max_{i\in[p]}\sum_{j=1}^{p}|\xi_{ij}|^{q}\leq c_{n,p}$, where $c_{n,p}$ is positive, bounded away from zero, and allowed to increase as $n$ and $p$ grow, and $0\leq q<1$.

Assumption 5 requires that $\bm{\Sigma}_{u}^{-1}$ be sparse in terms of both its $\ell_{\infty}$-norm and its matrix row space. Van de Geer et al., (2014), Cai et al., (2016) and Ning and Liu, (2017) discussed similar assumptions on precision matrix estimation and more general inverse Hessian matrix estimation.

The estimation error bound of $\widehat{\bm{\Theta}}$ and the upper bound of $\|\widehat{\bm{\Theta}}-\bm{\Sigma}_{u}^{-1}\|_{\infty}$ are given in Proposition 7 and Lemma 8 in the Supplementary Material; they are crucial for establishing the theoretical results below. Denote $\bm{m}=\{m_{1}(Y),\ldots,m_{p}(Y)\}^{\top}$, with $m_{j}(y)=\mathbb{E}[(X_{1j}-\bm{f}_{1}^{\top}\bm{b}_{j})\{\mathbb{I}(Y\geq y)-F(Y)\}]$. Let $\bm{m}_{i}$ be the i.i.d. copies of $\bm{m}$, namely, $\bm{m}_{i}=\{m_{1}(Y_{i}),\ldots,m_{p}(Y_{i})\}^{\top}$.

Theorem 4.2.

Suppose that the conditions in Theorem 4.1 and Assumption 5 are satisfied. If $s=o(\sqrt{n}/\log p)$ and $\log p\ll n^{(1-q)/(3-q)}$, where $q$ is given in Assumption 5, then

\sqrt{n}(\widetilde{\bm{\beta}}_{h}-\bm{\beta}_{h}) = \bm{Z}_{b}+\bm{E},
\bm{Z}_{b} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{\Sigma}_{u}^{-1}\left(\bm{u}_{i}e_{hi}+\bm{m}_{i}\right), \quad \|\bm{E}\|_{\infty}=o_{p}(1).

Theorem 4.2 indicates that the asymptotic representation of $\sqrt{n}(\widetilde{\bm{\beta}}_{h}-\bm{\beta}_{h})$ can be divided into two terms. The major term $\bm{Z}_{b}$ is associated with the errors $\bm{e}_{h}$ and $\bm{m}$, while the remainder $\bm{E}$ vanishes as $n$ approaches infinity.

Further, for $j=1,\ldots,p$, the variance of $Z_{bj}$ is $\sigma_{zj}^{2}=\bm{e}_{j}^{\top}\bm{\Sigma}_{u}^{-1}\mathrm{Var}\{\bm{u}e_{h}+\bm{m}\}\bm{\Sigma}_{u}^{-1}\bm{e}_{j}$, with $\bm{e}_{j}$ being the unit vector with only its $j$-th element equal to 1. Based on Theorem 4.2, the asymptotic variance of $\sqrt{n}(\widetilde{\beta}_{hj}-\beta_{hj})$ can be estimated by $\widehat{\sigma}_{zj}^{2}=\bm{e}_{j}^{\top}\widehat{\bm{\Theta}}\widetilde{\bm{\Sigma}}\widehat{\bm{\Theta}}\bm{e}_{j}$, with $\widetilde{\bm{\Sigma}}=(\widetilde{\sigma}_{lk})_{1\leq l,k\leq p}$, $\widetilde{\sigma}_{lk}=n^{-1}\sum_{i=1}^{n}\{\widehat{U}_{il}\widetilde{e}_{hi}+\widehat{m}_{l}(Y_{i})\}\{\widehat{U}_{ik}\widetilde{e}_{hi}+\widehat{m}_{k}(Y_{i})\}$, $\widetilde{e}_{hi}=F_{n}(Y_{i})-1/2-\widehat{\bm{u}}_{i}^{\top}\widehat{\bm{\beta}}_{h}-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{\gamma}}_{h}$ and $\widehat{m}_{j}(y)=n^{-1}\sum_{i=1}^{n}(X_{ij}-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{b}}_{j})\{\mathbb{I}(Y_{i}\geq y)-F_{n}(Y_{i})\}$. Thus the $(1-\alpha)$ confidence interval for $\beta_{hj}$, $j\in[p]$, can be constructed as follows:

\mathcal{CI}_{\alpha}(\beta_{hj}) = \left(\widetilde{\beta}_{hj}-\frac{\widehat{\sigma}_{zj}z_{1-\alpha/2}}{\sqrt{n}},\ \widetilde{\beta}_{hj}+\frac{\widehat{\sigma}_{zj}z_{1-\alpha/2}}{\sqrt{n}}\right),   (4.6)

where $\widetilde{\beta}_{hj}$ is the $j$-th component of $\widetilde{\bm{\beta}}_{h}$ and $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-th quantile of the standard normal distribution.
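Given a symmetrized precision-matrix estimator $\widehat{\bm{\Theta}}$ (the CLIME program (4.5) itself needs a linear-programming solver and is not reproduced here), the debiased estimator (4.4) and the intervals (4.6) take only a few lines; a sketch with illustrative names, where h denotes the vector $F_{n}(\bm{Y})-1/2$:

```python
import numpy as np
from scipy.stats import norm

def debiased_ci(U_hat, F_hat, h, y, beta_hat, gamma_hat, Theta_hat, alpha=0.05):
    """Debiased estimator (4.4) and confidence intervals (4.6), a sketch."""
    n = U_hat.shape[0]
    beta_tilde = beta_hat + Theta_hat @ U_hat.T @ (h - U_hat @ beta_hat) / n  # (4.4)
    resid = h - U_hat @ beta_hat - F_hat @ gamma_hat       # e_tilde_hi
    Fn = (np.argsort(np.argsort(y)) + 1) / n
    A = (y[None, :] >= y[:, None]).astype(float) - Fn[None, :]
    m_hat = A @ U_hat / n                                  # m_hat_j(Y_i)
    W = U_hat * resid[:, None] + m_hat                     # rows: U_i e_i + m(Y_i)
    Sigma_tilde = W.T @ W / n                              # plug-in variance matrix
    se = np.sqrt(np.diag(Theta_hat @ Sigma_tilde @ Theta_hat) / n)
    z = norm.ppf(1 - alpha / 2)
    return beta_tilde, beta_tilde - z * se, beta_tilde + z * se
```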

5 Numerical studies

In this section, simulation studies are conducted to evaluate the finite sample performance of the proposed methods. We implement the proposed method with the following two models.
Model 1: Linear model

Y = \bm{u}^{\top}\bm{\beta}+\bm{f}^{\top}\bm{\gamma}+\varepsilon.   (5.1)

Model 2: Nonlinear model

Y = \exp\left(\bm{u}^{\top}\bm{\beta}+\bm{f}^{\top}\bm{\gamma}+\varepsilon\right).   (5.2)

Under each setting, we generate $n$ i.i.d. observations and replicate the experiment 500 times.

5.1 Adequacy test of factor model

In this subsection, we set $n=200$, $K=2$, $p=200$ or $500$, and $\bm{X}=\bm{F}\bm{B}^{\top}+\bm{U}$. Here, the entries of $\bm{B}$ are generated from the uniform distribution $\mathrm{Unif}(-1,1)$ and every row of $\bm{U}$ follows $\mathrm{N}(\bm{0},\bm{\Sigma}_{\bm{u}})$ with $\bm{\Sigma}_{\bm{u}}=(\sigma_{uij})_{1\leq i,j\leq p}$ and $\sigma_{uij}=0.5^{|i-j|}$. We consider two cases of $\bm{F}$ generation:
Case i. We generate 𝒇i\bm{f}_{i} from standard normal distribution, that is,

\bm{f}_{i}\sim\mathrm{N}(\bm{0},\bm{I}_{K}),\ i=1,\ldots,n.   (5.3)

Case ii. We generate 𝒇i\bm{f}_{i} from the AR(1) model:

\bm{f}_{i}=\bm{\Phi}\bm{f}_{i-1}+\bm{\pi}_{i},\ i=2,\ldots,n,   (5.4)

where $\bm{\Phi}=(\phi_{ij})\in\mathbb{R}^{K\times K}$ with $\phi_{ij}=0.4^{|i-j|+1}$, $i,j\in[K]$. In addition, $\bm{f}_{1}$ and $\bm{\pi}_{i}$, $i\geq 1$, are independently drawn from $\mathrm{N}(\bm{0},\bm{I}_{K})$. We generate $\varepsilon$ either from (a) $\mathrm{N}(0,0.25)$ or (b) Student's t distribution with 3 degrees of freedom, denoted as $\mathrm{t}_{3}$. We set $\bm{\gamma}=(0.5,0.5)^{\top}$ and $\bm{\beta}=\omega\,(\bm{1}_{3}^{\top},\bm{0}_{p-3}^{\top})^{\top}$, $\omega\geq 0$. When $\omega=0$, the null hypothesis holds and the simulation results correspond to the empirical size; otherwise, they correspond to the empirical power.
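A sketch of this data-generating process for Model 1 under error setting (a), with illustrative names:

```python
import numpy as np

def simulate_model1(n=200, p=200, K=2, omega=0.0, ar_factors=False, seed=0):
    """Generate (X, Y) from Model 1 (5.1) under Case i or Case ii."""
    rng = np.random.default_rng(seed)
    B = rng.uniform(-1, 1, size=(p, K))
    idx = np.arange(p)
    Sigma_u = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # sigma_uij = 0.5^{|i-j|}
    U = rng.multivariate_normal(np.zeros(p), Sigma_u, size=n)
    F = rng.standard_normal((n, K))                        # Case i: i.i.d. N(0, I_K)
    if ar_factors:                                         # Case ii: AR(1) factors
        Phi = 0.4 ** (np.abs(idx[:K, None] - idx[None, :K]) + 1)
        for t in range(1, n):
            F[t] = Phi @ F[t - 1] + rng.standard_normal(K)
    beta = omega * np.r_[np.ones(3), np.zeros(p - 3)]
    gamma = np.full(K, 0.5)
    Y = U @ beta + F @ gamma + rng.normal(0.0, 0.5, n)     # N(0, 0.25) errors
    return F @ B.T + U, Y                                  # X = F B^T + U
```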

To assess the robustness of our proposed method, we introduce outliers into the responses. We randomly pick a proportion $p_{out}$ of the responses and increase them by $m_{out}$ times the maximum of the original responses, abbreviated as $p_{out}+m_{out}*\max(\text{response})$. Here, $p_{out}$ represents the proportion of outliers, while $m_{out}$ is a predetermined constant indicating the strength of the outliers. Throughout the simulations, we adopt the strategy $10\%+10*\max(\text{response})$ to pollute the observations. We compare our proposed FAST with the FabTest of Fan et al., (2024). This comparison under the linear model (5.1) is illustrated in Figures 1 and 2. Here, “FAST_i” and “FAST_ii” denote the results of FAST under settings (5.3) and (5.4) of the $\bm{F}$ generation, respectively; similarly, “FabTest_i” and “FabTest_ii” denote the corresponding results of the FabTest of Fan et al., (2024). The first rows in Figures 1 and 2 represent the results obtained with the original data, while the second rows correspond to the results obtained after incorporating outliers (namely, $10\%+10*\max(\text{response})$). The figures suggest that both FAST and FabTest exhibit commendable performance under the Gaussian distribution. But when confronted with heavy tails or outliers, our proposed method outperforms FabTest. Specifically, as illustrated in the second rows of these two figures, the power curves associated with FAST reach 1 notably faster, whereas the power curves of FabTest consistently hover around the significance level of 0.05.

Figures 1 and 2 should be here.

Figures 3 and 4 illustrate the power curves of FAST and FabTest under the nonlinear model (5.2). We observe a distinct performance in the nonlinear scenario: even with the light-tailed error distribution $\mathrm{N}(0,0.25)$ in the original data, the power performance of FabTest is notably inferior to that of FAST. Furthermore, with the $\mathrm{t}_{3}$-distribution in the original data, the power curves of FAST continue to reach 1 faster. Similarly, when outliers are introduced, FabTest fails completely, whereas FAST continues to perform very well. These results illustrate the robustness of FAST in testing.

Figures 3 and 4 should be here.

As previously mentioned, our method avoids estimating high-dimensional regression coefficients and a high-dimensional precision matrix, significantly reducing computational costs. Table 1 compares the average computation time of FAST and the FabTest of Fan et al., (2024) under the linear model (5.1). The table indicates that under the same settings, the average computation time of FAST is significantly less than that of FabTest. Additionally, the average computation time of both tests increases as the parameter dimension $p$ increases to 500, as expected.

Table 1 should be here.

5.2 Accuracy of estimation

To illustrate the accuracy of our estimation, we set the number of latent factors to $K=2$, the dimension of $\bm{x}$ to $p=500$, $\bm{\gamma}=(0.5,0.5)^{\top}$, the first $s=3$ entries of $\bm{\beta}$ to 0.5, and the remaining $p-s$ entries to 0. Throughout this subsection, we generate each entry of $\bm{F}$ and $\bm{U}$ from the standard Gaussian distribution $\mathrm{N}(0,1)$, while each entry of $\bm{B}$ is randomly generated from the uniform distribution $\mathrm{Unif}(-1,1)$. The error $\varepsilon$ is generated from three scenarios: (a) the standard Gaussian distribution $\mathrm{N}(0,1)$; (b) the uniform distribution $\mathrm{Unif}(-3/2,3/2)$; (c) $\mathrm{t}_{3}$. Under each setting, $n$ is set so that $\sqrt{s(\log p)/n}$ takes uniform grids in $[0.10,0.30]$ with step 0.05.

To validate the necessity of considering latent factors in observational studies, we compare the relative error obtained by our approach (denoted “FASIM_Lasso”) with the method of Rejchel and Bogdan, (2020) (denoted “SIM_Lasso”), which considers the single-index model without incorporating the factor effect, and the method of Fan et al., (2024) (denoted “FA_Lasso”). Figures 5 and 6 depict the relative error

L_{2}(\widehat{\bm{\beta}}_{h},\bm{\beta}_{h}) = \frac{\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}}{\|\bm{\beta}_{h}\|_{2}}

of the linear model (5.1) and the nonlinear model (5.2), respectively. The simulation results indicate that for the original data, FA_Lasso performs very well when the error is not heavy-tailed, but when the error follows $\mathrm{t}_{3}$, it performs significantly worse. In contrast, the proposed method maintains stable and good performance across the different error distributions, illustrating its robustness. Additionally, compared to SIM_Lasso, FASIM_Lasso enjoys smaller relative errors; this is partially due to the inadequacy of the SIM in Rejchel and Bogdan, (2020). Figures 5 and 6 also show that the statistical rates $\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}$ of FASIM_Lasso are bounded by $O\{\sqrt{s(\log p)/n}\}$, which validates Theorem 4.1.

Figures 5 and 6 should be here.

To investigate the robustness of FASIM_Lasso, we introduce outliers into the observations using the aforementioned method, specifically $10\%+10*\max(\text{response})$. We report the $L_{2}(\widehat{\bm{\beta}},\bm{\beta})$ of FA_Lasso and the $L_{2}(\widehat{\bm{\beta}}_{h},\bm{\beta}_{h})$ of FASIM_Lasso and SIM_Lasso. The results are depicted in Figures 7 and 8. The figures indicate that our proposed method achieves more precise results, with smaller relative errors in all scenarios. This implies that our proposed method is much more robust than FA_Lasso when confronted with outliers in the observed data.

Figures 7 and 8 should be here.

5.3 Validity of confidence interval

In this subsection, we construct confidence intervals based on the proposed method in Section 4. We generate $\varepsilon$ either from (a) $\mathrm{N}(0,0.25)$ or (b) $\mathrm{t}_{3}$, and set $n=200$, $p=200$ or $500$. In addition, we generate $\bm{F}$, $\bm{U}$, $\bm{B}$ and the parameters $\bm{\beta}$, $\bm{\gamma}$ as described in Section 5.2. To assess the performance of our methods, we examine the empirical coverage probabilities and the average lengths of confidence intervals for the individual coefficients across all covariates, the covariates in $\mathcal{S}=\{1,2,3\}$ and those in $\mathcal{S}^{C}=[p]\setminus\mathcal{S}$. We define

\text{CP} = \frac{\sum_{j=1}^{p}\mathbb{I}\{\beta_{hj}\in\mathcal{CI}_{\alpha}(\beta_{hj})\}}{p}, \quad \text{AL} = \frac{\sum_{j=1}^{p}2\widehat{\sigma}_{zj}z_{1-\alpha/2}/\sqrt{n}}{p},
\text{CP}_{\mathcal{S}} = \frac{\sum_{j\in\mathcal{S}}\mathbb{I}\{\beta_{hj}\in\mathcal{CI}_{\alpha}(\beta_{hj})\}}{s}, \quad \text{AL}_{\mathcal{S}} = \frac{\sum_{j\in\mathcal{S}}2\widehat{\sigma}_{zj}z_{1-\alpha/2}/\sqrt{n}}{s},
\text{CP}_{\mathcal{S}^{C}} = \frac{\sum_{j\in\mathcal{S}^{C}}\mathbb{I}\{\beta_{hj}\in\mathcal{CI}_{\alpha}(\beta_{hj})\}}{p-s}, \quad \text{AL}_{\mathcal{S}^{C}} = \frac{\sum_{j\in\mathcal{S}^{C}}2\widehat{\sigma}_{zj}z_{1-\alpha/2}/\sqrt{n}}{p-s}.

The results are summarized in Table 2. The empirical coverage probabilities are close to the nominal level across all settings, and the average lengths of the confidence intervals are very short. This illustrates the merits of our proposed methods. In addition, we find that even when the error follows a $\mathrm{t}_{3}$ distribution, the average interval lengths increase only slightly.

Table 2 should be here.

6 Real data analysis

In this section, we apply our method to a macroeconomic dataset named FRED-MD (McCracken and Ng, 2016), which comprises 126 monthly U.S. macroeconomic variables. Because these variables measure aspects of economic health, they are influenced by latent factors and can thus be regarded as intercorrelated. We analyze this dataset to illustrate the performance of FASIM and to evaluate the adequacy of the factor regression model. In our study, we take “GS5”, the 5-year treasury rate, as the response variable $Y$, and the remaining variables as the predictors $\bm{x}$. In addition, to demonstrate the robustness of FASIM, we apply the previously mentioned outlier contamination to the response variable, specifically $10\%+10*\max(\text{response})$. Due to the economic crisis of 2008-2009, the data remained unstable even after implementing the suggested transformations. Therefore, we examine data from two distinct periods: February 1992 to October 2007 and August 2010 to February 2020.

We first employ our FAST and the FabTest of Fan et al., (2024) to test the adequacy of the factor models, and denote the corresponding factor-only models by F_SIM and F_LM, respectively. We use 2000 bootstrap replications to obtain the critical value. For the FabTest, we use 10-fold cross-validation to compute the tuning parameters and refitted cross-validation based on iterated sure independence screening to estimate the error variance. The $p$-values are provided in Table 3. At the significance level of 0.05, for both the original dataset and the polluted dataset, the results indicate the inadequacy of F_SIM for “GS5” within the two distinct time periods. For F_LM, the results indicate inadequacy for “GS5” in the original dataset within both periods; when the data are polluted, the null hypothesis is rejected during the period from February 1992 to October 2007 but not during the other period. This implies the necessity of introducing the idiosyncratic component $\bm{u}$ into the regression model. Hence, in the subsequent study on prediction accuracy, we consider FASIM and FARM for comparison.

Table 3 should be here.

We compare the forecasting results of FASIM with those of FARM. For each time period and model, predictions are made using a moving-window approach with a window size of 90 months. Indexing the panel data from 1 within each of the two time periods, for each $t>90$ we use the 90 previous observations $\{(\bm{x}_{t-90},Y_{t-90}),\ldots,(\bm{x}_{t-1},Y_{t-1})\}$ to train the models (FASIM and FARM) and output predictions $\widehat{Y}_{\text{FASIM},t}$ and $\widehat{Y}_{\text{FARM},t}$. For FASIM, as defined in Equation (1), after obtaining the estimated parameters $\widehat{\bm{\beta}}_{h}$ and $\widehat{\bm{\gamma}}_{h}$, the estimated latent factor vector $\widehat{\bm{f}}_{t}$, and the estimated idiosyncratic component $\widehat{\bm{u}}_{t}$ at the $t$-th time point, we obtain the estimator $\widehat{g}$ of the unknown link function $g$ via spline regression. The predicted value of the response is then calculated as $\widehat{Y}_{t}=\widehat{g}(\widehat{\bm{f}}_{t}^{\top}\widehat{\bm{\gamma}}+\widehat{\bm{u}}_{t}^{\top}\widehat{\bm{\beta}})$. The accuracy of FASIM and FARM is measured by the mean squared error (MSE) (Hastie et al., 2009), defined as

\[
\text{MSE}=\frac{1}{T-90}\sum_{t=91}^{T}(Y_{t}-\widehat{Y}_{t})^{2},
\]

where $T$ denotes the total number of data points in a given time period.
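The moving-window exercise can be summarized by the sketch below; `fit` and `predict` are hypothetical placeholders for the full FASIM (or FARM) pipeline of factor extraction, penalized estimation, and spline fitting of the link function.

```python
import numpy as np

def rolling_mse(X, y, fit, predict, window=90):
    """Moving-window forecast evaluation with the MSE defined above.

    X : (T, p) predictor panel; y : (T,) response series.
    fit(X_train, y_train) returns a fitted model; predict(model, x_t)
    returns the scalar forecast of Y_t from the covariate vector x_t.
    """
    T = len(y)
    sq_errors = []
    for t in range(window, T):       # forecasts for t = window + 1, ..., T
        model = fit(X[t - window:t], y[t - window:t])
        sq_errors.append((y[t] - predict(model, X[t])) ** 2)
    return float(np.mean(sq_errors))
```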

Table 4 should be here.

Table 4 presents the prediction accuracy results of FASIM and FARM for “GS5” on the original and polluted datasets within the two time periods. The performance of FASIM and FARM on the original data is similar. However, while the MSEs of both methods increase on the polluted dataset, the increase is substantial for FARM, whereas FASIM shows only a modest increase. This suggests that FASIM is more robust than FARM.

7 Conclusions and discussions

To capture nonlinearity in the presence of latent factors, in this paper we introduce a novel Factor Augmented sparse Single-Index Model (FASIM). For this newly proposed model, we first address whether the augmented part is necessary. We develop a score-type test statistic that requires estimating neither high-dimensional regression coefficients nor a high-dimensional precision matrix. We also propose a Gaussian multiplier bootstrap to determine the critical value for our proposed test statistic FAST, and the validity of our procedure is theoretically established under mild conditions. When the model passes the test, we employ the $\ell_{1}$ penalty to estimate the unknown parameters, establishing both $\ell_{1}$ and $\ell_{2}$ error bounds. We further introduce a debiased estimator and establish its asymptotic normality. Numerical results illustrate the robustness and effectiveness of our proposed procedures.

In practice, it would be of interest to test whether the FASIM actually reduces to the FARM. We may also consider multiple testing problems for the FASIM. Furthermore, the condition $\kappa_{h}\neq 0$ excludes even link functions, and in particular the problem of sparse phase retrieval. We will explore these topics in the near future.

Appendix

Appendix A Proofs of Theorems

In the following, we present the proofs of our main theorems. To save space, the proofs of some technical lemmas are relegated to the Supplementary Material.

A.1 Proof of Theorem 3.1

Proof.

By Lemma 2 in the Supplementary Material, with high probability, we have

\[
\left|\max_{j\in[p]}|T_{nj}|-\max_{j\in[p]}|S_{nj}|\right|\leq\max_{j\in[p]}\left|T_{nj}-S_{nj}\right|\leq Cr(n,p),
\]

for some constant $C$, as $(n,p)\rightarrow\infty$. Here $r(n,p)=O\left\{(\log p)^{3/2}\log(np)/n+(\log p)(\log n)/\sqrt{n}\right\}$. This implies that

\begin{align}
&\left|\Pr\left(\max_{j\in[p]}|T_{nj}|\leq t\right)-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq t\right)\right|\nonumber\\
&\qquad\leq\underbrace{\left|\Pr\left\{\max_{j\in[p]}|S_{nj}|\leq t+Cr(n,p)\right\}-\Pr\left\{\|\bm{\mathrm{Z}}\|_{\infty}\leq t+Cr(n,p)\right\}\right|}_{\text{I}_{\text{B},1}}\nonumber\\
&\qquad\quad+\underbrace{\left|\Pr\left\{\|\bm{\mathrm{Z}}\|_{\infty}\leq t+Cr(n,p)\right\}-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq t\right)\right|}_{\text{I}_{\text{B},2}}.\tag{A.1}
\end{align}

For the term $\text{I}_{\text{B},1}$ in (A.1), recall that $S_{nj}=n^{-1/2}\sum_{i=1}^{n}\left[\left\{F(Y_{i})-1/2-\bm{f}_{i}^{\top}\bm{\gamma}_{h}\right\}\left(X_{ij}-\bm{f}_{i}^{\top}\bm{b}_{j}\right)+m_{j}(Y_{i})\right]$ with $m_{j}(y)=\mathbb{E}[(X_{1j}-\bm{f}_{1}^{\top}\bm{b}_{j})\{\mathbb{I}(Y\geq y)-F(Y)\}]$. By the sub-Gaussian assumptions on $\bm{f}$ and $\bm{u}$, the summands $\left\{F(Y_{i})-1/2-\bm{f}_{i}^{\top}\bm{\gamma}_{h}\right\}\left(X_{ij}-\bm{f}_{i}^{\top}\bm{b}_{j}\right)+m_{j}(Y_{i})$, $i=1,\ldots,n$, form a sub-exponential sequence with bounded $\psi_{1}$-norm. In addition, by Assumption 3, applying Lemma 3 in the Supplementary Material, we have

\[
\text{I}_{\text{B},1}=\lim_{n\rightarrow\infty}\sup_{t\in\mathbb{R}}\left|\Pr\left\{\max_{j\in[p]}|S_{nj}|\leq t+Cr(n,p)\right\}-\Pr\left\{\|\bm{\mathrm{Z}}\|_{\infty}\leq t+Cr(n,p)\right\}\right|=0.\tag{A.2}
\]

For the term $\text{I}_{\text{B},2}$ in (A.1), by Lemma 4 in the Supplementary Material, we have

\begin{align}
\text{I}_{\text{B},2}&=\left|\Pr\left\{\|\bm{\mathrm{Z}}\|_{\infty}\leq t+Cr(n,p)\right\}-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq t\right)\right|\nonumber\\
&\lesssim r(n,p)\sqrt{1\vee\log\left\{\frac{p}{r(n,p)}\right\}}\rightarrow 0.\tag{A.3}
\end{align}

Combining (A.2) and (A.3) with (A.1), we have

\[
\left|\Pr\left(\max_{j\in[p]}|T_{nj}|\leq t\right)-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq t\right)\right|\rightarrow 0.
\]

The proof is completed. ∎

A.2 Proof of Theorem 3.2

Proof.

First, we have the following decomposition:

\begin{align*}
\left|\Pr\left(M_{n}\leq x\right)-\Pr^{*}(\widehat{\mathcal{G}}\leq x)\right|&=\left|\Pr\left(M_{n}\leq x\right)-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq x\right)+\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq x\right)-\Pr^{*}(\widehat{\mathcal{G}}\leq x)\right|\\
&\leq\left|\Pr\left(M_{n}\leq x\right)-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq x\right)\right|+\left|\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq x\right)-\Pr^{*}(\widehat{\mathcal{G}}\leq x)\right|.
\end{align*}

By Theorem 3.1, we have

\[
\left|\Pr\left(M_{n}\leq x\right)-\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq x\right)\right|\rightarrow 0.
\]

We write

\[
\rho^{*}=\sup_{x>0}\left|\Pr\left(\|\bm{\mathrm{Z}}\|_{\infty}\leq x\right)-\Pr^{*}(\widehat{\mathcal{G}}\leq x)\right|.\tag{A.4}
\]

Denote $\widetilde{S}_{nj}=n^{-1/2}\sum_{i=1}^{n}\left[\left\{F_{n}(Y_{i})-1/2-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{\gamma}}_{h}\right\}\widehat{U}_{ij}+\widehat{m}_{j}(Y_{i})\right]\mathrm{N}_{i}$. Given the dataset $\mathcal{D}=\{\bm{x}_{1},\ldots,\bm{x}_{n},Y_{1},\ldots,Y_{n}\}$, the covariance matrix of $\widetilde{\bm{S}}_{n}=(\widetilde{S}_{nj})_{j=1}^{p}$ is

\[
\widetilde{\bm{\Omega}}=\mathrm{Cov}(\widetilde{\bm{S}}_{n}\mid\mathcal{D}),
\]

with $\widetilde{\Omega}_{jk}=n^{-1}\sum_{i=1}^{n}\{\widehat{U}_{ij}\widehat{e}_{hi}+\widehat{m}_{j}(Y_{i})\}\{\widehat{U}_{ik}\widehat{e}_{hi}+\widehat{m}_{k}(Y_{i})\}$, $j,k=1,\ldots,p$, and $\widehat{e}_{hi}=F_{n}(Y_{i})-1/2-\widehat{\bm{f}}_{i}^{\top}\widehat{\bm{\gamma}}_{h}$. Recall that $\bm{\Omega}^{*}=\mathrm{Cov}(\bm{S}_{n})$, with $\Omega_{jk}^{*}=\mathbb{E}\{U_{ij}e_{hi}+m_{j}(Y_{i})\}\{U_{ik}e_{hi}+m_{k}(Y_{i})\}$, $j,k=1,\ldots,p$, and $e_{hi}=F(Y_{i})-1/2-\bm{f}_{i}^{\top}\bm{\gamma}_{h}$. By Lemma 5 in the Supplementary Material, we have

\[
\|\bm{\Omega}^{*}-\widetilde{\bm{\Omega}}\|_{\max}=o_{\mathbb{P}}\left(\frac{1}{\log p}\right).
\]

Then it follows from Lemma 2.1 in Chernozhukov et al., (2023) that $\rho^{*}\stackrel{\mathbb{P}}{\rightarrow}0$.

Combining this result with Theorem 3.1, we have

\[
\left|\Pr\left(M_{n}\leq x\right)-\Pr^{*}(\widehat{\mathcal{G}}\leq x)\right|\rightarrow 0.\tag{A.5}
\]

The proof is completed. ∎

A.3 Proof of Theorem 3.3

Proof.

First, we have the following decomposition:

\[
\underbrace{\max_{j\in[p]}\left|\sqrt{n}\beta_{hj}\mu_{j}\right|}_{\text{I}_{\text{H}_{1}}}\leq\max_{j\in[p]}|T_{nj}|+\underbrace{\max_{j\in[p]}\left|\sqrt{n}\beta_{hj}\mu_{j}-T_{nj}\right|}_{\text{I}_{\text{H}_{2}}}.\tag{A.6}
\]

For the term $\text{I}_{\text{H}_{2}}=\max_{j\in[p]}\left|\sqrt{n}\beta_{hj}\mu_{j}-T_{nj}\right|$ in (A.6), denote $\eta^{\natural}=\max_{j\in[p]}|(T_{nj}-\sqrt{n}\beta_{hj}\mu_{j})-(S_{nj}-\sqrt{n}\beta_{hj}\mu_{j})|$. By Lemma 2 in the Supplementary Material, $\eta^{\natural}=O_{\mathbb{P}}\{r(n,p)\}$, with $r(n,p)=(\log p)^{3/2}\log(np)/n+(\log p)(\log n)/\sqrt{n}+s\|\bm{\beta}_{h}\|_{\infty}\log p/\sqrt{n}+\max_{j\in[p]}|\mu_{j}\beta_{hj}|/\sqrt{n}$. For simplicity, we rewrite $S_{nj}-\sqrt{n}\beta_{hj}\mu_{j}=\sum_{i=1}^{n}\varpi_{ij}/\sqrt{n}$, where $\varpi_{ij}=\left\{F(Y_{i})-1/2-\bm{f}_{i}^{\top}\bm{\gamma}_{h}\right\}U_{ij}+m_{j}(Y_{i})-\beta_{hj}\mu_{j}$. Under Assumption 1, it is easy to show that $\{\varpi_{ij}\}_{i=1}^{n}$ is an i.i.d. mean-zero sub-exponential random variable sequence. Assume that $\mu_{j}=\mathbb{E}(U_{ij}^{2})\geq C_{\min}$. Denoting $K^{\prime}=\max_{i\in[n]}\|\varpi_{ij}\|_{\psi_{1}}$, by Bernstein's inequality we have

\[
\max_{j\in[p]}\left|S_{nj}-\sqrt{n}\beta_{hj}\mu_{j}\right|=\max_{j\in[p]}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\varpi_{ij}\right|\leq C_{\min}\sqrt{\frac{\log p}{2}},\tag{A.7}
\]

with probability at least $1-2\exp\{-C(\log p)/(2K^{\prime 2})\}$. Therefore, with high probability, we have

\[
\max_{j\in[p]}\left|T_{nj}-\sqrt{n}\beta_{hj}\mu_{j}\right|\leq\max_{j\in[p]}\left|S_{nj}-\sqrt{n}\beta_{hj}\mu_{j}\right|+\eta^{\natural}\leq C_{\min}\sqrt{\frac{\log p}{2}}+Cs\|\bm{\beta}_{h}\|_{\infty}\frac{\log p}{\sqrt{n}}+C\max_{j\in[p]}\frac{|\mu_{j}\beta_{hj}|}{\sqrt{n}}.\tag{A.8}
\]
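For the reader's convenience, the Bernstein step behind (A.7) is the standard tail bound for a normalized sum of independent, mean-zero sub-exponential variables: for every $t\geq 0$ and a generic constant $c>0$,
\[
\Pr\left(\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\varpi_{ij}\right|\geq t\right)\leq 2\exp\left\{-c\min\left(\frac{t^{2}}{K^{\prime 2}},\ \frac{\sqrt{n}\,t}{K^{\prime}}\right)\right\}.
\]
Taking $t=C_{\min}\sqrt{(\log p)/2}$, the quadratic term dominates once $\log p=o(n)$, and the union bound over $j\in[p]$ is absorbed into the constant $C$ in the probability statement following (A.7).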

For any $\bm{\beta}_{h}\in\bm{\Delta}(2+\varrho_{0})$, we have

\[
\text{I}_{\text{H}_{1}}=\max_{j\in[p]}\left|\sqrt{n}\beta_{hj}\mu_{j}\right|>C_{\min}\sqrt{(2+\varrho_{0})\log p}.\tag{A.9}
\]

Substituting (A.8) and (A.9) into (A.6), we have

\begin{align*}
\max_{j\in[p]}\left|T_{nj}\right|&\geq\max_{j\in[p]}\left|\sqrt{n}\beta_{hj}\mu_{j}\right|-\max_{j\in[p]}\left|\sqrt{n}\beta_{hj}\mu_{j}-T_{nj}\right|\\
&\geq C_{\min}\sqrt{n}\|\bm{\beta}_{h}\|_{\infty}-C_{\min}\sqrt{\frac{\log p}{2}}-Cs\|\bm{\beta}_{h}\|_{\infty}\frac{\log p}{\sqrt{n}}-C\max_{j\in[p]}\frac{|\mu_{j}\beta_{hj}|}{\sqrt{n}}.
\end{align*}

By Theorem 3.2, $\widehat{c}_{1-\alpha}$ is asymptotically equal to the $(1-\alpha)$-th quantile of $\max_{j\in[p]}\left|T_{nj}-\sqrt{n}\beta_{hj}\mu_{j}\right|$. By (A.8), we have $\widehat{c}_{1-\alpha}\leq C_{\min}\sqrt{(\log p)/2}+Cs\|\bm{\beta}_{h}\|_{\infty}\log p/\sqrt{n}+C\max_{j\in[p]}|\mu_{j}\beta_{hj}|/\sqrt{n}$ with high probability. Therefore,

\[
\lim_{(n,p)\rightarrow\infty}\inf_{\bm{\beta}_{h}\in\bm{\Delta}(2+\varrho_{0})}\Pr\left(M_{n}\geq\widehat{c}_{1-\alpha}\right)=1.
\]

The proof is completed. ∎

A.4 Proof of Theorem 4.1

Proof.

Recall that $\widehat{\bm{\beta}}_{h}$ is the solution to the optimization problem

\[
\widehat{\bm{\beta}}_{h}=\arg\min_{\bm{\beta}\in\mathbb{R}^{p}}\left\{\frac{1}{2n}\left\|\widetilde{F}_{n}(\bm{Y})-\widehat{\bm{U}}\bm{\beta}\right\|_{2}^{2}+\lambda\|\bm{\beta}\|_{1}\right\}.
\]
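As a minimal illustration (ours, not the paper's implementation), this optimization can be carried out with scikit-learn once $\widehat{\bm{U}}$ and the projected rank-transformed response $\widetilde{F}_{n}(\bm{Y})$ have been computed, since the Lasso objective in scikit-learn coincides with the display above.

```python
from sklearn.linear_model import Lasso

def fit_beta_h(U_hat, r, lam):
    """Solve min_beta (2n)^{-1} * ||r - U_hat @ beta||_2^2 + lam * ||beta||_1.

    U_hat : (n, p) estimated idiosyncratic components.
    r     : (n,) projected rank-transformed response F~_n(Y).
    lam   : penalty level, of order sqrt(log(p) / n) as in Theorem 4.1.
    """
    # sklearn's Lasso minimizes (1 / (2 * n_samples)) * ||r - U b||_2^2
    # + alpha * ||b||_1, which matches the display with alpha = lam.
    model = Lasso(alpha=lam, fit_intercept=False)
    model.fit(U_hat, r)
    return model.coef_
```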

For the loss function $\mathcal{L}_{n}(\bm{\beta}_{h})=(2n)^{-1}\|(\bm{I}_{n}-\widehat{\bm{P}})\{F_{n}(\bm{Y})-1/2\}-\widehat{\bm{U}}\bm{\beta}_{h}\|_{2}^{2}$ in (4.2), to establish the $\ell_{2}$-norm error bound for $\widehat{\bm{\beta}}_{h}$, we aim to prove the following inequality:

\[
C\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}^{2}\leq\langle\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})-\nabla\mathcal{L}_{n}(\bm{\beta}_{h}),\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\rangle\leq C_{1}\lambda\sqrt{s}\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2},\tag{A.10}
\]

where $\nabla\mathcal{L}_{n}(\bm{\beta}_{h})$ is the gradient of the loss function $\mathcal{L}_{n}(\bm{\beta}_{h})$, $s=\|\bm{\beta}_{h}\|_{0}$, and $\lambda\asymp\sqrt{(\log p)/n}$, with positive constants $C$ and $C_{1}$.

We first derive the upper bound in (A.10). By the KKT condition, there exists a subgradient $\bm{\kappa}\in\partial\|\widehat{\bm{\beta}}_{h}\|_{1}$ such that $\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})+\lambda\bm{\kappa}=\bm{0}$. We then derive

\begin{align}
\langle\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})-\nabla\mathcal{L}_{n}(\bm{\beta}_{h}),\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\rangle&=\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})^{\top}(\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h})-\nabla\mathcal{L}_{n}(\bm{\beta}_{h})^{\top}(\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h})\nonumber\\
&=\underbrace{-\lambda\bm{\kappa}^{\top}(\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h})}_{\text{I}_{1}}+\underbrace{\nabla\mathcal{L}_{n}(\bm{\beta}_{h})^{\top}(\bm{\beta}_{h}-\widehat{\bm{\beta}}_{h})}_{\text{I}_{2}}.\tag{A.11}
\end{align}

Denote $\widehat{\bm{\theta}}_{h}=\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}$. For $\text{I}_{1}=-\lambda\bm{\kappa}^{\top}(\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h})$ in (A.11), by the definition of the subgradient, we have

\begin{align}
\text{I}_{1}&=-\lambda\bm{\kappa}^{\top}(\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h})\nonumber\\
&\leq\lambda(\|\bm{\beta}_{h}\|_{1}-\|\widehat{\bm{\beta}}_{h}\|_{1})\nonumber\\
&=\lambda(\|\bm{\beta}_{h\mathcal{S}}\|_{1}-\|\widehat{\bm{\theta}}_{h\mathcal{S}}+\bm{\beta}_{h\mathcal{S}}\|_{1}-\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1})\nonumber\\
&\leq\lambda(\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}-\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}),\tag{A.12}
\end{align}

where $\mathcal{S}=\{j\in[p]:\beta_{h,j}\neq 0\}$ and $\mathcal{S}^{C}=[p]\setminus\mathcal{S}$. For $\text{I}_{2}=\nabla\mathcal{L}_{n}(\bm{\beta}_{h})^{\top}(\bm{\beta}_{h}-\widehat{\bm{\beta}}_{h})$ in (A.11), by Hölder's inequality and Lemma 6 in the Supplementary Material, we have

\begin{align}
\text{I}_{2}&=\nabla\mathcal{L}_{n}(\bm{\beta}_{h})^{\top}(\bm{\beta}_{h}-\widehat{\bm{\beta}}_{h})\nonumber\\
&\leq\|\nabla\mathcal{L}_{n}(\bm{\beta}_{h})\|_{\infty}\|\bm{\beta}_{h}-\widehat{\bm{\beta}}_{h}\|_{1}\nonumber\\
&\lesssim\sqrt{\frac{\log p}{n}}\|\widehat{\bm{\theta}}_{h}\|_{1}.\tag{A.13}
\end{align}

Recall that $\lambda\asymp\sqrt{(\log p)/n}$, and assume that $\lambda/2>C\sqrt{(\log p)/n}$. By the triangle and Cauchy-Schwarz inequalities, combining (A.12) with (A.13), we have

\begin{align}
\langle\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})-\nabla\mathcal{L}_{n}(\bm{\beta}_{h}),\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\rangle&\leq\lambda\left(\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}-\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}\right)+C\sqrt{\frac{\log p}{n}}\left(\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}+\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}\right)\nonumber\\
&=\left\{\lambda+C\sqrt{\frac{\log p}{n}}\right\}\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}+\left\{C\sqrt{\frac{\log p}{n}}-\lambda\right\}\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}\nonumber\\
&\leq\frac{3}{2}\lambda\sqrt{s}\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}.\tag{A.14}
\end{align}

The inequality (A.14) shows that the upper bound in (A.10) holds.

Next, we focus on proving the lower bound in (A.10). Note that for any $\Delta\in\mathbb{R}^{p}$,

\[
\varepsilon_{n}(\Delta)=\left\{\nabla\mathcal{L}_{n}(\bm{\beta}_{h}+\Delta)-\nabla\mathcal{L}_{n}(\bm{\beta}_{h})\right\}^{\top}\Delta=\frac{1}{n}(\widehat{\bm{U}}\Delta)^{\top}(\widehat{\bm{U}}\Delta).
\]

Because $\langle\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})-\nabla\mathcal{L}_{n}(\bm{\beta}_{h}),\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\rangle\geq 0$, it follows from (A.14) that $0\leq(3\lambda/2)\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}-(\lambda/2)\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}$, and hence $\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}\leq 3\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}$. Besides, we have the sparsity assumption that $s\{(\log p)/n+1/p\}\rightarrow 0$. Then, according to Lemma C.2 in Fan et al., (2024), we have

\[
\Pr\left\{\frac{\left|\frac{1}{n}(\widehat{\bm{U}}\widehat{\bm{\theta}}_{h})^{\top}(\widehat{\bm{U}}\widehat{\bm{\theta}}_{h})\right|}{\|\widehat{\bm{\theta}}_{h}\|_{2}^{2}}\geq C\right\}\rightarrow 1.
\]

That is, with high probability, we have

\[
\langle\nabla\mathcal{L}_{n}(\widehat{\bm{\beta}}_{h})-\nabla\mathcal{L}_{n}(\bm{\beta}_{h}),\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\rangle\geq C\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}^{2},\tag{A.15}
\]

which shows that the lower bound in (A.10) holds. Therefore, we obtain the $\ell_{2}$-norm error bound for $\widehat{\bm{\beta}}_{h}$, i.e.,

\[
\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}\leq C\lambda\sqrt{s},\tag{A.16}
\]

where $\lambda\asymp\sqrt{(\log p)/n}$. Moreover, the cone condition $\|\widehat{\bm{\theta}}_{h\mathcal{S}^{C}}\|_{1}\leq 3\|\widehat{\bm{\theta}}_{h\mathcal{S}}\|_{1}$ implies

\[
\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{1}\leq C\sqrt{s}\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{2}.
\]

Therefore,

\[
\|\widehat{\bm{\beta}}_{h}-\bm{\beta}_{h}\|_{1}\leq C\lambda s.\tag{A.17}
\]

Here, $\lambda\asymp\sqrt{(\log p)/n}$.
The proof is completed. ∎

A.5 Proof of Theorem 4.2

Proof.

Recall that $\bm{m}=\{m_{1}(Y),\ldots,m_{p}(Y)\}^{\top}$, with $m_{j}(y)=\mathbb{E}[(X_{1j}-\bm{f}_{1}^{\top}\bm{b}_{j})\{\mathbb{I}(Y\geq y)-F(Y)\}]$. Let $\bm{m}_{i}=\{m_{1}(Y_{i}),\ldots,m_{p}(Y_{i})\}^{\top}$, $i=1,\ldots,n$, be the i.i.d. copies of $\bm{m}$. We have the following decomposition:

\[
\sqrt{n}\left(\widetilde{\bm{\beta}}_{h}-\bm{\beta}_{h}\right)=\bm{Z}_{b}+\bm{E}.
\]

Here, $\bm{Z}_{b}=n^{-1/2}\sum_{i=1}^{n}\bm{\Sigma}_{u}^{-1}\left(\bm{u}_{i}e_{hi}+\bm{m}_{i}\right)$ and $\bm{E}=\sqrt{n}\left(\widetilde{\bm{\beta}}_{h}-\bm{\beta}_{h}\right)-\bm{Z}_{b}$. By Lemma 9 in the Supplementary Material, as long as $s=o(\sqrt{n}/\log p)$ and $\log p\ll n^{(1-q)/(3-q)}$, we have $\|\bm{E}\|_{\infty}=o_{\mathbb{P}}(1)$, where $q$ is given in Assumption 5. The proof is completed. ∎

References

  • Ahn, S. C. and Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3):1203–1227.
  • Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.
  • Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
  • Bayle, P. and Fan, J. (2022). Factor-augmented regularized model for hazard regression. arXiv preprint arXiv:2210.01067.
  • Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica, 80(6):2369–2429.
  • Bing, X., Cheng, W., Feng, H., and Ning, Y. (2023). Inference in high-dimensional multivariate response regression with hidden variables. Journal of the American Statistical Association, pages 1–12.
  • Bühlmann, P., Kalisch, M., and Meier, L. (2014). High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and Its Application, 1(1):255–278.
  • Cai, T., Liu, W., and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607.
  • Cai, T., Liu, W., and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(2):349–372.
  • Cai, T., Liu, W., and Zhou, H. H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Annals of Statistics, 44(2):455–488.
  • Ćevid, D., Bühlmann, P., and Meinshausen, N. (2020). Spectral deconfounding via perturbed sparse linear models. Journal of Machine Learning Research, 21(1):9442–9482.
  • Chernozhukov, V., Chetverikov, D., and Koike, Y. (2023). Nearly optimal central limit theorem and bootstrap approximations in high dimensions. Annals of Applied Probability, 33(3):2374–2425.
  • Cook, R. D. and Ni, L. (2005). Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. Journal of the American Statistical Association, 100(470):410–428.
  • Dezeure, R., Bühlmann, P., and Zhang, C. (2017). High-dimensional simultaneous inference with the bootstrap. Test, 26(4):685–719.
  • Eftekhari, H., Banerjee, M., and Ritov, Y. (2021). Inference in high-dimensional single-index models under symmetric designs. Journal of Machine Learning Research, 22(27):1–63.
  • Fan, J., Guo, J., and Zheng, S. (2022). Estimating number of factors by adjusted eigenvalues thresholding. Journal of the American Statistical Association, 117(538):852–861.
  • Fan, J., Li, R., Zhang, C., and Zou, H. (2020). Statistical Foundations of Data Science. Chapman and Hall/CRC.
  • Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(4):603–680.
  • Fan, J., Liu, H., and Wang, W. (2018). Large covariance estimation through elliptical factor models. Annals of Statistics, 46(4):1383.
  • Fan, J., Lou, Z., and Yu, M. (2024). Are latent factor regression and sparse regression adequate? Journal of the American Statistical Association, 119(546):1076–1088.
  • Fan, J., Wang, W., and Zhong, Y. (2019). Robust covariance estimation for approximate factor models. Journal of Econometrics, 208(1):5–22.
  • Fan, J., Xue, L., and Yao, J. (2017). Sufficient forecasting using factor models. Journal of Econometrics, 201(2):292–306.
  • Fewell, Z., Davey Smith, G., and Sterne, J. A. (2007). The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. American Journal of Epidemiology, 166(6):646–655.
  • Guo, Z., Ćevid, D., and Bühlmann, P. (2022). Doubly debiased lasso: High-dimensional inference under hidden confounding. Annals of Statistics, 50(3):1320.
  • Hall, P. and Li, K.-C. (1993). On almost linearity of low dimensional projections from high dimensional data. Annals of Statistics, 21(2):867–889.
  • Han, D., Han, M., Huang, J., and Lin, Y. (2023). Robust inference for high-dimensional single index models. Scandinavian Journal of Statistics, 50(4):1590–1615.
  • Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, volume 2. Springer.
  • Jiang, F., Ma, Y., and Wei, Y. (2019). Sufficient direction factor model and its application to gene expression quantitative trait loci discovery. Biometrika, 106(2):417–432.
  • Kim, J.-O. and Mueller, C. W. (1978). Factor Analysis: Statistical Methods and Practical Issues, volume 14. Sage.
  • Kong, E. and Xia, Y. (2007). Variable selection for the single-index model. Biometrika, 94(1):217–229.
  • Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: inference for the number of factors. Annals of Statistics, 40(2):694–726.
  • Li, K.-C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327.
  • Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Annals of Statistics, 17(3):1009–1052.
  • Listgarten, J., Kadie, C., Schadt, E. E., and Heckerman, D. (2010). Correction for hidden confounders in the genetic analysis of gene expression. Proceedings of the National Academy of Sciences, 107(38):16465–16470.
  • Luo, R., Wang, H., and Tsai, C.-L. (2009). Contour projected dimension reduction. Annals of Statistics, 37:3743–3778.
  • Luo, W., Xue, L., Yao, J., and Yu, X. (2022). Inverse moment methods for sufficient forecasting using high-dimensional predictors. Biometrika, 109(2):473–487.
  • Ma, R., Cai, T., and Li, H. (2021). Global and simultaneous hypothesis testing for high-dimensional logistic regression models. Journal of the American Statistical Association, 116(534):984–998.
  • McCracken, M. W. and Ng, S. (2016). FRED-MD: A monthly database for macroeconomic research. Journal of Business and Economic Statistics, 34(4):574–589.
  • Ning, Y. and Liu, H. (2017). A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Annals of Statistics, 45(1):158–195.
  • Ouyang, J., Tan, K. M., and Xu, G. (2023). High-dimensional inference for generalized linear models with hidden confounding. Journal of Machine Learning Research, 24(296):1–61.
  • Plan, Y. and Vershynin, R. (2016). The generalized lasso with non-linear observations. IEEE Transactions on Information Theory, 62(3):1528–1537.
  • Radchenko, P. (2015). High dimensional single index models. Journal of Multivariate Analysis, 139:266–282.
  • Rejchel, W. and Bogdan, M. (2020). Rank-based Lasso – efficient methods for high-dimensional robust model selection. Journal of Machine Learning Research, 21(244):1–47.
  • Sun, Y., Ma, L., and Xia, Y. (2023). A decorrelating and debiasing approach to simultaneous inference for high-dimensional confounded models. Journal of the American Statistical Association, (to appear):1–24.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288.
  • Van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42(3):1166–1202.
  • Verzelen, N. (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electronic Journal of Statistics, 6:38–90.
  • Wang, T., Xu, P., and Zhu, L. (2012). Non-convex penalized estimation in high-dimensional models with single-index structure. Journal of Multivariate Analysis, 109:221–235.
  • Yang, W., Shi, H., Guo, X., and Zou, C. (2024). Robust group and simultaneous inferences for high-dimensional single index model. Manuscript.
  • Zhang, C. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(1):217–242.
  • Zhang, X. and Cheng, G. (2017). Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association, 112(518):757–768.
  • Zhu, L. and Zhu, L. (2009). Nonconcave penalized inverse regression in single-index models with high dimensional predictors. Journal of Multivariate Analysis, 100(5):862–875.

Figures

Figure 1: Power curves under the linear model (5.1) with $p=200$ and $\bm{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. “FAST_i”, “FAST_ii”, “FabTest_i” and “FabTest_ii” denote the results of the FAST in this paper and the FabTest in Fan et al., (2024) under settings (5.3) and (5.4) for generating $\bm{F}$, respectively. The first row presents the results for the original data, and the second row presents the results after adding outliers. The first column shows the results when the error follows $\mathrm{N}(0,0.25)$, and the second column shows the results when the error follows $t_{3}$.
Figure 2: Power curves under the linear model (5.1) with $p=500$ and $\bm{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. “FAST_i”, “FAST_ii”, “FabTest_i” and “FabTest_ii” denote the results of the FAST in this paper and the FabTest in Fan et al., (2024) under settings (5.3) and (5.4) for generating $\bm{F}$, respectively. The first row presents the results for the original data, and the second row presents the results after adding outliers. The first column shows the results when the error follows $\mathrm{N}(0,0.25)$, and the second column shows the results when the error follows $t_{3}$.
Figure 3: Power curves under the non-linear model (5.2) with $p=200$ and $\bm{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. “FAST_i”, “FAST_ii”, “FabTest_i” and “FabTest_ii” denote the results of the FAST in this paper and the FabTest in Fan et al., (2024) under settings (5.3) and (5.4) for generating $\bm{F}$, respectively. The first row presents the results for the original data, and the second row presents the results after adding outliers. The first column shows the results when the error follows $\mathrm{N}(0,0.25)$, and the second column shows the results when the error follows $t_{3}$.
Figure 4: Power curves under the non-linear model (5.2) with $p=500$ and $\bm{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. “FAST_i”, “FAST_ii”, “FabTest_i” and “FabTest_ii” denote the results of the FAST in this paper and the FabTest in Fan et al., (2024) under settings (5.3) and (5.4) for generating $\bm{F}$, respectively. The first row presents the results for the original data, and the second row presents the results after adding outliers. The first column shows the results when the error follows $\mathrm{N}(0,0.25)$, and the second column shows the results when the error follows $t_{3}$.
Figure 5: The relative errors of $\widehat{\bm{\beta}}_{h}$ and $\widehat{\bm{\beta}}$. Panels (a), (b) and (c) depict the estimation results for the linear model (5.1) with noise $\varepsilon$ following $\mathrm{N}(0,1)$, $\mathrm{Unif}(-3/2,3/2)$ and $t_{3}$, respectively. “FASIM_Lasso”, “SIM_Lasso” and “FA_Lasso” denote the relative errors of the parameter $\bm{\beta}_{h}$ under the FASIM in this paper, under the SIM without the factor effect in Rejchel and Bogdan, (2020), and of the parameter $\bm{\beta}$ under the FARM in Fan et al., (2024), respectively.
Figure 6: The relative errors of $\widehat{\bm{\beta}}_{h}$ and $\widehat{\bm{\beta}}$. Panels (a), (b) and (c) depict the estimation results for the nonlinear model (5.2) with noise $\varepsilon$ following $\mathrm{N}(0,1)$, $\mathrm{Unif}(-3/2,3/2)$ and $t_{3}$, respectively. “FASIM_Lasso”, “SIM_Lasso” and “FA_Lasso” denote the relative errors of the parameter $\bm{\beta}_{h}$ under the FASIM in this paper, under the SIM without the factor effect in Rejchel and Bogdan, (2020), and of the parameter $\bm{\beta}$ under the FARM in Fan et al., (2024), respectively.
Figure 7: The relative errors of $\widehat{\bm{\beta}}_{h}$ and $\widehat{\bm{\beta}}$ with outliers. Panels (a), (b) and (c) depict the estimation results for the linear model (5.1) with noise $\varepsilon$ following $\mathrm{N}(0,1)$, $\mathrm{Unif}(-3/2,3/2)$ and $t_{3}$, respectively. “FASIM_Lasso”, “SIM_Lasso” and “FA_Lasso” denote the relative errors of the parameter $\bm{\beta}_{h}$ under the FASIM in this paper, under the SIM without the factor effect in Rejchel and Bogdan, (2020), and of the parameter $\bm{\beta}$ under the FARM in Fan et al., (2024), respectively.
Figure 8: The relative errors of $\widehat{\bm{\beta}}_{h}$ and $\widehat{\bm{\beta}}$ with outliers. Panels (a), (b) and (c) depict the estimation results for the nonlinear model (5.2) with noise $\varepsilon$ following $\mathrm{N}(0,1)$, $\mathrm{Unif}(-3/2,3/2)$ and $t_{3}$, respectively. “FASIM_Lasso”, “SIM_Lasso” and “FA_Lasso” denote the relative errors of the parameter $\bm{\beta}_{h}$ under the FASIM in this paper, under the SIM without the factor effect in Rejchel and Bogdan, (2020), and of the parameter $\bm{\beta}$ under the FARM in Fan et al., (2024), respectively.

Tables

Table 1: The average computation time (unit: seconds). “FAST_i”, “FAST_ii”, “FabTest_i” and “FabTest_ii” denote the results of FAST and FabTest under settings (5.3) and (5.4) for generating $\bm{F}$, respectively.
p    ε                      FAST_i  FabTest_i  FAST_ii  FabTest_ii
200  $\mathrm{N}(0,0.25)$   1.50    46.33      1.60     44.53
200  $t_{3}$                1.61    45.62      1.48     47.08
500  $\mathrm{N}(0,0.25)$   3.72    156.00     3.75     156.53
500  $t_{3}$                3.65    208.86     3.73     146.21
Table 2: The empirical coverage probabilities and average lengths of the confidence intervals.
Model    p    ε                      CP     CP_S   CP_{S^C}  AL     AL_S   AL_{S^C}
Model 1  500  $\mathrm{N}(0,0.25)$   0.951  0.955  0.951     0.040  0.039  0.040
Model 1  500  $t_{3}$                0.947  0.950  0.947     0.066  0.064  0.066
Model 1  200  $\mathrm{N}(0,0.25)$   0.952  0.940  0.952     0.040  0.039  0.040
Model 1  200  $t_{3}$                0.946  0.943  0.946     0.067  0.066  0.067
Model 2  500  $\mathrm{N}(0,0.25)$   0.951  0.966  0.951     0.041  0.040  0.041
Model 2  500  $t_{3}$                0.946  0.930  0.946     0.069  0.068  0.069
Model 2  200  $\mathrm{N}(0,0.25)$   0.951  0.974  0.950     0.041  0.040  0.041
Model 2  200  $t_{3}$                0.947  0.956  0.947     0.067  0.066  0.067
Table 3: The $p$-values for the original and polluted datasets in the two time periods.
Time Period      Data      F_SIM   F_LM
1992.02-2007.10  original  0.0000  $2\times 10^{-3}$
1992.02-2007.10  polluted  0.0000  0.0000
2010.08-2020.02  original  0.0000  $1.7\times 10^{-2}$
2010.08-2020.02  polluted  0.0000  0.2015
Table 4: The MSE for the original and polluted datasets in the two time periods.
Time Period      Data      FARM    FASIM
1992.02-2007.10  original  0.2620  0.2764
1992.02-2007.10  polluted  1.0592  0.9251
2010.08-2020.02  original  0.3328  0.3748
2010.08-2020.02  polluted  0.9432  0.8764