Sparsity learning via structured functional factor augmentation
Abstract
As one of the most powerful tools for examining the association between functional covariates and a response, the functional regression model has been widely adopted in various interdisciplinary studies. Usually, a limited number of functional covariates are assumed in a functional linear regression model. Nevertheless, correlations may exist between functional covariates in high-dimensional functional linear regression models, which poses significant challenges to statistical inference and functional variable selection. In this article, a novel functional factor augmentation structure (fFAS) is proposed for multivariate functional series, and a multivariate functional factor augmentation selection model (fFASM) is further proposed to address issues arising from variable selection of correlated functional covariates. Theoretical justifications for the proposed fFAS are provided, and statistical inference results for the proposed fFASM are established. Numerical investigations support the superior performance of the novel fFASM in terms of estimation accuracy and selection consistency.
Keywords: correlated functional covariates, functional factor augmentation structure, functional variable selection, factor augmentation regression.
1 Introduction
Functional data, usually referred to as a sequential collection of instances over time with serial dependence, have been widely observed in various scenarios from different scientific disciplines, such as earth, medical, and social sciences (Peng et al., 2005; Centofanti et al., 2021). Instead of employing conventional time series modeling techniques, an underlying trajectory is usually assumed for a functional process and sampled at a set of time points in some interval, so that statistical modeling and inference are feasible, including estimation of the mean process and its intrinsic covariance structure, trajectory recovery, and prediction of a functional; see, for example, Ramsay & Silverman (2005); Yao et al. (2005); Ferraty (2006); Hall & Horowitz (2007); Hörmann & Kokoszka (2010); Cuevas (2014); Hörmann et al. (2015); Aneiros, Horová, Hušková & Vieu (2022); Petersen (2024), among others.
Conventionally, functional data analysis has focused on a single or a fixed number of separate functional series, while recent studies have explored associations between multiple functional series. The most widely adopted model may be the functional linear model, which links a response linearly with a set of covariates, at least one of which is functional, through unknown functional coefficient curves. Such a model offers a statistical tool to infer the linear association between the response and functional covariates. Several studies have proceeded along this path, employing parametric, non-parametric, and semi-parametric models; see, for instance, Cardot et al. (2003); Yao et al. (2005); Li & Hsing (2007); Müller & Yao (2008); Yuan & Cai (2010); Chen et al. (2011); Kong et al. (2016); Chen et al. (2022), among others.
However, two main issues arise in multivariate functional data analysis in practice. One is that functional series may be correlated with each other, especially in high dimensions or when observations are sparse, while current functional linear model analysis typically assumes independence between multiple functional covariates. To address this problem, efforts have been made, for example, to assume pairwise covariance structures for functional covariates explicitly (Chiou & Müller, 2016). However, these methods turn out to be less robust or accurate due to a lack of flexibility and computational efficiency. Alternatively, one can describe such associations by using a low-rank factor model (Bai, 2003; Fan et al., 2020), where all functional covariates are assumed to share a finite number of latent functional factors (Castellanos et al., 2015), without assuming any explicit correlation structure for multivariate functional covariates. A few studies have touched on such functional factor models, e.g., Hays et al. (2012); Chen et al. (2021); Yang & Ling (2024), but these methods provide limited exploration of common functional factors shared by functional covariates. To the best of our knowledge, very limited research has focused on how to capture common associations for multivariate functional series efficiently and effectively with inferential justifications from a statistical perspective.
Another issue is how to select useful functional covariates in multivariate functional linear regression models when correlations between functional covariates exist, which is also frequently needed in practice, such as Internet of Things (IoT) data analysis (Gonzalez-Vidal et al., 2019), antibiotic resistance prediction (Jiménez et al., 2020), non-small cell lung cancer (NSCLC) clinical studies (Fang et al., 2015), and stock market forecasting (Nti et al., 2019; Htun et al., 2023). A common way to achieve functional variable selection is to impose a penalty on the corresponding functional coefficient curves in a group-wise manner, using popular penalties such as the lasso (Tibshirani, 1996), SCAD (Fan & Li, 2001), and MCP (Zhang, 2010). A few studies have touched on functional variable selection in certain scenarios; see, for example, Matsui & Konishi (2011); Kong et al. (2016); Aneiros, Horová, Hušková & Vieu (2022), among others. However, when correlation exists between functional covariates, such a strategy turns out to be less accurate in functional coefficient estimation and fails to capture the truly useful covariates, as demonstrated in the simulation studies in this article. Consequently, it remains statistically challenging to select useful functional covariates with selection consistency in multivariate functional linear models with correlated functional covariates in high dimensions.
Inspired by the challenges above, we first propose a novel functional factor augmentation structure (fFAS) to capture associations in correlated multivariate functional data, and further propose a functional factor augmentation selection model (fFASM) to achieve selection of functional covariates in high dimensions when correlations exist between functional covariates, using the penalized method. Not only is the correlation addressed without assuming an explicit covariance structure, but theoretical properties of the estimated functional factors are also established. Further, in the presence of correlated functional covariates, the proposed fFASM method successfully captures the useful functional covariates simultaneously in the context of functional factor models. Numerical studies on both simulated and real datasets support the superior performance of the proposed fFASM. The main contributions of the proposed method are two-fold as follows.
•
We propose a feasible time-independent functional factor augmentation structure (fFAS) for functional data, and establish theoretical justifications for statistical inference, offering valuable insights for the current literature. Also, the impact of the truncation inherent in functional data analysis on the proposed fFAS is discussed, and solutions are provided to enhance the robustness and applicability of our model. A key result is how the difference between the true factor model and the estimated one, when using truncated expansions, may be addressed and controlled.
•
A multivariate functional linear regression model with correlated functional covariates is proposed, employing the proposed fFAS. As correlations often exist among multiple functional covariates in high dimensions but may be difficult to estimate, the proposed fFASM method decomposes the functional covariates into two parts with low correlation with each other, and simultaneously expands the dimension of the parameters to be estimated. In this way, our approach improves the performance of functional variable selection.
The rest of the paper is organized as follows. Section 2 introduces a novel fFAS for functional processes with its statistical properties, and the fFASM is further proposed in Section 3 for correlated functional covariates and functional variable selection in detail with theoretical justifications. Section 4 employs simulated data to examine the proposed method in various scenarios, and Section 5 presents its applications on two sets of real-world datasets. Section 6 concludes the article with discussions.
Notation: denotes the identity matrix; refers to the zero matrix; and represent the all-zero and all-one vectors in , respectively. For a vector , denotes its Euclidean norm. For a vector and , denote as its sub-vector. For a matrix and , define and , and its matrix entry-wise max norm is denoted as , and and as its Frobenius and induced -norms, respectively. Denote as the minimum eigenvalue of if it is symmetric. Let and be the gradient and Hessian operators. For and , define and . refers to the normal distribution with mean and covariance matrix .
2 A novel functional factor augmentation structure
2.1 Functional observations and its expansion
To start with, suppose that functional processes are observed and collected sequentially in a sample for the -th functional process from the -th subject at time in a certain time interval , and we use for abbreviation without confusion. Usually in the analysis of a single functional process, is assumed to consist of the underlying process and an independent noise , i.e., , where and are assumed to be independently and identically distributed (i.i.d.) with a mean function and a covariance structure, respectively, and . For convenience, we centralize these functional processes and still use the notation to represent them. To recover the functional trajectory , it is usually smoothed by assuming an expansion over a set of pre-specified orthogonal basis functions as
(1)
where are the time-invariant coefficients, , and is the residual orthogonal to . As is usually unknown in practice, an identifiable estimator can be obtained under some conditions, as shown in Lemma 1.
Lemma 1
For which has the following properties,
•
, where ,
•
for , , ,
•
,
•
all the eigenvalues of are less than , where is the smallest eigenvalue of ,
where the -th eigenfunction of is , so that the -th functional score .
Remark 1
Lemma 1 states that is a reasonable approximation of . All eigenvalues of are required to be less than , indicating that segregation can be conducted based on contribution to the variance of .
The whole recovery process can be achieved by the popular functional principal component analysis (fPCA) with the Karhunen–Loève (KL) expansion (Yao et al., 2005).
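To make the smoothing step concrete, the sketch below is a minimal illustration on simulated data (the basis size, grid, and noise level are hypothetical, not the paper's implementation): a noisy curve is projected onto a small Fourier basis by least squares to recover the time-invariant coefficients in (1).

```python
import numpy as np

def fourier_basis(t, K):
    """First K orthonormal Fourier functions on [0, 1]: 1, sqrt(2)cos(2*pi*k*t), sqrt(2)sin(2*pi*k*t)."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < K:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        if len(cols) < K:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 51)                 # dense observation grid on [0, 1]
B = fourier_basis(t, K=5)                 # 51 x 5 basis matrix
a_true = rng.normal(size=5)               # time-invariant expansion coefficients
x_obs = B @ a_true + 0.05 * rng.normal(size=51)   # noisy functional observation

# least-squares projection onto the basis recovers the coefficients
a_hat, *_ = np.linalg.lstsq(B, x_obs, rcond=None)
```

With a dense grid and small noise, `a_hat` is close to `a_true`; in practice fPCA replaces the fixed basis with estimated eigenfunctions.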
2.2 A functional factor augmentation structure
As correlations may exist between multivariate functional processes, we propose a functional factor augmentation structure (fFAS) to address the issue. Consider a simplified scenario with only two correlated functional processes, and , generated by
(2)
where and are independent of each other. Assume and are uncorrelated for , where , and hence the correlation between and arises only from that between and . To capture such a correlation, it is assumed that each shares common underlying factors using a linear combination
(9) |
where is a factor loading matrix with , is a vector of latent factors, and is an idiosyncratic component independent of , carrying only weak correlations. Note that the covariance . For model identifiability, it is assumed that and with a matrix , where , a constant depending on the distribution of . By the spectral decomposition, with the eigenvalues of and the corresponding eigenvectors . In this way, and in (2) can be expanded as
(10)
where . This indicates can be decomposed into two correlated parts, namely, the functional factor part and the weakly correlated part , plus an independent error term. We provide an example to illustrate such a fFAS when two functional covariates share a linear structure.
Example 1
Suppose and are associated with a linear structure as
with . Then it is obtained that
so that . With a truncation of the first and scores in and , and combining them as , and a matrix with elements and a diagonal matrix with elements , it is easily obtained that
where is an orthogonal matrix, so that is diagonal, and with . Note that no matter what values of and are in practice, one can always obtain such a functional factor augmentation structure.
Next, we consider a more general case where the structure contains correlations for , where . Then, Lemma 2 shows the relation between and in this circumstance: is still an approximation of , up to an orthogonal rotation induced by the basis functions in the K–L expansion.
Lemma 2
Under the conditions in Lemma 1 without , there exists an orthogonal matrix , such that , and
Furthermore, denote and . Then
where
and
Remark 2
Lemma 2 indicates that if there is a fFAS on , then there will be a fFAS on by the fact that
(11) |
and treating and as the updated loading matrix and the idiosyncratic component in (9), respectively, where is still an orthogonal matrix. More specifically, if has a fFAS induced by , also has such a fFAS with the same factor . Consequently, even if the functional covariates are correlated with each other, we can still use the functional scores to estimate as if they were uncorrelated.
To further obtain estimates of the loading matrix and the functional factors after obtaining the score matrix with in (11), we employ principal component analysis (Bai, 2003) on the covariance of , and further obtain that , where . More specifically, the columns of are the eigenvectors of corresponding to the top eigenvalues, and . This is similar to the case and , where and are the top eigenvalues in descending order and their associated eigenvectors of the sample covariance matrix.
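The principal-component step described above can be sketched on simulated data (a toy factor structure with hypothetical dimensions; the estimator is the standard one of Bai, 2003, not the paper's exact code): the loadings are the top eigenvectors of the sample covariance scaled by the square roots of the eigenvalues, and the fitted common component is the projection of the scores onto the top eigenspace.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 500, 12, 2                       # samples, stacked score dimension, latent factors
B_true = rng.normal(size=(p, K))           # loading matrix (hypothetical)
F = rng.normal(size=(n, K))                # latent factors
U = 0.1 * rng.normal(size=(n, p))          # weakly correlated idiosyncratic part
A = F @ B_true.T + U                       # stacked functional scores, n x p

# principal-component estimation: top-K eigenpairs of the sample covariance
S = np.cov(A, rowvar=False)
vals, vecs = np.linalg.eigh(S)
idx = np.argsort(vals)[::-1][:K]           # indices of the K largest eigenvalues
B_hat = vecs[:, idx] * np.sqrt(vals[idx])  # estimated loadings (up to rotation)
F_hat = A @ vecs[:, idx] / np.sqrt(vals[idx])  # estimated factors (up to rotation)

# the fitted common component approximates the true one
common_hat = F_hat @ B_hat.T
common_true = F @ B_true.T
```

Loadings and factors are only identified up to an orthogonal rotation, so the comparison is made on the common component rather than on the factors themselves.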
A practical issue is how to determine the number of factors . Given that latent factors, loadings, and idiosyncratic components are all unobservable, a conditional sparsity criterion is then adopted (Ahn & Horenstein, 2013; Fan et al., 2020). Specifically, let denote the -th largest eigenvalue of , be a prespecified upper bound for , and be a constant dependent on and , and then is determined by
(12) |
for a given and . An alternative criterion may be the information criteria proposed by Bai & Ng (2002) and Fan et al. (2013), referred to as PC and IC, respectively.
2.3 Properties of functional factor augmentation structure
In this subsection, we evaluate the estimation error of the proposed functional factors . Denote the first eigenvectors and eigenvalues of as and , respectively, where are sorted in descending order. The largest eigenvalue of is , where , the covariance matrix of , , equals from (11). The covariance can be further expanded using a perturbation
where is a perturbing matrix with such that
with and , being the first eigenvalues and eigenvectors of (Shi, 1997), respectively. Based on we have and . Accordingly, the following lemma shows the estimation error of the proposed functional structure factors for a given .
Lemma 3
For the fixed truncation numbers , if and for some constant , then
with and calculated based on .
Remark 3
In terms of errors induced by detecting functional structure factors, there are mainly two sources. One is from the estimation error of eigenvalues and eigenvectors, and the other is from the projection of in the column space of .
Furthermore, in real applications where only repeated observations are available for each subject, a common practice is to estimate a smooth trajectory and hence obtain , and . To further consider the estimation error for , more assumptions are required to describe the properties of the scores estimated by fPCA. For each functional covariate, denote the covariance function , and let be the -th eigenvalue of the covariance function . Assuming for , we have .
Assumption 1
(A1)
(The decay rates of eigenvalues) For , , for .
(A2)
(The common truncation parameter cannot be too large)
This implies that . As the covariance functions are bounded, one has .
Assumption 2
We defer to the Supplementary Material for Conditions (B1)–(B4) on the underlying processes . These conditions specify how data are sampled and smoothed.
Lemma 4 indicates that the convergence rate of estimating using is the same as that using . Combined with Lemma 3, the estimation error of the functional structure factors is formally established.
Theorem 1
With and , we have
2.4 Truncation analysis and the relationship between and
A practical issue when modeling functional data with sample instances is how to determine the number of basis functions via truncation, i.e., for , and how this truncation affects the estimate of functional factors in (9). Usually, would be determined by the cumulative contribution to the variance from the corresponding functional scores, which tends to yield an overestimate of , namely, . To illustrate, assume and for , and define where is a redundant variable for the factor model. In this case,
and
If is one of the first eigenvectors of , is an eigenvector of with the same eigenvalue, and the position of in corresponds to that of in . As , will not affect since the row of corresponding to is when . Actually, in this overestimating situation, we can treat
as the real model with and .
Another case is that may be underestimated for some specific , especially when and are relatively small. In this case, a majority of the variance of is concentrated in by (10). When are linearly independent for , is determined as by the variance contribution criterion. However, when they are correlated, an underestimation of may not significantly affect the estimation of . To illustrate, consider the covariance matrix of with as
with , and for convenience, we assume . Since is an matrix, has at most non-zero eigenvalues, denoted as . By plugging in , is written as
with and . Consequently, when , underestimating will not significantly affect as long as , which is easy to achieve in practice.
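The robustness to overestimating the truncation, argued above, can be checked numerically: appending a redundant score with zero loading barely perturbs the estimated common component on the original coordinates. The sketch below is a simulated sanity check with hypothetical dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, K = 500, 10, 2
B = rng.normal(size=(p, K))
F = rng.normal(size=(n, K))
A = F @ B.T + 0.1 * rng.normal(size=(n, p))             # scores under the factor model

# overestimated truncation: one extra score carrying no factor signal (zero loading row)
A_aug = np.column_stack([A, 0.1 * rng.normal(size=n)])

def common_part(A, K):
    """Projection of the scores onto the top-K eigenspace of their sample covariance."""
    vals, vecs = np.linalg.eigh(np.cov(A, rowvar=False))
    P = vecs[:, -K:] @ vecs[:, -K:].T                    # eigh returns ascending order
    return A @ P

C = common_part(A, K)
C_aug = common_part(A_aug, K)[:, :p]                     # drop the redundant coordinate
```

The common component estimated with and without the redundant score agrees up to a small perturbation, consistent with the zero-loading argument in the text.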
3 The functional factor augmentation selection model
3.1 The functional linear regression model
In this section, we address a multivariate functional linear regression model with correlated functional covariates, using the proposed fFAS in Section 2. To start with, consider a functional linear regression model where a scalar response with is generated by a group of functional covariates as
(13) |
where is the square-integrable regression parameter function, and is a random noise with a zero mean and a constant variance . We note as the intercept term. Thus, with given i.i.d. samples which have the same distribution as , the sample functional linear regression model is described as
(14) |
To detect active functional covariates with correlation, we develop a functional factor augmentation selection model (fFASM) as follows. Without loss of generality, we use and as the centered functional covariates and scalar response variable, respectively, and accordingly (14) is equivalent to
(15) |
and can be further expanded by K-L expansion as
where , and . For , has a zero mean. Consequently, selecting useful functional covariates amounts to finding such that , which is further assumed as . Plugging in the proposed fFAS, one easily obtains
or equivalently,
(16) |
and with , and obtained by the fFAS, (16) is further equivalent to
(17) |
where is the orthogonal projection matrix onto the column space , and . Consequently, to select useful functional covariates, the penalized loss function
is minimized with respect to , where is a penalty controlled by the parameter and can be chosen from popular penalties such as the lasso, SCAD, or MCP. Note that can be selected using cross-validation. Hence, we successfully transform the problem from model selection with highly correlated functional covariates to model selection with weakly correlated or uncorrelated ones by lifting the space to a higher dimension. Once is obtained, is estimated as
Accordingly, when , is selected as a useful functional covariate. Note that the group selection method, such as the GM strategy by Aneiros, Novo & Vieu (2022), is not adopted in our case.
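To illustrate the selection step, the sketch below mimics the projection idea in (17) on simulated data: the factor space is partialled out of both the response and the weakly correlated part, after which a plain l1 penalty (a stand-in for MCP) is applied via coordinate descent. The dimensions, the regularization level, and the use of the true factors in place of their estimates are all simplifying assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]                       # remove j-th contribution
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

rng = np.random.default_rng(4)
n, p, K = 300, 20, 2
B = rng.normal(size=(p, K))                           # factor loadings (hypothetical)
F = rng.normal(size=(n, K))                           # latent factors
U = rng.normal(size=(n, p))                           # weakly correlated part
X = F @ B.T + U                                       # correlated covariate scores
gamma = np.zeros(p); gamma[:3] = [2.0, -1.5, 1.0]     # only the first 3 covariates active
y = X @ gamma + 0.5 * rng.normal(size=n)

# partial out the factor space (true factors stand in for estimates), then penalize
Q, _ = np.linalg.qr(F)
P_perp = np.eye(n) - Q @ Q.T                          # projection onto the orthogonal complement
w = lasso_cd(P_perp @ U, P_perp @ y, lam=0.15)
selected = np.flatnonzero(np.abs(w) > 1e-6)
```

Because the factor part is unpenalized, partialling it out of the response and the idiosyncratic design is equivalent to fitting the augmented model, and the penalized regression then operates on weakly correlated covariates.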
Also, this procedure may work even in the generalized linear context. Indeed,
(18) |
indicating that the explanatory variables are switched from to . In practice, the sample mean of is used to substitute , which is valid under some regularity conditions (Kong et al., 2016). After centralizing by , the unknown parameters are transformed into , so that their corresponding covariates and are weakly correlated by introducing . Note that in linear cases, is further eliminated by using the projection matrix in (17). Consequently, for generalized linear models and samples without centralization, the loss function is updated as
(19) |
where , and , is the intercept term and is a known function, where in linear models. The estimate is given by .
3.2 Theoretical justifications for functional variable selection
In this section, the proposed method is theoretically investigated in the generalized linear model context of (3.1), and hence the linear model is covered as a special case. To start with, some notations and assumptions are introduced. Recall , and define , , , , , , and . Suppose and are obtained given . We write
(20) |
with and .
Assumption 3
(Smoothness). , i.e., for some constants and , and .
Assumption 4
(Restricted strong convexity and irrepresentable condition). For where , there exist and such that
(21)
(22) |
Assumption 5
(Estimation of factor model). for , and and for some constant . In addition, there exists a nonsingular matrix , and such that for , we have and , with and being the -th element of and .
Assumption 3 holds for a large family of generalized linear models. For example, the linear model has , , and ; the logistic model has and finite . Assumption 4 is easily satisfied with a small matrix and holds with high probability as long as satisfies similar conditions, by standard concentration inequalities (Merlevède et al., 2011). Assumption 5 indicates that the norm when and is bounded, and
will be satisfied with a high probability when is large and is small. Furthermore, with and obtained by minimizing (3.1), the following result is established.
Theorem 2
Theorem 2 guarantees the selection consistency of the functional covariates by developing sign consistency under some mild conditions. Furthermore, it demonstrates the relationship between and whose value depends on and , where comes from the first-order partial derivative of the empirical loss function (without penalty) when the parameters are known and satisfying a generalized irrepresentable condition from (22) (Lee et al., 2015). When is small, selecting an appropriate leads to satisfactory performance in the model estimation. Furthermore, when is not available, will be estimated and employed. Accordingly, the sign consistency is further guaranteed by showing that the loss function (3.1) based on will have an asymptotic property as follows.
Theorem 3
Corollary 1
Suppose has a unique global minimum. Then and will have an identical global minimizer when is sufficiently large, so that the estimates from will also have sign consistency.
4 Simulation Studies
In this section, the performance of the proposed fFASM estimator is examined for correlated functional data using simulation studies. To start with, the functional covariates are generated by with independently and identically distributed (i.i.d.) draws from , and sampled over 51 uniformly distributed grid points for from , with an interval length of 0.02, where is set as 20, 50, 100, and 150, respectively. The basis functions form a Fourier basis with set as 10. To blend correlations into the functional covariates , two scenarios are considered when is generated:
•
Scenario I: the factor model case, i.e., where the elements of , and are generated from , , and , respectively. The true number of factors is set as , respectively;
•
Scenario II: the equal correlation case, i.e., are generated from a multivariate normal distribution where has diagonal elements as 1 and off-diagonal elements as . The true value of is set as , respectively.
Furthermore, the true functional coefficients in (15) are generated as (the Harmonic attenuation type), (the weak single signal type), and for . Accordingly, the response is generated by (15) with .
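The data-generating mechanism of Scenario I can be sketched roughly as follows (a simplified illustration with hypothetical dimensions and a cosine-only basis, not the exact simulation design): factor-structured scores are drawn first and then mapped to sampled trajectories through a basis expansion.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, K, M = 20, 100, 3, 10              # covariates, samples, factors, basis functions
t = np.linspace(0, 1, 51)                # 51 uniform grid points

# factor-structured scores: a_i = B f_i + u_i, stacked over the p covariates
B = rng.normal(size=(p * M, K))
F = rng.normal(size=(n, K))
U = rng.normal(size=(n, p * M))
scores = F @ B.T + U                     # n x (p*M), correlated across covariates

# map the scores to curves through a basis expansion (hypothetical basis choice)
basis = np.column_stack(
    [np.ones_like(t)] +
    [np.sqrt(2) * np.cos(2 * np.pi * k * t) for k in range(1, M)]
)                                        # 51 x M
X = scores.reshape(n, p, M) @ basis.T    # n x p x 51 sampled trajectories
```

The shared factors in the scores induce correlation between the resulting functional covariates, which is exactly the structure the fFASM is designed to exploit.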
We compare the performance of the proposed fFASM method (with the MCP penalty) with other functional variable selection methods, including the MCP (MCP) and group MCP (grMCP) methods, on the synthetic data in different scenarios. To evaluate the performance, three different measurements are adopted, i.e., the model size defined as the cardinality of the set , the integrated mean squared error (IMSE)
and the true positive rate (TPR)
where TP represents the number of correctly predicted nonzero instances of (events that are actually positive and predicted as positive), and FN the number of instances that are actually nonzero but predicted as zero. Note that a model with a TPR of 1 and a model size of 6 indicates perfect recovery. The proposed fFASM method employs MCP regularization, and the hyperparameters are tuned by minimizing the cross-validation error on a randomly selected subset of training samples, where the sample size is set as 100. The whole experiment is repeated 500 times, and the averaged results are reported.
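The model size and TPR computations can be expressed compactly; the sketch below treats the norms of the estimated coefficient functions as the selection criterion (a toy example with made-up numbers, only to pin down the definitions).

```python
import numpy as np

def selection_metrics(beta_hat_norms, beta_true_norms, tol=1e-8):
    """Model size and true positive rate from coefficient-function norms."""
    sel = beta_hat_norms > tol
    truth = beta_true_norms > tol
    tp = np.sum(sel & truth)             # truly nonzero and selected
    fn = np.sum(~sel & truth)            # truly nonzero but missed
    return int(sel.sum()), tp / (tp + fn)

# toy check: 6 true signals; the estimator finds 5 of them plus 2 spurious ones
truth = np.array([1.0] * 6 + [0.0] * 14)
est = np.array([0.9, 0.8, 0.0, 1.1, 0.7, 0.5] + [0.0] * 12 + [0.3, 0.2])
size, tpr = selection_metrics(est, truth)
```

Here the model size is 7 and the TPR is 5/6, matching the definitions in the text; the IMSE would additionally require integrating the squared difference of the coefficient curves over the time grid.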
[Figure 1]
Figure 1 shows the estimation and selection performance for Scenario I (the factor model structure case) using different methods. As is easily observed, the proposed fFASM method significantly and consistently outperforms MCP and grMCP in the sense that fFASM has the smallest IMSE for each , demonstrating more accurate estimation of the functional coefficients. Furthermore, the TPR of the proposed fFASM method is always greater than those of MCP and grMCP as increases, and MCP and grMCP even show dramatic drops when increases. In terms of model size, the proposed fFASM tends to select slightly more functional covariates than the true model size 6, while a closer look at the selection results shows that the MCP and grMCP methods tend to select irrelevant functional covariates, with grMCP being more conservative in selecting functional covariates than MCP.
[Figure 2]
Furthermore, Figure 2 shows the estimation and selection results in Scenario II (the equal correlation case). The IMSE values of the proposed fFASM method remain at a low level as the correlation increases in each subfigure of the first row, indicating robust estimation performance, while those of MCP and grMCP tend to increase, demonstrating less satisfactory estimation performance. Although the two methods show slightly better results when the functional covariates have small correlations with each other, this is expected, since equal correlation implies a small number of factors, which may be captured easily. In terms of TPR, the fFASM method again stays consistently close to 1 as the correlation varies, while the TPRs of the competitor methods drop dramatically. Also, when the number of functional covariates increases, the TPRs of all three methods drop, unsurprisingly. Regarding model size, the proposed fFASM remains stable, though it may slightly overestimate the model size, while the competitor methods show much less stable model sizes with little consistency.
Additionally, the selection frequencies of the two types of functional coefficients are examined, namely the Harmonic attenuation type (Type1) and the weak single signal type (Type2), displayed in Figures 3 and 4, respectively. Note that the number of true Type1 functional covariates is 4 and that of Type2 is 2 in both scenarios. On one hand, Figure 3 shows that the selection frequency of the proposed fFASM method decreases first and then increases as the number of factors increases for both types of functional covariates, while those of MCP and grMCP tend to 0, especially in the Type2 setting, where the two competitor methods show lower starting points. Note that the selection frequencies of the proposed method for both types are not extraordinarily high, since the functional linear model suffers from some lack of fit due to the complexity of the true model. On the other hand, in Scenario II, Figure 4 shows much better selection consistency of the proposed method than grMCP and MCP for both types, even though Type2 may still be selected less often because of a slight violation of the norm requirement in Theorem 2.
[Figure 3]
[Figure 4]
5 Real Data Application
5.1 Effects of macroeconomic covariates on lifetime expectancy
In this section, the effects of macroeconomic covariates on national expected lifetime are explored for European and American countries using the open-sourced EPS data platform (https://www.epsnet.com.cn/). The data have been collected and documented annually over a span of 21 years, from 2000 to 2020, for 40 countries or regions. Our focus is the average lifetime expectancy as a scalar dependent variable against 33 macroeconomic functional covariates, such as gross domestic product, healthcare expenditure, and educational attainment. The functional covariates are correlated with each other. For example, higher levels of educational attainment often correlate with higher GDP, since a more educated workforce usually contributes to greater economic productivity. To estimate lifetime expectancy and find useful functional covariates, the proposed fFASM method and the MCP and grMCP methods are employed. Specifically, to recover the functional trajectories, year points from 2000 to 2020 are used as grid points. During the study, the collected data for 30 randomly selected countries (or regions) out of 40 are used as the training set to build the model, and the remaining 10 are used as the test set. The whole procedure is repeated 200 times, and the average out-of-sample and model size are reported in Table 1 and the top five selected functional covariates in Table 2.
The results of both the prediction performance in Table 1 and the variable selection in Table 2 strongly support the benefits of the proposed fFASM. The average out-of-sample of fFASM is the greatest among the three methods, indicating its competitive prediction accuracy. Further, the variables selected by fFASM are closely related to economic development and social well-being, providing insights into education and labor market conditions, which may have a significant influence on the average life expectancy of a nation. Although all three methods successfully select the key variable “Employment” with the highest selection frequencies, “Adult literacy rate” and “Total fertility rate” are both selected with relatively large frequencies by fFASM and MCP, while grMCP fails to select them frequently. Additionally, the fFASM method selects “Gross national income per capita” and “Unemployment rate”, which have little correlation with other variables, while MCP selects “Labor force”, which is highly correlated with “Employment” and “Unemployment rate”. grMCP focuses only on “Employment” and ignores almost all other variables, with selection rates lower than 5%.
| | fFASM | MCP | grMCP |
| --- | --- | --- | --- |
| average out-of-sample | 0.288 | 0.223 | 0.227 |
| average model size | 1.95 | 1.53 | 1.04 |
Table 2: Top five selected functional covariates and their selection frequencies over 200 repetitions.

fFASM

| Rank | Name | Frequency |
| --- | --- | --- |
| 1 | Employment (million people) | 160 |
| 2 | Adult literacy rate (age 15 and above) (%) | 56 |
| 3 | Total fertility rate | 54 |
| 4 | Unemployment rate (% of total labor force) | 21 |
| 5 | Gross national income per capita (current USD) | 17 |

MCP

| Rank | Name | Frequency |
| --- | --- | --- |
| 1 | Employment (million people) | 166 |
| 2 | Total fertility rate | 57 |
| 3 | Adult literacy rate (age 15 and above) (%) | 37 |
| 4 | Labor force (people) | 16 |
| 5 | Unemployment rate (% of total labor force) | 7 |

grMCP

| Rank | Name | Frequency |
| --- | --- | --- |
| 1 | Employment (million people) | 187 |
| 2 | Prevalence of undernourishment (% of total population) | 7 |
| 3 | Adult literacy rate (age 15 and above) (%) | 3 |
| 4 | Gross national income per capita (current USD) | 2 |
| 5 | Per capita health expenditure (current USD) | 1 |
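The evaluation protocol above (random training/test splits repeated 200 times, averaging an out-of-sample score and the size of the selected model) can be sketched as follows. This is a minimal illustration only: it assumes the out-of-sample measure is predictive R² on the held-out units and that a fitted model is summarized by its coefficient vector; neither detail is confirmed by the text, and the function names are ours.

```python
import numpy as np

def out_of_sample_r2(y_true, y_pred):
    """Out-of-sample R^2: 1 - SSE/SST, with SST taken about the test-set mean."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - sse / sst

def repeated_split_eval(X, y, fit, predict, n_train, n_reps=200, seed=0):
    """Average out-of-sample R^2 and model size over repeated random splits.

    `fit` returns a coefficient vector; model size is its number of nonzeros.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    r2s, sizes = [], []
    for _ in range(n_reps):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        model = fit(X[tr], y[tr])
        r2s.append(out_of_sample_r2(y[te], predict(model, X[te])))
        sizes.append(np.count_nonzero(model))
    return np.mean(r2s), np.mean(sizes)
```

In the study above, `fit` would be the fFASM, MCP, or grMCP estimator applied to the truncated functional covariates; any estimator returning a coefficient vector can be plugged in.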
5.2 Prediction of sales areas of commercial houses
In this section, we focus on predicting the annual average sales area of commercial houses in each province of China. Data are collected from the National Bureau of Statistics of China (https://data.stats.gov.cn/) on a monthly basis for 31 administrative regions of mainland China in the years 2022 and 2023. The dependent variable is the annual average sales area of commercial houses, regressed against 60 functional covariates related to industrial output, including cement, medium tractors, engines, and aluminum materials. To recover the functional trajectories, 24 grid points are predetermined on a monthly basis from 2022 to 2023. To predict the response, the proposed fFASM method and the MCP and grMCP methods are employed. During the study, the collected data for 20 randomly selected provinces are used as the training set to build the model, and the remaining 11 serve as the test set to evaluate prediction performance, with the whole procedure repeated 200 times. The average out-of-sample performance and average model sizes are reported in Table 3, and the most frequently selected functional covariates in Table 4.
From Table 3, the proposed fFASM method shows better performance than MCP and grMCP, with a larger average out-of-sample performance and a smaller average model size. In Table 4, all three methods select “Cement” as the most frequently chosen functional covariate over the 200 repetitions, which is expected since cement is the most important raw material for house construction. Similarly, the functional covariate “Aluminum Materials” is selected among the top five by all methods. The fFASM and MCP methods further select “Medium Tractors” and “Engines”, which are closely related to construction machinery used in the house-building industry, while grMCP selects “Mechanized Paper”, “Hydropower Generation” and “Phosphate Rock”, which seem to have limited connection with house construction. This may explain why grMCP has a larger model size but relatively poorer average out-of-sample performance.
Table 3: Average out-of-sample performance and average model size over 200 repetitions.

|  | fFASM | MCP | grMCP |
| --- | --- | --- | --- |
| average out-of-sample | 0.612 | 0.601 | 0.355 |
| average model size | 1.35 | 2.20 | 2.55 |
Table 4: Most frequently selected functional covariates and their selection frequencies over 200 repetitions.

fFASM

| Rank | Name | Frequency |
| --- | --- | --- |
| 1 | Cement | 198 |
| 2 | Medium Tractors | 13 |
| 3 | Engines | 12 |
| 3 | Aluminum Materials | 12 |
| 5 | Chemical Pesticides (Active Ingredients) | 4 |

MCP

| Rank | Name | Frequency |
| --- | --- | --- |
| 1 | Cement | 199 |
| 2 | Medium Tractors | 48 |
| 3 | Aluminum Materials | 19 |
| 4 | Computers | 18 |
| 4 | Engines | 18 |

grMCP

| Rank | Name | Frequency |
| --- | --- | --- |
| 1 | Cement | 137 |
| 2 | Aluminum Materials | 97 |
| 3 | Mechanized Paper | 68 |
| 4 | Hydropower Generation | 30 |
| 5 | Phosphate Rock | 20 |
6 Conclusion
In this article, a novel functional factor augmentation structure (fFAS) is proposed to capture associations among correlated functional processes, and a functional factor augmentation selection model (fFASM) is further developed to select useful functional covariates in high dimensions with correlated functional covariates. Not only is the correlation between functional covariates addressed without assuming an explicit covariance structure, but theoretical properties of the estimated functional factors are also established. We primarily discuss the rationale for constructing an fFAS, how to estimate the fFAS and its estimation error, and the impact of truncating functional data on the validity and estimation of the factor model. Due to the unique characteristics of functional data, the assumed factor model and the actually estimated factor model may differ.
Numerical investigations on both simulated and real datasets support the superior performance of the proposed fFASM method. Our method is found to outperform general functional variable selection methods when dealing with variable selection for correlated multivariate functional covariates. A practical issue is how to determine the model size, as our method may select slightly more functional covariates in simulation studies, which may be a trade-off for modeling correlated functional processes in high dimensions.
References
- Ahn & Horenstein (2013) Ahn, S. C. & Horenstein, A. R. (2013), ‘Eigenvalue ratio test for the number of factors’, Econometrica 81(3), 1203–1227.
- Aneiros, Horová, Hušková & Vieu (2022) Aneiros, G., Horová, I., Hušková, M. & Vieu, P. (2022), ‘On functional data analysis and related topics’, Journal of Multivariate Analysis 189, 104861.
- Aneiros, Novo & Vieu (2022) Aneiros, G., Novo, S. & Vieu, P. (2022), ‘Variable selection in functional regression models: A review’, Journal of Multivariate Analysis 188, 104871.
- Bai (2003) Bai, J. (2003), ‘Inferential theory for factor models of large dimensions’, Econometrica 71(1), 135–171.
- Bai & Ng (2002) Bai, J. & Ng, S. (2002), ‘Determining the number of factors in approximate factor models’, Econometrica 70(1), 191–221.
- Cardot et al. (2003) Cardot, H., Ferraty, F. & Sarda, P. (2003), ‘Spline estimators for the functional linear model’, Statistica Sinica pp. 571–591.
- Castellanos et al. (2015) Castellanos, L., Vu, V. Q., Perel, S., Schwartz, A. B. & Kass, R. E. (2015), ‘A multivariate Gaussian process factor model for hand shape during reach-to-grasp movements’, Statistica Sinica 25(1), 5.
- Centofanti et al. (2021) Centofanti, F., Lepore, A., Menafoglio, A., Palumbo, B. & Vantini, S. (2021), ‘Functional regression control chart’, Technometrics 63(3), 281–294.
- Chen et al. (2022) Chen, C., Guo, S. & Qiao, X. (2022), ‘Functional linear regression: dependence and error contamination’, Journal of Business & Economic Statistics 40(1), 444–457.
- Chen et al. (2011) Chen, D., Hall, P. & Müller, H.-G. (2011), ‘Single and multiple index functional regression models with nonparametric link’.
- Chen et al. (2021) Chen, L., Wang, W. & Wu, W. B. (2021), ‘Dynamic semiparametric factor model with structural breaks’, Journal of Business & Economic Statistics 39(3), 757–771.
- Chiou & Müller (2016) Chiou, J.-M. & Müller, H.-G. (2016), ‘A pairwise interaction model for multivariate functional and longitudinal data’, Biometrika 103(2), 377–396.
- Cuevas (2014) Cuevas, A. (2014), ‘A partial overview of the theory of statistics with functional data’, Journal of Statistical Planning and Inference 147, 1–23.
- Fan et al. (2020) Fan, J., Ke, Y. & Wang, K. (2020), ‘Factor-adjusted regularized model selection’, Journal of Econometrics 216(1), 71–85.
- Fan & Li (2001) Fan, J. & Li, R. (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’, Journal of the American Statistical Association 96(456), 1348–1360.
- Fan et al. (2013) Fan, J., Liao, Y. & Mincheva, M. (2013), ‘Large covariance estimation by thresholding principal orthogonal complements’, Journal of the Royal Statistical Society Series B: Statistical Methodology 75(4), 603–680.
- Fang et al. (2015) Fang, L., Zhao, H., Wang, P., Yu, M., Yan, J., Cheng, W. & Chen, P. (2015), ‘Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data’, Biomedical Signal Processing and Control 21, 82–89.
- Ferraty (2006) Ferraty, F. (2006), Nonparametric functional data analysis, Springer.
- Gonzalez-Vidal et al. (2019) Gonzalez-Vidal, A., Jimenez, F. & Gomez-Skarmeta, A. F. (2019), ‘A methodology for energy multivariate time series forecasting in smart buildings based on feature selection’, Energy and Buildings 196, 71–82.
- Hall & Horowitz (2007) Hall, P. & Horowitz, J. L. (2007), ‘Methodology and convergence rates for functional linear regression’.
- Hays et al. (2012) Hays, S., Shen, H. & Huang, J. Z. (2012), ‘Functional dynamic factor models with application to yield curve forecasting’, The Annals of Applied Statistics pp. 870–894.
- Hörmann et al. (2015) Hörmann, S., Kidziński, Ł. & Hallin, M. (2015), ‘Dynamic functional principal components’, Journal of the Royal Statistical Society Series B: Statistical Methodology 77(2), 319–348.
- Hörmann & Kokoszka (2010) Hörmann, S. & Kokoszka, P. (2010), ‘Weakly dependent functional data’.
- Htun et al. (2023) Htun, H. H., Biehl, M. & Petkov, N. (2023), ‘Survey of feature selection and extraction techniques for stock market prediction’, Financial Innovation 9(1), 26.
- Jiménez et al. (2020) Jiménez, F., Palma, J., Sánchez, G., Marín, D., Francisco Palacios, M. & Lucía López, M. (2020), ‘Feature selection based multivariate time series forecasting: An application to antibiotic resistance outbreaks prediction’, Artificial Intelligence in Medicine 104, 101818.
- Kong et al. (2016) Kong, D., Xue, K., Yao, F. & Zhang, H. H. (2016), ‘Partially functional linear regression in high dimensions’, Biometrika 103(1), 147–159.
- Lee et al. (2015) Lee, J. D., Sun, Y. & Taylor, J. E. (2015), ‘On model selection consistency of regularized m-estimators’.
- Li & Hsing (2007) Li, Y. & Hsing, T. (2007), ‘On rates of convergence in functional linear regression’, Journal of Multivariate Analysis 98(9), 1782–1804.
- Matsui & Konishi (2011) Matsui, H. & Konishi, S. (2011), ‘Variable selection for functional regression models via the L1 regularization’, Computational Statistics & Data Analysis 55(12), 3304–3310.
- Merlevède et al. (2011) Merlevède, F., Peligrad, M. & Rio, E. (2011), ‘A Bernstein type inequality and moderate deviations for weakly dependent sequences’, Probability Theory and Related Fields 151, 435–474.
- Müller & Yao (2008) Müller, H.-G. & Yao, F. (2008), ‘Functional additive models’, Journal of the American Statistical Association 103(484), 1534–1544.
- Nti et al. (2019) Nti, K. O., Adekoya, A. & Weyori, B. (2019), ‘Random forest based feature selection of macroeconomic variables for stock market prediction’, American Journal of Applied Sciences 16(7), 200–212.
- Peng et al. (2005) Peng, R. D., Dominici, F., Pastor-Barriuso, R., Zeger, S. L. & Samet, J. M. (2005), ‘Seasonal analyses of air pollution and mortality in 100 US cities’, American Journal of Epidemiology 161(6), 585–594.
- Petersen (2024) Petersen, A. (2024), ‘Mean and covariance estimation for discretely observed high-dimensional functional data: Rates of convergence and division of observational regimes’, Journal of Multivariate Analysis p. 105355.
- Ramsay & Silverman (2005) Ramsay, J. & Silverman, B. (2005), Functional Data Analysis, Springer Series in Statistics, Springer.
- Shi (1997) Shi, L. (1997), ‘Local influence in principal components analysis’, Biometrika 84(1), 175–186.
- Tibshirani (1996) Tibshirani, R. (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288.
- Yang & Ling (2024) Yang, S. & Ling, N. (2024), ‘Robust estimation of functional factor models with functional pairwise spatial signs’, Computational Statistics pp. 1–24.
- Yao et al. (2005) Yao, F., Müller, H.-G. & Wang, J.-L. (2005), ‘Functional linear regression analysis for longitudinal data’, The Annals of Statistics 33(6), 2873 – 2903.
- Yuan & Cai (2010) Yuan, M. & Cai, T. T. (2010), ‘A reproducing kernel Hilbert space approach to functional linear regression’.
- Zhang (2010) Zhang, C.-H. (2010), ‘Nearly unbiased variable selection under minimax concave penalty’, Annals of Statistics 38(2), 894–942.
SUPPLEMENTARY MATERIAL
7 Regularity Condition
In this section, we introduce some regularity conditions (Kong et al. 2016). Without loss of generality, we assume that the functional covariates have been centred to have zero mean. For definiteness, we consider the local linear smoother for each set of subjects using bandwidths , and denote the smoothed trajectories by .
Condition (B1) consists of regularity assumptions for the functional data. Condition (B2) is standard for the local linear smoother, and (B3)–(B4) concern how the functional predictors are sampled and smoothed.
-
(B1)
For , for any there exists an such that
For each integer is bounded uniformly in .
-
(B2)
For is twice continuously differentiable on with probability 1 , and , where denotes the second derivative of . The following condition concerns the design on which is observed and the local linear smoother . When a function is said to be smooth, we mean that it is continuously differentiable to an adequate order.
-
(B3)
For are considered deterministic and ordered increasingly for . There exist densities uniformly smooth over , satisfying and that generate according to , where is the inverse of . For each , there exist a common sequence of bandwidths such that , where is the bandwidth for . The kernel density function is smooth and compactly supported.
Let , let and . The condition below is to let the smooth estimate serve as well as the true in the asymptotic analysis, denoting by .
-
(B4)
For .
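Conditions (B2)–(B4) describe recovering each trajectory by a local linear smoother with bandwidths and a smooth, compactly supported kernel. As a concrete illustration, a minimal local linear smoother might look as follows; the Epanechnikov kernel and the square-root-weight least-squares formulation are choices made here for illustration, not ones prescribed by the conditions.

```python
import numpy as np

def local_linear_smooth(t_obs, x_obs, t_grid, h, kernel=None):
    """Local linear smoother: at each grid point t, fit a weighted line to
    (t_obs, x_obs) with kernel weights K((t_obs - t)/h); the fitted intercept
    is the smoothed trajectory value at t."""
    if kernel is None:
        # Epanechnikov kernel: smooth and compactly supported, as in (B3)
        kernel = lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)
    out = np.empty(len(t_grid))
    for i, t in enumerate(t_grid):
        d = t_obs - t
        sw = np.sqrt(kernel(d / h))
        # Weighted least squares x ~ a + b*d via sqrt-weighted design matrix
        Z = np.column_stack([np.ones_like(d), d])
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], x_obs * sw, rcond=None)
        out[i] = coef[0]  # intercept = fitted value at t (since d = t_obs - t)
    return out
```

Local linear fitting reproduces linear trajectories exactly within the kernel window, which is the property that lets the smoothed curves stand in for the true ones in the asymptotic analysis.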
8 Proof of Theorem
Proof of Lemma 3:
Proof: consists of the first eigenvalues of . consists of the first eigenvalues, and denote the -th eigenvalue of by .
Then we bound each part separately. For the first part, as , we have
where is a constant. For the second part,
For the third part, with
To sum up, we obtain that
For , with
that we have
It means
where
Then we bound each part separately:
To sum up, after removing the constant we obtain that
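The argument above works with the leading eigenvalues of a sample covariance estimate, from which the factors and loadings are recovered. A minimal numerical sketch of this standard PCA-type construction (top-k eigenvectors of the sample covariance, in the spirit of Bai & Ng (2002); the function names are ours and the sketch is not the paper's exact estimator) follows.

```python
import numpy as np

def estimate_factors(X, k):
    """Estimate k latent factors and loadings from an n x p data matrix via
    the top-k eigenvectors of the sample covariance (equivalently, PCA).
    Returns (F, B) with F (n x k) factor scores, B (p x k) loadings, X ~ F B'.
    """
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    cov = Xc.T @ Xc / n                 # p x p sample covariance
    vals, vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    B = vecs[:, order]                  # loadings: leading eigenvectors
    F = Xc @ B                          # factor scores by projection
    return F, B

def idiosyncratic(X, k):
    """Residual (idiosyncratic) component after removing k estimated factors."""
    F, B = estimate_factors(X, k)
    return (X - X.mean(axis=0)) - F @ B.T
```

When the data truly follow a k-factor structure plus small noise, the residual component is small relative to the centred data, mirroring the eigenvalue bounds being established here.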
Proof of Lemma 4:
Proof:
Under (A1)–(A2) and (B1)–(B4), by Lemma 2(a) in Kong et al. (2016),
the analysis for is equivalent to analyzing the following .
Proof of Theorem 1
Proof: We view the estimation error as perturbation
where represents the estimation error, and the equation holds with high probability. is a bounded random matrix since is bounded. Consider
Following the steps in the proof of Lemma 3, we have
Proof of Theorem 2
Define , , and . We easily see that and
Then it follows that and for any norm .
Consequently, Theorem 2 reduces to studying and the loss function . Lemma 5 below implies that all the regularity conditions (with ) in Lemma 6 are satisfied.
Let and be the -th element of and , respectively. Observe that and . Hence and consequently, and . In addition, .
Based on these estimates, all the results follow from Theorem 2 and some simple algebra.
Here we present the Lemma 5 used above and its proof.
Lemma 5
Proof of Lemma 5 (i) Based on the fact that , we have and . For any and ,
(24) |
By the Cauchy–Schwarz inequality and , we obtain that for . Plugging this result back into (8), we get
(ii) Now we come to the second claim. For any ,
Also, by and we have
Define . By Jensen’s inequality, ,
As a result,
(25) |
Let . Then
(26) |
Lemma 7 yields
We also have a cruder bound , which leads to
(27) |
(iii) The third claim follows easily from (27). Since holds for any symmetric matrix , we have and thus . (iv) Finally, we prove the last inequality. On the one hand,
From claim (ii) and (8) it is easy to see that
We have shown above in (8) that . As a result,
By combining these estimates, we have
Therefore .
Finally, we use the following lemma to prove the theorem.
Lemma 6
Proof of Lemma 6
Define for . We need the following lemmas to prove Lemma 6.
Lemma 7
Suppose and and , where is an induced norm. Then .
Lemma 8
Under (i) (ii) (iii) in Lemma 5, we have and over .
Lemma 9
Suppose is a Euclidean space, and is convex, and is convex. In addition, there exist such that as long as A. If , then has unique minimizer and .
Lemma 10
Suppose and is convex. is convex and for and , where is a linear subspace of and is its orthonormal complement. In addition, there exists such that for and . Let where , and .
If and for all , then is the unique global minimizer of .
Now we start the proof. First, we study the restricted problem
where is the oracle parameter set. Take and . Let and hence . Lemma 8 shows that and over .
Second, we study the bound. On the one hand, the optimality condition yields and hence . On the other hand, by letting we have
Hence
By (i) (ii) (iii) in Lemma 5, we obtain that
By we have
Therefore,
(28) |
Third we study the bound. The bound on can be obtained in a similar way. Using the fact that for symmetric matrices,
Hence . Since , we also have
This gives another bound. By Lemma 10, to derive it remains to show that . Using the Taylor expansion we have
(29) | ||||
On the one hand, the first term in (29) follows,
By the Taylor expansion, the triangle inequality, (i) in Lemma 5, and the fact that ,
On the other hand, we bound the second term in (29). Note that for all . (i) in Lemma 5 yields
As a result,
Recall the bound in (28). Plugging in this estimate, and using the assumptions and , we derive that
This implies and translates all the bounds for into the ones for . The claim on sign consistency follows from elementary computation, so we omit its proof.
Proof of Theorem 3
Proof: Write the loss function (3.1) as . We write to indicate that and of the loss function are calculated by , and similarly for . Note . For a fixed , have the same distribution.
With in probability, we have .