The conditionally studentized test for high-dimensional parametric regressions
Abstract
This paper studies model checking for general parametric regression models that have no dimension reduction structure on the predictor vector. Using any U-statistic-type test as an initial test, we combine the sample-splitting and conditional studentization approaches to construct a COnditionally Studentized Test (COST). Whether the initial test is global or local smoothing-based, and whether the dimension of the predictor vector and the number of parameters are fixed or diverge at certain rates, the proposed test always has a normal weak limit under the null hypothesis. When the dimension of the predictor vector diverges to infinity faster than the number of parameters, and even faster than the sample size, these results remain available under some conditions. This shows the potential of our method to handle higher-dimensional problems. Further, the test can detect local alternatives distinct from the null hypothesis at the fastest possible rate of convergence in hypothesis testing. We also discuss the sample splitting that is optimal for power performance. The numerical studies offer information on the merits and limitations of the test in finite-sample cases, including the setting where the dimension of the predictor vector equals the sample size. As a generic methodology, it could be applied to other testing problems.
Keywords: Asymptotic model-free test; conditional studentization; high dimensions; model checking; sample-splitting.
1 Introduction
Consider the general parametric regression model
$$Y = g(X, \theta) + \varepsilon, \qquad E(\varepsilon \mid X) = 0,$$
where $g(\cdot, \cdot)$ is a known function of $X$ and $\theta$, and $\theta$ is an unknown parameter vector. $X$ is a predictor vector in $\mathbb{R}^{q}$, while $Y$ represents the univariate response variable; here $\mathbb{R}^{q}$ ($\mathbb{R}^{p}$) stands for the $q$-($p$-)dimensional Euclidean space. For many models, such as linear and generalized linear models, $p = q$, but in general this may not be the case. As is well known, before an assumed model is used for further data analysis, model checking is necessary. Specifically, the null hypothesis is, for a subset $\Theta \subset \mathbb{R}^{p}$,
$$H_0: \ \Pr\{E(Y \mid X) = g(X, \theta_0)\} = 1 \quad \text{for some } \theta_0 \in \Theta, \qquad (1.1)$$
versus the alternative hypothesis
$$H_1: \ \Pr\{E(Y \mid X) = g(X, \theta)\} < 1 \quad \text{for all } \theta \in \Theta. \qquad (1.2)$$
Relevant problems have been investigated intensively in the literature. Most existing methods are for cases with fixed dimensions of the predictor and parameter vectors. Examples include nonparametric estimation-based local smoothing tests, such as Härdle and Mammen (1993), Zheng (1996), Zhu and Li (1998), Lavergne and Patilea (2008, 2012), and Guo et al. (2016), and empirical process-based global smoothing tests, e.g., Stute and Zhu (2002), Zhu (2003), Escanciano (2006), and Stute et al. (2008). These two general methodologies show their respective advantages and limitations. Local smoothing tests often have tractable weak limits under the null hypothesis, and most of them can only detect local alternatives distinct from the null hypothesis at rates of convergence slower than $1/\sqrt{n}$, where $n$ is the sample size. As the rates become even slower in higher-dimensional cases, the curse of dimensionality is a big challenge. In contrast, the limiting null distributions of global smoothing tests are usually intractable, resorting to resampling approximations to determine critical values, but such tests can detect local alternatives distinct from the null hypothesis at the fastest possible rate of convergence in hypothesis testing. The dimensionality is still an issue, as this type of test involves high-dimensional empirical processes.
For paradigms with large dimensions that may diverge as the sample size goes to infinity, there are few methods available in the literature. Some relevant references for models with dimension reduction structures are as follows. Shah and Bühlmann (2018) and Janková et al. (2020) respectively proposed goodness-of-fit tests for high-dimensional sparse linear and generalized linear models with fixed designs. For problems with random designs and without sparse structures, Tan and Zhu (2019) and Tan and Zhu (2022) considered adaptive-to-model tests for high-dimensional single-index and multi-index models, in which the number of linear combinations of the predictors is fixed, with diverging dimensions. Their methods extend the test first proposed by Guo et al. (2016) for multi-index models in fixed-dimension cases. These tests critically hinge on the dimension reduction structures of the predictors. Otherwise, as Tan and Zhu (2022) discussed, they fail to work because the limiting distributions under the null and alternative hypotheses degenerate to constants, which cannot be used to determine critical values, and resampling approximations cannot work well either.
The current paper proposes a COnditionally Studentized Test (COST) for general parametric models without dimension reduction structures in high-dimensional cases. The basic idea for constructing this novel test is that, based on an initial test that can be rewritten as a U-statistic (either a local or a global smoothing test), we divide the sample of size $n$ into two subsamples of sizes $n_1$ and $n_2$ and use the conditional studentization approach to construct the final test. The number of parameters can diverge at a rate whose leading term corresponds to the rate Tan and Zhu (2022) achieved. Further, the corresponding restriction on the dimension of the predictor vector is no longer necessary; that is, the dimension of the predictor vector can be higher than the number of parameters, and even higher than the sample size, under some regularity conditions on the regression and related functions. Note that the conditions in this paper are not imposed on the significance of individual predictors in the regression function. Thus, understandably, when the predictor dimension is large, the conditions could rather stringently restrict the forms of the related functions, but the results still show the potential of our method in higher-dimensional settings. The details will be presented in Section 2; Remark 1 in that section also gives some more explanations. Section 4 reports some numerical studies to check the performance of the test when the predictor dimension is larger than the number of parameters and equal to the sample size; as the conclusions are similar, some other settings are not reported to save space. The conditions are put in Section 6. We also discuss the optimal sample splitting between the sizes $n_1$ and $n_2$ for the power performance of the test. The following merits of the novel test are worth mentioning. Under rather general parametric model structures, whether the initial test is global or local smoothing-based, and whether the dimensions are fixed or divergent at certain rates, the final test always has a normal weak limit under the null hypothesis, which is often a merit of local smoothing tests; and it can detect local alternatives distinct from the null at a rate as close to the parametric rate $1/\sqrt{n}$ as possible, which is the typical optimal rate that global smoothing tests can achieve. These unique features are very different from those of any existing test: other than being able to handle high-dimensional problems, the test also enjoys the advantages that local smoothing tests and global smoothing tests have. On the other hand, the test statistic converges to its weak limit at a rate driven by the subsample sizes rather than the full sample size, so it may lose some power in theory. We will discuss this limitation in more detail later.
The rest of the paper is organized as follows. Section 2 describes the test statistic construction. Section 3 includes the asymptotic properties of the test statistic under the null and alternative hypotheses, together with an investigation of the optimal choice of the sample-splitting scheme. Section 4 contains some numerical studies, including simulations and real data analysis. To examine the performance of the test, the simulation studies include settings favoring the existing method in the literature, settings where the condition on the number of parameters is violated, and settings where the dimension of the predictor vector is much larger than the number of parameters and even equal to the sample size. Section 5 comments on the advantages and limitations of the method. It discusses the reason why, in our setting, we do not apply the commonly used cross-fitting approach for power enhancement, and it briefly discusses the challenge of extending or modifying our method to handle models with higher dimensions and sparse structures. Section 6 includes the regularity conditions with some remarks. As the proofs of the main results are technically demanding and lengthy, we put them in the Supplementary Material.
2 Test statistic construction
As the test construction requires estimating the parameter in the model, we first briefly give the details.
2.1 Notation and parameter estimation
Write the underlying regression function as $m(x) = E(Y \mid X = x)$ and the error as $\eta = Y - m(X)$. To study the power performance, we also consider a sequence of alternative hypotheses:
$$H_{1n}: \ Y = g(X, \theta_0) + a_n \ell(X) + \eta, \qquad (2.1)$$
where $\ell(\cdot)$ is a departure function. When $a_n \to 0$ as $n \to \infty$, (2.1) corresponds to a sequence of local alternatives. When $a_n \equiv a$ is a fixed constant, (2.1) reduces to the global alternative model (1.2).
Set
$$\tilde\theta = \arg\min_{\theta \in \Theta} E\{[Y - g(X, \theta)]^2\}. \qquad (2.2)$$
Under the null hypothesis in (1.1) and regularity Condition 1 specified in Section 6, $\tilde\theta = \theta_0$. Under the alternatives, $\tilde\theta$ typically depends on the distribution of $(X, Y)$. To save space, we redefine $\theta_0 = \tilde\theta$ under the global alternative hypothesis with fixed $a_n = a$, while we still write $\theta_0$ for the parameter in (2.1) under the local alternative hypothesis.
We next present additional notation employed in this paper. Denote $g'(x, \theta) = \partial g(x, \theta)/\partial \theta$ and $g''(x, \theta) = \partial^2 g(x, \theta)/\partial \theta \partial \theta^{\top}$. Under the null hypothesis and the local alternative hypothesis, we define $e = Y - g(X, \theta_0)$. Under the global alternatives, without confusion, we define $e = Y - g(X, \theta_0)$ with the redefined $\theta_0 = \tilde\theta$, and $\eta = Y - m(X)$ as before. Use $\|\cdot\|$ to represent the $L_2$ norm. Write the conditional expectation of a random variable $V$ given a random variable/vector $U$ as $E(V \mid U)$.
The least squares estimator $\hat\theta_n$ of $\tilde\theta$ is defined by
$$\hat\theta_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} \{y_i - g(x_i, \theta)\}^2. \qquad (2.3)$$
Under regularity Conditions 1-5 in Section 6, the convergence rate and the asymptotically linear representation of $\hat\theta_n$ can be derived; both are important for studying the asymptotic properties of the test statistic under the null and alternative hypotheses. We state them as three lemmas in the Supplementary Material: the first two lemmas are Theorems 1 and 2 of Tan and Zhu (2022), and the third lemma is an extension of Theorem 4 therein.
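To fix ideas, here is a minimal sketch of the estimation step (2.3) using a generic nonlinear least squares routine. The mean function `g` below is a hypothetical stand-in (a quadratic in a linear index, with $p = q$), not a model from the paper; any smooth parametric $g(\cdot, \theta)$ can be plugged in.

```python
import numpy as np
from scipy.optimize import least_squares

def g(X, theta):
    # Hypothetical parametric mean function g(x, theta): a quadratic
    # in the linear index x^T theta, used only for illustration.
    idx = X @ theta
    return idx + 0.25 * idx ** 2

def fit_theta(X, y, theta_init):
    # Least squares estimator (2.3): minimize sum_i {y_i - g(x_i, theta)}^2.
    result = least_squares(lambda th: y - g(X, th), theta_init)
    return result.x

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
theta_true = np.ones(p) / np.sqrt(p)
y = g(X, theta_true) + 0.5 * rng.standard_normal(n)
theta_hat = fit_theta(X, y, np.zeros(p))
```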
2.2 The motivation
In the literature, several existing tests share a similar structure: with a weight function $W(\cdot, \cdot)$ that may depend on the sample size, a test statistic can be written, before standardization, as a double sum of weighted residual products $\hat e_i \hat e_j W(x_i, x_j)$, where $\hat e_i$ is the $i$-th residual. Some classic tests are listed as follows. Bierens (1982) proposed the integrated conditional moment (ICM) test based on an exponential weight function; the tests proposed by Escanciano (2009) cover the case of general weight functions; Li et al. (2019) used a weight function induced by an idea bridging local and global smoothing tests. Other tests can be written, or approximately written, as U-statistics, including the local smoothing tests proposed by Härdle and Mammen (1993) and Zheng (1996), and the global smoothing tests suggested by Stute et al. (1998a, b), Tan and Zhu (2019), and Tan and Zhu (2022). However, these tests do not apply to cases with diverging dimensions without dimension reduction structures.
We now construct a novel test by combining the sample-splitting and conditional studentization approaches. The following observations motivate the construction. Let $e = Y - g(X, \tilde\theta)$. Under the null hypothesis, $\tilde\theta = \theta_0$ and $E(e \mid X) = 0$. Note that under the null hypothesis $E\{e\, E(e \mid X)\, f(X)\} = E\{[E(e \mid X)]^2 f(X)\} = 0$, where $f(\cdot)$ is the density of $X$; this quantity is greater than zero under the alternatives. In a more general presentation, when we use an approximation with, say, a kernel function in lieu of the density weighting, this quantity can be approximated by $E\{e_1 e_2 W(X_1, X_2)\}$, where $(X_1, e_1)$ and $(X_2, e_2)$ are independent copies of $(X, e)$ and $W(\cdot, \cdot)$ is the induced weight function. Thus, in general, this quantity can be estimated by a U-statistic that can be used to define a non-standardized statistic:
$$V_n = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \hat e_i \hat e_j W(x_i, x_j), \qquad (2.4)$$
where $\hat e_i = y_i - g(x_i, \hat\theta_n)$.
Different tests use different weight functions $W(\cdot, \cdot)$. Examples include a kernel function with a bandwidth (see, e.g., Zheng (1996)), an exponential function (see, e.g., Bierens (1982)), and the weight function used by Li et al. (2019). These tests often have no tractable limiting null distributions.
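As a concrete instance of (2.4), the following sketch evaluates the non-standardized U-statistic with a bounded exponential weight of the kind mentioned above; the specific weight choice is ours for illustration, and the residuals `e_hat` come from the fitted null model.

```python
import numpy as np

def weight_exp(X1, X2):
    # A bounded exponential weight, W(x1, x2) = exp(-||x1 - x2||^2),
    # one classic choice among those cited above.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2)

def v_statistic(X, e_hat, weight=weight_exp):
    # V_n in (2.4): (1/(n(n-1))) * sum over i != j of e_i e_j W(x_i, x_j).
    n = len(e_hat)
    W = weight(X, X)
    np.fill_diagonal(W, 0.0)  # drop the i = j terms
    return e_hat @ W @ e_hat / (n * (n - 1))
```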
2.3 The test statistic
We now modify $V_n$ to avoid duplicated use of the samples. Estimate $\tilde\theta$ by the two parts of the sample separately: define $\hat\theta_{n_1}$ and $\hat\theta_{n_2}$ as the least squares estimators (2.3) computed on the two parts,
where the sample of size $n$ is divided into two disjoint parts $\{(x_{1i}, y_{1i})\}_{i=1}^{n_1}$ and $\{(x_{2j}, y_{2j})\}_{j=1}^{n_2}$ of sizes $n_1$ and $n_2$ satisfying $n_1 + n_2 = n$. All the results in Section 2 hold when $\hat\theta_n$ is replaced by $\hat\theta_{n_1}$ or $\hat\theta_{n_2}$. Define a modified test statistic as
$$V_{n_1, n_2} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \hat e_{1i}\, \hat e_{2j}\, W(x_{1i}, x_{2j}), \qquad (2.5)$$
where $\hat e_{1i} = y_{1i} - g(x_{1i}, \hat\theta_{n_1})$ and $\hat e_{2j} = y_{2j} - g(x_{2j}, \hat\theta_{n_2})$. Again, this test statistic usually has no significant difference from the previous one, as they have similar asymptotic behaviors. However, this seemingly minor modification plays a vital role in constructing a conditionally studentized test with a normal weak limit under the null hypothesis. The key ingredient of the construction is that the residuals in the two independent sums are not used duplicately, while the weight function links the two sums.
To see how to define the final statistic, we give a decomposition of $V_{n_1,n_2}$. Under the null hypothesis, we have the following:
$$V_{n_1, n_2} = \frac{1}{n_1} \sum_{i=1}^{n_1} e_{1i}\, \widehat W_i + o_p(n_1^{-1/2}), \qquad (2.6)$$
where $e_{1i} = y_{1i} - g(x_{1i}, \tilde\theta)$ with $\tilde\theta$ defined in (2.2), $\widehat W_i = n_2^{-1} \sum_{j=1}^{n_2} \hat e_{2j} W(x_{1i}, x_{2j})$, and $E_2(\cdot)$ denotes the conditional expectation given the second subsample $\mathcal D_2 = \{(x_{2j}, y_{2j})\}_{j=1}^{n_2}$. The detailed justification can be found in Corollary 1 of the Supplementary Material. The decomposition involves only the residuals $e_{1i}$'s in the first part of the samples. Given $\mathcal D_2$, the summands $e_{1i} \widehat W_i$, $i = 1, \ldots, n_1$, are conditionally independent and identically distributed random variables. Intuitively, under the null hypothesis, the following random sequence would have a normal weak limit by applying the Central Limit Theorem conditionally:
$$T_n^{*} = \frac{n_1^{-1/2} \sum_{i=1}^{n_1} e_{1i} \widehat W_i}{\sqrt{E_2\{(e_{11} \widehat W_1)^2\} - \{E_2(e_{11} \widehat W_1)\}^2}}, \qquad (2.7)$$
which is a conditionally studentized version of $\sqrt{n_1}\, V_{n_1,n_2}$. To define the final test statistic, we use $\hat e_{1i}$ in lieu of $e_{1i}$; and for the denominator, which is a conditional standard deviation, we use sample analogues to replace the unknown conditional moments:
$$T_n = \frac{n_1^{-1/2} \sum_{i=1}^{n_1} \hat e_{1i} \widehat W_i}{\hat S_n}, \qquad (2.8)$$
where
$$\hat S_n^2 = \frac{1}{n_1} \sum_{i=1}^{n_1} \check e_{1i}^2 \widehat W_i^2 - \Big( \frac{1}{n_1} \sum_{i=1}^{n_1} \check e_{1i} \widehat W_i \Big)^2, \qquad \check e_{1i} = y_{1i} - g(x_{1i}, \hat\theta_n),
$$
and $\hat\theta_n$ is the full data-based least squares estimator in (2.3). We will prove that these sample analogues are consistent estimators of their population counterparts, so that the consistency of the estimated conditional standard deviation holds. Note that in $\hat S_n$ we use the full data-based estimator $\hat\theta_n$ instead of $\hat\theta_{n_1}$. We find that asymptotically there is no difference between using either estimator, but using $\hat\theta_n$ yields a faster convergence rate of the estimator toward its limit. In Corollary 2 of the Supplementary Material, we will show that under the null hypothesis and the local alternative hypothesis in (2.1),
$$\hat S_n^2 \,\Big/\, \Big( E_2\{(e_{11} \widehat W_1)^2\} - \{E_2(e_{11} \widehat W_1)\}^2 \Big) \ \stackrel{p}{\longrightarrow}\ 1, \qquad (2.9)$$
where $\stackrel{p}{\longrightarrow}$ stands for convergence in probability. Meanwhile, under the global alternative hypothesis in (2.1) with fixed $a_n = a$, $\hat S_n^2$ will be proved to be a consistent estimator of its population counterpart. Note that when a kernel function is used as the weight function, its normalizing constant can go to infinity as the sample size goes to infinity. But this is not a problem in our construction: the studentized test is scale-invariant, and any such constant in the weight is eliminated between the numerator and the denominator. Thus, we can consider the weight function without this normalizing constant.
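To make the construction concrete, here is a minimal end-to-end sketch of the statistic in (2.8) under the notation above. It is an illustration rather than the exact procedure: it uses an even split, reuses the subsample residuals in the variance estimate rather than the full-data $\hat\theta_n$ refinement, and takes the plain second-subsample average for $\widehat W_i$.

```python
import numpy as np
from scipy.stats import norm

def cost_test(X, y, fit, g, weight):
    # Conditionally studentized test (COST), sketched after (2.5)-(2.8).
    n = len(y)
    n1 = n // 2
    X1, y1, X2, y2 = X[:n1], y[:n1], X[n1:], y[n1:]
    e1 = y1 - g(X1, fit(X1, y1))   # residuals from the first-subsample fit
    e2 = y2 - g(X2, fit(X2, y2))   # residuals from the second-subsample fit
    # W_i_hat: second-subsample average linking the two sums, as in (2.6).
    W_hat = weight(X1, X2) @ e2 / len(y2)
    s = e1 * W_hat                 # conditionally i.i.d. given subsample 2
    T = np.sqrt(n1) * s.mean() / s.std(ddof=1)
    pval = 2.0 * (1.0 - norm.cdf(abs(T)))  # two-sided normal calibration
    return T, pval
```

For example, `cost_test(X, y, lambda A, b: fit_theta(A, b, np.zeros(A.shape[1])), g, weight_exp)` combines the pieces sketched earlier.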
Remark 1.
It is worth noticing that in the theorems and corollaries, we impose some restrictions on the divergence rate of the parameter dimension, but we do not directly give constraints on the dimension of the predictor vector. In fact, some constraints are hidden in the regularity conditions in Section 6 on the regression and other related functions, so that the dimension condition imposed in Tan and Zhu (2022) is not required, and the predictor dimension can diverge to infinity much faster than the number of parameters, and even faster than the sample size. For instance, our test statistic can deal with models whose regression functions satisfy the required conditions even at such high predictor dimensions, and similar models with still higher dimensions can be constructed. Therefore, although the conditions are strong, the results show the potential of our method to handle problems in large-dimensional settings.
3 Asymptotic properties
3.1 The limiting null distribution
Under the null hypothesis, the conditionally studentized version $T_n^{*}$ in (2.7) has a normal weak limit under some regularity conditions (see Corollary 3 in the Supplementary Material for details). We can prove the asymptotic equivalence between the numerator (denominator) of $T_n$ in (2.8) and that of $T_n^{*}$. The result is stated as follows.
Theorem 1.
Suppose that the regularity conditions in Section 6 hold. Then, under the null hypothesis, $T_n$ converges in distribution to the standard normal distribution $N(0, 1)$.
Therefore, we can compute the critical values easily.
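Concretely, with $\Phi$ the standard normal distribution function and $z_{1-\alpha/2}$ its upper quantile, a level-$\alpha$ test and its p-value read as follows (a two-sided calibration, consistent with the real-data example in Section 4.3):
$$
\text{reject } H_0 \text{ at level } \alpha \iff |T_n| > z_{1-\alpha/2},
\qquad
\text{p-value} = 2\{1 - \Phi(|T_n|)\}.
$$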
Remark 2.
Remark 3.
In Theorem 1, we impose rate restrictions on the parameter dimension because, when analyzing the residuals, we use the asymptotically linear representation of the parameter estimator, and we also need the consistency of some sample covariance matrices to their population counterparts in the sense of the $L_2$ norm. Tan and Zhu (2022) showed that, to obtain the asymptotically linear representation of the parameter estimator, this rate cannot be faster in general. However, for linear regression models, the rate of divergence can be improved: the asymptotically linear representation of the parameter estimator still holds (Theorem 2 in Tan and Zhu (2019)), and the sample covariance matrices are still consistent.
3.2 Power study
Consider the power performance under the alternative hypothesis in (2.1). A fixed non-zero constant $a_n = a$ corresponds to the global alternative hypothesis, and $a_n \to 0$ as $n \to \infty$ to the local alternatives. Define the corresponding centering and scaling quantities as in Section 2. We state the following results.
Theorem 2.
(a) When $a_n = a$ is a fixed constant, corresponding to the global alternatives, recalling the quantities defined in Section 2, we obtain the convergence stated in (3.2), where $\stackrel{p}{\to}$ stands for convergence in probability: the suitably normalized statistic converges in probability to a positive limit determined by those quantities.
In cases (b)-(d), recalling the notation introduced above, we have the following.
(b) When $a_n$ converges to zero at the critical rate, the result in (3.3) holds, where the centering and scaling quantities are as defined above, and the standardized statistic converges to a normal limit in distribution. In particular, under a balanced splitting, the expression simplifies.
(c) When $a_n$ converges to zero more slowly than the critical rate, under additional rate conditions on $a_n$, $n_1$, and $n_2$, the test statistic diverges in probability, so the power tends to one.
(d) When $a_n$ converges to zero faster than the critical rate, under the corresponding rate conditions, the test statistic behaves asymptotically as under the null hypothesis.
Remark 4.
Theorem 2 shows that the test can detect local alternatives distinct from the null at the fastest possible rate in general. Due to the possibly different sizes $n_1$ and $n_2$ of the two subsamples, the above analysis presents more detailed results across cases than those for existing tests in the literature. From the above results, we can see that only in case (b) is a balanced splitting optimal, while in cases (a) and (c) a larger second subsample can enhance power. But practically, if the first subsample is too small, the conditional variance cannot be estimated well, so the test may not perform well. These claims were confirmed when we conducted numerical studies with several splitting ratios. Thus, in the numerical studies, we report the results with a balanced splitting. Another issue is the power performance in the boundary case between the regimes. We do not discuss this case, mainly because of the difficulty of studying the negligibility of the remainder terms in the asymptotically linear representations of $\hat\theta_{n_1}$ and $\hat\theta_{n_2}$ under the local alternatives.
4 Numerical Studies
4.1 Simulations
In this section, some numerical studies are conducted to examine the performance of the test proposed in Section 2. The weight function used by Li et al. (2019) has the merit of combining local smoothing and global smoothing tests. But it is theoretically flawed for large dimensions, as it converges to a constant when the dimension goes to infinity. To remedy this defect, our chosen weight function not only includes this weight function but also contains another weight function built from a kernel density function $K(\cdot)$ with a bandwidth $h$; its summation form ensures that it works in diverging-dimension cases. As a result, the weight function is defined as an equally weighted hybrid of the two.
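A minimal sketch of such a hybrid weight is given below. The exponential component standing in for Li et al. (2019)'s weight and the coordinate-wise summation form of the kernel component are our assumed shapes, since the display of the exact definition was not recoverable.

```python
import numpy as np

def hybrid_weight(X1, X2, h=1.0):
    # Equally weighted hybrid of two components:
    #  (i)  a global exponential weight exp(-||x1 - x2||);
    #  (ii) a kernel-density weight in summation (not product) form over
    #       the coordinates, which keeps it non-degenerate as the
    #       dimension grows.
    diff = X1[:, None, :] - X2[None, :, :]
    w_exp = np.exp(-np.sqrt((diff ** 2).sum(axis=2)))
    k = np.exp(-0.5 * (diff / h) ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    w_ker = k.mean(axis=2)   # average of univariate kernels over coordinates
    return 0.5 * w_exp + 0.5 * w_ker
```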
To make the simulation results convincing, we compare with the test proposed by Tan and Zhu (2022), which can also handle diverging-dimension cases and shows its own advantages. It is worth noticing that the method in Janková et al. (2020) could also be applied to non-sparse models with random designs and dimensions under our constraints. Tan and Zhu (2022) made a comparison for logistic models with three model settings, and their test outperformed the test in Janková et al. (2020) in two of the three settings. We also conducted simulations under the same model settings and found that our test worked similarly to theirs; therefore, we put those results in Section 4.2, and the main text here only reports the numerical comparison with the Tan-Zhu test. In addition, we also evaluate the performance of another related test, which can be seen as a modified version of the Tan-Zhu test: when the underlying model has a dimension reduction structure, it contains the dimension reduction step (see Tan and Zhu (2022)). We design three kinds of studies: models with dimension reduction structures; models with dimension reduction structures and diverging dimensions; and models without dimension reduction structures. The first two kinds favor Tan and Zhu (2022)'s method, and the third deals with general models. The predictor vectors $x_i$ are independently generated from a multivariate normal distribution, and the errors $\varepsilon_i$ are independently drawn from the standard normal distribution $N(0, 1)$. All reported results are based on the same number of Monte Carlo replications. The results of the Tan-Zhu test are computed by the code provided by the authors of Tan and Zhu (2022). To choose the bandwidth $h$, we set it proportional to a constant $c$ taking five candidate values. To check how robust the test is against the value of $c$, we use one model as an example in Figure 1, which shows that our test performs robustly: the size and power levels are similar across the candidates. Therefore, we use the bandwidth with a fixed choice of $c$.
Figure 1: Empirical sizes and powers of the proposed test across the candidate values of the bandwidth constant $c$.
For comparisons, we consider four numerical studies. Studies 1 and 2 have multi-index model structures that favor the Tan-Zhu tests, and Study 3 does not have such structures. As our method can, under some strong conditions on the regression function, handle cases where the dimension of the predictor vector is much higher than that of the parameter vector, we consider Study 4, which has predictor dimensions comparable to the sample size. As the limiting null distribution of the Tan-Zhu test is generally intractable, the wild bootstrap approximation is used to determine its critical values. For one model in Study 1 and one model in Study 2, the numerical results of the Tan-Zhu test are excerpted from Tan and Zhu (2022) to make the paper self-contained.
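For reference, the size and power computations reported below follow the usual Monte Carlo recipe, sketched here. The data-generating model shown is a hypothetical single-index example, not one of the study designs (whose displays were lost), and `cost_test` is the sketch from Section 2.

```python
import numpy as np

def empirical_rejection(gen_data, test, level=0.05, reps=500, seed=1):
    # Fraction of replications in which `test` rejects at the given level:
    # empirical size under the null (a = 0), empirical power otherwise.
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X, y = gen_data(rng)
        _, pval = test(X, y)
        hits += pval < level
    return hits / reps

def gen_single_index(rng, n=200, q=8, a=0.0):
    # Hypothetical design: the null model is linear in x^T beta, and
    # a != 0 adds a quadratic departure of magnitude a (illustrative only).
    X = rng.standard_normal((n, q))
    beta = np.ones(q) / np.sqrt(q)
    index = X @ beta
    y = index + a * index ** 2 + rng.standard_normal(n)
    return X, y
```

Coupled with `cost_test` and `hybrid_weight` from the earlier sketches, calling `empirical_rejection` with `a = 0` returns the empirical size, and `a > 0` gives the empirical power.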
Study 1. Generate data from double-index and triple-index models whose hypothetical parts are single-index and double-index, respectively, with the alternative parts containing a second and a third index. We set the parameter values componentwise, where $\lfloor a \rfloor$ indicates the largest integer smaller than or equal to $a$ and $X_{(j)}$ denotes the $j$-th component of $X$. The empirical sizes and powers are reported in Tables 1-2 in Section 4.2.
Table 1 shows that the Tan-Zhu test works better for the first model, and the version with the dimension reduction step outperforms the one without, but only slightly. When the sample size gets larger, our tests gradually work closer to theirs. However, for the second model, the situation changes: the results of Table 2 suggest that our test outperforms both versions, although the model favors them. We checked the details and found that the structural dimension of the central subspace is underestimated in this model, so the residuals cannot be estimated well under the alternative hypothesis. This might be one of the reasons that the Tan-Zhu tests do not work well.
Study 2. Generate data from the following multi-index models with $p = q$ and $p = 2q$, respectively; the other notations are the same as stated above. In this study, the dimension is large (of order $n$ in the first model and $\sqrt{n}$ in the second; see Tables 3-4), so the regularity conditions fail to hold in theory. The empirical sizes and powers of Study 2 are presented in Tables 3-4 in Section 4.2.
The simulation results suggest that none of the competitors can maintain the significance level for the first model, whose dimension diverges linearly in $n$; the condition on the dimensionality is violated too much. For the second model, whose dimension of order $\sqrt{n}$ also violates the condition, we can see that when the sample size is large, our test performs well in significance level maintenance with relatively high power. At the same time, the Tan-Zhu test is liberal in general, and its version with the dimension reduction step works slightly worse. This suggests that our test may still be usable when the sample size is large.
Study 3. Generate data from models without dimension reduction structures, where $X_{(j)}$ denotes the $j$-th component of $X$ and the rest of the notations are the same as stated above. The four models have no dimension reduction structures and serve as representatives of: models under the null with low-dimensional and high-dimensional regression functions; models under the alternatives with high-dimensional departures; and models under the null with high-dimensional regression functions involving interactions among the covariates.
Tables 5-8 in Section 4.2 report the empirical sizes and powers. The Tan-Zhu test fails entirely in high-dimensional scenarios, especially for the model allowing higher-order interactions between the covariates: in almost all cases it cannot maintain the significance level and even has no empirical power. This phenomenon suggests that it relies critically on dimension reduction structures. Our test works much better comparably, as expected.
Study 4. Generate data from two models whose predictor dimensions are much higher than the number of parameters, where $X_{(j)}$ denotes the $j$-th component of $X$ and $\lceil a \rceil$ takes the smallest integer greater than or equal to $a$; the other notations remain the same as mentioned earlier. Tables 9-10 in Section 4.2 report the empirical sizes and powers. The simulation results suggest that our test may not be significantly affected by the dimension of the predictor vector when the regression function has a particular structure in the predictors: it still works well in both significance level maintenance and power performance, whereas the Tan-Zhu test entirely fails to work.
The performance of our test is more robust against model settings than that of the Tan-Zhu test. Thus, our test is more suitable when the sample size is relatively large. But in moderate sample size scenarios, the test loses some power; see Table 7 for instance. This is because the sample-splitting technique reduces the effective sample size, so the test converges to its weak limit more slowly than classic tests. Another reason could be the use of the limiting null distribution to determine the critical values, whereas the Tan-Zhu test uses a bootstrap approximation that favors small and moderate sample size scenarios.
4.2 Simulation results
Table 1: Empirical sizes (a = 0.00) and powers (a = 0.25) in Study 1, with p = q.

a | n=100, q=2 | n=100, q=4 | n=100, q=6 | n=100, q=8 | n=200, q=12 | n=400, q=17 | n=600, q=20
---|---|---|---|---|---|---|---
0.00 | 0.044 | 0.044 | 0.048 | 0.055 | 0.061 | 0.045 | 0.053 | |
0.25 | 0.326 | 0.313 | 0.367 | 0.357 | 0.674 | 0.951 | 0.995 | |
0.00 | 0.050 | 0.046 | 0.042 | 0.054 | 0.050 | 0.041 | 0.048 | |
0.25 | 0.343 | 0.348 | 0.339 | 0.304 | 0.620 | 0.917 | 0.979 | |
0.00 | 0.055 | 0.051 | 0.076 | 0.051 | 0.050 | 0.050 | 0.065 | |
(from Tan and Zhu (2022)) | 0.25 | 0.556 | 0.564 | 0.553 | 0.562 | 0.853 | 0.992 | 1.000 |
0.00 | 0.043 | 0.053 | 0.052 | 0.061 | 0.048 | 0.054 | 0.055 | |
0.25 | 0.337 | 0.624 | 0.767 | 0.817 | 0.996 | 1.000 | 1.000 | |
0.00 | 0.051 | 0.049 | 0.046 | 0.056 | 0.058 | 0.048 | 0.065 | |
0.25 | 0.281 | 0.540 | 0.703 | 0.768 | 0.987 | 1.000 | 1.000 | |
0.00 | 0.052 | 0.043 | 0.059 | 0.070 | 0.057 | 0.049 | 0.050 | |
(from Tan and Zhu (2022)) | 0.25 | 0.481 | 0.820 | 0.916 | 0.956 | 1.000 | 1.000 | 1.000 |
Table 2: Empirical sizes (a = 0.00) and powers (a = 0.10) in Study 1, with p = 2.

a | n=100, q=2 | n=100, q=4 | n=100, q=6 | n=100, q=8 | n=200, q=12 | n=400, q=17 | n=600, q=20
---|---|---|---|---|---|---|---
0.00 | 0.054 | 0.048 | 0.038 | 0.055 | 0.046 | 0.045 | 0.050 | |
0.10 | 0.530 | 0.487 | 0.473 | 0.533 | 0.726 | 0.878 | 0.923 | |
0.00 | 0.051 | 0.051 | 0.060 | 0.042 | 0.054 | 0.054 | 0.046 | |
0.10 | 0.403 | 0.425 | 0.415 | 0.390 | 0.619 | 0.757 | 0.850 | |
0.00 | 0.054 | 0.059 | 0.044 | 0.051 | 0.069 | 0.066 | 0.068 | |
0.10 | 0.098 | 0.093 | 0.084 | 0.103 | 0.112 | 0.172 | 0.240 | |
0.00 | 0.051 | 0.036 | 0.052 | 0.043 | 0.038 | 0.039 | 0.039 | |
0.10 | 0.502 | 0.509 | 0.557 | 0.493 | 0.743 | 0.830 | 0.885 | |
0.00 | 0.050 | 0.045 | 0.044 | 0.040 | 0.046 | 0.053 | 0.052 | |
0.10 | 0.248 | 0.266 | 0.288 | 0.294 | 0.457 | 0.601 | 0.688 | |
0.00 | 0.050 | 0.053 | 0.048 | 0.038 | 0.059 | 0.064 | 0.053 | |
0.10 | 0.043 | 0.063 | 0.050 | 0.051 | 0.042 | 0.031 | 0.018 |
Table 3: Empirical sizes (a = 0.00) and powers (a = 0.10) in Study 2, with p = q.

a | n=50, q=5 | n=100, q=10 | n=500, q=50 | n=1000, q=100
---|---|---|---|---
0.00 | 0.039 | 0.058 | 0.063 | 0.064 | |
0.10 | 0.116 | 0.222 | 0.798 | 0.984 | |
0.00 | 0.067 | 0.053 | 0.065 | 0.075 | |
0.10 | 0.107 | 0.151 | 0.566 | 0.871 | |
0.00 | 0.062 | 0.057 | 0.071 | 0.081 | |
(fromTan and Zhu (2022)) | 0.10 | 0.163 | 0.250 | 0.858 | 0.994 |
0.00 | 0.051 | 0.057 | 0.058 | 0.072 | |
0.10 | 0.184 | 0.461 | 0.993 | 1.000 | |
0.00 | 0.050 | 0.059 | 0.070 | 0.071 | |
0.10 | 0.154 | 0.331 | 0.970 | 0.992 | |
0.00 | 0.064 | 0.068 | 0.079 | 0.107 | |
(from Tan and Zhu (2022)) | 0.10 | 0.235 | 0.582 | 0.935 | 0.959 |
Table 4: Empirical sizes (a = 0.00) and powers (a = 0.50) in Study 2, with p = 2q.

a | n=100, q=10 | n=400, q=20 | n=900, q=30
---|---|---|---
0.00 | 0.104 | 0.068 | 0.061 | |
0.50 | 0.568 | 0.991 | 1.000 | |
0.00 | 0.094 | 0.060 | 0.059 | |
0.50 | 0.512 | 0.962 | 1.000 | |
0.00 | 0.079 | 0.074 | 0.074 | |
0.50 | 0.970 | 0.999 | 1.000 | |
0.00 | 0.056 | 0.059 | 0.037 | |
0.50 | 0.393 | 0.824 | 0.953 | |
0.00 | 0.074 | 0.061 | 0.057 | |
0.50 | 0.358 | 0.708 | 0.889 | |
0.00 | 0.086 | 0.091 | 0.069 | |
0.50 | 0.760 | 0.839 | 0.917 |
Table 5: Empirical sizes (a = 0.00) and powers (a = 0.10) in Study 3, with p = 2.

a | n=100, q=2 | n=100, q=4 | n=100, q=6 | n=100, q=8 | n=200, q=12 | n=400, q=17 | n=600, q=20
---|---|---|---|---|---|---|---
0.00 | 0.042 | 0.035 | 0.043 | 0.046 | 0.057 | 0.048 | 0.047 | |
0.10 | 0.658 | 0.726 | 0.825 | 0.817 | 0.933 | 0.975 | 0.970 | |
0.00 | 0.049 | 0.057 | 0.050 | 0.048 | 0.069 | 0.065 | 0.066 | |
0.10 | 0.041 | 0.033 | 0.063 | 0.082 | 0.144 | 0.211 | 0.258 | |
0.00 | 0.034 | 0.045 | 0.041 | 0.045 | 0.039 | 0.048 | 0.051 | |
0.10 | 0.610 | 0.738 | 0.850 | 0.838 | 0.935 | 0.950 | 0.978 | |
0.00 | 0.050 | 0.055 | 0.048 | 0.039 | 0.061 | 0.063 | 0.056 | |
0.10 | 0.028 | 0.010 | 0.017 | 0.022 | 0.041 | 0.070 | 0.104 |
Table 6: Empirical sizes (a = 0.00) and powers (a = 0.10) in Study 3, with p = q.

a | n=100, q=2 | n=100, q=4 | n=100, q=6 | n=100, q=8 | n=200, q=12 | n=400, q=17 | n=600, q=20
---|---|---|---|---|---|---|---
0.00 | 0.035 | 0.029 | 0.046 | 0.046 | 0.055 | 0.064 | 0.041 | |
0.10 | 0.588 | 0.757 | 0.782 | 0.840 | 0.988 | 1.000 | 1.000 | |
0.00 | 0.061 | 0.056 | 0.056 | 0.079 | 0.074 | 0.048 | 0.000 | |
0.10 | 0.353 | 0.162 | 0.03 | 0.002 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.042 | 0.051 | 0.053 | 0.051 | 0.051 | 0.070 | 0.049 | |
0.10 | 0.487 | 0.637 | 0.760 | 0.843 | 0.967 | 0.998 | 1.000 | |
0.00 | 0.060 | 0.063 | 0.047 | 0.057 | 0.064 | 0.053 | 0.000 | |
0.10 | 0.391 | 0.177 | 0.022 | 0.000 | 0.000 | 0.000 | 0.000 |
Table 7: Empirical sizes (a = 0.00) and powers (a = 0.50) in Study 3, with p = q-1.

a | n=100, q=2 | n=100, q=4 | n=100, q=6 | n=100, q=8 | n=200, q=12 | n=400, q=17 | n=600, q=20
---|---|---|---|---|---|---|---
0.00 | 0.047 | 0.054 | 0.062 | 0.042 | 0.058 | 0.039 | 0.052 | |
0.50 | 0.688 | 0.647 | 0.644 | 0.615 | 0.915 | 0.996 | 0.998 | |
0.00 | 0.052 | 0.038 | 0.020 | 0.002 | 0.000 | 0.000 | 0.000 | |
0.50 | 0.905 | 0.792 | 0.550 | 0.144 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.035 | 0.052 | 0.051 | 0.048 | 0.044 | 0.053 | 0.051 | |
0.50 | 0.675 | 0.558 | 0.465 | 0.399 | 0.572 | 0.774 | 0.867 | |
0.00 | 0.060 | 0.038 | 0.025 | 0.004 | 0.000 | 0.000 | 0.000 | |
0.50 | 0.904 | 0.724 | 0.496 | 0.162 | 0.000 | 0.000 | 0.000 |
Table 8: Empirical sizes (a = 0.00) and powers (a = 0.50) in Study 3, with p = q-2.

a | n=100, q=2 | n=100, q=4 | n=100, q=6 | n=100, q=8 | n=200, q=12 | n=400, q=17 | n=600, q=20
---|---|---|---|---|---|---|---
0.00 | / | 0.052 | 0.047 | 0.046 | 0.053 | 0.057 | 0.055 | |
0.50 | / | 0.489 | 0.345 | 0.226 | 0.433 | 0.661 | 0.787 | |
0.00 | / | 0.038 | 0.015 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.50 | / | 0.642 | 0.176 | 0.022 | 0.000 | 0.000 | 0.000 | |
0.00 | / | 0.048 | 0.044 | 0.057 | 0.051 | 0.042 | 0.066 | |
0.50 | / | 0.909 | 0.835 | 0.750 | 0.932 | 0.995 | 0.999 | |
0.00 | / | 0.047 | 0.022 | 0.005 | 0.000 | 0.000 | 0.000 | |
0.50 | / | 0.619 | 0.251 | 0.075 | 0.032 | 0.004 | 0.000 |
Table 9: Empirical sizes (a = 0.00) and powers (a = 0.25) in Study 4.

a | n=100, p=2 | n=100, p=4 | n=100, p=6 | n=100, p=8 | n=200, p=12 | n=400, p=17 | n=600, p=20
---|---|---|---|---|---|---|---
0.00 | 0.053 | 0.063 | 0.047 | 0.048 | 0.046 | 0.055 | 0.049 | |
0.25 | 0.357 | 0.399 | 0.421 | 0.460 | 0.762 | 0.965 | 0.994 | |
0.00 | 0.037 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.279 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.044 | 0.052 | 0.051 | 0.047 | 0.055 | 0.046 | 0.039 | |
0.25 | 0.272 | 0.645 | 0.841 | 0.913 | 1.000 | 1.000 | 1.000 | |
0.00 | 0.094 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.206 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.037 | 0.041 | 0.049 | 0.043 | 0.051 | 0.059 | 0.041 | |
0.25 | 0.486 | 0.424 | 0.432 | 0.435 | 0.765 | 0.961 | 1.000 | |
0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.047 | 0.052 | 0.037 | 0.041 | 0.063 | 0.039 | 0.065 | |
0.25 | 0.455 | 0.766 | 0.881 | 0.921 | 0.994 | 1.000 | 1.000 | |
0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Table 10: Empirical sizes (a = 0.00) and powers (a = 0.25) in Study 4.

a | n=100, p=2 | n=100, p=4 | n=100, p=6 | n=100, p=8 | n=200, p=12 | n=400, p=17 | n=600, p=20
---|---|---|---|---|---|---|---
0.00 | 0.052 | 0.034 | 0.055 | 0.051 | 0.064 | 0.066 | 0.049 | |
0.25 | 0.324 | 0.381 | 0.401 | 0.416 | 0.755 | 0.960 | 0.999 | |
0.00 | 0.038 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.329 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.040 | 0.053 | 0.051 | 0.053 | 0.054 | 0.055 | 0.044 | |
0.25 | 0.335 | 0.652 | 0.834 | 0.876 | 0.999 | 1.000 | 1.000 | |
0.00 | 0.049 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.320 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.046 | 0.056 | 0.048 | 0.067 | 0.066 | 0.056 | 0.048 | |
0.25 | 0.191 | 0.145 | 0.119 | 0.112 | 0.138 | 0.171 | 0.197 | |
0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.00 | 0.055 | 0.060 | 0.052 | 0.053 | 0.066 | 0.051 | 0.056 | |
0.25 | 0.214 | 0.258 | 0.281 | 0.275 | 0.452 | 0.671 | 0.821 | |
0.00 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
0.25 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
4.3 A real data example
In this subsection, we use the CSM data set to illustrate our method. The CSM data set was first analyzed by Ahmed et al. (2015) and can be obtained from https://archive.ics.uci.edu/ml/datasets/CSM+%28Conventional+and+Social+Media+Movies%29+Dataset+2014+and+2015. There are 187 observations left in the data set after cleaning 30 observations with missing responses and/or covariates. The response variable $Y$ is Gross Income. There are 11 predictor variables: Rating, Genre, Budget, Screens, Sequel, Sentiment, Views, Likes, Dislikes, Comments, and Aggregate Followers. To explore the relationship between the response $Y$ and the predictor vector $X$, we first check whether the data set follows a linear regression model, which is often used in practice. Figure 2(a) shows that the residual plot might have a linear pattern. Besides, the value of our proposed test statistic is 2.0420, and the p-value is about 0.0412. Therefore, a linear regression model may not be tenable for fitting the data, and a more plausible model is needed. Using sufficient dimension reduction techniques such as cumulative slicing estimation (CSE), we find that the estimated structural dimension of this data set is one, with a corresponding estimated projection direction $\hat\alpha$.
As a result, we establish the following polynomial regression model in the projected covariate $\hat\alpha^{\top} X$:
$$Y = \sum_{k=0}^{K} \theta_k \,(\hat\alpha^{\top} X)^k + \varepsilon. \qquad (4.1)$$
The value of the test statistic is now 0.7150, with the p-value being 0.4746, indicating that model (4.1) may be more appropriate for fitting the CSM data set. We also plot the residuals against the fitted responses in Figure 2(b), which seems not to show a clear nonlinear pattern in the residuals. As this model has a dimension reduction structure, Tan and Zhu (2022)'s test also supports this modeling.
Figure 2: Residuals versus fitted responses (a) under the linear regression model and (b) under model (4.1).
5 Discussions
This paper develops a novel test statistic for checking general parametric regression models in high-dimensional scenarios. By using a sample-splitting strategy and a conditional studentization approach, the proposed test obtains a normal limiting null distribution. It does not depend on the dimension reduction model structures that are critically useful for existing tests. Moreover, our method is easy to implement and does not need a resampling approximation for the critical value determination. The simulation results also show that our test, in many cases, can maintain the significance level and has good power performance. Thus, this research could be a good input to this research field. Further, as a generic methodology, it could be applied to other model-checking problems.
The sample-splitting technique also brings some limitations. The main shortcoming is that the test statistic converges to its weak limit at a rate driven by the subsample sizes rather than the full sample size, causing some loss of power. Thus, the sample size should not be too small; otherwise, this methodology may not work well. We note that the commonly used cross-fitting idea is often useful in other testing problems for power enhancement. But the following observations make us hesitant to use this method. Although the studentization approach uses the conditional variance in the denominator when the second subset of data is given, the test involves all data points even in the conditional variance estimation. If we construct another conditionally studentized test given the first subset of data, the numerators of the two test statistics are the same, but the denominators are highly correlated, and the covariance is hard to compute, if not impossible. As the limiting null distribution is intractable in this case, a possible solution is to use a resampling approximation such as the wild bootstrap. We tried this idea for Model 1 in Study 1 and found that the power can be enhanced, but only slightly. Such a solution also gives up the main advantage of our method, the tractable limiting null distribution, and may not be worth the cost. Another issue concerns the model dimensions. In our setting, without a sparsity structure, the method can handle cases where the number of parameters diverges at a certain rate of the sample size, and most likely this rate cannot be higher (see, e.g., Tan and Zhu (2022)). To handle cases with a larger number of parameters, a model sparsity assumption on the parameters could be necessary; see the relevant references Shah and Bühlmann (2018) and Janková et al. (2020), which checked sparse linear and generalized linear models. But in those cases, the construction of the test statistic may have to use penalization methods for variable selection. Thus, in model settings more general than those in Shah and Bühlmann (2018) and Janková et al. (2020), it is still unclear whether the asymptotic behaviors can be derived, and even if they can, it remains to be determined whether the asymptotic distribution-free property holds. These questions deserve further study. On the other hand, it is of interest that when some conditions on the regression function hold, the dimension of the predictor vector can be high, even higher than the sample size, although the conditions are stringent in such large-dimension scenarios. The simulations with predictor dimensions equal to the sample size support this observation; we also conducted some simulations with the dimension larger than the sample size, and the phenomenon is similar. This shows the potential of our method for tackling models with higher-dimensional predictor vectors.
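For completeness, the wild bootstrap recalibration mentioned above can be sketched as follows, with Rademacher multipliers as one standard choice; the exact variant we experimented with is described only informally here, and `statistic` should return a scalar, e.g. the first element of `cost_test`.

```python
import numpy as np

def wild_bootstrap_pvalue(X, y, fit, g, statistic, B=500, seed=2):
    # Wild bootstrap: regenerate responses from the fitted null model with
    # sign-flipped residuals, and recompute the statistic on each resample.
    rng = np.random.default_rng(seed)
    fitted = g(X, fit(X, y))
    resid = y - fitted
    t_obs = statistic(X, y)
    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=len(y))  # Rademacher multipliers
        t_boot[b] = statistic(X, fitted + v * resid)
    # p-value with the usual +1 correction
    return (1 + np.sum(np.abs(t_boot) >= abs(t_obs))) / (B + 1)
```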
6 Regularity Conditions
In these conditions, $\|\cdot\|$ denotes the $L_2$ norm of any vector or matrix.
Condition 1.
There exists a unique minimizer $\tilde\theta$ of the squared loss $E\{[Y - g(X, \theta)]^2\}$ in the interior of the compact parameter set $\Theta$.
Condition 2.
The regression function $g(x, \theta)$ admits third-order derivatives with respect to $\theta$. Let $\Theta_0$ be a subset consisting of all $\theta$ in the interior of $\Theta$ with $\|\theta - \tilde\theta\|$ bounded by a positive constant. Further, there exists a dominating function $M(\cdot)$ such that, for any $\theta \in \Theta_0$, the required envelope bounds on the derivatives of $g$ hold with $E\{M^2(X)\} < \infty$. The fourth moment of the error is bounded for the regression model.
Condition 3.
For $\theta \in \Theta_0$, define the second-moment matrices of the gradient $g'(X, \theta)$ and the related quantities. The eigenvalues of these matrices are bounded away from zero and infinity by positive constants free of the sample size and the dimensions.
For any square matrix $A$, let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ be the smallest and the largest eigenvalues of $A$, respectively.
Condition 4.
Define the population covariance matrix of the relevant score vector. There exist two constants, unrelated to the sample size and the dimensions, bounding its smallest and largest eigenvalues. Besides, define the analogous matrices involving the error term of our regression model; their smallest and largest eigenvalues are likewise bounded below and above by four constants unrelated to the sample size and the dimensions. Finally, the corresponding fourth-moment quantity is bounded by a constant free of the sample size and the dimensions.
Condition 5.
Define the relevant derivative quantities. There exist two measurable functions with finite second moments that dominate these quantities in a neighborhood of $\tilde\theta$.
Condition 6.
There exists a measurable function $L(\cdot)$ with $E\{L^2(X)\} < \infty$ satisfying a Lipschitz-type bound for the relevant functions in a neighborhood of $\tilde\theta$.
Remark 5.
Conditions 1, 2, 3, and 5 appear in Tan and Zhu (2022) and are commonly assumed in the high-dimensional model checking literature. Condition 4 is similar to regularity condition (A2) in Tan and Zhu (2022), while we additionally assume that the largest and smallest eigenvalues of the other three matrices are bounded. Condition 6 is a general Lipschitz condition. In Conditions 1-6, we do not directly put constraints on the dimension of the predictor vector; they are hidden in the boundedness of the related functions and their derivatives. Though our conditions could be stringent when the dimension diverges quickly to infinity, some functions still meet the requirements. Thus, the test could work with a high-dimensional predictor vector.
Recall that the weight in our paper is a function of two predictor values; without confusion or misunderstanding, we write $W = W(x_1, x_2)$ for the sake of simplicity.
Condition 7.
$W(x_1, x_2) > 0$, and $W(x_1, x_2) \le C_W$ for a positive constant $C_W$.
Remark 6.
For Condition 7, the weight function $W$, as a function of $x_1$ and $x_2$, can satisfy it in many forms, such as $\exp(-\|x_1 - x_2\|)$ and $\exp(-\|x_1 - x_2\|^2)$; in both cases the condition always holds. For a kernel-type form, although the normalizing constant may diverge as $n \to \infty$, we can divide both the numerator and the denominator by this constant and use the unnormalized kernel in place of the original weight; the condition then holds. In fact, under Conditions 4 and 7, we can infer that the numerator and the denominator of the test statistic are bounded in the appropriate sense. In the Supplementary Material, we give some more details showing that this condition is satisfied.
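As a worked instance in our reconstructed notation: for a kernel weight with the usual normalization over $\mathbb{R}^{q}$, the bandwidth factor cancels from the studentized ratio, since $T_n(cW) = T_n(W)$ for any constant $c > 0$:
$$
W_h(x_1, x_2) = h^{-q} K\!\Big(\frac{x_1 - x_2}{h}\Big)
\quad\Longrightarrow\quad
T_n(W_h) = T_n\big(K(\cdot/h)\big),
$$
so the bounded, unnormalized kernel $K((x_1 - x_2)/h) \le \sup_u K(u) < \infty$ can be used in place of $W_h$, and Condition 7 is then satisfied.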
Condition 8.
Define, in (6.1), the population counterparts of the conditional moments used in the studentization; these quantities are assumed to be well defined and finite.
Condition 9.
The quantities defined in (6.1) are bounded below and above, where the bounds $c_1$ and $c_2$ are two positive constants; besides, a corresponding higher-order moment bound holds.
Remark 7.
Condition 9 is a sufficient condition for the Berry-Esseen bound, which is essential to ensure the asymptotic normality of our test statistic under the null hypothesis. To interpret it, consider the quantity it controls. The linear projection of the error on the gradient direction leaves a remainder, and the linear projection of that remainder on the weight direction leaves a further remainder; the controlled quantity has exactly the form of this final remainder. Hence it is almost the estimate of the error term when we use the gradient and the weight to predict the response with a linear model. Since the structures of these components are different, it is reasonable to assume that the quantity has a lower bound as $n$ goes to infinity; since the weight is bounded, it is also reasonable to assume its boundedness from above.
Define the population drift of the test statistic under the alternatives; under the global hypothesis, it reduces to a fixed constant.
Condition 10.
The relevant moments are finite. Further, there are two constants, not depending on the sample size and the dimensions, such that the stated eigenvalue bounds hold.
Recall the sequence of local alternatives in (2.1) and the definitions in Subsection 3.2.
Condition 11.
The stated moment bounds hold almost surely, where the bounds are two constants unrelated to the sample size and the dimensions.
Remark 8.
Conditions 10 and 11 are used to obtain the asymptotic distribution of our test statistic under the alternative hypotheses. These two conditions are not necessary for the test statistic to have high power under the alternatives. Conversely, if we do not impose conditions either on the moments of the relevant quantities or on an upper bound of the eigenvalues of the associated matrices, our test statistic may diverge at a faster rate and thus have higher power. On the other hand, when we study the properties under the local alternatives, these conditions make the investigation of the limiting properties of the test statistic easier, with more easily understood presentations.
SUPPLEMENTARY MATERIAL
- Supplement to "The conditionally studentized test for high-dimensional parametric regressions": technical proofs of the theorems. (.pdf file)
References
- Ahmed et al. [2015] M. Ahmed, M. Jahangir, H. Afzal, A. Majeed, and I. Siddiqi. Using crowd-source based features from social media and conventional features to predict the movies popularity. In 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), pages 273-278. IEEE, 2015.
- Bierens [1982] H. J. Bierens. Consistent model specification tests. Journal of Econometrics, 20:105-134, 1982.
- Escanciano [2006] J. C. Escanciano. A consistent diagnostic test for regression models using projections. Econometric Theory, 22:1030-1051, 2006.
- Escanciano [2009] J. C. Escanciano. On the lack of power of omnibus specification tests. Econometric Theory, 25:162-194, 2009.
- Guo et al. [2016] X. Guo, T. Wang, and L. Zhu. Model checking for parametric single-index models: a dimension reduction model-adaptive approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78:1013-1035, 2016.
- Härdle and Mammen [1993] W. Härdle and E. Mammen. Comparing nonparametric versus parametric regression fits. The Annals of Statistics, 21:1926-1947, 1993.
- Janková et al. [2020] J. Janková, R. D. Shah, P. Bühlmann, and R. J. Samworth. Goodness-of-fit testing in high dimensional generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82:773-795, 2020.
- Lavergne and Patilea [2008] P. Lavergne and V. Patilea. Breaking the curse of dimensionality in nonparametric testing. Journal of Econometrics, 143:103-122, 2008.
- Lavergne and Patilea [2012] P. Lavergne and V. Patilea. One for all and all for one: regression checks with many regressors. Journal of Business & Economic Statistics, 30:41-52, 2012.
- Li et al. [2019] L. Li, S. N. Chiu, and L. Zhu. Model checking for regressions: an approach bridging between local smoothing and global smoothing methods. Computational Statistics & Data Analysis, 138:64-82, 2019.
- Shah and Bühlmann [2018] R. D. Shah and P. Bühlmann. Goodness-of-fit tests for high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80:113-135, 2018.
- Stute and Zhu [2002] W. Stute and L.-X. Zhu. Model checks for generalized linear models. Scandinavian Journal of Statistics, 29:535-545, 2002.
- Stute et al. [1998a] W. Stute, W. G. Manteiga, and M. P. Quindimil. Bootstrap approximations in model checks for regression. Journal of the American Statistical Association, 93:141-149, 1998a.
- Stute et al. [1998b] W. Stute, S. Thies, and L.-X. Zhu. Model checks for regression: an innovation process approach. The Annals of Statistics, 26:1916-1934, 1998b.
- Stute et al. [2008] W. Stute, W. Xu, and L. Zhu. Model diagnosis for parametric regression in high-dimensional spaces. Biometrika, 95:451-467, 2008.
- Tan and Zhu [2019] F. Tan and L. Zhu. Adaptive-to-model checking for regressions with diverging number of predictors. The Annals of Statistics, 47:1960-1994, 2019.
- Tan and Zhu [2022] F. Tan and L. Zhu. Integrated conditional moment test and beyond: when the number of covariates is divergent. Biometrika, 109:103-122, 2022.
- Zheng [1996] J. X. Zheng. A consistent test of functional form via nonparametric estimation techniques. Journal of Econometrics, 75:263-289, 1996.
- Zhu and Li [1998] L. Zhu and R. Li. Dimension-reduction type test for linearity of a stochastic regression model. Acta Mathematicae Applicatae Sinica, 14:165-175, 1998.
- Zhu [2003] L.-X. Zhu. Model checking of dimension-reduction type for regression. Statistica Sinica, 13:283-296, 2003.