Variable Selection in Doubly Truncated Regression
Abstract
Doubly truncated data arise in many areas such as astronomy, econometrics, and medical studies. In regression analysis with doubly truncated response variables, the double truncation can bias estimation as well as affect variable selection. We propose a simultaneous estimation and variable selection procedure for doubly truncated regression, allowing a diverging number of regression parameters. To remove the bias introduced by the double truncation, a Mann-Whitney-type loss function is used. The adaptive LASSO penalty is then added to the loss function to achieve simultaneous estimation and variable selection. An iterative algorithm is designed to optimize the resulting objective function. We establish the consistency and asymptotic normality of the proposed estimator, and the oracle property of the proposed selection procedure is also obtained. Simulation studies are conducted to show the finite sample performance of the proposed approach, and we apply the method to a real astronomical data set.
Keywords: Adaptive LASSO; Double truncation; Diverging number of parameters; Least absolute deviation; Oracle property; Variable selection.
Accepted by SCIENTIA SINICA Mathematica (in Chinese)
1 Introduction
Truncated data arise in astronomy, econometrics, survival analysis, and other fields. Under truncation, only those subjects whose values fall within an observation interval can be observed, along with the interval itself. Subjects falling outside their respective intervals are not known to exist and, consequently, have no chance of being observed. When the truncation interval is unbounded from above, the truncation is called left-truncation; when the interval is unbounded from below, it is called right-truncation. Double truncation occurs when the truncation interval is bounded on both sides.
Truncation may bring systematic bias to statistical analysis if it is not dealt with appropriately. For nonparametric distribution estimation, Turnbull (1976) developed a general algorithm for finding the nonparametric maximum likelihood estimator for arbitrarily grouped, censored, or truncated data. Lynden-Bell (1971) obtained a similar estimator for singly truncated data. Counting process techniques were then adopted by Wang et al. (1986), Keiding and Gill (1990), Lai and Ying (1991a), among others. For regression analysis with single truncation, one can refer to Bhattacharya et al. (1983) for an extended Mann-Whitney approach, Tsui et al. (1988) for iterative bias adjustment, Tsai (1990) for Kendall's tau correlation, Lai and Ying (1991b) for rank-based estimation, and so on. Some more recent developments include Greene (2012) for econometrics and Kim et al. (2013) and Liu et al. (2016) for general biased sampling.
Double truncation is technically more challenging to deal with than single truncation, so fewer results are available in the existing literature. For distribution estimation, Efron and Petrosian (1999) and Shen (2010) developed nonparametric maximum likelihood estimation, and Moreira and Una-Alvarez (2012) proposed a kernel-type density estimation approach. For the two-sample problem, Bilker and Wang (1996) and Shen (2013a) extended the Mann-Whitney test. For regression analysis with doubly truncated responses, Moreira and Una-Alvarez (2016) proposed a kernel-type approach for low-dimensional covariates, and Shen (2013b) considered estimating a class of semiparametric transformation models. More recently, Ying et al. (2020) proposed an extended Mann-Whitney-type loss function for parameter estimation in the linear regression model with a fixed covariate dimension; a Mann-Whitney-type rank estimator is then defined as its minimizer.
Besides parameter estimation, double truncation may also complicate variable selection in regression analysis. To the best of our knowledge, no existing work has studied variable selection for linear regression models with doubly truncated responses. Based on the extended Mann-Whitney-type loss function proposed by Ying et al. (2020), we adopt the adaptive LASSO penalty (Zou, 2006) to develop a simultaneous estimation and variable selection procedure for doubly truncated linear models. The proposed procedure not only handles fixed-dimensional covariates but is also applicable to the diverging model size, where the dimension of the covariates goes to infinity with the sample size. We show that the adaptive LASSO penalty leads to the oracle properties of the selection procedure. Meanwhile, an iterative algorithm is designed for minimizing the proposed objective function; in each iteration, a least absolute deviation (LAD) optimization is involved and can be solved efficiently through a standard LAD algorithm. A modified BIC approach is used to select the tuning parameter, and a random weighting approach is developed for estimating the standard errors of the non-zero regression parameters. Numerical studies are conducted to illustrate the finite sample behavior of the proposed approach.
The rest of the paper is organized as follows. The next section introduces the necessary notation and the estimation procedure for doubly truncated regression models. In Section 3, we present the main results of the variable selection procedure, including the basic idea of the selection method, the algorithm, the oracle properties, and the approach for determining the tuning parameter. In Section 4, simulation studies are presented to show the finite sample behavior of the proposed approach, and a real data set is analyzed to illustrate the method. Some concluding remarks are given in Section 5. All technical details are summarized in the Appendix.
2 Notation, model specification, and estimation
We first introduce some notation. Let $Y$ be the response variable and $X$ be a $p$-dimensional covariate vector. We consider the linear location-shift model
$$Y = X^\top \beta + \varepsilon, \qquad (1)$$
where $\beta$ is the $p$-dimensional regression coefficient vector and $\varepsilon$ is the error term, assumed to be independent of $X$. The response $Y$ is subject to double truncation. Let $L$ and $R$ be the left and right truncation variables, respectively. $Y$ can be observed if and only if it falls into the interval $[L, R]$. We also assume that $Y$ is conditionally independent of $(L, R)$ given $X$, a common assumption in truncated data analysis. Under model (1), this assumption is equivalent to assuming that $\varepsilon$ is conditionally independent of $(L, R)$ given $X$. The distribution function and density function of $\varepsilon$ are denoted by $F$ and $f$, respectively.
Let $(Y_i, X_i, L_i, R_i)$, $i = 1, \ldots, N$, be independent and identically distributed (i.i.d.) copies of $(Y, X, L, R)$. As mentioned, $(Y_i, X_i)$ is observed if and only if $L_i \le Y_i \le R_i$, together with $(L_i, R_i)$. Denote the number of observed data points by $n$, that is, $n = \sum_{i=1}^{N} I(L_i \le Y_i \le R_i)$. We use $(Y_i^*, X_i^*, L_i^*, R_i^*)$, $i = 1, \ldots, n$, to denote the observed data points. Moreover, for $i = 1, \ldots, n$, let $e_i(\beta) = Y_i^* - X_i^{*\top}\beta$, $l_i(\beta) = L_i^* - X_i^{*\top}\beta$, and $r_i(\beta) = R_i^* - X_i^{*\top}\beta$ denote the residual and the transformed truncation bounds.
To correct the bias brought by the double truncation, Ying et al. (2020) proposed a modified Mann-Whitney-type rank estimating equation
$$\sum_{i \ne j} I\big\{ l_i(\beta) \vee l_j(\beta) \le e_i(\beta) \wedge e_j(\beta),\; e_i(\beta) \vee e_j(\beta) \le r_i(\beta) \wedge r_j(\beta) \big\}\, (X_i^* - X_j^*)\, \mathrm{sgn}\{ e_i(\beta) - e_j(\beta) \} = 0, \qquad (2)$$
where $I(\cdot)$ is the indicator function and $\mathrm{sgn}(\cdot)$ is the sign function. By introducing the indicator, the estimating function sums over all the so-called comparable pairs, that is, pairs $(i, j)$ for which both residuals $e_i(\beta)$ and $e_j(\beta)$ fall into the intersection $[l_i(\beta) \vee l_j(\beta),\, r_i(\beta) \wedge r_j(\beta)]$ of the two transformed truncation intervals, and becomes unbiased when $\beta$ takes the true value. Meanwhile, solving (2) is equivalent to minimizing the following loss function
$$L_n(\beta) = \sum_{i \ne j} I\big\{ l_i(\beta) \vee l_j(\beta) \le e_i(\beta) \wedge e_j(\beta),\; e_i(\beta) \vee e_j(\beta) \le r_i(\beta) \wedge r_j(\beta) \big\}\, \big| e_i(\beta) - e_j(\beta) \big|, \qquad (3)$$
where $\wedge$ and $\vee$ denote the minimum and maximum operators, respectively. In what follows we write $I_{ij}(\beta)$ for the comparability indicator appearing in (2) and (3). The minimizer of $L_n(\beta)$, denoted by $\tilde\beta_n$, is the estimator proposed by Ying et al. (2020). They showed that the estimator is consistent and asymptotically normal under some regularity conditions.
3 Main results of variable selection
3.1 The method of variable selection
We consider variable selection for model (1) based on the observed data points. The loss function (3) motivates a penalized approach for variable selection. Instead of restricting the dimension of $X$ to be fixed, we allow it to go to infinity with the sample size, that is, $p$ depends on $n$ and is denoted by $p_n$. All quantities that are functions of the covariates then also depend on $n$; for simplicity of notation, we suppress the subscript $n$ when there is no confusion. Meanwhile, the observed proportion $n/N$ is assumed to converge to a positive constant less than 1 as $N$ goes to infinity.
Because of the form of the loss function (3), LASSO-type penalties are natural candidates. Specifically, the objective function with the usual LASSO penalty is given by
$$L_n(\beta) + \lambda_n \sum_{j=1}^{p_n} |\beta_j|,$$
where $\lambda_n$ is a (data-dependent) tuning parameter. There are two main concerns with using the above LASSO penalty. The first concerns computation. One can see that $L_n(\beta)$ is not convex in $\beta$, so the optimization problem is not convex. However, the non-convexity comes solely from the indicator $I_{ij}(\beta)$ in (3): with the value of $\beta$ inside the indicator held fixed, the remaining part $\sum_{i \ne j} I_{ij}(\beta')\,|e_i(\beta) - e_j(\beta)|$ is convex in $\beta$. Thus, one can develop an iterative algorithm in which the parameter in the indicator function is kept fixed at each iteration step and the optimization is carried out over the rest of the objective function. More details of the computation are given in Section 3.2 and Appendix A.1.
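To make this decomposition concrete, the following Python sketch evaluates the comparable-pair indicator and the convex part of the loss with the indicator frozen at a fixed parameter value. The comparability rule used here (both residuals lying in the intersection of the two transformed truncation intervals, as described above) and all function names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def comparable_pairs(beta, Y, X, L, R):
    """Boolean matrix of comparable pairs at a given beta.

    Assumes the comparability rule sketched above: for a pair (i, j),
    both residuals must lie in the intersection of the two shifted
    truncation intervals.  This is a working assumption for illustration.
    """
    e = Y - X @ beta              # residuals e_i(beta)
    l = L - X @ beta              # shifted left truncation bounds l_i(beta)
    r = R - X @ beta              # shifted right truncation bounds r_i(beta)
    lo = np.maximum.outer(l, l)   # l_i(beta) v l_j(beta)
    hi = np.minimum.outer(r, r)   # r_i(beta) ^ r_j(beta)
    e_min = np.minimum.outer(e, e)
    e_max = np.maximum.outer(e, e)
    comp = (lo <= e_min) & (e_max <= hi)
    np.fill_diagonal(comp, False)     # exclude i == j
    return comp

def frozen_loss(beta, beta_fixed, Y, X, L, R):
    """Convex part of the loss with the indicator frozen at beta_fixed."""
    comp = comparable_pairs(beta_fixed, Y, X, L, R)
    e = Y - X @ beta
    diff = np.abs(np.subtract.outer(e, e))   # |e_i(beta) - e_j(beta)|
    return np.sum(diff[comp])
```

With `beta_fixed` held constant, `frozen_loss` is a sum of absolute values of affine functions of `beta` and hence convex, which is exactly what the iterative algorithm exploits.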
The second concern is that, as is well known, the LASSO estimator is not unbiased, and there are scenarios in which LASSO selection is not consistent. To overcome these drawbacks, Zou (2006) proposed the adaptive LASSO penalty, in which proper weights are introduced to penalize different coefficients in the LASSO penalty. The weights used by Zou (2006) are based on a root-$n$-consistent estimator of the regression coefficients. He showed that, with a proper choice of the tuning parameter $\lambda_n$, the adaptive LASSO possesses the oracle properties when the dimension of $\beta$ is fixed.
To retain the oracle property, we also adopt the adaptive LASSO approach for variable selection. Specifically, the objective function with the adaptive LASSO penalty is given by
$$Q_n(\beta) = L_n(\beta) + \lambda_n \sum_{j=1}^{p_n} w_j |\beta_j|, \qquad (4)$$
where $w_j$ is the adaptive weight of $\beta_j$. Originally, Zou (2006) proposed to use $w_j = 1/|\tilde\beta_j|^{\gamma}$, where $\tilde\beta$ is a root-$n$-consistent estimator of $\beta$ and $\gamma$ is a predetermined positive constant. In our case, even when $p_n$ diverges with $n$, we can still obtain $\tilde\beta_n$ by minimizing $L_n(\beta)$ as long as $p_n$ does not grow too fast. Based on $\tilde\beta_n$, we define $w_j = 1/|\tilde\beta_{n,j}|^{\gamma}$ in (4), where $\tilde\beta_{n,j}$ stands for the $j$-th component of $\tilde\beta_n$. The proposed adaptive LASSO estimator, conceptually, is defined as the minimizer of $Q_n(\beta)$.
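As a small illustration, the adaptive weights can be computed from the unpenalized estimate as follows. The exponent $\gamma = 1$ is a common default, and the small constant guarding against exactly zero components is a numerical convenience, not part of the formal definition.

```python
import numpy as np

def adaptive_weights(beta_tilde, gamma=1.0, eps=1e-8):
    """Adaptive LASSO weights w_j = 1 / |beta_tilde_j|**gamma (Zou, 2006).

    beta_tilde is an initial (unpenalized) estimate; eps guards against
    division by exactly zero components.
    """
    return 1.0 / (np.abs(beta_tilde) + eps) ** gamma
```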
3.2 The algorithm
As mentioned in Section 3.1, $L_n(\beta)$ is not convex, and neither is $Q_n(\beta)$, so we turn to an iterative procedure to obtain the proposed estimator. Specifically, we modify $Q_n(\beta)$ to
$$Q_n(\beta \mid \beta') = \sum_{i \ne j} I_{ij}(\beta')\, |e_i(\beta) - e_j(\beta)| + \lambda_n \sum_{j=1}^{p_n} w_j |\beta_j|.$$
Let $\hat\beta^{(0)}$ be an initial estimate; for example, one may use $\tilde\beta_n$ as the start. Then, in the $k$-th iteration step, let $\hat\beta^{(k)} = \arg\min_{\beta} Q_n(\beta \mid \hat\beta^{(k-1)})$ for $k = 1, 2, \ldots$. If $\hat\beta^{(k)}$ converges to a limit as the number of iterations increases, we use the limit as the proposed estimator $\hat\beta_n$.
The main reason for using the above iterative algorithm is that, in the $k$-th iteration, the objective function can be written as
$$Q_n(\beta \mid \hat\beta^{(k-1)}) = \sum_{i \ne j} \delta_{ij}^{(k)} \big| (Y_i^* - Y_j^*) - (X_i^* - X_j^*)^\top \beta \big| + \lambda_n \sum_{j=1}^{p_n} w_j |\beta_j|, \qquad (5)$$
where $\delta_{ij}^{(k)} = I_{ij}(\hat\beta^{(k-1)})$ indicates whether the pair $(i, j)$ is comparable at $\hat\beta^{(k-1)}$. Thus, the optimization problem becomes a linear programming problem that can be solved efficiently. Moreover, after some algebraic operations, the optimization can be transformed into a least absolute deviation (LAD) problem and solved through a standard LAD procedure, such as the quantreg package in R. More details on the transformation are provided in Appendix A.1.
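The sketch below outlines one possible implementation of a single iteration and the outer loop, assuming the comparability rule described above. The adaptive LASSO penalty is handled by appending one pseudo-observation per coefficient so that an off-the-shelf median regression routine (here statsmodels' QuantReg, playing the role of R's quantreg) can be used. Function names, arguments, and convergence settings are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def lad_step(beta_prev, Y, X, L, R, lam, w):
    """One iteration: freeze the comparability indicator at beta_prev,
    then solve the resulting penalized LAD problem."""
    e = Y - X @ beta_prev
    l = L - X @ beta_prev
    r = R - X @ beta_prev
    comp = (np.maximum.outer(l, l) <= np.minimum.outer(e, e)) & \
           (np.maximum.outer(e, e) <= np.minimum.outer(r, r))
    np.fill_diagonal(comp, False)
    i_idx, j_idx = np.where(comp)

    # Pairwise differences for the comparable pairs.
    dY = Y[i_idx] - Y[j_idx]
    dX = X[i_idx] - X[j_idx]

    # Augment with pseudo-observations encoding the penalty lam * w_j * |beta_j|.
    p = X.shape[1]
    Y_aug = np.concatenate([dY, np.zeros(p)])
    X_aug = np.vstack([dX, np.diag(lam * w)])

    # Median (LAD) regression without an intercept.
    fit = sm.QuantReg(Y_aug, X_aug).fit(q=0.5)
    return np.asarray(fit.params)

def fit_adaptive_lasso(Y, X, L, R, lam, w, beta_init, n_iter=20, tol=1e-6):
    """Iterate the LAD step until the coefficient vector stabilizes."""
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(n_iter):
        beta_new = lad_step(beta, Y, X, L, R, lam, w)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Appending the row $\lambda_n w_j e_j^\top$ with response 0 contributes exactly $\lambda_n w_j |\beta_j|$ to the LAD objective, which is how the penalty in (5) is absorbed into an unpenalized LAD fit.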
3.3 Large sample properties
Next we establish the oracle property of the proposed adaptive LASSO approach under some regularity conditions. The oracle property requires the consistency of $\tilde\beta_n$. Although Ying et al. (2020) proved the root-$n$-consistency of $\tilde\beta_n$ under suitable conditions, their result is confined to the fixed-dimension case. When $p_n$ diverges with $n$, the consistency of $\tilde\beta_n$ needs further investigation. Let $\beta_0$ denote the true value of $\beta$. The following proposition, proved in Appendix A.2, establishes the consistency of $\tilde\beta_n$ under diverging dimension.
Proposition 1.
Assume that conditions C1 to C6 listed in Appendix A.2 hold. Then, provided $p_n$ grows sufficiently slowly with $n$, $\tilde\beta_n$ is consistent for $\beta_0$, that is, $\|\tilde\beta_n - \beta_0\| = o_p(1)$.
We then show that an adaptive LASSO estimator exists and is consistent with the convergence rate $O_p(\sqrt{p_n/n})$. The following theorem, proved in Appendix A.2, states the result.
Theorem 1.
Assume that the conditions C1 to C6 listed in Appendix A.2 hold. If $p_n$ grows sufficiently slowly with $n$ and the tuning parameter $\lambda_n$ is chosen appropriately, then with probability tending to 1 there exists a local minimizer of $Q_n(\beta)$, denoted by $\hat\beta_n$, satisfying $\|\hat\beta_n - \beta_0\| = O_p(\sqrt{p_n/n})$.
To present the oracle properties, we need some more notation. Let $A_n$ denote the Hessian matrix of the quadratic approximation of $L_n(\beta)$ at $\beta_0$ and $B_n$ the covariance matrix of the corresponding score function, where the expectation is taken with respect to the distribution of the observed data. We further partition $\beta_0$ into $\beta_{10}$ and $\beta_{20}$, where $\beta_{10}$ stands for the non-zero components of $\beta_0$ and $\beta_{20}$ for the zero components. The dimension of $\beta_{10}$ is denoted by $q_n$. Let $\hat\beta_{1n}$ and $\hat\beta_{2n}$ be the adaptive LASSO estimators of $\beta_{10}$ and $\beta_{20}$, respectively. Moreover, let $A_{n1}$ denote the components of $A_n$ corresponding to $\beta_{10}$ and $B_{n1}$ the corresponding components of $B_n$.
Theorem 2.
Assume that the conditions C1 to C6 listed in Appendix A.2 hold. If $p_n$ and $\lambda_n$ satisfy suitable rate conditions, then with probability tending to 1 the adaptive LASSO estimator has the following properties: i) sparsity, that is, $\hat\beta_{2n} = 0$; ii) asymptotic normality, that is, for any nonzero constant vector $c_n$ with $\|c_n\| = 1$, $\sqrt{n}\, c_n^\top B_{n1}^{-1/2} A_{n1} (\hat\beta_{1n} - \beta_{10})$ converges in distribution to the standard normal distribution.
The proof of Theorem 2 is given in Appendix A.2. By choosing a suitable $c_n$, it can be seen from Theorem 2 that the estimator of each individual component of $\beta_{10}$ is asymptotically normal. The result can be used to make inference on individual coefficients. The estimation of the variance is discussed in Section 3.4.
3.4 The selection of tuning parameter and variance estimation
Typical methods for selecting the tuning parameter include information criteria such as AIC and BIC, and data-driven procedures such as $K$-fold cross-validation. Because of the computational burden, we consider information criterion approaches here. It is well known that BIC usually leads to selection consistency in fixed-dimension scenarios. However, since we allow the dimension of the parameter to diverge, the standard BIC does not guarantee selection consistency. Thus, we consider the following modified BIC:
$$\mathrm{BIC}(\lambda) = \log\Big\{\frac{L_n(\hat\beta_\lambda)}{n}\Big\} + \hat{s}_\lambda\,\frac{C_n \log n}{n},$$
where $\hat\beta_\lambda$ represents the adaptive LASSO estimate obtained at tuning parameter level $\lambda$, $C_n$ is a sequence going to infinity with $n$, and $\hat{s}_\lambda$ stands for the number of non-zero components of the estimate at level $\lambda$. We find that a slowly diverging $C_n$ works well in this paper. The tuning parameter is then chosen as
$$\hat\lambda = \arg\min_{\lambda} \mathrm{BIC}(\lambda).$$
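A grid search over candidate tuning parameters might look like the following sketch. The concrete BIC formula (log of the averaged loss plus $\hat{s}_\lambda C_n \log n / n$) and the choice of $C_n$ are assumptions for illustration, and `fit_fn` and `loss_fn` stand for fitting and loss-evaluation routines such as those sketched earlier.

```python
import numpy as np

def select_lambda(lambdas, fit_fn, loss_fn, n, p, c_n=None):
    """Modified-BIC grid search over candidate tuning parameters.

    fit_fn(lam) should return the adaptive-LASSO estimate at level lam,
    and loss_fn(beta) should evaluate the Mann-Whitney-type loss; both
    are assumed to be provided.  The BIC formula below is one plausible
    implementation of the modified criterion.
    """
    if c_n is None:
        c_n = np.log(np.log(max(p, 3)))      # illustrative slowly diverging factor
    best_lam, best_bic, best_beta = None, np.inf, None
    for lam in lambdas:
        beta = fit_fn(lam)
        df = np.sum(np.abs(beta) > 1e-8)     # size of the non-zero component
        bic = np.log(loss_fn(beta) / n + 1e-12) + df * c_n * np.log(n) / n
        if bic < best_bic:
            best_lam, best_bic, best_beta = lam, bic, beta
    return best_lam, best_beta
```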
The proposed modified BIC enjoys selection consistency. Let $\mathcal{M}_0$ and $\mathcal{M}_{\hat\lambda}$ denote the true model and the selected model, respectively. We have the following result on selection consistency.
Theorem 3.
Assume that the conditions C1 to C6 listed in Appendix A.2 hold. Then, under suitable rate conditions on $p_n$, $\lambda_n$, and $C_n$, $P(\mathcal{M}_{\hat\lambda} = \mathcal{M}_0) \to 1$ as $n \to \infty$.
Finally, we need a variance estimator for the proposed estimator of the non-zero components of $\beta_0$. Note that the asymptotic variance-covariance matrix of $\hat\beta_{1n}$ involves the density function of the error and is complicated to estimate directly. We therefore use a random weighting method to obtain the variance estimate. To be specific, we define the perturbed penalized estimator, denoted by $\hat\beta_n^*$, as the minimizer of the following perturbed version of the objective function $Q_n(\beta)$:
$$Q_n^*(\beta) = \sum_{i \ne j} (V_i + V_j)\, I_{ij}(\beta)\, |e_i(\beta) - e_j(\beta)| + \lambda_n \sum_{j=1}^{p_n} w_j |\beta_j|,$$
where $V_i$, $i = 1, \ldots, n$, are i.i.d. non-negative and bounded random variables with mean 0.5 and variance 1, independent of the observed data. Let $\hat\beta_{1n}^*$ denote the components of $\hat\beta_n^*$ corresponding to $\beta_{10}$, i.e., the non-zero part of $\beta_0$. It can be shown that, under suitable conditions and given the observed data, the conditional asymptotic distribution of $\hat\beta_{1n}^* - \hat\beta_{1n}$ is the same as the asymptotic distribution of $\hat\beta_{1n} - \beta_{10}$. Thus, by repeatedly generating the sequence $V_1, \ldots, V_n$, we can obtain a large number of replications of $\hat\beta_{1n}^*$ and then approximate the variance of $\hat\beta_{1n}$.
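A sketch of the random weighting procedure is given below. The two-point distribution used to generate the perturbation weights is just one convenient choice satisfying the stated moment requirements, and `fit_perturbed` is assumed to minimize the perturbed objective (each pair term weighted by $V_i + V_j$); the paper's exact weighting scheme may differ.

```python
import numpy as np

def perturbation_weights(n, rng):
    """Non-negative, bounded weights with mean 0.5 and variance 1.

    A two-point distribution taking the value 2.5 with probability 0.2
    and 0 otherwise satisfies these moment requirements; it is only one
    convenient choice.
    """
    return np.where(rng.random(n) < 0.2, 2.5, 0.0)

def random_weighting_se(fit_perturbed, n, n_rep=200, seed=0):
    """Approximate standard errors of the penalized coefficient estimates.

    fit_perturbed(V) is assumed to return the coefficient vector that
    minimizes the perturbed objective for a given weight sequence V.
    The standard errors of the selected (non-zero) components are read
    off from the corresponding entries of the returned vector.
    """
    rng = np.random.default_rng(seed)
    draws = np.vstack([fit_perturbed(perturbation_weights(n, rng))
                       for _ in range(n_rep)])
    return draws.std(axis=0)
```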
4 Numerical results
4.1 Simulation studies
We conduct simulation studies to illustrate the finite sample performance of the proposed approach. The full sample size, as defined in Section 2, is set to 300 or 500. The number of non-zero coefficients is fixed at 5, corresponding to the covariates $X_1$ through $X_5$ described below. We consider two scenarios. In the first scenario, the sample size is 300 and the number of candidate covariates is 19. For the covariates, we independently generate $X_1$ from Binomial(0.25) (the Binomial distribution with success probability 0.25), $X_2$ from Binomial(0.8), $X_3$ from Uniform(0,2) (the uniform distribution from 0 to 2), $X_4$ from Binomial(0.5), and $X_5$ from Uniform(-2,0). All the remaining covariates are generated from the standard normal distribution, with correlation introduced among them. In the second scenario, the sample size is 500 and the number of candidate covariates is 21; we generate $X_1$ from Binomial(0.25), $X_2$ from Binomial(0.8), $X_3$ from Uniform(0,2), $X_4$ from Binomial(0.5), $X_5$ from Uniform(-2,0), and the rest from the standard normal distribution, again with correlation among the normal covariates. For the error distribution in the linear model, we consider the standard normal distribution and the extreme value distribution with location parameter 0 and scale parameter 1. The left truncation variable is generated from a uniform distribution and the right truncation variable is obtained by shifting it by a constant, with the constants chosen to yield overall truncation percentages of 30% and 40%. The percentages of left and right truncation are almost the same.
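For concreteness, one replication of the first scenario could be generated along the following lines. The non-zero coefficient values, the correlation level, and the truncation constants are illustrative placeholders, not the exact values used in the simulations.

```python
import numpy as np

def generate_replication(N=300, p=19, rho=0.5, error="normal", seed=1):
    """Generate one doubly truncated data set following the design above.

    The non-zero coefficient values, the correlation level rho, and the
    truncation constants are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    X = np.empty((N, p))
    X[:, 0] = rng.binomial(1, 0.25, N)
    X[:, 1] = rng.binomial(1, 0.80, N)
    X[:, 2] = rng.uniform(0.0, 2.0, N)
    X[:, 3] = rng.binomial(1, 0.50, N)
    X[:, 4] = rng.uniform(-2.0, 0.0, N)
    # Remaining covariates: correlated standard normals.
    cov = rho * np.ones((p - 5, p - 5)) + (1 - rho) * np.eye(p - 5)
    X[:, 5:] = rng.multivariate_normal(np.zeros(p - 5), cov, N)

    beta0 = np.zeros(p)
    beta0[:5] = [1.0, -1.0, 0.8, 1.2, -0.5]          # illustrative values
    eps = rng.standard_normal(N) if error == "normal" \
        else rng.gumbel(loc=0.0, scale=1.0, size=N)   # type-I extreme value
    Y = X @ beta0 + eps

    # Double truncation: only (L <= Y <= R) is observed, along with (L, R).
    L = rng.uniform(-4.0, 0.0, N)   # constants tuned for roughly 30-40% truncation
    R = L + 6.0
    keep = (L <= Y) & (Y <= R)
    return Y[keep], X[keep], L[keep], R[keep], beta0
```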
Under each scenario, 1000 replications are carried out. For variable selection, we use several criteria to assess performance. The first one is the model error (ME), which measures the discrepancy between an estimator $\hat\beta$ and the true coefficient vector $\beta_0$; the median and median absolute deviation (MAD) of the 1000 MEs are recorded to evaluate the prediction performance of the different procedures. The second one is the average number of total estimated zero coefficients (TN), that is, the average number of zero estimates obtained over the 1000 replications, which is split into the correctly estimated zero number (CN) and the incorrectly estimated zero number (IN). For CN, the closer it is to the true number of zero coefficients the better a procedure performs, while for IN, the closer it is to 0 the better. The third one is the ratio of correctly identified models (RCM), that is, the proportion of the 1000 replications in which the true model is identified exactly; the closer it is to 100%, the better a procedure performs. We consider three procedures: the proposed selection procedure (Proposed), the variable selection procedure with the adaptive LASSO penalty but without taking truncation into consideration (Naive), and the oracle procedure in which the correct subset of covariates is used to fit the model (Oracle). The results are summarized in the following tables.
[Insert Table 1 here]
[Insert Table 2 here]
From the tables, we see that the proposed method significantly outperforms the naive approach in terms of ME. This is expected since the naive approach yields biased estimators of the regression coefficients. For TN and RCM, the proposed method also performs better than the naive approach, although the extent of the improvement is not as large. TN and RCM are criteria for variable selection performance. Although selecting important variables appears to be less affected by double truncation, correcting the bias induced by double truncation still helps to increase the precision of the variable selection procedure.
4.2 A real data example
We analyze an astronomical data set from the catalog of 46,420 quasars produced by the Sloan Digital Sky Survey (SDSS) team. The main purpose is to predict the redshift of a quasar from some related features. Thus, the linear model is used with the redshift of the quasars as the dependent variable. Compared with the complete SDSS catalog, the data set used here omits some technical columns and therefore contains 23 covariates. More details about the data set can be found at https://astrostatistics.psu.edu/datasets.
It might not be immediately obvious why double truncation arises in the redshift data. It mainly comes from the nature of the observations made by an astronomical telescope. Because of the aperture of the telescope, the redshift of a quasar that can be observed is limited from above. For the current data, we take the maximum observed redshift value (5.4135) as the right truncation bound of the dependent variable; that is, a redshift larger than this value is right truncated. For low redshifts, the measurement might be unreliable: an observation should be omitted if the redshift value is lower than its estimated error. The estimated error is therefore treated as the left truncation bound of the redshift observation. Hence, the redshift value of a quasar is considered to be doubly truncated. Among the 23 covariates, some are measurement errors of the brightness, which are determined by the SDSS team from knowledge of the observing conditions, detector background, and other technical considerations. These covariates are not included in the model. We use 13 main covariates (shown in Table 3) to fit the linear model. Due to computational limitations, we do not use the full data to fit the model; a random sample of size 929 is drawn to illustrate the proposed method. The sampled data are split into a training part and a testing part in the proportion 7:3. We apply the proposed variable selection procedure to the training set to obtain a list of "important" covariates and then use the selected model for prediction on the testing set. The results of the variable selection are summarized in Table 4.
[Insert Table 3 here]
[Insert Table 4 here]
The proposed variable selection procedure identifies 5 "important" covariates with non-zero coefficients. We use the selected covariates for prediction on the testing set. In Figure 1, we draw a line plot comparing the predicted values with the true response values. The predicted values (dotted line) are quite close to the true values (solid line), indicating that the selected covariates give reasonable predictions.
[Insert Figure 1 here]
5 Concluding remarks
We combine the Mann-Whitney-type loss function and the adaptive LASSO penalty to develop a simultaneous estimation and variable selection procedure for doubly truncated linear models, which allows the number of regression parameters to grow to infinity with the sample size. To overcome the non-convexity of the objective function, an iterative algorithm is designed; in each iteration, the objective function to be optimized can be transformed into an LAD problem. The oracle property of the proposed variable selection procedure is established. We use a modified BIC to select the tuning parameter and adopt a random weighting approach for variance estimation. Simulation studies show reasonable finite sample performance of the proposed approach.
One of the main contributions of the current approach is allowing the dimension of the covariates to go to infinity with the sample size. However, with the penalty-based method, the growth rate of the covariate dimension cannot be very large. For ultra-high dimensional situations, a screening approach is needed first. This is an interesting direction for future study.
Appendix A Appendix
A.1 More details on the algorithm
Here we give more details on the algorithm discussed in Section 3.2. In the $k$-th iteration, collect the pairs $(i, j)$ with $\delta_{ij}^{(k)} = 1$, that is, the pairs that are comparable at $\hat\beta^{(k-1)}$, and index them by $m = 1, \ldots, M$. For the $m$-th such pair, define the pairwise differences $\tilde Y_m = Y_i^* - Y_j^*$ and $\tilde X_m = X_i^* - X_j^*$. Further define the matrices
$$\tilde{\mathbf{Y}} = (\tilde Y_1, \ldots, \tilde Y_M)^\top, \qquad \tilde{\mathbf{X}} = (\tilde X_1, \ldots, \tilde X_M)^\top,$$
and the augmented matrices
$$\mathbf{Y}^a = \begin{pmatrix} \tilde{\mathbf{Y}} \\ \mathbf{0}_{p_n} \end{pmatrix}, \qquad \mathbf{X}^a = \begin{pmatrix} \tilde{\mathbf{X}} \\ \mathbf{D} \end{pmatrix},$$
where $\mathbf{0}_{p_n}$ is a $p_n$-vector of zeros and $\mathbf{D}$ is a diagonal matrix with diagonal elements $\lambda_n w_1, \ldots, \lambda_n w_{p_n}$. Based on this notation, minimizing the objective function in (5) is equivalent to minimizing $\sum_{m=1}^{M + p_n} |Y^a_m - (X^a_m)^\top \beta|$ with respect to $\beta$, where $Y^a_m$ and $X^a_m$ denote the $m$-th element of $\mathbf{Y}^a$ and the $m$-th row of $\mathbf{X}^a$, respectively. This is a linear programming problem and can be solved efficiently through a standard unpenalized LAD procedure, such as the quantreg package in R. The estimation procedure can be summarized as follows.
(i) Obtain an initial estimate, for instance, the estimate obtained by ignoring both the double truncation and the regularization.
(ii) Iteratively solve the unpenalized optimization problem until convergence to obtain $\tilde\beta_n$.
(iii) Use $\tilde\beta_n$ to construct the adaptive weights, the objective function $Q_n(\beta)$, and the corresponding modified objective with the indicator fixed.
(iv) Use $\tilde\beta_n$ as the initial value and iteratively solve the penalized optimization problem by linear programming until convergence to obtain $\hat\beta_n$.
A.2 Proof of the large sample properties
To obtain the large sample properties presented in Section 3.3, some regularity conditions are needed. Let $P$ denote the distribution of an observed data point and $\mathcal{Z}$ its sample space, and let $\mathcal{N}_n$ be a neighborhood of $\beta_0$. Write the pairwise loss in (3) as a U-process with kernel $h(z_1, z_2; \beta)$ for $z_1, z_2 \in \mathcal{Z}$, and let $\nabla^m$ denote the $m$-th order derivative of a function of $\beta$ with respect to $\beta$. We assume the following conditions hold.
C1. The density function $f$ is bounded and has a bounded and continuous derivative.
C2. The parameter space $\mathcal{B}$ is compact and the true parameter value $\beta_0$ is an interior point of $\mathcal{B}$.
C3. Each component of the covariate vector has a bounded second moment.
C4. For all $z_1, z_2 \in \mathcal{Z}$, the derivatives $\nabla^m h(z_1, z_2; \beta)$, $m = 1, 2$, exist on $\mathcal{N}_n$ and are continuous at $\beta_0$.
C5. The empirical term in the Hoeffding decomposition of $L_n(\beta)$ is uniformly bounded, in an appropriate sense, over $\mathcal{N}_n$.
C6. There exist positive constants $c_1$ and $c_2$ such that, for all $n$,
$$c_1 \le \lambda_{\min}\{A_n\} \le \lambda_{\max}\{A_n\} \le c_2, \qquad c_1 \le \lambda_{\min}\{B_n\} \le \lambda_{\max}\{B_n\} \le c_2, \qquad (6)$$
where $\lambda_{\max}\{\cdot\}$ and $\lambda_{\min}\{\cdot\}$ denote the largest and smallest eigenvalues of a matrix, respectively.
C1 and C2 are mild conditions. C2 is used to prove the consistency of $\tilde\beta_n$, which is crucial to guarantee the consistency of the proposed estimator $\hat\beta_n$. C3 is sufficient for Condition 5 in Wang et al. (2019), so that their results on the high dimensional Hoeffding decomposition can be used to obtain the quadratic approximation of $L_n(\beta)$. C4 is needed for the Taylor expansion of $L_n(\beta)$ at $\beta_0$. C5 is used to bound the empirical term in the Hoeffding decomposition of $L_n(\beta)$. C6 confines the Hessian matrix of the quadratic approximation of the objective function and ensures that the covariance matrix of the score function is positive definite with uniformly bounded eigenvalues for all $n$; this provides justification for the component-wise asymptotic normality of $\hat\beta_{1n}$. Similar conditions can be found in related literature, e.g., Fan and Peng (2004), Cai et al. (2005), Cho and Qu (2013), and Ni et al. (2016).
We now present the proofs of the proposition and the theorems.
Proof of Proposition 1: Consider the U-process of order 2 with the kernel function of the loss $L_n(\beta)$, which belongs to a measurable class of symmetric functions with a non-negative envelope. According to Example 1 in Wang et al. (2019) and Lemma 4.4 in Pollard (1990), the pseudo-dimension of this class can be controlled. Provided $p_n$ grows sufficiently slowly, the condition on the pseudo-dimension in Proposition 1 of Wang et al. (2019) is guaranteed. Hence, we have the following quadratic approximation
where the leading term is a quadratic form in $\beta - \beta_0$. By constraining the growth rate of $p_n$, the pseudo-dimension grows at a controlled rate. Therefore, all the conditions listed in Theorem 3 of Wang et al. (2019) are satisfied, and the consistency of $\tilde\beta_n$ follows.∎
To prove Theorem 1, a lemma is needed.
Lemma A.1.
Assume that the conditions C1 to C6 hold. Then, for any constant vector $c_n$ with $\|c_n\| = 1$, the suitably standardized linear combination along $c_n$ of the score of $L_n(\beta)$ at $\beta_0$ converges in distribution to $N(0, 1)$.
Proof: It is not difficult to see that
Note that and . By C3 and C4, is bounded. By the Cauchy-Schwarz inequality, we have that
Thus, the Lyapunov condition holds. By the Lyapunov central limit theorem, the standardized linear combination converges in distribution to $N(0, 1)$.∎
Proof of Theorem 1: Let . It can be shown that for any and any constant vector with , there exists a large enough such that
This implies the existence of a local minimizer such that . Then we have that
We write that
where
and
Therefore, for large enough , dominates and in . Meanwhile,
For large enough , dominates . Since is positive definite, then . Consequently, with probability tending to 1 as for large enough .∎
To prove Theorem 2, another lemma is needed.
Lemma A.2.
Assume that the conditions C1 to C6 hold. Then, under the rate conditions on $p_n$ and $\lambda_n$ of Theorem 1, with probability tending to 1, any local minimizer of $Q_n(\beta)$ lying in a root-$(n/p_n)$ neighborhood of $\beta_0$ must have its components corresponding to $\beta_{20}$ equal to zero.
Proof: It is sufficient to show that with probability tending to 1, for any satisfying and , , have the same signs for . We have that
It can be shown that
and . Thus, . Following this, we have that
Since , dominates . Therefore, and have the same signs for with probability tending to 1 as .∎
Proof of Theorem 2: Part i) follows directly from Lemma A.2. To prove ii), note that, since $\hat\beta_{1n}$ minimizes the objective function over the non-zero components, we get that
(7)
where consists of the first components of . Rearrange (7) to get
By the condition that , we have that . By C1 and C3, it can be seen that for . Thus, and
Since for any nonzero constant vector with , converges in distribution to . Thus, by Slutsky’s theorem, we conclude that converges in distribution to .∎
Proof of Theorem 3: To state the proof more clearly, some more notation is needed. The underlying true model is denoted by $\mathcal{M}_0$. If we use $\hat\beta_\lambda$ to represent the estimator at the tuning parameter value $\lambda$, then the corresponding model is denoted by $\mathcal{M}_\lambda$. For any model $\mathcal{M}$, the corresponding estimated coefficients carry the subscript $\mathcal{M}$. Note that $\mathcal{M}_\lambda$ and $\mathcal{M}_0$ are two different objects and need to be distinguished carefully. Next, we divide the range of $\lambda$ into three disjoint regions, corresponding to the under-fitted models, the correctly fitted model, and the over-fitted models. An under-fitted model excludes at least one covariate with a non-zero true coefficient. An over-fitted model contains all covariates with non-zero true coefficients but also includes at least one covariate whose true coefficient is zero. We prove the theorem by considering the two parts: the under-fitted models and the over-fitted models.
For the under-fitted model part, when , the length of the vector is inconsistent with . In order to make and comparable, let us slightly modify the definition of here. In this part, we define for any model ,
By definition,
By C6, we can obtain that
(8)
By the condition , the term on the right side of (8) is , which is as we choose here. Similarly, we derive that
and
Combining the two results, we can obtain that
(9)
For the over-fitted model part, for any ,
The last line holds since for any , it must satisfy . Furthermore, as , so
This means that
(10)
Combining (9) and (10), we can conclude that
as $n \to \infty$.∎
Acknowledgment
The research of Ming Zheng was supported by the National Natural Science Foundation of China Grants (11771095). The research of Wen Yu was supported by the National Natural Science Foundation of China Grants (12071088).
References

Bhattacharya, P. K., Chernoff, H., and Yang, S. S. (1983), Nonparametric estimation of the slope of a truncated regression, The Annals of Statistics, 11, 505-514.

Bilker, W., and Wang, M.-C. (1996), Generalized Wilcoxon statistics in semiparametric truncation models, Biometrics, 52, 10-20.

Cai, J., Fan, J., Li, R., and Zhou, H. (2005), Variable selection for multivariate failure time data, Biometrika, 92, 303-316.

Cho, H. and Qu, A. (2013), Model selection for correlated data with diverging number of parameters, Statistica Sinica, 23, 901-927.

Efron, B., and Petrosian, V. (1999), Nonparametric methods for doubly truncated data, Journal of the American Statistical Association, 94, 824-834.

Fan, J. and Peng, H. (2004), Nonconcave penalized likelihood with a diverging number of parameters, The Annals of Statistics, 32, 928-961.

Greene, W. H. (2012), Econometric Analysis (7th Ed.), Prentice Hall, Upper Saddle River, NJ.

Keiding, N., and Gill, R. D. (1990), Random truncation models and Markov processes, The Annals of Statistics, 18, 582-602.

Kim, J. P., Lu, W., Sit, T., and Ying, Z. (2013), A unified approach to semiparametric transformation models under generalized biased sampling schemes, Journal of the American Statistical Association, 108, 217-227.

Lai, T. L., and Ying, Z. (1991a), Estimating a distribution function with truncated and censored data, The Annals of Statistics, 19, 417-442.

Lai, T. L., and Ying, Z. (1991b), Rank regression methods for left-truncated and right-censored data, The Annals of Statistics, 19, 531-556.

Liu, H., Ning, J., Qin, J., and Shen, Y. (2016), Semiparametric maximum likelihood inference for truncated or biased-sampling data, Statistica Sinica, 26, 1087-1115.

Lynden-Bell, D. (1971), A method of allowing for known observational selection in small samples applied to 3CR quasars, Monthly Notices of the Royal Astronomical Society, 155, 95-118.

Moreira, C. and Una-Alvarez, J. (2012), Kernel density estimation with doubly truncated data, Electronic Journal of Statistics, 6, 501-521.

Moreira, C. and Una-Alvarez, J. (2016), Nonparametric regression with doubly truncated data, Computational Statistics and Data Analysis, 93, 294-307.

Ni, A., Cai, J., and Zeng, D. (2016), Variable selection for case-cohort studies with failure time outcome, Biometrika, 103, 547-562.

Pollard, D. (1990), Empirical Processes: Theory and Applications, IMS, Hayward, CA.

Shen, P.-S. (2010), Nonparametric analysis of doubly truncated data, Annals of the Institute of Statistical Mathematics, 62, 835-853.

Shen, P.-S. (2013a), A class of rank-based tests for doubly-truncated data, TEST, 22, 83-102.

Shen, P.-S. (2013b), Regression analysis of interval censored data and doubly truncated data, Computational Statistics, 28, 581-596.

Tsai, W.-Y. (1990), Testing the independence of truncation time and failure time, Biometrika, 77, 167-177.

Tsui, K.-L., Jewell, N. P., and Wu, C. F. J. (1988), A nonparametric approach to the truncated regression problem, Journal of the American Statistical Association, 83, 785-792.

Turnbull, B. W. (1976), The empirical distribution function with arbitrarily grouped, censored and truncated data, Journal of the Royal Statistical Society, Ser. B, 38, 290-295.

Wang, M.-C., Jewell, N. P., and Tsai, W.-Y. (1986), Asymptotic properties of the product limit estimate under random truncation, The Annals of Statistics, 14, 1597-1605.

Wang, Z., Lin, Y., Liu, W., and Shao, Q. (2019), U-processes with increasing dimensional parameters, working paper.

Ying, Z., Yu, W., Zhao, Z., and Zheng, M. (2020), Regression analysis of doubly truncated data, Journal of the American Statistical Association, 115, 810-821.

Zou, H. (2006), The adaptive lasso and its oracle properties, Journal of the American Statistical Association, 101, 1418-1429.
Sample size | Truncation percentage | Method | ME Median | ME MAD | CN | IN | RCM (%)
---|---|---|---|---|---|---|---
300 | 0.3 | Proposed | 0.220 | 0.137 | 13.921 | 0.000 | 92.836
300 | 0.3 | Naive | 0.487 | 0.242 | 13.908 | 0.000 | 91.364
300 | 0.3 | Oracle | 0.109 | 0.076 | 14.000 | 0.000 | 100.000
300 | 0.4 | Proposed | 0.385 | 0.232 | 13.862 | 0.000 | 88.679
300 | 0.4 | Naive | 1.048 | 0.423 | 13.821 | 0.000 | 84.608
300 | 0.4 | Oracle | 0.158 | 0.112 | 14.000 | 0.000 | 100.000
500 | 0.3 | Proposed | 0.133 | 0.073 | 15.995 | 0.000 | 99.503
500 | 0.3 | Naive | 0.289 | 0.135 | 15.966 | 0.000 | 96.723
500 | 0.3 | Oracle | 0.062 | 0.043 | 16.000 | 0.000 | 100.000
500 | 0.4 | Proposed | 0.217 | 0.120 | 15.977 | 0.000 | 98.014
500 | 0.4 | Naive | 0.611 | 0.229 | 15.938 | 0.000 | 94.048
500 | 0.4 | Oracle | 0.084 | 0.057 | 16.000 | 0.000 | 100.000
Sample size | Truncation percentage | Method | ME Median | ME MAD | CN | IN | RCM (%)
---|---|---|---|---|---|---|---
300 | 0.3 | Proposed | 0.226 | 0.162 | 13.85 | 0.00 | 86.905
300 | 0.3 | Naive | 0.780 | 0.353 | 13.82 | 0.00 | 83.730
300 | 0.3 | Oracle | 0.129 | 0.098 | 14.00 | 0.00 | 100.000
300 | 0.4 | Proposed | 0.380 | 0.293 | 13.81 | 0.00 | 83.765
300 | 0.4 | Naive | 1.232 | 0.565 | 13.72 | 0.00 | 75.896
300 | 0.4 | Oracle | 0.184 | 0.132 | 14.00 | 0.00 | 100.000
500 | 0.3 | Proposed | 0.118 | 0.087 | 15.98 | 0.00 | 98.214
500 | 0.3 | Naive | 0.489 | 0.208 | 15.96 | 0.00 | 95.933
500 | 0.3 | Oracle | 0.075 | 0.048 | 16.00 | 0.00 | 100.000
500 | 0.4 | Proposed | 0.197 | 0.146 | 15.90 | 0.00 | 91.144
500 | 0.4 | Naive | 0.989 | 0.369 | 15.88 | 0.00 | 89.154
500 | 0.4 | Oracle | 0.107 | 0.072 | 16.00 | 0.00 | 100.000
Name | Description |
---|---|
R.A. | Right Ascension |
Dec. | Declination |
u_mag | Brightness in the u (ultraviolet) band in magnitudes. |
g_mag | Brightness in the g (green) band |
r_mag | Brightness in the r (red) band |
i_mag | Brightness in the i (more red) band |
z_mag | Brightness in the z (even more red) band |
Radio | Brightness in the radio band |
X-ray | Brightness in the X-ray band |
J_mag | Brightness in the near-infrared J band |
H_mag | Brightness in the near-infrared H band |
K_mag | Brightness in the near-infrared K band |
M_i | The absolute magnitude in the i band. |
No. | Covariate | EST | SEE
---|---|---|---
1 | R.A. | 0 | - |
2 | Dec. | 0 | - |
3 | u_mag | 0.340 | 0.024 |
4 | g_mag | -0.320 | 0.046 |
5 | r_mag | 0 | - |
6 | i_mag | 0.500 | 0.043 |
7 | z_mag | 0 | - |
8 | Radio | 0 | - |
9 | X-ray | 0 | -
10 | J_mag | 0.017 | 0.007 |
11 | H_mag | 0 | - |
12 | K_mag | 0 | - |
13 | M_i | -0.803 | 0.017 |
EST: estimated coefficient;
SEE: estimated standard error.