Reconstruction of luminosity function from flux-limited samples
Abstract
The properties of the progenitors of gamma-ray bursts (GRBs) and of their environment are encoded in their luminosity function and cosmic formation rate. They are usually recovered from a flux-limited sample based on Lynden-Bell’s method. However, this method is based on the assumption that the luminosity is independent of the redshift. Observationally, if correlated, people use nonparametric statistical method to remove this correlation through the transformation, , where is the burst redshift, and parameterizes the underlying luminosity evolution. However, the application of this method to different observations could result in very different luminosity functions. By the means of Monte Carlo simulation, in this paper, we demonstrate that the origin of an observed correlation, measured by the statistical method, is a complex combination of multiple factors when the underlying data are correlated. Thus, in this case, it is difficult to unbiasedly reconstruct the underlying population distribution from a truncated sample, unless the detailed information of the intrinsic correlation is accurately known in advance. In addition, we argue that an intrinsic correlation between luminosity function and formation rate is unlikely eliminated by a misconfigured transformation, and the , derived from a truncated sample with the statistical method, does not necessarily represent its underlying luminosity evolution.
keywords:
(transients:) gamma-ray bursts - methods: numerical - stars: luminosity function, mass function1 Introduction
For any astronomical source, there are two key properties that characterises the population: (a) their cosmic formation rate, representing the number of sources per unit comoving volume and time as a function of redshift; (b) their luminosity function, which represents the relative fraction of sources in a given luminosity range per unit volume. The statistical problem at hand is the determination of the two properties from flux-limited samples. Lynden-Bell (1971) applied a novel method to study the luminosity function and density evolution from a flux-limit quasar sample, which is called Lynden-Bell’s method. This method is based on the assumption that the luminosity is independent of the redshift. To overcome this shortcoming, Efron & Petrosian (1992) generalized Lynden-Bell’s idea and developed a non-parametric test statistic for independence, which is called non-parametric statistical method. These methods have been widely used to estimate the intrinsic luminosity function and cosmic formation rate of astronomical sources, such as galaxies (Kirshner et al., 1978; Peterson et al., 1986; Loh & Spillar, 1986), GRBs(Yonetoku et al., 2004; Lloyd-Ronning et al., 2002; Yonetoku et al., 2014; Kocevski & Liang, 2006), and quasars(Maloney & Petrosian, 1999; Singal et al., 2011).
Before applying Lynden-Bell’s method, one must first determine whether L and z are correlated or not. Traditionally, if correlated, many authors (e.g., Lloyd & Petrosian (1999); Maloney & Petrosian (1999); Lloyd-Ronning et al. (2002); Yonetoku et al. (2004); Yu et al. (2015)) parameterized it as the transformation, , where , the power-law redshift dependence is always adopted to parameterize the luminosity evolution. Once a function is found, one could remove the correlation and yield an uncorrelated data set , then their distributions could be estimated by using Lynden-Bell’s method.
By using this method to derive the luminosity function and the formation rate of GRBs, Yu et al. (2015) found that an unexpectedly low-redshift excess in the formation rate of GRBs, compared to the star formation rate (SFR). Whereas, following the same approach, other authors (Pescalli et al., 2016; Tsvetkova et al., 2017) did not.
More recently, Bryant et al. (2021) had re-analysed several previous works (Yu et al., 2015; Pescalli et al., 2016; Tsvetkova et al., 2017; Lloyd-Ronning et al., 2019) and investigated the origin of the evolution of the luminosity/energetics of GRBs with redshift based on the same approach, and found that the effects of the detection threshold have been likely severely underestimated. Then they argued that the observed correlations are artefacts of the individually chosen detection thresholds of the various gamma-ray detectors, and that an inappropriate use of this statistical method could lead to biased scientific discoveries.
So our questions arise: When one applies a truncation function to an intrinsically correlated population, what factors would impact on the distribution of the statistic? In the case, does the transformation decouple the intrinsic correlation between luminosity and redshift of astronomical sources?
In this paper, we will investigate the issues in detail by performing Monte Carlo simulations. An introduction to Lynden-Bell’s method and the nonparametric statistical method is given in Section 2. To investigate the factors affecting the statistic, in Section 3, we first apply the statistical method to a toy model, where the intrinsic correlations between two random variables are known. Then, in Section 4, applying the same approach to the realistic example in an astronomical context, i.e., the luminosity function and the formation rate of GRBs, we explore whether a correlated population distribution could be unbiasedly reconstructed by the transformation. Finally, in Section 5, we present our conclusions and discussions.
2 LYNDEN-BELL’S METHOD AND NONPARAMETRIC TEST METHOD
Following the description about Lynden-Bell’s method in (Ivezić et al., 2020) (see Figures of (4.8) and (4.9) in their Book), Here we give a summary description for the test statistic as follows.
Suppose and are the two random variables (RVs), if they are uncorrelated, the bivariate joint density of can be represented as
(1) |
Assuming that pairs are observable only if they satisfy the truncation function (Efron & Petrosian, 1992),
(2) |
here is a monotonic function of . In an astronomical context, can be considered as redshift, as absolute magnitude, and the truncation function as magnitude limit, as shown in Figure 4.10 in Ivezić et al. (2020).
In the following analysis, we will adopt a symmetric function (Ivezić et al. (2020) (seen in Fig.(1)) as
(3) |
where is a constant. To test for independence when the data is truncated, firstly, for ith object, one can define an associated set as
(4) |
where is the size of the observable sample. This is the largest -limited and -limited data subset for ith object, with elements. This region is shown in Fig. (1) as a black solid rectangle.
Secondly, sorting the set by , then the number of objects with in set is defined as , the rank for ith object. If X and Y are independent (Null hypothesis ), then is uniformly distributed between 1 to . The Efron-Petrosian test statistic is then,
(5) |
where , are the expected mean and the variance of , respectively. This is a specialized version of Kendell’s statistic. The statistic has mean 0 and variance 1 under . As pointed out by Efron & Petrosian (1992), the statistic will approximately follow as a standard normal distribution, , even for as small as 10 under .
Following classical statistical inference, one can accept if , and reject otherwise. The rejection probability of the test for independence would be approximately 0.10. Once accepting , the cumulative distributions for the two random variables are defined as (Ivezić et al., 2020)
(6) |
and
(7) |
Then,
(8) |
where it is assumed that are sorted (). Analogously, if is the number of objects in an associated set defined as . This region is also shown in Fig. (1) as a blue dashed rectangle. Then
(9) |
where it is also assumed that are sorted ().

3 Case of The Bivariate Normal Distribution
A bivariate distribution is a statistical method used to show the probability of occurrence of two random variables. In this section, we first use Monte Carlo simulation to test the statistic based on a truncated bivariate normal distribution when the relations between two random variables are known (similar to the toy model in Ivezić et al. (2020)), and next, we further investigate whether one can apply the statistical method to decouple a correlation between the physical quantities of a source, such as luminosity and redshift, generated by effects of evolution and the truncation or bias introduced by the flux limit.
RVs and have a joint normal distribution, . First step, we investigate this issue based on the assumption that there is none correlation between RVs and . We define a joint probability density function (PDF) from a truncated normal distribution based on the truncMVN python package111https://github.com/zachjennings/truncMVN, with the parameters of (=(0.67,0.33), (=(0.33,0.33), and , and and are truncated in the range of [0,1]. Then we generate a random sample with the size of objects from the population distribution with the emcee sampler222https://emcee.readthedocs.io/en/stable/index.html. Shown in Fig. (2) are the corner and the resultant one-cumulative distributions of the sample. We obtain the sample Pearson correlation coefficient, . To study the effect of selection (or truncated) function on the statistic, we apply three different selection functions with to the sample, respectively, and then obtain three truncated data sets. As seen in the upper panel of Fig. (2), the larger the , the smaller the proportion of truncated samples in the population.
For every truncated data set, we create pseudo samples and each sample contains observable objects333Our analysis shows that the distribution of the statistic is less sensitive to the size of observable sample, as pointed out in Efron & Petrosian (1992) that the has a short-tailed distribution if RVs and are uncorrelated. So here we also use a sample of objects for our analysis.. Then we calculate the distribution of the statistic based on Eq.(5) for these pseudo samples, and compare it to a standard normal distribution by using a KS-test. The results are shown in Fig. (3). The chance probabilities of the three tests are 0.999, 0.968 and 0.999, respectively, which indicates that the statistic follows well a standard normal distribution for all the three cases. Thus, with Monte Carlo simulations, for the first time, we confirm the conclusions proposed by Efron & Petrosian (1992), i.e., the statistic always follow a standard normal distribution for any truncated data as long as does hold.
At the same time, with these pseudo truncated samples, we also derive the one-dimensional cumulative distributions for the two random variables based on Eqs. of (8) and (9), and compare them to their corresponding population distributions by the KS-test, respectively. The comparison results are shown in Fig. (4), which indicates that their corresponding population distributions could be unbiasedly recovered. Note that here we normalize the sample distribution to its corresponding population distribution for comparison. Further investigations show that the reconstructed cumulative distributions are not sensitive to the truncated sample size we adopted.
In conclusion, the statistic is indeed a robust test statistic for independence of the truncated data. Once one accepts (i.e., and are truly independent), their population distribution could be unbiasedly reconstructed from truncated data with Eqs. of (8) and (9), irrespective of adopted selection functions. The fact shows that an observed correlation does unlikely come from a truncation effect as long as the does hold.
Now we wonder whether the statistic still obeys a standard normal distribution when the two random variables, and , are correlated. In that case, whether one would still unbiasedly recover the underlying population distribution from truncated data based on Eqs. of (8) and (9) or not?
With the truncMVN python package and following the same sampling method above, we can also create some pseudo samples from the population distributions with the different values of and , as adopted in Fig. (2). As pointed out by some authors (Kan & Robotti, 2017; Galarza et al., 2020), the moment of the truncated variable depends on the mean and covariance matrix of a bivariate population distribution, and one can find the explicit expression for low order moments of the truncated multivariate normal distribution in their papers. In this paper, we calculate the Pearson correlation coefficient () of the pseudo sample with numpy python library444https://numpy.org/. Finally, by sampling from different populations with different values of and , we obtain four samples with Pearson’s =0.15, 0.32, 0.50, and 0.73, respectively.
To answer the questions mentioned above, we also apply the same three truncation functions above to the four samples, respectively, and obtain three truncated data sets for every sample. Then, with the same methods as done in Figs. of (3) and (4), we calculate the distributions of the statistic and their corresponding one-dimensional cumulative distributions for RVs and , respectively. Meanwhile, the influence of observable sample size on the statistic is also explored. Finally, the results of these analyses are shown in Fig. (5).
Evidently, unlike the case of an uncorrelated bivariate distribution, we can draw the following conclusions from Fig. (5):
In all cases we have explored, the statistic no longer follow a standard normal distribution, but a normal distribution, defined as (where and are the average and standard variance of the distribution, respectively). The changes with the Pearson’s of population, observable sample size (), as well as different selection functions. But the is less sensitive to them, with in all cases (The is shown as the error bar of the data point in the figure), which means that the origin of an observed correlation, measured by the statistical method, is a complex combination of the three factors.
Further investigations show that, in that case, the underlying population distribution can not unbiasedly reconstructed from the truncated data based on Eqs. of (8) and (9).








4 Case of the luminosity function and the formation rate of GRBs
Now, let’s turn our attention to the realistic example in an astronomical context. It is often the case in the analysis of astronomical data that one is faced with reconstructing the joint bivariate distribution from truncated data. And the luminosity function and cosmic formation rate of GRBs (or other astronomical object) is one such set of bivariate data. For simplicity, it is often assumed that such a bivariate distribution is separable in the following form,
(10) |
where is the GRB Formation Rate and is GRB’s LF.
The GRB formation rate is usually assumed to trace the cosmic star formation rate (SFR). Here, we assume that the rate is purely proportional to the SFR, and parameterize it as (Hopkins & Beacom, 2006; Li, 2008)
(11) |
For the GRB’s LF, we adopt the Schechter function (Schechter, 1976) as follows
(12) |
where represents the power-law parameter for the faint-end and is the characteristic luminosity, while serves as the normalisation constant. Due to the typically large span of the luminosities, here we use logarithmic units in the Schechter function, written as follows,
(13) |
In the following analysis, we adopt arbitrary parameter values: and .
In the luminosity-redshift plane, the truncated function is the luminosity limit (The red dashed line in Fig.6), given as
(14) |
where and are the luminosity distance at redshift and flux-limit, respectively. Associated sets and for ith GRB(, ) can be defined as,
(15) |
and
(16) |
respectively (seen Fig.6).
With the same method done in sec. (3), we also draw a pseudo sample with from the joint probability function (Equation 10) with the emcee sampler. With Eqs. of (8) and (9), we could also reconstruct their population distributions well from the observable data of the pseudo sample truncated by the flux limit at . The results are shown in Fig. (6). Further investigations show that, when different values of the flux limit are adopted, the population distributions could always be unbiasedly recovered from their corresponding truncated data, indicating that the non-parametric statistical method can be applied to unbiasedly recover the underlying population distributions from truncated data regardless of adopted detection thresholds. The same conclusion could be arrived in the contex of an uncorrelated bivariate normal distribution.
Unfortunately, here’s the fact that the luminosities of GRBs are strongly correlated with their redshifts is a common feature (Qin et al., 2010; Deng et al., 2016; Yu et al., 2015; Petrosian et al., 2015; Pescalli et al., 2016; Lloyd-Ronning et al., 2019). In this case, one can not directly apply Lynden-Bell’s method to reconstruct their underlying parent distribution.
The popular method to eliminate the correlation (e.g.,Lloyd & Petrosian (1999); Maloney & Petrosian (1999); Lloyd-Ronning et al. (2002); Yonetoku et al. (2004); Yu et al. (2015)) is to parameterize it as the luminosity evolution through the transformation, , where parameterizes the luminosity evolution. Then one could extract the luminosity evolution by varying until . Once the function is found, one could reconstruct their underlying parent luminosity and redshift distributions from this uncorrelated data set, .
Now we verify the correctness of this approach by numerical simulations. We firstly produce a pseudo correlated sample from the sample shown in the upper panel of Fig. (6) by the transformation, , where , and is taken, which means that the information of intrinsic luminosity evolution is known accurately. The corner of the pseudo correlated sample is shown in the upper panel of Fig. (7)).
Next, we produce a truncated data set by the flux limit at (The data above the red line in the upper panel of Fig. (7) are observable). Based on the truncated data, we create pseudo samples, and each pseudo sample contains observable GRBs, the same as done in section 3. For every observable sample, we make the reverse transformation of the intrinsic luminosity evolution , and calculate the best at , defining it as . Next, with Eqs. of (8) and (9), we could calculate their corresponding luminosity and redshift distributions from these uncorrelated data sets . We fit a Gaussian function to the distribution of the , giving . The result is also shown in the lower panel of Fig. (7). As the reconstructed luminosity and redshift distributions are similar to those shown in the lower panel of Fig. (6), we do not plot them repeately. Further investigations show that, the distribution of the and the reconstructed luminosity and redshift distribution are less sensitive to both the adopted detector threshold and observable sample size. These results show that, if the detailed information of the luminosity evolution is accurately known, we could remove the effect of the evolution by making the reverse transformation of the intrinsic luminosity evolution, then unbiasedly reconstruct their underlying parent distribution by Lynden-Bell’s method. Some authors (Yu et al., 2015; Pescalli et al., 2016) came to similar conclusion.
However, this is not the case when the detailed information of intrinsic luminosity evolutions is not known. As shown in Fig (8) is an instance of the case, in which, we assume that the intrinsic luminosity evolves with redshift by the law of , where , is taken to parameterize its intrinsic luminosity evolution. Then, a bias reverse transformation function, such as, , is adopted (The same as that usually adopted by some authors in the astronomical context) to reconstruct their underlying parent population as done in Fig (7). We find that the at also obeys a Gaussian distribution. Then a Gaussian function is employed to fit. This yields a value of , which differs significantly from its intrinsic evolution index. In this case, although its redshift distribution can be unbiasedly recovered, the underlying luminosity distribution can not. Certainly, in this instance, if the reverse transformation function, , is adopted, their underlying parent population could also unbiasedly recovered.
The fact turns out that, in the reconstruction of an intrinsic luminosity function, if using a misconfigured transformation function, one does not unbiasedly recover its underlying parent population, though such a transformation does produce an uncorrelated truncated sample.







5 SUMMARY and DISCUSSION
In this work, we use Monte Carlo simulation to explore what factors would impact the statistic, and that how one could unbiasedly recover the underlying parent population from a truncated sample based on Lynden-Bell’s method. Our main results are as follows.
1. According to Efron & Petrosian (1992), the statistic, measured by Equation 5, always follows a standard normal distribution (see Fig.3) under the premise that the underlying bivariate variables are uncorrelated regardless of adopted selection functions. Under the condition, an uncorrelated bivariate population distribution could always be unbiasedly recovered from a truncated sample with Lynden-Bell’s nonparametric method (Please refer to Ivezić et al. (2020)).
2. On the contrary, when an observable sample comes from an underlying correlated bivariate population distribution, the statistic no longer obeys a standard normal distribution, but a normal distribution with both its average and standard variance changing with the Pearson’s coefficient () of the population, the observable sample size (), as well as different selection functions (see Fig.5), which indicates that, in this situation, the origin of the statistic is a complex combination of multiple factors. In this case, it is very difficult to unbiasedly recover the underlying population from a truncated sample with Lynden-Bell’s method by the transformation, unless the detailed information of the intrinsic correlation is known accurately in advance, then its corresponding reverse transformation is applied in the constructions (see Fig. 6).
3. In practice, the luminosity evolution form, , derived from a truncated sample with the statistical method, does not necessarily represent its underlying luminosity evolution.
By applying the transformation function, , to several GRBs samples, Bryant et al. (2021) found that the resulting is sensitive to be adopted detection thresholds. This fact may indicate that these GRBs samples may likely come from an intrinsic correlated population, according to what we find in Fig. (5). If so, it is extremely difficult to get the detailed information of the intrinsic correlation between the luminosity and redshift of GRBs.
Again, with the transformation method, some authors (Yu et al., 2015; Petrosian et al., 2015; Lloyd-Ronning et al., 2019) found a low-redshift excess in the formation rate of GRBs , whereas others did not (Pescalli et al., 2016; Tsvetkova et al., 2017). Whether the low-redshift excess results from an intrinsic physics? or from an inappropriate transformation method? or is biased by the sample size and completeness. It needs to be further investigated with a larger complete observed sample in the future.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (grant No. 12133003).
Data Availability
The data underlying this article will be shared on reasonable request to the corresponding author.
References
- Bryant et al. (2021) Bryant C. M., Osborne J. A., Shahmoradi A., 2021, MNRAS, 504, 4192
- Deng et al. (2016) Deng C.-M., Wang X.-G., Guo B.-B., Lu R.-J., Wang Y.-Z., Wei J.-J., Wu X.-F., Liang E.-W., 2016, ApJ, 820, 66
- Efron & Petrosian (1992) Efron B., Petrosian V., 1992, ApJ, 399, 345
- Galarza et al. (2020) Galarza C. E., Matos L. A., Dey D. K., Lachos V. H., 2020, arXiv e-prints, p. arXiv:2009.13488
- Hopkins & Beacom (2006) Hopkins A. M., Beacom J. F., 2006, ApJ, 651, 142
- Ivezić et al. (2020) Ivezić Ž., Connolly A. J., VanderPlas J. T., Gray A., 2020, Statistics, Data Mining, and Machine Learning in Astronomy. A Practical Python Guide for the Analysis of Survey Data, Updated Edition
- Kan & Robotti (2017) Kan R., Robotti C., 2017, Journal of Computational and Graphical Statistics, 26, 930
- Kirshner et al. (1978) Kirshner R. P., Oemler A. J., Schechter P. L., 1978, AJ, 83, 1549
- Kocevski & Liang (2006) Kocevski D., Liang E., 2006, ApJ, 642, 371
- Li (2008) Li L.-X., 2008, MNRAS, 388, 1487
- Lloyd & Petrosian (1999) Lloyd N. M., Petrosian V., 1999, ApJ, 511, 550
- Lloyd-Ronning et al. (2002) Lloyd-Ronning N. M., Fryer C. L., Ramirez-Ruiz E., 2002, ApJ, 574, 554
- Lloyd-Ronning et al. (2019) Lloyd-Ronning N. M., Aykutalp A., Johnson J. L., 2019, MNRAS, 488, 5823
- Loh & Spillar (1986) Loh E. D., Spillar E. J., 1986, ApJ, 307, L1
- Lynden-Bell (1971) Lynden-Bell D., 1971, MNRAS, 155, 95
- Maloney & Petrosian (1999) Maloney A., Petrosian V., 1999, ApJ, 518, 32
- Pescalli et al. (2016) Pescalli A., et al., 2016, A&A, 587, A40
- Peterson et al. (1986) Peterson B. A., Ellis R. S., Efstathiou G., Shanks T., Bean A. J., Fong R., Zen-Long Z., 1986, MNRAS, 221, 233
- Petrosian et al. (2015) Petrosian V., Kitanidis E., Kocevski D., 2015, ApJ, 806, 44
- Qin et al. (2010) Qin S.-F., Liang E.-W., Lu R.-J., Wei J.-Y., Zhang S.-N., 2010, MNRAS, 406, 558
- Schechter (1976) Schechter P., 1976, ApJ, 203, 297
- Singal et al. (2011) Singal J., Petrosian V., Lawrence A., Stawarz Ł., 2011, ApJ, 743, 104
- Tsvetkova et al. (2017) Tsvetkova A., et al., 2017, ApJ, 850, 161
- Yonetoku et al. (2004) Yonetoku D., Murakami T., Nakamura T., Yamazaki R., Inoue A. K., Ioka K., 2004, ApJ, 609, 935
- Yonetoku et al. (2014) Yonetoku D., Nakamura T., Sawano T., Takahashi K., Toyanago A., 2014, ApJ, 789, 65
- Yu et al. (2015) Yu H., Wang F. Y., Dai Z. G., Cheng K. S., 2015, ApJS, 218, 13