
Photometric Redshifts for Cosmology: Improving Accuracy and Uncertainty Estimates Using Bayesian Neural Networks

Evan Jones
University of California, Los Angeles
[email protected]
Tuan Do
University of California, Los Angeles
[email protected]
Bernie Boscoe
Occidental College
[email protected]
Yujie Wan
University of California, Los Angeles
[email protected]
Zooey Nguyen
University of California, Los Angeles
[email protected]
Jack Singal
University of Richmond
[email protected]
Abstract

We present results exploring the role that probabilistic deep learning models can play in cosmology from large scale astronomical surveys through estimating the distances to galaxies (redshifts) from photometry. Due to the massive scale of data coming from these new and upcoming sky surveys, machine learning techniques using galaxy photometry are increasingly adopted to predict galactic redshifts, which are important for inferring cosmological parameters such as the nature of dark energy. Associated uncertainty estimates are also critical measurements; however, common machine learning methods typically provide only point estimates and lack uncertainty information as outputs. We turn to Bayesian neural networks (BNNs) as a promising way to provide accurate redshift predictions paired with uncertainty estimates. We have compiled a new galaxy training dataset from the Hyper Suprime-Cam Survey, designed to mimic large surveys, but over a smaller portion of the sky. We evaluate the performance and accuracy of photometric redshift (photo-z) predictions from photometry using machine learning, astronomical, and probabilistic metrics. We find that while the Bayesian neural network does not perform as well as non-Bayesian neural networks when evaluated solely by point-estimate photo-z values, BNNs can provide the uncertainty estimates that are necessary for cosmology.

1 Introduction

As large astronomical surveys come online in the next few years, both from the ground with the Legacy Survey of Space and Time (LSST) at the Vera C. Rubin Observatory and from space with Euclid, many researchers are turning to machine learning to handle the exponentially increasing influx of data. However, common machine learning methods often provide only point estimates and do not generally provide accurate confidence intervals for specific predictions [12, 13, 3]. Accurate uncertainty estimates, whether from machine learning or elsewhere, are critical for measurements from these surveys because these measurements and uncertainties are not the end goals; rather, they are subsequent inputs into inference to constrain models of our Universe.

One crucial goal of these surveys is to determine the expansion history of the universe and with it the parameters describing dark energy. This determination ultimately relies on accurately and precisely measuring the redshifts of hundreds of millions of galaxies. Spectroscopically measuring galaxy redshifts, in which the light is split into hundreds of small bins of wavelength, is time-consuming and practically impossible for the sample sizes necessary to constrain cosmological parameters. Instead of measuring detailed galaxy spectra to determine redshift, one can take images of galaxies in a few large bins of wavelength (photometry). While galaxy photometry contains information about redshift, the observed variation in the intrinsic properties of galaxies makes it difficult to model a priori. Astronomers have therefore adopted data-driven approaches, using machine learning methods to estimate redshift from photometry [24, 1, 4, 27, 14, 29].

In this work, we investigate photometric redshift (photo-z) estimation using Bayesian neural networks (BNN), a type of probabilistic neural network (NN) [15]. Probabilistic neural networks, conceptualized in the 1990s [26], have previously been limited in their ability to process data at the scale required for photo-z estimation in large surveys because of their computational complexity. However, recent breakthroughs in conceptual understanding and computational capabilities (e.g. [10, 9]) now make probabilistic deep learning possible for cosmology. Probabilistic deep learning such as BNNs has several advantages over traditional neural networks: better uncertainty representations, better point predictions, and improved interpretability, because these models can be viewed through the lens of probability theory. In this way we can draw upon decades of development in Bayesian inference analyses. We compare the performance of a BNN to a fully-connected non-probabilistic NN and evaluate the accuracy of the confidence intervals of the probabilistic predictions. To our knowledge this is the first application of BNNs to photo-z estimation. We compare the NN to the BNN in order to assess the effect of incorporating Bayesian statistics into photo-z estimation; we intentionally compose both models to be as similar as possible.

2 Data and Methods

2.1 Data: Galaxy observations

Figure 1: Left: a typical galaxy (z = 0.48) image in the i-band. Middle: five-band photometry for the same galaxy. Right: the N(z) distribution for the dataset discussed in §2.1. For the photo-z determinations in this work we use training and testing sets consisting of 229,120 and 28,640 galaxies respectively.

For the analysis in this work we compile a dataset intended to approximate the data produced by future large-scale deep surveys for photo-z estimation [6]. We use the Hyper Suprime-Cam (HSC) Public Data Release 2 (PDR2) [1], which is designed to reach similar depths as LSST but over a smaller portion of the sky. We crossmatched galaxy photometry from HSC with the HSC collection of publicly available spectroscopic redshifts [18, 2, 20, 25, 21, 17, 11, 19, 8, 22, 5, 7] using the galaxies’ sky positions (d < 1 arcsecond). We use data quality cuts similar to [23] and [24] (see https://doi.org/10.5281/zenodo.5528827 for the full list). We use the spectroscopic redshift values as the ground truth for training and evaluation. We also select only one set of g,r,i,z,y measurements per galaxy. In total, our data consist of 286,401 galaxies with broad-band g,r,i,z,y photometry from the HSC PDR2 survey and spectroscopic redshifts. The majority of galaxies in our sample lie between redshifts of 0.01 and 2.5 (see N(z) in Fig. 1). We use 80% of the galaxies for training, 10% for validation, and 10% for testing.
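As a concrete illustration, the 80/10/10 split can be sketched in a few lines of numpy; the function name, random seed, and the assignment of the one remainder galaxy are our own illustrative choices, not details of the survey pipeline:

```python
import numpy as np

def split_dataset(n_galaxies, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle galaxy indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_galaxies)
    n_train = int(train_frac * n_galaxies)
    n_val = int(val_frac * n_galaxies)
    return (idx[:n_train],                      # 80% for training
            idx[n_train:n_train + n_val],       # 10% for validation
            idx[n_train + n_val:])              # remaining ~10% for testing

# 286,401 galaxies as in the dataset described above
train_idx, val_idx, test_idx = split_dataset(286_401)
```

With 286,401 galaxies this yields 229,120 training galaxies, matching the counts quoted in Fig. 1.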

2.2 Network architectures

We compare the performance of a NN and BNN using similar architectures. Both the NN and the BNN are implemented in TensorFlow and have five input nodes for photometry followed by four hidden layers (200 nodes per layer with rectified linear activation). Both networks also have a skip connection between the input nodes and the final layer. The NN has a single output node that produces a point-estimate photo-z prediction. The BNN has a final output node that produces a mean and standard deviation, assuming a Gaussian distribution for each photo-z prediction. For the BNN we use a negative log likelihood loss function with RMS error as the metric, while the NN uses a mean absolute error loss function. Both models use the Adam optimizer. We train on an AMD Ryzen Threadripper PRO 3955WX (16 cores) with an NVIDIA RTX A6000 GPU. Training and evaluation runtimes are typically under 30 minutes.
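To make the architecture concrete, the following is a minimal numpy sketch of a forward pass through the BNN described above: five photometry inputs, four ReLU hidden layers of 200 nodes, a skip connection from the inputs into the final layer, and a Gaussian (mean, sigma) output. The function names and the initialization scheme are illustrative assumptions only; the actual models are implemented in TensorFlow.

```python
import numpy as np

def init_params(n_in=5, n_hidden=200, n_layers=4, seed=0):
    """Randomly initialize weights for the small fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers
    params = [(rng.normal(0, np.sqrt(2 / m), (m, n)), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
    # The final layer sees the last hidden layer plus the skip-connected
    # inputs, and outputs (mean, log sigma) of a Gaussian photo-z PDF.
    params.append((rng.normal(0, 0.01, (n_hidden + n_in, 2)), np.zeros(2)))
    return params

def forward(photometry, params):
    """Map g,r,i,z,y magnitudes to a photo-z mean and standard deviation."""
    h = photometry
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)                       # ReLU hidden layers
    W, b = params[-1]
    out = np.concatenate([h, photometry], axis=-1) @ W + b   # skip connection
    mean, log_sigma = out[..., 0], out[..., 1]
    return mean, np.exp(log_sigma)                           # sigma > 0 by construction

params = init_params()
mean, sigma = forward(np.array([22.1, 21.5, 21.0, 20.8, 20.7]), params)
```

Parameterizing the output as (mean, log sigma) keeps the predicted standard deviation strictly positive without constraining the network weights.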

While optimal hyperparameters for each network such as the number of nodes per layer and number of layers differed slightly between the BNN and NN, we find the difference in performance is negligible and use similar architectures when possible for the sake of comparison. We choose the negative log-likelihood loss function for the BNN because it has been shown to be more effective than MAE for probabilistic NNs [16].
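For reference, the Gaussian negative log-likelihood loss can be written compactly. This is a hedged numpy sketch (the TensorFlow implementation may differ by constants) illustrating why NLL, unlike MAE, penalizes over-confident predictions:

```python
import numpy as np

def gaussian_nll(z_spec, z_mean, z_sigma):
    """Mean negative log-likelihood of the true redshifts under the
    predicted Gaussian photo-z distributions N(z_mean, z_sigma)."""
    return np.mean(0.5 * np.log(2 * np.pi * z_sigma**2)
                   + 0.5 * ((z_spec - z_mean) / z_sigma)**2)

# A wrong mean with a small sigma (over-confidence) costs more than the
# same wrong mean with an honest, larger sigma.
z_spec = np.array([0.5, 1.0])
z_mean = np.array([0.7, 1.2])
loose = gaussian_nll(z_spec, z_mean, z_sigma=np.array([0.3, 0.3]))
tight = gaussian_nll(z_spec, z_mean, z_sigma=np.array([0.05, 0.05]))
```

An MAE loss would score both predictions identically, since it ignores the reported sigma entirely.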

Figure 2: Top: NN architecture. Bottom: BNN architecture. The inputs for both networks are five-band photometry in the g,r,i,z,y filters. The output of the NN is a single point-estimate photo-z, while the output of the BNN is a photo-z PDF, which we sample to obtain a photo-z estimate. We assume Gaussianity for the photo-z PDF, so the photo-z uncertainty is given by the standard deviation of the PDF.
Figure 3: Results here are binned into groups of 0.1 redshift and averaged over each bin. Top row - Model performance with the NN and BNN versus redshift. Bottom row - Left: photo-z uncertainty produced by the BNN scaled by true redshift. Middle: coverage of photo-z predictions with the BNN as discussed in §2.3. Right: outliers produced by the NN and BNN as defined in Eqs. 1 and 2.

2.3 Metrics

To measure model performance we evaluate predictions using the metrics in Table 1. We define “outliers” in Eq. 1, where $z_{phot}$ and $z_{spec}$ are the estimated photo-z and the actual (spectroscopically determined) redshift of the galaxy. An advantage of the BNN is that the model naturally outputs an uncertainty for each photo-z prediction; using these uncertainties we can consider an additional quality metric, defined in Eq. 2, where the uncertainty $\sigma$ is the standard deviation of the photo-z PDF produced by the BNN. The RMS photo-z error of a determination is given by the standard definition in Eq. 3, where $n_{gals}$ is the number of galaxies in the evaluation testing set and $\Sigma_{gals}$ represents a sum over those galaxies. Bias and dispersion are defined in Eqs. 5 and 6, where MAD is the median absolute deviation. We follow [28] and define a loss function in Eq. 7 to characterize the point-estimate photo-z accuracy with a single number, where we use $\gamma=0.15$.

Finally, a key metric in assessing the performance of the BNN is ‘coverage’, which we use to determine whether we have accurate uncertainties. Coverage is the fraction of galaxies that have a spectro-z within their 68% confidence interval. Ideally, 68% of evaluated galaxies should have true spectro-zs within their 68% confidence interval. If more than 68% of evaluated galaxies have spectro-zs within their 68% confidence interval, the galaxies are considered ‘over-covered’ because their photo-z uncertainties are too large. The same logic applies to ‘under-covered’ galaxies.

Table 1: Metrics used to assess model performance.

Point Metrics:
  Outlier (1): $O: \frac{|z_{phot}-z_{spec}|}{1+z_{spec}} > 0.15$
  RMS error (3): $\sqrt{\frac{1}{n_{gals}}\Sigma_{gals}\left(\frac{z_{phot}-z_{spec}}{1+z_{spec}}\right)^{2}}$
  Bias (5): $b = \frac{z_{phot}-z_{spec}}{1+z_{spec}}$
  MAD (6): $\mathrm{Median}(|\Delta z - \mathrm{Median}(\Delta z_{i})|)$
  Loss (7): $L(\Delta z) = 1 - \frac{1}{1+(\Delta z/\gamma)^{2}}$

Probabilistic Metrics:
  Bayesian outlier (2): $O_{b}: \frac{|z_{phot}-z_{spec}|-\sigma}{1+z_{spec}} > 0.15$
  Coverage (4): $\displaystyle\sum_{i}^{n_{gals}} \frac{(\bar{z}_{pdf,i}-z_{spec,i}) < z_{\sigma,i}}{n_{gals}}$
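These metrics can be sketched in numpy as follows. The function and key names are our own; we interpret the coverage condition (Eq. 4) as the 68% interval test of §2.3, i.e. the true redshift falling within one predicted sigma of the mean, and we report the bias as the mean of Eq. 5 over the sample.

```python
import numpy as np

def photo_z_metrics(z_phot, z_spec, sigma=None, gamma=0.15):
    """Point and probabilistic photo-z metrics following Table 1."""
    dz = (z_phot - z_spec) / (1 + z_spec)        # normalized residual
    out = {
        "outlier_rate": np.mean(np.abs(dz) > 0.15),            # Eq. 1
        "rms": np.sqrt(np.mean(dz**2)),                        # Eq. 3
        "bias": np.mean(dz),                                   # Eq. 5 (sample mean)
        "mad": np.median(np.abs(dz - np.median(dz))),          # Eq. 6
        "loss": np.mean(1 - 1 / (1 + (dz / gamma)**2)),        # Eq. 7
    }
    if sigma is not None:
        out["bayes_outlier_rate"] = np.mean(                   # Eq. 2
            (np.abs(z_phot - z_spec) - sigma) / (1 + z_spec) > 0.15)
        out["coverage"] = np.mean(                             # Eq. 4 (68% interval)
            np.abs(z_phot - z_spec) < sigma)
    return out
```

A perfect predictor yields zero for every point metric and a coverage of 1.0 for any positive sigma, so in practice a well-calibrated model should land near 0.68 coverage rather than 1.0.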

3 Results

We compare the performance of the NN and BNN on the data discussed in §2.1 in Table 2. The percentage of point-estimate outlier predictions as defined by Eq. 1 differs between the two models: 6.5% of predictions are outliers for the NN versus 17.3% for the BNN. With the modified outlier metric (Eq. 2) discussed in §2.3 we obtain $O_{b}=6.3\%$ for the BNN. Fig. 3 contains results from an example determination with a BNN and non-Bayesian NN, where results are divided into bins of size $z=0.1$ and averaged. We note that both models generally perform worse at higher redshifts, due in large part to the reduced signal to noise of distant dim sources and to the disproportionately small number of high redshift sources (z > 2.5) compared to low redshift sources. The latter is an unavoidable attribute of such datasets. We note that low redshifts are generally over-covered, indicating that their photo-z uncertainties are over-estimated, while the photo-z uncertainties of high redshift galaxies are generally under-estimated.

While our goal in this work is to compare the two types of NNs, we provide a comparison to LSST requirements [6] (Table 2) for reference. The NN meets the LSST goal for outlier rate and bias. We believe both models could be further optimized for these requirements.

Table 2: Comparison of BNN to NN performance averaged over all evaluation galaxies for a sample determination. We include LSST science requirements for reference when possible.
Network     O        O_b     RMS     |b|      MAD     L(Δz)   coverage  σ/(1+z_spec)
BNN         0.173    0.063   0.225   0.007    0.074   0.22    0.78      0.005
NN          0.065    -       0.174   0.002    0.023   0.095   -         -
LSST Req.   < 0.15   -       -       < 0.003  < 0.02  -       -         -

4 Discussion

Compared to the non-probabilistic NN, the BNN has the advantage of producing uncertainty constraints on every prediction, which are necessary for using photo-z estimation as a probe of cosmological parameters. Based on the results in this work, the BNN has the disadvantage of generally producing worse point estimates than the NN; however, pairing photo-z predictions with uncertainties provides a more robust view of the quality of photo-z predictions. We note that the BNN in this work was designed with the intention to closely resemble the non-probabilistic NN, and therefore our findings may not generalize to other BNN models. Optimal photo-z performance from the BNN may require significant model adjustments, which is an ongoing study. Fig. 3 (bottom right) visualizes the $O$ and $O_{b}$ rates per redshift bin; as expected, the number of outliers decreases when the photo-z uncertainty is taken into account. The uncertainties produced by the BNN are larger than expected for $0<z<2.5$ and are underestimated for $2.5<z<4$. This is not necessarily a flaw inherent to a BNN, but it is worth investigating how to develop better uncertainty estimates with broad-band photometry. One possible source of the over-estimation of photo-z uncertainties is a disparity between the complexity present in the band magnitudes and that of the BNN model; we use five photometric band fluxes paired with a single spectroscopic redshift per galaxy for training, while the number of model parameters optimized during training can easily reach into the thousands. In future work we will apply a Bayesian convolutional neural network to galaxy photometric images, which we believe will exploit information beyond the band fluxes.

References

  • [1] Hiroaki Aihara et al. “Second data release of the Hyper Suprime-Cam Subaru Strategic Program” In Publications of the Astronomical Society of Japan 71.6, 2019, pp. 114 DOI: 10.1093/pasj/psz103
  • [2] E. Bradshaw et al. “High-velocity outflows from young star-forming galaxies in the UKIDSS Ultra-Deep Survey” In Monthly Notices of the Royal Astronomical Society 433.1, 2013, pp. 194–208 DOI: 10.1093/mnras/stt715
  • [3] D. Carrasco et al. “Photometric classification of quasars from RCS-2 using Random Forest” In Astronomy & Astrophysics 584, 2015, pp. A44 DOI: 10.1051/0004-6361/201525752
  • [4] M. Carrasco Kind and R. Brunner “TPZ: Photometric redshift PDFs and ancillary information by using prediction trees and random forests” In Mon. Not. Roy. Astron. Soc. 432, 2013, pp. 1483 DOI: 10.1093/mnras/stt574
  • [5] Alison L. Coil et al. “THE PRISM MULTI-OBJECT SURVEY (PRIMUS). I. SURVEY OVERVIEW AND CHARACTERISTICS” In The Astrophysical Journal 741.1, 2011, pp. 8 DOI: 10.1088/0004-637X/741/1/8
  • [6] The LSST Dark Energy Science Collaboration et al. “The LSST Dark Energy Science Collaboration (DESC) Science Requirements Document” arXiv: 1809.01669 version: 2 In arXiv:1809.01669 [astro-ph], 2021 URL: http://arxiv.org/abs/1809.01669
  • [7] Richard J. Cool et al. “THE PRISM MULTI-OBJECT SURVEY (PRIMUS). II. DATA REDUCTION AND REDSHIFT FITTING” In The Astrophysical Journal 767.2, 2013, pp. 118 DOI: 10.1088/0004-637X/767/2/118
  • [8] Marc Davis et al. “Science Objectives and Early Results of the DEEP2 Redshift Survey” arXiv: astro-ph/0209419 In arXiv:astro-ph/0209419, 2003, pp. 161 DOI: 10.1117/12.457897
  • [9] Michael W. Dusenberry et al. “Analyzing the role of model uncertainty for electronic health records” In Proceedings of the ACM Conference on Health, Inference, and Learning Toronto Ontario Canada: ACM, 2020, pp. 204–213 DOI: 10.1145/3368555.3384457
  • [10] Angelos Filos et al. “A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks” arXiv: 1912.10481 In arXiv:1912.10481 [cs, eess, stat], 2019 URL: http://arxiv.org/abs/1912.10481
  • [11] B. Garilli et al. “The VIMOS Public Extragalactic Survey (VIPERS)”, 2014, pp. 18
  • [12] Philip Graff, Farhan Feroz, Michael P. Hobson and Anthony Lasenby “SkyNet: an efficient and robust neural network training tool for machine learning in astronomy” In Monthly Notices of the Royal Astronomical Society 441.2, 2014, pp. 1741–1759 DOI: 10.1093/mnras/stu642
  • [13] E. Jones and J. Singal “Analysis of a custom support vector machine for photometric redshift estimation and the inclusion of galaxy shape information” In Astronomy & Astrophysics 600, 2017, pp. A113 DOI: 10.1051/0004-6361/201629558
  • [14] E. Jones and J. Singal “Tests of Catastrophic Outlier Prediction in Empirical Photometric Redshift Estimation with Redshift Probability Distributions” In Publications of the Astronomical Society of the Pacific 132.1008, 2020, pp. 024501 DOI: 10.1088/1538-3873/ab54ed
  • [15] Laurent Valentin Jospin et al. “Hands-on Bayesian Neural Networks – a Tutorial for Deep Learning Users” arXiv: 2007.06823 In arXiv:2007.06823 [cs, stat], 2020 URL: http://arxiv.org/abs/2007.06823
  • [16] Balaji Lakshminarayanan, Alexander Pritzel and Charles Blundell “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles” In Advances in Neural Information Processing Systems 30 Curran Associates, Inc., 2017 URL: https://proceedings.neurips.cc/paper/2017/hash/9ef2ed4b7fd2c810847ffa5fa85bce38-Abstract.html
  • [17] O. Le Fèvre et al. “The VIMOS VLT Deep Survey final data release: a spectroscopic sample of 35 016 galaxies and AGN out to z ~ 6.7 selected with 17.5 ≤ i_AB ≤ 24.75” In Astronomy & Astrophysics 559, 2013, pp. A14 DOI: 10.1051/0004-6361/201322179
  • [18] Simon J. Lilly et al. “THE zCOSMOS 10k-BRIGHT SPECTROSCOPIC SAMPLE” In The Astrophysical Journal Supplement Series 184.2, 2009, pp. 218–229 DOI: 10.1088/0067-0049/184/2/218
  • [19] J. Liske et al. “Galaxy And Mass Assembly (GAMA): end of survey report and data release 2” In Monthly Notices of the Royal Astronomical Society 452.2, 2015, pp. 2087–2126 DOI: 10.1093/mnras/stv1436
  • [20] R.. McLure et al. “The sizes, masses and specific star formation rates of massive galaxies at 1.3 < z < 1.5: strong evidence in favour of evolution via minor mergers” In Monthly Notices of the Royal Astronomical Society 428.2, 2013, pp. 1088–1106 DOI: 10.1093/mnras/sts092
  • [21] Ivelina G. Momcheva et al. “THE 3D-HST SURVEY: HUBBLE SPACE TELESCOPE WFC3/G141 GRISM SPECTRA, REDSHIFTS, AND EMISSION LINE MEASUREMENTS FOR 100,000 GALAXIES” In The Astrophysical Journal Supplement Series 225.2, 2016, pp. 27 DOI: 10.3847/0067-0049/225/2/27
  • [22] Jeffrey A. Newman et al. “THE DEEP2 GALAXY REDSHIFT SURVEY: DESIGN, OBSERVATIONS, DATA REDUCTION, AND REDSHIFTS” In The Astrophysical Journal Supplement Series 208.1, 2013, pp. 5 DOI: 10.1088/0067-0049/208/1/5
  • [23] Atsushi J. Nishizawa, Bau-Ching Hsieh, Masayuki Tanaka and Tadafumi Takata “Photometric Redshifts for the Hyper Suprime-Cam Subaru Strategic Program Data Release 2” arXiv: 2003.01511 In arXiv:2003.01511 [astro-ph], 2020 URL: http://arxiv.org/abs/2003.01511
  • [24] S. Schuldt et al. “Photometric redshift estimation with a convolutional neural network: NetZ” In Astronomy & Astrophysics 651, 2021, pp. A55 DOI: 10.1051/0004-6361/202039945
  • [25] Rosalind E. Skelton et al. “3D-HST WFC3-SELECTED PHOTOMETRIC CATALOGS IN THE FIVE CANDELS/3D-HST FIELDS: PHOTOMETRY, PHOTOMETRIC REDSHIFTS, AND STELLAR MASSES” In The Astrophysical Journal Supplement Series 214.2, 2014, pp. 24 DOI: 10.1088/0067-0049/214/2/24
  • [26] Donald F. Specht “Probabilistic neural networks” In Neural Networks 3.1, 1990, pp. 109–118 DOI: 10.1016/0893-6080(90)90049-Q
  • [27] Masayuki Tanaka “PHOTOMETRIC REDSHIFT WITH BAYESIAN PRIORS ON PHYSICAL PROPERTIES OF GALAXIES” In The Astrophysical Journal 801.1, 2015, pp. 20 DOI: 10.1088/0004-637X/801/1/20
  • [28] Masayuki Tanaka et al. “Photometric redshifts for Hyper Suprime-Cam Subaru Strategic Program Data Release 1” In Publications of the Astronomical Society of Japan 70.SP1, 2018 DOI: 10.1093/pasj/psx077
  • [29] M. Wyatt and J. Singal “Outlier Prediction and Training Set Modification to Reduce Catastrophic Outlier Redshift Estimates in Large-scale Surveys” ADS Bibcode: 2021PASP..133d4504W In Publications of the Astronomical Society of the Pacific 133, 2021, pp. 044504 DOI: 10.1088/1538-3873/abe5fb