2 School of Astronomy and Space Science, University of Chinese Academy of Sciences, 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
3 School of Mechanical, Electrical and Information Engineering, Shandong University, 180 Wenhua Xilu, Weihai, 264209, Shandong, People’s Republic of China
Low surface brightness galaxies from BASS+MzLS with Machine Learning
Abstract
From 5000 deg$^2$ of the combination of the Beijing-Arizona Sky Survey (BASS) and the Mayall $z$-band Legacy Survey (MzLS), which together form the northern sky region of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys, we select a sample of 31,825 low surface brightness galaxy (LSBG) candidates with mean effective surface brightness 24.2 < $\bar{\mu}_{{\rm eff},g}$ < 28.8 mag arcsec$^{-2}$ and half-light radius 2.5′′ < $r_{\rm eff}$ < 20′′, based on the released photometric catalogue and a machine learning model. The LSBGs show a bimodal distribution in $g-r$ color, indicating two distinct populations of blue ($g-r$ < 0.60) and red ($g-r$ > 0.60) LSBGs. The blue LSBGs appear spiral, disk-like, or irregular, while the red LSBGs are spheroidal or elliptical, showing that color correlates strongly with morphology for LSBGs. In the spatial distribution, the blue LSBGs are more uniformly distributed while the red ones are highly clustered, indicating that red LSBGs preferentially populate denser environments than blue LSBGs. Both populations have consistent distributions of ellipticity (median 0.3), half-light radius (median 4′′), and Sérsic index (median $n$ = 1), implying that the full sample is dominated by round, disk galaxies. This sample extends the study of LSBGs to lower surface brightness, fainter magnitudes, and a broader range of other properties than previous SDSS-based samples.
keywords:
catalogues – galaxies: disc – galaxies: fundamental parameters – galaxies: statistics – techniques: photometric
1 INTRODUCTION
Low surface brightness galaxies (LSBGs) are traditionally defined as galaxies whose $B$-band central surface brightness ($\mu_{0,B}$) is fainter than a threshold value between 21.65 and 23.0 mag arcsec$^{-2}$ (Freeman 1970; Impey & Bothun 1997; O’Neil et al. 1997; Zhong et al. 2008; Du et al. 2015). The central surface brightness $\mu_0$ in other optical or near-infrared bands (Courteau 1996; Adami et al. 2006; Monnier Ragaigne et al. 2003) has also been adopted to distinguish LSBGs from high surface brightness galaxies (HSBGs). Besides $\mu_0$, the mean surface brightness within the effective radius ($\bar{\mu}_{\rm eff}$) has been used to define LSBGs. For example, Greco et al. (2018) and Tanoglidis et al. (2021b) selected LSBGs with the $g$-band criterion $\bar{\mu}_{\rm eff}$ > 24.2 - 24.3 mag arcsec$^{-2}$, which allows nucleated galaxies to be retained in the sample.
LSBGs are characterized by diffuse, extended, low-density stellar discs, and most of them are blue in color (de Blok et al. 1996; Burkholder et al. 2001; O’Neil et al. 2004; Trachternach et al. 2006; Vorobyov et al. 2009; Zhang et al. 2024). In morphology, they are disk-like or irregular (de Blok & McGaugh 1996, 1997; de Blok et al. 2001). Compared to HSBGs, LSBGs have low star formation rates (van der Hulst et al. 1993; van Zee et al. 1997; van den Hoek et al. 2000; Wyder et al. 2009; Schombert et al. 2011; Galaz et al. 2011; Lei et al. 2018, 2019; Galaz et al. 2022), low metallicities (de Blok & van der Hulst 1998a, b; Kuzio de Naray et al. 2004; Du et al. 2017), high gas fractions (Huang et al. 2014; Du et al. 2015; He et al. 2020), low dust content (Matthews et al. 2001; Rahman et al. 2007; Hinz et al. 2007), and a low AGN fraction (Galaz et al. 2011), which indicates that LSBGs differ from HSBGs in their star formation and evolutionary histories. It is therefore vital to study LSBGs to complete the current paradigm of galaxy formation and evolution. Moreover, LSBGs contribute approximately 20% (Minchin et al. 2004) of the dynamical mass of galaxies in the universe and 30-60% (McGaugh et al. 1995; McGaugh 1996; Bothun et al. 1997; O’Neil et al. 2000; Trachternach et al. 2006; Haberzettl et al. 2007; Martin et al. 2019) of the number density of galaxies in the local universe, so they play a significant role in our understanding of the universe.
Past studies of LSBGs were primarily concentrated on small regions such as massive galaxy clusters (Sabatini et al. 2005; van Dokkum et al. 2015; Venhola et al. 2017), satellites of nearby galaxies (Martin et al. 2013; Cohen et al. 2018), and other nearby clusters. With the advancement of observational technology and the emergence of larger, more sensitive telescopes, it has become possible to carry out untargeted searches for LSBGs in deep, wide-field imaging surveys, which have revealed a large number of LSBGs in recent decades. For example, Zhong et al. (2008) established a population of 12,282 face-on LSBGs from the Sloan Digital Sky Survey (SDSS; York et al. 2000) DR4. Greco et al. (2018) discovered 781 LSBGs with an untargeted search in the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP; Aihara et al. 2018b). Tanoglidis et al. (2021b) constructed a large sample of 23,790 LSBGs based on the first three years of data from the Dark Energy Survey (DES; The Dark Energy Survey Collaboration 2005). In addition, the SDSS DR7 imaging survey and the $\alpha$.40 Arecibo Legacy Fast ALFA (ALFALFA) Survey (Giovanelli 2007) have been combined to search for galaxies with low optical surface brightness and abundant neutral hydrogen gas (Du et al. 2015; He et al. 2020). More recently, Zaritsky et al. (2022, 2023) selected candidates of ultra-diffuse galaxies (UDGs), a subset of LSBGs with $g$-band central surface brightness $\mu_{0,g}$ > 24 mag arcsec$^{-2}$ and effective radii $r_{\rm eff}$ > 1.5 kpc (van Dokkum et al. 2015), from the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys (hereafter referred to as the Legacy Surveys; Dey et al. 2019).
In recent years, increasingly deep and wide imaging surveys have brought unprecedented opportunities to detect numerous LSBGs with much fainter surface brightness than before. Existing LSBG samples, which are dominated by brighter LSBGs (brighter than 24.0 mag arcsec$^{-2}$), can be complemented by fainter LSBGs detected in images of previously unreachable depth. Such samples help to refine conclusions that are currently biased towards LSBGs with brighter surface brightness, and provide new constraints on galaxy formation theory and cosmological models. Thanks to the increasingly widespread application of computing techniques in astronomy (Cheng et al. 2020), techniques such as machine learning can expedite the search for LSBGs in the continuously growing volume of astronomical data. For example, Tanoglidis et al. (2021b) searched for LSBGs in the data of the first three years of DES observing (DES Y3) using machine learning techniques. In other work, deep learning techniques have been used to identify LSBGs from digital sky survey images (Zaritsky et al. 2019; Tanoglidis et al. 2021a; Yi et al. 2022). Inspired by these studies, in this paper we construct a catalogue of LSBG candidates from DR9 of the northern portion of the Legacy Surveys by means of machine learning.
In this paper, we briefly describe the Legacy Surveys and the initial data in Section 2, and describe the selection of the sample of LSBG candidates using machine learning in Section 3. We study the properties of the sample in Section 4, including the color distribution, spatial distribution, and other properties. Finally, we compare our LSBG candidates with the LSBG samples from several previous publications in Section 5 and give a summary in Section 6.
2 DATA
The DESI Legacy Surveys observed 14,000 deg$^2$ of extragalactic sky in three optical bands ($g$, $r$, $z$). The 5$\sigma$ point-source depths for Data Release 9 (DR9) of the Legacy Surveys are about $g$ = 24.7, $r$ = 23.9, and $z$ = 23.0 AB mag, deeper than those of the SDSS images, which are $g$ = 23.13, $r$ = 22.70, and $z$ = 20.71, respectively. The Legacy Surveys data are therefore expected to contain numerous galaxies with much lower surface brightness than the SDSS data.
The Legacy Surveys are composed of three imaging projects: the Beijing-Arizona Sky Survey (BASS; Zou et al. 2017), the Mayall $z$-band Legacy Survey (MzLS), and the Dark Energy Camera Legacy Survey (DECaLS). BASS surveyed an area of 5500 deg$^2$, dominated by the region of the sky at Dec $\geq +32^\circ$ (with only 4% located at Dec $< +32^\circ$), in the optical $g$ and $r$ bands using the Bok 2.3m telescope at Kitt Peak. MzLS observed nearly the same sky region as BASS at Dec $\geq +32^\circ$ but in the $z$ band, complementing the band coverage of BASS. Hereafter, we refer to the combined BASS and MzLS survey at Dec $\geq +32^\circ$ as BASS+MzLS, and we select the LSBG candidates from the BASS+MzLS data.
3 LSBG CATALOG
In this section, we describe the procedures that we followed to select the LSBG candidates from BASS+MzLS, based on the publicly available photometric catalogue produced by the Tractor software (Lang et al. 2016) for DR9 combined with machine learning.
3.1 Initial sample selection
The Tractor catalogue of DR9 for BASS+MzLS provides astrometry, photometry, and geometry for a total sample of 364,277,779 extracted sources. Based on key properties in this catalogue, we select the LSBG candidates step by step as follows.
First, since most LSBGs are acknowledged to be dominated by an extended disk, we remove sources with morphological types (type) of PSF, DUP, or DEV from the total sample. This criterion excludes point sources, coincident Gaia sources, and elliptical galaxies, and retains 187,492,198 sources (51.470% of the total sample).
Secondly, we require the sources to have half-light radius $r_{\rm eff}$ > 2.5′′ (as measured via the shape_r parameter in the Tractor catalogue) to focus on extended galaxies, and simultaneously require $r_{\rm eff}$ < 20′′ to reject spurious sources or imaging artifacts, following Greco et al. (2018) and Tanoglidis et al. (2021b), who found detections larger than this scale in HSC-SSP or DES images to be rare and generally spurious. By this criterion, 2,999,940 sources (out of 187,492,198) are retained.
Then, to avoid the sources seriously polluted by the nearby contaminants which would cause unreliable model fitting results, we require our sources to satisfy the following criteria according to Ruiz-Macias et al. (2020):
(1) |
where X represents the $g$, $r$, or $z$ bands. fracmasked_X, fracflux_X, and fracin_X are parameters in the Tractor catalogue that probe the quality of the model fit for each source. Specifically, fracmasked_X, the profile-weighted fraction of pixels masked from all observations of the target object, is used to remove sources with a high fraction of masked pixels. fracflux_X, the profile-weighted fraction of the flux from other sources divided by the target object flux, is used to reject objects with heavily contaminated flux. fracin_X, the fraction of the flux from the target source within the blob (a group of connected pixels), is used to select sources with a large fraction, which ensures well-constrained model fits. By this criterion, 1,622,986 sources, approximately 0.446% of the total sample, are retained.
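As an illustration of how such per-band quality cuts can be applied, the following minimal sketch filters catalogue rows on fracmasked, fracflux, and fracin in all three bands. The threshold constants are placeholders chosen only for illustration and are an assumption, not the values of Equation (1); the dictionary row layout and the helper `make_row` are likewise hypothetical.

```python
# Hypothetical thresholds: placeholders for illustration, NOT the values of Equation (1).
MAX_FRACMASKED = 0.4
MAX_FRACFLUX = 5.0
MIN_FRACIN = 0.3
BANDS = ("g", "r", "z")


def passes_quality_cuts(row):
    """Return True if a catalogue row passes the cuts in every band.

    `row` is assumed to be a dict with Tractor-style keys such as
    'fracmasked_g', 'fracflux_r', 'fracin_z'.
    """
    for band in BANDS:
        if row[f"fracmasked_{band}"] >= MAX_FRACMASKED:
            return False  # too many masked pixels
        if row[f"fracflux_{band}"] >= MAX_FRACFLUX:
            return False  # flux heavily contaminated by neighbours
        if row[f"fracin_{band}"] <= MIN_FRACIN:
            return False  # too little of the model flux falls inside the blob
    return True


def make_row(fracmasked, fracflux, fracin):
    """Build a toy row with the same value in all three bands (hypothetical helper)."""
    row = {}
    for band in BANDS:
        row[f"fracmasked_{band}"] = fracmasked
        row[f"fracflux_{band}"] = fracflux
        row[f"fracin_{band}"] = fracin
    return row


clean = make_row(0.05, 0.2, 0.95)
contaminated = make_row(0.05, 7.5, 0.95)
print(passes_quality_cuts(clean), passes_quality_cuts(contaminated))
```

Requiring the cuts in every band, rather than in any band, is the stricter reading of "X represents the g, r, or z bands" and is itself an assumption of this sketch.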
Here before the next criterion, we correct the flux for the Galactic extinction and convert it to the magnitude with the prescription (Equation (2)) given by the Legacy Surveys.
$$f_{X,\mathrm{corr}} = \frac{f_X}{T_X}, \qquad m_X = 22.5 - 2.5\log_{10}\left(f_{X,\mathrm{corr}}\right) \qquad (2)$$
where X represents the $g$, $r$, or $z$ bands. $f_X$ is the model flux in the X band, measured as flux_X in units of nanomaggy in the Tractor catalogue, and $T_X$ is the Galactic transmission at the object position in the X filter, measured as mw_transmission_X in linear units from 0 to 1 in the catalogue, where 1 represents a fully transparent region of the Milky Way and 0 a fully opaque region. $f_{X,\mathrm{corr}}$, the Galactic-extinction corrected flux, is further converted to the magnitude, $m_X$, based on which the colors $g-r$ and $r-z$ are obtained.
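The correction and conversion can be sketched in a few lines. This is a minimal example assuming the standard Legacy Surveys convention that fluxes are in nanomaggies (zero point 22.5 mag); the function name and the toy input values are hypothetical.

```python
import math


def corrected_magnitude(flux_nmgy, mw_transmission):
    """AB magnitude from a Tractor flux (nanomaggies) after Galactic dereddening."""
    if flux_nmgy <= 0 or not (0 < mw_transmission <= 1):
        return float("nan")  # no magnitude for non-positive flux or invalid transmission
    flux_corr = flux_nmgy / mw_transmission  # undo Galactic extinction
    return 22.5 - 2.5 * math.log10(flux_corr)


# Toy values (hypothetical, not taken from the catalogue)
g = corrected_magnitude(flux_nmgy=1.0, mw_transmission=0.90)
r = corrected_magnitude(flux_nmgy=1.5, mw_transmission=0.93)
print(round(g, 3), round(r, 3), "g-r =", round(g - r, 3))
```

Returning NaN for non-positive fluxes is a design choice of this sketch; how such sources were handled in the actual selection is not stated in the text.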
After that, we request the colors to be within the color box defined by
(3) |
This color box was empirically determined from the distribution of the total BASS+MzLS sample in the $g-r$ and $r-z$ color-color diagram, as shown in Figure 1, where the three black solid contours, from the inside out, enclose 68.2%, 95.6%, and 99.8% of the total sample, respectively. In setting the color requirements (the red box in Figure 1), our principle is to include the majority of the galaxies within the central contour, where 68.2% of the total sample lies, while excluding high-redshift galaxies and spurious objects. The color requirements (Equation (3)) retain 994,459 sources (0.273% of the total sample).

Subsequently, we require the ellipticity (1 - b/a) to be less than 0.7 (an axis ratio, b/a, greater than 0.3) to avoid edge-on galaxies, some spurious objects with high ellipticity (e.g., diffraction spikes), or the most obvious lensed galaxies. By this criterion, 772,745 sources (0.212% of the total sample) are retained.
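The text does not state how b/a is obtained from the catalogue. The sketch below assumes it is derived from the Tractor ellipticity components shape_e1 and shape_e2, using the conversion given in the Legacy Surveys catalogue description, where the modulus of the complex ellipticity is (a - b)/(a + b). The function names are hypothetical.

```python
import math


def ellipticity_from_components(shape_e1, shape_e2):
    """Return 1 - b/a from the Tractor ellipticity components."""
    e = math.hypot(shape_e1, shape_e2)   # |epsilon| = (a - b) / (a + b)
    axis_ratio = (1.0 - e) / (1.0 + e)   # b/a
    return 1.0 - axis_ratio


def passes_ellipticity_cut(shape_e1, shape_e2, max_ellipticity=0.7):
    """Apply the 1 - b/a < 0.7 criterion described in the text."""
    return ellipticity_from_components(shape_e1, shape_e2) < max_ellipticity


print(passes_ellipticity_cut(0.3, 0.4))  # |epsilon| = 0.5, so b/a = 1/3
print(passes_ellipticity_cut(0.6, 0.0))  # |epsilon| = 0.6, so b/a = 0.25
```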
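Finally, we calculate the mean surface brightness within the half-light radius, $\bar{\mu}_{\rm eff}$, by using Equation (4):
$$\bar{\mu}_{{\rm eff},X} = m_X + 2.5\log_{10}\left(2\pi r_{\rm eff}^{2}\right) \qquad (4)$$
where $m_X$ is the Galactic-extinction corrected magnitude from Equation (2) and $r_{\rm eff}$ is the half-light radius, measured as shape_r in the Tractor catalogue. We require the $g$-band $\bar{\mu}_{\rm eff}$ to be within 24.2 - 28.8 mag arcsec$^{-2}$ and obtain 344,370 objects (0.095% of the total sample) as the initial sample of the LSBG candidates.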
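As a worked example of this last cut, the sketch below evaluates the mean surface brightness inside a circular aperture of radius equal to the half-light radius and tests it against the 24.2 - 28.8 mag arcsec^-2 window. It assumes the common form mu = m + 2.5 log10(2 pi r^2), with r in arcseconds; the function names and the toy inputs are hypothetical.

```python
import math


def mean_effective_surface_brightness(mag, r_eff_arcsec):
    """Mean surface brightness (mag arcsec^-2) within the half-light radius."""
    return mag + 2.5 * math.log10(2.0 * math.pi * r_eff_arcsec ** 2)


def in_lsbg_range(mag_g, r_eff_arcsec, mu_min=24.2, mu_max=28.8):
    """Apply the surface-brightness window quoted in the text."""
    mu = mean_effective_surface_brightness(mag_g, r_eff_arcsec)
    return mu_min < mu < mu_max


# Toy sources (hypothetical): a faint extended one and a bright compact one
print(round(mean_effective_surface_brightness(20.0, 4.0), 2), in_lsbg_range(20.0, 4.0))
print(round(mean_effective_surface_brightness(18.0, 2.5), 2), in_lsbg_range(18.0, 2.5))
```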
To summarize the selection of the initial LSBG candidates so far, the criteria above are listed in Table 1. Up to this point, the initial LSBG candidates were selected solely from the Tractor catalogue. We therefore inspected the images of a few thousand initial candidates and found that a large number of them were apparently false LSBGs, that is, contaminants. It is thus necessary to reject these false candidates from the numerous initial candidates, which we do with machine learning.
Criterion | Range | LSBG Candidates | Percent |
---|---|---|---|
no cut | NA | 364,277,779 | 100.000% |
type | not PSF, DEV, or DUP | 187,492,198 | 51.470% |
$r_{\rm eff}$ | 2.5′′ - 20′′ | 2,999,940 | 0.824% |
fracmasked, fracflux, fracin | Equation (1) | 1,622,986 | 0.446% |
color | Equation (3) | 994,459 | 0.273% |
ellipticity | < 0.7 | 772,745 | 0.212% |
$\bar{\mu}_{{\rm eff},g}$ (mag arcsec$^{-2}$) | 24.2 - 28.8 | 344,370 | 0.095% |
Machine Learning | - | 57,934 | 0.016% |
Visual Inspection | - | 31,825 | 0.009% |
3.2 Machine Learning Classification
From our visual inspection, the most common contaminants among the false LSBGs were:
1. Red objects with high ellipticity close to the criterion of 0.7 (e.g., Figure 2(a)).
2. Detections that are almost invisible in the images (e.g., Figure 2(b)).
3.
4. Faint, diffuse regions of larger-scale objects, such as Galactic cirrus (e.g., Figure 2(d)).
5. Diffuse light from the arms of large spiral galaxies (e.g., Figure 2(e)).






To reject the false LSBGs from the initial sample of LSBG candidates while keeping the completeness of true LSBGs as high as possible, we employed a supervised machine learning classification algorithm.
3.2.1 Training and test sets
To prepare a labeled sample, with objects labeled as either true or false LSBGs, for training and testing, we visually inspected the images of all 22,710 initial LSBG candidates within 26 sky areas (blue areas in Figure 3) that we selected to be uniformly distributed over the BASS+MzLS footprint. To alleviate subjective biases, three individuals independently inspected each candidate and classified it as a true or a false LSBG, and their results were then combined. Ultimately, we labeled the 2,561 candidates identified as true LSBGs by more than two individuals as LSBGs and the remaining 20,149 as non-LSBGs. Then, 70% of the 22,710 labeled objects were adopted as the training set and the remaining 30% as the test set. We used the training set to train a model and the test set to evaluate the quality of the trained model.
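The labeling and splitting steps can be sketched as follows. This is a minimal illustration, not the authors' procedure: the vote threshold `min_votes`, the use of a simple random (non-stratified) split, and the seed are all assumptions, since the text does not specify them.

```python
import random


def combine_votes(votes, min_votes=2):
    """Combine 0/1 votes from the inspectors into one label.

    `min_votes` is an assumed threshold for this sketch.
    """
    return 1 if sum(votes) >= min_votes else 0


def train_test_split_indices(n_objects, train_fraction=0.7, seed=0):
    """Randomly split object indices into training and test sets."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(train_fraction * n_objects)
    return indices[:n_train], indices[n_train:]


print(combine_votes((1, 1, 0)), combine_votes((1, 0, 0)))
train_idx, test_idx = train_test_split_indices(22710)
print(len(train_idx), len(test_idx))
```

With strongly imbalanced classes such as 2,561 LSBGs against 20,149 non-LSBGs, a stratified split that preserves the class ratio in both sets is a common alternative.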

3.2.2 Model, features and Classification
Before training the model, we first need to choose a machine learning algorithm. We tested and evaluated the widely used algorithms of Random Forest (via the Python library scikit-learn; Pedregosa et al. 2011) and XGBoost (Chen & Guestrin 2016; via the Python library xgboost), as well as Naive Bayes, AdaBoost, K Nearest Neighbor, Decision Tree, Random Forest, Support Vector Machines, and SVM with a radial basis function (RBF) kernel (via auto-sklearn, an automated toolkit that integrates diverse machine learning algorithms; Feurer et al. 2015). Among these models, XGBoost stood out with the highest accuracy on the test set, so we chose it as the machine learning model in this study.
Aside from the model, we need to choose useful features for learning. We tested and assessed the quality of different feature combinations by using the control variable method. With the accuracy of the model as the first priority, we found it best to use all of the following 24 features, which are listed in order of importance.
1. The ellipticity of objects, 1 - b/a.
2. The half-light radius, shape_r.
3. The colors $g-r$, $r-z$, and $g-z$ derived from the Galactic-extinction corrected magnitudes.
4. The Galactic-extinction corrected magnitudes in the $g$, $r$, and $z$ bands, mag_corr.
5. The profile-weighted fraction of the flux from other sources divided by the total flux in the $g$, $r$, and $z$ bands, fracflux.
6. The fraction of a source’s flux within the detection in the $g$, $r$, and $z$ bands, fracin.
7. The profile-weighted fraction of pixels masked from all observations of this object in the $g$, $r$, and $z$ bands, fracmasked.
8. The mean effective surface brightness in the $g$, $r$, and $z$ bands, mu_mean.
9. The power-law index of the Sérsic profile model, measured as sersic in the Tractor catalogue.
10. The central surface brightness in the $g$, $r$, and $z$ bands, mu_0, which is converted from mu_mean using the prescription provided in Graham & Driver (2005).
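The conversion in the last item can be illustrated with the Sérsic relations summarized by Graham & Driver (2005): the mean surface brightness within the effective radius is first converted to the surface brightness at the effective radius, and then to the central value. The sketch below is an assumption-laden example, not the authors' code: it uses the linear approximation b_n ≈ 1.9992n - 0.3271 (valid roughly for 0.5 < n < 10), and the function names are hypothetical.

```python
import math


def sersic_b(n):
    """Approximate b_n for a Sersic profile (assumed valid for about 0.5 < n < 10)."""
    return 1.9992 * n - 0.3271


def central_from_mean_sb(mu_mean, n):
    """Convert mean effective surface brightness to central surface brightness."""
    b = sersic_b(n)
    # f(n) = n * exp(b) * Gamma(2n) / b**(2n) links <mu>_e and mu_e
    f_n = n * math.exp(b) * math.gamma(2.0 * n) / b ** (2.0 * n)
    mu_e = mu_mean + 2.5 * math.log10(f_n)       # surface brightness at r_eff
    return mu_e - 2.5 * b / math.log(10.0)       # central surface brightness


# For an exponential disk (n = 1) the centre is about 1.12 mag brighter than the mean
print(round(central_from_mean_sb(25.0, 1.0), 2))
```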
For the training, our principle was to obtain a model that maximizes the Recall (the proportion of true LSBGs that are predicted to be LSBGs), so that the true LSBGs in the training sample are retained in the positive predictions as completely as possible, while keeping the Precision (the proportion of true LSBGs among the predicted LSBGs) as high as possible. To balance Recall and Precision, we adopted the $F_\beta$-measure, the weighted harmonic mean of Precision and Recall, as the evaluation metric. Since Recall should carry a greater weight than Precision, we use the commonly adopted value $\beta$ = 2 for the $F_\beta$-measure in model evaluation. With these guidelines, we trained the XGBoost model, using grid search and optuna, a hyperparameter optimization framework (Akiba et al. 2019), to optimize its hyperparameters. After thousands of optimization trials, we derived the trained XGBoost model with the optimized hyperparameters, such as max_depth = 6, n_estimators = 337, learning_rate ≈ 0.09, subsample ≈ 0.393, and scale_pos_weight = 8.
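To make the metric concrete, the sketch below computes Precision, Recall, and the F-beta score from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the values of Figure 4. The last lines compute the ratio of non-LSBGs to LSBGs in the labeled sample, which is close to the adopted scale_pos_weight = 8, although the text does not say that the value was set this way.

```python
def precision_recall_fbeta(tp, fp, fn, beta=2.0):
    """Return (precision, recall, F_beta) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # Weighted harmonic mean; beta > 1 weights Recall more than Precision
    fbeta = (1.0 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical counts (not from the paper)
p, r, f2 = precision_recall_fbeta(tp=90, fp=60, fn=10)
_, _, f1 = precision_recall_fbeta(tp=90, fp=60, fn=10, beta=1.0)
print(round(p, 3), round(r, 3), round(f2, 3), round(f1, 3))

# Class imbalance of the labeled sample quoted in the text
neg_over_pos = 20149 / 2561
print(round(neg_over_pos, 2))
```

Because Recall exceeds Precision in this toy case, the F2 score is higher than the F1 score, which shows how beta = 2 rewards completeness.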
Subsequently, this XGBoost model was applied to the test set, and the results are displayed in the confusion matrix (Figure 4). The Recall is defined as the fraction of true LSBGs that the model classifies as LSBGs, TP/(TP+FN), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives. For the minor fraction of true LSBGs classified as non-LSBGs and the false LSBGs classified as LSBGs by the model, we visually inspected their images and found that they are too faint to allow a reliable classification. In addition, the Precision, defined as the fraction of predicted LSBGs that are true LSBGs, TP/(TP+FP), is approximately 60%, meaning that approximately 40% of the LSBG candidates obtained after machine learning are non-LSBGs. We validate this fraction in Section 3.3.

With the help of machine learning, the number of LSBG candidates decreased from 344,370 to 57,934. However, given the Precision of the model, these 57,934 candidates are still expected to contain non-LSBGs, so we visually inspect their images again to purify the sample in the next section.
3.3 Visual Inspection
In this section, we visually inspect the $grz$-composite images of the 57,934 LSBG candidates retained after the machine learning step. We find that the sample still contains false LSBGs that do not look like true LSBGs in the images at all, but whose main features listed in Section 3.2.2, as given by the Tractor measurements, follow those of true LSBGs. Our model cannot easily classify such objects as non-LSBGs because, aiming for fast learning and classification in this work, we trained it solely on these catalogue features. In the future, we plan to train a better deep learning classification model using both the features and the images of the final LSBG sample selected in this work.
Specifically, these false LSBGs resemble the contaminants shown in Figure 2. We rejected them by visual inspection and obtained a final sample of 31,825 LSBG candidates with a high purity of true LSBGs, with half-light radius 2.5′′ < $r_{\rm eff}$ < 20′′ and Galactic-extinction corrected mean effective surface brightness 24.2 < $\bar{\mu}_{{\rm eff},g}$ < 28.8 mag arcsec$^{-2}$. This final sample is so far the largest catalogue of LSBG candidates from the 5500 deg$^2$ sky area of BASS+MzLS, which covers more than 1/3 of the entire sky area of DR9 of the DESI Legacy Surveys.
4 LSBG PROPERTIES
We established a sample of 31,825 LSBG candidates from BASS+MzLS, spanning a wide range of properties such as color, morphology, and environment, which we study in detail in this section.
4.1 Color Distribution

We display the distribution of the final sample of LSBG candidates in the $g-r$ and $r-z$ color-color diagram in Figure 5. The sample galaxies (green dots) exhibit a bimodal distribution in $g-r$ color, which is better fitted by the sum of two Gaussian profiles than by a single Gaussian profile according to the Akaike Information Criterion (AIC/AICc) and the Bayesian Information Criterion (BIC; see details in Section 5.1).
In the top panel of Figure 5, the best-fitting profile (black solid curve) evaluated by the AIC/BIC is the sum of a blue component, a single Gaussian profile peaking at $g-r$ = 0.455 with $\sigma$ = 0.103 (blue solid curve), and a red component, a single Gaussian profile peaking at $g-r$ = 0.700 with $\sigma$ = 0.070 (red solid curve). The blue component is dominated by blue LSBG candidates, 97.8% of which are bluer than $g-r$ = 0.66, while the red component is dominated by red LSBG candidates, 97.8% of which are redder than $g-r$ = 0.56. Galaxies between $g-r$ = 0.56 and 0.66 are therefore a mixture of LSBG candidates from the red end of the blue component ($g-r$ > 0.56) and from the blue end of the red component ($g-r$ < 0.66). Since the median color of all galaxies between 0.56 and 0.66 is $g-r$ = 0.60, we adopt $g-r$ = 0.60 as the color dividing line (vertical black dashed line) to separate the final sample into a blue subsample ($g-r$ < 0.60; 26,672 galaxies) and a red subsample ($g-r$ > 0.60; 5,153 galaxies). The median $g-r$ colors of the blue and red subsamples are 0.44 and 0.67, respectively.
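The two boundary colors correspond closely to the mean plus or minus two standard deviations of each fitted Gaussian (0.455 + 2 × 0.103 ≈ 0.66 and 0.700 - 2 × 0.070 = 0.56). The following minimal check evaluates the Gaussian cumulative fractions at these boundaries; it treats each component as an ideal Gaussian, which is an assumption of this sketch, and gives fractions near the 97.8% quoted in the text.

```python
import math


def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


blue_mean, blue_sigma = 0.455, 0.103   # fitted blue component
red_mean, red_sigma = 0.700, 0.070     # fitted red component

# Fraction of the blue component bluer than g-r = 0.66
frac_blue_below = gaussian_cdf(0.66, blue_mean, blue_sigma)
# Fraction of the red component redder than g-r = 0.56
frac_red_above = 1.0 - gaussian_cdf(0.56, red_mean, red_sigma)

print(f"{frac_blue_below:.3f} {frac_red_above:.3f}")
```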
In Figure 6, we show randomly selected LSBG candidates from the blue (left) and the red (right) subsamples as examples. The blue LSBG candidates appear disk-like, spiral, or irregular, while the red ones tend to be spheroidal or elliptical. The two subsamples are clearly distinct in morphology, implying that the colors of LSBGs correlate with their morphologies. This conclusion is also supported by several previous studies, which are discussed in Section 5.2.


4.2 Magnitude and Surface Brightness
Figure 7(a) shows the distributions of the magnitudes in the $g$, $r$, and $z$ bands for the full final sample of LSBG candidates. Figure 7(b) compares the apparent-magnitude distributions of the blue and red subsamples, showing that the blue LSBGs are slightly brighter than the red ones. Figure 7(c) shows the distribution of the mean surface brightness for the blue and red subsamples. The red subsample shows a bump, or excess, in the low surface brightness tail (fainter than 25.5 mag arcsec$^{-2}$), while the blue subsample has slightly more LSBGs with higher surface brightness (brighter than 25.5 mag arcsec$^{-2}$), implying that the red LSBGs in our sample tend to have lower surface brightness than the blue ones. This is further supported by the 16th, 50th, and 84th percentiles of the mean surface brightness, which are 24.4, 24.7, and 25.5 mag arcsec$^{-2}$ for the blue subsample and 24.4, 24.8, and 25.8 mag arcsec$^{-2}$ for the red subsample.



4.3 Ellipticity, Effective Radius and Sérsic Index
In Figure 8(a), we present the distribution of ellipticity ($\epsilon$ = 1 - b/a) for the full final sample. Both the blue and the red subsamples contain considerable fractions of galaxies with $\epsilon$ = 0 in the Tractor catalogue (10% of the blue and 18% of the red). The median $\epsilon$ is 0.31 for the full sample, 0.32 for the blue subsample, and 0.28 for the red subsample. To give a clearer picture of the distribution for galaxies with non-zero $\epsilon$, we show a zoomed-in view in panel (b), where both subsamples show generally consistent distributions, with a median $\epsilon$ of 0.34 for the blue and 0.31 for the red. These distributions demonstrate that the LSBG candidates in our final sample are preferentially round over $\epsilon$ = 0.1 to 0.7, unlike normal spiral galaxies, which show a nearly flat ellipticity distribution between $\epsilon$ = 0.1 and 0.7 (Figure 4 in Rodríguez & Padilla 2013).
In Figure 9(a), both subsamples are dominated (more than 99%) by galaxies with sizes ranging from 2.5′′ to 14′′ in $r_{\rm eff}$, with medians of 3.5′′ for the red subsample, 4.1′′ for the blue subsample, and 4′′ for the full sample. It is worth noting that the $r_{\rm eff}$ measurements from the Tractor catalogue for a minority of large spiral galaxies are all around 13.8′′, causing a low peak at $r_{\rm eff}$ of 13.8′′ in the figure. This low peak stems from a limitation of the Tractor model measurements and has no physical implication, but the galaxies in this peak all appear to be blue, large, diffuse, extended disk LSBGs in our visual inspection, so we keep them in our final sample. In Figure 9(b), we plot the Sérsic index $n$ for the blue and the red subsamples, showing that 95% of the blue subsample and 93% of the red subsample have $n$ < 2.5. The distributions of $n$ for the two subsamples agree with each other, with a median of $n$ = 1 for each, showing that our final sample is dominated by disk LSBGs.




4.4 Spatial Distribution
In Figure 10, we show the spatial distributions of the blue (top) and red (bottom) subsamples within the BASS+MzLS footprint (black solid line). The two subsamples show an obvious difference in spatial distribution: the blue LSBGs are more uniformly distributed while the red LSBGs are clustered, showing that red LSBGs preferentially inhabit denser environments than blue LSBGs. This is also found by Greco et al. (2018) and Tanoglidis et al. (2021b), and we discuss it in Section 5.2.


5 DISCUSSION
5.1 Double or single Gaussian fitting?
In Section 4.1, our LSBG sample was reported to have a bimodal color distribution that is better fitted by a double Gaussian model than by a single Gaussian model. This statement is supported by evaluating the fits of the single Gaussian model (SGM; grey dashed line in the top panel of Figure 5) and the double Gaussian model (DGM; black line in the top panel of Figure 5) with the AIC and BIC defined in Equation (5):
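$$\mathrm{AIC} = 2k - 2\ln L, \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{BIC} = k\ln n - 2\ln L \qquad (5)$$
where $k$ is the number of fitting parameters, $L$ is the likelihood function, and $n$ is the number of samples. When the sample is small in size, AIC should be corrected into AICc. According to Kass & Raftery (1995), the performance of the model improves as the AICc or BIC value decreases.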
We denote the BIC and AICc values from fitting the $g-r$ color distribution of our sample with the single Gaussian model as BIC$_{\rm SGM}$ and AICc$_{\rm SGM}$, and those from the fit with the double Gaussian model as BIC$_{\rm DGM}$ and AICc$_{\rm DGM}$. The differences between the SGM and DGM are then calculated as $\Delta$BIC = BIC$_{\rm SGM}$ - BIC$_{\rm DGM}$ and $\Delta$AICc = AICc$_{\rm SGM}$ - AICc$_{\rm DGM}$. According to Kass & Raftery (1995), the DGM is strongly preferred if $\Delta$BIC or $\Delta$AICc is larger than 10. In our calculation, the two differences are 235.1 and 227.1, both far greater than 10, giving strong evidence that the $g-r$ color distribution is much better fitted by a double Gaussian model than by a single Gaussian model, and hence that the $g-r$ color distribution of the final LSBG sample is bimodal. Such bimodal color distributions have also been reported for the LSBG samples of Greco et al. (2018) and Tanoglidis et al. (2021b), which are discussed in detail in Section 5.2.
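The model comparison can be sketched on synthetic data. The example below draws colors from a two-Gaussian mixture with the fitted means and widths quoted in Section 4.1, then compares a single Gaussian against the double Gaussian using AICc and BIC. Several things are assumptions of this sketch rather than the authors' procedure: the mixture weight `W_BLUE`, the parameter counts (k = 2 for the SGM, k = 5 for the DGM), the use of unbinned log-likelihoods, and the use of the generating parameters as a stand-in for a fitted DGM.

```python
import math
import random


def aic(log_l, k):
    return 2.0 * k - 2.0 * log_l


def aicc(log_l, k, n):
    """AIC with the small-sample correction."""
    return aic(log_l, k) + 2.0 * k * (k + 1.0) / (n - k - 1.0)


def bic(log_l, k, n):
    return k * math.log(n) - 2.0 * log_l


def norm_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Synthetic g-r colours from a two-component mixture (weight is hypothetical)
rng = random.Random(42)
W_BLUE = 0.8
colors = [
    rng.gauss(0.455, 0.103) if rng.random() < W_BLUE else rng.gauss(0.700, 0.070)
    for _ in range(20000)
]
n = len(colors)

# Single Gaussian: maximum-likelihood mean and sigma
mean = sum(colors) / n
sigma = math.sqrt(sum((c - mean) ** 2 for c in colors) / n)
log_l_sgm = sum(math.log(norm_pdf(c, mean, sigma)) for c in colors)

# Double Gaussian evaluated at the generating parameters (stand-in for a fit)
log_l_dgm = sum(
    math.log(W_BLUE * norm_pdf(c, 0.455, 0.103)
             + (1.0 - W_BLUE) * norm_pdf(c, 0.700, 0.070))
    for c in colors
)

delta_bic = bic(log_l_sgm, 2, n) - bic(log_l_dgm, 5, n)
delta_aicc = aicc(log_l_sgm, 2, n) - aicc(log_l_dgm, 5, n)
print(round(delta_bic, 1), round(delta_aicc, 1))
```

Positive differences larger than 10 favour the double Gaussian model, following the threshold quoted from Kass & Raftery (1995) in the text.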
5.2 Comparison with previous samples
In this section, we compare our sample of LSBG candidates with three other LSBG samples, from Du et al. (2015) (D15), Greco et al. (2018) (G18), and Tanoglidis et al. (2021b) (T21). D15 provides a sample of 1,129 LSBGs selected from the 2800 deg$^2$ area of the $\alpha$.40-SDSS DR7 survey, with an imaging depth of 22.2 mag for point sources at 95% detection (York et al. 2000). This sample is defined by the central surface brightness $\mu_{0,B}$ > 22.5 mag arcsec$^{-2}$, and its galaxies are nearby ($z$ < 0.06), blue, H I-rich, and disk-dominated. G18 presents a sample of 781 extended LSBGs from the first 200 deg$^2$ of the Wide layer of the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP), which has a depth of 26.8, 26.4, and 26.4 mag for point sources at 5$\sigma$ (Aihara et al. 2018a). This sample is defined by the mean surface brightness ($\bar{\mu}_{{\rm eff},g}$ > 24.3 mag arcsec$^{-2}$), to allow nucleated galaxies into the sample, and by galaxy size ($r_{\rm eff}$ > 2.5′′), to restrict the sample to low redshift. Using selection criteria similar to those of G18, T21 produces a catalogue of 23,790 extended LSBGs from the 5000 deg$^2$ area of the first three years of imaging data from the Dark Energy Survey (DES Y3), with a depth of 23.52, 23.10, and 22.51 mag for point sources at 10$\sigma$, corresponding to a 3$\sigma$ surface brightness limit of 28.26, 27.86, and 27.37 mag arcsec$^{-2}$.
In terms of surface brightness (Figure 11(a)), our sample is highly consistent with T21, ranging over 24.2 < $\bar{\mu}_{{\rm eff},g}$ < 28.8 mag arcsec$^{-2}$. The 16th, 50th, and 84th percentiles of $\bar{\mu}_{{\rm eff},g}$ are 24.4, 24.7, and 25.5 mag arcsec$^{-2}$ for our sample and 24.3, 24.7, and 25.3 mag arcsec$^{-2}$ for T21. For the G18 sample, the mean surface brightness measurement is not available in the released catalogue, so we cannot overplot G18 in Figure 11(a) for a direct comparison with the other three samples. However, G18 states that the distribution of their sample is broad, with 16th, 50th, and 84th percentiles of 24.5 (24.8), 24.8 (25.8), and 25.5 (26.8) mag arcsec$^{-2}$ for the blue (red) subsamples. Accordingly, we believe that the G18 sample has a mean surface brightness distribution quite similar to those of our sample and T21. In stark contrast, the D15 sample has a mean surface brightness distribution with 16th, 50th, and 84th percentiles of 23.4, 23.7, and 24.6 mag arcsec$^{-2}$, much brighter than the other three samples. This is reasonable because the D15 sample is from the SDSS imaging survey, which is shallower than the BASS+MzLS, DES Y3, and HSC-SSP surveys on which our sample, T21, and G18 are, respectively, based. This is further supported by the comparison of magnitudes (Figure 11(b)), where our sample, T21, and G18 are systematically at least 2 mag fainter than the D15 sample in apparent magnitude.
In terms of color (Figure 11(c)), the 16th, 50th, and 84th percentiles of $g-r$ are 0.36, 0.47, and 0.60 for our sample; 0.29, 0.43, and 0.60 for G18; 0.26, 0.38, and 0.57 for T21; and 0.20, 0.30, and 0.41 for D15. Our sample thus generally agrees with G18 and T21 in the $g-r$ distribution, although the latter two samples are slightly bluer. The D15 sample is the bluest of the four because it is dominated by H I-rich, blue LSBGs. In addition, as reported in Section 4.1, our sample has a bimodal distribution in $g-r$, implying two distinct populations of blue and red LSBGs. Such bimodal color distributions have also been found by G18 and T21 in their own LSBG samples. Specifically, the G18 sample shows a clear bimodality in both the $g-i$ and $g-r$ colors and is divided into red and blue populations at the median $g-i = 0.64$. Similarly, the T21 sample displays bimodal distributions in both the $g-i$ and $g-r$ colors and is separated into blue and red subsamples at $g-i = 0.60$, the intersection of the two Gaussian model profiles. The color distributions of our sample, G18, and T21 together demonstrate that LSBGs, like galaxies of normal or high surface brightness (normal galaxies), can be divided into a blue and a red sequence, with the blue LSBGs dominated by spiral, disk, or irregular systems and the red LSBGs by spheroidal or elliptical systems.
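A color threshold defined as the intersection of two Gaussian model profiles can be located analytically once the two components are known. The sketch below assumes the component weights, means, and widths have already been fitted; the function names and the numerical parameters are hypothetical illustrations and are not the fitted values of any of the samples discussed here.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Normalized Gaussian probability density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_intersection(w1, mu1, s1, w2, mu2, s2):
    """Color at which w1*N(mu1, s1) equals w2*N(mu2, s2), between the two means.

    Equating the logarithms of the two weighted profiles gives a quadratic
    a*x^2 + b*x + c = 0 (or a linear equation when the widths are equal).
    """
    a = 1.0 / (2.0 * s2**2) - 1.0 / (2.0 * s1**2)
    b = mu1 / s1**2 - mu2 / s2**2
    c = (mu2**2 / (2.0 * s2**2) - mu1**2 / (2.0 * s1**2)
         + np.log((w1 * s2) / (w2 * s1)))
    if np.isclose(a, 0.0):
        roots = np.array([-c / b])      # equal widths: single crossing
    else:
        roots = np.roots([a, b, c])
        roots = roots[np.isreal(roots)].real
    lo, hi = sorted((mu1, mu2))
    between = roots[(roots > lo) & (roots < hi)]
    if between.size == 0:
        raise ValueError("the two profiles do not cross between their means")
    return float(between[0])

# Hypothetical blue and red components (weight, mean, width); illustrative only.
threshold = gaussian_intersection(0.6, 0.40, 0.08, 0.4, 0.75, 0.10)
print(f"blue/red dividing color: {threshold:.3f}")
```

Galaxies bluer than the returned value are more likely to belong to the blue component and those redder to the red component, which is the rationale for using the crossing point as a dividing line.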
As for environment, the blue, H I-rich LSBGs of D15 mostly reside in voids or at the edges of low-density filaments. Our sample, G18, and T21 show consistent spatial distributions: in each sample, the blue LSBGs are more uniformly distributed across the sky footprint, while the red LSBGs are highly clustered. This implies that red LSBGs preferentially inhabit denser environments than blue LSBGs.
Furthermore, our sample is consistent with G18 and T21 in the distribution of ellipticity ($\epsilon$), with a median of around 0.3, showing that the LSBGs of the three samples are generally round. This is in striking contrast to the almost flat distribution of $\epsilon$ for normal galaxies between $\epsilon = 0$ and 0.7.
These comparisons demonstrate that our sample, along with G18 and T21, extends the SDSS-based LSBG samples to a new regime of much lower surface brightness, fainter apparent magnitude, and broader properties on a large scale.



5.3 Possible evolution from the blue to red LSBGs?
The optical colors of galaxies indicate their stellar populations and correlate strongly with galaxy morphology and environment. Among galaxies of normal or high surface brightness, those in the local universe fall into one of two distinct populations in optical color: a red sequence and a blue cloud (Strateva et al. 2001; Baldry et al. 2004; Blanton & Moustakas 2009). Besides color, bimodal distributions have also been observed in other parameters, such as metallicity and star formation rate (Kauffmann et al. 2003a, b). The blue cloud is dominated by active, star-forming galaxies, while the red sequence is composed of quiescent galaxies. Whereas blue galaxies are spiral, disk, or irregular systems in morphology, red galaxies are ellipticals, spheroidals, lenticulars, and cD galaxies (Blanton & Moustakas 2009). Moreover, red galaxies are more likely to be found in denser environments and are more spatially clustered than blue galaxies (Blanton & Moustakas 2009; Das & Pandey 2024). It has been proposed that blue galaxies evolve onto the red sequence as their stellar populations fade after star formation is halted by quenching mechanisms such as the natural exhaustion of gas, active galactic nucleus feedback, galaxy harassment, and galaxy mergers.
Similar to the blue cloud and red sequence of galaxies with normal surface brightness, the LSBGs in this work show a bimodal distribution in optical color and thus fall into two populations in terms of the $g-r$ color: blue and red LSBGs. In morphology, the blue LSBGs are disk-like or irregular, while the red LSBGs are more bulge-dominated or spheroidal. In addition, the red LSBGs are more spatially clustered than the blue LSBGs. There might therefore be an evolutionary path from the blue LSBGs to the red LSBGs, which we will investigate in future work.
6 SUMMARY AND CONCLUSIONS
Based on the released photometric catalogue from the Tractor software and a machine learning model, we selected a sample of 31,825 LSBG candidates with mean surface brightness $24.2 < \bar{\mu}_{\mathrm{eff},g} < 28.8$ mag arcsec$^{-2}$ and half-light radius $2.5'' < r_{\mathrm{eff}} < 20''$ from the 5500 deg$^2$ of the BASS+MzLS survey. The selection criteria are summarized in Table 1.
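The two photometric cuts quoted above, together with the blue/red split at a color of 0.60, can be expressed as boolean masks over catalogue columns. The sketch below is illustrative only and covers none of the other criteria in Table 1 or the machine learning step. It assumes the commonly used relation $\bar{\mu}_{\mathrm{eff}} = m + 2.5\log_{10}(2\pi r_{\mathrm{eff}}^2)$ (see, e.g., Graham & Driver 2005), which may differ in detail from the definition adopted in this work; the function names, the column names, and the assignment of galaxies lying exactly at the color cut to the blue population are likewise assumptions.

```python
import numpy as np

def mean_eff_sb(mag, r_eff_arcsec):
    """Mean surface brightness within r_eff (assumed standard relation)."""
    return mag + 2.5 * np.log10(2.0 * np.pi * r_eff_arcsec**2)

def select_lsbg(mag_g, mag_r, r_eff,
                sb_range=(24.2, 28.8), r_range=(2.5, 20.0), color_cut=0.60):
    """Return boolean masks (keep, blue, red) for the quoted cuts only."""
    mag_g, mag_r, r_eff = (np.asarray(a, dtype=float)
                           for a in (mag_g, mag_r, r_eff))
    sb = mean_eff_sb(mag_g, r_eff)
    keep = ((sb > sb_range[0]) & (sb < sb_range[1])
            & (r_eff > r_range[0]) & (r_eff < r_range[1]))
    red = keep & ((mag_g - mag_r) > color_cut)
    blue = keep & ~red          # objects exactly at the cut count as blue here
    return keep, blue, red

# Four made-up objects: g magnitude, r magnitude, half-light radius (arcsec).
mag_g = [20.0, 19.0, 21.0, 18.0]
mag_r = [19.6, 18.2, 20.3, 17.5]
r_eff = [4.0, 5.0, 1.0, 3.0]

keep, blue, red = select_lsbg(mag_g, mag_r, r_eff)
print(keep, blue, red)
```

In this toy input the third object fails the size cut and the fourth is too bright in mean surface brightness, so only the first two survive, one on each side of the color cut.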
This sample shows a bimodal distribution in the $g-r$ color, implying two distinct populations of blue ($g-r < 0.60$) and red ($g-r > 0.60$) LSBGs. The blue population is dominated by spiral, disk, or irregular systems, while the red one appears spheroidal or elliptical in morphology, revealing that the colors of LSBGs correlate with morphology. In apparent magnitude and surface brightness, the red LSBGs are slightly fainter than the blue. Both populations have similar distributions of ellipticity, half-light radius, and Sérsic index. The ellipticity $\epsilon$ of both populations ranges from 0 to 0.7 with a median of 0.3, indicating that the sample galaxies are generally round. This differs from normal spiral galaxies, which show a nearly flat distribution between $\epsilon = 0$ and 0.7. The half-light radii lie within $2.5'' < r_{\mathrm{eff}} < 20''$, with a median of 4$''$. In Sérsic index, the blue and red LSBG populations are both dominated by disk galaxies with $n = 1$. However, the two populations differ in spatial distribution: the blue LSBGs are more uniformly distributed across the sky area, while the red ones are highly clustered. This sample should be valuable for further studies of the possible evolutionary link between the two LSBG populations.
By comparing our sample with three other LSBG samples, we show that our sample of LSBG candidates extends the study of LSBGs to a regime of lower surface brightness, fainter magnitude, and broader properties than the previous SDSS-based LSBG samples. This sample is also well suited for training higher-performance deep learning models to automatically identify LSBGs in the large data volumes expected from wider and deeper imaging surveys in the future.
Acknowledgements.
This work is supported by the National Key R&D Program of China (grant No. 2022YFA1602901), the Youth Innovation Promotion Association, Chinese Academy of Sciences (No. 2020057), the science research grants from the China Manned Space Project, and the National Natural Science Foundation of China (NSFC; Nos. 12090041 and 12090040). Additional support comes from the Strategic Priority Research Program of the Chinese Academy of Sciences (grant Nos. XDB0550100 and XDB0550102) and the Open Project Program of the Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences.
References
- Adami et al. (2006) Adami, C., Scheidegger, R., Ulmer, M., et al. 2006, A&A, 459, 679
- Aihara et al. (2018a) Aihara, H., Armstrong, R., Bickerton, S., et al. 2018a, PASJ, 70, S8
- Aihara et al. (2018b) Aihara, H., Arimoto, N., Armstrong, R., et al. 2018b, PASJ, 70, S4
- Akiba et al. (2019) Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. 2019, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19 (New York, NY, USA: Association for Computing Machinery), 2623–2631
- Baldry et al. (2004) Baldry, I. K., Glazebrook, K., Brinkmann, J., et al. 2004, ApJ, 600, 681
- Blanton & Moustakas (2009) Blanton, M. R., & Moustakas, J. 2009, ARA&A, 47, 159
- Bothun et al. (1997) Bothun, G., Impey, C., & McGaugh, S. 1997, PASP, 109, 745
- Burkholder et al. (2001) Burkholder, V., Impey, C., & Sprayberry, D. 2001, AJ, 122, 2318
- Chen & Guestrin (2016) Chen, T., & Guestrin, C. 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (New York, NY, USA: Association for Computing Machinery), 785–794
- Cheng et al. (2020) Cheng, T.-Y., Conselice, C. J., Aragón-Salamanca, A., et al. 2020, MNRAS, 493, 4209
- Cohen et al. (2018) Cohen, Y., van Dokkum, P., Danieli, S., et al. 2018, ApJ, 868, 96
- Courteau (1996) Courteau, S. 1996, ApJS, 103, 363
- Das & Pandey (2024) Das, A., & Pandey, B. 2024, arXiv e-prints, arXiv:2402.05788
- de Blok & McGaugh (1996) de Blok, W. J. G., & McGaugh, S. S. 1996, ApJ, 469, L89
- de Blok & McGaugh (1997) de Blok, W. J. G., & McGaugh, S. S. 1997, MNRAS, 290, 533
- de Blok et al. (2001) de Blok, W. J. G., McGaugh, S. S., & Rubin, V. C. 2001, AJ, 122, 2396
- de Blok et al. (1996) de Blok, W. J. G., McGaugh, S. S., & van der Hulst, J. M. 1996, MNRAS, 283, 18
- de Blok & van der Hulst (1998a) de Blok, W. J. G., & van der Hulst, J. M. 1998a, A&A, 335, 421
- de Blok & van der Hulst (1998b) de Blok, W. J. G., & van der Hulst, J. M. 1998b, A&A, 336, 49
- Dey et al. (2019) Dey, A., Schlegel, D. J., Lang, D., et al. 2019, AJ, 157, 168
- Du et al. (2015) Du, W., Wu, H., Lam, M. I., et al. 2015, AJ, 149, 199
- Du et al. (2017) Du, W., Wu, H., Zhu, Y., Zheng, W., & Filippenko, A. V. 2017, ApJ, 837, 152
- Feurer et al. (2015) Feurer, M., Klein, A., Eggensperger, K., et al. 2015, in Advances in Neural Information Processing Systems, ed. C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett, Vol. 28 (Curran Associates, Inc.), 2962
- Freeman (1970) Freeman, K. C. 1970, ApJ, 160, 811
- Galaz et al. (2011) Galaz, G., Herrera-Camus, R., Garcia-Lambas, D., & Padilla, N. 2011, ApJ, 728, 74
- Galaz et al. (2022) Galaz, G., Frayer, D. T., Blaña, M., et al. 2022, ApJ, 940, L37
- Giovanelli (2007) Giovanelli, R. 2007, Nuovo Cimento B Serie, 122, 1097
- Graham & Driver (2005) Graham, A. W., & Driver, S. P. 2005, PASA, 22, 118
- Greco et al. (2018) Greco, J. P., Greene, J. E., Strauss, M. A., et al. 2018, ApJ, 857, 104
- Haberzettl et al. (2007) Haberzettl, L., Bomans, D. J., & Dettmar, R. J. 2007, A&A, 471, 787
- He et al. (2020) He, M., Wu, H., Du, W., et al. 2020, ApJS, 248, 33
- Hinz et al. (2007) Hinz, J. L., Rieke, M. J., Rieke, G. H., et al. 2007, ApJ, 663, 895
- Huang et al. (2014) Huang, S., Haynes, M. P., Giovanelli, R., et al. 2014, ApJ, 793, 40
- Impey & Bothun (1997) Impey, C., & Bothun, G. 1997, ARA&A, 35, 267
- Kass & Raftery (1995) Kass, R. E., & Raftery, A. E. 1995, Journal of the American Statistical Association, 90, 773
- Kauffmann et al. (2003a) Kauffmann, G., Heckman, T. M., White, S. D. M., et al. 2003a, MNRAS, 341, 33
- Kauffmann et al. (2003b) Kauffmann, G., Heckman, T. M., White, S. D. M., et al. 2003b, MNRAS, 341, 54
- Kuzio de Naray et al. (2004) Kuzio de Naray, R., McGaugh, S. S., & de Blok, W. J. G. 2004, MNRAS, 355, 887
- Lang et al. (2016) Lang, D., Hogg, D. W., & Mykytyn, D. 2016, The Tractor: Probabilistic astronomical source detection and measurement, Astrophysics Source Code Library, record ascl:1604.008
- Lei et al. (2019) Lei, F.-J., Wu, H., Zhu, Y.-N., et al. 2019, ApJS, 242, 11
- Lei et al. (2018) Lei, F.-J., Wu, H., Du, W., et al. 2018, ApJS, 235, 18
- Martin et al. (2019) Martin, G., Kaviraj, S., Laigle, C., et al. 2019, MNRAS, 485, 796
- Martin et al. (2013) Martin, N. F., Ibata, R. A., McConnachie, A. W., et al. 2013, ApJ, 776, 80
- Matthews et al. (2001) Matthews, L. D., van Driel, W., & Monnier-Ragaigne, D. 2001, A&A, 365, 1
- McGaugh (1996) McGaugh, S. S. 1996, MNRAS, 280, 337
- McGaugh et al. (1995) McGaugh, S. S., Bothun, G. D., & Schombert, J. M. 1995, AJ, 110, 573
- Minchin et al. (2004) Minchin, R. F., Disney, M. J., Parker, Q. A., et al. 2004, MNRAS, 355, 1303
- Monnier Ragaigne et al. (2003) Monnier Ragaigne, D., van Driel, W., Schneider, S. E., Jarrett, T. H., & Balkowski, C. 2003, A&A, 405, 99
- O’Neil et al. (1997) O’Neil, K., Bothun, G. D., & Cornell, M. E. 1997, AJ, 113, 1212
- O’Neil et al. (2000) O’Neil, K., Bothun, G. D., & Schombert, J. 2000, AJ, 119, 136
- O’Neil et al. (2004) O’Neil, K., Bothun, G., van Driel, W., & Monnier Ragaigne, D. 2004, A&A, 428, 823
- Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
- Rahman et al. (2007) Rahman, N., Howell, J. H., Helou, G., Mazzarella, J. M., & Buckalew, B. 2007, ApJ, 663, 908
- Rodríguez & Padilla (2013) Rodríguez, S., & Padilla, N. D. 2013, MNRAS, 434, 2153
- Ruiz-Macias et al. (2020) Ruiz-Macias, O., Zarrouk, P., Cole, S., et al. 2020, Research Notes of the American Astronomical Society, 4, 187
- Sabatini et al. (2005) Sabatini, S., Davies, J., van Driel, W., et al. 2005, MNRAS, 357, 819
- Schombert et al. (2011) Schombert, J., Maciel, T., & McGaugh, S. 2011, Advances in Astronomy, 2011, 143698
- Strateva et al. (2001) Strateva, I., Ivezić, Ž., Knapp, G. R., et al. 2001, AJ, 122, 1861
- Tanoglidis et al. (2021a) Tanoglidis, D., Ćiprijanović, A., & Drlica-Wagner, A. 2021a, Astronomy and Computing, 35, 100469
- Tanoglidis et al. (2021b) Tanoglidis, D., Drlica-Wagner, A., Wei, K., et al. 2021b, ApJS, 252, 18
- The Dark Energy Survey Collaboration (2005) The Dark Energy Survey Collaboration. 2005, arXiv e-prints, astro
- Trachternach et al. (2006) Trachternach, C., Bomans, D. J., Haberzettl, L., & Dettmar, R. J. 2006, A&A, 458, 341
- van den Hoek et al. (2000) van den Hoek, L. B., de Blok, W. J. G., van der Hulst, J. M., & de Jong, T. 2000, A&A, 357, 397
- van der Hulst et al. (1993) van der Hulst, J. M., Skillman, E. D., Smith, T. R., et al. 1993, AJ, 106, 548
- van Dokkum et al. (2015) van Dokkum, P. G., Abraham, R., Merritt, A., et al. 2015, ApJ, 798, L45
- van Zee et al. (1997) van Zee, L., Haynes, M. P., & Salzer, J. J. 1997, AJ, 114, 2497
- Venhola et al. (2017) Venhola, A., Peletier, R., Laurikainen, E., et al. 2017, A&A, 608, A142
- Vorobyov et al. (2009) Vorobyov, E. I., Shchekinov, Y., Bizyaev, D., Bomans, D., & Dettmar, R. J. 2009, A&A, 505, 483
- Wyder et al. (2009) Wyder, T. K., Martin, D. C., Barlow, T. A., et al. 2009, ApJ, 696, 1834
- Yi et al. (2022) Yi, Z., Li, J., Du, W., et al. 2022, MNRAS, 513, 3972
- York et al. (2000) York, D. G., Adelman, J., Anderson, J. E., Jr., et al. 2000, AJ, 120, 1579
- Zaritsky et al. (2023) Zaritsky, D., Donnerstein, R., Dey, A., et al. 2023, ApJS, 267, 27
- Zaritsky et al. (2022) Zaritsky, D., Donnerstein, R., Karunakaran, A., et al. 2022, ApJS, 261, 11
- Zaritsky et al. (2019) Zaritsky, D., Donnerstein, R., Dey, A., et al. 2019, ApJS, 240, 1
- Zhang et al. (2024) Zhang, B.-Q., Wu, H., Du, W., et al. 2024, Research in Astronomy and Astrophysics, 24, 015018
- Zhong et al. (2008) Zhong, G. H., Liang, Y. C., Liu, F. S., et al. 2008, MNRAS, 391, 986
- Zou et al. (2017) Zou, H., Zhou, X., Fan, X., et al. 2017, PASP, 129, 064101