Galaxy cluster optical mass proxies from probabilistic memberships
Abstract
Robust galaxy cluster mass estimates are fundamental for constraining cosmological parameters from counts. For this reason, it is essential to search for tracers that, independent of the cluster’s dynamical state, have a small intrinsic scatter and can be easily inferred from observations. This work uses a simulated data set to focus on photometric properties and explores different optical mass proxies including richness, optical luminosity, and total stellar mass. We have developed a probabilistic membership assignment that makes minimal assumptions about the galaxy cluster properties, limited to a characteristic radius, velocity dispersion, and spatial distribution. Applying the estimator to over galaxy clusters with within a mass range of to M⊙, we obtain robust richness estimates that deviate from the median true value (from simulations) by . The scatter in the mass-observable relations is dex for richness, dex for optical luminosity, and dex for stellar mass. We also discuss membership assignment, completeness and purity, and the consequences of small centre and redshift offsets. We conclude that the application of our method for photometric surveys delivers competitive cluster mass proxies.
keywords:
Galaxies: clusters: general – Galaxies: groups: general – Methods: statistical1 Introduction
1.1 The need for optical-based mass-proxies for galaxy clusters
Wide-field cosmological imaging surveys such as KiDS (de Jong et al., 2013), DES (The Dark Energy Survey Collaboration, 2005), HSC (Miyazaki et al., 2012), WISE (Wright et al., 2010), LSST (Ivezić et al., 2019), EUCLID (Sartoris et al., 2016; Euclid Collaboration et al., 2022) and others, provide (or are expected to provide) datasets containing billions of galaxies. These data can help us understand the large-scale structure of the Universe and its evolution. As galaxy clusters trace overdensity peaks in the matter distribution, their abundance as a function of mass and redshift is quite sensitive to the matter density of the Universe and the evolution of its clustering. Thus, galaxy clusters are known as powerful tools for constraining cosmological parameters (e.g. Carlberg et al., 1996; Reiprich & Böhringer, 2002; Voit, 2005; Vikhlinin et al., 2009; Allen et al., 2011; Weinberg et al., 2013; Pacaud et al., 2016; Costanzi et al., 2019; Ider Chitham et al., 2020; Finoguenov et al., 2020). This approach, however, depends on robust and precise mass estimates for all detected structures. Since this cannot be directly inferred from observations, we must rely on observable proxies for the halo masses.
An ideal mass proxy should have a small intrinsic scatter, present a minor dependence on the cluster’s dynamical state, and be easily accessed through observations. Various independent mass proxies have been explored in recent decades to minimize uncertainties and systematic effects. Statistical uncertainties can go down to - dex for gas mass, temperature, X-ray luminosity (Ettori, 2013; Sereno et al., 2019), Sunyaev-Zel’dovich effect signal (Planck Collaboration et al., 2016; Pratt & Bregman, 2020), stellar mass (Pereira et al., 2018) and cluster optical richness (Lopes et al., 2009; Andreon, 2015; Bellagamba et al., 2019). Recently, Costanzi et al. (2019) and Ider Chitham et al. (2020) successfully utilized optical richness to derive cosmological constraints in SDSS (Aihara et al., 2011) and CODEX (Finoguenov et al., 2020) surveys, respectively. Nevertheless, statistics of low-mass structures such as galaxy groups are still limited and require further investigation.
Imaging surveys that use narrow-band filters are particularly interesting for galaxy cluster studies. Those surveys can be seen as a midpoint between broad-band imaging and spectroscopic surveys in the sense of the precision and accuracy of galaxies’ photometric redshifts (photo-z’s). Examples includes J-PAS (Benitez et al., 2014; Bonoli et al., 2021), PAU (Martí et al., 2014), S-PLUS (Mendes de Oliveira et al., 2019) and J-PLUS (Cenarro et al., 2019). In terms of group and cluster detections, this transition affects the cluster finder algorithms in the sense that the methodologies are migrating from a colour-based approach to ones that can take advantage of the increasing spectro-photometric information (for example, the performance and algorithm selection in Euclid Collaboration et al., 2019).
It is, therefore, also necessary to revisit the membership estimate techniques and the strategies that use the spectro-photometric data to derive mass proxies with low scatter. However, a common problem when dealing with photometric redshifts is the magnitude of the uncertainties in the redshift space. A typical error of in photo-zs corresponds to km/s, which is - larger than a typical, virialized, cluster velocity dispersion estimated with spectroscopic redshifts. This effect leads to contamination by neighbouring galaxies along the line of sight and can also create false cluster detections by linking unbounded structures (Weinberg et al., 2013). The use of less discriminating observables, such as individual colours, can exacerbate this issue.
Some recent works by Castignani & Benoist (2016); Bellagamba et al. (2019); Lopes & Ribeiro (2020) have addressed this matter regarding the membership of galaxies to clusters or groups. The use of continuous probabilistic memberships, instead of binomial ones (member/non-member), emerges as a superior description of what can be inferred given the data available. This approach allows for the definition of a set of galaxy-based mass proxies that can be derived by weighting galaxy numbers, luminosities, stellar masses, or others according to their membership probability.
1.2 The state-of-the-art in probabilistic membership
In 2000, Brunner & Lubin introduced a technique that produces a probabilistic membership interpretation without directly relying on spectroscopic information. First, the authors assign a Gaussian probability distribution function to each galaxy centred on the estimated photometric redshift. Next, the membership is obtained by integrating the distribution function within the galaxy cluster redshift interval.
This method was improved by George et al. (2011) and Castignani & Benoist (2016). The authors introduce a Bayesian formalism that also considers the cluster centre-galaxy projected distance, magnitudes, and the relative population size between field and cluster galaxy density. These studies, tailored for the specific sample in question, provide a better understanding of the membership dependence on photo-z accuracy, magnitudes, and cluster properties.
In the context of previous studies of galaxy cluster detection and galaxy population identification, the redMaPPer (Rykoff et al., 2014) and AMICO (Bellagamba et al., 2019; Maturi et al., 2023) cluster-finders advanced meaningfully by relying on an optimal filtering formalism. Considering the previously commented information, the methods also included self-similar models to describe the expected magnitude and galaxy distributions in clusters. This modelling allowed better discrimination between member and field galaxies, producing membership estimates with lower contamination.
In this work, we aim to study probabilistic memberships and their possible application as photometric-based mass proxies by expanding previous approaches to explore well-defined photo-z probability density functions (PDFs) surveys like S-PLUS and J-PAS enable. The procedure allows further refinements over the red sequence methods and includes fainter galaxies, essential for galaxy groups. Here, we take a step back and revisit potential results that can be achieved without strong modelling hypotheses and possible priors that are not entirely adequate to our data.
Motivated by the advent of several ongoing and forthcoming wide photometric surveys, we introduce a galaxy membership algorithm that follows the minimal assumption, data-driven, and photo-z PDF-based approach. We analyze the galaxy group/cluster distributions down to M⊙, using a mock sky simulation based on the Southern Photometric Local Universe Survey (S-PLUS, Mendes de Oliveira et al., 2019) up to redshift , and test several physically meaningful mass-proxies that can be derived from those memberships. To the best of our knowledge, only Castignani & Benoist (2016), Bellagamba et al. (2019), and Werner et al. (2022) consider samples through this mass range.
1.3 Outline of the paper
In §2 we present details of the simulated sky area. We next describe in §3 membership assignments based on a variable aperture radius. In §4 we show the results from applying the method over the mock catalogue, focusing on the performance of the richness estimation, scaling relations with different optical proxies, and minor disturbances in the cluster centre and redshift. In §5 we discuss the membership significance, present completeness and purity results, and derive cluster positions and based on member galaxies. Finally, in §6 we summarize the results and present the conclusions.
Through this work, we adopt a flat CDM cosmology, with , and , following the Planck Collaboration et al. (2014) parameters, which are the same cosmology adopted in the simulations. Magnitudes are given in the AB system.
2 Mock catalogue
To evaluate our estimators’ performance, we use a mock catalogue from a simulated sky lightcone by Araya-Araya et al. (2021); Werner et al. (2022), made to emulate the S-PLUS111S-PLUS is a photometric survey of the Southern Sky using 12 optical bands (7 narrow and 5 broad), from a 0.8m robotic telescope located at Chile/CTIO (Mendes de Oliveira et al., 2019). survey first data release (Mendes de Oliveira et al., 2019).
This mock catalogue simulates a projected area of 324 square degrees, using synthetic galaxies following Henriques et al. (2015) analytical model (SAM). The algorithm uses the Millenium run simulation (Springel et al., 2005) as a base to generate an equivalent matter density field scaled by the Planck 1 cosmological framework (Planck Collaboration et al., 2014). The general halo mass resolution is M, but only those with the corresponding stellar mass higher than M are accepted in the simulation.
All the dominant dark matter halos with M⊙ are selected. To ensure robust membership estimates and statistics, we choose only the ones with at least associated galaxies. The associated galaxy members (hereafter “true members”) are retrieved from the merger history of the system. All galaxies that formally reside in a dark matter halo and evolve into a chosen cluster receive the cluster identification ID “haloId”. This ID allows us to easily identify all the galaxies that belong to a certain cluster. The sky coordinates are estimated from the median value of the member galaxies’ distribution. Limiting the catalogue to , we obtain groups with masses between – M⊙, clusters with – M⊙, with – M⊙, and massive clusters with M⊙. One limitation of the catalogue is that the statistics at low and high masses are poor due to the simulation mass limits. We cannot state how representative the sample is in relation to the full halo population and the impact on the results. A detailed analysis of the simulation is beyond the scope of this paper. Low masses are also affected by the applied membership cut (see the resulting distributions in Fig. 4).
Observed true redshifts in the mocks are created by adding cosmological redshifts (assuming that all galaxies at have a comoving distance of kpc) and Doppler redshifts created by peculiar motions of each mock galaxy. Simulation 3D coordinates are generated following the Kitzbichler & White (2007) methodology.
Star formation histories (SFH) and stellar masses are extracted from the output of the SAM. With the evolution of the information through the simulation’s cosmic time, it is possible to attribute spectral energy distributions (SED) to each time step. SEDs, alongside dust extinction models, are used to compute u, g, r, i, and z apparent magnitudes, similarly to the Sloan AB filter system (Fukugita et al., 1996).
The mock photometric redshifts are generated by perturbing the true value by adding a random number from a normal distribution with as the standard deviation. is the normalized median absolute deviation of the comparison between photo-zs and spectroscopic redshifts for a given galaxy magnitude obtained for the DR1/S-PLUS survey (, Molino et al., 2020, with a median value of ).
A similar procedure generates the galaxy’s photo-z probability density function (PDF). For each galaxy, we assume a normal distribution with as the standard deviation, centred on the created photometric redshift. This technique considers the correlation between the galaxy’s magnitudes and its photometric redshift errors.
3 A density-based probabilistic membership algorithm
3.1 A fixed aperture richness estimator (FAE)
To make a first rough richness estimation, we calculate the galaxy overdensity at a given position in the projected (celestial) coordinates and redshift, or the (2+1)D space for a given cluster.
The galaxies are selected based on the distance from the cluster centre, within a cutoff radius of kpc 222as the median value of the characteristic radius of the cluster sample, more details are provided in § 3.2 and 4.1, and a redshift interval, (with ), where is a robust estimation of the photo-z scatter at the relevant redshift (similar to Araya-Araya et al., 2021).
A probability membership can then be assigned to each of the selected galaxies by:
(1) |
where is the photo-z PDF.
The richness around the cluster position is estimated as,
(2) |
where i is the index that runs over all galaxies within the cluster area (), and is the median surface density of field galaxies at a given redshift. We calculate the surface density as the sum of the galaxy’s probabilities per area for randomly distributed sky coordinates. We avoid bias due to the presence of clusters or voids with a clipping procedure. This strategy is adequate for small/medium sky areas.
Another possible approach to dealing with field galaxies contamination is to compute the contribution in annuli near the cluster centre. This procedure is more sensitive to large-scale structures and could better characterise different sky regions. Here, we tested the sample with an inner radius of Mpc and an external one of Mpc. We do not find, statistically, significant trends in the final richness results with FAE (a median difference of ) in agreement with the initial method.
We present preliminary scaling relations obtained with the above procedure in Fig. 1, and compare to the true values of the mock catalogue (more details in § 2), as orange and grey curves, respectively. Red points represent the fixed aperture richness, and black points the mock catalogue. Both datasets share the same halo masses values. While the richness estimated by the above procedure correlates with the mass, the slope is rather flat given the scatter () in comparison to the mock catalogue (), thus making it an uncompelling proxy. The main reason we found for this flatness is the fixed radius that does not take into consideration differences in the sizes of clusters of different masses.

3.2 An adaptive membership estimator (AME)

We propose an adaptive aperture procedure to reduce the aperture effect on the richness and keep the membership and richness estimation as data-based and assumptions-free as possible. The inputs are the galaxy catalogue projected positions, photo-z PDFs, and the cluster position in the (2+1)D space: z, RA and Dec. In this work, we assume the PDF as a Gaussian distribution, centred on the photometric redshift, developed to reproduce the expected values in the S-PLUS survey. In the following, we present our main algorithm steps’ and details.
-
1.
Remove obvious non-members by cutting out all galaxies outside a radius of Mpc in the plane of the sky and with .
-
2.
Calculate the galaxy density profile. A core radius () will be defined as a break or “knee” in this profile. Estimate (Eq. 2) within this radius.
-
3.
Draw a photo-z PDF-based random redshift value for each galaxy within .
-
4.
Calculate the sample velocity dispersion with the drawn redshifts after a clipping process.
-
5.
Run HDBSCAN (Campello et al., 2013) in the 2D space using the remaining galaxies. Input parameters are galaxy positions and as the minimum cluster size parameter. The structure with the most galaxies is assumed to be a primary counterpart of the cluster in question. Minor groups are candidate substructures.
-
6.
Repeat N times steps –. The probability of each galaxy being a member is the number of times that the galaxy is included as a member over N: .
In step (i), we try to ensure that no galaxies with reasonable possibilities of being gravitationally bound cluster members are excluded from the putative member pool.
Consistent with our data-oriented process, we let the radial distribution of galaxies define a projected space aperture (Rc) for the remaining analysis. We detect a discontinuity on the radial density profile gradient starting from the centre in step (ii). This behaviour is expected to happen when our initial sample stops being cluster dominated to be field dominated.
The projected density profile is calculated in the annulus, from the centre on, in overlapping steps of kpc and with a log-scaling width of – kpc. Survey boundaries are carefully considered when estimating the areas using the Monte Carlo method. This process also provides a powerful approach to dealing with possible masked survey areas. The core radius, , is defined as a sudden drop/break in density (e.g., a factor of 2 within steps).
To identify this “break” point, we use the Kneedle algorithm (Satopaa et al., 2011). In short, the code identifies a local minimum by accounting for the difference between the density profile and a straight line connecting its initial and final points. is then the first (more central) detected local minima. Galaxies outside are discarded at this point.
Figure 2 displays some examples of the radial distribution and the “break” point (red vertical lines). Despite the noisy behaviour for larger radii, the central discontinuity is pronounced. We tested different binning (width of the annuli radius) and steps (increment of the smaller annuli radius). Larger steps can introduce a positive difference of to kpc over the entire range of radii. For binning, a more refined scheme does not produce a significant bias. The expectation is that the signal-to-noise contrast between field and cluster galaxy populations is maximum within this radius. In § 4.1 we discuss how scales with .

In steps (iii) and (iv), we use the photo-z PDF instead of its point estimation. We draw a value for each galaxy in the remaining sample, mimicking the realization of an ideal redshift measurement.
After these clippings in the (2+1)D space, we still expect the sample to be somewhat contaminated. Due to the quality of the photometric redshift, tests with true member galaxies only produced a similar value in velocity dispersion. This analysis indicates that even relying on an interactive approach relating a mock quantity, e.g., richness, with the expected velocity dispersion would not produce better results. Applying a more rigorous cut () may solve this issue for data with a well-behaved PDF since the limit becomes smaller than the redshift uncertainty itself. However, in the case of a more realistic distribution, this procedure can eliminate a meaningful part of the information. Nevertheless, despite the limitations of the code, we can obtain accurate results, as discussed below.
As a final step (v), we use the density-based clustering algorithm HDBSCAN333Hierarchical Density-Based Spatial Clustering of Applications with Noise.. It connects points in a space based on proximity, deeming more isolated ones as interlopers.





Similar algorithms, such as the better-known DBSCAN (Ester et al., 1996), have been used in astronomy for similar purposes (Bhattacharya et al., 2017; Olave-Rojas et al., 2018). DBSCAN, however, requires two parameters to run: a distance-related parameter () and the minimum number of neighbours: (min_samples). HDBSCAN estimates by itself, varying and integrating it into a search for the most stable value. It still needs the minimum number of neighbours as an input parameter.
To provide this input, we use the relation between the number of true members (“true richness”) defined within Rc, instead of a fixed aperture of kpc, and (Eq. 2). We show in Fig. 3 the relation between the number of true mock cluster members and the average of values. It can be seen that the richness values we are obtaining are almost twice the true values indicating important contamination. On the other hand, as there is a linear relation between those quantities, we can use to estimate min_samples. As the richness depends on the chosen absolute magnitude limit, the slope of this relation may be slightly higher (lower) for a shallower limit (deeper).
The final membership probability is finally defined by running the procedure above times and estimating the membership probability of a given galaxy by the number of times it has been selected over the total number of tries. This iteration process allows for several redshift realizations of the photo-z’s PDFs, thus making full use of it.
4 Results
We are considering galaxies with for the membership sample. This way, we have a volume-limited sample up to . At this redshift, this luminosity is about one magnitude fainter than the characteristic (Puddu et al., 2021). We calculate the absolute magnitudes simply by the relation with apparent magnitude, , where is the luminosity distance in Mpc.
In Fig. 4, we show the main products of this exercise. The redshift and mass distributions of the mock clusters, plus the main mass proxies obtained through the probabilistic membership method:
(3) | ||||
(4) | ||||
(5) |
These values depend on the galaxy’s P, which include only of the total galaxy population in the simulation for . The total fraction of true galaxies associated with a cluster, known from the simulations and without any cuts, is . Applying thresholds for mass, redshift, and the minimum number of true associated galaxies, this value drops to . The discrepancy between the two percentages for P and the true members highlights the contamination of our sample. We discuss in §5 the P completeness and purity results, as well as different cluster mass and redshift ranges, to understand the source of the contamination.
4.1 Physical meaning of Rc
Galaxy probabilities and richness estimates are typically measured within some specific aperture radius related to the cluster’s physical size. Common choices are the radius within which the density is ( or other) times the critical density of the Universe.
For the results using the adaptive membership estimator (AME), assuming the radius in which HDBSCAN will act is also necessary. This step is done by searching for a break () in the number density profile of the cluster, as described above (Section 3). A statistical comparison between the mock-defined and reveal that is of , having a scatter of . Because does not represent a very restrictive criterion, it also can be interpreted as a physical size of the cluster, close to .
4.2 Sanity check
To ensure that the richness code works properly, we perform a sanity check with a catalogue of random coordinates and redshifts. The random catalogue comprises galaxy clusters with sky coordinates and redshifts drawn using the python function random.uniform. The limits are defined by the boundaries of the mock catalogue. For , we cannot apply a similar random distribution. Because groups are usually described with a smaller radius and are more numerous in counts, the function that describes the radii in the function of the structure mass follows an exponential behaviour. We therefore use results from the mock cluster catalogue to derive a probability distribution function. We then randomly distribute values that reproduce the data.
For each random cluster, we estimate richness values for both fixed aperture and adaptive estimators. The results are presented in Fig. 5. As expected, the sanity check shows median values near zero, with for the FAE method. This result indicates that richness values lower than may include random superpositions444For further analysis on robust decontamination process, we suggest a procedure developed by Klein et al. (2017), on which the authors discuss an estimator that allows removing cluster candidates below a given threshold.. Note that we have some below zero. This behaviour occurs when the sum of the probability of the background galaxies is greater than the sum of galaxies around the fixed radius, indicating a region of galaxy density lower than the mean value.
For the AME, values are concentrated over the zero line without error bars. However, this does not mean the entire random points return zero. In this case, we hit a significant structure () times by chance. These chance detections though are insufficient to account for the statistics (right panel of Fig. 5).
Another important discussion for the AME is the minimum cluster size obtained by the linear fitting. As the parameter depends on the fixed aperture estimator, values with return min_samples ; However HDBSCAN only works with min_samples . If we simply round the values, we force the code to find at least bound galaxies. This circumstance introduces a bias of at least for the random detections at , which decreases for higher redshifts. To avoid this situation, we introduce a threshold, where HDBSCAN is unable to run, resulting in .
4.3 Richness significance


The main goal of this study is to develop a methodology that produces an optical mass proxy with low scattering using photometric information. Additionally, we aim for a method that can be applied to any cluster catalogue, ranging from groups to galaxy clusters, without relying on strong modelling hypotheses. To use optical richness as a proxy for scaling relations, we need to be able to quantify the richness with good accuracy. To check its significance, we can perform an analysis in richness bins.
As we know the true richness of the sample, we can compare the average values given by the mock with the richness calculated by the adaptive membership estimator (). The true richness is given as the sum of the galaxies identified by the simulations located within from the cluster centre. Fig. 6 presents the agreement between both quantities. A black dotted line shows the one-to-one line. The residual value between both quantities, , is . This low and unbiased result reassures the quality of our richness estimate. We require that each bin includes at least clusters.

4.4 Mass-Observable relations
The main difficulty in using galaxy clusters for cosmological studies is the measurement of their masses for each object. A workaround is to correlate mass with other observational properties, such as optical richness, X-ray luminosity, or total stellar mass. This relationship between mass and observable is usually calibrated using a limited sample of objects and then extended to the full catalogue. For example, lensing surveys consider the full sample in stacked analyses. An interesting mass proxy should present a low intrinsic scatter to produce robust mass estimates.
Our AME returns for each galaxy the probability of being physically bound to a certain cluster. Using this capability, we can calculate other proprieties, weighted by the membership, to characterize the cluster sample. For example, with the magnitude information in the -band we can estimate the total optical luminosity, . The solar absolute magnitude is represented by in r-band (, Willmer, 2018), and the -th galaxy absolute magnitude in the same band by . The mock galaxy’s stellar mass can be used to derive the total stellar mass, following the same procedure as described above, , with as the -th galaxy stellar mass.
A similar approach can be applied to derive the mock quantities. In this case, is always one for the true galaxy members and zero otherwise.
With our mass proxies, we can derive the scaling relations. The linear regression is done by linmix (Kelly, 2007). This algorithm uses a Bayesian approach and accounts for errors in both parameters for the minimization process, masses and proxies, which is ideal for real data. The relation is modelled as,
(6) |
where coefficients are represented by and , is the mass proxy, is a pivot value corresponding to the mean value of the mock proxy, and is the intrinsic random scatter about the regression. Table 1 shows the resulting best-fitting parameters for both mock (marked with a symbol ) and estimated values from the AME. The relation between median values of mass and Richness (top), (middle), and (bottom panels), separated in proxy bins, can be found in Fig. 7. Black diamonds represent mock and red circles represent the estimator results.
and are given in units of L⊙ and M⊙. Mock results are identified by after the observable proxy. Proxy ” ” ”



Statistically, the most scattered distribution is the richness-mass relation. The mock results for the low-mass end highlight the high intrinsic scatter for the small richness groups. The same structure with () can have a halo mass between and M⊙. Besides the different low richness end, both (mock and adaptive estimators) linear regressions present the same behaviour. The observed intrinsic scatter is dex, a value similar to the best-case scenario, i.e. simulations. The total optical luminosity is a valuable parameter with median dex, compared to dex from simulations. This residual scatter observed between mock and AME results is consistent with the intrinsic scatter of the relation -, of . Considering the mass range amplitude, similar behaviour is observed as depends mainly on magnitudes and galaxies probabilities. We see a small deviation for lower luminous structures, probably due to external contamination, that slightly increases the intrinsic scatter in relation to the best-case scenario. is an interesting option with the lowest intrinsic scatter. It provides a galaxy cluster candidate’s characterization regarding physical properties, such as stellar mass. These results may be overly-optimistic since we use the exact values of the stellar masses. In optical surveys, scatter may be introduced by the inference method. The same behaviour found in for the low end is seen here. We also observe a small difference in that introduces a gap.
4.5 Perturbing the cluster centre and redshift
We can test the code robustness by applying small variations in redshift and coordinates in the mock cluster catalogue that mimics expected disturbances found in galaxy cluster detections using observational data.
Werner et al. (2022) discuss the application of the density-based algorithm PZWav (Gonzalez, 2014) for detecting clusters from S-PLUS DR1 and analyze the algorithm performance by comparing the outputs with the same mock catalogue used in our work. Due to the PZWav computational approach that detects substructures by the spatial distribution of the galaxies and estimates the redshift also based on the surrounding galaxies, the variations found in centre distances and redshifts are notably small. The mean radial difference is kpc with a standard deviation of kpc, and the mean redshift difference is with .
We now test different scenarios that include the mean value added to the ( kpc, ), ( kpc, ) and intervals ( kpc, ) as the peak of a 2D Gaussian distribution (such as Johnston et al., 2007; Vitorelli et al., 2018, for miscentering), and compare the obtained richness results with the centralized ones. For the variations, we do not use a preferential frame for the displacement. We draw instead a random value between to degrees that orient the offset.
While the radial variation is always positive, can be applied as a positive or negative offset. For this, we use normal distributions, centred on with proportional , or that randomly that adds or subtracts from the cluster redshift.
In Table 2, we show the median relative errors, and , between the richness results for and separately. As there were no significant trends through the redshift range, we present only one result containing all the galaxy clusters for each scenario.
For the centre distance, there is no effective deviation in richness results for small offsets, i.e., the median relative errors are consistent with zero. This analysis shows the code is robust to centre variations lower than kpc. Investigating further, we found that the median relative error is below until kpc, increasing to for kpc and for kpc. Note that the negative sign represents an underestimation of the richness. Those values indicate that the code is sensitive to differences in the galaxy density at cluster outskirts compared to the central distribution and the choice of the background estimate using the farthest galaxies as contamination points.
For the redshift offsetting, we observe a decrease (however, within error bars) in the richness measurements, going from for to an deviation for . Such difference as large as the photo-z uncertainty, causes it to exclude a significant contribution from galaxies near the centre of the cluster. This analysis shows that the code is robust to both centre and redshift variations.
[kpc] | ||
---|---|---|
Photo-z PDF | fraction | median relative errors |
---|---|---|
t-distribution | ||
bimodality | ||
4.6 More realistic photo-z PDFs
In this work we approximate the photo-z PDFs as well-behaved Gaussian distributions. However, PDFs from observations might be more complex, often presenting long tails or secondary peaks. As the photo-z PDFs play a central role in defining the richness, we also test the effects of the non-Gaussianity in our sample.
To address the effects of the long tails, we describe the galaxy sample with Student’s t-distributions. We centred it on the galaxy photo-z central value, with width (§ 2). We adopt the degrees of freedom parameter to make the wings much broader than the Normal ones. This test produced similar results in terms of richness, with a relative error of , as presented in Table 3.
To test the effect of a bimodal PDF, we took some examples from the S-PLUS database. We define that a PDF has a significant secondary peak when the distance between peaks is greater than , and the secondary peak amplitude is at least of that of the primary peak. For each galaxy in the mock, we attribute an S-PLUS-like double-peaked PDF in the function of the galaxies with the closest redshift and magnitude. We tested different fractions of double-peaked PDF, such as , , and , and found that the richness measurements are only affected for values , leading to underestimates of .
As this work is inspired by imaging surveys that use several narrow-band filters, we do not expect double-peak to be a more significant issue than our test. Surveys such as S-PLUS and J-PAS utilize a combination of filter systems that allow a better constraint of the photo-z PDFs. S-PLUS, for example, present a secondary peak fraction lower than .

5 Individual Memberships
In this section we present a view of the membership significance, compare the galaxy probabilities with the fraction of true members, discuss completeness and purity in terms of galaxy probabilities in different redshift and cluster mass ranges, and compare the cluster position and redshifts with the ones estimated by the selected member galaxies alongside the ones obtained using a cluster finder.
5.1 Significance
In the previous section we based our proxy estimator on the individual membership probabilities, . Here we will investigate the real meaning of this number using an approach similar to Castignani & Benoist (2016) and Lopes & Ribeiro (2020). The idea is to compare the fraction of true members () given by the mock and the estimated . The galaxy data is binned in , and is calculated for each bin.
Here we also introduce a test case based on the already known information from the mock to observe how a more strict limit in the velocity dispersion (step iv §3.2) can improve our results. First, we run AME using only the true member galaxies over the entire cluster sample and estimate the “real” velocity dispersion given the photo-zs. As no significant evolution with richness is expected, we take the median value of the sample. This step produces a smaller than when considering all galaxies (members and non-members) within . We also modify the clipping process: only the galaxies within are accepted instead of default . Finally, running AME with the new limits should produce purer results.
The resulting fractions are presented in Fig. 8. A black dotted line shows the one-to-one line, blue points represent our default model, and the orange diamonds are the values obtained with rigorous mock-based limit.
In both cases, we detect a slightly higher fraction of true members for low . On the other hand, on the high end the contamination is higher than expected. The combination of both effects produces a flattened curve in Fig. 8. A fraction of line-of-sight contamination is expected because photometric redshift uncertainties are one order of magnitude larger than typical cluster velocity dispersions. The average fraction is , increasing contamination for . This behaviour indicates that some non-members with the maximum peak of the photo-z PDFs near the cluster redshifts cannot be removed with the clipping. Still, a more rigorous cut could reduce the contamination contribution. For example, for the average fraction of contamination is decreased to only . However, the limit is lower than the redshift uncertainty itself, restraining at and introducing a small underestimate in our optical proxies.
We highlight that contamination by interlopers still occurs even in a controlled situation where we know a priori the sample velocity dispersion. Despite the higher fraction of interlopers found using the data-driven shallower threshold, we still obtain accurate results that allow us to produce scaling relations with competitive intrinsic scatter. A better characterization of how contamination affects the purity of our sample is discussed below (subsection 5.2).
5.2 Completeness and purity
Another way to quantify the robustness of our method is by analyzing the completeness (C) and purity (P). Usually, completeness is described as the fraction of true members correctly classified as a member, and purity is the fraction of true members concerning all objects selected as members.
Using the same notation of other classification studies (Castignani & Benoist, 2016; Lopes & Ribeiro, 2020), we estimate C and P as,
(7) |
(8) |
where is the number of galaxies that were classified as members, represents the number of true members given by the simulation, gives the number of false positive objects (wrongly classified as a true member), and are the false negative objects (a true member with negative output).
We can explore variations in both purity and completeness by selecting galaxies that have membership probabilities higher than a given limiting value. In Fig. 9, we show completeness and purity as a function of different thresholds, highlighting , , in black diamonds. The corresponding results are , , and , , . Note that increasing the limiting value higher than does not improve significantly (less than ) the purity. This limited improvement can be seen in the flattening of the curve in Fig. 9.



To identify possible dependencies on cluster redshift and halo mass, we divided the cluster sample into redshift bins of and . Fig. 10 shows the results highlighting threshold of , , and as a circle, square, diamond, and triangle, respectively. In a similar approach, Castignani & Benoist (2016), using a cut, observed mean values with only a few per cent variations (). Taking for comparison , we see in the upper panel of Fig. 10 a large difference between the (blue line) and the rest of the redshift range. This difference is in completeness and without a significant gain in purity (). Similar behaviour is observed in completeness for the other cuts (), but with also an improvement in purity (). Smaller variations are observed within , with increasing improvements in both and (, ). The red line for shows a decrease in completeness for of , with the highest purity values ().
For mass evolution, we observe a decrease in both C and P for more massive objects. For , C shows a change of and P a . Changes are larger for high cuts (, and for , respectively). We conclude that this behaviour does not originate from poor clusters, but is instead due to contamination in the cluster’s outskirts from field galaxies. Besides the code’s ability to remove spatially dispersed points, the ones too close to the real galaxy members are still considered. Massive clusters are usually populated with a larger sample of true galaxies over less concentrated areas than the group mass scale (Comerford & Natarajan, 2007; Merten et al., 2015). This effect means that the code will be more susceptible to accepting contributions from other field galaxies nearby, increasing the contamination. This lower density within the cluster core may lead to neglect of the outskirts. Less massive objects tend to be more concentrated, leading to more accurate results.
Studies with probabilistic membership assignments typically select a probability threshold to account only for reliable cluster members. In this work, we consider the contribution of all galaxies for the richness estimation. A more careful approach should be considered in the case of single cluster analyses due to the non-negligible fraction of outliers. However, for large samples, we present statistically robust results with low median differences and purity and completeness that are comparable to other estimators ( , , George et al., 2011; Castignani & Benoist, 2016, respectively).
5.3 Clusters position and redshift from Pmem
[kpc] | ||
---|---|---|
Omock-Ogals | ||
OPZWav-Ogals |
Another test we can perform is to measure the difference between the redshift derived from the member galaxies () and the mock cluster, as well as the centre discrepancies. We can take advantage of the galaxy cluster catalogue produced by PZWav for the mock (described in Werner et al., 2022) and compare how these values change in a real case of detection. For this analysis, we consider as a “counterpart” the PZWav detections that have a redshift difference with the mock cluster of and centre distance of kpc.
For redshifts, we can compute the average value weighting each galaxy photo-z contribution by its Pmem, as . This procedure ensures that galaxies weakly related to the cluster have a lower contribution in the calculations. We follow the procedure proposed by Castignani & Benoist (2016) for the centre estimates, where the authors estimate the mock clusters’ barycentre by averaging the member galaxies’ Cartesian coordinates. Here, we also introduce the probability weight for galaxies with . We do the same for the right ascension, , and for declination.
In Table 4 we show as Omock-Ogals the difference between the quantity O ( or ) for the mock catalogue and the averaged value obtained by the galaxies accepted as a member () around its centre. Similarly, OPZWav-Ogals is the difference between the quantity O for the PZWav catalogue and the selected galaxies around PZWav centres. Here we also average all the cluster samples due to insignificant trends through the redshift range.
For both scenarios, the redshift estimates are highly consistent within the detection and selected member galaxies, giving a total difference consistent with zero, with . This result highlights the applied computational approach to calculate the redshift estimates. Both PZWav and simulations calculate by averaging the nearby galaxies.
In contrast, the centres present substantial discrepancies, which arise from selection effects. The centre estimations in the mock clusters are based on the true members assigned independently of the galaxy’s centre distance. In our method, however, galaxies are selected within a cutoff radius , which leads to the removing some distant true members from the analysis. We also highlight the presence of contaminating galaxies that can offset the centre depending on their . The median error is kpc. For PZWav, centres are estimated through overdensities, calculated based on the galaxy distribution weighted by the integrated PDF (similarly to eq. 1, more details in Werner et al. (2022)). These overdensities may have physical sizes within to kpc. Then, the centre estimate considers the contribution of all galaxies inside the range. This circumstance returns a median error of .
We highlight that these values can be redshift dependent when considering angular sizes, on which for physical units we might under-estimate the offsets of low redshift clusters and overestimate high-z clusters. In this case, we found a variation of arcmin at that rapidly drops for arcmin at and stabilizes at arcmin within .
6 Summary and conclusions
In this work, we present a probabilistic membership assignment estimator that uses photometric parameters to derive reliable richness estimates, optical luminosity, and total stellar mass for 919 simulated structures with masses ranging from groups to galaxy clusters. The approach assumes minimal information about the definition of the cluster and uses the sky position within a characteristic radius () to select potential galaxy members.
Below, we highlight the main findings of this work:
-
•
The characteristic radius statistically scales as , with a median absolute deviation of .
-
•
Tests with random points distributed along the redshift of the simulations show that the FAE method returns of . By applying this value as a richness threshold for the adaptive membership estimator richness, , correctly yielded .
-
•
Considering the group/cluster sample, comparisons between and produce a linear relation in median values, with a deviation of .
-
•
With the probabilistic results, we successfully derive optical mass proxies that are simple, with low observational cost, and present small intrinsic scatter.
From the scaling relations:
dex
dex
dex.
-
•
We show that our adaptive estimator is robust for the small centre and redshift offsets. Displacements presented by Werner et al. (2022) produce richness variations lower than .
-
•
Testing the use of complex photo-z probability density functions (PDF) reveals that the presence of long tails does not significantly impact the results. However, in cases where the PDF exhibits a bimodal distribution, there is a potential underestimation of richness by approximately when more than half the sample has bimodal PDFs. Considering that the fraction of bi-modality in the S-PLUS data is only , we anticipate that the double-peak phenomenon is not a significant concern for our data.
Regarding the individual membership probabilities:
-
•
The fraction of true galaxy members in each bin emphasizes the contamination for higher probability values.
-
•
We analyze the distributions in different redshift ranges and masses and find a better selection of true galaxy members in group scales and for higher redshift ranges. This result is probably related to the cluster spatial distribution, where groups tend to be more concentrated.
-
•
The comparison between redshift average values obtained through the selected member galaxies and the nominal ones, given by the mock, shows similar redshift values. This comparison also shows centre position variations around kpc between mock and selected galaxies, and kpc for PZWav and selected galaxies.
We conclude that our estimator is robust to small offsets and produces optical mass proxies that are competitive with other traditional observables. Our method can be applied in present and future photometric surveys, such as S-PLUS and J-PAS, alongside any list of clusters or galaxy groups produced by cluster finders.
Acknowledgements
LD was supported by a scholarship from the Brazilian federal funding agency Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES). ESC acknowledges the support of the funding agencies CNPq (#309850/2021-5) and FAPESP (#2023/02709-9). PAAL thanks the support of CNPq (grants 433938/2018-8 and 312460/2021-0) and FAPERJ (grant E-26/200.545/2023). R.A.D. acknowledges partial support from CNPq grant 312565/2022-4. The authors are thankful to the anonymous referee for the useful comments and suggestions.
Data Availability
To access the mock catalogue presented in this article contact Pablo Araya-Araya (e-mail: paraya-araya@usp.br), asking for the specific dataset. The FAE and AME codes are available on GitHub: https://github.com/liadoubrawa/Probabilistic_membership_estimators.
References
- Aihara et al. (2011) Aihara H., et al., 2011, The Astrophysical Journal Supplement Series, 193, 29
- Allen et al. (2011) Allen S. W., Evrard A. E., Mantz A. B., 2011, Annual Review of Astronomy and Astrophysics, 49, 409–470
- Andreon (2015) Andreon S., 2015, A&A, 582, A100
- Araya-Araya et al. (2021) Araya-Araya P., Vicentin M. C., Sodré Laerte J., Overzier R. A., Cuevas H., 2021, MNRAS, 504, 5054
- Bellagamba et al. (2019) Bellagamba F., et al., 2019, MNRAS, 484, 1598
- Benitez et al. (2014) Benitez N., et al., 2014, arXiv e-prints, p. arXiv:1403.5237
- Bhattacharya et al. (2017) Bhattacharya S., Mahulkar V., Pandaokar S., Singh P. K., 2017, Astronomy and Computing, 18, 1
- Bonoli et al. (2021) Bonoli S., et al., 2021, A&A, 653, A31
- Brunner & Lubin (2000) Brunner R. J., Lubin L. M., 2000, AJ, 120, 2851
- Campello et al. (2013) Campello R. J. G. B., Moulavi D., Sander J., 2013, in Pei J., Tseng V. S., Cao L., Motoda H., Xu G., eds, Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 160–172
- Carlberg et al. (1996) Carlberg R. G., Yee H. K. C., Ellingson E., Abraham R., Gravel P., Morris S., Pritchet C. J., 1996, ApJ, 462, 32
- Castignani & Benoist (2016) Castignani G., Benoist C., 2016, A&A, 595, A111
- Cenarro et al. (2019) Cenarro A. J., et al., 2019, A&A, 622, A176
- Comerford & Natarajan (2007) Comerford J. M., Natarajan P., 2007, Monthly Notices of the Royal Astronomical Society, 379, 190
- Costanzi et al. (2019) Costanzi M., et al., 2019, MNRAS, 488, 4779
- Ester et al. (1996) Ester M., Kriegel H.-P., Sander J., Xu X., 1996, in KDD.
- Ettori (2013) Ettori S., 2013, MNRAS, 435, 1265
- Euclid Collaboration et al. (2019) Euclid Collaboration et al., 2019, A&A, 627, A23
- Euclid Collaboration et al. (2022) Euclid Collaboration et al., 2022, A&A, 662, A112
- Finoguenov et al. (2020) Finoguenov A., et al., 2020, A&A, 638, A114
- Fukugita et al. (1996) Fukugita M., Ichikawa T., Gunn J. E., Doi M., Shimasaku K., Schneider D. P., 1996, AJ, 111, 1748
- George et al. (2011) George M. R., et al., 2011, ApJ, 742, 125
- Gonzalez (2014) Gonzalez A., 2014, in Building the Euclid Cluster Survey - Scientific Program. p. 7
- Henriques et al. (2015) Henriques B. M. B., White S. D. M., Thomas P. A., Angulo R., Guo Q., Lemson G., Springel V., Overzier R., 2015, MNRAS, 451, 2663
- Ider Chitham et al. (2020) Ider Chitham J., et al., 2020, MNRAS, 499, 4768
- Ivezić et al. (2019) Ivezić Ž., et al., 2019, ApJ, 873, 111
- Johnston et al. (2007) Johnston D. E., et al., 2007, arXiv e-prints, p. arXiv:0709.1159
- Kelly (2007) Kelly B. C., 2007, ApJ, 665, 1489
- Kitzbichler & White (2007) Kitzbichler M. G., White S. D. M., 2007, MNRAS, 376, 2
- Klein et al. (2017) Klein M., et al., 2017, Monthly Notices of the Royal Astronomical Society, 474, 3324
- Lopes & Ribeiro (2020) Lopes P. A. A., Ribeiro A. L. B., 2020, MNRAS, 493, 3429
- Lopes et al. (2009) Lopes P. A. A., de Carvalho R. R., Kohl-Moreira J. L., Jones C., 2009, MNRAS, 399, 2201
- Martí et al. (2014) Martí P., Miquel R., Castander F. J., Gaztañaga E., Eriksen M., Sánchez C., 2014, MNRAS, 442, 92
- Maturi et al. (2023) Maturi M., et al., 2023, arXiv e-prints, p. arXiv:2307.06412
- Mendes de Oliveira et al. (2019) Mendes de Oliveira C., et al., 2019, MNRAS, 489, 241
- Merten et al. (2015) Merten J., et al., 2015, ApJ, 806, 4
- Miyazaki et al. (2012) Miyazaki S., et al., 2012, in McLean I. S., Ramsay S. K., Takami H., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 8446, Ground-based and Airborne Instrumentation for Astronomy IV. p. 84460Z, doi:10.1117/12.926844
- Molino et al. (2020) Molino A., et al., 2020, MNRAS, 499, 3884
- Olave-Rojas et al. (2018) Olave-Rojas D., Cerulo P., Demarco R., Jaffé Y. L., Mercurio A., Rosati P., Balestra I., Nonino M., 2018, MNRAS, 479, 2328
- Pacaud et al. (2016) Pacaud F., et al., 2016, A&A, 592, A2
- Pereira et al. (2018) Pereira M. E. S., et al., 2018, MNRAS, 474, 1361
- Planck Collaboration et al. (2014) Planck Collaboration et al., 2014, A&A, 571, A16
- Planck Collaboration et al. (2016) Planck Collaboration et al., 2016, A&A, 594, A24
- Pratt & Bregman (2020) Pratt C. T., Bregman J. N., 2020, ApJ, 890, 156
- Puddu et al. (2021) Puddu E., et al., 2021, A&A, 645, A9
- Reiprich & Böhringer (2002) Reiprich T. H., Böhringer H., 2002, ApJ, 567, 716
- Rykoff et al. (2014) Rykoff E. S., et al., 2014, ApJ, 785, 104
- Sartoris et al. (2016) Sartoris B., et al., 2016, MNRAS, 459, 1764
- Satopaa et al. (2011) Satopaa V., Albrecht J., Irwin D., Raghavan B., 2011, in 2011 31st International Conference on Distributed Computing Systems Workshops. pp 166–171, doi:10.1109/ICDCSW.2011.20
- Sereno et al. (2019) Sereno M., Ettori S., Eckert D., Giles P., Maughan B. J., Pacaud F., Pierre M., Valageas P., 2019, A&A, 632, A54
- Springel et al. (2005) Springel V., et al., 2005, Nature, 435, 629
- The Dark Energy Survey Collaboration (2005) The Dark Energy Survey Collaboration 2005, arXiv e-prints, pp astro–ph/0510346
- Vikhlinin et al. (2009) Vikhlinin A., et al., 2009, ApJ, 692, 1060
- Vitorelli et al. (2018) Vitorelli A. Z., Cypriano E. S., Makler M., Pereira M. E. S., Erben T., Moraes B., 2018, MNRAS, 474, 866
- Voit (2005) Voit G. M., 2005, Rev. Mod. Phys., 77, 207
- Weinberg et al. (2013) Weinberg D. H., Mortonson M. J., Eisenstein D. J., Hirata C., Riess A. G., Rozo E., 2013, Phys. Rep., 530, 87
- Werner et al. (2022) Werner S. V., et al., 2022, arXiv e-prints, p. arXiv:2211.04243
- Willmer (2018) Willmer C. N. A., 2018, The Astrophysical Journal Supplement Series, 236, 47
- Wright et al. (2010) Wright E. L., et al., 2010, AJ, 140, 1868
- de Jong et al. (2013) de Jong J. T. A., Verdoes Kleijn G. A., Kuijken K. H., Valentijn E. A., 2013, Experimental Astronomy, 35, 25