
Neural Simulation-Based Inference of the Neutron Star Equation of State directly from Telescope Spectra

Len Brandes [email protected] Technical University of Munich, TUM School of Natural Sciences, Physics Department, 85747 Garching, Germany    Chirag Modi Center for Computational Astrophysics, Flatiron Institute, New York, NY, 11226, USA Center for Computational Mathematics, Flatiron Institute, New York, NY, 11226, USA    Aishik Ghosh Department of Physics and Astronomy, University of California, Irvine, CA 92697, USA Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA    Delaney Farrell Department of Physics, San Diego State University, San Diego, CA 92115, United States    Lee Lindblom Department of Physics, University of California at San Diego, La Jolla, CA 92093, United States    Lukas Heinrich Technical University of Munich, TUM School of Natural Sciences, Physics Department, 85747 Garching, Germany    Andrew W. Steiner Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee 37996, USA Physics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA    Fridolin Weber Department of Physics, San Diego State University, San Diego, CA 92115, United States Department of Physics, University of California at San Diego, La Jolla, CA 92093, United States    Daniel Whiteson Department of Physics and Astronomy, University of California, Irvine, CA 92697, USA
Abstract

Neutron stars provide a unique opportunity to study strongly interacting matter under extreme density conditions. The intricacies of matter inside neutron stars and their equation of state are not directly visible, but determine bulk properties, such as mass and radius, which affect the star’s thermal X-ray emissions. However, the telescope spectra of these emissions are also affected by the stellar distance, hydrogen column, and effective surface temperature, which are not always well-constrained. Uncertainties on these nuisance parameters must be accounted for when making a robust estimation of the equation of state. In this study, we develop a novel methodology that, for the first time, can infer the full posterior distribution of both the equation of state and nuisance parameters directly from telescope observations. This method relies on the use of neural likelihood estimation, in which normalizing flows use samples of simulated telescope data to learn the likelihood of the neutron star spectra as a function of these parameters, coupled with Hamiltonian Monte Carlo methods to efficiently sample from the corresponding posterior distribution. Our approach surpasses the accuracy of previous methods, improves the interpretability of the results by providing access to the full posterior distribution, and naturally scales to a growing number of neutron star observations expected in the coming years.

I Introduction

The study of neutron stars provides a window into the fundamental nature of matter under extreme conditions, a long-standing area of research in nuclear and astrophysics [1]. The core of a neutron star contains matter at densities that far surpass those encountered in terrestrial experiments. Under such conditions, matter could exist in various exotic states [2], such as baryons in the form of hyperons and Δ isobars [3, 4, 5, 6, 7], deconfined quarks [8, 9, 10, 11], color superconducting phases [12, 13, 14], quarkyonic matter [15, 16], or possibly meson condensates [17, 18, 19, 20, 21]. Understanding the physics of matter under the extreme conditions within a neutron star hinges crucially on unveiling the equation of state (EoS), which represents its internal composition through the intricate relationship between pressure and energy density [22]. Knowledge of the EoS is also vital in determining the macroscopic properties of neutron stars, such as their mass and radius, via the relativistic stellar structure equations [23, 24].

Observations of neutron stars have been the primary source for advancing our knowledge of the EoS of superdense matter. In the last two decades, there has been an increasing amount of high-quality data from X-ray and radio emissions, and gravitational waves [25, 26, 27, 28, 29, 30, 31, 32, 33, 32, 34, 35]. In particular, spectroscopic measurements from the thermal surface emission of low-mass X-ray binaries in quiescence have provided constraints on the neutron star mass-radius relation. The constraints on the mass-radius relation from observations can be translated into constraints on the EoS of neutron star matter through the use of Bayesian inference techniques [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50], with a recent interest in complementary methods based on machine learning (ML)  [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]. Beyond the star’s mass and radius, emitted stellar spectra are also affected by physical quantities like the stellar distance, hydrogen column, and effective surface temperature. Uncertainties on these so-called nuisance parameters must be accounted for when analyzing stellar spectra and making a robust estimation of the equation of state. Traditionally, inference is done in two steps, first by extracting the mass and radius of a neutron star from an observed telescope spectrum, followed by EoS inference from a set of stellar masses and radii [73, 74]. This two-step procedure often requires additional assumptions, though recently it has been demonstrated that a single-step inference is possible while fully propagating uncertainties [71, 72]. While these advancements allow access to the posterior of the EoS parameters by marginalizing nuisance parameters, the computational expense involved has hindered access to the full posterior.

This paper introduces a novel method that provides access to the full posterior distribution of the equation of state and the nuisance parameters directly from telescope spectra. We employ a recently developed simulation-based inference technique known as neural likelihood estimation (NLE) [75] in which normalizing flows [76, 77] use samples of simulated X-ray telescope spectra to learn the likelihood of an observation as a function of the EoS and nuisance parameters. The direct inference of the neutron star EoS from telescope spectra via neural likelihood estimation in contrast to the traditional two-step inference is shown schematically in Fig.  1.

Figure 1: Traditional inference of EoS parameters (left) from telescope spectra (right) is done by first inferring intermediate mass-radius constraints (green arrows), involving additional implicit assumptions. In contrast, neural likelihood estimation allows for inference of the EoS directly from telescope spectra (orange arrow), robustly accounting for uncertainties.

Once the normalizing flows are trained, likelihood evaluation is computationally inexpensive, allowing for the calculation of the posterior across the high-dimensional parameter space for inference, marginalization, profiling, or visualization. Since neural networks are, by design, differentiable functions, we can efficiently sample the posterior using state-of-the-art Markov Chain Monte Carlo (MCMC) methods, such as Hamiltonian Monte Carlo (HMC). On simulated test spectra, this method outperforms previous approaches in terms of accuracy and precision of EoS estimation, while improving the interpretability of results with available posterior nuisance parameters. This approach also naturally scales to a growing number of neutron stars, as it does not require retraining to apply to larger datasets.

This paper is organized as follows: Section II reviews the connection between the equation of state of neutron stars and their observed X-ray spectra. This section also outlines the simulation process employed for generating samples of simulated telescope spectra. Sec. III summarizes related previous work to infer the neutron star EoS. Sec. IV then describes our new simulation-based neural likelihood estimation (NLE) approach developed in this paper. The performance of this method is quantitatively compared to previous approaches in Sec. V, with a qualitative discussion on its merits in Sec. VI. Finally, we present our conclusions in Sec.  VII.

II Simulated neutron stars

Samples of simulated neutron stars with varying values of EoS and nuisance parameters are prepared for training the normalizing flows and evaluation of their performance. Samples from Refs. [71, 72] are used to facilitate direct comparison to previous methods.

II.1 Equation of state

Matter inside neutron stars can be described in terms of the EoS, which provides the thermodynamic relationship between pressure P and energy density ε within the star. To generate samples for the EoS, we begin with the relativistic mean field model GM1L [78] (see Appendix A for more details), represented by a second-order spectral expansion as described in Refs. [79, 80]. Thus the full EoS can be described succinctly by the expansion coefficients, which we refer to as λ1 and λ2. We draw EoS samples by uniformly sampling the coefficients in the intervals λ1 ∈ [4.75, 5.25] and λ2 ∈ [-2.05, -1.85].

The underlying physics captured in the star’s EoS directly influences bulk properties like mass and radius through the relativistic stellar structure equations. We determine mass-radius relations for each EoS using the Tolman-Oppenheimer-Volkoff (TOV) equations, which determine the structure of non-rotating, spherically symmetric neutron stars in equilibrium [23, 24]. Using the EoS variations described above, the TOV equations are numerically solved to produce mass-radius relations, which are sampled to generate the mass M and radius R for ∼100 stars per EoS sample. For the sampling of stars we choose a log-uniform distribution of central enthalpies [81] for the boundary condition to solve the stellar structure equations. This leads to a higher weighting of larger masses close to the maximum supported mass in the distribution of stars for each EoS.
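To make this step concrete, the sketch below integrates the TOV equations with scipy for a toy polytropic EoS. The polytrope, its constants, and the central-pressure sweep are placeholders for illustration only, not the GM1L model, its spectral representation, or the enthalpy-based sampling used in this work.

import numpy as np
from scipy.integrate import solve_ivp

# Geometrized units with G = c = M_sun = 1; lengths are then in units of
# G*M_sun/c^2 ~ 1.4766 km.
KM_PER_UNIT = 1.4766

# Placeholder polytrope P = K * eps**GAMMA (NOT the GM1L / spectral EoS of the paper).
K, GAMMA = 100.0, 2.0

def eps_of_P(P):
    return (max(P, 0.0) / K) ** (1.0 / GAMMA)

def tov_rhs(r, y):
    # y = (pressure P, enclosed gravitational mass m)
    P, m = y
    eps = eps_of_P(P)
    dPdr = -(eps + P) * (m + 4.0 * np.pi * r**3 * P) / (r * (r - 2.0 * m))
    dmdr = 4.0 * np.pi * r**2 * eps
    return [dPdr, dmdr]

def solve_star(P_central, P_surface=1e-12):
    # Integrate outward from a tiny radius until the pressure drops to the surface value.
    surface = lambda r, y: y[0] - P_surface
    surface.terminal, surface.direction = True, -1
    sol = solve_ivp(tov_rhs, (1e-6, 100.0), [P_central, 0.0],
                    events=surface, rtol=1e-8, atol=1e-12)
    if len(sol.t_events[0]) == 0:
        return np.nan, np.nan
    R_km = sol.t_events[0][0] * KM_PER_UNIT     # radius in km
    M_sun = sol.y_events[0][0][1]               # mass in solar masses
    return M_sun, R_km

# Trace out a toy mass-radius relation by sweeping the central pressure
# (the paper instead samples central enthalpies log-uniformly).
for Pc in np.geomspace(1e-5, 1e-3, 5):
    M, R = solve_star(Pc)
    print(f"P_c = {Pc:.1e} -> M = {M:.2f} M_sun, R = {R:.1f} km")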

II.2 Modeling X-ray spectra with xspec

Traditional statistical methods infer macroscopic neutron star properties from the emitted X-ray spectra of quiescent low-mass X-ray binaries (LMXBs) by fitting the observed spectrum to well-motivated theoretical models [82, 83, 84]. The open-source software xspec contains many such models, is widely regarded as the state-of-the-art tool for spectral fitting [85], and can additionally be used to generate simulated spectra. The spectra used in this study were generated using the NSATMOS model in xspec, a hydrogen atmosphere model with electron conduction and self-irradiation [84]. Beyond stellar mass and radius, this model also depends on three additional nuisance parameters, described in the next section. The simulated spectra are subjected to the Chandra telescope response function corresponding to the instrument ACIS-S [84, 86] and to Poisson noise corresponding to an observation time of 100 ks. Since the simulated spectra do not yield detectable photons at high energies, we follow previous works [71, 72] and use only the first 250 bins of the telescope spectra.
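The actual spectra are produced by xspec (NSATMOS folded through the Chandra ACIS-S response); the snippet below only illustrates the final step of adding Poisson noise for a 100 ks exposure, with a made-up, already-folded count-rate model standing in for the xspec output.

import numpy as np

rng = np.random.default_rng(0)

n_bins = 250          # only the first 250 spectral bins carry signal (see text)
exposure_s = 100e3    # 100 ks observation time

# Placeholder for the noiseless model prediction: expected source count rate per
# bin (counts/s) AFTER folding through the telescope response.  Here we just use
# an arbitrary soft, thermal-looking shape for illustration.
energies = np.linspace(0.1, 3.0, n_bins)                     # keV, illustrative grid
model_rate = 1e-2 * energies**2 * np.exp(-energies / 0.3)    # counts/s per bin

# Poisson noise corresponding to the finite observation time.
observed_counts = rng.poisson(model_rate * exposure_s)
print(observed_counts[:10])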

II.3 Nuisance parameters

Each simulated X-ray spectrum from the NSATMOS model depends on the stellar mass, radius, and three additional nuisance parameters: the effective temperature of the surface, T_eff, the distance to the star, d, and the hydrogen column, N_H, which parameterizes the reddening of the spectrum by the interstellar medium. Values for the nuisance parameters are sampled from ranges with distributions motivated by observation as described in Refs. [26, 25]; details of each range are given in Tab. 1.

Table 1: Distributions and ranges of the equation of state parameters (λ1 and λ2) and nuisance parameters (N_H, d, and log(T_eff)) used to generate the samples of simulated neutron stars. See text for parameter definitions.
Parameter Distribution Range
EoS λ1 uniform [4.75, 5.25]
λ2 uniform [-2.05, -1.85]
nuisance d uniform [2.3, 12.3] kpc
N_H log uniform [0.01, 3.16] × 10^21 cm^-2
T_eff exponential [1, 2] × 10^6 (corresponding to a uniform distribution of log(T_eff) in the range [6, 6.3])

The values of nuisance parameters can be informed by independent observations, which can vary significantly from star to star and provide prior information. We consider three scenarios for nuisance parameter priors, referred to as ‘true’, ‘tight’, and ‘loose’ [71, 72]. In the ‘true’ scenario, the nuisance parameters are known exactly, while the ‘tight’ and ‘loose’ scenarios have narrow or wide Gaussian priors, respectively; see Tab.  2 for a description of the prior widths for the three scenarios.

Table 2: Prior distributions on nuisance parameters under three scenarios, ‘true’, ‘tight’, and ‘loose’. Shown are the widths of the Gaussian priors.
Parameter true tight loose
d exact 5% 20%
N_H exact 30% 50%
log(T_eff) exact ±0.1 ±0.2
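As a concrete example of how this prior information enters the analysis, the following sketch builds the Gaussian priors of Tab. 2 around fiducial nuisance values for a single star (in the 'true' scenario the parameters would simply be fixed); the function name and fiducial numbers are illustrative.

import numpy as np
from scipy import stats

def nuisance_priors(d_kpc, NH_1e21, logTeff, scenario="tight"):
    """Gaussian priors of Tab. 2 for one star: fractional widths for d and N_H,
    absolute widths for log(T_eff)."""
    widths = {"tight": (0.05, 0.30, 0.1), "loose": (0.20, 0.50, 0.2)}
    f_d, f_NH, dlogT = widths[scenario]
    return {
        "d":       stats.norm(loc=d_kpc,   scale=f_d  * d_kpc),
        "N_H":     stats.norm(loc=NH_1e21, scale=f_NH * NH_1e21),
        "logTeff": stats.norm(loc=logTeff, scale=dlogT),
    }

# Illustrative fiducial values for one star.
priors = nuisance_priors(d_kpc=5.0, NH_1e21=1.0, logTeff=6.1, scenario="loose")
print(priors["d"].logpdf(5.5))   # log prior density of a candidate distance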

III Previous work

The likelihood, p(s|λ), of the spectra s given the EoS parameters λ is not analytically known. Many traditional methods that infer the EoS from telescope spectra approximate the unknown likelihood in a two-step method [73, 74], first inferring posterior distributions p(M,R|s) for the stellar mass and radius from an observed spectrum. Uncertainties on the nuisance parameters can produce non-trivial [71], occasionally multimodal [26] contours in the posterior probabilities of stellar mass and radius. Subsequently, these posterior distributions, approximated with a Kernel Density Estimator (KDE, i.e., a sum of Gaussian kernels placed at each posterior sample with a standard deviation equal to a bandwidth parameter h [87]), are used as likelihoods in a second step to infer the EoS parameters [37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]. This is valid only if the (M,R) priors used in the inference are sufficiently flat [73, 88].

Some previous machine learning (ML) approaches infer the EoS by focusing only on the second of the two steps, starting directly from the posterior probabilities of mass and radius. In Refs. [51, 52, 53], neural networks are used to perform regression of EoS parameters from a fixed number of stars described by their masses and radii, where the posterior probabilities are simplified as uncorrelated Gaussians. These neural networks need to be retrained when the number of measurements increases. Several other machine learning methods follow a similar approach with varying architectures [54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69]. The mass and radius standard deviations can be estimated directly from xspec by varying the nuisance parameters [71]; a neural network approach that regresses EoS parameters from the resulting mass, radius, and uncertainties is referred to below as ‘NN(M,R via xspec)’.

However, deep neural networks are capable of analyzing high-dimensional inputs, allowing instead for regression of EoS parameters directly from a set of stellar spectra [71] and effectively removing the intermediate step in the two-step approach. This method, referred to below as ‘NN(Spectra)’, uses an uncertainty-aware and permutation-invariant neural network and captures the complex correlations between uncertainties, but only produces a point estimate rather than the full posterior as a function of the EoS and nuisance parameters. Calculation of the full likelihood is intractable, but Ref. [72] showed that two neural networks can be used to replace the unavailable elements, granting access to the likelihood of the expected spectra given stellar mass, radius, and nuisance parameters, p(s|M,R,ν). While this method, referred to as ‘ML-Likelihood_EOS’, succeeds in obtaining a posterior in EoS, it is computationally expensive to run. This makes it a prohibitively expensive method if access to the nuisance parameter posteriors is also desired. These three approaches are used for comparison in this work.

IV Bayesian inference with neural likelihood estimation

We introduce a single-step approach, which uses neural likelihood estimation to directly learn the likelihood of telescope spectra (s) as a function of the EoS parameters (λ) and the nuisance parameters (ν) for a star, p(s|λ,ν). This extends the strategy in Ref. [72], which learned p(s|M,R,ν), but avoids needing to integrate over the M-R plane to achieve a connection to the EoS parameters, saving significant computational complexity. In addition, we apply a Hamiltonian Monte Carlo (HMC) method to efficiently draw samples from the posterior distribution, p(λ,ν|s).

IV.1 Neural likelihood estimation

Neural likelihood estimation (NLE) [75] is a type of simulation-based inference [89, 90] technique, successfully used for inference with gravitational waves [91, 92, 93], particle physics [94, 95, 96] and cosmology [97, 98, 99, 100, 101] when the likelihood of the observed data is not analytically available but must be estimated from samples of simulated data.

In this approach a neural density estimator (q_Φ, with parameters Φ) approximates the likelihood:

q_{\Phi}(s|\lambda,\nu) \approx p(s|\lambda,\nu) .    (1)

Normalizing flows (NF), powerful neural networks capable of modeling complicated probability distributions even in high dimensions [77], are used as the density estimator. A brief introduction to NFs is provided in Appendix B. Once trained, NFs can easily generate new samples as well as estimate the likelihood of a given sample. Specifically, a Masked Autoregressive Flow (MAF) [102] is used for the density estimator q_Φ. To fit the parameters Φ such that the learned distribution q_Φ approximates the target likelihood distribution p, we minimize the Kullback-Leibler divergence (D_KL) between the two distributions, which is equivalent to maximizing the approximate log-probability of the samples generated from the likelihood distribution:

\arg\min_{\Phi} D_{\text{KL}}\left(p(s|\lambda,\nu)\,\big|\big|\,q_{\Phi}(s|\lambda,\nu)\right) = \arg\max_{\Phi} \sum_{s_{i}\sim p(s|\lambda_{i},\nu_{i})} \log q_{\Phi}(s_{i}|\lambda_{i},\nu_{i}) .    (2)

A derivation of this equivalence is given in Appendix B. With samples s_i drawn from the likelihood via simulation, as described above,

s_{i} \sim p(s|\lambda_{i},\nu_{i}) ,    (3)

the sbi package [103] is used to train the MAF as a neural likelihood estimator. To make our analysis robust to stochasticities in training the neural network, we perform a hyperparameter search by training 100 different MAFs with different architectures. After training, we use an ensemble average [104] over the N = 5 best-performing density estimators such that the log-likelihood of any given telescope spectrum s_0 is

\log p(s_{0}|\lambda,\nu) \approx \frac{1}{N}\sum_{j} \log q_{\Phi_{j}}(s_{0}|\lambda,\nu) .    (4)
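A minimal sketch of the ensemble average in Eq. (4): it assumes each trained flow exposes a conditional log_prob(inputs, context) method, as MAF implementations in packages like sbi/nflows typically do; the details of the training itself are omitted here.

import torch

def ensemble_log_likelihood(flows, spectrum, theta):
    """Ensemble-averaged log-likelihood of Eq. (4).

    flows    : list of the N best-performing trained conditional flows, each
               providing .log_prob(inputs, context)
    spectrum : tensor of shape (1, n_bins), the observed spectrum s_0
    theta    : tensor of shape (1, n_params), the concatenated (lambda, nu)
    """
    logps = torch.stack([flow.log_prob(spectrum, context=theta) for flow in flows])
    return logps.mean(dim=0)   # average over the ensemble members j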

IV.2 Posterior sampling with Hamiltonian Monte Carlo

The posterior distribution can be built from the estimated likelihood and prior distributions:

p(\lambda,\nu|s) \propto p(s|\lambda,\nu)\, p(\lambda)\, p(\nu) .    (5)

The prior distribution on the nuisance parameters, p(ν), is given in Tab. 2, while the prior on the EoS parameters, p(λ), is taken to be uniform in the intervals λ1 ∈ [4.75, 5.25] and λ2 ∈ [-2.05, -1.85], the same as the distribution of training samples described in Tab. 1.

To draw samples from the posterior distribution, sampling algorithms like Markov Chain Monte Carlo (MCMC) or nested sampling can be employed. Given that the normalizing flows, priors and, consequently, the full posterior distribution are differentiable, we draw samples from the posterior using Hamiltonian Monte Carlo (HMC) [105, 106] sampling, which can use the gradient information and scales much more efficiently to high dimensional parameter spaces. For a brief introduction to HMC and our implementation details, see Appendix  C.

Unlike standard approaches, our methodology allows for the simultaneous inference of EoS parameters λ\lambda and nuisance parameters ν\nu, which minimizes assumptions made about these parameters and therefore makes the approach more robust. Additionally, any supplemental information on these parameters coming from other observations can be naturally included in the analysis through the prior distribution p(ν)p(\nu) without retraining the neural density estimators.
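The following sketch assembles the unnormalized log-posterior of Eq. (5) for a single star as a differentiable torch function, reusing the ensemble likelihood sketched above; the parameter layout, the prior objects, and the hand-off to a gradient-based sampler are illustrative assumptions rather than our exact implementation.

import torch

# Uniform EoS prior box, matching the training ranges in Tab. 1.
LAM_LO = torch.tensor([4.75, -2.05])
LAM_HI = torch.tensor([5.25, -1.85])

def log_posterior(theta, spectrum, flows, nuisance_priors):
    """Unnormalized log p(lambda, nu | s) = log p(s|lambda,nu) + log p(lambda) + log p(nu).

    theta           : tensor (5,) = (lambda_1, lambda_2, N_H, d, log T_eff)
    nuisance_priors : list of three torch.distributions objects for (N_H, d, log T_eff)
    """
    lam, nu = theta[:2], theta[2:]
    if torch.any(lam < LAM_LO) or torch.any(lam > LAM_HI):
        return torch.tensor(float("-inf"))          # outside the uniform prior box
    log_prior = sum(p.log_prob(v) for p, v in zip(nuisance_priors, nu))
    log_like = ensemble_log_likelihood(flows, spectrum.unsqueeze(0),
                                       theta.unsqueeze(0)).squeeze()
    return log_like + log_prior

# Because log_posterior is differentiable in theta, -log_posterior can serve as
# the potential energy handed to a gradient-based HMC/NUTS sampler.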

IV.3 Scaling to multiple observations

The estimation of the per-star likelihood as a function of the EoS parameters, p(s|λ,ν), makes the calculation of the joint likelihood for J stars straightforward:

p(s_{1...J}|\lambda,\nu_{1...J}) = \prod_{j} p(s_{j}|\lambda,\nu_{j}) .    (6)

Each star has a specific set of nuisance parameters, ν_j, such that the posterior is

p(\lambda,\nu_{1...J}|s_{1...J}) \propto \left(\prod_{j}\, p(s_{j}|\lambda,\nu_{j})\, p(\nu_{j})\right) p(\lambda) .    (7)

Scaling to multiple observations is straightforward for several reasons. Estimating the likelihood in terms of the EoS parameters rather than M and R means that no additional integration over the M-R plane is required. In addition, the likelihood itself is estimated, rather than posteriors, which cannot be trivially combined [107, 108, 109, 110]. Learning the likelihood conditioned on the nuisance parameters for each star, rather than implicitly marginalizing over them, allows the joint likelihood to be simply a product of the individual stellar likelihoods.

A consequence of these choices is that we infer not only the equation of state parameters but also the nuisance parameters corresponding to every star. The stellar nuisance parameter priors are independent and can encode prior information for each star separately, for maximum flexibility and robustness. While this increases the dimensionality of the problem, the availability of gradients of the posterior distribution enables the use of powerful algorithms like HMC, ensuring that the inference remains computationally tractable.
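A sketch of how the per-star terms combine into the joint posterior of Eq. (7); the layout of a shared λ plus one (N_H, d, log T_eff) triplet per star follows the description above, while the function and variable names are placeholders.

import torch

def joint_log_posterior(lam, nus, spectra, flows, priors_per_star, log_prior_lam):
    """Joint unnormalized log posterior for J stars, Eq. (7).

    lam             : tensor (2,), shared EoS parameters lambda
    nus             : tensor (J, 3), nuisance parameters nu_j of each star
    spectra         : tensor (J, n_bins), one observed spectrum s_j per star
    priors_per_star : list of J lists of torch.distributions for (N_H, d, log T_eff)
    log_prior_lam   : callable returning log p(lambda)
    """
    total = log_prior_lam(lam)
    for nu_j, s_j, priors_j in zip(nus, spectra, priors_per_star):
        theta_j = torch.cat([lam, nu_j]).unsqueeze(0)
        total = total + ensemble_log_likelihood(flows, s_j.unsqueeze(0), theta_j).squeeze()
        total = total + sum(p.log_prob(v) for p, v in zip(priors_j, nu_j))
    return total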

V Results

Inference of the full posterior distribution of neutron star EoS and nuisance parameters directly from telescope spectra is now feasible. Following Refs. [71, 72], we apply the method to a set of ten simulated neutron stars for a given point in EoS space, each with the three independent nuisance parameters defined above. The complete parameter space comprises two parameters of interest and thirty nuisance parameters. Results are first shown for a single representative EoS point; we then demonstrate the scaling of the NLE approach to more observations and finally present a comparison with previous approaches by evaluating the average performance over 100 randomly sampled test points in EoS space.

V.1 Example posterior distribution

An example of the posterior distributions marginalized to one and two dimensions is shown in the corner plot in Fig. 2. The figure depicts both EoS parameters, λ1 and λ2, and the nuisance parameters for the first neutron star, N_H^(1), d^(1) and log(T_eff)^(1). While the marginalized posteriors of the nuisance parameters for the other nine stars are also available, they are not shown here. In all three nuisance parameter scenarios from Tab. 2, the EoS parameters are strongly correlated, similar to the log-likelihoods computed in Ref. [72]. The marginal posterior distribution of λ1 is relatively tight compared to its prior range, while the second parameter λ2 is less well constrained relative to its prior range.

Figure 2: Corner plot depicting the posterior distribution of the parameters λ1 and λ2 of one example EoS as well as the first 3 (of 30) nuisance parameters N_H^(1), d^(1) and log(T_eff)^(1). The posterior is computed based on the simulated spectra of 10 stars with the nuisance parameters known exactly in the true scenario (green), and known with the uncertainties in Tab. 2 in the tight (orange) and loose (blue) scenarios. The ground-truth parameter values are depicted as black crosses/lines. The marginal posterior distributions of the nuisance parameters are compared to the respective priors (dotted) of the tight and loose scenarios.

As expected, in the true scenario where the nuisance parameters are exactly known, the marginal posterior distributions are sharply centered around the ground-truth values. In the tight scenario, the uncertainty in the nuisance parameter distributions leads to wider distributions for the EoS parameters. This is further pronounced for the loose case, where less prior information on the nuisance parameters is available. Fig. 2 illustrates that the hydrogen column N_H as well as the logarithm of the effective surface temperature log(T_eff) can be significantly constrained from the spectrum data compared to their prior ranges. In the tight scenario, the marginal posterior for the distance d is almost indistinguishable from the prior, indicating that the telescope spectra do not contribute any additional information for this parameter beyond the tight prior. However, in the loose case, the marginal posterior distribution of d becomes tighter than the loose prior, which implies that we can indeed extract information about the distance of a neutron star from its X-ray spectrum.

We can transform the posterior distribution for the EoS parameters λ1 and λ2 into 95% (highest density) posterior credible bands for the pressure as a function of energy density, as depicted in Fig. 3. As before, the constraints are much tighter in the true scenario and become increasingly broader in the tight and loose cases. By solving the TOV equations, we can translate the EoS into mass-radius constraints. The 95% posterior credible bands for the radius as a function of mass are also depicted in Fig. 3. The credible bands terminate at the 95% upper limits of the maximum mass. Focusing only on the tight case in Fig. 4, the inferred medians for P(ε) and R(M) agree closely with the ground-truth values. Note that for this particular example, the mass of one of the simulated stars is very close to the respective maximum supported mass, such that a good reconstruction even of the high-density part of this particular EoS is possible. In other cases, the reconstruction might be worse.

Figure 3: Posterior 95% (highest density) credible bands for the pressure as a function of energy density and the radius as a function of mass for the three (true, tight, loose) scenarios. Similar to Fig.  2, the posterior is derived based on the simulated spectra of 10 stars. The ground-truth value for the equation of state and the corresponding mass-radius relation is depicted as a dashed black line. Black dots indicate ground-truth mass-radius values of the 10 simulated stars.
Figure 4: Similar to Fig. 3, but showing, for the tight scenario only, the median together with the 68% and 95% posterior credible bands for the pressure as a function of energy density and the radius as a function of mass. The ground-truth value for the equation of state and the corresponding mass-radius relation is depicted as a dashed black line. Black dots indicate the mass-radius values of the 10 simulated stars.

V.2 Increasing the number of observations

With an anticipated surge in the number of available neutron star observations in the upcoming years, the inference method must be able to scale to a large set of data. In our approach, normalizing flows approximate the likelihood p(s|λ,ν) per observed star. These likelihoods are then combined to obtain the total likelihood for a set of neutron stars, see Eq. (6). Consequently, the method does not need to assume a fixed number of observed neutron stars, nor a particular ordering of the stars, and works out-of-the-box for a growing set of observed neutron stars without the need for retraining any networks. To demonstrate this, we present the marginal posterior distributions for one example EoS in Fig. 5, for 5, 10, and 20 observed neutron stars, and with loose uncertainties on the nuisance parameters.

Figure 5: Corner plot depicting the posterior distribution of the parameters λ1 and λ2 of one example EoS. The posterior is computed based on the simulated spectra of 5 (olive), 10 (blue), or 20 (purple) stars with the nuisance parameters known with the uncertainties in Tab. 2 of the loose scenario. The ground-truth parameter values are depicted as black crosses/lines.

The figure illustrates that the increase of available spectra significantly refines the inference of the EoS parameters. Notably, the transition from 5 to 10 observed spectra has a substantial impact on the posterior constraints, reducing the standard deviation of λ1 by 23% from 0.061 to 0.047, while further increasing the number of measurements to 20 shows a comparatively smaller reduction in the standard deviation by only 6.4% to 0.044 for the given example. For λ2 the increase in accuracy from 10 to 20 observations is even smaller. It is worth noting that in the numerical implementation of Hamiltonian Monte Carlo for posterior sampling, the computation time is predominantly consumed by the evaluation of the likelihood and its gradient. While the availability of more observations increases the per-iteration computational time in sampling the posterior, it also speeds up the convergence of the algorithm. Furthermore, HMC algorithms can easily be scaled to thousands of dimensions, hence we do not anticipate the dimensionality to be a limiting factor in the scaling of our approach.

V.3 Average performance on test set

After discussing one example EoS, we now turn to the average performance of NLE on a test set of simulated data from 100 different equations of state. To compare the average performance to the previous ML approaches that infer the neutron star EoS directly from telescope spectra, described in Sec. III, we use the same accuracy measure as Refs. [71, 72]. For each EoS in the test set, we simulate 10 spectra with random nuisance parameters. Based on the spectra and the prior nuisance parameter information, we then sample the posterior using the methodology outlined in Sec. IV. From the marginal posterior distributions, similar to the example in Fig. 2, we determine the maximum-a-posteriori (MAP) estimates (of the marginals, not of the full posterior) for the two EoS parameters (λ1,pred, λ2,pred) and compare them to the ground-truth values (λ1,truth, λ2,truth). The distributions of the differences between the marginal MAP estimates and the ground-truth values, (λ1,pred − λ1,truth, λ2,pred − λ2,truth), on the test data set are depicted in Fig. 6. As before, the equation of state parameters can be maximally constrained in the true nuisance parameter scenario. In all three scenarios, the distributions are centered around zero, indicating that there is no systematic bias in the posterior prediction. Compared to the previous ML analyses in Refs. [71, 72], the distributions of λ1 are much narrower in the tight and loose scenarios (see for example Fig. 8 in [72]).

Figure 6: Distributions of the differences between the predicted maximum-a-posteriori estimates and the ground-truth values for the two equation of state parameters λ1 and λ2. In the true scenario, the nuisance parameters are fixed to their exact values; in the tight and loose cases, they are drawn from the narrow or wide priors in Tab. 2.

To quantitatively compare these distributions to the previous ML analyses, we compute the mean μ and standard deviation σ of the distributions of differences in Fig. 6. We can combine both standard deviations into

\sigma_{\mathrm{tot}} = \sqrt{\sigma_{\lambda_{1}}^{2} + \sigma_{\lambda_{2}}^{2}} .    (8)

The resulting values are listed in Tab. 3 and illustrated in Fig. 7. The mean μ of the difference between the posterior prediction of λ1 and its ground-truth value is much smaller in the NLE approach than in previous approaches for all three prior scenarios considered in Tab. 2. In the true case, the NLE approach performs better than ML-Likelihood_EOS from Ref. [72] and NN(Spectra) from Ref. [71] (see Sec. III for more details about these methods). For realistic scenarios of nuisance parameters (loose and tight), NLE outperforms all other methods as quantified by σ_tot, while for the true scenario, it outperforms all methods besides NN(M,R via xspec), which uses xspec itself for part of the inference and involves simplifying assumptions about the mass-radius uncertainties.

Table 3: Average accuracy for the prediction of neutron star EoS parameters λ1 and λ2. Shown are the means (μ) and standard deviations (σ) of the distributions in Fig. 6, i.e., of the differences between the predicted maximum-a-posteriori and ground-truth values. Both standard deviations are combined to σ_tot according to Eq. (8). The neural likelihood estimation (NLE) approach is compared to three previous approaches: neural networks that regress the EoS parameters from the spectra (NN(Spectra)) and from M, R estimates by xspec (NN(M,R via xspec)), both from Ref. [71], as well as an approach using an approximate likelihood that incorporates two neural networks, ML-Likelihood_EOS, from Ref. [72]. In the true scenario, the nuisance parameters are fixed to their exact values; in the tight and loose cases, they are drawn from the narrow or wide priors in Tab. 2.
λ1,pred − λ1,truth   λ2,pred − λ2,truth   Combined
p(ν) Method   μ   σ   μ   σ   σ_tot
true ML-Likelihood_EOS -0.02 0.066 0.01 0.070 0.096
NN(Spectra) -0.02 0.066 0.01 0.075 0.099
NN(M,R via xspec) -0.03 0.065 0.01 0.055 0.085
NLE 0.00 0.056 -0.01 0.070 0.090
tight ML-Likelihood_EOS -0.02 0.078 0.03 0.081 0.112
NN(Spectra) 0.02 0.085 -0.02 0.077 0.115
NN(M,R via xspec) -0.03 0.081 0.01 0.056 0.098
NLE 0.00 0.066 -0.02 0.071 0.097
loose ML-Likelihood_EOS -0.04 0.089 0.03 0.081 0.120
NN(Spectra) -0.03 0.131 -0.01 0.078 0.152
NN(M,R via xspec) -0.03 0.123 0.01 0.058 0.136
NLE 0.00 0.085 -0.01 0.074 0.113

Interestingly, from Tab. 3 it becomes clear that the NLE approach is better than all other approaches at constraining the first EoS parameter λ1, while for λ2 the accuracy of the xspec-based two-step approach, with its simplifying assumptions regarding the impact of nuisance parameters, is always much better than that of all other approaches. Note that the central enthalpies used to solve the stellar structure equations are sampled from log-uniform intervals, such that masses near the maximum supported mass are weighted more heavily for each EoS. Consequently, for each EoS in the test set, the largest of the ten randomly selected masses is close to the respective maximum supported mass. This is also true for the previous approaches with which we compare our results.

Figure 7: Mean and standard deviation of the differences between the predicted maximum-a-posteriori values and the ground-truth values for the three different scenarios from Tab. 3.

With our neural likelihood estimation approach, we can now, for the first time, infer the posterior for the nuisance parameters. The means and standard deviations of the differences between the marginal MAP estimates and the ground-truth values on the test data are listed in Tab. 4. While we obtain excellent constraints on the hydrogen column N_H and the effective surface temperature log(T_eff), our ability to constrain the distance d from the spectra is limited. When going from the tight to the loose scenario, the inference on the distance estimates suffers the most, indicating that these constraints are primarily driven by the prior in the tight case, as seen in Fig. 2. However, this is not as limiting as it might seem, because it is much easier to obtain precise constraints on the distance of a neutron star from external measurements than it is for the other nuisance parameters.

Table 4: Similar to Tab. 3, but showing the mean and standard deviation of the differences between the predicted maximum-a-posteriori estimates and the ground-truth values for the three nuisance parameters.
N_H,pred − N_H,truth   d_pred − d_truth   log(T_eff)_pred − log(T_eff)_truth
Method p(ν)   μ   σ   μ   σ   μ   σ
NLE tight 0.00 0.063 -0.06 0.419 0.00 0.034
loose 0.01 0.070 -0.26 1.149 -0.01 0.047

VI Discussion & outlook

In addition to the performance gains seen in the preceding sections, the NLE+HMC approach has several advantages relative to previous work.

  (i)

    The single-step estimation of the likelihood in terms of the EoS avoids collapsing the information into the mass-radius plane as an intermediate step. EoS-dependent quantities like temperature inhomogeneities [111] can impact a star’s spectrum but are not captured by the mass and radius. Uncertainties from nuisance parameters can also be more accurately propagated by using the full high-dimensional data. In addition, this avoids assuming the mass-radius posteriors can be used as a likelihood, making simplifying approximations regarding the nature of the intermediate likelihood, or integrating over the M-R plane.

  (ii)

    Neural likelihood estimation allows for amortization; after training the neural density estimators once, the inclusion of additional observations is straightforward, see Sec.  V.2. In addition, extending to additional stars is inexpensive relative to other methods, which require integrating over estimated mass-radius posteriors to construct likelihoods [39, 50], such as with Kernel Density Estimation techniques.

  (iii)

    Learning the likelihood instead of the posterior allows combination with likelihoods from other data [112], e.g., constraints from low-energy nuclear theory at small densities [113, 114], perturbative QCD at high densities [115, 116], mass measurements from Shapiro time delays [28, 29, 30], mass-radius constraints from analyses of the NICER telescope [31, 32, 33, 32] or gravitational wave signals from binary neutron star mergers [34, 35]. If these likelihoods are computed by traditional methods, they may not be differentiable, which precludes the use of HMC sampling methods; however, it may be worthwhile extending simulation-based inference techniques also to these cases.

  (iv)

    Application of HMC allows for robust exploration of the high-dimensional space of nuisance parameters, which can be of interest in other astrophysical studies.

Improvements to the method could be pursued in several directions. To prevent degradation of performance near the borders of the training data, additional simulated samples could be included beyond the current borders. Another potential improvement may be to additionally condition the inference on the masses M of the neutron stars, approximating p(s|λ,ν,M) rather than p(s|λ,ν), to further enhance the interpretability and accuracy of the method. In this way, we could infer the neutron star masses and, by solving the TOV equations, radii simultaneously with the EoS. Moreover, to combine the information from different observational instruments for the same neutron star, it becomes essential to compute the likelihood as a function of the neutron star mass. Such situations arise regularly, see e.g. [26, 117, 118, 119, 120]. Preliminary results show, however, that the mass distributions can become multimodal, as also seen in [26], which makes sampling the posterior difficult. Future work may extend our method to effectively include the mass as a nuisance parameter. In addition, one could explore alternative parameterizations of the EoS which can describe more complicated phase structures possibly realized inside neutron stars, like crossovers or first-order phase transitions, such as piecewise polytropes [121], segment-wise linear interpolations of the speed of sound [122], non-parametric representations based on a Gaussian process [123, 37] or a neural network [124]. In that way, we could constrain the unknown phase structure of strongly interacting matter at large densities [125]. Our approach is conducive to the use of a more complex EoS parameterization.

VII Conclusion

Neural likelihood estimation (NLE) and Hamiltonian Monte Carlo (HMC) are used in conjunction to allow the first inference of the full posterior for neutron star equation of state and nuisance parameters directly from telescope spectra. In realistic scenarios for nuisance parameter priors, this method outperforms all state-of-the-art methods in terms of prediction accuracy.

Extracting these parameters requires simulation-based inference techniques, as the likelihood is analytically unavailable. Previous methods either relied on inference of the mass and radius as an intermediate step or only provided a point estimate of the EoS parameters. NLE+HMC allows for the full posterior, avoiding simplifying assumptions and information collapse at an intermediate step, and is trivially extendable to multiple stars and other data.

Our study considered three different scenarios for the available prior information on the nuisance parameters coming from additional observations beyond the telescope spectra. With tighter prior constraints, the marginal posterior distributions of the EoS parameters, and accordingly the posterior P(ε) and R(M) credible bands, become much tighter. Both the hydrogen column N_H and the effective surface temperature log(T_eff) can be tightly constrained from the spectra. For the distance d, the posterior constraints are driven by the prior information.

These techniques can be extended to NICER or gravitational wave data [126]. (For the analysis of NICER and gravitational wave data, the telescope data are much larger than for the quiescent LMXBs analyzed here; in that case, it might become necessary to use an additional embedding net to learn summary statistics [103].) Many more gravitational wave events from binary neutron star mergers will be detected in future runs of LIGO, Virgo, and KAGRA. In addition, the gravitational wave database will increase dramatically with the next generation of gravitational wave detectors. For such a large number of expected observations, it is not numerically feasible to compute the likelihood by integrating KDEs [112, 127, 63]. Neural simulation-based inference techniques promise to provide a cost-efficient alternative [92, 128].

This approach can help guide decisions about future observations. Sec. V.2 demonstrated our method can easily be used to study the impact of additional measurements on the final EoS constraints. It can therefore be used to estimate the relative value of future measurements of one star compared to another, using simulations. Through the assessment of the constraining power of multiple simulated test spectra, we can anticipate which stars provide the most useful additional information required to further constrain the equation of state, thereby guiding decisions for future experimental endeavors.

Based on the results derived in this study on simulated test data, we conclude that neural likelihood estimation techniques provide a promising avenue for the inference of the neutron star equation of state, not only for X-ray measurements of neutron stars but possibly also for gravitational waves. In future studies, this method can be applied to real telescope measurements to draw conclusions about the internal structure of neutron stars.

Acknowledgements

LB thanks Max Dax, Kenji Fukushima, and Koichi Murase for stimulating discussions. LB is supported in part by Deutsche Forschungsgemeinschaft (DFG) and the National Natural Science Foundation of China (NSFC) through funds provided by the Sino-German CRC110 (DFG Grant No. TRR110 and NSFC Grant No. 11621131001), and by the DFG Excellence Cluster ORIGINS. LB thanks the Japan Society for the Promotion of Science (JSPS) and The University of Tokyo for their support and hospitality during his International Fellowship for Research in Japan in the winter and spring of 2024. AG and DW are supported by The Department of Energy Office of Science. FW and DF are supported by the National Science Foundation (USA) under Grant No. PHY-2012152. AWS is supported by NSF grants AST 22-06322 and PHY 21-16686. LL is supported by NSF grant 2012857. LH is supported by the Excellence Cluster ORIGINS, which is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC-2094-390783311.

AG used resources from the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, for this research. We specifically thank Wahid Bhimji and Benjamin Nachman for their support with these computing resources. This collaboration was forged in discussions at the Differentiable and Probabilistic Programming for Fundamental Physics Workshop at Munich and the research was therefore supported by the Munich Institute for Astro-, Particle and BioPhysics (MIAPbP) which is funded by the DFG under Germany’s Excellence Strategy – EXC-2094 – 390783311.

Appendix A Relativistic mean-field model

In this study, the EoS is based on a relativistic mean-field (RMF) model, which expresses the baryon-baryon interactions through effective meson exchanges. These mesons encompass a scalar meson (σ) signifying attraction between baryons, a vector meson (ω) representing repulsion, and an isovector meson (ρ) essential for elucidating nucleon-nucleon interactions in isospin asymmetric systems. The pion, playing a key role in the long-range description of the baryon-baryon interaction, assumes odd parity and consequently disappears in the mean-field approximation. The Lagrangian governing interacting nucleons ψ_N is expressed as [129]:

\mathcal{L}_{\mathrm{N}} = \sum_{N} \bar{\psi}_{N}\left[\gamma_{\mu}\left(i\partial^{\mu} - g_{\omega N}\omega^{\mu} - \tfrac{1}{2}g_{\rho N}\,\bm{\tau}\cdot\bm{\rho}^{\mu}\right) - \left(m_{N} - g_{\sigma N}\,\sigma\right)\right]\psi_{N} ,    (9)

where g_σN, g_ωN, and g_ρN represent the meson-nucleon coupling constants.

In the standard RMF approach, meson-nucleon coupling constants are set to reproduce properties of isospin symmetric nuclear matter at nuclear saturation density, n_0. These properties include the energy per nucleon E_0, the nuclear incompressibility K_0, the effective nucleon mass m*/m_N, the asymmetry energy J, the asymmetry energy slope L_0, and the nucleon potential U_N. For the GM1L parameter set employed in this study, the respective values are n_0 = 0.153 fm^-3, E_0 = -16.3 MeV, K_0 = 300 MeV, m*/m_N = 0.70, J = 32.5 MeV, L_0 = 55.0 MeV, and U_N = -65.5 MeV. Electrons and muons in neutron star matter are described by the lepton Lagrangian (λ ∈ {e^-, μ^-}):

\mathcal{L}_{\mathrm{L}} = \sum_{\lambda}\bar{\psi}_{\lambda}\left(i\gamma_{\mu}\partial^{\mu} - m_{\lambda}\right)\psi_{\lambda} ,    (10)

while the meson Lagrangian is given by [130, 129]:

\mathcal{L}_{\mathrm{M}} = \tfrac{1}{2}\left(\partial_{\mu}\sigma\,\partial^{\mu}\sigma - m^{2}_{\sigma}\sigma^{2}\right) - \tfrac{1}{4}\omega_{\mu\nu}\omega^{\mu\nu} + \tfrac{1}{2}m^{2}_{\omega}\omega_{\mu}\omega^{\mu} + \tfrac{1}{2}m^{2}_{\rho}\bm{\rho}_{\mu}\cdot\bm{\rho}^{\mu} - \tfrac{1}{4}\bm{\rho}_{\mu\nu}\cdot\bm{\rho}^{\mu\nu} ,    (11)

where ω_μν = ∂_μ ω_ν − ∂_ν ω_μ and ρ_μν = ∂_μ ρ_ν − ∂_ν ρ_μ. To ensure that the RMF model reproduces the empirical values for the nuclear incompressibility and the effective nucleon mass at saturation, additional nonlinear scalar self-interactions need to be included in the Lagrangian [131]:

\mathcal{L}_{\mathrm{NL}\sigma} = -\tfrac{1}{3}b_{\sigma}m_{N}\left[g_{\sigma N}(n)\,\sigma\right]^{3} - \tfrac{1}{4}c_{\sigma}\left[g_{\sigma N}(n)\,\sigma\right]^{4} ,    (12)

where b_σ and c_σ are constants determined by the properties of symmetric nuclear matter.

The field equations resulting from the aforementioned Lagrangians must be solved while adhering to the conditions of electrical charge neutrality in neutron star matter:

n\,q_{p} + \sum_{\lambda} n_{\lambda}q_{\lambda} = 0 ,    (13)

where q_p is the electric charge of a proton, and conservation of baryon number:

n - \sum_{B=n,p} n_{B} = 0 .    (14)

The meson-field equations combined with the equations for electric charge neutrality and baryon number conservation constitute a set of five coupled nonlinear equations, to be solved simultaneously to determine the meson mean-fields (σ̄, ω̄, ρ̄) and the neutron and electron Fermi momenta (k_n, k_e). The proton’s Fermi momentum is determined by the condition that neutron star matter is in chemical equilibrium:

\mu_{p} = \mu_{n} - q_{p}\mu_{e} ,    (15)

where μ_n and μ_p are the chemical potentials of neutrons and protons, respectively, and μ_e is the electron chemical potential. Muons appear in neutron star matter when μ_e ≥ μ_μ. For the neutron star matter EoS, we match the GM1L parametrization with the Baym-Pethick-Sutherland (BPS) model [132] for the outer crust and the Baym-Bethe-Pethick (BBP) model [133] for the inner crust.

To limit the number of parameters in the inference procedure, it is advantageous to represent the EoS with only a few parameters. As described in Ref. [71], we accomplish this by constructing a parametric representation based on a spectral fit, where the EoS is represented as a linear combination of basis functions and can therefore be reconstructed using the coefficients of the basis functions. Using a second-order expansion, we construct a spectral fit of the GM1L EoS using the process outlined in Refs. [79, 80]. The two coefficients from the spectral fit are hereafter referred to as λ1 and λ2. The original coefficients describing the GM1L EoS are then used to generate many EoS scenarios using the expression:

\lambda_{\text{generated}} = \lambda_{\mathrm{fit}}\cdot\left(1 + 2c\,(\mathit{ran2} - 0.5)\right) ,    (16)

where λ_generated represents the newly constructed spectral parameter, λ_fit is the best-fit spectral parameter of GM1L, and c is a scaling parameter set to 0.05. The values ran2 are uniformly distributed random numbers in the range 0 to 1 generated by the ran2 function given in [134]. This process was repeated to create around 10^3 different EoS models, each represented by a unique set of spectral coefficients λ1 and λ2. Following this process, the two parameters are uniformly distributed in the intervals λ1 ∈ [4.75, 5.25] and λ2 ∈ [-2.05, -1.85].
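For illustration, Eq. (16) can be reproduced with numpy's uniform generator standing in for the ran2 routine of [134]; the best-fit coefficients below are placeholders, not the actual GM1L values.

import numpy as np

rng = np.random.default_rng(42)

lam_fit = np.array([5.0, -1.95])   # placeholder best-fit GM1L spectral coefficients
c = 0.05                           # scaling parameter from the text

def generate_eos_coefficients(n_samples):
    # Eq. (16): lambda_generated = lambda_fit * (1 + 2c (u - 0.5)) with u ~ U(0, 1)
    u = rng.uniform(size=(n_samples, lam_fit.size))
    return lam_fit * (1.0 + 2.0 * c * (u - 0.5))

samples = generate_eos_coefficients(1000)
# For these placeholder values the samples roughly span the intervals quoted in the text.
print(samples.min(axis=0), samples.max(axis=0))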

Appendix B Normalizing flows and simulation-based inference

Normalizing Flows are a class of generative models in machine learning that focus on learning a bijective mapping between a simple base distribution π(u) (usually chosen to be a Gaussian distribution) and a more complex, target distribution p(x) [77]. The main idea is to model p(x) by transforming the base distribution π(u) through a series of invertible and differentiable transformations f_Φ with trainable parameters Φ.

Given N transformations, f_Φ = f_{Φ_1} ∘ ⋯ ∘ f_{Φ_N}, we can easily generate samples x from p(x) by transforming the samples u of π(u):

u \sim \pi(u) ,    (17)
x = f_{\Phi}(u) .    (18)

From the change of variables law for probability distributions, we can compute the probability density of the target distribution

p(x) = \pi\!\left(f^{-1}_{\Phi}(x)\right)\left|\det\left(\frac{\partial f^{-1}_{\Phi}}{\partial x}\right)\right| .    (19)

Consequently, for a fast numerical evaluation of the probability density, the transformations f_Φ should be chosen such that they are easy to invert and have a Jacobian whose determinant is fast to compute. One popular choice is Masked Autoregressive Flows (MAF) [102], where each dimension of the data is sequentially transformed conditioned on the previous dimensions, making the Jacobian of f_Φ^{-1} triangular by design. MAFs are well suited for our purposes because they are very efficient in computing the probability density p(x), but less efficient in generating samples, x ∼ p(x).
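A minimal numerical illustration of the change-of-variables formula (19) for a single affine map; a real MAF chains many learned autoregressive transformations of this kind.

import numpy as np
from scipy.stats import norm

# One invertible transformation f(u) = a*u + b applied to a standard normal base pi(u).
a, b = 2.0, 1.0

def log_p(x):
    # Eq. (19): p(x) = pi(f^{-1}(x)) * |det d f^{-1}/dx|, with f^{-1}(x) = (x - b)/a.
    u = (x - b) / a
    log_det_inv = -np.log(np.abs(a))          # d f^{-1}/dx = 1/a
    return norm.logpdf(u) + log_det_inv

# Cross-check against the analytic density of x = a*u + b ~ N(b, a^2).
x = 0.3
print(log_p(x), norm.logpdf(x, loc=b, scale=abs(a)))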

In simulation-based inference (SBI), our goal is to train a conditional normalizing flow to learn the likelihood distribution, p(s|λ,ν). This is achieved by minimizing the Kullback-Leibler divergence (D_KL), a measure of the statistical distance between two probability distributions, between the likelihood p(s|λ,ν) and the distribution parameterized by the normalizing flow, q_Φ(s|λ,ν). D_KL becomes zero if the distributions are identical. Hence we fit the trainable parameters Φ with the following optimization procedure:

\arg\min_{\Phi} D_{\text{KL}}\left(p(s|\lambda,\nu)\,\big|\big|\,q_{\Phi}(s|\lambda,\nu)\right) = \arg\min_{\Phi} \int\mathrm{d}s\, p(s|\lambda,\nu)\,\left[\log p(s|\lambda,\nu) - \log q_{\Phi}(s|\lambda,\nu)\right]
\approx \arg\min_{\Phi} \sum_{s_{i}\sim p(s|\lambda_{i},\nu_{i})} \left[\log p(s_{i}|\lambda_{i},\nu_{i}) - \log q_{\Phi}(s_{i}|\lambda_{i},\nu_{i})\right]
= \arg\min_{\Phi} \sum_{s_{i}\sim p(s|\lambda_{i},\nu_{i})} -\log q_{\Phi}(s_{i}|\lambda_{i},\nu_{i})
= \arg\max_{\Phi} \sum_{s_{i}\sim p(s|\lambda_{i},\nu_{i})} \log q_{\Phi}(s_{i}|\lambda_{i},\nu_{i}) ,    (20)

where the second line is the Monte-Carlo estimator of the integral in the definition of the KL divergence in the first line. The first term in the second line can be dropped as it is constant with respect to the parameters Φ of the density estimator. Thus, to learn the likelihood distribution, training the density estimator to minimize the KL divergence is equivalent to maximizing the log-probability of the sampled spectra, s_i, generated from the likelihood distribution via the simulator for given EoS and nuisance parameters, (λ_i, ν_i).
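A sketch of the resulting training loop: minimize the negative log-probability of the simulated spectra under the conditional flow, i.e. the Monte-Carlo objective in Eq. (20). The flow object and its log_prob(inputs, context) interface are assumptions (e.g. an nflows-style MAF); in practice we rely on the sbi package for this step.

import torch

def train_nle(flow, spectra, thetas, n_epochs=100, lr=1e-3, batch_size=256):
    """Fit q_Phi(s | lambda, nu) on simulated pairs (s_i, theta_i) by minimizing
    the negative log-likelihood, the Monte-Carlo estimate of the KL divergence (20)."""
    optimizer = torch.optim.Adam(flow.parameters(), lr=lr)
    dataset = torch.utils.data.TensorDataset(spectra, thetas)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(n_epochs):
        for s_batch, theta_batch in loader:
            loss = -flow.log_prob(s_batch, context=theta_batch).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return flow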

Appendix C Hamiltonian Monte Carlo

Let xnx\in\mathcal{R}^{n} denote the state (random variable) in a continuous state space. Our goal is to generate samples from a target probability distribution function π(x)\pi(x). We assume the normalization 𝑑xπ(x)=1\int dx\,\pi(x)=1, although all MCMC methods only require un-normalized densities.

C.1 Metropolis-Hastings

Given the target distribution π\pi and the current state xx, the random-walk Metropolis-Hastings algorithm constructs a Markov chain by sampling a proposal yy from a transition function (proposal distribution) t(x,y)t(x,y). The simplest example of such a proposal distribution is a Gaussian centered on xx with some width σ\sigma, i.e., y𝒩(x,σ2)y\sim\mathcal{N}(x,\sigma^{2}). The proposal is then accepted with some probability α(x,y)\alpha(x,y), in which case the next state is yy; otherwise it remains xx. The acceptance probability must be chosen such that, asymptotically, this Markov chain generates samples from the target distribution, x,yπx,y\sim\pi. This is equivalent to ensuring that the target distribution π\pi is invariant under the Markov chain, for example by maintaining the detailed balance condition when accepting the proposed state

π(x)t(x,y)α(x,y)=π(y)t(y,x)α(y,x).\pi(x)t(x,y)\alpha(x,y)=\pi(y)t(y,x)\alpha(y,x)~{}. (21)

Detailed balance means ensuring that the function π(x)t(x,y)α(x,y)\pi(x)t(x,y)\alpha(x,y) is symmetric in exchanging xx and yy (xyx\leftrightarrow y). This equation enforces that the probability of being in the state xx, proposing a transition to state yy and accepting this transition (xyx\rightarrow y) is the same as making the reverse transition (yxy\rightarrow x).

If t(x,y)t(x,y) is positive everywhere, the above condition is satisfied by the standard Metropolis-Hastings acceptance formula for x,ynx,y\in\mathcal{R}^{n},

α(x,y)=min(π(y)t(y,x)π(x)t(x,y),1),\alpha(x,y)=\min\!\left(\frac{\pi(y)\,t(y,x)}{\pi(x)\,t(x,y)},1\right), (22)

where the denominator is never zero given the above assumptions on π\pi and tt. For each x,ynx,y\in\mathcal{R}^{n}, either α(x,y)\alpha(x,y) or α(y,x)\alpha(y,x) is 1. There are other formulae for α\alpha obeying Eq. (21), but they have lower acceptance rates and therefore lead to an undesirably higher asymptotic variance of estimated expectations.
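For concreteness, a minimal random-walk Metropolis-Hastings sampler with a symmetric Gaussian proposal is sketched below; because the proposal is symmetric, the acceptance probability of Eq. (22) reduces to the ratio of target densities. The function names and settings are illustrative only.

import numpy as np

# Minimal random-walk Metropolis-Hastings sampler with a Gaussian proposal.
# log_prob returns the (possibly un-normalized) log target density log pi(x).
# For a symmetric proposal the acceptance ratio in Eq. (22) reduces to pi(y)/pi(x).
def metropolis_hastings(log_prob, x0, n_samples, sigma=0.5, seed=None):
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        y = x + sigma * rng.standard_normal(x.size)   # propose y ~ N(x, sigma^2)
        log_alpha = log_prob(y) - log_prob(x)          # log of pi(y)/pi(x)
        if np.log(rng.uniform()) < log_alpha:          # accept with probability alpha(x, y)
            x = y
        samples[i] = x
    return samples

# Example: samples = metropolis_hastings(lambda x: -0.5 * np.sum(x**2), np.zeros(2), 5000)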

C.2 Classical Hamiltonian Monte Carlo

In high dimensions, random walk Metropolis-Hastings as described in the previous section often leads to diffusive behavior and can be extremely inefficient. In this section, we outline the Hamiltonian Monte Carlo (HMC) sampling algorithm which overcomes this by designing more efficient proposal kernels, t(x,y)t(x,y), that utilize gradient information [105, 106]. Following standard notation for Hamiltonian dynamics, qnq\in\mathcal{R}^{n} denotes the parameter of interest that is to be sampled. The target density, π\pi, is assumed to be continuous, differentiable, and positive everywhere. To draw samples qq from π(q)\pi(q), HMC reinterprets the parameters of interest as a position vector with associated potential energy function U(q)=logπ(q)U(q)=-\log\pi(q). We introduce an auxiliary momentum vector pnp\in\mathcal{R}^{n}, which contributes a kinetic energy term K(p)=12pTM1pK(p)=\frac{1}{2}p^{T}M^{-1}p, where MM is some symmetric positive definite mass matrix that we take as fixed. Then the Hamiltonian H:2nH:\mathcal{R}^{2n}\to\mathcal{R} gives the total energy for the state x:=(q,p)x:=(q,p),

H(x)=H(q,p)=U(q)+12pTM1p.H(x)=H(q,p)=U(q)+\frac{1}{2}p^{T}M^{-1}p~{}. (23)

The state space S=2nS=\mathcal{R}^{2n} is called phase space. The dynamical evolution of particles under this Hamiltonian is called the Hamiltonian flow and can be simulated by solving Hamilton's equations of motion.

HMC uses a Markov chain to generate samples xx from the canonical distribution π~\tilde{\pi} defined by HH, namely

π~(x):=Z1eH(x)=Z1eU(q)e12pTM1p=Z1π(q)e12pTM1p,\tilde{\pi}(x):=Z^{-1}\,e^{-H(x)}=Z^{-1}\,e^{-U(q)}\,e^{-\frac{1}{2}p^{T}M^{-1}p}=Z^{-1}\,\pi(q)\,e^{-\frac{1}{2}p^{T}M^{-1}p}~{}, (24)

where Z=2n𝑑xeH(x)=(2π)n/2detMZ=\int_{\mathcal{R}^{2n}}dx\,e^{-H(x)}=(2\pi)^{n/2}\sqrt{\det M} is a normalizing constant. Since HH is the sum of a potential term depending only on qq and a kinetic term depending only on pp, the two are independent under the Gibbs density, and the qq-marginal of π~(x)\tilde{\pi}(x) is the target density π(q)\pi(q). Thus, extracting the first nn coordinates of samples x(i)x^{(i)} from π~\tilde{\pi}, one obtains samples from π\pi.

To generate samples from this distribution, HMC constructs a Markov chain as follows. Given a current state x(i):=(q(i),p(i))x^{(i)}:=(q^{(i)},p^{(i)}), the Markov update comprises two steps:

Step 1. Gibbs sampling:

Resample the momentum p(i)p^{(i)} from its Gaussian marginal distribution p𝒩(0,M)p\sim\mathcal{N}(0,M), without changing q(i)q^{(i)}. This randomization step is needed to mix efficiently between different HH values (corresponding to energy level-sets).

Step 2. Metropolis update:

Given the momentum p(i)p^{(i)}, generate a new Metropolis-Hastings proposal via a deterministic map y=F(x)y=F(x) that approximates the Hamiltonian flow generated by Eq. (23) over a certain time TT, starting from the initial point x=(q(i),p(i))x=(q^{(i)},p^{(i)}), followed by a negation of the final momentum. (The momentum negation maintains the invertibility of every step, as required for detailed balance, but is not needed in practice for HMC because it is followed by the Gibbs momentum refresh of Step 1.) This map defines the transition kernel tt. The proposal is then accepted with probability α(x,y)\alpha(x,y).

The most commonly used dynamics for the map FF approximating the Hamiltonian flow, i.e., solving the Hamiltonian equations, is the leapfrog (Verlet) integrator. Each leapfrog step, written (q,p)=Lϵ(q,p)(q^{\prime},p^{\prime})=L_{\epsilon}(q,p), comprises three substeps:

p¯\displaystyle\bar{p} pϵ2U(q),\displaystyle\leftarrow\;p-\frac{\epsilon}{2}\nabla U(q)~{},
q\displaystyle q^{\prime} q+ϵM1p¯,\displaystyle\leftarrow\;q+\epsilon M^{-1}\bar{p}~{},
p\displaystyle p^{\prime} p¯ϵ2U(q),\displaystyle\leftarrow\;\bar{p}-\frac{\epsilon}{2}\nabla U(q^{\prime})~{}, (25)

where ϵ\epsilon is the step-size. We compose n=T/ϵn=T/\epsilon such steps, (Lϵ)n(L_{\epsilon})^{n}, to integrate the Hamiltonian flow for time TT. With the momentum-flip operator P(q,p):=(q,p)P(q,p):=(q,-p), the map F=(Lϵ)nPF=(L_{\epsilon})^{n}P (operators act left to right) is volume-preserving, because each of the three substeps is a shear transformation, i.e., a map of the form (q,p)(q+G(p),p)(q,p)\mapsto(q+G(p),p) or (q,p)(q,p+G(q))(q,p)\mapsto(q,p+G(q)), whose 2n×2n2n\times 2n Jacobian matrix has determinant 1. Moreover, FF is an involution, F1=FF^{-1}=F: each leapfrog step is time reversible, Lϵ(q,p)=(q,p)L_{\epsilon}(q^{\prime},-p^{\prime})=(q,-p), as can be verified by reversing the order of the substeps, so LϵPLϵ=PL_{\epsilon}PL_{\epsilon}=P and ((Lϵ)nP)2=I((L_{\epsilon})^{n}P)^{2}=I. In this case, the transition described in Step 2 preserves detailed balance when

α(x,y)=min(π~(y)π~(x), 1),\alpha(x,y)=\min\left(\frac{\tilde{\pi}(y)}{\tilde{\pi}(x)},\,1\right), (26)

so that the distribution π~\tilde{\pi} is invariant under Step 2. Since π~\tilde{\pi} is also invariant under the Gibbs sampling in Step 1, it is invariant under the composition of the two steps, i.e., under each HMC update. Note that a poor leapfrog approximation of the Hamiltonian flow does not violate detailed balance, although it can drastically reduce the mixing of the Markov chain and hence the efficiency of the algorithm.
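A compact sketch of a single HMC update, combining the momentum resampling of Step 1, the leapfrog integration of Eq. (25) with an identity mass matrix, and the accept/reject decision of Eq. (26), is given below; it is a pedagogical illustration rather than the tuned implementation described in the next subsection.

import numpy as np

# One HMC update with identity mass matrix M = 1.
# log_prob(q) = log pi(q) and grad_log_prob(q) = d log pi / dq,
# so that grad U(q) = -grad_log_prob(q) in Eq. (25).
def hmc_step(log_prob, grad_log_prob, q, eps=0.05, n_leapfrog=40, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    q = np.atleast_1d(np.asarray(q, dtype=float))
    p = rng.standard_normal(q.size)                  # Step 1: resample momentum
    q_new, p_new = q.copy(), p.copy()
    for _ in range(n_leapfrog):                      # Step 2: leapfrog trajectory, Eq. (25)
        p_new = p_new + 0.5 * eps * grad_log_prob(q_new)
        q_new = q_new + eps * p_new
        p_new = p_new + 0.5 * eps * grad_log_prob(q_new)
    # Metropolis accept/reject on the total energy, Eq. (26); the final momentum
    # flip is omitted because the kinetic energy is symmetric in p.
    h_old = -log_prob(q) + 0.5 * p @ p
    h_new = -log_prob(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new
    return q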

C.3 Implementation

To draw posterior samples with HMC, we run 16 chains of 2000 samples each. The leapfrog integration in each chain is performed for 40 steps to generate new proposals, with a step-size that is dynamically determined via dual averaging during 300 burn-in steps to achieve an average acceptance probability of 0.65. The mass matrix used in this work is diagonal, except for negative off-diagonal elements between the EoS parameters. In this work, we tuned these values manually based on a few initial runs; in the future this can be automated using variational approximations to the target distribution [135]. We use an importance sampling step to determine initial values for each chain [90]. To facilitate an easy comparison of our NLE + HMC method with future analyses, we provide working examples of the algorithms implemented for this work in a public GitHub repository at https://github.com/lenjonah/neutron_star_inference.
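As a rough illustration of these settings (not the actual analysis code, which is available in the repository linked above), the hypothetical driver below runs several independent chains using the hmc_step sketch from the previous subsection; dual-averaging step-size adaptation, the tuned mass matrix, and the importance-sampling initialization are omitted.

import numpy as np

# Hypothetical driver mirroring the quoted settings: 16 chains of 2000 samples
# with 40 leapfrog steps per proposal. Step-size adaptation and the non-trivial
# mass matrix are omitted for brevity.
def run_chains(log_prob, grad_log_prob, init_positions,
               n_chains=16, n_samples=2000, n_leapfrog=40, eps=0.05):
    chains = []
    for c in range(n_chains):
        q = np.asarray(init_positions[c], dtype=float)
        samples = np.empty((n_samples, q.size))
        for i in range(n_samples):
            q = hmc_step(log_prob, grad_log_prob, q, eps=eps, n_leapfrog=n_leapfrog)
            samples[i] = q
        chains.append(samples)
    return np.stack(chains)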

References