Nuclear data evaluation with Bayesian networks
Abstract
Bayesian networks are graphical models to represent the deterministic and probabilistic relationships between variables within the Bayesian framework. The knowledge of all variables can be updated using new information about some of the variables. The Bayesian Generalized Linear Least Squares method can be regarded as an inference method for Bayesian networks of variables with multivariate normal priors and linear relationships between them. We show that relying explicitly on the Bayesian network interpretation enables large scale inference and gives more flexibility in incorporating prior assumptions and constraints into the nuclear data evaluation process, such as the constraints that some cross sections equal linear combinations of other cross sections and that all cross sections must be non-negative. The latter constraint is accounted for by a non-linear transformation and therefore we also discuss inference in Bayesian networks with non-linear relationships between variables. Using Bayesian networks, the evaluation process yields more detailed information, such as posterior estimates and uncertainties of all statistical and systematic errors associated with the experiments. We further elaborate on a sparse Gaussian process construction that can be well integrated into the Bayesian network framework and applied to, e.g., the modeling of energy-dependent model parameters, model deficiencies of the physics model or energy-dependent systematic errors of experiments. We present three proof-of-concept examples that emerged in the context of the neutron data standards project and in the ongoing international evaluation efforts of 56Fe. In the first example we demonstrate the modelization and explicit estimation of relative energy-dependent error components associated with experimental datasets. Then we show that Bayesian networks in combination with the outlined Gaussian process construction may be applied to an evaluation of 56Fe in the energy range between one and two MeV, where it is difficult to obtain satisfactory evaluations by R-Matrix and nuclear model fits. Finally, we present a model-based evaluation of 56Fe between 5 MeV and 30 MeV with a consistent and statistically sound treatment of model deficiencies. The R scripts to reproduce the Bayesian network examples and the nucdataBaynet package for Bayesian network modeling and inference have been made publicly available.
I Introduction
Accurate knowledge of nuclear quantities, such as cross sections, branching ratios, and half-lives, is important in various branches of nuclear physics and engineering. The risk and safety assessment of nuclear power plants, the modeling of astrophysical processes and dose calculations in nuclear medicine all depend on nuclear data. The aim of nuclear data evaluation is to provide accurate estimates of these quantities along with reliable uncertainties quantifying their accuracy.
Evaluated nuclear data are the result of an evaluation process that begins with the identification and collection of suitable experimental data. Selected data are scrutinized regarding potential errors and biases. It is common practice to characterize the knowledge about the experimental features in the following way. The determined possible ranges of the errors are expressed as uncertainties. Biases, i.e., errors that affect distinct measured values in the same way, are accounted for by introducing correlations. Uncertainties and correlations are usually summarized as a covariance matrix, referred to as experimental covariance matrix. Finally, a statistical method is applied to fuse the information of the individual experiments, i.e., the measured values and the associated covariance matrix, in order to obtain estimates and an associated evaluated covariance matrix of quantities of interest.
An established method in the field of nuclear data evaluation to achieve this fusion is the Bayesian version of the Generalized Least Squares (GLS) method, e.g., Muir (1989); Smith (1991); Capote et al. (2010), which enables the inclusion of prior knowledge. It relies on the assumption that relationships between variables are linear and the knowledge of the variables is expressed in terms of a multivariate normal distribution, which is uniquely characterized by a mean vector and a covariance matrix. Therefore, the experimental covariance matrix can be directly used in the GLS method. Especially in the field of nuclear data, the name Generalized Linear Least Squares (GLLS) is often preferred to emphasize the linearity assumption.
From now on, we always refer to the Bayesian version when we use the term GLS method, as is common practice in the field of nuclear data evaluation. The GLS method is often presented as a method to obtain estimates of parameters, e.g., parameters of a physics model or cross sections on a mesh, and an associated evaluated covariance matrix based on measured data and the associated experimental covariance matrix. This viewpoint is also reflected in the mathematical equations with the appearance of the experimental covariance matrix. Unfortunately, the experimental covariance matrix is dense if systematic experimental uncertainties are taken into account, posing an obstacle to the application of the GLS method to a large number of data points. One solution to deal with large covariance matrices in the GLS method is to update the prior covariance matrix block-wise as implemented in the GANDR evaluation code Muir et al. (2007) and another one is to exploit the structure of the prior covariance matrix when it is derived from a sample of model predictions Schnabel and Leeb (2017). However, these approaches only address the issue of a large prior covariance matrix and not a large experimental covariance matrix.
Regarding the flexibility of the GLS method in terms of possible modeling assumptions, Gaussian processes (GPs), e.g., Rasmussen and Williams (2006); Stein and Stein (1999); Tarantola (2005), have been included in the nuclear data evaluation process in various ways, e.g., to account for model deficiencies Pigni and Leeb (2003); Leeb et al. (2008); Schnabel and Leeb (2016); Helgesson and Sjöstrand (2017), as priors on energy-dependent model parameters Helgesson and Sjöstrand (2018); Schnabel et al. (2021), and as a flexible fitting function instead of a physics model Iwamoto (2020). Even though these approaches may be regarded as new methods, as soon as we discretize the function for which the GP serves as prior, these approaches merely represent specific constructions of the prior covariance matrix which is then used in the GLS method. There is still a vast space of modeling possibilities that has not been thoroughly exploited in nuclear data evaluations yet. For example, to our knowledge, the modeling of model deficiencies with Gaussian processes obeying simultaneously the constraints that some cross sections are defined as sums of other cross sections and that cross sections must be non-negative has not been demonstrated so far. The developments in this direction are again impeded by the resulting large and potentially dense covariance matrix and its negative impact on the computation time of the GLS method. Furthermore, the GLS method can only be applied to variables with linear relationships, which may be a serious limitation in certain nuclear data evaluation scenarios, e.g., Smith and Naberejnev (2002); Capote and Smith (2008).
The limitations of the GLS method can be pushed back by changing the perspective on the inference process. Model parameters, model predictions, model deficiencies, measurements, statistical and systematic errors are linked to each other by functional relationships. These quantities, with the exception of the measured values, are uncertain and can therefore be represented as random variables. These random variables and their probabilistic functional relationships can be modeled as a Bayesian network, e.g., Koller and Friedman (2009); Pearl (2014). Inference procedures for Bayesian networks exploit the connection structure of the variables to speed up computations and thus enable inference at a larger scale.
Bayesian networks have a long history and a sound mathematical foundation. Early and seminal contributions to their theory were made by Judea Pearl in the eighties, e.g., Pearl (1985). Importantly, a Bayesian network is a specific representation of the information required to perform Bayesian inference, which is both intuitive for humans and computationally efficient. As an aside, also hierarchical models, e.g., Buss et al. (2010); Schnabel (2018a) in the context of nuclear data, can be interpreted in terms of Bayesian networks.
Relying on Bayesian networks as a mental abstraction helps to find good and valid Bayesian models and makes the modeling assumptions more explicit. Besides increased computational efficiency, one benefit for nuclear data evaluation is that consistent and joint evaluations of several isotopes coupled together by cross section measurements of materials in natural composition and the consistent evaluation of exclusive cross sections using measurements of inclusive ones, such as residual production cross sections, become straightforward from the modeler's point of view. Another possible application is the consistent updating of evaluations in library projects, such as JEFF Plompen et al. (2020), JENDL Shibata et al. (2011), ENDF/B Brown et al. (2018). Many past evaluations represent valuable knowledge but are unfortunately not reproducible anymore. The Bayesian network framework enables the consistent update of such evaluations using new measurements. In the future, the unresolved resonance range may also be evaluated using Bayesian networks. For instance, a computational mesh of several hundred thousand points would be required to properly represent point-wise the resonances in the unresolved resonance range of 239Pu Levitt (1972). This mesh size is nowadays intractable within the conventional formulation of the GLS method but may be tractable in the Bayesian network framework. Please note that the usual treatment of the unresolved resonance range does assume a non-Gaussian probability distribution and the hypothetical treatment using Gaussian processes within the GLS method would be conceptually different.
In this paper, we first focus on Bayesian networks with linear relationships between variables and prior knowledge represented by multivariate normal distributions. These are the same assumptions as in the GLS method, and inference in such Bayesian networks therefore yields the same results. Because the GLS method is well-known in the nuclear data field, we anchor the exposition of the Bayesian network interpretation at the GLS method and tailor it to the specifics of nuclear data evaluation. The aim of this paper is therefore to demonstrate potential advantages of the Bayesian network interpretation for nuclear data evaluation and not to give a comprehensive account of the general theory of Bayesian networks. A comprehensive account of their theory can be found in Pearl (2014).
As non-linear relationships between variables appear frequently in nuclear data evaluation, it is important to take them into account properly in the inference procedure. Consequently we also discuss how inference in Bayesian networks with non-linear relationships between variables can be performed, keeping the assumption of a multivariate normal prior on the variables. To this end, we use a customized Levenberg-Marquardt algorithm Helgesson and Sjöstrand (2017) to find the most likely values of the variables according to the exact posterior distribution. Assumptions, such as an uncertainty being relative to the underlying measured quantity or a quantity being non-negative, can be rigorously taken into account. The exact treatment of non-linear relationships also enables the use of other probability distributions related to the (multivariate) normal distribution, such as the log-normal distribution for strictly positive variables, e.g., Smith and Naberejnev (2002), and the application of transformations to obtain approximately normally distributed variables in cases where the original variables are not normally distributed, e.g., Hoefer and Buss (2021).
Approximate posterior covariances based on the linearization of the non-linear relationships between variables at the posterior maximum can be quickly computed for a subset of the variables despite a possible large total number of variables, e.g., several hundred thousand. Furthermore, sample vectors from the approximate posterior distribution including all variables can be obtained using the full approximate posterior covariance matrix, even though it is never explicitly computed.
Regarding the flexibility of modeling assumptions, we elaborate on a sparse Gaussian process construction that can be well integrated into the Bayesian network framework and scales well with the number of data points. A similar construction has already been employed in a prototype of a nuclear data evaluation pipeline Schnabel et al. (2021) in the step concerned with the automatic correction of uncertainties of inconsistent experimental datasets. This GP construction allows the inclusion of prior knowledge about the expected range, slope, and smoothness of the function for which the Gaussian process serves as prior. The prior on these properties can be different at different (usually energy) locations of the function. This GP construction may be used instead of the originally proposed ones in, e.g., Leeb et al. (2008); Schnabel and Leeb (2016); Helgesson and Sjöstrand (2017); Schnabel (2018b) to take into account model deficiencies, unknown parameter functions in a nuclear model Helgesson and Sjöstrand (2018); Schnabel et al. (2021) or be used in place of a nuclear model to fit cross sections Iwamoto (2020). It is however also perfectly possible to use other GP constructions in the Bayesian network framework, although inference may not be possible for a large number of variables then.
Finally, we present three example applications to demonstrate the flexibility of the Bayesian network framework and its usefulness for nuclear data evaluation. The first example application emerged in the context of the neutron data standards project Carlson et al. (2009, 2018) and showcases the inclusion of energy-dependent unrecognized sources of uncertainty Capote et al. (2020) and the rigorous treatment of uncertainties given relative to the underlying true cross sections. The two other examples emerged in the context of the international evaluation efforts of neutron-induced reactions of 56Fe within the International Nuclear Data Evaluation Network (INDEN). As R-matrix codes, e.g., Larson (1998); Moxon et al. (2010); De Saint Jean et al. (2021), and nuclear physics model codes, e.g., Young and Arthur (1977); Herman et al. (2007); Koning and Rochman (2012); Iwamoto et al. (2016); Kawano (2021), struggle to provide good fits in the incident energy range from several hundred keV to about 5 MeV, we show how an evaluation may be performed using only Gaussian processes in the Bayesian network framework in the energy range between one and two MeV, while preserving consistency between the various channels and ensuring that all evaluated cross sections are non-negative. In the final example, we incorporate model defects into a model-based evaluation with TALYS Koning and Rochman (2012); Koning et al. (2019), as already suggested and demonstrated in schematic examples Schnabel and Leeb (2016); Helgesson and Sjöstrand (2017), but using the GP construction proposed in this paper. In all the examples, comprehensive uncertainty information of all evaluated quantities is available, such as for the evaluated cross sections and systematic error components. The R scripts to create these Bayesian networks and do inference in them are provided as examples as a part of the publicly available nucdataBaynet R package Georg.
The description of the Bayesian networks given as examples in section III is focused on the modeler's point of view. A modeler can think in terms of nodes and mappings between them, such as linear interpolations, convolutions and non-linear transformations, and does not need to be concerned about the details of the inference algorithms described in section II. Therefore a reader may start with section II.1 and then jump to the examples in section III to get a feel for Bayesian network modeling before possibly delving into the underlying mathematics in section II.2 and subsequent sections.
Finally, we stress that nuclear model codes and evaluation systems exist that already offer advanced Bayesian inference capabilities with a significant methodological overlap with the methods presented in this paper, such as CONRAD De Saint Jean et al. (2021), EMPIRE Herman et al. (2007, 2013), GANDR Muir et al. (2007), GMAP Poenitz (1981); Poenitz and Aumeier (1997), KALMAN Kawano and Shibata (1997), SAMMY Larson (1998), SOK Kawano et al. (2000) and the T6 code system Koning et al. (2019); Koning (2015). A detailed account of Bayesian statistics in the context of the analysis of nuclear resonance data is given in Fröhner (2000) and a comprehensive general introduction with an eye to physics in von der Linden et al. (2014). However, to the best of our knowledge, the employed Bayesian methods have never been discussed in terms of the Bayesian network interpretation in the nuclear data field. Bayesian neural networks have already been employed for nuclear mass prediction Niu and Liang (2018), but a neural network is conceptually different from a Bayesian network even though Bayesian inference was applied to adjust the weights of the network. Regarding the application of Bayesian networks in other nuclear-related fields, they have been discussed for example in the context of power plant safety analysis Chen et al. (2010); Mohan et al. (2020); Chen et al. (2021), software reliability quantification of system critical software Kang et al. (2018), reactor control Ramos et al. (2021), and analysis of nuclear acquisitions Freeman (2009).
II Methodology
II.1 Primer on Bayesian networks
Bayesian inference is a framework for inference under uncertainty. The central formula in Bayesian statistics is given by Bayes theorem, e.g., Sivia (1996),
$P(H \,|\, E) = \frac{P(E \,|\, H)\, P(H)}{P(E)}$   (1)
where $H$ is a hypothesis and $E$ represents an observation. For instance, $H$ could be a hypothesis from the set {'Rained last night', 'Did not rain last night'} and the evidence $E$ an observation from the set of possible observations {'Floor is wet', 'Floor is dry'}. The notation $P(\cdot)$ denotes a probability, a number between zero and one, which is interpreted as a degree of belief in Bayesian statistics. The probability $P(H)$ is referred to as prior probability and represents the degree of belief that hypothesis $H$ is true without taking into account the evidence $E$. The probability $P(E \,|\, H)$ denotes the probability for making the observation $E$ given that $H$ is true. For instance, $P(E = \text{'Floor is wet'} \,|\, H = \text{'Rained last night'}) = 1$ if we know that rain always causes the floor to be wet. The probability $P(E)$ is referred to as marginal likelihood or model evidence and is the probability of making a specific observation based on the modeling assumptions. The probability $P(H \,|\, E)$ is referred to as posterior probability and represents the degree of belief that $H$ is true after the observation $E$ has been made.
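To make the mechanics of eq. 1 concrete, the following minimal R snippet evaluates the posterior probability of rain after observing a wet floor; the prior and the likelihood of a wet floor without rain are invented numbers for illustration:

p_rain       <- 0.3   # P(H): prior degree of belief that it rained
p_wet_rain   <- 1.0   # P(E|H): rain always causes a wet floor
p_wet_norain <- 0.1   # P(E|not H): invented, e.g., a sprinkler
# marginal likelihood P(E) by the law of total probability
p_wet <- p_wet_rain * p_rain + p_wet_norain * (1 - p_rain)
# posterior P(H|E) according to eq. 1
p_rain_wet <- p_wet_rain * p_rain / p_wet   # approximately 0.81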
The sets we have introduced above are discrete and we can exhaustively enumerate their elements. However, for continuous quantities, such as temperatures or cross sections in nuclear data, the notion of probability needs to be replaced by the concept of the probability density function. Given a probability density function $f(x)$, the probability that a value $x$ is within the range enclosed by $a$ and $b$ is given by
$P(a \le x \le b) = \int_a^b f(x)\, \mathrm{d}x$   (2)
Due to this relationship, probability density functions must be non-negative everywhere and their integral over the full range must be one. We do not emphasize the distinction between probabilities and probability density functions in the following because this aspect is not crucial for the concept of Bayesian networks.
Even though Bayes' theorem in its basic form is at the core of Bayesian inference, as it formalizes the mechanics of learning from experience, it only deals with a very simple case. In the following we generalize the perspective on the inference problem to prepare the discussion of Bayesian networks. The distinction between hypotheses and evidence is to some extent arbitrary. For instance, the possible observation 'Floor is wet' we mentioned as an example is also a hypothesis, but we named it evidence because it is a hypothesis that we observed to be true. In contrast, which hypothesis among the possible ones is true was not directly observed, and Bayes' theorem allows us to indirectly improve our knowledge about their likelihood by using the evidence.
Therefore, at a higher level of abstraction, we can equally say that we have a system of hypotheses and the Bayesian inference framework provides a mechanism to update our knowledge about their likelihood by observing some of the hypotheses to be true or false. Assume that we have a number of variables $x_1, x_2, \dots, x_N$ with each one being a hypothesis from a certain class of hypotheses. For instance, $x_1$ could be the value of the elastic cross section, $x_2$ the value of the non-elastic cross section and $x_3$ the value of the total cross section. Their joint prior probability density function is denoted by $\pi(x_1, x_2, x_3)$. To give an example of the form of the Bayesian update formula in the case of three variables, assume that we know the elastic cross section $x_1$. The Bayesian update formula takes then the form
$\pi(x_2, x_3 \,|\, x_1) = \frac{\pi(x_1 \,|\, x_2, x_3)\, \pi(x_2, x_3)}{\pi(x_1)}$   (3)
Even though the Bayesian update formula can be written down for an arbitrary number of variables in this way, typical inference tasks with many variables pose computational challenges. A relevant inference query is to find an assignment of values to $x_2$ and $x_3$ that maximizes the value of the posterior probability density function $\pi(x_2, x_3 \,|\, x_1)$. This assignment is called a maximum a posteriori probability (MAP) estimate.
The idea of Bayesian networks is to exploit the structure of the relationships between variables. Every joint probability distribution can be decomposed into a product of conditional probability distributions. For instance, for three variables, we can write the joint probability density function as
$\pi(x_1, x_2, x_3) = \pi(x_3 \,|\, x_2, x_1)\, \pi(x_2 \,|\, x_1)\, \pi(x_1)$   (4)
In this product, going from left to right, each conditional probability distribution is conditioned on all variables that appeared before. These conditional dependencies can be visualized as a Bayesian network:
Variables (or a collection of variables of the same type) are associated with nodes and directed arrows go from the variables being conditioned on to the dependent variables. Bayesian networks are not allowed to have cycles, hence it is impossible to arrive at the same node twice by following the arrows along their pointing direction.
In the product in eq. 4 conditional dependencies can be dropped if variables are conditionally independent. Two random variables $x_1$ and $x_2$ are conditionally independent given a third random variable $x_3$ if
$\pi(x_1 \,|\, x_2, x_3) = \pi(x_1 \,|\, x_3)$   (5)
In words, knowledge about the value of $x_2$ does not improve the knowledge about the value of $x_1$ if the value of $x_3$ is already known.
In the example of the three hypotheses: if $x_3$ and $x_1$ are conditionally independent given $x_2$, eq. 4 simplifies to
$\pi(x_1, x_2, x_3) = \pi(x_3 \,|\, x_2)\, \pi(x_2 \,|\, x_1)\, \pi(x_1)$   (6)
and the corresponding Bayesian network loses one arrow:
Please note that there is not a unique way to factorize a joint probability distribution. For instance, for two variables we can choose $\pi(x_1, x_2) = \pi(x_2 \,|\, x_1)\, \pi(x_1)$ or $\pi(x_1, x_2) = \pi(x_1 \,|\, x_2)\, \pi(x_2)$. Consequently, there are several Bayesian network topologies to describe the same joint probability distribution, which usually differ in both the number of arrows and their orientations. An objective in Bayesian network modeling is therefore to find a topology which is simple, e.g., possesses a small number of connections, or a topology with arrows reflecting the causal relationships between variables. The specific topology of a Bayesian network can be exploited in Bayesian inference.
Bayesian networks may be classified according to whether variables are continuous or discrete in the network or the type of prior distribution imposed on variables, e.g., Gaussian distributions. Three general inference tasks can be identified, e.g., Koller and Friedman (2009):
1. Determining the links between variables and their orientation,
2. Estimating the parameters of the conditional prior distributions, and
3. Inferring the values of unobserved variables.
In this paper, we restrict ourselves to Bayesian networks with all prior conditional distributions being multivariate normal and possibly non-linear relationships between variables. We deal only with the inference of unobserved variables (task 3), which is the essential inference task in nuclear data evaluation. To develop the concepts and notation, we first review the Generalized Least Squares method, already linking it to Bayesian networks, and afterwards extend the discussion to nested and non-linear relationships.
II.2 Generalized Least Squares recapitulated
The Bayesian version of the Generalized Least Squares (GLS) method, e.g., Muir (1989), is usually regarded as a method to update the parameters of a model based on data from experiments affected by statistical and systematic errors. In the nuclear data context, parameters may be the cross sections associated with the points of an energy mesh or the parameters of a nuclear physics model. This view suggests that model parameters and experimental errors are quantities of a different nature, both from the point of view of physics and of Bayesian statistics. However, from the point of view of Bayesian statistics, all these quantities are variables with uncertain values and can be treated in the same way. In order to introduce the Bayesian network interpretation, we present the GLS formulas without regard to the particular meaning of the variables to emphasize this symmetry.
The GLS method is based on the multivariate normal distribution, which is of the form
$\mathcal{N}(\vec x \,|\, \vec\mu, \Sigma) = (2\pi)^{-N/2}\, |\Sigma|^{-1/2} \exp\left(-\frac{1}{2} (\vec x - \vec\mu)^T \Sigma^{-1} (\vec x - \vec\mu)\right)$   (7)
and characterized by a mean vector $\vec\mu$ and a covariance matrix $\Sigma$. We use the notation $\vec x \sim \mathcal{N}(\vec\mu, \Sigma)$ to indicate that the random vector $\vec x$ is distributed according to a multivariate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$, hence its probability density function is given by eq. 7.
Assume we have a set of variables whose values are uncertain and that we assemble these variables into a vector $\vec x$. Some variables are either completely or to a certain degree determined by other variables. For instance, the values of the parameters of a deterministic nuclear model determine completely the resulting predictions. In contrast, for a stochastic nuclear model, the model parameters certainly influence the predictions but do not completely determine them due to the stochastic nature of the simulation. In a nuclear experiment, the measurement is influenced by the underlying true values of nuclear properties but not completely determined due to various measurement errors.
To account for these dependencies, we partition the variables in $\vec x$ into two subvectors, $\vec x_I$ and $\vec x_D$. We refer to the variables whose indices are in the set $I$ as independent variables and assume them to be governed by a multivariate normal distribution, i.e., $\vec x_I \sim \mathcal{N}(\vec\mu_I, \Sigma_I)$. The variables with their indices being in $D$ are referred to as dependent variables and they are assumed to be functions of the independent variables,
$\vec x_D = f(\vec x_I) + \vec\epsilon$   (8)
We introduced the random vector $\vec\epsilon$ to allow for a stochastic perturbation of the otherwise deterministic link. The vectors $\vec\epsilon$ and $\vec x_D$ are of the same size. Please note that the variables in $\vec\epsilon$ also have to be regarded as independent variables, even though their indices are not included in $I$ for notational reasons. In nuclear data evaluation, $\vec\epsilon$ typically contains the statistical errors. We assume $\vec\epsilon$ to be governed by a multivariate normal distribution, i.e., $\vec\epsilon \sim \mathcal{N}(\vec\mu_\epsilon, \Sigma_\epsilon)$, and the random vector $\vec\epsilon$ to be independent of $\vec x_I$. With these specifications, the joint probability distribution factorizes therefore into
$\pi(\vec x_I, \vec\epsilon, \vec x_D) = \pi(\vec x_D \,|\, \vec x_I, \vec\epsilon)\, \pi(\vec x_I)\, \pi(\vec\epsilon)$   (9)
with the following corresponding Bayesian network:
A non-linear relationship in eq. 8 can be cast into the GLS framework by constructing a linear Taylor approximation. A linear Taylor approximation of eq. 8 is of the form
$\vec x_D = \vec y_{\mathrm{ref}} + S\, (\vec x_I - \vec x_{\mathrm{ref}}) + \vec\epsilon$   (10)
where $\vec y_{\mathrm{ref}} = f(\vec x_{\mathrm{ref}})$ and $S$ is the Jacobian matrix of $f$ evaluated at $\vec x_{\mathrm{ref}}$. The Jacobian matrix is sometimes also referred to as sensitivity matrix. At this point it may be regarded as unnecessary to introduce a potentially non-zero reference vector $\vec x_{\mathrm{ref}}$ because the constant term $\vec y_{\mathrm{ref}} - S\, \vec x_{\mathrm{ref}}$ is an additive (i.e., also linear) contribution to the result. However, this term will become important later in the treatment of nested relationships.
Depending on the specific construction of $S$, this equation may implement interpolation using splines, a Legendre polynomial, a Fourier polynomial or any other type of polynomial. For instance, for spline interpolation $S$ may map the values at the knot points given in $\vec x_I$ to the locations of the data points in $\vec x_D$. In the case of a Legendre polynomial, the Jacobian maps the values of the Legendre coefficients given in $\vec x_I$ to the function values at the positions of the data points whose values are given in $\vec x_D$. Equation 10 is therefore general and can accommodate many different types of interpolation schemes. Importantly, this equation as a building block does not only cover interpolation but, for instance, is also used to distribute normalization errors given in $\vec x_I$ to the data points in $\vec x_D$.
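As an illustration of such a construction, the following R sketch assembles the sparse Jacobian $S$ of a piecewise linear interpolation from a coarse energy mesh to the energies of measured data points, using the sparseMatrix function of the Matrix package; all energies are invented:

library(Matrix)
mesh_en <- c(1, 2, 5, 10)     # energies of the mesh points (values in x_I)
expt_en <- c(1.5, 4, 7)       # energies of the measured points (values in x_D)
idx <- findInterval(expt_en, mesh_en)      # left knot of each data point
w   <- (expt_en - mesh_en[idx]) /
       (mesh_en[idx + 1] - mesh_en[idx])   # interpolation weights
# two non-zero entries per row: the weights of the two enclosing knots
S <- sparseMatrix(i = rep(seq_along(expt_en), 2),
                  j = c(idx, idx + 1),
                  x = c(1 - w, w),
                  dims = c(length(expt_en), length(mesh_en)))
# S %*% x_mesh yields the interpolated values at the experiment energies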
It will be helpful to have a compact formula to express $\vec x$, i.e., both $\vec x_I$ and $\vec x_D$, as a function of the independent variables $\vec x_I$ and $\vec\epsilon$. In other words, we want to extend eq. 10 so that its result is the full vector $\vec x$. To this end, we introduce a vector $\vec u$ of the same size as $\vec x$ whose components are given by
$u_i = \begin{cases} x_i & i \in I \\ \epsilon_i & i \in D \end{cases}$   (11)
where $\epsilon_i$ denotes the element of $\vec\epsilon$ associated with the dependent variable $x_i$. This vector contains all independent variables, which completely determine the values in $\vec x$.
Analogously, we also introduce a reference vector $\vec r$ with elements given by
$r_i = \begin{cases} x_{\mathrm{ref},i} & i \in I \\ 0 & i \in D \end{cases}$   (12)
Finally, we regard $S$ as the respective block in a larger matrix $\bar S$ whose blocks are defined as
$\bar S_{II} = \mathbb{1}, \quad \bar S_{ID} = 0, \quad \bar S_{DI} = S, \quad \bar S_{DD} = \mathbb{1}$   (13)
Please note that even though we use the term block, the ordering of the rows and columns does not matter, as long as the index sets $I$ and $D$ do not contain common indices and all indices from $1$ to $N$ are present in $I \cup D$. The same is true for the vectors $\vec u$ and $\vec r$.
The introduction of these quantities allows us to rewrite eq. 10 to cover both dependent and independent variables,
$\vec x = \vec t + \bar S\, (\vec u - \vec r)$   (14)
where the vector $\vec t$ contains the propagated reference values, i.e., $t_i = x_{\mathrm{ref},i}$ for $i \in I$ and $t_i = f(\vec x_{\mathrm{ref}})_i$ for $i \in D$. Regarding the Bayesian network interpretation, this equation propagates the values of the variables without parent nodes, i.e., $\vec x_I$ and $\vec\epsilon$, to all the other variables. Now with all variables and their mutual relationships defined, we can deal with the question of inference in the Bayesian framework.
Bayesian statistics can be seen as a framework to update the knowledge about the possible values of variables based on the observation of some variables. Using our notation, the Bayesian inference formula to achieve this update is given by
$\pi(\vec x_I \,|\, \vec x_D) = \frac{\pi(\vec x_D \,|\, \vec x_I)\, \pi(\vec x_I)}{\pi(\vec x_D)}$   (15)
where we marginalized over $\vec\epsilon$ to treat it as a nuisance variable, i.e., $\pi(\vec x_D \,|\, \vec x_I) = \int \pi(\vec x_D \,|\, \vec x_I, \vec\epsilon)\, \pi(\vec\epsilon)\, \mathrm{d}\vec\epsilon$.
The GLS method is a special case of the general Bayesian update formula to update the knowledge about $\vec x_I$ and $\vec\epsilon$ based on the observed values of $\vec x_D$ in eq. 8 if the following assumptions are met:
• The relationship $f$ between $\vec x_I$ and $\vec x_D$ in eq. 8 is linear, and
• the prior distributions of $\vec x_I$ and $\vec\epsilon$ are multivariate normal and no prior correlations exist between elements in $\vec x_I$ and elements in $\vec\epsilon$.
The resulting posterior distribution is then also multivariate normal and the posterior mean vector and covariance matrix can be obtained by analytic matrix formulas.
To state the GLS formulas, we introduce the prior mean vector $\vec\mu$ and prior covariance matrix $\Sigma$ which characterize the prior on $\vec x_I$ and $\vec\epsilon$ simultaneously, i.e., the prior on $\vec u$. The priors on these vectors have already been introduced along with eq. 8, which we repeat here for the convenience of the reader: $\vec x_I \sim \mathcal{N}(\vec\mu_I, \Sigma_I)$ and $\vec\epsilon \sim \mathcal{N}(\vec\mu_\epsilon, \Sigma_\epsilon)$. The matrices $\Sigma_I$ and $\Sigma_\epsilon$ are now considered to be the blocks of the compound covariance matrix $\Sigma$ at the positions defined by the index sets $I$ and $D$, respectively. Due to the independence assumption between $\vec x_I$ and $\vec\epsilon$, we have $\Sigma_{ID} = \Sigma_{DI} = 0$. Let us assume that we have observed the values of the vector $\vec x_D$, i.e., $\vec x_D = \vec x_{\mathrm{obs}}$. Observed nodes in the Bayesian network are colored grey:
The update formula to obtain the posterior covariance matrix $\Sigma_I'$ is given by
$\Sigma_I' = \Sigma_I - \Sigma_I S^T \left(S \Sigma_I S^T + \Sigma_\epsilon\right)^{-1} S\, \Sigma_I$   (16)
and the update formula for the posterior mean vector $\vec\mu_I'$ by
$\vec\mu_I' = \vec\mu_I + \Sigma_I S^T \left(S \Sigma_I S^T + \Sigma_\epsilon\right)^{-1} \left(\vec x_{\mathrm{obs}} - \vec y_{\mathrm{ref}} - S (\vec\mu_I - \vec x_{\mathrm{ref}}) - \vec\mu_\epsilon\right)$   (17)
A derivation of eqs. 16 and 17 can be found in appendix A.
These equations can be written in more compact form. The assumption of zero cross-covariance elements between $\vec x_I$ and $\vec\epsilon$, i.e., $\Sigma_{ID} = \Sigma_{DI} = 0$, allows us to write
$S \Sigma_I S^T + \Sigma_\epsilon = \left(\bar S\, \Sigma\, \bar S^T\right)_{DD}$   (18)
If we in addition take into account the structure of $\bar S$ in eq. 13 and define the vector $\vec\zeta$ of propagated prior expectations,
$\vec\zeta = \vec t + \bar S\, (\vec\mu - \vec r)$   (19)
we can rewrite eq. 17 as
$\vec\mu' = \vec\mu + \Sigma\, \bar S_{D:}^T \left[\left(\bar S\, \Sigma\, \bar S^T\right)_{DD}\right]^{-1} \left(\vec x_{\mathrm{obs}} - \vec\zeta_D\right)$   (20)
where $\bar S_{D:}$ denotes the rows of $\bar S$ associated with the index set $D$. The vector $\vec\mu'$ and the covariance matrix $\Sigma'$ are the updated distribution parameters of the multivariate normal distribution associated with $\vec u$ due to observing the values in $\vec x_D$, i.e., $\vec u \,|\, \vec x_{\mathrm{obs}} \sim \mathcal{N}(\vec\mu', \Sigma')$.
To discuss the application of the GLS method for a Bayesian network with more nodes, we consider the following scenario for the purpose of illustration. So far we did not impose further constraints on the prior of $\vec\epsilon$ except being a multivariate normal distribution. If, however, $\vec\epsilon$ can be decomposed into two a priori mutually independent blocks $\vec\epsilon_1$ and $\vec\epsilon_2$, i.e., $\Sigma_\epsilon = \mathrm{diag}(\Sigma_{\epsilon_1}, \Sigma_{\epsilon_2})$, we may split up eq. 10 into two equations:
$\vec x_{D_1} = \vec y_{\mathrm{ref},1} + S_1\, (\vec x_I - \vec x_{\mathrm{ref}}) + \vec\epsilon_1$   (21)
$\vec x_{D_2} = \vec y_{\mathrm{ref},2} + S_2\, (\vec x_I - \vec x_{\mathrm{ref}}) + \vec\epsilon_2$   (22)
If we regard the values in $\vec x_I$ as fixed, the remaining contributions to $\vec x_{D_1}$ and $\vec x_{D_2}$, given by $\vec\epsilon_1$ and $\vec\epsilon_2$, respectively, are independent of each other due to our independence assumption for the blocks in $\Sigma_\epsilon$ above. Due to this conditional independence of $\vec x_{D_1}$ and $\vec x_{D_2}$, there is no arrow between $\vec x_{D_1}$ and $\vec x_{D_2}$. Considering this conditional independence and the values in $\vec x_{D_1}$ and $\vec x_{D_2}$ being given as functions of $\vec x_I$, we have the following Bayesian network:
In this scenario, one may assume that only the variables in $\vec x_{D_1}$ have been observed, i.e., the variables in the vector $\vec x_K$ referred to by the index set $K$.
Keeping this example in mind, we want to discuss the general form of the GLS equations for a Bayesian network without intermediate nodes in which some of the terminal nodes have been observed, which amounts to knowing the values of some variables in the vector $\vec x_D$ in eq. 10. The treatment of nested relationships will be discussed later in section II.5.
We denote by $K$ the index set referring to the observed values in $\vec x$, i.e., $\vec x_K$ is the subvector with observed values. For instance, in the example above we can identify $K = D_1$. Let us denote by $B$ the index set with all indices that are not in $K$, i.e., $B = \{1, \dots, N\} \setminus K$. In this case, we can consider eq. 14 to realize that the variables in the vector $\vec u_B$, which includes all variables in $\vec x_I$ and a subset of the variables in $\vec\epsilon$, can be regarded as the independent variables. The dependent and measured variables are now those in $\vec x_K$. In the following, $\bar S_{K:}$ denotes the rows of $\bar S$ associated with the index set $K$. For this more general case, the structure of the update formulas remains the same.
For the posterior covariance matrix we have
$\Sigma' = \Sigma - \Sigma\, \bar S_{K:}^T \left(\bar S_{K:}\, \Sigma\, \bar S_{K:}^T\right)^{-1} \bar S_{K:}\, \Sigma$   (23)
and for the posterior mean vector
$\vec\mu' = \vec\mu + \Sigma\, \bar S_{K:}^T \left(\bar S_{K:}\, \Sigma\, \bar S_{K:}^T\right)^{-1} \left(\vec x_{\mathrm{obs}} - \vec\zeta_K\right)$   (24)
If we assume that the random variables in $\vec u_K$ are independent of those in $\vec u_B$, we can again simplify the equations. Later, when dealing with Bayesian networks with more nodes, this independence assumption will be fulfilled because variables of the form $\vec\epsilon$ associated with different nodes are a priori independent and all variables associated with a single node are either observed or unobserved at the same time. We do not consider situations in which only a part of the variables associated with a node is observed.
If the observed values are $\vec x_{\mathrm{obs}}$, the vector $\vec\zeta$ of propagated prior expectations is now, as in eq. 19, given by
$\vec\zeta = \vec t + \bar S\, (\vec\mu - \vec r)$   (25)
In addition we define a residual vector $\vec d$ as
$\vec d = \vec x_{\mathrm{obs}} - \vec\zeta_K$   (26)
Here and below, $\bar S_{KB}$ denotes the block of $\bar S$ with rows in $K$ and columns in $B$, and $\Sigma_{BB}$ and $\Sigma_{KK}$ the corresponding diagonal blocks of $\Sigma$.
With these specifications, the posterior covariance matrix of the independent variables can be written as
$\Sigma_B' = \left(\Sigma_{BB}^{-1} + \bar S_{KB}^T\, \Sigma_{KK}^{-1}\, \bar S_{KB}\right)^{-1}$   (27)
and the posterior mean vector as
$\vec\mu_B' = \vec\mu_B + \Sigma_B'\, \bar S_{KB}^T\, \Sigma_{KK}^{-1}\, \vec d$   (28)
The outer inversion in eq. 27 never has to be evaluated explicitly. In section II.3, we discuss how blocks of the posterior covariance matrix of interest can be computed and how to draw samples from the posterior distribution. The multiplication with $\Sigma_B'$ in eq. 28 can be reformulated as the task to solve a system of linear equations,
$\left(\Sigma_{BB}^{-1} + \bar S_{KB}^T\, \Sigma_{KK}^{-1}\, \bar S_{KB}\right) \vec z = \bar S_{KB}^T\, \Sigma_{KK}^{-1}\, \vec d$
with the solution vector $\vec z$ representing the result of the matrix product in eq. 28. As the matrices $\Sigma_{BB}^{-1}$, $\Sigma_{KK}^{-1}$ and $\bar S_{KB}$ are typically sparse, we employ sparse matrix algorithms to greatly speed up the computation of the posterior expectation $\vec\mu_B'$.
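A minimal R sketch of this strategy, with invented toy matrices standing in for $\Sigma_{BB}$, $\Sigma_{KK}$, $\bar S_{KB}$ and $\vec d$, reads:

library(Matrix)
Sigma_B <- Diagonal(x = c(1, 1, 0.5))          # prior covariance of u_B
S_KB    <- sparseMatrix(i = c(1, 1, 2, 2), j = c(1, 2, 2, 3),
                        x = c(1, 0.5, 1, 1), dims = c(2, 3))
Sigma_K <- Diagonal(x = c(0.1, 0.2))           # covariance of u_K
d       <- c(0.3, -0.1)                        # residual vector of eq. 26
# inverse posterior covariance of eq. 36, sparse if the inputs are sparse
inv_post <- solve(Sigma_B) + t(S_KB) %*% solve(Sigma_K) %*% S_KB
# posterior mean shift z of eq. 28 obtained by a sparse linear solve
z <- solve(inv_post, t(S_KB) %*% solve(Sigma_K, d))
mu_post_B <- c(0, 0, 0) + as.vector(z)         # prior means assumed zero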
Please note that so far we have only recapitulated one version of the formulas of the Generalized Least Squares method and introduced specific notation. These formulas will be the basis for inference in Bayesian networks with linear relationships and, with a slight modification explained in section II.6, also in the case of non-linear relationships. Using these formulas implies that observed nodes are always attached to a vector $\vec\epsilon$ with non-zero prior uncertainties. We feel that this assumption is not a serious limitation in the context of nuclear data evaluation because measurements are always associated with a statistical uncertainty component. If this is not the case, the prior uncertainty associated with the elements of $\vec\epsilon$ can be made very small, which effectively eliminates the impact of these noise nodes on the results, although it may be detrimental to numerical stability.
Deterministic relationships, i.e., those associated with a zero diagonal element in the covariance matrix $\Sigma$, must be treated as a special case because $\Sigma_{BB}^{-1}$ is not defined anymore. Assume that $Z$ is the subset of $B$ associated with zero prior uncertainties and $\tilde B$ is the subset of $B$ with non-zero prior uncertainties. The approach to deal with deterministic relationships is to use $\tilde B$ instead of $B$ in eq. 28 and eq. 27. Afterward the full posterior mean vector and covariance matrix can be obtained by imputing the prior mean vector and covariance matrix to the missing blocks,
$\vec\mu_B' = \begin{pmatrix} \vec\mu_{\tilde B}' \\ \vec\mu_Z \end{pmatrix}, \qquad \Sigma_B' = \begin{pmatrix} \Sigma_{\tilde B \tilde B}' & 0 \\ 0 & \Sigma_{ZZ} \end{pmatrix}$   (29)
In order to obtain the posterior expectation of $\vec u_K$, we note that the known vector $\vec x_{\mathrm{obs}}$ of the observed node is given by the sum of the posterior expectation $\vec\mu_K'$ and the posterior expectations in $\vec u_B$ propagated to the observed node by means of eq. 14. With the result of the propagation given by
$\vec\zeta_K' = \vec t_K + \bar S_{KB}\, (\vec\mu_B' - \vec r_B)$   (30)
we can therefore compute the posterior expectation by
$\vec\mu_K' = \vec x_{\mathrm{obs}} - \vec\zeta_K'$   (31)
To compute the posterior covariance matrix blocks $\Sigma_{KB}'$ and $\Sigma_{KK}'$, we remark that the following relationships for the covariance matrix of random vectors hold:
$\mathrm{Cov}(\vec x + \vec y, \vec z) = \mathrm{Cov}(\vec x, \vec z) + \mathrm{Cov}(\vec y, \vec z)$ and $\mathrm{Cov}(A \vec x, \vec y) = A\, \mathrm{Cov}(\vec x, \vec y)$, where the notation $\mathrm{Cov}(\vec x, \vec y)$ denotes the covariance matrix between the random vectors $\vec x$ and $\vec y$.
Introducing the abbreviation $\vec h = \bar S_{KB} (\vec u_B - \vec r_B)$ and calculating the covariance of both sides of the equation $\vec x_{\mathrm{obs}} = \vec t_K + \vec h + \vec u_K$ with $\vec u_B$, while noting that the left-hand side is a constant vector with zero covariance, we find $\Sigma_{KB}'$ in terms of the corresponding posterior covariance matrix block of $\vec u_B$:
$\Sigma_{KB}' = -\bar S_{KB}\, \Sigma_{BB}'$   (32)
The application of the variance operator on both sides of the rearranged equation $\vec u_K = \vec x_{\mathrm{obs}} - \vec t_K - \vec h$ yields
$\Sigma_{KK}' = \bar S_{KB}\, \Sigma_{BB}'\, \bar S_{KB}^T$   (33)
In summary, with eqs. 27, 28, 29, 30, 31, 32 and 33 we obtain the posterior expectation and covariance matrix for the full vector $\vec u$ of independent variables by conditioning on the known vector $\vec x_{\mathrm{obs}}$ of one (or several) observed nodes, for both stochastic and deterministic linear functional relationships between variables. The posterior expectation associated with the dependent nodes in $\vec x$ can be obtained by using eq. 14 with the posterior expectation of the independent variables:
$\mathrm{E}'[\vec x] = \vec t + \bar S\, (\vec\mu' - \vec r)$   (34)
The posterior covariance matrix of dependent nodes is given by
$\mathrm{Cov}'(\vec x) = \bar S\, \Sigma'\, \bar S^T$   (35)
The important feature of these formulas is that all matrices involved are typically sparse in nuclear data evaluation if we do not absorb functional relationships, such as those for normalization errors, into the covariance matrix but explicitly keep them in $\bar S$. The matrix $\bar S$ contains the information about the connection structure of the Bayesian network. In this paper, we do not need specialized algorithms for Bayesian networks to exploit the connection structure but can rely on general libraries for sparse matrix operations instead. In the next section, we elaborate on ways to deal with the large posterior covariance matrix by exploiting sparsity. Afterwards, we focus on the Bayesian network interpretation and discuss the proper handling of nested and non-linear relationships.
II.3 Evaluating the posterior covariance matrix
When working with tens or even hundreds of thousands of variables, it is often not practical to compute or store the full posterior covariance matrix explicitly. Even if we just need to evaluate the posterior covariance block of the independent variables, it may still be too large in practice.
In nuclear data evaluation, the ability to compute at least the posterior covariance matrix block associated with the evaluated cross sections is very important because this information is needed for linear error propagation through application codes. However, sometimes linear error propagation based on the covariance matrix is not feasible with a reasonable amount of computational resources. For instance, if we have to use a very fine energy mesh, e.g., several tens of thousands of mesh points, in energy regions with quickly oscillating cross sections, even the handling of the posterior covariance matrix block associated with cross sections becomes impractical. In such cases, we may prefer to draw samples from the posterior distribution and to subsequently propagate these samples through application codes according to the Total Monte Carlo (TMC) method Koning and Rochman (2008); Rochman et al. (2009, 2010).
Therefore, feasible approaches to evaluate blocks of the posterior covariance matrix and to sample from the posterior distribution using the full posterior covariance matrix are both important in the field of nuclear data.
In the following we explain the computation of blocks of the posterior covariance matrix and the sampling using the full posterior covariance matrix. For both requirements, we exploit the key insight that the inverse of the posterior covariance matrix in eq. 27,
$\left(\Sigma_B'\right)^{-1} = \Sigma_{BB}^{-1} + \bar S_{KB}^T\, \Sigma_{KK}^{-1}\, \bar S_{KB}$   (36)
can be very sparse even though the posterior covariance matrix is not. The reason is that off-diagonal elements in the posterior covariance matrix are non-zero if the associated variables are correlated, irrespective of whether this correlation is mediated by a third variable. In contrast to that, elements in the inverse posterior covariance matrix are zero if the associated variables are conditionally independent given all other variables, e.g., (Koller and Friedman, 2009, Chap. 7.1.3). In other words, if two variables are only correlated because they are both influenced by one or a set of other variables, they are conditionally independent given those variables. The approaches described in the following to obtain random samples from the posterior distribution and to calculate blocks of the full posterior covariance matrix rely on the property that the inverse posterior covariance matrix is sparse. This sparseness property is present in all the Bayesian network examples we present in sections III.1, III.2 and III.3 thanks to the sparse Gaussian process construction explained in section II.7 and the specific handling of normalization errors.
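A minimal R illustration of this property uses a chain of three variables where each variable adds independent unit-variance noise to the previous one (numbers invented for demonstration):

L <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 1, 1), nrow = 3, byrow = TRUE)  # cumulative-sum mapping
Sigma <- L %*% t(L)   # covariance: fully dense, all variables correlated
solve(Sigma)          # precision: tridiagonal, reflecting the chain x1 -> x2 -> x3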
First we discuss one way to obtain samples from the posterior distribution that is amenable to optimization by exploiting sparsity. To sample from a multivariate normal distribution $\mathcal{N}(\vec\mu, \Sigma)$, we can first compute a Cholesky decomposition of the inverse covariance matrix, $\Sigma^{-1} = U^T U$, with $U$ being an upper triangular matrix. Making use of the fact that the elements of $\vec z = U (\vec x - \vec\mu)$ are distributed according to a standard normal distribution, i.e., $\vec z \sim \mathcal{N}(\vec 0, \mathbb{1})$, we can generate a vector $\vec z$ with elements from a standard normal distribution and solve the set of linear equations $U \vec w = \vec z$. Because $U$ is an upper triangular matrix, this system of linear equations can be efficiently solved using backward substitution, e.g., (Press, 2007, Chap. 2.2). A sample from the multivariate normal distribution is then given by $\vec x = \vec\mu + \vec w$.
Therefore, we can perform a sparse Cholesky decomposition of $(\Sigma_B')^{-1}$ in order to apply the approach described in the previous paragraph to obtain samples from the posterior distribution. The essential idea of a sparse Cholesky decomposition of a sparse matrix is to find a permutation matrix $P$ so that the Cholesky decomposition of the permuted matrix is sparse. Several algorithms exist to find such a so-called fill-reducing permutation, e.g., Davis (2006), and a popular one is the minimum-degree algorithm, e.g., George and Liu (1989). Without a permutation applied before the Cholesky decomposition, the resulting Cholesky factor is in general not sparse. We employed the Matrix package Bates and Maechler (2021) available in the programming language R, which itself relies on the CHOLMOD routines Chen et al. (2008) written in C, to perform the sparse Cholesky decomposition. The sparsity of the Cholesky factor does not only help to keep the storage requirement low but also speeds up the solution of the system of linear equations $U \vec w = \vec z$. Finally, because we have only obtained a sample of the variables associated with the indices $B$, we need to compute the missing variables associated with the indices $K$ from the sampled values in the same way as in eq. 31. Please note that the values in the resulting vector are realizations of the independent variables (those in $\vec u$ in the previous section). To obtain a sample of the values at dependent nodes, we have to propagate the sampled values associated with independent nodes by means of eq. 14.
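The following R sketch implements this sampling scheme with the Matrix package; the sparse inverse posterior covariance matrix is an invented tridiagonal example, and the permutation and triangular solves are performed through the CHOLMOD-based Cholesky factor object:

library(Matrix)
set.seed(42)
# invented sparse inverse posterior covariance matrix (tridiagonal)
inv_post <- forceSymmetric(bandSparse(5, k = c(-1, 0, 1),
  diagonals = list(rep(-1, 4), rep(2.5, 5), rep(-1, 4))))
# sparse Cholesky factorization with fill-reducing permutation:
# P inv_post P^T = L L^T with lower triangular L, i.e., U = L^T
ch <- Cholesky(inv_post, perm = TRUE, LDL = FALSE)
z <- rnorm(5)                                  # standard normal vector
w <- solve(ch, z, system = "Lt")               # backward substitution U w = z
smp <- as.vector(solve(ch, w, system = "Pt"))  # undo the permutation
# adding the posterior mean to smp yields a posterior sample of u_B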
Although the computation of the full posterior covariance matrix may be intractable, it is still valuable to be able to compute posterior covariances between quantities of interest. The sparse Cholesky decomposition of the inverse posterior covariance matrix also helps to solve this task. We know that $P \left(\Sigma_B'\right)^{-1} P^T = U^T U$ with the permutation matrix $P$ and an upper triangular matrix $U$. We denote by $\vec e_i$ a vector with all elements being zero except the element at the $i$-th position being one. The element in the $i$-th row and $j$-th column of the posterior covariance matrix is given by
$\Sigma_{B,ij}' = \vec e_i^T\, \Sigma_B'\, \vec e_j = \left(U^{-T} P \vec e_i\right)^T \left(U^{-T} P \vec e_j\right)$   (37)
where we made use of the fact that for permutation matrices $P^{-1} = P^T$. The multiplications $\vec a_i = U^{-T} P \vec e_i$ and $\vec a_j = U^{-T} P \vec e_j$ can be restated as the solution of the sets of linear equations
$U^T \vec a_i = P \vec e_i \quad \text{and} \quad U^T \vec a_j = P \vec e_j$   (38)
which allows the computation of the covariance element as $\Sigma_{B,ij}' = \vec a_i^T \vec a_j$. Blocks of the posterior covariance matrix can be evaluated by assembling the (column) vectors $\vec e_i$ with indices of interest into matrices $E_1$ and $E_2$, and then solving the sets of linear equations
$U^T A_1 = P E_1 \quad \text{and} \quad U^T A_2 = P E_2$   (39)
to finally compute the block as $A_1^T A_2$. This covariance block is associated with a subset of the independent variables in $\vec u_B$. To calculate the covariance matrix block associated with the variables in $\vec u_K$, we can use eq. 32. The product $\bar S_{KB}\, \Sigma_{BB}'$ in this equation can be computed by solving the set of linear equations
$U^T U\, W = P\, \bar S_{KB}^T$   (40)
and then computing $\bar S_{KB}\, \Sigma_{BB}' = W^T P$. Instead of using the full index set $K$, it is perfectly possible to use only a subset comprising only the indices associated with variables of interest.
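Continuing the previous sketch, the following R lines compute a posterior covariance block via eqs. 37 to 39 from the same sparse Cholesky factor and compare it against the dense reference result:

library(Matrix)
inv_post <- forceSymmetric(bandSparse(5, k = c(-1, 0, 1),
  diagonals = list(rep(-1, 4), rep(2.5, 5), rep(-1, 4))))
ch <- Cholesky(inv_post, perm = TRUE, LDL = FALSE)
E <- matrix(0, nrow = 5, ncol = 2)   # columns pick variables 2 and 4
E[2, 1] <- 1; E[4, 2] <- 1
# solve U^T A = P E (eq. 39); "P" applies the permutation, "L" solves L A = P E
A <- solve(ch, solve(ch, E, system = "P"), system = "L")
cov_block <- t(A) %*% A              # 2x2 posterior covariance block
# check against the dense computation (difference should be ~ 0)
max(abs(cov_block - solve(as.matrix(inv_post))[c(2, 4), c(2, 4)]))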
II.4 A simple Bayesian network
Besides computational advantages thanks to exploiting sparsity, the strength of Bayesian networks is their interpretability. Instead of formalizing an estimation problem in nuclear data evaluation in terms of an experimental covariance matrix and a parameter prior covariance matrix, hence converting all functional relationships between variables into elements of a covariance matrix, we maintain and work explicitly with these relationships. Consequently, we are also able to obtain updated posterior distributions of quantities which are often marginalized out, such as systematic errors of experiments. This offers a convenient way to spot inconsistencies in the posterior distribution. For instance, a posterior expectation of a normalization error not consistent with its prior specification is a clear warning that some modeling assumptions are wrong.
To emphasize the advantages of the Bayesian network interpretation and also to prepare the discussion of nested relationships in the next section, we give a simple example of how the fitting of model parameters to an experimental dataset with measurement points affected by statistical and normalization errors may be modeled as a Bayesian network. We then show how to compute the quantities that are required to evaluate the GLS formulas in eqs. 27 and 28.
The relationship of the measured values, the associated systematic and statistical errors, and the model parameters can be written as
$\vec x_{\mathrm{expt}} = f_{\mathrm{mod}}(\vec p) + M_{\mathrm{norm}}\, \vec e_{\mathrm{norm}} + M_{\mathrm{stat}}\, \vec e_{\mathrm{stat}}$   (41)
with $f_{\mathrm{mod}}$ assumed to be a linear model, e.g., the linearized version of a nuclear physics model,
$f_{\mathrm{mod}}(\vec p) = S_{\mathrm{mod}}\, \vec p$   (42)
The matrix $S_{\mathrm{mod}}$ denotes the Jacobian matrix of the model. The mapping matrix $M_{\mathrm{norm}}$ of dimension $N \times 1$, where $N$ is the number of data points, maps the normalization error systematically to each data point. We did not use the same symbol as in eq. 10 to denote the Jacobian matrices because the Jacobian matrices here have to be regarded as submatrices of $S$ and therefore also as submatrices of $\bar S$ defined in eq. 13, as we will discuss in a moment.
For an absolute normalization error, all elements of $M_{\mathrm{norm}}$ are one. Finally, $M_{\mathrm{stat}}$ is given by the identity matrix to map the statistical errors to all the data points. To be fully consistent with the notation introduced in section II.2, we need to introduce the reference vectors associated with the Taylor approximation,
$\vec x_{\mathrm{expt}} = \vec y_{\mathrm{ref}} + S_{\mathrm{mod}}\, (\vec p - \vec p_{\mathrm{ref}}) + M_{\mathrm{norm}}\, \vec e_{\mathrm{norm}} + M_{\mathrm{stat}}\, \vec e_{\mathrm{stat}}$   (43)
with $\vec y_{\mathrm{ref}} = f_{\mathrm{mod}}(\vec p_{\mathrm{ref}}) = S_{\mathrm{mod}}\, \vec p_{\mathrm{ref}}$.
with . The corresponding Bayesian network is given by:
The vectors $\vec u$ and $\vec x$ are
$\vec u^T = \left(\vec p^T, \vec e_{\mathrm{norm}}^T, \vec e_{\mathrm{stat}}^T\right), \qquad \vec x^T = \left(\vec p^T, \vec e_{\mathrm{norm}}^T, \vec x_{\mathrm{expt}}^T\right)$   (44)
The respective reference vectors for the Taylor expansion are given by
$\vec r^T = \left(\vec p_{\mathrm{ref}}^T, \vec 0^T, \vec 0^T\right), \qquad \vec t^T = \left(\vec p_{\mathrm{ref}}^T, \vec 0^T, \vec y_{\mathrm{ref}}^T\right)$   (45)
and the matrix $\bar S$ as required for eq. 14 is of the form
$\bar S = \begin{pmatrix} \mathbb{1} & 0 & 0 \\ 0 & \mathbb{1} & 0 \\ S_{\mathrm{mod}} & M_{\mathrm{norm}} & \mathbb{1} \end{pmatrix}$   (46)
Please note that the vector $\vec e_{\mathrm{stat}}$ plays the role of $\vec\epsilon$ and has the same purpose as in the last section, i.e., to introduce stochasticity into the otherwise deterministic link which connects the model parameters and the normalization error with the experimental measurement vector.
As the values of the experimental measurement vector $\vec x_{\mathrm{expt}}$ are observed, we can use the GLS update formulas in eqs. 27 and 28 to obtain the joint posterior distribution of $\vec p$ and $\vec e_{\mathrm{norm}}$. To that end, the index set $K$ refers to the indices of the elements associated with the block $\vec x_{\mathrm{expt}}$ in $\vec x$.
If we assume the prior covariance matrix for the model parameter vector $\vec p$ to be diagonal, the full prior covariance matrix $\Sigma$, summarizing at the same time our knowledge on the model parameters, the normalization error, and the statistical errors, is diagonal as well. The mapping matrix $\bar S$ is very sparse, except possibly for the block $S_{\mathrm{mod}}$ associated with the mapping of model parameters to the corresponding predictions. However, the matrix $S_{\mathrm{mod}}$ is very rectangular for a nuclear model because there are usually much fewer model parameters than predicted values. If using a mathematical fitting function, we can opt for piecewise linear interpolation or similar local interpolation schemes that yield a sparse Jacobian. Prior knowledge on possible shapes of the function, such as its smoothness, can be introduced via the prior specification. Section II.7 elaborates on a sparse Gaussian process construction for this purpose.
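As a minimal numerical sketch of this network, the following R code assembles the block $\bar S_{KB}$ from invented Jacobian and mapping matrices and evaluates eqs. 27 and 28 directly; all numbers are made up for illustration, and the nucdataBaynet package automates these steps:

library(Matrix)
S_mod  <- Matrix(c(1, 0.5,
                   1, 1.0,
                   1, 1.5), nrow = 3, byrow = TRUE)  # invented model Jacobian
M_norm <- Matrix(1, nrow = 3, ncol = 1)       # absolute normalization mapping
S_KB    <- cbind(S_mod, M_norm)               # block of S-bar: (p, e_norm) -> x_expt
Sigma_B <- Diagonal(x = c(10, 10, 0.25))      # prior covariance of (p, e_norm)
Sigma_K <- Diagonal(x = rep(0.01, 3))         # covariance of the statistical errors
x_obs   <- c(1.1, 1.6, 2.2)                   # invented measured values
# posterior precision (eq. 36) and posterior mean (eq. 28);
# prior means are zero here, so the residual d equals x_obs
inv_post  <- solve(Sigma_B) + t(S_KB) %*% solve(Sigma_K) %*% S_KB
mu_post_B <- as.vector(solve(inv_post, t(S_KB) %*% solve(Sigma_K, x_obs)))
# mu_post_B[1:2]: model parameters, mu_post_B[3]: posterior normalization error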
II.5 Nested relationships
In the previous section, we discussed a very simple Bayesian network that directly connected the independent variables to the dependent variables that are measured. However, the full potential of Bayesian networks can only be exploited if we permit nested relationships. Before we discuss inference in the presence of nested relationships in general, we give a motivating example with a nested relationship to develop intuition.
To this end, we extend the discussion of the previous section to incorporate a positivity constraint. We keep the linear model specification in eq. 42, which may represent a piecewise linear model, a spline or any other sort of polynomial depending on the form of $S_{\mathrm{mod}}$. Because a linear model may produce negative predictions, we apply a transformation $g$ to the model output, with the effect of $g$ being element-wise exponentiation, to enforce positivity. This transformation represents a non-linearity in the model description. In the current discussion, we linearize the non-linear relationship and focus on the nested property. In the next section, we show how to deal exactly with the non-linearity.
In the vicinity of a vector $\vec y_{\mathrm{ref}}$, the linear Taylor approximation of $g$ can be written as
$g(\vec y) \approx g(\vec y_{\mathrm{ref}}) + G(\vec y_{\mathrm{ref}})\, (\vec y - \vec y_{\mathrm{ref}})$   (47)
with $G(\vec y_{\mathrm{ref}})$ being a diagonal matrix with the diagonal elements $G_{ii} = \exp(y_{\mathrm{ref},i})$, where $y_{\mathrm{ref},i}$ is the $i$-th element of $\vec y_{\mathrm{ref}}$. Please note that we did not add a random vector $\vec\epsilon$ to eq. 47 because we deal with a deterministic relationship. For the application of eq. 11 and later equations, such a node must be formally present. Formally, we therefore set all elements in the respective blocks in the reference vector $\vec r$, the prior estimate $\vec\mu$ and the covariance matrix $\Sigma$ to zero, which is equivalent to removing the node altogether.
The Taylor approximation for the compound function $g(f_{\mathrm{mod}}(\vec p))$ can be obtained by plugging the functional form of $f_{\mathrm{mod}}$ into the Taylor approximation of $g$:
$g(f_{\mathrm{mod}}(\vec p)) \approx g(\vec y_{\mathrm{ref}}) + G(\vec y_{\mathrm{ref}})\, S_{\mathrm{mod}}\, (\vec p - \vec p_{\mathrm{ref}})$   (48)
Finally, the Taylor approximation of this compound function can be combined with the normalization and statistical error, as has already been done in eq. 43,
$\vec x_{\mathrm{expt}} = g(\vec y_{\mathrm{ref}}) + G(\vec y_{\mathrm{ref}})\, S_{\mathrm{mod}}\, (\vec p - \vec p_{\mathrm{ref}}) + M_{\mathrm{norm}}\, \vec e_{\mathrm{norm}} + M_{\mathrm{stat}}\, \vec e_{\mathrm{stat}}$   (49)
with $\vec y_{\mathrm{ref}} = S_{\mathrm{mod}}\, \vec p_{\mathrm{ref}}$.
with . The Bayesian network representing the relationships between the variables in this case is illustrated here:
The quantities needed to evaluate eqs. 28 and 27 are therefore given by
$\vec u^T = \left(\vec p^T, \vec\epsilon_{\mathrm{mod}}^T, \vec e_{\mathrm{norm}}^T, \vec e_{\mathrm{stat}}^T\right), \qquad \vec\mu^T = \left(\vec\mu_p^T, \vec 0^T, \vec 0^T, \vec 0^T\right)$   (50)
The second element in $\vec\mu$ is zero because we regard the link from the node $\vec p$ to the model predictions as deterministic, which is formally resolved by stating that the associated parentless node $\vec\epsilon_{\mathrm{mod}}$ is always zero and that its reference point, prior expectation and associated prior covariance matrix elements in $\Sigma$ are zero as well.
The reference vectors of the Taylor expansion are given by
$\vec r^T = \left(\vec p_{\mathrm{ref}}^T, \vec 0^T, \vec 0^T, \vec 0^T\right), \qquad \vec t^T = \left(\vec p_{\mathrm{ref}}^T, \vec y_{\mathrm{ref}}^T, \vec 0^T, g(\vec y_{\mathrm{ref}})^T\right)$   (51)
and the sensitivity matrix is of the form
$\bar S = \begin{pmatrix} \mathbb{1} & 0 & 0 & 0 \\ S_{\mathrm{mod}} & \mathbb{1} & 0 & 0 \\ 0 & 0 & \mathbb{1} & 0 \\ G(\vec y_{\mathrm{ref}})\, S_{\mathrm{mod}} & G(\vec y_{\mathrm{ref}}) & M_{\mathrm{norm}} & \mathbb{1} \end{pmatrix}$   (52)
where the blocks refer to $\vec x^T = (\vec p^T, \vec y_{\mathrm{mod}}^T, \vec e_{\mathrm{norm}}^T, \vec x_{\mathrm{expt}}^T)$, with $\vec y_{\mathrm{mod}}$ denoting the intermediate node with the model predictions.
The matrix $\bar S$ is again very sparse, with the potential exception of the blocks containing $S_{\mathrm{mod}}$.
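The chain rule underlying eqs. 47 and 48, and hence the lower-left block of eq. 52, can be made explicit in a few lines of R (all numbers invented):

library(Matrix)
S_mod <- Matrix(c(1, 0.5,
                  1, 1.0,
                  1, 1.5), nrow = 3, byrow = TRUE)
p_ref <- c(0.1, 0.2)
y_ref <- as.vector(S_mod %*% p_ref)   # reference model output
G <- Diagonal(x = exp(y_ref))         # Jacobian of element-wise exp at y_ref
S_compound <- G %*% S_mod             # Jacobian of g(f_mod(p)) at p_ref
# first-order prediction of eq. 48 for parameters close to p_ref
p <- c(0.12, 0.18)
pred <- exp(y_ref) + as.vector(S_compound %*% (p - p_ref))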
Because the values of the experimental measurement vector $\vec x_{\mathrm{expt}}$ are known, the indices in the index set $K$ introduced in the paragraph above eq. 23 refer to the positions of the elements of $\vec x_{\mathrm{expt}}$ in the vector $\vec x$. The respective overall Jacobian matrix $\bar S$ was constructed by hand as an example of how it is done in principle. For more complex Bayesian networks with more deeply nested functions, the manual approach is impractical. In such scenarios, a programmatic recipe to construct $\bar S$ based on the Jacobian matrices of the individual functional relationships is pertinent.
To this end, we remind ourselves of the notation established in section II.2. There we introduced a vector $\vec x$ that comprised the subvector $\vec x_I$ of independent variables and the dependent variables $\vec x_D$ whose values are partially determined as a function of the independent variables, $\vec x_D = f(\vec x_I) + \vec\epsilon$. In the Bayesian network picture, the vector $\vec\epsilon$ contains all the parentless nodes directly attached to the dependent nodes. We also combined the vectors $\vec x_I$ and $\vec\epsilon$ into the vector $\vec u$ that contained the variables associated with all independent nodes. The mental distinction between the parentless nodes in $\vec\epsilon$ and those associated with the variables in $\vec x_I$ was to structurally ensure that we can apply the GLS method by using eq. 27 and eq. 28.
The GLS formula for the posterior expectation, eq. 28, relies on the availability of the vector $\vec\zeta$ containing the reference values propagated through the network and the overall Jacobian matrix $\bar S$. An example of the overall Jacobian matrix for a nested relationship was given in eq. 52.
To describe the necessary propagation and the computation of the overall Jacobian, we first introduce the notion of what we call a mapping. Each individual functional relationship between two blocks of variables can be characterized by a set of source indices $S_k$, a set of target indices $T_k$ and a function $f_k$. The two sets must not have indices in common, i.e., $S_k \cap T_k = \emptyset$.
Each individual mapping takes the elements in $\vec x$ indexed by $S_k$ and applies the function $f_k$ to them in order to obtain the contribution to the values of the elements in $\vec x$ indexed by $T_k$. In other words, the application of a mapping $M_k$ on a vector $\vec x$ produces a new vector $\vec x'$ of the same size given by
$\vec x_{T_k}' = \vec x_{T_k} + f_k(\vec x_{S_k}), \qquad x_i' = x_i \;\; \text{for} \;\; i \notin T_k$   (53)
In the presented example, for instance, the function $g$ to enforce positivity together with the set of source indices referring to the block $\vec y_{\mathrm{mod}}$ in $\vec x$ and the set of target indices referring to the block $\vec x_{\mathrm{expt}}$ in $\vec x$ is a mapping, which we denote by $M_{\mathrm{exp}}$. The other mappings are associated with the linear model and the distribution of normalization errors and statistical errors to the experimental data points, and are denoted by $M_{\mathrm{mod}}$, $M_{\mathrm{norm}}$ and $M_{\mathrm{stat}}$, respectively. The mappings $M_{\mathrm{exp}}$, $M_{\mathrm{norm}}$ and $M_{\mathrm{stat}}$ contribute additively to the experimental measurement $\vec x_{\mathrm{expt}}$, hence their sets of target indices are identical.
Illustrated on the current example, the propagation of the independent values in $\vec u$ to obtain the values associated with $\vec x$ works in the following way. Given that the subvector $\vec x_I$ is specified, we can compute the elements of the dependent variables by applying the mappings in the correct order. In the example with the positivity constraint, we first have to apply $M_{\mathrm{mod}}$ to obtain $\vec y_{\mathrm{mod}}$ and then apply $M_{\mathrm{exp}}$, $M_{\mathrm{norm}}$ and $M_{\mathrm{stat}}$ to obtain $\vec x_{\mathrm{expt}}$. One mapping that depends on the output of another mapping must not be applied before that other mapping. Importantly, after all mappings that refer to the same target index set have been applied and before the application of any subsequent mapping, we need to also add the elements of $\vec u$ associated with the target indices to account for the additive contributions represented by $\vec\epsilon$.
As a general recipe, the order of the mappings is established by the following requirement: for two mappings $M_k$ and $M_l$ with $k < l$, the set of source indices $S_k$ must not contain elements of the set of target indices $T_l$, i.e., $S_k \cap T_l = \emptyset$. Please note that this criterion may leave some ambiguity in the order, which is not a problem. For instance, in our running example, it does not matter whether $M_{\mathrm{norm}}$ is applied before or after $M_{\mathrm{stat}}$.
Having the mappings in the correct order, the propagation of values in a general situation can be achieved as follows. Initially, we set $\vec x_I = \vec u_I$ and $\vec x_D = \vec 0$. The mappings are then applied one by one in an order consistent with the criterion above. The resulting vector of one mapping serves as input to the next mapping. After the application of the final mapping, the vector $\vec x$ contains the correctly propagated values of the independent variables in $\vec u$.
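For the running example, the propagation recipe reads as follows in R; all values are invented, and the noise block of the deterministic model link is identically zero:

S_mod   <- matrix(c(1, 0.5, 1, 1, 1, 1.5), nrow = 3, byrow = TRUE)
p       <- c(0.1, 0.2)         # model parameters (part of u)
eps_mod <- c(0, 0, 0)          # noise of the deterministic model link
e_norm  <- 0.05                # normalization error
e_stat  <- c(0.01, -0.02, 0)   # statistical errors
# apply M_mod, then add the noise block of the completed target y_mod
y_mod  <- as.vector(S_mod %*% p) + eps_mod
# apply M_exp, M_norm and M_stat, which all target the block of x_expt
x_expt <- exp(y_mod) + e_norm + e_stat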
The construction of the overall Jacobian matrix $S$ can be achieved similarly. We start with an initial mapping matrix given by
$$S_0 = \mathbb{1} \qquad (54)$$
We go through the mappings in a valid order. For each mapping $M_k$ in this sequence, we multiply its Jacobian matrix $J_k$ by the resulting mapping matrix $S_{k-1}$ of the previous step to obtain the updated mapping matrix $S_k$,
$$S_k = J_k S_{k-1} \qquad (55)$$
The Jacobian matrix of a mapping given by a function $f_k$ with source index set $\sigma_k$ and target index set $\tau_k$ is given by
$$(J_k)_{ij} = \begin{cases} 1 & \text{if } i = j, \\ (F_k)_{pq} & \text{if } i \text{ is the } p\text{-th element of } \tau_k \text{ and } j \text{ the } q\text{-th element of } \sigma_k, \\ 0 & \text{otherwise}, \end{cases} \qquad (56)$$
where $F_k$ is the Jacobian matrix of the function $f_k$. Please note that in the case of a non-linear relationship $f_k$, the Jacobian of a mapping depends on the actual values in $\vec{z}[\sigma_k]$, see eq. 53.
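The corresponding chain of Jacobians can be sketched analogously; the helper below is our own illustration (using the Matrix package for sparsity), building the matrix of eq. 56 for one mapping and multiplying the Jacobians in the same order as in the propagation sketch above.

```r
library(Matrix)

# Jacobian of a single mapping, cf. eq. (56): identity plus the block
# with the Jacobian of f at rows tar_idx and columns src_idx.
jac_of_mapping <- function(n, src_idx, tar_idx, dfdx) {
  J <- as(Diagonal(n), "CsparseMatrix")
  J[tar_idx, src_idx] <- dfdx
  J
}

zref <- c(0.5, exp(0.5), 0)                  # values after applying M1
J1 <- jac_of_mapping(3, 1, 2, exp(zref[1]))  # non-linear: value-dependent
J2 <- jac_of_mapping(3, 2, 3, 2)             # linear model: constant
S  <- J2 %*% J1                              # eqs. (54)-(55) with S0 = identity
```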
If we deal with observations of intermediate nodes, we cannot use the overall Jacobian matrix $S$ in the GLS method but need to use a slightly modified matrix $S'$. This Bayesian network demonstrates a case with an observation of an intermediate node:
To calculate the matrix $S'$, we need to make the following alteration in the prescription we outlined for the computation of $S$. In the sequential application of the mappings, whenever the set of source indices $\sigma_k$ of the current mapping makes reference to variables of an observed node, we need to use instead of eq. 56 the specification
$$(J_k)_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}. \end{cases} \qquad (57)$$
This change means that we remove the functional dependence of the variables associated with the target indices in $\tau_k$ on those associated with the source indices in $\sigma_k$.
We also need to make a similar adjustment to the propagation of the values to obtain the vector $\vec{z}'_{\mathrm{prop}}$. We start with the initial assignment $\vec{z}' = (\vec{x}^T, \vec{0}^T)^T$. In the sequential application of the mappings, whenever $\sigma_k$ makes reference to an observed node, we need to propagate the observed values instead, i.e., the values $\vec{z}'[\sigma_k]$ are replaced by the observed values before $f_k$ is applied. The resulting vector $\vec{z}'_{\mathrm{prop}}$ together with $S'$ must be used instead of their unprimed counterparts to evaluate the GLS formulas in eqs. 27 and 28.
In the graphical representation of the schematic Bayesian network, it amounts to removing the respective arrow:
Thereby the flow of information from observations downstream of the observed intermediate node does not impact the posterior distribution of independent variables upstream of it anymore. However, the observed values of the intermediate node still have to be propagated downstream, as they contribute a certain proportion to the values observed in downstream nodes, which consequently does not need to be explained anymore by the independent variables connected to them. The special treatment of observed nodes in the computation of $\vec{z}'_{\mathrm{prop}}$ ensures that the contribution of the observed nodes is properly accounted for in downstream nodes in the application of the GLS method in eq. 28. Please note that the functionality to incorporate observations of intermediate nodes has not been implemented in the nucdataBaynet package yet.
Finally, we want to address the possibly counter-intuitive aspect that an observed intermediate node blocks any information flow from downstream nodes to upstream nodes if one considers the adjustment of differential cross sections using both differential and integral experimental data. The Bayesian network used above as illustration to deal with an intermediate node does not represent the situation of integral data assimilation. If the observed intermediate node represented the differential measurements, this vector would not be propagated through the application code, e.g., a neutron transport calculation; instead, the inferred true cross section would be used. The following Bayesian network correctly captures the estimation of differential cross sections by using both differential and integral experimental data:
The true differential cross section explains the differential measurement up to the experimental error. The integral measurement is explained by the true differential cross section propagated through the transport code, again up to a measurement error. In this scenario, there are no observed intermediate nodes.
II.6 Non-linear relationships
So far we assumed all relationships to be linear. However, as non-linear relationships appear frequently in nuclear data evaluation, their adequate treatment is important. Some examples are:
• The relationship between the parameters of nuclear models and the associated predictions is usually non-linear.
• Ratios of cross sections are non-linear functions of the cross sections.
• Measured cross sections are non-linear functions of the true cross section and the relative normalization error.
• To enforce positivity of a cross section, a non-linear variable transformation, e.g., $\sigma = \exp(u)$, can be introduced and the auxiliary variable $u$ treated as the independent variable (a worked illustration follows this list).
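To illustrate the last item with a worked one-liner (the notation is ours): if the cross section on the mesh is parametrized as $\sigma_i = \exp(u_i)$, the Jacobian of this transformation is diagonal and value-dependent,

$$\frac{\partial \sigma_i}{\partial u_j} = \delta_{ij} \exp(u_j),$$

which guarantees $\sigma_i > 0$ for any real $u_j$; the dependence of the Jacobian on the variable values is precisely the signature of non-linearity discussed in this section.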
It is therefore tremendously useful to have an inference procedure that is capable of dealing with these non-linearities. At this point of the discussion, it is pertinent to recall that the GLS formula in eq. 28 locates the maximum of the posterior distribution given by
$$\pi(\vec{x} \,|\, \vec{y}_{\mathrm{obs}}) \propto \mathcal{L}(\vec{y}_{\mathrm{obs}} \,|\, \vec{x})\, \pi(\vec{x}) \qquad (58)$$
if likelihood and prior are given by the multivariate normal distributions $\mathcal{N}(\vec{y}_{\mathrm{obs}} \,|\, f(\vec{x}), \Sigma)$ and $\mathcal{N}(\vec{x} \,|\, \vec{x}_0, A)$, respectively, and $f$ is a linear function. The distribution parameters of the likelihood are due to the prior associated with the parentless node attached to the observed node.
The Levenberg-Marquardt (LM) algorithm Levenberg (1944); Marquardt (1963) is an iterative algorithm to solve non-linear least squares problems. A customized LM algorithm has been presented in Helgesson and Sjöstrand (2017) that takes into account the same prior information as the GLS method. More precisely, it can locate exactly the posterior maximum of eq. 58 also for a non-linear relationship $f$. The specific values of $\vec{x}$ associated with the posterior maximum represent a so-called maximum a posteriori probability (MAP) estimate. In the following we discuss the adjustments to the GLS method that yield the customized LM algorithm. A detailed derivation of the update formula employed in the LM algorithm is provided in appendix A.
The LM algorithm proceeds by applying in each iteration the GLS method with the inverse of the prior covariance matrix augmented by a damping term. This mechanism enables the algorithm to dynamically transition between the gradient ascent method and the GLS method, depending on the degree of the non-linearities present. As eqs. 27 and 28 implement the GLS method, the application of the LM algorithm in our situation is straightforward. In each iteration, we have to use eq. 27 augmented by a damping term,
$$\tilde{A}^{-1} = A^{-1} + S^T \Sigma^{-1} S + \lambda D \qquad (59)$$
with $\lambda$ being a real positive scalar, $D$ a diagonal matrix, and $S$ the Jacobian matrix evaluated at the current reference vector $\vec{x}_{\mathrm{ref}}$, which is associated with the largest value of the posterior distribution found so far. The diagonal elements $D_{ii}$ are defined for all indices $i \in \{1, \dots, N\}$, with $N$ being the number of rows (or columns) of $A$.
In each iteration, a proposal vector $\vec{x}_{\mathrm{prop}}$ is computed using eq. 28. If the proposed vector is associated with a larger value of the posterior distribution, it becomes the new reference vector for the next iteration, which also represents the new best guess for the maximum of the posterior distribution. Importantly, the vector of propagated values also has to be updated accordingly by propagating $\vec{x}_{\mathrm{prop}}$ through the non-linear mappings as described in the previous section. If the proposed vector leads to a lower value of the posterior distribution than the current reference vector, it is rejected.
The value of the control parameter $\lambda$ to adjust the step size is changed in each iteration depending on the gain $\rho$ defined by
$$\rho = \frac{L(\vec{x}_{\mathrm{prop}}) - L(\vec{x}_{\mathrm{ref}})}{\tilde{L}(\vec{x}_{\mathrm{prop}}) - \tilde{L}(\vec{x}_{\mathrm{ref}})} \qquad (60)$$
where $L$ is the exact logarithmized posterior density function using the non-linear relationship $f$ and $\tilde{L}$ the expected value of the logarithmized posterior distribution using a linear approximation to $f$ constructed at the vector $\vec{x}_{\mathrm{ref}}$. The following prescription was suggested by Marquardt (1963) to update the parameter $\lambda$ from one iteration to the next one,
$$\lambda \leftarrow \begin{cases} \lambda / \nu & \text{if } \rho > \rho_{\mathrm{high}}, \\ \nu \lambda & \text{if } \rho < \rho_{\mathrm{low}}, \\ \lambda & \text{otherwise}, \end{cases} \qquad \nu > 1 \qquad (61)$$
In words, if the expected improvement using the linear approximation is similar to the real improvement, the value of $\lambda$ is decreased, which leads to a larger step size and an update more similar to the GLS update. On the other hand, if the expected improvement and the real improvement are very different, $\lambda$ is increased, which reduces the step size and yields proposal vectors more in line with the gradient ascent method. Please note that the iterative application of the GLS update without the adaptive damping term is not guaranteed to converge, as demonstrated, e.g., in (Schnabel, 2015, p. 117).
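The following R sketch summarizes one LM iteration under the stated assumptions (multivariate normal prior with mean x0 and covariance A, likelihood covariance Sigma, observations yobs); propagate and jacobian stand for the network propagation and overall Jacobian of the previous section, and all names are ours rather than the nucdataBaynet interface.

```r
library(Matrix)

# One damped update step of the customized LM algorithm, cf. eq. (59).
lm_step <- function(xref, lambda, x0, A, yobs, Sigma, propagate, jacobian) {
  S <- jacobian(xref)                         # Jacobian at the reference vector
  r <- yobs - propagate(xref)                 # residual of the propagated values
  Ainv <- solve(A)
  H <- Ainv + t(S) %*% solve(Sigma) %*% S     # undamped curvature, cf. eq. (27)
  Htilde <- H + lambda * Diagonal(x = diag(H))        # eq. (59): damping added
  g <- t(S) %*% solve(Sigma, r) - Ainv %*% (xref - x0)  # gradient of log posterior
  as.vector(xref + solve(Htilde, g))          # proposal vector, cf. eq. (28)
}
# The proposal is accepted only if it increases the exact posterior
# density; lambda is then adapted according to the gain of eq. (60).
```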
Overall, the customized LM algorithm is very efficient. It exploits the fact that the result of the non-linear mappings appears as the mean vector in the multivariate normal likelihood. This structure in the equations makes it possible to obtain and use second-order information about the posterior pdf analytically, even though only first-order information (i.e., the Jacobian matrix) of the non-linear relationships is available.
Its suitability and efficiency for model-based nuclear data evaluation have already been demonstrated in a full scale evaluation within a nuclear data evaluation pipeline prototype Schnabel et al. (2021) to fit energy-dependent model parameters of TALYS Koning and Rochman (2012); Koning et al. (2019).
If several local maxima are present due to non-linear mappings, such as those induced by relative normalization errors, a stage-wise approach is helpful to ensure that a good local and ideally the global maximum is found. Only a subset of the variables may be adjusted in each stage while the other adjustable variables are kept fixed. Mathematically, this is achieved by setting the prior estimates of the fixed variables in $\vec{x}_0$ equal to the associated elements of the reference vector and setting the respective elements in the prior covariance matrix to zero. With these specifications, the result of eq. 28 is conditioned on the values of the fixed variables. Note that due to the zero prior uncertainty of the fixed elements, the prescription to deal with deterministic relationships described in the paragraph below eq. 28 also needs to be applied.
As a final remark, it is sometimes suggested to treat relative normalization uncertainties by updating the elements of the covariance matrix from one step to the next in an iterative optimization scheme. The convergence of the LM algorithm, however, is only guaranteed if the prior covariance matrix is kept constant throughout the iterative scheme. The correct modeling of relative errors (and thereby induced relative uncertainties) that ensures convergence is explained in section II.8.
In the next section, we elaborate on an efficient and flexible approach to define priors on functions. This construction may be used to model the prior knowledge about an excitation function that an experiment attempts to measure. Afterwards we elaborate on two important experimental aspects, relative normalization errors and energy calibration errors, which can be properly taken into account in the presented Bayesian network framework but not with the GLS method.
II.7 Sparse Gaussian process construction
Gaussian processes, e.g., Rasmussen and Williams (2006), have been applied in various ways in the nuclear data evaluation field. They have been suggested to account for model deficiencies of the nuclear model in the fast energy region, e.g., Leeb et al. (2008); Schnabel (2015); Schnabel and Leeb (2016), as prior on energy-dependent model parameters of nuclear models Helgesson and Sjöstrand (2018); Schnabel et al. (2021), to model energy-dependent systematic errors of experiments, e.g., Schnabel (2018a), and to fit cross section curves to experimental data Iwamoto (2020). This list demonstrates the versatility of GPs.
A disadvantage of GPs is that the time needed to fit them to data scales as $\mathcal{O}(N^3)$, with $N$ being the number of data points. This scaling behavior arises due to the occurrence of the inverse of a covariance matrix of size $N \times N$ in the regression formula. Approaches to extend the applicability of GP regression to larger datasets are sparse approximations, e.g., Quiñonero-Candela and Rasmussen (2005); Snelson and Ghahramani (2006) and references therein, and numerical methods relying on iterative solvers, e.g., Wang et al. (2019); Pleiss et al. (2020). Another approach, which is application dependent, is to exploit a specific structure of the covariance matrix. For instance, the equations for GP inference can be evaluated more efficiently if a covariance matrix is of Toeplitz structure or can be decomposed into kernels acting independently on different dimensions, e.g., Saatchi (2011).
In this section, we elaborate on a construction of a sparse Gaussian process which enmeshes well with the outlined Bayesian network interpretation, scales to large datasets, and allows for the flexible incorporation of prior knowledge on the features of the unknown function, such as its expected range, slope and smoothness. This construction has already been discussed to some extent in the context of a nuclear data evaluation pipeline Schnabel et al. (2021). Essentially the same construction is also described in good detail in (von der Linden et al., 2014, Chap. 13.2), where it is derived using the MaxEnt principle Jaynes and Rosenkrantz (1983); Shore and Johnson (1980) and variational calculus, relying, however, on a spline basis instead of the piecewise linear functions used here.
Given two mesh points associated with energies $E_1$ and $E_2$, where $E_1 < E_2$, and function values $f_1$ and $f_2$, respectively, the value at an intermediate energy $E$ can be determined by linear interpolation:
$$f(E) = \frac{E_2 - E}{E_2 - E_1}\, f_1 + \frac{E - E_1}{E_2 - E_1}\, f_2 \qquad (62)$$
To state the formulas in a general way for a complete mesh of energies $E_1 < E_2 < \dots < E_N$, we introduce the abbreviation

$$u_i(E) = \frac{E_{i+1} - E}{E_{i+1} - E_i}\, \mathbb{1}\!\left( E_i \le E < E_{i+1} \right) \qquad (63)$$

and

$$w_i(E) = \frac{E - E_{i-1}}{E_i - E_{i-1}}\, \mathbb{1}\!\left( E_{i-1} \le E < E_i \right) \qquad (64)$$

as well as their sum,

$$b_i(E) = u_i(E) + w_i(E) \qquad (65)$$

where $\mathbb{1}(\cdot)$ equals one if the condition in parentheses is fulfilled and zero otherwise.
Piecewise linear interpolation, i.e., locating for an energy of interest the enclosing energies on the mesh and then performing linear interpolation according to eq. 62, can now be concisely written as

$$f(E) = \sum_{i=1}^{N} b_i(E)\, f_i \qquad (66)$$
Thanks to the bilinearity of the covariance operator, the covariance between function values at arbitrary energies can be computed by

$$\mathrm{cov}\!\left( f(E), f(E') \right) = \sum_{i=1}^{N} \sum_{j=1}^{N} b_i(E)\, K_{ij}\, b_j(E') \qquad (67)$$
This formula shows that the covariance matrix $K$ with elements $K_{ij} = \mathrm{cov}(f_i, f_j)$, associated with the finite number of variables $f_1, \dots, f_N$, in combination with linear interpolation enables the computation of covariances for arbitrary pairs of energies $(E, E')$. In other words, it is a valid specification of a covariance function and, together with a mean function, completely characterizes a Gaussian process.
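As a small illustration (our own helper function, not part of the nucdataBaynet interface), the sparse matrix holding the interpolation weights $b_j(E)$ can be built as follows; together with a covariance matrix K on the mesh it immediately yields the covariances of eq. 67.

```r
library(Matrix)

# Sparse matrix of interpolation weights b_j(E), cf. eqs. (63)-(66):
# row k holds the two non-zero weights for target energy tar_x[k].
interp_matrix <- function(src_x, tar_x) {
  idx <- findInterval(tar_x, src_x, all.inside = TRUE)  # left mesh point
  w <- (tar_x - src_x[idx]) / (src_x[idx + 1] - src_x[idx])
  sparseMatrix(i = rep(seq_along(tar_x), 2),
               j = c(idx, idx + 1),
               x = c(1 - w, w),
               dims = c(length(tar_x), length(src_x)))
}

# Covariance between arbitrary energies, cf. eq. (67):
# B <- interp_matrix(mesh, c(E1, E2));  B %*% K %*% t(B)
```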
To make the link to the Bayesian network interpretation, we assume that observations of some energy-dependent quantity have been made at the experimental energies $\tilde{E}_1, \dots, \tilde{E}_M$ and that the computational mesh used for the sparse Gaussian process construction in eq. 67 is given by $E_1, \dots, E_N$. The vector of function values on the computational mesh is denoted by $\vec{f}$ and the vector of function values at the experimental energies by $\vec{g}$. The Jacobian matrix that performs the mapping from the computational to the experimental mesh can be computed by
$$S_{ij} = b_j(\tilde{E}_i) \qquad (68)$$
The mapping of the function values from the computational to the experimental mesh can now be written as

$$\vec{g} = S \vec{f} \qquad (69)$$
with $S$ as defined in eq. 68. The corresponding Bayesian network is of the form
The vectors to evaluate the GLS equations in eqs. 27 and 28 are given by

$$\vec{z} = \begin{pmatrix} \vec{f} \\ \vec{\epsilon} \end{pmatrix}, \qquad \vec{z}_0 = \begin{pmatrix} \vec{f}_0 \\ \vec{0} \end{pmatrix} \qquad (70)$$

where $\vec{\epsilon}$ contains the statistical errors attached to the observed experimental node.
Regarding the reference vectors of the Taylor expansion, we have

$$\vec{z}_{\mathrm{ref}} = \begin{pmatrix} \vec{f}_{\mathrm{ref}} \\ \vec{0} \end{pmatrix} \qquad (71)$$
and for the overall mapping matrix

$$S_{\mathrm{tot}} = \begin{pmatrix} S & \mathbb{1} \end{pmatrix} \qquad (72)$$

which maps $\vec{f}$ and $\vec{\epsilon}$ to the values of the observed experimental node.
The interpolation matrix $S$ contains only two non-zero elements per row and the whole matrix is therefore very sparse. The compound prior covariance matrix is given by

$$\hat{K} = \begin{pmatrix} K & 0 \\ 0 & \Sigma_{\epsilon} \end{pmatrix} \qquad (73)$$
We usually assume that $\Sigma_{\epsilon}$ is a diagonal matrix because it contains the uncertainties about the statistical errors of the experimental data points. Regarding the prior covariance matrix $K$ of the discretized Gaussian process, we have to make a choice which covariance function to use. A common choice is the so-called squared exponential,
$$K_{ij} = k(E_i, E_j) = \alpha^2 \exp\!\left( -\frac{(E_i - E_j)^2}{2 \ell^2} \right) \qquad (74)$$
The length scale $\ell$ determines how rapidly the function is a priori expected to change or oscillate and the amplitude $\alpha$ defines the range the unknown function values are expected to cover.
However, there are several potential issues with this covariance function. One issue is that the same length scale is applied in all energy ranges, which is clearly an invalid assumption for an energy range covering both the resonance region and the fast energy range. Depending on the application, hand-tailored composite covariance functions can be used, e.g., Helgesson and Sjöstrand (2017); Schnabel (2018c), or an energy-dependent length scale employed, e.g., Schnabel and Sjöstrand (2018). Another issue is that the prior induced by this covariance function incorporates the assumption that all potential solution functions have derivatives of all orders, e.g., (Rasmussen and Williams, 2006, Chap. 4.2.1), which may be regarded as an unreasonably strong assumption. A possible solution is to use instead a covariance function of the Matérn class, which only possesses derivatives up to a certain order, e.g., (Rasmussen and Williams, 2006, Chap. 4.2.1). Finally, the resulting covariance matrix associated with the values at the mesh points is dense, which is detrimental to scaling up the inference in Bayesian networks to a large number of variables, e.g., several hundred thousand. Sparse approximations to the full Gaussian process are one approach to deal with this issue, e.g., Quiñonero-Candela and Rasmussen (2005). The discretization introduced in eq. 67 in combination with linear interpolation is already a sparse approximation to the full Gaussian process. However, if the squared exponential covariance function is used, the block $K$ is still dense. In the following we present a GP construction that renders $K$ very sparse as well.
A diagonal covariance matrix $K$ is very sparse but does not impose any constraint on the smoothness of possible solution functions. The idea is therefore to introduce pseudo-observations of the second derivatives of the function to enforce a certain degree of smoothness. To this end, we introduce the discretized version of the first derivative at location $E_i$,
$$f'_i = \frac{f_{i+1} - f_i}{E_{i+1} - E_i} \qquad (75)$$
and use this definition recursively to get a discretized version of the second derivative at location $E_i$,
$$f''_i = \frac{f'_i - f'_{i-1}}{(E_{i+1} - E_{i-1})/2} \qquad (76)$$

For an equidistant mesh with spacing $\Delta E$, this expression reduces to $f''_i = (f_{i+1} - 2 f_i + f_{i-1}) / \Delta E^2$.
This discretized version of the second derivative can be cast into a matrix equation with a matrix $S''$ that maps the values in the vector $\vec{f}$ associated with the energies $E_1, \dots, E_N$ to the second derivatives at the interior energies $E_2, \dots, E_{N-1}$,
$$\vec{h} = S'' \vec{f} + \vec{\eta} \qquad (77)$$
with $S'' \in \mathbb{R}^{(N-2) \times N}$. We also introduced the random vector $\vec{\eta}$ to fulfill the requirement that every observable node must be associated with a parentless random variable in order to apply eqs. 27 and 28. We can now extend the Bayesian network to accommodate the vector $\vec{h}$ with the second derivatives of $\vec{f}$:
As a reminder, nodes filled with gray color are assumed to be observed. The additional node $\vec{\eta}$ introduces stochasticity, which weakens the link between the observed vector $\vec{h}$ and the function values in $\vec{f}$. Without this additional noise term, the observation of $\vec{h}$ would fully determine $\vec{f}$ up to two integration constants, which could be, for instance, the function value and the slope at the same energy.
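A sketch of the matrix $S''$ for an equidistant mesh (again our own helper, mirroring the three-point stencil of eq. 76) reads:

```r
library(Matrix)

# Sparse second-difference operator, cf. eqs. (76)-(77): maps N mesh
# values to N-2 second derivatives at the interior mesh points.
second_deriv_matrix <- function(mesh) {
  N <- length(mesh)
  dE <- mesh[2] - mesh[1]      # assumes an equidistant mesh
  i <- rep(seq_len(N - 2), 3)
  j <- c(seq_len(N - 2), seq_len(N - 2) + 1, seq_len(N - 2) + 2)
  x <- rep(c(1, -2, 1) / dE^2, each = N - 2)
  sparseMatrix(i = i, j = j, x = x, dims = c(N - 2, N))
}
```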
The corresponding compound vectors $\vec{z}$ and $\vec{z}_0$ need to be extended to

$$\vec{z} = \begin{pmatrix} \vec{f} \\ \vec{\epsilon} \\ \vec{\eta} \end{pmatrix}, \qquad \vec{z}_0 = \begin{pmatrix} \vec{f}_0 \\ \vec{0} \\ \vec{0} \end{pmatrix} \qquad (78)$$
and the reference vectors to

$$\vec{z}_{\mathrm{ref}} = \begin{pmatrix} \vec{f}_{\mathrm{ref}} \\ \vec{0} \\ \vec{0} \end{pmatrix} \qquad (79)$$
The overall mapping matrix is now given by

$$S_{\mathrm{tot}} = \begin{pmatrix} S & \mathbb{1} & 0 \\ S'' & 0 & \mathbb{1} \end{pmatrix} \qquad (80)$$
Please note that $S''$ is also very sparse as it only contains three non-zero elements per row. The associated prior covariance matrix is of the form
$$\hat{K} = \begin{pmatrix} K & 0 & 0 \\ 0 & \Sigma_{\epsilon} & 0 \\ 0 & 0 & \Sigma_{\eta} \end{pmatrix} \qquad (81)$$
We can use diagonal matrices for all the submatrices $K$, $\Sigma_{\epsilon}$ and $\Sigma_{\eta}$. Let us denote by $I_{\mathrm{exp}}$ and $I_{\mathrm{2nd}}$ the index sets associated with the observed experimental node and the second-derivative node, respectively. The index set indicating the observed variables is therefore given by $I_{\mathrm{obs}} = I_{\mathrm{exp}} \cup I_{\mathrm{2nd}}$. Finally, the vector in eq. 25 required to apply the GLS method in eq. 28 is defined by
$$\vec{y}_{\mathrm{obs}} = \begin{pmatrix} \vec{d} \\ \vec{0} \end{pmatrix} \qquad (82)$$
The vector $\vec{d}$ contains the observed values of the experimental node. By imposing $\vec{0}$ for the second block, we assume that the observed second derivatives at all energies are zero but that these measurements were affected by a statistical error whose uncertainty is reflected in $\Sigma_{\eta}$. The use of pseudo-observations of the second derivative is a way to implicitly define a Gaussian process prior on $\vec{f}$ that favors a certain degree of smoothness. Effectively, this construction penalizes solutions according to the magnitude of the sum of squared second derivatives at the various mesh points. The continuous version of the same penalty criterion is employed in the fitting of smoothing splines, e.g., Green and Silverman (1993). Alternatively, one may regard the construction as a kind of Tikhonov regularization or ridge regression, e.g., Tikhonov and Arsenin (1977), in the space of second derivatives. Figure 1 gives an impression of how the uncertainties assigned to the pseudo-observations of the second derivatives impact the result of the Bayesian inference.
As each element in $\vec{h}$ is associated with the second derivative at a particular energy $E_i$, we can use different variances along the diagonal of $\Sigma_{\eta}$ to change the degree of smoothness in different energy ranges. In addition, if we use very large variances in the diagonal of $K$, we only impose our knowledge about the expected smoothness. In principle, we can also augment the Bayesian network with a node with pseudo-observations of the first derivative (slope). The construction is analogous to the one used to incorporate pseudo-observations of the second derivative.
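Putting the pieces together, the following self-contained toy example (synthetic data; all names and numbers are ours) fits a smooth curve through noisy observations by combining the interpolation matrix and the second-difference operator sketched above with a plain GLS update and pseudo-observations of zero second derivatives.

```r
library(Matrix)
set.seed(42)

mesh  <- seq(0, 10, by = 0.1)              # computational mesh
E_exp <- runif(40, 0, 10)                  # experimental energies
d     <- sin(E_exp) + rnorm(40, sd = 0.1)  # synthetic measurements

S1   <- interp_matrix(mesh, E_exp)         # mapping to experimental mesh, eq. (68)
S2   <- second_deriv_matrix(mesh)          # second derivatives, eq. (77)
Stot <- rbind(S1, S2)                      # overall mapping matrix, cf. eq. (80)

yobs <- c(d, rep(0, nrow(S2)))             # data plus zero pseudo-observations, eq. (82)
K    <- Diagonal(length(mesh), 1e4)        # vague prior on the mesh values
Sig  <- Diagonal(nrow(Stot), c(rep(0.1^2, length(d)),  # statistical variances
                               rep(1,     nrow(S2))))  # smoothness-steering variances

# GLS posterior mean for a zero prior mean, cf. eqs. (27) and (28)
fpost <- solve(solve(K) + t(Stot) %*% solve(Sig) %*% Stot,
               t(Stot) %*% solve(Sig, yobs))
# Decreasing the smoothness-steering variances yields a stiffer curve,
# mimicking the behavior illustrated in fig. 1.
```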
To conclude this section, we remark that observations of first derivatives (gradients) are also taken into account in the construction of surrogate models to emulate expensive computer models, e.g., in the engineering context. If the surrogate model is based on a Gaussian process, this approach is referred to as gradient-enhanced Kriging, e.g., de Baar et al. (2014). Whereas in gradient-enhanced Kriging real observations of the gradient of a computer model are used to better constrain the surrogate model, the uncertainties of the pseudo-observations of the second derivatives introduced in this section act as additional free parameters that permit to regulate the smoothness of feasible solutions.
II.8 Two important non-linear mappings
The Bayesian network framework in the form presented is general and can incorporate any linear or non-linear mapping as long as the functions for propagation and for the computation of the Jacobian are available. Two experimental aspects, which are relative normalization errors and errors in the energy calibration, are very relevant for many experiments. As they cannot be properly addressed in the GLS method, we highlight the treatment of these experimental aspects in the Bayesian network framework.
In time-of-flight (TOF) experiments, the true energy $E_{\mathrm{true}}$ of the particle beam is related to the assumed energy $E$ in first approximation by

$$E_{\mathrm{true}} = (1 + \epsilon)\, E + \delta \qquad (83)$$

with $\delta$ being a constant energy shift and $\epsilon$ being a scaling factor. In the previous section, we established eq. 66 to interpolate the values from a mesh with knot points at energies $E_1, \dots, E_N$ to obtain the value at an energy $E_{\mathrm{true}}$,
$$f(E_{\mathrm{true}}) = \sum_{i=1}^{N} b_i(E_{\mathrm{true}})\, f_i \qquad (84)$$
If the experimental energy is perfectly known, the factors $b_i(E_{\mathrm{true}})$ to map the values on the mesh to the experimental energy are constants and consequently the interpolation result is a linear function of the values on the mesh. However, by acknowledging that $E_{\mathrm{true}}$ is also uncertain, because we do not perfectly know the precise values of $\delta$ and $\epsilon$, this relationship becomes non-linear.
In the visualization of the Bayesian network, we can choose which collection of variables should be summarized into separate nodes. As the variables referring to the true cross section are independent of specific experiments, it is reasonable to group them into one node. The variables $\delta$ and $\epsilon$ are specific to an experiment and may therefore be grouped together into another node. These choices lead to the following visualization of the Bayesian network:
We did not attach a parentless node to the dependent node in this drawing, but its presence for nodes with parents is implicitly understood.
For cases where the non-linearity arises due to non-linear interactions between source variables, such as between the true cross section values and the energy calibration variables $\delta$ and $\epsilon$, we introduce the convention that those nodes are connected by a dashed line.
The non-linear propagation of the variables $\vec{f}$, $\delta$ and $\epsilon$ is already established by eq. 83 in combination with eq. 84. Also the partial derivatives to construct the Jacobian matrix can be readily computed,
$$\frac{\partial f(E_{\mathrm{true}})}{\partial f_i} = b_i(E_{\mathrm{true}}) \qquad (85)$$

$$\frac{\partial f(E_{\mathrm{true}})}{\partial \delta} = \sum_{i=1}^{N} \frac{\partial b_i(E)}{\partial E} \bigg|_{E = E_{\mathrm{true}}} f_i \qquad (86)$$

$$\frac{\partial f(E_{\mathrm{true}})}{\partial \epsilon} = E \sum_{i=1}^{N} \frac{\partial b_i(E)}{\partial E} \bigg|_{E = E_{\mathrm{true}}} f_i \qquad (87)$$
The appearance of the input variables in the Jacobian matrix is the signature of a non-linear relationship.
Analogous to the examples in previous sections, we can define the column vector $\vec{z}$ containing all variables. With the positions of the subvectors in $\vec{z}$ defined, the source index set of the mapping refers to the positions of $\vec{f}$, $\delta$ and $\epsilon$. The target index set refers to the indices associated with the experimental data node. Such a mapping can be used in Bayesian networks as a building block.
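A compact sketch of such a building block (our own function; the reference-vector bookkeeping of the LM algorithm is omitted) propagates the values and returns the Jacobian blocks of eqs. 85-87, reusing interp_matrix from section II.7:

```r
# Non-linear energy-calibration mapping, cf. eqs. (83)-(87).
encalib_map <- function(f, mesh, E_exp, delta, eps) {
  Etrue <- (1 + eps) * E_exp + delta           # eq. (83)
  B <- interp_matrix(mesh, Etrue)              # weights b_i(Etrue), eq. (84)
  idx <- findInterval(Etrue, mesh, all.inside = TRUE)
  slope <- (f[idx + 1] - f[idx]) / (mesh[idx + 1] - mesh[idx])  # interpolant slope
  list(value     = as.vector(B %*% f),         # propagated values
       jac_f     = B,                          # eq. (85)
       jac_delta = slope,                      # eq. (86)
       jac_eps   = slope * E_exp)              # eq. (87)
}
```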
Another experimental aspect is the energy resolution of the measurement. Finite resolutions can be modeled by convolutions, which, like linear interpolation, represent a linear relationship between the variables on the mesh and the value propagated to the experimental energy. The combination of convolutions with energy calibration errors works analogously to the case of linear interpolation.
As a second example, we elaborate on the modeling of relative normalization errors. Within the GLS method, relative normalization uncertainties must be converted to absolute ones by using the measured values as reference. However, this approach leads to a systematic underestimation of the uncertainty for measured values below the true value and a systematic overestimation for measured values above it, effectively biasing the results of the GLS method towards lower values. The correct way to deal with a relative normalization error is to use the true cross section as the reference. Using again the same construction as in eq. 84 to map the true cross sections given on the computational mesh to the experimental energies, the relationship between the experimental measurements and the true cross section can be written as:
$$m_i = (1 + r)\, g_i(\vec{f}), \qquad g_i(\vec{f}) = \sum_{j=1}^{N} b_j(\tilde{E}_i)\, f_j \qquad (88)$$

where $\vec{g}(\vec{f})$ represents the vector of true cross sections interpolated at the experimental energies $\tilde{E}_i$. The variable $r$ is the relative normalization error. The associated Bayesian network is given by
Conceptually, we can think of eq. 88 as the superposition of two mappings: the linear interpolation from the computational mesh to the experimental energies, and the contribution $r\, g_i(\vec{f})$ of the relative normalization error. The mapping associated with the first term is linear and we do not discuss it further. However, the mapping of the relative normalization error is non-linear. The partial derivatives to construct the associated Jacobian matrix are
$$\frac{\partial \left( r\, g_i(\vec{f}) \right)}{\partial f_j} = r\, b_j(\tilde{E}_i), \qquad \frac{\partial \left( r\, g_i(\vec{f}) \right)}{\partial r} = g_i(\vec{f}) \qquad (89)$$
In both the case of the energy calibration error and that of the relative normalization error, the mapping uses the true cross section vector as input, which is of course not known. During the iterative scheme of the LM algorithm, the current best estimate of the true cross section vector is used as reference for the relative normalization error. As already described in section II.6, the LM algorithm is able to locate an assignment of values to the variables, e.g., to $\vec{f}$ and $r$, that corresponds to the maximum of the posterior distribution taking into account the exact non-linear relationships.
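For completeness, the analogous sketch for the relative normalization mapping (again our own illustration, not the package interface) returns the value of the non-linear contribution and the two Jacobian blocks of eq. 89:

```r
# Non-linear relative-normalization mapping, cf. eqs. (88)-(89).
relnorm_map <- function(f, r, mesh, E_exp) {
  B <- interp_matrix(mesh, E_exp)
  g <- as.vector(B %*% f)       # true cross section at experimental energies
  list(value = r * g,           # contribution of the normalization error
       jac_f = r * B,           # derivative w.r.t. the mesh values
       jac_r = g)               # derivative w.r.t. the normalization error
}
```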
II.9 A few remarks on practical Bayesian network modeling
Before we present the Bayesian network examples in the next section, which use the algorithms developed in the method part in concert, we want to summarize the modeling possibilities and give an idea of how Bayesian network modeling is done in practice using the nucdataBaynet package.
We aimed to provide a comprehensive account of the mathematics involved in dealing with Bayesian networks with multivariate normal priors on the variables and non-linear, deterministic and nested relationships between them. We also discussed a flexible sparse Gaussian process prior construction that scales well to large datasets and elaborated on two important non-linear mappings associated with experimental data, namely relative normalization errors and energy calibration errors.
However, from the user point of view, the benefit of Bayesian networks is that they are a helpful mental abstraction to find valid Bayesian models. Furthermore, Bayesian networks can be quickly created on the computer by assembling common mappings, such as linear interpolation mappings, convolutions and non-linear transformations, like pieces of a puzzle. Therefore, in the following, we want to convey a basic understanding of how Bayesian models are defined in practice using the nucdataBaynet package.
The first step is to define a data table with the information about the variables present in the Bayesian network. An example data table is shown in table 1. Each row is associated with a variable. The column IDX establishes the position of each independent variable in the vector of independent variables defined in eq. 11. The column NODE establishes names for the nodes in the Bayesian network. Several variables can be grouped into a single node, hence the multiple appearance of the same node label. The column PRIOR contains the prior expectation and the column UNC the prior uncertainty associated with each independent variable. Please note that the column UNC is provided for convenience for the case that the prior covariance matrix is diagonal; its diagonal elements are then given by the squared values of the UNC column. However, it is also possible to introduce covariance elements between variables, but then, instead of the column UNC, a full (and preferably sparse) prior covariance matrix must be defined alongside the data table. The column OBS contains the observed values of dependent nodes. The special value NA (= not available) means that the dependent node was not observed. Please be aware that the values in PRIOR and UNC refer to the independent variables, whereas the values in OBS refer to the observed values of dependent nodes. The (linear) relationship between the independent and the dependent variables was stated in eq. 14.
IDX | NODE | PRIOR | UNC | OBS | REAC | ENERGY | EXPID
----|---------|-------|-----|-----|---------|--------|------
1 | truexs | 1000 | 500 | NA | (N,TOT) | 1.0 | NA
2 | truexs | 1000 | 500 | NA | (N,TOT) | 1.5 | NA
3 | truexs | 1000 | 500 | NA | (N,TOT) | 2.0 | NA
4 | normerr | 0 | 100 | NA | (N,TOT) | NA | 20733
5 | exp | 0 | 50 | NA | (N,TOT) | 1.7 | 20733
The columns REAC, ENERGY and EXPID are provided as examples of columns that are helpful for the definition of the mappings between nodes, as we will see in a moment. Depending on the evaluation situation, columns with other information may be more pertinent.
A mapping is defined by a list with several named variables that specify its characteristics. The essential variables present in the definition of any mapping are maptype to define the type of mapping (e.g., linear interpolation), mapname to assign a unique name to the mapping, src_idx to define the source index set and tar_idx to define the target index set of the specific mapping. Depending on the type of mapping, additional variables may be required, such as src_x and tar_x to define the (e.g., energy) mesh of the source variables and of the target variables in the case of a mapping that implements linear interpolation. These lists are preferably not created by hand but by query operations on the node data table. An example R code to define a linear interpolation mapping is presented in the listing below. The variable name dt refers to the node data table, in this example given by table 1. Query operations, such as NODE=="truexs", are employed to select specific rows and to retrieve the values of a particular column, e.g., ENERGY. The query syntax is very flexible and different criteria can be combined to retrieve the relevant information from the rows of interest in order to define a mapping.
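The listing below is a hedged reconstruction of such a mapping definition: the field names are the ones described above, while the concrete maptype string follows the naming in the text and may differ in detail from the released nucdataBaynet examples.

```r
library(data.table)

# Definition of a linear interpolation mapping from the node 'truexs'
# to the node 'exp' by query operations on the node data table dt.
mapdef <- list(
  maptype = "linearinterpol_map",          # type of mapping (illustrative string)
  mapname = "truexs_to_exp",               # unique name of this mapping
  src_idx = dt[NODE == "truexs", IDX],     # source index set
  tar_idx = dt[NODE == "exp", IDX],        # target index set
  src_x   = dt[NODE == "truexs", ENERGY],  # energy mesh of the source node
  tar_x   = dt[NODE == "exp", ENERGY]      # energies of the target node
)
```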
The collection of these lists with the mapping definitions determines the link structure of the Bayesian network. They are bundled together into a so-called compound mapping, which is also specified by a list similar to the one shown above, but containing as a variable the list of all individual mapping definitions. The list with the compound mapping specification is used to instantiate a mapping object (or, more precisely, a closure in R, which can be regarded for all practical purposes as an object).
The functions implementing the inference algorithms, such as the GLS method and the LM algorithm, are called with the compound mapping object and the relevant prior specifications given in the node data table. Additional functions exist to evaluate parts of the posterior covariance matrix and to create samples from the (approximate) posterior distribution. Functions to visualize Bayesian networks are also available.
III Examples
We demonstrate the use of Bayesian networks in relevant evaluation scenarios. The examples emerged in the context of the neutron data standards project and the INDEN evaluation efforts of structural materials. As with the GLS method, the quality of the results using Bayesian networks depends on the validity of the assumptions imposed. The objective of these examples is to demonstrate the modeling possibilities outlined in the method part in practice. To this end, some assumptions were made for the sake of demonstration and not because they are believed to be a sound reflection of expert knowledge. Therefore, we do not claim or recommend that the results in this section should replace established evaluations. Nevertheless, they serve as proof of the technical feasibility and make the benefits of Bayesian network modeling in the nuclear data context tangible.
III.1 Energy-dependent USU error component
Recently, the concept of so-called Unrecognized Sources of Uncertainty (USU) has been extensively discussed in Capote et al. (2020). It was suggested that even if the greatest care is applied in the determination of experimental error sources, there may still be error contributions remaining which the evaluators are unaware of. If the possibility of such contributions is not taken into account, evaluated uncertainties may be too optimistic. As an additional complication, for some quantities the magnitude of these unrecognized errors may be energy-dependent.
In this example, we want to demonstrate how the sparse GP construction introduced in section II.7 can be used within the Bayesian network framework to model energy-dependent USU error components in the evaluation procedure.
As an important disclaimer, please be aware that USU as defined in Capote et al. (2020) refers to unrecognized errors missed by an evaluator after a diligent effort to comprehensively quantify all uncertainties due to error sources associated with an experiment. It may well be that a large part of the contribution we refer to as the USU error component in this example could be reattributed to known error sources in a serious evaluation effort. How we name an error component is therefore a matter of available knowledge and interpretation, but the mathematical modeling can be the same.
We use measurement data of the 235U(n,f) cross section, which is a neutron data standard Carlson et al. (2009, 2018) between 150 eV and 200 MeV and an important reaction for the design and assessment of nuclear reactors. We consider the five datasets listed in table 2.
EXFOR | NUM | AUTHOR | YEAR | REF
------|-----|--------|------|----
20483 | 205 | Blons et al | 1971 | Blons et al. (1971)
20783 | 308 | Migneco et al | 1975 | Migneco et al. (1975)
20826 | 155 | Wagemans et al | 1976 | Wagemans and Deruytter (1976)
12877 | 482 | Weston et al | 1984 | Weston and Todd (1984)
23294 | 310 | Paradela et al | 2016 | Paradela et al. (2016)
We created the Bayesian network in fig. 2 to model the dependencies between the pieces of information. The node truexs represents the variables that contain the true cross sections on a densely defined computational mesh with a spacing of 1 eV. It is made up of the transformed sum of two components, truexs_avg and truexs_hires, which represent a smoothly varying cross section to capture an average trend and a high-resolution component to capture an overlaid resonance structure. The transformation keeps the result of the sum as it is if the sum is positive and otherwise yields zero. This transformation is known as Rectified Linear Unit (ReLU) in the machine learning field, e.g., Fukushima (1969), and ensures in our case that all cross sections are non-negative. The mapping from the mesh of truexs_avg and truexs_hires to the one of truexs is established by linear interpolation.

Prior knowledge about the smoothness of these components is incorporated by using a coarser mesh for truexs_avg (spacing 50 eV) than for truexs_hires (spacing 1 eV) and by introducing for both components pseudo-observations of the second derivative, as explained in section II.7 on the sparse Gaussian process construction. The nodes associated with the second derivatives of truexs_avg and truexs_hires are named truexs2nd_avg and truexs2nd_hires, respectively. We use smaller prior uncertainties for the variables of truexs2nd_avg than for truexs2nd_hires to ensure that one component really captures an average trend and the other one the high-resolution behavior.

However, the posterior uncertainties of truexs_avg and truexs_hires would remain quite large without further prior constraints. The reason is that the prior on the second derivative is indifferent to shifting the function up or down and only the sum of truexs_avg and truexs_hires relates to the experiments; hence the distribution of possible global shifts of these two components is only determined with an uncertainty given by the one barn prior uncertainty associated with truexs_hires, as it is smaller than the ten barn prior uncertainty of truexs_avg. To solve this issue, we introduce a node inttruexs_hires that represents binned averages of truexs_hires and associate this node with pseudo-observations with a zero value and small uncertainty. With this additional constraint, the average of the posterior estimate of the high-resolution component is close to zero for large enough energy intervals covering several peaks and valleys. Consequently, the global shift of the truexs_hires component is fixed to be close to zero.
The other nodes reflect the experimental information of the five datasets. The experimental data points represented by expdata are modeled as a sum consisting of the convoluted true cross section truexs, a statistical error component relstaterr and an energy-dependent systematic error component relsyserr. Without further knowledge about the experimental error(s) associated with the energy-dependent systematic error component, we can regard it as an USU error component.
The employed convolution mapping from truexs to expdata averages the true cross section within a window of a certain size centered at the true experimental energy given by the linear transformation stated in eq. 83. The respective parameters of the energy transformation are summarized in the node encalib. The non-linearity in the convolution mapping due to the energy transformation is indicated by the dashed lines between those two nodes in fig. 2.
The energy-dependent systematic error component is given relative to the true average cross section truexs_avg, whereas the statistical error component is given relative to the high-resolution true cross section. The specification as relative errors makes these mappings non-linear, which is indicated by the dashed lines between relstaterr and truexs, and between relsyserr and truexs_avg, respectively. The mapping from relsyserr to expdata is given by the same convolution operation as used in the mapping from truexs to expdata (including the energy transformation), but also taking into account that the values in relsyserr are given relative to truexs_avg. Therefore, relsyserr is connected by dashed lines to the truexs_avg and encalib nodes. We use the approach of pseudo-observations of second derivatives another time to impose a smoothness prior on relsyserr. The node associated with the second derivatives is named relsyserr2nd.
To keep the representation of the Bayesian network compact, we aggregated the experimental components referring to individual datasets. Stated differently, each of the nodes encalib, relsyserr, relsyserr2nd, relstaterr and expdata can be split up into five nodes referring to the various datasets. The nodes associated with one dataset do not have any direct connections to the nodes of another dataset. They are only indirectly linked via the nodes truexs and truexs_avg.
NODE | PRIOR | UNC | EMIN (eV) | EMAX (eV) | NUM
-----|-------|-----|-----------|-----------|----
expdata | 0 | 0.01 | 7001 | 11998 | 1460
relstaterr | 0 | 0.03 | 7001 | 11998 | 1460
relsyserr (all) | 0 | 0.05 | 6851 | 12115 | 490
relsyserr (Paradela) | 0 | 0 | 6851 | 12115 | 490
relsyserr2nd | 0 | | 6901 | 12065 | 480
truexs | 0 | 0 | 6000 | 14000 | 8001
truexs_avg | 0 | 10 | 6000 | 14000 | 161
truexs2nd_avg | 0 | | 6050 | 13950 | 159
truexs_hires | 0 | 1 | 6000 | 14000 | 8001
truexs2nd_hires | 0 | 1 | 6001 | 13999 | 7999
inttruexs_hires | 0 | 0.001 | 6101 | 13801 | 78
encalib (resolution) | 4 | 2 | | | 15
encalib (shift) | 0 | 0.05 | | | 15
encalib (scaling) | 0 | 0.05 | | | 15
Information about the nodes and the associated prior specifications is displayed in table 3. No effort was undertaken to base these specifications on the experimental details of the measurements. We assumed that each experimental point is affected by a relative statistical uncertainty of 3% (relative to the true high-resolution cross section). With the exception of the measurement by Paradela and colleagues, we assumed a 5% prior uncertainty for the relative energy-dependent systematic error (relative to the true average cross section). For the measurement of Paradela and colleagues, we specified that no energy-dependent systematic error is present. This choice is only for the sake of demonstration and should not be taken as a judgment about the quality of this or the other experiments.

The large prior uncertainties of truexs_avg, truexs_hires and truexs are chosen to express our a priori indifference about the cross section values. The prior uncertainties of truexs_avg are assumed larger than those of truexs_hires to favor larger adjustments of the former component, ensuring that it captures the average trend and the latter component the additive fluctuations.

The prior uncertainties associated with the pseudo-observations of the second derivatives were manually fine-tuned to obtain visually pleasing fits and were guided by the following considerations. The spacing of the equidistant energy mesh associated with truexs_avg is 50 eV. The square of the energy spacing appears in the denominator in eq. 76, so we already work with a baseline factor of $1/50^2 = 1/2500$. If we assume that the smooth trend changes by an amount $c$ (in barn) within 50 eV, which can be gauged from fig. 4, we end up with a prior uncertainty of the order of $c/2500$. To see this, we can choose, e.g., $f_{i-1} = f_i = 0$ and $f_{i+1} = c$ in eq. 76. Similar considerations lead to the prior uncertainty of one for the truexs2nd_hires node. Together with an energy spacing of 1 eV, this prior uncertainty implies the prior assumption that truexs_hires is expected to exhibit changes of about one barn within a one electronvolt energy interval. The spacing of the energy mesh of the relative energy-dependent systematic error associated with each experiment is 50 eV. The prior uncertainty assigned to relsyserr2nd implies that we expect it to change less quickly as a function of energy than the average trend given by truexs_avg.
Please note that all these assumptions are choices, which are ideally informed by physics knowledge but will in practice be based to some extent on the subjective judgment of the modeler. Data-driven approaches to determine good values for uncertainties, such as the one proposed in Schnabel (2018a), may help in specific cases but can be expected to be less useful in complex Bayesian networks due to the large amount of degeneracy.
The Bayesian network comprises in total 28304 variables. The associated prior covariance matrix is very sparse, with only a tiny proportion of non-zero elements. Only 2% of the elements in the inverse posterior covariance matrix associated with the independent variables are non-zero.
Due to the non-linear mappings between nodes, such as those associated with the relative normalization error, several local maxima exist in the posterior distribution, which we discovered by performing several optimization runs. To arrive at a good local maximum, we divided the optimization into several stages and only optimized a subset of the nodes in each stage. We followed the general heuristic to first adjust global components, such as truexs_avg, and to subsequently include local or highly fluctuating components in the optimization. Therefore, in the first stage, we adjusted only the true average cross section truexs_avg for a maximum of ten iterations.
Starting from the obtained values for the variables, we continued with the joint optimization of truexs_avg, relnormerr and relstaterr for a maximum of 30 iterations. Afterwards, we optimized jointly relnormerr, relstaterr and truexs_hires for a maximum of 30 iterations. In the final stage, we included all variables in the optimization with a maximum number of 300 iterations. This scheme was adopted after some trial and error. The complete optimization procedure took a couple of minutes.
III.2 No-model evaluation of cross sections with resonance structure
One part of the efforts within the International Nuclear Data Evaluation Network (INDEN) is focused on the evaluation of neutron-induced reactions of structural materials. One such material is 56Fe, which is notoriously difficult to evaluate in the incident energy range between a few hundred keV and about five MeV with either R-matrix fits or nuclear models. R-matrix fits are difficult because there is a large number of resonances which cannot be experimentally resolved. Nuclear models cannot describe this energy range well because they rely on the assumption that the structure due to resonances is completely averaged out, which is also not the case. A third option is a no-model fit, i.e., to use a mathematical function, such as a spline, as the fitting function. Several aspects make a no-model fit in this region difficult:
• the large number of mesh points needed to capture the resonant structure of the cross sections,
• the preservation of consistency between the elastic, inelastic and total cross section, and
• the enforcement of the constraint that cross sections must be non-negative.
Here we explore a potential solution, which uses the sparse Gaussian process construction described in this paper as a fitting function for the cross sections within the Bayesian framework, allowing a consistent evaluation of the elastic, inelastic and total cross section. We have included the datasets listed in table 4.
REAC | EXFOR | NUM | AUTHOR | YEAR | REF
-----|-------|-----|--------|------|----
EL | 40532014 | 2 | Korzh et al | 1977 | Korzh et al. (1977)
INL | 11700002 | 1 | Barrows | 1965 | Barrows (1965)
INL | 10529004 | 378 | Perey et al | 1971 | Perey et al. (1971)
INL | 32201002 | 4 | Korzh et al | 1994 | Korzh et al. (1994)
INL | 23134005 | 11 | Beyer et al | 2014 | Beyer et al. (2014a)
TOT | 13764002 | 426 | Harvey | 1987 |
TOT | 22316003 | 2050 | Rohr et al | 1995 |
We have created the Bayesian network depicted in fig. 5.
In the following description, we use REAC in a node name to indicate that there is a node of this form for each reaction channel. For example, the string truexs_REAC indicates that the nodes truexs_EL, truexs_INL and truexs_TOT are present in the network.
As an important clarification, we will use the term true cross section to refer to the quickly fluctuating cross section curve followed by the experimental data. However, as single resonances cannot be resolved anymore in this energy range, what we label as true cross section is in reality an average trend of the true cross section over several resonances. A more rigorous investigation of the pertinence of the Bayesian network approach to deal with unresolved resonances, the relationship to approaches based on probability tables, and the possibility of a coupling to processing codes for the resolved and unresolved resonance region, e.g., CALENDF Sublet and Ribon (2002), remains as future work. Quantitative analysis tools, such as the autocorrelation function to investigate the presence of structure Bertsch et al. (2018), are expected to help in this regard. We also note that we neglected the capture cross section in this example for the sake of simplicity.
The true cross section truexs_REAC of each reaction channel is modeled as a transformed sum of an average trend component truexs_avg_REAC and a high-resolution component truexs_hires_REAC. Regarding the transformation, let $a(E)$ be the value at energy $E$ of truexs_avg_REAC and $h(E)$ the value at the same energy of truexs_hires_REAC; the transformation is then given by $\max(0, a(E) + h(E))$. This non-linear transformation, known as rectified linear unit (ReLU) in the machine learning field, ensures that all cross sections remain non-negative quantities. We enforce different degrees of smoothness of truexs_avg_REAC and truexs_hires_REAC by introducing pseudo-observations of their second derivatives, represented by truexs2nd_avg_REAC and truexs2nd_hires_REAC, respectively. As we use non-informative priors for truexs_REAC and the second derivative is invariant to shifts of the cross section, we need to ensure that truexs_hires_REAC is on average zero so that truexs_avg_REAC captures the average cross section. As in the last example, we achieve this by introducing pseudo-observations inttruexs_hires_REAC of the high-resolution component, which represent averages of the high-resolution component in several overlapping intervals. These pseudo-observations are taken to be zero with a low uncertainty compared to the magnitude of the resonance-like fluctuations. The window size was manually fixed by inspecting the width of the resonance-like structures. Each experimental dataset was associated with an absolute normalization error represented by normerr. A summary of the individual nodes including their prior specifications and number of variables is given in table 5.
NODE | PRIOR | UNC | EMIN (MeV) | EMAX (MeV) | NUM
-----|-------|-----|------------|------------|----
expdata_INL | 0 | 70 | 1.00 | 2.00 | 394
expdata_TOT | 0 | | 1.00 | 2.00 | 2476
expdata_EL | 0 | | 1.50 | 2.00 | 2
truexs_INL | 0 | 0 | 0.75 | 2.25 | 1501
truexs_TOT | 0 | 0 | 0.75 | 2.25 | 1501
truexs_EL | 0 | 0 | 0.75 | 2.25 | 1501
normerr | 0 | | | | 7
truexs_hires_INL | 0 | | 0.75 | 2.25 | 1501
truexs_hires_TOT | 0 | 0 | 0.75 | 2.25 | 1501
truexs_hires_EL | 0 | | 0.75 | 2.25 | 1501
inttruexs_hires_INL | 0 | 50 | 1.10 | 1.90 | 9
inttruexs_hires_TOT | 0 | 50 | 1.10 | 1.90 | 9
inttruexs_hires_EL | 0 | 50 | 1.10 | 1.90 | 9
truexs2nd_hires_INL | 0 | | 0.75 | 2.25 | 1499
truexs2nd_hires_EL | 0 | | 0.75 | 2.25 | 1499
truexs_avg_INL | 0 | | 0.75 | 2.25 | 31
truexs_avg_TOT | 0 | 0 | 0.75 | 2.25 | 31
truexs_avg_EL | 0 | | 0.75 | 2.25 | 31
truexs2nd_avg_INL | 0 | | 0.80 | 2.20 | 29
truexs2nd_avg_EL | 0 | | 0.80 | 2.20 | 29

The customized LM algorithm finds the most likely values of the variables according to the posterior distribution in a few iterations, and the complete optimization finishes in about ten seconds. In this example, all relationships are linear and the GLS method can therefore exactly locate the posterior maximum. The LM algorithm takes a few more iterations because it starts out with a non-zero damping term, which leads to more careful steps compared to the GLS method.
The posterior estimates of the average and the high-resolution cross section component are shown in fig. 7 and fig. 8, respectively.
The average components track well the averages of the binned cross section indicated by the blue horizontal segments. The exception is the elastic cross section channel with only two data points, which are at first sight inconsistent with the measurements in the inelastic and total channel considering the sum rule. The evaluated curve is far above the two experimental data points. However, by inspecting the true elastic cross section (equaling the sum of the average and high-resolution components) in fig. 8 and taking into account the assumed energy resolution of 3 keV for all experiments, the data point at 1.5 MeV seems to be consistent with the evaluated curve. As an important remark, the Bayesian network framework enables the use of different energy resolutions for different experiments, but in this example we assumed for the sake of simplicity that all experiments have the same energy resolution.
Overall, we see that both the average trend and the resonant structure in the cross sections are well reproduced by the tentative evaluation. Additional fine-tuning of the prior uncertainties of the second derivatives at various energy locations can improve the coherence between the evaluation and the experimental data at the peaks and valleys.
The presence of the posterior uncertainty bands in the plots highlights the possibility of calculating selected blocks of the posterior covariance matrix, as described in section II.3. The time needed to compute the posterior covariance block associated with truexs_avg_REAC was only a couple of seconds. As a final demonstration of the possibility to selectively compute elements of the posterior covariance matrix, fig. 6 shows the correlations between the variables of truexs_avg_TOT and truexs_hires_EL. Weak negative correlations between those components only occur at the same energy, which is expected due to the short-range prior correlation within the high-resolution component. The horizontal stripe of weak negative correlation at 1.5 MeV of the high-resolution component is induced by the data point in the elastic channel at the same energy.
Even though there is a total of 15061 variables constituting the Bayesian network, there are only 3071 independent variables, which are associated with the nodes normerr, truexs_avg_EL, truexs_hires_EL, truexs_avg_INL and truexs_hires_INL. Only a small fraction of the elements in the inverse posterior covariance matrix of dimension 3071 × 3071 are non-zero.
III.3 Model based evaluation with consistent model defects
Nuclear model codes, such as CCONE Iwamoto et al. (2016), CoH3 Kawano (2021), EMPIRE Herman et al. (2007, 2013), GNASH Young and Arthur (1977), OPTMAN Soukhovitski et al. (2013) and TALYS Koning and Rochman (2012); Koning et al. (2019), can be used for an evaluation in the fast energy region. However, due to the complexity of nuclear processes and the limitations of nuclear models, discrepancies between the experimental data and the model predictions may remain even after the model parameters have been adjusted using the data. Two approaches relying on Gaussian processes have been explored in the past to address the issue of imperfect nuclear models in the fast energy range:
- 1. Gaussian processes as priors on energy-dependent model parameters of the nuclear model, and
- 2. Gaussian processes to model the deficiencies of the nuclear model itself, i.e., the discrepancy between the model predictions and the true cross sections.
Whereas the first approach has already been applied in a tentative full scale evaluation of neutron-induced reactions of 56Fe Schnabel et al. (2021), the second approach has only been explored in schematic examples. Obstacles to the application of the second approach are preserving the consistency between the various reaction channels, such as the sum of the exclusive reaction channels equaling the total cross section, and enforcing that all cross sections are non-negative according to the posterior distribution. In this example, we show the application of a Bayesian network to perform an evaluation following the second approach.
We employ the sparse Gaussian process construction explained in section II.7 within the Bayesian network framework to perform an evaluation of neutron-induced reactions of 56Fe between 5 MeV and 30 MeV with sum rule and positivity constraints preserved. We remark that a sum rule Gaussian process construction has already been suggested and studied in a toy example in Schnabel (2015) and an evaluation based on GPs under positivity constraint has been presented in Iwamoto (2020). Also the fitting of splines using linear regression can be regarded as a special case of Gaussian process regression and therefore Kawano et al. (2000) can also be considered as an example of GP regression under a positivity constraint. However, to the best of our knowledge, these two constraints have not been modeled together in a nuclear data evaluation so far. Here we elaborate on a tentative full scale evaluation enforcing both constraints at the same time. The experimental datasets used in this example evaluation are listed in appendix B.
We created the Bayesian network depicted in fig. 9 to perform a consistent evaluation of neutron-induced reactions of 56Fe using 91 experimental datasets with a total number of 2072 data points between 5 and 30 MeV in the following nine reaction channels: (N,EL), (N,A), (N,T), (N,D), (N,P), (N,INL), (N,N+P), (N,2N) and (N,TOT). Regarding the total cross section, we aggregated the data in bins of 0.2 MeV and used the bin averages as the experimental data. This measure was taken to make the plots less busy and, for the sake of simplicity, to avoid the modeling of a potential high-resolution component as done in the previous examples. After this aggregation, the total number of data points is reduced to 596. The experimental data along with the evaluated cross sections based on the Bayesian network are shown in fig. 10.
In the following, we describe the Bayesian network in more detail. If we use REAC in a node name, it implies that a node of this type exists for each reaction channel except the total cross section. For instance, truexs_REAC refers to truexs_(N,A), truexs_(N,P) and all the other channels, but not to truexs_(N,TOT).
The node auxmodpar contains the multipliers to be applied to the default model parameter values of the nuclear model code TALYS, as they are specified by the rvadjust and similar keywords in a TALYS input file. When we use the term model parameters from now on, we mean these multiplication factors. The prior estimate of each multiplication factor is one, meaning that the default value employed by TALYS is a priori considered the best guess, and the associated prior uncertainty is taken as . These parameters are propagated to modpar by a (deterministic) non-linear transformation that limits their range to and in order to avoid excessive adjustments and counteract potential overfitting to the available experimental data at the cost of predictive power. Any value of a parameter below in auxmodpar is mapped to in modpar and any value above to . The model parameters in modpar yield the predictions of the nuclear model in the pred_REAC nodes. To simplify this example, we use a linear approximation of the nuclear model with the default parameters as reference vector. The feasibility of incorporating the exact non-linear deterministic model at scale into the Bayesian inference procedure has already been demonstrated in Schnabel et al. (2021).
As TALYS is a deterministic code, the links from modpar to pred_REAC are deterministic as well. The predictions are an additive contribution to the true reaction cross sections in truexs_REAC. The other contribution is given by the model defects def_REAC associated with each reaction channel. We also apply a non-linear transformation to the sum of pred_REAC and def_REAC to ensure that the cross sections in truexs_REAC are non-negative. If the result of the sum at any energy is negative, it is mapped to zero.
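As an illustration of these two deterministic link functions, the following minimal Python sketch implements a range-limiting map from auxmodpar to modpar and the positivity map applied to the sum of prediction and defect. The clipping bounds LO and HI are hypothetical stand-ins, since the concrete values are not restated here.

```python
import numpy as np

# Minimal sketch of the two deterministic link functions described above.
# The bounds LO and HI are illustrative assumptions, not the values used
# in the actual evaluation.
LO, HI = 0.9, 1.1

def limit_range(auxmodpar):
    """Map parameter multipliers (auxmodpar) to modpar: values below LO
    are mapped to LO and values above HI to HI."""
    return np.clip(auxmodpar, LO, HI)

def enforce_positivity(pred_plus_def):
    """Map the sum of model prediction and model defect to a non-negative
    true cross section: negative values are mapped to zero."""
    return np.maximum(pred_plus_def, 0.0)

print(limit_range(np.array([0.85, 1.00, 1.25])))    # -> [0.9  1.   1.1]
print(enforce_positivity(np.array([-0.02, 3.10])))  # -> [0.   3.1]
```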
The model defects are given relative to the model predictions, which introduces a non-linear interaction between def_REAC and pred_REAC, indicated by the dashed connections. The prior estimate of these model defects at all energies is zero. The prior uncertainty is 20% for energies above the reaction-specific threshold and zero below. To ensure a certain smoothness, we introduced the nodes def2nd_REAC representing the second derivatives of def_REAC and assumed that a zero value was observed at all energy locations with small uncertainty. This construction, as explained in section II.7, enforces a certain degree of smoothness of the def_REAC nodes and guards against kinks caused, for instance, by inconsistent experimental datasets. Because the variables of the node def_REAC are relative to pred_REAC, the smoothness observations in def2nd_REAC are likewise defined in terms of relative changes. This feature is helpful because the expected absolute variations of different cross sections can differ considerably depending on their magnitude.
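A minimal sketch of this smoothness construction is given below: it assembles a second-difference operator on a (possibly non-uniform) energy mesh and attaches zero-valued pseudo-observations of the second derivative with a diagonal covariance matrix. The mesh and the pseudo-observation uncertainty tau are illustrative assumptions.

```python
import numpy as np

def second_difference_matrix(x):
    """Matrix D such that D @ f approximates the second derivative of f
    at the interior points of the (possibly non-uniform) mesh x."""
    n = len(x)
    D = np.zeros((n - 2, n))
    for i in range(1, n - 1):
        hl, hr = x[i] - x[i - 1], x[i + 1] - x[i]
        D[i - 1, i - 1] = 2.0 / (hl * (hl + hr))
        D[i - 1, i] = -2.0 / (hl * hr)
        D[i - 1, i + 1] = 2.0 / (hr * (hl + hr))
    return D

x = np.linspace(5.0, 30.0, 26)     # energy mesh in MeV (assumed)
D = second_difference_matrix(x)
obs = np.zeros(D.shape[0])         # "observed" zero second derivatives
tau = 0.05                         # small pseudo-observation uncertainty
R = tau**2 * np.eye(D.shape[0])    # diagonal covariance of the pseudo-obs
```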
The sum of the true reaction cross sections truexs_REAC yields the total cross section truexs_(N,TOT). Please note that, to keep the display of the Bayesian network in this paper manageable, we did not account for some minor channels, which yield contributions of a few millibarn at 15 MeV, several dozen millibarn at 20 MeV and about one hundred millibarn at 25 MeV; this omission introduces a bias of corresponding size distributed over the reaction channels in this example evaluation.
At this point in the explanation, we have covered the construction of the true cross sections as a function of the nuclear physics model and the model defect components, with the constraints that the individual reaction channels must sum to the total cross section and that all cross sections are non-negative quantities. The remainder of the explanation covers the modelization of the experimental data and their relationship to the true cross sections. From now on, the appearance of REAC in a node name implies that a node of this type is present for all reaction channels, hence also including the total cross section (N,TOT). Furthermore, we aggregated the nodes of the different experimental datasets in each channel into one node for a manageable display of the Bayesian network. A node such as staterr_(N,A) in the display of the Bayesian network can be expanded into one node per individual dataset, without connections between those nodes.
Each experimental dataset exp_REAC is modeled as a sum of three components, which are the true reaction cross section truexs_REAC, a relative normalization error normerr_REAC and a relative statistical error staterr_REAC. Both error components are given relative to the true reaction cross section, and the non-linear interactions between the source nodes are indicated by dashed lines from normerr_REAC to truexs_REAC and from staterr_REAC to truexs_REAC. The prior uncertainty of all relative statistical errors is assumed to be 1% and the prior uncertainty of the normalization error of each experimental dataset is 10%. This description of the nodes associated with the properties of the experiments completes the explanation of the Bayesian network topology depicted in fig. 9, which comprises a total of 4895 variables associated with the nodes.
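Written out, an experimental value is therefore modeled as truexs × (1 + normerr + staterr). The following sketch, with hypothetical numbers, shows this non-linear measurement mapping; the 1% and 10% values are the prior standard deviations stated above.

```python
import numpy as np

def exp_model(truexs, normerr, staterr):
    """Experimental values as the true cross section plus a dataset-wide
    relative normalization error and pointwise relative statistical errors."""
    return truexs * (1.0 + normerr + staterr)

truexs = np.array([0.60, 0.62, 0.65])        # barn, hypothetical values
normerr = 0.03                                # one multiplier per dataset
staterr = np.array([0.004, -0.007, 0.011])    # one value per data point
print(exp_model(truexs, normerr, staterr))
```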
Tentative optimization runs using the customized Levenberg-Marquardt algorithm revealed several local maxima in the posterior distribution. We obtained the highest value of the posterior density, which may represent the global maximum, by dividing the optimization procedure into two stages. In the first stage, we allowed only the adjustment of the variables in the nodes def_REAC, normerr_REAC and staterr_REAC. In the second stage, starting from the optimized values of the first stage, we allowed the adjustment of all independent variables, which includes the variables of the first stage and in addition auxmodpar. The optimization procedure finished in a couple of minutes.
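A schematic transcription of this two-stage strategy is sketched below, with SciPy's stock Levenberg-Marquardt solver standing in for the customized algorithm; residuals and stage1_idx are hypothetical placeholders for the residual function of the Bayesian network and the indices of the first-stage variables.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_in_stages(residuals, z0, stage1_idx):
    """Stage 1: adjust only the variables indexed by stage1_idx
    (here: def_REAC, normerr_REAC, staterr_REAC), keeping the rest fixed.
    Stage 2: adjust all independent variables, including auxmodpar."""
    z = np.array(z0, dtype=float)

    def stage1_residuals(sub):
        zz = z.copy()
        zz[stage1_idx] = sub
        return residuals(zz)

    z[stage1_idx] = least_squares(stage1_residuals, z[stage1_idx],
                                  method="lm").x
    return least_squares(residuals, z, method="lm").x
```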
The most likely cross sections according to the posterior distribution, which we refer to as evaluated cross sections, are depicted in fig. 10. In most cases, the evaluated cross sections coincide with the best prior estimate given by the model prediction using the default values of the model parameters. Only one evaluated cross section is significantly below the prior estimate, and this difference is associated with the evaluated model defect component shown in fig. 11. We double-checked that this reduction of the cross section due to the model defect component is done consistently, so that the sum of the exclusive cross sections still equals the total cross section.
The most likely model parameters and their posterior credible intervals are depicted in fig. 12. The posterior estimates of all model parameters are very close to the prior ones; only the posterior uncertainty of some parameters is significantly smaller than the prior uncertainty. The additional flexibility introduced by the model defect components reduces the need to adjust model parameters to explain the experimental data. A model defect with a prior uncertainty of 20% renders the adjustment of model parameters largely unnecessary, as most experimental data lie in a 20% corridor around the model prediction with default parameter values.
From a physics point of view, it is interesting to know which model parameters are constrained most by the data. The TALYS input keyword aadjust 25 56 is a multiplication factor applied to the TALYS default value of the level density parameter for 56Mn appearing in the Fermi gas model. The adjust keywords starting with rv, av and v1 represent multiplication factors applied to the default values of the optical model parameters for a specific projectile indicated at the end of the keyword string. The radius appearing in the Woods-Saxon factor of the real and imaginary component of the volume-central potential is adjusted by rvadjust. The depth parameter, which appears as a factor in the real component of the volume-central potential, is adjusted by v1adjust. The diffuseness parameter in the Woods-Saxon factor appearing in the real and imaginary component of the volume-central potential is adjusted by avadjust. The keyword gnadjust 26 57 adjusts for 57Fe the single-particle neutron level density parameter that appears in the particle-hole state density expression of Dobeš and Běták Dobeš and Běták (1983) used in the pre-equilibrium exciton model. The keyword egradjust 25 57 e1 is a normalization factor for the energy of the giant dipole resonance. More information on these input keywords and how they impact the nuclear models can be found in the TALYS user manual.
The plots in fig. 10 contain several outlying datasets. These datasets are also revealed as outliers by inspecting the posterior estimates and uncertainties of the relative normalization errors. For example, for the dataset in the (N,2N) channel located significantly below the peak (EXFOR identification 20854015, see appendix B), the posterior estimate of the relative normalization error is -44% and the posterior uncertainty of this value is 0.9%. This result is incompatible with the 10% prior uncertainty of the normalization error. The possibility to obtain posterior estimates and uncertainties of all nodes in the Bayesian network therefore provides an algorithmic way to identify outliers.
Finally, we remark that in tentative optimization runs that converged to local maxima associated with smaller values of the posterior density function, the posterior estimates of the model parameters also differed significantly from the prior ones.
IV Summary and outlook
Starting from the Generalized Least Squares method, we presented the equations to perform inference in Bayesian networks with multivariate normal priors on the variables associated with each node and linear, non-linear, deterministic and nested relationships between the variables belonging to different nodes. Furthermore, we elaborated on a sparse Gaussian process construction that blends well with the Bayesian network interpretation and is suitable for scenarios with a large number of variables. One key ingredient of this construction is to use a discretized version of a function, defined on a mesh, with linear interpolation between mesh points. The second ingredient is to use a diagonal prior covariance matrix for the function values at the mesh points and to regulate the degree of smoothness by pseudo-observations of the second derivative, with the associated covariance matrix also being diagonal. An advantage of this GP construction, besides increased computational and storage efficiency, is the possibility to define prior knowledge about the magnitude and smoothness of the unknown function in a fine-grained manner, individually for different energy regions. We also reviewed two important non-linear mappings to model energy calibration errors and relative experimental errors, which can be rigorously taken into account in the outlined Bayesian network framework by using a customized Levenberg-Marquardt algorithm Helgesson and Sjöstrand (2017); Schnabel et al. (2021) to exactly locate the posterior maximum even in the presence of non-linear relationships. It was also explained how samples can be obtained from an approximate posterior distribution and how selected blocks of an approximated posterior covariance matrix can be computed, even in the case of a large number of variables, by exploiting the sparseness of the inverse posterior covariance matrix. The approximate posterior distribution is close to the true posterior distribution if the non-linear relationships are in good approximation linear over the parameter domain associated with significant values of the posterior density function. If this assumption is strongly violated, Monte Carlo sampling schemes should be preferred to extract samples or blocks of the posterior covariance matrix from the posterior distribution.
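To make the sampling remark concrete, the following sketch draws samples from a multivariate normal distribution given only its inverse covariance matrix, which is the sparse object in this framework. For brevity it uses a dense Cholesky factorization; at scale, a sparse Cholesky decomposition with a fill-reducing permutation would take its place.

```python
import numpy as np

def sample_from_inverse_covariance(x_map, inv_cov, num_samples, seed=None):
    """Draw samples from N(x_map, P) given P^{-1}. If P^{-1} = L L^T,
    then u = L^{-T} z with z ~ N(0, I) satisfies Cov(u) = P."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(inv_cov)
    z = rng.standard_normal((inv_cov.shape[0], num_samples))
    u = np.linalg.solve(L.T, z)
    return x_map[:, None] + u
```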
We provided three proof-of-concept examples of practical relevance for performing nuclear data evaluations with Bayesian networks in combination with the outlined sparse GP construction. The first example demonstrated the evaluation of the 235U(n,f) cross section including a relative energy-dependent systematic error. Such an energy-dependent error contribution can, for example, be used to model errors associated with Unrecognized Sources of Uncertainty (USU) in the neutron data standards project Carlson et al. (2009, 2018). The second example was a no-model evaluation of neutron-induced reactions of the important structural material 56Fe in the energy range between one and two MeV, where it is difficult to obtain satisfactory evaluations with either R-matrix or nuclear model fits due to the many overlapping resonances causing rapid fluctuations of the cross sections. The third example demonstrated a model-based evaluation of neutron-induced reactions of 56Fe between 5 and 30 MeV, including model defect components to account for discrepancies between the model predictions and the experimental data while preserving sum rule and positivity constraints.
The presentation of the examples focused on the point of view of the modeler, who can build a Bayesian network by first defining a data table with the variables and then linking these variables to each other using various mappings, such as linear interpolation mappings, convolutions and non-linear transformations, as building blocks. The diverse aspects of the examples, such as the presence or absence of a physics model or of fluctuations, hinted at the general applicability of the Bayesian network interpretation to many different evaluation scenarios.
The examples also showed that the customized Levenberg-Marquardt algorithm is able to find good solutions in a reasonable amount of time. The optimization process never took longer than a couple of minutes on a personal computer. However, a point that needs improvement is the implementation of heuristics to ensure that a good local (and ideally the global) maximum of the posterior distribution is found. Common non-linear mappings between variables, such as relative normalization errors, usually yield multiple local maxima in the posterior distribution. In the examples, we dealt with this issue by performing the optimization in several stages and optimizing only a subset of the variables in each stage. The determination of the final optimization strategy was the result of trial and error.
We believe that the modelization of an evaluation scenario as a Bayesian network can help to make nuclear data evaluation more transparent and less error-prone. The possibility to get posterior estimates and uncertainties of all quantities involved, such as the various systematic error components, helps to quickly identify problematic modeling assumptions by checking the compatibility of the posterior estimates with the prior estimates and uncertainties. At some point in the future, templates of measurement uncertainties Neudecker et al. (2018, 2020a) may be used for the specification of default priors on the experimental errors associated with different measurement types, and also to check the compatibility of the posterior estimates with sensible ranges given by a template. The Bayesian network framework is also compatible with the procedures described in Schnabel (2018a); Siefman et al. (2020, 2021); Schnabel et al. (2021) for the automated determination of missing or misspecified uncertainties. As the relations between experiments in nuclear databases, such as EXFOR Otuka et al. (2014), can also be represented as a graph Brown (2014); Hirdt and Brown (2016), with links established by common features, Bayesian networks may be used there for the automatic correction of experimental datasets and for outlier detection. These procedures are complementary to machine learning approaches, such as those presented in Whewell et al. (2020); Neudecker et al. (2020b); Dwivedi (2019), to help humans make sense of data and enhance evaluations.
References
- Muir (1989) D. W. Muir, Nuclear Science and Engineering 101, 88 (1989).
- Smith (1991) D. L. Smith, Probability, Statistics, and Data Uncertainties in Nuclear Science and Technology (Amer Nuclear Society, LaGrange Park, Ill, 1991).
- Capote et al. (2010) R. Capote, D. Smith, and A. Trkov, EPJ Web of Conferences 8, 04001 (2010).
- Muir et al. (2007) D. W. Muir, A. Trkov, I. Kodeli, R. Capote, and V. Zerkin (EDP Sciences, 2007).
- Schnabel and Leeb (2017) G. Schnabel and H. Leeb, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 841, 87 (2017).
- Rasmussen and Williams (2006) C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, Mass., 2006).
- Stein (1999) M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer Series in Statistics (Springer, New York Berlin Heidelberg, 1999).
- Tarantola (2005) A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation (Society for Industrial and Applied Mathematics, Philadelphia, PA, 2005).
- Pigni and Leeb (2003) M. T. Pigni and H. Leeb, in Proc. Int. Workshop on Nuclear Data for the Transmutation of Nuclear Waste. GSI-Darmstadt, Germany (2003).
- Leeb et al. (2008) H. Leeb, D. Neudecker, and T. Srdinko, Nuclear Data Sheets 109, 2762 (2008).
- Schnabel and Leeb (2016) G. Schnabel and H. Leeb, EPJ Web of Conferences 111, 09001 (2016).
- Helgesson and Sjöstrand (2017) P. Helgesson and H. Sjöstrand, Review of Scientific Instruments 88, 115114 (2017).
- Helgesson and Sjöstrand (2018) P. Helgesson and H. Sjöstrand, Annals of Nuclear Energy 120, 35 (2018).
- Schnabel et al. (2021) G. Schnabel, H. Sjöstrand, J. Hansson, D. Rochman, A. Koning, and R. Capote, Nuclear Data Sheets Special Issue on Nuclear Reaction Data, 173, 239 (2021).
- Iwamoto (2020) H. Iwamoto, Journal of Nuclear Science and Technology 57, 932 (2020).
- Smith and Naberejnev (2002) D. Smith and L. A. Naberejnev, “Large Errors and Severe Conditions,” (2002).
- Capote and Smith (2008) R. Capote and D. L. Smith, Nuclear Data Sheets 109, 2768 (2008).
- Koller and Friedman (2009) D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques (MIT Press, Cambridge, MA, 2009).
- Pearl (2014) J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1st ed. (Morgan Kaufmann, 2014).
- Pearl (1985) J. Pearl, in Proc. of Cognitive Science Society (CSS-7) (1985).
- Buss et al. (2010) O. Buss, A. Hoefer, and J. Neuber, in International Conference on the Physics of Reactors (PHYSOR 2010), Vol. 3 (Pittsburgh, Pennsylvania, 2010).
- Schnabel (2018a) G. Schnabel, arXiv:1803.00960 [nucl-ex, physics:nucl-th, physics:physics] (2018a), arXiv:1803.00960 [nucl-ex, physics:nucl-th, physics:physics] .
- Plompen et al. (2020) A. J. M. Plompen, O. Cabellos, C. De Saint Jean, M. Fleming, A. Algora, M. Angelone, P. Archier, E. Bauge, O. Bersillon, A. Blokhin, F. Cantargi, A. Chebboubi, C. Diez, H. Duarte, E. Dupont, J. Dyrda, B. Erasmus, L. Fiorito, U. Fischer, D. Flammini, D. Foligno, M. R. Gilbert, J. R. Granada, W. Haeck, F.-J. Hambsch, P. Helgesson, S. Hilaire, I. Hill, M. Hursin, R. Ichou, R. Jacqmin, B. Jansky, C. Jouanne, M. A. Kellett, D. H. Kim, H. I. Kim, I. Kodeli, A. J. Koning, A. Y. Konobeyev, S. Kopecky, B. Kos, A. Krása, L. C. Leal, N. Leclaire, P. Leconte, Y. O. Lee, H. Leeb, O. Litaize, M. Majerle, J. I. Márquez Damián, F. Michel-Sendis, R. W. Mills, B. Morillon, G. Noguère, M. Pecchia, S. Pelloni, P. Pereslavtsev, R. J. Perry, D. Rochman, A. Röhrmoser, P. Romain, P. Romojaro, D. Roubtsov, P. Sauvan, P. Schillebeeckx, K. H. Schmidt, O. Serot, S. Simakov, I. Sirakov, H. Sjöstrand, A. Stankovskiy, J. C. Sublet, P. Tamagno, A. Trkov, S. van der Marck, F. Álvarez-Velarde, R. Villari, T. C. Ware, K. Yokoyama, and G. Žerovnik, The European Physical Journal A 56, 181 (2020).
- Shibata et al. (2011) K. Shibata, O. Iwamoto, T. Nakagawa, N. Iwamoto, A. Ichihara, S. Kunieda, S. Chiba, K. Furutaka, N. Otuka, T. Ohsawa, T. Murata, H. Matsunobu, A. Zukeran, S. Kamada, and J.-i. Katakura, Journal of Nuclear Science and Technology 48, 1 (2011).
- Brown et al. (2018) D. Brown, M. Chadwick, R. Capote, A. Kahler, A. Trkov, M. Herman, A. Sonzogni, Y. Danon, A. Carlson, M. Dunn, D. Smith, G. Hale, G. Arbanas, R. Arcilla, C. Bates, B. Beck, B. Becker, F. Brown, R. Casperson, J. Conlin, D. Cullen, M.-A. Descalle, R. Firestone, T. Gaines, K. Guber, A. Hawari, J. Holmes, T. Johnson, T. Kawano, B. Kiedrowski, A. Koning, S. Kopecky, L. Leal, J. Lestone, C. Lubitz, J. Márquez Damián, C. Mattoon, E. McCutchan, S. Mughabghab, P. Navratil, D. Neudecker, G. Nobre, G. Noguere, M. Paris, M. Pigni, A. Plompen, B. Pritychenko, V. Pronyaev, D. Roubtsov, D. Rochman, P. Romano, P. Schillebeeckx, S. Simakov, M. Sin, I. Sirakov, B. Sleaford, V. Sobes, E. Soukhovitskii, I. Stetcu, P. Talou, I. Thompson, S. van der Marck, L. Welser-Sherrill, D. Wiarda, M. White, J. Wormald, R. Wright, M. Zerkle, G. Žerovnik, and Y. Zhu, Nuclear Data Sheets 148, 1 (2018).
- Levitt (1972) L. B. Levitt, Nuclear Science and Engineering 49, 450 (1972).
- Hoefer and Buss (2021) A. Hoefer and O. Buss, Annals of Nuclear Energy 162, 108490 (2021).
- Schnabel (2018b) G. Schnabel, EPJ Nuclear Sciences & Technologies 4, 33 (2018b).
- Carlson et al. (2009) A. Carlson, V. Pronyaev, D. Smith, N. Larson, Z. Chen, G. Hale, F.-J. Hambsch, E. Gai, S.-Y. Oh, S. Badikov, T. Kawano, H. Hofmann, H. Vonach, and S. Tagesen, Nuclear Data Sheets 110, 3215 (2009).
- Carlson et al. (2018) A. Carlson, V. Pronyaev, R. Capote, G. Hale, Z.-P. Chen, I. Duran, F.-J. Hambsch, S. Kunieda, W. Mannhart, B. Marcinkevicius, R. Nelson, D. Neudecker, G. Noguere, M. Paris, S. Simakov, P. Schillebeeckx, D. Smith, X. Tao, A. Trkov, A. Wallner, and W. Wang, Nuclear Data Sheets 148, 143 (2018).
- Capote et al. (2020) R. Capote, S. Badikov, A. Carlson, I. Duran, F. Gunsing, D. Neudecker, V. Pronyaev, P. Schillebeeckx, G. Schnabel, D. Smith, and A. Wallner, Nuclear Data Sheets 163, 191 (2020).
- Larson (1998) N. M. Larson, Updated User’s Guide for SAMMY: Multilevel R-Matrix Fits to Neutron Data Using Bayes’ Equation, Revision 4, Tech. Rep. ORNL/TM-9179/R4 (Oak Ridge National Laboratory, Oak Ridge, 1998).
- Moxon et al. (2010) M. C. Moxon, T. C. Ware, and C. J. Dean, REFIT-2009 A Least-Square Fitting Program for Resonance Analysis of Neutron Transmission, Capture, Fission and Scattering Data Users’ Guide for REFIT-2009-10, Tech. Rep. UKNSF(2010)P243 (April 2010).
- De Saint Jean et al. (2021) C. De Saint Jean, P. Tamagno, P. Archier, and G. Noguere, EPJ Nuclear Sciences & Technologies 7, 10 (2021).
- Young and Arthur (1977) P. Young and E. Arthur, GNASH: A Preequilibrium, Statistical Nuclear-Model Code for Calculation of Cross Sections and Emission Spectra. [In FORTRAN for CDC 7600], Tech. Rep. LA-6947, 5224192 (1977).
- Herman et al. (2007) M. Herman, R. Capote, B. Carlson, P. Obložinský, M. Sin, A. Trkov, H. Wienke, and V. Zerkin, Nuclear Data Sheets 108, 2655 (2007).
- Koning and Rochman (2012) A. Koning and D. Rochman, Nuclear Data Sheets 113, 2841 (2012).
- Iwamoto et al. (2016) O. Iwamoto, N. Iwamoto, S. Kunieda, F. Minato, and K. Shibata, Nuclear Data Sheets 131, 259 (2016).
- Kawano (2021) T. Kawano, in Compound-Nuclear Reactions, Vol. 254, edited by J. Escher, Y. Alhassid, L. A. Bernstein, D. Brown, C. Fröhlich, P. Talou, and W. Younes (Springer International Publishing, Cham, 2021) pp. 27–34.
- Koning et al. (2019) A. Koning, D. Rochman, J.-C. Sublet, N. Dzysiuk, M. Fleming, and S. van der Marck, Nuclear Data Sheets 155, 1 (2019).
- Schnabel G. Schnabel, “nucdataBaynet - Nuclear Data Evaluation with Bayesian Networks,” https://github.com/IAEA-NDS/nucdataBaynet.
- Herman et al. (2013) M. Herman, R. Capote, M. Sin, A. Trkov, B. V. Carlson, P. Oblozinsky, C. Mattoon, H. Wienke, S. Hoblit, Y. Cho, G. Nobre, V. Plujko, and V. Zerkin, EMPIRE-3.2 Malta-Modular System for Nuclear Reaction Calculations and Nuclear Data Evaluation, Tech. Rep. INDC (NDS)-0603 (IAEA, Vienna, 2013).
- Poenitz (1981) W. P. Poenitz, in Proceedings of the Conference on Nuclear Data Evaluation Methods and Procedures, INDC(USA)-85, Vol. 1 (Upton, NY 11973, 1981).
- Poenitz and Aumeier (1997) W. P. Poenitz and S. E. Aumeier, The Simultaneous Evaluation of the Standards and Other Cross Sections of Importance for Technology, Tech. Rep. 139 (Argonne National Lab., IL (USA), 1997).
- Kawano and Shibata (1997) T. Kawano and K. Shibata, Covariance Evaluation System, Tech. Rep. JAERI-Data/Code 97-037 (JAERI, Japan, 1997).
- Kawano et al. (2000) T. Kawano, H. Matsunobu, T. Murata, A. Zukeran, Y. Nakajima, M. Kawai, O. Iwamoto, K. Shibata, T. Nakagawa, T. Ohsawa, M. Baba, and T. Yoshida, Journal of Nuclear Science and Technology 37, 327 (2000).
- Koning (2015) A. J. Koning, The European Physical Journal A 51, 184 (2015).
- Fröhner (2000) F. Fröhner, Evaluation and Analysis of Nuclear Resonance Data, Tech. Rep. 18 (Institut für Neutronenphysik und Reaktortechnik, Karlsruhe, 2000).
- von der Linden et al. (2014) W. von der Linden, V. Dose, and U. von Toussaint, Bayesian Probability Theory: Applications in the Physical Sciences (Cambridge University Press, Cambridge, United Kingdom, 2014).
- Niu and Liang (2018) Z. Niu and H. Liang, Physics Letters B 778, 48 (2018).
- Chen et al. (2010) G. Chen, Z. Yang, and J. Sun, Procedia Engineering 7, 81 (2010).
- Mohan et al. (2020) V. K. D. Mohan, P. Vardon, J. Daniell, P. Gehl, A. Schafer, P. van Gelder, V. Natarajan, C. Molenaar, E. Foerster, and F. Ragon, in EGU General Assembly Conference Abstracts, EGU General Assembly Conference Abstracts (2020) p. 21036.
- Chen et al. (2021) S. Chen, L. Zhang, T. Qing, and X. Liu, Journal of Nuclear Science and Technology , 1 (2021).
- Kang et al. (2018) H. G. Kang, S. H. Lee, S. J. Lee, T.-L. Chu, A. Varuttamaseni, M. Yue, S. Yang, H. S. Eom, J. Cho, and M. Li, Annals of Nuclear Energy 120, 62 (2018).
- Ramos et al. (2021) D. Ramos, P. Ramirez-Hereza, D. T. Toledano, J. Gonzalez-Rodriguez, A. Ariza-Velazquez, D. Solis-Tovar, and C. Muñoz-Reja, Chemometrics and Intelligent Laboratory Systems 214, 104327 (2021).
- Freeman (2009) C. Freeman (2009).
- Sivia (1996) D. S. Sivia, Data Analysis: A Bayesian Tutorial (Clarendon Press, 1996).
- Koning and Rochman (2008) A. J. Koning and D. Rochman, Annals of Nuclear Energy 35, 2024 (2008).
- Rochman et al. (2009) D. Rochman, A. J. Koning, and S. C. van der Marck, Annals of Nuclear Energy 36, 810 (2009).
- Rochman et al. (2010) D. Rochman, A. J. Koning, D. F. da Cruz, P. Archier, and J. Tommasi, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 612, 374 (2010).
- Press (2007) W. H. Press, Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK; New York, 2007).
- Davis (2006) T. A. Davis, Direct Methods for Sparse Linear Systems (Society for Industrial and Applied Mathematics, 2006).
- George and Liu (1989) A. George and J. W. Liu, SIAM Review 31, 1 (1989).
- Bates and Maechler (2021) D. Bates and M. Maechler, Matrix: Sparse and Dense Matrix Classes and Methods (2021).
- Chen et al. (2008) Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, ACM Transactions on Mathematical Software 35, 1 (2008).
- Levenberg (1944) K. Levenberg, Quarterly of Applied Mathematics 2, 164 (1944).
- Marquardt (1963) D. W. Marquardt, Journal of the Society for Industrial and Applied Mathematics 11, 431 (1963).
- Schnabel (2015) G. Schnabel, Large Scale Bayesian Nuclear Data Evaluation with Consistent Model Defects, Ph.D. thesis, Technische Universität Wien, Vienna (2015).
- Quiñonero-Candela and Rasmussen (2005) J. Quiñonero-Candela and C. E. Rasmussen, Journal of Machine Learning Research 6, 1939 (2005).
- Snelson and Ghahramani (2006) E. Snelson and Z. Ghahramani, in Advances in Neural Information Processing Systems (2006) pp. 1257–1264.
- Wang et al. (2019) K. Wang, G. Pleiss, J. Gardner, S. Tyree, K. Q. Weinberger, and A. G. Wilson, in Advances in Neural Information Processing Systems, Vol. 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019).
- Pleiss et al. (2020) G. Pleiss, M. Jankowiak, D. Eriksson, A. Damle, and J. R. Gardner, arXiv e-prints , arXiv:2006.11267 (2020), arXiv:2006.11267 [cs.LG] .
- Saatchi (2011) Y. Saatchi, Scalable Inference for Structured Gaussian Process Models, Ph.D. thesis (2011).
- Jaynes and Rosenkrantz (1983) E. T. Jaynes and R. D. Rosenkrantz, E.T. Jaynes: Papers on Probability, Statistics, and Statistical Physics (D. Reidel ; Sold and distributed in the U.S.A. and Canada by Kluwer Boston, Dordrecht, Holland; Boston; Hingham, MA, 1983).
- Shore and Johnson (1980) J. Shore and R. Johnson, IEEE Transactions on Information Theory 26, 26 (1980).
- Schnabel (2018c) G. Schnabel, EPJ Nuclear Sciences & Technologies 4, 33 (2018c).
- Schnabel and Sjöstrand (2018) G. Schnabel and H. Sjöstrand, arXiv:1811.03874 [nucl-ex, physics:nucl-th, physics:physics] (2018), arXiv:1811.03874 [nucl-ex, physics:nucl-th, physics:physics] .
- Green and Silverman (1993) P. Green and B. W. Silverman, Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, zeroth ed. (Chapman and Hall/CRC, 1993).
- Tikhonov and Arsenin (1977) A. N. Tikhonov and V. I. Arsenin, Solutions of Ill-Posed Problems, Scripta Series in Mathematics (Winston ; distributed solely by Halsted Press, Washington : New York, 1977).
- de Baar et al. (2014) J. H. de Baar, R. P. Dwight, and H. Bijl, International Journal for Uncertainty Quantification 4, 205 (2014).
- Blons et al. (1971) J. Blons, H. Derrien, and A. Michaudon, in 3rd Conf. Neutron Cross Sections and Technology, Vol. 2 (Knoxville, 1971) p. 829.
- Migneco et al. (1975) E. Migneco, P. Bonsignore, G. Lanzano, J. A. Wartena, H. Weigmann, M. G. Cao, J. P. Theobald, and J. Winter, in Conf. on Nuclear Cross Sections and Technology (Washington, 1975) p. 607.
- Wagemans and Deruytter (1976) C. Wagemans and A. Deruytter, Annals of Nuclear Energy 3, 437 (1976).
- Weston and Todd (1984) L. W. Weston and J. H. Todd, Nuclear Science and Engineering 88, 567 (1984).
- Paradela et al. (2016) C. Paradela, I. Duran, L. Tassan-Got, L. Audouin, B. Berthier, S. Isaev, C. Le Naour, C. Stephan, D. Tarrío, U. Abbondanno, G. Aerts, H. Álvarez-Pol, F. Álvarez-Velarde, S. Andriamonje, J. Andrzejewski, G. Badurek, P. Baumann, F. Becvar, E. Berthoumieux, F. Calviño, M. Calviani, D. Cano-Ott, R. Capote, C. Carrapiço, P. Cennini, V. Chepel, E. Chiaveri, N. Colonna, G. Cortes, A. Couture, J. Cox, M. Dahlfors, S. David, I. Dillmann, C. Domingo-Pardo, W. Dridi, C. Eleftheriadis, M. Embid-Segura, L. Ferrant, A. Ferrari, R. Ferreira-Marques, K. Fujii, W. Furman, I. Gonçalves, E. Gonzalez-Romero, A. Goverdovski, F. Gramegna, C. Guerrero, F. Gunsing, R. Haight, M. Heil, M. Igashira, E. Jericha, Y. Kadi, F. Kaeppeler, D. Karadimos, M. Kerveno, V. Ketlerov, P. Koehler, V. Konovalov, M. Krticka, C. Lampoudis, C. Lederer, H. Leeb, A. Lindote, S. Lukic, J. Marganiec, T. Martinez, S. Marrone, C. Massimi, P. Mastinu, A. Mengoni, P. Milazzo, C. Moreau, M. Mosconi, S. Pancin, J., A. Pavlik, P. Pavlopoulos, L. Perrot, R. Plag, A. Plompen, A. Plukis, A. Poch, C. Pretel, J. Praena, J. Quesada, T. Rauscher, R. Reifarth, C. Rubbia, G. Rudolf, P. Rullhusen, J. Salgado, C. Santos, L. Sarchiapone, I. Savvidis, G. Tagliente, J. Tain, L. Tavora, R. Terlizzi, P. Vaz, A. Ventura, D. Villamarin, M. Vincente, V. Vlachoudis, R. Vlastou, F. Voss, S. Walter, C. Weiss, M. Wiesher, and K. Wisshak, EPJ Web of Conferences 111, 02003 (2016).
- Fukushima (1969) K. Fukushima, IEEE Transactions on Systems Science and Cybernetics 5, 322 (1969).
- Korzh et al. (1977) I. A. Korzh, V. A. Mishchenko, E. N. Mozhzhukhin, N. M. Pravdivij, and I. E. Sanzhur, Ukrainskii Fizichnii Zhurnal 22, 87 (1977).
- Barrows (1965) T. W. J. Barrows, A Study of the (n,Ng) Reactions in V51 and Mn55 and the Reaction F19(d,n)Ne20, Ph.D. thesis (1965).
- Perey et al. (1971) F. G. Perey, W. E. Kinney, and R. L. Macklin, in 3rd Conf. Neutron Cross Sections and Technology, Vol. 1 (Knoxville, 1971) p. 191.
- Korzh et al. (1994) I. A. Korzh, V. A. Mishchenko, N. M. Pravdivy, and N. T. Sklyar, Ukrainskii Fizichnii Zhurnal 39, 785 (1994).
- Beyer et al. (2014a) R. Beyer, R. Schwengner, R. Hannaske, A. Junghans, R. Massarczyk, M. Anders, D. Bemmerer, A. Ferrari, A. Hartmann, T. Kögler, M. Röder, K. Schmidt, and A. Wagner, Nuclear Physics A 927, 41 (2014a).
- Sublet and Ribon (2002) J.-C. Sublet and P. Ribon, Journal of Nuclear Science and Technology 39, 856 (2002).
- Bertsch et al. (2018) G. F. Bertsch, D. Brown, and E. D. Davis, Physical Review C 98, 014611 (2018).
- Soukhovitski et al. (2013) E. S. Soukhovitski, G. B. Morogovskij, R. Capote Noy, S. Chiba, and J. Quesada, Program OPTMAN Version 14 (2013) User’s Guide, Tech. Rep. INDC(NDS)-0642 (IAEA, Vienna, Austria, 2013).
- Dobeš and Běták (1983) J. Dobeš and E. Běták, Zeitschrift fuer Physik A310, 329 (1983).
- Neudecker et al. (2018) D. Neudecker, B. Hejnal, F. Tovesson, M. C. White, D. L. Smith, D. Vaughan, and R. Capote, EPJ Nuclear Sciences & Technologies 4, 21 (2018).
- Neudecker et al. (2020a) D. Neudecker, D. Smith, F. Tovesson, R. Capote, M. White, N. Bowden, L. Snyder, A. Carlson, R. Casperson, V. Pronyaev, S. Sangiorgio, K. Schmitt, B. Seilhan, N. Walsh, and W. Younes, Nuclear Data Sheets 163, 228 (2020a).
- Siefman et al. (2020) D. Siefman, M. Hursin, H. Sjostrand, G. Schnabel, D. Rochman, and A. Pautz, EPJ Nuclear Sciences & Technologies 6, 52 (2020).
- Siefman et al. (2021) D. Siefman, M. Hursin, G. Schnabel, and H. Sjöstrand, Annals of Nuclear Energy 159, 108255 (2021).
- Otuka et al. (2014) N. Otuka, E. Dupont, V. Semkova, B. Pritychenko, A. Blokhin, M. Aikawa, S. Babykina, M. Bossant, G. Chen, S. Dunaeva, R. Forrest, T. Fukahori, N. Furutachi, S. Ganesan, Z. Ge, O. Gritzay, M. Herman, S. Hlavač, K. Katō, B. Lalremruata, Y. Lee, A. Makinaga, K. Matsumoto, M. Mikhaylyukova, G. Pikulina, V. Pronyaev, A. Saxena, O. Schwerer, S. Simakov, N. Soppera, R. Suzuki, S. Takács, X. Tao, S. Taova, F. Tárkányi, V. Varlamov, J. Wang, S. Yang, V. Zerkin, and Y. Zhuang, Nuclear Data Sheets 120, 272 (2014).
- Brown (2014) D. Brown, Nuclear Data Sheets 120, 285 (2014).
- Hirdt and Brown (2016) J. Hirdt and D. Brown, Nuclear Data Sheets 131, 377 (2016).
- Whewell et al. (2020) B. Whewell, M. Grosskopf, and D. Neudecker, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 978, 164305 (2020).
- Neudecker et al. (2020b) D. Neudecker, M. Grosskopf, M. Herman, W. Haeck, P. Grechanuk, S. Vander Wiel, M. Rising, A. Kahler, N. Sly, and P. Talou, Nuclear Data Sheets 167, 36 (2020b).
- Dwivedi (2019) N. R. Dwivedi, arXiv:1907.09764 [nucl-ex, physics:nucl-th, stat] (2019), arXiv:1907.09764 [nucl-ex, physics:nucl-th, stat] .
- Wenusch and Vonach (1962) R. Wenusch and H. Vonach, Oesterr.Akad.Wiss.,Math-Naturw.Kl.,Anzeiger 99, 1 (1962).
- Barrall et al. (1969a) R. Barrall, M. Silbergeld, and D. Gardner, Nuclear Physics A 138, 387 (1969a).
- Molla and Qaim (1977) N. Molla and S. Qaim, Nuclear Physics A 283, 269 (1977).
- Dyer and Hamilton (1972) N. Dyer and J. Hamilton, Journal of Inorganic and Nuclear Chemistry 34, 1119 (1972).
- Corcalciuc et al. (1978) V. Corcalciuc, B. Holmqvist, A. Marcinkowski, and G. Prokopets, Nuclear Physics A 307, 445 (1978).
- Singh (1972) J. J. Singh, Transactions of the American Nuclear Society 15, 147 (1972).
- Frehaut et al. (1980) J. Frehaut, A. Bertin, R. Bois, J. Jary, and G. Mosinski, in U.S. Report to the I.N.D.C., Vol. 1 (USA, 1980) p. 399.
- Robertson et al. (1973) J. Robertson, B. Audric, and P. Kolkowski, Journal of Nuclear Energy 27, 139 (1973).
- Bowers and Greenwood (1989) D. L. Bowers and L. R. Greenwood, in American Soc. of Testing and Materials Reports (USA, 1989) p. 508.
- Spangler et al. (1975) R. Spangler, E. L. J. Draper, and T. A. Parish, Transactions of the American Nuclear Society 22, 818 (1975).
- Wallner et al. (2011) A. Wallner, K. Buczak, C. Lederer, H. Vonach, T. Faestermann, G. Korschinek, M. Poutivtsev, G. Rugel, A. Klix, K. Seidel, A. Plompen, and V. Semkova, Journal of the Korean Physical Society 59, 1378 (2011).
- Mostafa (1976) A. B. M. G. Mostafa, Nuclear Science and Applications B 9, 10 (1976).
- Saraf et al. (1991) S. K. Saraf, C. E. Brient, P. M. Egun, S. M. Grimes, V. Mishra, and R. S. Pedroni, Nuclear Science and Engineering 107, 365 (1991).
- Sothras (1977) Sothras, A Study of the Systematics for (n,2n) Reactions, Ph.D. thesis (1977).
- Wang et al. (2015a) Z. Wang, X. Fan, L. Zhang, H. Bai, J. Chen, G. Zhang, Y. M. Gledenov, M. V. Sedysheva, L. Krupa, and G. Khuukhenkhuu, Physical Review C 92, 044601 (2015a).
- Wang et al. (2015b) Z. Wang, X. Fan, L. Zhang, H. Bai, J. Chen, G. Zhang, Y. M. Gledenov, M. V. Sedysheva, L. Krupa, and G. Khuukhenkhuu, Physical Review C 92, 044601 (2015b).
- Kudo (1977) K. Kudo, Nuclear Instruments and Methods 141, 325 (1977).
- Grimes et al. (1979) S. M. Grimes, R. C. Haight, K. R. Alvar, H. H. Barschall, and R. R. Borchers, Physical Review C 19, 2127 (1979).
- Ramendik et al. (1977) Z. A. Ramendik, G. M. Stukov, and V. T. Shchebolev, Atomnaya Energiya 42, 136 (1977).
- Kinney (1968) W. E. Kinney, Neutron Elastic and Inelastic Scattering from Fe56 from 4.60 to 7.55 MeV, Technical Memo 2052 (Oak Ridge National Lab, USA, 1968).
- Ryves et al. (1978) T. B. Ryves, P. Kolkowski, and K. J. Zieba, Metrologia 14, 127 (1978).
- Boschung et al. (1971) P. Boschung, J. Lindow, and E. Shrader, Nuclear Physics A 161, 593 (1971).
- Chi-Chou et al. (1978) L. Chi-Chou, L. Han-Lin, F. Pei-Kuo, M. Hung-Chang, and L. Yeh-Sha, High Energy Physics and Nucl.Physics, Chinese ed. 2, 550 (1978).
- El-Kadi et al. (1982) S. M. El-Kadi, C. E. Nelson, F. O. Purser, R. L. Walter, A. Beyerle, C. R. Gould, and L. W. Seagondollar, Nuclear Physics A 390, 509 (1982).
- Ramirez et al. (2017) A. P. D. Ramirez, J. R. Vanhoy, S. F. Hicks, M. T. McEllistrem, E. E. Peters, S. Mukhopadhyay, T. D. Harrison, T. J. Howard, D. T. Jackson, P. D. Lenzen, T. D. Nguyen, R. L. Pecha, B. G. Rice, B. K. Thompson, and S. W. Yates, Physical Review C 95, 064605 (2017).
- Sharma et al. (1978) D. Sharma, M. G. Shahani, U. Y. Phadnis, and S. K. Sadavarte, in 21.Nucl.Phys.and Solid State Phys.Symp., Vol. 2 (Bombay, 1978) p. 349.
- Kozyr and Prokopets (1978) Y. E. Kozyr and G. A. Prokopets, Yadernaya Fizika 27, 616 (1978).
- Nemilov and Trofimov (1978) Y. A. Nemilov and Y. N. Trofimov, Cross-Sections of (n,p) Reactions on Ni-58, Fe-56 and Zn-64 Isotopes at Neutron Energies 7.6 - 9.3 MeV, Tech. Rep. 26 (Russia, 1978).
- Xiamin et al. (1982) S. Xiamin, S. Ronglin, X. Jinqiang, W. Yongshun, and D. Dazhao, in Conf.on Nucl.Data for Sci.and Technol. (Antwerp, Belgium, 1982) p. 373.
- Ngoc et al. (1980) P. N. Ngoc, S. Gueth, F. Deak, and A. Kiss, Investigations of (n,p), (n,a) and (n,2n) Reactions around 14 MeV, Ph.D. thesis (1980).
- Simakov et al. (1992) S. P. Simakov, A. A. Androsenko, P. A. Androsenko, B. V. Devkin, M. G. Kobozev, A. A. Lychagin, V. V. Sinitca, V. A. Talalaev, D. Y. Chuvilin, A. A. Borisov, and V. A. Zagryadskiy, Vop.At.Nauki i Tekhn.,Ser.Yaderno-Reak.Konstanty, , 93 (1992).
- Kudo (1982) K. Kudo, 56Fe(n,p)56Mn Cross Section at 14.60 MeV, Japanese Report to NEANDC 83 (1982).
- Beyer et al. (2014b) R. Beyer, R. Schwengner, R. Hannaske, A. Junghans, R. Massarczyk, M. Anders, D. Bemmerer, A. Ferrari, A. Hartmann, T. Kögler, M. Röder, K. Schmidt, and A. Wagner, Nuclear Physics A 927, 41 (2014b).
- Viennot et al. (1982) M. Viennot, A. A. Haddou, A. Chiadli, and G. Paic, in Conf.on Nucl.Data for Sci.and Technol. (Antwerp, Belgium, 1982) p. 406.
- Klochkova et al. (1992) L. I. Klochkova, B. S. Kovrigin, and V. N. Kuritsin, Vop.At.Nauki i Tekhn.,Ser.Yaderno-Reak.Konstanty , 27 (1992).
- Ngoc et al. (1983) P. N. Ngoc, L. L. Bach, N. Van Do, T. T. Vinh, and I. Ribansky, Neutron Activation Cross Section for 56Fe(n,p) and 87Rb(n,2n) Reactions, Tech. Rep. 2 (Austria, 1983).
- Paul and Clarke (1953) E. B. Paul and R. L. Clarke, Canadian Journal of Physics 31, 267 (1953).
- Mcclure and Kent (1955) G. W. Mcclure and D. W. Kent, Journal of the Franklin Institute 260, 238 (1955).
- Gupta et al. (1985) J. P. Gupta, H. D. Bhardwaj, and R. Prasad, Pramana 24, 637 (1985).
- Yasumi (1957) S. Yasumi, Journal of the Physical Society of Japan 12, 443 (1957).
- Garlea et al. (1985) I. Garlea, C. Miron-Garlea, H. N. Rosu, M. Ion, and V. Raducu, Neutron Cross Sections Measured at 14.8 MeV, Tech. Rep. 562 (Zentralinst. f. Kernforschung Rossendorf, Rossendorf, Germany, 1985).
- Allan (1957) D. L. Allan, Proceedings of the Physical Society. Section A 70, 195 (1957).
- Meadows et al. (1987) J. Meadows, D. Smith, M. Bretscher, and S. Cox, Annals of Nuclear Energy 14, 489 (1987).
- Terrell and Holm (1958) J. Terrell and D. M. Holm, Physical Review 109, 2031 (1958).
- Muyao et al. (1987) Z. Muyao, Z. Yongfa, W. Chuanshan, Z. Lu, C. Yitai, Z. Shuxin, Z. Shenjun, X. Kuanzhong, Z. Shenmuo, C. Xueshi, Z. Yiping, and Y. Qinguan, Chinese J.of Nuclear Physics 9, 34 (1987).
- Ikeda et al. (1988) Y. Ikeda, C. Konno, K. Oishi, T. Nakamura, H. Miyade, K. Kawade, H. Yamamoto, and T. Katoh, Activation Cross Section Measurements for Fusion Reactor Structural Materials at Neutron Energy from 13.3 to 15.0 MeV Using FNS Facility, JAERI Reports 1312 (Japan, 1988).
- Kern et al. (1959) B. Kern, W. Thompson, and J. Ferguson, Nuclear Physics 10, 226 (1959).
- Kimura and Kobayashi (1990) I. Kimura and K. Kobayashi, Nuclear Science and Engineering 106, 332 (1990).
- Depraz et al. (1960) M. J. Depraz, G. Legros, and M. R. Salin, Journal de Physique et le Radium 21, 377 (1960).
- Chittenden et al. (1961) D. M. Chittenden, D. G. Gardner, and R. W. Fink, Physical Review 122, 860 (1961).
- Ercan et al. (1991) A. Ercan, M. N. Erduran, M. Subasi, E. Gueltekin, G. Tarcan, A. Baykal, and M. Bostan, in Conf.on Nucl.Data for Sci.and Technol. (Juelich, 1991) p. 376.
- Pollehn and Neuert (1961) H. Pollehn and N. Neuert, Zeitschrift fuer Naturforschung, Section A 16, 227 (1961).
- Viennot et al. (1991) M. Viennot, M. Berrada, G. Paic, and S. Joly, Nuclear Science and Engineering 108, 289 (1991).
- Fuga (1991) P. Fuga, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 309, 500 (1991).
- Gabbard and Kern (1962) F. Gabbard and B. D. Kern, Physical Review 128, 1276 (1962).
- Bormann et al. (1962) M. Bormann, S. Cierjacks, R. Langkau, and H. Neuert, Zeitschrift fuer Physik 166, 477 (1962).
- Belgaid et al. (1992) M. Belgaid, M. Siad, and M. Allab, Journal of Radioanalytical and Nuclear Chemistry 166, 493 (1992).
- Cross et al. (1963) W. G. Cross, R. L. Clarke, K. Morin, G. Slinn, N. M. Ahmed, and K. Beg, Activation Cross Sections in Ti for 14.5 MeV Neutrons, Canadian Report to EANDC 16 (1963).
- Santry and Butler (1964) D. C. Santry and J. P. Butler, Canadian Journal of Physics 42, 1030 (1964).
- Ikeda et al. (1993) Y. Ikeda, C. Konno, Y. Oyama, K. Kosako, K. Oishi, and H. Maekawa, Journal of Nuclear Science and Technology 30, 870 (1993).
- Zongyu et al. (1993) B. Zongyu, R. Chaofan, Y. Xiaoyun, Z. Shuping, D. Shengyao, and Y. Yiguang, Chinese J.of Nuclear Physics 15, 341 (1993).
- Filatenkov et al. (1999) A. A. Filatenkov, S. V. Chuvaev, V. N. Aksenov, V. A. Yakovlev, A. V. Malyshenkov, S. K. Vasil, M. Avrigeanu, V. Avrigeanu, D. L. Smith, Y. Ikeda, A. Wallner, W. Kutschera, A. Priller, P. Steier, H. Vonach, G. Mertens, and W. Rochow, Systematic Measurement of Activation Cross Sections at Neutron Energies from 13.4 to 14.9 MeV, Tech. Rep. 252 (Khlopin Radiev. Inst., Russia, 1999).
- Bonazzola et al. (1964) G. Bonazzola, P. Brovetto, E. Chiavassa, R. Spinoglio, and A. Pasquarelli, Nuclear Physics 51, 337 (1964).
- Fessler et al. (2000) A. Fessler, A. J. M. Plompen, D. L. Smith, J. W. Meadows, and Y. Ikeda, Nuclear Science and Engineering 134, 171 (2000).
- Liskien and Paulsen (1965) H. Liskien and A. Paulsen, Journal of Nuclear Energy. Parts A/B. Reactor Science and Technology 19, 73 (1965).
- Coszach et al. (2000) R. Coszach, P. Duhamel, W. Galster, P. Jean, P. Leleux, J.-P. Meulders, J. Vanhorenbeeck, G. Vedrenne, and P. von Ballmoos, Physical Review C 61, 064615 (2000).
- Bormann et al. (1965) M. Bormann, E. Fretwurst, P. Schehka, G. Wrege, H. Büttner, A. Lindner, and H. Meldner, Nuclear Physics 63, 438 (1965).
- Mannhart and Schmidt (2007) W. Mannhart and D. Schmidt, Measurement of Neutron Activation Cross Sections in the Energy Range from 8 MeV to 15 MeV, Tech. Rep. 53 (Phys.Techn.Bundesanst., 2007).
- Liskien and Paulsen (1966) H. Liskien and A. Paulsen, Nukleonik 8, 315 (1966).
- Mulik et al. (2013) V. K. Mulik, H. Naik, S. V. Suryanarayana, S. D. Dhole, P. M. Prajapati, B. S. Shivashankar, K. C. Jagadeesan, S. V. Thakre, V. N. Bhoraskar, and A. Goswami, Journal of Radioanalytical and Nuclear Chemistry 296, 1321 (2013).
- Hemingway et al. (1966) J. D. Hemingway, R. H. James, E. B. M. Martin, and G. R. Martin, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 292, 180 (1966).
- Filatenkov (2016) A. A. Filatenkov, Neutron Activation Cross Sections Measured at KRI in Neutron Energy Region 13.4 - 14.9 MeV, Tech. Rep. 0460 (Austria, 2016).
- Grundl (1967) J. A. Grundl, Nuclear Science and Engineering 30, 39 (1967).
- Qaim and Stöcklin (1976) S. Qaim and G. Stöcklin, Nuclear Physics A 257, 233 (1976).
- Vonach et al. (1968) H. K. Vonach, W. G. Vonach, H. Muenzer, and P. Schramel, in Nuclear Cross-Sections Techn. Conf., Vol. 2 (Washington, USA, 1968) p. 885.
- Cuzzocrea et al. (1968) P. Cuzzocrea, E. Perillo, and S. Notarrigo, Il Nuovo Cimento B (1965-1970) 54, 53 (1968).
- Tutubalin et al. (1973) A. I. Tutubalin, A. P. Klyucharev, V. P. Bozhko, V. Y. Golobnya, G. P. Dolya, G. A. Zachepilo, and A. S. Kachan, Total Neutron Cross Section for the Isotopes 54-Fe and 56-Fe, Tech. Rep. 11 (Russia, 1973).
- Barrall et al. (1969b) R. C. Barrall, J. A. Holmes, and M. Silbergeld, High Energy Neutron Cross Section Validation and Neutron Flux Spectrum Using the HENRE Source, A.F.B.Repts. (Air Force Spec.Weap.Center Kirtland, 1969).
Appendix A Derivation of GLS method and LM algorithm
The GLS method is the basis for inference in Bayesian networks of continuous variables with a multivariate normal prior distribution and linear relationships between variables. The customized LM algorithm Helgesson and Sjöstrand (2017) is an iterative optimization algorithm that extends the GLS method to enable the determination of the most likely values of the variables also in the case of non-linear relationships between them. Here we present a derivation of the GLS method and the LM algorithm that follows closely the derivation given in the appendix of Schnabel et al. (2021).
Assume that we have a vector $\vec{x}$ containing the variables of interest, which is related to the vector of observed quantities $\vec{y}$ by the sum of a vector-valued function $\vec{f}(\vec{x})$ and a noise term. We denote the prior distribution on the vector $\vec{x}$ by $\pi(\vec{x})$ and the likelihood connecting $\vec{x}$ to the observations by $\ell(\vec{y}\,|\,\vec{x})$. According to Bayes' theorem, the posterior distribution is proportional to the product of prior and likelihood,
$\pi(\vec{x}\,|\,\vec{y}) \propto \ell(\vec{y}\,|\,\vec{x})\,\pi(\vec{x})\,.$ (90)
For both the GLS method and the customized LM algorithm, multivariate normal distributions are imposed for the likelihood and the prior distribution,
$\ell(\vec{y}\,|\,\vec{x}) = \mathcal{N}(\vec{y};\,\vec{f}(\vec{x}),\,B)\,,$ (91)
$\pi(\vec{x}) = \mathcal{N}(\vec{x};\,\vec{x}_0,\,A)\,,$ (92)
with $B$ the covariance matrix of the measurement errors, $\vec{x}_0$ the prior center vector and $A$ the prior covariance matrix.
The functional form of the probability density function (pdf) of a multivariate normal distribution is given by
$\mathcal{N}(\vec{z};\,\vec{\mu},\,\Sigma) = \left((2\pi)^N \det\Sigma\right)^{-1/2} \exp\!\left(-\tfrac{1}{2}(\vec{z}-\vec{\mu})^T \Sigma^{-1} (\vec{z}-\vec{\mu})\right)\,,$ (93)
characterized by a mean vector $\vec{\mu}$ and a covariance matrix $\Sigma$. In the following, we make use of the natural logarithm of the multivariate normal pdf,
$\ln\mathcal{N}(\vec{z};\,\vec{\mu},\,\Sigma) = c - \tfrac{1}{2}\vec{z}^T \Sigma^{-1} \vec{z} + \vec{z}^T \Sigma^{-1} \vec{\mu}\,,$ (94)
where we absorbed the terms independent of $\vec{z}$ into the constant $c$ and made use of the fact that the mixed term $\vec{\mu}^T \Sigma^{-1} \vec{z}$ is equivalent to its transpose $\vec{z}^T \Sigma^{-1} \vec{\mu}$ due to the covariance matrix being symmetric.
For the GLS method, we assume that the function $\vec{f}$ is linear and can therefore be written in the form
$\vec{f}(\vec{x}) = \vec{f}(\vec{x}_\mathrm{ref}) + S\,(\vec{x}-\vec{x}_\mathrm{ref})\,,$ (95)
with the Jacobian matrix $S$ and an arbitrary reference vector $\vec{x}_\mathrm{ref}$. The introduction of $\vec{d} = \vec{y} - \vec{f}(\vec{x}_\mathrm{ref}) + S\vec{x}_\mathrm{ref}$ allows us to write
$\ln\pi(\vec{x}\,|\,\vec{y}) = c_1 - \tfrac{1}{2}(\vec{d}-S\vec{x})^T B^{-1} (\vec{d}-S\vec{x}) - \tfrac{1}{2}(\vec{x}-\vec{x}_0)^T A^{-1} (\vec{x}-\vec{x}_0)\,,$ (96)
with the constant $c_1$ absorbing everything which is independent of $\vec{x}$. The constant is of no significance as we are going to take derivatives with respect to the elements of $\vec{x}$.
Now we isolate the terms containing $\vec{x}$ and regroup the expression according to their order in $\vec{x}$,
$\ln\pi(\vec{x}\,|\,\vec{y}) = c_2 - \tfrac{1}{2}\vec{x}^T\!\left(A^{-1} + S^T B^{-1} S\right)\vec{x} + \vec{x}^T\!\left(A^{-1}\vec{x}_0 + S^T B^{-1}\vec{d}\right)\,,$ (97)
where all terms independent of $\vec{x}$ are absorbed in $c_2$. The comparison of this expression with eq. 94 reveals that the posterior pdf is also a multivariate normal distribution. We can identify the inverse of the posterior covariance matrix $A'$ as being given by
$A'^{-1} = A^{-1} + S^T B^{-1} S\,.$ (98)
To determine the vector $\vec{x}_1$ that maximizes eq. 97, we calculate the gradient of eq. 97 and require it to vanish:
$\nabla_{\vec{x}} \ln\pi(\vec{x}\,|\,\vec{y}) = -\left(A^{-1} + S^T B^{-1} S\right)\vec{x}_1 + A^{-1}\vec{x}_0 + S^T B^{-1}\vec{d} = 0\,.$ (99)
Rearranging yields
$\left(A^{-1} + S^T B^{-1} S\right)\left(\vec{x}_1 - \vec{x}_\mathrm{ref}\right) = A^{-1}\left(\vec{x}_0 - \vec{x}_\mathrm{ref}\right) + S^T B^{-1}\left(\vec{y} - \vec{f}(\vec{x}_\mathrm{ref})\right)\,.$ (100)
Using the identification in eq. 98 and rearranging eq. 100 another time yields the GLS formula to compute the posterior center vector,
$\vec{x}_1 = \vec{x}_\mathrm{ref} + A'\left[A^{-1}\left(\vec{x}_0 - \vec{x}_\mathrm{ref}\right) + S^T B^{-1}\left(\vec{y} - \vec{f}(\vec{x}_\mathrm{ref})\right)\right]\,.$ (101)
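For concreteness, a compact numerical transcription of eqs. (98) to (101) could look as follows; the dense linear algebra and variable names are ours, chosen for illustration rather than efficiency.

```python
import numpy as np

def gls_update(x0, A, y, B, f_ref, x_ref, S):
    """GLS step of eqs. (98)-(101): returns the posterior center vector x1
    and the inverse posterior covariance matrix A'^{-1}."""
    inv_post_cov = np.linalg.inv(A) + S.T @ np.linalg.solve(B, S)    # eq. (98)
    # right-hand side of eq. (100): gradient of the log-posterior at x_ref
    grad = np.linalg.solve(A, x0 - x_ref) + S.T @ np.linalg.solve(B, y - f_ref)
    x1 = x_ref + np.linalg.solve(inv_post_cov, grad)                 # eq. (101)
    return x1, inv_post_cov
```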
Now we discuss the derivation of the LM algorithm. For a non-linear relationship $\vec{f}(\vec{x})$, the Jacobian $S$ changes depending on the chosen expansion point $\vec{x}_\mathrm{ref}$. It may still be used to determine a small step in the vicinity of the expansion point to climb up the posterior density function, but it cannot be trusted globally anymore as in the GLS method. Inspecting eq. 100, we see that the right-hand side is proportional to the gradient of eq. 97 evaluated at the reference parameter vector, i.e., $\nabla_{\vec{x}}\ln\pi(\vec{x}\,|\,\vec{y})\big|_{\vec{x}=\vec{x}_\mathrm{ref}}$. Therefore it is also proportional to the gradient of the logarithmized posterior distribution evaluated at $\vec{x}_\mathrm{ref}$. Taking this aspect into account, we can augment eq. 98 by a damping term,
$A'^{-1}_{\lambda} = A^{-1} + S^T B^{-1} S + \lambda D\,,$ (102)
with $\lambda$ being a non-negative number and $D$ the identity matrix or a diagonal matrix. An increase of $\lambda$ makes the matrix more diagonal and consequently also its inverse. In addition, the magnitude of the diagonal elements of the inverse decreases. Consequently, for an increasing value of $\lambda$, the update prescription in eq. 101 transforms gradually into the gradient ascent method. On the other side, for $\lambda = 0$, we recover the GLS update.
The LM algorithm adaptively changes the value of $\lambda$ from one iteration to the next depending on how well the expected improvement according to the linear approximation of $\vec{f}$ matches the real improvement using the exact non-linear function. A strategy for the adjustment of $\lambda$ was explained in section II.6.
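The corresponding damped update, again as an illustrative dense-algebra sketch, only replaces the matrix of eq. (98) by the augmented matrix of eq. (102).

```python
import numpy as np

def lm_update(x0, A, y, B, f_ref, x_ref, S, lam, D=None):
    """Damped step using eq. (102): lam = 0 recovers the GLS update of
    eq. (101); a large lam approaches (scaled) gradient ascent."""
    D = np.eye(len(x0)) if D is None else D
    M = np.linalg.inv(A) + S.T @ np.linalg.solve(B, S) + lam * D     # eq. (102)
    grad = np.linalg.solve(A, x0 - x_ref) + S.T @ np.linalg.solve(B, y - f_ref)
    return x_ref + np.linalg.solve(M, grad)
```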
Appendix B Experimental data used in the 56Fe example evaluation
REAC | EXFOR | AUTHOR | YEAR | REF | REAC | EXFOR | AUTHOR | YEAR | REF |
---|---|---|---|---|---|---|---|---|---
(N,2N) | 20091004 | Wenusch and Vonach | 1962 | Wenusch and Vonach (1962) | (N,P) | 10031005 | Barrall et al | 1969 | Barrall et al. (1969a) |
(N,2N) | 20721021 | Molla and Qaim | 1977 | Molla and Qaim (1977) | (N,P) | 10289002 | Dyer and Hamilton | 1972 | Dyer and Hamilton (1972) |
(N,2N) | 20854015 | Corcalciuc | 1978 | Corcalciuc et al. (1978) | (N,P) | 10309004 | Singh | 1972 | Singh (1972) |
(N,2N) | 20416003 | Frehaut et al | 1980 | Frehaut et al. (1980) | (N,P) | 20798003 | Robertson et al | 1973 | Robertson et al. (1973) |
(N,2N) | 13132002 | Bowers and Greenwood | 1989 | Bowers and Greenwood (1989) | (N,P) | 12956012 | Spangler et al | 1975 | Spangler et al. (1975) |
(N,2N) | 23171003 | Wallner et al | 2011 | Wallner et al. (2011) | (N,P) | 21049005 | Mostafa | 1976 | Mostafa (1976) |
(N,A) | 12812012 | Saraf et al | 1991 | Saraf et al. (1991) | (N,P) | 10835002 | Sothras | 1977 | Sothras (1977) |
(N,A) | 32737002 | Wang et al | 2015 | Wang et al. (2015a) | (N,P) | 20721094 | Molla and Qaim | 1977 | Molla and Qaim (1977) |
(N,A) | 32737003 | Wang et al | 2015 | Wang et al. (2015b) | (N,P) | 20993002 | Kudo | 1977 | Kudo (1977) |
(N,D) | 10827031 | Grimes et al | 1979 | Grimes et al. (1979) | (N,P) | 41313002 | Ramendik et al | 1977 | Ramendik et al. (1977) |
(N,EL) | 11708003 | Kinney | 1968 | Kinney (1968) | (N,P) | 20772003 | Ryves et al | 1978 | Ryves et al. (1978) |
(N,EL) | 10037004 | Boschung et al | 1971 | Boschung et al. (1971) | (N,P) | 30483002 | Chi-Chou et al | 1978 | Chi-Chou et al. (1978) |
(N,EL) | 10958012 | El-Kadi et al | 1982 | El-Kadi et al. (1982) | (N,P) | 30483003 | Chi-Chou et al | 1978 | Chi-Chou et al. (1978)
(N,EL) | 14462004 | Ramirez et al | 2017 | Ramirez et al. (2017) | (N,P) | 30676002 | Sharma et al | 1978 | Sharma et al. (1978) |
(N,INL) | 41316002 | Kozyr and Prokopets | 1978 | Kozyr and Prokopets (1978) | (N,P) | 40485002 | Nemilov and Trofimov | 1978 | Nemilov and Trofimov (1978)
(N,INL) | 30656021 | Xiamin et al | 1982 | Xiamin et al. (1982) | (N,P) | 30562019 | Ngoc et al | 1980 | Ngoc et al. (1980) |
(N,INL) | 41156006 | Simakov et al | 1992 | Simakov et al. (1992) | (N,P) | 21868002 | Kudo | 1982 | Kudo (1982) |
(N,INL) | 23134005 | Beyer et al | 2014 | Beyer et al. (2014b) | (N,P) | 30644006 | Viennot et al | 1982 | Viennot et al. (1982) |
(N,N+P) | 41118013 | Klochkova | 1992 | Klochkova et al. (1992) | (N,P) | 30802002 | Ngoc et al | 1983 | Ngoc et al. (1983) |
(N,P) | 11274031 | Paul and Clarke | 1953 | Paul and Clarke (1953) | (N,P) | 21923003 | Kudo | 1984 | |
(N,P) | 11703002 | Mcclure and Kent | 1955 | Mcclure and Kent (1955) | (N,P) | 30707013 | Gupta et al | 1985 | Gupta et al. (1985) |
(N,P) | 20280004 | Yasumi | 1957 | Yasumi (1957) | (N,P) | 30807008 | Garlea et al | 1985 | Garlea et al. (1985) |
(N,P) | 21487008 | Allan | 1957 | Allan (1957) | (N,P) | 12969013 | Meadows et al | 1987 | Meadows et al. (1987) |
(N,P) | 11715003 | Terrell and Holm | 1958 | Terrell and Holm (1958) | (N,P) | 30755003 | Muyao et al | 1987 | Muyao et al. (1987) |
(N,P) | 11715004 | Terrell and Holm | 1958 | Terrell and Holm (1958) | (N,P) | 22089042 | Ikeda et al | 1988 | Ikeda et al. (1988) |
(N,P) | 11464006 | Thompson and Ferguson | 1959 | Kern et al. (1959) | (N,P) | 22093011 | Kimura and Kobayashi | 1990 | Kimura and Kobayashi (1990) |
(N,P) | 21419006 | Depraz et al | 1960 | Depraz et al. (1960) | (N,P) | 12812010 | Saraf et al | 1991 | Saraf et al. (1991) |
(N,P) | 11718005 | Chittenden et al | 1961 | Chittenden et al. (1961) | (N,P) | 22338048 | Ercan et al | 1991 | Ercan et al. (1991) |
(N,P) | 21352002 | Pollehn and Neuert | 1961 | Pollehn and Neuert (1961) | (N,P) | 30978022 | Viennot et al | 1991 | Viennot et al. (1991) |
(N,P) | 21352007 | Pollehn and Neuert | 1961 | Pollehn and Neuert (1961) | (N,P) | 31479002 | Fuga | 1991 | Fuga (1991) |
(N,P) | 11494009 | Gabbard and Kern | 1962 | Gabbard and Kern (1962) | (N,P) | 31479003 | Fuga | 1991 | Fuga (1991) |
(N,P) | 21339005 | Bormann et al | 1962 | Bormann et al. (1962) | (N,P) | 31524008 | Belgaid et al | 1992 | Belgaid et al. (1992) |
(N,P) | 11696007 | Cross et al | 1963 | Cross et al. (1963) | (N,P) | 41118012 | Klochkova et al | 1992 | Klochkova et al. (1992) |
(N,P) | 11701002 | Santry and Butler | 1964 | Santry and Butler (1964) | (N,P) | 22312004 | Ikeda et al | 1993 | Ikeda et al. (1993) |
(N,P) | 11701003 | Santry and Butler | 1964 | Santry and Butler (1964) | (N,P) | 30993003 | Zongyu et al | 1993 | Zongyu et al. (1993) |
(N,P) | 11701004 | Santry and Butler | 1964 | Santry and Butler (1964) | (N,P) | 41240012 | Filatenkov et al | 1999 | Filatenkov et al. (1999) |
(N,P) | 20888004 | Bonazzola | 1964 | Bonazzola et al. (1964) | (N,P) | 22414017 | Fessler et al | 2000 | Fessler et al. (2000) |
(N,P) | 20377002 | Liskien and Paulsen | 1965 | Liskien and Paulsen (1965) | (N,P) | 22497004 | Coszach et al | 2000 | Coszach et al. (2000) |
(N,P) | 20887015 | Bormann et al | 1965 | Bormann et al. (1965) | (N,P) | 22976017 | Mannhart and Schmidt | 2007 | Mannhart and Schmidt (2007) |
(N,P) | 20387004 | Liskien and Paulsen | 1966 | Liskien and Paulsen (1966) | (N,P) | 33045003 | Mulik et al | 2013 | Mulik et al. (2013) |
(N,P) | 21372003 | Hemingway et al | 1966 | Hemingway et al. (1966) | (N,P) | 41614019 | Filatenkov | 2016 | Filatenkov (2016) |
(N,P) | 10417007 | Grundl | 1967 | Grundl (1967) | (N,T) | 20669004 | Qaim and Stocklin | 1976 | Qaim and Stöcklin (1976) |
(N,P) | 20815014 | Vonach et al | 1968 | Vonach et al. (1968) | (N,TOT) | 10037005 | Boschung et al | 1971 | Boschung et al. (1971) |
(N,P) | 20890004 | Cuzzocrea et al | 1968 | Cuzzocrea et al. (1968) | (N,TOT) | 41325003 | Tutubalin et al | 1973 | Tutubalin et al. (1973) |
(N,P) | 10022010 | Barrall et al | 1969 | Barrall et al. (1969b) | (N,TOT) | 13764002 | Harvey | 1987 |