
Derivative based global sensitivity analysis and its entropic link

Jiannan Yang
School of Physics, Engineering and Technology
University of York
Heslington, York, YO10 5DD, UK
[email protected]
(May 9, 2024)
Abstract

Variance-based Sobol’ sensitivity is one of the most well known measures in global sensitivity analysis (GSA). However, uncertainties with certain distributions, such as highly skewed distributions or those with a heavy tail, cannot be adequately characterised using the second central moment only. Entropy-based GSA can consider the entire probability density function, but its application has been limited because it is difficult to estimate. Here we present a novel derivative-based upper bound for conditional entropies, to efficiently rank uncertain variables and to work as a proxy for entropy-based total effect indices. To overcome the undesirable issue of negativity for differential entropies as sensitivity indices, we discuss an exponentiation of the total effect entropy and its proxy. We found that the proposed new entropy proxy is equivalent to the proxy for variance-based GSA for linear functions with Gaussian inputs, but outperforms the latter for a river flood physics model with 8 inputs of different distributions. We expect the new entropy proxy to increase the variable screening power of derivative-based GSA and to complement the Sobol’-index proxy for a wider range of distributions.

Keywords sensitivity proxy; sensitivity inequality; conditional entropy; exponential entropy; DGSM; Ishigami function

1 Introduction

This research is motivated by applications of global sensitivity analysis (GSA) towards mathematical models. The use of mathematical models to simulate real-world phenomena is firmly established in many areas of science and technology. The input data for the models are often uncertain, as they could be from multiple sources and of different levels of relevance. The uncertain inputs of a mathematical model induce uncertainties in the output. GSA helps to identify the influential inputs and is becoming an integral part of mathematical modelling.

The most common GSA approach examines variability using the output variance. Variance-based methods, also called Sobol’ indices, decompose the model output into summands of increasing dimensionality and estimate the contribution of each input factor, and of their interactions, to the variance of the output [2]. As only the second-order moment is considered, it was pointed out in [3, 18, 19] that the variance-based sensitivity measure is not well suited for heavy-tailed or multimodal distributions. Entropy is a measure of uncertainty similar to variance: higher entropy tends to indicate higher variance (for a Gaussian, the entropy grows with the logarithm of the variance). Nevertheless, entropy is moment-independent as it is based on the entire probability density function of the model output. It was shown in [3] that entropy-based methods and variance-based methods can sometimes produce significantly different results.

Both variance-based and entropy-based global sensitivity analysis (GSA) can provide quantitative contributions of each input variable to the output quantity of interest. However, the estimation of variance- and entropy-based sensitivity indices can become expensive in terms of the number of model evaluations. For example, the computational cost of sampling-based estimation of variance-based indices is $N(d+1)$ [2], where $N$ is the base sample number and $d$ is the input dimension. Large values of $N$, normally in the order of thousands or tens of thousands, are needed for a more accurate estimate, and the computational cost has been noted as one of the main drawbacks of variance-based GSA in [2]. In addition, it was noted in [3] that although both variance-based and entropy-based sensitivity analyses take a long computational time, the convergence for entropy-based indices is even slower.

In contrast, derivative-based methods are much more efficient as only the average of the functional gradients across the input space is needed. For example, the Morris’ method [4] constructs a global sensitivity measure by computing a weighted mean of the finite difference approximation to the partial derivatives, and it requires only a few model evaluations. The computational time required can be many orders of magnitude lower than that for estimation of the Sobol’ sensitivity indices [5] and it is thus often used for screening a large number of input variables.

Previous studies have found a link between the derivative-based and variance-based total effect indices. In [6], a sensitivity measure $\mu^*$ is proposed based on the absolute values of the partial derivatives. It is empirically demonstrated that for some practical problems, $\mu^*$ is similar to the variance-based total index. In [7], Sobol’ and Kucherenko have proposed the so-called derivative-based global sensitivity measures (DGSM). This importance criterion is similar to the modified Morris measure, except that the squared partial derivatives are used instead of their absolute values. In addition, an inequality link between variance-based global sensitivity indices and the DGSM is established in the case of uniform or Gaussian input variables.

This inequality between DGSM and variance-based GSA has been extended to input variables belonging to the large class of Boltzmann probability measures in [8]. A new sensitivity index, defined as a constant times the crude derivative-based sensitivity, is shown to be an upper bound of the variance-based total sensitivity index. Furthermore, in [9], the variance-based sensitivity indices are interpreted as difference-based measures, where the total sensitivity index is equivalent to taking a difference in the output when perturbing one of the parameters with the other parameters fixed. The similarity to partial derivatives helps to explain why the mean of absolute elementary effects from the Morris’ method can be a good proxy for the total sensitivity index.

Inspired by the success of derivative-based proxies for the Sobol’ indices, in this paper, we present a novel derivative-based upper bound for conditional entropies. The key idea here is to make use of a well known inequality between the entropy of a continuous random variable and its deterministic transformation. This inequality can be seen as a version of the information processing inequality and is shown here to provide an upper bound for the total effect entropy sensitivity measure. This upper bound is demonstrated to efficiently rank uncertain variables and can thus be used as a proxy for entropy-based total effect index. And that is the main contribution of this paper.

In addition, via exponentiation, we extend the upper bound to DGSM for a total sensitivity measure based on exponential entropies and entropy powers (also known as effective variance). In the special case of Gaussian inputs and linear functions, the proposed new proxy for entropy-based GSA is found to be equivalent to the proxy for the variance-based total effect sensitivity. Details of the above relationships between the entropy and variance proxies are given in Section 3 and summarised in Figure 1 for an overview.

Figure 1: A summary of the entropy proxies developed in this paper, the relationships among them, and the link with the variance-based sensitivity proxy. $H$ and $V$ represent differential entropy and variance respectively; the terms are explained in Table 2.

In this paper, we focus on entropy-based GSA. However, it should be noted that there are many other moment-independent sensitivity measures [10], which are often based on a distance metric to measure the discrepancy between the conditional and unconditional output probability density functions (PDFs). For example, sensitivity indices based on the modification of the input PDFs have been proposed in [11] for reliability sensitivity analysis, where the input perturbation is derived from minimizing the probability divergence under constraints. [12] proposed a moment-independent $\delta$-indicator that looks at the entire input/output distribution. The definition of the $\delta$-indicator examines the expected total shift between the conditional and unconditional output PDFs, where the shift is conditional on one or more of the random input variables. Recently, the Fisher Information Matrix has been proposed to examine the perturbation of the entire joint probability density function (jPDF) of the outputs, and is closely linked to the relative entropy between the jPDF of the outputs and its perturbation due to an infinitesimal variation of the input distributions [13, 14].

In what follows, we will first review global sensitivity measures in Section 2, where the motivation for the entropy-based measure and its proxy is discussed with an example. In Section 3, we establish the inequality relationship between the total effect entropy measure and the partial derivatives of the function of interest, where mathematical proofs and numerical verifications are provided. In Section 4, we demonstrate the advantages of exponential entropy as a sensitivity measure and discuss its link with variance-based indices and their proxy. Concluding remarks are given in Section 5.

2 Global sensitivity measures

This research is motivated by applications of global sensitivity analysis (GSA) towards mathematical models. Such problems are common in computer experiments, where a physical phenomenon is studied with a complex numerical code and GSA is employed to better understand how they work, reduce the dimensionality of the problem, and help with calibration and verification [15]. In this context, an important question for GSA is: ‘Which model inputs can be fixed anywhere over its range of variability without affecting the output?’ [2]. In this section, we first review both variance-based and derivative-based sensitivity measures, which can provide answers to the above ‘screening’ question. After discussing the links between the two measures, we will use a simple example to motivate the use of entropy-based sensitivity indices.

2.1 Total effect sensitivity and its link with derivative-based measures

The variance-based total effect measure accounts for the total contribution of an input to the output variation, and is often a preferred approach due to its intuitive interpretation and quantitative nature.

Let us denote $\mathbf{x}=(x_1,x_2,\dots,x_d)$ as the independent random input variables, and $y$ as the output of our computational model represented by a function $g$, such that $y=g(\mathbf{x})$.

The variance-based GSA decomposes the output variance $V(Y)$ into conditional terms [2]:

V(Y)=\sum_{i}V_{i}+\sum_{i}\sum_{j>i}V_{ij}+\cdots+V_{1,2,\dots,d} (1)

where

V_{i}=V[\mathbb{E}(Y|x_{i})];\quad V_{ij}=V[\mathbb{E}(Y|x_{i},x_{j})]-V[\mathbb{E}(Y|x_{i})]-V[\mathbb{E}(Y|x_{j})]

and so on for the higher interaction terms. $V_i$ measures the first-order effect variance and $V_{ij}$ a second-order effect variance; their contributions to the unconditional model output variance can be quantified as $V_i/V(Y)$ and $V_{ij}/V(Y)$ respectively. Analogous formulas can be written for higher-order terms, enabling the analyst to quantify the higher-order interactions.

The total order sensitivity index is then defined as:

S_{T_{i}}=\frac{\mathbb{E}[V(Y|x_{\sim i})]}{V(Y)}=\frac{V(Y|X_{\sim i})}{V(Y)} (2)

where $x_{\sim i}$ is the set of all inputs except $x_i$. The total order sensitivity index measures the total contribution of the input $x_i$ to the output variance, including its first order effect and its interactions of any order with other inputs.

The most direct computation of $S_{T_i}$ is via Monte Carlo (MC) estimation because it does not impose any assumption on the functional form of the response function. Various estimators for $S_{T_i}$ have been developed, where the main difference is in the sampling design. A recent comparison of the performance of several existing total-order sensitivity estimators can be found in [16]. Regardless of the specific estimator chosen, the computational cost for $S_{T_i}$ is at least $N(d+1)$, where $N$ is the base sample number. Large values of $N$, normally in the order of thousands or tens of thousands, are needed for a more accurate estimate of $S_{T_i}$, and the computational cost has been noted as one of the main drawbacks of variance-based GSA in [2].

The elementary effect method, also known as Morris’ method [2, 4], on the other hand is a simple and effective screening method for GSA. The Morris’ method is based on the construction of $r$ trajectories in the sampling space and the total cost of computation is $r(d+1)$. $r$ is typically between 10 and 50 [17], as compared to $N$ which is in the order of thousands for the estimation of $S_{T_i}$.

When the function $g(\cdot)$ is differentiable, functionals based on $g_{x_i}=\partial g/\partial x_i$ have been proposed to examine the global sensitivity with respect to the parameter $x_i$. The modified Morris sensitivity index $\mu^*$ [17] is an approximation of the following functional:

\mu_{i}=\mathbb{E}\left[\left|\frac{\partial g(\mathbf{X})}{\partial x_{i}}\right|\right]=\int\left|\frac{\partial g(\mathbf{x})}{\partial x_{i}}\right|f_{X}(\mathbf{x})d\mathbf{x} (3)

for $i=1,2,\dots,d$, where $f_X(\mathbf{x})$ is the joint probability density function (jPDF) of $\mathbf{x}$. It has been empirically demonstrated in [17] that for some practical problems $\mu^*$ is similar to the variance-based total index $S_{T_i}$.

Based on that observation, the Derivative-based Global Sensitivity Measure (DGSM):

\nu_{i}=\mathbb{E}\left[\left(\frac{\partial g(\mathbf{X})}{\partial x_{i}}\right)^{2}\right]=\int\left(\frac{\partial g(\mathbf{x})}{\partial x_{i}}\right)^{2}f_{X}(\mathbf{x})d\mathbf{x} (4)

has been proposed as a proxy for $S_{T_i}$ [7, 8]. In particular, the total sensitivity variance $V(Y|\mathbf{X}_{\sim i})$ is upper bounded by DGSM via the following inequality:

V(Y|\mathbf{X}_{\sim i})\leq 4c_{i}^{2}\nu_{i} (5)

which is developed in [8] for random inputs with log-concave distributions such as exponential, beta and gamma distributions. $c_i$ is called the Cheeger constant. Although the Cheeger constant has no analytical expression for general log-concave measures, it can be estimated numerically using the distribution function of the uncertain input, as shown in [8].

Dividing both sides of Eq 5 by the output variance $V(Y)$, we then have the upper bound $S_{T_i}\leq 4c_i^2\nu_i/V(Y)$, which can be used as a proxy for $S_{T_i}$. For Gaussian inputs with variance $\sigma_i^2$, the Cheeger constant is $c_i=\sigma_i/2$ and the inequality in Eq 5 becomes $S_{T_i}\leq\sigma_i^2\nu_i/V(Y)$, as given in [7].
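
As a concrete illustration of the derivative-based route (a minimal sketch, not the estimators used in the references above), the following Python snippet estimates $\nu_i$ by central finite differences on Monte Carlo samples and forms the Gaussian-input bound $\sigma_i^2\nu_i/V(Y)$ on $S_{T_i}$; the test function g and the input standard deviations are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def g(x):  # illustrative test function y = g(x1, x2, x3), assumed for this sketch
    return x[:, 0] + 2.0 * x[:, 1]**2 + np.sin(x[:, 2])

d, N, h = 3, 10_000, 1e-4                  # input dimension, sample size, FD step
sigma = np.array([1.0, 0.5, 2.0])          # assumed Gaussian input standard deviations
X = rng.normal(0.0, sigma, size=(N, d))    # independent Gaussian inputs
V_Y = g(X).var()                           # Monte Carlo estimate of V(Y)

nu = np.empty(d)
for i in range(d):
    Xp, Xm = X.copy(), X.copy()
    Xp[:, i] += h
    Xm[:, i] -= h
    dg = (g(Xp) - g(Xm)) / (2.0 * h)       # central-difference partial derivative
    nu[i] = np.mean(dg**2)                 # DGSM: nu_i = E[(dg/dx_i)^2]

print("nu_i          :", nu)
print("bound on S_Ti :", sigma**2 * nu / V_Y)   # S_Ti <= sigma_i^2 nu_i / V(Y)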

2.2 A motivating example and entropy-based sensitivity

It has been pointed out in [18] that a major limitation of variance-based sensitivity indices is that they implicitly assume that output variance is a sensible measure of the output uncertainty, which might not always be the case. For instance, if the output distribution is multi-modal or if it is highly skewed, using variance as a proxy of uncertainty may lead to contradictory results.

To illustrate this point, we look at the simple function $y=x_1/x_2$. The two inputs both follow chi-squared distributions, $x_1\sim\chi^2(10)$ and $x_2\sim\chi^2(13.978)$, and are assumed to be independent. This results in a positively skewed distribution of $Y$ with a heavy tail. This example has been used in [19] to show that variance-based sensitivity indices fail to properly rank the inputs of a model whose output has a highly skewed distribution.

To overcome this issue, a Kullback-Leibler (KL) divergence based metric has been proposed in [19]:

KL_{T_{i}}=\int f_{1}(y(x_{1},\dots,\bar{x}_{i},\dots,x_{d}))\ln\frac{f_{1}(y(x_{1},\dots,\bar{x}_{i},\dots,x_{d}))}{f_{0}(y(x_{1},\dots,x_{i},\dots,x_{d}))}dy (6)

where $f_1(y)$ and $f_0(y)$ are the probability density functions of the output with and without $x_i$ fixed, usually at its mean. The larger $KL_{T_i}$ is, the more important $X_i$ is. It was found in [19] that the effect of $X_1$ is higher in terms of divergence of the output distribution, but the variance-based total index shows that $X_1$ and $X_2$ are equally important. The higher influence of $X_1$ has also been confirmed in [18], where the sensitivity is characterised by the change of the cumulative distribution function of the output.

Table 1: Sensitivity results for $y=x_1/x_2$. Results for $S_{T_i}$ and $KL_{T_i}$ are reproduced from [19].

Variable                   Variance-based $S_{T_i}$   K-L divergence $KL_{T_i}$   DGSM $\nu_i$   Entropy-based $\eta_{T_i}$
$x_1\sim\chi^2(10)$        0.546                      0.1571                      8.4e-3         0.510
$x_2\sim\chi^2(13.978)$    0.547                      0.0791                      2.1e-2         0.213

We reproduce the sensitivity results of $S_{T_i}$ and $KL_{T_i}$ in Table 1 from [19]. In comparison, we have also computed the DGSM results, using numerical integration as the derivatives are analytically available for this simple function. It can be seen from Table 1 that DGSM also provides misleading results in this case.

In addition, we also compare the results with the entropy-based total sensitivity index [20]:

\eta_{T_{i}}=\frac{\mathbb{E}[H(Y|\mathbf{x}_{\sim i})]}{H(Y)}=\frac{H(Y|\mathbf{X}_{\sim i})}{H(Y)} (7)

where the average is calculated over all possible values of $\mathbf{X}_{\sim i}$ and $H$ is the differential entropy, e.g. $H(Y)=-\int f(y)\ln f(y)dy$. $\eta_{T_i}$ measures the remaining entropy of $Y$ if the true values of $\mathbf{X}_{\sim i}$ can be determined, in analogy to the variance-based total effect index $S_{T_i}$. Note that $\eta_{T_i}\leq 1$ because $H(Y|\mathbf{X}_{\sim i})\leq H(Y)$ [21].

$\eta_{T_i}$ has been estimated numerically based on the histogram method given in Appendix A, where $10^7$ samples are used. It can be seen from Table 1 that the entropy-based $\eta_{T_i}$ is able to effectively identify the higher influence of $X_1$ for this example. Note that, unlike $KL_{T_i}$, which is conditional on the value of $x_i$ (the inputs are fixed at their mean values in Table 1), $\eta_{T_i}$ is an unconditional sensitivity measure as all possible values of the inputs are averaged out.
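
Appendix A is not reproduced here; the sketch below shows one common form of a histogram-based differential entropy estimator, $H(Y)\approx-\sum_k p_k\ln(p_k/\Delta_k)$, offered as a plausible stand-in rather than the exact implementation used for the results in this paper.

import numpy as np

def hist_entropy(samples, bins=200):
    # Histogram estimate of the differential entropy H(Y) in nats.
    counts, edges = np.histogram(samples, bins=bins)
    widths = np.diff(edges)                 # bin widths Delta_k
    p = counts / counts.sum()               # bin probabilities p_k
    nz = p > 0                              # skip empty bins to avoid log(0)
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

# sanity check against the closed form for a standard Gaussian, H = 0.5*ln(2*pi*e)
rng = np.random.default_rng(1)
y = rng.standard_normal(10**6)
print(hist_entropy(y), 0.5 * np.log(2 * np.pi * np.e))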

We note in passing that, analogously to the variance-based sensitivity indices, a first order entropy index can also be defined as $\eta_i=[H(Y)-H(Y|X_i)]/H(Y)=I(X_i,Y)/H(Y)$ [22]. $I(X_i,Y)$ is the mutual information, which measures how much knowing $X_i$ reduces the uncertainty of $Y$ or vice versa. The index $\eta_i$ can thus be regarded as a measure of the expected reduction in the entropy of $Y$ by fixing $X_i$.

2.3 Summary

From the motivating example, it becomes clear that the variance-based total sensitivity index can provide misleading sensitivity results, due to the fact that only the second moment is considered. This is especially the case for highly skewed or multi-modal distributions. This limitation is overcome by entropy-based measures, which are applicable regardless of the underlying shape of the distribution.

However, the entropy-based indices have limited application in practice, mainly due to the heavy computational burden, as knowledge of the conditional probability distributions is required. Both histogram and kernel based density estimation methods have computational challenges for entropy-based sensitivity indices, as pointed out in [18]. For example, it was noted in [3] that although both variance-based and entropy-based sensitivity analyses take a long computational time, the convergence for conditional entropy estimation is even slower.

Motivated by the above issues and inspired by the low-cost sensitivity proxy for variance-based measures, the objective of this paper is to find a computationally efficient proxy for the entropy-based total sensitivity measure. An overview of the entropy proxies is summarised in Figure 1, while details of their derivation are given in Section 3. For ease of reference, the main sensitivity measures considered in this paper are listed in Table 2.

Table 2: Summary list for sensitivity measures considered in this paper

Derivative-based GSA measures:
  $\mu_i$: limiting value of the modified Morris index, $\mathbb{E}_X\left[\left|g_{x_i}\right|\right]$
  $\nu_i$: derivative-based global sensitivity measure (DGSM), $\mathbb{E}_X\left[\left|g_{x_i}\right|^2\right]$
  $l_i$: log-derivative sensitivity measure, $\mathbb{E}_X\left[\ln\left|g_{x_i}\right|\right]$
Variance-based GSA measures:
  $V(Y|\mathbf{X}_{\sim i})$: total effect variance
  $S_{T_i}$: variance-based total effect index, $V(Y|\mathbf{X}_{\sim i})/V(Y)$
Entropy-based GSA measures:
  $H(Y|\mathbf{X}_{\sim i})$: total effect entropy
  $\eta_{T_i}$: entropy-based total effect index, $H(Y|\mathbf{X}_{\sim i})/H(Y)$
  $e^{H(Y|\mathbf{X}_{\sim i})}$: total effect exponential entropy
  $\kappa_{T_i}$: exponential entropy based total effect index, $e^{H(Y|\mathbf{X}_{\sim i})}/e^{H(Y)}$

3 Link between the partial derivatives and the total effect entropy measure

In this section, the relationship between the partial derivatives of the function and the entropy-based sensitivity indices is explored. We will first establish the link in Section 3.1 and then provide numerical verifications in Sections 3.2 and 3.3.

Note that in this section we will consider the function $y=g(\mathbf{x})$ where the function $g(\cdot)$ is differentiable. Recall that our interest here is in applications of global sensitivity analysis (GSA) to mathematical models, where a physical phenomenon is typically studied with a complex numerical code. The partial derivatives can then be obtained via a companion adjoint code, or numerically estimated by a finite difference method. For example, the derivative-based DGSM can be estimated by finite differences, and this can be performed efficiently via Monte Carlo sampling, as discussed in [5].

3.1 An upper bound for the total effect entropy

Let $X$ be a continuous random variable with probability density function (PDF) $f_X(x)$. $Y=g(X)$ can be regarded as a transformed variable, and the transformed PDF $f_Y(y)$ can be found via the Jacobian of the transformation for the calculation of its differential entropy $H(Y)=-\int f(y)\ln f(y)dy$.

For a general vector transformation $\mathbf{Y}=\mathbf{g}(\mathbf{X})$, the corresponding differential entropies are related as [23]:

H(\mathbf{Y})\leq H(\mathbf{X})+\int f_{X}(\mathbf{x})\ln{\left|\det{\mathbb{J}}\right|}d\mathbf{x} (8)

where $\mathbb{J}$ is the Jacobian matrix with $\mathbb{J}_{ij}=\partial g_i/\partial x_j$ and $f_X(\mathbf{x})$ is the joint probability density function (jPDF) of $\mathbf{x}$. The above inequality becomes an equality if the transformation is a bijection, i.e. an invertible transformation. Note that we have not assumed independent inputs for the inequality above.

As shown in [23], Eq 8 can be proved by substituting the transformed PDF $f(\mathbf{y})=f(\mathbf{x})/\left|\det{\mathbb{J}}\right|$ into the expression for $H(\mathbf{Y})$ and noting that there is a reduction of entropy if the transformation is not one-to-one. Following this line of thought, it was noted in [24] that Eq 8 can also be seen as one version of the data processing inequality, where the transformation does not increase information.
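
For a scalar bijection $y=g(x)$, for instance, Eq 8 reduces (with equality) to the familiar change-of-variables identity:

f_{Y}(y)=\frac{f_{X}(x)}{\left|g'(x)\right|}\quad\Rightarrow\quad H(Y)=-\int f_{Y}(y)\ln f_{Y}(y)\,dy=H(X)+\mathbb{E}\left[\ln\left|g'(X)\right|\right]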

Now set $y_1=g_1(\mathbf{x})=g(\mathbf{x})$ where $\mathbf{x}=(x_1,x_2,\dots,x_d)$, and introduce dummy variables $y_i=g_i(\mathbf{x})=x_i$ for $i=2,\dots,d$. In this setting, for the Jacobian matrix from the second row onwards, i.e. $i\geq 2$, $\partial g_i/\partial x_j=1$ when $i=j$ and $\partial g_i/\partial x_j=0$ when $i\neq j$. The Jacobian matrix in this case is therefore triangular, and the modulus of its determinant is the product of the diagonal entries:

\left|\det{\mathbb{J}}\right|=\left|\frac{\partial g_{1}}{\partial x_{1}}\times\underbrace{1\times\cdots\times 1}_{2\,\text{to}\,d}\right|=\left|\frac{\partial g_{1}}{\partial x_{1}}\right| (9)

The inequality in Eq 8 is thus:

H(\mathbf{Y})\leq H(\mathbf{X})+\int f_{X}(\mathbf{x})\ln{\left|\frac{\partial g_{1}(\mathbf{x})}{\partial x_{1}}\right|}d\mathbf{x} (10)

where $\mathbf{Y}=\{Y,X_2,X_3,\dots,X_d\}$.

On the left hand side of the above inequality, the joint entropy of $\mathbf{Y}$ can be expressed using conditional entropies as $H(\mathbf{Y})=H(Y|\mathbf{X}_{2\sim d})+H(\mathbf{X}_{2\sim d})$ [21]. The subscript $2\sim d$ indicates indices ranging from $2$ to $d$. On the right hand side of Eq 10, we have $H(\mathbf{X})\leq H(X_1)+H(\mathbf{X}_{2\sim d})$ using the subadditivity property of the joint entropy of the input variables. The joint entropy $H(\mathbf{X})$ becomes additive if the input variables are independent.

Putting these together, the inequality of Eq 10 then becomes:

H(Y|\mathbf{X}_{2\sim d})\leq H(X_{1})+\mathbb{E}\left[\ln{\left|\frac{\partial g(\mathbf{x})}{\partial x_{1}}\right|}\right]

where the expectation is with respect to the jPDF of the input variables. Note that $H(Y|\mathbf{X}_{2\sim d})=\mathbb{E}\left[H(Y|\mathbf{x}_{2\sim d})\right]$ is the expected conditional entropy.

The reasoning above uses the first variable x1x_{1} as an example. However, the results hold for any variables via simple row/column exchanges, which only affects the sign of the determinant but not its modulus.

Therefore, we have the following inequality:

H(Y|\mathbf{X}_{\sim i})\leq H(X_{i})+l_{i} (11)

where $\sim i$ indicates the indices ranging from $1$ to $d$ excluding $i$, and we have introduced the notation $l_i=\mathbb{E}\left[\ln{\left|{\partial g(\mathbf{x})}/{\partial x_i}\right|}\right]$.

The conditional entropy $H(Y|\mathbf{X}_{\sim i})$ measures the remaining entropy of $Y$ if the true values of $\mathbf{X}_{\sim i}$ can be determined, in analogy to the total effect index $S_{T_i}$ from variance-based GSA. Note that the conditional entropy $H(Y|\mathbf{X}_{\sim i})=\mathbb{E}\left[H(Y|\mathbf{x}_{\sim i})\right]$ is a global measure of uncertainty, as the expectation is taken over all possible values of $\mathbf{X}_{\sim i}$.

The above inequality thus demonstrates that, for a differentiable function $y=g(\mathbf{x})$, the entropy-based total sensitivity is bounded by the expectation of the log partial derivative of the function plus the entropy of the input variable of interest. This inequality becomes an equality if the input variables are independent and the transformation $g(\cdot)$ is a bijection, i.e. $g:X\rightarrow Y$ is an invertible transformation with a unique inverse.

The inequality in Eq 11 is one of the main contributions of this paper. It establishes an upper bound for the total effect entropy $H(Y|\mathbf{X}_{\sim i})$ using computationally efficient partial-derivative based functionals. As a smaller $l_i+H(X_i)$ tends to indicate a smaller total effect entropy, it can be used to screen uninfluential variables and work as a low-cost proxy for entropy-based indices.

The use of $l_i+H(X_i)$ as a proxy for the total effect entropy is similar to the DGSM-based upper bound for the variance-based $S_{T_i}$ described in Eq 5. However, there is a major issue with the inequality in Eq 11: we cannot normalise Eq 11 by $H(Y)$ to obtain a bound on the entropy-based total effect index $\eta_{T_i}$, as $H(Y)$ can potentially be negative. In Section 4, we will overcome this issue by taking the exponentiation of the inequality in Eq 11, and demonstrate that the exponential entropy based measure has a more intuitive sensitivity interpretation and is closely linked to the variance-DGSM proxy. Before that, we will first verify Eq 11 using several examples in the next two subsections.
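
As a sketch of how cheaply the proxy can be evaluated (an illustration with an assumed two-variable test function, assumed $\mathbb{U}(0,1)$ inputs and analytical gradients), $l_i$ and the bound $H(X_i)+l_i$ of Eq 11 can be estimated from a single set of Monte Carlo samples:

import numpy as np

rng = np.random.default_rng(0)

def grad_g(x):
    # analytical partial derivatives of the assumed test function g = exp(x1)*(1 + x2^2)
    g1 = np.exp(x[:, 0]) * (1.0 + x[:, 1]**2)        # dg/dx1
    g2 = np.exp(x[:, 0]) * 2.0 * x[:, 1]             # dg/dx2
    return np.column_stack((g1, g2))

N, d = 10**5, 2
X = rng.uniform(0.0, 1.0, size=(N, d))               # assumed U(0,1) inputs
H_X = np.zeros(d)                                    # H(X_i) = ln(1 - 0) = 0 for U(0,1)

l = np.log(np.abs(grad_g(X))).mean(axis=0)           # l_i = E[ln |dg/dx_i|]
print("upper bound H(X_i) + l_i on H(Y|X_~i):", H_X + l)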

3.2 Examples - monotonic functions

We will first test monotonic functions, where the equality in Eq 11 is expected, and examine the inequality for more general functions in the next section. All input variables are assumed to have the same uniform distribution for examples 1 to 4, i.e. $x_i\sim\mathbb{U}(0,1)$, while Gaussian distributions are used for example 5. For verification purposes, all the examples in this section are chosen to have tractable expressions for both the integral of derivatives and the conditional entropies. Note that for simplicity the input variables are assumed to be independent in these numerical examples. Although input independence is required for the equality condition, the inequality in Eq 11 makes no assumption of independence or distribution type for $X_i$.

We can see below that $H(Y|\mathbf{X}_{\sim i})=H(X_i)+l_i$ for examples 1 to 5, which numerically verifies the equality case of Eq 11. For examples 1 to 3, the conditional entropies are also numerically estimated using the method given in Appendix A. This is to demonstrate that, although numerical estimation of the total effect entropy $H(Y|\mathbf{X}_{\sim i})$ can be conducted for these low dimensional problems, it can be computationally very expensive, which is prohibitive for complex models.

Figure 2: Surface plots, with contours shown underneath, for the monotonic functions in examples 1 to 3.
Table 3: Total effect entropy for the monotonic functions in examples 1 to 3 (surface plots in Figure 2). Conditional entropy results are obtained both from analytical calculation (exact results) and via Monte Carlo (MC) sampling. The MC results are given for numbers of samples ranging from $10^3$ to $10^8$. The results from $10^8$ samples are compared to the exact results and the relative errors are less than 1% for all functions. Also given are the corresponding analytical results for $H(X_i)+l_i$.

All inputs $x_i\sim\mathbb{U}(0,1)$.

Number of samples    $y=x_1+e^{x_2}$             $y=x_1\times x_2$           $y=x_1+3x_2$
                     H(Y|X~1)    H(Y|X~2)        H(Y|X~1)    H(Y|X~2)        H(Y|X~1)    H(Y|X~2)
1.00E+03             0.0934      0.5251          -0.9057     -0.8468         0.2279      1.1269
1.00E+04             0.0602      0.5225          -0.9111     -0.9211         0.1043      1.1093
1.00E+05             0.0315      0.5131          -0.9528     -0.9593         0.0526      1.1047
1.00E+06             0.0148      0.5069          -0.9760     -0.9755         0.0242      1.1019
1.00E+07             0.0068      0.5034          -0.9879     -0.9878         0.0113      1.1000
1.00E+08             0.0032      0.5015          -0.9939     -0.9939         0.0052      1.0993
Exact results        0.0000      0.5000          -1.0000     -1.0000         0.0000      1.0986
Relative error       -           -0.30%          0.61%       0.61%           -           -0.06%
$H(X_i)+l_i$         0.0000      0.5000          -1.0000     -1.0000         0.0000      1.0986

Example 1.

$y=x_1+e^{x_2}$
For variable $x_1$: $H(Y|X_{\sim 1})=H(Y|X_2)=\int_{X_2}H(Y|X_2=x_2)dx_2=H(X_1)=0$, because the differential entropy remains constant under the addition of a constant ($e^{x_2}$ is a constant for $H(Y|x_2)$). For the right hand side of the inequality in Eq 11, we have $g_{x_1}=\partial y/\partial x_1=1$, so $\mathbb{E}\left[\ln{\left|g_{x_1}\right|}\right]+H(X_1)=0+0=0$.
For variable $x_2$: $H(Y|X_{\sim 2})=H(Y|X_1)=\int_{X_1}H(Y|X_1=x_1)dx_1=\int_{X_1}\left[-\int_1^e\frac{1}{y}\ln\frac{1}{y}dy\right]dx_1=1/2$, where $1/y$ is the conditional PDF of $Y|x_1$ because the transformed variable $V=e^{X_2}$ has PDF $f_V(v)=1/v$, $1<v<e$. Similarly, we have $\partial y/\partial x_2=e^{x_2}$, so $\mathbb{E}\left[\ln{\left|g_{x_2}\right|}\right]+H(X_2)=1/2+0=1/2$, which gives the equality as expected.

Example 2.

$y=x_1\times x_2$
Consider the function $y=x_1\times x_2$, where the partial derivatives are $\partial y/\partial x_1=x_2$ and $\partial y/\partial x_2=x_1$. For $x_i\sim\mathbb{U}(0,1)$, the expected values $l_i=\mathbb{E}\left[\ln{\left|g_{x_i}\right|}\right]$ are $-1$ for both $x_1$ and $x_2$. Recall that multiplying by a constant $c$ shifts the differential entropy by $\ln|c|$, so $H(Y|X_{\sim 1})=H(Y|X_2)=\mathbb{E}_{X_2}\left[H(X_1)+\ln|x_2|\right]=\int_0^1\ln{x_2}\,dx_2=-1$. The same result is obtained for $x_2$ due to the symmetry between $x_1$ and $x_2$. Therefore, $H(Y|\mathbf{X}_{\sim i})=H(X_i)+l_i$, as given by Eq 11.

Example 3.

$y=x_1+3x_2$
Consider the function $y=x_1+3x_2$, where the partial derivatives are 1 and 3 for $x_1$ and $x_2$ respectively. For $x_i\sim\mathbb{U}(0,1)$, the expected values $l_i=\mathbb{E}\left[\ln{\left|g_{x_i}\right|}\right]$ can be integrated analytically as 0 and $\ln 3\simeq 1.0986$ respectively. It is straightforward to show that, as in the previous examples, $H(Y|X_{\sim 1})=H(Y|X_2)=l_1+H(X_1)=0$ and $H(Y|X_{\sim 2})=H(Y|X_1)=\ln 3$, which is the same as $l_2+H(X_2)$.

Example 4.

$y=x_1x_2^r$
Consider the product function $y=x_1x_2^r$, where $r\geq 1$ (for $r=1$, we recover the function in example 2). For $x_1,x_2\sim\mathbb{U}(0,1)$, it is straightforward to show that $\mathbb{E}\left[\ln{\left|g_{x_1}\right|}\right]=-r$ and $\mathbb{E}\left[\ln{\left|g_{x_2}\right|}\right]=\ln{r}-r$.
For variable $x_1$: $H(Y|X_{\sim 1})=H(Y|X_2)=\mathbb{E}_{X_2}\left[H(X_1)+\ln|x_2^r|\right]=\int_0^1\ln{x_2^r}\,dx_2=-r$, which is the same as $l_1+H(X_1)=-r+0=-r$.
For variable $x_2$: $H(Y|X_{\sim 2})=H(Y|X_1)=\mathbb{E}_{X_1}\left[H(V)+\ln|x_1|\right]=(\ln r+1-r)-1=\ln{r}-r$, where the transformed variable $V=X_2^r$ has PDF $p(v)=\frac{1}{r}v^{\frac{1}{r}-1}$, $0<v<1$, and entropy $H(V)=-\int_0^1 p(v)\ln p(v)dv=\ln r+1-r$. This is the same as $l_2+H(X_2)=\ln{r}-r+0=\ln{r}-r$.

Example 5.

$y=\sum_{i=1}^{n}a_ix_i$
This linear function, $y=\sum_{i=1}^{n}a_ix_i$, has been used in [22] to demonstrate the equivalence between entropy-based and variance-based sensitivity indices for Gaussian random inputs, i.e. $x_i\sim\mathbb{N}(\mu_i,\sigma_i^2)$. In the case of independent inputs, the sensitivity index based on the conditional entropy can be obtained as $H(Y|\mathbf{X}_{\sim i})=1/2\ln(2\pi ea_i^2\sigma_i^2)$, which is just $H(X_i)$ shifted by the log of the scaling constant, i.e. $H(Y|\mathbf{X}_{\sim i})=H(X_i)+\ln{|a_i|}$. As $\mathbb{E}\left[\ln{\left|g_{x_i}\right|}\right]=\ln|a_i|$ for this simple linear function, the equality case of the inequality in Eq 11 is obtained.

3.3 Examples - General functions

Both the Ishigami function and the G-function are commonly used test functions for global sensitivity analysis, due to the presence of strong interactions. These two functions, each with three input variables, are used in this section to demonstrate the inequality relationship derived in Eq 11. We will focus on the verification of Eq 11 in this section, but will revisit these two functions in the next section for a comparison with variance-based sensitivity indices.

The conditional entropies are estimated numerically using Monte Carlo sampling as described in Appendix A. Different numbers of samples are used, ranging from $10^6$ to $10^8$. For each estimation, the computation is repeated 20 times and both the mean value and the standard deviation (std) are reported in Tables 4 and 5. For the estimation of the derivative-based $l_i$, Matlab’s inbuilt numerical integrator "integral" is used with the default tolerance setting.

Example 6.

The Ishigami function, $y=\sin(x_1)+a\sin^2(x_2)+bx_3^4\sin(x_1)$, is often used as an example for uncertainty and sensitivity analysis. It exhibits strong nonlinearity and nonmonotonicity, as can be seen in Figure 3a. In this case, $a=7$ and $b=0.1$ are used, and the input random variables have uniform distributions, i.e. $x_i\sim\mathbb{U}(-\pi,\pi)$ for $i=1,2,3$.
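
The bound $H(X_i)+l_i$ reported later in Table 4 can be computed cheaply from the analytical Ishigami gradients; the sketch below uses plain Monte Carlo sampling, rather than the quadrature used for the tabulated values.

import numpy as np

rng = np.random.default_rng(0)
a, b, N = 7.0, 0.1, 10**6
X = rng.uniform(-np.pi, np.pi, size=(N, 3))
x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]

# analytical partial derivatives of the Ishigami function
g1 = np.cos(x1) * (1.0 + b * x3**4)        # dg/dx1
g2 = 2.0 * a * np.sin(x2) * np.cos(x2)     # dg/dx2
g3 = 4.0 * b * x3**3 * np.sin(x1)          # dg/dx3

H_Xi = np.log(2.0 * np.pi)                 # H(X_i) = ln(b - a) for U(-pi, pi)
for i, gi in enumerate((g1, g2, g3), start=1):
    l_i = np.mean(np.log(np.abs(gi)))      # l_i = E[ln |dg/dx_i|]
    print(f"H(X{i}) + l_{i} = {H_Xi + l_i:.4f}")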

Figure 3: Example plots for the general functions considered. a) Scatter plot for the Ishigami function, $y=\sin(x_1)+7\sin^2(x_2)+0.1x_3^4\sin(x_1)$; b) surface plot, with contours shown underneath, for an example of the G-function with two variables.

The sensitivity results are listed in Table 4, where it is clear that the inequality of Eq 11 is satisfied. The total effect entropy $H(Y|\mathbf{X}_{\sim i})$ is estimated with different numbers of samples, with each estimation repeated 20 times to assess variability. It is clear from the small std values that the variation is small. However, the convergence of the conditional entropy estimation is slow, as a large number of samples is needed, as we also saw in Table 3.

Table 4: Total effect entropy results for the Ishigami function, obtained for different numbers of samples. Each estimation is repeated 20 times and the mean and standard deviation (std) are given. The results from $10^8$ samples are compared to $H(X_i)+l_i$, for which the inequality in Eq 11 is clearly satisfied.

Ishigami function $y=\sin(x_1)+7\sin^2(x_2)+0.1x_3^4\sin(x_1)$

Number of samples    H(Y|X~1)             H(Y|X~2)             H(Y|X~3)
                     mean      std        mean      std        mean      std
1.00E+06             1.3902    0.0007     1.7614    0.0006     0.9701    0.0013
1.00E+07             1.2978    0.0003     1.7023    0.0001     0.7693    0.0004
1.00E+08             1.2335    0.0001     1.6609    0.0001     0.6066    0.0002

                            $X_1$     $X_2$     $X_3$
H(Y|X~i) ($10^8$ samples)   1.2335    1.6609    0.6066
$H(X_i)+l_i$                1.9024    3.0906    0.6626

Example 7.

Consider the so-called G-function, $y=\prod_{i=1}^{3}(|4x_i-2|+a_i)/(1+a_i)$, which is often used for numerical experiments in sensitivity analysis. It is a highly nonlinear function, as can be seen in Figure 3b for a two-variable example. In this case, $a_i=(i-2)/2$ for $i=1,2,3$. The input random variables have uniform distributions, i.e. $x_i\sim\mathbb{U}(0,1)$ for $i=1,2,3$. A lower value of $a_i$ indicates a higher importance of the input variable $x_i$, i.e. $x_1$ is the most important, while $x_3$ is the least important in this case. The sensitivity results for the G-function, in the same format as Table 4, are reported in Table 5. It is clear that the inequality relationship in Eq 11 is satisfied.

Table 5: Sensitivity results for the G-function, where the inequality in Eq 11 is clearly satisfied. Same key as Table 4

G-function $y=\prod_{i=1}^{3}\frac{|4x_i-2|+a_i}{1+a_i}$ with $a_i=(i-2)/2$

Number of samples    H(Y|X~1)             H(Y|X~2)             H(Y|X~3)
                     mean      std        mean      std        mean      std
1.00E+06             0.3477    0.0009     -0.1376   0.0013     -0.3988   0.0015
1.00E+07             0.3398    0.0006     -0.1737   0.0005     -0.4482   0.0006
1.00E+08             0.3378    0.0003     -0.1917   0.0002     -0.4738   0.0002

                            $X_1$     $X_2$     $X_3$
H(Y|X~i) ($10^8$ samples)   0.3378    -0.1917   -0.4738
$H(X_i)+l_i$                1.3863    0.9808    0.6931

4 Exponential entropy based total sensitivity measure

There are two issues with the differential entropy based sensitivity measure: 1) as pointed out in [20], the entropy of a continuous random variable (the differential entropy) can be negative, which is undesirable for a sensitivity index; more importantly, as mentioned in Section 3.1, the inequality in Eq 11 cannot be normalised by a negative $H(Y)$; 2) the interpretation of conditional entropy is not as intuitive as that of variance-based sensitivity indices. This is partly because variance-based methods are firmly anchored in variance decomposition, but also because entropy measures the average information or non-uniformity of a distribution, as compared to variance, which measures the spread of data around the mean. Although non-uniformity can be seen as a suitable measure for epistemic uncertainties [22], its interpretation for GSA in a general setting can be further improved.

We propose to use exponential entropy as an entropy-based measure for global sensitivity analysis to overcome these two issues. The study in [3] already noted that an exponentiation of the standard entropy-based sensitivity measures may improve their discrimination power. We take the exponentiation of the inequality in Eq 11:

e^{H(Y|\mathbf{X}_{\sim i})}\leq e^{H(X_{i})}e^{l_{i}} (12)

where we recall that $l_i=\mathbb{E}\left[\ln{\left|g_{x_i}\right|}\right]$. Dividing both sides of Eq 12 by $e^{H(Y)}$ gives:

\kappa_{T_{i}}=\frac{e^{H(Y|\mathbf{X}_{\sim i})}}{e^{H(Y)}}\leq\frac{e^{H(X_{i})}}{e^{H(Y)}}e^{l_{i}} (13)

where $\kappa_{T_i}$ can be considered as the exponential entropy based total sensitivity index, and the upper bound can then be used as a proxy for $\kappa_{T_i}$. As $H(Y|\mathbf{X}_{\sim i})\leq H(Y)$, we have $0<\kappa_{T_i}\leq 1$, which is a desirable property for a sensitivity index.

In addition, $\kappa_{T_i}$ and the un-normalised $e^{H(Y|\mathbf{X}_{\sim i})}$ have a more intuitive interpretation as GSA indices compared to the standard differential entropy, because the exponential entropy can be seen as a measure of the effective spread or extent of a distribution [6].

4.1 Intuitive interpretation of sensitivity based on exponential entropy

To explain this point, we first recall that the entropy of a random variable with a uniform distribution is $\ln(b-a)$, where $a,b$ are the bounds of the distribution. Taking the natural exponential of the entropy in this case gives $b-a$, which is the range of the uniform distribution. For a Gaussian distribution with variance $\sigma^2$, the exponential entropy is $\sqrt{2\pi e}\,\sigma$, which is proportional to the standard deviation.
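
Written out from the standard entropy formulas, these two statements are:

H(X)=\ln(b-a)\;\Rightarrow\;e^{H(X)}=b-a\quad\text{for}\quad X\sim\mathbb{U}(a,b),\qquad H(X)=\tfrac{1}{2}\ln(2\pi e\sigma^{2})\;\Rightarrow\;e^{H(X)}=\sqrt{2\pi e}\,\sigma\quad\text{for}\quad X\sim\mathbb{N}(\mu,\sigma^{2})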

Figure 4: Histograms of the output from examples 1 to 3. The exponential entropies of the outputs are (a) $e^{H(Y)}=2.269$; (b) $e^{H(Y)}=0.655$; (c) $e^{H(Y)}=3.543$, which describe the effective extent/support of the underlying distributions.

This becomes more evident if we plot the histograms of the function output $Y$ from examples 1 to 3, as in Figure 4. These histograms are based on $10^6$ Monte Carlo samples, from which we have calculated the exponential entropy $e^{H(Y)}$ for each case. When compared to the range of the distribution, the values of the exponential entropy provide an intuitive indication of the spread of the distribution, in analogy to the standard deviation or variance. For example, the average width of the top and bottom of the trapezium in Example 3 is approximately $(3+4)/2=3.5$, which is about the same as the exponential entropy of 3.543.

Therefore, the exponential entropy $e^H$ can be regarded as a measure of the extent, or effective support, of a distribution, and this has been discussed in detail in [6]. As $H(Y|\mathbf{X}_{\sim i})$ measures the average remaining entropy if the true values of $\mathbf{X}_{\sim i}$ can be determined, the exponential of the conditional entropy, $e^{H(Y|\mathbf{X}_{\sim i})}$, can be regarded as the effective remaining range of the output distribution conditional on $\mathbf{X}_{\sim i}$ being known. $\kappa_{T_i}$ then measures the ratio of the effective range after $\mathbf{X}_{\sim i}$ are fixed to that before, and a larger $\kappa_{T_i}$ thus indicates a higher influence of $X_i$.

4.2 Link between exponential entropy and variance

Last but not least, in addition to its non-negativity and a more intuitive interpretation for GSA, exponential entropy is also closely linked to variance-based GSA indices and their corresponding bounds. To demonstrate this, we first note that the three different derivative-based sensitivity indices listed in Table 2 are closely related as:

e^{l_{i}}\leq\mu_{i}\leq\sqrt{\nu_{i}} (14)

where it is evident that $\mu_i\leq\sqrt{\nu_i}$ from the Cauchy-Schwarz inequality. In addition, we have $e^{l_i}\leq\mu_i$ using Jensen’s inequality, as the exponential function is convex. The inequality for the exponential entropy in Eq 12 can thus be further extended to:

e^{2H(Y|\mathbf{X}_{\sim i})}\leq e^{2H(X_{i})}\nu_{i} (15)

where we recall that $\nu_i$ is the derivative-based DGSM index.

Eq 15 already looks remarkably similar to the variance-DGSM inequality given in Eq 5. In fact, the squared exponential entropy $e^{2H(X)}$ of a random variable $X$, which is proportional to the entropy power, is known from information theory to be bounded by the variance of $X$ via $e^{2H(X)}\leq 2\pi e{V(X)}$ [21], where equality is obtained if $X$ is a Gaussian random variable. Therefore, for independent Gaussian inputs with variance $\sigma_i^2$, we have $e^{2H(X_i)}=2\pi e\sigma_i^2$. The squared exponential entropy $e^{2H(X)}$ thus measures an ‘effective variance’, as it is proportional to the variance of a Gaussian random variable with the same entropy [25].

In the special case where the function $y=g(\mathbf{x})$ is linear, the conditional distribution of $Y|\mathbf{X}_{\sim i}$ is also Gaussian, which means the conditional squared exponential entropy is likewise proportional to the conditional variance of the output, i.e. $e^{2H(Y|\mathbf{X}_{\sim i})}=2\pi eV(Y|\mathbf{X}_{\sim i})$ (cf. Example 5 in Section 3). Therefore, with independent Gaussian inputs and a linear function, we have from Eq 15:

2\pi eV(Y|\mathbf{X}_{\sim i})\leq 2\pi e\sigma_{i}^{2}\nu_{i}\;\rightarrow\;V(Y|\mathbf{X}_{\sim i})\leq\sigma_{i}^{2}\nu_{i} (16)

which indicates that in this special case the entropy-DGSM bound from Eq 15 is equivalent to the variance-DGSM relationship given in Eq 5.

4.3 Examples with Ishigami function and G-function

In this section, we revisit the Ishigami function of Example 6 and the G-function of Example 7 from Section 3.3 using the exponential entropy based total sensitivity index $\kappa_{T_i}$. The results for $\kappa_{T_i}$ are listed in Table 6. They are obtained using the mean values of the total effect entropy $H(Y|\mathbf{X}_{\sim i})$ from Tables 4 and 5 with $10^8$ samples, and the corresponding $e^{H(Y)}$ for the output.

Table 6: Sensitivity results for Ishigami and G-function

                              $X_1$     $X_2$     $X_3$
G-function    $\kappa_{T_i}$  0.5306    0.3125    0.2357
              $S_{T_i}$       0.6246    0.3240    0.1913
Ishigami      $\kappa_{T_i}$  0.2300    0.3527    0.1229
              $S_{T_i}$       0.5576    0.4424    0.2437

In comparison, we have also calculated the variance-based $S_{T_i}$, using $10^5$ samples with 20 repetitions. The mean values of $S_{T_i}$ are listed in Table 6. It can be seen that the results for $\kappa_{T_i}$ are generally consistent with $S_{T_i}$, especially for the G-function, where the sensitivity rankings are similar both qualitatively and quantitatively. For the Ishigami function, both indices successfully identify $X_3$ as having the lowest contribution. However, the relative importance of $X_1$ and $X_2$ is reversed between $\kappa_{T_i}$ and $S_{T_i}$. This difference between the variance-based and entropy-based rankings for the Ishigami function was also noted in [3], but without further explanation.

We attempt to examine this difference using a simple function with $(x,z)$ as the input random variables:

y(x,z)=\alpha(z)x+\beta(z) (17)

which can be seen to represent the interaction effect of $x_1$ and $x_3$ in the Ishigami function. The conditional variance and entropy of $Y|Z=z$ are thus:

V(Y|Z=z)={\alpha^{2}(z)}V(X)\quad\text{and}\quad H(Y|Z=z)=H(X)+\ln{|\alpha(z)|} (18)

where the translation invariance of variance and entropy has been used. The corresponding unconditional $V(Y|Z)$ and $H(Y|Z)$ can be found as:

V(Y|Z)=\mathbb{E}_{Z}[{\alpha^{2}(z)}]V(X)\quad\text{and}\quad H(Y|Z)=\mathbb{E}_{Z}[\ln{|\alpha(z)|}]+H(X) (19)

From Eq 19, it is clear that, despite the similarities, the conditional variance $V(Y|Z)$ depends on $\alpha^2$, which grows much faster with $|\alpha|$ than the $\ln|\alpha|$ term in the conditional entropy $H(Y|Z)$.

In the Ishigami function, the effect of $x_3$ grows quickly as $x_3$ approaches its boundaries, i.e. towards $-\pi$ and $\pi$. This is evident from the scatter plot for $x_3$ in Figure 3a. The much stronger interaction between $x_1$ and $x_3$ near the support boundary has also been noted in [26], where the importance of $x_1$ was found to be much reduced if the support of the distribution of $x_3$ is reduced by 10 percent: restricting $x_3$ to $[-\pi+\pi/10,\,\pi-\pi/10]$ would make $x_2$ the most influential variable.

From Eq 19, we can see that the interaction between $x_1$ and $x_3$ is more influential for the conditional variance due to the squaring effect, as compared to the entropy, which takes the logarithm of the interaction. This difference increases towards the boundary as the interaction between $x_1$ and $x_3$ gets stronger towards $-\pi$ and $\pi$, and it helps to explain why $x_1$ is the most influential variable for the variance-based $S_{T_i}$. This example highlights that, despite many similarities, entropy and variance are fundamentally different; for example, variable interactions are processed differently by the two measures.

5 A flood model case study

The motivation of this paper is to identify a much more computationally efficient proxy for the entropy-based total effect sensitivity. $e^{H(X_i)}e^{l_i}$ has been proved to upper bound the total effect exponential entropy, and can thus be used as a proxy for $e^{H(Y|\mathbf{X}_{\sim i})}$, with its normalised version as a proxy for $\kappa_{T_i}$.

We have demonstrated the inequality via several numerical examples in Section 3 for the differential entropy based measure. In the previous section, the use of the log-derivative $l_i$ has been further extended to the more commonly used DGSM index $\nu_i$, where $e^{2H(X_i)}\nu_i$ can be used as a proxy for $e^{2H(Y|\mathbf{X}_{\sim i})}$. This is especially useful when $\nu_i$ is already known.

To illustrate this point for practical problems, a simple river flood model is considered, for which $\nu_i$ has been used for sensitivity analysis. This model was used in [8] to demonstrate the use of the Cheeger constant for factor prioritization with DGSM, and as an example in the GSA review [27].

This model simulates the height of a river; flooding occurs when the river height exceeds the height of a dyke that protects industrial facilities. It is based on a simplification of the 1D hydro-dynamical Saint-Venant equations under the assumptions of a uniform and constant flow rate and a large rectangular cross-section. The quantity of interest in this case is the maximal annual overflow $Y$:

Y=Z_{\nu}+D_{m}-D_{d}-C_{b}\quad\text{with}\quad D_{m}=\left(\frac{Q}{BK_{s}\sqrt{(Z_{m}-Z_{\nu})/L}}\right)^{0.6} (20)

where the distributions of the independent input variables are listed in Table 7. We have also added the exponential entropy $e^{H(X_i)}$ value for each variable in Table 7. Analytical expressions for the differential entropy of most probability distributions are readily available and well documented; a comprehensive list can be found in [28].

For a distribution truncated to the interval $[a,b]$, the differential entropy can be found as [29]:

H_{\text{truncated}}(X)=-\int_{a}^{b}\frac{f(x)}{\Delta F}\ln\frac{f(x)}{\Delta F}dx (21)

where $f(x)$ and $F(x)$ are the original probability density function (PDF) and cumulative distribution function (CDF) of the random variable $X$ respectively. $f(x)/\Delta F$ is the PDF of the truncated distribution, where $\Delta F=F(b)-F(a)$. As both $f(x)$ and $F(x)$ are analytically known, Eq 21 can then be integrated numerically for the truncated entropy.
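
As an illustration, a minimal sketch of this numerical integration using standard SciPy distributions, with the truncated Normal input $K_s$ and the truncation bound of Table 7 as the example:

import numpy as np
from scipy import stats
from scipy.integrate import quad

# truncated Normal N(30, 8) on [15, inf), cf. the Strickler coefficient Ks in Table 7
dist = stats.norm(loc=30.0, scale=8.0)
a, b = 15.0, np.inf
dF = dist.cdf(b) - dist.cdf(a)                 # normalising constant Delta F

def neg_plogp(x):
    p = dist.pdf(x) / dF                       # truncated PDF f(x)/Delta F
    return 0.0 if p <= 0.0 else -p * np.log(p)

H_trunc, _ = quad(neg_plogp, a, b)             # Eq 21 by numerical quadrature
print("H_truncated =", H_trunc, " exp(H) =", np.exp(H_trunc))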

Table 7: Probability distributions for input variables of the flood model and their exponential entropy

Variable          Description                       Distribution function                                      $e^{H(X_i)}$
$x_1$  $Q$        Maximal annual flowrate [m^3/s]   Truncated Gumbel $\mathcal{G}(1013,558)$ on [500, 3000]    2051
$x_2$  $K_s$      Strickler coefficient [-]         Truncated Normal $\mathcal{N}(30,8)$ on [15, $\infty$)     30
$x_3$  $Z_{\nu}$  River downstream level [m]        Triangular $\mathcal{T}(49,50,51)$                         1.65
$x_4$  $Z_m$      River upstream level [m]          Triangular $\mathcal{T}(54,55,56)$                         1.65
$x_5$  $D_d$      Dyke height [m]                   Uniform $\mathcal{U}[7,9]$                                 2
$x_6$  $C_b$      Bank level [m]                    Triangular $\mathcal{T}(55,55.5,56)$                       0.825
$x_7$  $L$        Length of the river stretch [m]   Triangular $\mathcal{T}(4990,5000,5010)$                   16.5
$x_8$  $B$        River width [m]                   Triangular $\mathcal{T}(295,300,305)$                      8.24

The results for the sensitivity indices are shown in Table 8, where both the variance-based and derivative-based results are obtained from Table 5 of reference [8]. According to [8], the variance-based index is based on $2\times 10^7$ model evaluations, while a Sobol’ sequence with $1\times 10^4$ model evaluations is used for estimating the derivative-based DGSM. Note that, different from [8], the un-normalised total effect variance $V(Y|\mathbf{X}_{\sim i})$ and its DGSM based upper bound are given in Table 8. Recall that $S_{T_i}=V(Y|\mathbf{X}_{\sim i})/V(Y)$, and the variance of the output is $V(Y)=1.174$ in this case.

Also given in Table 8 is the entropy-based upper bound, which is calculated using the exponential entropy values listed in Table 7 and the $\nu_i$ results. Note that, as analytical expressions are typically available for the entropy, there is negligible extra cost for the estimation of the entropy-DGSM upper bound once $\nu_i$ is known. From the entropy-DGSM proxy, we can see that four input variables, $Q, K_s, Z_{\nu}, D_d$, are identified as the most important variables for the maximal annual overflow, with $L$ and $B$ of negligible influence.

Comparing the new entropy-DGSM proxy with the results from the other measures, it is clear from Table 8 that: 1) DGSM alone fails to properly rank the input variables in this case, as pointed out in [8]. Eq 11 hints at the reason: although $l_i$ is a global measure, it is the combination of $H(X_i)$ and the derivative-based $l_i$ that has the direct link with the total effect entropy. This point is also reflected in the variance proxy, where $\nu_i$ is scaled by the Cheeger constant, which is a function of the distribution functions of the inputs; 2) both upper bound proxies, the entropy-based $e^{2H(X_i)}\nu_i$ and the variance-based $4c_i^2\nu_i$, provide results consistent with the total effect variance $V(Y|\mathbf{X}_{\sim i})$; 3) the entropy-DGSM upper bound outperforms the variance-DGSM proxy, ranking the variables in exactly the same order as the total effect variance (the relative order of $L$ and $B$ matters little, as their influence is almost zero).

It is thus clear that the entropy-DGSM upper bound can be used as a proxy for total effect sensitivity analysis. Compared to the Cheeger constant, the normalisation constant $e^{2H(X_i)}$ is also easier to compute, as the entropies of many distributions are known in closed form. The main computational cost of the entropy-DGSM proxy is thus the calculation of $\nu_i$, which is much more affordable than estimating the conditional variance or the conditional entropy.

Table 8: Sensitivity results for the flood model. Results for $V(Y|\mathbf{X}_{\sim i})$, $\nu_i$ and $4c_i^2\nu_i$ are reproduced from [8], with $V(Y|\mathbf{X}_{\sim i}) = V(Y)\,S_{T_i}$ and $V(Y) = 1.174$. The rankings are shown in brackets.

Variable | Total effect variance $V(Y|\mathbf{X}_{\sim i})$ | DGSM $\nu_i$ | Variance-DGSM upper bound $4c_i^2\nu_i$ | Entropy-DGSM upper bound $e^{2H(X_i)}\nu_i$
$x_1$ ($Q$) | 0.414 (1) | 1.296e-06 (7) | 3.295 (1) | 5.452 (1)
$x_2$ ($K_s$) | 0.163 (4) | 3.286e-03 (5) | 0.233 (4) | 2.960 (4)
$x_3$ ($Z_\nu$) | 0.218 (3) | 1.123e+00 (1) | 0.659 (2) | 3.053 (3)
$x_4$ ($Z_m$) | 0.004 (6) | 2.279e-02 (4) | 0.013 (6) | 0.062 (6)
$x_5$ ($D_d$) | 0.324 (2) | 8.389e-01 (2) | 0.399 (3) | 3.356 (2)
$x_6$ ($C_b$) | 0.042 (5) | 8.389e-01 (3) | 0.123 (5) | 0.570 (5)
$x_7$ ($L$) | 0.000 (7) | 2.147e-08 (8) | 0.000 (7) | 0.000 (8)
$x_8$ ($B$) | 0.000 (8) | 2.386e-05 (6) | 0.000 (8) | 0.002 (7)
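For transparency, the entropy-DGSM column of Table 8 follows directly from the quantities already tabulated. The minimal sketch below simply combines the $e^{H(X_i)}$ values of Table 7 with the $\nu_i$ values reproduced from [8], and compares the resulting ranking with that of the total effect variance.

```python
# A sketch recomputing the entropy-DGSM upper bound e^{2H(X_i)} nu_i of Table 8
# from the exponential entropies in Table 7 and the nu_i values reproduced from [8].
import numpy as np

names = ["Q", "Ks", "Zv", "Zm", "Dd", "Cb", "L", "B"]
expH  = np.array([2051, 30, 1.65, 1.65, 2, 0.825, 16.5, 8.24])        # e^{H(X_i)}, Table 7
nu    = np.array([1.296e-06, 3.286e-03, 1.123e+00, 2.279e-02,
                  8.389e-01, 8.389e-01, 2.147e-08, 2.386e-05])        # nu_i, Table 8
V_tot = np.array([0.414, 0.163, 0.218, 0.004, 0.324, 0.042, 0.0, 0.0])  # V(Y|X_~i), Table 8

proxy = expH**2 * nu                                                   # e^{2H(X_i)} nu_i

def rank(v):
    """Rank in decreasing order of importance (1 = most important)."""
    return (-v).argsort().argsort() + 1

for n, p, r_p, r_v in zip(names, proxy, rank(proxy), rank(V_tot)):
    print(f"{n:3s} proxy = {p:8.3f}  rank(proxy) = {r_p}  rank(V) = {r_v}")
```

With the tie between $L$ and $B$ ignored, the two rankings agree, which is the comparison made in point 3) above.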

6 Conclusions

A novel global sensitivity proxy for the entropy-based total effect has been developed in this paper. This development is motivated by the observation that, for distributions where the second moment is not a sensible measure of the output uncertainty, such as highly skewed distributions, entropy-based sensitivity indices can perform better for global sensitivity analysis than variance-based measures. It is also driven by the fact that entropy-based indices have seen limited application in practice, mainly due to the heavy computational burden, as knowledge of the conditional probability distributions is required.

We have made use of the inequality between the entropy of the model output and that of its inputs, which can be seen as an instance of the data processing inequality, and established an upper bound for the total effect entropy $H(Y|\mathbf{X}_{\sim i})$. This upper bound is tight for monotonic functions, as demonstrated numerically. For general functions, the upper bound was able to provide similar input rankings and can thus be regarded as a proxy for the entropy-based total effect measure. Applied to a physics model for flood analysis, the new proxy shows much improved variable ranking capability compared to derivative-based global sensitivity measures (DGSM).

The resulting proxy is based on partial derivatives and is thus computationally cheap to estimate. If the derivatives are available, e.g. as output of a computational code, the proxy is readily available. Even if a Monte Carlo based approach is used, the computational cost is typically on the order of $10^3$ or $10^4$ model evaluations, as compared to $10^7$ for entropy-based indices. This computational advantage is why derivative-based methods, such as Morris' method and DGSM, are popular among practitioners whenever computational cost is a major concern. This is especially true for high dimensional problems, where estimating conditional entropies becomes infeasible because many instances of conditional distributions are required.

Drawing on the criticism of using differential entropy as a sensitivity index, we propose to use its exponentiation, which is both non-negative and offers a more intuitive interpretation as a sensitivity index. The total effect exponential entropy $e^{H(Y|\mathbf{X}_{\sim i})}$ and its normalised index $\kappa_{T_i}$ measure the effective remaining extent, or spread, of a distribution conditional on $\mathbf{X}_{\sim i}$ being known. A larger $e^{H(Y|\mathbf{X}_{\sim i})}$ or $\kappa_{T_i}$ thus indicates a higher influence of $X_i$. This interpretation is similar to that of variance-based sensitivity indices, as variance is also a measure of distribution spread. However, entropy-based and variance-based indices are fundamentally different and would generally provide different variable rankings. This is evident from the Ishigami function example, where it was found that the interaction effects between variables are processed differently.

The sensitivity measure $\kappa_{T_i}$ based on exponential entropy is introduced in this work mainly for the purpose of extending the conditional entropy upper bound. However, $\kappa_{T_i}$ is interesting in its own right, as it possesses many desirable properties for GSA: it is quantitative, moment independent and easy to interpret. The G-function example in Section 4.3 shows that $\sum\kappa_{T_i}$ is close to one for this product function, as opposed to the variance-based indices, where the sum of the sensitivity indices equals one for additive functions. As the exponential entropy can be seen as a geometric mean of the underlying distribution, i.e. $e^{H(X)} = e^{\mathbb{E}[-\ln f(x)]}$, one direction of future research is to examine the unique properties of GSA indices based on exponential entropy, and to explore its decomposition characteristics for sensitivity analysis of different interaction orders.

Acknowledgment

For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

Data availability statement

The author confirms that the data supporting the findings of this study are available within the article.

References

  • [2] Saltelli, A.: Global sensitivity analysis: the primer. John Wiley, 2008. http://books.google.at/books?id=wAssmt2vumgC. ISBN 9780470059975
  • [3] Auder, B.; Iooss, B.: Global sensitivity analysis based on entropy. In: Proceedings of the ESREL 2008 Conference (2008), pp. 2107–2115
  • [4] Morris, Max D.: Factorial sampling plans for preliminary computational experiments. In: Technometrics 33 (1991), no. 2, pp. 161–174. DOI 10.1080/00401706.1991.10484804
  • [5] Kucherenko, S.; Rodriguez-Fernandez, María; Pantelides, C.; Shah, Nilay: Monte Carlo evaluation of derivative-based global sensitivity measures. In: Reliability Engineering & System Safety 94 (2009), no. 7, pp. 1135–1148
  • [6] Campbell, L. L.: Exponential entropy as a measure of extent of a distribution. In: Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 5 (1966), no. 3, pp. 217–225
  • [7] Sobol', I. M.; Kucherenko, S.: Derivative based global sensitivity measures and their link with global sensitivity indices. In: Mathematics and Computers in Simulation 79 (2009), no. 10, pp. 3009–3017
  • [8] Lamboni, M.; Iooss, B.; Popelin, A.-L.; Gamboa, F.: Derivative-based global sensitivity measures: General links with Sobol' indices and numerical tests. In: Mathematics and Computers in Simulation 87 (2013), pp. 45–54. DOI 10.1016/j.matcom.2013.02.002. ISSN 0378-4754
  • [9] Wainwright, Haruko M.; Finsterle, Stefan; Jung, Yoojin; Zhou, Quanlin; Birkholzer, Jens T.: Making sense of global sensitivity analyses. In: Computers & Geosciences 65 (2014), pp. 84–94. DOI 10.1016/j.cageo.2013.06.006. ISSN 0098-3004
  • [10] Borgonovo, Emanuele; Plischke, Elmar: Sensitivity analysis: A review of recent advances. In: European Journal of Operational Research 248 (2016), no. 3, pp. 869–887. DOI 10.1016/j.ejor.2015.06.032. ISSN 0377-2217
  • [11] Lemaître, P.; Sergienko, E.; Arnaud, A.; Bousquet, N.; Gamboa, F.; Iooss, B.: Density modification-based reliability sensitivity analysis. In: Journal of Statistical Computation and Simulation 85 (2015), no. 6, pp. 1200–1223. DOI 10.1080/00949655.2013.873039
  • [12] Borgonovo, E.: A new uncertainty importance measure. In: Reliability Engineering & System Safety 92 (2007), no. 6, pp. 771–784. DOI 10.1016/j.ress.2006.04.015. ISSN 0951-8320
  • [13] Yang, Jiannan: A general framework for probabilistic sensitivity analysis with respect to distribution parameters. In: Probabilistic Engineering Mechanics 72 (2023), 103433
  • [14] Yang, Jiannan: Decision-oriented two-parameter Fisher information sensitivity using symplectic decomposition. In: Technometrics (2023), pp. 1–12. DOI 10.1080/00401706.2023.2216251
  • [15] Razavi, Saman; Jakeman, Anthony; Saltelli, Andrea; Prieur, Clémentine; Iooss, Bertrand; Borgonovo, Emanuele; Plischke, Elmar; Piano, Samuele L.; Iwanaga, Takuya; Becker, William et al.: The future of sensitivity analysis: an essential discipline for systems modeling and policy support. In: Environmental Modelling & Software 137 (2021), 104954
  • [16] Puy, Arnald; Becker, William; Piano, Samuele L.; Saltelli, Andrea: The battle of total-order sensitivity estimators. In: arXiv preprint arXiv:2009.01147 (2020)
  • [17] Campolongo, Francesca; Cariboni, Jessica; Saltelli, Andrea: An effective screening design for sensitivity analysis of large models. In: Environmental Modelling & Software 22 (2007), no. 10, pp. 1509–1518
  • [18] Pianosi, Francesca; Wagener, Thorsten: A simple and efficient method for global sensitivity analysis based on cumulative distribution functions. In: Environmental Modelling & Software 67 (2015), pp. 1–11
  • [19] Liu, Huibin; Chen, Wei; Sudjianto, Agus: Relative entropy based method for probabilistic sensitivity analysis in engineering design. In: Journal of Mechanical Design 128 (2006), no. 2, pp. 326–336
  • [20] Kala, Zdeněk: Global sensitivity analysis based on entropy: From differential entropy to alternative measures. In: Entropy 23 (2021), no. 6, 778
  • [21] Cover, Thomas M.: Elements of Information Theory. John Wiley & Sons, 1999
  • [22] Krzykacz-Hausmann, Bernard: Epistemic sensitivity analysis based on the concept of entropy. In: Proceedings of SAMO2001 (2001), pp. 31–35
  • [23] Papoulis, Athanasios: Probability, Random Variables and Stochastic Processes. 1984
  • [24] Geiger, Bernhard C.; Kubin, Gernot: On the information loss in memoryless systems: The multivariate case. In: arXiv preprint arXiv:1109.4856 (2011)
  • [25] Costa, Max; Cover, Thomas: On the similarity of the entropy power inequality and the Brunn-Minkowski inequality (Corresp.). In: IEEE Transactions on Information Theory 30 (1984), no. 6, pp. 837–839
  • [26] Fruth, Jana; Roustant, Olivier; Kuhnt, Sonja: Support indices: Measuring the effect of input variables over their supports. In: Reliability Engineering & System Safety 187 (2019), pp. 17–27
  • [27] Iooss, Bertrand; Lemaître, Paul: A review on global sensitivity analysis methods. In: Uncertainty Management in Simulation-Optimization of Complex Systems: Algorithms and Applications (2015), pp. 101–122
  • [28] Lazo, A. V.; Rathie, Pushpa: On the entropy of continuous probability distributions (Corresp.). In: IEEE Transactions on Information Theory 24 (1978), no. 1, pp. 120–122
  • [29] Moharana, Rajesh; Kayal, Suchandan: Properties of Shannon entropy for double truncated random variables and its applications. In: Journal of Statistical Theory and Applications 19 (2020), no. 2, pp. 261–273
  • [30] Moddemeijer, Rudy: On estimation of entropy and mutual information of continuous distributions. In: Signal Processing 16 (1989), no. 3, pp. 233–248

Appendix A Numerical estimation of entropy

Adopting the approach from [30], the $xy$-plane is gridded into equal-size cells ($\Delta x \times \Delta y$) with coordinates $(i, j)$. The probability of observing a sample in cell $(i, j)$ is:

p_{ij} = \iint\limits_{\text{cell}(i,j)} f(x,y)\, dx\, dy \approx f(x_i, y_j)\, \Delta x\, \Delta y \qquad (A.1)

where $(x_i, y_j)$ is the centre of the cell.

Assuming the joint PDF is approximately constant within a cell, the joint entropy can be approximated as:

\begin{split}H(X,Y)&=-\int f(x,y)\ln f(x,y)\, dx\, dy\\ &\approx-\sum f(x_i,y_j)\ln f(x_i,y_j)\,\Delta x\,\Delta y\\ &\approx-\sum p_{ij}\left(\ln p_{ij}-\ln(\Delta x\,\Delta y)\right)\\ &\approx-\sum\left(\frac{k_{ij}}{N}\ln\frac{k_{ij}}{N}\right)+\ln(\Delta x\,\Delta y)\end{split} \qquad (A.2)

where $k_{ij}$ represents the number of samples observed in cell $(i, j)$, and $N$ is the total number of samples.

Similarly, the conditional entropy can be approximated as:

\begin{split}H(Y|X)&=-\int f(x,y)\ln\frac{f(x,y)}{f(x)}\, dx\, dy\\ &\approx-\sum p_{ij}\left(\ln p_{ij}-\ln p_{i}-\ln\Delta y\right)\\ &\approx-\sum\left(\frac{k_{ij}}{N}\ln\frac{k_{ij}}{k_{i}}\right)+\ln\Delta y\end{split} \qquad (A.3)

where $k_i = \sum_j k_{ij}$, and similar expressions can be derived when $\mathbf{X}$ is a vector variable.
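A minimal sketch of these estimators is given below, assuming equal-width cells; the bin counts and the toy Gaussian sample at the end are illustrative choices and not part of the original study.

```python
# A sketch of the cell-counting estimators in Eqs. (A.2) and (A.3),
# using a 2-D histogram with equal-width cells.
import numpy as np

def joint_and_conditional_entropy(x, y, n_x=50, n_y=50):
    """Estimate H(X,Y) and H(Y|X) from samples via cell counts."""
    k, x_edges, y_edges = np.histogram2d(x, y, bins=[n_x, n_y])
    N = k.sum()
    dx = x_edges[1] - x_edges[0]
    dy = y_edges[1] - y_edges[0]

    p_ij = k / N                                  # cell probabilities k_ij / N
    p_i = p_ij.sum(axis=1, keepdims=True)         # marginal cell probabilities k_i / N
    p_i_full = np.broadcast_to(p_i, p_ij.shape)

    nz = p_ij > 0                                 # skip empty cells to avoid log(0)
    H_xy = -(p_ij[nz] * np.log(p_ij[nz])).sum() + np.log(dx * dy)                # Eq. (A.2)
    H_y_given_x = -(p_ij[nz] * np.log(p_ij[nz] / p_i_full[nz])).sum() + np.log(dy)  # Eq. (A.3)
    return H_xy, H_y_given_x

# usage on a toy correlated Gaussian sample
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.8 * x + 0.6 * rng.normal(size=100_000)
print(joint_and_conditional_entropy(x, y))
```

As with any histogram-based estimate, the result depends on the chosen cell size and the number of samples.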