
Rényi Cross-Entropy Measures for Common Distributions
and Processes with Memory

Ferenc Cole Thierrin, Fady Alajaji, Tamás Linder
Department of Mathematics and Statistics
Queen’s University
Kingston, ON K7L 3N6, Canada
Emails: {14fngt, fa, tamas.linder}@queensu.ca
Abstract

Two Rényi-type generalizations of the Shannon cross-entropy, the Rényi cross-entropy and the Natural Rényi cross-entropy, were recently used as loss functions for the improved design of deep learning generative adversarial networks. In this work, we build upon our results in [1] by deriving the Rényi and Natural Rényi differential cross-entropy measures in closed form for a wide class of common continuous distributions belonging to the exponential family and tabulating the results for ease of reference. We also summarise the Rényi-type cross-entropy rates between stationary Gaussian processes and between finite-alphabet time-invariant Markov sources.

Keywords

Rényi information measures, cross-entropy, divergence measures, exponential family distributions, Gaussian processes, Markov sources.


I Introduction

The Rényi entropy [2] of order $\alpha$ of a probability mass function $p$ with finite support $\mathbb{S}$ is defined as

H_{\alpha}(p)=\frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}}p(x)^{\alpha}

for $\alpha>0$, $\alpha\neq 1$. The Rényi entropy generalizes the Shannon entropy,

H(p)=-\sum_{x\in\mathbb{S}}p(x)\ln p(x),

in the sense that $H_{\alpha}(p)\to H(p)$ as $\alpha\to 1$. Several other Rényi-type information measures have been put forward, each obeying the condition that their limit as $\alpha$ goes to one reduces to a Shannon-type information measure. This includes the Rényi divergence (of order $\alpha$) between two discrete distributions $p$ and $q$ with common finite support $\mathbb{S}$, given by

D_{\alpha}(p\|q)=\frac{1}{\alpha-1}\ln\sum_{x\in\mathbb{S}}p(x)^{\alpha}q(x)^{1-\alpha},

which reduces to the familiar Kullback-Leibler divergence,

D_{\text{KL}}(p\|q)=\sum_{x\in\mathbb{S}}p(x)\ln\frac{p(x)}{q(x)},

as $\alpha\to 1$. Note that in some cases [3], there may exist multiple Rényi-type generalisations of the same information measure (particularly for the mutual information).

Many of these definitions admit natural counterparts when the involved distributions have a probability density function (pdf). This gives rise to information measures such as the Rényi differential entropy of a pdf $p$ with support $\mathbb{S}$,

h_{\alpha}(p)=\frac{1}{1-\alpha}\ln\int_{\mathbb{S}}p(x)^{\alpha}\,dx,

and the Rényi (differential) divergence between pdfs $p$ and $q$ with common support $\mathbb{S}$,

D_{\alpha}(p\|q)=\frac{1}{\alpha-1}\ln\int_{\mathbb{S}}p(x)^{\alpha}q(x)^{1-\alpha}\,dx.

The Rényi cross-entropy between distributions $p$ and $q$ is an analogous generalization of the Shannon cross-entropy

H(p;q)=-\sum_{x\in\mathbb{S}}p(x)\ln q(x).

Two definitions for this measure have been recently suggested. In light of the fact that Shannon's cross-entropy satisfies $H(p;q)=D_{\text{KL}}(p\|q)+H(p)$, a natural definition of the Rényi cross-entropy is

\tilde{H}_{\alpha}(p;q)\coloneqq D_{\alpha}(p\|q)+H_{\alpha}(p). (1)

This definition was indeed proposed in [4] in the continuous case, with the differential cross-entropy measure given by

\tilde{h}_{\alpha}(p;q)\coloneqq D_{\alpha}(p\|q)+h_{\alpha}(p). (2)

In contrast, prior to [4], the authors of [5] introduced the Rényi cross-entropy in their study of shifted Rényi measures expressed as the logarithm of weighted generalized power means. Specifically, upon simplifying Definition 6 in [5], their expression for the Rényi cross-entropy between distributions $p$ and $q$ is given by

H_{\alpha}(p;q)\coloneqq\frac{1}{1-\alpha}\ln\sum_{x\in\mathbb{S}}p(x)q(x)^{\alpha-1}. (3)

For the continuous case, (3) can be readily converted to yield the Rényi differential cross-entropy between pdfs $p$ and $q$:

h_{\alpha}(p;q)\coloneqq\frac{1}{1-\alpha}\ln\int_{\mathbb{S}}p(x)q(x)^{\alpha-1}\,dx. (4)

Note that both (1) and (3) reduce to the Shannon cross-entropy $H(p;q)$ as $\alpha\to 1$ [6]. A similar result holds for (2) and (4), where the Shannon differential cross-entropy, $h(p;q)=-\int_{\mathbb{S}}p(x)\ln q(x)\,dx$, is obtained. Also, the Rényi (differential) entropy is recovered in all equations when $p=q$ (almost everywhere). These properties alone make these definitions viable extensions of the Shannon (differential) cross-entropy.
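
To see how (4) recovers $h(p;q)$, note that the logarithm in (4) vanishes at $\alpha=1$, so the limit is an indeterminate $0/0$ form and L'Hôpital's rule applies; a short sketch (assuming differentiation under the integral sign is justified):

\lim_{\alpha\to 1}h_{\alpha}(p;q)=\lim_{\alpha\to 1}\frac{\int_{\mathbb{S}}p(x)q(x)^{\alpha-1}\ln q(x)\,dx}{-\int_{\mathbb{S}}p(x)q(x)^{\alpha-1}\,dx}=-\int_{\mathbb{S}}p(x)\ln q(x)\,dx=h(p;q).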

Finding closed-form expressions for the cross-entropy measure in (2) for continuous distributions is direct, since the Rényi divergence and the Rényi differential entropy have already been calculated for numerous distributions in [7] and [8], respectively. However, deriving the measure in (4) is more involved. We hereafter refer to the measures $\tilde{H}_{\alpha}(p;q)$ in (1) and $\tilde{h}_{\alpha}(p;q)$ in (2) as the Natural Rényi cross-entropy and the Natural Rényi differential cross-entropy, respectively, while we plainly call the measures $H_{\alpha}(p;q)$ in (3) and $h_{\alpha}(p;q)$ in (4) the Rényi cross-entropy and the Rényi differential cross-entropy, respectively.

In a recent conference paper [1], we showed how to calculate the Rényi differential cross-entropy $h_{\alpha}(p;q)$ between distributions of the same type from the exponential family. Building upon the results shown there, the purpose of this paper is to derive in closed form the expression of $h_{\alpha}(p;q)$ for thirteen commonly used univariate distributions from the exponential family, as well as for multivariate Gaussians, and to tabulate the results for ease of reference. We also analytically derive the Natural Rényi differential cross-entropy $\tilde{h}_{\alpha}(p;q)$ for the same set of distributions. Finally, we present tables summarising the Rényi and Natural Rényi (differential) cross-entropy rate measures, along with their Shannon counterparts, for two important classes of sources with memory, namely stationary Gaussian sources and finite-state time-invariant Markov sources.

Motivation for determining formulae for the Rényi cross-entropy originates from the use of the Shannon differential cross-entropy as a loss function for the design of deep learning generative adversarial networks (GANs) in [9]. The parameter $\alpha$, ubiquitous to all Rényi information measures, allows one to fine-tune the loss function to improve the quality of the GAN-generated output. This can be seen in [10, 6] and [4], which used the Rényi differential cross-entropy and the Natural Rényi differential cross-entropy measures, respectively, to generalize the original GAN loss function (which is recovered as $\alpha\to 1$), resulting in both improved GAN system stability and performance for multiple image datasets. It is also shown in [10, 6] that the introduced Rényi-centric generalized loss function preserves the equilibrium point satisfied by the original GAN via the so-called Jensen-Rényi divergence [11], a natural extension of the Jensen-Shannon divergence [12] upon which the equilibrium result of [9] is established. Other GAN systems that utilize different generalized loss functions were recently developed and analyzed in [13] and [14, 15] (see also the references therein for prior work).

The rest of this paper is organised as follows. In Section II, the formulae for the Rényi differential cross-entropy and the Natural Rényi differential cross-entropy for distributions from the exponential family are given. In Section III, these calculations are systematically carried out for fourteen pairs of distributions of the same type within the exponential family and the results are presented in two tables. The Rényi and Natural Rényi differential cross-entropy rates are presented in Section IV for stationary Gaussian sources; furthermore, the Rényi and Natural Rényi cross-entropy rates are provided in Section V for finite-state time-invariant Markov sources. Finally, the paper is concluded in Section VI.

II Rényi and Natural Rényi Differential Cross-Entropies for Distributions from the Exponential Family

An exponential family is a class of probability distributions over a support $\mathbb{S}\subseteq\mathbb{R}^{n}$ defined by a parameter space $\Theta\subseteq\mathbb{R}^{m}$ and functions $b:\mathbb{S}\to\mathbb{R}$, $c:\Theta\to\mathbb{R}$, $T:\mathbb{S}\to\mathbb{R}^{m}$, and $\eta:\Theta\to\mathbb{R}^{m}$ such that the pdfs in this family have the form

f(x)=c(\theta)b(x)\exp\left(\langle\eta(\theta),T(x)\rangle\right),\qquad x\in\mathbb{S}, (5)

where $\langle\cdot,\cdot\rangle$ denotes the standard inner product in $\mathbb{R}^{m}$. Alternatively, using the (natural) parameter $\eta=\eta(\theta)$, the pdf can also be written as

f(x)=b(x)\exp\left(\langle\eta,T(x)\rangle+A(\eta)\right), (6)

where $A:\eta(\Theta)\to\mathbb{R}$ with $A(\eta)=\ln c(\theta)$. Examples of important pdfs from the exponential family that we consider are included in Appendix A.

In [1], the Rényi differential cross-entropy between pdfs $f_{1}$ and $f_{2}$ of the same type from the exponential family was proven to be

h_{\alpha}(f_{1};f_{2})=\frac{A(\eta_{1})-A(\eta_{h})+\ln E_{h}}{1-\alpha}-A(\eta_{2}), (7)

where

E_{h}=\mathbb{E}_{f_{h}}\left[b(X)^{\alpha-1}\right]=\int b(x)^{\alpha-1}f_{h}(x)\,dx.

Here $f_{h}$ refers to a distribution of the same type as $f_{1}$ and $f_{2}$ within the exponential family with natural parameter

\eta_{h}\coloneqq\eta_{1}+(\alpha-1)\eta_{2}.

It can also be shown that the Natural Rényi differential cross-entropy between $f_{1}$ and $f_{2}$ is given by

\tilde{h}_{\alpha}(f_{1};f_{2})=\frac{A(\eta_{\alpha})-A(\alpha\eta_{1})+\ln E_{\alpha}}{1-\alpha}-A(\eta_{2}), (8)

where

\eta_{\alpha}=\alpha\eta_{1}+(1-\alpha)\eta_{2}

and

E_{\alpha}=\mathbb{E}_{f_{\alpha 1}}\left[b(X)^{\alpha-1}\right]=\int b(x)^{\alpha-1}f_{\alpha 1}(x)\,dx,

with $f_{\alpha 1}$ referring to a distribution of the same type as $f_{1}$ and $f_{2}$ within the exponential family with natural parameter $\alpha\eta_{1}$.
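
As an illustration of (7), consider two exponential pdfs $f_{i}(x)=\lambda_{i}e^{-\lambda_{i}x}$, for which $b(x)=1$ (so $E_{h}=1$), $\eta_{i}=-\lambda_{i}$ and $A(\eta_{i})=\ln\lambda_{i}$; (7) then yields $h_{\alpha}(f_{1};f_{2})=\frac{1}{1-\alpha}\ln\frac{\lambda_{1}}{\lambda_{h}}-\ln\lambda_{2}$ with $\lambda_{h}=\lambda_{1}+(\alpha-1)\lambda_{2}$, which is the Exponential entry of Table I. The check can also be carried out numerically against definition (4); a minimal Python sketch (the function names and parameter values below are our own illustrative choices, not from [1]):

import numpy as np
from scipy import integrate

def h_alpha_exp_closed_form(lam1, lam2, alpha):
    # Closed form via (7): b(x) = 1 gives E_h = 1, and A(eta) = ln(lambda).
    lam_h = lam1 + (alpha - 1.0) * lam2  # must be positive
    return np.log(lam1 / lam_h) / (1.0 - alpha) - np.log(lam2)

def h_alpha_exp_numeric(lam1, lam2, alpha):
    # Direct numerical evaluation of definition (4).
    integrand = lambda x: lam1 * np.exp(-lam1 * x) * (lam2 * np.exp(-lam2 * x)) ** (alpha - 1.0)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return np.log(val) / (1.0 - alpha)

# The two evaluations agree up to quadrature error:
print(h_alpha_exp_closed_form(2.0, 3.0, 0.5))
print(h_alpha_exp_numeric(2.0, 3.0, 0.5))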

III Tables of Rényi and Natural Rényi Differential Cross-Entropies

Tables I and II list the Rényi and Natural Rényi differential cross-entropy expressions, respectively, between common distributions of the same type from the exponential family (which we describe in Appendix A for convenience). The closed-form expressions were derived using (7) and (8), respectively. In the tables, the subscript $i$ is used to denote that a parameter belongs to pdf $f_{i}$, $i=1,2$.


TABLE I: Rényi Differential Cross-Entropies (each entry gives $h_{\alpha}(f_{1};f_{2})$)
Beta: \ln B(a_{2},b_{2})+\frac{1}{\alpha-1}\ln\frac{B(a_{h},b_{h})}{B(a_{1},b_{1})}
  where $a_{h}\coloneqq a_{1}+(\alpha-1)(a_{2}-1)$ ($a_{h}>0$) and $b_{h}\coloneqq b_{1}+(\alpha-1)(b_{2}-1)$ ($b_{h}>0$)
$\chi$ (scaled): \frac{1}{2}\left(k_{2}\ln\sigma_{2}^{2}\sigma_{h}^{2}-\ln 2\sigma_{h}^{2}\right)+\ln\Gamma\left(\frac{k_{2}}{2}\right)+\frac{1}{\alpha-1}\left(\ln\Gamma\left(\frac{k_{h}}{2}\right)-\ln\Gamma\left(\frac{k_{1}}{2}\right)-\frac{k_{1}}{2}\ln\sigma_{1}^{2}\sigma_{h}^{2}\right)
  where $\sigma^{2}_{h}\coloneqq\frac{1}{\sigma_{1}^{2}}+\frac{\alpha-1}{\sigma_{2}^{2}}$ ($\sigma^{2}_{h}>0$) and $k_{h}\coloneqq k_{1}+(\alpha-1)(k_{2}-1)$ ($k_{h}>0$)
$\chi$ (non-scaled): \frac{1}{2}\left(k_{2}\ln\alpha-\ln 2\alpha\right)+\ln\Gamma\left(\frac{k_{2}}{2}\right)+\frac{1}{\alpha-1}\left(\ln\Gamma\left(\frac{k_{h}}{2}\right)-\ln\Gamma\left(\frac{k_{1}}{2}\right)-\frac{k_{1}}{2}\ln\alpha\right)
  where $k_{h}\coloneqq k_{1}+(\alpha-1)(k_{2}-1)$ ($k_{h}>0$)
$\chi^{2}$: \frac{1}{1-\alpha}\left(\frac{\nu_{1}}{2}\ln\alpha-\ln\Gamma\left(\frac{\nu_{1}}{2}\right)+\ln\Gamma\left(\frac{\nu_{h}}{2}\right)\right)+\frac{2-\nu_{2}}{2}\ln\alpha+\ln 2\Gamma\left(\frac{\nu_{2}}{2}\right)
  where $\nu_{h}\coloneqq\nu_{1}+(\alpha-1)(\nu_{2}-2)$ ($\nu_{h}>0$)
Exponential: \frac{1}{1-\alpha}\ln\frac{\lambda_{1}}{\lambda_{h}}-\ln\lambda_{2}
  where $\lambda_{h}\coloneqq\lambda_{1}+(\alpha-1)\lambda_{2}$ ($\lambda_{h}>0$)
Gamma: \ln\Gamma(k_{2})+k_{2}\ln\theta_{2}+\frac{1}{1-\alpha}\left(\ln\frac{\Gamma(k_{h})}{\Gamma(k_{1})}-k_{h}\ln\theta_{h}-k_{1}\ln\theta_{1}\right)
  where $\theta_{h}\coloneqq\frac{\theta_{1}+(\alpha-1)\theta_{2}}{(\alpha-1)\theta_{1}\theta_{2}}$ ($\theta_{h}>0$) and $k_{h}\coloneqq k_{1}+(\alpha-1)k_{2}$ ($k_{h}>0$)
Gaussian (univariate): \frac{1}{2}\left(\ln(2\pi\sigma_{2}^{2})+\frac{1}{1-\alpha}\ln\left(\frac{\sigma_{2}^{2}}{(\sigma^{2})_{h}}\right)+\frac{(\mu_{1}-\mu_{2})^{2}}{(\sigma^{2})_{h}}\right)
  where $(\sigma^{2})_{h}\coloneqq\sigma_{2}^{2}+(\alpha-1)\sigma_{1}^{2}$ ($(\sigma^{2})_{h}>0$)
Gaussian (multivariate): \frac{1}{2-2\alpha}\left(-\ln|A||\Sigma_{1}|+(1-\alpha)\ln(2\pi)^{n}|\Sigma_{2}|-d\right)
  where $A\coloneqq\Sigma_{1}^{-1}+(\alpha-1)\Sigma_{2}^{-1}$ ($A\succ 0$) and
  d\coloneqq\boldsymbol{\mu}_{1}^{T}\Sigma_{1}^{-1}\boldsymbol{\mu}_{1}+(\alpha-1)\boldsymbol{\mu}_{2}^{T}\Sigma_{2}^{-1}\boldsymbol{\mu}_{2}-\left(\boldsymbol{\mu}_{1}^{T}\Sigma_{1}^{-1}+(\alpha-1)\boldsymbol{\mu}_{2}^{T}\Sigma_{2}^{-1}\right)A^{-1}\left(\Sigma_{1}^{-1}\boldsymbol{\mu}_{1}+(\alpha-1)\Sigma_{2}^{-1}\boldsymbol{\mu}_{2}\right)
Gumbel ($\beta_{1}=\beta_{2}=\beta$): \frac{1}{1-\alpha}\left(\ln\frac{\Gamma(2-\alpha)}{\beta}-\frac{\mu_{1}}{\beta}-\alpha\ln\eta_{h}\right)+\frac{\mu_{2}}{\beta}
  where $\eta_{h}\coloneqq e^{-\mu_{1}/\beta}+(\alpha-1)e^{-\mu_{2}/\beta}$ ($\eta_{h}>0$)
Half-Normal: \frac{1}{2}\left(\ln\frac{\pi\sigma_{2}^{2}}{2}+\frac{1}{1-\alpha}\ln\left(\frac{\sigma_{2}^{2}}{(\sigma^{2})_{h}}\right)\right)
  where $(\sigma^{2})_{h}\coloneqq\sigma_{2}^{2}+(\alpha-1)\sigma_{1}^{2}$ ($(\sigma^{2})_{h}>0$)
Laplace ($\mu_{1}=\mu_{2}=0$): \ln(2b_{2})+\frac{1}{1-\alpha}\ln\left(\frac{b_{2}}{2b_{h}}\right)
  where $b_{h}\coloneqq b_{2}+(1-\alpha)b_{1}$ ($b_{h}>0$)
Maxwell-Boltzmann: \frac{1}{2}\left(\ln 2\pi+3\ln\sigma_{2}^{2}\right)+\ln\sigma_{h}^{2}+\frac{1}{1-\alpha}\left(\ln\frac{\Gamma(2\alpha)}{\Gamma(\alpha)}-\frac{3}{2}\ln\sigma_{1}^{2}\sigma_{h}^{2}\right)
  where $\sigma^{2}_{h}\coloneqq\sigma_{1}^{-2}+(\alpha-1)\sigma_{2}^{-2}$ ($\sigma^{2}_{h}>0$)
Pareto ($m_{1}=m_{2}=m$): -\ln m-\ln\lambda_{2}+\frac{1}{1-\alpha}\ln\frac{\lambda_{1}}{\lambda_{h}}
  where $\lambda_{h}\coloneqq\lambda_{1}+(\alpha-1)(\lambda_{2}+1)$ ($\lambda_{h}>0$)
Rayleigh: \frac{\ln\sigma_{1}^{2}-\alpha\ln\sigma_{h}^{2}+\ln\Gamma\left(\frac{1-\alpha}{2}\right)}{1-\alpha}+\ln 2\sigma_{2}^{2}
  where $\sigma_{h}^{2}\coloneqq\sigma_{1}^{-2}+(\alpha-1)\sigma_{2}^{-2}$ ($\sigma^{2}_{h}>0$)
TABLE II: Natural Rényi Differential Cross-Entropies (each entry gives $\tilde{h}_{\alpha}(f_{1};f_{2})$)

Beta: \ln B(a_{2},b_{2})+\frac{1}{\alpha-1}\ln\frac{B(a_{\alpha},b_{\alpha})}{B\left(\alpha(a_{1}-1)+1,\,\alpha(b_{1}-1)+1\right)}
  where $a_{\alpha}\coloneqq\alpha a_{1}+(1-\alpha)a_{2}$ ($a_{\alpha}>0$) and $b_{\alpha}\coloneqq\alpha b_{1}+(1-\alpha)b_{2}$ ($b_{\alpha}>0$)
$\chi$ (scaled): \frac{1}{2}\left(-\ln\frac{2\sigma_{1}^{2}}{\alpha}+k_{2}\ln\sigma_{2}^{2}\sigma^{2}_{\alpha}\right)+\ln\Gamma\left(\frac{k_{2}}{2}\right)+\frac{1}{1-\alpha}\left(\frac{\alpha k_{1}}{2}\ln\frac{\sigma^{2}_{\alpha}\sigma_{1}^{2}}{\alpha}-\ln\Gamma\left(\frac{k_{\alpha}}{2}\right)+\ln\Gamma\left(\frac{\alpha(k_{1}-1)+1}{2}\right)\right)
  where $\sigma^{2}_{\alpha}\coloneqq\frac{\alpha}{\sigma_{1}^{2}}+\frac{1-\alpha}{\sigma_{2}^{2}}$ ($\sigma^{2}_{\alpha}>0$) and $k_{\alpha}\coloneqq\alpha k_{1}+(1-\alpha)k_{2}$ ($k_{\alpha}>0$)
$\chi$ (non-scaled): \frac{-\ln 2\alpha}{2}+\ln\Gamma\left(\frac{k_{2}}{2}\right)+\frac{1}{1-\alpha}\left(-\ln\Gamma\left(\frac{k_{\alpha}}{2}\right)-\frac{\alpha k_{1}\ln\alpha}{2}+\ln\Gamma\left(\frac{\alpha(k_{1}-1)+1}{2}\right)\right)
  where $k_{\alpha}\coloneqq\alpha k_{1}+(1-\alpha)k_{2}$ ($k_{\alpha}>0$)
$\chi^{2}$: \frac{1}{1-\alpha}\left(-\ln\Gamma\left(\frac{\nu_{\alpha}}{2}\right)+\alpha\ln\Gamma\left(\frac{\nu_{1}}{2}\right)\right)+\ln\Gamma\left(\frac{\nu_{2}}{2}\right)
  where $\nu_{\alpha}\coloneqq\alpha\nu_{1}+(1-\alpha)\nu_{2}$ ($\nu_{\alpha}>0$)
Exponential: \frac{1}{1-\alpha}\ln\frac{\lambda_{1}}{\alpha\lambda_{\alpha}}-\ln\lambda_{2}
  where $\lambda_{\alpha}\coloneqq\alpha\lambda_{1}+(1-\alpha)\lambda_{2}$ ($\lambda_{\alpha}>0$)
Gamma: \ln\Gamma(k_{2})+k_{2}\ln\theta_{2}+\frac{1}{1-\alpha}\left(\ln\frac{\Gamma(k_{1})}{\Gamma(k_{\alpha})}-k_{\alpha}\ln\theta_{\alpha}-\alpha^{2}k_{1}\ln\theta_{1}\right)
  where $\theta_{\alpha}\coloneqq\alpha\theta_{1}^{-1}+(1-\alpha)\theta_{2}^{-1}$ ($\theta_{\alpha}>0$) and $k_{\alpha}\coloneqq\alpha k_{1}+(1-\alpha)k_{2}$
Gaussian (univariate): \frac{1}{2}\left(\ln(2\pi\sigma_{2}^{2})+\frac{(\mu_{1}-\mu_{2})^{2}}{(\sigma^{2})_{\alpha}}+\frac{1}{1-\alpha}\ln\left(\frac{\alpha\sigma_{2}^{2}}{(\sigma^{2})_{\alpha}}\right)\right)
  where $(\sigma^{2})_{\alpha}\coloneqq\alpha\sigma_{2}^{2}+(1-\alpha)\sigma_{1}^{2}$ ($(\sigma^{2})_{\alpha}>0$)
Gaussian (multivariate): \frac{1}{2-2\alpha}\left(-\ln|\alpha|+\ln|A||\Sigma_{1}|+d\right)+\frac{1}{2}\ln\frac{(2\pi)^{n}|\Sigma_{1}|^{2}}{|\Sigma_{2}|}
  where $A\coloneqq\alpha\Sigma_{1}^{-1}+(1-\alpha)\Sigma_{2}^{-1}$ ($A\succ 0$) and $d\coloneqq(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T}\Sigma_{1}A\Sigma_{2}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})$
Gumbel ($\beta_{1}=\beta_{2}=\beta$): \frac{\mu_{2}+\alpha\mu_{1}}{\beta}+\frac{1}{1-\alpha}\left(\ln\frac{\Gamma(2-\alpha)\eta_{\alpha}}{\alpha\beta}+\frac{\mu_{1}}{\beta}\right)
  where $\eta_{\alpha}\coloneqq\alpha e^{-\mu_{1}/\beta}+(1-\alpha)e^{-\mu_{2}/\beta}$ ($\eta_{\alpha}>0$)
Half-Normal: \frac{1}{2}\left(\ln\frac{\pi\sigma_{2}^{2}}{2}+\frac{1}{1-\alpha}\ln\left(\frac{\alpha\sigma_{2}^{2}}{(\sigma^{2})_{\alpha}}\right)\right)
  where $(\sigma^{2})_{\alpha}\coloneqq\alpha\sigma_{2}^{2}+(1-\alpha)\sigma_{1}^{2}$ ($(\sigma^{2})_{\alpha}>0$)
Laplace ($\mu_{1}=\mu_{2}=0$): \frac{\ln b_{\alpha}+\ln\alpha b_{1}}{1-\alpha}+\ln 2b_{2}
  where $b_{\alpha}\coloneqq\frac{\alpha}{b_{1}}+\frac{1-\alpha}{b_{2}}$ ($b_{\alpha}>0$)
Maxwell-Boltzmann: \frac{-\ln 2+3\ln\sigma_{2}^{2}}{2}+\ln\frac{\alpha}{\sigma_{1}^{2}}+\frac{1}{1-\alpha}\left(\frac{3}{2}\ln\frac{\sigma^{2}_{\alpha}\sigma_{1}^{2}}{\alpha}-\alpha\ln\frac{\sqrt{\pi}}{2}+\ln\Gamma\left(\alpha+\frac{1}{2}\right)\right)
  where $\sigma^{2}_{\alpha}\coloneqq\frac{\alpha}{\sigma_{1}^{2}}+\frac{1-\alpha}{\sigma_{2}^{2}}$ ($\sigma^{2}_{\alpha}>0$)
Pareto ($m_{1}=m_{2}=m$): \frac{1}{1-\alpha}\left(\ln\lambda_{\alpha}-\ln\left(1-\alpha(\lambda_{1}-1)\right)\right)-\ln\lambda_{2}m
  where $\lambda_{\alpha}\coloneqq\alpha\lambda_{1}+(1-\alpha)\lambda_{2}$ ($\lambda_{\alpha}>0$)
Rayleigh: \frac{\ln\sigma_{1}^{2}(\sigma^{2})_{\alpha}+\ln\alpha+\ln\Gamma\left(\frac{1-\alpha}{2}\right)}{1-\alpha}+\ln 2\sigma_{1}^{2}
  where $(\sigma^{2})_{\alpha}\coloneqq\alpha\sigma_{1}^{-2}+(1-\alpha)\sigma_{2}^{-2}$ ($(\sigma^{2})_{\alpha}>0$)
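
As a sanity check on the tabulated expressions, any entry can be compared against a direct numerical evaluation of (4); a minimal Python sketch for the univariate Gaussian entry of Table I (the helper names and parameter values are our own illustrative choices):

import numpy as np
from scipy import integrate
from scipy.stats import norm

def h_alpha_gauss_table(mu1, v1, mu2, v2, alpha):
    # Univariate Gaussian entry of Table I; requires (sigma^2)_h > 0.
    vh = v2 + (alpha - 1.0) * v1
    return 0.5 * (np.log(2.0 * np.pi * v2)
                  + np.log(v2 / vh) / (1.0 - alpha)
                  + (mu1 - mu2) ** 2 / vh)

def h_alpha_gauss_numeric(mu1, v1, mu2, v2, alpha):
    # Direct numerical evaluation of definition (4).
    p = lambda x: norm.pdf(x, mu1, np.sqrt(v1))
    q = lambda x: norm.pdf(x, mu2, np.sqrt(v2))
    val, _ = integrate.quad(lambda x: p(x) * q(x) ** (alpha - 1.0), -np.inf, np.inf)
    return np.log(val) / (1.0 - alpha)

print(h_alpha_gauss_table(0.0, 1.0, 1.0, 2.0, 1.5))    # closed form
print(h_alpha_gauss_numeric(0.0, 1.0, 1.0, 2.0, 1.5))  # numerical check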

IV Rényi and Natural Rényi Differential Cross-Entropy Rates for Stationary Gaussian Processes

In [1], the Rényi differential cross-entropy rate for stationary zero-mean Gaussian processes was derived. This, along with the Shannon and Natural Rényi differential cross-entropy rates, is summarised in Table III. Here, $f(\lambda)$ is the spectral density of the first zero-mean Gaussian process, $g(\lambda)$ is the spectral density of the second,

h(\lambda)=f(\lambda)+(\alpha-1)g(\lambda),

and

j(\lambda)=\alpha f(\lambda)+(1-\alpha)g(\lambda).
TABLE III: Differential Cross-Entropy Rates for Stationary Zero-Mean Gaussian Sources

Shannon differential cross-entropy rate (requires $g(\lambda)>0$):
  \frac{1}{2}\ln 2\pi+\frac{1}{4\pi}\int_{0}^{2\pi}\left[\ln g(\lambda)+\frac{f(\lambda)}{g(\lambda)}\right]d\lambda

Natural Rényi differential cross-entropy rate (requires $j(\lambda)/g(\lambda)>0$):
  \frac{1}{2}\ln 4\pi^{2}\alpha^{\frac{1}{\alpha-1}}+\frac{1}{4\pi(1-\alpha)}\int_{0}^{2\pi}\ln\frac{j(\lambda)}{g(\lambda)^{\alpha}}\,d\lambda

Rényi differential cross-entropy rate (requires $g(\lambda)/h(\lambda)>0$):
  \frac{\ln 2\pi}{2}+\frac{1}{4\pi(1-\alpha)}\int_{0}^{2\pi}\left[(2-\alpha)\ln g(\lambda)-\ln h(\lambda)\right]d\lambda
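
Once the two spectral densities are specified, the rates in Table III can be evaluated numerically. A minimal Python sketch for the Rényi differential cross-entropy rate, using a hypothetical pair of AR(1)-type spectral densities (the densities and parameter values are illustrative assumptions, not taken from [1]):

import numpy as np
from scipy import integrate

# Hypothetical AR(1)-type spectral densities of the two zero-mean Gaussian processes.
f = lambda lam: 1.0 / (1.0 - 2.0 * 0.5 * np.cos(lam) + 0.5 ** 2)  # first process
g = lambda lam: 1.0 / (1.0 - 2.0 * 0.2 * np.cos(lam) + 0.2 ** 2)  # second process

def renyi_diff_cross_entropy_rate(alpha):
    # Rényi differential cross-entropy rate from Table III; requires
    # h(lam) = f(lam) + (alpha - 1) * g(lam) > 0 on [0, 2*pi].
    h = lambda lam: f(lam) + (alpha - 1.0) * g(lam)
    integrand = lambda lam: (2.0 - alpha) * np.log(g(lam)) - np.log(h(lam))
    val, _ = integrate.quad(integrand, 0.0, 2.0 * np.pi)
    return 0.5 * np.log(2.0 * np.pi) + val / (4.0 * np.pi * (1.0 - alpha))

print(renyi_diff_cross_entropy_rate(2.0))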

V Rényi and Natural Rényi Cross-Entropy Rates for Markov Sources

In [1], the Rényi cross-entropy rate between finite-state time-invariant Markov sources was established, using, as in [16], tools from the theory of non-negative matrices and Perron-Frobenius theory (e.g., cf. [17, 18]). This, alongside the Shannon and Natural Rényi cross-entropy rates, is derived and summarised in Table IV. Here, $P$ and $Q$ are the $m\times m$ (stochastic) transition matrices associated with the first and second Markov sources, respectively, where both sources have a common alphabet of size $m$. To allow any value of the Rényi parameter $\alpha$ in $(0,1)\cup(1,\infty)$, we assume that the transition matrix $Q$ of the second Markov chain has positive entries ($Q>0$); however, the transition matrix $P$ of the first Markov chain is taken to be an arbitrary stochastic matrix. For simplicity, we assume that the initial distribution vectors, $p$ and $q$, of both Markov chains also have positive entries ($p>0$ and $q>0$); this condition can be relaxed via the approach used to prove [16, Theorem 1]. Moreover, $\pi_{p}^{T}$ denotes the stationary probability row vector associated with the first Markov chain and $\mathbf{1}$ is an $m$-dimensional column vector whose elements all equal one. Furthermore, $\odot$ denotes element-wise multiplication (i.e., the Hadamard product operation) and $\dot{\ln}$ is the element-wise natural logarithm.

Finally, the definition of $\lambda(R):\mathbb{R}^{m\times m}\to\mathbb{R}$ for a matrix $R$ is more involved. If $R$ is irreducible, $\lambda(R)$ is its largest positive eigenvalue. Otherwise, rewriting $R$ in its canonical form as detailed in [16, Proposition 1], we have that $\lambda(R)=\max(\lambda^{*},\lambda_{*})$, where $\lambda^{*}$ is the maximum of all largest positive eigenvalues of (irreducible) sub-matrices of $R$ corresponding to self-communicating classes, and $\lambda_{*}$ is the maximum of all largest positive eigenvalues of sub-matrices of $R$ corresponding to classes reachable from an inessential class.

TABLE IV: Cross-Entropy Rates for Time-Invariant Markov Sources

Shannon cross-entropy rate:
  -\pi_{p}^{T}\left(P\odot\dot{\ln}Q\right)\mathbf{1}

Natural Rényi cross-entropy rate:
  \frac{1}{\alpha-1}\ln\frac{\lambda(P^{\alpha}\odot Q^{1-\alpha})}{\lambda(P^{\alpha})}

Rényi cross-entropy rate:
  \frac{1}{1-\alpha}\ln\lambda(P\odot Q^{\alpha-1})
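
As an illustration of Table IV, when $P$ is irreducible and $Q>0$, the matrices $P\odot Q^{\alpha-1}$ and $P^{\alpha}\odot Q^{1-\alpha}$ are non-negative and irreducible, so $\lambda(\cdot)$ reduces to the Perron (largest positive) eigenvalue; a minimal Python sketch with hypothetical two-state chains (the matrices below are our own illustrative choices):

import numpy as np

# Hypothetical transition matrices: P an arbitrary stochastic matrix, Q with positive entries.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
Q = np.array([[0.6, 0.4],
              [0.5, 0.5]])

def perron_eigenvalue(R):
    # Largest positive eigenvalue of a non-negative irreducible matrix.
    return np.max(np.linalg.eigvals(R).real)

def renyi_rate(P, Q, alpha):
    # Rényi cross-entropy rate from Table IV: (1/(1-alpha)) ln lambda(P ⊙ Q^(alpha-1)),
    # where * and ** act element-wise (Hadamard product and Hadamard power).
    return np.log(perron_eigenvalue(P * Q ** (alpha - 1.0))) / (1.0 - alpha)

def natural_renyi_rate(P, Q, alpha):
    # Natural Rényi cross-entropy rate from Table IV.
    num = perron_eigenvalue(P ** alpha * Q ** (1.0 - alpha))
    den = perron_eigenvalue(P ** alpha)
    return np.log(num / den) / (alpha - 1.0)

print(renyi_rate(P, Q, 2.0))
print(natural_renyi_rate(P, Q, 2.0))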

VI Conclusion

We have derived closed-form formulae for the Rényi and Natural Rényi differential cross-entropies of commonly used distributions from the exponential family. This is of potential use in further studies in information theory and machine learning, particularly in problems where deep neural networks, trained according to a Shannon cross-entropy loss function, can be improved via generalized Rényi-type loss functions by virtue of the extra degree of freedom provided by the Rényi parameter $\alpha$. In addition, we have provided formulae for the Rényi and Natural Rényi differential cross-entropy rates for stationary zero-mean Gaussian processes and expressions for the cross-entropy rates for Markov sources. Further work includes expanding the present collection by considering distributions such as the Lévy or Weibull distributions and investigating cross-entropy measures based on the $f$-divergence [19, 20, 21], starting with Arimoto's divergence [22].

Acknowledgements

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Appendix A: Distributions Listed in Tables I and II

Each entry below gives the pdf $f(x)$, the parameter space $\Theta$, and the support $\mathbb{S}$.

Beta: \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}  ($a>0$, $b>0$);  $\mathbb{S}=(0,1)$

$\chi$ (scaled): \frac{2^{1-k/2}x^{k-1}e^{-x^{2}/2\sigma^{2}}}{\sigma^{k}\Gamma\left(\frac{k}{2}\right)}  ($k>0$, $\sigma>0$);  $\mathbb{S}=\mathbb{R}^{+}$

$\chi$ (non-scaled): \frac{2^{1-k/2}x^{k-1}e^{-x^{2}/2}}{\Gamma\left(\frac{k}{2}\right)}  ($k>0$);  $\mathbb{S}=\mathbb{R}^{+}$

$\chi^{2}$: \frac{1}{2^{\nu/2}\Gamma\left(\frac{\nu}{2}\right)}x^{\frac{\nu}{2}-1}e^{-\frac{x}{2}}  ($\nu>0$);  $\mathbb{S}=\mathbb{R}^{+}$

Exponential: \lambda e^{-\lambda x}  ($\lambda>0$);  $\mathbb{S}=\mathbb{R}^{+}$

Gamma: \frac{1}{\theta^{k}\Gamma(k)}x^{k-1}e^{-x/\theta}  ($k>0$, $\theta>0$);  $\mathbb{S}=\mathbb{R}^{+}$

Gaussian (univariate): \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}  ($\mu\in\mathbb{R}$, $\sigma^{2}>0$);  $\mathbb{S}=\mathbb{R}$

Gaussian (multivariate): \frac{1}{\sqrt{(2\pi)^{n}|\Sigma|}}e^{-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})}  ($\boldsymbol{\mu}\in\mathbb{R}^{n}$, $\Sigma\in\mathbb{R}^{n\times n}$, $\Sigma\succ 0$);  $\mathbb{S}=\mathbb{R}^{n}$

Half-Normal: \sqrt{\frac{2}{\pi\sigma^{2}}}e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^{2}}  ($\sigma^{2}>0$);  $\mathbb{S}=\mathbb{R}^{+}$

Gumbel: \frac{1}{\beta}\exp\left(-\left(\frac{x-\mu}{\beta}+e^{-\frac{x-\mu}{\beta}}\right)\right)  ($\mu\in\mathbb{R}$, $\beta>0$);  $\mathbb{S}=\mathbb{R}$

Pareto: am^{a}x^{-(a+1)}  ($m>0$, $a>0$);  $\mathbb{S}=(m,\infty)$

Maxwell-Boltzmann: \sqrt{\frac{2}{\pi}}\frac{x^{2}}{\sigma^{3}}e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^{2}}  ($\sigma>0$);  $\mathbb{S}=\mathbb{R}^{+}$

Rayleigh: \frac{x}{\sigma^{2}}e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^{2}}  ($\sigma^{2}>0$);  $\mathbb{S}=\mathbb{R}^{+}$

Laplace: \frac{1}{2b}e^{-\frac{|x-\mu|}{b}}  ($\mu\in\mathbb{R}$, $b>0$);  $\mathbb{S}=\mathbb{R}$

Notes

  • $B(a,b)=\int_{0}^{1}t^{a-1}(1-t)^{b-1}\,dt$ is the Beta function.

  • $\Gamma(z)=\int_{0}^{\infty}x^{z-1}e^{-x}\,dx$ is the Gamma function.

References

  • [1] F. C. Thierrin, F. Alajaji, and T. Linder, “On the Rényi cross-entropy,” in Proceedings of the 17th Canadian Workshop on Information Theory (see also arXiv e-prints, arXiv:2206.14329), 2022, pp. 1–5.
  • [2] A. Rényi, “On measures of entropy and information,” in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1961, pp. 547–561.
  • [3] S. Verdú, “α\alpha-mutual information,” in Proceedings of the IEEE Information Theory and Applications Workshop, 2015, pp. 1–6.
  • [4] A. Sarraf and Y. Nie, “RGAN: Rényi generative adversarial network,” SN Computer Science, vol. 2, no. 1, p. 17, 2021.
  • [5] F. J. Valverde-Albacete and C. Peláez-Moreno, “The case for shifting the Rényi entropy,” Entropy, vol. 21, pp. 1–21, 2019. [Online]. Available: https://doi.org/10.3390/e21010046
  • [6] H. Bhatia, W. Paul, F. Alajaji, B. Gharesifard, and P. Burlina, “Least kth-order and Rényi generative adversarial networks,” Neural Computation, vol. 33, no. 9, pp. 2473–2510, Aug 2021. [Online]. Available: http://dx.doi.org/10.1162/neco_a_01416
  • [7] M. Gil, F. Alajaji, and T. Linder, “Rényi divergence measures for commonly used univariate continuous distributions,” Information Sciences, vol. 249, pp. 124–131, 2013. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0020025513004441
  • [8] K.-S. Song, “Rényi information, loglikelihood and an intrinsic distribution measure,” Journal of Statistical Planning and Inference, vol. 93, pp. 51–69, 2001.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of Advances in Neural Information Processing Systems, vol. 27, 2014, pp. 2672–2680.
  • [10] H. Bhatia, W. Paul, F. Alajaji, B. Gharesifard, and P. Burlina, “Rényi generative adversarial networks,” arXiv:2006.02479, 2020.
  • [11] P. A. Kluza, “On Jensen-Rényi and Jeffreys-Rényi type ff-divergences induced by convex functions,” Physica A: Statistical Mechanics and its Applications, 2019.
  • [12] J. Lin, “Divergence measures based on the Shannon entropy,” IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145–151, 1991.
  • [13] Y. Pantazis, D. Paul, M. Fasoulakis, Y. Stylianou, and M. Katsoulakis, “Cumulant GAN,” arXiv:2006.06625, 2020.
  • [14] G. R. Kurri, T. Sypherd, and L. Sankar, “Realizing GANs via a tunable loss function,” in Proceedings of the IEEE Information Theory Workshop (ITW), 2021, pp. 1–6.
  • [15] G. R. Kurri, M. Welfert, T. Sypherd, and L. Sankar, “α\alpha-GAN: Convergence and estimation guarantees,” in Proceedings of the IEEE International Symposium on Information Theory (ISIT), 2022, pp. 312–317.
  • [16] Z. Rached, F. Alajaji, and L. L. Campbell, “Rényi’s divergence and entropy rates for finite alphabet Markov sources,” IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1553–1561, 2001.
  • [17] E. Seneta, Non-negative Matrices and Markov Chains.   Springer Science & Business Media, 2006.
  • [18] R. G. Gallager, Discrete Stochastic Processes.   Springer, 1996.
  • [19] I. Csiszár, “Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten,” Publications of the Mathematical Institute of the Hungarian Academy of Sciences, Series A, vol. 8, pp. 85–108, 1963.
  • [20] I. Csiszár, “Information-type measures of difference of probability distributions and indirect observations,” Studia Sci. Math. Hungarica, vol. 2, pp. 299–318, 1967.
  • [21] S. M. Ali and S. D. Silvey, “A general class of coefficients of divergence of one distribution from another,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 28, no. 1, pp. 131–142, 1966. [Online]. Available: http://www.jstor.org/stable/2984279
  • [22] F. Liese and I. Vajda, “On divergences and informations in statistics and information theory,” IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4394–4412, 2006.