
Divergences on Monads for Relational Program Logics

Tetsuya Sato    Shin-ya Katsumata
Abstract

Several relational program logics have been introduced for integrating reasoning about relational properties of programs and measurement of quantitative difference between computational effects. Towards a general framework for such logics, in this paper, we formalize quantitative differences between computational effects as divergences on monads, then develop a relational program logic acRL that supports generic computational effects and divergences on them. To give a categorical semantics of acRL supporting divergences, we present a method to obtain graded strong relational liftings from divergences on monads. We derive two instantiations of acRL for the verification of 1) various forms of differential privacy of higher-order functional probabilistic programs and 2) differences of cost distributions between higher-order functional programs with probabilistic choice and cost-counting operations.

1 Introduction

Comparing the behavior of programs is one of the fundamental activities in the verification and analysis of programs, and many concepts, techniques, and formal systems have been proposed for this purpose, such as product program construction ([11]), relational Hoare logic ([14]), and higher-order relational refinement types ([9]).

Several recent relational program logics integrate compositional reasoning about relational properties of programs and over-approximation of the quantitative difference between computational effects of programs; the latter is done in the style of effect systems ([36]). One successful logic of this kind is Barthe et al.'s approximate probabilistic relational Hoare logic (apRHL for short), designed for verifying differential privacy of probabilistic programs ([12]). A judgment of apRHL is of the form c\sim_{\epsilon,\delta}c^{\prime}:\Phi\Rightarrow\Psi, and its intuitive meaning is that for any state pair (\rho,\rho^{\prime}) related by \Phi, the \epsilon-distance between the two probability distributions of final states [\![c]\!]\rho and [\![c^{\prime}]\!]\rho^{\prime} is bounded by \delta, and the final states satisfy \Psi. Another relational program logic that measures the difference between computational effects of programs is Çiçek et al.'s RelCost ([18]). The target of the reasoning is a higher-order programming language equipped with a cost counting effect. When we derive a judgment \Delta;\Psi;\Gamma\vdash M_{1}\ominus M_{2}\precsim n\colon\Phi in RelCost, the sound semantics ensures that the difference between the costs counted by M_{1} and M_{2} is bounded by n.

A high-level view on these relational program logics is that they integrate the feature of measuring quantitative differences between computational effects into relational program logics. We are interested in extracting the mathematical essence of this design and making such relational program logics versatile. Towards this goal, we make the following contributions.

  • We introduce a structure called divergence on monads for measuring quantitative differences between computational effects (Sections 4 and 5). It generalizes various statistical divergences, such as the Kullback-Leibler divergence and the total variation distance on probability distributions. After exploring examples of divergences on monads, we introduce a method to transfer divergences on one monad to those on another through monad opfunctors.

  • The key structure integrating divergences on monads and relational program logics is a graded strong relational lifting of the monad that extends given divergences. We present a general construction of such liftings from divergences on monads in Section 7. This generality shows that relational program logics with quantitative measurement of computational effects can be developed for various combinations of monads and divergences on them.

  • We introduce a generic relational program logic (called acRL) over Moggi’s computational metalanguage (the simply-typed lambda calculus with monadic types) in Section 8. Inside acRL, we can use graded strong relational liftings constructed from divergences on a monad, and reason about relational properties of programs together with quantitative difference of computational effects. To illustrate how the reasoning works in acRL, we instantiate it with the computational metalanguage having effectful operations for continuous random sampling (Section 9) and cost counting operation (Section 10).

2 Preliminaries

We assume basic knowledge about category theory ([37]) and Moggi's model of computational effects ([42]). The definitions of monads [37, Chapter VI] and Kleisli categories [37, Section VI.5] are omitted.

In this paper, a Cartesian category (CC for short) is specified by a category \mathbb{C} with a designated terminal object 1 and a binary product functor (\times)\colon\mathbb{C}^{2}\to\mathbb{C}. The associated pairing operation and projection morphisms are denoted by \langle-,-\rangle, \pi_{1}, \pi_{2}, respectively. The unique morphism to the terminal object is denoted by !_{I}\colon I\to 1. A Cartesian closed category (CCC for short) is a CC (\mathbb{C},1,(\times)) with a specified exponential functor (\Rightarrow)\colon\mathbb{C}^{\mathrm{op}}\times\mathbb{C}\to\mathbb{C}. The associated evaluation morphism and currying operation are denoted by \mathrm{ev} and \lambda(-), respectively.

Let (\mathbb{C},1,(\times)) be a CC. A global element of I\in\mathbb{C} is a morphism of type 1\rightarrow I. For a category \mathbb{C}, we define the functor U^{\mathbb{C}}\colon\mathbb{C}\to{\bf Set} by U^{\mathbb{C}}=\mathbb{C}(1,-). When \mathbb{C} is obvious, U^{\mathbb{C}} is denoted simply by U. Morphisms in \mathbb{C} act on global elements by composition. To emphasize this action, we introduce a dedicated notation (\mathbin{\bullet}) of type \mathbb{C}(I,J)\times UI\to UJ; of course, f\mathbin{\bullet}x\triangleq f\circ x=(Uf)(x). We also define the partial application of a binary morphism f\colon I\times J\rightarrow K to a global element i\in UI by f_{i}\triangleq f\circ\langle i\circ{!_{J}},\operatorname{id}_{J}\rangle\colon J\to K. When \mathbb{C} is a CCC, there is an evident isomorphism \lfloor{-}\rfloor\colon U(I\Rightarrow J)\cong\mathbb{C}(I,J). We write \lceil{-}\rceil for its inverse.

A monad (T,\eta,\mu) on a category \mathbb{C} determines the operation (-)^{\sharp}\colon\mathbb{C}(I,TJ)\to\mathbb{C}(TI,TJ) called Kleisli extension, defined by f^{\sharp}\triangleq\mu_{J}\circ Tf. A monad may also be given as a Kleisli triple [42, Definition 1.2]. A strong monad on a CC (\mathbb{C},1,(\times)) is a pair of a monad (T,\eta,\mu) and a natural transformation \theta_{I,J}\colon I\times TJ\to T(I\times J) called strength, satisfying four axioms; see [42, Definition 3.2] for details.

In a CC-SM (\mathbb{C},1,(\times),T,\eta,\mu,\theta) (a CC equipped with a strong monad), the application of the strength to a global element can be expressed by the unit and the Kleisli extension of T [42, Proof of Proposition 3.4]:

\theta_{I,J}\mathbin{\bullet}\langle i,c\rangle=((\eta_{I\times J})_{i})^{\sharp}\mathbin{\bullet}c\quad(i\in UI,\;c\in U(TJ)). (1)

We will use this fact in Proposition 7 and Proposition 1.
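
As a concrete reading of these notions, the following Haskell sketch (ours, not part of the paper's development) expresses the Kleisli extension of an arbitrary monad and defines its strength from the unit and the Kleisli extension, exactly as in equation (1); the names kleisliExt and strength are our own.

```haskell
-- A minimal sketch (not from the paper) of the Kleisli extension and the
-- strength of a (strong) monad, phrased for Haskell's Monad class.
kleisliExt :: Monad m => (a -> m b) -> m a -> m b
kleisliExt f c = c >>= f          -- f^# = mu . T f

-- strength theta_{a,b} : (a, m b) -> m (a, b), defined as in equation (1):
-- theta <i, c> = ((eta_{I x J})_i)^# . c
strength :: Monad m => (a, m b) -> m (a, b)
strength (i, c) = kleisliExt (\x -> return (i, x)) c

main :: IO ()
main = do
  print (strength (1 :: Int, Just 'a'))    -- Just (1,'a')
  print (strength ('x', [10, 20 :: Int]))  -- [('x',10),('x',20)]
```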

There are plenty of examples of C(C)Cs. For the models of probabilistic computation, we will later use CC 𝐌𝐞𝐚𝐬{\bf Meas} of measurable spaces and CCC 𝐐𝐁𝐒{\bf QBS} of quasi-Borel spaces ([28]). Their definitions are deferred to Section 13.

2.1 Category of Binary Relations

We next introduce the category 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}) of binary relations over \mathbb{C}-objects. This category is equivalent to subscones of 2\mathbb{C}^{2} ([41]). It offers an underlying category for relational reasoning about programs interpreted in \mathbb{C}.

  • An object in {\bf BRel}(\mathbb{C}) is a triple (I_{1},I_{2},R) where R\subseteq UI_{1}\times UI_{2}.

  • A morphism from (I1,I2,R)(I_{1},I_{2},R) to (J1,J2,S)(J_{1},J_{2},S) in 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}) is a pair of \mathbb{C}-morphisms f1:I1J1f_{1}\colon I_{1}\rightarrow J_{1} and f2:I2J2f_{2}\colon I_{2}\rightarrow J_{2} such that for any (i1,i2)R(i_{1},i_{2})\in R, we have (f1i1,f2i2)S(f_{1}\mathbin{\bullet}i_{1},f_{2}\mathbin{\bullet}i_{2})\in S.

When XX is a name of a 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C})-object, by X1,X2X_{1},X_{2} we mean its first and second component, and by RXR_{X} we mean its third component; so X=(X1,X2,RX)X=(X_{1},X_{2},R_{X}). By (x1,x2)X(x_{1},x_{2})\in X we mean (x1,x2)RX(x_{1},x_{2})\in R_{X}. For objects X,Y𝐁𝐑𝐞𝐥()X,Y\in{\bf BRel}(\mathbb{C}) and a morphism (f1,f2):(X1,X2)(Y1,Y2)(f_{1},f_{2})\colon(X_{1},X_{2})\to(Y_{1},Y_{2}) in 2\mathbb{C}^{2}, by

(f1,f2):X˙Y(f_{1},f_{2})\colon X\mathbin{\dot{\rightarrow}}Y

we mean that (f1,f2)𝐁𝐑𝐞𝐥()(X,Y)(f_{1},f_{2})\in{\bf BRel}(\mathbb{C})(X,Y), that is, for any (x1,x2)X(x_{1},x_{2})\in X, we have (f1x1,f2x2)Y(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2})\in Y. We say that X𝐁𝐑𝐞𝐥()X\in{\bf BRel}(\mathbb{C}) is an endorelation (over II) if X1=X2(=I)X_{1}=X_{2}(=I).

We next define the forgetful functor p:𝐁𝐑𝐞𝐥()2p_{\mathbb{C}}:{\bf BRel}(\mathbb{C})\to{\mathbb{C}}^{2} by

pX(X1,X2),p(f1,f2)(f1,f2).p_{\mathbb{C}}X\triangleq(X_{1},X_{2}),\quad p_{\mathbb{C}}(f_{1},f_{2})\triangleq(f_{1},f_{2}).

For (I1,I2)2(I_{1},I_{2})\in\mathbb{C}^{2}, by 𝐁𝐑𝐞𝐥()(I1,I2){\bf BRel}(\mathbb{C})_{(I_{1},I_{2})} we mean the complete boolean algebra {X𝐁𝐑𝐞𝐥()|X1=I1X2=I2}\{X\in{\bf BRel}(\mathbb{C})~{}|~{}X_{1}=I_{1}\wedge X_{2}=I_{2}\} with the order given by XYRXRYX\leq Y\iff R_{X}\subseteq R_{Y}.

When \mathbb{C} is a C(C)C, so is 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}) [41, Proposition 4.3]. We specify a final object, a binary product functor and an exponential functor (in case \mathbb{C} is a CCC) on 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}) by:

1˙\displaystyle\dot{1} (1,1,{(id1,id1)})\displaystyle\triangleq(1,1,\{({\rm id}_{1},{\rm id}_{1})\})
X×˙Y\displaystyle X\mathbin{\dot{\times}}Y (X1×Y1,X2×Y2,{(x1,y1,x2,y2)|(x1,x2)X,(y1,y2)Y})\displaystyle\triangleq(X_{1}\times Y_{1},X_{2}\times Y_{2},\{(\langle x_{1},y_{1}\rangle,\langle x_{2},y_{2}\rangle)~{}|~{}(x_{1},x_{2})\in X,(y_{1},y_{2})\in Y\})
X˙Y\displaystyle X\mathbin{\dot{\Rightarrow}}Y (X1Y1,X2Y2,{(f1,f2)|(x1,x2)X.(evf1,x1,evf2,x2)Y}).\displaystyle\triangleq(X_{1}\Rightarrow Y_{1},X_{2}\Rightarrow Y_{2},\{(f_{1},f_{2})~{}|~{}\forall{(x_{1},x_{2})\in X}~{}.~{}(\mathrm{ev}\circ\langle f_{1},x_{1}\rangle,\mathrm{ev}\circ\langle f_{2},x_{2}\rangle)\in Y\}).

3 Divergences on Objects

We introduce the concept of divergence on objects in a CC \mathbb{C}. Major differences between divergences and metrics are threefold: 1) a divergence is defined over objects in \mathbb{C}, 2) no axioms are imposed on it, and 3) it takes values in a partially ordered monoid called a divergence domain, which we define below.

Definition 1.

A divergence domain 𝒬=(Q,,0,(+))\mathcal{Q}=(Q,\leq,0,(+)) is a partially ordered commutative monoid whose poset part is a complete lattice.

The monoid addition (+)(+) is only required to be monotone; no interaction with the sup / inf is required. We reserve the letter 𝒬\mathcal{Q} to denote a general divergence domain. Examples of divergence domains are:

\mathcal{N}=(\mathbb{N}\cup\{\infty\},\leq,0,(+)),\qquad\mathcal{R}^{+}=([0,\infty],\leq,0,(+)),
\mathcal{R}^{\times}=([0,\infty],\leq,1,(\times)),\qquad\mathcal{R}^{+}_{1}=([0,\infty],\leq,0,\lambda{(p,q)}~{}.~{}p+q+pq),
\mathcal{Z}=(\mathbb{Z}\cup\{\infty,-\infty\},\leq,0,(\mathbin{\bar{+}})),\qquad\mathcal{R}=([-\infty,\infty],\leq,0,(\mathbin{\bar{+}}))

Here, +¯\mathbin{\bar{+}} is an extension of the addition by r+¯()=()+¯r=r\mathbin{\bar{+}}(-\infty)=(-\infty)\mathbin{\bar{+}}r=-\infty.

Definition 2.

Let \mathbb{C} be a CC. A 𝒬\mathcal{Q}-divergence on an object II\in\mathbb{C} is a function d:(UI)2𝒬d\colon(UI)^{2}\to\mathcal{Q}.

A suitable notion of morphism between \mathbb{C}-objects with divergences is that of nonexpansive morphisms.

Definition 3.

Let \mathbb{C} be a CC. We define the category 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}}) of 𝒬\mathcal{Q}-divergences on \mathbb{C}-objects and nonexpansive morphisms between them by the following data.

  • An object is a pair (I,d)(I,d) of an object II\in\mathbb{C} and a 𝒬\mathcal{Q}-divergence dd on II.

  • A morphism from (I,d)(I,d) to (J,e)(J,e) is a \mathbb{C}-morphism f:IJf\colon I\to J such that for any x1,x2UIx_{1},x_{2}\in UI, e(fx1,fx2)d(x1,x2)e(f\mathbin{\bullet}x_{1},f\mathbin{\bullet}x_{2})\leq d(x_{1},x_{2}) holds.

For an object X𝐃𝐢𝐯𝒬()X\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}}), by dXd_{X} we mean its 𝒬\mathcal{Q}-divergence part. We also define the forgetful functor V𝒬,:𝐃𝐢𝐯𝒬()V_{\mathcal{Q},\mathbb{C}}:{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\mathbb{C}} by V𝒬,(I,d)IV_{\mathcal{Q},\mathbb{C}}(I,d)\triangleq I and V𝒬,(f)fV_{\mathcal{Q},\mathbb{C}}(f)\triangleq f.

We remark that the forgetful functor V𝒬,𝐒𝐞𝐭:𝐃𝐢𝐯𝒬(𝐒𝐞𝐭)𝐒𝐞𝐭V_{\mathcal{Q},{\bf Set}}:{\bf Div}_{\mathcal{Q}}({{\bf Set}})\to{{\bf Set}} is a (Grothendieck) fibration, and the functor U¯:𝐃𝐢𝐯𝒬()𝐃𝐢𝐯𝒬(𝐒𝐞𝐭)\overline{U}\colon{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\bf Div}_{\mathcal{Q}}({{\bf Set}}) defined by U¯(I,d)(UI,d)\overline{U}(I,d)\triangleq(UI,d) and U¯(f)f\overline{U}(f)\triangleq f makes the following commutative square a pullback in 𝐂𝐀𝐓{\bf CAT} (the large category of categories and functors between them):

\begin{array}{ccc}
{\bf Div}_{\mathcal{Q}}({\mathbb{C}}) & \xrightarrow{\;\overline{U}\;} & {\bf Div}_{\mathcal{Q}}({{\bf Set}})\\
{\scriptstyle V_{\mathcal{Q},\mathbb{C}}}\downarrow & & \downarrow{\scriptstyle V_{\mathcal{Q},{\bf Set}}}\\
\mathbb{C} & \xrightarrow{\;U\;} & {\bf Set}
\end{array}

This pullback diagram asserts that V_{\mathcal{Q},\mathbb{C}}\colon{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\mathbb{C}} arises from the change-of-base of the fibration V_{\mathcal{Q},{\bf Set}} along the global section functor U\colon\mathbb{C}\to{\bf Set} ([29]).

4 Divergences on Monads

We introduce the concept of divergence on monads as a quantitative measure of the difference between computational effects. It is inspired by Barthe and Olmedo's composable divergences on probability distributions ([13]). Divergences on monads are defined upon two extra data, called grading monoids and basic endorelations.

Definition 4.

A grading monoid is a partially ordered monoid (M,,1,())(M,\leq,1,(\cdot)).

Definition 5.

A basic endorelation is a functor E\colon\mathbb{C}\to{\bf BRel}(\mathbb{C}) such that EI is an endorelation on I.

Grading monoids will be used when formulating (\varepsilon,\delta)-differential privacy as a divergence on a monad. Basic endorelations specify which global elements are regarded as identical. Any CC \mathbb{C} has at least two basic endorelations, given by equality relations and total relations:

EqI\displaystyle{\color[rgb]{0,0,0}\mathrm{Eq}}I (I,I,{(i,i)|iUI})\displaystyle\triangleq(I,I,\{(i,i)~{}|~{}i\in UI\}) TopI\displaystyle{\color[rgb]{0,0,0}\mathrm{Top}}I (I,I,UI×UI).\displaystyle\triangleq(I,I,UI\times UI).

Other examples of basic endorelations can be found in concrete categories.

  • The category {\bf Div}_{\mathcal{Q}}({\mathbb{C}}) of \mathcal{Q}-divergences on \mathbb{C}-objects has a basic endorelation E_{\delta} parameterized by \delta\in\mathcal{Q}. It collects all pairs of global elements whose divergence is bounded by \delta. That is, E_{\delta}(I,d)\triangleq(I,I,\{(x_{1},x_{2})~{}|~{}d(x_{1},x_{2})\leq\delta\}).

  • The category of preorders and monotone functions has the basic endorelation EeqE_{eq} collecting equivalent global elements: Eeq(I,)(I,I,{(x,y)|xyyx})E_{eq}(I,\leq)\triangleq(I,I,\{(x,y)~{}|~{}x\leq y\wedge y\leq x\}).

Definition 6.

Let (\mathbb{C},1,(\times),T,\eta,\mu,\theta) be a CC-SM, \mathcal{Q} be a divergence domain, (M,\leq,1,(\cdot)) be a grading monoid and E\colon\mathbb{C}\to{\bf BRel}(\mathbb{C}) be a basic endorelation. An E-relative M-graded \mathcal{Q}-divergence (when M=1, we drop “M-graded”) on the monad T is a doubly-indexed family of \mathcal{Q}-divergences {\mathsf{\Delta}}=\{{\mathsf{\Delta}}^{m}_{I}\colon(U(TI))^{2}\to\mathcal{Q}\}_{m\in M,I\in\mathbb{C}} satisfying the following conditions:

Monotonicity

For any m\leq m^{\prime} in M, I\in\mathbb{C} and c_{1},c_{2}\in U(TI),

{\mathsf{\Delta}}^{m}_{I}(c_{1},c_{2})\geq{\mathsf{\Delta}}^{m^{\prime}}_{I}(c_{1},c_{2}).

E-unit Reflexivity

For any I\in\mathbb{C},

\sup_{(x_{1},x_{2})\in EI}{\mathsf{\Delta}}^{1}_{I}(\eta_{I}\mathbin{\bullet}x_{1},\eta_{I}\mathbin{\bullet}x_{2})\leq 0.

E-composability

For any m_{1},m_{2}\in M, I,J\in\mathbb{C}, c_{1},c_{2}\in U(TI) and f_{1},f_{2}\colon I\to TJ,

{\mathsf{\Delta}}^{m_{1}\cdot m_{2}}_{J}(f_{1}^{\sharp}\mathbin{\bullet}c_{1},f_{2}^{\sharp}\mathbin{\bullet}c_{2})\leq{\mathsf{\Delta}}^{m_{1}}_{I}(c_{1},c_{2})+\sup_{(x_{1},x_{2})\in EI}{\mathsf{\Delta}}^{m_{2}}_{J}(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2}).

We write {\bf Div}(T,E,M,\mathcal{Q}) for the collection of E-relative M-graded \mathcal{Q}-divergences on T. We introduce a partial order \preceq on {\bf Div}(T,E,M,\mathcal{Q}) by:

{\mathsf{\Delta}}_{1}\preceq{\mathsf{\Delta}}_{2}\iff\forall{m\in M,\,I\in\mathbb{C},\,c_{1},c_{2}\in U(TI)}~{}.~{}({\mathsf{\Delta}}_{1})^{m}_{I}(c_{1},c_{2})\geq({\mathsf{\Delta}}_{2})^{m}_{I}(c_{1},c_{2}).

The E-composability condition is a generalization of the composability of differential privacy stated as [13, Theorem 1]. What is new in this paper is that 1) we introduce a condition on the monad unit (E-unit reflexivity), and 2) the sup computed in E-unit reflexivity and E-composability ranges over global elements related by E, while [13] only considers the case where E=\mathrm{Eq}. We will later show that both E-unit reflexivity and E-composability play an important role when connecting divergences, relational liftings of T, and the monad structure of T: these conditions are necessary and sufficient to construct strong graded relational liftings of T satisfying a fundamental property with respect to divergences (Proposition 2).

5 Examples of Divergences on Monads

5.1 Cost Difference for Deterministic Computations

To aid in understanding the E-unit reflexivity and E-composability conditions, we illustrate a few divergences on an elementary monad: the cost counting monad T=\mathbb{N}\times- on {\bf Set}. Its unit and Kleisli extension are defined by

\eta_{I}(x)\triangleq(0,x),\qquad f^{\sharp}(i,x)\triangleq(i+\pi_{1}(f(x)),\pi_{2}(f(x)))\qquad(x\in I,\;i\in\mathbb{N},\;f\colon I\to TJ).

The monad T can be used to record the cost incurred by deterministic computations. For instance, consider the quick sort algorithm \mathsf{qsort} and the insertion sort algorithm \mathsf{isort}, both modified so that they tick a counter whenever they compare two elements to be sorted. These two modified sorting programs are interpreted as functions [\![\mathsf{qsort}]\!],[\![\mathsf{isort}]\!]\colon\mathbb{N}^{*}\to T(\mathbb{N}^{*}), so that the first components of [\![\mathsf{qsort}]\!](x) and [\![\mathsf{isort}]\!](x) report the number of comparisons performed while sorting x.
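
The following Haskell sketch (ours; the comparison-counting isort is an illustrative reconstruction, not the paper's artifact) implements the cost counting monad \mathbb{N}\times- and an insertion sort that ticks once per comparison, so that running it reports the number of comparisons alongside the sorted list.

```haskell
-- A small sketch (not from the paper) of the cost counting monad T = N x (-),
-- as a Writer-style monad over the additive monoid of Int.
newtype Cost a = Cost (Int, a) deriving Show

instance Functor Cost where
  fmap f (Cost (i, x)) = Cost (i, f x)

instance Applicative Cost where
  pure x = Cost (0, x)                       -- eta_I x = (0, x)
  Cost (i, f) <*> Cost (j, x) = Cost (i + j, f x)

instance Monad Cost where
  Cost (i, x) >>= f = let Cost (j, y) = f x  -- Kleisli extension adds costs
                      in Cost (i + j, y)

tick :: Cost ()
tick = Cost (1, ())

-- insertion sort, ticking once for every comparison of two elements
isort :: [Int] -> Cost [Int]
isort = foldr (\x acc -> acc >>= insert x) (pure [])
  where
    insert x []       = pure [x]
    insert x (y : ys) = do
      tick
      if x <= y then pure (x : y : ys)
                else (y :) <$> insert x ys

main :: IO ()
main = print (isort [3, 1, 2])   -- Cost (3,[1,2,3]): three comparisons
```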

We first define an 𝒩\mathcal{N}-divergence 𝖢I\mathsf{C}_{I} on TITI, for each I𝐒𝐞𝐭I\in{\bf Set}, by

𝖢I((i,x),(j,y))|ij|.{\mathsf{C}_{I}}{((i,x),(j,y))}\triangleq|i-j|.

This divergence 𝖢I\mathsf{C}_{I} computes the difference of costs between two computations (i,x),(j,y)TI(i,x),(j,y)\in TI, ignoring their return values. The family 𝖢={𝖢I}I𝐒𝐞𝐭\mathsf{C}=\{\mathsf{C}_{I}\}_{I\in{\bf Set}} forms a Top{\color[rgb]{0,0,0}\mathrm{Top}}-relative 𝒩\mathcal{N}-divergence on TT. The Top{\color[rgb]{0,0,0}\mathrm{Top}}-unit reflexivity of 𝖢\mathsf{C} means that the difference of costs between pure computations is zero:

𝖢I(ηI(x),ηI(y))=𝖢I((0,x),(0,y))=0.\mathsf{C}_{I}(\eta_{I}(x),\eta_{I}(y))=\mathsf{C}_{I}((0,x),(0,y))=0.

The \mathrm{Top}-composability of \mathsf{C} says that we can bound the cost difference of the two composite computations f^{\sharp}(i,x) and g^{\sharp}(j,y) by the sum of the cost difference of the preceding computations (i,x),(j,y) and that of the two programs f,g\colon I\to TJ. The latter is measured by taking the sup of the cost difference of f(x) and g(y), where (x,y) ranges over the basic endorelation \mathrm{Top}I.

𝖢I(f(i,x),g(j,y))\displaystyle\mathsf{C}_{I}(f^{\sharp}(i,x),g^{\sharp}(j,y)) =𝖢I(i+π1(f(x)),π2(f(x)),j+π1(g(y)),π2(g(y)))\displaystyle=\mathsf{C}_{I}(i+\pi_{1}(f(x)),\pi_{2}(f(x)),j+\pi_{1}(g(y)),\pi_{2}(g(y)))
|ij|+supx,yI|π1(f(x))π1(g(y))|\displaystyle\leq|i-j|+\sup_{x,y\in I}|\pi_{1}(f(x))-\pi_{1}(g(y))|
=𝖢I((i,x),(j,y))+sup(x,y)TopI𝖢J(f(x),g(y)).\displaystyle=\mathsf{C}_{I}((i,x),(j,y))+\sup_{(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I}\mathsf{C}_{J}(f(x),g(y)).

We remark that \mathsf{C} is not an \mathrm{Eq}-relative \mathcal{N}-divergence on T because \mathrm{Eq}-composability fails: when f(x)=(0,w), f(y)=(1,w) and f(z)=(0,v) (for z\neq x,y), we have \mathsf{C}_{I}((0,x),(0,y))=0 and \sup_{(x,y)\in\mathrm{Eq}I}\mathsf{C}_{J}(f(x),f(y))=0, but we have \mathsf{C}_{J}(f^{\sharp}(0,x),f^{\sharp}(0,y))=\mathsf{C}_{J}((0,w),(1,w))=1.

Alternatively, we may consider the following 𝒩\mathcal{N}-divergence 𝖢I\mathsf{C}^{\prime}_{I} on TITI for each I𝐒𝐞𝐭I\in{\bf Set}:

𝖢I((i,x),(j,y)){|ij|x=yxy.\mathsf{C}^{\prime}_{I}((i,x),(j,y))\triangleq\begin{cases}|i-j|&x=y\\ \infty&x\neq y\end{cases}.

This divergence is sensitive to return values of computations. When the return values of two computations agree, \mathsf{C}^{\prime} measures the cost difference as \mathsf{C} does, but when they do not agree, the cost difference is judged to be \infty. This divergence is an \mathrm{Eq}-relative \mathcal{N}-divergence on T.

5.2 Cost Difference for Nondeterministic Computations

Deterministic and nondeterministic computations with cost counting can be respectively modeled by the monads (×)(\mathbb{N}\times-) and P(×)P(\mathbb{N}\times-) on 𝐒𝐞𝐭{\bf Set}.

We define the divergences for cost difference as in Table 1. These divergences extract an upper bound of the cost difference between two computations. The divergences \mathsf{C} and \mathsf{NC} measure the usual distance of costs for deterministic and nondeterministic computations, respectively. The divergence \mathsf{NCI} measures the subtraction of costs of two nondeterministic computations. For results of two nondeterministic computations A,B\in P(\mathbb{N}\times I), the divergence \mathsf{NCI}_{I}(A,B) is an upper bound of i-j over all possible choices of (i,x)\in A and (j,y)\in B, and a lower bound of i-j is given by -\mathsf{NCI}_{I}(B,A). The same idea of measuring the cost difference between two programs by subtraction also appears in ([18, 47]). If either A or B is empty, we get no information about costs; we then have \mathsf{NCI}_{I}(A,B)=-\infty. On the other hand, if both A and B are non-empty, their cost intervals are defined by

[lA,hA][inf(i,x)Ai,sup(i,x)Ai],[lB,hB][inf(j,y)Bj,sup(j,y)Bj].[l_{A},h_{A}]\triangleq[\inf_{(i,x)\in A}i,\sup_{(i,x)\in A}i],\qquad[l_{B},h_{B}]\triangleq[\inf_{(j,y)\in B}j,\sup_{(j,y)\in B}j].

We then have 𝖭𝖢𝖨I(A,B)=hAlB\mathsf{NCI}_{I}(A,B)=h_{A}-l_{B} and 𝖭𝖢𝖨I(B,A)=lAhB-\mathsf{NCI}_{I}(B,A)=l_{A}-h_{B}.

Table 1: (1-graded) \mathrm{Top}-relative \mathcal{Q}-divergences for cost counting monads
Δ𝐃𝐢𝐯(T,Top,1,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,{\color[rgb]{0,0,0}\mathrm{Top}},1,\mathcal{Q}) TT 𝒬\mathcal{Q} Definition of ΔI(c1,c2){{\mathsf{\Delta}}_{I}}(c_{1},c_{2})
𝖢\mathsf{C} ×\mathbb{N}\times- 𝒩\mathcal{N} 𝖢I((i,x),(j,y))=|ij|{\mathsf{C}_{I}}{((i,x),(j,y))}=|i-j|
𝖭𝖢\mathsf{NC} P(×)P(\mathbb{N}\times-) 𝒩\mathcal{N} 𝖭𝖢I(A,B)=sup(i,x)A,(j,y)B|ij|{\mathsf{NC}_{I}}{(A,B)}=\sup_{(i,x)\in A,(j,y)\in B}|i-j|
𝖭𝖢𝖨\mathsf{NCI} P(×)P(\mathbb{N}\times-) 𝒵\mathcal{Z} 𝖭𝖢𝖨I(A,B)=sup(i,x)A,(j,y)Bij\mathsf{NCI}_{I}(A,B)=\sup_{(i,x)\in A,(j,y)\in B}i-j
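
For intuition, here is a small Haskell sketch (ours) of the divergence \mathsf{NCI} of Table 1 for finite, non-empty nondeterministic computations, computed via the cost intervals described above.

```haskell
-- NCI_I(A,B) = sup { i - j | (i,x) in A, (j,y) in B } = h_A - l_B for
-- non-empty finite A, B, represented as lists of (cost, value) pairs.
nci :: [(Integer, a)] -> [(Integer, a)] -> Integer
nci a b = maximum (map fst a) - minimum (map fst b)

main :: IO ()
main = do
  let a = [(2, "x"), (5, "y")]    -- cost interval [2,5]
      b = [(1, "u"), (4, "v")]    -- cost interval [1,4]
  print (nci a b)                 -- 4  (= h_A - l_B, an upper bound of i - j)
  print (negate (nci b a))        -- -2 (= l_A - h_B, a lower bound of i - j)
```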

5.3 Divergences for Differential Privacy

Differential privacy (DP for short) is a quantitative definition of privacy for randomized queries on databases. DP is based on the idea of noise-adding anonymization against background-knowledge attacks. In the study of DP, a query is modeled by a measurable function c\colon I\to GJ, where I and J are measurable spaces of inputs and outputs respectively, and GJ is the measurable space of all probability measures over J; here G is the Giry monad ([26]; see also Section 13).

Definition 7 (Differential Privacy, ([21])).

Let c\colon I\to GJ be a morphism in {\bf Meas}, representing a randomized query. The query c satisfies (\varepsilon,\delta)-differential privacy (\varepsilon,\delta\geq 0 are reals) if for any adjacent datasets (d_{1},d_{2})\in R_{\mathrm{adj}} (strictly speaking, differential privacy depends on the definition of adjacency of datasets; the adjacency relation R_{\mathrm{adj}} is usually defined as \{(d_{1},d_{2})~{}|~{}\rho(d_{1},d_{2})\leq 1\} for a metric \rho over I), the following holds:

SmeasurableJ.Pr[c(d1)S]exp(ε)Pr[c(d2)S]+δ.\forall S\subseteq_{\mathrm{measurable}}J.~{}\Pr[c(d_{1})\in S]\leq\exp(\varepsilon)\Pr[c(d_{2})\in S]+\delta.

To express this definition in terms of a divergence on a monad, we introduce a doubly-indexed family of \mathcal{R}^{+}-divergences \mathsf{DP}=\{\mathsf{DP}_{J}^{\varepsilon}\}_{\varepsilon\in[0,\infty],J\in{\bf Meas}} on GJ by

𝖣𝖯Jε(μ1,μ2)supSΣJ(μ1(S)exp(ε)μ2(S))(μ1,μ2GJ).\mathsf{DP}_{J}^{\varepsilon}(\mu_{1},\mu_{2})\triangleq\sup_{S\in\Sigma_{J}}(\mu_{1}(S)-\exp(\varepsilon)\mu_{2}(S))\quad(\mu_{1},\mu_{2}\in GJ).

Then the query c:IGJc\colon I\to GJ satisfies (ε,δ)(\varepsilon,\delta)-DP if and only if

(d1,d2)Radj.𝖣𝖯Jε(c(d1),c(d2))δ.\forall{(d_{1},d_{2})\in R_{\mathrm{adj}}}~{}.~{}\mathsf{DP}_{J}^{\varepsilon}(c(d_{1}),c(d_{2}))\leq\delta.

The pair (\varepsilon,\delta) indicates the difference between the output probability distributions c(d_{1}) and c(d_{2}) of the query c for given datasets d_{1} and d_{2}. Intuitively, the parameter \varepsilon is an upper bound on the ratio \Pr[c(d_{1})=s]/\Pr[c(d_{2})=s] of probabilities, which indicates the leakage of privacy: if \varepsilon is large, attackers can distinguish the datasets d_{1} and d_{2} from the outputs of the query c. The parameter \delta is the probability of failure of privacy protection.
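
To make the definition of \mathsf{DP}^{\varepsilon}_{J} concrete, the following Haskell sketch (ours, restricted to finite discrete distributions) computes \sup_{S}(\mu_{1}(S)-\exp(\varepsilon)\mu_{2}(S)) by enumerating all subsets of a finite sample space; the randomized-response-style distributions in main are an illustrative choice, not taken from the paper.

```haskell
import Data.List (subsequences)

type Dist a = [(a, Double)]        -- finite distribution as (point, mass) pairs

mass :: Eq a => Dist a -> [a] -> Double
mass mu s = sum [p | (x, p) <- mu, x `elem` s]

-- DP^eps_J(mu1, mu2) = sup_S (mu1(S) - exp(eps) * mu2(S)), over all subsets S
dpDiv :: Eq a => Double -> Dist a -> Dist a -> Double
dpDiv eps mu1 mu2 =
  maximum [ mass mu1 s - exp eps * mass mu2 s
          | s <- subsequences (map fst mu1) ]

main :: IO ()
main = do
  let mu1 = [(0, 0.75), (1, 0.25)] :: Dist Int
      mu2 = [(0, 0.25), (1, 0.75)] :: Dist Int
  print (dpDiv (log 3) mu1 mu2)   -- 0.0 (up to rounding): (log 3, 0)-indistinguishable
  print (dpDiv 0       mu1 mu2)   -- 0.5
```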

The family 𝖣𝖯\mathsf{DP} forms an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative +\mathcal{R}^{+}-graded +\mathcal{R}^{+}-divergence on the Giry monad GG [52, Lemma 6]. This is proved by extending the composability of the divergence for DP on discrete probability distributions shown as [12, Lemmas 3 and 6] and [13, Proposition 5], based on the composition theorem of DP [22, Section 3.5].

The conditions in Definition 6 on \mathsf{DP} correspond to the following basic properties of DP:

(monotonicity)

The monotonicity of 𝖣𝖯\mathsf{DP} corresponds to weakening the differential privacy of queries: if cc satisfies (ε,δ)(\varepsilon,\delta)-DP and εε\varepsilon\leq\varepsilon^{\prime} and δδ\delta\leq\delta^{\prime} holds, then cc satisfies (ε,δ)(\varepsilon^{\prime},\delta^{\prime})-DP.

(Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit reflexivity)

The Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit reflexivity of 𝖣𝖯\mathsf{DP} implies 𝖣𝖯J0(ηJh(x),ηJh(x))=0\mathsf{DP}_{J}^{0}(\eta_{J}\circ h(x),\eta_{J}\circ h(x))=0 for any measurable function h:IJh\colon I\to J and xIx\in I. This, together with the composability below, ensures the robustness of DP of a query c:IGJc\colon I\to GJ with respect to deterministic postprocessing:

h:JK.c is (ϵ,δ)-DPGhc is (ϵ,δ)-DP.\forall{h:J\to K}~{}.~{}\text{$c$ is $(\epsilon,\delta)$-DP}\implies\text{$Gh\circ c$ is $(\epsilon,\delta)$-DP}. (2)

In fact, the divergence 𝖣𝖯\mathsf{DP} is reflexive: we have 𝖣𝖯J0(μ,μ)=0\mathsf{DP}_{J}^{0}(\mu,\mu)=0 for every μGJ\mu\in GJ. Therefore h:JKh\colon J\to K and GhGh in (2) can be replaced by h:JGKh\colon J\to GK and hh^{\sharp}; the replaced condition states the robustness of DP of a query with respect to probabilistic postprocessing.

(Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability)

The Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability of 𝖣𝖯\mathsf{DP} corresponds to the known property of DP called the sequential composition theorem ([22]). If c1:IGJc_{1}\colon I\to GJ^{\prime} and c2:JGJc_{2}\colon J^{\prime}\to GJ are (ε1,δ1)(\varepsilon_{1},\delta_{1})-DP and (ε2,δ2)(\varepsilon_{2},\delta_{2})-DP respectively, then the sequential composition c2c1:IGJc_{2}^{\sharp}\circ c_{1}\colon I\to GJ of the queries c1c_{1} and c2c_{2} is (ε1+ε2,δ1+δ2)(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})-DP.
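
As a sanity check of the sequential composition theorem, the following standalone Haskell sketch (ours; the randomized-response mechanisms are illustrative assumptions) uses the fact that for finite distributions \mathsf{DP}^{\varepsilon}(\mu_{1},\mu_{2})=\sum_{x}\max(0,\mu_{1}\{x\}-\exp(\varepsilon)\mu_{2}\{x\}): two answers that are each (\log 2,0)-indistinguishable compose to joint distributions that are (2\log 2,0)-indistinguishable but not (\log 2,0)-indistinguishable.

```haskell
type Dist a = [(a, Double)]      -- finite distribution as (point, mass) pairs

-- DP^eps for finite distributions, via the pointwise formula
dp :: Eq a => Double -> Dist a -> Dist a -> Double
dp eps mu1 mu2 = sum [ max 0 (p - exp eps * massAt mu2 x) | (x, p) <- mu1 ]
  where massAt mu x = sum [q | (y, q) <- mu, y == x]

-- Kleisli extension (sequential composition) of the distribution monad
bind :: Dist a -> (a -> Dist b) -> Dist b
bind mu f = [(y, p * q) | (x, p) <- mu, (y, q) <- f x]

rr :: Bool -> Dist Bool          -- randomised response: report the truth w.p. 2/3
rr b = [(b, 2/3), (not b, 1/3)]

main :: IO ()
main = do
  let c1   = rr True                              -- first query on database d
      c2   = rr False                             -- first query on adjacent d'
      f1 x = [((x, y), p) | (y, p) <- rr True ]   -- second query on d
      f2 x = [((x, y), p) | (y, p) <- rr False]   -- second query on d'
  print (dp (log 2)     c1 c2)                          -- 0.0 (up to rounding)
  print (dp (2 * log 2) (c1 `bind` f1) (c2 `bind` f2))  -- 0.0 (up to rounding)
  print (dp (log 2)     (c1 `bind` f1) (c2 `bind` f2))  -- ~0.222, i.e. > 0
```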

A Non-Example: Pointwise Differential Privacy.

We stated above that a parameter (ε,δ)(\varepsilon,\delta) of DP intuitively gives an upper bound of the probability ratio Pr[c(d1)=s]/Pr[c(d2)=s]\Pr[c(d_{1})=s]/\Pr[c(d_{2})=s] and the probability of failure of privacy protection. However, strictly speaking, there is a gap between the definition of (ε,δ)(\varepsilon,\delta)-DP and this intuition of ε\varepsilon and δ\delta. Pointwise differential privacy ([46, Definition 3.2] and [27, Proposition 1.2.3]) is a finer definition of DP that is faithful to the intuition.

Definition 8.

A measurable function c\colon I\to GJ (regarded as a query) is pointwise (\varepsilon,\delta)-differentially private if, whenever d_{1} and d_{2} are adjacent, there is some A\in\Sigma_{J} with \Pr[c(d_{1})\notin A]\leq\delta such that

sA.Pr[c(d1)=s]exp(ε)Pr[c(d2)=s],\forall s\in A.~{}\Pr[c(d_{1})=s]\leq\exp(\varepsilon)\Pr[c(d_{2})=s],

which is equivalent to the following (here \Pr[c(d_{1})=s] and \Pr[c(d_{2})=s] denote the Radon-Nikodym derivatives of c(d_{1}) and c(d_{2}) with respect to a measure \nu such that c(d_{1}),c(d_{2})\ll\nu; the forward implication is obvious, and for the converse, by the Radon-Nikodym theorem we may take the derivatives with respect to \nu=c(d_{1})+c(d_{2}), and the inequality does not depend on the choice of \nu):

SmeasurableA.Pr[c(d1)S]exp(ε)Pr[c(d2)S].\forall S\subseteq_{\mathrm{measurable}}A.~{}\Pr[c(d_{1})\in S]\leq\exp(\varepsilon)\Pr[c(d_{2})\in S].

To express this definition in terms of divergence on monad, we introduce a doubly-indexed family of +\mathcal{R}^{+}-divergences 𝗉𝗐𝖣𝖯={𝗉𝗐𝖣𝖯Jε}ε+,J𝐌𝐞𝐚𝐬\mathsf{pwDP}=\{\mathsf{pwDP}_{J}^{\varepsilon}\}_{\varepsilon\in\mathcal{R}^{+},J\in{\bf Meas}} called pointwise indistinguishability:

\mathsf{pwDP}_{J}^{\varepsilon}(\mu_{1},\mu_{2})\triangleq\inf\left\{\mu_{1}(J\setminus A)~{}|~{}A\in\Sigma_{J}\wedge(\forall{S\in\Sigma_{J}}.\;S\subseteq A\implies\mu_{1}(S)\leq\exp(\varepsilon)\mu_{2}(S))\right\}.

Then c:IGJc\colon I\to GJ is pointwise (ε,δ)(\varepsilon,\delta)-differentially private if and only if

(d1,d2)Radj.𝗉𝗐𝖣𝖯Jε(c(d1),c(d2))δ.\forall{(d_{1},d_{2})\in R_{\mathrm{adj}}}~{}.~{}\mathsf{pwDP}_{J}^{\varepsilon}(c(d_{1}),c(d_{2}))\leq\delta.

The family 𝗉𝗐𝖣𝖯\mathsf{pwDP} is obviously reflexive: 𝗉𝗐𝖣𝖯Jε(μ,μ)=0\mathsf{pwDP}_{J}^{\varepsilon}(\mu,\mu)=0 holds for any μGJ\mu\in GJ and ε0\varepsilon\geq 0. Hence it is Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit reflexive too. However, it is not Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composable. We let 3={0,1,2}3=\{0,1,2\} and 2={0,1}2=\{0,1\} be discrete spaces, and let α=exp(ε)\alpha=\exp(\varepsilon). We define two probability distributions μ1,μ2G3\mu_{1},\mu_{2}\in G3 by

μ1110𝐝0+910𝐝1,μ2910α𝐝1+(1910α)𝐝2.\mu_{1}\triangleq\frac{1}{10}\mathbf{d}_{0}+\frac{9}{10}\mathbf{d}_{1},\qquad\mu_{2}\triangleq\frac{9}{10\alpha}\mathbf{d}_{1}+(1-\frac{9}{10\alpha})\mathbf{d}_{2}.

We then have 𝗉𝗐𝖣𝖯3ε(μ1,μ2)=110\mathsf{pwDP}_{3}^{\varepsilon}(\mu_{1},\mu_{2})=\frac{1}{10} with A={1,2}A=\{1,2\} since 110>exp(ε)0\frac{1}{10}>\exp(\varepsilon)\cdot 0, 910exp(ε)910α\frac{9}{10}\leq\exp(\varepsilon)\cdot\frac{9}{10\alpha}, and 0exp(ε)(1910α)0\leq\exp(\varepsilon)\cdot(1-\frac{9}{10\alpha}). Next, we define f:3G2f\colon 3\to G2 by

f(0)110𝐝0+910𝐝1,f(1)910𝐝0+110𝐝1,f(2)𝐝1.f(0)\triangleq\frac{1}{10}\mathbf{d}_{0}+\frac{9}{10}\mathbf{d}_{1},\quad f(1)\triangleq\frac{9}{10}\mathbf{d}_{0}+\frac{1}{10}\mathbf{d}_{1},\quad f(2)\triangleq\mathbf{d}_{1}.

We then calculate

f^{\sharp}(\mu_{1})=\frac{82}{100}\mathbf{d}_{0}+\frac{18}{100}\mathbf{d}_{1},\qquad f^{\sharp}(\mu_{2})=\frac{81}{100\alpha}\mathbf{d}_{0}+\frac{100\alpha-81}{100\alpha}\mathbf{d}_{1}.

Then, since \frac{82}{100}>\exp(\varepsilon)\frac{81}{100\alpha}=\frac{81}{100}, the point 0 cannot belong to a witness set A, so the optimal choice is A=\{1\} and we obtain \mathsf{pwDP}_{2}^{\varepsilon}(f^{\sharp}(\mu_{1}),f^{\sharp}(\mu_{2}))=f^{\sharp}(\mu_{1})(\{0\})=\frac{82}{100}. Hence \mathsf{pwDP}_{2}^{\varepsilon}(f^{\sharp}(\mu_{1}),f^{\sharp}(\mu_{2}))=\frac{82}{100}>\frac{1}{10}=\mathsf{pwDP}_{3}^{\varepsilon}(\mu_{1},\mu_{2}). Thus \mathsf{pwDP} is not \mathrm{Eq}-composable, because by the reflexivity of \mathsf{pwDP}, we have \sup_{(x,y)\in{\mathrm{Eq}}_{3}}\mathsf{pwDP}^{0}(f(x),f(y))=0.
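
The computation above can be checked mechanically. The following Haskell sketch (ours) computes \mathsf{pwDP} for finite discrete distributions, where the optimal witness A is the set of points x with \mu_{1}\{x\}\leq\exp(\varepsilon)\mu_{2}\{x\}; we fix \alpha=\exp(\varepsilon)=2 and use exact rational arithmetic.

```haskell
import Data.Ratio ((%))
import Data.List (nub)

type Dist a = [(a, Rational)]    -- finite distribution as (point, mass) pairs

massAt :: Eq a => Dist a -> a -> Rational
massAt mu x = sum [p | (y, p) <- mu, y == x]

-- pwDP^eps for finite discrete distributions: the optimal witness A is
-- {x | mu1{x} <= alpha * mu2{x}} (alpha = exp eps), and the value is mu1
-- of the complement of A.
pwDP :: Eq a => Rational -> Dist a -> Dist a -> Rational
pwDP alpha mu1 mu2 =
  sum [ massAt mu1 x | x <- support, massAt mu1 x > alpha * massAt mu2 x ]
  where support = nub (map fst (mu1 ++ mu2))

-- Kleisli extension (sequential composition) of the distribution monad
bind :: Dist a -> (a -> Dist b) -> Dist b
bind mu f = [(y, p * q) | (x, p) <- mu, (y, q) <- f x]

main :: IO ()
main = do
  let alpha = 2                                              -- alpha = exp(eps)
      mu1 = [(0, 1 % 10), (1, 9 % 10)]                      :: Dist Int
      mu2 = [(1, 9 % 10 / alpha), (2, 1 - 9 % 10 / alpha)]  :: Dist Int
      f 0 = [(0, 1 % 10), (1, 9 % 10)]
      f 1 = [(0, 9 % 10), (1, 1 % 10)]
      f _ = [(1, 1)]                                        :: Dist Int
  print (pwDP alpha mu1 mu2)                         -- 1 % 10
  print (pwDP alpha (mu1 `bind` f) (mu2 `bind` f))   -- 41 % 50 (= 82/100)
```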

Various Relaxations of Differential Privacy

Since the seminal work on DP by [21], various relaxations of differential privacy have been proposed: Rényi DP ([40]), zero-concentrated DP ([17]) and truncated zero-concentrated DP ([16]). They give tighter bounds of differential privacy. These relaxations can be expressed by suitable divergences on the Giry monad G and the sub-Giry monad G_{s}; see Table 2 for their definitions. There, \alpha,w\in(1,\infty) are non-grading parameters for \mathsf{Re} and \mathsf{tCDP}. Each row of the table states that {\mathsf{\Delta}} is an \mathrm{Eq}-relative M-graded \mathcal{Q}- (resp. \mathcal{Q}_{s}-) divergence on G (resp. G_{s}), followed by the definition of {\mathsf{\Delta}}^{m}_{I}(\mu_{1},\mu_{2}).

Table 2: Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-relative MM-graded 𝒬\mathcal{Q}- (𝒬s\mathcal{Q}_{s}-)divergences on GG (GsG_{s})
Δ{\mathsf{\Delta}} MM 𝒬\mathcal{Q} 𝒬s\mathcal{Q}_{s} Definition of ΔIm(μ1,μ2){{\mathsf{\Delta}}^{m}_{I}}(\mu_{1},\mu_{2}) Composability proof
𝖣𝖯\mathsf{DP} +\mathcal{R}^{+} +\mathcal{R}^{+} +\mathcal{R}^{+} supSΣI(μ1(S)exp(ε)μ2(S))\sup_{S\in\Sigma_{I}}(\mu_{1}(S)-\exp(\varepsilon)\mu_{2}(S)) ([13])
𝖱𝖾α{}^{\alpha}\mathsf{Re} 1 +\mathcal{R}^{+} \mathcal{R} 1α1logI(μ1(x)μ2(x))αμ2(x)𝑑x.\frac{1}{\alpha-1}\log\int_{I}\left(\frac{\mu_{1}(x)}{\mu_{2}(x)}\right)^{\alpha}\mu_{2}(x)~{}dx. ([40])
𝗓𝖢𝖣𝖯\mathsf{zCDP} +\mathcal{R}^{+} +\mathcal{R}^{+} \mathcal{R} sup1<α1α(𝖱𝖾Iα(μ1,μ2)m)\sup_{1<\alpha}\frac{1}{\alpha}({}^{\alpha}\mathsf{Re}_{I}(\mu_{1},\mu_{2})-m) ([17])
𝗍𝖢𝖣𝖯w{}^{w}\mathsf{tCDP} 1 +\mathcal{R}^{+} \mathcal{R} sup1<α<w1α(𝖱𝖾Iα(μ1,μ2))\sup_{1<\alpha<w}\frac{1}{\alpha}({}^{\alpha}\mathsf{Re}_{I}(\mu_{1},\mu_{2})) ([16])

5.4 Statistical Divergences and Composability of f-Divergences

Apart from differential privacy, various distances between (sub-)probability distributions have been introduced in probability theory; they are called statistical divergences. Examples include the total variation distance \mathsf{TV}, the Hellinger distance \mathsf{HD}, the Kullback-Leibler divergence \mathsf{KL}, and the \chi^{2}-divergence \mathsf{Chi}; they are defined in Table 3. These statistical divergences are \mathrm{Eq}-relative divergences on the Giry monad G (and on G_{s} for \mathsf{TV}); see the same table for their divergence domains. Question marks in the column for \mathcal{Q}_{s} mean that we do not know with which monoid structure \mathrm{Eq}-composability holds. We remark that these divergences are also reflexive, that is, {\mathsf{\Delta}}(c,c)=0. \mathrm{Eq}-composability of these divergences in discrete form is proved in ([13, 45]); later, [52] extended these results to the composability of the divergences in continuous form.

Table 3: Statistical divergences that are Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-relative 𝒬\mathcal{Q}- (resp. 𝒬s\mathcal{Q}_{s}-) divergences on GG (resp. GsG_{s})
Name Δ{\mathsf{\Delta}} 𝒬\mathcal{Q} 𝒬s\mathcal{Q}_{s} Definition of ΔIm(μ1,μ2){{\mathsf{\Delta}}^{m}_{I}}(\mu_{1},\mu_{2})
Total variation distance 𝖳𝖵\mathsf{TV} +\mathcal{R}^{+} +\mathcal{R}^{+} 12I|μ1(x)μ2(x)|𝑑x\frac{1}{2}\int_{I}|\mu_{1}(x)-\mu_{2}(x)|~{}dx
Kullback-Leibler divergence 𝖪𝖫\mathsf{KL} +\mathcal{R}^{+} ? Iμ1(x)log(μ1(x)μ2(x))𝑑x\int_{I}\mu_{1}(x)\log\left(\frac{\mu_{1}(x)}{\mu_{2}(x)}\right)~{}dx
Hellinger distance 𝖧𝖣\mathsf{HD} +\mathcal{R}^{+} ? 12I(μ1(x)μ2(x))2𝑑x\frac{1}{2}\int_{I}\left(\sqrt{\mu_{1}(x)}-\sqrt{\mu_{2}(x)}\right)^{2}~{}dx
χ2\chi^{2}-divergence 𝖢𝗁𝗂\mathsf{Chi} 1+\mathcal{R}^{+}_{1} ? I(μ1(x)μ2(x))2μ2(x)𝑑x\int_{I}\frac{(\mu_{1}(x)-\mu_{2}(x))^{2}}{\mu_{2}(x)}~{}dx

Each of four divergences in Table 3 can be expressed as an ff-divergence 𝖣𝗂𝗏f{}^{f}\mathsf{Div} ([19, 20, 43]):

𝖣𝗂𝗏If(μ1,μ2)Iμ2(x)f(μ1(x)μ2(x))𝑑x.{}^{f}\mathsf{Div}_{I}(\mu_{1},\mu_{2})\triangleq\int_{I}\mu_{2}(x)f\left(\frac{\mu_{1}(x)}{\mu_{2}(x)}\right)dx.

Here, f is a parameter called a weight function; it has to be a convex function f\colon[0,\infty)\to\mathbb{R}, continuous at 0 and satisfying \lim_{x\to+0}xf(x)=0. Weight functions for the four divergences \mathsf{TV},\mathsf{KL},\mathsf{HD},\mathsf{Chi} are given in Table 4. In fact, \mathsf{DP}^{\varepsilon} is also an f-divergence, with weight function f(t)=\max(0,t-\exp(\varepsilon)); see [13, Proposition 2]. We also remark that the Rényi divergence {}^{\alpha}\mathsf{Re} of order \alpha is, up to the factor \frac{1}{\alpha-1}, the logarithm of the f-divergence with weight function f(t)=t^{\alpha}.
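
The following Haskell sketch (ours, discrete distributions only, with a simplified treatment of mass of \mu_{1} outside the support of \mu_{2}) computes {}^{f}\mathsf{Div} for a given weight function and instantiates it with the weight functions of \mathsf{TV}, \mathsf{KL} and \mathsf{DP}^{\varepsilon}.

```haskell
-- f-divergence of two finite distributions, given as mass lists over a
-- common support, with the convention 0 * f(0/0) = 0.  Simplification
-- (ours): any mass of mu1 outside supp(mu2) makes the result infinite.
fDiv :: (Double -> Double) -> [Double] -> [Double] -> Double
fDiv f mu1 mu2 = sum (zipWith contrib mu1 mu2)
  where
    contrib p q
      | q > 0     = q * f (p / q)
      | p == 0    = 0
      | otherwise = 1 / 0

wTV, wKL :: Double -> Double
wTV t = abs (t - 1) / 2             -- total variation distance
wKL 0 = 1                           -- limit of t*log t - t + 1 at t = 0
wKL t = t * log t - t + 1           -- Kullback-Leibler divergence

wDP :: Double -> Double -> Double
wDP eps t = max 0 (t - exp eps)     -- DP^eps as an f-divergence

main :: IO ()
main = do
  let mu1 = [0.5, 0.5, 0.0]
      mu2 = [0.25, 0.25, 0.5]
  print (fDiv wTV mu1 mu2)             -- 0.5
  print (fDiv wKL mu1 mu2)             -- ~0.693 (= log 2)
  print (fDiv (wDP (log 2)) mu1 mu2)   -- 0.0
```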

ff-divergences have several nice properties such as reflexivity, postprocessing inequality, joint-convexity, duality and continuity ([20, 35]). However, the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability of ff-divergences is not guaranteed in general. Here we provide a sufficient condition for the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability of 𝖣𝗂𝗏f{}^{f}\mathsf{Div} over a specific form of divergence domain.

Proposition 1.

Let \gamma\geq 0 be a nonnegative real number, \mathcal{R}^{+}_{\gamma}=([0,\infty],\leq,0,\lambda{(p,q)}~{}.~{}p+q+\gamma pq) be the divergence domain, and f be a weight function such that f\geq 0 and f(1)=0. If there exist \alpha,\beta,\beta^{\prime}\in\mathbb{R} such that, for all x,y,z,w\in[0,1], the following hold (with the convention 0f(0/0)=0):

0\displaystyle 0 (βz+(1β)x)+γxf(z/x)\displaystyle\leq(\beta^{\prime}z+(1-\beta^{\prime})x)+\gamma xf\left({z}/{x}\right)
xyf(zw/xy)\displaystyle xyf\left({zw}/{xy}\right) (βw+(1β)y)xf(z/x)+(βz+(1β)x)yf(w/y)\displaystyle\leq(\beta w+(1-\beta)y)xf\left({z}/{x}\right)+(\beta^{\prime}z+(1-\beta^{\prime})x)yf\left({w}/{y}\right)
+γxyf(z/x)f(w/y)+α(xz)(wy),\displaystyle\quad+\gamma xyf\left({z}/{x}\right)f\left({w}/{y}\right)+\alpha(x-z)(w-y),

then 𝖣𝗂𝗏f{}^{f}\mathsf{Div} is an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-relative γ+\mathcal{R}^{+}_{\gamma}-divergence on the Giry monad GG. When α=0\alpha=0 and β,β[0,1]\beta,\beta^{\prime}\in[0,1], GG can be replaced with the sub-Giry monad GsG_{s}.

The proof of this proposition generalizes and integrates the proofs given in [45, Section 5.A.2]. This proposition is applicable to prove the composability of divergences in Table 3 by choosing suitable parameters; see Table 4.

Table 4: Parameters for Proposition 1
{}^{f}\mathsf{Div} Weight function f \gamma \alpha \beta \beta^{\prime}
\mathsf{TV} f(t)=|t-1|/2 0 0 1 0
\mathsf{KL} f(t)=t\log(t)-t+1 0 -1 1 1
\mathsf{HD} f(t)=(\sqrt{t}-1)^{2}/2 0 -1/4 1/2 1/2
\mathsf{Chi} f(t)=(t-1)^{2}/2 1 -2 2 2

5.5 Divergences on the Probability Monad on QBS via Monad Opfunctors.

We have seen various divergences on the Giry monad G. It would be desirable to transfer them to the probability monad P on {\bf QBS} (Section 13). To this end, we first develop a generic method for transferring divergences on monads.

Let (\mathbb{C},S) and (\mathbb{D},T) be two CC-SMs. A monad opfunctor [53, Section 4] is a functor p\colon\mathbb{C}\to\mathbb{D} together with a natural transformation \lambda\colon p\circ S\to T\circ p that is compatible with the units and multiplications of the two monads, that is, the following two equations of natural transformations hold:

\lambda\circ(p\circ\eta^{S})=\eta^{T}\circ p,\qquad\lambda\circ(p\circ\mu^{S})=(\mu^{T}\circ p)\circ(T\circ\lambda)\circ(\lambda\circ S).
Proposition 2.

Let (\mathbb{C},S),(\mathbb{D},T) be two CC-SMs, (p\colon\mathbb{C}\to\mathbb{D},\lambda\colon p\circ S\to T\circ p) be a monad opfunctor, and assume that U^{\mathbb{D}}\circ p=U^{\mathbb{C}} holds and that basic endorelations F\colon\mathbb{C}\to{\bf BRel}(\mathbb{C}) and E\colon\mathbb{D}\to{\bf BRel}(\mathbb{D}) satisfy R_{FI}=R_{EpI} for all I\in\mathbb{C} (here we use U^{\mathbb{D}}\circ p=U^{\mathbb{C}}). Then for any {\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}), the following doubly-indexed family of \mathcal{Q}-divergences \langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}}=\{(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{m}_{I}\}_{m\in M,I\in\mathbb{C}} on SI is an F-relative M-graded \mathcal{Q}-divergence on S:

(p,λΔ)Im(ν1,ν2)\displaystyle(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{m}_{I}(\nu_{1},\nu_{2}) ΔpIm(λIν1,λIν2)=ΔpIm((U𝔻λI)(ν1),(U𝔻λI)(ν2)).\displaystyle\triangleq{\mathsf{\Delta}}^{m}_{pI}(\lambda_{I}\mathbin{\bullet}\nu_{1},\lambda_{I}\mathbin{\bullet}\nu_{2})={\mathsf{\Delta}}^{m}_{pI}((U^{\mathbb{D}}\lambda_{I})(\nu_{1}),(U^{\mathbb{D}}\lambda_{I})(\nu_{2})).

The left adjoint L\colon{\bf QBS}\to{\bf Meas} of the adjunction L\dashv K\colon{\bf Meas}\rightarrow{\bf QBS} and the natural transformation l\colon LP\Rightarrow GL defined by l_{X}([\alpha,\mu]_{\sim_{X}})=\mu(\alpha^{-1}(-)) form a monad opfunctor from the probability monad P on {\bf QBS} to the Giry monad G on {\bf Meas} [28, Prop. 22 (3)]. Through this monad opfunctor (L,l), we can convert \mathrm{Eq}-relative divergences on G to those on P. This conversion applies to all the statistical divergences in Tables 2 and 3.

In addition, over standard Borel spaces, we can identify the converted divergence \langle{L,l}\rangle^{*}{\mathsf{\Delta}} with the original {\mathsf{\Delta}}. When \Omega\in{\bf Meas} is standard Borel, we have an equality LK\Omega=\Omega, and l_{K\Omega} is an isomorphism. Therefore we obtain an isomorphism l_{K\Omega}\colon LPK\Omega\cong GLK\Omega=G\Omega [28, Prop. 22 (4)]. A concrete description of its inverse is l_{K\Omega}^{-1}\mathbin{\bullet}\mu=[\gamma^{\prime},\mu(\gamma^{-1}(-))]_{\sim_{K\Omega}}, where \gamma^{\prime}\colon\mathbb{R}\to\Omega and \gamma\colon\Omega\to\mathbb{R} form a section-retraction pair (i.e. \gamma^{\prime}\circ\gamma=\mathrm{id}_{\Omega}), which exists for any standard Borel \Omega.

Theorem 1.

For any Δ𝐃𝐢𝐯(G,Eq,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(G,{\color[rgb]{0,0,0}\mathrm{Eq}},M,\mathcal{Q}) and standard Borel Ω𝐌𝐞𝐚𝐬\Omega\in{\bf Meas},

(L,lΔ)KΩm(lKΩ1μ1,lKΩ1μ2)=ΔΩm(μ1,μ2)(μ1,μ2U(GΩ)).(\langle{L,l}\rangle^{*}{\mathsf{\Delta}})^{m}_{K\Omega}(l_{K\Omega}^{-1}\mathbin{\bullet}\mu_{1},l_{K\Omega}^{-1}\mathbin{\bullet}\mu_{2})={\mathsf{\Delta}}^{m}_{\Omega}(\mu_{1},\mu_{2})\quad(\mu_{1},\mu_{2}\in U(G\Omega)).

5.6 Divergences on State Monads

The state monad TSS(×S)T_{S}\triangleq S\Rightarrow(-\times S) with a state space SS is used to represent programs that update the state. We construct divergences on TST_{S} using divergences dSd_{S} on the state space SS in several ways.

5.6.1 Lipschitz Constant on States

We first consider the state monad T_{S} on {\bf Set}. We also consider a function d_{S}\colon S^{2}\to[0,\infty] satisfying d_{S}(s,s)=0. The following \mathcal{R}^{\times}-divergence {\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{I}(f_{1},f_{2}) on T_{S}I measures how much the pair (\pi_{2}\circ f_{1},\pi_{2}\circ f_{2}) of state-update functions expands the distance between two states before the update. In short, {\mathsf{\Delta}}^{\mathsf{lip},d_{S}} measures a Lipschitz constant of state transformers.

Proposition 3.

The family Δ𝗅𝗂𝗉,dS={ΔI𝗅𝗂𝗉,dS}I𝐒𝐞𝐭{\mathsf{\Delta}}^{\mathsf{lip},d_{S}}=\{{\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{I}\}_{I\in{\bf Set}} of ×\mathcal{R}^{\times}-divergences on TSIT_{S}I defined by

ΔI𝗅𝗂𝗉,dS(f1,f2)sups1,s2SdS(π2(f1(s1)),π2(f2(s2)))dS(s1,s2)(f1,f2TSI, we suppose 0/0=1){\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{I}(f_{1},f_{2})\triangleq\sup_{s_{1},s_{2}\in S}\frac{d_{S}(\pi_{2}(f_{1}(s_{1})),\pi_{2}(f_{2}(s_{2})))}{d_{S}(s_{1},s_{2})}\quad(f_{1},f_{2}\in T_{S}I,\text{ we suppose }0/0=1)

is a Top{\color[rgb]{0,0,0}\mathrm{Top}}-relative ×\mathcal{R}^{\times}-divergence on TST_{S}.

For state transformers f1,f2TSIf_{1},f_{2}\in T_{S}I, their state-updating part is given as functions π2f1,π2f2SS\pi_{2}\circ f_{1},\pi_{2}\circ f_{2}\in S\Rightarrow S. When f1=f2=gf_{1}=f_{2}=g, ΔI𝗅𝗂𝗉,dS(g,g){\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{I}(g,g) is exactly the Lipschitz constant of π2g\pi_{2}\circ g.
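
To make {\mathsf{\Delta}}^{\mathsf{lip},d_{S}} concrete, here is a Haskell sketch (ours) that computes it by brute force over a finite state space; the state space, the function d_{S} and the state transformers in main are illustrative assumptions.

```haskell
type State s a = s -> (a, s)     -- T_S I = S => (I x S)

-- Delta^{lip,dS}_I(f1,f2) over a finite list of states, with 0/0 = 1
lipDiv :: [s] -> (s -> s -> Double) -> State s a -> State s a -> Double
lipDiv states dS f1 f2 =
  maximum [ ratio (dS (snd (f1 s1)) (snd (f2 s2))) (dS s1 s2)
          | s1 <- states, s2 <- states ]
  where
    ratio x y
      | y > 0     = x / y
      | x == 0    = 1            -- convention 0/0 = 1
      | otherwise = 1 / 0        -- positive distance over zero distance

main :: IO ()
main = do
  let states = [0, 1, 2, 3] :: [Double]
      dS s1 s2 = abs (s1 - s2)
      idSt   s = ((), s)         -- leaves the state unchanged
      double s = ((), 2 * s)     -- doubles the state
      halve  s = ((), s / 2)     -- halves the state
  print (lipDiv states dS idSt   idSt)    -- 1.0
  print (lipDiv states dS double double)  -- 2.0: Lipschitz constant of doubling
  print (lipDiv states dS double halve)   -- Infinity: they disagree on equal states
```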

5.6.2 Distance between State Transformers with the Same Inputs

Suppose that the function dSd_{S} also satisfies the triangle inequality. The following +\mathcal{R}^{+}-divergence ΔI𝗆𝖾𝗍,dS(f1,f2){\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{I}(f_{1},f_{2}) on TSIT_{S}I estimates the distance between updated states after the state transformers f1f_{1} and f2f_{2} are applied to the same input.

Proposition 4.

Suppose that the function d_{S} also satisfies the triangle inequality. The family {\mathsf{\Delta}}^{\mathsf{met},d_{S}}=\{{\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{I}\}_{I\in{\bf Set}} of \mathcal{R}^{+}-divergences on T_{S}I defined by:

ΔI𝗆𝖾𝗍,dS(f1,f2)\displaystyle{\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{I}(f_{1},f_{2}) {supsSdS(π2(f1(s)),π2(f2(s)))π1f1=π1f2andπ2f1,π2f2:nonexpansiveotherwise\displaystyle\triangleq\begin{cases}\sup_{s\in S}\!d_{S}(\pi_{2}(f_{1}(s)),\pi_{2}(f_{2}(s)))&\pi_{1}\circ f_{1}=\pi_{1}\circ f_{2}~{}\text{and}\\ &\pi_{2}\circ f_{1},\pi_{2}\circ f_{2}\colon\text{nonexpansive}\\ \infty&\text{otherwise}\end{cases}

is an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-relative +\mathcal{R}^{+}-divergence on TST_{S}.

5.6.3 Sup-Metric on the State Monad on the Category of Generalized Ultrametric Spaces

The category \mathbf{Gum} of generalized ([0,1]-valued) ultrametric spaces (recall that a generalized ultrametric space (I,d_{I}) is a set I together with a function d_{I}\colon I^{2}\to[0,1] such that d_{I}(x,x)=0 and d_{I}(x,z)\leq\max(d_{I}(x,y),d_{I}(y,z))) and nonexpansive functions is Cartesian closed [49, Section 2.2]. We consider the state monad T_{S}=S\Rightarrow(-\times S) on \mathbf{Gum} for a fixed space (S,d_{S})\in\mathbf{Gum}. From the definition of exponential objects in \mathbf{Gum}, T_{S}(I,d_{I}) consists of the set of nonexpansive state transformers with the sup metric between them. In fact, the metric parts of all the T_{S}(I,d_{I}) together form a divergence on T_{S}.

Proposition 5.

The family {dTSI:(TS(I,dI))2[0,1]}(I,dI)𝐆𝐮𝐦\{d_{T_{S}I}\colon(T_{S}(I,d_{I}))^{2}\to[0,1]\}_{(I,d_{I})\in\mathbf{Gum}} consisting of the metric part of the spaces TS(I,dI)T_{S}(I,d_{I}), given by

dTSI(f1,f2)supsSmax(dI(π1(f1(s)),π1(f2(s))),dS(π2(f1(s)),π2(f2(s))))\displaystyle d_{T_{S}I}(f_{1},f_{2})\triangleq\sup_{s\in S}\max\left(d_{I}(\pi_{1}(f_{1}(s)),\pi_{1}(f_{2}(s))),d_{S}(\pi_{2}(f_{1}(s)),\pi_{2}(f_{2}(s)))\right)

forms an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-relative ([0,1],,max,0)([0,1],\leq,\max,0)-divergence on TST_{S}.

In the category 𝐆𝐮𝐦\mathbf{Gum}, instead of Eq{\color[rgb]{0,0,0}\mathrm{Eq}}, there is another basic endorelation 𝖣𝗂𝗌𝗍0\mathsf{Dist}_{0}:

𝖣𝗂𝗌𝗍0(I,dI){(x1,x2)|dI(x1,x2)=0}.\mathsf{Dist}_{0}(I,d_{I})\triangleq\{(x_{1},x_{2})~{}|~{}d_{I}(x_{1},x_{2})=0\}.

By modifying the divergence dTS()d_{T_{S}(-)}, we obtain a 𝖣𝗂𝗌𝗍0\mathsf{Dist}_{0}-relative ([0,1],,max,0)([0,1],\leq,\max,0)-divergence as below:

Proposition 6.

The following forms a 𝖣𝗂𝗌𝗍0\mathsf{Dist}_{0}{}-relative ([0,1],,max,0)([0,1],\leq,\max,0)-divergence on TST_{S}.

{\mathsf{\Delta}}^{\mathsf{Dist}_{0}}_{(I,d_{I})}(f_{1},f_{2})\triangleq\sup_{d_{S}(s_{1},s_{2})=0}\max(d_{I}(\pi_{1}(f_{1}(s_{1})),\pi_{1}(f_{2}(s_{2}))),d_{S}(\pi_{2}(f_{1}(s_{1})),\pi_{2}(f_{2}(s_{2})))).

5.7 Combining Divergence with Cost

In Section 5.2, we have introduced a divergence on the monad P(×)P(\mathbb{N}\times-) modeling nondeterministic choice and cost counting. In this section we construct a divergence on the combination of a general computational effect and cost counting.

Let (\mathbb{C},T) be a CC-SM, {\mathsf{\Delta}}\in{\bf Div}(T,{\mathrm{Eq}},1,\mathcal{Q}) be a divergence, and (N,1_{N}\colon 1\to N,(\star)\colon N\times N\to N) be a monoid object in \mathbb{C} (for cost counting). Then the composite T(N\times-) of the monad T and the monoid action monad N\times(-) again carries a monad structure. We now define a family {\mathsf{C}}({\mathsf{\Delta}},N)=\{{\mathsf{C}}({\mathsf{\Delta}},N)_{I}\colon(U(T(N\times I)))^{2}\to\mathcal{Q}\}_{I\in\mathbb{C}} of \mathcal{Q}-divergences by

𝖢(Δ,N)I(c1,c2){ΔN(Tπ1c1,Tπ1c2)ΔN×I(c1,c2)ΔN(Tπ1c1,Tπ1c2)𝒬otherwise.\displaystyle{\mathsf{C}}({\mathsf{\Delta}},N)_{I}(c_{1},c_{2})\triangleq\begin{cases}{\mathsf{\Delta}}_{N}(T\pi_{1}\mathbin{\bullet}c_{1},T\pi_{1}\mathbin{\bullet}c_{2})&{\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})\leq{\mathsf{\Delta}}_{N}(T\pi_{1}\mathbin{\bullet}c_{1},T\pi_{1}\mathbin{\bullet}c_{2})\\ \top_{\mathcal{Q}}&\text{otherwise}\end{cases}.
Proposition 7.

The family 𝖢(Δ,N){\mathsf{C}}({\mathsf{\Delta}},N) is an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative 𝒬\mathcal{Q}-divergence on T(N×)T(N\times-).

For example, the divergence 𝖢(𝖪𝖫,){\mathsf{C}}(\mathsf{KL},\mathbb{R}) on the composite monad G(×)G(\mathbb{R}\times-) on 𝐌𝐞𝐚𝐬{\bf Meas} describes Kullback-Leibler divergence between distributions of costs in the probabilistic computations with real-valued costs. Intuitively, the side condition 𝖪𝖫×I(μ1,μ2)𝖪𝖫(Gπ1μ1,Gπ1μ2)\mathsf{KL}_{\mathbb{R}\times I}(\mu_{1},\mu_{2})\leq\mathsf{KL}_{\mathbb{R}}(G\pi_{1}\mathbin{\bullet}\mu_{1},G\pi_{1}\mathbin{\bullet}\mu_{2}) in the definition of 𝖢(𝖪𝖫,){\mathsf{C}}(\mathsf{KL},\mathbb{R}) means that the difference between μ1\mu_{1} and μ2\mu_{2} lies only in the costs.

5.8 Preorders on Monads

To explore the generality of our framework, we look at the case where the divergence domain is \mathcal{B}=(\{0\geq 1\},1,\times); here (\times) is numerical multiplication. We identify an indexed family {\mathsf{\Delta}}=\{{\mathsf{\Delta}}_{I}\colon(U(TI))^{2}\to\mathcal{B}\}_{I\in\mathbb{C}} of \mathcal{B}-divergences with the family of adjacency relations \tilde{\mathsf{\Delta}}(1)I\triangleq\{(c_{1},c_{2})~{}|~{}{\mathsf{\Delta}}_{I}(c_{1},c_{2})\leq 1\}.

We point out a connection between Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative \mathcal{B}-divergences and preorders on monads studied in ([32, 50]). A preorder on a monad TT on 𝐒𝐞𝐭{\bf Set} assigns a preorder I\sqsubseteq_{I} on TITI for each I𝐒𝐞𝐭I\in{\bf Set}, and this assignment satisfies:

Substitutivity

For any function f:ITJf\colon I\to TJ and c1,c2TIc_{1},c_{2}\in TI, c1Ic2c_{1}\sqsubseteq_{I}c_{2} implies f(c1)Jf(c2)f^{\sharp}(c_{1})\sqsubseteq_{J}f^{\sharp}(c_{2}).

Congruence

For any function f1,f2:ITJf_{1},f_{2}\colon I\to TJ, if f1(x)Jf2(x)f_{1}(x)\sqsubseteq_{J}f_{2}(x) holds for any xIx\in I, then f1(c)Jf2(c)f_{1}^{\sharp}(c)\sqsubseteq_{J}f_{2}^{\sharp}(c) holds for any cTIc\in TI.

Proposition 8.

A preorder on a monad TT on 𝐒𝐞𝐭{\bf Set} bijectively corresponds to an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative \mathcal{B}-divergence Δ{\mathsf{\Delta}} on TT such that each Δ~(1)I\tilde{\mathsf{\Delta}}(1)I is a preorder.

For a preorder \sqsubseteq on a monad TT on 𝐒𝐞𝐭{\bf Set}, by Δ{\mathsf{\Delta}}^{\sqsubseteq} we mean the divergence corresponding to \sqsubseteq by Proposition 8 (in fact, we have Δ~(1)I=I\widetilde{{\mathsf{\Delta}}^{\sqsubseteq}}(1)I={\sqsubseteq_{I}} for all set II).

6 Properties of Divergences on Monads

6.1 Divergences on Monads as Structures in 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}})

In this section we examine divergences on monads from the viewpoint of the monoidal structure of {\bf Div}_{\mathcal{Q}}({\mathbb{C}}). For any CC \mathbb{C}, the category {\bf Div}_{\mathcal{Q}}({\mathbb{C}}) has a symmetric monoidal structure whose unit and tensor product are given by

𝐈\displaystyle{\bf I} (1,λ(x1,x2).0),\displaystyle\triangleq(1,\lambda{(x_{1},x_{2})}~{}.~{}0),
(I,d)(J,e)\displaystyle(I,d)\otimes(J,e) (I×J,λ(x1,y1,x2,y2).d(x1,x2)+e(y1,y2)).\displaystyle\triangleq(I\times J,\lambda{(\langle x_{1},y_{1}\rangle,\langle x_{2},y_{2}\rangle)}~{}.~{}d(x_{1},x_{2})+e(y_{1},y_{2})).

The coherence isomorphisms of this symmetric monoidal structure are inherited from the Cartesian monoidal structure on \mathbb{C}. Moreover, V𝒬,:𝐃𝐢𝐯𝒬()V_{\mathcal{Q},\mathbb{C}}:{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\mathbb{C}} becomes a symmetric strict monoidal functor of type (𝐃𝐢𝐯𝒬(),𝐈,)(,1,(×))({\bf Div}_{\mathcal{Q}}({\mathbb{C}}),{\bf I},\otimes)\to(\mathbb{C},1,(\times)).

6.1.1 Enrichments of Kleisli Categories Induced by Divergences

Let (,T)(\mathbb{C},T) be a CC-SM. We first show that a (non-graded) divergence on a monad TT attaches a 𝐃𝐢𝐯𝒬(𝐒𝐞𝐭){\bf Div}_{\mathcal{Q}}({{\bf Set}})-enrichment on the Kleisli category T\mathbb{C}_{T} of TT. What we mean by attaching an enrichment to an ordinary category is formulated as follows.

Definition 9.

A 𝐃𝐢𝐯𝒬(𝐒𝐞𝐭){\bf Div}_{\mathcal{Q}}({{\bf Set}})-enrichment of a category 𝔻\mathbb{D} is a family {dI,J:𝔻(I,J)2𝒬}I,J𝔻\{d_{I,J}:\mathbb{D}(I,J)^{2}\to\mathcal{Q}\}_{I,J\in\mathbb{D}} of 𝒬\mathcal{Q}-divergences on the homset 𝔻(I,J)\mathbb{D}(I,J) such that the following inequalities hold:

dI,I(idI,idI)0,\displaystyle d_{I,I}({\rm id}_{I},{\rm id}_{I})\leq 0, (3)
dI,K(g1f1,g2f2)dJ,K(g1,g2)+dI,J(f1,f2).\displaystyle d_{I,K}(g_{1}\circ f_{1},g_{2}\circ f_{2})\leq d_{J,K}(g_{1},g_{2})+d_{I,J}(f_{1},f_{2}). (4)

Such an enrichment determines a 𝐃𝐢𝐯𝒬(𝐒𝐞𝐭){\bf Div}_{\mathcal{Q}}({{\bf Set}})-enriched category 𝔻d\mathbb{D}^{d}, whose object collection and homobjects are given by

𝐎𝐛𝐣(𝔻d)𝐎𝐛𝐣(𝔻),𝔻d(I,J)(𝔻(I,J),dI,J).{\bf Obj}(\mathbb{D}^{d})\triangleq{\bf Obj}(\mathbb{D}),\quad\mathbb{D}^{d}(I,J)\triangleq(\mathbb{D}(I,J),d_{I,J}).

The identity and composition morphisms of 𝔻d\mathbb{D}^{d}:

jI:𝐈𝔻d(I,I),mI,J,K:𝔻d(J,K)𝔻d(I,J)𝔻d(I,K)j_{I}:{\bf I}\to\mathbb{D}^{d}(I,I),\quad m_{I,J,K}:\mathbb{D}^{d}(J,K)\otimes\mathbb{D}^{d}(I,J)\to\mathbb{D}^{d}(I,K)

are inherited from 𝔻\mathbb{D}; they are guaranteed to be nonexpansive by the conditions (3) and (4). The change of base of enrichment of 𝔻d\mathbb{D}^{d} by the symmetric strict monoidal functor V𝒬,𝔻:𝐃𝐢𝐯𝒬(𝔻)𝔻V_{\mathcal{Q},\mathbb{D}}:{\bf Div}_{\mathcal{Q}}({\mathbb{D}})\to{\mathbb{D}} coincides with 𝔻\mathbb{D}. 444The underlying category of 𝔻d\mathbb{D}^{d} [34, Section 1.3] does not coincide with 𝔻\mathbb{D}.

We relate conditions (3) and (4) with the unit reflexivity and composability conditions in the definition of divergence on monad (Definition 6).

Theorem 2.

Let (,T)(\mathbb{C},T) be a CC-SM, E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\to{\bf BRel}(\mathbb{C}) be a basic endorelation such that RE1R_{E1}\neq\emptyset 555 RE1=R_{E1}=\emptyset happens if and only if REI=R_{EI}=\emptyset for any II\in\mathbb{C}. Therefore nontrivial basic endorelations always satisfy RE1R_{E1}\neq\emptyset. , 𝒬\mathcal{Q} be a divergence domain and Δ={ΔI:(U(TI))2𝒬}I{\mathsf{\Delta}}=\{{\mathsf{\Delta}}_{I}:(U(TI))^{2}\to\mathcal{Q}\}_{I\in\mathbb{C}} be a family of 𝒬\mathcal{Q}-divergences on TITI. Define a family d={dI,J}I,Jd=\{d_{I,J}\}_{I,J\in\mathbb{C}} of 𝒬\mathcal{Q}-divergences on the homset T(I,J)\mathbb{C}_{T}(I,J) of the Kleisli category T\mathbb{C}_{T} by

dI,J(f1,f2)sup(x1,x2)EIΔJ(f1x1,f2x2).d_{I,J}(f_{1},f_{2})\triangleq\sup_{(x_{1},x_{2})\in EI}{\mathsf{\Delta}}_{J}(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2}). (5)

Then dd is a 𝐃𝐢𝐯𝒬(𝐒𝐞𝐭){\bf Div}_{\mathcal{Q}}({{\bf Set}})-enrichment of T\mathbb{C}_{T} if and only if Δ{\mathsf{\Delta}} is an EE-relative 𝒬\mathcal{Q}-divergence on TT.
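
A small Haskell sketch of formula (5), under simplifying assumptions: the input object is a finite set given by an explicit enumeration, the basic endorelation is equality, and the divergence ΔJ{\mathsf{\Delta}}_{J} on TJTJ is supplied as an arbitrary real-valued function. The names (homDiv, example) are ours; the homset divergence is then simply the maximum output divergence over equal inputs.

    -- A minimal sketch of (5) for a finite I and the equality endorelation; names are ours.
    homDiv :: [i]                        -- enumeration of the finite set I
           -> (t j -> t j -> Double)     -- a divergence Delta_J on T J
           -> (i -> t j) -> (i -> t j)   -- Kleisli arrows f1, f2 : I -> T J
           -> Double
    homDiv universeI delta f1 f2 =
      maximum [ delta (f1 x) (f2 x) | x <- universeI ]   -- sup over pairs (x, x) in Eq I

    -- example: Maybe as T, with the discrete divergence on Maybe Int
    example :: Double
    example = homDiv [0, 1, 2 :: Int]
                     (\a b -> if a == b then 0 else 1)
                     (\x -> if x > 0 then Just x else Nothing)
                     Just

    main :: IO ()
    main = print example   -- 1, witnessed at the input pair (0, 0)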

6.1.2 Internalizing Divergences as Structures in 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}})

One might wonder how the 𝒬\mathcal{Q}-divergence (5) given to each homset of T\mathbb{C}_{T} arises. Under a strengthened assumption, we derive it from the closed structure with respect to the monoidal product of 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}}). This allows us to internalize divergences on monads as structures in 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}}).

Let (,T)(\mathbb{C},T) be a CCC-SM and 𝒬\mathcal{Q} be a divergence domain whose monoid operation (+)(+) preserves the largest element 𝒬\top\in\mathcal{Q}, that is, x+=x+\top=\top. A consequence of this strengthened assumption is the following:

Lemma 1.

Let (I,d)𝐃𝐢𝐯𝒬()(I,d)\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}}) be an object such that d(x1,x2)d(x_{1},x_{2}) takes only values in {0,}𝒬\{0,\top\}\subseteq\mathcal{Q}. Then the functor ()(I,d):𝐃𝐢𝐯𝒬()𝐃𝐢𝐯𝒬()(-)\otimes(I,d):{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\bf Div}_{\mathcal{Q}}({\mathbb{C}}) has a right adjoint, which we denote by (I,d)()(I,d)\multimap(-). Moreover, V𝒬,:𝐃𝐢𝐯𝒬()V_{\mathcal{Q},\mathbb{C}}:{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\mathbb{C}} is a map of adjunction of type:

V𝒬,:(()(I,d)(I,d)())(()×II()).V_{\mathcal{Q},\mathbb{C}}:((-)\otimes(I,d)\dashv(I,d)\multimap(-))\to((-)\times I\dashv I\Rightarrow(-)).

The proof of this lemma exhibits that the 𝒬\mathcal{Q}-divergence hh associated to the internal hom object (I,d)(J,e)(I,d)\multimap(J,e) measures the divergence between f1,f2U(IJ)f_{1},f_{2}\in U(I\Rightarrow J) by

h(f1,f2)=supx1,x2UI,d(x1,x2)=0e(f1x1,f2x2),h(f_{1},f_{2})=\sup_{x_{1},x_{2}\in UI,d(x_{1},x_{2})=0}e(\lfloor{f_{1}}\rfloor\bullet x_{1},\lfloor{f_{2}}\rfloor\bullet x_{2}),

which almost coincides with the sup part of (5); here :U(IJ)(I,J)\lfloor{-}\rfloor:U(I\Rightarrow J)\to\mathbb{C}(I,J) is the bijection given in Section 2. We use this coincidence to characterize the unit-reflexivity and composability conditions in the definition of divergence on monad (Definition 6). First, we define the internal Kleisli extension morphism klI,J:TI×(ITJ)TJkl_{I,J}:TI\times(I\Rightarrow TJ)\rightarrow TJ by

kl_{I,J}\triangleq TI\times(I\Rightarrow TJ)\xrightarrow{\;\langle\pi_{2},\pi_{1}\rangle\;}(I\Rightarrow TJ)\times TI\xrightarrow{\;\theta_{I\Rightarrow TJ,I}\;}T((I\Rightarrow TJ)\times I)\xrightarrow{\;\mathrm{ev}^{\#}\;}TJ. (6)
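
In Haskell terms, klI,Jkl_{I,J} is nothing but the monad's bind, uncurried and with its arguments ordered as in (6); this is only an informal reading, since the paper works with an arbitrary CCC-SM rather than Haskell types.

    -- Informal sketch: the internal Kleisli extension as uncurried bind.
    kl :: Monad t => (t a, a -> t b) -> t b
    kl (c, f) = c >>= f
    -- e.g. with the list monad: kl ([1,2,3], \x -> [x, x]) == [1,1,2,2,3,3]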

Next, for a basic endorelation E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\to{\bf BRel}(\mathbb{C}), we define the functor E:𝐃𝐢𝐯𝒬()E^{\prime}:\mathbb{C}\to{\bf Div}_{\mathcal{Q}}({\mathbb{C}}) by

EI(I,dEI),Eff,wheredEI(x1,x2){0(x1,x2)E(x1,x2)E.E^{\prime}I\triangleq(I,d_{E^{\prime}I}),\quad E^{\prime}f\triangleq f,\quad\text{where}\quad d_{E^{\prime}I}(x_{1},x_{2})\triangleq\left\{\begin{array}[]{ll}0&(x_{1},x_{2})\in E\\ \infty&(x_{1},x_{2})\not\in E.\end{array}\right.
Theorem 3.

Let (,T)(\mathbb{C},T) be a CCC-SM, (M,,1,())(M,\leq,1,(\cdot)) be a grading monoid, 𝒬\mathcal{Q} be a divergence domain whose monoid operation (+)(+) satisfies x+=x+\top=\top, and E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\to{\bf BRel}(\mathbb{C}) be a basic endorelation. Let Δ={ΔIm}mM,I{\mathsf{\Delta}}=\{{\mathsf{\Delta}}^{m}_{I}\}_{m\in M,I\in\mathbb{C}} be a doubly-indexed family of 𝒬\mathcal{Q}-divergences on TITI, regarded as 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}})-objects. Then

  1. 1.

    Δ{\mathsf{\Delta}} satisfies the EE-unit reflexivity condition if and only if for any II\in\mathbb{C}, the following nonexpansivity holds on the global element ηI:1ITI\lceil{\eta_{I}}\rceil:1\to I\Rightarrow TI corresponding to the monad unit:

    ηI𝐃𝐢𝐯𝒬()(𝐈,EIΔI1).\lceil{\eta_{I}}\rceil\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})({\bf I},E^{\prime}I\multimap{\mathsf{\Delta}}^{1}_{I}).
  2. 2.

    Δ{\mathsf{\Delta}} satisfies the EE-composability condition if and only if for any I,JI,J\in\mathbb{C} and m,nMm,n\in M, the following nonexpansivity holds on the internal Kleisli extension morphism klI,J:TI×(ITJ)TJkl_{I,J}:TI\times(I\Rightarrow TJ)\rightarrow TJ:

    klI,J𝐃𝐢𝐯𝒬()(ΔIm(EIΔJn),ΔJmn).kl_{I,J}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})({\mathsf{\Delta}}^{m}_{I}\otimes(E^{\prime}I\multimap{\mathsf{\Delta}}^{n}_{J}),{\mathsf{\Delta}}^{m\cdot n}_{J}).

[5] formalized families of composable divergences as parameterized assignments in weakly closed monoidal refinements. Roughly speaking, they adopted the equivalence (2) of Theorem 3 as the definition of parameterized assignment. However, divergences on monads and parameterized assignments are built on slightly different categorical foundations, and their generalities are incomparable. Notable differences from parameterized assignments are: 1) divergences on monads are defined relative to basic endorelations, and 2) the underlying category of a divergence on a monad can be any CC, while parameterized assignments require a closed structure on their underlying category. In these respects divergences on monads are a mild generalization of parameterized assignments.

6.1.3 Divergences on Monads and Divergence Liftings of Monads

We next relate graded divergences on monads and monad-like structures on the category 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}}) of 𝒬\mathcal{Q}-divergences on \mathbb{C}-objects. What we mean by monad-like structures is graded divergence liftings of monads on \mathbb{C}, which we introduce below. It is a graded monad on 𝐃𝐢𝐯𝒬(){\bf Div}_{\mathcal{Q}}({\mathbb{C}}) ([31]) whose unit and multiplication are inherited from a monad on \mathbb{C}.

Definition 10.

Let (,T)(\mathbb{C},T) be a CC-SM, MM be a grading monoid and 𝒬\mathcal{Q} be a divergence domain. An MM-graded 𝒬\mathcal{Q}-divergence lifting of TT is a mapping T˙:M×𝐎𝐛𝐣(𝐃𝐢𝐯𝒬())𝐎𝐛𝐣(𝐃𝐢𝐯𝒬())\dot{T}:M\times{\bf Obj}({\bf Div}_{\mathcal{Q}}({\mathbb{C}}))\rightarrow{\bf Obj}({\bf Div}_{\mathcal{Q}}({\mathbb{C}})) such that (below VV stands for the forgetful functor V𝒬,:𝐃𝐢𝐯𝒬()V_{\mathcal{Q},\mathbb{C}}:{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\mathbb{C}})

  1. 1.

    V(T˙mX)=T(VX)V(\dot{T}mX)=T(VX)

  2. 2.

    mnm\leq n implies T˙mXT˙nX\dot{T}mX\leq\dot{T}nX

  3. 3.

    ηVX𝐃𝐢𝐯𝒬()(X,T˙1X)\eta_{VX}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X,\dot{T}1X)

  4. 4.

    μVX𝐃𝐢𝐯𝒬()(T˙m(T˙nX),T˙(mn)X)\mu_{VX}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(\dot{T}m(\dot{T}nX),\dot{T}(m\cdot n)X).

Let E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\rightarrow{\bf BRel}(\mathbb{C}) be a basic endorelation. We say that an MM-graded 𝒬\mathcal{Q}-divergence lifting T˙\dot{T} of TT is EE-strong if the strength θ\theta of TT satisfies

θVX,J𝐃𝐢𝐯𝒬()(XT˙m(EJ),T˙m(XEJ)).\theta_{VX,J}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X\otimes\dot{T}m(E^{\prime}J),\dot{T}m(X\otimes E^{\prime}J)).

We write 𝐒𝐆𝐃𝐋𝐢𝐟𝐭(T,E,M,𝒬){\bf SGDLift}(T,E,M,\mathcal{Q}) for the collection of EE-strong MM-graded 𝒬\mathcal{Q}-divergence liftings of TT. We introduce a partial order \preceq on 𝐒𝐆𝐃𝐋𝐢𝐟𝐭(T,E,M,𝒬){\bf SGDLift}(T,E,M,\mathcal{Q}) by

T˙S˙mM,X𝐃𝐢𝐯𝒬(),c1,c2U(T(VX)).dT˙mX(c1,c2)dS˙mX(c1,c2).\dot{T}\preceq\dot{S}\iff\forall{m\in M,X\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}}),c_{1},c_{2}\in U(T(VX))}~{}.~{}d_{\dot{T}mX}(c_{1},c_{2})\geq d_{\dot{S}mX}(c_{1},c_{2}).

We will later see a similar concept of strong graded relational lifting of monad in Definition 15. Divergence lifting and relational lifting are actually instances of a common general definition of strong graded lifting of monad ([31]), but in this paper we omit this general definition.

The following theorem shows that every divergence can be expressed as the composite of a graded divergence lifting and the divergence corresponding to a basic endorelation.

Theorem 4.

Let (,T)(\mathbb{C},T) be a CC-SM, MM be a grading monoid, 𝒬\mathcal{Q} be a divergence domain and E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\rightarrow{\bf BRel}(\mathbb{C}) be a basic endorelation. For any Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}), define a mapping [Δ]:M×𝐎𝐛𝐣(𝐃𝐢𝐯𝒬())𝐎𝐛𝐣(𝐃𝐢𝐯𝒬())[{\mathsf{\Delta}}]:M\times{\bf Obj}({\bf Div}_{\mathcal{Q}}({\mathbb{C}}))\to{\bf Obj}({\bf Div}_{\mathcal{Q}}({\mathbb{C}})) by, for X=(I,d)X=(I,d), [Δ]mX(TI,d[Δ]mX)[{\mathsf{\Delta}}]mX\triangleq(TI,d_{[{\mathsf{\Delta}}]mX}) where

d[Δ]mX(c1,c2)supJ,nM,f𝐃𝐢𝐯𝒬()(X,ΔJn)ΔJmn(fc1,fc2).d_{[{\mathsf{\Delta}}]mX}(c_{1},c_{2})\triangleq\sup_{J\in\mathbb{C},n\in M,f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X,{\mathsf{\Delta}}^{n}_{J})}{\mathsf{\Delta}}^{m\cdot n}_{J}(f^{\sharp}\mathbin{\bullet}c_{1},f^{\sharp}\mathbin{\bullet}c_{2}).

Then [Δ][{\mathsf{\Delta}}] is an MM-graded 𝒬\mathcal{Q}-divergence lifting of TT such that ΔIm=[Δ]m(EI){\mathsf{\Delta}}^{m}_{I}=[{\mathsf{\Delta}}]m(E^{\prime}I).

When M=1M=1, Theorem 4 implies that the assignment IΔII\mapsto{\mathsf{\Delta}}_{I} extends to the EE^{\prime}-relative monad [Δ]E:𝐃𝐢𝐯𝒬()[{\mathsf{\Delta}}]\circ E^{\prime}:\mathbb{C}\to{\bf Div}_{\mathcal{Q}}({\mathbb{C}}) in the sense of [3].

When we strengthen the assumptions on (,T)(\mathbb{C},T) and 𝒬\mathcal{Q} as done in Section 6.1.2, we obtain a sharper correspondence between divergences on monads and strong graded divergence liftings of monads.

Theorem 5.

Let (,T)(\mathbb{C},T) be a CCC-SM, MM be a grading monoid, 𝒬\mathcal{Q} be a divergence domain such that (+)(+) satisfies x+=x+\top=\top and E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\rightarrow{\bf BRel}(\mathbb{C}) be a basic endorelation. Then there exists an adjunction between partial orders:

({\bf SGDLift}(T,E,M,\mathcal{Q}),\preceq)\;\underset{[-]}{\overset{\langle-\rangle}{\rightleftarrows}}\;({\bf Div}(T,E,M,\mathcal{Q}),\preceq)

where T˙mIT˙m(EI)\langle\dot{T}\rangle mI\triangleq\dot{T}m(E^{\prime}I) and [][-] is the lifting construction of Theorem 4.

6.2 Generation of Divergences

It has been shown that DP can be interpreted as hypothesis testing ([54, 30]). Given a query c:IGJc\colon I\to GJ and adjacent datasets (d1,d2)RadjI2(d_{1},d_{2})\in R_{\mathrm{adj}}\subseteq I^{2}, we consider the hypothesis test with the following null and alternative hypotheses:

H0:The output y comes from the dataset d1,\displaystyle H_{0}\colon\text{The output $y$ comes from the dataset $d_{1}$},
H1:The output y comes from the dataset d2.\displaystyle H_{1}\colon\text{The output $y$ comes from the dataset $d_{2}$}.

For any rejection region SΣJS\in\Sigma_{J}, the Type I and Type II errors are then represented by Pr[c(d1)S]\Pr[c(d_{1})\in S] and Pr[c(d2)S]\Pr[c(d_{2})\notin S], respectively. [30] showed that cc is (ε,δ)(\varepsilon,\delta)-DP if and only if for any adjacent datasets (d1,d2)RadjI2(d_{1},d_{2})\in R_{\mathrm{adj}}\subseteq I^{2}, the pair of Type I error and Type II error lands in the privacy region R(ε,δ)R(\varepsilon,\delta):

SΣJ.(Pr[c(d1)S],Pr[c(d2)S]){(x,y)[0,1]2|(1x)exp(ε)y+δ}R(ε,δ).\forall{S\in\Sigma_{J}}~{}.~{}(\Pr[c(d_{1})\in S],\Pr[c(d_{2})\notin S])\in\underbrace{\{(x,y)\in[0,1]^{2}|(1-x)\leq\exp(\varepsilon)y+\delta\}}_{\triangleq R(\varepsilon,\delta)}.

They also showed that this is equivalent to the testing using probabilistic decision rules [30, Corollary 2.3]:

k:JG{𝖠𝖼𝖼,𝖱𝖾𝗃}.(Pr[kc(d1)=𝖠𝖼𝖼],Pr[kc(d2)=𝖱𝖾𝗃])R(ε,δ).\forall{k\colon J\to G\{\mathsf{Acc},\mathsf{Rej}\}}~{}.~{}(\Pr[k^{\sharp}c(d_{1})=\mathsf{Acc}],\Pr[k^{\sharp}c(d_{2})=\mathsf{Rej}])\in R(\varepsilon,\delta).
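
The following Haskell sketch makes the rejection-region condition concrete on a toy mechanism: ε\varepsilon-randomized response on a single bit, checked against all four rejection regions S{0,1}S\subseteq\{0,1\}. The mechanism and all function names are our own assumptions for this sketch; it is meant only to exhibit the quantifier structure of the displayed conditions.

    -- A minimal sketch; randomized response and all names are ours.
    inRegion :: Double -> Double -> (Double, Double) -> Bool
    inRegion eps delta (x, y) = (1 - x) <= exp eps * y + delta   -- membership in R(eps, delta)

    -- Pr[ c(b) = b' ] for eps-randomized response on one bit
    rr :: Double -> Int -> Int -> Double
    rr eps b b' = if b == b' then q else 1 - q
      where q = exp eps / (1 + exp eps)

    prIn :: Double -> Int -> [Int] -> Double                     -- Pr[ c(b) `elem` s ]
    prIn eps b s = sum [ rr eps b b' | b' <- s ]

    -- the Type I / Type II error pair lies in R(eps, 0) for every rejection region
    checkDP :: Double -> Bool
    checkDP eps = and [ inRegion eps 0 (prIn eps d1 s, 1 - prIn eps d2 s)
                      | (d1, d2) <- [(0, 1), (1, 0)]
                      , s        <- [[], [0], [1], [0, 1]] ]

    main :: IO ()
    main = print (checkDP 0.5)   -- True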

Later [7] generalized this probabilistic variant of hypothesis testing to general statistical divergences, and arrived at a notion of kk-generatedness of statistical divergences (k{}k\in\mathbb{N}\cup\{\infty\}). Following their generalization, we introduce the concept of Ω\Omega-generatedness of divergences on monads.

Definition 11.

Let Ω\Omega\in\mathbb{C}. A divergence Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}) is Ω\Omega-generated if for any mMm\in M, II\in\mathbb{C} and c1,c2U(TI)c_{1},c_{2}\in U(TI),

ΔIm(c1,c2)=supk:ITΩΔΩm(kc1,kc2).{\mathsf{\Delta}}^{m}_{I}(c_{1},c_{2})=\sup_{k\colon I\to T\Omega}{\mathsf{\Delta}}^{m}_{\Omega}(k^{\sharp}\mathbin{\bullet}c_{1},k^{\sharp}\mathbin{\bullet}c_{2}).

An equivalent definition of Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}) being Ω\Omega-generated is: the following holds for any mM,I,c1,c2U(TI),v𝒬m\in M,I\in\mathbb{C},c_{1},c_{2}\in U(TI),v\in\mathcal{Q}:

ΔIm(c1,c2)vk:ITΩ.(kc1,kc2)Δ~(m,v)Ω.{\mathsf{\Delta}}^{m}_{I}(c_{1},c_{2})\leq v\iff\forall{k\colon I\to T\Omega}~{}.~{}(k^{\sharp}\mathbin{\bullet}c_{1},k^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m,v)\Omega.

Here Δ~(m,v)Ω\tilde{\mathsf{\Delta}}(m,v)\Omega is the binary relation {(c1,c2)|ΔΩm(c1,c2)v}\{(c_{1},c_{2})~{}|~{}{\mathsf{\Delta}}^{m}_{\Omega}(c_{1},c_{2})\leq v\}; see also (9). For an Ω\Omega-generated divergence Δ{\mathsf{\Delta}}, its component ΔΩm{\mathsf{\Delta}}^{m}_{\Omega} at Ω\Omega is an essential part that determines all components ΔIm{\mathsf{\Delta}}^{m}_{I} of Δ{\mathsf{\Delta}}. When a divergence is shown to be Ω\Omega-generated, the calculation of the codensity lifting T[Δ]T^{[{\mathsf{\Delta}}]} given in Section 7 will be simplified (Section 7.1).

We illustrate Ω\Omega-generatedness of various divergences. First, we show the Ω\Omega-generatedness of divergences on the Giry monad GG in Tables 2 and 3.

  • Divergence 𝖣𝖯\mathsf{DP} is generated over the two-point discrete space 22 [7, Section B.7]. The binary relation (𝖣𝖯~(ε,δ)2)(\widetilde{\mathsf{DP}}(\varepsilon,\delta)2) coincides with the privacy region R(ε,δ)R(\varepsilon,\delta).

  • Divergence 𝖳𝖵\mathsf{TV} is also generated over 22 [7, Section C.1].

  • Divergences 𝖱𝖾α\mathsf{Re}^{\alpha}, 𝖢𝗁𝗂\mathsf{Chi}, 𝖧𝖣\mathsf{HD} and 𝖪𝖫\mathsf{KL} are generated over the countably infinite discrete space \mathbb{N}. In contrast, they are not NN-generated for any finite discrete space NN [7, Sections B.5 and B.9].

On the sub-Giry monad GsG_{s}, the divergence 𝖣𝖯\mathsf{DP} is 11-generated, and the total variation distance 𝖳𝖵\mathsf{TV} is 22-generated.

Proposition 9.

The divergence 𝖣𝖯𝐃𝐢𝐯(Gs,Eq,+,+)\mathsf{DP}\in{\bf Div}(G_{s},{\color[rgb]{0,0,0}\mathrm{Eq}},\mathcal{R}^{+},\mathcal{R}^{+}) is 11-generated.

Proposition 10.

The divergence 𝖳𝖵𝐃𝐢𝐯(Gs,Eq,1,+)\mathsf{TV}\in{\bf Div}(G_{s},{\color[rgb]{0,0,0}\mathrm{Eq}},1,\mathcal{R}^{+}) is not 11-generated but 22-generated.

Ω\Omega-Generatedness of Preorders on Monads

We relate Ω\Omega-generatedness of divergences and preorders on monads studied in ([32]). Let TT be a monad on 𝐒𝐞𝐭{\bf Set} and Ω\Omega be a set. [32] introduced the concept of congruent and substitutive preorders on TΩT\Omega as those satisfying:

Substitutivity

For any function f:ΩTΩf\colon\Omega\to T\Omega and c1,c2TΩc_{1},c_{2}\in T\Omega, c1c2c_{1}\leq c_{2} implies f(c1)f(c2)f^{\sharp}(c_{1})\leq f^{\sharp}(c_{2}).

Congruence

For any functions f1,f2:JTΩf_{1},f_{2}\colon J\to T\Omega, if f1(x)f2(x)f_{1}(x)\leq f_{2}(x) holds for any xJx\in J, then f1(c)f2(c)f_{1}^{\sharp}(c)\leq f_{2}^{\sharp}(c) holds for any cTΩc\in T\Omega.

For instance, any component of a preorder on TT at Ω\Omega forms a congruent and substitutive preorder on TΩT\Omega. We write 𝐂𝐒𝐏𝐫𝐞(T,Ω)\mathbf{CSPre}(T,\Omega) for the set of all congruent and substitutive preorders on TΩT\Omega, and 𝐏𝐫𝐞(T)\mathbf{Pre}(T) for the collection of all preorders on TT. [32] gave a construction []Ω:𝐂𝐒𝐏𝐫𝐞(T,Ω)𝐏𝐫𝐞(T)[-]^{\Omega}\colon\mathbf{CSPre}(T,\Omega)\to\mathbf{Pre}(T) of preorders on TT from congruent and substitutive preorders on TΩT\Omega:

c1[]JΩc2g:JTΩ.g(c1)g(c2)c_{1}[\leq]^{\Omega}_{J}c_{2}\iff\forall{g\colon J\to T\Omega}~{}.~{}g^{\sharp}(c_{1})\leq g^{\sharp}(c_{2})
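
The construction [Ω]{[\leq]}^{\Omega} is directly executable for a finitary monad and finite sets. The following Haskell sketch (names ours) instantiates it for the finite-powerset monad with Ω\Omega a one-element set and the inclusion preorder on TΩT\Omega; enumerating all post-processings g:JTΩg\colon J\to T\Omega recovers the inclusion preorder on TJTJ.

    -- A minimal sketch of [<=]^Omega for the finite-powerset monad; names are ours.
    import Data.List  (nub, subsequences)
    import Data.Maybe (fromJust)

    type Pow a = [a]

    bindP :: Eq b => Pow a -> (a -> Pow b) -> Pow b      -- Kleisli extension g#
    bindP c g = nub (concatMap g c)

    leqOmega :: Pow () -> Pow () -> Bool                 -- inclusion preorder on T Omega
    leqOmega c1 c2 = all (`elem` c2) c1

    allFuns :: Eq a => [a] -> [b] -> [a -> b]            -- all functions between finite sets
    allFuns dom cod =
      [ \x -> fromJust (lookup x tbl)
      | tbl <- mapM (\x -> [ (x, y) | y <- cod ]) dom ]

    -- [<=]^Omega_J : compare c1, c2 in T J through every g : J -> T Omega
    leqGen :: Eq j => [j] -> Pow j -> Pow j -> Bool
    leqGen universeJ c1 c2 =
      and [ leqOmega (bindP c1 g) (bindP c2 g)
          | g <- allFuns universeJ (subsequences [()]) ]

    main :: IO ()
    main = print ( leqGen [1, 2, 3 :: Int] [1]    [1, 2]    -- True
                 , leqGen [1, 2, 3 :: Int] [1, 2] [1] )     -- False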

The constructed preorders on TT are Ω\Omega-generated in the following sense:

Proposition 11.

For any 𝐂𝐒𝐏𝐫𝐞(T,Ω){\leq}\in\mathbf{CSPre}(T,\Omega), the \mathcal{B}-divergence Δ[]Ω{\mathsf{\Delta}}^{[\leq]^{\Omega}} corresponding to the preorder []Ω{[\leq]}^{\Omega} on TT is Ω\Omega-generated (see Proposition 8 for the correspondence).

Applying this proposition, we can determine Ω\Omega-generatedness of preorders on monads:

  • If the monad TT has a rank α\alpha, the construction []α[-]^{\alpha} is bijective [32, Theorem 7]. Hence for such a monad, each preorder on TT corresponds to an α\alpha-generated \mathcal{B}-divergence.

  • For the subprobability distribution monad DsD_{s} on 𝐒𝐞𝐭{\bf Set}, [50] identified all preorders on DsD_{s}: there are 41 preorders on DsD_{s}. Among them, 25 preorders are 1-generated, while 16 preorders are 2-generated [50, Proposition 6.3].

6.3 An Adjunction between Quantitative Equational Theories and Divergences

[39] introduced the concept of quantitative equational theory as an algebraic presentation of monads on the category of (pseudo-)metric spaces. A quantitative equational theory is an equational theory over indexed equations t=εu{t}=_{\varepsilon}{u}, equipped with the axioms of pseudometric spaces plus suitable axioms reflecting properties of quantitative algebras. A quantitative equational theory determines a pseudometric on the set of Ω\Omega-terms.

Consider a set Ω\Omega of function symbols of finite arity. If nn is the arity of a function symbol fΩf\in\Omega, we write f:nΩf\colon n\in\Omega. Let XX be a set of variables, and let TΩXT_{\Omega}X be the Ω\Omega-term algebra over XX. For f:nΩf\colon n\in\Omega and t1,,tnTΩXt_{1},\ldots,t_{n}\in T_{\Omega}X, we write f(t1,,tn)f(t_{1},\ldots,t_{n}) for the term obtained by applying ff to t1,,tnt_{1},\ldots,t_{n}. The construction XTΩXX\mapsto T_{\Omega}X forms a (strong) monad on 𝐒𝐞𝐭{\bf Set} whose unit sends variables to terms, that is, ηX(x)=x\eta_{X}(x)=x, and whose Kleisli extension h:TΩITΩXh^{\sharp}\colon T_{\Omega}I\to T_{\Omega}X of a function h:ITΩXh\colon I\to T_{\Omega}X is defined inductively by

h(x)h(x),h(f(t1,,tn))f(h(t1),,h(tn)).h^{\sharp}(x)\triangleq h(x),\quad h^{\sharp}(f(t_{1},\ldots,t_{n}))\triangleq f(h^{\sharp}(t_{1}),\ldots,h^{\sharp}(t_{n})).
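
In Haskell, the Ω\Omega-term monad and its Kleisli extension (simultaneous substitution) can be written directly, as in the sketch below; the datatype and function names are ours, and the signature is represented loosely by allowing any symbol with any list of arguments, so arities are not enforced.

    -- A minimal sketch of the term monad T_Omega; arities are not enforced here.
    data Term f x = Var x | Op f [Term f x]
      deriving Show

    unitT :: x -> Term f x                 -- the unit: a variable is a term
    unitT = Var

    -- Kleisli extension h# : simultaneous substitution of h(x) for each variable x
    substT :: (x -> Term f y) -> Term f x -> Term f y
    substT h (Var x)   = h x
    substT h (Op f ts) = Op f (map (substT h) ts)

    example :: Term String String
    example = substT sigma (Op "plus" [Var "x", Var "y"])
      where sigma "x" = Op "zero" []
            sigma v   = Var v

    main :: IO ()
    main = print example   -- Op "plus" [Op "zero" [], Var "y"]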

A substitution of Ω\Omega-terms over XX is a function σ:XTΩX\sigma\colon X\to T_{\Omega}X. For tTΩXt\in T_{\Omega}X, we call σ(t)\sigma^{\sharp}(t) the application of the substitution σ\sigma to tt. We define the set of indexed equations of terms by

𝕍(TΩX){t=εu|t,uTΩX,ε+}.\mathbb{V}(T_{\Omega}X)\triangleq\{{t}=_{\varepsilon}{u}~{}|~{}t,u\in T_{\Omega}X,\varepsilon\in\mathbb{Q}^{+}\}.

Here the index ε\varepsilon runs over non-negative rational numbers. A conditional quantitative equation is a judgment of the following form

{ti=εiui|iI}t=εu(I:countable,ti=εiui,t=εu𝕍(TΩX));\{{t_{i}}=_{\varepsilon_{i}}{u_{i}}~{}|~{}i\in I\}\vdash{t}=_{\varepsilon}{u}\qquad(I\colon\text{countable},{t_{i}}=_{\varepsilon_{i}}{u_{i}},{t}=_{\varepsilon}{u}\in\mathbb{V}(T_{\Omega}X));

the left-hand side of the turnstile (\vdash) is called the hypothesis and the right-hand side the conclusion. We denote by 𝔼(TΩX)\mathbb{E}(T_{\Omega}X) the set of conditional quantitative equations. For any countable subset Γ\Gamma of 𝕍(TΩX)\mathbb{V}(T_{\Omega}X) and any substitution σ:XTΩX\sigma\colon X\to T_{\Omega}X, we define σ(Γ){σ(ti)=εiσ(ui)|ti=εiuiΓ}\sigma(\Gamma)\triangleq\{{\sigma^{\sharp}(t_{i})}=_{\varepsilon_{i}}{\sigma^{\sharp}(u_{i})}~{}|~{}{t_{i}}=_{\varepsilon_{i}}{u_{i}}\in\Gamma\}.

Definition 12 (Quantitative Equational Theory [39, Definition 2.1]).

A quantitative equational theory (QET for short) of type Ω\Omega over XX is a set U𝔼(TΩX)U\subseteq\mathbb{E}(T_{\Omega}X) closed under the rules summarized in Figure 1.

\displaystyle\emptyset t=0tU\displaystyle\vdash{t}=_{0}{t}\in U (Ref)
{t=εu}\displaystyle\{{t}=_{\varepsilon}{u}\} u=εtU\displaystyle\vdash{u}=_{\varepsilon}{t}\in U (Sym)
{t=εu,u=εv}\displaystyle\{{t}=_{\varepsilon}{u},{u}=_{\varepsilon^{\prime}}{v}\} t=ε+εvU\displaystyle\vdash{t}=_{\varepsilon+\varepsilon^{\prime}}{v}\in U (Tri)
ε+.{t=εu}\displaystyle\forall{\varepsilon^{\prime}\in\mathbb{Q}^{+}}~{}.~{}\{{t}=_{\varepsilon}{u}\} t=ε+εuU\displaystyle\vdash{t}=_{\varepsilon+\varepsilon^{\prime}}{u}\in U (Max)
ε+.{t=εu|ε<ε}\displaystyle\forall{\varepsilon\in\mathbb{Q}^{+}}~{}.~{}\{{t}=_{\varepsilon^{\prime}}{u}|\varepsilon<\varepsilon^{\prime}\} t=εuU\displaystyle\vdash{t}=_{\varepsilon}{u}\in U (Arch)
f:nΩ.{ti=εui|1in}\displaystyle\forall{f\colon n\in\Omega}~{}.~{}\{{t_{i}}=_{\varepsilon}{u_{i}}|1\leq i\leq n\} f(t1,,tn)=εf(u1,,un)\displaystyle\vdash{f(t_{1},\ldots,t_{n})}=_{\varepsilon}{f(u_{1},\ldots,u_{n})} (Nonexp)
σ:XTΩX.Γt=εuU\displaystyle\forall{\sigma\colon X\to T_{\Omega}X}~{}.~{}{\Gamma\vdash{t}=_{\varepsilon}{u}\in U} σ(Γ)σ(t)=εσ(u)U\displaystyle\implies{\sigma(\Gamma)\vdash{\sigma^{\sharp}(t)}=_{\varepsilon}{\sigma^{\sharp}(u)}\in U} (Subst)
Γt=εuUψΓ.ΓψU\displaystyle\Gamma^{\prime}\vdash{t}=_{\varepsilon}{u}\in U\land\forall{\psi\in\Gamma^{\prime}}~{}.~{}\Gamma\vdash\psi\in U Γt=εuU\displaystyle\implies{\Gamma\vdash{t}=_{\varepsilon}{u}\in U} (Cut)
t=εuΓ\displaystyle{{t}=_{\varepsilon}{u}\in\Gamma} Γt=εuU\displaystyle\implies{\Gamma\vdash{t}=_{\varepsilon}{u}\in U} (Assumpt)
Figure 1: Quantitative Equational Theory Rules

We write 𝐐𝐄𝐓(Ω,X){\bf QET}(\Omega,X) for the set of QETs of type Ω\Omega over XX. We regard it as a poset (𝐐𝐄𝐓(Ω,X),)({\bf QET}(\Omega,X),\subseteq) by the set inclusion order. Given a set U0U_{0} of conditional quantitative equations of type Ω\Omega over XX, by U0¯QET(Ω,X)\overline{U_{0}}^{\mathrm{QET}({\Omega},{X})} we mean the least QET containing U0U_{0}.

We state an adjunction between quantitative equational theories and divergences on free-algebra monads on 𝐒𝐞𝐭{\bf Set}. More specifically, we construct the following adjunction and isomorphism between posets:

({\bf QET}(\Omega,X),\subseteq)\;\underset{U[-]}{\overset{d[-]}{\rightleftarrows}}\;({\bf CSEPMet}(T_{\Omega},X),\preceq)\;\underset{(-)_{X}}{\overset{\mathrm{Gen}}{\rightleftarrows}}\;(\mathbf{DivEPMet}(T_{\Omega},X),\preceq)\qquad(7)
where the left pair of monotone maps forms an adjunction and the right pair an isomorphism (\cong).

By combining these, a QET of type Ω\Omega over XX determines an XX-generated Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative +\mathcal{R}^{+}-divergence on TΩT_{\Omega} and vice versa. The poset in the middle is that of congruent and substitutive pseudometrics, which are a quantitative analogue of congruent and substitutive preorders.

Definition 13.

Let TT be a monad on 𝐒𝐞𝐭{\bf Set} and X𝐒𝐞𝐭X\in{\bf Set}. A congruent and substitutive pseudometric (CS-EPMet for short) on TXTX is an extended pseudometric666A function d:A2+d\colon A^{2}\to\mathcal{R}^{+} is called an extended pseudometric on AA if d(a,a)=0d(a,a)=0 (reflexivity), d(b,a)=d(a,b)d(b,a)=d(a,b) (symmetry) and d(a,c)d(a,b)+d(b,c)d(a,c)\leq d(a,b)+d(b,c) (triangle-inequality) hold for all a,b,cAa,b,c\in A. d:(TX)2+d\colon(TX)^{2}\to\mathcal{R}^{+} on TXTX satisfying

Substitutivity

For any function f:XTXf\colon X\to TX and c1,c2TXc_{1},c_{2}\in TX, d(f(c1),f(c2))d(c1,c2)d(f^{\sharp}(c_{1}),f^{\sharp}(c_{2}))\leq d(c_{1},c_{2}).

Congruence

For any set II, functions f1,f2:ITXf_{1},f_{2}\colon I\to TX and cTIc\in TI, d(f1(c),f2(c))supiId(f1(i),f2(i))d(f_{1}^{\sharp}(c),f_{2}^{\sharp}(c))\leq\sup_{i\in I}d(f_{1}(i),f_{2}(i)).

We denote by 𝐂𝐒𝐄𝐏𝐌𝐞𝐭(T,X){\bf CSEPMet}(T,X) the set of CS-EPMets on TXTX. We then make it into a poset (𝐂𝐒𝐄𝐏𝐌𝐞𝐭(T,X),)({\bf CSEPMet}(T,X),\preceq) by the following pointwise opposite order:

ddc1,c2TX.d(c1,c2)d(c1,c2).d\preceq d^{\prime}\iff\forall{c_{1},c_{2}\in TX}~{}.~{}d(c_{1},c_{2})\geq d^{\prime}(c_{1},c_{2}).
Definition 14.

Let TT be a monad on 𝐒𝐞𝐭{\bf Set} and X𝐒𝐞𝐭X\in{\bf Set}. We denote by 𝐃𝐢𝐯𝐄𝐏𝐌𝐞𝐭(T,X)\mathbf{DivEPMet}(T,X) the collection of XX-generated Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative +\mathcal{R}^{+}-divergences Δ{\mathsf{\Delta}} on TT such that each component ΔI{\mathsf{\Delta}}_{I} is an extended pseudometric. We restrict the partial order \preceq on 𝐃𝐢𝐯(T,Eq,1,+){\bf Div}(T,{\color[rgb]{0,0,0}\mathrm{Eq}},1,\mathcal{R}^{+}) to 𝐃𝐢𝐯𝐄𝐏𝐌𝐞𝐭(T,X)\mathbf{DivEPMet}(T,X).

We next introduce various monotone functions appearing in (7).

d[U](t,u)inf{ε+|t=εuU}\displaystyle d[U](t,u)\triangleq\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\in U\right\} Gen(d)I(c1,c2)supk:ITXd(k(c1),k(c2))\displaystyle\mathrm{Gen}(d)_{I}(c_{1},c_{2})\triangleq\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))
U[d]{t=εu|ε+,d(t,u)ε}¯QET(Ω,X)\displaystyle U[d]\triangleq\overline{\left\{\emptyset\vdash{t}=_{\varepsilon}{u}~{}\middle|~{}\varepsilon\in\mathbb{Q}^{+},d(t,u)\leq\varepsilon\right\}}^{\mathrm{QET}({\Omega},{X})} (Δ)XΔX\displaystyle({\mathsf{\Delta}})_{X}\triangleq{\mathsf{\Delta}}_{X}
Proposition 12.

The functions d[],U[],Gen,()Xd[-],U[-],\mathrm{Gen},(-)_{X} defined above are all well-defined monotone functions having types given in (7).

That d[U]d[U] is an extended pseudometric is shown at the beginning of [39, Section 5]. Here we additionally show that it enjoys the congruence and substitutivity conditions of Definition 13. The function Gen\mathrm{Gen} is taken from the right-hand side of the definition of Ω\Omega-generatedness (Definition 11). The function ()X(-)_{X} simply extracts the component at XX of a given divergence.

Theorem 6.

For any set Ω\Omega of function symbols with finite arity and set XX, the following holds for the monotone functions in (7):

  1. 1.

    Gen\mathrm{Gen} is the inverse of ()X(-)_{X}.

  2. 2.

    We have an adjunction satisfying d[U[]]=idd[U[-]]={\rm id}:

    ({\bf QET}(\Omega,X),\subseteq)\;\underset{U[-]}{\overset{d[-]}{\rightleftarrows}}\;({\bf CSEPMet}(T_{\Omega},X),\preceq),\qquad U[-]\dashv d[-] (8)

In the proof of this theorem, we use the definition of models of QETs ([6]). Intuitively, the right adjoint d[]d[-] extracts the pseudometric on TΩXT_{\Omega}X from a given QET, while the left adjoint U[]U[-] constructs the least QET containing all the information of a given pseudometric on TΩXT_{\Omega}X. The adjunction (8) also implies that we can construct monads on the category of extended metric spaces from CS-EPMets by Mardare et al.’s metric term monad construction ([39]). Overall, the composite adjunction (7) says that XX-generated divergences can be axiomatized by QETs whose variable set is XX.

The range of U[]U[-] is contained in the set 𝐔𝐐𝐄𝐓(Ω,X)\mathbf{UQET}(\Omega,X) of unconditional QETs defined below (see also [38, Section 3]):

𝐔𝐐𝐄𝐓(Ω,X){V𝐐𝐄𝐓(Ω,X)|S{t=εu|t,uTΩX,ε+}.V=S¯QET(Ω,X)}.\mathbf{UQET}(\Omega,X)\triangleq\left\{V\in{\bf QET}(\Omega,X)~{}\middle|~{}\exists S\subseteq\{\emptyset\vdash{t}=_{\varepsilon}{u}~{}|~{}t,u\in T_{\Omega}X,\varepsilon\in\mathbb{Q}^{+}\}.~{}V=\overline{S}^{\mathrm{QET}({\Omega},{X})}\right\}.

Unconditional QETs of type Ω\Omega over XX are equivalent to XX-generated divergences on TΩT_{\Omega}: restricting to unconditional QETs, the adjunction (8) becomes a pair of isomorphisms.

Theorem 7.

(𝐔𝐐𝐄𝐓(Ω,X),)(𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X),)(𝐃𝐢𝐯𝐄𝐏𝐌𝐞𝐭(TΩ,X),)(\mathbf{UQET}(\Omega,X),\subseteq)\cong({\bf CSEPMet}(T_{\Omega},X),\preceq)\cong(\mathbf{DivEPMet}(T_{\Omega},X),\preceq).

7 Graded Strong Relational Liftings for Divergences

We have introduced the concept of divergence on monad for measuring quantitative difference between two computational effects. To integrate this concept with relational program logic, we employ a semantic structure called graded strong relational lifting of monad. It was introduced for the semantics of approximate probabilistic relational Hoare logic for the verification of differential privacy ([12]) and later used in various program logics ([13, 8, 9, 51, 52]). Independently, it was also introduced as a semantic structure for effect systems ([31]). Liftings introduced in the study of differential privacy are designed to satisfy a special property called the fundamental property [12, Theorem 1]: when we supply the equivalence relation to the lifting, it returns the adjacency relation of the divergence. This special property is the key to expressing the differential privacy of probabilistic programs in relational program logics.

In this paper, we present a general construction of graded strong relational liftings from divergences on monads. First, we recall its definition ([31, 24]).

Definition 15.

Let (,T)(\mathbb{C},T) be a CC-SM and (M,,1,())(M,\leq,1,(\cdot)) be a grading monoid. An MM-graded strong relational lifting T˙\dot{T} of TT is a mapping T˙:M×𝐎𝐛𝐣(𝐁𝐑𝐞𝐥())𝐎𝐛𝐣(𝐁𝐑𝐞𝐥())\dot{T}:M\times{\bf Obj}({\bf BRel}(\mathbb{C}))\rightarrow{\bf Obj}({\bf BRel}(\mathbb{C})) satisfying the following conditions:

  1. 1.

    p(T˙mX)=(TX1,TX2)p_{\mathbb{C}}(\dot{T}mX)=(TX_{1},TX_{2}), and mmm\leq m^{\prime} implies T˙mXT˙mX\dot{T}mX\leq\dot{T}m^{\prime}X.

  2. 2.

    (ηX1,ηX2):X˙T˙1(X)(\eta_{X_{1}},\eta_{X_{2}}):X\mathbin{\dot{\rightarrow}}\dot{T}1(X).

  3. 3.

    (f1,f2):X˙T˙m(Y)(f_{1},f_{2}):X\mathbin{\dot{\rightarrow}}\dot{T}m(Y) implies (f1,f2):T˙mX˙T˙(mm)Y(f^{\sharp}_{1},f^{\sharp}_{2}):\dot{T}m^{\prime}X\mathbin{\dot{\rightarrow}}\dot{T}(m\cdot m^{\prime})Y.

  4. 4.

    (θX1,Y1,θX2,Y2):X×˙T˙mY˙T˙m(X×˙Y)(\theta_{X_{1},Y_{1}},\theta_{X_{2},Y_{2}}):X\mathbin{\dot{\times}}\dot{T}mY\mathbin{\dot{\rightarrow}}\dot{T}m(X\mathbin{\dot{\times}}Y).

Our interest is in the graded strong relational lifting that carries the information of a given divergence Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}). We identify such liftings by the following fundamental property. First define the adjacency relation of Δ{\mathsf{\Delta}} by

Δ~(m,v)I(TI,TI,{(c1,c2)|ΔIm(c1,c2)v})(mM,v𝒬,I).\tilde{{\mathsf{\Delta}}}(m,v)I\triangleq(TI,TI,\{(c_{1},c_{2})~{}|~{}{\mathsf{\Delta}}^{m}_{I}(c_{1},c_{2})\leq v\})\quad(m\in M,v\in\mathcal{Q},I\in\mathbb{C}). (9)

Note that Δ~\tilde{\mathsf{\Delta}} is monotone on mm and vv.

Definition 16.

We say that an M×𝒬M\times\mathcal{Q}-graded strong relational lifting T˙\dot{T} of TT satisfies the fundamental property with respect to Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}) if the following holds:

T˙(m,v)(EI)=Δ~(m,v)I(mM,v𝒬,I).\dot{T}(m,v)(EI)=\tilde{\mathsf{\Delta}}(m,v)I\quad(m\in M,v\in\mathcal{Q},I\in\mathbb{C}).
Theorem 8.

Let (,T)(\mathbb{C},T) be a CC-SM, (M,,1,())(M,\leq,1,(\cdot)) be a grading monoid, 𝒬\mathcal{Q} be a divergence domain and Δ={ΔIm:(U(TI))2𝒬}mM,I{\mathsf{\Delta}}=\{{\mathsf{\Delta}}^{m}_{I}\colon(U(TI))^{2}\to\mathcal{Q}\}_{m\in M,I\in\mathbb{C}} be a doubly-indexed family of 𝒬\mathcal{Q}-divergences satisfying monotonicity on mm (Definition 6). Define the following mapping T[Δ]:(M×𝒬)×𝐎𝐛𝐣(𝐁𝐑𝐞𝐥())𝐎𝐛𝐣(𝐁𝐑𝐞𝐥())T^{[{\mathsf{\Delta}}]}:(M\times\mathcal{Q})\times{\bf Obj}({\bf BRel}(\mathbb{C}))\to{\bf Obj}({\bf BRel}(\mathbb{C})):

T[Δ](m,v)X(TX1,TX2,{(c1,c2)|\displaystyle T^{[{\mathsf{\Delta}}]}(m,v)X\triangleq(TX_{1},TX_{2},\{(c_{1},c_{2})~{}|~{} I,nM,w𝒬,(k1,k2):X˙Δ~(n,w)I.\displaystyle\forall{I\in\mathbb{C},n\in M,w\in\mathcal{Q},(k_{1},k_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I}~{}.~{}
(k1c1,k2c2)Δ~(mn,v+w)I})\displaystyle\quad(k_{1}^{\sharp}\mathbin{\bullet}c_{1},k_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n,v+w)I\})
  1. 1.

    The mapping T[Δ]T^{[{\mathsf{\Delta}}]} is an M×𝒬M\times\mathcal{Q}-graded strong relational lifting of TT.

  2. 2.

    Let E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\to{\bf BRel}(\mathbb{C}) be a basic endorelation. Then

    Δ{\mathsf{\Delta}} is EE-unit-reflexive I,(m,v)M×𝒬.T[Δ](m,v)(EI)Δ~(m,v)I\displaystyle\iff\forall{I\in\mathbb{C},(m,v)\in M\times\mathcal{Q}}~{}.~{}T^{[{\mathsf{\Delta}}]}(m,v)(EI)\leq\tilde{\mathsf{\Delta}}(m,v)I (S)
    Δ{\mathsf{\Delta}} is EE-composable I,(m,v)M×𝒬.T[Δ](m,v)(EI)Δ~(m,v)I.\displaystyle\iff\forall{I\in\mathbb{C},(m,v)\in M\times\mathcal{Q}}~{}.~{}T^{[{\mathsf{\Delta}}]}(m,v)(EI)\geq\tilde{\mathsf{\Delta}}(m,v)I. (C)

The construction of T[Δ]T^{[{\mathsf{\Delta}}]} is a graded extension of the codensity lifting ([51, 33]). The remainder of this section is devoted to the proof of Theorem 8.

Proof.

(Proof of (1)) Proving conditions 1-3 of graded strong relational lifting (Definition 15) is a routine generalization of [33] and [31, Section 5]; it is thus omitted here (see Lemma 4 in the appendix).

However, condition 4 of Definition 15 needs special attention because in general the codensity lifting does not automatically lift the strength. The current setting works because of our particular choice of the category of binary relations over \mathbb{C}. We prove condition 4 as follows. Since fij=fi,jf_{i}\mathbin{\bullet}j=f\mathbin{\bullet}\langle i,j\rangle holds for any jUJj\in UJ, we have the equivalence

(f,g):X×˙Y˙Z\displaystyle(f,g):X\mathbin{\dot{\times}}Y\mathbin{\dot{\rightarrow}}Z (x,x)X,(y,y)Y.(fx,y,gx,y)Z\displaystyle\iff\forall(x,x^{\prime})\in X,(y,y^{\prime})\in Y.(f\mathbin{\bullet}{\langle x,y\rangle},g\mathbin{\bullet}{\langle x^{\prime},y^{\prime}\rangle})\in Z
(x,x)X,(y,y)Y.((fx)y,(gx)y)Z\displaystyle\iff\forall(x,x^{\prime})\in X,(y,y^{\prime})\in Y.\left(\left(f_{x}\right)\mathbin{\bullet}{y},\left(g_{x^{\prime}}\right)\mathbin{\bullet}{y^{\prime}}\right)\in Z
(x,x)X.(fx,gx):Y˙Z.\displaystyle\iff\forall(x,x^{\prime})\in X.(f_{x},g_{x^{\prime}})\colon Y\mathbin{\dot{\rightarrow}}Z.

From this, condition 3 (law of graded Kleisli extension), and the equation (1) on the strength of a CC-SM, we prove condition 4 from condition 2 (unit law): for all mMm\in M and v𝒬v\in\mathcal{Q}, we have

(ηX1×Y1,ηX2×Y2):X×˙Y˙T[Δ](1,0)(X×˙Y)\displaystyle(\eta_{X_{1}\times Y_{1}},\eta_{X_{2}\times Y_{2}})\colon X\mathbin{\dot{\times}}Y\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(1,0)(X\mathbin{\dot{\times}}Y)
(x,x)X.((ηX1×Y1)x,(ηX2×Y2)x):Y˙T[Δ](1,0)(X×˙Y)\displaystyle\iff\forall{(x,x^{\prime})\in X}~{}.~{}((\eta_{X_{1}\times Y_{1}})_{x},(\eta_{X_{2}\times Y_{2}})_{x^{\prime}})\colon Y\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(1,0)(X\mathbin{\dot{\times}}Y)
(x,x)X.(((ηX1×Y1)x),((ηX2×Y2)x)):T[Δ](m,v)Y˙T[Δ](m,v)(X×˙Y)\displaystyle\implies\forall{(x,x^{\prime})\in X}~{}.~{}(((\eta_{X_{1}\times Y_{1}})_{x})^{\sharp},((\eta_{X_{2}\times Y_{2}})_{x^{\prime}})^{\sharp}):T^{[{\mathsf{\Delta}}]}(m,v)Y\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)(X\mathbin{\dot{\times}}Y)
((x,x)X,(c1,c2)T[Δ](m,v)Y.(((ηX1×Y1)x)c1,((ηX2×Y2)x)c2)T[Δ](m,v)(X×˙Y))\displaystyle\iff\left(\begin{aligned} &\forall{(x,x^{\prime})\in X,(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(m,v)Y}~{}.~{}\\ &\qquad(((\eta_{X_{1}\times Y_{1}})_{x})^{\sharp}\mathbin{\bullet}c_{1},((\eta_{X_{2}\times Y_{2}})_{x^{\prime}})^{\sharp}\mathbin{\bullet}c_{2})\in T^{[{\mathsf{\Delta}}]}(m,v)(X\mathbin{\dot{\times}}Y)\end{aligned}\right)
((x,x)X,(c1,c2)T[Δ](m,v)Y.(θX1,Y1x,c1,θX2,Y2x,c2)T[Δ](m,v)(X×˙Y))\displaystyle\iff\left(\begin{aligned} &\forall{(x,x^{\prime})\in X,(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(m,v)Y}~{}.~{}\\ &\qquad(\theta_{X_{1},Y_{1}}\mathbin{\bullet}\langle x,c_{1}\rangle,\theta_{X_{2},Y_{2}}\mathbin{\bullet}\langle x^{\prime},c_{2}\rangle)\in T^{[{\mathsf{\Delta}}]}(m,v)(X\mathbin{\dot{\times}}Y)\end{aligned}\right)
(x,x)X.((θX1,Y1)x,(θX2,Y2)x):T[Δ](m,v)Y˙T[Δ](m,v)(X×˙Y)\displaystyle\iff\forall{(x,x^{\prime})\in X}~{}.~{}((\theta_{X_{1},Y_{1}})_{x},(\theta_{X_{2},Y_{2}})_{x^{\prime}})\colon T^{[{\mathsf{\Delta}}]}(m,v)Y\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)(X\mathbin{\dot{\times}}Y)
(θX1,Y1,θX2,Y2):X×˙T[Δ](m,v)Y˙T[Δ](m,v)(X×˙Y).\displaystyle\iff(\theta_{X_{1},Y_{1}},\theta_{X_{2},Y_{2}})\colon X\mathbin{\dot{\times}}T^{[{\mathsf{\Delta}}]}(m,v)Y\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)(X\mathbin{\dot{\times}}Y).

(Proof of (2)-(S)) We show the equivalence of Δ{\mathsf{\Delta}} being EE-unit-reflexive and the implication

I,mM,v𝒬,c,cU(TI).\displaystyle\forall{I\in\mathbb{C},m\in M,v\in\mathcal{Q},c,c^{\prime}\in U(TI)}~{}.~{}
(J,mM,v𝒬,(k,l):EI˙Δ~(m,v)J.ΔJmm(kc,lc)v+v)\displaystyle\qquad(\forall{J\in\mathbb{C},m^{\prime}\in M,v^{\prime}\in\mathcal{Q},(k,l):EI\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m^{\prime},v^{\prime})J}~{}.~{}{\mathsf{\Delta}}^{m\cdot m^{\prime}}_{J}(k^{\sharp}\mathbin{\bullet}c,l^{\sharp}\mathbin{\bullet}{c^{\prime}})\leq v+v^{\prime}) (10)
ΔIm(c,c)v.\displaystyle\qquad\qquad\implies{\mathsf{\Delta}}^{m}_{I}(c,c^{\prime})\leq v.

We suppose that the above implication holds. We fix II\in\mathbb{C}. Let (i,j)EI(i,j)\in EI. By instantiating the whole implication with m=1,v=0,c=ηIi,c=ηIjm=1,v=0,c=\eta_{I}\mathbin{\bullet}i,c^{\prime}=\eta_{I}\mathbin{\bullet}j, the middle part of (10) becomes

J,mM,v𝒬,(k,l):EI˙Δ~(m,v)J.ΔJm(ki,lj)v,\forall{J\in\mathbb{C},m^{\prime}\in M,v^{\prime}\in\mathcal{Q},(k,l):EI\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m^{\prime},v^{\prime})J}~{}.~{}{\mathsf{\Delta}}^{m^{\prime}}_{J}(k\mathbin{\bullet}i,l\mathbin{\bullet}j)\leq v^{\prime},

which is trivially true. Therefore we conclude ΔI1(ηIi,ηIj)0{\mathsf{\Delta}}^{1}_{I}(\eta_{I}\mathbin{\bullet}i,\eta_{I}\mathbin{\bullet}j)\leq 0 for any (i,j)EI(i,j)\in EI, that is, EE-unit reflexivity holds.

Conversely, we suppose that Δ{\mathsf{\Delta}} satisfies the unit-reflexivity. We take I,m,v,c,cI,m,v,c,c^{\prime} of appropriate type and assume the middle part of (10). By instantiating it with J=I,m=1,v=0,k=l=ηIJ=I,m^{\prime}=1,v^{\prime}=0,k=l=\eta_{I}, we conclude ΔIm(c,c)v{\mathsf{\Delta}}^{m}_{I}(c,c^{\prime})\leq v.

(Proof of (2)-(C)) We show the equivalence of Δ{\mathsf{\Delta}} being EE-composable and the condition I,mM,v𝒬.Δ~(m,v)IT[Δ](m,v)(EI)\forall{I\in\mathbb{C},m\in M,v\in\mathcal{Q}}~{}.~{}\tilde{\mathsf{\Delta}}(m,v)I\leq T^{[{\mathsf{\Delta}}]}(m,v)(EI) as follows:

I,mM,v𝒬.Δ~(m,v)IT[Δ](m,v)(EI)\displaystyle\forall{I\in\mathbb{C},m\in M,v\in\mathcal{Q}}~{}.~{}\tilde{\mathsf{\Delta}}(m,v)I\leq T^{[{\mathsf{\Delta}}]}(m,v)(EI)
(I,mM,v𝒬,c,cU(TI).ΔIm(c,c)vJ,mM,v𝒬,(k,l):EI˙Δ~(m,v)J.(kc,lc)Δ~(mm,v+v)J)\displaystyle\iff\left(\begin{aligned} &\forall{I\in\mathbb{C},m\in M,v\in\mathcal{Q},c,c^{\prime}\in U(TI)}~{}.~{}\\ &\qquad{\mathsf{\Delta}}^{m}_{I}(c,c^{\prime})\leq v\implies\\ &\qquad\qquad\forall{J\in\mathbb{C},m^{\prime}\in M,v^{\prime}\in\mathcal{Q},(k,l)\colon EI\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m^{\prime},v^{\prime})J}~{}.~{}\\ &\qquad\qquad\qquad(k^{\sharp}\mathbin{\bullet}c,l^{\sharp}\mathbin{\bullet}c^{\prime})\in\tilde{\mathsf{\Delta}}(m\cdot m^{\prime},v+v^{\prime})J\end{aligned}\right)
(I,J,mM,v𝒬,c,cU(TI),mM,v𝒬,k,l(I,TJ).ΔIm(c,c)v((i,j)EI.(ki,lj)Δ~(m,v)J)ΔJmm(kc,lc)v+v)\displaystyle\iff\left(\begin{aligned} &\forall{I,J\in\mathbb{C},m\in M,v\in\mathcal{Q},c,c^{\prime}\in U(TI),m^{\prime}\in M,v^{\prime}\in\mathcal{Q},k,l\in\mathbb{C}(I,TJ)}~{}.~{}\\ &\qquad{\mathsf{\Delta}}^{m}_{I}(c,c^{\prime})\leq v\implies\\ &\qquad\qquad(\forall{(i,j)\in EI}~{}.~{}(k\mathbin{\bullet}i,l\mathbin{\bullet}j)\in\tilde{\mathsf{\Delta}}(m^{\prime},v^{\prime})J)\implies{\mathsf{\Delta}}^{m\cdot m^{\prime}}_{J}(k^{\sharp}\mathbin{\bullet}c,l^{\sharp}\mathbin{\bullet}c^{\prime})\leq v+v^{\prime}\end{aligned}\right)
(I,J,mM,v𝒬,c,cU(TI),mM,v𝒬,k,l(I,TJ).ΔIm(c,c)vsup(i,j)EIΔJm(ki,lj)vΔJmm(kc,lc)v+v)\displaystyle\iff\left(\begin{aligned} &\forall{I,J\in\mathbb{C},m\in M,v\in\mathcal{Q},c,c^{\prime}\in U(TI),m^{\prime}\in M,v^{\prime}\in\mathcal{Q},k,l\in\mathbb{C}(I,TJ)}~{}.~{}\\ &\qquad{\mathsf{\Delta}}^{m}_{I}(c,c^{\prime})\leq v\implies\\ &\qquad\qquad\textstyle\sup_{(i,j)\in EI}{\mathsf{\Delta}}^{m^{\prime}}_{J}(k\mathbin{\bullet}i,l\mathbin{\bullet}j)\leq v^{\prime}\implies{\mathsf{\Delta}}^{m\cdot m^{\prime}}_{J}(k^{\sharp}\mathbin{\bullet}c,l^{\sharp}\mathbin{\bullet}{c^{\prime}})\leq v+v^{\prime}\end{aligned}\right)
(I,J,mM,c,cU(TI),mM,k,l(I,TJ).ΔJmm(kc,lc)ΔIm(c,c)+sup(i,j)EIΔJm(ki,lj).).\displaystyle\iff\left(\begin{aligned} &\forall{I,J\in\mathbb{C},m\in M,c,c^{\prime}\in U(TI),m^{\prime}\in M,k,l\in\mathbb{C}(I,TJ)}~{}.~{}\\ &\qquad{\mathsf{\Delta}}^{m\cdot m^{\prime}}_{J}(k^{\sharp}\mathbin{\bullet}c,l^{\sharp}\mathbin{\bullet}{c^{\prime}})\leq{\mathsf{\Delta}}^{m}_{I}(c,c^{\prime})+\textstyle\sup_{(i,j)\in EI}{\mathsf{\Delta}}^{m^{\prime}}_{J}(k\mathbin{\bullet}i,l\mathbin{\bullet}j).\end{aligned}\right).

The first two equivalences are obtained by expanding the definitions of 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}), T[Δ]T^{[{\mathsf{\Delta}}]}{} and Δ~\tilde{\mathsf{\Delta}}; the last two equivalences hold because 𝒬\mathcal{Q} is a divergence domain. ∎

Combining the fundamental property and the strength of T[Δ]T^{[{\mathsf{\Delta}}]}, we recover a strength law of divergences.

Proposition 13.

Let (,T)(\mathbb{C},T) be a CC-SM, E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\to{\bf BRel}(\mathbb{C}) be a basic endorelation, (M,,1,())(M,\leq,1,(\cdot)) be a grading monoid and 𝒬\mathcal{Q} be a divergence domain. Suppose also that EI×˙EJE(I×J)EI\mathbin{\dot{\times}}EJ\subseteq E(I\times J) holds for all I,JI,J\in\mathbb{C}. Then each divergence Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}) satisfies: for all mMm\in M, (x1,x2)EI(x_{1},x_{2})\in EI and c1,c2U(TJ)c_{1},c_{2}\in U(TJ),

ΔI×Jm(θI,Jx1,c1,θI,Jx2,c2)ΔJm(c1,c2).{\mathsf{\Delta}}^{m}_{I\times J}(\theta_{I,J}\mathbin{\bullet}\langle x_{1},c_{1}\rangle,\theta_{I,J}\mathbin{\bullet}\langle x_{2},c_{2}\rangle)\leq{\mathsf{\Delta}}^{m}_{J}(c_{1},c_{2}).

7.1 Simplifying Codensity Liftings by Ω\Omega-Generatedness of Divergences

We here show that for an Ω\Omega-generated divergence Δ{\mathsf{\Delta}}, the calculation of the codensity lifting T[Δ]T^{[{\mathsf{\Delta}}]}{} can be simplified. For an object II\in\mathbb{C}, we define T[Δ],IT^{[{\mathsf{\Delta}}],I} by

(c1,c2)T[Δ],I(m,v)X\displaystyle(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}],I}(m,v)X
n,w,(k1,k2):X˙Δ~(n,w)I.(k1c1,k2c2)Δ~(mn,v+w)I.\displaystyle\iff\forall{n,w,(k_{1},k_{2})\colon X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I}~{}.~{}(k_{1}^{\sharp}\mathbin{\bullet}c_{1},k_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n,v+w)I.

The original calculation of T[Δ]T^{[{\mathsf{\Delta}}]}{} is a large intersection T[Δ]=IT[Δ],IT^{[{\mathsf{\Delta}}]}=\bigwedge_{I\in\mathbb{C}}T^{[{\mathsf{\Delta}}],I} where II runs over all \mathbb{C}-objects, but if Δ{\mathsf{\Delta}} is Ω\Omega-generated, the parameter II can be fixed at Ω\Omega.

Proposition 14.

For any Ω\Omega-generated divergence Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}), we have T[Δ]=T[Δ],ΩT^{[{\mathsf{\Delta}}]}{}=T^{[{\mathsf{\Delta}}],\Omega}.

Proof.

We show the equality T[Δ]X=T[Δ],ΩXT^{[{\mathsf{\Delta}}]}X=T^{[{\mathsf{\Delta}}],\Omega}X for each X𝐁𝐑𝐞𝐥()X\in{\bf BRel}(\mathbb{C}).

(\supseteq) Immediate from T[Δ]=IT[Δ],IT^{[{\mathsf{\Delta}}]}=\bigwedge_{I\in\mathbb{C}}T^{[{\mathsf{\Delta}}],I}.

(\subseteq) By the Ω\Omega-generatedness of Δ{\mathsf{\Delta}}, we have for all II\in\mathbb{C} and c1,c2U(TI)c^{\prime}_{1},c^{\prime}_{2}\in U(TI),

(c1,c2)Δ~(m,v)Ik:ITΩ.(kc1,kc2)Δ~(m,v)Ω(c^{\prime}_{1},c^{\prime}_{2})\in\tilde{\mathsf{\Delta}}(m^{\prime},v^{\prime})I\iff\forall{k\colon I\to T\Omega}~{}.~{}(k^{\sharp}\mathbin{\bullet}c^{\prime}_{1},k^{\sharp}\mathbin{\bullet}c^{\prime}_{2})\in\tilde{\mathsf{\Delta}}(m^{\prime},v^{\prime})\Omega

Therefore, for any (c1,c2)U(TX1)×U(TX2)(c_{1},c_{2})\in U(TX_{1})\times U(TX_{2}), we have

(c1,c2)T[Δ],ΩX\displaystyle(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}],\Omega}X
nM,w𝒬,(k1,k2):X˙Δ~(n,w)Ω.(k1c1,k2c2)Δ~(mn,v+w)Ω\displaystyle\iff\forall{n\in M,w\in\mathcal{Q},(k_{1},k_{2})\colon X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)\Omega}~{}.~{}(k_{1}^{\sharp}\mathbin{\bullet}c_{1},k_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n,v+w)\Omega
(I,nM,w𝒬,(l1,l2):X˙Δ~(n,w)I,k:ITΩ.(kl1c1,kl2c2)Δ~(mn,v+w)Ω)\displaystyle\implies\left(\begin{aligned} \forall{I\in\mathbb{C},n\in M,w\in\mathcal{Q},(l_{1},l_{2})\colon X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I,k\colon I\to T\Omega}~{}.~{}\\ \qquad(k^{\sharp}\circ l_{1}^{\sharp}\mathbin{\bullet}c_{1},k^{\sharp}\circ l_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n,v+w)\Omega\end{aligned}\right)
I,nM,w𝒬,(l1,l2):X˙Δ~(n,w)I.(l1c1,l2c2)Δ~(mn,v+w)I\displaystyle\iff\forall{I\in\mathbb{C},n\in M,w\in\mathcal{Q},(l_{1},l_{2})\colon X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I}~{}.~{}(l_{1}^{\sharp}\mathbin{\bullet}c_{1},l_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n,v+w)I
(c1,c2)T[Δ]X.\displaystyle\iff(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}X.

This completes the proof. ∎

For example, the generatedness of 𝖣𝖯\mathsf{DP} shown in Section 6.2 implies that G[𝖣𝖯]=G[𝖣𝖯],2G^{[\mathsf{DP}]}=G^{[\mathsf{DP}],2} and Gs[𝖣𝖯]=Gs[𝖣𝖯],1G_{s}^{[\mathsf{DP}]}=G_{s}^{[\mathsf{DP}],1}. In fact, the simplification Gs[𝖣𝖯],1G_{s}^{[\mathsf{DP}],1} is equal to the (+)2(\mathcal{R}^{+})^{2}-graded relational lifting GsG_{s}^{\top\top} for DP given in [51, Section 2.2], which is defined by, for each (X1,X2,RX)𝐁𝐑𝐞𝐥(𝐌𝐞𝐚𝐬)(X_{1},X_{2},R_{X})\in{\bf BRel}({\bf Meas}),

Gs(ε,δ)(X1,X2,RX)\displaystyle G_{s}^{{\top\top}}(\varepsilon,\delta)(X_{1},X_{2},R_{X})
(Gs(X1),Gs(X2),{(ν1,ν2)AΣX1,BΣX2.RX(A)Bν1(A)exp(ε)ν2(B)+δ}).\displaystyle\triangleq(G_{s}(X_{1}),G_{s}(X_{2}),\{(\nu_{1},\nu_{2})\mid\forall{A\in\Sigma_{X_{1}},B\in\Sigma_{X_{2}}}~{}.~{}R_{X}(A)\subseteq B\implies\nu_{1}(A)\leq\exp(\varepsilon)\nu_{2}(B)+\delta\}).

For details, see the proof of equalities (†) and (‡) in the proof of [51, Theorem 2.2(iv)].

7.2 Two Lifting Approaches: Codensity and Coupling

We briefly compare two lifting approaches: graded codensity lifting and coupling-based lifting employed in ([12, 13, 8, 9, 52]).

We compare the role of the unit-reflexivity and composability in the codensity graded lifting and the coupling-based graded lifting. Consider the CCC-SM (𝐒𝐞𝐭,D)({\bf Set},D), where DD is the probability distribution monad. Given an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative MM-graded 𝒬\mathcal{Q}-divergence Δ{\mathsf{\Delta}} on DD, the coupling-based graded lifting is defined by

D˙Δ(m,v)X{(Dp1μ1,Dp2μ2)|(μ1,μ2)(DRX)2,ΔRXm(μ1,μ2)v}\dot{D}^{{\mathsf{\Delta}}}(m,v)X\triangleq\{(Dp_{1}\mathbin{\bullet}\mu_{1},Dp_{2}\mathbin{\bullet}\mu_{2})~{}|~{}(\mu_{1},\mu_{2})\in(DR_{X})^{2},{\mathsf{\Delta}}^{m}_{R_{X}}(\mu_{1},\mu_{2})\leq v\} (11)

where pi:RXXip_{i}\colon R_{X}\rightarrow X_{i} is the projection (i=1,2i=1,2) from the binary relation. The pair (μ1,μ2)(\mu_{1},\mu_{2}) of probability distributions collected in the right hand side of (11) is called a coupling.
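
A small Haskell sketch of (11) in the finite-support case (all names are ours): a coupling is a finite distribution over pairs in RXR_{X}, and its two marginals constitute an element of the lifted relation, graded by whatever divergence bound the coupling itself satisfies.

    -- A minimal sketch of a coupling and its marginals; finite support, names are ours.
    type Dist a = [(a, Double)]

    dmap :: (a -> b) -> Dist a -> Dist b      -- the action D f (pushforward)
    dmap f = map (\(x, p) -> (f x, p))

    -- a coupling supported on the relation {(0,0), (1,1), (1,2)}
    coupling :: Dist (Int, Int)
    coupling = [((0, 0), 0.5), ((1, 1), 0.3), ((1, 2), 0.2)]

    -- its marginals (D p1 . mu, D p2 . mu) form an element of the lifting (11)
    marginals :: (Dist Int, Dist Int)
    marginals = (dmap fst coupling, dmap snd coupling)

    main :: IO ()
    main = print marginals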

The fundamental property D˙Δ(m,v)(EqI)=Δ~(m,v)I\dot{D}^{\mathsf{\Delta}}(m,v)({\color[rgb]{0,0,0}\mathrm{Eq}}I)=\tilde{\mathsf{\Delta}}(m,v)I immediately follows from the definition of D˙Δ\dot{D}^{\mathsf{\Delta}}, while the composability and unit-reflexivity of Δ{\mathsf{\Delta}} are used to make D˙Δ\dot{D}^{{\mathsf{\Delta}}} a strong M×𝒬M\times\mathcal{Q}-graded lifting [13, Proposition 9]. On the other hand, the codensity graded lifting D[Δ]D^{[{\mathsf{\Delta}}]} is always an M×𝒬M\times\mathcal{Q}-graded lifting; this does not rely on the unit-reflexivity and composability of Δ{\mathsf{\Delta}} (Proposition 1). These properties are used to show that D[Δ]D^{[{\mathsf{\Delta}}]} satisfies the fundamental property (Proposition 2).

The coupling-based lifting (11) can be naturally generalized to any 𝐒𝐞𝐭{\bf Set}-monad TT. However, at this moment we do not know how to generalize the coupling technique to any CC-SM (,T)(\mathbb{C},T). As the prior study by [52] pointed out, there is already a difficulty in extending it to the CC-SM (𝐌𝐞𝐚𝐬,G)({\bf Meas},G).

We illustrate how the problem arises. Let X𝐁𝐑𝐞𝐥(𝐌𝐞𝐚𝐬)X\in{\bf BRel}({\bf Meas}). We would like to pick two probability measures over RXR_{X} as couplings, but RXR_{X} is merely a set. We therefore equip it with the subspace σ\sigma-algebra of X1×X2X_{1}\times X_{2}, and let HXH_{X} be the derived measurable space (hence |HX|=RX|H_{X}|=R_{X}). We write pi:HXXip_{i}:H_{X}\rightarrow X_{i} for measurable projections (i=1,2i=1,2). We then define a candidate M×𝒬M\times\mathcal{Q}-graded lifting of GG by

G˙(m,v)X={(Gp1μ1,Gp2μ2)|(μ1,μ2)(UGHX)2,ΔHXm(μ1,μ2)v}.\dot{G}(m,v)X=\{(Gp_{1}\mathbin{\bullet}\mu_{1},Gp_{2}\mathbin{\bullet}\mu_{2})~{}|~{}(\mu_{1},\mu_{2})\in(UGH_{X})^{2},{\mathsf{\Delta}}^{m}_{H_{X}}(\mu_{1},\mu_{2})\leq v\}.

We now try to verify that G˙\dot{G} also lifts the Kleisli extension of GG, that is,

(f,g):Y˙G˙(m,v)X(f,g):G˙(m,v)Y˙G˙(mm,v+v)X.(f,g)\colon Y\mathbin{\dot{\rightarrow}}\dot{G}(m^{\prime},v^{\prime})X\implies(f^{\sharp},g^{\sharp})\colon\dot{G}(m,v)Y\mathbin{\dot{\rightarrow}}\dot{G}(mm^{\prime},v+v^{\prime})X.

Let (f,g):Y˙G˙(m,v)X(f,g)\colon Y\mathbin{\dot{\rightarrow}}\dot{G}(m^{\prime},v^{\prime})X be a pair of measurable functions. Then for each (x,y)RY(x,y)\in R_{Y}, we have (fx,gy)RG˙(m,v)X(f\mathbin{\bullet}x,g\mathbin{\bullet}y)\in R_{\dot{G}(m^{\prime},v^{\prime})X}. Therefore there exists (μ1(x,y),μ2(x,y))(UGHX)2(\mu^{(x,y)}_{1},\mu^{(x,y)}_{2})\in(UGH_{X})^{2} such that Gp1μ1(x,y)=fxGp_{1}\mathbin{\bullet}\mu^{(x,y)}_{1}=f\mathbin{\bullet}x, Gp2μ2(x,y)=gyGp_{2}\mathbin{\bullet}\mu^{(x,y)}_{2}=g\mathbin{\bullet}y and ΔHXm(μ1(x,y),μ2(x,y))v{\mathsf{\Delta}}^{m^{\prime}}_{H_{X}}(\mu^{(x,y)}_{1},\mu^{(x,y)}_{2})\leq v^{\prime}. Using the axiom of choice, we turn this relationship into functions μ1,μ2:RYUGHX\mu_{1},\mu_{2}\colon R_{Y}\to UGH_{X}. If they were measurable functions of type HYGHXH_{Y}\rightarrow GH_{X}, then from the composability of Δ{\mathsf{\Delta}}, we would have ΔHXmm(μ1w1,μ2w2)v+v{\mathsf{\Delta}}^{mm^{\prime}}_{H_{X}}(\mu^{\sharp}_{1}\mathbin{\bullet}w_{1},\mu^{\sharp}_{2}\mathbin{\bullet}w_{2})\leq v+v^{\prime} for all w1,w2UGHYw_{1},w_{2}\in UGH_{Y} such that ΔHYm(w1,w2)v{\mathsf{\Delta}}^{m}_{H_{Y}}(w_{1},w_{2})\leq v. This gives (f,g):G˙(m,v)Y˙G˙(mm,v+v)X(f^{\sharp},g^{\sharp})\colon\dot{G}(m,v)Y\mathbin{\dot{\rightarrow}}\dot{G}(mm^{\prime},v+v^{\prime})X. However, in general, ensuring the measurability of μ1,μ2\mu_{1},\mu_{2} is not possible, especially because they are obtained by the axiom of choice. A solution given in [52] is to use the category 𝐒𝐩𝐚𝐧(𝐌𝐞𝐚𝐬){\bf Span}({\bf Meas}) of spans, which guarantees the existence of good measurable functions h1,h2:HYGHXh_{1},h_{2}\colon H_{Y}\rightarrow GH_{X}.

8 Approximate Computational Relational Logic

We introduce a program logic called approximate computational relational logic (acRL for short). It is a combination of Moggi's computational metalanguage and a relational refinement type system ([9]). The strong graded relational lifting of a monad constructed from a divergence will be used to relationally interpret monadic types, and gradings give upper bounds of divergences between computational effects caused by two programs. acRL is similar to the relational refinement type system HOARe2 ([9]), which is designed for verifying differential privacy of probabilistic programs. Compared to HOARe2, acRL supports general monads and divergences, while it does not support dependent products or non-termination.

The relational logic acRL adopts the extensional approach (cf. [44, Chapter 9.2]):

  • Relational assertions between contexts Γ\Gamma and Δ\Delta are defined as binary relations between U[[Γ]]U[\![\Gamma]\!] and U[[Δ]]U[\![\Delta]\!], or equivalently 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C})-objects ϕ\phi such that p(ϕ)=([[Γ]],[[Δ]])p_{\mathbb{C}}(\phi)=([\![\Gamma]\!],[\![\Delta]\!]). Logical connectives and quantifications are defined as operations on such 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C})-objects. This is in contrast to the standard design of logic where assertions are defined by a BNF.

  • Let \Gamma\vdash M:\tau and \Delta\vdash N:\sigma be well-typed terms, \phi be a relational assertion between \Gamma,\Delta, and \psi be an assertion between \tau,\sigma. The main concern of acRL is the statement “\forall(\gamma,\delta)\in\phi.([\![M]\!]\mathbin{\bullet}\gamma,[\![N]\!]\mathbin{\bullet}\delta)\in\psi” (equivalently ([\![M]\!],[\![N]\!]):\phi\mathbin{\dot{\rightarrow}}\psi). In this section we denote this statement by \phi\vdash(M,N):\psi.

  • Inference rules of the logic consist of facts about the statement \phi\vdash(M,N):\psi. We remark that, in the standard design of logic, proving these facts corresponds to proving the soundness of inference rules.

8.1 Moggi’s Computational Metalanguage

Figure 2: Syntax of Types and Raw Terms of the Computational Metalanguage
𝐓𝐲𝐩(B)τ::=\displaystyle{\bf Typ}(B)\ni\tau\mathbin{::=} b1τ×τ0τ+τττ𝚃τ(bB)\displaystyle b\mid 1\mid\tau\times\tau\mid 0\mid\tau+\tau\mid\tau\Rightarrow\tau\mid\mathtt{T}\tau\quad(b\in B)
M::=\displaystyle M\mathbin{::=} xo(M)c(M)()(M,M)π1(M)π2(M)(oOv,cOe)\displaystyle x\mid o(M)\mid c(M)\mid()\mid(M,M)\mid\pi_{1}(M)\mid\pi_{2}(M)\quad(o\in O_{v},c\in O_{e})
\displaystyle\mid ι1(M)ι2(M)M𝚠𝚒𝚝𝚑ι1(x:τ).M ι2(x:τ).M\displaystyle\iota_{1}(M)\mid\iota_{2}(M)\mid M\mathbin{\mathtt{with}}\iota_{1}(x:\tau).M\mathbin{\rule{0.4pt}{4.30554pt}}\iota_{2}(x:\tau).M
\displaystyle\mid (λx:τ.M)(MM)𝚛𝚎𝚝(M)𝚕𝚎𝚝x:τ=M𝚒𝚗M\displaystyle(\lambda{x:\tau}~{}.~{}M)\mid(MM)\mid\mathop{\mathtt{ret}}(M)\mid\mathop{\mathtt{let}}{x:\tau=M}\mathbin{\mathtt{in}}M

8.1.1 Syntax of the Computational Metalanguage

For the higher-order programming language, we adopt Moggi’s computational metalanguage ([42]). It is an extension of the simply typed lambda calculus with monadic types. For a set BB, we define the set 𝐓𝐲𝐩(B){\bf Typ}(B) of types over BB by the first BNF in Figure 2. We then define the set 𝐓𝐲𝐩1(B){\bf Typ}_{1}(B) of first-order types to be the subset of 𝐓𝐲𝐩(B){\bf Typ}(B) consisting only of b,1,×,+b,1,\times,+.

We next introduce computational signatures for specifying constants in the computational metalanguage. A computational signature is a tuple (B,Σv,Σe)(B,\Sigma_{v},\Sigma_{e}) where BB is a set of base types, and Σv\Sigma_{v} and Σe\Sigma_{e} are functions whose range is 𝐓𝐲𝐩1(B)2{\bf Typ}_{1}(B)^{2}. The domains of Σv,Σe\Sigma_{v},\Sigma_{e} are sets of value operation symbols and effectful operation symbols, and are denoted by Ov,OeO_{v},O_{e}, respectively. These functions assign input and output types to these operations.
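For illustration, a computational signature can be recorded as plain data. The following minimal Python sketch is purely illustrative; the operation names are hypothetical, except for norm and lap, which anticipate the signature used in Section 9.

# A toy encoding of a computational signature (B, Sigma_v, Sigma_e).
base_types = {"R"}                       # B

sigma_v = {                              # value operation symbols O_v
    "add": ("R * R", "R"),               # Sigma_v(add) = (R x R, R)
}

sigma_e = {                              # effectful operation symbols O_e
    "norm": ("R * R", "R"),              # Gaussian sampling; c(M) then has type T R
    "lap":  ("R * R", "R"),              # Laplace sampling; c(M) then has type T R
}

def typing(signature, op):
    """Return the (input type, output type) assigned to an operation symbol."""
    return signature[op]

print(typing(sigma_e, "lap"))            # ('R * R', 'R')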

Fix a countably infinite set VV of variables. A context is a function from a finite subset of VV to 𝐓𝐲𝐩(B){\bf Typ}(B); contexts are often denoted by capital Greek letters Γ,Δ\Gamma,\Delta. For contexts Γ,Δ\Gamma,\Delta such that dom(Γ)dom(Δ)={\rm dom}(\Gamma)\cap{\rm dom}(\Delta)=\emptyset, by Γ,Δ\Gamma,\Delta we mean the join of Γ\Gamma and Δ\Delta.

The set of raw terms is defined by the second BNF in Figure 2. The type system of the computational metalanguage has judgments of the form ΓM:τ\Gamma\vdash M:\tau where Γ\Gamma is a context, MM a raw term and τ\tau a type. It adopts the standard rules for products, coproducts, implications and monadic types; see e.g. [42]. The typing rules for value operations and effectful operations are given by

\dfrac{o\in O_{v}\quad \Sigma_{v}(o)=(b,b^{\prime})\quad \Gamma\vdash M:b}{\Gamma\vdash o(M):b^{\prime}}\qquad\dfrac{c\in O_{e}\quad \Sigma_{e}(c)=(b,b^{\prime})\quad \Gamma\vdash M:b}{\Gamma\vdash c(M):\mathtt{T}b^{\prime}}

A simultaneous substitution from Γ\Gamma to Γ\Gamma^{\prime} is a function θ\theta from the set dom(Γ){\rm dom}(\Gamma^{\prime}) of variables to raw terms such that the well-typedness Γθ(x):Γ(x)\Gamma\vdash\theta(x):\Gamma^{\prime}(x) holds for each xdom(Γ)x\in{\rm dom}(\Gamma^{\prime}). The application of θ\theta to a term ΓM:τ\Gamma^{\prime}\vdash M:\tau is denoted by MθM\theta, which has a typing ΓMθ:τ\Gamma\vdash M\theta:\tau. For disjoint contexts Γi\Gamma_{i} (i=1,2i=1,2), we define the projection substitutions Γ1,Γ2πiΓ1,Γ2:Γi\Gamma_{1},\Gamma_{2}\vdash\pi_{i}^{\Gamma_{1},\Gamma_{2}}:\Gamma_{i} by πiΓ1,Γ2(x)=x\pi_{i}^{\Gamma_{1},\Gamma_{2}}(x)=x.

8.1.2 Categorical Semantics of the Computational Metalanguage

Figure 3: Data for the Categorical Semantics of Metalanguage
  1. (\mathbb{C},T) is a CCC-SM and \mathbb{C} has finite coproducts.

  2. [\![b]\!]\in\mathbb{C} for each b\in B.

  3. [\![o]\!]:[\![b]\!]\to[\![b^{\prime}]\!] for each o\in O_{v} such that \Sigma_{v}(o)=(b,b^{\prime}).

  4. [\![c]\!]:[\![b]\!]\to T[\![b^{\prime}]\!] for each c\in O_{e} such that \Sigma_{e}(c)=(b,b^{\prime}).

The interpretation of the computational metalanguage over a computational signature (B,Σv,Σe)(B,\Sigma_{v},\Sigma_{e}) is given by the data specified by Figure 3.

We first inductively extend the interpretation of base types to all types using the bi-Cartesian closed structure and the monad. Next, for each context Γ\Gamma, we fix a product diagram ([[Γ]],{πx:[[Γ]][[Γ(x)]]}xdom(Γ))([\![\Gamma]\!],\{\pi_{x}:[\![\Gamma]\!]\to[\![\Gamma(x)]\!]\}_{x\in{\rm dom}(\Gamma)}); when dom(Γ)={x}{\rm dom}(\Gamma)=\{x\}, we assume that [[Γ]]=[[Γ(x)]][\![\Gamma]\!]=[\![\Gamma(x)]\!] with πx=id\pi_{x}={\rm id}. Lastly we interpret a typing derivation of ΓM:τ\Gamma\vdash M:\tau as a morphism [[M]]:[[Γ]][[τ]][\![M]\!]:[\![\Gamma]\!]\to[\![\tau]\!] in the standard way, using the interpretations of operations given in Figure 3. We further extend this to the interpretation of each simultaneous substitution Γθ:Γ\Gamma\vdash\theta:\Gamma^{\prime} as a morphisms [[θ]]:[[Γ]][[Γ]][\![\theta]\!]:[\![\Gamma]\!]\to[\![\Gamma^{\prime}]\!].

8.2 Approximate Relational Computational Logic

8.2.1 Relational Logic in External Form

A relational assertion ϕ\phi between disjoint contexts Γ\Gamma and Δ\Delta is a binary relation between U[[Γ]]U[\![\Gamma]\!] and U[[Δ]]U[\![\Delta]\!]. We denote such a relational assertion by ΔΓϕ{}^{\Gamma}_{\Delta}\vdash\phi, and identify it as a 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C})-object ϕ\phi such that p(ϕ)=([[Γ]],[[Δ]])p_{\mathbb{C}}(\phi)=([\![\Gamma]\!],[\![\Delta]\!]). Similarly, a relational assertion between types τ\tau and σ\sigma is defined to be a relational assertion d:σu:τϕ{}^{u:\tau}_{d:\sigma}\vdash\phi; here u,du,d are reserved and fixed variables.

Relational assertions between contexts Γ\Gamma and Δ\Delta carry a boolean algebra structure ,,¬\wedge,\vee,\neg given by the set-intersection, set-union and set-complement (see the boolean algebra 𝐁𝐑𝐞𝐥()([[Γ]],[[Δ]]){\bf BRel}(\mathbb{C})_{([\![\Gamma]\!],[\![\Delta]\!])} in Section 2.1). The pseudo-complement ϕψ\phi\Rightarrow\psi is defined to be ¬ϕψ\neg\phi\vee\psi. For Δ,y:σΓ,x:τϕ{}^{\Gamma,x:\tau}_{\Delta,y:\sigma}\vdash\phi, by ΔΓyx.ϕ{}^{\Gamma}_{\Delta}\vdash\forall^{x}_{y}~{}.~{}\phi and ΔΓyx.ϕ{}^{\Gamma}_{\Delta}\vdash\exists^{x}_{y}~{}.~{}\phi we mean the relational assertions defined by the following equivalence:

(γ,δ)yx.ϕ\displaystyle(\gamma,\delta)\in\forall^{x}_{y}~{}.~{}\phi\iff γU[[Γ,x:τ]],δU[[Δ,y:σ]].\displaystyle\forall{\gamma^{\prime}\in U[\![\Gamma,x:\tau]\!],\delta^{\prime}\in U[\![\Delta,y:\sigma]\!]}~{}.~{}
([[π1Γ,x:τ]]γ=γ)([[π1Δ,y:σ]]δ=δ)(γ,δ)ϕ\displaystyle\quad\quad([\![\pi_{1}^{\Gamma,x:\tau}]\!]\mathbin{\bullet}\gamma^{\prime}=\gamma)\wedge([\![\pi_{1}^{\Delta,y:\sigma}]\!]\mathbin{\bullet}\delta^{\prime}=\delta)\Rightarrow(\gamma^{\prime},\delta^{\prime})\in\phi
(γ,δ)yx.ϕ\displaystyle(\gamma,\delta)\in\exists^{x}_{y}~{}.~{}\phi\iff γU[[Γ,x:τ]],δU[[Δ,y:σ]].\displaystyle\exists{\gamma^{\prime}\in U[\![\Gamma,x:\tau]\!],\delta^{\prime}\in U[\![\Delta,y:\sigma]\!]}~{}.~{}
([[π1Γ,x:τ]]γ=γ)([[π1Δ,y:σ]]δ=δ)(γ,δ)ϕ\displaystyle\quad\quad([\![\pi_{1}^{\Gamma,x:\tau}]\!]\mathbin{\bullet}\gamma^{\prime}=\gamma)\wedge([\![\pi_{1}^{\Delta,y:\sigma}]\!]\mathbin{\bullet}\delta^{\prime}=\delta)\wedge(\gamma^{\prime},\delta^{\prime})\in\phi

The boolean algebra structure and the above quantifier operations allow us to interpret first-order logical formulas as relational assertions; we omit its detail here. In addition to these standard logical connectives, we will use graded strong relational lifting T[Δ]T^{[{\mathsf{\Delta}}]} to form relational assertions. That is, for any basic endorelation E:𝐁𝐑𝐞𝐥()E:\mathbb{C}\to{\bf BRel}(\mathbb{C}), grading monoid MM, divergence domain 𝒬\mathcal{Q} and divergence Δ𝐃𝐢𝐯(T,E,M,𝒬){\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}), we obtain a relational assertion d:𝚃σu:𝚃τT[Δ](m,v)ϕ{}^{u:\mathtt{T}\tau}_{d:\mathtt{T}\sigma}\vdash T^{[{\mathsf{\Delta}}]}(m,v)\phi from any d:σu:τϕ{}^{u:\tau}_{d:\sigma}\vdash\phi, mMm\in M and v𝒬v\in\mathcal{Q}.

For substitutions Γθ:Γ,Δθ:Δ\Gamma\vdash\theta:\Gamma^{\prime},\Delta\vdash{\theta^{\prime}}:\Delta^{\prime} and an assertion ΔΓϕ{}^{\Gamma}_{\Delta}\vdash\phi, by ΔΓϕ[θ;θ]{}^{\Gamma^{\prime}}_{\Delta^{\prime}}\vdash\phi[{\theta};{\theta^{\prime}}] we mean the relational assertion {(γ,δ)|([[θ]]γ,[[θ]]δ)ϕ}\{(\gamma,\delta)~{}|~{}([\![\theta]\!]\mathbin{\bullet}\gamma,[\![\theta^{\prime}]\!]\mathbin{\bullet}\delta)\in\phi\}. For disjoint context pairs Γ,Γ\Gamma,\Gamma^{\prime} and Δ,Δ\Delta,\Delta^{\prime} and relational assertions ΔΓϕ{}^{\Gamma}_{\Delta}\vdash\phi and ΔΓψ{}^{\Gamma^{\prime}}_{\Delta^{\prime}}\vdash\psi, by the juxtaposition Δ,ΔΓ,Γϕ,ψ{}^{\Gamma,\Gamma^{\prime}}_{\Delta,\Delta^{\prime}}\vdash\phi,\psi we mean the relational assertion Δ,ΔΓ,Γϕ[π1Γ,Γ;π1Δ,Δ]ψ[π2Γ,Γ;π2Δ,Δ]{}^{\Gamma,\Gamma^{\prime}}_{\Delta,\Delta^{\prime}}\vdash\phi[{\pi_{1}^{\Gamma,\Gamma^{\prime}}};{\pi_{1}^{\Delta,\Delta^{\prime}}}]\wedge\psi[{\pi_{2}^{\Gamma,\Gamma^{\prime}}};{\pi_{2}^{\Delta,\Delta^{\prime}}}].

8.2.2 Inference Rules for acRL

For well-typed computational metalanguage terms ΓM:τ\Gamma\vdash M:\tau and ΔN:σ\Delta\vdash N:\sigma, and relational assertions ΔΓϕ{}^{\Gamma}_{\Delta}\vdash\phi and d:σu:τψ{}^{u:\tau}_{d:\sigma}\vdash\psi, by the judgment

ϕ(M,N):ψ\phi\vdash{}(M,N)\colon\psi

we mean the inclusion ϕψ[[M/u];[N/d]]\phi\subseteq\psi[{[M/u]};{[N/d]}] of binary relations. This is equivalent to ([[M]],[[N]]):ϕ˙ψ([\![M]\!],[\![N]\!]):\phi\mathbin{\dot{\rightarrow}}\psi. We show basic facts about judgments ϕ(M,N):ψ\phi\vdash{}(M,N)\colon\psi.

Proposition 15.
  1. \phi\vdash(M,N)\colon\psi and [\![M]\!]=[\![M^{\prime}]\!] and [\![N]\!]=[\![N^{\prime}]\!] implies \phi\vdash(M^{\prime},N^{\prime})\colon\psi.

  2. \phi\vdash(M,N)\colon\psi and \phi^{\prime}\subseteq\phi and \psi\subseteq\psi^{\prime} implies \phi^{\prime}\vdash(M,N)\colon\psi^{\prime}.

  3. \phi\vdash(M,N)\colon T^{[{\mathsf{\Delta}}]}(m,v)\psi and m\leq n and v\leq w and \psi\leq\psi^{\prime} implies \phi\vdash(M,N)\colon T^{[{\mathsf{\Delta}}]}(n,w)\psi^{\prime}.

  4. \phi\vdash(M,N)\colon\psi implies \phi\vdash(\mathop{\mathtt{ret}}(M),\mathop{\mathtt{ret}}(N))\colon T^{[{\mathsf{\Delta}}]}(1,0)\psi.

  5. \phi\vdash(M,N)\colon T^{[{\mathsf{\Delta}}]}(m,v)\psi and \phi,\psi[{[x/u]};{[x^{\prime}/d]}]\vdash(M^{\prime},N^{\prime})\colon T^{[{\mathsf{\Delta}}]}(n,w)\rho implies \phi\vdash(\mathop{\mathtt{let}}{x=M}\mathbin{\mathtt{in}}M^{\prime},\mathop{\mathtt{let}}{x^{\prime}=N}\mathbin{\mathtt{in}}N^{\prime})\colon T^{[{\mathsf{\Delta}}]}(m\cdot n,v\cdot w)\rho.

We next establish relational judgments on effectful operations. We present a convenient way to establish such judgments using the fundamental property of the graded relational lifting T[Δ]T^{[{\mathsf{\Delta}}]}.

Proposition 16.

For any cOec\in O_{e} such that Σe(c)=(b,b)\Sigma_{e}(c)=(b,b^{\prime}), relational assertion d:bu:bϕ{}^{u:b}_{d:b}\vdash\phi and mMm\in M, putting v=sup{Δ[[b]]m([[c]]x,[[c]]y)|(x,y)ϕ}v=\sup\{{\mathsf{\Delta}}^{m}_{[\![b^{\prime}]\!]}([\![c]\!]\mathbin{\bullet}x,[\![c]\!]\mathbin{\bullet}y)~{}|~{}(x,y)\in\phi\}, we have ϕ(c(u),c(d)):T[Δ](m,v)(E[[b]])\phi\vdash{}(c(u),c(d))\colon T^{[{\mathsf{\Delta}}]}(m,v)(E[\![b^{\prime}]\!]).

Proof.

Take an arbitrary pair (x,y)ϕ(x,y)\in\phi. We have Δ[[b]]m([[c]]x,[[c]]y)v{\mathsf{\Delta}}^{m}_{[\![b^{\prime}]\!]}([\![c]\!]\mathbin{\bullet}x,[\![c]\!]\mathbin{\bullet}y)\leq v by definition of vv. Thanks to the fundamental property of T[Δ]T^{[{\mathsf{\Delta}}]} (Theorem 8), it is equivalent to ([[c]]x,[[c]]y)T[Δ](m,v)(E[[b]])([\![c]\!]\mathbin{\bullet}x,[\![c]\!]\mathbin{\bullet}y)\in T^{[{\mathsf{\Delta}}]}(m,v)(E[\![b^{\prime}]\!]). ∎

9 Case Study I: Higher-Order Probabilistic Programs

We represent a higher-order probabilistic programming language with sampling commands from continuous distributions as a computational metalanguage. For now we assume that the language supports sampling from Gaussian distribution and Laplace distribution. This computational metalanguage is specified by the computational signature:

𝒞=({𝚁},Σv,{𝚗𝚘𝚛𝚖:(𝚁×𝚁,𝚁),𝚕𝚊𝚙:(𝚁×𝚁,𝚁)}),\mathcal{C}=(\{\mathtt{R}\},\Sigma_{v},\{\mathtt{norm}:(\mathtt{R}\times\mathtt{R},\mathtt{R}),~{}\mathtt{lap}:(\mathtt{R}\times\mathtt{R},\mathtt{R})\}),

where Σv\Sigma_{v} is some chosen signature for value operations over reals. We interpret this computational metalanguage by filling Figure 3 as follows:

  1. for the CCC-SM, we take (\mathbb{C},T)=({\bf QBS},P) (see Section 13);

  2. for the interpretation [\![\mathtt{R}]\!] of \mathtt{R}, we take the quasi-Borel space K\mathbb{R} associated with the standard Borel space \mathbb{R};

  3. the interpretation of value operations is given as expected (we omit it here); for example, when \Sigma_{v} contains the real-number addition operator + with type (\mathtt{R}\times\mathtt{R},\mathtt{R}), its interpretation is the QBS morphism [\![+]\!](x,y)=x+y:[\![\mathtt{R}\times\mathtt{R}]\!]\rightarrow[\![\mathtt{R}]\!];

  4. for the interpretation of effectful operations, we put
     [\![\mathtt{norm}]\!](x,\sigma)=[\mathrm{id},\mathcal{N}(x,\sigma^{2})]_{\sim_{K\mathbb{R}}}, \qquad [\![\mathtt{lap}]\!](x,\lambda)=[\mathrm{id},\mathrm{Lap}(x,\lambda)]_{\sim_{K\mathbb{R}}}.

Here, \mathcal{N}(x,\sigma^{2})\in G\mathbb{R} is the Gaussian distribution with mean x and variance \sigma^{2}, and \mathrm{Lap}(x,\lambda)\in G\mathbb{R} is the Laplace distribution with mean x and variance 2\lambda^{2}. (If \sigma=0 or \lambda\leq 0, then \mathcal{N}(x,\sigma^{2}), resp. \mathrm{Lap}(x,\lambda), is not defined, and we use the Dirac distribution \mathbf{d}_{x} at x instead.) Every probability (Borel-)measure \mu\in G\mathbb{R} on \mathbb{R} can be converted to the probability measure [\mathrm{id},\mu]_{\sim_{K\mathbb{R}}}\in PK\mathbb{R} on the quasi-Borel space K\mathbb{R} (see Section 5.5).
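Operationally, these interpretations can be read as samplers. The following Python sketch mimics [[norm]] and [[lap]] by pseudo-random sampling, including the convention above for degenerate parameters; it is only an informal operational reading, not the quasi-Borel semantics itself.

import random

def interp_norm(x, sigma):
    """Operational reading of [[norm]](x, sigma): a sample from N(x, sigma^2)."""
    if sigma == 0:
        return x                         # Dirac distribution d_x, as noted above
    return random.gauss(x, sigma)

def interp_lap(x, lam):
    """Operational reading of [[lap]](x, lam): a sample from Lap(x, lam)."""
    if lam <= 0:
        return x                         # Dirac distribution d_x, as noted above
    # Lap(x, lam) is x + lam * (E1 - E2) for independent Exp(1) samples E1, E2.
    return x + lam * (random.expovariate(1.0) - random.expovariate(1.0))

print(interp_norm(0.0, 1.0), interp_lap(0.0, 1.0))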

9.1 A Relational Logic Verifying Differential Privacy

To formulate differential privacy and its relaxations in the quasi-Borel setting, we convert statistical divergences Δ{\mathsf{\Delta}} on the Giry monad GG in Table 2 to Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-relative divergences L,lΔ\langle{L,l}\rangle^{*}{\mathsf{\Delta}} on the probability monad PP on 𝐐𝐁𝐒{\bf QBS} by the construction in Section 5.5. Then, we construct the graded relational lifting P[L,lΔ]P^{[\langle{L,l}\rangle^{*}{\mathsf{\Delta}}]} by Theorem 8. Using this, as an instantiation of acRL, we build a relational logic reasoning about differential privacy and its relaxations, supporting higher-order programs and continuous random samplings. Basic proof rules can be given by Proposition 15.

For effectful operations, we import basic proof rules on noise-adding mechanisms given in prior studies ([21, 22, 40, 16]) via Theorem 1 and Proposition 16. For example, consider the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative +\mathcal{R}^{+}-graded +\mathcal{R}^{+}-divergence Δ=L,l𝖣𝖯{\mathsf{\Delta}}=\langle{L,l}\rangle^{*}\mathsf{DP} on PP. Proposition 16 with an effectful operation c=𝚕𝚊𝚙c=\mathtt{lap} and a relational assertion (below we identify global elements in KK\mathbb{R} and real numbers)

d:𝚁×𝚁u:𝚁×𝚁ϕ={(x,1/ε,y,1/ε)||xy|1},{}^{u:\mathtt{R}\times\mathtt{R}}_{d:\mathtt{R}\times\mathtt{R}}\vdash\phi=\{(\langle x,1/\varepsilon\rangle,\langle y,1/\varepsilon\rangle)~{}|~{}|x-y|\leq 1\},

together with Theorem 1 and the prior result [21, Example 1] yields the following judgment:

ϕ(𝚕𝚊𝚙(u),𝚕𝚊𝚙(d)):P[L,l𝖣𝖯](0,ϵ)(EqK).\phi\vdash{}(\mathtt{lap}(u),\mathtt{lap}(d))\colon P^{[\langle{L,l}\rangle^{*}\mathsf{DP}]}(0,\epsilon)({\color[rgb]{0,0,0}\mathrm{Eq}}{K\mathbb{R}}).

By letting diffr\mathrm{diff}_{r} be the relational assertion d:𝚁u:𝚁{(x,y)||xy|r}{}^{u:\mathtt{R}}_{d:\mathtt{R}}\vdash\{(x,y)~{}|~{}|x-y|\leq r\}, the above judgment is equivalent to:

\mathrm{diff}_{1}\vdash{}(\mathtt{lap}(u,1/\epsilon),\mathtt{lap}(d,1/\epsilon))\colon P^{[\langle{L,l}\rangle^{*}\mathsf{DP}]}(0,\epsilon)(\mathrm{Eq}\,K\mathbb{R}). (12)

This rule corresponds to the rule [LapGen] of the program logic apRHL+ ([11]) for differential privacy. For another example, by the reflexivity of 𝖣𝖯\mathsf{DP}, L,l𝖣𝖯\langle{L,l}\rangle^{*}\mathsf{DP} is also reflexive, hence we obtain the following judgments (below succr\mathrm{succ}_{r} is the relational assertion d:𝚁u:𝚁{(x,y)|y=x+r}{}^{u:\mathtt{R}}_{d:\mathtt{R}}\vdash\{(x,y)~{}|~{}y=x+r\}):

succ1(𝚕𝚊𝚙(u,λ),𝚕𝚊𝚙(d,λ)):P[L,l𝖣𝖯](0,0)(succ1)\displaystyle\mathrm{succ}_{1}\vdash{}(\mathtt{lap}(u,\lambda),\mathtt{lap}(d,\lambda))\colon P^{[\langle{L,l}\rangle^{*}\mathsf{DP}]}(0,0)(\mathrm{succ}_{1}) (13)
succ1(𝚗𝚘𝚛𝚖(u,σ),𝚗𝚘𝚛𝚖(d,σ)):P[L,l𝖣𝖯](0,0)(succ1).\displaystyle\mathrm{succ}_{1}\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[\langle{L,l}\rangle^{*}\mathsf{DP}]}(0,0)(\mathrm{succ}_{1}). (14)

The judgment (13) corresponds to [LapNull] of apRHL+. Similarly, judgments (15)–(17) below, concerning DP, Rényi DP and zero-concentrated DP of the Gaussian mechanism, can be derived.

diff1(𝚗𝚘𝚛𝚖(u,σ),𝚗𝚘𝚛𝚖(d,σ)):P[L,l𝖣𝖯](ϵ,δ)(EqK)\displaystyle\mathrm{diff}_{1}\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[\langle{L,l}\rangle^{*}\mathsf{DP}]}{(\epsilon,\delta)}{({\color[rgb]{0,0,0}\mathrm{Eq}}{K\mathbb{R}})} (15)
diffr(𝚗𝚘𝚛𝚖(u,σ),𝚗𝚘𝚛𝚖(d,σ)):P[L,l𝖱𝖾α](αr2/2σ2)(EqK)\displaystyle\mathrm{diff}_{r}\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[\langle{L,l}\rangle^{*}{}^{\alpha}\mathsf{Re}]}{(\alpha r^{2}/2\sigma^{2})}{({\color[rgb]{0,0,0}\mathrm{Eq}}{K\mathbb{R}})} (16)
diffr(𝚗𝚘𝚛𝚖(u,σ),𝚗𝚘𝚛𝚖(d,σ)):P[L,l𝗓𝖢𝖣𝖯](0,r2/2σ2)(EqK)\displaystyle\mathrm{diff}_{r}\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[\langle{L,l}\rangle^{*}\mathsf{zCDP}]}{(0,r^{2}/2\sigma^{2})}{({\color[rgb]{0,0,0}\mathrm{Eq}}{K\mathbb{R}})} (17)

In (15) we require σmax((1+3)/2,2log(0.66/δ)/ϵ)\sigma\geq\max((1+\sqrt{3})/2,~{}\sqrt{2\log(0.66/\delta)}/\epsilon). The derivation is done via Proposition 16, Theorem 1 and prior studies ([51, 40, 17]).
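As an external sanity check on (12) and (15) (not a derivation inside acRL), the following Python sketch bounds the pointwise privacy loss of the Laplace mechanism and evaluates the lower bound on \sigma for the Gaussian mechanism at concrete parameters; the constant 0.66 and the max formula are the ones stated above.

import math

def laplace_log_ratio(u, d, eps, z):
    """Log-ratio of the densities of Lap(u, 1/eps) and Lap(d, 1/eps) at z."""
    b = 1.0 / eps
    return (-abs(z - u) / b) - (-abs(z - d) / b)   # the normalizers cancel

eps = 0.5
# For |u - d| <= 1 the privacy loss is bounded by eps, as in judgment (12).
worst = max(abs(laplace_log_ratio(0.0, 1.0, eps, z / 10.0)) for z in range(-100, 101))
assert worst <= eps + 1e-9

def required_sigma(eps, delta):
    """The constraint on sigma used for judgment (15)."""
    return max((1 + math.sqrt(3)) / 2, math.sqrt(2 * math.log(0.66 / delta)) / eps)

print(worst, required_sigma(0.5, 1e-5))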

10 Case Study II: Probabilistic Programs with Costs

We further extend the computational signature \mathcal{C} in the previous section with an effectful operation \mathtt{tick} such that \Sigma_{e}(\mathtt{tick})=(\mathtt{R},1). The intention of \mathtt{tick}(r) is to increase the cost counter by r during execution. (To make examples simpler, we allow negative costs.) To interpret this extended metalanguage, we fill Figure 3 as follows:

  1. for the CCC-SM, we take (\mathbb{C},T)=({\bf QBS},P_{c}) where P_{c}\triangleq P(K\mathbb{R}\times-) is the monad for modeling probabilistic choice and cost counting (see Section 5.7);

  2. the interpretation of b\in B is the same as in Section 9;

  3. the interpretation of value operations is also the same as in Section 9;

  4. for the interpretation of effectful operations, we put
     [\![\mathtt{norm}]\!](x,\sigma)=[(0,\mathrm{id}),\mathcal{N}(x,\sigma^{2})]_{\sim_{K\mathbb{R}\times K\mathbb{R}}},
     [\![\mathtt{lap}]\!](x,\lambda)=[(0,\mathrm{id}),\mathrm{Lap}(x,\lambda)]_{\sim_{K\mathbb{R}\times K\mathbb{R}}},
     [\![\mathtt{tick}]\!](r)=\eta^{P}_{K\mathbb{R}\times[\![1]\!]}(r,\ast)=[\mathrm{const}(r,\ast),\mu]_{\sim_{K\mathbb{R}\times 1}} (for an arbitrary \mu\in G\mathbb{R}).

We derive a closed term \mathtt{ntick}\colon\mathtt{R}\Rightarrow\mathtt{R}\Rightarrow\mathtt{T}1 for ticking with a cost sampled from a Gaussian distribution:

𝚗𝚝𝚒𝚌𝚔(λs.λr.𝚕𝚎𝚝x=𝚗𝚘𝚛𝚖(r,s)𝚒𝚗𝚝𝚒𝚌𝚔(x)).\mathtt{ntick}\triangleq(\lambda s.\lambda r.~{}\mathop{\mathtt{let}}{x=\mathtt{norm}(r,s)}\mathbin{\mathtt{in}}\mathtt{tick}(x)).

The term \mathtt{ntick}~s~r increments the cost counter by a random value sampled from the Gaussian distribution \mathcal{N}(r,s^{2}).
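To make the monad P_{c}=P(K\mathbb{R}\times-) concrete, the following Python sketch reads a computation in P_{c} as a sampler returning a (cost, value) pair; ret, bind, tick and ntick are then direct transcriptions. This sampling reading is only an informal model of the semantics, not the quasi-Borel construction itself.

import random

def ret(v):
    """Unit of (a sampling reading of) Pc: zero cost, value v."""
    return lambda: (0.0, v)

def bind(m, f):
    """Kleisli composition: run m, feed its value to f, and add the costs."""
    def run():
        c1, v1 = m()
        c2, v2 = f(v1)()
        return (c1 + c2, v2)
    return run

def tick(r):
    """tick(r): increase the cost counter by r and return the unit value."""
    return lambda: (r, ())

def norm(x, sigma):
    """norm(x, sigma): sample from N(x, sigma^2) at no cost."""
    return lambda: (0.0, random.gauss(x, sigma))

def ntick(s, r):
    """ntick s r = let x = norm(r, s) in tick(x)."""
    return bind(norm(r, s), tick)

cost, _ = ntick(2.0, 1.0)()              # cost is distributed as N(1, 4)
print(cost)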

10.1 Relational Reasoning on Probabilistic Costs

We convert the total variation distance \mathsf{TV}\in{\bf Div}(G,\mathrm{Eq},1,\mathcal{R}^{+}) to the divergence {\mathsf{\Delta}}_{c}\triangleq{\mathsf{C}}(\langle{L,l}\rangle^{*}\mathsf{TV},K\mathbb{R})\in{\bf Div}(P_{c},\mathrm{Eq},1,\mathcal{R}^{+}) on P_{c} by Propositions 2 and 7. We also prove basic facts on effectful operations. First, the following relational judgments on \mathtt{tick} can easily be given:

(𝚝𝚒𝚌𝚔(u),𝚝𝚒𝚌𝚔(d)):T[Δc](1)()\displaystyle\top\vdash{}(\mathtt{tick}(u),\mathtt{tick}(d))\colon T^{[{\mathsf{\Delta}}_{c}]}(1)(\top) (18)
u=d(𝚝𝚒𝚌𝚔(u),𝚝𝚒𝚌𝚔(d)):T[Δc](0)()\displaystyle u=d\vdash{}(\mathtt{tick}(u),\mathtt{tick}(d))\colon T^{[{\mathsf{\Delta}}_{c}]}(0)(\top)

Remark that \mathrm{Eq}1=\top and [\![\mathtt{tick}(0)]\!]=[\![\mathop{\mathtt{ret}}(\ast)]\!] hold. Next, in a similar way to (13), the reflexivity of \mathsf{TV} gives the reflexivity of \langle{L,l}\rangle^{*}\mathsf{TV}, and we obtain, for all real number constants \sigma,\lambda,

succr(𝚗𝚘𝚛𝚖(u,σ),𝚗𝚘𝚛𝚖(d,σ)):T[Δc](0)(succr)\displaystyle\mathrm{succ}_{r}\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon T^{[{\mathsf{\Delta}}_{c}]}(0)(\mathrm{succ}_{r})
succr(𝚕𝚊𝚙(u,λ),𝚕𝚊𝚙(d,λ)):T[Δc](0)(succr)\displaystyle\mathrm{succ}_{r}\vdash{}(\mathtt{lap}(u,\lambda),\mathtt{lap}(d,\lambda))\colon T^{[{\mathsf{\Delta}}_{c}]}(0)(\mathrm{succ}_{r}) (19)

We also directly verify the following judgment on 𝚗𝚝𝚒𝚌𝚔\mathtt{ntick} using Theorem 1 and Proposition 16:

diff1(𝚗𝚝𝚒𝚌𝚔σu,𝚗𝚝𝚒𝚌𝚔σd):T[Δc](Prr𝒩(0,σ2)[|r|<0.5])().\displaystyle\mathrm{diff}_{1}\vdash{}(\mathtt{ntick}~{}\sigma~{}u,\mathtt{ntick}~{}\sigma~{}d)\colon T^{[{\mathsf{\Delta}}_{c}]}{({\textstyle\Pr_{r\sim\mathcal{N}(0,\sigma^{2})}}[|r|<0.5])}{(\top)}. (20)

10.1.1 An Example of Relational Reasoning

We give an example of verifying the difference (of distributions) of costs between two runs of a probabilistic program whose output and cost depend on the input. We consider the following program:

Mλr:𝚁.λt:𝚁T1.𝚕𝚎𝚝x=𝚕𝚊𝚙(r,5)𝚒𝚗𝚕𝚎𝚝_=t(r)𝚒𝚗𝚛𝚎𝚝(xr).M\triangleq\lambda r\colon\mathtt{R}.~{}\lambda t\colon\mathtt{R}\to T1.~{}\mathop{\mathtt{let}}{x=\mathtt{lap}(r,5)}\mathbin{\mathtt{in}}\mathop{\mathtt{let}}{\_=t(r)}\mathbin{\mathtt{in}}\mathop{\mathtt{ret}}(x-r).

It first samples a real number x from the Laplace distribution centered at the input r, then calls the (possibly effectful) closure t with r, and finally returns x-r. Since the return type of t is \mathtt{T}1, it can only probabilistically tick the counter. We show the following two judgments in acRL:

(M0(λx.𝚝𝚒𝚌𝚔(x)),M1(λx.𝚝𝚒𝚌𝚔(x))):T[Δc](1)(Eq[[𝚁]]),\displaystyle\vdash{}(M~{}0~{}(\lambda x.\mathtt{tick}(x)),M~{}1~{}(\lambda x.\mathtt{tick}(x)))\colon T^{[{\mathsf{\Delta}}_{c}]}{(1)}{({\color[rgb]{0,0,0}\mathrm{Eq}}{[\![\mathtt{R}]\!]}}), (A)
(M0(𝚗𝚝𝚒𝚌𝚔(2)),M1(𝚗𝚝𝚒𝚌𝚔(2))):T[Δc](0.20)(Eq[[𝚁]])\displaystyle\vdash{}(M~{}0~{}(\mathtt{ntick}(2)),M~{}1~{}(\mathtt{ntick}(2)))\colon T^{[{\mathsf{\Delta}}_{c}]}{(0.20)}{({\color[rgb]{0,0,0}\mathrm{Eq}}{[\![\mathtt{R}]\!]}}) (B)

In judgment (A), we pass the tick operation t=\lambda x.\mathtt{tick}(x) itself to M~0 and M~1. By the fundamental property of T^{[{\mathsf{\Delta}}_{c}]}, the difference of costs between the runs of M~0~t and M~1~t is at most 1, because these programs report costs 0 and 1 deterministically, respectively. In contrast, in judgment (B), we pass to M~0 and M~1 the probabilistic tick function t^{\prime}=\mathtt{ntick}(2), which ticks a real number sampled from a Gaussian distribution with variance 2^{2}=4. Therefore the costs reported by the runs of M~0~t^{\prime} and M~1~t^{\prime} follow the Gaussian distributions \mathcal{N}(0,4) and \mathcal{N}(1,4), whose distance with respect to \mathsf{TV} is bounded by 0.20.
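The grade in (B) is the total variation distance between \mathcal{N}(0,4) and \mathcal{N}(1,4); for Gaussians with a common variance \sigma^{2} and means differing by r it equals \Pr_{x\sim\mathcal{N}(0,\sigma^{2})}[|x|<r/2]. The following Python sketch checks the stated bound 0.20 numerically via the standard normal CDF (an external calculation, not a step in the logic).

import math

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def tv_same_variance(r, sigma):
    """TV(N(0, sigma^2), N(r, sigma^2)) = 2 * Phi(r / (2 * sigma)) - 1."""
    return 2.0 * std_normal_cdf(r / (2.0 * sigma)) - 1.0

bound = tv_same_variance(1.0, 2.0)       # means 0 and 1, variance 2^2 = 4
print(round(bound, 4))                   # approximately 0.1974
assert bound <= 0.20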

We first show (A). By (18) and 2 of Proposition 15, we have,

succ1(𝚝𝚒𝚌𝚔(u),𝚝𝚒𝚌𝚔(d)):T[Δc](1)().\mathrm{succ}_{1}\vdash{}(\mathtt{tick}(u),\mathtt{tick}(d))\colon T^{[{\mathsf{\Delta}}_{c}]}{(1)}{(\top)}. (21)

By (21), and 4, 5 of Proposition 15, we obtain,

succ1\displaystyle\mathrm{succ}_{1}\vdash (𝚕𝚎𝚝_=𝚝𝚒𝚌𝚔(u)𝚒𝚗𝚛𝚎𝚝(u),\displaystyle(\mathop{\mathtt{let}}{\_=\mathtt{tick}(u)}\mathbin{\mathtt{in}}\mathop{\mathtt{ret}}(u),
𝚕𝚎𝚝_=𝚝𝚒𝚌𝚔(d)𝚒𝚗𝚛𝚎𝚝(d1)):T[Δc](1)(Eq[[𝚁]]).\displaystyle\quad\mathop{\mathtt{let}}{\_=\mathtt{tick}(d)}\mathbin{\mathtt{in}}\mathop{\mathtt{ret}}(d-1))\colon T^{[{\mathsf{\Delta}}_{c}]}{(1)}{({\color[rgb]{0,0,0}\mathrm{Eq}}[\![\mathtt{R}]\!])}. (22)

By (19), (22), and 1 and 5 of Proposition 15 again, we conclude (A).

To show (B), it suffices to replace (21) by the following judgment proved by (20), the inequality Prr𝒩(0,4)[|r|<0.5]0.20\Pr_{r\sim\mathcal{N}(0,4)}[|r|<0.5]\leq 0.20 and 2 of Proposition 15:

succ1(𝚗𝚝𝚒𝚌𝚔2u,𝚗𝚝𝚒𝚌𝚔2d):T[Δc](0.20)().\mathrm{succ}_{1}\vdash{}(\mathtt{ntick}~{}2~{}u,\mathtt{ntick}~{}2~{}d)\colon T^{[{\mathsf{\Delta}}_{c}]}{(0.20)}{(\top)}.

The rest of the proof is the same as for (A).

11 Related Work

This work is based on the frameworks for verifying differential privacy of probabilistic programs using relational logic, summarized in Table 5. Composable divergences employed in these frameworks include the one for differential privacy, plus its recent relaxations such as Rényi DP, zero-concentrated DP, and truncated-concentrated DP ([16, 17, 40]).

Table 5: Approximate Probabilistic Relational Logic
Work Monad Relation Lifting Method Supported divergences
[8, 10, 12] Dist 𝐁𝐑𝐞𝐥(𝐒𝐞𝐭){\bf BRel}({\bf Set}) coupling DP
[13] Dist 𝐁𝐑𝐞𝐥(𝐒𝐞𝐭){\bf BRel}({\bf Set}) coupling ff-divergences
[52] Giry 𝐒𝐩𝐚𝐧(𝐌𝐞𝐚𝐬){\bf Span}({\bf Meas}) coupling (spans) composable ones
[51] Giry 𝐁𝐑𝐞𝐥(𝐌𝐞𝐚𝐬){\bf BRel}({\bf Meas}) codensity DP
This work Generic 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}) codensity composable ones

The key semantic structure in these frameworks is graded relational liftings of the probability distribution monad. Barthe et al. gave a graded relational lifting of the distribution monad based on the existence of two witnessing probability distributions (called coupling) ([12]). Since then, coupling-based liftings have been refined and used in several works ([8, 10, 13, 52]). They can be systematically constructed from composable divergences on the probability distribution monad ([13]). One advantage of coupling-based liftings is that, to relate two probability distributions, it suffices to exhibit a coupling; this is exploited in the mechanized verification of differential privacy of programs ([1, 2]). These coupling-based liftings, however, are developed upon discrete probability distributions, and measure-theoretic probability distributions, such as Gaussian or Cauchy distributions, were not supported until the work ([52]).

The relational Hoare logic supporting sampling from continuous probability measures was given by [51]. In his work, the graded relational lifting for (\epsilon,\delta)-DP is given in the style of codensity lifting ([33]), which does not rely on the existence of couplings. Yet, it has been an open question [52, Section VIII] how to extend his graded relational lifting to support various relaxations of differential privacy. This paper answers this question with Theorem 8. Later, coupling-based liftings have also been extended to support sampling from continuous probability measures ([52]). This extension is achieved by redefining the concept of binary relations as spans of measurable functions. A comparison of these approaches is given in the next section.

The verification of differential privacy in functional programming languages has also been pursued ([48, 23, 9, 5]). [48] introduced a linear functional programming language with a graded monadic type that supports reasoning about \epsilon-differential privacy. Later, Gaboardi et al. strengthened the Reed–Pierce type system with dependent types ([23]). A category-theoretic account of the Reed–Pierce type system is given in [5], where general (\epsilon,\delta)-differential privacy is also supported. These works basically regard types as metric spaces, allowing one to reason about the sensitivity of programs with respect to their inputs. The coupling-based lifting techniques are also employed in relational models of higher-order probabilistic programming languages ([9]).

The study [5] gives a categorical definition of composable divergences in a general framework called weakly closed refinements of symmetric monoidal closed categories [5, Definition 1]. A comparison is given in Section 6.1.2.

[39] introduced a quantitative refinement of algebraic theories called quantitative equational theories (QETs), and studied a variety theorem for quantitative algebras. [6] discussed tensor products of quantitative equational theories. QETs and divergences on monads share the common interest of measuring quantitative differences between computational effects. Divergences on monads are derived as a generalization of the composability condition of statistical divergences studied by [13]. To make a precise connection between these two concepts, in Section 6.3, we have given an adjunction between QETs of type \Omega over X and X-generated divergences on the free monad T_{\Omega}. The adjunction restricts to an isomorphism between unconditional QETs of type \Omega over X and X-generated divergences on T_{\Omega}.

The use of metric-like spaces in semantics is seen in several recent works. [25] studies quantitative refinements of Abramsky's applicative bisimilarity for the Reed–Pierce type system. He introduces a monadic operational semantics of the language and formalizes quantitative applicative bisimilarity using monad liftings to the category of quantale-valued relations. [15] also used metric-like spaces to study bisimulations and up-to techniques in the category of quantale-valued relations. In this work our interest is the relational verification of effectful programs, and it is carried out in the relational category {\bf BRel}(\mathbb{C}), rather than {\bf Div}_{\mathcal{Q}}(\mathbb{C}). The quantitative difference of computational effects measured by a divergence {\mathsf{\Delta}} is represented by the binary relation \tilde{\mathsf{\Delta}}, graded by upper bounds of the distance.

12 Future Work

The framework for relational cost analysis given in [47] (an extension of \mathsf{RelCost} ([18])) consists of a relational logic verifying the difference of costs between two programs and a unary logic verifying lower and upper bounds of costs (i.e. cost intervals) in one program. We expect that the relational logic can be reformulated as an instantiation of acRL with the divergence \mathsf{NCI} on P(\mathbb{N}\times-) (or a variant thereof). However, to reformulate the unary logic, we want a unary version of divergence on P(\mathbb{N}\times-) for cost intervals. To establish the connection between the unary logic and the relational logic, we want a conversion from the unary version of divergence (for cost intervals) to \mathsf{NCI} (for cost differences).

There might be many other examples and applications of divergences on monads. In this paper, we mainly discussed examples of divergences with basic endorelations Top{\color[rgb]{0,0,0}\mathrm{Top}} and Eq{\color[rgb]{0,0,0}\mathrm{Eq}}, but various other basic endorelations can be considered.

13 Measurable Spaces and Quasi-Borel Spaces

Measurable Spaces.

For the treatment of continuous probability distributions, we employ the category {\bf Meas} of measurable spaces and measurable functions. For a measurable space I we write |I| and \Sigma_{I} for the underlying set and the \sigma-algebra of I respectively. The category {\bf Meas} is a (well-pointed) CC, and it has all small limits and small colimits, which are strictly preserved by the forgetful functor |{-}|\colon{\bf Meas}\to{\bf Set}. The forgetful functor is naturally isomorphic to the global element functor {\bf Meas}(1,-).

Standard Borel Spaces.

A standard Borel space is a special measurable space (|Ω|,ΣΩ)(|\Omega|,\Sigma_{\Omega}) whose σ\sigma-algebra ΣΩ\Sigma_{\Omega} is the coarsest one containing the topology σΩ\sigma_{\Omega} of a Polish space (|Ω|,σΩ)(|\Omega|,\sigma_{\Omega}). In particular, the real line \mathbb{R} forms a standard Borel space. In fact, a measurable space Ω\Omega is standard Borel if and only if there are γ:Ω\gamma\colon\Omega\to\mathbb{R} and γ:Ω\gamma^{\prime}\colon\mathbb{R}\to\Omega in 𝐌𝐞𝐚𝐬{\bf Meas} forming a section-retraction pair, that is, γγ=idΩ\gamma^{\prime}\circ\gamma={\rm id}_{\Omega}. For example, [0,1][0,1], [0,][0,\infty], \mathbb{N}, k\mathbb{R}^{k} (kk\in\mathbb{N}) are standard Borel.

The Giry Monad.

We recall the Giry monad G ([26]). For every measurable space I, GI is the set |GI| of all probability measures over I equipped with the coarsest \sigma-algebra making the functions \mathrm{ev}_{A}\colon|GI|\to[0,1] (A\in\Sigma_{I}) measurable, where \mathrm{ev}_{A}(\mu)=\mu(A). The unit \eta_{I}\colon I\to GI assigns to each x\in I the Dirac distribution \mathbf{d}_{x} centered at x. For every f\colon I\to GJ, the Kleisli extension f^{\sharp}\colon GI\to GJ is given by (f^{\sharp}(\mu))(A)=\int_{x}f(x)(A)~d\mu(x) for each \mu\in GI. We also denote by G_{s} the subprobabilistic variant of G (called the sub-Giry monad), where the underlying set |G_{s}I| of G_{s}I is relaxed to the set of subprobability measures over I.

The Giry monad GG (resp. the sub-Giry monad GsG_{s}) carries a (commutative) strength θI,J:I×GJG(I×J)\theta_{I,J}\colon I\times GJ\to G(I\times J) over the CC (𝐌𝐞𝐚𝐬,1,(×))({\bf Meas},1,(\times)). It computes the product of measures ((x,μ)𝐝xμ(x,\mu)\mapsto\mathbf{d}_{x}\otimes\mu). Therefore (𝐌𝐞𝐚𝐬,G)({\bf Meas},G) and (𝐌𝐞𝐚𝐬,Gs)({\bf Meas},G_{s}) are (well-pointed) CC-SMs.
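For intuition, the finitely supported analogue of the Giry monad can be written down directly; the Python sketch below represents a distribution as a dictionary from values to probabilities, with the unit as the Dirac distribution, the Kleisli extension as integration of f against \mu, and the strength as pairing. This discrete analogue only illustrates the shape of the structure and sidesteps all measurability issues.

def dirac(x):
    """Unit eta_I(x): the Dirac distribution at x."""
    return {x: 1.0}

def kleisli_ext(f):
    """Kleisli extension: (f#(mu))({y}) = sum over x of f(x)({y}) * mu({x})."""
    def f_sharp(mu):
        out = {}
        for x, p in mu.items():
            for y, q in f(x).items():
                out[y] = out.get(y, 0.0) + p * q
        return out
    return f_sharp

def strength(x, mu):
    """Strength theta: (x, mu) |-> d_x tensor mu, the product of measures."""
    return {(x, y): p for y, p in mu.items()}

coin = {0: 0.5, 1: 0.5}
print(kleisli_ext(lambda b: coin)(coin))   # binding a fair coin to a fair coin
print(strength("tag", coin))               # {('tag', 0): 0.5, ('tag', 1): 0.5}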

Quasi-Borel Spaces.

The category 𝐌𝐞𝐚𝐬{\bf Meas} is not suitable for the semantics of higher-order programming languages since it is not Cartesian closed ([4]). For the treatment of higher-order probabilistic programs with continuous distributions, we employ the Cartesian closed category 𝐐𝐁𝐒{\bf QBS} of quasi-Borel spaces and morphisms between them, together with the probability monad PP on 𝐐𝐁𝐒{\bf QBS} ([28]). A quasi-Borel space is a pair I=(|I|,MI)I=(|I|,M_{I}) of a set |I||I| and a subset MIM_{I} of the function space |I|\mathbb{R}\Rightarrow|I| satisfying

  1. for all \alpha\in M_{I} and measurable functions f\colon\mathbb{R}\to\mathbb{R}, \alpha\circ f\in M_{I};

  2. for any x\in|I|, (\lambda r\in\mathbb{R}.x)\in M_{I};

  3. for all measurable P\colon\mathbb{R}\to\mathbb{N} and families \{\alpha_{i}\}_{i\in\mathbb{N}} of functions \alpha_{i}\in M_{I}, (\lambda r\in\mathbb{R}.\alpha_{P(r)}(r))\in M_{I}.

A morphism f\colon(|I|,M_{I})\to(|J|,M_{J}) is a function f\colon|I|\to|J| such that f\circ\alpha\in M_{J} holds for all \alpha\in M_{I}. The category {\bf QBS} is a (well-pointed) CCC, and has all countable products and coproducts, which are strictly preserved by the forgetful functor |{-}|\colon{\bf QBS}\to{\bf Set}. The forgetful functor is naturally isomorphic to the global element functor {\bf QBS}(1,-).

Connection to Measurable Spaces: an Adjunction

We can convert between measurable spaces and quasi-Borel spaces using an adjunction L\dashv K\colon{\bf Meas}\to{\bf QBS}. The two functors are given by

LI \triangleq(|I|,\{U\subseteq|I|~|~\forall\alpha\in M_{I}.\alpha^{-1}(U)\in\Sigma_{\mathbb{R}}\}) \qquad Lf \triangleq f
KI\displaystyle KI (|I|,𝐌𝐞𝐚𝐬(,I))\displaystyle\triangleq(|I|,{\bf Meas}(\mathbb{R},I)) Kf\displaystyle Kf f\displaystyle\triangleq f

For any standard Borel space \Omega\in{\bf Meas}, we have LK\Omega=\Omega. The right adjoint K is fully faithful when restricted to standard Borel spaces [28, Proposition 15-(2)], and it preserves countable coproducts and function spaces (if they exist) of standard Borel spaces [28, Proposition 19].

Probability Measures and the Probability Monad.

A probability measure on a quasi-Borel space II is a pair (α,μ)MI×G(\alpha,\mu)\in M_{I}\times G\mathbb{R}. We introduce an equivalence relation I\sim_{I} over probability measures on II by

(α,μ)I(β,ν)μ(α1())=ν(β1()).(\alpha,\mu)\sim_{I}(\beta,\nu)\iff\mu(\alpha^{-1}(-))=\nu(\beta^{-1}(-)).

Using this, we introduce a probability monad PP on 𝐐𝐁𝐒{\bf QBS} as follows:

  • On objects, we define P:𝐎𝐛𝐣(𝐐𝐁𝐒)𝐎𝐛𝐣(𝐐𝐁𝐒)P:{\bf Obj}({\bf QBS})\to{\bf Obj}({\bf QBS}) by

    |P(I)|\displaystyle|P(I)| (MI×G)/I,\displaystyle\triangleq(M_{I}\times G\mathbb{R})/\sim_{I}, MP(I){λr.[(α,g(r))]I|αMI,g𝐌𝐞𝐚𝐬(,G)}.\displaystyle M_{P(I)}\triangleq\{\lambda r.[(\alpha,g(r))]_{\sim_{I}}~{}|~{}\alpha\in M_{I},g\in{\bf Meas}(\mathbb{R},G\mathbb{R})\}.
  • The unit is defined by ηI(x)[λr.x,μ]I\eta_{I}(x)\triangleq[\lambda r.x,\mu]_{\sim_{I}} for an arbitrary μG\mu\in G\mathbb{R}.

  • The Kleisli extension of f\colon I\to P(J) is defined by f^{\sharp}[\alpha,\mu]_{\sim_{I}}\triangleq[\beta,g^{\sharp}\mu]_{\sim_{J}}, where \beta\in M_{J} and g\in{\bf Meas}(\mathbb{R},G\mathbb{R}) are chosen so that f\circ\alpha=\lambda r\in\mathbb{R}.[\beta,g(r)]_{\sim_{J}}; such \beta and g exist by the definition of M_{P(J)}.

The monad PP is (commutative) strong with respect to the CCC (𝐐𝐁𝐒,1,(×))({\bf QBS},1,(\times)).

Acknowledgments

Tetsuya Sato carried out this research under the support by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Number 20K19775, Japan. Shin-ya Katsumata carried out this research under the support by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Number 18H03204, Japan. The authors are grateful to Ichiro Hasuo for providing the opportunity to collaborate in that project. The authors are grateful to Satoshi Kura, Justin Hsu, Marco Gaboardi, Borja Balle and Gilles Barthe for fruitful discussions.

References

  • [1] Aws Albarghouthi and Justin Hsu. Constraint-based synthesis of coupling proofs. In Computer Aided Verification - 30th International Conference, CAV 2018, Proceedings, Part I, volume 10981 of LNCS, pages 327–346. Springer, 2018.
  • [2] Aws Albarghouthi and Justin Hsu. Synthesizing coupling proofs of differential privacy. PACMPL, 2(POPL):58:1–58:30, 2018.
  • [3] Thorsten Altenkirch, James Chapman, and Tarmo Uustalu. Monads need not be endofunctors. Log. Methods Comput. Sci., 11(1), 2015.
  • [4] Robert J. Aumann. Borel structures for function spaces. Illinois J. Math., 5(4):614–630, 12 1961.
  • [5] Arthur Azevedo de Amorim, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata. Probabilistic relational reasoning via metrics. In 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, pages 1–19. IEEE, 2019.
  • [6] Giorgio Bacci, Radu Mardare, Prakash Panangaden, and Gordon Plotkin. Tensor of Quantitative Equational Theories. In Fabio Gadducci and Alexandra Silva, editors, 9th Conference on Algebra and Coalgebra in Computer Science (CALCO 2021), volume 211 of Leibniz International Proceedings in Informatics (LIPIcs), pages 7:1–7:17, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
  • [7] Borja Balle, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Tetsuya Sato. Hypothesis testing interpretations and renyi differential privacy. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS 2020), volume 108 of Proceedings of Machine Learning Research, pages 2496–2506, Online, 26–28 Aug 2020. PMLR.
  • [8] Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, César Kunz, and Pierre-Yves Strub. Proving differential privacy in Hoare logic. In IEEE 27th Computer Security Foundations Symposium, CSF 2014, pages 411–424. IEEE Computer Society, 2014.
  • [9] Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth, and Pierre-Yves Strub. Higher-order approximate relational refinement types for mechanism design and differential privacy. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, pages 55–68. ACM, 2015.
  • [10] Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Proving differential privacy via probabilistic couplings. In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16, pages 749–758. ACM, 2016.
  • [11] Gilles Barthe, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Coupling proofs are probabilistic product programs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, pages 161–174, New York, NY, USA, 2017. Association for Computing Machinery.
  • [12] Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella Béguelin. Probabilistic relational reasoning for differential privacy. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, pages 97–110. ACM, 2012.
  • [13] Gilles Barthe and Federico Olmedo. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Proceedings, Part II, volume 7966 of LNCS, pages 49–60. Springer, 2013.
  • [14] Nick Benton. Simple relational correctness proofs for static analyses and program transformations. SIGPLAN Not., 39(1):14–25, January 2004.
  • [15] Filippo Bonchi, Barbara König, and Daniela Petrisan. Up-To Techniques for Behavioural Metrics via Fibrations. In 29th International Conference on Concurrency Theory (CONCUR 2018), volume 118 of Leibniz International Proceedings in Informatics (LIPIcs), pages 17:1–17:17, Dagstuhl, Germany, 2018. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
  • [16] Mark Bun, Cynthia Dwork, Guy N. Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pages 74–86, New York, NY, USA, 2018. Association for Computing Machinery.
  • [17] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography, pages 635–658, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.
  • [18] Ezgi Çiçek, Gilles Barthe, Marco Gaboardi, Deepak Garg, and Jan Hoffmann. Relational cost analysis. SIGPLAN Not., 52(1):316–329, January 2017.
  • [19] Imre Csiszár. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar. Tud. Akad. Mat. Kutato Int. Kozl., 8:85–108, 1963.
  • [20] Imre Csiszár. Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar., 2:299–318, 1967.
  • [21] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, volume 3876 of LNCS, pages 265–284. Springer Berlin Heidelberg, 2006.
  • [22] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211–407, 2013.
  • [23] Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C. Pierce. Linear dependent types for differential privacy. In The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’13, pages 357–370. ACM, 2013.
  • [24] Marco Gaboardi, Shin-ya Katsumata, Dominic Orchard, and Tetsuya Sato. Graded hoare logic and its categorical semantics. In Nobuko Yoshida, editor, Programming Languages and Systems - 30th European Symposium on Programming, ESOP 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, volume 12648 of Lecture Notes in Computer Science, pages 234–263. Springer, 2021.
  • [25] Francesco Gavazzo. Quantitative behavioural reasoning for higher-order effectful programs: Applicative distances. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’18, pages 452–461, New York, NY, USA, 2018. Association for Computing Machinery.
  • [26] Michèle Giry. A categorical approach to probability theory. In B. Banaschewski, editor, Categorical Aspects of Topology and Analysis, volume 915 of LNM, pages 68–85. Springer, 1982.
  • [27] Rob Hall. New Statistical Applications for Differential Privacy. PhD thesis, Machine Learning Department School of Computer Science Carnegie Mellon University, 2012.
  • [28] Chris Heunen, Ohad Kammar, Sam Staton, and Hongseok Yang. A convenient category for higher-order probability theory. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, pages 1–12, 2017.
  • [29] B. Jacobs. Categorical Logic and Type Theory. Elsevier, 1999.
  • [30] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 1376–1385, 2015.
  • [31] Shin-ya Katsumata. Parametric effect monads and semantics of effect systems. In The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, pages 633–646. ACM, 2014.
  • [32] Shin-ya Katsumata and Tetsuya Sato. Preorders on monads and coalgebraic simulations. In Frank Pfenning, editor, Foundations of Software Science and Computation Structures, pages 145–160, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
  • [33] Shin-ya Katsumata, Tetsuya Sato, and Tarmo Uustalu. Codensity lifting of monads and its dual. Logical Methods in Computer Science, 14(4), 2018.
  • [34] Max Kelly. Basic Concepts of Enriched Category Theory, volume 64. Cambridge University Press, 1982. Republished in: Reprints in Theory and Applications of Categories, No. 10 (2005) pp.1-136.
  • [35] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, Oct 2006.
  • [36] John M. Lucassen and David K. Gifford. Polymorphic effect systems. In Conference Record of the Fifteenth Annual ACM Symposium on Principles of Programming Languages, pages 47–57. ACM Press, 1988.
  • [37] Saunders Mac Lane. Categories for the Working Mathematician (Second Edition), volume 5 of Graduate Texts in Mathematics. Springer, 1998.
  • [38] R. Mardare, P. Panangaden, and G. Plotkin. On the axiomatizability of quantitative algebras. In 2017 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–12, Los Alamitos, CA, USA, jun 2017. IEEE Computer Society.
  • [39] Radu Mardare, Prakash Panangaden, and Gordon Plotkin. Quantitative algebraic reasoning. In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16, page 700–709, New York, NY, USA, 2016. Association for Computing Machinery.
  • [40] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275, Aug 2017.
  • [41] John C. Mitchell and Andre Scedrov. Notes on sconing and relators. In Computer Science Logic, 6th Workshop, CSL ’92, volume 702 of LNCS, pages 352–378. Springer, 1992.
  • [42] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.
  • [43] Tetsuzo Morimoto. Markov processes and the H-theorem. Journal of the Physical Society of Japan, 18(3):328–331, 1963.
  • [44] Hanne Riis Nielson and Flemming Nielson. Semantics with Applications: An Appetizer. Springer-Verlag, Berlin, Heidelberg, 2007.
  • [45] Federico Olmedo. Approximate Relational Reasoning for Probabilistic Programs. PhD thesis, Technical University of Madrid, 2014.
  • [46] Shiva Prasad Kasiviswanathan and Adam Smith. A note on differential privacy: Defining resistance to arbitrary side information. Journal of Privacy and Confidentiality, 6(1), 2014.
  • [47] Ivan Radiček, Gilles Barthe, Marco Gaboardi, Deepak Garg, and Florian Zuleger. Monadic refinements for relational cost analysis. Proc. ACM Program. Lang., 2(POPL):36:1–36:32, December 2017.
  • [48] Jason Reed and Benjamin C. Pierce. Distance makes the types grow stronger: a calculus for differential privacy. In Proceeding of the 15th ACM SIGPLAN international conference on Functional programming, ICFP 2010, pages 157–168. ACM, 2010.
  • [49] J. J. M. M. Rutten. Elements of generalized ultrametric domain theory. Theor. Comput. Sci., 170(1-2):349–381, December 1996.
  • [50] Tetsuya Sato. Identifying all preorders on the subdistribution monad. In Bart Jacobs, Alexandra Silva, and Sam Staton, editors, Proceedings of the 30th Conference on the Mathematical Foundations of Programming Semantics, MFPS 2014, Ithaca, NY, USA, June 12-15, 2014, volume 308 of Electronic Notes in Theoretical Computer Science, pages 309–327. Elsevier, 2014.
  • [51] Tetsuya Sato. Approximate relational hoare logic for continuous random samplings. In The Thirty-second Conference on the Mathematical Foundations of Programming Semantics, MFPS 2016, volume 325 of Electronic Notes in Theoretical Computer Science, pages 277–298. Elsevier, 2016.
  • [52] Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata. Approximate span liftings: Compositional semantics for relaxations of differential privacy. In 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, pages 1–14. IEEE, 2019.
  • [53] Ross Street. The formal theory of monads. Journal of Pure and Applied Algebra, 2(2):149 – 168, 1972.
  • [54] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.


Proposition 17.

The family 𝖢={𝖢I:(×I)2𝒩}I𝐒𝐞𝐭\mathsf{C}^{\prime}=\{{\mathsf{C}^{\prime}_{I}}\colon(\mathbb{N}\times I)^{2}\to\mathcal{N}\}_{I\in{\bf Set}} of 𝒩\mathcal{N}-divergences defined by

𝖢I((i,x),(j,y)){|ij|x=yxy.\mathsf{C}^{\prime}_{I}((i,x),(j,y))\triangleq\begin{cases}|i-j|&x=y\\ \infty&x\neq y\end{cases}.

is an \mathrm{Eq}-relative \mathcal{N}-divergence on the monad \mathbb{N}\times-.

Proof.

The monotonicity of 𝖢\mathsf{C}^{\prime} is obvious.

We show the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit-reflexivity of 𝖢\mathsf{C}^{\prime}. For all (x,y)EqI(x,y)\in{\color[rgb]{0,0,0}\mathrm{Eq}}I (that is, x=yIx=y\in I), we have

𝖢I(ηI(x),ηI(y))=𝖢I((0,x),(0,y))=0.\mathsf{C}^{\prime}_{I}(\eta_{I}(x),\eta_{I}(y))={\mathsf{C}^{\prime}_{I}}{((0,x),(0,y))}=0.

We show the \mathrm{Eq}-composability of \mathsf{C}^{\prime}. Let (i,x),(j,y)\in\mathbb{N}\times I and f,g\colon I\to\mathbb{N}\times J. We write f(z)=(i_{z},f_{z}) and g(z)=(j_{z},g_{z}) for each z\in I.

  • If x=y and f_{z}=g_{z} for all z\in I, we have

    \mathsf{C}^{\prime}_{J}(f^{\sharp}(i,x),g^{\sharp}(j,y)) =\mathsf{C}^{\prime}_{J}((i+i_{x},f_{x}),(j+j_{x},g_{x}))
    =|(i+i_{x})-(j+j_{x})|\leq|i-j|+|i_{x}-j_{x}|
    \leq\mathsf{C}^{\prime}_{I}((i,x),(j,y))+\sup_{(x,y)\in\mathrm{Eq}I(\iff x=y\in I)}\mathsf{C}^{\prime}_{J}(f(x),g(y))
  • If xyx\neq y or fzgzf_{z}\neq g_{z} for some zIz\in I, we have

    𝖢J(f(i,x),g(j,y))=𝖢I((i,x),(j,y))+sup(x,y)EqI(x=yI)𝖢J(f(x),g(y)).\mathsf{C}^{\prime}_{J}(f^{\sharp}(i,x),g^{\sharp}(j,y))\leq\infty=\mathsf{C}^{\prime}_{I}((i,x),(j,y))+\sup_{(x,y)\in{\color[rgb]{0,0,0}\mathrm{Eq}}I(\iff x=y\in I)}\mathsf{C}^{\prime}_{J}(f(x),g(y)).

This completes the proof. ∎
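A direct transcription of \mathsf{C}^{\prime} (with \infty represented by math.inf) may help in reading the proof; the following Python sketch instantiates the composability inequality for the writer monad \mathbb{N}\times- at a tiny example. The functions f and g below are hypothetical and serve only as an illustration of the statement.

import math

def c_prime(p, q):
    """C'_I((i, x), (j, y)) = |i - j| if x == y, and infinity otherwise."""
    (i, x), (j, y) = p, q
    return abs(i - j) if x == y else math.inf

def kleisli(f, p):
    """Kleisli extension of the writer monad N x (-): f#(i, x) = (i + i_x, f_x)."""
    i, x = p
    ix, fx = f(x)
    return (i + ix, fx)

f = lambda z: (3, z.upper())             # adds cost 3, same underlying value for f and g
g = lambda z: (5, z.upper())             # adds cost 5, same underlying value for f and g

p, q = (1, "a"), (2, "a")
lhs = c_prime(kleisli(f, p), kleisli(g, q))                          # |4 - 7| = 3
rhs = c_prime(p, q) + max(c_prime(f(z), g(z)) for z in ["a", "b"])   # 1 + 2 = 3
assert lhs <= rhs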

Proposition 18.

The family 𝖭𝖢={𝖭𝖢I:(P(×I))2𝒩}I𝐒𝐞𝐭\mathsf{NC}=\{\mathsf{NC}_{I}\colon(P(\mathbb{N}\times I))^{2}\to\mathcal{N}\}_{I\in{\bf Set}} of 𝒩\mathcal{N}-divergences defined by

\mathsf{NC}_{I}(A,B)\triangleq\sup_{(i,x)\in A,(j,y)\in B}|i-j|

is a Top{\color[rgb]{0,0,0}\mathrm{Top}}-relative 𝒩\mathcal{N}-divergence on the monad P(×)P(\mathbb{N}\times-).

Proof.

The monotonicity of 𝖭𝖢\mathsf{NC} is obvious.

We show the Top{\color[rgb]{0,0,0}\mathrm{Top}}-unit-reflexivity of 𝖭𝖢\mathsf{NC} . For all (x,y)TopI(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I (that is, x,yIx,y\in I), we have

𝖭𝖢I(ηI(x),ηI(y))=𝖭𝖢I({(0,x)},{(0,y)})=|00|=0.\mathsf{NC}_{I}(\eta_{I}(x),\eta_{I}(y))={\mathsf{NC}_{I}}{(\{(0,x)\},\{(0,y)\})}=|0-0|=0.

We show the Top{\color[rgb]{0,0,0}\mathrm{Top}}-composability of 𝖭𝖢\mathsf{NC}. For all f,g:IP(×J)f,g\colon I\to P(\mathbb{N}\times J) and A,BP(×I)A,B\in P(\mathbb{N}\times I), we have

𝖭𝖢J(fA,gB)\displaystyle\mathsf{NC}_{J}(f^{\sharp}A,g^{\sharp}B) =sup{|ij|(i,x)f(A),(j,y)g(B)}\displaystyle=\sup\{|i-j|\mid(i,x)\in f^{\sharp}(A),(j,y)\in g^{\sharp}(B)\}
=sup{|i1+i2j1j2||(i1,x)A,(j1,y)B,(i2,x)f(x),(j2,y)g(y)}\displaystyle=\sup\left\{|i_{1}+i_{2}-j_{1}-j_{2}|~{}\middle|\begin{array}[]{l@{}}(i_{1},x)\in A,(j_{1},y)\in B,\\ (i_{2},x^{\prime})\in f(x),(j_{2},y^{\prime})\in g(y)\end{array}\right\}
sup{|i1j1||(i1,x)A,(j1,y)B}\displaystyle\leq\sup\{|i_{1}-j_{1}|~{}|~{}(i_{1},x)\in A,(j_{1},y)\in B\}
+sup(x,y)TopI(x,yI){|i2j2||(i2,x)f(x),(j2,y)g(y)}\displaystyle\qquad+\sup_{(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I(\iff x,y\in I)}\{|i_{2}-j_{2}|~{}|~{}(i_{2},x^{\prime})\in f(x),(j_{2},y^{\prime})\in g(y)\}
=𝖭𝖢I(A,B)+sup(x,y)TopI(x,yI)𝖭𝖢J(f(x),g(y)).\displaystyle=\mathsf{NC}_{I}(A,B)+\sup_{(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I(\iff x,y\in I)}\mathsf{NC}_{J}(f(x),g(y)).

This completes the proof. ∎

Proposition 19.

The family \mathsf{NCI}=\{\mathsf{NCI}_{I}\colon(P(\mathbb{N}\times I))^{2}\to\mathcal{Z}\}_{I\in{\bf Set}} of \mathcal{Z}-divergences defined by

𝖭𝖢𝖨I(A,B)sup(i,x)A,(j,y)Bij\mathsf{NCI}_{I}(A,B)\triangleq\sup_{(i,x)\in A,(j,y)\in B}i-j

is a Top{\color[rgb]{0,0,0}\mathrm{Top}}-relative 𝒵\mathcal{Z}-divergence on the monad P(×)P(\mathbb{N}\times-).

Proof.

The monotonicity of 𝖭𝖢𝖨\mathsf{NCI} is obvious.

We show the Top{\color[rgb]{0,0,0}\mathrm{Top}}-unit-reflexivity of 𝖭𝖢𝖨\mathsf{NCI} . For all (x,y)TopI(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I (that is, x,yIx,y\in I), we have

𝖭𝖢𝖨I(ηI(x),ηI(y))=𝖭𝖢𝖨I({(0,x)},{(0,y)})=00=0.\mathsf{NCI}_{I}(\eta_{I}(x),\eta_{I}(y))={\mathsf{NCI}_{I}}{(\{(0,x)\},\{(0,y)\})}=0-0=0.

We show the Top{\color[rgb]{0,0,0}\mathrm{Top}}-composability of 𝖭𝖢𝖨\mathsf{NCI}. For all f,g:IP(×J)f,g\colon I\to P(\mathbb{N}\times J) and A,BP(×I)A,B\in P(\mathbb{N}\times I), we have

𝖭𝖢𝖨J(fA,gB)\displaystyle\mathsf{NCI}_{J}(f^{\sharp}A,g^{\sharp}B) =sup{ij(i,x)f(A)(j,y)g(B)}\displaystyle=\sup\{i-j\mid(i,x)\in f^{\sharp}(A)\land(j,y)\in g^{\sharp}(B)\}
=sup{i1+i2j1j2|(i1,x)A,(j1,y)B,(i2,x)f(x),(j2,y)g(y)}\displaystyle=\sup\left\{i_{1}+i_{2}-j_{1}-j_{2}~{}\middle|\begin{array}[]{l@{}}(i_{1},x)\in A,(j_{1},y)\in B,\\ (i_{2},x^{\prime})\in f(x),(j_{2},y^{\prime})\in g(y)\end{array}\right\}
sup{i1j1|(i1,x)A,(j1,y)B}\displaystyle\leq\sup\{i_{1}-j_{1}~{}|~{}(i_{1},x)\in A,(j_{1},y)\in B\}
+sup(x,y)TopI(x,yI){i2j2|(i2,x)f(x),(j2,y)g(y)}\displaystyle\qquad+\sup_{(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I(\iff x,y\in I)}\{i_{2}-j_{2}~{}|~{}(i_{2},x^{\prime})\in f(x),(j_{2},y^{\prime})\in g(y)\}
=𝖭𝖢𝖨I(A,B)+sup(x,y)TopI(x,yI)𝖭𝖢𝖨J(f(x),g(y)).\displaystyle=\mathsf{NCI}_{I}(A,B)+\sup_{(x,y)\in{\color[rgb]{0,0,0}\mathrm{Top}}I(\iff x,y\in I)}\mathsf{NCI}_{J}(f(x),g(y)).

This completes the proof. ∎
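A similar mechanical check can be run for \mathsf{NC} and \mathsf{NCI} on the finite fragment of the monad P(\mathbb{N}\times-). The Python sketch below (again only an illustration, with our own helper names) represents computations as finite sets of pairs, implements the Kleisli extension f^{\sharp}(A)=\{(i+i^{\prime},y)\mid(i,x)\in A,(i^{\prime},y)\in f(x)\}, and tests the \mathrm{Top}-composability inequality on sample inputs.

def bind(A, f):
    # Kleisli extension of P(N x -): shift costs and take the union
    return {(i + j, y) for (i, x) in A for (j, y) in f(x)}

def nc(A, B):
    # NC_I(A, B) = sup_{(i,x) in A, (j,y) in B} |i - j|
    return max((abs(i - j) for (i, _) in A for (j, _) in B), default=0)

def nci(A, B):
    # NCI_I(A, B) = sup_{(i,x) in A, (j,y) in B} (i - j)
    return max((i - j for (i, _) in A for (j, _) in B), default=0)

I = [0, 1]
f = lambda x: {(x, 0), (x + 2, 1)}
g = lambda x: {(2 * x, 0)}

A = {(0, 0), (4, 1)}
B = {(1, 0), (2, 1)}
for div in (nc, nci):
    lhs = div(bind(A, f), bind(B, g))
    # Top relates all pairs, so the sup ranges over all x, y in I
    rhs = div(A, B) + max(div(f(x), g(y)) for x in I for y in I)
    assert lhs <= rhs

The default value 0 only matters for empty sets, which do not occur here; on the inputs above both inequalities hold with equality.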

Proof (of Proposition 1).

We have \mathrm{Eq}-unit reflexivity because the reflexivity {}^{f}\mathsf{Div}_{I}(\mu,\mu)=0 is obtained from f(1)=0. We show \mathrm{Eq}-composability. To show this, we prove a slightly stronger statement. Consider three weight functions f,f_{1},f_{2}\geq 0 with f(1)=f_{1}(1)=f_{2}(1)=0. Assume that there are some \alpha,\beta,\beta^{\prime},\gamma\in\mathbb{R} satisfying the following condition (A'): for all x,y,z,w\in[0,1], 0\leq(\beta^{\prime}z+(1-\beta^{\prime})x)+\gamma xf_{1}\left({z}/{x}\right) and

xyf(zw/xy)\displaystyle xyf\left({zw}/{xy}\right) (βw+(1β)y)xf1(z/x)+(βz+(1β)x)yf2(w/y)\displaystyle\leq(\beta w+(1-\beta)y)xf_{1}\left({z}/{x}\right)+(\beta^{\prime}z+(1-\beta^{\prime})x)yf_{2}\left({w}/{y}\right)
+γxyf1(z/x)f2(w/y)+α(xz)(wy).\displaystyle\quad+\gamma xyf_{1}\left({z}/{x}\right)f_{2}\left({w}/{y}\right)+\alpha(x-z)(w-y).

Let μ1,μ2GsI\mu_{1},\mu_{2}\in G_{s}I, and let h,k:IGsJh,k\colon I\to G_{s}J. We want to show the composability in the sense of [45, Definition 5.2]:

𝖣𝗂𝗏Jf(hμ1,kμ2)𝖣𝗂𝗏If1(μ1,μ2)+supxI𝖣𝗂𝗏Jf2(h(x),k(x))+γ𝖣𝗂𝗏If1(μ1,μ2)supxI𝖣𝗂𝗏Jf2(h(x),k(x)).\begin{split}&{{}^{f}\mathsf{Div}}_{J}(h^{\sharp}\mu_{1},k^{\sharp}\mu_{2})\\ &\leq{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})+\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x))+\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})\cdot\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x)).\end{split} (23)

We first fix a measurable partition {Ai}i=0n\{A_{i}\}_{i=0}^{n} of JJ, that is a family {Ai}i=0n\{A_{i}\}_{i=0}^{n} of measurable subsets AiΣJA_{i}\in\Sigma_{J} satisfying ijAiAj=i\neq j\implies A_{i}\cap A_{j}=\emptyset and i=0nAi=J\bigcup_{i=0}^{n}A_{i}=J. For each 0in0\leq i\leq n, we fix two monotone increasing sequences {hli}l=0\{h^{i}_{l}\}_{l=0}^{\infty} and {kli}l=0\{k^{i}_{l}\}_{l=0}^{\infty} of simple functions that converge uniformly to measurable functions h()(Ai):I[0,1]h(-)(A_{i})\colon I\to[0,1] and k()(Ai):I[0,1]k(-)(A_{i})\colon I\to[0,1] respectively. The above composability (23) is then equivalent to

limli=0n(Xkli𝑑μ2)f(Xhli𝑑μ1Xkli𝑑μ2)𝖣𝗂𝗏If1(μ1,μ2)+supxI𝖣𝗂𝗏Jf2(h(x),k(x))+γ𝖣𝗂𝗏If1(μ1,μ2)supxI𝖣𝗂𝗏Jf2(h(x),k(x)).\begin{split}&\lim_{l\to\infty}\sum_{i=0}^{n}(\int_{X}k^{i}_{l}~{}d\mu_{2})f\left(\frac{\int_{X}h^{i}_{l}~{}d\mu_{1}}{\int_{X}k^{i}_{l}~{}d\mu_{2}}\right)\\ &\leq{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})+\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x))+\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x)).\end{split} (24)

We fix ll\in\mathbb{N}. We suppose hli=j=0mαjiχBjh^{i}_{l}=\sum_{j=0}^{m}\alpha^{i}_{j}\chi_{B_{j}} and kli=j=0mβjiχBjk^{i}_{l}=\sum_{j=0}^{m}\beta^{i}_{j}\chi_{B_{j}} for some αji,βji[0,1]\alpha^{i}_{j},\beta^{i}_{j}\in[0,1] (0jm0\leq j\leq m) and a measurable partition {Bj}j=0m\{B_{j}\}_{j=0}^{m} of II.

Thanks to the condition (A’), we calculate as follows:

i=0n(Xkli𝑑μ2)f(Xhli𝑑μ1Xkli𝑑μ2)\displaystyle\sum_{i=0}^{n}(\int_{X}k^{i}_{l}~{}d\mu_{2})f\left(\frac{\int_{X}h^{i}_{l}~{}d\mu_{1}}{\int_{X}k^{i}_{l}~{}d\mu_{2}}\right)
i=0nj=0mβjiμ2(Bj)f(αjiμ1(Bj)βjiμ2(Bj))\displaystyle\leq\sum_{i=0}^{n}\sum_{j=0}^{m}\beta^{i}_{j}\mu_{2}(B_{j})f\left(\frac{\alpha^{i}_{j}\mu_{1}(B_{j})}{\beta^{i}_{j}\mu_{2}(B_{j})}\right)
i=0nj=0m(βαji+(1β)βji)μ2(Bj)f1(μ1(Bj)μ2(Bj))V1\displaystyle\leq\quad\underbrace{\sum_{i=0}^{n}\sum_{j=0}^{m}(\beta\alpha^{i}_{j}+(1-\beta)\beta^{i}_{j})~{}\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)}_{\triangleq V_{1}}
    \displaystyle\quad+\underbrace{\sum_{i=0}^{n}\sum_{j=0}^{m}\left(\beta^{\prime}\mu_{1}(B_{j})+(1-\beta^{\prime})\mu_{2}(B_{j})+\gamma\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)\right)\beta^{i}_{j}f_{2}\left(\frac{\alpha^{i}_{j}}{\beta^{i}_{j}}\right)}_{\triangleq V_{2}}
+i=0nj=0mα(μ2(Bj)μ1(Bj))(αjiβji)V3\displaystyle\quad+\underbrace{\sum_{i=0}^{n}\sum_{j=0}^{m}\alpha(\mu_{2}(B_{j})-\mu_{1}(B_{j}))(\alpha^{i}_{j}-\beta^{i}_{j})}_{\triangleq V_{3}}

We evaluate the above three subexpressions V1,V2,V3V_{1},V_{2},V_{3} as follows.

We evaluate V1V_{1} as follows:

V1\displaystyle V_{1} (sup0jmi=0n(βαji+(1β)βji))j=0mμ2(Bj)f1(μ1(Bj)μ2(Bj))\displaystyle\leq\left(\sup_{0\leq j\leq m}\sum_{i=0}^{n}(\beta\alpha^{i}_{j}+(1-\beta)\beta^{i}_{j})\right)\cdot\sum_{j=0}^{m}\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)
=supxI(βi=0nhli(x)+(1β)i=0nkli(x))j=0mμ2(Bj)f1(μ1(Bj)μ2(Bj))\displaystyle=\sup_{x\in I}\left(\beta\sum_{i=0}^{n}h^{i}_{l}(x)+(1-\beta)\sum_{i=0}^{n}k^{i}_{l}(x)\right)\cdot\sum_{j=0}^{m}\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)
supxI(βi=0nhli(x)+(1β)i=0nkli(x))𝖣𝗂𝗏If1(μ1,μ2)\displaystyle\leq\sup_{x\in I}\left(\beta\sum_{i=0}^{n}h^{i}_{l}(x)+(1-\beta)\sum_{i=0}^{n}k^{i}_{l}(x)\right)\cdot{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})
lsupxI(βh(x)(J)+(1β)k(x)(J))𝖣𝗂𝗏If1(μ1,μ2)\displaystyle\xrightarrow[]{~{}l\to\infty~{}}\sup_{x\in I}\left(\beta h(x)(J)+(1-\beta)k(x)(J)\right)\cdot{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})
𝖣𝗂𝗏If1(μ1,μ2)\displaystyle\leq{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})

Here, the first inequality follows from the non-negativity of each \mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right); the equality is given by the definition of \alpha_{j}^{i} and \beta_{j}^{i}; the second inequality is given by the continuity of {{}^{f_{1}}\mathsf{Div}} ([35, Theorem 16]; [52, Theorem 3] for the sub-Giry monad G_{s}):

𝖣𝗂𝗏If1(μ1,μ2)=sup{j=0mμ2(Bj)f1(μ1(Bj)μ2(Bj))|{Bj}j=0m:measurable partition of I};{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})=\sup\left\{\sum_{j=0}^{m}\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)\middle|\{B_{j}\}_{j=0}^{m}\colon\text{measurable partition of }I\right\};

the last inequality is derived from \beta h(x)(J)+(1-\beta)k(x)(J)\in[0,1], which follows from the assumption that either \beta\in[0,1] or h(x)(J)=k(x)(J) for all x\in I holds.

We next evaluate V2V_{2} as follows:

V_{2} \displaystyle\leq\left(\sup_{0\leq j\leq m}\sum_{i=0}^{n}\beta^{i}_{j}f_{2}\left(\frac{\alpha^{i}_{j}}{\beta^{i}_{j}}\right)\right)\sum_{j=0}^{m}\left(\beta^{\prime}\mu_{1}(B_{j})+(1-\beta^{\prime})\mu_{2}(B_{j})+\gamma\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)\right)
=(supxIi=0nkli(x)f2(hli(x)kli(x)))(βμ1(I)+(1β)μ2(I)+γj=0mμ2(Bj)f1(μ1(Bj)μ2(Bj)))\displaystyle=\left(\sup_{x\in I}\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left(\frac{h^{i}_{l}(x)}{k^{i}_{l}(x)}\right)\right)\left(\beta^{\prime}\mu_{1}(I)+(1-\beta^{\prime})\mu_{2}(I)+\gamma\sum_{j=0}^{m}\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right)\right)
(supxIi=0nkli(x)f2(hli(x)kli(x)))(βμ1(I)+(1β)μ2(I)+γ𝖣𝗂𝗏If1(μ1,μ2))\displaystyle\leq\left(\sup_{x\in I}\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left(\frac{h^{i}_{l}(x)}{k^{i}_{l}(x)}\right)\right)\left(\beta^{\prime}\mu_{1}(I)+(1-\beta^{\prime})\mu_{2}(I)+\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})\right)
l(supxIi=0nk(x)(Ai)f2(h(x)(Ai)k(x)(Ai)))(βμ1(I)+(1β)μ2(I)+γ𝖣𝗂𝗏If1(μ1,μ2))\displaystyle\xrightarrow[]{~{}l\to\infty~{}}\left(\sup_{x\in I}\sum_{i=0}^{n}k(x)(A_{i})f_{2}\left(\frac{h(x)(A_{i})}{k(x)(A_{i})}\right)\right)\left(\beta^{\prime}\mu_{1}(I)+(1-\beta^{\prime})\mu_{2}(I)+\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})\right)
supxI𝖣𝗂𝗏Jf2(h(x),k(x))(βμ1(I)+(1β)μ2(I)+γ𝖣𝗂𝗏If1(μ1,μ2))\displaystyle\leq\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x))\left(\beta^{\prime}\mu_{1}(I)+(1-\beta^{\prime})\mu_{2}(I)+\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})\right)
supxI𝖣𝗂𝗏Jf2(h(x),k(x))γ𝖣𝗂𝗏If1(μ1,μ2)\displaystyle\leq\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x))\cdot\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})
=γ𝖣𝗂𝗏If1(μ1,μ2)supxI𝖣𝗂𝗏Jf2(h(x),k(x)).\displaystyle=\gamma{{}^{f_{1}}\mathsf{Div}}_{I}(\mu_{1},\mu_{2})\cdot\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}_{J}(h(x),k(x)).

Here, the first inequality is derived from the non-negativity of each

(βμ1(Bj)+(1β)μ2(Bj))+γμ2(Bj)f1(μ1(Bj)μ2(Bj));(\beta^{\prime}\mu_{1}(B_{j})+(1-\beta^{\prime})\mu_{2}(B_{j}))+\gamma\mu_{2}(B_{j})f_{1}\left(\frac{\mu_{1}(B_{j})}{\mu_{2}(B_{j})}\right); (25)

the first equality is given by the definition of \alpha_{j}^{i} and \beta_{j}^{i} and the countable additivity of \mu_{1} and \mu_{2}; the second inequality is given by the continuity of {{}^{f_{1}}\mathsf{Div}} and 0\leq\gamma; the last inequality is derived from \beta^{\prime}\mu_{1}(I)+(1-\beta^{\prime})\mu_{2}(I)\in[0,1], which follows from the assumption that either \beta^{\prime}\in[0,1] or \mu_{1}(I)=\mu_{2}(I) holds. We prove the third inequality. Since f_{2} is a convex function and the sequences \{h^{i}_{l}(x)\}_{l=0}^{\infty} and \{k^{i}_{l}(x)\}_{l=0}^{\infty} are monotone increasing at each x\in I, Jensen's inequality implies that the sequence \left\{\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left({h^{i}_{l}(x)}/{k^{i}_{l}(x)}\right)\right\}_{l=0}^{\infty} is monotone increasing for each x\in I. Then the sequence \left\{\sup_{x\in I}\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left({h^{i}_{l}(x)}/{k^{i}_{l}(x)}\right)\right\}_{l=0}^{\infty} of suprema is also monotone increasing, because each \sum_{i=0}^{n}k^{i}_{l+1}(x)f_{2}\left({h^{i}_{l+1}(x)}/{k^{i}_{l+1}(x)}\right) is greater than or equal to \sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left({h^{i}_{l}(x)}/{k^{i}_{l}(x)}\right). Hence,

limlsupxIi=0nkli(x)f2(hli(x)kli(x))\displaystyle\lim_{l\to\infty}\sup_{x\in I}\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left(\frac{h^{i}_{l}(x)}{k^{i}_{l}(x)}\right) =suplsupxIi=0nkli(x)f2(hli(x)kli(x))\displaystyle=\sup_{l\in\mathbb{N}}\sup_{x\in I}\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left(\frac{h^{i}_{l}(x)}{k^{i}_{l}(x)}\right)
=supxIsupli=0nkli(x)f2(hli(x)kli(x))\displaystyle=\sup_{x\in I}\sup_{l\in\mathbb{N}}\sum_{i=0}^{n}k^{i}_{l}(x)f_{2}\left(\frac{h^{i}_{l}(x)}{k^{i}_{l}(x)}\right)
=supxIi=0nk(x)(Ai)f2(h(x)(Ai)k(x)(Ai))\displaystyle=\sup_{x\in I}\sum_{i=0}^{n}k(x)(A_{i})f_{2}\left(\frac{h(x)(A_{i})}{k(x)(A_{i})}\right)
supxI𝖣𝗂𝗏f2(h(x),k(x)).\displaystyle\leq\sup_{x\in I}{{}^{f_{2}}\mathsf{Div}}(h(x),k(x)).

Finally, we evaluate V3V_{3} as follows:

V_{3} \displaystyle=\sum_{j=0}^{m}\alpha(\mu_{2}(B_{j})-\mu_{1}(B_{j}))\left(\sum_{i=0}^{n}(\alpha^{i}_{j}-\beta^{i}_{j})\right)
\displaystyle=\alpha\left(\sum_{i=0}^{n}\int_{I}h^{i}_{l}~d\mu_{2}-\sum_{i=0}^{n}\int_{I}k^{i}_{l}~d\mu_{2}+\sum_{i=0}^{n}\int_{I}k^{i}_{l}~d\mu_{1}-\sum_{i=0}^{n}\int_{I}h^{i}_{l}~d\mu_{1}\right)
\displaystyle\xrightarrow[]{~l\to\infty~}\alpha\left(\int_{I}h(-)(J)~d\mu_{2}-\int_{I}k(-)(J)~d\mu_{2}+\int_{I}k(-)(J)~d\mu_{1}-\int_{I}h(-)(J)~d\mu_{1}\right).

Here, if either \alpha=0 or h(x)(J)=k(x)(J) for every x\in I holds, then the limit is 0. Summing up the above evaluations of V_{1},V_{2},V_{3}, we obtain the inequality (24) if either of the following holds:

  1. \mu_{1}(I)=\mu_{2}(I)=1 and \forall x\in I.~h(x)(J)=k(x)(J)=1, or

  2. \alpha=0 and \beta,\beta^{\prime}\in[0,1].

This completes the proof. ∎

Proof. Parameters for Proposition 1 for the weight functions of \mathsf{TV}, \mathsf{KL}, \mathsf{HD} and \mathsf{Chi} are shown in Table 4. Below, we check the conditions in Proposition 1 for each weight function; a small numerical sanity check follows the list.

  • For the weight function f(t)=|t-1|/2 of \mathsf{TV}, the tuple (\gamma,\alpha,\beta,\beta^{\prime})=(0,0,1,0) satisfies, for all x,y,z,w\in[0,1],

    0\displaystyle 0 w+xf(z/x),\displaystyle\leq w+xf({z}/{x}),
    \displaystyle xyf(zw/xy)=|{zw}-xy|/2\leq(|{zw}-{wx}|+|{xw}-xy|)/2=wxf({z}/{x})+xyf({w}/{y}).
  • For the weight function f(t)=t\log(t)-t+1 of \mathsf{KL}, the tuple (\gamma,\alpha,\beta,\beta^{\prime})=(0,-1,1,1) satisfies, for all x,y,z,w\in[0,1],

    0\displaystyle 0 z+xf(z/x),\displaystyle\leq z+xf(z/x),
    xy((zw/xy)log(zw/xy)zw/xy+1)\displaystyle xy((zw/xy)\log(zw/xy)-zw/xy+1)
    =zwlog(w/y)+zwlog(z/x)zw+xy\displaystyle=zw\log(w/y)+zw\log(z/x)-zw+xy
    =xw((z/x)log(z/x)z/x+1)+zy((w/y)log(w/y)w/y+1)(xz)(wy).\displaystyle=xw((z/x)\log(z/x)-z/x+1)+zy((w/y)\log(w/y)-w/y+1)-(x-z)(w-y).
  • For the weight function f(t)=(t1)2/2f(t)=(\sqrt{t}-1)^{2}/2 of 𝖧𝖣\mathsf{HD}, the tuple (γ,α,β,β)=(0,1/4,1/2,1/2)(\gamma,\alpha,\beta,\beta^{\prime})=(0,-1/4,1/2,1/2) satisfies for all x,y,z,w[0,1]x,y,z,w\in[0,1],

    0\displaystyle 0 (z+x)/2+f(z/x),\displaystyle\leq(z+x)/2+f(z/x),
    xyf(zw/xy)\displaystyle xyf(zw/xy) =(zw+xy)/2((x+z)(xz)2)((y+w)(yw)2)/4\displaystyle=(zw+xy)/2-((x+z)-(\sqrt{x}-\sqrt{z})^{2})((y+w)-(\sqrt{y}-\sqrt{w})^{2})/4
    =(zw+xy)/2((x+z)xf(z/x))((y+w)yf(w/y))/4\displaystyle=(zw+xy)/2-((x+z)-xf(z/x))((y+w)-yf(w/y))/4
    (y+w)/2xf(z/x)+(x+z)/2yf(w/y)(xz)(wy)/4.\displaystyle\leq(y+w)/2\cdot xf(z/x)+(x+z)/2\cdot yf(w/y)-(x-z)(w-y)/4.
  • For the weight function f(t)=(t-1)^{2} of \mathsf{Chi}, the tuple (\gamma,\alpha,\beta,\beta^{\prime})=(1,-2,2,2) satisfies, for all x,y,z,w\in[0,1],

    0\displaystyle 0 (2zx)+xf(z/x)=(2zx)+((z/x)1)(zx)=z+(z2/x),\displaystyle\leq(2z-x)+xf(z/x)=(2z-x)+((z/x)-1)(z-x)=z+(z^{2}/x),
    xyf(zw/xy)=z2w2/xy+xy2zw\displaystyle xyf(zw/xy)=z^{2}w^{2}/xy+xy-2zw
    =(xf(z/x)+2zx)(yf(w/y)+2wy)2zw+xy\displaystyle=(xf(z/x)+2z-x)(yf(w/y)+2w-y)-2zw+xy
    =(2wy)xf(z/x)+(2zx)yf(w/y)+xyf(z/x)f(w/y)2(xz)(wy).\displaystyle=(2w-y)xf(z/x)+(2z-x)yf(w/y)+xyf(z/x)f(w/y)-2(x-z)(w-y).
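As an informal cross-check of these parameter choices, the following Python sketch samples x,y,z,w from (0,1] and tests the main inequality of condition (A') for each weight function with the tuple (\gamma,\alpha,\beta,\beta^{\prime}) listed above, taking f=f_{1}=f_{2}; the code and its naming are ours, and the small tolerance only absorbs floating-point error.

import math, random

# weight function f(t) and the tuple (gamma, alpha, beta, beta')
cases = {
    "TV":  (lambda t: abs(t - 1) / 2,               (0, 0, 1, 0)),
    "KL":  (lambda t: t * math.log(t) - t + 1,      (0, -1, 1, 1)),
    "HD":  (lambda t: (math.sqrt(t) - 1) ** 2 / 2,  (0, -1 / 4, 1 / 2, 1 / 2)),
    "Chi": (lambda t: (t - 1) ** 2,                 (1, -2, 2, 2)),
}

random.seed(0)
for name, (f, (g, a, b, bp)) in cases.items():
    for _ in range(10000):
        x, y, z, w = (random.uniform(0.01, 1.0) for _ in range(4))
        lhs = x * y * f(z * w / (x * y))
        rhs = ((b * w + (1 - b) * y) * x * f(z / x)
               + (bp * z + (1 - bp) * x) * y * f(w / y)
               + g * x * y * f(z / x) * f(w / y)
               + a * (x - z) * (w - y))
        assert lhs <= rhs + 1e-7 * (1 + abs(rhs)), (name, x, y, z, w)

For \mathsf{KL} and \mathsf{Chi} the two sides agree up to rounding, reflecting the equalities derived above, while for \mathsf{TV} and \mathsf{HD} the right-hand side dominates in general.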

∎

Proof (of Proposition 2).

We first show the monotonicity of \langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}}. Assume m\leq m^{\prime}. From the monotonicity of the original {\mathsf{\Delta}}, we obtain for each \nu_{1},\nu_{2}\in U^{\mathbb{C}}(SI),

(p,λΔ)Im(ν1,ν2)\displaystyle(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{m}_{I}(\nu_{1},\nu_{2}) =ΔpIm((U𝔻λI)(ν1),(U𝔻λI)(ν2))\displaystyle={\mathsf{\Delta}}^{m}_{pI}((U^{\mathbb{D}}\lambda_{I})(\nu_{1}),(U^{\mathbb{D}}\lambda_{I})(\nu_{2}))
ΔpIm((U𝔻λI)(ν1),(U𝔻λI)(ν2))\displaystyle\geq{\mathsf{\Delta}}^{m^{\prime}}_{pI}((U^{\mathbb{D}}\lambda_{I})(\nu_{1}),(U^{\mathbb{D}}\lambda_{I})(\nu_{2}))
=(p,λΔ)Im(ν1,ν2).\displaystyle=(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{m^{\prime}}_{I}(\nu_{1},\nu_{2}).

Second, we show the F-unit-reflexivity of \langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}}. For FI=(I,I,R_{FI}), we have EpI=(pI,pI,R_{FI}) for all I\in\mathbb{C}. We can calculate for all (x,y)\in R_{FI},

(p,λΔ)I1M(ηISx,ηISy)\displaystyle(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{1_{M}}_{I}(\eta^{S}_{I}\mathbin{\bullet}x,\eta^{S}_{I}\mathbin{\bullet}y) =ΔpI1M(U𝔻λIUηISx,U𝔻λIUηISy)\displaystyle={\mathsf{\Delta}}^{1_{M}}_{pI}(U^{\mathbb{D}}\lambda_{I}\circ U^{\mathbb{C}}\eta^{S}_{I}\circ x,U^{\mathbb{D}}\lambda_{I}\circ U^{\mathbb{C}}\eta^{S}_{I}\circ y)
=ΔpI1M((λIpηIS)x,(λIpηIS)y)\displaystyle={\mathsf{\Delta}}^{1_{M}}_{pI}((\lambda_{I}\circ p\eta^{S}_{I})\mathbin{\bullet}x,(\lambda_{I}\circ p\eta^{S}_{I})\mathbin{\bullet}y)
=ΔpI1M(ηpITx,ηpITy)0.\displaystyle={\mathsf{\Delta}}^{1_{M}}_{pI}(\eta^{T}_{pI}\mathbin{\bullet}x,\eta^{T}_{pI}\mathbin{\bullet}y)\leq 0.

Finally, we show the F-composability of \langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}}. For all J\in\mathbb{C}, c_{1},c_{2}\in U^{\mathbb{C}}(SI), and f_{1},f_{2}\colon I\to SJ, we can calculate

(p,λΔ)Jmn(f1c1,f2c2)\displaystyle(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{mn}_{J}(f_{1}^{\sharp}\mathbin{\bullet}c_{1},f_{2}^{\sharp}\mathbin{\bullet}c_{2}) =ΔpJmn(U𝔻λJU𝔻p(f1)c1,U𝔻λJU𝔻p(f2)c2)\displaystyle={\mathsf{\Delta}}^{mn}_{pJ}(U^{\mathbb{D}}\lambda_{J}\circ U^{\mathbb{D}}p(f_{1}^{\sharp})\circ c_{1},U^{\mathbb{D}}\lambda_{J}\circ U^{\mathbb{D}}p(f_{2}^{\sharp})\circ c_{2})
=ΔpJmn(U𝔻((λJpf1))U𝔻λIc1,U𝔻((λJpf2))U𝔻λIc2)\displaystyle={\mathsf{\Delta}}^{mn}_{pJ}(U^{\mathbb{D}}((\lambda_{J}\circ pf_{1})^{\sharp})\circ U^{\mathbb{D}}\lambda_{I}\circ c_{1},U^{\mathbb{D}}((\lambda_{J}\circ pf_{2})^{\sharp})\circ U^{\mathbb{D}}\lambda_{I}\circ c_{2})
=ΔpJmn((λJpf1)(λIc1),(λJpf2)(λIc2))\displaystyle={\mathsf{\Delta}}^{mn}_{pJ}((\lambda_{J}\circ pf_{1})^{\sharp}\mathbin{\bullet}(\lambda_{I}\mathbin{\bullet}c_{1}),(\lambda_{J}\circ pf_{2})^{\sharp}\mathbin{\bullet}(\lambda_{I}\mathbin{\bullet}c_{2}))
    \displaystyle\leq{\mathsf{\Delta}}^{m}_{pI}(\lambda_{I}\mathbin{\bullet}c_{1},\lambda_{I}\mathbin{\bullet}c_{2})+\sup_{(x,y)\in EpI}{\mathsf{\Delta}}^{n}_{pJ}((\lambda_{J}\circ pf_{1})\mathbin{\bullet}x,(\lambda_{J}\circ pf_{2})\mathbin{\bullet}y)
=(p,λΔ)Im(c1,c2)+sup(x,y)FI(p,λΔ)Jn(f1x,f2y).\displaystyle=(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{m}_{I}(c_{1},c_{2})+\sup_{(x,y)\in FI}(\langle{p,\lambda}\rangle^{*}{\mathsf{\Delta}})^{n}_{J}(f_{1}\mathbin{\bullet}x,f_{2}\mathbin{\bullet}y).

To prove the second equality, we calculate

U𝔻λJU𝔻p(fi)\displaystyle U^{\mathbb{D}}\lambda_{J}\circ U^{\mathbb{D}}p(f_{i}^{\sharp}) =U𝔻(λJpμJSpSfi)=U𝔻(μpJTTλJλSJpSfi)\displaystyle=U^{\mathbb{D}}(\lambda_{J}\circ p\mu^{S}_{J}\circ pSf_{i})=U^{\mathbb{D}}(\mu^{T}_{pJ}\circ T\lambda_{J}\circ\lambda_{SJ}\circ pSf_{i})
=U𝔻(μpJTTλJTpfiλI)=U𝔻((λJpfi)λI).\displaystyle=U^{\mathbb{D}}(\mu^{T}_{pJ}\circ T\lambda_{J}\circ Tpf_{i}\circ\lambda_{I})=U^{\mathbb{D}}((\lambda_{J}\circ pf_{i})^{\sharp}\circ\lambda_{I}).

This completes the proof. ∎

Proof (of Proposition 3).

It suffices to show Top{\color[rgb]{0,0,0}\mathrm{Top}}-unit reflexivity and Top{\color[rgb]{0,0,0}\mathrm{Top}}-composability:

ΔI𝗅𝗂𝗉,dS(ηI(x),ηI(y))=sups,sSdS(π2(s,x),π2(s,y))dS(s,s)=dS(s,s)dS(s,s)=1,\displaystyle{\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{I}(\eta_{I}(x),\eta_{I}(y))=\sup_{s^{\prime},s\in S}\frac{d_{S}(\pi_{2}(s,x),\pi_{2}(s^{\prime},y))}{d_{S}(s,s^{\prime})}=\frac{d_{S}(s,s^{\prime})}{d_{S}(s,s^{\prime})}=1,
    \displaystyle{\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{J}(F_{1}^{\sharp}(f_{1}),F_{2}^{\sharp}(f_{2}))
=sups,sSdS(π2(F1(π1f1(s))(π2f1(s))),π2(F2(π1f2(s))(π2f2(s))))dS(s,s)\displaystyle=\sup_{s^{\prime},s\in S}\frac{d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s^{\prime}))(\pi_{2}f_{2}(s^{\prime}))))}{d_{S}(s,s^{\prime})}
=sups,sSdS(π2f1(s),π2f2(s))dS(s,s)dS(π2(F1(π1f1(s))(π2f1(s))),π2(F2(π1f2(s))(π2f2(s))))dS(π2f1(s),π2f2(s))\displaystyle=\sup_{s^{\prime},s\in S}\frac{d_{S}(\pi_{2}f_{1}(s),\pi_{2}f_{2}(s^{\prime}))}{d_{S}(s,s^{\prime})}\cdot\frac{d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s^{\prime}))(\pi_{2}f_{2}(s^{\prime}))))}{d_{S}(\pi_{2}f_{1}(s),\pi_{2}f_{2}(s^{\prime}))}
sups,sSdS(π2f1(s),π2f2(s))dS(s,s)supt,tSdS(π2(F1(π1f1(s))(t)),π2(F2(π1f2(s))(t)))dS(t,t)\displaystyle\leq\sup_{s^{\prime},s\in S}\frac{d_{S}(\pi_{2}f_{1}(s),\pi_{2}f_{2}(s^{\prime}))}{d_{S}(s,s^{\prime})}\cdot\sup_{t^{\prime},t\in S}\frac{d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(t)),\pi_{2}(F_{2}(\pi_{1}f_{2}(s^{\prime}))(t^{\prime})))}{d_{S}(t,t^{\prime})}
ΔI𝗅𝗂𝗉,dS(f1,f2)supx,yIΔJ𝗅𝗂𝗉,dS(F1(x),F2(y))\displaystyle\leq{\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{I}(f_{1},f_{2})\cdot\sup_{x,y\in I}{\mathsf{\Delta}}^{\mathsf{lip},d_{S}}_{J}(F_{1}(x),F_{2}(y))

Here F_{1},F_{2}\colon I\to T_{S}J and f_{1},f_{2}\in T_{S}I. ∎

Proof (of Proposition 4).

It suffices to show Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit reflexivity and Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability:

ΔI𝗆𝖾𝗍,dS(ηI(x),ηI(x))\displaystyle{\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{I}(\eta_{I}(x),\eta_{I}(x)) =supsSdS(π2(x,s),π2(x,s))=supsSdS(s,s)=0.\displaystyle=\sup_{s\in S}d_{S}(\pi_{2}(x,s),\pi_{2}(x,s))=\sup_{s\in S}d_{S}(s,s)=0.
    \displaystyle{\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{J}(F_{1}^{\sharp}(f_{1}),F_{2}^{\sharp}(f_{2})) =\sup_{s\in S}d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s))))
supsSdS(π2(F1(π1f1(s))(π2f1(s))),π2(F2(π1f1(s))(π2f1(s))))\displaystyle\leq\sup_{s\in S}d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))))
+supsSdS(π2(F2(π1f1(s))(π2f1(s))),π2(F2(π1f1(s))(π2f2(s))))\displaystyle\quad+\sup_{s\in S}d_{S}(\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{2}(s))))
supxIΔJ𝗆𝖾𝗍,dS(F1(x),F2(x))+ΔI𝗆𝖾𝗍,dS(f1,f2)\displaystyle\leq\sup_{x\in I}{\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{J}(F_{1}(x),F_{2}(x))+{\mathsf{\Delta}}^{\mathsf{met},d_{S}}_{I}(f_{1},f_{2})

Here F_{1},F_{2}\colon I\to T_{S}J and f_{1},f_{2}\in T_{S}I. Without loss of generality, we may assume that \pi_{1}f_{1}=\pi_{1}f_{2} holds and \pi_{2}f_{1} and \pi_{2}f_{2} are nonexpansive, and that for every x\in I, \pi_{1}F_{1}(x)=\pi_{1}F_{2}(x) holds and \pi_{2}F_{1}(x) and \pi_{2}F_{2}(x) are nonexpansive. ∎

Proof (of Proposition 5). We first show the \mathrm{Eq}-unit reflexivity of d^{T_{S}(-)}. For any x\in I, we calculate

dTSI(ηI(x),ηI(x))\displaystyle d_{T_{S}I}(\eta_{I}(x),\eta_{I}(x)) =supsSmax(dI(π1(x,s),π1(x,s)),dS(π2(x,s),π2(x,s))\displaystyle=\sup_{s\in S}\max\left(d_{I}(\pi_{1}(x,s),\pi_{1}(x,s)),d_{S}(\pi_{2}(x,s),\pi_{2}(x,s)\right)
=supsSmax(dI(x,x),dS(s,s))=0.\displaystyle=\sup_{s\in S}\max(d_{I}(x,x),d_{S}(s,s))=0.

We next show the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}{}-composability of dTS()d^{T_{S}(-)}. For any f1,f2TS(I,dI)f_{1},f_{2}\in T_{S}(I,d_{I}) and nonexpansive functions F1,F2:(I,dI)TS(J,dJ)F_{1},F_{2}\colon(I,d_{I})\to T_{S}(J,d_{J}), we compute

dTSJ(F1(f1),F1(f2))\displaystyle d^{T_{S}J}(F_{1}^{\sharp}(f_{1}),F_{1}^{\sharp}(f_{2})) =supsSmax(dJ(π1(F1(π1f1(s))(π2f1(s))),π1(F2(π1f2(s))(π2f2(s))),dS(π2(F1(π1f1(s))(π2f1(s))),π2(F2(π1f2(s))(π2f2(s))))\displaystyle=\sup_{s\in S}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{1}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s))),\\ d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s)))\end{aligned}\right)\allowdisplaybreaks[0]
supsSmax(dJ(π1(F1(π1f1(s))(π2f1(s))),π1(F2(π1f1(s))(π2f1(s))),dJ(π1(F2(π1f1(s))(π2f1(s))),π1(F2(π1f2(s))(π2f2(s))),dS(π2(F1(π1f1(s))(π2f1(s))),π2(F2(π1f1(s))(π2f1(s))),dS(π2(F2(π1f1(s))(π2f1(s))),π2(F2(π1f2(s))(π2f2(s))))\displaystyle\leq\sup_{s\in S}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{1}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\\ d_{J}(\pi_{1}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{1}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s))),\\ d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\\ d_{S}(\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s)))\end{aligned}\right)\allowdisplaybreaks[1]
=supsSmax(dJ(π1(F2(π1f1(s))(π2f1(s))),π1(F2(π1f2(s))(π2f2(s))),dS(π2(F2(π1f1(s))(π2f1(s))),π2(F2(π1f2(s))(π2f2(s))),dJ(π1(F1(π1f1(s))(π2f1(s))),π1(F2(π1f1(s))(π2f1(s))),dS(π2(F1(π1f1(s))(π2f1(s))),π2(F2(π1f1(s))(π2f1(s))))\displaystyle=\sup_{s\in S}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{1}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s))),\\ d_{S}(\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s))(\pi_{2}f_{2}(s))),\\ d_{J}(\pi_{1}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{1}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\\ d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s))),\pi_{2}(F_{2}(\pi_{1}f_{1}(s))(\pi_{2}f_{1}(s)))\end{aligned}\right)\allowdisplaybreaks[0]
supsSmax(dI(π1(f1(s)),π1(f2(s))),dS(π2(f1(s)),π2(f2(s))),supxIsupsSmax(dJ(π1(F1(x)(s)),π1(F2(x)(s)),dS(π2(F1(x)(s)),π2(F2(x)(s))))\displaystyle\leq\sup_{s\in S}\max\left(\begin{aligned} d_{I}(\pi_{1}(f_{1}(s)),\pi_{1}(f_{2}(s))),\\ d_{S}(\pi_{2}(f_{1}(s)),\pi_{2}(f_{2}(s))),\\ \ \sup_{x\in I}\sup_{s^{\prime}\in S}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(x)(s^{\prime})),\pi_{1}(F_{2}(x)(s^{\prime})),\\ d_{S}(\pi_{2}(F_{1}(x)(s^{\prime})),\pi_{2}(F_{2}(x)(s^{\prime}))\end{aligned}\right)\end{aligned}\right)\allowdisplaybreaks[1]
=max(supsSmax(dI(π1(f1(s)),π1(f2(s))),dS(π2(f1(s)),π2(f2(s)))),supxIsupsSmax(dJ(π1(F1(x)(s)),π1(F2(x)(s)),dS(π2(F1(x)(s)),π2(F2(x)(s))))\displaystyle=\max\left(\begin{aligned} \sup_{s\in S}\max(d_{I}(\pi_{1}(f_{1}(s)),\pi_{1}(f_{2}(s))),d_{S}(\pi_{2}(f_{1}(s)),\pi_{2}(f_{2}(s)))),\\ \sup_{x\in I}\sup_{s^{\prime}\in S}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(x)(s^{\prime})),\pi_{1}(F_{2}(x)(s^{\prime})),\\ d_{S}(\pi_{2}(F_{1}(x)(s^{\prime})),\pi_{2}(F_{2}(x)(s^{\prime}))\end{aligned}\right)\end{aligned}\right)\allowdisplaybreaks[0]
=max(dTSI(f1,f2),supxIdTSJ(F1(x),F2(x)).\displaystyle=\max(d_{T_{S}I}(f_{1},f_{2}),\sup_{x\in I}d^{T_{S}J}(F_{1}(x),F_{2}(x)).

We note here that the nonexpansivity of F_{2}\colon(I,d_{I})\to(S,d_{S})\Rightarrow(S,d_{S})\times(J,d_{J}) is equivalent to that of its uncurrying \overline{F_{2}}\colon(S,d_{S})\times(I,d_{I})\to(S,d_{S})\times(J,d_{J}). ∎

Proof (of Proposition 6). We first show the \mathsf{Dist}_{0}-unit reflexivity of {\mathsf{\Delta}}^{\mathsf{Dist}_{0}}. For (x_{1},x_{2})\in\mathsf{Dist}_{0}(I,d_{I}) (i.e. d_{I}(x_{1},x_{2})=0), we calculate

\displaystyle{\mathsf{\Delta}}^{\mathsf{Dist}_{0}}_{(I,d_{I})}(\eta_{I}(x_{1}),\eta_{I}(x_{2})) =\sup_{d_{S}(s_{1},s_{2})=0}\max\left(d_{I}(\pi_{1}(x_{1},s_{1}),\pi_{1}(x_{2},s_{2})),d_{S}(\pi_{2}(x_{1},s_{1}),\pi_{2}(x_{2},s_{2}))\right)
=supdS(s1,s2)=0max(dI(x1,x2),dS(s1,s2))=0.\displaystyle=\sup_{d_{S}(s_{1},s_{2})=0}\max(d_{I}(x_{1},x_{2}),d_{S}(s_{1},s_{2}))=0.

Next, we show the 𝖣𝗂𝗌𝗍0\mathsf{Dist}_{0}{}-composability of Δ𝖣𝗂𝗌𝗍0{\mathsf{\Delta}}^{\mathsf{Dist}_{0}}. For any f1,f2TS(I,dI)f_{1},f_{2}\in T_{S}(I,d_{I}) and nonexpansive functions F1,F2:(I,dI)TS(J,dJ)F_{1},F_{2}\colon(I,d_{I})\to T_{S}(J,d_{J}), we compute

\displaystyle{\mathsf{\Delta}}^{\mathsf{Dist}_{0}}_{J}(F_{1}^{\sharp}(f_{1}),F_{2}^{\sharp}(f_{2}))
=supdS(s1,s2)=0max(dJ(π1(F1(π1f1(s1))(π2f1(s1))),π1(F2(π1f2(s2))(π2f2(s2))),dS(π2(F1(π1f1(s1))(π2f1(s1))),π2(F2(π1f2(s2))(π2f2(s2))))\displaystyle=\sup_{d_{S}(s_{1},s_{2})=0}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{1}(F_{2}(\pi_{1}f_{2}(s_{2}))(\pi_{2}f_{2}(s_{2}))),\\ d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s_{2}))(\pi_{2}f_{2}(s_{2})))\end{aligned}\right)\allowdisplaybreaks[0]
supdS(s1,s2)=0max(dJ(π1(F1(π1f1(s1))(π2f1(s1))),π1(F2(π1f1(s1))(π2f1(s1))),dJ(π1(F2(π1f1(s1))(π2f1(s1))),π1(F2(π1f2(s2))(π2f2(s2))),dS(π2(F1(π1f1(s1))(π2f1(s1))),π2(F2(π1f1(s1))(π2f1(s1))),dS(π2(F2(π1f1(s1))(π2f1(s1))),π2(F2(π1f2(s2))(π2f2(s2))))\displaystyle\leq\sup_{d_{S}(s_{1},s_{2})=0}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{1}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\\ d_{J}(\pi_{1}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{1}(F_{2}(\pi_{1}f_{2}(s_{2}))(\pi_{2}f_{2}(s_{2}))),\\ d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{2}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\\ d_{S}(\pi_{2}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s_{2}))(\pi_{2}f_{2}(s_{2})))\end{aligned}\right)\allowdisplaybreaks[1]
=supdS(s1,s2)=0max(dJ(π1(F2(π1f1(s1))(π2f1(s1))),π1(F2(π1f2(s2))(π2f2(s2))),dS(π2(F2(π1f1(s1))(π2f1(s1))),π2(F2(π1f2(s2))(π2f2(s2))),dJ(π1(F1(π1f1(s1))(π2f1(s1))),π1(F2(π1f1(s1))(π2f1(s1))),dS(π2(F1(π1f1(s1))(π2f1(s1))),π2(F2(π1f1(s1))(π2f1(s1))))\displaystyle=\sup_{d_{S}(s_{1},s_{2})=0}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{1}(F_{2}(\pi_{1}f_{2}(s_{2}))(\pi_{2}f_{2}(s_{2}))),\\ d_{S}(\pi_{2}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{2}(F_{2}(\pi_{1}f_{2}(s_{2}))(\pi_{2}f_{2}(s_{2}))),\\ d_{J}(\pi_{1}(F_{1}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{1}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\\ d_{S}(\pi_{2}(F_{1}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1}))),\pi_{2}(F_{2}(\pi_{1}f_{1}(s_{1}))(\pi_{2}f_{1}(s_{1})))\end{aligned}\right)\allowdisplaybreaks[0]
supdS(s1,s2)=0max(dI(π1(f1(s1)),π1(f2(s2))),dS(π2(f1(s1)),π2(f2(s2))),sup(x1,x2)𝖣𝗂𝗌𝗍0(I,dI)supdS(s1,s2)=0max(dJ(π1(F1(x)(s1)),π1(F2(x)(s2)),dS(π2(F1(x)(s1)),π2(F2(x)(s2))))\displaystyle\leq\sup_{d_{S}(s_{1},s_{2})=0}\max\left(\begin{aligned} d_{I}(\pi_{1}(f_{1}(s_{1})),\pi_{1}(f_{2}(s_{2}))),\\ d_{S}(\pi_{2}(f_{1}(s_{1})),\pi_{2}(f_{2}(s_{2}))),\\ \ \sup_{(x_{1},x_{2})\in\mathsf{Dist}_{0}(I,d_{I})}\sup_{d_{S}(s^{\prime}_{1},s^{\prime}_{2})=0}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(x)(s^{\prime}_{1})),\pi_{1}(F_{2}(x)(s_{2}^{\prime})),\\ d_{S}(\pi_{2}(F_{1}(x)(s^{\prime}_{1})),\pi_{2}(F_{2}(x)(s^{\prime}_{2}))\end{aligned}\right)\end{aligned}\right)\allowdisplaybreaks[1]
=max(supdS(s1,s2)=0max(dI(π1(f1(s1)),π1(f2(s2))),dS(π2(f1(s1)),π2(f2(s2)))),sup(x1,x2)𝖣𝗂𝗌𝗍0(I,dI)supdS(s1,s2)=0max(dJ(π1(F1(x1)(s1)),π1(F2(x2)(s2)),dS(π2(F1(x1)(s1)),π2(F2(x2)(s2))))\displaystyle=\max\left(\begin{aligned} \sup_{d_{S}(s^{\prime}_{1},s^{\prime}_{2})=0}\max(d_{I}(\pi_{1}(f_{1}(s_{1})),\pi_{1}(f_{2}(s_{2}))),d_{S}(\pi_{2}(f_{1}(s_{1})),\pi_{2}(f_{2}(s_{2})))),\\ \sup_{(x_{1},x_{2})\in\mathsf{Dist}_{0}(I,d_{I})}\sup_{d_{S}(s^{\prime}_{1},s^{\prime}_{2})=0}\max\left(\begin{aligned} d_{J}(\pi_{1}(F_{1}(x_{1})(s^{\prime}_{1})),\pi_{1}(F_{2}(x_{2})(s^{\prime}_{2})),\\ d_{S}(\pi_{2}(F_{1}(x_{1})(s^{\prime}_{1})),\pi_{2}(F_{2}(x_{2})(s^{\prime}_{2}))\end{aligned}\right)\end{aligned}\right)\allowdisplaybreaks[0]
=max(ΔI𝖣𝗂𝗌𝗍0(f1,f2),sup(x1,x2)𝖣𝗂𝗌𝗍0(I,dI)ΔJ𝖣𝗂𝗌𝗍0(F1(x),F2(x)).\displaystyle=\max({\mathsf{\Delta}}^{\mathsf{Dist}_{0}}_{I}(f_{1},f_{2}),\sup_{(x_{1},x_{2})\in\mathsf{Dist}_{0}(I,d_{I})}{\mathsf{\Delta}}^{\mathsf{Dist}_{0}}_{J}(F_{1}(x),F_{2}(x)).

This completes the proof. ∎

Proof (of Proposition 7). The monotonicity of {\mathsf{C}}({\mathsf{\Delta}},N) is obvious since M=1.

We show the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit reflexivity of 𝖢(Δ,N){\mathsf{C}}({\mathsf{\Delta}},N). For all xUIx\in UI, we have

Tπ1(ηIT(N×)x)\displaystyle T\pi_{1}\mathbin{\bullet}(\eta^{T(N\times-)}_{I}\mathbin{\bullet}x) =(Tπ1ηIT(N×))x\displaystyle=(T\pi_{1}\circ\eta^{T(N\times-)}_{I})\mathbin{\bullet}x =(Tπ1TηI(N×)ηIT)x\displaystyle=(T\pi_{1}\circ T\eta^{(N\times-)}_{I}\circ\eta^{T}_{I})\mathbin{\bullet}x
=(T(1N!I)ηIT)x\displaystyle=(T(1_{N}\circ!_{I})\circ\eta^{T}_{I})\mathbin{\bullet}x =(ηT1N!I)x\displaystyle=(\eta^{T}\circ 1_{N}\circ!_{I})\mathbin{\bullet}x
=ηT((1N!I)x)\displaystyle=\eta^{T}\mathbin{\bullet}((1_{N}\circ!_{I})\mathbin{\bullet}x) =ηT(1N(!Ix))\displaystyle=\eta^{T}\mathbin{\bullet}(1_{N}\mathbin{\bullet}(!_{I}\mathbin{\bullet}x))
=ηT1N\displaystyle=\eta^{T}\mathbin{\bullet}1_{N}

Hence,

𝖢(Δ,N)I(ηT(N×)x,ηT(N×)x)\displaystyle{\mathsf{C}}({\mathsf{\Delta}},N)_{I}(\eta^{T(N\times-)}\mathbin{\bullet}x,\eta^{T(N\times-)}\mathbin{\bullet}x) =𝖢(Δ,N)I(ηT(N×)x,ηT(N×)x)\displaystyle={\mathsf{C}}({\mathsf{\Delta}},N)_{I}(\eta^{T(N\times-)}\mathbin{\bullet}x,\eta^{T(N\times-)}\mathbin{\bullet}x)
=ΔN(Tπ1(ηT(N×)x),Tπ1(ηT(N×)x)\displaystyle={\mathsf{\Delta}}_{N}(T\pi_{1}\mathbin{\bullet}(\eta^{T(N\times-)}\mathbin{\bullet}x),T\pi_{1}\mathbin{\bullet}(\eta^{T(N\times-)}\mathbin{\bullet}x)
=ΔN(ηT1N,ηT1N)\displaystyle={\mathsf{\Delta}}_{N}(\eta^{T}\mathbin{\bullet}1_{N},\eta^{T}\mathbin{\bullet}1_{N})
0𝒬.\displaystyle\leq 0_{\mathcal{Q}}.

We next show the \mathrm{Eq}-composability of {\mathsf{C}}({\mathsf{\Delta}},N). For any f\colon I\to T(N\times J), we define h_{f}\colon N\times I\to T(N) by h_{f}=T(\star)\circ\theta_{N,N}\circ(N\times(T\pi_{1}\circ f)). Then, we have T\pi_{1}\mathbin{\bullet}f^{\sharp(T(N\times I))}\mathbin{\bullet}\nu=h_{f}^{\sharp T}\mathbin{\bullet}\nu for any \nu\in U(T(N\times I)). First, for all m,n\in UN, we have

(T()(ηN×N)n)m\displaystyle(T(\star)\circ(\eta_{N\times N})_{n})\mathbin{\bullet}m =T()(ηN×Nn,m)\displaystyle=T(\star)\mathbin{\bullet}(\eta_{N\times N}\mathbin{\bullet}\langle n,m\rangle) =(T()ηN×N)n,m)\displaystyle=(T(\star)\circ\eta_{N\times N})\mathbin{\bullet}\langle n,m\rangle)
=(ηN)n,m)\displaystyle=(\eta_{N}\circ\star)\mathbin{\bullet}\langle n,m\rangle) =ηN(n,m)\displaystyle=\eta_{N}\mathbin{\bullet}(\star\mathbin{\bullet}\langle n,m\rangle)
=ηN(nm)\displaystyle=\eta_{N}\mathbin{\bullet}(\star_{n}\mathbin{\bullet}m) =(ηNn)m.\displaystyle=(\eta_{N}\circ\star_{n})\mathbin{\bullet}m.

From this and the equality (1), we can calculate as follows:

hfn,i\displaystyle h_{f}\mathbin{\bullet}\langle n,i\rangle
=(T()θN,N(N×(Tπ1f)))n,i\displaystyle=(T(\star)\circ\theta_{N,N}\circ(N\times(T\pi_{1}\circ f)))\mathbin{\bullet}\langle n,i\rangle =(T()θN,N)((N×(Tπ1f))n,i)\displaystyle=(T(\star)\circ\theta_{N,N})\mathbin{\bullet}((N\times(T\pi_{1}\circ f))\mathbin{\bullet}\langle n,i\rangle)
=(T()θN,N)(U(N×(Tπ1f))(n,i))\displaystyle=(T(\star)\circ\theta_{N,N})\mathbin{\bullet}(U(N\times(T\pi_{1}\circ f))(n,i)) =(T()θN,N)((U(N)×U(Tπ1f))(n,i))\displaystyle=(T(\star)\circ\theta_{N,N})\mathbin{\bullet}((U(N)\times U(T\pi_{1}\circ f))(n,i))
=(T()θN,N)U(N)(n),U(Tπ1f)(i)\displaystyle=(T(\star)\circ\theta_{N,N})\mathbin{\bullet}\langle U(N)(n),U(T\pi_{1}\circ f)(i)\rangle =(T()θN,N)n,(Tπ1f)i\displaystyle=(T(\star)\circ\theta_{N,N})\mathbin{\bullet}\langle n,(T\pi_{1}\circ f)\mathbin{\bullet}i\rangle
=T()(θN,Nn,(Tπ1f)i)\displaystyle=T(\star)\mathbin{\bullet}(\theta_{N,N}\mathbin{\bullet}\langle n,(T\pi_{1}\circ f)\mathbin{\bullet}i\rangle) =T()((θN,N)n((Tπ1f)i))\displaystyle=T(\star)\mathbin{\bullet}((\theta_{N,N})_{n}\mathbin{\bullet}((T\pi_{1}\circ f)\mathbin{\bullet}i))
=T()(((ηN×N)n)((Tπ1f)i))\displaystyle=T(\star)\mathbin{\bullet}(((\eta_{N\times N})_{n})^{\sharp}\mathbin{\bullet}((T\pi_{1}\circ f)\mathbin{\bullet}i)) =(T()((ηN×N)n))((Tπ1f)i)\displaystyle=(T(\star)\circ((\eta_{N\times N})_{n})^{\sharp})\mathbin{\bullet}((T\pi_{1}\circ f)\mathbin{\bullet}i)
=(T()(ηN×N)n)((Tπ1f)i)\displaystyle=(T(\star)\circ(\eta_{N\times N})_{n})^{\sharp}\mathbin{\bullet}((T\pi_{1}\circ f)\mathbin{\bullet}i) =(ηN(n))((Tπ1f)i).\displaystyle=(\eta_{N}\circ(\star_{n}))^{\sharp}\mathbin{\bullet}((T\pi_{1}\circ f)\mathbin{\bullet}i).

From the assumption ΔN×I(c1,c2)𝖢(Δ,N)I(c1,c2){\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})\leq{\mathsf{C}}({\mathsf{\Delta}},N)_{I}(c_{1},c_{2}), the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit-reflexivity and Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability of the original divergence Δ{\mathsf{\Delta}}, we obtain the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability of 𝖢(Δ,N){\mathsf{C}}({\mathsf{\Delta}},N) as follows:

𝖢(Δ,N)J(f1(T(N×I))c1,f2(T(N×I))c2)\displaystyle{\mathsf{C}}({\mathsf{\Delta}},N)_{J}(f_{1}^{\sharp(T(N\times I))}\mathbin{\bullet}c_{1},f_{2}^{\sharp(T(N\times I))}\mathbin{\bullet}c_{2})
=ΔN(Tπ1f1(T(N×I))c1,Tπ1f2(T(N×I))c2)\displaystyle={\mathsf{\Delta}}_{N}(T\pi_{1}\mathbin{\bullet}f_{1}^{\sharp(T(N\times I))}\mathbin{\bullet}c_{1},T\pi_{1}\mathbin{\bullet}f_{2}^{\sharp(T(N\times I))}\mathbin{\bullet}c_{2})
=ΔN(hf1Tc1,hf2Tc2)\displaystyle={\mathsf{\Delta}}_{N}(h_{f_{1}}^{\sharp T}\mathbin{\bullet}c_{1},h_{f_{2}}^{\sharp T}\mathbin{\bullet}c_{2})
ΔN×I(c1,c2)+supn,iU(N×I)ΔN(hf1n,i,hf2n,i)\displaystyle\leq{\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})+\sup_{\langle n,i\rangle\in U(N\times I)}{\mathsf{\Delta}}_{N}(h_{f_{1}}\mathbin{\bullet}\langle n,i\rangle,h_{f_{2}}\mathbin{\bullet}\langle n,i\rangle)
=ΔN×I(c1,c2)\displaystyle={\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})
\displaystyle\qquad+\sup_{\langle n,i\rangle\in U(N\times I)}{\mathsf{\Delta}}_{N}((\eta_{N}\circ(\star)_{n})^{\sharp T}\mathbin{\bullet}((T\pi_{1}\circ f_{1})\mathbin{\bullet}i),(\eta_{N}\circ(\star)_{n})^{\sharp T}\mathbin{\bullet}((T\pi_{1}\circ f_{2})\mathbin{\bullet}i))
\displaystyle\leq{\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})
\displaystyle\qquad+\sup_{\langle n,i\rangle\in U(N\times I)}\left(\begin{aligned} &{\mathsf{\Delta}}_{N}((T\pi_{1}\circ f_{1})\mathbin{\bullet}i,(T\pi_{1}\circ f_{2})\mathbin{\bullet}i)\\ &\qquad+\sup_{m\in UN}{\mathsf{\Delta}}_{N}((\eta_{N}\circ(\star)_{n})\mathbin{\bullet}m,(\eta_{N}\circ(\star)_{n})\mathbin{\bullet}m)\end{aligned}\right)
\displaystyle\leq{\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})+\sup_{\langle n,i\rangle\in U(N\times I)}{\mathsf{\Delta}}_{N}((T\pi_{1}\circ f_{1})\mathbin{\bullet}i,(T\pi_{1}\circ f_{2})\mathbin{\bullet}i)
=ΔN×I(c1,c2)+supiUIΔN((Tπ1f1)i,(Tπ1f2)i)\displaystyle={\mathsf{\Delta}}_{N\times I}(c_{1},c_{2})+\sup_{i\in UI}{\mathsf{\Delta}}_{N}((T\pi_{1}\circ f_{1})\mathbin{\bullet}i,(T\pi_{1}\circ f_{2})\mathbin{\bullet}i)
𝖢(Δ,N)I(c1,c2)+supiUI𝖢(Δ,N)J(f1i,f2i).\displaystyle\leq{\mathsf{C}}({\mathsf{\Delta}},N)_{I}(c_{1},c_{2})+\sup_{i\in UI}{\mathsf{C}}({\mathsf{\Delta}},N)_{J}(f_{1}\mathbin{\bullet}i,f_{2}\mathbin{\bullet}i).

This completes the proof. ∎

Proof (of Proposition 8).

We consider a preorder \sqsubseteq on a monad TT. We define the \mathcal{B}-divergence Δ{\mathsf{\Delta}}^{\sqsubseteq} on TITI by

ΔI(c1,c2){0c1Ic21c1Ic2{\mathsf{\Delta}}^{\sqsubseteq}_{I}(c_{1},c_{2})\triangleq\begin{cases}0&c_{1}\not\sqsubseteq_{I}c_{2}\\ 1&c_{1}\sqsubseteq_{I}c_{2}\end{cases}

Each Δ~(1)I\tilde{\mathsf{\Delta}}(1)I is a preorder because Δ~(1)I=I\tilde{\mathsf{\Delta}}(1)I={\sqsubseteq_{I}} holds for each II.

The Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-unit reflexivity of Δ{\mathsf{\Delta}}^{\sqsubseteq} is derived from the reflexivity of \sqsubseteq. For all set II and cTIc\in TI,

(ΔI(c,c)1)(cIc).({\mathsf{\Delta}}^{\sqsubseteq}_{I}(c,c)\leq 1)\iff(c\sqsubseteq_{I}c).

Since \sqsubseteq is a preorder on TT, for all set I,JI,J, c1,c2TIc_{1},c_{2}\in TI and f,g:ITJf,g\colon I\to TJ,

(ΔI(c1,c2)×supxIΔJ(f(x),g(x)))=1\displaystyle({\mathsf{\Delta}}^{\sqsubseteq}_{I}(c_{1},c_{2})\times\sup_{x\in I}{\mathsf{\Delta}}^{\sqsubseteq}_{J}(f(x),g(x)))=1
(ΔI(c1,c2)=1)(supxIΔJ(f(x),g(x))=1)\displaystyle\iff({\mathsf{\Delta}}^{\sqsubseteq}_{I}(c_{1},c_{2})=1)\land(\sup_{x\in I}{\mathsf{\Delta}}^{\sqsubseteq}_{J}(f(x),g(x))=1)
(c1Ic2)(xI.f(x)Jg(x))\displaystyle\iff(c_{1}\sqsubseteq_{I}c_{2})\land(\forall{x\in I}.~{}f(x)\sqsubseteq_{J}g(x))
(f(c1)Jf(c2))(f(c2)Jg(c2))\displaystyle\implies(f^{\sharp}(c_{1})\sqsubseteq_{J}f^{\sharp}(c_{2}))\land(f^{\sharp}(c_{2})\sqsubseteq_{J}g^{\sharp}(c_{2}))
(f(c1)Jg(c2))\displaystyle\implies(f^{\sharp}(c_{1})\sqsubseteq_{J}g^{\sharp}(c_{2}))
(ΔJ(f(c1),g(c2))=1)\displaystyle\iff({\mathsf{\Delta}}^{\sqsubseteq}_{J}(f^{\sharp}(c_{1}),g^{\sharp}(c_{2}))=1)

Hence, we have the Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability

ΔJ(f(c1),g(c2))ΔI(c1,c2)×supxIΔJ(f(x),g(x)).\displaystyle{\mathsf{\Delta}}^{\sqsubseteq}_{J}(f^{\sharp}(c_{1}),g^{\sharp}(c_{2}))\leq{\mathsf{\Delta}}^{\sqsubseteq}_{I}(c_{1},c_{2})\times\sup_{x\in I}{\mathsf{\Delta}}^{\sqsubseteq}_{J}(f(x),g(x)).

Conversely, we consider an Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative \mathcal{B}-divergence Δ{\mathsf{\Delta}} on TT such that each Δ~(1)I\tilde{\mathsf{\Delta}}(1)I is a preorder. We show that the family Δ={IΔ}I𝐒𝐞𝐭\sqsubseteq^{\mathsf{\Delta}}=\{\sqsubseteq^{\mathsf{\Delta}}_{I}\}_{I\in{\bf Set}} defined by IΔΔ~(1)I{\sqsubseteq^{\mathsf{\Delta}}_{I}}\triangleq\tilde{\mathsf{\Delta}}(1)I forms a preorder on monad TT.

Each component IΔ\sqsubseteq^{\mathsf{\Delta}}_{I} of Δ\sqsubseteq^{\mathsf{\Delta}} at set II is a preorder on the set TITI. We here note that the divergence Δ{\mathsf{\Delta}} must be reflexive (i.e. ΔI(c,c)1{\mathsf{\Delta}}_{I}(c,c)\leq 1 for all I𝐒𝐞𝐭,cTII\in{\bf Set},c\in TI) because of the reflexivity of IΔ\sqsubseteq^{\mathsf{\Delta}}_{I}:

(ΔI(c,c)1)(cIΔc), for all I𝐒𝐞𝐭,cTI.({\mathsf{\Delta}}_{I}(c,c)\leq 1)\iff(c\sqsubseteq^{\mathsf{\Delta}}_{I}c),\quad\text{ for all }I\in{\bf Set},c\in TI.

From the reflexivity and Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-composability of Δ{\mathsf{\Delta}}, we have for all c1,c2,cTIc_{1},c_{2},c\in TI and f,g:ITJf,g\colon I\to TJ,

c1,c2TI,f:ITJ.\displaystyle\forall{c_{1},c_{2}\in TI,f\colon I\to TJ}~{}.~{} ΔJ(f(c1),f(c2))ΔI(c1,c2),\displaystyle{\mathsf{\Delta}}_{J}(f^{\sharp}(c_{1}),f^{\sharp}(c_{2}))\leq{\mathsf{\Delta}}_{I}(c_{1},c_{2}), (26)
cTI,f,g:ITJ.\displaystyle\forall{c\in TI,f,g\colon I\to TJ}~{}.~{} ΔJ(f(c),g(c))supxIΔJ(f(x),g(x)).\displaystyle{\mathsf{\Delta}}_{J}(f^{\sharp}(c),g^{\sharp}(c))\leq\sup_{x\in I}{\mathsf{\Delta}}_{J}(f(x),g(x)). (27)

They are equivalent to the substitutivity and congruence of Δ\sqsubseteq^{\mathsf{\Delta}} respectively:

(26)\displaystyle(\ref{divergence_premonad_subst}) c1,c2TI,f:ITJ.(c1IΔc2f(c1)JΔf(c2)),\displaystyle\iff\forall{c_{1},c_{2}\in TI,f\colon I\to TJ}~{}.~{}(c_{1}\sqsubseteq^{\mathsf{\Delta}}_{I}c_{2}\implies f^{\sharp}(c_{1})\sqsubseteq^{\mathsf{\Delta}}_{J}f^{\sharp}(c_{2})),
(27)\displaystyle(\ref{divergence_premonad_congr}) cTI,f,g:ITJ.(xI.f(x)JΔg(x)f(c)JΔg(c)).\displaystyle\iff\forall{c\in TI,f,g\colon I\to TJ}~{}.~{}(\forall{x\in I}.~{}f(x)\sqsubseteq^{\mathsf{\Delta}}_{J}g(x)\implies f^{\sharp}(c)\sqsubseteq^{\mathsf{\Delta}}_{J}g^{\sharp}(c)).

Finally, the above conversions Δ(){\mathsf{\Delta}}^{(-)} and ()\sqsubseteq^{(-)} are mutually inverse:

ΔIΔ(c1,c2)1c1IΔc2ΔI(c1,c2)1,\displaystyle{\mathsf{\Delta}}^{\sqsubseteq^{{\mathsf{\Delta}}^{\prime}}}_{I}(c_{1},c_{2})\leq 1\iff c_{1}\sqsubseteq^{{\mathsf{\Delta}}^{\prime}}_{I}c_{2}\iff{\mathsf{\Delta}}^{\prime}_{I}(c_{1},c_{2})\leq 1,
c1IΔc2ΔI(c1,c2)1c1Ic2.\displaystyle c_{1}\sqsubseteq^{{\mathsf{\Delta}}^{\sqsubseteq^{\prime}}}_{I}c_{2}\iff{\mathsf{\Delta}}^{\sqsubseteq^{\prime}}_{I}(c_{1},c_{2})\leq 1\iff c_{1}\sqsubseteq^{\prime}_{I}c_{2}.

This completes the proof. ∎

Proof (of Theorem 2). First, it is easy to see that the inequality (3) is equivalent to {\mathsf{\Delta}} satisfying E-unit reflexivity.

We next show that the inequality (4) is equivalent to Δ{\mathsf{\Delta}} satisfying EE-composability.

(only if) Since U1={id1}U1=\{{\rm id}_{1}\}, we have RE1={(id1,id1)}R_{E1}=\{({\rm id}_{1},{\rm id}_{1})\}. Therefore it holds d1,J(c1,c2)=ΔJ(c1,c2)d_{1,J}(c_{1},c_{2})={\mathsf{\Delta}}_{J}(c_{1},c_{2}). By letting I=1I=1 in the inequality (4), we obtain the EE-composability:

d1,K(f1Tc1,f2Tc2)dJ,K(f1,f2)+d1,J(c1,c2)\displaystyle d_{1,K}(f_{1}\circ_{\mathbb{C}_{T}}c_{1},f_{2}\circ_{\mathbb{C}_{T}}c_{2})\leq d_{J,K}(f_{1},f_{2})+d_{1,J}(c_{1},c_{2})
\displaystyle\iff ΔK(f1c1,f2c2)sup(x1,x2)EIΔK(f1x1,f2x2)+ΔJ(c1,c2).\displaystyle{\mathsf{\Delta}}_{K}(f_{1}^{\sharp}\circ c_{1},f_{2}^{\sharp}\circ c_{2})\leq\sup_{(x_{1},x_{2})\in EI}{\mathsf{\Delta}}_{K}(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2})+{\mathsf{\Delta}}_{J}(c_{1},c_{2}).

(if) From the EE-composability, for any f1,f2:ITJf_{1},f_{2}:I\to TJ and g1,g2:JTKg_{1},g_{2}:J\to TK and (x1,x2)EI(x_{1},x_{2})\in EI, we have

ΔK(g1(f1x1),g2(f2x2))dJ,K(g1,g2)+ΔJ(f1x1,f2x2).{\mathsf{\Delta}}_{K}(g_{1}^{\sharp}\mathbin{\bullet}(f_{1}\mathbin{\bullet}x_{1}),g_{2}^{\sharp}\mathbin{\bullet}(f_{2}\mathbin{\bullet}x_{2}))\leq d_{J,K}(g_{1},g_{2})+{\mathsf{\Delta}}_{J}(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2}).

Next, for any (x1,x2)EI(x_{1},x_{2})\in EI, we have ΔJ(f1x1,f2x2)dI,J(f1,f2){\mathsf{\Delta}}_{J}(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2})\leq d_{I,J}(f_{1},f_{2}). Thus by monotonicity of (+)(+) we have

ΔK(g1f1x1,g2f2x2)dJ,K(g1,g2)+dI,J(f1,f2).{\mathsf{\Delta}}_{K}(g_{1}^{\sharp}\mathbin{\bullet}f_{1}\mathbin{\bullet}x_{1},g_{2}^{\sharp}\mathbin{\bullet}f_{2}\mathbin{\bullet}x_{2})\leq d_{J,K}(g_{1},g_{2})+d_{I,J}(f_{1},f_{2}).

By discharging (x1,x2)EI(x_{1},x_{2})\in EI, we conclude

dI,K(g1f1,g2f2)dJ,K(g1,g2)+dI,J(f1,f2).d_{I,K}(g_{1}^{\sharp}\circ f_{1},g_{2}^{\sharp}\circ f_{2})\leq d_{J,K}(g_{1},g_{2})+d_{I,J}(f_{1},f_{2}).

∎

Proof (of Theorem 4). [{\mathsf{\Delta}}] is a graded variant of the codensity lifting performed along the fibration V_{\mathcal{Q},\mathbb{C}}\colon{\bf Div}_{\mathcal{Q}}({\mathbb{C}})\to{\mathbb{C}} ([33]; see also Definition 15). Proving that it is a graded lifting of T is routine. We show {\mathsf{\Delta}}^{m}_{I}=[{\mathsf{\Delta}}]m(E^{\prime}I). The direction {\mathsf{\Delta}}^{m}_{I}\leq[{\mathsf{\Delta}}]m(E^{\prime}I) is easy. We show the converse. From the composability of {\mathsf{\Delta}}, for any c_{1},c_{2}\in U(TI), J\in\mathbb{C}, n\in M and f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(E^{\prime}I,{\mathsf{\Delta}}^{n}_{J}), we have

ΔJmn(fc1,fc2)ΔIm(c1,c2)+sup(x1,x2)EIΔJn(fx1,fx2).{\mathsf{\Delta}}^{m\cdot n}_{J}(f^{\sharp}\mathbin{\bullet}c_{1},f^{\sharp}\mathbin{\bullet}c_{2})\leq{\mathsf{\Delta}}^{m}_{I}(c_{1},c_{2})+\sup_{(x_{1},x_{2})\in EI}{\mathsf{\Delta}}^{n}_{J}(f\mathbin{\bullet}x_{1},f\mathbin{\bullet}x_{2}).

Next, the nonexpansivity of ff is equivalent to

sup(x1,x2)EIΔJn(fx1,fx2)0.\sup_{(x_{1},x_{2})\in EI}{\mathsf{\Delta}}^{n}_{J}(f\mathbin{\bullet}x_{1},f\mathbin{\bullet}x_{2})\leq 0.

Therefore we conclude {\mathsf{\Delta}}^{m\cdot n}_{J}(f^{\sharp}\mathbin{\bullet}c_{1},f^{\sharp}\mathbin{\bullet}c_{2})\leq{\mathsf{\Delta}}^{m}_{I}(c_{1},c_{2}). By discharging J,n,f, we conclude the inequality [{\mathsf{\Delta}}]m(E^{\prime}I)\leq{\mathsf{\Delta}}^{m}_{I}. ∎

Proof (of Theorem 5). Let {\mathsf{\Delta}}\in{\bf Div}(T,E,M,\mathcal{Q}). We have already shown that [{\mathsf{\Delta}}] is an M-graded \mathcal{Q}-divergence lifting of T. We show that [{\mathsf{\Delta}}] is E-strong (this proof does not need the closedness of \mathbb{C}). Let X\triangleq(I,d)\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}}) and J\in\mathbb{C} be objects. We first rewrite the goal:

θ𝐃𝐢𝐯𝒬()(X[Δ]m(EJ),[Δ]m(XEJ))\displaystyle\theta\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X\otimes[{\mathsf{\Delta}}]m(E^{\prime}J),[{\mathsf{\Delta}}]m(X\otimes E^{\prime}J))
(x1,x2UI,c1,c2U(TJ).d[Δ]m(XEJ)(θx1,c1,θx2,c2)d(x1,x2)+d[Δ]m(EJ)(c1,c2))\displaystyle\iff\left(\begin{aligned} &\forall{x_{1},x_{2}\in UI,c_{1},c_{2}\in U(TJ)}~{}.~{}\\ &\qquad d_{[{\mathsf{\Delta}}]m(X\otimes E^{\prime}J)}(\theta\mathbin{\bullet}\langle x_{1},c_{1}\rangle,\theta\mathbin{\bullet}\langle x_{2},c_{2}\rangle)\leq d(x_{1},x_{2})+d_{[{\mathsf{\Delta}}]m(E^{\prime}J)}(c_{1},c_{2})\end{aligned}\right)
(x1,x2UI,c1,c2U(TJ),K,nM,f𝐃𝐢𝐯𝒬()(XEJ,ΔKn).ΔKmn(fθx1,c1,fθx2,c2)d(x1,x2)+d[Δ]m(EJ)(c1,c2))\displaystyle\iff\left(\begin{aligned} &\forall{x_{1},x_{2}\in UI,c_{1},c_{2}\in U(TJ),K\in\mathbb{C},n\in M,f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X\otimes E^{\prime}J,{\mathsf{\Delta}}^{n}_{K})}~{}.~{}\\ &\qquad{\mathsf{\Delta}}^{m\cdot n}_{K}(f^{\sharp}\mathbin{\bullet}\theta\mathbin{\bullet}\langle x_{1},c_{1}\rangle,f^{\sharp}\mathbin{\bullet}\theta\mathbin{\bullet}\langle x_{2},c_{2}\rangle)\leq d(x_{1},x_{2})+d_{[{\mathsf{\Delta}}]m(E^{\prime}J)}(c_{1},c_{2})\end{aligned}\right)
(x1,x2UI,c1,c2U(TJ),K,nM,f𝐃𝐢𝐯𝒬()(XEJ,ΔKn).ΔKmn((fx1)c1,(fx2)c2)d(x1,x2)+d[Δ]m(c1,c2)).\displaystyle\mathbin{\stackrel{{\scriptstyle{\dagger}}}{{\iff}}}\left(\begin{aligned} &\forall{x_{1},x_{2}\in UI,c_{1},c_{2}\in U(TJ),K\in\mathbb{C},n\in M,f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X\otimes E^{\prime}J,{\mathsf{\Delta}}^{n}_{K})}~{}.~{}\\ &\qquad{\mathsf{\Delta}}^{m\cdot n}_{K}((f_{x_{1}})^{\sharp}\mathbin{\bullet}c_{1},(f_{x_{2}})^{\sharp}\mathbin{\bullet}c_{2})\leq d(x_{1},x_{2})+d_{[{\mathsf{\Delta}}]}^{m}(c_{1},c_{2})\end{aligned}\right).

In the step \mathbin{\stackrel{{\scriptstyle{\dagger}}}{{\iff}}}, we used the equality (1). To show this goal, we proceed as follows. Let x1,x2UI,c1,c2U(TJ),K,nMx_{1},x_{2}\in UI,c_{1},c_{2}\in U(TJ),K\in\mathbb{C},n\in M and f𝐃𝐢𝐯𝒬()(XEJ,ΔKn)f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X\otimes E^{\prime}J,{\mathsf{\Delta}}^{n}_{K}). First, from the composability of Δ{\mathsf{\Delta}}, we obtain

ΔKmn((fx1)c1,(fx2)c2)ΔJm(c1,c2)+sup(y1,y2)EJΔKn(fx1y1,fx2y2).{\mathsf{\Delta}}^{m\cdot n}_{K}((f_{x_{1}})^{\sharp}\mathbin{\bullet}c_{1},(f_{x_{2}})^{\sharp}\mathbin{\bullet}c_{2})\leq{\mathsf{\Delta}}_{J}^{m}(c_{1},c_{2})+\sup_{(y_{1},y_{2})\in EJ}{\mathsf{\Delta}}^{n}_{K}(f_{x_{1}}\mathbin{\bullet}y_{1},f_{x_{2}}\mathbin{\bullet}y_{2}).

We look at summands of the right hand side. First, we easily obtain ΔJm(c1,c2)d[Δ]m(EJ)(c1,c2){\mathsf{\Delta}}^{m}_{J}(c_{1},c_{2})\leq d_{[{\mathsf{\Delta}}]m(E^{\prime}J)}(c_{1},c_{2}). Next, from the nonexpansivity of ff, for any x1,x2UI,y1,y2UJx_{1},x_{2}\in UI,y_{1},y_{2}\in UJ, we have

ΔKn(fx1y1,fx2y2)=ΔKn(fx1,y1,fx2,y2)d(x1,x2)+EJ(y1,y2).{\mathsf{\Delta}}^{n}_{K}(f_{x_{1}}\mathbin{\bullet}y_{1},f_{x_{2}}\mathbin{\bullet}y_{2})={\mathsf{\Delta}}^{n}_{K}(f\mathbin{\bullet}\langle x_{1},y_{1}\rangle,f\mathbin{\bullet}\langle x_{2},y_{2}\rangle)\leq d(x_{1},x_{2})+E^{\prime}J(y_{1},y_{2}).

Because x+=x+\top=\top, we obtain

x1,x2UI.sup(y1,y2)EJΔKn(fx1y1,fx2y2)d(x1,x2).\forall{x_{1},x_{2}\in UI}~{}.~{}\sup_{(y_{1},y_{2})\in EJ}{\mathsf{\Delta}}^{n}_{K}(f_{x_{1}}\mathbin{\bullet}y_{1},f_{x_{2}}\mathbin{\bullet}y_{2})\leq d(x_{1},x_{2}).

Therefore we obtain the goal:

ΔKmn((fx1)c1,(fx2)c2)d[Δ]m(EJ)(c1,c2)+d(x1,x2)=d(x1,x2)+d[Δ]m(EJ)(c1,c2).{\mathsf{\Delta}}^{m\cdot n}_{K}((f_{x_{1}})^{\sharp}\mathbin{\bullet}c_{1},(f_{x_{2}})^{\sharp}\mathbin{\bullet}c_{2})\leq d_{[{\mathsf{\Delta}}]m(E^{\prime}J)}(c_{1},c_{2})+d(x_{1},x_{2})=d(x_{1},x_{2})+d_{[{\mathsf{\Delta}}]m(E^{\prime}J)}(c_{1},c_{2}).

Next, let T˙𝐒𝐆𝐃𝐋𝐢𝐟𝐭(T,E,M,𝒬)\dot{T}\in{\bf SGDLift}(T,E,M,\mathcal{Q}). We show that T˙𝐃𝐢𝐯(T,E,M,𝒬)\langle\dot{T}\rangle\in{\bf Div}(T,E,M,\mathcal{Q}).

The unit law of T˙\dot{T} immediately entails

ηI𝐃𝐢𝐯𝒬()(EI,T˙1(EI)).\eta_{I}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(E^{\prime}I,\dot{T}1(E^{\prime}I)).

Next, under the assumption on (\mathbb{C},T) and \mathcal{Q}, in {\bf Div}_{\mathcal{Q}}({\mathbb{C}}) the functor (-)\otimes E^{\prime}I has a right adjoint E^{\prime}I\multimap(-) above the adjunction (-)\times I\dashv I\Rightarrow(-) (Lemma 1). Therefore each component of the internal Kleisli extension morphism \operatorname{kl} given in (6) is a nonexpansive morphism in {\bf Div}_{\mathcal{Q}}({\mathbb{C}}):

\langle\dot{T}\rangle m(E^{\prime}I)\otimes(E^{\prime}I\multimap\langle\dot{T}\rangle n(E^{\prime}J))\xrightarrow{\langle\pi_{2},\pi_{1}\rangle}(E^{\prime}I\multimap\langle\dot{T}\rangle n(E^{\prime}J))\otimes\langle\dot{T}\rangle m(E^{\prime}I)\xrightarrow{\theta}\langle\dot{T}\rangle m((E^{\prime}I\multimap\langle\dot{T}\rangle n(E^{\prime}J))\otimes E^{\prime}I)\xrightarrow{\operatorname{ev}^{\sharp}}\langle\dot{T}\rangle(m\cdot n)(E^{\prime}J)

Therefore we conclude

kl𝐃𝐢𝐯𝒬()(T˙m(EI)(EIT˙n(EJ)),T˙(mn)(EJ)).\operatorname{kl}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(\langle\dot{T}\rangle m(E^{\prime}I)\otimes(E^{\prime}I\multimap\langle\dot{T}\rangle n(E^{\prime}J)),\langle\dot{T}\rangle(m\cdot n)(E^{\prime}J)).

We also easily have monotonicity: \langle\dot{T}\rangle m(E^{\prime}I)\leq\langle\dot{T}\rangle n(E^{\prime}I) for m\leq n by condition 1 of graded divergence lifting. We thus conclude that \langle\dot{T}\rangle\in{\bf Div}(T,E,M,\mathcal{Q}).

We finally show T˙[T˙]\dot{T}\preceq[\langle\dot{T}\rangle]. Let c1,c2U(TI)c_{1},c_{2}\in U(TI). We show

supnM,J,f𝐃𝐢𝐯𝒬()(X,T˙n(EJ))dT˙(mn)(EJ)(f(c1),f(c2))dT˙mX(c1,c2).\sup_{n\in M,J\in\mathbb{C},f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X,\dot{T}n(E^{\prime}J))}d_{\dot{T}(m\cdot n)(E^{\prime}J)}(f^{\sharp}(c_{1}),f^{\sharp}(c_{2}))\leq d_{\dot{T}mX}(c_{1},c_{2}). (28)

Let nM,J,f𝐃𝐢𝐯𝒬()(X,T˙n(EJ))n\in M,J\in\mathbb{C},f\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(X,\dot{T}n(E^{\prime}J)). Since T˙\dot{T} is an MM-graded 𝒬\mathcal{Q}-divergence lifting of TT, we obtain

f𝐃𝐢𝐯𝒬()(T˙mX,T˙(mn)(EJ)).f^{\sharp}\in{\bf Div}_{\mathcal{Q}}({\mathbb{C}})(\dot{T}mX,\dot{T}(m\cdot n)(E^{\prime}J)).

This implies the inequality d_{\dot{T}(m\cdot n)(E^{\prime}J)}(f^{\sharp}(c_{1}),f^{\sharp}(c_{2}))\leq d_{\dot{T}mX}(c_{1},c_{2}) in \mathcal{Q}. By taking the sup over n,J,f, we obtain the inequality (28). ∎

Proof (of Proposition 9). We write |1|=\{\ast\}. We first check the measurable isomorphism G_{s}1\cong[0,1]. The measurable function \mathrm{ev}_{\{\ast\}}\colon G_{s}1\to[0,1] (\nu\mapsto\nu(\ast)) and the function H\colon|[0,1]|\to|G_{s}1| (r\mapsto r\cdot\mathbf{d}_{\ast}) are mutually inverse. For any (Borel-)measurable U\in\Sigma_{[0,1]}, we have H^{-1}(\mathrm{ev}_{\{\ast\}}^{-1}(U))=U, and H^{-1}(\mathrm{ev}_{\emptyset}^{-1}(U))=[0,1] if 0\in U and H^{-1}(\mathrm{ev}_{\emptyset}^{-1}(U))=\emptyset otherwise. Since all generators of \Sigma_{G_{s}1} are of the form \mathrm{ev}_{\{\ast\}}^{-1}(U) and \mathrm{ev}_{\emptyset}^{-1}(U) where U\in\Sigma_{[0,1]}, we conclude the measurability of H. Thus, f\colon I\to[0,1] corresponds bijectively to H\circ f\colon I\to G_{s}1, and

If𝑑ν1=Iev{}Hf𝑑ν1=((Hf)ν1)({}).\int_{I}fd\nu_{1}=\int_{I}\mathrm{ev}_{\{\ast\}}\circ H\circ fd\nu_{1}=((H\circ f)^{\sharp}\nu_{1})(\{\ast\}).

We then obtain, for all I𝐌𝐞𝐚𝐬I\in{\bf Meas}, ν1,ν2GsI\nu_{1},\nu_{2}\in G_{s}I

𝖣𝖯Iε(ν1,ν2)\displaystyle\mathsf{DP}_{I}^{\varepsilon}(\nu_{1},\nu_{2}) =supSΣI(ν1(S)exp(ε)ν2(S))\displaystyle=\sup_{S\in\Sigma_{I}}(\nu_{1}(S)-\exp(\varepsilon)\nu_{2}(S))
supfS:I[0,1](IfS𝑑ν1exp(ε)IfS𝑑ν2)\displaystyle\leq\sup_{f_{S}\colon I\to[0,1]}(\int_{I}f_{S}d\nu_{1}-\exp(\varepsilon)\int_{I}f_{S}d\nu_{2})
=\sup_{f_{S}\colon I\to[0,1]}\left(((H\circ f_{S})^{\sharp}\nu_{1})(\{\ast\})-\exp(\varepsilon)((H\circ f_{S})^{\sharp}\nu_{2})(\{\ast\})\right)
\leq\sup_{f_{S}\colon I\to[0,1]}\ \sup_{S^{\prime}\in\Sigma_{1}}\left(((H\circ f_{S})^{\sharp}\nu_{1})(S^{\prime})-\exp(\varepsilon)((H\circ f_{S})^{\sharp}\nu_{2})(S^{\prime})\right)\qquad(\Sigma_{1}=\{\{\ast\},\emptyset\})
=\sup_{f_{S}\colon I\to[0,1]}\mathsf{DP}_{1}^{\varepsilon}((H\circ f_{S})^{\sharp}\nu_{1},(H\circ f_{S})^{\sharp}\nu_{2})
=supg:IGs1𝖣𝖯1ε(gν1,gν2)\displaystyle=\sup_{g\colon I\to G_{s}1}\mathsf{DP}_{1}^{\varepsilon}(g^{\sharp}\nu_{1},g^{\sharp}\nu_{2})
𝖣𝖯Iε(ν1,ν2).\displaystyle\leq\mathsf{DP}_{I}^{\varepsilon}(\nu_{1},\nu_{2}).

The first inequality holds because \nu(S)=\int_{I}\chi_{S}\,d\nu, where \chi_{S}\colon I\to[0,1] is the indicator function of S defined by \chi_{S}(x)=1 when x\in S and \chi_{S}(x)=0 otherwise. The last inequality is the data-processing inequality, which follows from the reflexivity and \mathrm{Eq}-composability of \mathsf{DP}. ∎

Proof (of Proposition 10). We first prove that \mathsf{TV} is not 1-generated. We write |2|=\{0,1\}. We define \nu_{1},\nu_{2}\in G_{s}2 by

ν1=12𝐝0+12𝐝1,ν2=13𝐝0+23𝐝1.\nu_{1}=\frac{1}{2}\cdot\mathbf{d}_{0}+\frac{1}{2}\cdot\mathbf{d}_{1},\qquad\nu_{2}=\frac{1}{3}\cdot\mathbf{d}_{0}+\frac{2}{3}\cdot\mathbf{d}_{1}.

Then the total variation distance between them is calculated by

𝖳𝖵2(ν1,ν2)=12(|1213|+|1223|)=16.\mathsf{TV}_{2}(\nu_{1},\nu_{2})=\frac{1}{2}\left(\left|\frac{1}{2}-\frac{1}{3}\right|+\left|\frac{1}{2}-\frac{2}{3}\right|\right)=\frac{1}{6}.

On the other hand, for any f\colon 2\to G_{s}1, identifying each f(i)\in G_{s}1 with a real number in [0,1] via the isomorphism G_{s}1\cong[0,1], we have

𝖳𝖵1(f(ν1),f(ν2))\displaystyle\mathsf{TV}_{1}(f^{\sharp}(\nu_{1}),f^{\sharp}(\nu_{2})) =12|12f(0)+12f(1)13f(0)23f(1)|\displaystyle=\frac{1}{2}\left|\frac{1}{2}f(0)+\frac{1}{2}f(1)-\frac{1}{3}f(0)-\frac{2}{3}f(1)\right|
=12|16f(0)16f(1)|\displaystyle=\frac{1}{2}\left|\frac{1}{6}f(0)-\frac{1}{6}f(1)\right|
=112|f(0)f(1)|\displaystyle=\frac{1}{12}\left|f(0)-f(1)\right|
112.\displaystyle\leq\frac{1}{12}.

Since \sup_{f\colon 2\to G_{s}1}\mathsf{TV}_{1}(f^{\sharp}(\nu_{1}),f^{\sharp}(\nu_{2}))\leq\frac{1}{12}<\frac{1}{6}=\mathsf{TV}_{2}(\nu_{1},\nu_{2}), this implies that \mathsf{TV} is not 1-generated.

Next, we prove that \mathsf{TV} is 2-generated. From the data-processing inequality for \mathsf{TV}, which follows from the reflexivity and \mathrm{Eq}-composability of \mathsf{TV}, we obtain, for any \nu_{1},\nu_{2}\in G_{s}I,

𝖳𝖵I(ν1,ν2)supg:IGs2𝖳𝖵2(gν1,gν2).\mathsf{TV}_{I}(\nu_{1},\nu_{2})\geq\sup_{g\colon I\to G_{s}2}\mathsf{TV}_{2}(g^{\sharp}\nu_{1},g^{\sharp}\nu_{2}).

We show that this inequality is in fact an equality, by exhibiting a g\colon I\to G_{s}2 that attains the left-hand side.

We fix \nu_{1},\nu_{2}\in G_{s}I and a base measure \mu over I satisfying the absolute continuity \nu_{1},\nu_{2}\ll\mu (for instance \mu=\nu_{1}+\nu_{2}), and we write \frac{d\nu_{1}}{d\mu},\frac{d\nu_{2}}{d\mu} for the Radon-Nikodym derivatives (density functions) of \nu_{1},\nu_{2} with respect to \mu.

Let A=(\frac{d\nu_{1}}{d\mu}-\frac{d\nu_{2}}{d\mu})^{-1}([0,\infty)) and B=I\setminus A. We define g\colon I\to G_{s}2 by g(x)=\mathbf{d}_{0} if x\in A and g(x)=\mathbf{d}_{1} otherwise. Then for any \nu\in G_{s}I we have

(gν)({0})=Ig()({0})𝑑ν=Ag()({0})𝑑ν+Bg()({0})𝑑ν=A1𝑑ν+B0𝑑ν=ν(A).(g^{\sharp}\nu)(\{0\})=\int_{I}g(-)(\{0\})d\nu=\int_{A}g(-)(\{0\})d\nu+\int_{B}g(-)(\{0\})d\nu=\int_{A}1d\nu+\int_{B}0d\nu=\nu(A).

Similarly we have (gν)({1})=ν(B)(g^{\sharp}\nu)(\{1\})=\nu(B). Therefore, we obtain

\mathsf{TV}_{I}(\nu_{1},\nu_{2})=\frac{1}{2}\int_{I}\left|\frac{d\nu_{1}}{d\mu}(x)-\frac{d\nu_{2}}{d\mu}(x)\right|\,d\mu(x)
=\frac{1}{2}\int_{A}\left(\frac{d\nu_{1}}{d\mu}(x)-\frac{d\nu_{2}}{d\mu}(x)\right)d\mu(x)+\frac{1}{2}\int_{B}\left(\frac{d\nu_{2}}{d\mu}(x)-\frac{d\nu_{1}}{d\mu}(x)\right)d\mu(x)
=\frac{1}{2}(\nu_{1}(A)-\nu_{2}(A)+\nu_{2}(B)-\nu_{1}(B))
=\frac{1}{2}((g^{\sharp}\nu_{1})(\{0\})-(g^{\sharp}\nu_{2})(\{0\})+(g^{\sharp}\nu_{2})(\{1\})-(g^{\sharp}\nu_{1})(\{1\}))
=\frac{1}{2}(|(g^{\sharp}\nu_{1})(\{0\})-(g^{\sharp}\nu_{2})(\{0\})|+|(g^{\sharp}\nu_{1})(\{1\})-(g^{\sharp}\nu_{2})(\{1\})|)
=\mathsf{TV}_{2}(g^{\sharp}\nu_{1},g^{\sharp}\nu_{2}).

We then conclude that \mathsf{TV} is 2-generated (a small numerical sketch of this construction is given after the proof of Proposition 11). ∎

Proof (of Proposition 11). For every set J and c_{1},c_{2}\in TJ, we have

ΔJ[]Ω(c1,c2)=1\displaystyle{\mathsf{\Delta}}^{[\leq]^{\Omega}}_{J}(c_{1},c_{2})=1 c1[]JΩc2\displaystyle\iff c_{1}[\leq]^{\Omega}_{J}c_{2}
g:JTΩg(c1)g(c2)\displaystyle\iff\bigwedge_{g\colon J\to T\Omega}g^{\sharp}(c_{1})\leq g^{\sharp}(c_{2})
g:JTΩg(c1)[]ΩΩg(c2)\displaystyle\iff\bigwedge_{g\colon J\to T\Omega}g^{\sharp}(c_{1})\mathbin{[\leq]^{\Omega}_{\Omega}}g^{\sharp}(c_{2})
\iff\sup_{g\colon J\to T\Omega}{\mathsf{\Delta}}^{[\leq]^{\Omega}}_{\Omega}(g^{\sharp}(c_{1}),g^{\sharp}(c_{2}))=1.

This implies that {\mathsf{\Delta}}^{[\leq]^{\Omega}} is \Omega-generated. ∎
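The two-point test constructed in the proof of Proposition 10 can be replayed numerically on finite distributions. The following sketch is only an illustration written for this presentation and is not part of the formal development; it assumes Python, represents a finite distribution as a dictionary from points to masses, and the helper names tv and tv_via_two_point_pushforward are hypothetical.

def tv(nu1, nu2):
    # TV on a finite set: (1/2) * sum_x |nu1(x) - nu2(x)|
    return 0.5 * sum(abs(nu1[x] - nu2[x]) for x in nu1)

def tv_via_two_point_pushforward(nu1, nu2):
    # The construction from the proof of Proposition 10: g sends the points of
    # A = {x | nu1-density >= nu2-density} to d_0 and the points of B = I \ A
    # to d_1, so that (g^# nu)({0}) = nu(A) and (g^# nu)({1}) = nu(B); the
    # total variation distance is then computed on the two-point space.
    A = {x for x in nu1 if nu1[x] - nu2[x] >= 0}
    p = (sum(nu1[x] for x in A), sum(nu1[x] for x in nu1 if x not in A))
    q = (sum(nu2[x] for x in A), sum(nu2[x] for x in nu2 if x not in A))
    return 0.5 * (abs(p[0] - q[0]) + abs(p[1] - q[1]))

nu1 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
nu2 = {'a': 0.25, 'b': 0.25, 'c': 0.5}
assert abs(tv(nu1, nu2) - tv_via_two_point_pushforward(nu1, nu2)) < 1e-12

The assertion passes because the pushforward along g preserves the total variation distance exactly, which is the content of the 2-generatedness argument.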

Lemma 2.

For any U𝐐𝐄𝐓(Ω,X)U\in{\bf QET}(\Omega,X), the function d[U]:(TΩX)2+d[U]\colon(T_{\Omega}X)^{2}\to\mathcal{R}^{+} defined by

d[U](t,u)inf{ε+|t=εuU}d[U](t,u)\triangleq\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\in U\right\}

is a CS-EPMet on T_{\Omega}X such that d[U](t,u)\in\mathbb{Q}^{+}\implies\emptyset\vdash{t}=_{d[U](t,u)}{u}\in U.

Proof.

We first check the axioms of extended pseudometric.

By (Ref), UU contains t=0t\emptyset\vdash{t}=_{0}{t} for each tTΩXt\in T_{\Omega}X. Hence d[U](t,t)=0d[U](t,t)=0 holds for all tTΩXt\in T_{\Omega}X.

By (Sym) and (Cut), t=εu\emptyset\vdash{t}=_{\varepsilon}{u} if and only if u=εt\emptyset\vdash{u}=_{\varepsilon}{t}. Hence, for all t,uTΩXt,u\in T_{\Omega}X,

d[U](t,u)=inf{ε+|t=εu}=inf{ε+|u=εt}=d[U](u,t)d[U](t,u)=\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\right\}=\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{u}=_{\varepsilon}{t}\right\}=d[U](u,t)

By (Tri) and (Cut), if t=εu\emptyset\vdash{t}=_{\varepsilon}{u} and u=εv\emptyset\vdash{u}=_{\varepsilon^{\prime}}{v} then t=ε+εv\emptyset\vdash{t}=_{\varepsilon+\varepsilon^{\prime}}{v}. Hence, for all t,u,vTΩXt,u,v\in T_{\Omega}X,

d[U](t,v)\displaystyle d[U](t,v) =inf{ε+|t=εv}\displaystyle=\inf\left\{\varepsilon^{\ast}\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon^{\ast}}{v}\right\}
inf{ε+ε|t=εuu=εv}\displaystyle\leq\inf\left\{\varepsilon+\varepsilon^{\prime}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\land\emptyset\vdash{u}=_{\varepsilon^{\prime}}{v}\right\}
inf{ε+|t=εu}+inf{ε+|u=εv}\displaystyle\leq\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\right\}+\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{u}=_{\varepsilon^{\prime}}{v}\right\}
=d[U](t,u)+d[U](u,v).\displaystyle=d[U](t,u)+d[U](u,v).

We next check the substitutivity. Let t,uTΩXt,u\in T_{\Omega}X and h:XTΩXh\colon X\to T_{\Omega}X. By (Subst), we have

t=εuUh(t)=εh(u)U.\emptyset\vdash{t}=_{\varepsilon}{u}\in U\implies\emptyset\vdash{h^{\sharp}(t)}=_{\varepsilon}{h^{\sharp}(u)}\in U.

Since ε\varepsilon is arbitrary, we conclude the substitutivity as follows:

d[U](h(t),h(u))\displaystyle d[U](h^{\sharp}(t),h^{\sharp}(u)) =inf{ε+|h(t)=εh(u)U}\displaystyle=\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{h^{\sharp}(t)}=_{\varepsilon}{h^{\sharp}(u)}\in U\right\}
inf{ε+|t=εuU}\displaystyle\leq\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\in U\right\}
=d[U](t,u).\displaystyle=d[U](t,u).

Next, we check the congruence. Let t\in T_{\Omega}I and h_{1},h_{2}\colon I\to T_{\Omega}X. Applying (Nonexp) and (Cut) inductively along the structure of t, we obtain

iI.h1(i)=εh2(i)Uh1(t)=εh2(t)U.\forall{i\in I}~{}.~{}\emptyset\vdash{h_{1}(i)}=_{\varepsilon}{h_{2}(i)}\in U\implies\emptyset\vdash{h_{1}^{\sharp}(t)}=_{\varepsilon}{h_{2}^{\sharp}(t)}\in U. (29)

If \sup_{i\in I}d[U](h_{1}(i),h_{2}(i))\leq\varepsilon^{\prime} for some \varepsilon^{\prime}\in\mathbb{Q}^{+}, then d[U](h_{1}(i),h_{2}(i))\leq\varepsilon^{\prime} holds for all i\in I. By (Max), (Arch), (Cut) and the definition of d[U], we have \emptyset\vdash{h_{1}(i)}=_{\varepsilon^{\prime}}{h_{2}(i)}\in U for all i\in I. Hence,

supiId[U](h1(i),h2(i))εiI.h1(i)=εh2(i)U.\sup_{i\in I}d[U](h_{1}(i),h_{2}(i))\leq\varepsilon^{\prime}\implies\forall{i\in I}~{}.~{}\emptyset\vdash{h_{1}(i)}=_{\varepsilon^{\prime}}{h_{2}(i)}\in U. (30)

From the above two implications (29) and (30), we conclude the congruence as follows:

d[U](h1(t),h2(t))\displaystyle d[U](h_{1}^{\sharp}(t),h_{2}^{\sharp}(t)) =inf{ε+|h1(t)=εh2(t)U}\displaystyle=\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{h_{1}^{\sharp}(t)}=_{\varepsilon^{\prime}}{h_{2}^{\sharp}(t)}\in U\right\}
inf{ε+|iI.h1(i)=εh2(i)U}\displaystyle\leq\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~{}\middle|~{}\forall{i\in I}~{}.~{}\emptyset\vdash{h_{1}(i)}=_{\varepsilon^{\prime}}{h_{2}(i)}\in U\right\}
\leq\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~\middle|~\sup_{i\in I}d[U](h_{1}(i),h_{2}(i))\leq\varepsilon^{\prime}\right\}
=supiId[U](h1(i),h2(i)).\displaystyle=\sup_{i\in I}d[U](h_{1}(i),h_{2}(i)).

Finally, we assume d[U](t,u)\in\mathbb{Q}^{+}. By the definition of d[U](t,u), for any \varepsilon\in\mathbb{Q}^{+} such that d[U](t,u)<\varepsilon, there is \varepsilon^{\prime}\in\mathbb{Q}^{+} satisfying d[U](t,u)\leq\varepsilon^{\prime}<\varepsilon and \emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in U. Since \varepsilon\in\mathbb{Q}^{+} is arbitrary, by (Max) and (Cut), we conclude

\forall{\varepsilon\in\mathbb{Q}^{+}}~.~(d[U](t,u)<\varepsilon\implies\emptyset\vdash{t}=_{\varepsilon}{u}\in U).

Since d[U](t,u)\in\mathbb{Q}^{+}, by (Arch) and (Cut), we have \emptyset\vdash{t}=_{d[U](t,u)}{u}\in U. ∎
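As a small illustration of the formula defining d[U], note that the infimum over an empty set of candidates is \infty, which is why d[U] is only an extended pseudometric. The following toy sketch illustrates only the infimum formula (Python, written for this presentation; judgements with empty context are represented as triples, and the list U below is not closed under the deduction rules, so it is not a genuine QET):

def d_U(judgements, t, u):
    # d[U](t, u) = inf { eps | (emptyset |- t =_eps u) is in U };
    # the infimum over the empty set is taken to be +infinity.
    candidates = [eps for (lhs, eps, rhs) in judgements if lhs == t and rhs == u]
    return min(candidates) if candidates else float('inf')

U = [('t', 0.5, 'u'), ('t', 0.25, 'u'), ('t', 0.0, 't')]   # toy judgement set
assert d_U(U, 't', 'u') == 0.25
assert d_U(U, 't', 't') == 0.0
assert d_U(U, 'u', 't') == float('inf')   # U is not closed under (Sym) here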

The monotonicity of d[]:(𝐐𝐄𝐓(Ω,X),)(𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X),)d[-]\colon({\bf QET}(\Omega,X),\subseteq)\to({\bf CSEPMet}(T_{\Omega},X),\preceq) is easy to prove:

UV\displaystyle U\subseteq V t,uTΩX.inf{ε+|t=εuU}inf{ε+|t=εuV}\displaystyle\implies\forall{t,u\in T_{\Omega}X}~{}.~{}\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\in U\right\}\geq\inf\left\{\varepsilon\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon}{u}\in V\right\}
t,uTΩX.d[U](t,u)d[V](t,u)\displaystyle\iff\forall{t,u\in T_{\Omega}X}~{}.~{}d[U](t,u)\geq d[V](t,u)
d[U]d[V].\displaystyle\iff d[U]\preceq d[V].
Lemma 3.

Let T be a monad on {\bf Set}, and let X\in{\bf Set}. For any d\in{\bf CSEPMet}(T,X), the family \mathrm{Gen}(d)=\{\mathrm{Gen}(d)_{I}\colon(TI)^{2}\to\mathcal{R}^{+}\}_{I\in{\bf Set}} defined by

Gen(d)I(c1,c2)=supk:ITXd(k(c1),k(c2))\mathrm{Gen}(d)_{I}(c_{1},c_{2})=\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))

is an XX-generated Eq{\color[rgb]{0,0,0}\mathrm{Eq}}-relative +\mathcal{R}^{+}-divergence on TT where each Gen(d)I\mathrm{Gen}(d)_{I} is a pseudometric.

Proof.

From the reflexivity of dd, we have the reflexivity of Gen(d)I\mathrm{Gen}(d)_{I}: for each cTIc\in TI,

Gen(d)I(c,c)=supk:ITXd(k(c),k(c))=0.\mathrm{Gen}(d)_{I}(c,c)=\sup_{k\colon I\to TX}d(k^{\sharp}(c),k^{\sharp}(c))=0.

Hence, the \mathrm{Eq}-unit-reflexivity of \mathrm{Gen}(d) already follows from this (proper) reflexivity. From the symmetry of d, we have the symmetry of \mathrm{Gen}(d)_{I}: for each c_{1},c_{2}\in TI,

Gen(d)I(c1,c2)\displaystyle\mathrm{Gen}(d)_{I}(c_{1},c_{2}) =supk:ITXd(k(c1),k(c2))\displaystyle=\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))
=supk:ITXd(k(c2),k(c1))\displaystyle=\sup_{k\colon I\to TX}d(k^{\sharp}(c_{2}),k^{\sharp}(c_{1}))
=Gen(d)I(c2,c1).\displaystyle=\mathrm{Gen}(d)_{I}(c_{2},c_{1}).

From the triangle-inequality of dd, we have the triangle-inequality of Gen(d)I\mathrm{Gen}(d)_{I}: for all c1,c2,c3TIc_{1},c_{2},c_{3}\in TI,

Gen(d)I(c1,c3)\displaystyle\mathrm{Gen}(d)_{I}(c_{1},c_{3}) =supk:ITXd(k(c1),k(c3))\displaystyle=\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{3}))
supk:ITXd(k(c1),k(c2))+d(k(c2),k(c3))\displaystyle\leq\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))+d(k^{\sharp}(c_{2}),k^{\sharp}(c_{3}))
supk:ITXd(k(c1),k(c2))+supk:ITXd(k(c2),k(c3))\displaystyle\leq\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))+\sup_{k\colon I\to TX}d(k^{\sharp}(c_{2}),k^{\sharp}(c_{3}))
=Gen(d)I(c1,c2)+Gen(d)I(c2,c3).\displaystyle=\mathrm{Gen}(d)_{I}(c_{1},c_{2})+\mathrm{Gen}(d)_{I}(c_{2},c_{3}).

From the congruence of d and the triangle-inequality of \mathrm{Gen}(d)_{J}, we next show the composability. Let c_{1},c_{2}\in TI and f_{1},f_{2}\colon I\to TJ. We obtain

\mathrm{Gen}(d)_{J}(f_{1}^{\sharp}(c_{1}),f_{2}^{\sharp}(c_{2}))
\leq\mathrm{Gen}(d)_{J}(f_{1}^{\sharp}(c_{1}),f_{1}^{\sharp}(c_{2}))+\mathrm{Gen}(d)_{J}(f_{1}^{\sharp}(c_{2}),f_{2}^{\sharp}(c_{2}))
=\sup_{k\colon J\to TX}d((k^{\sharp}\circ f_{1})^{\sharp}(c_{1}),(k^{\sharp}\circ f_{1})^{\sharp}(c_{2}))+\sup_{k\colon J\to TX}d((k^{\sharp}\circ f_{1})^{\sharp}(c_{2}),(k^{\sharp}\circ f_{2})^{\sharp}(c_{2}))
\leq\sup_{h\colon I\to TX}d(h^{\sharp}(c_{1}),h^{\sharp}(c_{2}))+\sup_{k\colon J\to TX}\sup_{i\in I}d(k^{\sharp}(f_{1}(i)),k^{\sharp}(f_{2}(i)))
=\mathrm{Gen}(d)_{I}(c_{1},c_{2})+\sup_{i\in I}\sup_{k\colon J\to TX}d(k^{\sharp}(f_{1}(i)),k^{\sharp}(f_{2}(i)))
=\mathrm{Gen}(d)_{I}(c_{1},c_{2})+\sup_{i\in I}\mathrm{Gen}(d)_{J}(f_{1}(i),f_{2}(i)).

Finally, we show the X-generatedness of \mathrm{Gen}(d) directly from the definition:

\mathrm{Gen}(d)_{I}(c_{1},c_{2})=\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))
=suph:XTXsupk:ITXd(h(k(c1)),h(k(c2)))\displaystyle=\sup_{h\colon X\to TX}\sup_{k\colon I\to TX}d(h^{\sharp}(k^{\sharp}(c_{1})),h^{\sharp}(k^{\sharp}(c_{2})))
=supk:ITXsuph:XTXd(h(k(c1)),h(k(c2)))\displaystyle=\sup_{k\colon I\to TX}\sup_{h\colon X\to TX}d(h^{\sharp}(k^{\sharp}(c_{1})),h^{\sharp}(k^{\sharp}(c_{2})))
=supk:ITXGen(d)X(k(c1),k(c2))\displaystyle=\sup_{k\colon I\to TX}\mathrm{Gen}(d)_{X}(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))

This completes the proof. ∎
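To see the definition of \mathrm{Gen}(d) at work, here is a small numerical sketch (Python, written only as an illustration and not part of the formal development): T is taken to be the finite distribution monad, X a two-element set, and d the total variation distance on distributions over X. The supremum over k\colon I\to TX is approximated by enumerating only the deterministic kernels; this restriction is an assumption of the sketch, justified in this particular case by the data-processing inequality and by the construction in the proof of Proposition 10, which shows that the supremum is attained at such a kernel.

from itertools import product

def pushforward(kernel, c):
    # Kleisli extension: k^#(c)(x) = sum_i c(i) * k(i)(x)
    out = {0: 0.0, 1: 0.0}
    for i, mass in c.items():
        for x, p in kernel[i].items():
            out[x] += mass * p
    return out

def tv2(p, q):
    # total variation distance on distributions over X = {0, 1}
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)

def gen_tv(c1, c2):
    # Gen(d)_I(c1, c2) = sup_{k : I -> TX} d(k^#(c1), k^#(c2)), restricted here
    # to the deterministic kernels I -> {d_0, d_1} (see the remark above).
    points = list(c1)
    best = 0.0
    for bits in product([0, 1], repeat=len(points)):
        kernel = {i: {b: 1.0, 1 - b: 0.0} for i, b in zip(points, bits)}
        best = max(best, tv2(pushforward(kernel, c1), pushforward(kernel, c2)))
    return best

c1 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
c2 = {'a': 0.25, 'b': 0.25, 'c': 0.5}
tv_I = 0.5 * sum(abs(c1[x] - c2[x]) for x in c1)
assert abs(gen_tv(c1, c2) - tv_I) < 1e-12   # Gen(TV_X)_I agrees with TV_I here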

The monotonicity of Gen:(𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X),)(𝐃𝐢𝐯𝐄𝐏𝐌𝐞𝐭(TΩ,X),)\mathrm{Gen}\colon({\bf CSEPMet}(T_{\Omega},X),\preceq)\to(\mathbf{DivEPMet}(T_{\Omega},X),\preceq) is easy to prove:

dd\displaystyle d\preceq d^{\prime} c1,c2TI.supk:ITXd(k(c1),k(c2))supk:ITXd(k(c1),k(c2))\displaystyle\implies\forall{c_{1},c_{2}\in TI}~{}.~{}\sup_{k\colon I\to TX}d(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))\geq\sup_{k\colon I\to TX}d^{\prime}(k^{\sharp}(c_{1}),k^{\sharp}(c_{2}))
c1,c2TI.Gen(d)I(c1,c2)Gen(d)I(c1,c2)\displaystyle\iff\forall{c_{1},c_{2}\in TI}~{}.~{}\mathrm{Gen}(d)_{I}(c_{1},c_{2})\geq\mathrm{Gen}(d^{\prime})_{I}(c_{1},c_{2})
Gen(d)Gen(d).\displaystyle\iff\mathrm{Gen}(d)\preceq\mathrm{Gen}(d^{\prime}).

Proof (of Theorem 6). We first show (\mathrm{Gen}(-))_{X}=\mathrm{id}. Let d\in{\bf CSEPMet}(T_{\Omega},X). We fix arbitrary t,u\in T_{\Omega}X. From the substitutivity of d, we have d(k^{\sharp}(t),k^{\sharp}(u))\leq d(t,u) for every k\colon X\to TX; since we may take k=\eta_{X}, we obtain

\mathrm{Gen}(d)_{X}(t,u)=\sup_{k\colon X\to TX}d(k^{\sharp}(t),k^{\sharp}(u))=d(t,u).

Since d,t,ud,t,u are arbitrary, we conclude (Gen())X=id(\mathrm{Gen}(-))_{X}=\mathrm{id}.

We next show \mathrm{Gen}((-)_{X})=\mathrm{id}. Let {\mathsf{\Delta}}\in\mathbf{DivEPMet}(T_{\Omega},X). By the X-generatedness of {\mathsf{\Delta}}, we have, for every set I and t,u\in T_{\Omega}I,

Gen((Δ)X)I(t,u)=supk:ITXΔX(k(t),k(u))=ΔI(t,u).\mathrm{Gen}(({\mathsf{\Delta}})_{X})_{I}(t,u)=\sup_{k\colon I\to TX}{\mathsf{\Delta}}_{X}(k^{\sharp}(t),k^{\sharp}(u))={\mathsf{\Delta}}_{I}(t,u).

Since {\mathsf{\Delta}},I,t,u are arbitrary, we conclude \mathrm{Gen}((-)_{X})=\mathrm{id}.

We show the adjointness: U[d]Vdd[V]U[d]\subseteq V\iff d\geq d[V] for any V𝐐𝐄𝐓(Ω,X)V\in{\bf QET}(\Omega,X) and d𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X)d\in{\bf CSEPMet}(T_{\Omega},X).

U[d]V\displaystyle U[d]\subseteq V {t=εu|ε+,d(t,u)ε}¯QET(Ω,X)V\displaystyle\iff\overline{\left\{\emptyset\vdash{t}=_{\varepsilon}{u}~{}\middle|~{}\varepsilon\in\mathbb{Q}^{+},d(t,u)\leq\varepsilon\right\}}^{\mathrm{QET}({\Omega},{X})}\subseteq V
\iff\forall{t,u\in T_{\Omega}X,\varepsilon\in\mathbb{Q}^{+}}~.~d(t,u)\leq\varepsilon\implies\emptyset\vdash{t}=_{\varepsilon}{u}\in V
\iff\forall{t,u\in T_{\Omega}X,\varepsilon\in\mathbb{Q}^{+}}~.~d(t,u)\leq\varepsilon\implies\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~\middle|~\emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in V\right\}\leq\varepsilon
\iff\forall{t,u\in T_{\Omega}X}~.~\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~\middle|~\emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in V\right\}\leq d(t,u)
dd[V]\displaystyle\iff d\geq d[V]

We notice that since VV is closed under (Max), (Arch) and (Cut), we have the equivalence

inf{ε+|t=εuV}ε\displaystyle\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in V\right\}\leq\varepsilon
(ε+.ε>εt=εuV)\displaystyle\implies(\forall{\varepsilon^{\prime}\in\mathbb{Q}^{+}}~{}.~{}\varepsilon^{\prime}>\varepsilon\implies\emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in V)
t=εuV\displaystyle\implies\emptyset\vdash{t}=_{\varepsilon}{u}\in V
inf{ε+|t=εuV}ε.\displaystyle\implies\inf\left\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~{}\middle|~{}\emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in V\right\}\leq\varepsilon.

We finally show d[U[-]]=\mathrm{id}_{{\bf CSEPMet}(T_{\Omega},X)}. From the adjointness, d[U[d]]\leq d holds for each d\in{\bf CSEPMet}(T_{\Omega},X). It remains to show d\leq d[U[d]], which we can rewrite as follows:

d\leq d[U[d]]
\iff\forall{t,u\in T_{\Omega}X}~.~d(t,u)\leq d[U[d]](t,u)
\iff\forall{t,u\in T_{\Omega}X}~.~d(t,u)\leq\inf\left\{\varepsilon\in\mathbb{Q}^{+}~\middle|~\emptyset\vdash{t}=_{\varepsilon}{u}\in U[d]\right\}
\iff\forall{t,u\in T_{\Omega}X,\varepsilon\in\mathbb{Q}^{+}}~.~\emptyset\vdash{t}=_{\varepsilon}{u}\in U[d]\implies d(t,u)\leq\varepsilon
\iff\{\emptyset\vdash{t}=_{\varepsilon}{u}\in U[d]\}\subseteq\{\emptyset\vdash{t}=_{\varepsilon}{u}~|~d(t,u)\leq\varepsilon\}.

Thanks to the minimality of U[d]U[d], it suffices to have a QET V𝐐𝐄𝐓(Ω,X)V\in{\bf QET}(\Omega,X) such that

{t=εuV}={t=εu|d(t,u)ε}.\{\emptyset\vdash{t}=_{\varepsilon}{u}\in V\}=\{\emptyset\vdash{t}=_{\varepsilon}{u}|d(t,u)\leq\varepsilon\}.

Inspired by the definition of models of QET ([6]), we define V as follows:

Γt=εuV\displaystyle{\Gamma\vdash{t}=_{\varepsilon}{u}}\in V
σ:XTΩX.((t=εuΓ.d(σ(t),σ(u))ε)d(σ(t),σ(u))ε).\displaystyle\iff\forall{\sigma\colon X\to T_{\Omega}X}~{}.~{}\left(\left(\forall{{t^{\prime}}=_{\varepsilon^{\prime}}{u^{\prime}}\in\Gamma}~{}.~{}d(\sigma^{\sharp}(t^{\prime}),\sigma^{\sharp}(u^{\prime}))\leq\varepsilon^{\prime}\right)\implies d(\sigma^{\sharp}(t),\sigma^{\sharp}(u))\leq\varepsilon\right).

By the substitutivity of dd and the definition of VV, we obtain for all t,uTΩXt,u\in T_{\Omega}X and ε+\varepsilon\in\mathbb{Q}^{+},

t=εuV(σ:XTΩX.d(σ(t),σ(u))ε)d(t,u)ε.{\emptyset\vdash{t}=_{\varepsilon}{u}}\in V\iff(\forall{\sigma\colon X\to T_{\Omega}X}~{}.~{}d(\sigma^{\sharp}(t),\sigma^{\sharp}(u))\leq\varepsilon)\iff d(t,u)\leq\varepsilon.

We check that VV satisfies all rules of QET:

(Ref) Immediate from the reflexivity of dd.

(Sym) Immediate from the symmetry of dd.

(Tri) Immediate from the triangle-inequality of dd.

(Max) Immediate from the transitivity of ordering \leq and the monotonicity of ++.

(Arch) Immediate from the Archimedean property and the completeness of [0,][0,\infty].

(Nonexp) Let f\colon|I|\in\Omega, i.e., let f be an operation of \Omega with arity |I|, and take the term t_{f}\in T_{\Omega}I corresponding to f. Let t,s\colon I\rightarrow T_{\Omega}X be functions. We fix an arbitrary \sigma\colon X\to T_{\Omega}X and assume d(\sigma^{\sharp}(t(i)),\sigma^{\sharp}(s(i)))\leq\varepsilon for each i\in I. This gives \sup_{i\in I}d(\sigma^{\sharp}(t(i)),\sigma^{\sharp}(s(i)))\leq\varepsilon. From the congruence of d, we conclude

d(σ(f(t(i)|iI)),σ(f(s(i)|iI)))=d(σ(t(tf)),σ(s(tf)))supiId(σ(t(i)),σ(s(i)))ε.d(\sigma^{\sharp}(f(t(i)|i\in I)),\sigma^{\sharp}(f(s(i)|i\in I)))=d(\sigma^{\sharp}(t^{\sharp}(t_{f})),\sigma^{\sharp}(s^{\sharp}(t_{f})))\leq\sup_{i\in I}d(\sigma^{\sharp}(t(i)),\sigma^{\sharp}(s(i)))\leq\varepsilon.

(Subst) Immediate by definition of VV:

Γt=εuV\displaystyle{\Gamma\vdash{t}=_{\varepsilon}{u}}\in V
σ:XTΩX.((t=εuΓ.d(σ(t),σ(u))ε)d(σ(t),σ(u))ε)\displaystyle\iff\forall{\sigma\colon X\to T_{\Omega}X}~{}.~{}\left(\left(\forall{{t^{\prime}}=_{\varepsilon^{\prime}}{u^{\prime}}\in\Gamma}~{}.~{}d(\sigma^{\sharp}(t^{\prime}),\sigma^{\sharp}(u^{\prime}))\leq\varepsilon^{\prime}\right)\implies d(\sigma^{\sharp}(t),\sigma^{\sharp}(u))\leq\varepsilon\right)
σ:XTΩX.σ:XTΩX.((t=εuΓ.d(σ(σ(t)),σ(σ(u)))ε)d(σ(σ(t)),σ(σ(u)))ε)\displaystyle\implies\forall{\sigma^{\prime}\colon X\to T_{\Omega}X}~{}.~{}\forall{\sigma\colon X\to T_{\Omega}X}~{}.~{}\left(\begin{aligned} &\left(\forall{{t^{\prime}}=_{\varepsilon^{\prime}}{u^{\prime}}\in\Gamma}~{}.~{}d(\sigma^{\sharp}({\sigma^{\prime}}^{\sharp}(t^{\prime})),\sigma^{\sharp}({\sigma^{\prime}}^{\sharp}(u^{\prime})))\leq\varepsilon^{\prime}\right)\\ &\qquad\implies d(\sigma^{\sharp}({\sigma^{\prime}}^{\sharp}(t)),\sigma^{\sharp}({\sigma^{\prime}}^{\sharp}(u)))\leq\varepsilon\end{aligned}\right)
σ:XTΩX.σ:XTΩX.((t′′=εu′′σ(Γ).d(σ(t′′),σ(u′′))ε)d(σ(σ(t)),σ(σ(u)))ε)\displaystyle\implies\forall{\sigma^{\prime}\colon X\to T_{\Omega}X}~{}.~{}\forall{\sigma\colon X\to T_{\Omega}X}~{}.~{}\left(\begin{aligned} &\left(\forall{{t^{\prime\prime}}=_{\varepsilon^{\prime}}{u^{\prime\prime}}\in\sigma^{\prime}(\Gamma)}~{}.~{}d(\sigma^{\sharp}(t^{\prime\prime}),\sigma^{\sharp}(u^{\prime\prime}))\leq\varepsilon^{\prime}\right)\\ &\qquad\implies d(\sigma^{\sharp}({\sigma^{\prime}}^{\sharp}(t)),\sigma^{\sharp}({\sigma^{\prime}}^{\sharp}(u)))\leq\varepsilon\end{aligned}\right)
σ:XTΩX.σ(Γ)σ(t)=εσ(u)V.\displaystyle\iff\forall{\sigma^{\prime}\colon X\to T_{\Omega}X}~{}.~{}{\sigma^{\prime}}(\Gamma)\vdash{{\sigma^{\prime}}^{\sharp}(t)}=_{\varepsilon}{{\sigma^{\prime}}^{\sharp}(u)}\in V.

(Cut) Immediate.

(Assumpt) Immediate. ∎

Proof (of Theorem 7). Since the range of U[-] is a subset of \mathbf{UQET}(\Omega,X), we may define the following monotone restrictions of U[-] and d[-]:

U[]\displaystyle U^{\prime}[-] :(𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X),)(𝐔𝐐𝐄𝐓(Ω,X),)\displaystyle\colon({\bf CSEPMet}(T_{\Omega},X),\preceq)\to(\mathbf{UQET}(\Omega,X),\subseteq) U[d]U[d](d𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X)),\displaystyle U^{\prime}[d]\triangleq U[d]\quad(d\in{\bf CSEPMet}(T_{\Omega},X)),
d[]\displaystyle d^{\prime}[-] :(𝐔𝐐𝐄𝐓(Ω,X),)(𝐂𝐒𝐄𝐏𝐌𝐞𝐭(TΩ,X),)\displaystyle\colon(\mathbf{UQET}(\Omega,X),\subseteq)\to({\bf CSEPMet}(T_{\Omega},X),\preceq) d[V]d[V](V𝐔𝐐𝐄𝐓(Ω,X)).\displaystyle d^{\prime}[V]\triangleq d[V]\quad(V\in\mathbf{UQET}(\Omega,X)).

By Theorem 6, we have the adjunction U^{\prime}[-]\dashv d^{\prime}[-] and d^{\prime}[U^{\prime}[-]]={\rm id}. We show U^{\prime}[d^{\prime}[-]]=\mathrm{id}. Let V\in\mathbf{UQET}(\Omega,X). There exists S\subseteq\{\emptyset\vdash{t}=_{\varepsilon}{u}~|~t,u\in T_{\Omega}X,\varepsilon\in\mathbb{Q}^{+}\} such that V=\overline{S}^{\mathrm{QET}({\Omega},{X})}. We check U^{\prime}[d^{\prime}[V]]=V. By the adjunction U^{\prime}[-]\dashv d^{\prime}[-], we have U^{\prime}[d^{\prime}[V]]\subseteq V, since this is equivalent to the trivial inequality d^{\prime}[V]\preceq d^{\prime}[V]. It suffices to check V\subseteq U^{\prime}[d^{\prime}[V]]. We have

t=εuS\displaystyle\emptyset\vdash{t}=_{\varepsilon}{u}\in S
t=εuV\displaystyle\implies\emptyset\vdash{t}=_{\varepsilon}{u}\in V
d[V](t,u)=inf{ε+|t=εuV}ε\displaystyle\implies d^{\prime}[V](t,u)=\inf\{\varepsilon^{\prime}\in\mathbb{Q}^{+}~{}|~{}\emptyset\vdash{t}=_{\varepsilon^{\prime}}{u}\in V\}\leq\varepsilon

From the monotonicity of the closure ()¯QET(Ω,X)\overline{(-)}^{\mathrm{QET}({\Omega},{X})}, we conclude

V=S¯QET(Ω,X){t=εu|d[V](t,u)ε}¯QET(Ω,X)=U[d[V]].V=\overline{S}^{\mathrm{QET}({\Omega},{X})}\subseteq\overline{\{\emptyset\vdash{t}=_{\varepsilon}{u}~{}|~{}d^{\prime}[V](t,u)\leq\varepsilon\}}^{\mathrm{QET}({\Omega},{X})}=U^{\prime}[d^{\prime}[V]].

Since V\in\mathbf{UQET}(\Omega,X) is arbitrary, we have U^{\prime}[d^{\prime}[-]]=\mathrm{id}. ∎

Lemma 4.

Let (\mathbb{C},T) be a CC-SM and {\mathsf{\Delta}}=\{{\mathsf{\Delta}}^{m}_{I}\colon(U(TI))^{2}\to\mathcal{Q}\}_{m\in M,I\in\mathbb{C}} be a doubly-indexed family of \mathcal{Q}-divergences satisfying monotonicity on m (Definition 6). Then T^{[{\mathsf{\Delta}}]} is an M\times\mathcal{Q}-graded relational lifting of T (it satisfies conditions 1-3 of Definition 15).

Proof.

(Condition 1) We first show that (idTX1,idTX2)𝐁𝐑𝐞𝐥()(T[Δ](m,v)X,T[Δ](n,w)X)(\mathrm{id}_{TX_{1}},\mathrm{id}_{TX_{2}})\in{\bf BRel}(\mathbb{C})(T^{[{\mathsf{\Delta}}]}(m,v)X,T^{[{\mathsf{\Delta}}]}(n,w)X) for all XX whenever mnm\leq n and vwv\leq w. From the monotonicity of Δ{\mathsf{\Delta}}, for all II\in\mathbb{C}, c1,c2U(TI)c^{\prime}_{1},c^{\prime}_{2}\in U(TI), nMn^{\prime}\in M,w𝒬w^{\prime}\in\mathcal{Q}, we have

(c^{\prime}_{1},c^{\prime}_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n^{\prime},v+w^{\prime})I
\iff{\mathsf{\Delta}}^{m\cdot n^{\prime}}_{I}(c^{\prime}_{1},c^{\prime}_{2})\leq v+w^{\prime}
\implies{\mathsf{\Delta}}^{n\cdot n^{\prime}}_{I}(c^{\prime}_{1},c^{\prime}_{2})\leq v+w^{\prime}
\implies{\mathsf{\Delta}}^{n\cdot n^{\prime}}_{I}(c^{\prime}_{1},c^{\prime}_{2})\leq w+w^{\prime}
\iff(c^{\prime}_{1},c^{\prime}_{2})\in\tilde{\mathsf{\Delta}}(n\cdot n^{\prime},w+w^{\prime})I.

Therefore, for any (c1,c2)T[Δ](m,v)X(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(m,v)X, we obtain (c1,c2)T[Δ](n,w)X(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X as follows:

(c1,c2)T[Δ](m,v)X\displaystyle(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(m,v)X
I,nM,w𝒬,(k1,k2):X˙Δ~(n,w)I.(k1c1,k2c2)Δ~(mn,v+w)I\displaystyle\iff\forall I\in\mathbb{C},n^{\prime}\in M,w^{\prime}\in\mathcal{Q},(k_{1},k_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime},w^{\prime})I~{}.~{}(k_{1}^{\sharp}\mathbin{\bullet}c_{1},k_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n^{\prime},v+w^{\prime})I
I,nM,w𝒬,(k1,k2):X˙Δ~(n,w)I.(k1c1,k2c2)Δ~(nn,w+w)I\displaystyle\implies\forall I\in\mathbb{C},n^{\prime}\in M,w^{\prime}\in\mathcal{Q},(k_{1},k_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime},w^{\prime})I~{}.~{}(k_{1}^{\sharp}\mathbin{\bullet}c_{1},k_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(n\cdot n^{\prime},w+w^{\prime})I
(c1,c2)T[Δ](n,w)X.\displaystyle\iff(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X.

(Condition 2) We next show (ηX1,ηX2):X˙T[Δ](1,0)X(\eta_{X_{1}},\eta_{X_{2}}):X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(1,0)X. From the definition of morphisms in 𝐁𝐑𝐞𝐥(){\bf BRel}(\mathbb{C}), for all (x1,x2)X(x_{1},x_{2})\in X, we have (ηX1x1,ηX2x2)T[Δ](1,0)X(\eta_{X_{1}}\mathbin{\bullet}x_{1},\eta_{X_{2}}\mathbin{\bullet}x_{2})\in T^{[{\mathsf{\Delta}}]}(1,0)X as follows:

(x1,x2)X\displaystyle(x_{1},x_{2})\in X
I,nM,w𝒬,(k1,k2):X˙Δ~(n,w)I.(k1x1,k2x2)Δ~(n,w)I\displaystyle\implies\forall{I\in\mathbb{C},n\in M,w\in\mathcal{Q},(k_{1},k_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I}~{}.~{}(k_{1}\mathbin{\bullet}x_{1},k_{2}\mathbin{\bullet}x_{2})\in\tilde{\mathsf{\Delta}}(n,w)I
\iff\forall{I\in\mathbb{C},n\in M,w\in\mathcal{Q},(k_{1},k_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I}~.~((k_{1}^{\sharp}\circ\eta_{X_{1}})\mathbin{\bullet}x_{1},(k_{2}^{\sharp}\circ\eta_{X_{2}})\mathbin{\bullet}x_{2})\in\tilde{\mathsf{\Delta}}(n,w)I
I,nM,w𝒬,(k1,k2):X˙Δ~(n,w)I.(k1(ηX1x1),k2(ηX2x2))Δ~(n,w)I\displaystyle\iff\forall{I\in\mathbb{C},n\in M,w\in\mathcal{Q},(k_{1},k_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n,w)I}~{}.~{}(k_{1}^{\sharp}\mathbin{\bullet}(\eta_{X_{1}}\mathbin{\bullet}x_{1}),k_{2}^{\sharp}\mathbin{\bullet}(\eta_{X_{2}}\mathbin{\bullet}x_{2}))\in\tilde{\mathsf{\Delta}}(n,w)I
(ηX1x1,ηX2x2)T[Δ](1,0)X.\displaystyle\iff(\eta_{X_{1}}\mathbin{\bullet}x_{1},\eta_{X_{2}}\mathbin{\bullet}x_{2})\in T^{[{\mathsf{\Delta}}]}(1,0)X.

(Condition 3) Finally, we show that (f1,f2):T[Δ](n,w)X˙T[Δ](nm,w+v)Y(f_{1}^{\sharp},f_{2}^{\sharp}):T^{[{\mathsf{\Delta}}]}(n,w)X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(n\cdot m,w+v)Y holds for any (f1,f2):X˙T[Δ](m,v)Y(f_{1},f_{2}):X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)Y and (n,w)M×𝒬(n,w)\in M\times\mathcal{Q}. For all (f1,f2):X˙T[Δ](m,v)Y(f_{1},f_{2}):X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)Y, we have

(f1,f2):X˙T[Δ](m,v)Y\displaystyle(f_{1},f_{2}):X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)Y
(x1,x2)X.(f1x1,f2x2)T[Δ](m,v)Y\displaystyle\iff\forall{(x_{1},x_{2})\in X}~{}.~{}(f_{1}\mathbin{\bullet}x_{1},f_{2}\mathbin{\bullet}x_{2})\in T^{[{\mathsf{\Delta}}]}(m,v)Y
((x1,x2)X,I,nM,w𝒬,(k1,k2):Y˙Δ~(n,w)I.(k1(f1x1),k2(f2x2))Δ~(mn,v+w)I)\displaystyle\iff\left(\begin{aligned} &\forall{(x_{1},x_{2})\in X,I\in\mathbb{C},n^{\prime}\in M,w^{\prime}\in\mathcal{Q},(k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime},w^{\prime})I}~{}.~{}\\ &\qquad(k_{1}^{\sharp}\mathbin{\bullet}(f_{1}\mathbin{\bullet}x_{1}),k_{2}^{\sharp}\mathbin{\bullet}(f_{2}\mathbin{\bullet}x_{2}))\in\tilde{\mathsf{\Delta}}(m\cdot n^{\prime},v+w^{\prime})I\end{aligned}\right)
\iff\left(\begin{aligned}&\forall{(x_{1},x_{2})\in X,I\in\mathbb{C},n^{\prime}\in M,w^{\prime}\in\mathcal{Q},(k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime},w^{\prime})I}~.~\\&\qquad((k_{1}^{\sharp}\circ f_{1})\mathbin{\bullet}x_{1},(k_{2}^{\sharp}\circ f_{2})\mathbin{\bullet}x_{2})\in\tilde{\mathsf{\Delta}}(m\cdot n^{\prime},v+w^{\prime})I\end{aligned}\right)
(I,nM,w𝒬,(k1,k2):Y˙Δ~(n,w)I.(k1f1,k2f2):X˙Δ~(mn,v+w)I).\displaystyle\iff\left(\begin{aligned} &\forall{I\in\mathbb{C},n^{\prime}\in M,w^{\prime}\in\mathcal{Q},(k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime},w^{\prime})I}~{}.~{}\\ &\qquad(k_{1}^{\sharp}\circ f_{1},k_{2}^{\sharp}\circ f_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m\cdot n^{\prime},v+w^{\prime})I\end{aligned}\right). (a)

For all (c1,c2)T[Δ](n,w)X(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X, we have

(c1,c2)T[Δ](n,w)X\displaystyle(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X
(I,nM,w𝒬,(l1,l2):X˙Δ~(n,w)I.(l1c1,l2c2)Δ~(nn,w+w)I).\displaystyle\iff\left(\begin{aligned} &\forall{I\in\mathbb{C},n^{\prime}\in M,w^{\prime}\in\mathcal{Q},(l_{1},l_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime},w^{\prime})I}~{}.~{}\\ &\qquad(l_{1}^{\sharp}\mathbin{\bullet}c_{1},l_{2}^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(n\cdot n^{\prime},w+w^{\prime})I\end{aligned}\right). (b)

We here fix (f_{1},f_{2}):X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(m,v)Y. We show (f_{1}^{\sharp},f_{2}^{\sharp})\colon T^{[{\mathsf{\Delta}}]}(n,w)X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(n\cdot m,w+v)Y. We also fix I\in\mathbb{C}, n^{\prime\prime}\in M, w^{\prime\prime}\in\mathcal{Q} and (k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime\prime},w^{\prime\prime})I. From (a), we obtain

(k1f1,k2f2):X˙Δ~(mn,v+w)I.(k_{1}^{\sharp}\circ f_{1},k_{2}^{\sharp}\circ f_{2}):X\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m\cdot n^{\prime\prime},v+w^{\prime\prime})I.

Therefore, by instantiating (b) with (n,w)=(mn,v+w)(n^{\prime},w^{\prime})=(m\cdot n^{\prime\prime},v+w^{\prime\prime}) and (l1,l2)=(k1f1,k2f2)(l_{1},l_{2})=(k_{1}^{\sharp}\circ f_{1},k_{2}^{\sharp}\circ f_{2}), for all (c1,c2)T[Δ](n,w)X(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X, we have

((k1f1)c1,(k2f2)c2)Δ~(nmn,w+v+w)I.((k_{1}^{\sharp}\circ f_{1})^{\sharp}\mathbin{\bullet}c_{1},(k_{2}^{\sharp}\circ f_{2})^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(n\cdot m\cdot n^{\prime\prime},w+v+w^{\prime\prime})I.

Since (c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X, I\in\mathbb{C}, n^{\prime\prime}\in M, w^{\prime\prime}\in\mathcal{Q} and (k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(n^{\prime\prime},w^{\prime\prime})I are arbitrary, we conclude (f_{1}^{\sharp},f_{2}^{\sharp})\colon T^{[{\mathsf{\Delta}}]}(n,w)X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(n\cdot m,w+v)Y as follows:

\left(\begin{aligned}&\forall{(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X,I\in\mathbb{C},m^{\prime\prime}\in M,v^{\prime\prime}\in\mathcal{Q},(k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m^{\prime\prime},v^{\prime\prime})I}~.~\\&\qquad((k_{1}^{\sharp}\circ f_{1})^{\sharp}\mathbin{\bullet}c_{1},(k_{2}^{\sharp}\circ f_{2})^{\sharp}\mathbin{\bullet}c_{2})\in\tilde{\mathsf{\Delta}}(n\cdot m\cdot m^{\prime\prime},w+v+v^{\prime\prime})I\end{aligned}\right)
\iff\left(\begin{aligned}&\forall{(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X,I\in\mathbb{C},m^{\prime\prime}\in M,v^{\prime\prime}\in\mathcal{Q},(k_{1},k_{2}):Y\mathbin{\dot{\rightarrow}}\tilde{\mathsf{\Delta}}(m^{\prime\prime},v^{\prime\prime})I}~.~\\&\qquad(k_{1}^{\sharp}\mathbin{\bullet}(f_{1}^{\sharp}\mathbin{\bullet}c_{1}),k_{2}^{\sharp}\mathbin{\bullet}(f_{2}^{\sharp}\mathbin{\bullet}c_{2}))\in\tilde{\mathsf{\Delta}}(n\cdot m\cdot m^{\prime\prime},w+v+v^{\prime\prime})I\end{aligned}\right)
\iff\forall{(c_{1},c_{2})\in T^{[{\mathsf{\Delta}}]}(n,w)X}~.~(f_{1}^{\sharp}\mathbin{\bullet}c_{1},f_{2}^{\sharp}\mathbin{\bullet}c_{2})\in T^{[{\mathsf{\Delta}}]}(n\cdot m,w+v)Y
\iff(f_{1}^{\sharp},f_{2}^{\sharp})\colon T^{[{\mathsf{\Delta}}]}(n,w)X\mathbin{\dot{\rightarrow}}T^{[{\mathsf{\Delta}}]}(n\cdot m,w+v)Y.

This completes the proof. ∎

Proof (of Proposition 13). By Theorem 8 and the assumption \forall{I,J\in\mathbb{C}}~.~EI\mathbin{\dot{\times}}EJ\subseteq E(I\times J), we obtain, for all (x_{1},x_{2})\in EI and c_{1},c_{2}\in U(TJ),

(x1,c1,x2,c2)EI×˙Δ~(m,v)J\displaystyle(\langle x_{1},c_{1}\rangle,\langle x_{2},c_{2}\rangle)\in EI\mathbin{\dot{\times}}\tilde{\mathsf{\Delta}}(m,v)J
(x1,c1,x2,c2)EI×˙T[Δ](m,v)(EJ)\displaystyle\iff(\langle x_{1},c_{1}\rangle,\langle x_{2},c_{2}\rangle)\in EI\mathbin{\dot{\times}}T^{[{\mathsf{\Delta}}]}(m,v)(EJ)
(θI,Jx1,c1,θI,Jx2,c2)T[Δ](m,v)(EI×˙EJ)\displaystyle\implies(\theta_{I,J}\mathbin{\bullet}\langle x_{1},c_{1}\rangle,\theta_{I,J}\mathbin{\bullet}\langle x_{2},c_{2}\rangle)\in T^{[{\mathsf{\Delta}}]}(m,v)(EI\mathbin{\dot{\times}}EJ)
(θI,Jx1,c1,θI,Jx2,c2)T[Δ](m,v)E(I×J)\displaystyle\implies(\theta_{I,J}\mathbin{\bullet}\langle x_{1},c_{1}\rangle,\theta_{I,J}\mathbin{\bullet}\langle x_{2},c_{2}\rangle)\in T^{[{\mathsf{\Delta}}]}(m,v)E(I\times J)
(θI,Jx1,c1,θI,Jx2,c2)Δ~(m,v)(I×J).\displaystyle\iff(\theta_{I,J}\mathbin{\bullet}\langle x_{1},c_{1}\rangle,\theta_{I,J}\mathbin{\bullet}\langle x_{2},c_{2}\rangle)\in\tilde{\mathsf{\Delta}}(m,v)(I\times J).

This completes the proof. ∎

Lemma 5.

The mapping

(x,σ){𝒩(x,σ2)σ2>0𝐝xσ=0(x,\sigma)\mapsto\begin{cases}\mathcal{N}(x,\sigma^{2})&\sigma^{2}>0\\ \mathbf{d}_{x}&\sigma=0\end{cases}

forms a measurable function of type ×G\mathbb{R}\times\mathbb{R}\to G\mathbb{R}.

Proof.

We show that for all AΣA\in\Sigma_{\mathbb{R}}, the mapping fA(x,σ)=𝒩(x,σ2)(A)f_{A}(x,\sigma)=\mathcal{N}(x,\sigma^{2})(A) forms a measurable function of type ×0[0,1]\mathbb{R}\times\mathbb{R}_{\neq 0}\to[0,1] where 0\mathbb{R}_{\neq 0} is the subspace of \mathbb{R} whose underlying set is {r|r0}\{r\in\mathbb{R}|r\neq 0\}. We have

\mathcal{N}(x,\sigma^{2})(A)=\sum_{k\in\mathbb{Z}}\mathcal{N}(x,\sigma^{2})(A\cap[k,k+1])=\sum_{k\in\mathbb{Z}}\int_{A\cap[k,k+1]}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-r)^{2}}{2\sigma^{2}}\right)dr.

The mapping h(x,\sigma,r)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-r)^{2}}{2\sigma^{2}}\right) forms a continuous function of type \mathbb{R}\times\mathbb{R}_{\neq 0}\times\mathbb{R}\to\mathbb{R}, hence it is uniformly continuous on the compact set I_{1}\times I_{2}\times[k,k+1] where I_{1} and I_{2} are arbitrary closed intervals in \mathbb{R} and \mathbb{R}_{\neq 0} respectively. Then, for all 0<\varepsilon, there exists 0<\delta such that |h(x,\sigma,r)-h(x^{\prime},\sigma^{\prime},r^{\prime})|<\varepsilon holds whenever |x-x^{\prime}|+|\sigma-\sigma^{\prime}|+|r-r^{\prime}|<\delta. Hence, for all 0<\varepsilon, there is 0<\delta such that whenever |x-x^{\prime}|+|\sigma-\sigma^{\prime}|<\delta,

\left|\int_{A\cap[k,k+1]}h(x,\sigma,r)\,dr-\int_{A\cap[k,k+1]}h(x^{\prime},\sigma^{\prime},r)\,dr\right|\leq\int_{[k,k+1]}|h(x,\sigma,r)-h(x^{\prime},\sigma^{\prime},r)|\,dr\leq\varepsilon.

Since the closed intervals I_{1} and I_{2} are arbitrary, we conclude that the function f_{A\cap[k,k+1]}\colon\mathbb{R}\times\mathbb{R}_{\neq 0}\to[0,1] is continuous, hence measurable. Hence, the mapping f_{A}=\sum_{k\in\mathbb{Z}}f_{A\cap[k,k+1]} is measurable. Since A is arbitrary and f_{A}=\mathrm{ev}_{A}\circ g for the mapping g(x,\sigma)=\mathcal{N}(x,\sigma^{2}), the mapping g forms a measurable function of type \mathbb{R}\times\mathbb{R}_{\neq 0}\to G\mathbb{R}. The rest of the proof is routine. ∎

Corollary 1.

[[𝚗𝚘𝚛𝚖]]𝐐𝐁𝐒(K×K,PK)[\![\mathtt{norm}]\!]\in{\bf QBS}(K\mathbb{R}\times K\mathbb{R},PK\mathbb{R}).
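For illustration, the map of Lemma 5 can be transcribed as follows. This is a minimal Python sketch written only for this presentation (it is not taken from any formalization accompanying the paper); a measure on \mathbb{R} is represented here solely through its cumulative distribution function, and the helper name normal_or_dirac_cdf is hypothetical.

from math import erf, sqrt

def normal_or_dirac_cdf(x, sigma):
    # (x, sigma) |-> N(x, sigma^2) when sigma != 0, and the Dirac measure d_x
    # when sigma = 0; each measure is returned as its cumulative distribution
    # function t |-> measure of (-infinity, t].
    if sigma == 0.0:
        return lambda t: 1.0 if t >= x else 0.0                             # d_x
    return lambda t: 0.5 * (1.0 + erf((t - x) / (abs(sigma) * sqrt(2.0))))  # N(x, sigma^2)

cdf = normal_or_dirac_cdf(0.0, 1.0)
assert abs(cdf(0.0) - 0.5) < 1e-12
assert normal_or_dirac_cdf(3.0, 0.0)(2.9) == 0.0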

Lemma 6 (Measurability of [[𝚕𝚊𝚙]][\![\mathtt{lap}]\!]).

The mapping

(x,λ){Lap(x,λ)λ>0𝐝xλ0(x,\lambda)\mapsto\begin{cases}\mathrm{Lap}(x,\lambda)&\lambda>0\\ \mathbf{d}_{x}&\lambda\leq 0\end{cases}

forms a measurable function of type ×G\mathbb{R}\times\mathbb{R}\to G\mathbb{R}.

Proof.

We have, for all AΣA\in\Sigma_{\mathbb{R}},

Lap(x,λ)(A)=A12λexp(|xr|λ)dr\mathrm{Lap}(x,\lambda)(A)=\int_{A}\frac{1}{2\lambda}\exp\left(-\frac{|x-r|}{\lambda}\right)dr

The density function h(x,\lambda,r)=\frac{1}{2\lambda}\exp\left(-\frac{|x-r|}{\lambda}\right) is a continuous function of type \mathbb{R}\times\mathbb{R}_{>0}\times\mathbb{R}\to\mathbb{R} where \mathbb{R}_{>0} is the subspace of \mathbb{R} whose underlying set is \{r\in\mathbb{R}~|~0<r\}. The measurability of \mathrm{Lap}(x,\lambda) is proved in the same way as that of \mathcal{N}(x,\sigma^{2}). The rest of the proof is routine. ∎

Corollary 2.

[[𝚕𝚊𝚙]]𝐐𝐁𝐒(K×K,PK)[\![\mathtt{lap}]\!]\in{\bf QBS}(K\mathbb{R}\times K\mathbb{R},PK\mathbb{R}).
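Analogously, the map of Lemma 6 admits the following transcription (again a minimal, purely illustrative Python sketch with a hypothetical helper name; measures are represented by their cumulative distribution functions, and the case \lambda\leq 0 returns the Dirac measure \mathbf{d}_{x}).

from math import exp

def laplace_or_dirac_cdf(x, lam):
    # (x, lambda) |-> Lap(x, lambda) when lambda > 0, and the Dirac measure d_x
    # when lambda <= 0; each measure is returned as its cumulative distribution
    # function.
    if lam <= 0.0:
        return lambda t: 1.0 if t >= x else 0.0        # d_x
    def cdf(t):                                        # Lap(x, lambda)
        if t < x:
            return 0.5 * exp((t - x) / lam)
        return 1.0 - 0.5 * exp(-(t - x) / lam)
    return cdf

assert abs(laplace_or_dirac_cdf(0.0, 1.0)(0.0) - 0.5) < 1e-12
assert laplace_or_dirac_cdf(1.0, -2.0)(0.5) == 0.0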