Divergences on Monads for Relational Program Logics
Tetsuya Sato
Shin-ya Katsumata
Abstract
Several relational program logics have been introduced for
integrating reasoning about relational properties of programs and
measurement of quantitative difference between computational
effects. Towards a general framework for such logics, in this
paper, we formalize quantitative difference between computational
effects as divergence on monad, then develop a relational
program logic acRL that supports generic computational effects and
divergences on them. To give a categorical semantics of acRL
supporting divergences, we give a method to obtain graded
strong relational liftings from divergences on monads. We derive
two instantiations of acRL for the verification of 1) various differential
privacy of higher-order functional
probabilistic programs and 2) difference of distribution of costs
between higher-order functional programs with
probabilistic choice and cost counting operations.
1 Introduction
Comparing behavior of programs is one of the fundamental activities in
the verification and analysis of programs, and many concepts,
techniques and formal systems have been proposed for this purpose,
such as product program construction ([11]), relational Hoare logic ([14]), higher-order relational refinement types
([9]) and so on.
Several recent relational program logics integrate compositional
reasoning about relational properties of programs and
over-approximation of quantitative difference between
computational effects of programs; the latter is done in the style of
effect system ([36]). One
successful logic of this kind is Barthe et al.’s approximate
probabilistic relational Hoare logic (apRHL for short) designed
for verifying differential privacy of probabilistic programs
([12]). A judgment of apRHL is of the
form , and its intuitive
meaning is that for any state pair related by ,
the -distance between two probability distributions of final
states and is below , and final
states satisfy . Another relational program logic that measures
the difference between computational effects of programs is
Çiçek et al.’s RelCost
([18]). The target of the reasoning
is a higher-order programming language equipped with cost counting
effect. When we derive a judgment
in RelCost, the sound semantics ensures that the difference of
cost counts by and is bound by .
A high-level view on these relational program logics is that they
integrate the feature of measuring quantitative difference between
computational effects into relational program logic. We are interested
in extracting mathematical essence of this design and making
relational program logics versatile. Towards this goal, we contribute the
following development.
•
We introduce a structure called divergence on monad for
measuring quantitative difference between computational effects
(Section 4,
5). This generalizes various statistical
divergences, such as Kullback-Leibler divergence and total variation
distance on probability distributions. After exploring examples of
divergence on monads, we introduce a method to transfer divergences
on a monad to those on another monad through monad opfunctors.
•
The key structure to integrate divergences on monads and
relational program logics is something called graded strong
relational lifting of monads that extends given divergences. We
present a general construction of such liftings from divergences on
monads in Section 7. This generalization
shows that the development of relational program logics with
quantitative measurement on computational effects can be done with
various combinations of monads and divergences on them.
•
We introduce a generic relational program logic (called acRL)
over Moggi’s computational metalanguage (the simply-typed lambda
calculus with monadic types) in Section 8. Inside acRL,
we can use graded strong relational liftings constructed from
divergences on a monad, and reason about relational properties of
programs together with quantitative difference of computational
effects. To illustrate how the reasoning works in acRL, we
instantiate it with the computational metalanguage having effectful
operations for continuous random sampling (Section 9)
and cost counting operation (Section 10).
2 Preliminaries
We assume basic knowledge about category theory ([37]) and
Moggi’s model of computational effects ([42]).
The definition of monad [37, Chapter VI] and Kleisli category
[37, Section VI.5] are omitted.
In this paper, a Cartesian category (CC for short) is specified by a
category with a designated final object and a binary product
functor . The associated pairing operation
and projection morphisms are denoted by
, respectively. The unique morphism to the
terminal object is denoted by . A Cartesian closed category (CCC for
short) is a CC with a specified exponential functor
. The associated
evaluation morphism and currying operation is denoted by
respectively.
Let be a CC. A global element of is
a morphism of type . For a category , we define the
functor by . When is obvious,
is denoted by .
Morphisms in act on global elements
by the composition. To emphasize this action, we introduce a dedicated
notation whose type is . Of course,
. We also define the partial
application of a binary morphism to a
global element by
.
When is a CCC, there is an evident isomorphism .
We write for its inverse.
A monad on a category determines the operation
called Kleisli extension. It is
defined by . A monad may be given as a
Kleisli triple [42, Definition 1.2]. A
strong monad on a CC is a pair of a monad
and a natural transformation
called strength. It
should satisfy four axioms; see [42, Definition
3.2] for detail.
In a CC-SM , the
application of the strength to a global element can be expressed by
the unit and the Kleisli extension of [42, Proof of Proposition 3.4]:
(1)
We will use this fact in Proposition
7 and Proposition
1.
There are plenty of examples of C(C)Cs. For the models of
probabilistic computation, we will later use CC of
measurable spaces and CCC of quasi-Borel spaces
([28]). Their definitions are deferred to Section
13.
2.1 Category of Binary Relations
We next introduce the category of binary relations over
-objects. This category is equivalent to subscones of
([41]). It offers an underlying
category for relational reasoning about programs interpreted in .
•
An object in is a triple where
.
•
A morphism from to in is a pair of
-morphisms and such that for any
, we have .
When is a name of a -object, by we mean its
first and second component, and by we mean its third component; so
. By we mean .
For objects and a morphism
in , by
we mean that , that is, for any
, we have . We say that
is an endorelation (over ) if .
We next define the forgetful functor by
For , by
we mean the complete boolean
algebra with the order given by
.
When is a C(C)C, so is
[41, Proposition 4.3]. We specify a final object, a binary
product functor and an exponential functor (in case is a CCC) on
by:
3 Divergences on Objects
We introduce the concept of divergence on objects in a CC
. Major differences between divergence and metric are threefold:
1) it is defined over objects in , 2) no axioms is imposed on it,
and 3) it takes values in a partially ordered monoid called divergence domain, which we define below.
Definition 1.
A divergence domain is a partially ordered
commutative monoid whose poset part is a complete lattice.
The monoid addition is only required to be monotone; no
interaction with the sup / inf is required. We reserve the letter
to denote a general divergence domain. Examples of divergence
domains are:
Here, is an extension of the addition by
.
Definition 2.
Let be a CC. A -divergence on an object
is a function .
A suitable notion of morphism between -objects with divergences
is nonexpansive morphism.
Definition 3.
Let be a CC. We define the category of
-divergences on -objects and nonexpansive morphisms
between them by the following data.
•
An object is a pair of an object and a
-divergence on .
•
A morphism from to is a -morphism
such that for any ,
holds.
For an object , by we mean its -divergence part.
We also define the forgetful functor by
and .
We remark that the
forgetful functor is a (Grothendieck) fibration,
and the functor defined by
and makes the
following commutative square a pullback in (the large category
of categories and functors between them):
Therefore this pullback diagram asserts that arises from
the change-of-base of the fibration along the
global section functor ([29]).
4 Divergences on Monads
We introduce the concept of divergence on monad as a
quantitative measure of difference between computational effects.
This is hinted from Barthe and Olmedo’s composable divergences on
probability distributions
([13]). Divergences on monads are defined
upon two extra data called grading monoid and basic
endorelation.
Definition 4.
A grading monoid is a partially ordered
monoid .
Definition 5.
A basic endorelation is a functor such
that is an endorelation on .
Grading monoids will be used when formulating
-differential privacy as a divergence on a
monad. Basic endorelations specify which global elements are regarded
as identical. Any CC has at least two basic endorelations of
equality relations and total relations:
Other examples of basic endorelations can be found in concrete
categories.
•
The category of -divergences on -objects has a basic
relation parameterized by . It collects all
pairs of global elements whose divergence is bound by .
That is, .
•
The category of preorders and monotone functions has the basic
endorelation collecting equivalent global elements:
.
Definition 6.
Let be a CC-SM, be a
divergence domain, be a grading monoid and
be a basic endorelation. An -relative -graded
-divergence (when , we drop “-graded”) on the
monad is a doubly-indexed family of -divergences
satisfying the following conditions:
Monotonicity
For any in , and
,
-unit Reflexivity
For any ,
-composability
For any , ,
and ,
We write for the collection of -relative
-graded -divergences on .
We introduce a partial order on by:
The -composability condition is a generalization of the
composability of differential privacy stated as [13, Theorem
1]. What is new in this paper is that 1) we
introduce a condition on the monad unit (-unit reflexivity), and
that 2) the sup computed in -unit reflexivity and -composability
scans global elements related by , while [13]
only considers the case where . We will later show that both
-unit reflexivity and -composability play an important role when
connecting divergences, relational liftings of , and the monad
structure of - these conditions are necessary and sufficient to
construct strong graded relational liftings of satisfying
fundamental property with respect to divergences (Proposition
2).
5 Examples of Divergences on Monads
5.1 Cost Difference for Deterministic Computations
To aid in understanding the -unit reflexivity and -composability
conditions, we illustrate a few divergences on an elementary
monad: the cost count monad on . Its
unit and Kleisli extension are defined by
The monad can be used to record the cost incurred by deterministic
computations. For instance, consider the quick sort algorithm
and the insertion sort algorithm , both of which are modified so
that they tick a count whenever they compare two elements to be
sorted. These two modified sort programs are interpreted as functions
, so that the first
component of and that of report the
number of comparisons performed during sorting .
We first define an -divergence
on , for each , by
This divergence computes the difference of costs between two
computations , ignoring their return values. The
family forms a
-relative -divergence on . The -unit reflexivity of
means that the difference of costs between pure
computations is zero:
The -composability of says that we can limit the
cost difference of two runs of programs and by
the sum of cost difference of the preceding computations
and that of two programs . The latter is measured by
taking the sup of cost difference of and , where
range over the basic endorelation .
We remark that is not an -relative
-divergence on because the -composability fails: when
, and (for ) we
have and
, but we have
.
Alternatively, we may consider the following -divergence
on for each :
This divergence is sensitive on return values of computations.
When return values of two computations agree, measures
the cost difference as done in , but when they do not
agree, the cost difference is judged as . This divergence is
an -relative -divergence on .
5.2 Cost Difference for Nondeterministic Computations
Deterministic and nondeterministic
computations with cost counting can be respectively modeled by the
monads and on
.
We define the divergences for cost difference as in Table
1.
These divergences extract the upper bound of cost difference
between two computations.
The divergences and measure the usual distance
of costs for deterministic and nondeterministic computations respectively.
The divergence measures the subtraction of costs of two
nondeterministic computations.
For results of two nondeterministic computations ,
the divergence is an upper bound of
for all possible choices of and , where
a lower bound of is also given by .
The same idea to measure the difference of costs between two programs
by subtraction also appears in ([18, 47]).
If either or is empty, we fail to get an information of costs.
We then have .
On the other hand, if both and are not empty,
their cost intervals are defined by
We then have and .
Table 1: (-graded) -relative -divergences for cost counting monads
Definition of
5.3 Divergences for Differential Privacy
Differential privacy (DP for short) is a quantitative definition of
privacy of randomized queries in databases. DP is based on the idea of
noise-adding anonymization against background-knowledge attacks.
In the study of DP, a query is modeled by a measurable function
, where and are measurable spaces of
inputs and outputs respectively, and is the measurable space
of all probability measures over ; here itself refers to
the Giry monad ([26]; see also Section
13).
Let be a morphism in , representing
a randomized query. The query satisfies
-differential privacy (
are reals) if for any adjacent datasets
111 Strictly speaking,
differential privacy depends on the definition of adjacency of
datasets. The adjacency relation is usually
defined as with a metric
over ., the following holds:
To express this definition in terms of divergence on monad, we
introduce a doubly-indexed family of -divergence
on by
Then the query satisfies
-DP if and only if
The pair indicates the difference between
output probability distributions and of the
query for given datasets and . Intuitively, the
parameter is an upper bound of the ratio
of probabilities which indicates
the leakage of privacy. If is large, attackers can
distinguish the datasets and from the outputs of the query
. The parameter is the probability of
failure of privacy protection.
The family forms an -relative -graded
-divergence on the Giry monad [52, Lemma
6]. This is proved by extending the
composability of the divergence for DP on discrete probability
distributions shown as [12, Lemmas 3 and
6] and [13, Proposition
5], based on the composition theorem
of DP [22, Section 3.5].
The conditions in Definition 6 on corresponds to
the following basic properties of DP:
(monotonicity)
The monotonicity of corresponds to
weakening the differential privacy of queries: if satisfies
-DP and and
holds, then satisfies
-DP.
(-unit reflexivity)
The -unit reflexivity of
implies
for any
measurable function and . This,
together with the composability below, ensures the robustness
of DP of a query with respect to deterministic
postprocessing:
(2)
In fact, the divergence is reflexive: we have
for every . Therefore
and in (2) can be replaced by
and ; the replaced condition states the robustness of DP of a query with respect to probabilistic
postprocessing.
(-composability)
The -composability of
corresponds to the known property of DP called the sequential
composition theorem ([22]). If
and are
-DP and -DP
respectively, then the sequential composition
of the queries and
is -DP.
A Non-Example: Pointwise Differential Privacy.
We stated above that a parameter of DP
intuitively gives an upper bound of the probability ratio
and the probability of failure of
privacy protection. However, strictly speaking, there is a gap
between the definition of -DP and this intuition
of and . Pointwise differential privacy
([46, Definition 3.2] and [27, Proposition
1.2.3]) is a finer definition of DP that is
faithful to the intuition.
Definition 8.
A measurable function (regarded as a query)
is pointwise -differentially private if
whenever and are adjacent, for some
with , we have
which is equivalent to 222 Remark that and
are Radon-Nikodym derivatives of and
with respect to a measure such that .
[] Obvious. [] By Radon-Nikodym theorem
we can take the Radon-Nikodym derivatives and
with respect to .
The inequality does not depend on the choice of .
To express this definition in terms of divergence on monad, we
introduce a doubly-indexed family of -divergences
called pointwise
indistinguishability:
Then is pointwise -differentially private
if and only if
The family is obviously reflexive:
holds for any
and . Hence it is -unit reflexive too.
However, it is not -composable. We let and
be discrete spaces, and let
. We define two probability distributions
by
We then have with
since , , and .
Next, we define
by
We then calculate
Then, we obtain with since .
Hence .
Thus is not -composable, because by the reflexivity of , we have .
Various Relaxations of Differential Privacy
Since the seminal work on DP by [21],
various relaxations of differential privacy have been proposed:
Rényi DP ([40]),
zero-concentrated DP ([17]) and
truncated zero-concentrated DP ([16]).
They give tighter bounds of differential privacy.
These relaxations of differential privacy can be expressed by suitable
divergences on the Giry monad and sub-Giry monad ; see
Table 2 for their definitions. There,
are non-grading parameters for and
. Each row of the table represents that is an
-relative - (resp. -) divergences on
(resp. ), and the definition of
follows.
5.4 Statistical Divergences and Composablity of -Divergences
Apart from differential privacy, various distances between
(sub-)probability distributions are introduced in probability theory.
They are called statistical divergences. Examples include: total variation distance , Hellinger distance
, Kullback-Leibler divergence , and -divergence ; they are defined in Table
3. These statistical divergences are -relative
divergences on the Giry monad (and for ); see
the same table for their divergence domains. Question marks in the
column of means that we do not know with which monoid structure
the -composability holds. We remark that these divergences are
also reflexive, that is, . -composability of
these divergences in discrete form are proved
in ([13, 45]). Later,
[52] extends their results to
the composability of divergences in continuous form.
Table 3: Statistical divergences that are -relative -
(resp. -) divergences on (resp. )
Name
Definition of
Total variation distance
Kullback-Leibler divergence
?
Hellinger distance
?
-divergence
?
Each of four divergences in Table 3 can be expressed
as an -divergence ([19, 20, 43]):
Here, is a parameter called weight function, and has to be a
convex function , continuous at and
satisfying . Weight functions for four
divergences are in Table
4. In fact, is also
an -divergence with weight function
; see [13, Proposition
2]. We also remark that Rényi divergence
of order is the logarithm of the
-divergence with weight function .
-divergences have several nice properties such as reflexivity,
postprocessing inequality, joint-convexity, duality and
continuity ([20, 35]). However, the
-composability of -divergences is not guaranteed in
general. Here we provide a sufficient condition for the -composability
of over a specific form of divergence domain.
Proposition 1.
Let be a nonnegative real number,
be the
divergence domain, and be a weight function such that
and . If there exists
such that, for all , the following hold (suppose ):
then is an -relative -divergence on the
Giry monad . When and ,
can be replaced with the sub-Giry monad .
The proof of this proposition generalizes
and integrates the proofs given in
[45, Section 5.A.2]. This proposition is
applicable to prove the composability of divergences in Table
3 by choosing suitable parameters; see Table
4.
5.5 Divergences on the Probability Monad on QBS via Monad
Opfunctors.
We have seen various divergences on the Giry monad . It would
be nice if they are transferred to the probability monad on
(Section 13). For this, we first develop a generic
method for transferring divergences on monads.
Let and be two CC-SMs. A monad opfunctor
[53, Section 4] is a functor
together with a natural transformation
making the following diagrams
commute:
Proposition 2.
Let be two CC-SMs,
be a monad opfunctor,
and assume that holds, and basic
endorelations and
satisfy for all
(we here use ).
Then for any
, the following doubly-indexed family of
-divergences
on is
an -relative -graded -divergence on :
The left adjoint of the adjunction
and the natural
transformation defined by
forms a monad
opfunctor from the probability monad on to the Giry
monad on [28, Prop. 22 (3)]. Through this monad
opfunctor , we can convert -divergences on
to those on . This conversion can be applied to all the
statistical divergences in Table 2 and 3.
In addition, for any standard Borel space, we can
view such converted divergences as the same thing as
the original . When is standard Borel, we
have an equality , and is
an isomorphism. Therefore we obtain an isomorphism
[28, Prop. 22 (4)]. A concrete description
of its inverse is
, where and
are a section-retraction pair
(i.e. )
that exists for any standard Borel .
Theorem 1.
For any and standard Borel
,
5.6 Divergences on State Monads
The state monad with a
state space is used to represent programs that update the state. We
construct divergences on using divergences on the state
space in several ways.
5.6.1 Lipschitz Constant on States
We first consider the state monad on . We also consider a
function satisfying .
The following -divergence
on measures how much the function pair
extends the distance between two
states before updated. In short,
measures the Lipschitz constant on state transformers.
Proposition 3.
The family of
-divergences on defined by
is a -relative -divergence on .
For state transformers , their state-updating part
is given as functions
.
When , is exactly the
Lipschitz constant of .
5.6.2 Distance between State Transformers with the Same Inputs
Suppose that the function also satisfies the triangle
inequality. The following -divergence
on estimates the distance
between updated states after the state transformers and are
applied to the same input.
Proposition 4.
Suppose that the function also satisfy the
triangle-inequality. The family
of -divergences on
defined by:
is an -relative -divergence on .
5.6.3 Sup-Metric on the State Monad on the Category of Generalized Ultrametric Spaces
The category of generalized (-valued) ultrametric spaces333Recall that an ultrametric space is a set
together with a function such that
and . and
nonexpansive functions is Cartesian closed [49, Section 2.2].
We consider the state monad on
for a fixed space .
From the definition of exponential objects in ,
consists of the set of nonexpansive state transformers with the sup metric between them.
In fact, the metric part of all forms a divergence on .
Proposition 5.
The family consisting of
the metric part of the spaces , given by
forms an -relative -divergence on .
In the category , instead of , there is another basic
endorelation :
By modifying the divergence , we obtain a
-relative -divergence as below:
Proposition 6.
The following
forms a -relative -divergence on .
5.7 Combining Divergence with Cost
In Section 5.2, we have introduced a divergence on the
monad modeling nondeterministic choice and cost
counting. In this section we construct a divergence on the combination
of a general computational effect and cost counting.
Let be a CC-SM and be a
divergence and be
a monoid object in (for cost counting). Then the composite
of the monad and the monoid action monad
again carries a monad structure. We now define a
family
of -divergences by
Proposition 7.
The family is an -relative -divergence on .
For example, the divergence on the composite monad
on describes Kullback-Leibler divergence
between distributions of costs in the probabilistic computations with
real-valued costs. Intuitively, the side condition
in the definition of
means that the difference between and lies only in the
costs.
5.8 Preorders on Monads
To explore the generality of our framework, we look at the case where
the divergence domain is ; here is
the numerical multiplication.
We identify an indexed family
of
-divergences and a family of adjacency relations
.
We point out a connection between -relative -divergences and
preorders on monads studied in
([32, 50]). A preorder on a
monad on assigns a preorder on for
each , and this assignment satisfies:
Substitutivity
For any function and ,
implies .
Congruence
For any function , if
holds for any , then
holds for any .
Proposition 8.
A preorder on a monad on bijectively corresponds to an
-relative -divergence on such that each
is a preorder.
For a preorder on a monad on , by
we mean the divergence corresponding to
by Proposition 8
(in fact, we have for all set ).
6 Properties of Divergences on Monads
6.1 Divergences on Monads as Structures in
In this section we examine divergences on monads from the view point
of monoidal structure of . For any CC , the category
has a symmetric monoidal structure, whose unit and tensor
product are given by
The coherence isomorphisms of this symmetric monoidal structure are
inherited from the Cartesian monoidal structure on . Moreover,
becomes a symmetric strict monoidal functor of
type .
6.1.1 Enrichments of Kleisli Categories Induced by Divergences
Let be a CC-SM. We first show that a (non-graded) divergence
on a monad attaches a -enrichment on the
Kleisli category of . What we mean by attaching an
enrichment to an ordinary category is formulated as follows.
Definition 9.
A -enrichment of a category is a family
of
-divergences on the homset such that the following
inequalities hold:
(3)
(4)
Such an enrichment determines a -enriched category
, whose object collection and homobjects are given by
The identity and composition morphisms of :
are inherited from ; they are guaranteed to be nonexpansive by
the conditions (3) and (4). The
change of base of enrichment of by the symmetric strict
monoidal functor coincides with . 444The
underlying category of [34, Section 1.3] does not coincide with .
We relate conditions (3) and (4) with the unit
reflexivity and composability conditions in the definition of divergence on monad
(Definition 6).
Theorem 2.
Let be a CC-SM, be a basic endorelation
such that 555 happens if and only if for
any . Therefore nontrivial basic endorelations always
satisfy . , be a divergence
domain and be a family
of -divergences on . Define a family
of -divergences on the homset
of the Kleisli category by
(5)
Then is a -enrichment of if and only if
is an -relative -divergence on .
6.1.2 Internalizing Divergences as Structures in
One might wonder how the -divergence (5) given to
each homset of arises. Under a strengthened assumption, we
derive it from the closed structure with respect to the monoidal
product of . This allows us to internalize
divergences on monads as structures in .
Let be a CCC-SM and be a divergence domain whose
monoid operation preserves the largest element , that is,
. A consequence of this strengthened assumption is the
following:
Lemma 1.
Let be an object such that takes
only values in . Then the functor
has a right adjoint, which we
denote by . Moreover, is a map
of adjunction of type:
The proof of this lemma exhibits that the -divergence
associated to the internal hom object measures
the divergence between by
which almost coincides with the sup part of (5); here
is the bijection given in Section
2. We use this coincidence to characterize the
unit-reflexivity and composability conditions in the definition of
divergence on monad (Definition 6). First, we define the internal Kleisli extension morphism
by
(6)
Next, for a basic endorelation , we define the functor
by
Theorem 3.
Let be a CCC-SM, be a grading
monoid, be a divergence domain whose monoid operation
satisfies , and be a basic
endorelation. Let be a
doubly-indexed family of -divergences on , regarded as
-objects. Then
1.
satisfies the -unit reflexivity condition if and
only if for any , the following nonexpansivity holds on
the global element corresponding to
the monad unit:
2.
satisfies the -composablity condition
if and only if for any and , the following
nonexpansivity holds on the internal Kleisli extension morphism
:
[5] formalized families of composable
divergences as parameterized assignment in weakly closed
monoidal refinement. Roughly speaking, they adopted the equivalence
(2) of Theorem 3 as the definition of
parameterized assignment. However, divergence on monads and
parameterized assignments are built on slightly different categorical
foundations, and their generalities are incomparable. Notable
differences from parameterized assignment are: 1) divergences on
monads are defined in relative to basic endorelations, and 2) the
underlying category of divergences on monads is any CCs, while
parameterized assignments requires closed structure on their
underlying category. In this sense divergences on monads are a mild
generalization of parameterized assignments.
6.1.3 Divergences on Monads and Divergence Liftings of Monads
We next relate graded divergences on monads and monad-like structures
on the category of -divergences on
-objects. What we mean by monad-like structures is graded
divergence liftings of monads on , which we introduce
below. It is a graded monad on
([31]) whose unit and multiplication are
inherited from a monad on .
Definition 10.
Let be a CC-SM, be a grading monoid and be a
divergence domain. An -graded -divergence lifting of is
an mapping
such that (below stands for the forgetful functor )
1.
2.
implies
3.
4.
.
Let be a basic endorelation. We say
that an -graded -divergence lifting of is
-strong if the strength of satisfies
We write for
the collection of -strong -graded -divergence liftings of
. We introduce a partial order on by
We will later see a similar concept of strong graded relational
lifting of monad in Definition
15. Divergence lifting and relational
lifting are actually instances of a common general definition of
strong graded lifting of monad ([31]),
but in this paper we omit this general definition.
The following theorem relates that every divergence can be expressed as the
composite of a graded divergence lifting and the divergence corresponding to a
basic endorelation.
Theorem 4.
Let be a CC-SM, be a grading monoid,
be a divergence domain and be a basic
endorelation. For any , define a mapping
by, for ,
where
Then is an -graded -divergence lifting
such that .
When , Theorem 4 implies that the assignment
extends to the -relative monad
in the sense of [3].
When we strengthen the assumptions on and as done in Section
6.1.2, we obtain a sharper correspondence between divergences on monads and
strong graded divergence liftings of monads.
Theorem 5.
Let be a CCC-SM, be a grading monoid, be a
divergence domain such that satisfies and
be a basic endorelation. Then there
exists an adjunction between partial orders:
where
6.2 Generation of Divergences
It has been shown that DP can be interpreted as hypothesis testing
([54, 30]). Given a query and
adjacent datasets , we
consider the following hypothesis testing with the null and
alternative hypotheses:
For any rejection region , the Type I and Type II
errors are then represented by and
, respectively. [30] showed that
is -DP if and only if for any
adjacent datasets , the
pair of Type I error and Type II error lands in the privacy
region :
They also showed that this is equivalent to
the testing using probabilistic decision rules [30, Corollary 2.3]:
Later [7] generalized this
probabilistic variant of hypothesis testing to general statistical
divergences, and arrived at a notion of -generatedness of
statistical divergences (). Following their
generalization, we introduce the concept of -generatedness of divergences on monads.
Definition 11.
Let . A divergence is -generated if for any , and
,
An equivalent definition of being
-generated is: the following holds for any
:
Here is the binary relation
; see also
(9). For an -generated divergence
, its component at is an essential
part that determines all components of . When a
divergence is shown to be -generated, the calculation of the
codensity lifting given in Section
7 will be simplified (Section
7.1).
We illustrate -generatedness of various divergences. First,
we show the -generatedness of divergences on the Giry monad
in Tables 2 and 3.
•
Divergence is generated over the two-point discrete
space [7, Section
B.7]. The binary relation
coincides with the privacy
region .
•
Divergence is also generated over [7, Section
C.1].
•
Divergences , , and
are generated over the countably infinite discrete space . In
contrast, they are not -generated for every finite discrete space
[7, Sections B.5 and B.9].
On the sub-Giry monad , the divergence is -generated,
and the total variation distance is -generated.
Proposition 9.
The divergence is -generated.
Proposition 10.
The divergence is not -generated but -generated.
-Generatedness of Preorders on Monads
We relate -generatedness of divergences and preorders on
monads studied in ([32]). Let be a monad on
and be a set. [32] introduced the concept of
congruent and substitutive preorders on as
those satisfying:
Substitutivity
For any function and
, implies
.
Congruence
For any function , if
holds for any , then
holds for any .
For instance, any component of a preorder on at forms a
congruent and substitutive preorder on . We write
for the set of all congruent and
substitutive preorders on , and for the
collection of all preorders on . [32] gave a
construction
of preorders on from congruent and substitutive preorders on
:
The constructed preorders on are -generated in the
following sense:
Proposition 11.
For any , the
-divergence corresponding to the preorder on
is -generated (see Proposition
8 for the correspondence).
Applying this proposition, we can determine -generatedness of
preorders on monads:
•
If the monad has a rank , the construction
is bijective [32, Theorem 7]. Hence for
such a monad, each preorder on corresponds to an
-generated -divergence.
•
For the subprobability distribution monad on ,
[50] identified all preorders on :
there are 41 preorders on . Among them, 25 preorders are
1-generated, while 16 preorders are 2-generated [50, Proposition
6.3].
6.3 An Adjunction between Quantitative Equational Theories and Divergences
[39] introduced a concept of quantitative equational theory as an algebraic presentation of
monads on the category of (pseudo-)metric spaces. A quantitative
equational theory is an equational theory with indexed equations
having the axioms of pseudometric spaces, plus
suitable axioms reflecting properties of quantitative algebras. A
quantitative equational theory determines a pseudometric on the set of
-terms.
Consider a set of function symbols of finite arity. If
is the arity of a function , we write
. Let be a set of variables, and let
be the -term algebra over . For
and , we write
for the term obtained by applying to
. The construction forms a (strong) monad
on whose unit sends variables to terms, that is,
, and Kleisli extension
of function
is defined inductively by
A substitution of -terms over is a function . For
, we call the substitution of
to . We define the set of
indexed equations of terms by
Here the index runs over non-negative rational numbers. A conditional quantitative equation is a judgment of the following
form
the left hand side of turnstile () is called hypothesis and the right
hand side conclusion. We denote by the set of
conditional quantitative equations.
For any countable subset of and any substitution
,
we define
.
Definition 12(Quantitative Equational Theory [39, Definition 2.1]).
A quantitative equational theory (QET for short) of type
over is a set closed under the
rules summarized as Figure 1.
(Ref)
(Sym)
(Tri)
(Max)
(Arch)
(Nonexp)
(Subst)
(Cut)
(Assumpt)
Figure 1: Quantitative Equational Theory Rules
We write for the set of QETs of type over
. We regard it as a poset by the set
inclusion order. Given a set of conditional quantitative
equations of type over , by we mean the
least QET containing .
We state an adjunction between quantitative equational theories and
divergences on free-algebra monads on . More specifically, we
construct the following adjunction and isomorphism between posets:
(7)
By combining these, a QET of type over determines an
-generated -relative -divergence on
and vice versa. The poset in the middle is that of congruent and substitutive pseudometrics, which are a quantitative
analogue of congruent and substitutive preorders.
Definition 13.
Let be a monad on and .
A congruent and substitutive pseudometric (CS-EPMet for short) on
is an extended pseudometric666A function
is called an extended pseudometric on if
(reflexivity),
(symmetry) and
(triangle-inequality) hold for all .
on
satisfying
Substitutivity
For all function and
, .
Congruence
For all set , function and ,
.
We denote by the set of CS-EPMets on .
We then make it into a poset
by the following pointwise opposite order:
Definition 14.
Let be a monad on and . We denote by
the collection of -generated
-relative -divergences on such that each
component is an extended pseudometric. We restrict the
partial order on to
.
We next introduce various monotone functions appearing in
(7).
Proposition 12.
The functions defined above
are all well-defined monotone functions having types
given in (7).
That is an extended pseudometric is shown in the beginning of
[39, Section 5]. Here we additionally show
that it enjoys congruence and substitutivity of Definition
13. The function is taken from the right hand
side of the definition of -generatedness (Definition
11). The function simply extracts the -th
component of a given divergence.
Theorem 6.
For any set of function symbols with finite arity and set
, the following holds for the monotone functions in (7):
1.
is the inverse of .
2.
We have an adjunction satisfying
:
(8)
In the proof of this theorem, we used the definition of models of QET
([6]). Intuitively, the right
adjoint extracts the pseudometric on from a given
QET. The left adjoint constructs the least QET containing all
information of a given pseudometric on . The adjunction
(8) also implies that we can construct monads on the
category of extended metric spaces from CS-EPMets by Mardare et al.’s
metric term monad construction
([39]). Overall adjunction (7)
says that -generated divergences can be axiomatized with QETs whose
variable set is .
The range of is a subset of of unconditional QETs
defined below (See also [38, Section 3]):
Unconditional QETs of type over are equivalent to -generated divergence on :
restricting QETs to unconditional QETs, the adjunction (8) becomes a pair of isomorphisms.
Theorem 7.
.
7 Graded Strong Relational Liftings for Divergences
We have introduced the concept of divergence on monad for measuring
quantitative difference between two computational effects. To
integrate this concept with relational program logic, we employ a
semantic structure called graded strong relational lifting of
monad. It is introduced for the semantics of approximate probabilistic
relational Hoare logic for the verification of differential privacy
([12]), then later used in various program
logics
([13, 8, 9, 51, 52]).
Independently, it is also introduced as a semantic structure for effect
system ([31]). Liftings introduced in the
study of differential privacy are designed to satisfy a special
property called fundamental property [12, Theorem
1]: when we supply the equivalence
relation to the lifting, it returns the adjacency relation of the
divergence. This special property is the key to express the
differential privacy of probabilistic programs in relational program
logics.
In this paper, we present a general construction of graded
strong relational liftings from divergences on monads. First, we
recall its definition
([31, 24]).
Definition 15.
Let be a CC-SM and be a
grading monoid. An -graded strong relational
lifting of is a mapping
satisfying the following conditions:
1.
,
and implies .
2.
.
3.
implies
.
4.
.
Our interest is in the graded strong relational lifting that carries
the information of a given divergence . We
identify such liftings by the following fundamental
property. First define the adjacency relation of
by
(9)
Note that is monotone on and .
Definition 16.
We say that an -graded strong relational lifting
of satisfies the fundamental property with
respect to if the following holds:
Theorem 8.
Let be a CC-SM, be a grading
monoid, be a divergence domain and
be
a doubly-indexed family of -divergences satisfying monotonicity
on (Definition 6). Define the following mapping
:
1.
The mapping is an
-graded strong relational lifting of .
2.
Let be a basic
endorelation. Then
is -unit-reflexive
(S)
is -composable
(C)
The construction of is a graded extension of the
codensity lifting
([51, 33]).
The remainder of this section is the proof of Theorem 8.
Proof.
(Proof of (1)) Proving conditions 1-3
of graded strong relational lifting (Definition
15) are routine generalization of
[33] and [31, Section
5]; thus omitted here (see Lemma
4 in appendix).
However, condition 4 of Definition
15 needs a special attention because in
general codensity lifting does not automatically lift strength. The
current setting works because of our particular choice of the
category of binary relations over . We prove condition 4 as follows.
Since for any holds,
we have the equivalence
From this, condition 3 (law of graded Kleisli extension), and the equation
(1) on the strength of a CC-SM,
we prove condition 4 from condition 2 (unit law):
for all and , we have
(Proof of (2)-(S))
We show the equivalence of being -unit-reflexive and the
implication
(10)
We suppose that the above implication holds. We fix .
Let . By instantiating the whole implication with
, the middle part of
(10) becomes
which is trivially true. Therefore we conclude
for any ,
that is, -unit reflexivity holds.
Conversely, we suppose that satisfies the unit-reflexivity.
We take of appropriate type and assume the middle part
of (10). By instantiating it with
, we conclude .
(Proof of (2)-(C)) We show the
equivalence of being -composable and the implication
as follows:
The first two equivalences are obtained by expanding the definitions
of , and , the last two
equivalences hold because is a divergence domain.
∎
Combining the fundamental property and the strength of , we
recover a strength law of divergences.
Proposition 13.
Let be a CC-SM, be a basic endorelation,
be a grading monoid and be a
divergence domain. Suppose also that
holds for all .
Then each divergence satisfies: for all
and ,
7.1 Simplifying Codensity Liftings by -Generatedness of Divergences
We here show that for an -generated divergence , the
calculation of the codensity lifting can be
simplified. For an object , we define by
The original calculation of is a large
intersection
where runs over all -objects, but if is
-generated, the parameter can be fixed at .
Proposition 14.
For any -generated divergence , we
have .
Proof.
We show the equivalence for each .
() Immediate from .
() By the -generatedness of , we have for all and ,
Therefore, for any , we have
This completes the proof.
∎
For example, the generatedness of shown in Section
6.2 implies that
and
. In fact,
the simplification is equal to the
-graded relational lifting for DP given
in [51, Section 2.2], which is defined by,
for each ,
For detail, see the proof of equalities (†) and (‡)
in the proof of [51, Theorem 2.2(iv)].
7.2 Two Lifting Approaches: Codensity and Coupling
We briefly compare two lifting approaches: graded codensity lifting
and coupling-based lifting employed in
([12, 13, 8, 9, 52]).
We compare the role of the unit-reflexivity and composability in the
codensity graded lifting and the coupling-based graded
lifting. Consider the CCC-SM , where is the
probability distribution monad. Given an -relative -graded
-divergence on , the coupling-based graded lifting
is defined by
(11)
where is the projection () from the binary
relation. The pair of probability distributions
collected in the right hand side of (11) is called a
coupling.
The fundamental property
immediately follows from the definition of , while
the composability and unit-reflexivity of are used to make
a strong -graded lifting
[13, Proposition 9]. On the other hand,
the codensity graded lifting is always an
-graded lifting; this does not rely on the
unit-reflexivity and composability of (Proposition
1). These properties are used to show that
satisfies the fundamental property (Proposition 2).
The coupling-based lifting (11) can be naturally
generalized to any -monad . However, at this moment we do not
know how to generalize the coupling technique to any CC-SM
. As the prior study by [52]
pointed out, there is already a difficulty in extending it to the
CC-SM .
We illustrate how the problem arises. Let . We would
like to pick two probability measures over as couplings, but
is merely a set. We therefore equip it with the subspace
-algebra of , and let be the derived
measurable space (hence ). We write
for measurable projections (). We
then define a candidate -graded lifting of by
We now verify that also lifts the Kleisli extension of
, that is,
Let be pair of measurable
functions. Then for each , we have
. Therefore there
exists such that
and
. Using the
axiom of choice, we turn this relationship into functions
. If they were measurable
functions of type , then from the
composability of , we would have
for such that
. This gives
.
However, in general, ensuring the measurability of is
not possible, especially because they are picked up by the axiom of
choice. A solution given in [52] is to
use the category of spans, that guarantees the existence
of good measurable functions .
8 Approximate Computational Relational Logic
We introduce a program logic called approximate computational
relational logic (acRL for short). It is a combination of Moggi’s
computational metalanguage and a relational refinement type system
([9]). The strong graded relational
lifting of a monad constructed from a divergence will be used to
relationally interpret monadic types, and gradings give upper
bounds of divergences between computational effects caused by two
programs. acRL is similar to the relational refinement type system
HOARe2 ([9]), which is designed for
verifying differential privacy of probabilistic programs. Compared to
HOARe2, acRL supports general monads and divergences, while it does
not support dependent products nor non-termination.
The relational logic acRL adopts the extensional approach
(cf. [44, Chapter 9.2]):
•
Relational assertions between contexts and are defined
as binary relations
between and , or equivalently -objects such that
. Logical connectives and
quantifications are defined as operations on such -objects.
This is in contrast to the standard design of logic where assertions
are defined by a BNF.
•
Let and be
well-typed terms, be a relational assertion between
, and be an assertion between . The main
concern of acRL is the statement
“” (equivalently
). In this section we denote
this statement by .
•
Inference rules of the logic consists of the facts about the
statement . We remark that in the
standard logic, proving these facts corresponds to the soundness of
inference rules.
8.1 Moggi’s Computational Metalanguage
Figure 2: Syntax of Types and Raw Terms of the Computational Metalanguage
8.1.1 Syntax of the Computational Metalanguage
For the higher-order programming language, we adopt Moggi’s computational metalanguage ([42]). It is an
extension of the simply typed lambda calculus with monadic types. For
a set , we define the set of types over by the first
BNF in Figure 2. We then define the set of
first-order types to be the subset of consisting only of
.
We next introduce computational signatures for specifying
constants in the computational metalanguage. A computational signature
is a tuple where is a set of base types,
and and are functions whose range is
. The domains of are sets of value operation symbols and effectful operation symbols, and
are denoted by , respectively. These functions assign input
and output types to these operations.
Fix a countably infinite set of variables. A context is a function
from a finite subset of to ; contexts are often denoted
by capital Greek letters . For contexts such that
, by we mean the join of and
.
The set of raw terms is defined by the second BNF in Figure
2. The type system of the computational metalanguage has
judgments of the form where is a context,
a raw term and a type. It adopts the standard rules for
products, coproducts, implications and monadic types; see
e.g. [42]. The typing rules for value operations
and effectful operations are given by
A simultaneous substitution from to is a function
from the set of variables to raw terms such that
the well-typedness holds for each
. The application of to a term
is denoted by , which has a typing
. For disjoint contexts (), we
define the projection substitutions
by
.
8.1.2 Categorical Semantics of the Computational Metalanguage
Figure 3: Data for the Categorical Semantics of Metalanguage
1.
is a CCC-SM and has finite coproducts.
2.
for each
3.
for each such that
4.
for each such that
The interpretation of the computational metalanguage over a
computational signature is given by the data
specified by Figure 3.
We first inductively extend the interpretation of base types to all
types using the bi-Cartesian closed structure and the monad. Next,
for each context , we fix a product diagram
; when
, we assume that with
. Lastly we interpret a typing derivation of
as a morphism in the
standard way, using the interpretations of operations given in Figure
3. We further extend this to the interpretation of each
simultaneous substitution as a morphisms
.
8.2 Approximate Relational Computational Logic
8.2.1 Relational Logic in External Form
A relational assertion between disjoint contexts and
is a binary relation between and . We denote
such a relational assertion by , and identify it as a
-object such that .
Similarly, a relational assertion between types and is
defined to be a relational assertion ;
here are reserved and fixed variables.
Relational assertions between contexts and carry a boolean algebra structure given by the
set-intersection, set-union and set-complement (see the boolean
algebra in Section 2.1). The
pseudo-complement is defined to be
. For , by
and we mean the
relational assertions defined by the following equivalence:
The boolean algebra structure and the above quantifier operations
allow us to interpret first-order logical formulas as relational
assertions; we omit its detail here. In addition to these standard
logical connectives, we will use graded strong relational lifting
to form relational assertions. That is, for any basic
endorelation , grading monoid , divergence domain
and divergence , we obtain a relational
assertion
from any
, and .
For substitutions and an
assertion , by
we mean the relational
assertion
.
For disjoint context pairs and and relational
assertions and , by the
juxtaposition we mean the
relational assertion
.
8.2.2 Inference Rules for acRL
For
well-typed computational metalanguage terms and
, and relational assertions and
, by the judgment
we mean the inclusion of
binary relations. This is equivalent to
. We show basic facts about judgments
.
Proposition 15.
1.
and and
implies .
2.
and and
implies
.
3.
and and
and implies
.
4.
implies
.
5.
and
implies
.
We next establish relational judgments on effectful operations. We
present a convenient way to establish such judgments using the
fundamental property of the graded relational lifting .
Proposition 16.
For any such that , relational
assertion and , putting
, we have
.
Proof.
Take an arbitrary pair .
We have by definition of .
Thanks to the fundamental property of (Theorem 8),
it is equivalent to .
∎
9 Case Study I: Higher-Order Probabilistic Programs
We represent a higher-order probabilistic programming language with
sampling commands from continuous distributions as a computational
metalanguage. For now we assume that the language supports sampling
from Gaussian distribution and Laplace distribution. This
computational metalanguage is specified by the computational
signature:
where is some chosen signature for value operations over
reals. We interpret this computational metalanguage by filling Figure
3 as follows:
for the interpretation of , we take the quasi-Borel space
associated with the standard Borel space ,
3.
the interpretation of value operations is given as expected (we
omit it here); for example when contains the real number
addition operator as type , its
interpretation is the QBS morphism
,
4.
for the interpretation of effectful operations, we put
Here, is the Gaussian
distribution with mean and variance .
is the Laplacian
distribution with mean and variance
777If (or ), (resp. ) is not defined, thus we replace it by the Dirac distribution at instead.. Every
probability (Borel-)measure on can be converted to the
probability measure
on the
quasi-Borel space (see Section 5.5).
9.1 A Relational Logic Verifying Differential Privacy
To formulate differential privacy and its relaxations in the quasi-Borel setting, we
convert statistical divergences on the Giry monad
in Table 2 to -relative divergences on
the probability monad on by the construction in Section 5.5.
Then, we construct
the graded relational lifting by Theorem
8. Using this, as an instantiation of acRL, we build a
relational logic reasoning about differential privacy and its
relaxations, supporting higher-order programs and continuous random
samplings. Basic proof rules can be given by Proposition
15.
For effectful operations, we import basic proof rules on noise-adding
mechanisms given in prior studies
([21, 22, 40, 16])
via Theorem 1 and Proposition
16. For example, consider the
-relative -graded -divergence
on . Proposition
16 with an effectful operation and
a relational assertion (below we
identify global elements in and real numbers)
together with Theorem 1 and the prior
result [21, Example 1] yields the
following judgment:
By letting be the relational assertion
, the above
judgment is equivalent to:
(12)
This rule corresponds to the rule [LapGen] of the program logic apRHL+
([11]) for differential privacy. For another example,
by the reflexivity of , is also
reflexive, hence we obtain the following judgments (below is
the relational assertion
):
(13)
(14)
The judgment (13) correspond to
[LapNull] of apRHL+. Similarly, the following judgments about the
DP, Rényi-DP, zero-concentrated DP of the Gaussian mechanism can be
derived as
(15)–(17).
(15)
(16)
(17)
In (15) we require
. The derivation is done via Proposition
16, Theorem
1 and prior studies
([51, 40, 17]).
10 Case Study II: Probabilistic Programs with Costs
We further extend the computational signature in the
previous section with an effectful operation such that
. The intention of is to
increase cost counter by during execution888To make
examples simpler, we allow negative costs.. To interpret this
extended metalanguage, we fill Figure 3 as follows:
1.
for the CCC-SM, we take where
is the monad for
modeling probabilistic choice and cost counting (see Section
5.7).
interpretation of value operations is also the same as Section 9,
4.
for the interpretation of effectful
operations, put
We derive a closed term
for ticking with a cost sampled from Gaussian distribution:
The term adds cost counter by a random value
sampled from the Gaussian distribution .
10.1 Relational Reasoning on Probabilistic Costs
We convert the total valuation distance
to the divergence
on by Propositions
2 and
7.
We also prove basic facts
on effectful operations. First, the following relational judgments on can
be easily given:
(18)
Remark that and holds.
Next, in the similar way as (13), by
the reflexivity of , we have the reflexivity of
, and we obtain, for each real number constant ,
(19)
We also directly verify the following judgment on using
Theorem 1 and Proposition
16:
(20)
10.1.1 An Example of Relational Reasoning
We give examples of verification of difference (of distributions) of costs between
two runs of a probabilistic program whose output and cost depend on the input.
We consider the following program:
It first samples a real number from the Laplacian distribution
centered at the input , call the (possibly effectful) closure
with and return . Since the return type of is , it
can only probabilistically tick the counter. We show that the
following two judgments in acRL:
(A)
(B)
In judgment (A), we pass the tick
operation itself to and . By the
fundamental property of , the difference of
costs between two runs of and is , because each of
these programs reports cost and deterministically. In
contrast, in judgment (B), we pass to
and the probabilistic tick function that
ticks a real number sampled from the Gaussian distribution with
variance . Therefore the cost reported by the runs of
programs and follow the Gaussian distributions
and , whose difference by is
bounded by .
We first show (A). By
(18) and 2 of Proposition 15, we
have,
This work is based on the frameworks for verifying the differential
privacy of probabilistic programs using relational logic, summarized
in Table 5. Composable divergences employed in these
frameworks include the one for differential privacy, plus its recent
relaxations, such as, Rényi DP, zero-concentrated DP, and
truncated-concentrated DP ([16, 17, 40]).
The key semantic structure in these frameworks is
graded relational liftings of the probability distribution monad.
Barthe et al. gave a graded relational
lifting of the distribution monad based on the existence of two witnessing probability
distributions (called coupling) ([12]). Since then, coupling-based liftings have been refined and used in several works
([8, 10, 13, 52]).
They can be systematically constructed from composable
divergences on the probability distribution monad
([13]). One advantage of coupling-based
liftings is that, to relate two probability distributions, it suffices
to exhibit a coupling; this is exploited in the mechanized
verification of differential privacy of programs
([1, 2]).
These coupling-based liftings, however, are developed upon discrete
probability distributions, and measure-theoretic probability
distributions, such as Gaussian or Cauchy distributions, were not
supported until the work ([52]).
The relational Hoare logic supporting sampling from continuous
probability measures is given in the study by
[51]. In his work, the graded relational
lifting for -DP is given in the style of codensity lifting ([33]), which
does not rely on the existence of coupling. Yet, it has been an open
question [52, Section VIII] how to extend
his graded relational lifting to support various relaxations of
differential privacy. This paper answers to this question as Theorem
8. Later, coupling-based liftings has also been extended
to support samplings from continuous probability measures
([52]). This extension is achieved by
redefining the concept of binary relations as spans of measurable
functions. Comparison of these approaches is in the next section.
The verification of differential privacy in functional programming
languages has also been pursued
([48, 23, 9, 5]).
[48] introduced a linear functional
programming language with a graded monadic type that supports
reasoning about -differential privacy. Later, Gaboardi et
al. strengthen Reed-Pierce type system with dependent types
([23]). A category-theoretic account of
Reed and Pierce type system is given in
[5], where general
-differential privacy is also supported. These
works basically regard types as metric spaces, allowing us to reason
about sensitivity of programs with respect to inputs. The
coupling-based lifting techniques are also employed in the relational
models of higher-order probabilistic programming language
([9]).
The study [5] gives a categorical
definition of composable divergences in a general framework called
weakly closed refinements of symmetric monoidal closed
categories [5, Definition 1]. A
comparison is given in Section 6.1.2.
[39] introduced a quantitative refinement of
algebraic theory called quantitative equational theory, and
studied variety theorem for quantitative algebras.
[6] discussed tensor products of
quantitative equational theories. QETs and divergences on monads share
the common interest of measuring quantitative differences between
computational effects. Divergences on monads are derived as a
generalization of the composability condition of statistical
divergences studied by [13]. To make a
precise connection between these two concepts, in Section
6.3, we have given an adjunction between QETs of type
over and -generated divergences on the free monad
. The adjunction cuts down to the isomorphism between
unconditional QETs of type over and -generated
divergences on .
The use of metric-like spaces in the semantics is seen in several
recent work. [25] studies quantitative
refinements of Abramsky’s applicative bisimilarity for Reed-Pierce
type system. He introduces a monadic operational semantics of the
language and formalized quantitative applicative bisimilarity using
monad liftings to the category of quantale-valued relations.
[15] also used metric-like spaces to
study bisimulations and up-to techniques in the category of
quantale-valued relations.
In this work our interest is relational program verification of
effectful programs, and it is carried out in the relational category
, rather than . The quantitative difference of
computational effects measured by a divergence is represented
by the binary relation graded by upper bounds
of distance.
12 Future Work
The framework for relational cost analysis given in ([47])(extension of ([18])) consists of the relational logic verifying the difference of costs between two programs
and the unary logic verifying the lower and upper bound of costs (i.e. cost intervals) in one program.
We expect that the relational logic can be reformulated by an instantiation of acRL with the divergence on (or its variant).
However to reformulate the unary logic, we want a unary version of divergence on
for cost intervals.
To establish the connection between the unary logic and relational logic, we want a conversion from the unary version of divergence (for cost intervals) to
(for cost difference).
There might be many other examples and applications of divergences on monads.
In this paper, we mainly discussed examples of divergences with basic endorelations and , but various other basic endorelations can be considered.
13 Measurable Spaces and Quasi-Borel Spaces
Measurable Spaces.
For the treatment of continuous probability distributions, we employ
the category of measurable spaces and measurable functions.
For a measurable space we write and for the
underlying set and -algebra of respectively. The category
is a (well-pointed) CC, and it has all small limits and small colimits
that are strictly preserved by the forgetful functor
. It is naturally
isomorphic to the global element functor .
Standard Borel Spaces.
A standard Borel space is a special measurable space
whose -algebra is
the coarsest one containing the topology of a Polish
space . In particular, the real line
forms a standard Borel space. In fact, a measurable space is
standard Borel if and only if there are
and in
forming a section-retraction pair, that is,
.
For example, , , , () are
standard Borel.
The Giry Monad.
We recall the Giry monad ([26]). For every
measurable space , is the set of all
probability measures over with the coarsest -algebra
induced by functions ()
defined by . The unit assigns to
each the Dirac distribution centered at .
For every , the Kleisli extension
is given by
for each . We also denote by
the subprobabilistic variant of (called sub-Giry
monad), where the underlying set of is relaxed
to the set of subprobaility measures over .
The Giry monad (resp. the sub-Giry monad ) carries a
(commutative) strength
over the CC . It computes the product of
measures (). Therefore
and are (well-pointed) CC-SMs.
Quasi-Borel Spaces.
The category is not suitable for the semantics of
higher-order programming languages since it is not Cartesian
closed ([4]). For the treatment of higher-order
probabilistic programs with continuous distributions, we employ the
Cartesian closed category of quasi-Borel spaces and morphisms
between them, together with the probability monad on
([28]). A quasi-Borel space is a pair
of a set and a subset of the function
space satisfying
1.
for and a measurable function ,
.
2.
for any , .
3.
for all and a family of functions , .
A morphism is a function
such that holds for all .
The category is a (well-pointed) CCC, and has all countable
products and coproducts that are strictly preserved by the forgetful functor .
It is naturally isomorphic to the global element functor .
Connection to Measurable Spaces: an Adjunction
We can convert measurable spaces and quasi-Borel spaces using an
adjunction . They
are given by
For any standard Borel space , we have
. The right adjoint is
full-faithful when restricted to the standard Borel
spaces [28, Proposition 15-(2)]. The right adjoint
preserves countable coproducts and function spaces (if exists) of
standard Borel spaces [28, Proposition 19].
Probability Measures and the Probability Monad.
A probability measure on a quasi-Borel space is a pair
. We introduce an equivalence
relation over probability measures on by
Using this, we introduce a probability monad on as
follows:
•
On objects, we define by
•
The unit is defined by
for an arbitrary .
•
The Kleisli extension of is defined
by where
there are and satisfying
by
definition of .
The monad is (commutative) strong with respect to the CCC
.
Acknowledgments
Tetsuya Sato carried out this research under the support by JST ERATO
HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and
JSPS KAKENHI Grant Number 20K19775, Japan. Shin-ya Katsumata carried
out this research under the support by JST ERATO HASUO Metamathematics
for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant
Number 18H03204, Japan. The authors are grateful to Ichiro Hasuo
providing the opportunity of collaborating in that project. The
authors are grateful to Satoshi Kura, Justin Hsu, Marco Gaboardi,
Borja Balle and Gilles Barthe for fruitful discussions.
References
[1]
Aws Albarghouthi and Justin Hsu.
Constraint-based synthesis of coupling proofs.
In Computer Aided Verification - 30th International Conference,
CAV 2018, Proceedings, Part I, volume 10981 of LNCS, pages
327–346. Springer, 2018.
[2]
Aws Albarghouthi and Justin Hsu.
Synthesizing coupling proofs of differential privacy.
PACMPL, 2(POPL):58:1–58:30, 2018.
[3]
Thorsten Altenkirch, James Chapman, and Tarmo Uustalu.
Monads need not be endofunctors.
Log. Methods Comput. Sci., 11(1), 2015.
[4]
Robert J. Aumann.
Borel structures for function spaces.
Illinois J. Math., 5(4):614–630, 12 1961.
[5]
Arthur Azevedo de Amorim, Marco Gaboardi, Justin Hsu, and Shin-ya
Katsumata.
Probabilistic relational reasoning via metrics.
In 34th Annual ACM/IEEE Symposium on Logic in Computer
Science, LICS 2019, pages 1–19. IEEE, 2019.
[6]
Giorgio Bacci, Radu Mardare, Prakash Panangaden, and Gordon Plotkin.
Tensor of Quantitative Equational Theories.
In Fabio Gadducci and Alexandra Silva, editors, 9th Conference
on Algebra and Coalgebra in Computer Science (CALCO 2021), volume 211 of
Leibniz International Proceedings in Informatics (LIPIcs), pages
7:1–7:17, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum
für Informatik.
[7]
Borja Balle, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Tetsuya Sato.
Hypothesis testing interpretations and renyi differential privacy.
In Silvia Chiappa and Roberto Calandra, editors, Proceedings of
the Twenty Third International Conference on Artificial Intelligence and
Statistics (AISTATS 2020), volume 108 of Proceedings of Machine
Learning Research, pages 2496–2506, Online, 26–28 Aug 2020. PMLR.
[8]
Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu,
César Kunz, and Pierre-Yves Strub.
Proving differential privacy in Hoare logic.
In IEEE 27th Computer Security Foundations Symposium, CSF
2014, pages 411–424. IEEE Computer Society, 2014.
[9]
Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu,
Aaron Roth, and Pierre-Yves Strub.
Higher-order approximate relational refinement types for mechanism
design and differential privacy.
In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India,
January 15-17, 2015, pages 55–68. ACM, 2015.
[10]
Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and
Pierre-Yves Strub.
Proving differential privacy via probabilistic couplings.
In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic
in Computer Science, LICS ’16, pages 749–758. ACM, 2016.
[11]
Gilles Barthe, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub.
Coupling proofs are probabilistic product programs.
In Proceedings of the 44th ACM SIGPLAN Symposium on Principles
of Programming Languages, POPL 2017, pages 161–174, New York, NY, USA,
2017. Association for Computing Machinery.
[12]
Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella
Béguelin.
Probabilistic relational reasoning for differential privacy.
In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL 2012, pages 97–110. ACM,
2012.
[13]
Gilles Barthe and Federico Olmedo.
Beyond differential privacy: Composition theorems and relational
logic for f-divergences between probabilistic programs.
In Automata, Languages, and Programming - 40th International
Colloquium, ICALP 2013, Proceedings, Part II, volume 7966 of LNCS,
pages 49–60. Springer, 2013.
[14]
Nick Benton.
Simple relational correctness proofs for static analyses and program
transformations.
SIGPLAN Not., 39(1):14–25, January 2004.
[15]
Filippo Bonchi, Barbara König, and Daniela Petrisan.
Up-To Techniques for Behavioural Metrics via Fibrations.
In 29th International Conference on Concurrency Theory (CONCUR
2018), volume 118 of Leibniz International Proceedings in Informatics
(LIPIcs), pages 17:1–17:17, Dagstuhl, Germany, 2018. Schloss
Dagstuhl–Leibniz-Zentrum fuer Informatik.
[16]
Mark Bun, Cynthia Dwork, Guy N. Rothblum, and Thomas Steinke.
Composable and versatile privacy via truncated CDP.
In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory
of Computing, STOC 2018, pages 74–86, New York, NY, USA, 2018. Association
for Computing Machinery.
[17]
Mark Bun and Thomas Steinke.
Concentrated differential privacy: Simplifications, extensions, and
lower bounds.
In Theory of Cryptography, pages 635–658, Berlin, Heidelberg,
2016. Springer Berlin Heidelberg.
[18]
Ezgi Çiçek, Gilles Barthe, Marco Gaboardi, Deepak Garg, and Jan
Hoffmann.
Relational cost analysis.
SIGPLAN Not., 52(1):316–329, January 2017.
[19]
Imre Csiszár.
Eine informationstheoretische Ungleichung und ihre Anwendung auf
den beweis der ergodizitat von markoffschen ketten.
Magyar. Tud. Akad. Mat. Kutato Int. Kozl., 8:85–108, 1963.
[20]
Imre Csiszár.
Information-type measures of difference of probability distributions
and indirect observations.
Studia Sci. Math. Hungar., 2:299–318, 1967.
[21]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith.
Calibrating noise to sensitivity in private data analysis.
In Theory of Cryptography, volume 3876 of LNCS, pages
265–284. Springer Berlin Heidelberg, 2006.
[22]
Cynthia Dwork and Aaron Roth.
The algorithmic foundations of differential privacy.
Foundations and Trends® in Theoretical Computer
Science, 9(3-4):211–407, 2013.
[23]
Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C.
Pierce.
Linear dependent types for differential privacy.
In The 40th Annual ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL ’13, pages 357–370. ACM,
2013.
[24]
Marco Gaboardi, Shin-ya Katsumata, Dominic Orchard, and Tetsuya Sato.
Graded hoare logic and its categorical semantics.
In Nobuko Yoshida, editor, Programming Languages and Systems -
30th European Symposium on Programming, ESOP 2021, Held as Part of the
European Joint Conferences on Theory and Practice of Software, ETAPS 2021,
Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, volume
12648 of Lecture Notes in Computer Science, pages 234–263. Springer,
2021.
[25]
Francesco Gavazzo.
Quantitative behavioural reasoning for higher-order effectful
programs: Applicative distances.
In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in
Computer Science, LICS ’18, pages 452–461, New York, NY, USA, 2018.
Association for Computing Machinery.
[26]
Michèle Giry.
A categorical approach to probability theory.
In B. Banaschewski, editor, Categorical Aspects of Topology and
Analysis, volume 915 of LNM, pages 68–85. Springer, 1982.
[27]
Rob Hall.
New Statistical Applications for Differential Privacy.
PhD thesis, Machine Learning Department School of Computer Science
Carnegie Mellon University, 2012.
[28]
Chris Heunen, Ohad Kammar, Sam Staton, and Hongseok Yang.
A convenient category for higher-order probability theory.
In 32nd Annual ACM/IEEE Symposium on Logic in Computer
Science, LICS 2017, pages 1–12, 2017.
[29]
B. Jacobs.
Categorical Logic and Type Theory.
Elsevier, 1999.
[30]
Peter Kairouz, Sewoong Oh, and Pramod Viswanath.
The composition theorem for differential privacy.
In Proceedings of the 32nd International Conference on Machine
Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 1376–1385,
2015.
[31]
Shin-ya Katsumata.
Parametric effect monads and semantics of effect systems.
In The 41st Annual ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL ’14, pages 633–646. ACM,
2014.
[32]
Shin-ya Katsumata and Tetsuya Sato.
Preorders on monads and coalgebraic simulations.
In Frank Pfenning, editor, Foundations of Software Science and
Computation Structures, pages 145–160, Berlin, Heidelberg, 2013. Springer
Berlin Heidelberg.
[33]
Shin-ya Katsumata, Tetsuya Sato, and Tarmo Uustalu.
Codensity lifting of monads and its dual.
Logical Methods in Computer Science, 14(4), 2018.
[34]
Max Kelly.
Basic Concepts of Enriched Category Theory, volume 64.
Cambridge University Press, 1982.
Republished in: Reprints in Theory and Applications of Categories,
No. 10 (2005) pp.1-136.
[35]
Friedrich Liese and Igor Vajda.
On divergences and informations in statistics and information theory.
IEEE Transactions on Information Theory, 52(10):4394–4412, Oct
2006.
[36]
John M. Lucassen and David K. Gifford.
Polymorphic effect systems.
In Conference Record of the Fifteenth Annual ACM Symposium on
Principles of Programming Languages, pages 47–57. ACM Press, 1988.
[37]
Saunders Mac Lane.
Categories for the Working Mathematician (Second Edition),
volume 5 of Graduate Texts in Mathematics.
Springer, 1998.
[38]
R. Mardare, P. Panangaden, and G. Plotkin.
On the axiomatizability of quantitative algebras.
In 2017 32nd Annual ACM/IEEE Symposium on Logic in Computer
Science (LICS), pages 1–12, Los Alamitos, CA, USA, jun 2017. IEEE Computer
Society.
[39]
Radu Mardare, Prakash Panangaden, and Gordon Plotkin.
Quantitative algebraic reasoning.
In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in
Computer Science, LICS ’16, page 700–709, New York, NY, USA, 2016.
Association for Computing Machinery.
[40]
Ilya Mironov.
Rényi differential privacy.
In 2017 IEEE 30th Computer Security Foundations Symposium
(CSF), pages 263–275, Aug 2017.
[41]
John C. Mitchell and Andre Scedrov.
Notes on sconing and relators.
In Computer Science Logic, 6th Workshop, CSL ’92, volume 702
of LNCS, pages 352–378. Springer, 1992.
[42]
Eugenio Moggi.
Notions of computation and monads.
Information and Computation, 93(1):55–92, 1991.
[43]
Tetsuzo Morimoto.
Markov processes and the H-theorem.
Journal of the Physical Society of Japan, 18(3):328–331, 1963.
[44]
Hanne Riis Nielson and Flemming Nielson.
Semantics with Applications: An Appetizer.
Springer-Verlag, Berlin, Heidelberg, 2007.
[45]
Federico Olmedo.
Approximate Relational Reasoning for Probabilistic Programs.
PhD thesis, Technical University of Madrid, 2014.
[46]
Shiva Prasad and Kasiviswanathan Adam Smith.
A note on differential privacy: Defining resistance to arbitrary side
information.
Journal of Privacy and Confidentiality, 6(1), 2014.
[47]
Ivan Radiček, Gilles Barthe, Marco Gaboardi, Deepak Garg, and Florian
Zuleger.
Monadic refinements for relational cost analysis.
Proc. ACM Program. Lang., 2(POPL):36:1–36:32, December 2017.
[48]
Jason Reed and Benjamin C. Pierce.
Distance makes the types grow stronger: a calculus for differential
privacy.
In Proceeding of the 15th ACM SIGPLAN international
conference on Functional programming, ICFP 2010, pages 157–168. ACM,
2010.
[49]
J. J. M. M. Rutten.
Elements of generalized ultrametric domain theory.
Theor. Comput. Sci., 170(1-2):349–381, December 1996.
[50]
Tetsuya Sato.
Identifying all preorders on the subdistribution monad.
In Bart Jacobs, Alexandra Silva, and Sam Staton, editors, Proceedings of the 30th Conference on the Mathematical Foundations of
Programming Semantics, MFPS 2014, Ithaca, NY, USA, June 12-15, 2014,
volume 308 of Electronic Notes in Theoretical Computer Science, pages
309–327. Elsevier, 2014.
[51]
Tetsuya Sato.
Approximate relational hoare logic for continuous random samplings.
In The Thirty-second Conference on the Mathematical Foundations
of Programming Semantics, MFPS 2016, volume 325 of Electronic Notes
in Theoretical Computer Science, pages 277–298. Elsevier, 2016.
[52]
Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya
Katsumata.
Approximate span liftings: Compositional semantics for relaxations of
differential privacy.
In 34th Annual ACM/IEEE Symposium on Logic in Computer
Science, LICS 2019, pages 1–14. IEEE, 2019.
[53]
Ross Street.
The formal theory of monads.
Journal of Pure and Applied Algebra, 2(2):149 – 168, 1972.
[54]
Larry Wasserman and Shuheng Zhou.
A statistical framework for differential privacy.
Journal of the American Statistical Association,
105(489):375–389, 2010.
relax
Proof.
relax∎
relax__relax
Proposition 17.
The family of -divergences defined by
is a -relative -divergence on the monad .
Proof.
The monotonicity of is obvious.
We show the -unit-reflexivity of .
For all (that is, ), we have
We show the -composability of .
Let and .
We write and for each .
•
If and for all , we have
•
If or for some , we have
This completes the proof.
∎
Proposition 18.
The family of -divergences defined by
is a -relative -divergence on the monad .
Proof.
The monotonicity of is obvious.
We show the -unit-reflexivity of .
For all (that is, ), we have
We show the -composability of .
For all and , we have
This completes the proof.
∎
Proposition 19.
The family of -divergences defined by
is a -relative -divergence on the monad .
Proof.
The monotonicity of is obvious.
We show the -unit-reflexivity of .
For all (that is, ), we have
We show the -composability of .
For all and , we have
This completes the proof.
∎
__relaxaxp@oldproof
(Proof of Proposition 1)
We have -unit reflexivity because the reflexivity is obtained from .
We show -composability.
To show this, we prove a bit stronger statement.
Consider three positive weight functions with
. Assume that there are some
satisfying the following conditions:
{itembox}[l](A’)
for all ,
and
Let , and let
. We want to show the composability in
the sense of [45, Definition 5.2]:
(23)
We first fix a measurable partition of , that is
a family of measurable subsets
satisfying and
.
For each , we fix
two monotone increasing sequences and
of simple functions that converge uniformly to
measurable functions
and
respectively.
The above composability (23)
is then equivalent to
(24)
We fix .
We suppose
and
for
some
()
and a measurable partition of .
Thanks to the condition (A’), we calculate as follows:
We evaluate the above three subexpressions
as follows.
We evaluate as follows:
Here,
the first inequality is given from the non-negativity of each ;
the equality is given by definition of and ;
the second inequality can be given by the continuity of
([35, Theorem 16]; [52, Theorem 3] for the sub-Giry monad ):
the last inequality is derived
by
from the assumption that either
or for all holds.
We next evaluate as follows:
Here,
the first inequality is derived from the non-negativity of each
(25)
the first equality is given by definition of and and the countable additivity of and ;
the second inequality is given by the continuity of and ;
the last inequality is derived
by
from the assumption that either or holds.
We prove the third inequality.
Since is convex function, and sequences
and
are monotone increasing at each ,
By Jensen’s inequality,
the sequence
is monotone increasing for each .
Then, the sequence of supremums
is also monotone increasing, because each is always greater than .
Hence,
Finally, we evaluate
as follows:
Here, if either or for any
holds then the limit will be .
To sum up the above evaluations of , we obtain the inequality
(24) if we have either
1.
and , or
2.
and .
This completes the proof.
__relaxaxp@oldproof
Parameters for Proposition 1 for
for weight functions of , , and
are shown in Table 4.
Below, we check the conditions in Proposition 1.
•
For the weight function of ,
the tuple satisfies for all ,
we have
•
For the weight function of ,
the tuple satisfies for all ,
we have
•
For the weight function of ,
the tuple
satisfies for all ,
•
For the weight function of ,
The tuple
satisfies for all ,
We first show the monotonicity of .
Assume . From the monotonicity of the original ,
we obtain for each ,
Second, we show the -unit-reflexivity of .
For , we have for all .
We can calculate for all ,
Finally, we show the -composability of .
For all ,
, and we can calculate
To prove the second equality, we calculate
This completes the proof.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 3)
It suffices to show -unit reflexivity and -composability:
Here and
.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 4)
It suffices to show -unit reflexivity and -composability:
Here and
. Without loss of
generality, we may assume holds and
and are nonexpansive, and for every
, holds and
and are nonexpansive.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 5)
We first show the -unit reflexivity of . For any , we calculate
We next show the -composability of .
For any and nonexpansive functions , we compute
We note here that the nonexpansivity of
is equivalent to the one of its uncurrying .
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 6)
We first show the -unit reflexivity of .
For (i.e. ), we calculate
Next, we show the -composability of .
For any and nonexpansive functions , we compute
This completes the proof.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 7)
The monotonicity of is obvious since .
We show the -unit reflexivity of .
For all , we have
Hence,
We next show the -composability of .
For any
, we define
by
. Then, we have
for any .
First, for all , we have
From this and the equality (1), we can calculate as follows:
From the assumption , the -unit-reflexivity and -composability of the original divergence , we obtain the -composability of as follows:
This completes the proof.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 8)
We consider a preorder on a monad . We define the
-divergence on by
Each is a preorder because
holds for each .
The -unit reflexivity of is derived from the reflexivity of . For all set and ,
Since is a preorder on ,
for all set , and ,
Hence, we have the -composability
Conversely, we consider an -relative -divergence on
such that each is a preorder.
We show that the family
defined by forms a preorder on monad .
Each component of at set is a preorder on the set .
We here note that the divergence must be reflexive (i.e. for all )
because of the reflexivity of :
From the reflexivity and -composability of , we have for all
and ,
(26)
(27)
They are equivalent to
the substitutivity and congruence of
respectively:
Finally, the above conversions and
are mutually inverse:
This completes the proof.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Theorem 2)
First, it is easy to see that the inequality (3) is equivalent to
satisfying -unit reflexivity.
We next show that the inequality (4) is equivalent to
satisfying -composability.
(only if) Since , we have
. Therefore it holds
. By letting in
the inequality (4), we obtain the -composability:
(if) From the -composability, for any and
and , we have
Next, for any , we have . Thus by monotonicity of
we have
By discharging , we conclude
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Theorem 4)
is a graded variant of codensity lifting
performed along the fibration
([33]; see also Definition
15). Proving that it is a graded
lifting of is routine. We show . The direction is easy. We
show the converse. From the composability of , for any
, , and
, we have
Next, the nonexpansivity of is equivalent to
Therefore we conclude . By discharging , we
conclude the inequality .
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Theorem 5)
Let . We have already shown that
is an -graded -divergence lifting of . We show
that is -strong (this proof does not need the closedness of
). Let and
be objects. We first rewrite the goal:
In the step , we used the equality (1). To show this goal,
we proceed as follows. Let
and
. First, from the
composability of , we obtain
We look at summands of the
right hand side. First, we easily obtain
. Next, from the
nonexpansivity of , for any , we have
Because ,
we obtain
Therefore we obtain the goal:
Next, let . We show that
.
The unit law of immediately entails
Next, under
the assumption on and , in the functor
has a right adjoint above
the adjunction (Lemma
1). Therefore each component of the internal Kleisli
extension morphism given in (6) are nonexpansive
morphisms in :
Therefore we conclude
We also easily have monotonicity:
for by
condition 1 of graded divergence lifting. We thus conclude that
.
We finally show . Let
. We show
(28)
Let . Since is an -graded -divergence lifting of
, we obtain
This implies the inequality
in . By taking the sup for
, we obtain the inequality (28).
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 9)
We write .
We first check the measurable isomorphism .
The measurable functions
() and the function
() are mutually inverse.
For any (Borel-)measurable , we have
and
if and otherwise.
Since all generators of are and where , we conclude the measurability of .
Thus, corresponds bijectively to , and
We then obtain, for all ,
The first inequality is given by where is the indicator function of
defined by when and otherwise.
The last inequality is given by the data-processing inequality which is given by the reflexivity and -composability of .
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 10)
We first prove that is not -generated.
We write .
We define by
Then the total variation distance between them is calculated by
On the other hand, for any , we have
This implies that is not -generated.
Next, we prove that is -generated.
From the data-processing inequality which is given by the reflexivity and -composability of ,
we obtain for any ,
We show that the above inequality becomes the equality for some .
We fix , a base measure over satisfying the absolute continuity and
the Radon-Nikodym derivatives (density functions) of with respect to respectively.
Let and .
We define by
if and otherwise.
Then for any we have
Similarly we have .
Therefore, we obtain
We then conclude that is -generated.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Proposition 11) For all set
and , we have
This implies that is -generated.
__relaxaxp@oldproof
Lemma 2.
For any ,
the function defined by
is a CS-EPMet on such that
.
Proof.
We first check the axioms of extended pseudometric.
By (Ref), contains
for each .
Hence holds for all .
By (Sym) and (Cut),
if and only if
. Hence, for all
,
By (Tri) and (Cut), if
and
then
.
Hence, for all ,
We next check the substitutivity. Let and
.
By (Subst), we have
Since is arbitrary, we conclude the substitutivity as
follows:
Next, we check the congruence, Let and
By applying
(Nonexp) and (Cut) inductively by unfolding the structure of
,
(29)
If
for some
, then we have
for all . By
(Max),(Cut) and definition of , we have
for all
. Hence,
(30)
From the above two implications (29) and (30),
We conclude the congruence as follows:
Finally, we assume .
By definition of ,
for any such that , there is
satisfying and .
Since is arbitrary, by (Max) and (Cut), we conclude
(Max) Immediate from the transitivity of ordering and the monotonicity of .
(Arch) Immediate from the Archimedean property and the completeness of .
(Nonexp)
Let .
We then take a term corresponding to .
Let be functions.
We fix an arbitrary .
Assume for each .
Then this asserts .
From the congruence of , we conclude
(Assumpt) Immediate.
__relaxaxp@oldproof
__relaxaxp@oldproof
(Proof of Theorem 7)
Since the range of is a subset of ,
we may define the following monotone restrictions of and :
By Theorem 6, we have and .
We show .
Let . There exists such that .
We check .
By the adjunction , we have which is equivalent to .
It suffices to check .
We have
From the monotonicity of the closure , we conclude
Since is arbitrary, we have .
__relaxaxp@oldproof
Lemma 4.
Let be a CC-SM and
be a doubly-indexed family of -divergences satisfying
monotonicity on (Definition 6). Then
is an -graded relational lifting of
(satisfies conditions 1–3 of
Definition 15).
Proof.
(Condition 1) We first show that
for all whenever
and . From the monotonicity of , for
all , , ,,
we have
Therefore, for any , we
obtain as follows:
(Condition 2) We next show
. From the
definition of morphisms in , for all ,
we have
as follows:
(Condition 3) Finally, we show that
holds for any
and . For all
, we have
(a)
For all , we have
(b)
We here fix . We show
. We also fix , , and
. From
(a), we obtain
Therefore, by instantiating (b)
with and
, for all
, we have
Since , ,
, and
are arbitrary, we
conclude
as follows:
This completes the proof.
∎
__relaxaxp@oldproof
(Proof of Proposition 13)
By Theorem 8 and the assumption ,
we obtain for all and ,
This completes the proof.
__relaxaxp@oldproof
Lemma 5.
The mapping
forms a measurable function of type .
Proof.
We show that for all ,
the mapping forms a measurable
function of type where
is the subspace of whose underlying set is .
We have
The mapping forms a continuous function of type
, hence
it is uniformly continuous on the compact set where and are arbitrary closed intervals in and
respectively.
Then, for all , there exists such that
holds wherever .
Hence, for all , there is such that
whenever ,
Since the closed intervals and are arbitrary, we conclude that
the function is continuous, hence measurable.
Hence, the mapping is measurable.
Since is arbitrary and ,
the mapping forms a measurable function of type .
The rest of proof is routine.
∎
Corollary 1.
.
Lemma 6(Measurability of ).
The mapping
forms a measurable function of type .
Proof.
We have, for all ,
The density function is continuous
function of type where is the subspace of whose underlying set is
.
The measurability of is proved in the same way as .
The rest of proof is routine.
∎