Probability Trees
Abstract.
In this article, we introduce a formal definition of the concept of probability tree and conduct a detailed study of its fundamental structural properties. In particular, we define what we term an inductive probability measure and prove that probability trees can be identified with these measures. Furthermore, we prove that probability trees are completely determined by probability measures on the Borel σ-algebra of the tree's body.
We then explore applications of probability trees in several areas of mathematics, including probability theory, measure theory, and set theory. In the first, we show that the cumulative distribution of the number of successes of finitely many dependent and non-identically distributed Bernoulli trials is bounded by the cumulative distribution of some binomial distribution. In the second, we establish a close relationship between probability trees and the real line, showing that Borel sets, measurable sets, and their measures can be preserved, as well as other combinatorial properties. Finally, in set theory, we establish that the null ideal associated with suitable probability trees is Tukey equivalent to the null ideal on the reals. This leads to a new elementary proof of the fact that the null ideal of a free σ-finite Borel measure on a Polish space is Tukey equivalent to the null ideal of the reals, which supports the view that the associated cardinal characteristics remain invariant across the spaces on which they are defined.
Key words and phrases: probability tree, measure theory, cardinal invariants, real numbers, random variables, inductive probability measure.
2020 Mathematics Subject Classification: 60A99, 60G50, 60A05, 60B05, 60C05, 03E17.
1. Introduction
In probability theory and statistics, the so-called probability trees are graphic tools that allow the representation of problems and help in their understanding and subsequent study, particularly those problems involving some kind of dependency. These trees can be used to visually analyze all the possible results of an experiment and the probabilities associated with each of them, making it easier to analyze situations involving decisions and dependent events and calculate their respective probabilities.
Intuitively, a probability tree starts with a single node, called the root of the tree, which represents the first event in the experiment. (In some contexts, such as the study of genealogy, unrooted probability trees are also considered; see e.g. [GT95].) In the next step, branches fan out from the root to represent all possible outcomes of the first experiment, labeled with their respective probabilities, whose sum must equal one since they cover all possible outcomes. Each branch ends in a new node, from which new nodes and branches are generated as the experiment unfolds; this ultimately determines the tree structure, covering all possible events of the experiment under consideration. The probabilities along a path can be multiplied to calculate cumulative probabilities, that is, the probability that two or more events occur jointly, allowing for the complete determination of any sequence of events in the experiment.
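The multiplication rule just described can be sketched in a few lines of Python. The two-stage experiment below is our own toy example (its labels and probabilities are not taken from the article); it only illustrates how branch probabilities multiply along paths.

```python
# A toy probability tree: each non-maximal node maps a child label to the
# conditional probability of taking that branch. Illustrative example only.
tree = {
    (): {"A": 0.4, "B": 0.6},            # first experiment
    ("A",): {"win": 0.5, "lose": 0.5},   # second experiment, given A
    ("B",): {"win": 0.2, "lose": 0.8},   # second experiment, given B
}

def path_probability(path):
    """Multiply the branch probabilities along a path from the root."""
    prob = 1.0
    for i, step in enumerate(path):
        prob *= tree[path[:i]][step]
    return prob

leaves = [("A", "win"), ("A", "lose"), ("B", "win"), ("B", "lose")]
# The leaves cover all outcomes of the experiment, so their probabilities
# must sum to one.
total = sum(path_probability(leaf) for leaf in leaves)
```

For instance, the probability of the path B-then-win is 0.6 · 0.2 = 0.12, and the four leaf probabilities sum to 1, as the branching probabilities at each node do.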
In some contexts beyond probability and statistics, probability trees are relevant in many different areas of the natural sciences, such as in the field of genetics, where they are used to help model the inheritance of biological traits as well as diseases. Specifically, they can be used to determine the probability that an individual inherits a certain gene from his or her parents, connecting information between recessive and dominant inheritance patterns (see [FF80]). In an area of science directly related to genetics, namely genealogy, probability trees have other very significant applications. There they can be used to model and analyze situations of relationships between generations, as well as to calculate the probabilities of transmission of certain biological traits, similar to the case of genetics mentioned above. In this context, the trees in question help to represent family connections and possible routes of genetic inheritance, especially when dealing with dependent events, such as calculating the probability that a certain individual inherits a specific gene or disease from their ancestors. A well-known example of this is the so-called infinitely-many-sites models, which are used to study genetic variability and mutation processes in populations over time (see [Gri89]).
As we mentioned at the beginning of this introduction, probability trees are especially useful in the analysis of dependent events, that is, those in which the probability of an event is conditioned by the results of previous events. In these cases, the probability in the successive branches reflects such dependence, making these trees an essential tool for representing problems in the context of Bayesian Inference (see e.g. [ZM18]). This makes this type of tree very useful in some sub-areas of computer science such as artificial intelligence and machine learning, where they are used in particular to study probabilistic and learning algorithms, such as decision trees, which allow to classify data and make predictions of future outcomes based on probabilities derived from previously known data. In particular, probability trees serve as a model for a certain type of probabilistic process, known as Causal Generative Processes, which are essential in artificial intelligence as they allow modeling data generation based on causal relationships (see [GMD+20] and [KWWC24]). Furthermore, probability trees are widely used to carry out simulations, including the well-known Monte Carlo methods, where they are used in the representation and subsequent analysis of random processes (see [FSNG23]).
While we have seen that probability trees arise in highly practical and diverse contexts such as genetics, genealogy, artificial intelligence, machine learning, and Bayesian inference, the origin of this paper stems from a far more abstract and unexpected field: forcing theory, in set theory. Recently, based on previous work by Saharon Shelah [She00] and Jakob Kellner, Saharon Shelah, and Anda Tănasie [KST19], Miguel A. Cardona, along with the authors in [CMU24], introduced a general theory of iterated forcing using finitely additive measures. (A preliminary version of this theory was presented in the master's thesis of the second author [Uri23], where the first questions related to probability trees arose; an entire chapter of that thesis was dedicated to the formalization and study of these trees [Uri23, Ch. 3], which served as the starting point for this article.) This theory is founded on a linkedness property of partially ordered sets, parametrized by an infinite cardinal and a class of ordered pairs. A central focus of the theory was to prove that random forcing (the poset of Borel sets of positive measure, ordered by inclusion) satisfies this property for certain suitable parameters. Later, the authors in [MU24] extended this result, a task that was not only highly technical but also required the development and application of concepts related to probability trees (as in the original work of Shelah [She00]). The core of this proof was to establish the existence of two objects, a finite set and a condition of random forcing, satisfying certain special conditions. Without delving into overly technical details, the strategy was based on defining a probability tree whose nodes represented partial approximations of the desired finite set.
Then, instead of directly attempting to construct these objects, following the approach of the probabilistic method (see [AS16] and [Uri23]), the probability of their existence was calculated using the tree structure. Finally, it was shown that this probability is positive, ensuring that objects satisfying the required conditions can indeed be found. To achieve this, it was necessary not only to formalize the notion of a probability tree but also to develop and analyze the structural properties of such trees.
In the references cited so far, there is no concrete definition of the notion of probability tree, as it is usually tailored to the needs of each particular case. Moreover, there is currently no detailed study of the structure of these trees. For this reason, the main objective of this article is clear and specific: to formalize the notion of probability tree and analyze its structure rigorously. Our formalization of this concept —presented in 4.11— is quite intuitive and, in general terms, outlined in the second paragraph of this introduction.
The formal definition and structural study of probability trees are carried out mainly in Section 4. Our starting point is to prove that every probability tree induces a measure that gives a probability space structure to each of its fronts and levels (see Theorem 4.10 and 4.15). The analysis of the converse of this result motivated the introduction of the notion of inductive probability measure, which enabled a deeper exploration of the structure of these trees. This led to the proof that probability trees can be identified with inductive probability measures and are completely determined by probability measures on the Borel σ-algebra of the body of the tree, that is, its set of maximal branches (see Section 3). The structural study developed in this article focuses primarily on the definition of four collections associated with these trees: tree probability sequences, inductive probability measures, Borel probability measures, and general probability sequences, as well as the connections between them, which are ultimately reduced to the commutativity of the diagram presented in Figure 3.
Once the structure of probability trees was analyzed, we laid the groundwork for exploring their applications in different areas. This structural study not only allowed us to formalize and better understand their properties but also to establish connections with other branches of mathematics. In particular, we apply probability trees to three distinct areas: probability theory, measure theory, and set theory, particularly in the combinatorics of real numbers and invariant cardinals. Below, we provide a brief description of each of these applications.
The first application, in probability theory, is related to a generalization of a well-known result: by adding finitely many independent and identically distributed random variables with Bernoulli distribution, we obtain a random variable with binomial distribution. Here, however, we face the situation where these Bernoulli-distributed random variables are dependent and even not identically distributed. To address this problem, we use probability trees to obtain the following result, which corresponds to Theorem 5.1.
Theorem A.
Let p ∈ [0, 1], let n be a natural number, and let X be the random variable representing the number of successes of n-many dependent Bernoulli-distributed random variables, where the probability of success of each variable may also depend on the previous events. If p is a lower bound of the probability of success of each Bernoulli-distributed random variable, then the cumulative distribution of X is below the cumulative distribution of the binomial distribution with parameters n and p.
This situation arose in the proof of [CMU24, Main Lemma 7.17] (see also [Uri23, Main Lemma 4.3.17]), where Theorem A proved sufficient for the purposes required in that context.
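As a sanity check of the inequality in Theorem A, one can exhaustively enumerate a small dependent Bernoulli process and compare its cumulative distribution with the binomial one. The dependence rule below (`success_prob`) is entirely hypothetical; the only requirement Theorem A imposes is that every conditional success probability stays at or above the lower bound p.

```python
from itertools import product
from math import comb

p = 0.3  # lower bound for every conditional success probability
n = 4    # number of dependent Bernoulli trials

def success_prob(history):
    """Hypothetical dependence: success gets likelier after failures,
    but never drops below the bound p."""
    failures = history.count(0)
    return min(0.9, p + 0.1 * failures)

# Exact distribution of X = number of successes, via tree enumeration.
dist = [0.0] * (n + 1)
for outcome in product([0, 1], repeat=n):
    prob = 1.0
    for i, bit in enumerate(outcome):
        q = success_prob(outcome[:i])
        prob *= q if bit == 1 else 1 - q
    dist[sum(outcome)] += prob

def cdf_X(k):
    return sum(dist[: k + 1])

def cdf_binomial(k):
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

# Theorem A predicts cdf_X(k) <= cdf_binomial(k) for every k.
checks = [cdf_X(k) <= cdf_binomial(k) + 1e-12 for k in range(n + 1)]
```

The underlying reason is a coupling argument: since each conditional success probability dominates p, the success count stochastically dominates a binomial variable with parameters n and p, which is exactly the stated inequality between the cumulative distributions.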
The second application —in Measure Theory— establishes a connection between probability trees and the real line. In particular, we will prove the following theorem, which corresponds to Section 6.
Theorem B.
Every probability tree defines a canonical probability measure on the Borel σ-algebra of its body, which has a close connection with the Lebesgue measure of the unit interval.
The measure is defined through a function on the body of the tree, defined except on a countable subset and constructed from the probability tree itself. The connection with the Lebesgue measure is established through this function: when restricted to an appropriate set, it turns out to be a homeomorphism, thus preserving the Borel sets and, consequently, the measurable sets (see Theorem 6.19 and 6.23). Additionally, this function preserves the measure between the corresponding measure spaces, where the unit interval carries the Lebesgue measure. On the other hand, the properties that define this function will also allow us to show that probability trees can characterize the probability measures defined on the Borel σ-algebra of the body (see Theorem 4.37).
The third application, in set theory, is related to cardinal invariants. Cardinal invariants, also called cardinal characteristics, are cardinal numbers that capture combinatorial properties of infinite spaces. Examples of such cardinals arise from ideals. Recall that, for a non-empty set X, a family I ⊆ P(X) is an ideal on X if it is closed under finite unions, downwards closed under ⊆, contains the empty set, and X ∉ I. In this context, we define the cardinal invariants associated with I as follows:
add(I), the additivity of I: the smallest size of a subfamily of I whose union is not in I;
cov(I), the covering of I: the smallest size of a subfamily of I whose union is X;
non(I), the uniformity of I: the smallest size of a subset of X that is not in I;
cof(I), the cofinality of I: the smallest size of a subfamily of I that is cofinal in ⟨I, ⊆⟩.
Apparently, there is no unanimous reason why they are called invariants, but it is known that they possess an invariance property: in many cases, the associated cardinal characteristics do not depend on the space on which the ideal is defined, as long as the space satisfies certain properties (see Theorem 7.7). The connection between probability trees and the real line that we establish in Section 6 allows us to extend this invariance property to the null ideal of the body of a probability tree. If X is a measure space, its null ideal is the ideal of all measure-zero subsets of X. When the measure space is understood, we just write the null ideal of the underlying set, e.g. of the reals or of the unit interval with respect to the Lebesgue measure.
Theorem C.
If T is a probability tree whose induced measure on the body is free, then the cardinal invariants associated with the null ideal of the body of T and those of the null ideal of the reals are the same.
Moreover, these identities follow by the Tukey-equivalence (7.3) between structures associated with these ideals.
This theorem leads to a new elementary proof of the more general known fact that the invariance of the cardinal invariants associated with the null ideal holds for any free σ-finite measure on the Borel σ-algebra of a Polish space. Details are developed in Section 7.
Notation 1.1.
We adopt the set-theoretic treatment of natural numbers (starting from 0): each natural number is the set of its predecessors. Formally, if n is a natural number, then n = {m : m < n} = {0, 1, …, n − 1}. The entire set of natural numbers is denoted by ω, which is the first infinite ordinal number, represented as the limit of the natural numbers. Furthermore, as in the context of ordinal numbers in set theory, we typically write “m < n” instead of “m ∈ n” and “m ≤ n” for “m < n or m = n”.
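The von Neumann coding of the naturals described in 1.1 can be mimicked directly, for instance in Python with frozensets. This is only an illustration of the convention (the function name is ours), not part of the article's development.

```python
# Von Neumann naturals: 0 is the empty set and n + 1 = n ∪ {n},
# so each natural number is literally the set of its predecessors.
def von_neumann(n):
    """Return the set-theoretic coding of the natural number n."""
    current = frozenset()
    for _ in range(n):
        current = current | {current}
    return current

three = von_neumann(3)
# `three` has exactly its 3 predecessors as elements, and the order
# m < n corresponds to membership: von_neumann(m) in von_neumann(n).
```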
2. Elementary notions of measure and probability theory
2.1. Review of measure theory
We review, in this section, basic notions related to measure theory and probability spaces.
Let X be a non-empty set and E ⊆ P(X). The σ-algebra generated by E, that is, the smallest σ-algebra of sets over X that contains E, is denoted by σ(E). If Y ⊆ X, then E|Y := {A ∩ Y : A ∈ E}, which is a (σ-)algebra of sets over Y whenever E is a (σ-)algebra of sets over X. Recall that, if f : X → Y is a function between non-empty sets, A is a σ-algebra over X and B is a σ-algebra over Y, then f is (A, B)-measurable if f⁻¹[B] ∈ A for all B ∈ B. Most of the time, we work with the Borel σ-algebra of a topological space: given a topological space X, B(X) denotes the σ-algebra generated by the collection of open subsets of X, which is known as the Borel σ-algebra of X. Any set in B(X) is called Borel in X, and a function between topological spaces is called a Borel map if it is measurable with respect to the Borel σ-algebras.
Regarding functions: let f : X → Y be a function between non-empty sets. For A ⊆ X and B ⊆ Y, define the image f[A] := {f(x) : x ∈ A} and the preimage f⁻¹[B] := {x ∈ X : f(x) ∈ B}. Functions can be used to transfer σ-algebras.
Fact 2.1.
Let f : X → Y be a function between non-empty sets.
(a) If B is a (σ-)algebra over Y, then {f⁻¹[B] : B ∈ B} is a (σ-)algebra over X.
(b) If A is a (σ-)algebra over X, then {B ⊆ Y : f⁻¹[B] ∈ A} is a (σ-)algebra over Y.
(c) If E ⊆ P(Y), then σ({f⁻¹[E] : E ∈ E}) = {f⁻¹[B] : B ∈ σ(E)}.
As a consequence, we have the following results.
Corollary 2.2.
Let X be a non-empty set, E ⊆ P(X), and Y ⊆ X. Then we have that σ(E)|Y = σ(E|Y), computing the latter over Y.
Corollary 2.3.
If X is a topological space and Y ⊆ X is a subspace, then we have that B(Y) = B(X)|Y = {B ∩ Y : B ∈ B(X)}.
Corollary 2.4.
Any continuous function between topological spaces is Borel.
We now review the notion of measure. Let X be a non-empty set and A ⊆ P(X) with ∅ ∈ A. Recall that a function μ : A → [0, ∞] is a finitely additive measure (fam) if μ(∅) = 0 and μ(⋃_{k<n} A_k) = ∑_{k<n} μ(A_k) whenever {A_k : k < n} ⊆ A is a finite collection of pairwise disjoint sets whose union is in A. Also, a fam μ is a (σ-additive) measure if μ(⋃_{n<ω} A_n) = ∑_{n<ω} μ(A_n) for any collection {A_n : n < ω} ⊆ A of pairwise disjoint sets whose union is in A.
Given a fam μ, we say that μ is finite if X ∈ A and μ(X) < ∞. When there is some {X_n : n < ω} ⊆ A such that X = ⋃_{n<ω} X_n and μ(X_n) < ∞ for all n, μ is called σ-finite. If X ∈ A and μ(X) = 1, then μ is a probability fam. Finally, μ is free if, for any x ∈ X, {x} ∈ A and μ({x}) = 0.
Recall that a measure space is a triple where is a -algebra over and is a measure on .
Example 2.5.
Let Ω be a countable set. For any function f : Ω → [0, ∞] there is a unique measure μ on P(Ω) such that μ({x}) = f(x) for all x ∈ Ω. Indeed, for A ⊆ Ω, μ(A) must be ∑_{x∈A} f(x).
Conversely, for any measure μ on P(Ω) there is a unique function f : Ω → [0, ∞] such that μ({x}) = f(x) for all x ∈ Ω (f(x) must be μ({x})).
As a consequence, if Ω is a countable or finite set, to define a probability measure on P(Ω) it is sufficient to define a function f : Ω → [0, 1] such that ∑_{x∈Ω} f(x) = 1. For example, we can use this to introduce the uniform measure on finite sets.
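The passage from a point-weight function to a measure in Example 2.5 is mechanical, as the following sketch shows (the function names and the four-point set are our own illustration, not the article's notation).

```python
# A measure on subsets of a countable set, induced by a point-weight
# function as in Example 2.5: mu(A) is the sum of the weights on A.
def measure_from_weights(weights):
    """Given weights x -> [0, inf), return mu with mu(A) = sum over A."""
    def mu(A):
        return sum(weights[x] for x in A)
    return mu

omega = {"a", "b", "c", "d"}
# Constant weight 1/|omega| yields a probability measure: the uniform
# measure of Definition 2.6.
uniform = measure_from_weights({x: 1 / len(omega) for x in omega})
```

Here `uniform({"a", "b"})` is 0.5 and `uniform(omega)` is 1, so the induced measure is indeed a probability measure.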
Definition 2.6.
Let Ω be a non-empty finite set. The uniform measure on Ω is the measure on P(Ω) determined by the constant weight function f(x) := 1/|Ω|, as in 2.5.
Definition 2.7.
The standard measure on is the (unique) probability measure on obtained from the function , as in 2.5.
Measure zero sets play an essential role in measure theory. Let A be a σ-algebra over the set X and let μ : A → [0, ∞] be a measure. Define
N(μ) := {N ⊆ X : N ⊆ A for some A ∈ A with μ(A) = 0}.
The sets in N(μ) are called μ-null, or just null, or measure zero sets.
It is known that {A △ N : A ∈ A, N ∈ N(μ)} is the σ-algebra generated by A ∪ N(μ) and that there is a unique measure on it that extends μ, namely μ(A △ N) := μ(A) for A ∈ A and N ∈ N(μ). This measure is called the completion of μ, and we use the same letter to denote this completion. In terms of measure spaces, we say that the resulting space is the completion of (X, A, μ).
Notice that the completion of μ equals μ iff N(μ) ⊆ A. In this situation, we say that μ is a complete measure and (X, A, μ) is a complete measure space.
Lemma 2.8.
Let f : X → Y be a map between non-empty sets. If (X, A, μ) is a measure space, then (Y, B, ν) is a measure space, where B := {B ⊆ Y : f⁻¹[B] ∈ A} and ν(B) := μ(f⁻¹[B]) for B ∈ B.
Lemma 2.9.
Let A be an algebra on a set X and let μ be a fam on A. Assume that μ(X) < ∞ and {x} ∈ A for all x ∈ X. Then the set {x ∈ X : μ({x}) > 0} is countable.
Finally, recall:
Theorem 2.10 ([Hal50, §13]).
Let A be an algebra of sets over a set X and assume that μ : A → [0, ∞] is a σ-finite measure. Then, there is a unique measure on σ(A) that extends μ.
3. Elementary notions of trees
3.1. Trees
In this section, we introduce the set-theoretic notion of tree (of height at most ω) and study some of its combinatorial and topological properties.
We start by fixing some notation about functions and sequences.
Notation 3.1.
We write ⟨a_i : i ∈ I⟩ to denote a function ā with domain I such that ā(i) = a_i for each i ∈ I. We usually say that ā is a sequence (indexed by I).
We typically look at sequences of length at most ω: for n < ω, ⟨s_i : i < n⟩ is a finite sequence of length n, while ⟨x_i : i < ω⟩ is a sequence of length ω. The empty sequence ⟨⟩ is the only sequence of length 0. We use |s| to denote the length of a sequence s. When s is a finite sequence and t is a sequence of length at most ω, define the concatenation of s and t by s⌢t := ⟨s_0, …, s_{|s|−1}, t_0, t_1, …⟩. If s and t are sequences of length at most ω, then s ⊆ t (as functions), that is, s ⊆ t means that t is a longer sequence extending s.
Let A be a set and n < ω. We define Aⁿ as the set of sequences in A of length n; A^{<ω} as the set of sequences in A of finite length; and A^ω as the set of sequences in A of length ω. Equivalently, A^{<ω} = ⋃_{n<ω} Aⁿ and A^{≤ω} = A^{<ω} ∪ A^ω.
Recall the notions of partial and linear orders: a partial order is a pair where is a non-empty set and is a reflexive, transitive, and anti-symmetric relation. We say that and are comparable in if either or . A chain in is a subset of such that any pair of elements are comparable. When is a chain in itself, we say that is a linear order.
We are ready to introduce the notion of tree.
Definition 3.2.
A tree (of height at most ω) is a partial order ⟨T, ≤⟩ containing a minimum element, called the root of T, such that, for any t ∈ T, the set of its strict predecessors is a finite linear order (under the order of T). The members of T are usually called the nodes of T.
For example, when A is a non-empty set, A^{<ω} ordered by ⊆ is a tree with root ⟨⟩.
Now, we introduce some notation related to trees and their properties.
Definition 3.3.
Given a tree T, we fix the following notation for t ∈ T and n < ω:
(1) ht_T(t), the height of t in T: the number of its strict predecessors.
(2) Lv_n(T) := {t ∈ T : ht_T(t) = n}, the n-th level of T, so Lv_0(T) consists of the root.
(3) ht(T) := sup{ht_T(t) + 1 : t ∈ T}, the height of T.
(4) succ_T(t), the set of immediate successors of t (in T).
(5) split(T), the set of splitting nodes of T, i.e. the nodes with at least two immediate successors.
(6) {s ∈ T : t ≤ s} is the set of successors of t in T.
(7) max(T) := {t ∈ T : there is no s ∈ T with t < s}, that is, the set of maximal nodes of T.
Example 3.4.
Consider the tree T = A^{<ω}. Then, for t ∈ T and n < ω:
(1) ht_T(t) = |t|, the length of t.
(2) Lv_n(T) = Aⁿ.
(3) ht(T) = ω.
(4) succ_T(t) = {t⌢⟨a⟩ : a ∈ A}.
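Example 3.4 is easy to make concrete. Below is a minimal sketch for a finite truncation of the tree of sequences over a three-element set (the depth cap, set names, and helpers are ours, purely for illustration): levels are the sequences of a fixed length, and the immediate successors of a node are its one-step extensions.

```python
from itertools import product

# Finite truncation of the tree A^{<omega}: we only generate nodes of
# length < DEPTH so the tree is finite. Illustrative helpers only.
A = (0, 1, 2)
DEPTH = 4

def level(n):
    """The n-th level: all sequences over A of length n."""
    return list(product(A, repeat=n))

def successors(t):
    """Immediate successors of node t: its one-step extensions."""
    return [t + (a,) for a in A]

# |Lv_n| = |A|^n, and every node has exactly |A| immediate successors.
sizes = [len(level(n)) for n in range(DEPTH)]
```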
We introduce the following notions related to trees.
Definition 3.5.
Let T be a tree.
(1) We say that T′ is a subtree of T if T′ ⊆ T and, for any s ∈ T and t ∈ T′, s ≤ t implies s ∈ T′.
(2) A tree T is well-pruned if, for any t ∈ T and ht_T(t) ≤ n < ht(T), there is some s ∈ Lv_n(T) above t.
(3) A tree T is finitely branching if succ_T(t) is finite for all t ∈ T.
(4) A tree T is perfect if, for every t ∈ T, there is some splitting node in T above t.
Notice that, if then whenever is a well-pruned tree.
On the other hand, it is not hard to check that, if is a tree and is a subtree of then, for any and , , , , , and .
Example 3.6.
If is a subtree of then and for . For any and , is a subtree of of height . Note that is well-pruned, it is finitely branching iff is finite, and it is perfect iff and .
Theorem 3.7.
Any tree of height at most ω is isomorphic with a subtree of A^{<ω} for some non-empty set A.
Proof..
Let be a tree. Put (as a set). Define in the following way: for , enumerate such that (so ), so set . Then, for any , and is a subtree of . ∎
Notation 3.8.
Due to the previous theorem, from now on all trees are trees of sequences, i.e. subtrees of A^{<ω} for some non-empty set A, unless otherwise stated.
In the coming sections, the space of infinite branches of a tree will be very important to understand the combinatorics and topology of the reals.
Definition 3.9.
When T is a subtree of A^{<ω}, we define the body of T as
[T] := {x ∈ A^ω : x↾n ∈ T for all n < ω}.
Notice that [T] = ∅ when T has finite height (because, in this case, no x ∈ A^ω has all of its finite restrictions in T). Since there is a bijection between [T] and the maximal infinite chains contained in T, we call [T] the space of maximal branches of the tree or the body of T. Any x ∈ [T] is called an infinite branch of T.
It is clear that [A^{<ω}] = A^ω.
If T is a subtree of A^{<ω}, then ht(T) < ω implies [T] = ∅. However, the converse is not always true, since there could be trees of height ω with empty body. For example, for each n < ω let t_n be the sequence of length n taking only the value n, and let T be the set of all initial segments of the sequences t_n. Then T is a counterexample. The so-called König's Theorem (see Theorem 3.10) gives us sufficient conditions to have the equivalence.
Theorem 3.10.
Let T be a subtree of A^{<ω}. If T is finitely branching, then the following statements are equivalent:
(i) ht(T) = ω.
(ii) [T] ≠ ∅.
(iii) T is infinite.
The set [T] is also related to the notion of well-foundedness.
Definition 3.11.
A tree T is well-founded if every non-empty subset of T has a maximal element.
Lemma 3.12.
Let T be a tree of sequences. Then, the following statements are equivalent:
(i) T is well-founded.
(ii) [T] = ∅.
Proof..
If [T] ≠ ∅ and x is an infinite branch of T, then {x↾n : n < ω} is a subset of T without maximal elements, so T is not well-founded. Conversely, if T is not well-founded, i.e. there is some non-empty S ⊆ T without maximal elements, then we can construct an increasing sequence in S. This increasing sequence determines a unique member of [T], so [T] ≠ ∅. ∎
3.2. Tree-topology
In this Subsection, we assign a topology to for any tree and study some of its properties. We start by recalling some basic topological notions.
Consider a topological space X. A set is clopen in X if it is open and closed in X. We say that X is zero-dimensional if it has a base of clopen sets. For A ⊆ X, cl_X(A) denotes the closure of A, and int_X(A) is the (topological) interior of A. The subindex is removed when clear from the context. Recall that a topological space X is discrete if every subset of X is open.
If S is a collection of subsets of X, the smallest topology of X containing S is called the topology of X generated by S.
We define the topology of the branches of a tree.
Definition 3.13.
Let T be a tree. The tree-topology of [T] is the topology generated by {[t]_T : t ∈ T}, where [t]_T := {x ∈ [T] : t ⊆ x}. We just write [t] when T is clear from the context.
Notice that, if is a tree of finite height, then the tree topology is the discrete topology. More generally, (in case it is non-empty) is a discrete subspace.
Assume that T is a tree of height at most ω. We say that s, t ∈ T are compatible (in T) if either s ⊆ t or t ⊆ s. Otherwise, they are incompatible, which we represent by s ⊥_T t, or just s ⊥ t. It is not hard to check the following.
Fact 3.14.
Let T be a tree and s, t ∈ T. Then:
(a) If s ⊆ t then [t] ⊆ [s].
(b) s ⊥ t iff [s] ∩ [t] = ∅.
(c) [s] ⊆ [t] iff either t ⊆ s, or s ⊆ t and there are no splitting nodes between s, including it, and t, excluding it. (The latter implies [s] = [t].)
Lemma 3.15.
The collection {[t] : t ∈ T} is a base of the topology of [T] and each [t] is clopen in [T]. In particular, [T] is a zero-dimensional space.
The countable product of discrete spaces can be expressed as a topological space of the form [T].
Definition 3.16.
Let ⟨A_n : n < ω⟩ be a sequence of non-empty sets. Define T as the set of finite sequences s with s(i) ∈ A_i for all i < |s|, which is a well-pruned tree of height ω with Lv_n(T) = ∏_{i<n} A_i for all n < ω. Notice that [T] = ∏_{n<ω} A_n.
In the case that A_n = A for all n < ω, T = A^{<ω} and [T] = A^ω.
Two very important spaces are defined in this way: the Cantor space 2^ω and the Baire space ω^ω.
Lemma 3.17.
Let ⟨A_n : n < ω⟩ be a sequence of discrete spaces. Then the tree-topology of ∏_{n<ω} A_n is the same as the product topology.
Proof..
It is enough to show that is a base of the product topology. First, for , is open in the product topology because where
Now assume that is open in the product topology and . Then, there is some sequence such that each , is finite, and . Hence, there is some such that . Therefore, . ∎
The Cantor space is compact as a consequence of the following result, which follows by Theorem 3.10.
Theorem 3.18.
Let T be a tree. Then [T] is compact iff T is finitely branching.
4. Probability trees
In this section, we formalize the notion of a probability tree and explore some of its structural properties. This will allow us to prove that such trees can be identified with a specific class of sequences, which we call inductive probability measures, as well as with probability measures on the Borel σ-algebra on the space of maximal branches of the tree (see 4.11, 4.28, and Theorem 4.37). Later, in Subsection 4.4, we will build on the concept of conditional probability to define a relative expected value within this framework.
Notation 4.1.
We denote by the collection of all countable trees of sequences.
Remark 4.2.
Although, for simplicity, we develop all the theory in this section for countable trees of sequences, thanks to Theorem 3.7 it can be applied, in a natural way, to arbitrary countable trees, regardless of whether they are composed by sequences.
4.1. Trees as probability spaces
We show how to define probability spaces from a tree in the sense of Section 3.
Definition 4.3.
We say that ⟨T, μ̄⟩ is a probability tree if T is a countable tree of sequences and μ̄ = ⟨μ_t : t ∈ T ∖ max(T)⟩, where each μ_t is a probability measure on P(succ_T(t)).
Furthermore, we define the class of all sequences μ̄ such that ⟨T, μ̄⟩ is a probability tree for some tree T. Notice that this T is uniquely determined by (the domain of) μ̄.
The following are examples of probability trees.
Example 4.4.
Order-isomorphisms preserve the probability tree structure.
Lemma 4.5.
Let be a probability tree, a partial order, and an order-isomorphism. Then is a probability tree, where , and for any and , .
Let be the subtree of whose set of nodes is (see Figure 1).
If we define for ; for ; and for then is a probability tree if, and only if:
Notice that, in that case, it satisfies the following:
that is, if for any we define then we have that is a probability space. The same happens trivially at .
As a consequence, it is possible to induce a probability space structure on each level of , and even a measure on the whole tree. To formalize this idea, we introduce the following definition.
Definition 4.6.
Let be a probability tree. Define the measure determined by
For any , , which is a measure on .
In some cases, the probability of a successor’s space in a probability tree can be determined using .
Lemma 4.7.
Let be a probability tree, and . Then . In particular, if , then
Proof..
Let and . Then,
Thus, ∎
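The additivity property proved in Lemma 4.7 is easy to check numerically on a toy example: the induced measure of a node is the product of the one-step probabilities along the path to it, and it splits over the node's immediate successors. The tree and probabilities below are our own made-up illustration, not notation from the article.

```python
# Sketch of the induced measure of Definition 4.6 on a small probability
# tree. mu[t][a] is the one-step probability that node t branches to
# t + (a,); the induced measure Pr(t) multiplies these along the path.
mu = {
    (): {0: 0.25, 1: 0.75},
    (0,): {0: 0.5, 1: 0.5},
    (1,): {0: 0.1, 1: 0.9},
}

def Pr(t):
    prob = 1.0
    for i in range(len(t)):
        prob *= mu[t[:i]][t[i]]
    return prob

# Lemma 4.7 in miniature: the measure of a non-maximal node is the sum of
# the measures of its immediate successors, since each mu_t sums to 1.
split_ok = abs(Pr((1,)) - sum(Pr((1, a)) for a in mu[(1,)])) < 1e-12
```

For instance, Pr of the node ⟨1, 1⟩ is 0.75 · 0.9 = 0.675, and the root gets measure 1 (the empty product).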
The induced measure satisfies the following properties.
Lemma 4.8.
If is a probability tree, then:
-
(a)
.
-
(b)
for any .
Proof..
(a): , where the last equality holds because empty products are equal to 1.
Considering the probability trees from 4.4, we can calculate the corresponding measures on its levels.
Example 4.9.
Based on 4.6, we can prove that a well-pruned probability tree induces a probability space structure at each level.
Theorem 4.10.
If is a well-pruned probability tree then, for any , is a probability measure on .
Proof..
Theorem 4.10 may not hold when the tree is not well-pruned. We leave this discussion to the following subsection (4.16).
4.2. Inductive probability measures
The properties presented in 4.8 are fundamental for the analysis of the structure of probability trees. This motivates the following notion of inductive probability measure.
Definition 4.11.
-
(1)
Let . We say that is an inductive probability measure on if it is a measure on such that and whenever .
For any , , which is a measure on .
-
(2)
Denote by the collection of inductive probability measures in some . Notice that is uniquely determined by , so it will be denoted by .
-
(3)
Define the function such that, for any , , which is well-defined by virtue of 4.8. Notice that .
Notice that the proof of Theorem 4.10 only uses that is an inductive probability measure. For this reason, the same proof yields:
Theorem 4.12.
If is an inductive probability measure on a well-pruned tree , then is a probability measure on for all .
As in the case of Theorem 4.10, the previous theorem may not hold when is not well-pruned. To understand this, we generalize this theorem by using the following notion.
Definition 4.13.
Let T be a tree of sequences. A set F ⊆ T is a front of T if it satisfies the following:
(i) Any pair of members of F are incompatible in T.
(ii) Every maximal branch of T intersects F, i.e. for any x ∈ [T] ∪ max(T) there is some t ∈ F such that t ⊆ x.
For example, for any n < ω, Lv_n(T) ∪ {t ∈ max(T) : ht_T(t) < n} (which is a disjoint union) is a front of T.
Fact 4.14.
Let be a tree of sequences.
-
(a)
is well-pruned iff, for any , is a front of .
-
(b)
Assume that is not well-pruned and let be the minimum number such that . Then and, for any , is a front of iff .
Proof..
Regardless of whether is well-pruned or not, we can find the maximum (which can be ) satisfying that is well-pruned. Notice that and is well-pruned iff . Also, if is not well-pruned, then .
Therefore, it is enough to show that, for any , is a front of iff . If then its length must be (otherwise would not be well-pruned), so for any . This shows the implication from right to left.
For the converse, assume that . Then and is not well-pruned, so . Any in this set has length , so it does not have nodes of length below it. Thus, is not a front. ∎
Theorem 4.12 is generalized as follows.
Theorem 4.15.
If Π is an inductive probability measure on T and F is a front of T, then the restriction of Π to P(F) is a probability measure. Moreover,
∑_{t ∈ F} Π({t}) = 1. (4.15.1)
Proof..
For , let . Since , (4.15.1) and 4.8 (a) imply that . Hence, it is enough to show (4.15.1). We provide two proofs of (4.15.1); the first is presented below, and the second is in \autopagerefproofb035-3.
Let be the set of nodes in below some member of . This is a subtree of and : if then , so for some because is a front, but all members of are pairwise incompatible, so for all , contradicting that .
Therefore, is a well-founded tree by 3.12. It is enough to show that . Assume the contrary, so contains a maximal element . If then , so , that is . Thus . This implies that and, since is maximal in , , so for all . On the other hand, is a disjoint union, so
where the last equality holds by 4.8 (b). But this contradicts that . ∎
Notice that Theorem 4.12 (and Theorem 4.10) follow from 4.14 and Theorem 4.15.
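The role of fronts in Theorem 4.15 can be illustrated on a tree that is not well-pruned. In the made-up example below, one node is a dead end (maximal before the last level), so a bare level does not capture all the mass, while the level together with the earlier maximal node, which is a front, does.

```python
# A small probability tree that is NOT well-pruned: node (1,) has no
# successors, so it is maximal at height 1. Example data is ours.
mu = {
    (): {0: 0.4, 1: 0.6},
    (0,): {0: 0.5, 1: 0.5},
}

def Pr(t):
    """Induced measure: product of one-step probabilities along t."""
    prob = 1.0
    for i in range(len(t)):
        prob *= mu[t[:i]][t[i]]
    return prob

level2 = [(0, 0), (0, 1)]
front = level2 + [(1,)]  # level 2 plus the maximal node below it

level2_total = sum(Pr(t) for t in level2)  # misses the dead-end mass
front_total = sum(Pr(t) for t in front)    # recovers total mass 1
```

Here the second level alone carries measure 0.4 (it misses the branch dying at ⟨1⟩), whereas the front obtained by adding that maximal node carries measure 1, in line with (4.15.1).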
Example 4.16.
Given an inductive probability measure in , we can compute the probability of in each level of in terms of the probability of .
Corollary 4.17.
If and is an inductive probability measure in , then for any and , if is a front of (which holds when is well-pruned), then .
Proof..
Apply (4.15.1) to any front of containing , e.g. . ∎
We will prove later that this map is surjective (see 4.24); however, it is evident that it is not one-to-one, as it does not matter how a probability tree is defined above a measure-zero node. While it might seem reasonable to eliminate measure-zero nodes and restrict to probability trees with strictly positive measures, this approach is not ideal, as the applications often involve trees with measure-zero nodes. Instead, we will isolate the positive part of a tree with respect to the measures introduced above.
Definition 4.18.
Let and .
-
(1)
We say that is positive if, for any , and , is positive. Similarly, is positive if, for any , .
-
(2)
If is a probability tree, we say that it is positive if is positive.
-
(3)
and , which are called the null part of and , respectively.
-
(4)
and , which are called the positive part of and , respectively.
-
(5)
For , set .
-
(6)
and .
-
(7)
is the collection of all positive . Similarly, is the collection of all positive .
-
(8)
We define the functions:
by , by , and is . 4.19 below justifies that the co-domains of these functions, and the domain of , are as indicated.
Next, we list some basic properties of the notions introduced in 4.18.
Fact 4.19.
Let and . Then:
-
(a)
.
-
(b)
is a subtree of (so ) and . Moreover, if is well-pruned then so is and . A similar result holds for .
-
(c)
and , also and .
-
(d)
.
-
(e)
is positive iff iff
-
(f)
is positive iff iff .
-
(g)
is positive iff is positive.
Proof..
(a): Immediate because and .
(b): We just deal with since it implies the result for by (a). First notice that , so , and it is clear that in implies . Therefore, is a subtree of .
Now let , so . Since we have that , there must be some such that , so and, hence, . This indicates that (the other containment is trivial). This implies that, whenever is well-pruned, is well-pruned with the same height.
(c): It is clear that is a measure on and . Now, for , since the nodes in have measure zero,
Thus, and . The proof for is similar.
We can use and to characterize when and are equal. Furthermore, this characterization enables us to show that, by restricting to positive trees, establishes a bijection between and (see also 4.25).
Lemma 4.20.
Let . Then iff and . As a consequence, is a one-to-one function.
Proof..
Assume that . It is clear that and , therefore . On the other hand, let and . By 4.7 we have
Thus, .
Conversely, if and , then by 4.19. Hence and . Since the nodes in have measure zero with respect to both and , we conclude that . ∎
The condition in 4.20 that characterizes when and are equal motivates the definition of the following equivalence relations on and .
Definition 4.21.
-
(1)
We say that are positive equivalent, denoted by , iff . It is clear that this is an equivalence relation. Denote by the -equivalence class of .
-
(2)
Similarly, are positive equivalent, denoted by , iff . It is clear that this is an equivalence relation. Denote by the -equivalence class of .
-
(3)
and are defined by and , respectively. Furthermore, and similarly, .
-
(4)
Define by , which is well-defined by virtue of 4.22 below.
Notice that, trivially and for and . In fact, is the canonical class-representative of . Similarly for inductive probability measures.
We can characterize when and are positively equivalent in terms of and .
Lemma 4.22.
If , then the following statements are equivalent.
-
(i)
.
-
(ii)
.
-
(iii)
.
As a consequence, is well-defined and one-to-one.
Proof..
We explore the close relationship between and , and even show that probability trees can be identified with inductive probability measures. To this end, we show how to construct probability trees from inductive probability measures.
Definition 4.23.
-
(1)
Denote by the collection of pairs such that and, for any , is a probability measure on . Since for all , we often identify a positive with and claim that .
-
(2)
Given determined by and , we define as follows. For any and ,
When , denote .
- (3)
-
(4)
Define , and by , , and , respectively, where is determined by .
Given determined by , we can use to induce a probability tree structure on . This will allow us to define a bijection between and .
Lemma 4.24.
Let be determined by and , and let .
-
(a)
is a probability tree.
-
(b)
.
-
(c)
is surjective.
-
(d)
, .
-
(e)
If then .
-
(f)
If then .
-
(g)
is bijective.
-
(h)
.
Proof..
On the other hand, if then because is a probability measure on .
(b): It is clear that . By induction on , we show that for all . If and then , so . Now assume that, for any , . For , by 4.7 and induction hypothesis, we have that:
Consider two possible cases. On the one hand, if then, by the definition of ,
On the other hand, if , then where the last equality holds because and, thus, for any .
As a consequence of 4.24 (c) and 4.19 (g), and are surjective functions. Therefore, by 4.20 and 4.22, they are bijections. Furthermore, 4.24 (b), (e) and (f) allow us to define explicitly the inverse of .
Corollary 4.25.
The function is bijective and its inverse is . Moreover, is also bijective.
Corollary 4.26.
Let be a probability tree. If is positive, then iff .
Finally, we can summarize the connections between , , and in Figure 2.
Theorem 4.27.
-
(a)
, , , and are bijective.
-
(b)
All the diagrams in Figure 2 are commutative. As a consequence, any pair of paths with the same starting and ending points produce the same function.
-
(c)
, , , , , and are surjective.
Proof..
(b): It is enough to prove that the seven smallest sub-diagrams are commutative. Fix , and . The far left and right sub-diagrams commute because and . It is clear that the sub-diagram at the bottom commutes.
4.3. Borel probability measures
So far, we have defined three distinct classes associated with probability trees, namely , , and . However, there is a fourth which arises naturally by noting that every probability measure on induces an inductive probability measure in (see 4.29 and 4.30). The new class is then the class of probability measures on , which we introduce below.
Definition 4.28.
We define as the collection of all probability measures on for some . Notice that is uniquely determined by , so it will be denoted by .
Note that, when (e.g. in the case ), .
Now, we construct inductive probability measures derived from measures in .
Definition 4.29.
For , define the measure on determined by for all . Furthermore, we define the function such that, for any , .
We will see that and therefore, is well-defined in the sense that .
Lemma 4.30.
If then is an inductive probability measure in .
Proof..
Assume that . For , since is a disjoint union,
which proves that . ∎
We aim to expand Figure 2 by incorporating the new class , for which we need to establish connections between and the classes and . The relationship between and poses no issues, as it is defined by . We will show that is a bijection: one-to-one will follow by Theorem 2.10, while surjectivity is a consequence of a connection between and , namely, for , we will construct a satisfying . The definition of is easy when (i.e. ), as the only possible measure is , which is in with . However, we need more tools for the construction of when and to prove (even in the case ). We present two ways to do this: the first uses Theorem 4.15, while the second is a concrete construction using a connection with the Lebesgue measure of the unit interval (see Section 6, Theorem 6.13). The second construction is one of the main applications of this paper, as it not only addresses the problem at hand but also provides an interesting connection between probability trees and the real line, which will also have consequences for representing cardinal invariants of the continuum (see Section 7).
Theorem 4.31.
For every there is a unique with such that, for any , , i.e. .
Proof..
Set . Define
(4.31.1) |
This is an algebra of sets over and every is clopen in , so . It is clear that for all , so . We first define on by whenever for some and . We show that this function is well-defined and that it is -additive, as it is clear that (the only representing is ).
To see that the map is well-defined, assume that where , and . Without loss of generality, consider the case . Then, for any , since we must have that . Therefore, by (4.15.1), . This implies that
where the last equality holds because , which follows by .
To show that is -additive in , for let , , and , and assume that is pairwise disjoint and for some and . We prove that . We may assume that for all because we can modify without affecting by using that, whenever , where is the set of nodes above some .
Notice that the nodes in each are pairwise incompatible, as are nodes coming from different and . Set . Since for all , we get that is a front of . Hence, for , we must have . Therefore, by (4.15.1),
This shows that is a measure on . Therefore, by Theorem 2.10 this is extended by a unique measure on , which we still denote by . This is a probability measure because . To show the uniqueness, if satisfies that , then and, for and , . Hence, extends the measure we already defined on , so by the uniqueness of the extension. ∎
As a consequence, we have a connection between and .
Definition 4.32.
We define the function by .
We list some properties of below.
Lemma 4.33.
-
(a)
is a bijective function and, for , where satisfies (i.e. does not depend on this ).
-
(b)
is surjective, and for , iff .
Proof..
(a): Let be such that . Then and, by 4.29, for any , . This implies that (see (4.31.1)), which implies by Theorem 2.10. This proves that is one-to-one. On the other hand, the surjectivity follows easily from Theorem 4.31: for , since is surjective, we can find a such that , so . This also shows that .
Similar to the case of and , we can define an equivalence relation on .
Definition 4.34.
-
(1)
Let be the class of such that for all , i.e. the only measure zero open subset of is the empty set.
-
(2)
For , set , , and . Also set .
-
(3)
We say that are positive equivalent, denoted by , iff . Denote the equivalence class of by .
-
(4)
Define the maps
by , by , by , by , and by . These maps are well-defined thanks to the following result.
Fact 4.35.
Let and . Then:
-
(a)
and .
-
(b)
is the largest open measure zero subset of and .
-
(c)
, and .
-
(d)
iff iff .
-
(e)
iff (so is well-defined).
-
(f)
and .
-
(g)
iff (so is well-defined).
Proof..
(a): Immediate by the definition of .
(b): Clear because consists of all the measure-zero basic clopen sets.
Finally, and, for , . Hence .
(f): By Theorem 4.31, , so the result follows because is a bijection.
(g): The implication from left to right is clear by (f). For the converse, if then, by (c) and Theorem 4.31, , so because is a bijection. ∎
Corollary 4.36.
The functions and are bijections.
Finally, we can expand Figure 2 by including .
Theorem 4.37.
Consider the diagram in Figure 3. Then:
-
(a)
All the sub-diagrams in Figure 3 are commutative. As a consequence, any pair of paths that start at the same point and end at the same point produce the same function. Moreover:
-
(a.1)
any path from to is the map , and
-
(a.2)
any path from to is the map .
-
(b)
, , , , , , and are surjective.
-
(c)
, , , , , , , , , and are bijective.
Proof..
The commutativity of the sub-diagrams of Figure 3 follows easily by Theorem 4.27 and 4.31.
Corollary 4.38.
Let be a probability tree.
-
(a)
If and then . As a consequence, whenever is positive.
-
(b)
If is positive then .
4.4. Relative expected value in probability trees
In this subsection, we are going to introduce a notion of expected value on probability trees that we call relative expected value.
Throughout this section, fix a probability tree . Recall that, for any , is the set of all nodes in above . Notice that, if , then . Furthermore, inherits a probability tree structure from :
Lemma 4.39.
Let be a probability tree. Then, for any , inherits probability tree structure from in a natural way, that is, is a probability tree, where .
Since for any , , we can abuse notation and write “” instead of “”. This will be used extensively in this subsection.
We can now define the relative expected value in probability trees.
Definition 4.40.
Let , , such that , and let be a random variable on the probability space . Then, we define:
and call it the relative expected value of with respect to . Here, is interpreted as a random variable on . When the context is clear, we simply write or even , instead of
Notice that the “” above is a dummy variable, that is, the expected value of is calculated by varying over the nodes in at level that extend . Since the relative expected value is defined in terms of the typical expected value, it is clear that it is linear, i.e. for and any random variables on ,
The following result allows us to decompose the probability of the successors of at the level of in terms of the probability at the level of :
Lemma 4.41.
For in ,
The relative expected value can be calculated as a composition of relative expected values at intermediate levels, as follows (see Figure 4).
Theorem 4.42.
Let , and assume that . If is a random variable on , then
Finally, as a consequence (when ), we can express the expected value of in terms of the relative expected value:
Corollary 4.43.
If , is a front of and is a random variable on then
So far we discussed the relative expected values of random variables on . However, we can extend this notion to random variables on any front. We first fix some terminology.
Definition 4.44.
Let and fronts of .
-
(1)
The node is below if for some .
-
(2)
The front is below if any node of is below .
For instance, is a front below whenever .
Definition 4.45.
Let be a front of , below , and let be a random variable on (recall from Theorem 4.15 that is a probability measure). Define the relative expected value of with respect to as
Notice that is a front of . Also, is interpreted as a random variable on the probability space .
We can generalize Theorem 4.42 for fronts. We omit the proof, as it is very similar.
Theorem 4.46.
Assume that are fronts, is below , and is below . If is a random variable on , then
In particular, .
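The decomposition property above (and the special case in Theorem 4.42 and Corollary 4.43) can be sketched on a small example. The code below uses a hypothetical complete binary tree of height 2 with conditional success probabilities `q` chosen for illustration; it checks that the expected value of a random variable on the level-2 front equals the average, over the intermediate level, of the relative expected values.

```python
from fractions import Fraction

# Hypothetical conditional probabilities: q[s] is the probability of
# drawing child 1 at node s of the complete binary tree of height 2.
q = {(): Fraction(1, 2), (0,): Fraction(1, 3), (1,): Fraction(3, 4)}

def mu(s):
    """Probability of the node s (product of conditional probabilities)."""
    out = Fraction(1)
    for i in range(len(s)):
        p1 = q[s[:i]]
        out *= p1 if s[i] == 1 else 1 - p1
    return out

leaves = [(a, b) for a in (0, 1) for b in (0, 1)]  # the front Lev_2
X = {s: Fraction(sum(s)) for s in leaves}          # a random variable on it

# Plain expected value of X over the front.
E = sum(mu(s) * X[s] for s in leaves)

# Relative expected value of X at a node t of the intermediate level:
# average of X over the leaves above t, with the conditional measure.
def rel_E(t):
    return sum((mu(s) / mu(t)) * X[s] for s in leaves if s[: len(t)] == t)

# Tower property: E[X] = sum over t in Lev_1 of mu(t) * E[X | t].
assert E == sum(mu(t) * rel_E(t) for t in [(0,), (1,)])
```

The assertion is an instance of composing relative expected values at intermediate levels, as in the theorems above.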
5. Bounding cumulative dependent Bernoulli distributions
As mentioned in the introduction, it is well known that, by adding finitely many independent and identically distributed random variables with Bernoulli distribution, we obtain a random variable with the binomial distribution. However, there are cases where we must deal with dependent random variables with Bernoulli distribution which may not be identically distributed. (See, for example, the proof of [Uri23, Main Lemma 4.3.17] and the proof of [CMU24, Main Lemma 7.17].) In the following theorem, we show how the cumulative distribution of the number of successes of these random variables can be bounded by the cumulative distribution of the binomial distribution, which proves A. Here, success corresponds to and failure to .
For a natural number , -many dependent trials with Bernoulli distribution can be understood as a probability tree where is the complete binary tree of height , i.e. iff is a sequence of length composed of ’s and ’s (including the empty sequence). Any of length represents a sequence of successes and failures of the first Bernoulli tests and, whenever , is Bernoulli distributed with probability of success , which clearly depends on , i.e. on the previous trials.
The random variable expressing the success at the -th trial for is , defined by , so success is attained when , i.e. . Therefore, the probability of success is
Theorem 5.1.
Let , , and assume that is a probability tree. Define as the random variable measuring the number of successes after trials, that is, for any
Assume that there exists some such that, for any , . Then, for all
where denotes the binomial distribution of trials, each with probability of success .
Proof..
For any and define:
For let and notice that its volume is . It is easy to show that is a partition of the -dimensional unit cube .
For let . Thus
(5.1.1) |
On the other hand, define the polyhedron for . We can use this to express the cumulative binomial distribution because is a partition of and, by setting , we obtain
(5.1.2) |
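The bound of Theorem 5.1 can be checked numerically on a small instance. In the sketch below, `p_s` is a hypothetical dependence of the conditional success probability on the history of previous trials, chosen so that it is at least `p` everywhere (the theorem's hypothesis); the exact distribution of the number of successes is computed over the complete binary tree and its CDF is compared with the binomial CDF.

```python
import math
from itertools import product

# Hypothetical conditional success probability after history s
# (1 = success, 0 = failure); by construction p_s(s) >= p.
def p_s(s, p):
    return min(1.0, p + 0.1 * (sum(s) / (len(s) + 1)))

def success_cdf(n, p):
    """Exact CDF of the number of successes in n dependent Bernoulli
    trials on the complete binary probability tree of height n."""
    dist = [0.0] * (n + 1)
    for leaf in product([0, 1], repeat=n):
        prob = 1.0
        for i in range(n):
            q = p_s(leaf[:i], p)
            prob *= q if leaf[i] == 1 else (1.0 - q)
        dist[sum(leaf)] += prob
    return [sum(dist[: k + 1]) for k in range(n + 1)]

def binomial_cdf(n, p):
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return [sum(pmf[: k + 1]) for k in range(n + 1)]

n, p = 6, 0.5  # every conditional success probability is >= p here
dep = success_cdf(n, p)
bino = binomial_cdf(n, p)
# Theorem 5.1: the dependent CDF is dominated by the binomial CDF.
assert all(d <= b + 1e-12 for d, b in zip(dep, bino))
```

Intuitively, since every trial succeeds with conditional probability at least `p`, the number of successes stochastically dominates a Binomial(`n`, `p`) variable, which is exactly the displayed inequality.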
6. Probability trees and the real line
In this section, we explore the connection between probability trees and the real line. To this end, we start with an alternative proof of Theorem 4.31, specifically, given a probability tree , we construct a measure such that for any . This construction also shows a connection between the measure space and the Lebesgue measure space of , reflected by a map that, under certain conditions, preserves measure (see Theorem 6.19 and Theorem 6.28).
Throughout this section, we assume that is a probability tree. Since is countable, is the -algebra generated by .
Definition 6.1.
It is easy to show that is isomorphic to a tree such that, for any , there is some such that . Such a is called a representation of .
To define the measure , we fix a representation of as in 6.1 and, without loss of generality, assume . Our construction is motivated by known connections between the Cantor space, the Baire space, and as in e.g. [Lev02, Ch. VII, §3].
Definition 6.2.
Define a sequence of closed intervals by recursion on as follows.
-
•
, that is, and .
-
•
Having defined the interval , when let be the collection of consecutive closed intervals contained in where each has length , that is: , whenever , and whenever .
Denote . Then , so is countable.
Notice that is just one point (that is, ) iff .
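The recursion of Definition 6.2 can be sketched for a finite tree. In the toy example below (the tree and its probabilities are hypothetical, chosen for illustration), each node receives a closed subinterval of the unit interval whose length equals its measure, and the intervals of its children split the parent's interval into consecutive pieces.

```python
from fractions import Fraction

# A hypothetical finite probability tree: each node (a tuple) is mapped
# to the list of its children's conditional probabilities (summing to 1).
children = {
    (): [Fraction(1, 2), Fraction(1, 2)],
    (0,): [Fraction(1, 3), Fraction(2, 3)],
    (1,): [Fraction(1, 4), Fraction(3, 4)],
}

def intervals(node=(), left=Fraction(0), length=Fraction(1), out=None):
    """Assign each node s an interval I_s of length mu(s): the children
    of s split I_s into consecutive subintervals whose lengths are
    proportional to their conditional probabilities."""
    if out is None:
        out = {}
    out[node] = (left, left + length)
    for i, p in enumerate(children.get(node, [])):
        intervals(node + (i,), left, length * p, out)
        left += length * p
    return out

I = intervals()
# The root receives [0, 1]; lengths multiply down the branches.
assert I[()] == (Fraction(0), Fraction(1))
assert I[(0,)] == (Fraction(0), Fraction(1, 2))
assert I[(0, 0)][1] - I[(0, 0)][0] == Fraction(1, 6)
# Sibling intervals are consecutive and cover their parent.
assert I[(0, 1)] == (Fraction(1, 6), Fraction(1, 2))
```

This is the finite analogue of the construction: iterating it along an infinite branch produces the nested sequence of closed intervals used to define the maps in 6.4.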
Let us look at some basic properties of the objects defined in 6.2.
Lemma 6.3.
Let .
-
(a)
If then, for any ,
-
(b)
.
-
(c)
If then . Moreover, holds whenever . As a consequence, .
-
(d)
If then and . As a consequence, when , .
Proof..
(a): Clear from the definition of .
(b): Proceed by induction on . When , . Assume and that the claim is true for all . Let , that is, where and . By induction hypothesis, , so , where the last equality holds by 4.7.
(c): Observe that is a monotone increasing sequence. Moreover, by (a), if then , so . Then, covers .
To prove the “moreover” in the statement, assume that . Hence, and therefore, , so . As a consequence, covers .
(d): If , then there is some such that and . Without loss of generality, we assume that . Then, , so and, in case they intersect, the unique point in the intersection must be in . Hence, the result follows because . This easily implies that . ∎
Notice that, for , when in , so . Then, for , is a monotone increasing sequence and is a monotone decreasing sequence (letting in the case that has finite length and ) in , so their limits exist. This allows us to introduce the following definition.
Definition 6.4.
Define and by and , respectively. Furthermore, for each , define and, for , set .
Clearly, . Next, let us look at some basic properties of and .
Lemma 6.5.
Let . Then:
-
(a)
For , and
-
(b)
. As a consequence, .
Proof..
(b): Assume that , so for some . Since , we get .
Corollary 6.6.
is a partition of . In particular, for any , there exists a unique such that .
We can compute the functions and for the probability trees from 4.4.
Example 6.7.
-
(1)
For , results from splitting in half. It can be proved that , and, for , , hence is a singleton.
-
(2)
For , is the first half of , is the first half of , is the first half of and, in general, is the first half of . Therefore, .
It can be proved by induction on that . As a consequence, is a singleton for all .
-
(3)
For ,
As a consequence, when is the constant sequence of . Otherwise, let be the minimum such that . If then , and if then . Therefore, when is the constant sequence of . Otherwise, letting be the minimum such that , if then and, if , then .
Next, we analyze the conditions under which and take extreme values, that is, when and . For this, we use the lexicographic order: for , iff there is some such that and . Notice that is a partial order on but not necessarily linear, e.g. comparable nodes in are not -comparable. However, is linear at any level of .
Lemma 6.8.
Let . Then:
-
(a)
If and then .
-
(b)
iff for all satisfying .
-
(c)
iff for all satisfying .
Proof..
(a): If then there is some such that and . This implies that . On the other hand, and , so .
Conversely, assume that for all such that . We show by induction on that . This is clear for . Now assume that and . For every , because , so . By induction on , it is easy to show that , so . ∎
As a consequence, any element in has the form for some .
Corollary 6.9.
.
Proof..
The construction of yields the following important connection between and .
Definition 6.10.
Define such that, for any , is the unique such that .
Notice that is well-defined by virtue of 6.6. Now, we will use it to connect with .
Lemma 6.11.
The map is continuous. Moreover:
-
(a)
For any , , and
-
(b)
If then .
Proof..
We are ready to introduce a more precise definition of .
Although this definition uses the representation of that we fixed, we will show in 6.15 that does not depend on the representation of .
Theorem 6.13.
The map from 6.12 is a probability measure on such that for all .
Proof..
Remark 6.14.
In the previous proof, we do not always have , nor the equivalent formulation “ implies ”. For example, when for some such that is uncountable, and is just a singleton, and then for every non-empty , and there are many such that are not Borel in . This is not much of a problem anyway because the converse holds for the completions of and . For more details, see Theorem 6.19, 6.20, and 6.23.
Although it is possible that for some , this is a very undesirable situation that is typically avoided. When for all and all singletons in have measure zero (which implies ), the equivalence discussed above will hold (see Theorem 6.28).
The uniqueness of in Theorem 4.31 simply follows by Theorem 2.10. Therefore:
Corollary 6.15.
The measure does not depend on the representation of .
The construction of uses the Lebesgue measure on and, unlike the first construction in Theorem 4.31, relies neither on Theorem 4.15 nor on its consequences. Using the second construction, we can prove Theorem 4.15 more directly.
Second proof of Theorem 4.15.
Let , and let be a front of . Since is surjective, there is some such that . Hence, by Theorem 6.13.
Let below and . It is enough to show that . Since is a front and is below , we obtain that is a disjoint union, so
The function has more properties than it appears to have. Under certain conditions, it is a topological embedding into . To construct the inverse function, we look at the measure zero points of .
Lemma 6.16.
Let be a subtree of with , and let . Then:
-
(a)
.
-
(b)
.
-
(c)
iff is a singleton.
Proof..
(a): For any , consider , which is a pairwise disjoint union. Then is a decreasing sequence of clopen sets in whose intersection is . Furthermore, by Theorem 6.13, it is clear that, for any , . As a consequence,
(b): By the definition of ,
Using 6.16 (c), we can introduce a sort of inverse of as follows (inverse in the sense of 6.18 (g)).
Definition 6.17.
-
(1)
For , define , the free part of .
-
(2)
Let be the function with domain such that, for , is the unique point in . Notice that could be empty.
-
(3)
For , define and , which are open in .
- (4)
Lemma 6.18.
-
(a)
is countable. In particular, .
-
(b)
For , if then .
-
(c)
If and then .
-
(d)
is continuous and .
-
(e)
. In particular, .
-
(f)
.
-
(g)
is a homeomorphism from onto with inverse .
-
(h)
and , i.e. .
-
(i)
is countable and .
-
(j)
for all .
Proof..
(b): By the definition of , for , but when , so .
(c): If and then . Also , so .
(d): Let and . Since , there is some such that , where . Then, for any , , so . This shows that is continuous.
Notice that is a sequence of pairwise disjoint intervals that cannot contain points in , and hence none in . Hence, .
(e): If then for some . Either or , and in the latter case, by (b). Thus, . On the other hand, assume that . If then , so , i.e. there is some . Thus, .
In the case that , we have , so by (c).
(f): If then for some , so , i.e. .
For the converse, if then for some and , i.e. , so by (e). Hence, .
(g): Recall that by (e). If then , so is defined and it is equal to by (c). Conversely, if then for some by (f), so for some . By (b), , so . This shows that is a bijection from onto with inverse . Since both and are continuous, we are done.
(h): If then for some such that . Then , so . Moreover, , so .
On the other hand, if and , then and , so .
(i): It is enough to show that, for , . Let , so (by (h)) and . By the definition of , , so and . Since , we must have that, for some , either for all , or for all . We show that at most one satisfies that for all but finitely many . Likewise, there is at most one satisfying for all but finitely many , so .
Assume that satisfy that, for some , for all . If then let be such that and . Without loss of generality, . Since , , so . Then, for some , , so , a contradiction.
Finally, since is countable and contained in ,
(j): By the definition of , we have ∎
Now we analyze the effect of applying the functions and to Borel sets and their respective measures.
Theorem 6.19.
Let and . Then:
-
(a)
If , then . Furthermore, and
-
(b)
If then and . As a consequence,
-
(c)
iff and .
-
(d)
iff and .
-
(e)
iff and . In this case, .
-
(f)
iff and . In this case, .
Proof..
(a): Let . For : if , then ; and if , then is an uncountable interval. Now, by 6.18 (j), 6.11 (b), and because . As a consequence, . Finally, since we have , it follows that:
(c): If , then because is open in . Also, follows by (a). To prove the converse, assume that and . Notice that we can write as a union of four sets:
so it is enough to show that these four sets are Borel in . Since and are countable (see 6.18), and by hypothesis, it only remains to prove that . Since is Borel, , it follows that
(d): The implication from left to right follows by (b) and because is clearly a Borel set. To prove the converse, notice that we can write . Since is countable and by the hypothesis, it is enough to show that is Borel. This holds because, by hypothesis and by (a), .
(e): Assume that . Then, because is Borel. On the other hand, because , is countable, (by 6.18 (g)), is a Borel function, and is Borel in . To prove the converse, notice that we can write . Therefore, it is enough to show that is Borel in . This holds because , is Borel, and and are Borel sets. Finally, follows by (b) and 6.18 (g).
(f): Assume that . Then because is open, and follows by 6.11. To show the converse, write as follows:
Hence, it is enough to prove that the last set is Borel. Since , by (e), .
Finally, regarding the measure, we have . ∎
Remark 6.20.
The converse in Theorem 6.19 (a) does not hold when is uncountable because it contains non-Borel sets that are mapped into the countable . Similarly, the converse in Theorem 6.19 (b) is not true when , because it contains non-Borel subsets whose pre-images are empty.
We can also analyze the completion of .
Definition 6.21.
Denote by the completion of .
In the cases of the Cantor space and the Baire space, we have:
Example 6.22.
- (1)
- (2)
Since every measurable set can be decomposed as a union of a Borel set and a null set, from Theorem 6.19 we get:
Corollary 6.23.
Let and . Then:
-
(a)
iff . In this case, and .
-
(b)
iff and . In this case, we have that and .
-
(c)
iff . In this case, .
-
(d)
iff and . In this case, we have that .
Finally, let us consider the case in which is free, i.e. when every point in has measure zero. Thanks to 6.16 and 6.5 (a), we have the following characterization.
Lemma 6.24.
The following statements are equivalent.
-
(i)
is free.
-
(ii)
for all , i.e. .
-
(iii)
for all .
-
(iv)
is a singleton for all .
As a direct consequence, when is free we get information about the structure of .
Corollary 6.25.
If is free then is a perfect tree.
Proof..
If then , so contains more than two points. This implies that there are two incompatible nodes in above . ∎
When is free, some properties listed in 6.18 and Theorem 6.19 can be simplified. For example, and hence (see Figure 6). As a consequence:
Theorem 6.26.
Assume that is free.
-
(a)
for , i.e. is one-to-one.
-
(b)
If and then .
-
(c)
.
-
(d)
is continuous and .
-
(e)
is a homeomorphism from onto with inverse .
-
(f)
.
Proof..
From 6.24 and Theorem 6.26 (d), it follows, as in 6.7 (1) and (2), that is dense when is free. This is actually a characterization of freeness.
Corollary 6.27.
is free iff is dense in .
Proof..
Assume that is free. Let be such that . Pick some . By Theorem 6.26 (d), there is a such that , hence . Therefore, we can find an such that and , that is, , where . Thus, is dense. To prove the converse, assume that is not free, that is, there exists some such that . Let us show that . Towards a contradiction, assume that there exists a such that . Consider , hence , which is not possible. Thus is not dense in . ∎
As mentioned in 6.20, to establish equivalences in Theorem 6.19 while preserving the measure there, we face two potential obstacles: on the one hand, may be non-empty, and on the other hand, may include non-measurable subsets. The first issue is solved when is free, and the second by completing the measure, since is null in . Thus, under these conditions, Theorem 6.19 and 6.23 are simplified.
Theorem 6.28.
Assume that is free, and . Then:
-
(a)
iff and . In this case, .
-
(b)
iff . In this case .
-
(c)
iff . In this case, .
-
(d)
iff . In this case .
-
(e)
iff . In this case, .
-
(f)
iff and . In this case, .
-
(g)
iff . In this case, .
-
(h)
iff . In this case .
In addition, we can show a connection between and the binary probability tree.
Definition 6.29.
For , denote by the constant sequence of ’s of length . Given a representation of , define by recursion as follows:
-
•
;
-
•
if , define ;
-
•
if , define
-
•
if , define for all .
Let be the smallest subtree of containing , i.e. iff is below some node in .
We list below some basic properties of the function .
Lemma 6.30.
Definition 6.31.
Thanks to 6.30, we can define a map that sends to some extending for all .
Lemma 6.32.
-
(a)
The function is well-defined, i.e. the in 6.31 exists and is unique.
-
(b)
is a topological embedding.
-
(c)
, which is countable.
-
(d)
is onto iff is finitely splitting.
Definition 6.33.
Using , we define a measure on as follows: if for some , set (this value does not depend on because implies ), otherwise, if , pick the largest node such that , and set .
Theorem 6.34.
. Moreover, for any satisfying , and ,
-
(a)
and .
-
(b)
and .
-
(c)
iff , in which case .
-
(d)
iff , in which case .
-
(e)
.
Proof..
Clearly, . Now let . In the case that for some , ; moreover, this can be found as a splitting node of . Then , so , and . Thus, .
One can show by recursion that, for , . This implies that , i.e. . On the other hand, whenever , i.e. for some with , we have that , so and . This shows that , i.e. , which implies that . Now, if then , so is defined, i.e. is the unique extension of . Thus, , which shows that . This proves (a).
For , . This proves that , so . Now, for , is the unique point in . This concludes (b).
7. Probability trees and the null ideal
In this section, we aim to prove the invariance of the cardinal invariants associated with the null ideal, mostly via Tukey connections. Although the most general result (Theorem 7.7) is already known, we offer an elementary and direct proof using probability trees. Our starting point is to show that, whenever is free, and are Tukey equivalent (see 7.3 and Theorem 7.6).
Based on [Voj93], we first introduce key concepts related to relational systems and Tukey connections, which will provide the necessary framework to formalize our results.
Definition 7.1.
We say that is a relational system if , are non-empty sets and is a relation.
-
(1)
A set is -bounded if .
-
(2)
A set is -dominating if .
We can associate two cardinal invariants with relational systems:
-
, the unbounding number of , and
-
, the dominating number of .
We now present some examples of relational systems and their associated cardinal invariants.
Example 7.2.
Let be an ideal on a set containing all its singletons.
-
(1)
is a relational system, and .
-
(2)
is a relational system, and .
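Spelled out in the usual conventions (our reading of the stripped formulas, following the standard literature on cardinal invariants of ideals), these two examples are:

```latex
% For an ideal I on a set X containing all its singletons:
% (1) The relational system (I, I, \subseteq) satisfies
\[
  \mathfrak{b}(I, I, \subseteq) = \operatorname{add}(I),
  \qquad
  \mathfrak{d}(I, I, \subseteq) = \operatorname{cof}(I).
\]
% (2) The relational system (X, I, \in) satisfies
\[
  \mathfrak{b}(X, I, \in) = \operatorname{non}(I),
  \qquad
  \mathfrak{d}(X, I, \in) = \operatorname{cov}(I).
\]
```

For instance, a family in (1) is unbounded iff no single member of the ideal contains all of its members, and the least size of such a family is the additivity of the ideal.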
Now we introduce the notion of Tukey connections, which can be thought of as homomorphisms between relational systems:
Definition 7.3.
Let and be relational systems. We say that is a Tukey connection from into if and are functions such that
In this case, we write and we say that is Tukey-below . When and , we say that and are Tukey equivalent and we denote it by
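For the reader's convenience, the standard formulation of these notions (following [Voj93]), which the definitions above are understood to express, reads as follows:

```latex
% R = (A, B, \sqsubseteq) and R' = (A', B', \sqsubseteq') are relational
% systems; a Tukey connection from R into R' is a pair of maps
\[
  \varphi_-\colon A \to A', \qquad \varphi_+\colon B' \to B
\]
% such that, for all a \in A and b' \in B',
\[
  \varphi_-(a) \sqsubseteq' b' \;\Longrightarrow\; a \sqsubseteq \varphi_+(b').
\]
% In this case R \le_T R', and consequently (as in Lemma 7.4)
\[
  \mathfrak{b}(R') \le \mathfrak{b}(R)
  \quad\text{and}\quad
  \mathfrak{d}(R) \le \mathfrak{d}(R').
\]
```

The inequalities follow because a Tukey connection maps unbounded sets to unbounded sets and pulls dominating sets back to dominating sets.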
Tukey connections determine inequalities between the associated cardinal invariants.
Lemma 7.4.
Let and be two relational systems.
- (a) If then and .
- (b) If then and .
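Although the interesting relational systems are infinite, the minimizations defining the unbounding and dominating numbers can be illustrated on a finite toy system. The following Python sketch (function names are ours, not from the article) brute-forces both invariants:

```python
from itertools import combinations

def is_bounded(B, Y, rel):
    # B is R-bounded iff some single y in Y lies above every x in B.
    return any(all(rel(x, y) for x in B) for y in Y)

def is_dominating(D, X, rel):
    # D is R-dominating iff every x in X lies below some y in D.
    return all(any(rel(x, y) for y in D) for x in X)

def b_number(X, Y, rel):
    # Least size of an R-unbounded subset of X (None if every subset is bounded).
    for k in range(len(X) + 1):
        if any(not is_bounded(B, Y, rel) for B in combinations(X, k)):
            return k
    return None

def d_number(X, Y, rel):
    # Least size of an R-dominating subset of Y (None if no dominating set exists).
    for k in range(len(Y) + 1):
        if any(is_dominating(D, X, rel) for D in combinations(Y, k)):
            return k
    return None

# Toy system R = ({0,1,2}, {0,1,2}, =): a set is bounded iff it has at
# most one element, and only the whole of Y is dominating.
X = Y = [0, 1, 2]
eq = lambda x, y: x == y
print(b_number(X, Y, eq), d_number(X, Y, eq))  # → 2 3
```

On this toy system the unbounding number is 2 and the dominating number is 3, matching the general fact that any two-element set already fails to sit below a single point of the equality relation.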
To prove the results of this section, we introduce several maps. Let and (with some representation), and consider as in 6.17. We define the maps
For this section, we extend to in such a way that, for , is some -preimage of , if it exists, otherwise is sent to some arbitrary point in .
Lemma 7.5.
Let , , , , and . Then:
- (a) implies .
- (b) If is free, then implies .
- (c) implies .
- (d) If then implies .
- (e) If is free, then implies .
Proof.
This is easy to check, using also that, by Theorem 6.26 (d), whenever is free. ∎
As a consequence, for the null ideal we obtain:
Theorem 7.6.
Let . If is free then and
Proof.
Although the following generalization of Theorem 7.6 is already known, we offer an alternative proof using our results. Recall that a Polish space is a completely metrizable separable space, and that a Borel isomorphism between topological spaces is a bijection that sends Borel sets into Borel sets in both directions.
Theorem 7.7 ([Kec95, Thm. 17.41]).
If is a Polish space and is a free probability measure, then there is some Borel isomorphism such that, for any Borel , . In particular, and .
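In standard notation, with $\lambda$ denoting the Lebesgue measure on $[0,1]$, the displayed conclusion of [Kec95, Thm. 17.41] reads:

```latex
\lambda(f[B]) = \mu(B) \quad \text{for every Borel } B \subseteq X,
```

so the Borel isomorphism $f$ carries the null sets of $\mu$ exactly onto the Lebesgue-null sets.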
Proof.
Since is a free probability measure on , must be uncountable, so there exists a Borel isomorphism . Let be the probability measure on induced by , i.e. for . It is clear that is free, so it is enough to prove the theorem for instead of .
Pick some such that , and pick some infinite countable and let . We proceed by cases. First assume that is countable. Hence, we can construct a function such that equals at and . This is indeed a Borel bijection (hence isomorphism) and, by Theorem 6.28, it preserves measure as desired.
Now consider the case when is uncountable. Let be the ternary Cantor set in , which is closed and . Pick a sequence of pairwise disjoint closed intervals of positive length contained in such that for all , and let be the image of under the canonical homeomorphism from onto (i.e. the line segment from to ), so resembles the Cantor ternary set in ; thus it is closed and has measure zero.
Define the map such that it is the identity on and is a linear isomorphism onto for all . This is a Borel isomorphism that preserves the Lebesgue measure. Finally, define such that coincides with at , , and is a Borel isomorphism onto . This is the desired Borel isomorphism. ∎
We also have a similar result for the Lebesgue measure on .
Corollary 7.8.
If is a Polish space, is a free -finite measure and , then there is some Borel isomorphism such that, for any Borel , . In particular, and .
Proof.
Partition into Borel sets such that for each and . Then, we can partition into semi-open intervals such that each has length . By Theorem 7.7, there is some Borel isomorphism preserving measure (this uses that, for any , there is some finer Polish topology on such that its Borel -algebra is and is clopen, see e.g. [Kec95, Thm. 13.1]). Thus, we obtain the desired by putting all the together. ∎
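The bookkeeping in this proof can be summarized schematically as follows (the names $X_n$, $I_n$, $f_n$ are ours):

```latex
X = \bigsqcup_{n<\omega} X_n \ \text{(Borel, of finite measure)}, \qquad
[0,\infty) = \bigsqcup_{n<\omega} I_n \ \text{(semi-open, } \lambda(I_n) = \mu(X_n)\text{)},
\qquad f := \bigcup_{n<\omega} f_n,
```

where each $f_n \colon X_n \to I_n$ is a measure-preserving Borel isomorphism obtained from Theorem 7.7 after rescaling both restricted measures to probability measures.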
Corollary 7.9.
If is a Polish space, is a free -finite measure and then and . In particular,
References
- [AS16] Noga Alon and Joel H. Spencer. The probabilistic method. Wiley Series in Discrete Mathematics and Optimization. John Wiley & Sons, Inc., Hoboken, NJ, fourth edition, 2016.
- [CMU24] Miguel A. Cardona, Diego A. Mejía, and Andrés F. Uribe-Zapata. A general theory of iterated forcing using finitely additive measures. Preprint, arXiv:2406.09978, 2024.
- [FF80] Jan M. Friedman and R. David Fish. The use of probability trees in genetic counselling. Clinical Genetics, 18, 1980.
- [FSNG23] Yangqing Fu, Ming Sun, Buqing Nie, and Yue Gao. Accelerating Monte Carlo tree search with probability tree state abstraction. Preprint, arXiv:2310.06513, 2023.
- [GMD+20] Tim Genewein, Tom McGrath, Grégoire Déletang, Vladimir Mikulik, Miljan Martic, Shane Legg, and Pedro A. Ortega. Algorithms for causal reasoning in probability trees. Preprint, arXiv:2010.12237, 2020.
- [Gri89] R. C. Griffiths. Genealogical-tree probabilities in the infinitely-many-site model. J. Math. Biol., 27(6):667–680, 1989.
- [GT95] R. C. Griffiths and Simon Tavaré. Unrooted genealogical tree probabilities in the infinitely-many-sites model. Mathematical Biosciences, 127(1):77–98, 1995.
- [Hal50] Paul R. Halmos. Measure Theory. D. Van Nostrand Co., Inc., New York, 1950.
- [Kec95] Alexander S. Kechris. Classical Descriptive Set Theory. Springer New York, NY, 1995.
- [KST19] Jakob Kellner, Saharon Shelah, and Anda R. Tănasie. Another ordering of the ten cardinal characteristics in Cichoń’s diagram. Comment. Math. Univ. Carolin., 60(1):61–95, 2019.
- [KWWC24] Aneesh Komanduri, Xintao Wu, Yongkai Wu, and Feng Chen. From identifiable causal representations to controllable counterfactual generation: A survey on causal generative modeling. Transactions on Machine Learning Research, 2024.
- [Lev02] Azriel Levy. Basic set theory. Dover Publications, Inc., Mineola, NY, 2002. Reprint of the 1979 original [Springer, Berlin].
- [MU24] Diego A. Mejía and Andrés F. Uribe-Zapata. The measure algebra adding -many random reals is --linked. Topology Appl., 2024. To appear, arXiv:2312.13443.
- [She00] Saharon Shelah. Covering of the null ideal may have countable cofinality. Fund. Math., 166(1-2):109–136, 2000.
- [Uri23] Andrés F. Uribe-Zapata. Iterated forcing with finitely additive measures: applications of probability to forcing theory. Master’s thesis, Universidad Nacional de Colombia, sede Medellín, 2023. https://shorturl.at/sHY59.
- [Voj93] Peter Vojtáš. Generalized Galois-Tukey-connections between explicit relations on classical objects of real analysis. In Set theory of the reals (Ramat Gan, 1991), volume 6 of Israel Math. Conf. Proc., pages 619–643. Bar-Ilan Univ., Ramat Gan, 1993.
- [ZM18] Cheng Zhang and Frederick A. Matsen IV. Generalizing tree probability estimation via Bayesian networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.