Hitting Times for Continuous-Time Imprecise-Markov Chains
Uncertainty in Artificial Intelligence – Eindhoven University of Technology
Abstract
We study the problem of characterizing the expected hitting times for a robust generalization of continuous-time Markov chains. This generalization is based on the theory of imprecise probabilities, and the models with which we work essentially constitute sets of stochastic processes. Their inferences are tight lower- and upper bounds with respect to variation within these sets.
We consider three distinct types of these models, corresponding to different levels of generality and structural independence assumptions on the constituent processes.
Our main results are twofold. First, we demonstrate that the expected hitting times coincide for all three types. Second, we show that these inferences are described by a straightforward generalization of a well-known linear system of equations that characterizes expected hitting times for traditional time-homogeneous continuous-time Markov chains.
1 Introduction
We consider the problem of characterizing the expected hitting times for continuous-time imprecise-Markov chains [Škulj, 2015, Krak et al., 2017, Krak, 2021, Erreygers, 2021]. These are robust, set-valued generalizations of (traditional) Markov chains [Norris, 1998], based on the theory of imprecise probabilities [Walley, 1991, Augustin et al., 2014]. From a sensitivity-analysis perspective, we may interpret these sets as hedging against model-uncertainties with respect to a model’s numerical parameters and/or structural (independence) assumptions.
The inference problem of hitting times essentially deals with the question of how long it will take the underlying system to reach some particular subset of its states. This is a common and important problem in fields such as reliability analysis, where it can capture the expected time-to-failure of a system, and epidemiology, where it can model the expected time-until-extinction of an epidemic. For imprecise-Markov chains, then, we are interested in evaluating these quantities in a manner that is robust against, and conservative with respect to, any variation that is compatible with one's uncertainty about the model specification.
Erreygers [2021] has recently obtained some partial results towards characterizing such inferences, but has not been able to give a complete characterization and has largely studied the finite-time horizon case. The problem of hitting times for discrete-time imprecise-Markov chains was previously studied by Krak et al. [2019], Krak [2020]. In the present work, we largely emulate and extend their results to the continuous-time setting.
We will be concerned with three different types of imprecise-Markov chains. These are all sets of stochastic processes that are in a specific sense compatible with a given set of numerical parameters, but the three types differ in the independence properties of their elements. In particular, they correspond to (i) a set of (time-)homogeneous Markov chains, (ii) a set of (not-necessarily homogeneous) Markov chains, and (iii) a set of general—not-necessarily homogeneous nor Markovian—stochastic processes. It is known (and perhaps not very surprising) that inferences with respect to these three models do not in general agree; see e.g. [Krak, 2021] for a detailed analysis of their differences.
However, our first main result in this work is that the expected hitting time is the same for these three different types of models. Besides being of theoretical interest, we want to emphasize the power of this result: it means that even if a practitioner using Markov chains is uncertain whether the system they are studying is truly homogeneous and/or Markovian, relaxing these assumptions does not influence inferences about the hitting times. Purely pragmatically, it also means that we can use computational methods tailored to any one of these types of models to compute these inferences.
Our second main result is that these hitting times are characterized by a generalization of a well-known system of equations that holds for continuous-time homogeneous Markov chains; see Proposition 2 for this linear system.
The remainder of this paper is structured as follows. In Section 2 we introduce the basic required concepts that we will use throughout, formalizing the notion of stochastic processes and defining the inference problem of interest. In Section 3, we define the various types of imprecise-Markov chains that we use throughout this work. We spend some effort in Section 4 to study the transition dynamics of these models, from a perspective that is particularly relevant for the inference problem of hitting times. In Section 5 we explain and sketch the proofs of our main results, and we give a summary in Section 6.
Because we have quite a lot of conceptual material to cover before we can explain our main results, we are not able to fit any real proofs in the main body of this work. Instead, these—together with a number of technical lemmas—have largely been relegated to the supplementary material.
2 Preliminaries
Throughout, we consider a fixed, finite state space $\mathcal{X}$ with at least two elements. This set contains all possible values for some abstract underlying process. An element $x \in \mathcal{X}$ is called a state, and is usually generically denoted as $x$ or $y$.

We use $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, and $\mathbb{R}_{>0}$ to denote the reals, the non-negative reals, and the positive reals, respectively. $\mathbb{N}$ denotes the natural numbers without zero, and we let $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$.

For any non-empty $\mathcal{Y} \subseteq \mathcal{X}$, we use $\mathcal{L}(\mathcal{Y})$ to denote the vector space of real-valued functions on $\mathcal{Y}$; in particular, $\mathcal{L}(\mathcal{X})$ denotes the space of all real functions on $\mathcal{X}$. We use $\|\cdot\|$ to denote the supremum norm on any such space; for any $f \in \mathcal{L}(\mathcal{Y})$ we let $\|f\| := \max_{x \in \mathcal{Y}} |f(x)|$. Throughout, we make extensive use of indicator functions $\mathbb{I}_B$, which are defined for all $B \subseteq \mathcal{X}$ as $\mathbb{I}_B(x) := 1$ if $x \in B$ and $\mathbb{I}_B(x) := 0$, otherwise. We use the shorthand $\mathbb{I}_x := \mathbb{I}_{\{x\}}$. Let $1$ denote the function that is identically equal to one; its dimensionality is to be understood from context.

A map $M : \mathcal{L}(\mathcal{Y}) \to \mathcal{L}(\mathcal{Y})$ is also called an operator, and we denote its evaluation in $f \in \mathcal{L}(\mathcal{Y})$ as $Mf$. If it holds for all $f \in \mathcal{L}(\mathcal{Y})$ and all $\lambda \in \mathbb{R}_{\geq 0}$ that $M(\lambda f) = \lambda Mf$, then $M$ is called non-negatively homogeneous. For any non-negatively homogeneous operator $M$ on $\mathcal{L}(\mathcal{Y})$, we define the induced operator norm $\|M\| := \sup\{\|Mf\| : f \in \mathcal{L}(\mathcal{Y}), \|f\| = 1\}$. We reserve the symbol $I$ to denote the identity operator on any space; the domain is to be understood from context.

Note that any linear operator is also non-negatively homogeneous. Moreover, if $M$ is linear it can be represented as a $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix by arbitrarily fixing an ordering on $\mathcal{Y}$. However, without fixing such an ordering, we simply use $M(x,y)$ to denote the entry in the $x$-row and $y$-column of such a matrix, for any $x, y \in \mathcal{Y}$. For any $f \in \mathcal{L}(\mathcal{Y})$ and $x \in \mathcal{Y}$ we then have $[Mf](x) = \sum_{y \in \mathcal{Y}} M(x,y) f(y)$, so that $Mf$ simply represents the usual matrix-vector product of $M$ with the (column) vector $f$. In the sequel, we interchangeably refer to linear operators also as matrices. We note the well-known equality $\|M\| = \max_{x \in \mathcal{Y}} \sum_{y \in \mathcal{Y}} |M(x,y)|$ for the induced matrix norm.
2.1 Processes & Markov Chains
We now turn to stochastic processes, which are fundamentally the subject of this work. The typical (measure-theoretic) way to define a stochastic process is simply as a family of random variables with index set . This index set represents the time domain of the stochastic process. The random variables are understood to be taken with respect to some underlying probability space , where is a set of sample paths, which are functions from to representing possible realizations of the evolution of the underlying process through . The random variables , are canonically the maps on .
However, for our purposes it will be more convenient to instead refer to the probability measure as the stochastic process. Different processes may then be taken over the same measurable space , using the same canonical variables for all these processes.
In this work we will use both discrete- and continuous-time stochastic processes, which corresponds to choosing or , respectively. In both cases we take to be the -algebra generated by the cylinder sets; this ensures that all functions that we consider are measurable.
In the discrete-time case, we let $\Omega$ be the set of all functions from $\mathbb{N}_0$ to $\mathcal{X}$. A discrete-time stochastic process is then simply a probability measure $P$ on $(\Omega, \mathcal{F})$. Moreover, $P$ is said to be a Markov chain if it satisfies the (discrete-time) Markov property, meaning that
$$P(X_{n+1} = x_{n+1} \mid X_0 = x_0, \dots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$$
for all $n \in \mathbb{N}_0$ and all $x_0, \dots, x_{n+1} \in \mathcal{X}$. If, additionally, it holds for all $n \in \mathbb{N}_0$ and all $x, y \in \mathcal{X}$ that
$$P(X_{n+1} = y \mid X_n = x) = P(X_1 = y \mid X_0 = x),$$
then $P$ is said to be a (time-)homogeneous Markov chain. We use $\mathbb{P}$, $\mathbb{P}^{\mathrm{M}}$, and $\mathbb{P}^{\mathrm{HM}}$ to denote, respectively, the set of all discrete-time stochastic processes; the set of all discrete-time Markov chains; and the set of all discrete-time homogeneous Markov chains.
In the continuous-time case, we let $\Omega$ be the set of all càdlàg functions from $\mathbb{R}_{\geq 0}$ to $\mathcal{X}$. A continuous-time stochastic process is a probability measure $P$ on $(\Omega, \mathcal{F})$. The process $P$ is said to be a Markov chain if it satisfies the (continuous-time) Markov property,
$$P(X_t = y \mid X_{t_1} = x_1, \dots, X_{t_n} = x_n, X_s = x) = P(X_t = y \mid X_s = x)$$
for all $n \in \mathbb{N}_0$, all $t_1 < \dots < t_n < s \leq t$ in $\mathbb{R}_{\geq 0}$, and all $x_1, \dots, x_n, x, y \in \mathcal{X}$. If, additionally, it holds that
$$P(X_t = y \mid X_s = x) = P(X_{t-s} = y \mid X_0 = x)$$
for all $x, y \in \mathcal{X}$ and all $s, t \in \mathbb{R}_{\geq 0}$ with $s \leq t$, then $P$ is said to be a (time-)homogeneous Markov chain. We use $\mathbb{P}_c$, $\mathbb{P}_c^{\mathrm{M}}$, and $\mathbb{P}_c^{\mathrm{HM}}$ to denote, respectively, the set of all continuous-time stochastic processes; the set of all continuous-time Markov chains; and the set of all continuous-time homogeneous Markov chains.
We refer to [Norris, 1998] for an excellent further introduction to discrete-time and continuous-time Markov chains.
2.2 Transition Dynamics
Throughout this work, we make extensive use of operator-theoretic representations of the behavior of stochastic processes, and Markov chains in particular. The first reason for this is that such operators serve as a way to parameterize Markov chains. Moreover, they are also useful as a computational tool, since they can often be used to express inferences of interest; see, e.g., Propositions 1 and 2 further on. We introduce the basic concepts below, and refer to e.g. [Norris, 1998] for details.
A transition matrix $T$ is a linear operator on $\mathcal{L}(\mathcal{X})$ such that, for all $x \in \mathcal{X}$, it holds that $T(x,y) \geq 0$ for all $y \in \mathcal{X}$, and $\sum_{y \in \mathcal{X}} T(x,y) = 1$. There is an important and well-known connection between Markov chains and transition matrices; for any discrete-time homogeneous Markov chain $P$, we can define the corresponding transition matrix $T$ as
$$T(x,y) := P(X_1 = y \mid X_0 = x) \quad \text{for all } x, y \in \mathcal{X}.$$
Since $P$ is a probability measure, we clearly have that $T$ is a transition matrix. Conversely, a given transition matrix $T$ uniquely determines a discrete-time homogeneous Markov chain $P$ with $P(X_1 = y \mid X_0 = x) = T(x,y)$, up to the specification of the initial distribution $P(X_0)$. For this reason, transition matrices are often taken as a crucial parameter to specify (discrete-time, homogeneous) Markov chains.
Analogously, for a (non-homogeneous) discrete-time Markov chain , we might define a family of time-dependent corresponding transition matrices, with
for all and . Conversely, any family of transition matrices uniquely determines a discrete-time Markov chain with for all , again up to the specification of .
In the continuous-time setting, transition matrices are also of great importance. However, it will be instructive to first introduce rate matrices. A rate matrix $Q$ is a linear operator on $\mathcal{L}(\mathcal{X})$ such that, for all $x \in \mathcal{X}$, it holds that $Q(x,y) \geq 0$ for all $y \in \mathcal{X}$ with $y \neq x$, and $\sum_{y \in \mathcal{X}} Q(x,y) = 0$.
For any rate matrix $Q$ and any $t \in \mathbb{R}_{\geq 0}$, the matrix exponential of $Qt$ can be defined as [Van Loan, 2006]
$$e^{Qt} := \sum_{k=0}^{+\infty} \frac{t^k Q^k}{k!}.$$
An alternative characterization is as the (unique) solution to the matrix ordinary differential equation [Van Loan, 2006]
$$\frac{\mathrm{d}}{\mathrm{d}t} e^{Qt} = Q e^{Qt}, \qquad e^{Q0} = I. \tag{1}$$
For any $s, t \in \mathbb{R}_{\geq 0}$ it holds that $e^{Q(s+t)} = e^{Qs} e^{Qt}$, and we immediately have $e^{Q0} = I$. The family $\{e^{Qt}\}_{t \in \mathbb{R}_{\geq 0}}$ is therefore called the semigroup generated by $Q$, and $Q$ is called the generator of this semigroup. Moreover, for any rate matrix $Q$ and any $t \in \mathbb{R}_{\geq 0}$, $e^{Qt}$ is a transition matrix [Norris, 1998, Thm 2.1.2].
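The semigroup structure above can be checked numerically. The sketch below uses a hypothetical two-state rate matrix (not taken from the paper) and a truncated power series for the matrix exponential; it verifies that each $e^{Qt}$ is a transition matrix and that the semigroup property holds.

```python
# Hypothetical 2-state rate matrix: non-negative off-diagonal entries,
# rows summing to zero.
Q = [[-2.0, 2.0],
     [1.0, -1.0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(Q, t, terms=60):
    """Truncated power series for the matrix exponential e^{Qt}."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v * (t / k) for v in row] for row in mat_mul(term, Q)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    return result

T_half = expm(Q, 0.5)
T_one = expm(Q, 1.0)
# Each row of e^{Qt} is a probability distribution, and the semigroup
# property gives e^{Q(0.5+0.5)} = e^{Q 0.5} e^{Q 0.5}.
prod = mat_mul(T_half, T_half)
```

For this toy matrix, `prod` and `T_one` agree up to floating-point error, and every row of `T_half` sums to one.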
Now let us consider a continuous-time homogeneous Markov chain $P$, and define the corresponding transition matrix $T_t$ (note that in continuous time, we always have to measure the transition-time interval to specify these matrices) for all $t \in \mathbb{R}_{\geq 0}$ and all $x, y \in \mathcal{X}$ as
$$T_t(x,y) := P(X_t = y \mid X_0 = x). \tag{2}$$
It turns out that there is then a unique rate matrix $Q$ associated with $P$ such that $T_t = e^{Qt}$ for all $t \in \mathbb{R}_{\geq 0}$. By combining Equations (1) and (2), we can identify $Q$ as
$$Q = \lim_{t \to 0^+} \frac{T_t - I}{t}.$$
As before, in the other direction we have that any fixed rate matrix $Q$ uniquely determines a continuous-time homogeneous Markov chain $P$ with $T_t = e^{Qt}$, up to the specification of $P(X_0)$. For this reason, rate matrices are often used to specify (continuous-time, homogeneous) Markov chains.
Let us finally consider a (not-necessarily homogeneous) continuous-time Markov chain $P$. For any $s, t \in \mathbb{R}_{\geq 0}$ with $s \leq t$, we can then define a transition matrix $T_s^t$ with, for all $x, y \in \mathcal{X}$, $T_s^t(x,y) := P(X_t = y \mid X_s = x)$. Under appropriate assumptions of differentiability, this induces a family of rate matrices $\{Q_s\}_{s \in \mathbb{R}_{\geq 0}}$, as
$$Q_s := \lim_{t \to s^+} \frac{T_s^t - I}{t - s}. \tag{3}$$
In the converse direction we might try to reconstruct the transition matrices of $P$ by solving the matrix ordinary differential equation(s)
$$\frac{\mathrm{d}}{\mathrm{d}t} T_s^t = T_s^t Q_t, \qquad T_s^s = I. \tag{4}$$
By comparing with Equation (1), we see that in the special case where $Q_t$ does not depend on $t$—that is, where $P$ is homogeneous with $Q_t = Q$, say—we indeed obtain $T_s^t = e^{Q(t-s)}$. However, in general the non-autonomous system (4) does not have such a closed-form solution, and we cannot move beyond this implicit characterization.
2.3 Hitting Times
We now have all the pieces to introduce the inference problem that is the subject of this work, viz. the expected hitting times of some non-empty set of states $A \subset \mathcal{X}$ with respect to a particular stochastic process. We take this set $A$ to be fixed for the remainder of this work.
In the discrete-time case, we consider the extended real-valued function $\tau_A : \Omega \to \mathbb{N}_0 \cup \{+\infty\}$ (we adopt the conventions that $\inf \emptyset := +\infty$, that $0 \cdot (+\infty) := 0$, and, for any $c \in \mathbb{R}_{>0}$, that $c \cdot (+\infty) := +\infty$ and $c + (+\infty) := +\infty$) given by
$$\tau_A(\omega) := \inf\{n \in \mathbb{N}_0 : \omega(n) \in A\} \quad \text{for all } \omega \in \Omega.$$
This captures the number of steps before a process “hits” any state in $A$. The expected hitting time for a discrete-time process $P$ starting in $x \in \mathcal{X}$ is then defined as
$$h_P(x) := E_P[\tau_A \mid X_0 = x].$$
We use $h_P$ to denote the extended real-valued function on $\mathcal{X}$ given by $x \mapsto h_P(x)$. When dealing with homogeneous Markov chains, this quantity has the following simple characterization:
Proposition 1.
[Norris, 1998, Thm 1.3.5] Let $P$ be a discrete-time homogeneous Markov chain with corresponding transition matrix $T$. Then $h_P$ is the minimal non-negative solution to the linear system
$$h = \mathbb{I}_{A^c}(1 + Th),$$
where, throughout, for any $f, g \in \mathcal{L}(\mathcal{X})$, the quantity $fg$ is understood as the pointwise product of the functions $f$ and $g$. (Strictly speaking this requires extending the domain of $T$ to extended real-valued functions, but we will shortly introduce some assumptions that obviate such an exposition.)
In the continuous-time case, the definition is analogous; we introduce a function $\tau_A : \Omega \to \mathbb{R}_{\geq 0} \cup \{+\infty\}$ as
$$\tau_A(\omega) := \inf\{t \in \mathbb{R}_{\geq 0} : \omega(t) \in A\}.$$
This function measures the time until a process “hits” any state in $A$ on a given sample path. The expected hitting time for a continuous-time process $P$ starting in $x \in \mathcal{X}$ is
$$h_P(x) := E_P[\tau_A \mid X_0 = x].$$
We again use $h_P$ to denote the extended real-valued function on $\mathcal{X}$ given by $x \mapsto h_P(x)$. Also in this case, the characterization for homogeneous Markov chains is particularly simple:
Proposition 2.
[Norris, 1998, Thm 3.3.3] Let $P$ be a continuous-time homogeneous Markov chain with rate matrix $Q$ such that $Q(x,x) < 0$ for all $x \in A^c$. Then $h_P$ is the minimal non-negative solution to
$$h(x) = 0 \text{ for all } x \in A, \qquad [Qh](x) = -1 \text{ for all } x \in A^c. \tag{5}$$
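The system in Proposition 2 is easy to solve numerically. The sketch below uses a hypothetical three-state rate matrix with target set consisting of the (absorbing) third state, restricts the rate matrix to the remaining states, and solves the resulting linear system with a small Gaussian-elimination routine; all numbers are illustrative.

```python
# Hypothetical 3-state chain; state 2 is the absorbing target set A = {2}.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [0.0, 0.0, 0.0]]
A = {2}

def solve(M, b):
    """Gauss-Jordan elimination with partial pivoting for small systems."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][-1] / aug[i][i] for i in range(n)]

# Restrict Q to the non-target states and solve [Qh](x) = -1 there;
# h is zero on A by definition.
states = [x for x in range(len(Q)) if x not in A]
Q_sub = [[Q[x][y] for y in states] for x in states]
h = solve(Q_sub, [-1.0] * len(states))
# For this example, h ≈ [0.8, 0.6].
```

Since the states in $A$ are absorbing, dropping their rows and columns loses no information; this is exactly the subgenerator construction used later in Section 4.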
3 Imprecise-Markov Chains
Let us now introduce imprecise-Markov chains [Hermans and Škulj, 2014, Škulj, 2015, Krak et al., 2017], which are the stochastic processes that we aim to study in this work. Their characterization is based on the theory of imprecise probabilities [Walley, 1991, Augustin et al., 2014].
We here adopt the “sensitivity analysis” interpretation of imprecise probabilities. This means that we represent an imprecise-Markov chain simply as a set $\mathcal{P}$ of stochastic processes. Intuitively, the idea is that we collect in $\mathcal{P}$ all (traditional, “precise”) stochastic processes that we deem to plausibly capture the dynamics of the underlying system of interest. Inferences with respect to $\mathcal{P}$ are defined using lower- and upper expectations, given respectively as
$$\underline{E}[f] := \inf_{P \in \mathcal{P}} E_P[f] \quad \text{and} \quad \overline{E}[f] := \sup_{P \in \mathcal{P}} E_P[f].$$
So, their inferences represent robust—i.e. conservative—and tight lower- and upper bounds on inferences with respect to all stochastic processes that we deem to be plausible.
3.1 Sets of Processes & Types
We already mentioned that an imprecise-Markov chain is essentially simply a set of stochastic processes. Let us now consider how to define such sets.
We start by considering the discrete-time case; then, clearly, $\mathcal{P}$ will be a set of discrete-time processes. We will parameterize such a set with some non-empty set $\mathcal{T}$ of transition matrices. Our aim is then to include in $\mathcal{P}$ all processes that are in some sense “compatible” with $\mathcal{T}$. (We will not constrain the initial models of the elements of $\mathcal{P}$, since in any case such a choice would not influence the inferences that we study in this work.) However, at this point we are faced with a choice about which type of processes to include in this set, and these different choices lead to different types of imprecise-Markov chains.

Arguably the conceptually most simple model is $\mathcal{P}^{\mathrm{HM}}_{\mathcal{T}}$, which contains all homogeneous Markov chains whose corresponding transition matrix is included in $\mathcal{T}$:
$$\mathcal{P}^{\mathrm{HM}}_{\mathcal{T}} := \{P \in \mathbb{P}^{\mathrm{HM}} : T \in \mathcal{T}\},$$
where $T$ denotes the transition matrix corresponding to $P$.

However, we could instead consider $\mathcal{P}^{\mathrm{M}}_{\mathcal{T}}$, which is the set of all (not-necessarily homogeneous) Markov chains whose time-dependent transition matrices are contained in $\mathcal{T}$:
$$\mathcal{P}^{\mathrm{M}}_{\mathcal{T}} := \{P \in \mathbb{P}^{\mathrm{M}} : T_n \in \mathcal{T} \text{ for all } n \in \mathbb{N}_0\}.$$

The last choice that we consider here is the set $\mathcal{P}^{\mathrm{I}}_{\mathcal{T}}$, which essentially contains all discrete-time processes whose single-step transition dynamics are described by $\mathcal{T}$. Its characterization is more cumbersome, since we have not expressed these general processes in terms of transition matrices, but we can say that it is the set of all $P \in \mathbb{P}$ such that for all $n \in \mathbb{N}_0$ and all $x_0, \dots, x_n \in \mathcal{X}$, there is some $T \in \mathcal{T}$ such that for all $y \in \mathcal{X}$ it holds that
$$P(X_{n+1} = y \mid X_0 = x_0, \dots, X_n = x_n) = T(x_n, y).$$
This last type is called an imprecise-Markov chain under epistemic irrelevance, whence the superscript ‘I’.
Note that the three types $\mathcal{P}^{\mathrm{HM}}_{\mathcal{T}}$, $\mathcal{P}^{\mathrm{M}}_{\mathcal{T}}$, and $\mathcal{P}^{\mathrm{I}}_{\mathcal{T}}$ capture not only “plausible” variation in terms of parameter uncertainty—expressed through the set $\mathcal{T}$—but also variation in terms of the structural independence conditions that we consider! So, from an applied perspective, if someone is not sure whether the underlying system that they are studying is truly Markovian and/or time-homogeneous, they might choose to use different such sets in their analysis.
In the continuous-time case, we again proceed analogously. First, we fix a non-empty set $\mathcal{Q}$ of rate matrices, which will be the parameter for our models. We then first consider the set $\mathcal{P}^{\mathrm{HM}}_{\mathcal{Q}}$ of all homogeneous Markov chains whose rate matrix is included in $\mathcal{Q}$:
$$\mathcal{P}^{\mathrm{HM}}_{\mathcal{Q}} := \{P \in \mathbb{P}_c^{\mathrm{HM}} : Q \in \mathcal{Q}\},$$
where $Q$ denotes the rate matrix corresponding to $P$.
The other two types are constructed in analogy to the discrete-time case, but unfortunately we don’t have the space for a complete exposition of their characterization. Instead we refer the interested reader to [Krak et al., 2017, Krak, 2021] for an in-depth study of these different types and comparisons between them; in what follows we limit ourselves to a largely intuitive specification.
The model $\mathcal{P}^{\mathrm{M}}_{\mathcal{Q}}$ is the set of all continuous-time (not-necessarily homogeneous) Markov chains whose transition dynamics are compatible with $\mathcal{Q}$ at every point in time. This includes in particular all Markov chains satisfying the appropriate differentiability assumptions to meaningfully say that the time-dependent rate matrices $Q_t$—as in Equation (3)—are included in $\mathcal{Q}$ for all $t \in \mathbb{R}_{\geq 0}$. However, $\mathcal{P}^{\mathrm{M}}_{\mathcal{Q}}$ also contains other processes that are not (everywhere) differentiable; see e.g. [Krak, 2021, Sec 4.6 and 5.2] for the technical details.
The most involved model to explain is again $\mathcal{P}^{\mathrm{I}}_{\mathcal{Q}}$, which includes all continuous-time processes whose time- and history-dependent transition dynamics can be described using elements of $\mathcal{Q}$. It includes, but is not limited to, appropriately differentiable processes $P$ such that for all $n \in \mathbb{N}_0$, all $t_1 < \dots < t_n < s$ in $\mathbb{R}_{\geq 0}$, and all $x_1, \dots, x_n, x \in \mathcal{X}$, there is some $Q \in \mathcal{Q}$ such that for all $y \in \mathcal{X}$ it holds that
$$\lim_{t \to s^+} \frac{P(X_t = y \mid X_{t_1} = x_1, \dots, X_{t_n} = x_n, X_s = x) - \mathbb{I}_x(y)}{t - s} = Q(x, y).$$
We again refer to [Krak, 2021, Sec 4.6 and 5.2] for the technical details involving the additional elements of $\mathcal{P}^{\mathrm{I}}_{\mathcal{Q}}$ that are not appropriately differentiable. Importantly, we note the nested structure [Krak, 2021, Prop 5.9]
$$\mathcal{P}^{\mathrm{HM}}_{\mathcal{Q}} \subseteq \mathcal{P}^{\mathrm{M}}_{\mathcal{Q}} \subseteq \mathcal{P}^{\mathrm{I}}_{\mathcal{Q}},$$
where the inclusions are strict provided $\mathcal{Q}$ is not trivial.
For notational convenience, we will use identical sub- and superscripts to denote the corresponding lower- and upper expectations for any of these imprecise-Markov chains; e.g., we let $\underline{E}^{\mathrm{HM}}_{\mathcal{Q}}[\cdot] := \inf\{E_P[\cdot] : P \in \mathcal{P}^{\mathrm{HM}}_{\mathcal{Q}}\}$.
3.2 Imprecise Transition Dynamics
Let us now introduce some machinery to describe the dynamics of imprecise-Markov chains. In particular, we here move from the set-valued parameters $\mathcal{T}$ and $\mathcal{Q}$ used in Section 3.1 to their dual representations; these are operators that can serve as computational tools.

In Section 3.1, we described discrete-time imprecise-Markov chains using non-empty sets $\mathcal{T}$ of transition matrices. With any such set, we can associate the corresponding lower- and upper transition operators $\underline{T}$ and $\overline{T}$ on $\mathcal{L}(\mathcal{X})$, defined respectively as
$$\underline{T}f := \inf_{T \in \mathcal{T}} Tf \quad \text{and} \quad \overline{T}f := \sup_{T \in \mathcal{T}} Tf \quad \text{for all } f \in \mathcal{L}(\mathcal{X}),$$
where the infimum and supremum are taken pointwise.
More generally, any operator $\underline{T}$ (resp. $\overline{T}$) on $\mathcal{L}(\mathcal{X})$ is a lower (resp. upper) transition operator if for all $f, g \in \mathcal{L}(\mathcal{X})$, all $\lambda \in \mathbb{R}_{\geq 0}$, and all $x \in \mathcal{X}$, it holds that [De Bock, 2017]

1. $[\underline{T}f](x) \geq \min_{y \in \mathcal{X}} f(y)$ and $[\overline{T}f](x) \leq \max_{y \in \mathcal{X}} f(y)$;

2. $\underline{T}(f + g) \geq \underline{T}f + \underline{T}g$ and $\overline{T}(f + g) \leq \overline{T}f + \overline{T}g$; and

3. $\underline{T}(\lambda f) = \lambda \underline{T}f$ and $\overline{T}(\lambda f) = \lambda \overline{T}f$.
It should be noted that lower- and upper transition operators are conjugate, in that any lower transition operator $\underline{T}$ induces a corresponding upper transition operator $\overline{T} := -\underline{T}(-\,\cdot\,)$, and vice versa. Moreover, any transition matrix $T$ is also a lower—and, by its linearity, upper—transition operator.

It is easily verified that the lower- and upper transition operators corresponding to a given non-empty set $\mathcal{T}$ are, indeed, lower- and upper transition operators. Conversely, with a given lower transition operator $\underline{T}$, we can associate the set of transition matrices that dominate it, in the sense that
$$\mathcal{T}_{\underline{T}} := \{T : Tf \geq \underline{T}f \text{ for all } f \in \mathcal{L}(\mathcal{X})\}.$$
This set satisfies the following important properties:
Proposition 3.
[Krak, 2021, Sec 3.4] Let $\underline{T}$ be a lower transition operator with conjugate upper transition operator $\overline{T}$ and dominating set of transition matrices $\mathcal{T}_{\underline{T}}$. Then $\mathcal{T}_{\underline{T}}$ is a non-empty, closed, and convex set of transition matrices that has separately specified rows (a set of matrices is said to have separately specified rows if, intuitively, it is closed under the row-wise recombination of its elements; see e.g. [Hermans and Škulj, 2014] for details), and for all $f \in \mathcal{L}(\mathcal{X})$ it holds that $\underline{T}f = \inf_{T \in \mathcal{T}_{\underline{T}}} Tf$ and $\overline{T}f = \sup_{T \in \mathcal{T}_{\underline{T}}} Tf$. Moreover, for all $f \in \mathcal{L}(\mathcal{X})$ there is some $T \in \mathcal{T}_{\underline{T}}$ such that $Tf = \underline{T}f$, and there is some—possibly different—$T' \in \mathcal{T}_{\underline{T}}$ such that $T'f = \overline{T}f$.
Notably, there is a one-to-one relation between non-empty sets of transition matrices that are closed and convex and have separately specified rows, and lower (or upper) transition operators: if $\underline{T}$ is the lower transition operator for the set $\mathcal{T}$, and if $\mathcal{T}$ satisfies these properties, then $\mathcal{T} = \mathcal{T}_{\underline{T}}$ [Krak, 2021, Cor 3.38]. Hence these objects may serve as dual representations for each other.
One reason that this is important is the use of $\underline{T}$ as a computational tool; under the conditions of this duality it holds for any function $f \in \mathcal{L}(\mathcal{X})$, any $n \in \mathbb{N}$, and any $x \in \mathcal{X}$ that [Hermans and Škulj, 2014]
$$\underline{E}^{\mathrm{M}}_{\mathcal{T}}[f(X_n) \mid X_0 = x] = \underline{E}^{\mathrm{I}}_{\mathcal{T}}[f(X_n) \mid X_0 = x] = [\underline{T}^n f](x),$$
where $\underline{T}$ is the lower transition operator for $\mathcal{T}$. This reduces the problem of computing such inferences for the imprecise-Markov chains $\mathcal{P}^{\mathrm{M}}_{\mathcal{T}}$ and $\mathcal{P}^{\mathrm{I}}_{\mathcal{T}}$ to solving $n$ independent linear optimization problems over $\mathcal{T}$: first compute $f_1 := \underline{T}f$, then compute $f_2 := \underline{T}f_1$, and so forth. Note that this method in general only yields a conservative bound on the corresponding inference for $\mathcal{P}^{\mathrm{HM}}_{\mathcal{T}}$, as the minimizers that obtain each $\underline{T}f_k$ may be different at each step.
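The backward recursion just described can be sketched in a few lines. The example below assumes a toy two-state set of transition matrices with separately specified rows, given as finitely many candidate rows per state (all numbers are hypothetical); the lower transition operator then takes a row-wise minimum.

```python
# Candidate transition-matrix rows per state (hypothetical numbers).
rows = {
    0: [[0.5, 0.5], [0.7, 0.3]],
    1: [[0.1, 0.9], [0.2, 0.8]],
}

def lower_T(f):
    """[Tf](x) = minimum, over the candidate rows at x, of the
    expectation of f under that row."""
    return [min(sum(p * fy for p, fy in zip(row, f)) for row in rows[x])
            for x in sorted(rows)]

# Lower expectation of f(X_3) given the initial state, obtained by
# applying the lower transition operator three times.
f = [1.0, 0.0]
g = f
for _ in range(3):
    g = lower_T(g)
# For these numbers, g ≈ [0.22, 0.156].
```

Each application is an independent minimization per state, which is exactly why the scheme scales to long horizons, and also why the minimizing matrix may differ from step to step.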
We next consider the dynamics in the continuous-time setting. We proceed analogously to the above: we first consider a non-empty and bounded (in the induced operator norm) set $\mathcal{Q}$ of rate matrices. With this set, we then associate the corresponding lower- and upper rate operators $\underline{Q}$ and $\overline{Q}$ on $\mathcal{L}(\mathcal{X})$, defined as
$$\underline{Q}f := \inf_{Q \in \mathcal{Q}} Qf \quad \text{and} \quad \overline{Q}f := \sup_{Q \in \mathcal{Q}} Qf \quad \text{for all } f \in \mathcal{L}(\mathcal{X}).$$

More generally, any operator $\underline{Q}$ (resp. $\overline{Q}$) on $\mathcal{L}(\mathcal{X})$ is a lower (resp. upper) rate operator if for all $f, g \in \mathcal{L}(\mathcal{X})$, all $\lambda \in \mathbb{R}_{\geq 0}$ and $\mu \in \mathbb{R}$, and all $x, y \in \mathcal{X}$ with $x \neq y$, it holds that [De Bock, 2017]

1. $\underline{Q}(\mu 1) = 0$ and $\overline{Q}(\mu 1) = 0$;

2. $[\underline{Q}\mathbb{I}_y](x) \geq 0$ and $[\overline{Q}\mathbb{I}_y](x) \geq 0$;

3. $\underline{Q}(f + g) \geq \underline{Q}f + \underline{Q}g$ and $\overline{Q}(f + g) \leq \overline{Q}f + \overline{Q}g$; and

4. $\underline{Q}(\lambda f) = \lambda \underline{Q}f$ and $\overline{Q}(\lambda f) = \lambda \overline{Q}f$.

As before, such objects are conjugate, in that if $\underline{Q}$ is a lower rate operator, then $\overline{Q} := -\underline{Q}(-\,\cdot\,)$ is an upper rate operator. Moreover, any rate matrix $Q$ is also a lower (and upper) rate operator. There is again a duality between lower (or upper) rate operators, and sets of rate matrices. For fixed $\underline{Q}$ and with the dominating set of rate matrices defined as
$$\mathcal{Q}_{\underline{Q}} := \{Q : Qf \geq \underline{Q}f \text{ for all } f \in \mathcal{L}(\mathcal{X})\},$$
we have the following result:
Proposition 4.
[Krak, 2021, Sec 6.2] Let $\underline{Q}$ be a lower rate operator with conjugate upper rate operator $\overline{Q}$ and dominating set of rate matrices $\mathcal{Q}_{\underline{Q}}$. Then $\mathcal{Q}_{\underline{Q}}$ is a non-empty, compact, and convex set of rate matrices that has separately specified rows, and for all $f \in \mathcal{L}(\mathcal{X})$ it holds that $\underline{Q}f = \inf_{Q \in \mathcal{Q}_{\underline{Q}}} Qf$ and $\overline{Q}f = \sup_{Q \in \mathcal{Q}_{\underline{Q}}} Qf$. Moreover, for all $f \in \mathcal{L}(\mathcal{X})$ there is some $Q \in \mathcal{Q}_{\underline{Q}}$ such that $Qf = \underline{Q}f$, and there is some—possibly different—$Q' \in \mathcal{Q}_{\underline{Q}}$ such that $Q'f = \overline{Q}f$.
Now fix any lower rate operator $\underline{Q}$ and any $t \in \mathbb{R}_{\geq 0}$, and let
$$e^{\underline{Q}t} := \lim_{n \to +\infty} \left(I + \frac{t}{n}\underline{Q}\right)^n. \tag{6}$$
The operator $e^{\underline{Q}t}$ is then a lower transition operator [De Bock, 2017], and the family $\{e^{\underline{Q}t}\}_{t \in \mathbb{R}_{\geq 0}}$ is a semigroup of lower transition operators; it satisfies $e^{\underline{Q}(s+t)} = e^{\underline{Q}s} e^{\underline{Q}t}$ for all $s, t \in \mathbb{R}_{\geq 0}$, and $e^{\underline{Q}0} = I$. The analogous construction with an upper rate operator $\overline{Q}$ instead generates a semigroup $\{e^{\overline{Q}t}\}_{t \in \mathbb{R}_{\geq 0}}$ of upper transition operators. When $\underline{Q}$ and $\overline{Q}$ are taken with respect to the same set $\mathcal{Q}$, these semigroups satisfy, for all $f \in \mathcal{L}(\mathcal{X})$, $t \in \mathbb{R}_{\geq 0}$, and $x \in \mathcal{X}$,
$$[e^{\underline{Q}t}f](x) = -[e^{\overline{Q}t}(-f)](x). \tag{7}$$
Here the importance again derives from the use as a computational tool; under the conditions of duality between $\underline{Q}$ and $\mathcal{Q}$, we have for any $f \in \mathcal{L}(\mathcal{X})$, any $t \in \mathbb{R}_{\geq 0}$, and any $x \in \mathcal{X}$ that [Škulj, 2015, Krak et al., 2017]
$$\underline{E}^{\mathrm{M}}_{\mathcal{Q}}[f(X_t) \mid X_0 = x] = \underline{E}^{\mathrm{I}}_{\mathcal{Q}}[f(X_t) \mid X_0 = x] = [e^{\underline{Q}t}f](x).$$
Hence such inferences can be numerically computed by approximating (6) with a finite choice of $n$, and then solving the resulting independent linear optimization problems over $\mathcal{Q}$. Error bounds for this scheme are available in the literature [Škulj, 2015, Krak et al., 2017, Erreygers, 2021].
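A minimal sketch of this approximation scheme, again assuming a hypothetical set of rate matrices given by finitely many candidate rows per state: the lower rate operator is a row-wise minimum, and the limit in (6) is approximated by repeatedly applying $I + \frac{t}{n}\underline{Q}$ for a large finite $n$.

```python
# Candidate rate-matrix rows per state (hypothetical numbers).
rate_rows = {
    0: [[-2.0, 2.0], [-3.0, 3.0]],
    1: [[1.0, -1.0], [2.0, -2.0]],
}

def lower_Q(f):
    """[Qf](x) = minimum over the candidate rate rows at x."""
    return [min(sum(q * fy for q, fy in zip(row, f)) for row in rate_rows[x])
            for x in sorted(rate_rows)]

def lower_expm(f, t, n=10000):
    """Approximate e^{Qt}f by (I + (t/n) Q)^n f; the step t/n must be
    small enough that (t/n) * max |Q(x,x)| <= 1, which holds here."""
    d = t / n
    g = f[:]
    for _ in range(n):
        Qg = lower_Q(g)
        g = [gi + d * qi for gi, qi in zip(g, Qg)]
    return g

# Lower expectation of the indicator of state 0 at time t = 1.
g = lower_expm([1.0, 0.0], 1.0)
```

Since each intermediate operator is a lower transition operator, the iterates stay within the range of the original function; here the result remains in $[0, 1]$.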
3.3 Class Structure
Let us now fix a set $\mathcal{Q}$ of rate matrices that we will use in the remainder of this work. Throughout, let $\underline{Q}$ and $\overline{Q}$ denote the lower- and upper rate operators associated with $\mathcal{Q}$. We impose several standard regularity conditions on this set: we assume that $\mathcal{Q}$ is non-empty, compact, and convex, and that it has separately specified rows. These are common assumptions that are imposed to ensure the duality between $\mathcal{Q}$ and $\underline{Q}$, which in turn guarantees that inferences with the induced imprecise-Markov chains remain well-behaved, as well as analytically (and, often, computationally) tractable.
We now have all the pieces to start studying the inference problem that is the subject of this work: the lower- and upper expected hitting times of the set $A$ for the continuous-time imprecise-Markov chains described by $\mathcal{Q}$.
Before we begin, let us impose two additional conditions on the dynamics of the system.
Assumption 1.
We assume that all states in $A$ are absorbing, which is equivalent to requiring that $Q(x, y) = 0$ for all $Q \in \mathcal{Q}$, all $x \in A$, and all $y \in \mathcal{X}$.
Note that this does not influence the inferences in which we are interested, since those only deal with behavior at times before states in are reached. However, imposing this explicitly substantially simplifies the analysis.
Next, we assume that the set $A$ is lower reachable from any state [De Bock, 2017]. This means that for any $x_1 \in A^c$ we can construct a sequence $x_1, \dots, x_n$ of states ending in some $x_n \in A$ such that, for all $k \in \{1, \dots, n-1\}$, it holds that $[\overline{Q}\mathbb{I}_{x_{k+1}}](x_k) > 0$. This is equivalent [De Bock, 2017] to
Assumption 2.
We assume $[e^{\underline{Q}t}\mathbb{I}_A](x) > 0$ for all $x \in \mathcal{X}$ and all $t \in \mathbb{R}_{>0}$.
Essentially, this means that for all elements of our imprecise-probabilistic models the probability of eventually hitting $A$ is bounded away from zero. This ensures that the expected hitting times remain bounded for all $P \in \mathcal{P}^{\mathrm{I}}_{\mathcal{Q}}$, so that we can ignore any extended real-valued analysis. It also implies that for all $Q \in \mathcal{Q}$ we have that $Q(x,x) < 0$ for all $x \in A^c$, which is relevant to meet the precondition of Proposition 2. As a practical point, De Bock [2017] gives an algorithm to check whether a given set $\mathcal{Q}$ satisfies this condition.
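The reachability condition above has a simple graph-theoretic flavor, and a check in that spirit can be sketched as follows. This is a simplified illustration, not De Bock's algorithm itself: we assume the set of rate matrices is given by finitely many candidate rows per state, and grow the set of states from which the target set can be reached through transitions that some candidate row assigns strictly positive rate.

```python
def lower_reachable(rate_rows, A):
    """Return True iff every state connects to the target set A through
    a chain of transitions (x, y) to which at least one candidate rate
    row at x assigns strictly positive rate."""
    reached = set(A)
    changed = True
    while changed:
        changed = False
        for x, rows in rate_rows.items():
            if x not in reached and any(
                    row[y] > 0 for row in rows for y in reached):
                reached.add(x)
                changed = True
    return all(x in reached for x in rate_rows)

# Hypothetical examples with target set A = {2}.
rows_ok = {0: [[-2.0, 1.0, 1.0]], 1: [[1.0, -3.0, 2.0]], 2: [[0.0, 0.0, 0.0]]}
# Here states 0 and 1 only exchange mass with each other, so A = {2}
# is not reachable from them.
rows_bad = {0: [[-1.0, 1.0, 0.0]], 1: [[1.0, -1.0, 0.0]], 2: [[0.0, 0.0, 0.0]]}
```

For `rows_ok` the check succeeds; for `rows_bad` it fails, and Assumption 2 would be violated.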
On a technical level, Assumption 2 is the crucial one for our results, and—unlike with Assumption 1—it cannot really be ignored in practice. However, based on earlier work by Krak et al. [2019] in the discrete-time setting, we hope in the future to strengthen our results to hold without this assumption.
4 Subspace Dynamics
In the context of hitting times, the interesting behavior of a process actually occurs before it has reached a target state in $A$. Hence it will be useful to introduce some machinery to study the transition dynamics as they relate to the states in $A^c$.
To introduce the notation in a general way, choose any non-empty $\mathcal{Y} \subseteq \mathcal{X}$. Then for any $f \in \mathcal{L}(\mathcal{X})$, let $f_{\mathcal{Y}} \in \mathcal{L}(\mathcal{Y})$ denote the restriction of $f$ to $\mathcal{Y}$. Conversely, for any $g \in \mathcal{L}(\mathcal{Y})$, let $g_{\uparrow} \in \mathcal{L}(\mathcal{X})$ denote the unique extension of $g$ to $\mathcal{X}$ that satisfies $g_{\uparrow}(x) = 0$ for all $x \in \mathcal{X} \setminus \mathcal{Y}$. Moreover, for any operator $M$ on $\mathcal{L}(\mathcal{X})$, we define the operator $M_{\mathcal{Y}}$ on $\mathcal{L}(\mathcal{Y})$ as
$$M_{\mathcal{Y}}\,g := (M g_{\uparrow})_{\mathcal{Y}} \quad \text{for all } g \in \mathcal{L}(\mathcal{Y}).$$

This somewhat verbose notation is perhaps most easily understood when $M$ is a linear operator, i.e. a matrix. In that case, $M_{\mathcal{Y}}$ is simply the sub-matrix of $M$ on the coordinates in $\mathcal{Y}$. The definition above allows us to extend this notion also to non-linear operators, and to lower- and upper transition and rate operators, specifically.

Now for any rate matrix $Q$, we call $Q_{A^c}$ its corresponding subgenerator. For any $t \in \mathbb{R}_{\geq 0}$, we then define $e^{Q_{A^c}t}$ as the matrix exponential of $Q_{A^c}t$. We have the following result:
Proposition 5.
Fix $Q \in \mathcal{Q}$ and let $Q_{A^c}$ be its subgenerator. Then $(e^{Qt})_{A^c} = e^{Q_{A^c}t}$ for all $t \in \mathbb{R}_{\geq 0}$. Moreover, the family $\{e^{Q_{A^c}t}\}_{t \in \mathbb{R}_{\geq 0}}$ is a semigroup.
Analogously, we define $\underline{Q}_{A^c}$ and $\overline{Q}_{A^c}$ to be the lower- and upper subgenerators corresponding to $\underline{Q}$ and $\overline{Q}$, respectively. We also let $e^{\underline{Q}_{A^c}t} := (e^{\underline{Q}t})_{A^c}$ and $e^{\overline{Q}_{A^c}t} := (e^{\overline{Q}t})_{A^c}$. Perhaps unsurprisingly, we then have:
Proposition 6.
It holds that $e^{\underline{Q}_{A^c}t} = \lim_{n \to +\infty} (I + \frac{t}{n}\underline{Q}_{A^c})^n$ and $e^{\overline{Q}_{A^c}t} = \lim_{n \to +\infty} (I + \frac{t}{n}\overline{Q}_{A^c})^n$ for all $t \in \mathbb{R}_{\geq 0}$. Moreover, the families $\{e^{\underline{Q}_{A^c}t}\}_{t \in \mathbb{R}_{\geq 0}}$ and $\{e^{\overline{Q}_{A^c}t}\}_{t \in \mathbb{R}_{\geq 0}}$ are semigroups.
Our Assumption 2 implies the following norm bound:
Proposition 7.
For any $t \in \mathbb{R}_{>0}$, it holds that $\|e^{\overline{Q}_{A^c}t}\| < 1$.
It is a straightforward consequence of the use of the supremum norm, together with Equation (7) and the fact that $e^{Qt}$ and $e^{\overline{Q}t}$ are (upper) transition operators, that also $\|e^{Q_{A^c}t}\| \leq \|e^{\overline{Q}_{A^c}t}\| < 1$ for all $Q \in \mathcal{Q}$ and all $t \in \mathbb{R}_{>0}$. Hence by the semigroup property we immediately have that $\lim_{t \to +\infty} \|e^{\overline{Q}_{A^c}t}\| = 0$. This also implies the following well-known result.
Proposition 8.
[Taylor and Lay, 1958, Thm IV.1.4] For any $Q \in \mathcal{Q}$ with subgenerator $Q_{A^c}$, and all $t \in \mathbb{R}_{>0}$, the inverse operator $(I - e^{Q_{A^c}t})^{-1}$ exists, and $(I - e^{Q_{A^c}t})^{-1} = \sum_{k=0}^{+\infty} (e^{Q_{A^c}t})^k$.
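The Neumann series behind Proposition 8 is easy to illustrate numerically. The sketch below uses a hypothetical 2x2 matrix with induced sup-norm below one (so it plays the role of $e^{Q_{A^c}t}$), accumulates partial sums of the series, and verifies that they invert $I - T$.

```python
# Hypothetical contraction: the induced sup-norm (max absolute row sum)
# is 0.7 < 1, so the Neumann series sum_k T^k converges to (I - T)^{-1}.
T = [[0.5, 0.2],
     [0.1, 0.3]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = len(T)
I = [[float(i == j) for j in range(n)] for i in range(n)]
partial = [row[:] for row in I]  # running partial sum of the series
term = [row[:] for row in I]
for _ in range(200):
    term = mat_mul(term, T)
    partial = [[p + t for p, t in zip(pr, tr)] for pr, tr in zip(partial, term)]

# (I - T) times the partial sum should be (numerically) the identity,
# since the truncation error is bounded by 0.7^201.
I_minus_T = [[I[i][j] - T[i][j] for j in range(n)] for i in range(n)]
check = mat_mul(I_minus_T, partial)
```

The geometric decay of the truncation error is exactly the point of the norm bound from Proposition 7: it makes the inverse, and hence the hitting-time characterizations below, well defined.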
This allows us to characterize the expected hitting times for discrete-time homogeneous Markov chains whose transition matrix is given by $e^{Q\delta}$ for some $\delta \in \mathbb{R}_{>0}$, as follows.
Proposition 9.
Choose any $Q \in \mathcal{Q}$, let $Q_{A^c}$ be its subgenerator, and fix any $\delta \in \mathbb{R}_{>0}$. Let $P$ be the discrete-time homogeneous Markov chain whose transition matrix is $e^{Q\delta}$. Then the expected hitting times satisfy $(h_P)_{A^c} = (I - e^{Q_{A^c}\delta})^{-1} 1$ and $h_P(x) = 0$ for all $x \in A$.
Proof.
We need the following observation:
Lemma 1.
Consider any $Q \in \mathcal{Q}$ with subgenerator $Q_{A^c}$, and let $\sigma(Q_{A^c})$ be the set of eigenvalues of $Q_{A^c}$. Then $\operatorname{Re}(\lambda) < 0$ for all $\lambda \in \sigma(Q_{A^c})$.
This implies that $0 \notin \sigma(Q_{A^c})$, and so we have:
Corollary 1.
For any $Q \in \mathcal{Q}$ with subgenerator $Q_{A^c}$, the inverse operator $Q_{A^c}^{-1}$ exists.
This allows us to characterize hitting times for continuous-time homogeneous Markov chains:
Proposition 10.
Choose any $Q \in \mathcal{Q}$, let $Q_{A^c}$ be its subgenerator, and let $P$ be the continuous-time homogeneous Markov chain with rate matrix $Q$. Then the expected hitting times satisfy $(h_P)_{A^c} = -Q_{A^c}^{-1} 1$ and $h_P(x) = 0$ for all $x \in A$.
Proof.
4.1 Quasicontractivity of Subspace Dynamics
We already know from Proposition 7 that $\|e^{\overline{Q}_{A^c}t}\| < 1$ for all $t \in \mathbb{R}_{>0}$. Since $e^{\overline{Q}_{A^c}0} = I$ (because the family is a semigroup), it follows that $\|e^{\overline{Q}_{A^c}t}\| \leq 1$ for all $t \in \mathbb{R}_{\geq 0}$. A semigroup that satisfies this property is said to be contractive. Moreover, Proposition 7 together with the semigroup property implies that $\lim_{t \to +\infty} \|e^{\overline{Q}_{A^c}t}\| = 0$. A semigroup that satisfies this property is said to be uniformly exponentially stable, and in such a case the following result holds:
Proposition 11.
There are $M \geq 1$ and $\alpha \in \mathbb{R}_{>0}$ such that $\|e^{\overline{Q}_{A^c}t}\| \leq M e^{-\alpha t}$ for all $t \in \mathbb{R}_{\geq 0}$.
This result means that the norm $\|e^{\overline{Q}_{A^c}t}\|$ decays exponentially as $t$ grows. However, for technical reasons we require an exponentially decaying norm bound with $M = 1$; if such a bound holds, the semigroup is said to be quasicontractive.
It is not clear that obtaining such a bound is possible when the operator norm is induced by the supremum norm on $\mathcal{L}(A^c)$. However, we can get it by defining a different norm $\|\cdot\|_{*}$ on $\mathcal{L}(A^c)$. We then obtain the quasicontractivity with respect to the induced operator norm $\|\cdot\|_{*}$. Because $\mathcal{L}(A^c)$ is finite-dimensional these norms are equivalent, and such a result suffices for our purposes. This re-norming trick is originally due to Feller [1953], and an analogous construction is commonly used for semigroups of linear operators; see e.g. [Renardy and Rogers, 2006, Thm 12.21].
Proposition 12.
The map $\|\cdot\|_{*}$ obtained from this re-norming construction is a norm on $\mathcal{L}(A^c)$.
Moreover, we have the desired result:
Proposition 13.
We have $\|e^{\overline{Q}_{A^c}t}\|_{*} \leq e^{-\alpha t}$ for all $t \in \mathbb{R}_{\geq 0}$.
Finally, the same bound holds for precise models:
Proposition 14.
For any $Q \in \mathcal{Q}$ with subgenerator $Q_{A^c}$ it holds that $\|e^{Q_{A^c}t}\|_{*} \leq e^{-\alpha t}$ for all $t \in \mathbb{R}_{\geq 0}$.
5 Hitting Times as Limits
We now have all the pieces to explain the proof of our main results. The trick will be to establish a connection between hitting times for continuous-time imprecise-Markov chains, and hitting times for discrete-time imprecise-Markov chains, for which analogous results were previously established by Krak et al. [2019].
We essentially just look at a discretized continuous-time Markov chain taking steps of some fixed size $\delta \in \mathbb{R}_{>0}$, derive the expected hitting time for this discrete-time Markov chain, and then take the limit $\delta \to 0^+$. The main difficulty lies in establishing that this converges uniformly for all elements in our sets of processes; this is why we went through the trouble of establishing quasicontractivity in Section 4.1.
To start, for any and , let be the minimal non-negative solution to the linear system888Note the re-scaled term on the right-hand side, which distinguishes this from the system in Proposition 1; this is required since the hitting times for discrete-time Markov chains are expressed in the number of steps, and to pass to continuous-time we need to measure the size of these steps.
(9) |
and let be the minimal non-negative solution to
(10)
Then we know from Propositions 1 and 2 that represents the expected hitting times of a discrete-time homogeneous Markov chain with transition matrix , and that does the same for a continuous-time homogeneous Markov chain with rate matrix . We now have the following result:
Proposition 15.
There are and such that for all and all .
Since is bounded due to Proposition 10:
Corollary 2.
We have for all .
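To make the relationship between the two precise systems concrete, the following sketch (with an invented 3-state rate matrix, where state 2 is the absorbing target) solves the continuous-time linear system for the expected hitting times directly, and compares it with the discrete-time system for a small step size. All numbers are purely illustrative.

```python
import numpy as np

# Hypothetical rate matrix Q; rows sum to zero, state 2 is the absorbing target
Q = np.array([[-2.0,  1.0, 1.0],
              [ 3.0, -4.0, 1.0],
              [ 0.0,  0.0, 0.0]])
nonA = [0, 1]  # non-target states

# Continuous-time system (cf. Proposition 1): solve -Q|_{A^c} h = 1
h_ct = np.linalg.solve(-Q[np.ix_(nonA, nonA)], np.ones(2))

# Discrete-time system with step size delta: h = delta*1 + e^{delta*Q}|_{A^c} h,
# using a second-order Taylor approximation of the matrix exponential
delta = 1e-3
T = np.eye(3) + delta * Q + (delta * Q) @ (delta * Q) / 2.0
h_dt = np.linalg.solve(np.eye(2) - T[np.ix_(nonA, nonA)], delta * np.ones(2))

# The discretized hitting times approach the continuous-time ones as delta -> 0
assert np.allclose(h_ct, h_dt, atol=1e-2)
```

Note the re-scaled term on the right-hand side of the discrete system: each discrete step represents delta units of time, which is exactly the re-scaling discussed above.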
We will now set up the analogous results for imprecise-Markov chains. First, let
(11)
Clearly, it follows from Proposition 2 and the definition of lower- and upper expectations that these quantities represent the lower- and upper expected hitting times for the imprecise-Markov chain , i.e. it holds that
Now for any , let and denote the minimal non-negative solutions to the non-linear systems
(12)
and
(13)
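For intuition about the non-linear systems (12) and (13), the sketch below computes lower expected hitting times for a hypothetical discrete-time imprecise-Markov chain whose set of transition matrices consists of just two matrices with separately specified rows, so that the lower transition operator reduces to a row-wise minimum. The minimal non-negative solution is obtained by value iteration from zero; all numbers are invented.

```python
import numpy as np

# Two hypothetical transition matrices; state 2 is the absorbing target
T1 = np.array([[0.6, 0.2, 0.2],
               [0.3, 0.5, 0.2],
               [0.0, 0.0, 1.0]])
T2 = np.array([[0.5, 0.4, 0.1],
               [0.2, 0.5, 0.3],
               [0.0, 0.0, 1.0]])

def lower_T(h):
    # Row-wise minimum: the lower transition operator when rows are
    # selected independently (separately specified rows)
    return np.minimum(T1 @ h, T2 @ h)

# Value iteration from h = 0 converges monotonically upward to the minimal
# non-negative solution of h = 1 + lower_T(h) on the non-target states
h = np.zeros(3)
for _ in range(5000):
    h = 1.0 + lower_T(h)
    h[2] = 0.0  # hitting time is identically zero on the target

# h now solves the non-linear system up to numerical precision
assert np.max(np.abs(h - (1.0 + lower_T(h)))[:2]) < 1e-9
```

Starting the iteration from zero is what selects the minimal non-negative solution.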
It was previously shown by Krak et al. [2019] that—up to re-scaling with —the quantities and represent the lower (resp. upper) expected hitting times of, identically, the discrete-time imprecise-Markov chains , , and parameterized by the set of transition matrices that dominate . We now set out to prove an analogous result for continuous-time imprecise-Markov chains. We start with the following:
Proposition 16.
It holds that and .
This property allows us to leverage recent results by Erreygers [2021] and Krak [2021] regarding discrete and finite approximations of lower- and upper expectations in continuous-time imprecise-Markov chains, to obtain our first main result:
Theorem 1.
It holds that
and, moreover, that
Moreover, it follows relatively straightforwardly from Proposition 16 that the lower- and upper expected hitting times for continuous-time imprecise-Markov chains satisfy an immediate generalization of the system that characterizes the expected hitting times for (precise) continuous-time homogeneous Markov chains. This is our second main result:
Theorem 2.
Let and denote the lower- and upper expected hitting times for any one of , , or . Then is the minimal non-negative solution to the non-linear system , and is the minimal non-negative solution to the non-linear system .
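To make Theorem 2 concrete, the sketch below solves the non-linear continuous-time system by a small-step fixed-point iteration, for a hypothetical model given by two rate matrices with separately specified rows (so the lower rate operator is a row-wise minimum). This is only an illustrative scheme with invented numbers, not the computational method of the paper.

```python
import numpy as np

# Two hypothetical rate matrices; state 2 is the absorbing target
Q1 = np.array([[-2.0,  1.0, 1.0],
               [ 3.0, -4.0, 1.0],
               [ 0.0,  0.0, 0.0]])
Q2 = np.array([[-3.0,  2.0, 1.0],
               [ 2.0, -4.0, 2.0],
               [ 0.0,  0.0, 0.0]])

def lower_Q(h):
    # Row-wise minimum: the lower rate operator for separately specified rows
    return np.minimum(Q1 @ h, Q2 @ h)

# Iterate h <- h + delta*(1 + lower_Q(h)) on the non-target states; a fixed
# point of this update satisfies the non-linear system of Theorem 2
delta = 0.01  # small enough that I + delta*Q remains a transition matrix
h = np.zeros(3)
for _ in range(20000):
    h_new = h + delta * (1.0 + lower_Q(h))
    h_new[2] = 0.0  # hitting time is zero on the target
    if np.max(np.abs(h_new - h)) < 1e-12:
        break
    h = h_new

assert np.max(np.abs(1.0 + lower_Q(h))[:2]) < 1e-6
```

Starting from zero again yields the minimal non-negative solution, here the lower expected hitting times of the imprecise model.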
6 Summary & Conclusion
We have investigated the problem of characterizing expected hitting times for continuous-time imprecise-Markov chains. We have shown that under two relatively mild assumptions on the system’s class structure—viz. that the target states are absorbing, and can be reached by any non-target state—the corresponding lower (resp. upper) expected hitting time is the same for all three types of imprecise-Markov chains.
We have also demonstrated that these lower- and upper expected hitting times and satisfy the non-linear systems
in analogy with the precise linear system (5). Indeed, we conclude that the lower- and upper expected hitting times for any of these three types of imprecise-Markov chains can be fully characterized as the unique minimal non-negative solutions to these respective systems.
We aim to strengthen these results in future work to hold with fewer assumptions on the system’s class structure.
Acknowledgements
We would like to sincerely thank Jasper De Bock for many stimulating discussions on the subject of imprecise-Markov chains, and for pointing out a technical error in an earlier draft of this work. We are also grateful for the constructive feedback of three anonymous reviewers.
References
- Augustin et al. [2014] Thomas Augustin, Frank P. A. Coolen, Gert de Cooman, and Matthias C. M. Troffaes, editors. Introduction to Imprecise Probabilities. John Wiley & Sons, 2014.
- De Bock [2017] Jasper De Bock. The limit behaviour of imprecise continuous-time Markov chains. Journal of Nonlinear Science, 27(1):159–196, 2017.
- Engel and Nagel [2000] Klaus-Jochen Engel and Rainer Nagel. One-parameter semigroups for linear evolution equations, volume 194. Springer, 2000.
- Erreygers [2021] Alexander Erreygers. Markovian Imprecise Jump Processes: Foundations, Algorithms and Applications. PhD thesis, Ghent University, 2021.
- Feller [1953] William Feller. On the generation of unbounded semi-groups of bounded linear operators. Annals of Mathematics, pages 166–174, 1953.
- Hermans and Škulj [2014] Filip Hermans and Damjan Škulj. Stochastic processes. In Thomas Augustin, Frank P. A. Coolen, Gert de Cooman, and Matthias C. M. Troffaes, editors, Introduction to Imprecise Probabilities, chapter 11. Wiley, 2014.
- Krak [2020] Thomas Krak. Computing expected hitting times for imprecise Markov chains. In International Conference on Uncertainty Quantification & Optimisation, pages 185–205. Springer, 2020.
- Krak [2021] Thomas Krak. Continuous-Time Imprecise-Markov Chains: Theory and Algorithms. PhD thesis, Ghent University, 2021.
- Krak et al. [2017] Thomas Krak, Jasper De Bock, and Arno Siebes. Imprecise continuous-time Markov chains. International Journal of Approximate Reasoning, 88:452–528, 2017.
- Krak et al. [2019] Thomas Krak, Natan T’Joens, and Jasper De Bock. Hitting times and probabilities for imprecise Markov chains. In Proceedings of ISIPTA 2019, volume 103 of Proceedings of Machine Learning Research, pages 265–275. PMLR, 2019.
- Norris [1998] James Robert Norris. Markov chains. Cambridge University Press, 1998.
- Renardy and Rogers [2006] Michael Renardy and Robert C. Rogers. An introduction to partial differential equations. Springer Science & Business Media, 2006.
- Škulj [2015] Damjan Škulj. Efficient computation of the bounds of continuous time imprecise Markov chains. Applied Mathematics and Computation, 250:165–180, 2015.
- Taylor and Lay [1958] Angus E. Taylor and David C. Lay. Introduction to functional analysis, volume 1. Wiley, New York, 1958.
- Van Loan [2006] Charles F. Van Loan. A study of the matrix exponential. 2006.
- Walley [1991] Peter Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.
Appendix A Proofs and Lemmas for Section 4
For certain operators, we note that subspace restriction distributes over operator composition:
Lemma 2.
Let and be operators on such that . Then .
Proof.
Fix any . Then
Note that since and for all , we also have for all . Hence in particular, it holds that . We therefore find that
which concludes the proof. ∎
This can be used in particular for certain operators associated with and the associated lower- and upper rate operators:
Lemma 3.
Fix any and any . Then
Proof.
Fix any , and first choose any and . By Assumption 1 and the definition of rate matrices, we have for all , whence . Since is arbitrary, we also have and . It follows that
Since this is true for all and all , the result is now immediate. ∎
Corollary 3.
For all and it holds that
Proof.
Use Lemma 3 and the definitions of , , . ∎
Lemma 4.
Let and be operators on such that . Then .
Proof.
Fix any with . Then . Moreover, since for all and since , we have that for all . Hence we find
The result follows since is arbitrary. ∎
Proof of Proposition 5.
Proof of Proposition 6.
The proof is completely analogous to the proof of Proposition 5; simply replace with either or as appropriate. ∎
Proof of Proposition 7.
Let ; then due to Assumption 2. Fix any with . By definition, we have .
Let denote the set of transition matrices that dominate . Due to Proposition 3, there is some such that . Fix any . Then, using that for all , together with the fact that is a transition matrix, we have
and hence . We have and since is a transition matrix. Using the linear character of , we find that
Since and we have
Combining the above we find that
Since this is true for all , we find that . Moreover, since , it follows that , or in other words, that
The result follows since with is arbitrary. ∎
Appendix B Proofs and Lemmas for Section 4.1
Proof of Proposition 11.
This proof is a straightforward generalization of an argument in [Engel and Nagel, 2000, Prop I.3.12].
Let first ; then due to Proposition 7. Define
Then since . Moreover, due to Proposition 7, and hence . Now set and ; then since .
Fix any . If then the result is trivial, so let us suppose that . Then there are and such that . Using the semigroup property, we have
We have and , and so
which concludes the proof. ∎
Proof of Proposition 12.
It follows from the definition that for any upper transition operator and any non-negative , also is non-negative. In the sequel, we will therefore say that upper transition operators preserve non-negativity. Since is an upper transition operator, this property clearly extends also to .
Now fix and . By preservation of non-negativity we have for any that
Moreover, we clearly have , and so by the monotonicity of upper transition operators, we have
Finally, by the subadditivity of upper transition operators, we find that
Again by preservation of non-negativity we have
Because this is true for all , we find that
Multiplying both sides with and noting that is arbitrary, we find that
Hence we have established that satisfies the triangle inequality.
Next, fix any and . Then
So is absolutely homogeneous.
Finally, fix and suppose that . It holds that
whence it holds that . This implies that also . Since , we have
whence . Hence separates . ∎
Lemma 5.
For any with subgenerator , any , and any , it holds that .
Proof.
Choose . Let be any matrix with non-negative entries. Then for all . In particular, we have
where the final two equalities follow from the fact that only has non-negative entries. Since this is true for any matrix with non-negative entries, we have in particular that . Similarly, it holds that
It follows that, for any , we have
Due to preservation of non-negativity, and since this is true for any , we have
Now let be such that and ; this clearly exists since is finite-dimensional. Then we have
where the final inequality used that . Hence we have found that .
Since by Equation (7), we also have
By monotonicity of upper transition operators, this implies that
and, due to the preservation of non-negativity, we have
and
Hence for all we have
or in other words, that
Since this holds for all , we have
which concludes the proof. ∎
Proof of Proposition 13.
The argument is analogous to the well-known case for linear quasicontractive semigroups; for a similar result, see e.g. [Renardy and Rogers, 2006, Thm 12.21]. So, fix any and .
Using a similar argument as used in the proof of Lemma 5, we use the preservation of non-negativity and the monotonicity of upper transition operators, to find for any that
Hence we have
where for the second equality we used the semigroup property. Since is arbitrary, this implies that
which completes the proof. ∎
Appendix C Proofs and Lemmas for Section 5
The following result is well-known, but we state it here for convenience:
Lemma 6.
Let be a bounded linear operator on a Banach space with norm . Suppose that and that exists. Then
Proof.
Since we have . Taking norms,
where the final step used the value of the geometric series and that . ∎
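Lemma 6 is the standard Neumann-series (geometric-series) estimate. The numerical sketch below verifies both the series representation of the inverse and the bound on its norm for a random matrix scaled to have norm one half; the precise statement of the lemma may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, ord=np.inf)  # scale so that ||A||_inf = 1/2 < 1
a = np.linalg.norm(A, ord=np.inf)

inv = np.linalg.inv(np.eye(4) - A)

# Neumann series: (I - A)^{-1} = sum_k A^k, convergent since ||A|| < 1
series = sum(np.linalg.matrix_power(A, k) for k in range(60))
assert np.allclose(inv, series)

# Geometric-series norm bound: ||(I - A)^{-1}|| <= 1 / (1 - ||A||)
assert np.linalg.norm(inv, ord=np.inf) <= 1.0 / (1.0 - a) + 1e-12
```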
Lemma 7.
There is some such that for any with , and any with subgenerator , it holds that .
Proof.
Let be as in Proposition 11, and let be the norm from Equation (8). Since is finite-dimensional the norms and are equivalent, and hence there is some such that for all . Set ; then since .
Fix any such that , and any with subgenerator . It follows from Proposition 14 that . Using a standard quadratic bound on the negative scalar exponential, we have
(14)
where the third inequality used that . Notice that . Moreover, exists by Proposition 8. By the norm equivalence, we have
(15)
and, by Lemma 6, that
Using Equation (14) we obtain
Combining with Equation (15) yields
which concludes the proof. ∎
Proof of Proposition 15.
Let be as in Lemma 7, and let and with ; note that since is bounded by assumption. Observe that we must have due to Assumption 2, whence .
Choose any and . It is immediate from the definitions that for all and all , so it remains to bound the norm on .
Let be the subgenerator of on . By Proposition 9 we have that . Using the definition of this implies that
Re-ordering terms we have
Let . We find that
We see that the difference on the left-hand side occurs again on the right-hand side. Hence we can substitute the same expansion times to get
Since we know from Section 4 that , we see that the left summand vanishes as we take and, using Proposition 8, we have . So, passing to this limit and taking norms, we find
Using Lemmas 3 and 4 and Corollary 3, we have
and so, due to [Krak, 2021, Lemma B.8], we have . Since it follows that , and so . Since we have , whence due to Lemma 7. In summary we find
which concludes the proof. ∎
Proposition 17.
[Krak, 2020, Prop 7] Fix any , and let denote the set of transition matrices that dominate . Choose any . For all , let be the (unique) non-negative solution to , and let be such that .
Then .
Proof.
The preconditions of the reference actually require every to be an extreme point of , but inspection of the proof of [Krak, 2020, Prop 7] shows that this is not required; the superfluous condition is only used to streamline the statement of an algorithmic result further on in that work. ∎
We next need some results that involve transition matrices corresponding to (not-necessarily homogeneous) Markov chains . We recall from Section 2.2 that these are defined for any with as
Lemma 8.
Consider the sequence constructed as in Proposition 17. For any , there is a Markov chain with corresponding transition matrix such that .
Hence in particular, we can choose the co-sequence in Proposition 17 to be .
Proof.
This follows from [Krak, 2021, Cor 6.24] and the fact that is non-empty, compact, convex, and has separately specified rows. ∎
Proposition 18.
For all there is a Markov chain with corresponding transition matrix , such that the unique solution to satisfies .
Proof.
Let , and let be as in Proposition 17, with the co-sequence chosen as in Lemma 8 to consist of transition matrices corresponding to Markov chains in . Then lives in .
The set is compact by [Krak, 2021, Cor 5.18] and the fact that is non-empty, compact, and convex. Hence we can find a subsequence with . Since , there is a Markov chain with corresponding transition matrix .
Moreover, since , it follows from [Krak, 2021, Cor 6.24] that the transition matrices and all dominate the lower transition operator . Together with Assumption 2, this allows us to invoke [Krak, 2020, Prop 6], by which we can let be the unique solution to , and it holds for any that , and
Similarly, it holds that , and
Since and by continuity of the map —which holds since all these inverses exist—it follows that . Since also , it follows that .
By Proposition 17 we have , and hence we conclude that . ∎
Proposition 19.
Fix any and consider any Markov chain with transition matrix . Choose any . Then there is some such that for all there are , such that
Proof.
The result is trivial if , so let us consider the case where . Let . By [Krak, 2021, Lemma 5.12] there is some such that for all and with , for all there is some such that
Since is a Markov chain, we can factor its transition matrices [Krak, 2021, Prop 5.1] as
Using [Krak, 2021, Lemma B.5] for the first inequality, we have
which concludes the proof. ∎
Lemma 9.
Consider a sequence in with limit . For all , let denote the minimal non-negative solution to , and let denote the minimal non-negative solution to . Then .
Proof.
Lemma 10.
Proof of Proposition 16.
We only give the proof for the lower hitting times, i.e. that . The argument for the upper hitting times is completely analogous.
Choose any two sequences and in such that and . We will assume without loss of generality that for all , where .
Now first fix any , and consider . By Proposition 18 there is a Markov chain with transition matrix such that the unique solution to satisfies .
By Proposition 19, there are with and in such that, with
it holds that . Now define
Then since is convex. Let denote the minimal non-negative solution to .
By repeating this construction for all , we obtain a sequence in . Since is (sequentially) compact, we can consider a subsequence such that .
Let be the minimal non-negative solution to . We now need to estimate some norm bounds that hold by choosing large enough. Let and fix any .
Since converges to , it follows from Lemma 9 that for large enough, we have
(16)
Since is bounded, this also implies that the sequence is eventually uniformly bounded above in norm by some constant , say.
For all , let be such that and . Then
For large enough we eventually have , and so by Proposition 15, we then have
with independent of . Hence for large enough we have
(17)
Let next be the minimal non-negative solution to . Since , for large enough we have due to Assumption 2.
By [Krak, 2021, Lemmas B.8 and B.12] we have
and so, for any , we can choose large enough so that eventually . Using the continuity of the map on operators for which this inverse exists, for large enough we therefore find that
Since , this implies that then also
(18)
Next, we recall that , and
Hence by continuity of the map on operators for which this inverse exists, for large enough we find that
Since , this implies that also
(19)
Putting Equations (16)–(19) together, we find that for any large enough it holds that
(20)
Since is arbitrary this clearly implies that
(21)
Next, let us show that . To this end, assume ex absurdo that there is some such that for some . Let . Due to Corollary 2, for any small enough it holds that
This implies in particular that for large enough it holds that . Moreover, it follows from Equation (20) that for large enough we have . It holds that , and hence, since , we find that for large enough ,
In other words, and using Lemma 10, we then have
where the last step used that . From this contradiction we conclude that our earlier assumption must be wrong, and so it holds that for all and . This implies that . Since it clearly also holds that because , this implies that, indeed as claimed, .
In summary, at this point we have shown that for any sequence in with , there is a subsequence such that .
So, finally, suppose ex absurdo that . Then there is some sequence in such that , and some , such that for all . By the above result, there is a subsequence such that , which is a contradiction. ∎
Proof of Theorem 1.
The crucial idea of this proof is to emulate Erreygers [2021, Sec 6.3] and consider discretized and truncated hitting times. By taking appropriate limits of such approximations, we then recover the “real” hitting times. However, we need to be careful with these constructions, since lower (and upper) expectation operators for continuous-time imprecise-Markov chains are not necessarily continuous with respect to arbitrary limits of such approximations [Erreygers, 2021, Chap 5]. This—fairly long—proof is therefore roughly divided into two parts: first, we construct a specific sequence of approximations, and establish the relevant continuity properties with respect to this sequence. Then, in the second part of the proof, we use this continuity to establish the main claim of the theorem.
To this end, for any and , we first consider a fixed-step grid over with step-size , as
(22)
We define the associated approximate hitting time functions for all as
(23)
Then by [Erreygers, 2021, Lemma 6.19], as we take the time-horizon to infinity and the step-size to zero, we have the point-wise limit to the actual hitting time function , in that
(24)
Let us now construct a specific sequence of approximate hitting time functions that will converge to this limit. To this end, first fix an arbitrary sequence in such that . Moreover, for any , we introduce the (discrete-time) truncated hitting time , defined for all as
Now fix any , let , and let denote the set of transition matrices that dominate . We now consider discrete-time imprecise-Markov chains parameterized by . As discussed in [Krak et al., 2019], for all there are functions (these represent lower and upper expectations with respect to a game-theoretic imprecise-Markov chain, but the details do not concern us here) and in such that
that, moreover, satisfy
and
We already noted in Section 5 that the functions and from Equations (12) and (13) satisfy
Combining the above, we find that
and
Hence for all , we can now choose large enough such that , and so that with we have both
(25)
and
(26)
With these selections, we now define the sequence of approximate hitting times as for all . Clearly we have , and since we also find that . Hence by Equation (24) we have the pointwise limit
(27)
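To see the discretized and truncated approximation at work on a single trajectory, the sketch below simulates one path of a hypothetical precise chain (Gillespie-style) and compares the exact hitting time of the target with its grid approximation in the spirit of Equation (23); the approximation converges to the exact value from above as the step size shrinks. All model numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical rate matrix; state 2 is the absorbing target
Q = np.array([[-2.0,  1.0, 1.0],
              [ 3.0, -4.0, 1.0],
              [ 0.0,  0.0, 0.0]])

def simulate(x0, horizon):
    """Simulate one path; return the jump times and visited states."""
    t, x, times, states = 0.0, x0, [0.0], [x0]
    while t < horizon and Q[x, x] < 0.0:
        t += rng.exponential(1.0 / -Q[x, x])   # exponential holding time
        p = np.maximum(Q[x], 0.0)
        x = rng.choice(len(Q), p=p / p.sum())  # jump proportional to rates
        times.append(t)
        states.append(x)
    return times, states

times, states = simulate(0, horizon=100.0)

def state_at(t):
    # Paths are right-continuous step functions
    return states[np.searchsorted(times, t, side="right") - 1]

# Exact hitting time of the target along this path
exact = next(t for t, s in zip(times, states) if s == 2)

def approx_hit(T, delta):
    # First grid point at which the path is in the target set,
    # truncated at the time horizon T
    for t in np.arange(0.0, T + delta / 2.0, delta):
        if state_at(t) == 2:
            return t
    return T

# The grid approximation converges to the exact value from above
assert approx_hit(T=50.0, delta=0.01) >= exact
assert approx_hit(T=50.0, delta=0.01) - exact <= 0.011
```

Because the target is absorbing, the path stays in the target set after hitting it, so the first grid point in the target set is at most one step size beyond the exact hitting time (provided the horizon is large enough).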
Having constructed this specific sequence that converges to the “true” hitting time function, we will now demonstrate the relevant continuity properties of the lower- and upper expectations of interest, with respect to this sequence.
To this end, we define as
(28)
Then for all we have for all . Moreover, since every is non-negative, it holds in fact that for all and . This means that if we can show that the upper expectation is bounded for all , then we can use the imprecise version of the dominated convergence theorem [Erreygers, 2021, Thm 5.32] to take lower- and upper expectations of the limit in Equation (27). So, we will now show that this boundedness indeed holds.
We note that, for fixed , is monotonically decreasing as we increase . To see this, first consider the grids and over . For any there is some such that , and since , we find that also . Hence we conclude that . From this set inclusion, we also clearly have for any that
and so together with the fact that for all , it then follows from Equation (23) that .
Using this observation, we immediately find that for any and it holds that
and so from Equation (28), we have
Next, we observe that is monotonically increasing as we increase . Indeed, for any the grid over simply constitutes the set . Hence that is monotonically increasing as we increase , follows immediately from Equation (23). In particular, this implies that the sequence converges monotonically to . Moreover, we have for all , and so we find that identically
Hence by the continuity of upper expectations with respect to monotonically increasing convergent sequences of functions that are bounded below [Erreygers, 2021, Thm 5.31], we have
(29)
Now, for every , only depends on finitely many time-points; indeed, only depends on the value of for . Using [Krak, 2021, Thm 7.2], this means that the lower- and upper expectations of these functions with respect to the imprecise-Markov chain , can also be expressed as lower (resp. upper) expectations of this function with respect to an induced discrete-time imprecise-Markov chain. Indeed, since the step-size used in these approximating functions is uniformly equal to one, and using the obvious correspondence between and , it is not difficult to see that
(30)
where is the set of transition matrices that dominate .
We now again invoke the previously mentioned results from [Krak et al., 2019]; for any , there is a function that satisfies
(31)
Combining Equations (29), (30), and (31), and using [Krak et al., 2019, Prop 7] to establish the limit on the final right-hand side, we have
By [Krak et al., 2019, Thm 12] it holds that
and, moreover, that there is some homogeneous discrete-time Markov chain with associated transition matrix and hitting times such that . Putting this together, we find that
(32)
By Proposition 1, is also the minimal non-negative solution to the system
(33)
It is immediate from the definition that , and since is clearly non-negative, we obtain from Equation (32) that for all we have
or in other words, that for all . So, it remains to bound this upper expectation on .
By our Assumption 2, it holds for all that . Since is the lower transition operator corresponding to due to Proposition 3, it follows that satisfies conditions C1–C3 and R1 from [Krak, 2020]. We now recall that . Since the preconditions C1–C3 and R1 of that reference are all satisfied, we can now invoke [Krak, 2020, Lemma 10], which states that the inverse operator exists.
We note that , and so . Hence in particular, we have . From Equation (33), we now find that
and so re-ordering terms, we have . Using the existence of the inverse operator established above, we obtain
Since is an invertible bounded linear operator, also clearly is bounded. Hence we have
From Equation (32) we find that for all . In summary, at this point we have shown that is bounded for all . Since we already established that absolutely dominates the sequence , we can now finally use the limit (27) and the dominated convergence theorem [Erreygers, 2021, Thm 5.32] to establish that
(34)
and
(35)
This concludes the first part of this proof. Our next step will be to identify the limits superior and inferior in the above inequalities as corresponding to, respectively, and .
Let us start by obtaining the required result for the lower expectation. From the definition of the limit superior, there is a convergent subsequence such that
(36)
Now fix any , and consider the approximate function . As before, this function really only depends on the system at finitely many time points; specifically, those on the grid over . We can therefore again use [Krak, 2021, Thm 7.2], to express the lower- and upper expectations of this function with respect to the imprecise-Markov chain , as lower- and upper expectations of a function with respect to an induced discrete-time imprecise-Markov chain. Since the step size of this grid is now equal to rather than one, this requires a bit more effort than before. In particular, we now need to compensate for the step-size of the grid. Indeed, the corresponding discrete-time imprecise-Markov chain should consider steps that are implicitly of this “length”, so we consider the model induced by the set of transition matrices that dominate . It then remains to find an appropriate translation of to the domain .
As a first observation, we note that this “translation” should depend on the same number of time points as . We note that since it holds that . Hence it follows from Equation (22) that contains exactly time points, in addition to the origin , and that depends exactly on these time points. Indeed, inspection of Equation (23) reveals that, by re-scaling to compensate for the step size , the quantity simply represents the natural index of the discrete grid element of on which did (or did not) initially hit . Adapting Equation (23), we therefore define for any that
We see that, as required, is again simply the index of the step on which did (or did not) initially hit . This implies the relation to the discrete-time truncated hitting time ; for any we have
and so we simply have that . Following the discussion in [Krak, 2021, Chap 7], and [Krak, 2021, Thm 7.2] in particular, we therefore find the identity
(37)
We again recall from Krak et al. [2019] the objects in satisfying
Hence from Equations (36) and (37) we now find that
(38)
We now note that, for all , we have
where we used Equation (25) for the final inequality. Using that and , together with Proposition 16, we see that both summands vanish as we increase , and so we have
(39)
We already established in Section 5 that . Hence by combining Equations (34), (36), (38), and (39), we now find that
(40)
However, as noted in Section 3.1 we have the inclusion , so it immediately follows from the definition of the lower expectations that
Hence by Equation (40) we obtain the identity
which concludes the proof that the lower expected hitting times are the same for all three types of continuous-time imprecise-Markov chains. We omit the proof for the upper expected hitting times; this is completely analogous, starting instead from Equation (35) and using the norm bound (26) to pass to the limit . ∎
Proof of Theorem 2.
First fix any . For it then holds that
Since for all , we have and . We can therefore rearrange terms and add to the above, to obtain
Because the individual limits exist by Proposition 16 and [De Bock, 2017, Prop 9], taking to zero yields
So, is indeed a solution to the system . It follows from a completely analogous argument that also is a solution to .
That and are non-negative is clear. We now first show that is the minimal non-negative solution to its corresponding system. To this end, suppose ex absurdo that there is some non-negative such that for some , and . Then clearly since for all and is non-negative.
By Proposition 4, there is then some such that , and so also . By Proposition 2 there is some minimal non-negative solution to the system , where the minimality implies in particular that . Since , we obtain
which is a contradiction.
We next show that is the minimal non-negative solution to its corresponding system; this will require a bit more effort and we need to start with some auxiliary constructions. By Proposition 4, there is some such that . Let be the subgenerator of .
Consider the from Proposition 11, and let be the norm from Section 4.1. Since is finite-dimensional, the norms and are equivalent, whence there is some such that for all .
Now let be such that , , , and ; this is clearly always possible. Define the map for all as
Let us show that is a contraction on the Banach space , or in other words, that there is some such that for all [Renardy and Rogers, 2006, Sec 10.1.1]. So, fix any . Then we have
from which we find that
(41)
By Proposition 14 we have , and so using a standard quadratic bound on the negative scalar exponential,
(42)
where we used that .
Moreover, we have that
where the second inequality used Lemmas 3 and 4 and Corollary 3; and the final inequality used [Krak, 2021, Lemma B.8]. Since we have
(43)
Combining Equations (41), (42), and (43) we obtain
Since , , and , we conclude that is indeed a contraction. Hence by the Banach fixed-point theorem [Renardy and Rogers, 2006, Thm 10.1], there is a unique fixed point such that and, for any , it holds that
(44)
It is easy to see that this unique fixed point is given by . Indeed, from the choice of and the fact that satisfies , we have
Moreover, since we have and , whence
Noting that , after multiplying with we find that . Comparing with the definition of , we have
so is indeed a fixed-point of . Since we already established that has a unique fixed-point, we conclude from Equation (44) that
(45)
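The contraction-and-fixed-point mechanism used above can be illustrated numerically: for a small step size, the map sending h to the sum of the re-scaled unit term and a (Taylor-approximated) matrix exponential applied to h is a contraction in the supremum norm on the non-target states, and Banach fixed-point iteration from any starting point converges to the same solution as solving the linear system directly. A minimal sketch with an invented subgenerator:

```python
import numpy as np

# Hypothetical subgenerator B on the two non-target states
B = np.array([[-2.0,  1.0],
              [ 3.0, -4.0]])
delta = 0.1

# Second-order Taylor approximation of e^{delta*B} (adequate for small delta)
T = np.eye(2) + delta * B + (delta * B) @ (delta * B) / 2.0
assert np.linalg.norm(T, ord=np.inf) < 1.0  # so Phi below is a contraction

def Phi(h):
    # Map whose unique fixed point solves h = delta*1 + T h
    return delta * np.ones(2) + T @ h

# Banach fixed-point iteration: geometric convergence from any starting point
h = np.zeros(2)
for _ in range(2000):
    h = Phi(h)

h_direct = np.linalg.solve(np.eye(2) - T, delta * np.ones(2))
assert np.allclose(h, h_direct, atol=1e-8)
```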
Next let us show that is monotone. To this end, fix any such that ; then clearly also . Since is such that , it follows from [Krak, 2021, Prop 4.9] that is a transition matrix. By the monotonicity of transition matrices [Krak, 2021, Prop 3.32], we find that therefore
which in turn implies that
From the definition of we therefore conclude that . Since with are arbitrary, this concludes the proof of the monotonicity of .
Now, let us define for all as
We first note that, since , it follows from [De Bock, 2017, Prop 5] that is a lower transition operator. From the conjugacy of and , we have for any that
which implies that is the upper transition operator that is conjugate to the lower transition operator . By the monotonicity of upper transition operators—see Section 3.2—this implies that is monotone, or in other words that for all with it holds that .
Let us next show that
(46)
Indeed, if then , and since , it follows from the definitions of and that then
Hence we have
Moreover, we immediately have from the definition that , and so Equation (46) indeed holds.
Next, we note that for any it holds that , and so by Equation (46) we find that
Provided that also , then using the previously established monotonicity of we obtain
We clearly have , whence
Indeed, we can repeat this reasoning for steps, to conclude that
(47) |
Now suppose ex absurdo that there is some non-negative , such that for some , and such that . Since is non-negative and , we must have that . Moreover, we clearly have , which implies that and so . Hence it follows that , and we find that . Hence is a fixed point of . Since , and using Equation (47), this implies that for any we have
Recalling that is such that , we use Equation (45) to take limits in and find that
which is a contradiction. ∎