
University of Waterloo, Waterloo, ON N2L 3G1, Canada. Email: {yiming.meng, j.liu}@uwaterloo.ca

Robustly Complete Finite-State Abstractions for Verification of Stochastic Systems

Yiming Meng    Jun Liu
Abstract

In this paper, we focus on discrete-time stochastic systems modelled by nonlinear stochastic difference equations and propose robust abstractions for verifying probabilistic linear temporal specifications. The current literature focuses on developing sound abstraction techniques for stochastic dynamics without perturbations; however, soundness has thus far only been shown for preserving the satisfaction probability of certain types of temporal-logic specifications. We present constructive finite-state abstractions for verifying probabilistic satisfaction of general $\omega$-regular linear-time properties of more general nonlinear stochastic systems. Instead of imposing stability assumptions, we analyze the probabilistic properties from the topological view of the metrizable space of probability measures. Such abstractions are both sound and approximately complete. That is, given a concrete discrete-time stochastic system and an arbitrarily small $\mathcal{L}_1$-perturbation of this system, there exists a family of finite-state Markov chains whose set of satisfaction probabilities contains that of the original system and is in turn contained by that of the slightly perturbed system. A direct consequence is that, given a probabilistic linear-time specification, initializing within the winning/losing region of the abstraction guarantees satisfaction/dissatisfaction for the original system. We also make the interesting observation that, unlike the deterministic case, point-mass (Dirac) perturbations cannot fulfill the purpose of robust completeness.

Keywords:
Verification of stochastic systems · Finite-state abstraction · Robustness · Soundness · Completeness · $\mathcal{L}_1$-perturbation · Linear temporal logic · Metrizable space of probability measures

1 Introduction

Formal verification is a rigorous mathematical technique for verifying system properties using formal analysis or model checking [4]. Abstraction-based formal verification for deterministic systems has by now reached maturity [5]. Whilst bisimilar (equivalent) symbolic models exist for linear (control) systems [17, 31], sound and approximately complete finite abstractions can be achieved via stability assumptions [25, 14] or robustness (in terms of Dirac perturbations) [21, 20, 22].

There is a recent surge of interest in studying formal verification for stochastic systems. The verification of temporal logics for discrete-state homogeneous Markov chains can be solved by existing tools [4, 24, 6, 9].

In terms of verification for general discrete-time continuous-state Markov systems, a common theme is to construct abstractions that approximate the probability of satisfaction in suitable ways. First attempts [26, 29, 30, 3, 2] related the verification of simple probabilistic computation tree logic (PCTL) formulas to the computation of corresponding value functions. The authors of [32, 33] developed alternative techniques to deal with the potential error blow-up in infinite-horizon problems. The same authors [35] investigated the necessity of absorbing sets for the uniqueness of the solutions of the corresponding Bellman equations; the related PCTL verification problems can then be precisely captured by finite-horizon ones. They also proposed abstractions for verifying general bounded linear-time (LT) properties [34], and extended them to infinite-horizon reach-avoid and repeated reachability problems [34, 36].

Markov set-chains can also serve as abstractions. The authors of [1] showed that the abstraction error is finite under strong stability (ergodicity) assumptions. A closely related approach is to use interval-valued Markov chains (IMCs), i.e., families of finite-state Markov chains with uncertain transitions, as abstractions of continuous-state Markov systems with a given transition kernel. The authors of [18] argued, without proof, that for every PCTL formula the probability of (path) satisfaction of the IMC abstraction forms a compact interval which contains the true probability of the original system. They further developed ‘O’-maximizing/minimizing algorithms based on [15, 40] to obtain upper/lower bounds on the satisfaction probability of ‘next’, ‘bounded until’, and ‘until’ properties. The algorithm provides a fundamental view of computing bounds on the satisfaction probability given an IMC. However, the intuitive reasoning for soundness appears inaccurate based on our observation (readers interested in the details are referred to Remark 8 of this paper). Inspired by [18], the work in [7] formulated IMC abstractions for verifying bounded-LTL specifications; the work in [11, 12] constructed IMC abstractions for verifying general $\omega$-regular properties of mixed-monotone systems and provided a novel automata-based approach for obtaining bounds on the satisfaction probability. In [11, 12, Fact 1], the authors claimed the soundness of verifying general $\omega$-regular properties using IMC abstractions, but a proof is not provided. In [10], the authors showed that IMCs can be used to provide conservative estimates of expected values for stochastic linear control systems. The authors also remarked that their result can be extended to deal with $\omega$-regular properties based on [15, 19] (where [19] only concerns safety properties), but without any proofs. To the best of the authors' knowledge, a general framework, such as the one presented in this paper, for guaranteeing soundness of IMC abstractions for verifying $\omega$-regular properties is currently lacking.

Motivated by these issues, our first contribution is a formal mathematical proof of the soundness of IMC abstractions for verifying $\omega$-regular linear-time properties. We show that, for any discrete-time stochastic dynamical system modelled by a stochastic difference equation and any linear-time property, an IMC abstraction returns a compact interval of probabilities of (path) satisfaction which contains the satisfaction probability of the original system. A direct consequence is that starting within the winning/losing region computed from the abstraction guarantees satisfaction/dissatisfaction for the original system. The second contribution of this paper is to deal with stochastic systems subject to additional uncertain perturbations (due to, e.g., measurement errors or modelling uncertainties). Under mild assumptions, we show that, for verifying probabilistic satisfaction of general $\omega$-regular linear-time properties, IMC abstractions that are both sound and approximately complete can be constructed for nonlinear stochastic systems. That is, given a concrete discrete-time continuous-state Markov system $\mathbb{X}$ and an arbitrarily small $\mathcal{L}_1$-bounded perturbation of this system, there exists an IMC abstraction whose set of satisfaction probabilities contains that of $\mathbb{X}$ and is in turn contained by that of the slightly perturbed system. We argue in Section 4 that, to make the IMC abstraction robustly complete, the perturbation generally needs to be $\mathcal{L}_1$-bounded rather than bounded only in terms of point masses. We analyze the probabilistic properties based on the topology of the metrizable space of (uncertain) probability measures, and show that this technique is more powerful than reasoning purely about the values of probabilities. We would also like to clarify that the main purpose of this paper is not to provide more efficient algorithms for computing abstractions. We aim to provide a theoretical foundation for IMC abstractions for verifying continuous-state stochastic systems with perturbations, and hope to shed some light on the design of more powerful robust verification algorithms.

The rest of the paper is organized as follows. Section 2 presents preliminaries on probability spaces and Markov systems. Section 3 establishes the soundness of abstractions for verifying $\omega$-regular linear-time properties of discrete-time nonlinear stochastic systems. Section 4 presents constructive robust abstractions with soundness and approximate completeness guarantees, and discusses the differences in robustness between deterministic and stochastic systems. The paper is concluded in Section 5.

Notation: We denote by $\prod$ the product of ordinary sets, spaces, or function values, and by $\otimes$ the product of collections of sets, sigma-algebras, or measures. The $n$-times repeated product of any kind is denoted by $(\cdot)^n$ for simplicity. We denote by $\pi_j:\prod_{i=0}^{\infty}(\cdot)_i\rightarrow(\cdot)_j$ the projection onto the $j^{\text{th}}$ component, and by $\mathscr{B}(\cdot)$ the Borel $\sigma$-algebra of a set.

Let $|\cdot|$ denote the infinity norm on $\mathbb{R}^n$, and let $\mathbb{B}:=\{x\in\mathbb{R}^n:|x|\leq 1\}$. We denote by $\|\cdot\|_1:=\mathcal{E}|\cdot|$ the $\mathcal{L}_1$-norm for $\mathbb{R}^n$-valued random variables, and let $\mathbb{B}_1:=\{X:\mathbb{R}^n\text{-valued random variable with }\|X\|_1\leq 1\}$. Given a matrix $M$, we denote by $M_i$ its $i^{\text{th}}$ row and by $M_{ij}$ its entry at the $i^{\text{th}}$ row and $j^{\text{th}}$ column.

Given a general state space $\mathcal{X}$, we denote by $\mathfrak{P}(\mathcal{X})$ the space of probability measures on $\mathcal{X}$. The space of bounded continuous functions on $\mathcal{X}$ is denoted by $C_b(\mathcal{X})$. For any stochastic process $\{X_t\}_{t\geq 0}$, we use the shorthand notation $X:=\{X_t\}_{t\geq 0}$. For any stopped process $\{X_{t\wedge\tau}\}_{t\geq 0}$, where $\tau$ is a stopping time, we use the shorthand notation $X^{\tau}$.

2 Preliminaries

We consider $\mathbb{N}=\{0,1,\cdots\}$ as the discrete time index set and a general Polish (complete and separable metric) space $\mathcal{X}$ as the state space. For any discrete-time $\mathcal{X}^{\infty}$-valued stochastic process $X$, we introduce some standard concepts as follows.

2.1 Canonical sample space

Let $X$ be a stochastic process defined on some (possibly unknown) probability space $(\Omega^{\dagger},\mathscr{F}^{\dagger},\mathbb{P}^{\dagger})$. For $\varpi\in\mathcal{X}^{\infty}=:\Omega$ and $t\in\mathbb{N}$, we define $\varpi_t:=\pi_t(\varpi)$ and the coordinate process $\mathfrak{X}_t:\mathcal{X}^{\infty}\rightarrow\mathcal{X}$ by $\mathfrak{X}_t(\varpi):=\varpi_t$, associated with $\mathcal{F}:=\sigma\{\mathfrak{X}_0,\mathfrak{X}_1,\cdots\}$. Then $\Omega^{\dagger}\rightarrow\mathcal{X}^{\infty}$, $\omega^{\dagger}\mapsto\prod_{t=0}^{\infty}X_t(\omega^{\dagger})$, is a measurable map from $(\Omega^{\dagger},\mathscr{F}^{\dagger})$ to $(\Omega,\mathcal{F})$. In particular, $\mathcal{F}=\sigma\{\mathfrak{X}_t\in\Gamma,\;\Gamma\in\mathscr{B}(\mathcal{X}),\;t\in\mathbb{N}\}=\mathscr{B}(\mathcal{X}^{\infty})=\mathscr{B}^{\infty}(\mathcal{X})=\sigma\{\mathcal{C}\}$, where $\mathcal{C}$ is the collection of all finite-dimensional cylinder sets of the following form:

\prod_{i=1}^{n}\Gamma_{i}=\{\varpi:\mathfrak{X}_{t_{1}}(\varpi)\in\Gamma_{1},\cdots,\mathfrak{X}_{t_{n}}(\varpi)\in\Gamma_{n},\;\Gamma_{i}\in\mathscr{B}(\mathcal{X}),\,t_{i}\in\mathbb{N},\,i=1,\cdots,n\}.

The measure $\mathcal{P}:=\mathbb{P}^{\dagger}\circ X^{-1}$ of the coordinate process $\mathfrak{X}$ is then uniquely determined and gives the probability law of the process $X$ on the product state space, i.e.,

\mathcal{P}[\mathfrak{X}_{t_{1}}\in\Gamma_{1},\cdots,\mathfrak{X}_{t_{n}}\in\Gamma_{n}]=\mathcal{P}\left(\prod_{i=1}^{n}\Gamma_{i}\right)=\mathbb{P}^{\dagger}[X_{t_{1}}\in\Gamma_{1},\cdots,X_{t_{n}}\in\Gamma_{n}] (1)

for any finite-dimensional cylinder set $\prod_{i=1}^{n}\Gamma_i\in\mathcal{F}$. We call $(\Omega,\mathcal{F},\mathcal{P})$ the canonical space of $X$ and denote by $\mathcal{E}$ the associated expectation operator.

Since we only care about the probabilistic behavior of trajectories in the state space, we prefer to work on the canonical probability space $(\Omega,\mathcal{F},\mathcal{P})$ and regard events as sets of sample paths. To this end, we do not distinguish $\mathfrak{X}$ from $X$, since they are identical in distribution; i.e., we use $X$ to denote its own coordinate process for simplicity.

In the context of a discrete state space $\mathcal{X}$, we specifically use the boldface notation $(\mathbf{\Omega},\mathbf{F},\mathbf{P})$ for the canonical space of a discrete-state process.

Remark 1

We usually denote by $\nu_i$ the marginal distribution of $\mathcal{P}$ at some $i\in\mathbb{N}$. We can informally write the $n$-dimensional distribution (on $n$-dimensional cylinder sets) as $\mathcal{P}(\cdot)=\otimes_{i=1}^{n}\nu_i(\cdot)$ regardless of the dependence.

2.2 Markov transition systems

For any discrete-time stochastic process $X$, we set $\mathcal{F}_t=\sigma\{X_0,X_1,\cdots,X_t\}$ to be the natural filtration.

Definition 1 (Markov process)

A stochastic process $X$ is said to be a Markov process if each $X_t$ is $\mathcal{F}_t$-adapted and, for any $\Gamma\in\mathscr{B}(\mathcal{X})$ and $t>s$, we have

\mathcal{P}[X_{t}\in\Gamma\;|\;\mathcal{F}_{s}]=\mathcal{P}[X_{t}\in\Gamma\;|\;\sigma\{X_{s}\}],\;\;\text{a.s.} (2)

Correspondingly, for every $t$, we define the transition probability as

\Theta_{t}(x,\Gamma):=\mathcal{P}[X_{t+1}\in\Gamma\;|\;X_{t}=x],\;\;\Gamma\in\mathscr{B}(\mathcal{X}). (3)

We denote by $\Theta_t:=\{\Theta_t(x,\Gamma):\;x\in\mathcal{X},\;\Gamma\in\mathscr{B}(\mathcal{X})\}$ the family of transition probabilities at time $t$. Note that homogeneous Markov processes are the special case in which $\Theta_t=\Theta_s$ for all $t\neq s$.

We are interested in Markov processes with discrete observations of the states, obtained by assigning abstract labels from a finite set of atomic propositions. We define an abstract family of labelled Markov processes as follows.

Definition 2 (Markov system)

A Markov system is a tuple $\mathbb{X}=(\mathcal{X},[\![\Theta]\!],\Pi,L)$, where

  • $\mathcal{X}=\mathcal{W}\cup\Delta$, where $\mathcal{W}$ is a bounded working space and $\Delta:=\mathcal{W}^c$ represents all out-of-domain states;

  • $[\![\Theta]\!]$ is a collection of transition probabilities from which $\Theta_t$ is chosen for every $t$;

  • $\Pi$ is the finite set of atomic propositions;

  • $L:\mathcal{X}\rightarrow 2^{\Pi}$ is the (Borel-measurable) labelling function.

For $X\in\mathbb{X}$ with $X_0=x_0$ a.s., we denote by $\mathcal{P}_X^{x_0}$ its law and by $\{\mathcal{P}_X^{x_0}\}_{X\in\mathbb{X}}$ the collection of such laws. Similarly, for any initial distribution $\nu_0\in\mathfrak{P}(\mathcal{X})$, we define the law by $\mathcal{P}_X^{\nu_0}(\cdot)=\int_{\mathcal{X}}\mathcal{P}_X^{x}(\cdot)\nu_0(dx)$ and denote its collection by $\{\mathcal{P}_X^{\nu_0}\}_{X\in\mathbb{X}}$. We denote by $\{\mathcal{P}_n^{x_0}\}_{n=0}^{\infty}$ (resp. $\{\mathcal{P}_n^{\nu_0}\}_{n=0}^{\infty}$) a sequence in $\{\mathcal{P}_X^{x_0}\}_{X\in\mathbb{X}}$ (resp. $\{\mathcal{P}_X^{\nu_0}\}_{X\in\mathbb{X}}$). We simply write $\mathcal{P}_X$ (resp. $\{\mathcal{P}_X\}_{X\in\mathbb{X}}$) if we do not emphasize the initial condition.

For a path $\varpi:=\varpi_0\varpi_1\varpi_2\cdots\in\Omega$, we define its trace by $L_{\varpi}:=L(\varpi_0)L(\varpi_1)L(\varpi_2)\cdots$. The space of infinite words is denoted by

(2^{\Pi})^{\omega}=\{A_{0}A_{1}A_{2}\cdots:A_{i}\in 2^{\Pi},\;i=0,1,2,\cdots\}.

A linear-time (LT) property is a subset of $(2^{\Pi})^{\omega}$. We are only interested in LT properties $\Psi$ such that $\Psi\in\mathscr{B}((2^{\Pi})^{\omega})$, i.e., those that are Borel measurable.

Remark 2

Note that, by [36] and [38, Proposition 2.3], any $\omega$-regular language of labelled Markov processes is measurable. It follows that, for any Markov process $X$ of the given $\mathbb{X}$, the traces $L_{\varpi}$ generated by measurable labelling functions are also measurable. For each $\Psi\in\mathscr{B}((2^{\Pi})^{\omega})$, we have the event $L_{\varpi}^{-1}(\Psi)\in\mathcal{F}$.

A particular subclass of LT properties can be specified by linear temporal logic (LTL). (While we consider LTL due to our interest, it can easily be seen that all results of this paper in fact hold for any measurable LT property, including $\omega$-regular specifications.) To connect with LTL specifications, we introduce the semantics of path satisfaction as well as probabilistic satisfaction as follows.

Definition 3

For the syntax of LTL formulae $\Psi$ and the semantics of satisfaction of $\Psi$ on infinite words, we refer readers to [21, Section 2.4].

For a given labelled Markov process $X$ from $\mathbb{X}$ with initial distribution $\nu_0$, we formulate the canonical space $(\Omega,\mathcal{F},\mathcal{P}_X^{\nu_0})$. For a path $\varpi\in\Omega$, we define path satisfaction as

\varpi\vDash\Psi\Longleftrightarrow L_{\varpi}\vDash\Psi.

We denote by $\{X\vDash\Psi\}:=\{\varpi:\;\varpi\vDash\Psi\}\in\mathcal{F}$ the event of path satisfaction. Given a specified probability $\rho\in[0,1]$, we define the probabilistic satisfaction of $\Psi$ as

X\vDash\mathcal{P}^{\nu_{0}}_{\bowtie\rho}[\Psi]\Longleftrightarrow\mathcal{P}_{X}^{\nu_{0}}\{X\vDash\Psi\}\bowtie\rho,

where $\bowtie\,\in\{\leq,<,\geq,>\}$.

2.3 Weak convergence and Prokhorov’s theorem

We consider the set of possible uncertain measures within the topological space of probability measures. The following concepts are frequently used later.

Definition 4 (Tightness of set of measures)

Let $\mathcal{X}$ be any topological state space and $M\subseteq\mathfrak{P}(\mathcal{X})$ be a set of probability measures on $\mathcal{X}$. We say that $M$ is tight if, for every $\varepsilon>0$, there exists a compact set $K\subset\mathcal{X}$ such that $\mu(K)\geq 1-\varepsilon$ for every $\mu\in M$.

Definition 5 (Weak convergence)

A sequence $\{\mu_n\}_{n=0}^{\infty}\subseteq\mathfrak{P}(\mathcal{X})$ is said to converge weakly to a probability measure $\mu$, denoted by $\mu_n\Rightarrow\mu$, if

\int_{\mathcal{X}}h(x)\mu_{n}(dx)\rightarrow\int_{\mathcal{X}}h(x)\mu(dx),\;\;\forall h\in C_{b}(\mathcal{X}). (4)

We frequently use the following alternative condition [8, Proposition 2.2]:

\mu_{n}(A)\rightarrow\mu(A),\;\;\forall A\in\mathscr{B}(\mathcal{X})\;\text{s.t.}\;\mu(\partial A)=0. (5)

Correspondingly, two measures $\mu$ and $\nu$ on $\mathcal{X}$ are weakly equivalent if

\int_{\mathcal{X}}h(x)\mu(dx)=\int_{\mathcal{X}}h(x)\nu(dx),\;\;\forall h\in C_{b}(\mathcal{X}). (6)
Remark 3

Weak convergence describes the weak topology, which extends the usual notion of convergence in the deterministic setting. Note that $x_n\rightarrow x$ in $\mathcal{X}$ is equivalent to the weak convergence of the Dirac measures $\delta_{x_n}\Rightarrow\delta_x$. It is interesting to note that $x_n\rightarrow x$ (resp. $x=y$) in $\mathcal{X}$ does not imply the strong convergence (resp. equivalence) of the associated Dirac measures. A classical counterexample is $x_n=1/n$ and $x=0$: we do not have $\lim_{n\rightarrow\infty}\delta_{1/n}=\delta_0$ in the strong sense, since $0=\lim_{n\rightarrow\infty}\delta_{1/n}(\{0\})\neq\delta_0(\{0\})=1$.

To describe the convergence (in probability law) of more general random variables $\{X_n\}$ in $\mathcal{X}$, it is equivalent to investigate the weak convergence of their associated measures $\{\mu_n\}$. It is also straightforward from Definition 5 that weak convergence describes the convergence of probabilistic properties related to $\{\mu_n\}$.

Theorem 1 (Prokhorov)

Let $\mathcal{X}$ be a complete separable metric space. A family $\Lambda\subseteq\mathfrak{P}(\mathcal{X})$ is relatively compact if and only if it is tight. Consequently, for each sequence $\{\mu_n\}$ in a tight family $\Lambda$, there exist a $\mu\in\bar{\Lambda}$ and a subsequence $\{\mu_{n_k}\}$ such that $\mu_{n_k}\Rightarrow\mu$.

Remark 4

The first part of Prokhorov's theorem provides an alternative criterion, via tightness, for verifying the relative compactness of a family of measures w.r.t. the corresponding metric space. On a compact metric space $\mathcal{X}$, every family of probability measures is tight.

2.4 Discrete-time continuous-state stochastic systems

We define Markov processes determined by the difference equation

X_{t+1}=f(X_{t})+b(X_{t})w_{t}+\vartheta\xi_{t} (7)

where the state $X_t(\varpi)\in\mathcal{X}\subseteq\mathbb{R}^n$ for all $t\in\mathbb{N}$, and the stochastic inputs $\{w_t\}_{t\in\mathbb{N}}$ are i.i.d. Gaussian random variables with covariance $I_{k\times k}$, without loss of generality. The mappings $f:\mathbb{R}^n\rightarrow\mathbb{R}^n$ and $b:\mathbb{R}^n\rightarrow\mathbb{R}^{n\times k}$ are locally Lipschitz continuous. The memoryless perturbations $\xi_t\in\mathbb{B}_1$ are independent random variables with intensity $\vartheta\geq 0$ and unknown distributions.

For $\vartheta\neq 0$, (7) defines a family $\mathbb{X}$ of Markov processes $X$. A special case of (7) is when $\xi$ has Dirac (point-mass) distributions $\{\delta_x:x\in\mathbb{B}\}$ centered at uncertain points within the unit ball.

Remark 5

The discrete-time stochastic dynamics (7) are usually obtained from numerical schemes for stochastic differential equations driven by Brownian motions, so as to simulate the probability laws at the observation times. Gaussian random variables are the natural choice for simulating Brownian motions at discrete times. Note that in [11], random variables with a known unimodal symmetric density supported on an interval are used instead. That choice favors mixed-monotone models, which provide a more accurate approximation of transition probabilities. Other than the precision issue, such a choice does not bring additional $\mathcal{L}_1$ properties. Since we focus on formal analysis based on $\mathcal{L}_1$ properties rather than on accurate approximation, using Gaussian randomness as a realization does not incur any loss of generality.

We only care about the behaviors within the bounded working space $\mathcal{W}$. By defining the stopping time $\tau:=\inf\{t\in\mathbb{N}:X_t\notin\mathcal{W}\}$ for each $X$, we can study the probability law of the corresponding stopped (killed) process $X^{\tau}$ for any initial condition $x_0$ (resp. $\nu_0$), which coincides with $\mathcal{P}_X^{x_0}$ (resp. $\mathcal{P}_X^{\nu_0}$) on $\mathcal{W}$. To avoid extra notation, we use the same symbols $X$ and $\mathcal{P}_X^{x_0}$ (resp. $\mathcal{P}_X^{\nu_0}$) to denote the stopped processes and the associated laws. Such processes driven by (7) can be written as a Markov system

\mathbb{X}=(\mathcal{X},[\![\mathcal{T}]\!],\Pi,L), (8)

where, for all $x\in\mathcal{X}\setminus\mathcal{W}$, the transition probability satisfies $\mathcal{T}(x,\Gamma)=0$ for all $\Gamma$ with $\Gamma\cap\mathcal{W}\neq\emptyset$, and $[\![\mathcal{T}]\!]$ is the collection of transition probabilities. For $\xi$ having Dirac distributions, the transition $\mathcal{T}$ is of the following form:

\mathcal{T}(x,\cdot)\in\left\{\begin{array}{ll}\{\mu\sim\mathcal{N}(f(x)+\vartheta\xi,\;b(x)b^{T}(x)),\;\xi\in\mathbb{B}\},&\forall x\in\mathcal{W},\\ \{\mu:\;\mu(\Gamma)=0,\;\forall\Gamma\cap\mathcal{W}\neq\emptyset\},&\forall x\in\mathcal{X}\setminus\mathcal{W}.\end{array}\right. (9)
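To illustrate (9) concretely, the following is a minimal one-dimensional sketch computing the probability that the Gaussian transition kernel sends a state into a rectangular cell; the dynamics f, b, the working space, and all numerical values are illustrative assumptions and not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

# 1-D illustration of (9): T(x, [l, u]) = P[f(x) + b(x) w + theta*xi in [l, u]],
# with w ~ N(0, 1) and a fixed Dirac perturbation xi in [-1, 1].
def transition_prob(x, l, u, f, b, theta=0.0, xi=0.0):
    mean = f(x) + theta * xi              # mean of N(f(x) + theta*xi, b(x)^2)
    std = abs(b(x))
    return norm.cdf(u, loc=mean, scale=std) - norm.cdf(l, loc=mean, scale=std)

# Illustrative dynamics (assumptions, not from the paper).
f = lambda x: 0.8 * x + 0.1
b = lambda x: 0.05 + 0.01 * x

# Probability of landing in the cell [0.4, 0.5] from x = 0.5 under the two
# extreme Dirac perturbations xi = -1 and xi = +1.
for xi in (-1.0, 1.0):
    p = transition_prob(0.5, 0.4, 0.5, f, b, theta=0.02, xi=xi)
    print(f"xi = {xi:+.0f}:  T(0.5, [0.4, 0.5]) = {p:.4f}")
```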
Assumption 1

We assume that $\textbf{in}\in L(x)$ for any $x\notin\Delta$ and $\textbf{in}\notin L(\Delta)$. We can also include 'always $(\textbf{in})$' in the specifications so as to observe sample paths with 'inside-domain' behaviors, which is equivalent to verifying $\{\tau=\infty\}$.

2.5 Robust abstractions

We define a notion of abstraction between continuous-state and finite-state Markov systems via state-level relations and measure-level relations.

Definition 6

A (binary) relation $\gamma$ from $A$ to $B$ is a subset of $A\times B$. We write (i) for each $a\in A$, $\gamma(a):=\{b\in B:(a,b)\in\gamma\}$; (ii) for each $b\in B$, $\gamma^{-1}(b):=\{a\in A:(a,b)\in\gamma\}$; (iii) for $A'\subseteq A$, $\gamma(A')=\cup_{a\in A'}\gamma(a)$; and (iv) for $B'\subseteq B$, $\gamma^{-1}(B')=\cup_{b\in B'}\gamma^{-1}(b)$.

Definition 7

Given a continuous-state Markov system

\mathbb{X}=(\mathcal{X},[\![\mathcal{T}]\!],\Pi,L)

and a finite-state Markov system

\mathbb{I}=(\mathcal{Q},[\![\Theta]\!],\Pi,L_{\mathbb{I}}),

where $Q=(q_1,\cdots,q_n)^T$ and $[\![\Theta]\!]$ is a collection of $n\times n$ stochastic matrices, a state-level relation $\alpha\subseteq\mathcal{X}\times Q$ is said to be an abstraction from $\mathbb{X}$ to $\mathbb{I}$ if (i) for all $x\in\mathcal{X}$, there exists $q\in Q$ such that $(x,q)\in\alpha$; (ii) for all $(x,q)\in\alpha$, $L_{\mathbb{I}}(q)=L(x)$.

A measure-level relation $\gamma_{\alpha}\subseteq\mathfrak{P}(\mathcal{X})\times\mathfrak{P}(Q)$ is said to be an abstraction from $\mathbb{X}$ to $\mathbb{I}$ if for all $i\in\{1,2,\cdots,n\}$, all $\mathcal{T}\in[\![\mathcal{T}]\!]$, and all $x\in\alpha^{-1}(q_i)$, there exists $\Theta\in[\![\Theta]\!]$ such that $(\mathcal{T}(x,\cdot),\Theta_i)\in\gamma_{\alpha}$ and $\mathcal{T}(x,\alpha^{-1}(q_j))=\Theta_{ij}$ for all $j\in\{1,2,\cdots,n\}$.

Similarly, $\gamma_{\alpha}\subseteq\mathfrak{P}(Q)\times\mathfrak{P}(\mathcal{X})$ is said to be an abstraction from $\mathbb{I}$ to $\mathbb{X}$ if for all $i\in\{1,2,\cdots,n\}$, all $\Theta\in[\![\Theta]\!]$, and all $x\in\alpha^{-1}(q_i)$, there exists $\mathcal{T}\in[\![\mathcal{T}]\!]$ such that $(\Theta_i,\mathcal{T}(x,\cdot))\in\gamma_{\alpha}$ and $\mathcal{T}(x,\alpha^{-1}(q_j))=\Theta_{ij}$ for all $j\in\{1,2,\cdots,n\}$.

If such relations $\alpha$ and $\gamma_{\alpha}$ exist, we say that $\mathbb{I}$ abstracts $\mathbb{X}$ (resp. $\mathbb{X}$ abstracts $\mathbb{I}$) and write $\mathbb{X}\preceq_{\gamma_{\alpha}}\mathbb{I}$ (resp. $\mathbb{I}\preceq_{\gamma_{\alpha}}\mathbb{X}$).

Assumption 2

Without loss of generality, we assume that the labelling function is amenable to a rectangular partition (see, e.g., [11, Definition 1]). In other words, a state-level abstraction can be obtained from a rectangular partition.

3 Soundness of Robust IMC Abstractions

IMCs (whose formal definition we omit due to space limitations; see, e.g., [18, Definition 3]) are quasi-Markov systems on a discrete state space with over/under-approximations ($\hat{\Theta}$/$\check{\Theta}$) of the true transition matrices. To abstract the transition probabilities of the continuous-state Markov system (8), $\hat{\Theta}$ and $\check{\Theta}$ are obtained from over/under-approximations of $\mathcal{T}$ based on the state-space partition. Throughout this section, we assume that $\hat{\Theta}$ and $\check{\Theta}$ have been constructed accordingly.

Given an IMC, we recast it to a finite-state Markov system

\mathbb{I}=(\mathcal{Q},[\![\Theta]\!],\Pi,L_{\mathbb{I}}), (10)

where

  • $\mathcal{Q}$ is the finite state-space partition of cardinality $N+1$ containing $\{\Delta\}$, i.e., $Q=(q_1,q_2,\cdots,q_N,\Delta)^T$;

  • $[\![\Theta]\!]$ is a set of stochastic matrices satisfying

    [\![\Theta]\!]=\{\Theta:\text{stochastic matrices with}\;\check{\Theta}\leq\Theta\leq\hat{\Theta}\;\text{componentwise}\}; (11)

    (restricting $[\![\Theta]\!]$ to stochastic matrices is a necessary step to guarantee proper probability measures in (12); algorithms can be found in [16] or [18, Section V-A]);

  • $\Pi,L_{\mathbb{I}}$ are as before.

To make $\mathbb{I}$ an abstraction of (8), we need the approximation to satisfy $\check{\Theta}_{ij}\leq\int_{q_j}\mathcal{T}(x,dy)\leq\hat{\Theta}_{ij}$ for all $x\in q_i$ and $i,j=1,\cdots,N$, as well as $\Theta_{N+1}=(0,0,\cdots,1)$. We further require that the partition respects the boundaries induced by the labelling function, i.e., for any $q\in Q$,

L_{\mathbb{I}}(q):=L(x),\;\forall x\in q.

Clearly, the above connections between the states and transition probabilities satisfy Definition 7.
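As a concrete illustration, the following sketch over/under-approximates the cell-to-cell transition probabilities for a one-dimensional Gaussian kernel by sampling the source cell on a grid; a sound implementation would use monotonicity or interval arithmetic instead of sampling, and the dynamics f, b and partition below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# 1-D illustration: bound T(x, q_j) = P[f(x) + b(x) w in q_j] over all x in q_i,
# yielding check_Theta[i, j] <= T(x, q_j) <= hat_Theta[i, j] for all x in q_i.
f = lambda x: 0.8 * x + 0.1
b = lambda x: 0.05 + 0.01 * x

cells = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]   # partition of W = [0, 1]
N = len(cells)
hat_Theta = np.zeros((N + 1, N + 1))     # last index is the out-of-domain state Delta
check_Theta = np.zeros((N + 1, N + 1))

def cell_prob(x, l, u):
    return norm.cdf(u, loc=f(x), scale=abs(b(x))) - norm.cdf(l, loc=f(x), scale=abs(b(x)))

for i, (li, ui) in enumerate(cells):
    xs = np.linspace(li, ui, 50)                       # grid over the source cell (approximation)
    for j, (lj, uj) in enumerate(cells):
        vals = [cell_prob(x, lj, uj) for x in xs]
        check_Theta[i, j], hat_Theta[i, j] = min(vals), max(vals)
    out = [1.0 - cell_prob(x, 0.0, 1.0) for x in xs]   # mass leaving W goes to Delta
    check_Theta[i, N], hat_Theta[i, N] = min(out), max(out)

hat_Theta[N, N] = check_Theta[N, N] = 1.0              # Delta is absorbing: Theta_{N+1} = (0,...,0,1)
print(np.round(hat_Theta, 3))
```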

The Markov system $\mathbb{I}$ is understood as a family of 'perturbed' Markov chains generated by the uncertain choice of $\Theta$ at each $t$. The $n$-step transition matrices are derived from $[\![\Theta]\!]$ as

\begin{split}[\![\Theta^{(2)}]\!]&=\{\Theta_{0}\Theta_{1}:\;\Theta_{0},\Theta_{1}\in[\![\Theta]\!]\},\\ &\cdots\\ [\![\Theta^{(n)}]\!]&=\{\Theta_{0}\Theta_{1}\cdots\Theta_{n-1}:\;\Theta_{i}\in[\![\Theta]\!],\;i=0,1,\cdots,n-1\}.\end{split}

Given an initial distribution $\mu_0\in\mathfrak{P}(Q)$, the marginal probability measures at each $t$ form the set

\mathfrak{P}(Q)\supseteq\mathscr{M}_{t}^{\mu_{0}}:=\{\mu_{t}=(\Theta^{(t)})^{T}\mu_{0}:\;\Theta^{(t)}\in[\![\Theta^{(t)}]\!]\}. (12)

If we do not emphasize the initial distribution $\mu_0$, we also write $\mathscr{M}_t$ for short.

We aim to show the soundness of robust IMC abstractions in this section. The proofs of the results in this section are given in Appendix 0.A.

3.1 Weak compactness of marginal space t\mathscr{M}_{t} of probabilities

The following lemma, rephrased from [39, Theorem 2], shows the structure of $\mathscr{M}_t$ for each $t\in\mathbb{N}$ and any initial distribution $\mu_0$.

Lemma 1

Let $\mathbb{I}$ be a Markov system of the form (10) derived from an IMC. Then the set $\mathscr{M}_t$ of all possible probability measures at each time $t\in\mathbb{N}$ is a convex polytope, and hence compact. The vertices of $\mathscr{M}_t$ are of the form

(V_{i_{t}})^{T}\cdots(V_{i_{2}})^{T}(V_{i_{1}})^{T}\mu_{0} (13)

for some vertices $V_{i_j}$ of $[\![\Theta]\!]$.

Example 1

Let $Q=(q_1,q_2,q_3)^T$ and $\mu_0=(1,0,0)^T$. The under/over-estimations of the transition matrices are given as

\check{\Theta}=\begin{bmatrix}\frac{1}{4}&0&\frac{1}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix},\;\;\;\hat{\Theta}=\begin{bmatrix}\frac{3}{4}&0&\frac{3}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix}.

Then $[\![\Theta]\!]$ forms a convex set of stochastic matrices with vertices

V_{1}=\begin{bmatrix}\frac{1}{4}&0&\frac{3}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix},\;\;\;V_{2}=\begin{bmatrix}\frac{3}{4}&0&\frac{1}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix}.

Therefore, the vertices of $\mathscr{M}_1$ are

\nu_{1}^{(1)}=(V_{1})^{T}\mu_{0}=(\tfrac{1}{4},0,\tfrac{3}{4})^{T},\;\;\nu_{1}^{(2)}=(V_{2})^{T}\mu_{0}=(\tfrac{3}{4},0,\tfrac{1}{4})^{T}.

Hence, $\mathscr{M}_1=\{\mu:\mu=\alpha\nu_1^{(1)}+(1-\alpha)\nu_1^{(2)},\;\alpha\in[0,1]\}$. Similarly, the vertices of $\mathscr{M}_2$ are

\begin{split}&\nu_{2}^{(1)}=(V_{1})^{T}(V_{1})^{T}\mu_{0}=(\tfrac{1}{16},\tfrac{12}{16},\tfrac{3}{16})^{T},\;\;\nu_{2}^{(2)}=(V_{2})^{T}(V_{1})^{T}\mu_{0}=(\tfrac{3}{16},\tfrac{12}{16},\tfrac{1}{16})^{T},\\ &\nu_{2}^{(3)}=(V_{1})^{T}(V_{2})^{T}\mu_{0}=(\tfrac{3}{16},\tfrac{4}{16},\tfrac{9}{16})^{T},\;\;\nu_{2}^{(4)}=(V_{2})^{T}(V_{2})^{T}\mu_{0}=(\tfrac{9}{16},\tfrac{4}{16},\tfrac{3}{16})^{T},\end{split}

and

\mathscr{M}_{2}=\{\mu:\mu=\alpha\beta\nu_{2}^{(1)}+\alpha(1-\beta)\nu_{2}^{(2)}+\beta(1-\alpha)\nu_{2}^{(3)}+(1-\alpha)(1-\beta)\nu_{2}^{(4)},\;\alpha,\beta\in[0,1]\}.

The calculation of the remaining $\mathscr{M}_t$ follows the same procedure.
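The vertex computation in Example 1 can be checked numerically; the short sketch below reproduces the vertices of $\mathscr{M}_1$ and $\mathscr{M}_2$ via (13).

```python
import numpy as np

# Vertices of [[Theta]] from Example 1 and the initial distribution mu_0.
V1 = np.array([[0.25, 0.0, 0.75],
               [0.0,  0.0, 1.0 ],
               [0.0,  1.0, 0.0 ]])
V2 = np.array([[0.75, 0.0, 0.25],
               [0.0,  0.0, 1.0 ],
               [0.0,  1.0, 0.0 ]])
mu0 = np.array([1.0, 0.0, 0.0])

# Vertices of M_1: (V_i)^T mu_0.
print("M_1 vertices:", V1.T @ mu0, V2.T @ mu0)

# Vertices of M_2: (V_{i_2})^T (V_{i_1})^T mu_0, as in (13).
for Vi in (V1, V2):          # first-step vertex
    for Vj in (V1, V2):      # second-step vertex
        print("M_2 vertex:", Vj.T @ (Vi.T @ mu0))
```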

We now introduce the total variation distance $\|\cdot\|_{\text{TV}}$ and show how $(\mathscr{M}_t,\|\cdot\|_{\text{TV}})$ (at each $t$) induces the weak topology.

Definition 8 (Total variation distance)

Given two probability measures μ\mu and ν\nu on 𝒳\mathcal{X}, the total variation distance is defined as

μνTV=2supΓ(𝒳)|μ(Γ)ν(Γ)|.\left\|\mu-\nu\right\|_{\text{TV}}=2\sup_{\Gamma\in\mathscr{B}(\mathcal{X})}|\mu(\Gamma)-\nu(\Gamma)|. (14)

In particular, if 𝒳\mathcal{X} is a discrete space,

μνTV=μν1=q𝒳|μ(q)ν(q)|.\left\|\mu-\nu\right\|_{\text{TV}}=\|\mu-\nu\|_{1}=\sum_{q\in\mathcal{X}}|\mu(q)-\nu(q)|. (15)
Corollary 1

Let $\mathbb{I}$ be a Markov system of the form (10) derived from an IMC. Then at each time $t\in\mathbb{N}$, for each $\{\mu_n\}\subseteq\mathscr{M}_t$, there exist a $\mu\in\mathscr{M}_t$ and a subsequence $\{\mu_{n_k}\}$ such that $\mu_{n_k}\Rightarrow\mu$. In addition, for each $h\in C_b(\mathcal{X})$ and $t\in\mathbb{N}$, the set $H=\{\sum_{\mathcal{X}}h(x)\mu(x),\;\mu\in\mathscr{M}_t\}$ forms a convex and compact subset of $\mathbb{R}$.

Remark 6

The above shows that $\|\cdot\|_{\text{TV}}$ metrizes the weak topology of measures on $Q$. Note that since $Q$ is bounded and finite, any metrizable family of measures on $Q$ is compact. For example, let $Q=\{q_1,q_2\}$ and let $\{(0,1)^T,(1,0)^T\}$ be a set of singular measures on $Q$. Then every sequence from this set has a weakly convergent subsequence. However, these measures do not have a convex structure as $\mathscr{M}_t$ does. Hence the corresponding $H$ generated by $\{(0,1)^T,(1,0)^T\}$ only provides vertices in $\mathbb{Z}$.

3.2 Weak compactness of probability laws of 𝕀\mathbb{I} on infinite horizon

In this subsection, we focus on the case where $I_0=q_0$ a.s. for some $q_0\in Q\setminus\{\Delta\}$. The case of an arbitrary initial distribution is similar. We denote by $\mathscr{M}^{q_0}:=\{\mathbf{P}_I^{q_0}\}_{I\in\mathbb{I}}$ the set of probability laws of the discrete-state Markov processes $I\in\mathbb{I}$ with initial state $q_0\in Q$, and by $\mathscr{M}_t^{q_0}$ the set of marginals at time $t$.

Proposition 1

For any $q_0\in Q$, every sequence $\{\mathbf{P}_n^{q_0}\}_{n=0}^{\infty}$ in $\mathscr{M}^{q_0}$ has a weakly convergent subsequence.

Remark 7

This property is an extension of the marginal weak compactness, relying on the (countable) product topology. We can also introduce suitable product metrics to metrize it; see, e.g., [28]. Similar results hold, under certain conditions, for continuous-time processes on continuous state spaces with uniform norms [27, Lemmas 82.3 and 87.3].

Theorem 2

Let $\mathbb{I}$ be a Markov system of the form (10) derived from an IMC. Then for any LTL formula $\Psi$, the set $S^{q_0}=\{\mathbf{P}_I^{q_0}(I\vDash\Psi)\}_{I\in\mathbb{I}}$ is a convex and compact subset of $\mathbb{R}$, i.e., a compact interval.

3.3 Soundness of IMC abstractions

Proposition 2

Let $\mathbb{X}$ be a Markov system driven by (8). Then every sequence $\{\mathcal{P}_n^{x_0}\}_{n=0}^{\infty}$ in $\{\mathcal{P}_X^{x_0}\}_{X\in\mathbb{X}}$ has a weakly convergent subsequence. Consequently, for any LTL formula $\Psi$, the set $\{\mathcal{P}_X^{x_0}(X\vDash\Psi)\}_{X\in\mathbb{X}}$ is a compact subset of $\mathbb{R}$.

Lemma 2

Let $X\in\mathbb{X}$ be any Markov process driven by (8) and let $\mathbb{I}$ be the finite-state IMC abstraction of $\mathbb{X}$. Suppose the initial distribution $\nu_0$ of $X$ is such that $\nu_0(q_0)=1$. Then there exists a unique law $\mathbf{P}_I^{q_0}$ of some $I\in\mathbb{I}$ such that, for any LTL formula $\Psi$,

\mathcal{P}_{X}^{\nu_{0}}(X\vDash\Psi)=\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi).
Theorem 3

Assume the setting of Lemma 2. For any LTL formula $\Psi$, we have

\mathcal{P}_{X}^{\nu_{0}}(X\vDash\Psi)\in\{\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi)\}_{I\in\mathbb{I}}.

Proof  The conclusion follows by combining Lemma 2 and Theorem 2.  

Corollary 2

Let $\mathbb{X}$, its IMC abstraction $\mathbb{I}$, an LTL formula $\Psi$, and a constant $\rho\in[0,1]$ be given. If $I\vDash\mathbf{P}^{q_0}_{\bowtie\rho}[\Psi]$ for all $I\in\mathbb{I}$, then $X\vDash\mathcal{P}^{\nu_0}_{\bowtie\rho}[\Psi]$ for all $X\in\mathbb{X}$ with $\nu_0(q_0)=1$.

Remark 8

Note that we do not have $\mathcal{P}_X^{\nu_0}\in\{\mathbf{P}_I^{q_0}\}_{I\in\mathbb{I}}$, since each $\mathbf{P}_I^{q_0}$ is a discrete measure whereas $\mathcal{P}_X^{\nu_0}$ is not; they only coincide when measuring Borel subsets in $\mathbf{F}$. It would be more accurate to state that $\mathcal{P}_X^{\nu_0}(X\vDash\Psi)$ is a member of $\{\mathbf{P}_I^{q_0}(I\vDash\Psi)\}_{I\in\mathbb{I}}$ rather than saying that "the true distribution (the law, as we usually call it) of the original system is a member of the distribution set represented by the abstraction model" [18].

Remark 9

We have seen, in view of Lemma 2, that the 'post-transition' measures are automatically related based only on the relations between transition probabilities. We will see in the next section that such relations can be constructed to guarantee approximate completeness of $\mathbb{I}$.

Proposition 3

Let $\varepsilon:=\max_i\|\hat{\Theta}_i-\check{\Theta}_i\|_{\text{TV}}$. Then, for each LTL formula $\Psi$, the length $\lambda(S^{q_0})\rightarrow 0$ as $\varepsilon\rightarrow 0$.

Remark 10

By Lemma 2, for each $X\in\mathbb{X}$ there exists exactly one $\mathbf{P}_I$ of some $I\in\mathbb{I}$ whose satisfaction probability equals that of $X$. The precision of $\hat{\Theta}$ and $\check{\Theta}$ determines the size of $S^{q_0}$. Once we are able to calculate the exact law of $X$, the set $S^{q_0}$ becomes a singleton by Proposition 3. For example, if each $w_t$ becomes $\delta_0$, then each $\mathscr{M}_t$ automatically reduces to a singleton $\{\delta_{f(x_t)}\}$. The verification problem becomes checking whether $L(f(x_t))\vDash\Psi$ given the partition $Q$, and the probability of satisfaction is either $0$ or $1$. Another example is $X_{t+1}=AX_t+Bw_t$, where $A,B$ are constant matrices: we know the exact law of this system, and there is no need to introduce an IMC approximation in the first place. IMC abstractions prove more useful when coping with systems whose marginal distributions are uncertain or not readily computable.

4 Robust Completeness of IMC Abstractions

In this section, we are given a Markov system $\mathbb{X}_1$ driven by (7) with point-mass perturbations of intensity $\vartheta_1\geq 0$. Based on $\mathbb{X}_1$, we first construct an IMC abstraction $\mathbb{I}$. We then show that $\mathbb{I}$ can be abstracted by a system $\mathbb{X}_2$ with more general $\mathcal{L}_1$-bounded noise of any intensity $\vartheta_2>\vartheta_1$.

Recalling the soundness analysis of IMC abstractions in Section 3, the relation between satisfaction probabilities is induced by a relation between the continuous and discrete transitions. To capture the probabilistic properties of stochastic processes, the reachable set of probability measures is the analogue of the reachable set in the deterministic case. We rely on a similar technique in this section to discuss how transition probabilities of different uncertain Markov systems are related. To metrize sets of Gaussian measures and to connect them with discrete measures, we use the Wasserstein metric.

Definition 9

Let $\mu,\nu\in\mathfrak{P}(\mathcal{X})$ for $(\mathcal{X},|\cdot|)$. The Wasserstein distance (formally, the $1^{\text{st}}$-Wasserstein metric, which we choose for its convexity and the nice properties of its test functions) is defined by $\|\mu-\nu\|_{\text{W}}=\inf\mathcal{E}|X-Y|$, where the infimum is taken over all joint distributions of the random variables $X$ and $Y$ with marginals $\mu$ and $\nu$, respectively. We frequently use the following dual form of the definition, where $\operatorname{Lip}(h)$ denotes the Lipschitz constant of $h$, i.e., $|h(x_2)-h(x_1)|\leq\operatorname{Lip}(h)|x_2-x_1|$:

\left\|\mu-\nu\right\|_{\text{W}}:=\sup\left\{\left|\int_{\mathcal{X}}h(x)d\mu(x)-\int_{\mathcal{X}}h(x)d\nu(x)\right|,\;h\in C(\mathcal{X}),\operatorname{Lip}(h)\leq 1\right\}.

The discrete case, $\|\cdot\|_{\text{W}}^d$, simply replaces the integrals by summations. Let $\mathbb{B}_W=\{\mu\in\mathfrak{P}(\mathcal{X}):\|\mu-\delta_0\|_{\text{W}}\leq 1\}$. Given a set $\mathfrak{G}\subseteq\mathfrak{P}(\mathcal{X})$, we denote by $\|\mu\|_{\mathfrak{G}}=\inf_{\nu\in\mathfrak{G}}\|\mu-\nu\|_{\text{W}}$ the distance from $\mu$ to $\mathfrak{G}$, and by $\mathfrak{G}+r\mathbb{B}_W:=\{\mu:\;\|\mu\|_{\mathfrak{G}}\leq r\}$ the $r$-neighborhood of $\mathfrak{G}$ (this is valid by definition).

Remark 11

Note that $\mathbb{B}_W$ is dual to $\mathbb{B}_1$: for any $\mu\in\mathbb{B}_W$, the associated random variable $X$ satisfies $\mathcal{E}|X|\leq 1$, and vice versa.

The following well-known result estimates the Wasserstein distance between two Gaussians.

Proposition 4

Let $\mu\sim\mathcal{N}(m_1,\Sigma_1)$ and $\nu\sim\mathcal{N}(m_2,\Sigma_2)$ be two Gaussian measures on $\mathbb{R}^n$. Then

|m_{1}-m_{2}|\leq\left\|\mu-\nu\right\|_{\text{W}}\leq\left(\|m_{1}-m_{2}\|_{2}^{2}+\|\Sigma_{1}^{1/2}-\Sigma_{2}^{1/2}\|_{F}^{2}\right)^{1/2}, (16)

where $\|\cdot\|_F$ is the Frobenius norm.
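The bounds in (16) are straightforward to evaluate numerically. The following is a minimal sketch using scipy.linalg.sqrtm; the means and covariances are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_W_bounds(m1, S1, m2, S2):
    """Lower/upper bounds on ||N(m1,S1) - N(m2,S2)||_W from Proposition 4 / (16)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    lower = np.max(np.abs(m1 - m2))                       # |m1 - m2| in the infinity norm
    frob = np.linalg.norm(sqrtm(S1) - sqrtm(S2), "fro")   # ||S1^{1/2} - S2^{1/2}||_F
    upper = np.sqrt(np.linalg.norm(m1 - m2) ** 2 + frob ** 2)
    return lower, upper

# Illustrative 2-D example.
m1, S1 = [0.0, 0.0], np.eye(2)
m2, S2 = [0.1, 0.2], 1.1 * np.eye(2)
print(gaussian_W_bounds(m1, S1, m2, S2))
```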

On finite state spaces, the total variation and Wasserstein distances are equivalent [13, Theorem 4]. We only state the following one-sided inequality, which is used in the later proofs.

Proposition 5

For any $\mu,\nu$ on a discrete and finite space $Q$, we have

\left\|\mu-\nu\right\|_{\text{W}}^{d}\leq\operatorname{diam}(Q)\cdot\left\|\mu-\nu\right\|_{\text{TV}}. (17)
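A quick numerical illustration of (17) on a small one-dimensional discrete space, using scipy.stats.wasserstein_distance for the discrete Wasserstein distance; the node coordinates and measures below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Discrete space Q embedded on the real line (illustrative coordinates).
nodes = np.array([0.0, 0.5, 1.0])
mu = np.array([0.25, 0.0, 0.75])
nu = np.array([0.75, 0.0, 0.25])

W = wasserstein_distance(nodes, nodes, u_weights=mu, v_weights=nu)
TV = np.sum(np.abs(mu - nu))                 # total variation as in (15)
diam = nodes.max() - nodes.min()

print(f"W = {W:.3f},  diam(Q) * TV = {diam * TV:.3f}")   # (17): W <= diam(Q) * TV
assert W <= diam * TV + 1e-12
```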

Before proceeding, we define the set of transition probabilities of $\mathbb{X}_i$ from any box $[x]\subseteq\mathbb{R}^n$ as

\mathbb{T}_{i}([x])=\{\mathcal{T}(x,\cdot):\;\mathcal{T}\in[\![\mathcal{T}]\!]_{i},\;x\in[x]\},\;\;i=1,2,

and use the following lemma to approximate $\mathbb{T}_1([x])$.

Lemma 3

Fix any $\varepsilon>0$ and any box $[x]\subseteq\mathbb{R}^n$. For all $\kappa>0$, there exists a finitely terminating algorithm to compute an over-approximation of the set of (Gaussian) transition probabilities from $[x]$ such that

\mathbb{T}_{1}([x])\subseteq\widehat{\mathbb{T}_{1}([x])}\subseteq\mathbb{T}_{1}([x])+\kappa\mathbb{B}_{W},

where $\widehat{\mathbb{T}_1([x])}$ is the computed over-approximating set of Gaussian measures.

Remark 12

The proof is given in Appendix 0.B. The lemma yields inclusions with a slightly enlarged Wasserstein radius so that no information about the covariances is lost.

Definition 10

For $i=1,2$, we introduce the modified transition probabilities for $\mathbb{X}_i=(\mathcal{X},[\![\mathcal{T}]\!]_i,x_0,\Pi,L)$ based on (9). For all $\mathcal{T}_i\in[\![\mathcal{T}]\!]_i$, let

\tilde{\mathcal{T}}_{i}(x,\Gamma)=\left\{\begin{array}{ll}\mathcal{T}_{i}(x,\Gamma),&\forall\Gamma\subseteq\mathcal{W},\;\forall x\in\mathcal{W},\\ \mathcal{T}_{i}(x,\mathcal{W}^{c}),&\Gamma=\partial\mathcal{W},\;\forall x\in\mathcal{W},\\ 1,&\Gamma=\partial\mathcal{W},\;x\in\partial\mathcal{W}.\end{array}\right. (18)

Correspondingly, let $\tilde{[\![\mathcal{T}]\!]}$ denote the collection. Likewise, we use $\tilde{(\cdot)}$ to denote the induced quantities of any other type w.r.t. such a modification.

Remark 13

We introduce this concept only for analysis. The above modification does not affect the law of the stopped processes, since we do not care about the 'out-of-domain' transitions. We use a weighted point mass to represent the measure at the boundary, and the mean remains the same. It can easily be shown that the Wasserstein distance between any two measures in $\tilde{[\![\mathcal{T}]\!]}(x,\cdot)$ is upper bounded by that of the non-modified ones.

Theorem 4

For any $0\leq\vartheta_1<\vartheta_2$, set $\mathbb{X}_i=(\mathcal{X},\tilde{[\![\mathcal{T}]\!]}_i,x_0,\Pi,L)$, $i=1,2$, where $\mathbb{X}_1$ is perturbed by point masses with intensity $\vartheta_1$ and $\mathbb{X}_2$ is perturbed by a general $\mathcal{L}_1$-perturbation with intensity $\vartheta_2$. Then, under Assumption 2, there exist a rectangular partition $Q$ (with state-level relation $\alpha\subseteq\mathcal{X}\times Q$), a measure-level relation $\gamma_{\alpha}$, and a collection of transition matrices $[\![\Theta]\!]$ such that the system $\mathbb{I}=(Q,[\![\Theta]\!],q_0,\Pi,L)$ abstracts $\mathbb{X}_1$ and is abstracted by $\mathbb{X}_2$ in the following sense:

\mathbb{X}_{1}\preceq_{\gamma_{\alpha}}\mathbb{I},\;\;\mathbb{I}\preceq_{\gamma_{\alpha}^{-1}}\mathbb{X}_{2}. (19)

Proof  We construct a finite-state IMC with partition $Q$ and an inclusion of transition matrices $[\![\Theta]\!]$ as follows. By Assumption 2, we use a uniform rectangular partition of $\mathcal{W}$ and set $\alpha=\{(x,q):q=\eta\lfloor\frac{x}{\eta}\rfloor\}\cup\{(\Delta,\Delta)\}$, where $\lfloor\cdot\rfloor$ is the floor function and $\eta$ is to be chosen later. Denote the number of discrete nodes by $N+1$.

Note that any family of (modified) Gaussian measures $\tilde{[\![\mathcal{T}]\!]}_1$ is induced from $[\![\mathcal{T}]\!]_1$ and contains its information. For any $\tilde{\mathcal{T}}\in\tilde{[\![\mathcal{T}]\!]}_1$ and $q\in Q$:

  (i) for all $\tilde{\nu}\sim\tilde{\mathcal{N}}(m,s^2)\in\tilde{\mathbb{T}}_1(\alpha^{-1}(q),\cdot)$, store $\{(m_l,\Sigma_l)=(\eta\lfloor\frac{m}{\eta}\rfloor,\eta^2\lfloor\frac{s^2}{\eta^2}\rfloor)\}_l$;

  (ii) for each $l$, define $\tilde{\nu}_l^{\text{ref}}\sim\tilde{\mathcal{N}}(m_l,\Sigma_l)$ (implicitly, we need to compute $\nu_l^{\text{ref}}(\Delta)$); compute $\tilde{\nu}_l^{\text{ref}}(\alpha^{-1}(q_j))$ for each $q_j\in Q\setminus\Delta$;

  (iii) for each $l$, define $\mu_l^{\text{ref}}=[\tilde{\nu}_l^{\text{ref}}(\alpha^{-1}(q_1)),\cdots,\tilde{\nu}_l^{\text{ref}}(\alpha^{-1}(q_N)),\tilde{\nu}_l^{\text{ref}}(\Delta)]$;

  (iv) compute $\textbf{ws}:=(\sqrt{2N}+2)\eta$ and $\textbf{tv}:=N\eta\cdot\textbf{ws}$;

  (v) construct $[\![\mu]\!]=\bigcup_l\{\mu:\left\|\mu-\mu_l^{\text{ref}}\right\|_{\text{TV}}\leq\textbf{tv}(\eta),\;\mu(\Delta)+\sum_j^N\mu(q_j)=1\}$;

  (vi) let $\gamma_{\alpha}=\{(\tilde{\nu},\mu),\;\mu\in[\![\mu]\!]\}$ be a relation between $\tilde{\nu}\in\tilde{\mathbb{T}}(\alpha^{-1}(q))$ and the generated $[\![\mu]\!]$.

Repeating the above steps for all $q$, the relation $\gamma_{\alpha}$ is obtained. The rest of the proof proceeds in the following steps. For $i\leq N$, we simply denote $\mathfrak{G}_i:=\tilde{\mathbb{T}}_1(\alpha^{-1}(q_i))$ and $\hat{\mathfrak{G}}_i:=\widehat{\tilde{\mathbb{T}}_1(\alpha^{-1}(q_i))}$.
Claim 1: For $i\leq N$, let $[\![\Theta_i]\!]=\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$. Then the finite-state IMC $\mathbb{I}$ with transition collection $[\![\Theta]\!]$ abstracts $\mathbb{X}_1$.

Indeed, for each $i=1,\cdots,N$ and each $\tilde{\mathcal{T}}$, we have $\gamma_{\alpha}(\mathfrak{G}_i)\subseteq\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$. Pick any modified Gaussian $\tilde{\nu}\in\hat{\mathfrak{G}}_i$; there exists a $\tilde{\nu}^{\text{ref}}$ such that (by Proposition 4) $\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}\leq\|\nu-\nu^{\text{ref}}\|_{\text{W}}\leq\sqrt{2N}\eta$. We aim to find all discrete measures $\mu$ induced from $\tilde{\nu}$ (such that their probabilities match on the discrete nodes, as required by Definition 7). All such $\mu$ satisfy (note that we also have $\|\mu-\mu^{\text{ref}}\|_{\text{W}}^{d}\leq\|\mu-\tilde{\nu}\|_{\text{W}}^{d}+\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}^{d}+\|\tilde{\nu}^{\text{ref}}-\mu^{\text{ref}}\|_{\text{W}}^{d}=\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}^{d}$, but it is hard to connect $\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}^{d}$ with $\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}$ for general measures; this connection can be made only when comparing Dirac or discrete measures):

\begin{split}\left\|\mu-\mu^{\text{ref}}\right\|_{\text{W}}^{d}&=\left\|\mu-\mu^{\text{ref}}\right\|_{\text{W}}\\ &\leq\left\|\mu-\tilde{\nu}\right\|_{\text{W}}+\left\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\right\|_{\text{W}}+\left\|\tilde{\nu}^{\text{ref}}-\mu^{\text{ref}}\right\|_{\text{W}}\\ &\leq(2+\sqrt{2N})\eta,\end{split} (20)

where the first term in the second line is bounded by

\begin{split}\left\|\mu-\tilde{\nu}\right\|_{\text{W}}&=\sup_{h\in C(\mathcal{X}),\operatorname{Lip}(h)\leq 1}\left|\int_{\mathcal{X}}h(x)d\mu(x)-\int_{\mathcal{X}}h(x)d\tilde{\nu}(x)\right|\\ &\leq\sup_{h\in C(\mathcal{X}),\operatorname{Lip}(h)\leq 1}\sum_{j=1}^{n}\int_{\alpha^{-1}(q_{j})}|h(x)-h(q_{j})|d\tilde{\nu}(x)\\ &\leq\eta\sum_{j=1}^{n}\int_{\alpha^{-1}(q_{j})}d\tilde{\nu}(x)\leq\eta,\end{split} (21)

and the third term in the second line is bounded in a similar way. By steps (v)-(vi) and Proposition 5, all possible discrete measures $\mu$ induced from $\tilde{\nu}$ are included in $\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$. Combining the above, for any $\tilde{\nu}\in\mathfrak{G}_i$, and hence in $\hat{\mathfrak{G}}_i$, there exists a discrete measure $\Theta_i\in\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$ such that $\tilde{\nu}(\alpha^{-1}(q_j))=\Theta_{ij}$ for all $q_j$. This satisfies the definition of abstraction.

Claim 2: $\gamma_{\alpha}^{-1}(\gamma_{\alpha}(\mathfrak{G}_i))\subseteq\mathfrak{G}_i+(2\eta+N\eta\cdot\textbf{tv}(\eta))\cdot\mathbb{B}_W$. This amounts to recovering all possible (modified) measures $\tilde{\nu}$ from the constructed $\gamma_{\alpha}(\mathfrak{G}_i)$ such that their discrete probabilities coincide. Note that the 'ref' information is recorded when computing $\gamma_{\alpha}(\mathfrak{G}_i)$ in the inner parentheses. Therefore, for any $\mu\in\gamma_{\alpha}(\mathfrak{G}_i)$ there exists a $\mu^{\text{ref}}$ within a total variation radius $\textbf{tv}(\eta)$. We aim to find the corresponding measures $\tilde{\nu}$ that match $\mu$ in their probabilities on the discrete nodes. All such $\tilde{\nu}$ satisfy

\begin{split}\left\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\right\|_{\text{W}}&\leq\left\|\tilde{\nu}-\mu\right\|_{\text{W}}+\left\|\mu-\mu^{\text{ref}}\right\|_{\text{W}}^{d}+\left\|\mu^{\text{ref}}-\tilde{\nu}^{\text{ref}}\right\|_{\text{W}}\\ &\leq 2\eta+N\eta\cdot\textbf{tv}(\eta),\end{split} (22)

where the bounds for the first and third terms are obtained in the same way as (21), and the second term again follows from the rough comparison in Proposition 5. Note that $\tilde{\nu}^{\text{ref}}$ is already recorded in $\mathfrak{G}_i$. The inequality (22) provides an upper bound on the Wasserstein deviation between any possible satisfactory measure and some $\tilde{\nu}^{\text{ref}}\in\mathfrak{G}_i$.

Claim 3: If we choose $\eta$ and $\kappa$ sufficiently small such that

2\eta+N\eta\cdot\textbf{tv}(\eta)+\kappa\leq\vartheta_{2}-\vartheta_{1}, (23)

then $\mathbb{I}\preceq_{\gamma_{\alpha}^{-1}}\mathbb{X}_2$.

Indeed, $[\![\Theta]\!]$ is obtained by $\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$ for each $i$. By Claim 2 and Lemma 3, we have

\gamma_{\alpha}^{-1}(\gamma_{\alpha}(\hat{\mathfrak{G}}_{i}))\subseteq\hat{\mathfrak{G}}_{i}+(2\eta+N\eta\cdot\textbf{tv}(\eta))\cdot\mathbb{B}_{W}\subseteq\mathfrak{G}_{i}+(2\eta+N\eta\cdot\textbf{tv}(\eta)+\kappa)\cdot\mathbb{B}_{W}

for each $i$. By the construction, we can verify that $\tilde{\mathbb{T}}_2(\alpha^{-1}(q_i))=\mathfrak{G}_i+(\vartheta_2-\vartheta_1)\cdot\mathbb{B}_W$. The selection of $\eta$ makes $\gamma_{\alpha}^{-1}(\gamma_{\alpha}(\hat{\mathfrak{G}}_i))\subseteq\tilde{\mathbb{T}}_2(\alpha^{-1}(q_i))$, which completes the proof.  

Remark 14

The relation $\gamma_{\alpha}$ (resp. $\gamma_{\alpha}^{-1}$) provides a procedure for including all proper (continuous, discrete) measures that are consistent with the discrete probabilities. The key point is to record $\tilde{\nu}^{\text{ref}}$, $\mu^{\text{ref}}$, and the corresponding radii; these are nothing but finite coverings of the space of measures. This also explains why we use 'finite-state' rather than 'finite' abstraction: the latter would suggest using a finite number of representative measures as the abstraction.

To guarantee a sufficient inclusion, conservative estimates are made, e.g., the bound $\sqrt{2N}\eta$ in Claim 1 and the bound in Proposition 5. These estimates can be made more accurate given additional assumptions. For example, deterministic systems (where $w$ becomes $\delta$) have Dirac transition measures, so $\|\mu-\mu^{\text{ref}}\|_{\text{W}}^{d}=0$ and hence the second term in (22) vanishes.
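To make the covering procedure in the proof of Theorem 4 more concrete, the following is a minimal one-dimensional sketch of steps (i)-(v) for a single cell; the dynamics f, b, the partition, and the variance guard are illustrative assumptions and not the paper's algorithm verbatim.

```python
import numpy as np
from scipy.stats import norm

# 1-D sketch of steps (i)-(v) for one cell q_i of a rectangular partition of W = [0, 1].
f = lambda x: 0.8 * x + 0.1
b = lambda x: 0.05 + 0.01 * x

eta = 0.25
cells = [(k * eta, (k + 1) * eta) for k in range(4)]
N = len(cells)

# (i) snap means/variances of the Gaussians T(x, .), x in q_i, onto the eta-grid
xs = np.linspace(*cells[1], 20)                          # source cell q_2 (illustrative)
params = {(eta * np.floor(f(x) / eta), eta**2 * np.floor(b(x)**2 / eta**2)) for x in xs}

# (ii)-(iii) reference discrete measures mu_l^ref over (q_1, ..., q_N, Delta)
refs = []
for m, s2 in params:
    s = max(np.sqrt(s2), 1e-6)                           # guard against a zero snapped variance
    p = [norm.cdf(u, m, s) - norm.cdf(l, m, s) for (l, u) in cells]
    refs.append(np.array(p + [1.0 - sum(p)]))            # last entry is the mass on Delta

# (iv) Wasserstein and total-variation radii ws(eta), tv(eta)
ws = (np.sqrt(2 * N) + 2) * eta
tv = N * eta * ws
print(f"{len(refs)} reference measures, tv(eta) = {tv:.3f}")
# (v) [[mu]] is the union of TV-balls of radius tv(eta) around the refs (not enumerated here).
```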

Remark 15

Note that, to guarantee the second abstraction based on γα1\gamma_{\alpha}^{-1}, we search all possible measures that has the same discrete probabilities as μγα(𝔊^i)\mu\in\gamma_{\alpha}(\hat{\mathfrak{G}}_{i}), not only those Gaussians with the same covariances as 𝔊i\mathfrak{G}_{i} (or 𝔊^i\hat{\mathfrak{G}}_{i}). Such a set of measures provide a convex ball w.r.t. Wasserstein distance. This actually makes sense because in the forward step of creating 𝕀\mathbb{I}, we have used both Wasserstein and total variation distance to find a convex inclusion of all Gaussian or Gaussian related measures. There ought to be some measures that are ‘non-recoverable’ to Gaussians, unless we extract some ‘Gaussian recoverable’ discrete measures in [[Θi]]\left[\![\Theta_{i}\right]\!], but this loses the point of over-approximation. In this view, IMC abstractions provide unnecessarily larger inclusions than needed.

For the deterministic case, the above-mentioned ‘extraction’ is possible: since the transition measures have no diffusion, the convex inclusion reduces to the collection of vertices themselves (also see Remark 6). Based on these vertices, we are able to use γα\gamma_{\alpha} to find the δ\delta measures within a convex ball w.r.t. the Wasserstein distance.

In contrast to the above special case [21], where the uncertainties are bounded w.r.t. the infinity norm, we can only guarantee approximate completeness via a robust 1\mathcal{L}_{1}-bounded perturbation with strictly larger intensity than the original point-mass perturbation. However, this indeed describes a general type of uncertainty for stochastic systems when 1\mathcal{L}_{1}-related properties, including probabilistic properties, are to be guaranteed. Unless higher-moment specifications are of interest, uncertain 1\mathcal{L}_{1} random variables are the natural analogue of the perturbations in [21].

Corollary 3

Given an LTL formula Ψ\Psi, let Siν0={𝒫Xν0(XΨ)}X𝕏iS_{i}^{\nu_{0}}=\{\mathcal{P}_{X}^{\nu_{0}}(X\vDash\Psi)\}_{X\in\mathbb{X}_{i}} (i=1,2i=1,2) and S𝕀q0={𝐏Iq0(IΨ)}I𝕀S_{\mathbb{I}}^{q_{0}}=\{\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi)\}_{I\in\mathbb{I}}, where the initial conditions are such that ν0(α1(q0))=1\nu_{0}(\alpha^{-1}(q_{0}))=1. Then all the above sets are compact and S1ν0S𝕀q0S2ν0S_{1}^{\nu_{0}}\subseteq S_{\mathbb{I}}^{q_{0}}\subseteq S_{2}^{\nu_{0}}.

The proof is shown in Appendix 0.B.
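
Corollary 3 concerns the full sets of satisfaction probabilities. For one representative fragment, bounded reachability, the endpoints of the IMC's interval of probabilities can be computed by interval value iteration over the transition bounds, under the common semantics in which a feasible transition matrix may be re-chosen at each step. The sketch below does this for a hypothetical three-state IMC; the matrices lo and up, the target labelling, and the horizon are illustrative assumptions rather than objects from the paper.

```python
import numpy as np

def extreme_dist(lo_row, up_row, values, maximize):
    """Feasible row in [lo_row, up_row] (summing to 1) that extremizes values @ p."""
    order = np.argsort(values)
    if maximize:
        order = order[::-1]
    p = lo_row.copy()
    remaining = 1.0 - p.sum()
    for j in order:
        add = min(up_row[j] - lo_row[j], remaining)
        p[j] += add
        remaining -= add
    return p

def reach_bounds(lo, up, target, horizon):
    """Lower/upper bounds on P(reach target within `horizon`) over the IMC's chains."""
    n = lo.shape[0]
    v_lo = target.astype(float)
    v_up = target.astype(float)
    for _ in range(horizon):
        new_lo, new_up = v_lo.copy(), v_up.copy()
        for i in range(n):
            if target[i]:
                continue
            new_lo[i] = extreme_dist(lo[i], up[i], v_lo, maximize=False) @ v_lo
            new_up[i] = extreme_dist(lo[i], up[i], v_up, maximize=True) @ v_up
        v_lo, v_up = new_lo, new_up
    return v_lo, v_up

if __name__ == "__main__":
    # Hypothetical 3-state IMC; state 2 is the (absorbing) target cell.
    lo = np.array([[0.1, 0.3, 0.2],
                   [0.2, 0.2, 0.3],
                   [0.0, 0.0, 1.0]])
    up = np.array([[0.5, 0.6, 0.4],
                   [0.4, 0.5, 0.6],
                   [0.0, 0.0, 1.0]])
    target = np.array([False, False, True])
    v_lo, v_up = reach_bounds(lo, up, target, horizon=25)
    print("lower bounds per initial state:", np.round(v_lo, 4))
    print("upper bounds per initial state:", np.round(v_up, 4))
```

The greedy assignment inside extreme_dist pushes as much admissible mass as possible toward the smallest (resp. largest) value entries, which is the standard way to extremize a linear functional over an interval-constrained stochastic row.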

5 Conclusion

In this paper, we constructed an IMC abstraction for continuous-state stochastic systems with possibly bounded point-mass (Dirac) perturbations. We showed that such abstractions are not only sound, in the sense that the set of satisfaction probabilities of linear-time properties contains that of the original system, but also approximately complete, in the sense that the constructed IMC can itself be abstracted by another system with stronger but more general 1\mathcal{L}_{1}-bounded perturbations. Consequently, the winning set of the probabilistic specifications for the more perturbed continuous-state stochastic system contains that of the less (Dirac-)perturbed system. As with most existing converse theorems, e.g. converse Lyapunov theorems, the purpose is not to provide an efficient approach for constructing such abstractions, but rather to characterize the theoretical possibility of their existence.

It is interesting to compare with robust deterministic systems, where no random variables are involved. In [21], both perturbed systems are subject to bounded point masses: more heavily perturbed systems abstract less perturbed ones and hence preserve robust satisfaction of linear-time properties. However, when we seek approximate completeness via uncertainties in stochastic systems, the uncertainties should be modelled by more general 1\mathcal{L}_{1} random variables. Since the probabilistic properties of random variables are dual to the weak topology of measures, we study the measures, and hence the probability laws of the processes, rather than the state space per se; the state-space topology is not sufficient to quantify the regularity of IMC abstractions. In this sense, 1\mathcal{L}_{1} uncertain random variables are the natural analogue of the uncertain point masses (in |||\cdot|) for deterministic systems. If we insist on using point masses as the only type of uncertainty for stochastic systems, IMC-type abstractions may fail to guarantee completeness. For example, if the point-mass perturbations represent reduced precision of deterministic control inputs [23, Definition 2.3], the winning set determined by the ϑ2\vartheta_{2}-precision stationary policies is not sufficient to cover that of the IMC abstraction, and hence an approximate bisimilarity of IMCs, as in [21], cannot be ensured.

For future work, it would be useful to extend the current approach to robust stochastic control systems. It would be interesting to design algorithms to construct IMC (resp. bounded-parameter Markov decision process) abstractions for more general robust stochastic (resp. control) systems with 1\mathcal{L}_{1} perturbations based on the metrizable space of measures and the weak topology. The size of the state discretization can be refined given more specific assumptions on the system dynamics and the linear-time objectives. For verification or control synthesis w.r.t. probabilistic safety or reachability problems, comparisons can be made with stochastic Lyapunov-barrier function approaches.

References

  • [1] Abate, A., D’Innocenzo, A., Di Benedetto, M.D., Sastry, S.S.: Markov set-chains as abstractions of stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 1–15. Springer (2008)
  • [2] Abate, A., Katoen, J.P., Mereacre, A.: Quantitative automata model checking of autonomous stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 83–92 (2011)
  • [3] Abate, A., Prandini, M., Lygeros, J., Sastry, S.: Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica 44(11), 2724–2734 (2008)
  • [4] Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
  • [5] Belta, C., Yordanov, B., Gol, E.A.: Formal Methods for Discrete-time Dynamical Systems, vol. 89. Springer (2017)
  • [6] Bustan, D., Rubin, S., Vardi, M.Y.: Verifying ω\omega-regular properties of Markov chains. In: International Conference on Computer Aided Verification. pp. 189–201. Springer (2004)
  • [7] Cauchi, N., Laurenti, L., Lahijanian, M., Abate, A., Kwiatkowska, M., Cardelli, L.: Efficiency through uncertainty: Scalable formal synthesis for stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 240–251 (2019)
  • [8] Da Prato, G., Zabczyk, J.: Stochastic equations in infinite dimensions. Cambridge University Press (2014)
  • [9] Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A storm is coming: A modern probabilistic model checker. In: International Conference on Computer Aided Verification. pp. 592–600. Springer (2017)
  • [10] Delimpaltadakis, G., Laurenti, L., Mazo Jr, M.: Abstracting the sampling behaviour of stochastic linear periodic event-triggered control systems. arXiv preprint arXiv:2103.13839 (2021)
  • [11] Dutreix, M., Coogan, S.: Specification-guided verification and abstraction refinement of mixed monotone stochastic systems. IEEE Transactions on Automatic Control (2020)
  • [12] Dutreix, M.D.H.: Verification and synthesis for stochastic systems with temporal logic specifications. Ph.D. thesis, Georgia Institute of Technology (2020)
  • [13] Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. International Statistical Review 70(3), 419–435 (2002)
  • [14] Girard, A., Pola, G., Tabuada, P.: Approximately bisimilar symbolic models for incrementally stable switched systems. IEEE Transactions on Automatic Control 55(1), 116–126 (2009)
  • [15] Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artificial Intelligence 122(1-2), 71–109 (2000)
  • [16] Hartfiel, D.J.: Markov Set-Chains. Springer (2006)
  • [17] Kloetzer, M., Belta, C.: A fully automated framework for control of linear systems from temporal logic specifications. IEEE Transactions on Automatic Control 53(1), 287–297 (2008)
  • [18] Lahijanian, M., Andersson, S.B., Belta, C.: Formal verification and synthesis for discrete-time stochastic systems. IEEE Transactions on Automatic Control 60(8), 2031–2045 (2015)
  • [19] Laurenti, L., Lahijanian, M., Abate, A., Cardelli, L., Kwiatkowska, M.: Formal and efficient synthesis for continuous-time linear stochastic hybrid processes. IEEE Transactions on Automatic Control 66(1), 17–32 (2020)
  • [20] Li, Y., Liu, J.: Robustly complete synthesis of memoryless controllers for nonlinear systems with reach-and-stay specifications. IEEE Transactions on Automatic Control (2020)
  • [21] Liu, J.: Robust abstractions for control synthesis: Completeness via robustness for linear-time properties. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 101–110 (2017)
  • [22] Liu, J.: Closing the gap between discrete abstractions and continuous control: Completeness via robustness and controllability. In: International Conference on Formal Modeling and Analysis of Timed Systems. pp. 67–83. Springer (2021)
  • [23] Majumdar, R., Mallik, K., Soudjani, S.: Symbolic controller synthesis for Büchi specifications on stochastic systems. In: Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control. pp. 1–11 (2020)
  • [24] Parker, D.: Verification of probabilistic real-time systems. Proc. 2013 Real-time Systems Summer School (ETR’13) (2013)
  • [25] Pola, G., Girard, A., Tabuada, P.: Approximately bisimilar symbolic models for nonlinear control systems. Automatica 44(10), 2508–2516 (2008)
  • [26] Ramponi, F., Chatterjee, D., Summers, S., Lygeros, J.: On the connections between PCTL and dynamic programming. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 253–262 (2010)
  • [27] Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes and Martingales, Volume 1: Foundations. Cambridge Mathematical Library (2000)
  • [28] Sagar, G., Ravi, D.: Compactness of any countable product of compact metric spaces in product topology without using Tychonoff's theorem. arXiv preprint arXiv:2111.02904 (2021)
  • [29] Soudjani, S.E.Z., Abate, A.: Adaptive gridding for abstraction and verification of stochastic hybrid systems. In: 2011 Eighth International Conference on Quantitative Evaluation of SysTems. pp. 59–68. IEEE (2011)
  • [30] Summers, S., Lygeros, J.: Verification of discrete time stochastic hybrid systems: A stochastic reach-avoid decision problem. Automatica 46(12), 1951–1961 (2010)
  • [31] Tabuada, P., Pappas, G.J.: Linear time logic control of discrete-time linear systems. IEEE Transactions on Automatic Control 51(12), 1862–1877 (2006)
  • [32] Tkachev, I., Abate, A.: On infinite-horizon probabilistic properties and stochastic bisimulation functions. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference. pp. 526–531. IEEE (2011)
  • [33] Tkachev, I., Abate, A.: Regularization of bellman equations for infinite-horizon probabilistic properties. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 227–236 (2012)
  • [34] Tkachev, I., Abate, A.: Formula-free finite abstractions for linear temporal verification of stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 283–292 (2013)
  • [35] Tkachev, I., Abate, A.: Characterization and computation of infinite-horizon specifications over markov processes. Theoretical Computer Science 515, 1–18 (2014)
  • [36] Tkachev, I., Mereacre, A., Katoen, J.P., Abate, A.: Quantitative model-checking of controlled discrete-time markov processes. Information and Computation 253, 1–35 (2017)
  • [37] Valby, L.V.: A category of polytopes. Ph.D. thesis, Reed College (2006)
  • [38] Vardi, M.Y.: Automatic verification of probabilistic concurrent finite state programs. In: 26th Annual Symposium on Foundations of Computer Science (FOCS). pp. 327–338. IEEE (1985)
  • [39] Vassiliou, P.C.: Non-homogeneous Markov set systems. Mathematics 9(5), 471 (2021)
  • [40] Wu, D., Koutsoukos, X.: Reachability analysis of uncertain systems using bounded-parameter Markov decision processes. Artificial Intelligence 172(8-9), 945–954 (2008)

Appendix 0.A Proofs of Section 3

Proof of Corollary 1.

Proof  It is clear that QQ under the discrete metric is complete and separable. In addition, for each tt, the space (t,TV)(\mathscr{M}_{t},\left\|\;\cdot\;\right\|_{\text{TV}}) is complete and separable. By Lemma 1, each (t,TV)(\mathscr{M}_{t},\left\|\;\cdot\;\right\|_{\text{TV}}) is also compact. For any sequence {μn}t\{\mu_{n}\}\subseteq\mathscr{M}_{t}, a quick application of Theorem 1 yields the existence of a weakly convergent subsequence {μnk}\{\mu_{n_{k}}\} and a weak limit point μ\mu in t\mathscr{M}_{t}. By the definition of weak convergence and the discrete structure of QQ, it is clear that for each hCb(𝒳)h\in C_{b}(\mathcal{X}) and tt\in\mathbb{N}, we have

𝒳h(x)μnk(x)𝒳h(x)μ(x)\sum_{\mathcal{X}}h(x)\mu_{n_{k}}(x)\rightarrow\sum_{\mathcal{X}}h(x)\mu(x)

in a strong sense, which concludes the compactness of HH. Now we choose μ1,μ2t\mu_{1},\mu_{2}\in\mathscr{M}_{t}, then αμ1+(1α)μ2t\alpha\mu_{1}+(1-\alpha)\mu_{2}\in\mathscr{M}_{t} for all α[0,1]\alpha\in[0,1]. Therefore,

α𝒳h(x)μ1(x)+(1α)𝒳h(x)μ2(x)=𝒳h(x)[αμ1+(1α)μ2](x)H\alpha\sum_{\mathcal{X}}h(x)\mu_{1}(x)+(1-\alpha)\sum_{\mathcal{X}}h(x)\mu_{2}(x)=\sum_{\mathcal{X}}h(x)[\alpha\mu_{1}+(1-\alpha)\mu_{2}](x)\in H

for all α[0,1]\alpha\in[0,1]. This shows the convex structure of HH.  

Proof of Proposition 1.

Proof  We slightly abuse notation and define πT:Q0TQ\pi_{T}:Q^{\infty}\rightarrow\prod_{0}^{T}Q as the projection onto the finite product space of QQ up to time TT. Since we do not emphasize the initial conditions, we also write 𝐏\mathbf{P}, \mathscr{M} and t\mathscr{M}_{t} for short. By Tychonoff's theorem, any product of QQ is also compact w.r.t. the product topology. Therefore, any family of measures on QTQ^{T} is tight and hence compact. By Remark 1, for every 𝐏\mathbf{P}\in\mathscr{M}, we have 𝐏πT1=t=0Tμt\mathbf{P}\circ\pi_{T}^{-1}=\otimes_{t=0}^{T}\mu_{t} for some μtt\mu_{t}\in\mathscr{M}_{t}, and {𝐏πT1}I𝕀\{\mathbf{P}\circ\pi_{T}^{-1}\}_{I\in\mathbb{I}} forms a compact set. Hence, every sequence {𝐏nπT1}n{𝐏πT1}I𝕀\{\mathbf{P}_{n}\circ\pi_{T}^{-1}\}_{n}\subseteq\{\mathbf{P}\circ\pi_{T}^{-1}\}_{I\in\mathbb{I}} with any finite TT contains a weakly convergent subsequence. We construct the convergent subsequence of {𝐏n}n\{\mathbf{P}_{n}\}_{n} as follows.

We initialize the procedure by setting T=0T=0. Then 0\mathscr{M}_{0} is compact, and there exists a weakly convergent subsequence {𝐏0,nπ01}\{\mathbf{P}_{0,n}\circ\pi_{0}^{-1}\}. Based on {𝐏0,n}\{\mathbf{P}_{0,n}\}, we can extract a further subsequence, denoted by {𝐏1,n}\{\mathbf{P}_{1,n}\}, such that {𝐏1,nπ11}\{\mathbf{P}_{1,n}\circ\pi_{1}^{-1}\} weakly converges. By induction, we have {𝐏k+1,n}{𝐏k,n}\{\mathbf{P}_{k+1,n}\}\subseteq\{\mathbf{P}_{k,n}\} for each kk\in\mathbb{N}. Repeating this argument and picking the diagonal subsequence {𝐏n,n}\{\mathbf{P}_{n,n}\}, we obtain that {𝐏n,nπT1}\{\mathbf{P}_{n,n}\circ\pi_{T}^{-1}\} is weakly convergent for each TT. We denote the weak limit point of each {𝐏n,nπT1}\{\mathbf{P}_{n,n}\circ\pi_{T}^{-1}\} by t=0Tμt\otimes_{t=0}^{T}\mu_{t}. By construction, we have

t=0Tμt()=t=0T+1μt(×Q),T.\otimes_{t=0}^{T}\mu_{t}(\cdot)=\otimes_{t=0}^{T+1}\mu_{t}(\cdot\times Q),\;\;\forall T\in\mathbb{N}.

By Kolmogorov’s extension theorem, there exists a unique 𝐏\mathbf{P} on QQ^{\infty} such that t=0Tμt()=𝐏πT1()\otimes_{t=0}^{T}\mu_{t}(\cdot)=\mathbf{P}\circ\pi_{T}^{-1}(\cdot) for each TT.

We have seen that for each {𝐏n}\{\mathbf{P}_{n}\}, the constructed subsequence satisfies 𝐏n,n𝐏\mathbf{P}_{n,n}\Rightarrow\mathbf{P}, which concludes the claim.  

Proof of Theorem 2.

Proof  Since we do not emphasize the initial conditions, we simply drop the superscripts q0q_{0} for short. Given I𝕀I\in\mathbb{I} with any initial condition, the corresponding canonical space is (𝛀,𝐅,𝐏I)(\mathbf{\Omega},\mathbf{F},\mathbf{P}_{I}). By Proposition 1, every sequence {𝐏n}\{\mathbf{P}_{n}\}\subseteq\mathscr{M} has a weakly convergent subsequence, denoted by {𝐏nk}\{\mathbf{P}_{n_{k}}\}, to a 𝐏\mathbf{P}\in\mathscr{M} of some II. Note that for any II, the measurable set {IΨ}={ϖ:ϖΨ}𝐅\{I\vDash\Psi\}=\{\varpi:\varpi\vDash\Psi\}\in\mathbf{F} is the same due to the identical labelling function. It is important to notice that due to the discrete topology of 𝛀\mathbf{\Omega}, every Borel measurable set A𝐅A\in\mathbf{F} is such that A=\partial A=\emptyset. By Definition 5 we have 𝐏nk(InkΨ)𝐏(IΨ)\mathbf{P}_{n_{k}}(I_{n_{k}}\vDash\Psi)\rightarrow\mathbf{P}(I\vDash\Psi) for all Ψ\Psi. The compactness of Sq0S^{q_{0}} follows immediately. The convexity of the set of laws is based on the tensor product of convex polytopes [37]. To show the convexity of Sq0S^{q_{0}}, we notice that, for any q0,qntQq_{0},\cdots q_{n_{t}}\in Q and I𝕀I\in\mathbb{I},

𝐏I(I0=q0,,It=qnt,It+1=qnt+1){Θnt+1,ntΘnt,nt1Θn1,0δq0:Θ[[Θ]]}\begin{split}&\mathbf{P}_{I}\left(I_{0}=q_{0},\cdots,I_{t}=q_{n_{t}},I_{t+1}=q_{n_{t+1}}\right)\\ \in&\{\Theta_{n_{t+1},n_{t}}\Theta_{n_{t},n_{t-1}}\cdots\Theta_{n_{1},0}\delta_{q_{0}}:\Theta\in[\![\Theta]\!]\}\end{split}

and hence forms a convex set. Immediately, the convexity holds for
{𝐏I(i=1nΓi)}I𝕀\{\mathbf{P}_{I}(\prod_{i=1}^{n}\Gamma_{i})\}_{I\in\mathbb{I}} for any cylinder set i=1nΓi\prod_{i=1}^{n}\Gamma_{i}. By a standard monotone class argument, {𝐏I(A)}I𝕀\{\mathbf{P}_{I}(A)\}_{I\in\mathbb{I}} is also convex for any Borel measurable set A𝐅A\in\mathbf{F}, which implies the convexity of Sq0S^{q_{0}} in the statement.  

Proof of Proposition 2.

Proof  Note that the laws are associated with XX with X0=x0X_{0}=x_{0}, which actually means the stopped process XτX^{\tau} (recall the notation in Section 2.4). Since Xtτ𝒲¯X_{t\wedge\tau}\in\overline{\mathcal{W}} for each tt, the state space of XX is compact, and so is the countably infinite product. By a similar argument as in Proposition 1, we can conclude the first part of the statement. Note that, by assumption, the partition respects the boundary of the labelling function. Hence, for every formula Ψ\Psi, the boundary of {XΨ}𝐅\{X\vDash\Psi\}\in\mathbf{F} has measure 0. The second part follows directly from Definition 5.

Proof of Lemma 2.

Proof  Note that XX is on (Ω,,𝒫Xν0)(\Omega,\mathcal{F},\mathcal{P}_{X}^{\nu_{0}}) and II is on (𝛀,𝐅,𝐏Iq0)(\mathbf{\Omega},\mathbf{F},\mathbf{P}_{I}^{q_{0}}). We first show the case when ν0=δx0\nu_{0}=\delta_{x_{0}} for any x0q0x_{0}\in q_{0}. That is, for X0=x0X_{0}=x_{0} a.s. with any x0q0x_{0}\in q_{0}, there exists a unique law of some I𝕀I\in\mathbb{I} such that 𝒫Xx0(XΨ)=𝐏Iq0(IΨ)\mathcal{P}_{X}^{x_{0}}(X\vDash\Psi)=\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi) for any Ψ\Psi.

Let νt\nu_{t} denote the marginal distribution of 𝒫Xx0\mathcal{P}_{X}^{x_{0}} at each tt. Let t={μt}I𝕀\mathscr{M}_{t}=\{\mu_{t}\}_{I\in\mathbb{I}} denote the set of marginal distributions of {𝐏Iq0}I𝕀\{\mathbf{P}_{I}^{q_{0}}\}_{I\in\mathbb{I}}. Now, at t=1t=1, ν1(qj)=𝒯(x0,qj)δx0\nu_{1}(q_{j})=\mathcal{T}(x_{0},q_{j})\delta_{x_{0}} for all j{1,2,,N+1}j\in\{1,2,\cdots,N+1\}. Suppose q0q_{0} is the ithi^{th} element of QQ. By the construction of the IMC, we have

Θˇijν1(qj)=qjδx0𝒯(x0,dy)Θ^ij,x0q0andj{1,2,,N+1}.\check{\Theta}_{ij}\leq\nu_{1}(q_{j})=\int_{q_{j}}\delta_{x_{0}}\mathcal{T}(x_{0},dy)\leq\hat{\Theta}_{ij},\;\;\forall x_{0}\in q_{0}\;\text{and}\;\forall j\in\{1,2,\cdots,N+1\}.

Since qQν1(q)=1\sum_{q\in Q}\nu_{1}(q)=1, by letting μ1=(ν1(q1),ν1(q2),,ν1(qN+1))T\mu_{1}=(\nu_{1}(q_{1}),\nu_{1}(q_{2}),\cdots,\nu_{1}(q_{N+1}))^{T}, we automatically have μ11\mu_{1}\in\mathscr{M}_{1} by definition. Note that μ1\mu_{1} is unique w.r.t. TV\left\|\;\cdot\;\right\|_{\text{TV}}, and has the property that μ1(q)=ν1(q)\mu_{1}(q)=\nu_{1}(q) for each qQq\in Q.

Similarly, at t=2t=2, we have

Θˇijμ1(qi)qjqiν1(dx)𝒯(x,dy)Θ^ijμ1(qi),i,j{1,2,,N+1},\check{\Theta}_{ij}\mu_{1}(q_{i})\leq\int_{q_{j}}\int_{q_{i}}\nu_{1}(dx)\mathcal{T}(x,dy)\leq\hat{\Theta}_{ij}\mu_{1}(q_{i}),\;\forall i,j\in\{1,2,\cdots,N+1\},

where 𝒯\mathcal{T} may not be the same as that of t=1t=1. Therefore, for any j{1,2,,N+1}j\in\{1,2,\cdots,N+1\},

ν2(qj)=i=1N+1qjqiν1(dx)𝒯(x,dy)\nu_{2}(q_{j})=\sum_{i=1}^{N+1}\int_{q_{j}}\int_{q_{i}}\nu_{1}(dx)\mathcal{T}(x,dy)

and there exists a μ2\mu_{2} such that iΘˇijμ1(qi)μ2(qj)=ν2(qj)iΘ^ijμ1(qi)\sum_{i}\check{\Theta}_{ij}\mu_{1}(q_{i})\leq\mu_{2}(q_{j})=\nu_{2}(q_{j})\leq\sum_{i}\hat{\Theta}_{ij}\mu_{1}(q_{i}), which means (by (12)) μ22\mu_{2}\in\mathscr{M}_{2}. In addition, there also exists a 𝐏q0\mathbf{P}^{q_{0}} such that its one-dimensional marginals up to t=2t=2 admit μ1\mu_{1} and μ2\mu_{2}, and satisfies

𝐏q0[I0=q0,I1=qi,I2=qj]=𝒫Xx0[X0=x0,X1qi,X2qj].\mathbf{P}^{q_{0}}[I_{0}=q_{0},I_{1}=q_{i},I_{2}=q_{j}]=\mathcal{P}_{X}^{x_{0}}[X_{0}=x_{0},X_{1}\in q_{i},X_{2}\in q_{j}].

Repeating this procedure, there exists a unique μtt\mu_{t}\in\mathscr{M}_{t} w.r.t. TV\left\|\;\cdot\;\right\|_{\text{TV}} for each tt, such that μt(q)=νt(q)\mu_{t}(q)=\nu_{t}(q) for each qQq\in Q. It is also clear that for each given x0q0x_{0}\in q_{0} and each tt, the selected 𝐏q0\mathbf{P}^{q_{0}} satisfies

𝒫Xx0(0t1Ai)=𝐏q0(0t1Ai)=𝐏q0(0t1Ai×Q),Ai(Q).\mathcal{P}_{X}^{x_{0}}(\prod_{0}^{t-1}A_{i})=\mathbf{P}^{q_{0}}(\prod_{0}^{t-1}A_{i})=\mathbf{P}^{q_{0}}(\prod_{0}^{t-1}A_{i}\times Q),\;\;A_{i}\in\mathscr{B}(Q).

By the Kolmogorov extension theorem, there exists a unique law 𝐏Iq0\mathbf{P}_{I}^{q_{0}} of some I𝕀I\in\mathbb{I} such that each TT-dimensional distribution coincides with 0Tμi\otimes_{0}^{T}\mu_{i}, and, for each given x0q0x_{0}\in q_{0}, 𝒫Xx0(Γ)=𝐏Iq0(Γ)\mathcal{P}_{X}^{x_{0}}(\Gamma)=\mathbf{P}_{I}^{q_{0}}(\Gamma) for all Γ(Q)=𝐅.\Gamma\in\mathscr{B}(Q^{\infty})=\mathbf{F}. Due to the assumption that L(x)=L𝕀(q)L(x)=L_{\mathbb{I}}(q) for all xqx\in q and qQq\in Q, we have

{L1(Ψ)}={L𝕀1(Ψ)}𝐅\{L^{-1}(\Psi)\}=\{L_{\mathbb{I}}^{-1}(\Psi)\}\in\mathbf{F}

for every LTL formula Ψ\Psi, which implies {XΨ}={IΨ}\{X\vDash\Psi\}=\{I\vDash\Psi\} by definition. Thus, given x0q0x_{0}\in q_{0}, the above 𝐏Iq0\mathbf{P}_{I}^{q_{0}} satisfies 𝒫Xx0(XΨ)=𝐏Iq0(IΨ)\mathcal{P}_{X}^{x_{0}}(X\vDash\Psi)=\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi).

Based on the above conclusion, as well as the definition of 𝒫Xν0\mathcal{P}_{X}^{\nu_{0}} and the convexity of Sq0S^{q_{0}} (recall Theorem 2), the result for a more general initial distribution ν0\nu_{0} with ν0(q0)=1\nu_{0}(q_{0})=1 can be obtained.
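
The induction above hinges on the cell-wise containment ∑_i Θ̌_ij μ_t(q_i) ≤ ν_{t+1}(q_j) ≤ ∑_i Θ̂_ij μ_t(q_i). The following sketch checks this numerically for a hypothetical scalar system X_{t+1} = aX_t + w with Gaussian noise on a uniform partition; mass leaving the working domain (the cell q_{N+1}) is ignored for brevity, and all constants are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

# Toy 1-D system X_{t+1} = a*X_t + w, w ~ N(0, sigma^2); all constants are illustrative.
a, sigma, x0 = 0.8, 0.3, 0.1
edges = np.linspace(-2.0, 2.0, 9)      # uniform partition of the working domain into 8 cells
n = len(edges) - 1

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))

def kernel(x, j):
    """T(x, q_j) = P(a*x + w lies in cell j)."""
    return Phi((edges[j + 1] - a * x) / sigma) - Phi((edges[j] - a * x) / sigma)

# Interval transition bounds (Theta_check, Theta_hat) from a fine grid inside each cell.
theta_lo, theta_up = np.zeros((n, n)), np.zeros((n, n))
for i in range(n):
    xs = np.linspace(edges[i], edges[i + 1], 100)
    for j in range(n):
        vals = np.array([kernel(x, j) for x in xs])
        theta_lo[i, j], theta_up[i, j] = vals.min(), vals.max()

# nu_1 on the cells (from X_0 = x0), and nu_2 via midpoint integration of the density of X_1.
mu1 = np.array([kernel(x0, j) for j in range(n)])
grid = np.linspace(edges[0], edges[-1], 4001)
mid, dx = 0.5 * (grid[:-1] + grid[1:]), grid[1] - grid[0]
pdf1 = np.exp(-0.5 * ((mid - a * x0) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
nu2 = np.array([np.sum(pdf1 * np.array([kernel(x, j) for x in mid]) * dx) for j in range(n)])

lower, upper = theta_lo.T @ mu1, theta_up.T @ mu1    # the bounds used in the induction
tol = 1e-4                                           # allowance for discretization error
print("containment holds:", bool(np.all(lower - tol <= nu2) and np.all(nu2 <= upper + tol)))
```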

Proof of Proposition 3.

Proof  Let μ,νt\mu,\nu\in\mathscr{M}_{t} for each tt, and V,K[[Θ]]V,K\in[\![\Theta]\!] be any stochastic matrices generated by 𝕀\mathbb{I}. Then, for each tt, we have

VTμKTνTVVTμVTνTV+maxiViKiTVνTV12maxi,jViKjTVμνTV+ενTVμνTV+ε.\begin{split}\left\|V^{T}\mu-K^{T}\nu\right\|_{\text{TV}}&\leq\left\|V^{T}\mu-V^{T}\nu\right\|_{\text{TV}}+\max_{i}\left\|V_{i}-K_{i}\right\|_{\text{TV}}\left\|\nu\right\|_{\text{TV}}\\ &\leq\frac{1}{2}\max_{i,j}\left\|V_{i}-K_{j}\right\|_{\text{TV}}\left\|\mu-\nu\right\|_{\text{TV}}+\varepsilon\left\|\nu\right\|_{\text{TV}}\\ &\leq\left\|\mu-\nu\right\|_{\text{TV}}+\varepsilon.\end{split} (24)

This implies that the total variation deviation between any μ~,ν~t+1\tilde{\mu},\tilde{\nu}\in\mathscr{M}_{t+1} is bounded by

maxtμνTV+ε.\max_{\mathscr{M}_{t}}\left\|\mu-\nu\right\|_{\text{TV}}+\varepsilon.

Note that at t=0t=0, max0μνTV=0\max_{\mathscr{M}_{0}}\left\|\mu-\nu\right\|_{\text{TV}}=0. Hence, at each t>0t>0, as ε0\varepsilon\rightarrow 0,

maxtμνTV0.\max_{\mathscr{M}_{t}}\left\|\mu-\nu\right\|_{\text{TV}}\rightarrow 0.

By the product topology and the Kolmogorov extension theorem, the set {𝐏}I𝕀\{\mathbf{P}\}_{I\in\mathbb{I}} reduces to a singleton. The conclusion follows.
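
The one-step estimate (24) can be probed numerically. The sketch below samples random stochastic matrices whose rows are close in total variation, together with random marginals, and checks the resulting bound; it uses the unnormalized ℓ1 convention for the total variation of discrete measures, and the sampling choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n):
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

def random_dist(n):
    p = rng.random(n)
    return p / p.sum()

# Empirical check of ||V^T mu - K^T nu||_TV <= ||mu - nu||_TV + eps with the
# unnormalized l1 convention for the total variation of discrete measures.
n, violations = 6, 0
for _ in range(10_000):
    V = random_stochastic(n)
    K = np.clip(V + 0.05 * (rng.random((n, n)) - 0.5), 1e-12, None)
    K /= K.sum(axis=1, keepdims=True)            # rows of K stay close to rows of V
    eps = np.abs(V - K).sum(axis=1).max()        # max_i ||V_i - K_i||_1
    mu, nu = random_dist(n), random_dist(n)
    lhs = np.abs(V.T @ mu - K.T @ nu).sum()
    rhs = np.abs(mu - nu).sum() + eps
    violations += lhs > rhs + 1e-12
print("violations of the bound:", violations)    # expected: 0
```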

Appendix 0.B Proofs of Section 4

Proof of Lemma 3.

Proof  The claim can be proved, for example, using inclusion functions. Let 𝕀n\mathbb{IR}^{n} denote the set of all boxes in n\mathbb{R}^{n}. Let [f]:𝕀n𝕀n[f]:\mathbb{IR}^{n}\rightarrow\mathbb{IR}^{n} be a convergent inclusion function of ff satisfying (i) f([x])[f]([x])f([x])\subseteq[f]([x]) for all [x]𝕀n[x]\in\mathbb{IR}^{n}; (ii) limλ([y])0λ([f]([y]))=0\lim_{\lambda([y])\rightarrow 0}\lambda([f]([y]))=0, where λ\lambda denotes the width. Similarly, let [B]:𝕀n𝕀n×k[B]:\mathbb{IR}^{n}\rightarrow\mathbb{IR}^{n\times k} be a convergent inclusion matrix of b(x)b(x) satisfying (i) b([x])[B]([x])b([x])\subseteq[B]([x]) for all [x]𝕀n[x]\in\mathbb{IR}^{n}; (ii) limλ([y])0λ([B]([y]))=0\lim_{\lambda([y])\rightarrow 0}\lambda([B]([y]))=0, where λ([B]([y])):=maxi,jλ([Bij])\lambda([B]([y])):=\max_{i,j}\lambda([B_{ij}]).

Without loss of generality, we assume that κ<1\kappa<1. Due to the Lipschitz continuity of ff and bb, we can find inclusions such that λ([f]([y]))Lfλ([y])\lambda([f]([y]))\leq L_{f}\lambda([y]) for any subinterval [y][y] of [x][x], and similarly λ([B]([y]))Lbλ([y])\lambda([B]([y]))\leq L_{b}\lambda([y]). For each such interval [y][y], we obtain the intervals [m]=[f]([y])[m]=[f]([y]) and [s2]=[B]([y])[B]([y])[s^{2}]=[B]([y])[B]^{*}([y]). Let TT denote the collection of Gaussian measures with means and covariances in all such intervals ([m][m] and [s2][s^{2}]), and let 𝕋1([x])^\widehat{\mathbb{T}_{1}([x])} be its union. Then 𝕋1([x])^\widehat{\mathbb{T}_{1}([x])} satisfies the requirement. Indeed, we have 𝕋1([x])𝕋1([x])^\mathbb{T}_{1}([x])\subseteq\widehat{\mathbb{T}_{1}([x])}. For the second part of the inclusion, we have, for any μ𝕋1([x])^\mu\in\widehat{\mathbb{T}_{1}([x])} and ν𝕋1([x])\nu\in\mathbb{T}_{1}([x]),

μνW2[m]2+[s2](Lfλ([y]))2+(Lbλ([y]))2,\left\|\mu-\nu\right\|_{\text{W}}^{2}\leq[m]^{2}+[s^{2}]\leq(L_{f}\lambda([y]))^{2}+(L_{b}\lambda([y]))^{2}, (25)

where we can choose [y][y] small enough that (Lfλ([y]))2+(Lbλ([y]))2<κ2(L_{f}\lambda([y]))^{2}+(L_{b}\lambda([y]))^{2}<\kappa^{2}. The second part of the inclusion follows from such a choice of [y][y].
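
The construction in this proof amounts to interval arithmetic: subdivide [x] into pieces [y] until the widths of [f]([y]) and [B]([y]) fall below the required threshold. A minimal one-dimensional sketch is given below; the dynamics f and b, their interval extensions, and the threshold κ are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

# 1-D intervals as (lo, hi) pairs; natural interval extensions for a toy system
# x_{t+1} = f(x) + b(x) w with f(x) = 0.8 x + 0.1 sin(x) and b(x) = 0.2 + 0.05 x.
# Both the dynamics and the extensions below are illustrative placeholders.

def f_incl(lo, hi):
    # 0.8*[x] + 0.1*enclosure of sin([x]); crude sin enclosure via sampling plus a Lipschitz pad
    xs = np.linspace(lo, hi, 64)
    pad = (hi - lo) / 63.0
    s_lo, s_hi = np.sin(xs).min() - pad, np.sin(xs).max() + pad
    return 0.8 * lo + 0.1 * s_lo, 0.8 * hi + 0.1 * s_hi

def b_incl(lo, hi):
    return 0.2 + 0.05 * lo, 0.2 + 0.05 * hi        # monotone, so endpoints suffice

def refine(lo, hi, kappa):
    """Subdivide [x] until each piece [y] yields mean/noise enclosures of width < kappa."""
    stack, leaves = [(lo, hi)], []
    while stack:
        a, c = stack.pop()
        m_lo, m_hi = f_incl(a, c)
        s_lo, s_hi = b_incl(a, c)
        if max(m_hi - m_lo, s_hi - s_lo) < kappa:
            leaves.append(((a, c), (m_lo, m_hi), (s_lo, s_hi)))
        else:
            mid = 0.5 * (a + c)
            stack += [(a, mid), (mid, c)]
    return leaves

if __name__ == "__main__":
    pieces = refine(-1.0, 1.0, kappa=0.05)
    print(f"{len(pieces)} subintervals; first piece:", pieces[0])
```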

Proof of Corollary 3.

Proof  The first part of the proof is provided in Section 3. The second inclusion is shown in a similar way as Lemma 2 and Theorem 3. Indeed, by the definition of abstraction, for any μt\mu\in\mathscr{M}_{t}, there exists a marginal measure ν\nu of some X𝕏2X\in\mathbb{X}_{2} such that their probabilities match on the discrete nodes. By the same induction as in Lemma 2, for any law 𝐏Iq0\mathbf{P}_{I}^{q_{0}} of some I𝕀I\in\mathbb{I}, there exists a 𝒫Xν0\mathcal{P}_{X}^{\nu_{0}} of some X𝕏2X\in\mathbb{X}_{2} such that the probabilities of any Γ(Q)\Gamma\in\mathscr{B}(Q^{\infty}) match. The second inclusion follows. The compactness follows in a similar way as Proposition 2. Note that S1S_{1} may not be convex, but S𝕀S_{\mathbb{I}} and S2S_{2} are (see also Remark 15).