
University of Waterloo, Waterloo, ON N2L 3G1, Canada. Email: {yiming.meng, j.liu}@uwaterloo.ca

Robustly Complete Finite-State Abstractions for Verification of Stochastic Systems

Yiming Meng    Jun Liu
Abstract

In this paper, we focus on discrete-time stochastic systems modelled by nonlinear stochastic difference equations and propose robust abstractions for verifying probabilistic linear temporal specifications. The current literature focuses on developing sound abstraction techniques for stochastic dynamics without perturbations; however, soundness has thus far only been shown for preserving the satisfaction probability of certain types of temporal-logic specifications. We present constructive finite-state abstractions for verifying probabilistic satisfaction of general $\omega$-regular linear-time properties of more general nonlinear stochastic systems. Instead of imposing stability assumptions, we analyze the probabilistic properties from the topological view of the metrizable space of probability measures. Such abstractions are both sound and approximately complete. That is, given a concrete discrete-time stochastic system and an arbitrarily small $\mathcal{L}_1$-perturbation of this system, there exists a family of finite-state Markov chains whose set of satisfaction probabilities contains that of the original system and is in turn contained by that of the slightly perturbed system. A direct consequence is that, given a probabilistic linear-time specification, initializing within the winning/losing region of the abstraction guarantees satisfaction/dissatisfaction for the original system. We also make the interesting observation that, unlike the deterministic case, point-mass (Dirac) perturbations cannot fulfill the purpose of robust completeness.

Keywords:
Verification of stochastic systems · Finite-state abstraction · Robustness · Soundness · Completeness · $\mathcal{L}_1$-perturbation · Linear temporal logic · Metrizable space of probability measures

1 Introduction

Formal verification is a rigorous mathematical technique for verifying system properties using formal analysis or model checking [4]. Abstraction-based formal verification for deterministic systems has by now reached maturity [5]. Whilst bisimilar (equivalent) symbolic models exist for linear (control) systems [17, 31], sound and approximately complete finite abstractions can be achieved via stability assumptions [25, 14] or robustness (in terms of Dirac perturbations) [21, 20, 22].

There is a recent surge of interest in studying formal verification for stochastic systems. The verification of temporal logics for discrete-state homogeneous Markov chains can be solved by existing tools [4, 24, 6, 9].

In terms of verification for general discrete-time continuous-state Markov systems, a common theme is to construct abstractions that approximate the probability of satisfaction in suitable ways. First attempts [26, 29, 30, 3, 2] related the verification of simple probabilistic computation tree logic (PCTL) formulas to the computation of corresponding value functions. The authors of [32, 33] developed alternative techniques to deal with the potential error blow-up in infinite-horizon problems. The same authors [35] investigated the necessity of absorbing sets for the uniqueness of the solutions of the corresponding Bellman equations; the related PCTL verification problems can then be precisely captured by finite-horizon ones. They also proposed abstractions for verifying general bounded linear-time (LT) properties [34], and extended them to infinite-horizon reach-avoid and repeated reachability problems [34, 36].

Markov set-chains can also serve as abstractions. The authors of [1] showed that the abstraction error is finite under strong stability (ergodicity) assumptions. A closely related approach is to use interval-valued Markov chains (IMCs), i.e., families of finite-state Markov chains with uncertain transitions, as abstractions of continuous-state Markov systems with a given transition kernel. The authors of [18] argued, without proof, that for every PCTL formula the probability of (path) satisfaction of the IMC abstraction forms a compact interval which contains the true probability of the original system. They further developed ‘O’-maximizing/minimizing algorithms based on [15, 40] to obtain upper/lower bounds on the satisfaction probability of ‘next’, ‘bounded until’, and ‘until’ properties. The algorithm provides a fundamental view of computing bounds on the satisfaction probability given an IMC. However, the intuitive reasoning for soundness appears inaccurate based on our observation (readers interested in the details are referred to Remark 8 of this paper). Inspired by [18], the work in [7] formulated IMC abstractions for verifying bounded-LTL specifications; the work in [11, 12] constructed IMC abstractions for verifying general $\omega$-regular properties of mixed-monotone systems and provided a novel automata-based approach for obtaining bounds on the satisfaction probability. In [11, 12, Fact 1], the authors claimed the soundness of verifying general $\omega$-regular properties using IMC abstractions, but a proof is not provided. In [10], the authors showed that IMCs can be used to provide conservative estimates of expected values for stochastic linear control systems. The authors also remarked that their result can be extended to deal with $\omega$-regular properties based on [15, 19] (where [19] only concerns safety properties), but without any proofs. To the best of the authors' knowledge, a general framework, such as the one presented in this paper, for guaranteeing soundness of IMC abstractions for verifying $\omega$-regular properties is currently lacking.

Motivated by these issues, our first contribution is a formal mathematical proof of the soundness of IMC abstractions for verifying $\omega$-regular linear-time properties. We show that, for any discrete-time stochastic dynamical system modelled by a stochastic difference equation and any linear-time property, an IMC abstraction returns a compact interval of probabilities of (path) satisfaction which contains the satisfaction probability of the original system. A direct consequence is that starting within the winning/losing region computed from the abstraction guarantees satisfaction/dissatisfaction for the original system. The second contribution of this paper is to deal with stochastic systems subject to additional uncertain perturbations (due to, e.g., measurement errors or modelling uncertainties). Under mild assumptions, we show that, for verifying probabilistic satisfaction of general $\omega$-regular linear-time properties, IMC abstractions that are both sound and approximately complete can be constructed for nonlinear stochastic systems. That is, given a concrete discrete-time continuous-state Markov system $\mathbb{X}$ and an arbitrarily small $\mathcal{L}_1$-bounded perturbation of this system, there exists an IMC abstraction whose set of satisfaction probabilities contains that of $\mathbb{X}$ and is in turn contained by that of the slightly perturbed system. We argue in Section 4 that, to make the IMC abstraction robustly complete, the perturbation generally needs to be $\mathcal{L}_1$-bounded rather than bounded only in terms of point masses. We analyze the probabilistic properties based on the topology of the metrizable space of (uncertain) probability measures, and show that this technique is more powerful than reasoning purely about the values of probabilities. We would also like to clarify that the main purpose of this paper is not to provide more efficient algorithms for computing abstractions. We aim to provide a theoretical foundation for IMC abstractions for verifying continuous-state stochastic systems with perturbations, and hope to shed some light on the design of more powerful robust verification algorithms.

The rest of the paper is organized as follows. Section 2 presents preliminaries on probability spaces and Markov systems. Section 3 establishes the soundness of abstractions for verifying $\omega$-regular linear-time properties of discrete-time nonlinear stochastic systems. Section 4 presents constructive robust abstractions with soundness and approximate completeness guarantees, and discusses the differences in robustness between deterministic and stochastic systems. The paper is concluded in Section 5.

Notation: We denote by $\prod$ the product of ordinary sets, spaces, or function values, and by $\otimes$ the product of collections of sets, sigma-algebras, or measures. The $n$-times repeated product of any kind is denoted by $(\cdot)^n$ for simplicity. We denote by $\pi_j:\prod_{i=0}^{\infty}(\cdot)_i\rightarrow(\cdot)_j$ the projection onto the $j^{\text{th}}$ component, and by $\mathscr{B}(\cdot)$ the Borel $\sigma$-algebra of a set.

Let $|\cdot|$ denote the infinity norm on $\mathbb{R}^n$, and let $\mathbb{B}:=\{x\in\mathbb{R}^n:|x|\leq 1\}$. We denote by $\|\cdot\|_1:=\mathcal{E}|\cdot|$ the $\mathcal{L}_1$-norm for $\mathbb{R}^n$-valued random variables, and let $\mathbb{B}_1:=\{X:\mathbb{R}^n\text{-valued random variable with }\|X\|_1\leq 1\}$. Given a matrix $M$, we denote by $M_i$ its $i^{\text{th}}$ row and by $M_{ij}$ its entry at the $i^{\text{th}}$ row and $j^{\text{th}}$ column.

Given a general state space $\mathcal{X}$, we denote by $\mathfrak{P}(\mathcal{X})$ the space of probability measures on $\mathcal{X}$. The space of bounded continuous functions on $\mathcal{X}$ is denoted by $C_b(\mathcal{X})$. For any stochastic process $\{X_t\}_{t\geq 0}$, we use the shorthand notation $X:=\{X_t\}_{t\geq 0}$. For any stopped process $\{X_{t\wedge\tau}\}_{t\geq 0}$, where $\tau$ is a stopping time, we use the shorthand notation $X^{\tau}$.

2 Preliminaries

We consider $\mathbb{N}=\{0,1,\cdots\}$ as the discrete time index set and a general Polish (complete and separable metric) space $\mathcal{X}$ as the state space. For any discrete-time $\mathcal{X}^{\infty}$-valued stochastic process $X$, we introduce some standard concepts as follows.

2.1 Canonical sample space

Let $X$ be a stochastic process defined on some (possibly unknown) probability space $(\Omega^{\dagger},\mathscr{F}^{\dagger},\mathbb{P}^{\dagger})$. For $\varpi\in\mathcal{X}^{\infty}=:\Omega$ and $t\in\mathbb{N}$, we define $\varpi_t:=\pi_t(\varpi)$ and the coordinate process $\mathfrak{X}_t:\mathcal{X}^{\infty}\rightarrow\mathcal{X}$ by $\mathfrak{X}_t(\varpi):=\varpi_t$, associated with $\mathcal{F}:=\sigma\{\mathfrak{X}_0,\mathfrak{X}_1,\cdots\}$. Then $\Omega^{\dagger}\rightarrow\mathcal{X}^{\infty}$, $\omega^{\dagger}\mapsto\prod_{t=0}^{\infty}X_t(\omega^{\dagger})$, is a measurable map from $(\Omega^{\dagger},\mathscr{F}^{\dagger})$ to $(\Omega,\mathcal{F})$. In particular, $\mathcal{F}=\sigma\{\mathfrak{X}_t\in\Gamma,\;\Gamma\in\mathscr{B}(\mathcal{X}),\;t\in\mathbb{N}\}=\mathscr{B}(\mathcal{X}^{\infty})=\mathscr{B}^{\infty}(\mathcal{X})=\sigma\{\mathcal{C}\}$, where $\mathcal{C}$ is the collection of all finite-dimensional cylinder sets of the following form:

\prod_{i=1}^{n}\Gamma_{i}=\{\varpi:\mathfrak{X}_{t_{1}}(\varpi)\in\Gamma_{1},\cdots,\mathfrak{X}_{t_{n}}(\varpi)\in\Gamma_{n},\;\Gamma_{i}\in\mathscr{B}(\mathcal{X}),\,t_{i}\in\mathbb{N},\,i=1,\cdots,n\}.

The measure $\mathcal{P}:=\mathbb{P}^{\dagger}\circ X^{-1}$ of the coordinate process $\mathfrak{X}$ is then uniquely determined and gives the probability law of the process $X$ on the product state space, i.e.,

\mathcal{P}[\mathfrak{X}_{t_{1}}\in\Gamma_{1},\cdots,\mathfrak{X}_{t_{n}}\in\Gamma_{n}]=\mathcal{P}\left(\prod_{i=1}^{n}\Gamma_{i}\right)=\mathbb{P}^{\dagger}[X_{t_{1}}\in\Gamma_{1},\cdots,X_{t_{n}}\in\Gamma_{n}] (1)

for any finite-dimensional cylinder set $\prod_{i=1}^{n}\Gamma_i\in\mathcal{F}$. We call $(\Omega,\mathcal{F},\mathcal{P})$ the canonical space of $X$ and denote by $\mathcal{E}$ the associated expectation operator.

Since we only care about the probabilistic behavior of trajectories in the state space, we prefer to work on the canonical probability space $(\Omega,\mathcal{F},\mathcal{P})$ and regard events as sets of sample paths. To this end, we do not distinguish $\mathfrak{X}$ from $X$, since they are identical in distribution; i.e., we use $X$ to denote its own coordinate process for simplicity.

In the context of a discrete state space $\mathcal{X}$, we specifically use the boldface notation $(\mathbf{\Omega},\mathbf{F},\mathbf{P})$ for the canonical space of a discrete-state process.

Remark 1

We usually denote by $\nu_i$ the marginal distribution of $\mathcal{P}$ at some $i\in\mathbb{N}$. We can informally write the $n$-dimensional distribution (on $n$-dimensional cylinder sets) as $\mathcal{P}(\cdot)=\otimes_{i=1}^{n}\nu_i(\cdot)$ regardless of the dependence.

2.2 Markov transition systems

For any discrete-time stochastic process $X$, we set $\mathcal{F}_t=\sigma\{X_0,X_1,\cdots,X_t\}$ to be the natural filtration.

Definition 1 (Markov process)

A stochastic process $X$ is said to be a Markov process if each $X_t$ is $\mathcal{F}_t$-adapted and, for any $\Gamma\in\mathscr{B}(\mathcal{X})$ and $t>s$, we have

\mathcal{P}[X_{t}\in\Gamma\;|\;\mathcal{F}_{s}]=\mathcal{P}[X_{t}\in\Gamma\;|\;\sigma\{X_{s}\}],\;\;\text{a.s.} (2)

Correspondingly, for every $t$, we define the transition probability as

\Theta_{t}(x,\Gamma):=\mathcal{P}[X_{t+1}\in\Gamma\;|\;X_{t}=x],\;\;\Gamma\in\mathscr{B}(\mathcal{X}). (3)

We denote by $\Theta_t:=\{\Theta_t(x,\Gamma):\;x\in\mathcal{X},\;\Gamma\in\mathscr{B}(\mathcal{X})\}$ the family of transition probabilities at time $t$. Note that homogeneous Markov processes are the special case in which $\Theta_t=\Theta_s$ for all $t\neq s$.

We are interested in Markov processes with discrete observations of the states, obtained by assigning abstract labels from a finite set of atomic propositions. We define an abstract family of labelled Markov processes as follows.

Definition 2 (Markov system)

A Markov system is a tuple $\mathbb{X}=(\mathcal{X},[\![\Theta]\!],\Pi,L)$, where

  • $\mathcal{X}=\mathcal{W}\cup\Delta$, where $\mathcal{W}$ is a bounded working space and $\Delta:=\mathcal{W}^c$ represents all out-of-domain states;

  • $[\![\Theta]\!]$ is a collection of transition probabilities from which $\Theta_t$ is chosen for every $t$;

  • $\Pi$ is the finite set of atomic propositions;

  • $L:\mathcal{X}\rightarrow 2^{\Pi}$ is the (Borel-measurable) labelling function.

For $X\in\mathbb{X}$ with $X_0=x_0$ a.s., we denote by $\mathcal{P}_X^{x_0}$ its law and by $\{\mathcal{P}_X^{x_0}\}_{X\in\mathbb{X}}$ the collection of such laws. Similarly, for any initial distribution $\nu_0\in\mathfrak{P}(\mathcal{X})$, we define the law by $\mathcal{P}_X^{\nu_0}(\cdot)=\int_{\mathcal{X}}\mathcal{P}_X^{x}(\cdot)\nu_0(dx)$ and denote its collection by $\{\mathcal{P}_X^{\nu_0}\}_{X\in\mathbb{X}}$. We denote by $\{\mathcal{P}_n^{x_0}\}_{n=0}^{\infty}$ (resp. $\{\mathcal{P}_n^{\nu_0}\}_{n=0}^{\infty}$) a sequence in $\{\mathcal{P}_X^{x_0}\}_{X\in\mathbb{X}}$ (resp. $\{\mathcal{P}_X^{\nu_0}\}_{X\in\mathbb{X}}$). We simply write $\mathcal{P}_X$ (resp. $\{\mathcal{P}_X\}_{X\in\mathbb{X}}$) if we do not emphasize the initial condition.

For a path $\varpi:=\varpi_0\varpi_1\varpi_2\cdots\in\Omega$, we define its trace by $L_{\varpi}:=L(\varpi_0)L(\varpi_1)L(\varpi_2)\cdots$. The space of infinite words is denoted by

(2^{\Pi})^{\omega}=\{A_{0}A_{1}A_{2}\cdots:A_{i}\in 2^{\Pi},\;i=0,1,2,\cdots\}.

A linear-time (LT) property is a subset of $(2^{\Pi})^{\omega}$. We are only interested in LT properties $\Psi$ such that $\Psi\in\mathscr{B}((2^{\Pi})^{\omega})$, i.e., those that are Borel measurable.

Remark 2

Note that, by [36] and [38, Proposition 2.3], any $\omega$-regular language of labelled Markov processes is measurable. It follows that, for any Markov process $X$ of the given $\mathbb{X}$, the traces $L_{\varpi}$ generated by measurable labelling functions are also measurable. For each $\Psi\in\mathscr{B}((2^{\Pi})^{\omega})$, we have the event $L_{\varpi}^{-1}(\Psi)\in\mathcal{F}$.

A particular subclass of LT properties can be specified by linear temporal logic (LTL). (While we consider LTL due to our interest, it can easily be seen that all results of this paper in fact hold for any measurable LT property, including $\omega$-regular specifications.) To connect with LTL specifications, we introduce the semantics of path satisfaction as well as probabilistic satisfaction as follows.

Definition 3

For the syntax of LTL formulae $\Psi$ and the semantics of satisfaction of $\Psi$ on infinite words, we refer readers to [21, Section 2.4].

For a given labelled Markov process $X$ from $\mathbb{X}$ with initial distribution $\nu_0$, we formulate the canonical space $(\Omega,\mathcal{F},\mathcal{P}_X^{\nu_0})$. For a path $\varpi\in\Omega$, we define path satisfaction as

\varpi\vDash\Psi\Longleftrightarrow L_{\varpi}\vDash\Psi.

We denote by $\{X\vDash\Psi\}:=\{\varpi:\;\varpi\vDash\Psi\}\in\mathcal{F}$ the event of path satisfaction. Given a specified probability $\rho\in[0,1]$, we define the probabilistic satisfaction of $\Psi$ as

X\vDash\mathcal{P}^{\nu_{0}}_{\bowtie\rho}[\Psi]\Longleftrightarrow\mathcal{P}_{X}^{\nu_{0}}\{X\vDash\Psi\}\bowtie\rho,

where $\bowtie\,\in\{\leq,<,\geq,>\}$.

2.3 Weak convergence and Prokhorov’s theorem

We consider the set of possible uncertain measures within the topological space of probability measures. The following concepts are frequently used later.

Definition 4 (Tightness of set of measures)

Let $\mathcal{X}$ be any topological state space and $M\subseteq\mathfrak{P}(\mathcal{X})$ be a set of probability measures on $\mathcal{X}$. We say that $M$ is tight if, for every $\varepsilon>0$, there exists a compact set $K\subset\mathcal{X}$ such that $\mu(K)\geq 1-\varepsilon$ for every $\mu\in M$.

Definition 5 (Weak convergence)

A sequence $\{\mu_n\}_{n=0}^{\infty}\subseteq\mathfrak{P}(\mathcal{X})$ is said to converge weakly to a probability measure $\mu$, denoted by $\mu_n\Rightarrow\mu$, if

\int_{\mathcal{X}}h(x)\mu_{n}(dx)\rightarrow\int_{\mathcal{X}}h(x)\mu(dx),\;\;\forall h\in C_{b}(\mathcal{X}). (4)

We frequently use the following alternative condition [8, Proposition 2.2]:

\mu_{n}(A)\rightarrow\mu(A),\;\;\forall A\in\mathscr{B}(\mathcal{X})\;\text{s.t.}\;\mu(\partial A)=0. (5)

Correspondingly, two measures $\mu$ and $\nu$ on $\mathcal{X}$ are weakly equivalent if

\int_{\mathcal{X}}h(x)\mu(dx)=\int_{\mathcal{X}}h(x)\nu(dx),\;\;\forall h\in C_{b}(\mathcal{X}). (6)
Remark 3

Weak convergence describes the weak topology, which extends the usual notion of convergence in the deterministic setting. Note that $x_n\rightarrow x$ in $\mathcal{X}$ is equivalent to the weak convergence of the Dirac measures $\delta_{x_n}\Rightarrow\delta_x$. It is interesting to note that $x_n\rightarrow x$ (resp. $x=y$) in $\mathcal{X}$ does not imply the strong convergence (resp. equivalence) of the associated Dirac measures. A classical counterexample is $x_n=1/n$ and $x=0$: we do not have $\lim_{n\rightarrow\infty}\delta_{1/n}=\delta_0$ in the strong sense, since $0=\lim_{n\rightarrow\infty}\delta_{1/n}(\{0\})\neq\delta_0(\{0\})=1$.

To describe the convergence (in probability law) of more general random variables $\{X_n\}$ in $\mathcal{X}$, it is equivalent to investigate the weak convergence of their associated measures $\{\mu_n\}$. It is also straightforward from Definition 5 that weak convergence describes the convergence of probabilistic properties related to $\{\mu_n\}$.

Theorem 1 (Prokhorov)

Let $\mathcal{X}$ be a complete separable metric space. A family $\Lambda\subseteq\mathfrak{P}(\mathcal{X})$ is relatively compact if and only if it is tight. Consequently, for each sequence $\{\mu_n\}$ in a tight family $\Lambda$, there exist a $\mu\in\bar{\Lambda}$ and a subsequence $\{\mu_{n_k}\}$ such that $\mu_{n_k}\Rightarrow\mu$.

Remark 4

The first part of Prokhorov's theorem provides an alternative criterion, via tightness, for verifying the relative compactness of a family of measures w.r.t. the corresponding metric space. On a compact metric space $\mathcal{X}$, every family of probability measures is tight.

2.4 Discrete-time continuous-state stochastic systems

We define Markov processes determined by the difference equation

X_{t+1}=f(X_{t})+b(X_{t})w_{t}+\vartheta\xi_{t} (7)

where the state $X_t(\varpi)\in\mathcal{X}\subseteq\mathbb{R}^n$ for all $t\in\mathbb{N}$, and the stochastic inputs $\{w_t\}_{t\in\mathbb{N}}$ are i.i.d. Gaussian random variables with covariance $I_{k\times k}$, without loss of generality. The mappings $f:\mathbb{R}^n\rightarrow\mathbb{R}^n$ and $b:\mathbb{R}^n\rightarrow\mathbb{R}^{n\times k}$ are locally Lipschitz continuous. The memoryless perturbations $\xi_t\in\mathbb{B}_1$ are independent random variables with intensity $\vartheta\geq 0$ and unknown distributions.

For $\vartheta\neq 0$, (7) defines a family $\mathbb{X}$ of Markov processes $X$. A special case of (7) is when $\xi$ has Dirac (point-mass) distributions $\{\delta_x:x\in\mathbb{B}\}$ centered at uncertain points within the unit ball.

Remark 5

The discrete-time stochastic dynamics (7) are usually obtained from numerical schemes for stochastic differential equations driven by Brownian motions, so as to simulate the probability laws at the observation times. Gaussian random variables are the natural choice for simulating Brownian motions at discrete times. Note that in [11], random variables with a known unimodal symmetric density supported on an interval are used instead. That choice favors mixed-monotone models, which provide a more accurate approximation of transition probabilities. Other than the precision issue, such a choice does not bring additional $\mathcal{L}_1$ properties. Since we focus on formal analysis based on $\mathcal{L}_1$ properties rather than on accurate approximation, using Gaussian randomness as a realization does not incur any loss of generality.

We only care about the behaviors within the bounded working space $\mathcal{W}$. By defining the stopping time $\tau:=\inf\{t\in\mathbb{N}:X_t\notin\mathcal{W}\}$ for each $X$, we can study the probability law of the corresponding stopped (killed) process $X^{\tau}$ for any initial condition $x_0$ (resp. $\nu_0$), which coincides with $\mathcal{P}_X^{x_0}$ (resp. $\mathcal{P}_X^{\nu_0}$) on $\mathcal{W}$. To avoid extra notation, we use the same symbols $X$ and $\mathcal{P}_X^{x_0}$ (resp. $\mathcal{P}_X^{\nu_0}$) to denote the stopped processes and the associated laws. Such processes driven by (7) can be written as a Markov system

\mathbb{X}=(\mathcal{X},[\![\mathcal{T}]\!],\Pi,L), (8)

where, for all $x\in\mathcal{X}\setminus\mathcal{W}$, the transition probability satisfies $\mathcal{T}(x,\Gamma)=0$ for all $\Gamma$ with $\Gamma\cap\mathcal{W}\neq\emptyset$, and $[\![\mathcal{T}]\!]$ is the collection of transition probabilities. For $\xi$ having Dirac distributions, the transition $\mathcal{T}$ is of the following form:

\mathcal{T}(x,\cdot)\in\left\{\begin{array}{ll}\{\mu\sim\mathcal{N}(f(x)+\vartheta\xi,\;b(x)b^{T}(x)),\;\xi\in\mathbb{B}\},&\forall x\in\mathcal{W},\\ \{\mu:\;\mu(\Gamma)=0,\;\forall\Gamma\cap\mathcal{W}\neq\emptyset\},&\forall x\in\mathcal{X}\setminus\mathcal{W}.\end{array}\right. (9)
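To illustrate (9) concretely, the following is a minimal one-dimensional sketch computing the probability that the Gaussian transition kernel sends a state into a rectangular cell; the dynamics f, b, the working space, and all numerical values are illustrative assumptions and not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

# 1-D illustration of (9): T(x, [l, u]) = P[f(x) + b(x) w + theta*xi in [l, u]],
# with w ~ N(0, 1) and a fixed Dirac perturbation xi in [-1, 1].
def transition_prob(x, l, u, f, b, theta=0.0, xi=0.0):
    mean = f(x) + theta * xi              # mean of N(f(x) + theta*xi, b(x)^2)
    std = abs(b(x))
    return norm.cdf(u, loc=mean, scale=std) - norm.cdf(l, loc=mean, scale=std)

# Illustrative dynamics (assumptions, not from the paper).
f = lambda x: 0.8 * x + 0.1
b = lambda x: 0.05 + 0.01 * x

# Probability of landing in the cell [0.4, 0.5] from x = 0.5 under the two
# extreme Dirac perturbations xi = -1 and xi = +1.
for xi in (-1.0, 1.0):
    p = transition_prob(0.5, 0.4, 0.5, f, b, theta=0.02, xi=xi)
    print(f"xi = {xi:+.0f}:  T(0.5, [0.4, 0.5]) = {p:.4f}")
```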
Assumption 1

We assume that $\textbf{in}\in L(x)$ for any $x\notin\Delta$ and $\textbf{in}\notin L(\Delta)$. We can also include 'always $(\textbf{in})$' in the specifications so as to observe sample paths with 'inside-domain' behaviors, which is equivalent to verifying $\{\tau=\infty\}$.

2.5 Robust abstractions

We define a notion of abstraction between continuous-state and finite-state Markov systems via state-level relations and measure-level relations.

Definition 6

A (binary) relation $\gamma$ from $A$ to $B$ is a subset of $A\times B$. We write (i) for each $a\in A$, $\gamma(a):=\{b\in B:(a,b)\in\gamma\}$; (ii) for each $b\in B$, $\gamma^{-1}(b):=\{a\in A:(a,b)\in\gamma\}$; (iii) for $A'\subseteq A$, $\gamma(A')=\cup_{a\in A'}\gamma(a)$; and (iv) for $B'\subseteq B$, $\gamma^{-1}(B')=\cup_{b\in B'}\gamma^{-1}(b)$.

Definition 7

Given a continuous-state Markov system

\mathbb{X}=(\mathcal{X},[\![\mathcal{T}]\!],\Pi,L)

and a finite-state Markov system

\mathbb{I}=(\mathcal{Q},[\![\Theta]\!],\Pi,L_{\mathbb{I}}),

where $Q=(q_1,\cdots,q_n)^T$ and $[\![\Theta]\!]$ is a collection of $n\times n$ stochastic matrices, a state-level relation $\alpha\subseteq\mathcal{X}\times Q$ is said to be an abstraction from $\mathbb{X}$ to $\mathbb{I}$ if (i) for all $x\in\mathcal{X}$, there exists $q\in Q$ such that $(x,q)\in\alpha$; (ii) for all $(x,q)\in\alpha$, $L_{\mathbb{I}}(q)=L(x)$.

A measure-level relation $\gamma_{\alpha}\subseteq\mathfrak{P}(\mathcal{X})\times\mathfrak{P}(Q)$ is said to be an abstraction from $\mathbb{X}$ to $\mathbb{I}$ if for all $i\in\{1,2,\cdots,n\}$, all $\mathcal{T}\in[\![\mathcal{T}]\!]$, and all $x\in\alpha^{-1}(q_i)$, there exists $\Theta\in[\![\Theta]\!]$ such that $(\mathcal{T}(x,\cdot),\Theta_i)\in\gamma_{\alpha}$ and $\mathcal{T}(x,\alpha^{-1}(q_j))=\Theta_{ij}$ for all $j\in\{1,2,\cdots,n\}$.

Similarly, $\gamma_{\alpha}\subseteq\mathfrak{P}(Q)\times\mathfrak{P}(\mathcal{X})$ is said to be an abstraction from $\mathbb{I}$ to $\mathbb{X}$ if for all $i\in\{1,2,\cdots,n\}$, all $\Theta\in[\![\Theta]\!]$, and all $x\in\alpha^{-1}(q_i)$, there exists $\mathcal{T}\in[\![\mathcal{T}]\!]$ such that $(\Theta_i,\mathcal{T}(x,\cdot))\in\gamma_{\alpha}$ and $\mathcal{T}(x,\alpha^{-1}(q_j))=\Theta_{ij}$ for all $j\in\{1,2,\cdots,n\}$.

If such relations $\alpha$ and $\gamma_{\alpha}$ exist, we say that $\mathbb{I}$ abstracts $\mathbb{X}$ (resp. $\mathbb{X}$ abstracts $\mathbb{I}$) and write $\mathbb{X}\preceq_{\gamma_{\alpha}}\mathbb{I}$ (resp. $\mathbb{I}\preceq_{\gamma_{\alpha}}\mathbb{X}$).

Assumption 2

Without loss of generality, we assume that the labelling function is amenable to a rectangular partition (see, e.g., [11, Definition 1]). In other words, a state-level abstraction can be obtained from a rectangular partition.

3 Soundness of Robust IMC Abstractions

IMCs (whose formal definition we omit due to space limitations; see, e.g., [18, Definition 3]) are quasi-Markov systems on a discrete state space with over/under-approximations ($\hat{\Theta}$/$\check{\Theta}$) of the true transition matrices. To abstract the transition probabilities of the continuous-state Markov system (8), $\hat{\Theta}$ and $\check{\Theta}$ are obtained from over/under-approximations of $\mathcal{T}$ based on the state-space partition. Throughout this section, we assume that $\hat{\Theta}$ and $\check{\Theta}$ have been constructed accordingly.

Given an IMC, we recast it to a finite-state Markov system

\mathbb{I}=(\mathcal{Q},[\![\Theta]\!],\Pi,L_{\mathbb{I}}), (10)

where

  • $\mathcal{Q}$ is the finite state-space partition of cardinality $N+1$ containing $\{\Delta\}$, i.e., $Q=(q_1,q_2,\cdots,q_N,\Delta)^T$;

  • $[\![\Theta]\!]$ is a set of stochastic matrices satisfying

    [\![\Theta]\!]=\{\Theta:\text{stochastic matrices with}\;\check{\Theta}\leq\Theta\leq\hat{\Theta}\;\text{componentwise}\}; (11)

    (restricting $[\![\Theta]\!]$ to stochastic matrices is a necessary step to guarantee proper probability measures in (12); algorithms can be found in [16] or [18, Section V-A]);

  • $\Pi,L_{\mathbb{I}}$ are as before.

To make $\mathbb{I}$ an abstraction of (8), we need the approximation to satisfy $\check{\Theta}_{ij}\leq\int_{q_j}\mathcal{T}(x,dy)\leq\hat{\Theta}_{ij}$ for all $x\in q_i$ and $i,j=1,\cdots,N$, as well as $\Theta_{N+1}=(0,0,\cdots,1)$. We further require that the partition respects the boundaries induced by the labelling function, i.e., for any $q\in Q$,

L_{\mathbb{I}}(q):=L(x),\;\forall x\in q.

Clearly, the above connections between the states and transition probabilities satisfy Definition 7.
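As a concrete illustration, the following sketch over/under-approximates the cell-to-cell transition probabilities for a one-dimensional Gaussian kernel by sampling the source cell on a grid; a sound implementation would use monotonicity or interval arithmetic instead of sampling, and the dynamics f, b and partition below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# 1-D illustration: bound T(x, q_j) = P[f(x) + b(x) w in q_j] over all x in q_i,
# yielding check_Theta[i, j] <= T(x, q_j) <= hat_Theta[i, j] for all x in q_i.
f = lambda x: 0.8 * x + 0.1
b = lambda x: 0.05 + 0.01 * x

cells = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]   # partition of W = [0, 1]
N = len(cells)
hat_Theta = np.zeros((N + 1, N + 1))     # last index is the out-of-domain state Delta
check_Theta = np.zeros((N + 1, N + 1))

def cell_prob(x, l, u):
    return norm.cdf(u, loc=f(x), scale=abs(b(x))) - norm.cdf(l, loc=f(x), scale=abs(b(x)))

for i, (li, ui) in enumerate(cells):
    xs = np.linspace(li, ui, 50)                       # grid over the source cell (approximation)
    for j, (lj, uj) in enumerate(cells):
        vals = [cell_prob(x, lj, uj) for x in xs]
        check_Theta[i, j], hat_Theta[i, j] = min(vals), max(vals)
    out = [1.0 - cell_prob(x, 0.0, 1.0) for x in xs]   # mass leaving W goes to Delta
    check_Theta[i, N], hat_Theta[i, N] = min(out), max(out)

hat_Theta[N, N] = check_Theta[N, N] = 1.0              # Delta is absorbing: Theta_{N+1} = (0,...,0,1)
print(np.round(hat_Theta, 3))
```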

The Markov system $\mathbb{I}$ is understood as a family of 'perturbed' Markov chains generated by the uncertain choice of $\Theta$ at each $t$. The $n$-step transition matrices are derived from $[\![\Theta]\!]$ as

\begin{split}[\![\Theta^{(2)}]\!]&=\{\Theta_{0}\Theta_{1}:\;\Theta_{0},\Theta_{1}\in[\![\Theta]\!]\},\\ &\cdots\\ [\![\Theta^{(n)}]\!]&=\{\Theta_{0}\Theta_{1}\cdots\Theta_{n-1}:\;\Theta_{i}\in[\![\Theta]\!],\;i=0,1,\cdots,n-1\}.\end{split}

Given an initial distribution $\mu_0\in\mathfrak{P}(Q)$, the marginal probability measures at each $t$ form the set

\mathfrak{P}(Q)\supseteq\mathscr{M}_{t}^{\mu_{0}}:=\{\mu_{t}=(\Theta^{(t)})^{T}\mu_{0}:\;\Theta^{(t)}\in[\![\Theta^{(t)}]\!]\}. (12)

If we do not emphasize the initial distribution $\mu_0$, we also write $\mathscr{M}_t$ for short.

We aim to show the soundness of robust IMC abstractions in this section. The proofs of the results in this section are given in Appendix 0.A.

3.1 Weak compactness of marginal space t\mathscr{M}_{t} of probabilities

The following lemma, rephrased from [39, Theorem 2], shows the structure of $\mathscr{M}_t$ for each $t\in\mathbb{N}$ and any initial distribution $\mu_0$.

Lemma 1

Let $\mathbb{I}$ be a Markov system of the form (10) derived from an IMC. Then the set $\mathscr{M}_t$ of all possible probability measures at each time $t\in\mathbb{N}$ is a convex polytope, and hence compact. The vertices of $\mathscr{M}_t$ are of the form

(V_{i_{t}})^{T}\cdots(V_{i_{2}})^{T}(V_{i_{1}})^{T}\mu_{0} (13)

for some vertices $V_{i_j}$ of $[\![\Theta]\!]$.

Example 1

Let $Q=(q_1,q_2,q_3)^T$ and $\mu_0=(1,0,0)^T$. The under/over-estimations of the transition matrices are given as

\check{\Theta}=\begin{bmatrix}\frac{1}{4}&0&\frac{1}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix},\;\;\;\hat{\Theta}=\begin{bmatrix}\frac{3}{4}&0&\frac{3}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix}.

Then $[\![\Theta]\!]$ forms a convex set of stochastic matrices with vertices

V_{1}=\begin{bmatrix}\frac{1}{4}&0&\frac{3}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix},\;\;\;V_{2}=\begin{bmatrix}\frac{3}{4}&0&\frac{1}{4}\\ 0&0&1\\ 0&1&0\end{bmatrix}.

Therefore, the vertices of $\mathscr{M}_1$ are

\nu_{1}^{(1)}=(V_{1})^{T}\mu_{0}=(\tfrac{1}{4},0,\tfrac{3}{4})^{T},\;\;\nu_{1}^{(2)}=(V_{2})^{T}\mu_{0}=(\tfrac{3}{4},0,\tfrac{1}{4})^{T}.

Hence, $\mathscr{M}_1=\{\mu:\mu=\alpha\nu_1^{(1)}+(1-\alpha)\nu_1^{(2)},\;\alpha\in[0,1]\}$. Similarly, the vertices of $\mathscr{M}_2$ are

\begin{split}&\nu_{2}^{(1)}=(V_{1})^{T}(V_{1})^{T}\mu_{0}=(\tfrac{1}{16},\tfrac{12}{16},\tfrac{3}{16})^{T},\;\;\nu_{2}^{(2)}=(V_{2})^{T}(V_{1})^{T}\mu_{0}=(\tfrac{3}{16},\tfrac{12}{16},\tfrac{1}{16})^{T},\\ &\nu_{2}^{(3)}=(V_{1})^{T}(V_{2})^{T}\mu_{0}=(\tfrac{3}{16},\tfrac{4}{16},\tfrac{9}{16})^{T},\;\;\nu_{2}^{(4)}=(V_{2})^{T}(V_{2})^{T}\mu_{0}=(\tfrac{9}{16},\tfrac{4}{16},\tfrac{3}{16})^{T},\end{split}

and

\mathscr{M}_{2}=\{\mu:\mu=\alpha\beta\nu_{2}^{(1)}+\alpha(1-\beta)\nu_{2}^{(2)}+\beta(1-\alpha)\nu_{2}^{(3)}+(1-\alpha)(1-\beta)\nu_{2}^{(4)},\;\alpha,\beta\in[0,1]\}.

The calculation of the remaining $\mathscr{M}_t$ follows the same procedure.
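The vertex computation in Example 1 can be checked numerically; the short sketch below reproduces the vertices of $\mathscr{M}_1$ and $\mathscr{M}_2$ via (13).

```python
import numpy as np

# Vertices of [[Theta]] from Example 1 and the initial distribution mu_0.
V1 = np.array([[0.25, 0.0, 0.75],
               [0.0,  0.0, 1.0 ],
               [0.0,  1.0, 0.0 ]])
V2 = np.array([[0.75, 0.0, 0.25],
               [0.0,  0.0, 1.0 ],
               [0.0,  1.0, 0.0 ]])
mu0 = np.array([1.0, 0.0, 0.0])

# Vertices of M_1: (V_i)^T mu_0.
print("M_1 vertices:", V1.T @ mu0, V2.T @ mu0)

# Vertices of M_2: (V_{i_2})^T (V_{i_1})^T mu_0, as in (13).
for Vi in (V1, V2):          # first-step vertex
    for Vj in (V1, V2):      # second-step vertex
        print("M_2 vertex:", Vj.T @ (Vi.T @ mu0))
```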

We now introduce the total variation distance $\|\cdot\|_{\text{TV}}$ and show how $(\mathscr{M}_t,\|\cdot\|_{\text{TV}})$ (at each $t$) induces the weak topology.

Definition 8 (Total variation distance)

Given two probability measures μ\mu and ν\nu on 𝒳\mathcal{X}, the total variation distance is defined as

μνTV=2supΓ(𝒳)|μ(Γ)ν(Γ)|.\left\|\mu-\nu\right\|_{\text{TV}}=2\sup_{\Gamma\in\mathscr{B}(\mathcal{X})}|\mu(\Gamma)-\nu(\Gamma)|. (14)

In particular, if 𝒳\mathcal{X} is a discrete space,

μνTV=μν1=q𝒳|μ(q)ν(q)|.\left\|\mu-\nu\right\|_{\text{TV}}=\|\mu-\nu\|_{1}=\sum_{q\in\mathcal{X}}|\mu(q)-\nu(q)|. (15)
Corollary 1

Let $\mathbb{I}$ be a Markov system of the form (10) derived from an IMC. Then at each time $t\in\mathbb{N}$, for each $\{\mu_n\}\subseteq\mathscr{M}_t$, there exist a $\mu\in\mathscr{M}_t$ and a subsequence $\{\mu_{n_k}\}$ such that $\mu_{n_k}\Rightarrow\mu$. In addition, for each $h\in C_b(\mathcal{X})$ and $t\in\mathbb{N}$, the set $H=\{\sum_{\mathcal{X}}h(x)\mu(x),\;\mu\in\mathscr{M}_t\}$ forms a convex and compact subset of $\mathbb{R}$.

Remark 6

The above shows that $\|\cdot\|_{\text{TV}}$ metrizes the weak topology of measures on $Q$. Note that since $Q$ is bounded and finite, any metrizable family of measures on $Q$ is compact. For example, let $Q=\{q_1,q_2\}$ and let $\{(0,1)^T,(1,0)^T\}$ be a set of singular measures on $Q$. Then every sequence from this set has a weakly convergent subsequence. However, these measures do not have a convex structure as $\mathscr{M}_t$ does. Hence the corresponding $H$ generated by $\{(0,1)^T,(1,0)^T\}$ only provides vertices in $\mathbb{Z}$.

3.2 Weak compactness of probability laws of 𝕀\mathbb{I} on infinite horizon

In this subsection, we focus on the case where $I_0=q_0$ a.s. for some $q_0\in Q\setminus\{\Delta\}$. The case of an arbitrary initial distribution is similar. We denote by $\mathscr{M}^{q_0}:=\{\mathbf{P}_I^{q_0}\}_{I\in\mathbb{I}}$ the set of probability laws of the discrete-state Markov processes $I\in\mathbb{I}$ with initial state $q_0\in Q$, and by $\mathscr{M}_t^{q_0}$ the set of marginals at time $t$.

Proposition 1

For any $q_0\in Q$, every sequence $\{\mathbf{P}_n^{q_0}\}_{n=0}^{\infty}$ in $\mathscr{M}^{q_0}$ has a weakly convergent subsequence.

Remark 7

This property is an extension of the marginal weak compactness, relying on the (countable) product topology. We can also introduce suitable product metrics to metrize it; see, e.g., [28]. Similar results hold, under certain conditions, for continuous-time processes on continuous state spaces with uniform norms [27, Lemmas 82.3 and 87.3].

Theorem 2

Let $\mathbb{I}$ be a Markov system of the form (10) derived from an IMC. Then for any LTL formula $\Psi$, the set $S^{q_0}=\{\mathbf{P}_I^{q_0}(I\vDash\Psi)\}_{I\in\mathbb{I}}$ is a convex and compact subset of $\mathbb{R}$, i.e., a compact interval.

3.3 Soundness of IMC abstractions

Proposition 2

Let $\mathbb{X}$ be a Markov system driven by (8). Then every sequence $\{\mathcal{P}_n^{x_0}\}_{n=0}^{\infty}$ in $\{\mathcal{P}_X^{x_0}\}_{X\in\mathbb{X}}$ has a weakly convergent subsequence. Consequently, for any LTL formula $\Psi$, the set $\{\mathcal{P}_X^{x_0}(X\vDash\Psi)\}_{X\in\mathbb{X}}$ is a compact subset of $\mathbb{R}$.

Lemma 2

Let $X\in\mathbb{X}$ be any Markov process driven by (8) and let $\mathbb{I}$ be the finite-state IMC abstraction of $\mathbb{X}$. Suppose the initial distribution $\nu_0$ of $X$ is such that $\nu_0(q_0)=1$. Then there exists a unique law $\mathbf{P}_I^{q_0}$ of some $I\in\mathbb{I}$ such that, for any LTL formula $\Psi$,

\mathcal{P}_{X}^{\nu_{0}}(X\vDash\Psi)=\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi).
Theorem 3

Assume the setting of Lemma 2. For any LTL formula $\Psi$, we have

\mathcal{P}_{X}^{\nu_{0}}(X\vDash\Psi)\in\{\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi)\}_{I\in\mathbb{I}}.

Proof  The conclusion follows by combining Lemma 2 and Theorem 2.  

Corollary 2

Let $\mathbb{X}$, its IMC abstraction $\mathbb{I}$, an LTL formula $\Psi$, and a constant $\rho\in[0,1]$ be given. If $I\vDash\mathbf{P}^{q_0}_{\bowtie\rho}[\Psi]$ for all $I\in\mathbb{I}$, then $X\vDash\mathcal{P}^{\nu_0}_{\bowtie\rho}[\Psi]$ for all $X\in\mathbb{X}$ with $\nu_0(q_0)=1$.

Remark 8

Note that we do not have $\mathcal{P}_X^{\nu_0}\in\{\mathbf{P}_I^{q_0}\}_{I\in\mathbb{I}}$, since each $\mathbf{P}_I^{q_0}$ is a discrete measure whereas $\mathcal{P}_X^{\nu_0}$ is not; they only coincide when measuring Borel subsets in $\mathbf{F}$. It would be more accurate to state that $\mathcal{P}_X^{\nu_0}(X\vDash\Psi)$ is a member of $\{\mathbf{P}_I^{q_0}(I\vDash\Psi)\}_{I\in\mathbb{I}}$ rather than saying that "the true distribution (the law, as we usually call it) of the original system is a member of the distribution set represented by the abstraction model" [18].

Remark 9

We have seen, in view of Lemma 2, that the 'post-transition' measures are automatically related based only on the relations between transition probabilities. We will see in the next section that such relations can be constructed to guarantee approximate completeness of $\mathbb{I}$.

Proposition 3

Let $\varepsilon:=\max_i\|\hat{\Theta}_i-\check{\Theta}_i\|_{\text{TV}}$. Then, for each LTL formula $\Psi$, the length $\lambda(S^{q_0})\rightarrow 0$ as $\varepsilon\rightarrow 0$.

Remark 10

By Lemma 2, for each $X\in\mathbb{X}$ there exists exactly one $\mathbf{P}_I$ of some $I\in\mathbb{I}$ whose satisfaction probability equals that of $X$. The precision of $\hat{\Theta}$ and $\check{\Theta}$ determines the size of $S^{q_0}$. Once we are able to calculate the exact law of $X$, the set $S^{q_0}$ becomes a singleton by Proposition 3. For example, if each $w_t$ becomes $\delta_0$, then each $\mathscr{M}_t$ automatically reduces to a singleton $\{\delta_{f(x_t)}\}$. The verification problem becomes checking whether $L(f(x_t))\vDash\Psi$ given the partition $Q$, and the probability of satisfaction is either $0$ or $1$. Another example is $X_{t+1}=AX_t+Bw_t$, where $A,B$ are constant matrices: we know the exact law of this system, and there is no need to introduce an IMC approximation in the first place. IMC abstractions prove more useful when coping with systems whose marginal distributions are uncertain or not readily computable.

4 Robust Completeness of IMC Abstractions

In this section, we are given a Markov system $\mathbb{X}_1$ driven by (7) with point-mass perturbations of intensity $\vartheta_1\geq 0$. Based on $\mathbb{X}_1$, we first construct an IMC abstraction $\mathbb{I}$. We then show that $\mathbb{I}$ can be abstracted by a system $\mathbb{X}_2$ with more general $\mathcal{L}_1$-bounded noise of any intensity $\vartheta_2>\vartheta_1$.

Recalling the soundness analysis of IMC abstractions in Section 3, the relation between satisfaction probabilities is induced by a relation between the continuous and discrete transitions. To capture the probabilistic properties of stochastic processes, the reachable set of probability measures is the analogue of the reachable set in the deterministic case. We rely on a similar technique in this section to discuss how transition probabilities of different uncertain Markov systems are related. To metrize sets of Gaussian measures and to connect them with discrete measures, we use the Wasserstein metric.

Definition 9

Let $\mu,\nu\in\mathfrak{P}(\mathcal{X})$ for $(\mathcal{X},|\cdot|)$. The Wasserstein distance (formally, the $1^{\text{st}}$-Wasserstein metric, which we choose for its convexity and the nice properties of its test functions) is defined by $\|\mu-\nu\|_{\text{W}}=\inf\mathcal{E}|X-Y|$, where the infimum is taken over all joint distributions of the random variables $X$ and $Y$ with marginals $\mu$ and $\nu$, respectively. We frequently use the following dual form of the definition, where $\operatorname{Lip}(h)$ denotes the Lipschitz constant of $h$, i.e., $|h(x_2)-h(x_1)|\leq\operatorname{Lip}(h)|x_2-x_1|$:

\left\|\mu-\nu\right\|_{\text{W}}:=\sup\left\{\left|\int_{\mathcal{X}}h(x)d\mu(x)-\int_{\mathcal{X}}h(x)d\nu(x)\right|,\;h\in C(\mathcal{X}),\operatorname{Lip}(h)\leq 1\right\}.

The discrete case, $\|\cdot\|_{\text{W}}^d$, simply replaces the integrals by summations. Let $\mathbb{B}_W=\{\mu\in\mathfrak{P}(\mathcal{X}):\|\mu-\delta_0\|_{\text{W}}\leq 1\}$. Given a set $\mathfrak{G}\subseteq\mathfrak{P}(\mathcal{X})$, we denote by $\|\mu\|_{\mathfrak{G}}=\inf_{\nu\in\mathfrak{G}}\|\mu-\nu\|_{\text{W}}$ the distance from $\mu$ to $\mathfrak{G}$, and by $\mathfrak{G}+r\mathbb{B}_W:=\{\mu:\;\|\mu\|_{\mathfrak{G}}\leq r\}$ the $r$-neighborhood of $\mathfrak{G}$ (this is valid by definition).

Remark 11

Note that $\mathbb{B}_W$ is dual to $\mathbb{B}_1$: for any $\mu\in\mathbb{B}_W$, the associated random variable $X$ satisfies $\mathcal{E}|X|\leq 1$, and vice versa.

The following well-known result estimates the Wasserstein distance between two Gaussians.

Proposition 4

Let $\mu\sim\mathcal{N}(m_1,\Sigma_1)$ and $\nu\sim\mathcal{N}(m_2,\Sigma_2)$ be two Gaussian measures on $\mathbb{R}^n$. Then

|m_{1}-m_{2}|\leq\left\|\mu-\nu\right\|_{\text{W}}\leq\left(\|m_{1}-m_{2}\|_{2}^{2}+\|\Sigma_{1}^{1/2}-\Sigma_{2}^{1/2}\|_{F}^{2}\right)^{1/2}, (16)

where $\|\cdot\|_F$ is the Frobenius norm.
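The bounds in (16) are straightforward to evaluate numerically. The following is a minimal sketch using scipy.linalg.sqrtm; the means and covariances are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_W_bounds(m1, S1, m2, S2):
    """Lower/upper bounds on ||N(m1,S1) - N(m2,S2)||_W from Proposition 4 / (16)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    lower = np.max(np.abs(m1 - m2))                       # |m1 - m2| in the infinity norm
    frob = np.linalg.norm(sqrtm(S1) - sqrtm(S2), "fro")   # ||S1^{1/2} - S2^{1/2}||_F
    upper = np.sqrt(np.linalg.norm(m1 - m2) ** 2 + frob ** 2)
    return lower, upper

# Illustrative 2-D example.
m1, S1 = [0.0, 0.0], np.eye(2)
m2, S2 = [0.1, 0.2], 1.1 * np.eye(2)
print(gaussian_W_bounds(m1, S1, m2, S2))
```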

On finite state spaces, the total variation and Wasserstein distances are equivalent [13, Theorem 4]. We only state the following one-sided inequality, which is used in the later proofs.

Proposition 5

For any $\mu,\nu$ on a discrete and finite space $Q$, we have

\left\|\mu-\nu\right\|_{\text{W}}^{d}\leq\operatorname{diam}(Q)\cdot\left\|\mu-\nu\right\|_{\text{TV}}. (17)
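A quick numerical illustration of (17) on a small one-dimensional discrete space, using scipy.stats.wasserstein_distance for the discrete Wasserstein distance; the node coordinates and measures below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Discrete space Q embedded on the real line (illustrative coordinates).
nodes = np.array([0.0, 0.5, 1.0])
mu = np.array([0.25, 0.0, 0.75])
nu = np.array([0.75, 0.0, 0.25])

W = wasserstein_distance(nodes, nodes, u_weights=mu, v_weights=nu)
TV = np.sum(np.abs(mu - nu))                 # total variation as in (15)
diam = nodes.max() - nodes.min()

print(f"W = {W:.3f},  diam(Q) * TV = {diam * TV:.3f}")   # (17): W <= diam(Q) * TV
assert W <= diam * TV + 1e-12
```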

Before proceeding, we define the set of transition probabilities of $\mathbb{X}_i$ from any box $[x]\subseteq\mathbb{R}^n$ as

\mathbb{T}_{i}([x])=\{\mathcal{T}(x,\cdot):\;\mathcal{T}\in[\![\mathcal{T}]\!]_{i},\;x\in[x]\},\;\;i=1,2,

and use the following lemma to approximate $\mathbb{T}_1([x])$.

Lemma 3

Fix any $\varepsilon>0$ and any box $[x]\subseteq\mathbb{R}^n$. For all $\kappa>0$, there exists a finitely terminating algorithm to compute an over-approximation of the set of (Gaussian) transition probabilities from $[x]$ such that

\mathbb{T}_{1}([x])\subseteq\widehat{\mathbb{T}_{1}([x])}\subseteq\mathbb{T}_{1}([x])+\kappa\mathbb{B}_{W},

where $\widehat{\mathbb{T}_1([x])}$ is the computed over-approximating set of Gaussian measures.

Remark 12

The proof is given in Appendix 0.B. The lemma yields inclusions with a slightly enlarged Wasserstein radius so that no information about the covariances is lost.

Definition 10

For $i=1,2$, we introduce the modified transition probabilities for $\mathbb{X}_i=(\mathcal{X},[\![\mathcal{T}]\!]_i,x_0,\Pi,L)$ based on (9). For all $\mathcal{T}_i\in[\![\mathcal{T}]\!]_i$, let

\tilde{\mathcal{T}}_{i}(x,\Gamma)=\left\{\begin{array}{ll}\mathcal{T}_{i}(x,\Gamma),&\forall\Gamma\subseteq\mathcal{W},\;\forall x\in\mathcal{W},\\ \mathcal{T}_{i}(x,\mathcal{W}^{c}),&\Gamma=\partial\mathcal{W},\;\forall x\in\mathcal{W},\\ 1,&\Gamma=\partial\mathcal{W},\;x\in\partial\mathcal{W}.\end{array}\right. (18)

Correspondingly, let $\tilde{[\![\mathcal{T}]\!]}$ denote the collection. Likewise, we use $\tilde{(\cdot)}$ to denote the induced quantities of any other type w.r.t. such a modification.

Remark 13

We introduce this concept only for analysis. The above modification does not affect the law of the stopped processes, since we do not care about the 'out-of-domain' transitions. We use a weighted point mass to represent the measure at the boundary, and the mean remains the same. It can easily be shown that the Wasserstein distance between any two measures in $\tilde{[\![\mathcal{T}]\!]}(x,\cdot)$ is upper bounded by that of the non-modified ones.

Theorem 4

For any $0\leq\vartheta_1<\vartheta_2$, set $\mathbb{X}_i=(\mathcal{X},\tilde{[\![\mathcal{T}]\!]}_i,x_0,\Pi,L)$, $i=1,2$, where $\mathbb{X}_1$ is perturbed by point masses with intensity $\vartheta_1$ and $\mathbb{X}_2$ is perturbed by a general $\mathcal{L}_1$-perturbation with intensity $\vartheta_2$. Then, under Assumption 2, there exist a rectangular partition $Q$ (with state-level relation $\alpha\subseteq\mathcal{X}\times Q$), a measure-level relation $\gamma_{\alpha}$, and a collection of transition matrices $[\![\Theta]\!]$ such that the system $\mathbb{I}=(Q,[\![\Theta]\!],q_0,\Pi,L)$ abstracts $\mathbb{X}_1$ and is abstracted by $\mathbb{X}_2$ in the following sense:

\mathbb{X}_{1}\preceq_{\gamma_{\alpha}}\mathbb{I},\;\;\mathbb{I}\preceq_{\gamma_{\alpha}^{-1}}\mathbb{X}_{2}. (19)

Proof  We construct a finite-state IMC with partition $Q$ and an inclusion of transition matrices $[\![\Theta]\!]$ as follows. By Assumption 2, we use a uniform rectangular partition of $\mathcal{W}$ and set $\alpha=\{(x,q):q=\eta\lfloor\frac{x}{\eta}\rfloor\}\cup\{(\Delta,\Delta)\}$, where $\lfloor\cdot\rfloor$ is the floor function and $\eta$ is to be chosen later. Denote the number of discrete nodes by $N+1$.

Note that any family of (modified) Gaussian measures $\tilde{[\![\mathcal{T}]\!]}_1$ is induced from $[\![\mathcal{T}]\!]_1$ and contains its information. For any $\tilde{\mathcal{T}}\in\tilde{[\![\mathcal{T}]\!]}_1$ and $q\in Q$:

  (i) for all $\tilde{\nu}\sim\tilde{\mathcal{N}}(m,s^2)\in\tilde{\mathbb{T}}_1(\alpha^{-1}(q),\cdot)$, store $\{(m_l,\Sigma_l)=(\eta\lfloor\frac{m}{\eta}\rfloor,\eta^2\lfloor\frac{s^2}{\eta^2}\rfloor)\}_l$;

  (ii) for each $l$, define $\tilde{\nu}_l^{\text{ref}}\sim\tilde{\mathcal{N}}(m_l,\Sigma_l)$ (implicitly, we need to compute $\nu_l^{\text{ref}}(\Delta)$); compute $\tilde{\nu}_l^{\text{ref}}(\alpha^{-1}(q_j))$ for each $q_j\in Q\setminus\Delta$;

  (iii) for each $l$, define $\mu_l^{\text{ref}}=[\tilde{\nu}_l^{\text{ref}}(\alpha^{-1}(q_1)),\cdots,\tilde{\nu}_l^{\text{ref}}(\alpha^{-1}(q_N)),\tilde{\nu}_l^{\text{ref}}(\Delta)]$;

  (iv) compute $\textbf{ws}:=(\sqrt{2N}+2)\eta$ and $\textbf{tv}:=N\eta\cdot\textbf{ws}$;

  (v) construct $[\![\mu]\!]=\bigcup_l\{\mu:\left\|\mu-\mu_l^{\text{ref}}\right\|_{\text{TV}}\leq\textbf{tv}(\eta),\;\mu(\Delta)+\sum_j^N\mu(q_j)=1\}$;

  (vi) let $\gamma_{\alpha}=\{(\tilde{\nu},\mu),\;\mu\in[\![\mu]\!]\}$ be a relation between $\tilde{\nu}\in\tilde{\mathbb{T}}(\alpha^{-1}(q))$ and the generated $[\![\mu]\!]$.

Repeating the above steps for all $q$, the relation $\gamma_{\alpha}$ is obtained. The rest of the proof proceeds in the following steps. For $i\leq N$, we simply denote $\mathfrak{G}_i:=\tilde{\mathbb{T}}_1(\alpha^{-1}(q_i))$ and $\hat{\mathfrak{G}}_i:=\widehat{\tilde{\mathbb{T}}_1(\alpha^{-1}(q_i))}$.
Claim 1: For $i\leq N$, let $[\![\Theta_i]\!]=\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$. Then the finite-state IMC $\mathbb{I}$ with transition collection $[\![\Theta]\!]$ abstracts $\mathbb{X}_1$.

Indeed, for each $i=1,\cdots,N$ and each $\tilde{\mathcal{T}}$, we have $\gamma_{\alpha}(\mathfrak{G}_i)\subseteq\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$. Pick any modified Gaussian $\tilde{\nu}\in\hat{\mathfrak{G}}_i$; there exists a $\tilde{\nu}^{\text{ref}}$ such that (by Proposition 4) $\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}\leq\|\nu-\nu^{\text{ref}}\|_{\text{W}}\leq\sqrt{2N}\eta$. We aim to find all discrete measures $\mu$ induced from $\tilde{\nu}$ (such that their probabilities match on the discrete nodes, as required by Definition 7). All such $\mu$ satisfy (note that we also have $\|\mu-\mu^{\text{ref}}\|_{\text{W}}^{d}\leq\|\mu-\tilde{\nu}\|_{\text{W}}^{d}+\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}^{d}+\|\tilde{\nu}^{\text{ref}}-\mu^{\text{ref}}\|_{\text{W}}^{d}=\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}^{d}$, but it is hard to connect $\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}^{d}$ with $\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\|_{\text{W}}$ for general measures; this connection can be made only when comparing Dirac or discrete measures):

\begin{split}\left\|\mu-\mu^{\text{ref}}\right\|_{\text{W}}^{d}&=\left\|\mu-\mu^{\text{ref}}\right\|_{\text{W}}\\ &\leq\left\|\mu-\tilde{\nu}\right\|_{\text{W}}+\left\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\right\|_{\text{W}}+\left\|\tilde{\nu}^{\text{ref}}-\mu^{\text{ref}}\right\|_{\text{W}}\\ &\leq(2+\sqrt{2N})\eta,\end{split} (20)

where the first term in the second line is bounded by

\begin{split}\left\|\mu-\tilde{\nu}\right\|_{\text{W}}&=\sup_{h\in C(\mathcal{X}),\operatorname{Lip}(h)\leq 1}\left|\int_{\mathcal{X}}h(x)d\mu(x)-\int_{\mathcal{X}}h(x)d\tilde{\nu}(x)\right|\\ &\leq\sup_{h\in C(\mathcal{X}),\operatorname{Lip}(h)\leq 1}\sum_{j=1}^{n}\int_{\alpha^{-1}(q_{j})}|h(x)-h(q_{j})|d\tilde{\nu}(x)\\ &\leq\eta\sum_{j=1}^{n}\int_{\alpha^{-1}(q_{j})}d\tilde{\nu}(x)\leq\eta,\end{split} (21)

and the third term in the second line is bounded in a similar way. By steps (v)-(vi) and Proposition 5, all possible discrete measures $\mu$ induced from $\tilde{\nu}$ are included in $\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$. Combining the above, for any $\tilde{\nu}\in\mathfrak{G}_i$, and hence in $\hat{\mathfrak{G}}_i$, there exists a discrete measure $\Theta_i\in\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$ such that $\tilde{\nu}(\alpha^{-1}(q_j))=\Theta_{ij}$ for all $q_j$. This satisfies the definition of abstraction.

Claim 2: $\gamma_{\alpha}^{-1}(\gamma_{\alpha}(\mathfrak{G}_i))\subseteq\mathfrak{G}_i+(2\eta+N\eta\cdot\textbf{tv}(\eta))\cdot\mathbb{B}_W$. This amounts to recovering all possible (modified) measures $\tilde{\nu}$ from the constructed $\gamma_{\alpha}(\mathfrak{G}_i)$ such that their discrete probabilities coincide. Note that the 'ref' information is recorded when computing $\gamma_{\alpha}(\mathfrak{G}_i)$ in the inner parentheses. Therefore, for any $\mu\in\gamma_{\alpha}(\mathfrak{G}_i)$ there exists a $\mu^{\text{ref}}$ within a total variation radius $\textbf{tv}(\eta)$. We aim to find the corresponding measures $\tilde{\nu}$ that match $\mu$ in their probabilities on the discrete nodes. All such $\tilde{\nu}$ satisfy

\begin{split}\left\|\tilde{\nu}-\tilde{\nu}^{\text{ref}}\right\|_{\text{W}}&\leq\left\|\tilde{\nu}-\mu\right\|_{\text{W}}+\left\|\mu-\mu^{\text{ref}}\right\|_{\text{W}}^{d}+\left\|\mu^{\text{ref}}-\tilde{\nu}^{\text{ref}}\right\|_{\text{W}}\\ &\leq 2\eta+N\eta\cdot\textbf{tv}(\eta),\end{split} (22)

where the bounds for the first and third terms are obtained in the same way as (21), and the second term again follows from the rough comparison in Proposition 5. Note that $\tilde{\nu}^{\text{ref}}$ is already recorded in $\mathfrak{G}_i$. The inequality (22) provides an upper bound on the Wasserstein deviation between any possible satisfactory measure and some $\tilde{\nu}^{\text{ref}}\in\mathfrak{G}_i$.

Claim 3: If we choose $\eta$ and $\kappa$ sufficiently small such that

2\eta+N\eta\cdot\textbf{tv}(\eta)+\kappa\leq\vartheta_{2}-\vartheta_{1}, (23)

then $\mathbb{I}\preceq_{\gamma_{\alpha}^{-1}}\mathbb{X}_2$.

Indeed, $[\![\Theta]\!]$ is obtained by $\gamma_{\alpha}(\hat{\mathfrak{G}}_i)$ for each $i$. By Claim 2 and Lemma 3, we have

\gamma_{\alpha}^{-1}(\gamma_{\alpha}(\hat{\mathfrak{G}}_{i}))\subseteq\hat{\mathfrak{G}}_{i}+(2\eta+N\eta\cdot\textbf{tv}(\eta))\cdot\mathbb{B}_{W}\subseteq\mathfrak{G}_{i}+(2\eta+N\eta\cdot\textbf{tv}(\eta)+\kappa)\cdot\mathbb{B}_{W}

for each $i$. By the construction, we can verify that $\tilde{\mathbb{T}}_2(\alpha^{-1}(q_i))=\mathfrak{G}_i+(\vartheta_2-\vartheta_1)\cdot\mathbb{B}_W$. The selection of $\eta$ makes $\gamma_{\alpha}^{-1}(\gamma_{\alpha}(\hat{\mathfrak{G}}_i))\subseteq\tilde{\mathbb{T}}_2(\alpha^{-1}(q_i))$, which completes the proof.  

Remark 14

The relation $\gamma_{\alpha}$ (resp. $\gamma_{\alpha}^{-1}$) provides a procedure for including all proper (continuous, discrete) measures that are consistent with the discrete probabilities. The key point is to record $\tilde{\nu}^{\text{ref}}$, $\mu^{\text{ref}}$, and the corresponding radii; these are nothing but finite coverings of the space of measures. This also explains why we use 'finite-state' rather than 'finite' abstraction: the latter would suggest using a finite number of representative measures as the abstraction.

To guarantee a sufficient inclusion, conservative estimates are made, e.g., the bound $\sqrt{2N}\eta$ in Claim 1 and the bound in Proposition 5. These estimates can be made more accurate given additional assumptions. For example, deterministic systems (where $w$ becomes $\delta$) have Dirac transition measures, so $\|\mu-\mu^{\text{ref}}\|_{\text{W}}^{d}=0$ and hence the second term in (22) vanishes.
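To make the covering procedure in the proof of Theorem 4 more concrete, the following is a minimal one-dimensional sketch of steps (i)-(v) for a single cell; the dynamics f, b, the partition, and the variance guard are illustrative assumptions and not the paper's algorithm verbatim.

```python
import numpy as np
from scipy.stats import norm

# 1-D sketch of steps (i)-(v) for one cell q_i of a rectangular partition of W = [0, 1].
f = lambda x: 0.8 * x + 0.1
b = lambda x: 0.05 + 0.01 * x

eta = 0.25
cells = [(k * eta, (k + 1) * eta) for k in range(4)]
N = len(cells)

# (i) snap means/variances of the Gaussians T(x, .), x in q_i, onto the eta-grid
xs = np.linspace(*cells[1], 20)                          # source cell q_2 (illustrative)
params = {(eta * np.floor(f(x) / eta), eta**2 * np.floor(b(x)**2 / eta**2)) for x in xs}

# (ii)-(iii) reference discrete measures mu_l^ref over (q_1, ..., q_N, Delta)
refs = []
for m, s2 in params:
    s = max(np.sqrt(s2), 1e-6)                           # guard against a zero snapped variance
    p = [norm.cdf(u, m, s) - norm.cdf(l, m, s) for (l, u) in cells]
    refs.append(np.array(p + [1.0 - sum(p)]))            # last entry is the mass on Delta

# (iv) Wasserstein and total-variation radii ws(eta), tv(eta)
ws = (np.sqrt(2 * N) + 2) * eta
tv = N * eta * ws
print(f"{len(refs)} reference measures, tv(eta) = {tv:.3f}")
# (v) [[mu]] is the union of TV-balls of radius tv(eta) around the refs (not enumerated here).
```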

Remark 15

Note that, to guarantee the second abstraction based on γα1\gamma_{\alpha}^{-1}, we search all possible measures that has the same discrete probabilities as μγα(𝔊^i)\mu\in\gamma_{\alpha}(\hat{\mathfrak{G}}_{i}), not only those Gaussians with the same covariances as 𝔊i\mathfrak{G}_{i} (or 𝔊^i\hat{\mathfrak{G}}_{i}). Such a set of measures provide a convex ball w.r.t. Wasserstein distance. This actually makes sense because in the forward step of creating 𝕀\mathbb{I}, we have used both Wasserstein and total variation distance to find a convex inclusion of all Gaussian or Gaussian related measures. There ought to be some measures that are ‘non-recoverable’ to Gaussians, unless we extract some ‘Gaussian recoverable’ discrete measures in [[Θi]]\left[\![\Theta_{i}\right]\!], but this loses the point of over-approximation. In this view, IMC abstractions provide unnecessarily larger inclusions than needed.

For the deterministic case, the above-mentioned ‘extraction’ is possible: since the transition measures have no diffusion, the convex inclusion reduces to the collection of vertices themselves (also see Remark 6). Based on these vertices, we are able to use γα\gamma_{\alpha} to find the δ\delta measures within a convex ball w.r.t. the Wasserstein distance.

In contrast to the above special case [21], where the uncertainties are bounded w.r.t. the infinity norm, we can only guarantee approximate completeness via a robust 1\mathcal{L}_{1}-bounded perturbation with strictly larger intensity than the original point-mass perturbation. However, this indeed describes a general type of uncertainty for stochastic systems when 1\mathcal{L}_{1}-related properties, including probabilistic properties, are to be guaranteed. Unless higher-moment specifications are of interest, uncertain 1\mathcal{L}_{1} random variables are the natural analogue of the perturbations in [21].

Corollary 3

Given an LTL formula Ψ\Psi, let Siν0={𝒫Xν0(XΨ)}X𝕏iS_{i}^{\nu_{0}}=\{\mathcal{P}_{X}^{\nu_{0}}(X\vDash\Psi)\}_{X\in\mathbb{X}_{i}} (i=1,2i=1,2) and S𝕀q0={𝐏Iq0(IΨ)}I𝕀S_{\mathbb{I}}^{q_{0}}=\{\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi)\}_{I\in\mathbb{I}}, where the initial conditions are such that ν0(α1(q0))=1\nu_{0}(\alpha^{-1}(q_{0}))=1. Then all the above sets are compact and S1ν0S𝕀q0S2ν0S_{1}^{\nu_{0}}\subseteq S_{\mathbb{I}}^{q_{0}}\subseteq S_{2}^{\nu_{0}}.

The proof is shown in Appendix 0.B.
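
Corollary 3 concerns the full sets of satisfaction probabilities. For one representative fragment, bounded reachability, the endpoints of the IMC's interval of probabilities can be computed by interval value iteration over the transition bounds, under the common semantics in which a feasible transition matrix may be re-chosen at each step. The sketch below does this for a hypothetical three-state IMC; the matrices lo and up, the target labelling, and the horizon are illustrative assumptions rather than objects from the paper.

```python
import numpy as np

def extreme_dist(lo_row, up_row, values, maximize):
    """Feasible row in [lo_row, up_row] (summing to 1) that extremizes values @ p."""
    order = np.argsort(values)
    if maximize:
        order = order[::-1]
    p = lo_row.copy()
    remaining = 1.0 - p.sum()
    for j in order:
        add = min(up_row[j] - lo_row[j], remaining)
        p[j] += add
        remaining -= add
    return p

def reach_bounds(lo, up, target, horizon):
    """Lower/upper bounds on P(reach target within `horizon`) over the IMC's chains."""
    n = lo.shape[0]
    v_lo = target.astype(float)
    v_up = target.astype(float)
    for _ in range(horizon):
        new_lo, new_up = v_lo.copy(), v_up.copy()
        for i in range(n):
            if target[i]:
                continue
            new_lo[i] = extreme_dist(lo[i], up[i], v_lo, maximize=False) @ v_lo
            new_up[i] = extreme_dist(lo[i], up[i], v_up, maximize=True) @ v_up
        v_lo, v_up = new_lo, new_up
    return v_lo, v_up

if __name__ == "__main__":
    # Hypothetical 3-state IMC; state 2 is the (absorbing) target cell.
    lo = np.array([[0.1, 0.3, 0.2],
                   [0.2, 0.2, 0.3],
                   [0.0, 0.0, 1.0]])
    up = np.array([[0.5, 0.6, 0.4],
                   [0.4, 0.5, 0.6],
                   [0.0, 0.0, 1.0]])
    target = np.array([False, False, True])
    v_lo, v_up = reach_bounds(lo, up, target, horizon=25)
    print("lower bounds per initial state:", np.round(v_lo, 4))
    print("upper bounds per initial state:", np.round(v_up, 4))
```

The greedy assignment inside extreme_dist pushes as much admissible mass as possible toward the smallest (resp. largest) value entries, which is the standard way to extremize a linear functional over an interval-constrained stochastic row.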

5 Conclusion

In this paper, we constructed an IMC abstraction for continuous-state stochastic systems with possibly bounded point-mass (Dirac) perturbations. We showed that such abstractions are not only sound, in the sense that the set of satisfaction probabilities of linear-time properties contains that of the original system, but also approximately complete, in the sense that the constructed IMC can itself be abstracted by another system with stronger but more general 1\mathcal{L}_{1}-bounded perturbations. Consequently, the winning set of the probabilistic specifications for the more perturbed continuous-state stochastic system contains that of the less (Dirac-)perturbed system. As with most existing converse theorems, e.g. converse Lyapunov theorems, the purpose is not to provide an efficient approach for constructing such abstractions, but rather to characterize the theoretical possibility of their existence.

It is interesting to compare with robust deterministic systems, where no random variables are involved. In [21], both perturbed systems are subject to bounded point masses: more heavily perturbed systems abstract less perturbed ones and hence preserve robust satisfaction of linear-time properties. However, when we seek approximate completeness via uncertainties in stochastic systems, the uncertainties should be modelled by more general 1\mathcal{L}_{1} random variables. Since the probabilistic properties of random variables are dual to the weak topology of measures, we study the measures, and hence the probability laws of the processes, rather than the state space per se; the state-space topology is not sufficient to quantify the regularity of IMC abstractions. In this sense, 1\mathcal{L}_{1} uncertain random variables are the natural analogue of the uncertain point masses (in |||\cdot|) for deterministic systems. If we insist on using point masses as the only type of uncertainty for stochastic systems, IMC-type abstractions may fail to guarantee completeness. For example, if the point-mass perturbations represent reduced precision of deterministic control inputs [23, Definition 2.3], the winning set determined by the ϑ2\vartheta_{2}-precision stationary policies is not sufficient to cover that of the IMC abstraction, and hence an approximate bisimilarity of IMCs, as in [21], cannot be ensured.

For future work, it would be useful to extend the current approach to robust stochastic control systems. It would be interesting to design algorithms to construct IMC (resp. bounded-parameter Markov decision process) abstractions for more general robust stochastic (resp. control) systems with 1\mathcal{L}_{1} perturbations based on the metrizable space of measures and the weak topology. The size of the state discretization can be refined given more specific assumptions on the system dynamics and the linear-time objectives. For verification or control synthesis w.r.t. probabilistic safety or reachability problems, comparisons can be made with stochastic Lyapunov-barrier function approaches.

References

  • [1] Abate, A., D’Innocenzo, A., Di Benedetto, M.D., Sastry, S.S.: Markov set-chains as abstractions of stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 1–15. Springer (2008)
  • [2] Abate, A., Katoen, J.P., Mereacre, A.: Quantitative automata model checking of autonomous stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 83–92 (2011)
  • [3] Abate, A., Prandini, M., Lygeros, J., Sastry, S.: Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica 44(11), 2724–2734 (2008)
  • [4] Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
  • [5] Belta, C., Yordanov, B., Gol, E.A.: Formal Methods for Discrete-time Dynamical Systems, vol. 89. Springer (2017)
  • [6] Bustan, D., Rubin, S., Vardi, M.Y.: Verifying ω\omega-regular properties of Markov chains. In: International Conference on Computer Aided Verification. pp. 189–201. Springer (2004)
  • [7] Cauchi, N., Laurenti, L., Lahijanian, M., Abate, A., Kwiatkowska, M., Cardelli, L.: Efficiency through uncertainty: Scalable formal synthesis for stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 240–251 (2019)
  • [8] Da Prato, G., Zabczyk, J.: Stochastic equations in infinite dimensions. Cambridge University Press (2014)
  • [9] Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A storm is coming: A modern probabilistic model checker. In: International Conference on Computer Aided Verification. pp. 592–600. Springer (2017)
  • [10] Delimpaltadakis, G., Laurenti, L., Mazo Jr, M.: Abstracting the sampling behaviour of stochastic linear periodic event-triggered control systems. arXiv preprint arXiv:2103.13839 (2021)
  • [11] Dutreix, M., Coogan, S.: Specification-guided verification and abstraction refinement of mixed monotone stochastic systems. IEEE Transactions on Automatic Control (2020)
  • [12] Dutreix, M.D.H.: Verification and synthesis for stochastic systems with temporal logic specifications. Ph.D. thesis, Georgia Institute of Technology (2020)
  • [13] Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. International Statistical Review 70(3), 419–435 (2002)
  • [14] Girard, A., Pola, G., Tabuada, P.: Approximately bisimilar symbolic models for incrementally stable switched systems. IEEE Transactions on Automatic Control 55(1), 116–126 (2009)
  • [15] Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artificial Intelligence 122(1-2), 71–109 (2000)
  • [16] Hartfiel, D.J.: Markov Set-Chains. Springer (2006)
  • [17] Kloetzer, M., Belta, C.: A fully automated framework for control of linear systems from temporal logic specifications. IEEE Transactions on Automatic Control 53(1), 287–297 (2008)
  • [18] Lahijanian, M., Andersson, S.B., Belta, C.: Formal verification and synthesis for discrete-time stochastic systems. IEEE Transactions on Automatic Control 60(8), 2031–2045 (2015)
  • [19] Laurenti, L., Lahijanian, M., Abate, A., Cardelli, L., Kwiatkowska, M.: Formal and efficient synthesis for continuous-time linear stochastic hybrid processes. IEEE Transactions on Automatic Control 66(1), 17–32 (2020)
  • [20] Li, Y., Liu, J.: Robustly complete synthesis of memoryless controllers for nonlinear systems with reach-and-stay specifications. IEEE Transactions on Automatic Control (2020)
  • [21] Liu, J.: Robust abstractions for control synthesis: Completeness via robustness for linear-time properties. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 101–110 (2017)
  • [22] Liu, J.: Closing the gap between discrete abstractions and continuous control: Completeness via robustness and controllability. In: International Conference on Formal Modeling and Analysis of Timed Systems. pp. 67–83. Springer (2021)
  • [23] Majumdar, R., Mallik, K., Soudjani, S.: Symbolic controller synthesis for Büchi specifications on stochastic systems. In: Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control. pp. 1–11 (2020)
  • [24] Parker, D.: Verification of probabilistic real-time systems. Proc. 2013 Real-time Systems Summer School (ETR’13) (2013)
  • [25] Pola, G., Girard, A., Tabuada, P.: Approximately bisimilar symbolic models for nonlinear control systems. Automatica 44(10), 2508–2516 (2008)
  • [26] Ramponi, F., Chatterjee, D., Summers, S., Lygeros, J.: On the connections between PCTL and dynamic programming. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 253–262 (2010)
  • [27] Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes and Martingales, Volume 1: Foundations. Cambridge Mathematical Library (2000)
  • [28] Sagar, G., Ravi, D.: Compactness of any countable product of compact metric spaces in product topology without using Tychonoff's theorem. arXiv preprint arXiv:2111.02904 (2021)
  • [29] Soudjani, S.E.Z., Abate, A.: Adaptive gridding for abstraction and verification of stochastic hybrid systems. In: 2011 Eighth International Conference on Quantitative Evaluation of SysTems. pp. 59–68. IEEE (2011)
  • [30] Summers, S., Lygeros, J.: Verification of discrete time stochastic hybrid systems: A stochastic reach-avoid decision problem. Automatica 46(12), 1951–1961 (2010)
  • [31] Tabuada, P., Pappas, G.J.: Linear time logic control of discrete-time linear systems. IEEE Transactions on Automatic Control 51(12), 1862–1877 (2006)
  • [32] Tkachev, I., Abate, A.: On infinite-horizon probabilistic properties and stochastic bisimulation functions. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference. pp. 526–531. IEEE (2011)
  • [33] Tkachev, I., Abate, A.: Regularization of bellman equations for infinite-horizon probabilistic properties. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 227–236 (2012)
  • [34] Tkachev, I., Abate, A.: Formula-free finite abstractions for linear temporal verification of stochastic hybrid systems. In: Proc. of Hybrid Systems: Computation and Control (HSCC). pp. 283–292 (2013)
  • [35] Tkachev, I., Abate, A.: Characterization and computation of infinite-horizon specifications over markov processes. Theoretical Computer Science 515, 1–18 (2014)
  • [36] Tkachev, I., Mereacre, A., Katoen, J.P., Abate, A.: Quantitative model-checking of controlled discrete-time markov processes. Information and Computation 253, 1–35 (2017)
  • [37] Valby, L.V.: A category of polytopes. Ph.D. thesis, Reed College (2006)
  • [38] Vardi, M.Y.: Automatic verification of probabilistic concurrent finite state programs. In: 26th Annual Symposium on Foundations of Computer Science (FOCS). pp. 327–338. IEEE (1985)
  • [39] Vassiliou, P.C.: Non-homogeneous Markov set systems. Mathematics 9(5), 471 (2021)
  • [40] Wu, D., Koutsoukos, X.: Reachability analysis of uncertain systems using bounded-parameter Markov decision processes. Artificial Intelligence 172(8-9), 945–954 (2008)

Appendix 0.A Proofs of Section 3

Proof of Corollary 1.

Proof  It is clear that QQ under the discrete metric is complete and separable. In addition, for each tt, the space (t,TV)(\mathscr{M}_{t},\left\|\;\cdot\;\right\|_{\text{TV}}) is complete and separable. By Lemma 1, each (t,TV)(\mathscr{M}_{t},\left\|\;\cdot\;\right\|_{\text{TV}}) is also compact. For any sequence {μn}t\{\mu_{n}\}\subseteq\mathscr{M}_{t}, a quick application of Theorem 1 yields the existence of a weakly convergent subsequence {μnk}\{\mu_{n_{k}}\} and a weak limit point μ\mu in t\mathscr{M}_{t}. By the definition of weak convergence and the discrete structure of QQ, it is clear that for each hCb(𝒳)h\in C_{b}(\mathcal{X}) and tt\in\mathbb{N}, we have

𝒳h(x)μnk(x)𝒳h(x)μ(x)\sum_{\mathcal{X}}h(x)\mu_{n_{k}}(x)\rightarrow\sum_{\mathcal{X}}h(x)\mu(x)

in a strong sense, which concludes the compactness of HH. Now we choose μ1,μ2t\mu_{1},\mu_{2}\in\mathscr{M}_{t}, then αμ1+(1α)μ2t\alpha\mu_{1}+(1-\alpha)\mu_{2}\in\mathscr{M}_{t} for all α[0,1]\alpha\in[0,1]. Therefore,

α𝒳h(x)μ1(x)+(1α)𝒳h(x)μ2(x)=𝒳h(x)[αμ1+(1α)μ2](x)H\alpha\sum_{\mathcal{X}}h(x)\mu_{1}(x)+(1-\alpha)\sum_{\mathcal{X}}h(x)\mu_{2}(x)=\sum_{\mathcal{X}}h(x)[\alpha\mu_{1}+(1-\alpha)\mu_{2}](x)\in H

for all α[0,1]\alpha\in[0,1]. This shows the convex structure of HH.  

Proof of Proposition 1.

Proof  We slightly abuse notation and define πT:Q0TQ\pi_{T}:Q^{\infty}\rightarrow\prod_{0}^{T}Q as the projection onto the finite product space of QQ up to time TT. Since we do not emphasize the initial conditions, we also write 𝐏\mathbf{P}, \mathscr{M} and t\mathscr{M}_{t} for short. By Tychonoff's theorem, any product of QQ is also compact w.r.t. the product topology. Therefore, any family of measures on QTQ^{T} is tight and hence compact. By Remark 1, for every 𝐏\mathbf{P}\in\mathscr{M}, we have 𝐏πT1=t=0Tμt\mathbf{P}\circ\pi_{T}^{-1}=\otimes_{t=0}^{T}\mu_{t} for some μtt\mu_{t}\in\mathscr{M}_{t}, and {𝐏πT1}I𝕀\{\mathbf{P}\circ\pi_{T}^{-1}\}_{I\in\mathbb{I}} forms a compact set. Hence, every sequence {𝐏nπT1}n{𝐏πT1}I𝕀\{\mathbf{P}_{n}\circ\pi_{T}^{-1}\}_{n}\subseteq\{\mathbf{P}\circ\pi_{T}^{-1}\}_{I\in\mathbb{I}} with any finite TT contains a weakly convergent subsequence. We construct the convergent subsequence of {𝐏n}n\{\mathbf{P}_{n}\}_{n} as follows.

We initialize the procedure by setting T=0T=0. Then 0\mathscr{M}_{0} is compact, and there exists a weakly convergent subsequence {𝐏0,nπ01}\{\mathbf{P}_{0,n}\circ\pi_{0}^{-1}\}. Based on {𝐏0,n}\{\mathbf{P}_{0,n}\}, we can extract a further subsequence, denoted by {𝐏1,n}\{\mathbf{P}_{1,n}\}, such that {𝐏1,nπ11}\{\mathbf{P}_{1,n}\circ\pi_{1}^{-1}\} weakly converges. By induction, we have {𝐏k+1,n}{𝐏k,n}\{\mathbf{P}_{k+1,n}\}\subseteq\{\mathbf{P}_{k,n}\} for each kk\in\mathbb{N}. Repeating this argument and picking the diagonal subsequence {𝐏n,n}\{\mathbf{P}_{n,n}\}, we obtain that {𝐏n,nπT1}\{\mathbf{P}_{n,n}\circ\pi_{T}^{-1}\} is weakly convergent for each TT. We denote the weak limit point of each {𝐏n,nπT1}\{\mathbf{P}_{n,n}\circ\pi_{T}^{-1}\} by t=0Tμt\otimes_{t=0}^{T}\mu_{t}. By construction, we have

t=0Tμt()=t=0T+1μt(×Q),T.\otimes_{t=0}^{T}\mu_{t}(\cdot)=\otimes_{t=0}^{T+1}\mu_{t}(\cdot\times Q),\;\;\forall T\in\mathbb{N}.

By Kolmogorov’s extension theorem, there exists a unique 𝐏\mathbf{P} on QQ^{\infty} such that t=0Tμt()=𝐏πT1()\otimes_{t=0}^{T}\mu_{t}(\cdot)=\mathbf{P}\circ\pi_{T}^{-1}(\cdot) for each TT.

We have seen that for each {𝐏n}\{\mathbf{P}_{n}\}, the constructed subsequence satisfies 𝐏n,n𝐏\mathbf{P}_{n,n}\Rightarrow\mathbf{P}, which concludes the claim.  

Proof of Theorem 2.

Proof  Since we do not emphasize the initial conditions, we simply drop the superscripts q0q_{0} for short. Given I𝕀I\in\mathbb{I} with any initial condition, the corresponding canonical space is (𝛀,𝐅,𝐏I)(\mathbf{\Omega},\mathbf{F},\mathbf{P}_{I}). By Proposition 1, every sequence {𝐏n}\{\mathbf{P}_{n}\}\subseteq\mathscr{M} has a weakly convergent subsequence, denoted by {𝐏nk}\{\mathbf{P}_{n_{k}}\}, to a 𝐏\mathbf{P}\in\mathscr{M} of some II. Note that for any II, the measurable set {IΨ}={ϖ:ϖΨ}𝐅\{I\vDash\Psi\}=\{\varpi:\varpi\vDash\Psi\}\in\mathbf{F} is the same due to the identical labelling function. It is important to notice that due to the discrete topology of 𝛀\mathbf{\Omega}, every Borel measurable set A𝐅A\in\mathbf{F} is such that A=\partial A=\emptyset. By Definition 5 we have 𝐏nk(InkΨ)𝐏(IΨ)\mathbf{P}_{n_{k}}(I_{n_{k}}\vDash\Psi)\rightarrow\mathbf{P}(I\vDash\Psi) for all Ψ\Psi. The compactness of Sq0S^{q_{0}} follows immediately. The convexity of the set of laws is based on the tensor product of convex polytopes [37]. To show the convexity of Sq0S^{q_{0}}, we notice that, for any q0,qntQq_{0},\cdots q_{n_{t}}\in Q and I𝕀I\in\mathbb{I},

𝐏I(I0=q0,,It=qnt,It+1=qnt+1){Θnt+1,ntΘnt,nt1Θn1,0δq0:Θ[[Θ]]}\begin{split}&\mathbf{P}_{I}\left(I_{0}=q_{0},\cdots,I_{t}=q_{n_{t}},I_{t+1}=q_{n_{t+1}}\right)\\ \in&\{\Theta_{n_{t+1},n_{t}}\Theta_{n_{t},n_{t-1}}\cdots\Theta_{n_{1},0}\delta_{q_{0}}:\Theta\in[\![\Theta]\!]\}\end{split}

and hence forms a convex set. Immediately, the convexity holds for
{𝐏I(i=1nΓi)}I𝕀\{\mathbf{P}_{I}(\prod_{i=1}^{n}\Gamma_{i})\}_{I\in\mathbb{I}} for any cylinder set i=1nΓi\prod_{i=1}^{n}\Gamma_{i}. By a standard monotone class argument, {𝐏I(A)}I𝕀\{\mathbf{P}_{I}(A)\}_{I\in\mathbb{I}} is also convex for any Borel measurable set A𝐅A\in\mathbf{F}, which implies the convexity of Sq0S^{q_{0}} in the statement.  

Proof of Proposition 2.

Proof  Note that the laws are associated with XX with X0=x0X_{0}=x_{0}, which actually means the stopped process XτX^{\tau} (recall the notation in Section 2.4). Since Xtτ𝒲¯X_{t\wedge\tau}\in\overline{\mathcal{W}} for each tt, the state space of XX is compact, and so is the countably infinite product. By a similar argument as in Proposition 1, we can conclude the first part of the statement. Note that, by assumption, the partition respects the boundary of the labelling function. Hence, for every formula Ψ\Psi, the boundary of {XΨ}𝐅\{X\vDash\Psi\}\in\mathbf{F} has measure 0. The second part follows directly from Definition 5.

Proof of Lemma 2.

Proof  Note that XX is on (Ω,,𝒫Xν0)(\Omega,\mathcal{F},\mathcal{P}_{X}^{\nu_{0}}) and II is on (𝛀,𝐅,𝐏Iq0)(\mathbf{\Omega},\mathbf{F},\mathbf{P}_{I}^{q_{0}}). We first show the case when ν0=δx0\nu_{0}=\delta_{x_{0}} for any x0q0x_{0}\in q_{0}. That is, for X0=x0X_{0}=x_{0} a.s. with any x0q0x_{0}\in q_{0}, there exists a unique law of some I𝕀I\in\mathbb{I} such that 𝒫Xx0(XΨ)=𝐏Iq0(IΨ)\mathcal{P}_{X}^{x_{0}}(X\vDash\Psi)=\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi) for any Ψ\Psi.

Let νt\nu_{t} denote the marginal distribution of 𝒫Xx0\mathcal{P}_{X}^{x_{0}} at each tt. Let t={μt}I𝕀\mathscr{M}_{t}=\{\mu_{t}\}_{I\in\mathbb{I}} denote the set of marginal distributions of {𝐏Iq0}I𝕀\{\mathbf{P}_{I}^{q_{0}}\}_{I\in\mathbb{I}}. Now, at t=1t=1, ν1(qj)=𝒯(x0,qj)δx0\nu_{1}(q_{j})=\mathcal{T}(x_{0},q_{j})\delta_{x_{0}} for all j{1,2,,N+1}j\in\{1,2,\cdots,N+1\}. Suppose q0q_{0} is the ithi^{th} element of QQ. By the construction of the IMC, we have

Θˇijν1(qj)=qjδx0𝒯(x0,dy)Θ^ij,x0q0andj{1,2,,N+1}.\check{\Theta}_{ij}\leq\nu_{1}(q_{j})=\int_{q_{j}}\delta_{x_{0}}\mathcal{T}(x_{0},dy)\leq\hat{\Theta}_{ij},\;\;\forall x_{0}\in q_{0}\;\text{and}\;\forall j\in\{1,2,\cdots,N+1\}.

Since qQν1(q)=1\sum_{q\in Q}\nu_{1}(q)=1, by letting μ1=(ν1(q1),ν1(q2),,ν1(qN+1))T\mu_{1}=(\nu_{1}(q_{1}),\nu_{1}(q_{2}),\cdots,\nu_{1}(q_{N+1}))^{T}, we automatically have μ11\mu_{1}\in\mathscr{M}_{1} by definition. Note that μ1\mu_{1} is unique w.r.t. TV\left\|\;\cdot\;\right\|_{\text{TV}}, and has the property that μ1(q)=ν1(q)\mu_{1}(q)=\nu_{1}(q) for each qQq\in Q.

Similarly, at t=2t=2, we have

Θˇijμ1(qi)qjqiν1(dx)𝒯(x,dy)Θ^ijμ1(qi),i,j{1,2,,N+1},\check{\Theta}_{ij}\mu_{1}(q_{i})\leq\int_{q_{j}}\int_{q_{i}}\nu_{1}(dx)\mathcal{T}(x,dy)\leq\hat{\Theta}_{ij}\mu_{1}(q_{i}),\;\forall i,j\in\{1,2,\cdots,N+1\},

where 𝒯\mathcal{T} may not be the same as that of t=1t=1. Therefore, for any j{1,2,,N+1}j\in\{1,2,\cdots,N+1\},

ν2(qj)=i=1N+1qjqiν1(dx)𝒯(x,dy)\nu_{2}(q_{j})=\sum_{i=1}^{N+1}\int_{q_{j}}\int_{q_{i}}\nu_{1}(dx)\mathcal{T}(x,dy)

and there exists a μ2\mu_{2} such that iΘˇijμ1(qi)μ2(qj)=ν2(qj)iΘ^ijμ1(qi)\sum_{i}\check{\Theta}_{ij}\mu_{1}(q_{i})\leq\mu_{2}(q_{j})=\nu_{2}(q_{j})\leq\sum_{i}\hat{\Theta}_{ij}\mu_{1}(q_{i}), which means (by (12)) μ22\mu_{2}\in\mathscr{M}_{2}. In addition, there also exists a 𝐏q0\mathbf{P}^{q_{0}} such that its one-dimensional marginals up to t=2t=2 admit μ1\mu_{1} and μ2\mu_{2}, and satisfies

𝐏q0[I0=q0,I1=qi,I2=qj]=𝒫Xx0[X0=x0,X1qi,X2qj].\mathbf{P}^{q_{0}}[I_{0}=q_{0},I_{1}=q_{i},I_{2}=q_{j}]=\mathcal{P}_{X}^{x_{0}}[X_{0}=x_{0},X_{1}\in q_{i},X_{2}\in q_{j}].

Repeating this procedure, there exists a unique μtt\mu_{t}\in\mathscr{M}_{t} w.r.t. TV\left\|\;\cdot\;\right\|_{\text{TV}} for each tt, such that μt(q)=νt(q)\mu_{t}(q)=\nu_{t}(q) for each qQq\in Q. It is also clear that for each given x0q0x_{0}\in q_{0} and each tt, the selected 𝐏q0\mathbf{P}^{q_{0}} satisfies

𝒫Xx0(0t1Ai)=𝐏q0(0t1Ai)=𝐏q0(0t1Ai×Q),Ai(Q).\mathcal{P}_{X}^{x_{0}}(\prod_{0}^{t-1}A_{i})=\mathbf{P}^{q_{0}}(\prod_{0}^{t-1}A_{i})=\mathbf{P}^{q_{0}}(\prod_{0}^{t-1}A_{i}\times Q),\;\;A_{i}\in\mathscr{B}(Q).

By the Kolmogorov extension theorem, there exists a unique law 𝐏Iq0\mathbf{P}_{I}^{q_{0}} of some I𝕀I\in\mathbb{I} such that each TT-dimensional distribution coincides with 0Tμi\otimes_{0}^{T}\mu_{i}, and, for each given x0q0x_{0}\in q_{0}, 𝒫Xx0(Γ)=𝐏Iq0(Γ)\mathcal{P}_{X}^{x_{0}}(\Gamma)=\mathbf{P}_{I}^{q_{0}}(\Gamma) for all Γ(Q)=𝐅.\Gamma\in\mathscr{B}(Q^{\infty})=\mathbf{F}. Due to the assumption that L(x)=L𝕀(q)L(x)=L_{\mathbb{I}}(q) for all xqx\in q and qQq\in Q, we have

{L1(Ψ)}={L𝕀1(Ψ)}𝐅\{L^{-1}(\Psi)\}=\{L_{\mathbb{I}}^{-1}(\Psi)\}\in\mathbf{F}

for every LTL formula Ψ\Psi, which implies {XΨ}={IΨ}\{X\vDash\Psi\}=\{I\vDash\Psi\} by definition. Thus, given x0q0x_{0}\in q_{0}, the above 𝐏Iq0\mathbf{P}_{I}^{q_{0}} satisfies 𝒫Xx0(XΨ)=𝐏Iq0(IΨ)\mathcal{P}_{X}^{x_{0}}(X\vDash\Psi)=\mathbf{P}_{I}^{q_{0}}(I\vDash\Psi).

Based on the above conclusion, as well as the definition of 𝒫Xν0\mathcal{P}_{X}^{\nu_{0}} and the convexity of Sq0S^{q_{0}} (recall Theorem 2), the result for a more general initial distribution ν0\nu_{0} with ν0(q0)=1\nu_{0}(q_{0})=1 can be obtained.
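
The induction above hinges on the cell-wise containment ∑_i Θ̌_ij μ_t(q_i) ≤ ν_{t+1}(q_j) ≤ ∑_i Θ̂_ij μ_t(q_i). The following sketch checks this numerically for a hypothetical scalar system X_{t+1} = aX_t + w with Gaussian noise on a uniform partition; mass leaving the working domain (the cell q_{N+1}) is ignored for brevity, and all constants are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

# Toy 1-D system X_{t+1} = a*X_t + w, w ~ N(0, sigma^2); all constants are illustrative.
a, sigma, x0 = 0.8, 0.3, 0.1
edges = np.linspace(-2.0, 2.0, 9)      # uniform partition of the working domain into 8 cells
n = len(edges) - 1

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))

def kernel(x, j):
    """T(x, q_j) = P(a*x + w lies in cell j)."""
    return Phi((edges[j + 1] - a * x) / sigma) - Phi((edges[j] - a * x) / sigma)

# Interval transition bounds (Theta_check, Theta_hat) from a fine grid inside each cell.
theta_lo, theta_up = np.zeros((n, n)), np.zeros((n, n))
for i in range(n):
    xs = np.linspace(edges[i], edges[i + 1], 100)
    for j in range(n):
        vals = np.array([kernel(x, j) for x in xs])
        theta_lo[i, j], theta_up[i, j] = vals.min(), vals.max()

# nu_1 on the cells (from X_0 = x0), and nu_2 via midpoint integration of the density of X_1.
mu1 = np.array([kernel(x0, j) for j in range(n)])
grid = np.linspace(edges[0], edges[-1], 4001)
mid, dx = 0.5 * (grid[:-1] + grid[1:]), grid[1] - grid[0]
pdf1 = np.exp(-0.5 * ((mid - a * x0) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
nu2 = np.array([np.sum(pdf1 * np.array([kernel(x, j) for x in mid]) * dx) for j in range(n)])

lower, upper = theta_lo.T @ mu1, theta_up.T @ mu1    # the bounds used in the induction
tol = 1e-4                                           # allowance for discretization error
print("containment holds:", bool(np.all(lower - tol <= nu2) and np.all(nu2 <= upper + tol)))
```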

Proof of Proposition 3.

Proof  Let μ,νt\mu,\nu\in\mathscr{M}_{t} for each tt, and V,K[[Θ]]V,K\in[\![\Theta]\!] be any stochastic matrices generated by 𝕀\mathbb{I}. Then, for each tt, we have

VTμKTνTVVTμVTνTV+maxiViKiTVνTV12maxi,jViKjTVμνTV+ενTVμνTV+ε.\begin{split}\left\|V^{T}\mu-K^{T}\nu\right\|_{\text{TV}}&\leq\left\|V^{T}\mu-V^{T}\nu\right\|_{\text{TV}}+\max_{i}\left\|V_{i}-K_{i}\right\|_{\text{TV}}\left\|\nu\right\|_{\text{TV}}\\ &\leq\frac{1}{2}\max_{i,j}\left\|V_{i}-K_{j}\right\|_{\text{TV}}\left\|\mu-\nu\right\|_{\text{TV}}+\varepsilon\left\|\nu\right\|_{\text{TV}}\\ &\leq\left\|\mu-\nu\right\|_{\text{TV}}+\varepsilon.\end{split} (24)

This implies that the total variation deviation between any μ~,ν~t+1\tilde{\mu},\tilde{\nu}\in\mathscr{M}_{t+1} is bounded by

maxtμνTV+ε.\max_{\mathscr{M}_{t}}\left\|\mu-\nu\right\|_{\text{TV}}+\varepsilon.

Note that at t=0t=0, max0μνTV=0\max_{\mathscr{M}_{0}}\left\|\mu-\nu\right\|_{\text{TV}}=0. Hence, at each t>0t>0, as ε0\varepsilon\rightarrow 0,

maxtμνTV0.\max_{\mathscr{M}_{t}}\left\|\mu-\nu\right\|_{\text{TV}}\rightarrow 0.

By the product topology and the Kolmogorov extension theorem, the set {𝐏}I𝕀\{\mathbf{P}\}_{I\in\mathbb{I}} reduces to a singleton. The conclusion follows.
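
The one-step estimate (24) can be probed numerically. The sketch below samples random stochastic matrices whose rows are close in total variation, together with random marginals, and checks the resulting bound; it uses the unnormalized ℓ1 convention for the total variation of discrete measures, and the sampling choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n):
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

def random_dist(n):
    p = rng.random(n)
    return p / p.sum()

# Empirical check of ||V^T mu - K^T nu||_TV <= ||mu - nu||_TV + eps with the
# unnormalized l1 convention for the total variation of discrete measures.
n, violations = 6, 0
for _ in range(10_000):
    V = random_stochastic(n)
    K = np.clip(V + 0.05 * (rng.random((n, n)) - 0.5), 1e-12, None)
    K /= K.sum(axis=1, keepdims=True)            # rows of K stay close to rows of V
    eps = np.abs(V - K).sum(axis=1).max()        # max_i ||V_i - K_i||_1
    mu, nu = random_dist(n), random_dist(n)
    lhs = np.abs(V.T @ mu - K.T @ nu).sum()
    rhs = np.abs(mu - nu).sum() + eps
    violations += lhs > rhs + 1e-12
print("violations of the bound:", violations)    # expected: 0
```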

Appendix 0.B Proofs of Section 4

Proof of Lemma 3.

Proof  The claim can be proved, for example, using inclusion functions. Let 𝕀n\mathbb{IR}^{n} denote the set of all boxes in n\mathbb{R}^{n}. Let [f]:𝕀n𝕀n[f]:\mathbb{IR}^{n}\rightarrow\mathbb{IR}^{n} be a convergent inclusion function of ff satisfying (i) f([x])[f]([x])f([x])\subseteq[f]([x]) for all [x]𝕀n[x]\in\mathbb{IR}^{n}; (ii) limλ([y])0λ([f]([y]))=0\lim_{\lambda([y])\rightarrow 0}\lambda([f]([y]))=0, where λ\lambda denotes the width. Similarly, let [B]:𝕀n𝕀n×k[B]:\mathbb{IR}^{n}\rightarrow\mathbb{IR}^{n\times k} be a convergent inclusion matrix of b(x)b(x) satisfying (i) b([x])[B]([x])b([x])\subseteq[B]([x]) for all [x]𝕀n[x]\in\mathbb{IR}^{n}; (ii) limλ([y])0λ([B]([y]))=0\lim_{\lambda([y])\rightarrow 0}\lambda([B]([y]))=0, where λ([B]([y])):=maxi,jλ([Bij])\lambda([B]([y])):=\max_{i,j}\lambda([B_{ij}]).

Without loss of generality, we assume that κ<1\kappa<1. Due to the Lipschitz continuity of ff and bb, we can find inclusions such that λ([f]([y]))Lfλ([y])\lambda([f]([y]))\leq L_{f}\lambda([y]) for any subinterval [y][y] of [x][x], and similarly λ([B]([y]))Lbλ([y])\lambda([B]([y]))\leq L_{b}\lambda([y]). For each such interval [y][y], we obtain the intervals [m]=[f]([y])[m]=[f]([y]) and [s2]=[B]([y])[B]([y])[s^{2}]=[B]([y])[B]^{*}([y]). Let TT denote the collection of Gaussian measures with means and covariances in all such intervals ([m][m] and [s2][s^{2}]), and let 𝕋1([x])^\widehat{\mathbb{T}_{1}([x])} be its union. Then 𝕋1([x])^\widehat{\mathbb{T}_{1}([x])} satisfies the requirement. Indeed, we have 𝕋1([x])𝕋1([x])^\mathbb{T}_{1}([x])\subseteq\widehat{\mathbb{T}_{1}([x])}. For the second part of the inclusion, we have, for any μ𝕋1([x])^\mu\in\widehat{\mathbb{T}_{1}([x])} and ν𝕋1([x])\nu\in\mathbb{T}_{1}([x]),

μνW2[m]2+[s2](Lfλ([y]))2+(Lbλ([y]))2,\left\|\mu-\nu\right\|_{\text{W}}^{2}\leq[m]^{2}+[s^{2}]\leq(L_{f}\lambda([y]))^{2}+(L_{b}\lambda([y]))^{2}, (25)

where we can choose [y][y] small enough that (Lfλ([y]))2+(Lbλ([y]))2<κ2(L_{f}\lambda([y]))^{2}+(L_{b}\lambda([y]))^{2}<\kappa^{2}. The second part of the inclusion follows from such a choice of [y][y].
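
The construction in this proof amounts to interval arithmetic: subdivide [x] into pieces [y] until the widths of [f]([y]) and [B]([y]) fall below the required threshold. A minimal one-dimensional sketch is given below; the dynamics f and b, their interval extensions, and the threshold κ are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

# 1-D intervals as (lo, hi) pairs; natural interval extensions for a toy system
# x_{t+1} = f(x) + b(x) w with f(x) = 0.8 x + 0.1 sin(x) and b(x) = 0.2 + 0.05 x.
# Both the dynamics and the extensions below are illustrative placeholders.

def f_incl(lo, hi):
    # 0.8*[x] + 0.1*enclosure of sin([x]); crude sin enclosure via sampling plus a Lipschitz pad
    xs = np.linspace(lo, hi, 64)
    pad = (hi - lo) / 63.0
    s_lo, s_hi = np.sin(xs).min() - pad, np.sin(xs).max() + pad
    return 0.8 * lo + 0.1 * s_lo, 0.8 * hi + 0.1 * s_hi

def b_incl(lo, hi):
    return 0.2 + 0.05 * lo, 0.2 + 0.05 * hi        # monotone, so endpoints suffice

def refine(lo, hi, kappa):
    """Subdivide [x] until each piece [y] yields mean/noise enclosures of width < kappa."""
    stack, leaves = [(lo, hi)], []
    while stack:
        a, c = stack.pop()
        m_lo, m_hi = f_incl(a, c)
        s_lo, s_hi = b_incl(a, c)
        if max(m_hi - m_lo, s_hi - s_lo) < kappa:
            leaves.append(((a, c), (m_lo, m_hi), (s_lo, s_hi)))
        else:
            mid = 0.5 * (a + c)
            stack += [(a, mid), (mid, c)]
    return leaves

if __name__ == "__main__":
    pieces = refine(-1.0, 1.0, kappa=0.05)
    print(f"{len(pieces)} subintervals; first piece:", pieces[0])
```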

Proof of Corollary 3.

Proof  The first part of the proof is provided in Section 3. The second inclusion is shown in a similar way as Lemma 2 and Theorem 3. Indeed, by the definition of abstraction, for any μt\mu\in\mathscr{M}_{t}, there exists a marginal measure ν\nu of some X𝕏2X\in\mathbb{X}_{2} such that their probabilities match on the discrete nodes. By the same induction as in Lemma 2, for any law 𝐏Iq0\mathbf{P}_{I}^{q_{0}} of some I𝕀I\in\mathbb{I}, there exists a 𝒫Xν0\mathcal{P}_{X}^{\nu_{0}} of some X𝕏2X\in\mathbb{X}_{2} such that the probabilities of any Γ(Q)\Gamma\in\mathscr{B}(Q^{\infty}) match. The second inclusion follows. The compactness follows in a similar way as Proposition 2. Note that S1S_{1} may not be convex, but S𝕀S_{\mathbb{I}} and S2S_{2} are (see also Remark 15).