
Formalizing and Evaluating Requirements of Perception Systems for Automated Vehicles using Spatio-Temporal Perception Logic

Mohammad Hekmatnejad, Bardh Hoxha, Jyotirmoy V. Deshmukh, Yezhou Yang, and Georgios Fainekos. Mohammad Hekmatnejad was with the School of Computing and Augmented Intelligence (formerly CIDSE), Arizona State University, USA; e-mail: [email protected]. Bardh Hoxha is with the Toyota Research Institute of North America, USA. Jyotirmoy V. Deshmukh is with the University of Southern California, USA. Yezhou Yang is with Arizona State University, USA. Georgios Fainekos is with the Toyota Research Institute of North America, USA; the work was performed while he was with Arizona State University, USA.
Abstract

Automated vehicles (AV) heavily depend on robust perception systems. Current methods for evaluating vision systems focus mainly on frame-by-frame performance. Such evaluation methods appear to be inadequate in assessing the performance of a perception subsystem when used within an AV. In this paper, we present a logic – referred to as Spatio-Temporal Perception Logic (STPL) – which utilizes both spatial and temporal modalities. STPL enables reasoning over perception data using spatial and temporal operators. One major advantage of STPL is that it facilitates basic sanity checks on the functional performance of the perception system, even without ground-truth data in some cases. We identify a fragment of STPL which is efficiently monitorable offline in polynomial time. Finally, we present a range of specifications for AV perception systems to highlight the types of requirements that can be expressed and analyzed through offline monitoring with STPL.

Index Terms:
Formal Methods, Perception System, Temporal Logic, Autonomous Driving Systems.

I Introduction

The safe operation of automated vehicles (AV), advanced driver assist systems (ADAS), and mobile robots in general, fundamentally depends on the correctness and robustness of the perception system (see discussion in Sec. 3 by Schwarting et al. (2018)). Faulty perception systems and, consequently, inaccurate situational awareness can lead to dangerous situations (Lee (2018); Templeton (2020)). For example, in the accident in Tempe in 2018 involving an Uber vehicle (Lee (2018)), different sensors had detected the pedestrian, but the perception system was not robust enough to assign a single consistent object class to the pedestrian. Hence, the AV was not able to predict the future path of the pedestrian. This accident highlights the importance of identifying the right questions to ask when evaluating perception systems.

Figure 1: A series of image frames taken from KITTI (Geiger et al. (2013a)) augmented with data about classified data-objects using SqueezeDet (Wu et al. (2017)). Here, τ is a function that maps each frame f to its captured time: (a) f=0, τ(0)=0; (b) f=1, τ(1)=0.04; (c) f=2, τ(2)=0.08; (d) f=3, τ(3)=0.12; (e) f=4, τ(4)=0.16; (f) f=5, τ(5)=0.2.

In this paper, we argue that in order to improve AV/ADAS safety, we need to be able to formally express what the assumptions, performance, and guarantees provided by a perception system should be. For instance, the aforementioned accident highlights the need for predicting the right object class quickly and robustly. An informal requirement (in natural language) expressing such a perception system performance expectation could be:

Req. 1

Whenever a new object is detected, then it is assigned a class within 1 sec, after which the object does not change class until it disappears.

Such requirements are needed not only for deployed safety-critical systems, but also for existing training and evaluation datasets. For example, a sequence of six image frames from the KITTI dataset is shown in Figure 1. All detected objects with bounding boxes are labeled with class names such as Car, Pedestrian, and Cyclist in those frames. However, a cyclist that was detected in frame f=0 (Fig. 1(a)) and still existed in frame f=1 changed its class to pedestrian in frame f=2, which is a violation of Req. 1. If such a requirement is too strict for the perception system, then it could be relaxed by requiring that the object is properly classified in at least 4 out of 5 frames (or any other desired frequency). The first challenge for an automated testing & verification framework is to be able to formally (i.e., mathematically) represent such high-level requirements.
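To illustrate the kind of automated check that a formalized Req. 1 enables, the following sketch is a hypothetical, minimal class-consistency checker (not the STPL monitor of this paper); the frame encoding as dictionaries from object id to class label is our own toy assumption:

```python
def class_changes(frames):
    """Detect objects whose class label changes between frames.

    frames: list of dicts mapping object id -> class label,
            one dict per video frame (a toy stand-in for a
            data-object stream).
    Returns a list of (frame_index, object_id, old_class, new_class).
    """
    violations = []
    last_seen = {}  # object id -> most recently assigned class
    for i, frame in enumerate(frames):
        for obj_id, cls in frame.items():
            if obj_id in last_seen and last_seen[obj_id] != cls:
                violations.append((i, obj_id, last_seen[obj_id], cls))
            last_seen[obj_id] = cls
    return violations

# Frames 0-2 mirror the cyclist example: object 2 is a cyclist in
# frames 0 and 1, then is misclassified as a pedestrian in frame 2.
frames = [
    {1: "car", 2: "cyclist"},
    {1: "car", 2: "cyclist", 3: "car"},
    {1: "car", 2: "pedestrian", 3: "car"},
]
print(class_changes(frames))  # -> [(2, 2, 'cyclist', 'pedestrian')]
```

A relaxed "4 out of 5 frames" version would count mismatches over a sliding window instead of flagging every change.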

Formalizing the expectations on the performance of the perception system has several benefits. First and foremost, such formal requirements can help us evaluate AV/ADAS perception systems beyond what existing metrics used in vision and object tracking can achieve (Yurtsever et al. (2020); Richter et al. (2017)). In particular, evaluation metrics used for image classification algorithms typically operate on a frame-by-frame basis, ignoring cross-frame relations and dependencies. Although tracking algorithms can detect misclassifications, they cannot assess their frequency and severity, or more complex spatio-temporal failure conditions.

Second, formal specifications enable automated search-based testing (e.g., see Abbas et al. (2017); Tuncali et al. (2020); Dreossi et al. (2019b, a); Corso et al. (2020); Gladisch et al. (2019); DeCastro et al. (2020)) and monitoring (e.g., see Rizaldi et al. (2017); Hekmatnejad et al. (2019)) for AV/ADAS. Requirements are necessary in testing in order to express the conditions (assumptions) under which system level violations are meaningful. Similarly, logical requirements are necessary in monitoring since without formal requirements, it is very hard to concisely capture the rules of the road. We envision that a formal specification language which specifically targets perception systems will enable similar innovations in testing and monitoring AV/ADAS.

Third, formal specifications on the perception system can also function as a requirements language between original equipment manufacturers (OEM) and suppliers. As a simple example of the need for a requirements language, consider the most basic question: should the image classifier correctly classify all objects within the sensing range of the lidar, or all objects of at least x pixels in diameter? Again, without requirements, any system behavior is acceptable and, most importantly, not testable/verifiable!

Finally, using a formal specification language, we can search offline perception datasets (see Kim et al. (2022); Anderson et al. (2023); Yadav and Curry (2019)) to find scenarios that violate the requirements in order to reproduce them in simulation, e.g., Bashetty et al. (2020), or to assess the real-to-sim gap in testing, e.g., Fremont et al. (2020).

Even though there is a large body of literature on the application of formal languages and methods to AV/ADAS (see the survey by Mehdipour et al. (2023)), the prior works have certain limitations when perception systems are specifically targeted. Works like Tuncali et al. (2020) and Dreossi et al. (2019a) demonstrated the importance of testing system level requirements for an AV when the AV contains machine learning enabled perception components. Both works use Signal Temporal Logic (STL) (Maler and Nickovic (2004)) for expressing requirements on the performance of the AV/ADAS. However, STL cannot directly express requirements on the performance of the perception system itself.

This limitation was identified by Dokhanchi et al. (2018a) who developed a new logic – Timed Quality Temporal Logic (TQTL) – which can directly express requirements for image classification algorithms like SqueezeDet by Wu et al. (2017). Namely, TQTL was designed to enable reasoning over objects classified in video frames and their attributes. For example, TQTL can formalize Req. 1 that an object should not change classes. However, TQTL does not support spatial or topological reasoning over the objects or regions in a scene. This is a major limitation since TQTL cannot express requirements such as occlusion, overlap, and other spatial relations for basic sanity checks, e.g., that objects do not “teleport” from frame to frame, or that “every 3D bounding box contains at least 1 lidar point”.

In this paper, we introduce the Spatio-Temporal Perception Logic (STPL) to address the aforementioned limitations. We combine TQTL with the spatial logic 𝒮4u (for a historical introduction to 𝒮4u, see van Benthem and Bezhanishvili (2007) and Kontchakov et al. (2007)) to produce a more expressive logic specifically focused on perception systems. STPL supports quantification over objects or regions in an image, and time variables for referring to specific points in time. It also enables both 2D and 3D spatial reasoning by expressing relations between objects across time. For example, with STPL we can express requirements on the rate at which bounding boxes should overlap:

Req. 2

The frames per second of the camera is high enough so that for all detected cars, their bounding boxes self-overlap for at least 3 frames and for at least 10% of the area.

which is satisfiable for the image frames in Fig. 1.
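Under the assumption that bounding boxes are axis-aligned rectangles encoded as (x1, y1, x2, y2) tuples (a representation we pick for illustration, not the paper's), the self-overlap condition of Req. 2 can be sketched as a check on consecutive boxes of one tracked object:

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def overlap_ratio(a, b):
    """Intersection area divided by the area of box a."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    return area((ix1, iy1, ix2, iy2)) / area(a) if area(a) > 0 else 0.0

def self_overlaps(track, min_frames=3, min_ratio=0.1):
    """Check that consecutive boxes of one tracked object overlap by
    at least min_ratio over a run of at least min_frames frames."""
    run = best = 1  # a single frame trivially "overlaps itself"
    for prev, cur in zip(track, track[1:]):
        run = run + 1 if overlap_ratio(prev, cur) >= min_ratio else 1
        best = max(best, run)
    return best >= min_frames

# A slowly moving car: large overlap between consecutive frames.
track = [(0, 0, 10, 10), (1, 0, 11, 10), (2, 0, 12, 10)]
print(self_overlaps(track))  # -> True
```

This consecutive-frame approximation is one reading of "self-overlap"; the STPL formalization in Section V is more general.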

Beyond expressing requirements on the functional performance of perception systems, STPL can be used to compare different machine learning algorithms on the same perception data. For example, STPL could assess how rain removal algorithms (see the work by Sun et al. (2019)) improve the performance of the object detection algorithms over video streams. Along these lines, Mallick et al. (2023) proposed to use TQTL for differential testing between two different models for pedestrian detection in extreme weather conditions. As another example, STPL could evaluate the impact of a point cloud clustering algorithm on a prediction algorithm (see the works by Campbell et al. (2016) and Qin et al. (2016)).

Since STPL is a very expressive formal language (TQTL + 𝒮4u), it is not a surprise that both the offline and online monitoring problems can be computationally expensive. The main challenge in designing a requirements language for perception systems is to find the right balance between expressivity and computability.

STPL borrows time variables and freeze time quantifiers from the Timed Propositional Temporal Logic (TPTL) by Alur and Henzinger (1994). Unfortunately, monitoring TPTL requirements is PSPACE-hard (see Markey and Raskin (2006)). Even though the syntax and semantics of STPL that we introduce in Section IV allow arbitrary use of time variables in formulas, in practice, in our implementation, we focus on TPTL fragments which are computable in polynomial time (see Dokhanchi et al. (2016); Elgyutt et al. (2018); Ghorbel and Prabhu (2022)).

Another source of computational complexity in STPL is the use of object/region quantification and set operations, i.e., intersection, complementation, and union. Even our limited use of quantifiers (∃, ∀) over finite sets (i.e., objects in a frame) introduces a combinatorial component to our monitoring algorithms. This is a well-known problem in first-order temporal logics that deal with data (e.g., see Basin et al. (2015) and Havelund et al. (2020)). In addition, the computational complexity of checking whether sets, which correspond to objects/regions, satisfy an arbitrary 𝒮4u formula depends on the set representation and the dimensionality of the space (e.g., polytopes (Baotic (2009)), orthogonal polyhedra (Bournez et al. (1999)), quadtrees or octrees (Aluru (2018)), etc.).

In this paper, we introduce an offline monitoring algorithm for STPL in Section VI, and we study its computational complexity in Section VI-E. In Section VI-F, we identify a fragment which remains efficiently computable. Irrespective of the computational complexity of the full logic, in practice, our experimental results in Section VII indicate that we can monitor many complex specifications within practically relevant times.

All in all, to the best of our knowledge, this is the first time that a formal language has been specifically designed to address the problem of functional correctness of perception systems within the context of autonomous/automated mobile systems. Our goal is to design a logic that can express functional correctness requirements on perception systems while understanding the computational complexity of monitoring such requirements in practice. As a by-product, we develop a logical framework that can be used to compare different perception algorithms beyond the standard metrics.

In summary, the contributions of this paper are:

  1.

    We introduce a new logic – Spatio-Temporal Perception Logic (STPL) – that combines quantification and functions over perception data, reasoning over the timing of events, and set-based operations for spatial reasoning (Section IV).

  2.

    We present examples of STPL specifications of increasing complexity as a tutorial (Section V), and we demonstrate that STPL can find label inconsistencies in publicly available AV datasets (see Figure 4).

  3.

    We introduce an offline monitoring algorithm (Section VI), and we propose to use different set representations for efficient computations (Section VI-E).

  4.

    We study the time complexity of the offline monitoring problem for STPL formulas, and we propose fragments for which the monitoring problem is computable in polynomial time (Section VI-E).

  5.

    We experimentally study offline monitoring runtimes for a range of specifications and present the trade-offs in using the different STPL operators (Section VII).

  6.

    We have released a publicly available open-source toolbox (STPL (2023)) for offline monitoring of STPL specifications over perception datasets. The toolbox allows for easy integration of new user-defined functions over time and object variables, and new data structures for set representation.

One common challenge with any formal language is its accessibility to users with limited experience in formal requirements. In order to remove such barriers, Anderson et al. (2022) developed PyFoReL which is a domain specific language (DSL) for STPL inspired by Python and integrated with Visual Studio Code. In addition, our publicly available toolbox contains all the examples and case studies presented in this paper which can be used as a tutorial by users without prior experience in formal languages. One can further envision integration with Large Language Models for direct use of natural language for requirement elicitation similar to the work by Pan et al. (2023); Fuggitti and Chakraborti (2023); Cosler et al. (2023) for signal or linear temporal logic.

Finally, even though the examples in the paper focus on perception datasets from AV/ADAS applications (i.e., Motional (2019); Geiger et al. (2013b)), STPL can be useful in other applications as well. One such example could be in robot manipulation problems where mission requirements are already described in temporal logic (e.g., see He et al. (2018) and He et al. (2015)).

II Preliminaries

We start by introducing definitions that are used throughout the paper for representing classified data generated by perception subsystems. In the following, an ego car (or the Vehicle Under Test – VUT) is a vehicle equipped with perception systems and other automated decision making software components, but not necessarily a fully automated vehicle (e.g., SAE automation level 4 or 5).

II-A Data-object Stream

A data-object stream 𝒟 is a sequence of data-objects representing objects in the ego vehicle’s environment. Such data-objects could be the output of the perception system, or ground-truth data, or even a combination of the two. The ground truth can be useful to analyze the precision of the output of a perception system. That is, given both the output of the perception system and the ground truth, it is possible to write requirements on how much, how frequently, and under what conditions the perception system is allowed to deviate from the ground truth. In all the examples and case studies that follow, we assume that we are given either the output of the perception system or the ground truth – but not both. In fact, we treat ground-truth data as the output of a perception system in order to demonstrate our framework. Interestingly, in this way, we also discover inconsistencies in the labels of training datasets for AV/ADAS (see Fig. 4).

We refer to each dataset in the sequence as a frame. In each frame, objects are classified and stored in a data-object. Data-objects are abstract because, in the absence of a standard classification format, the outputs of different perception systems are not necessarily compatible with each other. To simplify the presentation, we will also refer to the order of each frame in the data stream as a frame number. For each frame i, 𝒟(i) is the set of data-objects for frame i, where i is the order of the frame in the data-object stream 𝒟. We assume that a data stream is provided by a perception system, and that a function ℛ retrieves data-attributes for an identified object. A data-attribute is a property of an object such as its class, probability, geometric characteristics, etc. Some examples could be:

  • by ℛ(𝒟(i), id).Class, we refer to the class of an object identified by id in the i’th frame,

  • by ℛ(𝒟(i), id).Prob, we refer to the probability that an object identified by id in the i’th frame is correctly classified,

  • by ℛ(𝒟(i), id).PC, we refer to the point clouds associated with an object identified by id in the i’th frame.

The function 𝒪ℐ returns the set of identifiers for a given set of data-objects. By 𝒪ℐ(𝒟(i)), we refer to all the identifiers that are assigned to the data-objects in the i’th frame. Another important attribute of each frame is its time stamp, i.e., when this frame was captured in physical time. We represent this attribute for frame i by τ(i).

Example II.1

Assume that the data-object stream 𝒟 is the one represented in Figure 1; then the equalities below hold:

  • 𝒪ℐ(𝒟(1)) = {1, 2, 3}

  • ℛ(𝒟(1), 2).Class = cyclist

  • ℛ(𝒟(1), 2).Prob = 0.57

  • τ(1) = 0.04

Finally, we remark that in offline monitoring, which is the application that we consider in this paper, the data stream 𝒟 is finite. We use the notation |𝒟| to indicate the total number of frames in the data stream.
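The definitions above can be mirrored in a small data structure. The sketch below is a hypothetical, minimal encoding of a data-object stream (the class names `Obj` and `Frame` and the accessor style are ours, not the toolbox API); it reproduces the equalities of Example II.1 on a toy version of the first two frames of Figure 1:

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    """A classified data-object with a few data-attributes."""
    cls: str
    prob: float

@dataclass
class Frame:
    """One frame of a data-object stream: data-objects indexed by
    identifier, plus the frame's capture time tau."""
    objects: dict = field(default_factory=dict)  # id -> Obj
    tau: float = 0.0

# A toy stream D mirroring frames 0 and 1 of Figure 1
# (the class labels and probabilities are illustrative).
D = [
    Frame({1: Obj("car", 0.90), 2: Obj("cyclist", 0.61)}, tau=0.0),
    Frame({1: Obj("car", 0.88), 2: Obj("cyclist", 0.57),
           3: Obj("car", 0.72)}, tau=0.04),
]

# OI(D(1)): the identifiers present in frame 1.
assert set(D[1].objects) == {1, 2, 3}
# R(D(1), 2).Class and R(D(1), 2).Prob.
assert D[1].objects[2].cls == "cyclist"
assert D[1].objects[2].prob == 0.57
# tau(1) and |D|.
assert D[1].tau == 0.04 and len(D) == 2
print("all equalities hold")
```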

II-B Topological Spaces

Throughout the paper, we will use ℕ to refer to the set of natural numbers and ℝ to the set of real numbers. In the following, for completeness, we present some definitions and notation related to topological spaces and the S4u logic, which are adopted from Gabelaia et al. (2005) and Kontchakov et al. (2007). Later, we build our MTL×S4u definition on this notation.

Definition II.1 (Topological Space)

A topological space is a pair 𝕋 = ⟨𝕌, 𝐈⟩ in which 𝕌 is the universe of the space, and 𝐈 is the interior operator on 𝕌. The universe is a nonempty set, and 𝐈 satisfies the standard Kuratowski axioms, where X, Y ⊆ 𝕌:

𝐈(X ∩ Y) = 𝐈X ∩ 𝐈Y,   𝐈X = 𝐈𝐈X,   𝐈X ⊆ X,   and   𝐈𝕌 = 𝕌.

We denote by 𝐂 (closure) the dual operator to 𝐈, such that 𝐂X = 𝕌 − 𝐈(𝕌 − X) for every X ⊆ 𝕌. Below we list some remarks related to the above definitions:

  • 𝐈X is the interior of a set X,

  • 𝐂X is the closure of a set X,

  • X is called open if X = 𝐈X,

  • X is called closed if X = 𝐂X,

  • if X is an open set, then its complement X̄ = 𝕌 − X is a closed set and vice versa,

  • for any set X ⊆ 𝕌, its boundary is defined as 𝐂X − 𝐈X (X and its complement have the same boundary).
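On a discrete pixel grid, the interior and closure operators can be approximated by erosion and dilation over the 4-neighborhood. The sketch below is only an illustration under that assumption (it is not the set representation used in this paper, and the erosion-style interior does not satisfy all Kuratowski axioms exactly):

```python
def neighbors4(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def interior(X, universe):
    """Erosion-style approximation of I X on a pixel grid: keep the
    pixels of X whose in-universe 4-neighbors all lie in X.
    (Unlike the Kuratowski interior, this operator is not idempotent.)"""
    return {p for p in X
            if all(q in X for q in neighbors4(p) if q in universe)}

def closure(X, universe):
    """C X = U - I(U - X), computed via the duality with the interior."""
    return universe - interior(universe - X, universe)

universe = {(x, y) for x in range(5) for y in range(5)}
X = {(x, y) for x in range(1, 4) for y in range(1, 4)}  # a 3x3 block

print(sorted(interior(X, universe)))  # -> [(2, 2)]
print(sorted(closure(X, universe) - interior(X, universe)))  # boundary C X - I X
```

For the 3×3 block, the interior is the single strictly inside pixel, and the boundary is the block's rim together with the adjacent ring of complement pixels, matching the remark that X and its complement share the same boundary.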

II-C Image Space Notation and Definitions

Even though our work is general and applies to arbitrary finite-dimensional spaces, our main focus is on 2D and 3D spaces as encountered in perception systems in robotics. In the following, the discussion focuses on 2D image spaces with the usual pixel coordinate systems. In Figure 2 (left), a closed set is illustrated as the light blue region, while its boundary is drawn as a dashed line in darker blue.

Figure 2: Let X = X₁ ∪ X₂ with 𝐂X = X. Left: the whole rectangle represents the universe 𝕌 (colored gray); the regions X₁ and X₂ without the dashed lines (colored light blue) represent the interior 𝐈X of the set X; the dashed lines (colored darker blue) represent the boundary 𝐂X − 𝐈X of the set X; and the remaining region represents 𝕌 − X. Right: the tightest bounding box around the set X, which includes the four unique elements of the set as in Def. II.4.
Definition II.2 (Totally-ordered Topological Space)

A Totally-ordered (TO) topological space 𝐓 is a topological space ⟨𝕌, 𝐈⟩ that is equipped with a total ordering relation ⪯ ⊆ 𝕌 × 𝕌. That is, for all p₁, p₂ ∈ 𝕌, either p₁ ⪯ p₂ or p₂ ⪯ p₁.

Definition II.3 (Total order in 2D spaces)

In a two-dimensional (2D) TO topological space 𝐓, p = (x, y) ∈ 𝕌 denotes its coordinates in the x-y Cartesian coordinate system, where x, y ∈ ℝ≥0. For p₁ = (x₁, y₁), p₂ = (x₂, y₂), the ordering relation is defined as:

(x₁, y₁) ⪯₂D (x₂, y₂) ⟺ y₁ < y₂ ∨ (y₁ = y₂ ∧ x₁ ≤ x₂).

Note that even though Def. II.3 considers the coordinates over the reals, i.e., ℝ²≥0, in practice, the image space is defined over the integers, i.e., ℕ² (pixel coordinates). The topological space in Figure 2 consists of the universe, which is all the pixels that belong to the whole 2D gray rectangle, and the closure operator that for any set of regions includes their edges. Assume that the left-upper corner of the image is the origin of the x-y Cartesian coordinate system, and the x-axis and the y-axis are along the width and height of the image, respectively. This is the standard coordinate system in image spaces. Then, any pixel that belongs to the universe has an order with respect to the other pixels. For example, consider X = X₁ ∪ X₂ in Fig. 2; then all the pixels that belong to X₂ have higher orders than those in X₁.
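As a sanity check on Def. II.3, the comparator below implements ⪯₂D for pixel coordinates and sorts a few points into the induced (row-major raster) order; the tuple encoding of pixels is our own illustration:

```python
def preceq_2d(p1, p2):
    """The 2D total order of Def. II.3: order by row (y) first,
    then by column (x)."""
    (x1, y1), (x2, y2) = p1, p2
    return y1 < y2 or (y1 == y2 and x1 <= x2)

def raster_sort(points):
    """Sort points according to preceq_2d (row-major raster order)."""
    return sorted(points, key=lambda p: (p[1], p[0]))

pts = [(3, 1), (0, 2), (1, 0), (2, 1)]
print(raster_sort(pts))  # -> [(1, 0), (2, 1), (3, 1), (0, 2)]

# Every consecutive pair of the sorted list is ordered by preceq_2d.
assert all(preceq_2d(a, b)
           for a, b in zip(raster_sort(pts), raster_sort(pts)[1:]))
```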

Definition II.4 (Spatial Ordering Functions)

Given a closed set U ⊆ 𝕌 from a 2D TO topological space 𝐓 = ⟨𝕌, 𝐈⟩, we define the top-most, bottom-most, left-most, right-most, and center-point functions TM, BM, LM, RM, CT : 2^𝕌 → 𝕌, respectively, such that

TM(U) = (x_p, y_p) ∈ U s.t. ∀s = (x_s, y_s) ∈ U, y_p < y_s ∨ (y_p = y_s ∧ x_p ≥ x_s)

BM(U) = (x_p, y_p) ∈ U s.t. ∀s = (x_s, y_s) ∈ U, y_p > y_s ∨ (y_p = y_s ∧ x_p ≤ x_s)

LM(U) = (x_p, y_p) ∈ U s.t. ∀s = (x_s, y_s) ∈ U, x_p < x_s ∨ (x_p = x_s ∧ y_p ≤ y_s)

RM(U) = (x_p, y_p) ∈ U s.t. ∀s = (x_s, y_s) ∈ U, x_p > x_s ∨ (x_p = x_s ∧ y_p ≥ y_s)

and CT(U) is the center point of the rectangle defined by the LM(U), RM(U), BM(U), and TM(U) points.

In Def. II.4, the notation 2^A for a set A denotes the set of all subsets of A, i.e., the powerset of A. In Figure 2, we have identified the left-most, top-most, right-most, and bottom-most elements of the set X using the LM(X), TM(X), RM(X), and BM(X) function definitions. The right rectangle in Figure 2 is the tightest bounding box for the set X, and it can be derived using only those four points.
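A minimal pixel-set sketch of Def. II.4, assuming image coordinates with y increasing downward (so "top-most" means smallest y); the helper names follow the definition, while the point-set encoding is ours:

```python
def TM(U):
    """Top-most point: smallest y, ties broken by largest x."""
    return min(U, key=lambda p: (p[1], -p[0]))

def BM(U):
    """Bottom-most point: largest y, ties broken by smallest x."""
    return max(U, key=lambda p: (p[1], -p[0]))

def LM(U):
    """Left-most point: smallest x, ties broken by smallest y."""
    return min(U, key=lambda p: (p[0], p[1]))

def RM(U):
    """Right-most point: largest x, ties broken by largest y."""
    return max(U, key=lambda p: (p[0], p[1]))

def bounding_box(U):
    """Tightest axis-aligned bounding box derived from the four
    extremal points, as in Figure 2 (right)."""
    return (LM(U)[0], TM(U)[1], RM(U)[0], BM(U)[1])

U = {(2, 3), (5, 1), (7, 4), (3, 6)}
print(TM(U), BM(U), LM(U), RM(U))  # -> (5, 1) (3, 6) (2, 3) (7, 4)
print(bounding_box(U))             # -> (2, 1, 7, 6)
```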

II-D Spatio-Temporal Logic MTL×S4u

Next, we introduce the logic MTL×S4u over discrete-time semantics. MTL×S4u is a combination of Metric Temporal Logic (MTL) (Koymans (1990)) with the S4u logic of topological spaces. From another perspective, it is an extension of the logic PTL×S4u (see Gabelaia et al. (2005)) obtained by adding time/frame intervals to the spatio-temporal operators. Even though MTL×S4u is a new logic introduced in this paper, we present it in the preliminaries section in order to gradually introduce the notation and concepts needed for STPL. In this paper, we use the Backus-Naur form (BNF) (see Perugini (2021)), which is the standard way of providing a grammar when defining the syntax of formal and programming languages.

In the following, we provide the formal definitions for MTL×S4u. Henceforth, the symbol p refers to a spatial proposition, i.e., it represents a set in the topological space. In addition, the symbol 𝒯 represents spatial expressions that evaluate to subsets of the topological space. We refer to 𝒯 as a spatial term.

Definition II.5 (Syntax of MTL×S4u as a temporal logic of topological spaces)

Let Π be a finite set of spatial propositions over 𝕋 = ⟨𝕌, 𝐈⟩; then a formula φ of MTL×S4u is defined as follows:

𝒯 ::= p | 𝒯̄ | 𝒯 ⊓ 𝒯 | 𝐈 𝒯 | 𝒯 𝒰ˢ_ℐ 𝒯 | ○ˢ_ℐ 𝒯
φ ::= ∃ 𝒯 | ¬φ | φ ∧ φ | φ 𝒰_ℐ φ | ○_ℐ φ

where p ∈ Π is a spatial proposition.

In the above definition, the grammar for MTL×S4u consists of two sets of production rules, 𝒯 and φ. The spatial production rule 𝒯 in Def. II.5 contains a mix of spatial (¯, ⊓, 𝐈) and spatio-temporal (𝒰ˢ_ℐ, ○ˢ_ℐ) operators. Here, 𝒯̄ is the spatial complement of the spatial term 𝒯, ⊓ is the spatial intersection operator, and 𝐈 is the spatial interior operator. The 𝒰ˢ_ℐ and ○ˢ_ℐ operators are the spatio-temporal until and the spatio-temporal next, respectively. The subscript ℐ denotes a non-empty interval of ℝ≥0 and captures any timing constraints for the operators. When timing constraints are not needed, we can set ℐ = [0, +∞), or remove the subscript ℐ from the notation. Intuitively, the spatio-temporal until and next operators introduce spatial operations over time. For instance, the expression p₁ 𝒰ˢ_[1,3] p₂ computes the union over all sets resulting from the intersection of each occurrence of set p₂ at some time in the interval [1,3] with all the sets p₁ up to that time (see Def. II.6 for more details). We refer to the spatio-temporal operators 𝒰ˢ_ℐ and ○ˢ_ℐ as Spatio-Temporal Evolving (STE) operators. Also, we call a formula STE if it has STE operators in it. Similarly, we call a spatio-temporal formula Spatial Purely Evolving (SPE) if there is no STE operator in the formula. For more information, refer to the (OC) and (PC) definitions by Gabelaia et al. (2005).

In the production rule φ in Def. II.5, the spatially exists operator ∃ checks whether the spatial term following it evaluates to a non-empty set. That is, ∃𝒯 checks if there exist some points in the set represented by 𝒯. In φ, except for the spatially exists ∃, the other operators are the same as in MTL. That is, ¬, ∧, 𝒰_ℐ and ○_ℐ are the negation, conjunction, timed until, and next time operators, respectively (see the review by Bartocci et al. (2010)). As an example of a simple formula in MTL×S4u, the expression ∃p₁ 𝒰_[0.5,2.5] ∃p₂ is true if the set represented by p₂ is nonempty at some point in time in [0.5, 2.5] and, until then, the set represented by p₁ is non-empty.
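To make the discrete-time reading of this example concrete, the following sketch evaluates ∃p₁ 𝒰_[0.5,2.5] ∃p₂ over a finite sequence of frames, each carrying the sets valuating p₁ and p₂ and a timestamp τ. This is a simplified, hypothetical checker for this one formula, not the general monitoring algorithm of Section VI:

```python
def holds_until(p1_sets, p2_sets, tau, lo=0.5, hi=2.5):
    """Evaluate (exists p1) U_[lo,hi] (exists p2) at frame 0:
    some frame j with tau[j] in [lo, hi] has p2 non-empty, and
    p1 is non-empty at every frame strictly before j."""
    for j in range(len(tau)):
        if lo <= tau[j] <= hi and p2_sets[j]:
            if all(p1_sets[k] for k in range(j)):
                return True
    return False

# Three frames at times 0.0, 1.0, 2.0; p1 occupies some pixels in the
# first two frames, and p2 becomes non-empty at time 2.0.
p1 = [{(0, 0)}, {(0, 1)}, set()]
p2 = [set(), set(), {(5, 5)}]
print(holds_until(p1, p2, [0.0, 1.0, 2.0]))  # -> True
print(holds_until(p1, p2, [0.0, 1.0, 3.0]))  # -> False (p2 appears too late)
```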

In the following, we define the bounded discrete-time semantics for MTL×S4u\times S4_{u}. The definition uses the spatial semantics of S4uS4_{u} while extending the temporal fragment (PTL) with time constraints over finite traces as in MTL. The semantics are defined over a data-object stream 𝒟\mathcal{D}. However, for consistency with PTL×S4u\times S4_{u}, we will assume the existence of a spatio-temporal valuation function 𝔘:Π×2𝕌{\Large{\mathfrak{U}}}:\Pi\times\mathbb{N}\rightarrow 2^{{\Large{\mathbb{U}}}} that associates with every proposition pp and time frame ii a subset of the topological space. In the definition of STPL in Section IV, the sets 𝔘(p,i){\Large{\mathfrak{U}}}(p,i) will correspond to identified objects in the environment, i.e., bounding boxes, bounding rectangles, or even regions in semantic segmentation. In this section, the sets 𝔘(p,i){\Large{\mathfrak{U}}}(p,i) are just arbitrary subsets of the universe.

Definition II.6 (Quantified Topological Temporal Model and Valuation)

A Quantified Topological Temporal Model (QTTM) is a tuple of the form ={\Large{\mathbb{Q}}}= 𝕋,𝔘,𝒟,τ\langle{\Large{\mathbb{T}}},{\Large{\mathfrak{U}}},\mathcal{D},\tau\rangle, where 𝕋=𝕌,𝐈{\Large{\mathbb{T}}}=\langle{\Large{\mathbb{U}}},\mathbf{I}\rangle is a topological space, 𝔘{\Large{\mathfrak{U}}} is a spatio-temporal valuation function, 𝒟\mathcal{D} is a data-object stream of size |𝒟||\mathcal{D}|, τ:+\tau:\mathbb{N}\rightarrow\mathbb{R^{+}} maps frame numbers to their physical times, and \mathcal{I} is any non-empty interval of 0\;\mathbb{R}_{\geq 0} over time.

Given a model {\Large{\mathbb{Q}}}, the valuation function 𝕍(p,𝒟,i,τ){\Large{\mathbb{V}}}(p,\mathcal{D},i,\tau) represents the subset of the topological space 𝕋{\Large{\mathbb{T}}} that is occupied by the spatial proposition pΠp\in\Pi in the ii’th frame (i.e., τ(i)\tau(i) is the capture time). The valuation 𝕍{\Large{\mathbb{V}}} is inductively extended to any formula that can be produced by the production rule 𝒯\mathcal{T} in Def. II.5 as follows:

𝕍(p,𝒟,i,τ):=𝔘(p,i)\displaystyle{\Large{\mathbb{V}}}(p,\mathcal{D},i,\tau):={\Large{\mathfrak{U}}}(p,i)
𝕍(𝒯1𝒰s𝒯2,𝒟,i,τ):=i{j|τ(j)(τ(i)+)}\displaystyle{\Large{\mathbb{V}}}(\mathcal{T}_{1}\;\,\mathcal{U}^{s}_{\mathcal{I}}\;\mathcal{T}_{2},\mathcal{D},i,\tau):=\bigcup_{i^{\prime}\in\{j\in\mathbb{N}\;|\;\tau(j)\in(\tau(i)+\mathcal{I})\}}
(𝕍(𝒯2,𝒟,i,τ)ii′′<i𝕍(𝒯1,𝒟,i′′,τ))\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\left({\Large{\mathbb{V}}}(\mathcal{T}_{2},\mathcal{D},i^{\prime},\tau)\cap\bigcap_{i\leq i^{\prime\prime}<i^{\prime}}{\Large{\mathbb{V}}}(\mathcal{T}_{1},\mathcal{D},i^{\prime\prime},\tau)\right)
𝕍(s𝒯,𝒟,i,τ):=\displaystyle{\Large{\mathbb{V}}}(\bigcirc^{s}_{\mathcal{I}}\;\mathcal{T},\mathcal{D},i,\tau):=
{𝕍(𝒯,𝒟,i+1,τ)if i+1<|𝒟|,τ(i+1)(τ(i)+)otherwise (i.e., an empty set)\displaystyle\left\{\begin{tabular}[]{ll}${\Large{\mathbb{V}}}(\mathcal{T},\mathcal{D},i+1,\tau)$&if $i+1{<|\mathcal{D}|},\tau(i+1)\in(\tau(i)+\mathcal{I})$\\ ${\Large{\varnothing}}$&otherwise (i.e., an empty set)\\ \end{tabular}\right.
𝕍(𝒯1𝒯2,𝒟,i,τ):=𝕍(𝒯1,𝒟,i,τ)𝕍(𝒯2,𝒟,i,τ)\displaystyle{\Large{\mathbb{V}}}(\mathcal{T}_{1}\sqcap\mathcal{T}_{2},\mathcal{D},i,\tau):={\Large{\mathbb{V}}}(\mathcal{T}_{1},\mathcal{D},i,\tau)\cap\;{\Large{\mathbb{V}}}(\mathcal{T}_{2},\mathcal{D},i,\tau)
𝕍(𝒯¯,𝒟,i,τ):=𝕍(𝒯,𝒟,i,τ)¯\displaystyle{\Large{\mathbb{V}}}(\overline{\mathcal{T}},\mathcal{D},i,\tau):=\overline{{\Large{\mathbb{V}}}(\mathcal{T},\mathcal{D},i,\tau)}
𝕍(𝐈𝒯,𝒟,i,τ):=𝐈𝕍(𝒯,𝒟,i,τ)\displaystyle{\Large{\mathbb{V}}}(\mathbf{I}\;\mathcal{T},\mathcal{D},i,\tau):=\;\mathbf{I}\;{\Large{\mathbb{V}}}(\mathcal{T},\mathcal{D},i,\tau)

where t+={t′′|t.t′′=t+t}t+\mathcal{I}=\{t^{\prime\prime}\;|\;\exists t^{\prime}\in\mathcal{I}\,.\,t^{\prime\prime}=t+t^{\prime}\}.

The definition of the valuation function 𝕍{\Large{\mathbb{V}}} is straightforward for the spatial operators, i.e., ¯,,𝐈\bar{\;},\sqcap,\mathbf{I}: 𝕍{\Large{\mathbb{V}}} simply applies the corresponding set-theoretic operations, i.e., ¯,,𝐈\bar{\;},\cap,\mathbf{I}. The more interesting cases are the spatio-temporal (𝒰s,s\,\mathcal{U}^{s}_{\mathcal{I}},\bigcirc^{s}_{\mathcal{I}}) operators. The spatial-next operator s𝒯\bigcirc^{s}_{\mathcal{I}}\mathcal{T} first checks whether the next sample (i+1)(i+1) satisfies the timing constraints, i.e., τ(i+1)(τ(i)+)\tau(i+1)\in(\tau(i)+\mathcal{I}), and if so, it returns the set that 𝒯\mathcal{T} represents at time (i+1)(i+1). The spatial until is more involved and is best explained through derived operators. In the following, we define some commonly used derived operators:

  • The spatially for all operator \forall checks if the spatial expression 𝒯\mathcal{T} represents a set which is the same as the universe: 𝒯¬𝒯¯,\leavevmode\resizebox{7.22743pt}{}{\framebox{$\forall$}}\;\mathcal{T}\equiv\neg\;\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\overline{\mathcal{T}},

  • The spatial union operator: 𝒯1𝒯2𝒯1¯𝒯2¯¯,\mathcal{T}_{1}\sqcup\mathcal{T}_{2}\equiv\overline{\overline{\mathcal{T}_{1}}\sqcap\overline{\mathcal{T}_{2}}},

  • The spatial closure operator: 𝐂𝒯𝐈𝒯¯¯,\mathbf{C}\;\mathcal{T}\equiv\overline{\mathbf{I}\;\overline{\mathcal{T}}},

  • The spatial eventually operator: s𝒯𝕌𝒰s𝒯\Diamond^{s}_{\mathcal{I}}\;\mathcal{T}\equiv{\Large{\mathbb{U}}}\;\,\mathcal{U}^{s}_{\mathcal{I}}\;\mathcal{T},

  • The spatial always operator: s𝒯s𝒯¯¯\Box^{s}_{\mathcal{I}}\;\mathcal{T}\equiv\overline{\Diamond^{s}_{\mathcal{I}}\;\overline{\mathcal{T}}}.

Notice that in the definition of the spatial eventually operator, we used the universe set 𝕌{\Large{\mathbb{U}}} as a terminal even though the syntax in Def. II.5 does not explicitly allow for that. The universe (𝕌{\Large{\mathbb{U}}}) and the empty set (\emptyset) can be defined as derived spatial expressions, i.e., 𝕌=pp¯{\Large{\mathbb{U}}}=p\sqcup\bar{p} and =pp¯\emptyset=p\sqcap\bar{p}. Therefore, if we replace 𝕌{\Large{\mathbb{U}}} in the definition of 𝕍{\Large{\mathbb{V}}} for 𝒰s\,\mathcal{U}^{s}_{\mathcal{I}}, we can observe that s\Diamond^{s}_{\mathcal{I}} corresponds to the spatial union of the expression 𝒯\mathcal{T} over the time interval {\mathcal{I}}.
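To make the set-based valuation concrete, the following is a minimal Python sketch of the derived spatial eventually and always operators from Def. II.6. The finite point-set representation of spatial terms, the function names, and the tuple encoding of intervals are our assumptions for illustration; this is not the paper's implementation.

```python
def frames_in_interval(tau, i, interval, n):
    """Indices j with tau(j) in tau(i) + I (the timing window of Def. II.6)."""
    lo, hi = interval
    return [j for j in range(n) if lo <= tau[j] - tau[i] <= hi]

def spatial_eventually(sets, tau, i, interval=(0.0, float("inf"))):
    """Diamond^s_I p: the union of p's sets over the frames in the window."""
    out = set()
    for j in frames_in_interval(tau, i, interval, len(sets)):
        out |= sets[j]
    return out

def spatial_always(sets, tau, i, interval=(0.0, float("inf"))):
    """Box^s_I p: the intersection of p's sets over the frames in the window.
    Caveat: if no frame satisfies the timing constraint, the duality with
    Diamond^s_I yields the whole universe; this sketch returns the empty set
    instead, since the universe is not represented explicitly."""
    js = frames_in_interval(tau, i, interval, len(sets))
    if not js:
        return set()
    out = sets[js[0]].copy()
    for j in js[1:]:
        out &= sets[j]
    return out
```

With the four frames of Example II.2 (timestamps 0, 0.4, 0.8, 1.2), `spatial_always` over the interval [0, 1] intersects only the first three frames, since the last frame falls outside the timing window.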

Example II.2

A simple example is presented in Fig. 3 for a data-stream with 4 frames. The spatial expression sp\Diamond^{s}p corresponds to the union of all the sets represented by pp over time (gray set in Fig. 3) since there are no timing constraints. On the other hand, the spatial expression [0,1]sp\Box^{s}_{[0,1]}\,p corresponds to the intersection of the sets of pp at frames i=0,1,2i=0,1,2 since the last frame (i=3i=3) with τ(3)=1.2\tau(3)=1.2 does not satisfy the timing constraints [0,1][0,1]. The set [0,1]sp\Box^{s}_{[0,1]}\,p is represented as a black box in Fig. 3.

Figure 3: Left: the evolution of a spatial proposition pp over four frames with τ(0)=0\tau(0)=0, τ(1)=0.4\tau(1)=0.4, τ(2)=0.8\tau(2)=0.8, τ(3)=1.2\tau(3)=1.2; Right: Gray: the set resulting from sp\Diamond^{s}p, and Black: the set resulting from [0,1]sp\Box^{s}_{[0,1]}p.

Given the definition of the valuation function 𝕍{\Large{\mathbb{V}}} for QTTM, we can proceed to define the semantics of MTL×S4u\times S4_{u}. Recall that the valuation of spatial expressions returns sets from some topological space. On the other hand, MTL×S4u\times S4_{u} formulas are interpreted over Boolean values True (\top) / False (\bot), i.e., the formulas are satisfied or are not satisfied. To evaluate MTL×S4u\times S4_{u} formulas, we use a valuation function [[φ]][\![\varphi]\!] which takes as an input an MTL×S4u\times S4_{u} formula φ\varphi, a data stream 𝒟\mathcal{D}, a sample ii, and a timestamp function τ\tau, and returns True (\top) or False (\bot).

Definition II.7 (MTL×S4u\times S4_{u} semantics)

Given an MTL×S4u\times S4_{u} formula φ\varphi, a QTTM ={\Large{\mathbb{Q}}}= 𝕋,𝔘,𝒟,τ\langle{\Large{\mathbb{T}}},{\Large{\mathfrak{U}}},\mathcal{D},\tau\rangle, and a frame ii\in\mathbb{N}, the semantics of φ\varphi are defined recursively as follows:

[[𝒯]](𝒟,i,τ):={if 𝕍(𝒯,𝒟,i,τ)otherwise (i.e., an empty set)\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}[\![\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\mathcal{T}]\!](\mathcal{D},i,\tau):=\left\{\begin{tabular}[]{ll}$\top$&if $\;{\Large{\mathbb{V}}}(\mathcal{T},\mathcal{D},i,\tau)\not={\Large{\varnothing}}$\\ $\bot$&otherwise (i.e., an empty set)\\ \end{tabular}\right.}
[[¬ϕ]](𝒟,i,τ):=¬[[ϕ]](𝒟,i,τ)\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}[\![\neg\phi]\!](\mathcal{D},i,\tau):=\neg[\![\phi]\!](\mathcal{D},i,\tau)}
[[ϕ1ϕ2]](𝒟,i,τ):=[[ϕ1]](𝒟,i,τ)[[ϕ2]](𝒟,i,τ)\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}[\![\phi_{1}\wedge\phi_{2}]\!](\mathcal{D},i,\tau):=[\![\phi_{1}]\!](\mathcal{D},i,\tau)\;\wedge[\![\phi_{2}]\!](\mathcal{D},i,\tau)}
[[ϕ1𝒰ϕ2]](𝒟,i,τ):=\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}[\![\phi_{1}\;\,\mathcal{U}_{\mathcal{I}}\;\phi_{2}]\!](\mathcal{D},i,\tau):=}
j{k|τ(k)(τ(i)+)}([[ϕ2]](𝒟,j,τ)ik<j[[ϕ1]](𝒟,k,τ))\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\bigvee_{j\in\{k\in\mathbb{N}\;|\;\tau(k)\in(\tau(i)+\mathcal{I})\}}\bigg{(}[\![\phi_{2}]\!](\mathcal{D},j,\tau)\wedge\bigwedge_{i\leq k<j}[\![\phi_{1}]\!](\mathcal{D},k,\tau)\bigg{)}}
[[ϕ]](𝒟,i,τ):=\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}[\![\bigcirc_{\mathcal{I}}\phi]\!](\mathcal{D},i,\tau):=}
{[[ϕ]](𝒟,i+1,τ)if i+1<|𝒟|,τ(i+1)(τ(i)+)otherwise\displaystyle\;\;\;\left\{\begin{tabular}[]{ll}{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}$[\![\phi]\!](\mathcal{D},i+1,\tau)$}&{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}if $i+1{<|\mathcal{D}|},\tau(i+1)\in(\tau(i)+\mathcal{I})$}\\ {\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}$\bot$}&{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}otherwise}\\ \end{tabular}\right.

Notice that the definitions of the propositional and temporal operators closely match the definitions of the spatial and spatio-temporal operators where set operations (union and intersection) have been replaced by Boolean operations (disjunction and conjunction). Therefore, similar to the spatial expressions, we can define disjunction as φ1φ2¬(¬φ1¬φ2\varphi_{1}\vee\varphi_{2}\equiv\neg(\neg\varphi_{1}\wedge\neg\varphi_{2}), eventually as φ𝒰φ\Diamond_{\mathcal{I}}\varphi\equiv\top\,\mathcal{U}_{\mathcal{I}}\,\varphi, and always as φ¬¬φ\Box_{\mathcal{I}}\varphi\equiv\neg\Diamond_{\mathcal{I}}\neg\varphi. Finally, the constant true is defined as 𝕌\top\equiv\leavevmode\resizebox{7.22743pt}{}{\framebox{$\forall$}}\,{\Large{\mathbb{U}}}.
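The Boolean semantics of the timed until in Def. II.7 can be sketched over a finite trace in the same style. The list-based encoding of traces (one Boolean per frame) is our assumption for illustration:

```python
def mtl_until(phi1, phi2, tau, i, interval=(0.0, float("inf"))):
    """[[phi1 U_I phi2]](i) over a finite Boolean trace (Def. II.7):
    some frame j in the timing window satisfies phi2, and phi1 holds
    at every frame k with i <= k < j."""
    lo, hi = interval
    for j in range(len(phi1)):
        if not (lo <= tau[j] - tau[i] <= hi):
            continue
        if phi2[j] and all(phi1[k] for k in range(i, j)):
            return True
    return False

def mtl_eventually(phi, tau, i, interval=(0.0, float("inf"))):
    """Diamond_I phi is defined as True U_I phi."""
    return mtl_until([True] * len(phi), phi, tau, i, interval)
```

For example, over the timestamps of Example II.2, `mtl_until` with the interval [0.5, 2.5] reproduces the satisfaction pattern of the formula ∃p1 U_[0.5,2.5] ∃p2 discussed earlier, with the non-emptiness checks precomputed into Boolean traces.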

Remark II.8

In some specifications, it is easier to formalize a requirement using frame intervals and to reason over frames rather than time intervals. To highlight this option, we add a tilde on top of the spatio-temporal operators with frame intervals, i.e., in 𝒰~s\tilde{\,\mathcal{U}}^{s}_{\mathcal{I}}, ~s\tilde{\bigcirc}^{s}_{\mathcal{I}}, 𝒰~\tilde{\,\mathcal{U}}_{\mathcal{I}}, and ~\tilde{\bigcirc}_{\mathcal{I}}, the interval \mathcal{I} is a frame interval. When reasoning over frames, τ(i)\tau(i) equals ii, i.e., τ\tau is the identity function.

Example II.3

Revisiting Example II.2, we can now introduce and explain some MTL×S4u\times S4_{u} formulas. The formulas sp\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\Diamond^{s}p and [0,1]sp\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\Box^{s}_{[0,1]}\,p evaluate to true since the sets sp\Diamond^{s}p and [0,1]sp\Box^{s}_{[0,1]}\,p are not empty. On the other hand, the formula sp\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\Box^{s}\,p is false because the set that corresponds to sp\Box^{s}\,p is empty (in Fig. 3 there is no common subset for pp across all frames). Notice that the MTL×S4u\times S4_{u} formula sp\Box^{s}\,\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\,p is true since the set pp is not empty at every frame. Finally, considering time stamps versus frames, the set [0,1]sp\Box^{s}_{[0,1]}\,p, which considers timing constraints, is the same as the set ~[0,2]sp\tilde{\Box}^{s}_{[0,2]}\,p, which considers frame constraints.

III Problem Definition

Given a data stream 𝒟\mathcal{D} as defined in II-A, the goal of this paper is to:

  1. formulate object properties in such a data stream, and

  2. monitor the satisfaction of formulas over the stream.

III-A Assumptions

Given a data stream 𝒟\mathcal{D}, we assume that a tracking perception algorithm uniquely assigns identifiers to classified objects for all the frames (see the work by Gordon et al. (2018)).

In order to relax this assumption, we would need to enable probabilistic reasoning within the spatio-temporal framework, which is outside the scope of this work. However, our framework can perform basic sanity checks on the relative positioning of objects over a series of frames in order to detect misidentified objects.

This assumption is only needed for certain types of requirements; for the rest, it can be lifted.

III-B Overall Solution

We define the syntax and semantics of Spatio-Temporal Perception Logic (STPL) over sets of points in a topological space, with quantifiers over objects. We build a monitoring algorithm on top of TPTL, MTL, and S4uS4_{u} and demonstrate the practicality, expressivity, and efficiency of the algorithm through examples. This is a powerful language and monitoring algorithm with many applications in the verification and testing of complex perception systems. Our proposed language and its monitoring algorithm differ from prior work, as discussed in the introduction (see also related work in Section VIII) and briefly summarized below. STPL

  • reasons over spatial and temporal properties of objects through functions and relations;

  • extends the reasoning of prior set-based logics by equipping them with an efficient offline monitoring algorithm and tools;

  • focuses on expressing functional requirements for perception systems; and

  • is supported by open source monitoring tools.

IV Spatio-Temporal Perception Logic

In this section, we present the syntax and semantics of the Spatio-Temporal Perception Logic (STPL). Our proposed logic evaluates the geometric relations of objects in a data stream and their evolution over time. Theoretically, any set-based topological space can be used in our logic. However, we focus on topological spaces and geometric operations relevant to applications related to perception.

Next, we define and interpret STPL formulas over data-object streams. We define our topological space to be axis-aligned rectangles in 2D images, or arbitrary polyhedral sets (potentially boxes) in 3D environments. An axis-aligned rectangle can be represented by a set of two points that correspond to two non-adjacent corners of the rectangle, with each of its four sides parallel to an axis. A polyhedron is a set formed by the intersection of a finite number of closed half spaces. We assume that this information is contained in the perception data stream as annotations or attributes for each object or region identified by the perception system. For instance, (𝒟(i),id).𝒞𝒮\mathcal{R}(\mathcal{D}(i),id).\mathcal{CS} may store a rectangular axis-aligned convex polygon that is associated with an object idid in the ii’th frame. Alternatively, (𝒟(i),id).𝒞𝒮\mathcal{R}(\mathcal{D}(i),id).\mathcal{CS} may store a convex polyhedron. Without loss of generality, we will typically use definitions for 2D spaces (image spaces), and later in the case study, we will use bounding volumes (3D spaces).

IV-A STPL Syntax

The syntax of STPL is based on the syntax of TQTL (Dokhanchi et al. (2018a)) and 𝒮4u{\mathcal{S}}4_{u} (Gabelaia et al. (2005)).

In the context of STPL, the spatial propositions are the symbols that correspond to objects or regions in the perception dataset. We use the function symbol σ\mathcal{\sigma} to map these objects to their corresponding sets. Hence, the syntax of spatial terms 𝒯\mathcal{T} in STPL is the same as in MTL×S4u\times S4_{u}, but the function symbols σ\mathcal{\sigma} replace the spatial propositions pp. In addition, we add grammar support under production rule 𝒜\mathcal{A} for area computation for spatial terms. Area computation is needed in standard performance metrics for 2D vision algorithms, for example, to compute intersection over union (IoU). In the same spirit, we can also support volume computation for spatial terms in 3D spaces.

For many requirements, we also need to access other attributes of the objects in the datastream, e.g., classes, probabilities, estimated velocities, or even compute some basic functions on object attributes, e.g., distances between bounding boxes. For usability, we define some basic functions to retrieve data and compute the desired properties. The functions which are currently supported are: object class (CC), class membership probability (PP), latitude (LatLat) and longitude (LonLon) coordinates of a point (CRTCRT), area of bounding box (AreaArea), and distance (DistDist) between two points (CRTCRT). We provide the syntax for using such functions under the production rule Θ\Theta.

This set of functions is sufficient to demonstrate the generality of our logic. However, a more general approach would be to define arithmetic expressions over user-definable functions that can retrieve any desired data from the data stream. Support for such functionality is among the goals of future software releases.
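To illustrate the kind of data the functions above retrieve, here is a hedged Python sketch of area and distance computations on axis-aligned 2D boxes with CRT-style point specifiers (left-most, right-most, top-most, bottom-most, center). The class and method names are our assumptions for illustration, not the STPL tool's API; we also assume image coordinates, where the y-axis grows downward, so "top" maps to the smaller y value.

```python
import math

class Box:
    """Axis-aligned bounding box given by two non-adjacent corners (Sec. IV)."""
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1 = min(x1, x2), min(y1, y2)
        self.x2, self.y2 = max(x1, x2), max(y1, y2)

    def corner(self, which="CT"):
        """Point specifiers loosely modeled on CRT: LM, RM, TM, BM, CT."""
        cx, cy = (self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2
        return {"LM": (self.x1, cy), "RM": (self.x2, cy),
                "TM": (cx, self.y1), "BM": (cx, self.y2),
                "CT": (cx, cy)}[which]

    def area(self):
        """Area of the box (cf. the Area function)."""
        return (self.x2 - self.x1) * (self.y2 - self.y1)

def dist(box_a, crt_a, box_b, crt_b):
    """Euclidean distance between chosen points of two boxes (cf. Dist)."""
    (xa, ya), (xb, yb) = box_a.corner(crt_a), box_b.corner(crt_b)
    return math.hypot(xa - xb, ya - yb)
```

For instance, the distance between the centers of two 2×2 boxes whose left edges are three units apart is exactly 3.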

Definition IV.1 (STPL Syntax over Discrete-Time Signals)

Let VtV_{t} and VoV_{o} be sets of time variables and object variables, respectively. Assume that xVtx\in V_{t} is a time variable, idVoid\in V_{o} is an object variable, \mathcal{I} is any non-empty interval of 0\;\mathbb{R}_{\geq 0} over time. The syntax for Spatio-Temporal Perception Logic (STPL) formulas is provided by the following grammar starting from the production rule ϕ\phi:

The syntax for spatial terms is:

𝒯\displaystyle\mathcal{T} ::=σ(id)|𝒯¯|𝒯𝒯|𝐈𝒯|𝒯𝒰s𝒯|s𝒯\displaystyle::=\mathcal{\sigma}(id)\;|\;\overline{\mathcal{T}}\;|\;\mathcal{T}\sqcap\mathcal{T}\;|\;\mathbf{I}\;{\mathcal{T}}\;|\;\mathcal{T}\;\,\mathcal{U}^{s}_{\mathcal{I}}\;\mathcal{T}\;|\;\bigcirc^{s}_{\mathcal{I}}\;\mathcal{T}

The syntax for the functions that compute the area of a spatial term are:

𝒜\displaystyle\mathcal{A} ::=Area(𝒯)r|Area(𝒯)r×Area(𝒯)\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;::=Area(\mathcal{T})\sim r\;|\;Area(\mathcal{T})\sim r\times Area(\mathcal{T})}

The atomic propositions that represent coordinates for a bounding box are:

CRT ::=LM|RM|TM|BM|CT\displaystyle::=\mbox{\footnotesize{LM}}\;|\;\mbox{\footnotesize{RM}}\;|\;\mbox{\footnotesize{TM}}\;|\;\mbox{\footnotesize{BM}}\;|\;\mbox{\footnotesize{CT}}

The syntax for spatio-temporal functions are:

Θ::=\displaystyle\Theta::= Dist(id,CRT,id,CRT)r|\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;Dist(id,\mbox{\footnotesize{CRT}},id,\mbox{\footnotesize{CRT}})\sim r\;|\;}
Lat(id,CRT)r|Lon(id,CRT)r|\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;Lat(id,\mbox{\footnotesize{CRT}})\sim r\;|\;Lon(id,\mbox{\footnotesize{CRT}})\sim r\;|\;}
Lat(id,CRT)r×Lat(id,CRT)|\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;Lat(id,\mbox{\footnotesize{CRT}})\sim r\times Lat(id,\mbox{\footnotesize{CRT}})\;|\;}
Lon(id,CRT)r×Lon(id,CRT)|\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;Lon(id,\mbox{\footnotesize{CRT}})\sim r\times Lon(id,\mbox{\footnotesize{CRT}})\;|\;}
Lat(id,CRT)r×Lon(id,CRT)|\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;Lat(id,\mbox{\footnotesize{CRT}})\sim r\times Lon(id,\mbox{\footnotesize{CRT}})\;|\;}
Area(id)r|Area(id)r×Area(id)|\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;Area(id)\sim r\;|\;Area(id)\sim r\times Area(id)\;|\;}
C(id)=c|C(id)=C(id)|\displaystyle\;C(id)=c\;|\;C(id)=C(id)\;|\;
P(id)a|P(id)r×P(id)\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\;P(id)\sim a\;|\;P(id)\sim{r}\times P(id)}

The syntax for the STPL formula is:

ϕ::=\displaystyle\phi::= |x.ϕ|id@x.ϕ|id.ϕ|id=id|¬ϕ|ϕϕ|\displaystyle\;\top\;|\;x.\phi\;|\;\exists id@x.\phi\;|\;\exists id.\phi\;|\;id=id\;|\;\neg\phi\;|\;\phi\vee\phi\;|\;
τxt|xn|x%cn|\displaystyle\tau-x\sim t\;|\;\mathcal{F}-x\sim n\;|\;\mathcal{F}-x\;\%\;c\sim n\;|\;
ϕ|ϕ𝒰ϕ|ϕ|ϕ𝒮ϕ|\displaystyle\bigcirc\phi\;|\;\phi\;\,\mathcal{U}\;\phi\;|\;\varodot\phi\;|\;\phi\;\mathcal{S}\;\phi\;|\;
Θ|𝒯|𝒜\displaystyle\Theta\;|\;\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\mathcal{T}\;|\;\mathcal{A}

where \top is the symbol for true, {<,>,,,=}\sim\in\{<,>,\geq,\leq,=\}, and r0,cr\in\mathbb{R}_{\geq 0},c\in\mathbb{N}, and a[0,1]a\in[0,1] are constants. Here, σ:VoΠ\mathcal{\sigma}:V_{o}\rightarrow\Pi is a function that maps object variables into spatial propositions.

The syntax of STPL is substantially different from the syntax of MTL×S4u\times S4_{u}. In the STPL syntax, x.ϕx.\phi stands for the freeze time quantifier. When this expression is evaluated, the corresponding frame ii is stored in the clock variable xx. The prefix id\exists id in the rule id@x.φ\exists id@x.\varphi or the rule id.φ\exists id.\varphi is the Existential object quantifier. When the formula id@x.φ(id)\exists id@x.\varphi(id) is evaluated, then it is satisfied (true) when there is an object idid at frame/time xx that makes φ(id)\varphi(id) true. The formula id.φ(id)\exists id.\varphi(id) is also searching for an object that makes φ(id)\varphi(id) true, but in this case we do not need to refer to the time that the object was selected. Similarly, the Universal object quantifier is defined as id@x.ϕ¬(id@x.¬ϕ)\forall id@x.\phi\equiv\neg(\exists id@x.\neg\phi) or id.ϕ¬(id.¬ϕ)\forall id.\phi\equiv\neg(\exists id.\neg\phi). The universal quantifier requires that all the objects in a frame satisfy the subformula φ\varphi. Notice that the syntax of STPL cannot enforce that all uses of time or object variables are bound, i.e., declared before use. In practice, such errors can be detected after the parse tree of the formula has been constructed through a simple tree traversal.

In contrast to MTL×S4u\times S4_{u}, the timing constraints in STPL are not annotating the temporal operators, but rather they are explicitly stated as arithmetic expressions in the formulas. For example, consider the specification “There must exist a car now which will have class membership probability greater than 90% within 1.5 sec”. With timing constraints annotating the temporal operators, the requirement would be:

id.(C(id)=car[0,1.5]P(id)>0.9).\exists id.\big{(}C(id)=\texttt{car}\wedge\Diamond_{[0,1.5]}P(id)>0.9\big{)}.

Since STPL uses time variables and arithmetic expressions to express timing constraints, the same requirement may be written as:

id@x.(C(id)=car(τx1.5P(id)>0.9)).\exists id@x.\big{(}C(id)=\texttt{car}\wedge\Diamond(\tau-x\leq 1.5\wedge P(id)>0.9)\big{)}.

The use of time variables enables the requirements engineer to define more complex timing requirements and to refer to objects at specific instances in time. The time, frame, and object constraints of STPL formulas are of the form τx>r\tau-x>r, x>n\mathcal{F}-x>n, and id=idid=id, respectively. We use τx\tau-x and x\mathcal{F}-x to refer to the elapsed time and the elapsed number of frames, respectively. Note that we use the same variable to refer to the frozen time and frame, but distinguish their type based on how they are used in the constraints (τ\tau represents the current time, and \mathcal{F} represents the current frame number). The operator %\% in the expression x%cn\mathcal{F}-x\;\%\;c\sim n is used to specify periodic constraints. For reasoning over past time, we add the \varodot (previous) and 𝒮\mathcal{S} (since) operators as duals of the \bigcirc and 𝒰\,\mathcal{U} operators, respectively.
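To illustrate how a monitor might evaluate the freeze-time requirement above, here is a minimal sketch of the formula ∃id@x.(C(id)=car ∧ ◇(τ−x ≤ 1.5 ∧ P(id) > 0.9)) with the outer quantifier evaluated at frame 0. The data layout (a list of per-frame dictionaries mapping object identifiers to class/probability pairs) is our assumption, not the tool's data-stream format:

```python
def eval_freeze_example(stream, tau):
    """Check: there exists a car at frame 0 whose class-membership
    probability exceeds 0.9 within 1.5 s of the frozen time.
    stream[i] maps object id -> (class, probability); tau[i] is the
    timestamp of frame i."""
    frozen = tau[0]  # the freeze quantifier x. stores the current time
    for obj_id, (cls, _) in stream[0].items():
        if cls != "car":
            continue
        # Eventually: some frame whose elapsed time tau - x is within bound
        for i in range(len(stream)):
            if tau[i] - frozen <= 1.5 and obj_id in stream[i]:
                if stream[i][obj_id][1] > 0.9:
                    return True  # witness object found
    return False
```

The elapsed-time check `tau[i] - frozen <= 1.5` is exactly the constraint τ−x ≤ 1.5; an annotated-operator encoding would instead bake the bound into the eventually operator itself.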

For STPL formulas ψ\psi, ϕ\phi, we define ψϕ¬(¬ψ¬ϕ)\psi\wedge\phi\equiv\neg(\neg\psi\vee\neg\phi), ¬\bot\equiv\neg\top (False), ψϕ¬ψϕ\psi\rightarrow\phi\equiv\neg\psi\vee\phi (ψ\psi Implies ϕ\phi), ϕψ¬(¬ϕ𝒰¬ψ)\phi\;\mathcal{R}\;\psi\equiv\neg(\neg\phi\;\,\mathcal{U}\;\neg\psi) (ϕ\phi releases ψ\psi), ϕ¯ψϕ(ϕψ)\phi\;\overline{\mathcal{R}}\;\psi\equiv\phi\;\mathcal{R}\;(\phi\vee\psi) (ϕ\phi non-strictly releases ψ\psi), ψ𝒰ψ\Diamond\psi\equiv\top\;\,\mathcal{U}\;\psi (Eventually ψ\psi), ψ¬¬ψ\Box\psi\equiv\neg\Diamond\neg\psi (Always ψ\psi) using syntactic manipulation.

Remark IV.2

In principle, in STPL, it is easy to add multiple classes and classification probabilities for each object. We would simply need to replace in Def. IV.1 the rules

C(id)=c|P(id)a|P(id)r×P(id)C(id)=c\;|\;P(id)\sim a\;|\;P(id)\sim{r}\times P(id)

in Θ\Theta with

cC(id)|P(id,c)a|P(id,c)r×P(id,c).c\in C(id)\;|\;P(id,c)\sim a\;|\;P(id,c)\sim r\times P(id,c).

where C(id)C(id) is now a function which returns a set of classes for the object. In order to write meaningful requirements over multiple classes, we would also need to introduce quantifiers over arbitrary sets. That is, we should be able to write a formula such as “there exists at least one class with probability greater than aa”: cC(Id).P(Id,c)>a\exists c\in C(Id)\,.\,P(Id,c)>a. This is within the scope of First Order Temporal Logics (e.g., see Basin et al. (2015) and Havelund et al. (2020)) and we plan to consider such additions in the future.

In general, the monitoring problem for STPL formulas is PSPACE-hard (since STPL subsumes Timed Proposition Temporal Logic (TPTL); see Markey and Raskin (2006)). However, there exists a fragment of STPL which is efficiently monitorable (in polynomial time).

Definition IV.3 (Almost Arbitrarily Nested Formula)

An Almost Arbitrarily Nested (AAN) formula is an STPL formula in which no time or object variables are used in the scope of another freeze time operator.

For example,

φ1\displaystyle\varphi_{1} =x1.(x1>2x2.(τx2<0.01))\displaystyle=x_{1}.\Box\big{(}\mathcal{F}-x_{1}>2\wedge x_{2}.\Diamond(\tau-x_{2}<0.01)\big{)}
φ2\displaystyle\varphi_{2} =Id1@x.(Id2.(Id1=Id2)(τx>2))\displaystyle=\Box\exists Id_{1}@x.\Diamond\big{(}\Box\forall Id_{2}.(Id_{1}=Id_{2})\wedge(\tau-x>2)\big{)}

are AAN formulas. In formula φ1\varphi_{1}, the time variable x1x_{1} is not used in the scope of the x2x_{2} freeze time operator. In formula φ2\varphi_{2}, there are no nested quantifier/freeze time operators. Therefore, they are both AAN STPL formulas. On the other hand,

φ3=\displaystyle\varphi_{3}= x1.(x2.(x1>2τx2<0.01))\displaystyle x_{1}.\big{(}\Box x_{2}.\Diamond(\mathcal{F}-x_{1}>2\wedge\;\tau-x_{2}<0.01)\big{)}
φ4=\displaystyle\varphi_{4}= Id1@x1.Id2@x2.Id3.\displaystyle\Box\forall Id_{1}@x_{1}.\bigcirc\forall Id_{2}@x_{2}.\bigcirc\Box\forall Id_{3}.
(P(Id3)>P(Id1)P(Id3)<P(Id2))\displaystyle\big{(}P(Id_{3})>P(Id_{1})\wedge P(Id_{3})<P(Id_{2})\big{)}

are not AAN formulas. That is because in φ3\varphi_{3}, the variable x1x_{1} is used in the scope of the nested freeze time operator “x2.x_{2}.”. In φ4\varphi_{4}, Id1Id_{1} is used in the scope of the second nested quantifier operator “Id2@x2.\forall Id_{2}@x_{2}.”.

Dokhanchi et al. (2016) presented an efficient monitoring algorithm for TPTL formulas with an arbitrary number of independent time variables. Our definition of AAN formulas is adapted from their definition of encapsulated TPTL formulas, i.e., TPTL formulas that contain only independent time variables.

Remark IV.4

Our definition of AAN formulas is syntactic. However, there can be formulas that are not syntactically AAN but are semantically equivalent to AAN formulas; a precise mathematical characterization of such formulas is beyond the scope of this paper. In the following, we show examples of AAN formulas and of formulas that are equivalent to AAN formulas. For example,

φ5=x.y.((τx>1)(τy>2))\varphi_{5}=\Box\;x.\Box\;y.\big{(}(\tau-x>1)\implies\Box(\tau-y>2)\big{)}

which is not an AAN formula can be rewritten as AAN

φ5=x.((τx>1)y.(τy>2)).{\varphi_{5}}^{\prime}=\Box\;x.\Box\big{(}(\tau-x>1)\implies y.\Box(\tau-y>2)\big{)}.

On the other hand, although

φ6=\displaystyle\varphi_{6}= id1@x.id2@y.\displaystyle\Box\forall id_{1}@x.\Box\forall id_{2}@y.
((C(id1)=C(id2))(P(id2)>0.9))\big{(}(C(id_{1})=C(id_{2}))\implies(P(id_{2})>0.9)\big{)}

is not an AAN formula, our monitoring tool supports it since the clock variables are not used. That is, it can be written as:

φ6=\displaystyle\varphi_{6}^{\prime}= id1@x.id2.\displaystyle\Box\forall id_{1}@x.\Box\forall id_{2}.
((C(id1)=C(id2))(P(id2)>0.9))\displaystyle\big{(}(C(id_{1})=C(id_{2}))\implies(P(id_{2})>0.9)\big{)}

IV-B STPL Semantics

Before getting into the semantics of STPL, we present the spatio-temporal function definitions that are used in the production rules 𝒜\mathcal{A} and Θ\Theta. The definitions of these functions are independent of the semantics of the STPL language; therefore, we can extend them to increase the expressivity of the language.

IV-B1 Spatio-Temporal Functions

We define functions ff as follows:

  • Class of an object:
    𝒇𝑪(𝒊𝒅,𝓓,𝒊,ϵ,𝜻)\boldsymbol{f_{C}(id,\mathcal{D},i,\epsilon,\zeta)} returns (𝒟(k),ϵ(id)).Class\mathcal{R}(\mathcal{D}(k),\epsilon(id)).Class as the class of the ϵ(id)\epsilon(id) object in the kk’th frame, where kζ(id)k\leftarrow\zeta(id) if ζ(id)\zeta(id) is specified, and kik\leftarrow i otherwise.

  • Probability of a classified object:
    𝒇𝑷(𝒊𝒅,𝓓,𝒊,ϵ,𝜻)\boldsymbol{f_{P}(id,\mathcal{D},i,\epsilon,\zeta)} returns (𝒟(k),ϵ(id)).Prob\mathcal{R}(\mathcal{D}(k),\epsilon(id)).Prob as the probability of the ϵ(id)\epsilon(id) object in the kk’th frame, where kζ(id)k\leftarrow\zeta(id) if ζ(id)\zeta(id) is specified, and kik\leftarrow i otherwise.

  • Distance between two points from two different objects:
    𝒇𝑫𝒊𝒔𝒕(𝒊𝒅𝒋,𝒊𝒅𝒌,CRT𝟏,CRT𝟐,𝓓,𝒊,ϵ,𝜻)\boldsymbol{f_{Dist}(id_{j},id_{k},\mbox{\footnotesize{CRT}}_{1},\mbox{\footnotesize{CRT}}_{2},\mathcal{D},i,\epsilon,\zeta)} computes and returns the Euclidean Distance between the CRT1\mbox{\footnotesize{CRT}}_{1} point of the ϵ(idj)\epsilon(id_{j}) object and CRT2\mbox{\footnotesize{CRT}}_{2} point of the ϵ(idk)\epsilon(id_{k}) object in the k1k_{1}’th and k2k_{2}’th frames, respectively; and we have k1ζ(idj)k_{1}\leftarrow\zeta(id_{j}) if ζ(idj)\zeta(id_{j}) is specified, and k1ik_{1}\leftarrow i otherwise, and k2ζ(idk)k_{2}\leftarrow\zeta(id_{k}) if ζ(idk)\zeta(id_{k}) is specified, and k2ik_{2}\leftarrow i otherwise.

  • Lateral distance of a point that belongs to an object in a coordinate system (for image space, see section II-C):
    𝒇𝑳𝑨𝑻(𝒊𝒅,CRT,𝓓,𝒊,ϵ,𝜻)\boldsymbol{f_{LAT}(id,\mbox{\footnotesize{CRT}},\mathcal{D},i,\epsilon,\zeta)} computes and returns the Lateral Distance of the CRT point of the ϵ(id)\epsilon(id) object in the kk’th frame from the Longitudinal axis, where kζ(id)k\leftarrow\zeta(id) if ζ(id)\zeta(id) is specified, and kik\leftarrow i otherwise.

  • Longitudinal distance of a point that belongs to an object in a coordinate system (for image space, see section II-C):
    𝒇𝑳𝑶𝑵(𝒊𝒅,CRT,𝓓,𝒊,ϵ,𝜻)\boldsymbol{f_{LON}(id,\mbox{\footnotesize{CRT}},\mathcal{D},i,\epsilon,\zeta)} computes and returns the Longitudinal Distance of the CRT point of the ϵ(id)\epsilon(id) object in the kk’th frame from the Lateral axis, where kζ(id)k\leftarrow\zeta(id) if ζ(id)\zeta(id) is specified, and kik\leftarrow i otherwise.

  • Area of a region:
    𝒇𝑨𝒓𝒆𝒂(𝓣)\boldsymbol{f_{Area}(\mathcal{T})} computes and returns the area of a spatial term 𝒯\mathcal{T} if 𝒯\mathcal{T} is specified; otherwise, it returns unspecified.

In the above functions, we use identifier variables to refer to objects in a data stream. Some parameters are point specifiers that choose a single point from all the points that belong to an object. The only function with a spatial term as an argument is the area function, which calculates the area of a 2D bounding box. Similar reasoning for a 3D geometric shape (polyhedra in general) using a volume function is possible. This is supported by our open-source software STPL (2023), but we do not provide the formal definitions here for brevity of presentation.

All the functions have 𝒟\mathcal{D}, ii, ϵ\epsilon and ζ\zeta as arguments. Functions need the data stream 𝒟\mathcal{D} to access and retrieve perception data, the frame number ii to access a specific frame in time (the “current” frame), a data structure (typically referred to as the “environment”) ϵ\epsilon to store and retrieve the values assigned to variables (both for time and objects), and, finally, a data structure ζ\zeta to retrieve the time/frame at which an object was selected through quantification. The data structure ζ\zeta is used to distinguish whether the time at which an object was selected is needed or not (see the semantics for \exists in Section IV-B2). For example, if we would like to store in the object variable Id1Id_{1} the value kk as the identifier of an object in a frame, and to store in the frame variable x the value ii as the frozen frame, then we would write

ϵ[Id1k]\epsilon[Id_{1}\leftarrow k]\qquad and ϵ[xi]\qquad\epsilon[x\leftarrow i]

respectively. Similarly, we write

ζ[Id1i]\zeta[Id_{1}\leftarrow i]

to denote that we store the frame number ii at which the object variable Id1Id_{1} was set. The initial data structures (environments) ϵ0\epsilon_{0} and ζ0\zeta_{0} are empty. That is, no information is stored for any object identifier or clock variable. The environment ζ\zeta is needed for the comparison of objects, probabilities, etc., over different points in time, and the environment ϵ\epsilon is needed for the comparison of time/frame constraints and the quantification of the object variables.

Note that the above functions are application dependent, and we can add to this list based on future needs. In the next section, when we introduce an example of 3D space reasoning, we present a new spatio-temporal function.
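To make the role of the environments concrete, the following is a minimal sketch of how data-access functions such as 𝒇𝑪 and 𝒇𝑫𝒊𝒔𝒕 could be realized. The dictionary-based stream layout, the field names ("class", "center"), and the fixing of the CRT point specifiers to box centers are all illustrative assumptions, not the implementation used by our tool.

```python
# Hedged sketch of the data-access functions f_C and f_Dist from Sec. IV-B1.
# D is a toy stand-in for the data stream, eps for epsilon, zeta for the
# frame-recording environment; all names and record fields are invented.
import math

def resolve_frame(var, i, zeta):
    """Use the frozen frame zeta(var) if it was recorded, else the current frame i."""
    k = zeta.get(var)
    return i if k is None else k

def f_C(var, D, i, eps, zeta):
    """Class of the object bound to `var` at its resolved frame."""
    k = resolve_frame(var, i, zeta)
    return D[k][eps[var]]["class"]

def f_Dist(var_j, var_k, D, i, eps, zeta):
    """Euclidean distance between the points of two bound objects; the CRT
    point specifiers are fixed to bounding-box centers in this sketch."""
    k1 = resolve_frame(var_j, i, zeta)
    k2 = resolve_frame(var_k, i, zeta)
    x1, y1 = D[k1][eps[var_j]]["center"]
    x2, y2 = D[k2][eps[var_k]]["center"]
    return math.hypot(x2 - x1, y2 - y1)

# Toy two-frame stream: object 1 moves between frames; object 2 stays put.
D = [
    {1: {"class": "Car", "center": (0.0, 0.0)},
     2: {"class": "Pedestrian", "center": (3.0, 4.0)}},
    {1: {"class": "Car", "center": (1.0, 0.0)},
     2: {"class": "Pedestrian", "center": (3.0, 4.0)}},
]
eps = {"id1": 1, "id2": 2}
zeta = {"id1": 0}   # id1 was frozen at frame 0; id2 follows the current frame
```

Note how `zeta` makes the distance comparison span two different frames: `id1` is evaluated at its frozen frame 0 while `id2` is evaluated at the current frame.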

Definition IV.5 (Semantics of STPL)

Consider 𝕄={\Large{\mathbb{M}}}= 𝕋,\langle{\Large{\mathbb{T}}}, 𝕍,{\Large{\mathbb{V}}}, 𝒟,\mathcal{D}, ϵ,\epsilon, ζ\zeta, τ\tau\rangle as a topological temporal model in which 𝕋{\Large{\mathbb{T}}} is a topological space, 𝕍{\Large{\mathbb{V}}} is a spatial valuation function, 𝒟\mathcal{D} is a data-object stream, ζ:Vo{}\zeta:V_{o}\rightarrow\mathbb{N}\cup\{{\bot}\} is a function that records the frame at which each object variable was bound (or \bot if none), and ϵ:VtVo\epsilon:V_{t}\cup V_{o}\rightarrow\mathbb{N} is the variable-evaluation environment. Here, ii\in\mathbb{N} is the index of the current frame, τ:+\tau:\mathbb{N}\rightarrow\mathbb{R^{+}} is a mapping from frame numbers to their physical times, ϕ,ϕ1,ϕ2STPL\phi,\phi_{1},\phi_{2}\in STPL (i.e., formulas belonging to the language of the grammar of STPL), VtV_{t} is a set of time variables, and VoV_{o} is a set of object variables. The quality value of a formula ϕ\phi with respect to 𝒟\mathcal{D} at frame i under the evaluations ϵ\epsilon and ζ\zeta is recursively assigned as follows:

IV-B2 Semantics of Temporal Operators

[[]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![\top]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\top
[[x.ϕ]](𝒟,i,ϵ,ζ,τ):=[[ϕ]](𝒟,i,ϵ[xi],ζ,τ)\displaystyle[\![x.\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=[\![\phi]\!](\mathcal{D},i,\epsilon[x\Leftarrow i],\zeta,\tau)
[[id@x.ϕ]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![\exists id@x.\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
k𝒪(𝒟(i))([[ϕ]](𝒟,i,ϵ[idk,xi],ζ[idi]))\displaystyle\;\;\;\;\;\bigvee_{k\in\mathcal{OI}(\mathcal{D}(i))}\Big{(}[\![\phi]\!](\mathcal{D},i,\epsilon[id\Leftarrow k,x\Leftarrow i],\zeta[id\Leftarrow i])\Big{)}
[[id.ϕ]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![\exists id.\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
k𝒪(𝒟(i))([[ϕ]](𝒟,i,ϵ[idk],ζ[id]))\displaystyle\;\;\;\;\;\bigvee_{k\in\mathcal{OI}(\mathcal{D}(i))}\Big{(}[\![\phi]\!](\mathcal{D},i,\epsilon[id\Leftarrow k],\zeta[id\Leftarrow\bot])\Big{)}
[[τxn]](𝒟,i,ϵ,ζ,τ):={ if τ(i)τ(x)n otherwise \displaystyle[\![\tau-x\sim n]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}\tau(i)-\tau(x)\sim n\\ \bot\mbox{ otherwise }\\ \end{array}\right.
[[xn]](𝒟,i,ϵ,ζ,τ):={ if ixn otherwise \displaystyle[\![\mathcal{F}-x\sim n]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}i-x\sim n\\ \bot\mbox{ otherwise }\\ \end{array}\right.
[[x%cn]](𝒟,i,ϵ,ζ,τ):={ if (ix)%cn otherwise \displaystyle[\![\mathcal{F}-x\;\%\;c\sim n]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}(i-x)\;\%\;c\sim n\\ \bot\mbox{ otherwise }\\ \end{array}\right.
[[idj=idk]](𝒟,i,ϵ,ζ,τ):={ if ϵ(idj)=ϵ(idk) otherwise \displaystyle[\![id_{j}=id_{k}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}\epsilon(id_{j})=\epsilon(id_{k})\\ \bot\mbox{ otherwise }\\ \end{array}\right.
[[¬ϕ]](𝒟,i,ϵ,ζ,τ):=¬[[ϕ]](𝒟,i,ϵ,ζ,τ)\displaystyle[\![\neg\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\neg[\![\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau})
[[ϕ1ϕ2]](𝒟,i,ϵ,ζ,τ):=[[ϕ1]](𝒟,i,ϵ,ζ,τ)\displaystyle[\![\phi_{1}\vee\phi_{2}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=[\![\phi_{1}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau})\;\vee
[[ϕ2]](𝒟,i,ϵ,ζ,τ)\displaystyle\qquad\qquad\qquad\qquad\qquad\quad[\![\phi_{2}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau})
[[ϕ1𝒰ϕ2]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![\phi_{1}\;\,\mathcal{U}\;\phi_{2}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
ij([[ϕ2]](𝒟,j,ϵ,ζ,τ)ik<j[[ϕ1]](𝒟,k,ϵ,ζ,τ))\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\bigvee_{i\leq j}\bigg{(}[\![\phi_{2}]\!](\mathcal{D},j,\epsilon,\zeta,{\tau})\wedge\bigwedge_{i\leq k<j}[\![\phi_{1}]\!](\mathcal{D},k,\epsilon,\zeta,{\tau})\bigg{)}
[[ϕ]](𝒟,i,ϵ,ζ,τ):=[[ϕ]](𝒟,i+1,ϵ,ζ,τ)\displaystyle[\![\bigcirc\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=[\![\phi]\!](\mathcal{D},i+1,\epsilon,\zeta,{\tau})
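The temporal clauses above can be sketched directly for Boolean-valued subformulas over a finite stream. Below is a minimal sketch of the until and next semantics; the frame predicates `phi1` and `phi2` are illustrative stand-ins for the recursive evaluations [[ϕ]].

```python
# Minimal sketch of the discrete-time semantics of the (unbounded) until and
# next operators from Def. IV.5, over a finite data stream of n_frames frames.

def sem_until(phi1, phi2, n_frames, i):
    """[[phi1 U phi2]](i): some j >= i satisfies phi2 while phi1 holds on [i, j)."""
    return any(
        phi2(j) and all(phi1(k) for k in range(i, j))
        for j in range(i, n_frames)
    )

def sem_next(phi, n_frames, i):
    """[[O phi]](i) := [[phi]](i+1); false past the end of a finite stream."""
    return i + 1 < n_frames and phi(i + 1)

# Example: phi1 holds on frames 0..2, phi2 first holds at frame 2.
phi1 = lambda i: i <= 2
phi2 = lambda i: i == 2
```

The disjunction over `j` and conjunction over `k` mirror the big-vee and big-wedge in the until clause term by term.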

IV-B3 Semantics of Past-Time Operators

Many applications have requirements that refer to the past. Past-time operators are also particularly relevant to online monitoring algorithms.

[[ϕ1𝒮ϕ2]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![\phi_{1}\;\mathcal{S}\;\phi_{2}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
ij([[ϕ2]](𝒟,j,ϵ,ζ,τ)j<ki[[ϕ1]](𝒟,k,ϵ,ζ,τ))\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\bigvee_{i\geq j}\bigg{(}[\![\phi_{2}]\!](\mathcal{D},j,\epsilon,\zeta,{\tau})\wedge\bigwedge_{j<k\leq i}[\![\phi_{1}]\!](\mathcal{D},k,\epsilon,\zeta,{\tau})\bigg{)}
[[ϕ]](𝒟,i,ϵ,ζ,τ):=[[ϕ]](𝒟,i1,ϵ,ζ,τ)\displaystyle[\![\varodot\;\phi]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=[\![\phi]\!](\mathcal{D},i-1,\epsilon,\zeta,{\tau})

IV-B4 Semantics of Spatio-Temporal Operators

Now we define the operators and functions that are needed for capturing requirements that reason over different properties of data objects in a data stream. Note that in some of the following definitions, we use 𝕍{\Large{\mathbb{V}}} to avoid restating the syntax and semantics already developed as part of the MTL×S4uMTL\times S4_{u} logic. The semantics of STPL is based on the valuation function 𝕍{\Large{\mathbb{V}}} of MTL×S4uMTL\times S4_{u}, with the change that 𝕍{\Large{\mathbb{V}}} now also accepts the environments ϵ\epsilon and ζ\zeta, and we extend the definition of 𝕍{\Large{\mathbb{V}}} over the spatial term σ(id)\mathcal{\sigma}(id). That is, in the semantics of MTL×S4uMTL\times S4_{u}, we replace the valuation of spatial propositions pp with:

𝕍(σ(id),𝒟,i,τ,ϵ,ζ)=𝔘(ϵ(id),k),{\Large{\mathbb{V}}}(\mathcal{\sigma}(id),\mathcal{D},i,\tau,\epsilon,\zeta)={\Large{\mathfrak{U}}}(\epsilon(id),k),

where kζ(id)k\leftarrow\zeta(id) if ζ(id)\zeta(id) is specified, and kik\leftarrow i otherwise. The semantics for the rest of the spatial operators are:

[[𝒯]](𝒟,i,ϵ,ζ,τ):=𝕍(𝒯,𝒟,i,τ,ϵ,ζ)\displaystyle[\![\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\mathcal{T}]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):={{\Large{\mathbb{V}}}}(\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\mathcal{T},\mathcal{D},i,\tau,{\epsilon,\zeta})
[[Area(𝒯)r]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![Area(\mathcal{T})\sim r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
{ if fArea(𝕍(𝒯,𝒟,i,τ,ϵ,ζ)) is unspecified if fArea(𝕍(𝒯,𝒟,i,τ,ϵ,ζ))r otherwise\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\left\{\begin{array}[]{ll}\bot\mbox{ if \;\;}f_{Area}\big{(}{{\Large{\mathbb{V}}}}(\mathcal{T},\mathcal{D},i,\tau,{\epsilon,\zeta})\big{)}\mbox{ is unspecified}\\ \top\mbox{ if \;\;}f_{Area}\big{(}{{\Large{\mathbb{V}}}}(\mathcal{T},\mathcal{D},i,\tau,{\epsilon,\zeta})\big{)}\sim r\\ \bot\mbox{ otherwise}\end{array}\right.
[[C(id)=r]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![C(id)=r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
{ if fC(id,𝒟,i,ϵ,ζ)=r otherwise\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}f_{C}\big{(}id,\mathcal{D},i,\epsilon,\zeta\big{)}=r\\ \bot\mbox{ otherwise}^{*}\end{array}\right.
[[P(id)r]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![P(id)\sim r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
{ if fP(id,𝒟,i,ϵ,ζ)r otherwise\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}f_{P}\big{(}id,\mathcal{D},i,\epsilon,\zeta\big{)}\sim r\\ \bot\mbox{ otherwise}^{*}\end{array}\right.
[[Dist(idj,CRT1,idk,CRT2)r]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![Dist(id_{j},\mbox{\footnotesize{CRT}}_{1},id_{k},\mbox{\footnotesize{CRT}}_{2})\sim r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
{ if fDist(idj,idk,CRT1,CRT2,𝒟,i,ϵ,ζ)r otherwise\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}f_{Dist}\big{(}id_{j},id_{k},\mbox{\footnotesize{CRT}}_{1},\mbox{\footnotesize{CRT}}_{2},\mathcal{D},i,\epsilon,\zeta\big{)}\sim r\\ \bot\mbox{ otherwise}^{*}\end{array}\right.
[[LAT(id,CRT)r]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![LAT(id,\mbox{\footnotesize{CRT}})\sim r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
{ if fLAT(id,CRT,𝒟,i,ϵ,ζ)r otherwise\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}f_{LAT}\big{(}id,\mbox{\footnotesize{CRT}},\mathcal{D},i,\epsilon,\zeta\big{)}\sim r\\ \bot\mbox{ otherwise}^{*}\end{array}\right.
[[LON(id,CRT)r]](𝒟,i,ϵ,ζ,τ):=\displaystyle[\![LON(id,\mbox{\footnotesize{CRT}})\sim r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=
{ if fLON(id,CRT,𝒟,i,ϵ,ζ)r otherwise\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\left\{\begin{array}[]{ll}\top\mbox{ if \;\;}f_{LON}\big{(}id,\mbox{\footnotesize{CRT}},\mathcal{D},i,\epsilon,\zeta\big{)}\sim r\\ \bot\mbox{ otherwise}^{*}\end{array}\right.

where {>,<,=,,}\sim\in\{>,<,=,\geq,\leq\}, and the starred “otherwise” case takes priority, i.e., the predicate evaluates to \bot, whenever ϵ(id)𝒪(𝒟(i))\epsilon(id)\not\in\mathcal{OI}(\mathcal{D}(i)).
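The starred “otherwise” convention can be made concrete with a small sketch: a predicate such as P(id) ∼ r evaluates to false with priority whenever the object bound to id is not among the object identifiers of the resolved frame. The stream layout and field names below are illustrative assumptions.

```python
# Sketch of the starred "otherwise" convention for the predicate P(id) ~ r:
# a missing object forces the predicate to bottom (False) before any
# comparison is attempted. All names and the stream layout are invented.
import operator

CMP = {">": operator.gt, "<": operator.lt, "=": operator.eq,
       ">=": operator.ge, "<=": operator.le}

def sem_prob_pred(var, cmp, r, D, i, eps, zeta):
    obj = eps[var]
    k = zeta.get(var)
    k = i if k is None else k
    if obj not in D[k]:        # starred "otherwise": object missing -> bottom
        return False
    return CMP[cmp](D[k][obj]["prob"], r)

D = [{7: {"prob": 0.95}}, {}]  # object 7 disappears in frame 1
eps, zeta = {"id1": 7}, {}
```

In frame 0 the comparison 0.95 > 0.9 is performed; in frame 1 the object is absent, so the predicate is false regardless of the threshold.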

Note that we omit the semantics for some of the spatio-temporal operators in the production rule Θ\Theta except the ones stated above due to their similarity to the semantics of the presented operators.

Definition IV.6 (STPL Satisfiability)

We say that the data stream 𝒟\mathcal{D} satisfies the STPL formula φ\varphi under the initial environment iff [[φ]](𝒟,0,ϵ0,ζ0,τ)=\;[\![\varphi]\!](\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})=\top, which we also denote by (𝒟,0,ϵ0,ζ0,τ)φ(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\varphi.

Note that with ϵ0\epsilon_{0} and ζ0\zeta_{0}, all the variables are reset to be empty. This enables the use of the proof system for TPTL presented by Dokhanchi et al. (2016).

Refer to caption
Figure 4: An object-data stream 𝒟\mathcal{D} of three frames illustrated in three time steps t0,t1t_{0},t_{1}, and t2t_{2}. In each frame, there are two rectangular geometric shapes denoted by identifiers 11 and 22. The lighter blue colored rectangle and the darker blue colored rectangle are identified by p1p_{1} and p2p_{2}, respectively.
Refer to caption
Figure 5: The step-by-step computation of 𝕍(p1𝒰[0,2]sp2,𝒟,0,τ,ϵ){\Large{\mathbb{V}}}(p_{1}\,\mathcal{U}^{s}_{[0,2]}p_{2},\mathcal{D},0,\tau,{\epsilon}) for i=0i=0 to 22. Here, p1p_{1} and p2p_{2} are the same as in Fig. 4.
Refer to caption
Figure 6: s2=𝕍(p2,𝒟,0,τ,ϵ)s_{2}={\Large{\mathbb{V}}}(p_{2},\mathcal{D},0,\tau,{\epsilon}), s0=𝕍(p2,𝒟,1,τ,ϵ)𝕍(p1,𝒟,0,τ,ϵ)s_{0}={\Large{\mathbb{V}}}(p_{2},\mathcal{D},1,\tau,{\epsilon})\cap{\Large{\mathbb{V}}}(p_{1},\mathcal{D},0,\tau,{\epsilon}), s1=𝕍(p2,𝒟,2,τ,ϵ)𝕍(p1,𝒟,0,τ,ϵ)𝕍(p1,𝒟,1,τ,ϵ)s_{1}={\Large{\mathbb{V}}}(p_{2},\mathcal{D},2,\tau,{\epsilon})\cap{\Large{\mathbb{V}}}(p_{1},\mathcal{D},0,\tau,{\epsilon})\cap{\Large{\mathbb{V}}}(p_{1},\mathcal{D},1,\tau,{\epsilon}), 𝕍(p1𝒰[0,2]sp2,𝒟,0,τ,ϵ)=s0s1s2{\Large{\mathbb{V}}}(p_{1}\,\mathcal{U}^{s}_{[0,2]}p_{2},\mathcal{D},0,\tau,{\epsilon})=s_{0}\cup s_{1}\cup s_{2}.
Example IV.1 (Evaluating a simple spatio-temporal STPL formula)

A sample data stream of three frames time stamped at times t0t_{0}, t1t_{1}, and t2t_{2} is illustrated in Figure 4. We label the rectangular geometric shapes in each frame by σ(1)\mathcal{\sigma}(1) and σ(2)\mathcal{\sigma}(2), and refer to them by p1p_{1} and p2p_{2}, respectively. Object p1p_{1} does not change its geometric shape across the frames, but p2p_{2} evolves in each frame. The location of p1p_{1} is the same in the first two frames, but in the third frame it moves to the bottom-right corner of the frame. The location of p2p_{2} changes in every frame: it first expands horizontally and moves to the bottom-left of the frame, and then expands both horizontally and vertically and moves to the right-center of the frame. We compute and evaluate the formula

ϕ:=id1.id2.(σ(id1)𝒰[0,2]sσ(id2))\phi:={\exists id_{1}.\exists id_{2}.}\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\big{(}\mathcal{\sigma}(id_{1})\;\,\mathcal{U}^{s}_{[0,2]}\;\mathcal{\sigma}(id_{2})\big{)}

according to the semantics in Def. II.6 and Def. IV.5. An English statement equivalent to the above formula is:

“There exist two objects σ(id1)\mathcal{\sigma}(id_{1}) and σ(id2)\mathcal{\sigma}(id_{2}) in the first frame such that σ(id2)\mathcal{\sigma}(id_{2}) is non-empty in the first frame; or, within the next two frames, its intersection with σ(id1)\mathcal{\sigma}(id_{1}) in the preceding frames is non-empty (given that σ(id1)\mathcal{\sigma}(id_{1}) does not change)”.

In Figure 5, we demonstrate the result of evaluating the subformula σ(id1)𝒰[0,2]sσ(id2)\mathcal{\sigma}(id_{1})\;\,\mathcal{U}^{s}_{[0,2]}\;\mathcal{\sigma}(id_{2}) for i=0i=0 to 22, where we assign id11id_{1}\leftarrow 1 and id22id_{2}\leftarrow 2 out of the four possible assignments. We use p1p_{1} and p2p_{2} to refer to σ(id11)\mathcal{\sigma}(id_{1}\leftarrow 1) and σ(id22)\mathcal{\sigma}(id_{2}\leftarrow 2) in each frame, respectively. That is, p1p_{1} at t=2t=2 is equivalent to 𝔘(ϵ(1),2){\Large{\mathfrak{U}}}(\epsilon(1),2). The evaluation of the STPL formula ϕ\phi starts with the following equation

[[σ(id1)𝒰[0,2]sσ(id2)]](𝒟,0,ϵ0,ζ0,τ):=\displaystyle[\![\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\mathcal{\sigma}(id_{1})\;\,\mathcal{U}^{s}_{[0,2]}\;\mathcal{\sigma}(id_{2})]\!](\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau}):=
𝕍(p1𝒰[0,2]sp2,𝒟,0,τ,ϵ,ζ)\displaystyle\;\;\;\;{\Large{\mathbb{V}}}(\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;p_{1}\;\,\mathcal{U}^{s}_{[0,2]}\;p_{2},\mathcal{D},0,\tau,{\epsilon,\zeta})

where, ϵ0[id11,id22]\epsilon_{0}[id_{1}\leftarrow 1,id_{2}\leftarrow 2] and ζ0[id1,id2]\zeta_{0}[id_{1}\leftarrow\bot,id_{2}\leftarrow\bot].

The evaluation of the until subformula is computed as follows

𝕍(p1𝒰[0,2]sp2,𝒟,0,τ,ϵ,ζ):=\displaystyle{\Large{\mathbb{V}}}(p_{1}\;\,\mathcal{U}^{s}_{[0,2]}\;p_{2},\mathcal{D},0,\tau,{\epsilon,\zeta}):=
t{0,1,2}(𝕍(p2,𝒟,t,τ,ϵ,ζ)\displaystyle\bigcup_{t^{\prime}\in\{0,1,2\}}\Big{(}{\Large{\mathbb{V}}}(p_{2},\mathcal{D},t^{\prime},\tau,{\epsilon,\zeta})\cap
0t′′<t𝕍(p1,𝒟,t′′,τ,ϵ,ζ)),\displaystyle\bigcap_{0\leq t^{\prime\prime}<t^{\prime}}{\Large{\mathbb{V}}}(p_{1},\mathcal{D},t^{\prime\prime},\tau,{\epsilon,\zeta})\Big{)},

where τ(0)=0,τ(1)=1,\tau(0)=0,\tau(1)=1, and τ(2)=2\tau(2)=2. The above formula is equal to the disjunctive normal form below, as shown in Fig. 5

𝕍(p2,𝒟,0,τ,ϵ,ζ)\displaystyle{\Large{\mathbb{V}}}(p_{2},\mathcal{D},0,\tau,{\epsilon,\zeta})
\displaystyle\cup
(𝕍(p2,𝒟,1,τ,ϵ,ζ)𝕍(p1,𝒟,0,τ,ϵ,ζ))\displaystyle\big{(}{\Large{\mathbb{V}}}(p_{2},\mathcal{D},1,\tau,{\epsilon,\zeta})\cap\;{\Large{\mathbb{V}}}(p_{1},\mathcal{D},0,\tau,{\epsilon,\zeta})\big{)}
\displaystyle\cup
(𝕍(p2,𝒟,2,τ,ϵ,ζ)𝕍(p1,𝒟,0,τ,ϵ,ζ)\displaystyle\big{(}{\Large{\mathbb{V}}}(p_{2},\mathcal{D},2,\tau,{\epsilon,\zeta})\cap\;{\Large{\mathbb{V}}}(p_{1},\mathcal{D},0,\tau,{\epsilon,\zeta})\;\cap\;
𝕍(p1,𝒟,1,τ,ϵ,ζ)).\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;{\Large{\mathbb{V}}}(p_{1},\mathcal{D},1,\tau,{\epsilon,\zeta})\big{)}.

The final evaluation of the above formula is depicted as hashed regions s0s_{0}, s1s_{1} and s2s_{2} in Fig. 6. The union of the regions is a non-empty set, hence, the spatial existential operator returns true.
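The computation in this example can be sketched with axis-aligned boxes. The concrete box coordinates below are invented for illustration (they are not the regions of Fig. 4); only non-emptiness of the resulting region is checked, since the union s0 ∪ s1 ∪ s2 is non-empty iff some s_i is non-empty.

```python
# Sketch of the spatial-until computation of Example IV.1 with bounding boxes
# given as (xmin, ymin, xmax, ymax) tuples. All boxes are invented toy data.

def intersect(a, b):
    """Intersection of two axis-aligned boxes, or None if it is empty."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def intersect_all(boxes):
    acc = boxes[0]
    for b in boxes[1:]:
        if acc is None:
            return None
        acc = intersect(acc, b)
    return acc

def spatial_until_nonempty(p1_frames, p2_frames):
    """Is V(p1 U^s p2) at i=0 non-empty?  It is the union over j of
    p2(j) intersected with p1(0), ..., p1(j-1)."""
    return any(
        intersect_all([p2_frames[j]] + p1_frames[:j]) is not None
        for j in range(len(p2_frames))
    )

# p1 keeps the same box in the first two frames, then jumps away;
# p2 keeps drifting, as in the narrative of Example IV.1.
p1 = [(0, 0, 4, 4), (0, 0, 4, 4), (8, 8, 10, 10)]
p2 = [(5, 5, 6, 6), (3, 3, 5, 5), (6, 0, 7, 1)]
```

Here the j=0 term already contributes a non-empty region (p2 in the first frame), matching the first disjunct of the English reading of the formula.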

IV-C MTL/STL Equivalences in STPL

Below, we present some rewriting rules to translate time-interval-based formulas in MTL/STL to their equivalent STPL formulas.

ϕ1𝒰[a,b]ϕ2\displaystyle\phi_{1}\;\,\mathcal{U}_{[a,b]}\;\phi_{2}\equiv x.ϕ1𝒰((aτxb)ϕ2)\displaystyle\;x.\phi_{1}\,\mathcal{U}((a\leq\tau-x\leq b)\wedge\phi_{2})
ϕ1[a,b]ϕ2\displaystyle\phi_{1}\;\mathcal{R}_{[a,b]}\;\phi_{2}\equiv x.ϕ1((aτxb)ϕ2)\displaystyle\;x.\phi_{1}\mathcal{R}((a\leq\tau-x\leq b)\implies\phi_{2})
[a,b]ϕ\displaystyle\Diamond_{[a,b]}\;\phi\equiv x.𝒰((aτxb)ϕ)\displaystyle\;x.\top\,\mathcal{U}((a\leq\tau-x\leq b)\wedge\phi)
[a,b]ϕ\displaystyle\Diamond_{[a,b]}\;\phi\equiv x.((aτxb)ϕ)\displaystyle\;x.\Diamond((a\leq\tau-x\leq b)\wedge\phi)
[a,b]ϕ\displaystyle\Box_{[a,b]}\;\phi\equiv x.((aτxb)ϕ)\displaystyle\;x.\bot\mathcal{R}((a\leq\tau-x\leq b)\implies\phi)
[a,b]ϕ\displaystyle\Box_{[a,b]}\;\phi\equiv x.((aτxb)ϕ)\displaystyle\;x.\Box((a\leq\tau-x\leq b)\implies\phi)
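The first eventually rule above can be checked on a finite trace: the bounded eventually agrees with its freeze-variable form x.(⊤ U ((a ≤ τ−x ≤ b) ∧ ϕ)). The timestamps and the predicate below are toy data used only to exercise the equivalence.

```python
# Sketch: bounded eventually <>_{[a,b]} phi vs. its freeze-variable rewriting.

def until(phi1, phi2, n, i):
    """Unbounded discrete until at frame i over n frames."""
    return any(phi2(j) and all(phi1(k) for k in range(i, j))
               for j in range(i, n))

def ev_bounded(phi, tau, i, a, b):
    """Direct MTL semantics of <>_{[a,b]} phi at frame i."""
    return any(a <= tau[j] - tau[i] <= b and phi(j) for j in range(i, len(tau)))

def ev_frozen(phi, tau, i, a, b):
    """x.(top U ((a <= tau - x <= b) and phi)) with x frozen to frame i."""
    x = i                                   # x. freezes the current frame
    top = lambda k: True                    # left operand of the until is top
    constrained = lambda j: a <= tau[j] - tau[x] <= b and phi(j)
    return until(top, constrained, len(tau), i)

tau = [0.0, 0.5, 1.0, 1.5, 2.0]
phi = lambda j: j == 3     # phi holds only at frame 3 (time 1.5)
```

Both evaluations agree at every starting frame, which is what the rewriting rule asserts.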

In the next section, we use specific examples of perception systems to explain each aspect of the logic, first individually and then in combination.

V Case Study

In this section, we gradually demonstrate the expressivity of the STPL language using a sample perception data stream corresponding to the image frames in Figure 1. There are six frames in this case study, taken from the KITTI (http://www.cvlibs.net/datasets/kitti/) databases (Geiger et al. (2013a)). We adopt a perception system based on SqueezeDet that recognizes certain desirable classes of objects in image frames. The perception system classifies each object as one of three desirable classes: Car, Cyclist, and Pedestrian. It also assigns to each detected object a confidence level (normalized between 0 and 1) and a bounding box that surrounds the object. We also manually associate a unique object identifier with the detected objects of each frame. The data is available to the object data stream function 𝒟\mathcal{D} as represented in Table II, and visualized in the frames in Fig. 1.

In the following sections, we focus on STPL specifications that use quantifiers and spatial and temporal operators. We present the formalization of 15 requirements in total. A highlight is the STPL formula (16), which formalizes Req. 15, and its utilization on the KITTI dataset in our monitoring tool. Our monitoring tool was able to detect some mismatches in the labeled data. All of the STPL formulas and their monitoring results except the last two are distributed with our monitoring framework (STPL (2023)).

Note that in the rest of the sections, when we translate a requirement into STPL, we will refer to the following assumptions. These assumptions enable us to write smaller formulas. They can be relaxed at the expense of more complex requirements.

Assumption V.1

There are different perception modules to detect, track, and classify the objects. That is, the object detector detects objects, the object tracker assigns unique identifiers to the detected objects, and the classifier assigns classes to the detected objects.

Assumption V.2

The object detector always detects objects.

Assumption V.3

Each detected and tracked object has a unique id.

The tracker tracks the detected objects through the sequence of frames, but this does not imply that the classifier will assign the same class to them in all the frames. If we cannot assume the existence of unique IDs for the same object, then we can still write requirements in a more complicated form to achieve more conservative results. For example, in the requirements that follow (Req. 3-18), whenever we check for the presence of the same ID, e.g.,

ψ==Idi.Idj.(Idi=Idjφ),\psi_{=}=\Box\forall Id_{i}.\Diamond\exists Id_{j}.(\;Id_{i}=Id_{j}\implies\varphi),

then this precondition can be replaced with one where we quantify over all objects that intersect with the original object in the previous frame, e.g.,

ψ=\displaystyle\psi_{\sqcap}= Idi@x.Idj.\displaystyle\Box\forall Id_{i}@x.\Diamond\exists Id_{j}.
(((σ(Idi)σ(Idj))C(Idi)=C(Idj))φ).\displaystyle\Big{(}\big{(}\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}(\mathcal{\sigma}(Id_{i})\sqcap\mathcal{\sigma}(Id_{j}))\wedge C(Id_{i})=C(Id_{j})\big{)}\implies\varphi\Big{)}.

Notice the increase in complexity between the two formulas. In ψ=\psi_{=}, we just need to store the ID of the object in IdiId_{i} and check if in the future there is another object (IdjId_{j}) with the same ID. In ψ\psi_{\sqcap}, we need to store the time xx at which we observed the object IdiId_{i}, so that we can retrieve its bounding box and check for intersection with future objects, along with checking the agreement of classes. Clearly, ψ=\psi_{=} and ψ\psi_{\sqcap} are not syntactically or semantically equivalent. Nevertheless, under the assumption of a sufficient sampling rate, the objects that satisfy ψ=\psi_{=} will also satisfy ψ\psi_{\sqcap}. There exist other possible formulas under which we can relax the requirement for persistent unique IDs. However, in the following presentation, we will always be using the assumptions that result in the simplest STPL formulas.

For the requirements that are formalized into STPL in the rest of this section, we used our STPL monitoring tool to evaluate each STPL formula on the data stream in Table II; the results are presented in Table I.

TABLE I: The result of monitoring STPL formula φ\varphi on the data stream 𝒟\mathcal{D} in Table II. [[φ]][\![\varphi]\!] abbreviates [[φ]](𝒟[\![\varphi]\!](\mathcal{D},0,ϵ0\epsilon_{0},ζ0\zeta_{0},τ{\tau})).
* We used the dataset “0018” from KITTI tracking benchmark for evaluating formula in Eq.(16).
𝝋\boldsymbol{\varphi} [[𝝋]]\boldsymbol{[\![\varphi]\!]} 𝝋\boldsymbol{\varphi} [[𝝋]]\boldsymbol{[\![\varphi]\!]} 𝝋\boldsymbol{\varphi} [[𝝋]]\boldsymbol{[\![\varphi]\!]} 𝝋\boldsymbol{\varphi} [[𝝋]]\boldsymbol{[\![\varphi]\!]}
  Eq.(1) \top Eq.(2) \bot Eq.(3) \bot Eq.(4) \bot
Eq.(5) \bot Eq.(9) \top Eq.(10) \top Eq.(11) \bot
Eq.(12) \top Eq.(13) \bot Eq.(14) \bot Eq.(15) \bot
Eq.(16) \top *Eq.(16) \bot Eq.(17) \top Eq.(18) \top
 

V-A Object Quantifier Examples

First, we present a requirement that enables search through a data stream to find a frame in which there are at least two unique objects from the same class.

Req. 3

There is at least one frame in which at least two unique objects are from the same class.

STPL (using all the assumptions):

ϕ3=Id1.Id2.(Id1Id2C(Id1)=C(Id2))\displaystyle\phi_{\ref{req:unique:obj}}=\Diamond\exists Id_{1}.\exists Id_{2}.\big{(}Id_{1}\neq Id_{2}\wedge C(Id_{1})=C(Id_{2})\big{)} (1)

In the above formula, the object variable constraint requires a unique assignment of object identifiers to the variables Id1Id_{1} and Id2Id_{2}. Additionally, the equivalence proposition holds only if the classes of a unique pair of objects are equal. Also, the existential quantifier requires one assignment of objects to the object variables that satisfies its subformula. Lastly, the eventually operator requires one frame in which its subformula is satisfied.

Checking the Formula on Real Data:

From the data stream 𝒟\mathcal{D} in Table II, and as depicted in the frames in Fig. 1, frames 0, 2, and 3 each contain two pedestrians, and frame 3 contains two cars. Therefore, the formula is satisfied by the given object data stream, which we denote as (𝒟,0,ϵ0,ζ0,τ)ϕ3(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\phi_{\ref{req:unique:obj}}. Note that we can push the \Diamond operator after the existential quantifiers (i.e., Id1.Id2.ϕId1.Id2.ϕ\Diamond\exists Id_{1}.\exists Id_{2}.\phi\equiv\exists Id_{1}.\exists Id_{2}.\Diamond\phi) and still expect the same result.
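The check performed by formula (1) can be sketched as a direct double quantification over each frame. The stream below (a list of frame dictionaries mapping IDs to classes) is an invented simplification of Table II, not the actual KITTI data.

```python
# Hedged sketch of formula (1): eventually, some frame contains two distinct
# object IDs with equal classes. The toy streams are invented for illustration.

def phi_two_same_class(stream):
    return any(                            # eventually: some frame ...
        any(                               # exists Id1, Id2: distinct IDs, equal class
            id1 != id2 and frame[id1] == frame[id2]
            for id1 in frame for id2 in frame
        )
        for frame in stream
    )

stream_sat = [
    {1: "Car", 2: "Pedestrian"},
    {1: "Car", 2: "Pedestrian", 3: "Pedestrian"},  # two pedestrians here
]
stream_unsat = [{1: "Car", 2: "Pedestrian"}, {1: "Car"}]
```

The inner `any` plays the role of the two existential quantifiers, and the outer one the role of the eventually operator.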

V-B Examples with Time Constraints

Next, we formalize a requirement in which we should use nested quantifiers and time/frame constraints in our formalization.

Req. 4

Always, for every object in each frame, its class probability must not decrease by more than 10% (i.e., drop below 0.9 of its current value) within the next two seconds, unless the frame rate (FPS) drops below 10 during the first second.

STPL (using Assumptions V.1-V.3):

ϕ4=\displaystyle\phi_{\ref{req:prob:fps}}= Id1@x.(ψ41Id1,xψ42x)\displaystyle\Box\forall Id_{1}@x.\big{(}\psi_{\ref{req:prob:fps}1}^{Id_{1},x}\implies\psi_{\ref{req:prob:fps}2}^{x}\big{)} (2)

where

ψ41Id1,x\displaystyle\psi_{\ref{req:prob:fps}1}^{Id_{1},x}\equiv Id2.(Id1=Id2τx2\displaystyle\Diamond\exists Id_{2}.\big{(}Id_{1}=Id_{2}\;\wedge\;\tau-x\leq 2\;\wedge
P(Id2)<0.9×P(Id1))\displaystyle\qquad\qquad P(Id_{2})<0.9\times P(Id_{1})\big{)}
ψ42x\displaystyle\psi_{\ref{req:prob:fps}2}^{x}\equiv (Ratio(x,τx)<10τx1)\displaystyle\bigcirc\Diamond\big{(}Ratio(\mathcal{F}-x,\tau-x)<10\wedge\tau-x\leq 1\big{)}

Notice that the function “Ratio(x,τx)Ratio(\mathcal{F}-x,\tau-x)” is not directly allowed by the syntax of STPL (Def. IV.1). However, the arithmetic expression uses a single freeze time variable xx and, hence, its evaluation is no more computationally expensive than that of the individual expressions “(x)(\mathcal{F}-x)” and “(τx)(\tau-x)”. We remark that the function Ratio introduces the possibility of division by zero. In the formula ψ42x\psi_{\ref{req:prob:fps}2}^{x}, the issue is avoided in the specification by using the next time operator (\bigcirc). However, an implementation of the function Ratio should handle the possibility of division by zero. The subformula ψ41Id1,x\psi_{\ref{req:prob:fps}1}^{Id_{1},x} searches for an object for which, within 2 sec in the future, i.e., (τx2\tau-x\leq 2), its classification probability drops below the desired threshold. Notice that with the expression P(Id2)<0.9×P(Id1)P(Id_{2})<0.9\times P(Id_{1}), we compare probabilities for the same object across different points in time. The subformula ψ42x\psi_{\ref{req:prob:fps}2}^{x} checks whether within 1 sec in the future the FPS drops below 10. The implication ψ41Id1,xψ42x\psi_{\ref{req:prob:fps}1}^{Id_{1},x}\implies\psi_{\ref{req:prob:fps}2}^{x} enforces that if there is an object for which the classification probability drops, then at the same time the FPS should be low.

Checking the Formula on Real Data:

Except for the object with ID=2ID=2 in the first frame in Table II, the rest of the objects satisfy the requirement. Therefore, the whole requirement is not satisfied, that is, (𝒟,0,ϵ0,ζ0,τ)⊧̸ϕ(\mathcal{D},0,\epsilon_{0},\zeta_{0},\tau)\not\models\phi. Formula (2) is too strict. That is, it checks the maximum allowed drop in probabilities for a classified object in all the frames rather than only for the newly classified objects. For example, assume a sample data stream of size three with only one object classified as a car with the probabilities 0.70.7, 0.90.9, and 0.640.64 during the first, second, and third frames, respectively. Formula (2) without the always operator is not satisfied starting from the second frame (more than a 28%28\% probability decrement), but is satisfied for the first and the last frame.

We can relax Formula (2) by adding a precondition to the antecedent of the implication to only consider the newly detected objects. The relaxed requirement is possible by using the weak previous operator w\varodot_{w}. The weak previous operator does not become unsatisfiable in the first frame of a data stream, where there is no previous frame. Similarly, by w\bigcirc_{w} we denote the weak next operator. It is similar to the next operator, but it does not become unsatisfiable in the last frame of a finite data stream. For more information about weak temporal operators and past LTL, see the works by Eisner and Fisman (2006) and Cimatti et al. (2004), respectively.

The following formula is a revised version of Formula (2):

STPL (using Assumptions V.1-V.3):

ϕ4=Id1@x.(ψ40Id1(ψ41Id1,xψ42x))\phi_{\ref{req:prob:fps}}^{\prime}=\Box\forall Id_{1}@x.\Big{(}\psi_{\ref{req:prob:fps}0}^{Id_{1}}\Rightarrow\big{(}\psi_{\ref{req:prob:fps}1}^{Id_{1},x}\Rightarrow\psi_{\ref{req:prob:fps}2}^{x}\big{)}\Big{)} (3)

where

ψ40Id1wId3.(Id1Id3)\psi_{\ref{req:prob:fps}0}^{Id_{1}}\equiv\varodot_{w}\forall Id_{3}.(Id_{1}\neq Id_{3})

In formula ϕ4\phi_{\ref{req:prob:fps}}^{\prime}, the antecedent subformula ψ40Id1\psi_{\ref{req:prob:fps}0}^{Id_{1}} checks if an object did not exist in the previous frame. Therefore, the consequent ψ41Id1,xψ42x\psi_{\ref{req:prob:fps}1}^{Id_{1},x}\implies\psi_{\ref{req:prob:fps}2}^{x} is only checked for new objects that appear in the frame for the first time.
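The core probability-drop check behind formulas (2) and (3) can be sketched as follows. This is a simplification under stated assumptions: the FPS escape clause of ψ₄₂ is omitted, an object counts as “new” at its first appearance, and the stream layout (a list of (timestamp, {id: probability}) pairs) is invented.

```python
# Sketch of the core check of formulas (2)-(3): for each newly appearing
# object, flag a violation if its class probability drops below 0.9 of its
# value at appearance within a 2-second window. Toy data, FPS clause omitted.

def prob_drop_violations(stream, window=2.0, factor=0.9):
    violations = set()
    seen = set()
    for i, (t_i, frame) in enumerate(stream):
        for obj, p0 in frame.items():
            if obj in seen:
                continue                      # only newly detected objects
            seen.add(obj)
            for t_j, later in stream[i + 1:]:
                if t_j - t_i > window:
                    break                     # outside the 2 s window
                if obj in later and later[obj] < factor * p0:
                    violations.add(obj)
    return violations

stream = [
    (0.0, {1: 0.80, 2: 0.95}),
    (1.0, {1: 0.78, 2: 0.70}),   # object 2 drops ~26% within the window
    (3.0, {1: 0.10}),            # object 1 drops too, but outside the window
]
```

Restricting the check to newly detected objects mirrors the precondition ψ₄₀ of formula (3).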

Req. 5

Always, each object in a frame must also exist in the next 2 frames, if those frames occur within 1 second.

STPL (using Assumption V.2-V.3):

ϕ5=Id1@x.((wId3.(Id1Id3))((τx1x2)Id2.(Id1=Id2)))\begin{array}[]{rl}\phi_{\ref{req:time:constraints}}=&\Box\forall Id_{1}@x.\bigg{(}\big{(}\varodot_{w}\forall Id_{3}.(Id_{1}\neq Id_{3})\big{)}\Rightarrow\\ &\qquad\qquad\Box\Big{(}\big{(}\tau-x\leq 1\;\wedge\;\mathcal{F}-x\leq 2\big{)}\Rightarrow\\ &\qquad\qquad\exists Id_{2}.\big{(}Id_{1}=Id_{2}\big{)}\Big{)}\bigg{)}\end{array} (4)

Similar to the previous example, the equalities that need to hold are in the consequent of the second implication, but its antecedent is a conjunction of time and frame constraints. In this formula, there is a freeze time variable after the first quantifier operator, which is followed by an always operator. Thus, the constraints apply to the elapsed time and frames between the freeze time operator and the second always operator. Therefore, for any three consecutive frames, if the last two frames are within 1 second of the first frame, then all the objects in the first frame have to reappear in the second and third frames.
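This persistence check can be sketched directly on a toy stream. As a simplifying assumption, the newly-detected precondition of formula (4) is dropped and every object in a frame is required to reappear; the stream layout (a list of (timestamp, set-of-IDs) pairs) is invented.

```python
# Sketch of Req. 5 on a toy stream: every object in a frame must reappear in
# each of the next two frames that fall within one second of it.

def req5_holds(stream, horizon_frames=2, horizon_time=1.0):
    for i, (t_i, ids) in enumerate(stream):
        for j in range(i + 1, min(i + horizon_frames + 1, len(stream))):
            t_j, later_ids = stream[j]
            if t_j - t_i > horizon_time:
                break                     # frame j is outside the 1 s window
            if not ids <= later_ids:      # some object disappeared too early
                return False
    return True

ok_stream  = [(0.0, {1, 2}), (0.3, {1, 2, 3}), (0.6, {1, 2, 3})]
bad_stream = [(0.0, {1, 2}), (0.3, {1}), (0.6, {1})]   # object 2 vanishes
```

The two nested loops correspond to the outer always and the bounded inner always of formula (4), with the frame and time constraints combined in the window test.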

Checking the Formula on Real Data:

The same result as in the previous example holds here for the data stream 𝒟\mathcal{D} in Table II that is (𝒟,0,ϵ0,ζ0,τ)⊧̸ϕ5(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\not\models\phi_{\ref{req:time:constraints}}.

Below, we formalize the requirement previously presented in Req. 1.

Req. 6

Whenever a new object is detected, then it is assigned a class within 1 sec, after which the object does not change class until it disappears.

STPL (using Assumptions V.1-V.3):

ϕ6=Id1@x.((C(Id1)=0ψ61Id1,x)(C(Id1)>0ψ62Id1,x))\begin{array}[]{rl}\phi_{\ref{req:uber:formal}}=&\Box\forall Id_{1}@x.\\ &\bigg{(}\Big{(}C(Id_{1})=0\Rightarrow\psi_{\ref{req:uber:formal}1}^{Id1,x}\Big{)}\;\wedge\\ &\Big{(}C(Id_{1})>0\Rightarrow\psi_{\ref{req:uber:formal}2}^{Id1,x}\Big{)}\bigg{)}\end{array} (5)

where the subformulas are defined as:

ψ61Id1,x\displaystyle\psi_{\ref{req:uber:formal}1}^{Id1,x}\equiv ((τx1x1)\displaystyle\Diamond\Big{(}\big{(}\tau-x\leq 1\;\wedge\;\mathcal{F}-x\geq 1\big{)}\wedge
Id2.(Id1=Id2C(Id2)>0))\displaystyle\qquad\Box\exists Id_{2}.\big{(}Id_{1}=Id_{2}\wedge C(Id_{2})>0\big{)}\Big{)}
ψ62Id1,x\displaystyle\psi_{\ref{req:uber:formal}2}^{Id1,x}\equiv Id3((x1Id3=Id1)\displaystyle\Box\forall Id_{3}\Big{(}\big{(}\mathcal{F}-x\geq 1\wedge Id_{3}=Id_{1}\big{)}\Rightarrow
C(Id1)=C(Id3))\displaystyle\qquad\qquad C(Id_{1})=C(Id_{3})\Big{)}

In the above formula, we evaluate the two implication-form subformulas for all objects in all frames. If an object is not yet classified (i.e., C(Id1)=0C(Id_{1})=0), then we check the consequent of the first implication. The subformula ψ61Id1,x\psi_{\ref{req:uber:formal}1}^{Id1,x} requires that when an unclassified object is observed, then eventually, in less than one second afterward, the object is always assigned a class. If an object is already classified, then the second implication applies. The subformula ψ62Id1,x\psi_{\ref{req:uber:formal}2}^{Id1,x} requires that once a classified object is observed, the same object can only keep the same class afterward.

Running Formula in Real Data Stream:

Between frames 1 and 2 in Table II, the object with ID=2 changes its class from cyclist to pedestrian. Therefore, the data stream 𝒟\mathcal{D} does not satisfy the requirement (𝒟,0,ϵ0,ζ0,τ)⊧̸ϕ6(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\not\models\phi_{\ref{req:uber:formal}}.
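The class-stability check can be sketched programmatically. The following hedged illustration (our own code; `holds_req6` and its grace-period handling are assumptions for the sketch, not the authors' monitor) flags both an object that stays unclassified past the grace period and an object whose class flips after assignment:

```python
def holds_req6(stream, times, grace_sec=1.0):
    """stream: list of dicts mapping object ID -> class (None = unclassified)."""
    first_seen, assigned = {}, {}
    for i, frame in enumerate(stream):
        for oid, cls in frame.items():
            first_seen.setdefault(oid, times[i])
            if cls is None:
                if times[i] - first_seen[oid] > grace_sec:
                    return False  # still unclassified after the grace period
            elif oid in assigned:
                if cls != assigned[oid]:
                    return False  # class changed after being assigned
            else:
                assigned[oid] = cls
    return True

# Object 2 in Table II flips from cyclist to pedestrian:
stream = [{2: 'cyclist'}, {2: 'cyclist'}, {2: 'pedestrian'}]
print(holds_req6(stream, [0.0, 0.04, 0.08]))  # -> False
```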

TABLE II: Data stream 𝒟\mathcal{D} of image frames in Fig. 1. The column headers Frame, τ\tau, ID, Class, and Prob denote the frame number, the sampling time (here we assume a frame rate of 25 fps), the identifier associated with each object, the object class, and the classification confidence, respectively. The remaining four headers are the data attributes (minimum and maximum lateral and longitudinal coordinates of the bounding boxes) by which a bounding box is associated with each classified object. Note that some objects are not tracked correctly throughout the data stream.
Frame 𝝉\boldsymbol{\tau} ID Class Prob 𝒙𝒎𝒊𝒏\boldsymbol{x_{min}} 𝒚𝒎𝒊𝒏\boldsymbol{y_{min}} 𝒙𝒎𝒂𝒙\boldsymbol{x_{max}} 𝒚𝒎𝒂𝒙\boldsymbol{y_{max}}
  0 0 1 car 0.88 58 151 220 287
0 0 2 cyclist 0.75 479 124 690 382
0 0 3 pedestrian 0.63 522 130 632 377
0 0 4 pedestrian 0.64 861 133 954 329
  1 0.04 1 car 0.88 61 152 217 283
1 0.04 2 cyclist 0.57 493 111 699 383
1 0.04 3 pedestrian 0.64 877 136 972 330
  2 0.08 1 car 0.89 58 143 220 271
2 0.08 2 pedestrian 0.65 511 107 724 367
2 0.08 3 pedestrian 0.64 911 115 1001 340
  3 0.12 1 car 0.92 56 139 216 266
3 0.12 2 cyclist 0.59 493 111 705 380
3 0.12 3 pedestrian 0.72 541 125 649 351
3 0.12 4 car 0.58 926 107 1004 302
3 0.12 5 pedestrian 0.76 938 118 998 332
  4 0.16 1 car 0.91 53 139 217 265
4 0.16 2 pedestrian 0.80 551 126 658 356
  5 0.2 1 car 0.92 52 140 216 264
5 0.2 2 cyclist 0.62 506 104 695 368
5 0.2 3 pedestrian 0.68 552 115 669 362
 

V-C Examples of Space and Time Requirements with 2-D Images

In the following examples, we want to demonstrate the expressivity and usability of the STPL language to capture requirements related to the detected objects and their spatial and temporal properties.

V-C1 Basic Spatial Examples without Quantifiers.

Let us assume that we have a spatial proposition pp as in Fig. 3.

Example V.1

One way to formalize that the sampling rate is sufficient is to check if the bounding box of an object overlaps with itself across consecutive frames. The following requirement can be used as a template in more complex requirements.

Req. 7

The intersection of a spatial predicate pp with itself across all frames must not be empty.

STPL:

ϕ7=sp\displaystyle\phi_{\ref{req:topo-intersect}}=\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\Box^{s}p (6)

The above requirement is clearly too strict. In order to make the requirement more realistic, we introduce frame intervals (see Rem. II.8) and also use quantification over objects. If we change the above requirement to: “All the objects in all the frames must intersect with themselves in the next three frames”, then we can modify Eq. (6) to the following:

ϕ7=Id1.~[0,3]sσ(Id1)\displaystyle{\phi_{\ref{req:topo-intersect}}^{\prime}=\Box\forall Id_{1}.\;\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\tilde{\Box}^{s}_{[0,3]}\;\mathcal{\sigma}(Id_{1})} (7)
Example V.2

It is interesting to check the occupancy of an object across a series of frames. This can be used for identifying regions of interest.

Req. 8

The union of a spatial proposition pp with itself in all the frames is not equal to the universe.

STPL:

ϕ8=¬sp\displaystyle\phi_{\ref{req:topo-union}}=\neg\;\leavevmode\resizebox{7.22743pt}{}{\framebox{$\forall$}}\;\Diamond^{s}p (8)

The result of applying the formula sp\Diamond^{s}p to a sequence of three frames is shown in Fig. 3. The resulting set is not equal to the universe; hence, the formula ϕ8\phi_{\ref{req:topo-union}} evaluates to true.
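To make the spatial "eventually" (union over frames) concrete, here is a hedged sketch that unions axis-aligned bounding boxes over frames and checks whether the union covers the whole image, i.e., the universe. The coarse rasterization is purely for illustration, not how a real monitor would implement spatial operators:

```python
def union_covers_universe(boxes, width, height):
    """boxes: (xmin, ymin, xmax, ymax) tuples collected across frames."""
    covered = [[False] * width for _ in range(height)]
    for (x1, y1, x2, y2) in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                covered[y][x] = True
    return all(all(row) for row in covered)

# Three small boxes in a toy 10x10 "image": their union is not the universe,
# so the negated formula (phi_8) would evaluate to true.
boxes = [(0, 0, 4, 4), (3, 3, 7, 7), (6, 0, 9, 5)]
print(not union_covers_universe(boxes, 10, 10))  # -> True
```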

V-C2 Basic Spatial Examples with Quantifiers.

In the following examples, the spatial requirements cannot be formalized without the use of quantifiers.

Example V.3

We can check if some properties hold for all the objects across all the frames.

Req. 9

Always, all the objects must remain in the bounding box (x1,y1,x2,y2)(x_{1},y_{1},x_{2},y_{2}).

STPL (using Assumption V.2):

ϕ9=Id.(LAT(Id,LM)x1LAT(Id,RM)x2LON(Id,TM)y1LON(Id,BM)y2)\phi_{\ref{req:in-bbx}}=\Box\forall Id.\big{(}LAT(Id,\mbox{\footnotesize{LM}})\geq x_{1}\wedge LAT(Id,\mbox{\footnotesize{RM}})\leq x_{2}\;\wedge\\ LON(Id,\mbox{\footnotesize{TM}})\geq y_{1}\wedge LON(Id,\mbox{\footnotesize{BM}})\leq y_{2}\big{)} (9)
Running Formula in Real Data Stream:

The result of evaluating the above formula depends on the size and position of the bounding box. For example, if the bounding box is the same as the image frame in the data stream 𝒟\mathcal{D} in Table II, then we have (𝒟,0,ϵ0,ζ0,τ)ϕ9(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\phi_{\ref{req:in-bbx}}.
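A hedged sketch of this containment check on the Table II boxes follows. The 1100x400 image size is an illustrative assumption (only W=1100 is suggested later in the text); `always_inside` is our own name:

```python
def always_inside(boxes, region):
    """True iff every (xmin, ymin, xmax, ymax) box lies inside region."""
    x1, y1, x2, y2 = region
    return all(x1 <= bx1 and bx2 <= x2 and y1 <= by1 and by2 <= y2
               for (bx1, by1, bx2, by2) in boxes)

# A few bounding boxes from Table II (frame and ID omitted):
boxes = [(58, 151, 220, 287), (479, 124, 690, 382), (861, 133, 954, 329),
         (911, 115, 1001, 340), (926, 107, 1004, 302)]
print(always_inside(boxes, (0, 0, 1100, 400)))  # -> True
```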

Example V.4

The combination of eventually/globally operators and a quantifier operator is very useful for specifying properties of an object across time.

Req. 10

Eventually, there must exist an object that at the next frame shifts to the right.

STPL (using Assumptions V.2-V.3):

ϕ10=Id1@x.Id2.(Id1=Id2LAT(Id1,LM)<LAT(Id2,LM))\phi_{\ref{req:evn-shift}}=\Diamond\exists Id_{1}@x.\bigcirc\exists Id_{2}.\big{(}\\ Id_{1}=Id_{2}\wedge LAT(Id_{1},\mbox{\footnotesize{LM}})<LAT(Id_{2},\mbox{\footnotesize{LM}})\big{)} (10)

Note that in the above formula, the Id1Id_{1} is quantified in a freeze time quantifier to get access to the data-object values in the frozen time. If we change the first existential quantifier to the universal quantifier, then the formula represents the following requirement: “Eventually, there must exist a frame in which all the objects shift to the right in the next frame”.

Running Formula in Real Data Stream:

The data stream 𝒟\mathcal{D} as in Table II satisfies the above formula (𝒟,0,ϵ0,ζ0,τ)ϕ10(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\phi_{\ref{req:evn-shift}}. For example, the pedestrian with ID=3 moved to the right in frame 1. Note that the pedestrian and the recording camera both moved to the left, but the camera's movement was faster.
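The same check can be sketched over per-frame xmin values. This is a hedged illustration using our own function name `eventually_shifts_right`; it only mirrors the LAT(·, LM) comparison of the formula:

```python
def eventually_shifts_right(xmins):
    """xmins: list of dicts mapping object ID -> left edge (xmin) per frame."""
    return any(oid in xmins[i + 1] and xmins[i + 1][oid] > xmins[i][oid]
               for i in range(len(xmins) - 1)
               for oid in xmins[i])

# xmin values per frame for IDs 1-3, from the first two frames of Table II:
xmins = [{1: 58, 2: 479, 3: 522}, {1: 61, 2: 493, 3: 877}]
print(eventually_shifts_right(xmins))  # -> True (e.g., ID 3: 522 -> 877)
```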

Example V.5

The requirement below checks the robustness of the tracking algorithms in a perception system using time and spatial constraints.

Req. 11

If an object disappears from the right of the image in the next frame, then during the next 1 second, it can only reappear from the right.

STPL (using Assumptions V.2-V.3):

ϕ11=Id1@x.wId2@y.(ψ111Id1,Id2ψ112Id1,Id2,y)\phi_{\ref{req:disappear}}=\Box\forall Id_{1}@x.\bigcirc_{w}\forall Id_{2}@y.\Big{(}\psi_{\ref{req:disappear}1}^{Id_{1},Id_{2}}\Rightarrow\psi_{\ref{req:disappear}2}^{Id_{1},Id_{2},y}\Big{)} (11)

where

ψ111Id1,Id2\displaystyle\psi_{\ref{req:disappear}1}^{Id_{1},Id_{2}}\equiv Id1=Id2\displaystyle\;Id_{1}=Id_{2}\wedge
LAT(Id1,LM)<LAT(Id2,LM)\displaystyle\;LAT(Id_{1},\mbox{\footnotesize{LM}})<LAT(Id_{2},\mbox{\footnotesize{LM}})\;\wedge
LAT(Id2,LM)>34W\displaystyle\;LAT(Id_{2},\mbox{\footnotesize{LM}})>\frac{3}{4}W\;\wedge
wId3.(Id3Id1)\displaystyle\;\bigcirc_{w}\forall Id_{3}.\;(Id_{3}\neq Id_{1})
ψ112Id1,Id2,y\displaystyle\psi_{\ref{req:disappear}2}^{Id_{1},Id_{2},y}\equiv w\displaystyle\;\bigcirc_{w}\Box
(((τy<1)Id4.(Id2Id4)\displaystyle\;\Big{(}\big{(}(\tau-y<1)\wedge\forall Id_{4}.(Id_{2}\neq Id_{4})\;\wedge
wId5.(Id2=Id5))\displaystyle\;\bigcirc_{w}\exists Id_{5}.(Id_{2}=Id_{5})\big{)}\Rightarrow
wLAT(Id5,LM)>34W)\displaystyle\;\bigcirc_{w}LAT(Id_{5},\mbox{\footnotesize{LM}})>\frac{3}{4}W\Big{)}

Note that in the above formula, Id1Id_{1} and Id2Id_{2} are quantified in a nested freeze time quantifier structure, and hence, it is not an AAN formula. Here, WW is a constant denoting the width of the image in pixels. This formalization checks the reappearance of an object even if it happens more than once. In order to force the formula to check only the first reappearance of an object, one should use the release operator.

Running Formula in Real Data Stream:

In Table II, the pedestrian with ID=3 is present in frame 3, disappears in frame 4, and reappears in the last frame. However, the preconditions captured in subformula ψ111Id1,Id2\psi_{\ref{req:disappear}1}^{Id_{1},Id_{2}} do not hold for that object, assuming that W=1100W=1100 for the data stream. That is, the least xx position of the bounding box for the object with ID=3 is 911 in frame 2, and it then changes to 541 in frame 3, which violates the condition of shifting toward the right before disappearing (the LATLAT condition in the subformula ψ111Id1,Id2\psi_{\ref{req:disappear}1}^{Id_{1},Id_{2}}). Note that, from the second frame onward, the tracking algorithm mixes up the ID associations of the objects, and as a result, their geometrical positions are not consistent. This inconsistent object tracking prevents the subformula ψ111Id1,Id2\psi_{\ref{req:disappear}1}^{Id_{1},Id_{2}} from holding in otherwise viable cases. Therefore, for the data stream 𝒟\mathcal{D} in Table II, we have (𝒟,0,ϵ0,ζ0,τ)ϕ11(\mathcal{D},0,\epsilon_{0},\zeta_{0},\tau)\models\phi_{\ref{req:disappear}}.
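The precondition reasoning can be sketched in isolation. The hedged snippet below checks only the shifting-right part of the precondition: before disappearing, the object's left edge must be moving right and lie beyond 3/4 of the image width W (W = 1100 as assumed in the text); the function name is our own:

```python
def disappears_from_right(xmin_prev, xmin_curr, W=1100):
    """True iff the left edge moves right and ends beyond 3/4 of width W."""
    return xmin_curr > xmin_prev and xmin_curr > 0.75 * W

# ID = 3 moves from xmin 911 (frame 2) to 541 (frame 3): precondition fails.
print(disappears_from_right(911, 541))  # -> False
```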

Refer to caption
(a) The car inside the red rectangle is identified with ID 4 in frame 10, and it is annotated as “partly occluded”.
Refer to caption
(b) The same car as in Image (a) is bounded by the yellow rectangle, and it is annotated as “largely occluded” in frame 11.
Refer to caption
(c) The car inside the red rectangle is identified with ID 5 in frame 14, and it is annotated as “partly occluded”.
Refer to caption
(d) The same car as in Image (c) is bounded by the yellow rectangle, and it is annotated as “largely occluded” in frame 15.
Refer to caption
(e) The car inside the red rectangle is identified with ID 16 in frame 260, and it is annotated as “partly occluded”.
Refer to caption
(f) The same car as in Image (e) is bounded by the yellow rectangle, and it is annotated as “largely occluded” in frame 261.
Figure 7: We used the labels corresponding to the 390 images in the folder “0008” from the KITTI tracking benchmark. Our STPL monitoring tool detected frames 11, 15, and 261 with “largely occluded” labels as inconsistent annotations. Training data: http://www.cvlibs.net/datasets/kitti/eval_tracking.php; data format: https://github.com/JonathonLuiten/TrackEval/blob/master/docs/KITTI-format.txt

For the examples in the rest of this section, we focus on the real-time requirements that concern detected objects in perception systems.

Example V.6

The requirement below checks the robustness of the classification of the pedestrians in a perception system using time and spatial constraints.

Req. 12

If a pedestrian is detected with probability higher than 0.80.8 in the current frame, then for the next 11 second, the probability associated with the pedestrian should not fall below 0.70.7, and the bounding box associated with the pedestrian in the future should not overlap with another detected bounding box.

STPL (using Assumptions V.2-V.3):

ϕ12=Id1@x.((C(Id1)=PedP(Id1)>0.8)(τx1Id2.(Id1=Id2P(Id2)>0.7C(Id2)=PedId3.(Id2Id3¬(σ(Id2)σ(Id3))))))\phi_{\ref{req:rob-ped}}=\Box\forall Id_{1}@x.\Big{(}\big{(}C(Id_{1})=Ped\wedge P(Id_{1})>0.8\big{)}\implies\\ \Box\Big{(}\tau-x\leq 1\implies\\ \exists Id_{2}.\big{(}Id_{1}=Id_{2}\wedge P(Id_{2})>0.7\wedge C(Id_{2})=Ped\;\wedge\\ \forall Id_{3}.\big{(}Id_{2}\neq Id_{3}\implies\\ \neg\;\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}(\mathcal{\sigma}(Id_{2})\sqcap\mathcal{\sigma}(Id_{3}))\big{)}\big{)}\Big{)}\Big{)} (12)
Running Formula in Real Data Stream:

In Table II, there is no pedestrian associated with a probability higher than 0.8; therefore, the data stream 𝒟\mathcal{D} satisfies the above formula (𝒟,0,ϵ0,ζ0,τ)ϕ12(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\phi_{\ref{req:rob-ped}}.

Example V.7

The requirement below represents a situation in which all the adversarial cars in the frames from the current time forward are moving at least as fast as the ego car and in the same direction.

Req. 13

Always, the bounding boxes of the cars in the image frames do not expand.

STPL (using Assumptions V.2-V.3):

ϕ13=Id1@x.(C(Id1)=CarId2.((Id1=Id2C(Id2)=Car)Area(Id1)Area(Id2)))\phi_{\ref{req:bbx-not-expand}}=\Box\forall Id_{1}@x.\Big{(}C(Id_{1})=Car\implies\\ \Box\forall Id_{2}.\big{(}(Id_{1}=Id_{2}\wedge C(Id_{2})=Car)\implies\\ Area(Id_{1})\geq Area(Id_{2})\big{)}\Big{)} (13)

Notice that in the above example, the geometric position of the objects in the space is not required.

Running Formula in Real Data Stream:

In Table II, the bounding box of the car with ID=1 expands from frame 1 to frame 2. Therefore, the data stream 𝒟\mathcal{D} does not satisfy the above formula (𝒟,0,ϵ0,ζ0,τ)⊧̸ϕ13(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\not\models\phi_{\ref{req:bbx-not-expand}}.
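The area comparison can be reproduced directly from Table II. A minimal sketch (our own helper names) for the car with ID = 1:

```python
def area(b):
    """Area of an axis-aligned box (xmin, ymin, xmax, ymax)."""
    x1, y1, x2, y2 = b
    return (x2 - x1) * (y2 - y1)

def never_expands(boxes):
    """True iff the per-frame areas are non-increasing."""
    areas = [area(b) for b in boxes]
    return all(a >= b for a, b in zip(areas, areas[1:]))

# Car ID = 1 in frames 0-2 of Table II:
car1 = [(58, 151, 220, 287), (61, 152, 217, 283), (58, 143, 220, 271)]
print([area(b) for b in car1])  # -> [22032, 20436, 20736]
print(never_expands(car1))      # -> False: the area grows from frame 1 to 2
```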

To ease the presentation in the formalization of the requirements in the following examples, we introduce some derived spatial operators:

  • Subset: p1p2(p1¯p2)p_{1}\sqsubseteq p_{2}\;\equiv\;\leavevmode\resizebox{7.22743pt}{}{\framebox{$\forall$}}\big{(}\overline{p_{1}}\sqcup p_{2}\big{)}

  • Set equality: p1=p2p1p2p2p1p_{1}=p_{2}\;\equiv\;p_{1}\sqsubseteq p_{2}\wedge p_{2}\sqsubseteq p_{1}
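The derived operators above can be read set-theoretically. A hedged sketch using finite point sets as the spatial domain (the universal spatial quantification over the complement-union reduces to plain set inclusion):

```python
def subset(p1, p2):
    # p1 ⊑ p2 holds iff, everywhere, complement(p1) ∪ p2 covers the universe,
    # which for finite point sets is just p1 ⊆ p2.
    return p1 <= p2

def set_equal(p1, p2):
    return subset(p1, p2) and subset(p2, p1)

a, b = {(0, 0), (0, 1)}, {(0, 0), (0, 1), (1, 1)}
print(subset(a, b), set_equal(a, a), set_equal(a, b))  # -> True True False
```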

Example V.8

We want to formalize a requirement specification that applies restrictions on the positions and movements of the adversarial and the ego car in all the image frames. The following requirement can be thought of as a template to be used within other requirements.

Req. 14

The relative position and velocity of all the cars are fixed with respect to the ego car.

STPL (using Assumptions V.2-V.3):

ϕ14=Id1@x.Id2.(Id1=Id2σ(Id1)=σ(Id2))\phi_{\ref{req:fix-rel-vel-pos}}=\forall Id_{1}@x.\Box\exists Id_{2}.\Big{(}Id_{1}=Id_{2}\wedge\mathcal{\sigma}(Id_{1})=\mathcal{\sigma}(Id_{2})\Big{)}

Notice that we quantify over all objects in the first frame (Id1Id_{1}), and then we require that for all future times there is an object (Id2Id_{2}) with the same ID and bounding box. If in addition, we would like to impose the requirement on any new objects that we observe, then the formalization of the requirement would be:

ϕ14=Id1@x.Id2.(Id1=Id2σ(Id1)=σ(Id2))\begin{array}[]{rl}\phi_{\ref{req:fix-rel-vel-pos}}^{\prime}=&\Box\forall Id_{1}@x.\Box\exists Id_{2}.\\ &\qquad\qquad\Big{(}Id_{1}=Id_{2}\wedge\mathcal{\sigma}(Id_{1})=\mathcal{\sigma}(Id_{2})\Big{)}\end{array} (14)

Another formalization of the last requirement is

ϕ14′′=Id1.(sσ(Id1)=sσ(Id1))\phi_{\ref{req:fix-rel-vel-pos}}^{\prime\prime}=\Box\forall Id_{1}.\Big{(}\Box^{s}\mathcal{\sigma}(Id_{1})=\Diamond^{s}\mathcal{\sigma}(Id_{1})\Big{)} (15)
Running Formula in Real Data Stream:

All of the objects in Table II have different bounding boxes in different frames. Thus, the data stream 𝒟\mathcal{D} does not satisfy the above formulas (𝒟,0,ϵ0,ζ0,τ)⊧̸ϕ14(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\not\models\phi_{\ref{req:fix-rel-vel-pos}}.

Example V.9

In this example, we present a sample requirement for detecting occluded objects. The example is about a high-confidence OBJECT that suddenly disappears without getting close to the image borders. The requirement checks that if such an object existed and then disappeared, it must have been occluded by an object that was previously in close proximity.

Req. 15

If there exists a high confidence OBJECT, and in the next frame, it suddenly disappears without being close to the borders, then it must be occluded by another object.

STPL (using Assumptions V.2-V.3):

ϕ15=Id1@x.((φhighprobφfarbordersφdisap)φocc)\phi_{\ref{req:obj-occlusion}}=\Box\forall Id_{1}@x.\Big{(}\big{(}\varphi_{high}^{prob}\wedge\varphi_{far}^{borders}\wedge{\bigcirc}\varphi_{disap}\big{)}\implies\\ \varphi_{occ}\Big{)} (16)

and the predicates are defined as:

φhighprobP(Id1)>0.8\displaystyle\varphi_{high}^{prob}\equiv P(Id_{1})>0.8
φfarbordersLON(Id1,TM)>d1LON(Id1,BM)<d2\displaystyle\varphi_{far}^{borders}\equiv LON(Id_{1},\mbox{\footnotesize{TM}})>d_{1}\wedge LON(Id_{1},\mbox{\footnotesize{BM}})<d_{2}
LAT(Id1,LM)>d3LAT(Id1,RM)<d4\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\;\;\wedge LAT(Id_{1},\mbox{\footnotesize{LM}})>d_{3}\wedge LAT(Id_{1},\mbox{\footnotesize{RM}})<d_{4}
φdisapId2.(Id1Id2)\displaystyle\varphi_{disap}\equiv{\forall Id_{2}.(Id_{1}\neq Id_{2})}
φoccId3.Id4.(Id1Id3Id1=Id4(σ(Id4)(σ(Id3)sσ(Id3))))\varphi_{occ}\equiv\exists Id_{3}.\exists Id_{4}.\bigg{(}Id_{1}\neq Id_{3}\wedge Id_{1}=Id_{4}\;\wedge\\ {\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\Big{(}\mathcal{\sigma}(Id_{4})\sqcap}{\big{(}\mathcal{\sigma}(Id_{3})\sqcup\bigcirc^{s}\mathcal{\sigma}(Id_{3})\big{)}}\Big{)}\bigg{)}

In the above formulas, the subformula φfarborders\varphi_{far}^{borders} checks that the object identified by Id1Id_{1} is not close to the borders of the image frame at the current time, where d1d_{1}, d2d_{2}, d3d_{3}, and d4d_{4} are some integers. The subformula Id2.(Id1Id2)\forall Id_{2}.(Id_{1}\neq Id_{2}) checks whether the object disappeared. In the subformula φocc\varphi_{occ}, Id4Id_{4} refers to the object that disappears in the next frame, and Id3Id_{3} refers to the object that occludes it. The spatial subformulas check whether the bounding boxes of the two objects intersect in the current frame, or whether one bounding box in the current frame intersects the other bounding box in the next frame. Similarly, we can formalize the occlusion without using STE operators by rewriting φocc\varphi_{occ} as below

φoccId3.Id4.(Id1Id3Id1=Id4Dist(Id4,CT,Id3,CT)<d5)\varphi_{occ}\equiv\exists Id_{3}.\exists Id_{4}.\big{(}Id_{1}\neq Id_{3}\wedge Id_{1}=Id_{4}\;\wedge\\ Dist(Id_{4},\mbox{\footnotesize{CT}},Id_{3},\mbox{\footnotesize{CT}})<d_{5}\big{)} (17)

Note that the second formalization is less realistic because it puts a threshold on the Euclidean distance (i.e., d5d_{5} is a positive real number) between two objects to infer their overlap.
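The distance-based variant can be sketched as follows; treating two boxes as "occluding" when their center points are closer than a threshold d5 is, as noted, a coarser proxy than an actual intersection test. The helper names are ours:

```python
import math

def center(b):
    """Center point of an axis-aligned box (xmin, ymin, xmax, ymax)."""
    x1, y1, x2, y2 = b
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def centers_close(a, b, d5):
    """True iff the Euclidean distance between box centers is below d5."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by) < d5

# Overlapping cyclist/pedestrian boxes from frame 0 of Table II:
print(centers_close((479, 124, 690, 382), (522, 130, 632, 377), d5=50))  # -> True
```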

Running Formula in Real Data Stream:

All the objects in Table II are close to the borders of the images, so the data stream does not satisfy the subformula φfarborders\varphi_{far}^{borders}, and the antecedent of the implication never holds. Therefore, the data stream 𝒟\mathcal{D} satisfies the above formula (𝒟,0,ϵ0,ζ0,τ)ϕ15(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\phi_{\ref{req:obj-occlusion}}.

In the following, we used the above formula to partially validate the correctness of a training data stream that is used for object tracking.

Running Formula in Real Data Stream:

We used the formula in Eq. (16) to verify the correctness of the objects labeled as “largely occluded” in the KITTI tracking benchmark. In 17 of the 21 datasets (all except datasets “0013”, “0016”, “0017”, and “0018”), we detected inconsistencies in the “largely occluded” labels. For example, in the dataset “0008” (referred to as data stream 𝒟\mathcal{D}), we identified 3 frames in each of which a car was labeled as “largely occluded” inconsistently with the rest of the occluded labels. That is, we have (𝒟,0,ϵ0,ζ0,τ)⊧̸ϕ15(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\not\models\phi_{\ref{req:obj-occlusion}}. Our STPL monitoring tool detected the wrong labels by applying the STPL formula in Eq. (16) to the data stream in less than 2 seconds. The result of this experiment is shown in Fig. 7. We provide all the datasets as part of our monitoring tool.

Below, we formalize the requirement previously presented in Req. 2.

Req. 16

The frames per second of the camera is high enough so that for all detected cars, their bounding boxes self-overlap for at least 3 frames and for at least 10% of the area.

We interpret the above requirement to require an overlap check over a batch of three frames rather than three consecutive frames.

STPL (using Assumptions V.2-V.3):

ϕ16=Id1@x.((wId3.(Id1Id3))((x1x3)Id2.(Id1=Id2Ratio(Area(σ(Id1)σ(Id2)),Area(σ(Id2)))0.1)))\phi_{\ref{self-overlap}}=\Box\forall Id_{1}@x.\bigg{(}{\big{(}\varodot_{w}\forall Id_{3}.(Id_{1}\neq Id_{3})\big{)}\implies}\\ \Box\Big{(}\big{(}\mathcal{F}-x\geq 1\;\wedge\;\mathcal{F}-x\leq 3\big{)}\implies\\ \forall Id_{2}.\big{(}Id_{1}=Id_{2}\;\implies\\ Ratio(Area(\mathcal{\sigma}(Id_{1})\sqcap\mathcal{\sigma}(Id_{2})),\\ Area(\mathcal{\sigma}(Id_{2})))\geq 0.1\big{)}\Big{)}\bigg{)} (18)

In the above formula, the spatial subformula is an inequality over a ratio function. Its first parameter represents the occupied area of the intersection of the bounding boxes of any two objects obj1obj_{1} and obj2obj_{2}, referred to by Id1Id_{1} and Id2Id_{2}, respectively. The second parameter denotes the area of obj2obj_{2}. For the inequality to be satisfied, first, there must be a non-empty intersection between the two objects. Second, the ratio of the intersected area to the area of the second object must be at least 10%10\%.
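The ratio check can be sketched directly for axis-aligned boxes. This is a hedged illustration with our own helper names; the intersection area of two boxes divided by the area of the second box must reach the 10% threshold:

```python
def inter_area(a, b):
    """Intersection area of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    w = min(ax2, bx2) - max(ax1, bx1)
    h = min(ay2, by2) - max(ay1, by1)
    return max(0, w) * max(0, h)

def overlap_ratio_ok(a, b, threshold=0.1):
    bx1, by1, bx2, by2 = b
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter_area(a, b) / area_b >= threshold

# The car with ID = 1 in frames 0 and 1 of Table II overlaps itself heavily:
print(overlap_ratio_ok((58, 151, 220, 287), (61, 152, 217, 283)))  # -> True
```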

Running Formula in Real Data Stream:

The data stream 𝒟\mathcal{D} in Table II satisfies the above formula (𝒟,0,ϵ0,ζ0,τ)ϕ16(\mathcal{D},0,\epsilon_{0},\zeta_{0},{\tau})\models\phi_{\ref{self-overlap}}.

V-D Examples of Space and Time Requirements with 3D environment

V-D1 Missed classification scenario

Refer to caption
(a) Annotated LiDAR point cloud image at time t0t_{0}.
Refer to caption
(b) Annotated LiDAR point cloud image at time t1t_{1}.
Refer to caption
(c) Annotated camera image for the LiDAR image on the left.
Refer to caption
(d) Annotated camera image for the LiDAR image on the left.
Refer to caption
(e) Colored semantic segmented of the LiDAR data at time t0t_{0}.
Refer to caption
(f) Colored semantic segmented of the LiDAR data at time t1t_{1}.
Figure 8: Two LiDAR and Camera frames annotated data along with the semantic segmentation data are taken from NuScenes LidarSeg dataset (Scene-0247) (Caesar et al. (2020)).
Refer to caption
(a) x-y plane bird-eye view of LiDAR point cloud at time t0t_{0}.
Refer to caption
(b) x-y plane bird-eye view of LiDAR point cloud at time t1t_{1}.
Figure 9: The bird-view of LiDAR point clouds for the two consecutive frames in Fig. 8 where the drivable area in front of the ego is shown.

Here, we show by example how 3D reasoning is possible using STPL; specifically, how to check whether the perception system misses important objects. The six images in Figure 8 illustrate annotated information for two consecutive frames taken from the nuScenes-lidarseg dataset (https://www.nuscenes.org/nuscenes?sceneId=scene-0274&frame=0&view=lidar) (Caesar et al. (2020)). For each detected class of objects, a color coding represents the object class in the images: orange represents cars, and cyan represents drivable regions. The color-coded chart of the classes, along with their frequency in the whole dataset, can be found online (Motional (2019)). In the first frame, at time t0t_{0}, the annotated LiDAR image is shown in Fig. 8(a). The corresponding camera image for the LiDAR scene is shown in Fig. 8(c). There is a car in this image that we point to with a red arrow. As one can check, there is no point cloud data associated with the identified car in the LiDAR image, and as a result, it is not detected/recognized as a car. In the next images, illustrated in Fig. 8(b,d), the car undetected at t0t_{0} is identified and annotated as a car at time t1t_{1}. Unlike before, there is some point cloud data associated with the car by which it was detected. The semantic segmentation of the corresponding images is shown in Fig. 8(e-f). First, we discuss how to decide whether there is a missed object in datasets similar to the one presented here. Second, we present a requirement specification in STPL that formalizes such missed-detection scenarios.

Example V.10

For this and the next example scenarios, we have 3D coordinates of bounding volumes and 3D point clouds representing each detected object in the frames. However, the height of those points and cubes is not helpful in these cases, and therefore, we map them to the x-y plane by ignoring the z-axis of the sensor positions (see Motional (2019)).

Let us assume that the origin of the coordinate system is the center of the ego vehicle, the y-axis is positive for all the points in front of the ego car, and the x-axis is positive for all the points to the right of the ego. That is, we have 2D bounding boxes of classified objects that are mapped to a flat ground-level environment, as shown in Fig. 9. Therefore, we do not need to define new spatial functions for 3D reasoning.
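The mapping described above can be sketched as follows: drop the z-axis of the corners of a 3D bounding volume and keep the ego-centered x (right) and y (forward) coordinates, yielding a 2D box on the ground plane. The function name and the toy cuboid are our own:

```python
def to_bev_box(corners3d):
    """corners3d: iterable of (x, y, z) corner points of a bounding volume.
    Returns the bird's-eye-view box (xmin, ymin, xmax, ymax), ignoring z."""
    xs = [p[0] for p in corners3d]
    ys = [p[1] for p in corners3d]
    return (min(xs), min(ys), max(xs), max(ys))

# A toy cuboid 2 m wide, 4 m deep, 1.5 m tall, 10 m ahead of the ego:
cube = [(-1, 10, 0), (1, 10, 0), (-1, 14, 0), (1, 14, 0),
        (-1, 10, 1.5), (1, 10, 1.5), (-1, 14, 1.5), (1, 14, 1.5)]
print(to_bev_box(cube))  # -> (-1, 10, 1, 14)
```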

Req. 17

If there is a car driving toward the ego car in the current frame, it must exist in the previous frame.

STPL (using Assumptions V.2-V.3):

ϕ17=Id1.Id3@x.((φexistcarφclosedist)wφexistedbefore)\phi_{\ref{req:drive-toward}}=\Box\forall Id_{1}.\forall Id_{3}@x.\Big{(}\big{(}\varphi_{exist}^{car}\wedge\varphi_{close}^{dist}\big{)}\implies\\ {\varodot_{w}}\;\varphi_{existed}^{before}\Big{)} (19)

and the predicates are defined as:

φexistcar(C(Id1)=Car)(C(Id3)=Drv)\displaystyle\varphi_{exist}^{car}\equiv\big{(}C(Id_{1})=Car\big{)}\wedge\big{(}C(Id_{3})=Drv\big{)}
φclosedistLON(Id1,BM)1.2×LON(Id3,BM)\displaystyle\varphi_{close}^{dist}\equiv LON(Id_{1},\mbox{\footnotesize{BM}})\leq 1.2\times LON(Id_{3},\mbox{\footnotesize{BM}})\wedge
LON(Id1,TM)0.5×LON(Id3,BM)\displaystyle LON(Id_{1},\mbox{\footnotesize{TM}})\geq 0.5\times LON(Id_{3},\mbox{\footnotesize{BM}})\wedge
LAT(Id1,CT)LAT(Id3,LM)\displaystyle LAT(Id_{1},\mbox{\footnotesize{CT}})\geq LAT(Id_{3},\mbox{\footnotesize{LM}})\wedge
LAT(Id1,CT)LAT(Id3,RM)\displaystyle LAT(Id_{1},\mbox{\footnotesize{CT}})\leq LAT(Id_{3},\mbox{\footnotesize{RM}})
φexistedbeforeId2.(C(Id2)=CarId1=Id2\displaystyle\varphi_{existed}^{before}\equiv\exists Id_{2}.\big{(}C(Id_{2})=Car\wedge Id_{1}=Id_{2}\wedge
Dist(Id2,CT,𝕌,CT)>Dist(Id1,CT,𝕌,CT))\displaystyle Dist(Id_{2},\mbox{\footnotesize{CT}},{\Large{\mathbb{U}}},\mbox{\footnotesize{CT}})>Dist(Id_{1},\mbox{\footnotesize{CT}},{\Large{\mathbb{U}}},\mbox{\footnotesize{CT}})\big{)}

Note that in the above 2D spatial functions, we used BM and TM in contrast to their original semantics in the previous examples, due to the change of origin of the universe. Also, we assume that the perception system uses an accurate object tracker that assigns unique IDs to objects across consecutive frames. We use Id1Id_{1}, Id2Id_{2}, and Id3Id_{3} to refer to a car in the current frame, a car in the previous frame, and the drivable region, respectively. Note that we need the previous frame operator to check whether a car that exists in the current frame was present in the previous frame. That is, in φexistedbefore\varphi_{existed}^{before}, we check whether the car identified by Id2Id_{2} matches the same car in the previous frame. The antecedent of the formula in Eq. (19) holds in the current frame, but its consequent fails because the car that exists in the current frame did not exist in the previous frame. Thus, the data stream shown as image frames in Fig. 9 falsifies the formula.

It is more realistic to formulate the same requirement using lane-based coordinate systems, if the perception system detects different lanes. Consequently, we can simplify the subformula φclosedist\varphi_{close}^{dist} to encode if the cars drive in the same lane.

V-D2 3D occlusion scenario

Refer to caption
(a) Annotated camera image for the LiDAR image on the right.
Refer to caption
(b) Annotated camera image for the LiDAR image on the right.
Refer to caption
(c) Annotated camera image for the LiDAR image on the right.
Refer to caption
(d) Annotated camera image for the LiDAR image on the right.
Refer to caption
(e) Annotated LiDAR point cloud image at time t0t_{0}.
Refer to caption
(f) Annotated LiDAR point cloud image at time t1t_{1}.
Refer to caption
(g) Annotated LiDAR point cloud image at time t2t_{2}.
Refer to caption
(h) Annotated LiDAR point cloud image at time t3t_{3}.
Figure 10: Two LiDAR and Camera frames annotated data are taken from NuScenes LidarSeg dataset (Scene-0399) (Caesar et al. (2020)). The images are taken by the back-left camera as represented in sensor-position image available at nuScenes website (Motional (2019)).

In Figure 10, there are four frames captured at t0t_{0} to t3t_{3}. In the second and third frames, an occluded car (pointed to by a red arrow) is detected. The perception system detects a car for which there is no information from the sensors (the cubes in frames 2 and 3 are empty, and the cameras cannot see the occluded vehicle). Although this can be a desired behavior for a perception system that does object tracking, here we formalize a requirement that flags similar cases as wrong classifications.

For this example, we need to define a new 3D spatial function Visible by which we can check whether two given cars are visible from the view of a third car. Additionally, we need to add a data attribute Empty to the data-object stream 𝒟\mathcal{D} that determines whether the bounding volume of an object is empty (e.g., (𝒟(i),ID).PC=\mathcal{R}(\mathcal{D}(i),ID).{PC}=\emptyset checks if there is no point cloud data associated with the object identified by IDID). For instance, by (𝒟(i),ID).Empty\mathcal{R}(\mathcal{D}(i),ID).Empty we check whether the object identified by IDID in the ii’th frame is empty. We use the function E(id)E(id) as new syntax for this check.

[\![V(id,\mbox{\footnotesize{CRT}},id,id)\sim r]\!](\mathcal{D},i,\epsilon,\zeta,{\tau}):=\left\{\begin{array}[]{ll}\top&\mbox{ if }f_{visible}\big{(}id,\mbox{\footnotesize{CRT}},id,id,\mathcal{D},i,\epsilon,\zeta\big{)}\sim r\\ \bot&\mbox{ otherwise}\end{array}\right.

𝒇𝒗𝒊𝒔𝒊𝒃𝒍𝒆(𝒊𝒅𝟏,CRT,𝒊𝒅𝟐,𝒊𝒅𝟑,𝓓,𝒊,ϵ,𝜻)\boldsymbol{f_{visible}(id_{1},\mbox{\footnotesize{CRT}},id_{2},id_{3},\mathcal{D},i,\epsilon,\zeta)} returns true if the ϵ(id2)\epsilon(id_{2}) and ϵ(id3)\epsilon(id_{3}) objects are visible from the viewpoint CRT of ϵ(id1)\epsilon(id_{1}) in the kk’th frame, where kζ(id1)k\leftarrow\zeta(id_{1}) if ζ(id1)\zeta(id_{1}) is specified, and kik\leftarrow i otherwise. If one or both are invisible from the viewpoint of object ϵ(id1)\epsilon(id_{1}), then the function returns false.
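Since the visibility function is only characterized informally above, here is a hedged 2D sketch of one possible reading (our own interpretation and names, not the paper's definition): a target box is visible from a viewpoint if the segment from the viewpoint to the box center does not pass through any other (occluding) box.

```python
def seg_hits_box(p, q, box, steps=200):
    """Coarsely sample points along segment p->q and test membership in box."""
    x1, y1, x2, y2 = box
    for k in range(1, steps):
        t = k / steps
        x, y = p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])
        if x1 < x < x2 and y1 < y < y2:
            return True
    return False

def visible(viewpoint, target_box, occluders):
    """True iff no occluder blocks the line of sight to the target's center."""
    cx = (target_box[0] + target_box[2]) / 2
    cy = (target_box[1] + target_box[3]) / 2
    return not any(seg_hits_box(viewpoint, (cx, cy), b) for b in occluders)

ego = (0.0, 0.0)
far_car = (-3, 20, -1, 24)   # car on the left, about 20 m ahead of the ego
blocker = (-3, 8, -1, 12)    # another car between the ego and far_car
print(visible(ego, far_car, []))         # -> True
print(visible(ego, far_car, [blocker]))  # -> False
```

The sampling-based segment test is a deliberate simplification; an exact slab (ray-box) intersection would replace it in a real monitor.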

Example V.11

The requirement below checks the robustness of object detection in a perception system using spatial constraints and functions on point cloud data.

Req. 18

Any detected car (using the LiDAR data) driving in the same direction on the left side of the ego car must include some point cloud data in its bounding volume unless it is occluded.

STPL (using Assumptions V.1-V.3):

ϕ18=Id1.Id2.((¬E(Id1)¬E(Id2)(Id1Id2)(C(Id1)=Car)(C(Id2)=Car)LAT(Id1,RM)<1LAT(Id2,RM)<1LAT(Id1,RM)>6LAT(Id2,RM)>6MD(Id1)=MD(Id2)V(𝕌,CT,Id1,Id2)(¬V(𝕌,CT,Id1,Id2)))(V(𝕌,CT,Id1,Id2)(E(Id1)¬E(Id2))))\phi_{\ref{req:point-cloud-occlusion}}=\Box\forall Id_{1}.\forall Id_{2}.\bigg{(}\Big{(}\neg E(Id_{1})\wedge\neg E(Id_{2})\wedge(Id_{1}\neq Id_{2})\\ \wedge\big{(}C(Id_{1})=Car\big{)}\wedge\big{(}C(Id_{2})=Car\big{)}\\ \wedge LAT(Id_{1},\mbox{\footnotesize{RM}})<-1\wedge LAT(Id_{2},\mbox{\footnotesize{RM}})<-1\\ \wedge LAT(Id_{1},\mbox{\footnotesize{RM}})>-6\wedge LAT(Id_{2},\mbox{\footnotesize{RM}})>-6\\ \wedge{MD(Id_{1})=MD(Id_{2})}\\ \wedge V({\Large{\mathbb{U}}},\mbox{\footnotesize{CT}},Id_{1},Id_{2})\\ \wedge\bigcirc\big{(}\neg V({\Large{\mathbb{U}}},\mbox{\footnotesize{CT}},Id_{1},Id_{2})\big{)}\Big{)}\implies\\ \bigcirc\Big{(}V({\Large{\mathbb{U}}},\mbox{\footnotesize{CT}},Id_{1},Id_{2})\\ \;\mathcal{R}\;\\ \big{(}E(Id_{1})\wedge\neg E(Id_{2})\big{)}\Big{)}\bigg{)} (20)

In the above formula, we keep track of two cars using Id1Id_{1} and Id2Id_{2} that are to the left of the ego car (their lateral distance from the left side of the ego car must be between 1 and 6 meters). We assume that the perception system returns the moving direction of each classified object, and we use the function MDMD to retrieve the direction for a given object identifier. Note that this function can be replaced by a spatial formula that uses the coordinates of an object in different frames to calculate its moving direction. The subformula V(𝕌,CT,Id1,Id2)V({\Large{\mathbb{U}}},\mbox{\footnotesize{CT}},Id_{1},Id_{2}) returns true if there is a visible angle between the two cars from the point of view of the ego car (i.e., the center point of 𝕌{\Large{\mathbb{U}}}). We draw a sample visible angle in yellow in the first and fourth frames. For the pair of cars for which we could find a visible angle in the first frame, there is none in frames 2 and 3. Additionally, there is no point cloud in the occluded car’s bounding volume in frames 2 and 3, which satisfies the antecedent of the implication. The consequent of the implication is a release formula that must hold from the next time/frame. In the release formula, the right-hand side requires the occluded object to be empty, and only when it becomes visible again can it be non-empty. The whole formula is thus satisfied for the pair of cars discussed here; that is, the data stream shown as image frames in Fig. 10 satisfies the formula in Eq. (20).

VI Monitoring Algorithm

The offline monitoring algorithm for STPL is one of the main contributions of this work. We use a dynamic programming (DP) approach similar to Dokhanchi et al. (2016); Fainekos et al. (2012) since we are operating over data streams (timed state sequences). To keep the presentation as succinct as possible, we focus only on the future fragment of STPL. We divide our monitoring algorithm into four algorithms that work in a modular way. We present the pseudocode of the algorithms in order to analyze the worst-case complexity of our STPL monitoring framework. In the following, we use the formula

φ:=Id1@x.Id2.((σ(Id1)¯σ(Id2))(σ(Id2)¯σ(Id1))Id1=Id2)\varphi:=\Box\forall Id_{1}@x.\Box\exists Id_{2}.\Big{(}\leavevmode\resizebox{7.22743pt}{}{\framebox{$\forall$}}\big{(}\overline{\mathcal{\sigma}(Id_{1})}\sqcup\mathcal{\sigma}(Id_{2})\big{)}\wedge\\ \leavevmode\resizebox{7.22743pt}{}{\framebox{$\forall$}}\big{(}\overline{\mathcal{\sigma}(Id_{2})}\sqcup\mathcal{\sigma}(Id_{1})\big{)}\wedge Id_{1}=Id_{2}\Big{)}

to explain some of the aspects of the future fragment of our monitoring algorithm.

VI-A Algorithm-1

This algorithm represents the main dynamic programming body of the monitoring algorithm. It receives an STPL formula φ\varphi, a data stream ρ^\hat{\rho}, and dynamic programming tables MM and FF. The formula is divided into subformulas as in the example in Fig. 11, that is, based on the presence of freeze time operators (φi\varphi_{i}) and spatial quantifiers (φi\varphi_{i}^{\prime}). It is important to note that nested subformulas have lower indices, e.g., φ1\varphi_{1} is nested in φ2\varphi_{2} as in Fig. 11. Notice that when we use subindices, i.e., φi,j\varphi_{i,j}, we refer to the formula in the scope of the ii’th freeze variable with the jj’th index from the root in depth-first-search order.

The main loop of Algorithm 1 has four nested loops (lines 4-20). Further nested loops exist in ComputeFnExprComputeFnExpr (Algorithm 3). These nested loops are the source of the computational complexity of our monitoring algorithm. The loops in lines 4 and 5 are responsible for exploring all possible assignments to the freeze time variables, starting from the innermost nested freeze time operators. Line 6 explores the data stream backward in time to resolve the future fragment of Linear Temporal Logic (LTL) and the S4uS4_{u} terms. Lines 10-14 identify whether the subformula φk,j\varphi_{k,j} is a spatio-temporal operator and call the respective function.
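The loop nesting described above directly determines the iteration count of the DP pass. The hypothetical helper below (names are ours) only counts the innermost iterations of the structure in lines 4-20, which makes the quadratic dependence on the stream length explicit:

```python
def monitor_iteration_count(num_freeze_vars, stream_len, num_subformulas):
    """Count the innermost iterations of Algorithm 1's nesting:
    freeze variables (line 4) x frozen frames t (line 5) x a backward
    pass from the last frame down to t (line 6) x subformulas (line 8)."""
    count = 0
    for _k in range(num_freeze_vars):
        for t in range(stream_len):
            # backward iteration over frames u = |rho|-1, ..., t
            for _u in range(stream_len - 1, t - 1, -1):
                count += num_subformulas
    return count
```

For instance, with two freeze variables, a three-frame stream, and two subformulas per freeze block, the innermost body runs 2 x (3+2+1) x 2 = 24 times, illustrating the O(|V| x |φ| x |ρ̂|²) behavior.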

Refer to caption
Figure 11: A parse tree for the formula in Eq. 14. Distinct regions represent subformulas φ2\varphi_{2}, φ1\varphi_{1}, φ1\varphi^{\prime}_{1}, and φ2\varphi^{\prime}_{2}. In each region, the subformulas subscript into the operators and predicates levels.

VI-B Algorithm-2

This algorithm computes the values of the DP table MM as defined in the temporal semantics of the STPL logic in Def. IV-B2. For a given subformula, it first determines the operator and then fills the current columns of the DP tables accordingly. Most lines of the algorithm are straightforward whenever the DP table FF does not appear in the assignments. The idea behind using the separate DP table FF for the freeze subformulas is to store the result of their evaluation at each frozen time, and then update the DP table MM accordingly. Table MM has two columns for each subformula, one for the current time and another for the next time. We use u01u_{01} and u10u_{10} to refer to the current column (current frame) for updating and the next column (next frame) for reading, respectively. Therefore, we need to toggle the values of these variables (between 0 and 1) to change the read column into the write column and vice versa. Table FF has two rows for each frame, one for the current time and the other for the next time. Similarly, we use t01t_{01} and t10t_{10} to refer to the current row for updating and the next row for reading, respectively, and toggle their values (between 0 and 1) to swap the read and write rows. The M[j,u01].valM[j,u_{01}].val data-member of the DP table MM stores the current evaluation result of the jj’th subformula φj\varphi_{j} at the current frame. The other data-members are used to update min/max values for the quantifier operators. The ITIT data-member is a table of size (SO)|Vid|(S_{O})^{|V_{id}|}, where SOS_{O} is the maximum number of objects over all input frames, and VidV_{id} is the set of the IdId variables in the scope of the evaluated subformula φj\varphi_{j}. In Alg. 2, the only computationally expensive part is the update of this table in line 24. 
In practice, the table update is not an issue since the number of idid variables is small (at most 4-5 object variables in the examples in this paper), and the update can be parallelized.
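To illustrate the column-toggling scheme, the sketch below evaluates a single until subformula over Boolean traces with a two-column DP row. It is a simplified stand-in for Algorithm 2 (no freeze tables, no quantifiers, no intervals), with our own function and variable names:

```python
def until_at_zero(left, right):
    """Evaluate (left U right) at time 0 over finite Boolean traces
    using a two-column DP row, toggling the write column u01 and read
    column u10 in the style of Algorithm 2's table M."""
    n = len(left)
    M = [[False, False]]   # one row per subformula; here only the until
    u01 = 0                # write column (current frame)
    for u in range(n - 1, -1, -1):      # backward pass over frames
        u10 = 1 - u01                   # read column (holds frame u+1)
        if u == n - 1:
            M[0][u01] = right[u]        # base case at the last frame
        else:
            M[0][u01] = right[u] or (left[u] and M[0][u10])
        u01 = 1 - u01                   # toggle for the next frame
    return M[0][1 - u01]                # last written column
```

Only two columns are ever stored, mirroring how the monitor avoids keeping the full |ρ̂|-length table per subformula.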

Algorithm 1 STPL Monitor

Input: φ\varphi, ρ^\hat{\rho}; Global variables: M|φ|×2M_{|\varphi|\times 2}, F2×|ρ^|F_{2\times|\hat{\rho}|}; Output: M[1,0]M[1,0].

Comments:
M¯|φ|×|ρ^|\overline{M}_{|\varphi^{\prime}|\times|\hat{\rho}|} is a table used to store the evaluation of spatial subformulas. For all the tables, the out-of-index accesses are handled case by case (i.e., \top or \bot is a default value)

Procedure STPL-Monitor(φ\varphi, ρ^\hat{\rho})

1:u010u_{01}\leftarrow 0 and t010t_{01}\leftarrow 0 \triangleright toggle between 0 and 1
2:φ\varphi^{\prime}\leftarrow Collect all spatial terms, arithmetic expressions, and functions
3:VV\leftarrow Collect all freeze time variables in φ\varphi
4:for k1 to |V|k\leftarrow 1\mbox{ to }|V| do
5:    for t0 to |ρ^|1t\leftarrow 0\mbox{ to }|\hat{\rho}|-1 do \triangleright freeze (store) the frame
6:        for u|ρ^|1 to tu\leftarrow|\hat{\rho}|-1\mbox{ to }t do
7:\triangleright backward iteration for the future fragment of STPL
8:           for jφk.max down to φk.minj\leftarrow\varphi_{k}.max\mbox{ down to }\varphi_{k}.min do
9:\triangleright iterate over subformulas under each freeze time operator
10:               if φjφ{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\varphi_{j}}\in\varphi^{\prime} then
11:                  ComputeFnExpr(φ,j,u,u01,t,ρ^)ComputeFnExpr(\varphi,{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}j},u,u_{01},t,\hat{\rho})
12:               else
13:                  ComputeLTL(φ,j,u,u01,t,t01,k)ComputeLTL({\varphi,j},u,u_{01},t,t_{01},k)
14:               end if
15:           end for
16:           u01toggle(u01){u_{01}\leftarrow toggle(u_{01})}
17:        end for
18:    end for
19:    t01toggle(t01){t_{01}\leftarrow toggle(t_{01})}
20:end for
21:k|V|+1k\leftarrow|V|+1 \triangleright handle the root node
22:if φk is not a freeze operator\varphi_{k}\mbox{ is not a freeze operator} then
23:    Repeat steps 6 to 17 for t=0t=0
24:else
25:    u01toggle(u01){u_{01}\leftarrow toggle(u_{01})} , t01toggle(t01){t_{01}\leftarrow toggle(t_{01})}
26:    M[1][u01]=F[t01][0]{M[1][u_{01}]=F[t_{01}][0]}
27:end if
28:return M[1,u01]M[1,u_{01}]

end procedure

VI-C Algorithm-3

This algorithm computes the identifier table ITIT values of the DP table MM. Here, we compute spatial functions and quantifier operators. For a given spatial function Fn()Fn(\dots), if it has nested function operators, this algorithm recursively computes the final values. The ITIT table stores the value of each functional expression under every combinatorial assignment of objects to the object variables. In line 10, if a combination of object identifiers falls outside the range of existing objects in the current frame or a frozen frame, then \bot (false) is assigned. Otherwise, if any parameter is a spatial formula, Algorithm 4 is executed. The size of the ITIT table is the maximum number of object combinations with respect to the object variables. For example, if there are at most five objects per frame in an object stream, and a formula has at most three nested object variables, then the size of ITIT is 53=1255^{3}=125. We size the ITIT table for the worst case, but for efficient computation we only compute as many combinations as each quantifier subformula needs, based on its scope. Therefore, we mark the remaining cells of the table with NaN; the tables of different subformulas are later merged in Algorithm 2 by calling function UpdateIdTableUpdateIdTable at line 24.
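The combinatorial enumeration behind the IT table can be sketched in a few lines. The helper below is illustrative (names are ours) and materializes all (S_O)^{|V_id|} assignments, whereas the implementation only fills as many rows as the quantifier scope requires:

```python
from itertools import product

def id_table_rows(max_objects, id_vars):
    """Enumerate the (S_O)^{|V_id|} rows of an IT-style table: every
    combinatorial assignment of concrete object indices to the object
    variables in scope of the evaluated subformula."""
    return [dict(zip(id_vars, combo))
            for combo in product(range(max_objects), repeat=len(id_vars))]
```

With five objects per frame and three object variables, this yields the 5^3 = 125 rows mentioned in the example above.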

Algorithm 2 LTL Monitor

Input: φ,j,u,u01,t,t01,k{\varphi,j},u,u_{01},t,t_{01},{k};

procedure ComputeLTL(φ,j,u,u01,t,t01,k{\varphi,j},u,u_{01},t,t_{01},{k})

1:u10toggle(u01)u_{10}\leftarrow toggle(u_{01}),t10toggle(t01)t_{10}\leftarrow toggle(t_{01})
2:if φj\varphi_{j}\equiv\top  then
3:    M[j,u01].valM[j,u_{01}].val\leftarrow\top
4:else if φjτvkr\varphi_{j}\equiv\tau-v_{k}\sim r then
5:    if (τuτt)r(\tau_{u}-\tau_{t})\sim r then
6:        M[j,u01].valM[j,u_{01}].val\leftarrow\top
7:    else
8:        M[j,u01].valM[j,u_{01}].val\leftarrow\bot
9:    end if
10:else if φjvkr\varphi_{j}\equiv\mathcal{F}-v_{k}\sim r then
11:    if (ut)r(u-t)\sim r then
12:        M[j,u01].valM[j,u_{01}].val\leftarrow\top
13:    else
14:        M[j,u01].valM[j,u_{01}].val\leftarrow\bot
15:    end if
16:else if φj\varphi_{j} is a freeze subformula then
17:    M[j,u01]M[j+1,u01]M[j,u_{01}]\leftarrow M[j+1,u_{01}]
18:    if φj\varphi_{j} has an \exists quantifier then
19:        M[j,u01].valM[j,u01].maxM[j,u_{01}].val\leftarrow M[j,u_{01}].max
20:    else
21:        M[j,u01].valM[j,u01].minM[j,u_{01}].val\leftarrow M[j,u_{01}].min
22:    end if
23:    F[t01][t]M[j,u01]F[t_{01}][t]\leftarrow M[j,u_{01}]
24:    UpdateIdTable(M[j,u01].IT)UpdateIdTable(M[j,u_{01}].IT)
25:else if φj¬φm\varphi_{j}\equiv\neg\varphi_{m} then
26:    M[j,u01]=¬M[m,u01]M[j,u_{01}]=\neg M[m,u_{01}] \triangleright Apply ¬\neg to val, min, max, IT
27:else if φjφmφn\varphi_{j}\equiv\varphi_{m}\vee\varphi_{n} then
28:    M[j,u01]M[j,u_{01}]\leftarrow max(M[m,u01],M[n,u01]M[m,u_{01}],M[n,u_{01}])
29:else if φjφm\varphi_{j}\equiv\bigcirc\varphi_{m} then
30:    if u=|ρ^|1u=|\hat{\rho}|-1 then
31:        M[j,u01]M[j,u_{01}]\leftarrow\bot
32:    else if φm\varphi_{m} is followed by @{x,}@\{x,\dots\} then
33:        M[j,u01]F[t10,u+1]M[j,u_{01}]\leftarrow F[t_{10},u+1]
34:    else
35:        M[j,u01]M[m,u10]M[j,u_{01}]\leftarrow M[m,u_{10}]
36:    end if
37:else if φjφm𝒰φn\varphi_{j}\equiv\varphi_{m}\,\mathcal{U}\varphi_{n} then
38:    if u=|ρ^|1u=|\hat{\rho}|-1 then
39:        M[j,u01]M[n,u01]M[j,u_{01}]\leftarrow M[n,u_{01}]
40:    else
41:        if φm\varphi_{m} is followed by x.x. then
42:           M[m,u01]M[m,u_{01}]\leftarrow F[t10,u]F[t_{10},u]
43:        end if
44:        if φn\varphi_{n} is followed by x.x. then
45:           M[n,u01]M[n,u_{01}]\leftarrow F[t10,u]F[t_{10},u]
46:        end if
47:        M[j,u01]M[j,u_{01}]\leftarrow     max(M[n,u01],M[n,u_{01}], min(M[m,u01],M[j,u10]M[m,u_{01}],M[j,u_{10}]))
48:    end if
49:end if

end procedure

VI-D Algorithm-4

This algorithm computes the values of the DP table M¯\overline{M}. It evaluates spatio-temporal formulas similarly to the temporal formulas of Algorithm 2, with the modification that set operations are now used instead of Boolean operations. The data structures for the set representation and the corresponding set operations can affect the computational complexity of Algorithm 4. Depending on the application, e.g., 2D vs. 3D, a number of solutions are possible, ranging from a linked list of pairs of 2D points (i.e., a union of rectangles) to quadtrees or octrees (see the next section).
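As one possible set representation, a fixed-resolution occupancy grid makes every set operation an elementwise Boolean operation with cost proportional to the number of cells. The class below is an illustrative sketch, not the data structure used in our implementation (which may be a linked list of rectangles or a quadtree):

```python
import numpy as np

class GridRegion:
    """Spatial term over a fixed occupancy grid: the set operations of
    Algorithm 4 become elementwise Boolean ops, each costing O(r^d)
    for a grid of resolution r per axis in d dimensions."""
    def __init__(self, mask):
        self.mask = np.asarray(mask, dtype=bool)
    def union(self, other):
        return GridRegion(self.mask | other.mask)       # squcup
    def intersection(self, other):
        return GridRegion(self.mask & other.mask)       # sqcap
    def complement(self):
        return GridRegion(~self.mask)                   # overline
```

The trade-off is a constant-factor cost per operation in exchange for avoiding the fragmentation blow-up of unstructured region lists.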

Algorithm 3 Compute Function Expression

Input: φ,k,u,u01,t,ρ^\varphi,k,u,u_{01},t,\hat{\rho};

procedure ComputeFnExpr(φ,k,u,u01,t,ρ^\varphi,k,u,u_{01},t,\hat{\rho})

1:φkFn(Ω1,,Ωn)r\varphi_{k}\equiv Fn(\Omega_{1},\dots,\Omega_{n})\sim r where     rr\in\mathbb{R}, \sim\;\in {>,<,,,=,!=}\{>,<,\geq,\leq,=,!=\},     and FnFn is a reserved function name
2:Pn×1P_{n\times 1} is a list used to store the evaluated arguments of FnFn
3:Sid(SO)|Vid|S_{id}\leftarrow(S_{O})^{|V_{id}|} where     SOS_{O} is the maximum number of objects in all the frames     and VidV_{id} is the set of the object variables in the scope of φk\varphi_{k}
4:SOcfS_{O}^{cf}\leftarrow number of objects in the current frame uu
5:SOffS_{O}^{ff}\leftarrow number of the objects in the freeze frame tt
6:FIDF_{ID}\leftarrow is a map s.t.     IdVid,{FID[Id]=true|``Id@vx."φ\forall Id\in{V_{id}},\{F_{ID}[Id]=true\;|\;\exists\;``\exists Id@v_{x}."\in\varphi     or ``Id@vx."φ},vxVt}``\forall Id@v_{x}."\in\varphi\},v_{x}\in V_{t}\}
7:for i1toSidi\leftarrow 1\;{to}\;S_{id} do
8:\triangleright iterates over combinatorial assignments of values to the IDs
9:    {Id1,Id|Vid|}\{Id_{1},\dots Id_{|V_{id}|}\}\leftarrow i’th combinatorial of {1,SO}\{1,\dots S_{O}\}     to {Idj|IdjVid,1j|Vid|}\{Id_{j}\;|\;Id_{j}\in V_{id},1\leq j\leq|V_{id}|\}
10:    if  IdVid\exists Id\in V_{id} and ((FID[Id]=trueF_{ID}[Id]=true and Id>SOff)Id>S_{O}^{ff}) or     (FID[Id]=false(F_{ID}[Id]=false and Id>SOcf))Id>S_{O}^{cf})\big{)} then
11:        M[k,u01].IT[i1]M[k,u_{01}].IT[i-1]\leftarrow\bot
12:    else
13:        for j1tonj\leftarrow 1\;{to}\;n do \triangleright iterates over FnFn arguments
14:           if Ωj\Omega_{j} is a const number or reserved word then
15:               P[j].valΩjP[j].val\leftarrow\Omega_{j}
16:           else if ΩjFn()\Omega_{j}\equiv Fn(\dots) then
17:               P[j].valP[j].val\leftarrow             ComputeFnExpr(Ωj,k,u,u01,t,ρ^)ComputeFnExpr(\Omega_{j},k,u,u_{01},t,\hat{\rho})
18:           else if Ωjσ(Id)\Omega_{j}\equiv\mathcal{\sigma}(Id) then
19:               P[j].regionBoundingBox(Id,FID,u,t)P[j].region\leftarrow BoundingBox(Id,F_{ID},u,t)
20:           else if Ωj\Omega_{j} is a spatial formula then
21:               P[j].regionP[j].region\leftarrow             ComputeSpatioTemporal(Ωj,u,t,i,ρ^)ComputeSpatioTemporal(\Omega_{j},u,t,i,\hat{\rho})
22:           end if
23:        end for
24:        M[k,u01].IT[i1]Fn(P)rM[k,u_{01}].IT[i-1]\leftarrow Fn(P)\sim r
25:    end if
26:end for
27:SID(SO)LS_{ID}\leftarrow(S_{O})^{L} where     LL is the maximum number of object variables that can be used     in the scope of any @{}@\{\dots\} subformula
28:for iSidtoSID1i\leftarrow S_{id}\;{to}\;S_{ID}-1 do \triangleright iterates over the remaining of the ID Table
29:    M[k,u01].IT[i]NaNM[k,u_{01}].IT[i]\leftarrow\mbox{\footnotesize{NaN}}
30:end for

end procedure

Algorithm 4 Spatio-Temporal Monitor (with intervals)

Input: θ,u,t,i,ρ^\theta,u^{\prime},t,i,\hat{\rho};

procedure ComputeSpatioTemporal(θ,u,t,i,ρ^\theta,u^{\prime},t,i,\hat{\rho})

1:{Id1,Id|Vid|}i\{Id_{1},\dots Id_{|V_{id}|}\}\leftarrow i’th combinatorial assignment of {1,SO}\{1,\dots S_{O}\} to     object variables
2:for u|ρ^|1 down to uu\leftarrow|\hat{\rho}|-1\mbox{ down to }u^{\prime} do\triangleright iterates backward over time stamps
3:    SOcfS_{O}^{cf}\leftarrow number of objects in the current frame uu
4:    SOffS_{O}^{ff}\leftarrow number of the objects in the freeze frame tt
5:    for jθ.max down to θ.minj\leftarrow\theta.max\mbox{ down to }\theta.min do \triangleright iterates over subformulas
6:        if φjσ(Id)\varphi_{j}\equiv\mathcal{\sigma}(Id) then
7:           if  ((FID[Id]=true\big{(}(F_{ID}[Id]=true and Id>SOff)Id>S_{O}^{ff}) or     (FID[Id]=false(F_{ID}[Id]=false and Id>SOcf))Id>S_{O}^{cf})\big{)} then
8:               M¯[j,u]\overline{M}[j,u]\leftarrow{\Large{\varnothing}} \triangleright mark as NaN
9:           else
10:               M¯[j,u]BoundingBox(Id,FID,u,t)\overline{M}[j,u]\leftarrow BoundingBox(Id,F_{ID},u,t)
11:           end if
12:        else if φjφm¯\varphi_{j}\equiv\overline{\varphi_{m}} then
13:           M¯[j,u]Complement(M¯[m,u])\overline{M}[j,u]\leftarrow Complement(\overline{M}[m,u])
14:        else if φjφmφn\varphi_{j}\equiv\varphi_{m}\sqcap\varphi_{n} then
15:           M¯[j,u]Intersection(M¯[m,u],M¯[n,u])\overline{M}[j,u]\leftarrow Intersection(\overline{M}[m,u],\overline{M}[n,u])
16:        else if φjφmφn\varphi_{j}\equiv\varphi_{m}\sqcup\varphi_{n} then
17:           M¯[j,u]Union(M¯[m,u],M¯[n,u])\overline{M}[j,u]\leftarrow Union(\overline{M}[m,u],\overline{M}[n,u])
18:        else if φj𝐈φm\varphi_{j}\equiv\mathbf{I}\;{\varphi_{m}} then
19:           M¯[j,u]Interior(M¯[m,u])\overline{M}[j,u]\leftarrow Interior(\overline{M}[m,u])
20:        else if φj𝐂φm\varphi_{j}\equiv\mathbf{C}\;{\varphi_{m}} then
21:           M¯[j,u]Closure(M¯[m,u])\overline{M}[j,u]\leftarrow Closure(\overline{M}[m,u])
22:        else if φjφm𝒰~sφn\varphi_{j}\equiv\varphi_{m}\;\tilde{\,\mathcal{U}}_{\mathcal{I}}^{s}\;\varphi_{n} then
23:           if u=|ρ^|1u=|\hat{\rho}|-1 and 00\in\mathcal{I} then
24:               M¯[j,u]M¯[n,u]\overline{M}[j,u]\leftarrow\overline{M}[n,u]
25:           else if u=|ρ^|1u=|\hat{\rho}|-1 and 00\not\in\mathcal{I} then
26:               M¯[j,u]\overline{M}[j,u]\leftarrow\bot
27:           else if =[0,+)\mathcal{I}=[0,+\infty) then
28:               M¯[j,u]M¯[n,u]\overline{M}[j,u]\leftarrow\overline{M}[n,u]\;\sqcup     (M¯[m,u]M¯[j,u+1])(\overline{M}[m,u]\sqcap\overline{M}[j,u+1])
29:           else
30:               blmin{j+}b_{l}\leftarrow min\{j+\mathcal{I}\}; bumax{j+}b_{u}\leftarrow max\{j+\mathcal{I}\}
31:               rminjj<blr_{min}\leftarrow\sqcap_{j\leq j^{\prime}<b_{l}} M¯[m,j]\overline{M}[m,j^{\prime}]
32:               M¯[j,u]\overline{M}[j,u]\leftarrow\bot
33:               for jblj^{\prime}\leftarrow b_{l} to bub_{u} do \triangleright iterates over frame indices
34:                  M¯[j,j]M¯[j,j](M¯[n,j]rmin)\overline{M}[j,j^{\prime}]\leftarrow\overline{M}[j,j^{\prime}]\sqcup(\overline{M}[n,j^{\prime}]\sqcap r_{min})
35:                  rminrminM¯[m,j]r_{min}\leftarrow r_{min}\sqcap\overline{M}[m,j^{\prime}]
36:               end for
37:               if sup=+sup\;\mathcal{I}=+\infty then
38:                  M¯[j,u]M¯[j,u]\overline{M}[j,u]\leftarrow\overline{M}[j,u] (M¯[m,u]M¯[j,u+1])\sqcup(\overline{M}[m,u]\sqcap\overline{M}[j,u+1])
39:               end if
40:           end if
41:        end if
42:    end for
43:end for
44:return M¯[θ.min,u]\overline{M}[\theta.min,u^{\prime}]

end procedure

VI-E Correctness and Complexity Analysis

Let ρ^\hat{\rho} be an input signal of size |ρ^||\hat{\rho}|, φ\varphi be an AAN STPL formula of size |φ||\varphi| (note that the size of a formula is the sum of the numbers of its spatial, temporal, and spatio-temporal subformulas), |Vt||V_{t}| be the number of freeze time variables (or 1 if there is none), SidS_{id} be the maximum number of object variables used in the scope of a quantifier operator, and SobjS_{obj} be the maximum number of objects in a single frame in ρ^\hat{\rho}. If φ\varphi is a TPTL formula, then it is known from Dokhanchi et al. (2016) that the upper-bound time complexity of the variable-bounded TPTL monitoring algorithm is O(|Vt|×|φ|×|ρ^|2)O(|V_{t}|\times|\varphi|\times|\hat{\rho}|^{2}), which is polynomial. Additionally, the upper-bound space complexity of the algorithm presented by Dokhanchi et al. (2016) is O(|φ|×|ρ^|)O(|\varphi|\times|\hat{\rho}|). Our STPL monitoring algorithm builds on the TPTL monitoring algorithm, but there are two major additions to the TPTL syntax and semantics. The first addition to the STPL grammar presented in Def. IV.1 consists of the existential/universal quantifiers that precede the freeze time operator, together with the production rule Θ\Theta. We call this extension of the TPTL language the Quantifiable TPTL (QTPTL). The second addition is the production rule 𝒯\mathcal{T} that produces purely spatio-temporal formulas (Π\Pi and 𝒯\leavevmode\resizebox{7.22743pt}{}{\framebox{$\exists$}}\;\mathcal{T} quantify the spatio-temporal terms in constant time); this addition to QTPTL results in the STPL language. Therefore, we only analyze the time and space complexity of our algorithm for the parts that concern these two additions.

Our proposed monitoring algorithm is based on the Dynamic Programming (DP) algorithm by Dokhanchi et al. (2016); therefore, to account for the first addition, we need to evaluate each subformula involving object variables at most (Sobj)Sid(S_{obj})^{S_{id}} times. This also requires extra space for the DP tables. Therefore, the upper-bound time and space complexities of the QTPTL algorithm increase to O((Sobj)Sid×|Vt|×|φ|×|ρ^|2)O((S_{obj})^{S_{id}}\times|V_{t}|\times|\varphi|\times|\hat{\rho}|^{2}) and O((Sobj)Sid×|φ|×|ρ^|)O((S_{obj})^{S_{id}}\times|\varphi|\times|\hat{\rho}|), respectively.

Finally, to complete the complexity analysis of the STPL algorithm, we consider spatial and spatio-temporal subformulas, denoted φs\varphi_{s}, in addition to temporal ones, denoted φt\varphi_{t}. It is easy to see that the production rule 𝒯\mathcal{T} has the same grammar as MTL/STL, except that the logical operators are replaced with spatial ones. Therefore, the time/space complexity of monitoring these formulas follows the same complexity as MTL/STL monitoring algorithms, except for the spatial operations. In MTL/STL, all logical operations take constant time. However, for spatial operations, depending on the data structure used to represent the spatial terms, this might not hold. In Appendix: Complexity Analysis of STE formulas, we derive an exponential lower bound for some spatio-temporal formulas when a linked list is used to represent the spatial terms. That is, if we do not exploit the geometric properties of the spatial terms when storing them, then the spatial operations have exponential complexity. In contrast, if we use geometry-sensitive data structures such as region Quadtrees and region Octrees for storing and computing 2D and 3D spatial terms, respectively (e.g., see Aluru (2018); Shneier (1981)), then we obtain polynomial time and space complexity, e.g., Bournez et al. (1999). Assume that we can decompose our d-dimensional topological space (the dimension is fixed to 2D or 3D) into rdr^{d} cells, where rr is the constant resolution along each axis. Construction of an image Quadtree/Octree is linear in the size of the image (see Shneier (1981)). The union and intersection algorithms for Quadtrees, in the worst case, require visiting all nodes in the trees, which can be done in time KK. Computing a spatio-temporal formula φs\varphi_{s} at each time step requires, in the worst case, a number of spatial operations linear in the size of the formula.

Therefore, the time complexity of computing the formula φs\varphi_{s} against an input signal ρ^\hat{\rho} is as follows:

  • O(|ρ^|×|φs|×K)O(|\hat{\rho}|\times|\varphi_{s}|\times K), if there are no time/frame intervals in the formula and no frozen object variable is used in the formula. Note that KK is a large constant (i.e., KK is a function of the dimension and the resolution of the space); we do not omit it, to emphasize its impact on the computation.

  • O(|ρ^|×|φs|×K×c)O(|\hat{\rho}|\times|\varphi_{s}|\times K\times c), if there are time/frame intervals in the formula and no frozen object variable is used in the formula. Here, cc is defined as

    c=max0j|𝒟|,T(ϕ)|[j,maxJ(j,)]|c=\max_{0\leq j\leq|\mathcal{D}|,\mathcal{I}\in T(\phi)}|[j,\max J(j,\mathcal{I})]|

    where T(ϕ)T(\phi) contains all the timing constraints \mathcal{I} which are either attached on the temporal operators in formula ϕ\phi, or are timing constraints of the form τxn\tau-x\sim n or xn\mathcal{F}-x\sim n in the scope of a temporal operator in ϕ\phi. For example, if ϕ=[0.1,)ψ\phi=\Box_{[0.1,\infty)}\psi, then [0.1,)T(ϕ)[0.1,\infty)\in T(\phi), or if ϕ=x.(τx5.5ψ)\phi=x\;.\,\Box(\tau-x\leq 5.5\Rightarrow\psi), then [0,5.5]T(ϕ)[0,5.5]\in T(\phi). Intuitively, the function JJ returns all the samples that satisfy a constraint \mathcal{I} from the set T(ϕ)T(\phi) starting from a sample jj. Formally,

    J(j,)={τ1((τ(j)+R)(τ(j+1)+R)) if sup=+τ1(τ(j)+R) otherwise J(j,\mathcal{I})=\left\{\begin{array}[]{ll}\tau^{-1}((\tau(j)+_{R}\mathcal{I})\cap\\ \qquad(\tau(j+1)+_{R}\mathcal{I}))&\text{ if }\sup\mathcal{I}=+\infty\\ \tau^{-1}(\tau(j)+_{R}\mathcal{I})&\text{ otherwise }\end{array}\right.

    with t+R={t′′R|t.t′′=t+t}t+_{R}\mathcal{I}=\{t^{\prime\prime}\in R\;|\;\exists t^{\prime}\in\mathcal{I}\,.\,t^{\prime\prime}=t+t^{\prime}\}. Note that when considering constraints on the number of frames, i.e., xn\mathcal{F}-x\sim n, then the timestamp mapping τ\tau is the identity function. For a discussion on cc for STL/MTL with discrete time semantics see Fainekos et al. (2012).

  • O(|ρ^|×|φs|×K×|Vt|)O(|\hat{\rho}|\times|\varphi_{s}|\times K\times|V_{t}|), if there is no time/frame intervals in the formula, but there are frozen object variables used in the formula.

  • O(|ρ^|×|φs|×K×c×|Vt|)O(|\hat{\rho}|\times|\varphi_{s}|\times K\times c\times|V_{t}|), if there are time/frame intervals in the formula, and there are frozen object variables used in the formula.

Overall, the upper bound time and space complexity of the STPL algorithm are O((Sobj)Sid×|Vt|×|φ|×|ρ^|2×K×c)O((S_{obj})^{S_{id}}\times|V_{t}|\times|\varphi|\times|\hat{\rho}|^{2}\times K\times c) and O((Sobj)Sid×|φ|×|ρ^|×K)O((S_{obj})^{S_{id}}\times|\varphi|\times|\hat{\rho}|\times K), respectively.

In our implementation of the monitoring algorithms (discussed in Monitoring Algorithm), we focus on the future fragment of the STPL logic to reduce complexity (e.g., by excluding past-time operators, as presented in Appendix: STPL Future Syntax). That is, we improved the space complexity of STPL formulas without spatial terms by decoupling the DP tables into two separate tables: one dedicated to the values of subformulas at the current time step, and the other to their frozen values along the time horizon. Therefore, we reduced the space complexity to O((Sobj)Sid×(|φt|+|ρ^|))O\big{(}(S_{obj})^{S_{id}}\times(|\varphi_{t}|+|\hat{\rho}|)\big{)} for non-spatial STPL formulas and for spatial formulas without time/frame intervals. To mitigate the exponential complexity of the spatial STPL formulas, we merge the fragmented subsets after spatial operations. Additionally, if no frozen object variable is used in a spatial formula, we evaluate it only once. Some optimizations can be made based on the content of the formulas; for example, if a temporal formula does not contain globally, eventually, until, or release operators, then we deduce the needed horizon length of the input signal accordingly (i.e., the next-time operator only requires the evaluation of the first two frames). Also, we interpret the time/frame constraints in a formula to possibly skip the evaluation of the affected subformulas accordingly.

The correctness of the algorithm with respect to the presented syntax and semantics of STPL can be proven by using the correctness proofs that are presented for the TPTL and MTL monitoring algorithms by Dokhanchi et al. (2016) and Fainekos et al. (2012).

We have released an open-source version of the code for both Linux and Windows in a public GitLab repository (see STPL (2023)). Our tool can be run in standalone mode or as part of Matlab. Additionally, the repository contains data stream files and input configuration files that cover most of the examples in the previous sections, as well as the sensitivity analysis results in the following section.

VI-F A Polynomial Time/Space Fragment

In this section, we identify STPL specification templates for which the monitoring problem becomes polynomial time in the worst case. In the templates below, we assume that the size of the spatial formula φs\varphi_{s} is bounded.

VI-F1 Example

The complexity of evaluating an arbitrary SPE formula φs\varphi_{s} (in conjunctive form) on a data stream ρ^\hat{\rho} is

  • O(2a3b)O\big{(}2^{a}3^{b}\big{)} for time and space, if there is no complement operator in the formula, where a,ba,b\in\mathbb{N}, and we have: argmax(2a3b|\arg\max\big{(}2^{a}3^{b}\;|\;(|φs|+1)/2=2a+3b)(|\varphi_{s}|+1)/2=2a+3b\big{)}. For instance, φs:=(𝒯𝒯𝒯)\varphi_{s}:=(\mathcal{T}\sqcup\mathcal{T}\sqcup\mathcal{T})(𝒯𝒯𝒯)\;\sqcap\;(\mathcal{T}\sqcup\mathcal{T}\sqcup\mathcal{T}).

  • O(4((|φs|+1)/3))O\big{(}4^{((|\varphi_{s}|+1)/3)}\big{)} for time and space, if there are complement operators in the NNF formula. For instance, φs:=𝒯¯𝒯¯\varphi_{s}:=\overline{\mathcal{T}}\;\sqcap\;\overline{\mathcal{T}}𝒯¯𝒯¯\;\sqcap\;\overline{\mathcal{T}}\;\sqcap\;\overline{\mathcal{T}}𝒯¯𝒯¯\;\sqcap\;\overline{\mathcal{T}}\;\sqcap\;\overline{\mathcal{T}}.
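Under the assumption that |φs| counts both the spatial terms and the operators of the formula, the first bound amounts to maximizing 2^a 3^b over the budget 2a + 3b = (|φs|+1)/2. A small illustrative helper (names are ours) computes this maximum by brute force:

```python
def spe_worst_case_regions(formula_size):
    """Worst-case region-count bound 2^a * 3^b for a complement-free
    SPE formula in conjunctive form, maximizing over all a, b with
    2a + 3b = (|phi_s| + 1) // 2 (assumed size convention)."""
    budget = (formula_size + 1) // 2
    best = 0
    for b in range(budget // 3 + 1):
        rem = budget - 3 * b
        if rem % 2 == 0:          # remainder must be spent on pairs
            best = max(best, (2 ** (rem // 2)) * (3 ** b))
    return best
```

For the example formula (T⊔T⊔T)⊓(T⊔T⊔T), with six terms and five operators the budget is 6 = 3+3, giving 3^2 = 9 as the worst case.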

VI-F2 Example

An arbitrary ll-level nested spatial globally formula φs\varphi_{s}^{\Box}, with a spatial term as its right-most subformula, has O(|ρ^|)O(|\hat{\rho}|) time and O(1)O(1) space complexity.

φs:=s(𝒯s(𝒯s(𝒯s𝒯)))\varphi_{s}^{\Box}:=\Box^{s}_{\mathcal{I}}\Big{(}\mathcal{T}\sqcap\Box^{s}_{\mathcal{I}}\big{(}\mathcal{T}\sqcap\Box^{s}_{\mathcal{I}}(\mathcal{T}\sqcap\Box^{s}_{\mathcal{I}}\mathcal{T})\big{)}\Big{)}
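Such a formula admits a single backward pass with one running-intersection register per nesting level, which is the source of the O(|ρ̂|) time and O(1) space bound (for fixed l and unbounded intervals). The sketch below models spatial terms as frozensets of grid cells purely for illustration:

```python
def nested_globally(term_streams, top):
    """Evaluate an l-level nested spatial globally formula (unbounded
    intervals) at time 0 in one backward pass. term_streams[i][u] is
    the region of the i'th level's spatial term at frame u; `top` is
    the full space. One register per level => constant space."""
    l = len(term_streams)
    n = len(term_streams[0])
    regs = [frozenset(top) for _ in range(l)]
    for u in range(n - 1, -1, -1):          # backward over frames
        for i in range(l - 1, -1, -1):      # inner levels first
            inner = regs[i + 1] if i + 1 < l else frozenset(top)
            regs[i] = regs[i] & (frozenset(term_streams[i][u]) & inner)
    return regs[0]
```

After processing frame u, regs[i] holds the value of the i'th nested globally subformula at u, so regs[0] at the end is the formula's value at time 0.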

VI-F3 Example

An arbitrary $l$-level nested spatial eventually formula $\varphi_s^{\Diamond}$, with a spatial term as its right-most subformula, is of $O(|\hat{\rho}|^{l+1})$ time and space complexity.

$\varphi_s^{\Diamond} := \Diamond^s_{\mathcal{I}}\Big(\mathcal{T}\sqcup\Diamond^s_{\mathcal{I}}\big(\mathcal{T}\sqcup\Diamond^s_{\mathcal{I}}(\mathcal{T}\sqcup\Diamond^s_{\mathcal{I}}\mathcal{T})\big)\Big)$

VI-F4 Example

An arbitrary $l$-level nested spatial eventually subformula $\varphi_s^{\Diamond}$ followed by an arbitrary $k$-level nested spatial globally subformula $\varphi_s^{\Box}$, with a spatial term as its right-most subformula, is of $O(|\hat{\rho}|^{l+1})$ time and space complexity.

$\varphi_s^{\Diamond,\Box} := \Diamond^s_{\mathcal{I}}\Big(\mathcal{T}\sqcup\Diamond^s_{\mathcal{I}}\big(\mathcal{T}\sqcup\Diamond^s_{\mathcal{I}}(\mathcal{T}\sqcup\Diamond^s_{\mathcal{I}}(\Box^s_{\mathcal{I}}(\mathcal{T}\sqcap\Box^s_{\mathcal{I}}(\mathcal{T}\sqcap\Box^s_{\mathcal{I}}\mathcal{T}))))\big)\Big)$

VI-F5 Example

A right-hand-side $l$-level nested spatial until formula $\varphi_s^{\,\mathcal{U}}$, where the left-hand side of all the until operators is a single spatial term, is of $O(|\hat{\rho}|^{l+2})$ time and space complexity.

$\varphi_s^{\,\mathcal{U}} := \mathcal{T}\;\mathcal{U}^s_{\mathcal{I}}\Big(\mathcal{T}\;\mathcal{U}^s_{\mathcal{I}}\big(\mathcal{T}\;\mathcal{U}^s_{\mathcal{I}}(\mathcal{T}\;\mathcal{U}^s_{\mathcal{I}}\;\mathcal{T})\big)\Big)$

VI-F6 Example

A left-hand-side $l$-level nested spatial release formula $\varphi_s^{\mathcal{R}}$, where the right-hand side of all the release operators is a single spatial term, is of $O(|\hat{\rho}|^{l+2})$ time and space complexity.

$\varphi_s^{\mathcal{R}} := \Big(\big((\mathcal{T}\;\mathcal{R}^s_{\mathcal{I}}\;\mathcal{T})\;\mathcal{R}^s_{\mathcal{I}}\;\mathcal{T}\big)\;\mathcal{R}^s_{\mathcal{I}}\;\mathcal{T}\Big)\;\mathcal{R}^s_{\mathcal{I}}\;\mathcal{T}$

VII Experiments and Results

TABLE III: Statistics on execution time for different formulas and data stream sizes. We used the Berkeley DeepDrive (BDD) dataset to compute the results. m-time and e-time denote the time (in seconds) required for releasing memory and for executing the monitoring algorithm, respectively.
  $|\hat{\rho}|$   $|\varphi_t|$   $|\varphi_s|$   $|V_t|$   $S_{obj}$   $S_{id}$   m-time   e-time

  quantifier-formed STPL Formula (9) without spatial operators
  25    9    0   0   20   1   0       0.002
  50    9    0   0   20   1   0       0.001
  100   9    0   0   23   1   0       0.005
  200   9    0   0   24   1   0       0.008

  mix-formed STPL Formula (10) without spatial operators
  25    7    0   1   20   2   0       0.132
  50    7    0   1   20   2   0       0.519
  100   7    0   1   23   2   0       2.76
  200   7    0   1   24   2   0       11.31

  mix-formed STPL Formula (14) with SPE operators
  25    9    8   1   20   2   0.004   1.25
  50    9    8   1   20   2   0.003   4.13
  100   9    8   1   23   2   0.005   16.32
  200   9    8   1   24   2   0.005   63.52

  quantifier-formed STPL Formula (15) with STE operators
  25    3    4   0   20   1   0       0.006
  50    3    4   0   20   1   0       0.029
  100   3    4   0   23   1   0       0.119
  200   3    4   0   24   1   0       0.176

  mix-formed STPL Formula (16) with spatial terms
  25    29   6   1   20   3   0.023   2.23
  50    29   6   1   20   3   0.023   4.97
  100   29   6   1   23   3   0.037   16.07
  200   29   6   1   24   3   0.043   46.88

  mix-formed STPL Formula (17) without spatial terms
  25    29   0   1   20   3   0       1.16
  50    29   0   1   20   3   0       2.73
  100   29   0   1   23   3   0       10.41
  200   29   0   1   24   3   0       33.69

We selected some of the presented example formulas to cover different combinations of operators and quantifiers, and to demonstrate how the computation time scales with the size of the data stream and the formulas. For this experimental analysis, we used the Berkeley DeepDrive dataset (https://bdd-data.berkeley.edu/) (Yu et al. (2020)). As indicated in the last column of Table III, the performance of the STPL monitoring algorithm over hundreds of frames (including thousands of objects) is feasible for offline monitoring. Some statistics about the experiments are summarized in Table III. We used a Windows 10 machine with an Intel Core i7-8550U CPU @ 1.8 GHz, 16 GB RAM, and Matlab R2020a. The STPL monitoring algorithm is implemented in the C language and compiled for Matlab.

The formulas in Table III were selected based on the types of operators and their computational complexity. The maximum number of nested quantifiers is a source of exponential complexity (i.e., $(S_{obj})^{S_{id}}$) in our monitoring algorithm. The highest number of nested quantification operators was 3, in Formulas (16) and (17). Another source of complexity is the number of spatial operators in the formula. For instance, the worst execution time is observed for Formula (14) due to its higher number of spatial operators. These results are not surprising and agree with our theoretical analysis in Section VI-E.

In order to study the impact of the number of objects in each frame on the monitoring algorithm, we manipulated the BDD dataset to create artificial datasets with a specific number of objects per frame. The results are presented in Table IV. In the first row, the maximum number of objects in any frame is 5; in each subsequent row, this number is doubled. Based on the $O((S_{obj})^{S_{id}})$ complexity, the e-time should increase from one row to the next by a factor of $2^3=8$. To remedy the exponential complexity of nested quantified formulas, we can use parallel computation to efficiently evaluate quantified subformulas. More specifically, Algorithm 3 can be parallelized.
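The exponential factor $(S_{obj})^{S_{id}}$ is simply the number of ways to bind the $S_{id}$ quantified ID variables to the $S_{obj}$ objects of a frame; since the bindings are independent of one another, their evaluation can be chunked across workers. A hedged sketch of this enumeration (function names are ours, not the tool's API):

```python
from itertools import product

def quantifier_assignments(object_ids, num_vars):
    """All (S_obj)**(S_id) bindings of quantified ID variables to the
    objects of a frame -- the exponential core measured in Table IV."""
    return list(product(object_ids, repeat=num_vars))

def evaluate_quantified(object_ids, num_vars, check):
    """Existential evaluation over all bindings. Each binding is checked
    independently, so this loop is trivially parallelizable (e.g., by
    splitting the product over a process pool)."""
    return any(check(binding)
               for binding in product(object_ids, repeat=num_vars))
```

Doubling `object_ids` from 5 to 10 with three nested quantifiers grows the number of bindings from $5^3=125$ to $10^3=1000$, i.e., by the factor of 8 predicted above.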

Since the theoretical time complexity of offline STPL monitoring is the same as that of online past-time STPL monitoring (see Balakrishnan et al. (2021) for a toolbox), online monitoring of bounded-time, short-duration STPL properties is practically feasible and relevant. However, offline STPL monitoring over extremely large perception datasets may not be feasible without further optimizations or expressivity restrictions. One promising direction is to prefilter very large perception datasets and extract subsequences that are interesting for STPL specifications. One such possibility for filtering is the perception query language SpRE (Anderson et al. (2023)) – see Section VIII for a brief discussion. We plan to pursue this direction in the future.

TABLE IV: Statistics on execution time for Formula (16) on an artificial perception data stream.
  $|\hat{\rho}|$   $|\varphi_t|$   $|\varphi_s|$   $|V_t|$   $S_{obj}$   $S_{id}$   m-time   e-time
  25   29   6   1   5    3   0.001   0.060
  25   29   6   1   10   3   0.005   0.441
  25   29   6   1   20   3   0.043   2.419
  25   29   6   1   40   3   0.312   20.027

VIII Related Works

TABLE V: Overview comparison of spatio-temporal formal languages. The $\exists\forall$ (data) column indicates whether the logic supports quantification over non-spatial data. The $\exists\forall$ (spatial) column indicates whether the logic supports some form of spatial quantification, e.g., an emptiness operator (i.e., there exists a point) or a directional operator (there exists a direction). FSM: finite state machines.
  Language | Temporal Foundation | $\exists\forall$ (data) | $\exists\forall$ (spatial) | Spatial Foundation | Domain in practice | Applications
  STPL   | AAN-TPTL | ✓ | ✓ | $\mathcal{S}4_u$ | Sets in Euclidean spaces | Perception systems
  TQTL   | AAN-TPTL | ✓ |   | Predicates | Perception data | Vision-based (2D) perception
  SpRE   | FSM      |   | ✓ | $\mathcal{S}4_u$ | Sets in Euclidean spaces | Perception systems
  STSL   | STL      |   | ✓ | $\mathcal{S}4_u$ | Distances in Euclidean spaces | System-level requirements for CPS
  SpaTeL | STL      |   | ✓ | TSSL | Quadrants | Pattern recognition
  SSTL   | STL      |   | ✓ | Graph-based modeling | Distances in Euclidean spaces | Pattern recognition
  GSTL   | STL      |   | ✓ | Mereotopology | Cubics | Knowledge representation

In this section, we provide a detailed overview of related works. We primarily focus on comparing STPL with other spatio-temporal logics. In Table V, we provide a summary comparison of the most relevant logics for easy reference. We conclude the section with some references on promising directions for the analysis of perception systems that are not directly related to spatio-temporal logics.

Region Connection Calculus by Cohn et al. (1997), Shape Calculus by Bartocci et al. (2010), and Situation Calculus by Bhatt and Loke (2008) are just some examples of the vast literature on logics about topology and space. Comprehensive surveys and comparisons of reasoning methods about topology and time are provided by Dylla et al. (2017); Aiello et al. (2007). Spatio-temporal logics and calculi are also frequently used in robotics for human-robot communication (Kress-Gazit and Pappas (2010); Summers-Stay et al. (2014)) and for specifying and verifying properties of AV (Linker and Hilscher (2013); Loos et al. (2011)). All the aforementioned works that deal with topology and time primarily focus on deductive reasoning, theorem proving, knowledge representation, axiomatization, and – in some cases – planning. In contrast, STPL requires computationally efficient tools for spatio-temporal reasoning in order to monitor data streams from perception systems.

Even though there exist spatio-temporal logics that can process spatio-temporal data (offline or online), such as SpaTeL (Haghighi et al. (2015)), SSTL (Nenzi et al. (2015)), or SaSTL (Ma et al. (2020)) (for a short survey see Bartocci et al. (2018)), or even images, e.g., SLCS (Buonamici et al. (2019)), all these logics are application dependent and cannot support the topological reasoning needed for perception data in AV. To highlight the fundamental differences between the aforementioned logics and STPL, we provide a detailed comparison with SpaTeL and SSTL; the differences with the other listed logics are conceptually similar in scope. The two spatio-temporal languages (SSTL and SpaTeL) were explicitly developed for describing high-level spatial patterns that evolve over time. Both languages are based on a graph-based representation of discrete models of space: in SSTL, undirected weighted graphs are used to model space, while in SpaTeL, a networked system whose states encapsulate an image is represented as a quad transition system.

In more detail, SSTL adds two spatial operators, $\diamond\!\diamond_{[w_1,w_2]}\varphi$ and $\varphi_1\,\mathcal{S}_{[w_1,w_2]}\,\varphi_2$, to Signal Temporal Logic (STL) (Maler and Nickovic (2004)), named bounded somewhere and bounded surround, respectively. The first operator requires $\varphi$ to hold in a location that can be reached from the current location with a cost between $w_1$ and $w_2$; the cost is usually the distance between the two locations. The second operator requires the notion of the external boundary of a set: the external boundary of a given set of nodes is the set of nodes that are directly connected to elements of the given set but are not members of it. The semantics of the second operator requires that, for a given trace $x$, the current location $l$ belongs to a set of locations $A$ that all satisfy the formula $\varphi_1$, and that all the locations in the external boundary of $A$ satisfy the formula $\varphi_2$. SSTL formulas can be arbitrarily nested.
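For concreteness, the bounded somewhere operator can be cast as a shortest-path search: $\varphi$ must hold at some location whose cost from the current one lies in $[w_1, w_2]$. A sketch under our own graph encoding (adjacency lists of (neighbor, weight) pairs; not taken from any SSTL implementation):

```python
import heapq

def bounded_somewhere(graph, src, w1, w2, holds):
    """Illustrative check of SSTL's bounded somewhere: does `holds` hold
    at some location reachable from src with shortest-path cost in
    [w1, w2]?  graph: {node: [(neighbor, weight), ...]}."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if w1 <= d <= w2 and holds(u):
            return True
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")) and nd <= w2:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return False
```

Pruning paths whose cost already exceeds $w_2$ keeps the search restricted to the bounded neighborhood that the operator quantifies over.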

In SpaTeL, a combination of Tree Spatial Superposition Logic (TSSL) (Gol et al. (2014)) and STL is proposed to reason about spatial patterns. TSSL uses quad-trees to represent space by partitioning it recursively into quadrants. TSSL is similar to the classic Computation Tree Logic (CTL) (e.g., see Huth and Ryan (2004)), with the main difference that the next and until operators are not temporal but spatial; that is, evolution happens by a change of resolution (zooming in). All the spatial operators are augmented with a set $B$ that selects the spatial directions (i.e., NW, NE, SW, and SE) in which the operators are allowed to work. Additionally, similar to temporal operators, a parameter $k$ limits the operator's evaluation range on a finite sequence of states. For example, $\exists_B \varphi_1 U_k \varphi_2$ means that there exists a set of directions in $B$ by which the $i$-th label of a path $\pi^B$ satisfies the formula $\varphi_2$ and all the other labels on the path until $i$ satisfy the formula $\varphi_1$. A further difference between SpaTeL and SSTL is that in SpaTeL the TSSL fragment of a formula does not include temporal subformulas.
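The quad-tree view of space used by TSSL can be sketched as a recursive quadrant split, where zooming in corresponds to descending one level. An illustrative sketch over a square 0/1 occupancy grid (our own minimal encoding, not the TSSL tooling):

```python
def quadtree(grid, depth):
    """Recursively partition a square grid into NW, NE, SW, SE quadrants.
    Recursion stops when the block is uniform, a single cell, or the
    depth budget is exhausted; leaves carry the occupied-cell fraction."""
    n = len(grid)
    frac = sum(sum(row) for row in grid) / (n * n)
    if depth == 0 or frac in (0.0, 1.0) or n == 1:
        return frac
    h = n // 2
    return {
        "NW": quadtree([row[:h] for row in grid[:h]], depth - 1),
        "NE": quadtree([row[h:] for row in grid[:h]], depth - 1),
        "SW": quadtree([row[:h] for row in grid[h:]], depth - 1),
        "SE": quadtree([row[h:] for row in grid[h:]], depth - 1),
    }
```

A TSSL direction set $B$ then simply selects which of the four child keys a path is allowed to descend through.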

In summary, the key differences of these logics with our proposed STPL logic are:

  • Modeling: SSTL and SpaTeL are not designed to model physical objects in 2D or 3D spaces. On the other hand, our logic is explicitly designed to handle physical objects.

  • Expressivity: SpaTeL is inherently less expressive than SSTL due to its modeling and traversal approach on quad-trees and its decoupled syntax for spatial and temporal formulas; therefore, we compare STPL with SSTL. There are two significant differences. The first is the presence of the time freeze operator and time variables, which makes STPL more expressive. The second is the presence of quantifiers and set operations over spatial terms/locations. For example, SSTL cannot reason about whether the same object overlaps with itself across two different frames.

  • Application: SSTL and SpaTeL are mostly helpful for pattern recognition purposes, while STPL is a more general-purpose language.

  • Quantitative semantics: SSTL and SpaTeL have both quantitative and qualitative semantics, whereas we currently present only qualitative semantics for STPL. The graph-based modeling of the spatial environment and fixed metric properties, such as distance, make it more straightforward to define quantitative semantics for those logics.

In another line of work, a graph-based spatio-temporal logic – GSTL by Liu et al. (2020) – is presented for knowledge representation and reasoning. GSTL treats spatial elements as regions and uses mereotopology (a combination of mereology and topology that supports parthood and connectivity, respectively) to represent relations between spatial elements. It exploits rectangle/cubic algebra to represent spatial objects. GSTL combines the temporal logic STL with mereotopology-based spatial operators enriched with interval algebra. The satisfiability problem for GSTL is decidable when the evolution of spatial elements is restricted. GSTL was primarily designed for model checking, which restricts its expressivity for decidability reasons.
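Because GSTL represents spatial elements with rectangle/cubic algebra, its basic mereotopological relations reduce to per-axis interval checks. An illustrative sketch of two such relations over axis-aligned boxes (our own helpers, not GSTL's formal definitions):

```python
def connected(b1, b2):
    """True iff two axis-aligned boxes (xmin, ymin, xmax, ymax) share at
    least a boundary point, i.e., are externally connected or overlap."""
    return (b1[0] <= b2[2] and b2[0] <= b1[2] and
            b1[1] <= b2[3] and b2[1] <= b1[3])

def part_of(b1, b2):
    """True iff box b1 is contained in box b2 (possibly improper
    parthood, in the mereological sense)."""
    return (b2[0] <= b1[0] and b2[1] <= b1[1] and
            b1[2] <= b2[2] and b1[3] <= b2[3])
```

Richer relations (tangential vs. non-tangential part, external connection vs. overlap) follow by combining such interval tests, which is what makes the rectangle-algebra restriction computationally convenient.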

The works closest to ours stem from combining temporal logics with spatial logics (for a historical overview and a discussion on $\mathcal{S}4_u$ see Kontchakov et al. (2007)). Gabelaia et al. (2005) combine Linear Temporal Logic (LTL) (Manna and Pnueli (1992)) with $\mathcal{S}4_u$ to define the logic $\mathcal{PTL}\times\mathcal{S}4_u$. They further define several fragments of $\mathcal{PTL}\times\mathcal{S}4_u$ and study the decidability of the satisfiability problem. However, the problem of offline monitoring is not investigated in this line of work.

More recently, STSL was proposed by Li et al. (2021), where STL is combined with $\mathcal{S}4_u$. Even though the monitoring problem is studied for STSL, STSL falls short of the goals of STPL in multiple directions. First and foremost, STSL does not support generic data and quantification over such data; that is, it is not possible to express a property such as Req. 2, where we need to quantify over the bounding boxes of all the cars in a frame. Second, STPL is based on TPTL, which is a strictly more expressive logic than the STL used in STSL. Third, no theoretical or experimental computational complexity analysis is presented for STSL to identify which fragments are computationally efficient while still being practically relevant. Finally, and most importantly, the applications presented for STSL are restricted to properties over numerical trajectories produced by cyber-physical systems, and with the metric space chosen for STSL, the corresponding formal specifications can be expressed in STL. That is, in practice, there is no gain in expressive power of STSL over STL.

Another line of research relevant to our work is formal languages for the analysis of perception systems. Timed Quality Temporal Logic (TQTL) by Dokhanchi et al. (2018a) was designed to reason over streams of perception data. TQTL is built upon the AAN fragment of Timed Propositional Temporal Logic (AAN-TPTL) (Dokhanchi et al. (2016)) by introducing quantification ($\exists$, $\forall$) over the objects in each frame, and by introducing functions that retrieve data relevant to each object, e.g., class, probabilities, bounding box coordinates, etc. The AAN fragment of TPTL was chosen due to its polynomial time complexity while still being strictly more expressive than STL. Note that further algorithmic improvements on AAN-TPTL are possible, e.g., see Elgyutt et al. (2018); Ghorbel and Prabhu (2022). Nevertheless, TQTL cannot reason directly about properties of bounding boxes. For example, TQTL cannot reason about self-overlap of bounding boxes across time, i.e., TQTL cannot express Req. 2. More generally, TQTL cannot reason about 3D scenes (e.g., a bird's-eye view of the world), since this requires a mechanism to spatially relate different objects in the environment. STPL resolves these shortcomings of TQTL to enable a versatile framework for reasoning about perception systems in both 2D and 3D (along with other state variables included in the perception data).

Spatial Regular Expressions (SpREs) were recently introduced by Anderson et al. (2023) to find patterns in streaming data. That is, given an SpRE, the goal is to find all the sequences of frames that match the specified pattern. SpREs were designed to closely resemble regular expressions and, hence, the underlying model of computation for processing streaming perception data is automata (Sipser (2006)). The current version of SpRE does not support quantification over data, in order to enable online real-time processing; however, SpRE supports $\mathcal{S}4_u$ operators on a per-frame basis. Clearly, SpRE is less expressive than STPL, but we envision an interplay between the two languages: SpRE can potentially find very quickly the subsequences of streaming data over which the more expressive STPL requirements need to be monitored.

The PerSyS monitoring system by Antonante et al. (2021) presents a mathematical model for fault detection in perception systems. Their work builds on the Preparata–Metze–Chien (PMC) model for multiprocessor systems, which is generalized to account for models with heterogeneous outputs (i.e., perception systems) and equipped with a temporal dimension to support interaction among PMC models. Their system supports consistency checking among different sensory outputs of a perception system, with some formal guarantees on the maximum number of inconsistencies. In PerSyS, it is possible to design models that identify faults but, as with any other graph-based modeling technique, this relies heavily on a correct model to begin with, to which formalized requirements are then added as constraints (i.e., constraints on the edges of the PMC graphs). STPL's monitoring goals are orthogonal to PerSyS: STPL is a specification language that formalizes assumptions and guarantees on the functional performance of the perception system and, as such, it is more expressive than the constraints used in PerSyS. As a byproduct, STPL can also function as a comparison framework between different perception stacks.

Finally, the language Scenic by Fremont et al. (2018) has been developed for creating single-scene images for testing object detection and classification algorithms. Our work is complementary to such languages that generate static scenes.

IX Conclusions

In this paper, we presented Spatio-Temporal Perception Logic (STPL), a logic specifically designed for reasoning over data streams of perception systems. STPL merges and extends practical logics such as TPTL (Alur and Henzinger (1994)) and $\mathcal{S}4_u$ (Aiello et al. (2007)) with data (object) quantification, functions, and relations that enable topological reasoning over time. Our new logic can be used to specify correctness requirements on perception systems, as well as to compare different machine learning stacks on their performance beyond the standard metrics (e.g., see Mallick et al. (2023)). We have identified fragments of STPL that are efficiently monitorable for perception applications, and we have demonstrated that practically relevant requirements which do not fall within these fragments can still be efficiently monitored in practice. An open-source, publicly available toolbox has been developed (STPL (2023)) which can be used for offline perception system analysis. An online monitor for the past fragment of STPL is also available (Balakrishnan et al. (2021)). Using STPL, we have been able to discover inconsistencies in publicly available training datasets for AV/ADAS.

Since STPL formulas are rather complex even for experts, we have been working toward developing a Domain Specific Language (DSL) called PyFoReL (Anderson et al. (2022)) for easier elicitation and maintenance of STPL requirements. PyFoReL provides a Pythonic language to compose requirements in a modular way while enforcing that they are valid STPL formulas. The next step would be to interface PyFoReL and/or the STPL syntax with Large Language Models (LLM). Similar work has been done in the past for LTL and STL with practically relevant success (e.g., see Pan et al. (2023); Fuggitti and Chakraborti (2023)). In addition, verification and debugging tools for STPL formulas will be needed, since LLMs cannot be trusted to always produce correct translations; in the past, we have done similar work for STL/MTL specifications (Dokhanchi et al. (2018b)). We expect to achieve further computational improvements on our monitoring algorithms through parallelization and by filtering relevant sequences of data with our new query language SpRE (Anderson et al. (2023)) before the STPL tools are used. Finally, it would also be interesting to see if meaningful robust semantics (Bartocci et al. (2018)) could be defined in order to support test case generation or even self-supervised training of neural networks.

Acknowledgment

This work was partially supported by NSF under grants CNS-2039087, CNS-2038666, IIP-1361926, and the NSF I/UCRC Center for Embedded Systems.

References

  • Abbas et al. (2017) Abbas H, O’Kelly ME, Rodionova A and Mangharam R (2017) A driver’s license test for driverless vehicles. Mechanical Engineering 139: S13–S16.
  • Aiello et al. (2007) Aiello M, Pratt-Hartmann IE and van Benthem JF (2007) Handbook of spatial logics. Springer.
  • Alur and Henzinger (1994) Alur R and Henzinger TA (1994) A really temporal logic. J. ACM 41: 181–203.
  • Aluru (2018) Aluru S (2018) Quadtrees and octrees. In: Handbook of Data Structures and Applications. Chapman and Hall/CRC, pp. 309–326.
  • Anderson et al. (2023) Anderson J, Fainekos G, Hoxha B, Okamoto H and Prokhorov D (2023) Pattern matching for perception streams. In: 23rd International Conference on Runtime Verification (RV).
  • Anderson et al. (2022) Anderson J, Hekmatnejad M and Fainekos G (2022) PyFoReL: A domain-specific language for formal requirements in temporal logic. In: IEEE 30th International Requirements Engineering Conference (RE).
  • Antonante et al. (2021) Antonante P, Spivak DI and Carlone L (2021) Monitoring and diagnosability of perception systems. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 168–175.
  • Balakrishnan et al. (2021) Balakrishnan A, Deshmukh J, Hoxha B, Yamaguchi T and Fainekos G (2021) Percemon: Online monitoring for perception systems. In: International Conference on Runtime Verification (RV), LNCS, volume 12974.
  • Baotic (2009) Baotic M (2009) Polytopic computations in constrained optimal control 50: 119–134.
  • Bartocci et al. (2010) Bartocci E, Corradini F, Berardini MRD, Merelli E and Tesei L (2010) Shape calculus. a spatial mobile calculus for 3d shapes. Scientific Annals of Computer Science 20: 1–31.
  • Bartocci et al. (2018) Bartocci E, Deshmukh J, Donzé A, Fainekos G, Maler O, Nickovic D and Sankaranarayanan S (2018) Specification-based monitoring of cyber-physical systems: A survey on theory, tools and applications. In: Lectures on Runtime Verification - Introductory and Advanced Topics, LNCS, volume 10457. Springer, pp. 128–168.
  • Bashetty et al. (2020) Bashetty SK, Amor HB and Fainekos G (2020) DeepCrashTest: turning dashcam videos into virtual crash tests for automated driving systems. In: International Conference on Robotics and Automation (ICRA).
  • Basin et al. (2015) Basin D, Klaedtke F, Müller S and Zălinescu E (2015) Monitoring metric first-order temporal properties 62(2).
  • Bhatt and Loke (2008) Bhatt M and Loke S (2008) Modelling dynamic spatial systems in the situation calculus. Spatial Cognition & Computation 8: 86–130.
  • Bournez et al. (1999) Bournez O, Maler O and Pnueli A (1999) Orthogonal polyhedra: Representation and computation. In: International Workshop on Hybrid Systems: Computation and Control. Springer, pp. 46–60.
  • Buonamici et al. (2019) Buonamici FB, Belmonte G, Ciancia V, Latella D and Massink M (2019) Spatial logics and model checking for medical imaging. International Journal on Software Tools for Technology Transfer : 1–23.
  • Caesar et al. (2020) Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G and Beijbom O (2020) nuScenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11621–11631.
  • Campbell et al. (2016) Campbell J, Amor HB, Ang MH and Fainekos G (2016) Traffic light status detection using movement patterns of vehicles. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, pp. 283–288.
  • Cimatti et al. (2004) Cimatti A, Roveri M and Sheridan D (2004) Bounded verification of past ltl. In: International Conference on Formal Methods in Computer-Aided Design. Springer, pp. 245–259.
  • Cohn et al. (1997) Cohn AG, Bennett B, Gooday J and Gotts NM (1997) Qualitative spatial representation and reasoning with the region connection calculus. GeoInformatica 1(3): 275–316.
  • Corso et al. (2020) Corso A, Moss RJ, Koren M, Lee R and Kochenderfer MJ (2020) A survey of algorithms for black-box safety validation. arXiv preprint arXiv:2005.02979 .
  • Cosler et al. (2023) Cosler M, Hahn C, Mendoza D, Schmitt F and Trippel C (2023) nl2spec: Interactively translating unstructured natural language to temporal logics with large language models. In: Computer Aided Verification, LNCS, volume 13965. Springer, pp. 383–396.
  • DeCastro et al. (2020) DeCastro J, Leung K, Aréchiga N and Pavone M (2020) Interpretable policies from formally-specified temporal properties. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems Conference (ITSC).
  • Dokhanchi et al. (2018a) Dokhanchi A, Amor HB, Deshmukh JV and Fainekos G (2018a) Evaluating perception systems for autonomous vehicles using quality temporal logic. In: International Conference on Runtime Verification. Springer, pp. 409–416.
  • Dokhanchi et al. (2018b) Dokhanchi A, Hoxha B and Fainekos G (2018b) Formal requirement debugging for testing and verification of cyber-physical systems. ACM Transactions on Embedded Computing Systems 17. DOI:10.1145/3147451.
  • Dokhanchi et al. (2016) Dokhanchi A, Hoxha B, Tuncali CE and Fainekos G (2016) An efficient algorithm for monitoring practical tptl specifications. In: 2016 ACM/IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE). IEEE, pp. 184–193.
  • Dreossi et al. (2019a) Dreossi T, Donzé A and Seshia SA (2019a) Compositional falsification of cyber-physical systems with machine learning components. Journal of Automated Reasoning 63: 1031–1053.
  • Dreossi et al. (2019b) Dreossi T, Fremont DJ, Ghosh S, Kim E, Ravanbakhsh H, Vazquez-Chanlatte M and Seshia SA (2019b) Verifai: A toolkit for the formal design and analysis of artificial intelligence-based systems. In: International Conference on Computer Aided Verification. Springer, pp. 432–442.
  • Dylla et al. (2017) Dylla F, Lee JH, Mossakowski T, Schneider T, Delden AV, Ven JVD and Wolter D (2017) A survey of qualitative spatial and temporal calculi: Algebraic and computational properties. ACM Computing Surveys 50.
  • Eisner and Fisman (2006) Eisner C and Fisman D (2006) Weak vs. strong temporal operators. A Practical Introduction to PSL : 27–34.
  • Elgyutt et al. (2018) Elgyutt A, Ferrere T and Henzinger TA (2018) Monitoring temporal logic with clock variables. In: Formal Modeling and Analysis of Timed Systems (FORMATS), LNCS, volume 11022. Springer.
  • Fainekos et al. (2012) Fainekos GE, Sankaranarayanan S, Ueda K and Yazarel H (2012) Verification of automotive control applications using s-taliro. In: 2012 American Control Conference (ACC). IEEE, pp. 3567–3572.
  • Fremont et al. (2020) Fremont DJ, Kim E, Pant YV, Seshia SA, Acharya A, Bruso X, Wells P, Lemke S, Lu Q and Mehta S (2020) Formal scenario-based testing of autonomous vehicles: From simulation to the real world. In: 23rd IEEE International Conference on Intelligent Transportation Systems (ITSC).
  • Fremont et al. (2018) Fremont DJ, Yue X, Dreossi T, Ghosh S, Sangiovanni-Vincentelli AL and Seshia SA (2018) Scenic: Language-based scene generation. Technical report, arXiv:1809.09310.
  • Fuggitti and Chakraborti (2023) Fuggitti F and Chakraborti T (2023) Nl2ltl – a python package for converting natural language (nl) instructions to linear temporal logic (ltl) formulas 37(13): 16428–16430.
  • Gabelaia et al. (2005) Gabelaia D, Kontchakov R, Kurucz A, Wolter F and Zakharyaschev M (2005) Combining spatial and temporal logics: expressiveness vs. complexity. Journal of artificial intelligence research 23: 167–243.
  • Geiger et al. (2013a) Geiger A, Lenz P, Stiller C and Urtasun R (2013a) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research 32(11): 1231–1237.
  • Geiger et al. (2013b) Geiger A, Lenz P, Stiller C and Urtasun R (2013b) Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR) 32: 1231–1237.
  • Ghorbel and Prabhu (2022) Ghorbel B and Prabhu V (2022) Linear time monitoring for one variable tptl. In: 25th ACM International Conference on Hybrid Systems: Computation and Control (HSCC).
  • Gladisch et al. (2019) Gladisch C, Heinz T, Heinzemann C, Oehlerking J, von Vietinghoff A and Pfitzer T (2019) Experience paper: Search-based testing in automated driving control applications. In: 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).
  • Gol et al. (2014) Gol EA, Bartocci E and Belta C (2014) A formal methods approach to pattern synthesis in reaction diffusion systems. In: 53rd IEEE Conference on Decision and Control.
  • Gordon et al. (2018) Gordon D, Farhadi A and Fox D (2018) Re3: Real-time recurrent regression networks for visual tracking of generic objects. IEEE Robotics and Automation Letters 3(2): 788–795.
  • Haghighi et al. (2015) Haghighi I, Jones A, Kong Z, Bartocci E, Grosu R and Belta C (2015) Spatel: a novel spatial-temporal logic and its applications to networked systems. In: Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control. pp. 189–198.
  • Havelund et al. (2020) Havelund K, Peled D and Ulus D (2020) First-order temporal logic monitoring with BDDs. Formal Methods in System Design 56.
  • He et al. (2015) He K, Lahijanian M, Kavraki LE and Vardi MY (2015) Towards manipulation planning with temporal logic specifications. In: 2015 IEEE international conference on robotics and automation (ICRA). IEEE, pp. 346–352.
  • He et al. (2018) He K, Lahijanian M, Kavraki LE and Vardi MY (2018) Automated abstraction of manipulation domains for cost-based reactive synthesis. IEEE Robotics and Automation Letters 4(2): 285–292.
  • Hekmatnejad et al. (2019) Hekmatnejad M, Yaghoubi S, Dokhanchi A, Amor HB, Shrivastava A, Karam L and Fainekos G (2019) Encoding and monitoring responsibility sensitive safety rules for automated vehicles in signal temporal logic. In: 17th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE).
  • Huth and Ryan (2004) Huth M and Ryan M (2004) Logic in Computer Science: Modelling and reasoning about systems. Cambridge university press.
  • Kim et al. (2022) Kim E, Shenoy J, Junges S, Fremont DJ, Sangiovanni-Vincentelli A and Seshia SA (2022) Querying labelled data with scenario programs for sim-to-real validation. In: ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). pp. 34–45.
  • Kontchakov et al. (2007) Kontchakov R, Kurucz A, Wolter F and Zakharyaschev M (2007) Handbook of spatial logics: Spatial logic + temporal logic = ? Springer, pp. 497–564.
  • Koymans (1990) Koymans R (1990) Specifying real-time properties with metric temporal logic. Real-Time Systems 2(4): 255–299.
  • Kress-Gazit and Pappas (2010) Kress-Gazit H and Pappas GJ (2010) Automatic synthesis of robot controllers for tasks with locative prepositions. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 3215–3220.
  • Lee (2018) Lee TB (2018) Report: Software bug led to death in uber’s self-driving crash. Ars Technica May 07.
  • Li et al. (2021) Li T, Liu J, Sun H, Chen X, Yin L, Mao X and Sun J (2021) Runtime verification of spatio-temporal specification language 21(26): 2392–2406.
  • Linker and Hilscher (2013) Linker S and Hilscher M (2013) Proof theory of a multi-lane spatial logic. In: International Conference on Theoretical Aspects of Computing (ICTAC), LNCS, volume 8049. Springer, pp. 231–248.
  • Liu et al. (2020) Liu Z, Jiang M and Lin H (2020) A graph-based spatial temporal logic for knowledge representation and automated reasoning in cognitive robots. arXiv preprint arXiv:2001.07205 .
  • Loos et al. (2011) Loos SM, Platzer A and Nistor L (2011) Adaptive cruise control: Hybrid, distributed, and now formally verified. In: Formal Methods, LNCS, volume 6664. Springer, pp. 42–56.
  • Ma et al. (2020) Ma M, Bartocci E, Lifland E, Stankovic J and Feng L (2020) SaSTL: Spatial aggregation signal temporal logic for runtime monitoring in smart cities. In: ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS). pp. 51–62.
  • Maler and Nickovic (2004) Maler O and Nickovic D (2004) Monitoring temporal properties of continuous signals. In: Proceedings of FORMATS-FTRTFT, LNCS, volume 3253. pp. 152–166.
  • Mallick et al. (2023) Mallick S, Ghosal S, Balakrishnan A and Deshmukh J (2023) Safety monitoring for pedestrian detection in adverse conditions. In: 23rd International Conference on Runtime Verification (RV).
  • Manna and Pnueli (1992) Manna Z and Pnueli A (1992) The Temporal Logic of Reactive and Concurrent Systems — Specification. Springer.
  • Markey and Raskin (2006) Markey N and Raskin JF (2006) Model checking restricted sets of timed paths. Theoretical Computer Science 358: 273–292.
  • Mehdipour et al. (2023) Mehdipour N, Althoff M, Tebbens RD and Belta C (2023) Formal methods to comply with rules of the road in autonomous driving: State of the art and grand challenges. Automatica 152: 110692.
  • Motional (2019) Motional (2019) nuScenes dataset. URL https://www.nuscenes.org/nuscenes. Accessed: 2020-11-14.
  • Nenzi et al. (2015) Nenzi L, Bortolussi L, Ciancia V, Loreti M and Massink M (2015) Qualitative and quantitative monitoring of spatio-temporal properties. In: Runtime Verification. Springer, pp. 21–37.
  • Pan et al. (2023) Pan J, Chou G and Berenson D (2023) Data-efficient learning of natural language to linear temporal logic translators for robot task specification. In: IEEE International Conference on Robotics and Automation (ICRA).
  • Perugini (2021) Perugini S (2021) Programming languages: Concepts and implementation. Jones & Bartlett Learning.
  • Qin et al. (2016) Qin B, Chong ZJ, Soh SH, Bandyopadhyay T, Ang MH, Frazzoli E and Rus D (2016) A spatial-temporal approach for moving object recognition with 2d lidar. In: Experimental Robotics. Springer, pp. 807–820.
  • Richter et al. (2017) Richter SR, Hayder Z and Koltun V (2017) Playing for benchmarks. In: IEEE International Conference on Computer Vision, ICCV. pp. 2232–2241.
  • Rizaldi et al. (2017) Rizaldi A, Keinholz J, Huber M, Feldle J, Immler F, Althoff M, Hilgendorf E and Nipkow T (2017) Formalising and monitoring traffic rules for autonomous vehicles in isabelle/hol. In: Integrated Formal Methods. Springer, pp. 50–66.
  • Schwarting et al. (2018) Schwarting W, Alonso-Mora J and Rus D (2018) Planning and decision-making for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems 1: 187–210.
  • Shneier (1981) Shneier M (1981) Calculations of geometric properties using quadtrees. Computer Graphics and Image Processing 16(3): 296–302.
  • Sipser (2006) Sipser M (2006) Introduction to the theory of computation. 2nd edition. Course Technology.
  • STPL (2023) STPL (2023) Spatio-Temporal Perception Logic (STPL) Offline Monitoring Tools. https://gitlab.com/vnv-tools/STPL.
  • Summers-Stay et al. (2014) Summers-Stay D, Cassidy T and Voss C (2014) Joint navigation in commander/robot teams: Dialog & task performance when vision is bandwidth-limited. In: Third Workshop on Vision and Language.
  • Sun et al. (2019) Sun H, Ang MH and Rus D (2019) A convolutional network for joint deraining and dehazing from a single image for autonomous driving in rain. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 962–969.
  • Templeton (2020) Templeton B (2020) Tesla in taiwan crashes directly into overturned truck, ignores pedestrian, with autopilot on. Forbes June 2.
  • Tuncali et al. (2020) Tuncali CE, Fainekos G, Prokhorov D, Ito H and Kapinski J (2020) Requirements-driven test generation for autonomous vehicles with machine learning components. IEEE Transactions on Intelligent Vehicles 5: 265–280. DOI:10.1109/TIV.2019.2955903.
  • van Benthem and Bezhanishvili (2007) van Benthem J and Bezhanishvili G (2007) Modal logics of space. Handbook of Spatial Logics : 217–298.
  • Wu et al. (2017) Wu B, Iandola F, Jin PH and Keutzer K (2017) Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 129–137.
  • Yadav and Curry (2019) Yadav P and Curry E (2019) Vidcep: Complex event processing framework to detect spatiotemporal patterns in video streams. In: IEEE International conference on big data. pp. 2513–2522.
  • Yu et al. (2020) Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V and Darrell T (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2636–2645.
  • Yurtsever et al. (2020) Yurtsever E, Lambert J, Carballo A and Takeda K (2020) A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 8: 58443–58469.

Appendix A Appendix: STPL Future Syntax

The following definition introduces a future fragment of the STPL syntax introduced in Def. IV.1. Here, we restrict the grammar by including rules that force a formula to be an Almost Arbitrarily Nesting formula as in Def. IV.3. Notice that, in the following, the grammar rules require the expressions to be indexed in order to track the level of nesting of quantifier operators.

Definition A.1 (STPL AAN Syntax for Discrete-Time Signal)

Let $V_{x}$ and $V_{o}$ be sets of time variables and object variables, respectively. Let $x$ be a vector of time variables, i.e., $x=[x_{0},\ldots,x_{n-1}]^{T}$, let $id$ be a vector of object variables, i.e., $id=[id_{0},\ldots,id_{m-1}]^{T}$, and let $\mathcal{I}$ be any non-empty interval of $\mathbb{R}_{\geq 0}$ over time. The syntax for Spatio-Temporal Perception Logic (STPL) formulas is given by the following grammar:

\begin{align*}
\Phi_{i,j} ::=\; & \exists id_{i}@x_{i}.\,\Phi^{f,q}_{i} \;|\; x_{i}.\Phi^{f}_{i} \;|\; \exists id_{i}.\,\Phi^{q}_{i,j} \;|\; \top \;|\; \neg\Phi_{i,j} \;|\; \Phi_{i,j}\vee\Phi_{i,j} \;|\; \bigcirc\Phi_{i,j} \;|\; \Phi_{i,j}\,\mathcal{U}\,\Phi_{i,j} \;|\\
& \bigcirc_{\mathcal{I}}\Phi_{i,j} \;|\; \Phi_{i,j}\,\mathcal{U}_{\mathcal{I}}\,\Phi_{i,j} \;|\; \tilde{\bigcirc}_{\mathcal{I}}\Phi_{i,j} \;|\; \Phi_{i,j}\,\tilde{\mathcal{U}}_{\mathcal{I}}\,\Phi_{i,j}\\
\Phi^{f,q}_{i} ::=\; & \Phi^{f}_{i} \;|\; \Phi^{q}_{i,i} \;|\; \top \;|\; \neg\Phi^{f,q}_{i} \;|\; \Phi^{f,q}_{i}\vee\Phi^{f,q}_{i} \;|\; \bigcirc\Phi^{f,q}_{i} \;|\; \Phi^{f,q}_{i}\,\mathcal{U}\,\Phi^{f,q}_{i} \;|\\
& \bigcirc_{\mathcal{I}}\Phi^{f,q}_{i} \;|\; \Phi^{f,q}_{i}\,\mathcal{U}_{\mathcal{I}}\,\Phi^{f,q}_{i} \;|\; \tilde{\bigcirc}_{\mathcal{I}}\Phi^{f,q}_{i} \;|\; \Phi^{f,q}_{i}\,\tilde{\mathcal{U}}_{\mathcal{I}}\,\Phi^{f,q}_{i}\\
\Phi^{f}_{i} ::=\; & \tau - x_{i} > t \;|\; \mathcal{F} - x_{i} > n \;|\; \Phi_{i+1,i} \;|\; \top \;|\; \neg\Phi^{f}_{i} \;|\; \Phi^{f}_{i}\vee\Phi^{f}_{i} \;|\; \bigcirc\Phi^{f}_{i} \;|\; \Phi^{f}_{i}\,\mathcal{U}\,\Phi^{f}_{i} \;|\\
& \bigcirc_{\mathcal{I}}\Phi^{f}_{i} \;|\; \Phi^{f}_{i}\,\mathcal{U}_{\mathcal{I}}\,\Phi^{f}_{i} \;|\; \tilde{\bigcirc}_{\mathcal{I}}\Phi^{f}_{i} \;|\; \Phi^{f}_{i}\,\tilde{\mathcal{U}}_{\mathcal{I}}\,\Phi^{f}_{i}\\
\Phi^{q}_{i,j} ::=\; & C(\Lambda_{i,j})=c \;|\; C(\Lambda_{i,j})=C(\Lambda_{i,j}) \;|\; P(\Lambda_{i,j})\geq r \;|\; P(\Lambda_{i,j})\geq r\times P(\Lambda_{i,j}) \;|\\
& \Lambda_{i,j}=\Lambda_{i,j} \;|\; \Phi_{i+1,j} \;|\; \top \;|\; \neg\Phi^{q}_{i,j} \;|\; \Phi^{q}_{i,j}\vee\Phi^{q}_{i,j} \;|\; \bigcirc\Phi^{q}_{i,j} \;|\; \Phi^{q}_{i,j}\,\mathcal{U}\,\Phi^{q}_{i,j} \;|\\
& \bigcirc_{\mathcal{I}}\Phi^{q}_{i,j} \;|\; \Phi^{q}_{i,j}\,\mathcal{U}_{\mathcal{I}}\,\Phi^{q}_{i,j} \;|\; \tilde{\bigcirc}_{\mathcal{I}}\Phi^{q}_{i,j} \;|\; \Phi^{q}_{i,j}\,\tilde{\mathcal{U}}_{\mathcal{I}}\,\Phi^{q}_{i,j} \;|\; \framebox{$\exists$}\,\Omega_{i,j} \;|\; \Theta_{i,j} \;|\; \Pi_{i,j}\\
\Lambda_{i,j} ::=\; & id_{j} \;|\; id_{j+1} \;|\; \dots \;|\; id_{i}\\
\Omega_{i,j} ::=\; & \sigma(\Lambda_{i,j}) \;|\; \varnothing \;|\; \mathbb{U} \;|\; \overline{\Omega}_{i,j} \;|\; \Omega_{i,j}\sqcap\Omega_{i,j} \;|\; \mathbf{I}\Omega_{i,j} \;|\\
& \Omega_{i,j}\,\mathcal{U}_{s}\,\Omega_{i,j} \;|\; \Diamond_{s}\Omega_{i,j} \;|\; \Box_{s}\Omega_{i,j} \;|\; \bigcirc_{s}\Omega_{i,j} \;|\\
& \Omega_{i,j}\,\mathcal{U}^{s}_{\mathcal{I}}\,\Omega_{i,j} \;|\; \Diamond^{s}_{\mathcal{I}}\Omega_{i,j} \;|\; \Box^{s}_{\mathcal{I}}\Omega_{i,j} \;|\; \bigcirc^{s}_{\mathcal{I}}\Omega_{i,j} \;|\\
& \Omega_{i,j}\,\tilde{\mathcal{U}}^{s}_{\mathcal{I}}\,\Omega_{i,j} \;|\; \tilde{\Diamond}^{s}_{\mathcal{I}}\Omega_{i,j} \;|\; \tilde{\Box}^{s}_{\mathcal{I}}\Omega_{i,j} \;|\; \tilde{\bigcirc}^{s}_{\mathcal{I}}\Omega_{i,j}\\
\Pi_{i,j} ::=\; & Area(\Omega_{i,j})\geq r \;|\; Area(\Omega_{i,j})\geq r\times Area(\Omega_{i,j})\\
\Theta_{i,j} ::=\; & Dist(\Lambda_{i,j},\mathrm{CRT},\Lambda_{i,j},\mathrm{CRT})\geq r \;|\; Lat(\Lambda_{i,j},\mathrm{CRT})\geq r \;|\; Lon(\Lambda_{i,j},\mathrm{CRT})\geq r \;|\\
& Lat(\Lambda_{i,j},\mathrm{CRT})\geq r\times Lat(\Lambda_{i,j},\mathrm{CRT}) \;|\; Lon(\Lambda_{i,j},\mathrm{CRT})\geq r\times Lon(\Lambda_{i,j},\mathrm{CRT}) \;|\\
& Lat(\Lambda_{i,j},\mathrm{CRT})\geq r\times Lon(\Lambda_{i,j},\mathrm{CRT}) \;|\; Area(\Lambda_{i,j})\geq r \;|\; Area(\Lambda_{i,j})\geq r\times Area(\Lambda_{i,j})\\
\mathrm{CRT} ::=\; & \mathrm{LM} \;|\; \mathrm{RM} \;|\; \mathrm{TM} \;|\; \mathrm{BM} \;|\; \mathrm{CT}
\end{align*}

where $i\geq 0$, and the grammar starts from $\Phi_{0,0}$.

The time and frame constraints of STPL are expressed in the form $\tau-x>r$ and $\mathcal{F}-x>n$, respectively. The freeze time quantifier $x.\phi$ assigns the current frame $i$ to the time variable $x$ before processing the subformula $\phi$. The existential quantifier is denoted by $\exists$, and the universal quantifier by $\forall$. The following syntactic equivalences hold for STPL formulas $\psi$ and $\phi$: $\forall\{id\}@x.\phi \equiv \neg(\exists\{id\}@x.\neg\phi)$; $\psi\wedge\phi \equiv \neg(\neg\psi\vee\neg\phi)$; $\bot \equiv \neg\top$ (false); $\psi\rightarrow\phi \equiv \neg\psi\vee\phi$ ($\psi$ implies $\phi$); $\phi\,\mathcal{R}\,\psi \equiv \neg(\neg\phi\,\mathcal{U}\,\neg\psi)$ ($\phi$ releases $\psi$); $\phi\,\overline{\mathcal{R}}\,\psi \equiv \phi\,\mathcal{R}\,(\phi\vee\psi)$ ($\phi$ non-strictly releases $\psi$); $\Diamond\psi \equiv \top\,\mathcal{U}\,\psi$ (eventually $\psi$); $\Box\psi \equiv \neg\Diamond\neg\psi$ (always $\psi$). All operators marked with a tilde range over frame intervals; that is, in $\tilde{\Box}^{s}_{\mathcal{I}}$, $\tilde{\Diamond}^{s}_{\mathcal{I}}$, $\tilde{\mathcal{U}}^{s}_{\mathcal{I}}$, $\tilde{\bigcirc}^{s}_{\mathcal{I}}$, $\tilde{\mathcal{U}}_{\mathcal{I}}$, and $\tilde{\bigcirc}_{\mathcal{I}}$, the interval $\mathcal{I}$ is a frame interval.
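As an illustration, the temporal equivalences above can be checked exhaustively over short discrete traces. The sketch below is an assumption on our part: it uses plain Boolean traces with pointwise, finite-trace LTL-style semantics rather than full STPL data streams, and verifies $\Diamond\psi \equiv \top\,\mathcal{U}\,\psi$ and $\Box\psi \equiv \neg\Diamond\neg\psi$:

```python
import itertools

# Finite-trace, pointwise semantics for a few future-time operators.
# A "formula" is a function (trace, t) -> bool; traces are tuples of booleans.
def TOP():       return lambda tr, t: True
def NOT(p):      return lambda tr, t: not p(tr, t)
def U(p, q):     # p Until q: q holds at some k >= t, and p holds on [t, k)
    return lambda tr, t: any(q(tr, k) and all(p(tr, j) for j in range(t, k))
                             for k in range(t, len(tr)))
def EV(p):       # Eventually p
    return lambda tr, t: any(p(tr, k) for k in range(t, len(tr)))
def ALW(p):      # Always p
    return lambda tr, t: all(p(tr, k) for k in range(t, len(tr)))

psi = lambda tr, t: tr[t]  # an atomic proposition read off the trace

# Exhaustively check the equivalences on all Boolean traces of length 5.
for tr in itertools.product([False, True], repeat=5):
    for t in range(len(tr)):
        assert EV(psi)(tr, t) == U(TOP(), psi)(tr, t)       # <>psi == T U psi
        assert ALW(psi)(tr, t) == NOT(EV(NOT(psi)))(tr, t)  # []psi == !<>!psi
print("equivalences hold on all traces")
```

The same exhaustive-check pattern extends to the release equivalences by adding a second atomic proposition.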

When parsing a formula with the above grammar, there are two production rules, $\Phi^{f}_{i}$ and $\Phi^{q}_{i,j}$, in which we can reuse the initial production rule after increasing the index $i$ (i.e., $\Phi_{i+1,j}$). The index $i$ enforces scoping for freeze time variables. For example, if $x_{0}$ is used in the scope of a variant-quantifier operator, then the index increases to $1$ to prevent the use of $x_{0}$ in the scope of the next variant-quantifier operator. The index $j$ acts as a pointer to each quantifier operator that tracks the scope of object variables. For example, in the formula $\varphi\equiv\exists id_{0}.\Box\big(\exists id_{1}.\exists id_{2}.(\phi_{1})\vee\phi_{2}\big)$, we have $i=2$ and $j=0$ while parsing the subformula $\phi_{1}$, whereas in $\phi_{2}$ we have $i=0$ and $j=0$. Thus, any function in $\phi_{1}$ with object variables in it uses the production rule $\Lambda_{2,0}$, so the allowed object variables in $\phi_{1}$ are $id_{0}$, $id_{1}$, and $id_{2}$. However, while parsing the subformula $\phi_{2}$, we use $\Lambda_{0,0}$, in which the only allowed object variable is $id_{0}$.
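The index bookkeeping above amounts to lexical scoping of object variables: a subformula may refer exactly to the variables bound by the quantifiers that enclose it. A minimal sketch of this scoping rule follows; the tuple-based formula encoding and operator names are hypothetical and not part of STPL's tooling:

```python
def allowed_vars(ast, scope=()):
    """Map each named leaf subformula to the set of object variables in scope."""
    tag = ast[0]
    if tag == "exists":                                  # ("exists", var, body)
        _, var, body = ast
        return allowed_vars(body, scope + (var,))
    if tag in ("always", "eventually", "next", "not"):   # unary operators
        return allowed_vars(ast[1], scope)
    if tag in ("or", "and", "until"):                    # binary operators
        out = allowed_vars(ast[1], scope)
        out.update(allowed_vars(ast[2], scope))
        return out
    if tag == "leaf":                                    # ("leaf", name)
        return {ast[1]: set(scope)}
    raise ValueError(f"unknown node: {tag}")

# phi = Exists id0. Always( (Exists id1. Exists id2. phi1) or phi2 )
phi = ("exists", "id0",
       ("always",
        ("or",
         ("exists", "id1", ("exists", "id2", ("leaf", "phi1"))),
         ("leaf", "phi2"))))

scopes = allowed_vars(phi)
# phi1 may use id0, id1, id2 (rule Lambda_{2,0}); phi2 only id0 (Lambda_{0,0})
print(scopes["phi1"], scopes["phi2"])
```

Running it on the example formula from the text reproduces the scoping described above: `phi1` sees `{id0, id1, id2}` and `phi2` sees `{id0}`.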

Appendix B Appendix: Complexity Analysis of STE formulas

l | $t_{n}$ | $t_{n-1}$ | $t_{n-2}$ | $t_{n-3}$ | $t_{n-4}$ | $\dots$
--|---------|-----------|-----------|-----------|-----------|---------
0 | 1 | 1 | 1 | 1 | 1 | $\dots$
1 | 1 | 1(1+1) = 2 | 1(1+1(1+1)) = 3 | 1(1+1(1+1(1+1))) = 4 | 1(1+1(1+1(1+1(1+1)))) = 5 | $\dots$
2 | 1 | 2(1+1) = 4, $2\leq 4\leq 3!$ | 3(1+2(1+1)) = 15, $2^{2}\leq 15\leq 4!$ | 4(1+3(1+2(1+1))) = 64, $2^{3}\leq 64\leq 5!$ | 5(1+4(1+3(1+2(1+1)))) = 325, $2^{4}\leq 325\leq 6!$ | $\dots$
3 | 1 | 2(1+1)(1+1) = 8 | 3(1+2(1+1))(1+2(1+1)(1+1)) = 135 | 4(1+3(1+2(1+1)))(1+3(1+2(1+1))(1+2(1+1)(1+1))) = 8,704 | 5(1+4(1+3(1+2(1+1))))(1+4(1+3(1+2(1+1)))(1+3(1+2(1+1))(1+2(1+1)(1+1)))) = 2,829,125 | $\dots$
$\dots$ | $\dots$ | $\dots$ | $\dots$ | $\dots$ | $\dots$ | $\dots$

TABLE VI: DP-based complexity analysis for the spatio-temporal until operator with different levels of nesting. At $l=0$ we have $\tau=\sigma(Id)$; at $l=1$ we have $\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau$; at $l=2$ we have $(\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)\,\mathcal{U}^{s}_{\mathcal{I}}\,(\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)$; finally, at $l=3$ we have $\big((\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)\,\mathcal{U}^{s}_{\mathcal{I}}\,(\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)\big)\,\mathcal{U}^{s}_{\mathcal{I}}\,\big((\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)\,\mathcal{U}^{s}_{\mathcal{I}}\,(\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)\big)$.

Here the assumption is that we use a linked-list data structure to represent a spatial term $\mathcal{T}$ as a union of a finite number of unique subsets. We can compute $\mathbb{V}(\tau_{1}\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau_{2},\mathcal{D},t,\tau,\epsilon,\zeta)$ recursively as $\mathbb{V}(\tau_{2},\mathcal{D},t,\tau,\epsilon,\zeta) \cup \big(\mathbb{V}(\tau_{1},\mathcal{D},t,\tau,\epsilon,\zeta) \cap \mathbb{V}(\tau_{1}\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau_{2},\mathcal{D},t+1,\tau,\epsilon,\zeta)\big)$. For each until formula in each row of Table VI (starting from the row $l=1$), and for each time step $t$, we use this recursive evaluation function to calculate the maximum number of bounding boxes that results from computing the formula. The maximum number of bounding boxes that can be produced by $\tau_{1}\cup\tau_{2}$ equals the total number of boxes in the two spatial terms (i.e., $|\tau_{1}|+|\tau_{2}|$). Similarly, the maximum number of bounding boxes that can be produced by $\tau_{1}\cap\tau_{2}$ equals the product of the numbers of boxes in the two spatial terms (i.e., $|\tau_{1}|\times|\tau_{2}|$). Consequently, the maximum number of bounding boxes that can be produced at each time step $t$ for the above until formula is $|\mathbb{V}(\tau_{2},\mathcal{D},t,\tau,\epsilon,\zeta)| + |\mathbb{V}(\tau_{1},\mathcal{D},t,\tau,\epsilon,\zeta)| \times |\mathbb{V}(\tau_{1}\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau_{2},\mathcal{D},t+1,\tau,\epsilon,\zeta)|$.
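The entries of Table VI follow the recurrence implied by this recursive evaluation: at nesting level $l$, the count at time step $t$ is $f_{l}(t) = f_{l-1}(t)\,(1 + f_{l}(t+1))$, with $f_{l}=1$ at the last time step and $f_{0}=1$ everywhere. A short sketch that reproduces the table rows (the function name and encoding are ours, not from the STPL tool):

```python
def max_boxes(level, steps):
    """Max number of bounding boxes per time step for an until formula with
    `level` levels of nesting; entry k corresponds to time step t_{n-k}."""
    prev = [1] * steps                 # level 0: a single box per step
    for _ in range(level):
        cur = [1]                      # f_l(t_n) = 1 at the last time step
        for k in range(1, steps):
            # f_l(t) = f_{l-1}(t) * (1 + f_l(t+1))
            cur.append(prev[k] * (1 + cur[k - 1]))
        prev = cur
    return prev

print(max_boxes(1, 5))  # [1, 2, 3, 4, 5]             (row l = 1)
print(max_boxes(2, 5))  # [1, 4, 15, 64, 325]         (row l = 2)
print(max_boxes(3, 5))  # [1, 8, 135, 8704, 2829125]  (row l = 3)
```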

B-A Formulas with Exponential Time/Space Complexities

As shown in the first and second rows of Table VI, the number of required operations grows as an arithmetic sequence. Thus, the time complexity, which is the sum of the numbers in a row, is a linear function of the number of time steps for the first row and a polynomial function for the second row. Moreover, the space complexity for the first and second rows is constant and linear in the number of time steps, respectively.

In the following, we calculate an upper bound and a lower bound on the maximum number of bounding boxes that can be produced for the third row (level 2) of the until operator. We use the function $f_{2}(t)$ to denote the maximum number of bounding boxes produced at time step $t$ for the until formula $(\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)\,\mathcal{U}^{s}_{\mathcal{I}}\,(\tau\,\mathcal{U}^{s}_{\mathcal{I}}\,\tau)$.

\begin{multline}
\sum_{t=1}^{n} f_{2}(t) = 1 + 2(1+1) + 3(1+2(1+1)) + 4(1+3(1+2(1+1))) \\
+ 5(1+4(1+3(1+2(1+1)))) + \dots + n(1+(n-1)(1+(n-2)(\dots 1+2(1+1))\dots)) \tag{21}
\end{multline}

We repeatedly use the inequality $(a+1)b > a(1+b)$ for $b>a$ to derive the following inequality from the above equation:

\begin{equation}
\sum_{t=1}^{n} f_{2}(t) < 1 + 3! + 4! + 5! + 6! + \dots + (n+1)! \tag{22}
\end{equation}

where $n\geq 2$. Therefore, we have

\begin{equation}
\sum_{t=1}^{n} f_{2}(t) < n\times(n+1)! < (n+2)! \tag{23}
\end{equation}

Next, we calculate a lower bound on the maximum number of bounding boxes that can be produced at level 2 of the until operator. We repeatedly use the inequality $2^{(b+1)} < a(1+2^{b})$ for $a\geq 2$ to derive the following inequality from Eq. (21):

\begin{equation}
\sum_{t=1}^{n} f_{2}(t) > 1 + 2^{1} + 2^{2} + 2^{3} + 2^{4} + \dots + 2^{(n-1)} \tag{24}
\end{equation}

where $n\geq 2$. Therefore, we have

\begin{equation}
\sum_{t=1}^{n} f_{2}(t) > 2^{n} \tag{25}
\end{equation}

This inequality suggests that the time/space complexity for any formula with more than one level of nesting can be exponential.
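The factorial upper bound of Eq. (23) and the exponential lower bound of Eq. (25) can be sanity-checked numerically with the same recurrence. The helper below recomputes the level-2 row of Table VI; it is our own sketch, not part of the paper's tooling:

```python
import math

def f2_row(steps):
    """Level-2 row of Table VI: max boxes per time step for the until formula
    (tau U tau) U (tau U tau), via f_l(t) = f_{l-1}(t) * (1 + f_l(t+1))."""
    prev = [1] * steps
    for _ in range(2):
        cur = [1]
        for k in range(1, steps):
            cur.append(prev[k] * (1 + cur[k - 1]))
        prev = cur
    return prev

# Check 2^n < sum f_2(t) < (n+2)! for a range of trace lengths n.
for n in range(2, 10):
    total = sum(f2_row(n))
    assert 2 ** n < total < math.factorial(n + 2)  # Eqs. (25) and (23)
print("bounds verified for n = 2..9")
```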

B-B Best Complexity for the Worst Formulas

We can repeat the above method to calculate a lower bound for each row of the table by using the inequality $a^{r+1} < a^{r}(1+b)$ for $b>a$ in the summation of the elements of each row, which yields the inequality below:

\begin{multline}
\sum_{t=1}^{n} f_{0}(t) + \sum_{t=1}^{n} f_{1}(t) + \dots + \sum_{t=1}^{n} f_{l}(t) \\
> n + \frac{n(n+1)}{2} + \frac{2^{n}-1}{2-1} + \frac{3^{n}-1}{3-1} + \dots + \frac{l^{n}-1}{l-1} \\
> n + \frac{n(n+1)}{2} + 2^{(n-1)} + 3^{(n-1)} + \dots + l^{(n-1)} - \Big(1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{l-1}\Big) \\
> 2^{(n-1)} + 3^{(n-1)} + \dots + l^{(n-1)}
\end{multline}

where $n\geq 2$. Therefore, we have

\begin{equation}
\sum_{t=1}^{n} f_{0}(t) + \sum_{t=1}^{n} f_{1}(t) + \dots + \sum_{t=1}^{n} f_{l}(t) > l^{(n-1)} \tag{26}
\end{equation}

This establishes that the complexity of the algorithm is $\Omega(|\varphi_{s}|^{(|\hat{\rho}|-1)})$.
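Eq. (26) can likewise be checked numerically by summing the rows of Table VI up to level $l$; again, this is a sketch under our own encoding of the recurrence $f_{l}(t) = f_{l-1}(t)(1 + f_{l}(t+1))$:

```python
def row_sum(level, steps):
    """Sum of row `level` of Table VI over `steps` time steps."""
    prev = [1] * steps
    for _ in range(level):
        cur = [1]
        for k in range(1, steps):
            cur.append(prev[k] * (1 + cur[k - 1]))
        prev = cur
    return sum(prev)

# Check Eq. (26): the total over levels 0..l exceeds l^(n-1).
for n in range(2, 7):
    for l in range(2, 6):
        total = sum(row_sum(lv, n) for lv in range(l + 1))
        assert total > l ** (n - 1)
print("Eq. (26) verified for n = 2..6, l = 2..5")
```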