University of Illinois Urbana-Champaign
Email: {li213, bcyang2, dzhuang6, mitras}@illinois.edu
Massachusetts Institute of Technology
Email: [email protected]
Refining Perception Contracts: Case Studies in Vision-based Safe Auto-landing
Abstract
Perception contracts provide a method for evaluating the safety of control systems that use machine learning for perception. A perception contract is a specification for testing the ML components, and it gives a method for proving end-to-end system-level safety requirements. The feasibility of contract-based testing and assurance was established earlier in the context of straight lane keeping, a 3-dimensional system with relatively simple dynamics. This paper presents the analysis of two flight control systems, with 6 and 12-dimensional state spaces, that use multi-stage, heterogeneous, ML-enabled perception. The paper advances the methodology by introducing an algorithm for data and requirement guided refinement of perception contracts (DaRePC). The resulting analysis provides testable contracts which establish the state and environment conditions under which an aircraft can safely touch down on the runway and a drone can safely pass through a sequence of gates. It can also discover conditions (e.g., a low-horizon sun) that can possibly violate the safety of the vision-based control system.
Keywords: Learning-enabled control systems · Air vehicles · Assured autonomy · Testing
1 Introduction
The successes of Machine Learning (ML) have led to the emergence of a family of systems that take autonomous control actions based on recognition and interpretation of signals. Vision-based lane keeping in cars and automated landing systems for aircraft are exemplars of this family. In this paper, we explore two flight control systems that use multi-stage, heterogeneous, ML-enabled perception: an automated landing system created in close collaboration with our industry partners, and an autonomous racing controller built for a course in our lab.
Vision-based auto-landing system.
For autonomous landing, the availability of relatively inexpensive cameras and ML algorithms makes vision-based pose estimation an attractive alternative to expensive instrument landing infrastructure. The perception pipeline of a prototype vision-based automated landing system (see Fig. 1) estimates the relative pose of the aircraft with respect to the touchdown area using a combination of deep neural networks and classical computer vision algorithms. The rest of the auto-landing system uses the estimated pose from the perception pipeline to guide and control the aircraft from 2300 meters out, along a fixed-slope reference trajectory, down to the touchdown area on the runway. The quality of the camera images depends on the intensity and direction of lighting, weather conditions, and the actual position of the aircraft, and, in turn, influences the performance of the perception-based pose estimation. From a testing and verification perspective, the problem is to identify a range of environmental conditions and a range of initial approaches from which the aircraft can achieve safe touchdown; safety being defined by a cone or a sequence of predicates in the state space (see Fig. 2).
Vision-based drone racing.
Drone racing systems are pushing the boundaries of perception and agile control [23]. Our second case study (DroneRace) is about autonomous control of a drone through a sequence of gates, under different visibility conditions, in a photorealistic simulator we built using Neural Radiance Fields [26]. The perception pipeline here estimates the position of the drone using a vision-based particle filter. Once again, the challenge is to discover the environment and approach conditions under which the maneuver can be performed reliably.
The problem is challenging because the uncertainties in the perception module are state and environment dependent; these uncertainties have to be quantified and then propagated through the control and dynamics of the system. The core technical challenge is that we do not have specifications for the DNN-based perception pipeline: what precise property should it satisfy for system-level safety? The recently introduced perception contracts approach [15, 2] addresses this problem: the contract can be created from ground truth-labeled data, and it serves as a specification for modular testing of the ML perception components. Perception contracts can also be used to prove end-to-end system-level safety requirements. A contract gives a bound on the error between the true pose and the estimated pose, as a function of the ground truth state, that is supposed to hold even under environmental variations. In the previous works [15, 2], the range of environmental factors, and consequently the perception contracts, were fixed, and the feasibility of such contract-based reasoning was demonstrated in low-dimensional and relatively simple applications, such as a lane-keeping system.
This work advances the perception contracts method in two important ways. First, we develop an algorithm called DaRePC that iteratively refines the contracts based on both the target system-level requirement and sampled data from the environment. To the best of our knowledge, DaRePC is the first approach that can automatically identify the environmental conditions under which vision-based flight control systems can assure system-level safety. We show that, under certain convergence assumptions that can be empirically validated, DaRePC terminates with either a contract that proves the requirement or a counter-example. Second, we discuss findings from applying DaRePC to AutoLand and DroneRace. These are challenging applications with high-dimensional (6d and 12d), nonlinear dynamics and heterogeneous perception pipelines involving deep neural networks and filtering modules. For both case studies, we are able to identify a set of environmental conditions under which the system is proven to satisfy the requirement. We show that DaRePC can identify interesting, non-convex sets of environmental conditions under which the system satisfies the requirement (Fig. 2). For the AutoLand case study, we further show that our algorithm can find counter-examples violating the requirement.
2 Related Works
Contracts and Assume-Guarantee.
A closely related approach has been developed by Păsăreanu, Mangal, Gopinath, et al. in [18, 3, 19]. In [19], the authors generate weakest assumptions about the ML component from the system-level correctness requirements and the available information about the non-ML components. These weakest assumptions are analogous to perception contracts in that they are useful for assume-guarantee style reasoning for proving system-level requirements. However, unlike perception contracts, they are completely agnostic of the actual perception components. The approach was used to analyze a TaxiNet-based lane following system [18]. Another important technical difference is that this framework uses probabilistic discrete state transition systems as the underlying mathematical model.
Our previous work [15, 2] develops the idea of perception contracts using the language of continuous state space models. Thus far, all the application studies in this thread have been related to lane following by a single agent. The idea was recently extended to distributed systems, to assure safety of vision-based swarm formation control [14]. In [25], a perception contract is developed and used to help design a controller for safely landing a drone on a landing pad.
Safety of automated landing.
Safety of a vision-based automated landing system is analyzed in [21]. This approach carefully exploits the geometry of perspective cameras to create a mathematical relationship between the aircraft pose and the perceived pose. This relation is then modeled as a neural network, which is composed with the controller (another neural network in this case) for end-to-end analysis. This approach gives strong guarantees at the expense of requiring expertise (in creating the perception model) and being tailored to a particular application. Shoouri et al. [22] analyze a vision-based auto-landing system using falsification techniques. Their technique involves computing the range of tolerable error from the perception pipeline and finding states where this error bound is violated. However, the impact of environmental variations was not considered in either of these works.
Perception-based control.
In a series of recent works, Dean et al. studied robustness guarantees of perception-based control algorithms under simple perception noise models: in [4], the authors proposed a perception-based controller synthesis approach for linear systems and provided a theoretical analysis; in [5], they proposed measurement-robust control barrier functions for synthesizing safety-critical controllers under state uncertainty. More complex perception modules have also been studied: in [8], the reachability of a closed-loop system with a neural network controller was analyzed.
3 The Problem: Assuring Safety of Vision-based Systems
Notations.
For a function $F: A \to B$, we will extend its definition to sets in the usual way: $F(S) = \{F(a) \mid a \in S\}$ for any $S \subseteq A$. Also, for a function $G$ defined on sets, we will write $G(\{a\})$ simply as $G(a)$ for $a \in A$.
A vision-based system, or simply a system, is described by a tuple $\langle X, X_0, Y, E, f, h \rangle$, where $X$ is called the state space, $X_0 \subseteq X$ is the set of initial states, $Y$ is called the observation space, $E$ is called the environment space, $f: X \times Y \to X$ is called the dynamics, and $h: X \times E \to Y$ is called the observer. Typically, $X_0$ and $f$ are available in analytical form and are amenable to formal analysis. In contrast, the environment $E$ and the observer $h$ are not, and we only have black-box access to $h$. For example, the perception pipelines implementing the observer for the AutoLand and DroneRace systems (Sections 5-6) are well beyond the capabilities of NN verification tools.
An execution of a system in an environment $e \in E$ is a sequence of states $x_0, x_1, \ldots$ such that $x_0 \in X_0$ and, for each $t \geq 0$, $x_{t+1} = f(x_t, h(x_t, e))$. For an execution $\alpha$ and a time $t$, we write the state at time $t$ as $\alpha(t)$. The reachable states of a system in an environment $e$ form a sequence of sets of states $\mathsf{Reach}_0, \mathsf{Reach}_1, \ldots$ such that, for each $t$, $\alpha(t) \in \mathsf{Reach}_t$ for all possible executions $\alpha$. For any $t$, $\mathsf{Reach}_t$ denotes the set of states reachable at time $t$.
A requirement is given by a set $R \subseteq X$. An execution $\alpha$ in environment $e$ satisfies a requirement $R$ if, for each $t$, $\alpha(t) \in R$. The system satisfies $R$ over $X_0$ and $E' \subseteq E$, which we write as $\langle X_0, E' \rangle \models R$, if for every $e \in E'$, every execution of the system with observer $h$, starting from any state in $X_0$, satisfies $R$. We write $\langle X', E' \rangle \not\models R$ for some $X' \subseteq X_0$ and $E' \subseteq E$ if there exist $x_0 \in X'$ and $e \in E'$ such that the execution starting from $x_0$ in $e$ violates $R$. We make the dependence on the initial set and the environment set explicit here because these are the parameters that will change in our discussion.
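To make these definitions concrete, the following is a minimal Python sketch of the execution semantics and requirement check above; the dynamics f, observer h, environment e, and requirement predicate are generic placeholders, not the systems studied in this paper.

```python
from typing import Callable, List

import numpy as np

def execute(x0: np.ndarray,
            e: np.ndarray,
            f: Callable[[np.ndarray, np.ndarray], np.ndarray],
            h: Callable[[np.ndarray, np.ndarray], np.ndarray],
            T: int) -> List[np.ndarray]:
    """Unroll one execution: x_{t+1} = f(x_t, h(x_t, e))."""
    xs = [x0]
    for _ in range(T):
        y = h(xs[-1], e)          # observation from the (black-box) perception
        xs.append(f(xs[-1], y))   # dynamics step driven by the observation
    return xs

def satisfies(xs: List[np.ndarray], R: Callable[[np.ndarray], bool]) -> bool:
    """Check the requirement: every state of the execution must lie in R."""
    return all(R(x) for x in xs)
```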
Problem.
Given a system, we are interested in finding either (a) a subset of environments $E' \subseteq E$ such that $\langle X_0, E' \rangle \models R$, or (b) a counterexample, that is, an environment $e \in E$ such that $\langle X_0, \{e\} \rangle \not\models R$.
4 Method: Perception Contracts
The system is a composition of the dynamics function $f$ and the observer $h$, and we are interested in developing compositional or modular solutions to the above problem. The observer in a real system is implemented using multiple machine learning (ML) components, and it depends on the environment in complicated ways. For example, a camera-based pose estimation system of the type used in AutoLand uses camera images which depend on lighting direction, lens flare, albedo, and other factors, in addition to the actual physical position of the camera with respect to the target. Testing such ML-based observers is challenging because, in general, the observer has no specifications.
A natural way of tackling this problem is to create an over-approximation $M$ of the observer, through sampling or other means. The actual observer can be made to conform to this over-approximation arbitrarily precisely, simply by making $M$ large or conservative. However, such an over-approximation may not be useful for establishing system-level correctness. Thus, $M$ has to balance observer conformance and system correctness, which leads to the following notion of perception contract.
Definition 1
For a given observer $h$, a requirement $R$, and sets $S \subseteq X$, $E' \subseteq E$, an $\langle S, E' \rangle$-perception contract is a map $M: X \to 2^Y$ such that
1. Conformance: for every $x \in S$ and $e \in E'$, $h(x, e) \in M(x)$;
2. Correctness: $\mathsf{Sys}(M) \models R$.
Recall, $\mathsf{Sys}(M) \models R$ means that the closed-loop system $\mathsf{Sys}(M)$, which is the system with the observer substituted by $M$ (so that transitions become $x_{t+1} \in f(x_t, M(x_t))$), satisfies the requirement $R$. The following proposition states how perception contracts can be used for verifying actual perception-based systems.
Proposition 1
Suppose $S$ contains the reachable states of the system over $X_0$ and $E'$. If $M$ is an $\langle S, E' \rangle$-perception contract for $h$ for some requirement $R$, then the system satisfies $R$, i.e., $\langle X_0, E' \rangle \models R$.
At a glance, Proposition 1 may seem circular: it requires us to first compute the reachable states of the system to get a perception contract $M$, which is then used to establish the requirement for the same system. However, as we shall see with the case studies, the initial $S$ can be a very crude over-approximation of the reachable states, and the perception contract can still be useful for proving much stronger requirements.
4.1 Learning Conformant Perception Contracts from Data
For this paper, we consider deterministic observers $h$. This covers a broad range of perception pipelines that use neural networks and traditional computer vision algorithms. Recall, a perception contract gives a set $M(x) \subseteq Y$ of perceived or observed values for any state $x$. In this paper, we choose to represent this set as a ball of radius $r(x)$ centered at $c(x)$, where $c$ and $r$ are functions learned from data.
In real environments, observers are unlikely to be perfectly conformant to perception contracts. Instead, we can try to compute a contract that maximizes some measure of conformance of $M$, such as $\lambda(M) = \mathbb{E}_{(x,e) \sim \mathcal{D}}[\mathbb{1}(h(x,e) \in M(x))]$, where $\mathbb{1}$ is the indicator function and $\mathcal{D}$ is a distribution over the states and environments. See the case studies and the discussions in Section 7 for more information on the choice of $\mathcal{D}$. Computing the conformance measure exactly is infeasible because we only have black-box access to $h$ and $E$. Instead, we can estimate an empirical conformance measure: given a set of states $S$ and a set of environments $E'$ over which the perception contract is computed, we can sample according to $\mathcal{D}$ to get $\{(x_i, e_i, y_i)\}_{i=1}^{N}$, where $x_i \in S$, $e_i \in E'$, and $y_i = h(x_i, e_i)$. We define the empirical conformance measure of $M$ as the fraction of samples that satisfies the contract: $\hat{\lambda}(M) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(y_i \in M(x_i))$. Using Hoeffding's inequality [13], we can bound the gap between the actual conformance measure $\lambda(M)$ and the empirical measure $\hat{\lambda}(M)$ as a function of the number of samples $N$.
Proposition 2
For any $\epsilon, \delta > 0$, if $N \geq \frac{1}{2\epsilon^2} \ln \frac{2}{\delta}$, then with probability at least $1 - \delta$, $|\lambda(M) - \hat{\lambda}(M)| \leq \epsilon$.
This gives us a way to choose the number of samples $N$ to bound the gap $\epsilon$ between the empirical and the actual conformance, with some probability (as decided by the choice of $\delta$).
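As a quick illustration, the sample-size calculation implied by Proposition 2 can be written as follows (a sketch; the function name is ours):

```python
import math

def num_samples(eps: float, delta: float) -> int:
    """Smallest N with 2*exp(-2*N*eps^2) <= delta (Hoeffding / Prop. 2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# Example: eps = 0.01, delta = 0.01 requires num_samples(0.01, 0.01) = 26492.
```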
The LearnContract algorithm takes as input a set of states $S$ and an environment set $E'$ over which a perception contract is to be constructed for the actual perception function $h$. It also takes the desired conformance measure $\lambda_0$, the tolerable gap $\epsilon$ (between the empirical and actual conformance), and the confidence parameter $\delta$. The algorithm first computes the required number of samples $N$ for which the gap holds with probability at least $1 - \delta$ (based on Proposition 2). It then samples states over $S$ and environmental parameters in $E'$, according to the distribution $\mathcal{D}$. The samples are then used to compute the center function $c$ using standard regression.
The training data for the radius function $r$ is obtained by computing the difference $d_i = \|y_i - c(x_i)\|$ between the actual perceived state and the center predicted by $c$. To make the actual conformance at least $\lambda_0$, from Proposition 2, we know that the empirical conformance must be at least $q = \lambda_0 + \epsilon$. Recall, the empirical conformance is the fraction of samples satisfying $y_i \in M(x_i)$. The radius function $r$ is obtained by performing quantile regression with quantile value $q$. Given $\{(x_i, d_i)\}_{i=1}^{N}$, QuantileRegression finds the function $r$ such that a $q$ fraction of data points satisfy $d_i \leq r(x_i)$, by solving the following optimization problem [6]: $\min_r \sum_{i=1}^{N} \max\big(q \, (d_i - r(x_i)),\ (q - 1)(d_i - r(x_i))\big)$.
Therefore, given $\lambda_0$, $\epsilon$, and $\delta$, the algorithm is able to find an $M$ with the desired conformance.
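Below is a simplified sketch of LearnContract under the representation $M(x)$ = ball of radius $r(x)$ around $c(x)$, using scikit-learn regressors; the sampler, the observer h, and the specific model classes are illustrative assumptions, not the authors' implementation. num_samples is the helper sketched after Proposition 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

def learn_contract(sample, h, conf: float, eps: float, delta: float):
    n = num_samples(eps, delta)              # required N from Proposition 2
    states, envs = sample(n)                 # x_i in S, e_i in E', drawn per D
    obs = np.array([h(x, e) for x, e in zip(states, envs)])

    center = LinearRegression().fit(states, obs)            # center function c
    resid = np.linalg.norm(obs - center.predict(states), axis=1)

    q = min(conf + eps, 1.0)                 # quantile level lambda_0 + eps
    radius = GradientBoostingRegressor(loss="quantile", alpha=q)
    radius.fit(states, resid)                # radius function r (pinball loss)

    def M(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        return center.predict(x)[0], max(float(radius.predict(x)[0]), 0.0)
    return M                                 # M(x) = (center, radius) at x
```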
Contracts with fixed training data.
Sometimes we only have access to a pre-collected training data set. LearnContract can be modified to accommodate this setup. Given the number of samples $N$ and the confidence parameter $\delta$, according to Proposition 2, we can get an upper bound on the gap $\epsilon$ and therefore a lower bound on the required empirical conformance. We can then use quantile regression to find the contract with the right empirical conformance on the training data set.
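For this fixed-data variant, Proposition 2 is applied in the inverse direction: given $N$ and $\delta$, solve for the gap. A one-line sketch:

```python
import math

def gap_for_fixed_data(n: int, delta: float) -> float:
    """Upper bound eps on |actual - empirical| conformance for n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Example: n = 80000, delta = 0.01 gives eps ~= 0.0058, so targeting actual
# conformance lambda_0 requires empirical conformance >= lambda_0 + 0.0058.
```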
LearnContract constructs candidate perception contracts that are conformant to a degree over specified state and environment spaces. For certain environments, the system may not meet the requirements, and therefore even the most precise and conformant candidate contracts computed over such environments will not be correct. In Section 4.2, we develop techniques for refining $S$ and $E'$ to find contracts that are correct, or to find counter-examples. For the completeness of this search process, we make the following assumption about the convergence of LearnContract with respect to the environment set.
Convergence of LearnContract.
Consider a monotonically decreasing sequence of environment sets $E_1 \supseteq E_2 \supseteq \cdots$ whose limit is $\{e^*\}$ for some environment $e^*$. Let $M_1, M_2, \ldots$ be the corresponding sequence of candidate contracts constructed by LearnContract for the changing environment sets, keeping all other parameters fixed. We will say that the algorithm is convergent at $e^*$ if, for every state $x$, $M_i(x)$ converges to the singleton $\{h(x, e^*)\}$. Informally, as candidate contracts are computed for a sequence of environments converging to $e^*$, the contracts point-wise converge to the actual observer's output at $e^*$. In Section 5, we will see examples providing empirical evidence for this convergence property.
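The convergence property can be probed empirically, as done later in Section 5.3. A sketch of such a check, where shrink_env, the data handle, and learn are hypothetical stand-ins for the interfaces described in this section:

```python
def check_convergence(env_set, data, shrink_env, learn, probe_states, rounds=6):
    """Recompute contracts on shrinking environment sets; convergence
    evidence is the contract radius shrinking pointwise at probe states."""
    radii_per_round = []
    for _ in range(rounds + 1):
        M = learn(data.restricted_to(env_set))     # contract for current E_i
        radii_per_round.append([M(x)[1] for x in probe_states])
        env_set = shrink_env(env_set)              # E_{i+1} subset of E_i
    return all(all(later <= earlier
                   for earlier, later in zip(r0, r1))
               for r0, r1 in zip(radii_per_round, radii_per_round[1:]))
```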
4.2 Requirement Guided Refinement of Perception Contracts
We now present the Data and Requirement Guided Perception Contract (DaRePC) algorithm, which uses LearnContract to find perception contracts that are correct. Consider a system and a safety requirement $R$; we assume there exists a nominal environment $e^* \in E$ and an initial state from which the execution under $e^*$ satisfies $R$.
DaRePC takes as input the initial states $X_0$, the environment set $E$, the requirement $R$, and the parameters for LearnContract. It outputs either an environment set $E' \subseteq E$ with a candidate contract $M$ such that $\mathsf{Sys}(M) \models R$ and the actual observer conforms to $M$ over $E'$, or a set of initial states $X' \subseteq X_0$ and an environment set $E'$ from which $R$ will be violated.
DaRePC checks whether the reachable set computed with the candidate contract leaves $R$. If the reachable set completely leaves $R$, then a contract satisfying $R$ cannot be found on this partition, and the partition is returned as a counter-example (Line 7). When the reachable set partially exits $R$, refinement is performed by partitioning the initial states using RefineState and shrinking the environments around the nominal $e^*$ using ShrinkEnv (Line 8). If the reachable set never exits $R$, then $M$ is a valid perception contract that satisfies the correctness requirement in Definition 1, and it is returned together with $E'$.
We briefly discuss the subroutines used in DaRePC.
Reach computes an over-approximation of the reachable states of the system $\mathsf{Sys}(M)$. For showing completeness of the algorithm, we assume that the output of Reach converges to a single execution as the initial set and the contract converge to singletons. This assumption is satisfied by many reachability algorithms, especially simulation-based reachability algorithms [9, 24].
RefineState takes as input a set of states and partitions it into a finer set of partitions. With enough refinements, all partitions converge to arbitrarily small sets. There are many strategies for refining state partitions, such as bisecting hyperrectangular sets along coordinate axes.
The ShrinkEnv function takes as input a set of environmental parameters, the nominal environmental parameter $e^*$, and the actual perception function $h$, and generates a new, smaller set of environmental parameters. The sequence of sets generated by repeatedly applying ShrinkEnv monotonically converges to the singleton set containing only the nominal environmental parameter $e^*$.
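Putting the pieces together, the refinement loop can be sketched as follows; the predicates inside (reach segment contained in R) and outside (segment disjoint from R), and all subroutine signatures, are our assumptions about the interfaces described above rather than the authors' code.

```python
def darepc(X0, E, inside, outside, learn, reach, refine_state, shrink_env,
           e_nominal, max_iters=50):
    """inside(seg): reach segment fully within R; outside(seg): disjoint from R."""
    parts, env = [X0], E
    for _ in range(max_iters):
        M = learn(parts, env)                      # candidate contract
        todo = []
        for S in parts:
            tube = reach(S, M)                     # over-approx reach of Sys(M)
            if all(inside(seg) for seg in tube):
                continue                           # partition proved safe
            if any(outside(seg) for seg in tube):
                return ("counterexample", S, env)  # reach completely leaves R
            todo += refine_state(S)                # partial exit: refine
        if not todo:
            return ("safe", env, M)                # contract correct on all of X0
        parts, env = todo, shrink_env(env, e_nominal)
    return ("unknown", env, None)
```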
4.3 Properties of DaRePC
Theorem 4.1
If $\langle E', M \rangle$ is output by DaRePC, then $M$ is a valid perception contract in the sense of Definition 1; in particular, $\mathsf{Sys}(M) \models R$.
From the assumptions on LearnContract and the Reach function, we are able to show that as RefineState and ShrinkEnv are called repeatedly, the resulting reachable states computed by Reach converge to a single trajectory under the nominal environmental conditions. The property can be stated more formally as:
Proposition 3
Given $\epsilon > 0$ and an execution $\alpha$ of the system in the nominal environment $e^*$, there exists $k \in \mathbb{N}$ such that the reachable set computed by Reach, from the partition containing $\alpha(0)$ generated by calling RefineState $k$ times on $X_0$, with the contract obtained on the environment set refined by ShrinkEnv $k$ times, satisfies $\mathsf{Reach}_t \subseteq B_\epsilon(\alpha(t))$ for all $t$.
We further define what it means for the system to satisfy or violate the requirement robustly.
Definition 2
The system satisfies $R$ robustly for $X_0$ and $e^*$ if there exists $\epsilon > 0$ such that every execution $\alpha$ starting from $X_0$ under $e^*$ satisfies $B_\epsilon(\alpha(t)) \subseteq R$ for all $t$, where $B_\epsilon(\cdot)$ denotes the closed $\epsilon$-ball.
Similarly, the system violates $R$ robustly if there exists an execution $\alpha$ starting from $X_0$ under $e^*$, a time $t$, and an $\epsilon > 0$ such that $B_\epsilon(\alpha(t)) \cap R = \emptyset$.
With all these, we can show completeness of DaRePC.
Theorem 4.2 (Completeness)
If LearnContract satisfies conformance and convergence, and the system satisfies or violates the requirement robustly, then Algorithm 2 always terminates; i.e., the algorithm either finds $E'$ and $M$ such that $\langle X_0, E' \rangle \models R$, or finds a counter-example $X' \subseteq X_0$ and $E'$ such that $\langle X', E' \rangle \not\models R$.
5 Case Study: Vision-based Auto Landing
Our first case study is AutoLand, a vision-based auto-landing system in which an aircraft attempts to touch down on a runway using a vision-based pose estimator. Such systems are being contemplated for different types of aircraft landing at airports that do not have expensive Instrument Landing Systems (ILS). The system block diagram is shown in Fig. 1. The safety requirement $R$ defines a sequence of shrinking rectangular regions that the aircraft should remain in (see Fig. 2); otherwise, it has to trigger an expensive missed-approach protocol. For our analysis we built a detailed simulation of the aircraft dynamics; the observer, or the perception pipeline, is the actual code used for vision-based pose estimation.
5.1 AutoLand control system with DNN-based pose estimator
The state space of the AutoLand system is defined by the state variables $(x, y, z, \psi, \theta, v)$, where $x, y, z$ are position coordinates, $\psi$ is the yaw, $\theta$ is the pitch, and $v$ is the velocity of the aircraft. The perception pipeline implements the observer, which attempts to estimate the full state of the aircraft (relative to the touchdown point) except the velocity. That is, the observation space is defined by estimates of $(x, y, z, \psi, \theta)$.
Dynamics.
The dynamics $f$ combines the physics model of the aircraft with a controller that follows a nominal trajectory. A 6-dimensional nonlinear differential equation describes the motion of the aircraft, with the yaw rate, the pitch rate, and the acceleration as the control inputs (details are given in Appendix 0.E). The aircraft controller tries to follow a reference trajectory approaching the runway. This reference approach trajectory starts at a fixed position away from the touchdown point and above the ground, descends with a constant slope in the z direction, and has a constant approach speed.
Vision-based observer.
The observations are computed using a realistic perception pipeline composed of three functions (Fig. 1): (1) the airplane's front camera generates an image; (2) a key-point detector based on U-Net [20] detects specific key-points on the runway in that image; (3) a pose estimator uses the detected key-points and prior knowledge of the actual positions of those key-points on the ground to estimate the pose of the aircraft. This last calculation is implemented using the perspective-n-point algorithm [10]. Apart from the impact of the environment on the image, the observer is deterministic.
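For concreteness, the last step can be sketched with OpenCV's perspective-n-point solver; the key-point coordinates and the camera intrinsics K are placeholders for the pipeline's actual inputs, and this is an illustration of the technique rather than the pipeline's code.

```python
import cv2
import numpy as np

def estimate_pose(runway_pts_3d: np.ndarray,   # (N,3) known ground positions
                  keypoints_2d: np.ndarray,    # (N,2) U-Net detections in image
                  K: np.ndarray):              # (3,3) camera intrinsic matrix
    ok, rvec, tvec = cv2.solvePnP(
        runway_pts_3d.astype(np.float64),
        keypoints_2d.astype(np.float64),
        K, None,                               # no lens distortion assumed
        flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # rotation matrix from Rodrigues vector
    cam_pos = -R.T @ tvec             # camera position in runway coordinates
    return cam_pos.ravel(), R
```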
Environment.
For this paper, we study the effects of two dominant environment factors, namely the ambient light and the position of the sun, represented by a spotlight (we have also performed similar analysis with fog and rain, but that discussion is not included here owing to limited space). (1) The ambient light intensity, in the range [0.2, 1.2], and (2) the spotlight's angle with respect to the runway, measured in radians, define the environment space $E$. These ranges are chosen to include a nominal environment in which there exists an initial state from which the system is safe. Sample camera images from the same state under different environmental conditions are shown in Fig. 1.
5.2 Safety Analysis of AutoLand
We apply DaRePC to the system, with reachability analysis implemented using Verse [16]. We run the algorithm with the above environment space $E$, the requirement $R$, and two rectangular initial sets $X_0^1$ and $X_0^2$ with different ranges of initial positions (see Appendix 0.F).
For $X_0^1$, the algorithm finds an environment set $E'$ and a corresponding perception contract $M$ such that $\langle X_0^1, E' \rangle \models R$, after performing 12 rounds of refinement. The resulting $E'$ is visualized in Fig. 2. The white region represents the environmental parameters that are removed (not provably safe) during refinement. For the colored region, different colors represent the empirical conformance of $h$ to $M$ over each environmental parameter partition. We have the following observations from the analysis.
The computed perception contract achieves the desired conformance.
To compute the perception contract, we choose values for the desired conformance $\lambda_0$, the gap $\epsilon$ between actual and empirical conformance, and the confidence parameter $\delta$. From LearnContract and Proposition 2, these choices determine the minimum empirical conformance measure and the required number of samples, which we round up to the sample count used. The computed perception contract achieves the required empirical conformance measure over $E'$.
Additionally, we run 30 random simulations from $X_0^1$ under $E'$. Among the resulting data points, the fraction of observations contained in the computed perception contract $M$ exceeds the required empirical conformance for this different sampling distribution. This shows that $M$ meets the desired conformance, and can do even better for some realistic sampling distributions.
DaRePC captures interesting behavior.
The set of environmental parameters found by DaRePC also reveals some interesting behavior of the perception pipeline. We can see from Fig. 2 that when the intensity of the ambient light is below 0.6 or above 1.15, the performance of the perception pipeline degrades, as many of the partitions are removed in those regions. However, there is also a region where the ambient lighting condition is close to the nominal ambient lighting condition but, surprisingly, the performance of the perception pipeline is still bad. We found that this happens because, in this environment, the glare from the spotlight (sun) makes the runway too bright for the perception pipeline to detect keypoints. Conversely, in another region, the relatively low ambient light is compensated by the sun, and therefore the perception pipeline has reliable performance for landing.
Conformance, correctness and system-level safety.
Fig. 2 shows the system-level safety requirement $R$ in blue and the computed reachable set in red. We can see that the computed reachable set indeed falls inside $R$, which, by Proposition 1, implies that if the perception pipeline conforms to $M$ then $\langle X_0^1, E' \rangle \models R$.
Further, to check this on the actual system, we run 30 simulations starting from states randomly sampled from $X_0^1$ under environments randomly sampled from $E'$. The results are shown in Fig. 2. All 30 simulations satisfy $R$, and only one of them is not contained in the computed reachable set. This happens because $M$ does not have perfect conformance, and therefore cannot capture all behaviors of the actual system.
Discovering bad environments.
We performed 10 random simulations from $X_0^1$ under environments removed during refinement. Eight out of the 10 violate $R$ (red trajectories in Fig. 2). This suggests that the environments removed during refinement can indeed lead to real counter-examples violating $R$.
For $X_0^2$, DaRePC is able to find a set of states $X'$ and an environment set $E'$ such that $\langle X', E' \rangle \not\models R$. $X'$ and $E'$ are found by calling RefineState 5 times and ShrinkEnv 8 times. The computed set of environmental parameters and states is shown in Fig. 3. We perform 30 simulations starting from states randomly sampled from $X'$ under $E'$, and the results are shown in Fig. 3. We can see clearly from the plot that all the simulation trajectories violate $R$, which provides evidence that the $X'$ and $E'$ we found are indeed counter-examples.
5.3 Convergence and conformance of Perception Contracts
In Section 4.1, for showing termination of the algorithm, we assumed that the computed perception contracts are convergent, i.e., as candidate contracts are computed for a sequence of environments converging to $e^*$, the contracts also converge pointwise. To test this, we pre-sampled 80k data points and fixed the remaining parameters. We shrink the environment set by invoking the ShrinkEnv function 0, 3, and 6 times on the set of environments with pre-sampled data points. ShrinkEnv refines the environments by removing environmental parameters where the perceived states are far from the actual states, and removing the corresponding data points sampled under these environments. Fig. 4 shows how the output perception contract and its conformance measure are influenced by successive refinements. For the three environment sets obtained after 0, 3, and 6 rounds of refinement, we observe that the contract radii shrink monotonically, which provides evidence that the perception contract is convergent.
Refinement preserves conformance measure of perception contract.
As we reduce the range of environmental parameters with 0, 3, and 6 refinements, the training data set is reduced from 80000 to 71215 and 62638 points, respectively. Therefore, according to Proposition 2, the minimum empirical conformance measure needed to achieve the desired actual conformance is essentially unchanged across the three cases. In practice, the empirical conformance measures obtained in all three cases meet this requirement. From this, we can see that refinement preserves the conformance measure of the perception contract.
6 Case Study: Vision-based Drone Racing (DroneRace)
In the second case study, a custom-made quadrotor has to follow a path through a sequence of gates in our laboratory, using vision-based particle filtering for pose estimation. The system architecture is shown in Fig. 6. The safety requirement $R$ is a pipe-like region in space passing through the gates. Our analysis here is based on a simulator mimicking the laboratory flying space, which we created for this work using Neural Radiance Fields (NeRF) [26]. This simulator enables us to render photo-realistic camera images from any position in the space and with different lighting and visibility (fog) conditions. The observer implementing the vision-based particle filter is the actual code that flies the drone.
6.1 System Description
The 12-dimensional state space of the system is defined by the position, linear velocities, roll, pitch, and yaw angles, and the corresponding angular velocities. The observer estimates the 3d-position and the yaw angle; that is, the observation space is defined by these four variables. This is a reasonable setup, as roll and pitch angles can be measured directly using an onboard accelerometer, but position and yaw have to be obtained through computation or expensive external devices such as GPS or Vicon.
Dynamics.
A standard 12-dimensional nonlinear model for the drone is used [12]. The inputs to the drone are the desired roll, pitch, and yaw angles, and the vertical thrust. The path for passing through the gates is generated by a planner, and the controller tries to follow this planned path. More details about the controller and the dynamics are provided in Appendix 0.H.
Observer.
The observations are computed using a particle filter algorithm from [17]. This is a very different perception pipeline compared to AutoLand. The probability distribution over the state is represented using a set of particles, from which the mean estimate is calculated. Roughly, the particle filter updates the particles in two steps: in the prediction step, it uses the dynamics model to move the particles forward in time; in the correction step, it re-samples the particles to select those that are consistent with the actual measurement (images, in this case). The details of this algorithm are not important for our analysis. However, it is important to note that the particles used for prediction and correction do not have access to the environment parameters (e.g., lighting).
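A generic predict/correct step of this kind of filter looks roughly as follows; the dynamics step and the image-likelihood function (e.g., comparing the camera image against a rendering at the particle's pose) are hypothetical stand-ins, not the code of [17].

```python
import numpy as np

def particle_filter_step(particles, weights, u, image, step, likelihood,
                         rng=np.random.default_rng()):
    # Prediction: move particles through the dynamics model, with noise.
    particles = np.array([step(p, u) for p in particles])
    particles += rng.normal(scale=0.01, size=particles.shape)

    # Correction: reweight by how well each particle explains the image.
    weights = weights * np.array([likelihood(p, image) for p in particles])
    weights /= weights.sum()

    # Resample to concentrate particles on high-likelihood states.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))

    estimate = particles.mean(axis=0)   # mean pose estimate
    return particles, weights, estimate
```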
Besides the current image, the particle filter also uses history information to help estimation, and therefore the estimation accuracy may improve over time. The shape of the reference trajectory that the drone is following can also influence its performance, especially rapid turns in the reference trajectory. Therefore, for this specific application, we decided to sample data along the reference trajectory for constructing the perception contract, which gives a distribution similar to the one seen when the drone is following the reference trajectory.
Environment.
We study the impact of two environmental parameters: (1) fog, ranging from clear to heavy fog, and (2) ambient light, ranging from dark to bright. The nominal environment $e^*$ is a scenario in which the drone is able to complete the path successfully. The effects of different environments on the same view are shown in Fig. 1.
6.2 Analysis of DroneRace
For the DroneRace system, the initial set $X_0$ constrains the position and yaw to an approach zone (all the other dimensions are set to 0). Because the perception pipeline for the DroneRace example is highly sensitive, we choose the desired actual conformance measure $\lambda_0$, the gap $\epsilon$, and the confidence parameter $\delta$ accordingly; by Proposition 2, these choices determine the minimum empirical conformance measure and require at least 5756 training data points.
Conformance, correctness and system-level safety.
For the DroneRace system, with 23 refinements of the environment set, DaRePC is able to show that the system is safe. The environment set $E'$ that was proved safe is a ball-like region around the nominal environment $e^*$. The empirical conformance measure of the contract is higher than the minimal empirical conformance needed to get the desired actual conformance with confidence $1 - \delta$. The computed reachable set shows correctness with respect to the requirement (Fig. 5); that is, $\mathsf{Sys}(M) \models R$.
DroneRace satisfies $R$.
We run 20 random simulated trajectories starting from states randomly sampled from $X_0$ under environments sampled from $E'$. The results are visualized by the blue trajectories in Fig. 5. All 20 simulations are covered by the computed reachable set and therefore do not violate $R$. This provides evidence that $\langle X_0, E' \rangle \models R$. We also measure the empirical conformance of the perception contract on the states visited in these 20 random simulations, and the contract maintains its conformance on these states.
Discovering bad environments.
We run 20 additional simulations starting from states randomly sampled from $X_0$ under environments randomly sampled from outside of $E'$. The results are shown in Fig. 7 in the appendix. We observe that the drone violates the requirement in 19 of the 20 simulations. This provides evidence that the system can easily violate the requirement outside the environment set DaRePC identified.
7 Discussions
There are several limitations of the perception contract method developed in this paper, which also suggest directions for future explorations.
Distributional Shifts.
We showed that DaRePC has nice conformance and termination properties, provided LearnContract converges as the environment converges to the nominal conditions. This only makes sense in the limited situation where the underlying sampling distribution for constructing the contract remains the same under deployed conditions. It will be important to develop techniques for detecting and adapting to distributional shifts in the perception data, in order to have guarantees that are robust to such changes [1].
Runtime monitoring and control.
The problem formulation in this paper is focused on verification, which in turn requires the identification of the environmental conditions under which system safety can be assured. Perception contracts can also be used for runtime monitoring, to detect the violation of certain properties. The key challenge here is to compute preimages of the contracts, that is, the set of states $\{x \mid y \in M(x)\}$ consistent with an observation $y$, which can be used for detecting violation or imminent violation of the requirements. Another related problem is the correction of such imminent violations by changing the control actions.
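As an illustration, a contract-based monitor could flag an observation that is inconsistent with every state in the current reachable set estimate; a minimal sketch, assuming M(x) returns the (center, radius) of the contract ball and the reach set is represented by samples:

```python
import numpy as np

def monitor(y, reach_set_samples, M) -> bool:
    """Return True if y is consistent with some state x in the reach set."""
    for x in reach_set_samples:
        c, r = M(x)                       # contract ball at x
        if np.linalg.norm(y - c) <= r:    # y in M(x): consistent observation
            return True
    return False                          # imminent violation: raise an alarm
```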
This paper presented the analysis of 6 and 12-dimensional flight control systems that use multi-stage, heterogeneous, ML-enabled perception. Both systems have high-dimensional nonlinear dynamics and realistic perception pipelines including deep neural networks. We explored how variations of the environment impact system-level safety. We advanced the perception contract method by introducing an algorithm for data and requirement guided refinement of perception contracts (DaRePC). DaRePC is able to either identify a set of states and environmental factors under which the requirement is satisfied, or find a counter-example. In the future, we plan to apply the algorithm to real hardware to test its performance.
References
- [1] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety (2016)
- [2] Angello, A., Hsieh, C., Madhusudan, P., Mitra, S.: Perception contracts for safety of ml-enabled systems. In: Proc. of the ACM on Programming Languages (PACMPL), OOPSLA (2023)
- [3] Calinescu, R., Imrie, C., Mangal, R., Rodrigues, G.N., Păsăreanu, C., Santana, M.A., Vázquez, G.: Discrete-event controller synthesis for autonomous systems with deep-learning perception components. arXiv:2202.03360 (2023)
- [4] Dean, S., Matni, N., Recht, B., Ye, V.: Robust guarantees for perception-based control. In: Bayen, A.M., Jadbabaie, A., Pappas, G., Parrilo, P.A., Recht, B., Tomlin, C., Zeilinger, M. (eds.) Proceedings of the 2nd Conference on Learning for Dynamics and Control. Proceedings of Machine Learning Research, vol. 120, pp. 350–360. PMLR (10–11 Jun 2020), https://proceedings.mlr.press/v120/dean20a.html
- [5] Dean, S., Taylor, A., Cosner, R., Recht, B., Ames, A.: Guaranteeing safety of learned perception modules via measurement-robust control barrier functions. In: Kober, J., Ramos, F., Tomlin, C. (eds.) Proceedings of the 2020 Conference on Robot Learning. Proceedings of Machine Learning Research, vol. 155, pp. 654–670. PMLR (16–18 Nov 2021), https://proceedings.mlr.press/v155/dean21a.html
- [6] Dekkers, G.: Quantile regression: Theory and applications (2014)
- [7] Dreossi, T., Fremont, D.J., Ghosh, S., Kim, E., Ravanbakhsh, H., Vazquez-Chanlatte, M., Seshia, S.A.: VerifAI: A toolkit for the formal design and analysis of artificial intelligence-based systems. In: Computer Aided Verification (CAV) (Jul 2019)
- [8] Everett, M., Habibi, G., How, J.P.: Efficient reachability analysis of closed-loop systems with neural network controllers. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). pp. 4384–4390 (2021). https://doi.org/10.1109/ICRA48506.2021.9561348
- [9] Fan, C., Qi, B., Mitra, S., Viswanathan, M.: Dryvr: Data-driven verification and compositional reasoning for automotive systems. In: Majumdar, R., Kunčak, V. (eds.) Computer Aided Verification (CAV). pp. 441–461. Springer, Cham (2017)
- [10] Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (Jun 1981). https://doi.org/10.1145/358669.358692
- [11] Fremont, D.J., Chiu, J., Margineantu, D.D., Osipychev, D., Seshia, S.A.: Formal analysis and redesign of a neural network-based aircraft taxiing system with VerifAI. In: Computer Aided Verification (CAV) (Jul 2020)
- [12] Herbert, S.L., Chen, M., Han, S., Bansal, S., Fisac, J.F., Tomlin, C.J.: Fastrack: A modular framework for fast and guaranteed safe motion planning. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). pp. 1517–1522 (2017). https://doi.org/10.1109/CDC.2017.8263867
- [13] Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301), 13–30 (1963), http://www.jstor.org/stable/2282952
- [14] Hsieh, C., Koh, Y., Li, Y., Mitra, S.: Assuring safety of vision-based swarm formation control (2023)
- [15] Hsieh, C., Li, Y., Sun, D., Joshi, K., Misailovic, S., Mitra, S.: Verifying controllers with vision-based perception using safe approximate abstractions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41(11), 4205–4216 (2022)
- [16] Li, Y., Zhu, H., Braught, K., Shen, K., Mitra, S.: Verse: A python library for reasoning about multi-agent hybrid system scenarios. In: Enea, C., Lal, A. (eds.) Computer Aided Verification. pp. 351–364. Springer Nature Switzerland, Cham (2023)
- [17] Maggio, D., Abate, M., Shi, J., Mario, C., Carlone, L.: Loc-NeRF: Monte Carlo localization using neural radiance fields (2022)
- [18] Păsăreanu, C.S., Mangal, R., Gopinath, D., Getir Yaman, S., Imrie, C., Calinescu, R., Yu, H.: Closed-loop analysis of vision-based autonomous systems: A case study. In: Proc. 35th Int. Conf. Comput. Aided Verification. pp. 289–303 (2023)
- [19] Păsăreanu, C., Mangal, R., Gopinath, D., Yu, H.: Assumption Generation for the Verification of Learning-Enabled Autonomous Systems. arXiv:2305.18372 (2023)
- [20] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. pp. 234–241. Springer International Publishing, Cham (2015)
- [21] Santa Cruz, U., Shoukry, Y.: Nnlander-verif: A neural network formal verification framework for vision-based autonomous aircraft landing. In: Deshmukh, J.V., Havelund, K., Perez, I. (eds.) NASA Formal Methods. pp. 213–230. Springer International Publishing, Cham (2022)
- [22] Shoouri, S., Jalili, S., Xu, J., Gallagher, I., Zhang, Y., Wilhelm, J., Jeannin, J.B., Ozay, N.: Falsification of a vision-based automatic landing system. https://doi.org/10.2514/6.2021-0998
- [23] Song, Y., Romero, A., Müller, M., Koltun, V., Scaramuzza, D.: Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Science Robotics 8(82), eadg1462 (2023). https://doi.org/10.1126/scirobotics.adg1462, https://www.science.org/doi/abs/10.1126/scirobotics.adg1462
- [24] Sun, D., Mitra, S.: NeuReach: Learning reachability functions from simulations. In: Fisman, D., Rosu, G. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 322–337. Springer, Cham (2022)
- [25] Sun, D., Yang, B.C., Mitra, S.: Learning-based perception contracts and applications (2023)
- [26] Tancik, M., Weber, E., Ng, E., Li, R., Yi, B., Kerr, J., Wang, T., Kristoffersen, A., Austin, J., Salahi, K., Ahuja, A., McAllister, D., Kanazawa, A.: Nerfstudio: A modular framework for neural radiance field development. In: ACM SIGGRAPH 2023 Conference Proceedings. SIGGRAPH ’23 (2023)
Appendix 0.A Proof for Proposition 1
Proposition 1 Restated. Suppose $S$ contains the reachable states of the system over $X_0$ and $E'$. If $M$ is an $\langle S, E' \rangle$-perception contract for $h$ for some requirement $R$, then the system satisfies $R$, i.e., $\langle X_0, E' \rangle \models R$.
Proof
Consider any environment $e \in E'$ and any initial state $x_0 \in X_0$. From the correctness of $M$, it follows that $\mathsf{Sys}(M) \models R$. That is, for the system with $M$ as the observer, the reachable sets satisfy $\mathsf{Reach}_t \subseteq R$ for all $t$. Now consider any execution $\alpha$ of the actual system. Since $S$ contains the reachable states, $\alpha(t) \in S$ for all $t$. From the conformance of $M$, we then know that $h(\alpha(t), e) \in M(\alpha(t))$ for all $t$. Therefore, every transition of the actual system, $\alpha(t+1) = f(\alpha(t), h(\alpha(t), e))$, is also a possible transition of $\mathsf{Sys}(M)$, since $\alpha(t+1) \in f(\alpha(t), M(\alpha(t)))$. This tells us that $\alpha(t) \in \mathsf{Reach}_t \subseteq R$ for every execution and every $t$, which gives us $\langle X_0, E' \rangle \models R$.
Appendix 0.B Proof for Proposition 2
Proposition 2 Restated. For any $\epsilon, \delta > 0$, if $N \geq \frac{1}{2\epsilon^2} \ln \frac{2}{\delta}$, then with probability at least $1 - \delta$, we have that $|\lambda(M) - \hat{\lambda}(M)| \leq \epsilon$.
Proof
From Hoeffding's inequality, we get
$\Pr\left(|\hat{\lambda}(M) - \lambda(M)| \geq \epsilon\right) \leq 2 e^{-2 N \epsilon^2}.$
This implies that, for $N \geq \frac{1}{2\epsilon^2} \ln \frac{2}{\delta}$,
$\Pr\left(|\hat{\lambda}(M) - \lambda(M)| \leq \epsilon\right) \geq 1 - 2 e^{-2 N \epsilon^2} \geq 1 - \delta,$
which proves the proposition.
Appendix 0.C Proof for Proposition 3
Proposition 3 Restated. Given $\epsilon > 0$ and an execution $\alpha$ of the system in the nominal environment $e^*$, there exists $k \in \mathbb{N}$ such that the reachable set computed by Reach, from the partition containing $\alpha(0)$ generated by calling RefineState $k$ times on $X_0$, with the contract obtained on the environment set refined by ShrinkEnv $k$ times, satisfies $\mathsf{Reach}_t \subseteq B_\epsilon(\alpha(t))$ for all $t$.
Proof
We know from the definition of ShrinkEnv that the sequence of sets of environmental parameters generated by ShrinkEnv monotonically converges to the singleton set containing the nominal environmental parameter $e^*$. We know from the convergence of LearnContract that $M(x)$ converges to $\{h(x, e^*)\}$ as the set of environmental parameters converges to this singleton set. In addition, by definition, the output of RefineState converges to singleton sets of states as it is repeatedly invoked. Therefore, by the assumption on Reach, when a partition converges to some $\{x_0\}$, the computed reachable set converges to the trajectory starting from state $x_0$ generated by the system under the environmental parameter $e^*$.
Appendix 0.D Proof for Theorem 4.2
Theorem 4.2 Restated. If LearnContract satisfies conformance and convergence, and the system satisfies or violates the requirement robustly, then Algorithm 2 always terminates; i.e., the algorithm either finds $E'$ and $M$ such that $\langle X_0, E' \rangle \models R$, or finds a counter-example $X' \subseteq X_0$ and $E'$ such that $\langle X', E' \rangle \not\models R$.
Proof
To show completeness of the algorithm, we need to show four cases:
1) When there exists a counter-example, the algorithm can find it. If there exists a counter-example, i.e., a trajectory $\alpha$ starting from $x_0 \in X_0$ under the nominal environmental parameter $e^*$ that exits $R$ at some time $t$, then, by the robustness assumption, there exists $\epsilon > 0$ such that $B_\epsilon(\alpha(t)) \cap R = \emptyset$. Therefore, according to Proposition 3, with enough refinement we can get a set of states $X'$ containing $x_0$ and an environment set $E'$ containing $e^*$ such that the computed reachable set at time $t$ is contained in $B_\epsilon(\alpha(t))$. In this case, the computed reachable set completely leaves $R$, which captures the counter-example.
2) A returned counter-example is indeed a counter-example. A set of states $X'$ is returned as a counter-example when the computed reachable set at some time $t$ is disjoint from $R$. From the conformance of $M$, we know that the actual observations are contained in $M$ for all states and environments in the partition, so every actual trajectory starting from $X'$ under any environmental parameter in $E'$ reaches, at time $t$, a state that is not in $R$. The output of DaRePC in this case is indeed a counter-example.
3) When a counter-example does not exist, the algorithm can find an $M$ satisfying $R$ on $E'$. This case can be shown similarly to case 1). Since we assume the system is robust, for every trajectory $\alpha$ of the system that satisfies $R$, there exists $\epsilon > 0$ such that $B_\epsilon(\alpha(t)) \subseteq R$ for all $t$. As DaRePC repeatedly performs refinement, the computed reachable sets converge to single executions. Therefore, according to Proposition 3, with enough refinement, the reachable set around every safe trajectory is contained in $R$. Hence, with enough refinement, we can always find an $M$ that satisfies $R$ when the system satisfies $R$ robustly under $e^*$.
4) An $M$ returned by DaRePC always satisfies $R$ on $E'$. This is given automatically by Theorem 4.1 and the conformance assumption.
Appendix 0.E Dynamics and Controller for AutoLand Case Study
The dynamics of the AutoLand case study is given by a discrete-time model with a discretization parameter $\Delta t$. The control inputs are the yaw rate, the pitch rate, and the acceleration. The error between the reference points and the state is computed componentwise, and the control inputs are then computed from these errors by a PD controller.
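Since the equations are summarized only informally above, the following is a minimal sketch of a plausible discretized kinematic model and PD-style tracking law consistent with the description, with state (x, y, z, yaw, pitch, v) and inputs (yaw rate, pitch rate, acceleration); the specific equations, gains, and discretization value are assumptions, not the model used in the experiments.

```python
import numpy as np

DT = 0.1  # discretization parameter (assumed value)

def step(s: np.ndarray, u: np.ndarray) -> np.ndarray:
    """One Euler step of an illustrative kinematic fixed-wing model."""
    x, y, z, yaw, pitch, v = s
    yaw_rate, pitch_rate, accel = u
    return np.array([
        x + DT * v * np.cos(yaw) * np.cos(pitch),
        y + DT * v * np.sin(yaw) * np.cos(pitch),
        z + DT * v * np.sin(pitch),
        yaw + DT * yaw_rate,
        pitch + DT * pitch_rate,
        v + DT * accel,
    ])

def pd_control(s, s_ref, prev_err, kp=np.array([0.8, 0.8, 0.4]), kd=0.1):
    """PD law on (yaw, pitch, speed) errors -> (yaw rate, pitch rate, accel)."""
    err = s_ref[3:6] - s[3:6]
    derr = (err - prev_err) / DT
    return kp * err + kd * derr, err
```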
Appendix 0.F Initial Sets for AutoLand Case Study
The initial sets $X_0^1$ and $X_0^2$ for the AutoLand case study are rectangular sets with different ranges of initial positions.
Appendix 0.G Block Diagram for DroneRace
The block diagram for the DroneRace case study is shown in Fig. 6.
Appendix 0.H Dynamics and Controller for DroneRace Case Study
We used a slightly modified version of the drone dynamics from [12], given by a system of nonlinear differential equations with model parameters taken from [12]. Given a reference state provided by the reference trajectory, we compute the tracking error between the reference state and the current state. The position error is the difference between the drone's current position and the reference position; the tracking error for the other states is likewise the difference between the reference and current values. The control signals are then computed from these errors using a feedback controller.
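A sketch of such a feedback law, assuming the state ordering (position, velocity, attitude, angular rate) and illustrative gains; the actual gains and error terms of the controller used in the experiments may differ.

```python
import numpy as np

def tracking_control(s, s_ref, k_pos=1.0, k_vel=0.5, g=9.81):
    """Map tracking errors to (roll, pitch, yaw, thrust) commands."""
    pos_err = s_ref[0:3] - s[0:3]          # position error to the reference
    vel_err = s_ref[3:6] - s[3:6]          # velocity error
    acc_des = k_pos * pos_err + k_vel * vel_err

    # Desired roll/pitch from lateral acceleration (small-angle assumption),
    # desired yaw taken directly from the reference attitude.
    roll_des = -acc_des[1] / g
    pitch_des = acc_des[0] / g
    yaw_des = s_ref[8]
    thrust = g + acc_des[2]                # vertical thrust around hover
    return np.array([roll_des, pitch_des, yaw_des, thrust])
```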
Appendix 0.I Simulation Trajectories for DroneRace
The simulation trajectories starting from $X_0$ under environments sampled from outside $E'$ are shown in Fig. 7.