STL2vec: Signal Temporal Logic Embeddings for Control Synthesis With Recurrent Neural Networks

Wataru Hashimoto, Kazumune Hashimoto, and Shigemasa Takai

The authors are with the Graduate School of Engineering, Osaka University, Suita, Japan.

Abstract

In this paper, a method for learning a recurrent neural network (RNN) controller that maximizes the robustness of signal temporal logic (STL) specifications is presented. In contrast to previous methods, we consider synthesizing the RNN controller for which the user is able to select an STL specification arbitrarily from multiple STL specifications. To obtain such a controller, we propose a novel notion called STL2vec, which represents a vector representation of the STL specifications and exhibits their similarities. The construction of the STL2vec is useful since it allows us to enhance the efficiency and performance of the RNN controller. We validate our proposed method through the examples of the path planning problem.

Index Terms:

Signal temporal logic, neural network controller, optimal control.

I Introduction

Recently, formal methods have attracted much attention in control to address complex and temporal objectives such as periodic, sequential, or reactive tasks, going beyond the traditional control objectives like stability and tracking. Temporal logics such as linear temporal logic (LTL)[1], metric temporal logic (MTL)[2], and signal temporal logic (STL)[3] allow us to write down formal descriptions for these specifications.

In this paper, we focus on dealing with the STL specifications, which can specify the temporal properties of real-valued signals. One of the notable advantages of using the STL specifications in control is that we have access to the quantitative semantics, called robustness, which measures how much a signal satisfies the STL specification. The robustness has been used for control purposes in many existing works [4, 5, 6, 7, 8, 9, 10]. The authors of [4, 5, 6] proposed a scheme of encoding STL constraints or robustness of the STL specifications by mixed-integer linear constraints, and these are utilized to employ model predictive control (MPC). To alleviate the computational burden of mixed-integer programming, the works [7, 8, 9, 10] proposed the concept of smooth robustness, allowing us to solve the maximization problem of the robustness based on gradient-based algorithms.

To further improve scalability and meet real-time requirements, neural network (NN) based feedback controllers have been employed for STL tasks [11, 12, 13, 14, 15]. Although the learning procedure itself may be time-consuming, this computation can be done offline, and applying the resulting controller online is much faster than directly solving an optimization problem at each time step. In [11], the parameters of a feed-forward NN were trained to maximize the robustness of the STL specifications. Reinforcement learning algorithms such as deep Q-learning (DQN) and learning from demonstrations (LfD) were used for learning the NN controller in [12] and [13], respectively. The works [14, 15] used a recurrent neural network (RNN) [16] instead of a feed-forward NN, which is more suitable for control with STL specifications since the satisfaction of an STL formula depends on a state trajectory (not just the current state). In [14, 15], the RNN parameters were trained using imitation learning and model-based reinforcement learning, respectively.

However, the previous works cited above can deal with only one STL specification (i.e., the NN controller is trained for only one prescribed STL specification), and do not explicitly deal with multiple STL specifications. In practice, it may be more desirable and flexible for the user to be able to choose an STL specification freely from multiple STL specifications, as in the case where a surveillance task in a certain region (building, park, etc.) changes from day to day. A naive approach to achieve this would be to learn a set of NN controllers independently for all the candidate STL specifications. However, this would be expensive in terms of memory consumption and computational resources if the number of STL specifications is large (i.e., the number of parameters to be learned grows with the number of candidate STL specifications). Hence, we here consider learning a single NN controller, in which not only the states of the system but also the chosen STL specification are regarded as inputs to the NN. In particular, we focus on learning an RNN controller since, as previously mentioned, it is suitable for dealing with the history-dependent nature of the STL specifications. The above formulation leads us to the following question, which has not been investigated in the previous literature:

How should we encode the STL specifications by vectors, so that they are readable to the RNN?

Naively, we could assign unique numbers or one-hot vectors to the STL specifications and use them as the inputs to the RNN. Although these approaches are easy to implement, they cannot encode information about the similarities between the specifications. Thus, the underlying function to be learned can become unreasonably complex as the number of candidate STL specifications increases, which can lead to increased computation time for training as well as critical failures in control execution.

Motivated by the above issue, we propose a novel scheme for constructing a vector representation of STL specifications, called STL2vec, which captures similarities between the STL specifications. The vectors for the specifications will be trained based on Word2vec [17, 18] which is a technique widely known in natural language processing. Then, the parameters for the RNN are trained by taking the vectors obtained by STL2vec and state trajectory as the inputs (to the RNN) and defining a loss function by the average of negative robustness scores for all the candidate STL specifications.

The main contributions of this work are summarized as follows. First, we propose a method for obtaining vector representations of STL specifications that capture their similarities in terms of control policy. Although STL2vec is constructed based on the concept of Word2vec, which is a technique that maps a word of a natural language onto a vector space, how to generate a dataset to learn appropriate vector representations for STL specifications in control synthesis is not a trivial problem. We detail how to construct the dataset to train an appropriate STL2vec in Section V. Second, by using the vectors generated by STL2vec, we train a controller that can deal with multiple STL specifications with one RNN model. Naively, if one aims to train an NN controller to deal with multiple specifications by simply applying existing NN-based control synthesis methods such as [11, 12, 13, 14, 15], one needs to prepare as many NN models as there are STL specifications, which leads to large memory consumption and computational cost (for details, see Section V-C). Moreover, the proposed method has the potential to accelerate the training, as will be seen in the case study of Section VI. Although there exist other methods that make the NN controller flexible, such as [14] and [19], which can handle (unknown) obstacles by using control barrier functions, these methods are restricted to collision avoidance specifications and thus are not intended to deal with multiple temporal logic specifications. The proposed method overcomes such limitations: it gives the user the flexibility of selecting an STL specification freely from multiple ones, and this is achieved by constructing STL2vec.

II Preliminaries

II-A System description and notations

We consider a nonlinear discrete-time system of the form:

	$\displaystyle x_{t+1}=f(x_{t},u_{t}),\ x_{0}\in\mathcal{X}_{0},\ u_{t}\in\mathcal{U},$		(1)

where $x_{t}\in\mathbb{R}^{n}$ and $u_{t}\in\mathcal{U}$ are the state and the control input at time $t\in\mathbb{Z}_{\geq 0}$ , $\mathcal{X}_{0}\subset\mathbb{R}^{n}$ is the set of initial states, $\mathcal{U}\subset\mathbb{R}^{m}$ is the set of control inputs, and $f:\mathbb{R}^{n}\times\mathcal{U}\rightarrow\mathbb{R}^{n}$ is a function capturing the dynamics of the system. We assume that the initial state $x_{0}$ is randomly chosen from $\mathcal{X}_{0}$ according to the probability distribution $p:\mathcal{X}_{0}\rightarrow\mathbb{R}$ , and $\mathcal{U}$ is given by $\mathcal{U}=\{u\in\mathbb{R}^{m}:u_{\min}\leq u\leq u_{\max}\}$ for given $u_{\max},u_{\min}\in\mathbb{R}^{m}$ (the inequalities are interpreted element-wise). Given $x_{0}\in\mathcal{X}_{0}$ and a sequence of control inputs $u_{0},\ldots,u_{T-1}$ , we can generate a unique sequence of states according to the dynamics (1), which we call a trajectory: $x_{0:T}=x_{0},x_{1},\ldots,x_{T}$ .
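The trajectory generation described above can be sketched in a few lines. This is a minimal illustration only: the function names `rollout` and `single_integrator`, and the single-integrator dynamics themselves, are assumptions introduced here and are not taken from the paper.

```python
from typing import Callable, List, Sequence

State = List[float]
Control = List[float]


def rollout(f: Callable[[State, Control], State],
            x0: Sequence[float],
            controls: Sequence[Control]) -> List[State]:
    """Return x_0, ..., x_T generated by x_{t+1} = f(x_t, u_t)."""
    trajectory = [list(x0)]
    for u in controls:
        trajectory.append(f(trajectory[-1], u))
    return trajectory


def single_integrator(x: State, u: Control) -> State:
    # Hypothetical dynamics used only for illustration: x_{t+1} = x_t + u_t.
    return [xi + ui for xi, ui in zip(x, u)]


# Three identical inputs give a trajectory with T + 1 = 4 states.
traj = rollout(single_integrator, [0.0, 0.0], [[1.0, 0.5]] * 3)
```

Given $x_{0}$ and $T$ control inputs, the returned list always contains $T+1$ states, matching the definition of $x_{0:T}$.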

II-B Signal Temporal Logic

Signal temporal logic (STL) [3] is a logical formalism that can specify temporal properties of real-valued signals. The syntax of the STL formula is recursively defined as follows:

	$\displaystyle\phi::=$	$\displaystyle\top\mid\mu\mid\neg\phi\mid\phi_{1}\land\phi_{2}\mid\phi_{1}\lor\phi_{2}\mid\bm{F}_{I}\phi\mid$
		$\displaystyle\bm{G}_{I}\phi\mid\phi_{1}\bm{U}_{I}\phi_{2}$		(2)

where $\mu:\mathbb{R}^{n}\rightarrow\mathbb{B}$ is the predicate whose boolean truth value is determined by the sign of a function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$ , $\top$ , $\neg$ , $\land$ , and $\lor$ are Boolean true, negation, and, and or operators, respectively, and $\bm{F}_{I}$ , $\bm{G}_{I}$ , $\bm{U}_{I}$ are the temporal eventually, always, and until operators defined on a time interval $I=[a,b]=\{t\in\mathbb{Z}_{\geq 0}:a\leq t\leq b\}$ ( $a$ , $b\in\mathbb{Z}_{\geq 0}$ ).

We define the semantics of an STL formula $\phi$ with respect to the trajectory $\mathbf{x}:=x_{0:T}$ of the system (1) at time $t$ as follows:

	$\displaystyle(\mathbf{x},t)\models\mu\Leftrightarrow h(x_{t})>0$
	$\displaystyle(\mathbf{x},t)\models\neg\mu\Leftrightarrow\neg((\mathbf{x},t)\models\mu)$
	$\displaystyle(\mathbf{x},t)\models\phi_{1}\land\phi_{2}\Leftrightarrow(\mathbf{x},t)\models\phi_{1}\land(\mathbf{x},t)\models\phi_{2}$
	$\displaystyle(\mathbf{x},t)\models\phi_{1}\lor\phi_{2}\Leftrightarrow(\mathbf{x},t)\models\phi_{1}\lor(\mathbf{x},t)\models\phi_{2}$
	$\displaystyle(\mathbf{x},t)\models\bm{F}_{I}\phi\Leftrightarrow\exists t_{1}\in t+I,(\mathbf{x},t_{1})\models\phi$
	$\displaystyle(\mathbf{x},t)\models\bm{G}_{I}\phi\Leftrightarrow\forall t_{1}\in t+I,(\mathbf{x},t_{1})\models\phi$
	$\displaystyle(\mathbf{x},t)\models\phi_{1}\bm{U}_{I}\phi_{2}\Leftrightarrow\exists t_{1}\in t+I\ \mathrm{s.t.}\ (\mathbf{x},t_{1})\models\phi_{2}$
	$\displaystyle\qquad\qquad\land\forall t_{2}\in[t,t_{1}],(\mathbf{x},t_{2})\models\phi_{1},$

where $t+I=\{t+k\in\mathbb{Z}_{\geq 0}:k\in I\}$ . For simplicity, we write $\mathbf{x}\models\phi$ to abbreviate $(\mathbf{x},0)\models\phi$ . The above semantics is qualitative, in the sense that it reveals only whether the trajectory $\mathbf{x}$ satisfies or violates $\phi$ .

The notion of robustness of STL formulas provides quantitative semantics, and it measures how much the trajectory satisfies the STL formula. The robustness is sound in the sense that positive robustness value implies satisfaction of STL formula and negative robustness implies violation of STL formula. The robustness score of the STL formula $\phi$ is defined with respect to a trajectory $\mathbf{x}$ and a time $t$ , which we denote by $\rho^{\phi}(\mathbf{x},t)$ , and is recursively defined as follows:

	$\displaystyle\rho^{\mu}(\mathbf{x},t)$	$\displaystyle=h(x_{t})$
	$\displaystyle\rho^{\neg\mu}(\mathbf{x},t)$	$\displaystyle=-h(x_{t})$
	$\displaystyle\rho^{\phi_{1}\land\phi_{2}}(\mathbf{x},t)$	$\displaystyle=\min(\rho^{\phi_{1}}(\mathbf{x},t),\rho^{\phi_{2}}(\mathbf{x},t))$
	$\displaystyle\rho^{\phi_{1}\lor\phi_{2}}(\mathbf{x},t)$	$\displaystyle=\max(\rho^{\phi_{1}}(\mathbf{x},t),\rho^{\phi_{2}}(\mathbf{x},t))$
	$\displaystyle\rho^{\bm{F}_{I}\phi}(\mathbf{x},t)$	$\displaystyle=\max_{t_{1}\in t+I}\rho^{\phi}(\mathbf{x},t_{1})$
	$\displaystyle\rho^{\bm{G}_{I}\phi}(\mathbf{x},t)$	$\displaystyle=\min_{t_{1}\in t+I}\rho^{\phi}(\mathbf{x},t_{1})$
	$\displaystyle\rho^{\phi_{1}\bm{U}_{I}\phi_{2}}(\mathbf{x},t)$	$\displaystyle=\max_{t_{1}\in t+I}\Bigl{(}\min(\rho^{\phi_{2}}(\mathbf{x},t_{1}),$
		$\displaystyle\qquad\quad\min_{t_{2}\in[t,t_{1}]}\rho^{\phi_{1}}(\mathbf{x},t_{2}))\Bigr{)}$

Note that the trajectory length $T$ should be selected large enough to determine the robustness score (see, e.g., [4]). For simplicity, we denote $\rho^{\phi}(\mathbf{x})$ to abbreviate $\rho^{\phi}(\mathbf{x},0)$ .
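The recursive definition of the robustness score translates directly into a recursive function. The sketch below is an illustration under stated assumptions: the tuple-based formula encoding (`("pred", h)`, `("F", a, b, phi)`, and so on) is a hypothetical representation chosen here, negation is applied to arbitrary subformulas by flipping the sign, and the trajectory is assumed to be long enough for every time index that is accessed.

```python
def robustness(formula, x, t=0):
    """Robustness of `formula` on trajectory `x` (a list of states) at time t."""
    op = formula[0]
    if op == "pred":                      # ("pred", h): predicate h(x_t) > 0
        return formula[1](x[t])
    if op == "not":                       # ("not", phi)
        return -robustness(formula[1], x, t)
    if op == "and":                       # ("and", phi1, phi2)
        return min(robustness(formula[1], x, t), robustness(formula[2], x, t))
    if op == "or":                        # ("or", phi1, phi2)
        return max(robustness(formula[1], x, t), robustness(formula[2], x, t))
    if op == "F":                         # ("F", a, b, phi): eventually on [a, b]
        _, a, b, phi = formula
        return max(robustness(phi, x, t1) for t1 in range(t + a, t + b + 1))
    if op == "G":                         # ("G", a, b, phi): always on [a, b]
        _, a, b, phi = formula
        return min(robustness(phi, x, t1) for t1 in range(t + a, t + b + 1))
    if op == "U":                         # ("U", a, b, phi1, phi2): until on [a, b]
        _, a, b, phi1, phi2 = formula
        return max(
            min(robustness(phi2, x, t1),
                min(robustness(phi1, x, t2) for t2 in range(t, t1 + 1)))
            for t1 in range(t + a, t + b + 1)
        )
    raise ValueError(f"unknown operator: {op}")


# Scalar example: predicates x > 1 and x < 3 on a short trajectory.
gt1 = ("pred", lambda s: s - 1.0)
lt3 = ("pred", lambda s: 3.0 - s)
xs = [0.0, 0.5, 1.5, 2.5]

rho_F = robustness(("F", 0, 3, gt1), xs)       # best margin over the window
rho_G = robustness(("G", 0, 3, gt1), xs)       # worst margin over the window
rho_U = robustness(("U", 0, 3, lt3, gt1), xs)
```

A positive value of `rho_F` together with a negative value of `rho_G` reflects that the signal eventually exceeds 1 but does not always do so.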

III Problem Statement

Let us now formulate a problem that we seek to solve throughout the paper. First, let $\Phi=\{\phi_{1},\phi_{2},\ldots,\phi_{M}\}$ denote a set of $M$ candidate STL specifications. We assume that the STL specification can be freely chosen from the $M$ candidate specifications by the user before the control execution. Once the STL specification is chosen by the user, say $\phi_{i}\in\Phi$ , the system (1) is then controlled aiming to satisfy $\phi_{i}$ . We also assume for simplicity that the chosen STL specification is fixed during control execution (as detailed below).

In this paper, we aim to learn a feedback controller that maximizes the robustness of the STL specification. Note that the satisfaction and the robustness of an STL specification are defined over the trajectory of the system (1), and that the STL specification is freely chosen from $\Phi$ . Hence, we should design a control policy that depends not only on the past and present states, but also on the STL specification, i.e., we need to obtain a control policy of the form $u_{t}=\pi(x_{0:t},\phi_{i};W)$ , where $x_{0:t}=x_{0},x_{1},\dots,x_{t}$ is the trajectory of the system (1), $\phi_{i}$ is the STL specification that is chosen from $\Phi$ , and $W$ denotes a set of parameters to be learned to characterize the control policy $\pi$ . More formally, the problem considered in this paper is defined as follows:

Problem 1

Given the system (1), horizon length $T$ , probability distribution of initial states $p\ :\ \mathcal{X}_{0}\rightarrow\mathbb{R}$ , and the candidate STL specifications $\Phi=\{\phi_{1},\phi_{2},\dots,\phi_{M}\}$ , find a set of parameters $W$ that is the solution to the following problem:

$\displaystyle\mathop{\mathrm{maximize}}_{W}\ $	$\displaystyle\frac{1}{M}\sum_{i=1}^{M}\mathbb{E}_{p(x^{i}_{0})}\left[\rho^{\phi_{i}}\left(x_{0:T}^{i}\right)\right]$	(3)
$\displaystyle\mathrm{s.t.}\ \ $	$\displaystyle{x}_{t+1}^{i}={f}\left({x}_{t}^{i},\pi\left(x_{0:t}^{i},\phi_{i};{W}\right)\right),$	(4)
	$\displaystyle t=0,1,\dots,T-1,\ i=1,2,\dots,M,$

where $\pi(\cdot,\cdot;W)$ denotes a control policy parameterized by $W$ and $x_{0:t}^{i}=x^{i}_{0},x^{i}_{1},\ldots,x^{i}_{t}$ is the trajectory of (1) along with the policy $\pi(\cdot,\phi_{i};W)$ . $\Box$

Figure 1: The structure of the RNN.

In Problem 1, we look for a set of parameters $W$ of the control policy that maximizes the expected robustness score with respect to the distribution of initial states, averaged over all $M$ candidate STL specifications. Since the control policy depends on the past and present states, i.e., it is history dependent, in this paper we use a recurrent neural network (RNN) to learn the control policy $\pi$ (see [14, 15]). An RNN is a type of neural network that has a feedback architecture. A basic structure of the RNN is illustrated in Fig. 1. As shown in Fig. 1, the RNN processes the sequential information via an internal hidden state $h$ . The update rule of the hidden state and the derivation of the output at time $t$ are as follows:

	$\displaystyle h_{t}=g_{W_{1}}(h_{t-1},s_{t}),\ u_{t}=l_{W_{2}}(h_{t}),$		(5)

where $g_{W_{1}}$ and $l_{W_{2}}$ are the functions parameterized by weights $W_{1}$ and $W_{2}$ , respectively, and $s_{t}$ denotes the input that is fed to the RNN. Hence, the set of the RNN parameters is given by $\{W_{1},W_{2}\}$ . As discussed in [11], we can restrict the output of the RNN (i.e., the control input $u_{t}$ ) within the lower bound $u_{\mathrm{min}}$ and the upper bound $u_{\mathrm{max}}$ by defining each element $i$ $(i=1,\dots,m)$ of the function $l_{W_{2}}$ as follows:

	$\displaystyle u_{t,i}=u_{\mathrm{min},i}+\frac{u_{\mathrm{max},i}-u_{\mathrm{min},i}}{2}\left(\mathrm{tanh}\left([W_{2}h_{t}]_{i}\right)+1\right),$		(6)

where the subscript $i$ denotes the $i$-th element of the vector. Using (6), we can generate control inputs satisfying $u_{t}\in\mathcal{U}$ . A detailed definition of $s_{t}$ in (5) as well as a concrete procedure to learn $W_{1},W_{2}$ are given in Section V.
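To make the update rule (5) and the output bounding (6) concrete, the following sketch implements one step of a simple recurrent cell. The paper leaves $g_{W_{1}}$ generic, so the Elman-style choice $h_{t}=\tanh(W_{h}h_{t-1}+W_{s}s_{t})$ used here, as well as the names `rnn_step`, `W_h`, and `W_s`, are assumptions for illustration; only the output layer follows (6) directly.

```python
import math


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def rnn_step(h_prev, s, W_h, W_s, W2, u_min, u_max):
    """One step of (5)-(6): returns the new hidden state and the bounded input."""
    # Assumed form of g_{W1}: an Elman-style tanh cell with W1 = {W_h, W_s}.
    pre = [a + b for a, b in zip(matvec(W_h, h_prev), matvec(W_s, s))]
    h = [math.tanh(p) for p in pre]
    # Output layer (6): squash each element of W2 h into [u_min_i, u_max_i].
    y = matvec(W2, h)
    u = [lo + 0.5 * (hi - lo) * (math.tanh(yi) + 1.0)
         for yi, lo, hi in zip(y, u_min, u_max)]
    return h, u


h0 = [0.0, 0.0]
W_h = [[0.0, 0.0], [0.0, 0.0]]
W2 = [[50.0, -50.0]]

# Even with very large weights the output stays inside [-1, 2].
_, u_saturated = rnn_step(h0, [1.0], W_h, [[50.0], [-50.0]], W2, [-1.0], [2.0])
# With zero pre-activation the output is the midpoint of the interval.
_, u_mid = rnn_step(h0, [1.0], W_h, [[0.0], [0.0]], W2, [-1.0], [2.0])
```

Because $\tanh$ takes values in $[-1,1]$, the factor $\tanh(\cdot)+1$ lies in $[0,2]$, which is what keeps each $u_{t,i}$ between its bounds regardless of the weights.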

Solving Problem 1 is challenging in the following sense. Since the control policy depends on an STL specification $\phi_{i}$ , we should consider how to transform each STL specification $\phi_{i}\in\Phi$ into a corresponding vector, so that it can be fed to the RNN as an input together with the state trajectory $x_{0:t}$ . Intuitively, it is desirable to provide similar inputs to the RNN if two STL specifications are close to each other in terms of their control policies. For example, consider a simple case with $x\in\mathbb{R}$ and the candidate STL specifications

	$\displaystyle\phi_{1}=\bm{F}_{[0,10]}(0\leq x\leq 1),\ \phi_{2}=\bm{F}_{[0,11]}(0\leq x\leq 1),$		(7)
	$\displaystyle\phi_{3}=\bm{F}_{[0,10]}(10\leq x\leq 11),$		(8)
	$\displaystyle\phi_{4}=\bm{F}_{[0,10]}(10\leq x\leq 11)\lor\bm{F}_{[0,10]}(12\leq x\leq 13).$		(9)

Intuitively, control policies to satisfy $\phi_{1}$ and $\phi_{2}$ may be almost the same (as we aim to control the system to the same region within almost the same time interval), while the control policies to satisfy $\phi_{1}$ and $\phi_{3}$ may be quite different (as we aim to control the system to different regions). In addition, if we could find a control policy to satisfy $\phi_{3}$ , this control policy also leads to the satisfaction of $\phi_{4}$ . Hence, it would be convenient to have a mapping in which $\phi_{1}$ and $\phi_{2}$ are mapped onto points that are close to each other but far from those corresponding to $\phi_{3}$ and $\phi_{4}$ . In addition, $\phi_{3}$ and $\phi_{4}$ may be mapped onto the same point if a control policy to satisfy $\phi_{3}$ can be found.
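The intuition can be checked numerically by evaluating the robustness of $\phi_{1},\ldots,\phi_{4}$ on a single trajectory that drives the state into the region $[10,11]$. The trajectory below and the helper names `interval_margin` and `eventually` are assumptions made for this illustration; the interval predicate $lo\leq x\leq hi$ is treated as the conjunction of $x\geq lo$ and $x\leq hi$, whose robustness is the minimum of the two margins.

```python
def interval_margin(x, lo, hi):
    """Robustness of the predicate lo <= x <= hi: min(x - lo, hi - x)."""
    return min(x - lo, hi - x)


def eventually(xs, a, b, lo, hi):
    """Robustness of F_[a,b](lo <= x <= hi) evaluated at time 0."""
    return max(interval_margin(xs[t], lo, hi) for t in range(a, b + 1))


# Hypothetical trajectory moving from 5.5 toward the region [10, 11].
xs = [5.5 + 0.5 * t for t in range(12)]

rho = {
    "phi1": eventually(xs, 0, 10, 0.0, 1.0),
    "phi2": eventually(xs, 0, 11, 0.0, 1.0),
    "phi3": eventually(xs, 0, 10, 10.0, 11.0),
    "phi4": max(eventually(xs, 0, 10, 10.0, 11.0),
                eventually(xs, 0, 10, 12.0, 13.0)),
}
```

On this trajectory $\phi_{1}$ and $\phi_{2}$ receive the same negative score, while $\phi_{3}$ and $\phi_{4}$ receive the same positive score, which is the kind of similarity structure a useful vector representation should reflect.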

Motivated by the above intuition, in this paper we propose a novel scheme for constructing a vector representation of the STL specifications, which is referred to as STL2vec. The proposed approach is inspired by the notion of Word2vec [17], a vector representation of words that has been proposed in machine learning literature and widely utilized for natural language processing. A concrete procedure of the proposed approach is elaborated in Section V.

IV Summary of Word2vec (skip-gram)

Before presenting the proposed approach, we summarize the basic concept of Word2vec. As we will see in the next section, Word2vec is a key ingredient for introducing STL2vec and its training procedure. The main objective of Word2vec is to group the vectors of similar or related words (for example, the words "man" and "boy" are mapped onto nearby points in the vector space). As such, we can let the computer perform mathematical operations on words to detect their similarities. The mapping from each word to the vector is represented via an NN, i.e., the input to the NN is a word, and its output is the corresponding vector. One of the common techniques to train the NN is the skip-gram [17]. The structure of the skip-gram is shown in Fig. 2. As shown in Fig. 2, the skip-gram is a shallow three-layer NN. The input to the NN is an $M$-dimensional one-hot vector, i.e., a single element is 1 and all the other elements are 0, where $M$ is the total number of words in the corpus (typically with $M>10^{6}$ ). Here, each word is assigned a unique one-hot vector (for example, "man" is labeled by $[1,0,0,\ldots,0]^{\mathsf{T}}$ , and "boy" is labeled by $[0,1,0,\ldots,0]^{\mathsf{T}}$ , etc.), so that each word in the corpus is readable to the NN. The outputs of the NN are a collection of $P$ ( $M$-dimensional) vectors, each element of which represents the probability of the corresponding word in the corpus (training data for the output layer will be explained later in this section), where $P$ is a user-defined parameter. The mapping from the input layer to the hidden layer is given by a matrix $W_{\mathrm{in}}\in\mathbb{R}^{N\times M}$ , where $N$ is a user-defined parameter that indicates the dimension of the word vector. The mapping from the hidden layer to each output layer is given by another matrix $W_{\mathrm{out}}\in\mathbb{R}^{M\times N}$ . Here, we note that the matrix $W_{\mathrm{out}}$ is the same for all the output layers.

The weight parameters $W_{\mathrm{in}},W_{\mathrm{out}}$ are trained from a large number of sentences, such as those in news reports (see, e.g., [17]). For example, suppose $P=2$ and we have the sentence "I want to eat an orange every morning". Then, we regard a certain word in this sentence as the training input and the set of $P=2$ words around it as the training output. For example, if the word "eat" is regarded as the training input, the set of $P=2$ words around it, i.e., {to, an}, is regarded as the training output. Generating the training dataset in this way is motivated by the insight that the meaning of a word is established by the surrounding words.

Using the dataset constructed above, we train the skip-gram model. First, we pass the one-hot encoded input data to the input layer and perform the forward computation. As an output layer and loss function, we use softmax and cross-entropy loss, respectively. Then, we update the weights $W_{\mathrm{in}}$ and $W_{\mathrm{out}}$ using the gradients obtained by back-propagation computation (for details, see [17]). After the training, the vector representation of each word in the corpus is given through the projection of the weight matrix $W_{\mathrm{in}}$ (hence, we drop the matrix $W_{\mathrm{out}}$ in the skip-gram model). That is, for each word $w$ in the corpus, the corresponding vector representation is given by $z_{w}=W_{\mathrm{in}}e_{w}\in\mathbb{R}^{N}$ , where $e_{w}\in\mathbb{R}^{M}$ denotes the one-hot vector of the word $w$ .
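The forward pass and loss described above can be sketched as follows. This is a minimal illustration, not the implementation of [17]: the function names are hypothetical, and the weights are stored so that $z_{w}=W_{\mathrm{in}}e_{w}$ holds as written, i.e., `W_in` has $N$ rows and $M$ columns and `W_out` has $M$ rows and $N$ columns (an assumption about the storage layout).

```python
import math


def one_hot(index, size):
    """One-hot vector e_w with a 1 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed(W_in, word):
    """Vector representation z_w = W_in e_w (column `word` of W_in)."""
    return matvec(W_in, one_hot(word, len(W_in[0])))


def skipgram_loss(W_in, W_out, center, context):
    """Cross-entropy loss of one training pair (center word, P context words)."""
    probs = softmax(matvec(W_out, embed(W_in, center)))
    # W_out is shared by all P output layers, so the same probs are reused.
    return -sum(math.log(probs[c]) for c in context)


# Toy sizes: M = 3 words, N = 2 dimensions.
W_in = [[0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6]]
W_out_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

z1 = embed(W_in, 1)
loss_uniform = skipgram_loss(W_in, W_out_zero, center=1, context=[0, 2])
```

Multiplying by a one-hot vector simply selects a column of `W_in`, which is why the trained input matrix alone is enough to produce the vector representation after training.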

V Proposed Scheme

In this section, we describe the solution approach to Problem 1. The proposed approach consists of two steps: (i) train a vector representation of the STL specifications, namely STL2vec, whose input is the STL specification $\phi_{i}\in\Phi$ and whose output is an $N$-dimensional vector; (ii) train the RNN, whose inputs are the state trajectory $x_{0:t}$ and the vector representation $z_{\phi_{i}}$ of the STL specification, and whose output is the control input $u_{t}$ . The two steps are illustrated in Fig. 3. Since STL2vec is trained based on the skip-gram, the parameter to be learned in Step (i) is $W_{\mathrm{in}}$ (recall from Section IV that $W_{\mathrm{out}}$ is dropped after training). Also, recall that the parameters to be learned for the RNN in Step (ii) are $\{W_{1},W_{2}\}$ (see Section III). Hence, the overall parameters to be learned for the control policy $\pi$ are $W=\{W_{\mathrm{in}},W_{1},W_{2}\}$ .

The proposed approach is beneficial in the following sense. By constructing STL2vec, we can obtain a vector representation that exhibits similarities between the STL specifications. This allows us to improve both the training efficiency and the performance of the RNN controller in contrast to naive approaches such as integer or one-hot encoding, i.e., simply assigning arbitrary numbers or one-hot vectors to the STL specifications (e.g., $\phi_{1}$ is assigned 1, $\phi_{2}$ is assigned 2, and so on) and using them as the inputs to the RNN; for details, see the experimental results in Section VI. Moreover, the proposed approach requires less memory than learning RNN controllers one by one for each STL formula; for details, see Section V-C and Section VI. The concrete procedures of the above two steps are given in the following subsections.

V-A Training STL2vec

In this subsection, we provide a detailed procedure of Step (i) (train STL2vec). To train STL2vec, we use the skip-gram model as explained in Section IV. First, we encode all the STL candidate specifications by the $M$ -dimensional one-hot vectors, i.e. each STL specification $\phi_{i}\in\Phi$ is encoded by an $M$ -dimensional vector whose $i$ -th element is 1, and all other elements are 0 ( $\phi_{1}$ is encoded as $[1,0,\ldots,0]^{\mathsf{T}}$ and $\phi_{2}$ is encoded as $[0,1,0,\ldots,0]^{\mathsf{T}}$ , and so on), so that they are readable to the NN. For simplicity, the one-hot vector of $\phi_{i}\in\Phi$ is denoted by $e_{\phi_{i}}\in\{0,1\}^{M}$ .

A key question regarding the construction of STL2vec is how to learn the weight parameters $W_{\mathrm{in}},W_{\mathrm{out}}$ for the skip-gram model, or in other words, how to generate the training dataset to learn $W_{\mathrm{in}},W_{\mathrm{out}}$ . In this paper, we learn these parameters such that similar vector representations are obtained if two STL specifications are close to each other in terms of their control policies (for the intuition behind this, see Section III). To this end, we generate the training dataset by comparing robustness scores among the STL specifications, aiming at measuring their closeness. More specifically, for each $\phi_{i}\in\Phi$ , we randomly select an initial state $x_{0}\in\mathcal{X}_{0}$ (following the probability distribution $p:\mathcal{X}_{0}\rightarrow\mathbb{R}$ ), and solve the following maximization problem:

	$\displaystyle\mathop{\mathrm{maximize}}_{u_{0},\ldots,u_{T-1}}\ \rho^{\phi_{i}}\left(x_{0:T}\right)$
	$\displaystyle\mathrm{s.t.}\ \ {x}_{t+1}={f}\left({x}_{t},u_{t}\right),\ u_{t}\in\mathcal{U},\ t=0,1,\dots,T-1.$		(10)

Let $u_{0}^{*},u_{1}^{*},\dots,u_{T-1}^{*}\in\mathcal{U}$ denote the optimal control inputs as the solution to (10), and $x_{0:T}^{*}=x_{0}^{*},x_{1}^{*},\dots,x_{T}^{*}$ the corresponding trajectory of the system (1). Thus, the corresponding (maximized) robustness score is $\rho^{\phi_{i}}\left(x^{*}_{0:T}\right)$ . Then, we compute the robustness scores of the obtained trajectory $x_{0:T}^{*}$ with respect to all the other STL specifications, i.e.,

	$\displaystyle\rho^{\phi_{j}}(x_{0:T}^{*}),\ \mathrm{for\ all}\ \phi_{j}\in\Phi\ \mathrm{with}\ i\neq j.$		(11)

Then, we sort the robustness scores of (11) in order of their closeness to $\rho^{\phi_{i}}\left(x^{*}_{0:T}\right)$ (i.e., sort by evaluating $|\rho^{\phi_{i}}\left(x^{*}_{0:T}\right)-\rho^{\phi_{j}}\left(x^{*}_{0:T}\right)|$ , $i\neq j$ ). We denote the ordered specifications by $\phi^{i}_{k_{1}},\phi^{i}_{k_{2}},\dots,\phi^{i}_{k_{M-1}}$ , where $\phi^{i}_{k_{j}}\in\Phi$ denotes the specification which has $j$ -th closest robustness value to $\rho^{\phi_{i}}(x_{0:T}^{*})$ .

Algorithm 1: Learning $W_{\mathrm{in}},W_{\mathrm{out}}$ in the skip-gram.

Input: $\Phi=\{\phi_{1},\ldots,\phi_{M}\}$ (candidate specifications); $x_{0}$ (initial state); $P$ (number of output layers); $N$ (dimension of vector representation); $N_{\mathrm{ite}}$ (number of iterations)

Output: $W_{\mathrm{in}},W_{\mathrm{out}}$ (weight parameters for the skip-gram)

$\mathcal{D}\leftarrow\varnothing$ ;
for each $\phi_{i}\in\Phi$ do
    for $\ell=1:N_{\mathrm{ite}}$ do
        Select $x_{0}\in\mathcal{X}_{0}$ according to the probability distribution $p:\mathcal{X}_{0}\rightarrow\mathbb{R}$ ;
        Given $\phi_{i}\in\Phi$ and $x_{0}\in\mathcal{X}_{0}$ , solve (10) to obtain the optimal trajectory $x^{*}_{0:T}=x^{*}_{0},\ldots,x^{*}_{T}$ ;
        Compute the robustness scores of $x^{*}_{0:T}$ with respect to the other STL specifications (11);
        Sort (11) in order of their closeness to $\rho^{\phi_{i}}\left(x^{*}_{0:T}\right)$ , and pick up the first $P$ STL specifications: $\phi_{k_{1}}^{i},\phi_{k_{2}}^{i},\dots,\phi_{k_{P}}^{i}$ ;
        Let $\mathcal{D}_{\mathrm{temp}}$ be given by (12). Then, update the dataset as $\mathcal{D}\leftarrow\mathcal{D}\cup\mathcal{D}_{\mathrm{temp}}$ ;
    end for
end for
Based on the training dataset $\mathcal{D}$ and the cross-entropy loss, train $W_{\mathrm{in}},W_{\mathrm{out}}$ via back-propagation.

Then, we pick up the first $P$ STL specifications, i.e., $\phi_{k_{1}}^{i},\phi_{k_{2}}^{i},\dots,\phi_{k_{P}}^{i}$ (recall that $P$ is the number of output layers in the skip-gram model), and set the input and output training data as $\left\{{\phi_{i}},\left({\phi^{i}_{k_{1}}},{\phi^{i}_{k_{2}}},\dots,{\phi^{i}_{k_{P}}}\right)\right\}$ , where $\phi_{i}$ represents the input data and $(\phi^{i}_{k_{1}},\phi^{i}_{k_{2}},\dots,\phi^{i}_{k_{P}})$ represents the output data. This data is then encoded by the one-hot vectors, regarded as the training data to learn the skip-gram:

	$\displaystyle\left\{e_{\phi_{i}},\left(e_{\phi^{i}_{k_{1}}},e_{\phi^{i}_{k_{2}}},\dots,e_{\phi^{i}_{k_{P}}}\right)\right\}.$		(12)

The data in (12) is then added to the dataset. For each $\phi_{i}\in\Phi$ , we repeat the above process $N_{\mathrm{ite}}$ times so as to obtain a collection of the training data. The proposed approach presented above is summarized in Algorithm 1. Then, the weight parameters $W_{\mathrm{in}},W_{\mathrm{out}}$ are learned by minimizing the cross entropy loss via back-propagation.

Remark 1

Note that, due to the definition of the robustness given in Section II, the robustness can involve nested max/min functions and thus the objective of the optimization problem (10) can be non-differentiable. To deal with this problem, we adopt a smooth approximation of the max/min operators by the log-sum-exp as follows: $\max(a_{1},\dots,a_{m})\approx\frac{1}{\beta}\mathrm{ln}\sum_{i=1}^{m}\exp(\beta a_{i})$ and $\min(a_{1},\dots,a_{m})\approx-\frac{1}{\beta}\mathrm{ln}\sum_{i=1}^{m}\exp(-\beta a_{i})$ , where $\beta>0$ is the scaling parameter. It is known that the approximation error goes to $0$ as $\beta\rightarrow\infty$ (see, e.g., [7]). This approximation allows the problem (10) to be solved by gradient-based algorithms. $\Box$
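The log-sum-exp smoothing in Remark 1 can be sketched as below. The function names `smooth_max` and `smooth_min` are hypothetical, and subtracting the largest element before exponentiating is a standard numerical-stability device that is not mentioned in the paper. The smooth minimum is obtained from the identity $\min(a_{1},\dots,a_{m})=-\max(-a_{1},\dots,-a_{m})$.

```python
import math


def smooth_max(values, beta):
    """(1/beta) * ln(sum_i exp(beta * a_i)), computed in a stable way."""
    m = max(values)
    # Factoring out exp(beta * m) avoids overflow for large beta.
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta


def smooth_min(values, beta):
    """Smooth minimum via min(a) = -max(-a)."""
    return -smooth_max([-v for v in values], beta)


scores = [0.3, -0.5, 0.1]
approx_max = smooth_max(scores, beta=50.0)
approx_min = smooth_min(scores, beta=50.0)
```

For $m$ values, the smooth maximum overestimates the true maximum by at most $\ln(m)/\beta$, and the smooth minimum underestimates the true minimum by at most the same amount, which is consistent with the error vanishing as $\beta\rightarrow\infty$.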

Remark 2

Since the optimization problem (10) is nonlinear, finding its global optimum is difficult, and local optima that lead to undesirable control performance may be obtained. To the best of our knowledge, there are no general, systematic methodologies to obtain the global optimum of (10). However, some heuristics can potentially improve the solution. For example, in the case study of Section VI, we have confirmed that we could increase the possibility of obtaining a reasonable solution (a solution with positive robustness) by appropriately setting the parameter $\beta$ in the smooth approximation. Furthermore, we have also confirmed that solving (10) several times with initial states randomly sampled from the vicinity of $x_{0}$ could increase the possibility of obtaining positive robustness, i.e., once $x_{0}$ is chosen, (10) is solved with an initial state $\tilde{x}_{0}$ newly sampled so as to satisfy $\|\tilde{x}_{0}-x_{0}\|\leq\epsilon$ , where $\epsilon$ is a given small positive constant. $\Box$

Finally, similarly to Word2vec, the vector representation of the STL specifications is given through the projection of $W_{\mathrm{in}}$ , i.e., the vector representation of $\phi_{i}\in\Phi$ is given by $z_{\phi_{i}}=W_{\mathrm{in}}e_{\phi_{i}}$ .

Example 1

Consider $M=5$ candidate STL specifications $\Phi=\{\phi_{1},\phi_{2},\dots,\phi_{5}\}$ and $P=2$ . Suppose that we consider $\phi_{1}$ and solve the corresponding optimization problem (10) (i.e., $i=1$ ) to obtain the optimal trajectory $x_{0:T}^{*}$ . Suppose that the robustness scores of $x_{0:T}^{*}$ with respect to the STL specifications are obtained as $\rho^{\phi_{1}}(x_{0:T}^{*})=0.3$ , $\rho^{\phi_{2}}(x_{0:T}^{*})=0.2$ , $\rho^{\phi_{3}}(x_{0:T}^{*})=-0.5$ , $\rho^{\phi_{4}}(x_{0:T}^{*})=0.1$ , and $\rho^{\phi_{5}}(x_{0:T}^{*})=0.25$ . Then, we select the top $P=2$ specifications among $\{\phi_{2},\phi_{3},\phi_{4},\phi_{5}\}$ whose robustness scores are the closest to that of $\phi_{1}$ . In this case, we select $\phi_{5}$ and $\phi_{2}$ , since their robustness scores are the closest and the second closest to $\rho^{\phi_{1}}(x_{0:T}^{*})$ , respectively. Thus, the resulting data is $\{(\phi_{1}),(\phi_{5},\phi_{2})\}$ and the encoded training data to be added to the dataset is $\left\{[1,0,0,0,0],\ \left([0,0,0,0,1],[0,1,0,0,0]\right)\right\}$ . $\Box$
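The selection in Example 1 can be reproduced with the short sketch below. The helper names `closest_specs` and `one_hot` are ours, and the sketch deliberately ignores the tie-handling rule of Remark 3.

```python
def closest_specs(rho, target, P):
    """Return the P specification indices (other than target) whose robustness
    is closest to rho[target], ordered from closest to farthest."""
    others = [k for k in rho if k != target]
    others.sort(key=lambda k: abs(rho[k] - rho[target]))
    return others[:P]

def one_hot(index, size):
    """One-hot row vector for a 1-based specification index."""
    return [1 if k == index else 0 for k in range(1, size + 1)]

# Robustness scores of the optimal trajectory for phi_1, as given in Example 1.
rho = {1: 0.3, 2: 0.2, 3: -0.5, 4: 0.1, 5: 0.25}

context = closest_specs(rho, target=1, P=2)
sample = (one_hot(1, 5), [one_hot(k, 5) for k in context])
print(context)  # [5, 2]
print(sample)   # ([1, 0, 0, 0, 0], [[0, 0, 0, 0, 1], [0, 1, 0, 0, 0]])
```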

Remark 3

If there exist two or more STL specifications that have the same robustness values in the output data, we add all the combinations to the training data set. For example, in Example 1, if the robustness values are obtained as $\rho^{\phi_{1}}(x_{0:T}^{*})=0.3$ , $\rho^{\phi_{2}}(x_{0:T}^{*})=0.2$ , $\rho^{\phi_{3}}(x_{0:T}^{*})=-0.5$ , $\rho^{\phi_{4}}(x_{0:T}^{*})=0.2$ , and $\rho^{\phi_{5}}(x_{0:T}^{*})=0.1$ , then we add the training data as $\left\{(\phi_{1}),(\phi_{2},\phi_{5})\right\}$ and $\left\{(\phi_{1}),(\phi_{4},\phi_{5})\right\}$ . $\Box$

V-B Training RNN

Here, we describe the detailed procedure of Step (ii). Recall that in Step (ii), we aim to train the RNN, whose inputs are the trajectory $x_{0:t}$ and the vector representation of $\phi_{i}$ (i.e., $z_{\phi_{i}}$ ), and whose output is the control input $u_{t}$ . Recall also that the unfolded structure of the RNN is shown in Fig. 1, and that the update rules are given by (5)–(6). Here, the input variable $s_{t}$ is given by the concatenation of the state $x_{t}$ and the vector representation of the chosen STL specification, i.e., $s_{t}=[x_{t}^{\mathsf{T}},z_{\phi_{i}}^{\mathsf{T}}]^{\mathsf{T}}$ , where $z_{\phi_{i}}\in\mathbb{R}^{N}$ is the vector representation of the chosen STL specification $\phi_{i}$ . Now, let $\tilde{\pi}(x_{0:t},z_{\phi_{i}};W_{1},W_{2})$ denote the control policy (or a mapping) for the RNN, where $W_{1},W_{2}$ are the RNN parameters to be learned (for an illustration, see Fig. 3(b)). Since the STL2vec parameters $W_{\mathrm{in}}$ are fixed, we here focus on learning the RNN parameters $W_{1},W_{2}$ that characterize $\tilde{\pi}$ .

To numerically solve Problem 1, we repeat the following procedure until a prescribed number of epochs $N_{\mathrm{epo}}$ is reached. First, in each epoch, we construct mini-batches of the specifications by randomly shuffling the candidate specifications and splitting them according to the prespecified batch size $N_{b}$ , i.e., we construct the batches $\mathcal{B}_{1}=\{\phi_{1}^{1},\ldots,\phi_{N_{b}}^{1}\}$ , $\mathcal{B}_{2}=\{\phi_{1}^{2},\ldots,\phi_{N_{b}}^{2}\}$ , $\dots$ , $\mathcal{B}_{K}=\{\phi_{1}^{K},\ldots,\phi_{N_{b}}^{K}\}$ , where $K=\frac{M}{N_{b}}$ (assuming that $M$ is divisible by $N_{b}$ ; see footnote 1). Then, we iterate the following procedures for all the batches $\mathcal{B}_{p}\ (p=1,\ldots,K)$ to update the parameters $W_{1}$ and $W_{2}$ .

Footnote 1: If $M$ is not divisible by $N_{b}$ , we construct the mini-batches with $K=\lfloor\frac{M}{N_{b}}\rfloor$ ( $\lfloor\cdot\rfloor$ denotes the floor function) and append the remaining specifications to the last batch. For example, if $\Phi=\{\phi_{1},\phi_{2},\ldots,\phi_{5}\}$ and $N_{b}=2$ , we construct the mini-batches as, e.g., $\mathcal{B}_{1}=\{\phi_{1},\phi_{3}\}$ , $\mathcal{B}_{2}=\{\phi_{2},\phi_{4},\phi_{5}\}$ .
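The mini-batch construction described above, including the rule that leftover specifications are appended to the last batch, might look as follows. This is a minimal sketch; the function name `make_minibatches` and the use of integer labels for specifications are our own choices.

```python
import random

def make_minibatches(specs, batch_size, rng):
    """Shuffle the specifications and split them into K = floor(M / N_b) batches,
    appending any leftover specifications to the last batch."""
    shuffled = list(specs)
    rng.shuffle(shuffled)
    K = len(shuffled) // batch_size
    batches = [shuffled[k * batch_size:(k + 1) * batch_size] for k in range(K)]
    remainder = shuffled[K * batch_size:]
    if batches and remainder:
        batches[-1].extend(remainder)
    elif remainder:  # fewer specifications than one full batch
        batches.append(remainder)
    return batches

# Five specifications labeled 1..5 with batch size 2, as in the example with M = 5.
batches = make_minibatches(range(1, 6), batch_size=2, rng=random.Random(0))
print([len(b) for b in batches])  # [2, 3]
```

A fresh shuffle would be drawn at the start of every epoch, so the composition of the batches changes from epoch to epoch.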

V-B1 Forward Computation

For all the pairs of the initial state $x_{0}^{j}$ $(j=1,\ldots,L)$ which are randomly sampled from the initial region $\mathcal{X}_{0}$ and the vectors of the specifications $z_{\phi_{i}^{p}}$ $(i=1,\ldots,N_{b})$ in a batch $\mathcal{B}_{p}$ , we generate the trajectories $x_{0:T}^{i,j}$ by iteratively applying ${x}_{t+1}^{i,j}={f}\left({x}_{t}^{i,j},\tilde{\pi}\left(x^{i,j}_{0:t},z_{\phi_{i}^{p}};{W_{1},W_{2}}\right)\right)$ for fixed $W_{1}$ and $W_{2}$ from the initial state $x_{0}^{i,j}=x_{0}^{j}$ . Then, by using all the $N_{b}L$ trajectories obtained above, we compute the following loss:

$$-\frac{1}{N_{b}L}\sum_{i=1}^{N_{b}}\sum_{j=1}^{L}\rho^{\phi_{i}^{p}}\left(x_{0:T}^{i,j}\right),\qquad(13)$$

which is an approximation of the expectation (3) in Problem 1 with the specifications in a batch (note that we take the negative of the robustness to define a loss).
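To make the loss (13) concrete, the sketch below evaluates it for a table of robustness values indexed by specification $i$ and initial state $j$. The robustness of an "eventually reach a box" task is computed here as the maximum over time of the signed margin to the box boundary; this predicate choice is an assumption on our part, since Section II is not reproduced here, and the sketch uses the exact (non-smooth) max/min for brevity.

```python
def box_margin(point, box):
    """Signed margin of a 2-D point to an axis-aligned box: positive inside, negative outside."""
    (x_lo, x_hi), (y_lo, y_hi) = box
    x, y = point
    return min(x - x_lo, x_hi - x, y - y_lo, y_hi - y)

def eventually_in_box(traj, box):
    """Robustness of 'eventually be inside the box' over the whole trajectory."""
    return max(box_margin(p, box) for p in traj)

def batch_loss(robustness):
    """Negative mean of robustness[i][j] over specifications i and initial states j, as in (13)."""
    count = sum(len(row) for row in robustness)
    return -sum(sum(row) for row in robustness) / count

traj = [(0, 0), (2, 2), (4, 4)]
print(eventually_in_box(traj, ((3, 5), (3, 5))))  # 1
print(batch_loss([[1.0, 0.5], [-0.5, 0.0]]))      # -0.25
```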

V-B2 Backward Computation

After the forward computation, we compute the gradients of the loss with respect to all the parameters via backpropagation through time (BPTT) [16]. This computation can be easily implemented by combining automatic differentiation tools designed for NNs, such as PyTorch [22], with STLCG [20], a recently developed Python toolbox for computing the STL robustness using computation graphs.

V-B3 Weight Update

Lastly, we update all the parameters of the RNN controller by using the gradients obtained above. In this paper, we use the Adam optimizer [21].
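For reference, a single Adam update [21] on a flat list of parameters can be written as below. This is a plain-Python illustration of the update rule only; the default values of `lr`, `beta1`, `beta2`, and `eps` are the commonly used defaults and are not claimed to be the settings used in this paper.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter, m and v are the moment estimates."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias-corrected first moment
    v_hat = [vi / (1 - beta2 ** t) for vi in v]  # bias-corrected second moment
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

theta, m, v = [1.0], [0.0], [0.0]
theta, m, v = adam_step(theta, grad=[2.0], m=m, v=v, t=1)
print(theta)  # the first step moves each parameter by roughly lr against the gradient sign
```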

V-C Some comparisons between the proposed approach and the approach of learning the RNNs one by one

As an alternative, one could learn a set of RNN controllers independently, one for each STL specification. This one-by-one approach might eventually provide higher control performance than the proposed method, since it dedicates one RNN controller to each candidate STL specification (whereas the proposed method learns a single RNN model for all of them). However, the proposed approach has the potential to require much less memory than the one-by-one approach for the following reason. In the proposed method, the total number of parameters to be learned is $N_{\mathrm{in}}+N_{R1}+N_{R2}$ , where $N_{\mathrm{in}}$ , $N_{R1}$ , and $N_{R2}$ are the numbers of elements in $W_{\mathrm{in}}$ , $W_{1}$ , and $W_{2}$ , respectively. For instance, when we use the long short-term memory (LSTM) as the RNN model, $N_{R1}$ and $N_{R2}$ are given by $N_{R1}=4N_{h}N_{\mathrm{lstm}}(n+N)$ and $N_{R2}=4N_{h}N_{\mathrm{lstm}}m$ , where $N_{h}$ is the dimension of the hidden state of the RNN and $N_{\mathrm{lstm}}$ is the number of LSTM layers. The number of parameters for the STL embedding is $N_{\mathrm{in}}=MN$ . On the other hand, the number of parameters required when we train one controller for each candidate specification is $M(N^{\prime}_{R1}+N^{\prime}_{R2})$ , where $N^{\prime}_{R1}=4N_{\mathrm{lstm}}nN_{h}$ and $N^{\prime}_{R2}=N_{R2}=4N_{\mathrm{lstm}}mN_{h}$ are the numbers of parameters of each RNN model. From the above discussion, the number of parameters for the STL embedding $N_{\mathrm{in}}$ and that of the RNN models in the one-by-one case are both proportional to $M$ , with coefficients $N$ and $N^{\prime}_{R1}+N^{\prime}_{R2}$ , respectively.
In other words, the number of required parameters increases with the number of STL specifications in both approaches. However, we typically have $N\ll N^{\prime}_{R1}+N^{\prime}_{R2}$ (see footnote 2), i.e., the size of the vector representation of an STL specification is chosen much smaller than the number of RNN parameters, and thus the proposed approach requires fewer parameters than the one-by-one approach. For example, in the case study of Section VI, the total number of parameters needed to learn the controller by the proposed approach is about 24 times smaller than that of the one-by-one approach; for details, see Section VI.

Footnote 2: We argue that $N\ll N^{\prime}_{R1}+N^{\prime}_{R2}$ indeed holds for many control problems with STL tasks, since previous works on control synthesis with RNNs (e.g., [15]) use 2 LSTM layers with 32-dimensional hidden states, which leads to $N^{\prime}_{R1}+N^{\prime}_{R2}=1280$ even for a simple 2-D path planning problem. On the other hand, in the word2vec literature [17], the vector size $N$ for word embeddings is typically set between 50 and 200 for a very large corpus such as $780\times 10^{3}$ (in our case study of Section VI, we set $N=20$ , which is smaller than these typical values, since the total number of candidate STL specifications considered in the case study is $369\ll 780\times 10^{3}$ ).
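The parameter counts discussed above can be checked with a few lines of arithmetic. The sketch uses the counting formulas of this subsection with a factor of 4 in both LSTM terms (which reproduces the value 1280 quoted for 2 layers with 32-dimensional hidden states), together with the values $n=3$ , $m=2$ , $N=20$ , and $M=194$ taken from Section VI. It follows the paper's counting convention and is not meant to match the exact parameter count of any particular LSTM implementation.

```python
def lstm_weight_count(input_dim, output_dim, hidden_dim, num_layers):
    """Counting convention of this subsection: N_R1 + N_R2 with a factor 4 in both terms."""
    n_r1 = 4 * hidden_dim * num_layers * input_dim
    n_r2 = 4 * hidden_dim * num_layers * output_dim
    return n_r1 + n_r2

n, m = 3, 2            # state and control dimensions of the vehicle model
N, M = 20, 194         # embedding size and number of specifications used for the RNN
N_h, N_lstm = 32, 2    # hidden-state dimension and number of LSTM layers

proposed = M * N + lstm_weight_count(n + N, m, N_h, N_lstm)
one_by_one = M * lstm_weight_count(n, m, N_h, N_lstm)

print(proposed)               # 10280
print(one_by_one)             # 248320
print(one_by_one / proposed)  # roughly 24
```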

VI Case Study

We show the efficacy of the proposed method through an example of a path planning problem for a vehicle in 2D space. We used the PyTorch package [22] for the implementation of the RNN, IPOPT in CasADi [23] to solve the optimization problem (10), and STLCG [20] for the computation of the STL robustness. We used Windows 10 with a 2.80 GHz Core i7 CPU and 32 GB of RAM for all the experiments discussed in this section. Throughout this section, we consider the following nonlinear, discrete-time nonholonomic system: $q_{x,t+1}=q_{x,t}+v_{t}\sin{\theta_{t}}$ , $q_{y,t+1}=q_{y,t}+v_{t}\cos{\theta_{t}}$ , $\theta_{t+1}=\theta_{t}+\omega_{t}$ , where $q_{x,t}$ , $q_{y,t}$ represent the position of the vehicle, $\theta_{t}$ is the heading angle, and $v_{t}$ and $\omega_{t}$ are the velocity and angular velocity, respectively. The state $x_{t}$ and control input $u_{t}$ are defined as $x_{t}=[q_{x,t},q_{y,t},\theta_{t}]^{\top}$ and $u_{t}=[v_{t},\omega_{t}]^{\top}$ , respectively. As illustrated in Figure 4, we consider 4 regions in the 2D space: $\mathrm{Reg}\ 1=[3,5]\times[7,9]$ , $\mathrm{Reg}\ 2=[3,5]\times[3,5]$ , $\mathrm{Reg}\ 3=[7,9]\times[3,5]$ , $\mathrm{Reg}\ 4=[7,9]\times[7,9]$ , and the set of initial states $\mathcal{X}_{0}=[0,0.7]\times[0,0.7]$ (the blue-edged region). In addition, we consider 4 sub-regions in each region, which are labeled by indices as shown in Figure 4. In the following, we denote the $j$ -th sub-region $(j=1,2,3,4)$ in region $i$ $(i=1,2,3,4)$ as $\mathrm{Reg}\ (i,j)$ .
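The vehicle model above translates directly into a one-step update and a rollout. The sketch follows the equations exactly as stated in this section (sine in the $q_{x}$ update, cosine in the $q_{y}$ update); the function names `step` and `rollout` and the example control sequence are ours.

```python
import math

def step(state, control):
    """One step of the discrete-time vehicle model with state (q_x, q_y, theta)
    and control (v, omega)."""
    q_x, q_y, theta = state
    v, omega = control
    return (q_x + v * math.sin(theta), q_y + v * math.cos(theta), theta + omega)

def rollout(x0, controls):
    """Apply a sequence of controls from x0 and return the trajectory x_0, ..., x_T."""
    traj = [x0]
    for u in controls:
        traj.append(step(traj[-1], u))
    return traj

# With theta = 0 and omega = 0 the vehicle moves along the q_y axis.
traj = rollout((0.0, 0.0, 0.0), [(1.0, 0.0)] * 3)
print(traj[-1])  # (0.0, 3.0, 0.0)
```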

The STL candidate specifications are shown in Table I. Here, the specifications are considered for all $i,i_{1},i_{2},i_{3}\in\{1,\ldots,4\}$ ( $i_{1}>i_{2}$ , $i_{3}\neq 1$ , $i_{1}\neq i_{3}$ ) and $j,j_{1},j_{2},j_{3}\in\{1,\ldots,4\}$ . The total number of the STL candidate specifications is given by $369$ .
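The total of 369 can be reproduced by enumerating the index combinations. The sketch assumes, as our reading of Table I, that the constraint $i_{1}>i_{2}$ applies to templates (b) and (e), that $i_{3}\neq 1$ and $i_{1}\neq i_{3}$ apply to template (c), and that (f) is a single fixed specification.

```python
from itertools import product

regions = range(1, 5)      # i, i1, i2, i3
subregions = range(1, 5)   # j, j1, j2

pairs = list(product(regions, regions, subregions, subregions))

count_a = len(list(product(regions, subregions)))                  # template (a)
count_b = sum(1 for i1, i2, j1, j2 in pairs if i1 > i2)            # template (b)
count_c = sum(1 for i1, i3, j1, j2 in pairs if i3 != 1 and i1 != i3)  # template (c)
count_d = count_a                                                  # template (d)
count_e = count_b                                                  # template (e)
count_f = 1                                                        # template (f)

total = count_a + count_b + count_c + count_d + count_e + count_f
print(count_a, count_b, count_c, total)  # 16 96 144 369
```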

VI-A Training STL2vec

Now we train STL2vec based on the proposed approach presented in Section V-A.

TABLE I: STL candidate specifications.

Specifications
$(a)$ $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (i,j)$
$(b)$ $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (i_{1},j_{1})\lor\bm{F}_{[0,20]}\ \mathrm{Reg}\ (i_{2},j_{2})$
$(c)$ $\bm{F}_{[0,10]}\ \mathrm{Reg}\ (i_{1},j_{1})\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (i_{3},j_{2})$
$(d)$ $\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (i,j)$
$(e)$ $\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (i_{1},j_{1})\lor\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (i_{2},j_{2})$
$(f)$ $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (4,4){\land}\left(\lnot\mathrm{Reg}\ (4,4)\ \bm{U}_{[0,20]}\ \mathrm{Reg}\ (2,3)\right)$

TABLE II: The time required to obtain STL2vec (in sec)

Procedure	Run-time
Dataset generation (STL2vec)	1452
Training STL2vec	87

The parameters for the skip-gram model are $N=20$ and $P=2$ . $N_{\mathrm{ite}}$ in Algorithm 1 and the number of epochs for training the skip-gram model are set to 1 and 100, respectively. After the training, we evaluate the similarity between every pair of distinct vector representations $z_{\phi_{i}},z_{\phi_{j}}$ ( $i\neq j$ ) by the cosine similarity, which is defined as $(z_{\phi_{i}}^{\mathsf{T}}z_{\phi_{j}})/(\|z_{\phi_{i}}\|\|z_{\phi_{j}}\|)$ and takes its maximum value of $1$ if $z_{\phi_{i}}$ has the same orientation as $z_{\phi_{j}}$ . The time required to train STL2vec is summarized in Table II. In Table III, for some example specifications, we list the STL specifications with the largest to fourth-largest cosine similarity values.
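The cosine similarity and the ranking of nearest specifications used for Table III can be sketched as follows. The embedding vectors in the example are made up for illustration and are not the learned STL2vec vectors; `most_similar` is a hypothetical helper name.

```python
import math

def cosine_similarity(z_i, z_j):
    """Cosine similarity z_i^T z_j / (||z_i|| ||z_j||) of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(z_i, z_j))
    norm_i = math.sqrt(sum(a * a for a in z_i))
    norm_j = math.sqrt(sum(b * b for b in z_j))
    return dot / (norm_i * norm_j)

def most_similar(embeddings, query, k):
    """Return the k specifications most similar to `query`, with their similarities."""
    scores = [(cosine_similarity(embeddings[query], z), name)
              for name, z in embeddings.items() if name != query]
    scores.sort(reverse=True)
    return [(name, score) for score, name in scores[:k]]

# Illustrative 2-D embeddings (not learned values).
embeddings = {"phi_1": [1.0, 0.0], "phi_2": [2.0, 0.1], "phi_3": [0.0, 1.0]}
print(most_similar(embeddings, "phi_1", k=2))
```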

We summarize the characteristics observed in the obtained embeddings as follows: (I) Each specification in (a) of Table I is typically embedded close to the corresponding specification in (d). Moreover, each specification in (b) and (e) is embedded close to either $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (i_{1},j_{1})$ (and $\bm{F}_{[0,15]}\bm{G}_{[0,5]}\mathrm{Reg}\ (i_{1},j_{1})$ ) or $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (i_{2},j_{2})$ (and $\bm{F}_{[0,15]}\bm{G}_{[0,5]}\mathrm{Reg}\ (i_{2},j_{2})$ ). Typically, (b) and (e) are embedded close to the specification for the region that is closer to the initial region. For example, as we can partially see in Ex 1 of Table III, we have observed that almost all the specifications $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (2,2)\lor\bm{F}_{[0,20]}\ \mathrm{Reg}\ (i,j)$ and $\bm{F}_{[0,15]}\ \bm{G}_{[0,5]}\ \mathrm{Reg}\ (2,2)\lor\bm{F}_{[0,15]}\ \bm{G}_{[0,5]}\ \mathrm{Reg}\ (i,j)$ $(i\in\{1,3,4\},\ j\in\{1,2,3,4\})$ are embedded close to the specification $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (2,2)$ . This is an intuitive and desirable result for training the controller, since all of these specifications are satisfied by entering either $\mathrm{Reg}\ (i_{1},j_{1})$ or $\mathrm{Reg}\ (i_{2},j_{2})$ and staying there for more than 5 steps, which is a feasible behavior for all the regions in this example. (II) Each specification in (c) of Table I is basically embedded close to the specifications that are satisfied by a similar trajectory.
Specifically, properties like those in the following examples are observed for all the specifications in (c): (i) the specifications $\bm{F}_{[0,10]}\ \mathrm{Reg}\ (1,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,j)$ ( $j\in\{1,2,3,4\}$ ) are embedded close to each other (see Ex 2 of Table III); (ii) as we can see in Ex 3 of Table III, the specifications $\bm{F}_{[0,10]}\ \mathrm{Reg}\ (2,2)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,1)$ and $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (3,1)$ are mapped onto similar vectors (these specifications are satisfied by the same trajectory, since $\mathrm{Reg}(2,2)$ is on the way from the starting point to $\mathrm{Reg}(3,1)$ ). (III) The specification (f) of Table I is embedded close to $\bm{F}_{[0,10]}\ \mathrm{Reg}\ (2,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (4,4)$ , as we can see from Ex 4 in Table III. This is also a desirable result, since the specification (f) requires first reaching $\mathrm{Reg}\ (2,3)$ and then reaching $\mathrm{Reg}\ (4,4)$ .

TABLE III: Cosine similarities of vector representations of the STL candidate specifications.

Ex 1	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (2,2)$	$\mathrm{sim}$	Ex 2	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (1,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,1)$	$\mathrm{sim}$
$1$	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (3,3)\lor\bm{F}_{[0,20]}\ \mathrm{Reg}\ (2,2)$	0.99	$1$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (1,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,4)$	0.99
$2$	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (2,2)\lor\bm{F}_{[0,20]}\ \mathrm{Reg}\ (1,1)$	0.99	$2$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (1,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,3)$	0.99
$3$	$\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (4,4)\lor\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (2,2)$	0.99	$3$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (1,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,2)$	0.95
$4$	$\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (2,2)$	0.99	$4$	$\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (3,3)\lor\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (1,3)$	0.76
Ex 3	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (2,2)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,1)$	$\mathrm{sim}$	Ex 4	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (4,4){\land}\left(\lnot\mathrm{Reg}\ (4,4)\ \bm{U}_{[0,20]}\ \mathrm{Reg}\ (2,3)\right)$	$\mathrm{sim}$
$1$	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (3,1)$	0.96	$1$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (2,3)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (4,4)$	0.81
$2$	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (3,1)\lor\bm{F}_{[0,20]}\ \mathrm{Reg}\ (1,1)$	0.95	$2$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (4,4)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,3)$	0.73
$3$	$\bm{F}_{[0,20]}\ \mathrm{Reg}\ (4,4)\lor\bm{F}_{[0,20]}\ \mathrm{Reg}\ (3,1)$	0.95	$3$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (4,4)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,1)$	0.71
$4$	$\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (4,4)\lor\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (3,1)$	0.93	$4$	$\bm{F}_{[0,10]}\ \mathrm{Reg}\ (4,4)\land\bm{F}_{[11,20]}\ \mathrm{Reg}\ (3,4)$	0.71

VI-B Training RNN

Next, we evaluate the control performance of the proposed method. In this experiment, we train the parameters of the RNN model for the specifications (a), (b), (c), (f), and $\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (1,1)\lor\bm{F}_{[0,15]}\bm{G}_{[0,5]}\ \mathrm{Reg}\ (3,2)$ from (e) in Table I. As for the specifications in (c), we only consider those with $(i_{1},i_{3})=(1,3),(2,1),(2,3),(2,4),(4,3)$ . Thus, the total number of specifications is 194. We retrain the STL embeddings (20-dimensional) with these specifications and use them as the input to the RNN. As in the previous work [15], all of the RNN models used in this example consist of 2 LSTM layers with 32-dimensional hidden states.

TABLE IV: Some alternative approaches.

Approaches
(A1) Integer encoding scheme
(A2) One-hot encoding scheme
(A3) Training controllers one-by-one

As summarized in Table IV, we consider the following 3 alternative approaches for comparison with the proposed method. (A1) Integer encoding scheme: we incrementally assign an integer to each STL specification (for example, $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (1,1)$ is assigned $1$ , $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (1,2)$ is assigned $2$ , etc.) and use it as the input to the RNN (instead of the vectors generated by STL2vec). (A2) One-hot encoding scheme: we generate and assign 194-dimensional one-hot vectors to all the 194 specifications (for example, $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (1,1)$ is assigned $[1,0,\dots,0]$ , $\bm{F}_{[0,20]}\ \mathrm{Reg}\ (1,2)$ is assigned $[0,1,\dots,0]$ , etc.). (A3) Training one-by-one: we prepare 194 RNN models and train one model for each specification. For the approaches (A1) and (A2), we used the same training procedure as in Section V-B, with integers or one-hot vectors in place of the vectors obtained by STL2vec. In the proposed approach as well as in (A1) and (A2), we set the batch size and the number of initial states sampled for training in each iteration ( $N_{b}$ and $L$ in Section V-B) to 8 and 3, respectively. In (A3), for each epoch we generate initial states $x^{j}_{0}$ ( $j=1,\ldots,L=3$ ) randomly from $\mathcal{X}_{0}$ and update the RNN parameters assigned to each STL specification $\phi_{i}\in\Phi$ ( $i=1,\ldots,M$ ) one by one via forward/backward computation (similarly to the procedure of Section V-B) with the following loss: $-\frac{1}{L}\sum_{j=1}^{L}\rho^{\phi_{i}}\left(x_{0:T}^{j}\right)$ .

TABLE V: Run time needed to reach several values of the average robustness (in sec).

Approach \ Average robustness	0.1	0.15	0.2	0.22
Proposed method	3494	4054	6722	14668
Learning one-by-one	5686	7663	10336	12168
One-hot encoding	27238	33863	–	–
Integer encoding	–	–	–	–

Fig. 5 shows the average robustness values for all the control schemes, and Table V summarizes the actual time (in sec) required to reach the average robustness values 0.1, 0.15, 0.2, and 0.22 for the proposed and alternative methods (the symbol “–” indicates that the corresponding average robustness value has not been achieved). Furthermore, some of the trajectories obtained by applying the RNN controllers trained by the proposed method and by approach (A3) are plotted in Fig. 6 (both controllers were trained for 1100 epochs, and the trajectories are plotted for 10 initial states newly sampled from $\mathcal{X}_{0}$ ). The robustness values are collected by testing the controllers once every 10 epochs with 30 initial states newly sampled from the initial region $\mathcal{X}_{0}$ . The values in Fig. 5 for the one-by-one scheme (A3) are obtained by taking the mean of the average robustness values over all the 194 RNN controllers at the same epoch. We further note that the actual time required to run one epoch differs among the approaches. The average times required to run one epoch of the proposed approach and the approaches (A1)-(A3) were 14.2, 14.0, 32.2, and 14.0 [s], respectively (note that the time for approach (A3) is the total time required to update the parameters of all the RNN models). The time required for the approach (A2) is larger than that of the other methods because of the large input dimension (the dimension of the one-hot vector is 194). On the other hand, we have confirmed that the time required to update the parameters does not increase much for the proposed method when $N=20$ , as shown above. As we can see from Fig. 5 and Table V, the average robustness value of the proposed method improves faster at the beginning of the training procedure than that of the other methods in terms of both the number of epochs and the actual time.
In particular, from Table V, we can confirm that the average robustness value of the proposed method reaches 0.1, 0.15, and 0.2 faster than the other approaches. Moreover, as mentioned in Subsection V-C, the proposed method substantially reduces the memory consumption compared with the scheme (A3). In Table VI, we summarize the total number of parameters required for each approach in this example.

However, the average robustness value of the proposed method is slightly overtaken by that of the approach (A3) at around 500 epochs, and the average robustness value of the approach (A3) reaches 0.22 faster than that of the proposed approach. Within 1100 epochs, the maximum average robustness values of the proposed method and approach (A3) were $0.233$ and $0.235$ , respectively. A possible reason why the proposed method is overtaken by the approach (A3) is the effect of the other specifications: since only one RNN controller is trained for many specifications in the proposed method, the control performance for a specification may be affected by the other specifications, depending on the obtained embedding. Such an effect is observed in (b) of Fig. 6. The resulting trajectories converge to ones that are relatively far from the middle of $\mathrm{Reg}(3,2)$ , which leads to low robustness although the specification itself is satisfied. Investigating how to obtain better embeddings that remove such behavior is left for future work.

TABLE VI: Total number of parameters in each method.

	Number of parameters
Proposed method	10280
approach (A1)	1280
approach (A2)	50432
approach (A3)	248320

VII Conclusion and Future Work

We proposed a method for mapping STL specifications onto a vector space (STL2vec) based on the word2vec technique. To obtain STL embeddings that capture the similarities between specifications in terms of the control policy, we provided a method for constructing the dataset by solving the robustness maximization problem for all the candidate specifications. Then, we trained an RNN controller whose inputs are the state trajectory and a vector generated by STL2vec, so that multiple STL specifications can be handled with one RNN model. The simulation example demonstrates the efficacy of the proposed method in terms of memory consumption and the time required for training.

In this paper, it is assumed that the chosen STL specification is fixed and not allowed to change during control execution. Hence, future work should investigate the case where the STL specification changes during control execution, so as to make the RNN controller more flexible.

Acknowledgement

This work is supported by JST CREST JPMJCR2012.

References

[1] A. Pnueli, “The temporal logic of programs,” 18th Annual Symposium on Foundations of Computer Science (SFCS), pp. 46–57, 1977.
[2] R. Koymans, “Specifying real-time properties with metric temporal logic,” Real-time systems, vol. 2, no. 4, pp. 255–299, 1990.
[3] O. Maler and D. Nickovic, “Monitoring temporal properties of continuous signals,” in Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems. Springer, 2004, pp. 152–166.
[4] V. Raman, A. Donze, M. Maasoumy, R. M. Murray, A. Sangiovanni-Vincentelli, and S. A. Seshia, “Model predictive control with signal temporal logic specifications,” in 53rd IEEE Conference on Decision and Control, Dec 2014, pp. 81–87.
[5] S. Sadraddini and C. Belta, “Robust temporal logic model predictive control,” in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2015, pp. 772–779.
[6] N. Mehr, D. Sadigh, R. Horowitz, S. S. Sastry, and S. A. Seshia, “Stochastic predictive freeway ramp metering from signal temporal logic specifications,” in 2017 American Control Conference (ACC), 2017, pp. 4884–4889.
[7] Y. V. Pant, H. Abbas, and R. Mangharam, “Smooth operator: Control using the smooth robustness of temporal logic,” in IEEE Conference on Control Technology and Applications (CCTA), 2017, pp. 1235–1240.
[8] X. Li, Y. Ma, and C. Belta, “A policy search method for temporal logic specified reinforcement learning tasks,” in Annual American Control Conference (ACC). IEEE, 2018, pp. 240–245.
[9] N. Mehdipour, C. Vasile, and C. Belta, “Arithmetic-geometric mean robustness for control from signal temporal logic specifications,” in 2019 American Control Conference (ACC), 2019, pp. 1690–1695.
[10] I. Haghighi, N. Mehdipour, E. Bartocci, and C. Belta, “Control from signal temporal logic specifications with smooth cumulative quantitative semantics,” in 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 4361–4366.
[11] S. Yaghoubi and G. Fainekos, “Worst-case satisfaction of stl specifications using feedforward neural network controllers: a lagrange multipliers approach,” in 2020 Information Theory and Applications Workshop (ITA). IEEE, 2020, pp. 1–20.
[12] A. Balakrishnan and J. V. Deshmukh, “Structured reward shaping using signal temporal logic specifications,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 3481–3486.
[13] A. Gopinath Puranic, J. V. Deshmukh, and S. Nikolaidis, “Learning from demonstrations using signal temporal logic in stochastic and continuous domains,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 6250–6257, 2021.
[14] W. Liu, N. Mehdipour, and C. Belta, “Recurrent neural network controllers for signal temporal logic specifications subject to safety constraints,” IEEE Control Systems Letters, 2021.
[15] W. Liu and C. Belta, “Model-Based Safe Policy Search from Signal Temporal Logic Specifications Using Recurrent Neural Networks,” arXiv preprint arXiv:2103.15938, 2021.
[16] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[17] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[18] X. Rong, “word2vec Parameter Learning Explained,” arXiv:1411.2738, 2016.
[19] L. Xiao, Z. Serlin, G. Yang, and C. Belta, “A formal methods approach to interpretable reinforcement learning for robotic planning,” Science Robotics, vol. 4, no. 37, 2019.
[20] K. Leung, N. Arechiga, and M. Pavone, “Back-propagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods,” arXiv preprint arXiv:2008.00097, 2020.
[21] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[22] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
[23] https://web.casadi.org