Spatio-Temporal Trajectory Similarity Measures: A Comprehensive Survey and Quantitative Study

Danlei Hu, Lu Chen, Hanxi Fang, Ziquan Fang, Tianyi Li, Yunjun Gao D. Hu, L. Chen (Corresponding Author), Z. Fang and Y. Gao are with the College of Computer Science, Zhejiang University, Hangzhou 310027, China, E-mail:{dlhu, luchen, zqfang, gaoyj}@zju.edu.cn. H. Fang is with the School of Earth Science, Zhejiang University, Hangzhou 310027, China, E-mail:[email protected]. T. Li is with the Department of Computer Science, Aalborg University, Denmark, E-mail:[email protected].

Abstract

Spatio-temporal trajectory analytics is at the core of smart mobility solutions, which offers unprecedented information for diversified applications such as urban planning, infrastructure development, and vehicular networks. Trajectory similarity measure, which aims to evaluate the distance between two trajectories, is a fundamental functionality of trajectory analytics. In this paper, we propose a comprehensive survey that investigates all the most common and representative spatio-temporal trajectory measures. First, we provide an overview of spatio-temporal trajectory measures in terms of three hierarchical perspectives: Non-learning vs. Learning, Free Space vs. Road Network, and Standalone vs. Distributed. Next, we present an evaluation benchmark by designing five real-world transformation scenarios. Based on this benchmark, extensive experiments are conducted to study the effectiveness, robustness, efficiency, and scalability of each measure, which offers guidelines for trajectory measure selection among multiple techniques and applications such as trajectory data mining, deep learning, and distributed processing.

Index Terms:

Trajectory Similarity Measure, Distributed Similarity Search, Deep Representation Learning, Experimental Evaluation

1 Introduction

With the proliferation of GPS-equipped devices and mobile computing services, massive spatio-temporal trajectory data of moving objects such as people, vessels, and vehicles are being captured [87, 39]. For example, people share visited places (e.g., POIs) on social networks by their smartphones to generate “check-in” trajectories; according to the AIS project [47], millions of vessels connected with the AIS services continuously report locations to ensure the sailing safety; and the world’s largest ridesharing company Uber collects up to 17 million vehicle trips daily [63].

Trajectory data with its analytics benefit a broad range of real-life applications across different fields such as urban computing [38], transportation [88], behavior study [37], and public security [34], to name but a few. A fundamental functionality of most trajectory analysis is to evaluate the relationship/distance between two trajectories, i.e., trajectory similarity measurement. With an accurate and efficient trajectory similarity measure, downstream trajectory analytics involving retrieval [58, 50], clustering [32, 18], classification [22, 31], and mobility pattern mining [44, 28, 60] tasks can be well-supported to serve upper applications. For instance, Xie et al. [71] propose two distance measures to support similarity queries and joins in a large-scale trajectory dataset. Wang et al. [67] design a road network oriented distance measure for vehicle trajectory similarity computation, based on which, $k$ -means clustering is studied [69] to detect hot/popular traveling paths in a city. In both cases, the trajectory similarity measure plays a fundamental role, and a different measure selection may result in totally different query results and clustering quality. Take the trajectory clustering as an example, clustering aims to group similar trajectories into clusters, where similarity computation is a fundamental task of clustering as shown in Figure 1(a); while different similarity measures may affect the clustering quality. As shown in Figure 1, although the numbers of clusters are the same (i.e., $\#$ clusters=4), the clustering results are quite different when applying different measures (i.e., DTW and EDR). Hence, the selection of similarity measure is important. In addition, more trajectory similarity based analysis tasks can refer to [66, 39].

Refer to caption — Figure 1: Example of Trajectory Clustering

TABLE I: Spatio-Temporal Trajectory Similarity Measures. (Here,

m

and

n

represent the lengths of two trajectories,

d

denotes the dimension of embedded vector, and the dark backdrop indicates the studied scope in the state-of-the-art survey [57])

Category			Measure	Complexity	Metric	Unequal Length	Parameter Free	Noise Sensitive	Repre- sentation
			ED	O(n)	✓	$\times$	$\times$	$\times$	Point
			DTW [79]	O(mn)	$\times$	✓	✓	$\times$	Point
			LCSS [64]	O(mn)	$\times$	✓	$\times$	✓	Point
			EDR [13]	O(mn)	$\times$	✓	$\times$	✓	Point
			EDwP [49]	O(mn)	$\times$	✓	✓	✓	Segment
			ERP[12]	O(mn)	✓	✓	$\times$	$\times$	Point
			Hausdorff [4]	O(mn)	✓	✓	✓	$\times$	Point
			Frechet [2]	O(mn)	$\times$	✓	✓	$\times$	Point
			LIP [46]	O((m+n)log(m+n))	$\times$	✓	✓	$\times$	Segment
			OWD [21]	O(mn)	$\times$	✓	$\times$	$\times$	Segment
		Standalone	Seg-Frechet [71]	O(mn)	$\times$	✓	✓	$\times$	Segment
			DFT [71]	O(mn)	$\times$	✓	✓	$\times$	Segment
			DITA [55]	O(mn)	$\times$	✓	✓	$\times$	Point
	Free Space	Distributed	REPOSE [85]	O(mn)	$\times$	✓	✓	$\times$	Point
			NetDTW [72]	O(mn)	$\times$	✓	✓	$\times$	Point
			NetLCSS [19]	O(mn)	$\times$	✓	$\times$	✓	Point
			LORS [67]	O(mn)	$\times$	✓	✓	$\times$	Segment
			TP [54]	O(mn)	$\times$	✓	$\times$	$\times$	Point
			NetERP [30]	O(mn)	✓	✓	$\times$	$\times$	Point
			NetEDR [30]	O(mn)	$\times$	✓	$\times$	✓	Point
		Standalone	LCRS [80]	O(mn)	$\times$	✓	✓	$\times$	Segment
Non- learning	Road Network	Distributed	DISON [80]	O(mn)	$\times$	✓	$\times$	$\times$	Segment
			NEUTRAJ [76]	O(m+n)	/	✓	$\times$	✓	Vector
	Free Space	Standalone	Traj2SimVec [84]	O(m+n)	/	✓	$\times$	✓	Vector
			GTS [26]	O(d)	/	✓	$\times$	$\times$	Vector
Learning	Road Network	Standalone	ST2Vec [19]	O(d)	/	✓	$\times$	$\times$	Vector

Unlike isolated spatial points or one-dimensional time series where the distance definition is straightforward, it is non-trivial to define the distance between continuous two-dimensional trajectories. It also needs to consider following four trajectory characteristics: (i) Different data sources, i.e., free space vs. road network space. In the latter case, a proper trajectory measure should take the road topology into account, as people and vehicles cannot travel like vessels without spatial constraints [54]. (ii) Various sampling rates and lengths. Unlike time series that generally feature constant and high sampling rates [36], spatio-temporal trajectory data is usually generated via varying samplings, resulting in variable lengths. (iii) The effect of noise. The noise points commonly exist, especially due to strength attenuation and interference in urban cities [83]. (iv) Complex shapes. Compared to private-car trajectories that are usually inaccessible caused by privacy principles, taxi trajectories are widely studied in the community [15, 61, 68]. However, taxi trajectories exhibit much more diverse, complex, and flexible geometric, because of various pick-pop demands. To deal with the above spatio-temporal characteristics, tremendous amounts of research efforts are devoted to designing dozens of spatio-temporal trajectory similarity measures.

Being faced with a huge amount of trajectory measures, researchers are often too exhausted to select a proper one. On the one hand, there are too many trajectory measures, which were proposed under different scenarios, e.g., learning based or non-learning based, free-space oriented or road network oriented, as well as standalone or distributed processing. In each scenario, various measures exist. Consequently, users need to spend tons of time and efforts to explore the specific details of each measure and the relations/differences among them. On the other hand, the evaluation on various trajectory measures is still not well organized. For instance, some measures only focus on efficiency, while others may put more attention on effectiveness and robustness. To address the problems mentioned above, a comprehensive survey, benchmark, and evaluation will be a great help for researchers involved in this important topic. Specifically, considering three-dimensional aspects, we classify the existing representative spatio-temporal trajectory measures proposed from 1995 to 2022 year in a hierarchical way, i.e., Non-learning vs. Learning (first hierarchy), Free Space vs. Road Network (second hierarchy), and Standalone vs. Distributed (third hierarchy). Table I summarizes trajectory similarity measures.

Although previous systematic surveys have made some efforts, they mostly focus on non-learning, free-space, or standalone based measures and thus significantly narrow the studied scope of trajectory similarity community. For example, Gudmundsson and Toohey et al. [62] only review four basic trajectory distance measures including Euclidean distance (ED), DTW [79], LCSS [64], and Fréchet [2]. Based on it, Sousa et al. [15] append partial vehicle trajectory similarity measures, e.g., LORS [67], LCRS [80], and NetEDR [30]. Note that, experimental evaluation of these measures are not studied in the surveys. Su et al. [57] conduct the state-of-the-art survey and experimental evaluation for 10 trajectory similarity measures under the non-learning, free-space, and standalone contexts. In contrast, we conduct a much more systematic review and evaluation of all the most common and representative spatio-temporal trajectory measures in a three-dimensional hierarchical way as depicted in Table I. We conduct a three-dimensional survey due to three following motivations:

(i) As deep learning has made great success in AI community, many studies [36, 76, 84, 26, 19] start leveraging the powerful approximation capabilities of neural networks to attempt to replace the traditional handcrafted trajectory measures with learning-based models. Besides, increasing efforts have been devoted to embracing learning-based techniques as an integral part of trajectory data management and analytics, such as deep clustering [20], deep mobility pattern mining [9], and deep path recommendation [81], to name just a few. As such, we believe that we have reached an imperative point to systematically study the contribution and explore the relations, differences, and pros/cons among the emerging learning-based and classic non-learning based trajectory measures.

(ii) In the early stage, most of the trajectory measures [57] are proposed for objects that move freely in the Euclidean space, e.g., bird or vessel trajectories. Recently, the proliferation of vehicle navigation systems and location based services (LBSs) enable the massive collection of vehicle and people trajectories in road networks. In that case, the free-space oriented trajectory measures cannot reflect the true distance between moving objects in a moving-constrained road network. In view of this, many network-aware trajectory measures [80, 67, 54, 30] are designed. To the best of our knowledge, there is no previous work that has given a systematic review and experimental evaluation of those network-based trajectory distance measures.

(iii) Since the storage capacity and processing ability of a single machine can no longer support a large scale of trajectory data, another popular line of trajectory similarity study is designing efficient and scalable frameworks upon distributed processing platforms (e.g., Spark) for large-scale trajectory similarity analytics. Towards this, several system-level frameworks are also developed. Hence, we are inspired to present a sufficient review and performance evaluation of distributed-based similarity computation studies.

Motivated by the observations above, we conduct a most unprecedented survey on all the most representative spatio-temporal trajectory similarity measures proposed in the literature. Specifically, we review 25 similarity measures from the following hierarchical perspectives, namely: (i) Model architecture, i.e., non-learning based vs. learning-based; (ii) Space context, i.e., free space oriented vs. road network oriented; and (iii) Computational mechanism, i.e., standalone machine vs. distributed platform. To the best of our knowledge, none of the previous studies have reached such a wide scope of trajectory measure evaluation. It is also significant to conduct objective and sufficient evaluation to study each measure by using the same tasks, datasets, and experimental settings. In light of this, we provide a standard benchmark¹¹1publicly available at https://github.com/ZJU-DAILY/TSM in a real setting by introducing five different scenarios, i.e., length shift, shape shift, noise shift, sampling shift, and cardinality shift, on four real datasets. Then, we extensively explore and compare the effectiveness, robustness, efficiency, and scalability for all measures via quantitative and qualitative analysis. With our benchmark, (i) users can quickly grasp the technical details about trajectory similarity measures; and (ii) users can easily choose or design suitable measures and utilize them as the baselines. Overall, this paper makes the contributions below.

•

We conduct a concise but concrete review to evaluate and compare spatio-temporal trajectory similarity measures qualitatively and quantitatively in three dimensions, i.e., learning-based or not, road network oriented or not, and distributed-based or not, providing a reference for measure selection among traditional similarity measures, deep learning models, and distributed processing technologies.
•

We provide a standard evaluation benchmark with five types of trajectory transformations under five typical trajectory analytics scenarios. Based on this, we conduct a comprehensive evaluation of effectiveness, robustness, efficiency, and scalability performance of 25 representative measures.
•

We have several key observations according to experimental results, based on which, we offer insights about trajectory measure selection in specific application scenarios. According to several issues that remained to be solved, we also present detailed potential future directions.

Section 2 gives the preliminaries. Sections 3 and 4 present the non-learning-based and learning-based trajectory similarity measures respectively, where free-space vs. network oriented and standalone vs. distributed measures are further detailed. Section 5 provides the benchmark. The evaluation results with insights are reported in Section 6. Section 7 puts forward interesting future directions. Finally, Section 8 concludes the paper.

2 Preliminaries

In this section, we present the definitions related to trajectory similarity measure.

Definition 1.

(Trajectory) A trajectory $T$ is an ordered sequence that consists of GPS sampling points, i.e., $T=\langle p_{1},p_{2},...,p_{n}\rangle$ , where $p_{i}\left(1\leq i\leq n\right)$ is an observed GPS location in d-dimensional tuple to describe the mobility of $T$ .

Here, $n$ denotes the length of a trajectory. We use $T_{i}$ to denote a trajectory whose id is $i$ , and use ${T_{i}^{j}}$ to denote the $j$ -th point in $T_{i}$ . The points observed by GPS equipments could contain various information. For simplicity, we assume each sampling point is two-dimension or three-dimension (i.e., $(\textit{latitude, longitude})$ or $(\textit{latitude, longitude, timestamp})$ ). Although trajectory data is similar to time series data, time series data typically lacks of spatial information. In particular, for vehicle trajectories, they are physically constrained to the road network. Thus, directly applying similarity measures for time series in free space is not always suitable. The trajectories with road network constraints are taken into consideration. The road network is defined as follows.

Definition 2.

(Road Network) A road network is represented as a directed graph $G=(V,E)$ , where $V$ is a set of road intersections (i.e., road vertices) in the road network, and $E$ is a set of edges with direction of road segments.

Here, $v_{i}=(lat_{i},lon_{i})\in V$ is a road intersection, where $i$ is the id of $v_{i}$ , while $(lat_{i},lon_{i})$ denotes the latitude and longitude of $v_{i}$ . Meanwhile, an edge $e_{i,j}\in E$ represents a directed road segment from $v_{i}$ to $v_{j}$ . Existing map matching methods [7, 82] are proposed to transform a trajectory of GPS points into a road network constrained trajectory. A trajectory in a road network can be represented as $T=\left\langle v_{1},v_{2},...,v_{c}\right\rangle(c\leq n)$ or $T=\left\langle e_{1},e_{2},...,e_{c-1}\right\rangle(c\leq n)$ , where $c$ is the number of road vertices in $T$ .

Definition 3.

(Similarity Measure) A similarity measure is a function $f(T_{i},T_{j})$ to measure the distance between $T_{i}$ and $T_{j}$ .

Definition 4.

(Metric Measure) Given a similarity measure f and any three trajectories $T_{i}$ , $T_{j}$ , and $T_{a}$ , we call f as a metric measure if f satisfies the following conditions: (i) Uniqueness: $f(T_{i},T_{j})=0$ $\leftrightarrow T_{i}=T_{j}$ ; (ii) Nonnegativity: $f(T_{i},T_{j})\geq 0$ ; (iii) Triangle Inequality: $f(T_{i},T_{a})\leq f(T_{i},T_{j})+f(T_{j},T_{a})$ ; and (iv) Symmetry: $f(T_{i},T_{j})=f(T_{j},T_{i})$ .

As shown in Table I, a similarity measure can be classified according to i) whether it is metric or not as defined in Definition 4 (“ $\checkmark$ ” denotes it is a metric, “ $\times$ ” denotes it is not a metric, “ $/$ ” denotes learning methods do not use a manual function to compute similarity), ii) whether it can support trajectories with different lengths, iii) whether it is parameter-free, and iv) whether it is noise sensitive. Since trajectory similarity query is a prerequisite task in upper-level applications [71, 24], we perform Top- $k$ trajectory similarity query to evaluate the capabilities of all similarity measures in Section 6, which is defined below.

Definition 5.

(Top- $k$ Similarity Query) Given a query trajectory $QT$ , a set of trajectories $\mathcal{T}=\{T_{1},T_{2},...,T_{|\mathcal{T}|}\}$ , and a similarity measure $f(\cdot)$ , the Top- $k$ similarity query returns $k$ trajectories in $\mathcal{T}$ that are most similar to $QT$ $(\mathcal{S}=\{\mathcal{S}\subseteq T\wedge|\mathcal{S}|=k\wedge\forall T_{s}\in\mathcal{S},\forall T_{o}\in\mathcal{T}-\mathcal{S}(f(QT,T_{s})\text{ }|=f(QT,T_{o}))\})$ . Here, “ $|=$ ” means “ $\geq$ ” for LCSS while denotes “ $\leq$ ” for others.

3 Non-learning based Measures

In this section, we introduce 22 non-learning based measures (as shown in the upper part of Table I), and further detail each measure in terms of space context (i.e., free space and road network).

3.1 Trajectory Similarity in Free Space

We review 14 free space oriented measures. Each measure is point-based or segment-based. Here, we provide an running example in Figure 2, in order to illustrate how to compute the similarity between two trajectories $Q_{1}$ and $Q_{2}$ using different measures, where the distance matrix is used to show the distances between points of two trajectories.

3.1.1 Point-based Measures

Point-based measures are widely utilized to compute the similarity among trajectories consisting of sampling points.

Euclidean Distance (ED). Euclidean distance is a well-known measure, which is used to calculate the distance between time series with the same length [29]. When applied to trajectory data, sampling points in two series are aligned in order [53]. Formally, ED between two trajectories $T_{1}$ and $T_{2}$ is defined as:

	$\displaystyle dis(p_{1},p_{2})=$	$\displaystyle\sqrt{(p_{1}.lat-p_{2}.lat)^{2}+(p_{1}.lon-p_{2}.lon)^{2}}$		(1)
		$\displaystyle\operatorname{ED}(T_{1},T_{2})=\frac{\sum^{N}_{i=1}dis(T_{1}^{i},T_{2}^{i})}{n}$		(2)

Here, $dis(p_{1},p_{2})$ denotes the distance between two points $p_{1}$ and $p_{2}$ , $lat$ and $lon$ denote the latitude and longitude of the point, while $n$ denotes the length of $T_{1}$ and $T_{2}$ . ED is a metric and parameter-free measure with the time complexity of O(n). However, it assumes that trajectories are with the same length, which is not realistic in real life. Thus, we do not focus on evaluating ED. Note that, in Figure 2, $Q_{1}$ and $Q_{2}$ are with different lengths, and thus, ED cannot be applied to compute the similarity in this running example.

Dynamic Time Warping Distance (DTW). Similar to ED, DTW is also a measure designed for time series and can be applied to trajectory data [23, 43]. However, different from ED, DTW can align a point of one trajectory to one or more consecutive points of another trajectory. The time complexity of DTW is O(mn) where $m$ and $n$ denote the length of two comparing trajectories respectively. Although DTW is non-metric, it is most commonly used for trajectory similarity computation. Various techniques are proposed to improve the efficiency of DTW [29, 52]. DTW distance between two trajectories $T_{1}$ and $T_{2}$ is defined as:

\operatorname{DTW}(T_{1},\!T_{2})\!=\!\begin{cases}0&\text{if}\hskip 2.84526ptm=n=0\\ \infty&\text{if}\hskip 2.84526pt{m=0}\\ &\text{or $n=0$}\\ {dis}\left(T_{1}^{m},T_{2}^{n}\right)+\\ \min\begin{cases}\!\operatorname{DTW}\left(Head(T_{1}),T_{2}\right),\\ \operatorname{DTW}\left(T_{1},Head\left(T_{2}\right)\right),\\ \operatorname{DTW}\left(Head(T_{1}),Head(T_{2})\right)\end{cases}&\text{otherwise}\end{cases}

(3)

where $Head(\cdot)$ is to get the head sampling points except the last one point (i.e., $Head(T)=\langle p_{1},p_{2},...,p_{n-1}\rangle$ ). Take $Q_{1}$ and $Q_{2}$ in Figure 2 as an example, we are able to compute the DTW distance matrix in Figure 3(a) and obtain DTW( $Q_{1}$ , $Q_{2}$ ) = 16.52.

Longest Common Subsequence (LCSS). LCSS distance [64, 51] between two trajectories $T_{1}$ and $T_{2}$ is defined as the size of the longest common subsequence of $T_{1}$ and $T_{2}$ . Given a threshold parameter $\epsilon$ to determine whether a pair of points $(p_{1},p_{2})$ matches, if $dis(p_{1},p_{2})\leq\epsilon$ , $(p_{1},p_{2})$ is a match pair. As shown in Figure 3(b), when $\epsilon=1.0$ , the pair $(Q_{1}^{0},Q_{2}^{1})$ is a match pair, as $dis(Q_{1}^{0},Q_{2}^{1})\leq\epsilon$ . Actually, LCSS distance finds all match pairs in two trajectories, and it is non-metric but noise sensitive compared to ED. Moreover, its time complexity is O(mn). Formally, LCSS distance is defined as:

\operatorname{LCSS}_{\epsilon}(T_{1},T_{2})=\begin{cases}0&\text{ if }m=0\\ &or\quad n=0\\ 1+LCSS_{\epsilon}(Head(T_{m}),Head(T_{n}))&\text{ if }dis(T_{1}^{m},T_{2}^{n})\leq\epsilon\\ \max\begin{cases}\operatorname{LCSS}_{\epsilon}\left(Head(T_{1}),T_{2}\right),\\ \operatorname{LCSS}_{\epsilon}\left(T_{1},Head(T_{2})\right)\end{cases}&\text{ otherwise }\end{cases}

(4)

As shown in Figure 3(b), when $\epsilon$ = 1.0, there is only one matched pair (i.e., ( $Q_{1}^{0},Q_{2}^{1}$ )). Thus, LCSS( $Q_{1}$ , $Q_{2}$ ) = 1.

Edit Distance on Real Sequence (EDR). Similar to LCSS, EDR distance [13, 1, 16, 32] is also one of the edit distance-based measures with the time complexity of O(mn), where edit distance equals to the number of necessary operations (i.e., insertion, deletion and replacement) when transforming one trajectory into another one. However, different from LCSS, EDR sets a parameter $cost$ to measure the cost of an unmatched pair of points. Formally, EDR is defined as:

\operatorname{EDR}(T_{1},\!T_{2})=\begin{cases}n&\text{ if }m=0\\ m&\text{ if }n=0\\ \min\begin{cases}\operatorname{EDR}\left(Head(T_{1}),T_{2}\right)+1,\\ \operatorname{EDR}\left(T_{1},Head(T_{2})\right)+1,\\ \operatorname{EDR}\left(Head(T_{1}),Head(T_{2})\right)+cost\end{cases}&\text{otherwise}\end{cases}

(5)

Here, if $dis(T_{1}^{m},T_{2}^{n})\leq\epsilon$ , then $cost=0$ ; otherwise $cost=1$ . As depicted in Figure 3(c), if $\epsilon$ = 1.0, there are 1 matched point and 5 unmatched point in $Q_{2}$ , denoting that 5 necessary operations to transform $Q_{1}$ into $Q_{2}$ are needed. Thus, EDR( $Q_{1}$ , $Q_{2}$ ) = 5.

Edit Distance with Real Penalty (ERP). The above edit distance-based measures are basically non-metric. However, ERP [12, 14, 90] is a metric measure that can be used for indexing and pruning. Different from other edit distance-based measures such as LCSS and EDR, ERP does not require a threshold parameter, but sets a reference point (i.e., gap point) for measuring. ERP is commonly used in trajectory similarity computation with the time complexity of O(mn) because of its metricity. Formally, ERP distance between two trajectories $T_{1}$ and $T_{2}$ is defined as:

\operatorname{ERP}(T_{1},T_{2})=\begin{cases}\sum_{i=1}^{n}dis\left(T_{2}^{i},r\right)&\text{ if }{m=0}\\ \sum_{i=1}^{m}dis\left(T_{1}^{i},r\right)&\text{ if }{n=0}\\ \min\begin{cases}\operatorname{ERP}\left(Head(T_{1}),Head(T_{2})\right)+dis\left(T_{1}^{m},T_{2}^{n}\right),\\ \operatorname{ERP}\left(T_{1},Head(T_{2})\right)+dis\left(T_{2}^{n},r\right),\\ \operatorname{ERP}\left(Head(T_{1}),T_{2}\right)+dis\left(T_{1}^{m},r\right)\end{cases}&\text{ otherwise }\end{cases}

(6)

where $r$ denotes the reference point. As shown in Figure 3(d), we set the reference point $r$ as (3, 2), and get ERP( $Q_{1}$ , $Q_{2}$ ) = 15.45.

Hausdorf Distance. Hausdorff distance [4] is a metric measure, which measures the maximum distance of all distance values from a point of one trajectory to the nearest point in another trajectory. The time complexity of Hausdorff distance is O(mn), which is a parameter-free measure. Formally, Hausdorf distance is defined as:

\operatorname{Hausdorff}\left(T_{1},T_{2}\right)=\max\begin{cases}\max_{1\leq i\leq m}\min_{1\leq j\leq n}dis\left(T_{1}^{i},T_{2}^{j}\right),\\ \max_{1\leq j\leq n}\min_{1\leq i\leq m}dis\left(T_{1}^{i},T_{2}^{j}\right)\end{cases}

(7)

In Figure 4(a), we find i) for each point of $Q_{1}$ , its nearest distance to trajectory $Q_{2}$ (denoted as grey shaded rectangles), and ii) for each point of $Q_{2}$ , its nearest distance to trajectory $Q_{1}$ (denoted as red rectangles). Based on this, we get two maximum distances (denoted as blue rectangles), and have Hausdorff( $Q_{1}$ , $Q_{2}$ ) = 3.61.

Frechet Distance. Frechet distance [2] is similar to the Hausdorff distance. The main difference is that Hausdorff distance does not consider for the direction of two trajectories, while Frechet distance does as depicted in Figure 5. Fréchet [2] is proposed by taking an example of walking a dog. Suppose that a man walks his dog with a leash. Though the man and his dog may have different trajectories, they move in the same direction. Frechet distance between two trajectories of man and dog is the shortest length of the leash required. Frechet distance is sensitive to noises with the time complexity of O(mn). However, it is commonly used for trajectories with different lengths or sampling rates. Since Frechet distance is non-metric, many variants are proposed to make it be metric [15]. The most common one is discrete Frechet distance [70] with the time complexity of O(mn). Thus, we use discrete Frechet distance in our experiments for its metricty. For simplicity, we use “Frechet” distance to denote discrete Frechet distance in the rest of this paper. Formally, discrete Frechet distacne is defined as:

\operatorname{Frechet}(T_{1},T_{2})=\begin{cases}\max_{1\leq i\leq m}dis\left(T_{1}^{i},T_{2}^{1}\right)&\text{ if }{n=1}\\ \max_{1\leq i\leq n}dis\left(T_{1}^{1},T_{2}^{i}\right)&\text{ if }{m=1}\\ \max\begin{cases}dis(T_{1}^{m},T_{2}^{n}),\\ \min\begin{cases}\operatorname{Frechet}\left(T_{1},Head(T_{2})\right),\\ \operatorname{Frechet}\left(Head(T_{1}),T_{2}\right),\\ \operatorname{Frechet}\left(Head(T_{1}),Head(T_{2})\right)\end{cases}\end{cases}&\text{ otherwise }\end{cases}

(8)

In Figure 4(b), we have Frechet( $Q_{1}$ , $Q_{2}$ )=4.12.

Distributed settings. To accelerate the trajectory similarity computation, distributed techniques [55, 85, 71, 80] are developed, where DITA [55] and REPOSE [85] are two representative methods designed for point-based measures.

DITA. DITA [55] is a distributed in-memory trajectory analytics system, using several classical point-based measures such as DTW and Frechet. DITA chooses STR Partition algorithm [33] to partition trajectory points. It uses R-tree [25] as the local index, and designs a trie-like index as the global index. Accordingly, a trie-like partition algorithm is also used in the local index, and several pruning optimization methods are developed to improve the efficiency of similarity search and join in the distributed environment.

REPOSE. REPOSE [85] is a distributed in-memory system destined for trajectory similarity search, which supports multiple distance measures, including Hausdorff, Frechet, DTW, LCSS, EDR and ERP. Zheng et al [85] discretize trajectories into reference trajectories and construct a trie-like index (i.e., RP-Tree) on reference points of reference trajectories. It is worth mentioning that, REPOSE tends to divide similar trajectories into different partitions as much as possible to achieve load balancing.

3.1.2 Segment-based Measures

Segment-based measures compute the similarity among trajectories, which are consisted of line segments. A line segment is a pair of points $(p_{1},p_{2})$ , and its length is the Euclidean distance between $p_{1}$ and $p_{2}$ . Based on this, we introduce 5 segment-based measures (i.e., EDwP, LIP, OWD, Seg-Frechet and Seg-Hausdorff) and a distributed method (i.e., DFT) designed for segment-based measures.

Edit Distance with Projections (EDwP). EDwP distance [49, 59, 8] employs a parameter-free approach and adapts to non-uniform sampling rates through dynamic interpolation and projections. It transforms trajectory points into line segments, and uses insertion and replacement operations to compute the edit distance. The time complexity of EDwP is O(mn), and EDwP distance between two trajectories $T_{1}$ and $T_{2}$ is defined as:

\operatorname{EDwP}(T_{1},T_{2})=\begin{cases}0&\text{ if }m=n=0\\ \infty&\text{ if }{m=0}\\ &\text{ or $n=0$ }\\ \min\begin{cases}\operatorname{EDwP}\left(T_{1},ins(T_{2},T_{1})\right),\\ \operatorname{EDwP}\left(ins(T_{1},T_{2}),T_{2}\right),\\ \operatorname{EDwP}\left(Head(T_{1}),Head(T_{2})\right)+\\ (rep(T_{1}^{1},T_{2}^{1})\times Coverage(T_{1}^{1},T_{2}^{1}))\end{cases}&\text{ otherwise }\end{cases}

(9)

where $rep(\cdot,\cdot)$ and $ins(\cdot,\cdot)$ denote the operations of replacement and insertion respectively. Specifically, $rep(l_{1},l_{2})=dis(l_{1}.s,l_{2}.s)+dis(l_{1}.e,l_{2}.e)$ , where $l_{i}.s$ and $l_{i}.e$ represent the first and last point of line segment $l_{i}$ , respectively. $Coverage(l_{1},l_{2})=dis(l_{1}.s,l_{1}.e)+dis(l_{2}.s,l_{2}.e)$ is used to measure the cost of replacing. In addition, $ins(l_{1},l_{2})$ effectively divides the line segment $l_{1}$ into two segments by projecting $l_{2}$ onto it. In Figure 6(a), we calculate the EDwP distance matrix and have EDwP( $Q_{1}$ , $Q_{2}$ )=5.14.

Locality In-between Polylines (LIP). LIP [46, 3] is a shape-based measure, which considers the areas of polygons formulated by intersection points of two trajectories. As shown in Figure 6(b), polygons have different areas (i.e., $Area_{\textit{1,2,3}}$ ). LIP sets a weight $w$ (i.e., the average perimeters of the polygons) for each area to adjust the effect of polygons. Nikos et al. [46] extend LIP to a spatio-temporal (i.e., time-aware) distance called STLIP, and also propose a speed-pattern spatio-temporal distance called SPSTLIP. The time complexity of LIP, STLIP and SPSTLIP are all O((m+n)log(m+n)). Formally, the LIP is defined as:

	$\displaystyle w_{i}=$	$\displaystyle\frac{{Perimeter}_{T_{1}}\left(I_{i},I_{i+1}\right)+{Perimeter}_{T_{2}}\left(I_{i},I_{i+1}\right)}{{Perimeter}_{T_{1}}+{Perimeter}_{T_{2}}}$		(10)
	$\displaystyle\operatorname{LIP}$	$\displaystyle(T_{1},T_{2})=\sum_{\forall Polygon_{i}}Area_{i}\cdot w_{i}$		(11)

where $I_{i}$ denotes the intersection point. In Figure 6(b), the areas and weights of all polygons are calculated. Then, we have LIP( $Q_{1}$ , $Q_{2}$ )=4.04.

Ont-way Distance (OWD). OWD [21, 42, 17] supports two representations of trajectories (i.e., linear representation and grid representation) as shown in Figure 7. Lin et al. [21] propose OWD based on linear representation, which transforms point-based trajectory data into segment-based. It measures the average minimal distance from each point in one trajectory to the other trajectory. OWD is an asymmetric measure according to Definition 4. Formally, the distance $D(p,T)$ between a point $p$ and a trajectory $T$ , and the OWD distance of linear representation are defined as below.

$\displaystyle D(p,T)$	$\displaystyle=\min_{q\in T}ED(p,q)$	(12)
$\displaystyle\operatorname{OWD}\left(T_{1}\rightarrow T_{2}\right)$	$\displaystyle=\frac{1}{\left\|T_{1}\right\|}\left(\int_{p\in T_{1}}D\left(p,T_{2}\right)dp\right)$	(13)
$\displaystyle\operatorname{OWD}\left(T_{1},T_{2}\right)=\frac{1}{2}$	$\displaystyle(\operatorname{OWD}\left(T_{1}\rightarrow T_{2}\right)+\operatorname{OWD}\left(T_{2}\rightarrow T_{1}\right))$	(14)

Considering the high computation cost O(mn) of linear representation, OWD is extended to grid representation called $\operatorname{OWD^{g}}$ , where trajectory points are mapped into grid cells according to their spatial information. Thus, OWD only computes the distances between grid cells (instead of sample points) and grid-based trajectories. The time complexity is reduced to $\textit{O(}m^{\prime}n^{\prime}\textit{)}$ , where $m^{\prime}$ and $n^{\prime}$ denote the numbers of grid cells occupied by two trajectories, respectively. Compared to $\operatorname{OWD}$ , $\operatorname{OWD^{g}}$ is more popular due to its low time complexity [57]. Formally, $\operatorname{OWD^{g}}$ between two grid-based trajectory $T_{\operatorname{g1}}$ and $T_{\operatorname{g2}}$ is defined as:

$\displaystyle D^{\operatorname{g}}\left(\operatorname{g},T_{\operatorname{g}}\right)=$	$\displaystyle\min_{\operatorname{g}^{\prime}\in T_{\operatorname{g}}}d^{\operatorname{g}}\left(\operatorname{g},\operatorname{g}^{\prime}\right)$	(15)
$\displaystyle\operatorname{OWD^{g}}\left(T_{\operatorname{g1}}\rightarrow T_{\operatorname{g2}}\right)$	$\displaystyle=\frac{1}{\left\|T_{\operatorname{g1}}\right\|}\sum_{p\in T_{\operatorname{g1}}}D^{\operatorname{g}}\left(p,T_{\operatorname{g2}}\right)$	(16)
$\displaystyle\operatorname{OWD^{g}}\left(T_{\operatorname{g1}},T_{\operatorname{g2}}\right)=$	$\displaystyle\frac{\operatorname{OWD^{g}}\left(T_{\operatorname{g1}}\rightarrow T_{\operatorname{g2}}\right)+\operatorname{OWD^{g}}\left(T_{\operatorname{g2}}\rightarrow T_{\operatorname{g1}}\right)}{2}$	(17)

where $d^{\operatorname{g}}$ (·) computes the distance of two grid cells. As depicted in Figure 7(a), we have $\operatorname{OWD}$ ( $Q_{1}$ , $Q_{2}$ )=1.67. In addition, $Q_{1}$ and $Q_{2}$ are extended to grid representations $Q_{\operatorname{g1}}$ and $Q_{\operatorname{g2}}$ (consisted of 12 grid cells respectively) as shown in Figure 7(b), where $\operatorname{OWD^{g}}$ ( $Q_{\operatorname{g1}}$ , $Q_{\operatorname{g2}}$ )=0.25. Note that, when dealing with sparse data with a large spatial span, the space overhead of $\operatorname{OWD^{g}}$ is quite large.

Seg-Frechet and Seg-Hausdorff. Inspired by Frechet and Hausdorff distances, Seg-Frechet and Seg-Hausdorff distances [71] are proposed for segment-based trajectories, by changing points to line segments when computing Frechet and Hausdorff.

Distributed setting. Xie et al. [71] develop a distributed framework (i.e., DFT) based on Spark that supports Seg-Frechet and Seg-Hausdorff measures. DFT uses STR partition algorithm [33] to partition segments, while constructs a dual R-Tree as the global index and a general R-Tree as the local index. DFT is the first distributed method to support fast trajectory similarity computation.

3.2 Trajectory Similarity in Road Network

In this subsection, we review 7 distance measures designed for road network constrained trajectories in terms of point-based and segment-based measures. Different from free space, trajectories in road network are usually denoted by road vertices or road segments. Here, Figure 8 gives the running example in road network, where the grid (denoted as grey lines) is assumed as the underlying road network. The spatial and temporal distance matrices are used to store the road network distances and temporal distances between vertices of two trajectories $Q_{1}$ and $Q_{2}$ , respectively.

3.2.1 Point-based Measures

Point-based measures in road network are utilized to compute the similarity among trajectories that are consisted of road vertices. We mainly introduce 5 point-based measures (i.e., NetERP, NetEDR, NetDTW, NetLCSS and TP) below.

NetERP, NetEDR, NetDTW and NetLCSS. NetERP [30], NetEDR [30], NetDTW [72] and NetLCSS [19] are expanded from classic measures in free space. They first map original trajectories into road network paths that consist of vertices or segments. Then, they define similarity measures based on classic distance measures such as ERP, EDR, DTW and LCSS, generally by aggregating the distances between road vertices or segments of two trajectories. Note that, these measures employ the shortest path distance (instead of Euclidean distance) between two road vertices in the graph. Thus, NetERP, NetEDR, NetDTW and NetLCSS have the same features as corresponding classic measures, and can be used in road network. Consider the running example in Figure 8, we have NetERP( $Q_{1}$ , $Q_{2}$ )=20.00, NetEDR( $Q_{1}$ , $Q_{2}$ )=4, NetDTW( $Q_{1}$ , $Q_{2}$ )=19.00, and NetLCSS( $Q_{1}$ , $Q_{2}$ )=1.

TP. TP [54] is a measure that considers both the spatial and temporal similarities. Shang et al. [54] assume that the point is 3-dimension, i.e., $v=(p,t)=((latitude,longitude),timestamp)$ . The spatial and temporal similarities (i.e., $Sim_{S}$ and $Sim_{T}$ ) are:

	$\displaystyle D^{\prime}(v.p,T)=\min_{q\in T}d^{\prime}(v.p,q.p)$	(18)
	$\displaystyle D^{\prime}(v.t,T)=\min_{q\in T}\|v.t-q.t\|$	(19)
$\displaystyle{Sim}_{S}\left(T_{1},T_{2}\right)$	$\displaystyle=\frac{\sum_{T_{1}^{i}\in T_{1}}e^{-D^{\prime}\left(T_{1}^{i}\cdot p,T_{2}\right)}}{\left\|T_{1}\right\|}+\frac{\sum_{T_{2}^{j}\in T_{2}}e^{-D^{\prime}\left(T_{2}^{j}\cdot p,T_{1}\right)}}{\left\|T_{2}\right\|}$	(20)
$\displaystyle{Sim}_{T}\left(T_{1},T_{2}\right)$	$\displaystyle=\frac{\sum_{T_{1}^{i}\in T_{1}}e^{-D^{\prime}\left(T_{1}^{i}\cdot t,T_{2}\right)}}{\left\|T_{1}\right\|}+\frac{\sum_{T_{2}^{j}\in T_{2}}e^{-D^{\prime}\left(T_{2}^{j}\cdot t,T_{1}\right)}}{\left\|T_{2}\right\|}$	(21)

Here, $d^{\prime}(\cdot,\cdot)$ denotes the shortest path distance between two road vertices in the graph. The spatial and temporal similarities are combined linearly to obtain the spatio-temporal similarity, i.e., $Sim_{ST}(T_{1},T_{2})$ $=\lambda\cdot Sim_{S}(T_{1},T_{2})+(1-\lambda)\cdot Sim_{T}(T_{1},T_{2})$ , where $\lambda$ $\left(0\leq\lambda\leq 1\right)$ is a parameter to adjust the relative importance of spatial and temporal similarities. As shown in Figure 8, since TP needs to utilize the temporal information of trajectories, we randomly add timestamps to points of $Q_{1}$ and $Q_{2}$ , i.e., $Q_{1}={(1,1,1),(3,4,3),(4,0,4),(6,2,6),(7,1,9)}$ , $Q_{2}={(1,5,2),(1,2,4),(3,2,6),(5,3,7),(8,3,9),(8,5,11)}$ . Based on this, when $\lambda$ =0.2, we have TP( $Q_{1}$ , $Q_{2}$ )=1.17.

3.2.2 Segment-based Measures.

Segment-based measures in road network measure the similarity among trajectories consisted of road segments. We mainly introduce 2 segment-based measures (i.e., LORS and LCRS) and a distributed framework (i.e., DISON).

Longest Overlapping Road Segments (LORS). LORS [67] is a distance measure based on the length of overlapped edges (i.e., road segments) with the time complexity of O(mn). Given two trajectories $T_{1}$ and $T_{2}$ , where the $i$ -th ( $1\leq i\leq m$ ) segment of $T_{1}$ is denoted as $T_{1}^{i}$ , LORS measure is defined formally as:

\operatorname{LORS}(T_{1},T_{2})=\begin{cases}0&\text{ if }{m=0}\text{ or $n=0$ }\\ length(T_{1}^{m})\\ +\operatorname{LORS}(Head(T_{1}),Head(T_{2}))&\text{ if }T_{1}^{m}=T_{2}^{n}\\ \max\begin{cases}\operatorname{LORS}\left(T_{1},Head(T_{2})\right),\\ \operatorname{LORS}\left(Head(T_{1}),T_{2}\right),\end{cases}&\text{ otherwise }\end{cases}

(22)

where $length$ (·) denotes the length of the edge. According to Eq. 22, LORS does not need to compute the shortest path distance in the graph, which is relatively efficient. In Figure 8, we have LORS( $Q_{1}$ , $Q_{2}$ )=0, as $Q_{1}$ and $Q_{2}$ have no overlapping segments.

Longest Common Road Segments (LCRS). Similar to LCSS, LCRS [80] finds the longest common road segment between two trajectories, and can be used in several applications such as carpooling. The time complexity of LCRS is also O(mn), and LCRS satisfies symmetry. It is formally defined as:

\displaystyle\operatorname{LCRS}(T_{1},T_{2})=\frac{\operatorname{LORS}(T_{1},T_{2})}{m+n-\operatorname{LORS}(T_{1},T_{2})}

(23)

As shown in Figure 8, LCRS( $Q_{1}$ , $Q_{2}$ )=0.00, as $Q_{1}$ and $Q_{2}$ have no common segments.

Distributed setting. Yuan et al. [80] extend standalone LCRS to a distributed framework called DISON, which supports road-network constraint similarity search and join. DISON first uses STR partition algorithm to partition the road segments, and then it builds a two-layer index with hashmap as the global index and an inverted index as the local index. Note that, DISON is unique distributed framework for similarity measure in road network.

4 Learning Based Measures

As mentioned in Section 3, non-learning measures generally have a time complexity of O(mn), which is high. To address this, learning-based methods are developed to reconstruct the input data of high-dimension into a new representation of low-dimension [41, 78, 36].

t2vec [36] designs a trajectory deep learning framework to achieve robustness to low sample rates and noises. It maps a trajectory into a $d$ -dimensional embedding vector. Note that, t2vec is not a similarity measure, but a representation method to transform trajectories into vectors.

Inspired by t2vec, many studies employ different deep learning frameworks in free space and road network, to learn approximate distance functions for non-learning measures. Generally, as shown in Figure 9, they first select similar and dissimilar trajectories (i.e., positive and negative trajectories) of anchor trajectories (origin trajectories) to obtain grid-based trajectories [76] or similarity triplets [84, 26, 19]. Then, features (in terms of spatial, temporal, structural and so on) are extracted, while trajectory embedding vectors are generated using deep representation learning. Finally, the learning (dashed rectangle in Figure 9) can stop until the trajectory similarities and dissimilarities evaluated on the embedding vectors are close to the ground-truth. Here, results calculated using non-learning measures are taken as ground-truth. Note that, the time complexity of similarity learning among vectors is linear. In view of this, we review 4 representative learning-based measures.

NEUTRAJ. The first learning-based trajectory similarity measure is NEUTRAJ [76], which is based on neutral metric learning. Specifically, NEUTRAJ first maps trajectories into grid-trajectories. Then, it samples trajectories as seeds, and uses their pair-wise similarities and dissimilarities as guidance. Finally, NEUTRAJ uses Long Short-Term Memory (LSTM) model to generate the embedding vectors, and approximates various non-learning similarity computations (as depicted in Figure 9) with the complexity of O(m+n). Note that, embedding vectors generated by NEUTRAJ can preserve the spatial information of trajectories. In addition, in order to improve the performance of deep representation learning methods on long trajectories, TrajGAT [77] is proposed, which is based on graph attention networks (GATs), Transformer and a quad-tree index for effectively embedding trajectories.

Traj2SimVec. Although NEUTRAJ has greatly reduced the time complexity, it needs a pre-training process to compute the similarity among all seed trajectories, which incurs a quadric time complexity during training. Thus, Traj2SimVec [84] is proposed to improve the training efficiency by simplifying training trajectories into triplet training samples. And the training time complexity is reduced to O(logn), where $n$ denotes the average length of training trajectories. Similar to Traj2SimVec, T3S [75] and TMN [74] are proposed based on attention networks, where the former considers dissimilar trajectories while the latter focuses on the inter-information between trajectories.

GTS. Different from the above two methods that are designed in free space, a Graph Neural Network (GNN) based approach named GTS [26] is designed to measure the trajectory similarity in road network. In order to reflect the information on the road network, GTS learns the representation of point-of-interest (POI) in spatial network and the relationship between trajectories, between POIs, or between trajectories and POIs.

ST2Vec. ST2Vec [19] is also proposed for spatio-temporal trajectory similarity computation in road network. Considering the complex temporal information in trajectory data [57], ST2Vec with a time complexity of O(d) captures both spatial and temporal features, and fusions these features to obtain spatio-temporal embedding vectors based on GNN and LSTM. Also, ST2Vec applys training triplets samples to reduce the training cost, and sets spatial and temporal weights that can be adjusted for flexible analysis.

Note that, due to the similarity among different learning-based methods, we mainly evaluate NEUTRAJ [76], Traj2SimVec [84], GTS [26], ST2Vec [19] in Section 6 for simplicity.

5 Evaluation Benchmark

Motivation. Since the evaluation on such a big family of trajectory similarity measures has never been organized, we are inspired to propose the first standard evaluation framework to examine the function of each measure. On the one hand, the primary goal of a distance measure is being effective, robust, efficient, and scalable. These four desiderata are often in tension with each other. In view of this, the evaluation framework must enable to study the capability of each measure in terms of multi-aspects. On the other hand, trajectory data usually have various characteristics such as variable lengths, complex shapes (especially for vehicle trajectories), varying samplings, outlier noises, etc. Specifically, the variable lengths and complex shapes are intrinsic properties [87], while the varying samplings and outlier noises are usually caused by other factors [57] such as battery power and urban building effect. Overall, we build an evaluation framework to benchmark the trajectory measures under as many real-life settings as possible, objectively studying the capability of each measure in terms of four performance aspects. Table II summarizes all the evaluation dimensions and corresponding scenarios. According to Table II, we evaluate each trajectory similarity measure from four dimensions as follows.

(i) Effectiveness vs. length and shape. Length and shape are considered as intrinsic properties of trajectory data [35], which requires an effective measure. However, as the similarity of two trajectories does not have a ground truth [32], we cannot determine whether the value computed by trajectory measures is correct or not. It is also meaningless to cross-compare the distance values returned by different measures. To this end, we adopt a qualitative way to visualize the Top- $k$ similarity query results returned by different trajectory measures, for the same query trajectory. Thus, we give an intuitive analysis when the query trajectory is selected with different lengths or shapes to show the effectiveness of each measure.

Specifically, we use parameter $L$ to change the trajectory length. For example, $\textit{L=}20$ means to select the first $20\%$ points from the original trajectory as a deformed trajectory. Such length-oriented evaluation enables us to see how each distance measure performs when the length of the trajectory changes from short to long, which is useful in an online setting that involves trajectory evolution. In terms of varying shapes, we select four typical geometrics, i.e., straight line, polyline without overlaps, polyline with overlaps, and round line, to explore how each similarity measure performs when the query trajectories are of different shapes.

(ii) Robustness vs. sampling and noise. Quality issues commonly exist during data collection. For example, a taxi driver may alter the default device sampling rate to reduce the power consumption [86]. In this case, even for the same trajectory, this may result in sparse and dense segments. In addition, noise may occur in GPS points, which is usually caused by low satellite visibility or urban canyons that affect signal quality. Consequently, given a query trajectory, a robust trajectory similarity measure is expected to retrieve the same query results for datasets with different sampling rates and noise ratios.

TABLE II: Evaluation Benchmark

Dimension	Varying Cases	Adjustable Parameters
Effectiveness	vs. length	$L$ (%): 20, 60, 100
Effectiveness	vs. shape	Four typical geometrics
Robustness	vs. sampling	$S$ (%): 10, 20, 40
Robustness	vs. noise	$N$ (%): 13, 16, 19/10, 20, 30
Efficiency	Top- $k$ query	/
Scalability	vs. length	$L$ (%): 20, 60, 100
Scalability	vs. cardinality	$O_{r}$ (%): 20, 60, 100

We use parameter $S$ to denote the percentage of sampling points in each trajectory. For example, $\textit{S=}20$ means that we sample $20\%$ points from a trajectory as a deformed trajectory. Note that, sampling transformation is meaningful and useful for free space trajectories, while meaningless and useless for road network constrained trajectories after map matching. We use parameter $N$ to represent the ratio of noises in a whole trajectory. For instance, $\textit{N=}10$ denotes to sample $10\%$ points from a trajectory, and then add an outlier for each point. Here, we utilize Gaussian noise [5] to add an outlier $p^{\prime}$ for each filtered point $p$ , i.e., $p^{\prime}=p+\Delta\cdot p\sim Gaussian(0,1)$ , where $\Delta$ is the radius range of outliers.

(iii) Efficiency vs. Top- $k$ query. An efficient trajectory similarity measure is expected to efficiently compute the distance value between any pair of trajectories, which especially is important for online trajectory applications. Therefore, we directly use the Top- $k$ similarity query as the evaluation task.

(iv) Scalability vs. length and cardinality. Despite how the average trajectory length or the data cardinality (i.e., the number of moving objects) of the trajectory dataset changes, a scalable measure should show stable inter-trajectory distance computation performance. Specifically, we proceed to use $L(\%)$ to control the average length of trajectories in the dataset. In addition, we use another parameter $O_{r}$ to control the percentage of moving objects w.r.t. all the objects, i.e., the trajectory dataset to be queried/processed.

6 Experimental Evaluation

We conduct extensive experiments aimed at offering insights to the performance of trajectory similarity measures summarized in Table I. First, we present experimental settings. Next, based on the evaluation benchmark proposed in Section 5, we study trajectory measures in terms of four aspects: effectiveness, robustness, efficiency, and scalability. Finally, we conduct a study of metric similarity measures to show the effect of indexing and pruning on them.

6.1 Experimental Setup

Datasets. We employ four real-world trajectory datasets: AIS [47], Geolife [89], T-Drive [48], and Porto [27], all of which are widely used in previous trajectory similarity studies [30, 67, 65, 36, 76, 84, 75, 19, 57, 26].

•

AIS records the locations of vessels in U.S. and international waters. We use the data from Jan. 1 to Dec. 1, 2019. The sample interval varies between 1 and 90 seconds.
•

Geolife contains about 25 million GPS points collected from 182 pedestrian in Beijing, from Apr. 2007 to Aug. 2012. The sample interval is 2 seconds. Geolife records different transportation modes of users, and thus, the trajectories’ speeds vary significantly.
•

T-Drive includes 1.5 million GPS points generated by 10,357 taxis in Beijing, from Feb. 2 to Feb. 8, 2008. The sample interval varies between 1 and 177 seconds.
•

Porto contains 1.7 million GPS points generated by 442 taxis in Porto, Portugal, from Aug. 2011 to Apr. 2012. The sample interval is 15 seconds.

The visualization of four datasets is shown in Figure 10. It suggests that the GPS points collected by AIS have a wider spatial coverage, and are distributed more unevenly than those collected by Geolife. Since vessels and pedestrian move in free space and taxis move along road networks, we use AIS and Geolife for evaluating free space oriented trajectory measures, and use T-drive and Porto for evaluating road network constrained trajectory measures. We remove trajectories that contain less than 5 GPS points, and map match [73] all the trajectories on T-Drive and Porto to the corresponding road networks extracted from OpenStreetMap [45]. We randomly select 50,000 trajectories for evaluation in the distributed settings. The average lengths of AIS, Geolife, T-Drive, and Porto are 512, 343, 150, and 58, respectively. Due to the high complexity of similarity computation and the limitations of a single machine, we randomly selected 10,000 trajectories out of a total of 50,000 trajectories for evaluation in standalone processing mode.

Experimental design. We verify trajectory similarity measures summarized in Table I by the evaluation benchmark depicted in Table II.

(i) Non-learning based measures. We evaluate 17 standalone measures: DTW, LCSS, EDR, EDwP, ERP, Hausdorff, Frechet, LIP, OWD, Seg-Frechet, LORS, TP, NetERP, NetEDR, NetDTW, NetLCSS, and LCRS in terms of effectiveness, robustness, efficiency, and scalability. However, we evaluate four distributed measures: DFT, DITA, REPOSE, and DISON only in terms of efficiency and scalability. This is because the distributed implementations only improve time performance by techniques such as parallelization and partitioning, without modifying the measures themselves.

(ii) Learning-based measures. Since all learning-based measures, i.e., NEUTRAJ [76], Traj2SimVec [84], GTS [26], and ST2Vec [19] are standalone-based, we evaluate them in terms of effectiveness, robustness, efficiency, and scalability. Note that, the target of learning-based models is to approximate the inter-trajectory similarity computed by non-learning measures and all existing learning-based measures can only support point-based similarity learning. Therefore, for each learning-based measure, we verify its capability of approximating point-based trajectory similarity measures, including DTW, LCSS, EDR, EDwP, ERP, Hausdorff, and Frechet in free space, and NetDTW, NetLCSS, NetERP, and TP in road networks.

Experimental settings. To ensure a fair and objective comparison, we do not implement any additional indexing structures when evaluating the standalone measures, and do not modify the indexing structures inherent in each of the distributed measures during evaluation.

For EDR and LCSS used in free space, we set the similarity threshold used in similarity computation to 30km. For NetEDR and NetLCSS used in road network, we set the similarity threshold to 1km. Moreover, we set the gap point of ERP and NetERP to the centroids of datasets. To vary noise ratios in benchmark settings, we set $\Delta{\textit{lat}}=\Delta{\textit{lon}}=0.1\degree$ for AIS, $\Delta{\textit{lat}}=\Delta{\textit{lon}}=0.0005\degree$ for Geolife, and $\Delta{\textit{lat}}=0.008\degree$ and $\Delta{\textit{lon}}=0.007\degree$ for T-Drive and Porto.

For standalone measures, we conduct experiments on an 8-core Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz processor. For distributed measures, we conduct experiments on a cluster consisting of 9 nodes with nine 10-core Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz processors. Each node ran CentOS7 system with Hadoop 2.6.5 and Spark 2.2.0. For learning-based measures, we set the hyperparameters based on the performance of each model. Specifically, for NEUTRAJ [76], the batch size is set to 60 and the initial learning rate is set to 0.01. For Traj2SimVec [84], the batch size is set to 64 and the embedding size of GCN is set to 128. For GTS [26] and ST2Vec [19], the batch size is set to 128 and the initial learning rate is set to 0.1. All of the LSTMs used in these measures have hidden sizes of 128. In addition, we split each dataset into a training set, validation set, and test set in a 3:1:6 ratio. All the experiments of learning-based methods were conducted on a server with GeForce RTX 3090, 2.40GHz GPU, and 24GB RAM.

6.2 Effectiveness Evaluation

We evaluate the effectiveness of each trajectory similarity measure by performing Top-1 similarity queries.

6.2.1 Non-learning based measures

Effect of Lengths. Figures 11(a)–(j) plot the Top- $1$ query results for a given query trajectory $QT$ (denoted by red lines) when the ratio of the length of the $QT$ to its entire length varies from 20% to 100%. Here, we only evaluate the free-space oriented measures on AIS and Geolife, because the statics show that the trajectories in free space are much longer than trajectories in road networks. The higher the similarity between the query trajectory and the corresponding Top- $1$ query result by visualization, the higher the effectiveness. The observations are below.

i) For AIS, results computed by LCSS and Hausdorff are almost identical when L=100%; while those on DTW, Frechet, OWD, and Seg-Frechet are almost identical. For Geolife, DTW, EDwP, OWD, Hausdorff, Frechet, and Seg-Frechet share the similar results with the same transportation mode as $QT$ , while only Hausdorff returns the result with opposite direction of it. This is because Hausdorff is an symmetric measure (cf. Definition 4), which does not take into account trajectories’ directions. ii) When the length of $QT$ varies, DTW, EDwP, Hausdorff, and OWD always return the same results on both datasets. This suggests that they can be flexibly adapted to identify similar trajectories w.r.t. a query trajectory with various lengths. iii) When the length of $QT$ is small, EDR performs worse than others, i.e., the results identified by EDR are quite different from $QT$ (cf. Figures 11(a)–(c) and 11(f)–(g)). The reason is that the costs of different edit operations in EDR computation are the same, making it sensitive to trajectory length (i.e., the number of trajectory points). Specifically, EDR always finds the query result whose length is similar with query trajectory $QT$ , even the result is spatially distant with $QT$ . iv) The results computed by LIP vary significantly with the length of $QT$ . This is because the results highly relies on the shapes of polygons formed by intersection points of $QT$ and a to-be-examined trajectory, while the shapes are obviously affected by the length of $QT$ . v) Finally, results identified by all the measures on AIS are generally better (i.e., more similar to $QT$ ) than those on Geolife. This is because each trajectory in Geolife generally contains multiple transportation modes and tends to exhibit various features.

TABLE III: Robustness of Learning-based Measures in Free vs. Sampling Rate

Sampling Rate (%)	Measures	AIS						Geolife
Sampling Rate (%)	Measures	DTW	LCSS	EDR	ERP	Frechet	Hausdorff	DTW	LCSS	EDR	ERP	Frechet	Hausdorff
$S$ =10	NEUTRAJ	0.0087	0.0075	0.0073	0.0082	0.0086	0.0087	0.0281	0.0217	0.0140	0.0156	0.0178	0.0166
$S$ =10	Traj2SimVec	0.2515	0.0016	0.0078	0.0356	0.2067	0.2520	0.0106	0.3312	0.3312	0.0271	0.0158	0.0110
$S$ =20	NEUTRAJ	0.0087	0.0083	0.0087	0.0080	0.0091	0.0091	0.0277	0.0204	0.0113	0.0179	0.0200	0.0183
$S$ =20	Traj2SimVec	0.2603	0.0019	0.0077	0.0352	0.2151	0.2610	0.0094	1.0000	1.0000	0.0245	0.0158	0.0110
$S$ =40	NEUTRAJ	0.0090	0.0080	0.0082	0.0084	0.0086	0.0095	0.0277	0.0206	0.0121	0.0210	0.0220	0.0214
$S$ =40	Traj2SimVec	0.2590	0.0018	0.0079	0.0351	0.2151	0.2602	0.0094	1.0000	1.0000	0.0245	0.0158	0.0249
$S$ =100	NEUTRAJ	0.0082	0.0094	0.0074	0.0076	0.0087	0.0088	0.0277	0.0212	0.0204	0.0252	0.0258	0.0249
$S$ =100	Traj2SimVec	0.25905	0.0018	0.0079	0.0352	0.2143	0.2705	0.0366	1.0000	1.0000	0.0215	0.0131	0.0513

Effect of Shapes. Figure 12 shows the Top- $1$ query results for four different query trajectories (denoted by red lines), each of which is with a typical spatial shape, i.e., straight (denoted as $QT_{s}$ ), polyline without overlaps (denoted as $QT_{o1}$ ), polyline with overlaps (denoted as $QT_{o2}$ ), and round (denoted as $QT_{r}$ ). Here, we only evaluate road-network constrained similarity measures on T-drive and Porto, as the vehicle trajectories usually have much more complex geometry than free space trajectories due to the constraints of road networks. The higher the similarity between the query trajectory and the corresponding Top- $1$ query result by visualization, the higher the effectiveness.

First, given a $QT_{s}$ , only NetEDR and NetLCSS return back-and-forth trajectories containing a large number of matched pairs with $QT$ , while all of the remaining measures return straight trajectories (cf. Figures 12(a) and (e)). The reason is that NetEDR and NetLCSS do not consider the direction of trajectories, but rather the number of matched pairs. Specifically, the more the matched pairs, the higher the similarity (cf. Section 3). Second, given a $QT_{o1}$ or a $QT_{o2}$ , all of measures except for NetEDR return similar results to it (cf. Figures 12(b), (c), (f), and (g)). This is because, NetEDR does not consider the spatial distance between two points; instead, it identifies the similar trajectories by comparing their lengths, i.e., a similar lengths results in a higher similarity. Third, given a $QT_{r}$ , only LORS, LCRS, and TP return results with round sub-trajectories, where LORS performs the best (cf. Figures 12(d) and (h)), because LORS can identify overlapping road segments, which enables it to effectively match trajectories with round shapes.

6.2.2 Learning based measures

We evaluate the effectiveness of learning based measures by their capabilities of approximating non-learning based measures, which are their targets. Specifically, given a query trajectory $QT$ and a non-learning based measure, we use its Top-50 query result calculated by the non-learning based measure as the ground truth. Next, we implement the model of a learning based measure to approximate the non-learning based measure, and apply the learned measure to compute Top-50 query result w.r.t. $QT$ . Finally, we utilize $\textit{HR}@50$ to measure the overlap between the Top-50 similarity query results returned by the learning-based measures and the ground truths. Clearly, the higher the $\textit{HR}@50$ , the higher the effectiveness.

Figure 13 depicts the effectiveness of learning based measures. First, Traj2SimVec has better performance (i.e., higher $\textit{HR}@50$ ) than NEUTRAJ in free space settings (cf. Figures 9(a) and (b)). Specifically, Traj2SimVec simplifies trajectories into triplets (anchor, similar, dissimilar) (cf. Section 4), as such triplets enhance the similarity learning among trajectories. Second, the performance of Traj2SimVec is unstable across different similarity measures and different datasets. This is because (i) LCSS, EDR, and ERP are string-based, whose information cannot be preserved in embeddings; and (ii) the data distribution of AIS is uneven, and the spatio-temporal features on Geolife is complex due to various transportation modes. Third, both $\textit{HR}@50$ of GTS and ST2Vec are stable in road network settings, and ST2Vec performs better than GTS. The reason is that the spatial information exploited by ST2Vec (i.e., road networks) is more comprehensive than that by GTS (i.e., POIs), and ST2Vec is able to extract. In a conclusion, learning-based measures are more effective for road network settings.

6.3 Robustness Evaluation

We study the robustness of each trajectory similarity measure by performing Top- $50$ similarity queries. Specifically, given a specific non-learning based measure and a query trajectory $QT$ , we perform robustness evaluation by the following three steps. First, we compute the $QT$ ’s Top- $50$ similar trajectories via a similarity measure, and set the result as the ground-truth. Second, we perform down sampling or add-noise operations on the original dataset according to parameters $S$ and $N$ in Table II, resulting in a set of transformed trajectory datasets. Third, we perform Top- $50$ similarity query for the same query trajectory $QT$ in each transformed dataset to obtain the corresponding Top- $50$ result. Finally, we use $\textit{HR}@50$ to measure the overlap between the Top-50 similarity query results on transformed datasets and the ground truths. Clearly, the higher the $\textit{HR}@50$ , the higher the robustness. Note that, we compute Top- $50$ similarity results for non-learning based measures, while learn those for learning based measures.

6.3.1 Non-learning based measures

Effect of Sampling Rates. We follow Table II to perform downsampling with S (%)=10, 20, and 40, where $S$ is the sampling rate. When S=100, no sampling is performed. Here, we only verify free space oriented measures on AIS and Geolife, as the variation of sampling rates has little effect on map-matched trajectories in road networks.

The results are shown in Figure 14. First, DTW, EDR, and ERP become less robust as the sampling rate increases. This is in contrast to the findings of a previous study [57], where experimental datasets with small spatial coverage are used. In this case, the change of the sampling rate does not significantly impact the distances between pair points when computing DTW and ERP distance. On the contrary, as AIS covers large spatial space, DTW and ERP that perform well in small urban areas [57] are sensitive to sampling rates.

Second, $\textit{HR}@50$ of EDwP varies between 0.6 and 1.0 on both AIS and Geolife, which suggests that its performance is slightly affected by sampling rates. In addition, LIP is robust on AIS, as its corresponding $\textit{HR}@50$ is larger than 0.6. However, it performs poorly on Geolife with $\textit{HR}@50$ being smaller than 0.4. Specifically, the similarity between each pair of trajectories calculated by LIP is determined by the areas and perimeters of the polygons formed by point intersections of the trajectories. Moreover, each trajectory on Geolife generally contains a variety of transportation modes, where the decreasing of sampling rates largely affects the shape of a trajectory. Thus, LIP is very sensitive to the sampling rates of trajectories on Geolife.

Finally, Frechet, Hausdorff, OWD, and Seg-Frechet are robust, as their $\textit{HR}@50$ are always larger than 0.9 even when S=10 $\%$ . This is because these measures rely on the number of matched pairs, which remains stable when the number of sampling points changes.

TABLE IV: Robustness of Learning-based Measures in Free Space vs. Noise Ratio

Noise Ratio (%)	Measures	AIS						Geolife
Noise Ratio (%)	Measures	DTW	LCSS	EDR	ERP	Frechet	Hausdorff	DTW	LCSS	EDR	ERP	Frechet	Hausdorff
$N$ =0	NEUTRAJ	0.0082	0.0094	0.0074	0.0076	0.0087	0.0088	0.0277	0.0212	0.0204	0.0252	0.0258	0.0249
$N$ =0	Traj2SimVec	0.2591	0.0018	0.0079	0.0352	0.2143	0.2705	0.0366	1.0000	1.0000	0.0215	0.0131	0.0513
$N$ =13	NEUTRAJ	0.0091	0.0080	0.0073	0.0079	0.0087	0.0085	0.0276	0.0223	0.0203	0.0243	0.0247	0.0227
$N$ =13	Traj2SimVec	0.2460	0.0017	0.0079	0.0345	0.2091	0.2583	0.0094	1.0000	1.0000	0.0255	0.0158	0.0110
$N$ =16	NEUTRAJ	0.0092	0.0071	0.0080	0.0082	0.0091	0.0085	0.0259	0.0201	0.0107	0.0283	0.0259	0.0251
$N$ =16	Traj2SimVec	0.2464	0.0021	0.0080	0.0348	0.2053	0.2558	0.0094	1.0000	1.0000	0.0249	0.0158	0.0110
$N$ =19	NEUTRAJ	0.0083	0.0071	0.0066	0.0081	0.0087	0.0084	0.0262	0.0200	0.0103	0.0266	0.0259	0.0254
$N$ =19	Traj2SimVec	0.2403	0.0017	0.0078	0.0334	0.2034	0.2540	0.0094	1.0000	1.0000	0.0252	0.0158	0.0110

TABLE V: Robustness of Learning-based Measures in Road Network vs. Noise Ratio

Noise Ratio (%)	Measures	T-Drive				Porto
Noise Ratio (%)	Measures	NetLCSS	NetDTW	TP	NetERP	NetLCSS	NetDTW	TP	NetERP
$N$ =0	GTS	0.2647	0.1439	0.2113	0.0869	0.2647	0.1439	0.2113	0.0869
$N$ =0	ST2Vec	0.1895	0.2691	0.3935	0.2570	0.2276	0.2928	0.3758	0.2505
$N$ =10	GTS	0.0099	0.0097	0.0097	0.0098	0.0510	0.0501	0.0502	00509
$N$ =10	ST2Vec	0.0100	0.0101	0.0098	0.0102	0.0950	0.0354	0.0463	0.0340
$N$ =20	GTS	0.0098	0.0101	0.0099	0.0101	0.0507	0.0506	0.0499	0.0501
$N$ =20	ST2Vec	0.0105	0.0103	0.0099	0.0103	0.0944	0.0334	0.0452	0.0348
$N$ =30	GTS	0.0099	0.0102	0.0105	0.0102	0.0512	0.0495	0.0495	0.0494
$N$ =30	ST2Vec	0.0103	0.0103	0.0098	0.0098	0.0924	0.0348	0.0462	0.0313

Effect of Noise Ratios. We set the noise ratio N (%) to 13, 16, and 19 for AIS and Geolife, and 10, 20, and 30 for T-drive and Porto, based on the data distributions of these datasets (as listed in Table II). A noise ratio of N=0 corresponds to the original datasets.

Figure 15 plots the results. First, DTW, LCSS, ERP, Frechet, Hausdorff, OWD, and Seg-Frechet perform stable and good in free space settings when the noise ratio is varied (cf. Figures 15(a)–(f)). This is because measures such as LCSS and EDR use string distance to compute the inter-trajectory similarity, making them robust to noises.

Second, as the trajectories on AIS are gathered locally and dispersed globally, adding noises does not affect grid-trajectory mapping of OWD (cf. Figures 15(a)–(c)). Nevertheless, according to the results on Geolife, noises may affect the performance of OWD (cf. Figures 15(d)–(f)). This is because the complex features and small spatial coverage of Geolife render that the noise points severely modify the grid representation of origin trajectories.

Third, the performance of LIP is the worst among all measures on Geolife (cf. Figures 15(d)–(f)), which is in contrast to previous study [57]. This is because LIP computes similarity according to the polygons formed by point intersections of trajectories. However, the shapes of polygons are greatly affected by noise points, with the result that their areas and perimeters largely differ from polygons formed by original point intersections.

Next, EDwP’s performance becomes worse on AIS but remains stable on Geolife, as the noise ratio increases (cf. Figures 15(a)–(c) and (d)–(f)). Specifically, the low sampling rate (cf. Section 6.1) of AIS causes that the large number of noise points have a significant impact on the lengths of trajectory segments, which in turn affects the similarity computation of EDwP (cf. Eq. 9).

Finally, in road network settings, all similarity measures’s performance drops drastically when adding more noises (cf. Figures 15(g)–(l)). Specifically, there are many paths between two locations in road networks, and thus, changing some road segments of a trajectory may result in another new trajectory with a new path. In this case, most of the measures sensitive to noise ratios. Moreover, when adapting free space oriented measures to the road networks, some noise-tolerant measures such as ERP and EDR tend to be noise-sensitive and others such as DTW and LCSS still are noise-tolerant. This indicates that some similarity measures are not suitable for being adjusted to the road networks.

TABLE VI: Efficiency of Learning-based Measures

Measures in Free Space	AIS				Geolife
	NEUTRAJ		Traj2SimVec		NEUTRAJ		Traj2SimVec
	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)
DTW	125.01	46.80	135.19	6.59	58.72	9.70	66.95	2.96
LCSS	107.87	42.59	143.94	5.96	59.25	9.64	65.83	3.36
EDR	126.02	45.89	138.36	6.13	54.16	9.01	66.58	3.35
ERP	111.30	47.14	141.94	5.99	57.64	9.43	65.85	3.94
Frechet	104.01	42.80	136.37	7.46	53.20	9.61	66.55	3.65
Hausdorff	113.38	46.89	135.45	6.61	57.80	9.60	63.30	3.59
Measures in Road Network	T-Drive				Porto
	GTS		ST2Vec		GTS		ST2Vec
	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)
NetLCSS	8.62	2.67	171.84	3.61	6.31	0.96	106.22	1.23
NetDTW	8.66	2.66	166.04	2.56	6.41	1.33	106.80	1.20
TP	8.64	2.63	165.75	3.14	6.35	1.08	119.15	1.15
NetERP	9.83	2.82	212.68	12.32	6.34	0.97	111.29	1.20

6.3.2 Learning-based measures

Effect of Sampling Rates. We perform downsampling operations with the same S (%)=10, 20, and 40 to obtain a set of transformed datasets. Then, we directly apply the deep models trained on the original datase (i.e., N=100 $\%$ ) to perform the Top- $50$ similarity queries on the transformed datasets and then compute $\textit{HR}@50$ . Table III depicts the results. We observe that all similarity functions learned by NEUTRAJ and Traj2SimVec are stable when varying sampling rates. This confirms the high robustness of learning-based similarity, which is able to learn and accommodate various sampling rates and thus is not affected by sampling points.

Effect of Noise Ratios. We perform the same downsampling operations in learning settings as those in non-learning settings to obtain a set of transformed datasets. Then, we directly apply the deep models trained on the original dataset (i.e., $\textit{N=}0\%$ ) to perform the Top- $50$ similarity queries on the transformed datasets and then compute $\textit{HR}@50$ . Tables IV, and V show the results on AIS, Geolife, T-drive, and Porto, respectively. As reported in Table IV, all similarity measures learned by NEUTRAJ and Traj2SimVec are robust to noise ratios, and the performance keeps stable when adding noises. This indicates that NEUTRAJ and Traj2SimVec are noise-tolerant, and outperform non-learning based measures significantly. However, as reported in Table V, the performance of all similarity measures learned by ST2Vec and GTS significantly drops when adding more noises.

6.4 Efficiency Evaluation

We evaluate the efficiency of each measure by performing Top- $50$ similarity queries.

6.4.1 Non-learning based measures

Given a query trajectory $QT$ , we record the query time (denoted as $Q$ ) of using free space oriented and road network constrained measures in Figure 16. First, Figures 16(a) and 16(b) indicate that the running time of DTW, LCSS, EDR, ERP, Frechet, OWD, and Hausdorff are similar. LIP achieves the highest efficiency (the running time is around 4 seconds), while EDwP achieves the lowest efficiency (the running time is more than 500 seconds). This is because (i) the time complexity of LIP is O((m+n)log(m+n)), while that of the remaining measures is O(mn); and (ii) EdwP maintains four dynamic processing tables for similarity computation while others only maintains one.

Next, Figures 16(c) and 16(d) suggest that the running time of LCRS, LORS, NetEDR, NetLCSS, and NetDTW are less than 10 seconds on T-Drive and 1 seconds on Porto, respectively, while that of TP and NetERP are much larger. Specifically, LCRS and LORS achieve the best efficiency in road network settings. This is because TP additionally considers temporal information of trajectories, and NetERP needs to compute the sum of distances between road intersections. Note that, we need to pre-compute the road network distance matrices for evaluating most of the standalone trajectory measures, which takes more than two days. In contrast, LCRS and LORS compute the overlapping segments as inter-trajectory distance, which does not need the road network distance matrices.

Finally, all of the distributed measures (i.e., DFT, DITA, REPOSE) achieve much higher efficiency than standalone measures (i.e., Seg-Frechet and DTW). Among DFT, DITA, and REPOSE, REPOSE achieves the best efficiency, and DFT performs the worst. This is because the former leverages a novel heterogeneous global partitioning strategy to achieve load balancing, while the latter traverses R trees to find a pruning boundary before querying.

6.4.2 Learning-based measures

Given a query trajectory $QT$ , we record the training time (denoted as $T_{tra}$ ) and query time (denoted as $Q$ ) of using free space oriented and road network constrained measures for querying on four datasets in Table VI.

TABLE VII: Scalability Evaluation of Learning-based Measures on Training Time and Query Time vs. Trajectory Length (AIS & T-Drive)

$L(\%)$	$L$ =20				$L$ =60				$L$ =100
	NEUTRAJ		Traj2SimVec		NEUTRAJ		Traj2SimVec		NEUTRAJ		Traj2SimVec
Measures in Free Space	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)
DTW	101.12	42.19	45.36	4.85	114.06	43.72	91.92	6.82	125.01	46.80	135.19	6.59
LCSS	105.10	40.36	44.68	4.40	108.21	45.33	96.79	5.90	107.87	42.59	143.94	5.97
EDR	100.77	41.67	44.92	4.37	111.34	44.40	92.74	5.73	126.02	45.89	138.36	6.13
ERP	103.86	41.56	42.34	4.82	113.05	42.80	90.34	5.82	111.30	47.14	141.94	5.99
Frechet	103.86	41.33	44.89	4.80	114.11	41.48	96.56	6.83	104.01	42.80	136.37	7.46
Hausdorff	103.87	41.67	44.78	4.87	113.76	42.96	93.47	7.28	113.38	46.89	135.45	6.61
	GTS		ST2Vec		GTS		ST2Vec		GTS		ST2Vec
Measures in Road Network	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)
NetLCSS	3.08	0.79	27.13	0.81	5.84	2.21	84.69	1.93	8.62	2.67	171.84	3.61
NetDTW	2.98	0.77	25.58	0.82	5.35	1.36	87.90	1.97	8.66	2.66	166.04	2.56
TP	3.06	0.65	27.21	0.92	5.29	1.71	84.04	1.97	8.64	2.63	165.75	3.14
NetERP	3.06	0.81	26.17	0.94	5.45	1.69	82.56	1.66	8.83	2.82	212.68	12.32

TABLE VIII: Scalability Evaluation of Learning-based Measures on Training Time and Query Time vs. Data Cardinality (AIS & T-Drive)

$O_{r}(\%)$	$O_{r}$ =20				$O_{r}$ =60				$O_{r}$ =100
	NEUTRAJ		Traj2SimVec		NEUTRAJ		Traj2SimVec		NEUTRAJ		Traj2SimVec
Measures in Free Space	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)
DTW	100.92	18.31	45.26	1.30	122.88	29.11	106.01	3.32	125.01	46.80	135.19	6.59
LCSS	109.90	18.26	29.64	1.28	125.88	29.26	107.02	3.39	107.87	42.59	143.94	5.97
EDR	100.79	18.08	44.52	1.33	123.67	29.69	106.56	3.21	126.02	25.89	138.36	6.13
ERP	101.95	18.48	45.35	1.33	123.46	29.56	106.58	3.46	111.30	47.14	141.94	5.99
Frechet	102.10	18.32	43.69	1.32	126.28	29.26	107.24	4.55	104.01	42.80	136.37	7.46
Hausdorff	101.10	18.39	44.58	1.30	122.48	29.20	106.65	3.34	113.38	46.89	135.45	6.61
	GTS		ST2Vec		GTS		ST2Vec		GTS		ST2Vec
Measures in Road Network	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}T$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)	$T_{tra}$ (s)	$Q$ (ms)
NetLCSS	1.46	0.41	39.70	0.64	4.71	1.46	84.95	1.83	8.62	2.67	171.84	3.61
NetDTW	1.49	0.50	28.98	0.52	4.72	1.28	84.58	1.52	8.66	2.66	166.04	2.56
TP	1.49	0.51	27.07	0.50	4.66	1.28	86.04	2.04	8.64	2.63	165.75	3.14
NetERP	1.49	0.52	30.40	0.59	5.33	1.24	90.59	1.88	8.83	2.82	212.68	12.32

First, the average query time of the learning based methods is 1-2 orders of magnitude lower than that of the non-learning based measures (cf. Figures 16(a) and 16(b)). Next, in road network, ST2Vec consumes more training time than GTS, while their query time is similar. This is because ST2Vec considers temporal information, which needs to be computed during model training. Finally, ST2Vec spends more training time on NetERP than other measures. In particular, NetERP is designed specifically for spatial data. When NetERP is adapted to process spatio-temporal data, it requires additional time to compute the sum of temporal distances between road vertices and reference points. In contrast, other measures are designed for spatio-temporal data, and does not leverage the reference points when calculating similarities.

6.5 Scalability Evaluation

We verify the scalability of each measure by varying the lengths and cardinality of trajectories. We report the index building time (denoted as $T_{idx}$ ) of distributed measures, training time (denoted as $T_{tra}$ ) of learning-based measures, and Top- $50$ query time (denoted as $Q$ ) of all measures. Note that, DISON (a distributed version of LCRS), which is the distributed version of LCRS and the single road network oriented measure designed for distributed environment, only supports threshold-based similarity queries. Specifically, given a query trajectory $QT$ , DISON needs to specify a threshold $\tau$ , and returns a trajectory set, such that the similarity between each two trajectories in the set and $QT$ are larger than $\tau$ . Thus, DISON does not support Top- $k$ similarity queries. In this case, we set $\tau\textit{=}$ 0.8, where the trajectories returned by DISON are 50, i.e., it is the same as the number of trajectories returned by other distributed measures. In addition, we implement the local and global indexes assocaited with the distributed measures (i.e., an R-Tree index of DFT, an R-Tree and a Trie-like indexes of DITA, an RP-Tree of REPOSE, and a two-layer index with hashmap of DISON) for a fair and objective comparison.

6.5.1 Non-learning based measures

Standalone methods. Figures 17 and 18 illustrate the query time of standalone measures when varying trajectory lengths and data cardinality. First, the query time of most standalone measures (except for LIP, LCRS, and LORS) increases significantly as trajectory length or data cardinality grows, which has poor scalablity. Next, as depicted in Figure 18, the query time of LIP increases slightly with the growth of data cardinality. This is because the time complexity of LIP is O((m+n)log(m+n)), which is lower than that of other standalone measures. LCRS’s and LORS’s query time remain stable when data cardinality rises from 60% to 100%. The reason is that both LCRS and LORS compute the similarity between a pair of trajectories by counting the number of their overlapping road segments, instead of calculating pairwise distances between their points or road vertices. This enables LCRS and LORS to process large datasets.

Distributed methods. Figures 19 and 20 show the index building time and query time of distributed measures when varying trajectory lengths and data cardinality. First, compared with Figures 17 and 18, distributed measures achieve much higher query efficiency than standalone measures, when varying both trajectory lengths and data cardinality on both datasets.

Second, the query time and index building time of DITA, REPOSE, and DISON vary gently with both trajectory lengths and cardinality on both datasets; while those of DFT increase rapidly with the growth of trajectory lenghs and cardinality (cf. Figures 17 and 18). Specifically, DFT (i) collects the trajectories’ ids in each node of the index; and (ii) calculates the distances between each data partition and query trajectory $QT$ in order to select a threshold for pruning. Clearly, the time costs of these operations increase when the trajectory length and data cardinality grow.

Next, Figures 19(a) and (d) show that REPOSE and DITA always perform better than DFT in terms of query time, when varying trajectory length and data cardinality on both AIS and Geolife. DFT uses R-tree for indexing while others use trie-like index. Trie-like index is a multi-level structure where each level stores a set of trajectory points, with the points that have earlier timestamps located at higher levels of the index, i.e., the root of the index represents the starting point of a trajectory, and the leaf nodes represent the end point. The similarity computation of both DITA and DFT requires to calculate the distance between each pair of GPS points in a chronological order. Thus, DITA’s trie-like index allows for efficient computation of similarity between GPS points while the data is being indexed. In contrast, the R-tree does not organize the points based on their chronological order. Thus, DFT is less efficient than DITA.

Finally, as depicted in Figures 19(c), 19(f), 20(c), and 20(f), the query time $Q$ of DISON is always less than 2 seconds on T-Drive, and is always less than 0.2 seconds on Porto, and slightly increases with the growth of trajectory length and data cardinality. On the contrary, the query time of LCRS is more than 3 seconds on T-Drive and 0.4 seconds on Porto, and drastically increases with trajectory length and data cardinality (cf. Figures 16(c) and 16(d)). This means that the distributed techniques are enable to enhance the scalability of similarity computation in road networks.

6.5.2 Learning-based methods

We evaluate the scalability of learning-based methods only on AIS and T-Drive, due to the space limitation and their larger sizes than Geolife’s and Porto’s. Tables VII and VIII list the training time and query time of learning-based measures when varying trajectory lengths and data cardinality. First, the query time of all measures slightly increase with the growth of trajectory length and data cardinality; while the training time increases significantly. This is because when trajectory lengths or data cardinality rises, learning-based methods need more time to capture the information of sampling points for model training; then they just apply the well-trained model to obtain the Top- $50$ results in a constant time. Finally, the query time of learning-based measures are much lower than that of non-learning based measures. This indicates that learning-based measures are able to accelerate the similarity computation.

6.6 Study of Metric Similarity Measures

Triangle inequality and indexing can be exploited to filter unqualified results and accelerate similarity computation, respectively, for metric similarity measures (cf. Section 3), and thus can affect the performance of them significantly. In this subsection, we evaluate the impacts of triangle inequality-based pruning and different indexing structures on the performance of three metric measures: ERP, Frechet, and Hausdorff. Specifically, given a query trajectory $QT$ , we apply the three measures to perform top-50 queries, and study their index building time (denoted as $T_{idx}$ ), query time (denoted as $Q$ ), and Prune rate (denoted as Prune Rate) when the three indexing structures are used, respectively. Here, $\textit{PruneRate}=\frac{|\textit{TrajNum}_{pruned}|}{|\textit{TrajNum}_{all}|}$ is designed to verify the pruning performance, where $\textit{TrajNum}_{pruned}$ is the number of filtered trajectories and $\textit{TrajNum}_{all}$ is the total number of trajectories. Clearly, the higher the PruneRate, the higher the pruning performance. Note that, three representative pivot-based indexes LEASA [40], MVP-Tree (MVPT) [6], and PM-Tree (PMT) [56] are selected, while HF method [10] is used to select pivot trajectories, where the pivot number is set to 5 following the previous survey [11].

Figure 21 shows the results. First, the indexing helps to reduce the query time of all metric measures, i.e., the lower the query time, the higher the PruneRate (cf. Figures 16(a)–(b)). Second, the rank of the search time among three measures changes when applying indexing structures. For example as shown in Figure 16(a), among Frechet, ERP, and Hausdorff, Frechet achieves the highest efficiency and Hausdorff performs worse than ERP on AIS when there is no indexing structure applied; however, Hausdorff achieves higher efficiency than ERP on AIS when using LEASA and MVPT, as illustrated in Figure 21(b). Finally, the index building time and pruning performance of different measures with the same indexing structure are different. As shown in Figure 21(a) and 21(c), Frechet spends less time on building index, and has more stable and higher PruneRate than ERP and Hausdorff.

Overall, it is necessary to build indexes and explore triangle inequality-based pruning before evaluating the performance of a metric similarity measure, as those auxiliary techniques may enhance the effectiveness of the metric similarity measures greatly.

6.7 Experimental Takeaways

We perform an independent and comprehensive experimental study on trajectory similarity measures, and analyze their effectiveness, robustness, efficiency, and scalability. The insights are summarized below.

•

The effect of most trajectory similarity measures is greatly affected by the distributions and features of datasets. Specifically, Frechet, Hausdorff, LCRS, and LORS always perform well and stable in different datasets; while ERP performs the worst.
•

Most measures only compute the spatial distance of trajectories to measure the similarity; however, they perform effectively in spatio-temporal scenarios when being directly extended to temporal distance (even though the time cost is high).
•

The point-based measures perform better in free space; while the segment-based measures are more effective in road network. In addition, some measures (i.e., DTW and LCSS) in free space can be directly extended to the road networks and preserve good performance; while others (i.e., EDR and ERP) are not suitable for being adjusted to the road networks, which affects the robustness.
•

For metric similarity measures, building indexes and exploring triangle inequality-based pruning may help to enhance their efficiency significantly.
•

In most cases, learning based measures are more robust in time consumption over different-scale datasets. Standalone-based measures have obviously lower efficiency and scalability but higher effectiveness. Overall, distributed-based measures enable to improve time performance while ensuring high effectiveness of standalone-based measures, which are promising in the study of trajectory similarity measures.

7 Future Directions

Although massive efforts are devoted to spatio-temporal trajectory similarity/distance measures, many challenges still exist to be addressed. In this section, we provide the following potential future research directions.

Temporality. Trajectory data includes both spatial and temporal information. Nonetheless, most of the existing trajectory similarity measures only consider spatial information, while only TP [54] considers both spatial and temporal information. In addition, the time complexity of TP is high, which cannot be efficiently applied to downstream tasks that require to capture time-dependent features (e.g., traffic flow forecasting). Hence, it is of interest to design efficient trajectory similarity measures that consider both spatial and temporal information.

Timeliness. With the wide applications of location services and positioning technologies, a large amount of GPS trajectory data is continuously collected as streaming data, making trajectory similarity computation more challenging. However, existing measures are very costly, failing to meet the real-time requirements of downstream tasks (e.g., ridesharing). Thus, real-time or online trajectory similarity computation is also a potential research direction.

Privacy. Due to the sensitive of private location information, the privacy protection is required when processing the trajectories. None of existing trajectory similarity measures consider the data privacy, and it is inefficient to directly apply existing measures to support high query efficiency/quality while achieving privacy protection. Therefore, it is also of interest to achieve effective and efficient similarity computation while preserving the privacy.

Effectiveness. The goal of trajectory similarity computation is to benefit the downstream trajectory analytics. As various similarity measures exist, how to select proper measure for different analysis tasks is important. For example, most of the existing similarity measures (e.g., DTW, NetERP, etc.) are designed for trajectory retrieval and clustering, which are ineffective to the tasks such as anomaly detection. Consequently, it is a promising way to combine similarity measure selection with downstream tasks to enhance the effect of trajectory analytics.

8 Conclusions

In this paper, we review spatio-temporal trajectory distance measures from three-dimensional perspectives. Then, we offer an evaluation benchmark to examine the effectiveness, robustness, efficiency, and scalability of each measure. According to experimental results, we give objective insights for trajectory measure selection in varying scenarios. Based on the issues to be addressed in the community, we believe effective/interpretable AI-driven similarity analysis as well as distributed similarity analysis in road networks or high-dimensional embedding space, are promising directions.

References

[1] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, pages 376–385, 2008.
[2] H. Alt and M. Godau. Computing the fréchet distance between two polygonal curves. Int. J. Comput. Geom. Appl., 5:75–91, 1995.
[3] G. L. Andrienko, N. V. Andrienko, and S. Wrobel. Visual analytics tools for analysis of movement data. SIGKDD, 9(2):38–46, 2007.
[4] S. Atev, G. Miller, and N. P. Papanikolopoulos. Clustering of vehicle trajectories. TITS, 11(3):647–657, 2010.
[5] N. M. Blachman. On fourier series for gaussian noise. Inf. Control., 1(1):56–63, 1957.
[6] T. Bozkaya and Z. M. Özsoyoglu. Distance-based indexing for high-dimensional metric spaces. In SIGMOD, pages 357–368, 1997.
[7] S. Brakatsoulas, D. Pfoser, R. Salas, and C. Wenk. On map-matching vehicle tracking data. pages 853–864, 2005.
[8] W. Cao, Z. Wu, D. Wang, J. Li, and H. Wu. Automatic user identification method across heterogeneous mobility data sources. In ICDE, pages 978–989, 2016.
[9] L. Chen, Y. Gao, Z. Fang, X. Miao, C. S. Jensen, and C. Guo. Real-time distributed co-movement pattern detection on streaming trajectories. PVLDB, 12(10):1208–1220, 2019.
[10] L. Chen, Y. Gao, X. Li, C. S. Jensen, and G. Chen. Efficient metric indexing for similarity search. In ICDE, pages 591–602, 2015.
[11] L. Chen, Y. Gao, X. Song, Z. Li, Y. Zhu, X. Miao, and C. S. Jensen. Indexing metric spaces for exact similarity search. ACM Comput. Surv., 55(6):128:1–128:39, 2020.
[12] L. Chen and R. T. Ng. On the marriage of lp-norms and edit distance. In VLDB, pages 792–803, 2004.
[13] L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD, pages 491–502, 2005.
[14] Q. Chen, L. Chen, X. Lian, Y. Liu, and J. X. Yu. Indexable PLA for efficient similarity search. In VLDB, pages 435–446, 2007.
[15] R. S. de Sousa, A. Boukerche, and A. A. F. Loureiro. Vehicle trajectory similarity: Models, methods, and applications. ACM Comput. Surv., 53(5):94:1–94:32, 2020.
[16] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. J. Keogh. Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB, 1(2):1542–1552, 2008.
[17] S. Dodge, P. Laube, and R. Weibel. Movement similarity assessment using symbolic representation of trajectories. GIS.
[18] Z. Fang, Y. Du, L. Chen, Y. Hu, Y. Gao, and G. Chen. E ${}^{\mbox{2}}$ dtc: An end to end deep trajectory clustering framework via self-training. In ICDE, pages 696–707, 2021.
[19] Z. Fang, Y. Du, X. Zhu, D. Hu, L. Chen, Y. Gao, and C. S. Jensen. Spatio-temporal trajectory similarity learning in road networks. In SIGKDD, pages 347–356, 2022.
[20] Z. Fang, L. Pan, L. Chen, Y. Du, and Y. Gao. MDTP: A multi-source deep traffic prediction framework over spatio-temporal trajectory data. PVLDB, 14(8):1289–1297, 2021.
[21] E. Frentzos, K. Gratsias, and Y. Theodoridis. Index-based most similar trajectory search. In ICDE, pages 816–825, 2007.
[22] B. Friedrich, B. Cauchi, A. Hein, and S. J. F. Fudickar. Transportation mode classification from smartphone sensors via a long-short-term-memory network. CoRR, abs/1910.04739, 2019.
[23] A. Gasmelseed and N. H. Mahmood. Study of hand preferences on signature for right- handed and left-handed peoples. Int. J. Adv. Eng. Technol, 2011.
[24] R. H. Güting, T. Behr, and J. Xu. Efficient k-nearest neighbor search on moving object trajectories. VLDBJ, 19(5):687–714, 2010.
[25] A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47–57, 1984.
[26] P. Han, J. Wang, D. Yao, S. Shang, and X. Zhang. A graph-based approach for trajectory similarity computation in spatial networks. In SIGKDD, pages 556–564, 2021.
[27] X. Huang, Y. Yin, S. Lim, G. Wang, B. Hu, J. Varadarajan, S. Zheng, A. Bulusu, and R. Zimmermann. Grab-posisi: An extensive real-life GPS trajectory dataset in southeast asia. In SIGSPATIAL, pages 1–10, 2019.
[28] C. Hung, W. Peng, and W. Lee. Clustering and aggregating clues of trajectories for mining trajectory patterns and routes. VLDBJ, 24(2):169–192, 2015.
[29] E. J. Keogh and M. J. Pazzani. Scaling up dynamic time warping for datamining applications. In SIGKDD, pages 285–289, 2000.
[30] S. Koide, C. Xiao, and Y. Ishikawa. Fast subtrajectory similarity search in road networks under weighted edit distance constraints. PVLDB, 13(11):2188–2201, 2020.
[31] I. Kontopoulos, A. Makris, D. Zissis, and K. Tserpes. A computer vision approach for trajectory classification. In MDM, pages 163–168, 2021.
[32] J. Lee, J. Han, and K. Whang. Trajectory clustering: a partition-and-group framework. In SIGMOD, pages 593–604, 2007.
[33] S. T. Leutenegger, J. M. Edgington, and M. A. López. STR: A simple and efficient algorithm for r-tree packing. In ICDE, pages 497–506, 1997.
[34] T. Li, L. Chen, C. S. Jensen, and T. B. Pedersen. TRACE: real-time compression of streaming trajectories in road networks. PVLDB, 14(7):1175–1187, 2021.
[35] X. Li, J. Han, S. Kim, and H. Gonzalez. ROAM: rule- and motif-based anomaly detection in massive moving object data sets. In SIAM, pages 273–284, 2007.
[36] X. Li, K. Zhao, G. Cong, C. S. Jensen, and W. Wei. Deep representation learning for trajectory similarity computation. In ICDE, pages 617–628, 2018.
[37] Z. Li, J. Han, M. Ji, L. Tang, Y. Yu, B. Ding, J. Lee, and R. Kays. Movemine: Mining moving object data for discovery of animal movement patterns. TIST, 2(4):37:1–37:32, 2011.
[38] S. Ma, Y. Zheng, and O. Wolfson. T-share: A large-scale dynamic taxi ridesharing service. In ICDE, pages 410–421, 2013.
[39] J. D. Mazimpaka and S. Timpf. Trajectory data mining: A review of methods and applications. J. Spatial Inf. Sci., 13(1):61–99, 2016.
[40] L. Micó, J. Oncina, and E. Vidal. A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognit. Lett., 15(1):9–17, 1994.
[41] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.
[42] M. F. Mokbel, C. Chow, and W. G. Aref. The new casper: Query processing for location services without compromising privacy. In VLDB, pages 763–774, 2006.
[43] C. Myers, L. R. Rabiner, and A. E. Rosenberg. Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Trans. Acous. Speech Signal Process, 28(6):623–635, 1980.
[44] K. S. Naveh and J. Kim. Urban trajectory analytics: Day-of-week movement pattern mining using tensor factorization. TITS, 20(7):2540–2549, 2019.
[45] OpenStreetMap. https://master.apis.dev.openstreetmap.org/. 2022.
[46] N. Pelekis, I. Kopanakis, G. Marketos, I. Ntoutsi, G. L. Andrienko, and Y. Theodoridis. Similarity search in trajectory databases. In TIME, pages 129–140, 2007.
[47] A. project. https://marinecadastre.gov/ais/. 2019.
[48] T.-D. Project. https://www.microsoft.com/en-us/research/publication/t-drive-trajectorydatasample/. 2008.
[49] S. Ranu, D. P, A. D. Telang, P. Deshpande, and S. Raghavan. Indexing and matching trajectories under inconsistent sampling rates. In ICDE, pages 999–1010, 2015.
[50] Y. S. Resheff. Online trajectory segmentation and summary with applications to visualization and retrieval. In IEEE BigData, pages 1832–1840, 2016.
[51] M. T. Robinson. The temporal development of collision cascades in the binary collision approximation. Nucl. Instrum. Methods Phys. Res. Sect. B, 48(1-4):408–413, 1990.
[52] S. Salvador and P. Chan. Toward accurate dynamic time warping in linear time and space. Intell. Data Anal., 11(5):561–580, 2007.
[53] A. C. Sanderson and A. K. C. Wong. Pattern trajectory analysis of nonstationary multivariate data. IEEE Trans. Syst. Man Cybern., 10(7):384–392, 1980.
[54] S. Shang, L. Chen, Z. Wei, C. S. Jensen, K. Zheng, and P. Kalnis. Trajectory similarity join in spatial networks. PVLDB, 10(11):1178–1189, 2017.
[55] Z. Shang, G. Li, and Z. Bao. DITA: distributed in-memory trajectory analytics. In SIGMOD, pages 725–740, 2018.
[56] T. Skopal, J. Pokorný, and V. Snásel. Pm-tree: Pivoting metric tree for similarity search in multimedia databases. In ADBIS, 2004.
[57] H. Su, S. Liu, B. Zheng, X. Zhou, and K. Zheng. A survey of trajectory distance measures and performance evaluation. PVLDBJ, 29(1):3–32, 2020.
[58] Y. Suzuki, J. Ishizuka, and K. Kawagoe. A similarity search of trajectory data using textual information retrieval techniques. In DASFAA, volume 4947, pages 627–634, 2008.
[59] N. Ta, G. Li, Y. Xie, C. Li, S. Hao, and J. Feng. Signature-based trajectory similarity join. TKDE, 29(4):870–883, 2017.
[60] D. A. Tedjopurnomo, X. Li, Z. Bao, G. Cong, F. M. Choudhury, and A. K. Qin. Similar trajectory search with spatio-temporal deep representation learning. TIST, 12(6):77:1–77:26, 2021.
[61] Y. Tong, Z. Zhou, Y. Zeng, L. Chen, and C. Shahabi. Spatial crowdsourcing: a survey. VLDBJ, 29(1):217–250, 2020.
[62] K. Toohey and M. Duckham. Trajectory similarity measures. ACM SIGSPATIAL, 7(1):43–50, 2015.
[63] Uber. https://expandedramblings.com/index.php/uber-statistics/. 2022.
[64] M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In ICDE, pages 673–684, 2002.
[65] J. Wang, N. Wu, X. Lu, W. X. Zhao, and K. Feng. Deep trajectory recovery with fine-grained calibration using kalman filter. IEEE Trans. Knowl. Data Eng., 33(3):921–934, 2021.
[66] S. Wang, Z. Bao, J. S. Culpepper, and G. Cong. A survey on trajectory data management, analytics, and learning. ACM Comput. Surv., 54(2):39:1–39:36, 2021.
[67] S. Wang, Z. Bao, J. S. Culpepper, Z. Xie, Q. Liu, and X. Qin. Torch: A search engine for trajectory data. In SIGIR, pages 535–544, 2018.
[68] S. Wang, J. Cao, and P. S. Yu. Deep learning for spatio-temporal data mining: A survey. CoRR, abs/1906.04928, 2019.
[69] S. Wang, Y. Sun, and Z. Bao. On the efficiency of k-means clustering: Evaluation, optimization, and algorithm selection. PVLDB, 14(2):163–175, 2020.
[70] T. U. Wien, T. Eiter, T. Eiter, H. Mannila, and H. Mannila. Computing discrete fréchet distance. Technical report, Citeseer, 64(3):636–637, 1994.
[71] D. Xie, F. Li, and J. M. Phillips. Distributed trajectory similarity search. PVLDB, 10(11):1478–1489, 2017.
[72] X. Xie, W. Philips, P. Veelaert, and H. K. Aghajan. Road network inference from GPS traces using DTW algorithm. In ITSC, pages 906–911, 2014.
[73] C. Yang and G. Gidófalvi. Fast map matching, an algorithm integrating hidden markov model with precomputation. Int. J. Geogr. Inf. Sci., 32(3):547–570, 2018.
[74] P. Yang, H. Wang, D. Lian, Y. Zhang, L. Qin, and W. Zhang. TMN: trajectory matching networks for predicting similarity. In ICDE, pages 1700–1713, 2022.
[75] P. Yang, H. Wang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. T3S: effective representation learning for trajectory similarity computation. In ICDE, pages 2183–2188, 2021.
[76] D. Yao, G. Cong, C. Zhang, and J. Bi. Computing trajectory similarity in linear time: A generic seed-guided neural metric learning approach. In ICDE, pages 1358–1369, 2019.
[77] D. Yao, H. Hu, L. Du, G. Cong, S. Han, and J. Bi. Trajgat: A graph-based long-term dependency modeling approach for trajectory similarity computation. In SIGKDD, pages 2275–2285, 2022.
[78] D. Yao, C. Zhang, Z. Zhu, J. Huang, and J. Bi. Trajectory clustering via deep representation learning. In IJCNN, pages 3880–3887, 2017.
[79] B. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In ICDE, pages 201–208, 1998.
[80] H. Yuan and G. Li. Distributed in-memory trajectory similarity search and join on road network. In ICDE, pages 1262–1273, 2019.
[81] H. Yuan, G. Li, Z. Bao, and L. Feng. Effective travel time estimation: When historical trajectories over road networks matter. In SIGMOD, 2020.
[82] J. Yuan, Y. Zheng, C. Zhang, X. Xie, and G. Sun. An interactive-voting based map matching algorithm. In MDM, pages 43–52, 2010.
[83] D. Zhang, Z. Chang, S. Wu, Y. Yuan, and G. Chen. Continuous trajectory similarity search for online outlier detection. IEEE TKDE, PP(99):1–1, 2020.
[84] H. Zhang, X. Zhang, Q. Jiang, B. Zheng, Z. Sun, W. Sun, and C. Wang. Trajectory similarity learning with auxiliary supervision and optimal matching. In IJCAI, pages 3209–3215, 2020.
[85] B. Zheng, L. Weng, X. Zhao, K. Zeng, X. Zhou, and C. S. Jensen. REPOSE: distributed top-k trajectory similarity search with local reference point tries. In ICDE, pages 708–719, 2021.
[86] K. Zheng, Y. Zheng, X. Xie, and X. Zhou. Reducing uncertainty of low-sampling-rate trajectories. In ICDE, pages 1144–1155, 2012.
[87] Y. Zheng. Trajectory data mining: An overview. TIST, 6(3):29:1–29:41, 2015.
[88] Y. Zheng, L. Liu, L. Wang, and X. Xie. Learning transportation mode from raw gps data for geographic applications on the web. In WWW, pages 247–256, 2008.
[89] Y. Zheng, L. Wang, R. Zhang, X. Xie, and W. Ma. Geolife: Managing and understanding your past life over maps. In MDM, pages 211–212, 2008.
[90] X. Zuo and X. Jin. General hierarchical model (GHM) to measure similarity of time series. SIGMOD, 36(1):13–18, 2007.