This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Unsupervised Abnormal Stop Detection for Long Distance Coaches with Low-Frequency GPS

Jiaxin Deng, Junbiao Pang, Jiayu Xu, and Haitao Yu J. Deng, J. Pang and J. Xu are with the School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China (e-mail: [email protected]). H. Yu is the Beijing Intelligent Transportation Development Center, Beijing 100161, China (email: [email protected])
Abstract

In our urban life, long distance coaches supply a convenient yet economic approach to the transportation of the public. One notable problem is to discover the abnormal stop of the coaches due to the important reason, i.e., illegal pick up on the way which possibly endangers the safety of passengers. It has become a pressing issue to detect the coach abnormal stop with low-quality GPS. In this paper, we propose an unsupervised method that helps transportation managers to efficiently discover the Abnormal Stop Detection (ASD) for long distance coaches. Concretely, our method converts the ASD problem into an unsupervised clustering framework in which both the normal stop and the abnormal one are decomposed. Firstly, we propose a stop duration model for the low frequency GPS based on the assumption that a coach changes speed approximately in a linear approach. Secondly, we strip the abnormal stops from the normal stop points by the low rank assumption. The proposed method is conceptually simple yet efficient, by leveraging low rank assumption to handle normal stop points, our approach enables domain experts to discover the ASD for coaches, from a case study motivated by traffic managers. Datset and code are publicly available at: https://github.com/pangjunbiao/IPPs.

Index Terms:
Abnormal Stop Detection, Long range coach, Low-rank Matrix Decomposition

I INTRODUCTION

Long-range coaches, as a common approach of public transportation in China, offering convenient, economical and time-efficient long distance services, compared to railway travel [1] [2]. Yet, managing long-range coach industry presents unique challenges due to their long distances and the long time operation, distinguishing them from the other transportation systems, e.g., buses, subways or railway. A notable issue within the long-range coach industry is Illegal Passenger Pickups (IPPs) in China, where:

Definition 1 (Definition of IPP).

Coaches abnormally stop at the non-designated locations to pick up or drop off passengers and their luggage.

Local laws stipulate that passenger coaches must not pick up passengers outside the designated stops without legitimate reasons and must not alter the designated route. The IPPs activity in China tend to bring the several dangers as follows:

  • 1.

    Failure to the real-name ticketing process. In China, passengers and their luggage must be checked for security to protect passengers from the potential dangers. e.g., flammable items, knifes which pose significant safety risks [3].

  • 2.

    Overcrowding problem and passenger’s rights. The tickets sold from stations constrain the number of the passengers in a coach to avoid the overcrowding problem [4] [5], that is a strictly illegal behavior in China. Besides, if traffic accidents occur, the rights of the passenger without tickets would be impaired. Because a ticket usually contains the traffic accident insurance for this travel in China.

  • 3.

    Possibly causing traffic accidents. When the IPPs activity is occurred, passengers usually pick or drop their luggage at the side of a road in a rush [5]. Moreover, coaches usually occupy a lane of a road. The above two problem tends to cause the traffic accidents.

Refer to caption
(a) A motor in a coach
Refer to caption
(b) Stop at an intersection
Figure 1: A case shows that the IPPs activity of long-range coaches tends to carry dangerous luggage (a) and cause traffic accidents (b).222https://auto.zjol.com.cn/zjcw/jtjs/201902/t20190220_9494189.shtml

The motivation of coach drivers is usually to maximize their income by picking up passengers. In order to instantly prevent IPPs and protect the rights of passengers, transport authority currently checks the correspondence between the number passengers and the ticket records at the fixed inspection stations in China333https://www.gd.gov.cn/gdywdt/zfjg/content/post_2286796.html. This scheme is ineffective since coach drivers could easily bypass the manually check by picking up passengers at a spot along the route with instant messengers, e.g., mobile phones. This situation presents a critical issue for the coach industry and the transport authority: How can we effectively and precisely identify the spots where IPPs are often occurred in the data-driven way? Once the spots are identified, the local law enforcers would dynamically check these identified spots to prevent the IPPs activity.

A natural solution is to discover the abnormal stop spots of coaches. Because we have observed that some spots have good traffic conditions for shuttling passengers, e.g., the spot nearby by a bus station. A promising strategy leverages Global Positioning Satellite (GPS) to determine the stop spots (i.e., where the IPPs are occurred) of long-distance coaches. The GPS equipment on each coach in China reveals the locations of stops including the passive stops under the normal scenarios, e.g., congestion, and the red lights at intersections, as well as the active stops e.g., pick up passengers, and avoid an accident. Therefore, in the complex traffic conditions, effectively determining the nature (e.g., passive and active) of the different stop points is a key issue to discover the abnormal stop spots by stripping them from these normal ones.

I-A Research Challenges

How to determine the stop spots from the low-frequency GPS: In practice, there exists large amount of low-sampling-rate (e.g., one point every 2 minutes) GPS trajectories for a coach. They are generated in the scenarios where saving of energy cost and communication cost are desired. Low-frequency GPS typically refers to send a GPS coordinate per 3030 seconds in China (see Table I for more details). In order to capture the abnormal stop spots, we have to first obtain the possible stop spots during the travel service of a coach. However, a 3030 seconds interval is too long to directly judge whether a coach has stopped or not.

Discover the abnormal stop spots: A coach would have various stop behaviors due to the changes of road conditions and different driving behaviors [6]. The abnormal stop spots are the candidate spots where the coach drivers would like to pick up passengers. Therefore, the key challenge of this research is to effectively discover the abnormal stop spots in an unsupervised approach under the complex traffic conditions [7].

I-B Our Contribution

In this paper, we propose a method that aims to resolve the above challenges in an unsupervised approach. Rather than seeking the stop spots by manually screening the physical constraints that conventionally pick up passengers, we instead of directly discover the abnormal stop spots by assuming that the complex traffic system follows the low rank constraints [8]. That is, we cast our quest for “abnormal” stops as the matrix decomposition problem, in which the low rank models the normal stops, while a sparsity residual term represents the abnormal ones.

This paper makes several significant contributions to the field of the long range coach industry management, as outlined below:

Introduction to the IPPs Problem. This study is pioneering in exploring the discovery of the spots of the IPPs activity for long range coaches, only relying on the low-frequency GPS. We approach this challenge as an unsupervised problem, achieving the effective solutions for identifying the candidate spots where the IPPs activities tend to occur.

Discovery of the possible stop spots from the low-frequency GPS: This paper addresses the problem of the low-frequency GPS in particular. We propose a method to identify the possible stop spots by simply assuming that the speed of a coach changes linearly. These candidate stop spots are instrumental in characterizing the stop spots from diverse coach drivers.

Determining anomalous stop spots: We assume that the abnormal stop spots are sparse, compared with the normal ones. Besides, it is impossible for long range coaches to illegally stop at some special spots (e.g., main road). Therefore, we introduced the structured sparsity [9] and the low rank constraint [10] for detecting abnormal stop spots. Our findings reveal the proposed method as the superior method for detecting the spots for the IPPs activities.

II Related Work

To our knowledge, this paper first leverages GPS points to identify the stop spot of the IPPs activity for long range coaches. Therefore, we briefly review the related works about exploiting GPS to discover abnormal spots for the other tasks.

II-A Abnormal detection from GPS

The varying natural environments and traffic conditions [11] [12] [13] leverages GPS to address multiple issues related to traffic safety and efficiency. For instance, by monitoring the movement patterns and speed variations of buses, abnormal driving behaviors can be detected, e.g., abnormal driving behaviors [14] [15], over speeding [16], the deviation from the predetermined route [17] [18] [19], and black spots identification [20]. In a summary, discovering the stop spots for the IPPs activities barely been investigated.

Besides, both the video and audio play a pivotal role  [21] to identify abnormality in human behaviors. Compared with GPS, either video or audio directly reflects the human behaviors, e.g., fighting in surveillance scenes [22]. However, to our best knowledge, how to leverage video or audio to detect the IPPs activity is still an open problem.

II-B Handle Low-frequency GPS in Traffic

Low frequency GPS are application-logged data from GPS embedded equipments on diverse vehicles, e.g., trucks, buses. Different applications enforce diverse priors to handle the information loss caused by the low frequency. For instance, either the connection between the shortest path and the vehicle trajectory [23] or the spatial and the temporal information [24] [25] is used as a prior to tackle the low-frequency GPS for the map matching problem. As a comparison, we assume that a vehicle would follow a linearly changed velocity as a prior to discover the possible stop behavior.

II-C Low Rank for traffic problem

Although transportation in a city is a mixture of the multiple factors, (e.g., the influence of the road conditions, traffic conditions, and climate and environment), recent research has discovered that the spatio-temporal data from the transportation problem has the low-rank structure [7] [26]. Consequently, [27] [28] proposed a non-convex low-rank model to address missing data problem. Non-negative matrix (and tensor) factorization has been widely used for network traffic analysis [29][30] proposed a method based on sparse non-negative matrix factorization to identify the periodic patterns in traffic behavior. In this paper, we strip the abnormal stop spots from the abnormal ones by the low rank and the sparse decomposition.

III Methods

III-A Background

The dataset comprising coach GPS records were sourced from the Beijing transportation information center. Specifically, each coach is equipped with a GPS enabled device that transmits real-time but low frequent data including longitude, latitude, time stamps, instantaneous velocity, and engine state (see Table I). All items are relayed to a data center via telecommunication networks.

Adherence to local coach regulations is mandatory, requiring: 1) connectivity of GPS signals; and 2) accurate GPS time stamps. Non-compliance, indicated by erroneous data transmission, prompts law enforcement to notify coach companies for corrective actions.

Definition 2 (A GPS point).

A GPS point is a pair of longitude and latitude coordinates with a timestamp collected by an embedded GPS device. In this paper, we additionally collect the instantaneous velocity and the engine state of a coach for each GPS point. Therefore, each GPS point 𝐩i\bm{p}_{i} consists of longitude 𝐩i.lng\bm{p}_{i}.lng , latitude 𝐩i.lat\bm{p}_{i}.lat , timestamp 𝐩i.t\bm{p}_{i}.t , instantaneous velocity 𝐩i.v\bm{p}_{i}.v and the engine state 𝐩i.of\bm{p}_{i}.of, as in the example shown in Table I.

Definition 3 (A journey of a coach).

A journey of a coach 𝒯\mathcal{T} consists of a series of continuous GPS points, and each time interval between any two adjacent GPS points does not exceed a certain value Δt\Delta t, which represents the sampling interval; i.e., 𝒯:𝐩1𝐩2𝐩N\mathcal{T}:\bm{p}_{1}\rightarrow\bm{p}_{2}\rightarrow\ldots\rightarrow\bm{p}_{N}, and 0<𝐩i+1.t𝐩i.tΔt(1iN)0<\bm{p}_{i+1}.t-\bm{p}_{i}.t\leq\Delta t(1\leq i\leq N).

TABLE I: The items in a GPS point from a long-range coach
Item Meaning
Time stamp tt Sending time of a GPS point
Longitude lnglng Longitude of a GPS point
Latitude latlat Latitude of a GPS point
Instantaneous velocity vv Velocity
Engine state ofof start; off

III-B Extract candidate stop spots from low-frequency GPS

Definition 4 (the low-frequency GPS).

Low-Frequency (or low-sampling-rate) GPS (LF GPS) data fetch a location every 30 seconds or even more.

Obviously, LF GPS makes the location uncertainty increase as the sampling rate reduces. Consequently, judging whether a coach has stopped or not from LF GPS is very difficult due to the long time intervals. If a coach is assumed to linearly change speed, we have two cases which indicate that a coach has been stopped in 30 seconds as in Fig. 2.

Concretely, given two consecutive GPS points with the velocity viv_{i} and vi+1v_{i+1}, the time interval 30(seconds)30(\text{seconds}) between the consecutive GPS points, and vmv_{m} is the minimum velocity between the two consecutive GPS points, a coach would have a stop behavior as follows:

Case 1: when both the minimal velocity vmv_{m} is equal to 0 and the stop duration Δt=0\Delta t=0, we have:

(vivm)t2+(vi+1vm)(30t)2+30vm=d,\begin{split}\frac{(v_{i}-v_{m})\cdot t}{2}+\frac{(v_{i+1}-v_{m})\cdot(30-t)}{2}+30\cdot v_{m}=d,\end{split} (1)

where tt is the time a coach reaches to the minimum speed vmv_{m}, and dd the distance between the two GPS points (see Fig. 2(a)). If the minimal velocity in Eq. (1), vmv_{m} is equal to 0, (i.e., vm=0v_{m}=0), we have the following equation:

t(vivi+1)=2d30vi+1.\begin{split}t\cdot(v_{i}-v_{i+1})=2d-30v_{i+1}.\end{split} (2)

Therefore, by applying the constrains t>0t>0, we have the following inequality as follows:

2d30vi+1vivi+1>0.\frac{2d-30v_{i+1}}{v_{i}-v_{i+1}}>0. (3)

Case 2: when the minimal velocity vm=0v_{m}=0 and the stop duration Δt>0\Delta t>0, we could assume that a coach has stopped at the time tt as follows (see Fig. 2 (b), (c) ):

vit2+vi+1(30Δtt)2=d,\frac{v_{i}\cdot t}{2}+\frac{v_{i+1}\cdot(30-\Delta t-t)}{2}=d, (4)
t2d30vi+130vmvivi+1wherevm<0.t\leq\frac{2d-30v_{i+1}-30v_{m}}{v_{i}-v_{i+1}}\ \ \text{where}\ \ v_{m}<0. (5)

The key problem to judging whether a coach has stopped or not is: find the conditions that satisfy the non-negativity, i.e., t>0t>0 and Δt>0\Delta t>0.

Refer to caption
(a) case 1:vm=0v_{m}=0
Refer to caption
(b) case 2: vm<0v_{m}<0
Figure 2: The illustration of two cases to judge the stop behavior of a coach.

Therefore, (4) is re-written as follows:

Δt=vitvi+1t+302dvi+1.\Delta t=\frac{v_{i}\cdot t}{v_{i+1}}-t+30-\frac{2d}{v_{i+1}}. (6)

To remove the variable tt, we have the following sub-cases:

Sub-case 2.1: vi+1>viv_{i+1}>v_{i}: Given vi+1>viv_{i+1}>v_{i}, we have vivi+1<1\frac{v_{i}}{v_{i+1}}<1.Consequently, (6) is re-written as follows:

Δt(vivi+11)2d30vi+130vmvivi+1+302dvi+1(vivi+11)2d30vi+1vivi+1+302dvi+1\begin{split}\Delta t&\geq\big{(}\frac{v_{i}}{v_{i+1}}-1\big{)}\cdot\frac{2d-30v_{i+1}-30v_{m}}{v_{i}-v_{i+1}}+30-\frac{2d}{v_{i+1}}\\ &\geq\big{(}\frac{v_{i}}{v_{i+1}}-1\big{)}\cdot\frac{2d-30v_{i+1}}{v_{i}-v_{i+1}}+30-\frac{2d}{v_{i+1}}\end{split} (7)

Eq. (7) give the low bound of the stop duration Δt\Delta t. That is, if the low bound in Eq. (7) is greater than 0, a stop has occurred between the two GPS points.

Sub-case 2.2: vi+1viv_{i+1}\leq v_{i}: Given vi+1viv_{i+1}\leq v_{i}, we have vi×tvi+1t0\frac{v_{i}\times t}{v_{i+1}}-t\geq 0. Consequently, we simplify drop the term vi×tvi+1t\frac{v_{i}\times t}{v_{i+1}}-t in  (4) as follows:

Δt2×dvi+30.\Delta t\geq-\frac{2\times d}{v_{i}}+30. (8)
Algorithm 1 Extract Stop duration from the low-frequency GPS
1:  INPUT: vi,vi+1v_{i},v_{i+1}, and dd
2:  OUTPUT: Stop signal (Yes/No), and stop duration Δt\Delta t (if stopped)
3:  if  (3) is hold then
4:     Δt0\Delta t\leftarrow 0
5:     return ”Yes”, Δt\Delta t
6:  else
7:     if  vi+1>viv_{i+1}>v_{i} then
8:        Δt\Delta t is determined by the Eq.(7)
9:        if Δt>0\Delta t>0 then
10:           return ”Yes”, Δt\Delta t
11:        end if
12:     else if  vi+1viv_{i+1}\leq v_{i} then
13:        Δt\Delta t is determined by the Eq.(8)
14:        if Δt>0\Delta t>0 then
15:           return ”Yes”, Δt\Delta t
16:        end if
17:     end if
18:  end if
19:  return ”No”

III-C Group stop spots into a matrix

Ideally, we expect to precisely discover where a IPP activity occurs. Based on the first Markov assumption in traffic [31] [32], we adaptively segment the route into fine-grained road segments which help law enforcers precisely locate the IPP spots.

Definition 5 (Road segment).

A road segment rr is an un-directed edge. Each road segment is associated with a unique ID ridr_{id}, a starting point 𝐫s\mathbf{r}_{s}, and an ending point 𝐫e\mathbf{r}_{e}. Each point consists of the longitude and latitude coordinates.

In this paper, we uniformly segment a travel line of a coach into the road segments the fixed length DD. The estimated stop time Δt\Delta t in Alg. 1 were assigned to the corresponding road segments. Concretely, if there is are NN stop time Δti,s\Delta t_{i,s}, (s=1,2,,S)(s=1,2,...,S) of a coach in the ii-th road segment during one day, we summarize the stop duration of a coach in the ii-th segment during one day as follows:

𝑹i,j=s=1SΔti,s,\bm{R}_{i,j}=\sum_{s=1}^{S}\Delta t_{i,s}, (9)

where 𝑹M×N\bm{R}\in\mathbb{R}^{M\times N} is the stop duration matrix, MM is the number of the road segments, i=1,2,,Mi=1,2,...,M. NN is the total number of stop durations of each bus in each day after combining the stop durations of the same coach on the same day according to Eq. (9), j=1,2,,Nj=1,2,...,N.

Dealing with location inaccuracy: In many cases, the location information may be miss-located due to the blocked signal transmission. Therefore, a smooth mechanism is preferred over the stop duration Δti,j\Delta t_{i,j} between the ii-th interval and t+1t+1-th one. As shown in Fig.3, a simple and yet reasonable approach uses linear interpolation to smooth the stop duration Δti,j\Delta t_{i,j} as follows:

Δti,jDi,jD×Δti,j,Δti+1,jDDi,jD×Δti,j,\Delta t_{i,j}\leftarrow\frac{D_{i,j}}{D}\times\Delta t_{i,j},\ \ \Delta t_{i+1,j}\leftarrow\frac{D-D_{i,j}}{D}\times\Delta t_{i,j}, (10)

where Di,jD_{i,j} is the distance from the stop spot of a trajectory to the end point of the ii-th road segment.

Refer to caption
Figure 3: The relationship between the travel line, road segments, stop duration and its interpolated durations.

III-D Identify the IPPs spots by the low rank constraint

In order to strip the Abnormal Stop Spots (ASSs) from the matrix 𝑹\bm{R}, we introduce the stop type matrix 𝑰M×N\bm{I}\in\mathbb{R}^{M\times N}, where 𝑰ij{0,1}\bm{I}_{ij}\in\{0,1\}, in which 0 means the abnormal stop, and 11 indicates the normal stop. Consequently, striping the abnormal stops from the normal ones is modeled as follows:

argmin𝑰,𝑬𝚯+λ𝑬1+β𝑾2,1s.t.:𝑹=𝑹𝑰+𝑬,𝚯=𝑹𝑰,𝑬=𝑾,0𝑬,0𝑰ij1,\begin{split}\arg\min_{\bm{I},\bm{E}}\quad&\|\bm{\Theta}\|_{*}+\lambda\|\bm{E}\|_{1}+\beta\|\bm{W}\|_{2,1}\\ s.t.:\quad&\bm{R}=\bm{R}\odot\bm{I}+\bm{E},\\ &\bm{\Theta}=\bm{R}\odot\bm{I},\\ &\bm{E}=\bm{W},\\ &0\leq\bm{E},0\leq\bm{I}_{ij}\leq 1,\end{split} (11)

where the operation \odot is the element-wise multiplication between two matrices, 1\|\cdot\|_{1} is the L1L_{1} norm, 2,1\|\cdot\|_{2,1} is the group sparsity, \|\cdot\|_{*} is the nuclear norm, λ(λ>0)\lambda(\lambda>0) and β(β>0)\beta(\beta>0) are the hyper-parameters. In Eq. (11), we relaxes the 0-1 programming of 𝑰\bm{I}.

Eq. (11) assumes that the stop duration matrix 𝑹\bm{R} is decomposed into a low rank matrix 𝚯=𝑹𝑰\bm{\Theta}=\bm{R}\odot\bm{I}, and a mixture of the sparse matrix 𝑬\bm{E} and column-sparse matrix 𝑾\bm{W}. Motivated by [7], the matrix 𝚯\bm{\Theta} in Eq. (11) represents the normal stop spots, and the sparse matrix 𝑬\bm{E} indicates the abnormal stop spots. Therefore, the column-sparse 2,1\|\cdot\|_{2,1} models that some road segments are impossible to be ASSs, while the sparse 1\|\cdot\|_{1} follows that the ASSs should be sparse compared to the normal stop behaviors of coaches. In summary, Eq. (11) assumes that the ASSs are sparely occurred in some special road segments. Therefore, a larger value of 𝑬ij\bm{E}_{ij} indicates a higher ASS for the ii-th road segment.

III-E Model Optimization

We use the Alternating Direction Method of Multipliers (ADMM) to solve Eq. (11) as follows:

L(𝑰,𝑬,𝑾)=𝚯+λ𝑬1++β𝑾2,1+𝒀1,𝑹𝑹𝑰𝑬+𝒀2,𝚯𝑹𝑰+𝒀3,𝑾𝑬+ρ2𝑹𝑹𝑰𝑬F2+ρ2𝚯𝑹𝑰F2+ρ2𝑾𝑬F2,\begin{split}L(\bm{I},\bm{E},\bm{W})=&\|\bm{\Theta}\|_{*}+\lambda\|\bm{E}\|_{1}++\beta\|\bm{W}\|_{2,1}\\ &+\langle\bm{Y}_{1},\bm{R}-\bm{R}\odot\bm{I}-\bm{E}\rangle\\ &+\langle\bm{Y}_{2},\bm{\Theta}-\bm{R}\odot\bm{I}\rangle+\langle\bm{Y}_{3},\bm{W}-\bm{E}\rangle\\ &+\frac{\rho}{2}\|\bm{R}-\bm{R}\odot\bm{I}-\bm{E}\|_{F}^{2}\\ &+\frac{\rho}{2}\|\bm{\Theta}-\bm{R}\odot\bm{I}\|_{F}^{2}+\frac{\rho}{2}\|\bm{W}-\bm{E}\|_{F}^{2},\end{split} (12)

where 𝒀1M×N\bm{Y}_{1}\in\mathbb{R}^{M\times N}, 𝒀2M×N\bm{Y}_{2}\in\mathbb{R}^{M\times N}, and 𝒀3M×N\bm{Y}_{3}\in\mathbb{R}^{M\times N} are the Lagrange multiplier matrices, and ρ\rho (ρ>0\rho>0) is the penalty factor. The iterative alternating direction method is used to solve the Lagrangian function in Eq. (12).

Update the matrix E\bm{E}: When other variables are fixed, the subproblem w.r.t. 𝑬\bm{E} is as follows:

argmin𝑬λ2𝑬1+ρ2𝑬12(𝑹+𝑾+𝒀1ρ+𝒀3ρ𝑹𝑰)F2,~{}\arg\min_{\bm{E}}\frac{\lambda}{2}\|\bm{E}\|_{1}+\frac{\rho}{2}\big{\|}\bm{E}-\frac{1}{2}(\bm{R}+\bm{W}+\frac{\bm{Y}_{1}^{\top}}{\rho}+\frac{{\bm{Y}_{3}}^{\top}}{\rho}-\bm{R}\odot\bm{I})\big{\|}_{F}^{2}, (13)

which can be solved by the shrinkage method as follows:

𝑬=Sλ/2ρ(12(𝑹+𝑾+𝒀1ρ+𝒀3ρ𝑹𝑰)),\bm{E}=S_{\lambda/2\rho}\left(\frac{1}{2}(\bm{R}+\bm{W}+\frac{\bm{Y}_{1}}{\rho}+\frac{\bm{Y}_{3}}{\rho}-\bm{R}\odot\bm{I})\right), (14)

in which the shrinkage factor Sα(Z){S}_{\alpha}(Z) is defined as follows:

Sα(x)={xαx>α0|x|αx+αx<αS_{\alpha}(x)=\left\{\begin{array}[]{ll}x-\alpha&x>\alpha\\ 0&|x|\leq\alpha\\ x+\alpha&x<-\alpha\end{array}\right. (15)

Update the matrix 𝚯\bm{\Theta}: When other variables are fixed, the subproblem w.r.t. 𝚯\bm{\Theta} is as follows:

argmin𝚯𝚯+ρ2𝚯𝑹𝑰+𝒀2ρF2,\arg\min_{\bm{\Theta}}{\|\bm{\Theta}\|}_{\ast}+\frac{\rho}{2}\left\|\bm{\Theta}-\bm{R}\odot\bm{I}+\frac{{\bm{Y}}_{2}}{\rho}\right\|_{F}^{2}, (16)

which can be solved by the singular value threshold method. More specially, let 𝑪=𝑼𝚺𝑽\bm{C}=\bm{U}\bm{\Sigma}\bm{V}^{\top} be the Singular Value Decomposition (SVD) of 𝑪\bm{C}. Then, for Tλ(𝑪)=𝑼𝚺λ𝑽T{T_{\lambda}}(\bm{C})=\bm{U}{\bm{\Sigma}_{\lambda}}{\bm{V}^{T}}, we have:

Tλ(𝑪)=argmin𝚯12𝚯𝑪F2+λ𝚯,{T_{\lambda}}(\bm{C})=\arg\mathop{\min}\limits_{\bm{\Theta}}\frac{1}{2}\left\|{\bm{\Theta}-\bm{C}}\right\|_{F}^{2}+\lambda{\left\|\bm{\Theta}\right\|_{*}}, (17)

where 𝚺λ{\bm{\Sigma}_{\lambda}} is diagonal with (𝚺λ)ii=max(0,𝚺iiλ){({\bm{\Sigma}_{\lambda}})_{ii}}=\max(0,{\bm{\Sigma}_{ii}}-\lambda). By the SVD threshold method [33], the solution for the 𝚯\bm{\Theta} is as fallows:

𝚯=argmin𝚯L(𝚯)=T1ρ(𝑹𝑰𝒀2ρ).\begin{split}\bm{\Theta}&=\arg\min_{\bm{\Theta}}{L(\bm{\Theta})}\\ &={T}_{\frac{1}{\rho}}\big{(}\bm{R}\odot\bm{I}-\frac{{\bm{Y}}_{2}}{\rho}\big{)}.\end{split} (18)

Update the matrix I\bm{I}: When other variables are fixed, the subproblem w.r.t. 𝑰\bm{I} is as follows:

argmin𝑰ρ2((𝑹𝑰)22𝑹𝑹𝑰2𝑬𝑹𝑰)+ρ2((𝑹𝑰)22𝚯𝑹𝑰)𝒀1𝑹𝑰𝒀2𝑹𝑰,\begin{split}\arg\min_{\bm{I}}&\frac{\rho}{2}({(\bm{R}\odot\bm{I})}^{2}-2{\bm{R}}^{\top}\bm{R}\odot\bm{I}-2{\bm{E}}^{\top}\bm{R}\odot\bm{I})\\ &+\frac{\rho}{2}({(\bm{R}\odot\bm{I})}^{2}-2\bm{\Theta}^{\top}\bm{R}\odot\bm{I})-{\bm{Y}}_{1}^{\top}\bm{R}\odot\bm{I}\\ &-{\bm{Y}}_{2}^{\top}\bm{R}\odot\bm{I},\end{split} (19)

Then 𝑰\bm{I} is solved as follows:

𝑰=(ρ𝑹+ρ𝚯ρ𝑬+𝒀𝟏+𝒀𝟐)/(2ρ𝑹),\begin{split}\bm{I}=({\rho\bm{R}+\rho\bm{\Theta}-\rho\bm{E}+\bm{Y_{1}}+\bm{Y_{2}}})/({2\rho\bm{R}}),\end{split} (20)

where the operation // represents the element-wise division between two matrices.

Algorithm 2 Optimize our model with ADMM
1:  INPUT: The matrix𝑹\bm{R}, λ>0\lambda>0, β>0\beta>0 and ρ1\rho\geq 1;
2:  while not converged do
3:     Update the matrix 𝑬\bm{E} by Eq.(14);
4:     Update the matrix 𝚯\bm{\Theta} by Eq.(18);
5:     Update the matrix 𝑰\bm{I} by Eq.(20);
6:     Update the matrix 𝑾\bm{W} by Eq.(22);
7:     Update the Lagrange multiplier matrices 𝒀1\bm{Y}_{1}, 𝒀2\bm{Y}_{2}, and 𝒀3\bm{Y}_{3} by Eq. 23;
8:     ρρμ\rho\leftarrow\rho\cdot\mu;
9:  end while
10:  OUTPUT: 𝑰,𝑬\bm{I},\bm{E}

Update the matrix W\bm{W}: When other variables are fixed, the subproblem w.r.t. 𝑾\bm{W} is as follows:

argmin𝑾β𝑾2,1+ρ2𝑾𝑬+𝒀3ρF2.\begin{split}\arg\min_{\bm{W}}\beta\|\bm{W}\|_{2,1}+\frac{\rho}{2}\left\|\bm{W}-\bm{E}+\frac{{\bm{Y}}_{3}}{\rho}\right\|_{F}^{2}.\end{split} (21)

where 𝑾2,1=i=1nj=1m𝑾ij2=i=1n𝑾i2\|\bm{W}\|_{2,1}=\sum\limits_{i=1}^{n}{\sqrt{\sum\limits_{j=1}^{m}{\bm{W}_{ij}^{2}}}}=\sum\limits_{i=1}^{n}{{{\left\|{{\bm{W}_{i}}}\right\|}_{2}}}. We can update the ii-th column of the 𝑾\bm{W} matrix as follows:

𝑾(:,i)={𝐐iα𝐐i𝐐i,ifα<𝐐i,0,otherwise,{\bm{W}}(:,i)=\left\{\begin{array}[]{ll}\frac{\|{\mathbf{Q}}_{i}\|-\alpha}{\|\mathbf{Q}_{i}\|}\mathbf{Q}_{i},&\ if\ \ \alpha<\|\mathbf{Q}_{i}\|,\\ 0,&\text{otherwise},\end{array}\right. (22)

where α=β/ρ\alpha=\beta/\rho, 𝐐=𝑬𝒀3ρ\mathbf{Q}=\bm{E}-\frac{{\bm{Y}}_{3}}{\rho}, 𝐐i\mathbf{Q}_{i} is the ii-th column of the matrix 𝑸\bm{Q}.

Update the matrix Y\bm{Y}: Update the Lagrange multipliers 𝒀1{\bm{Y}}_{1}, 𝒀2{\bm{Y}}_{2} and 𝒀3{\bm{Y}}_{3} are as follows:

𝒀1𝒀1+ρ(𝑹𝑹𝑰𝑬),𝒀2𝒀2+ρ(𝚯𝑹𝑰),𝒀3𝒀3+ρ(𝑾𝑬),\begin{split}&{\bm{Y}}_{1}\leftarrow{\bm{Y}}_{1}+\rho(\bm{R}-\bm{R}\odot\bm{I}-\bm{E}),\\ &{\bm{Y}}_{2}\leftarrow{\bm{Y}}_{2}+\rho(\bm{\Theta}-\bm{R}\odot\bm{I}),\\ &{\bm{Y}}_{3}\leftarrow{\bm{Y}}_{3}+\rho(\bm{W}-\bm{E}),\end{split} (23)

During training, we update ρρμ\rho\leftarrow\rho\cdot\mu, where μ>1\mu>1. In this paper, we initialized ρ\rho to 1 and set μ\mu to 1.2. The ADMM process is summarized in Alg. 2.

IV Experiments and Case study

IV-A Experiment setup

IV-A1 Data

We use the GPS data generated from the travel line with the Liuliqiao start station in Beijing to Zhangjiakou end station in Hebei province for our study. Since the coaches barely stop on highway, we have invited 11 students to take this travel line where the coaches have the different departure times. These students have recorded the ASSs for the IPPs activities. The GPS points of these abnormal stops are recorded by the APP in mobile phones. Fig. 4 shows that 10 recorded abnormal stop spots.

During the pre-processing, we used three conventional tricks: 1) the area constraint, 2) the instantaneous speed, and 3) the precision of GPS. Concretely, the area constraint assumes that GPS only occurs within the Sixth Ring Road of Beijing. The instantaneous speed means that we should remove the GPS points who’s instantaneous speed is larger than 120 km/hkm/h. The precision of GPS means that we remove error GPS data if the number of digits after the point is less than 6.

Refer to caption
Figure 4: Visualization of the manually discovered ASSs.

IV-A2 Experiment settings

In our experiments, our experiments are conducted on a server (Windows) with Intel Core CPU with 3.2 GHz and 32 GB main memory. The source code was implemented with Matlab and Python.

IV-A3 Evaluation metrics

In our experiments, we use two kinds of evaluation metrics as follows:

  • Average Precision (AP): We plot Precision versus Recall of a Classifier (PRC) at different decision thresholds. Consequently, AP calculates the average precision across different decision thresholds, achieved by calculating the area under the PRC. higher AP is, better a method is.

  • Area Under Curve (AUC): Receiver Operating Characteristic (ROC) curve plots True Positive Rate (TPR) versus False Positive Rate (FPR) of a classifier at different decision thresholds. Consequently, the AUC value represents the area under the ROC curve. the larger the area under the curve is, the closer the AUC value is to 1.

IV-B Baselines and the State-of-Art Approaches

IV-B1 Exploring how to leverage the sparse matrix

Three indicators respectively are proposed to judge whether an abnormal stop occurs or not as follows:

  • All Stop Time (AST): The sum of the abnormal stop durations in the matrix 𝑬\bm{E}, i.e., j𝑬i,j\sum_{j}\bm{E}_{i,j} for the ii-th road segment. AST assumes that different coach drivers would spontaneously select the same spot for the IPPs activity.

  • Maximum Stop Time (MST): The max stop duration time of a coach in the matrix 𝑬\bm{E}, i.e., maxj𝑬i,j\max_{j}\bm{E}_{i,j} for the ii-th road segment. MST assumes that the largest stop duration would be the ideal abnormal stop spots for the IPPs activity.

  • Top-kk Averaged stop Time (TAT@kk): To remove the possible noisy in the matrix 𝑬\bm{E}, TAT@kk selects the top-kk stop duration at the ii-th road segment, i.e., 1ki=1ktopk(𝑬i,:)\frac{1}{k}\sum_{i=1}^{k}topk(\bm{E}_{i,:}), where the function topk(𝑬i,:)topk(\bm{E}_{i,:}) finds the top-kk values from the ii-th column of the matrix 𝑬\bm{E}. TAT@kk assumes that a few stops are caused by the non-abnormal stop reason. Therefore, TAT@kk is the a version of AST with the noise remove ability.

IV-B2 The possible SOTA solutions

To our best knowledge, we found that the off-the-shelf methods barely be directly used to our problem. In fact, stripping abnormal stops from a matrix is mathematically equal to matrix decomposition. Therefore, this paper proposed the three possible SOTA solutions and their reasons as follows:

  • Without Structure Sparsity (WST): WST ignores the physical constraints of spots as follows:

    argmin𝑰,𝑬rank(𝚯)+λ𝑬1s.t.𝑹=𝑹𝑰+𝑬,𝚯=𝑹𝑰,0𝑬,0𝑰ij1.\begin{split}\arg\min_{\bm{I},\bm{E}}\quad&{rank(\bm{\Theta})}+\lambda\|\bm{E}\|_{1}\\ s.t.&\bm{R}=\bm{R}\odot\bm{I}+\bm{E},\\ &\bm{\Theta}=\bm{R}\odot\bm{I},\\ &0\leq\bm{E},0\leq\bm{I}_{ij}\leq 1.\end{split} (24)

    WST assumes that the abnormal stops could occur at any spot. The comparisons between WSA and our method can reveal the importance of the physical constraints motivated group sparsity.

  • Without Stripping Abnormal Stop (WSA): The stop duration matrix 𝑹\bm{R} is directly used to discover the abnormal stop spots. The WSA strategy is widely used in the black spots identification task [6]. WSA does not assume that the IPPs activity is sparsely occurring. The comparisons between WSA and our method reveal whether ASSs are sparse, and whether matrix factorization could separate ASSs from the normal stop behaviors.

  • Using Instantaneous Speed of GPS (UIS): When the instantaneous speed of a coach is equal to 0, a stop spot is recorded by a GPS point. Concretely, the corresponding stop duration matrix 𝑹\bm{R} is firstly built by the zero instantaneous speed GPS. Second, we utilized the proposed method in Eq. (11) to discover the ASSs. The difference between UIS and our method is whether handing the low-frequency GPS or not. The comparisons between UIS and our method can reveal that the model that we extract the stop candidates from low-frequency GPS data are reasonable.

For a fair comparison, the other hyper-parameters of these SOTA methods are tuned to obtain the best performances.

IV-C Analysis of our approach

IV-C1 Selection of the abnormal stop indicators

We compared three indicators in Section IV-B1 when the proposed model in Eq. (11) has used with the a randomly guessed hyper-parameter, e.g., the sparsity weight λ=0.3\lambda=0.3, and the group sparsity β=0.3\beta=0.3. Table II details the performances of three indicators at the different length of the road segments, uncovering the 2 observations as follows:

TABLE II: The AP and AUC values for the different size (meters) of road segments.
Size 100 200 300 400
Indicators AP AUC AP AUC AP AUC AP AUC
AST 0.1833 0.1875 0.5556 0.7619 0.4167 0.6875 0.7750 0.7500
MST 0.1000 0.0556 0.5317 0.7143 0.4500 0.7500 0.7222 0.5833
TAT@2 0.1250 0.2222 0.5317 0.7619 0.4167 0.6875 0.7250 0.6250
  • The AST indicator achieves the best performance among these indicators in both AUC and AP in most cases. The explanation is that different coach drivers tend to consensually choose the same spots where the IPPs activity occurs. Compared to the other indicators, the sum operation in AST could suppress the noises in the abnormal stop situation.

  • AST slightly outperform MST and TAT@2. For example, when the size of the road segment is 200 meters, values of the AST indicator are 4.8% and 2.4% higher than that of MST in terms of both AUC and AP, respectively; while AST surpassed TAT@22 0% and 2.4% in terms of both AUC and AP, respectively. The comparative results reflect the viewpoint that “social support among survivors is common in mass emergencies and can be facilitated by action by professional groups” [34]. That is, different coach drivers would gradually choose a few preferable spots where the transport infrastructures and the road conditions facilities the occurrence of the IPPs activity.

In this paper, we use the AST as the indicator for the following experiments.

Refer to caption
(a) 200 meters
Refer to caption
(b) 300 meters
Figure 5: The values of the three indicators for every road segment when the interval is either 200 or 300 meters.

IV-C2 Effectiveness of the intervals

Table. II also detailed the effectiveness of the different length of road segments. We have observed that the lager size of road segments would bring a better result than that of the smaller one. The main exploitation is that it is easier to predict the ground truth in a larger road segment. However, as the size of road segments increases, identifying the exact locations of the IPP activity becomes more challenging. In this paper, we used the road segments with 200 meters to balance between the accuracy of the model and the difficulty to identify the IPPs spots.

Interestingly, we have observed that the performance slightly dropped when the interval was 300 meters. Fig. 5 further shows the detailed values of the three indicators for different size of road segments. When the road segment was 300 meters, the optimized abnormal stop duration at the ground truth road segment is significantly different than that of when the road segment is 200 meters in Fig. 5. The explanation is that the size of road segments would slightly change the distribution of the stop duration matrix, although the location inaccuracy is adopted.

Refer to caption
(a) The effectiveness of λ\lambda when β=0.1\beta=0.1
Refer to caption
(b) The effectiveness of β\beta when λ=0.1\lambda=0.1
Figure 6: The effectiveness of the β\beta and λ\lambda.

IV-C3 Influence of λ\lambda and β\beta

Fig. 6 shows that the impacts of the different settings of λ\lambda and β\beta. In Fig. 6(a), we fixed β=0.1\beta=0.1 and set the λ\lambda ranging for 0 to 1 with the step size 0.1. We observed that our method is very stable under the different hyper-parameters. Similarly, Fig. 6(b) fixed λ=0.1\lambda=0.1 and investigated the impact of β\beta. Fig. 6(b) shows that both AP and AUC remain unchanged under the different β\beta settings. This indicates that the non-negative β\beta has little impact on the model.

Moreover, the comparisons between Fig. 6(a) and 6(b) discovered that λ\lambda has a greater impact than β\beta. For instance, there is a little performance drop in terms of AP when λ\lambda is larger than 0.8 in Fig. 6(a), while β\beta shows no effect on the performance in the range of [0, 1] in Fig. 6(b). These results shows the the the sparsity term is more important the group sparsity one in Eq. (11). The explanation is that some spots are impossible to be selected as the candidate for the IPPs activity. The zero stop duration times for the #60 to #80 road segments in Fig. 5(a) also verify this phenomenon. In this paper, we simply used the following hyper-parameters: λ=0.1\lambda=0.1 and β=0.1\beta=0.1 in the following experiments.

IV-C4 Analysis of the convergence speed

To reduce the scale of problem, we firstly grouped road segments into several cluster by performing Affinity Propagation (AP) clustering [35] on GPS points. In our experiments, we have 23 clusters which ranges from 1.5 kilometers (KM) to 6 KM. That is, we can solve Eq. (11) in a distributed parallel strategy.

Fig. 7 illustrates the convergence speed of the matrix with 164×24164\times 24 in Eq. (11). Fig. 7 discovers that the optimization method in Alg. 2 would rapidly converge to a local minimum. For instance, the loss values decreased rapidly as the number of iterations increases from 0 to 5 in Fig. 7, where each iteration costed 0.004 seconds. Although the proposed method leverages ADMM to iteratively solve several sub-problems, the fast solvers for each sub-problem still guarantee the fast convergence speed in Alg. 7. The results indicate that the propose method would scale up to a large-scale problem in a parallel system.

Refer to caption
Figure 7: Convergence curves of one cluster
TABLE III: Comparisons between the SOTA approach and our method.
Methods AUC AP
WST 0.7143 0.5317
WSA 0.5238 0.3694
UIS 0.5312 0.2678
Our method 0.7619 0.5556

IV-D Qualitative Comparisons with the SOTA Methods

Table III compared the SOTA method in Section IV-B2 with our method. Table III shows that proposed method achieved the best AUC and AP values. The results indicate that our proposed method can effectively improve the accuracy of capturing abnormal stop spots. Besides, the superior performances of our method uncovered three observations as follows:

  • Group sparsity is very necessary to the ASS task. Our method significantly outperforms WST by 4.76% and 2.39% in terms of AUC and AP, respectively. This indicates that the group sparsity in Eq. (11) is very important to discover ASSs. The group sparsity empowers the model to remove the impossible stop spots.

  • Stripping abnormal stop duration from the stop duration is necessary. Our method significantly outperforms WSA, for instance, AUC and AP of the WSA are 23.81% and 18.62% lower than that of the proposed method, respectively. This means that there are many normal stop duration are mixed in the 𝑹\bm{R} matrix builded from strategy in WSA. Moreover, the significant performance gap also verifies that the low rank decomposition efficiently strip the abnormal stop duration form these normal ones.

  • Many abnormal stop behaviors barely been discovered by the instantaneous speed. Our method significantly outperforms UIS, for instance, AUC and AP of UIS are 23.07% and 28.78% lower than that of the proposed method, respectively. Because some abnormal stop would happen in 30 seconds. The comparisons between UIS and our method verify the effectiveness of the necessary and effectiveness of extracting candidate stop spots from low-frequency GPS in Section III-B.

Refer to caption
(a) #156 road segment.
Refer to caption
(b) #157 road segment.
Refer to caption
(c) #38 road segment.
Figure 8: The street-view of the road segments with the top-3 largest AST value.

IV-E An case study with discovered spots

Intuitively, the limited number of law enforcers should be dynamically allocated to the spots where coaches frequently stop for the IPPs activity. In terms of our approach, the AST of the 𝑬\bm{E} matrix reflects the possibility and the duration of the IPPs activity in a road segment. Except these well-understood spots (i.e., intercity bus stations, railway station, airports, and centralized business areas), the proposed method could effectively discover some spots which have the high probability to occur the IPPs activity. For instance, Fig. 8 illustrated three street-view of the road segments (i.e.,#156, #157, #38 in Fig. 5(a)) with the top-3 largest AST value. Note that #38, #157 road segments are the ground truth; while # 156 road segment is the IPPs spot discovered by our method. Specially, Figs. 8(a) and 8(b) shown that the stop spot is located near a highway toll station, facilitating coaches pick up passengers who have the long range travel demand. Besides, there is a bus stop which makes it convenient for shuffling passengers by buses. Moreover, several intersections provide a space for coaches to pick up passengers. For Fig. 8(c), the street-view images show that the stop spot is near an bus stop, subway station and the train station, which help coach drivers to pick up passengers. Because this area has a high probability to pick up the passengers with the long range travel demand. Additionally, there is a long side road which supplies a convenience for coaches to pull over.

V CONCLUSION

In this paper, we have described a method to discover the spots where the IPPs activity tend to occur for long range coach industry. Concretely, leveraging low-frequency GPS data, the proposed method efficiently identifies the the spots of the IPPs activity in an unsupervised approach by formalizing the challenge problem as the matrix factorization problem. There are significant contributions of the proposed method as follows:

  • To our best knowledge, we firstly introduce the IPPs problem, and treat this challenge as an unsupervised problem. The unsupervised method avoids to harvest the time-costing annotation, making our method practical to leverage the large scale GPS data from long range coaches.

  • We frame our search for ASSs as the matrix decomposition problem, where the low rank models the normal stops, a sparse matrix captures the abnormal ones, and a group sparsity constrains that some spots would never happen the IPPs activity.

  • To release the power of the low-frequent GPS, we propose an simple yet efficient method to discover the stop duration. Experimental results verify the effectiveness the linear change of velocity to discover the stop spots.

The experimental results demonstrate that the method not only accurately detects locations of the IPPs activity but also exhibits strong robustness. Compared to other methods, our approach has significantly improved detection accuracy. Furthermore, the method proposed in this study shows the explainable results. It can assist traffic law enforcement agencies in more effectively monitoring and managing the operation of long range coaches, thereby reinforcing codes of practice in the coach industry.

Future work will focus on further optimizing the algorithm to improve detection accuracy and applicability. Additionally, a deeper analysis of the patterns caused by the different size of road segments will be conducted. Another direction is to incorporate the semi-supervised method to the low rank [36].

References

  • [1] V. Van Acker, R. Kessels, D. Palhazi Cuervo, S. Lannoo, and F. Witlox, “Preferences for long-distance coach transport: Evidence from a discrete choice experiment,” Transportation Research Part A: Policy and Practice, vol. 132, pp. 759–779, 2020.
  • [2] S. Hasiak, F. Hasiak, and A. Egea, “Coach and train: Differences in individuals perception of these modes,” Transportation Research Procedia, vol. 14, pp. 1706–1715, 2016, transport Research Arena TRA2016.
  • [3] M. Bieler, A. Skretting, P. Büdinger, and T.-M. Grønli, “Survey of automated fare collection solutions in public transportation,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 9, pp. 14 248–14 266, 2022.
  • [4] Y. Dong, Y. Pan, D. Wang, and A. Chen, “Traffic load simulation for long-span bridges using a transformer model incorporating in-lane transverse vehicle movements,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–14, 2024.
  • [5] G. Basulto-Elias, S. Hallmark, A. Barnwal, A. Sharma, M. Rizzo, and J. Merickel, “Strategy and safety at stop intersections in older adults with mild cognitive impairment and visual decline,” Transportation research interdisciplinary perspectives, vol. 22, 2023.
  • [6] Z. Fan, C. Liu, D. Cai, and S. Yue, “Research on black spot identification of safety in urban traffic accidents based on machine learning method,” Safety Science, vol. 118, pp. 607–616, 2019.
  • [7] J. Pang, J. Huang, X. Yang, Z. Wang, H. Yu, Q. Huang, and B. Yin, “Discovering fine-grained spatial pattern from taxi trips: Where point process meets matrix decomposition and factorization,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 10, pp. 3208–3219, 2018.
  • [8] V. Thibeault, A. Allard, and P. Desrosiers, “The low-rank hypothesis of complex systems,” NNature Physics, vol. 20, p. 294–302, 2024.
  • [9] X. Zhang, J. Zheng, D. Wang, G. Tang, Z. Zhou, and Z. Lin, “Structured sparsity optimization with non-convex surrogates of 2,0\ell_{2,0} -norm: A unified algorithmic framework,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 5, pp. 6386–6402, 2023.
  • [10] F. Cao, J. Chen, H. Ye, J. Zhao, and Z. Zhou, “Recovering low-rank and sparse matrix based on the truncated nuclear norm,” Neural Networks, vol. 85, pp. 10–20, 2017.
  • [11] Y. Yuan, S. Chunfu, C. Zhichao, H. Zhaocheng, Z. Changsheng, W. Yimin, and J. Vlon, “Bus dynamic travel time prediction: Using a deep feature extraction framework based on rnn and dnn,” Electronics, vol. 9, pp. 1876–1876, 2020.
  • [12] J. Pang, J. Huang, Y. Du, H. Yu, Q. Huang, and B. Yin, “Learning to predict bus arrival time from heterogeneous measurements via recurrent neural network,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 9, pp. 3283–3293, 2019.
  • [13] H. Shuang, F. Hui, Z. Jiahong, L. Junzhou, and Z. Weiliang, “Modelling and simulation of hierarchical scheduling of real-time responsive customised bus,” IET Intelligent Transport Systems, vol. 14, pp. 1615–1625, 2020.
  • [14] C. Boateng, K. Yang, S. G. Ara Ghoreishi, J. Jang, M. T. Jan, J. Conniff, B. Furht, S. Moshfeghi, D. Newman, R. Tappen, J. Zhai, and M. Rosseli, “Abnormal driving detection using gps data,” IEEE 20th International Conference on Smart Communities: Improving Quality of Life using AI, Robotics and IoT (HONET).
  • [15] W. Yu and Q. Huang, “A deep encoder-decoder network for anomaly detection in driving trajectory behavior under spatio-temporal context,” Int. J. Appl. Earth Obs. Geoinformation, vol. 115, p. 103115, 2022.
  • [16] L. Zhang, Y. Dong, H. Farah, A. Zgonnikov, and B. van Arem, “Data-driven semi-supervised machine learning with surrogate safety measures for abnormal driving behavior detection,” ArXiv, vol. abs/2312.04610, 2023.
  • [17] Y. Yang, J. Yan, J. Guo, Y. Kuang, M. Yin, S. Wang, and C. Ma, “Driving behavior analysis of city buses based on real-time gnss traces and road information,” Sensors (Basel, Switzerland), vol. 21, 2021.
  • [18] J.-G. Lee, J. Han, and X. Li, “Trajectory outlier detection: A partition-and-detect framework,” in 2008 IEEE 24th International Conference on Data Engineering, 2008, pp. 140–149.
  • [19] D. Kumar, J. C. Bezdek, S. Rajasegarar, C. Leckie, and M. S. Palaniswami, “A visual-numeric approach to clustering and anomaly detection for trajectory data,” The Visual Computer, vol. 33, pp. 265–281, 2017.
  • [20] Z. Fan, C. Liu, D. Cai, and S. Yue, “Research on black spot identification of safety in urban traffic accidents based on machine learning method,” Safety Science, vol. 118, pp. 607–616, 2019.
  • [21] K. Murata, E. Fujita, S. Kojima, S. Maeda, Y. Ogura, T. Kamei, T. Tsuji, S. Kaneko, M. Yoshizumi, and N. Suzuki, “Noninvasive biological sensor system for detection of drunk driving,” IEEE Transactions on Information Technology in Biomedicine, vol. 15, no. 1, pp. 19–25, 2011.
  • [22] Y. Kong and Y. Fu, “Human action recognition and prediction: A survey,” Int. J. Comput. Vision, vol. 130, no. 5, p. 1366–1401, May 2022.
  • [23] M. Quddus and S. Washington, “Shortest path and vehicle trajectory aided map-matching for low frequency gps data,” Transportation Research Part C: Emerging Technologies, vol. 55, pp. 328–339, 2015, engineering and Applied Sciences Optimization (OPT-i) - Professor Matthew G. Karlaftis Memorial Issue.
  • [24] X. Liu, K. Liu, M. Li, and F. Lu, “A st-crf map-matching method for low-frequency floating car data,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 5, pp. 1241–1254, 2017.
  • [25] L. Luo, X. Hou, W. Cai, and B. Guo, “Incremental route inference from low-sampling gps data: An opportunistic approach to online map matching,” Information Sciences, vol. 512, pp. 1407–1423, 2020.
  • [26] Y. Zhu, Z. Li, H. Zhu, M. Li, and Q. Zhang, “A compressive sensing approach to urban traffic estimation with probe vehicles,” IEEE Transactions on Mobile Computing, vol. 12, no. 11, pp. 2289–2302, 2013.
  • [27] X. Chen, J. Yang, and L. Sun, “A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation,” Transportation Research Part C: Emerging Technologies, vol. 117, p. 102673, 2020.
  • [28] Y. Yu, Y. Zhang, S. Qian, S. Wang, Y. Hu, and B. Yin, “A low rank dynamic mode decomposition model for short-term traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 10, pp. 6547–6560, 2021.
  • [29] P. Ahmadi, R. Kaviani, I. Gholampour, and M. Tabandeh, “Modeling traffic motion patterns via non-negative matrix factorization,” in 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2015, pp. 214–219.
  • [30] V. Karve, D. Yager, M. Abolhelm, D. B. Work, and R. B. Sowers, “Seasonal disorder in urban traffic patterns: A low rank analysis,” Journal of Big Data Analytics in Transportation, vol. 3, pp. 43 – 60, 2021.
  • [31] S. Lawlor and M. G. Rabbat, “Time-varying mixtures of markov chains: An application to road traffic modeling,” IEEE Transactions on Signal Processing, vol. 65, no. 12, pp. 3152–3167, 2017.
  • [32] Y. Xu, D. Li, and Y. Xi, “Perimeter traffic flow control for a multi-region large-scale traffic network with markov decision process,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 6, pp. 4809–4821, 2024.
  • [33] W. Dong, G. Shi, and X. Li, “Nonlocal image restoration with bilateral variance estimation: A low-rank approach,” IEEE transactions on image processing, vol. 22, no. 2, pp. 700–711, 2012.
  • [34] J. Drury, “Recent developments in the psychology of crowds and collective behaviour,” Current Opinion in Psychology, vol. 35, pp. 12–16, 2020, social Change (Rallies, Riots and Revolutions).
  • [35] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” science, vol. 315, no. 5814, pp. 972–976, 2007.
  • [36] J. Chavoshinejad, S. A. Seyedi, F. Akhlaghian Tab, and N. Salahian, “Self-supervised semi-supervised nonnegative matrix factorization for data clustering,” Pattern Recognition, vol. 137, p. 109282, 2023.