
WiFi Fingerprint Clustering for Urban Mobility Analysis

Sumudu Hasala Marakkalage, Billy Pik Lik Lau, Yuren Zhou, Ran Liu, Chau Yuen, Wei Quin Yow, and Keng Hua Chong

S.H. Marakkalage, B.P.L. Lau, Y. Zhou, R. Liu, and C. Yuen are with the Engineering Product Development Pillar, Singapore University of Technology and Design (SUTD), Singapore. W.Q. Yow is with the Humanities, Arts and Social Sciences Pillar, SUTD. K.H. Chong is with the Architecture and Sustainable Design Pillar, SUTD.
Corresponding author email: [email protected]
Abstract

In this paper, we present an unsupervised learning approach to identify user points of interest (POI) by exploiting WiFi measurements from smartphone application data. Because GPS positioning is inaccurate in indoor, sheltered, and high-rise building environments, we rely on the WiFi access points (AP) that are widely available in contemporary urban areas, and identify POI and mobility patterns by comparing the similarity of WiFi measurements. We propose a system architecture that scans the surrounding WiFi AP and performs unsupervised learning, and we demonstrate that it can identify three major insights: indoor POI within a building, neighborhood activity, and the micro mobility of users. Our results show that these insights, which cannot be identified using GPS alone, can be identified by fusing WiFi and GPS data.

Index Terms:
POI Extraction, Clustering, Data Fusion, Mobility Analysis, Unsupervised Learning.

1 Introduction

In recent times, mobile crowdsensing (MCS) has attracted considerable attention due to the pervasiveness of smart mobile devices, their built-in sensing abilities, and the fact that they have become an everyday carry item. Consequently, a plethora of MCS applications have become prominent in various sectors, such as transportation [1], healthcare [2], and social networking platforms [3]. A particular phenomenon can be monitored through diverse information harnessed by smartphone applications with proper crowd participation [4, 5]. In mobility analysis applications, detailed motion pattern information (outdoor and indoor) provides comprehensive insights into user mobility [6, 7, 8, 9]. Knowing the user points of interest (POI) is paramount in mobility tracking applications that provide context-aware services. Motion pattern learning and anomaly detection of human trajectories are performed in [10] using Hidden Markov Models. Past research has detected the type of environment (i.e. indoor or outdoor) by fusing smartphone-based sensor data [11, 12, 13]. The lifestyle of the elderly is studied in [14] using smartphone application data, with a main focus on extracting regions of interest (ROI) and POI through sensor fusion. Nonetheless, in contemporary urban indoor places (e.g. shopping malls, apartment complexes etc.), where massive crowd movements happen, the aforementioned works cannot identify POI at indoor granularity.

Mobility tracking in indoor environments is challenging because fine-grained location information is difficult to acquire in such places. Especially in high-rise urban buildings and apartments, it is difficult to identify, using only GPS data, when people leave their home or office and visit common areas within the same building or POI in the neighborhood. Even though GPS accuracy is low in those scenarios, we can distinguish such places by incorporating WiFi data, since urban environments nowadays are equipped with plenty of WiFi access points (AP). Hence, by fusing GPS and WiFi information from crowdsensed smartphone data, we identify indoor POI (first introduced in our previous work [15], with an improved POI extraction technique in this paper), and we introduce neighborhood activity and micro mobility analysis in this paper. Prior research has utilized WiFi AP information to generate indoor floorplans [13, 16] and to identify indoor locations through localization [17, 18, 19, 20, 21]. A major drawback of those works is that they require data collection at high sampling rates, which incurs high power consumption (a prime challenge in MCS [22, 23, 24, 25]). Furthermore, creating indoor fingerprint maps requires extensive labor, which is another drawback.

To identify indoor POI, we focus on the mobility patterns of a typical user in indoor environments such as a shopping mall or an apartment complex. These places contain POI that users frequently visit, but such POI are difficult to identify using GPS location data alone, because GPS lacks accuracy indoors. For example, when a user visits a particular shopping mall, the processed GPS data may be clustered into a single POI, whereas in reality the user may have visited multiple POI (e.g. different shops) within the same mall. This is due to the two-dimensional nature of GPS data, which limits differentiation between multiple indoor POI. Therefore, fusing GPS with WiFi data helps to identify such indoor POI.

Neighborhood activity analysis is conducted to understand the POI that users visit in their residential neighborhood (e.g. common areas in an apartment complex). A user may visit a convenience store downstairs for groceries, or visit a common area in the same building to mingle with friends during a short break while staying at home. Since such apartment buildings are high-rise (e.g. in Singapore, most apartments are in high-rise multi-storey buildings), this vertical mobility may not be reflected in GPS location tracking. Hence, exploiting the surrounding WiFi AP information is useful for identifying such neighborhood activities.

Micro mobility analysis is conducted to understand the mobility patterns of users where the GPS signal is blocked, such as in sheltered walkways or void decks under high-rise buildings. GPS alone may not give accurate information in such scenarios. Because of its lack of accuracy, the same physical location may be misinterpreted as several fluctuating GPS locations. The WiFi measurements at those fluctuating locations are similar, so the surrounding WiFi AP information indicates that they are one location.

In a nutshell, the three main objectives of this article are to understand the distinct indoor POI visited by users, to analyze neighborhood activity, and to analyze micro mobility. We verify the effectiveness of the proposed method on crowdsensing data collected from volunteers, along with ground truth for the visited POI. The contributions of this paper are listed below.

  • Introducing an unsupervised method that uses the similarity of the WiFi AP information surrounding users to understand their mobility, verified with real-world data.

  • Clustering WiFi fingerprints within a given GPS POI to identify the distinct WiFi-based indoor POI of users, the POI revisited by the same users, and the POI common among users.

  • Fusing GPS and WiFi data to identify neighborhood activity and to generate a heat map that excludes the duration of stays at home.

  • Clustering travel-path WiFi fingerprints to identify neighborhood micro mobility patterns, such as moving under covered walkways or cutting across buildings.

The rest of this paper is organized as follows. In Section 2, an overview of the proposed system is presented. In Section 3, the unsupervised POI extraction technique and its technical evaluation are presented. In Section 4, the neighborhood activity analysis process is presented along with the results. In Section 5, the micro mobility analysis technique and its results are presented. Section 6 presents the discussion and future work, and concludes the paper.

2 System Overview

Identifying the trajectory of a user is essential in mobility analysis. Figure 1 shows a sample trajectory of a user. It consists of GPS stay points; indoor POI within a GPS stay point; neighborhood activity that happens during the time span of a GPS stay point but is not captured because of low GPS accuracy in indoor and high-rise urban environments; and micro mobility (links) between two GPS stay points. In this paper, we identify these three insights on such a user trajectory.

Figure 1: Example of a user’s trajectory

The proposed system comprises a smartphone application (front-end) that collects GPS location data and surrounding WiFi AP information, which are transferred to a cloud-based server application (back-end). The collected raw GPS and WiFi data are then processed to identify the indoor POI, neighborhood activities, and neighborhood micro mobility patterns of the users. Figure 2 shows an overview of the proposed system.

Figure 2: Overview of the proposed system

2.1 Data Collection Mobile Apps

The smartphone application scans the surrounding WiFi AP information, namely the MAC address and the corresponding received signal strength (RSS) of each AP, at an interval of 5 minutes to conserve power, since, according to the Android API [26], excessive scanning of WiFi and GPS heavily impacts mobile phone battery consumption.

2.1.1 WiFi Scanning

Let the MAC address of a WiFi AP be $m$ and its RSS be $r$ in dBm. The list of surrounding AP MAC addresses and their corresponding RSS, also called a scan result ($s$), is shown in Equation 1, where $n$ is the number of AP observed in a given scan result.

$$s = \{\{m_1, r_1\}, \{m_2, r_2\}, \ldots, \{m_n, r_n\}\} \qquad (1)$$

Each scan result and the corresponding timestamp ($t$) of the WiFi scan are stored in a list of scan results ($S$), as shown in Equation 2, where $m$ is the number of scan results in $S$.

$$S = \{\{s_1, t_1\}, \{s_2, t_2\}, \ldots, \{s_m, t_m\}\} \qquad (2)$$

The list of scan results is stored locally on the device until it is uploaded to the back-end for further analysis.

2.1.2 Data Compression

To avoid the cost of transmitting raw data to the back-end, we compress it. As shown in Table I, we select a 6-hour upload interval, because it gives a substantial size reduction when compressed (from 345,001 bytes to 2,565 bytes, i.e. less than 1% of the original size). Data are uploaded only when the device is connected to a WiFi network. Otherwise, the smartphone application keeps the data until it connects to one.

TABLE I: Comparison of data size before and after compression

| Duration of data (hours) | Size, uncompressed (bytes) | Size, compressed (bytes) |
|---|---|---|
| 0.5 | 20,701 | 656 |
| 1 | 41,401 | 791 |
| 3 | 172,501 | 1,562 |
| 6 | 345,001 | 2,565 |
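The trade-off in Table I, where longer batches compress far better, can be reproduced in spirit with a small sketch. The serialization format and compression algorithm used by the app are not stated. Compact JSON plus gzip (Python's standard `json` and `gzip` modules) is an assumption, and `pack_upload` and `unpack_upload` are hypothetical names.

```python
import gzip
import json
from typing import List, Tuple

ScanResult = List[Tuple[str, float]]  # (MAC address, RSS in dBm) pairs


def pack_upload(entries: List[Tuple[ScanResult, float]]) -> bytes:
    """Serialize a batch of (scan result, timestamp) entries and gzip it for upload.

    JSON + gzip is an assumed format, used here only for illustration.
    """
    payload = [{"t": t, "aps": [[mac, rss] for mac, rss in scan]} for scan, t in entries]
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack_upload(blob: bytes) -> List[Tuple[ScanResult, float]]:
    """Inverse of pack_upload, as the back-end would run it."""
    payload = json.loads(gzip.decompress(blob).decode("utf-8"))
    return [([(mac, rss) for mac, rss in e["aps"]], e["t"]) for e in payload]
```

A 6-hour batch at a 5-minute interval holds 72 scans. Consecutive scans at the same place repeat mostly the same MAC addresses, and that redundancy is what a general-purpose compressor exploits. This is consistent with the compressed size growing much more slowly than the raw size in Table I.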

2.1.3 User Information

The users and their smartphone models, which are referred to in the later experimental study, are listed in Table II.

TABLE II: Users and smartphone models

| User | Model |
|---|---|
| A | OnePlus 3 |
| B, F | Samsung S8 |
| C | Sony Z3 |
| D | Google Pixel 2 |
| E | Huawei Nova 2i |
| G | Xiaomi Max 2 |
| H | Oppo F5 |
| I | Oppo R11 |
| J | Xiaomi Mix 2 |
| K | Samsung A8 |
| L | LG V30 |

2.2 GPS Stay Points Extraction

The data received at the back-end are processed to understand the indoor POI, neighborhood activity, and micro mobility patterns of the users.

We obtain the GPS stay points of users using the data processing pipeline shown in the GPS stay points extraction module of Figure 2. First, we preprocess the raw GPS data by removing abnormal points, zero-distance sequences, and low-accuracy GPS locations [25]. Abnormal data here refers to GPS data with a sudden location shift within a short period of time, which can distort the actual path traveled by the users. Zero-distance sequences often occur when the GPS does not receive any signal from the satellites, which produces exactly the same location for subsequent samples. These samples carry no meaningful information, so we filter them out. Next, accuracy filtering removes low-accuracy GPS data, which cause high uncertainty in determining the actual location of users. We then perform GPS stay point extraction [27, 28] to obtain the list of POI for a particular user, with the timestamp of each visit. Afterwards, we cluster the POI by geographical location using DBSCAN to group POI at similar places. After obtaining the raw GPS stay points, we use their durations to further detect the indoor POI of the users.
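The preprocessing and stay point steps above can be sketched as follows. This is not the authors' pipeline. All thresholds are hypothetical: 50 m accuracy, 50 m/s maximum speed, a 100 m stay radius, and a 20-minute minimum stay. The stay point step uses the common distance/time-threshold formulation, which may differ in detail from [27, 28].

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fix:
    t: float         # Unix time in seconds
    lat: float
    lon: float
    accuracy: float  # reported accuracy radius in metres (smaller is better)


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance in metres between two fixes."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def clean(fixes: List[Fix], max_acc_m: float = 50.0, max_speed_mps: float = 50.0) -> List[Fix]:
    """Drop low-accuracy fixes, zero-distance repeats, and abnormal jumps (hypothetical thresholds)."""
    out: List[Fix] = []
    for f in sorted(fixes, key=lambda x: x.t):
        if f.accuracy > max_acc_m:
            continue  # accuracy filtering
        if out:
            prev = out[-1]
            d = haversine_m(prev, f)
            if d == 0.0:
                continue  # zero-distance sequence: identical position repeated
            dt = f.t - prev.t
            if dt <= 0 or d / dt > max_speed_mps:
                continue  # abnormal: sudden location shift within a short time
        out.append(f)
    return out


def stay_points(fixes: List[Fix], dist_m: float = 100.0,
                min_stay_s: float = 20 * 60) -> List[Tuple[float, float, float, float]]:
    """Distance/time-threshold stay points: (mean lat, mean lon, start time, end time)."""
    points = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        while j < n and haversine_m(fixes[i], fixes[j]) <= dist_m:
            j += 1
        run = fixes[i:j]  # every fix here lies within dist_m of the anchor fixes[i]
        if run[-1].t - run[0].t >= min_stay_s:
            lat = sum(f.lat for f in run) / len(run)
            lon = sum(f.lon for f in run) / len(run)
            points.append((lat, lon, run[0].t, run[-1].t))
            i = j
        else:
            i += 1
    return points
```

The stay points produced this way only carry a start and end time. The indoor POI study below reuses exactly those time windows to select the WiFi scans to cluster.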

3 Indoor POI Study

In this section, we present the techniques used to extract the indoor POI of users by processing the GPS and WiFi data collected through the smartphone application. Indoor POI extraction is performed by clustering WiFi fingerprints according to their pairwise cosine similarity scores. Table III lists the symbols used in this section and their descriptions.

TABLE III: Symbols and their description for clustering algorithm

| Symbol | Description |
|---|---|
| $\epsilon$ | Similarity threshold |
| $F$ | WiFi fingerprint |
| $R$ | RSS (average) in dBm |
| $p$ | Distinct MAC address count |
| $D$ | Cosine similarity distance |
| $\alpha$, $\beta$ | Scan result from a list of scan results |
| $Y$ | Dot product of two WiFi fingerprints |
| $w$ | Number of common MAC addresses |
| $C$ | Cosine similarity score |

In following subsections, we present the details of the WiFi fingerprint clustering and the similarity metrics used in indoor POI extraction process.

3.1 Unsupervised Indoor POI Extraction

The work in [12] experimentally evaluated different clustering algorithms and chose DBSCAN [29] as the most suitable method because of its ability to form arbitrarily shaped clusters. For indoor POI extraction in this paper, we introduce a modified DBSCAN algorithm that clusters the WiFi RSS measurements within each GPS POI (i.e. the data are first clustered using GPS, as described in Section 2.2). We use the cosine similarity score between two WiFi scan results as the distance metric of the modified DBSCAN algorithm. A cluster (POI) is formed when a user stays in the same place for at least 20 minutes. Therefore, we set the minimum number of points required to form a cluster ($minPts$) to 4, based on the 5-minute WiFi scan interval, and make the cosine similarity threshold ($\epsilon$) adaptive according to Algorithm 3. These parameters were selected by the experimental evaluation presented in Table IV. Algorithm 1 describes the clustering procedure for a given set of WiFi data ($S$), similarity threshold $\epsilon$, and minimum number of points $minPts$. $P$ is the list of output clusters.

Input: similarity threshold (ε), minPts, WiFi list (S)
Output: cluster point list (P)
Initialize: visited points V_p = ∅, index z_1 = 0, P = ∅
while z_1 < size of S do
    α = S[z_1]
    if α ∉ V_p then
        add α to V_p
        N = get neighbours of α
        if size of N ≥ minPts then
            z_2 = 0
            while z_2 < size of N do
                β = N[z_2]
                if β ∉ V_p then
                    add β to V_p
                    Q = get neighbours of β
                    if size of Q ≥ minPts then
                        merge Q with N
                    end if
                else
                    z_2 = z_2 + 1
                end if
            end while
            add N to P
        end if
    else
        z_1 = z_1 + 1
    end if
end while
Algorithm 1: POI extraction from raw WiFi data

Algorithm 2 shows how the neighbour points are obtained. Its inputs are $\alpha$ and $S$, and its output is $N$, as used in Algorithm 1. The similarity metric is explained in Section 3.1.1. The computation is performed separately for each user. The worst-case run time complexity of the DBSCAN algorithm is $O(n^2)$, where $n$ is the number of WiFi scan results ($S$) for a given user.

Input: scan result (α), WiFi list (S)
Output: neighbour points (N)
N = ∅
for every index i in S do
    D = calculate similarity(α, S[i])
    ε = calculate threshold(α, S[i])
    if D ≥ ε then
        add S[i] to N
    end if
end for
Algorithm 2: Obtaining the neighbour points
Input: fingerprints (F_1, F_2)
Output: similarity threshold (ε)
if |F_1| ≤ A_L and |F_2| ≤ A_L then
    ε = ε_L
else
    ε = ε_H
end if
Algorithm 3: The process of threshold calculation (|F| is the number of AP in fingerprint F, and A_L is the low AP level)
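Algorithms 1 to 3 can be combined into one runnable sketch. Each scan result is treated as a MAC-to-RSS dictionary and clustered with DBSCAN under the adaptive cosine threshold. A_L = 35 is taken from Section 3.1.2. The values of ε_L and ε_H are not given, so `eps_low = 0.4` and `eps_high = 0.6` are hypothetical, as are all function names. The cluster expansion follows standard DBSCAN, so each scan is assigned to at most one cluster.

```python
import math
from typing import Dict, List

Fingerprint = Dict[str, float]  # MAC address -> RSS in dBm


def cosine(f1: Fingerprint, f2: Fingerprint) -> float:
    """Cosine similarity (Equations 6-9): dot product over shared MACs, norms over all MACs."""
    y = sum(f1[m] * f2[m] for m in f1.keys() & f2.keys())
    d1 = sum(r * r for r in f1.values())
    d2 = sum(r * r for r in f2.values())
    return y / (math.sqrt(d1) * math.sqrt(d2)) if d1 and d2 else 0.0


def threshold(f1: Fingerprint, f2: Fingerprint, a_low: int = 35,
              eps_low: float = 0.4, eps_high: float = 0.6) -> float:
    """Algorithm 3: eps_low when both scans see at most a_low AP, otherwise eps_high.
    a_low = A_L = 35 follows Section 3.1.2; eps_low and eps_high are hypothetical."""
    return eps_low if len(f1) <= a_low and len(f2) <= a_low else eps_high


def neighbours(i: int, scans: List[Fingerprint]) -> List[int]:
    """Algorithm 2: indices of scans (including i itself) similar enough to scans[i]."""
    return [j for j, b in enumerate(scans) if cosine(scans[i], b) >= threshold(scans[i], b)]


def wifi_dbscan(scans: List[Fingerprint], min_pts: int = 4) -> List[List[int]]:
    """Algorithm 1: DBSCAN over one user's WiFi scans; returns clusters as sorted index lists."""
    visited, assigned = set(), set()
    clusters: List[List[int]] = []
    for i in range(len(scans)):
        if i in visited:
            continue
        visited.add(i)
        seeds = neighbours(i, scans)
        if len(seeds) < min_pts:
            continue  # not a core point; it may still join a later cluster as a border point
        cluster: List[int] = []
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if j not in visited:
                visited.add(j)
                more = neighbours(j, scans)
                if len(more) >= min_pts:
                    seeds.extend(more)  # j is also a core point: grow the cluster
            if j not in assigned:
                assigned.add(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters
```

With `min_pts=4` and one scan every 5 minutes, a cluster needs roughly 20 minutes of mutually similar scans. A scan that matches only itself is left as noise.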

After obtaining the final list of clusters, a fingerprint is generated for each cluster (indoor POI) with a unique POI ID (i.e. the indoor POI ID is unique within a given GPS stay point). The POI fingerprint ($F$) is denoted as shown in Equation 3, where $M$ is the MAC address, $R$ is the corresponding average RSS in dBm, and $p$ is the number of distinct MAC addresses scanned at that POI.

$$F = \{\{M_1, R_1\}, \{M_2, R_2\}, \ldots, \{M_p, R_p\}\} \qquad (3)$$

3.1.1 Cosine Similarity

We use cosine similarity as the distance metric in the DBSCAN algorithm. The similarity score between two WiFi fingerprints $F_1$ and $F_2$ is calculated as follows.

$$F_1 = \{\{M^1_1, R^1_1\}, \{M^1_2, R^1_2\}, \ldots, \{M^1_u, R^1_u\}\} \qquad (4)$$
$$F_2 = \{\{M^2_1, R^2_1\}, \{M^2_2, R^2_2\}, \ldots, \{M^2_v, R^2_v\}\} \qquad (5)$$

where $u$ and $v$ denote the number of distinct MAC addresses in $F_1$ and $F_2$, respectively. The dot product ($Y$) of the RSS values at the MAC addresses common to the two fingerprints is calculated according to Equation 6, where $w$ denotes the number of common MAC addresses.

$$Y = \sum_{i=1}^{w} R^1_i \cdot R^2_i \qquad (6)$$

The dot product of each fingerprint's RSS vector with itself (its squared norm) is calculated for $F_1$ and $F_2$ according to Equations 7 and 8, respectively.

$$d_1 = \sum_{j=1}^{u} R^1_j \cdot R^1_j \qquad (7)$$
$$d_2 = \sum_{k=1}^{v} R^2_k \cdot R^2_k \qquad (8)$$

The cosine similarity ($C$) between the two WiFi fingerprints is calculated according to Equation 9.

$$C = \frac{Y}{\sqrt{d_1} \times \sqrt{d_2}}, \quad \text{where } 0 \leq C \leq 1 \qquad (9)$$
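Equations 6 to 9 translate directly into code. The sketch below assumes fingerprints are stored as MAC-to-RSS dictionaries; `cosine_similarity` is a hypothetical name. RSS values in dBm are negative, so every product in Equation 6 is positive. AP seen by only one fingerprint add to $d_1$ or $d_2$ but not to $Y$, which lowers $C$.

```python
import math
from typing import Dict


def cosine_similarity(f1: Dict[str, float], f2: Dict[str, float]) -> float:
    """Cosine similarity C between two WiFi fingerprints (MAC address -> RSS in dBm)."""
    common = f1.keys() & f2.keys()
    y = sum(f1[m] * f2[m] for m in common)   # Equation 6: dot product over common MACs
    d1 = sum(r * r for r in f1.values())     # Equation 7: all MACs of F1
    d2 = sum(r * r for r in f2.values())     # Equation 8: all MACs of F2
    if d1 == 0 or d2 == 0:
        return 0.0                           # an empty fingerprint matches nothing
    return y / (math.sqrt(d1) * math.sqrt(d2))  # Equation 9
```

For example, `cosine_similarity({"a": -40.0, "b": -30.0}, {"a": -40.0})` gives $1600 / (50 \times 40) = 0.8$. Two fingerprints with no MAC address in common give 0.0.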

3.1.2 Impact of Cosine Similarity Threshold

We evaluated different cosine similarity threshold values and their impact on indoor POI extraction. Table IV compares an adaptive threshold with a fixed threshold ($\epsilon = 0.5$) against the ground truth labels of user C in Table II. With the adaptive threshold, the indoor POI extraction result aligns with the ground truth. With the fixed threshold of 0.5, an additional POI (ID 05) is identified during the home stay from 18:59 to 20:39, so a different POI ID appears in the home environment. This is because fewer WiFi AP (fewer than $A_L = 35$) are observed in that environment. Therefore, when the similarity threshold is fixed, changes in the size of the scanned AP list alone affect WiFi cluster formation (e.g. residential AP lists are substantially smaller than those in shopping malls or offices).

TABLE IV: Impact of cosine similarity threshold for indoor POI extraction

| Ground Truth | Start Time (HH:mm) | End Time (HH:mm) | POI ID ($\epsilon$ = adaptive) | POI ID ($\epsilon$ = 0.5) |
|---|---|---|---|---|
| Home | 00:00 | 09:23 | 01 | 01 |
| Office | 09:59 | 11:34 | 02 | 02 |
| Meeting Room | 11:58 | 14:57 | 03 | 03 |
| Canteen | 15:29 | 16:39 | 04 | 04 |
| Office | 16:44 | 17:09 | 02 | 02 |
| Home | 17:49 | 18:54 | 01 | 01 |
| Home | 18:59 | 20:39 | 01 | 05 |
| Home | 20:48 | 23:53 | 01 | 01 |
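An idealized calculation shows why a fixed threshold is fragile when only a few AP are visible. This is an illustrative assumption, not taken from the measurements. Suppose two fingerprints each list $n$ AP, all with the same RSS $r$, and $j$ of the AP in each fingerprint are not shared with the other. Equations 6 to 9 then give

$$Y = (n-j)\,r^2, \qquad d_1 = d_2 = n\,r^2, \qquad C = \frac{(n-j)\,r^2}{\sqrt{n\,r^2}\,\sqrt{n\,r^2}} = 1 - \frac{j}{n}$$

With $n = 4$ and $j = 2$, $C = 0.5$, which already sits at the fixed threshold. With $n = 40$, the same two unshared AP still give $C = 0.95$. Each AP that appears or disappears therefore moves $C$ much more in a small residential AP list than in a large mall or office list. This is in line with the extra home POI (ID 05) observed with $\epsilon = 0.5$.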

3.1.3 Popular POI Among Users

Knowing which POI are popular among users is as important as knowing individual indoor POI when conducting user mobility analysis. We use the Louvain method for community detection [30] to gain insights into the popular POI among users. In a given indoor environment, let the number of POI be $\lambda$. The number of pairwise cosine similarity scores ($I$) is then calculated according to Equation 10.

$$I = \frac{\lambda!}{2!\,(\lambda-2)!} = \frac{\lambda(\lambda-1)}{2} \qquad (10)$$

The Louvain algorithm takes the $I$ pairwise similarity scores as input. It obtains the optimum partitioning of the POI (nodes) by comparing the pairwise similarity scores (edges), and it outputs the modularity. The results of indoor POI extraction are presented in the following subsection.
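The input size in Equation 10 and the modularity score used to compare partitions can be checked with a short sketch. It assumes that a partitioning threshold keeps, as weighted edges, only the POI pairs whose cosine similarity reaches it. The paper does not spell this out. The sketch computes the standard weighted Newman modularity of a given partition and does not reimplement the Louvain optimization itself; a library such as NetworkX provides `louvain_communities` for that. All function names here are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]


def pair_count(num_poi: int) -> int:
    """Equation 10: number of unordered POI pairs, lambda! / (2! (lambda - 2)!)."""
    return math.comb(num_poi, 2)


def similarity_graph(sim: Dict[Tuple[Hashable, Hashable], float], threshold: float) -> List[Edge]:
    """Keep pairwise similarities at or above the threshold as weighted edges (assumed rule)."""
    return [(u, v, w) for (u, v), w in sim.items() if w >= threshold]


def modularity(edges: List[Edge], communities: List[Set[Hashable]]) -> float:
    """Weighted Newman modularity: Q = sum over communities of L_c / m - (D_c / 2m)^2,
    where m is the total edge weight, L_c the internal edge weight and D_c the total degree."""
    m = sum(w for _, _, w in edges)
    if m == 0:
        return 0.0
    label = {node: c for c, members in enumerate(communities) for node in members}
    internal = [0.0] * len(communities)
    degree = [0.0] * len(communities)
    for u, v, w in edges:
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            internal[label[u]] += w
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree))
```

`pair_count(41)` returns 820. This matches the 41 POI and 820 pairwise similarities reported for Changi City Point in Section 3.2.2.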

3.2 Results

We collected WiFi and GPS data from a set of users (using different smartphone models), together with ground truth labels for the POI they visited. The experimental results for single-user and multi-user indoor POI extraction are presented in the following subsections.

3.2.1 Single User POI

For single-user POI identification, we conducted an experiment to identify the POI when a single user visits the same POI multiple times, and compared the WiFi clustering results with the ground truth. Table V presents the single-user POI identification for user H, covering POI visits in Changi General Hospital from January to October 2019. According to Table V, different locations (i.e. WiFi clusters) inside the building receive different POI IDs. The proposed clustering technique is capable of detecting when the user revisits POI IDs 04, 07, 10, 11, and 12.

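TABLE V: Single user indoor POI results

| Ground Truth | Date (yyyy-mm-dd) | Start Time (HH:mm) | End Time (HH:mm) | POI ID |
|---|---|---|---|---|
| Center for Innovation | 2019-01-22 | 11:32 | 11:57 | 04 |
| Center for Innovation | 2019-02-21 | 16:09 | 17:36 | 04 |
| Center for Innovation | 2019-08-06 | 10:02 | 10:52 | 04 |
| Level 6 Room | 2019-04-02 | 10:58 | 12:23 | 06 |
| Main Board Room | 2019-05-08 | 16:58 | 18:48 | 07 |
| Main Board Room | 2019-08-13 | 15:10 | 15:35 | 07 |
| Level 8 Room | 2019-07-11 | 09:44 | 10:24 | 10 |
| Level 8 Room | 2019-07-18 | 08:39 | 09:32 | 10 |
| Level 7 Room | 2019-07-24 | 15:14 | 16:14 | 11 |
| Level 7 Room | 2019-07-31 | 08:48 | 09:59 | 11 |
| Level 7 Room | 2019-08-01 | 11:57 | 14:14 | 11 |
| Ward 45 | 2019-08-02 | 13:12 | 15:54 | 12 |
| Ward 45 | 2019-10-15 | 15:15 | 15:54 | 12 |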

3.2.2 Multi User POI

In this subsection, we identify the indoor POI that are popular among multiple users. We select Changi City Point (CCP), a shopping mall that 11 users from our experiment (i.e. A to L in Table II) visited for shopping or dining over a period of 3 months. The clustering results detected 41 indoor POI at CCP. By Equation 10, these give 820 pairwise similarities, which are the input to the Louvain algorithm for community detection. To find the optimum partition into communities, we evaluated the modularity score for different similarity threshold values, as shown in Table VI. The modularity score peaks at a threshold of 0.3. However, POI differ in area (e.g. a food court is larger than a clothing shop), and our objective is to detect even the smallest POI visited by users. Therefore, we selected 0.5 as the partitioning threshold for community detection.

TABLE VI: Louvain modularity score for different partitioning thresholds

| Threshold Value | Modularity Score |
|---|---|
| 0.2 | 0.625 |
| 0.3 | 0.803 |
| 0.4 | 0.766 |
| 0.5 | 0.692 |

Table VII shows the details of the common POI visited by the 11 users in Changi City Point. The shopping mall is a three-storey building with Basement 1 (B1), Level 1 (L1), and Level 2 (L2). Users in the study followed their normal routine when visiting the mall for shopping or dining. From the table, we observe that users H, J, and K visited 3 different restaurants in B1 (at 3 different times), denoted by indoor POI IDs 00, 11, and 29, respectively. Restaurant 4 in L1 obtained the same indoor POI ID (i.e. 11) as Restaurant 2 in B1, which user J visited. The shopping mall layout shows that, although these two restaurants are on different levels, they are located directly above one another, as shown in Figure 3. There is also a wide opening between them, which leads to similar WiFi measurements at the two places. Another observation is that a large area such as the food court (almost half the size of L2) is divided into multiple POI. This is because users sat at various places, and the WiFi RSS measurements fluctuate due to the large crowd.

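TABLE VII: Common POI among different users in Changi City Point shopping mall

| Floor Level | Ground Truth | POI ID | User(s) |
|---|---|---|---|
| B1 | Restaurant 1 | 00 | H |
| B1 | Restaurant 2 | 11 | J |
| B1 | Restaurant 3 | 29 | K |
| B1 | Drink Shop | 05 | G |
| B1 | Utility Store | 13 | K |
| L1 | Restaurant 4 | 11 | B, H |
| L1 | Clothing Shop 1 | 27 | A |
| L2 | Clothing Shop 2 | 08 | C |
| L2 | Clothing Shop 3 | 09 | C, D, I |
| L2 | Food Court | 01, 02 | A |
| L2 | Food Court | 03, 05 | C |
| L2 | Food Court | 07, 10 | D |
| L2 | Food Court | 12, 15 | F |
| L2 | Food Court | 21, 23 | J |
| L2 | Food Court | 24, 25 | K |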
Figure 3: Changi City Point Basement 1 and Level 1 Layout.

4 Neighborhood Activity Study

In an urban area, the majority of residents tend to visit places near their home or office for shopping and leisure activities during their free time. This matters especially in dense areas, where high-rise residential buildings are common. A conventional GPS clustering approach may treat such a building as one POI, whereas in reality there are many possible POI in a multi-storey setting (e.g. a convenience store, common area, BBQ pit etc.). This is due to the two-dimensional nature of GPS data: GPS data alone cannot provide accurate information on stay points at a micro level. Moreover, it is useful to understand the user stay points in the residential (home) neighborhood. We define such stay points or short-duration places as neighborhood activity, and exploit WiFi fingerprints along with GPS data to identify them.

4.1 Neighborhood Activity Data Processing Architecture

In order to extract the neighborhood activity from the trajectory, we leverage the concept of sensor fusion to combine GPS and WiFi information sources. The overall process of the neighborhood activity extraction is illustrated in Figure 4.

Figure 4: Neighborhood activity data processing pipeline.

Two main data sources are used in the processing stage: GPS stay points and WiFi stay points. First, we identify the GPS stay points and label them to understand the characteristics of each GPS POI. Subsequently, we filter them by time to study a particular POI. Note that while we use a user's house as the POI of interest, it could also be an office or any other GPS POI. The filtered GPS stay points are fused with WiFi stay points (generated according to Algorithm 1) to produce GPS+WiFi stay points. Among all the GPS and WiFi stay points, home or office can easily be deduced heuristically from the duration of stay, since both are the stay points with the longest stay durations. The remaining GPS and WiFi stay points are the neighborhood POI. The remaining raw GPS points (moving points) that occur between the neighborhood POI and the specific POI can be further converted into heat maps. These heat maps capture potential neighborhood activity that does not form a stay point.
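The two heuristics in this pipeline can be sketched briefly: the longest stay is taken as the anchor POI (e.g. home), and the leftover moving GPS points are binned into a heat map. The `StayPoint` record, the function names, and the 0.0005-degree grid cell (about 50 m) are hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StayPoint:
    source: str        # "gps" or "wifi" (GPS+WiFi stay points are a mix of both)
    lat: float
    lon: float
    duration_s: float


def split_anchor(stays: List[StayPoint]) -> Tuple[StayPoint, List[StayPoint]]:
    """The longest stay is the anchor POI (e.g. home); all other stays are neighborhood POI."""
    ranked = sorted(stays, key=lambda s: s.duration_s, reverse=True)
    return ranked[0], ranked[1:]


def heat_map(moving: List[Tuple[float, float]], cell_deg: float = 0.0005) -> Dict[Tuple[int, int], int]:
    """Count moving (non-stay) GPS points per lat/lon grid cell; cell size is hypothetical."""
    return dict(Counter((int(lat // cell_deg), int(lon // cell_deg)) for lat, lon in moving))
```

Cells with high counts correspond to the hot spots discussed below. These are places that were passed through or visited briefly without meeting the stay time threshold.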

As a proof of concept, we perform a simple case study for user H, using WiFi and GPS data for a 6-hour period on 18 September 2019. Figure 5(a) compares GPS stay points with GPS+WiFi stay points. The GPS+WiFi data fusion method detects the neighborhood activity, which the GPS stay points method cannot. The stay points are visualized in Figure 5(b). Gray icons represent traveling GPS data, and pink and green icons denote home and the neighborhood POI, respectively. The neighborhood POI is located less than 100 m from the residence. With GPS stay points alone, the neighborhood activity appears at almost the same location as the home stay point, and is therefore clustered into the same stay point. Therefore, the neighborhood activity can be detected accurately using GPS+WiFi data sources, but not using GPS alone.

(a) Stay points extraction methods comparison
(b) Neighborhood activity’s visualization
Figure 5: Toy example of performing neighborhood activity extraction.

4.2 Results

The results of the neighborhood activity study are presented in the following subsections.

4.2.1 Single User Neighborhood Activity

(a) Raw GPS Data
(b) GPS Stay Points
(c) Neighborhood Activity by GPS+WiFi Stay Points
Figure 6: Comparison of raw data, GPS stay points, and neighborhood activity by GPS+WiFi stay points, for user H, from 01 April 2019 to 01 December 2019.

Using the aforementioned extraction techniques, we study user H over 8 months, from 01 April 2019 to 01 December 2019. The raw GPS data are extracted around the home location of user H, and unrelated GPS data are filtered out to focus on that particular region. Note that the time window of the home stay point is also applied to the WiFi data to study neighborhood activity. Figure 6 compares the neighborhood activity obtained by GPS and by GPS+WiFi. The raw data in Figure 6(a) show that user H traveled to places near home, but the GPS stay points in Figure 6(b) fail to detect such events. This may be because the POI the user traveled to are close to home, and thus indistinguishable in the GPS data. Using the GPS+WiFi stay points in Figure 6(c), we are able to detect the neighborhood POI that the user visited (green, blue, and purple icons). Also, since the WiFi data are fused with the GPS stay points, the specific POI locations can be identified precisely, and the remaining moving raw GPS points are converted into a heat map. The blue and green icons represent housing recreational facilities, while the purple icon refers to a nearby community mall. The heat map shows some hot spots that the user visited while in the home region but that do not form stay points. To verify these hot spots, we validated the corresponding locations with user H. Each of these neighborhood activities is shown in the blue patch at the top right corner. It turns out that user H visited these locations only briefly, with stay durations below the predefined stay time threshold. Hence, no stay points are formed, and these visits can only be observed through heat maps.
In a nutshell, we have demonstrated that combining GPS and WiFi stay points makes it possible to detect neighborhood activity, providing in-depth information about the daily trajectory of the user.

In contrast, GPS stay point detection captures no travel event around the neighborhood, as shown in Figure 5(a) and Figure 6(b). Therefore, GPS+WiFi data make it possible to detect neighborhood activity within a region and further enrich the context of a user's trajectory data.

4.2.2 Multi User Neighborhood Activity

Figure 7 shows the neighborhood activity obtained using the proposed method for three users who reside in the same neighborhood. These users are not among those listed in Table II, and the ground truth of their POI visits is unknown. Figure 7(a) shows the raw GPS points of the three users, and Figure 7(b) shows their GPS stay points. Figure 7(c) shows the GPS+WiFi stay points (blue pins) and the home location of each user (pink pins). Comparing the figures, we observe that WiFi data clean up many of the inaccuracies in the GPS stay points of the three users.

In areas 1, 5, and 6, the POI become clearer in Figure 7(c) than in Figure 7(b), as the WiFi information helps identify GPS stay points that belong to the same POI. The heat map in area 2 indicates that the users walk along the riverside, which is missing from Figure 7(b). In addition, while the users stay in area 3, there are quite a number of POI in area 3 as well, believed to be void decks directly beneath the users' homes. Again, these POI are not visible by GPS in Figure 7(b), since they are all identified as the users' homes. Finally, a new POI is identified in area 4.

(a) Raw GPS Data
(b) GPS Stay Points by 3 Users
(c) Neighborhood Activity by 3 Users using GPS+WiFi
Figure 7: Comparison of the neighborhood activity of 3 users, with data collected from 01 August 2020 to 15 March 2021. Note that the heat maps in GPS+WiFi are represented by three different colors (red, green, and blue) to indicate the trajectories of different users.

5 Micro Mobility Study

Users in the same residential neighborhood might share similar mobility patterns around the neighborhood. We define such mobility patterns as micro mobility of the neighborhood, and extract the mobility paths through a combination of WiFi and GPS data. The following subsections present the data processing technique and the results for micro mobility analysis.

5.1 Micro Mobility Path Extraction

To further study the mobility data using both WiFi and GPS data, we propose a data processing pipeline as shown in Figure 8 below.

Figure 8: Micro mobility analysis data processing pipeline

Using the trajectory data obtained from the GPS stay points, we extract the timeline to obtain the exact timestamps of the WiFi samples needed to study micro mobility. The neighborhood WiFi trajectory data of all users who live in the same neighborhood are then clustered together using DBSCAN. The WiFi based clustering process is shown in Algorithm 4.
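The timeline extraction step can be sketched as follows. This assumes each GPS stay point is known by its arrival and departure times. Under that assumption, the travel intervals are the gaps between consecutive stay points, and only WiFi scans whose timestamps fall strictly inside those gaps are kept for micro mobility clustering. The tuple layouts and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def travel_intervals(stay_points):
    """Gaps between consecutive stay points as (start_t, end_t) pairs.

    stay_points: list of (arrive_t, leave_t) tuples sorted by arrival time.
    """
    return [
        (leave, arrive)
        for (_, leave), (arrive, _) in zip(stay_points, stay_points[1:])
        if arrive > leave
    ]


def extract_travel_scans(scans, stay_points):
    """Keep WiFi scans recorded strictly between two consecutive stay points.

    scans: list of (timestamp, fingerprint) tuples sorted by timestamp.
    """
    times = [t for t, _ in scans]
    kept = []
    for start, end in travel_intervals(stay_points):
        lo = bisect_right(times, start)  # first scan after leaving
        hi = bisect_left(times, end)     # first scan at or after arriving
        kept.extend(scans[lo:hi])
    return kept


# Hypothetical example: home until t=3600 s, work from t=5400 s,
# one WiFi scan every 5 minutes.
stays = [(0, 3600), (5400, 9000)]
scans = [(t, f"scan@{t}") for t in range(0, 9001, 300)]
travel = extract_travel_scans(scans, stays)
print([t for t, _ in travel])  # [3900, 4200, 4500, 4800, 5100]
```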

Input: Trajectory WiFi ($S_T$) and GPS ($L_T$), $\epsilon$, $minPts$
Output: Processed WiFi based clusters ($N_P$)
$N_P = \emptyset$
Cluster list ($C$) = DBSCAN($S_T$, $\epsilon$, $minPts$)
for every index $i$ in $C$ do
    $L_n$ = nearestGPS($t$, $C[i]$, $L_T$)
    if accuracy $(a) \leq a_L$ then
        add average $C[i]$ to $N_P$
    else
        get lowest accuracy, add to $N_P$
    end if
end for
Algorithm 4: The process of extracting WiFi based micro mobility clusters

Since we want to identify similar trajectory paths, our objective in this scenario differs from that of identifying indoor stay points in Section 3. To understand micro mobility, we need to identify the travel path, not the stay point. In other words, our objective is to turn a messy GPS map (as shown in Figure 10(a)) into a clearer map (as shown in Figure 10(c)). Therefore, the DBSCAN parameters are different in this scenario. We set $minPts = 1$ because every WiFi scan result must be included in the clustering process. Our sampling rate is low (one scan every 5 min), so a user can travel a substantial distance between consecutive scans, and no scan can be discarded when identifying the mobility pattern. By evaluating the clustering results, we choose the threshold for cluster formation ($\epsilon$) so that it provides enough clusters to represent the user travel path while reducing the average distance error of the WiFi based GPS clusters. Moreover, the number of APs observed outdoors is below the low AP level ($A_L$).
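The clustering step can be sketched as DBSCAN [29] over WiFi scans, where two scans are neighbours when their fingerprint similarity is at least $\epsilon$. Treating $\epsilon$ as a similarity (not distance) threshold is an assumption. It is chosen because it matches the trend in Figure 9, where a larger $\epsilon$ yields more clusters. The Jaccard similarity over BSSID sets and the toy scans are hypothetical stand-ins for the paper's own similarity measure and data. With $minPts = 1$, every non-empty scan is a core point, so nothing is labelled noise and the clusters reduce to the connected components of the $\epsilon$-neighbourhood graph.

```python
from collections import deque


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of BSSIDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dbscan_wifi(scans, eps=0.3, min_pts=1):
    """DBSCAN over WiFi scans with a similarity (not distance) threshold.

    Two scans are neighbours when jaccard(...) >= eps. Returns one label
    per scan; -1 marks noise. With min_pts=1 every non-empty scan is a
    core point, so no scan is discarded.
    """
    n = len(scans)
    # Neighbourhood of each scan, including the scan itself (O(n^2)).
    neighbours = [
        [j for j in range(n) if jaccard(scans[i], scans[j]) >= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: expand further
                queue.extend(neighbours[j])
    return labels


# Hypothetical scans: three along one street, two near another block.
example_scans = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"b", "d", "e"},
    {"x", "y"}, {"x", "y", "z"},
]
print(dbscan_wifi(example_scans, eps=0.3))  # [0, 0, 0, 1, 1]
print(dbscan_wifi(example_scans, eps=0.6))  # [0, 1, 2, 3, 3]
```

In this toy case, raising the threshold from 0.3 to 0.6 splits the street into separate clusters, which is the qualitative behaviour reported in Figure 9.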

Once the clustering is completed, we obtain the nearest GPS point for each cluster member's timestamp. If a cluster has more than one member, we average the nearest GPS points with high accuracy (i.e. accuracy $\leq a_L = 25$ m) and represent the WiFi based cluster with this single GPS point. If all members of a cluster have low GPS accuracy (i.e. accuracy $> a_L = 25$ m), we keep the GPS point with the lowest accuracy value (i.e. the highest GPS accuracy) and discard the remaining members of the cluster.
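The mapping from each WiFi cluster to a single GPS point can be sketched as follows. For each scan, the nearest GPS fix in time is found. Fixes with a reported accuracy within $a_L = 25$ m are then averaged, and if none qualifies, the fix with the smallest accuracy value is kept. The record layout `(t, lat, lon, accuracy_m)` and the function names are assumptions. The sketch also assumes that a plain arithmetic mean of coordinates is adequate at neighbourhood scale.

```python
from bisect import bisect_left

A_L = 25.0  # GPS accuracy threshold a_L in metres, as stated in the text


def nearest_fix(t, gps, times):
    """GPS fix (t, lat, lon, accuracy_m) closest in time to timestamp t."""
    k = bisect_left(times, t)
    candidates = gps[max(k - 1, 0):k + 1]  # the fixes just before and after t
    return min(candidates, key=lambda fix: abs(fix[0] - t))


def representative_points(clusters, gps, a_l=A_L):
    """Map each WiFi cluster (a list of scan timestamps) to one (lat, lon).

    gps: non-empty list of (t, lat, lon, accuracy_m) sorted by t.
    """
    times = [fix[0] for fix in gps]
    points = []
    for scan_times in clusters:
        fixes = [nearest_fix(t, gps, times) for t in scan_times]
        good = [f for f in fixes if f[3] <= a_l]
        if good:
            # Average the high-accuracy fixes into a single point.
            lat = sum(f[1] for f in good) / len(good)
            lon = sum(f[2] for f in good) / len(good)
            points.append((lat, lon))
        else:
            # Only low-accuracy fixes: keep the smallest accuracy value.
            best = min(fixes, key=lambda f: f[3])
            points.append((best[1], best[2]))
    return points


# Hypothetical GPS fixes (t, lat, lon, accuracy_m) and two WiFi clusters.
gps_fixes = [
    (0, 1.3400, 103.9500, 10.0),
    (300, 1.3410, 103.9510, 20.0),
    (600, 1.3420, 103.9520, 60.0),
    (900, 1.3430, 103.9530, 80.0),
]
wifi_clusters = [[10, 290], [610, 880]]
reps = representative_points(wifi_clusters, gps_fixes)
```

The first cluster is averaged from two accurate fixes. In the second cluster both fixes exceed 25 m, so only the 60 m fix is kept.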

Figure 9 shows the number of clusters and the average distance error (in meters) of the cluster points for different threshold values of the WiFi based clustering.

Figure 9: Comparison of different threshold values with average distance error (m) and number of clusters

We observe from Figure 9 that as the threshold value increases, the number of clusters increases and the average distance error decreases. Our objective is to keep the number of clusters small (to obtain a clearer mobility path) while also reducing the average distance error. When $\epsilon = 0.25$, we obtain 140 clusters with an average distance error of 430.1 m. In contrast, when $\epsilon = 0.3$, we obtain 321 clusters with an average distance error of 241.1 m. Considering this trade-off, we select $\epsilon = 0.3$ as the threshold for WiFi based clustering. It yields enough clusters to turn a messy GPS micro mobility path into a clearer one while reducing the average distance error.
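The trade-off can be quantified from the reported values: 8345 raw GPS points (Figure 10), 140 clusters at 430.1 m for $\epsilon = 0.25$, and 321 clusters at 241.1 m for $\epsilon = 0.3$. The short calculation below is plain arithmetic on those numbers, with no new measurements.

```python
RAW_POINTS = 8345  # raw GPS points before clustering (Figure 10)
# eps -> (number of WiFi based clusters, average distance error in metres)
RESULTS = {0.25: (140, 430.1), 0.30: (321, 241.1)}


def point_reduction(clusters: int, raw: int = RAW_POINTS) -> float:
    """Fraction of raw GPS points removed by clustering."""
    return 1 - clusters / raw


for eps, (clusters, err) in RESULTS.items():
    print(f"eps={eps:.2f}: {clusters} points "
          f"({point_reduction(clusters):.1%} fewer), {err} m error")

(n_lo, e_lo), (n_hi, e_hi) = RESULTS[0.25], RESULTS[0.30]
error_drop = (e_lo - e_hi) / e_lo  # relative error reduction from 0.25 to 0.3
cluster_growth = n_hi / n_lo       # how many times more clusters at 0.3
print(f"error drop {error_drop:.1%}, cluster growth {cluster_growth:.2f}x")
```

Both thresholds remove over 96% of the plotted points. Moving from $\epsilon = 0.25$ to $\epsilon = 0.3$ cuts the average distance error by about 43.9%, at the cost of roughly 2.29 times as many clusters.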

5.2 Results

We study the mobility pattern of three users (denoted as A, B, and L in Table II) who live in the same neighborhood (i.e. the Simei area in Singapore) and work at the same place (i.e. Singapore University of Technology and Design). Most of the time, these three users commute by walking. The results of the WiFi based clustering are shown in Figure 10. Figure 10(a) shows the raw GPS data of the three users, consisting of GPS points recorded while each user traveled from home to work. Each color denotes a separate user (L: purple, A: green, B: black). Note that not all users have the same amount of data, despite sharing the same timeline, which is from 01 December 2019 to 31 December 2019.

(a) Raw GPS data (before WiFi based clustering), denoted by users Purple - L, Green - A, and Black - B.
(b) Zoomed-in satellite view of raw GPS data fluctuating along a sheltered walkway, denoted by users Purple - L, Green - A, and Black - B.
(c) 140 GPS points after WiFi based clustering ($\epsilon = 0.25$) by all three users.
(d) Zoomed-in satellite view of GPS points after WiFi based clustering ($\epsilon = 0.25$), aligned along the walkway.
(e) 321 GPS points after WiFi based clustering ($\epsilon = 0.3$) by all three users.
(f) Zoomed-in satellite view of GPS points after WiFi based clustering ($\epsilon = 0.3$), aligned along the walkway.
Figure 10: Comparison before and after WiFi based clustering for different threshold values. 8345 points are reduced to 140 points ($\epsilon = 0.25$) and 321 points ($\epsilon = 0.3$).

Based on the raw data, we perform the data processing described in the previous subsection and cluster the locations based on WiFi similarity to preserve the significant GPS points. Using the WiFi fingerprint clustering method, a total of 8345 raw GPS points are simplified into 140 ($\epsilon = 0.25$) and 321 ($\epsilon = 0.3$) clusters, as shown in Figure 10(c) and Figure 10(e), respectively. In other words, each point in Figures 10(c) and 10(e) represents one WiFi based cluster, mapped to the nearest GPS point by timestamp. Comparing the figures, we observe that WiFi based clustering represents the data more clearly than the messy representation obtained using GPS data alone.

Figure 10(b), Figure 10(d), and Figure 10(f) show zoomed-in satellite views of the points inside the red square area in Figure 10(a), Figure 10(c), and Figure 10(e), respectively. The red square area contains a sheltered walkway at the side of the road. Comparing Figures 10(b), 10(d), and 10(f), we see that when $\epsilon = 0.3$, the clustered points are aligned along the walkway. Therefore, WiFi based GPS clustering helps to identify the micro mobility patterns of users, which is not possible by only visualizing raw GPS data.

6 Discussion and Conclusion

In this paper, we introduce a mobile crowdsensing system that obtains three major insights for urban mobility analysis through WiFi fingerprint clustering. Data collected from a smartphone application (GPS location and surrounding WiFi access points) are used to identify the indoor POI within a building, obtain neighborhood activity, and understand the micro mobility patterns of the users.

We have demonstrated that, through the fusion of GPS data with WiFi AP information, it is possible to identify indoor POI among different users, which cannot be identified using GPS location data alone. We introduce neighborhood activity analysis to identify the POI that users visit for a short break while staying at home (e.g. a common area in the same building, but on a different floor level). Since urban apartment complexes are high rise buildings, GPS alone fails to identify such activities, whereas the combination of GPS and WiFi provides meaningful insights. Similarly, when a user walks under a sheltered walkway, GPS lacks positioning accuracy and fluctuates considerably around the actual physical location. Therefore, such mobility patterns cannot be captured by GPS stay point extraction alone. We demonstrated that it is possible to interpret user mobility paths through WiFi clustering based GPS points, for the purpose of identifying common trajectories. For future work, we aim to deploy the proposed system to a larger user group, build a POI recommendation platform, and conduct user profiling based on mobility patterns.

Acknowledgments

This research, led together with the Housing and Development Board, is supported by the Singapore Ministry of National Development and the National Research Foundation, Prime Minister's Office under the Land and Livability National Innovation Challenge (L2 NIC) Research Programme (L2 NIC Award No. L2NICTDF1-2017-4). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the Housing and Development Board, Singapore Ministry of National Development and National Research Foundation, Prime Minister's Office, Singapore.

References

  • [1] K. Farkas, G. Feher, A. Benczur, and C. Sidlo, “Crowdsending based public transport information service in smart cities,” IEEE Communications Magazine, vol. 53, no. 8, pp. 158–165, 2015.
  • [2] C. Leonardi, A. Cappellotto, M. Caraviello, B. Lepri, and F. Antonelli, “SecondNose: an air quality mobile crowdsensing system,” in Proceedings of the 8th Nordic Conference on Human-Computer Interaction: Fun, Fast, Foundational.   ACM, 2014, pp. 1051–1054.
  • [3] X. Hu, X. Li, E. Ngai, V. Leung, and P. Kruchten, “Multidimensional context-aware social network architecture for mobile crowdsensing,” IEEE Communications Magazine, vol. 52, no. 6, pp. 78–87, 2014.
  • [4] S. Hoteit, S. Secci, S. Sobolevsky, C. Ratti, and G. Pujolle, “Estimating human trajectories and hotspots through mobile phone data,” Computer Networks, vol. 64, pp. 296–307, 2014.
  • [5] C. Kang, S. Sobolevsky, Y. Liu, and C. Ratti, “Exploring human movements in Singapore: a comparative analysis based on mobile phone and taxicab usages,” in Proceedings of the 2nd ACM SIGKDD international workshop on urban computing.   ACM, 2013, p. 1.
  • [6] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems.   ACM, 2008, p. 34.
  • [7] Y. Lou, C. Zhang, Y. Zheng, X. Xie, W. Wang, and Y. Huang, “Map-matching for low-sampling-rate GPS trajectories,” in Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems.   ACM, 2009, pp. 352–361.
  • [8] C. M. Gamanayake, L. A. Jayasinghe, B. Ng, and C. Yuen, “Cluster pruning: An efficient filter pruning method for edge AI vision applications,” IEEE Journal of Selected Topics in Signal Processing, 2020.
  • [9] Ó. Helgason, S. T. Kouyoumdjieva, and G. Karlsson, “Opportunistic communication and human mobility,” IEEE Transactions on Mobile Computing, vol. 13, no. 7, pp. 1597–1610, 2013.
  • [10] N. Suzuki, K. Hirasawa, K. Tanaka, Y. Kobayashi, Y. Sato, and Y. Fujino, “Learning motion patterns and anomaly detection by human trajectory analysis,” in Systems, Man and Cybernetics, 2007. ISIC. IEEE International Conference on.   IEEE, 2007, pp. 498–503.
  • [11] P. Zhou, Y. Zheng, Z. Li, M. Li, and G. Shen, “IODetector: A generic service for indoor outdoor detection,” in Proceedings of the 10th ACM conference on embedded network sensor systems.   ACM, 2012, pp. 113–126.
  • [12] B. P. L. Lau, M. S. Hasala, V. S. Kadaba, B. Thirunavukarasu, C. Yuen, B. Yuen, and R. Nayak, “Extracting point of interest and classifying environment for low sampling crowd sensing smartphone sensor data,” in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), March 2017, pp. 201–206.
  • [13] H. Shin, Y. Chon, and H. Cha, “Unsupervised construction of an indoor floor plan using a smartphone,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 889–898, Nov 2012.
  • [14] S. H. Marakkalage, S. Sarica, B. P. L. Lau, S. K. Viswanath, T. Balasubramaniam, C. Yuen, B. Yuen, J. Luo, and R. Nayak, “Understanding the lifestyle of older population: Mobile crowdsensing approach,” IEEE Transactions on Computational Social Systems, 2018.
  • [15] S. H. Marakkalage, R. Liu, S. K. Viswanath, and C. Yuen, “Identifying indoor points of interest via mobile crowdsensing: An experimental study,” in 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS).   IEEE, 2019, pp. 1–5.
  • [16] M. Alzantot and M. Youssef, “CrowdInside: automatic construction of indoor floorplans,” in Proceedings of the 20th International Conference on Advances in Geographic Information Systems.   ACM, 2012, pp. 99–108.
  • [17] J. Y. Zhu, A. X. Zheng, J. Xu, and V. O. Li, “Spatio-temporal (ST) similarity model for constructing WiFi-based RSSI fingerprinting map for indoor localization,” in Indoor Positioning and Indoor Navigation (IPIN), 2014 International Conference on.   IEEE, 2014, pp. 678–684.
  • [18] R. Liu, S. H. Marakkalage, M. Padmal, T. Shaganan, C. Yuen, Y. L. Guan, and U.-X. Tan, “Crowd-sensing simultaneous localization and radio fingerprint mapping based on probabilistic similarity models,” in Proceedings of the ION 2019 Pacific PNT Meeting, Honolulu, Hawaii, April 2019, pp. 73–83.
  • [19] R. Liu, C. Yuen, T. Do, and U. Tan, “Fusing similarity-based sequence and dead reckoning for indoor positioning without training,” IEEE Sensors Journal, vol. 17, no. 13, pp. 4197–4207, July 2017.
  • [20] R. Liu, S. H. Marakkalage, M. Padmal, T. Shaganan, C. Yuen, Y. L. Guan, and U.-X. Tan, “Collaborative SLAM based on WiFi fingerprint similarity and motion information,” IEEE Internet of Things Journal, 2019.
  • [21] X. Tian, X. Wu, H. Li, and X. Wang, “RF fingerprints prediction for cellular network positioning: A subspace identification approach,” IEEE Transactions on Mobile Computing, vol. 19, no. 2, pp. 450–465, 2019.
  • [22] R. K. Ganti, F. Ye, and H. Lei, “Mobile crowdsensing: current state and future challenges,” IEEE Communications Magazine, vol. 49, no. 11, 2011.
  • [23] B. P. L. Lau, S. H. Marakkalage, Y. Zhou, N. U. Hassan, C. Yuen, M. Zhang, and U.-X. Tan, “A survey of data fusion in smart city applications,” Information Fusion, vol. 52, pp. 357–374, 2019.
  • [24] C. Wu, Z. Yang, and Y. Liu, “Smartphones based crowdsourcing for indoor localization,” IEEE Transactions on Mobile Computing, vol. 14, no. 2, pp. 444–457, 2014.
  • [25] S. H. Marakkalage, B. P. L. Lau, S. K. Viswanath, C. Yuen, and B. Yuen, “Real-time data analysis using a smartphone mobile application,” in Ageing and the Built Environment in Singapore.   Springer, 2019, pp. 221–240.
  • [26] Google, “Wi-Fi Scanning,” https://goo.gl/RqxNk2, 2018, [Online; accessed 01-November-2018].
  • [27] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining interesting locations and travel sequences from GPS trajectories,” in Proceedings of the 18th International Conference on World Wide Web, ser. WWW ’09.   New York, NY, USA: Association for Computing Machinery, 2009, pp. 791–800.
  • [28] B. P. L. Lau, M. S. Hasala, V. S. Kadaba, B. Thirunavukarasu, C. Yuen, B. Yuen, and R. Nayak, “Extracting point of interest and classifying environment for low sampling crowd sensing smartphone sensor data,” in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops).   IEEE, 2017, pp. 201–206.
  • [29] M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, vol. 96, no. 34, 1996, pp. 226–231.
  • [30] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of statistical mechanics: theory and experiment, vol. 2008, no. 10, p. P10008, 2008.