This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11institutetext: Indraprastha Institute of Information Technology, Delhi 22institutetext: Indian Institute of Technology, Delhi 33institutetext: Indian Institute of Technology, Hyderabad 44institutetext: Queen’s University, Belfast
44email: [email protected], 44email: [email protected], 44email: [email protected], 44email: [email protected], and 44email: [email protected]

Modeling Implicit Communities using Spatio-Temporal Point Processes
from Geo-tagged Event Traces

Ankita Likhyani 11    Vinayak Gupta 22    Srijith PK 33    Deepak P 44    Srikanta Bedathur 22
Abstract

The location check-ins of users through various location-based services such as Foursquare, Twitter and Facebook Places, etc., generate large traces of geo-tagged events. These event-traces often manifest in hidden (possibly overlapping) communities of users with similar interests. Inferring these implicit communities is crucial for forming user profiles for improvements in recommendation and prediction tasks. Given only time-stamped geo-tagged traces of users, can we find out these implicit communities, and characteristics of the underlying influence network? Can we use this network to improve the next location prediction task? In this paper, we focus on the problem of community detection as well as capturing the underlying diffusion process and propose a model CoLAB based on spatio-temporal point processes in continuous time but discrete space of locations that simultaneously models the implicit communities of users based on their check-in activities, without making use of their social network connections. CoLAB captures the semantic features of the location, user-to-user influence along with spatial and temporal preferences of users. To learn the latent community of users and model parameters, we propose an algorithm based on stochastic variational inference. To the best of our knowledge, this is the first attempt at jointly modeling the diffusion process with activity-driven implicit communities. We demonstrate CoLAB achieves upto 27% improvements in location prediction task over recent deep point-process based methods on geo-tagged event traces collected from Foursquare check-ins.

1 Introduction

Proliferation of smartphone usage and pervasive data connectivity have made it possible to collect enormous amounts of mobility information of users with relative ease. Foursquare announced in early 2018 that it collects more than 33 billion events every month from its 2525 million users111https://bit.ly/2BdhnnP (accessed in February 2019). These events generate a location information diffusion process through an underlying –possibly hidden– network of users that determines an location adoption behavior among users. Location adoption primarily depends upon user’s spatial, temporal and categorical preferences. For instance, one user’s check-in at a newly opened jazz club could inspire another user to visit the same club or a similar club in her vicinity depending upon the distance from the club and time of the day/week. These users might not be having a social connection but it’s an implicit influence because of similar choices. This often leads to the formation of –possibly overlapping– communities of users with similar behavior. Detecting community of such like minded people from large geo tagged events can benefit applications of various domains such as targted advertisements and friend recommendation. Prior work [39] also suggests that social connections are not as effective factors for prediction tasks.

In this paper, we move away from communities derived purely from social network links, and instead explicitly identify spatio-temporal activity-driven communties. Eschewing the reliance on social connections alone allows us to learn communities that are not overly biased towards connected users. At the same time, it supports the use of CoLAB under settings where social network is not available – either due to privacy settings of users, or due to other restrictions from platform.

Example 1

We illustrate the spatio-temporal activity-based network and communities that CoLAB derives using the US dataset we collected, shown in Figure 1. We observe that explicit social connections, shown using black edges, are very sparse and are unable to capture the implicit influence network. On the other hand, the latent network derived by CoLAB, shown using gray edges, can identify significantly higher number of relations in the latent influence network. It can be observed that it is not only captures the clusters but also identify potential influencers (and their influence networks) which can critically help in prediction tasks.

Refer to caption
(a) threshold = 0.5
Refer to caption
(b) threshold = 0.7
Refer to caption
(c) threshold = 0.9
Figure 1: CoLAB extracts underlying diffusion network over US data using geotagged checkin traces. This Maximum Weighted Spanning Forest (MWSF) is constructed by varying the threshold for edge weights (i.e. influence score computed using CoLAB). The inferred influence network depicts a tree-like structure of influence.

In this paper we determine that the location adoption process and community formation among users can be explained by the same latent factors underlying the observed behavior of users without considering their social network information. We propose CoLAB (Communities of Location Adoption Behaviour) that focuses on jointly inferring location specific influence scores between users and the communities they belong to, based solely on their activity traces. CoLAB completely disregards the social network information, which makes it suitable for scenarios where only activity traces are available. Note that if social network is available, it can be used as a prior or a regulization over the influence matrix we derive. Unlike the current best-performing models for location prediction task (e.g., [36]), CoLAB avoids the community formation from being biased only by the availability of social connections. Thus we generate better communities even for users with few or no social connections as shown in figure 1.

Further, we generate overlapping communities of users that take into account the spatio-temporal patterns, and the shared special interests over location categories. Although a user may be part of multiple communities with different special interests, we assume each event or check-in to be associated with only one of those communities. Prior works have focused on modeling temporal + textual together [39] or temporal + spatial features together [35], none of the techniques to the best of our knowledge have modeled the entire combination of temporal, spatial and location semantics in a spatio-temporal point process to infer the underlying influence network.

Contributions. Our main contributions are as follows:

  1. 1.

    We propose a novel model called CoLAB to model activity patterns over geo-tagged event traces. It leverages spatio-temporal Hawkes process[5, 28] to not only construct a information diffusion-based latent network but also recover overlapping community structures within it.

  2. 2.

    We develop a novel stochastic variational inference technique to learn the latent communities and model parameters.

  3. 3.

    As our target is to identify communities that comprise users who share interests as opposed to just being socially connected, there is unfortunately no gold-standard community information to evaluate our results. Therefore, we first empirically evaluate our method over synthetic data; further, results shows our inference algorithm can accurately recover the model parameters. For communties evaluation on real data, we make use of a joint loss function that evaluates on the basis intra-community properties defined in terms of users’ category affinity and spatial dispersment in their checkin characteristics.

  4. 4.

    We evaluate on two real-world geo-tagged event traces collected from two countries – viz., SA (Saudi Arabia) and US. The experimental results demonstrates that we achieve upto 27% improvement over neural network based models.

2 Related Work

In this section, we briefly review existing literature on community detection and characterization of location adoption.

2.1 Community Detection

Within general social networks, there has been much work on detecting communities as overlapping or non-overlapping clusters of users such that there is a high degree of connectedness between them. Techniques have largely considered social-network connectedness as the main driver in forming communities. Community detection techniques could adopt a top-down approach starting from the entire graph and form communities by separating them into more coherent subsets that would eventually become communities. Methods from this school include those that use graph-based methods [27] and filtering out edges based on local informations such as number of mutual friends [40]. Analogously, community discovery could proceed bottom-up by aggregating proximal nodes to form communities. Techniques within this category have explored methods such as merging proximal cliques [16] or by grouping nodes based on affinities to ’leader’ nodes [15]. Point processes[6], which are popular models for sequential and temporal data, have been recently explored for community detection in general social networks[29]. NetCodec, the method proposed therein, targets to simultaneously detect community structure and network infectivity among individuals from a trace of their activities. The key idea is to leverage multi-dimensional Hawkes process in order to model the relationship between network infectivity vis-a-vis community memberships and user popularity, in order to address the community discovery and network infectivity estimation tasks simultaneously. Our task of community detection within LBSNs is understandably more complex due to the primacy of spatial information in determining LBSN community structures. Accordingly, we leverage a spatio-temporal variant of multi-dimensional Hawkes process, in devising our solution.

LBSN Community Detection: There has been existing work  [25, 31, 23, 19] on community detection in LBSNs that leverage features such as venue categories, temporal features, geo-span, social-status, structure based and location based features in determining community structure within clustering formulations. They have used standard ML based techniques such as Spectral clustering [25], M2M^{2}[31]: a k-means based clustering, Entropy-based [23], and Sequence-Mining [19] based techniques. Our method, as illustrated therein, models a wider variety of features, incorporating both spatio-temporal check-in information as well as venue category information, in inferring communities and user-user influence.

2.2 Location Adoption Characterization

With each LBSN check-in being associated with a location, the location information is central to LBSNs. There has been much research into modelling user-location correlations in various forms, which may be referred to as location adoption in general.

[14] is the closest work to our work, where authors determine the patterns from geo located posts from twitter. Our model deals with discrete locations unlike [14] which simply considers sampling of spatial locations from a continuous distribution which can not be used for modeling check-in locations. The intensity function in CoLAB seamlessly integrates temporal and spatial components of the model and jointly learns the parameters whereas in [14] location and time are modeled separately. Moreover, we use multi-dimensional Hawkes Process as intensities differs across users. The parameter estimation of multi-dimensional Hawkes Process is much harder and Monte Carlo Simulations based methods used in [14] are very slow. We introduce the use of stochastic variational inference (SVI) [2] to overcome this. [11] and [10] exploit the role of social correlations and temporal cyclical behaviors in improving upon the task of predicting the next location that a user would check-in. Location categories (e.g., restaurant, pub etc.) has also been seen to be useful for next location prediction[22]. [34] exploit geographical influence across locations in recommending POIs (points of interest) to LBSN users.

Another task that has attracted significant attention within LBSNs is that of quantifying user influence in LBSNs. [37] consider using geo-social correlation between users to estimate mutual influence. [32] leverage the observation that mutual influence is central to successful recommendation to identify a set of influential seed nodes using information cascade models. [3] adopt influence propagation models to address a slight variant, that of identifying region-specific influential users. [18] and [30] propose other kinds of models for the influence maximization task within LBSNs. ocation Promotion [42][21] and Trip Purpose Detection [13] form other les addressed tasks within the context of LBSNs. Mobility models that mine spatial patterns based on generative models [11], Gaussian distributions [4] and kernel density based estimations [20] have been particularly popular in modeling location adoption behavior within LBSNs.

3 Problem Statement

Consider an geo-tagged event trace dataset SS over a set of LL locations 𝕃={k}k=1L={(xk,yk)}k=1L\mathbb{L}=\{\ell_{k}\}_{k=1}^{L}=\{(x_{k},y_{k})\}_{k=1}^{L}, a set of II users 𝕌={ik}k=1I\mathbb{U}=\{i_{k}\}_{k=1}^{I} and VV categories (restaurants, entertainments etc.). Let us consider there are NN events, with the nthn^{th} check-in denoted as En=(tn,n,cn,in,gn)E_{n}=(t_{n},\ell_{n},c_{n},i_{n},g_{n}) and 𝔼={En}n=1N\mathbb{E}=\{E_{n}\}_{n=1}^{N}. The notation denotes that EnE_{n} is the check-in event involving the user ini_{n} checking in to location n\ell_{n} at time tnt_{n}, with the category associated with the location being cnc_{n} and the latent community of the user associated with the check-in is gng_{n}. The task is to learn the latent community associated with the users and effectively model the diffusion of information among users. Towards this, we aim to learn a matrix ϕ\phi of size |𝕌||\mathbb{U}| x |M||M| where ithi^{th} row represents community participation for the ithi^{th} user (assuming MM communities). In addition, to model the diffusion process, we estimate matrix AijA_{ij}, where an element aija_{ij} represents the influence of ithi^{th} user on jthj^{th} user.

4 Preliminaries

Symbol Description
𝕌\mathbb{U} set of II users
𝕃\mathbb{L} set of LL locations
𝔾\mathbb{G} set of MM communities (latent)
λi(t,x,y)\lambda_{i}(t,x,y) intensity at time tt and spatial coordinate x,yx,y of user ii
μi\mu_{i} base rate of checkins of user ii
AijA_{ij} influence of user ii on user jj
κ(t,x,y)\kappa(t,x,y) triggering / self exciting kernel
Table 1: Terminology

Hawkes Process [12]. It is a point process with self triggering property, has been explored to model temporal occurrences of events in social media [zhao15]. The conditional intensity is modelled as: λ(t)=μ+tk<tκ(ttk),\lambda(t)=\mu+\sum_{t_{k}<t}\kappa(t-t_{k}), where μ\mu is the base intensity of event occurrence, with κ()\kappa() (kernel function) modelling the influence of past events using an exponential decay function.

Multidimensional Hawkes Process. This extension allows modelling of influence of events across entities (e.g., users in the social media case) by considering time-stamped events from across entities [41]; this naturally yields to modelling infectivity across users. The intensity function for an entity ii at time tt depends on past events as: λi(t)=μi+tk<tAikiκ(ttk),\lambda_{i}(t)=\mu_{i}+\sum_{t_{k}<t}A_{i_{k}i}\kappa(t-t_{k}), where (tk,ik)(t_{k},i_{k}) represents historical user/entity events on time scale, μi\mu_{i} >> 0 being the base intensity for the ithi^{th} entity/user, the non-negative influence matrix AijA_{ij} quantifying the quantum of influence from user/entity ii to jj.

Spatio-Temporal Hawkes Process:. This extends the basic Hawkes process along the spatial dimension[5]. These simultaneously model spatio-temporal clustering behavior and has been found useful for modelling dynamics of epidemics and crime. The intensity function takes the form:

λ(,t|Ht)=μ+tk<tκ(k,ttk),\lambda(\ell,t|H_{t})=\mu+\sum_{t_{k}<t}\kappa(\ell-\ell_{k},t-t_{k}),

where k\ell_{k} is the location associated with the event at tkt_{k}.

5 CoLAB Model

In this section we describe our model to infer the ϕ\phi matrix i.e. the communities vector inferred from check-in activities and information diffusion over the users in the network.

5.1 Spatio-Temporal Data Modeling

Given the check-in events EE as defined earlier, for modeling time and spatial components w.r.t. communities, we define the Hawkes process based model as follows:

5.1.1 Intensity Function

We model the user’s community-specific intensity using the multi-dimensional spatio-temporal Hawkes process. Multi-dimensional because influence from other users also contribute in the intensity of a user [41]. Consider the task of estimating the community-specific intensity of a user ini_{n} towards generating a check-in En=(tn,n,cn,in,gn)E_{n}=(t_{n},\ell_{n},c_{n},i_{n},g_{n}); the multi-dimensional spatio-temporal Hawkes process formulation yields the following:

λin,gn(tn,n)=μinηgn+tk<tnAikinκ(tntk,nk)𝕀(gk=gn)\lambda_{i_{n},g_{n}}(t_{n},\ell_{n})=\mu_{i_{n}}\eta_{g_{n}}+\sum_{t_{k}<t_{n}}A_{i_{k}i_{n}}\kappa(t_{n}-t_{k},\ell_{n}-\ell_{k})\mathbb{I}(g_{k}=g_{n}) (1)

where μin\mu_{i_{n}} is the base intensity of user ini_{n} and ηgn\eta_{g_{n}} is weight associated to gng_{n} towards a community gng_{n}, and 𝜼={ηg|g=1,,M}\boldsymbol{\eta}=\{\eta_{g}|g=1,...,M\} with ηg0\eta_{g}\geq 0. λin,gn(tn,n)\lambda_{i_{n},g_{n}}(t_{n},\ell_{n}) is community specific intensity of user ii at the nthn^{th} instance. We allow historical check-ins to contribute to the intensity - proportionate to their temporal and spatial proximity to tnt_{n} and n\ell_{n} respectively, and weighted using the influence between user ini_{n} and iki_{k} (i.e., AikinA_{i_{k}i_{n}}) - as long as they belong to the same community, enforced by the indicator function 𝕀(gk=gn)\mathbb{I}(g_{k}=g_{n}). Here, κ(tntk,nk)\kappa(t_{n}-t_{k},\ell_{n}-\ell_{k}) is the triggering exponential kernel which factorises over time and location.

κ(tntk,nk)=κ(tntk)κ(nk),\kappa(t_{n}-t_{k},\ell_{n}-\ell_{k})=\kappa(t_{n}-t_{k})*\kappa(\ell_{n}-\ell_{k}), (2)

where, κ(tntk)=exp(ν(tntk))\kappa(t_{n}-t_{k})=\exp(-\nu(t_{n}-t_{k})) is the time specific triggering kernel with ν\nu decay and κ(nk)=12πhexp(nk2h)\kappa(\ell_{n}-\ell_{k})=\frac{1}{2\pi\,h}exp\left(-\frac{||\ell_{n}-\ell_{k}||}{2h}\right) is the location specific triggering kernel with hh bandwidth. When the decay parameter is low, the influence of the previous events is high and similarly when the bandwidth parameter is high the influence of previous locations is high.

In general, the intensity of a particular user ii at some time tt and location \ell, is given as the sum of intensities that are estimated at the level of each community i.e. total intensity λi(t,)\lambda_{i}(t,\ell) = gλi,g(t,)\sum_{g}\lambda_{i,g}(t,\ell)

5.2 Category Distribution

The category cc associated with a check-in is represented as a |V||V|-length vector and it represents one of VV possible categories222We overload the notation cc to also represent a scalar categorical value in the set {1,,V}\{1,\ldots,V\} such as restaurant, entertainment etc. associated with the check-in. Also, we assume that the category depends on the underlying latent community associated with this check-in. For example, some community may be more inclined towards restaurants while another community is oriented towards sports. The category is modelled as a sample from a Multinomial (categorical) distribution,

cMultinomial(𝜽𝒈)c\sim Multinomial(\boldsymbol{\theta_{g}}) (3)

where 𝜽𝒈\boldsymbol{\theta_{g}} is a |V||V|-length vector whose elements encode probability of each category and which depends on the community gg that the check-in belongs to. We assume a prior over 𝜽𝒈\boldsymbol{\theta_{g}} as a sample from Dirichlet distribution with parameters 𝜽𝟎\boldsymbol{\theta_{0}}. We write the conditional distribution p(c,𝜽𝒈|𝜽𝟎)p(c,\boldsymbol{\theta_{g}}|\boldsymbol{\theta_{0}}) as:

p(c,𝜽𝒈|𝜽𝟎)=p(c|𝜽𝒈)p(𝜽𝒈|𝜽𝟎)=θg,cΓ(jθ0,j)jΓ(θ0,j)jθg,jθ0,j1p(c,\boldsymbol{\theta_{g}}|\boldsymbol{\theta_{0}})=p(c|\boldsymbol{\theta_{g}})p(\boldsymbol{\theta_{g}}|\boldsymbol{\theta_{0}})=\theta_{g,c}\frac{\Gamma(\sum_{j}\theta_{0,j})}{\prod_{j}\Gamma(\theta_{0,j})}\prod_{j}\theta_{g,j}^{\theta_{0,j}-1} (4)

jj runs over the VV categories and p(c|θg)p(c|\theta_{g}) is given as θg,c\theta_{g,c}

5.3 Distribution over communities

We assume the latent variable gg (the communities) associated with the user for some check-in, is distributed as multinomial distribution parameterized by πi\pi_{i} for a user ii. πig\pi_{ig} represents the probability user ii belongs to community gg.

gMultinomial(𝝅𝒊)g\sim Multinomial(\boldsymbol{\pi_{i}}) (5)

5.4 Generative Process

1Initialize the number of communities MM, and number of checkins NiN_{i} for each user;
2 Set μi\mu_{i} proptional to NiN_{i};
3 Initialize AijA_{ij} as column normalized matrix;
4 Initialize π\pi, η\eta and θ\theta as Dirichlet-Multinomial distribution ;
5 Initialize λi(t0,x0,y0)=μii=1,,U\lambda_{i}(t_{0},x_{0},y_{0})=\mu_{i}\quad\forall i=1,\ldots,U ;
6 for n = 1 to N do
7       Sample (tn,nt_{n},\ell_{n}) from i=1Uλi(t,)\sum_{i=1}^{U}\lambda_{i}(t,\ell);
8       Sample ini_{n} from Multinomial (λ1(tn,n)\lambda_{1}(t_{n},\ell_{n}), λ2(tn,n)\lambda_{2}(t_{n},\ell_{n}) , … , λU(tn,n)\lambda_{U}(t_{n},\ell_{n}));
9       Sample gng_{n} from a Multinomial (πin\pi_{i_{n}});
10       Sample cnc_{n} from Multinomial(θgn)Multinomial(\theta_{g_{n}}) (θg\theta_{g} is defined in section 5.2);
11      
12 end for
Algorithm 1 Generative Process

Note that, for sampling (t,x,y), the thinning algorithm proposed in [17] is modified in order to sample location coordinates from discrete “venue” set rather the continuous space. First we consider a discrete set of locations LL for the user based on her region. We sample (x,y)(x^{\prime},y^{\prime}) at nthn^{th} iteration, from a Gaussian distribution centered at the previous coordinates in the (n1)th(n-1)^{th} iteration: (xn1,yn1)(x_{n-1},y_{n-1}). Once (x,y)(x^{\prime},y^{\prime}) is sampled, the nearest coordinate in the LL is determined and returned as (xn,yn)(x_{n},y_{n}).

6 Estimation and Inference

Given the multi-dimensional Hawkes process model defined above, the joint probability density function over the check-in events 𝔼\mathbb{E} is given as:

n=1Np(tn,ln,cn,gn|in)=n=1N((p(tn,ln|in,gn)×p(gn|in))×p(cn|gn,θ))\prod_{n=1}^{N}p(t_{n},l_{n},c_{n},g_{n}|i_{n})=\prod_{n=1}^{N}\biggl{(}(p(t_{n},l_{n}|i_{n},g_{n})\times p(g_{n}|i_{n}))\times p(c_{n}|g_{n},\theta)\biggr{)} (6)

Here, p(cn|gn,θ)=θgn,cnp(c_{n}|g_{n},\theta)=\theta_{g_{n},c_{n}}, where cnc_{n} represents the category associated with the nthn^{th} check-in, and p(gn|in)=πin,gnp(g_{n}|i_{n})=\pi_{i_{n},g_{n}} the probability that user ini_{n} belong to the community gng_{n}.

n=1Np(tn,n|in,gn)=n=1Nλin,gn(tn,n)exp(i=1U0Tminmaxλi(t,)𝑑t𝑑)\prod_{n=1}^{N}p(t_{n},\ell_{n}|i_{n},g_{n})=\prod_{n=1}^{N}\lambda_{i_{n},g_{n}}(t_{n},\ell_{n})\exp\biggl{(}-\sum_{i=1}^{U}\int\limits_{0}^{T}\int\limits_{\ell_{min}}^{\ell_{max}}\lambda_{i}(t,\ell)dtd\ell\biggr{)} (7)

is the likelihood (event density) of generating the observations given the community and users in the interval [(0,min),(T,max)][(0,\ell_{min}),(T,\ell_{max})]. The first term in (7) provides the instantaneous probability of occurrence of the observed events and the second term provides the probability that no event happens outside these observations (survival probability) [6]. Thus, the complete joint log likelihood is:

=n=1N(logλin,gn(tn,n)+logπin,gn+logθgn,cn)i=1U0Tminmaxλi(t,)𝑑t𝑑\mathcal{LL}=\sum_{n=1}^{N}\bigg{(}\log\lambda_{i_{n},g_{n}}(t_{n},\ell_{n})+\log\pi_{i_{n},g_{n}}+\log\theta_{g_{n},c_{n}}\bigg{)}\\ -\sum_{i=1}^{U}\int\limits_{0}^{T}\int\limits_{\ell_{min}}^{\ell_{max}}\lambda_{i}(t,\ell)dtd\ell (8)

Assuming communities are known, we can estimate the model parameters μ\mu, η\eta,AA, θg\theta_{g}’s and π\pi’s by maximum likelihood estimation. We treat the kernel parameters, and the Dirichlet parameters as the hyper-parameters which we initialize to some fixed values. However, the communities are latent and the maximum likelihood estimation cannot be applied directly. This calls for the expectation maximization algorithm, where the parameters are estimated after integrating out the latent variables from the joint likelihood using the posterior distribution over the latent variables. In our case, the posterior distribution over latent communities is given as

p(g1,,gn|{tn,n,cn,in}n=1N)=n=1Np(tn,n|in,gn)×p(cn|gn,θ)×p(gn|in)g1,,gnn=1Np(tn,n|in,gn)×p(cn|gn,θ)×p(gn|in)p(g_{1},\ldots,g_{n}|\{t_{n},\ell_{n},c_{n},i_{n}\}_{n=1}^{N})\\ =\frac{\prod_{n=1}^{N}p(t_{n},\ell_{n}|i_{n},g_{n})\times p(c_{n}|g_{n},\theta)\times p(g_{n}|i_{n})}{\sum_{g_{1},\ldots,g_{n}}\prod_{n=1}^{N}p(t_{n},\ell_{n}|i_{n},g_{n})\times p(c_{n}|g_{n},\theta)\times p(g_{n}|i_{n})} (9)

The posterior distribution over the latent communities cannot be obtained in closed form due to the intractable normalization constant (denominator term) which involves an exponential number of summation terms. Markov chain Monte Carlo methods [1] can be used to obtain samples from the posterior. However, these approaches are not scalable to large datasets [2] and becomes computationally expensive for use in LBSNs. To overcome this, we use a variational expectation maximization algorithm where we approximate the posterior over communities using a variational distribution and estimate the model parameters and variational parameters by maximizing a variational lower bound [38].

6.1 Variational Expectation Maximization

The latent variables gng_{n}’s dependent on different types of feature set i.e. space, time through p(tn,n|in,gn)p(t_{n},\ell_{n}|i_{n},g_{n}) and semantics through p(cn|gn,θ)p(c_{n}|g_{n},\theta). Though the prior over gng_{n} is conjugate to p(cn|gn,θ)p(c_{n}|g_{n},\theta), it is not with respect to p(tn,n|in,gn)p(t_{n},\ell_{n}|i_{n},g_{n}) and hence the posterior over gng_{n} cannot be computed in closed form. Moreover, gng_{n}’s are inter-dependent i.e. at current step gng_{n} it is dependent on history from g1g_{1} to gn1g_{n-1} as well as the future ones i.e. gn+1g_{n+1}. Thus marginalizing out over such interconnected latent variables to compute the normalization constant for the posterior is intractable. To this end we assume a variational distribution over gng_{n}’s conditioned on the user ini_{n}. The conditional variational distribution over gng_{n} is considered to be a multinomial distribution with parameters ϕ𝒊𝒏\boldsymbol{\phi_{i_{n}}}. The variational parameter ϕ𝒊\boldsymbol{\phi_{i}} for a user ii represents the posterior probability distributions over the communities for the user as observed from the data.

q(gn|in)=Multinomial(gn|ϕ𝒊𝒏)q({g_{n}}|i_{n})=Multinomial(g_{n}|\boldsymbol{\phi_{i_{n}}}) (10)

The variational parameters can be learnt by minimizing the KL divergence between the variational posterior (10) and the exact posterior (9). However, a direct minimization of KL divergence is not possible due to the intractable posterior. Following variational inference approach [2], the variational parameters are learnt by maximizing a variational lower bound, Evidence Lower Bound (ELBO), which indirectly minimizes the KL divergence. ELBO is obtained by considering an expected value of the complete joint log likelihood w.r.t the variational distribution [2] and acts as a lower bound to the marginal likelihood or evidence (normalization constant of the posterior). Hence, ELBO is useful to learn the model parameters also in addition to the variational parameters. Using the variational distribution defined in (10) and the complete joint log likelihood (8), we obtain the ELBO as:

=n=1N(𝔼q[logλin,gn(tn,n)]+m=1Mϕin,mlogπin,m+m=1Mϕin,mlogθm,cn)i=1U0Tminmax𝔼q[λi(t,)]𝑑𝑑t𝔼q[logq]\mathscr{L}=\sum_{n=1}^{N}\bigg{(}\mathbb{E}_{q}[\log\lambda_{i_{n},g_{n}}(t_{n},\ell_{n})]+\sum_{m=1}^{M}\phi_{i_{n},m}\log\pi_{i_{n},m}+\sum_{m=1}^{M}\phi_{i_{n},m}\log\theta_{m,c_{n}}\bigg{)}\\ -\sum_{i=1}^{U}\int\limits_{0}^{T}\int\limits_{\ell_{min}}^{\ell_{max}}\mathbb{E}_{q}[\lambda_{i}(t,\ell)]d\ell dt-\mathbb{E}_{q}[\log q] (11)

Here, 𝔼q\mathbb{E}_{q} represents the expectation with respect to the variational distribution qq defined in (10). We learn the variational parameters and the model parameters by maximizing the ELBO. Table 2 lists the model parameters and variational parameters to be learnt using ELBO. All the terms in the ELBO except the first term can be computed in closed form.

Table 2: Parameters to be estimated and whether a Hyperparameter
Par Description H
𝝁\boldsymbol{\mu} Base Intensity
𝜼\boldsymbol{\eta} Weight associated towards community
AijA_{ij} Influence Matrix
hh Bandwidth (KDE) \checkmark
ν\nu Temporal Decay Parameter \checkmark
𝜽𝟎\boldsymbol{\theta_{0}} Dirichlet Prior: Category \checkmark
𝜽𝒈\boldsymbol{\theta_{g}} Multinomial Prior: Categories / community
π\pi Multinomial Prior: Communities (All users)
ϕ\phi Variational Parameters: Communities (All users)

Since the first term in (11) cannot be computed in closed form, we approximate it using the samples from the variational posterior (10) (Monte-Carlo approximation).

𝔼q[logλin,gn(tn,n)]1Ss=1Slogλin,𝒈(𝒔)(tn,n)\mathbb{E}_{q}[\log\lambda_{i_{n},g_{n}}(t_{n},\ell_{n})]\approx\frac{1}{S}\sum_{s=1}^{S}\log\lambda_{i_{n},\boldsymbol{g^{(s)}}}(t_{n},\ell_{n}) (12)

where 𝒈(𝒔)\boldsymbol{g^{(s)}} represent the vector of NN samples sampled from the joint variational distribution over all the gng_{n}’s, i.e. q(𝒈(𝒔))=i=1Nq(gn(s)|ϕin)q(\boldsymbol{g^{(s)}})=\prod_{i=1}^{N}q(g_{n}^{(s)}|\phi_{i_{n}}). This results in a stochastic variational lower bound where the stochasticity arises due to the approximation of expectation using Monte Carlo sampling [38]. We learn the model parameters μ\mu, η\eta, AijA_{ij}, θ\theta by maximizing the stochastic variational lower bound [26] using gradient based methods. However learning the variational parameters is problematic as the variational parameters does not appear explicitly in the stochastic term but only through the samples. For determining gradient w.r.t. ϕ\phi we apply the Reinforcement trick to the stochastic term and compute the gradient as follows [9]:

ϕ𝔼q[logλin,gn(tn,n)]1Ss=1Slogλin,𝒈(𝒔)(tn,n)ϕlogq(𝒈(𝒔))\bigtriangledown_{\phi}\mathbb{E}_{q}[\log\lambda_{i_{n},g_{n}}(t_{n},\ell_{n})]\\ \approx\frac{1}{S}\sum_{s=1}^{S}\log\lambda_{i_{n},\boldsymbol{g^{(s)}}}(t_{n},\ell_{n})\bigtriangledown_{\phi}\log q(\boldsymbol{g^{(s)}}) (13)

7 Experiments

Compared Baselines We empirically evaluate the performance of CoLAB 333We pledge to make our codes and datasets public. over synthetic and real datasets with the following baselines:

  • STHP: Spatio-Temporal Hawkes Process models the diffusion process across spatial and temporal dimensions but ignores the category associated location feature. This is the baseline derived from CoLAB ignoring the categories.

  • Sequence Mining [19]: First extracts frequent occuring venue category sequences and assigns communties based on clusters with similar patterns.

  • DH [8]: Dirichlet-Hawkes, clusters continuous time event streams using a modified Hawkes model with preferential cluster assignment through Dirichlet Process.

  • RMTPP:[7] Recurrent Marked Temporal Point Process model the time and the marker information by learning a general representation of the nonlinear dependency over the history based on recurrent neural networks. In this model, event history is embedded into a compact vector representation which is then used for predicting the next event time and marker type.

For an even comparison, we feed the check-in events to all the baselines and evaluate the community quality formed by DH and Sequence Mining along with location prediction for STHP and RMTPP.

7.1 Synthetic Data

We generate synthetic data using algorithm 1 (statistics in Table 4). The set of locations, #users to categories i.e. (U:V) and to the number of checkins i.e. (U:N) are kept similar to the real data collected from Brazil with ν\nu (temporal decay parameter) is set to 0.01 [33] and hh is picked up from the bandwidth values learned from users’ checkins. During inference, we use true values for all the parameters, except for the influence matrix AijA_{ij} and the user-community posterior ϕ\phi which are estimated. We use RelErr(Aij,A^ij)=1I2i,j=1I|aija^ij||aij|RelErr(A_{ij},\hat{A}_{ij})=\frac{1}{I^{2}}\sum_{i,j=1}^{I}\frac{|a_{ij}-\hat{a}_{ij}|}{|a_{ij}|} (and similarly for ϕ\phi), as the metric to evaluate the ability to recover the true values. Table 4 indicates that the stochastic variational inference technique offers considerably better reconstruction of the parameters, recording significant reductions in the error.

Table 3: Dataset Properties
Property #Users(U) #Communities(M) #Categories(V) #Check-ins(N)
Synthetic 100 10 200 9777
SA (Real) 95 - 314 15110
US (Real) 133 - 524 22059
Table 4: RelErrRelErr on AA and ϕ\phi and True Positives for Location Prediction results at Top-K on Synthetic Data
RelErr Top K
Technique AA ϕ\phi 5 10
STHP 0.99174 0.13807 681 1206
CoLAB 0.04813 0.07216 972 1677

7.2 Real Data

For real data, we use our crawls over Foursquare conducted between January-2015 and March-2016 and construct two collections consisting of check-ins from Saudi Arabia (SA) and United States (US), with details given in Table 4. We allocated first 80% (as per check-in timestamp) of each dataset to training and remaining for testing. Here, we also use temporal decay parameter as 0.01 [33] and hh is learnt for each user based on Silverman’s rule of thumb in kernel density estimation [1] and is fixed during the joint estimation.

7.2.1 Location Prediction with CoLAB

For location prediction task, we predict the next location from the previously seen locations in the training set at M(#communities)=10M(\#communities)=10 and at various top-KK ranked (eq. 8) cutoffs. Since, DH is for clustering event streams, therefore we make use of STHP and RMTPP as baselines for the Location Prediction task. Table 5 shows that CoLAB is able to offer significant improvements (18-37% in the top-55). As we increase KK, these gains diminish because the number of candidate locations saturates. We also study the effect of AijA_{ij}(influence matrix) and μ\mu (base intenisty) over CoLAB’s performance. It can be observed that without AijA_{ij}, the CoLAB’s performance degrades signifying CoLAB’s ability to capture the underlying diffusion process well.

Table 5: Comparison of CoLAB with other baselines and CoLAB without AijA_{ij} and μ\mu
Dataset K STHP RMTPP CoLAB w/o Aij CoLAB w/o μ\mu CoLAB
SA (#testcases = 2805) 5 287 250 279 331 339
10 455 499 478 589 593
20 950 951 911 1038 1043
50 1539 1539 1520 1664 1666
US (#testcases = 4395) 5 153 172 172 231 237
10 456 467 445 507 512
20 870 888 919 920 927
50 1700 1691 1759 1668 1673

7.2.2 Impact of #Communities

In figures 2 and 3 we study the impact of MM, over SA and US data we observe that with increasing MM the prediction accuracy improves and then diminishes, signifying optimal value of MM for better predictions. CoLAB performs significantly better than STHP, primarily due to the better estimation of AijA_{ij} (influence matrix), because of the presence of category information in the CoLAB model.

7.3 Community Assessment

We plot communities over SA and US data in Figure 4, where colored dots represents a check-in. In figure  4, it can be observed that; (i) Overlap between communities due to data concentration in cities. (ii) CoLAB is able to capture communities across cities.

Refer to caption
(a) K=5
Refer to caption
(b) K=10
Refer to caption
(c) K=20
Figure 2: Location Prediction Results for Varying M over SA
Refer to caption
(a) K=5
Refer to caption
(b) K=10
Refer to caption
(c) K=20
Figure 3: Location Prediction Results for Varying M over US

Refer to caption

(a) SA

Refer to caption

(b) US
Figure 4: Spatio-Temporal Activity-driven Latent Communities Captured by CoLAB

Unfortunately, we lack the community ground truth for users, making communities assessment a non-trivial task. Thus, we use a metric a joint loss function for the intra-community properties through a mixture of (i) category loss (cat\mathcal{L}_{cat}) and (ii) location loss (loc\mathcal{L}_{loc}). Category Loss: We consider all categories associated to locations as independent marks of a point process. Hence to estimate the category affinity in a community, we consider similarity among the check-in categories using pre-trained word embeddings [24] and devise a loss function.

cat=1|𝐓|En𝐓g{1𝐯Enμg𝐯En2μg2}Φ(En,g)\mathcal{L}_{cat}=\frac{1}{|\mathbf{T}|}\sum_{E_{n}\in\mathbf{T}}\sum_{g\in\mathcal{M}}\left\{1-\frac{\mathbf{v}_{E_{n}}\cdot\mu_{g}}{||\mathbf{v}_{E_{n}}||_{2}||\mu_{g}||_{2}}\right\}\cdot\Phi(E_{n},g) (14)

where 𝐓\mathbf{T} represents test data, μg\mu_{g} is category mean for a community gg using KcatK_{cat} frequent categories, 𝐯En\mathbf{v}_{E_{n}} is the category vector for event EnE_{n}; Φ(En,g)\Phi(E_{n},g) indicates whether EnE_{n} is assigned to community gg, with \mathcal{M} as all communities. Table 6 demonstrates CoLAB’s ability to capture category dynamics across communities. Although, it can be observed that at KcatK_{cat} = 10 Dirichlet Hawkes performs better because with most frequent categories like restaurant, coffee shop etc. DH assigns it most of the communities. Note that, DH is unable to capture communities with varied categories as seen at KcatK_{cat} = 50 and 100, that CoLAB performs better.
Location Loss:[4] show that users tend to visit nearby locations given we ignore the bias of loyalty. Hence ideally, a community of users not spatially dispersed in their checkin characteristics should be distinct from another community with checkins spanning large distances. We capture this through a distance based k-means loss (loc\mathcal{L}_{loc}) with cluster means(μl\mu_{l}) for checkin coordinates for each community. In table 7 we can see CoLAB performs significantly better than other baselines because CoLAB can better capture the geographical dispersion in communities.

Table 6: Results for Category Loss
Dataset KcatK_{cat} Sequence Mining DH STHP CoLAB
Daily Weekly
SA 10 250.24 236.19 103.47 125.17 119.41
50 1118.23 1089.27 862.05 842.63 826.72
100 2007.93 2120.76 1983.61 1784.04 1749.37
US 10 248.45 217.56 98.37 118.32 113.86
50 956.87 990.45 781.83 793.07 771.15
100 1907.84 2020.49 1901.37 1605.64 1583.02
Table 7: Results for Location Loss
Datasets Sequence Mining DH STHP CoLAB
Daily Weekly
SA 600.67 547.56 413.16 306.73 298.09
US 1127.34 1067.50 1039.08 849.92 834.64

7.3.1 Qualitative Assessment

We claim that a user in a community will display an affinity towards certain categories. For US data, figure 5 shows even with highly overlapping venue categories, our model finds the intricate differences between a community with affinity to music (a) and a community with affinity towards food/bar joints. The word clouds for SA dataset shows similar properties and have been avoided for brevity.

Refer to caption
(a) Community A
Refer to caption
(b) Community B
Figure 5: Word Cloud of Categories in Two Communities from US

8 Conclusion

In this paper, we presented CoLAB that uses spatio-temporal Hawkes process to infer the implicit communities using a novel stochastic variational inference technique. Empirical evaluations over synthetic as well as real-world datasets highlight its prowess with significant improvements in location and community detection tasks. This illustrates the effectiveness of the modelling used in CoLAB in generating user communities even in the absence of social connectedness information. In future work, we would like to explore scalability of CoLAB through sample-based inference techniques.

References

  • [1] Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
  • [2] Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: A review for statisticians. Journal of the American Statistical Association 112(518), 859–877 (2017)
  • [3] Bouros, P., Sacharidis, D., Bikakis, N.: Regionally influential users in location-aware social networks. In: SIGSPATIAL (2014)
  • [4] Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: User movement in location-based social networks. In: SIGKDD (2011)
  • [5] Cho, Y.S., Galstyan, A., Brantingham, P.J., Tita, G.: Latent self-exciting point process model for spatial-temporal networks. vol. 19 (2014)
  • [6] Daley, D.J., Vere-Jones, D.: An introduction to the theory of point processes. Vol. I. Probability and its Applications (New York), Springer-Verlag, New York, second edn. (2003)
  • [7] Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M., Song, L.: Recurrent marked temporal point processes: Embedding event history to vector. In: KDD (2016)
  • [8] Du, N., Farajtabar, M., Ahmed, A., Smola, A.J., Song, L.: Dirichlet-hawkes processes with applications to clustering continuous-time document streams. In: SIGKDD (2015)
  • [9] Gal, Y.: Uncertainty in Deep Learning. Ph.D. thesis, University of Cambridge (2016)
  • [10] Gao, H., Tang, J., Hu, X., Liu, H.: Modeling temporal effects of human mobile behavior on location-based social networks. In: CIKM (2013)
  • [11] Gao, H., Tang, J., Liu, H.: Exploring social-historical ties on location-based social networks. In: ICWSM (2012)
  • [12] Hawkes, A., Oakes, D.: A cluster process representation of a self-exciting process. Journal of Applied Probability 11(3), 493–503 (1974)
  • [13] Hu, W., Jin, P.J.: An adaptive hawkes process formulation for estimating time-of-day zonal trip arrivals with location-based social networking check-in data. Transportation Research Part C: Emerging Technologies 79, 136 – 155 (2017)
  • [14] Jankowiak, M., Gomez-Rodriguez, M.: Uncovering the spatiotemporal patterns of collective social activity. In: SDM (2017)
  • [15] Khorasgani, R.R., Chen, J., Zaïane, O.R.: Top leaders community detection approach in information networks. In: 4th SNA-KDD workshop on Social Network mining and Analysis. Citeseer (2010)
  • [16] Kumpula, J.M., Kivelä, M., Kaski, K., Saramäki, J.: Sequential algorithm for fast clique percolation. Physical Review E 78(2), 026109 (2008)
  • [17] Lewis, P.A.W., Shedler, G.S.: Simulation of nonhomogeneous poisson processes by thinning. Naval Research Logistics Quarterly 26(3), 403–413 (1979)
  • [18] Li, G., Chen, S., Feng, J., Tan, K.l., Li, W.: Efficient location-aware influence maximization. In: SIGMOD (2014)
  • [19] Li, H., Deng, K., Cui, J., Dong, Z., Ma, J., Huang, J.: Hidden community identification in location-based social network via probabilistic venue sequences. Information Sciences 422, 188 – 203 (2018)
  • [20] Lichman, M., Smyth, P.: Modeling human location data with mixtures of kernel densities. In: SIGKDD (2014)
  • [21] Likhyani, A., Bedathur, S., P, D.: Locate: Influence quantification for location promotion in location-based social networks. In: IJCAI (2017)
  • [22] Likhyani, A., Padmanabhan, D., Bedathur, S., Mehta, S.: Inferring and exploiting categories for next location prediction. In: WWW (2015)
  • [23] Liu, J., Li, Y., Ling, G., Li, R., Zheng, Z.: Community detection in location-based social networks: An entropy-based approach. In: IEEE CIT (2016)
  • [24] Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. In: LREC (2018)
  • [25] Noulas, A., Scellato, S., Mascolo, C., Pontil, M.: Exploiting semantic annotations for clustering geographic areas and users in location-based social networks. In: The Social Mobile Web, ICWSM Workshop (2011)
  • [26] Paisley, J.W., Blei, D.M., Jordan, M.I.: Variational bayesian inference with stochastic search. In: ICML (2012)
  • [27] Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: High quality, scalable and parallel community detection for large real graphs. In: Proceedings of the 23rd international conference on World wide web. pp. 225–236. ACM (2014)
  • [28] Reinhart, A.: A review of self-exciting spatio-temporal point processes and their applications. Statistical Science (2018)
  • [29] Tran, L.Q., Farajtabar, M., Song, L., Zha, H.: Netcodec: Community detection from individual activities. In: SDM (2015)
  • [30] Wang, X., Zhang, Y., Zhang, W., Lin, X.: Distance-aware influence maximization in geo-social network. In: ICDE (2016)
  • [31] Wang, Z., Zhang, D., Yang, D., Yu, Z., Zhou, X.: Detecting overlapping communities in location-based social networks. In: Social Informatics (2012)
  • [32] Wu, H.H., Yeh, M.Y.: Influential nodes in a one-wave diffusion model for location-based social networks. In: PAKDD (2013)
  • [33] Yang, S.H., Zha, H.: Mixture of mutually exciting processes for viral diffusion. In: ICML. pp. II–1–II–9. ICML’13, JMLR.org (2013), http://dl.acm.org/citation.cfm?id=3042817.3042894
  • [34] Ye, M., Yin, P., Lee, W.C., Lee, D.L.: Exploiting geographical influence for collaborative point-of-interest recommendation. In: SIGIR (2011)
  • [35] Yuan, B., Li, H., Bertozzi, A.L., Brantingham, P.J., Porter, M.A.: Multivariate spatiotemporal hawkes processes and network reconstruction. SIAM Journal on Mathematics of Data Science (2019)
  • [36] Zarezade, A., Jafarzadeh, S., Rabiee, H.R.: Recurrent spatio-temporal modeling of check-ins in location-based social networks. PLOS ONE 13(5), 1–20 (05 2018)
  • [37] Zhang, C., Shou, L., Chen, K., Chen, G., Bei, Y.: Evaluating geo-social influence in location-based social networks. In: CIKM (2012)
  • [38] Zhang, C., Bütepage, J., Kjellström, H., Mandt, S.: Advances in variational inference. CoRR abs/1711.05597 (2017)
  • [39] Zhang, P., Wang, X., Li, B.: On predicting twitter trend: Factors and models. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. p. 1427–1429. ASONAM ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2492517.2492576, https://doi.org/10.1145/2492517.2492576
  • [40] Zhao, F., Tung, A.K.: Large scale cohesive subgraphs discovery for social network visual analysis. Proceedings of the VLDB Endowment 6(2), 85–96 (2012)
  • [41] Zhou, K., Zha, H., Song, L.: Learning triggering kernels for multi-dimensional hawkes processes. In: ICML. vol. 28 (2013)
  • [42] Zhu, W.Y., Peng, W.C., Chen, L.J., Zheng, K., Zhou, X.: Modeling user mobility for location promotion in location-based social networks. In: SIGKDD (2015)