
SINCERE: Sequential Interaction Networks representation learning on Co-Evolving RiEmannian manifolds

Junda Ye, Beijing University of Posts and Telecommunications, Beijing, China ([email protected]); Zhongbao Zhang, Beijing University of Posts and Telecommunications, Beijing, China ([email protected]); Li Sun, North China Electric Power University, Beijing, China ([email protected]); Yang Yan, Beijing University of Posts and Telecommunications, Beijing, China ([email protected]); Feiyang Wang, Beijing University of Posts and Telecommunications, Beijing, China ([email protected]); and Fuxin Ren, Beijing University of Posts and Telecommunications, Beijing, China ([email protected])
Abstract.

Sequential interaction networks (SIN) have been commonly adopted in many applications such as recommendation systems, search engines and social networks to describe the mutual influence between users and items/products. Efforts on representing SIN have mainly focused on capturing the dynamics of networks in Euclidean space, and recently a growing body of work has extended to hyperbolic geometry for implicit hierarchical learning. Previous approaches that learn the embedding trajectories of users and items achieve promising results. However, a range of fundamental issues remains open. For example, is it appropriate to place user and item nodes in one identical space regardless of their inherent discrepancy? Instead of residing in a single fixed-curvature space, how will the representation spaces evolve when new interactions occur?

To address these issues for sequential interaction networks, we propose SINCERE, a novel method representing Sequential Interaction Networks on Co-Evolving RiEmannian manifolds. SINCERE not only takes the user and item embedding trajectories in their respective spaces into account, but also emphasizes how the spaces themselves evolve, i.e., how curvature changes over time. Specifically, we introduce a novel cross-geometry aggregation which allows us to propagate information across different Riemannian manifolds without breaking conformal invariance, and a curvature estimator which is carefully designed to predict global curvatures effectively according to current local Ricci curvatures. Extensive experiments on several real-world datasets demonstrate the promising performance of SINCERE over state-of-the-art sequential interaction prediction methods.

Sequential Interaction Network, Graph Neural Network, Dynamics, Riemannian Geometry
Contact Junda Ye ([email protected]) and Li Sun ([email protected]) for any questions.
journalyear: 2023; copyright: acmlicensed; conference: Proceedings of the ACM Web Conference 2023; booktitle: Proceedings of the ACM Web Conference 2023 (WWW ’23), May 1–5, 2023, Austin, TX, USA; price: 15.00; doi: 10.1145/3543507.3583353; isbn: 978-1-4503-9416-1/23/04; ccs: Information systems (Social networks; Data mining); Computing methodologies (Learning latent representations; Neural networks)

1. Introduction

Predicting future interactions for sequential interaction networks is essential in many domains, such as “Guess You Like” in e-commerce recommendation systems (He et al., 2014; Jiang et al., 2016), “Related Searches” in search engines (Beutel et al., 2018; Huang et al., 2018) and “Suggested Posts” in social networks (Yang et al., 2021; Wang et al., 2021b). Specifically, purchasers and commodities form a transaction interaction network recording the trading or rating behaviors on e-commerce platforms; predicting interactions accurately helps improve the quality of item recommendations. Likewise, users and posts constitute a social interaction network indicating whether and when a user will click or comment on posts; with the help of interaction prediction, early interventions can be made to protect teenagers from malicious posts such as Internet fraud on social media.

Graph representation learning plays an important role in encoding the different nodes of a graph as low-dimensional embeddings. In the literature, the majority of existing studies choose Euclidean space as the representation space for its simplicity and efficiency. However, for certain graph topology patterns (e.g., hierarchical, scale-free or spherical data), Euclidean space may suffer from large distortion (Chami et al., 2019). Recently, non-Euclidean spaces have emerged as an alternative for their strong capability to embed hierarchical or cyclical data into a low-dimensional space while preserving the structure (Mathieu et al., 2019; Gulcehre et al., 2019; Zhang et al., 2021). Specifically, hyperbolic geometry was first introduced to graph representation learning (Nickel and Kiela, 2017, 2018; Ganea et al., 2018), and

Table 1. The average $\delta$-hyperbolicity (Gromov, 1987) of user and item subgraphs on the MOOC and Wikipedia datasets along the timeline.

Dataset     Timeline   User   Item
MOOC        start      0.77   0.50
            middle     1.00   0.67
            end        1.00   0.13
Wikipedia   start      1.04   1.16
            middle     1.10   1.30
            end        1.33   1.40
Figure 1. User and item degree distribution on Wikipedia.

then spherical spaces also gained much attention (Defferrard et al., 2020; Rezende et al., 2020). The study of representation spaces is flourishing nowadays.

With regard to representing sequential interaction networks, a significant amount of work has been proposed to learn user and item embeddings from historical interaction records in Euclidean space (Fan et al., 2021; Cao et al., 2021; Dai et al., 2016; Nguyen et al., 2018; Kefato et al., 2021; Beutel et al., 2018). Recurrent models such as Time-LSTM (Zhu et al., 2017), Time-Aware LSTM (Baytas et al., 2017) and RRN (Wu et al., 2017) capture the dynamics of users and items by endowing them with a long short-term memory. HILI (Chen et al., 2021) is the successor of JODIE (Kumar et al., 2019), and both are promising methods for real-world temporal interaction graphs. CTDNE (Nguyen et al., 2018) adopts random walks on temporal networks, and CAW (Wang et al., 2021b) makes them causal and anonymous to inductively represent sequential networks. Additionally, recent models have been proposed to embed sequential interaction networks into hyperbolic spaces (Chamberlain et al., 2019; Feng et al., 2020; Sun et al., 2021a; Mirvakhabova et al., 2020; Dąbrowski et al., 2021). HyperML (Vinh Tran et al., 2020) investigates the notion of learning user and item representations in terms of metric learning in hyperbolic space. HTGN (Yang et al., 2021) designs a recurrent architecture on a sequence of snapshots under hyperbolic geometry.

Existing interaction network representation methods have achieved promising results. However, they still suffer from two major shortcomings. First, they simply place the two different types of nodes (user and item nodes) in one identical space, which is ambiguous and counter-intuitive. For instance, viewers and films are two kinds of nodes with totally different characteristics and distributions, which are usually described by the notion of curvature in Riemannian geometry (Wilson et al., 2014; Sreejith et al., 2016; Bachmann et al., 2020). Here, we give motivating examples in Fig. 1 and Tab. 1, which show that not only does the structure of the user and item networks vary over time, but the degree distributions of users and items also differ from each other. Therefore, placing user and item nodes in two distinct spaces seems a more proper way, and designing an effective message passing mechanism for such cross-space scenarios is challenging. Second, they assume the space is static, whereas the interaction network constantly evolves over time; this is largely ignored in previous studies. More specifically, they attempt to learn the representations on a single fixed-curvature manifold, either Euclidean space or the standard Poincaré ball. A static curvature is a strong premise that cannot manifest the inherent dynamics of the representation space as new edges of the interaction network are continuously formed. The challenge here is how to estimate the global curvature for the next period accurately from local structure information. Besides, most of the existing methods (Kumar et al., 2019; Chen et al., 2021; Zhu et al., 2017; Baytas et al., 2017) also assume that an interaction only influences the corresponding nodes, implicitly ignoring information propagation among nodes of the same type. New methods are thus needed to incorporate the information flow between nodes of the same type.

To overcome the aforementioned shortcomings and challenges, we propose SINCERE to represent Sequential Interaction Networks on Co-Evolving RiEmannian manifolds. Specifically, SINCERE places user and item nodes in two $\kappa$-stereographic spaces, and the interactions, with their time embeddings, in Euclidean space. Next, it estimates the sectional curvatures of the two co-evolving spaces from local structure information through a curvature neural network with shared parameters. Then, information is disseminated and integrated, and node embeddings are updated, through the cross-geometry aggregation and updating components of our model. Finally, SINCERE is learned mainly through a Pull-and-Push loss based on maximum likelihood optimization. Extensive experiments validate the performance of SINCERE on several datasets.

The major contributions of this paper are summarized as follows:

  • In this paper, we uncover the intrinsic differences between user and item nodes in sequential interaction networks, and model the dynamics of the representation spaces. To the best of our knowledge, this is the first attempt to represent sequential interaction networks on a pair of co-evolving Riemannian manifolds.

  • We propose a novel model SINCERE for sequential interaction network representation learning. SINCERE represents user and item nodes in dual $\kappa$-stereographic spaces with time-varying curvatures, encoding both the inherent dissimilarities and the temporal effects of user and item nodes.

  • We conduct extensive experiments to evaluate our model against multiple baselines on the task of interaction prediction. Experimental results show the superiority and effectiveness of our proposed model.

2. Preliminaries

2.1. Riemannian Manifold

A Riemannian manifold $(\mathcal{M},g)$ is a real and smooth manifold equipped with an inner product $g_{\mathbf{x}}:\mathcal{T}_{\mathbf{x}}\mathcal{M}\times\mathcal{T}_{\mathbf{x}}\mathcal{M}\rightarrow\mathbb{R}$ on the tangent space at each point $\mathbf{x}\in\mathcal{M}$, where the tangent space $\mathcal{T}_{\mathbf{x}}\mathcal{M}$ is the first-order approximation of $\mathcal{M}$ around $\mathbf{x}$ and can locally be regarded as Euclidean. The metric tensor $g_{\mathbf{x}}$ is both positive-definite and symmetric, i.e., $g_{\mathbf{x}}(\mathbf{v},\mathbf{v})\geq 0$ and $g_{\mathbf{x}}(\mathbf{u},\mathbf{v})=g_{\mathbf{x}}(\mathbf{v},\mathbf{u})$, where $\mathbf{x}\in\mathcal{M}$ and $\mathbf{u},\mathbf{v}\in\mathcal{T}_{\mathbf{x}}\mathcal{M}$.

2.2. The $\kappa$-stereographic Model

The hyperboloid and the sphere are two common models of hyperbolic and spherical Riemannian geometry. However, hyperbolic space is not a typical vector space, which means we cannot manipulate embedding vectors with the operators used in Euclidean space. To bridge the gap among hyperbolic, spherical and Euclidean geometry, we adopt the $\kappa$-stereographic model (Bachmann et al., 2020), as it unifies these operations with the gyrovector formalism (Ungar, 2008).

An $n$-dimensional $\kappa$-stereographic model is a smooth manifold $\mathcal{M}_{\kappa}^{n}=\{\mathbf{x}\in\mathbb{R}^{n}\mid-\kappa\|\mathbf{x}\|_{2}^{2}<1\}$ equipped with a Riemannian metric $g_{\mathbf{x}}^{\kappa}=(\lambda_{\mathbf{x}}^{\kappa})^{2}\mathbf{I}$, where $\kappa\in\mathbb{R}$ is the sectional curvature and $\lambda_{\mathbf{x}}^{\kappa}=2(1+\kappa\|\mathbf{x}\|_{2}^{2})^{-1}$ is the point-wise conformal factor. Specifically, $\mathcal{M}_{\kappa}^{n}$ is the stereographic projection model for spherical space ($\kappa>0$) and the Poincaré ball model of radius $1/\sqrt{-\kappa}$ for hyperbolic space ($\kappa<0$), respectively. We list the requisite operators below; further ones are given in the Appendix. Note that bold letters denote vectors on the manifold.

Möbius Addition. It is a non-associative addition $\oplus_{\kappa}$ of two points $\mathbf{x},\mathbf{y}\in\mathcal{M}_{\kappa}^{n}$ defined as:

(1) $\mathbf{x}\oplus_{\kappa}\mathbf{y}=\frac{\left(1-2\kappa\langle\mathbf{x},\mathbf{y}\rangle-\kappa\|\mathbf{y}\|_{2}^{2}\right)\mathbf{x}+\left(1+\kappa\|\mathbf{x}\|_{2}^{2}\right)\mathbf{y}}{1-2\kappa\langle\mathbf{x},\mathbf{y}\rangle+\kappa^{2}\|\mathbf{x}\|_{2}^{2}\|\mathbf{y}\|_{2}^{2}}.$
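To make the gyrovector operations concrete, the following is a minimal NumPy sketch of Eq. 1; the function name and test values are our own illustration, not part of the paper.

```python
import numpy as np

def mobius_add(x, y, kappa):
    """Mobius addition (Eq. 1) in the kappa-stereographic model."""
    x2, y2, xy = np.dot(x, x), np.dot(y, y), np.dot(x, y)
    num = (1 - 2 * kappa * xy - kappa * y2) * x + (1 + kappa * x2) * y
    den = 1 - 2 * kappa * xy + kappa ** 2 * x2 * y2
    return num / den

x, y = np.array([0.1, 0.2]), np.array([0.3, -0.1])
print(mobius_add(x, y, kappa=-1.0))  # hyperbolic: non-commutative in general
print(mobius_add(x, y, kappa=0.0))   # kappa = 0 recovers ordinary x + y
```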

Möbius Scalar and Matrix Multiplication. The augmented Möbius scalar and matrix multiplications $\otimes_{\kappa}$ of $\mathbf{x}\in\mathcal{M}_{\kappa}^{n}\setminus\{\mathbf{o}\}$ by $r\in\mathbb{R}$ and $\boldsymbol{M}\in\mathbb{R}^{m\times n}$ are defined as follows:

(2) $r\otimes_{\kappa}\mathbf{x}=\frac{1}{\sqrt{|\kappa|}}\tan_{\kappa}\left(r\tan_{\kappa}^{-1}(\sqrt{|\kappa|}\|\mathbf{x}\|_{2})\right)\frac{\mathbf{x}}{\|\mathbf{x}\|_{2}},$
(3) $\boldsymbol{M}\otimes_{\kappa}\mathbf{x}=\frac{1}{\sqrt{|\kappa|}}\tan_{\kappa}\left(\frac{\|\boldsymbol{M}\mathbf{x}\|_{2}}{\|\mathbf{x}\|_{2}}\tan_{\kappa}^{-1}(\sqrt{|\kappa|}\|\mathbf{x}\|_{2})\right)\frac{\boldsymbol{M}\mathbf{x}}{\|\boldsymbol{M}\mathbf{x}\|_{2}},$

where $\tan_{\kappa}$ denotes $\tan$ for $\kappa>0$ and $\tanh$ for $\kappa<0$.

Exponential and Logarithmic Maps. The connection between the manifold $\mathcal{M}_{\kappa}^{n}$ and its point-wise tangent space $\mathcal{T}_{\mathbf{x}}\mathcal{M}_{\kappa}^{n}$ is established by the exponential map $\exp_{\mathbf{x}}^{\kappa}:\mathcal{T}_{\mathbf{x}}\mathcal{M}_{\kappa}^{n}\rightarrow\mathcal{M}_{\kappa}^{n}$ and the logarithmic map $\log_{\mathbf{x}}^{\kappa}:\mathcal{M}_{\kappa}^{n}\rightarrow\mathcal{T}_{\mathbf{x}}\mathcal{M}_{\kappa}^{n}$ as follows:

(4) $\exp_{\mathbf{x}}^{\kappa}(\mathbf{v})=\mathbf{x}\oplus_{\kappa}\left(\tan_{\kappa}\left(\sqrt{|\kappa|}\frac{\lambda_{\mathbf{x}}^{\kappa}\|\mathbf{v}\|_{2}}{2}\right)\frac{\mathbf{v}}{\sqrt{|\kappa|}\|\mathbf{v}\|_{2}}\right),$
(5) $\log_{\mathbf{x}}^{\kappa}(\mathbf{y})=\frac{2}{\lambda_{\mathbf{x}}^{\kappa}\sqrt{|\kappa|}}\tan_{\kappa}^{-1}\left(\sqrt{|\kappa|}\left\|-\mathbf{x}\oplus_{\kappa}\mathbf{y}\right\|_{2}\right)\frac{-\mathbf{x}\oplus_{\kappa}\mathbf{y}}{\left\|-\mathbf{x}\oplus_{\kappa}\mathbf{y}\right\|_{2}}.$

Distance Metric. The generalized distance $d_{\mathcal{M}}^{\kappa}(\cdot,\cdot)$ between any two points $\mathbf{x},\mathbf{y}\in\mathcal{M}_{\kappa}^{n}$, $\mathbf{x}\neq\mathbf{y}$, is defined as:

(6) $d_{\mathcal{M}}^{\kappa}\left(\mathbf{x},\mathbf{y}\right)=\frac{2}{\sqrt{|\kappa|}}\tan_{\kappa}^{-1}\left(\sqrt{|\kappa|}\left\|-\mathbf{x}\oplus_{\kappa}\mathbf{y}\right\|_{2}\right).$
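The maps and the distance above can be sketched in a few lines of NumPy; this is a sketch under the assumption that $\tan_{\kappa}$ switches between tan and tanh as stated after Eq. 3, and it repeats mobius_add for self-containedness.

```python
import numpy as np

def mobius_add(x, y, kappa):  # Eq. 1
    x2, y2, xy = np.dot(x, x), np.dot(y, y), np.dot(x, y)
    num = (1 - 2 * kappa * xy - kappa * y2) * x + (1 + kappa * x2) * y
    return num / (1 - 2 * kappa * xy + kappa ** 2 * x2 * y2)

def tan_k(u, kappa):     # tan_kappa: tan in spherical, tanh in hyperbolic space
    return np.tan(u) if kappa > 0 else np.tanh(u)

def arctan_k(u, kappa):
    return np.arctan(u) if kappa > 0 else np.arctanh(u)

def lam(x, kappa):       # point-wise conformal factor lambda_x^kappa
    return 2.0 / (1.0 + kappa * np.dot(x, x))

def exp_map(x, v, kappa):  # Eq. 4: tangent vector at x -> point on the manifold
    s = np.sqrt(abs(kappa))
    nv = np.linalg.norm(v)
    return mobius_add(x, tan_k(s * lam(x, kappa) * nv / 2, kappa) * v / (s * nv), kappa)

def log_map(x, y, kappa):  # Eq. 5: point on the manifold -> tangent vector at x
    s = np.sqrt(abs(kappa))
    w = mobius_add(-x, y, kappa)
    nw = np.linalg.norm(w)
    return 2.0 / (lam(x, kappa) * s) * arctan_k(s * nw, kappa) * w / nw

def dist(x, y, kappa):     # Eq. 6
    s = np.sqrt(abs(kappa))
    return 2.0 / s * arctan_k(s * np.linalg.norm(mobius_add(-x, y, kappa)), kappa)

# Sanity check: exp and log are mutually inverse on the Poincare ball.
x, y, kappa = np.array([0.1, 0.0]), np.array([0.2, 0.3]), -1.0
print(np.allclose(exp_map(x, log_map(x, y, kappa), kappa), y))  # True
print(dist(x, y, kappa))
```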
Figure 2. Illustration of the SINCERE architecture. We first divide the timestamps into several intervals, each of which can be regarded as a batch; interactions in different colors refer to different timestamps. Next, the graph is sent to the interaction integration component and to the curvature estimator, which capture time and structure information, respectively. Then, user and item nodes on different Riemannian manifolds pass neighbor information to each other through the novel cross-geometry aggregation and updating. Last, the representations and parameters of SINCERE are optimized through the maximum likelihood training objective.

2.3. Sectional Curvature and Ricci Curvature

In Riemannian geometry, curvature is the amount by which a curve deviates from a straight line, or a surface from a plane; it is originally defined on continuous smooth manifolds. Sectional curvature and Ricci curvature are two successful formulations for discrete graphs (Ye et al., 2020), describing the global and the local structure, respectively.

Sectional Curvature. Sectional curvature is an equivalent but more geometrical description of the curvature of Riemannian manifolds compared to the Riemann curvature tensor (Lee, 2018). It is defined over all two-dimensional subspaces passing through a point and is treated as the constant curvature of a Riemannian manifold in recent work.

Ricci Curvature. Ricci curvature is the average of sectional curvatures and is mathematically a linear operator on the tangent space at a point. In the graph domain, it is extended to measure how the geometry of a pair of neighbors locally deviates from the case of grid graphs. Ollivier-Ricci (Ollivier, 2009) is a coarse approach to computing Ricci curvature, while Forman-Ricci (Forman, 2003) is combinatorial and faster to compute.

Table 2. Table of symbols used in the paper.

Symbol                       Description
$\mathbb{U}_{\kappa_{u}}$    User space with curvature $\kappa_{u}$
$\mathbb{I}_{\kappa_{i}}$    Item space with curvature $\kappa_{i}$
$T_{n}$                      Time interval from $t_{n-1}$ to $t_{n}$
$\kappa_{u/i}^{T}$           Curvature of the user/item space during $T$
$\mathbf{u}(T)$              Embedding of user $u$ during $T$
$\mathbf{i}(T)$              Embedding of item $i$ during $T$
$\mathcal{N}_{u/i}^{T}$      Neighbors of the user/item during $T$
$\mathcal{E}_{u/i}^{T}$      Edges linked to the user/item during $T$

2.4. Problem Definition and Notation

Before presenting our proposed model, we summarize the important notations in Tab. 2. We may drop the subscript indices in later sections for clarity. The definitions of sequential interaction network and representation learning task are stated as follows:

Definition 1 (Sequential Interaction Network (SIN)).

Let $\mathcal{G}=\{\mathcal{U},\mathcal{I},\mathcal{E},\mathcal{T},\mathcal{X}\}$ denote a sequential interaction network, where $\mathcal{U}$ and $\mathcal{I}$ are the two disjoint node sets of users and items, respectively, $\mathcal{E}\subseteq\mathcal{U}\times\mathcal{I}\times\mathcal{T}$ denotes the interaction (edge) set, and $\mathcal{X}\in\mathbb{R}^{|\mathcal{E}|\times d_{\mathcal{X}}}$ holds the attributes associated with the interactions, $d_{\mathcal{X}}$ being the feature dimension. Each edge $e\in\mathcal{E}$ can be written as a triplet $e=(u,i,t)$, where $u\in\mathcal{U}$, $i\in\mathcal{I}$, and $t\in\mathcal{T}$ is the timestamp recording that user $u$ and item $i$ interact with each other at time $t$.
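As a concrete (hypothetical) encoding of Definition 1, a SIN can simply be stored as a time-ordered list of attributed triplets:

```python
from typing import List, NamedTuple

class Interaction(NamedTuple):
    user: int            # index into the user set U
    item: int            # index into the item set I
    t: float             # timestamp in T
    x: List[float]       # d_X-dimensional edge attributes in X

# A toy SIN: user 0 interacts with items 2 and 5 at increasing timestamps.
sin: List[Interaction] = [
    Interaction(user=0, item=2, t=0.5, x=[1.0, 0.0]),
    Interaction(user=0, item=5, t=1.2, x=[0.0, 1.0]),
]
```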

Definition 2 (SIN Representation Learning Task).

For a sequential interaction network $\mathcal{G}$, the task aims to learn encoding functions $\Phi_{u}:\mathcal{U}\rightarrow\mathbb{U}_{\kappa_{u}}$, $\Phi_{i}:\mathcal{I}\rightarrow\mathbb{I}_{\kappa_{i}}$ and $\Phi_{e}:\mathcal{E}\rightarrow\mathbb{E}$ that map the two kinds of nodes and the interactions between them into dynamic low-dimensional embedding spaces under different Riemannian geometries, where nodes and interactions are represented as compact embedding vectors. Meanwhile, the embeddings capture the intrinsic relations between the two types of nodes and are capable of predicting future interactions effectively.

3. SINCERE

Recall the motivating examples given previously, which show the intrinsic difference between users and items. To solve the main issues of existing methods for representing sequential interaction networks, we let the embeddings of users and items reside in different Riemannian manifolds. The two Riemannian manifolds are then bridged by a Euclidean tangent space, which is the representation space of the interactions. How the representation spaces evolve over time is determined by a curvature estimator.

The overall framework of SINCERE is presented in Fig. 2. The components of our approach are introduced in the following.

3.1. User-Item Co-Evolving Mechanism

Figure 3. Illustration of one-step aggregation: an example of how users update their embeddings from i) hidden representations, ii) the fusion of associated interactions, and iii) the weighted midpoint of their neighbors.

As we place the user and item nodes on different Riemannian manifolds, they are recorded by the embedding matrices $\mathbf{U}$ and $\mathbf{I}$, respectively, which can be initialized with feature vectors such as user profiles or film descriptions if available. Since the initialized $\mathbf{U}$ and $\mathbf{I}$ are in Euclidean space, we map them onto the corresponding Riemannian spaces $\mathbb{U}_{\kappa_{u}}$ and $\mathbb{I}_{\kappa_{i}}$ with the exponential map of Eq. 4.

For the message propagation and information aggregation, the design of SINCERE not only incorporates information across different geometric spaces via a unified formalism, but also guarantees that all operations are conformally invariant. For clarity, we define the $\operatorname{map}$ operation mapping vectors on manifold $\mathcal{M}_{\kappa_{1}}$ to $\mathcal{M}_{\kappa_{2}}$ as $\operatorname{map}_{\kappa_{1}}^{\kappa_{2}}(\cdot):=\exp_{\mathbf{o}}^{\kappa_{2}}(\log_{\mathbf{o}}^{\kappa_{1}}(\cdot))$. The aggregation operators for users and items are defined as follows:

(7) $\mathbf{u}_{j}(T_{n})\leftarrow\underbrace{\boldsymbol{M}_{1}\otimes_{\kappa_{u}^{T}}\mathbf{h}_{u_{j}}(T_{n})}_{\text{user hidden aggregation}}\oplus_{\kappa_{u}^{T}}\underbrace{\boldsymbol{M}_{2}\otimes_{\kappa_{u}^{T}}\exp_{\mathbf{o}}^{\kappa_{u}^{T}}(\mathbf{e'}_{u_{j}}^{T})}_{\text{interaction aggregation}}\oplus_{\kappa_{u}^{T}}\underbrace{\boldsymbol{M}_{3}\otimes_{\kappa_{u}^{T}}\operatorname{map}_{\kappa_{i}^{T}}^{\kappa_{u}^{T}}(\mathbf{i'}_{u_{j}}^{T})}_{\text{item aggregation}},$
(8) $\mathbf{i}_{k}(T_{n})\leftarrow\underbrace{\boldsymbol{M}_{4}\otimes_{\kappa_{i}^{T}}\mathbf{h}_{i_{k}}(T_{n})}_{\text{item hidden aggregation}}\oplus_{\kappa_{i}^{T}}\underbrace{\boldsymbol{M}_{5}\otimes_{\kappa_{i}^{T}}\exp_{\mathbf{o}}^{\kappa_{i}^{T}}(\mathbf{e'}_{i_{k}}^{T})}_{\text{interaction aggregation}}\oplus_{\kappa_{i}^{T}}\underbrace{\boldsymbol{M}_{6}\otimes_{\kappa_{i}^{T}}\operatorname{map}_{\kappa_{u}^{T}}^{\kappa_{i}^{T}}(\mathbf{u'}_{i_{k}}^{T})}_{\text{user aggregation}},$

where the $\boldsymbol{M}$s are trainable matrices and $\mathbf{h}$ denotes the hidden state. The aggregation operator has three main components: hidden-state aggregation, interaction aggregation and neighbor aggregation.

Take the user aggregation in Eq. 7 and Fig. 3 as an example: the first component updates the hidden representation of user $j$ over several layers, which can be cascaded for multi-hop message aggregation; the second component aggregates the associated edge information into user $j$; and the third component aggregates information from the items user $j$ has interacted with. It is worth noting that the problem of ignoring information propagation among nodes of the same type can be solved by stacking several layers. $\mathbf{e'}_{u/i}^{T}$ in the interaction aggregation is obtained by either Early Fusion $\mathbf{e'}_{u/i}^{T}=\textsc{Mlp}(\textsc{Pooling}(\mathbf{e}\in\mathcal{E}_{u/i}^{T}))$ or Late Fusion $\mathbf{e'}_{u/i}^{T}=\textsc{Pooling}(\textsc{Mlp}(\mathbf{e}\in\mathcal{E}_{u/i}^{T}))$, where $\textsc{Pooling}(\cdot)$ can be mean pooling or max pooling. $\mathbf{i'}_{u}^{T}$ and $\mathbf{u'}_{i}^{T}$ in the neighbor aggregation are the weighted gyro-midpoints (Ungar, 2010) (specified in the Appendix) in the item and user stereographic spaces, respectively. Take $\mathbf{i'}_{u}^{T}$ as an example, which is defined as:

(9) $\mathbf{i'}_{u}^{T}=\textsc{Midpoint}_{\kappa_{i}^{T}}\left(\mathbf{x}_{j}\in\mathcal{N}_{u}^{T}\right)=\frac{1}{2}\otimes_{\kappa_{i}^{T}}\left(\sum_{\mathbf{x}_{j}\in\mathcal{N}_{u}^{T}}\frac{\alpha_{j}\lambda_{\mathbf{x}_{j}}^{\kappa}}{\sum_{\mathbf{x}_{k}\in\mathcal{N}_{u}^{T}}\alpha_{k}\left(\lambda_{\mathbf{x}_{k}}^{\kappa}-1\right)}\mathbf{x}_{j}\right),$

where the $\alpha_{j}$ are attention weights and $\lambda_{\mathbf{x}}^{\kappa}$ is the point-wise conformal factor.
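The weighted gyro-midpoint of Eq. 9 can be sketched as below; this is an illustrative NumPy version with attention weights taken as given (the map operator of this section is simply the composition $\exp_{\mathbf{o}}^{\kappa_{2}}\circ\log_{\mathbf{o}}^{\kappa_{1}}$ of the maps sketched in Sec. 2.2).

```python
import numpy as np

def lam(x, kappa):                    # conformal factor lambda_x^kappa
    return 2.0 / (1.0 + kappa * np.dot(x, x))

def mobius_scalar_mul(r, x, kappa):   # Eq. 2, used for the leading 1/2
    s = np.sqrt(abs(kappa))
    tan_k = np.tan if kappa > 0 else np.tanh
    arctan_k = np.arctan if kappa > 0 else np.arctanh
    nx = np.linalg.norm(x)
    return (1.0 / s) * tan_k(r * arctan_k(s * nx)) * x / nx

def gyro_midpoint(points, alphas, kappa):
    """Weighted gyro-midpoint (Eq. 9) of neighbor embeddings in one space."""
    points = np.asarray(points, dtype=float)
    lams = np.array([lam(p, kappa) for p in points])
    weights = alphas * lams / np.sum(alphas * (lams - 1.0))
    return mobius_scalar_mul(0.5, weights @ points, kappa)

# Three neighbors in a hyperbolic item space, uniform attention weights.
nbrs = [[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]]
print(gyro_midpoint(nbrs, np.ones(3), kappa=-1.0))
```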

After the aggregation of nodes and edges, we project embeddings to next-period manifolds for further training and predicting. The node updating equation is defined as follows:

(10) $\mathbf{u}_{j}(T_{n+1})=\operatorname{map}_{\kappa_{u}^{T_{n}}}^{\kappa_{u}^{T_{n+1}}}\left(\mathbf{u}_{j}\left(T_{n}\right)\right),$
(11) $\mathbf{i}_{j}(T_{n+1})=\operatorname{map}_{\kappa_{i}^{T_{n}}}^{\kappa_{i}^{T_{n+1}}}\left(\mathbf{i}_{j}\left(T_{n}\right)\right),$

where the curvature $\kappa^{T_{n+1}}$ of the next interval is obtained from Eq. 16.

3.2. Interaction Modeling

Unlike the user and item nodes, we place the sequential interactions in Euclidean space rather than on Riemannian manifolds. The intuition behind this design is that the tangent space, which is Euclidean, is common to the different Riemannian manifolds, and users and items are linked by the sequential interactions. Specifically, for an edge $e_{k}\in\mathcal{E}$, the integration of its corresponding attribute $\mathbf{X}_{k}\in\mathcal{X}$ and timestamp $t_{k}\in\mathcal{T}$ is regarded as its initial embedding. To capture the temporal information of interactions, we formalize the time encoder as a function $\Phi_{t}:\mathbb{R}\rightarrow\mathbb{R}^{d}$ which maps a continuous time $t_{k}$ to a $d$-dimensional vector. To account for time translation invariance under the aforementioned batch setup, we adopt the harmonic encoder (Xu et al., 2020) as follows:

(12) $\phi(t)=\sqrt{\frac{1}{d}}\left[\cos\left(\omega_{1}t+\theta_{1}\right);\cos\left(\omega_{2}t+\theta_{2}\right);\ldots;\cos\left(\omega_{d}t+\theta_{d}\right)\right],$

where the $\omega$s and $\theta$s are learnable encoder parameters and $[\cdot\,;\cdot]$ is the concatenation operator. The definition in Eq. 12 satisfies time translation invariance. Thus, the initial embedding of interaction $e_{k}$ can be represented as:

(13) $\mathbf{e}_{k}=\sigma(\boldsymbol{W_{1}}\cdot\left[\mathbf{X}_{k}\,;\,\phi(t_{k})\right]),$

where $\sigma(\cdot)$ is the activation function and $\boldsymbol{W_{1}}$ is a trainable matrix.
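A minimal PyTorch sketch of the harmonic time encoder (Eq. 12) and the interaction embedding (Eq. 13) is given below; module names and dimensions are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class HarmonicTimeEncoder(nn.Module):
    """phi(t) = sqrt(1/d) [cos(w_1 t + th_1); ...; cos(w_d t + th_d)] (Eq. 12)."""
    def __init__(self, d: int):
        super().__init__()
        self.d = d
        self.omega = nn.Parameter(torch.randn(d))   # learnable frequencies
        self.theta = nn.Parameter(torch.zeros(d))   # learnable phases

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch,) -> (batch, d)
        return math.sqrt(1.0 / self.d) * torch.cos(t.unsqueeze(-1) * self.omega + self.theta)

class InteractionEmbedding(nn.Module):
    """e_k = sigma(W1 [X_k ; phi(t_k)]) (Eq. 13)."""
    def __init__(self, d_attr: int, d_time: int, d_out: int):
        super().__init__()
        self.time_enc = HarmonicTimeEncoder(d_time)
        self.W1 = nn.Linear(d_attr + d_time, d_out, bias=False)

    def forward(self, x_k: torch.Tensor, t_k: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.W1(torch.cat([x_k, self.time_enc(t_k)], dim=-1)))

emb = InteractionEmbedding(d_attr=4, d_time=8, d_out=16)
print(emb(torch.randn(2, 4), torch.tensor([0.5, 1.2])).shape)  # torch.Size([2, 16])
```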

3.3. Curvature Estimator

Previous work places embedding vectors on a manifold of one fixed curvature, such as the Poincaré ball; a fixed curvature is a strong premise that cannot manifest the dynamics of the representation space. In fact, the curvature of the embedding space evolves over time, so we need to compute both local and global curvature, i.e., Ricci curvature and sectional curvature.

Sequential interaction networks are mostly bipartite graphs, which means there are no direct links between nodes of the same type. Therefore, to explore the local Ricci curvature in both the user and item spaces, we need to extract new topologies from the input network. The rule of subgraph extraction is based on the shared-neighbor relationship, e.g., users with common item neighbors are regarded as friends. However, letting all users with the same neighbors become friends would yield several complete graphs and increase the complexity; hence, we sample during the subgraph extraction to avoid the high time cost. The local curvature of any edge $(x,y)$ is then given by the Ollivier-Ricci curvature $\kappa_{r}$, defined as follows:

(14) $\kappa_{r}(x,y)=1-\frac{W\left(m_{x},m_{y}\right)}{d_{\mathcal{M}}^{\kappa}(x,y)},$

where $W(\cdot,\cdot)$ is the Wasserstein distance (or Earth Mover's distance), which finds the optimal mass-preserving transportation plan between two probability measures, and $m_{x}$ is the mass distribution of node $x$, measuring the importance distribution of the node among its one-hop neighbors $\mathcal{N}(x)=\{x_{1},x_{2},\ldots,x_{l}\}$, defined as:

(15) $m_{x}^{\alpha}\left(x_{i}\right)=\begin{cases}\alpha&\text{if }x_{i}=x,\\(1-\alpha)/l&\text{if }x_{i}\in\mathcal{N}(x),\\0&\text{otherwise,}\end{cases}$

where $\alpha$ is a hyper-parameter within $[0,1]$. Following previous work (Ye et al., 2020; Sia et al., 2019), we set $\alpha=0.5$.
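To make Eqs. 14-15 concrete, the sketch below computes the Ollivier-Ricci curvature of one edge of an unweighted graph, using shortest-path lengths as the ground distance and solving the optimal transport sub-problem as a small linear program; this is an assumption-laden illustration, not the paper's implementation.

```python
import networkx as nx
import numpy as np
from scipy.optimize import linprog

def mass_distribution(G, x, alpha=0.5):
    """Eq. 15: keep alpha at x, spread (1 - alpha) over the one-hop neighbors."""
    nbrs = list(G.neighbors(x))
    return [x] + nbrs, np.array([alpha] + [(1 - alpha) / len(nbrs)] * len(nbrs))

def wasserstein(G, sup_x, mu_x, sup_y, mu_y):
    """Exact W(m_x, m_y) between two discrete measures via a transport LP."""
    d = dict(nx.all_pairs_shortest_path_length(G))
    C = np.array([[d[a][b] for b in sup_y] for a in sup_x], dtype=float)
    n, m = C.shape
    A_eq, b_eq = [], []
    for i in range(n):                       # rows of the plan sum to mu_x
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(mu_x[i])
    for j in range(m):                       # columns of the plan sum to mu_y
        col = np.zeros(n * m); col[j::m] = 1.0
        A_eq.append(col); b_eq.append(mu_y[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

def ollivier_ricci(G, x, y, alpha=0.5):
    """Eq. 14: kappa_r(x, y) = 1 - W(m_x, m_y) / d(x, y)."""
    sx, mx = mass_distribution(G, x, alpha)
    sy, my = mass_distribution(G, y, alpha)
    return 1.0 - wasserstein(G, sx, mx, sy, my) / nx.shortest_path_length(G, x, y)

print(ollivier_ricci(nx.cycle_graph(6), 0, 1))  # a flat ring: curvature 0.0
```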

Input: an undirected graph $G$, number of iterations $n$
Output: observed curvature $\kappa_{o}$
for each $m\in G$ do
    for $i=1,\ldots,n$ do
        sample $b,c\in\mathcal{N}(m)$ and $a\in G\setminus\{m\}$;
        calculate $\gamma_{\mathcal{M}}(a,b,c)$;
        calculate $\gamma_{\mathcal{M}}(m;b,c;a)$;
    let $\gamma(m)=\operatorname{MEAN}(\gamma_{\mathcal{M}}(m;b,c;a))$;
Return: $\kappa_{o}=\operatorname{MEAN}(\gamma(m))$
Algorithm 1: Curvature Observation

Let $\mathbf{R}$ represent the vector of Ricci curvatures. Inspired by the bilinear form between Ricci curvature and sectional curvature, we design a global sectional curvature neural network called CurvNN as follows:

(16) $\kappa_{e}=\textsc{Mlp}\left(\mathbf{R}\right)^{T}\boldsymbol{W_{2}}\,\textsc{Mlp}\left(\mathbf{R}\right),$

where $\boldsymbol{W_{2}}$ is a trainable matrix and Mlp denotes a multilayer perceptron. The bilinear design of CurvNN allows us to obtain a sectional curvature of either sign.
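A PyTorch sketch of the bilinear CurvNN of Eq. 16 follows; the input size and hidden width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CurvNN(nn.Module):
    """kappa_e = MLP(R)^T W2 MLP(R) (Eq. 16); bilinear, so either sign is reachable."""
    def __init__(self, n_edges: int, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_edges, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.W2 = nn.Parameter(torch.randn(hidden, hidden))

    def forward(self, R: torch.Tensor) -> torch.Tensor:
        z = self.mlp(R)         # shared MLP applied to the Ricci-curvature vector
        return z @ self.W2 @ z  # scalar estimated sectional curvature kappa_e

curv_nn = CurvNN(n_edges=10)
print(curv_nn(torch.randn(10)))  # a scalar of either sign
```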

The observation of the curvature is also important for training CurvNN. Therefore, we adopt a variant of the parallelogram law to observe the sectional curvature (Gu et al., 2019; Fu et al., 2021; Bachmann et al., 2020). Consider a geodesic triangle $abc$ on manifold $\mathcal{M}$ and let $m$ be the midpoint of $bc$; two quantities are defined as $\gamma_{\mathcal{M}}(a,b,c):=d_{\mathcal{M}}(a,m)^{2}+d_{\mathcal{M}}(b,c)^{2}/4-\left(d_{\mathcal{M}}(a,b)^{2}+d_{\mathcal{M}}(a,c)^{2}\right)/2$ and $\gamma_{\mathcal{M}}(m;b,c;a):=\frac{\gamma_{\mathcal{M}}(a,b,c)}{2d_{\mathcal{M}}(a,m)}$. The observed curvature is calculated by Algorithm 1. The observed curvature $\kappa_{o}$ supervises the training of CurvNN so that the estimated curvature $\kappa_{e}$ is as accurate as possible.
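Algorithm 1 can be sketched as follows, assuming shortest-path lengths as $d_{\mathcal{M}}$ and treating each sampled node $m$ as the midpoint of its two sampled neighbors $b$ and $c$; the example graph and sampling counts are illustrative.

```python
import random
import networkx as nx

def gamma(d, a, b, c, m):
    """gamma_M(a, b, c) with m the midpoint of b and c (parallelogram law)."""
    return d[a][m] ** 2 + d[b][c] ** 2 / 4.0 - (d[a][b] ** 2 + d[a][c] ** 2) / 2.0

def observed_curvature(G, n_iter=10, seed=0):
    """Algorithm 1: average the normalized parallelogram defect over samples."""
    rng = random.Random(seed)
    d = dict(nx.all_pairs_shortest_path_length(G))
    nodes = list(G.nodes)
    per_node = []
    for m in nodes:
        nbrs = list(G.neighbors(m))
        if len(nbrs) < 2:
            continue
        vals = []
        for _ in range(n_iter):
            b, c = rng.sample(nbrs, 2)
            a = rng.choice([v for v in nodes if v != m])
            vals.append(gamma(d, a, b, c, m) / (2.0 * d[a][m]))  # gamma_M(m; b, c; a)
        per_node.append(sum(vals) / len(vals))                   # gamma(m)
    return sum(per_node) / len(per_node)                         # kappa_o

print(observed_curvature(nx.balanced_tree(2, 4)))  # trees: negative on average
```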

3.4. Training Objective

Benefiting from the fact that the output of SINCERE is the embedded representations of users and items, we can feed them into various downstream tasks, such as interaction prediction. If we naively added up distances from different spaces, the distortion would be huge due to the different geometric metrics. Therefore, we consider the optimization from two symmetric aspects, i.e., user and item. Specifically, we first map the item embedding into the user space and calculate the distance under the user geometry metric, and vice versa, as follows:

(17) $d_{\mathbb{U}}\left(\mathbf{u},\mathbf{i}\right)=d_{\kappa_{u}}\left(\mathbf{u},\operatorname{map}_{\kappa_{i}}^{\kappa_{u}}\left(\mathbf{i}\right)\right),\quad d_{\mathbb{I}}\left(\mathbf{u},\mathbf{i}\right)=d_{\kappa_{i}}\left(\operatorname{map}_{\kappa_{u}}^{\kappa_{i}}\left(\mathbf{u}\right),\mathbf{i}\right).$

Then, we use the Fermi-Dirac decoder, a generalization of the sigmoid, as the likelihood function to compute the connection probability between users and items in both spaces. Since the distances are converted into probabilities, the probabilities can easily be combined. Moreover, we are curious about which geometric manifold is more critical for interaction prediction, and thus assign more weight to the corresponding probability. The likelihood function and the probability assembly are defined as follows:

(18) $P_{\mathbb{U}/\mathbb{I}}(e|\Theta)=\frac{1}{\exp\left(\left(d_{\mathbb{U}/\mathbb{I}}(\mathbf{u},\mathbf{i})-r\right)/t\right)+1},$
(19) $P(e|\Theta)=\rho P_{\mathbb{U}}+(1-\rho)P_{\mathbb{I}},$

where $\Theta$ represents all learnable parameters in SINCERE, $r,t>0$ are hyperparameters of the Fermi-Dirac decoder, and $\rho$ is a learnable weight. Users and items share the same formulation of the likelihood.
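A sketch of the decoder and the probability assembly (Eqs. 18-19); the hyperparameter values and the sigmoid squashing that keeps $\rho$ in $(0,1)$ are our assumptions.

```python
import torch

def fermi_dirac(dist: torch.Tensor, r: float = 2.0, t: float = 1.0) -> torch.Tensor:
    """Eq. 18: P = 1 / (exp((d - r) / t) + 1), with decoder hyperparameters r, t."""
    return 1.0 / (torch.exp((dist - r) / t) + 1.0)

def assemble(p_user: torch.Tensor, p_item: torch.Tensor, rho: torch.Tensor) -> torch.Tensor:
    """Eq. 19: convex combination of the per-space probabilities."""
    return rho * p_user + (1.0 - rho) * p_item

d_U = torch.tensor([1.5, 3.0])          # distances measured in the user space (Eq. 17)
d_I = torch.tensor([1.2, 2.8])          # distances measured in the item space
rho = torch.sigmoid(torch.tensor(0.0))  # learnable weight, squashed into (0, 1)
print(assemble(fermi_dirac(d_U), fermi_dirac(d_I), rho))
```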

The main optimization is based on the idea of Pull-and-Push. We expect the embeddings in batch $T_{n}$ to be capable of predicting the interactions in batch $T_{n+1}$; thus, we utilize the graph in batch $T_{n+1}$ to supervise the previous batch. In particular, we expect a greater probability for all interactions that actually occur in the next batch, but a smaller probability for the negative samples. Given the large number of negative samples, we use (Gaussian) negative sampling at a rate of 20%, which also avoids over-fitting. For the curvature estimator component, we want CurvNN to learn the global sectional curvature from the local structure (Ricci curvature) without observation at prediction time. Therefore, we make the Euclidean norm between the estimated curvature $\kappa_{e}$ and the observed curvature $\kappa_{o}$ part of our optimization target. We calculate the total objective as follows:

(20) $\Theta^{*}=\mathop{\arg\max}\limits_{\Theta}\sum_{e\in\mathcal{E}}\ln[P(e|\Theta)]-\mathop{Sampling}\left(\sum_{e\notin\mathcal{E}}\ln[P(e|\Theta)]\right)-\|\kappa_{e}-\kappa_{o}\|_{2}.$
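A direct transcription of Eq. 20 as a loss to minimize is sketched below; the tensors stand in for batched probabilities and curvatures, and the negative sampling is assumed to have happened upstream.

```python
import torch

def sincere_loss(p_pos, p_neg_sampled, kappa_e, kappa_o):
    """Negated objective of Eq. 20: pull observed interactions up, push sampled
    negatives down, and regress the estimated curvature onto the observed one."""
    pull = torch.log(p_pos).sum()          # interactions occurring in the next batch
    push = torch.log(p_neg_sampled).sum()  # ~20% sampled non-edges
    curv = torch.norm(kappa_e - kappa_o, p=2)
    return -(pull - push) + curv

loss = sincere_loss(torch.tensor([0.9, 0.8]), torch.tensor([0.2, 0.1]),
                    torch.tensor([-0.8, 0.9]), torch.tensor([-1.0, 1.1]))
print(loss)
```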

3.5. Computational Complexity Analysis

The procedure for training SINCERE is given in the Appendix. The curvature estimator and the cross-geometry operations are the two main processes of SINCERE. To avoid excessive parameters, we use a shared curvature estimator for both the user and item spaces. The primary computational complexity of SINCERE is approximately $O(|\mathcal{U}|DD'+|\mathcal{I}|DD'+|\mathcal{E}|D'^{2})+O(V^{3}\log V)$, where $|\mathcal{U}|$ and $|\mathcal{I}|$ are the numbers of user and item nodes, $|\mathcal{E}|$ is the number of interactions, $D$ and $D'$ are the dimensions of the input and hidden features, respectively, and $V$ is the total number of vertices in the Wasserstein-distance sub-problem. The former $O(\cdot)$ term is the computational complexity of all aggregation and updating operations; the computation of the other parts of our model can be parallelized and is computationally efficient. The latter $O(\cdot)$ term is the complexity of Ollivier-Ricci curvature. To avoid repeatedly calculating Ricci curvature, we adopt an offline pre-computation that calculates Ricci curvatures according to different batch sizes and saves them in advance, since the subgraph in each batch does not change.

4. Experiments

4.1. Datasets and Compared Algorithms

Datasets. We conduct experiments on five real-world dynamic datasets, MOOC, Wikipedia, Reddit, LastFM and Movielen, to evaluate the effectiveness of SINCERE. The descriptions and statistics of these datasets are given in the Appendix.

Compared Algorithms. To demonstrate the effectiveness of our method, we compare our model with nine state-of-the-art models, categorized as follows:

  • Recurrent sequence models: we compare with different flavors of RNNs trained for item-sequence data: LSTM, Time-LSTM (Zhu et al., 2017), RRN (Wu et al., 2017).

  • Random walk based models: CAW (Wang et al., 2021b), CTDNE (Nguyen et al., 2018) are two temporal network models adopting causal and anonymous random walks.

  • Sequential network embedding models: in this category, JODIE (Kumar et al., 2019), HILI (Chen et al., 2021) and DeePRed (Kefato et al., 2021) are three state-of-the-art methods that generate embeddings from sequential networks using recursive networks.

  • Hyperbolic models: we compare SINCERE with HGCF (Sun et al., 2021a) which is a hyperbolic method on collaborative filtering.

Table 3. Future interaction prediction: performance comparison in terms of mean reciprocal rank (MRR) and Recall@10. The best results are in bold and the second-best results are underlined.

Datasets →                     MOOC             Wikipedia        Reddit           LastFM           Movielen
Methods ↓                      MRR    Recall@10 MRR    Recall@10 MRR    Recall@10 MRR    Recall@10 MRR    Recall@10
LSTM                           0.055  0.109     0.329  0.455     0.355  0.551     0.062  0.119     0.031  0.060
Time-LSTM (Zhu et al., 2017)   0.079  0.161     0.247  0.342     0.387  0.573     0.068  0.137     0.046  0.084
RRN (Wu et al., 2017)          0.127  0.230     0.522  0.617     0.603  0.747     0.089  0.182     0.072  0.181
CAW (Wang et al., 2021b)       0.200  0.427     0.656  0.862     0.672  0.794     0.121  0.272     0.096  0.243
CTDNE (Nguyen et al., 2018)    0.173  0.368     0.035  0.056     0.165  0.257     0.010  0.010     0.033  0.051
JODIE (Kumar et al., 2019)     0.465  0.765     0.746  0.822     0.726  0.852     0.195  0.307     0.428  0.685
HILI (Chen et al., 2021)       0.436  0.826     0.761  0.853     0.735  0.868     0.252  0.427     0.469  0.784
DeePRed (Kefato et al., 2021)  0.458  0.532     0.885  0.889     0.828  0.833     0.393  0.416     0.441  0.472
HGCF (Sun et al., 2021a)       0.284  0.618     0.123  0.344     0.239  0.483     0.040  0.083     0.108  0.260
SINCERE (ours)                 0.586  0.885     0.793  0.865     0.825  0.883     0.425  0.466     0.511  0.819

4.2. Future Interaction Prediction

The goal of the future interaction prediction task is to predict the next item a user is likely to interact with based on observations of recent interactions. We use two metrics, mean reciprocal rank (MRR) and Recall@10, to measure the performance of all methods; the higher the metrics, the better the performance. For a fair comparison, all models are run for 50 epochs and the embedding dimension is set to 128 for most models. We use a chronological train-valid-test split with a ratio of 80%-10%-10%, following the other baselines, and use the validation set to tune SINCERE's hyperparameters with Adam optimization at a learning rate of $3\times10^{-3}$.

Figure 4. Comparison between the original methods and their variants without static embeddings on future interaction prediction.

During testing, given a former period of interactions, SINCERE first estimates the sectional curvature of the next period based on the structure information, then calculates the link probabilities between the given user and all items and predicts a ranked top-$k$ item list. The experimental results are reported in Tab. 3, where all baselines use the best-performing parameters reported in their original papers. In general, our method performs the best against the other methods, demonstrating the benefits of the dual-space design and of the representation-space dynamics. Among all methods, we find that SINCERE and DeePRed have the smallest gaps between MRR and Recall@10, which means that these two models can not only find the exact items a user may interact with, but also rank the ground truth above the negative samples.

While predicting future interactions, we discover that the performance of the two state-of-the-art methods JODIE and HILI varies little with the embedding size from 32 to 256 compared with the other methods, i.e., they are insensitive to the embedding size. We argue that this robustness to embedding size is actually irrelevant to the merit of the models themselves: JODIE and HILI rely strongly on static embeddings, which are one-hot vectors. To verify this, we further investigate the results of removing the static embeddings (one-hot vectors). In Fig. 4, JODIE_Dy and HILI_Dy denote the variants with no static embeddings; the JODIE variants are in blue schemes and the HILI variants in green. The experimental results show a huge gap once the one-hot vectors are removed, which verifies our assumption, while SINCERE performs the best in terms of MRR and Recall@10. Besides, HILI relies less on the static embeddings, as the gap between HILI and HILI_Dy is smaller than that between JODIE and JODIE_Dy.

Table 4. Comparison of different variants of SINCERE.

Datasets →          MOOC             Wikipedia        Reddit           Movielen
Methods ↓           MRR    Recall@10 MRR    Recall@10 MRR    Recall@10 MRR    Recall@10
SINCERE             0.586  0.885     0.793  0.865     0.825  0.883     0.511  0.819
SINCERE-Static      0.469  0.813     0.747  0.834     0.703  0.775     0.402  0.664
SINCERE-$\mathbb{E}$ 0.397  0.674     0.692  0.758     0.682  0.723     0.374  0.616

4.3. Ablation Analysis

The introduction of co-evolving $\kappa$-stereographic spaces is the basis of SINCERE. To analyze the effects of dynamic modeling and of the co-evolving $\kappa$-stereographic spaces, we conduct ablation experiments on four datasets with two variant models, named SINCERE-Static and SINCERE-$\mathbb{E}$, respectively.

Figure 5. A visualization of three different stages of SINCERE-Static on the Movielen dataset. We follow and color the movie nodes viewed by three users in red, orange and green; the other nodes are in blue. The central convergence points are the three picked users, mapped from the user space.

For SINCERE-Static, we fix the curvature from the beginning of training to the last-batch curvature of the previous experiment, keeping the spaces steady. For SINCERE-$\mathbb{E}$, we fix the curvature to zero, so the space is Euclidean. As shown in Tab. 4, the performance of SINCERE-Static is closer to SINCERE than that of SINCERE-$\mathbb{E}$, because the curvature set in SINCERE-Static can be seen as a pseudo-optimal curvature. The small remaining gap indicates that some information is lost in the training process even when the pseudo-optimal curvature is given. Fig. 5 gives a visualization of SINCERE-Static on the Movielen dataset with the curvature set to $-1$. The performance of fixing the space as Euclidean is less competitive, which confirms that introducing dual dynamic Riemannian manifolds is necessary and effective.

4.4. Hyperparameter Sensitivity Analysis

To further investigate the performance of SINCERE, we conduct several experiments for SINCERE with different setups to analyze its sensitivity to hyperparameters.

Figure 6. Effect of embedding size.

Fig. 6 shows the impact of the embedding dimension on the performance of three methods. 128 seems an optimal value for DeePRed in most cases. As discussed in Sec. 4.2, JODIE counts on its static embeddings, which makes the embedding size have almost no influence on it. However, we find that in some cases, such as LastFM and Wikipedia, SINCERE performs better with dimension 64 than with 128; we attribute this to the strong representational power of hyperbolic spaces, for which 64 dimensions are already capable enough to represent these networks.

Figure 7. Effect of sample ratio on the MOOC dataset.

The time spent computing Ricci curvatures and the influence of the subgraph-extraction sampling ratio on SINCERE's performance are shown in Fig. 7. We observe that the computing time decreases sharply as the number of total intervals increases. Besides, the performance (Recall@10) levels off between sampling ratios of 15% and 20% as the interval number grows. This finding tells us that, rather than increasing the sampling rate to improve performance, it is far better to set a higher interval number, considering the computational complexity of Ricci curvatures.

5. Related Work

Sequential Graph Representation. Graph representation learning assigns each node an embedding integrating structure and attribute information. Sequential interaction graph representation additionally takes temporal dependency into consideration (Kazemi et al., 2020; Aggarwal and Subbian, 2014), and temporal graph neural networks have achieved great success recently (Xu et al., 2020; Wang et al., 2021a; Zuo et al., 2018; Gupta et al., 2022; Sun et al., 2022a). Among them, RNN-based models (Zhu et al., 2017; Wu et al., 2017; Baytas et al., 2017; Beutel et al., 2018) are the most popular due to their effectiveness on sequential data. CTDNE (Nguyen et al., 2018) and CAW (Wang et al., 2021b) adopt random walks for temporal graph modeling. Co-evolution models consider the mutual influence between users and items and achieve great success (Dai et al., 2016; Kumar et al., 2019; Chen et al., 2021; Kefato et al., 2021). For systematic and comprehensive reviews, readers may refer to (Kazemi et al., 2020; Aggarwal and Subbian, 2014).

Riemannian Graph Learning. Hyperbolic and spherical manifolds are typical instances of Riemannian manifolds and have recently served as alternatives to Euclidean space, alleviating distortion problems. A series of Riemannian graph models has been proposed (Liu et al., 2019; Brehmer and Cranmer, 2020; Sonthalia and Gilbert, 2020; Chami et al., 2020; Sun et al., 2021b, 2023). Recently, to match the wide variety of graph structures, mixed-curvature spaces have been studied. For instance, MVAE (Skopek et al., 2020) and SelfMGNN (Sun et al., 2022b) are two recent representatives that capture complex graph structure in product manifolds. As another line of work, GIL (Zhu et al., 2020) attempts to embed a graph into Euclidean space and hyperbolic space simultaneously for interaction learning.

6. Conclusion

In this paper, we make the first attempt to study sequential interaction network representation learning in co-evolving Riemannian spaces, and present a novel model, SINCERE. Specifically, we first introduce a couple of co-evolving Riemannian representation spaces for the two types of nodes, bridged by the Euclidean-like tangent space. Then, we design the sectional curvature estimator and cross-geometry methods for predicting the evolution trend of the spaces and for message propagation. Finally, we conduct extensive experiments on several real-world datasets and show the superiority of SINCERE.

Acknowledgements.
This work was supported in part by the National Natural Science Foundation of China under Grants U1936103, 61921003 and 62202164.

References

  • Aggarwal and Subbian (2014) Charu Aggarwal and Karthik Subbian. 2014. Evolutionary Network Analysis: A Survey. ACM Computing Surveys (CSUR) 47, 1 (2014), 1–36.
  • Bachmann et al. (2020) Gregor Bachmann, Gary Becigneul, and Octavian Ganea. 2020. Constant Curvature Graph Convolutional Networks. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, 486–496. https://proceedings.mlr.press/v119/bachmann20a.html
  • Baytas et al. (2017) Inci M Baytas, Cao Xiao, Xi Zhang, Fei Wang, Anil K Jain, and Jiayu Zhou. 2017. Patient Subtyping via Time-aware LSTM Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 65–74.
  • Beutel et al. (2018) Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H. Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (Marina Del Rey, CA, USA) (WSDM ’18). Association for Computing Machinery, New York, NY, USA, 46–54. https://doi.org/10.1145/3159652.3159727
  • Brehmer and Cranmer (2020) Johann Brehmer and Kyle Cranmer. 2020. Flows for Simultaneous Manifold Learning and Density Estimation. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 442–453. https://proceedings.neurips.cc/paper/2020/file/051928341be67dcba03f0e04104d9047-Paper.pdf
  • Cao et al. (2021) Jiangxia Cao, Xixun Lin, Shu Guo, Luchen Liu, Tingwen Liu, and Bin Wang. 2021. Bipartite Graph Embedding via Mutual Information Maximization. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 635–643.
  • Chamberlain et al. (2019) Benjamin Paul Chamberlain, Stephen R Hardwick, David R Wardrope, Fabon Dzogang, Fabio Daolio, and Saúl Vargas. 2019. Scalable Hyperbolic Recommender Systems. arXiv preprint arXiv:1902.08648 (2019). https://arxiv.org/pdf/1902.08648.pdf
  • Chami et al. (2020) Ines Chami, Albert Gu, Vaggos Chatziafratis, and Christopher Ré. 2020. From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 15065–15076. https://proceedings.neurips.cc/paper/2020/file/ac10ec1ace51b2d973cd87973a98d3ab-Paper.pdf
  • Chami et al. (2019) Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. 2019. Hyperbolic Graph Convolutional Neural Networks. Advances in Neural Information Processing Systems 32 (2019).
  • Chen et al. (2021) Huidi Chen, Yun Xiong, Yangyong Zhu, and Philip S. Yu. 2021. Highly Liquid Temporal Interaction Graph Embeddings. In Proceedings of the Web Conference 2021 (Ljubljana, Slovenia) (WWW ’21). Association for Computing Machinery, New York, NY, USA, 1639–1648. https://doi.org/10.1145/3442381.3449921
  • Dąbrowski et al. (2021) Jacek Dąbrowski, Barbara Rychalska, Michał Daniluk, Dominika Basaj, Konrad Gołuchowski, Piotr Bąbel, Andrzej Michałowski, and Adam Jakubowski. 2021. An Efficient Manifold Density Estimator for All Recommendation Systems. In Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8–12, 2021, Proceedings, Part IV 28. Springer, 323–337.
  • Dai et al. (2016) Hanjun Dai, Yichen Wang, Rakshit Trivedi, and Le Song. 2016. Deep Coevolutionary Network: Embedding User and Item Features for Recommendation. arXiv preprint arXiv:1609.03675 (2016). https://arxiv.org/pdf/1609.03675.pdf
  • Defferrard et al. (2020) Michaël Defferrard, Martino Milani, Frédérick Gusset, and Nathanaël Perraudin. 2020. DeepSphere: a Graph-based Spherical CNN. In International Conference on Learning Representations. https://openreview.net/forum?id=B1e3OlStPB
  • Fan et al. (2021) Ziwei Fan, Zhiwei Liu, Jiawei Zhang, Yun Xiong, Lei Zheng, and Philip S Yu. 2021. Continuous-time Sequential Recommendation with Temporal Graph Collaborative Transformer. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 433–442.
  • Feng et al. (2020) Shanshan Feng, Lucas Vinh Tran, Gao Cong, Lisi Chen, Jing Li, and Fan Li. 2020. Hme: A Hyperbolic Metric Embedding Approach for Next-poi Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1429–1438.
  • Forman (2003) Robin Forman. 2003. Bochner’s Method for Cell Complexes and Combinatorial Ricci Curvature. Discrete and Computational Geometry 29, 3 (2003), 323–374.
  • Fu et al. (2021) Xingcheng Fu, Jianxin Li, Jia Wu, Qingyun Sun, Cheng Ji, Senzhang Wang, Jiajun Tan, Hao Peng, and Philip S. Yu. 2021. ACE-HGNN: Adaptive Curvature Exploration Hyperbolic Graph Neural Network. In 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 111–120.
  • Ganea et al. (2018) Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. 2018. Hyperbolic Neural Networks. In Advances in Neural Information Processing Systems. 5345–5355.
  • Gromov (1987) M. Gromov. 1987. Hyperbolic Groups. Springer New York, New York, NY, 75–263. https://doi.org/10.1007/978-1-4613-9586-7_3
  • Gu et al. (2019) Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. 2019. Learning Mixed-Curvature Representations in Product Spaces. In International Conference on Learning Representations. https://openreview.net/forum?id=HJxeWnCcF7
  • Gulcehre et al. (2019) Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, and Nando de Freitas. 2019. Hyperbolic Attention Networks. In International Conference on Learning Representations.
  • Gupta et al. (2022) Shubham Gupta, Sahil Manchanda, Srikanta Bedathur, and Sayan Ranu. 2022. TIGGER: Scalable Generative Modelling for Temporal Interaction Graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 6819–6828.
  • He et al. (2014) Xiangnan He, Ming Gao, Min-Yen Kan, Yiqun Liu, and Kazunari Sugiyama. 2014. Predicting the Popularity of Web 2.0 Items Based on User Comments. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. 233–242.
  • Huang et al. (2018) Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng, Nikos Mamoulis, and Jing Yan. 2018. Entity-Based Query Recommendation for Long-Tail Queries. ACM Trans. Knowl. Discov. Data 12, 6, Article 64 (aug 2018), 24 pages. https://doi.org/10.1145/3233186
  • Jiang et al. (2016) Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly Jr, Dawei Yin, Yi Chang, and Chengxiang Zhai. 2016. Learning Query and Document Relevance from a Web-scale Click Graph. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 185–194.
  • Kazemi et al. (2020) Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, and Pascal Poupart. 2020. Representation Learning for Dynamic Graphs: A Survey. Journal of Machine Learning Research 21, 70 (2020), 1–73.
  • Kefato et al. (2021) Zekarias Kefato, Sarunas Girdzijauskas, Nasrullah Sheikh, and Alberto Montresor. 2021. Dynamic Embeddings for Interaction Prediction. In Proceedings of the Web Conference 2021. 1609–1618.
  • Kumar et al. (2019) Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1269–1278.
  • Lee (2018) John M Lee. 2018. Introduction to Riemannian Manifolds. Springer.
  • Liu et al. (2019) Qi Liu, Maximilian Nickel, and Douwe Kiela. 2019. Hyperbolic Graph Neural Networks. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/103303dd56a731e377d01f6a37badae3-Paper.pdf
  • Mathieu et al. (2019) Emile Mathieu, Charline Le Lan, Chris J Maddison, Ryota Tomioka, and Yee Whye Teh. 2019. Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders. In Advances in Neural Information Processing Systems. 12544–12555.
  • Mirvakhabova et al. (2020) Leyla Mirvakhabova, Evgeny Frolov, Valentin Khrulkov, Ivan Oseledets, and Alexander Tuzhilin. 2020. Performance of Hyperbolic Geometry Models on Top-N Recommendation Tasks. In Proceedings of the 14th ACM Conference on Recommender Systems. 527–532.
  • Nguyen et al. (2018) Giang Hoang Nguyen, John Boaz Lee, Ryan A Rossi, Nesreen K Ahmed, Eunyee Koh, and Sungchul Kim. 2018. Continuous-time Dynamic Network Embeddings. In Companion Proceedings of the Web Conference 2018. 969–976.
  • Nickel and Kiela (2017) Maximillian Nickel and Douwe Kiela. 2017. Poincaré Embeddings for Learning Hierarchical Representations. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/59dfa2df42d9e3d41f5b02bfc32229dd-Paper.pdf
  • Nickel and Kiela (2018) Maximillian Nickel and Douwe Kiela. 2018. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, 3779–3788. https://proceedings.mlr.press/v80/nickel18a.html
  • Ollivier (2009) Yann Ollivier. 2009. Ricci Curvature of Markov Chains on Metric Spaces. Journal of Functional Analysis 256, 3 (2009), 810–864.
  • Rezende et al. (2020) Danilo Jimenez Rezende, George Papamakarios, Sébastien Racaniere, Michael Albergo, Gurtej Kanwar, Phiala Shanahan, and Kyle Cranmer. 2020. Normalizing Flows on Tori and Spheres. In International Conference on Machine Learning. PMLR, 8083–8092.
  • Sia et al. (2019) Jayson Sia, Edmond Jonckheere, and Paul Bogdan. 2019. Ollivier-ricci Curvature-based Method to Community Detection in Complex Networks. Scientific reports 9, 1 (2019), 1–12.
  • Skopek et al. (2020) Ondrej Skopek, Octavian-Eugen Ganea, and Gary Bécigneul. 2020. Mixed-curvature Variational Autoencoders. In International Conference on Learning Representations. https://openreview.net/forum?id=S1g6xeSKDS
  • Sonthalia and Gilbert (2020) Rishi Sonthalia and Anna Gilbert. 2020. Tree! I am no Tree! I am a low dimensional Hyperbolic Embedding. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 845–856. https://proceedings.neurips.cc/paper/2020/file/093f65e080a295f8076b1c5722a46aa2-Paper.pdf
  • Sreejith et al. (2016) RP Sreejith, Karthikeyan Mohanraj, Jürgen Jost, Emil Saucan, and Areejit Samal. 2016. Forman Curvature for Complex Networks. Journal of Statistical Mechanics: Theory and Experiment 2016, 6 (2016), 063206.
  • Sun et al. (2021a) Jianing Sun, Zhaoyue Cheng, Saba Zuberi, Felipe Perez, and Maksims Volkovs. 2021a. HGCF: Hyperbolic Graph Convolution Networks for Collaborative Filtering. In Proceedings of the Web Conference 2021 (Ljubljana, Slovenia) (WWW ’21). Association for Computing Machinery, New York, NY, USA, 593–601. https://doi.org/10.1145/3442381.3450101
  • Sun et al. (2023) Li Sun, Junda Ye, Hao Peng, Feiyang Wang, and Philip S. Yu. 2023. Self-Supervised Continual Graph Learning in Adaptive Riemannian Spaces. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Sun et al. (2022a) Li Sun, Junda Ye, Hao Peng, and Philip S. Yu. 2022a. A Self-supervised Riemannian GNN with Time Varying Curvature for Temporal Graph Learning. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 1827–1836.
  • Sun et al. (2022b) Li Sun, Zhongbao Zhang, Junda Ye, Hao Peng, Jiawei Zhang, Sen Su, and Philip S. Yu. 2022b. A Self-supervised Mixed-curvature Graph Neural Network. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 4146–4155.
  • Sun et al. (2021b) Li Sun, Zhongbao Zhang, Jiawei Zhang, Feiyang Wang, Hao Peng, Sen Su, and Philip S. Yu. 2021b. Hyperbolic Variational Graph Neural Network for Modeling Dynamic Graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4375–4383.
  • Ungar (2008) Abraham Albert Ungar. 2008. A Gyrovector Space Approach to Hyperbolic Geometry. Synthesis Lectures on Mathematics and Statistics 1, 1 (2008), 1–194.
  • Ungar (2010) Abraham Albert Ungar. 2010. Barycentric Calculus in Euclidean and Hyperbolic Geometry: A Comparative Introduction. World Scientific.
  • Vinh Tran et al. (2020) Lucas Vinh Tran, Yi Tay, Shuai Zhang, Gao Cong, and Xiaoli Li. 2020. HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems. Association for Computing Machinery, New York, NY, USA, 609–617. https://doi.org/10.1145/3336191.3371850
  • Wang et al. (2021a) Yiwei Wang, Yujun Cai, Yuxuan Liang, Henghui Ding, Changhu Wang, Siddharth Bhatia, and Bryan Hooi. 2021a. Adaptive Data Augmentation on Temporal Graphs. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 1440–1452. https://proceedings.neurips.cc/paper/2021/file/0b0b0994d12ad343511adfbfc364256e-Paper.pdf
  • Wang et al. (2021b) Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, and Pan Li. 2021b. Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks. In International Conference on Learning Representations. https://openreview.net/forum?id=KYPz4YsCPj
  • Wilson et al. (2014) Richard C Wilson, Edwin R Hancock, Elżbieta Pekalska, and Robert PW Duin. 2014. Spherical and Hyperbolic Embeddings of Data. IEEE transactions on pattern analysis and machine intelligence 36, 11 (2014), 2255–2269.
  • Wu et al. (2017) Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J Smola, and How Jing. 2017. Recurrent Recommender Networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 495–503.
  • Xu et al. (2020) Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. 2020. Inductive Representation Learning on Temporal Graphs. In International Conference on Learning Representations. https://openreview.net/forum?id=rJeW1yHYwH
  • Yang et al. (2021) Menglin Yang, Min Zhou, Marcus Kalander, Zengfeng Huang, and Irwin King. 2021. Discrete-Time Temporal Network Embedding via Implicit Hierarchical Learning in Hyperbolic Space. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (Virtual Event, Singapore) (KDD ’21). Association for Computing Machinery, New York, NY, USA, 1975–1985. https://doi.org/10.1145/3447548.3467422
  • Ye et al. (2020) Ze Ye, Kin Sum Liu, Tengfei Ma, Jie Gao, and Chao Chen. 2020. Curvature Graph Network. In International Conference on Learning Representations. https://openreview.net/forum?id=BylEqnVFDB
  • Zhang et al. (2021) Yiding Zhang, Xiao Wang, Chuan Shi, Nian Liu, and Guojie Song. 2021. Lorentzian Graph Convolutional Networks. In Proceedings of the Web Conference 2021. 1249–1261.
  • Zhu et al. (2020) Shichao Zhu, Shirui Pan, Chuan Zhou, Jia Wu, Yanan Cao, and Bin Wang. 2020. Graph Geometry Interaction Learning. Advances in Neural Information Processing Systems 33 (2020), 7548–7558.
  • Zhu et al. (2017) Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai. 2017. What to Do Next: Modeling User Behaviors by Time-LSTM.. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Vol. 17. 3602–3608.
  • Zuo et al. (2018) Yuan Zuo, Guannan Liu, Hao Lin, Jia Guo, Xiaoqian Hu, and Junjie Wu. 2018. Embedding Temporal Network via Neighborhood Formation. In Proceedings of KDD. 2857–2866.