Evolutionary Preference Learning via Graph Nested GRU ODE
for Session-based Recommendation

Jiayan Guo^∗ [email protected] Peking UniversityBeijingChina , Peiyan Zhang^∗ [email protected] HKUSTHong Kong , Chaozhuo Li [email protected] Microsoft Research AsiaBeijingChina , Xing Xie [email protected] Microsoft Research AsiaBeijingChina , Yan Zhang [email protected] Peking UniversityBeijingChina and Sunghun Kim [email protected] HKUSTHong Kong

(2022)

Abstract.

Session-based recommendation (SBR) aims to predict the user’s next action based on the ongoing sessions. Recently, there has been an increasing interest in modeling the user preference evolution to capture the fine-grained user interests. While latent user preferences behind the sessions drift continuously over time, most existing approaches still model the temporal session data in discrete state spaces, which are incapable of capturing the fine-grained preference evolution and result in sub-optimal solutions. To this end, we propose Graph Nested GRU ordinary differential equation (ODE), namely GNG-ODE, a novel continuum model that extends the idea of neural ODEs to continuous-time temporal session graphs. The proposed model preserves the continuous nature of dynamic user preferences, encoding both temporal and structural patterns of item transitions into continuous-time dynamic embeddings. As the existing ODE solvers do not consider graph structure change and thus cannot be directly applied to the dynamic graph, we propose a time alignment technique, called t-Alignment, to align the updating time steps of the temporal session graphs within a batch. Empirical results on three benchmark datasets show that GNG-ODE significantly outperforms other baselines.

Session-based Recommendation; Graph Neural ODE; Graph Neural Networks

^∗Both authors contribute equal to this work.

^†^†journalyear: 2022^†^†copyright: acmcopyright^†^†conference: Proceedings of the 31st ACM International Conference on Information and Knowledge Management; October 17–21, 2022; Atlanta, GA, USA^†^†booktitle: Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA^†^†price: 15.00^†^†doi: 10.1145/3511808.3557314^†^†isbn: 978-1-4503-9236-5/22/10^†^†ccs: Information systems Recommender systems

1. Introduction

Recommender systems can help provide users with personalized information according to their preferences reflected in the historical interactions (He et al., 2017), which are widely applied in e-commerce websites, web searches, and so forth (Zhang et al., 2019; Wang et al., 2021). However, in some scenarios where only the user’s recent interactions within a narrow time range are available, the general recommenders are not applicable since the collaborative signal is scarce, leading to the obscure of user preferences (Hidasi et al., 2016). Thus, session-based recommendation (SBR) is proposed to detect the user intent from the limited interactions in the current session and make recommendations, where the session is defined as the user’s actions within a period of time (Hidasi et al., 2016; Li et al., 2017a).

Refer to caption — Figure 1. Illustration of the complex structural and temporal patterns in the session. Arrows denote that nearby and distant transition dependencies co-exist. The color gradient represents the continuity change of user preferences as time progresses. Item clicks can be interpreted as the observations of latent continuous user preferences.

Most existing SBR methods focus on modeling sequential pattern among items of a session by using Recurrent Neural Networks (RNNs) (Hidasi et al., 2016; Li et al., 2017a; Wang et al., 2019) or Graph Neural Networks (GNNs) (Wu et al., 2019a; Qiu et al., 2019; Wang et al., 2020a; Guo et al., 2022). However, these works view a session as a short sequence and assume that the primary intention of the user in a session usually remains the same, and try to capture the user’s preference directly from the entire session. Consequently, they often ignore the fact that a user’s fine-grained preference can drift over time, even in a relatively short-term session. Although the temporal pattern is crucial in capturing the fine-grained user preference, research on utilizing temporal information in SBR is still in the early stage.

Fortunately, multiple lines of recent studies in SBR have aimed to embrace this challenge by incorporating additional temporal information. The first line of works (Pan et al., 2021; Zhou et al., 2021a) models the evolution of user preference in a discrete-time setting. They model a session as snapshots of a dynamic session graph sampled at fixed-length timestamps. Thus, these approaches cannot model the irregularity of time intervals, which is essential for analyzing complex dynamics of user preferences, e.g., when the dwelling time of a user on an item becomes shorter, the user’s interest in the item tends to decrease (Fan et al., 2021). Another line of works integrates the time dimension by considering timestamps information as a contextual dimension (Shen et al., 2021a; Zhou et al., 2021b). However, these methods generate discrete user preferences that ignore the time elapse effect. Consider a user who makes a purchase today and her preference representation is updated. The representation will stay the same regardless of when she returns (i.e., a day later, a month later). As a result, the same recommendations will be offered when she returns next time. However, user preferences may change over time (Cheng et al., 2017; Kumar et al., 2019). The time elapse effect on user preferences should be considered and thus the preference representation needs to be updated to the query time. In this paper, we argue that the user’s preference is a continuous concept evolving as time progresses. As shown in Figure 1, item interactions can be interpreted as the observations of the latent continuous user preference at a specific timestamp. By modeling the preference dynamics in the continuous-time setting, we no longer need the equal-length time slice segmentation of the whole timeline and manage to consider the time elapse effect to predict the future embedding trajectories of items as time progresses.

In particular, modeling the user preference in a fully continuous manner is challenging. As most neural networks are discrete, where the iterative update of hidden states between two layers is a discretization of a continuous transformation (Chen et al., 2018; Lu et al., 2018; Haber and Ruthotto, 2017), they cannot model user preference in the continuous-time setting. To handle this challenge, in this paper, we propose to utilize neural ODE to complete this task. Owing to its intrinsic continuous nature, neural ODE enables tracking of the evolution of the underlying system. It is expected to offer improved performance compared with using discrete methods to model a continuous dynamical system (Huang et al., 2020, 2021). However, directly applying neural ODE models in SBR is still inapplicable. As user-item interactions occur irregularly along time, the neural ODE model should be theoretically continuous to guarantee stability. Moreover, due to the dynamic nature of a temporal session graph, the updating time steps of the temporal session graph within a batch may not be consistent. Thus, the existing ODE solvers are inapplicable in the batch update process since the solvers could accept only a one-time step parameter for each calculation.

To address these issues, we propose a novel ODE-based model for modeling the dynamics of user preference along time in a fully continuous manner. Different from previous snapshots-based methods, given an ongoing session, we transform the session into a fully continuous temporal session graph without using snapshots, which builds the potential structural and temporal relations between items. Afterward, we employ Graph Gated Neural Network (GGNN) (Li et al., 2015; Wu et al., 2019a) to encode the item embeddings and transition patterns simultaneously to infer the latent initial states for all items. We further derive a Graph Nested GRU (Li et al., 2019; Skardinga et al., 2021; Seo et al., 2018; Pareja et al., 2020) inspired continuous-time Ordinary Differential Equation network (GNG-ODE) that propagates the latent states of the items between different time steps as time progresses. Different from most existing temporal SBR models that learn the dynamics by employing recurrent model structures with discrete depth, our model coincides the time domain with the depth of a neural network and takes advantage of ODEs to steer the latent user/item features between two timestamps smoothly. As the existing ODE solvers are inapplicable in SBR where the graph is dynamic, we further propose a time alignment algorithm, called t-Alignment, to adapt the existing ODE solvers onto our dynamic graph setting by aligning the updating time steps of the dynamic session graphs within a batch. We conduct extensive experiments on three real-world public benchmarks. Comprehensive experimental results verify that GNG-ODE significantly outperforms the competitive baselines.

Our primary contributions can be summarized as follows:

•

We propose a novel GNG-ODE to effectively consider the intrinsic complex nature of user-item interactions by modeling a session as a continuous-time temporal session graph. In this carefully designed graph structure, the temporal information of item transitions is preserved in a fully continuous manner. To the best of our knowledge, this is the first work to model the continuous evolution of user preference using neural ODE in SBR.
•

We show that GNG-ODE is theoretically well-posed, i.e., its solution always exists and is unique (see Section 4.2). Besides, it enjoys several good numerical properties. We also propose the t-Alignment algorithm to make existing ODE solvers applicable to dynamic environments in SBR.
•

Extensive experiments on three public datasets demonstrate the effectiveness of our GNG-ODE model. Compared with all competitive baselines, the improvements brought by modeling the continuous evolution of user preferences are at most 6.05%, according to the ranking metric on average.

2. Related Works

Session-based Recommendation. Following the development of deep learning, many neural network based approaches have been proposed for SBR. Hidasi et al. (Hidasi et al., 2016) first propose to leverage the recurrent neural networks (RNNs) to model users’ preferences. Afterward, attention-based mechanisms are incorporated into the system and significantly boost performance. NARM (Li et al., 2017b) utilizes attention on RNN models to enhance the captured features while STAMP (Liu et al., 2018a) captures long and short-term preferences relying on a simple attentive model. Convolution Neural Networks (CNNs) are also leveraged. Tang et al. (Tang and Wang, 2018) try to embed item session as a matrix and perform convolution on the matrix to get the representation.

To better model the transitions within the sessions, most recent developments focus on leveraging Graph Neural Networks (GNNs) to extract the relationship between sessions. Wu et al. (Wu et al., 2019b) first propose to capture the complex transitions with graph structure. Afterward, Pan et al. (Pan et al., 2020b) try to avoid overfitting through highway networks (Srivastava et al., 2015). Position information (Wang et al., 2020c), target information (Yu et al., 2020), and global context (Wang et al., 2020a) are also taken into consideration to further improve the performance.

Temporal Information in SBR. Temporal information plays a vital role in user preference modeling. Although there are a few works in other recommendation areas (Li et al., 2020; Vassøy et al., 2019; Bai et al., 2019; Li et al., 2020; Kumar et al., 2019; Fan et al., 2021; Chen et al., 2021b) utilize the temporal information to facilitate recommendation, the temporal-related method has not been fully explored in SBR. Some prior efforts reduce the temporal information into the relative order/position information. For example, Yu et al. (Yu et al., 2016) use RNN to capture the sequential signal, which reveals the user’s future dynamic preference in the next-basket recommendation. Pan et al. (Pan et al., 2021) further model the evolution of item transitions by constructing a sequence of dynamic graph snapshots which contains the graphs transformed from the session at different timestamps. Zhou et al. (Shen et al., 2021a; Zhou et al., 2021b) integrate the time dimension by considering timestamps information as a contextual dimension. The observations of user clicks are put into bins of fixed duration, and the latent dynamics are discretized in the same way. To characterize the dynamics from both the user side and the item side, Zeyuan et al. (Chen et al., 2021b) propose to build a global user-item graph for each time slice and exploit time-sliced graph neural networks to learn user/item embeddings.

As outlined above, we find previous works on SBR have some limitations. First, temporal information is rarely or crudely exploited in these works. Second, existing methods model structural and temporal patterns separately without considering their interactions, which restricts the capacity of the models. Third, some methods rely on the segmentation of the whole timeline into a specified number of equal-length time slices, resulting in the temporal information loss problem (Chen et al., 2021b; Zhou et al., 2021a). Finally, these methods generate discrete user preference representations that ignore the time elapse effect on user preferences. The representation will stay the same regardless of when the user returns to the platform, i.e., a day later, a week later, or even one month later, thus limiting the performance.

Neural Ordinary Differential Equation. Neural ODE is a continuous approach to model the discrete sequence governed by a time-dependent function $\mathcal{T}\rightarrow\mathbb{R^{d}}$ of a continuous time variable $t$ .

(1)

\frac{d\bm{h}(t)}{dt}=\bm{f}_{\theta}(\bm{h}(t),t)

where $\theta$ is the parameter of the differential function. Eq. (1) drives the system state forward in infinite steps over time. The differential function $f:\mathbb{R}^{d}\times\mathbb{R}\rightarrow\mathbb{R}^{d}$ induces a differential ﬁeld that covers the input space. Given initial state $h(t_{0})$ , we can derive the state of time $T$ by a black-box differential equation solver, which evaluates the hidden unit dynamics $f$ wherever necessary to determine the solution with the desired accuracy.

(2)

\bm{h}(T)=\bm{h}(t_{0})+\int_{t=t_{0}}^{T}\bm{f}_{\theta}(\bm{h}(t),t)dt

There is a rich body of literature on Neural ODE recently. Ricky TQ et al. (Chen et al., 2018) first propose the Neural ODE framework and develop an adjoint method to solve the ode function, which is memory efficient. To improve the expression ability of Neural ODE, Junteng et al. (Jia and Benson, 2019) provide a data-driven approach to learn continuous and discrete dynamic behaviors, Emilien et al. (Dupont et al., 2019) add extra dimensions in hidden space and Cagatay et al. (Yildiz et al., 2019) propose second-order Neural ODE. To enable better learning on irregular sampled sequential data, Yulia et al. (Rubanova et al., 2019) propose to combine RNN and Neural ODE and Edward et al. (De Brouwer et al., 2019) introduce a RNN-based ODE that uses GRU-Bayes to update the hidden state. Michael et al. (Poli et al., 2019) first introduce graph Neural ODE that models the diffusion process on graphs and achieves better results than discrete versions on various tasks. Louis-Pasca et al. (Xhonneux et al., 2020) derive the analytic solution to the graph Neural ODE that avoids using of ODE solvers. Chengxi Zang and Fei Wan (Zang and Wang, 2019) introduce the graph Neural ODE on dynamic graphs while do not consider the change of graph structure. Ziwei et al. (Choi et al., 2021) first utilize Neural ODE to learn the optimal layer combination of collaborative filtering model rather than relying on manually designed architecture. However, how to exploit Neural ODE to learn the temporal dynamics in SBR still remains unexplored.

3. Problem Definition

Assume the item set is $V=\{v_{1},v_{2},...,v_{|V|}\}$ , where $v_{i}$ indicates item $i$ and $|V|$ is the number of all items. Given an ongoing session denoted as $S=\{v_{1},v_{2},...,v_{n}\}$ , the aim of a session-based recommendation is to predict the items that the user will interact with at the next timestamp, that is, $v_{n+1}$ . Specifically, the session-based recommender system takes the session $S$ as input and outputs the prediction scores on all candidate items, then the items ranked at the top $K$ positions will be recommended to the user.

4. Approach

In this section, we describe our proposed Graph Nested GRU Ordinary Differential Equation for Session-based Recommendation (GNG-ODE) in detail, which is constituted of three main components, that is, (i) the temporal session graph construction, (ii) the dynamic item representation learning and (iii) the user preference generation and prediction. The framework of the proposed GNG-ODE is schematically shown in Figure 3. Given an ongoing session, we first construct a temporal session graph which contains the graphs transformed from the sessions at different timestamps. Then, we learn the dynamic item representations through GNG-ODE. Finally, we generate the hybrid user preference, which is utilized to make predictions on candidate items.

4.1. Temporal Session Graph Construction

Given a session $S=\{v_{1},v_{2},...,v_{n}\}$ , we first generate the updating time point of the session and its corresponding target items at different timestamps as $(\widetilde{S}_{1},v_{2}),(\widetilde{S}_{2},v_{3}),...,(\widetilde{S}_{n-1},v_{n})$ , where $\widetilde{S}_{t}=\{v_{1},v_{2},...,v_{t}\}$ . This is similar to the data augmentation method widely applied in SBR (Wu et al., 2019a; Pan et al., 2020a). However, different from the existing methods which shuffle the augmented samples and utilize them for training individually, in GNG-ODE we capture the evolution of the session graphs over multi-time steps. Specifically, as shown in Figure 2(b), we construct a continuous time dynamic session graph denoted as $\mathcal{G}=(\mathcal{V},\mathcal{E},\tau)$ , where $\mathcal{V}=\{1,...,N\}$ is the set of all items appeared in the session. $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the multiset of all edges in the session graph representing the transition between two items. A time function, $\tau:\mathcal{E}\rightarrow\mathbb{R}^{+}$ , maps each transition edge into time space. Under such temporal session graphs, we can learn different item embeddings of the items for various timestamps, which helps generate dynamic item representations for accurate user preference generation, as stated in (Xu et al., 2019b).

4.2. Dynamic Item Representation Learning

User click history can be seen as irregularly-sampled data that represents the observations of latent user interests. Typically, observations of user clicks are put into bins of fixed duration (Shen et al., 2021a; Zhou et al., 2021b), and the latent dynamics are discretized in the same way. This leads to difficulties with missing data (e.g., when there is no clicks at some time points) and ill-defined latent variables (Chen et al., 2018). To handle these challenges, our representation learning part consists of three components: (i) A GNN-inspired encoder that transforms the transition structure of observed items into initial hidden states. (ii) A hidden trajectory prediction model characterized by the GNG-ODE function and t-Alignment technique to learn the latent dynamics of the transition evolution. (iii) An attention-based decoder that generates the distribution of the next item that the user may click.

4.2.1. Initial Latent State Encoder

Items in a session can be regarded as observations of the latent user preference. To capture the dynamics of latent user preference, we first transform the raw embeddings of items in a session and their static transitions into the initial latent representations by GGNN. GGNN is widely used in session-based recommendation tasks (Wu et al., 2019a; Chen and Wong, 2020). Given a static session graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ , the GGNN first aggregates neighborhood information to form a neighborhood representation for a node, then applies GRU to combine the original node representation and the neighborhood representations:

(3)

\vspace{-0.3cm}\begin{split}\overline{\bm{h}}_{u}^{(l)}&=\sum_{v\in\mathcal{N}_{u}}\bm{w}_{uv}\bm{h}_{v}^{(l)}\\ \bm{h}^{(l+1)}_{u}&=\text{GRU}\left(\bm{h}_{u}^{(l)},\overline{\bm{h}}^{(l)}_{u}\right)\end{split}\vspace{-0.3cm}

where ${\bm{h}}_{u}^{(l)}$ is the hidden representation of item $u$ in layer $l$ , while $\overline{\bm{h}}_{u}^{(l)}$ denotes the neighborhood representation of item $u$ in layer $l$ . $w_{uv}$ is the edge weight of edge $e_{uv}$ . Details of constructing the static session graph can be found in (Wang et al., 2021). By applying GGNN we can infer a initial states jointly considering both item attributes and transition patterns, thus benefits the preference modeling capacity. The output of the last layer is then normalized by $l_{2}$ -norm to make the value between $[-1,1]$ to ensure the stability of ODE solvers. We denote it as $\bm{h}_{u}$ for simplicity.

4.2.2. Graph Nested GRU ODE

After computing the latent initial states for items, we now define the Graph Nested continuous-time GRU ODE (GNG-ODE) function that drives the system to move forward. Graph Nested GRU (GNG) is widely applied in dynamic graph learning settings (Li et al., 2019; Skardinga et al., 2021; Seo et al., 2018; Pareja et al., 2020), by

(4)

\vspace{-0.1cm}\begin{split}\bm{r}_{u}^{t}&=\sigma\left(W^{r}_{\mathcal{G}}\bm{x}^{t}_{u}+U^{r}_{\mathcal{G}}\bm{h}_{u}^{t-1}+\bm{b}_{r}\right)\\ \bm{z}_{u}^{t}&=\sigma\left(W^{z}_{\mathcal{G}}\bm{x}^{t}_{u}+U^{z}_{\mathcal{G}}\bm{h}_{u}^{t-1}+\bm{b}_{z}\right)\\ \bm{g}_{u}^{t}&=\tanh\left(W^{h}_{\mathcal{G}}\bm{x}^{t}_{u}+U^{h}_{\mathcal{G}}(\bm{r}_{u}^{t}\odot\bm{h}_{u}^{t-1})+\bm{b}_{h}\right)\\ \bm{h}_{u}^{t}&=\bm{z}_{u}^{t}\odot\bm{h}_{u}^{t-1}+(1-\bm{z}_{u}^{t})\odot{\bm{g}_{u}^{t}}\end{split}

where $\bm{r}_{u}^{t}$ and $\bm{z}_{u}^{t}$ are the reset gate and select gate at time step $t$ respectively. Let $\bm{x}_{u}^{t}$ and $\bm{h}_{u}^{t}$ denote the input embeddings and hidden state of item $u$ of time step $t$ , respectively. $\bm{b}_{r},\bm{b}_{z},\bm{b}_{h}\in\mathbb{R}^{d}$ are the parameters and $W_{\mathcal{G}},U_{\mathcal{G}}$ denote one layer of graph convolutional networks (Kipf and Welling, 2016) to aggregate neighborhood information of item $u$ . GNG models the structural and temporal dependency and performs well on discrete-time dynamic graphs. Here we show how to derive a continuous-time GNG-ODE. Specifically, we firstly show that the form of GNG can be written as a difference equation. Given the standard update for the hidden state $\bm{h}_{u}^{t}$ of the GNG in Eq. (4):

(5)

\bm{h}_{u}^{t}=\bm{z}_{u}^{t}\odot\bm{h}_{u}^{t-\Delta{t}}+\left(1-\bm{z}_{u}^{t}\right)\odot{\bm{g}_{u}^{t}}

We can obtain a difference equation by subtracting $\bm{h}_{u}^{t-\Delta t}$ from this state update equation and factoring out $(1-\bm{z}_{u}^{t})$ :

(6)

\begin{split}\Delta\bm{h}_{u}^{t}&=\bm{h}_{u}^{t}-\bm{h}_{u}^{t-\Delta t}\\ &=\left(1-\bm{z}_{u}^{t}\right)\odot\left(\bm{g}_{u}^{t}-\bm{h}_{u}^{t-\Delta t}\right)\end{split}

This difference equation naturally leads to the following ODE for $\bm{h}(t)$ when $\Delta t\rightarrow 0$ :

(7)

\begin{split}\frac{d\bm{h}_{u}(t)}{dt}=\left(1-\bm{z}_{u}(t)\right)\odot\left(\bm{g}_{u}(t)-\bm{h}_{u}(t)\right)\end{split}

with $\bm{r}_{u}(t)$ , $\bm{z}_{u}(t)$ , $\bm{g}_{u}(t)$ the following forms:

(8)

\centering\begin{split}\bm{r}_{u}(t)&=\sigma\left(W_{\mathcal{G}}^{r}\bm{x}_{u}(t)+U_{\mathcal{G}}^{r}\bm{h}_{u}(t)+\bm{b}_{r}\right)\\ \bm{z}_{u}(t)&=\sigma\left(W_{\mathcal{G}}^{z}\bm{x}_{u}(t)+U_{\mathcal{G}}^{z}\bm{h}_{u}(t)+\bm{b}_{z}\right)\\ \bm{g}_{u}(t)&=\text{tanh}\left(W_{\mathcal{G}}^{h}\bm{x}_{u}(t)+U_{\mathcal{G}}^{h}\left(\bm{r}_{u}(t)\odot\bm{h}_{u}(t)\right)+\bm{b}_{h}\right)\end{split}\@add@centering

The form of derived GNG-ODE is in line with GRU-ODE family. As time intervals between user clicks are irregular, the ODE function should theoretically ensure numerical stability. Inspired by (De Brouwer et al., 2019), we have the following corollaries.

Corollary 1. $\bm{h}_{u}(t)$ is bounded by $[-1,1]$ .

Proof. This bound comes from the negative feedback term in Eq. (7), which stabilizes the resulting system. In detail, as we use $L_{2}$ -norm to normalize the initial latent state, the $j$ -th dimension of the starting state $\bm{h}_{u}(0)$ is within $[-1,1]$ , then $\bm{h}(t)_{j}$ will always stay within [-1, 1] because

(9)

\frac{d\bm{h}_{u}(t)_{j}}{dt}|_{t:\bm{h}_{u}(t)_{j}=1}\leq 0\ \ \text{and}\ \ \frac{d\bm{h}_{u}(t)_{j}}{dt}|_{t:\bm{h}_{u}(t)_{j}=-1}\geq 0

This can be derived from the ranges of $\bm{z}$ and $\bm{g}$ in Eq.4. Moreover, when $\bm{h}_{u}(0)$ start outside of $[-1,1]$ , the negative feedback will quickly push $\bm{h}_{u}(t)$ into this region, making the system also robust to numerical errors.

Corollary 2. GNG-ODE is Lipschitz continuous with constant $K=2$ .

Proof. As $\bm{h}$ is differentiable and continous on $t$ , based on mean value theorem (Flett, 1958) that for any $t_{a},t_{b}$ , there exists $t^{*}\in(t_{a},t_{b})$ such that

(10)

\bm{h}(t_{a})-\bm{h}(t_{b})=\frac{d\bm{h}_{u}(t^{*})}{dt}(t_{a}-t_{b})

Taking the euclidean norm of the previous expression, we ﬁnd

(11)

||\bm{h}_{u}(t_{a})-\bm{h}_{u}(t_{b})||=||\frac{d\bm{h}_{u}(t^{*})}{dt}(t_{a}-t_{b})||

Furthermore, we have shown that $\bm{h}_{u}(t)$ is bounded on $[-1,1]$ . Hence, because of the bounded functions appearing in the ODE (sigmoids and hyperbolic tangents), the derivative of $\bm{h}_{u}(t)$ is itself bounded by $[-2,2]$ . We conclude that $\bm{h}_{u}(t)$ is Lipschitz continuous with constant $K=2$ .

Based on the above corollaries, our GNG-ODE enjoys the following properties:

Continuity. It means that our training procedure is further tractable. Specifically, The Cauchy–Kowalevski theorem (Folland, 2020) states that, given $\bm{f}=\frac{d\bm{h}_{u}(t)}{dt}$ , there exists a unique solution of $\bm{h}$ if $\bm{f}$ is analytic (or locally Lipschitz continuous), i.e., the ODE problem is well-posed if $f$ is analytic. In our case, as the GNG-ODE is Lipschitz continuous with constant $K=2$ , there will be only a unique optimal ODE for $\bm{h}_{u}(t)$ . Owing to the uniqueness of the solution, we could find a good solution for GNG-ODE function. Our method becomes fully continuous that can derive item representations at any given timestamps and any time granularity. In this way, we further avoid generating discrete user preference representations and manage to model the time elapse effect to predict the future embedding trajectories of items as time progresses.

Robustness. The continuous nature of our model allows it to track the evolution of the underlying system from irregular observations, and no longer need the equal-length slice segmentation of the whole timeline, which empowers our method to perceive more fine-grained temporal information compared with previous methods.

We can then apply various ODE solvers to integrate the ODE function in Eq. (7). ODE solvers discretize time variable $t$ and convert an integral into many steps of additions. Widely used solvers are fix-step solvers like explicit Euler and fourth-order Runge–Kutta (RK4) method or adaptive step solvers like Dopri5. Then the item representation at time $T$ can be inferred by:

(12)

\bm{h}_{u}(T)=\bm{h}_{u}(t_{0})+\int_{t=t_{0}}^{T}\frac{d\bm{h}_{u}(t)}{dt}

where $\bm{h}_{u}(t_{0})$ is the initial hidden state of node $u$ derived by the encoder in Section 3.2.1.

4.2.3. t-Alignment

As the step size of ODE solvers are not necessarily equal to the time interval of two updates of a temporal session graph, ODE solvers can not be directly applied to the dynamic graph setting. To solve this problem, we propose t-Alignment, a technique that updates the graph structure in time when solving the ODE function. An illustration of t-Alignment is shown in Figure 4. In specific, we assign each edge a timestamp $t$ to represent the time the edge appears. Given the initial time point $t_{0}$ , the number of the integral time point $k$ , and the step size of ODE solver $\Delta t$ . At each integral time point, $t_{0}+k\times\Delta t$ , we check all the edges and preserve the edges with timestamp $t$ where $t\leq t_{0}+k\times\Delta t$ , to form the current session graph, by

(13)

\mathcal{G}_{t_{0}+k\times\Delta t}=\left(\mathcal{V}_{t_{0}:t_{0}+k\times\Delta t},\mathcal{E}_{t_{0}:t_{0}+k\times\Delta t}\right)

Where $\mathcal{G}_{t_{0}+k\times\Delta t}$ is the graph at timestamp $t_{0}+k\times\Delta t$ , $\mathcal{V}_{t_{0}:t_{0}+k\times\Delta t}$ and $\mathcal{V}_{t_{0}:t_{0}+k\times\Delta t}$ are item set and edge set that exist between $t_{0}$ and $t_{0}+k\times\Delta t$ respectively. Then the GNG-ODE function simply takes $G_{t_{0}+k\times\Delta t}$ as input and infers $\frac{d\bm{h}(t_{0}+k\times\Delta t)}{dt}$ , the hidden states of all items. In such a setting, we only need to solve the ODE function once to get the hidden state of the end of the session. Besides, we do not need to store all snapshots of a temporal session graph or interrupt the integral process of ODE solvers to update the graph.

4.3. User Preference Generation and Prediction

After obtaining the item representations right after the last update time of a temporal session graph $G_{t_{n}}$ , which are denoted as $\bm{h}_{v_{1}}(t_{n}^{+}),...\bm{h}_{v_{n}}(t_{n}^{+})$ , we generate the hybrid preference representation to represent the current user interests. Specifically, we combine the recent interest and the long-term preference in the ongoing session to obtain the user’s preference. We use the vector of the last item as the recent interest, that is, $\hat{\bm{z}}_{r}=\bm{h}_{v_{n}}(t_{n}^{+})$ , where $\hat{\bm{z}}_{r}\in\mathbb{R}^{d}$ .

For long-term interest, we consider all items in the session and utilize an attention mechanism to determine the weights for combining the historical item vectors, as follows:

(14)

\begin{split}\hat{\bm{z}}_{l}&=\Sigma_{i=1}^{n}\bm{\gamma}_{i}\bm{h}_{v_{i}}(t_{n}^{+})\\ \bm{\gamma}_{i}&=\text{Softmax}(\bm{a}_{i})\\ \bm{a}_{i}&=W_{1}\sigma\left(W_{2}\bm{h}_{v_{i}}(t_{n}^{+})+W_{3}\hat{\bm{z}}_{r}+\bm{b}\right)\end{split}

where $\hat{\bm{z}}_{l}\in\mathbb{R}^{d}$ is the generated long-term preference at the $t^{*}$ -th timestamp, $\bm{a}_{i}$ and $\bm{\gamma}_{i}$ are the importance scores of item $\bm{v}_{i}$ before and after normalization, respectively, and $W_{1}\in\mathbb{R}^{d},W_{2},W_{3}\in\mathbb{R}^{d\times d}$ are learnable parameters. $\sigma$ is the sigmoid function.

Then, we generate the dynamic hybrid user preference by taking into account the long-term and recent interests, which can be denoted as:

(15)

\begin{split}\hat{\bm{z}}_{h}=W_{4}\left[\hat{\bm{z}}_{l};\hat{\bm{z}}_{r}\right],\end{split}

where $\hat{\bm{z}}_{h}\in\mathbb{R}^{d}$ is the final generated user preference at the timestamp $t_{n}$ , and $W_{4}\in\mathbb{R}^{d\times 2d}$ is the learnable parameters.

After that, we can make predictions by computing a probability distribution of the candidate items to be clicked at the next timestamp through the multiplication operation between the user preference and the embeddings of each item in $V$ :

(16)

\begin{split}\tilde{\bm{y}}_{i}=||\hat{\bm{z}}_{h}||^{T}||\bm{x}_{i}||,\end{split}

where $||\cdot||$ denotes the $L_{2}$ normalization operation.

Finally, we compute the normalized score for each candidate item, as follows:

(17)

\hat{\bm{y}}_{i}=\text{Softmax}(\tilde{\bm{y}}_{i})

where $\hat{\textbf{y}}=\{\hat{\bm{y}}_{1},\hat{\bm{y}}_{2},...,\hat{\bm{y}}_{|V|}\}$ are the normalized prediction scores vector for all candidate items.

4.4. Training

After obtaining the preferential scores, we adopt cross-entropy as the optimization objective to learn the parameters following (Chen and Wong, 2020; Gupta et al., 2019; Liu et al., 2018b; Xia et al., 2021). The loss function is:

(18)

L(\hat{\textbf{y}})=-\sum_{i=1}^{|I|}\bm{y}_{i}\log(\hat{\bm{y}}_{i})+(1-\bm{y}_{i})\log(1-\hat{\bm{y}}_{i})+\lambda||\bm{\Theta}||_{2}^{2}

where $\bm{y}_{i}\in\textbf{y}$ reflects the appearance of an item in the one-hot encoding vector of the ground truth, i.e., $y_{i}=1$ if the $i$ -th item is the target item of the given session; otherwise, $\bm{y}_{i}=0$ . $\bm{\Theta}$ is the model parameter. $\lambda$ is a scalar to control the influence of $L_{2}$ regularization. We use scaled softmax at the normalization stage Eq. (17) to prevent over smoothing of relevance scores.

4.5. Computational Complexity Analysis

Here we analyze the complexity of the initial latent state encoder and the dynamic representation learning module. Given the item set in the session as $V$ , the transitions (edges) in the session as $E$ . Then the time complexity of initial latent state encoder GGNN is $O((|E|+|V|)l)$ , where $l$ is the number of layers of GGNN. The time complexity of the dynamic representation learning module is $O((|E|+|V|)T/\Delta t)$ where $T$ is the time duration of the whole session and $\Delta t$ is the average integration step size of ODE solvers. Then the overall time complexity of the two modules is $O((l+T/\Delta t)(|E|+|V|))$ , which is a linear combination of $|E|$ and $|V|$ . As $l$ and $T/\Delta t$ are relatively small, we find that the total time complexity increases but is still acceptable.

5. Experiment

In this section, we have conducted extensive experiments, and analyzed the performance of the proposed GNG-ODE method by addressing the following key research questions as follows:

•

RQ1: Can our proposed GNG-ODE outperform the state-of-the-art baselines for session-based recommendation?
•

RQ2: How does GNG-ODE perform with different encoders?
•

RQ3: How does GNG-ODE perform comparing with other Neural ODE model? Is t-Alignment useful to help GNG-ODE jointly capture structural and temporal pattern?
•

RQ4: How well does GNG-ODE perform with different ODE solvers from the effectiveness perspective?
•

RQ5: How is the scalability of GNG-ODE?
•

RQ6: How do different hyper-parameters affect GNG-ODE?

Table 1. Statistics of datasets.

	Gowalla	Tmall	Nowplaying
#clicks	1,122,788	818,479	1,367,963
#train sessions	675,561	351,268	825,304
#test sessions	155,332	25,898	89,824
#items	29,510	40,727	60,416
Average length	4.32	6.69	7.42
Average Interval	11.07h	1.49s	4.36h

Table 2. Results(%) of main experiments. The numbers of HR@10, HR@20, MRR@10 and MRR@20 are reported.

\ast

denotes a significant improvement of GNG-ODE over the best baseline using a paired

t

-test (

p

¡ 0.01).

Model	Gowalla				Tmall				Nowplaying
Model	HR@10	HR@20	MRR@10	MRR@20	HR@10	HR@20	MRR@10	MRR@20	HR@10	HR@20	MRR@10	MRR@20
NARM	$40.71$	$49.84$	$22.19$	$23.63$	$21.97$	$27.14$	$10.50$	$12.89$	$13.11$	$17.36$	$6.03$	$6.49$
SR-GNN	$42.18$	$50.21$	$23.63$	$24.08$	$24.79$	$29.39$	$13.61$	$13.93$	$13.73$	$18.46$	$6.92$	$7.25$
NISER+	$44.92$	$53.58$	$25.03$	$25.43$	$29.81$	$36.32$	$15.27$	$15.72$	$16.37$	$22.35$	$8.05$	$8.52$
SGNN-HN	$42.19$	$50.92$	$24.10$	$25.26$	$22.09$	$29.20$	$11.85$	$15.40$	$13.22$	$17.52$	$7.15$	$7.41$
LESSR	$42.87$	$51.04$	$23.90$	$24.47$	$27.93$	$32.75$	$14.79$	$14.91$	$14.73$	$19.54$	$7.66$	$7.99$
GCE-GNN	$45.38$	$53.77$	$25.11$	$\underline{25.69}$	$28.01$	$33.42$	$15.08$	$15.42$	$16.94$	$22.37$	$8.03$	$8.40$
DAT-MDI	$\underline{45.46}$	$53.98$	$\underline{25.14}$	$25.65$	$27.98$	$33.23$	$15.21$	$15.64$	$\underline{16.98}$	$\underline{22.45}$	$8.12$	$8.67$
TiSASRec	$44.67$	$53.39$	$24.06$	$24.46$	$28.23$	$35.28$	$15.67$	16.02	$16.56$	$21.28$	$7.14$	$7.69$
TGSRec	$44.98$	$53.89$	$24.89$	$25.12$	$29.54$	$36.43$	$15.99$	$16.37$	$16.67$	$21.56$	$7.34$	$7.87$
TMI-GNN	$45.43$	$\underline{54.03}$	$25.12$	$25.47$	$\underline{30.06}$	$\underline{36.87}$	$\underline{16.94}$	$\underline{17.19}$	$16.89$	$22.28$	$\underline{8.39}$	$\underline{8.89}$
GNG-ODE	$\textbf{46.05}^{\ast}$	$\textbf{54.58}^{\ast}$	$\textbf{26.32}^{\ast}$	$\textbf{26.91}^{\ast}$	$\textbf{31.11}^{\ast}$	$\textbf{37.66}^{\ast}$	$\textbf{17.90}^{\ast}$	$\textbf{18.23}^{\ast}$	$\textbf{17.31}^{\ast}$	$\textbf{22.83}^{\ast}$	$\textbf{8.85}^{\ast}$	$\textbf{9.23}^{\ast}$
Improv.	$1.30\%$	$1.02\%$	$4.69\%$	$4.75\%$	$3.49\%$	$2.14\%$	$5.67\%$	$6.05\%$	$1.94\%$	$1.69\%$	$5.48\%$	$3.82\%$

5.1. Datasets and Preprocessing

We evaluate GNG-ODE and the baselines on the following three publicly available benchmark datasets, which are commonly used in the literature of session-based recommendation (Li et al., 2017b; Qiu et al., 2019; Ren et al., 2019; Wu et al., 2019a; Yuan et al., 2019; Chen and Wong, 2020; Xu et al., 2019a; Pan et al., 2020b; Gupta et al., 2019):

•

Gowalla¹¹1https://snap.stanford.edu/data/loc-gowalla.html is a dataset that contains users’ check-in information for point-of-interest recommendation. Following (Guo et al., 2019; Tang and Wang, 2018; Chen and Wong, 2020), we keep the 30,000 most popular locations and set the splitting interval to 1 day, and consider the last 20% of sessions for testing.
•

Tmall²²2http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html is a user-purchase data (only purchase records are utilized) obtained from Tmall platform. We also use the last 20% of sessions as the test sets.
•

Nowplaying³³3https://dbis.uibk.ac.at/node/263#nowplaying (Poddar et al., 2018) is a comprehensive implicit feedback dataset consisting of user-song interactions crawled from Twitter. For each music, we randomly select 20% of users who have played the music as the test sets, and the remaining users for training.

Following (Li et al., 2017b; Liu et al., 2018b; Qiu et al., 2019; Ren et al., 2019; Wu et al., 2019a; Chen and Wong, 2020; Zhang et al., 2022), we filter the sessions containing merely an item and the items appearing less than five times for each dataset. We further make data augmentation that has been widely applied in (Li et al., 2017b; Liu et al., 2018b; Wu et al., 2019a; Chen and Wong, 2020) after filtering short sessions and infrequent items. The statistics of the datasets are shown in Table 1.

5.2. Baseline Models

We consider the following baselines to evaluate the performance of the proposed model.

•

NARM⁴⁴4https://github.com/lijingsdu/sessionRec_NARM (Li et al., 2017b) is a RNN-based method for session-based recommendation. It utilizes RNNs with attention to model user action sequences.
•

SR-GNN⁵⁵5https://github.com/CRIPAC-DIG/SR-GNN (Wu et al., 2019a) is a GNN-based method for session-based recommendation. It applies GNNs to extract item features and obtains session representaion through traditional attention network (Li et al., 2015).
•

NISER+⁶⁶6https://github.com/johnny12150/NISER (Gupta et al., 2019) employs normalized item and session embedding based on graph neural network to alleviate popularity bias problem in session-based recommendation.
•

SGNN-HN (Pan et al., 2020b) solves the long-range information propagation problem by adding a star node to take the non-adjacent items into consideration with gated graph neural networks.
•

LESSR⁷⁷7https://github.com/twchen/lessr (Chen and Wong, 2020) transforms the sessions into directed multigraphs and propagates information along shortcut connections to solve the lossy session encoding problem.
•

GCE-GNN⁸⁸8https://github.com/CCIIPLab/GCE-GNN (Wang et al., 2020b) transforms the sessions into global graph and local graphs to enable cross session learning.
•

DAT-MDI (Chen et al., 2021a) combines the GNN and GRU to learn the cross session enhanced session representation.
•

TiSASRec (Li et al., 2020) introduces to use relationship matrix to model the temporal relations for items in the sequence.
•

TGSRec (Fan et al., 2021) uses a user collaboratively continuous-time transformer for sequential recommendation.
•

TMI-GNN (Shen et al., 2021b) uses temporal information to guide the multi-interest network to capture more accurate user interests.

5.3. Implementation

We apply grid search to find the optimal hyper-parameters for each model. We use the last 20% of the training set as the validation set. The ranges of other hyper-parameters are $\{64,128,256,512\}$ for hidden dimensionality $d$ and $1e{-3}$ chosen for learning rate $\eta$ . The weight decay rate $\lambda$ is set to $1e{-4}$ . We use the Adam optimizer to train the models. The batch size is set to 512. We run all models five times with different random seeds and report the average. We use the same evaluation metrics HR@K (Hit Rate) and MRR@K (Mean Reciprocal Rank) following previous studies (Li et al., 2017b; Qiu et al., 2019; Ren et al., 2019; Wu et al., 2019a; Chen and Wong, 2020; Xu et al., 2019a; Pan et al., 2020b; Gupta et al., 2019). The implementation of our model can be found at https://github.com/SpaceLearner/GNG-ODE.

5.4. Overall Comparison (RQ1)

To demonstrate the overall performance of the proposed model, we compare it with the state-of-the-art recommendation methods. They include the static models NARM, SR-GNN, NISER+, SGNN-HN, LESSR, GCE-GNN and DAT-MDI, and temporal models like TiSASRec, TGSRec and TMI-GNN. The experimental results of all compared methods are shown in Table 2, from which we have the following observations.

Compared with RNN, GNN-based models have a stronger ability to explore complex graph-structured data. Moreover, LESSR works better than SR-GNN and SGNN-HN, demonstrating that handling the lossy session encoding problem can further help boost the recommendation performance of GNN models. In addition, through exploiting the global-level transitions between items, GCE-GNN outperforms many baselines on Gowalla that achieves the best performance in the static baselines. However, the performance of GCE-GNN on two other datasets is not satisfactory compared with NISER+, indicating that the long-tail problem and the overfitting problem are more prevalent outside the check-in scenario dataset (Gowalla). DAT-MDI performs the best on Gowalla dataset, which verifies the importance of capturing the complex structural pattern across sessions. We also observe that temporal information helps capture user preference, as temporal baselines all achieve comparable performance. Among temporal baselines, TMI-GNN performs the best, indicating that decomposing temporal information into different interests captures more fine-grained user preference.

Next, we zoom in on the performance of our proposed GNG-ODE. First, we can observe that GNG-ODE can achieve state-of-the-art performance for all cases on three datasets. In particular, GNG-ODE outperforms the existing temporal baselines (i.e., TiSASRec, TGSRec and TMI-GNN). We attribute the improvements of GNG-ODE against the baselines to two factors: One is that GNG-ODE can take the continuous evolution of the session graph structures into consideration, and the other one is that GNG-ODE solves the continuous concept modeling problem using the continuous ODE function. In addition, the improvements of GNG-ODE over the best baselines (i.e., DAT-MDI, GCE-GNN and TMI-GNN) in terms of HR@20 and MRR@20 are 2.14% and 6.05% on Tmall, respectively, and the corresponding improvements are 1.69% and 3.82% on the Nowplaying dataset. We can observe that on all datasets, GNG-ODE brings more performance gain over original models when K in evaluation when HR@K is smaller. A small value of K means the target items stay in the top positions of the recommendation list. Due to the position bias (Chen et al., 2020) in recommendation that users tend to pay more attention to the items in a higher position of the recommendation list, our framework can help original models to produce more accurate and user-friendly recommendations.

5.5. Impact of Encoders for Initial State Inference (RQ2)

In this experiment, we compare GNG-ODE with different initial state encoders to investigate the contribution of our encoder design. The following variants are tested on all datasets, where the results are reported in Table 3:

(1)

Identity: GNG-ODE with raw one-hot embeddings as the initial hidden state.
(2)

MLP: GNG-ODE with the output of a two-layer MLP as the initial hidden state.
(3)

GGNN: GNG-ODE with the output of a GGNN as the initial hidden state.

Table 3. Results of Different Initial State Encoders.

Dataset	Gowalla		Tmall		Nowplaying
Metrics	HR@20	MRR@20	HR@20	MRR@20	HR@20	MRR@20
Identity	53.84	26.80	35.03	15.11	22.75	9.23
MLP	53.86	26.35	34.75	14.58	21.45	8.39
GGNN	54.58	26.91	37.66	17.25	22.83	9.45

From Table 3, we can observe that GGNN achieves best perforamnce on all the three datasets. GGNN encoder emphasizes more on capturing structural information, replacing this with raw embedding or MLP will significantly decrease the recommendation performance on the Tmall dataset. For Gowalla and Nowplaying, compared with the results on Tmall, the structural information contributes less on both HR@20 and MRR@20 metrics. Our analysis is that the difference may be caused by how the influence of the structural and temporal factors in the e-commerce and check-in as well as interest-based scenarios varies. Specifically, in the e-commerce dataset, i.e., Tmall, the structural information is relatively more important, since the transition relation between items is much more complicated than the simple sequential signal (Wu et al., 2019a; Qiu et al., 2019).

5.6. Impact of ODE Functions (RQ3)

To verify the effectiveness of GNG-ODE and t-Alignment, we replace the GNG-ODE with several widely used ODE functions and compare their recommendation performance. The variants of GNG-ODE are listed as follows. For none t-Alignment version, we use the static session graph as input.

(1)

GNG-ODE: the model proposed in this paper.
(2)

GCN-ODE: use a two layer graph convolutional network (Kipf and Welling, 2017) as the ODE function.
(3)

GRU-ODE: use a one layer gated recurrent unit (Cho et al., 2014) as the ODE function.
(4)

MLP-ODE: use a two layer linear network with GELU acivation (Hendrycks and Gimpel, 2016) as the ODE function.

Table 4 shows the performance of different GNG-ODE variants. We observe that without t-Alignment the performance will degrade on all datasets. This confirms that building continuous session graphs could enable our model to capture the evolution of the session graphs over multi-time steps. This further demonstrates the indispensability of t-Alignment in expanding the applicability of those ODE solvers. Moreover, we find that the method jointly considering structural and temporal patterns (GNG-ODE) outperforms its counterparts that only consider structural patterns (GCN-ODE) or that only consider temporal patterns (GRU-ODE), demonstrating the superiority of GNG-ODE at capturing both information.

Table 4. Impact of ODE Function Module.

Dataset	Gowalla		Tmall		Nowplaying
Metrics	HR@20	MRR@20	HR@20	MRR@20	HR@20	MRR@20
GNG-ODE	54.58	26.91	37.66	18.23	22.83	9.45
GCN-ODE	54.46	26.63	37.45	17.83	22.65	9.25
GRU-ODE	54.41	26.76	37.33	17.30	22.59	9.23
MLP-ODE	54.34	26.81	37.17	17.63	22.34	9.18
GNG-ODE w/o t-Align	54.16	25.76	37.22	17.86	22.70	9.17
GCN-ODE w/o t-Align	53.85	25.57	37.18	17.37	22.62	9.14
GRU-ODE w/o t-Align	54.19	26.25	36.88	17.28	22.36	9.06
MLP-ODE w/o t-Align	53.99	26.12	36.89	17.36	22.10	9.04

5.7. Analysis of ODE Solvers (RQ4)

In this section, we investigate the effect of different ODE solvers. ODE solvers play a central role in the performance of GNG-ODE. Widely used numerical ODE solvers are fixed-step solvers like explicit Euler (Euler), fourth-order Runge–Kutta (RK4) and adaptive step size solvers like Dopri5.

5.7.1. Influence of ODE Solvers

The numbers of the best results for each solver are listed on the head of each subfigure in Figure 5. Adaptive step solver Dopri5 outperforms fixed-step solvers on the three datasets as adaptive step solvers adjust integration steps more flexibly than fixed step solvers. Besides, as RK4 takes more evaluation times in one step to achieve a more accurate estimation of the ODE function, it outperforms Euler on the three datasets.

5.7.2. Influence of Integration Step Size of Fixed Step ODE Solvers

Figure 5 also summarizes the curves of fixed-step ODE solvers induced by various step sizes. The $x$ -axis indicates the multiple of the time duration of a session relative to the integration step size. It is used as a hyper-parameter of the fixed step ODE solvers, e.g., Euler and RK4. We find that as the proportion of step size goes smaller, the performance of Euler and RK4 increases. It is because the estimated ODE function is more accurate in small step sizes. Besides, Euler is more sensitive than RK4 to step size as RK4 takes more estimation times in one integration step to ensure accuracy. Moreover, as the metric curves tend to converge as step size goes smaller, we do not need a too small step size to achieve good performance. To avoid taking too much running time, we can adjust the step size to balance both effectiveness and efficiency.

5.8. Running Time Comparison (RQ5)

The computation time per epoch for GNG-ODE is summarized in Figure 6. We also give the running time of DAT-MDI, one recent baseline that does not consider temporal information and TMI-GNN, the best baseline that considers temporal information. For fix-step solvers we choose $T/\Delta t=7$ . We find that the efficiency of GNG-ODE with Euler solver is on par with DAT-MDI and TMI-GNN. Although RK4 and Dopri5 take more time to compute, they achieve better performance and the time costs are still acceptable.

5.9. Hyper-parameter Study (RQ6)

To answer RQ6, we conduct experiments to study the sensitivity of GNG-ODE on the embedding dimension and the GGNN encoder layers. Specifically, we tune the the embedding dimension in {64, 128, 256, 512} and search the GGNN encoder layers in {1,2,3,4,5}. The ODE solver is set to RK4. The performance of GNG-ODE with different hyper-parameters is presented in Figure 7.

5.9.1. Embedding Size

From Figure 7, we can observe that when the number of encoder layers is small, increasing the embedding dimension can generally improve the recommendation performance, especially from dimensions 64 to 128. This is because a large embedding dimension has a relatively better representation ability of item characteristics. However, there is a merely limited promotion of the performance when the dimension increases from 256 to 512. As increasing the embedding dimension will consume more computation resources, the dimension 256 is a proper choice considering both the effectiveness and efficiency of the recommender.

5.9.2. Number of GGNN Encoder Layers

As shown in Figure 7, increasing the number of layers does not always result in better performance. For example, on Tmall and Nowplaying dataset, the optimal layer number is less than 3. The performance decreases quickly when the layer number exceeds this optimal value because of the over-smoothing problem (Xu et al., 2018).

6. Conclusion

In this paper, we design a new SBR model, GNG-ODE, to model the continuity of user preference along the time in a fully continuous manner with Neural ODE. GNG-ODE works upon our defined continuous-time temporal session graph. We employ the GGNN to encode the structural patterns to infer the initial latent states for all items. We further derive GNG-ODE that propagates the latent states of the items between different time steps in time. We also propose a time alignment algorithm, called t-Alignment, to adapt the existing ODE solvers onto our dynamic graph setting. Extensive experiments on three real-world datasets demonstrate the effectiveness of GNG-ODE. Moreover, the ablation study and analysis verify the efficacy of those components in GNG-ODE. In conclusion, GNG-ODE is a novel model to solve the SBR problem with temporal information.

Acknowledgement. This work was supported by National Key Research and Development Program of China under Grant No. 2018AAA0101902, and NSFC under Grant No. 61532001.

References

(1)
Bai et al. (2019) Ting Bai, Lixin Zou, Wayne Xin Zhao, Pan Du, Weidong Liu, Jian-Yun Nie, and Ji-Rong Wen. 2019. CTrec: A long-short demands evolution model for continuous-time recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 675–684.
Chen et al. (2021a) Chen Chen, Jie Guo, and Bin Song. 2021a. Dual attention transfer in session-based recommendation with multi-dimensional integration. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 869–878.
Chen et al. (2020) Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. 2020. Bias and Debias in Recommender System: A Survey and Future Directions. arXiv preprint arXiv:2010.03240 (2020).
Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. 2018. Neural ordinary differential equations. arXiv preprint arXiv:1806.07366 (2018).
Chen and Wong (2020) Tianwen Chen and Raymond Chi-Wing Wong. 2020. Handling Information Loss of Graph Neural Networks for Session-based Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1172–1180.
Chen et al. (2021b) Zeyuan Chen, Wei Zhang, Junchi Yan, Gang Wang, and Jianyong Wang. 2021b. Learning Dual Dynamic Representations on Time-Sliced User-Item Interaction Graphs for Sequential Recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 231–240.
Cheng et al. (2017) Justin Cheng, Caroline Lo, and Jure Leskovec. 2017. Predicting intent using activity logs: How goal specificity and temporal range affect user behavior. In Proceedings of the 26th International Conference on World Wide Web Companion. 593–601.
Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
Choi et al. (2021) Jeongwhan Choi, Jinsung Jeon, and Noseong Park. 2021. LT-OCF: Learnable-Time ODE-based Collaborative Filtering. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 251–260.
De Brouwer et al. (2019) Edward De Brouwer, Jaak Simm, Adam Arany, and Yves Moreau. 2019. GRU-ODE-Bayes: Continuous Modeling of Sporadically-Observed Time Series. In Advances in Neural Information Processing Systems. 7379–7390.
Dupont et al. (2019) Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. 2019. Augmented Neural ODEs. In Advances in Neural Information Processing Systems. 3134–3144.
Fan et al. (2021) Ziwei Fan, Zhiwei Liu, Jiawei Zhang, Yun Xiong, Lei Zheng, and Philip S Yu. 2021. Continuous-time sequential recommendation with temporal graph collaborative transformer. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 433–442.
Flett (1958) Thomas Muirhead Flett. 1958. 2742. A mean value theorem. The Mathematical Gazette 42, 339 (1958), 38–39.
Folland (2020) Gerald B Folland. 2020. Introduction to partial differential equations. Princeton university press.
Guo et al. (2022) Jiayan Guo, Yaming Yang, Xiangchen Song, Yuan Zhang, Yujing Wang, Jing Bai, and Yan Zhang. 2022. Learning Multi-granularity Consecutive User Intent Unit for Session-based Recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 343–352.
Guo et al. (2019) Lei Guo, Hongzhi Yin, Qinyong Wang, Tong Chen, Alexander Zhou, and Nguyen Quoc Viet Hung. 2019. Streaming session-based recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1569–1577.
Gupta et al. (2019) Priyanka Gupta, Diksha Garg, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff. 2019. NISER: Normalized item and session representations to handle popularity bias. arXiv preprint arXiv:1909.04276 (2019).
Haber and Ruthotto (2017) Eldad Haber and Lars Ruthotto. 2017. Stable architectures for deep neural networks. Inverse problems 34, 1 (2017), 014004.
He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182.
Hendrycks and Gimpel (2016) Dan Hendrycks and Kevin Gimpel. 2016. Gaussian Error Linear Units (GELUs). arXiv: Learning (2016).
Hidasi et al. (2016) Balázs Hidasi, Alexandros Karatzoglou, L. Baltrunas, and D. Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. CoRR abs/1511.06939 (2016).
Huang et al. (2020) Zijie Huang, Yizhou Sun, and Wei Wang. 2020. Learning Continuous System Dynamics from Irregularly-Sampled Partial Observations. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 16177–16187.
Huang et al. (2021) Zijie Huang, Yizhou Sun, and Wei Wang. 2021. Coupled graph ode for learning interacting system dynamics. In 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021. 705–715.
Jia and Benson (2019) Junteng Jia and Austin R. Benson. 2019. Neural Jump Stochastic Differential Equations. In Advances in Neural Information Processing Systems. 9843–9854.
Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations.
Kumar et al. (2019) Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1269–1278.
Li et al. (2019) Jia Li, Zhichao Han, Hong Cheng, Jiao Su, Pengyun Wang, Jianfeng Zhang, and Lujia Pan. 2019. Predicting path failure in time-evolving graphs. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1279–1289.
Li et al. (2017a) Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017a. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1419–1428.
Li et al. (2017b) J. Li, Pengjie Ren, Zhumin Chen, Z. Ren, Tao Lian, and J. Ma. 2017b. Neural Attentive Session-based Recommendation. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017).
Li et al. (2020) Jiacheng Li, Yujie Wang, and Julian McAuley. 2020. Time interval aware self-attention for sequential recommendation. In Proceedings of the 13th international conference on web search and data mining. 322–330.
Li et al. (2015) Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
Liu et al. (2018a) Qiao Liu, Y. Zeng, Refuoe Mokhosi, and H. Zhang. 2018a. STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
Liu et al. (2018b) Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018b. STAMP: short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1831–1839.
Lu et al. (2018) Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. 2018. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. In International Conference on Machine Learning. PMLR, 3276–3285.
Pan et al. (2020a) Zhiqiang Pan, Fei Cai, Wanyu Chen, Honghui Chen, and Maarten de Rijke. 2020a. Star graph neural networks for session-based recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1195–1204.
Pan et al. (2020b) Z. Pan, Fei Cai, Wanyu Chen, Honghui Chen, and M. Rijke. 2020b. Star Graph Neural Networks for Session-based Recommendation. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (2020).
Pan et al. (2021) Zhiqiang Pan, Wanyu Chen, and Honghui Chen. 2021. Dynamic Graph Learning for Session-Based Recommendation. Mathematics 9, 12 (2021), 1420.
Pareja et al. (2020) Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, Tao Schardl, and Charles Leiserson. 2020. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 5363–5370.
Poddar et al. (2018) Asmita Poddar, Eva Zangerle, and Y Yang. 2018. nowplaying-RS: a new benchmark dataset for building context-aware music recommender systems. In 15th Sound and Music Computing Conference.
Poli et al. (2019) Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. 2019. Graph Neural Ordinary Differential Equations. CoRR abs/1911.07532 (2019). arXiv:1911.07532
Qiu et al. (2019) Ruihong Qiu, Jingjing Li, Zi Huang, and Hongzhi Yin. 2019. Rethinking the item order in session-based recommendation with graph neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 579–588.
Ren et al. (2019) Pengjie Ren, Zhumin Chen, Jing Li, Zhaochun Ren, Jun Ma, and Maarten De Rijke. 2019. Repeatnet: A repeat aware neural recommendation machine for session-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4806–4813.
Rubanova et al. (2019) Yulia Rubanova, Ricky T. Q. Chen, and David Duvenaud. 2019. Latent ODEs for Irregularly-Sampled Time Series. (2019). arXiv:1907.03907
Seo et al. (2018) Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, and Xavier Bresson. 2018. Structured sequence modeling with graph convolutional recurrent networks. In International Conference on Neural Information Processing. Springer, 362–373.
Shen et al. (2021a) Qi Shen, Shixuan Zhu, Yitong Pang, Yiming Zhang, and Zhihua Wei. 2021a. Temporal aware Multi-Interest Graph Neural Network For Session-based Recommendation. CoRR abs/2112.15328 (2021).
Shen et al. (2021b) Qi Shen, Shixuan Zhu, Yitong Pang, Yiming Zhang, and Zhihua Wei. 2021b. Temporal aware Multi-Interest Graph Neural Network For Session-based Recommendation. arXiv preprint arXiv:2112.15328 (2021).
Skardinga et al. (2021) Joakim Skardinga, Bogdan Gabrys, and Katarzyna Musial. 2021. Foundations and modelling of dynamic networks using dynamic graph neural networks: A survey. IEEE Access (2021).
Srivastava et al. (2015) R. Srivastava, Klaus Greff, and J. Schmidhuber. 2015. Highway Networks. ArXiv abs/1505.00387 (2015).
Tang and Wang (2018) Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 565–573.
Vassøy et al. (2019) Bjørnar Vassøy, Massimiliano Ruocco, Eliezer de Souza da Silva, and Erlend Aune. 2019. Time is of the essence: a joint hierarchical rnn and point process model for time and item predictions. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 591–599.
Wang et al. (2020c) Jinshan Wang, Qianfang Xu, Jiahuan Lei, Chaoqun Lin, and Bo Xiao. 2020c. PA-GGAN: Session-Based Recommendation with Position-Aware Gated Graph Attention Network. 2020 IEEE International Conference on Multimedia and Expo (ICME) (2020), 1–6.
Wang et al. (2019) Meirui Wang, Pengjie Ren, Lei Mei, Zhumin Chen, Jun Ma, and Maarten de Rijke. 2019. A collaborative session-based recommendation approach with parallel memory modules. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 345–354.
Wang et al. (2021) Shoujin Wang, Longbing Cao, Yan Wang, Quan Z Sheng, Mehmet A Orgun, and Defu Lian. 2021. A survey on session-based recommender systems. ACM Computing Surveys (CSUR) 54, 7 (2021), 1–38.
Wang et al. (2020a) Ziyang Wang, W. Wei, G. Cong, X. Li, Xian-Ling Mao, and Minghui Qiu. 2020a. Global Context Enhanced Graph Neural Networks for Session-based Recommendation. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020).
Wang et al. (2020b) Ziyang Wang, Wei Wei, Gao Cong, Xiao-Li Li, Xian-Ling Mao, and Minghui Qiu. 2020b. Global context enhanced graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 169–178.
Wu et al. (2019a) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019a. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
Wu et al. (2019b) S. Wu, Y. Tang, Yanqiao Zhu, L. Wang, X. Xie, and T. Tan. 2019b. Session-based Recommendation with Graph Neural Networks. In AAAI.
Xhonneux et al. (2020) Louis-Pascal A. C. Xhonneux, Meng Qu, and Jian Tang. 2020. Continuous Graph Neural Networks. (2020). arXiv:1912.00967
Xia et al. (2021) Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Lizhen Cui, and Xiangliang Zhang. 2021. Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4503–4511.
Xu et al. (2019a) Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, V. Sheng, J. Xu, Fuzhen Zhuang, J. Fang, and X. Zhou. 2019a. Graph Contextualized Self-Attention Network for Session-based Recommendation. In IJCAI.
Xu et al. (2019b) Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, and Xiaofang Zhou. 2019b. Graph Contextualized Self-Attention Network for Session-based Recommendation.. In IJCAI, Vol. 19. 3940–3946.
Xu et al. (2018) Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. 2018. Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning. PMLR, 5453–5462.
Yildiz et al. (2019) Cagatay Yildiz, Markus Heinonen, and Harri Lähdesmäki. 2019. ODE2VAE: Deep generative second order ODEs with Bayesian neural networks. In Advances in Neural Information Processing Systems. 13412–13421.
Yu et al. (2016) Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. A dynamic recurrent model for next basket recommendation. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 729–732.
Yu et al. (2020) Feng Yu, Yanqiao Zhu, Qiang Liu, S. Wu, L. Wang, and Tieniu Tan. 2020. TAGNN: Target Attentive Graph Neural Networks for Session-based Recommendation. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020).
Yuan et al. (2019) Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, and Xiangnan He. 2019. A simple convolutional generative network for next item recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 582–590.
Zang and Wang (2019) Chengxi Zang and Fei Wang. 2019. Neural Dynamics on Complex Networks. CoRR abs/1908.06491 (2019). arXiv:1908.06491
Zhang et al. (2022) Peiyan Zhang, Jiayan Guo, Chaozhuo Li, Yueqi Xie, Jaeboum Kim, Yan Zhang, Xing Xie, Haohan Wang, and Sunghun Kim. 2022. Efficiently Leveraging Multi-level User Intent for Session-based Recommendation via Atten-Mixer Network. arXiv preprint arXiv:2206.12781 (2022).
Zhang et al. (2019) Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys (CSUR) 52, 1 (2019), 1–38.
Zhou et al. (2021a) Huachi Zhou, Qiaoyu Tan, Xiao Huang, Kaixiong Zhou, and Xiaoling Wang. 2021a. Temporal Augmented Graph Neural Networks for Session-Based Recommendations. (2021).
Zhou et al. (2021b) Huachi Zhou, Qiaoyu Tan, Xiao Huang, Kaixiong Zhou, and Xiaoling Wang. 2021b. Temporal Augmented Graph Neural Networks for Session-Based Recommendations. In SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, and Tetsuya Sakai (Eds.). ACM, 1798–1802.

Evolutionary Preference Learning via Graph Nested GRU ODE for Session-based Recommendation