
When Multi-Level Meets Multi-Interest: A Multi-Grained Neural Model for Sequential Recommendation

Yu Tian1, Jianxin Chang2, Yanan Niu2, Yang Song2, Chenliang Li1† 1Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China
[email protected]; [email protected]
2Kuaishou Technology Co., Ltd., Beijing, 10010, China
{changjianxin,niuyanan,yangsong}@kuaishou.com
(2022)
Abstract.

Sequential recommendation aims at identifying the next item preferred by a user based on her behavioral history. Compared to conventional sequential models that leverage attention mechanisms and RNNs, recent efforts mainly follow two directions for improvement: multi-interest learning and graph convolutional aggregation. Specifically, multi-interest methods such as ComiRec and MIMN focus on extracting different interests for a user by clustering her historical items, while graph convolution methods such as TGSRec and SURGE refine user preferences based on multi-level correlations between historical items. Unfortunately, neither line of work realizes that the two types of solutions can complement each other: aggregating multi-level user preferences enables more precise multi-interest extraction and hence better recommendation. To this end, in this paper, we propose a unified multi-grained neural model (named MGNM) via a combination of multi-interest learning and graph convolutional aggregation. Concretely, MGNM first learns the graph structure and information aggregation paths over the historical items of a user. It then performs graph convolution to derive item representations in an iterative fashion, in which the complex preferences at different levels are well captured. Afterwards, a novel sequential capsule network is proposed to inject sequential patterns into the multi-interest extraction process, leading to more precise interest learning in a multi-grained manner. Experiments on three real-world datasets from different scenarios demonstrate the superiority of MGNM over several state-of-the-art baselines. The performance gain over the best baseline is up to 3.12% and 4.35% in terms of NDCG@5 and HIT@5 respectively, which is among the largest gains in recent developments of sequential recommendation. Further analysis also demonstrates that MGNM is robust and effective in understanding user preferences at multiple granularities.

Sequential Recommendation, Multi-Interest Learning, Graph Neural Network
Chenliang Li is the corresponding author. Work done when Yu Tian was an intern at Kuaishou.
journalyear: 2022; copyright: acmcopyright; conference: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 11–15, 2022, Madrid, Spain; booktitle: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), July 11–15, 2022, Madrid, Spain; price: 15.00; doi: 10.1145/3477495.3532081; isbn: 978-1-4503-8732-3/22/07; ccs: Information systems → Recommender systems

1. Introduction

With the rapid development of the Internet, recommender systems have become an important tool to alleviate information overload and enhance competitiveness for many online services such as news feeds, E-commerce, advertising, and social media. In particular, sequential recommendation, which aims to identify the next item that a user will prefer based on her historical behaviors, has drawn increasing attention. The core challenge is how to accurately capture a user's interests from her complex behaviors.

In the past few years, many sequential recommendation solutions have been proposed to model sequential patterns for preference learning. Specifically, earlier works aim to learn a user embedding vector by encoding the user’s overall preference from her complex behavior sequence (Tang and Wang, 2018; Hidasi et al., 2015; Vaswani et al., 2017; Sha and Wang, 2017; Yu et al., 2019). Typically, a sequence modeling technique is applied over the user behavior sequence. For example, GRU4Rec (Hidasi et al., 2015) uses the GRU module to encode preference signals from user behavior sequences. Caser (Tang and Wang, 2018) treats the sequence of item embeddings as an image and learns sequential patterns via horizontal and vertical convolutional filters.

Despite the great success achieved by these solutions, all of them compress multifaceted preferences into a single vector and thus fail to discriminate between different interests. Figure 1 illustrates the click sequences of two users from the E-commerce and Micro-video datasets, respectively. Here, each video is displayed by its first frame. As shown in Figure 1(a), this short click history contains two main interests: sports and games. To address the above problem, a handful of multi-interest solutions have been proposed recently. These methods are devised to learn accurate preference vectors for each user via multi-interest modeling. Generally, a multi-interest network is utilized to explicitly encode the multiple interests according to the relevant items in the behavior sequence. For example, MIMN (Pi et al., 2019) utilizes memory induction units as multiple channels to derive multiple interests from the user’s behavior sequence, which delivers a large performance gain in the display advertising system of Alibaba. In addition, MIND (Li et al., 2019) and ComiRec (Cen et al., 2020) build on the Capsule Network (CapsNet) (Sabour et al., 2017) and have also brought benefits to online systems.

Figure 1. Partial viewing history of two real users in e-commerce and micro-video scenes, respectively. For user (a), some items affect two interests at the same time, i.e., interest overlapping at the $(t-2)$-th and $t$-th timestamps. For user (b), there are two different levels of interest in her interaction history: coarse-grained (i.e., Cartoon) and fine-grained (i.e., Tom and Jerry).

All these multi-interest models, however, take the item as the minimum interest modeling unit, lacking the ability to model complex, dynamic, and high-order user behaviors. More specifically, as shown in Figure 1(a), the user mainly focuses on sports (shown in green) and games (shown in blue). Note that the two items at the $(t-2)$-th and $t$-th timestamps contribute to the modeling of both interests (i.e., interest overlapping). In this case, existing multi-interest solutions struggle to decompose the interests accurately. Moreover, Figure 1(b) shows that a user’s interests can appear at different granularities. To address this problem, some efforts combine sequential modeling with graph neural networks (Fan et al., 2021; Chang et al., 2021). They build an item graph over the historically interacted items and perform graph convolution to aggregate the user preference at different levels. However, in comparison to multi-interest solutions, these methods ignore the benefit of multi-interest decomposition. All in all, how to model multiple interests in a multi-grained manner is the problem we aim to solve.

To this end, in this paper, we propose a novel Multi-Grained Neural Model (named MGNM) via a marriage between multi-interest learning and graph convolutional aggregation. Specifically, MGNM consists of two major components: user-aware graph convolution and a sequential capsule network. We introduce a learnable process to organize a user’s historical items in a user-aware manner, such that a discriminative graph structure and information propagation paths are uncovered. We then perform graph convolution to derive the item representations iteratively, in which the complex preferences at different levels are well captured. These multi-level item representations can better reflect the user’s diverse preferences. Afterwards, a novel sequential capsule network is proposed to inject sequential patterns into the multi-interest extraction process, leading to more precise interest learning. The recommendation is then generated based on the relevance between these multiple interests at different levels and the embedding of the candidate item. To summarize, the contributions of this paper are as follows:

  • We propose a novel neural model that exploits the benefits of both multi-interest learning and graph convolutional aggregation for better recommendation performance. Specifically, MGNM achieves multi-grained user preference learning by integrating multi-level preference composition and multi-interest decomposition into a unified framework.

  • We devise a learnable graph construction mechanism to achieve discriminative structure learning over complex user behaviors. Moreover, a sequential capsule network is proposed to exploit temporal information for better multi-interest extraction.

  • We conduct extensive experiments on three large-scale datasets collected from real-world applications. The experimental results show significant performance improvements compared with state-of-the-art alternatives. Further analysis demonstrates the robustness and interpretability of MGNM.

2. Related Work

Since sequential modeling and multi-interest learning are the two areas most relevant to our work, we briefly summarize the existing methods in each of them.

2.1. Sequential Recommendation

Compared with general recommendation, the scenario of sequential recommendation is different: its main task is to predict which items from a candidate pool a user will prefer in the future by considering the sequential nature of the user's historical behaviors. In the early phase, traditional reasoning methods were utilized, such as Markov Chains, which assume that the next action depends on the previous action sequence. For example, Rendle et al. (Rendle, 2010) propose to combine matrix factorization with Markov Chains (MC) to achieve better performance in sequential recommendation, and some works assume that the next action only relies on the last behavior, using a first-order Markov chain (Cheng et al., 2013). Note that these methods cannot effectively capture the long-term interests of users, due to their limited capability of simulating the dynamic changes of user preferences over time. The emergence of neural networks further enhances recommender systems’ ability to extract user preferences, so another paradigm of neural sequential recommendation, in addition to MC-based methods, has gradually become the mainstream. The most basic multi-layer perceptron (MLP) structure extracts non-linear correlations from user-item interactions (He and Chua, 2017). Then a series of models (Zhang et al., 2016; Qu et al., 2016; Shan et al., 2016; Cheng et al., 2016) represented by DeepFM (Guo et al., 2017) were put forward. In DeepFM, the FM module handles low-order combinations of features and the deep network module handles high-order combinations; by combining the two in parallel, the final architecture learns low-order and high-order combination features at the same time. Borrowing feature extraction mechanisms from texts, audio, and images, CNNs are used to improve model capability in sequential recommendation, and have been verified to be effective to a certain extent by mapping item sequences to embedding matrices. A representative work is Caser (Tang and Wang, 2018), which treats the user’s behavior sequence as an “image” and adopts a convolutional neural network to extract the user representation. Nevertheless, this mechanism ignores the sequential relations within the sequence.

Compared with approaches based on DNNs and CNNs, RNNs are able to capture dynamic time-series information (Wang et al., 2020; Zhou et al., 2018). Hidasi et al. (Hidasi et al., 2015) first introduce RNNs to sequential recommendation with GRU4Rec, which applies Gated Recurrent Units to model the whole session and achieves impressive performance gains over previous methods. Following its success, more and more RNN-based methods have been proposed. To quantify the different importance of past interactions for the next prediction, the attention mechanism (Vaswani et al., 2017) is adopted. Specifically, the attention mechanism makes it easy to memorize various long-range dependencies or focus on important parts of the input. In addition, attention-based methods are often more interpretable (Sha and Wang, 2017) than traditional deep learning models. Other works introduce specific neural modules for particular recommendation scenarios, mainly based on combinations of RNNs, CNNs, and attention structures, bringing some emerging network models into vogue. For example, memory networks (Chen et al., 2018; Huang et al., 2018) and graph neural networks (GNNs) (Wu et al., 2019; Ying et al., 2018) cooperating with the attention mechanism are used to extract short-term features with more consistency or adjacency consideration. SRGNN (Wu et al., 2019) regards the session history as a directed graph; in addition to the relationship between an item and its adjacent previous items, it also considers the relationships with other interacted items. Moreover, Fan et al. (Fan et al., 2021) integrate sequential and collaborative information, use a transformer to capture the temporal relationships in the sequence, and construct a continuous-time bipartite graph. SLi_Rec (Yu et al., 2019) utilizes the fine-grained temporal characteristics of interaction data to strengthen the modeling of sequential behaviors. The recent work TGSRec (Fan et al., 2021) combines graph and temporal information to further improve model performance.

In a word, most existing general sequential approaches learn a single user representation from an RNN- and attention-based model according to the historical behaviors. Graph models, which are capable of aggregating neighbor information, have also proved to be very effective. Nevertheless, a user's historical interaction sequence contains more than one discrete interest, and a single vector cannot fully express the user's preferences. In addition, the noise introduced during graph construction and information aggregation is another important factor limiting the performance of graph-based sequential models.

2.2. Multi-Interest Recommendation

Recently, researchers have come to consider that representing user preferences as a single vector is insufficient to learn complex behaviors precisely, so more and more multi-interest sequential recommendation models have appeared. Li et al. (Li et al., 2020) consider that users’ interests are dynamic and evolve over time. They design a pre-trained model based on the transformer structure, using the item of the next time step as the label of the interest at the current time step to obtain the interest of each time step; the final interest representation is generated by an attentional fusion structure. Pi et al. (Pi et al., 2019) propose the MIMN system, which contains a Neural Turing Machine (NTM), a Memory Induction Unit (MIU), and other modules. The MIU module includes an additional storage unit with M memory slots, where each memory slot is considered a user interest channel. Besides, both MIND (Li et al., 2019) and ComiRec (Cen et al., 2020) devise multi-interest recommendation models on the basis of CapsNet, which uses the idea of neural routing to realize interest decomposition. Note that ComiRec introduces two multi-interest extraction mechanisms: CapsNet and self-attention. Both models have also been deployed successfully in industry. The above methods are multi-interest methods based on sequence models. With the popularity of graph neural networks, the value of neighbor information has been convincingly demonstrated, so the approach of combining graphs and multi-interest learning has also attracted extensive attention in recent years. For example, SURGE (Chang et al., 2021) forms dense clusters in an interest graph to distinguish users’ core interests and performs cluster-aware and query-aware graph convolutional propagation to fuse users’ current core interests from behavior sequences. These approaches have been successfully applied in many recommendation applications and are rather useful and efficient in real-world tasks.

3. Method

Figure 2. The network architecture of our proposed MGNM. The raw sequence is the historical behavior of a user. By transforming the original sequence into a user-aware adaptive graph and using the neural aggregation function of the sequential CapsNet, timing information is injected into the graph during training. In the inference stage, a max-pooling layer is used to obtain the final prediction score.

In this section, we present the proposed multi-grained neural model in detail. As illustrated in Figure 2, the proposed MGNM consists of two main components: user-aware graph convolution and a sequential capsule network. In the following, we first present the formal problem setting. Then, we describe these components, followed by the prediction and model optimization process.

3.1. Problem Formulation

Let $\mathcal{V}=\{x_1, x_2, \ldots, x_M\}$ denote the set of all $M$ items and $\mathcal{U}=\{u_1, u_2, \ldots, u_N\}$ the set of all $N$ users. For each user $u$, $b_u=[x_1, x_2, \ldots, x_m]$ is the sequence of her clicked items in chronological order, where $m$ is the predefined maximum capacity. Sequential recommendation aims to precisely identify the next item $x_{m+1}$ that user $u$ will click, given her behavior sequence $b_u$.

3.2. User-Aware Graph Convolution

In order to extract complex and high-order interests from user click sequences, we consider the graph structure and the aggregation of neighbor information for the target node at different distances in the graph. As the first step, we convert the discrete historical behaviors into a fully connected item-item graph. In contrast to existing methods, we do not artificially enhance the graph with co-occurrence, same-user clicks, or other handcrafted relations, because such an approach often introduces noise, which affects the performance of information aggregation in the subsequent convolution process. In MGNM, the node and user embeddings are updated via gradient feedback through the neural aggregation of the CapsNet, which yields an adaptive graph structure.

3.2.1. Embedding Layer

In the embedding layer, we first form a user embedding table $U\in R^{N\times d}$ and an item embedding table $V\in R^{M\times d}$, where $d$ denotes the dimension of the embedding vectors. For a given user $u$ and the associated behavior sequence $b_u$, we perform table lookups over $U$ and $V$ to obtain the corresponding user and item embedding representations $\mathbf{x}_u$ and $[\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m]$, respectively. The user embeddings $U$ are expected to encode the users’ overall preferences, while the item embeddings $V$ reflect the items’ characteristics in this space.

3.2.2. Graph Construction

Given the historical behavior sequence $b_u=[x_1, x_2, \ldots, x_m]$ of user $u$, we first transform the constituent items into a fully connected undirected graph $\mathcal{G}_u$ by taking each item $x_i$ as a node. It is worth mentioning that we do not condense repeated items in the sequence (i.e., multiple clicks by the user), because multiple clicks on the same item could convey more user preference. We then introduce $\mathbf{A}\in R^{m\times m}$ to denote the corresponding adjacency matrix, where each entry $\mathbf{A}_{i,j}$ indicates the relatedness between item $x_i$ and item $x_j$ from the perspective of user $u$. Instead of utilizing behavior patterns to derive the matrix $\mathbf{A}$, we choose to learn this relatedness based on their hidden features as follows:

(1) $\mathbf{A}_{i,j} = \mathrm{sigmoid}\big((\mathbf{x}_i \odot \mathbf{x}_j) \cdot \mathbf{x}_u\big)$,

where $\odot$ and $\cdot$ denote the Hadamard product and inner product respectively, and $\mathrm{sigmoid}$ denotes the activation function. We can see that the user embedding $\mathbf{x}_u$ is exploited to achieve user-aware graph construction. That is, the same item pair could have different relatedness values for different users. Also, the use of the Hadamard product ensures the symmetry of the adjacency matrix.

Note that graph $\mathcal{G}_u$ is fully connected. Hence, we need $\mathbf{A}$ to be adequately discriminative to facilitate precise multi-level preference learning. To this end, we add $L_1$ regularization on the adjacency matrix $\mathbf{A}$ to encourage sparsity.
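
For concreteness, the following minimal NumPy sketch computes the user-aware adjacency matrix of Equation 1 for a single user; the function name and toy dimensions are our own illustration rather than the released implementation.

    import numpy as np

    def build_user_aware_graph(item_emb, user_emb):
        # Sketch of Equation 1: A[i, j] = sigmoid((x_i ⊙ x_j) · x_u).
        # item_emb: (m, d) embeddings of one user's m historical items.
        # user_emb: (d,) embedding of that user.
        # Pairwise Hadamard products via broadcasting: shape (m, m, d).
        pairwise = item_emb[:, None, :] * item_emb[None, :, :]
        # Inner product with the user embedding gives the user-aware
        # relatedness logits, shape (m, m).
        logits = pairwise @ user_emb
        return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

    # Toy usage with m = 5 items and d = 16 dimensions.
    rng = np.random.default_rng(0)
    A = build_user_aware_graph(rng.normal(size=(5, 16)), rng.normal(size=16))
    assert np.allclose(A, A.T)  # the Hadamard product guarantees symmetry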

3.2.3. Graph Convolution

Following common practice, we perform the graph convolution operation over $\mathcal{G}_u$ as follows:

(2) $\mathbf{H}^{(l+1)} = \delta(\tilde{\mathbf{A}}\,\mathbf{H}^{(l)}\mathbf{W})$,
(3) $\tilde{\mathbf{A}} = \mathbf{I} + \mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}}$,
(4) $\mathbf{H}^{(0)} = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m]$,

where $\mathbf{H}^{(l)}$ denotes the item representations aggregated at the $l$-th layer ($l\in\{1,\cdots,L\}$), $\delta(\cdot)$ denotes the LeakyReLU nonlinearity, $\mathbf{I}$ denotes the identity matrix that adds self-loop propagation, $\mathbf{W}$ denotes the trainable parameter matrix, and $\mathbf{D}$ denotes the degree matrix of $\mathbf{A}$. The parameter matrix $\mathbf{W}$ is shared across all $L$ layers. This design facilitates feature aggregation from high-order neighbors and also reduces the model complexity. The item representations composited at each layer reflect the user’s diverse preferences more precisely.
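
The multi-level aggregation of Equations 2-4 can be sketched as follows, again in NumPy; the helper names are illustrative, and trainable aspects (the shared weight $\mathbf{W}$, the $L_1$ regularization on $\mathbf{A}$) are handled during optimization in Section 3.4.

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)

    def multi_level_convolution(A, H0, W, L=3):
        # Sketch of Equations 2-4: L graph convolution layers with
        # self-loops added via the identity matrix and a single weight
        # matrix W shared across all layers.
        deg = A.sum(axis=1)                       # degrees of the learned graph
        d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
        A_tilde = np.eye(A.shape[0]) + d_inv_sqrt @ A @ d_inv_sqrt
        levels = [H0]                             # H^{(0)}: the raw item embeddings
        H = H0
        for _ in range(L):
            H = leaky_relu(A_tilde @ H @ W)       # Equation 2
            levels.append(H)
        return levels                             # [H^{(0)}, ..., H^{(L)}]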

3.3. Sequential Capsule Network

After extracting the multi-level item representations $\{\mathbf{H}^{(0)},\cdots,\mathbf{H}^{(L)}\}$, where $\mathbf{H}^{(l)}=[\mathbf{h}_1^{(l)},\cdots,\mathbf{h}_m^{(l)}]$ and $\mathbf{h}_i^{(l)}\in R^d$, we utilize CapsNet to generate the user’s multiple interests at each level. Existing works on multi-interest recommendation utilize CapsNet to composite each interest representation through the built-in dynamic routing mechanism, where the output of each capsule corresponds to a specific user interest. However, the standard dynamic routing mechanism mainly performs iterative soft-clustering and ignores temporal order, while it is well validated that temporal information is critical for sequential recommendation. This limits the application of CapsNet in CTR tasks (Cen et al., 2020; Li et al., 2019).

Here, we remedy this defect by introducing a sequential encoding layer into CapsNet. Specifically, given the item representations at level $l$, the $i$-th capsule first performs a linear projection over $\mathbf{H}^{(l)}$ as follows:

(5) $\mathbf{Z}_i = \mathbf{H}^{(l)}\mathbf{W}_i$,

where $\mathbf{Z}_i=[\mathbf{z}_1^{(l)},\cdots,\mathbf{z}_m^{(l)}]$ and $\mathbf{W}_i\in R^{d\times d}$ is the trainable projection parameter.

We then initialize $\mathbf{g}=[g_1,\cdots,g_m]$ from a truncated normal distribution, where $g_i$ is the agreement score indicating the relevance of item $x_i$ to the capsule. The coupling coefficients $\mathbf{c}\in R^m$ for the dynamic routing mechanism are then derived via a softmax function:

(6) $\mathbf{c} = \mathrm{softmax}(\mathbf{g})$.

Then, the capsule derives its output $\mathbf{o}_i$ via a nonlinear squashing function as follows:

(7) $\mathbf{o}_i = \frac{\|\mathbf{v}_i\|^2}{\|\mathbf{v}_i\|^2+1}\frac{\mathbf{v}_i}{\|\mathbf{v}_i\|}$,
(8) $\mathbf{v}_i = \sum_{j=1}^{m} c_j\,\mathbf{z}_j^{(l)}$,

where $c_j$ is the $j$-th element of $\mathbf{c}$. We then update the agreement score $g_i$ as follows:

(9) $g_i = g_i + \mathbf{o}_i^{\top}\mathbf{z}_i$.

After this first iteration, we utilize a BiLSTM module (any other sequential modeling technique, such as a GRU or Transformer, can be straightforwardly applied here) to encode sequential patterns and update $\mathbf{Z}_i$ via a residual structure:

(10) $\mathbf{Z}_i = \mathbf{Z}_i + \mathrm{BiLSTM}(\mathbf{Z}_i)$.

We then repeat the above routing process following Equations 6-10 for $\tau-1$ more iterations without further applying the BiLSTM encoding over $\mathbf{Z}_i$; that is, $\tau$ iterations in total are performed for each capsule. The output $\mathbf{o}_i$ of the last iteration is fed into a fully-connected layer to derive the $i$-th interest representation $\mathbf{q}_i^{(l)}$ at level $l$ as follows:

(11) $\mathbf{q}_i^{(l)} = \mathrm{ReLU}(\mathbf{o}_i\mathbf{W}_i^{\prime})$,

where $\mathbf{W}_i^{\prime}\in R^{d\times d}$ is the trainable parameter. Assuming the number of interests is $K$, we obtain $K$ interest representations $[\mathbf{q}_1^{(l)},\cdots,\mathbf{q}_K^{(l)}]$ for the $l$-th level. That is, we extract in total $(L+1)\cdot K$ interest representations.
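
The routing of one capsule (Equations 5-11) can be sketched as below. Here `seq_encode` is a hypothetical stand-in for the BiLSTM of Equation 10 (any function mapping an (m, d) sequence to an (m, d) sequence), and the plain normal initialization approximates the truncated normal used in the paper.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def squash(v):
        # Equation 7: the nonlinear squashing function.
        n2 = float(v @ v)
        return (n2 / (n2 + 1.0)) * v / np.sqrt(n2 + 1e-12)

    def sequential_capsule(H_l, W_i, W_out, seq_encode, tau=3, seed=0):
        # One capsule at level l. H_l: (m, d); W_i, W_out: (d, d).
        rng = np.random.default_rng(seed)
        Z = H_l @ W_i                      # Equation 5: linear projection
        g = rng.normal(size=H_l.shape[0])  # agreement scores (normal init stands
                                           # in for the paper's truncated normal)
        for t in range(tau):
            c = softmax(g)                 # Equation 6: coupling coefficients
            v = c @ Z                      # Equation 8: weighted sum of items
            o = squash(v)                  # Equation 7: capsule output
            g = g + Z @ o                  # Equation 9: agreement update
            if t == 0:                     # Equation 10: sequential encoding is
                Z = Z + seq_encode(Z)      # applied after the first iteration only
        return np.maximum(o @ W_out, 0.0)  # Equation 11: ReLU projection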

3.4. Prediction and Model Optimization

3.4.1. Prediction

Given a candidate item $x_t$, we first utilize an attention mechanism to derive the user preference vector $\mathbf{p}_u^{(l)}$ for the $l$-th level as follows:

(12) $\mathbf{p}_u^{(l)} = \sum_{j=1}^{K} a_j\,\mathbf{q}_j^{(l)}$,
(13) $a_j = \frac{\exp(\mathbf{q}_j^{(l)\top}\mathbf{x}_t)}{\sum_{k=1}^{K}\exp(\mathbf{q}_k^{(l)\top}\mathbf{x}_t)}$,

where $a_j$ is the attention weight for the $j$-th interest. Then, we use the inner product to calculate the recommendation score as follows:

(14) $\hat{y}_{u,i}^{(l)} = \mathbf{p}_u^{(l)\top}\mathbf{x}_t$,

where $\hat{y}_{u,i}^{(l)}$ denotes the recommendation score at the $l$-th level. Note that different users could have different interest granularities. For users whose interests are complex and dynamic, the high-level user preference is more accurate; for users whose interests are simple and straightforward, it is more appropriate to utilize the low-level user preference or even the original item representations. Hence, we derive the final recommendation score by max-pooling:

(15) $\hat{y}_{u,i} = \max(\hat{y}_{u,i}^{(0)},\cdots,\hat{y}_{u,i}^{(L)})$.
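
A minimal sketch of this inference-time scoring (Equations 12-15); the function and argument names are our own illustration.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def score_candidate(interests_per_level, x_t):
        # interests_per_level: list of (K, d) arrays, one per level l = 0..L.
        # x_t: (d,) embedding of the candidate item.
        level_scores = []
        for Q in interests_per_level:
            a = softmax(Q @ x_t)          # Equation 13: attention weights
            p = a @ Q                     # Equation 12: level preference vector
            level_scores.append(p @ x_t)  # Equation 14: level-wise score
        return max(level_scores)          # Equation 15: max-pooling over levels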

3.4.2. Model Optimization

To enable the model to capture user interests of different granularities from low-level to high-level, we define a cross-entropy loss for each level. The final loss is formulated as follows:

(16) $\mathcal{L}_{all} = \sum_{l=0}^{L}\mathcal{L}_l + \theta_1\|\mathbf{A}\|_1 + \theta_2\|\Theta\|_2^2$,
(17) $\mathcal{L}_l = -\sum_{u,i}\big[y_{u,i}\ln(\hat{y}_{u,i}^{(l)}) + (1-y_{u,i})\ln(1-\hat{y}_{u,i}^{(l)})\big]$,

where $y_{u,i}$ denotes the ground truth for user $u$ and item $x_i$, $\|\mathbf{A}\|_1$ denotes the $L_1$ norm of the adjacency matrix $\mathbf{A}$, $\|\Theta\|_2^2$ denotes the $L_2$ norm of all model parameters, and $\theta_1$ and $\theta_2$ are hyperparameters.
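
A minimal sketch of the loss follows, assuming the level-wise scores have already been squashed into (0, 1) (e.g., by a sigmoid), since Equation 14 itself is an unbounded inner product; the regularization terms follow Equation 16.

    import numpy as np

    def mgnm_loss(y_true, y_hat_levels, A, params, theta1=1e-6, theta2=1e-5):
        # y_true: (n,) binary labels; y_hat_levels: one (n,) array of scores
        # in (0, 1) per level; A: learned adjacency; params: weight arrays.
        eps = 1e-8
        ce = 0.0
        for p in y_hat_levels:                      # Equation 17, summed over levels
            ce += -np.sum(y_true * np.log(p + eps)
                          + (1.0 - y_true) * np.log(1.0 - p + eps))
        l1 = np.abs(A).sum()                        # sparsity of the learned graph
        l2 = sum((w ** 2).sum() for w in params)    # weight decay
        return ce + theta1 * l1 + theta2 * l2       # Equation 16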

4. Experiments

In this section, we conduct extensive experiments on three real-world datasets from different domains for performance evaluation. We then analyze the contributions of several components and different settings of MGNM (the code implementation is available at https://github.com/WHUIR/MGNM). Finally, a thorough analysis of the ablation experiments and framework settings is presented.

4.1. Experimental Settings

Datasets. The first dataset (namely Micro-video) is collected from a leading large-scale micro-video sharing platform. It contains 60,813 users and their interaction records over seven days (i.e., October 22 to October 28, 2020). We take the interactions made in the first six days as the training set, the interactions made before 12 PM on the last day as the validation set, and the rest as the test set.

The other two datasets are from the Amazon product datasets (http://snap.stanford.edu/data/amazon/): Musical Instruments and Toys and Games. Each user interaction in the Amazon datasets is associated with a rating score. Following previous works (Rendle et al., 2010; He and McAuley, 2016; He et al., 2017), we take each user interaction with a rating score larger than 2 as positive. We then split each interaction sequence with the ratio of 7:1:2 in chronological order to form the training, validation, and test sets respectively. We further remove users whose history sequence has length 1.

Table 1 summarizes the statistics of the three datasets after preprocessing. The Micro-video dataset includes a large number of items, while Toys and Games is much smaller, and Musical Instruments is the smallest in terms of interactions. These three real-world datasets hold different characteristics, covering a broad range of real-world scenarios.

Table 1. Statistics of the three datasets.
Datasets #Users #Items #Interactions
Micro-video 60,813 292,286 14,952,659
Musical Instruments 60,739 56,301 946,627
Toys and Games 313,557 241,657 6,212,901

Baselines. We compare the proposed MGNM against the following state-of-the-art sequential recommendation methods:

  • Caser (Tang and Wang, 2018) is a CNN-based model that applies horizontal and vertical convolutional filters to capture point-level, union-level, and skip patterns for sequential recommendation.

  • A2SVD (Yu et al., 2019) is short for asynchronous SVD, which modifies the prediction model to express a user as a superposition of items. Combined with implicit feedback data, it reduces the number of model parameters and enhances the interpretability of the original SVD model.

  • GRU4Rec (Hidasi et al., 2015) utilizes the gated recurrent unit to model the session sequence for recommendation.

  • SLi_Rec (Yu et al., 2019) improves the traditional RNN structure by proposing a temporal-aware controller and a content-aware controller, so that contextual information can guide the state transition. An attention-based framework is then used to combine the user’s long-term and short-term preferences, so the representations can be generated adaptively according to the specific context.

  • MIMN (Pi et al., 2019) is a state-of-the-art multi-interest model that utilizes a multi-channel memory network to capture user interests from sequential behaviors.

  • MIND (Li et al., 2019) is a multi-interest learning model that utilizes CapsNet to capture diverse interests of a user.

  • ComiRec (Cen et al., 2020) is a recent multi-interest model containing a multi-interest module and an aggregation module. The multi-interest module captures a variety of interests from the user behavior sequence and can retrieve candidate items from a large-scale item pool. The aggregation module then uses a controllable factor to balance recommendation accuracy and diversity. Two variants of ComiRec are used for comparison: ComiRec-DR and ComiRec-SA, which use CapsNet and self-attention for multi-interest extraction respectively.

  • SURGE (Chang et al., 2021) is a recent graph neural model for sequential recommendation, which performs cluster-aware and query-aware graph propagation to fuse users’ current core interests from behavior sequences.

  • TGSRec (Fan et al., 2021) is also a recent graph neural model that considers the temporal dynamics inside sequential patterns.

All these baselines can be divided into four categories: (1) traditional sequential models that utilize CNN, RNN, and attention mechanisms (i.e., Caser, A2SVD, and GRU4Rec); (2) temporal-aware models that exploit timestamp information (i.e., SLi_Rec and TGSRec); (3) multi-interest models that derive multiple user interests (i.e., MIMN, MIND, ComiRec-DR, and ComiRec-SA); (4) graph neural models that exploit high-order correlations (i.e., SURGE and TGSRec).

Table 2. Performance comparison of different methods across the three datasets. The best and second-best results are highlighted in boldface and underlined respectively. * indicates that the performance difference against the best result is statistically significant at the 0.05 level. Note that TGSRec took too long to train and hence has no results on the large Micro-video dataset; see the text for details.
Method Micro-video Toys and Games Musical Instruments
GAUC NDCG@5 HIT@5 MRR@5 GAUC NDCG@5 HIT@5 MRR@5 GAUC NDCG@5 HIT@5 MRR@5
Caser 0.6917* 0.0964* 0.1417* 0.0815* 0.6234* 0.0679* 0.1012* 0.0569* 0.6763* 0.0955* 0.1178* 0.0883*
A2SVD 0.6808* 0.0443* 0.0686* 0.0364* 0.6846* 0.0507* 0.0739* 0.0430* 0.6652* 0.0956* 0.1368* 0.0820*
GRU4Rec 0.6944* 0.0702* 0.1050* 0.0589* 0.6624* 0.0840* 0.1278* 0.0697* 0.6498* 0.0619* 0.1049* 0.0478*
SLi_Rec 0.6903* 0.0948* 0.1390* 0.0802* 0.7847* 0.0932* 0.1327* 0.0803* 0.6912* 0.1078 0.1507* 0.0937*
TGSRec – – – – 0.7915* 0.1410* 0.2027* 0.1164* 0.7759 0.0946* 0.1653 0.0729*
MIMN 0.7387* 0.1151* 0.1683* 0.0977* 0.7224* 0.1158* 0.1676* 0.0988* 0.6787* 0.0955* 0.1509* 0.0750*
MIND 0.6778* 0.08582* 0.1367* 0.0700* 0.6611* 0.1015* 0.1510* 0.0824* 0.6588* 0.1040* 0.1422* 0.0898*
ComiRec-DR 0.7028* 0.0863* 0.1307* 0.0718* 0.6681* 0.1131* 0.1597* 0.0978* 0.6647* 0.1091* 0.1541* 0.0943*
ComiRec-SA 0.6249* 0.0354* 0.0577* 0.0281* 0.6486* 0.0665* 0.0977* 0.0563* 0.6559* 0.0820* 0.1204* 0.0694*
SURGE 0.8116* 0.1091* 0.1728* 0.0883* 0.7863* 0.0930* 0.1353* 0.0791* 0.6902* 0.1056* 0.1494* 0.0913*
MGNM 0.8325 0.1463 0.2163 0.1232 0.8078 0.1611 0.2231 0.1408 0.7480* 0.1057 0.1658 0.1021

Hyperparameter Settings. For a fair comparison, all methods are implemented in TensorFlow and trained with the Adam optimizer. The learning rate and mini-batch size are set to 1e-3 and 256, respectively. The number of negative samples is 5 in the training stage for all three datasets. We tuned the parameters of all methods on the validation set, and set the embedding size to 16 and 40 for the Amazon datasets and the Micro-video dataset, respectively. As to MGNM, we found our model performs relatively stably with $K=4$, $L=3$, $\theta_1=1e{-}6$, and $\theta_2=1e{-}5$.

Evaluation Metrics. Following the setting in (Fan et al., 2021), we sample 1,000 negative items for each testing instance. Four common metrics are used for performance evaluation: hit rate (HIT), mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), and Group AUC (GAUC). For each method, we repeat the experiment 5 times and report the average results. Statistical significance is assessed with Student's t-test.
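
For reference, a minimal sketch of the rank-based metrics for a single test instance under this sampled-negative protocol (GAUC, which averages per-user AUC, is omitted for brevity); the function name and tie handling are our own illustration.

    import numpy as np

    def rank_metrics_at_k(pos_score, neg_scores, k=5):
        # Rank the positive item against its sampled negatives; the rank is
        # 0-based, so rank < k means the positive lands in the top-k.
        rank = int(np.sum(np.asarray(neg_scores) > pos_score))
        hit = 1.0 if rank < k else 0.0
        ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
        mrr = 1.0 / (rank + 1) if rank < k else 0.0
        return hit, ndcg, mrr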

4.2. Performance Evaluation

The overall performance of all methods is reported in Table 2. Here, we make the following observations.

The traditional sequential models that utilize RNN and attention mechanisms have difficulty achieving competitive performance. Compared with the temporal-aware models and multi-interest models, they are not suitable for modeling complex and diverse user interests. The temporal-aware models perform very well on the Amazon datasets. Specifically, on the Musical Instruments dataset, TGSRec and SLi_Rec achieve the best performance in terms of GAUC and NDCG@5 respectively, and they also achieve strong performance in terms of the other six metrics on Toys and Games and Musical Instruments. It is worth mentioning that TGSRec needs to build a global graph and takes the interactions at different time points as edges. This design choice requires much more computation for graph retrieval and convolution: the graph constructed on the Micro-video dataset contains more than 200 million edges. Using the implementation released by the original authors, a single epoch of training would take more than 1,200 hours. Hence, we do not report results on the Micro-video dataset.

While the temporal-aware models are superior on both Amazon datasets, the multi-interest models perform better on the Micro-video dataset. This suggests that neither temporal-aware models nor multi-interest models are robust enough to achieve precise preference understanding across different scenarios. Considering that the semantic space in the micro-video scenario is much broader than that of commodities in E-commerce scenarios, the interest of each user also becomes more complex. It is therefore reasonable that the multi-interest models achieve better recommendation performance there.

Our proposed MGNM yields clear improvements in most settings across the three datasets. In detail, MGNM performs significantly better than all the baselines on 10 out of 12 dataset-metric combinations. Although MGNM achieves only comparable NDCG@5 performance against SLi_Rec and performs slightly worse than TGSRec in terms of GAUC on the Musical Instruments dataset, we emphasize that both SLi_Rec and TGSRec exploit additional timestamp features to gain more discriminative capacity, whereas MGNM does not use timestamp features at all. Note that modeling multi-grained multi-interest in MGNM and exploiting timestamp information are not mutually exclusive: as shown in Equation 10, it is straightforward to include fine-grained timestamp information in the sequential capsule network component (we leave this as future work). Moreover, MGNM performs increasingly better on larger datasets. The relative performance gain of MGNM over the best baseline is in the range of 1.63%-2.44% and 2.09%-4.35% on the Toys and Games and Micro-video datasets respectively. This further confirms that MGNM is effective at capturing multi-grained user interests in large-scale real-world scenarios that are rich in semantics.

4.3. Model Analysis

Here, we investigate the impact of each design choice and important parameter setting on the performance of MGNM.

Figure 3. The performance of different $L$ values on the Toys and Games and Micro-video datasets (panels: (a) GAUC, (b) NDCG@5, (c) HIT@5, (d) MRR@5).
Figure 4. The performance of different $K$ values on the Toys and Games and Micro-video datasets (panels: (a) GAUC, (b) NDCG@5, (c) HIT@5, (d) MRR@5).

Ablation Study. We conduct an ablation study on each design choice in MGNM to justify its validity. The factors include the user-aware graph convolution (UGCN), the $L_1$ regularization on the adjacency matrix $\mathbf{A}$ ($L_1$Norm), the sequential encoding layer of the sequential capsule network (BiLSTM), and the max-pooling based prediction (MaxPool). As to the sequential capsule network (SCN), we also examine the following variants:

  • SCN→BiLSTM: We replace the sequential capsule network with a BiLSTM to encode the user behavior sequence at different levels. The last hidden state generated by the BiLSTM is taken as the user interest at the corresponding level.

  • SCN→SumPool: We replace the sequential capsule network with a sum-pooling mechanism. Similar to SCN→BiLSTM, the resultant representation is taken as the user interest at the corresponding level.

  • SCN→SelfAtt: We replace the sequential capsule network with a self-attention mechanism. The candidate item is utilized to derive the user interest at each level via attention.

  • SCN (Transformer): We replace the built-in BiLSTM in the sequential capsule network with a more powerful Transformer module.

Table 3 reports the performance of these variants and the full MGNM model on the Toys and Games dataset (similar observations are made on the other two datasets). Here, we make the following observations.

First, the $L_1$ regularization indeed improves the discriminative capacity of the user-aware graph convolution: removing it leads to an obvious performance degradation. When we remove the multi-level item representation learning supported by the user-aware graph convolution (i.e., w/o UGCN), MGNM also suffers a substantial performance degradation, which clearly illustrates the effectiveness of user-aware graph convolution and multi-level preference learning. Likewise, MGNM experiences a large performance degradation when the sequential encoding layer is removed (i.e., w/o BiLSTM). This is reasonable, since sequential patterns have been well validated to be effective for sequential recommendation; here we further validate that they are also very useful for multi-interest learning. Finally, the max-pooling based prediction plays an important role in improving performance. As described earlier, the max-pooling mechanism is flexible in capturing complex user preferences from multi-grained interests.

Table 3. The ablation study of MGNM on Toys and Games Dataset. The best results are highlighted in boldface.
Model Toys and Games
GAUC NDCG@5 HIT@5 MRR@5
w/o UGCN 0.7499 0.0929 0.1325 0.0799
w/o $L_1$Norm 0.7757 0.1306 0.1848 0.1128
w/o BiLSTM 0.6743 0.1205 0.1689 0.1046
w/o MaxPool 0.8491 0.0980 0.1430 0.0832
SCN\to BiLSTM 0.6589 0.0838 0.1223 0.0712
SCN\to SumPool 0.6651 0.0846 0.1232 0.0720
SCN\to SelfAtt 0.6724 0.0791 0.1148 0.0674
SCN (Transformer) 0.6663 0.0923 0.1321 0.0792
MGNM 0.8078 0.1611 0.2231 0.1408

Second, we dive deeper into the effectiveness of the sequential capsule network component. The first three variants (i.e., SCN→BiLSTM, SCN→SumPool, SCN→SelfAtt) remove the multi-interest modeling and consider only the multi-level user preferences. All three variants experience significant performance degradation across the four metrics. Recall that MGNM w/o UGCN also produces a substantial performance degradation. These two observations suggest that the user-aware graph convolution and the sequential capsule network work as a whole and complement each other, leading to better user preference understanding. Finally, encoding sequential patterns with a heavy module like the Transformer achieves better performance than SCN→BiLSTM, SCN→SumPool, and SCN→SelfAtt in terms of NDCG@5, HIT@5, and MRR@5, which again validates the benefit of modeling sequential patterns for multi-interest learning. However, the huge number of parameters in a Transformer module could complicate the optimization process. Since we already derive multi-level item representations via the user-aware graph convolution, a lightweight sequential model like BiLSTM is sufficient for this step. To the best of our knowledge, no previous method integrates sequential modeling with CapsNet.

Impact of the $L$ Value. Recall that we stack $L$ layers of graph convolution in MGNM to reflect the user’s diverse preferences in a multi-grained manner. A larger $L$ recruits more high-order neighbors, so increasingly distant neighbor information is aggregated to derive the user’s preference. However, some noisy information is also included, which has an adverse impact. In Figure 3, we plot the performance for varying $L$ values on the Toys and Games and Micro-video datasets. It is reasonable to see that the NDCG@5, HIT@5, and MRR@5 scores first increase as $L$ grows ($L\leq 3$), and then decrease when $L$ is too large ($L>3$). The GAUC metric remains quite stable across different $L$ values.

Impact of the $K$ Value. The number of interests $K$ in MGNM controls the diversity of user preferences. Figure 4 plots the performance for varying $K$ values on the Toys and Games and Micro-video datasets. A single interest representation (i.e., $K=1$) achieves the worst performance across the four metrics. The optimal $K$ value is 2 for Toys and Games and 4 for Micro-video, which is reasonable since the semantic space of the Micro-video dataset is much broader than that of the Toys and Games dataset. Moreover, MGNM achieves relatively stable performance when $K$ is in the range of $[3,5]$.

Multi-Level User Interest Distribution. In Figure 3, we examined the impact of different numbers of layers $L$. Here, we further investigate whether the multi-level user preferences indeed play different roles for different users. Specifically, we randomly sample 50 users from the Micro-video and Toys and Games datasets, respectively. For each user, we include her positive items in the test set along with the sampled negative items, and count which preference level is activated by the max-pooling based predictor (ref. Equation 15). Figures 5 and 6 plot the distribution of these activated levels for each user on the Micro-video and Toys and Games datasets, respectively. We observe that the desired preference level differs considerably across users. On the Toys and Games dataset, the first two layers are adequate for most users, but high-level preferences (i.e., $L\geq 2$) are still needed for a few users (Figure 6). On the Micro-video dataset, with its larger semantic space, high-level preferences become more significant for all users (Figure 5). On the whole, users exhibit more high-level preferences on Micro-video; in other words, users’ interests in micro-video scenes are higher-level, more complex, and change faster, as discussed for Figure 1 and in the previous analysis. This phenomenon is in line with our expectations and demonstrates the effectiveness of the multi-level mechanism. Furthermore, in the inference stage, we replace max-pooling with sum-pooling to further verify the influence of the max-pooling structure (Figure 7). Combining the distribution of user interests in Figures 5 and 6 with the better results of max-pooling over sum-pooling in Figure 7, we conclude that the user-aware graph convolution does precisely distinguish the user’s interests at multiple levels and the multi-level mechanism does promote the performance.

Figure 5. Visualization of the multi-level user interest distribution on the Micro-video dataset (best viewed in color).
Figure 6. Visualization of the multi-level user interest distribution on the Toys and Games dataset (best viewed in color).
Figure 7. Max-pooling vs. sum-pooling for MGNM in the inference stage.
Table 4. Runtime comparisons for different datasets.
Datasets Per Iteration (s) Iterations Total Time (m)
Micro-video 0.3825 15,311 97.60
Toys and Games 0.1843 13,202 40.55
Music Instruments 0.0598 2,373 2.37

Time Complexity Analysis. Table 4 reports the runtime of the MGNM training procedure for a single epoch on the different datasets using a single GPU. Although MGNM adopts graph convolution, training on roughly 15M interactions takes about 1.5 hours per epoch, which is computationally efficient.

5. Conclusion

In this paper, we proposed a novel multi-grained neural model (named MGNM) that combines multi-level and multi-interest learning as a unified solution for the sequential recommendation task. A learnable process was introduced to re-construct loose item sequences into tight item-item interest graphs in a user-aware manner. We then performed graph convolution to derive the item representations iteratively, in which the complex preferences at different levels can be well captured. Afterwards, a novel sequential CapsNet was designed to inject sequential patterns into the multi-interest extraction process, leading to more precise interest modeling. Extensive experiments on three real-world datasets from different recommendation scenarios demonstrated the effectiveness of the multi-level and multi-interest mechanisms. Further studies on the number of interests and the multi-level user interest distribution confirmed that our method delivers recommendation interpretation at multiple granularities.

Acknowledgements.
This work was supported by the National Natural Science Foundation of China (No. 61872278) and the Young Top-notch Talent Cultivation Program of Hubei Province. Chenliang Li is the corresponding author.

References

  • Cen et al. (2020) Yukuo Cen, Jianwei Zhang, Xu Zou, Chang Zhou, Hongxia Yang, and Jie Tang. 2020. Controllable multi-interest framework for recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2942–2951.
  • Chang et al. (2021) Jianxin Chang, Chen Gao, Yu Zheng, Yiqun Hui, Yanan Niu, Yang Song, Depeng Jin, and Yong Li. 2021. Sequential Recommendation with Graph Neural Networks. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 378–387.
  • Chen et al. (2018) Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2018. Sequential recommendation with user memory networks. In Proceedings of the eleventh ACM international conference on web search and data mining. 108–116.
  • Cheng et al. (2013) Chen Cheng, Haiqin Yang, Michael R Lyu, and Irwin King. 2013. Where you like to go next: Successive point-of-interest recommendation. In Twenty-Third international joint conference on Artificial Intelligence.
  • Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10.
  • Fan et al. (2021) Ziwei Fan, Zhiwei Liu, Jiawei Zhang, Yun Xiong, Lei Zheng, and Philip S Yu. 2021. Continuous-time sequential recommendation with temporal graph collaborative transformer. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 433–442.
  • Guo et al. (2017) Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).
  • He et al. (2017) Ruining He, Wang-Cheng Kang, and Julian McAuley. 2017. Translation-based recommendation. In Proceedings of the eleventh ACM conference on recommender systems. 161–169.
  • He and McAuley (2016) Ruining He and Julian McAuley. 2016. Fusing similarity models with markov chains for sparse sequential recommendation. In 2016 IEEE 16th International Conference on Data Mining. 191–200.
  • He and Chua (2017) Xiangnan He and Tat-Seng Chua. 2017. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. 355–364.
  • Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
  • Huang et al. (2018) Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y Chang. 2018. Improving sequential recommendation with knowledge-enhanced memory networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 505–514.
  • Li et al. (2019) Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Huan Zhao, Pipei Huang, Guoliang Kang, Qiwei Chen, Wei Li, and Dik Lun Lee. 2019. Multi-interest network with dynamic routing for recommendation at Tmall. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2615–2623.
  • Li et al. (2020) Shihao Li, Dekun Yang, and Bufeng Zhang. 2020. MRIF: Multi-resolution Interest Fusion for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1765–1768.
  • Pi et al. (2019) Qi Pi, Weijie Bian, Guorui Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Practice on long sequential user behavior modeling for click-through rate prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2671–2679.
  • Qu et al. (2016) Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In 2016 IEEE 16th International Conference on Data Mining (ICDM). 1149–1154.
  • Rendle (2010) Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference on data mining. 995–1000.
  • Rendle et al. (2010) Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web. 811–820.
  • Sabour et al. (2017) Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic Routing Between Capsules. In the 31st Conference on Neural Information Processing Systems. 3856–3866.
  • Sha and Wang (2017) Ying Sha and May D Wang. 2017. Interpretable predictions of clinical outcomes with an attention-based recurrent neural network. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 233–240.
  • Shan et al. (2016) Ying Shan, T Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, and JC Mao. 2016. Deep crossing: Web-scale modeling without manually crafted combinatorial features. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 255–262.
  • Tang and Wang (2018) Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 565–573.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008.
  • Wang et al. (2020) Yaqing Wang, Caili Guo, Yunfei Chu, Jenq-Neng Hwang, and Chunyan Feng. 2020. A cross-domain hierarchical recurrent model for personalized session-based recommendations. Neurocomputing 380 (2020), 271–284.
  • Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
  • Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 974–983.
  • Yu et al. (2019) Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu, and Xing Xie. 2019. Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. In the 28th International Joint Conference on Artificial Intelligence. 4213–4219.
  • Zhang et al. (2016) Weinan Zhang, Tianming Du, and Jun Wang. 2016. Deep learning over multi-field categorical data. In The 38th European Conference on Information Retrieval. 45–57.
  • Zhou et al. (2018) Yuwen Zhou, Changqin Huang, Qintai Hu, Jia Zhu, and Yong Tang. 2018. Personalized learning full-path recommendation model based on LSTM neural networks. Information Sciences (2018), 135–152.