Multi-trends Enhanced Dynamic Micro-video Recommendation

Yujie Lu (UC Santa Barbara, United States), Yingxuan Huang (The University of Hong Kong, China), Shengyu Zhang (Zhejiang University, China), Wei Han (Singapore University of Technology and Design, China), Hui Chen (Singapore University of Technology and Design, China), Fei Wu (Zhejiang University, China), and Zhou Zhao (Zhejiang University, China)

Abstract.

The explosive growth of micro-videos on content-sharing platforms calls for recommender systems that enable easy, personalized micro-video discovery. Recent advances in micro-video recommendation have achieved remarkable performance in mining users’ current preference based on historical behaviors. However, most of them neglect the dynamic and time-evolving nature of users’ preference, and predicting future micro-videos with historically mined preference may deteriorate the effectiveness of recommender systems. In this paper, we propose to explicitly model dynamic multi-trends of users’ current preference and make predictions based on both the history and future potential trends. We devise the DMR framework, which comprises: 1) the implicit user network module, which identifies sequence fragments from other users with similar interests and extracts the sequence fragments that chronologically follow the identified fragments; 2) the multi-trend routing module, which assigns each extracted sequence fragment to a trend group and updates the corresponding trend vector; 3) the history-future trend prediction module, which jointly uses the history preference vectors and future trend vectors to yield the final click-through rate. We validate the effectiveness of the proposed framework against multiple state-of-the-art micro-video recommenders on two publicly available real-world datasets. Extensive analysis further demonstrates the superiority of modeling dynamic multi-trends for micro-video recommendation.

Micro-video Recommendation; Dynamic User Modeling; Future-aware; Multi-trend Routing

Conference: Proceedings of the 29th ACM International Conference on Multimedia (MM ’21), October 20–24, 2021, Chengdu, Sichuan, China. CCS Concepts: Information systems → Recommender systems.

1. Introduction

In recent years, the number of searchable micro-videos has increased dramatically, exacerbating the need for recommender systems that can effectively mine users’ preference and identify micro-videos of potential interest in a personalized manner. Owing to its powerful representation learning capacity, the rapid development of deep learning techniques has nourished the research field of recommendation (Lu et al., 2020; Du et al., 2020; He et al., 2020; Kang and McAuley, 2018b; Li et al., 2020; Sun et al., 2019; Tang and Wang, 2018a; Wang et al., 2018, 2019a; Wei et al., 2020, 2019b; Wu et al., 2019a; Yang et al., 2020; Yu et al., 2020). This development has also given rise to diverse models for video recommendation, which can be roughly categorized into collaborative filtering (Baluja et al., 2008; Huang et al., 2016), content-based filtering (Cui et al., 2014; Mei et al., 2011; Park, 2010; Zhou et al., 2015; Dong et al., 2018), and hybrid ones (Chen et al., 2012, 2016; Yan et al., 2015).

Compared with professional video recommendation, micro-video recommendation poses many unique challenges. For example, micro-videos typically lack metadata (e.g., genre, director, and actor/actress, which are commonly available for professional videos), leading to a semantic gap in representation (Chen et al., 2018b). Moreover, users might be interested in multiple topics of videos simultaneously, i.e., diverse interests, and express interest to different extents (e.g., like, follow, click), i.e., multi-level interests (Li et al., 2019b). Recent years have witnessed much progress in confronting these challenges. THACIL (Chen et al., 2018b) employs temporal block splitting and hierarchical multi-head attention to model diverse interests across blocks. ALPINE (Li et al., 2019b) models users’ dynamic interests by constructing a temporal behavior graph and devising a temporal graph-based LSTM. MTIN (Jiang et al., 2020a) considers personalized importance decay over time and diverse interests using an item-level temporal mask and a group routing mechanism, respectively. Despite the great advances of these works, we argue that solely modeling historical behaviors limits the capacity of user modeling to capture users’ diverse and dynamic interests. For example, MTIN (Jiang et al., 2020a) assigns historically interacted items to one of six interest groups and updates the six interest vectors accordingly. Since users’ interests are by nature dynamic, the interests learned from the logged data might be out of date or at least limited to the history, falling short in recommending fresh items and hurting recommendation diversity. Therefore, capturing dynamic interest trends based on (but not limited to) historical items can be an indispensable function for high-quality recommender systems.

To this end, we devise the multi-trends framework for dynamic micro-video recommendation, abbreviated as DMR. We start from the perspective that trends refer to the possible future directions of the current interest implied by the logged interactions. Since we have no access to items interacted with in the future, DMR encapsulates an implicit user network construction module that first identifies, from similar users, sequence fragments that reflect interests similar to those of the current sequence. Then, we construct possible trending sequences by extracting the sequence fragments that chronologically follow the identified ones. We note that some trending sequences may share similar interests, and representing each sequence as an individual interest may introduce unnecessary noise and computation costs. Therefore, inspired by (Jiang et al., 2020a; Li et al., 2019a), we devise a multi-trend routing module that transforms multiple trending sequences into a smaller number of trend interest vectors. However, extracting trending sequences and mapping them to trend vectors at every test-time inference might hurt the serving efficiency of industrial deployment. Thus, the multi-trend routing module constructs a fixed-length trend memory for each user and reads and writes the memory during training. For memory reading and writing, we propose to assign trending sequences to memory slots in a soft way and power the process with attention mechanisms. During inference, we directly take the off-the-shelf history/trending vectors without extracting or transforming trending sequences, thus addressing the efficiency issue. Predictions are performed with the history-trend joint prediction module.

In summary, the DMR framework makes predictions based on both the history interests implied by the historical behaviors and the multi-trends implied by similar users, which helps to capture more diverse and dynamic interests than existing micro-video recommenders. We validate the effectiveness of DMR on micro-video recommendation benchmarks. The substantial improvement over state-of-the-art comparison methods and the in-depth model analysis demonstrate the superiority of modeling multi-trends for micro-video recommendation. Overall, this paper makes the following contributions:

• We propose to capture more diverse and dynamic interests beyond the historical behaviors by modeling the possible interest trends for micro-video recommendation.

• We devise the novel DMR framework, which encapsulates the implicit user network construction module, which extracts trending sequences from similar users; the multi-trend routing module, which performs dynamic trending memory read-write and improves efficiency at the inference stage; and the history-trend joint prediction module.

• We conduct extensive experiments on micro-video recommendation benchmarks, and the results show that the DMR framework achieves high-quality recommendation with improvements in both accuracy and diversity.

2. Related Work

2.1. Video Recommendation

The methods for recommendation can be generally classified into two categories. Early algebraic approaches adopted collaborative filtering (Konstan et al., 1997; Ding and Li, 2005; He et al., 2017b; Sarwar et al., 2001) or model-based methods (Rendle et al., 2010; Deshpande and Karypis, 2004; Wang et al., 2015; Kim and Shim, 2014) to estimate user-item correlations and make predictions about users’ future interests. Collaborative filtering (CF) assumes that users sharing the same opinion on one issue tend to have more similar opinions on other issues (Koren and Bell, 2015), and thus it makes predictions specific to each user through information gleaned from other users (Terveen and Hill, 2001). Due to the extremely high computational complexity and data sparsity of traditional CF (Deshpande and Karypis, 2004; Papagelis et al., 2005), model-based methods alleviate this overhead by mapping user-item interactions into matrix entries and then applying factorization to the characteristic matrix to build nonlinear models that estimate correlations (i.e., preferences) between every user-item pair, and they employ hidden Markov models (HMMs) to capture temporal trends of preferences (Rendle et al., 2010).

Recently, with the major advances in deep learning techniques, a wealth of research has sprung up on incorporating them into recommender systems. Most of this work reformulates the traditional estimation problem as a learning task based on deep neural networks (Fan et al., 2019; Smirnova and Vasile, 2017; He et al., 2018). In the field of video recommendation, representative work focuses on content-based learning (Covington et al., 2016; Wang et al., 2019b; Wei et al., 2019a; Deldjoo et al., 2016), in which features of videos are extracted into embedding vectors and then matched with user representations that indicate individual preference. For example, Chen et al. (2017) tackled the item- and component-level implicit feedback issue in multimedia recommendation by learning independent video and user characteristics in a unified hierarchical attention network and then computing pair-wise scores as a measure of user preference. Although these works improve the accuracy of user modeling, they lack a clear partition of history and future for the given dataset and hence may encounter prediction bias due to mixing the two parts together. Our work adopts a multi-step time partition and similarity matching approach to alleviate this issue.

2.2. User Behavior Modeling

Modeling latent user interest from historical behaviors is commonly used in recommender systems. In the past two decades, a variety of approaches have been proposed, ranging from Markov chains (Shani et al., 2005; Rendle et al., 2010; He et al., 2016; He and McAuley, 2016; Norris and Norris, 1998) and traditional collaborative filtering (Koren, 2009; Ding and Li, 2005; Salakhutdinov and Mnih, 2008) to deep representation learning (Qu et al., 2016; Zhou et al., 2018). The approaches based on Markov decision processes implicitly track user state dynamics to predict future behaviors. For example, Rendle et al. (2010) captured long-term user interest via personalized transition graphs over underlying Markov chains. He and McAuley (2016) integrated similarity-based methods with Markov chains smoothly in personalized sequential recommender systems. Besides, temporal collaborative filtering is proposed to deal with the drifting user preferences. Koren (2009) offered a paradigm that tracks time changing behaviors throughout the life span of the data.

With the development of deep learning, more and more researchers have adopted deep neural networks (DNN) to model user dynamics in recommender systems. In particular, Hidasi et al. (2015) applied recurrent neural networks (RNN) to model the whole session and introduced a new ranking loss function to make recommendations more accurate. Tang and Wang (2018b) utilized convolutional filters to embed a sequence of recent items into an “image” in the time and latent spaces and to learn sequential patterns as local features of the image. Wu et al. (2019b) considered session sequences as graph-structured data and used graph neural networks (GNN) to capture complex transitions of items. Recently, the self-attention mechanism (Vaswani et al., 2017) has been widely employed in recommender systems (Kang and McAuley, 2018a). For instance, Wu et al. (2020) proposed a Contextualized Temporal Attention Mechanism that weighs the influence of historical actions based not only on what the action is, but also on when and how the action took place.

However, previous work does not consider the influence of future information when modeling user behaviors in history sequences. In this work, we construct a user-item heterogeneous graph to capture the future interactions of each user with items.

3. Methodology

Figure 1. Network architecture of DMR. DMR is composed of an implicit user network module, a multi-trend routing module, a multi-level time attention layer, and a prediction layer. Based on the users’ historical interactions, we build an implicit user network to construct future sequences. The multi-trend module is applied to the current user’s history sequences and future sequences in parallel to obtain the representation of each trend group. The multi-level time attention mechanism is applied before the pooling layer to generate the history trend representation and the future trend representation, which are further concatenated into the dynamic user preference representation. Finally, the user representation and the candidate micro-video embedding are used for prediction in the classifier.

In this section, we first formulate the micro-video recommendation problem and then introduce the proposed framework in detail. As illustrated in Figure 1, our proposed DMR framework for dynamic micro-video recommendation mainly comprises three modules: 1) an implicit user network module enhanced by the Pearson Correlation Coefficient; 2) a history-future multi-trend joint routing module; 3) a multi-level time-aware attention module.

3.1. Problem Formulation

In a typical micro-video recommendation scenario, we have a set of users and a set of micro-videos, which can be denoted as ${U=\{u_{1},u_{2},u_{3},...,u_{|U|}\}}$ and ${V=\{v_{1},v_{2},v_{3},...,v_{|V|}\}}$ respectively. Let ${I_{u}=\{x_{1}^{u},x_{2}^{u},...,x_{|I_{u}|}^{u}\}}$ represent the sequence of micro-videos ${x\in I_{u}}$ that user ${u\in U}$ has interacted with, sorted in chronological order according to the timestamp of each interaction, and let ${x_{t}^{u}}$ denote the micro-video that user ${u}$ interacted with at timestamp ${t}$ . The interaction sequence ${I_{u}}$ is split into ${I_{+}}$ and ${I_{-}}$ , which represent the micro-videos clicked by the user and the ones not clicked, respectively. Given the user’s historical micro-video interaction behaviors, the goal of the micro-video recommendation task in this paper is to predict the probability that a new candidate micro-video will be clicked by user ${u}$ . Notations are summarized in Table 1.

Table 1. Notations.

Notation	Description
u	a user
v	a micro-video
x	an interaction
d	the dimension of user/micro-video embeddings
t	the number of trends
U	the set of users
V	the set of micro-videos
I	the set of interactions
T	the trends set

Specifically, each instance is represented by a tuple ${(I_{u},A_{i})}$ , where ${I_{u}}$ denotes the set of items interacted with by user ${u}$ and ${A_{i}}$ denotes the features of target item ${i}$ , including the interaction timestamp and the micro-video embeddings. Through the implicit user network module, we extract the relative future sequence of user ${u}$ based on ${I_{u}}$ and the historical interactions of similar users ${I_{u^{\prime}},u^{\prime}\in U}$ . The details are given in Section 3.3.

To model diverse user preferences dynamically, DMR learns a function ${f}$ for mapping history trend set ${T_{u}^{h}}$ and future trend set ${T_{u}^{f}}$ into user representations, which can be formulated as:

(1)

\overrightarrow{e_{u}}=f(T_{u}^{h},T_{u}^{f})

where ${\overrightarrow{e_{u}}\in\mathbb{R}^{d\times 1}}$ denotes the representation vector of user ${u}$ , ${d}$ the dimension. Besides, the representation vector of target micro-video ${i}$ is obtained by an embedding function ${g}$ as:

(2)

\overrightarrow{e_{i}}=g(A_{i})

where ${\overrightarrow{e_{i}}\in\mathbb{R}^{d\times 1}}$ denotes the representation vector of target micro-video ${i}$ .

Based on the learned user representation vector and micro-video representation vector, the probability of candidate micro-video is calculated using the likelihood function ${P}$ as:

(3)

p(i|U,V,X)=P(\overrightarrow{e_{u}},\overrightarrow{e_{i}})

where ${\overrightarrow{e_{i}}}$ is the embedding of target item ${i}$ from the set of micro-videos ${V}$ . Our framework outputs the click probabilities of the candidate micro-videos to rank the personalized recommendation list. The system then provides precise and diversified recommendations for each user, reflecting the potential preference of the specific user, who is most likely to interact with the recommended micro-videos.

The objective function for training our model is described in Section 3.6. We use the Adam optimizer to train our method.

3.2. Overview

The overall structure of our proposed framework DMR is illustrated in Figure 1. It is composed of an implicit user network module, a multi-trend routing module, a multi-level time-aware attention module, and a prediction layer. As the relative future sequence of the current user is actually the history sequence of the neighbors, the multi-trend routing algorithm is applied to both the future and history sequences in parallel using shared parameters. The framework takes the set of users’ historical interactions ${X}$ as input. We use ${X^{u}_{1,N-K}}$ and ${X^{u}_{N-K+1,N}}$ to represent the training and testing portions of the interaction sequence of user ${u}$ , respectively. ${N}$ and ${K}$ denote the selected total length of the interaction sequence of each user ${u}$ and the length of the held-out testing sequence, respectively. For micro-videos from the set ${X^{u}_{1,N-K}}$ , the embeddings are denoted as ${\overrightarrow{e}_{X^{u}_{1,N-K}}}$ .
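The chronological split into ${X^{u}_{1,N-K}}$ and ${X^{u}_{N-K+1,N}}$ can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record layout `(timestamp, video_id, clicked)`, the function names, and the choice to keep the most recent ${N}$ interactions are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical record layout (an assumption): (timestamp, video_id, clicked)
Interaction = Tuple[int, str, bool]


def split_history(interactions: List[Interaction], n: int, k: int):
    """Sort by time, keep the latest n records, hold out the last k for testing."""
    ordered = sorted(interactions, key=lambda rec: rec[0])
    ordered = ordered[max(0, len(ordered) - n):]   # the selected N interactions
    k = min(k, len(ordered))                       # guard against short histories
    cut = len(ordered) - k
    return ordered[:cut], ordered[cut:]            # X_{1,N-K}, X_{N-K+1,N}


def split_by_click(sequence: List[Interaction]):
    """Separate clicked (I_+) from non-clicked (I_-) interactions."""
    positives = [rec for rec in sequence if rec[2]]
    negatives = [rec for rec in sequence if not rec[2]]
    return positives, negatives
```

For example, `split_history(log, n=4, k=1)` on a four-record log returns the three earliest records as training data and the latest record as testing data.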

The implicit user network module constructs a neighbor set for each user by selecting the users that have similar micro-video preference, as indicated by their past behaviors, and then extracts the relative future sequences from each neighbor. The query items can be selected from the user’s historical interactions ${I_{u}}$ . For simplicity, we solely choose the last one in the list, which is both efficient and effective, as demonstrated in the empirical analysis. The relative future behaviors are defined as the interacted items that follow the query item in chronological order, aiming to represent the dynamic preference of the user. The intuition behind this is that a user tends to have preference trends similar to those of users with similar historical behaviors, and that a user can have diverse and dynamic trends of preference.

The multi-trend routing module is developed to obtain the neighbor centroids according to the diverse motivations behind specific interactions with micro-videos. We then learn future-aware diverse trends based on the history and future sequences jointly. Furthermore, the user representation evolved from the future sequence, acquired by the time-aware attention layer, is concatenated with the user representation evolved from the historical behaviors to generate the dynamic user preference representation vector. Finally, we compute the user’s preferences over different micro-videos from the pool with the prediction decoder. Each part is elaborated in the following sections.

Algorithm 1 Implicit User Network Construction

Input: the set of users ${U}$ ; the user’s historical interaction sequence ${I_{u}}$ ; the user’s query item sequence ${K_{u}}$ with upper bound ${k}$ ; the user’s candidate neighbors ${G_{u}}$ with upper bound ${g}$ ; the similarity threshold ${\tau}$ for neighbor selection.
Output: the extracted neighbor set ${N_{u}}$ of each user ${u\in U}$ .

for each ${u\in U}$ do
    ${N_{u}\leftarrow\emptyset}$
    for each ${i\in Inverse(I_{u})}$ do
        if ${|K_{u}|<k}$ then
            ${K_{u}\leftarrow INSERT(i)}$
        end if
    end for
end for
for each ${u\in U}$ do
    for each ${n\in U}$ do
        ${s_{un}=USER\_SIMILARITY(u,n)}$
        if ${n\neq u\wedge|G_{u}|<g\wedge s_{un}>\tau}$ then
            ${G_{u}\leftarrow INSERT(n)}$
        end if
    end for
    ${N_{u}\leftarrow TOP\_SIMILARITY(G_{u})}$
end for
return ${N_{u}}$
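A minimal Python sketch of Algorithm 1 follows. It mirrors the two loops as written in the algorithm: the first collects the ${k}$ most recent items as query items, and the second gathers up to ${g}$ candidates whose similarity exceeds ${\tau}$ and keeps the most similar ones. The `similarity` callback stands in for USER_SIMILARITY, and the `top` cutoff used for TOP_SIMILARITY is a hypothetical parameter. Neither is specified in the paper.

```python
from typing import Callable, Dict, List, Tuple


def build_neighbor_sets(
    histories: Dict[str, List[str]],           # user -> chronologically ordered item ids
    similarity: Callable[[str, str], float],   # stand-in for USER_SIMILARITY (assumption)
    k: int = 1,                                # upper bound on query items
    g: int = 50,                               # upper bound on candidate neighbors
    tau: float = 0.5,                          # similarity threshold
    top: int = 5,                              # hypothetical cutoff for TOP_SIMILARITY
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    # First loop: walk each history backwards and keep at most k query items.
    query_items = {u: list(reversed(seq))[:k] for u, seq in histories.items()}

    # Second loop: collect candidates above the threshold, then rank them.
    neighbors: Dict[str, List[str]] = {}
    for u in histories:
        candidates: List[Tuple[float, str]] = []
        for n in histories:
            if n == u or len(candidates) >= g:
                continue
            s_un = similarity(u, n)
            if s_un > tau:
                candidates.append((s_un, n))
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        neighbors[u] = [n for _, n in candidates[:top]]
    return query_items, neighbors
```

Any pairwise user similarity can be plugged in through `similarity`, for instance the mapped Pearson Correlation Coefficient of Section 3.3.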

3.3. Implicit User Network

As shown in Figure 2, the implicit user network is constructed based on a user-item heterogeneous graph, which contains both user nodes and item nodes. An edge in the graph represents an interaction between a user and an item. The weight of the edge indicates the temporal weight of each interacted item in chronological order. The query items are selected in a multi-hop manner. The user nodes connected to the selected query items are considered the candidate neighbor nodes of the current user.

Inspired by works (Guo et al., 2016; Felicio et al., 2016) that extract social relationships in the absence of explicit social networks (Mukherjee and Guennemann, 2019), we construct the user network implicitly from user-item correlations.

Specifically, we compare the similarity among users implicitly via collaborative filtering, based on their historical interactions with micro-videos. As it is a widely used similarity measure, we adopt the Pearson Correlation Coefficient (PCC) (Breese et al., 2013) to compute a linear correlation between the user and each candidate neighbor as:

(4)

s_{ij}=\frac{\sum\limits_{k\in I(i)\cap I(j)}(r_{ik}-\overline{r}_{i})\cdot(r_{jk}-\overline{r}_{j})}{\sqrt{\sum\limits_{k\in I(i)\cap I(j)}(r_{ik}-\overline{r}_{i})^{2}}\cdot\sqrt{\sum\limits_{k\in I(i)\cap I(j)}(r_{jk}-\overline{r}_{j})^{2}}}

where ${I(i)}$ is the set of micro-videos that user ${i}$ interacted with, and ${r_{ik}}$ and ${\overline{r}_{i}}$ represent the level (click or not click) of interaction of user ${i}$ with micro-video ${k}$ and the average level of action of user ${i}$ , respectively. The user similarity ${s_{ij}}$ ranges over ${[-1,1]}$ , and a larger value indicates a higher similarity between users ${i}$ and ${j}$ . Following (Ma, 2013), we employ a mapping function ${f(x)=(x+1)/2}$ to bound the range of PCC similarities into ${[0,1]}$ .

For users with only one common micro-video in their histories, the PCC similarity is $1$ when the users’ preferences over the common micro-video are similar and $-1$ when they are not, which encourages the diversity of neighbors but damages the fairness of the similarity calculation. To tackle this issue, we keep fewer than ${20\%}$ of such neighbor nodes to strike a balance.
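The similarity in Eq. (4), together with the mapping ${f(x)=(x+1)/2}$ , can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors’ implementation. It assumes that interaction levels are stored as `{video_id: level}` dictionaries, that ${\overline{r}_{i}}$ is the mean over all of user ${i}$ ’s interactions, and that an undefined correlation (no common items or a zero denominator) is treated as a raw value of 0.

```python
import math
from typing import Dict


def pcc_similarity(r_i: Dict[str, float], r_j: Dict[str, float]) -> float:
    """Pearson correlation over co-interacted items, mapped from [-1, 1] to [0, 1]."""
    common = r_i.keys() & r_j.keys()          # I(i) ∩ I(j)
    if not common or not r_i or not r_j:
        return 0.5                            # assumption: undefined -> raw PCC of 0
    mean_i = sum(r_i.values()) / len(r_i)     # average interaction level of user i
    mean_j = sum(r_j.values()) / len(r_j)
    num = sum((r_i[k] - mean_i) * (r_j[k] - mean_j) for k in common)
    den_i = math.sqrt(sum((r_i[k] - mean_i) ** 2 for k in common))
    den_j = math.sqrt(sum((r_j[k] - mean_j) ** 2 for k in common))
    raw = num / (den_i * den_j) if den_i > 0 and den_j > 0 else 0.0
    return (raw + 1.0) / 2.0                  # f(x) = (x + 1) / 2
```

With a single common micro-video, the raw correlation collapses to $1$ or $-1$ (mapped to 1.0 or 0.0), which is the degenerate case discussed above.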

In addition to the PCC method, we also design a filter with a simple schema to extract similar users. For each user, the historical interactions ${I_{u}}$ are split into two pieces, ${I^{u}_{1:t_{1}}}$ for training and ${I^{u}_{t_{1}:t_{2}}}$ for testing. The query items ${\hat{I}^{u}_{k}}$ are defined as the last ${k}$ micro-videos, where ${k}$ can be any value less than or equal to ${|I^{u}|}$ ; in practice, ${k=1}$ achieves good enough performance with simplicity. We extract a list of neighbors ${N=\{n_{1},n_{2},...,n_{|N|}\}}$ according to the query item. The details of this process are described in Algorithm 1. Furthermore, we construct the future sequence of user ${u}$ as:

(5)

F_{u}=\{n_{f},n_{f}\in I^{n},TI(n_{f})\geq TI(I^{u}_{|I_{u}|-k})\}

where ${TI}$ denotes the timestamp, ${I^{u}_{|I_{u}|-k}}$ denotes the query item, and ${I^{n}}$ represents the interaction set of neighbor ${n}$ .
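Eq. (5) can be read as a simple filter over the neighbors’ histories, sketched below in Python. The sketch assumes that each history is a chronologically sorted list of `(timestamp, video_id)` pairs and that the query item is the ${k}$ -th item from the end (0-based index ${|I_{u}|-k}$ ). The `max_len` cap follows the limit of 100 micro-videos per neighbor stated in Section 4.2. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (timestamp, video_id), an assumed layout


def future_sequence(
    user_history: List[Record],
    neighbor_histories: Dict[str, List[Record]],
    k: int = 1,
    max_len: int = 100,
) -> List[Record]:
    """Collect neighbor interactions that are not earlier than the query item."""
    if not 1 <= k <= len(user_history):
        raise ValueError("k must be between 1 and the length of the history")
    query_time = user_history[-k][0]          # TI of the query item
    future: List[Record] = []
    for history in neighbor_histories.values():
        later = [rec for rec in history if rec[0] >= query_time]
        future.extend(later[:max_len])        # keep at most max_len per neighbor
    return sorted(future, key=lambda rec: rec[0])
```

Because the filter only compares timestamps, the "future" of the current user is built entirely from logged history of the neighbors, so no information from the user’s own held-out interactions is needed.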

3.4. Multi-trend Routing

To capture the trend information that lies in both the history sequence and the future sequence, we devise a two-stage multi-trend routing module that generates the trend representations in parallel. Specifically, in the first stage, we group each micro-video from both the user’s historical sequence and the extracted relative future sequence into diverse trends. The micro-videos that are grouped into the same trend are considered similar according to users’ interactions with them and their own basic features. In the second stage, the micro-videos from the historical sequence and the relative future sequence are used to generate the representations of the history and future trend groups in parallel.

Based on the positive historical interaction sequence ${I_{+}}$ of user ${u}$ , we represent each micro-video ${x}$ in ${I_{+}}$ as an embedding vector ${\overrightarrow{x}\in\mathbb{R}^{d}}$ , where ${d}$ is the embedding size. And we initialize positive history trend group as ${T_{u}^{h}\in\mathbb{R}^{s\times d}}$ for user ${u}$ , where ${s}$ denotes the number of trend groups indicated from historical sequence and ${d}$ denotes the embedding dimension of each history trend. Specifically, each trend embedding is represented as ${\overrightarrow{t}\in\mathbb{R}^{d}}$ .

Similarly, based on the future sequence ${F_{+}}$ extracted from the implicit user network, the positive future trend group is denoted as ${T_{u}^{f}\in\mathbb{R}^{s\times d}}$ for user ${u}$ , where ${s}$ denotes the number of trend groups indicated by the future sequence and ${d}$ denotes the embedding dimension of each future trend.

To fine-tune the representation of each trend, we apply an attention mechanism over each micro-video and the initialized trend group. Given the micro-video embedding ${\overrightarrow{x}\in\mathbb{R}^{d}}$ and the trend embedding ${\overrightarrow{t}\in\mathbb{R}^{d}}$ , we calculate the weight between the micro-video and the trend based on a co-attention memory matrix. The micro-videos from the history sequence and the future sequence are routed into history trends and future trends separately. As the history sequence and the future sequence are processed separately, our module is capable of capturing the timeliness of trends, which indicates evolved user interest.
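The soft assignment of micro-videos to trend groups can be sketched as follows. The paper computes the weights with a co-attention memory matrix whose exact form is not given here, so this sketch swaps in a plain dot-product score followed by a softmax. The function names and the additive update rule are assumptions made for illustration only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                               # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def route_to_trends(items: List[Vector], trends: List[Vector]) -> List[Vector]:
    """One soft routing pass: each item is spread over the trend slots by
    attention weights, and each slot accumulates its weighted share."""
    updated = [list(t) for t in trends]           # copy, leave the inputs untouched
    for x in items:
        weights = softmax([dot(x, t) for t in trends])
        for slot, w in zip(updated, weights):
            for d in range(len(slot)):
                slot[d] += w * x[d]
    return updated
```

The same function can be called once on the history sequence and once on the future sequence, which matches the description that the two sequences are routed separately.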

3.5. Multi-level Time Attention Mechanism

At the item level, we use the weighted sum of historical micro-video features to obtain the current micro-video representation. We then obtain the representation of each trend by applying the attention mechanism to each micro-video in the trend group. At the trend level, we use time-aware attention to activate the weights of diverse trends and capture the timeliness of each trend. Specifically, the attention function takes the interaction time of item ${i}$ , the interaction time of the trends, and the trend embeddings as the query, key, and value, respectively. We compute the final trend representation of the future sequence of user ${u}$ as:

(6)

HF_{u}=Attention(\overrightarrow{TI_{i}},\overrightarrow{TI_{tr}},\overrightarrow{t_{u}})=\overrightarrow{t_{u}}softmax(pow(\overrightarrow{TI_{i}},\overrightarrow{TI_{tr}}))

where Attention denotes the attention function, ${TI_{i}}$ represents the interaction time of micro-video ${i}$ , ${TI_{tr}}$ represents the average interaction time of micro-videos related to the trend group, ${\overrightarrow{t_{u}}}$ represents the embedding of the specific trend group.

The trend groups generated from the user’s historical sequence and future sequence are then updated by adding the aggregated history and future trend representations to the corresponding trend groups in ${T_{u}^{h}}$ and ${T_{u}^{f}}$ , respectively.

3.6. Prediction

After computing the trend embeddings from the activated trends through the time-aware attention layer, we apply sum pooling to both the history and future trend representations:

(7)

e_{u}^{h}=sumpooling(T_{u}^{h_{1}},...,T_{u}^{h_{s}}),e_{u}^{f}=sumpooling(T_{u}^{f_{1}},...,T_{u}^{f_{s}})

And then we concatenate the history trend representation vector ${e_{u}^{h}}$ and future trend representation vector ${e_{u}^{f}}$ to form a user preference embedding ${\overrightarrow{e_{u}}}$ as:

(8)

\overrightarrow{e_{u}}=e_{u}^{h}\frown e_{u}^{f}

Given a training sample ${(u,i)}$ with the user preference embedding ${\overrightarrow{e_{u}}}$ and the micro-video embedding ${\overrightarrow{e_{i}}}$ , as well as the micro-video set ${V}$ , we can predict the probability of the user interacting with the micro-video as

(9)

p(i|U,V,I)=\frac{exp(\overrightarrow{e_{u}}^{T}\overrightarrow{e_{i}})}{\sum_{v\in V}exp(\overrightarrow{e_{u}}^{T}\overrightarrow{e_{v}})}
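Eqs. (7) to (9) chain sum pooling, concatenation, and a softmax over candidate micro-videos. The Python sketch below walks through these three steps. It assumes that candidate embeddings have the same length as the concatenated user vector (twice the trend dimension) so that the inner product is defined. This is an assumption of the sketch, and all names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def sum_pool(vectors: List[Vector]) -> Vector:
    """Element-wise sum over the trend vectors of one group (Eq. 7)."""
    return [sum(column) for column in zip(*vectors)]


def click_distribution(
    history_trends: List[Vector],
    future_trends: List[Vector],
    candidates: Dict[str, Vector],
) -> Dict[str, float]:
    # Eq. 8: concatenate the pooled history and future representations.
    e_u = sum_pool(history_trends) + sum_pool(future_trends)
    # Eq. 9: softmax over the inner products with every candidate embedding.
    scores = {v: sum(a * b for a, b in zip(e_u, e_v)) for v, e_v in candidates.items()}
    m = max(scores.values())                      # subtract the max for stability
    exps = {v: math.exp(s - m) for v, s in scores.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}
```

In practice the normalization over all of ${V}$ is expensive for millions of micro-videos, so the sum is commonly restricted to a sampled subset; the sketch normalizes over whatever candidates are passed in.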

In the same way, we calculate the prediction score ${p(x|H_{-})}$ based on the negative interaction sequence, which aims to maximize the distance between the new micro-video embedding and the user’s negative trend embeddings.

The final recommendation probability ${\hat{p}_{ui}}$ is represented by the linear combination of ${p(x|H_{+})}$ and ${p(x|H_{-})}$ . The objective function of our model is as follows:

(10)

\mathbb{L}=-\sum\limits_{u\in U}\left(\sum\limits_{i\in H_{+}}\log\sigma(\hat{p}_{ui})+\sum\limits_{i\in H_{-}}\log(1-\sigma(\hat{p}_{ui}))\right)

where ${\hat{p}_{ui}}$ denotes the prediction score of micro-video ${i}$ for user ${u}$ , ${\sigma}$ represents the sigmoid activation function.
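The objective in Eq. (10) is a binary cross-entropy over clicked and non-clicked micro-videos, summed over users. A small Python sketch is given below. The dictionary layout, the function names, and the `eps` term that guards the logarithm are assumptions added for illustration.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    # Numerically safe form for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def click_loss(
    scores_pos: Dict[str, List[float]],   # user -> scores of clicked items (H_+)
    scores_neg: Dict[str, List[float]],   # user -> scores of non-clicked items (H_-)
    eps: float = 1e-12,                   # assumption: avoids log(0)
) -> float:
    loss = 0.0
    for u in scores_pos.keys() | scores_neg.keys():
        loss -= sum(math.log(sigmoid(p) + eps) for p in scores_pos.get(u, []))
        loss -= sum(math.log(1.0 - sigmoid(p) + eps) for p in scores_neg.get(u, []))
    return loss
```

A score of 0 for one positive and one negative item yields a loss of about $2\ln 2$ , and the loss decreases as positive scores grow and negative scores shrink.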

4. Experiments

4.1. Dataset

MicroVideo-1.7M and KuaiShou were used as micro-video benchmark datasets in our experiments. Micro-video data and user-video interaction information can be found in each of these datasets. Each micro-video is represented by its features in these two datasets, and each interaction record includes the userID, micro-video ID, visited timestamp, and whether the user clicked the video. The two datasets’ statistics are shown in Table 2.

• MicroVideo-1.7M (Chen et al., 2018a): This dataset comes from the real data of a micro-video sharing service in China and contains 1.7 million micro-videos.

• KuaiShou: This dataset was released by the Kuaishou Competition at the China MM 2018 conference.

Table 2. Statistics of the Datasets.

Dataset	users	items	interactions	train	test
MicroVideo-1.7M	10,986	1,704,880	12,737,619	8,970,310	3,767,309
KuaiShou	10,000	3,239,534	13,661,383	10,931,092	2,730,291

4.2. Implementation Details

We used TensorFlow on four Tesla P40 GPUs to train our model. The hyper-parameters are as follows: the micro-video embeddings are 512-dimensional vectors and the user embeddings are 128-dimensional vectors; the batch size is 32; the optimizer is Adam with a learning rate of 0.001; and the regularization factor is 0.0001.

To find the user’s similar neighbors, we used the Pearson Correlation Coefficient (PCC) described earlier. In the ablation analysis, we set the number of neighbors to 5, 20, and 50. As for the future sequences, we keep at most 100 micro-videos that each neighbor interacted with after the current user’s query items.

4.3. Evaluation Metrics

To compare the performance of different models, we use Precision@N, Recall@N, F1-score@N, and AUC as evaluation metrics, where N is set to 50.

• Precision: The number of correctly predicted positive observations divided by the total number of predicted positive observations.

(11) $Precision@N=\frac{1}{|U|}\sum\limits_{u\in U}\frac{|\hat{I}_{u,N}\cap I_{r}|}{|I_{r}|}$

where ${\hat{I}_{u,N}}$ denotes the set of top-N recommended micro-videos for user u and ${I_{r}}$ is the total recommendation list for user u.
• Recall: The number of correctly recommended micro-videos divided by the total number of testing micro-videos of the user.

(12) $Recall@N=\frac{1}{|U|}\sum\limits_{u\in U}\frac{|\hat{I}_{u,N}\cap I_{u}|}{|I_{u}|}$

where ${\hat{I}_{u,N}}$ denotes the set of top-N recommended micro-videos for user u and ${I_{u}}$ is the set of testing micro-videos for user u.

•

F1-score: The F1-score is the harmonic mean of Precision and Recall and is used to balance the two.

(13) $F1\text{-}score=2\times\frac{Precision\times Recall}{Precision+Recall}$

•

AUC: AUC (Area Under the ROC Curve) is used in classification analysis to determine the quality of classifiers.
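The metrics above can be computed per user as in the following sketch, with the per-user values then averaged over all users in $U$. Several points are assumptions made for illustration: the function names are hypothetical, precision is computed with the conventional hits-over-N normalization (which may differ from the normalization printed in Eq. (11)), recall follows Eq. (12), F1 follows Eq. (13), and AUC is computed as the fraction of positive-negative pairs ranked correctly, counting ties as one half.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1_at_n(ranked: Sequence[int],
                             relevant: Set[int],
                             n: int) -> Tuple[float, float, float]:
    """Per-user Precision@N, Recall@N and F1-score@N.

    `ranked` is the recommendation list sorted by predicted score and
    `relevant` is the set of test micro-videos the user clicked.
    """
    hits = sum(1 for item in ranked[:n] if item in relevant)
    precision = hits / n                       # conventional definition (assumption)
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: probability that a clicked item outscores a non-clicked one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


p, r, f1 = precision_recall_f1_at_n([10, 20, 30, 40], {20, 40, 50}, n=2)
a = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation would be preferable for large candidate sets.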

4.4. Competitors

To validate the effectiveness of our proposed DMR framework, we conducted experiments on two publicly available real-world datasets. The comparison with other state-of-the-art micro-video recommenders is summarized in Table 3.

•

BPR (Rendle et al., 2012): Trained on pairs of items, Bayesian personalized ranking (BPR) maximizes the difference between the positive and negative items of each user with a Bayesian approach.
•

LSTM (Zhang et al., 2014): Long short-term memory (LSTM) is a sequence model. The hidden states of its units are aggregated to form the user interest representation.
•

CNN: The convolutional neural network (CNN) can be utilized to generate user interest representations based on the interaction sequence. The max pooling layer and MLP layers are used for user interest extraction and prediction.
•

NCF (He et al., 2017a): As a collaborative-filtering-based model, NCF learns user and item embeddings with a shallow network and a deep network, which allows it to learn an arbitrary function from data.
•

ATRank (Zhou et al., 2018): ATRank is an attention-based behavior modeling framework that models heterogeneous user behaviors using only the attention model. It utilizes self-attention in multiple semantic spaces to capture interactions among behaviors. The model is capable of predicting all types of user actions in a multi-task manner and shows effectiveness over highly optimized individual models.
•

THACIL (Chen et al., 2018a): THACIL performs click-through prediction for micro-videos by modeling the user’s historical behaviors. It characterizes both short-term and long-term correlations within user behaviors and profiles user interests at both coarse and fine granularities.
•

ALPINE (Li et al., 2019c): To route micro-videos to target users, ALPINE uses an LSTM model based on a temporal graph, which is encoded from the user’s historical interaction sequence. The model captures the complex and diverse interests of users via a multi-level interest modeling layer. Moreover, it achieves better performance by utilizing true negative samples, which indicate what the user is not interested in.
•

MTIN (Jiang et al., 2020b): This model is a multi-scale time-aware user interest modeling framework, which learns user interests from fine-grained interest groups. MTIN incorporates an interest group routing unit to generate user interest groups based on the interaction sequence and leverages fine-grained interest groups via item-level and group-level interest extraction units. The distilled user interest representation is used to predict the click probabilities of micro-video candidates.

Table 3. Overall Performance Comparison. The performance of our model and several state-of-the-art baselines on two public datasets: MicroVideo-1.7M and KuaiShou-Dataset. The best results are highlighted in bold.

	MicroVideo-1.7M				KuaiShou-Dataset
Model	AUC@50	Precision@50	Recall@50	F1-score@50	AUC@50	Precision@50	Recall@50	F1-score@50
BPR	0.583	0.241	0.181	0.206	0.595	0.290	0.387	0.331
LSTM	0.641	0.277	0.205	0.236	0.731	0.316	0.420	0.360
CNN	0.650	0.287	0.214	0.245	0.719	0.312	0.413	0.356
NCF	0.672	0.316	0.225	0.262	0.724	0.320	0.420	0.364
ATRank	0.660	0.297	0.221	0.253	0.722	0.322	0.426	0.367
THACIL	0.684	${\mathbf{0.324}}$	0.234	0.269	0.727	0.325	0.429	0.369
ALPINE	0.713	0.300	0.460	0.362	0.739	0.331	0.436	0.376
MTIN	0.729	0.317	0.476	0.381	${\mathbf{0.752}}$	0.341	${\mathbf{0.449}}$	${\mathbf{0.388}}$
DMR	${\mathbf{0.731}}$	0.323	${\mathbf{0.478}}$	${\mathbf{0.385}}$	0.742	${\mathbf{0.343}}$	0.442	0.386

4.5. Results

The model performance on the two datasets is summarized in Table 3. We compare DMR with several commonly used and state-of-the-art models: BPR, LSTM, CNN, NCF, ATRank, THACIL, ALPINE and MTIN. All models are evaluated on the two datasets introduced above, MicroVideo-1.7M and KuaiShou-Dataset. According to Table 3, DMR achieves the best Precision on the KuaiShou dataset and the best AUC, Recall and F1-score on the MicroVideo-1.7M dataset. On the remaining metrics it is close to the strongest baseline: MTIN leads on AUC, Recall and F1-score on KuaiShou, and THACIL leads on Precision on MicroVideo-1.7M.

Table 4. Effect Analysis of Neighbors. The model performance with different Neighbor Number settings on two datasets: MicroVideo-1.7M and KuaiShou-Dataset. The metrics are @50. Here we set Neighbor Number to 5, 20, 50.

	MicroVideo-1.7M				KuaiShou-Dataset
Model	AUC@50	Precision@50	Recall@50	F1-score@50	AUC@50	Precision@50	Recall@50	F1-score@50
DMR-N5	0.689	0.319	0.425	0.364	0.674	0.333	0.439	0.378
DMR-N20	${\mathbf{0.731}}$	${\mathbf{0.323}}$	${\mathbf{0.478}}$	${\mathbf{0.385}}$	${\mathbf{0.742}}$	${\mathbf{0.343}}$	${\mathbf{0.442}}$	${\mathbf{0.386}}$
DMR-N50	0.668	0.280	0.282	0.281	0.652	0.329	0.404	0.362

Table 4 compares the results for neighbor numbers of 5, 20 and 50. Considering more neighbors can bring more diversity, but too many neighbors dilute the embeddings of the interest trends. Our model improves when the neighbor number increases from 5 to 20 and degrades when it increases from 20 to 50. This indicates that the number of neighbors plays a crucial part in model performance.

The computational complexity of the sequence layer that models the user and neighbors is $O(knd^{2})$, where $k$ denotes the number of extracted neighbors, $n$ denotes the average sequence length and $d$ denotes the dimension of the item representation. The computational complexity of the capsule layer depends on the kernel size and the number of trends: its average time complexity scales as $O(nTr^{2})$, where $r$ denotes the kernel size of the capsule layer and $T$ denotes the number of trends. For large-scale applications, the computational complexity of our model could be reduced by two measures: (1) encoding neighbors with a momentum encoder (He et al., 2019); (2) adopting a light-weight capsule network.
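To give a feel for how these two terms scale, the sketch below evaluates their leading-order operation counts. It is only a back-of-the-envelope illustration: big-O notation hides constant factors, the function names are hypothetical, and the example values are assumptions ($d=512$ matches the micro-video embedding size, $n=100$ and $T=6$ are borrowed from settings mentioned elsewhere in the experiments, and the kernel size $r=3$ is made up).

```python
def sequence_layer_cost(k: int, n: int, d: int) -> int:
    """Leading-order operation count of the sequence layer, O(k * n * d^2)."""
    return k * n * d * d


def capsule_layer_cost(n: int, num_trends: int, r: int) -> int:
    """Leading-order operation count of the capsule layer, O(n * T * r^2)."""
    return n * num_trends * r * r


# Assumed example values, not measurements.
seq_20 = sequence_layer_cost(k=20, n=100, d=512)   # 524,288,000
seq_50 = sequence_layer_cost(k=50, n=100, d=512)
caps = capsule_layer_cost(n=100, num_trends=6, r=3)  # 5,400
growth = seq_50 / seq_20                              # 2.5
```

Under these assumptions the sequence layer dominates the cost and grows linearly with the number of neighbors, which is consistent with targeting the neighbor encoder first when reducing computation.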

4.6. Recommendation Diversity

Beyond high recommendation accuracy, diversity is also essential for the user experience. Recommender systems keep track of how users interact with the micro-videos they choose and, even with little historical interaction information, learn to help users select micro-videos that are likely to interest them.

Many research works (Adomavicius and Kwon, 2012; Boim et al., 2011; Di Noia et al., 2014; Premchaiswadi et al., 2013) have proposed novel diversification algorithms. Our proposed module can learn the diverse trends of user preference and provide diverse recommendations. We define the individual diversity as below:

(14) $Diversity@N=\frac{\sum_{j=1}^{N}\sum_{k=j+1}^{N}\delta(CATE(\hat{i}_{u,j})\neq CATE(\hat{i}_{u,k}))}{N\times(N-1)/2}$

where ${CATE(\cdot)}$ returns the category of an item, ${\hat{i}_{u,j}}$ and ${\hat{i}_{u,k}}$ denote the items recommended to user ${u}$ at ranks ${j}$ and ${k}$, and ${\delta(\cdot)}$ is an indicator function.
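Eq. (14) can be read as the fraction of unordered pairs in the top-N list whose categories differ. The sketch below implements that reading for a single user; the function name `diversity_at_n` and the representation of the list as a sequence of category labels are assumptions made for illustration, and if fewer than N items are available the number of available items is used in place of N.

```python
from itertools import combinations
from typing import Hashable, Sequence


def diversity_at_n(categories: Sequence[Hashable], n: int) -> float:
    """Share of item pairs in the top-n list that belong to different categories.

    `categories[j]` is the category of the item recommended at rank j.
    """
    top = list(categories[:n])
    if len(top) < 2:
        return 0.0  # no pairs to compare
    pairs = list(combinations(top, 2))            # len(top) * (len(top) - 1) / 2 pairs
    differing = sum(1 for a, b in pairs if a != b)  # indicator delta(.) summed over pairs
    return differing / len(pairs)


mixed = diversity_at_n(["music", "music", "dance"], n=3)
all_distinct = diversity_at_n(["music", "dance", "food"], n=3)
```

In the first toy list two of the three pairs differ in category, and in the second every pair differs.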

Table 5 compares DMR with THACIL and MTIN on the recommendation diversity metric on the MicroVideo-1.7M dataset, which provides category information for micro-videos. For our model, we adopt the setting of six historical trends and six future trends evolved from 5 neighbors. As the table shows, DMR achieves the highest diversity, indicating that its recommendations effectively take neighbors’ interests into account.

Table 5. Model Recommendation Diversity Comparison on Micro-video Dataset.

MicroVideo-1.7M	THACIL	MTIN	DMR
Diversity@10	1.9112	1.9940	${\mathbf{1.9948}}$
Diversity@50	1.9104	1.9948	${\mathbf{1.9956}}$
Diversity@100	1.9436	1.9950	${\mathbf{1.9954}}$

5. Conclusion and Future Work

In this work, we propose to capture even more diverse and dynamic interests beyond those implied by the historical behaviors for micro-video recommendation. We refer to the future interest directions as trends and devise the DMR framework. DMR employs an implicit user network module to extract future sequence fragments from similar users. A multi-trend routing module assigns these future sequences to different trend groups and updates the corresponding trend memory slot in a dynamic read-write manner. Final predictions are made based on both future evolved trends and history evolved trends with a history-future trends joint prediction module.

This work represents one of the initial attempts to explicitly capture possible interest trends for a given historical behavior sequence, especially for ranking models and micro-video recommendation. We believe that such an idea can inspire future work on learning recommender systems of high diversity. Although the implicit user network module does not affect serving efficiency, introducing information from other users might also introduce noise, so in the future we would like to explore whether more efficient and effective solutions exist to boost training. Moreover, we plan to extend the multi-trend capturing idea to more applications in recommender systems and address application-specific challenges.

References

Adomavicius and Kwon (2012) G. Adomavicius and Y. Kwon. 2012. Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques. IEEE Transactions on Knowledge and Data Engineering 24, 5 (2012), 896–911.
Baluja et al. (2008) Shumeet Baluja, Rohan Seth, D. Sivakumar, Yushi Jing, Jay Yagnik, Shankar Kumar, Deepak Ravichandran, and Mohamed Aly. 2008. Video suggestion and discovery for youtube: taking random walks through the view graph.. In Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008.
Boim et al. (2011) Rubi Boim, Tova Milo, and Slava Novgorodov. 2011. Diversification and Refinement in Collaborative Filtering Recommender. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (Glasgow, Scotland, UK) (CIKM ’11). Association for Computing Machinery, New York, NY, USA, 739–744. https://doi.org/10.1145/2063576.2063684
Breese et al. (2013) John S. Breese, David Heckerman, and Carl Kadie. 2013. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. arXiv:1301.7363 [cs.IR]
Chen et al. (2012) Bisheng Chen, Jingdong Wang, Qinghua Huang, and Tao Mei. 2012. Personalized video recommendation through tripartite graph propagation.. In Proceedings of the 20th ACM Multimedia Conference, MM ’12, Nara, Japan, October 29 - November 02, 2012.
Chen et al. (2016) Jingyuan Chen, Xuemeng Song, Liqiang Nie, Xiang Wang, Hanwang Zhang, and Tat-Seng Chua. 2016. Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model.. In Proceedings of the 2016 ACM Conference on Multimedia Conference, MM 2016, Amsterdam, The Netherlands, October 15-19, 2016.
Chen et al. (2017) Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive collaborative filtering: Multimedia recommendation with item-and component-level attention. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. 335–344.
Chen et al. (2018a) Xusong Chen, Dong Liu, Zhengjun Zha, W. Zhou, Zhiwei Xiong, and Y. Li. 2018a. Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction. Proceedings of the 26th ACM international conference on Multimedia (2018).
Chen et al. (2018b) Xusong Chen, Dong Liu, Zheng-Jun Zha, Wengang Zhou, Zhiwei Xiong, and Yan Li. 2018b. Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction.. In 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, October 22-26, 2018.
Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. New York, NY, USA.
Cui et al. (2014) Peng Cui, Zhiyu Wang, and Zhou Su. 2014. What Videos Are Similar with You?: Learning a Common Attributed Representation for Video Recommendation.. In Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, USA, November 03 - 07, 2014.
Deldjoo et al. (2016) Yashar Deldjoo, Mehdi Elahi, Paolo Cremonesi, Franca Garzotto, Pietro Piazzolla, and Massimo Quadrana. 2016. Content-based video recommendation system based on stylistic visual features. Journal on Data Semantics 5, 2 (2016), 99–113.
Deshpande and Karypis (2004) Mukund Deshpande and George Karypis. 2004. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 143–177.
Di Noia et al. (2014) Tommaso Di Noia, Vito Claudio Ostuni, Jessica Rosati, Paolo Tomeo, and Eugenio Di Sciascio. 2014. An Analysis of Users’ Propensity toward Diversity in Recommendations. In Proceedings of the 8th ACM Conference on Recommender Systems (Foster City, Silicon Valley, California, USA) (RecSys ’14). Association for Computing Machinery, New York, NY, USA, 285–288. https://doi.org/10.1145/2645710.2645774
Ding and Li (2005) Yi Ding and Xue Li. 2005. Time weight collaborative filtering. In Proceedings of the 14th ACM international conference on Information and knowledge management. 485–492.
Dong et al. (2018) Jianfeng Dong, Xirong Li, Chaoxi Xu, Gang Yang, and Xun Wang. 2018. Feature Re-Learning with Data Augmentation for Content-based Video Recommendation.. In 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, October 22-26, 2018.
Du et al. (2020) Xiaoyu Du, Xiang Wang, Xiangnan He, Zechao Li, Jinhui Tang, and Tat-Seng Chua. 2020. How to Learn Item Representation for Cold-Start Multimedia Recommendation?. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020.
Fan et al. (2019) Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph Neural Networks for Social Recommendation. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 417–426. https://doi.org/10.1145/3308558.3313488
Felicio et al. (2016) Cricia Felicio, Klérisson Paixão, Guilherme Alves, Sandra Amo, and Philippe Preux. 2016. Exploiting Social Information in Pairwise Preference Recommender System. Journal of Information and Data Management 7 (08 2016), 99.
Guo et al. (2016) J. Guo, Y. Zhu, A. Li, Q. Wang, and W. Han. 2016. A Social Influence Approach for Group User Modeling in Group Recommendation Systems. IEEE Intelligent Systems 31, 5 (2016), 40–48.
He et al. (2019) Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross B. Girshick. 2019. Momentum Contrast for Unsupervised Visual Representation Learning. CoRR abs/1911.05722 (2019). arXiv:1911.05722 http://arxiv.org/abs/1911.05722
He et al. (2016) Ruining He, Chen Fang, Zhaowen Wang, and Julian J. McAuley. 2016. Vista: A Visually, Socially, and Temporally-aware Model for Artistic Recommendation. CoRR abs/1607.04373 (2016). arXiv:1607.04373 http://arxiv.org/abs/1607.04373
He and McAuley (2016) Ruining He and Julian McAuley. 2016. Fusing similarity models with markov chains for sparse sequential recommendation. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 191–200.
He et al. (2020) Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation.. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020.
He et al. (2018) Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang, and Tat-Seng Chua. 2018. Nais: Neural attentive item similarity model for recommendation. IEEE Transactions on Knowledge and Data Engineering 30, 12 (2018), 2354–2366.
He et al. (2017a) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017a. Neural Collaborative Filtering. CoRR abs/1708.05031 (2017). arXiv:1708.05031 http://arxiv.org/abs/1708.05031
He et al. (2017b) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017b. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182.
Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based Recommendations with Recurrent Neural Networks. arXiv:1511.06939 [cs.LG]
Huang et al. (2016) Yanxiang Huang, Bin Cui, Jie Jiang, Kunqian Hong, Wenyu Zhang, and Yiran Xie. 2016. Real-time Video Recommendation Exploration.. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016.
Jiang et al. (2020a) Hao Jiang, Wenjie Wang, Yinwei Wei, Zan Gao, Yinglong Wang, and Liqiang Nie. 2020a. What Aspect Do You Like: Multi-scale Time-aware User Interest Modeling for Micro-video Recommendation.. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020.
Jiang et al. (2020b) Hao Jiang, Wenjie Wang, Yinwei Wei, Zan Gao, Yinglong Wang, and Liqiang Nie. 2020b. What Aspect Do You Like: Multi-Scale Time-Aware User Interest Modeling for Micro-Video Recommendation. In Proceedings of the 28th ACM International Conference on Multimedia (Seattle, WA, USA) (MM ’20). Association for Computing Machinery, New York, NY, USA, 3487–3495. https://doi.org/10.1145/3394171.3413653
Kang and McAuley (2018a) Wang-Cheng Kang and Julian McAuley. 2018a. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.
Kang and McAuley (2018b) Wang-Cheng Kang and Julian J. McAuley. 2018b. Self-Attentive Sequential Recommendation.. In IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018.
Kim and Shim (2014) Younghoon Kim and Kyuseok Shim. 2014. TWILITE: A recommendation system for Twitter using a probabilistic model based on latent Dirichlet allocation. Information Systems 42 (2014), 59–77.
Konstan et al. (1997) Joseph A Konstan, Bradley N Miller, David Maltz, Jonathan L Herlocker, Lee R Gordon, and John Riedl. 1997. Grouplens: Applying collaborative filtering to usenet news. Commun. ACM 40, 3 (1997), 77–87.
Koren (2009) Yehuda Koren. 2009. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 447–456.
Koren and Bell (2015) Yehuda Koren and Robert Bell. 2015. Advances in collaborative filtering. Recommender systems handbook (2015), 77–118.
Li et al. (2019a) Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Huan Zhao, Pipei Huang, Guoliang Kang, Qiwei Chen, Wei Li, and Dik Lun Lee. 2019a. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall.. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3-7, 2019.
Li et al. (2019b) Yongqi Li, Meng Liu, Jianhua Yin, Chaoran Cui, Xin-Shun Xu, and Liqiang Nie. 2019b. Routing Micro-videos via A Temporal Graph-guided Recommendation System.. In Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019.
Li et al. (2019c) Yongqi Li, Meng Liu, Jianhua Yin, Chaoran Cui, Xin-Shun Xu, and Liqiang Nie. 2019c. Routing Micro-Videos via A Temporal Graph-Guided Recommendation System. In Proceedings of the 27th ACM International Conference on Multimedia (Nice, France) (MM ’19). Association for Computing Machinery, New York, NY, USA, 1464–1472. https://doi.org/10.1145/3343031.3350950
Li et al. (2020) Zhaopeng Li, Qianqian Xu, Yangbangyan Jiang, Xiaochun Cao, and Qingming Huang. 2020. Quaternion-Based Knowledge Graph Network for Recommendation.. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020.
Lu et al. (2020) Yujie Lu, Shengyu Zhang, Yingxuan Huang, Luyao Wang, Xinyao Yu, Zhou Zhao, and Fei Wu. 2020. Future-Aware Diverse Trends Framework for Recommendation. CoRR (2020).
Ma (2013) Hao Ma. 2013. An experimental study on implicit social recommendation. 73–82. https://doi.org/10.1145/2484028.2484059
Mei et al. (2011) Tao Mei, Bo Yang, Xian-Sheng Hua, and Shipeng Li. 2011. Contextual Video Recommendation by Multimodal Relevance and User Feedback. ACM Trans. Inf. Syst. (2011).
Mukherjee and Guennemann (2019) Subhabrata Mukherjee and Stephan Guennemann. 2019. GhostLink: Latent Network Inference for Influence-aware Recommendation. arXiv:1905.05955 [cs.SI]
Norris and Norris (1998) James R Norris and James Robert Norris. 1998. Markov chains. Number 2. Cambridge university press.
Papagelis et al. (2005) Manos Papagelis, Dimitris Plexousakis, and Themistoklis Kutsuras. 2005. Alleviating the sparsity problem of collaborative filtering using trust inferences. In International conference on trust management. Springer, 224–239.
Park (2010) Jonghun Park. 2010. An online video recommendation framework using view based tag cloud aggregation. IEEE Multimedia, 2010 (2010).
Premchaiswadi et al. (2013) Wichian Premchaiswadi, Pitaya Poompuang, Nipat Jongswat, and Nucharee Premchaiswadi. 2013. Enhancing Diversity-Accuracy Technique on User-Based Top-N Recommendation Algorithms. In Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference Workshops (COMPSACW ’13). IEEE Computer Society, USA, 403–408. https://doi.org/10.1109/COMPSACW.2013.68
Qu et al. (2016) Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 1149–1154.
Rendle et al. (2012) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian Personalized Ranking from Implicit Feedback. CoRR abs/1205.2618 (2012). arXiv:1205.2618 http://arxiv.org/abs/1205.2618
Rendle et al. (2010) Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web. 811–820.
Salakhutdinov and Mnih (2008) Ruslan Salakhutdinov and Andriy Mnih. 2008. Probabilistic matrix factorization. Advances in Neural Information Processing Systems 20 (2008), 1257–1264.
Sarwar et al. (2001) Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (Hong Kong, Hong Kong) (WWW ’01). Association for Computing Machinery, New York, NY, USA, 285–295. https://doi.org/10.1145/371920.372071
Shani et al. (2005) Guy Shani, David Heckerman, Ronen I Brafman, and Craig Boutilier. 2005. An MDP-based recommender system. Journal of Machine Learning Research 6, 9 (2005).
Smirnova and Vasile (2017) Elena Smirnova and Flavian Vasile. 2017. Contextual sequence modeling for recommendation with recurrent neural networks. In Proceedings of the 2nd workshop on deep learning for recommender systems. 2–9.
Sun et al. (2019) Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer.. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3-7, 2019.
Tang and Wang (2018a) Jiaxi Tang and Ke Wang. 2018a. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding.. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018.
Tang and Wang (2018b) Jiaxi Tang and Ke Wang. 2018b. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 565–573.
Terveen and Hill (2001) Loren Terveen and Will Hill. 2001. Beyond recommender systems: Helping people help each other. HCI in the New Millennium 1, 2001 (2001), 487–509.
Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS.
Wang et al. (2018) Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba.. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018.
Wang et al. (2015) Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, and Xueqi Cheng. 2015. Learning hierarchical representation model for nextbasket recommendation. In Proceedings of the 38th International ACM SIGIR conference on Research and Development in Information Retrieval. 403–412.
Wang et al. (2019b) Peng Wang, Yunsheng Jiang, Chunxu Xu, and Xiaohui Xie. 2019b. Overview of Content-Based Click-Through Rate Prediction Challenge for Video Recommendation. In Proceedings of the 27th ACM International Conference on Multimedia. 2593–2596.
Wang et al. (2019a) Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019a. Neural Graph Collaborative Filtering.. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019.
Wei et al. (2019a) Yinwei Wei, Xiang Wang, Weili Guan, Liqiang Nie, Zhouchen Lin, and Baoquan Chen. 2019a. Neural multimodal cooperative learning toward micro-video understanding. IEEE Transactions on Image Processing 29 (2019), 1–14.
Wei et al. (2020) Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, and Tat-Seng Chua. 2020. Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback.. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020.
Wei et al. (2019b) Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019b. MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video.. In Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019.
Wu et al. (2020) Jibang Wu, Renqin Cai, and Hongning Wang. 2020. DéJà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation (WWW ’20). Association for Computing Machinery, New York, NY, USA, 11 pages. https://doi.org/10.1145/3366423.3380285
Wu et al. (2019a) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019a. Session-Based Recommendation with Graph Neural Networks.. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019.
Wu et al. (2019b) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019b. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
Yan et al. (2015) Ming Yan, Jitao Sang, and Changsheng Xu. 2015. Unified YouTube Video Recommendation via Cross-network Collaboration.. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, Shanghai, China, June 23-26, 2015.
Yang et al. (2020) Xuewen Yang, Dongliang Xie, Xin Wang, Jiangbo Yuan, Wanying Ding, and Pengyun Yan. 2020. Learning Tuple Compatibility for Conditional Outfit Recommendation.. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020.
Yu et al. (2020) Xuzheng Yu, Tian Gan, Yinwei Wei, Zhiyong Cheng, and Liqiang Nie. 2020. Personalized Item Recommendation for Second-hand Trading Platform.. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020.
Zhang et al. (2014) Yuyu Zhang, Hanjun Dai, Chang Xu, Jun Feng, Taifeng Wang, Jiang Bian, Bin Wang, and Tie-Yan Liu. 2014. Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks. CoRR abs/1404.5772 (2014). arXiv:1404.5772 http://arxiv.org/abs/1404.5772
Zhou et al. (2018) Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1059–1068.
Zhou et al. (2015) Xiangmin Zhou, Lei Chen, Yanchun Zhang, Longbing Cao, Guangyan Huang, and Chen Wang. 2015. Online Video Recommendation in Sharing Community.. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015.