

Unleash LLMs' Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator

Jun Yin (Central South University, Changsha, Hunan, China) [email protected]; Zhengxin Zeng (Microsoft, Beijing, China) [email protected]; Mingzheng Li (Microsoft, Beijing, China) [email protected]; Hao Yan (Central South University, Changsha, Hunan, China) [email protected]; Chaozhuo Li (Microsoft Research Asia, Beijing, China) [email protected]; Weihao Han (Microsoft, Beijing, China) [email protected]; Jianjin Zhang (Microsoft, Beijing, China) [email protected]; Ruochen Liu (Central South University, Changsha, Hunan, China) [email protected]; Allen Sun (Microsoft, Beijing, China) [email protected]; Denvy Deng (Microsoft, Beijing, China) [email protected]; Feng Sun (Microsoft, Beijing, China) [email protected]; Qi Zhang (Microsoft, Beijing, China) [email protected]; Shirui Pan (Griffith University, Brisbane, Queensland, Australia) [email protected]; and Senzhang Wang (Central South University, Changsha, Hunan, China) [email protected]
Abstract.

Owing to their unprecedented capability in semantic understanding and logical reasoning, pre-trained large language models (LLMs) have shown great potential in developing next-generation recommender systems (RSs). However, the static index paradigm adopted by current methods greatly restricts the utilization of LLMs' capacity for recommendation, leading not only to insufficient alignment between semantic and collaborative knowledge, but also to the neglect of high-order user-item interaction patterns. In this paper, we propose the Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS to adopt a dynamic semantic index paradigm, aiming to resolve the above problems simultaneously. (The code and dataset will be made available upon acceptance of the paper.) To be more specific, we for the first time devise a dynamic knowledge fusion framework which integrates a twin-tower semantic token generator into the LLM-based recommender, hierarchically allocating meaningful semantic indices to items and users, and accordingly predicting the semantic index of the target item. Furthermore, a dual-modality variational auto-encoder is proposed to facilitate multi-grained alignment between semantic and collaborative knowledge. Eventually, a series of novel tuning tasks specially customized for capturing high-order user-item interaction patterns is proposed to take advantage of user historical behavior. Extensive experiments across three public datasets demonstrate the superiority of the proposed methodology in developing LLM-based generative RSs. The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG, compared with the leading baseline methods.

Generative Recommender Systems, Large Language Models, Twin-Tower Architecture, Vector Quantization
Figure 1. An overview of (a) the proposed twin-tower generative recommender system based on dynamic semantic index and (b) existing single-tower generative recommender systems based on static semantic index.

1. Introduction

With network information proliferating exponentially, recommender systems (RSs) which discover content of interest to users have become essential components of online service platforms (Raffel et al., 2020; Covington et al., 2016; Geng et al., 2022; de Souza Pereira Moreira et al., 2021). Since user preference evolves over time, sequential recommender systems have attracted great research attention from both academia and industry (Kang and McAuley, 2018), owing to their splendid capability in capturing collaborative patterns of items in user behavior sequences. In the current research literature, sequential recommender systems introduce various deep neural network architectures, including convolutional neural networks (CNNs) (Yuan et al., 2019; Tang and Wang, 2018), recurrent neural networks (RNNs) (Tan et al., 2016; Li et al., 2017a), graph neural networks (GNNs) (Wu et al., 2019; Xu et al., 2019), and Transformers (Kang and McAuley, 2018; de Souza Pereira Moreira et al., 2021; Sun et al., 2019), to extract collaborative information within user historical behaviors represented by item sequences. For further improvement on recommendation performance, pre-trained language models (PLMs) are adopted to capture the semantic information within item textual features (Hou et al., 2022; Li et al., 2023; Hou et al., 2023). Recently, the emergence of large language models (LLMs) pre-trained on large-scale natural language (NL) corpora, such as GPT (Radford et al., 2023), T5 (Raffel et al., 2020), and LlaMA (Touvron et al., 2023), has revolutionized the recommender systems community, due to their unprecedented capability in semantic understanding and logical reasoning (Rajput et al., 2023; Zhu et al., 2024; Zheng et al., 2024).

The cornerstone of developing LLM-based recommender systems lies in the integration between the two item modalities that serve as the foundation of sequential recommendation, i.e., the collaborative information reflected by historical interactions and the semantic information implicit in item textual features. Existing efforts to align the collaborative and semantic information in sequential recommendation can be divided into two main approaches: (i) The first approach verbalizes the historical interactions into textual sequences (e.g., concatenating item titles and descriptions) and then instructs LLMs on sequential recommendation through natural language prompts (Hou et al., 2024; Bao et al., 2023; Hua et al., 2023). Such a content-based approach increases the computational expenditure due to the greatly extended input sequence and fails to guarantee in-domain recommendation results (Zhu et al., 2024). (ii) The second approach incorporates LLMs and RSs through static index mapping (without ambiguity, the terms ID and index (indices) will be used interchangeably). The compact item indices can be considered as special tokens used to extend the LLM vocabulary, aiming to mitigate the computational overhead and guarantee in-domain, legitimate recommendation results (Rajput et al., 2023; Zhu et al., 2024; Zheng et al., 2024; Geng et al., 2022; Hua et al., 2023). To be more specific, pseudo index based methods emulate traditional ID-based RSs and adopt ID-alike words to represent users and items (e.g., <user_100> and <item_128>) (Geng et al., 2022; Zhu et al., 2024; Hua et al., 2023). To combine the best of the content-based and the pseudo index based methods, semantic index based methods first embed the item/user related content into a continuous vector, and then introduce a discrete indexing technique which quantizes the continuous embedding into discrete indices (Rajput et al., 2023; Zheng et al., 2024; Jin et al., 2024b). Notably, different from the pseudo ID, the discrete semantic ID is generated based on the item/user representation similarity, which implies, for example, that <item_1> is more similar to <item_2> than to <item_10>.

Despite these remarkable achievements, such a static index based integration paradigm is still confronted with a fundamental discrepancy between the semantic and collaborative knowledge. Throughout the static index based recommendation process, each item index is completely dominated by the corresponding textual content and the textual encoder (Rajput et al., 2023; Zheng et al., 2024; Jin et al., 2024a; Liu et al., 2024). First, the textual content itself carries a semantic inductive bias, probably misleading the downstream RSs. For example, the film Transformers (July 3, 2007) and the teaching video Transformer Careful Elaboration (October 28, 2021) are highly similar in terms of textual content, yet barely overlap in the interaction records. Second, the distribution mismatch between the textual encoder embedding and the RS representation is overlooked (Jin et al., 2024b; Huh et al., 2023). If a shallow textual encoder is employed, the entire RS performance will be constrained by its inferior encoding ability. Last but not least, existing static integration paradigms fail to exploit high-order user-item interaction patterns during recommendation-oriented fine-tuning (Rajput et al., 2023; Zheng et al., 2024; Liu et al., 2024; Jin et al., 2024a). Only the item co-occurrence pattern within user historical behavior is emphasized; high-order interaction patterns, such as the user co-purchase pattern and the user preference pattern, remain underexploited.

In this paper, we for the first time investigate generative RSs based on a dynamic semantic index mechanism, aiming to address the above problems simultaneously. As illustrated in Figure 1, during recommendation-oriented fine-tuning, the twin-tower semantic token generator is cooperatively optimized with the recommender module. Note that within the dynamic semantic index mechanism, the textual encoder module and the recommender module share the same LLM backbone, seamlessly aligning the distributions of the textual embedding and the recommender representation, and eventually fusing the semantic and collaborative knowledge into a monolithic model. Moreover, the twin-tower semantic token generator contains two homogeneous discrete index branches, one for item index learning and the other for the user counterpart. By reasonably aggregating the user and item semantic indices, the recommender module is able to delve into the exploitation of high-order user-item interaction patterns. However, it is non-trivial to design a dynamic semantic index based generative recommender system due to the following challenges.

Under-trained User/Item Semantic Tokens. Since LLM-based generative RSs extend the LLM vocabulary with the semantic tokens of users and items, the semantic tokens can be considered as basic words of a brand-new language that specifically describes the recommendation field (Wang et al., 2022; Tay et al., 2022; Rajput et al., 2023). However, this novel language is unfamiliar to the LLM backbone, which is pre-trained on a large-scale NL corpus. Therefore, there exists a considerable discrepancy between the under-trained semantic tokens and the well pre-trained NL tokens. How to instruct LLMs to understand the under-trained semantic tokens as well as the NL tokens is challenging.

Implicit High-order User-Item Interaction Patterns. In the sequential recommendation task, the user historical behavior is represented as an interacted item sequence (e.g., [<item_1>, <item_3>, …, <item_34>]) (Kang and McAuley, 2018; de Souza Pereira Moreira et al., 2021; Sun et al., 2019). Accordingly, the item co-occurrence pattern is explicit and effortless to capture. Nevertheless, the high-order interaction patterns implicitly contained in user historical behavior are overlooked by existing methods (Rajput et al., 2023; Zheng et al., 2024; Jin et al., 2024a; Liu et al., 2024; Zhu et al., 2024). For example, the user co-purchase pattern (e.g., <user_1> and <user_2> both purchase <item_9>) and the user preference pattern (e.g., <user_3> and <user_4> share lots of similar items) are difficult to capture based merely on the sequential recommendation task (Wang et al., 2019). How to exploit and capture the implicit high-order user-item interaction patterns is also challenging.

To address the aforementioned challenges, we propose the Twin-Tower Dynamic Semantic (TTDS) Recommender, the first generative RS that adopts a dynamic semantic index mechanism. By devising a dynamic knowledge fusion framework, the TTDS recommender is able to absorb the semantic knowledge in textual features and the collaborative knowledge within historical interactions into a single LLM backbone. Given the user's historically interacted items, the LLM first converts the user textual profile and item textual features into continuous embeddings; then the twin-tower semantic token generator hierarchically discretizes the continuous embeddings into semantic user/item indices (Zeghidour et al., 2021). Within the dynamic knowledge fusion process, a dual-modality variational auto-encoder (DM-VAE) is proposed to align the under-trained semantic tokens and the well pre-trained NL tokens across multiple representation granularities. Accordingly, the user profile and the item textual features can be equivalently replaced by the corresponding semantic indices. By aggregating the user/item semantic indices and the user behavior sequence through human instructions, the semantic knowledge and the collaborative knowledge are seamlessly integrated during the recommendation-oriented fine-tuning procedure. Furthermore, we customize a series of novel tuning tasks targeted at capturing high-order user-item interaction patterns. Eventually, the learned user/item indices not only contain the semantic knowledge represented by the textual features, but also imply the collaborative knowledge reflected by the interaction histories. Based on the dynamic knowledge fusion framework, the TTDS recommender is able to perform sequential recommendation following the next-token-prediction paradigm in NLP (Tay et al., 2022).

The main contributions of this work can be summarized below:

  • We for the first time investigate dynamic semantic index based generative recommender systems and propose the Twin-Tower Dynamic Semantic Recommender, which adopts a dynamic knowledge fusion framework.

  • We propose the dual-modality variational auto-encoder (DM-VAE) module, aiming to facilitate the LLM's understanding of the newly introduced semantic tokens. In addition, we design a series of specific tuning tasks to exploit and capture high-order user-item interaction patterns.

  • Extensive experimental results on three public recommendation datasets demonstrate that the TTDS recommender outperforms the current SOTA recommender system models, achieving an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG metrics.

2. Related Work

Sequential Recommendation. While the inchoate sequential recommendation models are developed on the basis of classical matrix factorization (Rendle et al., 2009; Kabbur et al., 2013), K-nearest neighbor (Davidson et al., 2010; Linden et al., 2003), and Markov chain techniques (He and McAuley, 2016; He et al., 2017), artificial neural network based deep sequential recommender systems have dominated the current leaderboard (Fang et al., 2020). GRU4Rec (Jannach and Ludewig, 2017) for the first time incorporates the GRU module (Chung et al., 2014) into sequential recommendation. NARM (Li et al., 2017b) further enhances the long-term user preference memory through an attention mechanism, and AttRec (Zhang et al., 2019a) additionally introduces metric learning to capture user-item affinity. Inspired by the great success of pre-trained language models and masked token prediction, BERT4Rec (Sun et al., 2019) first applies a deep Transformer model with a masked pre-training strategy to sequential recommendation. The sequential recommendation models introduced above aim at learning high-dimensional item representations and then producing recommendation results via approximate nearest neighbor search under maximum inner product search (Guo et al., 2020). On the other hand, generative recommendation models based on index learning techniques have attracted significant research attention due to their efficient inference. DSI (Tay et al., 2022) for the first time proposes an end-to-end Transformer model for document retrieval; NCI (Wang et al., 2022) further enhances the index learning mechanism of DSI by proposing a specific prefix decoder.

LLM-based Recommender System. As LLMs demonstrate comprehensive capability on language modeling tasks, P5 (Geng et al., 2022) makes the first attempt to fine-tune an LLM into a sequential-recommendation-oriented model, M6-Rec (Cui et al., 2022) replaces the pseudo index in P5 (e.g., <item_7>) with the corresponding linguistic description, and TALLRec (Bao et al., 2023) combines both the pseudo index and the textual description. CLLM4Rec (Zhu et al., 2024) extends the LLM vocabulary with vanilla item indices and proposes a mixed prompting strategy to adapt the extended LLM. Furthermore, TIGER (Rajput et al., 2023) for the first time adopts the hierarchical vector quantization technique to generate semantic item indices, and LC-Rec (Zheng et al., 2024) substitutes the T5X backbone with LlaMA (Touvron et al., 2023) for superior recommendation performance.

3. Background

In this section, we briefly introduce the sequential recommendation task and the large language model. The key notations are summarized in Appendix A for clarity.

Sequential Recommendation. By analyzing the user historical interactions, sequential recommendation aims to identify user preference and predict the suitable item that the user would engage with (Yuan et al., 2019; Tang and Wang, 2018; Tan et al., 2016; Li et al., 2017a; Xu et al., 2019; Wu et al., 2019; de Souza Pereira Moreira et al., 2021; Kang and McAuley, 2018; Sun et al., 2019). Given a chronologically organized sequence of interacted items $s=\{v_1,v_2,\cdots,v_l\}$, the objective of the sequential recommender system $f$ is to maximize the corresponding likelihood defined as follows,

(1) $\log p(v_1,v_2,\cdots,v_l;f)=\sum_{j=1}^{l-1}\log f(v_{j+1}|v_1,v_2,\cdots,v_j).$

Finally, the trained sequential recommender system $f^*$ is obtained by maximizing the likelihood function over all $N$ training interaction sequences, that is,

(2) $f^*=\mathop{\arg\max}\limits_{f}\sum_{i=1}^{N}\log p(s_i;f)=\mathop{\arg\max}\limits_{f}\sum_{i=1}^{N}\sum_{j=1}^{|s_i|-1}\log f(v_{j+1}|v_1,v_2,\cdots,v_j).$

Large Language Model. Transformer-based models with billions of learnable parameters trained on large-scale corpora (Radford et al., 2023; Gomez-Uribe and Hunt, 2016; Touvron et al., 2023), i.e., large language models (LLMs), have presented astonishing capabilities in natural language understanding and logical reasoning based on learned knowledge. Mainstream LLMs mostly adopt the decoder-only architecture with superior generative ability (Radford et al., 2023), which consists of a token embedding layer, a decoder module, and a correlated tokenizer. Given a natural language sentence $s=\{w_1,w_2,\cdots\}$, the tokenizer first converts the sentence into a sequence of token indices $t_i$ whose corresponding words $w_i$ are included in the LLM vocabulary, formulated as follows,

(3) $t=\{t_1,t_2,\cdots\}={\rm Tokenizer}(s)=\{{\rm Tokenizer}(w_1),{\rm Tokenizer}(w_2),\cdots\}.$

Then, the token index sequence is fed into the token embedding layer and projected into a continuous latent space by a table look-up operation as follows,

(4) $h={\rm Embed\_Token}(t)={\rm One\_Hot}(t)\cdot E,$
${\rm One\_Hot}(t)\in\{0,1\}^{L\times|\mathcal{V}|},\quad E\in\mathbb{R}^{|\mathcal{V}|\times d},$

where $E$ is the token embedding table, $L$ is the token sequence length, $\mathcal{V}$ is the LLM vocabulary, and $d$ is the latent space dimension. Eventually, the continuous embedding $h$ is forwarded through the LLM decoder module. In addition, for LLMs targeting generation tasks, a language model head is appended to transform the final representation $h'$ back into the token index space, as follows,

(5) $h'={\rm Decoder}(h),\quad t'={\rm LM\_Head}(h').$

Afterwards, the tokenizer is able to translate the newly generated token index sequence into a natural language sentence by an inverse table look-up operation, the reverse of Formula 4.
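The pipeline of Formulas 3 through 5 can be wired together as in the sketch below; the `tokenizer`, `token_embedding`, `decoder`, and `lm_head` names are placeholder modules standing in for the corresponding components of a real decoder-only LLM, not a specific library API.

```python
import torch

def llm_forward(sentence, tokenizer, token_embedding, decoder, lm_head):
    """Sketch of Formulas (3)-(5) with placeholder modules.

    tokenizer:       str -> list[int]        (Formula 3)
    token_embedding: nn.Embedding(|V|, d)    (Formula 4, table look-up)
    decoder:         decoder-layer stack     (Formula 5, left)
    lm_head:         nn.Linear(d, |V|)       (Formula 5, right)
    """
    t = torch.tensor(tokenizer(sentence))  # token index sequence
    h = token_embedding(t)                 # continuous embeddings (L, d)
    h_prime = decoder(h)                   # contextualized representations
    logits = lm_head(h_prime)              # scores over the vocabulary
    t_prime = logits.argmax(dim=-1)        # greedy generated token indices
    return t_prime
```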

Figure 2. The overall framework of the proposed Twin-Tower Dynamic Semantic Recommender System.

4. Methodology

In this section, we introduce the Twin-Tower Dynamic Semantic (TTDS) Recommender in detail. Holistically, as shown in Figure 2, the TTDS recommender is characterized by the proposed dynamic knowledge fusion framework, where the semantic knowledge within textual content and the collaborative knowledge within historical behavior are sufficiently integrated. Specifically, a twin-tower semantic token generator takes charge of hierarchically generating semantic indices for both items and users, based on the corresponding LLM textual representations. A dual-modality variational auto-encoder (DM-VAE) is devised to bridge the great discrepancy between the newly introduced semantic tokens and the well pre-trained NL tokens. Moreover, during the recommendation-oriented fine-tuning, a series of customized tasks is specially designed for modeling the implicit high-order user-item interaction patterns.

4.1. Problem Formulation

In this paper, we focus on the sequential recommendation task, whose target is predicting suitable items for users according to their historical behavior. For each user $U_j$, the historical behavior is mainly represented by the sequence of interacted items $\{I_k\}$, such as $S_j=[I_7,I_3,\cdots,I_{26}]$. Additionally, we use $C_j^U$ and $C_k^I$ to denote the related textual content of user $U_j$ and item $I_k$, respectively. For the item branch, the textual content mostly includes the item title, the detailed description, and other auxiliaries (e.g., brand and category). Similarly, the user-related textual content usually consists of the recent comments, the search queries, and the holistic user profile.

4.2. TTDS Recommender

4.2.1. Dynamic Knowledge Fusion Framework

To sufficiently integrate the semantic knowledge and the collaborative knowledge, we propose the dynamic knowledge fusion framework for the TTDS recommender. Within this framework, the TTDS recommender system consists of a twin-tower semantic token generator and an LLM-based recommender backbone. The former aims to discretize the continuous representations of users/items into the corresponding semantic indices. The latter is not only in charge of converting the textual content of users/items into continuous representations, but also responsible for next-item prediction based on the user/item semantic indices and human instructions.

Consider a system of $K$ items and $J$ users; the interacted item sequence of user $U_j$ can be denoted as $S_j=[I_{k_1},I_{k_2},\cdots,I_{k_n}]$, where the subscript $k_x$ identifies the interacted item and $n$ is the sequence length. First, the LLM backbone $f$ embeds the related textual content of user $U_j$ and item $I_{k_x}$, denoted as $C_j^U$ and $C_{k_x}^I$, into continuous representations, formulated as follows,

(6) $X=f(C)={\rm Decoder}({\rm Embed\_Token}({\rm Tokenizer}(C))),$

where $C$ refers to the textual content $C_j^U$ or $C_{k_x}^I$, and $X$ correspondingly denotes the continuous representation $X_j^U$ or $X_{k_x}^I$. Afterwards, the twin-tower semantic token generator takes the continuous representation $X$ as input and produces semantic indices for both users and items. To be more specific, the semantic token generator performs vector quantization, adopting a learnable codebook $\mathcal{C}=\{c_i\}_{i=1}^{|\mathcal{C}|}$ to approximate the input representation, formulated as follows,

(7) $i^*=\mathop{\arg\min}\limits_{i}\mathcal{D}(X,c_i),$

where $\mathcal{D}$ represents the distance metric and $i^*$ is the semantic index of $X$. Within the learnable codebook $\mathcal{C}$, the vector $c^*$ (i.e., $c_{i^*}$) is the closest one to the input representation $X$.
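A minimal sketch of the quantization step in Formula 7, assuming Euclidean distance for the metric $\mathcal{D}$ (the text leaves the metric abstract):

```python
import torch

def quantize(x, codebook):
    """Formula (7): nearest-codebook-entry lookup under L2 distance.

    x:        (d,) continuous user/item representation
    codebook: (|C|, d) learnable codebook embeddings
    Returns the semantic index i* and the selected code vector c*.
    """
    dists = torch.cdist(x.unsqueeze(0), codebook).squeeze(0)  # (|C|,)
    i_star = torch.argmin(dists).item()
    return i_star, codebook[i_star]
```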

The twin-tower semantic token generator quantizes the continuous embedding $X$ into the discrete index $i^*$ (i.e., the semantic index), which serves as an identifier. Note that in Formula 7 the semantic index $i^*$ is selected on the basis of representation distance, and thus similar representations will be allocated similar indices. Accordingly, the vocabulary of the LLM backbone is extended with the newly generated semantic tokens to predict the next item in an end-to-end manner. Based on the semantic indices and the interaction history, we prompt the LLM backbone with natural language human instructions. Specifically, the original item and user indices within the interaction history are both replaced with the corresponding semantic indices. The rewritten interaction history is organized as a semantic index sequence in chronological order, and the sequential recommendation instruction is also merged into the prompt. An example instruction prompt is given as follows: You are an expert in sequential recommendation. Please predict the most suitable item for user $<i^*_{U_0}>$, based on the historical interactions: $<i^*_{I_3}>, <i^*_{I_7}>, \cdots, <i^*_{I_{15}}>$. Denoting the human instruction as $P$, the LLM backbone $f$ first transforms the natural language $P$ into the hidden representation $X_P$ based on Formula 6. Then a language model head is appended to project the hidden state $X_P$ into the semantic token vocabulary, formulated as follows,

(8) $\hat{i}={\rm LM\_Head}(X_P),$

where $\hat{i}$ is the semantic index of the recommendation result, and an inverse look-up operation opposite to Formula 7 can identify the original item index. The sequential recommendation task based on the human instruction prompt can thus be conveniently formulated as a language generation task. The optimization objective is defined as the negative log-likelihood as follows,

(9) $\mathcal{L}_{\rm LLM}=-\sum_{b=1}^{B}\log F(i^*_b|P_b),$

where $F$ is the combination of the LLM backbone and the LM head, $B$ represents the batch size, and $i^*_b$ and $P_b$ are the ground-truth semantic index and the human instruction of the $b$-th batch, respectively.
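The following sketch illustrates Formula 9 as a per-example negative log-likelihood over the extended vocabulary. `F_model` is a hypothetical callable standing in for the combination $F$ of the LLM backbone and the LM head, and the prompt string mirrors the example instruction above with placeholder spellings for the semantic tokens.

```python
import torch.nn.functional as F

# Hypothetical prompt mirroring the example instruction above;
# <u_*> / <i_*> are placeholder spellings of the semantic tokens.
prompt = ("You are an expert in sequential recommendation. "
          "Please predict the most suitable item for user <u_0>, "
          "based on the historical interactions: <i_3>, <i_7>, <i_15>.")

def recommendation_nll(F_model, prompts, target_indices):
    """Formula (9): negative log-likelihood over a batch of prompts.

    F_model is assumed to map an instruction prompt to a 1-D tensor of
    logits over the extended semantic-token vocabulary (illustrative).
    """
    loss = 0.0
    for p, i_star in zip(prompts, target_indices):
        logits = F_model(p)
        loss = loss - F.log_softmax(logits, dim=-1)[i_star]
    return loss
```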

4.2.2. Hierarchical Index Mechanism

Considering the computation increase caused by the extended vocabulary, the TTDS recommender adopts a hierarchical index mechanism whose expressive space grows exponentially with the index length. Within the hierarchical index mechanism, $M$ semantic tokens constitute one semantic index representing one user/item. Taking item $I_k$ as an example, the quantization procedure in Formula 7 is conducted $M$ times in a residual manner, formulated as follows,

(10) $i^*_m=\mathop{\arg\min}\limits_{i}\mathcal{D}(X_{k,m}^I,c_{i,m}),$
(11) $X_{k,m}^I=X_{k,m-1}^I-c_{m-1}^*.$

The semantic index of item $I_k$ can then be represented as

${\rm Sid}_{I_k}=<i_1^*,i_2^*,\cdots,i_M^*>.$

Therefore, an $M$-length hierarchical index mechanism with base $N$ can theoretically represent $N^M$ distinct items, while the newly introduced tokens number merely $N\times M$.
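A sketch of the $M$-level residual quantization in Formulas 10 and 11, again assuming Euclidean distance; `codebooks` is a list of $M$ level-specific codebook tensors:

```python
import torch

def residual_quantize(x, codebooks):
    """Formulas (10)-(11): M-level residual quantization.

    codebooks: list of M tensors, each of shape (N, d). At level m the
    residual left over from level m-1 is matched against codebook m, so
    an M-token index over base N addresses up to N**M items while only
    adding N*M new tokens to the vocabulary.
    """
    residual = x
    sid = []
    for codebook in codebooks:
        dists = torch.cdist(residual.unsqueeze(0), codebook).squeeze(0)
        i_star = torch.argmin(dists).item()
        sid.append(i_star)
        residual = residual - codebook[i_star]  # pass the residual down
    return tuple(sid)  # e.g. <i1*, i2*, ..., iM*>
```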

4.2.3. Dual-Modality Variational Auto-Encoder

Within the proposed dynamic knowledge fusion framework, the newly generated user/item semantic tokens are used to extend the LLM vocabulary, so as to predict the next item in an end-to-end manner. However, the LLM backbone is unfamiliar with the under-trained semantic tokens, compared with the natural language tokens well pre-trained on a large-scale corpus. To facilitate the LLM's understanding of semantic tokens, we propose a dual-modality variational auto-encoder (DM-VAE) which constructs multi-grained alignment between the natural language tokens and the semantic tokens. Given the continuous embedding $X^I_k$ and the semantic index ${\rm Sid}_{I_k}$ of item $I_k$, we note that $X^I_k$ and ${\rm Sid}_{I_k}$ are two different modalities of the same object. Therefore, the LLM representation of ${\rm Sid}_{I_k}$ ought to be similar to $X^I_k$. We design the semantic-index-level reconstruction based on the optimization objective defined as follows,

(12) $\mathcal{L}_{\rm ID}^S=\|X^S_k-X^I_k\|_2^2,$

where $X^S_k$ is the LLM representation of ${\rm Sid}_{I_k}$. Notably, based on the hierarchical index mechanism, a single semantic index consists of $M$ semantic tokens. We design the semantic-token-level reconstruction based on the optimization objective defined as follows,

(13) $\mathcal{L}_{\rm Token}^S=\sum_{m=1}^{M}\|X_{k,m-1}^I-c_{m-1}^*\|_2^2.$

The above two objective functions are also defined for the user branch. Overall, the optimization objective of DM-VAE is defined as follows,

(14) $\mathcal{L}_{\rm DM\text{-}VAE}=\mathcal{L}_{\rm ID}^S+\beta\cdot\mathcal{L}_{\rm Token}^S,$

where $\beta$ is the weighting parameter controlling the loss magnitude.
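The DM-VAE objective in Formulas 12 through 14 reduces to two squared-error reconstruction terms; below is a sketch assuming the per-level residuals and selected codes from the generator are available as lists (the default $\beta$ value here is a placeholder, not a reported hyper-parameter).

```python
import torch

def dm_vae_loss(x_sid, x_text, residuals, codes, beta=0.25):
    """Formulas (12)-(14): index- and token-level reconstruction.

    x_sid:     LLM representation of the semantic index Sid_{I_k}
    x_text:    LLM representation X_k^I of the item textual content
    residuals: [X_{k,0}^I, ..., X_{k,M-1}^I] residual input at each level
    codes:     [c_0*, ..., c_{M-1}*] selected codebook vector per level
    beta:      loss-magnitude weight (placeholder value)
    """
    loss_id = torch.sum((x_sid - x_text) ** 2)             # Eq. (12)
    loss_token = sum(torch.sum((r - c) ** 2)               # Eq. (13)
                     for r, c in zip(residuals, codes))
    return loss_id + beta * loss_token                     # Eq. (14)
```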

Figure 3. The Dual-Modality Variational Auto-Encoder.
Table 1. Performance comparison of TTDS recommender and baselines on three public datasets. The best and the runner-up performance are indicated in bold and underlined font, respectively. For evaluation stability, the presented performance of TTDS is the average result over several distinct instructions. The Improvement is defined as (Best-Second)/Second.
Dataset Instruments Games Arts
Metric HR@1 HR@5 HR@10 NDCG@5 NDCG@10 HR@1 HR@5 HR@10 NDCG@5 NDCG@10 HR@1 HR@5 HR@10 NDCG@5 NDCG@10
FMLP-Rec 0.0480 0.0786 0.0988 0.0638 0.0704 0.0152 0.0571 0.0930 0.0361 0.0476 0.0310 0.0757 0.1046 0.0541 0.0634
Caser 0.0149 0.0543 0.0710 0.0355 0.0409 0.0085 0.0367 0.0617 0.0227 0.0307 0.0138 0.0379 0.0541 0.0262 0.0313
HGN 0.0523 0.0813 0.1048 0.0668 0.0744 0.0154 0.0517 0.0856 0.0333 0.0442 0.0300 0.0622 0.0875 0.0462 0.0544
GRU4Rec 0.0571 0.0821 0.1031 0.0698 0.0765 0.0176 0.0586 0.0964 0.0381 0.0502 0.0421 0.0749 0.0964 0.0590 0.0659
BERT4Rec 0.0435 0.0671 0.0822 0.0560 0.0608 0.0136 0.0482 0.0763 0.0311 0.0401 0.0337 0.0559 0.0713 0.0451 0.0500
SASRec 0.0503 0.0751 0.0947 0.0627 0.0690 0.0145 0.0581 0.0940 0.0365 0.0481 0.0225 0.0757 0.1016 0.0508 0.0592
FDSA 0.0520 0.0834 0.1046 0.0681 0.0750 0.0161 0.0644 0.1041 0.0404 0.0531 0.0451 0.0734 0.0933 0.0595 0.0660
S3-Rec 0.0367 0.0863 0.1136 0.0626 0.0714 0.0119 0.0606 0.1002 0.0364 0.0491 0.0245 0.0767 0.1051 0.0521 0.0612
P5-CID 0.0587 0.0827 0.1016 0.0708 0.0768 0.0177 0.0506 0.0803 0.0342 0.0437 0.0485 0.0724 0.0902 0.0607 0.0664
TIGER 0.0608 0.0863 0.1064 0.0738 0.0803 0.0188 0.0599 0.0939 0.0392 0.0501 0.0465 0.0788 0.1012 0.0631 0.0703
CLLM4Rec 0.0336 0.0666 0.0845 0.0516 0.0574
LC-Rec LoRA 0.0576 0.0817 0.1009 0.0698 0.0760 0.0165 0.0520 0.0835 0.0342 0.0443 0.0367 0.0637 0.0837 0.0504 0.0569
TTDS (Ours) 0.0714 0.1028 0.1281 0.0872 0.0947 0.0254 0.0704 0.1083 0.0480 0.0597 0.0639 0.1002 0.1260 0.0823 0.0906
Improvement 17.47% 19.07% 12.74% 18.09% 17.98% 34.92% 9.25% 4.00% 18.78% 12.48% 31.65% 27.11% 19.86% 30.46% 28.86%

4.2.4. High-Order User-Item Interaction Pattern

Most existing LLM-based recommendation systems merely focus on item-related information and design item-centered fine-tuning tasks, neglecting the high-order user-item interaction patterns implicit in the user historical behavior. On the basis of the interacted item sequence, the item co-occurrence pattern is easy to capture. However, the high-order user-item interaction patterns, such as the user co-purchase pattern and the user preference pattern, are implicit and difficult to model. To this end, we specifically customize a series of fine-tuning tasks oriented toward the high-order user-item interaction patterns. First, inspired by the collaborative filtering technique, we design a user prediction task which is the counterpart of item sequential recommendation. We extract the users who purchase the same item and organize them as an interactive user sequence. The LLM backbone is prompted with the first several users, and the human instruction is to predict the next user who is interested in the same item. An example instruction prompt is given as follows: You are a recommendation expert. The item ${\rm Sid}_{I_1}$ has been clicked by the following users: ${\rm Sid}_{U_9}, {\rm Sid}_{U_5}, \cdots, {\rm Sid}_{U_{23}}$. Can you predict another user who is interested in this item?

Furthermore, we also design fine-tuning tasks based on the user's recent comments and search queries. The LLM backbone is prompted with the user historical interaction sequence, and the human instruction is to deduce the user comment or the search query, which reflects the user preference to a certain degree. Notably, the designed fine-tuning tasks oriented toward the high-order user-item interaction patterns can also be formatted as language generation tasks in a sequence-to-sequence manner, with the optimization objective defined in Formula 9. The overall objective function of the TTDS recommender system is defined as follows,

(15) $\mathcal{L}_{\rm All}=\mathcal{L}_{\rm LLM}+\alpha\cdot\mathcal{L}_{\rm DM\text{-}VAE},$

where $\alpha$ is the weighting hyper-parameter.

4.2.5. Collision Handling Mechanism

It is worth noting that the hierarchical index mechanism formulated by Formulas 10 and 11 is unable to guarantee semantic index uniqueness, i.e., that different items/users are allocated distinct semantic indices. Though the argmin operation over the distance metric may result in semantic index collisions, the corresponding recommender system is still practicable with a fairly low collision rate (Zheng et al., 2024). In our practice, the collision rate is no more than 1%. However, on one hand, the collision rate is affected by the dataset scale, the model architecture, the training parameters, and so on, making it hard to control. On the other hand, within some recommender systems, index uniqueness is significant and must be guaranteed (Rajput et al., 2023; Liu et al., 2024). Inspired by the rehashing method for hash collisions, we append an additional token to the semantic index, representing the order inside the collision set (Rajput et al., 2023; Liu et al., 2024). For example, suppose two different items share the same semantic index <1,2,3>. Then the semantic indices will be remapped to <1,2,3,p_0> and <1,2,3,p_1>, which strictly guarantees index uniqueness.
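A sketch of this rehashing-style disambiguation; semantic indices are represented as tuples of tokens, and the appended `p{k}` token encodes the order within each collision set:

```python
from collections import defaultdict

def deduplicate(sids):
    """Append an order token to each semantic index so that colliding
    indices become unique, e.g. <1,2,3> twice -> <1,2,3,p0>, <1,2,3,p1>.
    """
    counter = defaultdict(int)
    unique_sids = []
    for sid in sids:            # sid is a tuple of semantic tokens
        order = counter[sid]
        counter[sid] += 1
        unique_sids.append(sid + (f"p{order}",))
    return unique_sids
```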

4.3. Discussion

To provide a comprehensive understanding of the proposed contributions and innovations, we discuss the proposed TTDS recommender in relation to several leading LLM-based generative recommender systems.

4.3.1. Index Mechanism

For LLM-based generative recommender systems, the index mechanism is the foundation of item representation during both model training and service deployment. CLLM4Rec (Zhu et al., 2024) adopts a vanilla item index such as <item_1> and considers the item index as a special token which can be appended to the language model vocabulary, resulting in a linearly expanded vocabulary. Furthermore, semantic index based methods, including TIGER (Rajput et al., 2023), LC-Rec (Zheng et al., 2024), and LETTER (Wang et al., 2024), adopt the residual vector quantization technique to hierarchically allocate item indices (e.g., <10,1,6>), which reduces the vocabulary scale to a great extent. However, all of the aforementioned methods separate the item indexing process from the downstream sequential recommendation task, hindering the integration of the semantic and collaborative knowledge and thus leading to sub-optimal performance. To overcome this limitation, we propose a dynamic knowledge fusion framework, where the semantic index generator is jointly optimized with the sequential recommender. Accordingly, the semantic knowledge and the collaborative knowledge can be fused sufficiently. Furthermore, we construct multi-grained alignment between the semantic index representation and the textual feature representation of the same item, to facilitate the understanding of the newly introduced semantic tokens.

4.3.2. Sequential Recommendation

Existing LLM-based generative recommender systems take the user historical interaction sequence as input and output an item index as the prediction result. Semantic index based methods, including TIGER, LC-Rec, and LETTER, all neglect the high-order user-item interaction patterns, such as the user co-purchase pattern and the user preference pattern, and provide next-item prediction solely based on the interacted item sequence, which contains only the local item co-occurrence pattern. CLLM4Rec takes user comments into consideration and exploits the user preference to some extent. However, the user co-purchase pattern is also overlooked in CLLM4Rec, and the vanilla user index fails to capture user similarity, leading to moderate recommendation performance. In this paper, we propose to tokenize the user representation into a semantic user index and customize several tasks for exploiting high-order user-item interaction patterns, promoting the TTDS recommender's modeling of collaborative knowledge.

5. Experiment

In this section, we present and analyze the sequential recommendation performance of the TTDS recommender on three public datasets. Furthermore, we investigate the effectiveness of the proposed innovations in the TTDS recommender and the superiority of the dynamic semantic index. Finally, we describe several important and intriguing characteristics of the TTDS recommender.

5.1. Experimental Setup

5.1.1. Dataset

We evaluate the proposed TTDS recommender and all baseline models on public benchmarks from the Amazon Product Review dataset (He and McAuley, 2020), containing user review data from May 1996 to October 2018. Particularly, we extract three categories for the sequential recommendation task, i.e., "Musical Instruments", "Arts, Crafts and Sewing", and "Video Games". Within the above datasets, each item is associated with a series of textual contents, including the item title, the detailed description, the item category, and so on. Similarly, each user is associated with textual contents including the user comments, the search queries, and so on. Following the standard procedure, inactive users/items with fewer than 5 interactions are filtered out, and the user interaction sequence is created in chronological order. The dataset statistics and detailed pre-processing are presented in Appendix A.

5.1.2. Baseline

To comprehensively demonstrate the multifaceted superiority of TTDS recommender, the evaluation includes the following baselines based on various methodologies.

  • MLP-Based: FMLP-Rec (Zhou et al., 2022) proposes an all-MLP model with learnable filters for sequential recommendation, ensuring efficiency and reducing noise signals.

  • CNN-Based: Caser (Yuan et al., 2019) captures user behaviors by applying horizontal and vertical convolutional filters.

  • RNN-Based: HGN (Ma et al., 2019) utilizes hierarchical gating networks to capture both long-term and short-term user interests from historical behaviors. GRU4Rec (Jannach and Ludewig, 2017) is a sequential recommendation model that utilizes the GRU (Chung et al., 2014) to encode the item sequence.

  • GNN-Based: SR-GNN (Wu et al., 2019) models session sequences as graph structured data, to capture complex transitions of items. GC-SAN (Xu et al., 2019) utilizes both graph neural network and self-attention mechanism, for session-based recommendation.

  • Transformer-Based: BERT4Rec (Sun et al., 2019) adopts a bidirectional Transformer model and combines it with a masked prediction task for item sequence modeling. SASRec (Kang and McAuley, 2018) exploits a unidirectional Transformer model to encode the item sequences and predict the next item. FDSA (Zhang et al., 2019b) focuses on the transition patterns between item features, modeling both item-level and feature-level sequences separately through self-attention networks. S3-Rec (Zhou et al., 2020) utilizes mutual information maximization to pre-train a self-supervised sequential recommendation model, learning the correlation between items and attributes.

  • PLM/LLM-Based: P5-CID (Geng et al., 2022) organizes multiple recommendation tasks in a text-to-text format and models different tasks uniformly using the T5 (Raffel et al., 2020) model. TIGER (Rajput et al., 2023) adopts the generative retrieval paradigm for sequential recommendation and introduces semantic IDs to uniquely identify items. CLLM4Rec (Zhu et al., 2024) proposes a mixed prompting strategy based on heterogeneous tokens to fulfill the sequential recommendation task. LC-Rec (Zheng et al., 2024) extends the TIGER framework and further adopts LlaMA (Touvron et al., 2023) as the recommender backbone. Moreover, LC-Rec also designs several semantic index oriented fine-tuning tasks.

5.1.3. Evaluation Strategy

We adopt the Top-K Hit-Rate (HR@K) with K = 1, 5, 10 and the Normalized Discounted Cumulative Gain (NDCG@K) with K = 5, 10 to evaluate the sequential recommendation performance. Following the standard setting, the leave-one-out strategy is adopted. Specifically, the most recent item serves as the evaluation data, the second most recent item serves as the validation data, and the remaining interacted items form the training data. As an end-to-end generative recommender system, TTDS performs full ranking evaluation over the whole item set, instead of sample-based or candidate-based evaluation.
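For reference, under the leave-one-out protocol with a single held-out target item per user, HR@K and NDCG@K reduce to the following; `ranked_lists` holds the full ranking over the whole item set for each user:

```python
import math

def hit_rate_at_k(ranked_lists, targets, k):
    """HR@K: fraction of users whose held-out item is in the top-k."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets)
               if t in ranked[:k])
    return hits / len(targets)

def ndcg_at_k(ranked_lists, targets, k):
    """NDCG@K with one relevant item: 1/log2(rank+1) if the target is
    ranked within the top-k, else 0 (the ideal DCG equals 1)."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        if t in ranked[:k]:
            total += 1.0 / math.log2(ranked.index(t) + 2)  # 0-indexed
    return total / len(targets)
```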

5.1.4. Implementation Detail

We employ the open source large language model LlaMA (Touvron et al., 2023) developed by Meta and introduce the low-rank adaptation technique (Hu et al., 2022) for LlaMA fine-tuning. The representation dimension of the LlaMA model is 4096. A language model head is appended to translate the hidden representation into the extended vocabulary. For the twin-tower dynamic semantic token generator, the semantic index length $M$ is set to 4, and thus both the item and the user branch have 4 hierarchically connected codebooks, respectively. Each codebook consists of 256 quantization embeddings whose dimension is set to 256. The encoder module within the semantic token generator contains a series of linear layers $[4096, 2048, 1024, 512, 256]$, gradually projecting the LlaMA representation into the codebook hidden space. We prepare a warm-up phase with a learning rate of 1e-3 for the twin-tower dynamic semantic token generator, by freezing the LlaMA-based recommender. For the TTDS recommender training phase, we adopt the AdamW (Kingma and Ba, 2015) optimizer with a learning rate of 5e-4.
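A sketch of the described encoder stack; the layer dimensions follow the text, while the activation function between layers is our assumption:

```python
import torch.nn as nn

# Linear stack [4096, 2048, 1024, 512, 256] projecting the LlaMA
# representation into the codebook hidden space (ReLU is assumed).
encoder = nn.Sequential(
    nn.Linear(4096, 2048), nn.ReLU(),
    nn.Linear(2048, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 256),
)
```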

5.2. Main Result

In this section, we compare the proposed TTDS recommender with the leading baselines in sequential recommendation. The overall evaluation results are presented in Table 1.

In comparing the performance of various recommendation algorithms, it is evident that those which integrate item content data, such as FDSA and S3-Rec, outperform the conventional methods that depend solely on user-item interactions and collaborative filtering, exemplified by Caser, HGN, GRU4Rec, BERT4Rec, SASRec, and FMLP-Rec. This suggests that supplementing the recommendation process with item content can significantly enhance its efficacy. Furthermore, P5-CID and TIGER have shown promising results, particularly on the first two datasets, where they excel in metrics like HR@1 and NDCG, which are crucial for evaluating item ranking. However, when applied to the Games dataset, while they do show some improvement over the ID-only models, their performance does not markedly surpass that of methods already incorporating auxiliary content. This discrepancy may be attributed to the varying impact of content information and the inherent challenges in effectively modeling it across different types of data and scenarios.

Our model demonstrates consistently superior performance across three datasets, significantly outperforming the baseline methods. This excellence can be traced back to two key factors: (1) The item indexing mechanism, which employs vector quantization alongside uniform semantic mapping. This approach adeptly captures the nuances of item similarities and guarantees a semantically accurate generation process at the final index stage, ensuring no loss of semantic information. (2) The seamless integration of collaborative semantics into Large Language Models (LLMs). This integration harmonizes language semantics with collaborative semantics, creating a cohesive and effective recommendation system. By implementing these strategies, our model harnesses the robust predictive power of LLMs, leading to substantial enhancements in the accuracy and effectiveness of the recommendation task.

Table 2. Ablation study on the architecture of the Twin-Tower Dynamic Semantic recommender model. Baseline is the best performance achieved by baseline models.
Metric HR@1 HR@5 HR@10 NDCG@5 NDCG@10
TTDS 0.0711 0.1018 0.1281 0.0864 0.0947
STDS 0.0701 0.1010 0.1230 0.0860 0.0929
TTSS 0.0621 0.0828 0.1005 0.0727 0.0784
TTDS-DM 0.0685 0.0986 0.1224 0.0835 0.0911
STDS-DM 0.0610 0.0919 0.1159 0.0769 0.0845
TTDS-HO 0.0696 0.1007 0.1236 0.0852 0.0926
Baseline 0.0608 0.0863 0.1136 0.0738 0.0803

5.3. Ablation Study

5.3.1. Architecture

To demonstrate the effectiveness of the proposed innovations, including the dynamic knowledge fusion framework, the dual-modality variational auto-encoder, and the high-order user-item interaction pattern specific tasks, we design an ablation study to investigate the contribution of each module. In the ablation study, TTDS refers to the twin-tower dynamic semantic recommender, which is the unabridged model; STDS refers to the single-tower dynamic semantic recommender that removes the TTDS user branch; TTSS refers to the twin-tower static semantic recommender that removes the dynamic knowledge fusion framework; TTDS-DM removes the DM-VAE module; STDS-DM removes both the DM-VAE module and the TTDS user branch; and TTDS-HO removes the specific tuning tasks for the high-order user-item interaction patterns.

The sequential recommendation performance on Amazon Instruments dataset is presented in Table 2. The data presented in Table 2 reveals that progressively integrating various semantic alignment tasks into the sequential recommendation process, which was initially based solely on collaborative semantics, leads to a notable enhancement in performance. The inclusion of these instruction tuning tasks has proven to be advantageous for boosting the effectiveness of sequential recommendations. Furthermore, there is an indication that performance could be further optimized by incorporating an even greater number of semantic alignment tasks into the model.

Table 3. Ablation study on the index mechanism of the Twin-Tower Dynamic Semantic recommender model. Baseline is the best performance achieved by baseline models.
Metric HR@1 HR@5 HR@10 NDCG@5 NDCG@10
TTDS 0.0711 0.1018 0.1281 0.0864 0.0947
TTDS-3 0.0686 0.1004 0.1222 0.0844 0.0914
TTDS-5 0.0697 0.1007 0.1233 0.0853 0.0926
TTSS 0.0621 0.0828 0.1005 0.0727 0.0784
TT-DLSH 0.0677 0.0976 0.1228 0.0829 0.0909
TT-SLSH 0.0660 0.0918 0.1131 0.0790 0.0859
TT-SR 0.0110 0.0405 0.0713 0.0251 0.0349
Baseline 0.0608 0.0863 0.1136 0.0738 0.0803

5.3.2. Indexing Mechanism

Furthermore, we investigate the impact of different index mechanisms. The dynamic semantic index adopted by the TTDS recommender is able to sufficiently integrate the semantic and collaborative knowledge. To comprehensively justify its superiority, we introduce a dynamic locality sensitive hashing (LSH) index, a static LSH index, and a static random index, denoted as TT-DLSH, TT-SLSH, and TT-SR, respectively. Since the TTDS recommender in the main experiment adopts a dynamic semantic index consisting of 4 tokens, we further train two comparison versions of the TTDS recommender with dynamic semantic index lengths of 3 and 5, denoted as TTDS-3 and TTDS-5, respectively. Note that the variant TTSS, which removes the dynamic knowledge fusion framework, adopts the static semantic index and also serves as a contrastive model.

The sequential recommendation performance on the Amazon Instruments dataset is presented in Table 3. We can draw the following observations from the presented results. First, representing each item/user with 4 semantic tokens (i.e., TTDS in the main experiment) is appropriate for such a dataset volume (tens of thousands). TTDS-3, with a shorter semantic index, may suffer from inadequate expressiveness, while TTDS-5, with a larger representation space, increases the difficulty of target item generation. Second, comparing the TTDS recommender with the variant TTSS that adopts the static semantic index, the superiority of the dynamic semantic index optimized by the dynamic knowledge fusion framework is significant. The outperformance of TT-DLSH over the static variant TT-SLSH further verifies the effectiveness of the proposed dynamic knowledge fusion framework for the locality sensitive hashing index as well. Third, comparing the semantic index based recommenders and the LSH index based recommenders, we note that under the static index mechanism, the LSH index outperforms the semantic index, since the hyperplane projection in locality sensitive hashing is more stable than the vector quantization technique. Conversely, under the dynamic index mechanism, the capacity of the semantic index is sufficiently exploited, and thus the TTDS recommender achieves significant improvement over the LSH index based variant.

6. Conclusion

In this research, we propose the Twin-Tower Dynamic Semantic Recommender (TTDS), a pioneering generative Recommender System (RS) that employs a dynamic semantic indexing approach to address the aforementioned challenges holistically. This innovative system is the first of its kind to incorporate a dynamic knowledge fusion framework, which includes a twin-tower semantic token generator within a Large Language Model (LLM)-based recommendation engine. It assigns a hierarchical semantic index to both items and users, enabling the prediction of the target item’s semantic index with precision. Additionally, we introduce a dual-modality variational auto-encoder designed to enhance the alignment between semantic and collaborative knowledge at multiple levels of granularity. To capitalize on user historical behavior, we have also developed a suite of novel tuning tasks specifically tailored to capture complex user-item interaction patterns. Our extensive experimental evaluation across three public datasets has confirmed the efficacy of our proposed methodology in advancing LLM-based generative RSs. The TTDS recommender has achieved remarkable results, with an average improvement of 19.41% in Hit-Rate and 20.84% in the Normalized Discounted Cumulative Gain metric over the leading baseline methods, showcasing its potential in the field of recommendation systems.

References

  • Bao et al. (2023) Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation. In Proceedings of the ACM Conference on Recommender Systems.
  • Chung et al. (2014) Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Proceedings of the International Conference on Neural Information Processing System.
  • Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for Youtube Recommendations. In Proceedings of the ACM conference on recommender systems.
  • Cui et al. (2022) Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022. M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems. arXiv:2205.08084
  • Davidson et al. (2010) James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, and Dasarathi Sampath. 2010. The YouTube Video Recommendation System. In Proceedings of the ACM Conference on Recommender Systems.
  • de Souza Pereira Moreira et al. (2021) Gabriel de Souza Pereira Moreira, Sara Rabhi, Jeong Min Lee, Ronay Ak, and Even Oldridge. 2021. Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation. In Proceedings of the ACM conference on recommender systems.
  • Fang et al. (2020) Hui Fang, Danning Zhang, Yiheng Shu, and Guibing Guo. 2020. Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations. ACM Transactions on Information Systems 39(1) (2020).
  • Geng et al. (2022) Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2022. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). In Proceedings of the ACM conference on recommender systems.
  • Gomez-Uribe and Hunt (2016) Carlos Alberto Gomez-Uribe and Neil Hunt. 2016. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems 6(4) (2016).
  • Guo et al. (2020) Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. In Proceedings of the International Conference on Machine Learning.
  • He et al. (2017) Ruining He, Wang-Cheng Kang, and Julian McAuley. 2017. Translation-based Recommendation. In Proceedings of the ACM Conference on Recommender Systems.
  • He and McAuley (2016) Ruining He and Julian McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In IEEE International Conference on Data Mining.
  • He and McAuley (2020) Ruining He and Julian McAuley. 2020. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the International Conference on World Wide Web.
  • Hou et al. (2023) Yupeng Hou, Zhankui He, Julian McAuley, and Wayne Xin Zhao. 2023. Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders. In Proceedings of the ACM Web Conference.
  • Hou et al. (2022) Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, and Ji-Rong Wen. 2022. Towards Universal Sequence Representation Learning for Recommender Systems. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
  • Hou et al. (2024) Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, and Wayne Xin Zhao. 2024. Large Language Models are Zero-Shot Rankers for Recommender Systems. In Proceedings of the Advances in Information Retrieval: European Conference on Information Retrieval Engineering.
  • Hu et al. (2022) Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the International Conference on Learning Representations.
  • Hua et al. (2023) Wenyue Hua, Shuyuan Xu, Yingqiang Ge, and Yongfeng Zhang. 2023. How to Index Item IDs for Recommendation Foundation Models. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region.
  • Huh et al. (2023) Minyoung Huh, Brian Cheung, Pulkit Agrawal, and Phillip Isola. 2023. Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks. In Proceedings of the International Conference on Machine Learning.
  • Jannach and Ludewig (2017) Dietmar Jannach and Malte Ludewig. 2017. When Recurrent Neural Networks meet the Neighborhood for Session-Based Recommendation. In Proceedings of the ACM Conference on Recommender Systems.
  • Jin et al. (2024b) Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, and Xianfeng Tang. 2024b. Language Models as Semantic Indexers. In Proceedings of the International Conference on Machine Learning.
  • Jin et al. (2024a) Mengqun Jin, Zexuan Qiu, Jieming Zhu, Zhenhua Dong, and Xiu Li. 2024a. Contrastive Quantization based Semantic Code for Generative Recommendation. arXiv:2404.14774
  • Kabbur et al. (2013) Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored Item Similarity Models for Top-N Recommender Systems. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  • Kang and McAuley (2018) Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. In Proceedings of the IEEE International Conference on Data Mining.
  • Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations.
  • Li et al. (2017) Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In Proceedings of the ACM on Conference on Information and Knowledge Management.
  • Li et al. (2023) Jiacheng Li, Ming Wang, Jin Li, Jinmiao Fu, Xin Shen, Jingbo Shang, and Julian McAuley. 2023. Text Is All You Need: Learning Language Representations for Sequential Recommendation. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
  • Linden et al. (2003) Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing 7(1) (2003).
  • Liu et al. (2024) Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, and Liqiang Nie. 2024. MMGRec: Multimodal Generative Recommendation with Transformer Model. arXiv:2404.16555
  • Ma et al. (2019) Chen Ma, Peng Kang, and Xue Liu. 2019. Hierarchical Gating Networks for Sequential Recommendation. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
  • Radford et al. (2018) Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Technical Report, OpenAI.
  • Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. The Journal of Machine Learning Research 21(1) (2020).
  • Rajput et al. (2023) Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Keshavan, Trung Vu, Lukasz Heidt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, and Maheswaran Sathiamoorthy. 2023. Recommender Systems with Generative Retrieval. In Proceedings of the International Conference on Neural Information Processing Systems.
  • Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
  • Sun et al. (2019) Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of the ACM International Conference on Information and Knowledge Management.
  • Tan et al. (2016) Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved Recurrent Neural Networks for Session-based Recommendations. In Proceedings of the Workshop on Deep Learning for Recommender Systems.
  • Tang and Wang (2018) Jiaxi Tang and Ke Wang. 2018. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In Proceedings of the ACM International Conference on Web Search and Data Mining.
  • Tay et al. (2022) Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, and Donald Metzler. 2022. Transformer Memory as a Differentiable Search Index. In Proceedings of the International Conference on Neural Information Processing Systems.
  • Touvron et al. (2023) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, and Faisal Azhar. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971
  • Wang et al. (2024) Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. 2024. Learnable Item Tokenization for Generative Recommendation. arXiv:2405.07314
  • Wang et al. (2019) Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval.
  • Wang et al. (2022) Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Allen Sun, Weiwei Deng, Qi Zhang, and Mao Yang. 2022. A Neural Corpus Indexer for Document Retrieval. In Proceedings of the International Conference on Neural Information Processing Systems.
  • Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based Recommendation with Graph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Xu et al. (2019) Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, and Xiaofang Zhou. 2019. Graph Contextualized Self-Attention Network for Session-based Recommendation. In Proceedings of the International Joint Conference on Artificial Intelligence.
  • Yuan et al. (2019) Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose, and Xiangnan He. 2019. A Simple Convolutional Generative Network for Next Item Recommendation. In Proceedings of the ACM International Conference on Web Search and Data Mining.
  • Zeghidour et al. (2021) Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. 2021. SoundStream: An End-to-End Neural Audio Codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2021).
  • Zhang et al. (2019a) Shuai Zhang, Yi Tay, Lina Yao, Aixin Sun, and Jake An. 2019a. Next Item Recommendation with Self-Attentive Metric Learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Zhang et al. (2019b) Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, and Xiaofang Zhou. 2019b. Feature-level Deeper Self-Attention Network for Sequential Recommendation. In Proceedings of the International Joint Conference on Artificial Intelligence.
  • Zheng et al. (2024) Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, and Ji-Rong Wen. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In Proceedings of the IEEE International Conference on Data Engineering.
  • Zhou et al. (2020) Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization. In Proceedings of the ACM International Conference on Information and Knowledge Management.
  • Zhou et al. (2022) Kun Zhou, Hui Yu, Wayne Xin Zhao, and Ji-Rong Wen. 2022. Filter-enhanced MLP is All You Need for Sequential Recommendation. In Proceedings of the ACM Web Conference.
  • Zhu et al. (2024) Yaochen Zhu, Liang Wu, Qi Guo, Liangjie Hong, and Jundong Li. 2024. Collaborative Large Language Model for Recommender Systems. In Proceedings of the ACM on Web Conference.