
Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering

Yi Zhang, Anhui University, Hefei, China, [email protected]; Lei Sang, Anhui University, Hefei, China, [email protected]; and Yiwen Zhang, Anhui University, Hefei, China, [email protected]
(2024)
Abstract.

Intent modeling has attracted widespread attention in recommender systems. As the core motivation behind user selection of items, intent is crucial for elucidating recommendation results. The current mainstream modeling method is to abstract the intent into unknowable but learnable shared or non-shared parameters. Despite considerable progress, we argue that it still confronts the following challenges: firstly, these methods only capture the coarse-grained aspects of intent, ignoring the fact that user-item interactions will be affected by collective and individual factors (e.g., a user may choose a movie because of its high box office or because of his own unique preferences); secondly, modeling believable intent is severely hampered by implicit feedback, which is incredibly sparse and devoid of true semantics. To address these challenges, we propose a novel recommendation framework designated as Bilateral Intent-guided Graph Collaborative Filtering (BIGCF). Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual intent—which signifies private preferences—and collective intent—which denotes overall awareness. To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy, and we implement the recommendation process through bilateral intent-guided graph reconstruction re-sampling. Finally, we propose graph contrastive regularization for both interaction and intent spaces to uniformize users, items, intents, and interactions in a self-supervised and non-augmented paradigm. Experimental results on three real-world datasets demonstrate the effectiveness of BIGCF compared with existing solutions.

Recommender System, Collaborative Filtering, Intent Modeling, Graph Neural Network, Self-Supervised Learning
copyright: acmlicensed; journal year: 2024; doi: 10.1145/3626772.3657738; conference: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14–18, 2024, Washington, DC, USA; isbn: 979-8-4007-0431-4/24/07; ccs: Information systems → Recommender systems

1. Introduction

Figure 1. (a) A simplified user-item interaction bipartite graph in a movie recommendation scenario; (b) a disentangled interaction graph incorporating user intents, where user $u_1$'s choice of film $i_3$ is influenced by intents $c_1$ and $c_2$, indicating that he likes horror films with action elements; (c) a user-item causal graph considering collective and individual attributes.

Personalized recommender systems (Ricci et al., 2011) are the cornerstone of contemporary E-platforms, suggesting items to users that suit their needs. In terms of technology, recommender systems are built on the concept of collaborative filtering (CF) (Rendle et al., 2009), which aims to infer user preferences from historical interaction data (He et al., 2017). The main focus of current CF models is to develop high-quality embedding representations for both users and items. To this end, embedding modeling for CF has been developed in a multifaceted way with a series of results, such as matrix factorization (Rendle et al., 2009), multilayer perceptrons (He et al., 2017), graph convolutional networks (GCNs) (Ying et al., 2018; Wang et al., 2019; He et al., 2020), and graph contrastive learning (GCL) (Wu et al., 2021; Lin et al., 2022; Ren et al., 2023). Given that user-item interactions naturally form a bipartite graph (Fig. 1(a)), the latest line of study investigates the demonstrated benefits of GCNs (Kipf and Welling, 2017) for recommender systems. Concretely, LightGCN (He et al., 2020) has become the standard practice in graph-based recommender systems, but recent extensive studies strongly suggest that it is rapidly giving way to GCL (You et al., 2020; Wu et al., 2021).

Despite the impressive results achieved by these methods, we argue that these attempts fall short of providing a more detailed explanation of user behavior. In contrast to the unintentional actions of animals, user behavior is constantly motivated by multiple factors and is susceptible to social influences (Wu et al., 2019). In a nutshell, there are real intents hidden behind the user’s choice of items (Chang et al., 2023). Intent for recommendation has been investigated in some groundbreaking studies. For example, DGCF (Wang et al., 2020a) and KGIN (Wang et al., 2021) aim to learn multiple disentangled intent representations for users, whereas DCCF (Ren et al., 2023) also takes item-side intent modeling into account. We note that these efforts provide interpretability to the recommendation results while maintaining competitive performance, but do not fully reflect the user’s individual preferences.

For a deeper understanding of the interactions, we show a refined disentangled interaction graph in Fig. 1(b), where user behavior can be divided into two aspects. On the one hand, a user's behavior is not isolated but influenced by others (Wu et al., 2019), which is also an essential requirement of collaborative filtering (He et al., 2017). Based on this fact, the set of intents $\{c_1,c_2,c_3\}$ in Fig. 1(b) is not unique to user $u_1$ but is shared by all users. For example, the fact that a group of users $\{u_1,u_2\}$ like movies with horror themes suggests that their behavior comes from a collective intent $c_1$. This is referred to as the Bandwagon Effect in social psychology terminology (Knyazev and Oosterhuis, 2022). On the other hand, user behavior also exhibits individualism (Chen et al., 2017). As illustrated in Fig. 1(b), user $u_1$ not only enjoys horror movies but also enjoys horror movies with action elements. In fact, the individual preference of user $u_1$ corresponds to an intent set $\{c_1,c_2\}$.

Considering that items are not subjective, there are barriers to defining the intent of an item. Motivated by (Ma et al., 2019b; Ren et al., 2023), we can perceive item intent as an attribute, i.e., the reason the user selects it. Similar to users, items can likewise be portrayed in two aspects: the popularity effect and individual characteristics. In general, the Popularity Effect reflects the affinity of items (Wei et al., 2021), which corresponds to the bandwagon effect on the user side. The greater an item's popularity within the user community, the higher the likelihood of it being selected. The more substantial factors influencing a user's decision to pick an item, however, are its individual characteristics. For example, in Fig. 1(b), movie $i_1$ has a thriller theme, while movie $i_3$ has thriller and action themes.

Based on the aforementioned study, the interactions between users and items are influenced by both collective and individual factors, which can be abstracted into the fine-grained causal graph shown in Fig. 1(c). To achieve accurate recommendations with intents, it is essential to consider these factors at a finer granularity. However, doing so faces the following two challenges:

• How to consider both collective and individual factors for users (items) and realize adaptive trade-offs?
• How to model user (item) intents using only sparse and semantic-free interaction data?

To tackle the above challenges, we propose a novel Bilateral Intent-guided Graph Collaborative Filtering (BIGCF) framework for implicit feedback-based recommender systems. Considering the sparsity of implicit feedback, we convert the graph recommendation problem into a graph generation task and encode the feature distributions of users and items via iterative GCNs. These feature distributions are regarded as user preferences and item characteristics, respectively. Furthermore, the bandwagon and popularity effects are abstracted as the Collective Intents on the user and item sides, respectively. Subsequently, we learn disentangled Individual Intents for users and items by combining the collective intents with the user preference and item characteristic distributions. By definition, the collective intents are learnable parameters shared by all users (items), whereas the individual intents are linear combinations of user preferences (item characteristics) with collective intents. Finally, inspired by the recent motivating effects of GCL in recommender systems (You et al., 2020; Wu et al., 2021), we propose augmentation-agnostic Graph Contrastive Regularization in both the interaction and intent spaces to attain uniformity and alignment for all nodes on the interaction graph in a self-supervised manner. Overall, BIGCF is an extremely simplified framework for personalized recommender systems that takes into account the individual and collective factors of user-item interactions and enhances recommendation performance through bilateral intent modeling. The major contributions of this paper are summarized as follows:

• We decompose the motivations behind user-item interactions into collective and individual factors, and propose the recommendation framework BIGCF, which focuses on exploring the individuality and collectivity of intents on the interaction graph.
• We further propose graph contrastive regularization in both the interaction and intent spaces, which regulates the uniformity of the whole feature space in an augmentation-agnostic manner and achieves collaborative cross-optimization in the dual spaces.
• We conduct extensive experiments on three public datasets and show that BIGCF not only significantly improves recommendation performance but can also effectively explore users' true preferences.

Figure 2. The complete framework of the proposed BIGCF. BIGCF consists of high-order graph structure encoding, bilateral intent-guided graph reconstruction, and a graph contrastive regularization process in dual spaces.

2. METHODOLOGY

2.1. Problem Formulation

A typical recommendation scenario includes a set of $M$ users $\mathcal{U}=\{u_1,u_2,\dots,u_M\}$ and a set of $N$ items $\mathcal{I}=\{i_1,i_2,\dots,i_N\}$. Furthermore, a user-item interaction matrix $\mathbf{R}\in\mathbb{R}^{M\times N}$ is given according to the historical user-item interactions. Based on previous works (Wang et al., 2019; He et al., 2020), the interactions can be abstracted into a bipartite graph structure $\mathcal{G}=\langle\mathcal{V}=\{\mathcal{U},\mathcal{I}\},\mathcal{E}\rangle$, where $\mathcal{E}_{ui}=\mathcal{E}_{iu}=R_{ui}$. Therefore, the recommendation task can also be considered as a bilateral graph generation problem (Truong et al., 2021), i.e., predicting the probability of the existence of edges on the user-item interaction graph $\mathcal{G}$:

(1) $\mathbb{P}(\hat{\mathbf{R}}|\mathbf{E}^{(0)},\mathbf{R})=\prod_{u\in\mathcal{U}}\prod_{i\in\mathcal{I}}\mathbb{P}(\hat{R}_{ui}|\mathbf{e}_u,\mathbf{e}_i),$

where $\mathbf{E}^{(0)}=\{\mathbf{E}^{(0)}_{\mathcal{U}},\mathbf{E}^{(0)}_{\mathcal{I}}\}$ is the initial interaction embedding table, and $\mathbf{e}_u$ and $\mathbf{e}_i$ are the encoded embeddings of user $u$ and item $i$. The widely adopted graph recommendation paradigm models user preferences by leveraging interactions, a process that relies solely on the interaction graph $\mathcal{G}$ (He et al., 2020; Zhang et al., 2024). However, user behavior is often driven by a variety of intents, which are complex and intermingled: some users gravitate toward horror movies while others have alternatives. At the probabilistic level, the interaction probability therefore requires additional consideration of the effect of intents $\mathbf{C}$:

(2) $\mathbb{P}(\hat{\mathbf{R}}|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C})=\prod_{u\in\mathcal{U}}\prod_{i\in\mathcal{I}}\mathbb{P}(\hat{R}_{ui}|\mathbf{e}_u,\mathbf{e}_i)\times\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{e}_u|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{U}}^{k})\,\mathbb{P}(\mathbf{e}_i|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{I}}^{k}),$

where $\mathbf{C}=\{\mathbf{C}_{\mathcal{U}}^{k},\mathbf{C}_{\mathcal{I}}^{k}\,|\,k\in\mathcal{K}\}$ is the collective intent table for all user and item nodes, of which the number is $|\mathcal{K}|$. The intents $\mathbf{C}$ are shared by all users and items so as to adequately model the collectivity of user preferences and item characteristics. As described in the Introduction, the user intents $\mathbf{C}_{\mathcal{U}}$ represent the bandwagon effect with group volition, indicating the common motivations for choosing items, while the item intents $\mathbf{C}_{\mathcal{I}}$ represent the popularity effect, indicating the common reasons why an item is selected. Fig. 2 presents the framework of BIGCF, which is a refined version of the user-item causal graph given in Fig. 1(c).

2.2. BIGCF

To fully understand user (item) intents and make recommendations, we need to model the interaction and intent embeddings for user $u$ and item $i$, and construct the final embedding representations $\mathbf{e}_u$ and $\mathbf{e}_i$ used for recommendation, i.e., $\mathbb{P}(\hat{R}_{ui}|\mathbf{e}_u,\mathbf{e}_i)$. Considering the sparsity and semantic-free nature of implicit feedback, we represent the embeddings $\mathbf{e}_u$ and $\mathbf{e}_i$ as Gaussian distributions with the support of variational inference (Kingma and Welling, 2013; Liang et al., 2018):

(3) $\mathbb{P}(\mathbf{e}_u|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{U}})\sim\mathcal{N}\left(\mathbf{e}_u|\boldsymbol{\mu}_u,\text{diag}[\boldsymbol{\sigma}^2_u]\right),\quad\mathbb{P}(\mathbf{e}_i|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{I}})\sim\mathcal{N}\left(\mathbf{e}_i|\boldsymbol{\mu}_i,\text{diag}[\boldsymbol{\sigma}^2_i]\right),$

where $\boldsymbol{\mu}_u$ and $\boldsymbol{\sigma}^2_u$ are the approximate mean and variance of the isotropic Gaussian distribution of node $u$, respectively.

2.2.1. High-order Graph Structure Encoding

Many studies have demonstrated that the GCN-based embedding encoding process (Kipf and Welling, 2017) can fully utilize the rich collaborative relationships on the user-item interaction graph $\mathcal{G}$ (He et al., 2020; Wu et al., 2021). Considering that BIGCF is a model-agnostic recommendation framework, we empirically employ the lightweight GCN (He et al., 2020) to encode high-order collaborative relationships:

(4) $\mathbf{e}_{u,\boldsymbol{\mu}}^{(l)}=\sum_{i\in\mathcal{S}_u}\frac{1}{\sqrt{|\mathcal{S}_u||\mathcal{S}_i|}}\mathbf{e}_{i,\boldsymbol{\mu}}^{(l-1)},\quad\mathbf{e}_{i,\boldsymbol{\mu}}^{(l)}=\sum_{u\in\mathcal{S}_i}\frac{1}{\sqrt{|\mathcal{S}_u||\mathcal{S}_i|}}\mathbf{e}_{u,\boldsymbol{\mu}}^{(l-1)},$

where $\mathbf{e}_{u,\boldsymbol{\mu}}^{(l)}$ and $\mathbf{e}_{i,\boldsymbol{\mu}}^{(l)}$ are the $l$-th layer ($l\in[1,L]$) user and item structural embeddings, respectively (we define $\mathbf{e}_{u,\boldsymbol{\mu}}^{(0)}=\mathbf{e}_u^{(0)}$ and $\mathbf{e}_{i,\boldsymbol{\mu}}^{(0)}=\mathbf{e}_i^{(0)}$), and $\mathcal{S}_u$ and $\mathcal{S}_i$ are the first-order receptive fields of user $u$ and item $i$, respectively. From a macroscopic view, the process is a cumulative multiplication of the graph Laplacian matrix: $\mathbf{E}_{\boldsymbol{\mu}}^{(l)}=\left(\mathbf{D}^{-0.5}\mathbf{A}\mathbf{D}^{-0.5}\right)^{l}\mathbf{E}^{(0)}$, where $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{A}$, and $\mathbf{A}\in\mathbb{R}^{(M+N)\times(M+N)}$ is the adjacency matrix of $\mathcal{G}$. After $L$ layers of propagation, we empirically construct the final structural embeddings by sum pooling for speed: $\mathbf{e}_{u,\boldsymbol{\mu}}=\sum_{l=0}^{L}\mathbf{e}_{u,\boldsymbol{\mu}}^{(l)}$ and $\mathbf{e}_{i,\boldsymbol{\mu}}=\sum_{l=0}^{L}\mathbf{e}_{i,\boldsymbol{\mu}}^{(l)}$. Considering the variability of neighbor contributions to the central node at each order on the interaction graph $\mathcal{G}$, every layer is included in the computation of $\mathbf{E}_{\boldsymbol{\mu}}=\{\mathbf{E}_{\mathcal{U},\boldsymbol{\mu}},\mathbf{E}_{\mathcal{I},\boldsymbol{\mu}}\}$ to capture user preferences and item characteristics with different structural semantics.
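To make the encoder concrete, the following is a minimal PyTorch sketch of Eq. 4, assuming a precomputed normalized adjacency $\mathbf{D}^{-0.5}\mathbf{A}\mathbf{D}^{-0.5}$ stored as a sparse tensor norm_adj and an initial embedding table E0; the function and variable names are illustrative rather than taken from the released code.

```python
import torch

def encode_structure(E0: torch.Tensor, norm_adj: torch.Tensor, L: int = 2) -> torch.Tensor:
    """Eq. 4: L layers of LightGCN-style propagation, then sum pooling."""
    layer_embs = [E0]  # layer 0 is the initial embedding table
    for _ in range(L):
        # E^(l) = (D^-0.5 A D^-0.5) E^(l-1), applied to the stacked user/item table
        layer_embs.append(torch.sparse.mm(norm_adj, layer_embs[-1]))
    # e_mu = sum_{l=0}^{L} e^(l): the structural embeddings E_mu
    return torch.stack(layer_embs, dim=0).sum(dim=0)
```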

2.2.2. Bilateral Intent-guided Graph Reconstruction

We now switch perspectives to investigate the correspondence between intents and interactions. Given the unique preferences of user $u$ and the unique characteristics of item $i$, we need to additionally consider the adaptability between intents and interactions. Specifically, we first measure the correlation scores between the structural embeddings $\{\mathbf{e}_{u,\boldsymbol{\mu}},\mathbf{e}_{i,\boldsymbol{\mu}}\}$ and the collective intents $\{\mathbf{C}_{\mathcal{U}},\mathbf{C}_{\mathcal{I}}\}$:

(5) $\mathbb{P}(\mathbf{C}_{\mathcal{U}}^{k}|\mathbf{e}_{u,\boldsymbol{\mu}})=\exp\left(\mathbf{e}_{u,\boldsymbol{\mu}}^{\top}\mathbf{C}_{\mathcal{U}}^{k}/\kappa\right)/\sum_{k^{\prime}}^{\mathcal{K}}\exp\left(\mathbf{e}_{u,\boldsymbol{\mu}}^{\top}\mathbf{C}_{\mathcal{U}}^{k^{\prime}}/\kappa\right),\quad\mathbb{P}(\mathbf{C}_{\mathcal{I}}^{k}|\mathbf{e}_{i,\boldsymbol{\mu}})=\exp\left(\mathbf{e}_{i,\boldsymbol{\mu}}^{\top}\mathbf{C}_{\mathcal{I}}^{k}/\kappa\right)/\sum_{k^{\prime}}^{\mathcal{K}}\exp\left(\mathbf{e}_{i,\boldsymbol{\mu}}^{\top}\mathbf{C}_{\mathcal{I}}^{k^{\prime}}/\kappa\right),$

where $\kappa\in[0,1]$ is a temperature coefficient used to adjust the predicted distribution. In practice, the collective intents $\{\mathbf{C}_{\mathcal{U}},\mathbf{C}_{\mathcal{I}}\}$ are utilized to compute the affinity between the interaction and each intent $k\in\mathcal{K}$, reflecting the individuality and enabling deeper integration of disentangled intents with the graph structure:

(6) $\mathbf{e}_{u,\boldsymbol{\sigma}}=\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{C}_{\mathcal{U}}^{k}|\mathbf{e}_{u,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{U}}^{k},\quad\mathbf{e}_{i,\boldsymbol{\sigma}}=\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{C}_{\mathcal{I}}^{k}|\mathbf{e}_{i,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{I}}^{k},$

where $\mathbf{e}_{u,\boldsymbol{\sigma}}$ and $\mathbf{e}_{i,\boldsymbol{\sigma}}$ are the individual intent embeddings of user $u$ and item $i$, respectively. To obtain an approximate posterior that fits the user's preferences, we regard the encoded structural embeddings and the individual intent embeddings as the variational parameters of the distributions defined in Eq. 3. Considering the additivity of Gaussian distributions (Ho et al., 2020), the varying intents are seen as a combination of approximate Gaussians that adequately fits the individualized profile of user $u$ (or item $i$):

(7) $\mathbb{P}(\mathbf{e}_u|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{U}})=\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{e}_u|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{U}}^{k})\sim\mathcal{N}\left(\mathbf{e}_u\Big|\mathbf{e}_{u,\boldsymbol{\mu}},\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{C}_{\mathcal{U}}^{k}|\mathbf{e}_{u,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{U}}^{k}\right)\sim\mathcal{N}\left(\mathbf{e}_u|\mathbf{e}_{u,\boldsymbol{\mu}},\mathbf{e}_{u,\boldsymbol{\sigma}}\right).$

The item side has a similar definition. The mean $\mathbf{e}_{u,\boldsymbol{\mu}}$ represents the individual preferences of user $u$, which are independent of intents, while the variance $\mathbf{e}_{u,\boldsymbol{\sigma}}$ is a linear combination of distributions generated by multiple intents, which endows the variance with a variety of implications. For example, user $u_1$ in Fig. 1(b) receives larger correlation scores from intents $c_1$ and $c_2$, and intent $c_1$ should have the highest score because the primary premise for choosing this movie is the horror theme. Thus we can assume that $\mathbf{e}_{u_1,\boldsymbol{\sigma}}=0.6c_1+0.3c_2+0.1c_3$.

Given a user $u$ and an item $i$, the mean and variance embeddings of the approximate posterior can be used to sample $\mathbf{e}_u\sim\mathcal{N}(\boldsymbol{\mu}_u,\boldsymbol{\sigma}_u^2)$ and $\mathbf{e}_i\sim\mathcal{N}(\boldsymbol{\mu}_i,\boldsymbol{\sigma}_i^2)$. Considering the non-differentiable nature of the sampling process, the reparameterization trick (Kingma and Welling, 2013) is used here to cleverly circumvent the back-propagation problem:

(8) $\mathbb{P}(\mathbf{e}_u|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{U}})\sim\mathbf{e}_{u,\boldsymbol{\mu}}+\mathbf{e}_{u,\boldsymbol{\sigma}}\odot\boldsymbol{\epsilon}_u,\quad\mathbb{P}(\mathbf{e}_i|\mathbf{E}^{(0)},\mathbf{R},\mathbf{C}_{\mathcal{I}})\sim\mathbf{e}_{i,\boldsymbol{\mu}}+\mathbf{e}_{i,\boldsymbol{\sigma}}\odot\boldsymbol{\epsilon}_i,$

where $\boldsymbol{\epsilon}_u$ and $\boldsymbol{\epsilon}_i$ are auxiliary noise variables sampled from $\mathcal{N}(\mathbf{0},\mathbf{I})$, and $\odot$ denotes the element-wise product. We emphasize the advantages of introducing reparameterization as follows: (1) $\boldsymbol{\epsilon}_u$ and $\boldsymbol{\epsilon}_i$ achieve fusion in an adaptive manner, thus avoiding tedious manual adjustments; (2) Eq. 8 does not depend on any additional process, such as attention mechanisms. Combining Eqs. 4, 5, 6, 7, and 8, we obtain:

(9) $\mathbf{e}_u=\underbrace{\sum_{l=1}^{L}\sum_{i\in\mathcal{S}_u}\frac{1}{\sqrt{|\mathcal{S}_u||\mathcal{S}_i|}}\mathbf{e}_i^{(l-1)}}_{\text{structural information: }\mathbf{e}_{u,\boldsymbol{\mu}}}+\Big(\underbrace{\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{C}_{\mathcal{U}}^{k}|\mathbf{e}_{u,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{U}}^{k}}_{\text{user intent: }\mathbf{e}_{u,\boldsymbol{\sigma}}}\Big)\odot\boldsymbol{\epsilon}_u,$
$\mathbf{e}_i=\underbrace{\sum_{l=1}^{L}\sum_{u\in\mathcal{S}_i}\frac{1}{\sqrt{|\mathcal{S}_u||\mathcal{S}_i|}}\mathbf{e}_u^{(l-1)}}_{\text{structural information: }\mathbf{e}_{i,\boldsymbol{\mu}}}+\Big(\underbrace{\sum_{k\in\mathcal{K}}\mathbb{P}(\mathbf{C}_{\mathcal{I}}^{k}|\mathbf{e}_{i,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{I}}^{k}}_{\text{item intent: }\mathbf{e}_{i,\boldsymbol{\sigma}}}\Big)\odot\boldsymbol{\epsilon}_i.$

Eq. 9 further highlights the fusion of structural information and intents through the reparameterization process. It is important to note that the intents $\mathbf{C}_{\mathcal{U}}$ ($\mathbf{C}_{\mathcal{I}}$) are common to all users (items) and represent the collectivity of users (items), whereas the variance vector $\mathbf{e}_{u,\boldsymbol{\sigma}}$ ($\mathbf{e}_{i,\boldsymbol{\sigma}}$) is unique and represents the individuality of user $u$ (item $i$) on the interaction graph $\mathcal{G}$.
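Putting Eqs. 5, 6, and 8 together, a hedged PyTorch sketch of the user-side computation follows; e_mu denotes a batch of structural embeddings from Eq. 4 and C_u the $|\mathcal{K}|\times d$ collective intent table, and all names here are assumptions for illustration. The item side is symmetric with the table $\mathbf{C}_{\mathcal{I}}$.

```python
import torch
import torch.nn.functional as F

def intent_guided_sample(e_mu: torch.Tensor, C_u: torch.Tensor, kappa: float = 1.0):
    """e_mu: [B, d] structural embeddings; C_u: [K, d] collective intents."""
    # Eq. 5: correlation scores P(C^k | e_mu) via a temperature-scaled softmax.
    scores = F.softmax(e_mu @ C_u.T / kappa, dim=-1)   # [B, K]
    # Eq. 6: individual intent embeddings as linear combinations of intents.
    e_sigma = scores @ C_u                             # [B, d]
    # Eq. 8: reparameterized sample e = e_mu + e_sigma * eps, eps ~ N(0, I).
    eps = torch.randn_like(e_sigma)
    return e_mu + e_sigma * eps, e_sigma
```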

Based on the fused embeddings, we compute the probability score of the existence of the interaction edge $R_{ui}$ via the inner product:

(10) $\mathbb{P}(\hat{R}_{ui}|\mathbf{e}_u,\mathbf{e}_i)=\text{Sigmoid}(\mathbf{e}_u^{\top}\mathbf{e}_i).$

2.2.3. Graph Contrastive Regularization in Dual Spaces

A number of recent studies provide evidence that self-supervised learning (SSL) supplies extra supervision signals that help ease the data sparsity problem and benefit recommendation tasks (Wu et al., 2021). Contrastive learning (Chen et al., 2020), which builds a pair of views of anchor nodes via data augmentation and maximizes the mutual information between them, is generally the most popular strategy for self-supervised learning. Given an arbitrary node $a$, the widely used contrastive loss infoNCE (Chen et al., 2020) is defined as follows:

(11) $\mathcal{I}(\mathbf{a}^{\prime},\mathbf{a}^{\prime\prime})=-\log\frac{\exp\left(\cos\left(\mathbf{a}^{\prime},\mathbf{a}^{\prime\prime}\right)/\tau\right)}{\sum_{b\in\mathcal{B}}\exp\left(\cos\left(\mathbf{a}^{\prime},\mathbf{b}\right)/\tau\right)},$

where $\mathbf{a}^{\prime}$ and $\mathbf{a}^{\prime\prime}$ are two augmented embeddings of $a$, $\cos(\cdot,\cdot)$ denotes the cosine similarity, and $\tau$ is a predefined temperature coefficient. Note that, for simplicity, the other nodes $\{b\in\mathcal{B}\}$ in a mini-batch are considered negative samples of node $a$ in Eq. 11. Most existing recommendation paradigms adopt various types of data augmentation to perform the SSL task, such as structural (Wu et al., 2021; Yang et al., 2022), feature (Lin et al., 2022), or semantic augmentation (Ren et al., 2023). Nevertheless, these augmentation techniques not only add significantly to the training cost but also have a high propensity to distort the semantic information of the input data. For example, removing key node structures may destroy the original interaction graph $\mathcal{G}$.

Noting that the core of infoNCE lies in uniformizing the feature space (Wang and Isola, 2020; Wang and Liu, 2021), we regard the contrastive process $\mathcal{I}(\mathbf{a},\mathbf{b})$ as a regularization of all nodes $\mathcal{V}$ on the interaction graph $\mathcal{G}$. Specifically, regularization in the interaction space consists of two parts:

• Interaction Regularization (IR) $\mathcal{I}(\mathbf{e}_u,\mathbf{e}_i)$: Considering that the essence of recommendation lies in quantifying user-item similarity, it is necessary to keep user $u$ close to the positive interaction $i$ while keeping user $u$ as far away as possible from the negatives $\{j\in\mathcal{B}/i\}$.
• Homogeneous Node Regularization (HNR) $\mathcal{I}(\mathbf{e}_u,\mathbf{e}_u)$ and $\mathcal{I}(\mathbf{e}_i,\mathbf{e}_i)$: The Matthew effect caused by popularity bias (Wei et al., 2021) also needs to be taken into account to prevent the learned embeddings from concentrating in one direction, which would result in dimensional collapse (Jing et al., 2021). We keep each node itself (e.g., $u$) as its positive sample while pushing different nodes (e.g., $\{v\in\mathcal{B}/u\}$) away from each other.

Based on the above analysis, we propose the graph contrastive regularization process in the interaction space:

(12) $\mathcal{L}_{\text{inter}}=\sum_{<u,i>\in\mathcal{B}}\left(\mathcal{I}(\mathbf{e}_u,\mathbf{e}_i)+\mathcal{I}(\mathbf{e}_u,\mathbf{e}_u)+\mathcal{I}(\mathbf{e}_i,\mathbf{e}_i)\right),$

where $\mathbf{e}_u$ and $\mathbf{e}_i$ are the user and item embeddings obtained via Eq. 9. We emphasize the superiority of Eq. 12 in two respects. Firstly, the loss $\mathcal{L}_{\text{inter}}$ does not depend on any form of data augmentation, which not only significantly reduces the time complexity but also preserves the integrity of the structural information. Secondly, Eq. 12 guarantees alignment between positive interaction pairs while keeping the entire interaction feature space uniform, whereas traditional CL strategies do not take the alignment relationships between users and items into account.

• Bilateral Intent Regularization (BIR) $\mathcal{I}(\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{u,\boldsymbol{\sigma}})$ and $\mathcal{I}(\mathbf{e}_{i,\boldsymbol{\sigma}},\mathbf{e}_{i,\boldsymbol{\sigma}})$: In the intent space, we obtain the individual intent embeddings $\{\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{i,\boldsymbol{\sigma}}\}$ of user $u$ and item $i$ through the collective intents $\{\mathbf{C}_{\mathcal{U}},\mathbf{C}_{\mathcal{I}}\}$. Considering that the intent embeddings $\{\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{i,\boldsymbol{\sigma}}\}$ indicate the different motivations (reasons) for user $u$ (item $i$) to select items (be selected by users), if an intent $k$ can be represented by the other intents $\{k^{\prime}\in\mathcal{K}/k\}$, then intent $k$ may be redundant.

Based on this, we further model disentangled intent representations via graph contrastive regularization process in the intent space:

(13) $\mathcal{L}_{\text{intent}}=\sum_{u\in\mathcal{B}}\mathcal{I}(\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{u,\boldsymbol{\sigma}})+\sum_{i\in\mathcal{B}}\mathcal{I}(\mathbf{e}_{i,\boldsymbol{\sigma}},\mathbf{e}_{i,\boldsymbol{\sigma}}).$

Similar to Eq. 12, $\mathcal{L}_{\text{intent}}$ also does not rely on any form of data augmentation. The graph contrastive regularization process in both the interaction and intent spaces is shown in Fig. 3. Notice that the computational chains of the final interaction embeddings $\{\mathbf{e}_u,\mathbf{e}_i\}$ and the individual intent embeddings $\{\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{i,\boldsymbol{\sigma}}\}$ contain the collective intents $\{\mathbf{C}_{\mathcal{U}},\mathbf{C}_{\mathcal{I}}\}$ and the structural embeddings $\{\mathbf{e}_{u,\boldsymbol{\mu}},\mathbf{e}_{i,\boldsymbol{\mu}}\}$, which allows the proposed graph contrastive regularization process to achieve cross-domain collaboration in the dual spaces. We analyze this further in Subsection 2.4.2; a code sketch of both regularizers follows.
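As a concrete reference, the following is a self-contained PyTorch sketch of Eqs. 11-13, assuming batched final embeddings e_u, e_i (Eq. 9) and individual intent embeddings e_u_sigma, e_i_sigma (Eq. 6), with in-batch nodes serving as negatives; the helper name info_nce and the batching layout are assumptions, and the cross-entropy averages over the batch rather than summing, a common normalization of Eqs. 12 and 13.

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Eq. 11 with cosine similarity; row k of `a` is positive with row k of `b`."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / tau                       # [B, B] pairwise cosine / tau
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)       # positives sit on the diagonal

def gcr_losses(e_u, e_i, e_u_sigma, e_i_sigma, tau: float = 0.2):
    # Eq. 12: interaction space = IR + two HNR terms (the node itself is positive).
    l_inter = (info_nce(e_u, e_i, tau)
               + info_nce(e_u, e_u, tau)
               + info_nce(e_i, e_i, tau))
    # Eq. 13: intent space = BIR terms on the individual intent embeddings.
    l_intent = info_nce(e_u_sigma, e_u_sigma, tau) + info_nce(e_i_sigma, e_i_sigma, tau)
    return l_inter, l_intent
```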

Figure 3. The computational process and information flow of graph contrastive regularization in the interaction space and the intent space. A solid line indicates that the input is directly involved in the computation, while a dashed line indicates that the input is indirectly involved in the operation.

2.3. Multi-task Joint Training

Recommendation tasks based on graph structures can be viewed as graph generation problems, i.e., predicting the likelihood of the existence of interaction edges (Zhang et al., 2023). Given the interaction graph $\mathcal{G}$ and the interaction matrix $\mathbf{R}$, we assume that any data point comes from the generation process $\mathbf{R}\sim\mathbb{P}(\hat{\mathbf{R}}|\mathbf{E})$. In BIGCF, we learn the mean and variance embeddings of the approximate posterior for users and items, i.e., $\mathbf{E}\sim\mathbb{P}(\mathbf{E}|\Theta,\mathbf{R})$. The graph generation task can then be optimized via the Evidence Lower Bound (ELBO) (Kingma and Welling, 2013):

(14) $\mathcal{L}_{\text{ELBO}}=\mathbb{E}_{\mathbf{E}\sim\mathbb{P}(\mathbf{E}|\Theta,\mathbf{R})}[\log\mathbb{P}(\hat{\mathbf{R}}|\mathbf{E})]-\text{KL}(\mathbb{P}(\mathbf{E}|\Theta,\mathbf{R})\,||\,\mathbb{P}(\mathbf{E})),$

where $\Theta=\{\mathbf{E}^{(0)},\mathbf{C}\}$ is the set of model parameters. The first term is the reconstruction error between the input interaction matrix $\mathbf{R}$ and the reconstructed interaction matrix $\hat{\mathbf{R}}$; in line with existing methods (Wang et al., 2019; He et al., 2020; Wu et al., 2021), we empirically employ the pair-wise loss (Rendle et al., 2009) to guide the reconstruction process on the interaction graph $\mathcal{G}$:

(15) $\mathbb{E}_{\mathbb{P}(\mathbf{E}|\Theta,\mathbf{R})}[\log\mathbb{P}(\hat{\mathbf{R}}|\mathbf{E})]=\sum_{<u,i,j>\in\mathcal{B}}-\log\delta\left(\mathbb{P}(\hat{R}_{ui})-\mathbb{P}(\hat{R}_{uj})\right),$

where $\delta$ is the Sigmoid activation function, $\mathbb{P}(\hat{R}_{ui})$ and $\mathbb{P}(\hat{R}_{uj})$ are short for $\mathbb{P}(\hat{R}_{ui}|\mathbf{e}_u,\mathbf{e}_i)$ and $\mathbb{P}(\hat{R}_{uj}|\mathbf{e}_u,\mathbf{e}_j)$, respectively, and $i$ and $j$ are positive and negative samples of user $u$, respectively. The second term in Eq. 14 is the KL divergence between the posterior distribution $\mathbb{P}(\mathbf{E}|\Theta,\mathbf{R})$ and the prior $\mathbb{P}(\mathbf{E})$. For simplicity, the prior $\mathbb{P}(\mathbf{E})$ is set to a standard Gaussian $\mathcal{N}(\mathbf{0},\mathbf{I})$ (Liang et al., 2018).
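A minimal sketch of the pairwise term in Eq. 15 is given below, assuming fused embeddings from Eq. 9 for a batch of sampled $(u, i, j)$ triples; following common BPR practice, the raw inner products are used as scores inside the log-sigmoid, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def bpr_reconstruction(e_u: torch.Tensor, e_i_pos: torch.Tensor, e_i_neg: torch.Tensor):
    """Eq. 15: sum over the batch of -log sigma(score(u,i) - score(u,j))."""
    pos_scores = (e_u * e_i_pos).sum(dim=-1)   # inner products for positives i
    neg_scores = (e_u * e_i_neg).sum(dim=-1)   # inner products for negatives j
    return -F.logsigmoid(pos_scores - neg_scores).sum()
```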

Finally, to integrate the recommendation task and intent modeling, we employ a multi-task joint training strategy to optimize the main recommendation task and the graph contrastive regularization in both the interaction and intent spaces. The complete optimization objective of BIGCF is defined as follows:

(16) $\mathcal{L}_{\text{BIGCF}}=\mathcal{L}_{\text{ELBO}}+\lambda_1(\mathcal{L}_{\text{inter}}+\mathcal{L}_{\text{intent}})+\lambda_2\left\|\Theta\right\|_2^2,$

where $\lambda_1$ and $\lambda_2$ are adjustable weights, and $\Theta=\{\mathbf{E}^{(0)},\mathbf{C}\}$ are the trainable model parameters.
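Assembled end to end, a hedged sketch of Eq. 16 might look as follows; kl_gauss instantiates the KL term of Eq. 14 against the standard Gaussian prior in the usual VAE form (Kingma and Welling, 2013) and enters the minimized loss with a positive sign, and every name here is an assumption rather than the released implementation.

```python
import torch

def kl_gauss(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """KL(N(mu, diag[sigma^2]) || N(0, I)), summed over the batch."""
    return 0.5 * (sigma.pow(2) + mu.pow(2) - 1.0 - (sigma.pow(2) + 1e-10).log()).sum()

def bigcf_loss(l_bpr, kl, l_inter, l_intent, params, lam1: float = 0.2, lam2: float = 1e-5):
    l2_reg = sum(p.pow(2).sum() for p in params)   # ||Theta||_2^2 over E^(0) and C
    # Eq. 16: (negative) ELBO = reconstruction + KL, plus GCR and L2 terms.
    return l_bpr + kl + lam1 * (l_inter + l_intent) + lam2 * l2_reg
```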

2.4. Model Analysis

2.4.1. Time Complexity

Table 1. Theoretical time complexity comparisons between BIGCF and seven baselines. The time complexity of embedding encoding is the theoretical time required to complete all operations in a batch. AO stands for additional operation.
Model Time complexity of encoding AO
NGCF (Wang et al., 2019) $O(2(|\mathcal{E}|+|\mathcal{V}|)Ld)$ -
LightGCN (He et al., 2020) $O(2|\mathcal{E}|Ld)$ -
DGCF (Wang et al., 2020a) $O(2|\mathcal{K}||\mathcal{E}|Ld)$ -
SGL-ED (Wu et al., 2021) $O(2(1+2\hat{\rho})|\mathcal{E}|Ld)$ $O(4\hat{\rho}|\mathcal{E}|)$
HCCF (Xia et al., 2022) $O(2(|\mathcal{E}|+H|\mathcal{V}|)Ld)$ -
LightGCL (Cai et al., 2023) $O(2(|\mathcal{E}|+q|\mathcal{V}|)Ld)$ $O(q|\mathcal{E}|)$
DCCF (Ren et al., 2023) $O(2(|\mathcal{E}|+|\mathcal{K}||\mathcal{V}|)Ld)$ $O(4|\mathcal{E}|Ld)$
BIGCF (Ours) $O(2|\mathcal{E}|Ld+|\mathcal{K}||\mathcal{V}|d)$ -

The time complexity of BIGCF mainly comes from the graph structure encoding and the intent modeling. Specifically, since BIGCF uses LightGCN (He et al., 2020) as the basic graph encoder, the time complexity of this process is $O(2|\mathcal{E}|Ld)$, where $|\mathcal{E}|$ is the number of edges on the interaction graph $\mathcal{G}$; the time complexity of the intent modeling part is $O(|\mathcal{K}||\mathcal{V}|d)$, where $|\mathcal{V}|=M+N$ is the number of nodes of the interaction graph $\mathcal{G}$. Therefore, the total time complexity of BIGCF during training is $O(2|\mathcal{E}|Ld+|\mathcal{K}||\mathcal{V}|d)$. Table 1 presents horizontal comparisons of the time complexity of BIGCF with other methods of the same type. $\hat{\rho}$ is the edge keep probability of SGL-ED (Wu et al., 2021), $H$ is the number of hyperedges in HCCF (Xia et al., 2022), and $q$ is the required rank for SVD in LightGCL (Cai et al., 2023). We have the following findings:

• The time complexity of BIGCF is slightly higher than that of LightGCN due to the modeling process of user and item intents.
• The time complexity of BIGCF has a significant advantage over SSL-based methods because the intent modeling is independent of the interaction graph $\mathcal{G}$ and the number of layers $L$.
• BIGCF introduces the idea of GCL to achieve contrastive regularization in both the interaction and intent spaces. The process does not require any additional operations for graph or feature augmentation to construct multi-views, thus significantly reducing the time complexity of model training (e.g., additional graph augmentation is required for SGL-ED (Wu et al., 2021) and DCCF (Ren et al., 2023), and SVD-based preprocessing is required for LightGCL (Cai et al., 2023)).

2.4.2. Theoretical Analysis

In this section, we present the positive effect of the graph contrastive regularization paradigm on BIGCF. Specifically, the graph contrastive regularization process achieves alignment and uniformity in the dual spaces by measuring the mutual information $\mathcal{I}(\mathbf{a},\mathbf{b})$ between vectors $\mathbf{a}$ and $\mathbf{b}$. In the interaction space, for the regularization process of user $u$ and item $i$ in Eq. 12, the gradient with respect to user $u$ is as follows (Wu et al., 2021):

(17) $\frac{\partial\mathcal{I}(\mathbf{e}_u,\mathbf{e}_i)}{\partial\mathbf{e}_u}=\frac{1}{\tau\left\|\mathbf{e}_u\right\|}\left\{c(u)+\sum_{j\in\mathcal{B}/<u,i>}c(j)\right\},$

where $c(u)$ and $c(j)$ denote the contributions of user $u$ and negative sample $j$ to the gradients, respectively. For a large number of negative samples, the $L_2$ norm of $c(j)$ is defined as follows:

(18) $\left\|c(j)\right\|_2\propto\sqrt{1-\cos^2(\mathbf{e}_u,\mathbf{e}_j)}\times\exp(\cos(\mathbf{e}_u,\mathbf{e}_j)/\tau).$

The above equation shows that the gradient contribution of $c(j)$ ultimately depends on the cosine similarity. For hard negative samples $\{j\in\mathcal{B}/<u,i>\}$, a similarity closer to $1.0$ results in a significant increase of the $L_2$ norm $\left\|c(j)\right\|_2$. This effect grows exponentially when moderated by a smaller temperature coefficient $\tau$, resulting in a significant improvement in training efficiency (Wang and Liu, 2021).

Noting that in the second and third terms of both Eqs. 12 and 13 the two input parameters are identical (e.g., $\mathcal{I}(\mathbf{e}_u,\mathbf{e}_u)$), we offer the following derivation:

(19) $\mathcal{I}(\mathbf{e}_u,\mathbf{e}_u)=-\log\frac{\exp\left(\cos(\mathbf{e}_u,\mathbf{e}_u)/\tau\right)}{\exp\left(\cos(\mathbf{e}_u,\mathbf{e}_u)/\tau\right)+\sum_{v\in\mathcal{B}/u}\exp\left(\cos(\mathbf{e}_u,\mathbf{e}_v)/\tau\right)}=-1/\tau+\log\left(\exp\left(1/\tau\right)+\sum_{v\in\mathcal{B}/u}\exp\left(\cos(\mathbf{e}_u,\mathbf{e}_v)/\tau\right)\right).$

This indicates that $\mathcal{I}(\mathbf{e}_u,\mathbf{e}_u)$ actually measures the uniformity among different nodes in a batch. The above corollary also holds for the graph contrastive regularization of the item embedding $\mathbf{e}_i$ and the intent embeddings $\{\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{i,\boldsymbol{\sigma}}\}$. Under gradient optimization, nodes with greater similarity are penalized more, thus forcing them away from each other and making the feature space uniform:

• For interaction embeddings, a more uniform feature space ensures that cold-start items have more chances to be recommended.
• For intent embeddings, a more uniform feature space reduces the redundancy among the intents and better reflects the user's potential preferences.

Furthermore, given the final interaction embeddings ($\mathbf{e}_u$ and $\mathbf{e}_v$) and the individual intent embeddings ($\mathbf{e}_{u,\boldsymbol{\sigma}}$ and $\mathbf{e}_{v,\boldsymbol{\sigma}}$) in a mini-batch, we have the following derivations:

(20) $\cos(\mathbf{e}_u,\mathbf{e}_v)=\frac{(\mathbf{e}_{u,\boldsymbol{\mu}}+\mathbf{e}_{u,\boldsymbol{\sigma}}\odot\boldsymbol{\epsilon}_u)^{\top}(\mathbf{e}_{v,\boldsymbol{\mu}}+\mathbf{e}_{v,\boldsymbol{\sigma}}\odot\boldsymbol{\epsilon}_v)}{\left\|\mathbf{e}_u\right\|\cdot\left\|\mathbf{e}_v\right\|};\quad\cos(\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{v,\boldsymbol{\sigma}})=\frac{(\sum_{k}\mathbb{P}(\mathbf{C}_{\mathcal{U}}^{k}|\mathbf{e}_{u,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{U}}^{k})^{\top}(\sum_{k}\mathbb{P}(\mathbf{C}_{\mathcal{U}}^{k}|\mathbf{e}_{v,\boldsymbol{\mu}})\cdot\mathbf{C}_{\mathcal{U}}^{k})}{\left\|\mathbf{e}_{u,\boldsymbol{\sigma}}\right\|\cdot\left\|\mathbf{e}_{v,\boldsymbol{\sigma}}\right\|}.$

The above equation demonstrates the GCR process across the dual feature spaces: computing $\cos(\mathbf{e}_u,\mathbf{e}_v)$ relies on the intent embeddings $\mathbf{e}_{u,\boldsymbol{\sigma}}$ and $\mathbf{e}_{v,\boldsymbol{\sigma}}$, while computing $\cos(\mathbf{e}_{u,\boldsymbol{\sigma}},\mathbf{e}_{v,\boldsymbol{\sigma}})$ takes the structural embeddings $\mathbf{e}_{u,\boldsymbol{\mu}}$ and $\mathbf{e}_{v,\boldsymbol{\mu}}$ into account. The direct (solid line) or indirect (dashed line) participation of the various types of embeddings in the graph contrastive regularization is also presented in Fig. 3.

Table 2. Statistics of the datasets.
Dataset #Users #Items #Interactions Sparsity
Gowalla 50,821 57,440 1,172,425 99.95%
Amazon-Book 78,578 77,801 2,240,156 99.96%
Tmall 47,939 41,390 2,357,450 99.88%
Table 3. Overall performance comparisons on the Gowalla, Amazon-Book, and Tmall datasets w.r.t. Recall@K (abbreviated as R@K) and NDCG@K (abbreviated as N@K). The model that performs best on each dataset and metric is bolded. 'Improv.%' indicates the relative improvement of BIGCF over the best baseline; the improvements are significant under a two-tailed paired t-test.
Gowalla Amazon-Book Tmall
R@20 R@40 N@20 N@40 R@20 R@40 N@20 N@40 R@20 R@40 N@20 N@40
MF (Rendle et al., 2009) 0.1553 0.2264 0.0923 0.1108 0.0557 0.0873 0.0411 0.0525 0.0465 0.0755 0.0316 0.0417
Mult-VAE (Liang et al., 2018) 0.1793 0.2535 0.1079 0.1267 0.0736 0.1126 0.0559 0.0687 0.0531 0.0841 0.0365 0.0473
CVGA (Zhang et al., 2023) 0.1852 0.2609 0.1126 0.1326 0.0849 0.1291 0.0660 0.0807 0.0626 0.0977 0.0438 0.0560
NGCF (Wang et al., 2019) 0.1676 0.2407 0.0977 0.1168 0.0597 0.0934 0.0421 0.0541 0.0499 0.0814 0.0339 0.0449
LightGCN (He et al., 2020) 0.1799 0.2577 0.1053 0.1255 0.0732 0.1148 0.0544 0.0681 0.0555 0.0895 0.0381 0.0499
DisenGCN (Ma et al., 2019a) 0.1379 0.2003 0.0798 0.0961 0.0481 0.0776 0.0353 0.0451 0.0422 0.0688 0.0285 0.0377
DisenHAN (Wang et al., 2020b) 0.1437 0.2079 0.0829 0.0997 0.0542 0.0865 0.0407 0.0513 0.0416 0.0682 0.0283 0.0376
MacridVAE (Ma et al., 2019b) 0.1643 0.2353 0.0987 0.1176 0.0730 0.1120 0.0555 0.0686 0.0605 0.0950 0.0422 0.0542
DGCF (Wang et al., 2020a) 0.1784 0.2515 0.1069 0.1259 0.0688 0.1073 0.0513 0.0640 0.0544 0.0867 0.0372 0.0484
DICE (Zheng et al., 2021) 0.1721 0.2424 0.1025 0.1209 0.0703 0.1129 0.0514 0.0654 0.0601 0.0973 0.0408 0.0536
DGCL (Li et al., 2021) 0.1793 0.2483 0.1067 0.1247 0.0677 0.1057 0.0506 0.0631 0.0526 0.0845 0.0359 0.0469
SGL-ED (Wu et al., 2021) 0.1809 0.2559 0.1067 0.1262 0.0774 0.1204 0.0578 0.0719 0.0574 0.0919 0.0393 0.0513
HCCF (Xia et al., 2022) 0.1818 0.2601 0.1061 0.1265 0.0824 0.1282 0.0625 0.0776 0.0623 0.0986 0.0425 0.0552
LightGCL (Cai et al., 2023) 0.1825 0.2601 0.1077 0.1280 0.0836 0.1280 0.0643 0.0790 0.0632 0.0971 0.0444 0.0562
DCCF (Ren et al., 2023) 0.1876 0.2644 0.1123 0.1323 0.0889 0.1343 0.0680 0.0829 0.0668 0.1042 0.0469 0.0598
BIGCF (Ours) 0.2086 0.2883 0.1242 0.1450 0.0989 0.1468 0.0761 0.0918 0.0755 0.1167 0.0535 0.0680
Improv.% 11.19% 9.04% 10.60% 9.60% 11.25% 9.31% 11.91% 10.74% 13.02% 12.00% 14.07% 13.71%
$p$-value 2.0e-10 3.0e-8 4.4e-8 1.5e-7 1.9e-6 2.2e-8 7.9e-6 6.5e-8 8.9e-8 5.7e-8 2.8e-7 3.7e-8

3. Experiments

In this section, we perform experiments on three real-world datasets to validate our proposed BIGCF compared with state-of-the-art recommendation methods.

3.1. Experimental Settings

3.1.1. Datasets

To validate the effectiveness of BIGCF, we adopt three widely used large-scale recommendation datasets: Gowalla, Amazon-Book, and Tmall (Ren et al., 2023), which vary in scale, domain, and sparsity. Table 2 provides statistical information for the three datasets. To ensure fairness and consistency, we adopt the same processing method as existing efforts (He et al., 2020; Ren et al., 2023). Specifically, all explicit feedback is converted to implicit feedback (i.e., ratings are only 0 and 1). Items that a user has interacted with are considered positive samples, and all other items are treated as negative samples for that user. We measure the performance of all recommendation models via Recall@K and NDCG@K (Wang et al., 2019; Zhang et al., 2023).

3.1.2. Baselines

To validate the effectiveness of BIGCF, we choose the following state-of-the-art recommendation methods for comparison experiments:

• Factorization-based method: MF (Rendle et al., 2009).
• Generative methods: Mult-VAE (Liang et al., 2018) and CVGA (Zhang et al., 2023).
• GCN-based methods: NGCF (Wang et al., 2019) and LightGCN (He et al., 2020).
• Intent modeling-based methods: DisenGCN (Ma et al., 2019a), DisenHAN (Wang et al., 2020b), MacridVAE (Ma et al., 2019b), DGCF (Wang et al., 2020a), DICE (Zheng et al., 2021), and DGCL (Li et al., 2021).
• SSL-based methods: SGL-ED (Wu et al., 2021), HCCF (Xia et al., 2022), LightGCL (Cai et al., 2023), and DCCF (Ren et al., 2023).

3.1.3. Hyperparameter Settings

We implement BIGCF in PyTorch (code: https://github.com/BlueGhostYi/BIGCF). For a fair comparison, we use an experimental setup consistent with previous works (Ren et al., 2023). Specifically, the embedding size and batch size of all models are set to 32 and 10240, respectively. The default optimizer is Adam (Kingma and Ba, 2014), and initialization is done via the Xavier method (Glorot and Bengio, 2010). For all comparison methods, we determine the optimal hyperparameters based on the optimal settings given in the corresponding papers as well as grid search. For BIGCF, the number of GCN layers is set in the range {1, 2, 3}, the number of intents $|\mathcal{K}|$ in {16, 32, 64, 128}, and we empirically set the temperature coefficients $\kappa$ and $\tau$ to 1.0 and 0.2, respectively. The weight of graph contrastive regularization $\lambda_1$ is set in the range {0.1, 0.2, 0.3, 0.4, 0.5}, and the weight of $L_2$ regularization $\lambda_2$ is set in the range {1e-3, 1e-4, 1e-5, 1e-6}, with 1e-5 as the default optimal setting.
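For reference, the defaults above can be collected into a compact configuration; the dictionary keys are illustrative and not taken from the released repository, and values marked as searched are tuned per dataset.

```python
config = {
    "embed_dim": 32,
    "batch_size": 10240,
    "optimizer": "Adam",
    "init": "xavier",
    "gcn_layers": 2,       # searched in {1, 2, 3}; 2 performs best (Table 5)
    "num_intents": 128,    # |K|, searched in {16, 32, 64, 128}, tuned per dataset
    "kappa": 1.0,          # intent-softmax temperature (Eq. 5)
    "tau": 0.2,            # contrastive temperature (Eq. 11)
    "lambda_1": 0.2,       # GCR weight, searched in {0.1, ..., 0.5}, tuned per dataset
    "lambda_2": 1e-5,      # L2 regularization weight, default optimal setting
}
```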

Figure 4. Comparison of training time (seconds) per epoch for disentanglement-based methods on three datasets.

3.2. Performance Comparisons

3.2.1. Overall Comparisons

Table 3 shows the performance of BIGCF and all baselines, and we have the following observations:

• BIGCF achieves the best performance across all metrics on the three datasets. Quantitatively, BIGCF improves over the strongest baseline by 11.19%, 11.25%, and 13.02% w.r.t. Recall@20 on the Gowalla, Amazon-Book, and Tmall datasets, respectively. These results demonstrate the rationality and generalizability of the proposed BIGCF.
• BIGCF achieves a significant improvement over disentanglement-based approaches (e.g., DGCF (Wang et al., 2020a) and DCCF (Ren et al., 2023)), demonstrating the necessity of modeling intents at a finer granularity.
• BIGCF achieves better performance than generative methods of the same type (e.g., Mult-VAE (Liang et al., 2018)) and SSL-based methods (e.g., SGL-ED (Wu et al., 2021) and LightGCL (Cai et al., 2023)), which demonstrates that the proposed BIGCF can mine user preferences more effectively.
• Fig. 4 presents comparisons of the training time of BIGCF with other disentanglement-based methods; BIGCF clearly has a significant advantage in training efficiency.

3.2.2. Comparisons w.r.t. Data Sparsity

To verify whether the proposed BIGCF can effectively deal with the data sparsity issue, we divide the testing set into three subsets based on interaction scale, denoting sparse, common, and popular users, respectively. The experimental results are shown in Fig. 5. It is clearly evident that BIGCF achieves improvements on all testing subsets of all datasets. Despite the performance degradation of all methods on sparse groups, BIGCF still shows advantages and gradually widens the gap with the performance of DCCF, which indicates that BIGCF has better sparsity resistance. We attribute this to the proposed bilateral intent modeling as well as to the graph contrastive regularization in dual spaces. On the one hand, the finer-grained intents help to deeply understand users' real preferences and prevent the recommender system from over-recommending popular items; on the other hand, the graph contrastive regularization further constrains the distribution of the node representations within the feature space, preventing nodes from over-aggregating.

Figure 5. Sparsity tests with three interaction sparsity levels (sparse, normal, and popular) on (a) Gowalla, (b) Amazon-Book, and (c) Tmall datasets w.r.t. NDCG@20.
Table 4. Ablation studies of BIGCF on Gowalla, Amazon-Book, and Tmall datasets w.r.t. Recall@20 and NDCG@20.
Gowalla Amazon-Book Tmall
R@20 N@20 R@20 N@20 R@20 N@20
w/o GCR 0.1922 0.1126 0.0783 0.0584 0.0578 0.0395
w/o IR 0.2003 0.1198 0.0957 0.0735 0.0725 0.0514
w/o HNR 0.1931 0.1113 0.0906 0.0676 0.0662 0.0455
w/o BIR 0.1977 0.1182 0.0935 0.0718 0.0727 0.0512
w/o BIGR 0.1991 0.1191 0.0948 0.0720 0.0735 0.0521
w/o PGR 0.1912 0.1131 0.0880 0.0668 0.0680 0.0479
BIGCF 0.2086 0.1242 0.0989 0.0761 0.0755 0.0535

3.3. In-depth Studies of BIGCF

3.3.1. Ablation Studies

We construct a series of variants to verify the validity of each module in BIGCF:

• $\text{BIGCF}_{\text{w/o GCR}}$: removes the Graph Contrastive Regularization;
• $\text{BIGCF}_{\text{w/o IR}}$: removes the Interaction Regularization;
• $\text{BIGCF}_{\text{w/o HNR}}$: removes the Homogeneous Node Regularization;
• $\text{BIGCF}_{\text{w/o BIR}}$: removes the Bilateral Intent Regularization;
• $\text{BIGCF}_{\text{w/o BIGR}}$: removes the Bilateral Intent-guided Graph Reconstruction and uses $\mathbf{e}_{u,\boldsymbol{\mu}}$ and $\mathbf{e}_{i,\boldsymbol{\mu}}$ directly for downstream tasks;
• $\text{BIGCF}_{\text{w/o PGR}}$: removes the probability-based graph reconstruction (Eq. 9) and constructs $\mathbf{e}_u$ and $\mathbf{e}_i$ by average pooling.

The experimental results for all variants with BIGCF on three datasets are shown in Table 4 and we have the following findings:

The performance of BIGCF is significantly degraded after removing the GCR module, and it is also degraded after removing the IR, HNR, and BIR modules, respectively. The experimental results demonstrate the necessity and effectiveness of graph contrastive regularization, which can regulate the uniformity of the feature spaces in an augmentation-free and self-supervised manner to prevent the node representations from converging to consistency.

Focusing on intent modeling: (1) after removing the GCR module, $\text{BIGCF}_{\text{w/o GCR}}$ still outperforms LightGCN; (2) removing the BIR module decreases the performance of BIGCF; (3) removing the bilateral intent modeling likewise causes a certain degree of degradation for $\text{BIGCF}_{\text{w/o BIGR}}$. The ablation experiments with these three variants validate the necessity of the proposed intent modeling. Finally, removing the probabilistic modeling and reparameterization process results in a significant performance degradation for $\text{BIGCF}_{\text{w/o PGR}}$, which demonstrates the effectiveness of the generative training strategy.

Figure 6. Hyperparameter sensitivities for (a) the number of collective intents $|\mathcal{K}|$ and (b) the graph contrastive regularization weight $\lambda_1$ w.r.t. Recall@20 on three datasets.
Table 5. Impact on the number of graph convolution layers for BIGCF on Gowalla, Amazon-Book, and Tmall datasets w.r.t. Recall@20 and NDCG@20.
Gowalla Amazon-Book Tmall
R@20 N@20 R@20 N@20 R@20 N@20
BIGCF-1 0.2052 0.1221 0.0951 0.0736 0.0732 0.0521
BIGCF-2 0.2086 0.1242 0.0989 0.0761 0.0755 0.0535
BIGCF-3 0.2066 0.1230 0.0976 0.0750 0.0748 0.0530

3.3.2. Hyperparameter Sensitivities.

In this section, we present the effect of various hyperparameters on the performance of BIGCF, as shown in Fig. 6 and Table 5.

The experimental results in Table 5 show that BIGCF can effectively fuse graph structure information into the modeling process of user preferences with two GCN layers. However, stacking more GCN layers may introduce unnecessary noise information, which will negatively affect the performance of BIGCF.

As shown in Fig. 6, the performance of BIGCF improves as the number of intents increases, which demonstrates the effectiveness of intent modeling. As the number increases further, the performance improvement of BIGCF starts to slow down; we argue this could be because an excessive number of intents blurs the boundaries among intents, which interferes with the modeling of user preferences. Increasing the strength of the GCR can further improve performance, which demonstrates the necessity of constraining the uniformity of the feature space. It should be noted that there is a trade-off between alignment and uniformity, and excessive graph contrastive regularization may interfere with the main recommendation task.

Figure 7. (a) Probability scores of user $u_{2550}$ for the collective intents $\mathbf{C}_{\mathcal{U}}$. (b) Mean and variance of all user scores for the collective intents before, during, and after training.

3.3.3. Case Study.

Finally, we evaluate the validity of the individual and collective intents through case studies. Specifically, we randomly select a user $u_{2550}$ from the Amazon-Book dataset and visualize his correlation scores (computed by Eq. 5) for the collective intents (Fig. 7(a)). Before training, all intents provide the same contribution, which cannot distinguish the user's true preferences. As training proceeds, some aspects that the user does not care about fall below the initial value (blue line), while the aspects he cares about receive larger correlation scores (e.g., $\mathbf{C}_{\mathcal{U}}^{14}$, $\mathbf{C}_{\mathcal{U}}^{34}$, and $\mathbf{C}_{\mathcal{U}}^{45}$). It is worth noting that the user's behavior is not entirely governed by the collective intent $\mathbf{C}_{\mathcal{U}}^{14}$ but is also more or less influenced by other intents (e.g., $\mathbf{C}_{\mathcal{U}}^{34}$ and $\mathbf{C}_{\mathcal{U}}^{45}$). Focusing on overall behavior, we randomly select 5,000 users and show the means and variances of their correlation scores for the collective intents before, during, and after training (Fig. 7(b)). After training, most users demonstrate greater variances over the collective intents, which indicates that each user focuses on different aspects of multiple intents (e.g., user $u_{2550}$ in Fig. 7(a)), and that BIGCF can effectively utilize implicit feedback data to capture the variability among users and reflect user preferences more accurately.

4. RELATED WORK

Only ID-based Recommendation. Implicit feedback has gradually become a mainstream research subject in recommender systems due to its easy accessibility (Rendle et al., 2009). Against this background, earlier works map each user or item ID to a single embedding and predict user preferences through an inner product (Rendle et al., 2009) or neural networks (He et al., 2017; He and Chua, 2017). A series of novel designs followed, such as attention (Chen et al., 2017), autoencoders (Sedhain et al., 2015), and generative models (Liang et al., 2018; Zhang et al., 2023). With the rise of graph neural networks (Wu et al., 2020), a similar trend emerged in the recommendation field. For example, NGCF (Wang et al., 2019) and LightGCN (He et al., 2020) use GCNs (Kipf and Welling, 2017) to model high-quality embedding representations of users and items, and these results subsequently spread to other subfields, such as social networks (Wu et al., 2019) and knowledge graph-based recommender systems (Wang et al., 2021). The introduction of contrastive learning (Chen et al., 2020) pushed graph-based recommendation research a step further. For example, SGL (Wu et al., 2021) constructs different views of nodes by random masking; this structural augmentation expands the number of samples and thus effectively alleviates the data sparsity problem. Subsequent studies simplify the design of SGL, for example by replacing structural augmentation with feature augmentation (Lin et al., 2022; Cai et al., 2023) or by designing self-learning augmentation paradigms (Xia et al., 2022). However, these designs consider only representation modeling and fail to account for the intents behind item selection at a fine-grained level, which prevents ID-based methods alone from accurately capturing users' true preferences.

Disentanglement-based Recommendation. In general, disentanglement-based approaches model user-item interactions by projecting them onto different feature spaces (Ma et al., 2019a; Chen et al., 2021). For example, MacridVAE (Ma et al., 2019b) encodes multiple user intents with variational autoencoders (Liang et al., 2018). DGCF (Wang et al., 2020a) and DisenHAN (Wang et al., 2020b) learn disentangled user representations with graph neural networks and graph attention networks, respectively. DICE (Zheng et al., 2021) learns two disentangled causal embeddings for users and items. KGIN (Wang et al., 2021) proposes the notion of shared intents and captures users' path-based intents by introducing an item-side knowledge graph. Some recent efforts integrate contrastive learning into intent modeling, such as ICLRec (Chen et al., 2022), DiRec (Wu et al., 2023), and DCCF (Ren et al., 2023). However, these efforts do not define user or item intents at a fine-grained level, and the data augmentation applied in contrastive learning inevitably makes model training more difficult. The proposed BIGCF departs from the design philosophy of these pioneering works: we present the notion of individuality and collectivity of intents from a causal perspective and perform intent modeling with a graph generation strategy. In addition, we propose graph contrastive regularization as a simple yet efficient constraint in the dual space that does not depend on any type of data augmentation.

5. Conclusion

In this paper, we revisited user-item interactions at a fine-grained level and subdivided the motivation behind interactions into collective and individual intents from a causal perspective. Based on this, we proposed a novel end-to-end recommendation framework called Bilateral Intent-guided Graph Collaborative Filtering (BIGCF). The core of BIGCF lies in modeling the bilateral intents of users and items through a graph generation strategy; these intents then guide the reconstruction of the interaction graph. In addition, we proposed graph contrastive regularization, applicable to both the interaction and intent spaces, to optimize uniformity among nodes. Finally, we conducted extensive experiments on three real-world datasets and verified the effectiveness of BIGCF.

Acknowledgements.
This work is supported by the National Natural Science Foundation of China (No. 62272001, No. 62206002), the Anhui Provincial Natural Science Foundation (2208085QF195), and the Hefei Key Common Technology Project (GJ2022GX15).

References

  • Cai et al. (2023) Xuheng Cai, Chao Huang, Lianghao Xia, and Xubin Ren. 2023. LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. In The Eleventh International Conference on Learning Representations (ICLR).
  • Chang et al. (2023) Bo Chang, Alexandros Karatzoglou, Yuyan Wang, Can Xu, Ed H Chi, and Minmin Chen. 2023. Latent User Intent Modeling for Sequential Recommenders. In Companion Proceedings of the ACM Web Conference 2023 (WWW). 427–431.
  • Chen et al. (2021) Hong Chen, Yudong Chen, Xin Wang, Ruobing Xie, Rui Wang, Feng Xia, and Wenwu Zhu. 2021. Curriculum Disentangled Recommendation with Noisy Multi-feedback. Advances in Neural Information Processing Systems (NeurIPS) 34, 26924–26936.
  • Chen et al. (2017) Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-level Attention. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 335–344.
  • Chen et al. (2020) Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In International Conference on Machine Learning (ICML). 1597–1607.
  • Chen et al. (2022) Yongjun Chen, Zhiwei Liu, Jia Li, Julian McAuley, and Caiming Xiong. 2022. Intent Contrastive Learning for Sequential Recommendation. In Proceedings of the ACM Web Conference 2022 (WWW). 2172–2182.
  • Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. 2010. Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS). 249–256.
  • He and Chua (2017) Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 355–364.
  • He et al. (2020) Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 639–648.
  • He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web (WWW). 173–182.
  • Ho et al. (2020) Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems (NeurIPS) 33, 6840–6851.
  • Jing et al. (2021) Li Jing, Pascal Vincent, Yann LeCun, and Yuandong Tian. 2021. Understanding Dimensional Collapse in Contrastive Self-supervised Learning. arXiv preprint arXiv:2110.09348 (2021).
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114 (2013).
  • Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In The Fifth International Conference on Learning Representations (ICLR).
  • Knyazev and Oosterhuis (2022) Norman Knyazev and Harrie Oosterhuis. 2022. The Bandwagon Effect: Not Just Another Bias. In Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR). 243–253.
  • Li et al. (2021) Haoyang Li, Xin Wang, Ziwei Zhang, Zehuan Yuan, Hang Li, and Wenwu Zhu. 2021. Disentangled Contrastive Learning on Graphs. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 34. 21872–21884.
  • Liang et al. (2018) Dawen Liang, Rahul G Krishnan, Matthew D Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference (WWW). 689–698.
  • Lin et al. (2022) Zihan Lin, Changxin Tian, Yupeng Hou, and Wayne Xin Zhao. 2022. Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning. In Proceedings of the ACM Web Conference 2022 (WWW). 2320–2329.
  • Ma et al. (2019a) Jianxin Ma, Peng Cui, Kun Kuang, Xin Wang, and Wenwu Zhu. 2019a. Disentangled Graph Convolutional Networks. In International Conference on Machine Learning (ICML). 4212–4221.
  • Ma et al. (2019b) Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, and Wenwu Zhu. 2019b. Learning Disentangled Representations for Recommendation. Advances in Neural Information Processing Systems (NeurIPS) 32, 5711–5722.
  • Ren et al. (2023) Xubin Ren, Lianghao Xia, Jiashu Zhao, Dawei Yin, and Chao Huang. 2023. Disentangled Contrastive Collaborative Filtering. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 1137–1146.
  • Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI). 452–461.
  • Ricci et al. (2011) Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to Recommender Systems Handbook. In Recommender Systems Handbook. Springer, 1–35.
  • Sedhain et al. (2015) Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. Autorec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW). 111–112.
  • Truong et al. (2021) Quoc-Tuan Truong, Aghiles Salah, and Hady W Lauw. 2021. Bilateral Variational Autoencoder for Collaborative Filtering. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM). 292–300.
  • Wang and Liu (2021) Feng Wang and Huaping Liu. 2021. Understanding the Behaviour of Contrastive Loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2495–2504.
  • Wang and Isola (2020) Tongzhou Wang and Phillip Isola. 2020. Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. In International Conference on Machine Learning (ICML). 9929–9939.
  • Wang et al. (2019) Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 165–174.
  • Wang et al. (2021) Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu, Xiangnan He, and Tat-Seng Chua. 2021. Learning Intents behind Interactions with Knowledge Graph for Recommendation. In Proceedings of the Web Conference 2021 (WWW). 878–887.
  • Wang et al. (2020a) Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, and Tat-Seng Chua. 2020a. Disentangled Graph Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 1001–1010.
  • Wang et al. (2020b) Yifan Wang, Suyao Tang, Yuntong Lei, Weiping Song, Sheng Wang, and Ming Zhang. 2020b. DisenHAN: Disentangled Heterogeneous Graph Attention Network for Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM). 1605–1614.
  • Wei et al. (2021) Tianxin Wei, Fuli Feng, Jiawei Chen, Ziwei Wu, Jinfeng Yi, and Xiangnan He. 2021. Model-agnostic Counterfactual Reasoning for Eliminating Popularity Bias in Recommender System. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD). 1791–1800.
  • Wu et al. (2021) Jiancan Wu, Xiang Wang, Fuli Feng, Xiangnan He, Liang Chen, Jianxun Lian, and Xing Xie. 2021. Self-Supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 726–735.
  • Wu et al. (2019) Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang. 2019. A Neural Influence Diffusion Model for Social Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 235–244.
  • Wu et al. (2023) Xixi Wu, Yun Xiong, Yao Zhang, Yizhu Jiao, and Jiawei Zhang. 2023. Dual Intents Graph Modeling for User-centric Group Discovery. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM). 2716–2725.
  • Wu et al. (2020) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. 2020. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 32, 1 (2020), 4–24.
  • Xia et al. (2022) Lianghao Xia, Chao Huang, Yong Xu, Jiashu Zhao, Dawei Yin, and Jimmy Huang. 2022. Hypergraph Contrastive Collaborative Filtering. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 70–79.
  • Yang et al. (2022) Yuhao Yang, Chao Huang, Lianghao Xia, and Chenliang Li. 2022. Knowledge Graph Contrastive Learning for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 1434–1443.
  • Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). 974–983.
  • You et al. (2020) Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. 2020. Graph Contrastive Learning with Augmentations. In Advances in Neural Information Processing Systems (NeurIPS). 5812–5823.
  • Zhang et al. (2023) Yi Zhang, Yiwen Zhang, Dengcheng Yan, Shuiguang Deng, and Yun Yang. 2023. Revisiting Graph-based Recommender Systems from the Perspective of Variational Auto-encoder. ACM Transactions on Information Systems (TOIS) 41, 3 (2023), 1–28.
  • Zhang et al. (2024) Yi Zhang, Yiwen Zhang, Dengcheng Yan, Qiang He, and Yun Yang. 2024. NIE-GCN: Neighbor Item Embedding-Aware Graph Convolutional Network for Recommendation. IEEE Transactions on Systems, Man, and Cybernetics: Systems 54, 5 (2024), 2810–2821.
  • Zheng et al. (2021) Yu Zheng, Chen Gao, Xiang Li, Xiangnan He, Yong Li, and Depeng Jin. 2021. Disentangling User Interest and Conformity for Recommendation with Causal Embedding. In Proceedings of the Web Conference 2021 (WWW). 2980–2991.