
Fusion Self-supervised Learning for Recommendations

Yu Zhang (Anhui University, Hefei, China; [email protected]), Lei Sang (Anhui University, Hefei, China; [email protected]), Yi Zhang (Anhui University, Hefei, China; [email protected]), Yiwen Zhang (Anhui University, Hefei, China; [email protected]), and Yun Yang (Swinburne University of Technology, Melbourne, Australia; [email protected])
Abstract.

Recommender systems are widely deployed in various web environments, and self-supervised learning (SSL) has recently attracted significant attention in this field. Contrastive learning (CL) stands out as a major SSL paradigm due to its robust ability to generate self-supervised signals. Mainstream graph contrastive learning (GCL)-based methods typically implement CL by creating contrastive views through various data augmentation techniques. Although these methods are effective, we argue that several challenges remain. i) Data augmentation (e.g., discarding edges or adding noise) necessitates additional graph convolution (GCN) or modeling operations, which are highly time-consuming and potentially harm the embedding quality. ii) Existing CL-based methods use traditional CL objectives to capture self-supervised signals. However, few studies have explored obtaining CL objectives from more perspectives or attempted to fuse the varying signals from these CL objectives to enhance recommendation performance.

To overcome these challenges, we propose a High-order Fusion Graph Contrastive Learning (HFGCL) framework for recommendation. Specifically, instead of performing data augmentations, we use high-order information from the GCN process to create contrastive views. Additionally, to integrate self-supervised signals from various CL objectives, we propose an advanced CL objective. By ensuring that positive pairs are distanced from negative samples drawn from both contrastive views, we effectively fuse self-supervised signals from distinct CL objectives, thereby enhancing the mutual information between positive pairs. Experimental results on three public datasets demonstrate the superior recommendation performance and efficiency of HFGCL compared to state-of-the-art baselines.

Recommender Systems, Graph Neural Network, Contrastive Learning, Self-Supervised Learning

1. INTRODUCTION

Recommender systems are essential to a variety of web platforms (Wu et al., 2023b), including e-commerce and streaming services, where they enhance user experiences by delivering personalized content. However, these systems often face the challenge of data sparsity. Self-supervised learning (SSL) (Liu et al., 2023b, a) has gained increasing recognition for its effectiveness in addressing data sparsity (Xia et al., 2023). The capability of SSL to extract self-supervised signals from large volumes of unlabeled data allows the approach to compensate for missing information, leading to widespread adoption in numerous studies (Assran et al., 2023; Chen et al., 2023; Baevski et al., 2023). Among various SSL paradigms, contrastive learning (CL) (Jing et al., 2023) stands out by acquiring self-supervised signals through maximizing mutual information between positive pairs in contrastive views. In recent years, graph contrastive learning (GCL)-based methods (Wu et al., 2021; Zhang et al., 2024a) have achieved notable success and gained significant attention in the field of recommender systems (Gao et al., 2022a).

In general, for CL to be effective, GCL-based methods require at least two distinct contrastive views. Inspired by various domains (Bayer et al., 2022; Rebuffi et al., 2021), most existing methods employ data augmentation techniques to generate these views. As illustrated in Fig. 1, data augmentation techniques can be broadly classified into two categories: graph augmentation (Wu et al., 2021) and feature augmentation (Yu et al., 2022). Graph augmentation involves creating interaction subgraphs by randomly discarding nodes or edges on the user-item interaction graph and then generating different contrastive views through graph convolution operations. Feature augmentation generates different contrastive views by adding noise to the embeddings during the graph convolution process. With the generated contrastive views, these methods effectively mine users’ deep preferences, thereby providing more personalized recommendations.

Despite the necessity of generating contrastive views, data augmentation techniques and multiple CL objectives pose several drawbacks. On the one hand, to create diverse contrastive views, data augmentation generally requires, in addition to the neighborhood-aggregated embeddings needed for the main recommendation task, extra graph convolution and modeling operations on the same initial embeddings, which significantly increase the training cost per epoch. Furthermore, graph augmentation often randomly drops nodes or edges, thereby disrupting the inherent properties of the input graph. Feature augmentation alleviates this issue by adding noise to each node, but it still overlooks the unique characteristics of individual nodes. On the other hand, existing GCL-based methods construct CL objectives (Yu et al., 2022; Yang et al., 2023b) from traditional contrastive views (e.g., user-based and item-based). While the generated self-supervised signals can improve performance to some extent, the fusion of these signals remains problematic, leading to suboptimal recommendations (as analyzed in Section 3.2). In addition, few studies have explored the contrastive learning paradigm from different perspectives or attempted to fuse diverse signals from more CL objectives to further improve recommendation quality.

Based on the above analysis, we identify two key challenges critical for advancing this field:

  • How to efficiently obtain high-quality contrastive views without data (graph and feature) augmentations?

  • How to better fuse self-supervised signals captured from different CL objectives?

Figure 1. GCL-based methods using data augmentation. Top: Graph augmentation; Bottom: Feature augmentation.

To address these challenges, we propose a High-order Fusion Graph Contrastive Learning (HFGCL) framework for recommendation. Instead of relying on data augmentations, we uncover the user-item contrastive paradigm directly from users' interaction behavior, and demonstrate that the graph convolution (GCN) encoder (He et al., 2020) enhances the similarity between user-item pairs. Through the analysis in Section 4.2, we find that low-order information, having undergone minimal or no neighborhood aggregation, yields embeddings saturated with self-information. This self-information suppresses the similarity between positive pairs and complicates the maximization of mutual information. Consequently, we exclude low-order information and propose high-order contrastive views. Since the negative pairs provided by high-order contrastive views differ from traditional ones (e.g., a user view's negative pairs are other users), we integrate the users and items in the batch and introduce them into the original CL loss. Moreover, to effectively integrate self-supervised signals from different CL objectives (i.e., user-user, item-item, and user-item contrastive views), we present a fusion CL loss. By reconstructing the fused embeddings between users and items, we further enhance the mutual information between positive pairs, thereby enabling the effective fusion of diverse self-supervised signals. Overall, HFGCL is a simple yet efficient recommendation framework, capable of fusing rich self-supervised signals from multi-scale contrastive objectives with a fusion CL loss, thereby significantly improving recommendation performance.

The main contributions of this paper are as follows:

  • We reveal that most GCL-based methods generate contrastive views in a time-consuming manner, and that existing CL methods struggle to fuse self-supervised signals from diverse contrastive objectives.

  • We introduce a novel graph contrastive learning paradigm from the perspective of more effective contrastive view construction, and propose the High-order Fusion Graph Contrastive Learning (HFGCL) framework for recommendation.

  • We leverage stacking multiple GCN layers to generate high-order contrastive views, and design a multi-scale fusion contrastive learning objective to better integrate self-supervised signals.

  • We conduct extensive experiments on three public datasets, and the results show that HFGCL has significant advantages in terms of recommendation performance and training efficiency compared to existing state-of-the-art baselines.

2. PRELIMINARIES

In this section, we introduce the key technologies underlying our framework’s architecture.

2.1. Graph Neural Networks for Recommendation

Let $\mathcal{U}$ ($|\mathcal{U}|=M$) and $\mathcal{I}$ ($|\mathcal{I}|=N$) denote the sets of users and items, respectively. In recommender systems (Wu et al., 2023a), Graph Convolutional Networks (GCN) (Wang et al., 2019; Wu et al., 2019; He et al., 2020) are increasingly favored because, through graph convolution operations, they effectively gather high-order neighborhood information to accurately capture user preference behaviors. LightGCN (He et al., 2020), currently the most popular GCN encoder, effectively captures high-order information through neighborhood aggregation. For efficient training, the graph Laplacian matrix $\tilde{A}=D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$ is used, where $\tilde{A}\in\mathbb{R}^{(M+N)\times(M+N)}$ denotes the graph Laplacian matrix, $A$ denotes the adjacency matrix, and $D$ denotes the degree (diagonal) matrix of $A$. To ensure that diverse neighborhood information is integrated, the final embeddings are aggregated from all layers: $\mathbf{E}=\frac{1}{L+1}(\mathbf{E}^{(0)}+\tilde{A}\mathbf{E}^{(0)}+...+\tilde{A}^{L}\mathbf{E}^{(0)})$, where $L$ denotes the number of GCN layers. The pairwise Bayesian Personalized Ranking (BPR) loss (Rendle et al., 2009) is adopted to optimize model parameters:

(1) $\mathcal{L}_{rec}=-\mathrm{log}(\mathrm{sigmoid}(\mathbf{e}_{u}^{\top}\mathbf{e}_{i}-\mathbf{e}_{u}^{\top}\mathbf{e}_{j})),$

where $\langle u,i,j\rangle$ represents a triplet input to the model, in which $i$ is an observed (positive) item and $j$ is a sampled negative item for user $u$.
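To ground the two formulas above, the following is a minimal PyTorch sketch of the layer-averaged propagation and the BPR loss of Eq. 1. The tensor shapes, the sparse matrix `A_hat`, and all names are illustrative assumptions rather than a reference implementation.

```python
import torch
import torch.nn.functional as F

def lightgcn_propagate(E0: torch.Tensor, A_hat: torch.Tensor, L: int) -> torch.Tensor:
    """Average E^(0) with L rounds of neighborhood aggregation (LightGCN-style).

    E0:    (M+N, d) initial user/item embeddings E^(0).
    A_hat: (M+N, M+N) sparse normalized matrix D^{-1/2} A D^{-1/2}.
    """
    emb, acc = E0, E0
    for _ in range(L):
        emb = torch.sparse.mm(A_hat, emb)  # one graph convolution step
        acc = acc + emb
    return acc / (L + 1)                   # E = (E^(0) + ... + A~^L E^(0)) / (L+1)

def bpr_loss(e_u: torch.Tensor, e_i: torch.Tensor, e_j: torch.Tensor) -> torch.Tensor:
    """Eq. 1: push the positive score e_u^T e_i above the negative score e_u^T e_j."""
    pos = (e_u * e_i).sum(dim=-1)
    neg = (e_u * e_j).sum(dim=-1)
    return -F.logsigmoid(pos - neg).mean()
```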

2.2. Contrastive Learning for Recommendation

Recent studies (Wu et al., 2021) have demonstrated that contrastive learning (CL), the primary paradigm in self-supervised learning, effectively addresses the challenge of data sparsity in recommender systems through the generation of self-supervised signals (Xia et al., 2023). CL-based methods create contrastive views through data augmentation and seek to optimize the mutual information between positive pairs, thereby obtaining self-supervised signals. Most existing methods use the InfoNCE (Chen et al., 2020) loss for CL:

(2) $\mathcal{C}(\mathbf{e}_{a},\mathbf{e}_{b})=-\sum_{\langle a,b\rangle\in\mathcal{B}}\mathrm{log}\frac{\mathrm{exp}(\mathbf{e}^{\top}_{a}\mathbf{e}_{b}/\tau)}{\sum_{c\in\mathcal{B}_{b}}\mathrm{exp}(\mathbf{e}^{\top}_{a}\mathbf{e}_{c}/\tau)},$

where $\langle a,b\rangle$ denotes a positive pair, e.g., $\langle a_{2},b_{3}\rangle$ in the batch $\mathcal{B}=\{\langle a_{1},b_{2}\rangle,\langle a_{2},b_{3}\rangle,...,\langle a_{m},b_{n}\rangle\}$, $c$ ranges over all samples of $\mathcal{B}_{b}=\{b_{2},b_{3},...,b_{n}\}$ in batch $\mathcal{B}$, and $\tau$ denotes the temperature coefficient. The CL loss aims to maximize the mutual information between the positive pair $\mathbf{e}_{a}$ and $\mathbf{e}_{b}$, and to increase the distance between $\mathbf{e}_{a}$ and the other negative samples $\mathbf{e}_{c}$.
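For concreteness, a minimal sketch of Eq. 2 with in-batch negatives follows; the L2 normalization and the batched matrix-product formulation are our assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(e_a: torch.Tensor, e_b: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Eq. 2: row k of e_a and row k of e_b form a positive pair <a, b>;
    all other rows of e_b serve as the negative samples c in B_b."""
    e_a = F.normalize(e_a, dim=-1)
    e_b = F.normalize(e_b, dim=-1)
    logits = e_a @ e_b.T / tau                             # (B, B) similarity matrix
    labels = torch.arange(e_a.size(0), device=e_a.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)                 # = -log softmax at <a, b>
```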

3. EXPLORING GRAPH CONTRASTIVE LEARNING FOR RECOMMENDATION

In this section, we explore the key aspects of GCL-based recommendation methods, with a particular focus on data augmentation techniques and contrastive views.

3.1. Effectiveness of Data Augmentation

Table 1. Comparison of GCN and GCL-based methods (with Data Augmentations) in terms of per-epoch runtime across three datasets (s: seconds, x: times slower).
Method Amazon-Book (time/epoch) Yelp2018 (time/epoch) Tmall (time/epoch)
LightGCN 67.9s 34.3s 54.4s
SGL-ED 188.1s (2.8x) 92.5s (2.7x) 154.5s (2.8x)
SimGCL 194.5s (2.9x) 83.5s (2.4x) 167.9s (3.1x)
NCL 137.1s (2.0x) 60.5s (1.8x) 89.5s (1.7x)
CGCL 140.5s (2.1x) 55.6s (1.6x) 132.2s (2.4x)
VGCL 98.7s (1.5x) 51.4s (1.5x) 81.7s (1.5x)

Existing research on CL underscores the critical role of data augmentations (Wu et al., 2021; Yu et al., 2022; Yang et al., 2023b; Zhang et al., 2024b), since CL necessitates the creation of diverse contrastive views for effective implementation. In recommendation, there are two main types, shown in Fig. 1: (1) graph augmentation (Wu et al., 2021) and (2) feature augmentation (Yu et al., 2022).

Graph augmentation typically involves generating two distinct subgraphs by randomly discarding edges/nodes from original graph:

(3) $\mathcal{G}^{\prime}=\mathrm{Dropout}(\mathcal{G}(\mathcal{N},\mathcal{E}),p),\quad\mathcal{G}^{\prime\prime}=\mathrm{Dropout}(\mathcal{G}(\mathcal{N},\mathcal{E}),p),$

where $\mathcal{G}$ denotes the original graph, $\mathcal{N}$ and $\mathcal{E}$ denote the nodes and edges, respectively, $\mathcal{G}^{\prime}$ and $\mathcal{G}^{\prime\prime}$ denote the augmented subgraphs, and $p$ denotes the keep rate. Graph convolution operations are then performed on $\mathcal{G}^{\prime}$ and $\mathcal{G}^{\prime\prime}$ to obtain different views $\mathbf{e}^{\prime}$ and $\mathbf{e}^{\prime\prime}$ of the same node $\mathbf{e}$. Unlike graph augmentation, feature augmentation has garnered significant attention in numerous studies by introducing noise into the graph convolution process:

(4) $\mathbf{e}^{\prime}=\mathbf{e}+\triangle^{\prime},\quad\mathbf{e}^{\prime\prime}=\mathbf{e}+\triangle^{\prime\prime},$

where $\triangle^{\prime}$ and $\triangle^{\prime\prime}$ denote the added noise (e.g., Gaussian or uniform noise), sampled independently for each view. After obtaining the contrastive views, contrastive learning is performed using the CL loss in Eq. 2, $\mathcal{L}_{cl}=\mathcal{C}(\mathbf{e}^{\prime},\mathbf{e}^{\prime\prime})$, effectively extracting self-supervised signals from different views to alleviate the data sparsity problem.
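The two augmentation families in Eqs. 3 and 4 can be sketched as follows; the edge-list representation, the SimGCL-style sign-aligned uniform noise, and the parameter values are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dropout_edges(edge_index: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Eq. 3 (graph augmentation): keep each edge with probability p, giving a subgraph G'."""
    keep = torch.rand(edge_index.size(1)) < p
    return edge_index[:, keep]       # call twice for the two subgraphs G' and G''

def add_noise(e: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Eq. 4 (feature augmentation): perturb embeddings with small random noise.

    Here: unit-norm uniform noise, sign-aligned with e and scaled by eps
    (a SimGCL-style choice; eps is a tunable assumption)."""
    noise = F.normalize(torch.rand_like(e), dim=-1) * eps
    return e + noise * torch.sign(e)  # call twice for the two views e' and e''
```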

Although these contrastive views are effective in improving performance (as shown in Table 2), data augmentation has a significant drawback: generating contrastive views each epoch adds extra time, significantly increasing training costs. Table 1 provides the per-epoch time consumption for a GCN-based method (LightGCN (He et al., 2020)) and GCL-based methods (SGL-ED (Wu et al., 2021), SimGCL (Yu et al., 2022), NCL (Lin et al., 2022), CGCL (He et al., 2023) and VGCL (Yang et al., 2023b)). For a fair comparison, all methods are tested under the same experimental setup, with details provided in Section 5.1. SGL-ED, which utilizes graph augmentation to generate subgraphs, is extremely time-consuming. Additionally, most GCL-based methods (e.g., SimGCL) require repetitive graph convolution or modeling operations to obtain the contrastive views, further contributing to the high time cost. This raises an important question: Is it possible to propose a novel contrastive learning paradigm that does not rely on data augmentation techniques, but still generates high-quality contrastive views and improves training efficiency? While data augmentations have proven effective in generating contrastive views and enhancing recommendation performance, efficiently generating high-quality contrastive views remains a key challenge in contrastive learning.

3.2. Necessity of More Contrastive Views

Table 2. Contrastive View and Performance of GCN and GCL-based Methods. (User and Item denote the number of User-based and Item-based contrastive views, respectively.)
Method CL Views Amazon-Book Yelp2018
User Item R@20 N@20 R@20 N@20
LightGCN - - 0.0411 0.0315 0.0639 0.0525
SGL-ED 2 2 0.0478 0.0379 0.0675 0.0555
SimGCL 2 2 0.0515 0.0414 0.0721 0.0601
NCL 4 4 0.0481 0.0373 0.0685 0.0577
CGCL 6 6 0.0483 0.0380 0.0690 0.0560
VGCL 4 4 0.0515 0.0410 0.0715 0.0587

To further investigate the optimization mechanism of GCL-based recommendation methods, we present the number of contrastive views generated by these methods in Table 2. Specifically, the listed methods obtain different perspectives of the same node (e.g., $\mathbf{e}$) and treat each pair of generated views (e.g., view1 $=\mathbf{e}^{\prime}$ and view2 $=\mathbf{e}^{\prime\prime}$) as a positive pair (i.e., $\langle\mathrm{view1},\mathrm{view2}\rangle$) for contrastive learning. This approach allows them to leverage additional self-supervised signals to optimize model performance. As shown in the statistics, SimGCL, compared to SGL-ED, does not generate additional contrastive views but instead focuses on optimizing the data augmentation strategy. In contrast, subsequent methods such as NCL, CGCL, and VGCL shift their focus toward generating more contrastive views (i.e., significantly increasing the number of views). While these methods demonstrate improvements over baselines in their original studies, our experimental results (Table 2) reveal that the gains in recommendation performance are modest. Notably, these newer methods often fail to surpass the earlier SimGCL.

Figure 2. Framework of HFGCL. Left: User-item interaction. Center: High-order contrastive view generation process. In the figure, let $h=2$ and $L=3$. After aggregating high-order embeddings, user $u_{2}$ has a representation more similar to the positive item $i_{2}$, with only $u_{4}$ missing. This indicates a close relationship between $u_{2}$ and $i_{2}$ in the high-order embedding space. Right: Dual optimization objectives. The model training phase includes two optimization objectives, the primary and the contrastive objective; refer to Fig. 3 for their meaning.

Our experimental results suggest that while generating more contrastive views can provide richer self-supervised signals, the subtle differences among these signals pose significant challenges for effective utilization. Consequently, methods that prioritize generating a larger number of contrastive views often exhibit relatively inferior performance compared to SimGCL, which relies on high-quality contrastive views. As discussed in the previous section, identifying and generating high-quality contrastive views remains a critical yet often overlooked challenge in contrastive learning. Furthermore, as suggested by previous studies (Yang et al., 2023b), although self-supervised signals from different views demonstrate potential in alleviating data sparsity, the nuanced discrepancies between these signals significantly hinder their seamless fusion, ultimately limiting the effectiveness of these new methods.

4. HFGCL: HIGH-ORDER FUSION GRAPH CONTRASTIVE LEARNING

In this section, we propose a High-order Fusion Graph Contrastive Learning (HFGCL) framework in Fig. 2. HFGCL leverages high-order information to generate contrastive views and introduces a fusion CL loss, effectively integrating self-supervised signals from multi-scale CL objectives to alleviate the data sparsity issue.

4.1. User-Item Contrastive Learning Paradigm

Following the analysis in Section 3, we need a novel contrastive learning paradigm. We take inspiration from natural language processing (NLP) (Gao et al., 2022b), where contrastive views are generated by applying the dropout function to the input sentence. We believe that similar contrastive views also exist in graph-based recommendation tasks.

Considering a different perspective, we directly treat users and items as contrastive views. Given that users and items are first-order neighbors of each other, their high-order information is essentially identical. As graph convolution (GCN) progresses, users and items gradually capture neighborhood information, and their embeddings become progressively closer. Therefore, the GCN process ultimately generates a set of distinctive contrastive views.

Previous contrastive views originate from the same sample (e.g., $\mathbf{e}_{u}^{\prime}$ and $\mathbf{e}_{u}^{\prime\prime}$ from $\mathbf{e}_{u}$, as in Eq. 4), so one set of views can be selected as negative samples (e.g., $\sum_{i\in\mathcal{B}}\mathbf{e}_{u_{i}}^{\prime\prime}$):

(5) $\mathcal{L}_{cl}=\mathcal{C}(\mathbf{e}^{\prime}_{u},\mathbf{e}^{\prime\prime}_{u})+\mathcal{C}(\mathbf{e}^{\prime}_{i},\mathbf{e}^{\prime\prime}_{i}).$

However, since the user-item contrastive views differ from previous contrastive views, we utilize the item views and the user views as negative samples for users and items, respectively.

Formally, we propose the CL loss for user-item contrastive views, which differs from Eq. 5, as follows:

(6) $\mathcal{L}_{cl}=\mathcal{C}(\mathbf{e}_{u},\mathbf{e}_{i})+\mathcal{C}(\mathbf{e}_{i},\mathbf{e}_{u}),$

where $\mathbf{e}_{u}$ and $\mathbf{e}_{i}$ are taken from $\mathbf{E}=\frac{1}{L+1}(\mathbf{E}^{(0)}+\tilde{A}\mathbf{E}^{(0)}+...+\tilde{A}^{L}\mathbf{E}^{(0)})$.
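A minimal sketch of Eq. 6 follows, where interacted user-item pairs in a batch are positives and the in-batch samples of the opposing view act as negatives; normalization and naming are our assumptions.

```python
import torch
import torch.nn.functional as F

def user_item_cl(e_u: torch.Tensor, e_i: torch.Tensor, tau: float = 0.25) -> torch.Tensor:
    """Eq. 6: row k of e_u and row k of e_i are an interacted pair <u, i>.
    For user anchors the in-batch items are negatives, and symmetrically for items."""
    e_u = F.normalize(e_u, dim=-1)
    e_i = F.normalize(e_i, dim=-1)
    labels = torch.arange(e_u.size(0), device=e_u.device)
    logits_u = e_u @ e_i.T / tau    # C(e_u, e_i): items as negatives
    logits_i = e_i @ e_u.T / tau    # C(e_i, e_u): users as negatives
    return F.cross_entropy(logits_u, labels) + F.cross_entropy(logits_i, labels)
```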

4.2. High-order Contrastive Views

Many studies (Wang et al., 2019) use aggregation functions to integrate layer-wise embeddings into the final encoder. Although the encoder effectively aggregates information from both the sample itself and its neighborhoods, we find that the low-order information is highly sample-specific. This uniqueness can seriously reduce the similarity between positive pairs, thereby interfering with their mutual information.

To make the positive pairs more similar, we eliminate the low-order information ($l\leq h-1$) and aggregate the high-order information ($l\geq h$) as the final embeddings:

(7) $\mathbf{E}=\frac{1}{L+1}\Big(\underbrace{\mathbf{E}^{(0)}+...+\tilde{A}^{h-1}\mathbf{E}^{(0)}}_{\text{Low-order Information}}+\underbrace{\tilde{A}^{h}\mathbf{E}^{(0)}+...+\tilde{A}^{L}\mathbf{E}^{(0)}}_{\text{High-order Information}}\Big),$
(8) $\mathbf{E}=\frac{1}{L-h+1}(\tilde{A}^{h}\mathbf{E}^{(0)}+...+\tilde{A}^{L}\mathbf{E}^{(0)}),$

where $h$ denotes the layer at which high-order information begins. Unlike various GCL-based methods, by mining the properties of the existing embeddings, we find the high-order contrastive views within the graph convolution process itself. This idea reduces the time cost while preserving the original embedding information, thus improving both the efficiency and performance of the model.
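A sketch of the high-order aggregation in Eq. 8, reusing the sparse propagation idea from Section 2.1; it assumes $h\geq 1$, matching the search range {1, 2, 3} used in Section 5.1.

```python
import torch

def high_order_embeddings(E0: torch.Tensor, A_hat: torch.Tensor, L: int, h: int) -> torch.Tensor:
    """Eq. 8: drop layers 0..h-1 (low-order) and average only layers h..L (high-order)."""
    assert 1 <= h <= L
    emb, acc = E0, None
    for layer in range(1, L + 1):
        emb = torch.sparse.mm(A_hat, emb)        # A~^layer E^(0)
        if layer >= h:                           # keep only high-order layers
            acc = emb if acc is None else acc + emb
    return acc / (L - h + 1)
```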

4.3. Multi-scale Fusion of Self-supervised Signals

In recommender systems, data augmentation typically augments the original user or item embeddings with multiple GCN and modeling operations to generate different contrastive views. When calculating the CL loss, different views of the same sample are regarded as positive pairs, while one set of augmented views is selected as negative pairs. It is thus easy to see that the negative pairs used in previous CL losses come from a set of augmented views. In contrast, for the high-order contrastive views proposed in Section 4.2, we give the corresponding CL loss in Eq. 6. Different from the previous negative-pair sampling approach, the negative pairs come from the opposing views. This difference means that the self-supervised signals of the users and items themselves are missing.

Original CL Loss. To obtain this part of the self-supervised signal, we treat each sample itself as its own positive pair and other samples of the same type as negatives, and introduce the corresponding CL loss:

(9) $\mathcal{C}_{self}(\mathbf{e}_{a})=-\sum_{a\in\mathcal{B}}\mathrm{log}\frac{\mathrm{exp}(\mathbf{e}_{a}^{\top}\mathbf{e}_{a}/\tau)}{\sum_{b\in\mathcal{B}}\mathrm{exp}(\mathbf{e}^{\top}_{a}\mathbf{e}_{b}/\tau)},$

where $a$ and $b$ denote different user/item samples. Unlike previous GCL-based methods (Yu et al., 2022; Yang et al., 2023b), the two elements of each positive pair (i.e., $\langle a,a\rangle$) are identical. Subsequently, we combine the CL losses associated with the users and the items themselves:

(10) $\mathcal{L}^{self}_{cl}(\mathbf{e}_{u},\mathbf{e}_{i})=\mathcal{C}_{self}(\mathbf{e}_{u})+\mathcal{C}_{self}(\mathbf{e}_{i}).$

After obtaining the CL objectives for both the users and items themselves, we combine these objectives with the main CL loss (Eq. 6):

(11) $\mathcal{L}_{cl}=\mathcal{C}(\mathbf{e}_{u},\mathbf{e}_{i})+\mathcal{C}(\mathbf{e}_{i},\mathbf{e}_{u})+\mathcal{L}^{self}_{cl}(\mathbf{e}_{u},\mathbf{e}_{i}).$

Fusion CL Loss. Most studies obtain self-supervised signals through traditional CL objectives (e.g., separately obtaining signals from users and items). Although effective, we argue that differences among these self-supervised signals lead to suboptimal recommendations. Instead, our CL objective comprises four CL objectives to capture self-supervised signals. While the proposed CL objectives capture more signals, the differences among these signals in fact make them extremely difficult to integrate effectively. This motivates us to further improve our CL objective to fuse these diverse self-supervised signals. Unlike the traditional CL loss, our proposed CL loss allows both users and items to serve as negative pairs. A simple strategy is to directly treat the concatenation of the two contrastive views as negative pairs when constructing the contrastive objective:

(12) $\mathcal{C}_{con}(\mathbf{e}_{u},\mathbf{e}_{i})=-\sum_{\langle u,i\rangle\in\mathcal{B}}\mathrm{log}\frac{\mathrm{exp}(\mathbf{e}^{\top}_{u}\mathbf{e}_{i}/\tau)}{\sum_{j\in\mathcal{B}_{ui}}\mathrm{exp}(\mathbf{e}^{\top}_{u}\mathbf{e}_{j}/\tau)},$

where $\mathcal{B}_{u}$ and $\mathcal{B}_{i}$ denote the embeddings of $\mathbf{e}_{u}$ and $\mathbf{e}_{i}$ within the same batch, and $\mathcal{B}_{ui}=[\mathcal{B}_{u},\mathcal{B}_{i}]$ denotes the concatenation of $\mathcal{B}_{u}$ and $\mathcal{B}_{i}$. While this strategy appears sound, the CL loss disregards the distance between $\mathbf{e}_{i}$ and $\mathbf{e}_{j}$, focusing solely on the computation of $\mathbf{e}_{u}$ with $\mathbf{e}_{j}$, resulting in bias in the obtained self-supervised signals, as the sketch below illustrates.
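For completeness, a sketch of the concatenation strategy of Eq. 12; note how the denominator scores only the user anchor $\mathbf{e}_{u}$ against the pooled negatives, which is the source of the bias just discussed. Names and normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def concat_cl(e_u: torch.Tensor, e_i: torch.Tensor, tau: float = 0.25) -> torch.Tensor:
    """Eq. 12: anchor e_u scored against the concatenated negative pool B_ui = [B_u, B_i]."""
    e_u = F.normalize(e_u, dim=-1)
    e_i = F.normalize(e_i, dim=-1)
    negatives = torch.cat([e_u, e_i], dim=0)     # B_ui
    pos = (e_u * e_i).sum(dim=-1) / tau          # numerator: exp(e_u^T e_i / tau)
    logits = e_u @ negatives.T / tau             # denominator ignores e_i vs e_j
    return -(pos - torch.logsumexp(logits, dim=1)).mean()
```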

Figure 3. HFGCL Optimization Process. This process optimizes the primary objective, while fusing user, item, and user-item self-supervised signals from diverse contrastive views via a fusion CL objective.

To eliminate such biases, we propose a fused embedding representation $\mathbf{e}^{*}_{ui}$. Specifically, we dynamically reconstruct and fuse the high-order neighborhood information of users and items:

(13) $\mathbf{e}^{*}_{ui}=2\sum_{l=h}^{L}\Big(\alpha\sum_{i\in\mathcal{Z}_{u}}\frac{1}{\sqrt{|\mathcal{Z}_{u}|}\sqrt{|\mathcal{Z}_{i}|}}\mathbf{e}_{u}^{(l)}+(1-\alpha)\sum_{u\in\mathcal{Z}_{i}}\frac{1}{\sqrt{|\mathcal{Z}_{u}|}\sqrt{|\mathcal{Z}_{i}|}}\mathbf{e}_{i}^{(l)}\Big),$

where $\mathcal{Z}_{u}$ and $\mathcal{Z}_{i}$ denote the neighbors of the user and item, respectively, $\alpha$ is the parameter that adjusts the contributions of the user and item embeddings in the fusion process, and $\mathbf{e}_{u}^{(l)}$ and $\mathbf{e}_{i}^{(l)}$ denote the embeddings obtained by stacking multiple GCN layers on the initial embeddings $\mathbf{e}_{u}^{(0)}$ and $\mathbf{e}_{i}^{(0)}$, respectively. Subsequently, to better integrate the contrastive objectives generated by the user, item, and user-item contrastive views, we present an advanced fusion CL loss specifically adapted to high-order contrastive views:

(14) $\mathcal{C}_{fusion}(\mathbf{e}_{u},\mathbf{e}_{i})=-\sum_{\langle u,i\rangle\in\mathcal{B}}\mathrm{log}\frac{\mathrm{exp}(\mathbf{e}^{\top}_{u}\mathbf{e}_{i}/\tau)}{\sum_{j\in\mathcal{B}_{ui}}\mathrm{exp}(\mathbf{e}^{*\top}_{ui}\mathbf{e}_{j}/\tau)}.$

By calculating the similarity between the fused embedding $\mathbf{e}^{*}_{ui}$ and the richer negative pairs $\mathcal{B}_{ui}$, we can effectively integrate the self-supervised signals from diverse views and maximize the mutual information between the positive pair $\langle u,i\rangle$. Meanwhile, to illustrate how HFGCL optimizes the model, we provide the training process in Fig. 3 and demonstrate the signal fusion process. Since our high-order contrastive views select user-item interactions as positive pairs, unlike traditional contrastive methods that can only choose one set as negative pairs, we can simultaneously treat both sets of samples as negative pairs $\mathcal{B}_{ui}$ for contrastive learning. Furthermore, our fusion CL loss can effectively aggregate this additional negative-pair information and naturally fuse all the self-supervised signals.

Formally, we propose the evolution of HFGCL’s CL loss:

(15) $\mathcal{L}_{cl}=\underbrace{(\mathcal{C}(\mathbf{e}_{u},\mathbf{e}_{i})+\mathcal{C}_{self}(\mathbf{e}_{u}))}_{\text{supervise signal: user}}+\underbrace{(\mathcal{C}(\mathbf{e}_{i},\mathbf{e}_{u})+\mathcal{C}_{self}(\mathbf{e}_{i}))}_{\text{supervise signal: item}}\;\Rightarrow\;\underbrace{\mathcal{C}_{con}(\mathbf{e}_{u},\mathbf{e}_{i})+\mathcal{C}_{con}(\mathbf{e}_{i},\mathbf{e}_{u})}_{\text{supervise signal: user \& item}}\;\Rightarrow\;\mathcal{C}_{fusion}(\mathbf{e}_{u},\mathbf{e}_{i}).$

Through studying the relationships between multi-scale contrastive objectives, we simplify the various contrastive objectives traditionally used to capture different self-supervised signals. By utilizing the fusion CL loss, we efficiently obtain richer self-supervised signals, enabling personalized recommendations. A sketch of this objective follows.
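Below is a sketch of the fusion objective (Eqs. 13-14). Reconstructing $\mathbf{e}^{*}_{ui}$ exactly requires the neighborhood sets $\mathcal{Z}_{u}$, $\mathcal{Z}_{i}$ and per-layer embeddings, so this per-batch version approximates the fused anchor as a convex combination of the already-aggregated high-order views; that simplification, like all names here, is our assumption.

```python
import torch
import torch.nn.functional as F

def fusion_cl(e_u: torch.Tensor, e_i: torch.Tensor,
              alpha: float = 0.5, tau: float = 0.25) -> torch.Tensor:
    """Eq. 14: positives are <u, i>; the denominator scores a fused anchor e*_ui
    against the concatenated negative pool B_ui = [B_u, B_i]."""
    e_u = F.normalize(e_u, dim=-1)
    e_i = F.normalize(e_i, dim=-1)
    e_fused = 2 * (alpha * e_u + (1 - alpha) * e_i)  # simplified stand-in for Eq. 13
    negatives = torch.cat([e_u, e_i], dim=0)         # B_ui
    pos = (e_u * e_i).sum(dim=-1) / tau
    logits = e_fused @ negatives.T / tau             # e*_ui^T e_j / tau
    return -(pos - torch.logsumexp(logits, dim=1)).mean()
```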

4.4. Method Optimization

To enhance our contrastive method's suitability for the primary recommendation task, we adopt a multi-task training strategy to optimize the method's parameters. The overall loss function of HFGCL is defined as follows:

(16) $\mathcal{L}_{\mathrm{HFGCL}}=\mathcal{L}_{rec}+\lambda_{1}\mathcal{L}_{cl}+\lambda_{2}\|\Theta\|^{2}_{2},$

where $\lambda_{1}$ and $\lambda_{2}$ denote the weights for $\mathcal{L}_{cl}$ and $\|\Theta\|^{2}_{2}$, respectively, and $\Theta$ denotes the regularized parameters, i.e., the initial embeddings $\mathbf{E}^{(0)}$.
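Putting the pieces together, a sketch of Eq. 16 reusing the `bpr_loss` and `fusion_cl` sketches above; the weight values shown are just examples from the search ranges in Section 5.1.

```python
def hfgcl_loss(e_u, e_i, e_j, E0, lambda1: float = 1.0, lambda2: float = 1e-4):
    """Eq. 16: BPR loss + weighted fusion CL loss + L2 regularization on E^(0)."""
    l_rec = bpr_loss(e_u, e_i, e_j)    # main recommendation objective (Eq. 1)
    l_cl = fusion_cl(e_u, e_i)         # fused self-supervised objective (Eq. 14)
    l_reg = E0.pow(2).sum()            # ||Theta||_2^2 over the initial embeddings
    return l_rec + lambda1 * l_cl + lambda2 * l_reg
```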

4.5. Method Analysis

4.5.1. Time Complexity Analysis

To demonstrate the efficiency of HFGCL, we investigate the time complexity of GCN- and GCL-based methods. Specifically, the embeddings in HFGCL are generated through graph convolution, yielding a time complexity of $\mathcal{O}(2Ld|\mathcal{E}|)$, where $|\mathcal{E}|$ denotes the number of edges, $d$ denotes the embedding dimension, and $L$ denotes the number of GCN layers. Additionally, since calculating the CL loss requires fusing the self-supervised signals from the user, item, and user-item contrastive views, its time complexity is $\mathcal{O}(\mathcal{B}d(1+4\mathcal{B}))$.

Table 3. Time complexity of proposed HFGCL method and the GCN- and CL-based baselines.
Method Encoding CL Loss
NGCF (Wang et al., 2019) $\mathcal{O}(2Ld(|\mathcal{E}|+|\mathcal{V}|))$ -
LightGCN (He et al., 2020) $\mathcal{O}(2Ld|\mathcal{E}|)$ -
SGL-ED (Wu et al., 2021) $\mathcal{O}(2(1+2\rho)Ld|\mathcal{E}|)$ $\mathcal{O}(2\mathcal{B}d(1+\mathcal{B}))$
SimGCL (Yu et al., 2022) $\mathcal{O}(6Ld|\mathcal{E}|)$ $\mathcal{O}(2\mathcal{B}d(1+\mathcal{B}))$
NCL (Lin et al., 2022) $\mathcal{O}(2Ld|\mathcal{E}|+d|\mathcal{K}||\mathcal{V}|)$ $\mathcal{O}(4\mathcal{B}d+2\mathcal{B}|\mathcal{V}|)$
CGCL (He et al., 2023) $\mathcal{O}(2Ld|\mathcal{E}|)$ $\mathcal{O}(6\mathcal{B}d+3\mathcal{B}|\mathcal{V}|)$
VGCL (Yang et al., 2023b) $\mathcal{O}(2Ld|\mathcal{E}|+d|\mathcal{K}||\mathcal{V}|)$ $\mathcal{O}(4\mathcal{B}d(1+\mathcal{B}))$
LightGCL (Cai et al., 2023) $\mathcal{O}(2Ld(|\mathcal{E}|+q|\mathcal{V}|))$ $\mathcal{O}(2\mathcal{B}d+\mathcal{B}|\mathcal{V}|)$
BIGCF (Zhang et al., 2024b) $\mathcal{O}(2Ld|\mathcal{E}|+d|\mathcal{K}||\mathcal{V}|)$ $\mathcal{O}(5\mathcal{B}d(1+\mathcal{B}))$
HFGCL (Ours) $\mathcal{O}(2Ld|\mathcal{E}|)$ $\mathcal{O}(\mathcal{B}d(1+4\mathcal{B}))$

Table 3 summarizes the time complexity of HFGCL and several classical baselines, where $\rho$ denotes the probability of retaining edges or nodes in SGL-ED, $|\mathcal{K}|$ represents the number of collective intent nodes in BIGCF and the number of clusters in the clustering algorithms used by NCL and VGCL, $|\mathcal{V}|=M+N$ is the total number of user and item nodes, and $q$ indicates the required rank for SVD in LightGCL. GCL-based methods that generate contrastive views through data augmentation (e.g., SGL-ED and SimGCL) require additional graph convolution operations, which significantly increase the overall time complexity. Even methods that avoid these extra convolutions often treat all nodes as negative samples to maximize the mutual information of positive pairs (e.g., NCL and CGCL), while some approaches involve time-consuming specialized modeling processes, such as graph reconstruction in VGCL and intent modeling in BIGCF. In contrast, HFGCL eliminates the need for additional data augmentation or specialized modeling operations during training. By introducing high-order contrastive views and proposing a fusion contrastive loss, HFGCL achieves efficient contrastive learning, significantly enhancing both efficiency and recommendation performance.

4.5.2. Theoretical Analysis

Table 4. Comparison of user-item pair Cosine Similarities under different GCN layers on three datasets.
Layer Amazon-book Yelp2018 Tmall
$l=0$ 0.231 0.254 0.249
$0\leq l\leq L$ 0.312 0.306 0.277
$h\leq l\leq L$ 0.476 0.490 0.390

The core motivation of this paper is to construct high-quality contrastive views and to fuse self-supervised signals from different CL objectives. For the former, after proposing the user-item contrastive views, we utilize cosine similarity to measure the similarity between positive pairs (i.e., $\mathbf{e}_{u}$ and $\mathbf{e}_{i}$) in order to find high-quality contrastive views, as reported in Table 4. For the latter, to better integrate different self-supervised signals, we propose a fusion paradigm applicable to contrastive views. Specifically, the fusion CL loss simultaneously considers the distances between positive pairs and more negative samples, further improving the alignment of positive pairs and the uniformity of the sample space (Wang et al., 2022a). To explain further, we present the gradient of user $u$ in the fusion CL loss:

(17) $\frac{\partial\mathcal{C}_{fusion}}{\partial\mathbf{e}_{u}}=\frac{f}{\tau}\cdot\frac{-\mathbf{e}_{i}+\sum_{j\in\mathcal{B}_{ui}}(\mathbf{e}_{i}+\mathbf{e}_{j})}{\sum_{j\in\mathcal{B}_{ui}}\mathrm{exp}(\mathbf{e}^{*\top}_{ui}\mathbf{e}_{j}/\tau)}\approx\frac{1}{\tau}\Big\{c(u)+\sum_{j\in\mathcal{B}_{ui}/u}c(i,j)\Big\},$

where $f=\mathrm{exp}(\mathbf{e}_{u}^{\top}\mathbf{e}_{i}/\tau)$ and $c(\cdot)$ denotes the contribution provided. For the first term $c(u)$, the contribution of item $i$ to the gradient encourages user $u$ to move closer to item $i$, directly strengthening the connection between the positive pair. For the second term $c(i,j)$, the joint contribution of item $i$ and negative sample $j$ to the gradient indicates that user $u$ must consider the relationship between positive item $i$ and negative sample $j$, thereby improving the distancing from hard negative samples filtered by item $i$. The above demonstrates the effectiveness of the proposed fusion paradigm:

  • For user-item pairs, maximizing the mutual information directly reflects users’ deeper preferences.

  • For additional negative sample pairs, the effective fusion of information between positive and negative pairs makes the distribution of the feature space more uniform.

The same conclusion holds for the gradient of item $i$. Furthermore, regarding the temperature coefficient $\tau$, unlike previous work, we must assign a slightly larger $\tau$ to the fusion CL loss due to the inclusion of information from both positive pairs and more negative samples.

5. EXPERIMENTS

In this section, to demonstrate the effectiveness of HFGCL, we perform experimental comparisons with the state-of-the-art recommendation methods on three real datasets.

5.1. Experimental Settings

Datasets. To validate the effectiveness of HFGCL, we choose three datasets that are widely recognized and frequently used for benchmarking in numerous studies: Amazon-book (He et al., 2020), Yelp2018 (Sang et al., 2025), and Tmall (Ren et al., 2023). For all datasets, we treat all ratings >3 as the presence of an interaction (i.e., presence of interaction is 1, otherwise 0). We filter out users with fewer than 10 interactions to ensure the validity of the recommendation. The details of all datasets are shown in Table 5.
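The preprocessing just described (binarizing ratings above 3 and dropping users with fewer than 10 interactions) can be sketched with pandas as follows; the column names are hypothetical.

```python
import pandas as pd

def preprocess(ratings: pd.DataFrame) -> pd.DataFrame:
    """Keep ratings > 3 as interactions, then drop users with fewer than 10 interactions."""
    inter = ratings.loc[ratings["rating"] > 3, ["user", "item"]]
    per_user = inter.groupby("user")["item"].transform("size")
    return inter[per_user >= 10]
```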

Table 5. Statistics of the datasets.
Dataset #Users #Items #Interactions Density
Amazon-book 52.6k 91.6k 2984.1k 0.06%
Yelp2018 31.7k 38.0k 1561.4k 0.13%
Tmall 47.9k 41.4k 2619.4k 0.13%
Table 6. A comparison of the proposed HFGCL against the state-of-the-art baselines. The best value is highlighted in bold, the second-best value is underlined. R@ stands for Recall@ and N@ stands for NDCG@. 'Improv.%' denotes the relative improvement compared to the best baseline. * denotes significant improvements with t-test $p<0.05$ over the best baseline.
Method Amazon-book Yelp2018 Tmall
R@10 R@20 N@10 N@20 R@10 R@20 N@10 N@20 R@10 R@20 N@10 N@20
BPR-MF (UAI’2009) 0.0170 0.0308 0.0182 0.0239 0.0278 0.0486 0.0317 0.0394 0.0312 0.0547 0.0287 0.0400
NGCF (SIGIR’2019) 0.0199 0.0337 0.0200 0.0262 0.0331 0.0579 0.0368 0.0477 0.0374 0.0629 0.0351 0.0465
LightGCN (SIGIR’2020) 0.0228 0.0411 0.0241 0.0315 0.0362 0.0639 0.0414 0.0525 0.0435 0.0711 0.0406 0.0530
DirectAU (KDD’2022) 0.0296 0.0506 0.0297 0.0401 0.0414 0.0703 0.0477 0.0583 0.0475 0.0752 0.0443 0.0576
GraphAU (WSDM’2023) 0.0300 0.0502 0.0310 0.0400 0.0401 0.0691 0.0463 0.0574 0.0517 0.0840 0.0488 0.0625
Mult-VAE (WWW’2018) 0.0224 0.0407 0.0239 0.0315 0.0335 0.0584 0.0359 0.0450 0.0467 0.0740 0.0423 0.0552
CVGA (TOIS’2023) 0.0290 0.0492 0.0302 0.0379 0.0407 0.0694 0.0467 0.0571 0.0540 0.0854 0.0517 0.0648
DiffRec (SIGIR’2023) 0.0310 0.0514 0.0333 0.0418 0.0391 0.0665 0.0447 0.0556 0.0485 0.0792 0.0473 0.0612
SGL-ED (SIGIR’2021) 0.0263 0.0478 0.0281 0.0379 0.0395 0.0675 0.0448 0.0555 0.0457 0.0738 0.0434 0.0556
SimGCL (SIGIR’2022) 0.0313 0.0515 0.0334 0.0414 0.0424 0.0721 0.0488 0.0601 0.0559 0.0884 0.0536 0.0674
NCL (WWW’2022) 0.0266 0.0481 0.0284 0.0373 0.0403 0.0685 0.0458 0.0577 0.0459 0.0750 0.0429 0.0553
CGCL (SIGIR’2023) 0.0274 0.0483 0.0284 0.0380 0.0404 0.0690 0.0452 0.0560 0.0542 0.0880 0.0510 0.0655
VGCL (SIGIR’2023) 0.0312 0.0515 0.0332 0.0410 0.0425 0.0715 0.0485 0.0587 0.0557 0.0880 0.0533 0.0670
LightGCL (ICLR’2023) 0.0303 0.0506 0.0318 0.0397 0.0377 0.0657 0.0437 0.0539 0.0531 0.0832 0.0533 0.0637
RecDCL (WWW’2024) 0.0311 0.0525 0.0318 0.0407 0.0408 0.0690 0.0464 0.0567 0.0527 0.0853 0.0492 0.0632
BIGCF (SIGIR’2024) 0.0294 0.0500 0.0320 0.0398 0.0431 0.0730 0.0497 0.0603 0.0547 0.0876 0.0524 0.0664
SCCF (KDD’2024) 0.0287 0.0491 0.0294 0.0399 0.0423 0.0718 0.0489 0.0595 0.0478 0.0772 0.0453 0.0580
HFGCL 0.0346 0.0566 0.0376 0.0458 0.0448 0.0752 0.0515 0.0624 0.0577 0.0911 0.0555 0.0697
Improv.% 10.54% 7.81% 12.57% 9.57% 3.94% 3.01% 3.62% 3.83% 3.22% 3.05% 3.54% 3.41%

Baselines. To demonstrate the effectiveness of HFGCL, we select numerous state-of-the-art baselines for comparison.

  • Only MF-based method: BPR-MF (Koren et al., 2009).

  • GCN-based methods: NGCF (Wang et al., 2019) and LightGCN (He et al., 2020).

  • AU-based methods: DirectAU (Wang et al., 2022a) and GraphAU (Yang et al., 2023a).

  • Generative-based methods: Mult-VAE (Liang et al., 2018), CVGA (Zhang et al., 2023) and DiffRec (Wang et al., 2023).

  • CL-based methods: SGL-ED (Wu et al., 2021), NCL (Lin et al., 2022), SimGCL (Yu et al., 2022), CGCL (He et al., 2023), VGCL (Yang et al., 2023b), LightGCL (Cai et al., 2023), RecDCL (Zhang et al., 2024a), BIGCF (Zhang et al., 2024b) and SCCF (Wu et al., 2024).

Evaluation Indicators. To evaluate the recommendation efficacy of the HFGCL method, we choose the widely used metrics (Zhang et al., 2023; Yu et al., 2022), Recall@K and NDCG@K (K=10, 20).

Hyperparameters. To ensure fair and consistent comparisons, all experiments run on a Linux system equipped with a GeForce RTX 2080Ti GPU. We implement HFGCL in the PyTorch environment (code: https://anonymous.4open.science/r/HFGCL-main-2F79). The batch size is set to 4096, 2048, and 4096 for the Amazon-book, Yelp2018, and Tmall datasets, respectively. Except for RecDCL, all methods use an embedding size of 64, while RecDCL uses 2048, as it primarily studies variations across different embedding sizes. Embeddings are initialized using the Xavier strategy (Glorot and Bengio, 2010). Adam is used as the optimizer by default. For methods utilizing the GCN encoder, the number of GCN layers is chosen from {1, 2, 3}. Specifically, for HFGCL, the number of GCN layers $L$ is set to 3, the high-order starting layer $h$ is varied within {1, 2, 3}, and the temperature $\tau$ is selected from {0.20, 0.22, 0.24, 0.26, 0.28, 0.30}. The CL weight $\lambda_{1}$ is chosen from {0.1, 0.5, 1.0, 2.5}, the regularization weight $\lambda_{2}$ is set from {1e-3, 1e-4, 1e-5, 1e-6}, and $\alpha$ is set to 0.5 by default.
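For readability, the HFGCL search space above can be collected into a plain grid; the values are copied from the text, and the key names are our own.

```python
# Hyperparameter search space for HFGCL, as reported in this section.
HFGCL_GRID = {
    "gcn_layers": [3],                            # L
    "high_order_start": [1, 2, 3],                # h
    "tau": [0.20, 0.22, 0.24, 0.26, 0.28, 0.30],  # temperature coefficient
    "lambda1": [0.1, 0.5, 1.0, 2.5],              # CL weight
    "lambda2": [1e-3, 1e-4, 1e-5, 1e-6],          # regularization weight
    "alpha": [0.5],                               # user/item fusion weight
    "embedding_dim": [64],
}
```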

5.2. Overall Performance Comparisons

To illustrate the superior performance of HFGCL, we compare with all baselines in Table 6. The following observations are made:

  • Our HFGCL achieves superior performance compared to all baselines on the three sparse datasets. Specifically, compared to the strongest baseline, HFGCL improves NDCG@20 by 9.57%, 3.83%, and 3.41% on the Amazon-book, Yelp2018, and Tmall datasets, respectively (Table 6). These experimental results show that HFGCL has strong recommendation performance, enabling the provision of personalized recommendations.

  • Traditional MF-based methods generally underperform compared to GNN-based methods, underscoring the substantial improvements that graph structures bring to recommender systems. Among these, LightGCN serves as the encoder for most GNN-based methods due to its straightforward and efficient architectural design. HFGCL uses this encoder, and by extracting the high-order information, we obtain high-quality contrastive views, further improving the method's efficiency.

  • All CL-based methods, such as SimGCL, RecDCL, and BIGCF, demonstrate clear superiority over traditional methods, largely due to the self-supervised signals derived from CL. Within this group, HFGCL achieves the best performance, which is mainly attributed to its high-quality contrastive views and its ability to fuse different self-supervised signals with the fusion CL loss. Additionally, while RecDCL achieves sub-optimal performance w.r.t. Recall@20 on the Amazon-book dataset, its use of a 2048 embedding size may compromise the fairness of comparisons with other methods. In contrast, HFGCL achieves superior recommendation performance by leveraging high-order contrastive views for CL, without the need for any data augmentation.

5.3. Method Variants and Ablation Study

Table 7. Performance comparison of different variants of HFGCL and ablation experiments.
Method Amazon-book Yelp2018 Tmall
R@20 N@20 R@20 N@20 R@20 N@20
LightGCN 0.0411 0.0315 0.0639 0.0525 0.0711 0.0530
$\mathrm{HFGCL}_{b}$ 0.0535 0.0430 0.0666 0.0548 0.0845 0.0644
$\mathrm{HFGCL}_{h}$ 0.0536 0.0431 0.0718 0.0598 0.0873 0.0667
$\mathrm{HFGCL}_{s}$ 0.0555 0.0452 0.0734 0.0612 0.0888 0.0680
w/o high 0.0545 0.0432 0.0704 0.0581 0.0782 0.0590
w/o GCN 0.0539 0.0422 0.0722 0.0595 0.0870 0.0666
w/o view 0.0466 0.0373 0.0671 0.0550 0.0725 0.0540
w/o CL 0.0372 0.0289 0.0570 0.0457 0.0580 0.0426
HFGCL 0.0566 0.0458 0.0752 0.0624 0.0911 0.0697
Improv.% 37.71% 45.40% 17.68% 18.86% 28.13% 31.51%

To demonstrate the effectiveness of HFGCL’s components, we introduce various variants and conduct an ablation study:

  • $\mathrm{HFGCL}_{b}$: replace the CL loss with Eq. 6. ($0\leq l\leq L$)

  • $\mathrm{HFGCL}_{h}$: replace the CL loss with Eq. 6. ($l\geq h$)

  • $\mathrm{HFGCL}_{s}$: replace the CL loss with Eq. 11. ($l\geq h$)

  • $\mathrm{HFGCL}_{\mathrm{w/o\,high}}$: use user-item contrastive views. ($0\leq l\leq L$)

  • $\mathrm{HFGCL}_{\mathrm{w/o\,GCN}}$: remove the GCN encoder. ($l=0$)

  • $\mathrm{HFGCL}_{\mathrm{w/o\,view}}$: replace the CL loss with Eq. 10. ($l\geq h$)

  • $\mathrm{HFGCL}_{\mathrm{w/o\,CL}}$: remove the CL loss. ($l\geq h$)

From the results shown in Table 7, we draw the following conclusions. First, HFGCL outperforms LightGCN across all performance metrics. The variant $\mathrm{HFGCL}_{b}$, which utilizes user-item contrastive views, not only matches the performance of SGL-ED but also significantly surpasses LightGCN. For the variant $\mathrm{HFGCL}_{h}$, which eliminates low-order information, the CL loss effectively obtains self-supervised signals from high-order information. For the variant $\mathrm{HFGCL}_{s}$, with the additional self-supervised signals, there is a discernible enhancement in the method's recommendation quality. These variants adequately address the question raised in Section 3.1 regarding the necessity of data augmentations, demonstrating that they are not essential for creating contrastive views.

Furthermore, for the fusion CL loss, we conduct a series of ablation studies to validate its effectiveness. For $\mathrm{HFGCL}_{\mathrm{w/o\,high}}$, the results show that low-order information hinders the similarity between positive pairs. For $\mathrm{HFGCL}_{\mathrm{w/o\,GCN}}$, $\mathrm{HFGCL}_{\mathrm{w/o\,view}}$, and $\mathrm{HFGCL}_{\mathrm{w/o\,CL}}$, we remove the widely used GCN encoder, use the original CL loss, and remove the CL loss, respectively; the experimental results demonstrate the power of the fusion CL loss.

5.4. Method Efficiency Study

Figure 4. Total training costs comparison of HFGCL and other GCL-based methods on three datasets.

To verify the efficiency of HFGCL, we compare the total training cost against the state-of-the-art methods. As shown in Fig. 4, our HFGCL has the lowest training cost on all three datasets. This efficiency is primarily due to HFGCL quickly fusing self-supervised signals over the contrastive views, which enables it to converge exceedingly quickly. For SGL-ED and SimGCL, both data augmentation strategies necessitate additional graph convolutions, a process that is notably time-consuming. For NCL and CGCL, though they also do not require data augmentations, computing all users/items from the user-item interactions as negative samples is again extremely time-consuming. Moreover, for most GCL-based methods (using data augmentations), the overall convergence speed is significantly impacted because the quality of the generated embedding information is perturbed to varying degrees. It is noteworthy that despite the absence of training cost data for RecDCL, it ranks as the most time-consuming method in the comparison, primarily because of its embedding size of 2048.

5.5. Method Sparsity Study

To show the ability of HFGCL to address the data sparsity issue, we categorize the users of each dataset into three groups, sparse, common, and popular, following methodologies from previous research. The experimental results are shown in Fig. 5. In the sparsity study, we focus on the performance of the method in the sparse user group. Compared to the existing SOTA methods SimGCL and BIGCF, HFGCL shows significant performance improvements across all three datasets. Additionally, HFGCL maintains excellent performance across different levels of sparsity. On the Amazon-book dataset, we observe a general decline in performance for the common user group, possibly due to higher noise levels in this group. Nevertheless, HFGCL still demonstrates strong recommendation performance, proving its robustness and resistance to sparsity.

Figure 5 panels (y-axis: NDCG@20 (%)): (a) Amazon-book; (b) Yelp2018; (c) Tmall.
Figure 5. Performance comparison w.r.t. NDCG@20 for different user groups sparsity levels on three datasets.

5.6. Method Hyperparameter Study

In this section, we examine the sensitivity of HFGCL to various hyperparameters in Fig. 6 and Table 8. As depicted in Fig. 6, the method's performance is significantly affected by the value of $\tau$, whose optimal value is slightly higher than in previous methods. This increase is primarily due to the larger number of negative samples, necessitating a larger $\tau$ to ensure optimal acquisition of mutual information. In contrast, HFGCL's performance is barely affected by $\lambda_{1}$. We attribute this to our CL loss's robust adaptation to the high-order contrastive views, ensuring the method performs well regardless of the $\lambda_{1}$ value. This efficacy underscores the strength of HFGCL's CL loss. Table 8 shows the performance when the high-order starting layer $h$ of HFGCL is set to 1, 2, and 3. The results confirm the effectiveness of high-order information, indicating that when $h\leq l\leq L$, HFGCL significantly improves performance. They also show that stacking low-order information causes nodes to contain excessive self-information, reducing the similarity between positive pairs and resulting in suboptimal recommendations.

6. RELATED WORK

GNN-based Recommendation. Graph Neural Networks (GNNs) (Wu et al., 2022) have emerged as a pivotal research direction in recommender systems due to their unique architecture, which enables the effective capture of high-order information. The pioneering work of NGCF (Wang et al., 2019) introduced the use of GNNs for aggregating high-order neighborhood information. Building on this, SGCN (Wu et al., 2019) advanced the field by eliminating nonlinearities and consolidating multiple weight matrices. The most critical development, LightGCN (He et al., 2020), has been widely adopted for its ability to achieve high-quality encoding by retaining only the essential neighborhood aggregation component.

Furthermore, GNNs have been extensively explored in various recommendation scenarios (Wang et al., 2022b; Sharma et al., 2024). In sequential recommendation, many methods, e.g., GCE-GNN (Wang et al., 2020), leverage GNNs to aggregate items within each sequence via interaction graphs, thereby enhancing the quality of item encoding. In social recommendation, numerous methods, e.g., DVGRL (Zhang et al., 2024c) and GraphRec (Fan et al., 2019), combine interaction graphs with social relationship graphs to more effectively incorporate social relationships. However, despite their effectiveness, the lack of self-supervised signals hinders their ability to accurately capture users' preferences.

CL-based Recommendation. Recent developments in recommender systems research have predominantly emphasized contrastive learning (CL) (Yu et al., 2024). The self-supervised signals generated by CL effectively address data sparsity issues. CL-based methods like NCL (Lin et al., 2022), SimGCL (Yu et al., 2022), VGCL (Yang et al., 2023b), and BIGCF (Zhang et al., 2024b) typically employ data augmentation to create contrastive views, thereby facilitating CL. For example, SimGCL (Yu et al., 2022) incorporates noise during graph convolution to generate multiple contrastive views. In contrast to traditional GCL-based methods, NCL (Lin et al., 2022) and CGCL (He et al., 2023) do not rely on data augmentation. For instance, NCL uses cross-layer and clustering methods to generate contrastive views and treats all user and item embeddings as negative samples for training. While these methods have some benefits, they significantly increase the training cost. Our HFGCL, however, achieves optimal training efficiency and recommendation performance by treating high-order information as contrastive views and proposing a fusion CL loss.

Figure 6 panels: (a) $\tau$; (b) $\lambda_{1}$.
Figure 6. Hyperparameter sensitivities to (a) the temperature coefficient τ\tau and (b) the graph contrastive regularization weight λ1\lambda_{1} w.r.t. Recall@20 across three datasets.
Table 8. Hyperparameter sensitivity to $h$ w.r.t. Recall@20 and NDCG@20 across three datasets.
Method Amazon-book Yelp2018 Tmall
R@20 N@20 R@20 N@20 R@20 N@20
$\mathrm{HFGCL}_{1}$ 0.0526 0.0419 0.0742 0.0616 0.0888 0.0680
$\mathrm{HFGCL}_{2}$ 0.0566 0.0458 0.0752 0.0624 0.0906 0.0684
$\mathrm{HFGCL}_{3}$ 0.0542 0.0441 0.0744 0.0616 0.0911 0.0697

7. CONCLUSION

In this paper, we revealed the connection between data augmentations and contrastive learning (CL), and pointed out the challenges of each. These challenges motivated us to propose a novel recommendation method called High-order Fusion Graph Contrastive Learning (HFGCL), which filters out low-order information to create high-order contrastive views. Additionally, we proposed a fusion CL loss that uses more negative samples to maximize the mutual information, thus fusing diverse self-supervised signals. Our experimental results demonstrated the effectiveness of HFGCL on three public datasets.

In the future, we will focus on exploring the mechanism by which contrastive objectives generate self-supervised signals, particularly the interpretability of CL in recommender systems.

References

  • Assran et al. (2023) Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. 2023. Self-supervised Learning From Images With a Joint-Embedding Predictive Architecture. In IEEE Conference on Computer Vision and Pattern Recognition. 15619–15629.
  • Baevski et al. (2023) Alexei Baevski, Arun Babu, Wei-Ning Hsu, and Michael Auli. 2023. Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202. 1416–1429.
  • Bayer et al. (2022) Markus Bayer, Marc-André Kaufhold, and Christian Reuter. 2022. A Survey on Data Augmentation for Text Classification. Comput. Surveys 55, 7, Article 146 (Dec 2022), 39 pages.
  • Cai et al. (2023) Xuheng Cai, Chao Huang, Lianghao Xia, and Xubin Ren. 2023. LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. In The Eleventh International Conference on Learning Representations. arXiv:2302.08191
  • Chen et al. (2020) Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, Vol. 119. 1597–1607.
  • Chen et al. (2023) Weihua Chen, Xianzhe Xu, Jian Jia, Hao Luo, Yaohua Wang, Fan Wang, Rong Jin, and Xiuyu Sun. 2023. Beyond Appearance: A Semantic Controllable Self-supervised Learning Framework for Human-Centric Visual Tasks. In IEEE Conference on Computer Vision and Pattern Recognition. 15050–15061.
  • Fan et al. (2019) Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph Neural Networks for Social Recommendation. In The World Wide Web Conference (WWW ’19). 417–426.
  • Gao et al. (2022a) Chen Gao, Xiang Wang, Xiangnan He, and Yong Li. 2022a. Graph Neural Networks for Recommender System. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM ’22). 1623–1625.
  • Gao et al. (2022b) Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2022b. SimCSE: Simple Contrastive Learning of Sentence Embeddings. arXiv:2104.08821
  • Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 249–256.
  • He et al. (2023) Wei He, Guohao Sun, Jinhu Lu, and Xiu Susie Fang. 2023. Candidate-aware Graph Contrastive Learning for Recommendation. In Proceedings of the 46th International ACM SIGIR Conference (SIGIR ’23). 1670–1679.
  • He et al. (2020) Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 639–648.
  • Jing et al. (2023) Mengyuan Jing, Yanmin Zhu, Tianzi Zang, and Ke Wang. 2023. Contrastive Self-supervised Learning in Recommender Systems: A Survey. ACM Transactions on Information Systems 42, 2, Article 59 (Nov 2023), 39 pages.
  • Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37.
  • Liang et al. (2018) Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). 689–698.
  • Lin et al. (2022) Zihan Lin, Changxin Tian, Yupeng Hou, and Wayne Xin Zhao. 2022. Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning. In Proceedings of the ACM Web Conference 2022 (WWW ’22). 2320–2329.
  • Liu et al. (2023b) Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing Zhang, and Jie Tang. 2023b. Self-supervised Learning: Generative or Contrastive. IEEE Transactions on Knowledge and Data Engineering 35, 1 (2023), 857–876.
  • Liu et al. (2023a) Yixin Liu, Ming Jin, Shirui Pan, Chuan Zhou, Yu Zheng, Feng Xia, and Philip S. Yu. 2023a. Graph Self-supervised Learning: A Survey. IEEE Transactions on Knowledge and Data Engineering 35, 6 (2023), 5879–5900.
  • Rebuffi et al. (2021) Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, and Timothy A Mann. 2021. Data Augmentation Can Improve Robustness. In Advances in Neural Information Processing Systems, Vol. 34. 29935–29948.
  • Ren et al. (2023) Xubin Ren, Lianghao Xia, Jiashu Zhao, Dawei Yin, and Chao Huang. 2023. Disentangled Contrastive Collaborative Filtering. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23). 1137–1146.
  • Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. 452–461.
  • Sang et al. (2025) Lei Sang, Yu Zhang, Yi Zhang, Honghao Li, and Yiwen Zhang. 2025. Towards similar alignment and unique uniformity in collaborative filtering. Expert Systems with Applications 259 (2025), 125346.
  • Sharma et al. (2024) Kartik Sharma, Yeon-Chang Lee, Sivagami Nambi, Aditya Salian, Shlok Shah, Sang-Wook Kim, and Srijan Kumar. 2024. A Survey of Graph Neural Networks for Social Recommender Systems. ACM Comput. Surv. 56, 10, Article 265 (Jun 2024), 34 pages.
  • Wang et al. (2022a) Chenyang Wang, Yuanqing Yu, Weizhi Ma, Min Zhang, Chong Chen, Yiqun Liu, and Shaoping Ma. 2022a. Towards Representation Alignment and Uniformity in Collaborative Filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1816–1825.
  • Wang et al. (2022b) Shoujin Wang, Qi Zhang, Liang Hu, Xiuzhen Zhang, Yan Wang, and Charu Aggarwal. 2022b. Sequential/Session-based Recommendations: Challenges, Approaches, Applications and Opportunities. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (Madrid, Spain) (SIGIR ’22). 3425–3428.
  • Wang et al. (2023) Wenjie Wang, Yiyan Xu, Fuli Feng, Xinyu Lin, Xiangnan He, and Tat-Seng Chua. 2023. Diffusion Recommender Model. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23). 832–841.
  • Wang et al. (2019) Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference. 165–174.
  • Wang et al. (2020) Ziyang Wang, Wei Wei, Gao Cong, Xiao-Li Li, Xian-Ling Mao, and Minghui Qiu. 2020. Global Context Enhanced Graph Neural Networks for Session-based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 169–178.
  • Wu et al. (2023b) Chuhan Wu, Fangzhao Wu, Yongfeng Huang, and Xing Xie. 2023b. Personalized News Recommendation: Methods and Challenges. ACM Transactions on Information Systems 41, 1, Article 24 (Jan. 2023), 50 pages.
  • Wu et al. (2019) Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying Graph Convolutional Networks. In Proceedings of the 36th International Conference on Machine Learning, Vol. 97. 6861–6871.
  • Wu et al. (2021) Jiancan Wu, Xiang Wang, Fuli Feng, Xiangnan He, Liang Chen, Jianxun Lian, and Xing Xie. 2021. Self-supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 726–735.
  • Wu et al. (2023a) Le Wu, Xiangnan He, Xiang Wang, Kun Zhang, and Meng Wang. 2023a. A Survey on Accuracy-Oriented Neural Recommendation: From Collaborative Filtering to Information-Rich Recommendation. IEEE Transactions on Knowledge and Data Engineering 35, 5 (2023), 4425–4445.
  • Wu et al. (2022) Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, and Bin Cui. 2022. Graph Neural Networks in Recommender Systems: A Survey. Comput. Surveys 55, 5, Article 97 (Dec 2022), 37 pages.
  • Wu et al. (2024) Yihong Wu, Le Zhang, Fengran Mo, Tianyu Zhu, Weizhi Ma, and Jian-Yun Nie. 2024. Unifying Graph Convolution and Contrastive Learning in Collaborative Filtering. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). 3425–3436.
  • Xia et al. (2023) Lianghao Xia, Chao Huang, Chunzhen Huang, Kangyi Lin, Tao Yu, and Ben Kao. 2023. Automated Self-Supervised Learning for Recommendation. In Proceedings of the ACM Web Conference 2023. 992–1002.
  • Yang et al. (2023a) Liangwei Yang, Zhiwei Liu, Chen Wang, Mingdai Yang, Xiaolong Liu, Jing Ma, and Philip S. Yu. 2023a. Graph-based Alignment and Uniformity for Recommendation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4395–4399.
  • Yang et al. (2023b) Yonghui Yang, Zhengwei Wu, Le Wu, Kun Zhang, Richang Hong, Zhiqiang Zhang, Jun Zhou, and Meng Wang. 2023b. Generative-Contrastive Graph Learning for Recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1117–1126.
  • Yu et al. (2022) Junliang Yu, Hongzhi Yin, Xin Xia, Tong Chen, Lizhen Cui, and Quoc Viet Hung Nguyen. 2022. Are Graph Augmentations Necessary?: Simple Graph Contrastive Learning for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1294–1303.
  • Yu et al. (2024) Junliang Yu, Hongzhi Yin, Xin Xia, Tong Chen, Jundong Li, and Zi Huang. 2024. Self-supervised Learning for Recommender Systems: A Survey. IEEE Transactions on Knowledge and Data Engineering 36, 1 (2024), 335–355.
  • Zhang et al. (2024a) Dan Zhang, Yangliao Geng, Wenwen Gong, Zhongang Qi, Zhiyu Chen, Xing Tang, Ying Shan, Yuxiao Dong, and Jie Tang. 2024a. RecDCL: Dual Contrastive Learning for Recommendation. In Proceedings of the ACM on Web Conference 2024 (WWW ’24). 3655–3666.
  • Zhang et al. (2024b) Yi Zhang, Lei Sang, and Yiwen Zhang. 2024b. Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering. In Proceedings of the 47th International ACM SIGIR Conference (SIGIR ’24). Association for Computing Machinery, 1253–1262.
  • Zhang et al. (2023) Yi Zhang, Yiwen Zhang, Dengcheng Yan, Shuiguang Deng, and Yun Yang. 2023. Revisiting Graph-based Recommender Systems from the Perspective of Variational Auto-Encoder. ACM Transactions on Information Systems 41, 3 (2023), 1–28.
  • Zhang et al. (2024c) Yi Zhang, Yiwen Zhang, Yuchuan Zhao, Shuiguang Deng, and Yun Yang. 2024c. Dual Variational Graph Reconstruction Learning for Social Recommendation. IEEE Transactions on Knowledge and Data Engineering (2024). https://doi.org/10.1109/TKDE.2024.3386895