
Graph Cross-Correlated Network for Recommendation

Hao Chen, Yuanchen Bei, Wenbing Huang, Shengyuan Chen, Feiran Huang, and Xiao Huang. Hao Chen is with the Faculty of Data Science, City University of Macau, Macao SAR, China. E-mail: [email protected]. Yuanchen Bei is with the College of Computer Science and Technology, Zhejiang University, Hangzhou, China. E-mail: [email protected]. Wenbing Huang is with the Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China. E-mail: [email protected]. Shengyuan Chen and Xiao Huang are with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China. E-mail: [email protected]; [email protected]. Feiran Huang is with the College of Information Science and Technology, Guangzhou, China. E-mail: [email protected]. Hao Chen and Yuanchen Bei contributed equally to this research. Corresponding author: Feiran Huang ([email protected]).
Abstract

Collaborative filtering (CF) models, which represent users and items as embedding vectors, have demonstrated remarkable performance in recommender systems. Recently, due to the powerful capability of graph neural networks in modeling user-item interaction graphs, graph-based CF models have gained increasing attention. They encode each user/item and its subgraph into a single super vector by combining graph embeddings after each graph convolution. However, each hop of neighbors in the user-item subgraphs carries a specific semantic meaning. Encoding all subgraph information into single vectors and inferring user-item relations with dot products weakens the semantic information between user and item subgraphs, thus leaving potential untapped. Exploiting this untapped potential offers a way to improve the performance of existing recommendation models. To this end, we propose the Graph Cross-correlated Network for Recommendation (GCR), a general recommendation paradigm that explicitly considers correlations between user/item subgraphs. GCR first introduces the Plain Graph Representation (PGR) to extract information directly from each hop of neighbors into corresponding PGR vectors. Then, GCR develops Cross-Correlated Aggregation (CCA) to construct possible cross-correlated terms between the PGR vectors of user/item subgraphs. Finally, GCR comprehensively incorporates the cross-correlated terms for recommendation. Experimental results show that GCR outperforms state-of-the-art models on both interaction prediction and click-through rate prediction tasks.

Index Terms:
recommender systems, collaborative filtering, cross-correlation.

I Introduction

Collaborative filtering (CF) models have delivered remarkable recommendation performance by representing each user and item as an embedding vector, where the similarity between users/items is reflected in their embeddings [1, 2]. Consequently, an intuitive paradigm is to infer the potential relation between users and items by assessing the correlation of their embedding vectors, such as through the dot product or multilayer perceptrons. A typical way to obtain these embeddings is to directly apply matrix factorization on the user-item interaction matrix [3, 4]. Further advances utilize neural networks to learn the interactions [1, 5] or incorporate additional regularizers [6, 7]. As research progresses, the question of how to learn powerful and meaningful user/item embeddings has gained significant interest from both the academic and industrial communities.

Recently, there has been a surge in CF-based recommendation models that harness the powerful modeling capabilities of Graph Neural Networks (GNNs) on graph-structured data. These models leverage graph convolution to extract meaningful information from user-item behavior graphs, progressively gathering information from distant neighbors within the user/item subgraphs [8, 9, 10]. Embedding representations for each user and item are derived after each round of graph convolution and combined across layers into a single super vector that encapsulates their interaction patterns. These models then employ the conventional approach of computing the dot product of the super vectors to infer the relationship between a given user-item pair. Representatively, NGCF [8] uses concatenation to construct the super vectors, while LightGCN [9] employs a weighted sum. This research direction has consistently demonstrated state-of-the-art performance [9, 11, 12].

Figure 1: An illustration of the user-item graph and the different semantic meanings of each hop of neighbors in the user/item subgraph.

However, as shown in Figure 1, the recommendation graphs constructed from historical user-item interactions are heterogeneous bipartite graphs [13], where each hop of neighbors in the user-item subgraphs carries a specific semantic meaning. For instance, the first-hop neighbors of a given user comprise the items it has historically interacted with, and the second-hop neighbors are relevant users with similar interests who have interacted with common items. Therefore, explicitly considering whether a user has interacted with similar items, or whether its relevant users are interested in the target item, can provide valuable information. Simply encoding all subgraph information into single vectors and inferring the user-item relation through dot products undermines and weakens the semantic information between the user and item subgraphs, thereby leaving this potential untapped.

Exploiting this potential allows us not only to develop advanced recommendation models but also to improve the performance of existing CF models. Moreover, current GNNs adopt recursive graph convolution to aggregate neighbor embeddings layer by layer [8, 9, 14], resulting in an indistinguishable combination of information from different neighbor hops. This blending yields impure embeddings for each hop of neighboring nodes. Based on these observations, the traditional paradigm of using a single-vector representation leaves potential untapped for improving recommendation performance. It is therefore promising to devise non-recursive plain graph representation methods that obtain pure embeddings for each hop of neighbors, enabling a given user/item subgraph to be represented while explicitly extracting and considering all cross-correlations between user and item subgraphs.

Despite these promising prospects, two challenges remain. First, the lack of a suitable paradigm. Due to the graph convolution and single user-item vector inference architecture of current models, it is non-trivial to extract pure embeddings for each hop of neighbors and comprehensively capture meaningful correlations between these embeddings within existing model paradigms. Second, ensuring design universality. Recommendation models at different stages of a recommender system exhibit different characteristics; for instance, the models used in the ranking stage and the prediction stage have different architectures and computational efficiency requirements. Consequently, the designed model should adapt to the mainstream recommendation tasks and be able to plug into current recommender models.

To this end, in this paper, we investigate how to design a general graph-based recommendation paradigm that exploits the full potential of embedding vectors in user and item subgraphs. Specifically, we aim to answer the following three research questions: (1) How can we obtain pure embedding vectors for each hop of neighbors and represent a given user/item subgraph with more than a single vector? (2) How can we construct and model meaningful cross-correlation terms between the vectors of user subgraphs and item subgraphs? (3) How can we design a flexible paradigm that suits different recommendation tasks and can be equipped on current recommenders? By studying these research questions, we make the following main contributions:

  • We propose a general graph-based recommendation paradigm called the Graph Cross-correlated Network for Recommendation (GCR), which represents user/item subgraphs as multiple vectors and explicitly extracts and comprehensively considers the cross-correlations between a pair of user-item subgraphs. GCR is not only an advanced recommendation model in itself but also flexible enough to improve the performance of existing recommendation models.

  • We develop a novel non-recursive architecture for subgraph representation in GCR, called Plain Graph Representation (PGR). Unlike the graph convolution process used in previous recommendation models, PGR first unfolds all neighbors and then aggregates each hop of neighbors separately.

  • We further provide Cross-Correlated Aggregation (CCA) to explicitly extract and comprehensively consider all cross-correlations between the hop-wise vectors of user and item subgraphs. We also theoretically show that GCR has significantly higher flexibility than existing state-of-the-art graph-based recommendation models.

  • Extensive experimental results on both public datasets and the industrial dataset demonstrate that GCR significantly outperforms the state-of-the-art models. Moreover, replacing the dot product or MLP layer in baseline models with GCR can also significantly improve the recommendation performance of both interaction prediction and click-through rate prediction tasks.

II Related Work

Current recommendation models can be categorized into three classes based on how the embedding is inferred: Collaborative Filtering (CF), Graph Neural Network (GNN)-based models, and Click-Through Rate (CTR) prediction models. Here, we discuss these types of methods and highlight the necessity of the graph cross-correlated network.

II-A Collaborative Filtering

Collaborative Filtering (CF) is a prevalent technique for recommender systems that parameterizes users and items as embeddings and learns their embeddings based on historical user-item interactions [1, 15, 16, 17].

One typical type of pioneering work, Matrix Factorization (MF) [3, 4], projects the user and item indexes into embedding vectors. It reconstructs historical user-item interactions by computing the dot product between user embeddings and item embeddings. To improve the quality of these embeddings, researchers have explored incorporating various types of auxiliary information, such as content information [18, 19], social relations [20, 21], user reviews [22, 23], and knowledge graphs [24, 25, 26, 27]. Although the dot product can extract interaction information by forcing interacted user-item pairs to achieve a larger dot product value, the linear interaction modeling limits the modeling performance when dealing with complex, sparse, or implicit interactions [1, 28]. To enhance the expressive power of the models and capture more intricate interactions, researchers have explored the use of neural networks to learn the interactions [1, 29]. This approach offers a promising avenue to improve the approximation capabilities and handle complex patterns in user-item interactions.

Another line of research finds that user-item interactions form a bipartite or heterogeneous graph. In this sense, conventional CF approaches derive user/item embeddings by only considering 1-hop interactions. Therefore, graph embedding methods, initially introduced for social network data mining tasks such as node classification, clustering, and community detection [30, 31], have demonstrated significant potential in real-world recommender systems. For example, DeepWalk [32] and Node2Vec [33] adapt word embedding methods [34] to graph structures and construct the corpus by randomly walking on social networks. Inspired by this, MetaPath2Vec [35] introduces the metapath-based random walk upon heterogeneous networks to embed users, items, and any other type of nodes together.

II-B Graph Neural Networks for Recommendations

In recent years, graph neural networks (GNNs) have garnered significant attention and have demonstrated state-of-the-art performance across various graph-related tasks [14, 36, 37, 38, 39, 40]. Recognizing the strengths of GNNs, researchers have begun incorporating them into Collaborative Filtering (CF) methods to enhance the learning of more powerful embeddings [9, 41]. For instance, NGCF [8] adapts user-to-item propagation and user-to-user propagation to extract graph embeddings for each user and item and then computes the dot product of concatenated layer-wise graph embeddings. To speed up the training process, LightGCN [9] discards the non-linear layers in NGCF and aggregates the layer-wise graph embeddings with weighted summations. UltraGCN [42] replaces the complex graph convolution process with simple auxiliary objective losses. Some recent studies further include data augmentation and self-supervised learning for enhancing graph-based recommendation models [11, 43]. Representatively, SGL [44] first introduces self-supervised learning on the user-item graph by creating multiple views of a node. It is trained using contrastive learning that maximizes agreement between these different views of the same node while minimizing agreement with views of different nodes. NCL [45] enhances the neighbor set with semantic neighbors guided by contrastive learning. SimGCL [46] is a simple but effective graph contrastive learning model with the removal of unnecessary augmentations. CGCL [47] further includes the novel strategies of candidate contrastive learning and candidate structural neighbor contrastive learning. Ultimately, these GNN-based models infer user-item interactions by computing the dot product of the super representation embedding vectors for users and items. By combining the power of GNNs with CF methods, these models leverage the graph structure of user-item interactions [10].

Although these methods have achieved state-of-the-art recommendation performance, an architecture that simply aggregates the information of users/items and their neighbors might not have enough degrees of freedom to model sophisticated user-item relationships. Despite the progress in learning more powerful embeddings, most of the aforementioned methods compute user-item relevance through a simple inner product [3, 4, 9] or an MLP [1, 5]. Early on, NIA-GCN [48] conducted interactions between each central node and its neighbors before aggregation. However, this process does not consider the relevance between the user subgraph and the item subgraph. There have been few efforts to design more expressive relevance metrics or to investigate how the relevance metrics might influence the eventual recommendations.

Therefore, in this paper, our proposed Graph Cross-correlated Network for Recommendation (GCR) explores whether there exists a learnable relevance metric that is compatible with any kind of embeddings, including CF embeddings, graph embeddings, and GNN embeddings. This not only helps to enhance models in existing recommender systems but also enables researchers to gain a deeper understanding of recommendation systems.

II-C Click-Through Rate Prediction

Click-Through Rate (CTR) prediction is a crucial stage for online recommender systems and computational advertisements [29, 49, 50]. Existing models for CTR prediction can be largely divided into two main categories: feature-interaction methods and user-interest modeling methods.

Due to the high sparsity of raw input features and the difficulty of directly utilizing such features, feature interaction-based models have received extensive research attention for CTR prediction. Representatively, FM [51] first introduces latent vectors for second-order feature interaction to address feature sparsity. Wide&Deep [52] conducts feature interaction with a wide linear regression model and a deep feed-forward network under joint training. AFM [53] then designs an attention mechanism for cross-feature learning. DeepFM [54] further replaces the linear regression in Wide&Deep with FM to avoid feature engineering. Since a user's historical interactions can reflect their interests, another line of recent work focuses more on user interest modeling using deep neural networks and achieves state-of-the-art results. Notably, DIN [29] first designs a deep interest network with an attention mechanism between the user's behavior sequence and the target item. DIEN [49] further enhances DIN with gated recurrent units to mine the user's interest evolution patterns. UBR4CTR [55] and SIM [56] then design a two-stage paradigm, searching for relevant items and computing their attention scores with the target, to learn from the user's life-long behaviors. DCIN [57] further integrates explicit and implicit decision-making contexts. Moreover, GMT [58] and NRCGI [59] include graph sampling and graph clustering to incorporate graph information into CTR prediction in an efficient way.

We also conduct experimental analyses equipping CTR prediction models with our proposed GCR on both a public dataset and a real-world industrial dataset, which show that GCR generalizes to provide remarkable improvements on the CTR prediction task.

III Preliminaries

Notations. We denote the sets of users and items as $\mathcal{U}$ and $\mathcal{I}$, respectively, and use $u\in\mathcal{U}$ as the user index and $i\in\mathcal{I}$ as the item index. When no distinction between users and items is needed, we adopt $j$ as the index. Let $\bm{R}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}$ be the user-item interaction matrix; for each $r_{ui}\in\bm{R}$, we have $r_{ui}=y_{ui}$, where $y_{ui}=1$ if the interaction between user $u$ and item $i$ is observed and $y_{ui}=0$ otherwise. The bipartite graph implied by the interaction matrix is then defined as $\mathcal{G}=(\mathcal{U},\mathcal{I},\bm{R})$.

We now provide the formal formulation of the recommendation task, followed by the mathematical formalization of several representative families of methods, i.e., CF-based models, GNN-based models, and CTR prediction models.

Definition 1 (Recommendation Task).

Given the interaction graph $\mathcal{G}$, the recommendation task is to predict the relationships between target users and target items. Formally, a recommender model aims to learn a relevance function $r:\mathcal{U}\times\mathcal{I}\rightarrow\mathbb{R}$ from the interactions in $\mathcal{G}$, such that $\hat{y}_{ui}=r(u,i)$ computes the relevance between $u$ and $i$.

CF-based models. As shown in Figure 2-(a), CF-based recommender models [3, 6, 4] learn the embeddings of users/items by fitting the constructed interaction matrix $\bm{R}$:

$\bm{E}_{\mathcal{U}},\ \bm{E}_{\mathcal{I}} = f_{CF}(\bm{R}),$  (1)

where $\bm{E}_{\mathcal{U}}$ and $\bm{E}_{\mathcal{I}}$ denote the learned embeddings of the user set and the item set, respectively. The inference of user-item interactions for CF-based recommendation models can then be written as:

$r_{CF}(u,i) = \bm{e}_{u}^{\top} \cdot \bm{e}_{i},$  (2)

where $\bm{e}_{u}=\bm{E}_{\mathcal{U}}[u,:]\in\mathbb{R}^{d}$ and $\bm{e}_{i}=\bm{E}_{\mathcal{I}}[i,:]\in\mathbb{R}^{d}$ are the embeddings of user $u$ and item $i$, respectively, and $d$ denotes the embedding dimension. Denoting the embeddings of all users as $\bm{E}_{\mathcal{U}}$ and of all items as $\bm{E}_{\mathcal{I}}$, the relevance function in Eq. (2) generalizes to

$r_{CF}(u,i) = g(u, i; \mathcal{G}, \bm{E}_{\mathcal{U}}, \bm{E}_{\mathcal{I}}),$  (3)

where the function $g(\cdot)$ is conditioned on the interaction graph and the embeddings.
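To make Eqs. (1)-(3) concrete, the following minimal sketch scores user-item pairs with the dot-product relevance function; the random embedding matrices are hypothetical stand-ins for the output of $f_{CF}$, not part of any particular model:

```python
import numpy as np

# Hypothetical pre-learned embedding matrices standing in for f_CF's output.
rng = np.random.default_rng(0)
n_users, n_items, d = 1000, 500, 64
E_U = rng.normal(size=(n_users, d))   # user embedding matrix
E_I = rng.normal(size=(n_items, d))   # item embedding matrix

def r_cf(u: int, i: int) -> float:
    """Relevance of user u to item i as the dot product e_u^T . e_i (Eq. 2)."""
    return float(E_U[u] @ E_I[i])

# Ranking all items for one user is a single matrix-vector product.
score = r_cf(3, 42)
scores_u3 = E_U[3] @ E_I.T            # shape: (n_items,)
```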

GNN-based models. Graph neural networks [14, 36] have become one of the most successful tools for graph modeling. When GNNs are tailored for recommendation tasks, a new family of methods is developed [8, 9]. As depicted in Figure 2-(b), GNN-based recommendation models first generate multi-hop graph embeddings by recursively passing messages among neighboring nodes and then aggregate the multi-hop graph embeddings to construct a super vector for the final inference.

The recursive computation to obtain the multi-hop graph embeddings can be formally written as:

$\bm{e}_{j}^{(l+1)} = \text{GCL}\big(\bm{e}_{j}^{(l)}, \{\bm{e}_{k}^{(l)} \mid k \in \mathcal{N}_{j}\}\big),$  (4)

where $\mathcal{N}_{j}$ denotes the neighbors of node $j$, and $\bm{e}_{k}^{(0)}$ denotes the embedding of node $k$ before the graph convolutional process. $\text{GCL}(\cdot)$ denotes one layer of the graph convolution computation. Both the embeddings and the GNN parameters are learned from the observed interactions. Recent state-of-the-art work, such as LightGCN [9], finds that a more lightweight GNN model is more beneficial.
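As a rough illustration of the recursive computation in Eq. (4), the sketch below implements one simplified GCL layer as plain mean aggregation over neighbors (in the LightGCN spirit); the adjacency list and toy graph are assumptions, and degree-normalization variants are omitted:

```python
import numpy as np

def gcl(emb: np.ndarray, neighbors: dict) -> np.ndarray:
    """One graph convolution layer (Eq. 4), simplified to mean aggregation."""
    out = np.zeros_like(emb)
    for j, nbrs in neighbors.items():
        if nbrs:                              # aggregate neighbor messages
            out[j] = emb[nbrs].mean(axis=0)
    return out

# Stacking L layers yields the multi-hop embeddings e^(0), ..., e^(L).
rng = np.random.default_rng(1)
emb0 = rng.normal(size=(5, 8))                # 5 toy nodes, d = 8
neighbors = {0: [3, 4], 1: [3], 2: [4], 3: [0, 1], 4: [0, 2]}
layers = [emb0]
for _ in range(2):                            # L = 2 propagation steps
    layers.append(gcl(layers[-1], neighbors))
```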

After obtaining multi-hop graph embeddings, GNN-based models aggregate these embeddings into super vectors with concatenation or weighted summation and then use the dot product of the super vectors to infer user-item interactions. Specifically, the representative NGCF [8] employs concatenation to aggregate the multi-hop graph embeddings, and the inference of the user-item interactions can be given as:

$r_{NGCF}(u,i) = \big(\parallel_{l=0}^{L} \bm{e}_{u}^{(l)}\big)^{\top} \cdot \big(\parallel_{l=0}^{L} \bm{e}_{i}^{(l)}\big) = \sum_{l=0}^{L} \bm{e}_{u}^{(l)\top} \cdot \bm{e}_{i}^{(l)},$  (5)

where $L$ denotes the number of graph convolutional layers. Similarly, for the state-of-the-art model LightGCN [9], which aggregates multi-hop graph embeddings with weighted summation, the inference of the user-item interactions can be written as follows:

$\bm{e}_{x} = \sum_{l=0}^{L} w_{x}^{(l)} \bm{e}_{x}^{(l)}, \quad x \in \{u, i\}, \qquad r_{LightGCN}(u,i) = \bm{e}_{u}^{\top} \cdot \bm{e}_{i},$  (6)

where $w_{u}^{(l)}$ denotes the aggregation weight for the user's $l$-hop graph embedding, and $w_{i}^{(l)}$ denotes that for the item's $l$-hop graph embedding.
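The identity in Eq. (5) (a dot product of concatenated layers equals the sum of layer-wise dot products) and the weighted-sum aggregation of Eq. (6) can both be checked numerically, as in the sketch below; the uniform weights mirror LightGCN's common choice, and the random vectors are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
L, d = 2, 64
e_u = rng.normal(size=(L + 1, d))   # per-layer user embeddings e_u^(l)
e_i = rng.normal(size=(L + 1, d))   # per-layer item embeddings e_i^(l)

# NGCF-style (Eq. 5): concatenation followed by one dot product equals
# the sum of layer-wise dot products.
r_ngcf = np.concatenate(e_u) @ np.concatenate(e_i)
assert np.isclose(r_ngcf, sum(e_u[l] @ e_i[l] for l in range(L + 1)))

# LightGCN-style (Eq. 6): weighted sum over layers, then a dot product.
w = np.full(L + 1, 1.0 / (L + 1))   # uniform aggregation weights
r_lightgcn = (w @ e_u) @ (w @ e_i)
```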

CTR prediction models. CTR prediction plays a central role in recommendation systems and online advertisements due to its huge commercial value in predicting user preferences towards items/advertisements in real time [54, 29, 59]. Since our proposed GCR can also flexibly enhance CTR prediction performance, we provide the formalization of CTR prediction models here.

Given a target user-item pair $(u,i)$, optional context features $\bm{s}_{ui}$, and the interaction graph $\mathcal{G}$, the CTR prediction task is to predict the target user $u$'s clicking probability $\hat{y}_{ui}$ on the target item $i$. Formally, a CTR model aims to learn an accurate prediction function $\mathcal{F}(\cdot)$ that outputs the predicted clicking probability $\hat{y}_{ui}$ as:

$\hat{y}_{ui} = \mathcal{F}(\bm{e}_{u}, \bm{e}_{i}, \bm{s}_{ui}, \mathcal{G}),$  (7)

where the learnable function $\mathcal{F}(\cdot)$ in CTR prediction models has the objective of minimizing the difference between the predicted $\hat{y}_{ui}$ and the ground-truth $y_{ui}$.

The essential notations and definitions used in this paper are listed in Table I.

TABLE I: Main symbols and definitions in the paper.
Symbol  Description
$\mathcal{U}$  the set of users
$\mathcal{I}$  the set of items
$\bm{R}$  the user-item interaction matrix
$\mathcal{G}=(\mathcal{U},\mathcal{I},\bm{R})$  the user-item bipartite interaction graph
$r_{ui}\in\bm{R}$  the ground-truth interaction relationship between user $u$ and item $i$
$d$  the dimension of user/item embeddings
$\bm{E}_{\mathcal{U}}\in\mathbb{R}^{|\mathcal{U}|\times d}$  the embedding matrix of the user set
$\bm{E}_{\mathcal{I}}\in\mathbb{R}^{|\mathcal{I}|\times d}$  the embedding matrix of the item set
$\bm{e}_{u}$  the embedding of a given user $u$
$\bm{e}_{i}$  the embedding of a given item $i$
$\mathcal{N}_{j}$  the neighboring nodes of a user/item node $j$ on the interaction graph
$\hat{y}_{ui}$  the predicted probability that user $u$ will interact with item $i$

IV Methodology

Figure 2: (a) & (b): Sketch of existing recommendation models; (c): The overall framework architecture of our proposed GCR. (a) CF-based recommendation models utilize the dot product of the learned user/item embeddings to infer user-item interactions. (b) GNN-based recommendation models begin by performing graph convolutions to generate embeddings for multi-hop neighbors. These embeddings are then combined to construct super vectors, which are used for making predictions. (c) GCR first extracts graph embeddings with Plain Graph Representation and then flexibly considers all potential cross-correlations between the target user and the target item with Cross-Correlated Aggregation to infer their interaction.

To model the intricate cross-correlations among users and items, we aim to propose a flexible framework that can effectively learn and identify dominant groups of neighbors and that can also be applied to enhance current recommender models. Several existing studies have explored using an MLP rather than the dot product (Eq. (2)) to implement the relevance function $r$ [1, 5]. Theoretically, an MLP can approximate any continuous function given sufficiently many hidden units/layers, according to the universal approximation theorem [60]. However, formulating $r$ in this way encounters two issues. First, it lacks topology awareness, as it only considers the original user/item embeddings without incorporating graph embeddings as GNN-based recommendation models do, which impedes the expressivity of modeling multi-hop user-item relevance. Second, although MLPs are theoretically universal, they are difficult to use to approximate the dot product in practice [61] and thereby exhibit inferior performance compared to specially designed architectures [51, 54].

To address these two issues, we propose a simple yet effective novel framework: the Graph Cross-correlated Network for Recommendation (GCR). The core idea of GCR is illustrated in Figure 2-(c). It consists of two major components: Plain Graph Representation (PGR) and Cross-Correlated Aggregation (CCA). PGR encodes the interaction topology in an efficient way, while CCA flexibly aggregates the cross-correlated terms among multi-hop user/item subgraph embeddings. We summarize GCR by redefining Eq. (3) as follows:

$\bm{h}_{x} = \text{PGR}(x; \mathcal{G}, \bm{E}_{\mathcal{U}}, \bm{E}_{\mathcal{I}}), \quad x \in \{u, i\}, \qquad r_{GCR}(u,i) = \text{CCA}(\bm{h}_{u}, \bm{h}_{i}).$  (8)

We now introduce the details of the PGR and CCA components in the following subsections.

IV-A Plain Graph Representation (PGR)

As depicted in Figure 2-(c), the Plain Graph Representation (PGR) module operates in two steps. First, it extracts a subgraph of a specific depth for each user/item instance. Second, it performs information aggregation over the retrieved subgraph in an unfolding manner.

In particular, we iteratively track the neighbors of each target node $j$ with breadth-first search, obtaining $\{\mathcal{N}_{j}^{(0)}, \mathcal{N}_{j}^{(1)}, \cdots, \mathcal{N}_{j}^{(L)}\}$, where $\mathcal{N}_{j}^{(l)}$ contains the $l$-hop neighbors of $j$ and $\mathcal{N}_{j}^{(0)}$ contains node $j$ itself, i.e., $\mathcal{N}_{j}^{(0)}=\{j\}$. We apply the simple but effective mean-pooling operator to read out the representation of each hop of neighbors, namely:

$\bm{e}_{j}^{(l)} = \text{Readout}\big(\{\bm{e}_{k}^{(0)} \mid k \in \mathcal{N}_{j}^{(l)}\}\big) = \frac{1}{|\mathcal{N}_{j}^{(l)}|} \sum_{k \in \mathcal{N}_{j}^{(l)}} \bm{e}_{k}^{(0)},$  (9)

where $\bm{e}_{j}^{(l)}$ is the readout vector of node $j$'s $l$-hop neighbors.
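A minimal sketch of this two-step procedure, assuming an adjacency-list graph and the convention that nodes already visited at earlier hops are excluded from later hop sets:

```python
import numpy as np

def pgr(j: int, neighbors: dict, emb0: np.ndarray, L: int) -> list:
    """Return [e_j^(0), ..., e_j^(L)] per Eq. (9) via mean-pooled hop sets."""
    hops, visited = [{j}], {j}                 # N_j^(0) = {j}
    for _ in range(L):                         # breadth-first expansion
        frontier = {k for v in hops[-1] for k in neighbors.get(v, [])
                    if k not in visited}
        visited |= frontier
        hops.append(frontier)
    d = emb0.shape[1]
    return [emb0[sorted(h)].mean(axis=0) if h else np.zeros(d)
            for h in hops]                     # mean-pooling readout

# Toy usage on the 5-node graph from the GCL sketch above.
rng = np.random.default_rng(3)
emb0 = rng.normal(size=(5, 8))
neighbors = {0: [3, 4], 1: [3], 2: [4], 3: [0, 1], 4: [0, 2]}
h_0 = pgr(0, neighbors, emb0, L=2)             # three hop vectors for node 0
```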

Compared with GNN-based models. The superiority of PGR over GNN-based approaches for recommendation lies in three main aspects. First, PGR is more suitable for recommendation tasks since it extracts graph information without mixing the personalized embeddings of users and items. The recursive graph convolution in Eq. (4) was originally designed for homogeneous graphs, where all nodes are of the same kind, and passing messages along the edges helps to extract local graph information. However, for recommendation on the user-item bipartite graph, users and items have totally different semantic meanings and connecting characteristics. For example, users usually have relatively stationary interests and click a certain range of items, while popular items may be exposed to and clicked by a great number and a wide range of users. Thus, mixing the embeddings of different types of nodes may introduce information noise when modeling the relations between users and items. Second, PGR, though simple, is more robust than GNNs for the recommendation task. The observed connections between users/items in recommendation are usually noisy and incomplete, and performing recursive message passing as GNNs do would spread the noise along the noisy connections, hence impeding model training. In PGR, we use mean pooling to devise the representation for each instance, which is robust against such noise. Lastly, PGR is more computationally efficient, as no recursive updating is needed, ensuring efficiency for online recommendation. Besides, PGR is parameter-free and can thus be pre-computed and stored to further save time, enhance model efficiency, and accelerate inference. Experimental comparisons between PGR and GNN-like models are reported in Table IV and Table V in Section V.

IV-B Cross-Correlated Aggregation (CCA)

The success of FM [51, 54] indicates that, for recommendation tasks, employing a specially designed feature interaction structure can greatly enhance recommendation performance compared to blindly feeding all features into an MLP. Early on, NIA-GCN [48] conducted interactions between a central node and its neighbors, which showed the effectiveness of building correlations. However, much remains to be done in modeling the complex correlations underlying user/item subgraphs for meaningful information mining.

Building upon the practical physical semantics of multi-hop embeddings, several insights can be derived. For example, from the view of the root target user $u$, $\bm{e}_{u}^{(0)\top}\cdot\bm{e}_{i}^{(0)}$ infers the explicit relation between the target user and the target item, and $\bm{e}_{u}^{(0)\top}\cdot\bm{e}_{i}^{(1)}$ reflects the correlation between the target user and the users who have clicked on the target item. Furthermore, $\bm{e}_{u}^{(0)\top}\cdot\bm{e}_{i}^{(2)}$ implies the correlation between the target user and relevant items clicked by the users who have clicked on the target item. The relations starting from the view of the root target item $i$ with $\bm{e}_{i}^{(0)}$ also have their own physical semantics. Since different cross-correlations reflect user-item interactions from different angles, it is worth considering all possible cross-correlations between the target user and the target item in a flexible manner to make comprehensive recommendations. Thus, we provide two flexible ways of cross-correlation modeling: Hop-level Cross-Correlation and Element-level Cross-Correlation.

Hop-level Cross-Correlation (HCC). One simple but powerful way to consider cross-correlations is to model them with the dot product. Specifically, we denote the cross-correlated relevance between the $l_{u}$-th hop of the user graph embeddings and the $l_{i}$-th hop of the item graph embeddings as $z_{ui}^{(l_{u},l_{i})}$. The cross-correlated terms of HCC are formally defined as:

$z_{ui}^{(l_{u},l_{i})} = \bm{e}_{u}^{(l_{u})} \cdot \bm{e}_{i}^{(l_{i})},$  (10)

where $0\leq l_{u}\leq L$ and $0\leq l_{i}\leq L$. The aggregated cross-correlation of the target user $u$ and the target item $i$ is then obtained by concatenating the cross-correlations of all possible pairs as $\parallel_{l_{u}=0,l_{i}=0}^{L,L} z_{ui}^{(l_{u},l_{i})}$. After aggregating the cross-correlations, the relevance score is computed as:

$\hat{y}_{ui} = \text{MLP}\big(\parallel_{l_{u}=0,l_{i}=0}^{L,L} z_{ui}^{(l_{u},l_{i})}\big) = \text{MLP}\big(\parallel_{l_{u}=0,l_{i}=0}^{L,L} \bm{e}_{u}^{(l_{u})} \cdot \bm{e}_{i}^{(l_{i})}\big).$  (11)

In HCC, the dimension of each cross-correlated term $z_{ui}^{(l_{u},l_{i})}$ reduces to one. Note that the aggregated cross-correlation vector of HCC contains only $(L+1)^{2}$ scalars, so the MLP in Eq. (11) has considerably few parameters. This potentially avoids the risk of overfitting while condensing the information into fewer scalars.
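The following sketch computes all hop-level terms of Eq. (10) in one matrix product and feeds them through a hypothetical one-hidden-layer MLP for Eq. (11); the random initialization is only a stand-in for parameters that would be trained as in Section IV-C:

```python
import numpy as np

rng = np.random.default_rng(4)
L, d, H = 2, 64, 256
h_u = rng.normal(size=(L + 1, d))       # PGR vectors of the user subgraph
h_i = rng.normal(size=(L + 1, d))       # PGR vectors of the item subgraph

# Eq. (10): all (L+1)^2 hop-level dot products at once, then flattened.
z = (h_u @ h_i.T).reshape(-1)           # shape: ((L+1)^2,)

# Eq. (11): a randomly initialized one-hidden-layer MLP as a stand-in.
W1, b1 = rng.normal(size=(z.size, H)) * 0.01, np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)) * 0.01, np.zeros(1)
pre = z @ W1 + b1
hidden = np.maximum(pre, 0.02 * pre)    # Leaky-ReLU, negative slope 0.02
y_hat = float(hidden @ W2 + b2)         # relevance score r_GCR(u, i)
```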

Element-level Cross-Correlation (ECC). Another way to represent the cross-correlations between multi-hop user/item graph embeddings is to compute the element-wise product instead of the dot product, preserving more latent-space information. We refer to this as Element-level Cross-Correlation (ECC). The computation of the ECC cross-correlated terms can be expressed as follows:

$\bm{z}_{ui}^{(l_{u},l_{i})} = \bm{e}_{u}^{(l_{u})} \odot \bm{e}_{i}^{(l_{i})},$  (12)

where $\odot$ denotes element-wise multiplication (the Hadamard product), $0\leq l_{u}\leq L$, and $0\leq l_{i}\leq L$. After traversing all possible pairs, concatenating their element-wise products into a vector, and feeding the cross-correlated terms into Eq. (11), we arrive at:

$\hat{y}_{ui} = \text{MLP}\big(\parallel_{l_{u}=0,l_{i}=0}^{L,L} \bm{z}_{ui}^{(l_{u},l_{i})}\big) = \text{MLP}\big(\parallel_{l_{u}=0,l_{i}=0}^{L,L} \bm{e}_{u}^{(l_{u})} \odot \bm{e}_{i}^{(l_{i})}\big).$  (13)

In ECC, the dimension of each cross-correlated term $\bm{z}_{ui}^{(l_{u},l_{i})}$ equals the dimension $d$ of the user/item embeddings. Thus, the total dimension of the aggregated cross-correlation vector for ECC is $(L+1)^{2}\cdot d$.
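A corresponding sketch for ECC, which differs from the HCC example only in the cross-correlated terms: broadcasting builds all $(L+1)^{2}$ Hadamard products of Eq. (12), enlarging the MLP input to $(L+1)^{2}\cdot d$ entries; shapes and initialization are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
L, d, H = 2, 64, 256
h_u = rng.normal(size=(L + 1, d))
h_i = rng.normal(size=(L + 1, d))

# Eq. (12): all (L+1) x (L+1) element-wise products via broadcasting.
z = (h_u[:, None, :] * h_i[None, :, :]).reshape(-1)   # ((L+1)^2 * d,)

# Eq. (13): the same hypothetical one-hidden-layer MLP as in the HCC sketch.
W1, b1 = rng.normal(size=(z.size, H)) * 0.01, np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)) * 0.01, np.zeros(1)
pre = z @ W1 + b1
y_hat = float(np.maximum(pre, 0.02 * pre) @ W2 + b2)  # relevance score
```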

Definition 2 (Degrees of Freedom). In statistics, the concept of degrees of freedom refers to the number of values within a statistic that are free to vary. When estimating statistical parameters, the degrees of freedom indicate the amount of independent information used in the estimation, calculated by subtracting the number of parameters used as intermediate steps from the number of independent scores contributing to the estimate. Mathematically, degrees of freedom correspond to the number of dimensions in the domain of a random vector, i.e., the number of "free" components that must be known before the vector is fully determined.

Analysis of Flexibility. Both HCC and ECC exhibit remarkably higher degrees of freedom than GNN-based recommendation models. For instance, as shown in Eq. (5), the concatenating aggregation used in NGCF [8] amounts to summing the dot products of each hop of user/item graph embeddings and thus provides no flexibility in aggregating the cross-correlations of multi-hop user-item interactions. For LightGCN [9], as illustrated in Eq. (6), the weighted-summation aggregation of the cross-correlations is controlled by $2(L+1)$ parameters, yielding a flexibility of $2(L+1)$. In GCR, MLP structures are trained to flexibly aggregate the cross-correlations between users and items. The degrees of freedom are $(L+1)^{2}\cdot H_{n}^{H_{l}}$ for HCC and $(L+1)^{2}\cdot d\cdot H_{n}^{H_{l}}$ for ECC, where $H_{l}$ is the number of hidden layers in the MLP and $H_{n}$ is the number of hidden units per layer. Specifically, given $L=2$, $d=64$, $H_{n}=256$, and $H_{l}=1$, the degrees of freedom for NGCF, LightGCN, HCC, and ECC are 0, 6, 2304, and 147456, respectively. Moreover, even when substituting the MLP in GCR with linear regression, the degrees of freedom for HCC and ECC are 9 and 576, respectively, which are still higher than those of GNN-based models. In general, both the theoretical and quantitative analyses demonstrate that GCR has significantly higher degrees of freedom than GNN-based models.
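The quoted counts can be reproduced directly, treating the number of first-layer aggregation weights (input width times hidden width, or just the input width for linear regression) as the flexibility measure:

```python
# Reproducing the degrees-of-freedom counts quoted above (L=2, d=64,
# Hn=256, one hidden layer), measured as first-layer aggregation weights.
L, d, Hn = 2, 64, 256
print(2 * (L + 1))             # LightGCN: 6
print((L + 1) ** 2 * Hn)       # HCC with a one-hidden-layer MLP: 2304
print((L + 1) ** 2 * d * Hn)   # ECC with a one-hidden-layer MLP: 147456
print((L + 1) ** 2)            # HCC with linear regression: 9
print((L + 1) ** 2 * d)        # ECC with linear regression: 576
```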

Note that the cross-correlated aggregation applied before the MLP in Eq. (11) and Eq. (13) is crucial, as it integrates the explicit user-item interactions as well as the implicit correlations between user-item pairs. We demonstrate the necessity of CCA in our ablation study and discuss the weight parameters assigned to each cross-correlated term in the case study in the following section.

IV-C Optimization

In our framework, PGR is parameter-free, so only the parameters of the MLP in Eq. (11) and Eq. (13) need to be optimized. To do so, we employ the Bayesian Personalized Ranking (BPR) loss [4, 8, 9], which has been used extensively in recommender systems. It considers the relative order between observed and unobserved user-item interactions, assuming that observed interactions should be assigned higher prediction values than unobserved ones. The objective function is formulated as follows:

$\mathcal{L}_{BPR} = -\sum_{(u,i,j)\in\mathcal{O}} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}),$  (14)

where $\mathcal{O}=\{(u,i,j) \mid (u,i)\in\mathcal{R}^{+}, (u,j)\in\mathcal{R}^{-}\}$, $\mathcal{R}^{+}$ indicates observed user-item interactions, $\mathcal{R}^{-}$ denotes sampled negative interactions, and $\sigma(\cdot)$ is the sigmoid function.
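A minimal sketch of the BPR objective in Eq. (14); `score` stands for any relevance function, e.g., the HCC/ECC predictors sketched above, and the sample triples are hypothetical:

```python
import numpy as np

def bpr_loss(score, triples) -> float:
    """Eq. (14): each observed item i should outscore its sampled negative j."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -sum(np.log(sigmoid(score(u, i) - score(u, j)))
                for u, i, j in triples)

# Toy usage with a hypothetical scoring function and triples from O.
score = lambda u, i: 0.1 * (u + 1) * (i + 1)
loss = bpr_loss(score, [(0, 2, 1), (1, 3, 0)])
```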

Flexible optimization. The components of the proposed GCR framework are task-agnostic, making it flexible across different recommendation tasks. For the critical CTR prediction task, we can replace the objective function with the binary cross-entropy (BCE) loss [54, 29, 49]. Formally, for each labeled user-item pair $(u,i)$ in the training set $\mathcal{T}$ of CTR prediction, the BCE objective function can be expressed as:

$\mathcal{L}_{BCE} = -\sum_{(u,i)\in\mathcal{T}} \big[ y_{ui} \ln(\hat{y}_{ui}) + (1-y_{ui}) \ln(1-\hat{y}_{ui}) \big],$  (15)

where $\hat{y}_{ui}$ is the predicted CTR and $y_{ui}$ is the ground-truth clicking label. The overall algorithm flow of each stage of our proposed GCR is shown in Algorithm 1 and Algorithm 2.

Algorithm 1 Plain Graph Representation (PGR) stage of GCR
0:  The user-item interaction graph $\mathcal{G}=(\mathcal{U},\mathcal{I},\bm{R})$; the subgraph depth $L$.
0:  The plain subgraph representations $\{\bm{E}_{\mathcal{U}}^{1},\ldots,\bm{E}_{\mathcal{U}}^{L}\}$ and $\{\bm{E}_{\mathcal{I}}^{1},\ldots,\bm{E}_{\mathcal{I}}^{L}\}$.
1:  // The PGR process is parameter-free; thus, we pre-compute this part in practical applications.
2:  Assign users in $\mathcal{U}$ trainable embeddings $\bm{E}_{\mathcal{U}}$ and items in $\mathcal{I}$ trainable embeddings $\bm{E}_{\mathcal{I}}$;
3:  for each user/item node $j\in\mathcal{U}\cup\mathcal{I}$ do
4:     for $l\in 1,\ldots,L$ do
5:        Obtain $j$'s $l$-hop neighbors $\mathcal{N}_{j}^{(l)}$;
6:        Read out the representation $\bm{e}_{j}^{(l)}$ of $\mathcal{N}_{j}^{(l)}$ via Eq. (9);
7:     end for
8:     Obtain the unfolded representations $\{\bm{e}_{j}^{(1)},\ldots,\bm{e}_{j}^{(L)}\}$ and pre-store them in the database.
9:  end for
Algorithm 2 Cross-Correlated Aggregation (CCA) stage of GCR
0:  The user set $\mathcal{U}$ and item set $\mathcal{I}$; the embedding matrices $\bm{E}_{\mathcal{U}}$ and $\bm{E}_{\mathcal{I}}$; the PGR outputs $\{\bm{E}_{\mathcal{U}}^{1},\ldots,\bm{E}_{\mathcal{U}}^{L}\}$ and $\{\bm{E}_{\mathcal{I}}^{1},\ldots,\bm{E}_{\mathcal{I}}^{L}\}$.
0:  The recommendation results.
1:  Randomly initialize the trainable parameters $\theta_{CCA}$ in the CCA module.
2:  // Option 1: Hop-level Cross-Correlation (HCC).
3:  // Option 2: Element-level Cross-Correlation (ECC).
4:  if HCC then
5:     Train $\theta_{CCA}$ and the embeddings via Eq. (10) and Eq. (11), optimizing them with recommendation task-specific objective functions (e.g., the BPR loss $\mathcal{L}_{BPR}$);
6:  else
7:     Train $\theta_{CCA}$ and the embeddings via Eq. (12) and Eq. (13), optimizing them with recommendation task-specific objective functions (e.g., the BPR loss $\mathcal{L}_{BPR}$);
8:  end if
9:  Conduct inference with the trained $\theta_{CCA}$ and embeddings to obtain the recommendation results.

V Experiments

We conduct extensive experiments on three public datasets and an industrial dataset with the goal of answering the following five research questions: Q1: Does GCR achieve the best recommendation performance among all baselines? Q2: Is GCR more effective than the traditional single-vector dot product paradigm? Q3: How efficient is GCR compared to the traditional dot product paradigms? Q4: Does PGR extract more information than message-passing GNNs, and how do HCC and ECC affect the final recommendation performance? Q5: Can GCR also flexibly and effectively equip CTR prediction models?

V-A Experimental Setup

V-A1 Datasets

We evaluate our proposed GCR on three publicly available recommendation datasets: Gowalla [62], Yelp2018 [9], and Amazon-Book [63] for the commonly adopted interaction prediction task. Furthermore, an additional large-scale industrial dataset, WeiXin, is adopted to evaluate the CTR prediction task. The statistics of these datasets are shown in Table II. Detailed descriptions of the datasets are as follows:

  • Gowalla (https://snap.stanford.edu/data/loc-Gowalla.html) [62] is a check-in dataset provided by Gowalla with 107,092 users, 1,280,969 items, and 6,442,892 check-in records, where users share their locations by checking in. In this dataset, the recorded locations are treated as items. We use this dataset to evaluate interaction prediction.

  • Yelp2018 (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset) [9] is derived from the 2018 edition of the Yelp challenge, which focuses on local businesses such as restaurants and bars. It contains 31,668 users, 38,048 items, and 1,561,406 review interactions. We use this dataset to evaluate interaction prediction.

  • Amazon-Book (http://jmcauley.ucsd.edu/data/amazon) [63] is a subset of the Amazon review dataset, specific to the book domain. It contains 52,463 users (consumers) and 91,599 items (books), with 2,984,108 review interactions. We use this dataset to evaluate both interaction prediction and CTR prediction, since it is widely adopted in both tasks.

  • WeiXin is an industrial dataset compiled from 7.3 million anonymous records of video playback logs on the Channels platform of Weixin, containing about 20,000 users and 100,000 videos. We use this dataset to evaluate CTR prediction. If a user has watched more than 90% of a given video, we treat the record as a positive sample; otherwise, it is treated as a negative sample.

TABLE II: Statistics of the experimental datasets.
Dataset # Users # Items # Interactions Density Domain
Gowalla 107,092 1,280,969 6,442,892 0.0047% Check-in
Yelp2018 31,668 38,048 1,561,406 0.1296% Business
Amazon-Book 52,463 91,599 2,984,108 0.0620% E-commerce
WeiXin ~20,000 ~100,000 ~7,300,000 0.3650% Short Video

V-A2 Comparing Methods

We evaluate the widely adopted interaction prediction task as the main evaluation and compare GCR with ten representative recommendation models: GRMF [6] is a variant of traditional matrix factorization [3] that adds a graph Laplacian regularizer to encourage connected nodes to have similar embeddings. MetaPath2Vec [35] formalizes metapath-based random walks on bipartite graphs as a corpus and then leverages skip-gram [34] models to compute node embeddings. NGCF [8] is a representative GNN-based model that introduces a graph collaborative filtering algorithm to model high-order connectivity information in the embedding function. LightGCN [9] linearly propagates user/item information on the user-item interaction graph, simplifying NGCF by eliminating non-linearities, and uses a weighted sum to aggregate the layer-wise embeddings into the final embeddings. UltraGCN [42] is an efficient graph-based recommendation model with multi-task auxiliary losses. SGL [44] is a self-supervised method that utilizes node dropout, edge dropout, and random walk augmentations on the user-item bipartite graph; it generates two augmented graphs with the same type of augmentation operator and employs a shared LightGCN encoder to learn user/item embeddings. GTN [64] is a GNN-based recommendation model with graph trend filtering for accurate and robust recommendations. NCL [45] is a state-of-the-art approach that employs a prototypical contrastive objective to capture user/item correlations with prototypes representing semantic neighbors. SimGCL [46] introduces random uniform noise into hidden representations as an augmentation technique for graph-based recommenders. CGCL [47] enhances graph contrastive learning with candidate views.

To further verify the flexibility of GCR in enhancing different recommendation tasks, we also conduct experiments that equip CTR prediction models with GCR. Popular CTR prediction models were introduced in the related work; the CTR prediction baselines used in our experiments are as follows: DIN [29] is a representative model that uses an attention-based deep network to extract user interest from historical user behaviors. DIEN [49] is an enhanced version of DIN that further equips the model with GRUs to model the evolution of user interest. UBR4CTR [55] and SIM [56] are representative CTR prediction methods that model the user's life-long behavior sequence with a search-based paradigm, retrieving relevant items and computing their attention scores with the target; for SIM, we use its hard-search variant in our experiments. GMT [58] is a state-of-the-art graph-based CTR prediction model with neighborhood sampling and a graph transformer architecture. DCIN [57] is a novel interest modeling approach for CTR prediction that incorporates decision-making contexts.

TABLE III: Overall recommendation performance of GCR compared with state-of-the-art baseline models on three experimental datasets. The best and second-best results are highlighted in bold font and underlined. * indicates the statistical significance of improvement over the best-performing baseline with $p<0.05$.
Datasets Gowalla Yelp2018 Amazon-Book
Methods Precision Recall NDCG Precision Recall NDCG Precision Recall NDCG
GRMF [6] 0.0329 0.1158 0.0926 0.0236 0.0502 0.0409 0.0173 0.0357 0.0307
MetaPath2Vec [35] 0.0402 0.1415 0.1062 0.0227 0.0458 0.0411 0.0358 0.0740 0.0683
NGCF [8] 0.0497 0.1570 0.1365 0.0322 0.0673 0.0569 0.0411 0.0851 0.0705
LightGCN [9] 0.0537 0.1834 0.1422 0.0421 0.0902 0.0786 0.0459 0.0941 0.0845
UltraGCN [42] 0.0543 0.1842 0.1431 0.0436 0.0923 0.0823 0.0458 0.0934 0.0839
SGL [44] 0.0554 0.1848 0.1467 0.0460 0.0951 0.0846 0.0468 0.0956 0.0875
GTN [64] 0.0564 0.1904 0.1515 0.0429 0.0908 0.0785 0.0461 0.0975 0.0857
NCL [45] 0.0568 0.1920 0.1523 0.0457 0.0944 0.0825 0.0471 0.0970 0.0871
SimGCL [46] 0.0579 0.1932 0.1543 0.0463 0.0968 0.0858 0.0474 0.0973 0.0878
CGCL [47] 0.0563 0.1893 0.1504 0.0460 0.0961 0.0847 0.0463 0.0962 0.0861
GCR (ours) 0.0604 0.1982 0.1588 0.0476 0.0991 0.0867 0.0485 0.0992 0.0891
Improvement (%) 4.32%* 2.59%* 2.92%* 2.81%* 2.38%* 1.05%* 2.32%* 1.74%* 1.48%*

V-A3 Evaluation Details

For the main results, we randomly split the user-item interaction records of each dataset into three distinct subsets: an embedding pre-training set, a model training (validation) set, and a test set, which respectively account for 65%, 15%, and 20% of the entire dataset. We use LightGCN [9] to produce pre-trained embeddings of users and items, which are then fed into the models as raw features for model training. For the CTR prediction results, assuming each user has $T$ records, we use records $[1, T-2]$ for model training, the $(T-1)$-th record for validation, and the $T$-th record for CTR prediction testing [55, 59].

For evaluation metrics, we adopt the popular all-ranking evaluation protocol, which has been widely used in recent studies [8, 9]. For each user in the test set, all non-interacted items are treated as negative items. Specifically, we rank all items in the dataset except for the interacted items used in training, and then truncate the ranked list at 20 to calculate the Precision@20, Recall@20, and NDCG@20 metrics, following previous works [8, 9]. We also analyze the CTR prediction performance of GCR in subsection V-F, using the widely adopted AUC and RelaImpr [29] metrics.
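For concreteness, a sketch of how Recall@20 and NDCG@20 can be computed under the all-ranking protocol; the score vector and positive set are illustrative, and the IDCG normalization follows the common convention of an ideal ranking over $\min(|\text{positives}|, K)$ items:

```python
import numpy as np

def recall_ndcg_at_k(scores: np.ndarray, positives: set, k: int = 20):
    """All-ranking metrics: rank every candidate item, truncate at k."""
    topk = np.argsort(-scores)[:k]              # top-k highest-scored items
    hits = [1.0 if item in positives else 0.0 for item in topk]
    recall = sum(hits) / max(len(positives), 1)
    dcg = sum(h / np.log2(r + 2) for r, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(len(positives), k)))
    return recall, (dcg / idcg if idcg > 0 else 0.0)

# Toy usage: 100 candidate items, 3 held-out positives.
rng = np.random.default_rng(6)
rec, ndcg = recall_ndcg_at_k(rng.normal(size=100), {3, 17, 42})
```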

V-A4 Implementation Details

The embedding size is fixed to 64 for all models, and all embedding methods are implemented with their official code. For our GCR, the PGR module restricts the number of subgraph layers to 3. For training GCR, we use the Adam optimizer [65] with the learning rate searched from {0.01, 0.005, 0.001, 0.0005, 0.0001} and a batch size of 512. The coefficient of $\ell_2$ regularization is set to $1\times 10^{-5}$, and the dropout ratio is set to 0.7. For non-linear layers, we use the Xavier distribution [66] to initialize the model parameters and apply batch normalization [67] with a momentum of 0.1 to normalize the inputs before each non-linear layer. The activation function in the MLP is Leaky-ReLU with a negative slope of 0.02. The size of the hidden layers is set to 256, and the number of hidden layers in the MLP is set to 1 by default. Moreover, we apply an early stopping strategy by observing the AUC scores on the evaluation set. By default, GCR denotes PGR-ECC, which flexibly considers all cross-correlation terms. Note that we ran all experiments ten times with different random seeds and report the average results.

V-B Overall Recommendation Results (Q1)

In this subsection, we compare our proposed GCR with the ten representative recommendation baselines introduced above. The overall comparison results are presented in Table III, where the improvement is calculated by comparing GCR with the best-performing baseline (underlined). From these results, we make the following observations:

  • First, GCR statistically outperforms the best baseline models on all datasets with respect to all metrics. For instance, GCR improves Precision by 4.32%, 2.81%, and 2.32% on the three experimental datasets, respectively. Overall, GCR brings average gains of 3.15%, 2.24%, and 1.82% on the Precision, Recall, and NDCG metrics across the three datasets.

  • Furthermore, SimGCL generally obtains the best performance among the baselines. NGCF, LightGCN, UltraGCN, SGL, GTN, NCL, and SimGCL consistently outperform GRMF and MetaPath2Vec, suggesting that considering and aggregating graph information helps to provide better recommendations. GCR provides a more flexible way to aggregate graph information and thus achieves even better performance.

TABLE IV: Comparison of different relevance functions over three embeddings and three experimental datasets. * indicates the statistical significance of improvement over the best-performing base relevance function with $p<0.05$.
Gowalla Yelp2018 Amazon-Book
Embeddings Relevance Function Precision Recall NDCG Precision Recall NDCG Precision Recall NDCG
GRMF Dot Product 0.0329 0.1158 0.0926 0.0236 0.0502 0.0409 0.0173 0.0357 0.0307
MLP 0.0281 0.0970 0.0756 0.0236 0.0512 0.0415 0.0158 0.0334 0.0277
Super-Vector 0.0331 0.1165 0.0935 0.0255 0.0537 0.0437 0.0190 0.0388 0.0335
GCR 0.0354 0.1227 0.0974 0.0264 0.0564 0.0453 0.0215 0.0436 0.0366
Improvement (%) 6.95%* 5.32%* 4.17%* 3.53%* 5.03%* 3.66%* 13.16%* 12.37%* 9.25%*
MetaPath2Vec Dot Product 0.0402 0.1415 0.1062 0.0286 0.0458 0.0411 0.0358 0.0740 0.0683
MLP 0.0331 0.1065 0.0841 0.0255 0.0610 0.0494 0.0250 0.0503 0.0436
Super-Vector 0.0473 0.1624 0.1240 0.0391 0.0845 0.0712 0.0448 0.0926 0.0838
GCR 0.0534 0.1802 0.1445 0.0433 0.0942 0.0811 0.0505 0.1012 0.0928
Improvement (%) 12.90%* 10.96%* 16.53%* 10.74%* 11.48%* 13.90%* 12.72%* 9.29%* 10.74%*
LightGCN Dot Product 0.0564 0.1904 0.1515 0.0421 0.0902 0.0786 0.0459 0.0941 0.0845
MLP 0.0322 0.1065 0.0827 0.0297 0.0614 0.0528 0.0226 0.0452 0.0391
Super-Vector 0.0539 0.1825 0.1437 0.0411 0.0877 0.0757 0.0452 0.0921 0.0827
GCR 0.0604 0.1982 0.1588 0.0476 0.0991 0.0867 0.0485 0.0992 0.0891
Improvement (%) 7.09%* 4.10%* 4.82%* 13.06%* 9.87%* 10.31%* 5.66%* 5.42%* 5.44%*

V-C Effectiveness Study of GCR (Q2)

In this subsection, we compare the recommendation performance of GCR against the dot product, the MLP, and the super-vector-based dot product when fed with the same embeddings. The compared relevance functions are specified as follows:

  • Dot product [3, 4]: This is the most common relevance metric given by Eq.(2).

  • MLP (Multi-layer perceptron) [1, 68]: The MLP is used in NCF [1] and common CTR recommendation models [54], and has also become a widely used relevance function.

  • Super-Vector (Super-vector-based dot product) [8, 9]: We use LightGCN to calculate the super vectors for each user and item.

The effectiveness analysis is shown in Table IV, which compares the relevance functions over three different embeddings on three experimental datasets. The improvement is calculated by comparing GCR with the best-performing alternative relevance function. From these results, we derive the following observations.

  • Simply applying an MLP as the relevance function cannot amplify the expressive power of the learned embeddings. As stated in [61], although the MLP is theoretically universal enough to approximate the dot product, it is difficult to achieve this in practice, especially when the embeddings are optimized for the dot product.

  • Super-vector-based dot products can enhance the expressive power of GRMF and MetaPath2Vec embeddings. But for LightGCN, super-vector-based dot products lead to relatively worse recommendation performance. One possible reason for this degradation is that LightGCN already uses graph convolution layers to encode the graph information explicitly. Further applying graph convolution to graph-based embeddings encounters the problems of high-order GNNs, e.g., over-smoothing [69]. The over-smoothed embeddings are indistinguishable and negatively affect user modeling and recommendation performance [70].

  • GCR statistically outperforms the other three counterparts on all three datasets with all three embeddings. In particular, with MetaPath2Vec embeddings, GCR improves over the strongest baselines w.r.t. the NDCG@20 metric by 16.53%, 13.90%, and 10.74% on Gowalla, Yelp2018, and Amazon-Book, respectively. Furthermore, GCR outperforms the other relevance functions with average gains of 9.53%, 8.20%, and 8.76% on Precision, Recall, and NDCG over the three types of embeddings on the three recommendation datasets. This illustrates that GCR is able to alleviate the over-smoothing problem and further improve the recommendation performance of GRMF, MetaPath2Vec, and LightGCN.

Figure 3: Comparison of inference times on the three experimental datasets.

V-D Efficiency Study of GCR (Q3)

Given the necessity of real-time recommendation, model efficiency is, besides effectiveness, an important factor to consider [59]. For the efficiency analysis, we report the average inference time of each compared method in Figure 3.

From these results, we make the following findings. First, GCR incurs only a negligible additional time cost over the dot product and the MLP, which is acceptable given the substantial performance improvement it achieves. Second, GCR is much more efficient than the super-vector-based dot product because PGR is a parameter-free and parallelizable process. This experiment demonstrates that although GCR extracts and exploits the cross-correlations between multi-hop user/item subgraph embeddings, it strikes a good balance between time consumption and performance.
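
As a rough illustration of why PGR adds little inference cost, the per-hop representations can be precomputed once by repeated sparse propagation and then cached. The snippet below is a schematic sketch under this assumption; A_norm (a normalized user-item adjacency matrix) and E (the embedding table) are placeholder names rather than identifiers from our implementation.

    import torch

    def precompute_pgr(A_norm, E, num_hops=2):
        # PGR keeps one representation per hop instead of collapsing them into
        # a single super vector. Since no learnable parameters are involved,
        # these matrices can be computed once, offline and in parallel, so that
        # online inference only reads cached per-hop vectors.
        hops = [E]
        for _ in range(num_hops):
            hops.append(torch.sparse.mm(A_norm, hops[-1]))
        return hops  # one matrix per hop: hop 0 (the node itself) to hop K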

V-E Ablation Study (Q4)

In this ablation study, we examine the influence of our two core components, PGR and CCA. Specifically, we compare the recommendation performance of the following six combinations (a minimal sketch of the HCC/ECC aggregation follows the list):

  • GNN (Super-Vector): This variant uses GNN to compute the graph representation super vector and uses the dot product for interaction prediction.

  • GNN-HCC: This variant uses GNN for graph representation and HCC for interaction prediction.

  • GNN-ECC: This variant uses GNN for graph representation and ECC for interaction prediction.

  • PGR-MLP: This variant uses PGR for graph representation and then feeds the concatenated graph representation into a three-layer MLP for interaction prediction.

  • PGR-HCC: This variant uses PGR for graph representation and HCC for interaction prediction.

  • GCR (PGR-ECC): This variant uses PGR for graph representation and ECC for interaction prediction.
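
For reference, the sketch below contrasts the two aggregation variants on per-hop user/item vectors, in the MLP-free linear form that we also visualize in Figure 4 (the full models feed the cross-correlated terms into an MLP). The exact parameterization shown here is an assumption for illustration.

    import torch
    import torch.nn as nn

    class HCC(nn.Module):
        # Layer-wise variant: one scalar weight per cross-correlated term
        # z^(ij), built from hop i of the user and hop j of the item.
        def __init__(self, num_hops):
            super().__init__()
            self.w = nn.Parameter(torch.ones((num_hops + 1) ** 2))

        def forward(self, u_hops, v_hops):
            terms = [(u * v).sum(-1) for u in u_hops for v in v_hops]
            return torch.stack(terms, dim=-1) @ self.w

    class ECC(nn.Module):
        # Element-wise variant: one weight vector per term, of the embedding
        # size (e.g., the 64 weight parameters per term shown in Figure 4).
        def __init__(self, num_hops, dim):
            super().__init__()
            self.w = nn.Parameter(torch.ones((num_hops + 1) ** 2, dim))

        def forward(self, u_hops, v_hops):
            # terms: (batch, num_terms, dim), one element-wise product per pair
            terms = torch.stack(
                [u * v for u in u_hops for v in v_hops], dim=-2)
            return (terms * self.w).sum(dim=(-1, -2))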

TABLE V: Comparison of different GCR variants with MetaPath2Vec embeddings (NDCG@20).
Variant              Gowalla   Yelp2018   Amazon-Book
GNN (Super-Vector)   0.1240    0.0712     0.0838
GNN-HCC              0.1361    0.0738     0.0853
GNN-ECC              0.1371    0.0742     0.0860
PGR-MLP              0.0971    0.0541     0.0479
PGR-HCC              0.1438    0.0791     0.0926
GCR (PGR-ECC)        0.1445    0.0811     0.0928
Figure 4: Visualization of the weights of the cross-correlation terms. For HCC, each term has a single weight parameter (red lines), whereas for ECC each term has 64 element-wise weight parameters (blue lines).

Table V reports the NDCG@20 of the above six variants on Gowalla, Yelp2018, and Amazon-Book. Without loss of generality, we conduct this experiment with MetaPath2Vec embeddings. From the table, we draw the following conclusions:

  • PGR-based variants consistently perform better than GNN-based variants. This demonstrates that PGR is more informative than GNN-based encoding for recommendation tasks: the plain representation style extracts the information of each hop of neighbors better than recursive message-passing GNNs, and therefore works better with CCA to extract the implicit relations behind user and item subgraphs.

  • GNN-HCC/ECC consistently outperform GNN (Super-Vector) on all three datasets. This indicates that representing users/items as super vectors cannot fully exploit the expressive power of the embeddings; by flexibly considering all possible cross-correlations with CCA, GNN-HCC/ECC outperform GNN (Super-Vector) by significant margins.

  • PGR-MLP performs the worst on all three datasets. Without explicitly extracting the cross-correlations with CCA, the MLP cannot model the complex interactions between user and item subgraphs. This observation substantiates the importance of cross-correlated aggregation.

Case Study. We further explore the mechanism of HCC/ECC in GCR through visualization. Specifically, in Figure 4, we visualize the weights of linear HCC/ECC obtained by setting the number of hidden MLP layers to zero. By visualizing the weight associated with each cross-correlated term, the figure yields three main observations: 1) the necessity of cross-correlated aggregation; 2) the consistency between HCC and ECC; 3) the characteristics of the weights associated with each cross-correlated term.

  • The necessity of cross-correlated aggregation. The individual subplots in Figure 4 show that the CCA models, both HCC and ECC, learn different weights for different cross-correlated terms. This verifies that assigning a distinct weight to each cross-correlated term helps to reduce the personalized ranking loss and thus to make more accurate recommendations.

  • The consistency between HCC and ECC. One concern about cross-correlated aggregation is whether the element-wise ECC assigns weight parameters similar to those of the layer-wise HCC model. As shown in the figure, after normalization, the weight parameters of HCC and ECC exhibit similar trends on all three datasets with all three embeddings, confirming their consistency.

  • The characteristics of the embeddings. In Figure 4, each row of subplots visualizes the weights for a particular embedding, from GRMF to MetaPath2Vec to LightGCN. Comparing the weight parameters across embeddings, the tendency of a particular embedding is similar across datasets. For GRMF, the weight of z^{(01)} is significantly larger than those of the other cross-correlation terms. For MetaPath2Vec, the weights of the cross-correlation terms are much more uniform than for the other embeddings, with the weight of z^{(02)} slightly higher than the others. For LightGCN, the state-of-the-art recommendation method, z^{(00)} and z^{(02)} have higher importance than the other cross-correlation terms.

V-F CTR Prediction Performance (Q5)

The Click-Through Rate (CTR) prediction task is to predict the probability that a target user will click on a target item, and it constitutes an important stage of industrial recommender systems. In this subsection, we therefore verify the flexibility and effectiveness of GCR in equipping CTR models for the CTR prediction task.

We test whether GCR can improve CTR prediction by running experiments on user-item click data. Specifically, we conduct CTR prediction experiments on the widely used Amazon-Book recommendation dataset and the WeiXin industrial dataset; the details of these datasets are given in Table II. We include six representative CTR prediction models as baselines: DIN [29], DIEN [49], UBR4CTR [55], SIM [56], GMT [58], and DCIN [57]. Further, we equip DIN with our GCR as the model "DIN+GCR", which enhances DIN with the modeling of high-order information (1-hop and 2-hop neighbors obtained via DIN-based neighborhood sampling) and with the graph cross-correlation terms of our GCR model.

TABLE VI: CTR prediction results on the Amazon-Book and WeiXin datasets. * indicates a statistically significant improvement over the best-performing baseline with p < 0.05.
Model            Amazon-Book          WeiXin
                 AUC      RelaImpr    AUC      RelaImpr
DIN [29]         0.8106   0.00%       0.8071   0.00%
DIEN [49]        0.8218   3.61%       0.8104   1.07%
UBR4CTR [55]     0.8002   -3.35%      0.7994   -2.51%
SIM [56]         0.8304   6.37%       0.8151   2.61%
GMT [58]         0.8445   10.91%      0.8175   3.39%
DCIN [57]        0.8372   8.56%       0.8162   2.96%
DIN+GCR          0.8681   18.51%      0.8277   6.71%
Improvement (%)  2.79%*               1.25%*

Table VI presents the AUC and the corresponding RelaImpr value for each CTR prediction model. "DIN+GCR" outperforms all baselines, confirming the effectiveness and flexibility of GCR on the CTR prediction task. Compared with the best baseline, GMT, "DIN+GCR" achieves relative improvements of 2.79% on Amazon-Book and 1.25% on WeiXin. More importantly, "DIN+GCR" improves DIN by extracting graph information with PGR and constructing cross-correlation interactions with ECC, bringing RelaImpr gains of 18.51% on Amazon-Book and 6.71% on WeiXin. These results also demonstrate the effectiveness of GCR in real-world industrial scenarios; it can thus be flexibly utilized to enhance existing CTR prediction models.
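
The RelaImpr values in Table VI follow the standard definition from the CTR literature [49], which measures relative improvement over the base model after discounting the 0.5 AUC achievable by random guessing. The following quick check, not our evaluation code, reproduces two table entries:

    def relaimpr(auc, base_auc):
        # Relative improvement over the base model, discounting the 0.5 AUC
        # that a random predictor achieves.
        return ((auc - 0.5) / (base_auc - 0.5) - 1) * 100

    # Reproducing Table VI with DIN as the base model:
    print(f"{relaimpr(0.8681, 0.8106):.2f}%")  # DIN+GCR, Amazon-Book -> 18.51%
    print(f"{relaimpr(0.8277, 0.8071):.2f}%")  # DIN+GCR, WeiXin -> 6.71%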

VI Conclusion

In this paper, we study the problem of recommendation with graph neural networks (GNNs). Existing GNNs for recommendation encode the user-item subgraph into a single representation vector and conduct inference with a dot-product-based operator; in this manner, the semantic meaning of each hop of the neighborhood is severely weakened. To better exploit this semantic information and make full use of the correlations between the information of central nodes and their neighborhoods, we propose the graph cross-correlated network for recommendation (GCR), an effective framework that flexibly models the cross-interactions between the different-hop neighbors of users and items. We detail the architecture of GCR with its plain graph representation (PGR) and cross-correlated aggregation (CCA) components, and we theoretically show that GCR offers more flexibility than existing recommender models. Extensive experiments on three benchmark recommendation datasets and an industrial dataset demonstrate the effectiveness of GCR, and further in-depth analyses verify its effectiveness, efficiency, and flexibility. Future work includes studying more powerful graph representation methods in the unfolding plain manner and exploring the potential of GCR for explainable recommendation.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. U22A2095, 62032020, 62272200) and the Innovation and Technology Commission of Hong Kong under Innovation and Technology Fund - Mainland-Hong Kong Joint Funding Scheme (MHP/012/21).

References

  • [1] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th international conference on world wide web, 2017.
  • [2] S. Xu, Y. Ge, Y. Li, Z. Fu, X. Chen, and Y. Zhang, “Causal collaborative filtering,” in Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, 2023.
  • [3] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, 2009.
  • [4] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “Bpr: Bayesian personalized ranking from implicit feedback,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009.
  • [5] Y. Tay, L. Anh Tuan, and S. C. Hui, “Latent relational metric learning via memory-based attention for collaborative ranking,” in Proceedings of the 2018 world wide web conference, 2018.
  • [6] N. Rao, H.-F. Yu, P. K. Ravikumar, and I. S. Dhillon, “Collaborative filtering with graph information: Consistency and scalable methods,” in Advances in Neural Information Processing Systems, vol. 28, 2015.
  • [7] M. Yang, M. Zhou, J. Liu, D. Lian, and I. King, “Hrcf: Enhancing collaborative filtering via hyperbolic geometric regularization,” in Proceedings of the ACM Web Conference 2022, 2022.
  • [8] X. Wang, X. He, M. Wang, F. Feng, and T.-S. Chua, “Neural graph collaborative filtering,” in Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval, 2019.
  • [9] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, “Lightgcn: Simplifying and powering graph convolution network for recommendation,” in Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020.
  • [10] S. Wu, F. Sun, W. Zhang, X. Xie, and B. Cui, “Graph neural networks in recommender systems: a survey,” ACM Computing Surveys, vol. 55, no. 5, 2022.
  • [11] C. Gao, Y. Zheng, N. Li, Y. Li, Y. Qin, J. Piao, Y. Quan, J. Chang, D. Jin, X. He et al., “A survey of graph neural networks for recommender systems: Challenges, methods, and directions,” ACM Transactions on Recommender Systems, vol. 1, no. 1, 2023.
  • [12] K. Sharma, Y.-C. Lee, S. Nambi, A. Salian, S. Shah, S.-W. Kim, and S. Kumar, “A survey of graph neural networks for social recommender systems,” ACM Computing Surveys, vol. 56, no. 10, pp. 1–34, 2024.
  • [13] J. Cao, X. Lin, S. Guo, L. Liu, T. Liu, and B. Wang, “Bipartite graph embedding via mutual information maximization,” in Proceedings of the 14th ACM international conference on web search and data mining, 2021.
  • [14] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017.
  • [15] Y. Zhang, Y. Bei, H. Chen, Q. Shen, Z. Yuan, H. Gong, S. Wang, F. Huang, and X. Huang, “Multi-behavior collaborative filtering with partial order graph convolutional networks,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 6257–6268.
  • [16] Z. Hong, Z. Yuan, Q. Zhang, H. Chen, J. Dong, F. Huang, and X. Huang, “Next-generation database interfaces: A survey of llm-based text-to-sql,” arXiv preprint arXiv:2406.08426, 2024.
  • [17] J. Chen, H. Dong, Y. Qiu, X. He, X. Xin, L. Chen, G. Lin, and K. Yang, “Autodebias: Learning to debias for recommendation,” in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 21–30.
  • [18] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua, “Attentive collaborative filtering: Multimedia recommendation with item-and component-level attention,” in Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, 2017.
  • [19] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for recommender systems,” in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 2015.
  • [20] X. Xin, X. He, Y. Zhang, Y. Zhang, and J. Jose, “Relational collaborative filtering: Modeling multiple item relations for recommendation,” in Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, 2019.
  • [21] L.-b. Ning, S. Wang, W. Fan, Q. Li, X. Xu, H. Chen, and F. Huang, “Cheatagent: Attacking llm-empowered recommender systems via llm agent,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 2284–2295.
  • [22] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, “Aspect-aware latent factor model: Rating prediction with ratings and reviews,” in Proceedings of the 2018 world wide web conference, 2018, pp. 639–648.
  • [23] F. Huang, Z. Yang, J. Jiang, Y. Bei, Y. Zhang, and H. Chen, “Large language model interaction simulator for cold-start item recommendation,” arXiv preprint arXiv:2402.09176, 2024.
  • [24] X. Wang, X. He, Y. Cao, M. Liu, and T.-S. Chua, “Kgat: Knowledge graph attention network for recommendation,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019.
  • [25] Q. Zhang, J. Dong, H. Chen, X. Huang, D. Zha, and Z. Yu, “Knowgpt: Black-box knowledge injection for large language models,” arXiv preprint arXiv:2312.06185, 2023.
  • [26] J. Dong, Q. Zhang, C. Zhou, H. Chen, D. Zha, and X. Huang, “Cost-efficient knowledge-based question answering with large language models,” arXiv preprint arXiv:2405.17337, 2024.
  • [27] Z. Hong, Z. Yuan, H. Chen, Q. Zhang, F. Huang, and X. Huang, “Knowledge-to-sql: Enhancing sql generation with data expert llm,” arXiv preprint arXiv:2402.11517, 2024.
  • [28] C.-K. Hsieh, L. Yang, Y. Cui, T.-Y. Lin, S. Belongie, and D. Estrin, “Collaborative metric learning,” in Proceedings of the 26th international conference on world wide web, 2017.
  • [29] G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” in Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 2018.
  • [30] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Large-scale information network embedding,” in Proceedings of the 24th international conference on world wide web, 2015.
  • [31] X. Huang, J. Li, and X. Hu, “Label informed attributed network embedding,” in Proceedings of the tenth ACM international conference on web search and data mining, 2017.
  • [32] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014.
  • [33] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 2016.
  • [34] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
  • [35] Y. Dong, N. V. Chawla, and A. Swami, “metapath2vec: Scalable representation learning for heterogeneous networks,” in Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 2017.
  • [36] H. Chen, Y. Xu, F. Huang, Z. Deng, W. Huang, S. Wang, P. He, and Z. Li, “Label-aware graph convolutional networks,” in Proceedings of the 29th ACM international conference on information & knowledge management, 2020.
  • [37] H. Chen, W. Huang, Y. Xu, F. Sun, and Z. Li, “Graph unfolding networks,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020.
  • [38] Y. Bei, S. Zhou, Q. Tan, H. Xu, H. Chen, Z. Li, and J. Bu, “Reinforcement neighborhood selection for unsupervised graph anomaly detection,” in 2023 IEEE International Conference on Data Mining (ICDM).   IEEE, 2023, pp. 11–20.
  • [39] J. Chen, H. Dong, X. Wang, F. Feng, M. Wang, and X. He, “Bias and debias in recommender system: A survey and future directions,” ACM Transactions on Information Systems, vol. 41, no. 3, pp. 1–39, 2023.
  • [40] Z. Zhao, J. Chen, S. Zhou, X. He, X. Cao, F. Zhang, and W. Wu, “Popularity bias is not always evil: Disentangling benign and harmful bias for recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 10, pp. 9920–9931, 2022.
  • [41] W. Chen, Y. Bei, Q. Shen, H. Chen, X. Huang, and F. Huang, “Feedback reciprocal graph collaborative filtering,” in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 4397–4405.
  • [42] K. Mao, J. Zhu, X. Xiao, B. Lu, Z. Wang, and X. He, “Ultragcn: ultra simplification of graph convolutional networks for recommendation,” in Proceedings of the 30th ACM international conference on information & knowledge management, 2021, pp. 1253–1262.
  • [43] J. Yu, H. Yin, X. Xia, T. Chen, J. Li, and Z. Huang, “Self-supervised learning for recommender systems: A survey,” IEEE Transactions on Knowledge and Data Engineering, 2023.
  • [44] J. Wu, X. Wang, F. Feng, X. He, L. Chen, J. Lian, and X. Xie, “Self-supervised graph learning for recommendation,” in Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, 2021.
  • [45] Z. Lin, C. Tian, Y. Hou, and W. X. Zhao, “Improving graph collaborative filtering with neighborhood-enriched contrastive learning,” in Proceedings of the ACM web conference 2022, 2022, pp. 2320–2329.
  • [46] J. Yu, H. Yin, X. Xia, T. Chen, L. Cui, and Q. V. H. Nguyen, “Are graph augmentations necessary? simple graph contrastive learning for recommendation,” in Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, 2022, pp. 1294–1303.
  • [47] W. He, G. Sun, J. Lu, and X. S. Fang, “Candidate-aware graph contrastive learning for recommendation,” in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023, pp. 1670–1679.
  • [48] J. Sun, Y. Zhang, W. Guo, H. Guo, R. Tang, X. He, C. Ma, and M. Coates, “Neighbor interaction aware graph convolution networks for recommendation,” in Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, 2020.
  • [49] G. Zhou, N. Mou, Y. Fan, Q. Pi, W. Bian, C. Zhou, X. Zhu, and K. Gai, “Deep interest evolution network for click-through rate prediction,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019.
  • [50] H. Chen, Y. Bei, Q. Shen, Y. Xu, S. Zhou, W. Huang, F. Huang, S. Wang, and X. Huang, “Macro graph neural networks for online billion-scale recommender systems,” in Proceedings of the ACM on Web Conference 2024, 2024, pp. 3598–3608.
  • [51] S. Rendle, “Factorization machines,” in 2010 IEEE International conference on data mining.   IEEE, 2010.
  • [52] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al., “Wide & deep learning for recommender systems,” in Proceedings of the 1st workshop on deep learning for recommender systems, 2016.
  • [53] J. Xiao, H. Ye, X. He, H. Zhang, F. Wu, and T.-S. Chua, “Attentional factorization machines: learning the weight of feature interactions via attention networks,” in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017.
  • [54] H. Guo, R. Tang, Y. Ye, Z. Li, and X. He, “Deepfm: a factorization-machine based neural network for ctr prediction,” in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017.
  • [55] J. Qin, W. Zhang, X. Wu, J. Jin, Y. Fang, and Y. Yu, “User behavior retrieval for click-through rate prediction,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 2347–2356.
  • [56] Q. Pi, G. Zhou, Y. Zhang, Z. Wang, L. Ren, Y. Fan, X. Zhu, and K. Gai, “Search-based user interest modeling with lifelong sequential behavior data for click-through rate prediction,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020.
  • [57] X. Li, S. Chen, J. Dong, J. Zhang, Y. Wang, X. Wang, and D. Wang, “Decision-making context interaction network for click-through rate prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 4, 2023, pp. 5195–5202.
  • [58] E. Min, Y. Rong, T. Xu, Y. Bian, D. Luo, K. Lin, J. Huang, S. Ananiadou, and P. Zhao, “Neighbour interaction based click-through rate prediction via graph-masked transformer,” in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 353–362.
  • [59] Y. Bei, H. Chen, S. Chen, X. Huang, S. Zhou, and F. Huang, “Non-recursive cluster-scale graph interacted model for click-through rate prediction,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 3748–3752.
  • [60] B. C. Csáji et al., “Approximation with artificial neural networks,” Faculty of Sciences, Eötvös Loránd University, Hungary, vol. 24, 2001.
  • [61] S. Rendle, W. Krichene, L. Zhang, and J. Anderson, “Neural collaborative filtering vs. matrix factorization revisited,” in Proceedings of the 14th ACM Conference on Recommender Systems, 2020.
  • [62] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011.
  • [63] J. McAuley, C. Targett, Q. Shi, and A. Van Den Hengel, “Image-based recommendations on styles and substitutes,” in Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, 2015.
  • [64] W. Fan, X. Liu, W. Jin, X. Zhao, J. Tang, and Q. Li, “Graph trend filtering networks for recommendation,” in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022.
  • [65] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.
  • [66] D. Jakovetić, J. Xavier, and J. M. Moura, “Fast distributed gradient methods,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1131–1146, 2014.
  • [67] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning.   PMLR, 2015.
  • [68] S. Fan, J. Zhu, X. Han, C. Shi, L. Hu, B. Ma, and Y. Li, “Metapath-guided heterogeneous graph neural network for intent recommendation,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019.
  • [69] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, “Measuring and relieving the over-smoothing problem for graph neural networks from the topological view,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 04, 2020.
  • [70] Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in Proceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018.