
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems

Matthew Kolodner, Mingxuan Ju, Zihao Fan, Tong Zhao, Elham Ghazizadeh, Yan Wu, Neil Shah, Yozen Liu. Snap Inc., 2772 Donald Douglas Loop N, Santa Monica, CA, USA 90405. mkolodner, mju, zfan3, tong, eghazizadeh, ywu, nshah, [email protected]
(2024)
Abstract.

Improving recommendation systems (RS) can greatly enhance the user experience across many domains, such as social media. Many RS utilize embedding-based retrieval (EBR) approaches to retrieve candidates for recommendation, and in an EBR system, embedding quality is key. According to recent literature, self-supervised multitask learning (SSMTL) has shown strong performance in embedding learning on academic benchmarks, yielding overall improvements across multiple downstream tasks and demonstrating greater resilience to conflicts between downstream tasks, and thereby increased robustness and task generalization ability, through the training objective. However, whether the success of SSMTL in academia as a robust training objective translates to large-scale (i.e., hundreds of millions of users and the interactions between them) industrial RS still requires verification. Simply adopting academic setups in industrial RS might entail two issues. Firstly, many self-supervised objectives require data augmentations (e.g., embedding masking/corruption) over a large portion of users and items, which is prohibitively expensive in industrial RS. Furthermore, some self-supervised objectives might not align with the recommendation task, which might lead to redundant computational overhead or negative transfer. In light of these two challenges, we evaluate a robust training objective, specifically SSMTL, on a large-scale friend recommendation system at a social media platform in the tech sector, identifying whether this increase in robustness can enhance retrieval at scale in the production setting. Through online A/B testing with SSMTL-based EBR, we observe statistically significant increases in key metrics for friend recommendation, with up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with cold-start users. Moreover, through a dedicated case study, we demonstrate the benefits of robust training objectives via SSMTL on large-scale graphs, with gains in both retrieval and end-to-end friend recommendation.

RobustRecSys Workshop at the 18th ACM Conference on Recommender Systems (RecSys '24), October 14–18, 2024, Bari, Italy.

1. Introduction

Recommendation systems (RS) have become a crucial component of the user experience (li2023recent; sun2024survey). Most industrial RS adopt a two-stage process (10.1145/2959100.2959190). During the first stage (i.e., the retrieval phase), the RS typically uses several models optimized for recall to select a small set of candidate users/items (e.g., 1,000 candidates) from among hundreds of millions. During the second stage (i.e., the ranking phase), the RS can apply complex, computationally expensive models optimized for precision to select the top $K$ candidates from this subset for the final recommendation. Such a two-stage process enables recommendation over large quantities of possible users/items and allows for greater flexibility toward key recommendation metrics.

In this two-stage scheme, the retrieval stage is especially important, as it acts as the bottleneck for the candidates provided to the ranker in the second stage. One common approach (DBLP:journals/corr/abs-2006-11632; gan2023binary) for the retrieval step is to leverage embedding-based retrieval (EBR). Specifically, EBR learns embeddings for all users and items as vectors in a low-dimensional latent space. These embeddings are learned such that the distance between them reflects similarity, with more similar items lying closer together in the latent space. As a result, candidates can be retrieved through a nearest-neighbor search across the latent space. In practice, this is done using approximate nearest neighbor methods optimized for large-scale retrieval, such as FAISS (DBLP:journals/corr/JohnsonDJ17) and HNSW (DBLP:journals/corr/MalkovY16).
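To make this retrieval step concrete, the sketch below indexes candidate embeddings and retrieves the top-$k$ neighbors with FAISS. The sizes, random embeddings, and exact-search index are illustrative stand-ins; at production scale an approximate index (e.g., HNSW) would be used instead.

```python
import faiss  # similarity-search library referenced above
import numpy as np

# Hypothetical sizes for illustration: 1M candidates, 64-dim embeddings.
num_candidates, dim, k = 1_000_000, 64, 1000

candidate_embs = np.random.rand(num_candidates, dim).astype("float32")
query_embs = np.random.rand(8, dim).astype("float32")

# Normalize so that inner product equals cosine similarity.
faiss.normalize_L2(candidate_embs)
faiss.normalize_L2(query_embs)

# Exact inner-product index; a deployment at this scale would instead use
# an approximate index such as faiss.IndexHNSWFlat.
index = faiss.IndexFlatIP(dim)
index.add(candidate_embs)

# Top-k most similar candidates for each query user.
scores, candidate_ids = index.search(query_embs, k)
```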

Many methods (zhang2023divide; 1167344; jha2023unified; peng2023embeddingbased) have been proposed for generating high-quality embeddings for EBR, leading to more relevant candidates and improved end-to-end recommendation metrics. In this work, we focus specifically on the friend recommendation EBR setting, where vast amounts of topological information relating users are readily available. Recent works (10.1145/3539618.3591848; DBLP:journals/corr/abs-1806-01973; kung2024improving) have shown that including this relational information can improve embedding quality. The relational information is commonly modeled with graph neural networks (GNNs), producing embeddings that leverage neighbor information in graphs, such as co-friend relationships. For graph-aware EBR in particular, link prediction has seen success in generating high-quality embeddings (LI2021516), where we look to predict the presence of an edge between a query node and a set of candidate nodes.

While link prediction is effective in learning nuanced similarities and distinctions between candidates, several other self-supervised graph learning philosophies can also provide high-quality embeddings, such as mutual information maximization (oord2018representation), generative reconstruction (he2022masked), and whitening decorrelation (ermolov2021whitening). Based on these general philosophies, many graph-based approaches have been proposed and used to learn embeddings directly, achieving desirable embedding properties without requiring explicit labels. Recently, Ju et al. (ju2023multitask) evaluated combining these self-supervised learning approaches with link prediction in a multitask (MTL) setting, demonstrating greater resilience to conflicts between downstream tasks and thereby increased robustness and generalization ability through the training objective.

However, whether using SSMTL from academia as a robust training objective translates to large-scale (i.e., hundreds of millions of users and the interactions between them) industrial RSs still requires verification. Simply adopting academic setups in industrial RSs might result in several issues. Firstly, many self-supervised objectives require data augmentations (e.g., embedding masking/corruption) over a large portion of users and items, which is prohibitively expensive in industrial RSs. Furthermore, some self-supervised objectives might not align with the recommendation task, which might lead to redundant computational overhead or negative transfer (TorreyShavlik2010), a phenomenon where performance worsens as a result of the complexity and potentially opposing nature of the various tasks.

In this work, we investigate whether robust SSMTL training objectives can improve link prediction retrieval performance on large-scale graphs with hundreds of millions of nodes and edges. Specifically, we look to find which combination of SSL approaches can improve overall robustness and thereby augment retrieval through complementary yet disjoint information. In our experiments, we find two SSL approaches, based on the philosophies of whitening decorrelation (e.g., Canonical Correlation Analysis (DBLP:journals/corr/abs-2106-12484)) and generative reconstruction (e.g., Masked Autoencoders (hou2022graphmae)), that are able to augment the performance of link prediction without negative transfer. We deploy the proposed framework in an industrial large-scale friend recommendation system serving a community of hundreds of millions of users. In online A/B testing, we observe significant improvements in key metrics such as new friends made, especially for cold-start users on the platform. Our contributions are summarized as follows:

  • We demonstrate the effectiveness of robust training objectives such as SSMTL in a large-scale industrial recommendation system.

  • We conduct an online study of SSMTL on a massive real-world recommendation system, and observe a statistically significant increase in key metrics, with up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with cold-start users.

Figure 1. In our proposed SSMTL framework, we combine the CCA and MAE SSL methods with the retrieval task in our embedding generation scheme for EBR. CCA looks to maximize the correlation between two augmented views of the input subgraph while decorrelating the features within each view. MAE seeks to reconstruct the features of the query user nodes after propagation through the GNN encoder backbone. Finally, the retrieval task seeks to predict which candidates share a link with the query user using a categorical cross-entropy loss. The loss of each subtask is weighted and summed to form the final loss. Embeddings for EBR can then be generated through the GNN encoder.

2. Background

2.1. Graph-Aware Embedding-based Retrieval

In a two-stage recommendation system with a retrieval phase followed by a ranking phase, the retrieval phase plays an important role in filtering out the most relevant candidates to lighten the load on the ranker. Since the ranking result largely depends on the items retrieved in the retrieval phase, a high-quality retrieval model can drastically improve the final ranking. Embedding-based retrieval (EBR) has recently been adopted and deployed in many content, product, and friend recommendation systems (DBLP:journals/corr/abs-2006-11632; 45530; TK:2021; 10.1145/3539618.3591848), achieving superior results. EBR transforms users and items into embeddings, recasting the retrieval problem as a nearest-neighbor search in a low-dimensional latent space. These embeddings can be computed in advance and indexed using an approximate nearest neighbor search method such as FAISS (DBLP:journals/corr/JohnsonDJ17) or HNSW (DBLP:journals/corr/MalkovY16) to retrieve the top-$k$ most relevant items efficiently at serving time.

When applying EBR to RS problems, the quality of the embeddings is of utmost importance. In this paper, we use a friend recommendation system as our subject. In scenarios like friend recommendation, where vast amounts of topological information relating users and items are readily available, these embeddings can be augmented with GNNs. Previous work showed that EBR for friend recommendation systems benefits from leveraging graph-aware embeddings (10.1145/3539618.3591848). In this setting, nodes contain individual user features while edges map to user-user interactions. This approach complements commonly used graph traversal approaches (e.g., friend-of-friend (FoF) (Newman_2001)), allowing for retrieval of candidates any number of hops away from the target.

Here we describe GNNs for generating graph-aware embeddings for EBR. GNNs have demonstrated state-of-the-art performance on many problems containing rich topological information within the graph data (DBLP:journals/corr/abs-1812-08434), such as recommendation and forecasting. Formally, we define a graph $G=(\mathcal{V},\mathcal{E},X)$, where $\mathcal{V}$ is the set of $n$ nodes ($|\mathcal{V}|=n$), $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges, and $X\in\mathbb{R}^{n\times d}$ is the feature matrix with feature dimension $d$. Many modern GNNs employ a message-passing structure consisting of an aggregation (AGG) and an update (UPD) function. The goal of this paradigm is for each node to receive information from its neighbors, collecting messages with the AGG function before updating its own representation with the UPD function; both functions are learnable and permutation-invariant. For a node $u$ at layer $k$, the next message-passing layer can be written as

(1) $\mathbf{h}_{u}^{(k+1)}=\text{UPD}^{(k)}\left(\mathbf{h}_{u}^{(k)},\text{AGG}^{(k)}\left(\left\{\mathbf{h}_{v}^{(k)},\forall v\in\mathcal{N}(u)\right\}\right)\right)$

where $\mathcal{N}(u)$ is the set of neighbors of node $u$. Different message-passing GNN models use different combinations of AGG and UPD functions. As an example of a more complex GNN, graph attention networks (GATs) (veličković2018graph) use an attention mechanism for each pair of nodes $i$ and $j$:

(2) $\alpha_{ij}=\text{softmax}_{j}\left(f_{\text{att}}\left(\mathbf{W}h_{i},\mathbf{W}h_{j}\right)\right)$

where $\mathbf{W}$ is a linear transformation applied to every node and $f_{\text{att}}$ is the attention function parameterized by a weight vector and a non-linearity. The AGG function is then an attention-weighted sum of a node's neighbors' features, while the UPD function is implicitly defined by $\mathbf{W}$ and the non-linearity. Typically, to generate graph-aware embeddings from GNNs, a margin-based ranking loss (DBLP:journals/corr/abs-1806-01973; 10.1145/3539618.3591848) or a contrastive loss (DBLP:journals/corr/abs-2101-01317) is used to encourage items that are closer in the graph to be closer in the embedding space.
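As a minimal PyTorch sketch of the message-passing template in Equation 1 (not the GAT used in our system), the layer below uses mean aggregation for AGG and a learned linear map with a ReLU for UPD; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MeanMessagePassing(nn.Module):
    """One message-passing layer in the form of Eq. (1): AGG averages
    neighbor embeddings, UPD applies a linear map to the concatenation of
    a node's current state and its aggregated message."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.update = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # h: (n, d) node embeddings; edge_index: (2, |E|) rows of (src, dst).
        src, dst = edge_index
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, h[src])  # sum incoming neighbor messages
        deg = torch.bincount(dst, minlength=h.size(0)).clamp(min=1)
        agg = agg / deg.unsqueeze(-1)   # mean aggregation (AGG)
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))  # UPD
```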

2.2. Multitask Learning

Multitask learning (MTL) is an approach in machine learning where a model is trained simultaneously on several tasks. MTL has been extensively explored in recommendation as a way to improve key metrics (inproceedings1; 10.1145/3219819.3220007; ma2018entire; 10.1145/3383313.3412236). The core idea behind multitask learning is to improve the robustness of the model by leveraging the domain-specific information contained in the training signals of related tasks (NIPS2006_0afa92fc; Caruana1997MultitaskLearning). Hard parameter sharing, one of the most fundamental forms of MTL, uses a shared representation which then branches into multiple heads capable of learning task-specific information (guo2020learning; DBLP:journals/corr/abs-1911-12423; vandenhende2020branched).

For graph-aware EBR in particular, self-supervised multitask learning (SSMTL) has been proposed as a new approach to MTL, optimizing the embeddings directly to achieve desirable embedding properties without the use of positive or negative labels. In this setting, we combine several self-supervised learning (SSL) methods with a downstream retrieval task to learn both direct and indirect embedding features. Recent work (ju2023multitask) has shown that SSMTL can improve task generalization and embedding quality on several academic benchmarks through an increasingly robust training objective. However, many of the SSL approaches used rest on the assumption that global graph information can be inferred from the graph structure. This does not hold in the large-scale recommendation setting, where graphs are constrained to some $K$-hop neighborhood around a query user in order to fit in memory. As a result, many of these SSL methods may lead to negative transfer due to conflict with the target link prediction task, and it remains to be investigated which methods perform best in this large-scale setting.

3. Self-Supervised Multitask Learning for EBR

In the following sections, we describe the details of the SSL methods used in our SSMTL approach, as well as our experimental setup and results, highlighting the benefits and impact of including SSMTL-based embeddings in EBR for large-scale industrial recommendation systems.

3.1. Self-Supervised Learning Methods

We identify two self-supervised learning approaches that are scalable and lead to improvements in the large-scale recommendation setting through a more robust training objective.

Canonical Correlation Analysis. Based on work from (DBLP:journals/corr/abs-2106-12484), Canonical Correlation Analysis (CCA) deploys a non-contrastive, non-discriminative SSL method to train the GNN. The self-supervised training objective is described in Equation 3. First, given a subgraph with $n$ nodes, two augmented views of the subgraph are created and fed through the GNN, producing $\mathbf{Z}_{A},\mathbf{Z}_{B}\in\mathbb{R}^{n\times k}$. Each of these embeddings is fed through a task-specific head and then normalized so that each feature has zero mean and $\frac{1}{\sqrt{n}}$ standard deviation, resulting in $\tilde{\mathbf{Z}}_{A}$ and $\tilde{\mathbf{Z}}_{B}$. The first term in the equation minimizes the distance between the two views of each node, while the second term enforces that the feature-wise covariance of all nodes equals the identity matrix.

(3) $\mathcal{L}_{\text{CCA}}=\left\|\tilde{\mathbf{Z}}_{A}-\tilde{\mathbf{Z}}_{B}\right\|_{F}^{2}+\lambda\left(\left\|\tilde{\mathbf{Z}}_{A}^{T}\tilde{\mathbf{Z}}_{A}-\mathbf{I}\right\|_{F}^{2}+\left\|\tilde{\mathbf{Z}}_{B}^{T}\tilde{\mathbf{Z}}_{B}-\mathbf{I}\right\|_{F}^{2}\right)$
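A PyTorch sketch of Equation 3, assuming the two views have already been encoded by the GNN and passed through the task-specific head; the default $\lambda$ is illustrative, not a tuned production value.

```python
import torch

def cca_ssl_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """CCA loss of Eq. (3) over two views z_a, z_b of shape (n, k)."""
    n = z_a.size(0)
    # Normalize each feature to zero mean and 1/sqrt(n) standard deviation,
    # so that Z~^T Z~ is the feature-wise covariance matrix.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) * n**0.5)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) * n**0.5)

    invariance = (z_a - z_b).pow(2).sum()  # pull the two views together
    eye = torch.eye(z_a.size(1), device=z_a.device)
    decorrelation = ((z_a.T @ z_a - eye).pow(2).sum()
                     + (z_b.T @ z_b - eye).pow(2).sum())
    return invariance + lam * decorrelation
```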

Masked Autoencoders. Based on work from (hou2022graphmae), this approach leverages a graph masked autoencoder (MAE) that focuses on feature reconstruction. First, an augmented view of the subgraph is created and the features of the query users are masked out. This augmented graph is then fed through the GNN and a task-specific head. The features of the query users are then re-masked and passed through a graph convolution layer. As described in Equation 4, for the set of masked nodes $\mathcal{V}$, the final loss is the average scaled cosine error between the original features $\mathbf{X}$ and the reconstructed features $\mathbf{Z}$. This approach relies only on the local neighborhood surrounding the query node, making it a good option for large-scale SSMTL.

(4) $\mathcal{L}_{\text{MAE}}=\frac{1}{|\mathcal{V}|}\sum_{v_{i}\in\mathcal{V}}\left(1-\frac{\mathbf{x}_{i}^{T}\mathbf{z}_{i}}{\|\mathbf{x}_{i}\|\cdot\|\mathbf{z}_{i}\|}\right)^{y},\quad y\geq 1$
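Equation 4 maps directly to a few lines of PyTorch; the sketch below assumes the original and reconstructed features of the masked nodes are row-aligned, and the default exponent y is an illustrative choice rather than the production value.

```python
import torch
import torch.nn.functional as F

def scaled_cosine_error(x: torch.Tensor, z: torch.Tensor, y: float = 2.0) -> torch.Tensor:
    """Scaled cosine error of Eq. (4), averaged over masked nodes.
    x: (m, d) original features; z: (m, d) reconstructions; y >= 1 sharpens
    the penalty on poorly reconstructed nodes (2.0 is illustrative)."""
    cos = F.cosine_similarity(x, z, dim=-1)  # per-node cosine similarity
    return ((1.0 - cos) ** y).mean()
```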

We note that these two approaches both utilize non-contrastive methods. While experimenting with different SSL tasks, we found that contrastive SSL approaches do not perform well in the production setting due to their assumption that global information is readily available in the original and augmented graphs. This does not necessarily hold for large-scale recommendation, where subgraphs are constrained to the $K$-hop neighborhood surrounding each query node.

3.2. Experimental Setup

3.2.1. Problem Breakdown

We evaluate SSMTL as a robust training objective on an industrial friend recommendation system with hundreds of millions of users and connections. To handle this scale of training, we sample subgraphs containing the $k$-hop neighborhood around each query user. Following training, the embeddings for EBR can be generated via propagation through the encoder backbone.
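As a rough sketch of this neighborhood sampling (the production sampler is not described in this paper), a BFS-based $k$-hop sampler with a hypothetical fanout cap might look as follows.

```python
from collections import deque

def k_hop_subgraph(adj: dict[int, list[int]], query: int, k: int,
                   max_neighbors: int = 50) -> set[int]:
    """Collect the nodes of the k-hop neighborhood around a query user.
    adj maps a user id to its neighbor ids; max_neighbors caps fanout so
    high-degree nodes do not blow up memory (the cap is illustrative)."""
    visited, frontier = {query}, deque([(query, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj.get(node, [])[:max_neighbors]:  # truncate fanout
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited
```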

3.2.2. Retrieval Baseline

The baseline model uses a supervised single-task setup for embedding-based retrieval. We use a GAT as the GNN encoder backbone to obtain embeddings for the query user and each candidate, producing a candidate embedding matrix $\mathbf{z}$. We then compute the dot product between the query user and each candidate to obtain logits and apply a softmax over them. We then calculate the categorical cross-entropy loss against the true labels $\mathbf{y}$ across the $N=2$ classes and $M$ candidates, as outlined in Equation 5.

(5) $\mathcal{L}_{\text{retrieval}}=-\sum_{i=1}^{N}\sum_{j=1}^{M}y_{ij}\log\left(\frac{e^{z_{ij}}}{\sum_{k=1}^{M}e^{z_{ik}}}\right)$
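A compact PyTorch sketch of this retrieval loss, assuming dot-product logits and one labeled positive per query; batch shapes are illustrative, and F.cross_entropy applies the log-softmax over the $M$ candidates internally.

```python
import torch
import torch.nn.functional as F

def retrieval_loss(query_emb: torch.Tensor, cand_embs: torch.Tensor,
                   pos_idx: torch.Tensor) -> torch.Tensor:
    """Categorical cross-entropy of Eq. (5).
    query_emb: (b, d) query-user embeddings; cand_embs: (b, M, d) candidate
    embeddings per query; pos_idx: (b,) index of the positive candidate."""
    logits = torch.einsum("bd,bmd->bm", query_emb, cand_embs)  # dot products
    return F.cross_entropy(logits, pos_idx)  # softmax over the M candidates
```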

3.2.3. SSMTL Implementation Details

In our SSMTL approach, we use both CCA and MAE in combination with the retrieval baseline as the training objectives. All three methods share the same GAT backbone. The augmentations for CCA and MAE are performed separately: CCA uses edge-drop and feature-drop augmentations, while MAE uses edge drop and query-node masking. The task-specific head for CCA is a Linear-ReLU-Linear block, while the task-specific head for MAE is a single linear layer. The final SSMTL loss is a weighted sum of the individual losses:

(6) $\mathcal{L}_{\text{combined}}=\alpha\mathcal{L}_{\text{retrieval}}+\beta\mathcal{L}_{\text{CCA}}+\gamma\mathcal{L}_{\text{MAE}}$

where $\alpha$ is the weight of the retrieval loss, $\beta$ is the weight of the CCA loss, and $\gamma$ is the weight of the MAE loss. In practice, we observed the best performance when the retrieval weight was several orders of magnitude larger than the other loss weights.
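Equation 6 reduces to a small helper; the default weights below are placeholders chosen only to mirror the reported pattern that the retrieval weight dominates the SSL weights, not the tuned production values.

```python
import torch

def combined_loss(l_retrieval: torch.Tensor, l_cca: torch.Tensor,
                  l_mae: torch.Tensor, alpha: float = 1.0,
                  beta: float = 1e-3, gamma: float = 1e-3) -> torch.Tensor:
    """Weighted SSMTL objective of Eq. (6); gradients flow back into the
    shared GAT backbone and all three task-specific heads."""
    return alpha * l_retrieval + beta * l_cca + gamma * l_mae
```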

3.3. Results

We evaluated the effectiveness of SSMTL for end-to-end friend recommendation with online A/B testing. The control group used candidates retrieved by the production model trained with the retrieval baseline, while the treatment group used candidates retrieved with the new robust training objective in the SSMTL setting, specifically combining the previous retrieval loss with the whitening-decorrelation and generative-reconstruction objectives.

In the A/B experiment, we saw statistically significant improvements across several friend recommendation metrics. Specifically, we observed up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with low-degree users across various markets. Overall, these results show that SSMTL provides improved recommendations compared with the single-task setting, in particular helping candidate generation for low-degree users.

4. Conclusion

In this paper, we evaluate the effectiveness of a robust self-supervised multitask learning objective in embedding-based retrieval. Through online evaluation, we demonstrate that self-supervised methods used in a multitask setting are able to augment the performance of the underlying retrieval task at the scale of over 800 million nodes and edges, providing complementary yet disjoint information that enhances embedding quality. We observe statistically significant gains in the number of friendships made for both high- and low-degree users.

References

  • [1] Yang Li, Kangbo Liu, Ranjan Satapathy, Suhang Wang, and Erik Cambria. Recent developments in recommender systems: A survey, 2023.
  • [2] Anchen Sun and Yuanzhe Peng. A survey on modern recommendation system based on big data, 2024.
  • [3] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, page 191–198, New York, NY, USA, 2016. Association for Computing Machinery.
  • [4] Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. Embedding-based retrieval in facebook search. CoRR, abs/2006.11632, 2020.
  • [5] Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, and Ying Shan. Binary embedding-based retrieval at tencent, 2023.
  • [6] Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus. CoRR, abs/1702.08734, 2017.
  • [7] Yury A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. CoRR, abs/1603.09320, 2016.
  • [8] Yuan Zhang, Xue Dong, Weijie Ding, Biao Li, Peng Jiang, and Kun Gai. Divide and conquer: Towards better embedding-based retrieval for recommender systems from a multi-task perspective, 2023.
  • [9] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
  • [10] Rishikesh Jha, Siddharth Subramaniyam, Ethan Benjamin, and Thrivikrama Taula. Unified embedding based personalized retrieval in etsy search, 2023.
  • [11] Ruoling Peng, Kang Liu, Po Yang, Zhipeng Yuan, and Shunbao Li. Embedding-based retrieval with llm for effective agriculture information extracting from unstructured data, 2023.
  • [12] Jiahui Shi, Vivek Chaurasiya, Yozen Liu, Shubham Vij, Yan Wu, Satya Kanduri, Neil Shah, Peicheng Yu, Nik Srivastava, Lei Shi, Ganesh Venkataraman, and Jun Yu. Embedding based retrieval in friend recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’23, page 3330–3334, New York, NY, USA, 2023. Association for Computing Machinery.
  • [13] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. CoRR, abs/1806.01973, 2018.
  • [14] Pau Perng-Hwa Kung, Zihao Fan, Tong Zhao, Yozen Liu, Zhixin Lai, Jiahui Shi, Yan Wu, Jun Yu, Neil Shah, and Ganesh Venkataraman. Improving embedding-based retrieval in friend recommendation with ann query expansion. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2930–2934, 2024.
  • [15] Chen Li, Xutan Peng, Yuhang Niu, Shanghang Zhang, Hao Peng, Chuan Zhou, and Jianxin Li. Learning graph attention-aware knowledge graph embedding. Neurocomputing, 461:516–529, 2021.
  • [16] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  • [17] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022.
  • [18] Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, and Nicu Sebe. Whitening for self-supervised representation learning. In International Conference on Machine Learning, pages 3015–3024. PMLR, 2021.
  • [19] Mingxuan Ju, Tong Zhao, Qianlong Wen, Wenhao Yu, Neil Shah, Yanfang Ye, and Chuxu Zhang. Multi-task self-supervised graph neural networks enable stronger task generalization. In The Eleventh International Conference on Learning Representations, 2023.
  • [20] Lisa Torrey and Jude Shavlik. Transfer Learning, pages 242–264. IGI Global, 2010.
  • [21] Hengrui Zhang, Qitian Wu, Junchi Yan, David Wipf, and Philip S. Yu. From canonical correlation analysis to self-supervised graph neural networks. CoRR, abs/2106.12484, 2021.
  • [22] Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, and Jie Tang. Graphmae: Self-supervised masked graph autoencoders, 2022.
  • [23] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, New York, NY, USA, 2016.
  • [24] Tim Koh, George Wu, and Michael Mi. Manas hnsw realtime: Powering realtime embedding-based retrieval, 2021.
  • [25] M. E. J. Newman. Clustering and preferential attachment in growing networks. Physical Review E, 64(2), July 2001.
  • [26] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434, 2018.
  • [27] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks, 2018.
  • [28] Zhuang Liu, Yunpu Ma, Yuanxin Ouyang, and Zhang Xiong. Contrastive learning for recommender system. CoRR, abs/2101.01317, 2021.
  • [29] Yichao Lu, Ruihai Dong, and Barry Smyth. Why i like it: multi-task learning for recommendation and explanation. pages 4–12, 09 2018.
  • [30] Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H. Chi. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, page 1930–1939, New York, NY, USA, 2018. Association for Computing Machinery.
  • [31] Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, and Kun Gai. Entire space multi-task model: An effective approach for estimating post-click conversion rate, 2018.
  • [32] Hongyan Tang, Junning Liu, Ming Zhao, and Xudong Gong. Progressive layered extraction (ple): A novel multi-task learning (mtl) model for personalized recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems, RecSys ’20, page 269–278, New York, NY, USA, 2020. Association for Computing Machinery.
  • [33] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Multi-task feature learning. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, volume 19. MIT Press, 2006.
  • [34] Rich Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
  • [35] Pengsheng Guo, Chen-Yu Lee, and Daniel Ulbricht. Learning to branch for multi-task learning, 2020.
  • [36] Ximeng Sun, Rameswar Panda, and Rogério Schmidt Feris. Adashare: Learning what to share for efficient deep multi-task learning. CoRR, abs/1911.12423, 2019.
  • [37] Simon Vandenhende, Stamatios Georgoulis, Bert De Brabandere, and Luc Van Gool. Branched multi-task networks: Deciding what layers to share, 2020.