FedCL: Federated Contrastive Learning for Privacy-Preserving Recommendation

Chuhan Wu^¹, Fangzhao Wu^², Tao Qi^¹, Yongfeng Huang^¹, Xing Xie^² ^¹Department of Electronic Engineering & BNRist, Tsinghua University, Beijing 100084, China
^²Microsoft Research Asia, Beijing 100080, China
{wuchuhan15, wufangzhao, taoqi.qt}@gmail.com, [email protected], [email protected]

Abstract

Contrastive learning is widely used for recommendation model learning, where selecting representative and informative negative samples is critical. Existing methods usually focus on centralized data, where abundant and high-quality negative samples are easy to obtain. However, centralized user data storage and exploitation may lead to privacy risks and concerns, while decentralized user data on a single client can be too sparse and biased for accurate contrastive learning. In this paper, we propose a federated contrastive learning method named FedCL for privacy-preserving recommendation, which can exploit high-quality negative samples for effective model training with privacy well protected. We first infer user embeddings from local user data through the local model on each client, and then perturb them with local differential privacy (LDP) before sending them to a central server for hard negative sampling. Since individual user embedding contains heavy noise due to LDP, we propose to cluster user embeddings on the server to mitigate the influence of noise, and the cluster centroids are used to retrieve hard negative samples from the item pool. These hard negative samples are delivered to user clients and mixed with the observed negative samples from local data as well as in-batch negatives constructed from positive samples for federated model training. Extensive experiments on four benchmark datasets show FedCL can empower various recommendation methods in a privacy-preserving way.

1 Introduction

Contrastive learning is a widely-adopted technique in personalized recommendation Zhou et al. (2020). Choosing informative negative samples is usually a necessity in contrastive recommendation model learning Robinson et al. (2020), and various strategies have been explored in existing works Wang et al. (2021). For example, Rendle et al. Rendle et al. (2009) proposed to compare each positive sample with a randomly selected negative sample from the entire item set. Wu et al. Wu et al. (2019) randomly selected several displayed but not clicked news within a news impression as negative samples to contrast them with the positive sample. Some recommendation methods further introduce hard negatives to improve the informativeness of training samples. For instance, Ding et al. Ding et al. (2019) proposed to use reinforcement learning to select hard negative samples from unclicked data. These methods are usually learned on centralized user data, where abundant and representative negative samples can be drawn from the global dataset. However, user data is usually highly privacy-sensitive, and centralized storage and mining of it may bring considerable privacy risks and user concerns. Some strict data protection regulations like GDPR¹¹1https://gdpr-info.eu/ also restrict the collection and centralized storage of user data. Thus, these methods are not feasible in privacy-preserving recommendation without centralized data to construct training samples.

Recent works study to use federated learning (FL) McMahan et al. (2017) to learn recommendation models in a decentralized way, where the raw user data is not necessarily uploaded to a server Chai et al. (2020). For example, Ammad et al. Ammad-Ud-Din et al. (2019) proposed a federated collaborative filtering method that only uploads the item embedding gradients, where negative samples are random items chosen from the item pool. These methods usually assume that there are sufficient negative items on each local device for negative sampling and model training, which may not be satisfied in real-world scenarios. In fact, the local data on user devices can be rather sparse and biased, and even there is no negative sample stored on user devices because many online services do not locally record non-interacted items Ning et al. (2021), which poses great challenges to contrastive recommendation model learning under privacy protection.

In this paper, we propose a federated contrastive recommendation model learning method named FedCL, which can exploit high-quality negative samples in a privacy-preserving way to empower model training. In our approach, the local model maintained on each client first infers a user embedding based on local user profiles. To protect user privacy, we perturb user embeddings via local differential privacy (LDP) before sending them to a central server for hard negative sampling. Since the perturbed user embeddings are quite noisy, we propose to cluster them on the server via the Ward clustering algorithm to reduce the influence of noise. The cluster centroids can represent users with different types of interests, and we use the cluster centroids instead of user embeddings to retrieve informative negatives from the item pool via a semi-hard negative sampling method. These negatives are delivered to user clients for federated model training by mixing them with the real negative samples stored by local clients (if available) and the local in-batch negatives inferred from positive samples. Extensive experiments on four benchmark recommendation datasets show that FedCL can effectively improve the performance of various recommendation methods and meanwhile protect user privacy.

The main contributions of this paper include:

•

We propose a federated contrastive learning method for privacy-preserving recommendation that can exploit informative negatives for model training.
•

We propose a federated hard negative sampling method that can accurately find hard negative samples for contrastive model training in a privacy-preserving way.
•

We design a negative mix-up method to incorporate various kinds of negative samples within a unified contrastive training framework.
•

We conduct extensive experiments to verify the effectiveness of our approach in empowering various recommendation methods in federated learning scenarios.

2 Related Work

2.1 Contrastive Learning in Information Retrieval

Contrastive learning is a core technique in many information retrieval (IR) tasks such as web search Xiong et al. (2020) and personalized recommendation Wei et al. (2021). A core problem of contrastive IR model training is selecting proper negative samples to contrast with the positive ones Robinson et al. (2020). One popular way is to randomly sample negatives from the candidate set. For example, the Bayesian Personalized Ranking (BPR) Rendle et al. (2009) method uses a randomly selected negative sample to compare with a positive sample. Many other works explore learning IR models by optimizing the gaps between the positive sample and a certain number of random negative samples Shen et al. (2014); Wu et al. (2019). Another widely used way is in-batch negative sampling. For example, Yi et al. Yi et al. (2019) proposed to use the positive items within the same batch as shared negatives for all queries in this batch. However, random or in-batch negative samples can be somewhat too easy for model training Robinson et al. (2020). To exploit more representative and informative negative samples for discriminative model training, some methods explore to incorporate hard negative samples into model learning Kalantidis et al. (2020). For example, Ding et al. Ding et al. (2019) proposed to use reinforcement learning to train a sampler model to select negative samples for contrastive recommendation model training. Xiong et al. Xiong et al. (2020) proposed to use an asynchronously updated ANN index to select globally hardest negative samples from the candidate document pool for learning the matching model. These methods usually require centralized data storage for negative sampling. However, under privacy-preserving settings there is no centralized data and the local data on user devices can be too sparse to obtain effective negative samples, making it difficult to apply existing contrastive IR model learning methods. Thus, in our work, we propose a federated contrastive learning method for recommendation, which can incorporate various kinds of negative samples into contrastive model training in a privacy-preserving way.

2.2 Federated Recommendation

In recent years, the application of federated learning in privacy-preserving recommendation has been extensively studied Chai et al. (2020); Qi et al. (2020); Liang et al. (2021). For example, Ammad et al. Ammad-Ud-Din et al. (2019) proposed to locally learn user and item embeddings on local user data, and upload item embedding gradients to a server for global model updating and delivery. Lin et al. Lin et al. (2020) proposed to randomly sample some unrated items and assign them virtual ratings to avoid communicating the full item embedding table. Qi et al. Qi et al. (2021) proposed a unified method for privacy-preserving news recall and ranking by modeling user interests with linear combinations of basis interest vectors. However, these methods usually assume that there are sufficient negative samples available on local user devices. Unfortunately, in many scenarios such as product recommendation and web search, non-interacted items are typically not locally recorded due to efficiency concerns. It may also be suboptimal to simply use random negative items provided by the server without considering clients’ characteristics. One recent work Ning et al. (2021) studies learning recommendation models under limited locally available negative samples. They proposed to use a batch-insensitive loss that enables using local positive items only to construct training samples. However, useful negative signals cannot be considered by this method. By contrast, our proposed method can effectively exploit various kinds of negative samples under privacy protection to improve contrastive recommendation model learning.

3 FedCL

Refer to caption — Figure 1: The framework of our FedCL approach. The standard local gradients uploading and aggregation in FL are omitted.

Next, we introduce our federated contrastive learning (FedCL) approach for recommendation in detail. Its overall framework is shown in Fig. 1. The server not only coordinates a large number of clients for collaborative model updating, but is also responsible for finding hard negative samples to empower model training. There are two major steps in FedCL, i.e., federated negative sampling and negative mix-up for local contrastive learning, which are described as follows.

3.1 Federated Negative Sampling

In federated learning, there is no global dataset for hard negative sampling. In addition, since local user data is highly sparse and biased, it is also impractical to conduct hard negative sampling locally. To solve this problem, we propose a federated negative sampling method to choose hard negatives in a privacy-preserving manner. Assume that there are $N$ clients in total. Each client first uses its maintained local user model to infer a user embedding from the local user profile. The user model can be a simple user ID embedding, or a more complicated one that learns user embeddings from the sequence of historical interacted items. We denote the embedding of user $u_{i}$ as $\mathbf{u}_{i}$ . Since this user embedding still contains much private user information, we use local differential privacy (LDP) to perturb this embedding. Motivated by Qi et al. (2020), we first clip each user embedding with a threshold $\delta$ according to its $L_{1}$ norm, and add Laplace noise $\mathbf{n}\sim Laplace(0,\frac{2\delta}{\epsilon})$ to protect this embedding ( $\epsilon$ is the privacy budget, and a lower $\epsilon$ means better privacy protection). Then the protected user embeddings satisfy $\epsilon$ -DP. We refer to the protected user embedding of $u_{i}$ as $\mathbf{\hat{u}}_{i}$ .

The protected user embeddings on user clients are uploaded to the server for negative sampling. A simple way is directly measuring the similarity between candidate items and these embeddings to find negative samples. However, these embeddings contain heavy noise if the privacy budget is relatively small, and the selected negative samples are usually inaccurate, as shown in the middle plot in Fig. 2. To mitigate the influence of noise on negative sampling, we propose to cluster the user embeddings and use their cluster centroids to represent users with different types of interests. More specifically, we use the Ward clustering algorithm to aggregate the $N$ user embeddings into $C$ clusters. The centroid of each cluster is the element-wise average of user embeddings in this cluster. We use the cluster centroid to replace the user embeddings in this cluster to conduct hard negative sampling (right plot in Fig. 2). Since the noise added to the user embeddings is zero-mean, the errors brought by the noise can be mitigated to some extent by using cluster centroids. We denote the cluster centroid corresponding to the user embedding $\mathbf{u}_{i}$ as $\mathbf{c}_{i}$ .

We evaluate the relevance between $\mathbf{c}_{i}$ and each item embedding in the candidate pool on the server to choose negative samples. A simple way is to pick the globally hardest negative samples Xiong et al. (2020). However, there may be many false negatives because the candidate items in the entire pool have not necessarily been displayed to the users. Thus, in our approach we propose to choose semi-hard negative samples to balance their difficulty and accuracy. For each centroid vector $\mathbf{c}_{i}$ , we first retrieve $R$ % of candidate samples with the highest difficulties to form a hard candidate subset (a smaller $R$ means harder negatives). We then randomly select $T$ negative samples from this subset as semi-hard negative samples. Note that for clients that are associated with the same cluster centroid, the sampling of negative items from the hard candidate subset is independent, which aims to ensure the comprehensiveness of negative samples. The semi-hard negative samples drawn from $\mathbf{c}_{i}$ are denoted as $[h_{1}^{i},h_{2}^{i},...,h_{T}^{i}]$ , which are further distributed to the $i$ -th client for local model training. In this way, the user clients can exploit the hard negative samples obtained from the global candidate pool in a privacy-preserving way.

1: Initialize model parameters on the clients and server

2: repeat

3: Select a subset

\mathcal{U}^{\prime}

from the user set

\mathcal{U}

randomly

\mathbf{g}=0

5: for each user

u_{i}

\in

\mathcal{U}^{\prime}

in parallel do

6: Infer user embedding

\mathbf{u}_{i}

from local profile

\mathbf{\hat{u}}_{i}\leftarrow clip(\mathbf{u}_{i},\delta)+Laplace(0,2\delta/\epsilon)

8: Upload

\mathbf{\hat{u}}_{i}

to the server

9: end for// Federated negative sampling

10: Server clusters collected user embeddings

11: for each user

u_{i}

\in

\mathcal{U}^{\prime}

12: Retrieve semi-hard negatives from the item pool using the cluster centroid

\mathbf{c}_{i}

13: Send negative items to the client

i

14: end for

15: Distribute

\mathbf{g}

to user clients for local update // Federated Model Learning

16: for each user

u_{i}

\in

\mathcal{U}^{\prime}

in parallel do

17: Mix semi-hard, in-batch and local negatives

18: Compute local loss

\mathcal{L}_{i}

on local training samples

19: Compute local model update

\mathbf{g}_{i}

20: Securely upload

\mathbf{g}_{i}

to the server

21: end for

22: Server aggregates local model updates

23: Update global model and distribute it to clients

24: until model convergence

Algorithm 1 FedCL

3.2 Negative Mix-up for Contrastive Training

Next, we introduce the proposed negative mix-up method for local contrastive model learning. Motivated by the in-batch negative sampling in prior work Yi et al. (2019), we propose to incorporate in-batch negatives by regarding the local positive items as a mini-batch, where the samples within this batch can be used as negatives for each other. We denote the local positive samples on the client $i$ as $[p_{1}^{i},p_{2}^{i},...,p_{P}^{i}]$ . For the positive item $p_{j}^{i}$ , its corresponding in-batch negatives are $[p_{1}^{i},...,p_{j-1}^{i},p_{j+1}...,p_{P}^{i}]$ . Note that this kind of in-batch negative samples cannot be used for training ID-based user modeling methods because they cannot learn different user embeddings based on the item interaction histories. In addition, in real-world scenarios there can be a small number of negative items stored on user devices or randomly distributed by the server. To help our approach be compatible with this setting, we also incorporate the local negative items into model training, which are denoted as $[n_{1}^{i},n_{2}^{i},...,n_{K}^{i}]$ . Given each positive item $p_{j}^{i}$ , we mix up the in-batch negatives, local negatives, and semi-hard negatives into a unified list of negative samples, and we denote the list of their embeddings as $\mathcal{N}_{j}^{i}$ . The embedding of the positive item $p_{j}^{i}$ is denoted as $\mathbf{p}_{j}^{i}$ The local contrastive training loss function $\mathcal{L}_{i}$ of the $i$ -th client is formulated as follows:

\mathcal{L}_{i}=-\sum_{j=1}^{P}\log(\frac{\exp(\mathbf{u}^{i}_{j}\cdot\mathbf{p}_{j}^{i})}{\exp(\mathbf{u}^{i}_{j}\cdot\mathbf{p}_{j}^{i})+\sum_{\mathbf{n}\in\mathcal{N}_{j}^{i}}\exp(\mathbf{u}^{i}_{j}\cdot\mathbf{n})}),

(1)

where $\mathbf{u}^{i}_{j}$ represents the user embedding for matching with the positive item $p_{j}^{i}$ (inferred by the user model on client $i$ ).

3.3 Federated Model Update

Finally, we briefly introduce the federated model update process in FedCL. After computing the loss function $\mathcal{L}_{i}$ , each client locally optimizes this loss to update the maintained local recommendation model. The local model updates from a certain number of clients are securely uploaded Chai et al. (2020) to the server for aggregation, and the aggregated model updates are used to update the global model maintained by the server. The server further distributes the global model to each client to conduct local update. The cascaded process of federated negative sampling, local negative mix-up for contrastive training, and federated model update is repeated for multiple rounds until the model converges.

4 Experiments

4.1 Datasets and Experimental Settings

We conduct experiments on four widely used benchmark datasets for recommendation. Two of them are taken from the Amazon dataset dump McAuley et al. (2015). We use the “Beauty” and “Sports and Outdoors” (shorten as “Sports”) domains for experiments. The third dataset is collected on Yelp. Following the settings in Zhou et al. (2020), we use the transaction records after January 1st, 2019. The last dataset is MovieLens 1M (denoted as ML-1M) Harper and Konstan (2015), which is also a canonical recommendation dataset. The statistics of the four datasets are listed in Table 1. For each user, the last interacted item is used for test, and the item before the last one is used for validation.

Datasets	#Users	#Items	#Actions	Avg. length	Density
Beauty	40,226	54,542	0.35m	8.8	0.02%
Sports	25,598	18,357	0.30m	8.3	0.05%
Yelp	30,431	20,033	0.32m	10.4	0.05%
ML-1M	6,040	3,416	1.0m	163.5	4.79%

Table 1: Statistics of the four datasets used in experiments.

Methods			Beauty				Sports
	HR@5	HR@10	nDCG@5	nDCG@10	HR@5	HR@10	nDCG@5	nDCG@10
NCF	0.0144	0.0296	0.0082	0.0131	0.0124	0.0220	0.0068	0.0101
NCF+Fed	0.0122	0.0267	0.0070	0.0123	0.0106	0.0199	0.0062	0.0094
NCF+BIL	0.0140	0.0288	0.0079	0.0129	0.0125	0.0223	0.0070	0.0102
NCF+FedCL	0.0245	0.0413	0.0155	0.0209	0.0178	0.0304	0.0109	0.0148
GRU4Rec	0.0166	0.0358	0.0088	0.0143	0.0135	0.0277	0.0094	0.0135
GRU4Rec+Fed	0.0134	0.0316	0.0074	0.0130	0.0115	0.0227	0.0079	0.0104
GRU4Rec+BIL	0.0160	0.0343	0.0083	0.0136	0.0130	0.0270	0.0091	0.0129
GRU4Rec+FedCL	0.0312	0.0539	0.0189	0.0265	0.0184	0.0311	0.0115	0.0156
BERT4Rec	0.0199	0.0410	0.0188	0.0259	0.0169	0.0332	0.0104	0.0156
BERT4Rec+Fed	0.0158	0.0352	0.0160	0.0222	0.0134	0.0292	0.0098	0.0135
BERT4Rec+BIL	0.0191	0.0397	0.0182	0.0250	0.0164	0.0325	0.0101	0.0150
BERT4Rec+FedCL	0.0368	0.0654	0.0209	0.0304	0.0215	0.0348	0.0124	0.0179

Table 2: Results of different methods on the Beauty and Sports datasets.

Methods			Yelp				ML-1M
	HR@5	HR@10	nDCG@5	nDCG@10	HR@5	HR@10	nDCG@5	nDCG@10
NCF	0.0134	0.0275	0.0080	0.0118	0.0168	0.0322	0.0098	0.0145
NCF+Fed	0.0104	0.0212	0.0072	0.0109	0.0133	0.0275	0.0090	0.0137
NCF+BIL	0.0128	0.0266	0.0078	0.0116	0.0157	0.0310	0.0095	0.0142
NCF+FedCL	0.0213	0.0359	0.0135	0.0186	0.0414	0.0675	0.0256	0.0342
GRU4Rec	0.0153	0.0262	0.0101	0.0135	0.0769	0.1662	0.0388	0.0674
GRU4Rec+Fed	0.0120	0.0237	0.0084	0.0112	0.0588	0.1025	0.0309	0.0531
GRU4Rec+BIL	0.0146	0.0255	0.0098	0.0129	0.0754	0.1633	0.0372	0.0660
GRU4Rec+FedCL	0.0264	0.0413	0.0176	0.0225	0.1155	0.2013	0.0671	0.0940
BERT4Rec	0.0184	0.0292	0.0117	0.0173	0.0736	0.1344	0.0427	0.0622
BERT4Rec+Fed	0.0152	0.0256	0.0100	0.0114	0.0544	0.0939	0.0288	0.0487
BERT4Rec+BIL	0.0178	0.0299	0.0119	0.0180	0.0727	0.1316	0.0420	0.0616
BERT4Rec+FedCL	0.0281	0.0440	0.0189	0.0236	0.1115	0.1946	0.0642	0.0909

Table 3: Results of different methods on the Yelp and ML-1M datasets.

We use the Adam Kingma and Ba (2015) optimizer for model training. The learning rate is 1e-3, and the number of selected users per update is 16. The hidden dimension is 64. The user embedding norm clip threshold is 1. The privacy budget $\epsilon$ is 4. The number of clusters is 25. To simulate local negative items, we randomly sample 100 items on each client, and for each positive we randomly choose 10 local negatives to participate in contrastive learning. The difficulty $R$ % of negative samples is set to 25%. The number of semi-hard negatives is 20. These hyperparameters are tuned on the validation sets. Motivated by Krichene and Rendle (2020), we evaluate the recommendation performance on the whole item set rather than the sampled subsets used in Sun et al. (2019). We use the Hit Ratio (HR) and normalized Discounted Cumulative Gain (nDCG) metrics over the top 5 or 10 ranked items. We repeat each experiment 5 times and report the average scores. We will soon release our code for reproducibility.

4.2 Performance Evaluation

To verify the effectiveness and generality of FedCL, we compare the results of several widely used methods, including NCF He et al. (2017), GRU4Rec Hidasi et al. (2016) and BERT4Rec Sun et al. (2019), under four different settings as follows: (a) centralized learning, which uses the original data sampling and model learning methods based on centralized data; (b) federated learning (Fed), which directly applies the federated learning framework to the basic model (local negatives are used); (c) federated learning with batch-insensitive loss (BIL) Ning et al. (2021), which uses batch-insensitive loss to learn models on local positive items only; (d) federated contrastive learning (FedCL), our proposed method. The results on the four datasets are shown in Tables 2 and 3. We find that compared with centralized learning, the performance of models learned in the standard federated framework usually decreases. This is mainly due to the non-IID property of user data decentralized on different clients and the scarcity of informative negative samples. In addition, the BIL method outperforms the federated baseline. This is because BIL learns uniform item representations via a spreadout regularization instead of learning from the limited local negative samples. Moreover, our proposed FedCL method outperforms other methods (t-test results show the improvement is significant, $p<0.001$ ), and it can consistently boost the performance of different base models. This is because our approach can incorporate various kinds of negative samples into contrastive model training to exploit richer negative signals.

4.3 Influence of Negative Samples

Next, we study the impacts of different types of negative samples in our approach. We compare the model performance with one kind of negative samples removed, and the performance of replacing semi-hard negatives with globally hardest negatives Xiong et al. (2020). The results on the Yelp and ML-1M datasets are shown in Fig. 3. We find that the hard negatives obtained by the federated negative sampling method have a major contribution to model performance. However, the performance has a huge decline if they are replaced by globally hardest samples. This is mainly because globally hardest samples may contain many false negatives, which are harmful to model training. In addition, both in-batch and local negatives have contributions, but the performance gain brought by the local negatives is not so large. Thus, the performance of FedCL can still be promising if there is no local negative sample available on user devices.

4.4 Influence of Clustering

We further study the influence of clustering on the model performance. The results without clustering, with Kmeans clustering, or with Ward clustering are compared in Fig. 4. We observe that the performance is suboptimal when the user embeddings are directly used for hard negative sampling. This is intuitive because the perturbed user embeddings are quite noisy. In addition, both Kmeans and Ward clustering algorithms can improve the model performance. This may be because using the centroids of clusters to replace original user embeddings can mitigate the influence of noise. In addition, the Ward clustering method is better than Kmeans. This may be because the hierarchical clustering method is stronger in modeling the relations between similar user interests.

4.5 Influence of Cluster Number

We then analyze the impacts of the number of clusters on model performance, as shown in Fig. 5. The results reveal that when the number of clusters is too small, the performance is suboptimal. This may be because user interests cannot be effectively distinguished when there are too few clusters. However, the performance starts to decrease when the number of clusters is larger than 25. This may be because the influence of noise encoded in user embeddings becomes large. Thus, we set the number of clusters $C$ to 25.

4.6 Influence of Privacy Budget

Finally, we analyze the influence of the privacy budget of user embeddings for hard negative sampling on the model performance. We change the intensity of Laplace noise according to different privacy budgets, and the corresponding results are shown in Fig. 6. We find that the performance usually decreases when a smaller privacy budget $\epsilon$ is used. This is intuitive because the noise is stronger when $\epsilon$ is smaller, and the obtained negative samples are more inaccurate. In our approach, we choose $\epsilon=4$ that can yield a good performance and satisfactory user privacy protection ability.

4.7 Experiments in Appendix

In the appendix, we analyze the impact of the negative sample difficulty ( $R$ %) as well as the number of local and semi-hard negative samples. The results show that selecting negatives from the top 25% relevant candidates and using a medium number of negative samples is suitable.

5 Conclusion

In this paper, we propose a federated contrastive learning framework for privacy-preserving recommendation, which can effectively exploit various kinds of informative negative samples to empower federated recommendation model learning. In our approach, we propose a federated negative sampling method to choose semi-hard negatives from the candidate item pool in a privacy-preserving way based on the locally inferred and perturbed user embeddings. In addition, we explore using a negative mix-up method to jointly incorporate local, in-batch and semi-hard negative samples into a unified contrastive learning framework. Extensive experiments on four widely used benchmark recommendation datasets validate that our approach can effectively improve different recommendation methods in a privacy-preserving manner.

References

Ammad-Ud-Din et al. [2019] Muhammad Ammad-Ud-Din, Elena Ivannikova, Suleiman A Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, and Adrian Flanagan. Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv preprint arXiv:1901.09888, 2019.
Chai et al. [2020] Di Chai, Leye Wang, Kai Chen, and Qiang Yang. Secure federated matrix factorization. IEEE Intelligent Systems, 2020.
Ding et al. [2019] Jingtao Ding, Yuhan Quan, Xiangnan He, Yong Li, and Depeng Jin. Reinforced negative sampling for recommendation with exposure data. In IJCAI, pages 2230–2236, 2019.
Harper and Konstan [2015] F Maxwell Harper and Joseph A Konstan. The movielens datasets: History and context. TIIS, 5(4):1–19, 2015.
He et al. [2017] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. In WWW, pages 173–182, 2017.
Hidasi et al. [2016] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. In ICLR, 2016.
Kalantidis et al. [2020] Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, and Diane Larlus. Hard negative mixing for contrastive learning. arXiv preprint arXiv:2010.01028, 2020.
Kingma and Ba [2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
Krichene and Rendle [2020] Walid Krichene and Steffen Rendle. On sampled metrics for item recommendation. In KDD, pages 1748–1757, 2020.
Liang et al. [2021] Feng Liang, Weike Pan, and Zhong Ming. Fedrec++: Lossless federated recommendation with explicit feedback. In AAAI, pages 4224–4231, 2021.
Lin et al. [2020] Guanyu Lin, Feng Liang, Weike Pan, and Zhong Ming. Fedrec: Federated recommendation with explicit feedback. IEEE Intelligent Systems, 2020.
McAuley et al. [2015] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. Image-based recommendations on styles and substitutes. In SIGIR, pages 43–52, 2015.
McMahan et al. [2017] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In AISTATS, pages 1273–1282. PMLR, 2017.
Ning et al. [2021] Lin Ning, Karan Singhal, Ellie X Zhou, and Sushant Prakash. Learning federated representations and recommendations with limited negatives. arXiv preprint arXiv:2108.07931, 2021.
Qi et al. [2020] Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. Privacy-preserving news recommendation model learning. In EMNLP: Findings, pages 1423–1432, 2020.
Qi et al. [2021] Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. Uni-fedrec: A unified privacy-preserving news recommendation framework for model training and online serving. In EMNLP: Findings, pages 1438–1448, 2021.
Rendle et al. [2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback. In UAI, pages 452–461, 2009.
Robinson et al. [2020] Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, and Stefanie Jegelka. Contrastive learning with hard negative samples. arXiv preprint arXiv:2010.04592, 2020.
Shen et al. [2014] Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. Learning semantic representations using convolutional neural networks for web search. In WWW, pages 373–374, 2014.
Sun et al. [2019] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer. In CIKM, pages 1441–1450, 2019.
Wang et al. [2021] Jinpeng Wang, Jieming Zhu, and Xiuqiang He. Cross-batch negative sampling for training two-tower recommenders. In SIGIR, pages 1632–1636, 2021.
Wei et al. [2021] Yinwei Wei, Xiang Wang, Qi Li, Liqiang Nie, Yan Li, Xuanping Li, and Tat-Seng Chua. Contrastive learning for cold-start recommendation. In MM, pages 5382–5390, 2021.
Wu et al. [2019] Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. Npa: Neural news recommendation with personalized attention. In KDD, pages 2576–2584, 2019.
Xiong et al. [2020] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808, 2020.
Yi et al. [2019] Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Li Wei, and Ed Chi. Sampling-bias-corrected neural modeling for large corpus item recommendations. In Recsys, pages 269–277, 2019.
Zhou et al. [2020] Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. S3-rec: Self-supervised learning for sequential recommendation with mutual information maximization. In CIKM, pages 1893–1902, 2020.

Appendix

Implementation Details

Several implementation details are introduced as follows. We conducted experiments using a machine with Ubuntu 16.04 operating system and Python 3.6. The machine has a memory of 256GB and a Tesla V100 GPU with 32GB memory. We used Keras 2.2.4 and tensorflow 1.15 to implement deep learning models. The Faiss library is used to retrieve negative samples.

Influence of Negative Sample Difficulty

Fig. 7 shows the influence of using top $R$ % relevant candidate items for semi-hard negative sampling. The results show that the performance is suboptimal when the ratio $R$ is either too small or too large. This is because when $R$ is too large the sampled items are not so informative, while a small $R$ may lead to inaccurate negatives because they are too hard. Thus, we choose $R=25$ in our experiments.

Influence of Sample Number

We also study the influence of semi-hard and local negative samples on the model performance. We first set the number of local negative samples to 10 and change the number of semi-hard negative samples, as shown in Fig. 8. We find the performance is optimal when about 20 semi-hard negative samples are used. This is because the negative signals cannot be effectively exploited when semi-hard negatives are insufficient, while the negative sets are not uniform enough when there are too many hard negatives.

We further change the number of local negative samples and compare the results (Fig. 9). We find that the number of local negative samples used for model training also needs to be moderate. Thus, we choose to use 10 negative samples in model learning.