
Semi-Disentangled Representation Learning in Recommendation System

Weiguang Chen ([email protected], Hunan University), Wenjun Jiang ([email protected], Hunan University), Xueqi Li ([email protected], Hunan University), Kenli Li ([email protected], Hunan University), Albert Zomaya ([email protected], University of Sydney), and Guojun Wang ([email protected], Guangzhou University)
Abstract.

Disentangled representation has been widely explored in many fields due to its maximal compactness, interpretability and versatility. Recommendation systems also need disentanglement to make representations more explainable and general for downstream tasks. However, two challenges slow its broader application: the lack of fine-grained labels and the complexity of user-item interactions. To alleviate these problems, we propose a Semi-Disentangled Representation Learning method (SDRL) based on autoencoders. SDRL divides each user/item embedding into two parts, the explainable and the unexplainable, so as to achieve proper disentanglement while preserving complex information in the representation. The explainable part consists of an internal block for individual-based features and an external block for interaction-based features. The unexplainable part is composed of an other block that holds the remaining information. Experimental results on three real-world datasets demonstrate that the proposed SDRL not only effectively expresses user and item features but also improves explainability and generality compared with existing representation methods.

1. Introduction

Disentangled representation learning aims at separating an embedding into multiple independent parts, which makes it more explainable and general (bengio_representation_2013). The core idea of existing methods is to minimize the reconstruction error of the whole representation while maximizing the independence among its different parts (chen_infogan_2016; higgins_beta-vae_2017). It has been successfully applied to image representation, and researchers have verified its superiority on many downstream tasks, e.g., image generation (alharbi_disentangled_2020; chen_infogan_2016) and style transfer (kotovenko_content_2019). Disentangled representation is also required by recommendation systems to distinguish the various hidden intentions behind the same behavior (wang_disentangled_2020; ma_disentangled_2020). However, two obstacles slow its extensive application: the lack of enough fine-grained labels and the complexity of user-item interactions. Furthermore, Locatello et al. (locatello_challenging_2019) theoretically demonstrate the difficulty and even impossibility of unsupervised disentanglement and propose solutions using a few labels. This inspires us to put forward a Semi-Disentangled Representation Learning (SDRL) approach for recommendation systems based on limited labels.

Figure 1. An Illustration of User-Item Interactions. (Images come from https://www.imdb.com/.)

Specifically, we use the example in Fig. 1 to explain the requirements and challenges of disentangled representation for recommendation and to clarify our motivations. Users $u_1$ and $u_3$ have watched the same movie Marriage Story, but their motivations may differ: $u_1$ probably chooses it because he is a fan of the actress Scarlett Johansson, while $u_3$ may simply take an interest in romance movies. Another scene (users $u_1$ and $u_2$ watching the same movie Sen to Chihiro no kamikakushi) reflects a similar phenomenon: $u_2$ likes animation movies, while $u_1$ watches it perhaps only because $u_1$ and $u_2$ are friends. Disentangled representation could help distinguish these different intentions behind the same behavior, so as to improve representation accuracy and offer clues to explain why items are recommended.

However, complete and accurate disentanglement is hardly possible in recommendation systems, because fine-grained labels are scarce and user-item interactions are complicated. In the example above, there are usually not enough labels to build completely fine-grained aspects, e.g., that $u_1$ is a fan of Scarlett Johansson. On the other hand, some unknowable and random factors also affect users' decisions, e.g., $u_2$ invites his friend $u_1$ to watch Sen to Chihiro no kamikakushi together. Disentanglement based on incomplete or inaccurate factors may decrease the expressive ability of the representation.

To overcome these limitations, we propose a semi-disentangled representation method, SDRL, which separates the representation into three parts that respectively express internal, external and other remaining complex information. In particular, we present the internal block to denote features related to the individual itself, e.g., product category, movie style, or user age. The external block represents characteristics derived from user-item interactions, e.g., user ratings and implicit feedback. Besides, we introduce the other block to capture information that is not contained by the former two blocks, as well as random factors in real scenes. Moreover, in addition to reducing the overall reconstruction error, we utilize category information and user-item ratings as the supervision for the internal block and external block respectively. In this way, SDRL can not only capture complex interactive information but also express various features in different blocks. To sum up, the main contributions are as follows:

  • We identify the problem of semi-disentangled representation learning in recommendation systems, which preserves complex information while achieving proper disentanglement. As far as we know, we are the first to study this problem.

  • We propose a method, SDRL, to address the problem. It divides the representation into three blocks: the internal block, the external block and the other block. The former two express individual- and interaction-based features, while the other block contains the remaining information.

  • The experimental results demonstrate that the proposed SDRL improves the accuracy of two downstream tasks and enhances the explainability of representation.

2. Task Formulation

To improve both the disentanglement and the expressive ability of representation in recommendation systems, we separate embeddings into three blocks: the internal block, the external block and the other block. We formally define these concepts and the problem to be solved in this paper.

2.1. Key Concepts

Definition 1: internal block. It contains features about the individual itself, which are extracted from content information.

Definition 2: external block. It expresses features based on interactions, i.e., user-item ratings, implicit feedback, etc.

Definition 3: other block. It denotes characteristics excluding those contained by the former two blocks.

In this work, we utilize category information to supervise the internal block, and user-item ratings as the supervision of the external block. The other block captures features that belong to individuals and interactions but are beyond supervision (e.g., the color of a product or the director of a movie), as well as some random factors.

2.2. Problem Definition

We identify the problem of semi-disentangled representation in recommendation systems: preserving complicated information while strengthening the explainability and generality of the representation through proper disentanglement. It requires embedding all features into three blocks using limited labels as supervision. The problem is formally defined as follows.

Input: The normalized user-item ratings $R$ and item categories $C^I$ are taken as the input, in which $R_{i,j}=r$ denotes that the $i$-th user rates the $j$-th item with $r$ points and $C^I_{p,q}=1$ means that the $p$-th item belongs to the $q$-th category.

Output: Each user and item is represented as a $k$-dimensional vector consisting of three blocks: the internal block, the external block and the other block.

Objective: The goal is to (1) make the representation $Z$ preserve as much original information of users and items as possible and (2) encourage the internal block and external block to express more features related to categories and interactions respectively. The objective function is defined as follows.

(1) $\mathrm{Maximize}\ P(U,I,R|Z)+P(C|Z_{int})+P(R|Z_{ext}),$

where $U$ and $I$ represent the initial embeddings of users and items, $Z$ denotes their representation in the semi-disentangled space, $Z_{int}$ and $Z_{ext}$ are the corresponding internal block and external block, and $C=C^I\cup C^U$ contains the category labels of items and users. $P(A|B)$ denotes the probability of generating $A$ given $B$.

Figure 2. The Framework of SDRL.

3. SDRL: The Details

We propose a semi-disentangled representation learning method, SDRL, for recommendation systems. The framework is shown in Fig. 2. It consists of two major components. (1) Node representation (i.e., user representation and item representation) employs autoencoders to preserve the characteristics of users, items and their interactions. (2) Supervised semi-disentanglement utilizes category information and user-item ratings to encourage the internal block and external block to respectively express more individual and interactive features. We also point out some possible directions to extend SDRL.

3.1. Node Representation

We exploit autoencoders to transform representations from the initial space into the semi-disentangled space.

An autoencoder is an unsupervised deep neural network (hinton_autoencoders_1993; vincent_stacked_2010) that has been extensively applied to network representation learning (wu_collaborative_2016; tallapally_user_2018). It consists of two modules, an encoder and a decoder, defined as follows,

(2) $f(x)=\sigma(Wx+b),$
(3) $g(y)=\sigma(W'y+b').$

The encoder is the mapping $f$ that transforms the input vector $x\in R^d$ into the latent representation $y\in R^{d'}$, where $W\in R^{d'\times d}$ is a weight matrix and $b\in R^{d'}$ is an offset vector. The mapping $g$ is called the decoder; it reconstructs $y$ in the latent space into $x'$ in the initial space, where $W'\in R^{d\times d'}$ and $b'\in R^d$ denote the weight matrix and the offset vector respectively. $\sigma(\cdot)$ is an activation function.

The objective of an autoencoder is to minimize the reconstruction error. The stacked autoencoder (vincent_stacked_2010) is a widely used variant whose improvement of representation quality has been experimentally verified. Therefore, we employ it to generate representations of users and items.

Different from typical autoencoders, we use three encoders, $Encoder_{int}$, $Encoder_{ext}$ and $Encoder_{oth}$, to produce the corresponding internal block, external block and other block, respectively. The generation process for users is as follows,

(4) $Z^U_{int}=Encoder_{int}(U),$
(5) $Z^U_{ext}=Encoder_{ext}(U),$
(6) $Z^U_{oth}=Encoder_{oth}(U),$
(7) $Z^U=concatenate(Z^U_{int},Z^U_{ext},Z^U_{oth}),$
(8) $\hat{U}=Decoder(Z^U).$

$Z^U_{int}$, $Z^U_{ext}$ and $Z^U_{oth}$ represent the above three blocks respectively, $Z^U$ denotes the embeddings of users $U$ in the semi-disentangled space, and $\hat{U}$ is the generated representation of users in the initial space. The representation of items follows a similar process. The goal is to reconstruct the representations of users and items in the initial space, as well as the user-item ratings, with autoencoders. Considering that the number of unobserved interactions (i.e., those without ratings) far exceeds that of observed ones, we employ Binary Cross Entropy (BCE) as the basic metric. We define the reconstruction loss as follows,

(9) $loss_{recon}=BCE(\hat{U},U)+BCE(\hat{I},I)+BCE(\hat{R},R),$

where $\hat{U}$ and $\hat{I}$ denote the reconstructed users and items, and $\hat{R}$ represents the ratings predicted by matching $Z^U$ and $Z^I$. BCE is defined as follows, where $y$ denotes the labels, $\hat{y}$ the predicted values, and $N$ the number of labels,

(10) $BCE(\hat{y},y)=-\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\log(\hat{y_{i}})+(1-y_{i})\log(1-\hat{y_{i}})\right].$
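To make the node-representation component concrete, below is a minimal PyTorch sketch of the three-encoder design in Eqs. (4)-(8), under our own assumptions: single-layer encoders stand in for the stacked autoencoders actually used, and all module and dimension names (SDRLNode, int_dim, etc.) are illustrative rather than the authors' released code.

```python
import torch
import torch.nn as nn

class SDRLNode(nn.Module):
    """Sketch: three encoders emit the internal, external and other blocks;
    one decoder reconstructs the input from their concatenation."""

    def __init__(self, in_dim, int_dim=64, ext_dim=32, oth_dim=32):
        super().__init__()
        # One encoder per block (Eqs. 4-6); stacking omitted for brevity.
        self.enc_int = nn.Sequential(nn.Linear(in_dim, int_dim), nn.Sigmoid())
        self.enc_ext = nn.Sequential(nn.Linear(in_dim, ext_dim), nn.Sigmoid())
        self.enc_oth = nn.Sequential(nn.Linear(in_dim, oth_dim), nn.Sigmoid())
        # Decoder maps the concatenated blocks back to the initial space (Eq. 8).
        self.dec = nn.Sequential(
            nn.Linear(int_dim + ext_dim + oth_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        z_int, z_ext, z_oth = self.enc_int(x), self.enc_ext(x), self.enc_oth(x)
        z = torch.cat([z_int, z_ext, z_oth], dim=-1)  # Eq. (7)
        return z, z_int, z_ext, z_oth, self.dec(z)    # Eq. (8): reconstruction
```

The sigmoid outputs keep reconstructions in $[0,1]$, matching the BCE-based losses below.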

3.2. Supervised Semi-Disentanglement

Disentangled representation with weak supervision has demonstrated its effectiveness in computer vision (locatello_disentangling_2020; chen_weakly_2020). This success, together with the complexity of interactions, inspires us to improve representation disentanglement using limited labels in recommendation systems.

3.2.1. Internal Block Supervision

We employ the category information $C$ as the supervision for the internal block. $C^I$ represents the item-category correspondence based on side information. $C^U$ denotes the user preference on categories, extracted from the ratings and $C^I$, which is calculated as follows,

(11) $C^{U_m}=Normalize\left(\sum_{I_n\in I,\,R_{m,n}>0}R_{m,n}C^{I_n}\right).$

$C^{U_m}$ and $C^{I_n}$ respectively denote the category vectors of $U_m$ and $I_n$. We sum the products of the rating $R_{m,n}$ and the item category vector $C^{I_n}$ over all rated items, and normalize the result as $U_m$'s category preference $C^{U_m}$. The loss function is as follows,

(12) $loss_{int}=BCE(\hat{C_{int}},C),$

where $\hat{C_{int}}$ denotes the category features of users and items predicted from the corresponding internal block.
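As a concrete reading of Eq. (11), the following NumPy sketch derives user category preferences from ratings and item categories. The row-sum normalization is our assumption; the paper only states "Normalize".

```python
import numpy as np

def user_category_preference(R, C_item):
    """Eq. (11) sketch: weight each rated item's category vector by its
    rating, sum over items, and normalize per user.
    R: (n_users, n_items) rating matrix; C_item: (n_items, n_cats) 0/1 matrix."""
    C_user = (R @ C_item).astype(float)  # sum of R[m, n] * C_item[n] over rated items
    row_sums = C_user.sum(axis=1, keepdims=True)
    # Row-sum normalization (our assumption); users with no ratings stay zero.
    return np.divide(C_user, row_sums, out=np.zeros_like(C_user),
                     where=row_sums > 0)
```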

3.2.2. External Block Supervision

We utilize the ratings $R$ to supervise the external block so that it contains more interactive information. The loss function on the external block is as follows,

(13) $loss_{ext}=BCE(\hat{R_{ext}},R),$

where $\hat{R_{ext}}$ denotes the ratings predicted based on the external block.

3.2.3. Semi-Disentanglement Analysis

Most existing methods explicitly improve block independence with mutual information or distance correlation (cheng_improving_2020), which may not be well suited to recommendation. The major reason is that we need to improve representation disentanglement while preserving interrelated characteristics. Hence, we propose the semi-disentangled method SDRL. It does not separate the whole embedding into explainable blocks as in disentangled representation learning; instead, the other block preserves no explicit meanings or factors. Furthermore, it does not force independence among different blocks. It merely pushes the explainable blocks to express more of the corresponding characteristics, i.e., encouraging the internal block to express more category-based information and the external block to contain more interactive features.

3.3. Model Optimization and Extension

We aim to improve the expressive ability and proper disentanglement of the representation at the same time, so we combine the loss functions of node representation and semi-disentanglement as follows,

(14) $loss=loss_{recon}+loss_{int}+loss_{ext}.$
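A minimal sketch of this combined objective follows, assuming the predictions for each term have already been computed (e.g., $\hat{C}_{int}$ from the internal block and $\hat{R}_{ext}$ from the external block via hypothetical prediction heads); the rating-reconstruction term of Eq. (9) would be added analogously.

```python
import torch.nn.functional as F

def sdrl_loss(x_hat, x, c_hat, c, r_hat, r):
    """Eq. (14) sketch: sum the reconstruction loss (Eq. 9, node part) and
    the internal/external supervision losses (Eqs. 12 and 13)."""
    loss_recon = F.binary_cross_entropy(x_hat, x)  # reconstruct node features
    loss_int = F.binary_cross_entropy(c_hat, c)    # category supervision
    loss_ext = F.binary_cross_entropy(r_hat, r)    # rating supervision
    return loss_recon + loss_int + loss_ext
```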

3.3.1. Setting-oriented Extension

We separate the explainable part of the representation into two blocks using autoencoders. As data characteristics and label information change, it is flexible to adjust the number and type of blocks as well as to switch the basic representation method. Some possible extensions follow.

When more fine-grained labels are available, even if not for every sample, it is reasonable and convenient to add more blocks and optimize them with a small number of labels as in (locatello_disentangling_2020). In another setting, where the extracted features are independent from each other, variational autoencoders (VAE) (kingma_introduction_2019) may be a good choice to replace autoencoders.

3.3.2. Task-oriented Extension

The objective of this work is to produce general representations, so only a simple match of embeddings is employed for the various downstream tasks. In fact, different tasks may emphasize different factors. For a specific task, some follow-up modules may be required, and it is easy to extend SDRL with them.

For instance, for tasks with supervision (e.g., rating prediction, node classification), the attention mechanism (vaswani_attention_2017) is an intuitively strong option as a follow-up module to allocate different weights to different factors, as sketched below. For tasks without labels (e.g., serendipity recommendation (li_directional_2020)), pre-assigned weights could probably improve performance. In brief, the proposed semi-disentangled representation provides the opportunity to adaptively or manually pay different attention to known factors across various tasks.
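As one possible reading of this task-oriented extension, the sketch below attaches a small module that learns one weight per block for a supervised downstream task; the design (scalar softmax weights over the three blocks) is our assumption, not a component of SDRL itself.

```python
import torch
import torch.nn as nn

class BlockAttention(nn.Module):
    """Hypothetical follow-up module: learn a softmax weight per block and
    rescale each block before a downstream predictor consumes them."""

    def __init__(self, n_blocks=3):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_blocks))

    def forward(self, blocks):
        # blocks: list of (batch, dim_i) tensors, e.g., [z_int, z_ext, z_oth].
        w = torch.softmax(self.logits, dim=0)
        return torch.cat([w[i] * b for i, b in enumerate(blocks)], dim=-1)
```

For label-free tasks, the same interface could instead take pre-assigned constant weights in place of the learned logits.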

4. Experiments

To validate the effectiveness and explainability of SDRL, as well as the roles of the various semi-disentangled blocks, we conduct intensive experiments on three real-world datasets. We briefly introduce the experimental settings in Section 4.1. Section 4.2 compares our method with baselines on Top-K recommendation and item classification. In Sections 4.3 and 4.4, we perform ablation experiments and hyper-parameter analysis to study the impacts of different blocks and parameters in SDRL. Finally, we demonstrate the semi-disentanglement and explainability of the representation with visualization and a case study in Section 4.5.

4.1. Experimental Settings

We perform experiments on three real-world datasets: MovieLens-latest-small (ml-ls), MovieLens-10m (ml-10m) (harper_movielens_2016) and Amazon-Book (mcauley_image-based_2015; he_ups_2016), whose statistics are shown in Table 1. We filter out users and items with fewer than 20 ratings and employ 80% of the ratings as training data and the rest as test data. We compare our method with four baselines, NetMF (qiu_network_2018; rozemberczki_karate_2020), ProNE (zhang_prone_2019), VAECF (liang_variational_2018) and MacridVAE (ma_learning_2019), based on four metrics: recall, precision, F1 and ndcg (shani_evaluating_2011).

Table 1. Statistics of Datasets.
dataset #Users #Items #Ratings #Categories Density
  ml-ls 610 9742 100836 18 1.7%
ml-10m 71567 10681 10000054 18 1.3%
Amazon-Book 52643 91599 2984108 22 0.062%
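For reproducibility, here is a minimal pandas sketch of the preprocessing described above, assuming MovieLens-style column names, a single filtering pass, and a random 80/20 split (the paper does not specify the split procedure):

```python
import pandas as pd

def preprocess(ratings: pd.DataFrame, min_count: int = 20, train_frac: float = 0.8):
    """Drop users/items with fewer than `min_count` ratings, then split the
    remaining ratings into train/test sets."""
    user_counts = ratings["userId"].value_counts()
    item_counts = ratings["movieId"].value_counts()
    kept = ratings[ratings["userId"].isin(user_counts[user_counts >= min_count].index)
                   & ratings["movieId"].isin(item_counts[item_counts >= min_count].index)]
    train = kept.sample(frac=train_frac, random_state=0)
    test = kept.drop(train.index)
    return train, test
```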

4.1.1. Baselines

We briefly introduce the four baselines and four variants of our proposed method SDRL.

NetMF: NetMF (qiu_network_2018) formulates network embedding as matrix factorization, unifying four network embedding methods: DeepWalk (perozzi_deepwalk_2014), LINE (tang_line_2015), PTE (tang_pte_2015), and node2vec (grover_node2vec_2016).

ProNE: ProNE (zhang_prone_2019, ) is a fast and scalable network representation approach consisting of two modules, sparse matrix factorization and propagation in the spectral space.

VAECF: Liang et al. develop a variant of Variational AutoEncoders for Collaborative Filtering (liang_variational_2018, ) on implicit feedback data, which is a non-linear probabilistic model.

MacridVAE: MacridVAE (ma_learning_2019, ) is one of the state-of-the-art methods that learn disentangled representation for recommendation, achieving macro and micro disentanglement.

To study the impacts of the three blocks, the internal block (int), external block (ext) and other block (oth), we develop several variants through different combinations. For variants with two blocks, we set their proportion to 1:1.

SDRL’(int+ext) generates node embeddings consisting of two blocks, the internal block and the external block.

SDRL’(int+oth) keeps the internal block and the other block.

SDRL’(ext+oth) consists of the external block and the other block.

SDRL’(whole) represents a node as a single whole embedding, similar to common representation methods, but its optimization is based on autoencoders as in SDRL.

4.2. Performance Comparison

We verify the effectiveness of SDRL on two common downstream tasks in recommendation systems: Top-K recommendation and item classification. We run each experiment 20 times and report the average values and standard deviations. We highlight the best baseline values in bold and calculate the corresponding improvements of SDRL over them.

Table 2. Comparison on Top-K recommendation(%). (ml-ls)
method F1@5 F1@10 F1@15 ndcg@5 ndcg@10 ndcg@15
NetMF 2.9573(±0.0) 4.7776(±0.0) 5.1616(±0.0) 3.793(±0.0) 5.2728(±0.0) 6.9132(±0.0)
ProNE 3.0491(±0.1) 4.5284(±0.14) 5.3116(±0.17) 3.8125(±0.18) 5.243(±0.13) 7.0057(±0.11)
VAECF 5.5595(±0.29) 7.7409(±0.33) 8.9691(±0.43) 8.7243(±0.55) 9.8197(±0.36) 11.3705(±0.35)
MacridVAE 5.1578(±0.15) 7.3867(±0.26) 8.5167(±0.32) 8.0452(±0.33) 9.0503(±0.32) 10.6263(±0.23)
improvement 30.0171 27.9192 25.4228 20.4062 25.8338 28.8774
SDRL 7.2283(±0.27) 9.9021(±0.27) 11.2493(±0.26) 10.5046(±0.57) 12.3565(±0.3) 14.654(±0.26)
Table 3. Comparison on Top-K recommendation(%). (ml-10m)
method F1@5 F1@10 F1@15 ndcg@5 ndcg@10 ndcg@15
NetMF 3.9352(±0.0) 6.2(±0.0) 7.4752(±0.0) 4.831(±0.0) 6.8198(±0.0) 9.2123(±0.0)
ProNE 2.1065(±0.04) 3.1941(±0.06) 3.8371(±0.05) 2.3357(±0.05) 3.5148(±0.04) 4.9629(±0.03)
VAECF 5.1655(±0.07) 7.5696(±0.1) 8.881(±0.1) 8.033(±0.18) 9.4863(±0.13) 11.4674(±0.11)
MacridVAE 5.1832(±0.07) 7.5236(±0.11) 8.8513(±0.14) 7.0934(±0.18) 9.0107(±0.16) 11.4481(±0.15)
improvement 44.6963 38.8184 35.2393 42.2793 38.2288 37.0189
SDRL 7.4999(±0.16) 10.508(±0.23) 12.0106(±0.25) 11.4293(±0.46) 13.1128(±0.35) 15.7125(±0.27)
Table 4. Comparison on Top-K recommendation(%). (Amazon-Book)
method F1@5 F1@10 F1@15 ndcg@5 ndcg@10 ndcg@15
NetMF 9.0322(±0.0) 10.9837(±0.0) 11.5754(±0.0) 13.4461(±0.0) 12.623(±0.0) 13.8484(±0.0)
ProNE 6.818(±0.07) 8.8102(±0.05) 9.5218(±0.07) 9.9492(±0.14) 9.9032(±0.06) 11.1619(±0.04)
VAECF 6.1366(±0.09) 7.9531(±0.14) 8.6195(±0.14) 9.6312(±0.19) 9.3344(±0.14) 10.3365(±0.1)
MacridVAE 8.0043(±0.12) 9.9379(±0.14) 10.5185(±0.17) 12.1028(±0.22) 11.5303(±0.13) 12.6398(±0.11)
improvement 0.2879 4.3592 5.9549 2.0549 5.4543 7.1127
SDRL 9.0582(±0.07) 11.4625(±0.12) 12.2647(±0.14) 13.7224(±0.18) 13.3115(±0.14) 14.8334(±0.1)
(a) ml-ls  (b) ml-10m  (c) Amazon-Book
Figure 3. Statistics of Categories Grouped by Users.
Table 5. Comparison on item classification(%). (ml-ls)
method recall precision micro_F1
  NetMF 50.3539 59.8585 54.6957
ProNE 53.1639 62.73 57.5517
improvement 12.4686 13.6479 13.0066
SDRL 59.7927 71.2913 65.0372
Table 6. Comparison on item classification(%). (ml-10m)
method recall precision micro_F1
  NetMF 66.4181 65.6991 66.0564
ProNE 67.7764 66.7775 67.2731
improvement 18.0496 19.9254 18.9872
SDRL 80.0098 80.0832 80.0464
Table 7. Comparison on item classification(%). (Amazon-Book)
method recall precision micro_F1
  NetMF 51.4335 47.3786 49.3226
ProNE 51.2197 47.3555 49.2115
improvement 83.7268 76.2521 79.759
SDRL 94.4971 83.5058 88.6618
Table 8. Comparison on Top-K recommendation with different combinations of blocks (%). (ml-ls)
method F1@5 F1@10 F1@15 ndcg@5 ndcg@10 ndcg@15
SDRL 7.2283(±0.27) 9.9021(±0.27) 11.2493(±0.26) 10.5046(±0.57) 12.3565(±0.3) 14.654(±0.26)
SDRL’(int+ext) 6.9607(±0.22) 9.7205(±0.24) 11.1285(±0.27) 9.8062(±0.48) 12.0343(±0.31) 14.5021(±0.25)
SDRL’(int+oth) 7.1316(±0.21) 9.9035(±0.26) 11.3039(±0.24) 10.247(±0.51) 12.2864(±0.29) 14.8077(±0.19)
SDRL’(ext+oth) 6.7585(±0.28) 9.2792(±0.19) 10.5929(±0.2) 9.5182(±0.55) 11.5124(±0.26) 13.985(±0.18)
SDRL’(whole) 6.2179(±0.75) 9.146(±0.59) 10.4072(±0.51) 12.796(±1.96) 13.1785(±1.05) 13.864(±0.6)
Table 9. Comparison on Top-K recommendation with different combinations of blocks (%). (ml-10m)
method F1@5 F1@10 F1@15 ndcg@5 ndcg@10 ndcg@15
SDRL 7.4999(±0.16) 10.508(±0.23) 12.0106(±0.25) 11.4293(±0.46) 13.1128(±0.35) 15.7125(±0.27)
SDRL’(int+ext) 6.8778(±0.23) 9.5076(±0.34) 10.8903(±0.44) 10.5319(±1.02) 11.8883(±0.81) 14.2037(±0.75)
SDRL’(int+oth) 7.432(±0.29) 10.4417(±0.38) 11.9863(±0.44) 11.936(±1.22) 13.2881(±0.88) 15.5704(±0.74)
SDRL’(ext+oth) 5.7576(±0.22) 8.4256(±0.28) 9.886(±0.46) 8.5747(±0.77) 10.3502(±0.62) 12.8529(±0.49)
SDRL’(whole) 5.1166(±0.39) 7.5039(±0.37) 8.7737(±0.53) 10.1397(±2.41) 10.757(±2.44) 11.8431(±2.08)
Table 10. Comparison on Top-K recommendation with different combinations of blocks (%). (Amazon-Book)
method F1@5 F1@10 F1@15 ndcg@5 ndcg@10 ndcg@15
SDRL 9.0582(±0.07) 11.4625(±0.12) 12.2647(±0.14) 13.7224(±0.18) 13.3115(±0.14) 14.8334(±0.1)
SDRL’(int+ext) 8.8361(±0.07) 11.1924(±0.1) 12.0149(±0.11) 13.3619(±0.17) 12.9737(±0.12) 14.5178(±0.08)
SDRL’(int+oth) 8.1967(±0.1) 10.4707(±0.12) 11.3059(±0.14) 12.3478(±0.18) 12.1008(±0.1) 13.5747(±0.08)
SDRL’(ext+oth) 9.0518(±0.08) 11.4633(±0.09) 12.3021(±0.1) 13.7024(±0.12) 13.3441(±0.09) 14.9128(±0.06)
SDRL’(whole) 1.908(±0.05) 2.4953(±0.05) 2.8446(±0.03) 3.0402(±0.11) 2.9335(±0.05) 3.3497(±0.02)
Table 11. Comparison on item classification with different combinations of blocks (%). (ml-ls)
method recall precision micro_F1
  SDRL 59.7927 71.2913 65.0372
SDRL’(int+ext) 62.1688 74.263 67.6797
SDRL’(int+oth) 61.9033 73.8443 67.3484
SDRL’(ext+oth) 52.5572 62.046 56.908
SDRL’(whole) 39.9923 42.0991 41.0177
Table 12. Comparison on item classification with different combinations of blocks (%). (ml-10m)
method recall precision micro_F1
  SDRL 80.0098 80.0832 80.0464
SDRL’(int+ext) 77.1831 77.2037 77.1933
SDRL’(int+oth) 76.558 76.4345 76.4961
SDRL’(ext+oth) 51.3015 49.2074 50.2283
SDRL’(whole) 47.1581 43.9449 45.492
Table 13. Comparison on item classification with different combinations of blocks (%). (Amazon-Book)
method recall precision micro_F1
  SDRL 94.4971 83.5058 88.6618
SDRL’(int+ext) 98.2341 86.0925 91.7634
SDRL’(int+oth) 97.9884 85.9162 91.556
SDRL’(ext+oth) 53.6879 49.6705 51.6008
SDRL’(whole) 48.922 45.5 47.1487
(a) SDRL, ml-ls  (b) SDRL, ml-10m  (c) SDRL, Amazon-Book
(d) SDRL’(int+oth), ml-ls  (e) SDRL’(int+oth), ml-10m  (f) SDRL’(int+oth), Amazon-Book
(g) SDRL’(ext+oth), ml-ls  (h) SDRL’(ext+oth), ml-10m  (i) SDRL’(ext+oth), Amazon-Book
Figure 4. Illustration of Losses in the Model Training.

Top-K Recommendation. We generate the Top-5, 10 and 15 items according to predicted ratings as recommendations. The comparison results on the three datasets are shown in Tables 2, 3 and 4, respectively. We find that our method SDRL achieves steady improvements over the baselines: it improves F1 (ndcg) by at least 20.41% on MovieLens-latest-small, 35.24% on MovieLens-10m and 0.29% on Amazon-Book. This demonstrates that the proposed representation method SDRL significantly improves Top-K recommendation performance, especially on the MovieLens datasets ml-ls and ml-10m. A possible reason is that denser interactions (as shown in Table 1) reflect richer features, which makes the (semi-)disentangled representation more accurate overall.
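For reference, here is a sketch of how such a Top-K list can be produced from the learned embeddings; the inner-product match and the masking of training interactions are our assumptions about the "simple match of embeddings" mentioned in Section 3.3.2.

```python
import numpy as np

def top_k_items(Z_user, Z_item, R_train, k=15):
    """Score every user-item pair by embedding match, exclude items already
    rated in training, and return the k highest-scored item indices per user."""
    scores = Z_user @ Z_item.T     # (n_users, n_items) predicted affinity
    scores[R_train > 0] = -np.inf  # mask training interactions
    return np.argsort(-scores, axis=1)[:, :k]
```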

Item Classification. Item classification also plays an important role in recommendation systems, e.g., for cold-start tasks (zhao_categorical-attributes-based_2018). Since VAECF and MacridVAE do not generate item representations, we only use NetMF and ProNE as baselines, with item categories as labels. We employ the item embeddings as input and an MLPClassifier (from scikit-learn, https://scikit-learn.org/) as the classification algorithm for all methods. The comparison results are shown in Tables 5, 6 and 7, respectively. We observe that our method SDRL outperforms the baselines on all three datasets with the help of category supervision.
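The classification protocol itself is straightforward; below is a scikit-learn sketch in which random arrays stand in for the learned item embeddings $Z^I$ and for single-label categories, and the train/test split is our assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
Z_item = rng.random((1000, 128))         # placeholder for learned item embeddings
labels = rng.integers(0, 18, size=1000)  # placeholder category labels

X_tr, X_te, y_tr, y_te = train_test_split(Z_item, labels, test_size=0.2, random_state=0)
clf = MLPClassifier(max_iter=300, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```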

4.3. Ablation Analysis

To study the impacts of the various blocks in SDRL, we conduct an ablation study under the same settings as the comparison experiments. Besides highlighting the best values in bold, we also underline the second and third best values in the results.

Top-K Recommendation. From the comparison results in Tables 8, 9 and 10, we have three major findings. (1) Overall, SDRL, consisting of three blocks, and SDRL’(int+ext) achieve relatively better performance. We infer that both category- and rating-based features play an important role in Top-K recommendation. (2) Another finding is that variants with separated blocks usually outperform the variant with only one block, i.e., SDRL’(whole). This demonstrates that semi-disentanglement significantly improves Top-K recommendation performance.

(3) We also find a difference between the results on the MovieLens datasets (i.e., ml-ls and ml-10m) and Amazon-Book. On the MovieLens datasets, SDRL’(int+oth) outperforms SDRL’(ext+oth), while on Amazon-Book SDRL’(ext+oth) shows better performance. That is to say, the internal block has a bigger impact on the MovieLens datasets while the external block matters more on Amazon-Book. Further analysis suggests that this difference is closely related to the statistical characteristics of the datasets. As shown in Fig. 3, on the MovieLens datasets, over 80% of users relate to at least 14 categories (77.78%), while on Amazon-Book, over 80% of users relate to only at least 12 categories (54.55%). The statistical characteristics of datasets can affect the role of different features in downstream tasks, and (semi-)disentanglement increases the flexibility to emphasize certain features.

We also conduct a deeper analysis in terms of model optimization to explore what causes this difference. As introduced in Section 3, we employ $loss_{int}$ (in Equation 12) to encourage the internal block to express relatively more category-based features. Meanwhile, the external block is expected to contain more interactive features through the optimization of $loss_{ext}$ (in Equation 13). In addition, $loss_{recon}$ (in Equation 9) optimizes the overall reconstruction of users, items and ratings, which also makes the internal block contain some interactive features.

We plot the trends of each part of the loss function for three methods (i.e., SDRL, SDRL’(int+oth) and SDRL’(ext+oth)) in Fig. 4. We observe a similar trend among $loss_{recon}$, $loss_{int}$ and $loss_{ext}$ on the MovieLens datasets in Figs. 4(a), 4(d) and 4(g) (likewise in Figs. 4(b), 4(e) and 4(h)). That is, the training process fairly optimizes the internal block, the external block and the whole reconstruction. However, on Amazon-Book in Figs. 4(c), 4(f) and 4(i), $loss_{int}$ decreases noticeably faster than the other two losses.

This indicates that in the model training on Amazon-Book, the optimization of the internal block plays a more important role. This bias pushes the embeddings to contain more category-based features and relatively fewer rating-based features. The effect appears especially significant in SDRL’(int+oth), in which $loss_{ext}$ is removed from the loss function. However, in Top-K recommendation, interactive features based on ratings may play the major part, which probably explains why SDRL’(int+oth) performs worse than SDRL’(ext+oth) on Amazon-Book. Similarly, on the MovieLens datasets, SDRL’(ext+oth) generates representations without the supervision of category-based information, while for SDRL’(int+oth) the fair optimization encourages embeddings to contain both category information and interactive features. Therefore, SDRL’(int+oth) outperforms SDRL’(ext+oth) on the MovieLens datasets.

Last but not least, this difference also validates the importance of employing multiple features as supervision for different blocks in representation learning, i.e., supervised (semi-)disentangled representation.

4.4. Hyper-parameter Analysis

It is flexible to adjust the proportions of the internal block, external block and other block in the embeddings. We set the proportion to 2:1:1, 1:2:1 and 1:1:2 to test performance on Top-15 recommendation. The results in Fig. 5 show that the variant with proportion 2:1:1 (i.e., a bigger proportion for the internal block) outperforms the others on the MovieLens datasets. On Amazon-Book, the variant with a bigger proportion for the external block performs better. Therefore, we set the proportion in SDRL to 2:1:1 on the MovieLens datasets and 1:2:1 on Amazon-Book.

Figure 5. Impact of Block Proportion.

4.5. Representation Visualization

In SDRL, the internal block and external block are expected to respectively express features from the individual itself and from user-item interactions. We visualize the node representations to qualitatively analyze the semi-disentanglement of the two parts.

4.5.1. Visualization based on Item Representation

For clarity, we choose the 7 most common categories as labels and visualize the internal block and external block of items using t-SNE (rauber_visualizing_2016). The results are shown in Fig. 6. In the left figures, nodes of the same color (i.e., items belonging to the same category) are more clustered, which indicates that the internal block contains more category-related information than the external block, consistent with our expectation.
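A sketch of this block-wise visualization follows; the 64/32/32 block layout matches the split reported in Section 4.5.2 for ml-ls, and random arrays stand in for the learned embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
Z_item = rng.random((500, 128))          # placeholder for learned item embeddings
category = rng.integers(0, 7, size=500)  # 7 most common categories as labels

# Project only the internal-block columns; repeat with [:, 64:96] for ext.
z2d = TSNE(n_components=2, random_state=0).fit_transform(Z_item[:, :64])
plt.scatter(z2d[:, 0], z2d[:, 1], c=category, s=5, cmap="tab10")
plt.title("internal block (t-SNE)")
plt.show()
```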

(a) ml-ls (int)  (b) ml-ls (ext)
(c) ml-10m (int)  (d) ml-10m (ext)
(e) Amazon-Book (int)  (f) Amazon-Book (ext)
Figure 6. Visualization of Item Representation.

4.5.2. Visualization based on User Representation

We also select four users (with user IDs 551, 267, 313 and 216) from MovieLens-latest-small and visualize their representations in Fig. 7. For the 128-dimensional embeddings on MovieLens-latest-small, the first 64 bits represent the internal block, the next 32 bits the external block and the last 32 bits the other block. Users No.551, No.267 and No.313 share a similar internal block representation, while the embedding of user No.216 differs. From the source data displayed in Fig. 7(b), we find that the former three users take an interest in action, adventure and thriller movies, while the latter likes comedy movies the most. Meanwhile, there is similar information in the external block among users No.216, No.267 and No.313 (especially users No.216 and No.313), while user No.551 is somewhat different. From the statistics of common items in Fig. 7(c), we find relatively more common items among users No.216, No.313 and No.267 than between user No.551 and the others.

(a) Visualization of User Representation.
(b) Visualization of User Preference on Category.
(c) Statistics of Common Items.
Figure 7. Relation among User Representation, Preference on Category and Common Items.

In short, in line with expectations, the internal block and external block express more features of categories and user-item interactions, respectively. This improves the interpretability of the node representations and offers clues for generating explanations. For instance, when a recommended item better matches the target user on the internal block, an explanation based on category may be more reasonable, while a better match on the external block indicates that an interaction-based explanation is more suitable.

4.6. Summary of Experiments

In summary, we have the following findings from the experiments. (1) Overall, SDRL gains stable and general improvements, demonstrating its effectiveness in representation learning. (2) Consistent with expectations, the internal block and external block respectively express more category-based and interactive features. (3) Semi-disentanglement enhances the explainability and generality of representation in recommendation systems.

5. Related Work

We briefly review related work on representation in recommendation systems and disentangled representation.

5.1. Representation in Recommendation System

Representation learning transforms the sparse data in recommendation systems into structured embeddings, providing great convenience for processing complex networks (cui_survey_2019). Based on the type of source data, existing representation methods can be divided into three categories: structure-based (perozzi_deepwalk_2014; grover_node2vec_2016), content-based (zhang_prone_2019; zhao_deep_2020) and both-based (wang_reinforced_2020; he_lightgcn_2020; fu_magnn_2020). Among them, both-based approaches, e.g., graph neural networks (GNNs), have received much attention in recent years. A GNN initializes representations with content features and then updates them with structure information.

Our method SDRL also employs content and structure features as input. The major difference between SDRL and GNN-based approaches is that SDRL embeds content- and structure-based information into different blocks (the internal block and external block respectively), whereas GNNs represent it as a whole.

5.2. Disentangled Representation

Disentangled representation learning aims to separate an embedding into distinct and explainable factors in an unsupervised way (bengio_representation_2013; do_theory_2020). It has been successfully applied in computer vision (nemeth_adversarial_2020). Recently, Locatello et al. (locatello_challenging_2019; locatello_commentary_2020) demonstrated that unsupervised disentangled representation learning without inductive biases is theoretically impossible. To deal with this problem, some works with (semi-)supervision have been proposed (locatello_disentangling_2020; chen_weakly_2020). However, the assumption of factor independence makes typical disentangled representation not applicable to recommendation.

Our method differs from existing disentangled representation works in that we do not separate the whole embedding (i.e., we preserve the other block without specific meanings) and we do not explicitly force the independence of different factors. In this way, we preserve more of the complicated relations in recommendation systems while achieving proper disentanglement.

6. Conclusions

To improve both the disentanglement and the accuracy of representation, we propose a semi-disentangled representation method for recommendation, SDRL. We take advantage of category features and user-item ratings as the supervision for the proposed internal block and external block. To the best of our knowledge, this is the first attempt to develop semi-disentangled representation and to improve disentanglement with supervision in recommendation systems. We conduct intensive experiments on three real-world datasets, and the results validate the effectiveness and explainability of SDRL. In future work, we will try to extract more labels and utilize them to separate the explainable part into fine-grained features for broader applications. We are also interested in studying the unexplainable part more deeply to manage its uncertainty.

References

  • (1) Yazeed Alharbi and Peter Wonka. Disentangled Image Generation Through Structured Noise Injection. In CVPR 2020, pages 5133–5141, 2020.
  • (2) Y. Bengio, A. Courville, and P. Vincent. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):1798–1828, August 2013.
  • (3) Junxiang Chen and Kayhan Batmanghelich. Weakly Supervised Disentanglement by Pairwise Similarities. In AAAI 2020, pages 3495–3502, 2020.
  • (4) Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In NIPS 2016, pages 2172–2180, 2016.
  • (5) Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, and Lawrence Carin. Improving Disentangled Text Representation Learning with Information-Theoretic Guidance. In ACL 2020, pages 7530–7541, 2020.
  • (6) Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A Survey on Network Embedding. IEEE Trans. Knowl. Data Eng., 31(5):833–852, 2019.
  • (7) Kien Do and Truyen Tran. Theory and Evaluation Metrics for Learning Disentangled Representations. In ICLR 2020, 2020.
  • (8) Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In WWW 2020, pages 2331–2341, 2020.
  • (9) Aditya Grover and Jure Leskovec. node2vec: Scalable Feature Learning for Networks. In KDD 2016, pages 855–864, 2016.
  • (10) F. Maxwell Harper and Joseph A. Konstan. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, 2016.
  • (11) Ruining He and Julian J. McAuley. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In WWW 2016, pages 507–517, 2016.
  • (12) Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR 2020, pages 639–648, 2020.
  • (13) Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In ICLR 2017, 2017.
  • (14) Geoffrey E. Hinton and Richard S. Zemel. Autoencoders, Minimum Description Length and Helmholtz Free Energy. In NIPS 1993, pages 3–10, 1993.
  • (15) Diederik P. Kingma and Max Welling. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn., 12(4):307–392, 2019.
  • (16) Dmytro Kotovenko, Artsiom Sanakoyeu, Sabine Lang, and Bjorn Ommer. Content and Style Disentanglement for Artistic Style Transfer. In ICCV 2019, pages 4421–4430, 2019.
  • (17) Xueqi Li, Wenjun Jiang, Weiguang Chen, Jie Wu, Guojun Wang, and Kenli Li. Directional and Explainable Serendipity Recommendation. In WWW 2020, pages 122–132, 2020.
  • (18) Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. Variational Autoencoders for Collaborative Filtering. In WWW 2018, pages 689–698, 2018.
  • (19) Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations. In ICML 2019, volume 97, pages 4114–4124, 2019.
  • (20) Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. A Commentary on the Unsupervised Learning of Disentangled Representations. In AAAI 2020, pages 13681–13684, 2020.
  • (21) Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, and Olivier Bachem. Disentangling Factors of Variations Using Few Labels. In ICLR 2020, 2020.
  • (22) Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, and Wenwu Zhu. Learning Disentangled Representations for Recommendation. In NeurIPS 2019, pages 5712–5723, 2019.
  • (23) Jianxin Ma, Chang Zhou, Hongxia Yang, Peng Cui, Xin Wang, and Wenwu Zhu. Disentangled Self-Supervision in Sequential Recommenders. In KDD 2020, pages 483–491, 2020.
  • (24) Julian J. McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-Based Recommendations on Styles and Substitutes. In SIGIR 2015, pages 43–52, 2015.
  • (25) József Németh. Adversarial Disentanglement with Grouped Observations. In AAAI 2020, pages 10243–10250, 2020.
  • (26) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: online learning of social representations. In KDD 2014, pages 701–710, 2014.
  • (27) Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM 2018, pages 459–467, 2018.
  • (28) Paulo E. Rauber, Alexandre X. Falcão, and Alexandru C. Telea. Visualizing Time-Dependent Data Using Dynamic t-SNE. In EuroVis 2016, pages 73–77, 2016.
  • (29) Benedek Rozemberczki, Oliver Kiss, and Rik Sarkar. Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. In CIKM 2020, 2020.
  • (30) Guy Shani and Asela Gunawardana. Evaluating Recommendation Systems. In Recommender Systems Handbook, pages 257–297. 2011.
  • (31) Dharahas Tallapally, Rama Syamala Sreepada, Bidyut Kr. Patra, and Korra Sathya Babu. User preference learning in multi-criteria recommendations using stacked auto encoders. In RecSys 2018, pages 475–479, 2018.
  • (32) Jian Tang, Meng Qu, and Qiaozhu Mei. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. In KDD 2015, pages 1165–1174, 2015.
  • (33) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale Information Network Embedding. In WWW 2015, pages 1067–1077, 2015.
  • (34) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is All you Need. In NIPS 2017, pages 5998–6008, 2017.
  • (35) Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res., 11:3371–3408, 2010.
  • (36) Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, and Tat-Seng Chua. Disentangled Graph Collaborative Filtering. In SIGIR 2020, pages 1001–1010, 2020.
  • (37) Xiang Wang, Yaokun Xu, Xiangnan He, Yixin Cao, Meng Wang, and Tat-Seng Chua. Reinforced Negative Sampling over Knowledge Graph for Recommendation. In WWW 2020, pages 99–109, 2020.
  • (38) Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In WSDM 2016, pages 153–162, 2016.
  • (39) Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019, pages 4278–4284, 2019.
  • (40) Kai Zhao, Ting Bai, Bin Wu, Bai Wang, Youjie Zhang, Yuanyu Yang, and Jian-Yun Nie. Deep Adversarial Completion for Sparse Heterogeneous Information Network Embedding. In WWW 2020, pages 508–518, 2020.
  • (41) Qian Zhao, Jilin Chen, Minmin Chen, Sagar Jain, Alex Beutel, Francois Belletti, and Ed H. Chi. Categorical-attributes-based item classification for recommender systems. In RecSys 2018, pages 320–328, 2018.