∎

¹¹institutetext: Sowmini Devi Veeramachaneni, Arun K Pujari ²²institutetext: Ècole Centrale School of Engineering, Mahindra University, Hyderabad, Telangana, India
Tel.: +91-8985061848
²²email: [email protected], [email protected] ³³institutetext: Vineet Padmanabhan ⁴⁴institutetext: School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India
⁴⁴email: [email protected] ⁵⁵institutetext: Vikas Kumar ⁶⁶institutetext: University of Delhi, Delhi, India
⁶⁶email: [email protected]

Transfer of codebook latent factors for cross-domain recommendation with non-overlapping data

Sowmini Devi Veeramachaneni Arun K Pujari Vineet Padmanabhan Vikas Kumar

Abstract

Recommender systems based on collaborative filtering play a vital role in many E-commerce applications as they guide the user in finding their items of interest based on the user’s past transactions and feedback of other similar customers. Data Sparsity is one of the major drawbacks with collaborative filtering technique arising due to the less number of transactions and feedback data. In order to reduce the sparsity problem, techniques called transfer learning/cross-domain recommendation has emerged. In transfer learning methods, the data from other dense domain(s) (source) is considered in order to predict the missing ratings in the sparse domain (target). In this paper, we come up with a novel transfer learning approach for cross-domain recommendation, wherein the cluster-level rating pattern(codebook) of the source domain is obtained via a co-clustering technique. Thereafter we apply the Maximum Margin Matrix factorization(MMMF) technique on the codebook in order to learn the user and item latent features of codebook. Prediction of the target rating matrix is achieved by introducing these latent features in a novel way into the optimisation function. In the experiments we demonstrate that our model improves the prediction accuracy of the target matrix on benchmark datasets.

Keywords:

Collaborative Filtering Matrix Factorisation Codebook Transfer LearningCross-Domain Recommendation

1 Introduction

The key idea of a recommender system (RS) Bobadilla et al., (2013); Ricci et al., (2010); Aggarwal, (2016) is to provide useful information to users regarding the products/items that would be interested in. Collaborative filtering (CF) is one of the common techniques used in recommendation engines to learn user profiles so that preferred items can be recommended. In CF, a recommendation is made for a user (usually called the target user) by taking into consideration the known preferences of other users who are much alike to the target user. Matrix factorization Koren et al., (2009); Wu, (2007) is widely considered as one of the most promising collaborative filtering techniques.

MF finds the user and item latent features from a given user-item rating matrix, and the product of these latent feature vectors yields the approximation/prediction of the rating matrix. Let us suppose that we are given $X\in\mathbb{R}^{m_{1}\times n_{1}}$ which is a user-item rating matrix. The number of users is denoted by $m_{1}$ and $n_{1}$ denotes the items. The idea in MF is to find two matrices, $U\in\mathbb{R}^{m_{1}\times l}$ and $V\in\mathbb{R}^{n_{1}\times l}$ ( $l$ is the number of latent factors), such that $U\times V^{T}=\hat{X}\approx X$ (i.e., the product is approximately equal to $X$ . ), on observed ratings ( $\mathcal{O}$ ). The problem can be formulated as follows,

Minimize~{}~{}~{}\mathcal{J}=\sum\limits_{(i,j)\epsilon\mathcal{O}}Loss(x_{ij},u_{i}v_{j})

Here, $Loss(\cdot)$ is the loss fuction that estimates the difference between the actual rating ( $x_{ij}$ ) and the predicted rating ( $\hat{x_{ij}}$ ). In several factorization models we can see that the loss function being considered is the squared error. In collaborative filtering, when the ratings are discrete {1, 2,…,r}, Maximum Margin Matrix Factorization (MMMF) Srebro et al., (2005); Devi et al., (2014); Salman et al., (2016) is shown to be successful. MMMF techniques uses hinge loss as the loss function and this helps in constraining the trace norms of $U$ and $V$ rather than constraining the dimensions, through regularisation. The objective is to discover the latent factor matrices of users ( $U\in\mathbb{R}^{m_{1}\times l}$ ) and items ( $V\in\mathbb{R}^{n_{1}\times l}$ ), and $r-1$ thresholds $\theta_{iq}$ ( $1\leq q\leq r-1$ ) for each user $i$ by minimizing the following optimization function.

\mathcal{J}(U,V,\Theta)=\sum_{(i,j)\in\mathcal{O}}\sum_{q=1}^{r-1}h(\mathcal{T}_{ij}^{q}(\theta_{iq}-u_{i}v_{j}^{T}))+\lambda(||U||_{F}^{2}+||V||_{F}^{2})

(1)

where, $\lambda>0$ is regularization parameter,

\mathcal{T}_{ij}^{q}=\begin{cases}-1&\text{if $q<x_{ij}$}\\ +1&\text{if $q\geq x_{ij}$\hskip 118.07875pt}\end{cases}

and $h(\cdot)$ is a smooth hinge-loss function with the following definition:

h(f)=\begin{cases}0&\text{if $f$ $\geq$ 1}\\ \frac{1}{2}(1-f)^{2}&\text{if $0<f<1$}\\ \frac{1}{2}-f&\text{otherwise.\hskip 91.04872pt}\end{cases}

(2)

The optimisation function for MMMF as given in the equation (Eq. 1) does not require $u_{i}v_{j}^{T}$ and $x_{ij}$ to be closer. It expects $u_{i}v_{j}^{T}$ to be as small as possible if $q\geq x_{ij}$ and as large as possible if $q<x_{ij}$ , when compared with $\theta_{iq}$ . One can solve the optimization function given in Eq. (1) using gradient descent.

Though MMMF based collaborative filtering techniques perform well with discrete data it is confined to a single domain and fails to account for user-item interaction when multiple domains are involved. They also do not perform well when the number of ratings are low resulting in data sparsity problem. Transfer learning methods Pan and Yang, (2010) have been developed to address these concerns.

Transfer learning helps in building a predictive model by extracting and transferring common knowledge when multiple domains are involved. These domains are usually referred in the literature as source and target. When transfer learning techniques are applied in the domain of recommender systems, two critical problems needs to be addressed for successful knowledge transfer. These are 1) how to account for knowledge transfer if both the domains have shared users/items and 2) accounting for transferring of knowledge when domains do not have shared users/items. As far as the first problem is concerned, it is quite difficult for such a scenario to exist in real world as the users/items in one system may not be present in another. Even otherwise, if such correspondence exists, mapping becomes a difficulty as the users/items may have different names in different systems. The second problem is pretty hard to address and in this study we employ a representative approach to do so.

In this paper, we come up with a novel approach of transferring the learnt knowledge from source to the target by assuming that the source and target domains have some implicit correlation. The learnt knowledge of source which are the latent factors of codebook get transferred to the target via hinge loss. Experimental results demonstrate the superiority of the proposed approach for cross-domain collaborative filtering as compared to other major methods in transfer learning which are codebook-based.

The remainder of the paper is structured as follows: In Section 2, we talk about related work. The proposed strategy is outlined in Section 3. Section 4 discusses the experimental results. Section 5 contains the conclusion and future work.

2 Related Work

Though collaborative filtering based recommender systems have become the norm these days, they have trouble in making accurate recommendations due to data sparsity problem, i.e., very little actual information available. To address the data sparsity issue in recommender systems, transfer learning Pan and Yang, (2010); Pan, (2016); Zhao et al., (2013) techniques (for cross domain recommender systems) has been proposed in the literature. As mentioned earlier, in transfer learning there is a source domain which is usually considered as a dense domain from which knowledge is transferred to the target domain which is usually sparse. For example, suppose that a particular user has watched lots of movies and have rated many of the movies he has seen. Suppose also that the same user has read many books and has given very few ratings in the books domain but at the same time wants a book that is of interest to him to be recommended. In such a scenario, the idea is to make use of the users’ ratings in the movie domain such that a particular book of his/her interest can be recommended.

One of the major issues involved in developing transfer learning techniques for recommendation purpose is to establish a bridge between the domains that are involved, so that knowledge can be transferred from source domain to the target domain. Domains can be linked and the transfer can happen explicitly via inter-domain similarities, common item attributes, etc. The transfer can happen also implicitly via shared user latent features or item latent features or by rating patterns which can be transferred between the domains. In Chung et al., (2007), a framework was proposed where the items that are relevant in the source domain are picked based on the common attributes they have with the target domain (user-interested domain). In this way, the inter-domain links were built utilising the common item attributes, however no overlap of users/items was required between the domains. On the other side the transfer of knowledge through shared latent features (of users/items) is addressed in Pan et al., (2010). The idea proposed in this paperPan et al., (2010) is to learn the hidden features present in the users and items of the source domain so that they can be integrated into the target rating matrix during the factorization process via regularisation. The success of this procedure is dependent on the existence of common users or items. In Pan et al., (2011), the latent properties of source and target are shared in a collective way. Here, rather than learning the latent features from source and utilizing them in the target, a technique is proposed wherein the latent features are simultaneously learnt from both the domains. A method called matrix tri-factorization is used to construct the shared latent space with the condition that from both the domains the users and items needs to be identical.

There are other set of methods in which rating patterns are analysed and transferred rather than latent features. These methods can be used in scenarios wherein users/items are not common between the domains. Rating patterns stem from the assumption that among the ratings of groups of users and groups of items a correlation could exist. One such method is codebook transfer (CBT) Li et al., 2009a , where the main assumption is that, though the users/items are different across systems, the clusters (groups - based on age, interest,…) of them behave similarly. It is an adaptive method which consists of mainly two steps. One is the codebook construction and the second step is filling the target matrix by transferring the learnt codebook. As part of the initial step, the users and items that belong to the dense source domain are co-clustered Ding et al., (2006) to get the cluster-level rating pattern(rating pattern at the cluster level). This pattern is called the codebook which consists of the mean rating of each of the co-clusters of users and items. Following that, the codebook is transferred to the target domain by expanding the values of the codebook. To do the same, users and items of the target domain needs to be mapped (to co-clusters) and this can be done by minimizing the quadratic loss which can be expressed as,

\min_{F_{1}\in\{0,1\}^{m_{1}\times k_{1}},F_{2}\in\{0,1\}^{n_{1}\times k_{2}}}||[X-F_{1}CF_{2}^{T}]\odot W||_{F}^{2}\quad\text{s.t.,$F_{1}\bf{1}$ = $1$, $F_{2}\bf{1}$ = $1$.}

(3)

Here, $X_{m_{1}\times n_{1}}$ represents the user-item rating matrix of target domain. $C_{k_{1}\times k_{2}}$ stands for codebook (rating pattern at cluster-level) which is learned from the source domain. The codebook is fixed and used to learn the cluster membership matrices of users ( ${F_{1}}_{m_{1}\times k_{1}}$ ) and items ( ${F_{2}}_{n_{1}\times k_{2}}$ ) of target data. A value of $1$ in the indicator matrix $W$ of size $m_{1}\times n_{1}$ shows the existence of the rating in the original rating matrix whereas a value of $0$ indicates the absence of the rating. The idea of code book transfer is to discover a common latent space wherein the knowledge gained in the form of $C$ (source domain data) can be used to enhance the recommendation in the target domain. i.e., here the ratings are transferred in the condensed form (codebook).

In Li et al., 2009b , authors have proposed a method named rating-matrix generative model which uses a probabilistic framework and fill the missing ratings of target domain by considering the rating data from multiple source rating matrices to construct the rating pattern. Moreno et al., (2012) extends the CBT by considering multiple source domains and checking different combinations of user/item clusters. It creates seperate codebooks for every source domain and extracts the relatedness between the target and each of the sources. It is based on the linear mixture of different codebooks in which the learning of weights is done by the minimization of target domain prediction error. A relaxation related to the assumption of a fully dense source domain rating matrix is taken into consideration in He et al., (2017, 2018). A different way of generating the codebook has been outlined in Ji et al., (2016); Veeramachaneni et al., (2019, 2022). In these methods, by making use of the technique of matrix factorisation the user and item latent factors are generated from the source domain. The user and item latent factors thus generated are used to obtain the user latent facor group and item latent factor group. Codebook is generated by multiplying the mean latent vectors of the group.

The methods as outlined in Pan and Ming, (2014); Pan et al., (2016) takes care of scenarios wherein the feedback data of target and source are of different types. The users and items in the target and auxiliary(source) domains are assumed to be the same in Pan and Ming, (2014). Two sets of source data are taken into consideration in Shi et al., (2013), one of which shares a common set of users with the target data and the other of which shares a common set of items with the target data. The source data’s latent factors are extracted, and similarity graphs are constructed from these latent factors. Both latent factors and similairty graphs are then transferred to the target data.

3 Proposed Approach

Let there be two user-item rating matrices of different domains say $X_{m_{1}\times n_{1}}$ (target domain matrix), $Y_{m_{2}\times n_{2}}$ (source domain matrix). Here $m_{1}$ , $m_{2}$ is the number of users and $n_{1}$ , $n_{2}$ is the number of items, and the entries of the matrices are the ratings given by the users to the items. Our goal is to predict the missing entries of the target domain more accurately using the source domain data. Figure-1 gives the sequential steps of the proposed method. Initially, we fill the missing entries in the source rating matrix with the mean of the ratings of that row (Step-1) and denote the filled-in rating matrix as $Y^{\prime}$ . In Step-2, we apply the co-clustering on the filled-in rating matrix ( $Y^{\prime}$ ) in order to get the rating pattern at the cluster-level called as codebook ( $C_{k_{1}\times k_{2}}$ ). Once the codebook is obtained, we process the codebook (Step-3) by removing some of the entries of codebook and replacing by $0$ . Processing of codebook is done by comparing it with filled-in rating matrix as follows. Take the entry of the codebook which indicates the average of ratings given by a cluster of the users to some group of items. Compare the value with the entries of the particular users (forming a cluster) and particular items (of the cluster) of the filled-in rating matrix. If the number of entries containing the same value is more than some specific threshold percentage ( $th$ ) then keep it, else we remove and replace it as zero. As there are real values in codebook and filled-in rating matrix, we don’t check for the values to be exact, but instead we check for their difference to be small. The difference (error) is compared using some margin $\epsilon$ . Calculate the difference between the entries of filled-in matrix and that of codebook, and if more than some threshold percentage ( $th$ ) of entries contain the margin less than or equal to $|\epsilon|$ , then keep the entry as it is, else remove the entry. By following this removal of entries we get a partial codebook ( $C_{p}$ ). Now, apply MMMF (1) on the processed codebook ( $C_{p}$ ) to get the codebook’s latent feature vectors of users ( $U_{c_{k_{1}\times l^{\prime}}}$ ), and items ( $V_{c_{k_{2}\times l^{\prime}}}$ ) alongside a threshold matrix of users ( $\Theta_{c_{k_{1}\times r-1}}$ ), which is shown in Step-4.

Refer to caption — Figure 1: Illustration of the proposed method

Once thedr latent features are obtained, we transfer the same to the target domain which is given at Step-5, by minimizing the optimization function given in Eq. 4. As far as we know, there is no previous research which addresses the transfer of latent features of codebook, and also consider hinge loss (Eq. 4) as the loss function while transferring the learnt knowledge of the source domain to the target domain. Our assumption in this work is that there could exist between the source and target domain some implicit correspondence through the user/item latent features ( $U_{c}$ , $V_{c}$ ) of codebook.

	$\displaystyle\mathcal{J}(\alpha,\beta)={}$	$\displaystyle\sum_{(i,j)\in\mathcal{O}}\sum_{q=1}^{r-1}h(\mathcal{T}_{ij}^{q}(\alpha_{i}\theta_{\cdot q}-(\alpha_{i}U_{c})(\beta_{j}V_{c})^{T}))+\lambda_{1}(\sum_{i=1}^{m_{1}}l_{1}(\sum_{k=1}^{k_{1}}\alpha_{ik})+\sum_{j=1}^{n_{1}}l_{1}(\sum_{k=1}^{k_{2}}\beta_{jk}))$		(4)
		$\displaystyle+\lambda_{2}(\sum_{i=1}^{m_{1}}\sum_{k=1}^{k_{1}}l_{2}(\alpha_{ik})+\sum_{j=1}^{n_{1}}\sum_{k=1}^{k_{2}}l_{2}(\beta_{jk}))$		(4)

where,
$\mathcal{T}_{ij}^{q}=\begin{cases}+1&\text{if $q\geq x_{ij}$}\\ -1&\text{if $q<x_{ij}$}\end{cases}$

$l_{1}(d)=\begin{cases}1-d,&\text{if $d<1$}\\ d-1,&\text{if $d>1$}\\ 0,&\text{otherwise}\end{cases}$

$l_{2}(d)=\begin{cases}d,&\text{if $d<0$}\\ 0,&\text{otherwise}\end{cases}$
$\lambda_{1},\lambda_{2}>0$ are regularization parameters. $h(\cdot)$ is the smooth-hinge loss defined as given in Eq.(2). For a given matrix $A$ , $A_{\cdot a}$ is the $a^{th}$ column of $A$ . $l_{1}(d)$ ensures the row sum of $\alpha$ and column sum of $\beta$ to be $1$ , whereas $l_{2}(d)$ ensures all elements of $\alpha$ and $\beta$ to be positive. We have used Gradient Descent technique to optimize the Eq.(4), by updating the variables $\alpha$ and $\beta$ . Initially $\alpha$ and $\beta$ are randomly assigned, and then by calculating the gradients of Eq.(4) w.r.t. to $\alpha$ and $\beta$ , we update $\alpha$ and $\beta$ . By using the updated values of $\alpha$ and $\beta$ , the value of the Eq.(4) decreases monotonically and converges to a local minimum.

The gradients of Eq.(4) w.r.t. variables $\alpha$ and $\beta$ are as follows,

\frac{\partial\mathcal{J}}{\partial{\alpha_{ik}}}=\lambda_{1}l_{1}^{{}^{\prime}}(\sum_{k=1}^{k_{1}}\alpha_{ik})+\lambda_{2}l_{2}^{{}^{\prime}}(\alpha_{ik})+\sum_{q=1}^{r-1}\sum_{(i,j)\in\mathcal{O}}\mathcal{T}_{ij}^{q}.h^{\prime}(\mathcal{T}_{ij}^{q}(\alpha_{i}\theta_{\cdot q}-(\alpha_{i}U_{c})(\beta_{j}V_{c})^{T}))(\theta_{kq}-IU_{c}(\beta_{j}V_{c})^{T})

(5)

\frac{\partial\mathcal{J}}{\partial{\beta_{jk}}}=\lambda_{1}l_{1}^{{}^{\prime}}(\sum_{k=1}^{k_{2}}\beta_{jk})+\lambda_{2}l_{2}^{{}^{\prime}}(\beta_{jk})-\sum_{q=1}^{r-1}\sum_{(i,j)\in\mathcal{O}}\mathcal{T}_{ij}^{q}.h^{\prime}(\mathcal{T}_{ij}^{q}(\alpha_{i}\theta_{\cdot q}-(\alpha_{i}U_{c})(\beta_{j}V_{c})^{T}))(\alpha_{i}U_{c})(V_{c_{k}})^{T}

(6)

where,

h^{\prime}(f)=\begin{cases}0&\text{if $f$ $\geq$ 1}\\ f-1&\text{if $0<f<1$}\\ -1&\text{otherwise.}\end{cases}

\displaystyle\begin{aligned} &l_{1}^{{}^{\prime}}(d)=\begin{cases}-1,&\text{if $d<1$}\\ 1,&\text{if $d>1$}\\ 0,&\text{otherwise}\end{cases}\\ &\end{aligned}

\displaystyle\begin{aligned} &l_{2}^{{}^{\prime}}(d)=\begin{cases}1,&\text{if $d<0$}\\ 0,&\text{otherwise}\end{cases}\\[14.2083pt] &\end{aligned}

$I$ is the row vector of dimension $1\times k_{1}$ containing all $1^{\prime}s$ .
The updation of $\alpha$ and $\beta$ variables is done as,

\alpha_{ik}^{t+1}=\alpha_{ik}^{t}-a\frac{\partial\mathcal{J}}{\partial{\alpha_{ik}^{t}}}

\beta_{jk}^{t+1}=\beta_{jk}^{t}-a\frac{\partial\mathcal{J}}{\partial{\beta_{jk}^{t}}}

Here $a$ is the step size. Once we get the converged values of $\alpha$ and $\beta$ , we construct the predicted target rating matrix (Step-6) using Equation (7), and map the resultant matrix with the threshold matrix ( $\alpha.\Theta_{c}$ ) to get the target predicted rating matrix.

\hat{X}=W\odot X+[1-W]\odot[\alpha U_{c}V_{c}^{T}\beta^{T}],

(7)

1: Input: A

m_{2}\times n_{2}

source rating matrix

Y

and a

m_{1}\times n_{1}

sparse target rating matrix

X

with

x_{ij}

known for

(i,j)\in\mathcal{O}

2: Output:

x_{ij}

for

(i,j)\notin\mathcal{O}

3: Fill the missing entries of each row of

Y

with the average of the rows and call it as

Y^{\prime}

4: Apply co-clustering on

Y^{\prime}

to get the codebook (

C

)

5: Process the codebook to get the partial codebook (

C_{p}

)

6: Find

U_{c}

V_{c}

\Theta_{c}

using MMMF (by minimizing Eq. 1) on

C_{p}

7: Use

U_{c}

V_{c}

, and find

\alpha

\beta

of target domain by minimizing equation (4).

8: Using these

\alpha

and

\beta

, calculate Eq. 7 and map with the

\alpha\Theta_{c}

in order to get the discrete predicted rating matrix (

\hat{X}

Algorithm 1 Finding latent features of codebook and transferring to target

where $\hat{X}$ is the predicted (approximated) target rating matrix. A value of 1 in the indicator matrix $W$ of size $m_{1}\times n_{1}$ shows the existence of the rating in the original rating matrix, whereas a value of $0$ indicates the absence of the rating. Error calculation for only the observed ratings is ensured through $W$ and the Hadamard product (element wise product) is denoted using $\odot$ . By using the gradient descent technique as given in Eq.(4) a minimal solution can be obtained by updaing $\alpha$ and $\beta$ . Initially $\alpha$ and $\beta$ are randomly assigned, and then by calculating the gradients of Eq.(4) w.r.t. to $\alpha$ and $\beta$ , we update $\alpha$ and $\beta$ . By using the updated values of $\alpha$ and $\beta$ , the value of the Eq.(4) decreases monotonically until a local minima is reached. Once we get $\alpha$ and $\beta$ by working out with the optimization function (4), we generate the predicted target rating matrix (Step-6) by making use of Equation (7), and map the resultant matrix with the threshold matrix ( $\alpha.\Theta_{c}$ ) to get the target predicted rating matrix.

To be brief of Figure-1, it depicts the flow of the proposed method, in which learning from source and transferring to target are the main steps. In the learning stage, the codebook’s latent features of source domain are learnt, and in the transferring stage, the learnt knowledge (latent features of codebook) get transferred to the target domain in order to predict the missing ratings of target domain more accurately.

4 Experimental Analysis

MovieLens-1M¹¹1https://grouplens.org/datasets/movielens/ is used as the source dataset and Goodbooks²²2https://github.com/zygmuntz/goodbooks-10k is used as the target dataset in our experiments. We have taken the first 5000 users and 3000 items from the Goodbooks data. The values of the datasets are in {0,1,2,3,4,5}. The value $0$ indicates that the rating is missing, and $1$ indicates the least rating, and $5$ is the highest rating. Table 1 gives the statistics of the datasets. In our experiments, we have divided the data into training (80%) and testing (20%) sets.

Table 1: Datasets statistics

Dataset	# of Users	# of Items	% of Observed entries
MovieLens 1M	6040	3952	3.77
Goodbooks	5000	3000	1.08

4.1 Evaluation Metrics

From the literature it can be seen that a variety of collaborative filtering algorithms have been put forward in the last decade or so. The accuracy with which these algorithms can predict a new item/set of items vary. It is often the case that performance evaluation of these collaborative filtering algorithms is based on prediction accuracy. The two most often used measures for computing the prediction accuracy are Mean Absolute Error (MAE) (Eq. (9) and Root Mean Square Error (RMSE) (Eq. (8)). These metrics are based on the difference between the true ratings and the predicted ratings, and it is natural that better performance equals smaller values of RMSE and MAE metrics.

RMSE=\sqrt{\sum\limits_{(i,j)\epsilon\mathcal{O}}\frac{{(x_{ij}-\hat{x}_{ij})}^{2}}{|\mathcal{O}|}}

(8)

MAE={\sum\limits_{(i,j)\epsilon\mathcal{O}}\frac{|(x_{ij}-\hat{x}_{ij})|}{|\mathcal{O}|}}

(9)

where $x_{ij}$ is the original rating, $\hat{x}_{ij}$ is the predicted rating, and $|\mathcal{O}|$ is the number of test ratings.

4.2 The different Methods used for Comparison

Some of the baseline methods we use for evaluating the performance of our proposed method can be outlined as follows:

•

MMMF Srebro et al., (2005); Rennie and Srebro, (2005): Maximum Margin Matrix Factorization (MMMF) is the dominant factorization technique used in collaborative filtering. MMMF is usually applied on the input rating matrix consisting of the user-item ratings. The idea is to find the user and item latent-factor vectors which are of low rank by making use of the existing ratings. MMMF can be applied on a single domain only, and hence in our experiments, we applied it on the target domain directly.
•

MINDTL He et al., (2017): In MINDTL, codebook is constructed by taking into consideration the data from all the incomplete source domains. Here codebook for each domain is constructed. Following that, the constructed codebooks are linearly integrated and transferred to target, and the missing(absent) values of the target rating matrix gets predicted. As far as our experimental setup is concerned only a single domain is taken into consideration.
•

TRACER Zhuang et al., (2018): In TRACER, data from multiple domains are accounted for and based on this, ratings (which includes missing ratings) for all the source matrices are predicted. Thereafter the predicted knowledge is utilized by transferring it into the target domain. By making use of consensus regularisation during the knowledge transfer process, all the predicted values are forced to be similar. In a way it can be said that in TRACER at the same time learning and transferring happens. We have considered a single domain in our experiments and therefore there is no need for consensus regularisation.
•

CBT Li et al., 2009a : In this approach, the dense part of the source user-item rating matrix is considered, and the missing values of the rows of the dense matrix get imputed using the average of the ratings of particular row (user). The codebook is obtained from the dense user-item matrix by applying the technique of co-clustering.

In our experiments, unlike in Li et al., 2009a , which consider only the dense part of the input data, the codebook is constructed by making use of the whole source data. Transferring of the learned codebook to the target domain is achieved by minimizing Eq. (3).

Table 2: Values of root mean square error and mean absolute error of baseline methods and our method on Goodbooks data using MovieLens-1M as source

	MMMF	MINDTL	TRACER	CBT	Proposed
RMSE	0.9582	1.2794	0.9637	0.9641	0.9507
MAE	0.6501	0.9232	0.7781	0.7890	0.6466

We have conducted the experiments on MovieLens-1M data (source) and Goodbooks data (target), with varying number of clusters ( $k_{1}$ , $k_{2}$ - 25, 50, 75, 100, 125, 150, 175, 200). Fig. (2) gives the impact of number of clusters on RMSE and Fig. (3) shows the impact of number of clusters on MAE. Although there is no much change in the metric values with varying number of clusters, in our experiments we have fixed $k_{1}$ to $150$ , $k_{2}$ to $100$ , for which the best performance is achieved. By fixing the number of clusters, we have also experimented our algorithm by varying the values of threshold ( $th$ ) and margin ( $\epsilon$ ). The range of $th$ (in %) fall in {40, 50, 60, 70, 80}, and the values of $\epsilon$ considered are {0.1, 0.2, 0.3, 0.4, 0.5}. The performance of our algorithm is satisfying when the value of $th$ is $50$ , and that of $\epsilon$ is $0.2$ . Hence, in the experiments of the proposed method, when Goodbooks data is the target and MovieLens-M is source, we have fixed the values of parameters as follows: $k_{1}$ = 150, $k_{2}$ = 100, $th$ = 80, $\epsilon$ = $0.3$ .

Table 2 shows the RMSE and MAE values on Goodbooks (target) data of baseline methods considered and the proposed method, by considering MovieLens-1M as source. The values reported are the average of five runs.

5 Conclusions and Future work

We have proposed a novel model for cross-domain recommendation that takes into account the latent features of the source domain’s codebook and utilizing them in the target domain, when the domains do not share common users or common items. As the first step, we imputed the missing entries of the source rating matrix with the average of the rows to get the filled-in rating matrix. We made use of the co-clustering technique for constructing the codebook from the source domain, i.e., for obtaining the cluster-level rating pattern. After this stage, codebook is processed by comparing the entries of codebook with the values of the filled-in source rating matrix. By applying the maximum margin matrix factorization technique on the processed codebook, the latent factor vectors of codebook are obtained. The learnt source knowledge (latent factors of codebook) is then transferred to the target domain via hinge loss, and the target domain latent features are learned which are then utilized to get the predicted target user-item rating matrix. By observing the experimental results on the benchmark data sets, we say that our model approximates the target matrix well. In the proposed method the only consideration we gave is for the ratings and it is possible in the future to consider social tags also as input types. Transfer learning applications of other type rather than rating prediction in recommender systems could be another direction that could be pursued.

Acknowledgements

This work has been done as part of the PhD dissertation of the first author at University of Hyderabad. The first author would like to acknowledge the funding agency, Council of Scientific and Industrial Research (CSIR) Government of India for the financial support in the form of CSIR-UGC NET-JRF/SRF.

References

Aggarwal, (2016) Aggarwal, C. C. (2016). Recommender Systems: The Textbook. Springer Publishing Company, Incorporated, 1st edition.
Bobadilla et al., (2013) Bobadilla, J., Ortega, F., Hernando, A., and Gutiérrez, A. (2013). Recommender systems survey. Knowledge-Based Systems, 46:109 – 132.
Chung et al., (2007) Chung, R., Sundaram, D., and Srinivasan, A. (2007). Integrated personal recommender systems. In Proceedings of the Ninth International Conference on Electronic Commerce, ICEC ’07, pages 65–74, New York, NY, USA. ACM.
Devi et al., (2014) Devi, V. S., Kagita, V. R., Pujari, A. K., and Padmanabhan, V. (2014). Collaborative filtering by pso-based mmmf. In 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 569–574.
Ding et al., (2006) Ding, C., Li, T., Peng, W., and Park, H. (2006). Orthogonal nonnegative matrix tri-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, pages 126–135, New York, NY, USA. ACM.
He et al., (2018) He, M., Zhang, J., Yang, P., and Yao, K. (2018). Robust transfer learning for cross-domain collaborative filtering using multiple rating patterns approximation. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM ’18, pages 225–233, New York, NY, USA. ACM.
He et al., (2017) He, M., Zhang, J., and Zhang, J. (2017). Mindtl: Multiple incomplete domains transfer learning for information recommendation. China Communications, 14:218–236.
Ji et al., (2016) Ji, K., Sun, R., Li, X., and Shu, W. (2016). Improving matrix approximation for recommendation via a clustering-based reconstructive method. Neurocomput., 173(P3):912–920.
Koren et al., (2009) Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8):30–37.
(10) Li, B., Yang, Q., and Xue, X. (2009a). Can movies and books collaborate?: Cross-domain collaborative filtering for sparsity reduction. In Proceedings of the 21st International Jont Conference on Artifical Intelligence, IJCAI’09, pages 2052–2057, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
(11) Li, B., Yang, Q., and Xue, X. (2009b). Transfer learning for collaborative filtering via a rating-matrix generative model. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 617–624, New York, NY, USA. ACM.
Moreno et al., (2012) Moreno, O., Shapira, B., Rokach, L., and Shani, G. (2012). Talmud: Transfer learning for multiple domains. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 425–434, New York, NY, USA. ACM.
Pan and Yang, (2010) Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Trans. on Knowl. and Data Eng., 22(10):1345–1359.
Pan, (2016) Pan, W. (2016). A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing, 177:447–453.
Pan et al., (2011) Pan, W., Liu, N. N., Xiang, E. W., and Yang, Q. (2011). Transfer learning to predict missing ratings via heterogeneous user feedbacks. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Volume Three, IJCAI’11, pages 2318–2323. AAAI Press.
Pan and Ming, (2014) Pan, W. and Ming, Z. (2014). Interaction-rich transfer learning for collaborative filtering with heterogeneous user feedback. IEEE Intelligent Systems, 29:48–54.
Pan et al., (2016) Pan, W., Xia, S., Liu, Z., Peng, X., and Ming, Z. (2016). Mixed factorization for collaborative recommendation with heterogeneous explicit feedbacks. Information Sciences, 332:84 – 93.
Pan et al., (2010) Pan, W., Xiang, E. W., Liu, N. N., and Yang, Q. (2010). Transfer learning in collaborative filtering for sparsity reduction. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI’10, pages 230–235. AAAI Press.
Rennie and Srebro, (2005) Rennie, J. D. M. and Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22Nd International Conference on Machine Learning, ICML ’05, pages 713–719, New York, NY, USA. ACM.
Ricci et al., (2010) Ricci, F., Rokach, L., Shapira, B., and Kantor, P. B. (2010). Recommender Systems Handbook. Springer-Verlag, Berlin, Heidelberg, 1st edition.
Salman et al., (2016) Salman, K. H., Pujari, A. K., Kumar, V., and Veeramachaneni, S. D. (2016). Combining swarm with gradient search for maximum margin matrix factorization. In Booth, R. and Zhang, M.-L., editors, PRICAI 2016: Trends in Artificial Intelligence, pages 167–179, Cham. Springer International Publishing.
Shi et al., (2013) Shi, J., Long, M., Liu, Q., Ding, G., and Wang, J. (2013). Twin bridge transfer learning for sparse collaborative filtering. In Pei, J., Tseng, V. S., Cao, L., Motoda, H., and Xu, G., editors, Advances in Knowledge Discovery and Data Mining, pages 496–507, Berlin, Heidelberg. Springer Berlin Heidelberg.
Srebro et al., (2005) Srebro, N., Rennie, J. D. M., and Jaakola, T. S. (2005). Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems 17, pages 1329–1336. MIT Press.
Veeramachaneni et al., (2019) Veeramachaneni, S. D., Pujari, A. K., Padmanabhan, V., and Kumar, V. (2019). A maximum margin matrix factorization based transfer learning approach for cross-domain recommendation. Appl. Soft Comput., 85.
Veeramachaneni et al., (2022) Veeramachaneni, S. D., Pujari, A. K., Padmanabhan, V., and Kumar, V. (2022). A hinge-loss based codebook transfer for cross-domain recommendation with non-overlapping data. Information Systems, 107:102002.
Wu, (2007) Wu, M. (2007). Collaborative filtering via ensembles of matrix factorizations. In KDD Cup and Workshop 2007, pages 43–47. Max-Planck-Gesellschaft.
Zhao et al., (2013) Zhao, L., Pan, S. J., Xiang, E. W., Zhong, E., Lu, Z., and Yang, Q. (2013). Active transfer learning for cross-system recommendation. In AAAI. Citeseer.
Zhuang et al., (2018) Zhuang, F., Zheng, J., Chen, J., Zhang, X., Shi, C., and He, Q. (2018). Transfer collaborative filtering from multiple sources via consensus regularization. Neural Networks, 108:287 – 295.