
Conformal Group Recommender System

Venkateswara Rao Kagita, Anshuman Singh, Vikas Kumar, Pavan Kalyan Reddy Neerudu, Arun K. Pujari, Rohit Kumar Bondugula. National Institute of Technology Warangal, India. University of Delhi, Delhi, India. Mahindra University, Hyderabad, India. University of Hyderabad, Hyderabad, India.
Abstract

Group recommender systems (GRS) are critical in discovering relevant items from a near-infinite inventory based on group preferences rather than individual preferences, such as recommending a movie, restaurant, or tourist destination to a group of individuals. Traditional group recommendation models are designed to act like a black box with a strict focus on improving recommendation accuracy, and most often, they place the onus on the users to interpret recommendations. In recent years, the focus of Recommender Systems (RS) research has shifted away from merely improving recommendation accuracy towards value additions such as confidence and explanation. In this work, we propose a conformal prediction framework that provides a measure of confidence with each prediction, in conjunction with a group recommender system, to augment the system-generated plain recommendations. In the context of group recommender systems, we propose various nonconformity measures that play a vital role in the efficiency of the conformal framework. We also show that the defined nonconformity measure satisfies the exchangeability property. Experimental results demonstrate the effectiveness of the proposed approach over several benchmark datasets. Furthermore, the proposed approach satisfies the validity and efficiency properties.

keywords:
Group recommender systems, Conformal prediction, Confidence measure

1 Introduction

Recommender systems (RS) assist users in decision-making by helping them sift through a huge variety of offered products, such as movies, web pages, articles, and books [1]. These systems exploit past interactions between users and items to make personalized recommendations. RS algorithms are broadly classified into content-based, collaborative, and hybrid filtering, depending on the input used for profiling users and items. Content-based filtering approaches recommend items to a user by considering the similarity between the content or features of items and the user's profile [2, 3]. On the other hand, the collaborative filtering-based approach recommends items based on the preferences of other users who share similar preferences to the target user [4, 5, 6]. The hybrid approach combines various mechanisms and compositions of different data sources [7, 8]. Group recommender systems (GRS) [9, 10] extend this concept to the user group, wherein it analyses a group of users' profiles and creates a communal recommendation list. These groups can consist of family members, friends, colleagues, or any collection of individuals who wish to engage with a specific item or application collectively. We observe numerous applications of group recommender systems in our daily lives. For example, a group of friends may plan to dine at a restaurant or organize a tour, while a family may wish to watch a movie together. A Group Recommender System (GRS) can be formally defined as follows. Let O=\{o_{1},o_{2},\ldots,o_{m}\} represent the set of items, U=\{u_{1},u_{2},\ldots,u_{n}\} denote the set of users, O_{j} represent the set of items present in the profile of a user u_{j}, and U_{G}\subseteq U indicate a group of users. Given the sets O, U, O_{j} (\forall j), and U_{G}, the goal of a GRS is to recommend the top-k most relevant items from O to the user group U_{G} by considering the individual preferences within the group.

Several approaches in recent years have been proposed to extend personalized recommender systems to group recommender systems, wherein the major focus is to improve the accuracy of recommendations. With recent algorithmic advancements, the focus of research has shifted towards creating transparent group recommendation models that prioritize accountability and explainability. These models aim to achieve accuracy while providing additional value through confidence measures, explanations, or sensitivity. Among these enhancements, associating a confidence measure with the recommendation set is a particularly appealing facet. The confidence measure indicates the system's confidence that the desirable items are present in the recommendation set, which in turn enhances the system's reliability and helps users quickly and correctly identify the products of their choice. Although researchers have thoroughly investigated confidence-based personalized recommender systems [11, 12], only a few methods have been proposed for associating a confidence measure with group recommendations. Further, considering the individual preferences of the group and providing a delightful recommendation with confidence is not trivial. This paper proposes a confidence-based group recommender system using a conformal prediction framework. The proposed approach represents confidence in connection with the error bound, i.e., if the system exhibits 80\% confidence in the recommendation, it implies that the probability of making an error is at most 20\%. The concept of conformal prediction forms the basis for the proposed Conformal Group Recommender System (CGRS).

Conformal prediction is a framework for reliable machine learning that measures confidence in the predicted labels on a per-instance basis. A conformal prediction framework can be implemented alongside any traditional machine learning algorithm, but the implementation largely depends on the underlying algorithm. This paper extends the concept to GRS and defines a nonconformity measure, an essential part of the conformal prediction framework, suitable for the group recommendation setting. Given a set of users U, a set of items O, the user profiles O_{j} (\forall j), a target group U_{G}, and a significance level \varepsilon, the proposed conformal group recommender system returns a recommendation set to the group U_{G} with confidence (1-\varepsilon). We demonstrate that the exchangeability property, an essential property for any conformal framework, is satisfied by the proposed nonconformity measure. Furthermore, the proposed approach also satisfies the validity and efficiency properties. Experimental results over several real-world datasets support the efficacy of the proposed conformal group recommender system.

The paper is organized as follows. Section 2 briefly reviews confidence measures in recommender systems and establishes the context for this paper. We cover the foundational concepts of conformal prediction and the algorithms used to build the proposed conformal group recommender system in Section 3. The proposed algorithm and a proof of its validity are presented in Section 4. Section 5 presents extensive experimental results on eight benchmark datasets. Finally, Section 6 concludes the paper and outlines directions for future work.

2 Related Work

Group recommender systems assist a group of users who share a common interest. These systems exploit the collective consumption experience of the group to offer personalized recommendations. However, most of these systems are less transparent than those designed to assist a single user, making it difficult for users to understand why the system recommends a particular item. Despite the widespread application of group recommender systems, there has been no comprehensive analysis of associating confidence with the recommendation in a group context. While confidence measures are well-established in personalized recommender systems [13, 12], their application in group recommender systems is more complex, as group recommender systems entail additional difficulties in modeling and evaluating preferences due to the diverse user preferences within a group. This section reviews related work on group recommender systems and on associating confidence measures with the recommendation set.

The majority of conventional group recommendation models aim to extend personal recommender systems (PRS) to group recommender systems (GRS) by blending individual preferences to form group profiles. MusicFX [14] is among the early proposals in group recommendation, specifically targeting the recommendation of music channels for playing in fitness centers. This system exploits previously specified music preferences of individuals over a wide range of musical genres to recommend music at a particular time. Let's browse [15] suggests web pages to a group consisting of two or more individuals who are likely to have shared interests. PolyLens [16] employs collaborative filtering and recommends movies to groups of users. Jameson [17] proposed the Travel Decision Forum system, which helps groups plan vacations. The system provides an interface where the group members can view and copy other users' preferences and combine them using the median strategy once all agree. Garcia et al. [18] proposed intersection and aggregation mechanisms to build a group profile from the individual profiles. Kagita et al. [19] use a virtual user approach with precedence mining to extend PRS to GRS. All the GRS approaches described so far utilize aggregation techniques to determine group profiles or recommendations, and they are inept at associating confidence with the recommendation. In this proposal, we adopt our previous work [19] and propose a conformal group recommender system that furnishes confidence in the recommendations to improve the system's reliability and user satisfaction. Another difference between the previous work [19] and our current proposal is that we use coexistence in place of precedence mining, since in many real-world applications, the associations of item consumption are more important than strict precedence relationships.

3 Preliminary Concepts

This section reviews the paper's foundational concepts, which we use subsequently in our algorithm. We introduce the precursory concepts of conformal prediction in Section 3.1, which serve as the framework for constructing our proposed conformal group recommender system. We then briefly describe the precedence mining and association mining based personalized recommender systems (PRS) in Sections 3.2 and 3.3, respectively, which we extend to the GRS setting. Section 3.4 describes the virtual user approach as an efficient method for developing a group recommender system (GRS) from a personalized recommender system (PRS).

3.1 Conformal Prediction

In this section, we offer a concise overview of the conformal prediction principle, presenting relevant information for a comprehensive understanding. Let z_{i}=(x_{i},y_{i}) be a training instance (or an example), where x_{i}\in\mathbb{R}^{d} is a d-dimensional attribute vector and y_{i}\in\{Y_{1},Y_{2},\dots,Y_{c}\} is the associated class label. Given a dataset S=\{z_{i}\}_{i=1}^{n}, which contains n training instances, the aim of the classification task is to produce a classifier h:x\rightarrow y, anticipating that h predicts y_{n+k} well on any future example x_{n+k}, k\geq 1, by optimizing some specific evaluation function. A conformal predictor, on the other hand, associates precise levels of confidence with its predictions. Given a method for making a prediction, the conformal predictor offers a set of predicted class labels for an unseen object x_{n+k} such that the probability of the correct label not being present within this set is no greater than \varepsilon; this set is called the (1-\varepsilon)-prediction region. We assume that the set of all possible labels \{Y_{1},Y_{2},\dots,Y_{c}\} is known a priori, and that all z_{i}'s are i.i.d. (independently and identically distributed).

The conformal predictor uses a (non)conformity measure designed based on the underlying algorithm to evaluate the difference between a new (test) instance and a training data set. Let z_{n+1}=(x_{n+1},Y_{j}), where Y_{j} is tentatively assigned to x_{n+1}. The nonconformity measure for an example z_{i}\in\{S\cup z_{n+1}\} is a measure of how well z_{i} conforms to \{S\cup z_{n+1}\}\setminus z_{i}, \forall i\in[1,n+1]. In other words, the nonconformity measure assesses the likelihood that the label Y_{j} is the correct label for our new example x_{n+1}. The difference in S's predicted behaviour when z_{i} is replaced with z_{n+1} is then calculated. Let the p-value be the proportion of z_{i}\in S with a nonconformity score worse than that of z_{n+1}, computed for all possible values of y_{n+1} (all class labels). The (1-\varepsilon)-prediction region is formed by the labels with p-values greater than \varepsilon. Intuitively, the prediction behavior is observed by employing any of the conventional predictors that utilize the training set S.
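To make this mechanism concrete, the following is a minimal, self-contained sketch of a transductive conformal classifier. It is not the group-recommendation procedure developed later in the paper; the nearest-neighbour nonconformity score and the toy data are illustrative assumptions used only to show how per-label p-values yield a (1-\varepsilon)-prediction region.

```python
import numpy as np

def nn_nonconformity(x, y, X_rest, Y_rest):
    """Toy nonconformity score: distance to the nearest example of the same class
    divided by distance to the nearest example of a different class.
    Larger values mean the example looks stranger for the tentative label y."""
    d = np.linalg.norm(X_rest - x, axis=1)
    same, other = d[Y_rest == y], d[Y_rest != y]
    if same.size == 0 or other.size == 0:
        return np.inf
    return same.min() / (other.min() + 1e-12)

def prediction_region(X, Y, x_new, labels, eps=0.2):
    """(1 - eps)-prediction region: keep every label whose p-value exceeds eps."""
    region = []
    for y_try in labels:
        Xa, Ya = np.vstack([X, x_new]), np.append(Y, y_try)
        alphas = [nn_nonconformity(Xa[i], Ya[i],
                                   np.delete(Xa, i, axis=0), np.delete(Ya, i))
                  for i in range(len(Ya))]
        # p-value: fraction of examples at least as "strange" as the new one
        p_value = np.mean(np.array(alphas) >= alphas[-1])
        if p_value > eps:
            region.append(y_try)
    return region

# Two well-separated clusters; a point near cluster 0 gets the region [0].
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
Y = np.array([0, 0, 0, 1, 1, 1])
print(prediction_region(X, Y, np.array([0.15, 0.1]), labels=[0, 1], eps=0.2))  # -> [0]
```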

3.2 Precedence Mining based Recommender Systems

The precedence mining (PM) based recommendation model is a type of collaborative filtering (CF) technique that captures the temporal patterns in user profiles and provides recommendations to the target user based on the sequence of items they have consumed [20]. For example, if a user has seen Toy Story 1, PM-based CF can recommend Toy Story 2 if other users have watched it after watching Toy Story 1. The PM model uses precedence statistics, i.e., the precedence counts of item pairs, to estimate the probability of future consumption based on past behavior. It estimates the relevance score of each candidate item for recommendation using these precedence statistics.

Let O=\{o_{1},o_{2},\ldots,o_{nitem}\} be the set of items, U=\{u_{1},u_{2},\ldots,u_{nuser}\} be the set of users, and O_{j} be the set of items consumed by u_{j}. The items for recommendation are chosen based on the criteria that they are not present in the target user's profile and are likely to be preferred by the target user over other items. The PM-based approach maintains Support(\cdot) and PrecedenceCount(\cdot) statistics to calculate the recommendation score of an item. Support(o_{i}) is the number of users who have consumed item o_{i}, and PrecedenceCount(o_{i},o_{h}) is the number of users who consumed item o_{i} before item o_{h}. We use the notation PP(o_{i}|o_{h}) to denote the precedence probability of consuming an item o_{i} preceding o_{h}. We define PP(o_{i}|o_{h}) and Score(o_{i},u_{j}) as follows.

PP(o_{i}|o_{h})=\frac{PrecedenceCount(o_{i},o_{h})}{Support(o_{h})}, (1)
Score(o_{i},u_{j})=\frac{Support(o_{i})}{nuser}\times\prod_{o_{l}\in O_{j}}PP(o_{l}|o_{i}). (2)

The objects with high scores are recommended.

Example 1.

Let us consider the movie consumption data of nine users watching six different movies (i.e., Toy Story, Minions, Godfather I, Godfather II, Thor, and Avengers) in the table given below. Let Rob be the target user who has not yet watched Minions and Godfather II. We compute the relevance score for these two candidate items and rank them in decreasing order of the relevance score.

Users Movies watched
Mark Minions, Thor, Toy Story, Avengers
Christopher Godfather I, Godfather II, Toy Story, Minions
Rob Thor, Avengers, Godfather I, Toy Story
Jacob Avengers, Godfather I, Godfather II, Thor
Rachel Godfather I, Minions, Godfather II, Toy Story
Thomas Godfather I, Godfather II, Minions
Grant Thor, Avengers, Minions, Toy Story
Pamela Godfather I, Avengers, Thor, Godfather II
Holly Minions, Toy Story, Godfather I, Godfather II
Score(Godfather~II, Rob) = \frac{Support(Godfather~II)}{nuser}\times PP(Thor\mid Godfather~II)\times PP(Avengers\mid Godfather~II)\times PP(Godfather~I\mid Godfather~II)\times PP(Toy~Story\mid Godfather~II) = \frac{6}{9}\times\frac{1}{6}\times\frac{2}{6}\times\frac{6}{6}\times\frac{1}{6}=0.0062.
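The computation above can be verified with a short script. The sketch below recomputes the relevance score of Godfather II for Rob directly from the viewing histories in the table, following Equations 1 and 2; the helper names are ours, and the second call simply scores the other candidate item (Minions).

```python
from math import prod

# Ordered viewing histories from the table in Example 1.
profiles = {
    "Mark":        ["Minions", "Thor", "Toy Story", "Avengers"],
    "Christopher": ["Godfather I", "Godfather II", "Toy Story", "Minions"],
    "Rob":         ["Thor", "Avengers", "Godfather I", "Toy Story"],
    "Jacob":       ["Avengers", "Godfather I", "Godfather II", "Thor"],
    "Rachel":      ["Godfather I", "Minions", "Godfather II", "Toy Story"],
    "Thomas":      ["Godfather I", "Godfather II", "Minions"],
    "Grant":       ["Thor", "Avengers", "Minions", "Toy Story"],
    "Pamela":      ["Godfather I", "Avengers", "Thor", "Godfather II"],
    "Holly":       ["Minions", "Toy Story", "Godfather I", "Godfather II"],
}

def support(item):
    return sum(item in seq for seq in profiles.values())

def precedence_count(a, b):
    """Number of users who consumed item a before item b."""
    return sum(a in seq and b in seq and seq.index(a) < seq.index(b)
               for seq in profiles.values())

def pp(a, b):                       # Equation 1: PP(a | b)
    return precedence_count(a, b) / support(b)

def score(candidate, user):         # Equation 2
    return (support(candidate) / len(profiles)) * prod(
        pp(o, candidate) for o in profiles[user])

print(round(score("Godfather II", "Rob"), 4))   # 0.0062
print(round(score("Minions", "Rob"), 4))        # the other candidate item
```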

3.3 Association Mining based Recommender Systems

Although the temporal pattern is important in some real-world applications, in many others the associations of item consumption matter more than strict precedence relationships. For instance, in household product recommendations, device recommendations, or YouTube video recommendations, analyzing the set of items consumed together is more crucial than the sequence. Consider the table given in Example 1, where some movies, such as Minions and Toy Story, are watched together but not necessarily in a particular order. Out of the six users who have watched the movie Minions, five have also watched Toy Story either before or after it, and the converse also holds. This could be because, while Minions and Toy Story are not as closely related as Godfather I and II, they are both animated children's films, and it is likely that if a person enjoys Minions, he or she will enjoy Toy Story as well. In this work, we therefore consider the number of profiles that consumed two items (say, o_{i} and o_{h}) together rather than the number of profiles that consumed o_{h} after o_{i}. In other words, instead of taking the precedence probability PP(o_{l}|o_{i}) of Equation 2, we take the co-consumption probability P(o_{i}|o_{l}) while estimating the relevance score of an item for the target user. Thus, Equation 2 is rewritten as

Score(o_{i},u_{j})=\frac{Support(o_{i})}{nuser}\times\prod_{o_{l}\in O_{j}}P(o_{i}|o_{l}), (3)

where P(o_{i}|o_{h})=\frac{Support(o_{i},o_{h})}{Support(o_{h})} and Support(o_{i},o_{h}) is the number of users having consumed items o_{i} and o_{h} together. Support(o_{i}) represents the number of users who have consumed item o_{i}, which is also equal to Support(o_{i},o_{i}). P(o_{i}|o_{h}) denotes the probability of consuming an item o_{i} given that an item o_{h} is already consumed. Items with high scores are subsequently recommended to a user. Conversely, if the score is low, it is implausible that the item would interest the user.

We call this formulation the Association Mining based Recommender System. We now consider an example that illustrates the working of the Association Mining based recommender system.

Example 2.

Consider the Support statistics below, computed from the profiles of thirty users U=\{u_{1},u_{2},\ldots,u_{30}\} over ten items O=\{o_{1},o_{2},\ldots,o_{10}\}.

Support=\left[\begin{array}{cccccccccc}20&9&8&11&7&8&6&7&7&3\\ 9&25&10&11&9&7&7&6&8&4\\ 8&10&21&5&7&6&5&6&4&3\\ 11&11&5&25&6&8&6&6&3&2\\ 7&9&7&6&22&8&7&6&9&4\\ 8&7&6&8&8&18&5&6&4&1\\ 6&7&5&6&7&5&15&4&4&1\\ 7&6&6&6&6&6&4&18&7&2\\ 7&8&4&3&9&4&4&7&20&3\\ 3&4&3&2&4&1&1&2&3&6\end{array}\right]

Let u_{1} be the target user and O_{1}=\{o_{1},o_{3},o_{5},o_{7},o_{9}\} be the set of items consumed by u_{1}. The items not consumed by the user u_{1} form the candidate item set for recommendation, i.e., O\setminus O_{1}=\{o_{2},o_{4},o_{6},o_{8},o_{10}\}. The score of an item o_{2}, which is not consumed by user u_{1}, is then calculated as

Score(o_{2},u_{1})=\frac{Support(o_{2})}{30}\times P(o_{1}\mid o_{2})\times P(o_{3}\mid o_{2})\times P(o_{5}\mid o_{2})\times P(o_{7}\mid o_{2})\times P(o_{9}\mid o_{2})=\frac{25}{30}\times\frac{9}{25}\times\frac{10}{25}\times\frac{9}{25}\times\frac{7}{25}\times\frac{8}{25}=0.0039.

Similarly, Score(o_{4},u_{1})=0.0005, Score(o_{6},u_{1})=0.0024, Score(o_{8},u_{1})=0.0022, and Score(o_{10},u_{1})=0.0028. Hence, it ranks the items in the order of o_{2}, o_{10}, o_{6}, o_{8}, and o_{4}.
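The example can be reproduced with a few lines of code. The sketch below recomputes the candidate scores from the Support matrix; following the worked computation above, each pairwise probability is taken with respect to the candidate item's support. The variable names are ours.

```python
import numpy as np

# Pairwise Support matrix from Example 2: S[i-1][h-1] = Support(o_i, o_h);
# the diagonal holds Support(o_i).
S = np.array([
    [20,  9,  8, 11,  7,  8,  6,  7,  7,  3],
    [ 9, 25, 10, 11,  9,  7,  7,  6,  8,  4],
    [ 8, 10, 21,  5,  7,  6,  5,  6,  4,  3],
    [11, 11,  5, 25,  6,  8,  6,  6,  3,  2],
    [ 7,  9,  7,  6, 22,  8,  7,  6,  9,  4],
    [ 8,  7,  6,  8,  8, 18,  5,  6,  4,  1],
    [ 6,  7,  5,  6,  7,  5, 15,  4,  4,  1],
    [ 7,  6,  6,  6,  6,  6,  4, 18,  7,  2],
    [ 7,  8,  4,  3,  9,  4,  4,  7, 20,  3],
    [ 3,  4,  3,  2,  4,  1,  1,  2,  3,  6],
])
n_users = 30
profile = [1, 3, 5, 7, 9]            # items consumed by u1 (1-indexed, as in the example)
candidates = [2, 4, 6, 8, 10]

def score(i, profile):
    """Relevance of candidate o_i, mirroring the worked computation above:
    each pairwise probability is divided by the candidate's support."""
    rel = S[i - 1, i - 1] / n_users
    for l in profile:
        rel *= S[l - 1, i - 1] / S[i - 1, i - 1]
    return rel

for c in candidates:
    print(f"Score(o{c}, u1) = {score(c, profile):.4f}")
# Reproduces 0.0039, 0.0005, 0.0024, 0.0022, 0.0028 and the ranking o2, o10, o6, o8, o4.
```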

Recommender systems that employ Association Mining focus on discovering associations among items that users have consumed. These systems then utilize support statistics to compute relevance scores for candidate items. The major advantage of the current model over a traditional CF approach is its ability to account for the pairwise consumption relations across all users. After identifying a subset of similar users, the traditional CF approach narrows its search down to the items consumed by the users within that neighborhood. Consequently, certain consumption patterns of items by the entire user population are excluded due to this restriction. The novelty of this approach lies in its utilization of pairwise relations between items. The current approach also differs from association rule mining [21], which derives if-then rules from user transactions. In contrast, the present method estimates the relevance score as the likelihood of consuming a candidate item given the set of items consumed by a user.

3.4 Virtual User Approach for GRS

The virtual user approach for GRS creates a virtual profile representing the group preferences and utilizes this profile to build the recommendation set [19]. A virtual user profile is created by computing the weight that indicates the group preferences for each item consumed by at least one group member. We compute the weight of an item o_{i} for a group G as follows:

weight(o_{i},G)=\sum_{u_{j}\in G}\frac{weight(o_{i},u_{j})}{\lvert G\rvert},~~\text{where} (4)
weight(o_{i},u_{j})=\begin{cases}1,&\text{if }o_{i}\in O_{j}\\ score(o_{i},u_{j}),&\text{otherwise}.\end{cases}

In our previous work, after determining the group weight for all the group items, we proposed two different ways of creating a virtual user profile: 1) Threshold-based virtual user - the virtual user profile is made up of items whose weight exceeds a certain threshold; 2) Weighted virtual user - the virtual user profile comprises all the items consumed by any group member, together with their weights. This paper adapts both of these techniques and proposes a hybrid approach that removes the items with significantly low weight and forms a weighted virtual user profile with the remaining items; a sketch of this construction is given below.
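The following is a minimal sketch of the hybrid construction just described, assuming the relevance function score(\cdot) of Equation 3 is supplied by the underlying PRS; the threshold value and the function names are illustrative choices, not fixed by the paper.

```python
def build_virtual_user_profile(group_profiles, score, tau=0.05):
    """Hybrid virtual user: weight every item consumed by at least one member
    via Equation 4, then keep only the items whose group weight exceeds tau,
    retaining those weights for the weighted virtual user profile.

    group_profiles: dict mapping each member to the set of items they consumed
    score:          callable score(item, member_profile) from the underlying PRS
    tau:            weight threshold (an illustrative default, not fixed by the paper)
    """
    group_items = set().union(*group_profiles.values())
    virtual_profile = {}
    for item in group_items:
        weights = [1.0 if item in profile else score(item, profile)
                   for profile in group_profiles.values()]
        w = sum(weights) / len(group_profiles)        # Equation 4
        if w > tau:                                   # hybrid step: drop very low-weight items
            virtual_profile[item] = w
    return virtual_profile
```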

4 Conformal Group Recommender System

The proposed approach combines conformal prediction and group recommender systems to enrich the system-generated plain group recommendations with the associated confidence value. As discussed previously, the nonconformity measures, a pivotal component of the conformal prediction framework, play a critical role in the efficiency of any conformal model and must be devised carefully after inspecting the underlying model. The nonconformity measure, which essentially measures how strange the new object is in comparison to the training set, is then used to derive a confidence value for a new item. We use the pairwise relations between the items to derive the nonconformity measure in the proposed work.

4.1 Nonconformity Measure for CGRS

Let V(G) be the virtual user representing the group G profile and O_{V(G)} be the set of weighted items in the virtual user profile. We randomly divide the virtual user profile O_{V(G)} into O_{V(G)}^{1}=\{o_{1}:w_{1},o_{2}:w_{2},\ldots,o_{n}:w_{n}\} and O_{V(G)}^{2}=\{o^{\prime}_{1},o^{\prime}_{2},\ldots,o^{\prime}_{k}\}, where O_{V(G)}^{1} is the training set that holds items known to be consumed and O_{V(G)}^{2} contains items consumed by the virtual user that are held out of the training set. Given a new candidate item o_{n+1}\in O\setminus O_{V(G)}, CGRS measures its strangeness with respect to the items in the training set. We define the nonconformity of an item o_{i}\in O_{V(G)}^{1} as the relevance score of a held-out item o^{\prime}_{j}\in O_{V(G)}^{2} computed from a profile excluding o_{i}. We compute the nonconformity value for each item in the extended training set O_{V(G)}^{3}=O_{V(G)}^{1}\cup\{o_{n+1}:1\}. To compute the nonconformity score of an item o_{i}, we use the profile O_{V(G)}^{4}=O_{V(G)}^{3}\setminus\{o_{i}:w_{i}\}. The nonconformity value of o_{i}, denoted \alpha^{i}, concerning the recommendability of an item o^{\prime}_{j} is defined as follows.

\alpha^{i}(o^{\prime}_{j})=\frac{Support(o^{\prime}_{j})}{nuser}\times\prod_{o_{l}\in O_{V(G)}^{4}}w_{l}\times P(o_{l}|o^{\prime}_{j}). (5)
Lemma 1.

Nonconformity \mathcal{A}(\cdot) of items remains consistent regardless of the permutation \pi of the items, i.e., \mathcal{A}(o_{1},o_{2},\ldots,o_{n+1})=(\alpha^{1},\alpha^{2},\ldots,\alpha^{n+1})\Rightarrow\mathcal{A}(o_{\pi(1)},o_{\pi(2)},\ldots,o_{\pi(n+1)})=(\alpha^{\pi(1)},\alpha^{\pi(2)},\ldots,\alpha^{\pi(n+1)}).

Proof.

The nonconformity computation given in Equation 5 considers the pairwise probability of a recommendable item o^{\prime}_{j} with respect to all the items of O^{4}_{V(G)} in the product term. One can easily see that the result is invariant to the order of the items in O^{4}_{V(G)}. Hence, the nonconformity scores remain unaffected even when the training set is permuted, and the proposed nonconformity measure yields consistent results regardless of the permutation of the items. ∎
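A compact sketch of Equation 5 is given below; Support(\cdot) and the pairwise probability P(o_{l}\mid o^{\prime}_{j}) are assumed to be supplied as callables computed from the Support statistics, and the names are ours. Because the score is a product over the set O^{4}_{V(G)}, reordering the training items cannot change it, which is exactly the permutation invariance used in Lemma 1.

```python
def nonconformity(i, cand, training_items, weights, support, p_cond, n_users):
    """alpha^i(o'_j) of Equation 5: relevance of the held-out item `cand` (o'_j)
    computed from the extended training set with item o_i left out.

    training_items: items of O^3_{V(G)} (O^1_{V(G)} plus the tentative item o_{n+1})
    weights:        dict item -> weight w_l (the tentative item has weight 1)
    support:        callable support(item) -> Support(o'_j)
    p_cond:         callable p_cond(l, j) -> P(o_l | o'_j)
    """
    rel = support(cand) / n_users
    for l in training_items:
        if l == i:                     # leave o_i out: the product runs over O^4 = O^3 \ {o_i}
            continue
        rel *= weights[l] * p_cond(l, cand)
    return rel
# The product runs over a set, so permuting training_items cannot change alpha^i (Lemma 1).
```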

4.2 Recommendation using p-values

The p-value of a new item o_{n+1} is defined as the proportion of items o_{i} in the extended training set O_{V(G)}^{3} such that the score \alpha^{i}(o^{\prime}_{j}) is greater than or equal to \alpha^{n+1}(o^{\prime}_{j}).

p(o_{n+1},o^{\prime}_{j})=\frac{\big\lvert\{o_{i}\mid 1\leq i\leq n+1,\,\alpha^{i}(o^{\prime}_{j})\geq\alpha^{n+1}(o^{\prime}_{j})\}\big\rvert}{n+1} (6)

It can also be seen that the disagreement between \alpha^{h}(o^{\prime}_{j}) and \alpha^{t}(o^{\prime}_{j}) for o_{h}\neq o_{t} is determined by the difference between the weighted probabilities w_{h}\times P(o_{h}|o^{\prime}_{j}) and w_{t}\times P(o_{t}|o^{\prime}_{j}). Formally, \alpha^{h}(o^{\prime}_{j})\geq\alpha^{t}(o^{\prime}_{j}) if and only if (w_{h}\times P(o_{h}|o^{\prime}_{j}))\leq(w_{t}\times P(o_{t}|o^{\prime}_{j})).

Lemma 2.

\alpha^{h}(o^{\prime}_{j})\geq\alpha^{t}(o^{\prime}_{j}) if and only if (w_{h}\times P(o_{h}|o^{\prime}_{j}))\leq(w_{t}\times P(o_{t}|o^{\prime}_{j})).

Proof.
\alpha^{h}(o^{\prime}_{j})=\frac{Support(o^{\prime}_{j})}{nuser}\times\prod_{o_{l}\in O_{V(G)}^{4}}w_{l}\times P(o_{l}|o^{\prime}_{j})=\frac{Support(o^{\prime}_{j})}{nuser}\times(w_{1}\times P(o_{1}|o^{\prime}_{j}))\times\ldots\times(w_{h-1}\times P(o_{h-1}|o^{\prime}_{j}))\times(w_{h+1}\times P(o_{h+1}|o^{\prime}_{j}))\times\ldots\times(w_{n+1}\times P(o_{n+1}|o^{\prime}_{j})).

Similarly,

\alpha^{t}(o^{\prime}_{j})=\frac{Support(o^{\prime}_{j})}{nuser}\times(w_{1}\times P(o_{1}|o^{\prime}_{j}))\times\ldots\times(w_{t-1}\times P(o_{t-1}|o^{\prime}_{j}))\times(w_{t+1}\times P(o_{t+1}|o^{\prime}_{j}))\times\ldots\times(w_{n+1}\times P(o_{n+1}|o^{\prime}_{j})).

The only term missing in the \alpha^{h} computation is (w_{h}\times P(o_{h}|o^{\prime}_{j})), and similarly, the only term missing in the \alpha^{t} computation is (w_{t}\times P(o_{t}|o^{\prime}_{j})). The remaining terms are the same in both computations. Hence, \alpha^{h}(o^{\prime}_{j})\geq\alpha^{t}(o^{\prime}_{j}) if and only if (w_{h}\times P(o_{h}|o^{\prime}_{j}))\leq(w_{t}\times P(o_{t}|o^{\prime}_{j})). ∎

Hence, to streamline the computation, comparing the weighted probabilities is a better choice than comparing the scores directly. Thus, we redefine p(o_{n+1},o^{\prime}_{j}) given in Equation 6 as follows.

p(o_{n+1},o^{\prime}_{j})=\frac{\big\lvert\{o_{h}\mid 1\leq h\leq n+1,\,w_{h}\times P(o_{h}|o^{\prime}_{j})\leq w_{n+1}\times P(o_{n+1}|o^{\prime}_{j})\}\big\rvert}{n+1} (7)

The final p-value is then computed by taking the average of the p-values concerning each candidate item in the set O^{2}_{V(G)}. A higher average p-value of an item implies a greater chance of being recommended.

p(o_{n+1})=\frac{\sum_{o^{\prime}_{j}\in O^{2}_{V(G)}}p(o_{n+1},o^{\prime}_{j})}{k} (8)

Recommendation set \Gamma^{\varepsilon} with (1-\varepsilon) confidence: We determine the p-value for each candidate item, and if the p-value is more than \varepsilon, we include it in the recommendation set; otherwise, we do not.

\Gamma^{\varepsilon}=\{o_{j}\mid o_{j}\in O\setminus O_{V(G)},\,p(o_{j})>\varepsilon\}
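A minimal sketch of this p-value computation and the resulting recommendation set is shown below; it assumes the weighted virtual user profile and a callable for P(o_{h}\mid o^{\prime}_{j}) are available, and the function names are ours.

```python
def p_value_single(cand, training_items, weights, p_cond):
    """p(o_{n+1}, o'_j), Equation 7: fraction of items in the extended training set
    whose weighted probability w_h * P(o_h | o'_j) does not exceed that of o_{n+1}.
    The tentative item o_{n+1} is the last element of training_items."""
    new = training_items[-1]
    new_val = weights[new] * p_cond(new, cand)
    count = sum(weights[h] * p_cond(h, cand) <= new_val for h in training_items)
    return count / len(training_items)

def conformal_recommend(candidates, calibration_items, training_items, weights, p_cond, eps):
    """Gamma^eps: keep every candidate whose averaged p-value (Equation 8) exceeds eps."""
    gamma = []
    for o in candidates:                                   # o ranges over O \ O_{V(G)}
        items = training_items + [o]                       # O^3 = O^1 U {o_{n+1}}
        w = {**weights, o: 1.0}                            # tentative item gets weight 1
        p = sum(p_value_single(c, items, w, p_cond)        # Equation 8
                for c in calibration_items) / len(calibration_items)
        if p > eps:
            gamma.append(o)
    return gamma
```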

The proposed conformal group recommender algorithm is outlined in Algorithm 1.

Input: O, Group G, O_{j}~\forall u_{j}\in G, Support, \varepsilon

Output: Recommendation set (\Gamma^{\varepsilon})

O_{V(G)}\leftarrow Build-Virtual-User-Profile(Support, O_{j}~\forall u_{j}\in G)
Split O_{V(G)} into two sets O^{1}_{V(G)} and O^{2}_{V(G)}
\Gamma^{\varepsilon}\leftarrow\emptyset
for each o^{\prime}_{h} in O^{2}_{V(G)} do
       Compute \alpha^{h} using Equation 5
end for
for each o\in O\setminus\{O_{j},\forall u_{j}\in G\} do
       o_{n+1}\leftarrow o
       O^{3}_{V(G)}\leftarrow O^{1}_{V(G)}\cup\{o_{n+1}\}
       for each o^{\prime}_{j} in O^{2}_{V(G)} do
              Compute P(o_{h}|o^{\prime}_{j}), h\in[1,n+1]
              Compute p(o_{n+1},o^{\prime}_{j}) using Equation 7
       end for
       Compute p(o_{n+1}) using Equation 8
       if p(o_{n+1})>\varepsilon then \Gamma^{\varepsilon}\leftarrow\Gamma^{\varepsilon}\cup\{o\}
end for
Algorithm 1: Conformal Group Recommender System.

Input: Support, O_{j}~\forall u_{j}\in G

Output: Virtual user V(G) profile O_{V(G)}

O_{V(G)}\leftarrow\emptyset
\tau\leftarrow input('Threshold')
Group\text{-}items\leftarrow\bigcup_{u_{j}\in G}O_{j}
for each o_{i}\in Group\text{-}items do
       Compute w_{i}=weight(o_{i},G) using Equation 4
       if w_{i}>\tau then O_{V(G)}\leftarrow O_{V(G)}\cup\{o_{i}:w_{i}\}
end for
Algorithm 2: Build-Virtual-User-Profile

4.3 Validity and Efficiency

The property of validity ensures that the probability of committing an error will not surpass \varepsilon for the recommendation set \Gamma^{\varepsilon}. This indicates that the probability of not recommending a user's interesting item o is bounded by \varepsilon, i.e., P(p(o)\leq\varepsilon)\leq\varepsilon. Lemma 1, discussed in the previous subsection, forms the basis for proving the validity of the proposed approach. The following lemma establishes that the proposed CGRS fulfills the validity property, employing the line of reasoning provided by Shafer and Vovk [22].

Lemma 3.

Assuming that objects o_{1},o_{2},\ldots,o_{n+1} are distributed independently and identically concerning their precedence relations with the training data, the probability of the error that o_{n+1} does not belong to \Gamma^{\varepsilon} will be at most \varepsilon\in[0,1], i.e., P(p(o_{n+1})\leq\varepsilon)\leq\varepsilon.

Proof.

When the p-value of a new item o_{n+1} to be recommended is less than or equal to \varepsilon, i.e., p(o_{n+1})\leq\varepsilon, we regard that as an error. The p-value is at most \varepsilon when \frac{\sum_{o^{\prime}_{j}\in O^{2}_{V(G)}}p(o_{n+1},o^{\prime}_{j})}{k}\leq\varepsilon holds. To determine the probability of p(o_{n+1},o^{\prime}_{j})\leq\varepsilon for each o^{\prime}_{j}, we consider the case where \alpha^{n+1}(o^{\prime}_{j}) is among the \lfloor\varepsilon(n+1)\rfloor largest elements of the set \{\alpha^{1}(o^{\prime}_{j}),\alpha^{2}(o^{\prime}_{j}),\ldots,\alpha^{n+1}(o^{\prime}_{j})\}. When all objects \{o_{1},o_{2},\ldots,o_{n+1}\} are independently and identically distributed with respect to their precedence relations with o^{\prime}_{j}, all permutations of the set \{\alpha^{1}(o^{\prime}_{j}),\alpha^{2}(o^{\prime}_{j}),\ldots,\alpha^{n+1}(o^{\prime}_{j})\} are equiprobable. Thus, the probability that \alpha^{n+1}(o^{\prime}_{j}) is among the \lfloor\varepsilon(n+1)\rfloor largest elements does not exceed \varepsilon. Consequently, P(p(o_{n+1},o^{\prime}_{j})\leq\varepsilon)\leq\varepsilon. In the scenario where the p-values p(o_{n+1},o^{\prime}_{j}) are uniformly distributed, their mean, denoted p(o_{n+1}), will not exceed \varepsilon. Hence, P(p(o_{n+1})\leq\varepsilon)\leq\varepsilon, which bounds the probability of error. ∎

Apart from meeting the validity property, possessing an efficient recommendation set is desirable. Within the conformal framework, an efficient set is characterized by narrower intervals and higher confidence levels. We experimentally investigate the efficiency aspects in Section 5 using standard performance metrics.

4.4 Time complexity analysis

In this section, we analyze the time complexity of the proposed approach. The time complexity of creating the virtual user profile is O(nitem\times n_{i}\times\lvert G\rvert), where nitem is the total number of items, \lvert G\rvert is the group size, and n_{i} is the number of items in each group member's profile (if the members have consumed different numbers of items, n_{i} denotes the maximum individual profile size). Assuming (n+k) items in the virtual user profile, with n items in O^{1}_{V(G)} and k items in O^{2}_{V(G)}, the time complexity of the p-value computation for each candidate item is O(nk). To form the recommendation set, we compute the p-value for all the candidate items. Hence, the total complexity is O(n\times k\times nitem+nitem\times n_{i}\times\lvert G\rvert).

5 Empirical Study

This section reports the empirical assessment of our proposed conformal group recommender system (CGRS). Experiments were carried out on eight benchmark datasets, demonstrating that CGRS achieves higher prediction accuracy than the baseline GRS model [19]. Table 1 presents the details of the datasets used for the experiments.

Table 1: Experimental Datasets Description.
Dataset Users Items Records
MovieLens 100K 943 1,682 100,000
MovieLens-latest-small 707 8,553 100,024
MovieLens 1M 6,040 3,952 1,000,209
Personality-2018 1,820 35,196 1,028,752
MovieLens 10M 71,567 10,681 10,000,054
MovieLens 20M 138,494 26,745 20,000,262
MovieLens-latest 229,061 26,780 21,063,128
MovieLens 25M 162,000 62,000 25,000,096

In the preprocessing stage, the items available in a user's profile are ordered according to their timestamps. We remove the user profiles having fewer than 20 ratings and perform a train-test division for each user in a 6:4 ratio. The virtual user profile is constructed using the data available in the training sets. Further, in the case of CGRS, for splitting the virtual user profile into two groups, we fine-tuned different combinations of sizes and selected 75% of the virtual user profile as O_{V(G)}^{1} and the remainder as O_{V(G)}^{2}. We analyzed the results in two different kinds of group settings, namely homogeneous and random. In the homogeneous group setting, the candidates for forming a group are chosen in such a way that they jointly share at least some percentage of common items. Given the group size g, we form a homogeneous group if each user has at least 100/(2g) percent of common items. In the random group setting, the candidates for forming a group are chosen randomly from the list of users. A sketch of this group-formation procedure is given below.
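The sketch below illustrates one possible reading of the group-formation criterion: a member's common-item percentage is measured against the items shared by all other members, and candidate groups are resampled until the 100/(2g) threshold holds. The function names and the retry limit are our illustrative choices.

```python
import random

def common_fraction(profile, other_profiles):
    """Percentage of a member's items that all other group members also consumed."""
    shared = set.intersection(*other_profiles) & profile
    return 100 * len(shared) / len(profile)

def sample_homogeneous_group(user_profiles, g, max_tries=10_000):
    """Draw g users such that each member shares at least 100/(2*g) percent of
    their items with the rest of the group."""
    users, threshold = list(user_profiles), 100 / (2 * g)
    for _ in range(max_tries):
        group = random.sample(users, g)
        profiles = [set(user_profiles[u]) for u in group]
        if all(common_fraction(profiles[i], profiles[:i] + profiles[i + 1:]) >= threshold
               for i in range(g)):
            return group
    return None                      # no qualifying group found within max_tries

def sample_random_group(user_profiles, g):
    """Random setting: g members drawn uniformly from the user list."""
    return random.sample(list(user_profiles), g)
```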

Performance Metrics: To assess the effectiveness of the proposed CGRS approach, we used the following evaluation metrics in our experiments.
Precision: Precision represents the proportion of items recommended by a system that are relevant to the user’s interests.

Precision=\frac{\lvert recommended\cap used\rvert}{\lvert recommended\rvert},

where recommendedrecommended is the set of all items recommended by the system, and usedused is the set of relevant items that the user used or interacted with.

Recall: Recall measures the proportion of relevant items that were actually recommended out of all the relevant items that exist for the user.

Recall=\frac{\lvert recommended\cap used\rvert}{\lvert used\rvert}.

F1-score: The F1-score is defined as the harmonic mean of precision and recall, and ranges from 0 to 1, where a higher score indicates a better performance.

F1=2\times\frac{precision\times recall}{precision+recall}.
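For concreteness, a small sketch of these three set-based metrics is given below, including top-k evaluation; the function and variable names are ours.

```python
def precision_recall_f1(recommended, used, k=None):
    """Precision, Recall and F1 of a recommendation list against the items the
    group actually consumed; pass k to evaluate only the top-k recommendations."""
    rec = list(recommended)[:k] if k else list(recommended)
    hits = len(set(rec) & set(used))
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(used) if used else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: top-5 evaluation of a ranked recommendation list.
print(precision_recall_f1(["o2", "o10", "o6", "o8", "o4"], used={"o2", "o6", "o9"}, k=5))
# -> (0.4, 0.666..., 0.5)
```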

Normalized Discounted Cumulative Gain (NDCG): NDCG is a widely used ranking quality metric that measures a recommendation system’s effectiveness based on the recommended items’ graded relevance. NDCG calculates the ratio of the Discounted Cumulative Gain (DCG) of the recommended order to the Ideal DCG (iDCG) of the order. DCG measures the relevance and position of the recommended items, while iDCG represents the highest possible DCG that can be obtained for a given set of relevant items.

NDCG=\frac{DCG}{iDCG},

where DCG is calculated as

DCG=\sum_{i=1}^{n}\frac{2^{rel_{i}}-1}{\log_{2}(i+1)}

and iDCG is calculated by sorting the set of relevant items in decreasing order of their relevance scores.

Reciprocal Rank (RR): RR is the reciprocal of the rank at which the first relevant item (an item the group actually consumes) appears in the recommendation list. Letting r denote that rank, the RR score is computed as below; a higher RR score indicates a higher ranking quality of the recommended items.

RR=\frac{1}{r}.

Average Precision (AP): AP measures the quality of ranked lists. It is calculated as the average of the precision values at each position in the ranked list where a relevant item is found. Given N items in the recommendation set and m relevant items actually consumed by the user, the average precision score is calculated as follows

AP=\frac{1}{m}\sum_{k=1}^{N}P(k)\times rel(k),

where P(k) is the precision over the top-k items in the recommendation set, and rel(k) is 1 if the item at the k-th position of the recommendation set is actually consumed by the user, and 0 otherwise.

Area Under Curve (AUC): AUC measures the probability that a randomly chosen relevant item is ranked higher than a randomly chosen irrelevant item. AUC ranges between 0 and 1, with a higher value indicating better ranking performance of the recommendation system.
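The ranking metrics above can likewise be sketched in a few lines; the snippet below assumes binary relevance labels given in ranked order (graded relevance also works for NDCG) and uses our own function names.

```python
import math

def ndcg(ranked_rel):
    """NDCG for a ranked list of relevance values (binary or graded)."""
    dcg = sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(ranked_rel))
    ideal = sorted(ranked_rel, reverse=True)
    idcg = sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

def reciprocal_rank(ranked_rel):
    """1/r, where r is the 1-based rank of the first relevant item (0 if none)."""
    for i, r in enumerate(ranked_rel, start=1):
        if r > 0:
            return 1.0 / i
    return 0.0

def average_precision(ranked_rel, m):
    """AP: mean of precision@k over the positions k holding relevant items;
    m is the total number of relevant (consumed) items."""
    hits, total = 0, 0.0
    for k, r in enumerate(ranked_rel, start=1):
        if r > 0:
            hits += 1
            total += hits / k
    return total / m if m else 0.0

def auc(pos_scores, neg_scores):
    """Probability that a random relevant item is scored above a random irrelevant one."""
    pairs = [(p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores]
    return sum(pairs) / len(pairs) if pairs else 0.0

ranked = [1, 0, 1, 0, 0]           # relevance of the recommended items, in ranked order
print(ndcg(ranked), reciprocal_rank(ranked), average_precision(ranked, m=3))  # ~0.92, 1.0, ~0.56
print(auc([0.9, 0.7], [0.4, 0.8]))                                            # 0.75
```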

5.1 Experimental Results

This section reports the experimental analysis to study the performance of CGRS. The experiments are conducted on standard benchmark datasets mentioned in Table 1. The evaluation is based on traditional metrics such as Precision, Recall, F1-score, NDCG, RR, AP, and AUC. We also analyze the proposed approach by varying the group sizes. All the results reported here are the average of 500 randomly generated instances.

5.1.1 Performance over different datasets

We evaluated the Conformal Group Recommender System (CGRS) against the Group Recommender System (GRS) [19], using the performance metrics mentioned earlier in this section. GRS [19] serves as the foundation for our proposed CGRS. We conducted experiments using the datasets listed in Table 1, and the results indicate that the conformal approach improves recommendation accuracy and reliability by providing confidence. Although we present the results for a group size of 2, we observed similar outcomes for other group sizes.

Homogeneous Groups: In the first set of experiments, we assessed the effectiveness of our proposed approach in the homogeneous group setting. The performance results for the homogeneous group setting are presented in Table 2.

AP RR AUC NDCG
Dataset GRS CGRS GRS CGRS GRS CGRS GRS CGRS
ML 100K 0.20563 0.22068 0.74760 0.74972 0.88378 0.89307 0.56149 0.57118
ML-LS 0.18439 0.25334 0.75768 0.80948 0.94065 0.95401 0.55016 0.58754
ML 1M 0.20712 0.21363 0.81423 0.81464 0.88204 0.88616 0.60240 0.60578
ML 10M 0.21730 0.21929 0.74704 0.74891 0.97131 0.97307 0.55389 0.55569
ML 20M 0.19713 0.19925 0.70836 0.70961 0.98494 0.98589 0.54107 0.54267
ML 25M 0.18300 0.18455 0.75612 0.75612 0.99177 0.99231 0.54850 0.54979
ML Latest 0.19361 0.19494 0.66250 0.66250 0.98810 0.98850 0.52236 0.52341
Per-2018 0.21536 0.27154 0.89691 0.90925 0.96067 0.96489 0.63456 0.65580
Table 2: Performance comparison between GRS and CGRS for different datasets in the homogeneous group setting. The proposed CGRS outperforms GRS in terms of AP, RR, AUC, and NDCG for all the datasets.

In Figure 1, we present a comprehensive comparison of precision scores for different datasets, while varying the top-K parameter. The results indicate that our proposed CGRS outperforms the underlying GRS in terms of precision scores. It is noteworthy that we obtained similar results for recall and F1-score as well, which are presented in Figures 2 and 3, respectively. These findings demonstrate that the proposed CGRS can deliver better recommendation accuracy compared to the traditional GRS, and can be particularly useful in applications where precise recommendations are crucial.

Figure 1: Comparison of Precision between CGRS and GRS for MovieLens 100K, MovieLens-latest-small, MovieLens 1M, MovieLens 10M, MovieLens 20M, MovieLens 25M, MovieLens latest, and Personality 2018 for homogeneous groups. The figures are arranged from left to right and top to bottom.
Figure 2: Comparison of Recall between CGRS and GRS for MovieLens 100K, MovieLens-latest-small, MovieLens 1M, MovieLens 10M, MovieLens 20M, MovieLens 25M, MovieLens latest, and Personality 2018 for homogeneous groups. The figures are arranged from left to right and top to bottom.
Figure 3: Comparison of F1-score between CGRS and GRS for MovieLens 100K, MovieLens-latest-small, MovieLens 1M, MovieLens 10M, MovieLens 20M, MovieLens 25M, MovieLens latest, and Personality 2018 for homogeneous groups. The figures are arranged from left to right and top to bottom.

Random Groups: In this set of experiments, we focus on the performance results for the random group setting. Table 3 displays the AP, RR, AUC, and NDCG values of CGRS and GRS. Our findings indicate that the CGRS approach continues to outperform the GRS approach in the random group setting. Additionally, Figures 4, 5, and 6 provide visual comparisons of the Precision, Recall, and F1-score metrics for each dataset, respectively.

AP RR AUC NDCG
Dataset GRS CGRS GRS CGRS GRS CGRS GRS CGRS
ML 100K 0.18899 0.20271 0.62167 0.63175 0.87293 0.87784 0.54847 0.55597
ML-LS 0.12772 0.19950 0.51417 0.57387 0.91507 0.92548 0.50005 0.53692
ML 1M 0.14462 0.14849 0.52104 0.52397 0.87487 0.87733 0.52774 0.53016
ML 10M 0.12972 0.20035 0.52734 0.58781 0.91666 0.92339 0.50428 0.54104
ML 20M 0.14143 0.14143 0.50034 0.50107 0.97454 0.97538 0.50591 0.50745
ML 25M 0.13341 0.13665 0.52560 0.52754 0.98546 0.98603 0.51164 0.51330
ML Latest 0.10681 0.10915 0.36672 0.36697 0.98266 0.98297 0.41387 0.41514
Per-2018 0.18746 0.23878 0.77307 0.79475 0.95812 0.96218 0.59667 0.61593
Table 3: Performance comparison between GRS and CGRS for different datasets in the random group setting. The proposed CGRS outperforms GRS in terms of AP, RR, AUC, and NDCG for all the datasets.
Figure 4: Comparison of Precision between CGRS and GRS for MovieLens 100K, MovieLens-latest-small, MovieLens 1M, MovieLens 10M, MovieLens 20M, MovieLens 25M, MovieLens latest, and Personality 2018 for random groups. The figures are arranged from left to right and top to bottom.
Figure 5: Comparison of Recall between CGRS and GRS for MovieLens 100K, MovieLens-latest-small, MovieLens 1M, MovieLens 10M, MovieLens 20M, MovieLens 25M, MovieLens latest, and Personality 2018 for random groups. The figures are arranged from left to right and top to bottom.
Figure 6: Comparison of F1-score between CGRS and GRS for MovieLens 100K, MovieLens-latest-small, MovieLens 1M, MovieLens 10M, MovieLens 20M, MovieLens 25M, MovieLens latest, and Personality 2018 for random groups. The figures are arranged from left to right and top to bottom.

5.1.2 Effect of Varying Group Sizes on Performance of CGRS

This subsection presents the comparative performance analysis of CGRS and GRS for homogeneous and random groups with varying group sizes. The CGRS method consistently outperforms GRS, as shown in the previous section.

Homogeneous Groups: Table 4 shows the performance metric values for varying group sizes. We present results specific to the Personality-2018 dataset in the table; similar outcomes have been observed on the other datasets.

AP RR AUC NDCG
Group size GRS CGRS GRS CGRS GRS CGRS GRS CGRS
2 0.21536 0.27154 0.89691 0.90925 0.96067 0.96489 0.63456 0.65580
3 0.15776 0.19241 0.90760 0.92286 0.95538 0.95840 0.52325 0.53786
4 0.10855 0.13315 0.87872 0.88972 0.95073 0.95301 0.43393 0.44459
5 0.07871 0.09692 0.80740 0.82353 0.94608 0.94773 0.36742 0.37538
6 0.06259 0.07651 0.78733 0.81640 0.94102 0.94229 0.32410 0.33045
Table 4: Performance comparison between GRS and CGRS on the Personality-2018 dataset for the homogeneous group setting. The proposed CGRS method shows better results in terms of AP, RR, AUC, and NDCG for all the group sizes.

We also provide visual comparisons of the Precision, Recall, and F1-score metrics for different group sizes using Figures 7, 8 and 9, respectively. Our results show that the proposed CGRS method outperforms GRS in terms of precision, recall, and F1-score for homogeneous groups with various group sizes. The improvement is particularly noticeable in the precision metric.

Figure 7: Comparison of Precision between the conformal and base method of the recommender system for varying homogeneous group sizes, including group sizes 2, 3, 4, 5, and 6. The figures are arranged from left to right and top to bottom.
Figure 8: Comparison of Recall between the conformal and base method of the recommender system for varying homogeneous group sizes, including group sizes 2, 3, 4, 5, and 6. The figures are arranged from left to right and top to bottom.
Figure 9: Comparison of F1-score between the conformal and base method of the recommender system for varying homogeneous group sizes, including group sizes 2, 3, 4, 5, and 6. The figures are arranged from left to right and top to bottom.

Random groups: In this series of experiments, we investigate the performance of the proposed CGRS compared to the base method for random groups by varying the group sizes. Table 5 shows the results we obtained for the Personality-2018 dataset. Our findings indicate that the CGRS approach continues to outperform the GRS approach, and we observe a similar trend in the random group setting.

AP RR AUC NDCG
Group size GRS CGRS GRS CGRS GRS CGRS GRS CGRS
2 0.18746 0.23878 0.77307 0.79475 0.95811 0.96218 0.59667 0.61593
3 0.14278 0.17449 0.70996 0.73890 0.95421 0.95736 0.51699 0.52834
4 0.11274 0.13342 0.65856 0.67120 0.94705 0.94906 0.44609 0.4519
5 0.09147 0.11074 0.63788 0.65718 0.94661 0.94793 0.40132 0.40617
6 0.07448 0.08751 0.58739 0.59873 0.93640 0.93714 0.35557 0.35651
Table 5: Performance comparison between GRS and CGRS for Personality-2018 dataset for the random group setting. The proposed CGRS method shows better results in terms of AP, RR, AUC and NDCG for all the group sizes.

Figures 10, 11, and 12 depict a comparison between the Precision, Recall, and F1-score of the proposed CGRS and the base method for random group settings. The results indicate that the proposed CGRS method achieves better precision values compared to the base method for random groups with different group sizes. While CGRS demonstrates better recall and F1-score for random groups of size 2, GRS yields better values for the other group sizes. In summary, our findings confirm that the proposed CGRS achieves better accuracy than the base GRS while making the recommendation more transparent with the added confidence value.

Figure 10: Comparison of Precision between the conformal and base method of the recommender system for varying random group sizes, including group sizes 2, 3, 4, 5, and 6. The figures are arranged from left to right and top to bottom.
Figure 11: Comparison of Recall between the conformal and base method of the recommender system for varying random group sizes, including group sizes 2, 3, 4, 5, and 6. The figures are arranged from left to right and top to bottom.
Figure 12: Comparison of F1-score between the conformal and base method of the recommender system for varying random group sizes, including group sizes 2, 3, 4, 5, and 6. The figures are arranged from left to right and top to bottom.

6 Conclusions and Discussion

This paper introduces a conformal framework to the group recommendation scenario for reliable recommendation. The theoretical results demonstrate that the likelihood of the proposed CGRS making an error is bounded by the given significance level \varepsilon, and hence the system exhibits a confidence of (1-\varepsilon). In addition to furnishing a confidence measure of reliability, the proposed method also improves the quality of recommendations. Our experimental analysis on various benchmark datasets corroborates that the proposed CGRS performs better than the baseline GRS approach in terms of different standard performance metrics assessing recommendation quality. Extending the proposed framework to other group recommendation algorithms is a goal worth pursuing in the future. Further, investigating a conformal framework that efficiently furnishes confidence for complex group recommender algorithms, such as deep learning-based models, is also an exciting direction for future research.

References

  • [1] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to recommender systems handbook. In Recommender systems handbook, pages 1–35. Springer, 2011.
  • [2] Michael J Pazzani and Daniel Billsus. Content-based recommendation systems. In The adaptive web: methods and strategies of web personalization, pages 325–341. Springer, 2007.
  • [3] Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. Content-based recommender systems: State of the art and trends. Recommender systems handbook, pages 73–105, 2011.
  • [4] Yehuda Koren, Steffen Rendle, and Robert Bell. Advances in collaborative filtering. Recommender systems handbook, pages 91–142, 2021.
  • [5] Vikas Kumar, Arun K Pujari, Sandeep Kumar Sahu, Venkateswara Rao Kagita, and Vineet Padmanabhan. Collaborative filtering using multiple binary maximum margin matrix factorizations. Information Sciences, 380:1–11, 2017.
  • [6] KH Salman, Arun K Pujari, Vikas Kumar, and Sowmini Devi Veeramachaneni. Combining swarm with gradient search for maximum margin matrix factorization. In PRICAI 2016: Trends in Artificial Intelligence: 14th Pacific Rim International Conference on Artificial Intelligence, Phuket, Thailand, August 22-26, 2016, Proceedings 14, pages 167–179. Springer, 2016.
  • [7] Robin Burke. Hybrid recommender systems: Survey and experiments. User modeling and user-adapted interaction, 12:331–370, 2002.
  • [8] Svetlin Bostandjiev, John O’Donovan, and Tobias Höllerer. Tasteweights: a visual interactive hybrid recommender system. In Proceedings of the sixth ACM conference on Recommender systems, pages 35–42, 2012.
  • [9] Venkateswara Rao Kagita, Arun K Pujari, and Vineet Padmanabhan. Group recommender systems: A virtual user approach based on precedence mining. In AI 2013: Advances in Artificial Intelligence: 26th Australasian Joint Conference, Dunedin, New Zealand, December 1-6, 2013. Proceedings 26, pages 434–440. Springer, 2013.
  • [10] Venkateswara Rao Kagita, Vineet Padmanabhan, and Arun K Pujari. Precedence mining in group recommender systems. In Pattern Recognition and Machine Intelligence: 5th International Conference, PReMI 2013, Kolkata, India, December 10-14, 2013. Proceedings 5, pages 701–707. Springer, 2013.
  • [11] Venkateswara Rao Kagita, Arun K Pujari, Vineet Padmanabhan, Sandeep Kumar Sahu, and Vikas Kumar. Conformal recommender system. Information Sciences, 405:157–174, 2017.
  • [12] Venkateswara Rao Kagita, Arun K Pujari, Vineet Padmanabhan, and Vikas Kumar. Inductive conformal recommender system. Knowledge-Based Systems, 250:109108, 2022.
  • [13] Venkateswara Rao Kagita, Arun K Pujari, Vineet Padmanabhan, Sandeep Kumar Sahu, and Vikas Kumar. Conformal recommender system. Information Sciences, 405:157–174, 2017.
  • [14] Joseph F McCarthy and Theodore D Anagnost. Musicfx: an arbiter of group preferences for computer supported collaborative workouts. In Proceedings of the 1998 ACM conference on Computer supported cooperative work, pages 363–372, 1998.
  • [15] Henry Lieberman, Neil W Van Dyke, and Adrian S Vivacqua. Let’s browse: a collaborative web browsing agent. In Proceedings of the 4th international conference on Intelligent user interfaces, pages 65–68, 1998.
  • [16] Mark O’connor, Dan Cosley, Joseph A Konstan, and John Riedl. Polylens: A recommender system for groups of users. In ECSCW 2001, pages 199–218. Springer, 2001.
  • [17] Anthony Jameson. More than the sum of its members: challenges for group recommender systems. In Proceedings of the working conference on Advanced visual interfaces, pages 48–54, 2004.
  • [18] Inma Garcia, Laura Sebastia, Eva Onaindia, and Cesar Guzman. A group recommender system for tourist activities. In International conference on electronic commerce and web technologies, pages 26–37. Springer, 2009.
  • [19] Venkateswara Rao Kagita, Arun K Pujari, and Vineet Padmanabhan. Virtual user approach for group recommender systems using precedence relations. Information Sciences, 294:15–30, 2015.
  • [20] Aditya G Parameswaran, Georgia Koutrika, Benjamin Bercovitz, and Hector Garcia-Molina. Recsplorer: recommendation algorithms based on precedence mining. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 87–98, 2010.
  • [21] Yong Soo Kim and Bong-Jin Yum. Recommender system based on click stream data using association rule mining. Expert Systems with Applications, 38(10):13320–13327, 2011.
  • [22] Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(3), 2008.