
A Generic Reinforced Explainable Framework with Knowledge Graph for Session-based Recommendation

Abstract

Session-based recommendation (SR) has gained increasing attention in recent years. A great number of studies have been devoted to designing complex algorithms to improve recommendation performance, with deep learning methods accounting for the majority. However, most of these methods are black-box ones and fail to provide appropriate explanations to facilitate users’ understanding, which might lead to lowered user satisfaction and reduced system revenues. Therefore, in our study, we propose a generic Reinforced Explainable framework with Knowledge graph for Session-based recommendation (i.e., REKS), which strives to improve existing black-box SR models (denoted as non-explainable ones) with a Markov decision process. In particular, we construct a knowledge graph from session behaviors and treat SR models as part of the policy network of the Markov decision process. Based on our particularly designed reward strategy and loss function, the reinforcement learning (RL)-based algorithm not only achieves improved recommendation accuracy, but also provides appropriate explanations at the same time. Finally, by instantiating REKS with five representative, state-of-the-art SR models (i.e., GRU4REC, NARM, SR-GNN, GCSAN, BERT4REC), extensive experiments on three Amazon datasets demonstrate the effectiveness of our framework on both recommendation and explanation tasks.

Index Terms:
explainable recommendation, session-based recommendation, knowledge graph

I Introduction

With the rapid development of Internet technologies, session-based recommendation (SR), which focuses on predicting a user’s next item(s) of interest based on an anonymous session, plays an increasingly important role [wang2021survey]. Most existing studies address this task by exploiting complex models like Markov Chains (MC) [le2016modeling] and deep learning (DL) methods, e.g., recurrent neural networks (RNN) [li2017neural], attention mechanisms [sun2019bert4rec], and graph neural networks (GNN) [wang2020global, pang2022heterogeneous]. Although SR has been widely studied, most previous work focuses on improving the recommendation accuracy of the system. Quite a series of SR methods (denoted as non-explainable SR), especially deep learning-based ones, have an unclear internal mechanism and are generally criticized as “black boxes”. The lack of transparency and explainability makes it difficult for users to understand the recommendation results, and thus they may be reluctant to accept them. This further leads to lowered user satisfaction with the online experience and reduced system revenues. Therefore, explainable recommender systems, which aim to provide appropriate explanations for recommendations besides pursuing accuracy, are in urgent need.

On the other hand, the knowledge graph (KG) has recently become one of the increasingly preferred choices for developing explainable recommender systems due to its rich information for depicting users/items, as well as their relationships with other entities and attributes. A typical KG consists of entity-relation-entity triplets, where in session-based recommendation scenarios an entity can be a user or a product, and a relation can be “purchase” or “also_buy” between two corresponding entities. Figure 1 illustrates a KG example from the Amazon dataset (jmcauley.ucsd.edu/data/amazon/links.html). As we can see, there is a session of three products in chronological order. The next item recommended by a certain SR model is a phone, mainly because the last product of this session is also a phone.

Refer to caption
Figure 1: An Amazon example of a knowledge graph with session information.

KG was first adopted in traditional recommender systems to facilitate the explanation task, and the existing methods fall into two categories: post-hoc explainable methods and KG path-based methods. The first category [ai2018learning, wang2018ripplenet] consists of two-stage methods, where KG embedding-based techniques are first adopted to learn entity/relation representations from the built KG, which are integrated into recommendation models to provide more accurate results. Then, explanation paths obtained from the KG are generated to give reasonable explanations for the corresponding recommendations. On the contrary, the second category [wang2019explainable] consists of end-to-end methods, since the explanations (mostly also KG-based paths) are obtained simultaneously with the recommendation lists, that is, the explanation and recommendation tasks are fulfilled together, by using neural symbolic reasoning [xian2020cafe] or reinforcement learning [zhao2020leveraging]. Although the two types generate explanations in different ways, both obtain path-based explanations from the KG. Besides, KG is also adopted in the sequential recommendation task, which also considers interactions in chronological order, but uses user information and treats all of a user’s historical interactions as one sequence, differing from session-based recommendation. For example, IUM [huang2019explainable] designs an explainable interaction-driven user model to extract semantic paths among user-item pairs and then learns the corresponding importance of each path. The aforementioned methods might be impractical for real-world recommender systems since it is rather difficult to enumerate all explanatory paths to identify qualified paths for the explanation task. Besides, they cannot be easily adapted to session-based scenarios.

To conclude, there is still little research on explainable SR (except several post-hoc methods [lin2018multi, geng2022path]), whereas a great number of SR algorithms, including DL-based complex ones, have been designed to improve recommendation accuracy. In this view, to go a step further towards explainable SR, it is worthwhile to come up with a generic path-based explainable framework with KG to equip existing SR algorithms for the explanation task. However, it remains significantly challenging to construct a unified framework on KG for SR: (1) in contrast to the traditional scenario, SR tries to reflect every user’s recent interests. Nevertheless, a KG is generally rather static since it is constructed with users’ historical information, that is, it reflects long-term interests. Thus, the first challenge is how to better integrate the short-term interests modeled from the session and the long-term ones from the KG for explainable SR. (2) Regarding path-guided explainable models, previous studies generally start from the user entity to find reasonable semantic paths in the KG. However, besides the user entity (every user might have multiple sessions), SR can also start from any (interacted) item in a session. In this case, since both types of start point are particularly complex for SR, how to determine the start point of each semantic explainable path is our second challenge. (3) Considering that existing DL-based SR models have already achieved satisfactory performance in terms of recommendation accuracy, our third challenge lies in how to ensure that those DL-based models, equipped with our designed generic framework, can obtain promising results on both recommendation and explanation tasks.

To tackle these issues, we propose a novel Generic Reinforced Explainable Framework with Knowledge Graph for Session-based Recommendation (denoted as REKS). In particular, towards the first challenge, we consider the session information when constructing the KG, and jointly adopt both session and KG information for the final session representation. With regard to the second challenge, we choose to treat the last (interacted) item of every session as the corresponding start point, because it can better reflect a user’s recent interest. Finally, we propose a path-guided reinforcement learning framework with specifically designed reward and loss functions to facilitate SOTA, representative non-explainable SR models, where session representations are learned through the non-explainable models and then applied to decide every next step in identifying semantic paths. Our contributions are three-fold:

  • Our proposed REKS can be easily instantiated with representative non-explainable SR models to well fulfill both the recommendation and explanation tasks. Specifically, REKS adopts a Markov decision process to simultaneously generate recommendation lists and corresponding explainable paths based on the KG.

  • Towards a better framework, REKS starts every semantic path from the last item of a session, involves session information in the policy network, and designs complex reward and loss functions.

  • As a generic framework, REKS has strong generalization ability and can provide explainability for other non-explainable SR models. Exhaustive experiments on five SOTA non-explainable SR models (i.e., GRU4REC, NARM, SR-GNN, GCSAN, and BERT4REC) with REKS on three Amazon datasets validate the effectiveness of our framework on both recommendation and explanation tasks.

II Related Work

Our study relates to two areas, namely session-based recommendation and explainable recommendation.

II-A Session-based Recommendation

Early studies on SR focused on exploiting item dependency relationships with Markov chains (MC) [kamehkhosh2017comparison]. For instance, Shani et al. [shani2005mdp] utilized a Markov decision process to calculate the transition probabilities among items. Rendle et al. [rendle2010factorizing] later proposed a hybrid method, i.e., FPMC, which combines matrix factorization and a first-order Markov chain to model long-term and short-term user preferences, respectively. Besides first-order MCs, high-order MCs have also been considered [he2016fusing, he2017translation] for SR.

In recent years, deep neural network techniques have been applied to SR and achieved encouraging results [liu2018stamp]. For example, GRU4REC [hidasi2015session] first applies an RNN (i.e., a multi-layer gated recurrent unit) to process session data. NARM [li2017neural] explores a hybrid encoder with an attention mechanism to capture the representative item-transition information within every session. Moreover, BERT4REC [sun2019bert4rec] models user behaviors with a bidirectional self-attention network through a Cloze task. However, similar to MC-based methods, RNN- and attention-based methods merely consider item transitions within a single session, and cannot capture item relationships across different sessions. In this view, graph neural network (GNN)-based methods [chen2020handling, xia2021self] have become increasingly popular. For example, SR-GNN [wu2019session] first applies a gated graph network [li2015gated] to learn item embeddings, and adopts a self-attention layer to aggregate item representations. FGNN [qiu2019rethinking] adopts a weighted graph attention network (WGAT) to enhance SR-GNN, whilst GCSAN [xu2019graph] uses both a GNN and a self-attention mechanism to learn local dependencies and long-range dependencies for SR, respectively.

These non-explainable approaches, especially the DL-based ones, have achieved promising performance in terms of recommendation accuracy for session-based recommendation. However, they lack transparency and explanations for the recommendation results, which might lead to lowered user satisfaction and acceptance.

II-B Explainable Recommendation

Recently, explainable recommendation has attracted increasing attention, and existing methods mainly consist of two types [zhang2020explainable]: post-hoc (two-stage) [zhang2014explicit] and model-based (end-to-end) methods [lin2018multi, geng2022path]. The former identifies reasonable explanations for the recommended items with auxiliary information after the recommendation model ranks items in the first stage, whilst the latter simultaneously optimizes the recommendation and explanation tasks, e.g., in the form of multi-task learning. For both types, knowledge graphs (KG) have been adopted to facilitate explainable recommendation in traditional scenarios, and the corresponding methods can be categorized as path-based explainable methods, as they obtain path-based explanations from the KG to reasonably explain the recommended items. For example, post-hoc models [ai2018learning, wang2018ripplenet] adopt KG embedding-based techniques to learn entity/relation representations from the built KG to improve recommendation accuracy, and explanation paths are then generated from the KG in the second stage. In end-to-end models [wang2019explainable], explanation paths are identified together with the recommendation lists, by using neural symbolic reasoning [xian2020cafe] or reinforcement learning [zhao2020leveraging].

Besides, there are a few studies [hou2019explainable, yang2021gfe] on explainable sequential recommendation. For example, Hou et al. [hou2019explainable] proposed a knowledge-aware sequential model that integrates pre-defined path-based and embedding-based methods. However, it is impractical in real-world recommender systems to enumerate all explanatory paths. Subsequently, RSL-GRU [cui2021reinforced] models KG-based explainable recommendation in sequential settings, containing a GRU component and a reinforced path reasoning network (RPRN) component to learn a path search policy. This model aims to generate explanations for sequential recommendation, and cannot be applied to other non-explainable models. For session-based recommendation, previous explainable studies [chen2021ssr, geng2022causality] are mostly post-hoc methods, where attention mechanisms are harnessed to provide explainability.

In summary, there is still little research on model-based explainable SR, although quite a few non-explainable SR algorithms have achieved SOTA results in recommendation accuracy. Therefore, our research aims to design a generic path-based explainable framework with KG (REKS) to help existing SR algorithms simultaneously optimize the recommendation and explanation tasks.

Refer to caption
Figure 2: The overview of our proposed REKS framework.

III Our REKS Framework

In this section, we first formally define some related concepts, including knowledge graph and semantic path, and the research problem. Then, we present our REKS framework.

III-A Problem Definition

Before formalizing our research problem, we firstly introduce some concepts related to our work.

Knowledge Graph. A knowledge graph (KG) is formally defined as a graph $\mathcal{G}=\{(e_{i},r,e_{j})\,|\,e_{i},e_{j}\in\mathcal{E},r\in\mathcal{R}\}$, where $\mathcal{E}=\{e_{1},e_{2},\ldots,e_{|\mathcal{E}|}\}$ denotes the set of entities, and $\mathcal{R}=\{r_{1},r_{2},\ldots,r_{|\mathcal{R}|}\}$ denotes the set of relations. A triple $(e_{i},r,e_{j})$ represents a relation $r\in\mathcal{R}$ from a head entity $e_{i}$ to a tail entity $e_{j}$ in the KG $\mathcal{G}$. In the recommendation scenario, there exist at least two types of entities in $\mathcal{G}$, i.e., the user entity $u\in\mathcal{U}\subset\mathcal{E}$ and the item entity $v\in\mathcal{V}\subset\mathcal{E}$. They are connected by relations like “purchase” in e-commerce. Besides, there might be other types of entities and relations (depending on the particular scenario/domain) in $\mathcal{G}$. For example, the triple (iPhone, produced_by, Apple) indicates that iPhone (product entity)’s brand is Apple (brand entity).

Semantic Path. A semantic path between entities $e_{i}$ and $e_{j}$ in $\mathcal{G}$ refers to a sequence of entities connected by different types of relations, denoted as $p(e_{i},e_{j})=e_{i}\stackrel{r_{i1}}{\longrightarrow}e_{i+1}\stackrel{r_{i2}}{\longrightarrow}e_{i+2}\cdots\stackrel{r_{is}}{\longrightarrow}e_{j}$, where $e_{i}\stackrel{r_{i1}}{\longrightarrow}e_{i+1}$ means $(e_{i},r_{i1},e_{i+1})\in\mathcal{G}$, and $p(e_{i},e_{j})$ also means that $e_{i}$ and $e_{j}$ are $s$-hop reachable in $\mathcal{G}$. Note that there might exist multiple semantic paths of varied lengths between two entities in $\mathcal{G}$.

Problem Statement. For session-based recommendation, let $\mathcal{V}=\{v_{1},v_{2},\cdots,v_{|\mathcal{V}|}\}$ denote the item set. An anonymous session $\mathcal{S}_{e}$ can be represented by $\mathcal{S}_{e}=\{v_{1}^{S},v_{2}^{S},\cdots,v_{l}^{S}\}$, where its length equals $l$ and $v_{i}^{S}\in\mathcal{V}$ denotes the $i$-th interacted item within $\mathcal{S}_{e}$. Note that the items in sessions are the same as the item entities in the KG $\mathcal{G}$, which are connected with other entities via different pre-defined (interaction-related) relations. Without loss of generality, we define our research problem as follows:

Definition 1.

Given the KG $\mathcal{G}=\{(e_{i},r,e_{j})\,|\,e_{i},e_{j}\in\mathcal{E},r\in\mathcal{R}\}$ and a session $\mathcal{S}_{e}=\{v_{1}^{S},v_{2}^{S},\cdots,v_{l}^{S}\}$, we aim to predict a probability value $\widehat{y}_{k}$ for each item $v_{k}\in\mathcal{V}$. Correspondingly, the top $K$ items with the highest values will be recommended (recommendation task), and for each recommended item, one semantic path is associated to explain why the item is recommended (explanation task).

III-B The Proposed REKS Model

Figure 2 outlines the architecture of REKS, which consists of two components: knowledge graph construction and formulating recommendation as a Markov decision process. In particular, we first build the KG $\mathcal{G}$ and learn the initial entity and relation representations via TransE [bordes2013translating]. Secondly, we formulate our problem as a Markov decision process to simultaneously tackle the recommendation and explanation tasks for SR, where we design a policy network that considers session information besides the current state, and propose an appropriate reward function to moderately address our tasks. Next, we elaborate the two components of our model in detail.

III-B1 Knowledge Graph Construction

As aforementioned, for each scenario, we first construct a KG $\mathcal{G}=\{(e_{i},r,e_{j})\,|\,e_{i},e_{j}\in\mathcal{E},r\in\mathcal{R}\}$. The entities in the Amazon dataset (in this paper, for better elaboration, we take the Amazon dataset as an example to showcase the KG construction process, which can be easily applied to other scenarios/domains) can be divided into five types: user, product, brand, category, and related_product. As shown in Figure 2, we connect every entity pair with a directed relation if there is a relation from a head entity to a tail entity. It is worth noting that, in SR, there exist dependency relationships between items (products) in a session. Therefore, we accordingly add the co_occurrence relation to the built KG. That is, with regard to a session $\mathcal{S}_{e}$, if item $v^{S}_{i}$ is interacted with immediately before item $v^{S}_{i+1}$, we add the triple $(v^{S}_{i}, co\_occurrence, v^{S}_{i+1})$ to $\mathcal{G}$, so that the KG has an edge from $v^{S}_{i}$ to $v^{S}_{i+1}$. For other types of relations, similar to [xian2019reinforcement, zhao2020leveraging], we add bidirectional edges to the knowledge graph. For example, if there exists a triple $(product, belong\_to, category)$, we add two edges to $\mathcal{G}$, i.e., $product\xrightarrow{belong\_to}category$ and $category\xrightarrow{belong\_to}product$. The statistics of entities and relations on the three Amazon datasets are summarized in Tables LABEL:tb:relationsStatistics and LABEL:tb:entitiesStatistics.
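As a minimal illustration of this construction step (not the authors’ exact pipeline), the following Python sketch builds the directed co_occurrence edges from sessions and bidirectional attribute edges from metadata; the variable names (sessions, product_meta) and the metadata layout are our own assumptions.

```python
def build_kg(sessions, product_meta):
    """Build a KG as a set of (head, relation, tail) triples.

    sessions:      list of item-id lists, each in chronological order
    product_meta:  dict item_id -> {relation_name: tail_entity, ...}
    (both structures are illustrative assumptions)
    """
    triples = set()

    # Directed co_occurrence edges between consecutive items in a session
    for session in sessions:
        for v_i, v_next in zip(session, session[1:]):
            triples.add((v_i, "co_occurrence", v_next))

    # Bidirectional edges for the other relation types (e.g., belong_to)
    for item, meta in product_meta.items():
        for relation, tail in meta.items():
            triples.add((item, relation, tail))
            triples.add((tail, relation, item))

    return triples


# Toy usage: one session of three products and minimal metadata
sessions = [["p1", "p2", "p3"]]
product_meta = {"p1": {"belong_to": "c1"}, "p2": {"produced_by": "b1"}}
kg = build_kg(sessions, product_meta)
print(sorted(kg))
```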

After constructing the KG, we leverage the Translating Embedding (TransE) technique [bordes2013translating] to obtain the initial representations of all entities and relations. Specifically, we denote $\mathbf{X}^{0}\in\mathbb{R}^{(|\mathcal{E}|+|\mathcal{R}|)\times d_{0}}$ as the initial embedding matrix:

$\mathbf{X}^{0}=\text{TransE}(|\mathcal{E}|+|\mathcal{R}|,d_{0})$ (1)

where $d_{0}$ is the dimension of the entity/relation representations, and $|\mathcal{E}|+|\mathcal{R}|$ is the total number of entities and relations.
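For concreteness, a minimal TransE training sketch in PyTorch is given below; it uses the standard margin-based ranking loss over (head, relation, tail) triples with randomly corrupted tails as negatives. The class and argument names are illustrative assumptions rather than the exact implementation used in REKS.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransE(nn.Module):
    def __init__(self, n_entities, n_relations, d0=400, margin=1.0):
        super().__init__()
        self.ent = nn.Embedding(n_entities, d0)
        self.rel = nn.Embedding(n_relations, d0)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)
        self.margin = margin

    def score(self, h, r, t):
        # TransE assumption: e_h + e_r should be close to e_t (L2 distance)
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

    def loss(self, h, r, t, t_neg):
        # Margin-based ranking loss with corrupted tails as negatives
        pos = self.score(h, r, t)
        neg = self.score(h, r, t_neg)
        return F.relu(self.margin + pos - neg).mean()

# Toy usage: 5 entities, 2 relations, a single positive triple (0, 1, 2)
model = TransE(n_entities=5, n_relations=2, d0=16)
h, r, t = torch.tensor([0]), torch.tensor([1]), torch.tensor([2])
t_neg = torch.randint(0, 5, (1,))            # randomly corrupted tail
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
optim.zero_grad()
model.loss(h, r, t, t_neg).backward()
optim.step()
```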

III-B2 Formulating Recommendation as a Markov Decision Process

We first elaborate some preliminaries of our Markov decision process (MDP). In general, we define the MDP environment with a quadruple $(S,A,T,R)$, where $S$ indicates the state space, $A$ refers to the action space, $T:S\times A\to S$ denotes the state transition function, and $R:S\times A\to\mathbb{R}$ indicates the reward function.

State. Let $s\in S$ denote a state in our MDP environment. The initial state $s_{0}$ at time $t_{0}$ is represented by the initial entity of the semantic path. Besides, for session-based recommendation, given a session $\mathcal{S}_{e}$, we use the last interacted product, i.e., $v_{l}^{S}$, as the initial entity, considering that the last interacted product is the most recent behavior of the session and thus may involve more information for better next-item prediction [balloccu2022post]. Accordingly, we denote the state at time $t$ as $s_{t}=(\mathbf{e_{t}},\mathbf{h_{t}})$, where $\mathbf{e_{t}}$ is the representation of entity $e$ at time $t$, and $\mathbf{h_{t}}$ contains the history of all entities and relations on the semantic path from the beginning to time $t$, i.e., $\mathbf{h_{t}}=(\mathbf{e_{0}},\mathbf{r_{1}},\mathbf{e_{1}},\cdots,\mathbf{r_{t}})$, where $\mathbf{e_{0}}$ is the representation of entity $v_{l}^{S}$ in session $\mathcal{S}_{e}$.

Action. Based on the state $s_{t}=(\mathbf{e_{t}},\mathbf{h_{t}})$, we define, at time $t$, an action space $A_{t}=\{(r_{t+1},e_{t+1})\,|\,(e_{t},r_{t+1},e_{t+1})\in\mathcal{G}\}$, which contains all possible outgoing edges of entity $e_{t}$ in $\mathcal{G}$, excluding those already visited on the semantic path. The action behavior is modeled by a policy network $\pi$, which outputs a probability distribution over the actions in $A_{t}$ to decide the next action (taken at $t+1$). In our REKS model, the policy network is influenced by both session information and semantic path information.

In SR, we can obtain the session representation from representative non-explainable SR models, like GRU4REC, NARM, SR-GNN, BERT4REC, and GCSAN. For example, using NARM, the session representation $\mathbf{S_{e}}\in\mathbb{R}^{d_{1}}$ is calculated as:

$\mathbf{S_{e}}=\text{NARM}(\mathbf{X}^{0}_{\mathcal{V}},\mathcal{S}_{e})$ (2)

where $\mathbf{X}^{0}_{\mathcal{V}}$ is the initial embedding of items (obtained by TransE in Equation 1), which, together with $\mathcal{S}_{e}$, constitutes the input of NARM. Besides, the corresponding semantic-path-related information (denoted as $\mathbf{S_{p}}\in\mathbb{R}^{d_{0}}$) is modeled by the sum of the entity representation $\mathbf{x}^{0}_{e_{t}}$ and the relation representation $\mathbf{x}^{0}_{r_{t}}$ at time $t$, as the most relevant information for determining the next action on this semantic path is the most recent information (i.e., $e_{t}$ and $r_{t}$). Thus, the total representation $\mathbf{s^{ep}_{t}}\in\mathbb{R}^{d_{2}}$ is calculated as:

$\mathbf{s^{ep}_{t}}=\text{MLP}(\mathbf{S_{e}}\oplus\mathbf{S_{p}})$ (3)

where \oplus denotes concatenation operation, and MLP()MLP(*) denotes a multilayer perceptron.

Finally, we use a softmax function to compute the probability of selecting the next product:

$\pi=\text{softmax}(\mathbf{A_{t}}\odot(\mathbf{W_{1}}\mathbf{s^{ep}_{t}}))$ (4)

where $\mathbf{A_{t}}$ is a binary vector encoding the valid actions in $A_{t}$ (so that $\pi$ assigns a probability to each action), and $\mathbf{W_{1}}$ is a trainable parameter matrix.
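To make Equations 2-4 concrete, the sketch below assembles the policy distribution from a session representation and the current path state. The session representation S_e is assumed to come from whichever SR model is plugged in (e.g., NARM), and the masked-softmax detail (setting invalid actions to -inf before softmax) is our assumption about how the binary action mask is applied.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, d0, d1, d2, n_entities):
        super().__init__()
        # MLP over the concatenated session and path representations (Eq. 3)
        self.mlp = nn.Sequential(nn.Linear(d0 + d1, d2), nn.ReLU(), nn.Linear(d2, d2))
        self.W1 = nn.Linear(d2, n_entities, bias=False)   # W_1 in Eq. 4

    def forward(self, S_e, x_et, x_rt, action_mask):
        """
        S_e:         (B, d1) session representation from the plugged-in SR model
        x_et, x_rt:  (B, d0) current entity / relation embeddings on the path
        action_mask: (B, n_entities) 1 for reachable next entities, 0 otherwise
        """
        S_p = x_et + x_rt                                  # path information
        s_ep = self.mlp(torch.cat([S_e, S_p], dim=-1))     # Eq. 3
        scores = self.W1(s_ep)                             # W_1 * s_ep
        scores = scores.masked_fill(action_mask == 0, float("-inf"))
        return torch.softmax(scores, dim=-1)               # Eq. 4

# Toy usage with random tensors
policy = PolicyNetwork(d0=16, d1=16, d2=32, n_entities=10)
S_e, x_et, x_rt = torch.randn(2, 16), torch.randn(2, 16), torch.randn(2, 16)
mask = torch.ones(2, 10)
mask[:, :3] = 0                    # first three entities are not reachable
pi = policy(S_e, x_et, x_rt, mask)  # (2, 10) action probabilities
```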

Reward. It is important to define an appropriate reward for reinforcement learning. In our research, we strive to simultaneously optimize the recommendation and explanation tasks, that is, to accurately recommend the top $K$ products that are most likely preferred, and to provide suitable and reasonable explanations for the recommended items. To fulfill the two objectives, we consider three components in our designed reward function: an item-level reward, a rank-level reward, and a path-level reward:

$R(s_{t},a_{t})=R_{item}+R_{path}+2^{R_{rank}}$ (5)

where $R_{item}$, $R_{path}$, and $R_{rank}$ denote the item-level, path-level, and rank-level rewards, respectively.

With regard to the item-level reward, we adopt a straightforward reward strategy, namely the terminal reward, which gives a higher reward if a semantic path ends with the target product (i.e., the ground-truth item that the target user will interact with). Here, we define $R_{item}$ with regard to the terminal state $s_{T}=(e_{T},h_{T})$:

$R_{item}=\begin{cases}1 & \text{if } e_{T}=v_{T}\\ \sigma\big((\mathbf{x}^{0}_{e_{T}})^{T}\mathbf{x}^{0}_{v_{T}}\big) & \text{if } e_{T}\neq v_{T} \text{ and } e_{T}\in\mathcal{V}\\ 0 & \text{otherwise}\end{cases}$ (9)

where $v_{T}$ is the ground-truth product, $\sigma(\cdot)$ is the sigmoid function, and $e_{T}$ is the last entity on the semantic path, i.e., the entity predicted given the session. The reward function indicates that: if the predicted entity is the target product, $R_{item}=1$; if the predicted entity is not the target product but is a product entity, we set the item-level reward to the similarity between the target product and the predicted entity; otherwise, the item-level reward is 0.

For the rank-level reward, we aim to further improve recommendation accuracy in the session-based scenario by pushing the target item to a higher position in the top-$K$ ranking list. Towards this goal, we propose our rank-level reward, $R_{rank}$, which is measured as:

$R_{rank}=\begin{cases}\frac{1}{\log(r^{T}+2)} & \text{if } e_{T}\in\mathcal{V}\\ 0 & \text{otherwise}\end{cases}$ (12)

where $r^{T}$ refers to the rank of entity $e_{T}$ in the predicted top-$K$ product list.

Considering the path-level reward, we want to generate a semantic path of high quality which is more relevant to the given session and thus can better explain the predicted item at the final step. Therefore, we define $R_{path}$ as:

$R_{path}=\sigma(\mathbf{P}^{T}\mathbf{S_{e}})$ (13)

where $\sigma(\cdot)$ is the sigmoid function and $\mathbf{P}$ is the representation of the semantic path:

$\mathbf{P}=\text{mean}(\mathbf{x}^{0}_{e_{0}},\mathbf{x}^{0}_{r_{1}},\cdots,\mathbf{x}^{0}_{r_{T}},\mathbf{x}^{0}_{e_{T}})$ (14)
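Putting the three components together, a minimal sketch of the terminal-reward computation (Equations 5, 9, and 12-14) could look as follows; the helper names and the way the top-K rank is looked up are illustrative assumptions, not the authors’ exact implementation.

```python
import math
import torch

def terminal_reward(e_T, v_T, path_embs, session_emb, ent_emb,
                    product_ids, topk_ranking):
    """Reward at the terminal state s_T (a hedged sketch of Eq. 5).

    e_T, v_T:      predicted terminal entity id and ground-truth product id
    path_embs:     list of embeddings [e_0, r_1, ..., r_T, e_T] on the path
    session_emb:   session representation S_e
    ent_emb:       embedding table (entity id -> vector)
    product_ids:   set of item-entity ids
    topk_ranking:  list of the top-K predicted product ids
    """
    # Item-level reward (Eq. 9)
    if e_T == v_T:
        r_item = 1.0
    elif e_T in product_ids:
        r_item = torch.sigmoid(ent_emb[e_T] @ ent_emb[v_T]).item()
    else:
        r_item = 0.0

    # Rank-level reward (Eq. 12); rank lookup in the top-K list is an assumption
    if e_T in product_ids and e_T in topk_ranking:
        r_rank = 1.0 / math.log(topk_ranking.index(e_T) + 2)
    else:
        r_rank = 0.0

    # Path-level reward (Eqs. 13-14)
    p = torch.stack(path_embs).mean(dim=0)
    r_path = torch.sigmoid(p @ session_emb).item()

    # Total reward (Eq. 5)
    return r_item + r_path + 2 ** r_rank
```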

Transition. After taking each action $a_{t}$, the agent receives an intermediate reward, i.e., $r_{t+1}=R(s_{t},a_{t})$, which reflects the performance of our model. We then utilize the transition function $T:S\times A\to S$ to update the state. Given the current state $s_{t}$ and the action $a_{t}$, the transition function is defined as:

$s_{t+1}=T(s_{t},a_{t})$ (15)

Optimization. Given the above MDP environment, we want to learn a stochastic policy $\pi$ that maximizes the expected cumulative reward and minimizes the cross-entropy loss. That is, our model’s loss function consists of two parts: the reward loss (for cumulative reward maximization) $L_{r}$ and the cross-entropy loss $L_{ce}$:

$L=\beta L_{r}+L_{ce}$ (16)

where $\beta$ is a hyper-parameter balancing the two parts of the loss. The reward loss is defined as:

$L_{r}=-Q(\theta)$ (17)
$Q(\theta)=E_{\pi}\sum_{t=0}^{T-1}\gamma^{t}R(s_{t+1},a_{t+1})$ (18)

where $Q(\theta)$ is the expected cumulative reward, $\theta$ denotes the set of model parameters, and $\gamma$ indicates the discount factor. In this work, we employ REINFORCE with a baseline [sutton2018reinforcement] to optimize the parameters $\theta$.

Then the cross-entropy loss is calculated as:

$L_{ce}=-\sum_{j}\big(y_{j}\log(\hat{y}_{j})+(1-y_{j})\log(1-\hat{y}_{j})\big)$ (19)

where $y_{j}$ is the ground-truth score of item $v_{j}$ (1 or 0), and $\hat{y}_{j}$ is the predicted score, which is calculated by the policy network in Equation 4.
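The overall optimization in Equations 16-19 can be sketched as below, combining a REINFORCE-style policy-gradient term (with a simple baseline) and the binary cross-entropy term; the baseline choice (batch-mean return) and the tensor shapes are our assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def reks_loss(log_probs, rewards, y_hat, y_true, beta=1.0, gamma=0.99):
    """
    log_probs: (B, T) log pi(a_t | s_t) of the sampled actions along each path
    rewards:   (B, T) rewards R(s_{t+1}, a_{t+1}) collected along each path
    y_hat:     (B, V) predicted item scores from the policy network (Eq. 4)
    y_true:    (B, V) 0/1 ground-truth indicators of the target items
    """
    B, T = rewards.shape
    discounts = gamma ** torch.arange(T, dtype=rewards.dtype)

    # Discounted return per path, with a batch-mean baseline (REINFORCE w/ baseline)
    returns = (rewards * discounts).sum(dim=1)            # (B,)
    advantage = returns - returns.mean()

    # L_r = -Q(theta): maximize the expected cumulative reward (Eqs. 17-18)
    L_r = -(advantage.detach() * log_probs.sum(dim=1)).mean()

    # L_ce: binary cross-entropy over predicted item scores (Eq. 19)
    L_ce = F.binary_cross_entropy(y_hat, y_true)

    return beta * L_r + L_ce                              # Eq. 16
```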

After training the model, we implement a probabilistic beam search algorithm [xian2019reinforcement, zhao2020leveraging] to find candidate products and explainable semantic paths given a session in SR. Note that the start point of the semantic paths is the last interacted product in the session, the same as in the training process.
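The path-level inference can be sketched as a probabilistic beam search over the KG, expanding the top actions at each step and scoring each path by the product of step probabilities; the function signature and the way neighbors are queried are assumptions for illustration only.

```python
def beam_search_paths(start_item, step_probs, sizes=(100, 1)):
    """Probabilistic beam search over the KG (a sketch, not the exact implementation).

    start_item: last interacted product of the session (path start point)
    step_probs: function(path) -> {(relation, entity): prob} from the policy network
    sizes:      sampling sizes (P1, P2, ...) per step
    Returns a list of (path, score) sorted by the product of step probabilities.
    """
    beams = [([start_item], 1.0)]
    for p_k in sizes:
        new_beams = []
        for path, score in beams:
            probs = step_probs(path)                       # policy over valid actions
            top = sorted(probs.items(), key=lambda kv: -kv[1])[:p_k]
            for (rel, ent), p in top:
                if ent not in path:                        # avoid revisiting entities
                    new_beams.append((path + [rel, ent], score * p))
        beams = new_beams
    return sorted(beams, key=lambda b: -b[1])
```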

IV Experiments

In this section, we conduct extensive experiments on three datasets to validate the effectiveness of our model, with the goal of answering the four research questions (RQs):

  • RQ1: How does REKS facilitate representative non-explainable SR models on recommendation task?

  • RQ2: How do different components of REKS contribute to the model performance?

  • RQ3: How do different hyper-parameters affect the performance of REKS?

  • RQ4: How does REKS perform on explanation task?

IV-A Experimental Setup

IV-A1 Datasets

We choose three real-world datasets, i.e., Beauty, Cellphones, and Baby, from the Amazon e-commerce platform. Each dataset consists of both users’ behaviors and meta information. Firstly, we extract different types of entities and relations from the Amazon metadata. The detailed statistics are summarized in Tables LABEL:tb:relationsStatistics and LABEL:tb:entitiesStatistics. Besides, we consider a user’s interactions that occurred within a day as one session, and filter out items with fewer than 5 interactions and sessions shorter than 2. The detailed session statistics of the three datasets are further reported in Table LABEL:tb:dataStatistics.
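As an illustration of this preprocessing (not the authors’ exact script), the following pandas sketch groups each user’s interactions by day into sessions and applies the two filters; the column names (user_id, item_id, timestamp) are assumptions.

```python
import pandas as pd

def build_sessions(df, min_item_freq=5, min_session_len=2):
    """df columns (assumed): user_id, item_id, timestamp (unix seconds)."""
    # Keep items with at least `min_item_freq` interactions
    item_counts = df["item_id"].value_counts()
    df = df[df["item_id"].isin(item_counts[item_counts >= min_item_freq].index)]

    # One session = one user's interactions within the same day, in time order
    df = df.sort_values("timestamp")
    df["day"] = pd.to_datetime(df["timestamp"], unit="s").dt.date
    sessions = (df.groupby(["user_id", "day"])["item_id"]
                  .apply(list)
                  .tolist())

    # Keep sessions with at least `min_session_len` items
    return [s for s in sessions if len(s) >= min_session_len]
```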

IV-A2 Baseline Models

We apply our REKS model on five representative SR models, i.e., one RNN-based method (GRU4REC), two GNN-based models (SR-GNN and GCSAN), one attention-based method (BERT4REC), and one hybrid method that combines attention and RNN (NARM).

  • GRU4REC [hidasi2015session] applies GRU to model interaction sequences and designs a ranking-based loss function.

  • NARM [li2017neural] improves GRU4REC by utilizing vanilla attention to model the relationship of the last item in a session with other items of the session.

  • SR-GNN [wu2019session] is a solid GNN-based approach for SR, which employs a gated graph convolutional layer to obtain item embedding and applies self-attention mechanism to obtain session representation.

  • GCSAN [xu2019graph] utilizes self-attention mechanism to capture item dependencies via graph information aggregation.

  • BERT4REC [sun2019bert4rec] employs a deep bidirectional self-attention framework to model user behaviors and utilizes a Cloze objective loss for SR.

Note that, for fair comparison, the inputs of these baselines are the same as those of our model, including both the knowledge graph $\mathcal{G}$ and the session information. We also apply the TransE method to $\mathcal{G}$ to obtain the initial item representations for the baselines without REKS.

IV-A3 Evaluation Metrics

Given a session, each method generates a top-$K$ recommendation list ($K$ is set to 5, 10, and 20 in our experiments). For an exhaustive evaluation of the recommendation task, we use two widely adopted accuracy metrics: HR@$K$ and NDCG@$K$. HR@$K$ indicates the hit ratio, i.e., the coverage rate of targeted predictions, and NDCG@$K$ (Normalized Discounted Cumulative Gain) rewards each hit based on its position in the ranked recommendation list. Larger values indicate better performance for both metrics.
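For reference, the two metrics can be computed per session as in the sketch below (a common formulation; we assume a single target item per session, as in next-item prediction).

```python
import math

def hr_at_k(ranked_items, target, k):
    """HR@K: 1 if the target item appears in the top-K list, else 0."""
    return int(target in ranked_items[:k])

def ndcg_at_k(ranked_items, target, k):
    """NDCG@K with a single relevant item: 1 / log2(rank + 1) if hit, else 0."""
    if target in ranked_items[:k]:
        rank = ranked_items[:k].index(target) + 1   # 1-based position
        return 1.0 / math.log2(rank + 1)
    return 0.0

# Toy usage: target ranked 3rd in the top-5 list
print(hr_at_k(["a", "b", "c", "d", "e"], "c", 5))    # 1
print(ndcg_at_k(["a", "b", "c", "d", "e"], "c", 5))  # 0.5
```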

IV-A4 Hyper-parameter Setups

For each method, we tune the best hyper-parameters according to the validation data of each dataset. For all methods, the dimensions $d_{0}$, $d_{1}$, and $d_{2}$ are set to 400. We train all models with the Adam optimizer, and the path length is 2. The settings of other parameters are shown in Table LABEL:tb:hyperParameter. Note that for all methods, we run each experiment five times and report the average performance in Table LABEL:tb:comparativeResults, where REKS_baseline denotes applying REKS with the corresponding baseline. We also conduct a paired t-test (for every baseline and REKS_baseline pair) to validate the significance of the performance difference (∗ for p-value $\leq$ .05 and ∗∗ for p-value $\leq$ .01). Besides, the results are reported in percentages (%).

IV-B Experimental Results on Recommendation Task

Here, we present results to answer RQ1, RQ2 and RQ3.

IV-B1 Performance of REKS with Baselines (RQ1)

We instantiate the REKS framework on five SOTA non-explainable SR methods, and their performance on the three Amazon datasets is reported in Table LABEL:tb:comparativeResults, where every REKS column next to a baseline (e.g., NARM) represents the baseline equipped with REKS (e.g., REKS_NARM). From Table LABEL:tb:comparativeResults, we have the following observations: (1) With the REKS framework, the performance of every baseline is significantly improved in almost all cases, validating the effectiveness of our framework for non-explainable SR methods on the recommendation task. For example, on Beauty, REKS_GRU4REC achieves 9.91% in terms of HR@5, an improvement of 13.91% over vanilla GRU4REC (8.70%). (2) Among the non-explainable SR models, NARM performs better than the other methods on Cellphones and Baby, while BERT4REC performs best on Beauty. Besides, surprisingly, vanilla GCSAN obtains the worst performance across all scenarios. However, REKS_GCSAN also makes the largest improvement over GCSAN compared to the other methods, indicating that our RL-based framework can well mitigate the performance gaps among non-explainable SR models. (3) REKS_baseline tends to perform better than other REKS-equipped methods if the corresponding baseline performs better, implying that a better session representation learned from a non-explainable SR method can better guide our RL-based framework to find more reasonable paths for the recommendation task in SR. To conclude, our REKS framework can significantly improve the performance of non-explainable SR methods on the recommendation task.

IV-B2 Ablation Study (RQ2)

We conduct experiments to test the innovative designs in REKS: (1) loss function; (2) reward function; (3) start point of semantic paths; and (4) path length.

Impact of the loss function. The loss function in REKS consists of two parts: the reward-based loss and the cross-entropy loss. To validate the effectiveness of each part, we compare our model with two alternatives: REKS_R, which merely considers the reward loss, and REKS_C, which only uses the cross-entropy loss. The results with different loss functions are shown in Figure LABEL:tb:differentlossfunction (we obtain similar results for $K=5$ and $K=20$, which are not reported in detail due to space limitations; the same holds for the following figures). REKS_R and REKS_C consistently perform worse than REKS, which demonstrates that each loss component contributes to recommendation accuracy. Besides, REKS_R performs better than REKS_C, which might be caused by the fact that the cross-entropy loss is directly linked to recommendation accuracy for the recommendation task, whereas in the reward loss (reward maximization) we also consider the explanation task (e.g., the path-level reward design).

Impact of the reward function. In REKS, we innovatively design a composite reward function for the recommendation and explanation tasks, consisting of three parts: the item-level, path-level, and rank-level rewards. Thus, we compare our model with three variants: (1) REKS-rank does not consider the rank-level reward; (2) REKS-path further removes the path-level reward from REKS-rank, i.e., it only uses the item-level reward; and (3) REKS_R1 merely employs the most basic reward, which equals 1 if the predicted product is the ground-truth product. Figure LABEL:fig:rewardfunction shows their performance, where the x-axis labels represent the corresponding non-explainable SR models. As we can see, each component contributes to the final performance, validating the effectiveness of our reward design for making more accurate recommendations.

Impact of the start point of semantic paths. As discussed in Section I, the start point of a semantic path can be a user or a product (the last interacted item of a session); in REKS, we choose to start from the last interacted product of the given session. Hence, to verify the validity of this design, we compare our framework (REKS) with a variant, REKS_user, which starts from the user of the session. Note that for each REKS_user model, we re-tune the corresponding best hyper-parameters: the path length is 3, and the sampling size is set to {100, 10, 1} for the three steps. From the results shown in Figure LABEL:fig:startingpoint, we observe that REKS consistently performs better than REKS_user, validating our hypothesis that, for SR, the last interacted product of a session better reflects the corresponding user’s recent interests and thus leads to more accurate next-item prediction.

Impact of the path length. Here, we investigate the effect of the path length on the model performance; in our framework, we prefer a path length of 2. We thus compare our model with two variants: REKS with path length 3 (REKS_l3) and with path length 4 (REKS_l4), where the sampling sizes are {100, 1, 1} and {100, 1, 1, 1} per step, respectively. Figure LABEL:fig:differentpathlength summarizes the comparative results, and we can see that when the path length is 2, our framework obtains the best recommendation accuracy. This might be because, although a larger path length can generate more patterns, it may also introduce noisy information. Besides, REKS_l4 consistently performs better than REKS_l3. This is because a path pattern in the knowledge graph mostly follows “product → other entity → product → other entity”, so we can expect better recommendation performance when the path length is even.

IV-B3 Sensitivity of Hyper-parameters (RQ3)

We further investigate the impact of the learning rate $lr$ and the balancing factor $\beta$ in the loss function on the model performance. Due to space limitations, we take REKS_NARM as an example, and Figure LABEL:fig:hyper shows the results with the learning rate in the range {0.0001, 0.0005, 0.001, 0.005} and $\beta$ in {0.2, 0.4, 0.6, 0.8, 1, 1.2}. In general, we can see that our framework is comparatively insensitive to these hyper-parameters.

IV-C Our Performance on Explanation Task (RQ4)

In this section, we first introduce the process of generating explanations, and then conduct a survey to evaluate user satisfaction with the explanations generated by our framework. Finally, we give three explanation cases generated by REKS_NARM to intuitively elaborate how REKS interprets the recommendation results.

Generating explanations. Specifically, we adopt the probabilistic beam search [xian2019reinforcement] to generate semantic paths on the knowledge graph. We denote the predefined sampling sizes at each step as $P_{1}$ and $P_{2}$ (path length = 2). Figure 3 shows the process of generating explanations given session $\{v_{1},v_{3},v_{4}\}$, which involves three steps: (1) determine the start point, which in REKS is the last interacted product of the session, i.e., $v_{4}$; (2) select the top $P_{1}$ first-order neighbors according to the probability $\pi$, where in Figure 3, one of the first-order neighbors, $c_{2}$, is selected; and (3) determine the $P_{2}$ second-order neighbors with the highest scores (for each first-order neighbor). Via these three steps, there may exist multiple candidate paths between the start point ($v_{4}$) and a predicted entity, such as $v_{2}$. Then, according to the product of the probabilities of steps 2 and 3, we choose the semantic path with the highest overall probability as the one to interpret why we recommend the entity for this session. In our study, $P_{1}$ and $P_{2}$ are set to 100 and 1, respectively. The semantic path in our scenario is presented in the form of:

  • $v_{4}\xrightarrow{belong\_to}c_{2}\xrightarrow{belong\_to}v_{2}$

Refer to caption
Figure 3: Generating explanations for session {v1,v3,v4}\{v_{1},v_{3},v_{4}\}.
Refer to caption
Figure 4: Questionnaire results of explainability.
Refer to caption
Figure 5: Semantic paths of three cases.

User study for verifying the quality of explanations. We design a user study to validate the suitability and quality of the explanations generated by REKS. In particular, we randomly select 20 cases (sessions) from the three Amazon datasets (i.e., Baby, Beauty, and Cellphones). Each case includes three parts: the session information, the semantic path, and the recommended product, where the semantic paths of each case are generated by REKS_NARM.

Moreover, inspired by [wang2018explainable, park2022reinforcement], we evaluate explanations from the following six perspectives: satisfaction, effectiveness, transparency, persuasiveness, usability, and ease of understanding. Correspondingly, we design a questionnaire (presented as web pages and viewed in browsers) that covers six questions for each case: (1) I am satisfied with the explanations. (2) These explanations help me make more accurate/faster decisions. (3) I know why the recommendation system recommends this product to me. (4) I believe that these explanations can increase the view/purchase rate of the product. (5) I am reluctant to use this system. (6) I find these explanations difficult to understand. We recruit 50 subjects to take part in the study anonymously, where each subject gives subjective scores on a 1-5 scale (representing strongly disagree, disagree, neutral, agree, and strongly agree, respectively). Note that larger values indicate better performance for the first four questions, whereas smaller values indicate better performance for the last two. We adopt this design to control the quality of the user study, so as to detect participants of low quality who might consistently rate high or low for all questions.

Figure 4 displays the results of the user study, where the horizontal axis represents the average score for each perspective, and the black lines denote the standard deviation. As can be seen, all perspectives are evaluated relatively positively, confirming the effectiveness and rationality of the explanations generated by our REKS framework.

Case study. We further showcase three intuitive examples (one from each dataset) to elaborate our explanations. Specifically, Figures 5 (a), (b), and (c) show the session information, the three semantic paths with the highest probabilities, and the ground-truth recommended product, respectively.

As shown in the first case on Beauty, given session {Hair Gel, Hair Styling Gel, Shampoo}, the recommended product is “Conditioner”. Three semantic paths are generated, where the first and third paths recommend correctly, but the first semantic path is more explanatory, implying that the last product of the session, “Shampoo”, and the recommended “Conditioner” share the same brand, “Dove”. Besides, given the session, we can see that the user’s recent interests are related to hair, which is consistently implied by all semantic paths. The second case is from the Cellphones dataset, where the session is {Bluetooth Headset, Holster Case, Holster Combo, USB 3.3ft Cable}, and the recommended product is “USB 6ft Cable”. The first two paths provide accurate recommendations, where we recommend “USB 6ft Cable” since users 6824 and 7524 have bought both “USB 3.3ft Cable” and “USB 6ft Cable”. Intuitively, there are too many data cables in the same category, so merely recommending products of the same category (as in the third semantic path) is less likely to succeed. For the third case (from Baby), concerning session {Flow Nipple, The Go Powder Cap}, “Simply Smart Bottle” is recommended since the last product of the session is frequently bought together with “Bottle Liners”, and “Bottle Liners” is frequently bought together with the recommended product. For this session, the user’s intent is likely about baby bottles, and “Bottle Liners” is the most relevant entity compared to others (e.g., Water Bottle) in the three semantic paths.

To conclude, via the user study and the three intuitive examples, we find that REKS can not only generate reasonable explanations, but also better capture the real intentions implied by every session. Besides, our design of using the last product of a session as the start point of semantic paths is also verified to be effective.

V Conclusions

Existing representative session-based recommendation models, especially deep learning-based ones, often act as “black boxes” and cannot provide explanations, leading to lowered user satisfaction and acceptance. In this paper, we design a generic reinforced explainable framework with knowledge graph (denoted as REKS) to simultaneously address the recommendation and explanation tasks for session-based recommendation. REKS can be applied to non-explainable SR methods to facilitate the explanation task by generating path-based explanations from the knowledge graph. In particular, REKS constructs a knowledge graph with session information and treats SR models as part of the policy network of a Markov decision process with a particularly designed reward strategy and loss function. By instantiating REKS with five representative SR models (i.e., GRU4REC, NARM, SR-GNN, GCSAN, and BERT4REC), experimental results on three Amazon datasets demonstrate the effectiveness of our framework on both recommendation and explanation tasks, as well as the rationality of our model designs.