
EM-RBR: a reinforced framework for knowledge graph completion from reasoning perspective

Anonymous authors
Abstract

Knowledge graph completion aims to predict new links between entities in a knowledge graph (KG). Most mainstream embedding methods focus on the fact triplets contained in the given KG while ignoring the rich background knowledge implicitly provided by logic rules derived from the knowledge base. To solve this problem, in this paper we propose a general framework named EM-RBR (embedding and rule-based reasoning), which combines the advantages of rule-based reasoning and state-of-the-art embedding models. EM-RBR uses the relational background knowledge contained in rules to conduct multi-relation reasoning for link prediction, rather than the superficial vector-triangle linkage used by embedding models. In this way, we can explore the relation between two entities in a deeper context and achieve higher accuracy. In experiments, we demonstrate that EM-RBR achieves better performance than previous models on FB15k, WN18 and our new dataset FB15k-R; on the new dataset in particular, our model performs far better than those state-of-the-art models. We make the implementation of EM-RBR available at https://github.com/1173710224/link-prediction-with-rule-based-reasoning.

1 Introduction

A knowledge graph (KG) conveys knowledge about the world in a structured representation. The rich structured information provided by knowledge graphs has become an extremely useful resource for many Artificial Intelligence applications such as query expansion (Graupmann et al., 2005), word sense disambiguation (Wasserman Pritsker et al., 2015) and information extraction (Hoffmann et al., 2011). A typical knowledge representation in a KG is multi-relational data stored in RDF format, e.g. (Paris, Capital-Of, France). However, due to the discrete nature of logic facts (Wang & Cohen, 2016), the knowledge contained in a KG is bound to be incomplete (Sadeghian et al., 2019). Consequently, knowledge graph completion (KGC), which attempts to predict whether a new triplet is likely to belong to the KG by leveraging its existing triplets, has received more and more attention.

Currently, popular embedding-based KGC methods embed the entities and relations of the knowledge graph into a low-dimensional latent feature space. The implicit relationships between entities can then be inferred by comparing their representations in this vector space. These works (Bordes et al., 2013; Mikolov et al., 2013; Wang et al., 2014; Ji et al., 2015; Lin et al., 2015; Nguyen et al., 2017) make their own contributions towards more reasonable and competent embeddings. But the overall effect is highly correlated with the density of the knowledge graph, because embedding methods often fail to predict weak and hidden relations that have a low frequency. Since the training set cannot contain all factual triplets, the embedding converges to a solution that is not suitable for triplets with such weak relations. However, reasoning over hidden relations can convert the test target into an easier one. For example, given an existing triplet (Paul, Leader-Of, SoccerTeam) and a rule Leader-Of(x,y) $\Longrightarrow$ Member-Of(x,y), which indicates that the leader of a team is also a member of that team, we can apply the rule to the triplet to obtain a new triplet (Paul, Member-Of, SoccerTeam), even if the relation Member-Of is weak in the knowledge graph.

Besides, some innovative models try to harness rules for better prediction. Joint models (Rocktäschel et al., 2015; Wang et al., 2019; Guo et al., 2016) incorporate rules into the loss functions of translation models to obtain better embedding representations of entities and relations. An optimization based on ProPPR (Wang & Cohen, 2016) embeds rules and then uses those embeddings to calculate the hyper-parameters of ProPPR. These efforts all aim at getting better embeddings from rules and triplets, rather than solving completion through real rule-based reasoning, which, as mentioned above, is necessary to address weak relation prediction. In contrast, EM-RBR performs completion from the reasoning perspective.

We propose a novel framework, EM-RBR, which combines embedding and rule-based reasoning and is essentially a breadth-first search (BFS). In developing this joint framework, we face two challenges. On the one hand, we use AMIE (Galárraga et al., 2013) to mine a large number of rules automatically rather than writing them by hand; however, the automatically mined rules are sometimes not completely credible, so a reasonable way to measure rules is needed in order to pick proper rules during reasoning. On the other hand, traditional reasoning-based methods assign only 0 or 1 to a triplet to indicate its acceptance or rejection for the given knowledge graph. This qualitative analysis lacks the quantitative information provided by embedding models, so the result of EM-RBR needs to reflect the probability that a triplet belongs to the knowledge graph.

The three main contributions of EM-RBR are summarized as follows:

  • EM-RBR is flexible and general enough to be combined with many embedding models.

  • We propose a novel rating mechanism for triplets, integrated with the reasoning process, which better distinguishes a given triplet from corrupted triplets.

  • We propose a novel rating mechanism for auto-mined reasoning rules, so that each rule is measured properly in our framework.

In the remainder of this paper, we explain how our model works in Section 2, present experiments in Section 3 and discuss related work in Section 4.

2 Method

The core idea of our framework is to conduct multi-relation path prediction in a deeper context from the reasoning perspective, in the form of a breadth-first search. Before explaining the concrete reasoning algorithm, let us take an overview of our framework in Section 2.1.

Figure 1: An overview of our framework.

2.1 Overview

Definition 2.1.

Rule: A rule in our framework has the form $B_{1}(x,z)\wedge B_{2}(z,y)\Longrightarrow H(x,y)$ or $B(x,y)\Longrightarrow H(x,y)$, where the order of entities within a triplet is arbitrary, i.e. $B_{3}(z,x)\wedge B_{4}(z,y)\Longrightarrow R(x,y)$ is also a valid rule.

We model a knowledge graph as a collection of facts $G=\{(h,r,t)\,|\,h,t\in\mathcal{E},r\in\mathcal{R}\}$, where $\mathcal{E}$ and $\mathcal{R}$ represent the set of entities and the set of relations in the knowledge graph, respectively. The steps of our framework, corresponding to Figure 1, are as follows.

Step 1. We invoke an embedding model to get a set $\Xi\in\mathbb{R}^{(|\mathcal{E}|+|\mathcal{R}|)\times k}$ containing the $k$-dimensional embeddings of the entities and relations in $G$.

Step 2. We apply AMIE (Galárraga et al., 2013) on $G$ to get the reasoning rule set $\Omega$, where each rule meets Definition 2.1.

Step 3. The reasoning rules are measured based on the embeddings of the relations they contain, as explained in Section 2.2.2.

Step 4. Reasoning is conducted for a given triplet $(h,r,t)$, as described in Section 2.2.

2.2 Reasoning algorithm

Definition 2.2.

score: The score $\Phi$ of a triplet satisfies $\Phi\geq 1$. The smaller $\Phi$ is, the more likely the triplet belongs to the knowledge graph. The score of a triplet $(h,r,t)$ is denoted $\Phi_{\sim(h,r,t)}$.

We define two scores for each state during the search process: $\mathcal{H}$ is a heuristic score and $\mathcal{L}$ is the state score, which will be used to compute $\Phi$. Our method is based on the idea of BFS. We use a priority queue $Q$ to store states in ascending order of $\mathcal{H}$. The initial state is the target triplet itself, whose $\mathcal{H}$ is 1 and whose $\mathcal{L}$ is its score under an embedding model. We push the initial state into $Q$.

During the search process, we pop the top of $Q$ as the current state $s_{cur}$. It will not be extended if $\mathcal{H}_{s_{cur}}\geq\Phi$ ($\Phi$ is defined as in Definition 2.2, initialized as $\mathcal{L}_{s_{0}}$, and updated every time we pop a state from $Q$); otherwise we extend it by matching rules to get new states. For each new state $s_{new}$, we compute its scores $\mathcal{H}_{s_{new}}$ and $\mathcal{L}_{s_{new}}$. If $\mathcal{H}_{s_{new}}<\mathcal{L}_{s_{cur}}$, the state is pushed into $Q$.

We repeat the above process until $Q$ is empty. Finally, we select the minimum $\mathcal{L}$ over all states popped from $Q$ as $\Phi$. The pseudo code is shown in Appendix B.
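To make the procedure concrete, the following Python sketch mirrors the loop above and Algorithm 1 in Appendix B. The helper names `triplet_product` and `extend` are our own placeholders (for Equation 3 applied to every triplet of a state and for the matching-and-extension step of Section 2.2.1); they are not functions from the released code.

```python
import heapq

def em_rbr_search(triplet, triplet_product, extend):
    """A minimal sketch of the EM-RBR search, under the assumptions stated above.

    triplet         -- the target triplet (h, r, t)
    triplet_product -- maps a state (tuple of triplets) to the product of its
                       triplet scores (Equation 3 applied to every member)
    extend          -- maps a state to (new_state, new_H) pairs via rule matching
    """
    init_state = (triplet,)
    phi = triplet_product(init_state)          # L of the initial state, initial Phi
    queue = [(1.0, 0, init_state, phi)]        # entries: (H, tie-breaker, state, L)
    uid = 1
    while queue:
        h_cur, _, state, l_cur = heapq.heappop(queue)
        phi = min(phi, l_cur)                  # Phi is updated after every pop
        if h_cur >= phi:                       # this branch can no longer improve Phi
            continue
        for new_state, h_new in extend(state): # matching and extension (Section 2.2.1)
            l_new = h_new * triplet_product(new_state)   # Equation 2
            if h_new < l_cur:                  # push only promising states
                heapq.heappush(queue, (h_new, uid, new_state, l_new))
                uid += 1
    return phi
```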

In the above procedure, we abstract three things: 1. matching and extension, 2. states, 3. the computation of $\mathcal{H}$ and $\mathcal{L}$. Details are given in the following subsections.

2.2.1 matching and extension

A state is a set of triplets; the initial state is the target triplet itself, and intermediate states are extended from the initial state. So, essentially, extending a state means extending the triplets in the state. For a triplet $(h,r,t)$, the process of matching and extension is roughly as follows:

1. Find rules $\omega\in\Omega$ of the form $B_{1}(x,z)\wedge B_{2}(z,y)\Longrightarrow H(x,y)$ with $H=r$. (We analyze rules of this form here; for rules like $B(x,y)\Longrightarrow H(x,y)$ the process is similar and is not detailed further.)

2. Assign entities to the variables in the rule, i.e. $x=h$, $y=t$.

3. Find all $z_{0}$ that satisfy $(x,B_{1},z_{0})\in G$ or $(z_{0},B_{2},y)\in G$, where $x=h$, $y=t$.

4. $(h,r,t)$ is extended to $\{(h,B_{1},z_{0}),(z_{0},B_{2},t)\}$. A triplet always has multiple extensions.

For example, when we expand the target triplet in the initial state, the resulting sub-state contains two triplets, at least one of which must be in the knowledge graph. When the sub-state is further expanded, the triplet already in the knowledge graph does not need to be expanded. Therefore, each sub-state contains $m+1$ triplets after extending $m$ times, and at least $m$ of them belong to the knowledge graph. A sketch of this matching-and-extension step is given below.
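The sketch below follows the four steps listed above, assuming the rule set is given as (B1, B2, H) relation names and the knowledge graph as a set of triplets; the function name and data layout are our own illustration, not the released implementation.

```python
def extend_triplet(h, r, t, rules, graph):
    """Extend one triplet (h, r, t) by rule matching (Section 2.2.1 sketch).

    rules -- iterable of (B1, B2, H) meaning B1(x,z) ^ B2(z,y) => H(x,y)
    graph -- set of known triplets (head, relation, tail)
    """
    extensions = []
    for b1, b2, head_rel in rules:
        if head_rel != r:                      # step 1: the rule head must equal r
            continue
        # step 2: bind the variables x = h, y = t
        # step 3: find every z0 with (h, B1, z0) in G or (z0, B2, t) in G
        candidates = {z for (x, rel, z) in graph if x == h and rel == b1}
        candidates |= {z for (z, rel, y) in graph if y == t and rel == b2}
        # step 4: each z0 yields one extension of (h, r, t)
        for z0 in candidates:
            extensions.append([(h, b1, z0), (z0, b2, t)])
    return extensions
```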

2.2.2 computation of $\mathcal{H}$ and $\mathcal{L}$

$\mathcal{H}_{\sim\mathcal{O}}(h,r,t)$ denotes the heuristic score of triplet $(h,r,t)$ when extended to state $\mathcal{O}$, and $\mathcal{L}_{\sim\mathcal{O}}(h,r,t)$ is the corresponding state score.

$$\mathcal{H}_{\sim\mathcal{O}}(h,r,t)=\prod_{(B_{1}\wedge B_{2}\Rightarrow H)\in\Delta_{Path}}\omega(B_{1},B_{2},H),\quad\text{where}\ \omega(B_{1},B_{2},H)\leftarrow e^{\frac{\|\bm{B_{1}}+\bm{B_{2}}-\bm{H}\|}{k}} \qquad (1)$$

$\mathcal{H}_{\sim\mathcal{O}}(h,r,t)$ is defined in Equation 1 as the product of the scores of all the rules used. $\Delta_{Path}$ represents the set of rules used in the extension from the initial state to the current state. $\omega(B_{1},B_{2},H)$ is the score of a rule of the form $B_{1}\wedge B_{2}\Rightarrow H$.

$$\mathcal{L}_{\sim\mathcal{O}}(h,r,t)=\mathcal{H}_{\sim\mathcal{O}}(h,r,t)\cdot\prod_{(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})\in\mathcal{O}}s_{\sim transX}(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t}) \qquad (2)$$

$\mathcal{L}_{\sim\mathcal{O}}(h,r,t)$ is defined in Equation 2 as the product of $\mathcal{H}_{\sim\mathcal{O}}(h,r,t)$ and the scores of all the triplets in the state. $\mathcal{O}$ denotes the state and $(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})$ is a triplet belonging to $\mathcal{O}$. $s_{\sim transX}(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})$ is the embedding score of this triplet as defined in Equation 3.

$$s_{\sim transX}(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})=\begin{cases}1 & \text{if}\ (\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})\in G\\ \|\bm{\mathcal{O}_{h}}+\bm{\mathcal{O}_{r}}-\bm{\mathcal{O}_{t}}\|/k+1 & \text{if}\ (\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})\notin G\end{cases} \qquad (3)$$

Rule’s score
To evaluate the score of a rule $B_{1}(x,z)\wedge B_{2}(z,y)\Longrightarrow H(x,y)$, we visualize the three triplets of this rule in a two-dimensional space in Figure 3 of Appendix A. In our model, a rule with high confidence should satisfy $\|\bm{x}+\bm{H}-\bm{y}\|\approx\|\bm{x}+\bm{B_{1}}-\bm{z}+\bm{z}+\bm{B_{2}}-\bm{y}\|$, which gives $\bm{H}\approx\bm{B_{1}}+\bm{B_{2}}$, so we can use $\|\bm{B_{1}}+\bm{B_{2}}-\bm{H}\|$ to evaluate the score of the rule, where $\bm{B_{1}},\bm{B_{2}},\bm{H}\in\mathbb{R}^{k}$ and $k$ is the dimension of the embedding. The smaller the score, the higher the confidence. To make the magnitudes in the calculation uniform, we divide the score of the rule by $k$ and then apply an exponential transformation, yielding the form in Equation 1. The reason for this transformation is explained in Section 2.4.
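Spelled out under the translation assumption $\bm{h}+\bm{r}\approx\bm{t}$ for true triplets (a small derivation we add for clarity, using the same vectors as above):
$$\bm{x}+\bm{B_{1}}\approx\bm{z},\quad \bm{z}+\bm{B_{2}}\approx\bm{y}\ \Longrightarrow\ \bm{x}+\bm{B_{1}}+\bm{B_{2}}\approx\bm{y},\qquad \bm{x}+\bm{H}\approx\bm{y}\ \Longrightarrow\ \bm{H}\approx\bm{B_{1}}+\bm{B_{2}},$$
so $\|\bm{B_{1}}+\bm{B_{2}}-\bm{H}\|$ measures how far the rule is from holding exactly in the embedding space.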

Triplet’s score
$\|\bm{\mathcal{O}_{h}}+\bm{\mathcal{O}_{r}}-\bm{\mathcal{O}_{t}}\|$ is the score of the triplet $(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})$ under the transE model (we take transE as an example here; if the combined embedding model changes, this formula should be replaced by the score function of that model). The smaller the value, the more likely the triplet is in $G$. When $(\mathcal{O}_{h},\mathcal{O}_{r},\mathcal{O}_{t})\in G$, the raw score is taken to be 0. As with the rule's score, we also apply a transformation to the triplet scores, namely dividing by $k$ and adding 1.
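As an illustration of Equations 1-3, the sketch below computes the rule score, the triplet score, and a state's $\mathcal{H}$ and $\mathcal{L}$ from TransE-style vectors. The data layout (a dict from entity/relation names to numpy vectors) and the function names are our assumptions, not the paper's code.

```python
import numpy as np

def rule_score(b1_vec, b2_vec, h_vec, k):
    # omega(B1, B2, H) = exp(||B1 + B2 - H|| / k), as in Equation 1
    return float(np.exp(np.linalg.norm(b1_vec + b2_vec - h_vec) / k))

def triplet_score(h_vec, r_vec, t_vec, k, in_graph):
    # Equation 3: 1 for triplets already in G, ||h + r - t|| / k + 1 otherwise
    if in_graph:
        return 1.0
    return float(np.linalg.norm(h_vec + r_vec - t_vec) / k + 1.0)

def state_scores(rule_path, state, emb, graph, k):
    """H (Equation 1) and L (Equation 2) of a state.

    rule_path -- list of (B1, B2, H) relation names used along the reasoning path
    state     -- list of (head, relation, tail) triplets in the state
    emb       -- dict mapping entity / relation names to numpy vectors
    graph     -- set of known triplets
    """
    H = 1.0
    for b1, b2, head_rel in rule_path:
        H *= rule_score(emb[b1], emb[b2], emb[head_rel], k)
    L = H
    for eh, er, et in state:
        L *= triplet_score(emb[eh], emb[er], emb[et], k, (eh, er, et) in graph)
    return H, L
```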

2.3 Example

Assumption 2.1.

We put all the necessary information in Tables 3 and 4. Apart from that, we make two assumptions. One is that we use the same symbol $r_{i}$ to represent both a rule and its score. The other is that we assume the relations given in Equation 4.

$$r_{1}r_{3}r_{5}>\mathcal{L}_{\sim s_{3}}(h,r,t)\quad\text{and}\quad\mathcal{H}_{\sim s_{7}}(h,r,t)>\Phi_{\sim(h,r,t)} \qquad (4)$$
Figure 2: Demonstration of the search process on an example. The search process is divided into six stages; each stage is contained in a sub-graph, and each sub-graph contains three parts: the top shows the current state of the priority queue, the middle part visualizes the search, and the formula for updating $\Phi_{\sim(h,r,t)}$ at that stage is given at the bottom.

In this section, we use an example to illustrate our algorithm, as shown in Figure 2 (some symbols conflict with earlier usage; for those symbols, the meaning is valid only within this example). The initial state $s_{0}$ contains only the triplet $(h,r,t)$, and its state score and heuristic score are both 1. At the beginning, the priority queue $Q$ has only one element, i.e. the initial state with its scores. The search process is as follows, and the necessary information is defined in Assumption 2.1.

  • Stage I. $s_{0}$ matches $r_{1}$ and $r_{2}$ and extends to $s_{1}$ and $s_{2}$ respectively. $s_{2}$ is a termination state because the triplets in $s_{2}$ are all in $G$. We use $\mathcal{L}_{\sim s_{2}}(h,r,t)$ to update $\Phi_{\sim(h,r,t)}$ and push $s_{1}$ into $Q$.

  • Stage II. Pop the top of the queue, $s_{1}$. Use it to update $\Phi_{\sim(h,r,t)}$ and then extend it to three new states, which are pushed into $Q$.

  • Stage III. Pop the top of the queue, $s_{3}$, to update $\Phi_{\sim(h,r,t)}$ and extend it by matching rule $r_{5}$. Since $r_{1}r_{3}r_{5}>\mathcal{L}_{\sim s_{3}}(h,r,t)$, the solution produced by this path will not be the global minimum; as a consequence, this state is no longer extended.

  • Stage IV. Pop the top of the queue, $s_{4}$, to update $\Phi_{\sim(h,r,t)}$ and extend it to get two new states $s_{6}$ and $s_{7}$.

  • Stage V. Pop the top of the queue, $s_{6}$, to update $\Phi_{\sim(h,r,t)}$ and extend it to $s_{8}$ and $s_{9}$ by matching rule $r_{7}$.

  • Stage VI. Pop the top of the queue, $s_{7}$; now $\mathcal{H}_{\sim s_{7}}(h,r,t)>\Phi_{\sim(h,r,t)}$, so $s_{7}$ and the remaining states in $Q$ need not be extended. Therefore, all the remaining states in $Q$ become termination states and the search stops.

2.4 Analysis

Is the algorithm sure to be effective?

For three real numbers $a,b,c>1$, it is possible that $c>a\cdot b\cdot 1$; for instance, $c=1.8$, $a=1.2$, $b=1.4$ gives $a\cdot b=1.68<c$. Consider a triplet $(h,r,t)$ and a rule $B_{1}(x,z)\wedge B_{2}(z,y)\Longrightarrow H(x,y)$ with $r=H$. Let $c$ be the score of $(h,r,t)$, $a$ the score of the rule, $b$ the score of the expanded new triplet that is not in the knowledge graph, and $1$ the score of the expanded new triplet that is in the knowledge graph. Then the score of $(h,r,t)$ can be reduced to $a\cdot b$, i.e. we use the newly expanded triplets and the rule to evaluate $(h,r,t)$.

Of course, this optimization is effective only for correct triplets. For a wrong triplet, rule matching yields another wrong triplet with a large score, so $c>a\cdot b$ is very unlikely in this case. As a result, correct triplets are optimized while wrong triplets generally are not, and therefore, from a macro perspective, the ranking of the correct triplets improves.

Is the algorithm guaranteed to terminate?

The heuristic score of a state is the product of the scores of all the rules along the reasoning path. These scores are all greater than 1, so when a search path is long enough, $\mathcal{H}$ must exceed $\Phi$. Hence the search always stops.

What is the principle behind the design of the rule and triplet scores?

From the above, we require that the scores of rules and triplets be greater than 1 and close to 1. Given that each dimension of an embedding obtained by a translation model is a number less than 1 and close to 0, we divide the raw score by the dimension $k$ and then add 1 to meet these requirements. In addition, to highlight the importance of rules in the calculation, we apply an exponential transformation to rule scores instead of adding 1.
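As a hypothetical numerical illustration (the raw distances below are made up, with $k=100$): a rule with raw distance $\|\bm{B_{1}}+\bm{B_{2}}-\bm{H}\|=0.5$ and a triplet outside $G$ with raw distance 0.37 get
$$\omega=e^{0.5/100}\approx 1.005,\qquad s_{\sim transX}=0.37/100+1=1.0037,$$
so both stay close to 1 and products over long reasoning paths grow slowly but monotonically.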

3 Experiment

3.1 Experiment setup

Dataset: We evaluate EM-RBR on FB15k, WN18 (Bordes et al., 2013) and FB15k-R. FB15k has 14951 entities, 1345 relations and 59071 test triplets in total. WN18 has 40943 entities, 18 relations and 5000 test triplets in total. We create FB15k-R, a subset of FB15k, which contains 1000 test triplets that have rich rules for reasoning.

Metrics: We use a number of commonly used metrics, including Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hit ratio with cut-off values n = 1, 10. MR measures the average rank of all correct entities and MRR is the average inverse rank of the correct entities. Hits@n measures the proportion of correct entities ranked in the top n. MR is always greater than or equal to 1, and a lower MR indicates better performance, while MRR and Hits@n range from 0.0 to 1.0 and higher scores reflect better predictions. We use the filtered setting protocol (Bordes et al., 2013), i.e., we filter out any corrupted triplets that appear in the KB to avoid a flawed evaluation.
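For reference, a small sketch of how these metrics can be computed once the filtered ranks are known; the function name and interface are illustrative only.

```python
import numpy as np

def filtered_metrics(ranks, hits_at=(1, 10)):
    """MR, MRR and Hits@n from a list of filtered ranks (rank 1 = best).

    `ranks` is assumed to already follow the filtered protocol, i.e. corrupted
    triplets that exist in the KB were removed before ranking.
    """
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MR": ranks.mean(), "MRR": (1.0 / ranks).mean()}
    for n in hits_at:
        metrics[f"Hits@{n}"] = (ranks <= n).mean()
    return metrics

# e.g. filtered_metrics([1, 3, 12, 2]) -> MR 4.5, Hits@1 0.25, Hits@10 0.75
```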

Baseline: To demonstrate the effectiveness of EM-RBR, we compare with a number of competitive baselines: TransE (Bordes et al., 2013), TransH (Wang et al., 2014), TransR (Lin et al., 2015), TransD (Ji et al., 2015), RUGE (Guo et al., 2017), ComplEx (Trouillon et al., 2016) and DistMult (Kadlec et al., 2017). Among these, TransE, TransH, TransR and TransD are combined with our reasoning framework. These 8 models are evaluated on FB15k and WN18 to show that our framework is a genuinely reinforced framework. Finally, all baselines and combined models are evaluated on FB15k-R.

Implementation: For TransE, TransH, TransR and TransD, we set the same parameters, i.e., embedding dimension $k=100$, learning rate $\lambda=0.001$ and margin $\gamma=1$, and we traverse all training triplets for 1000 rounds. Other parameters are set as in the published works (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015); we use the implementations at https://github.com/thunlp/Fast-TransX. For RUGE, we set the embedding dimension $k=100$ and keep the other hyper-parameters the same as Guo et al. (2017); the code is available at https://github.com/iieir-km/RUGE. For ComplEx and DistMult, all parameters are consistent with Trouillon et al. (2016) (https://github.com/ttrouill/complex).

3.2 Experiment results

Table 1: Experimental results on the FB15k, WN18 and FB15k-R test sets. [‡]: E-R(E) denotes EM-RBR(E), indicating that the embedding model in this experimental group is transE. [★]: Not used here because it is time-consuming and not better than transE on WN18, as reported in Lin et al. (2017). [∔]: This model is rerun and tested only on the subset of FB15k to show that our model can perform better than it.
Model    | FB15k: MR / MRR / H@1 / H@10  | WN18: MR / MRR / H@1 / H@10    | FB15k-R: MR / MRR / H@1 / H@10
TransE   | 70.3 / 45.77 / 29.98 / 74.27  | 200.9 / 57.47 / 23.21 / 97.68  | 71.33 / 26.11 / 14.9 / 48.1
TransH   | 72.56 / 45.81 / 30.37 / 74.01 | 210.7 / 61.94 / 32.03 / 97.49  | 50.65 / 30.43 / 18.25 / 54.95
TransR   | 55.98 / 47.88 / 31.1 / 77.04  | ★ - / - / - / -                | 29.64 / 18.51 / 7.65 / 37.6
TransD   | 56.41 / 47.88 / 32.48 / 75.99 | 202.8 / 60.35 / 29.6 / 97.37   | 28.24 / 26.16 / 14 / 50.45
‡E-R(E)  | 68.36 / 50.01 / 34.44 / 76.23 | 198.1 / 85.23 / 73.94 / 97.83  | 3.12 / 79.88 / 65.1 / 96.4
E-R(H)   | 70.72 / 52.39 / 38.82 / 76.52 | 201.4 / 84.57 / 74.97 / 96.48  | 3.52 / 85.61 / 75.45 / 97.8
E-R(R)   | 55.47 / 51.93 / 35.86 / 78.35 | ★ - / - / - / -                | 1.73 / 86.01 / 74.3 / 99.2
E-R(D)   | 55.21 / 53.02 / 38.25 / 78.33 | 201.8 / 84.63 / 75.21 / 97.5   | 3.245 / 82.04 / 70.45 / 96.15
∔RotatE  | - / - / - / -                 | - / - / - / -                  | 26.66 / 33.29 / 20.05 / 59.75
RUGE     | - / - / - / -                 | - / - / - / -                  | 53.63 / 49.14 / 33.05 / 78.2
ComplEx  | - / - / - / -                 | - / - / - / -                  | 51.43 / 51.1 / 35.8 / 79.0
DistMult | - / - / - / -                 | - / - / - / -                  | 62.02 / 46.2 / 30.2 / 77.1

Experimental results are shown in Table 1. Through this experiment, we want to prove two things: first, that EM-RBR is a valid reinforced model, i.e. EM-RBR(X) always performs better than model X; second, that EM-RBR beats all of the current state-of-the-art models on a dataset with rich rules.

When evaluating on FB15k and WN18, our model improves all metrics compared with the corresponding translation model baselines, especially MRR and Hits@1 on WN18. For example, EM-RBR(D) improves Hits@1 on WN18 from 0.296 to 0.752 compared to transD.

As for FB15k-R, each triplet in this dataset can match many rules, so the triplets can be strongly optimized under EM-RBR. On this dataset, the best MR is 1.73, from EM-RBR(R), an improvement of 24.93 over RotatE. The best MRR is 0.86, from EM-RBR(R), an improvement of 0.35 over ComplEx. The best Hits@1 is 0.7545, from EM-RBR(H), an improvement of 39.65 points over ComplEx. The best Hits@10 is 0.992, from EM-RBR(R), an improvement of 20.2 points over ComplEx.

3.3 Result analysis

The triplets in the test set of each dataset can be roughly divided into two parts: those for which rules can be matched, and which can therefore be optimized by our model, and those without matching rules. In FB15k, the ratio of these two parts is about 1:5. Although various metrics improve compared with the model before the combination, in fact only a small part of the triplets is optimized. It is also for this reason that the capability of our model is fully demonstrated on FB15k-R, because each triplet in that set has many matching rules and thus obtains a good optimization effect.

To better understand how individual triplets are optimized, we analyzed the rank of each triplet under the translation model and under the EM-RBR model when the head entity and the tail entity are replaced, respectively. The results are displayed in Table 2. The rows of the table are sorted in descending order of $s_{\sim trans}-s_{\sim ER}$, where $s_{\sim trans}$ is the rank under the corresponding translation model and $s_{\sim ER}$ is the rank under the corresponding EM-RBR model.

Table 2: Optimized case analysis. [∔]: the id number of the test case; for example, the first test case is /m/01qscs /award/award_nominee/award_nominations./award/award_nomination/award /m/02x8n1n and its id number is 0. [∗]: the rank of the test case under EM-RBR. [‡]: the rank of the test case under the embedding model. [◇]: L corresponds to replacing the head entity, R the tail entity.
Rank  | EM-RBR(E): ∔id, ∗E-R, ‡trans, ◇L/R | EM-RBR(H): id, E-R, trans, L/R | EM-RBR(R): id, E-R, trans, L/R
1     | 47722, 2, 14141, R  | 18355, 2, 12689, L | 15105, 2, 966, L
2     | 47722, 2, 13900, L  | 32966, 3, 8551, L  | 42675, 1, 868, R
3     | 18355, 2, 7525, L   | 18355, 2, 7231, R  | 34891, 2, 733, R
4     | 36133, 2, 6884, L   | 47722, 2, 4569, L  | 24314, 1, 714, R
5     | 33004, 1, 6253, L   | 24243, 1, 4547, L  | 32849, 1, 701, L
6     | 33243, 2, 5883, R   | 33004, 1, 4490, L  | 55951, 2, 673, L
7     | 30883, 2, 5674, R   | 47722, 2, 3977, R  | 38773, 1, 640, L
8     | 14035, 2, 4862, L   | 13358, 5, 3741, R  | 54283, 52, 674, L
9     | 18355, 2, 4525, R   | 55951, 2, 3699, L  | 25500, 1, 585, R
10    | 24243, 1, 3655, L   | 50019, 1, 3386, R  | 34891, 2, 555, L
…
19372 | 52886, 4, 2, R      | 23339, 6, 1, L     | 44273, 2, 1, R
19373 | 52707, 13, 11, L    | 23288, 7, 2, R     | 43969, 2, 1, R
19374 | 52529, 9, 7, R      | 23218, 7, 2, R     | 43664, 2, 1, L
19375 | 51447, 3, 1, R      | 21906, 7, 2, R     | 43483, 2, 1, L
19376 | 50932, 4, 2, R      | 20794, 7, 2, R     | 42380, 2, 1, R
…

4 Related work

For path-based methods, Lao et al. (2011) use the Path Ranking Algorithm (PRA) (Lao & Cohen, 2010) to estimate the probability of an unseen triplet as a combination of weighted random walks. Zhang et al. (2020) and Qu & Tang (2019) both combine Markov logic networks with embedding. Kok & Domingos (2007) is mainly a clustering algorithm that clusters entity sets under multiple relation categories. Gardner et al. (2014) make use of an external text corpus to increase the connectivity of the KB. The Neural LP model (Yang et al., 2017) compiles inferential tasks into sequences of differentiable numerical matrices. Besides, many studies model the path-finding problem as a Markov decision process, such as DeepPath (Xiong et al., 2017) and MINERVA (Das et al., 2017). For embedding methods, Nguyen (2017) has organized the existing work. We divide embedding methods into four categories: translation, bilinear & tensor, neural network, and complex vector. First, for translation, the Unstructured model (Bordes et al., 2014) assumes that the head and tail entity vectors are similar without distinguishing relation types. The Structured Embedding (SE) model (Bordes et al., 2011) assumes that the head and tail entities are similar only in a relation-dependent subspace. Later came transE, transH and transR (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015), etc. Sadeghian et al. (2019) mine first-order logical rules from knowledge graphs and use those rules to solve KBC. Additionally, other work (Yang et al., 2017; Galárraga et al., 2013) can extract high-quality rules from a knowledge base. For the second category, DISTMULT (Yang et al., 2014) is based on the bilinear model (Nickel et al., 2011), with each relation represented by a diagonal rather than a full matrix. SimplE (Kazemi & Poole, 2018) extends CP models (Hitchcock, 1927) to allow the two embeddings of each entity to be learned dependently. The third category implements embedding with a neural network; apart from the models mentioned in Section 1, NTN (Socher et al., 2013) and ER-MLP (Dong et al., 2014) also belong here. Fourth, instead of embedding entities and relations in a real-valued vector space, ComplEx (Trouillon et al., 2016) extends DISTMULT to the complex vector space, and ComplEx-N3 (Lacroix et al., 2018) extends ComplEx with a weighted nuclear 3-norm. Also in the complex vector space, RotatE (Sun et al., 2019) defines each relation as a rotation from the head entity to the tail entity, and QuatE (Zhang et al., 2019) represents entities by quaternion embeddings (i.e., hypercomplex-valued embeddings) and models relations as rotations in quaternion space.

5 Conclusion & Future work

This paper introduces an innovative framework called EM-RBR that combines embedding and rule-based reasoning and can be easily integrated with any translation-based embedding model. Unlike previous joint models that try to obtain better embeddings from rules and triplets, our model solves completion from the reasoning perspective by conducting multi-relation path prediction, i.e. a breadth-first search. We also demonstrate that EM-RBR can efficiently improve the performance of embedding methods for KGC. This makes existing translation-based embedding methods more suitable and reliable for real, large-scale knowledge inference tasks.

There are two possible directions for future work. On the one hand, we will combine our model with more embedding models, not just translation-based ones. On the other hand, we plan to extract more, and more reliable, association rules to improve our results. As mentioned above, only part of the triplets are optimized when evaluating on FB15k; for the rest, the fundamental reason is that there is no corresponding rule to match. Solving these two problems would further improve EM-RBR.

References

  • Bordes et al. (2011) Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. Learning structured embeddings of knowledge bases. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pp. 2787–2795, 2013.
  • Bordes et al. (2014) Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. A semantic matching energy function for learning with multi-relational data. Machine Learning, 94(2):233–259, 2014.
  • Das et al. (2017) Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alexander J. Smola, and Andrew McCallum. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. CoRR, abs/1711.05851, 2017.
  • Dong et al. (2014) Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.  601–610, 2014.
  • Galárraga et al. (2013) Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. Amie: association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd international conference on World Wide Web, pp.  413–422, 2013.
  • Gardner et al. (2014) Matt Gardner, Partha Talukdar, Jayant Krishnamurthy, and Tom Mitchell. Incorporating vector space similarity in random walk inference over knowledge bases. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp.  397–406, 2014.
  • Graupmann et al. (2005) Jens Graupmann, Ralf Schenkel, and Gerhard Weikum. The spheresearch engine for unified ranked retrieval of heterogeneous xml and web documents. volume 2, pp.  529–540, 01 2005.
  • Guo et al. (2016) Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. Jointly embedding knowledge graphs and logical rules. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.  192–202, 2016.
  • Guo et al. (2017) Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. Knowledge graph embedding with iterative guidance from soft rules. arXiv preprint arXiv:1711.11231, 2017.
  • Hitchcock (1927) Frank L Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6(1-4):164–189, 1927.
  • Hoffmann et al. (2011) Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, pp.  541–550, USA, 2011. Association for Computational Linguistics. ISBN 9781932432879.
  • Ji et al. (2015) Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp.  687–696, 2015.
  • Kadlec et al. (2017) Rudolf Kadlec, Ondrej Bajgar, and Jan Kleindienst. Knowledge base completion: Baselines strike back. arXiv preprint arXiv:1705.10744, 2017.
  • Kazemi & Poole (2018) Seyed Mehran Kazemi and David Poole. Simple embedding for link prediction in knowledge graphs. In Advances in neural information processing systems, pp. 4284–4295, 2018.
  • Kok & Domingos (2007) Stanley Kok and Pedro Domingos. Statistical predicate invention. In Proceedings of the 24th international conference on Machine learning, pp.  433–440, 2007.
  • Lacroix et al. (2018) Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical tensor decomposition for knowledge base completion. In ICML, 2018.
  • Lao & Cohen (2010) Ni Lao and William W. Cohen. Relational retrieval using a combination of path-constrained random walks. Mach. Learn., 81(1):53–67, 2010.
  • Lao et al. (2011) Ni Lao, Tom Mitchell, and William W Cohen. Random walk inference and learning in a large scale knowledge base. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.  529–539. Association for Computational Linguistics, 2011.
  • Lin et al. (2017) Hailun Lin, Yong Liu, Weiping Wang, Yinliang Yue, and Zheng Lin. Learning entity and relation embeddings for knowledge resolution. Procedia Computer Science, 108:345–354, 2017.
  • Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence, 2015.
  • Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119, 2013.
  • Nguyen et al. (2017) Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. A novel embedding model for knowledge base completion based on convolutional neural network. arXiv preprint arXiv:1712.02121, 2017.
  • Nguyen (2017) Dat Quoc Nguyen. An overview of embedding models of entities and relationships for knowledge base completion. arXiv preprint arXiv:1703.08098, 2017.
  • Nickel et al. (2011) Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective learning on multi-relational data. In Icml, volume 11, pp.  809–816, 2011.
  • Qu & Tang (2019) Meng Qu and Jian Tang. Probabilistic logic neural networks for reasoning. In Advances in Neural Information Processing Systems, pp. 7712–7722, 2019.
  • Rocktäschel et al. (2015) Tim Rocktäschel, Sameer Singh, and Sebastian Riedel. Injecting logical background knowledge into embeddings for relation extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.  1119–1129, 2015.
  • Sadeghian et al. (2019) Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, and Daisy Zhe Wang. Drum: End-to-end differentiable rule mining on knowledge graphs. In Advances in Neural Information Processing Systems, pp. 15321–15331, 2019.
  • Socher et al. (2013) Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems, pp. 926–934, 2013.
  • Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. Rotate: Knowledge graph embedding by relational rotation in complex space. CoRR, abs/1902.10197, 2019.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. CoRR, abs/1606.06357, 2016.
  • Wang et al. (2019) Pengwei Wang, Dejing Dou, Fangzhao Wu, Nisansa de Silva, and Lianwen Jin. Logic rules powered knowledge graph embedding. arXiv preprint arXiv:1903.03772, 2019.
  • Wang & Cohen (2016) William Yang Wang and William W Cohen. Learning first-order logic embeddings via matrix factorization. In Ijcai, pp.  2132–2138, 2016.
  • Wang et al. (2014) Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI conference on artificial intelligence, 2014.
  • Wasserman Pritsker et al. (2015) Evgenia Wasserman Pritsker, William Cohen, and Einat Minkov. Learning to identify the best contexts for knowledge-based WSD. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.  1662–1667, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1192.
  • Xiong et al. (2017) Wenhan Xiong, Thien Hoang, and William Yang Wang. Deeppath: A reinforcement learning method for knowledge graph reasoning. CoRR, abs/1707.06690, 2017.
  • Yang et al. (2014) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.
  • Yang et al. (2017) Fan Yang, Zhilin Yang, and William W Cohen. Differentiable learning of logical rules for knowledge base reasoning. In Advances in Neural Information Processing Systems, pp. 2319–2328, 2017.
  • Zhang et al. (2019) Shuai Zhang, Yi Tay, Lina Yao, and Qi Liu. Quaternion knowledge graph embedding. CoRR, abs/1904.10281, 2019.
  • Zhang et al. (2020) Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, and Le Song. Efficient probabilistic logic reasoning with graph neural networks. arXiv preprint arXiv:2001.11850, 2020.

Appendix A Rule’s visualization

Figure 3: Visualization of a rule.

Appendix B Pseudo code

Algorithm 1 EM-RBR
Input: $(h,r,t)$
Output: $\Phi_{\sim(h,r,t)}$
1: Initialize $Q$ as an empty priority queue
2: Initialize the first state: $s_{0}\leftarrow\{(h,r,t)\}$; $\mathcal{H}_{\sim s_{0}}(h,r,t),\mathcal{L}_{\sim s_{0}}(h,r,t)\leftarrow 1,1$
3: Initialize the score: $\Phi_{\sim(h,r,t)}\leftarrow s_{\sim transX}(h,r,t)/k+1$
4: $Q$.push($s_{0}$)
5: while !$Q$.empty() do
6:     $s_{cur}\leftarrow Q$.pop()
7:     $\Phi_{\sim(h,r,t)}\leftarrow\min\{\Phi_{\sim(h,r,t)},\mathcal{L}_{\sim s_{cur}}(h,r,t)\}$
8:     if $\mathcal{H}_{\sim s_{cur}}(h,r,t)<\Phi_{\sim(h,r,t)}$ then
9:         $\mathbb{S}_{ne}\leftarrow$ extend($s_{cur}$)
10:        calculate $\mathcal{H}_{\sim s_{ne}}(h,r,t)$ for each $s_{ne}\in\mathbb{S}_{ne}$
11:        calculate $\mathcal{L}_{\sim s_{ne}}(h,r,t)$ for each $s_{ne}\in\mathbb{S}_{ne}$
12:        for $s_{ne}\in\mathbb{S}_{ne}$ do
13:            if $\mathcal{H}_{\sim s_{ne}}(h,r,t)<\mathcal{L}_{\sim s_{cur}}(h,r,t)$ then
14:                $Q$.push($s_{ne}$)
15:            end if
16:        end for
17:    end if
18: end while

Appendix C Data used in the example

Table 3: Triplets in each state. [★]: If this symbol appears in the upper right corner of a triplet, the triplet is not in the knowledge graph. All other triplets are in the knowledge graph.
state triplets
$s_{1}$ | $(h,B_{1},m_{1})^{\star}$  $(m_{1},B_{2},t)$
$s_{2}$ | $(h,B_{3},m_{2})$  $(m_{2},B_{4},t)$
$s_{3}$ | $(h,B_{5},m_{3})$  $(m_{3},B_{6},m_{1})^{\star}$  $(m_{1},B_{2},t)$
$s_{4}$ | $(h,B_{5},m_{4})^{\star}$  $(m_{4},B_{6},m_{1})$  $(m_{1},B_{2},t)$
$s_{5}$ | $(h,B_{7},m_{5})^{\star}$  $(m_{5},B_{8},m_{1})$  $(m_{1},B_{2},t)$
$s_{6}$ | $(h,B_{9},m_{6})$  $(m_{6},B_{10},m_{4})^{\star}$  $(m_{4},B_{6},m_{1})$  $(m_{1},B_{2},t)$
$s_{7}$ | $(h,B_{11},m_{7})$  $(m_{7},B_{12},m_{4})^{\star}$  $(m_{4},B_{6},m_{1})$  $(m_{1},B_{2},t)$
$s_{8}$ | $(h,B_{9},m_{6})$  $(m_{6},B_{13},m_{8})^{\star}$  $(m_{8},B_{14},m_{4})$  $(m_{4},B_{6},m_{1})$  $(m_{1},B_{2},t)$
$s_{9}$ | $(h,B_{9},m_{6})$  $(m_{6},B_{13},m_{9})$  $(m_{9},B_{14},m_{4})^{\star}$  $(m_{4},B_{6},m_{1})$  $(m_{1},B_{2},t)$
Table 4: Rule’s score
rule score
$B_{1}(x,z)\wedge B_{2}(z,y)\Rightarrow r(x,y)$ | $r_{1}$
$B_{3}(x,z)\wedge B_{4}(z,y)\Rightarrow r(x,y)$ | $r_{2}$
$B_{5}(x,z)\wedge B_{6}(z,y)\Rightarrow B_{1}(x,y)$ | $r_{3}$
$B_{7}(x,z)\wedge B_{8}(z,y)\Rightarrow B_{1}(x,y)$ | $r_{4}$
$B_{9}(x,z)\wedge B_{10}(z,y)\Rightarrow B_{5}(x,y)$ | $r_{5}$
$B_{11}(x,z)\wedge B_{12}(z,y)\Rightarrow B_{5}(x,y)$ | $r_{6}$
$B_{13}(x,z)\wedge B_{14}(z,y)\Rightarrow B_{10}(x,y)$ | $r_{7}$