
MöbiusE: Knowledge Graph Embedding on Möbius Ring

Yao Chen ([email protected]), Jiangang Liu, Zhe Zhang, Shiping Wen, Wenjun Xiong
Department of Computer Science, Southwestern University of Finance and Economics, China
Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
Abstract

In this work, we propose a novel Knowledge Graph Embedding (KGE) strategy, called MöbiusE, in which entities and relations are embedded on the surface of a Möbius ring. This strategy is inspired by the classic TorusE, in which the addition of two arbitrary elements is subject to a modulus operation. In this sense, TorusE naturally guarantees the critical boundedness of embedding vectors in KGE. However, the nonlinearity of the addition operation on a torus derives solely from the modulus operation, which to some extent restricts the expressiveness of TorusE. As a generalization of TorusE, MöbiusE also uses a modulus operation to preserve the closedness of its addition operation, but the coordinates on a Möbius ring interact with each other: a point moving along the parametric trace of a Möbius ring arrives on the opposite side of the surface after one full cycle. Hence, MöbiusE exhibits far greater nonlinear expressiveness than TorusE, and in turn produces more precise embeddings. In our experiments, MöbiusE outperforms TorusE and other classic embedding strategies in several key indicators.

keywords:
Möbius ring, Torus ring, Knowledge graph, Embedding

1 Introduction

Graph or network structures can often be used to model the intrinsic information behind data [1]. As a typical example, a knowledge graph (KG) is a collection of facts about the real world; related examples include DBpedia [2], YAGO [3] and Freebase [4]. Such KG databases can be applied in many realistic engineering tasks, such as knowledge inference, question answering, and sample labeling. Driven by these applications, an intriguing and challenging problem for KG is: how to infer unknown knowledge (or missing facts) from the existing KG database.

For any KG, the basic element of knowledge is represented by a triplet (h, r, t), which is composed of two entities h, t and one relation r. In detail, h is called the head, t is called the tail, and (h, r, t) means that head h maps to tail t via the relation r. For example, the entity h = U.S.A projects to t = DonaldTrump under the relation r = ThePresidentOf, and the entity h = U.S.A projects to t = MikePence under the relation r = TheVicePresidentOf. Based on these two triplets, and considering the fact that "the president and vice president of a country at the same time are colleagues", one predicts the existence of the triplet (DonaldTrump, IsColleagueOf, MikePence) if it is not contained in the given database. For realistic applications such as question answering, the core task is to predict triplets which are not listed in the given database; this task is called link prediction in KG.

There exist many types of methods for link prediction in KG, among which translation-based methods treat each entity and relation as embedded vectors in some constructed algebraic space, so that each true triplet translates into a simple constraint described by an algebraic expression f(h, r, t). With such a method, if some (h, r, t) is not present in the KG but the corresponding constraint value f(h, r, t) is very small, then it is reasonable to believe that (h, r, t) is a missing fact. By choosing different algebraic spaces, many different translation-based models have been obtained. TransE [5] is the first translation-based model for link prediction tasks; it uses the Euclidean space ℝ^n as the embedding space and sets the constraint quantity for each triplet as simply as ‖h + r − t‖. TransR [6] builds the constraint equation by mapping the entities to a subspace derived from the relation vector. TransH [7] improves TransE in dealing with reflexive/ONE-TO-MANY/MANY-TO-ONE/MANY-TO-MANY relations in KG; the improvement comes from treating each relation as a translation operation on a hyperplane. DistMult [8] utilizes a simple bilinear formulation to model the constraint generated by triplets; in this way the composition of relations is characterized by matrix multiplication, which makes the model good at capturing relational semantics. ComplEx [9] uses the standard dot product between embeddings and demonstrates the capability of complex-valued rather than real-valued embeddings. TorusE [10] chooses a compact Lie group, a torus, as the embedding space, and is more scalable to large knowledge graphs because of its lower computational complexity.
Other types of embedding methods include TransA [11], a locally and temporally adaptive translation-based approach; ConvE [12], a multi-layer convolutional network embedding model; TransF, an embedding strategy with a flexible constraint for each triplet; and QuaternionE [13], an embedding method that treats each relation as a quaternion in hypercomplex space.

Among the above embedding methods, the basic operation behind TorusE is a simple modulus operation, which automatically regularizes the calculated results into a given bounded region. In this way, no extra regularization step is needed in TorusE. Inspired by this property, we move beyond the one-dimensional modulus operation on a Lie group and propose MöbiusE, which takes a Möbius ring (whose basic dimension is two) as the embedding space. MöbiusE has the following major benefits for embedding tasks. First, MöbiusE provides more flexibility, since each point on the surface of a Möbius ring has two different coordinate expressions. Second, the expressiveness of MöbiusE is much greater than that of TorusE, due to the more complex distance function built into the Möbius ring. Third, MöbiusE automatically truncates the training vectors to the constraint space, just as TorusE does. Finally, MöbiusE subsumes TorusE and inherits all its attractive properties, such as the ability to model symmetry/antisymmetry, inversion, and composition.

The rest of the paper is organized as follows. Section 2 revisits the idea of TorusE and proposes the basic idea of MöbiusE; in particular, we discuss how to define the distance function on a Möbius ring. Section 3 provides the experimental results for MöbiusE on the datasets FB15K and WN18. Section 4 discusses related works and compares MöbiusE with other embedding methods. Section 5 concludes this paper. Finally, Section 6 provides the proofs of several key propositions on the Möbius ring.

Name | Triplet Score Function | Embedding Space
TransE [5] | ‖h + r − t‖ | h, r, t ∈ ℝ^n
TransH [7] | ‖(h − w_r^T h w_r) + d_r − (t − w_r^T t w_r)‖ | h, t, d_r, w_r ∈ ℝ^n
RESCAL [14] | h^T M_r t | ‖h‖_2 ≤ 1, ‖t‖_2 ≤ 1, ‖M_r‖_F ≤ 1
TorusE [10] | ‖h + r − t‖_{𝕋^n} | h, r, t ∈ 𝕋^n
DistMult [8] | −h^T diag(r) t | h, r, t ∈ ℝ^n
ComplEx [9] | −Re(h^T diag(r) t̄) | h, r, t ∈ ℂ^n
MöbiusE (ours) | dist(h ⊕ r, t) | h, r, t ∈ 𝕄_n^{q/p}
Table 1: Scoring Functions of Typical KGE Models

2 Embedding on Möbius Ring

A KG is described by a set of triples Δ, where each l = (h, r, t) ∈ Δ contains h, r, and t as head entity, relation, and tail entity, respectively. Denote by E = {h, t | (h, r, t) ∈ Δ} and R = {r | (h, r, t) ∈ Δ} the sets of entities and relations of Δ. Let f be the scoring function; the task of KGE is to find embedding vectors h, r, t (for the head, relation, and tail) that minimize:

𝓘 = Σ_{l=(h,r,t)∈Δ} Σ_{(h′,r,t′)∈Δ′_l} [γ + f(h, r, t) − f(h′, r, t′)]_+ ,  (1)

where

H_l = {(e, r, t) | e ∈ E, e ≠ h},  l = (h, r, t),  (2)
T_l = {(h, r, e) | e ∈ E, e ≠ t},  l = (h, r, t),  (3)
Δ′_l = (H_l ∪ T_l) − Δ,  (4)

where l = (h, r, t) ∈ Δ, and γ is a margin hyperparameter. The function [·]_+ is defined by [x]_+ = max(0, x) for x ∈ ℝ.
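As a minimal sketch (our own illustration, not the paper's training code), the hinge term of (1) for one positive triple and a batch of precomputed corrupted scores can be written as:

```python
# Margin-based ranking loss of Eq. (1) for a single positive triple.
# `f_pos` is f(h, r, t); `f_negs` are scores of corrupted triples.
# Helper names are hypothetical; the paper fixes no implementation.

def margin_loss(f_pos: float, f_negs: list, gamma: float) -> float:
    """Sum of hinge terms [gamma + f_pos - f_neg]_+ over corrupted triples."""
    return sum(max(0.0, gamma + f_pos - f_neg) for f_neg in f_negs)

# Corrupted triples scoring at least gamma above the positive contribute 0.
assert abs(margin_loss(0.2, [1.5, 0.25, 1.1], gamma=1.0) - 1.05) < 1e-9
```

Minimizing this loss pushes positive triples to score at least γ below every corrupted triple.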

Suppose l = (h, r, t) is a positive triple; the plausibility of l assessed by f is given by

rank_l = the rank of f(l) in the sequence {f(l)} ∪ f(Δ′_l), sorted in ascending order.  (5)

In general, we use the value rank_l to evaluate the plausibility of a candidate triplet.
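The rank in (5) can be sketched as follows (a hypothetical helper of our own; ties are broken here in favor of the positive triple):

```python
# rank_l of Eq. (5): position of the positive score among itself and the
# scores of its filtered corrupted triples, sorted ascending.

def rank_of(f_positive: float, f_corrupted: list) -> int:
    # 1 + number of corrupted triples scoring strictly better (lower).
    return 1 + sum(1 for s in f_corrupted if s < f_positive)

# Two corrupted triples score below 0.3, so the positive triple ranks 3rd.
assert rank_of(0.3, [0.1, 0.2, 0.7, 0.9]) == 3
```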

To prepare for the following description, we define two key functions 𝕞_k(·) and 𝕕_k(·).

For any u ∈ ℝ and k ∈ ℤ⁺, 𝕞_k(u) is defined as follows (𝕞_1(u) is equivalent to [u] in [10]):

𝕞_k(u) ∈ [0, k) and u ≡ 𝕞_k(u) mod k.  (6)

𝕕_k(u) is defined as

𝕕_k(u) = min(𝕞_k(u), k − 𝕞_k(u)).  (7)

Based on these definitions, for any u ∈ ℝ it is not difficult to verify that 𝕞_k(u) = 𝕞_k(u ± k), 𝕞_k(−u) = k − 𝕞_k(u) (for u not a multiple of k), 𝕕_k(−u) = 𝕕_k(u), and 𝕕_k(u) = 𝕕_k(u ± k). In particular, 𝕕_k(u) can be reformulated as 𝕕_k(u) = min_{i∈ℤ} |u + ik|.
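These two functions are straightforward to implement; the sketch below (with our own names `m_k`, `d_k`) checks the identities listed above numerically:

```python
# The modulus function m_k of Eq. (6) and the wrap-around distance d_k of
# Eq. (7); Python's % already maps into [0, k) for positive k.

def m_k(u: float, k: float) -> float:
    """Representative of u modulo k in [0, k)."""
    return u % k

def d_k(u: float, k: float) -> float:
    """Distance from u to the nearest multiple of k: min_i |u + i*k|."""
    r = m_k(u, k)
    return min(r, k - r)

assert abs(m_k(2.3, 1) - 0.3) < 1e-12             # m_1(2.3) = 0.3
assert abs(m_k(-0.3, 1) - 0.7) < 1e-12            # m_k(-u) = k - m_k(u)
assert abs(d_k(0.9, 1) - 0.1) < 1e-12             # wraps around the ring
assert abs(d_k(-0.9, 1) - d_k(0.9, 1)) < 1e-12    # d_k(-u) = d_k(u)
assert abs(d_k(0.4, 1) - d_k(1.4, 1)) < 1e-12     # d_k(u) = d_k(u + k)
```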

2.1 Revisiting the Torus Ring 𝕋^2

Before introducing 𝕋^2, we first review the definition of the torus ring 𝕋^1. As shown in Fig. 1, 𝕋^1 is a one-dimensional ring. In particular, any two points A and B on 𝕋^1 can be uniquely defined by their angles θ = 2πx and ω = 2πy with x, y ∈ [0, 1); these variables are all depicted in Fig. 1.

The addition operation ⊕ of any two points A and B is defined by x ⊕ y = 𝕞_1(x + y). The distance between A and B can be viewed as the minimum of the rotating angles from A to B and from B to A, i.e., arc_AB and arc_BA as shown in Fig. 1. In mathematical terms,

dist(x, y) = min{𝕞_1(x − y), 𝕞_1(y − x)} = 𝕕_1(x − y).

This distance function (and all the following distance functions) satisfies

dist(x, x) = 0,
dist(x, y) = dist(y, x) ≥ 0,  (11)
dist(x, z) ≤ dist(x, y) + dist(y, z),

where x, y, z ∈ [0, 1).

The torus 𝕋^2 can be viewed as the direct product of two torus rings 𝕋^1. Given x = (x_1, x_2), y = (y_1, y_2) ∈ [0, 1) × [0, 1) as two points on 𝕋^2, the addition operation on 𝕋^2 is defined by x ⊕ y = (𝕞_1(x_1 + y_1), 𝕞_1(x_2 + y_2)). The distance function on 𝕋^2 is given as

dist(x, y) = ‖ϝ‖_τ,  ϝ = (dist(x_1, y_1), dist(x_2, y_2)),

where ‖·‖_τ is some vector norm.

The torus 𝕋^2 can be extended to the n-dimensional case. Let x = (x_i), y = (y_i) ∈ [0, 1)^n be two points on 𝕋^n; then

x ⊕ y = (𝕞_1(x_1 + y_1), ⋯, 𝕞_1(x_n + y_n)),  (12)
dist(x, y) = ‖ϝ‖_τ,  (13)

where ϝ = (𝕕_1(x_1 − y_1), 𝕕_1(x_2 − y_2), ⋯, 𝕕_1(x_n − y_n)).

Summarizing the above discussion, the objective (1) becomes TorusE when the scoring function f(·) is set as

f(h, r, t) = dist(h ⊕ r, t),  h, r, t ∈ 𝕋^n,

where dist(·) is given by (13) and ⊕ is given by (12).
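As a concrete sketch of the TorusE operations (12)–(13), with the L1 norm chosen for ‖·‖_τ and helper names of our own:

```python
# TorusE on T^n: componentwise modulus-1 addition and wrap-around distance,
# aggregated here with the L1 norm.

def torus_add(x, y):
    return [(xi + yi) % 1.0 for xi, yi in zip(x, y)]

def torus_dist(x, y):
    # d_1 in each coordinate, then the L1 norm over coordinates.
    return sum(min((xi - yi) % 1.0, (yi - xi) % 1.0) for xi, yi in zip(x, y))

def torus_score(h, r, t):
    return torus_dist(torus_add(h, r), t)

# Addition wraps: 0.9 + 0.1 lands back at 0.
assert torus_add([0.9, 0.2], [0.1, 0.3]) == [0.0, 0.5]
# Distance also wraps: 0.9 and 0.1 are only 0.2 apart on the ring.
assert abs(torus_dist([0.9, 0.2], [0.1, 0.3]) - 0.3) < 1e-12
```

The modulus keeps every coordinate inside [0, 1), so no separate regularization step is ever needed.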

Figure 1: Illustration of the torus ring 𝕋^1.

2.2 Möbius Ring 𝕄^2

As an intuitive illustration of the Möbius ring, we plot the surface of 𝕄^2 in Fig. 3. Let the radius from the center of the hole to the center of the tube be R, and the radius of the tube be r; then the parametric equation of the Möbius ring is

x(θ, ω) = (R + r·cos(θ/2 + ω))·cos θ,
y(θ, ω) = (R + r·cos(θ/2 + ω))·sin θ,  (17)
z(θ, ω) = r·sin(θ/2 + ω).

Letting θ = 2πx_1 and ω = 2πx_2, any point on 𝕄^2 can be uniquely defined by (x_1, x_2). In particular, the period of x_1 is 2 and the period of x_2 is 1; hence we set (x_1, x_2) ∈ [0, 2) × [0, 1).

Given x = (x_1, x_2), y = (y_1, y_2) ∈ [0, 2) × [0, 1) as two points on a Möbius ring, the addition operation ⊕ on 𝕄^2 is defined by

x ⊕ y = (𝕞_2(x_1 + y_1), 𝕞_1(x_2 + y_2)),  (18)

i.e., the addition on 𝕄^2 is the modulus addition in each dimension, with moduli 2 and 1, respectively.

The distance function dist(·) between x and y is defined as

dist(x, y) = min(w_1, w_2),  (19)
w_1 = 𝕕_2(y_1 − x_1) + 𝕕_1(y_2 − x_2),  (20)
w_2 = 𝕕_2(y_1 − x_1 + 1) + 𝕕_1(y_2 − x_2 + 1/2).  (21)

One can further verify that the above definition satisfies the basic properties in (11). Moreover, the above distance satisfies:

Proposition 1.

For the distance defined in (19) on 𝕄^2 and any two points x = (x_1, x_2), y = (y_1, y_2) in [0, 2) × [0, 1), it holds that 0 ≤ dist(x, y) ≤ 3/4, where the upper bound 3/4 is attained when x_1 − y_1 = k ± 1/2 and x_2 − y_2 = k′/2 ± 1/4 for some integers k and k′.

We give the proof of Proposition 1 in Appendix 6.3. In what follows, we explain the reasoning behind definition (19) by using equation (17).

Given x = (x_1, x_2), y = (y_1, y_2) ∈ 𝕄^2 and the corresponding points A and B on 𝕄^2, transforming A to B (or B to A) requires one of the following two operations:

  • a).

    To transform A to B, add y_1 − x_1 + 2k to the first component of x and y_2 − x_2 + k′ to the second component of x, for some integers k and k′. To transform B to A, add x_1 − y_1 + 2k to the first component of y and x_2 − y_2 + k′ to the second component of y. Hence, measuring the distance in the first dimension with modulus 2 (recalling x_1, y_1 ∈ [0, 2)) gives 𝕕_2(y_1 − x_1), and measuring the distance in the second dimension with modulus 1 (recalling x_2, y_2 ∈ [0, 1)) gives 𝕕_1(y_2 − x_2). Adding the two dimensions, the distance in this case is 𝕕_2(y_1 − x_1) + 𝕕_1(y_2 − x_2), which conforms with (20).

  • b).

    To transform A to B, add y_1 − x_1 + 2k + 1 to the first component of x and y_2 − x_2 + k′ + 1/2 to the second component of x, for some integers k and k′. To transform B to A, add x_1 − y_1 + 2k + 1 to the first component of y and x_2 − y_2 + k′ + 1/2 to the second component of y. For an intuitive verification, the values of θ/2 + ω and θ for A in (17) are

    θ/2 + ω = πx_1 + 2πx_2,  θ = 2πx_1;

    after adding y_1 − x_1 + 2k + 1 to x_1 and y_2 − x_2 + k′ + 1/2 to x_2, they become

    θ = 2πy_1 + (4k + 2)π,
    θ/2 + ω = π(y_1 + 2k + 1) + 2π(y_2 + k′ + 1/2) = πy_1 + 2πy_2 + 2(k + k′ + 1)π,

    which coincides with the coordinates of B on 𝕄^2 (ignoring multiples of 2π). A similar calculation applies to transforming B to A, so we omit the details. To sum up, measuring the distance in the first dimension with modulus 2 gives 𝕕_2(y_1 − x_1 + 1), and measuring the distance in the second dimension with modulus 1 gives 𝕕_1(y_2 − x_2 + 1/2). Adding the two dimensions, the distance in this case is 𝕕_2(y_1 − x_1 + 1) + 𝕕_1(y_2 − x_2 + 1/2), which conforms with (21).

Summarizing the above discussion, the distance between x, y ∈ 𝕄^2 is the minimum over the two cases, which is exactly (19).
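The two-case distance (19)–(21) is easy to check numerically; the sketch below (our own illustration) also verifies the 3/4 bound of Proposition 1 on a grid:

```python
# Distance on the Möbius ring M^2, Eqs. (19)-(21).

def d(u, k):
    r = u % k
    return min(r, k - r)

def mobius2_dist(x, y):
    w1 = d(y[0] - x[0], 2) + d(y[1] - x[1], 1)            # Eq. (20)
    w2 = d(y[0] - x[0] + 1, 2) + d(y[1] - x[1] + 0.5, 1)  # Eq. (21)
    return min(w1, w2)

# (1, 0.5) parametrizes the same surface point as (0, 0): distance zero.
assert mobius2_dist((1.0, 0.5), (0.0, 0.0)) < 1e-12
# The bound 3/4 is attained, e.g., at x - y = (1/2, 1/4) ...
assert abs(mobius2_dist((0.5, 0.25), (0.0, 0.0)) - 0.75) < 1e-12
# ... and is never exceeded on a grid over [0, 2) x [0, 1).
grid = (mobius2_dist((a / 40, b / 40), (0.0, 0.0))
        for a in range(80) for b in range(40))
assert max(grid) <= 0.75 + 1e-9
```

The first assertion illustrates the two coordinate expressions of one surface point mentioned in the introduction.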

In fact, the Möbius ring 𝕄^2 degenerates to a torus ring in a special case, as shown in the following proposition, whose proof is given in Appendix 6.3.

Proposition 2.

If the first dimension of 𝕄^2 is fixed to zero, then 𝕄^2 is equivalent to 𝕋^1.

For comparison between the torus ring and the Möbius ring, we list the parametric equations of the torus 𝕋^2:

x(θ, ω) = (R + r·cos ω)·cos θ,
y(θ, ω) = (R + r·cos ω)·sin θ,  (25)
z(θ, ω) = r·sin ω.

As we can see in (25), the periods of θ and ω are both 2π; hence the following points on 𝕋^2 are equivalent:

(θ, ω) ⇔ (θ + 2kπ, ω + 2k′π),  k, k′ ∈ ℤ.

However, in the Möbius equation (17), the periods of θ and ω are 4π and 2π, respectively. In particular, the following points can be viewed as equivalent on the Möbius ring:

(θ, ω) ⇔ (θ + 4kπ, ω + 2k′π) ⇔ (θ + (4k + 2)π, ω + (2k′ + 1)π),  k, k′ ∈ ℤ,

which can be verified in (17).
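Both equivalences can indeed be confirmed numerically from (17); the following sketch (with arbitrary R, r of our choosing) checks that shifting (θ, ω) by (2π, π) or by (4π, 2π) returns the same surface point:

```python
# Numeric check of the equivalent coordinates on the Möbius surface (17).
from math import sin, cos, pi, isclose

def mobius_point(theta, omega, R=2.0, r=1.0):
    c = R + r * cos(theta / 2 + omega)
    return (c * cos(theta), c * sin(theta), r * sin(theta / 2 + omega))

a = mobius_point(1.234, 0.567)
b = mobius_point(1.234 + 2 * pi, 0.567 + pi)      # (4k+2)pi, (2k'+1)pi case
c = mobius_point(1.234 + 4 * pi, 0.567 + 2 * pi)  # full-period case
for q in (b, c):
    assert all(isclose(u, v, abs_tol=1e-9) for u, v in zip(a, q))
```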

As shown in Fig. 2 and Fig. 3, the parametric curves on the torus and the Möbius ring are also quite different: fixing ω in (25) generates a circle, but fixing ω in (17) generates a twisted cycle.

Figure 2: Parametric curve on the torus: ω = π/2 in (25).
Figure 3: Parametric curve on the Möbius ring: ω = 0 in (17).

2.3 Möbius Ring 𝕄^{q/p}

The simple Möbius ring 𝕄^2 can be extended to the more general case 𝕄^{q/p}. The Möbius ring 𝕄^{q/p} is a space constructed on [0, q) × [0, p), where p, q are co-prime positive integers; any x, y ∈ 𝕄^{q/p} satisfy:

x ⊕ y = (𝕞_q(x_1 + y_1), 𝕞_p(x_2 + y_2)),  (26)
dist(x, y) = min_{j=0,…,pq−1} { 𝕕_q(y_1 − x_1 + j/p) + 𝕕_p(y_2 − x_2 + j/q) }.  (27)

The reason why dist(x, y) is defined as in (27) parallels the two-case derivation of (19) and is omitted here. The distance dist(·,·) satisfies the properties (11) as well; in particular:

Proposition 3.

The distance defined in (27) for 𝕄^{q/p} satisfies

dist(x, y) ∈ [0, 1/(2p) + 1/(2q)],

where the upper bound is attained, e.g., when y − x = (1/(2p), 1/(2q)).

The proof of the above proposition is similar to that of Proposition 1 and hence omitted. The proof of the triangle inequality dist(x, z) ≤ dist(x, y) + dist(y, z) in (11) for 𝕄^{q/p} is given in Appendix 6.1.

In order to increase the embedding dimension for KGE, we define

𝕄_n^{q/p} = 𝕄^{q/p} ⊎ 𝕄^{q/p} ⊎ ⋯ ⊎ 𝕄^{q/p}  (n copies, joined by n − 1 ⊎ operations)  (28)

as the direct sum of n Möbius rings 𝕄^{q/p}. For any u = (u_1, u_2, ⋯, u_n), v = (v_1, v_2, ⋯, v_n) ∈ 𝕄_n^{q/p}, we have u_i, v_i ∈ 𝕄^{q/p} for each 1 ≤ i ≤ n and

u ⊕ v = (u_1 ⊕ v_1, ⋯, u_n ⊕ v_n),  (29)
dist(u, v) = ‖ϝ‖_τ,  (30)

where ϝ = (dist(u_1, v_1), ⋯, dist(u_n, v_n)) and ‖·‖_τ is some vector norm.

For embedding on the Möbius ring, the scoring function f(h, r, t) in (1) is set as

f(h, r, t) = dist(h ⊕ r, t),  h, r, t ∈ 𝕄_n^{q/p},

where the operation ⊕ is defined in (29) and the distance function in (30); this scoring function is listed in Tab. 1.
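Putting the pieces together, the MöbiusE scoring function can be sketched as follows (L1 norm, helper names our own):

```python
# Scoring function f(h, r, t) = dist(h (+) r, t) on the direct sum of n
# copies of M^{q/p}, Eqs. (29)-(30), aggregated with the L1 norm.

def d(u, k):
    r = u % k
    return min(r, k - r)

def pair_dist(x, y, q, p):
    return min(d(y[0] - x[0] + j / p, q) + d(y[1] - x[1] + j / q, p)
               for j in range(p * q))

def score(h, r, t, q, p):
    """h, r, t: lists of (x1, x2) pairs, one pair per copy of M^{q/p}."""
    hr = [((a + c) % q, (b + e) % p) for (a, b), (c, e) in zip(h, r)]
    return sum(pair_dist(u, v, q, p) for u, v in zip(hr, t))

# A triple satisfying t = h (+) r exactly scores zero.
h = [(1.7, 0.3), (0.2, 0.8)]
r = [(0.6, 0.9), (1.9, 0.4)]
t = [((1.7 + 0.6) % 2, (0.3 + 0.9) % 1), ((0.2 + 1.9) % 2, (0.8 + 0.4) % 1)]
assert score(h, r, t, q=2, p=1) < 1e-12
```

As in TorusE, the modulus in the addition keeps all coordinates inside [0, q) × [0, p) automatically.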

In what follows, we list a key proposition for the Möbius ring 𝕄^{q/p}; its proof follows directly from definition (27) and is omitted here.

Proposition 4.

The solutions of the following equation

dist(x, 0) = 0  (31)

in the region [0, q) × [0, p) for 𝕄^{q/p} are x = (𝕞_q(j/p), 𝕞_p(j/q)) for j = 0, 1, …, pq − 1.

In what follows, we call the solutions of equation (31) the zero points of 𝕄^{q/p}.
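As a numeric sanity check (our own sketch), the points at zero distance from the origin can be enumerated by a single index j coupled across the two coordinates:

```python
# Zero points of M^{q/p}: x = (m_q(j/p), m_p(j/q)) for j = 0, ..., pq-1
# all coincide with the origin under the distance of Eq. (27).

def d(u, k):
    r = u % k
    return min(r, k - r)

def mobius_dist(x, y, q, p):
    return min(d(y[0] - x[0] + j / p, q) + d(y[1] - x[1] + j / q, p)
               for j in range(p * q))

q, p = 3, 2
zeros = [((j / p) % q, (j / q) % p) for j in range(p * q)]
assert len(set(zeros)) == p * q              # pq distinct zero points
for z in zeros:
    assert mobius_dist(z, (0.0, 0.0), q, p) < 1e-12
```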

For any given KG described by a set of triplets Δ, one of the difficulties for KGE comes from the cycle structures in the KG data. A cycle structure in a KG means the existence of entities h_i and relations r_i (0 ≤ i ≤ m) such that

(h_i, r_i, h_{i+1}) ∈ Δ or (h_{i+1}, r_i, h_i) ∈ Δ for 0 ≤ i ≤ m − 1,

and (h_m, r_m, h_0) ∈ Δ or (h_0, r_m, h_m) ∈ Δ. In the case that Δ contains no cycle structure, the geometric structure of Δ is a tree, and the embedding task for such a KG, ignoring negative samples, can be implemented via a simple iteration strategy. However, realistic KG data contains a great number of logic cycles, and an appropriate KGE strategy should choose an algebraic structure capable of fitting the constraints generated by this tremendous number of cycles.

Considering the properties in (11), the constraint of such a cycle, with the entities eliminated, can be described by an equation over relations on 𝕄^{q/p}:

dist(γ_0 r̂_0 + γ_1 r̂_1 + ⋯ + γ_s r̂_s, 0) = 0,  (32)

where γ_i ∈ ℤ and r̂_i ∈ {r_0, r_1, ⋯, r_m}.

In general, the number of cycles in a given KG dataset is much larger than the number of relations, so the set of equations (32), posed in Euclidean space, hardly ever admits an exact solution. Intuitively, since the number of zero points of 𝕄^{q/p} in the region [0, q) × [0, p) is pq, the set of equations (32) derived from all the cycles in the KG data has a better chance of admitting an exact solution. By contrast, the number of zero points in 𝕋^2 = [0, 1) × [0, 1) for TorusE is only 1, much smaller than that of 𝕄^{q/p}. In this sense, MöbiusE possesses more expressive power than TorusE.

3 Experiments

The performance of the proposed MöbiusE is tested on two datasets, FB15K and WN18 [5], which are extracted from realistic knowledge graphs. Basic statistics of the two datasets are given in Tab. 2.

We conduct the link prediction task using the same method reported in [5]. For each test triplet (a positive sample), we replace its head (or tail) to generate corrupted triplets; the score of each corrupted triplet is then calculated by the scoring function f(·), and the ranking of the test triplet is obtained according to these scores, i.e., rank_l in (5). Note from (4) that the set of generated corrupted triplets excludes the positive triplets in Δ; such a set is called "filtered" [10], and all ranking values in our experiments are obtained on the filtered set.

We choose the stochastic gradient descent algorithm to optimize the objective (1). To generate the negative samples in (1), we employ the "Bern" method [7] for negative sampling, because the datasets contain only positive triplets.

To evaluate our model, we use Mean Rank (MR), Mean Reciprocal Rank (MRR), and HIT@m as evaluation indicators [5]. MR is calculated by mean_{l∈Δ}(rank_l), where Δ is the set of test triplets and rank_l is defined in (5); MRR is calculated by mean_{l∈Δ}(1/rank_l); HIT@m is defined by |{l ∈ Δ : rank_l ≤ m}| / |Δ|.
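The three indicators can be sketched as follows (our own helper, assuming `ranks` collects rank_l over the test set):

```python
# MR, MRR and HIT@m from a list of filtered ranks.

def metrics(ranks, m=10):
    n = len(ranks)
    mr = sum(ranks) / n                           # Mean Rank
    mrr = sum(1.0 / rk for rk in ranks) / n       # Mean Reciprocal Rank
    hit = sum(1 for rk in ranks if rk <= m) / n   # HIT@m
    return mr, mrr, hit

mr, mrr, hit10 = metrics([1, 2, 5, 20])
assert mr == 7.0
assert abs(mrr - 0.4375) < 1e-12   # (1 + 1/2 + 1/5 + 1/20) / 4
assert hit10 == 0.75               # ranks 1, 2, 5 are within the top 10
```

Lower is better for MR; higher is better for MRR and HIT@m.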

We conduct a grid search to find optimal hyper-parameters for each dataset: the search range for the margin γ in (1) is {2000, 1000, 500, 200, 100}, and the search range for the learning rate α in stochastic gradient descent is {0.002, 0.001, 0.0005, 0.0002, 0.0001}. Finally, the learning rate α is set to 0.0005 for both WN18 and FB15K, and the margin γ is set to 2,000 for WN18 and 500 for FB15K.

We conduct our experiments on the augmented Möbius ring 𝕄_n^{q/p} defined in (28), and we choose three different types of Möbius ring for embedding: 𝕄_n^{2/1}, 𝕄_n^{3/1}, and 𝕄_n^{3/2}. For the distance function on 𝕄_n^{q/p}, we choose ‖·‖_τ = ‖·‖_1 in dist(·) in (30). The corresponding models are denoted MöbiusE(2,1), MöbiusE(3,1), and MöbiusE(3,2) in Tab. 3 and Tab. 4. The value of n in 𝕄_n^{q/p} is set to 5,000, which means each embedding vector has 10,000 parameters to be trained. The dimensions of the other models in Tab. 3 and Tab. 4 are also set to 10,000.

As shown in Tab. 3, MöbiusE(3,1) outperforms the other models in 4 of the 5 key indicators. In Tab. 4, MöbiusE(2,1) likewise outperforms the other models in 4 of the 5 key indicators.

Dataset #Ent #Rel #Train #Valid #Test
WN18 40,943 18 141,442 5,000 5,000
FB15K 14,951 1,345 483,142 50,000 59,071
Table 2: Statistics of tested datasets.
Model MRR MR HIT@10 HIT@3 HIT@1
TransE 0.516 73.21 0.847 0.738 0.269
TransH 0.559 82.48 0.795 0.680 0.404
RESCAL 0.354 - 0.587 0.409 0.235
DistMult 0.690 151.40 0.818 0.749 0.609
ComplEx 0.673 210.17 0.796 0.713 0.606
TorusE 0.746 143.93 0.839 0.784 0.689
MöbiusE(2,1) 0.782 118.53 0.862 0.817 0.731
MöbiusE(3,1) 0.783 114.50 0.863 0.817 0.734
MöbiusE(3,2) 0.767 73.87 0.863 0.816 0.702
Table 3: Performance of MöbiusE in FB15K
Model MRR MR HIT@10 HIT@3 HIT@1
TransE 0.414 - 0.688 0.534 0.247
TransR 0.218 - 0.582 0.404 0.218
RESCAL 0.890 - 0.928 0.904 0.842
DistMult 0.797 655 0.946 - -
ComplEx 0.941 - 0.947 0.945 0.936
TorusE 0.947 577.62 0.954 0.950 0.943
MöbiusE(2,1) 0.947 539.80 0.954 0.950 0.944
MöbiusE(3,1) 0.947 531.80 0.954 0.950 0.943
MöbiusE(3,2) 0.944 425.51 0.954 0.949 0.937
Table 4: Performance of MöbiusE in WN18

4 Related Works

The key to different types of KGE strategies is the choice of embedding space, and in turn of addition operation and scoring function, i.e., of f(·) in (1). As the first work in KGE, TransE [5] uses basic algebraic addition to generate the scoring function and regularizes the obtained vectors via a normalization condition. In TransE, the left (right) entity and the relation uniquely define the right (left) entity. However, a realistic KG may have the following properties: a) ONE-TO-MANY mapping: given any fixed relation r, the left (right) entity corresponding to the right (left) entity is not unique; b) MANY-TO-ONE mapping: given any fixed entities h and t, the relation between h and t may not be unique. The original TransE cannot resolve these two drawbacks.

As an improvement of TransE to overcome drawback a), TransR [6] maps each relation r to a vector and a matrix, and the corresponding scoring function is derived from the image under the relation matrix. In this way, different entities may be mapped to the same vector in the image space, and drawback a) of TransE is resolved. Similar to TransR, TransH [7] uses a single vector to obtain the image of each entity. As a further generalization of TransR and TransH, TransD [15] uses a dynamic mapping matrix determined by both the relation and the entities. As another solution to the drawbacks of TransE, TransM [16] introduces a relation-based adjusting factor into the scoring function of TransE; this factor is much smaller in the MANY-TO-MANY case than in the ONE-TO-ONE case, so a penalizing effect takes hold.

Introducing flexibility into the scoring function to some extent enhances the generalization capability of KGE. As an implementation of this idea, TransF [17] uses a flexible scoring function in which the matching degree is described by the inner product of the desired entities and the given relation. Increasing the complexity of the scoring function enhances the nonlinear representational capability of KGE: RESCAL [14] and DistMult [8] use a matrix-induced inner product as the scoring function, and ComplEx [9] applies a similar idea but embeds both relations and entities into a complex-valued space. TransA [11] combines the ideas of TransE and RESCAL, incorporating the residual vector of TransE into the matrix-based norm of RESCAL; such a strategy can adaptively adjust the loss function according to the structure of the knowledge graph.

MöbiusE is inspired by TorusE [10]; however, MöbiusE differs from TorusE in the following aspects: the minimal dimension of a Torus ring is 1, whereas that of a Möbius ring is 2; the addition on a Torus ring uses a single value for the modulus operation (generally 1), whereas the addition on a Möbius ring uses different values (generally p and q in 𝕄^{q/p}); and the distance function on a Möbius ring is strongly nonlinear (see (30)), which is much more complicated than that of a Torus ring. These major differences guarantee the strong expressiveness of MöbiusE.

5 Conclusions

A novel KGE strategy has been proposed by taking advantage of the intertwined rotating property of the Möbius ring. As a first step, we defined the basic addition operation on the Möbius ring and discussed its properties. Next, in order to obtain an appropriate scoring function, we built a distance function on the Möbius ring and constructed the corresponding distance-induced scoring function. Finally, a complete KGE strategy was obtained from these constructions, and it outperforms several classic KGE strategies in our experiments.

6 Appendix

6.1 Proof of dist(x,z) ≤ dist(x,y) + dist(y,z) in 𝕄^{q/p}

According to the definition of dist(·,·) on 𝕄^{q/p} and its translation invariance, we only need to prove dist(x+y, 0) ≤ dist(x, 0) + dist(y, 0) (apply this with x replaced by y − x and y replaced by z − y).

Note that for any i_1, i_2, i_1′, i_2′, j, j′ ∈ ℤ, we have

\[
\begin{aligned}
&\mathbbm{d}_{q}\Big(y_{1}+x_{1}+\tfrac{1}{p}j\Big)+\mathbbm{d}_{p}\Big(y_{2}+x_{2}+\tfrac{1}{q}j\Big)\\
\leq{}&\Big|y_{1}+x_{1}+\tfrac{1}{p}j+i_{1}q\Big|+\Big|y_{2}+x_{2}+\tfrac{1}{q}j+i_{2}p\Big|\\
\leq{}&\Big|x_{1}+\tfrac{1}{p}j+\tfrac{1}{p}j^{\prime}+i_{1}q+i_{1}^{\prime}q\Big|+\Big|x_{2}+\tfrac{1}{q}j+\tfrac{1}{q}j^{\prime}+i_{2}p+i_{2}^{\prime}p\Big|\\
&+\Big|y_{1}-\tfrac{1}{p}j^{\prime}-i_{1}^{\prime}q\Big|+\Big|y_{2}-\tfrac{1}{q}j^{\prime}-i_{2}^{\prime}p\Big|,
\end{aligned}
\]

based on which and the arbitrariness of i_1 and i_2, we obtain

\[
\begin{aligned}
{\rm dist}(x+y,0)={}&\min_{j=0}^{pq-1}\Big\{\mathbbm{d}_{q}\Big(y_{1}+x_{1}+\tfrac{1}{p}j\Big)+\mathbbm{d}_{p}\Big(y_{2}+x_{2}+\tfrac{1}{q}j\Big)\Big\}\\
\leq{}&\min_{j=0}^{pq-1}\Big(\min_{i_{1}}\Big|x_{1}+\tfrac{1}{p}j+\tfrac{1}{p}j^{\prime}+i_{1}q+i_{1}^{\prime}q\Big|+\min_{i_{2}}\Big|x_{2}+\tfrac{1}{q}j+\tfrac{1}{q}j^{\prime}+i_{2}p+i_{2}^{\prime}p\Big|\Big)\\
&+\Big|y_{1}-\tfrac{1}{p}j^{\prime}-i_{1}^{\prime}q\Big|+\Big|y_{2}-\tfrac{1}{q}j^{\prime}-i_{2}^{\prime}p\Big|\\
={}&\min_{j=0}^{pq-1}\Big(\mathbbm{d}_{q}\Big(x_{1}+\tfrac{1}{p}(j+j^{\prime})\Big)+\mathbbm{d}_{p}\Big(x_{2}+\tfrac{1}{q}(j+j^{\prime})\Big)\Big)+\Big|y_{1}-\tfrac{1}{p}j^{\prime}-i_{1}^{\prime}q\Big|+\Big|y_{2}-\tfrac{1}{q}j^{\prime}-i_{2}^{\prime}p\Big|\\
={}&{\rm dist}(x,0)+\Big|y_{1}-\tfrac{1}{p}j^{\prime}-i_{1}^{\prime}q\Big|+\Big|y_{2}-\tfrac{1}{q}j^{\prime}-i_{2}^{\prime}p\Big|.
\end{aligned}
\]

Due to the arbitrariness of j′, i_1′, and i_2′, it follows that dist(x+y, 0) ≤ dist(x, 0) + dist(y, 0).
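The inequality just proved can be checked numerically. The sketch below implements the distance on 𝕄^{q/p} as it appears in this proof, with 𝕕_m(t) the distance from t to the nearest integer multiple of m and a minimum over the pq branches j; the function names are ours, and the construction is inferred from the formulas in this appendix rather than taken verbatim from definition (30).

```python
import random

def d_mod(t, m):
    """Distance from t to the nearest integer multiple of m."""
    r = t % m
    return min(r, m - r)

def mobius_dist(x, y, p=1, q=2):
    """Distance on M^{q/p}, minimised over the pq modulus branches j."""
    return min(
        d_mod(y[0] - x[0] + j / p, q) + d_mod(y[1] - x[1] + j / q, p)
        for j in range(p * q)
    )

# Randomised check of dist(x, z) <= dist(x, y) + dist(y, z).
random.seed(0)
for _ in range(1000):
    x, y, z = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(3)]
    assert mobius_dist(x, z) <= mobius_dist(x, y) + mobius_dist(y, z) + 1e-12
```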

6.2 Proof of Proposition 1

We define a function g(α, β) = min{g_1, g_2} with g_1 = 𝕕_2(α) + 𝕕_1(β) and g_2 = 𝕕_2(α+1) + 𝕕_1(β+1/2), from which we know g has period 2 in α and period 1 in β. Next, we divide one period [−1, 1) of α into 4 subintervals I_{α,i} = [−1+0.5i, −0.5+0.5i), and divide one period [−0.5, 0.5) of β into 4 subintervals I_{β,j} = [−0.5+0.25j, −0.25+0.25j), with i, j = 0, 1, 2, 3; then sup_{(α,β)∈I_{α,i}×I_{β,j}} g can be calculated on each rectangle, and all these 16 values equal 3/4.
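As a numerical sanity check (ours, not part of the paper), a grid search over one period of g recovers the claimed supremum 3/4:

```python
def d_mod(t, m):
    """Distance from t to the nearest integer multiple of m."""
    r = t % m
    return min(r, m - r)

def g(alpha, beta):
    """g = min{g1, g2} as defined in the proof of Proposition 1."""
    g1 = d_mod(alpha, 2) + d_mod(beta, 1)
    g2 = d_mod(alpha + 1, 2) + d_mod(beta + 0.5, 1)
    return min(g1, g2)

# Grid over one full period: alpha in [-1, 1), beta in [-0.5, 0.5).
best = max(
    g(i / 200 - 1, j / 200 - 0.5)
    for i in range(400)
    for j in range(200)
)
print(round(best, 9))  # 0.75
```

The maximum is attained, e.g., at (α, β) = (0.5, 0.25), where both branches of g equal 3/4.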

6.3 Proof of Proposition 2

Let x = (0, x_2) and y = (0, y_2); then dist(x, y) = min(𝕕_1(y_2 − x_2), 1 + 𝕕_1(y_2 − x_2 + 1/2)) = 𝕕_1(y_2 − x_2), since the second branch is at least 1 while the first is at most 1/2. Hence 𝕄^2 can be viewed as 𝕋^1 by constraining the first dimension to zero.

References

  • [1] J. Lu, J. Xuan, G. Zhang, X. Luo, Structural property-aware multilayer network embedding for latent factor analysis, Pattern Recognition 76 (2018) 228 – 241.
  • [2] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: a nucleus for a web of open data, in: Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007, pp. 722–735.
  • [3] F. M. Suchanek, G. Kasneci, G. Weikum, YAGO: a core of semantic knowledge, in: Proceedings of the 16th International Conference on World Wide Web, 2007, pp. 697–706.
  • [4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created graph database for structuring human knowledge, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008, pp. 1247–1250.
  • [5] A. Bordes, N. Usunier, A. García-Durán, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
  • [6] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015, pp. 2181–2187.
  • [7] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2014, pp. 1112–1119.
  • [8] B. Yang, W. T. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: Proceedings of the 3rd International Conference on Learning Representations, 2015.
  • [9] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 2071–2080.
  • [10] T. Ebisu, R. Ichise, Toruse: Knowledge graph embedding on a lie group, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 1819–1826.
  • [11] Y. Jia, Y. Wang, X. Jin, H. Lin, X. Cheng, Knowledge graph embedding: a locally and temporally adaptive translation-based approach, ACM Transactions on the Web (2018).
  • [12] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 1811–1818.
  • [13] S. Zhang, Y. Tay, L. Yao, Q. Liu, Quaternion knowledge graph embeddings, in: Advances in Neural Information Processing Systems, 2019, pp. 2731–2741.
  • [14] M. Nickel, V. Tresp, H. P. Kriegel, A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 809–816.
  • [15] G. Ji, S. He, L. Xu, K. Liu, J. Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015, pp. 687–696.
  • [16] M. Fan, Q. Zhou, E. Chang, T. F. Zheng, Transition-based knowledge graph embedding with relational mapping properties, in: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, 2014, pp. 328–337.
  • [17] J. Feng, M. Huang, M. Wang, M. Zhou, Y. Hao, X. Zhu, Knowledge graph embedding by flexible translation, in: Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning, 2016, pp. 557–560.