
Dual-Geometric Space Embedding Model for Two-View Knowledge Graphs

Roshni G. Iyer ([email protected]), Yunsheng Bai ([email protected]), Wei Wang ([email protected]), and Yizhou Sun ([email protected])
University of California, Los Angeles, Los Angeles, CA, USA
(2022)
Abstract.

Two-view knowledge graphs (KGs) jointly represent two components: an ontology view for abstract and commonsense concepts, and an instance view for specific entities that are instantiated from ontological concepts. As such, these KGs contain heterogeneous structures that are hierarchical, from the ontology-view, and cyclical, from the instance-view. Despite these various structures in KGs, most recent works on embedding KGs assume that the entire KG belongs to only one of the two views but not both simultaneously. For works that seek to put both views of the KG together, the instance and ontology views are assumed to belong to the same geometric space, such as all nodes embedded in the same Euclidean space or non-Euclidean product space, an assumption no longer reasonable for two-view KGs where different portions of the graph exhibit different structures. To address this issue, we define and construct a dual-geometric space embedding model (DGS) that models two-view KGs using a complex non-Euclidean geometric space, by embedding different portions of the KG in different geometric spaces. DGS utilizes the spherical space, hyperbolic space, and their intersecting space in a unified framework for learning embeddings. Furthermore, for the spherical space, we propose novel closed spherical space operators that directly operate in the spherical space without the need for mapping to an approximate tangent space. Experiments on public datasets show that DGS significantly outperforms previous state-of-the-art baseline models on KG completion tasks, demonstrating its ability to better model heterogeneous structures in KGs.

Knowledge Graph Embeddings; Non-Euclidean Geometry
journalyear: 2022; copyright: rights retained
conference: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14–18, 2022, Washington, DC, USA
isbn: 978-1-4503-9385-0/22/08; doi: 10.1145/3534678.3539350
ccs: Computing methodologies → Knowledge representation and reasoning; Computing methodologies → non-Euclidean geometric space embedding

1. Introduction

Knowledge graphs (KGs) are essential data structures that have been shown to improve several semantic applications, including semantic search (Berant and Liang, 2014), question answering (Huang et al., 2019a; Iyer et al., 2022), and recommender systems (Wang et al., 2019). Two-view knowledge graphs (Huang et al., 2019b) consist of both (1) an instance-view component, containing entity-entity relations, and (2) an ontology-view component, containing concept-concept and entity-concept relations. In addition, there are nodes involved in both entity-concept relations and entity-entity relations, which we refer to as bridge nodes, e.g., nodes 5 and 6 in Figure 1(a). Bridge nodes provide the connections to bridge both KG views. The instance-view component contains data that form cyclical structures, such as nodes 8-10 in Figure 1(a). The ontology-view component contains data that form hierarchical structures. As such, the two-view KG contains intrinsic heterogeneous structures.

Figure 1. Two-View KG visualization and embedding space. (a) Example two-view KG. (b) Corresponding DGS representation.

A key challenge with KGs is that they are often highly incomplete, making KG completion an important area of investigation. In recent years, several research works have focused on improving knowledge graph embeddings (KGE) to address this task (Bordes et al., 2013; Yang et al., 2015). The idea behind embedding methods is to map entities and relations to a latent low-dimensional vector space while preserving the semantics and inherent structures in the KG. However, most embedding methods, including graph neural network (GNN)-based (Chami et al., 2020; Sun et al., 2020; Chami et al., 2019; Iyer et al., 2021) and non-GNN-based methods (Kolyvakis et al., 2019; Trouillon et al., 2016), are limited to operating on KGs belonging to either the instance-view or the ontology-view but not both. Specifically, they either model entities and relations in the zero-curvature Euclidean space (Sun et al., 2019), omitting both cyclic and hierarchical structures, or in the hyperbolic space (Bai et al., 2021; Zhang et al., 2020), omitting cyclic structures and treating relations as hierarchical.

For works that seek to put both views of the KG together, the instance and ontology views are assumed to belong to the same space, an assumption no longer reasonable for two-view KGs. Specifically, all KG relations are modeled in the Euclidean space (Hao et al., 2019), or are all modeled in a product space, namely the Riemannian product manifold of the Euclidean, spherical, and hyperbolic spaces (Wang et al., 2021). In contrast, our goal in modeling two-view KGs is to use a unified framework that embeds hierarchical structures of the KG in the hyperbolic space, cyclic structures of the KG in the spherical space, and nodes involved in both structures in a unified intersection space.

To address the above challenges, in this paper, we present the Dual-Geometric Space embedding model (DGS) for two-view KGs. To summarize, our work makes the following contributions:

  • We formulate the problem of modeling a two-view KG in a complex non-Euclidean geometric space for KG completion. To our knowledge, we are the first to model two-view KGs in distinct non-Euclidean spaces using a unifying framework, i.e., different views belong to different geometric spaces.

  • We propose to model instance-view entities in the spherical space for solely cyclical relations, ontology-view concepts in the hyperbolic space for solely hierarchical relations, and bridge entities involved in both cyclic and hierarchical structures in a specially designed intersection bridge space.

  • To the best of our knowledge, we are also the first to design closed spherical space operations that directly operate in the spherical space without mapping to an external approximation space, e.g., the tangent Euclidean space.

  • We investigate seven variant and ablation models of DGS and evaluate these models on two KG completion tasks. Extensive experiments demonstrate the effectiveness of DGS in populating knowledge in two-view KGs. Further, our model significantly outperforms its single non-Euclidean and Euclidean geometric space counterparts including the product space, and existing state-of-the-art graph neural network (GNN) embedding methods.

2. Preliminary and Related Work

Here, we formalize two-view KGs, which jointly model the entities and concepts in the instance and ontology views, and discuss the primary state-of-the-art models for two-view KGs and their limitations.

2.1. Problem Formulation

A two-view KG, $G$, consists of nodes and edges, such that nodes denote the set of entities, $E$, or the set of concepts, $C$. Edges denote relations, where $R_I$ is the set of entity-entity relations, $R_{\mathcal{O}}$ is the set of concept-concept relations, and $R_{I\mathcal{O}}$ is the set of entity-concept relations. The two-view KG consists of an instance-view component containing entities with relations of $R_I$, and an ontology-view component containing concepts and bridge entities with relations of $R_{\mathcal{O}}$ and $R_{I\mathcal{O}}$. We denote a subset of entities in $E$ to be bridge entities, which communicate between both the instance and ontology views. These bridge entities are associated with both $R_I$ and $R_{I\mathcal{O}}$ relations. Relations in $R_{I\mathcal{O}}$ are entity-concept relations, which are similar to concept-concept relations such as "is_a". $E$ and $C$, as well as $R_I$, $R_{\mathcal{O}}$, and $R_{I\mathcal{O}}$, are each disjoint sets. We denote $e_i \in E$ to be the $i$-th entity, $\bm{h}_{e_i}$ to be entity $i$'s representation in the Cartesian coordinate system, and $r_{Ik} \in R_I$ to be the $k$-th relation in the instance-view component. Without loss of generality (WLOG), we also model the ontology-view component, where $c_i \in C$ is the $i$-th concept, $\bm{h}_{c_i}$ is concept $i$'s representation in the Cartesian coordinate system, and $r_{\mathcal{O}k} \in R_{\mathcal{O}}$ is the $k$-th relation in the ontology-view component. Therefore, the instance view can be considered as a set of triples of the form $(e_i, r_{Ik}, e_j) \in E \times R_I \times E$, which may be any real-world instance associations and are often cyclical in structure. Likewise, the ontology view can be represented by a set of triples $(c_i, r_{\mathcal{O}k}, c_j) \in C \times R_{\mathcal{O}} \times C$, which contain hierarchical associations such as is_a. Here we use Figure 1(a) to illustrate the aforementioned problem formulation. For example, (Soil, composed_of, Mineral) and (Fungus, is_a, Eukaryote) are instance-view and ontology-view triples respectively. Further, Bacteria-Soil-Mineral is an example of a cycle between non-bridge entities.

A bridge entity may be involved in two types of triples: $(e_i, r_{I\mathcal{O}k}, c_j) \in E \times R_{I\mathcal{O}} \times C$ for (hierarchical) entity-concept relations and $(e_i, r_{Ik}, e_j) \in E \times R_I \times E$ for (cyclical) entity-entity relations. For example, Rabbit is a bridge entity in Figure 1(a). The triple (Rabbit, emit, Carbon Dioxide) represents an entity-entity relation, and the triple (Rabbit, is_a, Herbivore) represents an entity-concept relation.
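For concreteness, a toy encoding of the triples above, using the entity and concept names from Figure 1(a), is sketched below; the Python representation is purely illustrative and not a dataset format.

```python
# Toy encoding of the two-view KG of Figure 1(a) as (head, relation, tail)
# triples, matching the formulation above (illustrative only).
instance_triples = [            # (e_i, r_Ik, e_j) in E x R_I x E
    ("Soil", "composed_of", "Mineral"),
    ("Rabbit", "emit", "Carbon Dioxide"),   # Rabbit is a bridge entity
]
ontology_triples = [            # (c_i, r_Ok, c_j) in C x R_O x C
    ("Fungus", "is_a", "Eukaryote"),
]
entity_concept_triples = [      # (e_i, r_IOk, c_j) in E x R_IO x C
    ("Rabbit", "is_a", "Herbivore"),
]
```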

The objective of our research is to learn KG embeddings of nodes and relations in the KG, such that we seamlessly unify multiple curvature geometric spaces to better capture the contextual information and heterogeneous structures in the KGs. We evaluate the quality of the learned embeddings on the KG tasks of triple completion and entity typing, described in Section 4.

2.2. Non-Euclidean Geometric Spaces

In this section, we describe the various properties of non-Euclidean geometric spaces, which are curved spaces unlike the zero-curvature Euclidean space. The textbook (Petersen et al., 2006) provides more details. The Euclidean ($\mathbb{E}^d$), spherical ($\mathbb{S}^d$), and hyperbolic ($\mathbb{H}^d$) spaces are Riemannian manifolds ($\bm{M}^d$), such that each point $\bm{a} \in \bm{M}^d$ has a corresponding tangent space, $(T_{\bm{a}}\bm{M}^d)^d$, that approximates $\bm{M}^d$ around $\bm{a}$. Further, each Riemannian manifold $\bm{M}^d$ is associated with a Riemannian metric $dist$ that defines the geodesic distance between two points on the manifold, and a curvature $K$ of the space. In the spherical space, the curvature $K_S > 0$, suitable for capturing cyclical structures (Zhang et al., 2021), while in the hyperbolic space, the curvature $K_H < 0$, suitable for capturing hierarchical structures (Nickel and Kiela, 2017). Widely used models of the hyperbolic space include the Poincaré ball model (Cannon et al., 1997), the Lorentz model (Beltrami, 1897), and the Klein model (Beltrami, 1897). As these three models are both isometric and isomorphic to one another, WLOG we utilize the Poincaré ball model in our work.

Non-Euclidean Space Optimization

DGS utilizes Riemannian optimization for updating entity and relational embeddings because Euclidean optimization methods, such as SGD, provide update directions in the zero-curvature Euclidean space, which does not align with parameters in our model that must be updated in positive- or negative-curvature spaces. For parameter learning, DGS uses RSGD (Bonnabel, 2013), whose update function is given below, where $(T_{\bm{a}}M^d)^d$ denotes the tangent Euclidean space of $\bm{a} \in M^d$, $\nabla_R \in (T_{\bm{a}}M^d)^d$ denotes the Riemannian gradient of the loss function $L(\bm{a})$, $\mathcal{R}_{\bm{a}_t}$ denotes the retraction onto $M^d$, the non-Euclidean space, at $\bm{a}$, and $\eta_t$ denotes the learning rate at time $t$:

$\bm{a}_{t+1} = \mathcal{R}_{\bm{a}_t}\big(-\eta_t \nabla_R L(\bm{a}_t)\big)$

The retraction operator $\mathcal{R}(\cdot)$ involves mapping between spaces. For non-Euclidean spaces, the retraction is generally performed between the non-Euclidean space and the approximate tangent Euclidean space using logarithmic and exponential mapping functions as follows, where $\mathrm{log}_0(\bm{a})$ is a logarithmic map at center $\bm{0}$ from the hyperbolic space to the Euclidean tangent space, and $\mathrm{exp}_0(\bm{a})$ is an exponential map at center $\bm{0}$ from the Euclidean tangent space back to the hyperbolic space.

(1) $\mathrm{log}_0(\bm{a}) = \mathrm{tanh}^{-1}(i\cdot\lVert\bm{a}\rVert)\,\frac{\bm{a}}{i\cdot\lVert\bm{a}\rVert}$
(2) $\mathrm{exp}_0(\bm{a}) = \mathrm{tanh}(i\cdot\lVert\bm{a}\rVert)\,\frac{\bm{a}}{i\cdot\lVert\bm{a}\rVert}$
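To make the retraction concrete, below is a minimal NumPy sketch (illustrative, not the authors' code) of the origin-centered maps for the curvature $K_H = -1$ Poincaré ball, together with an RSGD step in the style used in Section 3.2: the Euclidean gradient is rescaled by the inverse metric and the stepped point is clipped back inside the unit ball as the retraction.

```python
import numpy as np

def log0(a, eps=1e-9):
    """Logarithmic map at the origin: Poincare ball -> tangent Euclidean space."""
    n = np.linalg.norm(a)
    return a.copy() if n < eps else np.arctanh(min(n, 1 - eps)) * a / n

def exp0(v, eps=1e-9):
    """Exponential map at the origin: tangent Euclidean space -> Poincare ball."""
    n = np.linalg.norm(v)
    return v.copy() if n < eps else np.tanh(n) * v / n

def poincare_rsgd_step(a, euclidean_grad, lr=1e-2, eps=1e-5):
    """One RSGD step: rescale the Euclidean gradient by the inverse metric
    ((1 - ||a||^2) / 2)^2, take the step, then retract into the open ball."""
    scale = ((1.0 - np.dot(a, a)) / 2.0) ** 2
    a_new = a - lr * scale * euclidean_grad
    n = np.linalg.norm(a_new)
    return a_new if n < 1.0 else a_new / n * (1.0 - eps)
```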

2.3. Two-View KG Models

In this section, we describe the models that have been utilized for two-view KGs, which consider the problem setting of modeling ontological and instance views. To address the limitations of these models, we propose DGS in Section 3.

JOIE

The JOIE (Hao et al., 2019) model is an embedding model that considers the problem of two-view KGs. However, JOIE models all triples in the same zero-curvature Euclidean space, omitting the hierarchical and cyclical structures of the KG. Further, there is no special consideration of representation for bridge nodes.

Models leveraging product of spaces

Models such as M2GNN (Wang et al., 2021), which extends (Gu et al., 2018) from social networks to the KG domain, and (Sun et al., 2022) help address the limitation of JOIE by utilizing the hyperbolic and spherical spaces for representing triples. However, they treat all triples in the KG as belonging to the same non-Euclidean product space, formed by a Riemannian product of the Euclidean, spherical, and hyperbolic spaces. Thus, the embedding space of hierarchical triples is not distinguished from the embedding space of cyclic triples, and there is again no special consideration of representation for bridge nodes.

3. DGS Architecture

This section describes our proposed dual-geometric space embedding model (DGS), which jointly embeds entities and concepts in a knowledge graph. DGS embeds different link types of the two-view KG in different non-Euclidean manifold spaces to capture the inherent heterogeneous structures of the KG, as described below. We also design general procedures, which are detailed in the Supplements, for efficiently converting arbitrary dd-dimensional embeddings between the polar and Cartesian coordinate representations, which are utilized by sub-models in DGS. Source code is available at https://github.com/roshnigiyer/dgs.

3.1. Modeling

The key questions in the modeling process are in how to: (1) design or adopt an appropriate embedding space for nodes, (2) define the representation of entity/concept and relation in that embedding space, and (3) define the KG embedding model and loss function of DGS. Our framework enables each of entities and concepts to be influenced by bridge nodes, and simultaneously bridge nodes to be informed by entities and concepts. In this way, DGS exhibits a unified framework to jointly model two-views of KGs.

DGS models nodes in the instance view as points in a spherical space $\mathbb{S}^d$ with fixed norm $w^S$, and nodes in the ontology view as points in a hyperbolic space $\mathbb{H}^d$ with a learnable norm $w_{\bm{h}_{c_i}^H}$ per concept. The bridge nodes lie in the intersection of the two, a submanifold intersection space called $\mathbb{B}^d$, shown as the dotted circle in Figure 1(b). $\mathbb{B}^d$ shares the same fixed norm $w^S$ as the spherical space. For modeling compatibility, we set the degrees of freedom of $\mathbb{S}^d$, $\mathbb{H}^d$, and $\mathbb{B}^d$ to $d-1$, $d-1$, and $d-2$ respectively. $\mathbb{B}^d$ has one degree of freedom fewer than $\mathbb{H}^d$ and $\mathbb{S}^d$ because it is a submanifold intersection space of both the spherical and hyperbolic spaces. The norm of $\mathbb{B}^d$ is $w^S$ because that is the intersection norm space of $\mathbb{S}^d$ and $\mathbb{H}^d$. The concept-specific norms $w_{\bm{h}_{c_i}^H}$ are learnable so that the hierarchy in the KG can be learned by the embedding. In practice, hierarchical root concepts move towards the center of the hyperbolic space, i.e., towards norm 0, as shown in Section 4.

Parameter optimization, detailed in Section 3.2, is performed using RSGD, described in Section 2.2, on hinge loss functions that utilize non-Euclidean geodesic distances in the spherical and hyperbolic spaces respectively. We construct the hinge loss functions such that positive triples are scored better than negative triples by at least a positive margin hyperparameter.

3.1.1. Modeling Instance View KG in Spherical Space

Representation of entities.

We propose to embed the entities from the instance view on the surface of the spherical ball from the spherical space, $\mathbb{S}^d$, in order to better capture the cyclic structures present in this KG view. Entities are represented as points $\bm{h}_{e_i}$ on the surface of the spherical ball in $\mathbb{S}^d$, as shown in Figure 1(b). Formally, $\mathbb{S}^d = \{\bm{h}_{e_i}^S \in \mathbb{R}^d \,\big|\, \lVert\bm{h}_{e_i}^S\rVert = w^S\}$, $w^S \in [0,1)$, is the $d$-dimensional $w^S$-norm ball, where $\lVert\cdot\rVert$ is the Euclidean norm.

For an entity $\bm{h}_{e_i}^S$, we propose to model the relation applied to the entity as a rotation operation, and therefore represent the relation $r_{Ik}$ as a vector of angles $\bm{\theta}_{r_{Ik}}^S \in [0,2\pi)^{d-1}$. Below, we present our proposed vector transformation procedure that models the relation as a rotation of the head entity, $f_{\mathrm{rot}}: \bm{h}_{e_i}^{S,new} = f_{\mathrm{rot}}(\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S)$, and prove that the operation is closed in the spherical space. $f_{\mathrm{rot}}(\cdot)$ eliminates the need to project representations to the tangent Euclidean space in order to perform standard transformations. To the best of our knowledge, we are the first to propose a closed vector transformation procedure that operates directly in the spherical space.

Spherical Space Vector Transformation Procedure: $f_{\mathrm{rot}}(\cdot)$.

This section describes our proposed vector transformation procedure, $f_{\mathrm{rot}}$, for the spherical space, which performs operations directly in the spherical space using its properties, such as positive curvature. This section also outlines the proof of the closedness property of the transformation procedure. For computational efficiency, we also extend the vector transformation procedure to a batch version, detailed in the Supplements. $f_{\mathrm{rot}}$ takes as input an entity embedding and a relation operator, which it uses to transform the entity embedding, $\bm{h}_{e_i}^{S,new} = f_{\mathrm{rot}}(\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S)$; a runnable sketch follows the three steps below.

  (1) Given $\bm{h}_{e_i}^S$ and $\bm{\theta}_{r_{Ik}}^S$, we convert $\bm{h}_{e_i}^S$ to the corresponding representation $\bm{\theta}_{e_i}^S$ in polar coordinates, with $\mathit{rad}$ denoting the radius, which has the value $w^S$. Refer to Section A.1 for details on the conversion procedure.

    (3) $\bm{h}_{e_i}^S: [x_{e_i,1}^S, x_{e_i,2}^S, \dots, x_{e_i,d}^S] \rightarrow \bm{\theta}_{e_i}^S: [\theta_{e_i,1}^S, \theta_{e_i,2}^S, \dots, \theta_{e_i,d-1}^S];\ \mathit{rad} = w^S$
    (4) $\bm{\theta}_{r_{Ik}}^S: [\theta_{r_{Ik},1}, \theta_{r_{Ik},2}, \dots, \theta_{r_{Ik},d-1}];\ \mathit{rad} = w^S$

  (2) Denote $\bm{z}[l]$ to be the $l$-th entry of $\bm{z}$ and apply the transformation:

    (5) $(\bm{\theta}_{e_i}^S + \bm{\theta}_{r_{Ik}}^S)[l] = (\bm{\theta}_{e_i}^S[l] + \bm{\theta}_{r_{Ik}}^S[l])\ \mathrm{mod}\ 2\pi;\ l \in [1, d-1]$
    (6) $\bm{\theta}_{e_i}^{S,new} = [(\theta_{e_i,1}^S + \theta_{r_{Ik},1}^S)\ \mathrm{mod}\ 2\pi, \dots, (\theta_{e_i,d-1}^S + \theta_{r_{Ik},d-1}^S)\ \mathrm{mod}\ 2\pi];\ \mathit{rad} = w^S$

  (3) Convert from polar coordinates back to Cartesian coordinates. Refer to Section A.2 for details on the conversion procedure.

    (7) $\bm{\theta}_{e_i}^{S,new} \rightarrow \bm{h}_{e_i}^{S,new}: [x_{e_i,1}^{S,new}, x_{e_i,2}^{S,new}, \dots, x_{e_i,d}^{S,new}]$
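Below is a runnable NumPy sketch of $f_{\mathrm{rot}}$ (not the authors' code). Standard hyperspherical-coordinate conversions stand in for the MCP/MPC procedures, which are detailed in the Supplements; the key property is that `polar_to_cartesian` returns a point of norm exactly $\mathit{rad}$ for any angle vector, which is what keeps the operation on the sphere.

```python
import numpy as np

def cartesian_to_polar(x):
    """Cartesian [x_1, ..., x_d] -> (d-1 angles in [0, 2*pi), radius)."""
    r = np.linalg.norm(x)
    d = len(x)
    theta = np.empty(d - 1)
    for k in range(d - 2):
        theta[k] = np.arctan2(np.linalg.norm(x[k + 1:]), x[k])
    theta[d - 2] = np.arctan2(x[-1], x[-2])
    return theta % (2 * np.pi), r

def polar_to_cartesian(theta, r):
    """(d-1 angles, radius) -> Cartesian point with norm exactly r."""
    d = len(theta) + 1
    x = np.full(d, float(r))
    for k in range(d - 1):
        x[k] *= np.cos(theta[k])
        x[k + 1:] *= np.sin(theta[k])
    return x

def f_rot(h_e, theta_rel):
    """Steps 1-3: convert to polar, add relation angles mod 2*pi, convert back."""
    theta_e, r = cartesian_to_polar(h_e)
    return polar_to_cartesian((theta_e + theta_rel) % (2 * np.pi), r)
```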
Theorem.

The vector transformation procedure, $f_{\mathrm{rot}}$, is closed in the spherical space.

Proof.

The proof is outlined by examining all three steps of $f_{\mathrm{rot}}$ from the transformation procedure, where $\bm{h}_{e_i}$ and $\bm{\theta}_{e_i}$ represent the same point $e_i$ in the embedding space under the isomorphic Cartesian and polar coordinate systems respectively.

  (1) The Cartesian coordinate representation is equivalent to the polar coordinate representation of the point $e_i$ under the MCP model, detailed in the Supplements. Further, the radius $\mathit{rad} = w^S$ of the polar representation embeddings lies in the spherical norm space $w^S$.

  (2) $\theta \in [0, 2\pi);\ \mathit{rad} = w^S$ defines polar representation embeddings in the spherical space, and $(\bm{\theta}_{e_i}^S + \bm{\theta}_{r_{Ik}}^S)[l] = (\bm{\theta}_{e_i}^S[l] + \bm{\theta}_{r_{Ik}}^S[l])\ \mathrm{mod}\ 2\pi \in [0, 2\pi)\ \forall l;\ \mathit{rad} = w^S$. Thus, both angular and radial coordinates are preserved.

  (3) The polar coordinate representation is equivalent to the Cartesian coordinate representation of the point $e_i$ under the MPC model, detailed in the Supplements.

Instance View Loss Function.

The instance-view model uses a hinge loss function that is minimized over all triples in the instance space $R_I$, with positive triples denoted $\mathrm{pt}_I = (\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S, \bm{h}_{e_j}^S)$ with corresponding score $\phi^S_{\mathrm{obs}}(\mathrm{pt}_I)$, negative triples denoted $\mathrm{nt}_I = (\bm{h}_{e_i}^{S'}, \bm{\theta}_{r_{Ik}}^S, \bm{h}_{e_j}^{S'})$ with corresponding score $\phi^S_{\mathrm{corr}}(\mathrm{nt}_I)$, and a positive margin hyperparameter $\gamma^{R_I} > 0$. Specifically, the instance loss function measures the distance between the predicted tail entity and the ground truth.

(8) $\phi^S_{\mathrm{obs}}(\mathrm{pt}_I) = \mathrm{dist}_S\big(f_{\mathrm{rot}}(\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S), \bm{h}_{e_j}^S\big)$
(9) $\phi^S_{\mathrm{corr}}(\mathrm{nt}_I) = \mathrm{dist}_S\big(f_{\mathrm{rot}}(\bm{h}_{e_i}^{S'}, \bm{\theta}_{r_{Ik}}^S), \bm{h}_{e_j}^{S'}\big)$
(10) $L_{\mathrm{inst}}^{R_I} = \frac{1}{|R_I|} \sum_{(\mathrm{pt}_I \in R_I) \land (\mathrm{nt}_I \notin R_I)} \mathrm{max}\big(0, \gamma^{R_I} + \phi^S_{\mathrm{obs}}(\mathrm{pt}_I) - \phi^S_{\mathrm{corr}}(\mathrm{nt}_I)\big)$

We calculate the spherical geodesic distance (Meng et al., 2019) between points $\bm{x}^S$ and $\bm{y}^S$ on the manifold as follows:

(11) $\mathrm{dist}_S(\bm{x}^S, \bm{y}^S) = \mathrm{arccos}\big((\bm{x}^S)^T \bm{y}^S\big)$

$R_I$ includes links between all entities, including both non-bridge entities and bridge entities. In this way, the representation of non-bridge entities is influenced by the bridge entities.
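Putting the pieces together, here is a sketch of the instance-view score and hinge loss, reusing `f_rot` from above; inputs to $\arccos$ are normalized so the distance is well defined for any norm $w^S$, and the one-to-one pairing of positive and negative triples is an illustrative assumption.

```python
import numpy as np

def dist_sphere(x, y):
    """Spherical geodesic distance of Eq. (11), with a normalized cosine."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def instance_loss(pos_triples, neg_triples, gamma=0.5):
    """Mean of max(0, gamma + phi_obs(pt) - phi_corr(nt)) as in Eq. (10)."""
    terms = []
    for (h, theta_r, t), (h_neg, _, t_neg) in zip(pos_triples, neg_triples):
        phi_obs = dist_sphere(f_rot(h, theta_r), t)            # Eq. (8)
        phi_corr = dist_sphere(f_rot(h_neg, theta_r), t_neg)   # Eq. (9)
        terms.append(max(0.0, gamma + phi_obs - phi_corr))
    return float(np.mean(terms))
```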

3.1.2. Modeling Ontological View KG in Hyperbolic Space

Representation of concepts.

We propose to embed concepts from the ontology view on the Poincaré disk from the hyperbolic space, $\mathbb{H}^d$, in order to better capture the hierarchical structures present in this KG view. Concepts are represented as points $\bm{h}_{c_i}$ inside the Poincaré disk in $\mathbb{H}^d$, as shown in Figure 1(b). Formally, $\mathbb{H}^d = \{\bm{h}_{c_i}^H \in \mathbb{R}^d \,\big|\, \lVert\bm{h}_{c_i}^H\rVert = w_{c_i}^H\}$, $w_{c_i}^H \in [0,1)$, is the $d$-dimensional $w_{c_i}^H$-norm ball, where $\lVert\cdot\rVert$ is the Euclidean norm. We assume the center of the disk is aligned with the center of the sphere, and for convenience set the last dimension $d$ to 0.

For concepts in the hyperbolic space $\mathbb{H}^d$, any hyperbolic space model for KGs can be applied in principle, which we denote as follows, with $r_{\mathcal{O}k} \in R_{\mathcal{O}}$ denoting a relation between two concepts:

(12) $f_{\mathrm{KGE}}(\mathrm{pt}_O) = f_{\mathrm{KGE}}(\bm{h}_{c_i}^H, r_{\mathcal{O}k}, \bm{h}_{c_j}^H)$

We illustrate MuRP (Balažević et al., 2019) as an example scoring function, which uses the hyperbolic geodesic distance and relies on Möbius addition to model the relation, where $\mathrm{exp}_0(\cdot)$ and $\mathrm{log}_0(\cdot)$ are defined in Section 2.2, $\bm{R} \in \mathbb{R}^{d \times d}$ is a learnable diagonal relation matrix representing the stretch transformation by relation $r_{\mathcal{O}k} \in R_{\mathcal{O}}$ with representation $\bm{h}_{r_{\mathcal{O}k}}^H$, and $b_{c_i}^H, b_{c_j}^H$ are scalar biases of concepts $c_i$ and $c_j$:

(13) $f_{\mathrm{MuRP}}(\mathrm{pt}_O) = f_{\mathrm{MuRP}}(\bm{h}_{c_i}^H, r_{\mathcal{O}k}, \bm{h}_{c_j}^H) = f_{\mathrm{MuRP}}(\bm{h}_{c_i}^H, \bm{h}_{r_{\mathcal{O}k}}^H, \bm{h}_{c_j}^H) = -\mathrm{dist}_H\big(\mathrm{exp}_0(\bm{R}\,\mathrm{log}_0(\bm{h}_{c_i}^H)), \bm{h}_{c_j}^H \oplus_H \bm{h}_{r_{\mathcal{O}k}}^H\big)^2 + b_{c_i}^H + b_{c_j}^H$
(14) $\bm{x}^H \oplus_H \bm{y}^H = \frac{(1 + 2\langle\bm{x}^H,\bm{y}^H\rangle + \lVert\bm{y}^H\rVert_2^2)\,\bm{x}^H + (1 - \lVert\bm{x}^H\rVert_2^2)\,\bm{y}^H}{1 + 2\langle\bm{x}^H,\bm{y}^H\rangle + \lVert\bm{x}^H\rVert_2^2 \lVert\bm{y}^H\rVert_2^2}$

We calculate the hyperbolic geodesic distance (Nickel and Kiela, 2017) between points $\bm{x}^H$ and $\bm{y}^H$ on the manifold as follows:

(15) $\mathrm{dist}_H(\bm{x}^H, \bm{y}^H) = \mathrm{arccosh}\Big(1 + 2\,\frac{\lVert\bm{x}^H - \bm{y}^H\rVert_2^2}{(1 - \lVert\bm{x}^H\rVert^2)(1 - \lVert\bm{y}^H\rVert^2)}\Big)$
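For reference, a NumPy sketch of Möbius addition (Eq. (14)), the hyperbolic geodesic distance (Eq. (15)), and the MuRP score (Eq. (13)), reusing the `exp0`/`log0` maps sketched in Section 2.2; `R_diag` holds the diagonal of the relation matrix $\bm{R}$.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball, Eq. (14)."""
    xy, nx2, ny2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + ny2) * x + (1 - nx2) * y) / (1 + 2 * xy + nx2 * ny2)

def dist_hyper(x, y):
    """Hyperbolic geodesic distance, Eq. (15)."""
    num = 2 * np.dot(x - y, x - y)
    den = (1 - np.dot(x, x)) * (1 - np.dot(y, y))
    return np.arccosh(1 + num / den)

def murp_score(h_ci, h_r, R_diag, h_cj, b_i, b_j):
    """Eq. (13): -dist_H(exp0(R log0(h_ci)), h_cj (+)_H h_r)^2 + b_i + b_j."""
    head = exp0(R_diag * log0(h_ci))  # diagonal stretch applied in tangent space
    tail = mobius_add(h_cj, h_r)
    return -dist_hyper(head, tail) ** 2 + b_i + b_j
```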
Ontological View Loss Function.

The ontology-view model uses a hinge loss function that is minimized over all links between concepts in the ontology space and all links between bridge nodes and concepts, i.e., $R_{\mathcal{O}} \cup R_{I\mathcal{O}}$, with positive triples denoted $\mathrm{pt}_{\mathcal{O}} = (\bm{h}_{c_i}^H, \bm{\theta}_{r_{\mathcal{O}k}}^H, \bm{h}_{c_j}^H) \cup (\bm{h}_{e_i}^H, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^H)$ with corresponding score $\phi^H_{\mathrm{obs}}(\mathrm{pt}_{\mathcal{O}})$, negative triples denoted $\mathrm{nt}_{\mathcal{O}} = (\bm{h}_{c_i}^{H'}, \bm{\theta}_{r_{\mathcal{O}k}}^H, \bm{h}_{c_j}^{H'}) \cup (\bm{h}_{e_i}^{H'}, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^{H'})$ with corresponding score $\phi^H_{\mathrm{corr}}(\mathrm{nt}_{\mathcal{O}})$, and a positive margin hyperparameter $\gamma^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}} > 0$:

(16) $\phi^H_{\mathrm{obs}}(\mathrm{pt}_{\mathcal{O}}) = f_{\mathrm{KGE}}(\mathrm{pt}_{\mathcal{O}})$
(17) $\phi^H_{\mathrm{corr}}(\mathrm{nt}_{\mathcal{O}}) = f_{\mathrm{KGE}}(\mathrm{nt}_{\mathcal{O}})$
(18) $L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}} = \frac{1}{|R_{\mathcal{O}} \cup R_{I\mathcal{O}}|} \sum_{(\mathrm{pt}_{\mathcal{O}} \in R_{\mathcal{O}} \cup R_{I\mathcal{O}}) \land (\mathrm{nt}_{\mathcal{O}} \notin R_{\mathcal{O}} \cup R_{I\mathcal{O}})} \mathrm{max}\big(0, \gamma^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}} + \phi^H_{\mathrm{corr}}(\mathrm{nt}_{\mathcal{O}}) - \phi^H_{\mathrm{obs}}(\mathrm{pt}_{\mathcal{O}})\big)$

$R_{\mathcal{O}} \cup R_{I\mathcal{O}}$ includes triples composed of concepts and bridge entities. In this way, the representation of concepts is influenced by the bridge entities, as described in Section 3.1.3.

3.1.3. Modeling the Intersection of the Two Spaces.

Representation of bridge entities.

Bridge nodes are entity nodes that bridge the communication between the instance-view and ontology-view components. Bridge nodes are connected to concepts in the graph, formed by ontology links $\bm{h}_{r_{I\mathcal{O}k}}$, but may also be connected to other entities, formed by instance links $\bm{\theta}_{r_{Ik}}$. As shown in Figure 1(a), nodes 5 and 6 are bridge nodes that are involved in both cyclic and hierarchical structures through their links to other entities as well as concepts. As such, we propose to embed bridge entities in the intersection space of the Poincaré disk and the surface of the spherical ball, in order to better capture the heterogeneous KG structures associated with these nodes. We refer to this intersection submanifold embedding space as the bridge space, $\mathbb{B}^d$, where the representation of these nodes is informed by both $\mathbb{S}^d$ and $\mathbb{H}^d$ and has one lower degree of freedom. The bridge space can therefore be derived as a sphere in general. Formally, $\mathbb{B}^d = \{\bm{h}_{e_i}^B \in \mathbb{R}^d \,\big|\, \lVert\bm{h}_{e_i}^B\rVert = w^S\}$, $w^S \in [0,1)$, is the $d$-dimensional $w^S$-norm ball, where $\lVert\cdot\rVert$ is the Euclidean norm, and the value of the last dimension $d$ is 0. Links associated with bridge nodes are $\bm{\theta}_{r_{Ik}}, \bm{h}_{r_{I\mathcal{O}k}} \in [0, 2\pi)^{d-1}$, and operations on bridge nodes, such as geodesic distances and loss functions, happen in either the spherical space or the hyperbolic space.

To ensure compatibility with the connected concept nodes, we map the bridge entities $\bm{h}_{e_i}^B$ to an embedding in the ontology space through a non-linear transformation function, $g_{\bm{h}_{r_{I\mathcal{O}k}}}(\bm{h}_{e_i}^B)$, where $\mathrm{AGG}(\cdot)$ denotes an averaging over all relations $k$ in $R_{I\mathcal{O}}$. The logarithmic and exponential mapping functions $\mathrm{log}_0(\cdot)$ and $\mathrm{exp}_0(\cdot)$ are described in Section 2.2.

(19) $g_{\bm{h}_{r_{I\mathcal{O}k}}}(\bm{h}_{e_i}^B) = \mathrm{AGG}\Big(\mathrm{proj}_B\big(\mathrm{tanh}(\bm{W}_{\bm{h}_{r_{I\mathcal{O}k}}} \otimes_H \bm{h}_{e_i}^B \oplus_H \bm{b}_{\bm{h}_{r_{I\mathcal{O}k}}})\big)\Big)$
(20) $\mathrm{proj}_B(\bm{z}) = \mathrm{proj}_S(\bm{z})$
(21) $\bm{M} \otimes_H \bm{h}_{e_i}^B = \mathrm{exp}_0\big(\bm{M}\,\mathrm{log}_0(\bm{h}_{e_i}^B)\big)$
(22) $\bm{h}_{c_i}^H \oplus_H \bm{h}_{c_j}^H = \frac{\big(1 + 2(\bm{h}_{c_i}^H)^T\bm{h}_{c_j}^H + \lVert\bm{h}_{c_j}^H\rVert_2^2\big)\bm{h}_{c_i}^H + (1 - \lVert\bm{h}_{c_i}^H\rVert_2^2)\bm{h}_{c_j}^H}{1 + 2(\bm{h}_{c_i}^H)^T\bm{h}_{c_j}^H + \lVert\bm{h}_{c_i}^H\rVert_2^2 \lVert\bm{h}_{c_j}^H\rVert_2^2}$

where both the weight matrix $\bm{W}_{\bm{h}_{r_{I\mathcal{O}k}}}$ and the bias $\bm{b}_{\bm{h}_{r_{I\mathcal{O}k}}}$ are specific to each relation $k$ in $R_{I\mathcal{O}}$ and reserved for the ontology link $\bm{h}_{r_{I\mathcal{O}k}}$.
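A sketch of this mapping is shown below, reusing `mobius_add` and the `exp0`/`log0` maps from the earlier sketches; reading $\mathrm{AGG}(\cdot)$ as a plain average over the relation-specific outputs is our interpretation of the text.

```python
import numpy as np

def hyper_matvec(M, h):
    """Eq. (21): M (x)_H h = exp0(M @ log0(h))."""
    return exp0(M @ log0(h))

def proj_sphere(z, w_s):
    """proj_S / proj_B (Eqs. (20) and (26)): rescale onto the norm-w^S sphere."""
    n = np.linalg.norm(z)
    return z if np.isclose(n, w_s) else w_s * z / n

def bridge_to_ontology(h_b, weights, biases, w_s):
    """Eq. (19): average the per-relation transformed bridge embeddings."""
    outs = [proj_sphere(np.tanh(mobius_add(hyper_matvec(W, h_b), b)), w_s)
            for W, b in zip(weights, biases)]
    return np.mean(outs, axis=0)
```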

Bridge Node Loss Function.

The bridge-node model uses a hinge loss function, a combination of the entity's ontology-specific loss, $\mathrm{ontoLoss}_{\bm{h}_{r_{I\mathcal{O}k}}}$, and instance-specific loss, $\mathrm{instLoss}_{\bm{\theta}_{r_{Ik}}}$, that is minimized over $R_I \cup R_{I\mathcal{O}}$, which contains all triples associated with the bridge nodes. Positive triples are denoted $\mathrm{pt}_{B,I} = (\bm{h}_{e_i}^B, \bm{\theta}_{r_{Ik}}, \bm{h}_{e_j}^B)$ and $\mathrm{pt}_{B,I\mathcal{O}} = (\bm{h}_{e_i}^B, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^B)$, and negative triples are denoted $\mathrm{nt}_{B,I} = (\bm{h}_{e_i}^{B'}, \bm{\theta}_{r_{Ik}}, \bm{h}_{e_j}^{B'})$ and $\mathrm{nt}_{B,I\mathcal{O}} = (\bm{h}_{e_i}^{B'}, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^{B'})$, with the loss function defined as follows:

(23) $\mathrm{ontoLoss}_{\bm{h}_{r_{I\mathcal{O}k}}}(\mathrm{pt}_{B,I\mathcal{O}}, \mathrm{nt}_{B,I\mathcal{O}}) = \mathrm{max}\big(0, \gamma^{R_{I\mathcal{O}}} + \phi^H_{\mathrm{obs}}(\mathrm{pt}_{B,I\mathcal{O}}) - \phi^H_{\mathrm{corr}}(\mathrm{nt}_{B,I\mathcal{O}})\big)$
(24) $\mathrm{instLoss}_{\bm{\theta}_{r_{Ik}}}(\mathrm{pt}_{B,I}, \mathrm{nt}_{B,I}) = \mathrm{max}\big(0, \gamma^{R_I} + \phi^S_{\mathrm{obs}}(\mathrm{pt}_{B,I}) - \phi^S_{\mathrm{corr}}(\mathrm{nt}_{B,I})\big)$
(25) $L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}} = \frac{1}{|R_I \cup R_{I\mathcal{O}}|} \sum_{(\mathrm{pt}_{B,I}, \mathrm{pt}_{B,I\mathcal{O}} \in R_I \cup R_{I\mathcal{O}}) \land (\mathrm{nt}_{B,I}, \mathrm{nt}_{B,I\mathcal{O}} \notin R_I \cup R_{I\mathcal{O}})} \big(\mathrm{ontoLoss}_{\bm{h}_{r_{I\mathcal{O}k}}}(\mathrm{pt}_{B,I\mathcal{O}}, \mathrm{nt}_{B,I\mathcal{O}}) + \mathrm{instLoss}_{\bm{\theta}_{r_{Ik}}}(\mathrm{pt}_{B,I}, \mathrm{nt}_{B,I})\big)$

The combined loss function above enables bridge nodes to learn from both spaces of the intersection of $\mathbb{S}^d$ and $\mathbb{H}^d$.

3.2. Training

This section details the training framework of DGS, for representing two-view KGs, described in Section 3.1. We describe, for each epoch, the training of each node in the two-view KG which includes (1) the forward propagation step, (2) the loss function, and (3) the backward propagation step to optimize parameters. Algorithm 1 provides a summary of the framework.

Embedding Initialization.

We randomly initialize all embeddings of $e$ and $r$ in polar coordinates: $\bm{\theta}_{e_i} \in \mathbb{R}^{d-1} \leftarrow \mathrm{Unif}([0,2\pi))^{d-1}$ and $\bm{\theta}_{r_{Ik}}, \bm{\theta}_{r_{\mathcal{O}k}} \in \mathbb{R}^{d-1} \leftarrow \mathrm{Unif}([0,2\pi))^{d-1}$, then convert entity embeddings to their corresponding Cartesian representation: $\bm{h}_{e_i} = \mathrm{MPC}(\bm{\theta}_{e_i})$. The link $\bm{h}_{r_{I\mathcal{O}k}}$ is a randomly sampled position on the Poincaré disk. Refer to Section A in the Supplements for details about the conversion procedures, describing both polar-to-Cartesian coordinate conversion (MPC) and Cartesian-to-polar coordinate conversion (MCP). For the instance-view model, we also choose a value for the norm $w^S$, sampled from a uniform distribution, $w^S \in [0,1) \leftarrow \mathrm{Unif}([0,1))$, shared by all entities; for the ontology-view model, we choose a value for the norm $w_{\bm{h}_{c_i}}^H$, assigned uniformly at random per concept. We set the curvature values of the spherical and hyperbolic spaces to $K_S = 1$ and $K_H = -1$ respectively. We leave the non-trivial problem of learning optimal curvatures as future work.

Training Procedure for Instance View KG

Parameter optimization is performed using Riemannian stochastic gradient descent (RSGD) for the spherical space, as follows, for entity embedding and relational embedding updates respectively. To ensure that the updated entity embedding remains in the norm-$w^S$ space, we perform a rescaling operation, $\mathrm{proj}_S$, to project out-of-boundary embeddings back to the surface of the $w^S$-ball.

(26) $\mathrm{proj}_S(\bm{z}) = \begin{cases} w^S \cdot \frac{\bm{z}}{\lVert\bm{z}\rVert} & \text{if } \lVert\bm{z}\rVert \neq w^S \\ \bm{z} & \text{otherwise} \end{cases}$
(27) $r(\bm{h}_{e_i,t}^S, L_{\mathrm{inst}}^{R_I}) = \Big(1 + \frac{\bm{h}_{e_i,t}^{S^T} \nabla L_{\mathrm{inst}}^{R_I}(\bm{h}_{e_i,t}^S)}{\lVert \nabla L_{\mathrm{inst}}^{R_I}(\bm{h}_{e_i,t}^S) \rVert}\Big)\big(I - \bm{h}_{e_i,t}^S \bm{h}_{e_i,t}^{S^T}\big)$
(28) $\bm{h}_{e_i,t+1}^S \leftarrow \mathrm{proj}_S\big(-\eta_t \cdot r(\bm{h}_{e_i,t}^S, L_{\mathrm{inst}}^{R_I}) \nabla L_{\mathrm{inst}}^{R_I}(\bm{h}_{e_i,t}^S)\big)$
(29) $\bm{\theta}_{r_{Ik},t+1}^S \leftarrow -\eta_t \cdot r(\bm{\theta}_{r_{Ik},t}^S, L_{\mathrm{inst}}^{R_I}) \nabla L_{\mathrm{inst}}^{R_I}(\bm{\theta}_{r_{Ik},t}^S)$
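A sketch of this spherical update is given below (with `proj_sphere` as above). We read Eq. (28) as retracting the stepped point $\bm{h}_t - \eta_t\, r(\cdot)\,\nabla L$ onto the $w^S$-sphere, which is the standard Riemannian step; that reading is an assumption on our part.

```python
import numpy as np

def riemannian_scaling(h, grad, eps=1e-9):
    """Eq. (27): (1 + h^T g / ||g||) (I - h h^T), with g the Euclidean gradient."""
    unit_g = grad / (np.linalg.norm(grad) + eps)
    return (1.0 + h @ unit_g) * (np.eye(len(h)) - np.outer(h, h))

def spherical_entity_step(h, grad, w_s, lr=1e-3):
    """Eqs. (26)-(28): scaled tangent step followed by a rescaling retraction."""
    return proj_sphere(h - lr * riemannian_scaling(h, grad) @ grad, w_s)
```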
Training procedure for concepts.

Parameter optimization is performed using RSGD for the hyperbolic space, as follows, for concept embedding and relational embedding updates respectively, where the corresponding concept norm $w^H_{c_i}$ is also learned through RSGD by updating the embeddings $\bm{h}_{c_i}^H$. The diagonal relation matrix $\bm{R}$ is also updated through RSGD, and we update the scalar biases $b_{c_i,t+1}^H, b_{c_j,t+1}^H$ through stochastic gradient descent.

(30) $\bm{h}_{c_i,t+1}^H \leftarrow \bm{h}_{c_i,t}^H - \eta_t \Big(\frac{1 - \lVert\bm{h}_{c_i,t}^H\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(\bm{h}_{c_i,t}^H)$
(31) $\bm{h}_{r_{\mathcal{O}k},t+1}^H \leftarrow \bm{h}_{r_{\mathcal{O}k},t}^H - \eta_t \Big(\frac{1 - \lVert\bm{h}_{r_{\mathcal{O}k},t}^H\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(\bm{h}_{r_{\mathcal{O}k},t}^H)$
(32) $\bm{R}_{t+1} \leftarrow \bm{R}_t - \eta_t \Big(\frac{1 - \lVert\bm{R}_t\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(\bm{R}_t)$
(33) $b_{c_i,t+1}^H \leftarrow b_{c_i,t}^H - \eta_t \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(b_{c_i,t}^H)$
(34) $b_{c_j,t+1}^H \leftarrow b_{c_j,t}^H - \eta_t \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(b_{c_j,t}^H)$

After the epoch's update of concept embeddings, we once again reset the value of the last dimension $d$ to 0 to satisfy the original framework constraint of the Poincaré disk. We also enforce that the angular dimensions of relational embeddings lie in $[0, 2\pi)$.

Training Procedure for Bridge Nodes

Parameter optimization is performed using RSGD for the bridge space, as follows, for bridge entity embedding and relational embedding updates respectively. Ontology optimization, $\mathrm{ontoOpt}(\cdot)$, and instance optimization, $\mathrm{instOpt}(\cdot)$, are performed alternately in batches for each of the two embedding types, according to the type of link that the embedding is associated with, i.e., $R_I$ or $R_{I\mathcal{O}}$. This enables the representation of bridge nodes to be informed by both $\mathbb{S}^d$ and $\mathbb{H}^d$.

(35) $\mathrm{ontoOpt}(\bm{z}) = \bm{z} - \eta_t \Big(\frac{1 - \lVert\bm{z}\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}}(\bm{z})$
(36) $\mathrm{instOpt}(\bm{z}) = -\eta_t \cdot r(\bm{z}, L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}}) \nabla L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}}(\bm{z})$
(37) $\bm{h}_{e_i,t+1}^B \leftarrow \mathrm{proj}_B\big(\mathrm{ontoOpt}(\bm{h}_{e_i,t}^B)\big) \,||\, \mathrm{proj}_B\big(\mathrm{instOpt}(\bm{h}_{e_i,t}^B)\big)$
(38) $\bm{h}_{r_{I\mathcal{O}k},t+1} \leftarrow \mathrm{ontoOpt}(\bm{h}_{r_{I\mathcal{O}k},t})$
(39) $\bm{\theta}_{r_{Ik},t+1} \leftarrow \mathrm{instOpt}(\bm{\theta}_{r_{Ik},t})$

After each epoch's update of the instance optimization for $\bm{h}_{e_i}^B$, the value of the last dimension $d$ is reset to 0 and rescaled with $\mathrm{proj}_B(\bm{z})$, defined to be the same as $\mathrm{proj}_S(\bm{z})$, to ensure the intersection space constraint of the bridge entity model. Angular dimensions are also enforced to lie in $[0, 2\pi)$.

Input: set of entities $e$; set of relations $r$;
    instance-view entity-entity triples with links $\bm{\theta}_{r_{Ik}}^S$;
    ontology-view concept-concept triples with links $\bm{\theta}_{r_{\mathcal{O}k}}^H$;
    bridge entity-concept triples with ontology links $\bm{h}_{r_{I\mathcal{O}k}}$;
    bridge entity-entity triples with instance links $\bm{\theta}_{r_{Ik}}$
Output: updated embeddings $\bm{\theta}_{r_{Ik},\mathrm{EP}}^S$, $\bm{\theta}_{r_{\mathcal{O}k},\mathrm{EP}}^H$, ontology $\bm{h}_{r_{I\mathcal{O}k},\mathrm{EP}}$, and instance $\bm{\theta}_{r_{Ik},\mathrm{EP}}$ at the final epoch $\mathrm{EP}$

1  for epoch in (1, 2, ..., EP) do
2      Step 1*: sample links from $\bm{\theta}_{r_{Ik}}^S$;
3          perform the spherical update of entities (Section 3.1.1)
4      Step 2*: sample links from $\bm{\theta}_{r_{\mathcal{O}k}}^H$;
5          perform the hyperbolic update of concepts (Section 3.1.2)
6      Step 3*: sample links from ontology $\bm{h}_{r_{I\mathcal{O}k}}$ and instance $\bm{\theta}_{r_{Ik}}$;
7          alternately perform the spherical and hyperbolic updates of bridge nodes (Section 3.1.3)
8  end for
9  return $\bm{\theta}_{r_{Ik},\mathrm{EP}}^S$, $\bm{\theta}_{r_{\mathcal{O}k},\mathrm{EP}}^H$, ontology $\bm{h}_{r_{I\mathcal{O}k},\mathrm{EP}}$, and instance $\bm{\theta}_{r_{Ik},\mathrm{EP}}$

Algorithm 1. Overall training procedure of DGS. "*" indicates that the three steps can be performed in any order.

4. Experiments

In this section, we evaluate DGS on two KG tasks: the triple completion task on each of the instance and ontology views of the KG, and the entity typing task to test the quality of the learned bridge space in communicating between the two views of the KG. We also provide a case study on entity typing for different variants of DGS that embed in other combinations of geometric spaces. Further, we provide a visualization of embeddings before and after the learning process, projected onto the 3-D geometric space of DGS.

4.1. Datasets

We utilize the datasets of YAGO26K-906 and DB111K-174 since they have the two-view KG setting unlike other datasets for KG embeddings that consider solely an instance-view (Toutanova and Chen, 2015) or ontology-view (Dettmers et al., 2018). YAGO26K-906 and DB111K-174 are prepared from (Hao et al., 2019), which are extracted from YAGO (Suchanek et al., 2007) and DBpedia (Lehmann et al., 2015) respectively. Refer to (Hao et al., 2019) for the detailed construction process. Table 1 provides dataset statistics and Table 2 provides data splits for both datasets on both KG tasks. It can be observed that the instance-view contains many more triples than the ontology-view and that DB111K-174 contains a larger proportion of entity-concept triples (10.35%) compared to YAGO26K-906 (2.43%).

Table 1. Dataset statistics for entities $E$, concepts $C$, and their relations. $E$-$E$ denotes entity-entity links, $C$-$C$ denotes concept-concept links, and $E$-$C$ denotes entity-concept links.

Dataset | #$E$ | #$C$ | #$E$-$E$: $R_I$ | #$C$-$C$: $R_O$ | #$E$-$C$: $R_{IO}$
YAGO26K-906 | 26,078 | 906 | 390,738 | 8,962 | 9,962
DB111K-174 | 111,762 | 174 | 863,643 | 763 | 99,748
Table 2. Data splits for triple completion and entity typing. We provide splits for all KG triples in $R_I/R_O/R_{IO}$ for train (Tr), validation (V), and test (Ts).

YAGO26K-906
Task | Tr ($R_I/R_O/R_{IO}$) | V ($R_I/R_O/R_{IO}$) | Ts ($R_I/R_O/R_{IO}$)
Triple Completion | 332,128/7,618/8,691 | 19,536/448/485 | 39,074/896/1,019
Entity Typing | 211,346/4,876/5,379 | 23,549/543/598 | 156,311/3,592/3,985

DB111K-174
Task | Tr ($R_I/R_O/R_{IO}$) | V ($R_I/R_O/R_{IO}$) | Ts ($R_I/R_O/R_{IO}$)
Triple Completion | 734,096/648/84,864 | 43,182/38/5,018 | 86,365/77/10,131
Entity Typing | 466,538/462/53,863 | 51,828/46/5,985 | 345,504/337/39,900

4.2. Models

4.2.1. Baselines

We compare DGS to state-of-the-art neural network embedding models, which include Euclidean, non-Euclidean, and product space KGE models, as well as GNN-based models for KG completion and entity typing.

  • TransE (Bordes et al., 2013), one of the first KGE models, which simply captures the relationship between entities as a translation.

  • DistMult (Yang et al., 2015), a matrix factorization KGE model, modeling the relationship between entities via multiplication.

  • ComplEx (Trouillon et al., 2016), a KGE model that extends DistMult into the complex number field.

  • RotatE (Sun et al., 2019), a recent KGE model, based on the rotation assumption where a relation is a rotation from the subject to the object in the complex vector space.

  • JOIE (Hao et al., 2019) and M2GNN (Wang et al., 2021): refer to Section 2.3, where these models are discussed.

  • HyperKG (Kolyvakis et al., 2019), a KGE model extending translational KGE methods to the Poincaré-ball model of hyperbolic geometry.

  • HAKE (Zhang et al., 2020), which extends RotatE by combining modulus scaling with rotation in its relations.

  • ConE (Bai et al., 2021), a KGE model embedding entities into hyperbolic cones and relations as transformations between cones.

  • RefH/RotH/AttH (Chami et al., 2020), which are hyperbolic KGE models that combine hyperbolic spaces using hyperbolic attention, where RefH and RotH are variants of AttH using only reflections and rotations respectively.

  • HGCN (Chami et al., 2019), a hyperbolic GCN model utilizing Riemannian geometry and the hyperboloid model.

  • HyperKA (Sun et al., 2020), which extends GCNs from the Euclidean to hyperbolic space using the Poincaré ball model.

4.2.2. DGS Variants

We describe variant models of DGS below.

  • DGS-RO-FC, which is DGS with the Riemannian operator (RO) used for vector transformation instead of our proposed closed spherical space vector transformation procedure of Section 3.1.1, and with a fixed center (FC) of the spherical ball at 0, which is the same center as the Poincaré disk. For the single geometric spaces $\mathbb{S}^d$ and $\mathbb{H}^d$, for non-bridge nodes and concepts, the Riemannian operator is performed as a retraction operation to the tangent Euclidean spaces. However, we extend the Riemannian operator when performing retraction for the intersection ring of the bridge space, which is formed by intersecting the spherical ball's surface and the Poincaré disk. This is described in the Supplements.

  • DGS-SO-FC, which is DGS with the proposed closed spherical space operator (SO), and with the FC of the spherical ball at 0, which is the same center as the Poincaré disk.

  • DGS (ours), which is DGS with the proposed closed SO and with a learnable center (LC) of the spherical ball, which, for simplicity of constructing the intersection space for bridge nodes, is restricted to the last dimension to allow only a vertical shift. Note that we do not also need to make the center of the Poincaré disk learnable, as this shift is already introduced by making one of the centers learnable. For learning the spherical center $\omega$ in the model, we follow the same training procedure as Section 3.2, but for the non-bridge entities and bridge entities we perform the operations by temporarily shifting to center 0 (i.e., a $-\omega$ shift), then shifting back to the new center $\omega$ (i.e., a $+\omega$ shift) after the updates are performed.

4.2.3. DGS Ablation Models

We study different ablation models of DGS, formed by utilizing different combinations of manifold spaces for each type of link ($R_I/R_O$) in the two-view KG: the spherical space $\mathbb{S}^d$, the hyperbolic space using the Poincaré model $\mathbb{H}^d$, or the Euclidean space $\mathbb{E}^d$. These include: DGS ($\mathbb{S}^d/\mathbb{S}^d$), DGS ($\mathbb{H}^d/\mathbb{H}^d$), DGS ($\mathbb{E}^d/\mathbb{H}^d$), and DGS ($\mathbb{S}^d/\mathbb{E}^d$). Since $R_{IO}$ is always at the intersection of the two spaces of $R_I$ and $R_O$, we do not need to specify its geometric space separately.

4.3. Evaluation

Table 3. Results of KG triple completion. $\bm{M}$ denotes the $d$-dimensional manifold space; $R_I$ are entity links and $R_O$ are concept links. For each group of models, the best results are bold-faced. The overall best results on each dataset are underscored. Each cell lists MRR/H@1/H@10, first for YAGO26K-906 ($R_I$ completion, then $R_O$ completion), then for DB111K-174 (same order).

$\bm{M}$ | Model | YAGO26K-906 $R_I$ | YAGO26K-906 $R_O$ | DB111K-174 $R_I$ | DB111K-174 $R_O$

Non-GNN-based KGE models:
$\mathbb{E}^d$ | TransE | 0.187/13.73/35.05 | 0.189/14.72/24.36 | 0.318/22.70/48.12 | 0.539/47.90/61.84
$\mathbb{E}^d$ | DistMult | 0.288/24.06/31.24 | 0.156/14.32/16.54 | 0.280/27.24/29.70 | 0.501/45.52/64.73
$\mathbb{C}^d$ | ComplEx | 0.291/24.85/37.28 | 0.180/14.83/22.97 | 0.321/27.39/46.63 | 0.549/47.80/62.23
$\mathbb{C}^d$ | RotatE | 0.302/25.31/42.17 | 0.228/16.35/27.28 | 0.356/29.31/54.60 | 0.557/49.16/68.19
$\mathbb{E}^d$ | JOIE | 0.316/24.62/51.85 | 0.289/18.66/39.13 | 0.479/35.21/72.38 | 0.602/52.48/79.71
$\mathbb{H}^d$ | HyperKG | 0.215/18.35/36.02 | 0.174/14.50/23.26 | 0.302/23.31/46.72 | 0.542/47.59/62.11
$\mathbb{H}^d$ | HAKE | 0.293/23.04/40.19 | 0.301/19.27/41.09 | 0.391/31.10/60.46 | 0.638/55.69/81.07
$\mathbb{H}^d$ | ConE | 0.299/23.56/41.23 | 0.313/20.05/41.80 | 0.422/33.69/68.12 | 0.639/55.89/81.45

GNN-based models:
$\mathbb{H}^d$ | RefH | 0.282/23.19/40.52 | 0.298/19.70/41.26 | 0.407/30.06/66.93 | 0.622/55.35/81.09
$\mathbb{H}^d$ | RotH | 0.295/23.50/41.03 | 0.308/19.97/41.78 | 0.418/30.18/67.05 | 0.639/55.82/81.44
$\mathbb{H}^d$ | AttH | 0.298/23.43/41.20 | 0.310/19.99/41.53 | 0.419/30.10/66.58 | 0.629/55.37/81.39
$\mathbb{H}^d$ | HGCN | 0.307/23.04/40.25 | 0.302/19.38/40.49 | 0.396/31.54/61.78 | 0.638/55.81/81.60
$\mathbb{H}^d$ | HyperKA | 0.320/26.71/52.09 | 0.305/18.83/40.28 | 0.486/35.79/72.33 | 0.613/53.36/80.59
$\mathbb{P}^d$ | M2GNN | 0.347/29.63/54.28 | 0.341/23.70/42.19 | 0.506/36.52/73.11 | 0.644/56.82/83.01

Ablation variants, DGS ($R_I/R_O$):
DGS ($\mathbb{S}^d/\mathbb{S}^d$) | 0.338/27.15/53.20 | 0.318/20.36/41.02 | 0.491/34.58/71.40 | 0.606/53.29/80.17
DGS ($\mathbb{H}^d/\mathbb{H}^d$) | 0.314/25.11/52.02 | 0.358/24.61/43.28 | 0.502/35.79/73.61 | 0.663/57.59/84.16
DGS ($\mathbb{E}^d/\mathbb{H}^d$) | 0.327/25.32/52.89 | 0.343/23.95/41.62 | 0.498/35.11/72.37 | 0.640/56.17/82.68
DGS ($\mathbb{S}^d/\mathbb{E}^d$) | 0.322/24.91/52.36 | 0.297/19.43/40.61 | 0.484/33.29/73.54 | 0.619/53.72/80.51

DGS variants:
DGS-RO-FC | 0.352/29.79/55.21 | 0.364/25.04/43.27 | 0.518/37.65/73.97 | 0.681/59.23/84.16
DGS-SO-FC | 0.364/30.15/55.93 | 0.369/25.81/44.18 | 0.536/38.29/74.28 | 0.687/59.26/84.82
DGS (ours) | 0.366/30.15/56.06 | 0.372/25.88/44.38 | 0.536/38.31/74.85 | 0.690/59.88/84.82

In this section, we detail our evaluation on the tasks of KG triple completion and entity typing. The goal of triple completion is to reconstruct the missing relation facts in a KG structure. Specifically, we test constructing a missing target node from either the instance or ontology view, i.e., queries $(e_i, r_k, ?e_j)$ and $(c_i, r_k, ?c_j)$, such that each evaluated model is trained on the entire two-view KG. The goal of entity typing is to predict the concepts that correspond to the entities, i.e., queries $(e_i, r_k, ?c_j)$.

Using plausibility scores to rank each test candidate set, we report, for each task, the evaluation metrics of mean reciprocal rank ($MRR$) and $Hits@k$, e.g., $Hits@1$, $Hits@3$, $Hits@10$. Table 2 reports our data splits for each task. For triple completion, the plausibility score is chosen to be the embedding distance of the source node, modeled under the relation, to the missing target node; for entity typing, it is the embedding distance from the entity's representation in the concept space to the concept. (Hao et al., 2019) provides more details on the evaluation procedure. For evaluation consistency, for both tasks, model training hyperparameters are chosen from dimensionality $d \in$ {50, 100, 200, 300} for all triples, learning rate $\eta \in$ {5e-4, 1e-3, 1e-2, 1e-1}, and margins $\gamma \in$ {0.5, 1}. Further, different batch sizes and epochs are used according to the type and size of the graphs.
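For concreteness, a small sketch of how these ranking metrics are computed from the 1-based rank assigned to each ground-truth candidate:

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks: 1-based rank of the true target for each test query."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: 100.0 * float(np.mean(ranks <= k)) for k in ks}  # percentages
    return mrr, hits

# Example: three queries whose true targets rank 1st, 4th, and 12th.
print(mrr_and_hits([1, 4, 12]))  # MRR ~0.444; Hits@1 = 33.3, Hits@10 = 66.7
```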

KG Triple Completion Results.

Results are reported in Table 3. DGS outperforms all of the baseline models on both datasets. DGS achieves an average performance gain over all baselines of 32.36% on $MRR$, 27.59% on $Hit@1$, and 29.17% on $Hit@10$ for instance-view completion across both YAGO26K-906 and DB111K-174, and of 23.43% on $MRR$, 28.41% on $Hit@1$, and 18.11% on $Hit@10$ for ontology-view completion across both datasets.

It can be observed that, in both the instance and ontology views on both datasets, the hyperbolic-based KGE models outperform their Euclidean and complex space counterparts. Further, hyperbolic KGE models perform better on the ontology view than the instance view, likely due to the prevalence of hierarchy in the ontology. It is also seen that using multiple geometric spaces is more effective than using a single geometric space. For GNN-based models, M2GNN, which uses a product space $\mathbb{P}^d$ combining the Euclidean, spherical, and hyperbolic spaces, outperforms the models using only one of the spaces. Compared to M2GNN, the most competitive baseline model in most cases, DGS shows a significant improvement of 3.25% on $MRR$, 5.15% on $Hit@1$, and 2.73% on $Hit@10$ on average over both YAGO26K-906 and DB111K-174. This is likely because hierarchy in the KG is better modeled through the hyperbolic space than with influence from the spherical or Euclidean space, and cyclic links are better modeled through the spherical space than with influence from the hyperbolic or Euclidean space. For bridge entities involved in both hierarchical and cyclic links, we see it is beneficial to model them in an intersection space to better capture both of these properties.

Table 4. Results of entity typing. For each group of models, the best results are bold-faced. The overall best results on each dataset are underscored. Each cell lists MRR/Acc./Hit@3.

Model | YAGO26K-906 | DB111K-174
TransE | 0.144/7.32/35.26 | 0.503/43.67/60.78
DistMult | 0.411/36.07/55.32 | 0.551/49.83/68.01
ComplEx | 0.519/43.08/59.28 | 0.583/55.07/70.17
RotatE | 0.673/58.24/73.95 | 0.721/61.48/75.67
JOIE | 0.899/85.72/96.02 | 0.859/75.58/96.02
HyperKG | 0.627/55.39/65.64 | 0.736/61.20/74.29
HAKE | 0.905/87.42/96.37 | 0.866/76.04/96.22
ConE | 0.912/87.51/96.39 | 0.869/76.85/96.38
RefH | 0.907/87.49/96.37 | 0.867/76.26/96.31
RotH | 0.909/87.49/96.39 | 0.868/76.55/96.36
AttH | 0.910/87.50/96.38 | 0.868/76.50/96.33
HGCN | 0.905/87.44/96.37 | 0.867/76.11/96.27
HyperKA | 0.918/87.76/96.45 | 0.871/76.65/96.50
M2GNN | 0.922/88.16/97.01 | 0.880/77.58/96.94
DGS ($\mathbb{S}^d/\mathbb{S}^d$) | 0.919/87.94/96.81 | 0.875/77.03/96.78
DGS ($\mathbb{H}^d/\mathbb{H}^d$) | 0.924/88.12/97.01 | 0.887/77.53/96.95
DGS ($\mathbb{E}^d/\mathbb{H}^d$) | 0.922/88.07/96.91 | 0.883/77.37/96.58
DGS ($\mathbb{S}^d/\mathbb{E}^d$) | 0.917/87.68/96.41 | 0.866/76.33/96.48
DGS-RO-FC | 0.930/88.47/97.32 | 0.891/77.81/97.04
DGS-SO-FC | 0.938/89.02/98.11 | 0.895/77.92/97.40
DGS (ours) | 0.939/89.07/98.18 | 0.895/77.94/97.47
Entity Typing Results.

Results are reported in Table 4. Consistent with the KG completion results, DGS variants outperform the baseline models on both datasets for entity typing. DGS achieves an average performance gain over all baselines of 24.49% on MRR, 26.39% on Acc, and 18.78% on Hit@3 for YAGO26K-906, and of 14.86% on MRR, 13.74% on Acc, and 12.15% on Hit@3 for DB111K-174. Compared to the most competitive GNN-based model, M2GNN, DGS has nearly a 2% gain in MRR on each of YAGO26K-906 and DB111K-174.

Ablation Studies.

Tables 3 and 4 report our results for ablation studies on the models in the category Ablation variants. Using the spherical space for $R_I$ links learns a better representation than its hyperbolic or Euclidean counterparts, indicating the prevalence of cyclic relations between entities. Utilizing the hyperbolic space for $R_O$ and $R_{IO}$ links learns a better representation than its spherical or Euclidean counterparts, indicating the prevalence of hierarchical relations among concepts and between entities and concepts. Interestingly, a fully hyperbolic-space model in general achieves better performance than its fully spherical counterpart, indicating that YAGO26K-906 and DB111K-174 may contain relatively more hierarchical links than cyclic links. Further, learning a better representation in one view benefits the other view more when there are more $R_{IO}$ bridge links. For example, since DB111K-174 contains significantly more $R_{IO}$ links than YAGO26K-906, as shown in Table 1, DGS ($\mathbb{H}^d/\mathbb{H}^d/\mathbb{H}^d$) performs better against baselines on the instance view of DB111K-174 than on the instance view of YAGO26K-906.

Visualizations.

We project the learned embeddings into 3-D space, and plot seven nodes with their relations in Figure 2. Concepts that are higher in the ontological hierarchy such as animal and person are closer to the origin compared to lower-level concepts such as leader, and entities tend to be farther away from the origin.

Figure 2. Example of embeddings learned by DGS. Solid lines connect entities and concepts as in the original dataset. Dashed lines connect every node to the origin to indicate closeness to origin.
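A hypothetical plotting sketch along these lines appears below. The PCA projection, node names, and random embeddings are assumptions for illustration only; they stand in for the learned DGS embeddings:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical learned embeddings (node name -> vector); in practice these
# would be the embeddings produced by DGS.
rng = np.random.default_rng(0)
emb = {n: rng.normal(size=50) for n in ["animal", "person", "leader", "entity_5"]}

names = list(emb.keys())
X3 = PCA(n_components=3).fit_transform(np.stack([emb[n] for n in names]))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for name, p in zip(names, X3):
    ax.scatter(*p)
    ax.text(*p, name)
    # Dashed line to the origin visualizes each node's closeness to the origin.
    ax.plot([0, p[0]], [0, p[1]], [0, p[2]], linestyle="--")
plt.show()
```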

5. Conclusions

We are among the first to explore the use of different embedding spaces for different views of a knowledge graph. Entities in the instance view often have cyclical connections and are therefore modeled via spherical embeddings, whereas concepts and bridge entities in the ontology view tend to form a hierarchy and are therefore modeled in hyperbolic space. We propose the notion of a bridge space to seamlessly model the intersection of the two spaces in which bridge entities reside. In addition, we propose a set of closed spherical space operations that eliminates the need to project embeddings to the tangent Euclidean space. The resulting model, DGS, is a unified framework that significantly outperforms all baselines, showing the effectiveness of capturing different structures in the knowledge graph using embedding spaces of different curvatures and properties.

Future directions include supporting geometric spaces of learnable curvature (to better capture the knowledge graph structure) and allowing a learnable offset between the origins of the two geometric spaces (to better accommodate uneven entity distributions and/or unbalanced ontology structures). We also aim to extend our dual-space model to a multi-space geometric embedding model.

6. Acknowledgements

This work was partially supported by NSF III-1705169, 1829071, 1937599, 2106859, 2119643; NIH R35-HL135772; NIBIB R01-EB027650; Amazon Research Awards; Cisco Research Grant USA000EP280889; NEC Gifts; Picsart Gifts; and Snapchat Gifts. The authors would also like to thank anonymous reviewers for their constructive feedback.

References

  • Bai et al. (2021) Yushi Bai, Rex Ying, Hongyu Ren, and Jure Leskovec. 2021. Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones. NeurIPS (2021).
  • Balažević et al. (2019) Ivana Balažević, Carl Allen, and Timothy Hospedales. 2019. Multi-relational Poincaré Graph Embeddings. NeurIPS (2019).
  • Beltrami (1897) Eugenio Beltrami. 1897. Teoria fondamentale degli spazii di curvatura costante. Annali di Matematica Pura ed Applicata.
  • Berant and Liang (2014) Jonathan Berant and Percy Liang. 2014. Semantic Parsing via Paraphrasing. ACL (2014).
  • Bonnabel (2013) Silvère Bonnabel. 2013. Stochastic gradient descent on Riemannian manifolds. IEEE (2013).
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. NeurIPS (2013).
  • Cannon et al. (1997) James W. Cannon, William J. Floyd, Richard Kenyon, and Walter R. Parry. 1997. Hyperbolic Geometry. MSRI.
  • Chami et al. (2020) Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. 2020. Low-Dimensional Hyperbolic Knowledge Graph Embeddings. ACL (2020).
  • Chami et al. (2019) Ines Chami, Rex Ying, Christopher Ré, and Jure Leskovec. 2019. Hyperbolic Graph Convolutional Neural Networks. NeurIPS (2019).
  • Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. AAAI.
  • Gu et al. (2018) Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. 2018. Learning mixed-curvature representations in product spaces. In ICLR.
  • Hao et al. (2019) Junheng Hao, Muhao Chen, Wenchao Yu, Yizhou Sun, and Wei Wang. 2019. Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts. KDD (2019).
  • Huang et al. (2019a) Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019a. Knowledge Graph Embedding Based Question Answering. WSDM (2019).
  • Huang et al. (2019b) Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019b. Knowledge Graph Embedding Based Question Answering. WSDM (2019).
  • Iyer et al. (2022) Roshni G. Iyer, Thuy Vu, Alessandro Moschitti, and Yizhou Sun. 2022. Question-Answer Sentence Graph for Joint Modeling Answer Selection. arXiv preprint arXiv:2203.03549 (2022).
  • Iyer et al. (2021) Roshni G. Iyer, Wei Wang, and Yizhou Sun. 2021. Bi-Level Attention Graph Neural Networks. ICDM (2021).
  • Kolyvakis et al. (2019) Prodromos Kolyvakis, Alexandros Kalousis, and Dimitris Kiritsis. 2019. HyperKG: Hyperbolic Knowledge Graph Embeddings for Knowledge Base Completion. arXiv preprint arXiv:1908.04895 (2019).
  • Lehmann et al. (2015) Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. 2015. DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia. In Semantic Web Journal. 167–195.
  • Meng et al. (2019) Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, and Jiawei Han. 2019. Spherical Text Embedding. NeurIPS (2019).
  • Nickel and Kiela (2017) Maximilian Nickel and Douwe Kiela. 2017. Poincaré Embeddings for Learning Hierarchical Representations. NeurIPS (2017).
  • Petersen et al. (2006) Peter Petersen, S. Axler, and K.A. Ribet. 2006. Riemannian Geometry. Springer.
  • Suchanek et al. (2007) Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A Core of Semantic Knowledge. WWW.
  • Sun et al. (2022) Li Sun, Zhongbao Zhang, Junda Ye, Hao Peng, Jiawei Zhang, Sen Su, and Philip S. Yu. 2022. A Self-supervised Mixed-curvature Graph Neural Network. AAAI (2022).
  • Sun et al. (2020) Zequn Sun, Muhao Chen, Wei Hu, Chengming Wang, Jian Dai, and Wei Zhang. 2020. Knowledge Association with Hyperbolic Knowledge Graph Embeddings. ACL (2020).
  • Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. ICLR (2019).
  • Toutanova and Chen (2015) Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex Embeddings for Simple Link Prediction. ICML (2016).
  • Wang et al. (2019) Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. WSDM (2019).
  • Wang et al. (2021) Shen Wang, Xiaokai Wei, Cícero Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew Arnold, Bing Xiang, Philip S. Yu, and Isabel F Cruz. 2021. Mixed-Curvature Multi-Relational Graph Neural Network for Knowledge Graph Completion. WWW (2021).
  • Yang et al. (2015) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR (2015).
  • Zhang et al. (2021) Shuai Zhang, Yi Tay, Wenqi Jiang, Da-cheng Juan, and Ce Zhang. 2021. Switch spaces: Learning product spaces with sparse gating. arXiv preprint arXiv:2102.08688 (2021).
  • Zhang et al. (2020) Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, and Jie Wang. 2020. Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction. AAAI (2020).

Appendix A General Polar-Cartesian Conversion Framework

In this section, we detail our general procedures for efficiently converting arbitrary $d$-dimensional embeddings between the polar and Cartesian coordinate representations, which are utilized by sub-models in DGS. These procedures are: (1) the Cartesian-to-polar coordinate transformation (MCP) and (2) the polar-to-Cartesian coordinate transformation (MPC).

Figure 3. Illustration of vector $\bm{z}$ in spherical space on a 3-dimensional surface. Angular dimensions of $\bm{z}$ are $\theta_{z,1}$, the angle from the $x_1$-axis to $\bm{z}$ projected onto the $x_1$-$x_2$ plane, and $\theta_{z,2}$, the angle from the $x_3$-axis to $\bm{z}$ projected onto the $x_2$-$x_3$ plane.

A.1. MCP Transformation Procedure.

Using Figure 3, we derive transformations for the 3-dimensional spherical surface, and extend them to a $d$-dimensional spherical surface:

\begin{align*}
\tan(\theta_{z,1}) &= \frac{x_{z,2}}{x_{z,1}}; \quad \mathit{rad}=w \\
\cot(\theta_{z,2}) &= \frac{x_{z,3}}{\sqrt{x_{z,1}^{2}+x_{z,2}^{2}}}; \quad \mathit{rad}=w \\
\cot(\theta_{z,3}) &= \frac{x_{z,4}}{\sqrt{x_{z,1}^{2}+x_{z,2}^{2}+x_{z,3}^{2}}}; \quad \mathit{rad}=w \\
&\;\;\vdots \\
\cot(\theta_{z,d-1}) &= \frac{x_{z,d}}{\sqrt{x_{z,1}^{2}+x_{z,2}^{2}+\dots+x_{z,d-1}^{2}}}; \quad \mathit{rad}=w
\end{align*}

which implies

\begin{align*}
\theta_{z,1} &= \tan^{-1}\!\left(\frac{x_{z,2}}{x_{z,1}}\right); \quad \mathit{rad}=w \\
\theta_{z,i} &= \cot^{-1}\!\left(\frac{x_{z,i+1}}{\sqrt{\sum_{i'=1}^{i} x_{z,i'}^{2}}}\right),\; i\in[2,d-1]; \quad \mathit{rad}=w
\end{align*}
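A minimal NumPy sketch of the MCP procedure under these equations follows; the function name mcp is illustrative, and atan2 is used in place of $\tan^{-1}$/$\cot^{-1}$ so that signs and zero denominators are handled safely:

```python
import numpy as np

def mcp(x):
    """Cartesian -> polar (MCP) for a point on a d-dimensional sphere.

    Follows the Appendix A equations: theta_1 = atan(x_2 / x_1) and
    theta_i = arccot(x_{i+1} / sqrt(x_1^2 + ... + x_i^2)) for i in [2, d-1].
    """
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[0]
    theta = np.empty(d - 1)
    theta[0] = np.arctan2(x[1], x[0])
    for i in range(2, d):
        # cot(theta_i) = x_{i+1} / ||x_{1..i}||  =>  theta_i = atan2(||x_{1..i}||, x_{i+1})
        theta[i - 1] = np.arctan2(np.linalg.norm(x[:i]), x[i])
    rad = np.linalg.norm(x)  # the sphere radius w
    return theta, rad
```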
Figure 4. Riemannian retraction operation for the bridge space of DGS.

A.2. MPC Transformation Procedure.

Using Figure 3, we derive transformations for the $d$-dimensional spherical surface as follows:

\begin{align*}
x_{z,1} &= \cos(\theta_{z,1}); \quad \mathit{rad}=w \\
x_{z,d-i} &= \Big(\prod_{i'=d-i}^{d-1}\sin(\theta_{z,i'})\Big)\cdot\sin(\theta_{z,d-i-1}); \quad \mathit{rad}=w \\
x_{z,d} &= \cos(\theta_{z,d-1}); \quad \mathit{rad}=w
\end{align*}
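The sketch below implements MPC as the exact algebraic inverse of the mcp sketch above, under the same angle conventions; names and the round-trip check are illustrative only:

```python
import numpy as np

def mpc(theta, rad=1.0):
    """Polar -> Cartesian (MPC), written as the exact inverse of mcp above.

    theta_1 is the azimuth in the x1-x2 plane; each later coordinate is a
    cosine of its own angle scaled by the sines of all later angles.
    """
    theta = np.asarray(theta, dtype=np.float64)
    d = theta.shape[0] + 1
    # S[k] = sin(theta[k]) * ... * sin(theta[d-2]); S[d-1] = 1 (empty product).
    S = np.ones(d)
    for m in range(d - 2, 0, -1):
        S[m] = S[m + 1] * np.sin(theta[m])
    x = np.empty(d)
    x[0] = rad * np.cos(theta[0]) * S[1]
    x[1] = rad * np.sin(theta[0]) * S[1]
    for i in range(2, d):
        x[i] = rad * np.cos(theta[i - 1]) * S[i]
    return x

# Round trip using mcp from the sketch above: polar form recovers the point.
p = np.array([1.0, 1.0, np.sqrt(2.0)])
angles, w = mcp(p)
assert np.allclose(mpc(angles, w), p)
```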
Theorem.

The MCP procedure and MPC procedure are closed in the spherical space.

Proof.

The above equations for the MCP and MPC procedures directly use properties of trigonometry for right triangles in each embedding dimension $\theta_i$, $i\in[1,d-1]$, ensuring that the transformation is closed. ∎

Appendix B Batch Vector Transformation Procedure

We illustrate the batch version of the rotation operation presented in Section 3.1.1. Consider an entity $e_i$ with $K$ relations connected to it. Instead of applying $f_{\mathrm{rot}}$ separately for each of the $K$ relations, we can compute the result efficiently in one batch:

\begin{align*}
f_{\mathrm{rot}}\!\left(\bm{h}_{e_{i}}^{S},\begin{bmatrix}\bm{\theta}_{r_{1}}^{S}\\ \vdots\\ \bm{\theta}_{r_{K}}^{S}\end{bmatrix}\right) &= \begin{bmatrix}(\theta_{e_{i},1}^{S}+\theta_{r_{1},1}^{S}),\dots,(\theta_{e_{i},d}^{S}+\theta_{r_{1},d}^{S})\\ \vdots\\ (\theta_{e_{i},1}^{S}+\theta_{r_{K},1}^{S}),\dots,(\theta_{e_{i},d}^{S}+\theta_{r_{K},d}^{S})\end{bmatrix} \bmod 2\pi \\
&= \begin{bmatrix}f_{\mathrm{rot}}(\bm{h}_{e_{i}}^{S},\bm{\theta}_{r_{1}}^{S})\\ \vdots\\ f_{\mathrm{rot}}(\bm{h}_{e_{i}}^{S},\bm{\theta}_{r_{K}}^{S})\end{bmatrix}
\end{align*}
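A vectorized sketch of this batch rotation, assuming angular embeddings stored as NumPy arrays (f_rot_batch is an illustrative name):

```python
import numpy as np

def f_rot_batch(h, thetas):
    """Batched spherical rotation: add each relation's angular offsets to the
    entity's angular coordinates and wrap into [0, 2*pi).

    h:      (d,)   angular embedding of entity e_i
    thetas: (K, d) angular embeddings of the K relations touching e_i
    Broadcasting applies all K rotations in one vectorized operation,
    matching the row-wise stack of f_rot(h, theta_k) above.
    """
    return np.mod(h[None, :] + thetas, 2.0 * np.pi)  # shape (K, d)

h = np.array([0.3, 5.9, 1.2])
thetas = np.array([[0.5, 0.7, 6.0], [6.1, 0.2, 0.1]])
print(f_rot_batch(h, thetas))
```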
Theorem.

The batch vector transformation procedure is closed.

Proof.

Each relation has $f_{\mathrm{rot}}(\cdot)$ applied to the entity, and $f_{\mathrm{rot}}(\cdot)$ has been shown to be closed: every dimension of the transformed head satisfies the angular constraints (e.g., by $\bmod\, 2\pi$) and the norm constraints, because $f_{\mathrm{rot}}(\cdot)$ preserves the norm of the spherical space. ∎

Appendix C Riemannian retraction operation for the bridge space of DGS

Figure 4 describes the Riemannian retraction operation $\mathcal{R}$ from the bridge space to the tangent Euclidean space. The bridge space is formed by intersecting the surface of the spherical ball with the Poincaré disk in the hyperbolic space. As shown in the figure, we separately map the Poincaré disk and the spherical ball to different tangent Euclidean spaces, and enable joint communication between the two spaces to represent the bridge entities. On the left, we perform an isomorphic mapping from the Poincaré disk, $\mathcal{H}^{d}$, to the hyperboloid model, $\mathcal{B}^{d+1}$, as in (Chami et al., 2019) using:

(40) $\mathcal{R}_{\mathcal{H}^{d}\rightarrow\mathcal{B}^{d+1}}(x_{1},\dots,x_{d})=\dfrac{(2x_{1},\dots,2x_{d},\,1+\lVert\mathbf{x}\rVert_{2}^{2})}{1-\lVert\mathbf{x}\rVert_{2}^{2}}$
(41) $\mathcal{R}_{\mathcal{B}^{d+1}\rightarrow\mathcal{H}^{d}}(x_{1},\dots,x_{d+1})=\dfrac{(x_{1},\dots,x_{d})}{x_{d+1}+1}$

We then use the logarithmic map to project to the first tangent Euclidean space. On the right, we use the logarithmic map to project the spherical ball to the second tangent Euclidean space. We then allow for transformation between the two Euclidean spaces through the transition mappings $\gamma_{\alpha\beta}$ and $\gamma_{\beta\alpha}$ such that

(42) $\gamma_{\alpha\beta}=\gamma_{\beta}\circ\gamma_{\alpha}^{-1}$
(43) $\gamma_{\beta\alpha}=\gamma_{\alpha}\circ\gamma_{\beta}^{-1}$
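A minimal NumPy sketch of Eqs. (40)-(43) under these definitions; the chart maps gamma_alpha and gamma_beta are left abstract, since the logarithmic maps depend on the underlying curvature, and all function names here are illustrative:

```python
import numpy as np

def poincare_to_hyperboloid(x):
    """Eq. (40): lift a point of the Poincare model to the hyperboloid model."""
    x = np.asarray(x, dtype=np.float64)
    s = np.dot(x, x)  # ||x||_2^2
    return np.concatenate([2.0 * x, [1.0 + s]]) / (1.0 - s)

def hyperboloid_to_poincare(y):
    """Eq. (41): project a hyperboloid point back to the Poincare model."""
    y = np.asarray(y, dtype=np.float64)
    return y[:-1] / (y[-1] + 1.0)

def make_transitions(gamma_alpha, gamma_alpha_inv, gamma_beta, gamma_beta_inv):
    """Eqs. (42)-(43): transition maps between the two tangent Euclidean
    charts, built by composing one chart map with the inverse of the other."""
    gamma_ab = lambda v: gamma_beta(gamma_alpha_inv(v))  # gamma_{alpha beta}
    gamma_ba = lambda v: gamma_alpha(gamma_beta_inv(v))  # gamma_{beta alpha}
    return gamma_ab, gamma_ba

# Round-trip check for the retraction pair in Eqs. (40)-(41).
x = np.array([0.1, -0.2, 0.3])
assert np.allclose(hyperboloid_to_poincare(poincare_to_hyperboloid(x)), x)
```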