
Dual-Geometric Space Embedding Model for Two-View Knowledge Graphs

Roshni G. Iyer ([email protected]), Yunsheng Bai ([email protected]), Wei Wang ([email protected]), and Yizhou Sun ([email protected])
University of California, Los Angeles, Los Angeles, CA, USA
(2022)
Abstract.

Two-view knowledge graphs (KGs) jointly represent two components: an ontology view for abstract and commonsense concepts, and an instance view for specific entities that are instantiated from ontological concepts. As such, these KGs contain heterogeneous structures that are hierarchical, from the ontology-view, and cyclical, from the instance-view. Despite these various structures in KGs, most recent works on embedding KGs assume that the entire KG belongs to only one of the two views but not both simultaneously. For works that seek to put both views of the KG together, the instance and ontology views are assumed to belong to the same geometric space, such as all nodes embedded in the same Euclidean space or non-Euclidean product space, an assumption no longer reasonable for two-view KGs where different portions of the graph exhibit different structures. To address this issue, we define and construct a dual-geometric space embedding model (DGS) that models two-view KGs using a complex non-Euclidean geometric space, by embedding different portions of the KG in different geometric spaces. DGS utilizes the spherical space, hyperbolic space, and their intersecting space in a unified framework for learning embeddings. Furthermore, for the spherical space, we propose novel closed spherical space operators that directly operate in the spherical space without the need for mapping to an approximate tangent space. Experiments on public datasets show that DGS significantly outperforms previous state-of-the-art baseline models on KG completion tasks, demonstrating its ability to better model heterogeneous structures in KGs.

Knowledge Graph Embeddings; Non-Euclidean Geometry
journalyear: 2022; copyright: rights retained
conference: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14–18, 2022, Washington, DC, USA
isbn: 978-1-4503-9385-0/22/08; doi: 10.1145/3534678.3539350
ccs: Computing methodologies → Knowledge representation and reasoning; Computing methodologies → non-Euclidean geometric space embedding

1. Introduction

Knowledge graphs (KGs) are essential data structures that have been shown to improve several semantic applications, including semantic search (Berant and Liang, 2014), question answering (Huang et al., 2019a; Iyer et al., 2022), and recommender systems (Wang et al., 2019). Two-view knowledge graphs (Huang et al., 2019b) consist of both (1) an instance-view component, containing entity-entity relations, and (2) an ontology-view component, containing concept-concept and entity-concept relations. In addition, there are nodes involved in both entity-concept relations and entity-entity relations, which we refer to as bridge nodes, e.g., nodes 5 and 6 in Figure 1(a). Bridge nodes provide the connections to bridge both KG views. The instance-view component contains data that form cyclical structures, such as nodes 8-10 in Figure 1(a). The ontology-view component contains data that form hierarchical structures. As such, the two-view KG contains intrinsic heterogeneous structures.

Figure 1. Two-View KG visualization and embedding space. (a) Example two-view KG. (b) Corresponding DGS representation.

A key challenge with KGs is that they are often highly incomplete, making KG completion an important area of investigation. In recent years, several research works have focused on improving knowledge graph embeddings (KGE) to address this task (Bordes et al., 2013; Yang et al., 2015). The idea behind embedding methods is to map entities and relations to a latent low-dimensional vector space while preserving the semantics and inherent structures in the KG. However, most embedding methods, including graph neural network (GNN)-based (Chami et al., 2020; Sun et al., 2020; Chami et al., 2019; Iyer et al., 2021) and non-GNN-based methods (Kolyvakis et al., 2019; Trouillon et al., 2016), are limited to operating on KGs belonging to either the instance-view or the ontology-view but not both. Specifically, they either model entities and relations in the zero-curvature Euclidean space (Sun et al., 2019), omitting both cyclic and hierarchical structures, or in the hyperbolic space (Bai et al., 2021; Zhang et al., 2020), omitting cyclic structures and treating relations as hierarchical.

For works that seek to put both views of the KG together, the instance and ontology views are assumed to belong to the same space, an assumption no longer reasonable for two-view KGs. Specifically, all KG relations are modeled in the Euclidean space (Hao et al., 2019), or are all modeled in a product space, namely the Riemannian product manifold of the Euclidean, spherical, and hyperbolic spaces (Wang et al., 2021). In contrast, our goal in modeling two-view KGs is to use a unified framework that embeds hierarchical structures of the KG in the hyperbolic space, cyclic structures of the KG in the spherical space, and nodes involved in both structures in a unified intersection space.

To address the above challenges, in this paper, we present the Dual-Geometric Space embedding model (DGS) for two-view KGs. To summarize, our work makes the following contributions:

  • We formulate the problem of modeling a two-view KG in a complex non-Euclidean geometric space for KG completion. To our knowledge, we are the first to model two-view KGs in distinct non-Euclidean spaces using a unifying framework, i.e., different views belong to different geometric spaces.

  • We propose to model instance-view entities in the spherical space for solely cyclical relations, ontology-view concepts in the hyperbolic space for solely hierarchical relations, and bridge entities involved in both cyclic and hierarchical structures in a specially designed intersection bridge space.

  • To the best of our knowledge, we are also the first to design closed spherical space operations that directly operate in the spherical space without mapping to an external approximation space, e.g., the tangent Euclidean space.

  • We investigate seven variant and ablation models of DGS and evaluate these models on two KG completion tasks. Extensive experiments demonstrate the effectiveness of DGS in populating knowledge in two-view KGs. Further, our model significantly outperforms its single non-Euclidean and Euclidean geometric space counterparts including the product space, and existing state-of-the-art graph neural network (GNN) embedding methods.

2. Preliminary and Related Work

Here, we formalize two-view KGs, which jointly model the entities and concepts in the instance and ontology views, and discuss the primary state-of-the-art models for two-view KGs and their limitations.

2.1. Problem Formulation

A two-view KG, $G$, consists of nodes and edges, such that nodes denote the set of entities, $E$, or the set of concepts, $C$. Edges denote relations, where $R_I$ is the set of entity-entity relations, $R_{\mathcal{O}}$ is the set of concept-concept relations, and $R_{I\mathcal{O}}$ is the set of entity-concept relations. The two-view KG consists of an instance-view component containing entities with relations of $R_I$, and an ontology-view component containing concepts and bridge entities with relations of $R_{\mathcal{O}}$ and $R_{I\mathcal{O}}$. We denote a subset of entities in $E$ to be bridge entities, which communicate between both the instance and ontology views. These bridge entities are associated with both $R_I$ and $R_{I\mathcal{O}}$ relations. Relations in $R_{I\mathcal{O}}$ are entity-concept relations, which are similar to concept-concept relations such as "is_a". $E$ and $C$, as well as $R_I$, $R_{\mathcal{O}}$, and $R_{I\mathcal{O}}$, are each disjoint sets. We denote $e_i \in E$ to be the $i$-th entity, $\bm{h}_{e_i}$ to be entity $i$'s representation in the Cartesian coordinate system, and $r_{Ik} \in R_I$ to be the $k$-th relation in the instance-view component. Without loss of generality (WLOG), we also model the ontology-view component, where $c_i \in C$ is the $i$-th concept, $\bm{h}_{c_i}$ is concept $i$'s representation in the Cartesian coordinate system, and $r_{\mathcal{O}k} \in R_{\mathcal{O}}$ is the $k$-th relation in the ontology-view component. Therefore, the instance view can be considered as a set of triples of the form $(e_i, r_{Ik}, e_j) \in E \times R_I \times E$, which may be any real-world instance associations and are often cyclical in structure. Likewise, the ontology view can be represented by a set of triples $(c_i, r_{\mathcal{O}k}, c_j) \in C \times R_{\mathcal{O}} \times C$, which contain hierarchical associations such as is_a. Here we use Figure 1(a) to illustrate the aforementioned problem formulation. For example, (Soil, composed_of, Mineral) and (Fungus, is_a, Eukaryote) are instance-view and ontology-view triples respectively. Further, Bacteria-Soil-Mineral is an example of a cycle between non-bridge entities.

A bridge entity may be involved in two types of triples: $(e_i, r_{I\mathcal{O}k}, c_j) \in E \times R_{I\mathcal{O}} \times C$ for (hierarchical) entity-concept relations and $(e_i, r_{Ik}, e_j) \in E \times R_I \times E$ for (cyclical) entity-entity relations. For example, Rabbit is a bridge entity in Figure 1(a). The triple (Rabbit, emit, Carbon Dioxide) represents an entity-entity relation, and the triple (Rabbit, is_a, Herbivore) represents an entity-concept relation.
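For concreteness, a toy encoding of the triples above, using the entity and concept names from Figure 1(a), is sketched below; the Python representation is purely illustrative and not a dataset format.

```python
# Toy encoding of the two-view KG of Figure 1(a) as (head, relation, tail)
# triples, matching the formulation above (illustrative only).
instance_triples = [            # (e_i, r_Ik, e_j) in E x R_I x E
    ("Soil", "composed_of", "Mineral"),
    ("Rabbit", "emit", "Carbon Dioxide"),   # Rabbit is a bridge entity
]
ontology_triples = [            # (c_i, r_Ok, c_j) in C x R_O x C
    ("Fungus", "is_a", "Eukaryote"),
]
entity_concept_triples = [      # (e_i, r_IOk, c_j) in E x R_IO x C
    ("Rabbit", "is_a", "Herbivore"),
]
```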

The objective of our research is to learn KG embeddings of nodes and relations in the KG, such that we seamlessly unify multiple curvature geometric spaces to better capture the contextual information and heterogeneous structures in the KGs. We evaluate the quality of the learned embeddings on the KG tasks of triple completion and entity typing, described in Section 4.

2.2. Non-Euclidean Geometric Spaces

In this section, we describe the various properties of non-Euclidean geometric spaces, which are curved spaces unlike the zero-curvature Euclidean space. The textbook (Petersen et al., 2006) provides more details. The Euclidean ($\mathbb{E}^d$), spherical ($\mathbb{S}^d$), and hyperbolic ($\mathbb{H}^d$) spaces are Riemannian manifolds ($\bm{M}^d$), such that each point $\bm{a} \in \bm{M}^d$ has a corresponding tangent space, $(T_{\bm{a}}\bm{M}^d)^d$, that approximates $\bm{M}^d$ around $\bm{a}$. Further, each Riemannian manifold $\bm{M}^d$ is associated with a Riemannian metric $dist$ that defines the geodesic distance between two points on the manifold, and a curvature $K$ of the space. In the spherical space, the curvature $K_S > 0$, suitable for capturing cyclical structures (Zhang et al., 2021), while in the hyperbolic space, the curvature $K_H < 0$, suitable for capturing hierarchical structures (Nickel and Kiela, 2017). Widely used models of the hyperbolic space include the Poincaré ball model (Cannon et al., 1997), the Lorentz model (Beltrami, 1897), and the Klein model (Beltrami, 1897). As these three models are both isometric and isomorphic to one another, WLOG we utilize the Poincaré ball model in our work.

Non-Euclidean Space Optimization

DGS utilizes Riemannian optimization for updating entity and relational embeddings because Euclidean optimization methods, such as SGD, provide update directions in the zero-curvature Euclidean space, which does not align with parameters in our model that must be updated in positive- or negative-curvature spaces. For parameter learning, DGS uses RSGD (Bonnabel, 2013), whose update function is given below, where $(T_{\bm{a}}M^d)^d$ denotes the tangent Euclidean space of $\bm{a} \in M^d$, $\nabla_R \in (T_{\bm{a}}M^d)^d$ denotes the Riemannian gradient of the loss function $L(\bm{a})$, $\mathcal{R}_{\bm{a}_t}$ denotes the retraction onto $M^d$, the non-Euclidean space, at $\bm{a}$, and $\eta_t$ denotes the learning rate at time $t$:

$\bm{a}_{t+1} = \mathcal{R}_{\bm{a}_t}\big(-\eta_t \nabla_R L(\bm{a}_t)\big)$

The retraction operator $\mathcal{R}(\cdot)$ involves mapping between spaces. For non-Euclidean spaces, the retraction is generally performed between the non-Euclidean space and the approximate tangent Euclidean space using logarithmic and exponential mapping functions as follows, where $\mathrm{log}_0(\bm{a})$ is a logarithmic map at center $\bm{0}$ from the hyperbolic space to the Euclidean tangent space, and $\mathrm{exp}_0(\bm{a})$ is an exponential map at center $\bm{0}$ from the Euclidean tangent space back to the hyperbolic space.

(1) $\mathrm{log}_0(\bm{a}) = \mathrm{tanh}^{-1}(i\cdot\lVert\bm{a}\rVert)\,\frac{\bm{a}}{i\cdot\lVert\bm{a}\rVert}$
(2) $\mathrm{exp}_0(\bm{a}) = \mathrm{tanh}(i\cdot\lVert\bm{a}\rVert)\,\frac{\bm{a}}{i\cdot\lVert\bm{a}\rVert}$
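To make the retraction concrete, below is a minimal NumPy sketch (illustrative, not the authors' code) of the origin-centered maps for the curvature $K_H = -1$ Poincaré ball, together with an RSGD step in the style used in Section 3.2: the Euclidean gradient is rescaled by the inverse metric and the stepped point is clipped back inside the unit ball as the retraction.

```python
import numpy as np

def log0(a, eps=1e-9):
    """Logarithmic map at the origin: Poincare ball -> tangent Euclidean space."""
    n = np.linalg.norm(a)
    return a.copy() if n < eps else np.arctanh(min(n, 1 - eps)) * a / n

def exp0(v, eps=1e-9):
    """Exponential map at the origin: tangent Euclidean space -> Poincare ball."""
    n = np.linalg.norm(v)
    return v.copy() if n < eps else np.tanh(n) * v / n

def poincare_rsgd_step(a, euclidean_grad, lr=1e-2, eps=1e-5):
    """One RSGD step: rescale the Euclidean gradient by the inverse metric
    ((1 - ||a||^2) / 2)^2, take the step, then retract into the open ball."""
    scale = ((1.0 - np.dot(a, a)) / 2.0) ** 2
    a_new = a - lr * scale * euclidean_grad
    n = np.linalg.norm(a_new)
    return a_new if n < 1.0 else a_new / n * (1.0 - eps)
```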

2.3. Two-View KG Models

In this section, we describe the models that have been utilized for two-view KGs, which consider the problem setting of modeling ontological and instance views. To address the limitations of these models, we propose DGS in Section 3.

JOIE

The JOIE (Hao et al., 2019) model is an embedding model that considers the problem of two-view KGs. However, JOIE models all triples in the same zero-curvature Euclidean space, omitting the hierarchical and cyclical structures of the KG. Further, there is no special consideration of representation for bridge nodes.

Models leveraging product of spaces

Models such as M2GNN (Wang et al., 2021), which extends (Gu et al., 2018) from social networks to the KG domain, and (Sun et al., 2022) help address the limitation of JOIE by utilizing the hyperbolic and spherical spaces for representing triples. However, they treat all triples in the KG as belonging to the same non-Euclidean product space, formed by a Riemannian product of the Euclidean, spherical, and hyperbolic spaces. Thus, the embedding space of hierarchical triples is not distinguished from the embedding space of cyclic triples, and there is again no special consideration of representation for bridge nodes.

3. DGS Architecture

This section describes our proposed dual-geometric space embedding model (DGS), which jointly embeds entities and concepts in a knowledge graph. DGS embeds different link types of the two-view KG in different non-Euclidean manifold spaces to capture the inherent heterogeneous structures of the KG, as described below. We also design general procedures, which are detailed in the Supplements, for efficiently converting arbitrary dd-dimensional embeddings between the polar and Cartesian coordinate representations, which are utilized by sub-models in DGS. Source code is available at https://github.com/roshnigiyer/dgs.

3.1. Modeling

The key questions in the modeling process are in how to: (1) design or adopt an appropriate embedding space for nodes, (2) define the representation of entity/concept and relation in that embedding space, and (3) define the KG embedding model and loss function of DGS. Our framework enables each of entities and concepts to be influenced by bridge nodes, and simultaneously bridge nodes to be informed by entities and concepts. In this way, DGS exhibits a unified framework to jointly model two-views of KGs.

DGS models nodes in the instance view as points in a spherical space $\mathbb{S}^d$ with fixed norm $w^S$, and nodes in the ontology view as points in a hyperbolic space $\mathbb{H}^d$ with a learnable norm $w_{\bm{h}_{c_i}^H}$ per concept. The bridge nodes lie in the intersection of the two, a submanifold intersection space called $\mathbb{B}^d$, shown as the dotted circle in Figure 1(b). $\mathbb{B}^d$ shares the same fixed norm $w^S$ as the spherical space. For modeling compatibility, we set the degrees of freedom of $\mathbb{S}^d$, $\mathbb{H}^d$, and $\mathbb{B}^d$ to $d-1$, $d-1$, and $d-2$ respectively. $\mathbb{B}^d$ has one degree of freedom fewer than $\mathbb{H}^d$ and $\mathbb{S}^d$ because it is a submanifold intersection space of both the spherical and hyperbolic spaces. The norm of $\mathbb{B}^d$ is $w^S$ because that is the intersection norm space of $\mathbb{S}^d$ and $\mathbb{H}^d$. The concept-specific norms $w_{\bm{h}_{c_i}^H}$ are learnable so that the hierarchy in the KG can be learned by the embedding. In practice, hierarchical root concepts move towards the center of the hyperbolic space, i.e., towards norm 0, as shown in Section 4.

Parameter optimization, detailed in Section 3.2, is performed using RSGD, described in Section 2.2, on hinge loss functions that utilize non-Euclidean geodesic distances in the spherical and hyperbolic spaces respectively. We construct the hinge loss functions such that positive triples are scored better than negative triples by at least a positive margin hyperparameter.

3.1.1. Modeling Instance View KG in Spherical Space

Representation of entities.

We propose to embed the entities from the instance view on the surface of the spherical ball from the spherical space, $\mathbb{S}^d$, in order to better capture the cyclic structures present in this KG view. Entities are represented as points $\bm{h}_{e_i}$ on the surface of the spherical ball in $\mathbb{S}^d$, as shown in Figure 1(b). Formally, $\mathbb{S}^d = \{\bm{h}_{e_i}^S \in \mathbb{R}^d \,\big|\, \lVert\bm{h}_{e_i}^S\rVert = w^S\}$, $w^S \in [0,1)$, is the $d$-dimensional $w^S$-norm ball, where $\lVert\cdot\rVert$ is the Euclidean norm.

For an entity $\bm{h}_{e_i}^S$, we propose to model the relation applied to the entity as a rotation operation, and therefore represent the relation $r_{Ik}$ as a vector of angles $\bm{\theta}_{r_{Ik}}^S \in [0,2\pi)^{d-1}$. Below, we present our proposed vector transformation procedure that models the relation as a rotation of the head entity, $f_{\mathrm{rot}}: \bm{h}_{e_i}^{S,new} = f_{\mathrm{rot}}(\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S)$, and prove that the operation is closed in the spherical space. $f_{\mathrm{rot}}(\cdot)$ eliminates the need to project representations to the tangent Euclidean space in order to perform standard transformations. To the best of our knowledge, we are the first to propose a closed vector transformation procedure that operates directly in the spherical space.

Spherical Space Vector Transformation Procedure: $f_{\mathrm{rot}}(\cdot)$.

This section describes our proposed vector transformation procedure, $f_{\mathrm{rot}}$, for the spherical space, which performs operations directly in the spherical space using its properties, such as positive curvature. This section also outlines the proof of the closedness property of the transformation procedure. For computational efficiency, we also extend the vector transformation procedure to a batch version, detailed in the Supplements. $f_{\mathrm{rot}}$ takes as input an entity embedding and a relation operator, which it uses to transform the entity embedding, $\bm{h}_{e_i}^{S,new} = f_{\mathrm{rot}}(\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S)$; a runnable sketch follows the three steps below.

  (1) Given $\bm{h}_{e_i}^S$ and $\bm{\theta}_{r_{Ik}}^S$, we convert $\bm{h}_{e_i}^S$ to the corresponding representation $\bm{\theta}_{e_i}^S$ in polar coordinates, with $\mathit{rad}$ denoting the radius, which has the value $w^S$. Refer to Section A.1 for details on the conversion procedure.

    (3) $\bm{h}_{e_i}^S: [x_{e_i,1}^S, x_{e_i,2}^S, \dots, x_{e_i,d}^S] \rightarrow \bm{\theta}_{e_i}^S: [\theta_{e_i,1}^S, \theta_{e_i,2}^S, \dots, \theta_{e_i,d-1}^S];\ \mathit{rad} = w^S$
    (4) $\bm{\theta}_{r_{Ik}}^S: [\theta_{r_{Ik},1}, \theta_{r_{Ik},2}, \dots, \theta_{r_{Ik},d-1}];\ \mathit{rad} = w^S$

  (2) Denote $\bm{z}[l]$ to be the $l$-th entry of $\bm{z}$ and apply the transformation:

    (5) $(\bm{\theta}_{e_i}^S + \bm{\theta}_{r_{Ik}}^S)[l] = (\bm{\theta}_{e_i}^S[l] + \bm{\theta}_{r_{Ik}}^S[l])\ \mathrm{mod}\ 2\pi;\ l \in [1, d-1]$
    (6) $\bm{\theta}_{e_i}^{S,new} = [(\theta_{e_i,1}^S + \theta_{r_{Ik},1}^S)\ \mathrm{mod}\ 2\pi, \dots, (\theta_{e_i,d-1}^S + \theta_{r_{Ik},d-1}^S)\ \mathrm{mod}\ 2\pi];\ \mathit{rad} = w^S$

  (3) Convert from polar coordinates back to Cartesian coordinates. Refer to Section A.2 for details on the conversion procedure.

    (7) $\bm{\theta}_{e_i}^{S,new} \rightarrow \bm{h}_{e_i}^{S,new}: [x_{e_i,1}^{S,new}, x_{e_i,2}^{S,new}, \dots, x_{e_i,d}^{S,new}]$
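Below is a runnable NumPy sketch of $f_{\mathrm{rot}}$ (not the authors' code). Standard hyperspherical-coordinate conversions stand in for the MCP/MPC procedures, which are detailed in the Supplements; the key property is that `polar_to_cartesian` returns a point of norm exactly $\mathit{rad}$ for any angle vector, which is what keeps the operation on the sphere.

```python
import numpy as np

def cartesian_to_polar(x):
    """Cartesian [x_1, ..., x_d] -> (d-1 angles in [0, 2*pi), radius)."""
    r = np.linalg.norm(x)
    d = len(x)
    theta = np.empty(d - 1)
    for k in range(d - 2):
        theta[k] = np.arctan2(np.linalg.norm(x[k + 1:]), x[k])
    theta[d - 2] = np.arctan2(x[-1], x[-2])
    return theta % (2 * np.pi), r

def polar_to_cartesian(theta, r):
    """(d-1 angles, radius) -> Cartesian point with norm exactly r."""
    d = len(theta) + 1
    x = np.full(d, float(r))
    for k in range(d - 1):
        x[k] *= np.cos(theta[k])
        x[k + 1:] *= np.sin(theta[k])
    return x

def f_rot(h_e, theta_rel):
    """Steps 1-3: convert to polar, add relation angles mod 2*pi, convert back."""
    theta_e, r = cartesian_to_polar(h_e)
    return polar_to_cartesian((theta_e + theta_rel) % (2 * np.pi), r)
```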
Theorem.

The vector transformation procedure, $f_{\mathrm{rot}}$, is closed in the spherical space.

Proof.

The proof is outlined by examining all three steps of $f_{\mathrm{rot}}$ from the transformation procedure, where $\bm{h}_{e_i}$ and $\bm{\theta}_{e_i}$ represent the same point $e_i$ in the embedding space under the isomorphic Cartesian and polar coordinate systems respectively.

  (1) The Cartesian coordinate representation is equivalent to the polar coordinate representation of the point $e_i$ under the MCP model, detailed in the Supplements. Further, the radius $\mathit{rad} = w^S$ of the polar representation embeddings lies in the spherical norm space $w^S$.

  (2) $\theta \in [0, 2\pi);\ \mathit{rad} = w^S$ defines polar representation embeddings in the spherical space, and $(\bm{\theta}_{e_i}^S + \bm{\theta}_{r_{Ik}}^S)[l] = (\bm{\theta}_{e_i}^S[l] + \bm{\theta}_{r_{Ik}}^S[l])\ \mathrm{mod}\ 2\pi \in [0, 2\pi)\ \forall l;\ \mathit{rad} = w^S$. Thus, both angular and radial coordinates are preserved.

  (3) The polar coordinate representation is equivalent to the Cartesian coordinate representation of the point $e_i$ under the MPC model, detailed in the Supplements.

Instance View Loss Function.

The instance-view model uses a hinge loss function that is minimized over all triples in the instance space $R_I$, with positive triples denoted $\mathrm{pt}_I = (\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S, \bm{h}_{e_j}^S)$ with corresponding score $\phi^S_{\mathrm{obs}}(\mathrm{pt}_I)$, negative triples denoted $\mathrm{nt}_I = (\bm{h}_{e_i}^{S'}, \bm{\theta}_{r_{Ik}}^S, \bm{h}_{e_j}^{S'})$ with corresponding score $\phi^S_{\mathrm{corr}}(\mathrm{nt}_I)$, and a positive margin hyperparameter $\gamma^{R_I} > 0$. Specifically, the instance loss function measures the distance between the predicted tail entity and the ground truth.

(8) $\phi^S_{\mathrm{obs}}(\mathrm{pt}_I) = \mathrm{dist}_S\big(f_{\mathrm{rot}}(\bm{h}_{e_i}^S, \bm{\theta}_{r_{Ik}}^S), \bm{h}_{e_j}^S\big)$
(9) $\phi^S_{\mathrm{corr}}(\mathrm{nt}_I) = \mathrm{dist}_S\big(f_{\mathrm{rot}}(\bm{h}_{e_i}^{S'}, \bm{\theta}_{r_{Ik}}^S), \bm{h}_{e_j}^{S'}\big)$
(10) $L_{\mathrm{inst}}^{R_I} = \frac{1}{|R_I|} \sum_{(\mathrm{pt}_I \in R_I) \land (\mathrm{nt}_I \notin R_I)} \mathrm{max}\big(0, \gamma^{R_I} + \phi^S_{\mathrm{obs}}(\mathrm{pt}_I) - \phi^S_{\mathrm{corr}}(\mathrm{nt}_I)\big)$

We calculate the spherical geodesic distance (Meng et al., 2019) between points $\bm{x}^S$ and $\bm{y}^S$ on the manifold as follows:

(11) $\mathrm{dist}_S(\bm{x}^S, \bm{y}^S) = \mathrm{arccos}\big((\bm{x}^S)^T \bm{y}^S\big)$

$R_I$ includes links between all entities, including both non-bridge entities and bridge entities. In this way, the representation of non-bridge entities is influenced by the bridge entities.
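Putting the pieces together, here is a sketch of the instance-view score and hinge loss, reusing `f_rot` from above; inputs to $\arccos$ are normalized so the distance is well defined for any norm $w^S$, and the one-to-one pairing of positive and negative triples is an illustrative assumption.

```python
import numpy as np

def dist_sphere(x, y):
    """Spherical geodesic distance of Eq. (11), with a normalized cosine."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def instance_loss(pos_triples, neg_triples, gamma=0.5):
    """Mean of max(0, gamma + phi_obs(pt) - phi_corr(nt)) as in Eq. (10)."""
    terms = []
    for (h, theta_r, t), (h_neg, _, t_neg) in zip(pos_triples, neg_triples):
        phi_obs = dist_sphere(f_rot(h, theta_r), t)            # Eq. (8)
        phi_corr = dist_sphere(f_rot(h_neg, theta_r), t_neg)   # Eq. (9)
        terms.append(max(0.0, gamma + phi_obs - phi_corr))
    return float(np.mean(terms))
```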

3.1.2. Modeling Ontological View KG in Hyperbolic Space

Representation of concepts.

We propose to embed concepts from the ontology view on the Poincaré disk from the hyperbolic space, $\mathbb{H}^d$, in order to better capture the hierarchical structures present in this KG view. Concepts are represented as points $\bm{h}_{c_i}$ inside the Poincaré disk in $\mathbb{H}^d$, as shown in Figure 1(b). Formally, $\mathbb{H}^d = \{\bm{h}_{c_i}^H \in \mathbb{R}^d \,\big|\, \lVert\bm{h}_{c_i}^H\rVert = w_{c_i}^H\}$, $w_{c_i}^H \in [0,1)$, is the $d$-dimensional $w_{c_i}^H$-norm ball, where $\lVert\cdot\rVert$ is the Euclidean norm. We assume the center of the disk is aligned with the center of the sphere, and for convenience set the last dimension $d$ to 0.

For concepts in the hyperbolic space $\mathbb{H}^d$, any hyperbolic space model for KGs can be applied in principle, which we denote as follows, with $r_{\mathcal{O}k} \in R_{\mathcal{O}}$ denoting a relation between two concepts:

(12) $f_{\mathrm{KGE}}(\mathrm{pt}_O) = f_{\mathrm{KGE}}(\bm{h}_{c_i}^H, r_{\mathcal{O}k}, \bm{h}_{c_j}^H)$

We illustrate MuRP (Balažević et al., 2019) as an example scoring function, which uses the hyperbolic geodesic distance and relies on Möbius addition to model the relation, where $\mathrm{exp}_0(\cdot)$ and $\mathrm{log}_0(\cdot)$ are defined in Section 2.2, $\bm{R} \in \mathbb{R}^{d \times d}$ is a learnable diagonal relation matrix representing the stretch transformation by relation $r_{\mathcal{O}k} \in R_{\mathcal{O}}$ with representation $\bm{h}_{r_{\mathcal{O}k}}^H$, and $b_{c_i}^H, b_{c_j}^H$ are scalar biases of concepts $c_i$ and $c_j$:

(13) $f_{\mathrm{MuRP}}(\mathrm{pt}_O) = f_{\mathrm{MuRP}}(\bm{h}_{c_i}^H, r_{\mathcal{O}k}, \bm{h}_{c_j}^H) = f_{\mathrm{MuRP}}(\bm{h}_{c_i}^H, \bm{h}_{r_{\mathcal{O}k}}^H, \bm{h}_{c_j}^H) = -\mathrm{dist}_H\big(\mathrm{exp}_0(\bm{R}\,\mathrm{log}_0(\bm{h}_{c_i}^H)), \bm{h}_{c_j}^H \oplus_H \bm{h}_{r_{\mathcal{O}k}}^H\big)^2 + b_{c_i}^H + b_{c_j}^H$
(14) $\bm{x}^H \oplus_H \bm{y}^H = \frac{(1 + 2\langle\bm{x}^H,\bm{y}^H\rangle + \lVert\bm{y}^H\rVert_2^2)\,\bm{x}^H + (1 - \lVert\bm{x}^H\rVert_2^2)\,\bm{y}^H}{1 + 2\langle\bm{x}^H,\bm{y}^H\rangle + \lVert\bm{x}^H\rVert_2^2 \lVert\bm{y}^H\rVert_2^2}$

We calculate the hyperbolic geodesic distance (Nickel and Kiela, 2017) between points $\bm{x}^H$ and $\bm{y}^H$ on the manifold as follows:

(15) $\mathrm{dist}_H(\bm{x}^H, \bm{y}^H) = \mathrm{arccosh}\Big(1 + 2\,\frac{\lVert\bm{x}^H - \bm{y}^H\rVert_2^2}{(1 - \lVert\bm{x}^H\rVert^2)(1 - \lVert\bm{y}^H\rVert^2)}\Big)$
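For reference, a NumPy sketch of Möbius addition (Eq. (14)), the hyperbolic geodesic distance (Eq. (15)), and the MuRP score (Eq. (13)), reusing the `exp0`/`log0` maps sketched in Section 2.2; `R_diag` holds the diagonal of the relation matrix $\bm{R}$.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball, Eq. (14)."""
    xy, nx2, ny2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + ny2) * x + (1 - nx2) * y) / (1 + 2 * xy + nx2 * ny2)

def dist_hyper(x, y):
    """Hyperbolic geodesic distance, Eq. (15)."""
    num = 2 * np.dot(x - y, x - y)
    den = (1 - np.dot(x, x)) * (1 - np.dot(y, y))
    return np.arccosh(1 + num / den)

def murp_score(h_ci, h_r, R_diag, h_cj, b_i, b_j):
    """Eq. (13): -dist_H(exp0(R log0(h_ci)), h_cj (+)_H h_r)^2 + b_i + b_j."""
    head = exp0(R_diag * log0(h_ci))  # diagonal stretch applied in tangent space
    tail = mobius_add(h_cj, h_r)
    return -dist_hyper(head, tail) ** 2 + b_i + b_j
```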
Ontological View Loss Function.

The ontology-view model uses a hinge loss function that is minimized over all links between concepts in the ontology space and all links between bridge nodes and concepts, i.e., $R_{\mathcal{O}} \cup R_{I\mathcal{O}}$, with positive triples denoted $\mathrm{pt}_{\mathcal{O}} = (\bm{h}_{c_i}^H, \bm{\theta}_{r_{\mathcal{O}k}}^H, \bm{h}_{c_j}^H) \cup (\bm{h}_{e_i}^H, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^H)$ with corresponding score $\phi^H_{\mathrm{obs}}(\mathrm{pt}_{\mathcal{O}})$, negative triples denoted $\mathrm{nt}_{\mathcal{O}} = (\bm{h}_{c_i}^{H'}, \bm{\theta}_{r_{\mathcal{O}k}}^H, \bm{h}_{c_j}^{H'}) \cup (\bm{h}_{e_i}^{H'}, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^{H'})$ with corresponding score $\phi^H_{\mathrm{corr}}(\mathrm{nt}_{\mathcal{O}})$, and a positive margin hyperparameter $\gamma^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}} > 0$:

(16) $\phi^H_{\mathrm{obs}}(\mathrm{pt}_{\mathcal{O}}) = f_{\mathrm{KGE}}(\mathrm{pt}_{\mathcal{O}})$
(17) $\phi^H_{\mathrm{corr}}(\mathrm{nt}_{\mathcal{O}}) = f_{\mathrm{KGE}}(\mathrm{nt}_{\mathcal{O}})$
(18) $L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}} = \frac{1}{|R_{\mathcal{O}} \cup R_{I\mathcal{O}}|} \sum_{(\mathrm{pt}_{\mathcal{O}} \in R_{\mathcal{O}} \cup R_{I\mathcal{O}}) \land (\mathrm{nt}_{\mathcal{O}} \notin R_{\mathcal{O}} \cup R_{I\mathcal{O}})} \mathrm{max}\big(0, \gamma^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}} + \phi^H_{\mathrm{corr}}(\mathrm{nt}_{\mathcal{O}}) - \phi^H_{\mathrm{obs}}(\mathrm{pt}_{\mathcal{O}})\big)$

$R_{\mathcal{O}} \cup R_{I\mathcal{O}}$ includes triples composed of concepts and bridge entities. In this way, the representation of concepts is influenced by the bridge entities, as described in Section 3.1.3.

3.1.3. Modeling the Intersection of the Two Spaces.

Representation of bridge entities.

Bridge nodes are entity nodes that bridge the communication between the instance-view and ontology-view components. Bridge nodes are connected to concepts in the graph, formed by ontology links $\bm{h}_{r_{I\mathcal{O}k}}$, but may also be connected to other entities, formed by instance links $\bm{\theta}_{r_{Ik}}$. As shown in Figure 1(a), nodes 5 and 6 are bridge nodes that are involved in both cyclic and hierarchical structures through their links to other entities as well as concepts. As such, we propose to embed bridge entities in the intersection space of the Poincaré disk and the surface of the spherical ball, in order to better capture the heterogeneous KG structures associated with these nodes. We refer to this intersection submanifold embedding space as the bridge space, $\mathbb{B}^d$, where the representation of these nodes is informed by both $\mathbb{S}^d$ and $\mathbb{H}^d$ and has one lower degree of freedom. The bridge space can therefore be derived as a sphere in general. Formally, $\mathbb{B}^d = \{\bm{h}_{e_i}^B \in \mathbb{R}^d \,\big|\, \lVert\bm{h}_{e_i}^B\rVert = w^S\}$, $w^S \in [0,1)$, is the $d$-dimensional $w^S$-norm ball, where $\lVert\cdot\rVert$ is the Euclidean norm, and the value of the last dimension $d$ is 0. Links associated with bridge nodes are $\bm{\theta}_{r_{Ik}}, \bm{h}_{r_{I\mathcal{O}k}} \in [0, 2\pi)^{d-1}$, and operations on bridge nodes, such as geodesic distances and loss functions, happen in either the spherical space or the hyperbolic space.

To ensure compatibility with the connected concept nodes, we map the bridge entities $\bm{h}_{e_i}^B$ to an embedding in the ontology space through a non-linear transformation function, $g_{\bm{h}_{r_{I\mathcal{O}k}}}(\bm{h}_{e_i}^B)$, where $\mathrm{AGG}(\cdot)$ denotes an averaging over all relations $k$ in $R_{I\mathcal{O}}$. The logarithmic and exponential mapping functions $\mathrm{log}_0(\cdot)$ and $\mathrm{exp}_0(\cdot)$ are described in Section 2.2.

(19) $g_{\bm{h}_{r_{I\mathcal{O}k}}}(\bm{h}_{e_i}^B) = \mathrm{AGG}\Big(\mathrm{proj}_B\big(\mathrm{tanh}(\bm{W}_{\bm{h}_{r_{I\mathcal{O}k}}} \otimes_H \bm{h}_{e_i}^B \oplus_H \bm{b}_{\bm{h}_{r_{I\mathcal{O}k}}})\big)\Big)$
(20) $\mathrm{proj}_B(\bm{z}) = \mathrm{proj}_S(\bm{z})$
(21) $\bm{M} \otimes_H \bm{h}_{e_i}^B = \mathrm{exp}_0\big(\bm{M}\,\mathrm{log}_0(\bm{h}_{e_i}^B)\big)$
(22) $\bm{h}_{c_i}^H \oplus_H \bm{h}_{c_j}^H = \frac{\big(1 + 2(\bm{h}_{c_i}^H)^T\bm{h}_{c_j}^H + \lVert\bm{h}_{c_j}^H\rVert_2^2\big)\bm{h}_{c_i}^H + (1 - \lVert\bm{h}_{c_i}^H\rVert_2^2)\bm{h}_{c_j}^H}{1 + 2(\bm{h}_{c_i}^H)^T\bm{h}_{c_j}^H + \lVert\bm{h}_{c_i}^H\rVert_2^2 \lVert\bm{h}_{c_j}^H\rVert_2^2}$

where both the weight matrix $\bm{W}_{\bm{h}_{r_{I\mathcal{O}k}}}$ and the bias $\bm{b}_{\bm{h}_{r_{I\mathcal{O}k}}}$ are specific to each relation $k$ in $R_{I\mathcal{O}}$ and reserved for the ontology link $\bm{h}_{r_{I\mathcal{O}k}}$.
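A sketch of this mapping is shown below, reusing `mobius_add` and the `exp0`/`log0` maps from the earlier sketches; reading $\mathrm{AGG}(\cdot)$ as a plain average over the relation-specific outputs is our interpretation of the text.

```python
import numpy as np

def hyper_matvec(M, h):
    """Eq. (21): M (x)_H h = exp0(M @ log0(h))."""
    return exp0(M @ log0(h))

def proj_sphere(z, w_s):
    """proj_S / proj_B (Eqs. (20) and (26)): rescale onto the norm-w^S sphere."""
    n = np.linalg.norm(z)
    return z if np.isclose(n, w_s) else w_s * z / n

def bridge_to_ontology(h_b, weights, biases, w_s):
    """Eq. (19): average the per-relation transformed bridge embeddings."""
    outs = [proj_sphere(np.tanh(mobius_add(hyper_matvec(W, h_b), b)), w_s)
            for W, b in zip(weights, biases)]
    return np.mean(outs, axis=0)
```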

Bridge Node Loss Function.

The bridge-node model uses a hinge loss function, a combination of the entity's ontology-specific loss, $\mathrm{ontoLoss}_{\bm{h}_{r_{I\mathcal{O}k}}}$, and instance-specific loss, $\mathrm{instLoss}_{\bm{\theta}_{r_{Ik}}}$, that is minimized over $R_I \cup R_{I\mathcal{O}}$, which contains all triples associated with the bridge nodes. Positive triples are denoted $\mathrm{pt}_{B,I} = (\bm{h}_{e_i}^B, \bm{\theta}_{r_{Ik}}, \bm{h}_{e_j}^B)$ and $\mathrm{pt}_{B,I\mathcal{O}} = (\bm{h}_{e_i}^B, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^B)$, and negative triples are denoted $\mathrm{nt}_{B,I} = (\bm{h}_{e_i}^{B'}, \bm{\theta}_{r_{Ik}}, \bm{h}_{e_j}^{B'})$ and $\mathrm{nt}_{B,I\mathcal{O}} = (\bm{h}_{e_i}^{B'}, \bm{h}_{r_{I\mathcal{O}k}}, \bm{h}_{c_j}^{B'})$, with the loss function defined as follows:

(23) $\mathrm{ontoLoss}_{\bm{h}_{r_{I\mathcal{O}k}}}(\mathrm{pt}_{B,I\mathcal{O}}, \mathrm{nt}_{B,I\mathcal{O}}) = \mathrm{max}\big(0, \gamma^{R_{I\mathcal{O}}} + \phi^H_{\mathrm{obs}}(\mathrm{pt}_{B,I\mathcal{O}}) - \phi^H_{\mathrm{corr}}(\mathrm{nt}_{B,I\mathcal{O}})\big)$
(24) $\mathrm{instLoss}_{\bm{\theta}_{r_{Ik}}}(\mathrm{pt}_{B,I}, \mathrm{nt}_{B,I}) = \mathrm{max}\big(0, \gamma^{R_I} + \phi^S_{\mathrm{obs}}(\mathrm{pt}_{B,I}) - \phi^S_{\mathrm{corr}}(\mathrm{nt}_{B,I})\big)$
(25) $L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}} = \frac{1}{|R_I \cup R_{I\mathcal{O}}|} \sum_{(\mathrm{pt}_{B,I}, \mathrm{pt}_{B,I\mathcal{O}} \in R_I \cup R_{I\mathcal{O}}) \land (\mathrm{nt}_{B,I}, \mathrm{nt}_{B,I\mathcal{O}} \notin R_I \cup R_{I\mathcal{O}})} \big(\mathrm{ontoLoss}_{\bm{h}_{r_{I\mathcal{O}k}}}(\mathrm{pt}_{B,I\mathcal{O}}, \mathrm{nt}_{B,I\mathcal{O}}) + \mathrm{instLoss}_{\bm{\theta}_{r_{Ik}}}(\mathrm{pt}_{B,I}, \mathrm{nt}_{B,I})\big)$

The combined loss function above enables bridge nodes to learn from both spaces of the intersection of $\mathbb{S}^d$ and $\mathbb{H}^d$.

3.2. Training

This section details the training framework of DGS, for representing two-view KGs, described in Section 3.1. We describe, for each epoch, the training of each node in the two-view KG which includes (1) the forward propagation step, (2) the loss function, and (3) the backward propagation step to optimize parameters. Algorithm 1 provides a summary of the framework.

Embedding Initialization.

We randomly initialize all embeddings of $e$ and $r$ in polar coordinates: $\bm{\theta}_{e_i} \in \mathbb{R}^{d-1} \leftarrow \mathrm{Unif}([0,2\pi))^{d-1}$ and $\bm{\theta}_{r_{Ik}}, \bm{\theta}_{r_{\mathcal{O}k}} \in \mathbb{R}^{d-1} \leftarrow \mathrm{Unif}([0,2\pi))^{d-1}$, then convert entity embeddings to their corresponding Cartesian representation: $\bm{h}_{e_i} = \mathrm{MPC}(\bm{\theta}_{e_i})$. The link $\bm{h}_{r_{I\mathcal{O}k}}$ is a randomly sampled position on the Poincaré disk. Refer to Section A in the Supplements for details about the conversion procedures, describing both polar-to-Cartesian coordinate conversion (MPC) and Cartesian-to-polar coordinate conversion (MCP). For the instance-view model, we also choose a value for the norm $w^S$, sampled from a uniform distribution, $w^S \in [0,1) \leftarrow \mathrm{Unif}([0,1))$, shared by all entities; for the ontology-view model, we choose a value for the norm $w_{\bm{h}_{c_i}}^H$, assigned uniformly at random per concept. We set the curvature values of the spherical and hyperbolic spaces to $K_S = 1$ and $K_H = -1$ respectively. We leave the non-trivial problem of learning optimal curvatures as future work.

Training Procedure for Instance View KG

Parameter optimization is performed using Riemannian stochastic gradient descent (RSGD) for the spherical space, as follows, for entity embedding and relational embedding updates respectively. To ensure that the updated entity embedding remains in the norm-$w^S$ space, we perform a rescaling operation, $\mathrm{proj}_S$, to project out-of-boundary embeddings back to the surface of the $w^S$-ball.

(26) $\mathrm{proj}_S(\bm{z}) = \begin{cases} w^S \cdot \frac{\bm{z}}{\lVert\bm{z}\rVert} & \text{if } \lVert\bm{z}\rVert \neq w^S \\ \bm{z} & \text{otherwise} \end{cases}$
(27) $r(\bm{h}_{e_i,t}^S, L_{\mathrm{inst}}^{R_I}) = \Big(1 + \frac{\bm{h}_{e_i,t}^{S^T} \nabla L_{\mathrm{inst}}^{R_I}(\bm{h}_{e_i,t}^S)}{\lVert \nabla L_{\mathrm{inst}}^{R_I}(\bm{h}_{e_i,t}^S) \rVert}\Big)\big(I - \bm{h}_{e_i,t}^S \bm{h}_{e_i,t}^{S^T}\big)$
(28) $\bm{h}_{e_i,t+1}^S \leftarrow \mathrm{proj}_S\big(-\eta_t \cdot r(\bm{h}_{e_i,t}^S, L_{\mathrm{inst}}^{R_I}) \nabla L_{\mathrm{inst}}^{R_I}(\bm{h}_{e_i,t}^S)\big)$
(29) $\bm{\theta}_{r_{Ik},t+1}^S \leftarrow -\eta_t \cdot r(\bm{\theta}_{r_{Ik},t}^S, L_{\mathrm{inst}}^{R_I}) \nabla L_{\mathrm{inst}}^{R_I}(\bm{\theta}_{r_{Ik},t}^S)$
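A sketch of this spherical update is given below (with `proj_sphere` as above). We read Eq. (28) as retracting the stepped point $\bm{h}_t - \eta_t\, r(\cdot)\,\nabla L$ onto the $w^S$-sphere, which is the standard Riemannian step; that reading is an assumption on our part.

```python
import numpy as np

def riemannian_scaling(h, grad, eps=1e-9):
    """Eq. (27): (1 + h^T g / ||g||) (I - h h^T), with g the Euclidean gradient."""
    unit_g = grad / (np.linalg.norm(grad) + eps)
    return (1.0 + h @ unit_g) * (np.eye(len(h)) - np.outer(h, h))

def spherical_entity_step(h, grad, w_s, lr=1e-3):
    """Eqs. (26)-(28): scaled tangent step followed by a rescaling retraction."""
    return proj_sphere(h - lr * riemannian_scaling(h, grad) @ grad, w_s)
```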
Training procedure for concepts.

Parameter optimization is performed using RSGD for the hyperbolic space, as follows, for concept embedding and relational embedding updates respectively, where the corresponding concept norm $w^H_{c_i}$ is also learned through RSGD by updating the embeddings $\bm{h}_{c_i}^H$. The diagonal relation matrix $\bm{R}$ is also updated through RSGD, and we update the scalar biases $b_{c_i,t+1}^H, b_{c_j,t+1}^H$ through stochastic gradient descent.

(30) $\bm{h}_{c_i,t+1}^H \leftarrow \bm{h}_{c_i,t}^H - \eta_t \Big(\frac{1 - \lVert\bm{h}_{c_i,t}^H\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(\bm{h}_{c_i,t}^H)$
(31) $\bm{h}_{r_{\mathcal{O}k},t+1}^H \leftarrow \bm{h}_{r_{\mathcal{O}k},t}^H - \eta_t \Big(\frac{1 - \lVert\bm{h}_{r_{\mathcal{O}k},t}^H\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(\bm{h}_{r_{\mathcal{O}k},t}^H)$
(32) $\bm{R}_{t+1} \leftarrow \bm{R}_t - \eta_t \Big(\frac{1 - \lVert\bm{R}_t\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(\bm{R}_t)$
(33) $b_{c_i,t+1}^H \leftarrow b_{c_i,t}^H - \eta_t \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(b_{c_i,t}^H)$
(34) $b_{c_j,t+1}^H \leftarrow b_{c_j,t}^H - \eta_t \nabla L_{\mathrm{onto}}^{R_{\mathcal{O}} \cup R_{I\mathcal{O}}}(b_{c_j,t}^H)$

After the epoch's update of concept embeddings, we once again reset the value of the last dimension $d$ to 0 to satisfy the original framework constraint of the Poincaré disk. We also enforce that the angular dimensions of relational embeddings lie in $[0, 2\pi)$.

Training Procedure for Bridge Nodes

Parameter optimization is performed using RSGD for the bridge space, as follows, for bridge entity embedding and relational embedding updates respectively. Ontology optimization, $\mathrm{ontoOpt}(\cdot)$, and instance optimization, $\mathrm{instOpt}(\cdot)$, are performed alternately in batches for each of the two embedding types, according to the type of link that the embedding is associated with, i.e., $R_I$ or $R_{I\mathcal{O}}$. This enables the representation of bridge nodes to be informed by both $\mathbb{S}^d$ and $\mathbb{H}^d$.

(35) $\mathrm{ontoOpt}(\bm{z}) = \bm{z} - \eta_t \Big(\frac{1 - \lVert\bm{z}\rVert^2}{2}\Big)^2 \nabla L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}}(\bm{z})$
(36) $\mathrm{instOpt}(\bm{z}) = -\eta_t \cdot r(\bm{z}, L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}}) \nabla L_{\mathrm{bridge}}^{R_I \cup R_{I\mathcal{O}}}(\bm{z})$
(37) $\bm{h}_{e_i,t+1}^B \leftarrow \mathrm{proj}_B\big(\mathrm{ontoOpt}(\bm{h}_{e_i,t}^B)\big) \,||\, \mathrm{proj}_B\big(\mathrm{instOpt}(\bm{h}_{e_i,t}^B)\big)$
(38) $\bm{h}_{r_{I\mathcal{O}k},t+1} \leftarrow \mathrm{ontoOpt}(\bm{h}_{r_{I\mathcal{O}k},t})$
(39) $\bm{\theta}_{r_{Ik},t+1} \leftarrow \mathrm{instOpt}(\bm{\theta}_{r_{Ik},t})$

After each epoch's update of the instance optimization for $\bm{h}_{e_i}^B$, the value of the last dimension $d$ is reset to 0 and rescaled with $\mathrm{proj}_B(\bm{z})$, defined to be the same as $\mathrm{proj}_S(\bm{z})$, to ensure the intersection space constraint of the bridge entity model. Angular dimensions are also enforced to lie in $[0, 2\pi)$.

Input: set of entities $e$; set of relations $r$;
    instance-view entity-entity triples with links $\bm{\theta}_{r_{Ik}}^S$;
    ontology-view concept-concept triples with links $\bm{\theta}_{r_{\mathcal{O}k}}^H$;
    bridge entity-concept triples with ontology links $\bm{h}_{r_{I\mathcal{O}k}}$;
    bridge entity-entity triples with instance links $\bm{\theta}_{r_{Ik}}$
Output: updated embeddings $\bm{\theta}_{r_{Ik},\mathrm{EP}}^S$, $\bm{\theta}_{r_{\mathcal{O}k},\mathrm{EP}}^H$, ontology $\bm{h}_{r_{I\mathcal{O}k},\mathrm{EP}}$, and instance $\bm{\theta}_{r_{Ik},\mathrm{EP}}$ at the final epoch $\mathrm{EP}$

1  for epoch in (1, 2, ..., EP) do
2      Step 1*: sample links from $\bm{\theta}_{r_{Ik}}^S$;
3          perform the spherical update of entities (Section 3.1.1)
4      Step 2*: sample links from $\bm{\theta}_{r_{\mathcal{O}k}}^H$;
5          perform the hyperbolic update of concepts (Section 3.1.2)
6      Step 3*: sample links from ontology $\bm{h}_{r_{I\mathcal{O}k}}$ and instance $\bm{\theta}_{r_{Ik}}$;
7          alternately perform the spherical and hyperbolic updates of bridge nodes (Section 3.1.3)
8  end for
9  return $\bm{\theta}_{r_{Ik},\mathrm{EP}}^S$, $\bm{\theta}_{r_{\mathcal{O}k},\mathrm{EP}}^H$, ontology $\bm{h}_{r_{I\mathcal{O}k},\mathrm{EP}}$, and instance $\bm{\theta}_{r_{Ik},\mathrm{EP}}$

Algorithm 1. Overall training procedure of DGS. "*" indicates that the three steps can be performed in any order.

4. Experiments

In this section, we evaluate DGS on two KG tasks: the triple completion task on each of the instance and ontology views of the KG, and the entity typing task to test the quality of the learned bridge space in communicating between the two views of the KG. We also provide a case study on entity typing for different variants of DGS that embed in other combinations of geometric spaces. Further, we provide a visualization of embeddings before and after the learning process, projected onto the 3-D geometric space of DGS.

4.1. Datasets

We utilize the datasets of YAGO26K-906 and DB111K-174 since they have the two-view KG setting unlike other datasets for KG embeddings that consider solely an instance-view (Toutanova and Chen, 2015) or ontology-view (Dettmers et al., 2018). YAGO26K-906 and DB111K-174 are prepared from (Hao et al., 2019), which are extracted from YAGO (Suchanek et al., 2007) and DBpedia (Lehmann et al., 2015) respectively. Refer to (Hao et al., 2019) for the detailed construction process. Table 1 provides dataset statistics and Table 2 provides data splits for both datasets on both KG tasks. It can be observed that the instance-view contains many more triples than the ontology-view and that DB111K-174 contains a larger proportion of entity-concept triples (10.35%) compared to YAGO26K-906 (2.43%).

Table 1. Dataset statistics for entities $E$, concepts $C$, and their relations. $E$-$E$ denotes entity-entity links, $C$-$C$ denotes concept-concept links, and $E$-$C$ denotes entity-concept links.

Dataset | #$E$ | #$C$ | #$E$-$E$: $R_I$ | #$C$-$C$: $R_O$ | #$E$-$C$: $R_{IO}$
YAGO26K-906 | 26,078 | 906 | 390,738 | 8,962 | 9,962
DB111K-174 | 111,762 | 174 | 863,643 | 763 | 99,748
Table 2. Data splits for triple completion and entity typing. We provide splits for all KG triples in $R_I/R_O/R_{IO}$ for train (Tr), validation (V), and test (Ts).

YAGO26K-906
Task | Tr ($R_I/R_O/R_{IO}$) | V ($R_I/R_O/R_{IO}$) | Ts ($R_I/R_O/R_{IO}$)
Triple Completion | 332,128/7,618/8,691 | 19,536/448/485 | 39,074/896/1,019
Entity Typing | 211,346/4,876/5,379 | 23,549/543/598 | 156,311/3,592/3,985

DB111K-174
Task | Tr ($R_I/R_O/R_{IO}$) | V ($R_I/R_O/R_{IO}$) | Ts ($R_I/R_O/R_{IO}$)
Triple Completion | 734,096/648/84,864 | 43,182/38/5,018 | 86,365/77/10,131
Entity Typing | 466,538/462/53,863 | 51,828/46/5,985 | 345,504/337/39,900

4.2. Models

4.2.1. Baselines

We compare DGS to state-of-the-art neural network embedding models, which include Euclidean, non-Euclidean, and product space KGE models, as well as GNN-based models for KG completion and entity typing.

  • TransE (Bordes et al., 2013), one of the first KGE models, which simply captures the relationship between entities as a translation.

  • DistMult (Yang et al., 2015), a matrix factorization KGE model, modeling the relationship between entities via multiplication.

  • ComplEx (Trouillon et al., 2016), a KGE model that extends DistMult into the complex number field.

  • RotatE (Sun et al., 2019), a recent KGE model, based on the rotation assumption where a relation is a rotation from the subject to the object in the complex vector space.

  • JOIE (Hao et al., 2019) and M2GNN (Wang et al., 2021): refer to Section 2.3, where these models are discussed.

  • HyperKG (Kolyvakis et al., 2019), a KGE model extending translational KGE methods to the Poincaré-ball model of hyperbolic geometry.

  • HAKE (Zhang et al., 2020), which extends RotatE by combining modulus scaling with rotation in its relations.

  • ConE (Bai et al., 2021), a KGE model embedding entities into hyperbolic cones and relations as transformations between cones.

  • RefH/RotH/AttH (Chami et al., 2020), which are hyperbolic KGE models that combine hyperbolic spaces using hyperbolic attention, where RefH and RotH are variants of AttH using only reflections and rotations respectively.

  • HGCN (Chami et al., 2019), a hyperbolic GCN model utilizing Riemannian geometry and the hyperboloid model.

  • HyperKA (Sun et al., 2020), which extends GCNs from the Euclidean to hyperbolic space using the Poincaré ball model.

4.2.2. DGS Variants

We describe variant models of DGS below.

  • DGS-RO-FC, which is DGS with the Riemannian operator (RO) used for vector transformation instead of our proposed closed spherical space vector transformation procedure of Section 3.1.1, and with a fixed center (FC) of the spherical ball at 0, which is the same center as the Poincaré disk. For the single geometric spaces $\mathbb{S}^d$ and $\mathbb{H}^d$, for non-bridge nodes and concepts, the Riemannian operator is performed as a retraction operation to the tangent Euclidean spaces. However, we extend the Riemannian operator when performing retraction for the intersection ring of the bridge space, which is formed by intersecting the spherical ball's surface and the Poincaré disk. This is described in the Supplements.

  • DGS-SO-FC, which is DGS with the proposed closed spherical space operator (SO), and with the FC of the spherical ball at 0, which is the same center as the Poincaré disk.

  • DGS (ours), which is DGS with the proposed closed SO and with a learnable center (LC) of the spherical ball, which, for simplicity of constructing the intersection space for bridge nodes, is restricted to the last dimension to allow only a vertical shift. Note that we do not also need to make the center of the Poincaré disk learnable, as this shift is already introduced by making one of the centers learnable. For learning the spherical center $\omega$ in the model, we follow the same training procedure as Section 3.2, but for the non-bridge entities and bridge entities we perform the operations by temporarily shifting to center 0 (i.e., a $-\omega$ shift), then shifting back to the new center $\omega$ (i.e., a $+\omega$ shift) after the updates are performed.

4.2.3. DGS Ablation Models

We study different ablation models of DGS, formed by utilizing different combinations of manifold spaces for each type of link ($R_I/R_O$) in the two-view KG: the spherical space $\mathbb{S}^d$, the hyperbolic space using the Poincaré model $\mathbb{H}^d$, or the Euclidean space $\mathbb{E}^d$. These include: DGS ($\mathbb{S}^d/\mathbb{S}^d$), DGS ($\mathbb{H}^d/\mathbb{H}^d$), DGS ($\mathbb{E}^d/\mathbb{H}^d$), and DGS ($\mathbb{S}^d/\mathbb{E}^d$). Since $R_{IO}$ is always at the intersection of the two spaces of $R_I$ and $R_O$, we do not need to specify its geometric space separately.

4.3. Evaluation

Table 3. Results of KG triple completion. $\bm{M}$ denotes the $d$-dimensional manifold space; $R_I$ are entity links and $R_O$ are concept links. For each group of models, the best results are bold-faced. The overall best results on each dataset are underscored. Each cell lists MRR/H@1/H@10, first for YAGO26K-906 ($R_I$ completion, then $R_O$ completion), then for DB111K-174 (same order).

$\bm{M}$ | Model | YAGO26K-906 $R_I$ | YAGO26K-906 $R_O$ | DB111K-174 $R_I$ | DB111K-174 $R_O$

Non-GNN-based KGE models:
$\mathbb{E}^d$ | TransE | 0.187/13.73/35.05 | 0.189/14.72/24.36 | 0.318/22.70/48.12 | 0.539/47.90/61.84
$\mathbb{E}^d$ | DistMult | 0.288/24.06/31.24 | 0.156/14.32/16.54 | 0.280/27.24/29.70 | 0.501/45.52/64.73
$\mathbb{C}^d$ | ComplEx | 0.291/24.85/37.28 | 0.180/14.83/22.97 | 0.321/27.39/46.63 | 0.549/47.80/62.23
$\mathbb{C}^d$ | RotatE | 0.302/25.31/42.17 | 0.228/16.35/27.28 | 0.356/29.31/54.60 | 0.557/49.16/68.19
$\mathbb{E}^d$ | JOIE | 0.316/24.62/51.85 | 0.289/18.66/39.13 | 0.479/35.21/72.38 | 0.602/52.48/79.71
$\mathbb{H}^d$ | HyperKG | 0.215/18.35/36.02 | 0.174/14.50/23.26 | 0.302/23.31/46.72 | 0.542/47.59/62.11
$\mathbb{H}^d$ | HAKE | 0.293/23.04/40.19 | 0.301/19.27/41.09 | 0.391/31.10/60.46 | 0.638/55.69/81.07
$\mathbb{H}^d$ | ConE | 0.299/23.56/41.23 | 0.313/20.05/41.80 | 0.422/33.69/68.12 | 0.639/55.89/81.45

GNN-based models:
$\mathbb{H}^d$ | RefH | 0.282/23.19/40.52 | 0.298/19.70/41.26 | 0.407/30.06/66.93 | 0.622/55.35/81.09
$\mathbb{H}^d$ | RotH | 0.295/23.50/41.03 | 0.308/19.97/41.78 | 0.418/30.18/67.05 | 0.639/55.82/81.44
$\mathbb{H}^d$ | AttH | 0.298/23.43/41.20 | 0.310/19.99/41.53 | 0.419/30.10/66.58 | 0.629/55.37/81.39
$\mathbb{H}^d$ | HGCN | 0.307/23.04/40.25 | 0.302/19.38/40.49 | 0.396/31.54/61.78 | 0.638/55.81/81.60
$\mathbb{H}^d$ | HyperKA | 0.320/26.71/52.09 | 0.305/18.83/40.28 | 0.486/35.79/72.33 | 0.613/53.36/80.59
$\mathbb{P}^d$ | M2GNN | 0.347/29.63/54.28 | 0.341/23.70/42.19 | 0.506/36.52/73.11 | 0.644/56.82/83.01

Ablation variants, DGS ($R_I/R_O$):
DGS ($\mathbb{S}^d/\mathbb{S}^d$) | 0.338/27.15/53.20 | 0.318/20.36/41.02 | 0.491/34.58/71.40 | 0.606/53.29/80.17
DGS ($\mathbb{H}^d/\mathbb{H}^d$) | 0.314/25.11/52.02 | 0.358/24.61/43.28 | 0.502/35.79/73.61 | 0.663/57.59/84.16
DGS ($\mathbb{E}^d/\mathbb{H}^d$) | 0.327/25.32/52.89 | 0.343/23.95/41.62 | 0.498/35.11/72.37 | 0.640/56.17/82.68
DGS ($\mathbb{S}^d/\mathbb{E}^d$) | 0.322/24.91/52.36 | 0.297/19.43/40.61 | 0.484/33.29/73.54 | 0.619/53.72/80.51

DGS variants:
DGS-RO-FC | 0.352/29.79/55.21 | 0.364/25.04/43.27 | 0.518/37.65/73.97 | 0.681/59.23/84.16
DGS-SO-FC | 0.364/30.15/55.93 | 0.369/25.81/44.18 | 0.536/38.29/74.28 | 0.687/59.26/84.82
DGS (ours) | 0.366/30.15/56.06 | 0.372/25.88/44.38 | 0.536/38.31/74.85 | 0.690/59.88/84.82

In this section, we detail our evaluation on the tasks of KG triple completion and entity typing. The goal of triple completion is to reconstruct the missing relation facts in a KG structure. Specifically, we test constructing a missing target node from either the instance or ontology view, i.e., queries $(e_i, r_k, ?e_j)$ and $(c_i, r_k, ?c_j)$, such that each evaluated model is trained on the entire two-view KG. The goal of entity typing is to predict the concepts that correspond to the entities, i.e., queries $(e_i, r_k, ?c_j)$.

Using plausibility scores to rank each test candidate set, we report, for each task, the evaluation metrics of mean reciprocal rank ($MRR$) and $Hits@k$, e.g., $Hits@1$, $Hits@3$, $Hits@10$. Table 2 reports our data splits for each task. For triple completion, the plausibility score is chosen to be the embedding distance of the source node, modeled under the relation, to the missing target node; for entity typing, it is the embedding distance from the entity's representation in the concept space to the concept. (Hao et al., 2019) provides more details on the evaluation procedure. For evaluation consistency, for both tasks, model training hyperparameters are chosen from dimensionality $d \in$ {50, 100, 200, 300} for all triples, learning rate $\eta \in$ {5e-4, 1e-3, 1e-2, 1e-1}, and margins $\gamma \in$ {0.5, 1}. Further, different batch sizes and epochs are used according to the type and size of the graphs.
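For concreteness, a small sketch of how these ranking metrics are computed from the 1-based rank assigned to each ground-truth candidate:

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks: 1-based rank of the true target for each test query."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: 100.0 * float(np.mean(ranks <= k)) for k in ks}  # percentages
    return mrr, hits

# Example: three queries whose true targets rank 1st, 4th, and 12th.
print(mrr_and_hits([1, 4, 12]))  # MRR ~0.444; Hits@1 = 33.3, Hits@10 = 66.7
```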

KG Triple Completion Results.

Results are reported in Table 3. DGS outperforms all of the baseline models on both datasets. DGS achieves an average performance gain over all baselines of 32.36% on $MRR$, 27.59% on $Hit@1$, and 29.17% on $Hit@10$ for instance-view completion across both YAGO26K-906 and DB111K-174, and of 23.43% on $MRR$, 28.41% on $Hit@1$, and 18.11% on $Hit@10$ for ontology-view completion across both datasets.

It can be observed that, in both the instance and ontology views on both datasets, the hyperbolic-based KGE models outperform their Euclidean and complex space counterparts. Further, hyperbolic KGE models perform better on the ontology view than the instance view, likely due to the prevalence of hierarchy in the ontology. It is also seen that using multiple geometric spaces is more effective than using a single geometric space. For GNN-based models, M2GNN, which uses a product space $\mathbb{P}^d$ combining the Euclidean, spherical, and hyperbolic spaces, outperforms the models using only one of the spaces. Compared to M2GNN, the most competitive baseline model in most cases, DGS shows a significant improvement of 3.25% on $MRR$, 5.15% on $Hit@1$, and 2.73% on $Hit@10$ on average over both YAGO26K-906 and DB111K-174. This is likely because hierarchy in the KG is better modeled through the hyperbolic space than with influence from the spherical or Euclidean space, and cyclic links are better modeled through the spherical space than with influence from the hyperbolic or Euclidean space. For bridge entities involved in both hierarchical and cyclic links, we see it is beneficial to model them in an intersection space to better capture both of these properties.

Table 4. Results of entity typing. For each group of models, the best results are bold-faced. The overall best results on each dataset are underscored. Each cell lists MRR/Acc./Hit@3.

Model | YAGO26K-906 | DB111K-174
TransE | 0.144/7.32/35.26 | 0.503/43.67/60.78
DistMult | 0.411/36.07/55.32 | 0.551/49.83/68.01
ComplEx | 0.519/43.08/59.28 | 0.583/55.07/70.17
RotatE | 0.673/58.24/73.95 | 0.721/61.48/75.67
JOIE | 0.899/85.72/96.02 | 0.859/75.58/96.02
HyperKG | 0.627/55.39/65.64 | 0.736/61.20/74.29
HAKE | 0.905/87.42/96.37 | 0.866/76.04/96.22
ConE | 0.912/87.51/96.39 | 0.869/76.85/96.38
RefH | 0.907/87.49/96.37 | 0.867/76.26/96.31
RotH | 0.909/87.49/96.39 | 0.868/76.55/96.36
AttH | 0.910/87.50/96.38 | 0.868/76.50/96.33
HGCN | 0.905/87.44/96.37 | 0.867/76.11/96.27
HyperKA | 0.918/87.76/96.45 | 0.871/76.65/96.50
M2GNN | 0.922/88.16/97.01 | 0.880/77.58/96.94
DGS ($\mathbb{S}^d/\mathbb{S}^d$) | 0.919/87.94/96.81 | 0.875/77.03/96.78
DGS ($\mathbb{H}^d/\mathbb{H}^d$) | 0.924/88.12/97.01 | 0.887/77.53/96.95
DGS ($\mathbb{E}^d/\mathbb{H}^d$) | 0.922/88.07/96.91 | 0.883/77.37/96.58
DGS ($\mathbb{S}^d/\mathbb{E}^d$) | 0.917/87.68/96.41 | 0.866/76.33/96.48
DGS-RO-FC | 0.930/88.47/97.32 | 0.891/77.81/97.04
DGS-SO-FC | 0.938/89.02/98.11 | 0.895/77.92/97.40
DGS (ours) | 0.939/89.07/98.18 | 0.895/77.94/97.47
Entity Typing Results.

Results are reported in Table 4. Consistent with the KG completion results, DGS variants outperform the baseline models on both datasets for entity typing. DGS achieves an average performance gain over all baselines of 24.49% on MRR, 26.39% on Acc, and 18.78% on Hit@3 for YAGO26K-906, and of 14.86% on MRR, 13.74% on Acc, and 12.15% on Hit@3 for DB111K-174. Compared to the most competitive GNN-based model, M2GNN, DGS has nearly a 2% gain in MRR on each of YAGO26K-906 and DB111K-174.

Ablation Studies.

Tables 3 and 4 report our results for ablation studies on the models in the category Ablation variants. Using the spherical space for $R_I$ links learns a better representation than its hyperbolic or Euclidean counterparts, indicating the prevalence of cyclic relations between entities. Utilizing the hyperbolic space for $R_O$ and $R_{IO}$ links learns a better representation than its spherical or Euclidean counterparts, indicating the prevalence of hierarchical relations among concepts and between entities and concepts. Interestingly, a fully hyperbolic-space model in general achieves better performance than its fully spherical counterpart, indicating that YAGO26K-906 and DB111K-174 may contain relatively more hierarchical links than cyclic links. Further, learning a better representation in one view benefits the other view more when there are more $R_{IO}$ bridge links. For example, since DB111K-174 contains significantly more $R_{IO}$ links than YAGO26K-906, as shown in Table 1, DGS ($\mathbb{H}^d/\mathbb{H}^d/\mathbb{H}^d$) performs better against baselines on the instance view of DB111K-174 than on the instance view of YAGO26K-906.

Visualizations.

We project the learned embeddings into 3-D space, and plot seven nodes with their relations in Figure 2. Concepts that are higher in the ontological hierarchy such as animal and person are closer to the origin compared to lower-level concepts such as leader, and entities tend to be farther away from the origin.

Figure 2. Example of embeddings learned by DGS. Solid lines connect entities and concepts as in the original dataset. Dashed lines connect every node to the origin to indicate closeness to origin.
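A hypothetical plotting sketch along these lines appears below. The PCA projection, node names, and random embeddings are assumptions for illustration only; they stand in for the learned DGS embeddings:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical learned embeddings (node name -> vector); in practice these
# would be the embeddings produced by DGS.
rng = np.random.default_rng(0)
emb = {n: rng.normal(size=50) for n in ["animal", "person", "leader", "entity_5"]}

names = list(emb.keys())
X3 = PCA(n_components=3).fit_transform(np.stack([emb[n] for n in names]))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for name, p in zip(names, X3):
    ax.scatter(*p)
    ax.text(*p, name)
    # Dashed line to the origin visualizes each node's closeness to the origin.
    ax.plot([0, p[0]], [0, p[1]], [0, p[2]], linestyle="--")
plt.show()
```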

5. Conclusions

We are among the first to explore the use of different embedding spaces for different views of a knowledge graph. Entities in the instance view often have cyclical connections and are therefore modeled via spherical embeddings, whereas concepts and bridge entities in the ontology view tend to form a hierarchy and are therefore modeled in hyperbolic space. We propose the notion of a bridge space to seamlessly model the intersection of the two spaces in which bridge entities reside. In addition, we propose a set of closed spherical space operations that eliminates the need to project embeddings to the tangent Euclidean space. The resulting model, DGS, is a unified framework that significantly outperforms all baselines, showing the effectiveness of capturing different structures in the knowledge graph using embedding spaces of different curvatures and properties.

Future directions include supporting geometric spaces of learnable curvature (to better capture the knowledge graph structure) and allowing a learnable offset between the origins of the two geometric spaces (to better accommodate uneven entity distributions and/or unbalanced ontology structures). We also aim to extend our dual-space model to a multi-space geometric embedding model.

6. Acknowledgements

This work was partially supported by NSF III-1705169, 1829071, 1937599, 2106859, 2119643; NIH R35-HL135772; NIBIB R01-EB027650; Amazon Research Awards; Cisco Research Grant USA000EP280889; NEC Gifts; Picsart Gifts; and Snapchat Gifts. The authors would also like to thank anonymous reviewers for their constructive feedback.

References

  • Bai et al. (2021) Yushi Bai, Rex Ying, Hongyu Ren, and Jure Leskovec. 2021. Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones. NeurIPS (2021).
  • Balažević et al. (2019) Ivana Balažević, Carl Allen, and Timothy Hospedales. 2019. Multi-relational Poincaré Graph Embeddings. NeurIPS (2019).
  • Beltrami (1897) Eugenio Beltrami. 1897. Teoria fondamentale degli spazii di curvatura costante. Annali di Matematica Pura ed Applicata.
  • Berant and Liang (2014) Jonathan Berant and Percy Liang. 2014. Semantic Parsing via Paraphrasing. ACL (2014).
  • Bonnabel (2013) Silvère Bonnabel. 2013. Stochastic gradient descent on Riemannian manifolds. IEEE (2013).
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. NeurIPS (2013).
  • Cannon et al. (1997) James W. Cannon, William J. Floyd, Richard Kenyon, and Walter R. Parry. 1997. Hyperbolic Geometry. MSRI.
  • Chami et al. (2020) Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. 2020. Low-Dimensional Hyperbolic Knowledge Graph Embeddings. ACL (2020).
  • Chami et al. (2019) Ines Chami, Rex Ying, Christopher Ré, and Jure Leskovec. 2019. Hyperbolic Graph Convolutional Neural Networks. NeurIPS (2019).
  • Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. AAAI.
  • Gu et al. (2018) Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. 2018. Learning mixed-curvature representations in product spaces. In ICLR.
  • Hao et al. (2019) Junheng Hao, Muhao Chen, Wenchao Yu, Yizhou Sun, and Wei Wang. 2019. Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts. KDD (2019).
  • Huang et al. (2019a) Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019a. Knowledge Graph Embedding Based Question Answering. WSDM (2019).
  • Huang et al. (2019b) Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019b. Knowledge Graph Embedding Based Question Answering. WSDM (2019).
  • Iyer et al. (2022) Roshni G. Iyer, Thuy Vu, Alessandro Moschitti, and Yizhou Sun. 2022. Question-Answer Sentence Graph for Joint Modeling Answer Selection. arXiv preprint arXiv:2203.03549 (2022).
  • Iyer et al. (2021) Roshni G. Iyer, Wei Wang, and Yizhou Sun. 2021. Bi-Level Attention Graph Neural Networks. ICDM (2021).
  • Kolyvakis et al. (2019) Prodromos Kolyvakis, Alexandros Kalousis, and Dimitris Kiritsis. 2019. HyperKG: Hyperbolic Knowledge Graph Embeddings for Knowledge Base Completion. arXiv preprint arXiv:1908.04895 (2019).
  • Lehmann et al. (2015) Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. 2015. DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia. In Semantic Web Journal. 167–195.
  • Meng et al. (2019) Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, and Jiawei Han. 2019. Spherical Text Embedding. NeurIPS (2019).
  • Nickel and Kiela (2017) Maximilian Nickel and Douwe Kiela. 2017. Poincaré Embeddings for Learning Hierarchical Representations. NeurIPS (2017).
  • Petersen et al. (2006) Peter Petersen, S. Axler, and K.A. Ribet. 2006. Riemannian Geometry. Springer.
  • Suchanek et al. (2007) Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A Core of Semantic Knowledge. WWW.
  • Sun et al. (2022) Li Sun, Zhongbao Zhang, Junda Ye, Hao Peng, Jiawei Zhang, Sen Su, and Philip S. Yu. 2022. A Self-supervised Mixed-curvature Graph Neural Network. AAAI (2022).
  • Sun et al. (2020) Zequn Sun, Muhao Chen, Wei Hu, Chengming Wang, Jian Dai, and Wei Zhang. 2020. Knowledge Association with Hyperbolic Knowledge Graph Embeddings. ACL (2020).
  • Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. ICLR (2019).
  • Toutanova and Chen (2015) Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex Embeddings for Simple Link Prediction. ICML (2016).
  • Wang et al. (2019) Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. WSDM (2019).
  • Wang et al. (2021) Shen Wang, Xiaokai Wei, Cícero Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew Arnold, Bing Xiang, Philip S. Yu, and Isabel F Cruz. 2021. Mixed-Curvature Multi-Relational Graph Neural Network for Knowledge Graph Completion. WWW (2021).
  • Yang et al. (2015) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR (2015).
  • Zhang et al. (2021) Shuai Zhang, Yi Tay, Wenqi Jiang, Da-cheng Juan, and Ce Zhang. 2021. Switch spaces: Learning product spaces with sparse gating. arXiv preprint arXiv:2102.08688 (2021).
  • Zhang et al. (2020) Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, and Jie Wang. 2020. Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction. AAAI (2020).

Appendix A General Polar-Cartesian Conversion Framework

In this section, we detail our general procedures for efficiently converting arbitrary $d$-dimensional embeddings between the polar and Cartesian coordinate representations, which are utilized by sub-models in DGS. These procedures are: (1) the Cartesian-to-polar coordinate transformation (MCP) and (2) the polar-to-Cartesian coordinate transformation (MPC).

Figure 3. Illustration of vector $\bm{z}$ in spherical space on a 3-dimensional surface. Angular dimensions of $\bm{z}$ are $\theta_{z,1}$, the angle from the $x_1$-axis to $\bm{z}$ projected onto the $x_1$-$x_2$ plane, and $\theta_{z,2}$, the angle from the $x_3$-axis to $\bm{z}$ projected onto the $x_2$-$x_3$ plane.

A.1. MCP Transformation Procedure.

Using Figure 3, we derive transformations for the 3-dimensional spherical surface, and extend them to a $d$-dimensional spherical surface:

\begin{align*}
\tan(\theta_{z,1}) &= \frac{x_{z,2}}{x_{z,1}}; \quad \mathit{rad}=w \\
\cot(\theta_{z,2}) &= \frac{x_{z,3}}{\sqrt{x_{z,1}^{2}+x_{z,2}^{2}}}; \quad \mathit{rad}=w \\
\cot(\theta_{z,3}) &= \frac{x_{z,4}}{\sqrt{x_{z,1}^{2}+x_{z,2}^{2}+x_{z,3}^{2}}}; \quad \mathit{rad}=w \\
&\;\;\vdots \\
\cot(\theta_{z,d-1}) &= \frac{x_{z,d}}{\sqrt{x_{z,1}^{2}+x_{z,2}^{2}+\dots+x_{z,d-1}^{2}}}; \quad \mathit{rad}=w
\end{align*}

which implies

\begin{align*}
\theta_{z,1} &= \tan^{-1}\!\left(\frac{x_{z,2}}{x_{z,1}}\right); \quad \mathit{rad}=w \\
\theta_{z,i} &= \cot^{-1}\!\left(\frac{x_{z,i+1}}{\sqrt{\sum_{i'=1}^{i} x_{z,i'}^{2}}}\right),\; i\in[2,d-1]; \quad \mathit{rad}=w
\end{align*}
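A minimal NumPy sketch of the MCP procedure under these equations follows; the function name mcp is illustrative, and atan2 is used in place of $\tan^{-1}$/$\cot^{-1}$ so that signs and zero denominators are handled safely:

```python
import numpy as np

def mcp(x):
    """Cartesian -> polar (MCP) for a point on a d-dimensional sphere.

    Follows the Appendix A equations: theta_1 = atan(x_2 / x_1) and
    theta_i = arccot(x_{i+1} / sqrt(x_1^2 + ... + x_i^2)) for i in [2, d-1].
    """
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[0]
    theta = np.empty(d - 1)
    theta[0] = np.arctan2(x[1], x[0])
    for i in range(2, d):
        # cot(theta_i) = x_{i+1} / ||x_{1..i}||  =>  theta_i = atan2(||x_{1..i}||, x_{i+1})
        theta[i - 1] = np.arctan2(np.linalg.norm(x[:i]), x[i])
    rad = np.linalg.norm(x)  # the sphere radius w
    return theta, rad
```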
Figure 4. Riemannian retraction operation for the bridge space of DGS.

A.2. MPC Transformation Procedure.

Using Figure 3, we derive transformations for the $d$-dimensional spherical surface as follows:

\begin{align*}
x_{z,1} &= \cos(\theta_{z,1}); \quad \mathit{rad}=w \\
x_{z,d-i} &= \Big(\prod_{i'=d-i}^{d-1}\sin(\theta_{z,i'})\Big)\cdot\sin(\theta_{z,d-i-1}); \quad \mathit{rad}=w \\
x_{z,d} &= \cos(\theta_{z,d-1}); \quad \mathit{rad}=w
\end{align*}
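The sketch below implements MPC as the exact algebraic inverse of the mcp sketch above, under the same angle conventions; names and the round-trip check are illustrative only:

```python
import numpy as np

def mpc(theta, rad=1.0):
    """Polar -> Cartesian (MPC), written as the exact inverse of mcp above.

    theta_1 is the azimuth in the x1-x2 plane; each later coordinate is a
    cosine of its own angle scaled by the sines of all later angles.
    """
    theta = np.asarray(theta, dtype=np.float64)
    d = theta.shape[0] + 1
    # S[k] = sin(theta[k]) * ... * sin(theta[d-2]); S[d-1] = 1 (empty product).
    S = np.ones(d)
    for m in range(d - 2, 0, -1):
        S[m] = S[m + 1] * np.sin(theta[m])
    x = np.empty(d)
    x[0] = rad * np.cos(theta[0]) * S[1]
    x[1] = rad * np.sin(theta[0]) * S[1]
    for i in range(2, d):
        x[i] = rad * np.cos(theta[i - 1]) * S[i]
    return x

# Round trip using mcp from the sketch above: polar form recovers the point.
p = np.array([1.0, 1.0, np.sqrt(2.0)])
angles, w = mcp(p)
assert np.allclose(mpc(angles, w), p)
```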
Theorem.

The MCP procedure and MPC procedure are closed in the spherical space.

Proof.

The above equations for the MCP and MPC procedures directly use properties of trigonometry for right triangles in each embedding dimension $\theta_i$, $i\in[1,d-1]$, ensuring that the transformation is closed. ∎

Appendix B Batch Vector Transformation Procedure

We illustrate the batch version of the rotation operation presented in Section 3.1.1. Consider an entity $e_i$ with $K$ relations connected to it. Instead of applying $f_{\mathrm{rot}}$ separately for each of the $K$ relations, we can compute the result efficiently in one batch:

\begin{align*}
f_{\mathrm{rot}}\!\left(\bm{h}_{e_{i}}^{S},\begin{bmatrix}\bm{\theta}_{r_{1}}^{S}\\ \vdots\\ \bm{\theta}_{r_{K}}^{S}\end{bmatrix}\right) &= \begin{bmatrix}(\theta_{e_{i},1}^{S}+\theta_{r_{1},1}^{S}),\dots,(\theta_{e_{i},d}^{S}+\theta_{r_{1},d}^{S})\\ \vdots\\ (\theta_{e_{i},1}^{S}+\theta_{r_{K},1}^{S}),\dots,(\theta_{e_{i},d}^{S}+\theta_{r_{K},d}^{S})\end{bmatrix} \bmod 2\pi \\
&= \begin{bmatrix}f_{\mathrm{rot}}(\bm{h}_{e_{i}}^{S},\bm{\theta}_{r_{1}}^{S})\\ \vdots\\ f_{\mathrm{rot}}(\bm{h}_{e_{i}}^{S},\bm{\theta}_{r_{K}}^{S})\end{bmatrix}
\end{align*}
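A vectorized sketch of this batch rotation, assuming angular embeddings stored as NumPy arrays (f_rot_batch is an illustrative name):

```python
import numpy as np

def f_rot_batch(h, thetas):
    """Batched spherical rotation: add each relation's angular offsets to the
    entity's angular coordinates and wrap into [0, 2*pi).

    h:      (d,)   angular embedding of entity e_i
    thetas: (K, d) angular embeddings of the K relations touching e_i
    Broadcasting applies all K rotations in one vectorized operation,
    matching the row-wise stack of f_rot(h, theta_k) above.
    """
    return np.mod(h[None, :] + thetas, 2.0 * np.pi)  # shape (K, d)

h = np.array([0.3, 5.9, 1.2])
thetas = np.array([[0.5, 0.7, 6.0], [6.1, 0.2, 0.1]])
print(f_rot_batch(h, thetas))
```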
Theorem.

The batch vector transformation procedure is closed.

Proof.

Each relation has $f_{\mathrm{rot}}(\cdot)$ applied to the entity, and $f_{\mathrm{rot}}(\cdot)$ has been shown to be closed: every dimension of the transformed head satisfies the angular constraints (e.g., by $\bmod\, 2\pi$) and the norm constraints, because $f_{\mathrm{rot}}(\cdot)$ preserves the norm of the spherical space. ∎

Appendix C Riemannian retraction operation for the bridge space of DGS

Figure 4 describes the Riemannian retraction operation $\mathcal{R}$ from the bridge space to the tangent Euclidean space. The bridge space is formed by intersecting the surface of the spherical ball with the Poincaré disk in the hyperbolic space. As shown in the figure, we separately map the Poincaré disk and the spherical ball to different tangent Euclidean spaces, and enable joint communication between the two spaces to represent the bridge entities. On the left, we perform an isomorphic mapping from the Poincaré disk, $\mathcal{H}^{d}$, to the hyperboloid model, $\mathcal{B}^{d+1}$, as in (Chami et al., 2019) using:

(40) $\mathcal{R}_{\mathcal{H}^{d}\rightarrow\mathcal{B}^{d+1}}(x_{1},\dots,x_{d})=\dfrac{(2x_{1},\dots,2x_{d},\,1+\lVert\mathbf{x}\rVert_{2}^{2})}{1-\lVert\mathbf{x}\rVert_{2}^{2}}$
(41) $\mathcal{R}_{\mathcal{B}^{d+1}\rightarrow\mathcal{H}^{d}}(x_{1},\dots,x_{d+1})=\dfrac{(x_{1},\dots,x_{d})}{x_{d+1}+1}$

We then use the logarithmic map to project to the first tangent Euclidean space. On the right, we use the logarithmic map to project the spherical ball to the second tangent Euclidean space. We then allow for transformation between the two Euclidean spaces through the transition mappings $\gamma_{\alpha\beta}$ and $\gamma_{\beta\alpha}$ such that

(42) $\gamma_{\alpha\beta}=\gamma_{\beta}\circ\gamma_{\alpha}^{-1}$
(43) $\gamma_{\beta\alpha}=\gamma_{\alpha}\circ\gamma_{\beta}^{-1}$
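A minimal NumPy sketch of Eqs. (40)-(43) under these definitions; the chart maps gamma_alpha and gamma_beta are left abstract, since the logarithmic maps depend on the underlying curvature, and all function names here are illustrative:

```python
import numpy as np

def poincare_to_hyperboloid(x):
    """Eq. (40): lift a point of the Poincare model to the hyperboloid model."""
    x = np.asarray(x, dtype=np.float64)
    s = np.dot(x, x)  # ||x||_2^2
    return np.concatenate([2.0 * x, [1.0 + s]]) / (1.0 - s)

def hyperboloid_to_poincare(y):
    """Eq. (41): project a hyperboloid point back to the Poincare model."""
    y = np.asarray(y, dtype=np.float64)
    return y[:-1] / (y[-1] + 1.0)

def make_transitions(gamma_alpha, gamma_alpha_inv, gamma_beta, gamma_beta_inv):
    """Eqs. (42)-(43): transition maps between the two tangent Euclidean
    charts, built by composing one chart map with the inverse of the other."""
    gamma_ab = lambda v: gamma_beta(gamma_alpha_inv(v))  # gamma_{alpha beta}
    gamma_ba = lambda v: gamma_alpha(gamma_beta_inv(v))  # gamma_{beta alpha}
    return gamma_ab, gamma_ba

# Round-trip check for the retraction pair in Eqs. (40)-(41).
x = np.array([0.1, -0.2, 0.3])
assert np.allclose(hyperboloid_to_poincare(poincare_to_hyperboloid(x)), x)
```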