
Is There More Pattern in Knowledge Graph?
Exploring Proximity Pattern for Knowledge Graph Embedding

Ren Li1, 2, Yanan Cao1, 2, Qiannan Zhu3, Xiaoxue Li4, Fang Fang1, 2
Abstract

Modeling relation patterns has been the core focus of previous Knowledge Graph Embedding works; a relation pattern represents how one entity is related to another semantically through some explicit relation. However, a more natural and intuitive relevancy among entities is usually ignored: how close one entity is to another semantically, without considering any explicit relation. We name this semantic phenomenon in knowledge graphs the proximity pattern. In this work, we explore how to define and represent the proximity pattern, and how it can be utilized to help knowledge graph embedding. First, we define the proximity of any two entities according to their statistically shared queries, then we construct a derived graph structure that represents the proximity pattern from a global view. Moreover, together with the original knowledge graph, we design a Chained couPle-GNN (CP-GNN) architecture to deeply merge the two patterns (graphs), which encodes a more comprehensive knowledge embedding. Evaluated on the FB15k-237 and WN18RR datasets, CP-GNN achieves state-of-the-art results on the Knowledge Graph Completion task, and especially boosts the modeling capacity for complex queries that contain multiple answer entities, proving the effectiveness of the introduced proximity pattern.

1 Introduction

Knowledge Graphs (KGs) are large, graph-structured databases which store facts in triple form (h,r,t), denoting that head entity h and tail entity t satisfy relation r. Knowledge Graph Embedding (KGE) is the task of learning low-dimensional representations for entities and relations, so that the symbolic knowledge can be integrated into numerical models to support various downstream knowledge-based tasks, such as recommendation systems (Wang et al. 2018), question answering (Yasunaga et al. 2021) and text generation (Zhang et al. 2020). To evaluate the effectiveness of the learned embeddings, the Knowledge Graph Completion (KGC) task is usually adopted (Dettmers et al. 2018; Balazevic and Allen 2019; Vashishth et al. 2020a), aiming at predicting tail entities t given (h,r,?) or head entities h given (?,r,t). Essentially, KGC can be regarded as a query-answer task; without loss of generality, we denote the query as (h,r,?) and the answer as t.

The general intuition of previous KGE works is to model the relation pattern of a knowledge graph, which represents how one entity is related to another semantically by some explicit relation. For example, TransE (Bordes et al. 2013) models relations as addition operations from head entities to tail entities. RotatE (Sun et al. 2019) treats relations as rotations between entities in the complex field. GNN-based models like R-GCN (Schlichtkrull et al. 2018) and CompGCN (Vashishth et al. 2020b) focus on capturing the relation pattern from a global graph view through a neighborhood aggregation mechanism. However, a more natural and intuitive relevancy among entities is usually ignored: how close one entity is to another semantically, without considering any explicit relation. We name this semantic phenomenon in knowledge graphs the proximity pattern.

Concretely, if a query (h,r,?) holds multiple answers t_{1}\sim t_{N} simultaneously, we consider t_{1}\sim t_{N} to satisfy the proximity pattern. We assume that answers under the same query share some common characteristics, which brings them semantically close. For example, given the query (Robert Downey Jr., act, ?), some of the answers such as The Avengers, Iron Man 2 and Avengers: Age of Ultron all possess characteristics like superhero film, comic book film and product of Marvel Studios, which bring them semantically close. Other examples, including movies from the same director, papers published by the same laboratory, and multiple hyponyms of the same hypernym, support the same phenomenon.

Intuitively, it is necessary to capture and model the proximity pattern since it can provide beneficial evidence during the inference phase. As in Figure 1, we can know that The Avengers has strong semantic proximity with Avengers: Age of Ultron because they frequently appear as co-answers under the same query. Then if the fact (Marvel Studios, product, The Avengers) is seen during training, it will be easy to infer (Marvel Studios, product, Avengers: Age of Ultron). Note that although the neighbor aggregation mechanism of some GNN-based KGE works can also capture proximity information to some extent, its effective utilization is still limited because of the lack of explicit guidance.

Figure 1: Knowledge inference based on the proximity pattern. The high proximity between The Avengers and Avengers: Age of Ultron can be derived from their large number of co-occurrences under the same query. Then if we have (Marvel Studios, product, The Avengers) during training, strong confidence will be given to the fact (Marvel Studios, product, Avengers: Age of Ultron).

Inspired by this intuition, in this work we explore how to define and represent the proximity pattern, and how it can be utilized to help knowledge graph embedding. The overall framework is demonstrated in Figure 2. Firstly, we define the proximity of any two entities according to the statistically shared queries between them, then based on the proximity values we construct a derived proximity graph. This can be seen as extracting a new semantic graph structure from the original knowledge graph, which represents the proximity pattern from a global view. We then employ Graph Neural Networks (GNNs) to model and merge the two patterns (graphs) together, given their powerful graph structure learning ability revealed in recent years (Kipf and Welling 2017; Velickovic et al. 2018; Xu et al. 2019). We first employ a relation-aware GNN on the knowledge graph and a homogeneous GNN on the proximity graph respectively, to capture the global semantic interactions within each pattern. We then stack the two GNNs together to capture the deep semantic interactions across patterns. We name the overall architecture Chained couPle-GNN (CP-GNN), reflecting the separate and sequential modeling applied here. In CP-GNN, the GNN \mathrm{G}_{p} is placed after \mathrm{G}_{r}, because we intend proximity modeling to serve as an enhancement of the original knowledge graph embedding. After fusing in the proximity pattern, the encoder obtains a more comprehensive knowledge embedding. Finally, ConvE (Dettmers et al. 2018) is chosen as the decoder to perform the Knowledge Graph Completion task.

In summary, our main contributions are as follows:

  • To the best of our knowledge, this is the first work to propose the proximity pattern concept in the knowledge graph field, which serves as a different semantic assumption compared with the traditional relation pattern.

  • We dive into how to fuse the proximity pattern and the relation pattern for more comprehensive knowledge graph embedding, and design a Chained couPle-GNN (CP-GNN) architecture to sufficiently capture the global and deep interactions of the two patterns.

  • Extensive experiments on FB15k-237 and WN18RR datasets demonstrate the effectiveness of our proposed proximity pattern and CP-GNN KGE model.

Figure 2: The overall architecture of CP-GNN and its application to the downstream Knowledge Graph Completion (KGC) task. The process follows the encoder-decoder paradigm. During the encoding step, the initial entity embedding E is sequentially modeled by a relation-aware GNN \mathrm{G}_{r} on the knowledge graph and a homogeneous GNN \mathrm{G}_{p} on the proximity graph. The initial relation embedding R is transformed by a Multilayer Perceptron (MLP). During the decoding step, ConvE (Dettmers et al. 2018) is chosen as the decoder to perform the KGC task.

2 Related Work

Knowledge Graph Embedding (KGE) is an active research area, where existing works mainly aim to model the relation pattern of a KG and can be roughly divided into three families (Wang et al. 2017; Arora 2020): Translational Distance Models apply distance-based scoring functions and model relations as some operation between head and tail entities, like the addition operation in TransE (Bordes et al. 2013), hyper-plane addition in TransH (Wang et al. 2014), and complex-field rotation in RotatE (Sun et al. 2019). Semantic Matching Models employ similarity-based functions to directly model the relation into the triple. DistMult (Yang et al. 2015) and ComplEx (Trouillon et al. 2016) employ multiplicative models to represent the likelihood of a fact. ConvE (Dettmers et al. 2018) and InteractE (Vashishth et al. 2020a) apply neural networks for similarity modeling. GNN-based Models are proposed to encode the relation pattern from a global graph structure angle. R-GCN (Schlichtkrull et al. 2018) improves GCNs by introducing relation-specific transformations during neighbor aggregation. A2N (Bansal et al. 2019) introduces an attentional aggregation mechanism to adaptively merge the relevant neighborhood into the query representation. Moreover, CompGCN (Vashishth et al. 2020b) generalizes different neighbor aggregation methods of knowledge graphs as entity-relation composition operations, giving a unified framework for GNN-based works. All three families of KGE works lack the consideration of the latent semantic relevancy that describes how entities are close to each other without an explicit relation.

3 Methodology

In Section 3.1, we first introduce the Knowledge Graph Embedding problem. Then in Section 3.2, we introduce the definition, representation and modeling method of the proximity pattern, and in Section 3.3, we discuss the modeling method of the relation pattern. Finally, in Section 3.4, we introduce the CP-GNN architecture and the model training process.

3.1 Problem Definition

A knowledge graph is denoted by \mathcal{G}=(\mathcal{E},\mathcal{R},\mathcal{F}), where \mathcal{E} and \mathcal{R} represent the sets of entities and relations, and \mathcal{F}=\{(h_{i},r_{i},t_{i})\}\subseteq\mathcal{E}\times\mathcal{R}\times\mathcal{E} is the set of triple facts. The Knowledge Graph Embedding (KGE) task tries to learn a function \phi:\mathcal{E}\times\mathcal{R}\times\mathcal{E}\mapsto\mathbb{R} such that given a query q=(h,r,?) and an entity t, the output score \phi(q,t)\in\mathbb{R} measures the likelihood of t being one of the answers of q.

During calculation, \mathcal{E} and \mathcal{R} are represented as embedding matrices. The entity embedding matrix is formulated as E\in\mathbb{R}^{n_{e}\times d}, where n_{e} denotes the number of entities, and d\ll n_{e} is the dimension of the embedding. For a single entity e_{i}, we denote its corresponding vector in boldface as \mathbf{e}_{i}\in\mathbb{R}^{d}, which is the transposition of the i-th row of E. Similarly, the relation embedding matrix is denoted as R\in\mathbb{R}^{n_{r}\times d}, where n_{r} is the number of relations and \mathbf{r}_{i} is the embedding vector of relation r_{i}.
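To make the notation concrete, the following is a minimal PyTorch-style sketch of the embedding matrices E and R; the class and parameter names (n_entities, n_relations, dim) and the Xavier initialization are our own assumptions, not specified by the paper.

```python
import torch.nn as nn

class KGEmbeddings(nn.Module):
    """Entity matrix E (n_e x d) and relation matrix R (n_r x d)."""
    def __init__(self, n_entities: int, n_relations: int, dim: int):
        super().__init__()
        self.entity = nn.Embedding(n_entities, dim)    # rows are e_i
        self.relation = nn.Embedding(n_relations, dim)  # rows are r_i
        nn.init.xavier_uniform_(self.entity.weight)
        nn.init.xavier_uniform_(self.relation.weight)

    def forward(self, entity_ids, relation_ids):
        # Look up e_i and r_i for the given ids
        return self.entity(entity_ids), self.relation(relation_ids)
```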

3.2 Proximity Pattern Modeling

Proximity Pattern Definition

To introduce the proximity concept, we first give the definition of a Query-Answer (QA) pair:

Definition 1 (Query-Answer pair, QA pair)

A Query-Answer (QA) pair (q,a) consists of a query q=(h,r,?) or (?,r,t), and an answer set a=\{e_{1},...,e_{m}\} that represents all the answer entities of query q in the knowledge graph.

Each triple fact (h,r,t) is included in two QA pairs, one for the (h,r,?) direction and one for the (?,r,t) direction. From the triple set \mathcal{F}=\{(h_{i},r_{i},t_{i})\}, we can obtain a QA pair set \mathcal{T}=\{(q_{k},a_{k})\}.
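As a concrete illustration, below is a minimal sketch of how the QA pair set could be built from the triple set; the function name build_qa_pairs and the dictionary representation are our own assumptions, not part of the paper.

```python
from collections import defaultdict

def build_qa_pairs(triples):
    """Group triples into QA pairs: (h, r, ?) -> {t, ...} and (?, r, t) -> {h, ...}."""
    answers = defaultdict(set)
    for h, r, t in triples:
        answers[(h, r, "?")].add(t)   # tail-prediction direction
        answers[("?", r, t)].add(h)   # head-prediction direction
    return answers

# Toy usage
triples = [("RDJ", "act", "The Avengers"),
           ("RDJ", "act", "Iron Man 2"),
           ("Marvel Studios", "product", "The Avengers")]
qa = build_qa_pairs(triples)
print(qa[("RDJ", "act", "?")])  # {'The Avengers', 'Iron Man 2'}
```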

The core hypothesis we put forward for the proximity pattern is that entities in the same answer set share characteristics, like movies from the same director, papers published by the same laboratory, etc. The term proximity means that such entities should be close to each other both in the cognitive semantic space and in the numerical embedding space. To formally describe the concept, we propose the Proximity Measure (PM) as follows:

Definition 2 (Proximity Measure, PM)

Given a QA pair (q_{k},a_{k}) with |a_{k}|>1, for \forall e_{i},e_{j}\in a_{k}\,(i\neq j), the Proximity Measure (PM) between e_{i},e_{j} with regard to (q_{k},a_{k}) is defined as: p^{k}_{ij}=p^{k}_{ji}=\frac{\max(M-|a_{k}|,0)}{M-2}.

M\geq 2 is a hyper-parameter that represents the threshold size of the answer set. PM decreases as the answer set grows, taking the maximum value of 1 for |a_{k}|=2 and the minimum value of 0 for |a_{k}|\geq M. This comes from the observation that the concentration of semantic proximity decays as the number of answers increases. For example, the more movies an actor participates in, the lower the probability that these movies share the same characteristics (genre, style, etc.).

Beyond a single particular query, by considering all the observed shared queries in the dataset, we define the Statistical Proximity Measure as:

Definition 3 (Statistical Proximity Measure, SPM)

For any entities e_{i},e_{j}\in\mathcal{E}, the Statistical Proximity Measure (SPM) between e_{i} and e_{j} is:

p_{ij}=p_{ji}=\sum_{k=1}^{|\mathcal{T}|}p^{k}_{ij}

where \mathcal{T}=\{(q_{k},a_{k})\} is the set of QA pairs, and p^{k}_{ij}=0 if e_{i} or e_{j} is not in a_{k}.

After obtaining the SPM between any entity pair, we can give the formal definition of the proximity pattern of a knowledge graph.

Definition 4 (Proximity Pattern)

For a knowledge graph \mathcal{G}=(\mathcal{E},\mathcal{R},\mathcal{F}), the Proximity Pattern of \mathcal{G} is a matrix:

P^{\mathcal{G}}\in\mathbb{R}^{n_{e}\times n_{e}},\ [P^{\mathcal{G}}]_{ij}=p_{ij}

where n_{e} is the number of entities in the knowledge graph, and [P^{\mathcal{G}}]_{ij} is the value of the (i,j)-th entry of P^{\mathcal{G}}.
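A minimal sketch of how the SPM matrix P^{\mathcal{G}} could be accumulated from the QA pair set follows; it reuses the hypothetical build_qa_pairs output from the earlier sketch, and the dense NumPy matrix, the entity2id mapping and the default M are our assumptions (a sparse structure would be needed for large graphs).

```python
import numpy as np
from itertools import combinations

def proximity_matrix(qa_pairs, entity2id, M=50):
    """Accumulate SPM: p_ij = sum_k max(M - |a_k|, 0) / (M - 2); assumes M > 2."""
    n = len(entity2id)
    P = np.zeros((n, n))
    for query, answers in qa_pairs.items():
        if len(answers) < 2:
            continue
        pm = max(M - len(answers), 0) / (M - 2)  # PM for this QA pair (Definition 2)
        for ei, ej in combinations(answers, 2):
            i, j = entity2id[ei], entity2id[ej]
            P[i, j] += pm                        # symmetric accumulation (Definition 3)
            P[j, i] += pm
    return P
```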

Proximity Graph Construction

According to Definition 4, the proximity pattern is a matrix P^{\mathcal{G}} that describes the extent of semantic proximity between any entity pair. In order to represent the global interactions among entities, we construct a proximity graph based on P^{\mathcal{G}}. Specifically, given a minimum threshold I, if the entry [P^{\mathcal{G}}]_{ij}>I, we connect an undirected edge between e_{i} and e_{j} in the graph, and set the weight of the edge to [P^{\mathcal{G}}]_{ij}. The illustration can be seen in Figure 3.
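The thresholding step can be sketched as follows; the edge-list representation and the function name build_proximity_graph are our assumptions.

```python
import numpy as np

def build_proximity_graph(P, threshold):
    """Keep entity pairs whose SPM exceeds the threshold I as weighted undirected edges."""
    src, dst = np.where(np.triu(P, k=1) > threshold)  # upper triangle: each pair once
    weights = P[src, dst]
    return list(zip(src.tolist(), dst.tolist(), weights.tolist()))
```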

GNN for Modeling Proximity Pattern

After constructing the proximity graph, we can leverage the extensive graph representation learning literature to encode the proximity pattern into embeddings. Here we choose Graph Neural Networks (GNNs), which have shown competence in modeling global and complex graph features.

We start with a single GNN layer. For each entity e_{i}, denoting its input entity embedding for the l-th layer as \mathbf{e}_{i}^{(l)}, the neighbor aggregation mechanism is formulated as:

\mathbf{n}_{i}^{(l)}=\sum_{e_{j}\in\mathcal{N}_{i}}\alpha_{ij}\,\mathbf{e}_{j}^{(l)} (1)
\alpha_{ij}=\frac{\exp([P^{\mathcal{G}}]_{ij})}{\sum_{e_{z}\in\mathcal{N}_{i}}\exp([P^{\mathcal{G}}]_{iz})} (2)

\mathcal{N}_{i} denotes the neighborhood of entity e_{i} in the proximity graph. \alpha_{ij} is the normalized version of [P^{\mathcal{G}}]_{ij} computed by the \mathrm{Softmax} function across \mathcal{N}_{i}. \mathbf{n}_{i}^{(l)} is the neighbor representation of entity e_{i} at the l-th layer, which can be seen as the weighted summarization of the neighbors according to their proximity extent with e_{i}.

After getting the neighbor representation, we employ it to update the entity embedding:

\mathbf{e}_{i}^{(l+1)}=\sigma(W_{p}^{(l)}\mathbf{n}_{i}^{(l)})+\mathbf{e}_{i}^{(l)} (3)

W_{p}^{(l)}\in\mathbb{R}^{d\times d} is the linear transformation matrix for the proximity pattern. \mathbf{e}_{i}^{(l+1)} is the output of the l-th GNN layer and serves as the input of the (l+1)-th layer. \sigma is the non-linear activation function.

The modeling process corresponds to \mathrm{G}_{p} in Figure 2. After L layers of aggregation and updating, we take the resulting entity embedding matrix E^{(L)} as the output of \mathrm{G}_{p} and denote it as E_{p}, into which the proximity pattern is integrated.
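A minimal PyTorch sketch of one such layer (equations 1-3) is shown below; the edge-list inputs, the per-node softmax loop and all names are our own assumptions, and the loop is written for clarity rather than speed.

```python
import torch
import torch.nn as nn

class ProximityGNNLayer(nn.Module):
    """One layer of G_p: proximity-weighted neighbor aggregation with a residual update."""
    def __init__(self, dim: int):
        super().__init__()
        self.W_p = nn.Linear(dim, dim, bias=False)
        self.act = nn.ReLU()

    def forward(self, E, edge_index, edge_weight):
        # edge_index: (2, m) pairs j -> i; edge_weight: (m,) SPM values [P]_ij
        src, dst = edge_index
        # Softmax of edge weights over each destination node's neighborhood (eq. 2)
        alpha = torch.zeros_like(edge_weight)
        for i in torch.unique(dst):
            mask = dst == i
            alpha[mask] = torch.softmax(edge_weight[mask], dim=0)
        # Weighted sum of neighbor embeddings (eq. 1)
        n = torch.zeros_like(E)
        n.index_add_(0, dst, alpha.unsqueeze(-1) * E[src])
        # Residual update (eq. 3)
        return self.act(self.W_p(n)) + E
```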

Figure 3: Proximity graph extraction process. For clarity we only show cases of the act relation. The colors of the tail nodes denote the queries they are involved in, so if two tail nodes share a common color, an edge is connected between them in the proximity graph.

3.3 Relation Pattern Modeling

The relation pattern describes how one entity is related to another by some explicit relation, which is represented by knowledge graph triples, and can be formally defined as:

Definition 5 (Relation Pattern)

For a knowledge graph \mathcal{G}=(\mathcal{E},\mathcal{R},\mathcal{F}), the Relation Pattern of \mathcal{G} is a tensor R^{\mathcal{G}}\in\mathbb{R}^{n_{e}\times n_{e}\times n_{r}}, where

[R^{\mathcal{G}}]_{ijk}=\begin{cases}1,&(e_{i},r_{k},e_{j})\in\mathcal{F}\\ 0,&\text{otherwise}\end{cases}

where n_{e} and n_{r} are the numbers of entities and relations in the knowledge graph, and [R^{\mathcal{G}}]_{ijk} is the value of the (i,j,k)-th entry of R^{\mathcal{G}}.

According to Definition 5, the knowledge graph itself is the sparse embodiment of the relation pattern; hence, to capture the relation pattern in a global way, we introduce a relation-aware GNN model, which corresponds to \mathrm{G}_{r} in Figure 2.

For a single layer, the neighbor aggregation function is formulated as:

\mathbf{n}_{i}^{(l)}=\sum_{(e_{j},r_{j})\in\mathcal{N}_{i}}\alpha_{ij}^{(l)}\,\varphi(\mathbf{e}_{j}^{(l)},\mathbf{r}_{j}) (4)

\mathcal{N}_{i}=\{(e_{j},r_{j})\,|\,(e_{j},r_{j},e_{i})\in\mathcal{F}\} denotes the relational neighbors of entity e_{i}, i.e. the neighbor entities paired with their connecting relation. \mathbf{r}_{j} is the initial embedding of the relation and remains invariant across layers, which will be explained later. \varphi(\mathbf{e},\mathbf{r}) is the composition function that fuses the entity and relation information. The choices include the additive function \varphi(\mathbf{e},\mathbf{r})=\mathbf{e}+\mathbf{r}; the multiplicative function \varphi(\mathbf{e},\mathbf{r})=\mathbf{e}*\mathbf{r}, where * denotes element-wise multiplication; and a Multilayer Perceptron \varphi(\mathbf{e},\mathbf{r})=\mathrm{MLP}([\mathbf{e}\,||\,\mathbf{r}]), where || is the vector concatenation operation. Here we choose the additive function, which is the most computationally efficient. \alpha_{ij}^{(l)} is the weight for relational neighbor (e_{j},r_{j}); the discussion of different weight choices is placed in Appendix A.

Then we update the entity embedding by:

\mathbf{e}_{i}^{(l+1)}=\sigma(W_{r}^{(l)}\mathbf{n}_{i}^{(l)})+\mathbf{e}_{i}^{(l)} (5)

W_{r}^{(l)}\in\mathbb{R}^{d\times d} is the linear transformation matrix for the relation pattern. After L layers, we use the output entity embedding matrix E^{(L)} as the final output of \mathrm{G}_{r}, denoted as E_{r}.

For equation 4, different from previous works (Vashishth et al. 2020b) where relations are also updated in each layer, we keep the relation embedding the same. Here we assume that relations serve as translation operations between entities, and the operation itself should keep a consistent effect across layers.
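A minimal sketch of one relation-aware layer of G_r (equations 4-5) with the additive composition is given below; the simple mean weight stands in for the weight choices of Appendix A, and all names are our own assumptions.

```python
import torch
import torch.nn as nn

class RelationGNNLayer(nn.Module):
    """One layer of G_r: aggregate phi(e_j, r_j) = e_j + r_j over relational neighbors."""
    def __init__(self, dim: int):
        super().__init__()
        self.W_r = nn.Linear(dim, dim, bias=False)
        self.act = nn.ReLU()

    def forward(self, E, R, triples):
        # triples: LongTensor (m, 3) of (head j, relation, tail i); messages flow j -> i
        src, rel, dst = triples[:, 0], triples[:, 1], triples[:, 2]
        msg = E[src] + R[rel]                        # additive composition phi (eq. 4)
        n = torch.zeros_like(E)
        n.index_add_(0, dst, msg)
        deg = torch.zeros(E.size(0), device=E.device)
        deg.index_add_(0, dst, torch.ones_like(dst, dtype=E.dtype))
        n = n / deg.clamp(min=1).unsqueeze(-1)       # mean weight alpha_ij = 1/|N_i|
        return self.act(self.W_r(n)) + E             # residual update (eq. 5)
```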

3.4 Pattern Fusion and Training

In this section we introduce the proposed CP-GNN framework that merges the two patterns together, and the end-to-end encoder-decoder training paradigm.

At the encoding stage, as introduced in Sections 3.2 and 3.3, two GNNs, denoted \mathrm{G}_{p} and \mathrm{G}_{r}, are employed to capture the global semantic interactions within each pattern. Then we stack the two GNNs together to capture the deep semantic interactions across patterns. The GNN \mathrm{G}_{p} is placed after \mathrm{G}_{r}, because we intend proximity modeling to serve as an enhancement of the original knowledge graph embedding. After fusing in the proximity pattern, the encoder obtains a more comprehensive knowledge embedding.

Given the initial entity and relation embedding matrices E\in\mathbb{R}^{n_{e}\times d} and R\in\mathbb{R}^{n_{r}\times d}, the encoding process follows:

E_{r}=\mathrm{G}_{r}(E,R) (6)
E_{p}=\mathrm{G}_{p}(E_{r}) (7)

For the relation embedding R, we transform it by a Multilayer Perceptron (MLP) to get the output embedding. The total output of the encoder can be described as:

E_{enc}=E_{p} (8)
R_{enc}=\mathrm{MLP}(R) (9)

In the decoder module, we perform the Knowledge Graph Completion task. The module takes the embedding pair of a query \mathbf{q}=(\mathbf{h},\mathbf{r}) as input, and aims to measure the answer likelihood with regard to all the entities \{\mathbf{e}_{1},...,\mathbf{e}_{n_{e}}\} of E_{enc}. Here we choose ConvE (Dettmers et al. 2018) as our decoder implementation, which uses a 2D convolutional neural network to match queries and answers. We refer readers to the original paper for more details, and here we directly utilize the decoder function as:

\mathrm{ConvE}(\mathbf{q},E_{enc})=\mathbf{o} (10)

\mathbf{o}\in\mathbb{R}^{n_{e}} is the vector of matching scores between the query and all entities. Then we use the binary cross entropy loss to measure the difference between the model output \mathbf{o} and the target \mathbf{t}\in\mathbb{R}^{n_{e}}:

\mathcal{L}(\mathbf{o},\mathbf{t})=-\frac{1}{n_{e}}\sum_{i}^{n_{e}}\Big(\mathbf{t}[i]\cdot\log(\mathbf{o}[i])+(1-\mathbf{t}[i])\cdot\log(1-\mathbf{o}[i])\Big) (11)

\mathbf{t}[i] and \mathbf{o}[i] denote the i-th entry of the corresponding vector. The Stochastic Gradient Descent (SGD) algorithm is then applied to train the model predictions to approximate the targets.
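Putting the pieces together, the following is a rough sketch of one training step under this encoder-decoder paradigm; ProximityGNNLayer and RelationGNNLayer refer to the earlier hypothetical sketches, conve_decode stands in for the ConvE decoder (whose details are in the original paper), and the logits-based BCE is a standard variant of equation 11 rather than the paper's exact code.

```python
import torch.nn.functional as F

def training_step(E, R, kg_triples, prox_edges, prox_weights,
                  gr_layers, gp_layers, rel_mlp, conve_decode,
                  queries, targets, optimizer):
    """One step: E -> G_r -> G_p -> ConvE scores -> BCE loss (eqs. 6-11)."""
    # Encoder: relation pattern first, then proximity pattern (eqs. 6-7)
    E_r = E
    for layer in gr_layers:
        E_r = layer(E_r, R, kg_triples)
    E_p = E_r
    for layer in gp_layers:
        E_p = layer(E_p, prox_edges, prox_weights)
    E_enc, R_enc = E_p, rel_mlp(R)                               # eqs. 8-9

    # Decoder: score every entity for each (h, r) query (eq. 10)
    h, r = queries[:, 0], queries[:, 1]
    scores = conve_decode(E_enc[h], R_enc[r], E_enc)             # (batch, n_e) logits
    loss = F.binary_cross_entropy_with_logits(scores, targets)   # eq. 11, logits variant

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```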

4 Experiments

4.1 Experiment Setup

Dataset

We conduct experiments on the Knowledge Graph Completion task using two commonly used public datasets: FB15k-237 (Toutanova and Chen 2015) and WN18RR (Dettmers et al. 2018). FB15k-237 contains entities and relations from Freebase, a large collaborative knowledge base. WN18RR is created from WordNet, a lexical database of semantic relations between words. Each dataset is split into train, valid and test sets. The statistics of the two datasets are summarized in Table 1.

Evaluation Protocol

The model performance is measured by five frequently used metrics: MRR (the Mean Reciprocal Rank of correct entities), MR (the Mean Rank of correct entities), and Hits@1, Hits@3, Hits@10 (the proportion of correct entities ranked in the top 1/3/10). We follow the filtered setting protocol (Bordes et al. 2013) for evaluation, i.e. all other true entities appearing in the train, valid and test sets are excluded when ranking. In addition, based on the observation of (Sun et al. 2020), to eliminate the problem of abnormal score distributions, if the prediction target has the same score as multiple other entities, we take the average of the upper-bound and lower-bound ranks as the result.
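A minimal sketch of this filtered ranking with tie averaging could look as follows; the array names are our own, and scores are assumed to be higher-is-better.

```python
import numpy as np

def filtered_rank(scores, target, known_true):
    """Rank of `target` among scores, excluding other known true entities,
    averaging the optimistic and pessimistic ranks when scores tie."""
    scores = scores.copy()
    mask = np.array([e for e in known_true if e != target], dtype=int)
    if mask.size:
        scores[mask] = -np.inf                  # filtered setting
    s = scores[target]
    higher = int((scores > s).sum())            # strictly better entities
    ties = int((scores == s).sum()) - 1         # other entities with an equal score
    upper = higher + 1                          # optimistic (upper-bound) rank
    lower = higher + ties + 1                   # pessimistic (lower-bound) rank
    return (upper + lower) / 2
```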

Knowledge Graph Construction

Here we give some details of the knowledge graph construction process, which proved effective in our experiments:

  • Following (Vashishth et al. 2020b), we transform the knowledge graph into an undirected graph by introducing an inverse edge (t,r^{-1},h) for each edge (h,r,t), which passes information bidirectionally and enhances the graph connectivity.

  • For each training batch, we randomly remove a proportion of edges in the knowledge graph, so that the model focuses more on predicting missing edges in the graph, which is closer to the inference process. (A sketch of both steps follows this list.)
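Below is a minimal sketch of the two construction steps above; the relation-id convention for inverse edges (offsetting by the number of relations) and the keep-rate semantics of the dropout step are our assumptions.

```python
import torch

def add_inverse_edges(triples, n_relations):
    """Add (t, r^{-1}, h) for every (h, r, t); inverse relation id = r + n_relations."""
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    inverse = torch.stack([t, r + n_relations, h], dim=1)
    return torch.cat([triples, inverse], dim=0)

def drop_edges(triples, keep_rate):
    """Randomly keep a proportion of edges for the current training batch."""
    mask = torch.rand(triples.size(0)) < keep_rate
    return triples[mask]
```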

Hyper-parameter setting

The hyper-parameters involved in our work include: batch size from {256, 512, 1024}, learning rate from {1e-4, 3e-4, 5e-3}, dimension of entity and relation embeddings from {500, 1000}, number of knowledge graph GNN layers from {1, 2, 3}, number of proximity graph GNN layers from {1, 2, 3}, random removal rate of batch edges from {0.1, 0.3, 0.5, 0.7, 1.0}, maximum answer set size M from {25, 50, 100, 500} (Definition 2), and minimum SPM threshold I from {0.5, 1, 3, 5} (Definition 3). We tune the hyper-parameters via grid search.

Dataset FB15k-237 WN18RR
# entity 14,541 40,943
# relation 237 11
# train triple 272,115 86,835
# valid triple 17,535 3,034
# test triple 20,466 3,134
Table 1: Dataset statistics
Models FB15k-237 WN18RR
MRR MR H@1 H@3 H@10 MRR MR H@1 H@3 H@10
Translational Distance
TransE (Bordes et al. 2013) .294 357 - - .465 .226 3384 - - .501
RotatE (Sun et al. 2019) .338 177 .241 .375 .533 .476 3340 .428 .492 .571
PaiRE (Chao et al. 2021) .351 160 .256 .387 .544 - - - - -
Semantic Matching
DistMult (Yang et al. 2015) .241 254 .155 .263 .419 .430 5110 .390 .440 .490
ComplEx (Trouillon et al. 2016) .247 339 .158 .275 .428 .440 5261 .410 .460 .510
TuckER (Balazevic and Allen 2019) .358 - .266 .394 .544 .470 - .443 .482 .526
ConvE (Dettmers et al. 2018) .325 244 .237 .356 .501 .430 4187 .400 .440 .520
InteractE (Vashishth et al. 2020a) .354 172 .263 - .535 .463 5202 .430 - .528
ProcrustEs (Peng et al. 2021) .345 - .249 .379 .541 .474 - .421 .502 .569
GNN-based
R-GCN (Schlichtkrull et al. 2018) .248 - .151 - .417 - - - - -
KBGAT (Nathani et al. 2019)* .157 270 - - .331 .412 1921 - - .554
SACN (Shang et al. 2019) .350 - .260 .390 .540 .470 - .430 .480 .540
A2N (Bansal et al. 2019) .317 - .232 .348 .486 .450 - .420 .460 .510
CompGCN (Vashishth et al. 2020b) .355 197 .264 .390 .535 .479 3533 .443 .494 .546
CP-GNN (ours) .365 178 .276 .397 .554 .482 3214 .447 .492 .571
Table 2: Knowledge Graph Completion results on the FB15k-237 and WN18RR datasets. H@1, H@3 and H@10 denote Hits@1, Hits@3 and Hits@10 respectively. The best results are in bold. CP-GNN achieves SOTA performance in the overall consideration of the five metrics on the two datasets. The results of TransE, RotatE, DistMult, ComplEx and ConvE are from (Sun et al. 2019). * means the results of KBGAT are from (Sun et al. 2020), because the original results suffer from the same-score evaluation problem discussed in Section 4.1. Other results are from the published papers.

4.2 Experimental Results of KGC Task

Our baselines are selected from three categories which are Translational Distance Models: TransE (Bordes et al. 2013), RotatE (Sun et al. 2019), PaiRE (Chao et al. 2021); Semantic Matching Models: DistMult (Yang et al. 2015), ComplEx (Trouillon et al. 2016), TuckER (Balazevic and Allen 2019), ConvE (Dettmers et al. 2018), InteractE (Vashishth et al. 2020a), ProcrustEs (Peng et al. 2021); GNN-based Models: R-GCN (Schlichtkrull et al. 2018), KBGAT (Nathani et al. 2019), SACN (Shang et al. 2019), A2N (Bansal et al. 2019), CompGCN (Vashishth et al. 2020b).

The experimental results are demonstrated in Table 2, where we can see that the proposed CP-GNN model obtains state-of-the-art results compared to current methods. On the FB15k-237 dataset, CP-GNN obtains the best results on four of the five metrics, and on the WN18RR dataset CP-GNN also achieves the best MRR, H@1 and H@10. For the MRR metric, an important indicator of general ranking performance, CP-GNN attains the best results on both datasets, showing that the simultaneous modeling of the two semantic patterns encodes a more comprehensive knowledge representation for the downstream task. For the H@k metrics, CP-GNN achieves the best performance on five of the six terms across the two datasets; the only exception is H@3 on WN18RR, where the result is still competitive. This shows that CP-GNN maintains high prediction accuracy for top-ranked entities. For the MR metric, CP-GNN also gives a competitive performance. Note that for PaiRE (Chao et al. 2021) and KBGAT (Nathani et al. 2019), which report the best MR, their other metrics are not as outstanding, and CP-GNN still achieves the better overall performance.

In addition, we observe that CP-GNN shows a comparatively larger improvement on FB15k-237 than on WN18RR. We consider that such performance differences are caused by the query complexity characteristics of the two datasets. In FB15k-237, there is a high rate of queries with multiple answers (also known as 1-N relations from the triple view), which demands higher modeling capacity to capture the latent relevancy among answers, i.e. the proximity pattern. In WN18RR, most queries have only one answer (also known as 1-1 relations), implying WN18RR is an easier dataset for which the traditional relation pattern is sufficient to some extent. This can also be seen from the consistently better performance on WN18RR relative to FB15k-237 across most models. We think the proximity pattern plays a more important role in complex query scenarios like FB15k-237, which will be further discussed in Section 5.2.

5 Effectiveness Evaluation of Proximity Pattern

In this section we attempt to answer the following questions:
Q1. Does the employment of the proximity pattern improve the model performance on the Knowledge Graph Completion task? (Section 5.1)
Q2. How does the proximity pattern perform for data with different modeling complexity? (Section 5.2)

5.1 Ablation Study of Model Performance

To evaluate the effect of the proximity pattern on KGC performance, we conduct an ablation study by removing the proximity pattern modeling module \mathrm{G}_{p} from CP-GNN, and denote the remaining architecture that only retains the knowledge graph module \mathrm{G}_{r} as CP-GNN (KG). Corresponding to Figure 2, after relation pattern modeling the obtained embedding E_{r} is directly input into the decoder as E_{enc}. The relation embedding R and its transformation remain unchanged. The comparison results on the FB15k-237 test set are summarized in Table 3. We observe performance degradation across all five metrics for CP-GNN (KG), which shows the limited capacity of modeling only the relation pattern and the effectiveness of fusing in the proximity pattern.

Models FB15k-237
MRR MR H@1 H@3 H@10
CP-GNN 0.365 178 0.276 0.397 0.554
CP-GNN (KG) 0.357 193 0.258 0.380 0.531
Table 3: Ablation study of proximity pattern for Knowledge Graph Completion task on FB15k-237 dataset.

5.2 Ablation Study on Data with Different Modeling Complexity

In this section, we further probe how CP-GNN and CP-GNN (KG) perform in subdivided data complexity scenarios. For each query-answer data point, we decide its category based on its answer set size, which serves as an indication of modeling complexity. Intuitively, the more answers that must be satisfied simultaneously for one query, the higher the probability that conflicts happen, which puts forward higher requirements for the model to capture the latent closeness among answer entities. In practice, for every (h,r,t) we count the satisfied entities in the train set for the queries (h,r,?) and (?,r,t), giving it two categories. We denote the category as N-type.

We divide N-type into six ranges: N=0, N=1, 1<N<=10, 10<N<=100, 100<N<=500, and N>500, representing different complexity levels. The statistics of N-type data in FB15k-237 and WN18RR are summarized in Table 4. We can see that in FB15k-237 the proportion of complex data is obviously larger, where 10<N<=100 accounts for 0.27 and 100<N<=500 for 0.13, while the counterparts in WN18RR are 0.09 and 0.04 respectively. This corresponds to our analysis in Section 4.2, that FB15k-237 is a harder dataset on which most models show worse performance compared to WN18RR.
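A minimal sketch of how the N-type of each test case could be computed from the train set is shown below; the helper reuses the hypothetical build_qa_pairs representation introduced earlier.

```python
def n_type(train_answers, h, r, t):
    """Number of train-set answers for the queries this test triple belongs to.
    Each triple yields two cases: (h, r, ?) predicting t, and (?, r, t) predicting h."""
    n_tail = len(train_answers.get((h, r, "?"), set()))   # complexity of (h, r, ?)
    n_head = len(train_answers.get(("?", r, t), set()))   # complexity of (?, r, t)
    return n_tail, n_head
```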

We show the MRR results of CP-GNN and CP-GNN (KG) over the different N-type ranges in Figure 4. We observe that CP-GNN outperforms or matches CP-GNN (KG) in all scenarios, and when N>10 the improvements are especially evident. This shows that the proximity pattern can effectively boost the modeling capacity for complex data. We consider this is because, through the proximity pattern, the potential answer entities are modeled closer to each other, so the model can more easily utilize observed facts to infer unknown answers, as illustrated in Figure 1.

N-type Range FB15k-237 WN18RR
Num. Rate Num. Rate
N=0 6,881 0.17 2,827 0.45
N=1 2,929 0.07 970 0.15
1<N<=10 11,368 0.28 1,657 0.26
10<N<=100 10,965 0.27 581 0.09
100<N<=500 5,359 0.13 233 0.04
N>500 3,430 0.08 0 0.0
Total 40,932 1.0 6,268 1.0
Table 4: N-type statistics of the FB15k-237 and WN18RR test sets. N=0 means that there are no satisfied answer entities in the train set. Because each triple contributes two cases, (h,r,?) and (?,r,t), the total number is double the number of triples.
Figure 4: Ablation study of the proximity pattern with regard to data of different modeling complexity on the FB15k-237 dataset. The horizontal axis is the N-type range and the vertical axis is the MRR metric.

6 Conclusions

In this work we explore the proximity pattern, a new semantic assumption for knowledge graphs, which helps obtain more comprehensive knowledge embeddings. Moreover, we design a Chained couPle-GNN (CP-GNN) architecture to fuse the proximity pattern and the original relation pattern of the knowledge graph globally and deeply. Extensive experiments demonstrate the validity of the proposed proximity pattern and the effectiveness of the fused knowledge representations, especially in complex data scenarios.

References

  • Arora (2020) Arora, S. 2020. A Survey on Graph Neural Networks for Knowledge Graph Completion. CoRR, abs/2007.12374.
  • Balazevic and Allen (2019) Balazevic, I.; and Allen, C. 2019. TuckER: Tensor Factorization for Knowledge Graph Completion. In Inui, K.; Jiang, J.; Ng, V.; and Wan, X., eds., Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, 5184–5193. Association for Computational Linguistics.
  • Bansal et al. (2019) Bansal, T.; Juan, D.; Ravi, S.; and McCallum, A. 2019. A2N: Attending to Neighbors for Knowledge Graph Inference. In Korhonen, A.; Traum, D. R.; and Màrquez, L., eds., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, 4387–4392. Association for Computational Linguistics.
  • Bordes et al. (2013) Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating Embeddings for Modeling Multi-relational Data. In Burges, C. J. C.; Bottou, L.; Ghahramani, Z.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, 2787–2795.
  • Chao et al. (2021) Chao, L.; He, J.; Wang, T.; and Chu, W. 2021. PairRE: Knowledge Graph Embeddings via Paired Relation Vectors. In Zong, C.; Xia, F.; Li, W.; and Navigli, R., eds., Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, 4360–4369. Association for Computational Linguistics.
  • Dettmers et al. (2018) Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2D Knowledge Graph Embeddings. In McIlraith, S. A.; and Weinberger, K. Q., eds., Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, 1811–1818. AAAI Press.
  • Kipf and Welling (2017) Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
  • Nathani et al. (2019) Nathani, D.; Chauhan, J.; Sharma, C.; and Kaul, M. 2019. Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs. In Korhonen, A.; Traum, D. R.; and Màrquez, L., eds., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, 4710–4723. Association for Computational Linguistics.
  • Peng et al. (2021) Peng, X.; Chen, G.; Lin, C.; and Stevenson, M. 2021. Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis. In Toutanova, K.; Rumshisky, A.; Zettlemoyer, L.; Hakkani-Tür, D.; Beltagy, I.; Bethard, S.; Cotterell, R.; Chakraborty, T.; and Zhou, Y., eds., Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, 2364–2375. Association for Computational Linguistics.
  • Schlichtkrull et al. (2018) Schlichtkrull, M. S.; Kipf, T. N.; Bloem, P.; van den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling Relational Data with Graph Convolutional Networks. In Gangemi, A.; Navigli, R.; Vidal, M.; Hitzler, P.; Troncy, R.; Hollink, L.; Tordai, A.; and Alam, M., eds., The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, 593–607. Springer.
  • Shang et al. (2019) Shang, C.; Tang, Y.; Huang, J.; Bi, J.; He, X.; and Zhou, B. 2019. End-to-End Structure-Aware Convolutional Networks for Knowledge Base Completion. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 3060–3067. AAAI Press.
  • Sun et al. (2019) Sun, Z.; Deng, Z.; Nie, J.; and Tang, J. 2019. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
  • Sun et al. (2020) Sun, Z.; Vashishth, S.; Sanyal, S.; Talukdar, P. P.; and Yang, Y. 2020. A Re-evaluation of Knowledge Graph Completion Methods. In Jurafsky, D.; Chai, J.; Schluter, N.; and Tetreault, J. R., eds., Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, 5516–5522. Association for Computational Linguistics.
  • Toutanova and Chen (2015) Toutanova, K.; and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd workshop on continuous vector space models and their compositionality, 57–66.
  • Trouillon et al. (2016) Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex Embeddings for Simple Link Prediction. In Balcan, M.; and Weinberger, K. Q., eds., Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, 2071–2080. JMLR.org.
  • Vashishth et al. (2020a) Vashishth, S.; Sanyal, S.; Nitin, V.; Agrawal, N.; and Talukdar, P. P. 2020a. InteractE: Improving Convolution-Based Knowledge Graph Embeddings by Increasing Feature Interactions. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, 3009–3016. AAAI Press.
  • Vashishth et al. (2020b) Vashishth, S.; Sanyal, S.; Nitin, V.; and Talukdar, P. P. 2020b. Composition-based Multi-Relational Graph Convolutional Networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
  • Velickovic et al. (2018) Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
  • Wang et al. (2018) Wang, H.; Zhang, F.; Wang, J.; Zhao, M.; Li, W.; Xie, X.; and Guo, M. 2018. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In Cuzzocrea, A.; Allan, J.; Paton, N. W.; Srivastava, D.; Agrawal, R.; Broder, A. Z.; Zaki, M. J.; Candan, K. S.; Labrinidis, A.; Schuster, A.; and Wang, H., eds., Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, 417–426. ACM.
  • Wang et al. (2017) Wang, Q.; Mao, Z.; Wang, B.; and Guo, L. 2017. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng., 29(12): 2724–2743.
  • Wang et al. (2014) Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In Brodley, C. E.; and Stone, P., eds., Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada, 1112–1119. AAAI Press.
  • Xu et al. (2019) Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019. How Powerful are Graph Neural Networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
  • Yang et al. (2015) Yang, B.; Yih, W.; He, X.; Gao, J.; and Deng, L. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Bengio, Y.; and LeCun, Y., eds., 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • Yasunaga et al. (2021) Yasunaga, M.; Ren, H.; Bosselut, A.; Liang, P.; and Leskovec, J. 2021. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. In Toutanova, K.; Rumshisky, A.; Zettlemoyer, L.; Hakkani-Tür, D.; Beltagy, I.; Bethard, S.; Cotterell, R.; Chakraborty, T.; and Zhou, Y., eds., Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, 535–546. Association for Computational Linguistics.
  • Zhang et al. (2020) Zhang, H.; Liu, Z.; Xiong, C.; and Liu, Z. 2020. Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs. In Jurafsky, D.; Chai, J.; Schluter, N.; and Tetreault, J. R., eds., Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, 2031–2043. Association for Computational Linguistics.

Appendix A Weight Implementation Details

For the neighbor aggregation mechanism of the knowledge graph (equation 4), \alpha_{ij}^{(l)} describes the weight from source node e_{j} to destination node e_{i} in each layer, and the choices include:

  1. Prior Weight: The straight application of a prior weight is an economical choice. For example, one can use \alpha_{ij}^{(l)}=\frac{1}{d_{i}}, the reciprocal of the out-degree of the source node, implying that the source message should be evenly spread to each destination node.

  2. GCN Weight: The GCN weight is derived from the Graph Convolutional Network (Kipf and Welling 2017) and takes the form \alpha_{ij}=\frac{1}{\sqrt{d_{i}}\sqrt{d_{j}}}, where d_{i} is the degree of node i.

  3. Attention Weight: The attention weight is dynamically calculated in each layer, which can capture the different relative importance of each neighbor. Here we calculate the weight of each relational neighbor (e_{j},r_{j}) to node e_{i} as:

     \alpha_{ij}^{(l)}=\mathrm{Softmax}\left({(\mathbf{e}_{i}^{(l)})}^{T}\,\varphi(\mathbf{e}_{j},\mathbf{r}_{j})\right)

We recommend using the attention weight to dynamically aggregate neighbor information based on importance, which also showed a performance improvement in our experiments (at the cost of extra time and memory).
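A minimal sketch of the attention weight computed over one node's relational neighbors is given below; it uses the additive composition from Section 3.3, operates on a single neighborhood for clarity, and all names are our own assumptions.

```python
import torch

def attention_weights(e_i, neighbor_e, neighbor_r):
    """alpha_ij = softmax over N_i of e_i^T (e_j + r_j), with additive composition phi."""
    msg = neighbor_e + neighbor_r            # phi(e_j, r_j), shape (|N_i|, d)
    logits = msg @ e_i                       # dot product with the destination embedding
    return torch.softmax(logits, dim=0)      # normalized over the neighborhood N_i
```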