Is There More Pattern in Knowledge Graph?
Exploring Proximity Pattern for Knowledge Graph Embedding
Abstract
Modeling of relation patterns is the core focus of previous Knowledge Graph Embedding works, which represent how one entity is related to another semantically by some explicit relation. However, a more natural and intuitive relevancy among entities has long been ignored: how close one entity is to another semantically, without the consideration of any explicit relation. We name this semantic phenomenon in knowledge graphs the proximity pattern. In this work, we explore how to define and represent the proximity pattern, and how it can be utilized to help knowledge graph embedding. First, we define the proximity of any two entities according to their statistically shared queries, then we construct a derived graph structure and represent the proximity pattern from a global view. Moreover, together with the original knowledge graph, we design a Chained couPle-GNN (CP-GNN) architecture to deeply merge the two patterns (graphs) together, which can encode a more comprehensive knowledge embedding. Evaluated on the FB15k-237 and WN18RR datasets, CP-GNN achieves state-of-the-art results on the Knowledge Graph Completion task, and can especially boost the modeling capacity for complex queries that contain multiple answer entities, demonstrating the effectiveness of the introduced proximity pattern.
1 Introduction
Knowledge Graphs (KGs) are large, graph-structured databases which store facts in triple form $(h, r, t)$, denoting that head entity $h$ and tail entity $t$ satisfy relation $r$. Knowledge Graph Embedding (KGE) is the task of learning low-dimensional representations for entities and relations, so that symbolic knowledge can be integrated into numerical models to support various downstream knowledge-based tasks, such as recommendation systems (Wang et al. 2018), question answering (Yasunaga et al. 2021) and text generation (Zhang et al. 2020). To evaluate the effectiveness of learned embeddings, the Knowledge Graph Completion (KGC) task is usually adopted (Dettmers et al. 2018; Balazevic and Allen 2019; Vashishth et al. 2020a), aiming at predicting tail entities given $(h, r, ?)$ or head entities given $(?, r, t)$. Essentially, KGC can be regarded as a query-answer task and, without loss of generality, we denote the query as $q$ (either $(h, r, ?)$ or $(?, r, t)$) and the answer as $a$.
The general intuition of previous KGE works is to model the relation pattern of the knowledge graph, which represents how one entity is related to another semantically by some explicit relation. For example, TransE (Bordes et al. 2013) models relations as addition operations from head entities to tail entities. RotatE (Sun et al. 2019) treats relations as rotations between entities in the complex field. GNN-based models like R-GCN (Schlichtkrull et al. 2018) and CompGCN (Vashishth et al. 2020b) focus on capturing the relation pattern from a global graph view, through a neighborhood aggregation mechanism. Meanwhile, a more natural and intuitive relevancy among entities is always ignored: how close one entity is to another semantically, without the consideration of any explicit relation. We name this semantic phenomenon in knowledge graphs the proximity pattern.
Concretely, if a query holds multiple answers simultaneously, we consider those answers to satisfy the proximity pattern. We assume that answers under the same query share some common characteristics, which draw them semantically close. For example, given the query (Robert Downey Jr., act, ?), some of the answers like The Avengers, Iron Man 2 and Avengers: Age of Ultron all possess characteristics such as superhero film, comic book film and Marvel Studios production, which draw them semantically close. Other examples, including movies from the same director, papers published by the same laboratory, and multiple hyponyms of the same hypernym, support the same phenomenon.
Intuitively, it is necessary to capture and model the proximity pattern since it can provide beneficial evidence during the inference phase. As in Figure 1, we can see that The Avengers has strong semantic proximity with Avengers: Age of Ultron, because they frequently co-occur as answers to the same queries. Then if the fact (Marvel Studios, product, The Avengers) is seen during training, it will be easy to infer (Marvel Studios, product, Avengers: Age of Ultron). Note that although the neighbor aggregation mechanism of some GNN-based KGE works can also capture proximity information to some extent, without explicit guidance its effective utilization is still limited.

Inspired by this intuition, in this work we explore how to define and represent the proximity pattern, and how it can be utilized to help knowledge graph embedding. The overall framework is demonstrated in Figure 2. First, we define the proximity of any two entities according to their statistically shared queries, then based on the proximity values we construct a derived proximity graph. This can be seen as extracting a new semantic graph structure from the original knowledge graph, which represents the proximity pattern from a global view. We then employ Graph Neural Networks (GNNs) to model and merge the two patterns (graphs) together, given their powerful graph structure learning ability revealed in recent years (Kipf and Welling 2017; Velickovic et al. 2018; Xu et al. 2019). We first employ a relation-aware GNN on the knowledge graph and a homogeneous GNN on the proximity graph respectively, to capture the global semantic interactions within each pattern. Then we stack the two GNNs together, to capture the deep semantic interactions across patterns. We name the overall architecture Chained couPle-GNN (CP-GNN), reflecting the separate and sequential modeling applied here. In CP-GNN, the proximity GNN is placed at the back of the chain, because we treat proximity modeling as an enhancement of the original knowledge graph embedding. After fusing in the proximity pattern, the encoder obtains a more comprehensive knowledge embedding. Finally, ConvE (Dettmers et al. 2018) is chosen as the decoder to perform the Knowledge Graph Completion task.
In summary, our main contributions are as follows:
- To the best of our knowledge, this is the first work to propose the proximity pattern concept in the knowledge graph field, which serves as a different semantic assumption compared with the traditional relation pattern.
- We dive into the way to fuse proximity pattern and relation pattern for more comprehensive knowledge graph embedding, and design a Chained couPle-GNN (CP-GNN) architecture to sufficiently capture the global and deep interactions of the two patterns.
- Extensive experiments on FB15k-237 and WN18RR datasets demonstrate the effectiveness of our proposed proximity pattern and CP-GNN KGE model.

2 Related Work
Knowledge Graph Embedding (KGE) is an active research area, where existing works mainly aim to model the relation pattern of KGs and can be roughly divided into three families (Wang et al. 2017; Arora 2020): Translational Distance Models apply distance-based scoring functions and model relations as some operation between head and tail entities, like the addition operation in TransE (Bordes et al. 2013), hyper-plane addition in TransH (Wang et al. 2014), complex-field rotation in RotatE (Sun et al. 2019), etc. Semantic Matching Models employ similarity-based functions to directly score a triple: DistMult (Yang et al. 2015) and ComplEx (Trouillon et al. 2016) employ multiplicative models to represent the likelihood of a fact, while ConvE (Dettmers et al. 2018) and InteractE (Vashishth et al. 2020a) apply neural networks for similarity modeling. GNN-based Models are proposed to encode the relation pattern from a global graph structure angle. R-GCN (Schlichtkrull et al. 2018) improves GCNs by introducing relation-specific transformations during neighbor aggregation. A2N (Bansal et al. 2019) introduces an attentional aggregation mechanism to adaptively merge relevant neighborhoods into the query representation. Moreover, CompGCN (Vashishth et al. 2020b) generalizes different neighbor aggregation methods for knowledge graphs as entity-relation composition operations, giving a unified framework for GNN-based works. All three families of KGE works lack consideration of the latent semantic relevancy that describes how entities are close to each other without an explicit relation.
3 Methodology
In section 3.1, we first introduce the Knowledge Graph Embedding problem. Then in section 3.2, we introduce the definition, representation and modeling method of the proximity pattern, and in section 3.3, we discuss the modeling method of the relation pattern. Finally, in section 3.4, we introduce the CP-GNN architecture and the model training process.
3.1 Problem Definition
A knowledge graph is denoted by $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where $\mathcal{E}$ and $\mathcal{R}$ represent the sets of entities and relations, and $\mathcal{T} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ is the set of triple facts. The Knowledge Graph Embedding (KGE) task tries to learn a scoring function such that given a query $q$ and an entity $e$, the output score measures the likelihood of $e$ being one of the answers of $q$.
During calculation, $\mathcal{E}$ and $\mathcal{R}$ are represented as embedding matrices. The entity embedding matrix is formulated as $\mathbf{E} \in \mathbb{R}^{|\mathcal{E}| \times d}$, where $|\mathcal{E}|$ denotes the number of entities and $d$ is the dimension of the embedding. For a single entity $e_i$, we denote its corresponding vector in boldface form as $\mathbf{e}_i$, which is the transposition of the $i$-th row of $\mathbf{E}$. Similarly, the relation embedding matrix is denoted as $\mathbf{R} \in \mathbb{R}^{|\mathcal{R}| \times d}$, where $|\mathcal{R}|$ is the number of relations and $\mathbf{r}_j$ is the embedding vector of relation $r_j$.
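For illustration, a minimal PyTorch sketch of these embedding matrices; the sizes below are placeholders rather than the paper's settings:

```python
import torch.nn as nn

# Illustrative sizes only; |E|, |R| and d depend on the dataset and hyper-parameter search.
num_entities, num_relations, dim = 14541, 237, 200

entity_emb = nn.Embedding(num_entities, dim)     # E in R^{|E| x d}
relation_emb = nn.Embedding(num_relations, dim)  # R in R^{|R| x d}

e_i = entity_emb.weight[42]  # embedding vector of a single entity, shape (dim,)
```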
3.2 Proximity Pattern Modeling
Proximity Pattern Definition
To introduce the proximity concept, we first give the definition of a Query-Answer (QA) pair:
Definition 1 (Query-Answer pair, QA pair)
A Query-Answer (QA) pair $(q, \mathcal{A}(q))$ consists of a query $q = (h, r, ?)$ or $q = (?, r, t)$, and an answer set $\mathcal{A}(q)$ that contains all the answer entities of query $q$ in the knowledge graph.
Each triple fact $(h, r, t)$ is included in two QA pairs, for the $(h, r, ?)$ and $(?, r, t)$ directions respectively. From the triple set $\mathcal{T}$, we can thus obtain a QA pair set $\mathcal{D}$.
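A small sketch of how such a QA pair set could be built from the triple set; the dictionary keys and helper name are our own choices:

```python
from collections import defaultdict

def build_qa_pairs(triples):
    """Group triples into Query-Answer pairs for both (h, r, ?) and (?, r, t) directions."""
    qa = defaultdict(set)
    for h, r, t in triples:
        qa[("head", h, r)].add(t)  # query (h, r, ?) -> its tail answers
        qa[("tail", r, t)].add(h)  # query (?, r, t) -> its head answers
    return qa

# Toy example: two triples sharing the query (0, 5, ?)
qa_pairs = build_qa_pairs([(0, 5, 10), (0, 5, 11), (3, 2, 0)])
assert qa_pairs[("head", 0, 5)] == {10, 11}
```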
The core hypothesis we put forward for the proximity pattern is that entities in the same answer set share characteristics, like movies from the same director, papers published by the same laboratory, etc. The term proximity means that such entities should be close to each other both in the cognitive semantic space and in the numerical embedding space. To formally describe the concept, we propose the Proximity Measure (PM) as follows:
Definition 2 (Proximity Measure, PM)
Given a QA pair $(q, \mathcal{A}(q))$ with $2 \le |\mathcal{A}(q)| \le M$, for any two entities $e_i, e_j \in \mathcal{A}(q)$ ($i \ne j$), the Proximity Measure (PM) between $e_i$ and $e_j$ with regard to $q$ is defined as: $\mathrm{PM}(e_i, e_j \mid q) = \frac{1}{|\mathcal{A}(q)| - 1}$.
$M$ is a hyper-parameter that represents the threshold size of the answer set. PM is inversely proportional to the answer set size, taking its maximum value of $1$ for $|\mathcal{A}(q)| = 2$ and its minimum value of $\frac{1}{M-1}$ for $|\mathcal{A}(q)| = M$. This comes from the observation that the concentration of semantic proximity decays as the number of answers increases. For example, the more movies one actor participates in, the lower the probability that these movies share the same characteristics (genre, style, etc.).
Beyond a single particular query, by considering all the observed shared queries in the dataset, we define the Statistical Proximity Measure as:
Definition 3 (Statistical Proximity Measure, SPM)
For any entity pair $e_i, e_j \in \mathcal{E}$, the Statistical Proximity Measure (SPM) between $e_i$ and $e_j$ is: $\mathrm{SPM}(e_i, e_j) = \sum_{(q, \mathcal{A}(q)) \in \mathcal{D}} \mathrm{PM}(e_i, e_j \mid q),$
where $\mathcal{D}$ is the set of QA pairs, and $\mathrm{PM}(e_i, e_j \mid q) = 0$ if $e_i$ or $e_j$ is not in $\mathcal{A}(q)$.
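A compact sketch of how SPM could be accumulated over the QA pair set built above; the concrete PM expression inside follows our reading of Definition 2 and should be treated as an assumption:

```python
from collections import defaultdict
from itertools import combinations

def compute_spm(qa_pairs, max_answers):
    """Accumulate SPM over all QA pairs; returns {(e_i, e_j): value} with e_i < e_j."""
    spm = defaultdict(float)
    for answers in qa_pairs.values():
        n = len(answers)
        if n < 2 or n > max_answers:
            continue  # single-answer or over-large answer sets contribute nothing
        pm = 1.0 / (n - 1)  # PM decaying with the answer set size (assumed form)
        for e_i, e_j in combinations(sorted(answers), 2):
            spm[(e_i, e_j)] += pm
    return spm
```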
After obtaining the SPM between any entity pair, we can give the formal definition of the proximity pattern of knowledge graph.
Definition 4 (Proximity Pattern)
For a knowledge graph $\mathcal{G}$, the Proximity Pattern of $\mathcal{G}$ is a matrix $\mathbf{P} \in \mathbb{R}^{|\mathcal{E}| \times |\mathcal{E}|}$ with $\mathbf{P}_{ij} = \mathrm{SPM}(e_i, e_j)$,
where $|\mathcal{E}|$ is the number of entities in the knowledge graph, and $\mathbf{P}_{ij}$ is the value of the $ij$-th entry of $\mathbf{P}$.
Proximity Graph Construction
According to Definition 4, the proximity pattern is a matrix $\mathbf{P}$ that describes the semantic proximity between any pair of entities. In order to represent the global interactions among entities, we construct a proximity graph based on $\mathbf{P}$. Specifically, given a minimum threshold $\epsilon$, if the entry $\mathbf{P}_{ij} \ge \epsilon$, we connect an undirected edge between $e_i$ and $e_j$ in the graph and set the weight of the edge to $\mathbf{P}_{ij}$. An illustration can be seen in Figure 3.
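A sketch of this thresholding step, reusing the `spm` dictionary from the previous snippet; `min_spm` stands for the minimum threshold:

```python
def build_proximity_graph(spm, min_spm):
    """Keep only entity pairs whose SPM reaches the threshold.

    Returns a list of weighted undirected edges (e_i, e_j, weight).
    """
    return [(e_i, e_j, w) for (e_i, e_j), w in spm.items() if w >= min_spm]
```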
GNN for Modeling Proximity Pattern
After constructing the proximity graph, we can utilize the extensive body of graph representation learning work to encode the proximity pattern into embeddings. Here we choose the Graph Neural Network (GNN), which has shown its competence in modeling global and complex graph features.
We start with a single GNN layer. For each entity $e_i$, denoting its input entity embedding at the $l$-th layer as $\mathbf{h}_i^{(l)}$, the neighbor aggregation mechanism is formulated as:
$\alpha_{ij} = \mathrm{softmax}_{j \in \mathcal{N}_p(i)}\left(\mathbf{P}_{ij}\right),$ (1)
$\mathbf{h}_{\mathcal{N}(i)}^{(l)} = \sum_{j \in \mathcal{N}_p(i)} \alpha_{ij}\, \mathbf{h}_j^{(l)}.$ (2)
$\mathcal{N}_p(i)$ denotes the neighborhood of entity $e_i$ in the proximity graph. $\alpha_{ij}$ is the normalized version of $\mathbf{P}_{ij}$ computed by the softmax function across $\mathcal{N}_p(i)$. $\mathbf{h}_{\mathcal{N}(i)}^{(l)}$ is the neighbor representation of entity $e_i$ at the $l$-th layer, which can be seen as the weighted summarization of the neighbors according to their proximity with $e_i$.
After getting the neighbor representation, we employ it to update the entity embedding:
$\mathbf{h}_i^{(l+1)} = \sigma\left(\mathbf{W}_p \left(\mathbf{h}_i^{(l)} + \mathbf{h}_{\mathcal{N}(i)}^{(l)}\right)\right)$ (3)
$\mathbf{W}_p$ is the linear transformation matrix for the proximity pattern. $\mathbf{h}_i^{(l+1)}$ is either the output of the $l$-th GNN layer or the input of the $(l+1)$-th layer. $\sigma$ is the non-linear activation function.
This modeling process corresponds to $\mathrm{GNN}_P$ in Figure 2. After $L$ layers of aggregation and updating, we take the resulting entity embedding matrix as the output of $\mathrm{GNN}_P$ and denote it as $\mathbf{E}_P$, into which the proximity pattern is integrated.
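A minimal PyTorch sketch of one such layer under our reading of Eqs. (1)-(3); the softmax is taken per destination node over its proximity neighborhood, and the self-plus-neighbor combination in the update is our assumption:

```python
import torch
import torch.nn as nn

class ProximityGNNLayer(nn.Module):
    """One aggregation/update step on the proximity graph (a sketch of Eqs. (1)-(3))."""

    def __init__(self, dim):
        super().__init__()
        self.w_p = nn.Linear(dim, dim, bias=False)  # W_p, transformation for the proximity pattern
        self.act = nn.Tanh()

    def forward(self, h, edges, weights):
        # h: (|E|, d) entity embeddings; edges: (2, m) long tensor of undirected edges
        # listed in both directions; weights: (m,) raw SPM values P_ij per edge.
        src, dst = edges
        # Eq. (1): softmax-normalize SPM over each destination node's neighborhood.
        alpha = torch.exp(weights)
        denom = torch.zeros(h.size(0), device=h.device).index_add_(0, dst, alpha)
        alpha = alpha / denom[dst].clamp(min=1e-12)
        # Eq. (2): weighted sum of neighbor embeddings.
        h_nbr = torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
        # Eq. (3): linear update with non-linearity (self + neighbor combination assumed).
        return self.act(self.w_p(h + h_nbr))
```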

3.3 Relation Pattern Modeling
The relation pattern describes how one entity is related to another by some explicit relation, which is represented by knowledge graph triples, and can be formally defined as:
Definition 5 (Relation Pattern)
For a knowledge graph $\mathcal{G}$, the Relation Pattern of $\mathcal{G}$ is a tensor $\mathbf{K} \in \{0, 1\}^{|\mathcal{E}| \times |\mathcal{R}| \times |\mathcal{E}|}$, with $\mathbf{K}_{ijk} = 1$ if $(e_i, r_j, e_k) \in \mathcal{T}$ and $\mathbf{K}_{ijk} = 0$ otherwise,
where $|\mathcal{E}|$ and $|\mathcal{R}|$ are the numbers of entities and relations in the knowledge graph, and $\mathbf{K}_{ijk}$ is the value of the $ijk$-th entry of $\mathbf{K}$.
According to Definition 5, the knowledge graph itself is the sparse embodiment of the relation pattern; hence, to capture the relation pattern in a global way, we introduce a relation-aware GNN model, which corresponds to $\mathrm{GNN}_{KG}$ in Figure 2.
For a single layer, the neighbor aggregation function is formulated as:
$\mathbf{h}_{\mathcal{N}(i)}^{(l)} = \sum_{(e_j, r) \in \mathcal{N}_r(i)} \beta_{ij}\, \phi\left(\mathbf{h}_j^{(l)}, \mathbf{z}_r\right)$ (4)
$\mathcal{N}_r(i)$ denotes the relational neighbors of entity $e_i$, which are the neighbor entities associated with their connecting relations. $\mathbf{z}_r$ is the initial embedding of relation $r$ and remains invariant across layers, which will be explained later. $\phi(\cdot, \cdot)$ is the composition function that fuses the entity and relation information. The choices include the additive function $\phi(\mathbf{h}_j, \mathbf{z}_r) = \mathbf{h}_j + \mathbf{z}_r$; the multiplicative function $\phi(\mathbf{h}_j, \mathbf{z}_r) = \mathbf{h}_j \odot \mathbf{z}_r$, where $\odot$ denotes element-wise multiplication; and a Multilayer Perceptron $\phi(\mathbf{h}_j, \mathbf{z}_r) = \mathrm{MLP}([\mathbf{h}_j \| \mathbf{z}_r])$, where $\|$ is the vector concatenation operation. Here we choose the additive function, which is the most computationally efficient one. $\beta_{ij}$ is the weight for relational neighbor $e_j$; the discussion of different weight choices is placed in Appendix A.
Then we update the entity embedding by:
$\mathbf{h}_i^{(l+1)} = \sigma\left(\mathbf{W}_r \left(\mathbf{h}_i^{(l)} + \mathbf{h}_{\mathcal{N}(i)}^{(l)}\right)\right)$ (5)
$\mathbf{W}_r$ is the linear transformation matrix for the relation pattern. After $L$ layers, we use the output entity embedding matrix as the final output of $\mathrm{GNN}_{KG}$, denoted as $\mathbf{E}_{KG}$.
For Equation 4, different from previous works (Vashishth et al. 2020b) where relations are also updated in each layer, we keep the relation embeddings unchanged. Here we assume that relations serve as translation operations between entities, and the operation itself should keep a consistent effect across layers.
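A companion PyTorch sketch of the relation-aware layer of Eqs. (4)-(5), using the additive composition selected above; the uniform neighbor weight is only a stand-in for the choices discussed in Appendix A:

```python
import torch
import torch.nn as nn

class RelationGNNLayer(nn.Module):
    """One layer of the relation-aware GNN on the knowledge graph (a sketch of Eqs. (4)-(5))."""

    def __init__(self, dim):
        super().__init__()
        self.w_r = nn.Linear(dim, dim, bias=False)  # W_r, transformation for the relation pattern
        self.act = nn.Tanh()

    def forward(self, h, z, edges, edge_type):
        # h: (|E|, d) entity embeddings; z: (|R|, d) initial relation embeddings, kept
        # fixed across layers; edges: (2, m) with inverse edges already added;
        # edge_type: (m,) relation id of each edge.
        src, dst = edges
        msg = h[src] + z[edge_type]  # additive composition phi(h_j, z_r) = h_j + z_r
        deg = torch.zeros(h.size(0), device=h.device).index_add_(
            0, dst, torch.ones_like(dst, dtype=h.dtype))
        h_nbr = torch.zeros_like(h).index_add_(0, dst, msg)
        h_nbr = h_nbr / deg.clamp(min=1).unsqueeze(-1)  # uniform weight beta_ij; see Appendix A
        return self.act(self.w_r(h + h_nbr))
```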
3.4 Pattern Fusion and Training
In this section we introduce our proposed CP-GNN framework that merges the two patterns together, along with an end-to-end encoder-decoder training paradigm.
At the encoding stage, as introduced in sections 3.2 and 3.3, two GNNs are employed to capture the global semantic interactions within each pattern, denoted as $\mathrm{GNN}_{KG}$ and $\mathrm{GNN}_P$. Then we stack the two GNNs together to capture the deep semantic interactions across patterns. $\mathrm{GNN}_P$ is placed at the back of the chain, because we treat proximity modeling as an enhancement of the original knowledge graph embedding. After fusing in the proximity pattern, the encoder obtains a more comprehensive knowledge embedding.
Given the initial entity and relation embedding matrices $\mathbf{E}^{(0)}$ and $\mathbf{R}^{(0)}$, the encoder process follows:
$\mathbf{E}_{KG} = \mathrm{GNN}_{KG}\left(\mathbf{E}^{(0)}, \mathbf{R}^{(0)}\right),$ (6)
$\mathbf{E}_P = \mathrm{GNN}_P\left(\mathbf{E}_{KG}\right).$ (7)
For the relation embedding $\mathbf{R}^{(0)}$, we transform it by a Multilayer Perceptron (MLP) to get the output embedding. The total output of the encoder can be described as:
$\mathbf{E}_{out} = \mathbf{E}_P,$ (8)
$\mathbf{R}_{out} = \mathrm{MLP}\left(\mathbf{R}^{(0)}\right).$ (9)
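Putting the pieces together, a sketch of the encoder forward pass built from the two layer classes sketched earlier; the layer counts and the relation MLP width are placeholders:

```python
import torch.nn as nn

class CPGNNEncoder(nn.Module):
    """Chained couple-GNN encoder: relation-pattern GNN first, proximity-pattern GNN on top."""

    def __init__(self, dim, kg_layers=2, prox_layers=1):
        super().__init__()
        self.kg_gnn = nn.ModuleList([RelationGNNLayer(dim) for _ in range(kg_layers)])
        self.prox_gnn = nn.ModuleList([ProximityGNNLayer(dim) for _ in range(prox_layers)])
        self.rel_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ent_emb, rel_emb, kg_edges, kg_edge_type, prox_edges, prox_weights):
        h = ent_emb
        for layer in self.kg_gnn:        # Eq. (6): relation pattern on the knowledge graph
            h = layer(h, rel_emb, kg_edges, kg_edge_type)
        for layer in self.prox_gnn:      # Eq. (7): proximity pattern stacked on top
            h = layer(h, prox_edges, prox_weights)
        return h, self.rel_mlp(rel_emb)  # Eqs. (8)-(9): E_out and R_out
```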
In the decoder module, we perform the Knowledge Graph Completion task. The module takes the embedding pair of a query $(\mathbf{e}_q, \mathbf{r}_q)$ as input, and aims to measure the answer likelihood with regard to all the entities in $\mathcal{E}$. Here we choose ConvE (Dettmers et al. 2018) as our decoder implementation, which uses a 2D convolutional neural network to match queries and answers. We refer readers to the original paper for more details, and here we directly utilize the decoder function as:
$\mathbf{p} = \sigma\left(\mathrm{ConvE}\left(\mathbf{e}_q, \mathbf{r}_q\right) \mathbf{E}_{out}^{\top}\right)$ (10)
$\mathbf{p} \in \mathbb{R}^{|\mathcal{E}|}$ contains the matching scores between the query and all entities. Then we use the binary cross entropy loss to measure the difference between the model output $\mathbf{p}$ and the target $\mathbf{y}$:
(11) | |||
$y_i$ and $p_i$ denote the $i$-th entries of the target and output vectors. Then the Stochastic Gradient Descent (SGD) algorithm is applied to train the model predictions to approximate the targets.
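A short sketch of one training step under our reading of Eqs. (10)-(11); `conve_score` is a placeholder for the ConvE matching function, which we do not reproduce here:

```python
import torch
import torch.nn.functional as F

def training_step(query_ent, query_rel, targets, ent_out, rel_out, conve_score):
    """One KGC training step: score the query against all entities, then BCE loss.

    query_ent, query_rel: (b,) id tensors; targets: (b, |E|) 0/1 float matrix marking all
    true answers; ent_out/rel_out: encoder outputs; conve_score: callable -> (b, |E|) scores.
    """
    scores = conve_score(ent_out[query_ent], rel_out[query_rel], ent_out)  # Eq. (10), pre-sigmoid
    probs = torch.sigmoid(scores)
    return F.binary_cross_entropy(probs, targets)                          # Eq. (11)
```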
4 Experiments
4.1 Experiment Setup
Dataset
We conduct experiments on the Knowledge Graph Completion task on two commonly used public datasets: FB15k-237 (Toutanova and Chen 2015) and WN18RR (Dettmers et al. 2018). FB15k-237 contains entities and relations from Freebase, a large commonsense knowledge base. WN18RR is created from WordNet, a lexical database of semantic relations between words. Each dataset is split into train, valid and test sets. The statistics of the two datasets are summarized in Table 1.
Evaluation Protocol
The model performance is measured by five frequently used metrics: MRR (the Mean Reciprocal Rank of correct entities), MR (the Mean Rank of correct entities), and Hits@1, Hits@3, Hits@10 (the accuracy of correct entities ranking in the top 1/3/10). We follow the filtered setting protocol (Bordes et al. 2013) for evaluation, i.e. all the other true entities appearing in the train, valid and test sets are excluded when ranking. In addition, based on the observation of (Sun et al. 2020), to eliminate the problem of abnormal score distributions, if the prediction target has the same score as multiple other entities, we take the average of the upper-bound and lower-bound ranks as the result.
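A small numpy sketch of this tie-aware filtered ranking; the function name and argument layout are illustrative:

```python
import numpy as np

def filtered_rank(scores, target_idx, filter_idx):
    """Rank of the target entity under the filtered setting, averaging over ties.

    scores: (|E|,) array of model scores; target_idx: id of the correct entity;
    filter_idx: ids of all other known true answers, excluded from the ranking.
    """
    scores = scores.copy()
    scores[filter_idx] = -np.inf                 # filtered setting: drop other true entities
    target = scores[target_idx]
    higher = np.sum(scores > target)             # entities strictly better than the target
    ties = np.sum(scores == target) - 1          # other entities with exactly the same score
    # average of the best possible rank (higher + 1) and the worst (higher + ties + 1)
    return higher + 1 + ties / 2.0
```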
Knowledge Graph Construction
Here we give some details of the knowledge graph construction process, which proved effective in our experiments (a small sketch of both steps follows the list):
- Following (Vashishth et al. 2020b), we transform the knowledge graph into an undirected graph by introducing an inverse edge $(t, r^{-1}, h)$ for each edge $(h, r, t)$, which aims to pass information bidirectionally and enhance the graph connectivity.
- For each training batch, we randomly remove a proportion of edges in the knowledge graph, so that the model focuses more on how to predict missing edges in the graph, which is closer to the inference process.
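The sketch below illustrates both tricks; the edge tensors follow the (2, m) layout used in the earlier layer sketches, and the drop rate is a tunable hyper-parameter:

```python
import torch

def add_inverse_edges(edges, edge_type, num_relations):
    """Append an inverse edge (t, r_inv, h) for every (h, r, t); r_inv = r + num_relations."""
    src, dst = edges
    inv_edges = torch.stack([dst, src])
    inv_type = edge_type + num_relations
    return torch.cat([edges, inv_edges], dim=1), torch.cat([edge_type, inv_type])

def drop_edges(edges, edge_type, drop_rate):
    """Randomly remove a proportion of edges for the current training batch."""
    keep = torch.rand(edges.size(1)) >= drop_rate
    return edges[:, keep], edge_type[keep]
```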
Hyper-parameter setting
The hyper-parameters involved in our work include: the batch size, the learning rate, the dimension of entity and relation embeddings, the number of GNN layers on the knowledge graph, the number of GNN layers on the proximity graph, the rate of randomly removed batch edges, the maximum answer set size $M$ of a QA pair (Definition 2), and the minimum SPM threshold $\epsilon$ (Definition 3). We tune the hyper-parameters by grid search.
Dataset | FB15k-237 | WN18RR
---|---|---
# entity | 14,541 | 40,943
# relation | 237 | 11
# train triple | 272,115 | 86,835
# valid triple | 17,535 | 3,034
# test triple | 20,466 | 3,134
Models | FB15k-237 MRR | MR | H@1 | H@3 | H@10 | WN18RR MRR | MR | H@1 | H@3 | H@10
---|---|---|---|---|---|---|---|---|---|---
Translational Distance | | | | | | | | | |
TransE (Bordes et al. 2013) | .294 | 357 | - | - | .465 | .226 | 3384 | - | - | .501
RotatE (Sun et al. 2019) | .338 | 177 | .241 | .375 | .533 | .476 | 3340 | .428 | .492 | .571
PairRE (Chao et al. 2021) | .351 | 160 | .256 | .387 | .544 | - | - | - | - | -
Semantic Matching | | | | | | | | | |
DistMult (Yang et al. 2015) | .241 | 254 | .155 | .263 | .419 | .430 | 5110 | .390 | .440 | .490
ComplEx (Trouillon et al. 2016) | .247 | 339 | .158 | .275 | .428 | .440 | 5261 | .410 | .460 | .510
TuckER (Balazevic and Allen 2019) | .358 | - | .266 | .394 | .544 | .470 | - | .443 | .482 | .526
ConvE (Dettmers et al. 2018) | .325 | 244 | .237 | .356 | .501 | .430 | 4187 | .400 | .440 | .520
InteractE (Vashishth et al. 2020a) | .354 | 172 | .263 | - | .535 | .463 | 5202 | .430 | - | .528
ProcrustEs (Peng et al. 2021) | .345 | - | .249 | .379 | .541 | .474 | - | .421 | .502 | .569
GNN-based | | | | | | | | | |
R-GCN (Schlichtkrull et al. 2018) | .248 | - | .151 | - | .417 | - | - | - | - | -
KBGAT (Nathani et al. 2019)* | .157 | 270 | - | - | .331 | .412 | 1921 | - | - | .554
SACN (Shang et al. 2019) | .350 | - | .260 | .390 | .540 | .470 | - | .430 | .480 | .540
A2N (Bansal et al. 2019) | .317 | - | .232 | .348 | .486 | .450 | - | .420 | .460 | .510
CompGCN (Vashishth et al. 2020b) | .355 | 197 | .264 | .390 | .535 | .479 | 3533 | .443 | .494 | .546
CP-GNN (ours) | .365 | 178 | .276 | .397 | .554 | .482 | 3214 | .447 | .492 | .571
4.2 Experimental Results of KGC Task
Our baselines are selected from three categories. Translational Distance Models: TransE (Bordes et al. 2013), RotatE (Sun et al. 2019), PairRE (Chao et al. 2021); Semantic Matching Models: DistMult (Yang et al. 2015), ComplEx (Trouillon et al. 2016), TuckER (Balazevic and Allen 2019), ConvE (Dettmers et al. 2018), InteractE (Vashishth et al. 2020a), ProcrustEs (Peng et al. 2021); GNN-based Models: R-GCN (Schlichtkrull et al. 2018), KBGAT (Nathani et al. 2019), SACN (Shang et al. 2019), A2N (Bansal et al. 2019), CompGCN (Vashishth et al. 2020b).
The experimental results are demonstrated in Table 2, where we can see that the proposed CP-GNN model obtains state-of-the-art results compared to current methods. On the FB15k-237 dataset, CP-GNN obtains the best results on four of the five metrics, and on the WN18RR dataset CP-GNN also achieves the best MRR, H@1 and H@10. For the MRR metric, which is an important indicator of general ranking performance, CP-GNN attains the best results on both datasets, showing that the simultaneous modeling of the two semantic patterns encodes a more comprehensive knowledge representation for the downstream task. For the H@k metrics, CP-GNN achieves the best performance on five of the six entries across the two datasets; the only exception is H@3 on WN18RR, where the result is still competitive. This shows that CP-GNN maintains high prediction accuracy for top-ranked entities. For the MR metric, CP-GNN also gives a competitive performance. Note that for PairRE (Chao et al. 2021) and KBGAT (Nathani et al. 2019), which report the best MR values, their other metrics are not as outstanding, and CP-GNN still achieves the better overall performance.
In addition, we observe that CP-GNN shows a comparatively better performance on FB15k-237 than on WN18RR. We attribute this difference to the query complexity characteristics of the two datasets. In FB15k-237, there is a high rate of queries with multiple answers (also known as 1-N relations from the triple view), which demands a higher modeling capacity to capture the latent relevancy among answers, i.e. the proximity pattern. In WN18RR, most queries have only one answer (also known as 1-1 relations), implying that WN18RR is an easier dataset for which the traditional relation pattern is sufficient to some extent. This is also supported by the consistently better performance on WN18RR relative to FB15k-237 across most models. We think that the proximity pattern plays a more important role in complex query scenarios such as FB15k-237, which will be further discussed in section 5.2.
5 Effectiveness Evaluation of Proximity Pattern
In this section we attempt to answer the following questions:
Q1. Does the employment of the proximity pattern indeed improve the model performance on the Knowledge Graph Completion task? (section 5.1)
Q2. How does the proximity pattern perform on data with different modeling complexity? (section 5.2)
5.1 Ablation Study of Model Performance
To evaluate the effect of the proximity pattern on model performance for the KGC task, we conduct an ablation study that removes the proximity pattern modeling module from CP-GNN, and denote the remaining architecture, which only retains the knowledge graph module, as CP-GNN (KG). Corresponding to Figure 2, after relation pattern modeling the obtained embedding $\mathbf{E}_{KG}$ is directly input into the decoder as $\mathbf{E}_{out}$. The relation embedding and its transformation remain unchanged. The comparison results on the FB15k-237 test set are summarized in Table 3. We observe performance degradation across all five metrics for CP-GNN (KG), which shows the limited capacity of modeling the relation pattern alone and the effectiveness of fusing in the proximity pattern.
Models (FB15k-237) | MRR | MR | H@1 | H@3 | H@10
---|---|---|---|---|---
CP-GNN | 0.365 | 178 | 0.276 | 0.397 | 0.554
CP-GNN (KG) | 0.357 | 193 | 0.258 | 0.380 | 0.531
5.2 Ablation Study on Data with Different Modeling Complexity
In this section, we further probe how CP-GNN and CP-GNN (KG) perform in subdivided data complexity scenarios. For each query-answer data point, we decide its category based on its answer set size, which serves as an indication of modeling complexity. Intuitively, the more answers that need to be satisfied simultaneously for one query, the higher the probability that conflicts happen, which puts forward higher requirements for the model to capture the latent closeness among answer entities. In practice, for every triple $(h, r, t)$ to be evaluated, we count the entities in the train set that satisfy the query $(h, r, ?)$ and the query $(?, r, t)$ as its two categories. We denote the category as N-type.
We divide the N-type into six ranges: $N=0$, $N=1$, $1<N\le10$, $10<N\le100$, $100<N\le500$ and $N>500$, representing different complexity levels. The statistics of N-type data in FB15k-237 and WN18RR are summarized in Table 4. We can see that in FB15k-237 the proportion of complex data is obviously larger, where $10<N\le100$ accounts for 0.27 and $100<N\le500$ for 0.13, while the counterparts in WN18RR are 0.09 and 0.04 respectively. This corresponds to our analysis in section 4.2: FB15k-237 is a harder dataset, so most models show worse performance on it compared to WN18RR.
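A tiny sketch of this bucketing; the exact boundary conditions (strict vs. inclusive) are our reading of the range labels:

```python
def n_type(num_answers):
    """Map a query's train-set answer count to its N-type complexity bucket."""
    if num_answers == 0:
        return "N=0"
    if num_answers == 1:
        return "N=1"
    if num_answers <= 10:
        return "1<N<=10"
    if num_answers <= 100:
        return "10<N<=100"
    if num_answers <= 500:
        return "100<N<=500"
    return "N>500"
```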
We show the MRR results of CP-GNN and CP-GNN (KG) on the different N-type ranges in Figure 4. We can observe that CP-GNN outperforms or matches CP-GNN (KG) in all scenarios, and when $N>10$ the improvements are especially evident. This shows that the proximity pattern can effectively boost the modeling capacity for complex data. We consider this is because, through the proximity pattern, the potential answer entities are modeled closer to each other, and the model can more easily utilize observed facts to infer unknown answers, as illustrated in Figure 1.
N-type Range | FB15k-237 Num. | FB15k-237 Rate | WN18RR Num. | WN18RR Rate
---|---|---|---|---
N=0 | 6,881 | 0.17 | 2,827 | 0.45
N=1 | 2,929 | 0.07 | 970 | 0.15
1<N≤10 | 11,368 | 0.28 | 1,657 | 0.26
10<N≤100 | 10,965 | 0.27 | 581 | 0.09
100<N≤500 | 5,359 | 0.13 | 233 | 0.04
N>500 | 3,430 | 0.08 | 0 | 0.0
Total | 40,932 | 1.0 | 6,268 | 1.0

6 Conclusions
In this work we explore the proximity pattern, a new semantic assumption for knowledge graphs, which helps obtain more comprehensive knowledge embeddings. Moreover, we design a Chained couPle-GNN (CP-GNN) architecture to fuse the proximity pattern and the original relation pattern of the knowledge graph globally and deeply. Extensive experiments demonstrate the validity of the proposed proximity pattern and the effectiveness of the fused knowledge representations, especially in complex data scenarios.
References
- Arora (2020) Arora, S. 2020. A Survey on Graph Neural Networks for Knowledge Graph Completion. CoRR, abs/2007.12374.
- Balazevic and Allen (2019) Balazevic, I.; and Allen, C. 2019. TuckER: Tensor Factorization for Knowledge Graph Completion. In Inui, K.; Jiang, J.; Ng, V.; and Wan, X., eds., Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, 5184–5193. Association for Computational Linguistics.
- Bansal et al. (2019) Bansal, T.; Juan, D.; Ravi, S.; and McCallum, A. 2019. A2N: Attending to Neighbors for Knowledge Graph Inference. In Korhonen, A.; Traum, D. R.; and Màrquez, L., eds., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, 4387–4392. Association for Computational Linguistics.
- Bordes et al. (2013) Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating Embeddings for Modeling Multi-relational Data. In Burges, C. J. C.; Bottou, L.; Ghahramani, Z.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, 2787–2795.
- Chao et al. (2021) Chao, L.; He, J.; Wang, T.; and Chu, W. 2021. PairRE: Knowledge Graph Embeddings via Paired Relation Vectors. In Zong, C.; Xia, F.; Li, W.; and Navigli, R., eds., Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, 4360–4369. Association for Computational Linguistics.
- Dettmers et al. (2018) Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2D Knowledge Graph Embeddings. In McIlraith, S. A.; and Weinberger, K. Q., eds., Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, 1811–1818. AAAI Press.
- Kipf and Welling (2017) Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
- Nathani et al. (2019) Nathani, D.; Chauhan, J.; Sharma, C.; and Kaul, M. 2019. Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs. In Korhonen, A.; Traum, D. R.; and Màrquez, L., eds., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, 4710–4723. Association for Computational Linguistics.
- Peng et al. (2021) Peng, X.; Chen, G.; Lin, C.; and Stevenson, M. 2021. Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis. In Toutanova, K.; Rumshisky, A.; Zettlemoyer, L.; Hakkani-Tür, D.; Beltagy, I.; Bethard, S.; Cotterell, R.; Chakraborty, T.; and Zhou, Y., eds., Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, 2364–2375. Association for Computational Linguistics.
- Schlichtkrull et al. (2018) Schlichtkrull, M. S.; Kipf, T. N.; Bloem, P.; van den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling Relational Data with Graph Convolutional Networks. In Gangemi, A.; Navigli, R.; Vidal, M.; Hitzler, P.; Troncy, R.; Hollink, L.; Tordai, A.; and Alam, M., eds., The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, volume 10843 of Lecture Notes in Computer Science, 593–607. Springer.
- Shang et al. (2019) Shang, C.; Tang, Y.; Huang, J.; Bi, J.; He, X.; and Zhou, B. 2019. End-to-End Structure-Aware Convolutional Networks for Knowledge Base Completion. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 3060–3067. AAAI Press.
- Sun et al. (2019) Sun, Z.; Deng, Z.; Nie, J.; and Tang, J. 2019. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
- Sun et al. (2020) Sun, Z.; Vashishth, S.; Sanyal, S.; Talukdar, P. P.; and Yang, Y. 2020. A Re-evaluation of Knowledge Graph Completion Methods. In Jurafsky, D.; Chai, J.; Schluter, N.; and Tetreault, J. R., eds., Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, 5516–5522. Association for Computational Linguistics.
- Toutanova and Chen (2015) Toutanova, K.; and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd workshop on continuous vector space models and their compositionality, 57–66.
- Trouillon et al. (2016) Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex Embeddings for Simple Link Prediction. In Balcan, M.; and Weinberger, K. Q., eds., Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, 2071–2080. JMLR.org.
- Vashishth et al. (2020a) Vashishth, S.; Sanyal, S.; Nitin, V.; Agrawal, N.; and Talukdar, P. P. 2020a. InteractE: Improving Convolution-Based Knowledge Graph Embeddings by Increasing Feature Interactions. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, 3009–3016. AAAI Press.
- Vashishth et al. (2020b) Vashishth, S.; Sanyal, S.; Nitin, V.; and Talukdar, P. P. 2020b. Composition-based Multi-Relational Graph Convolutional Networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
- Velickovic et al. (2018) Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
- Wang et al. (2018) Wang, H.; Zhang, F.; Wang, J.; Zhao, M.; Li, W.; Xie, X.; and Guo, M. 2018. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In Cuzzocrea, A.; Allan, J.; Paton, N. W.; Srivastava, D.; Agrawal, R.; Broder, A. Z.; Zaki, M. J.; Candan, K. S.; Labrinidis, A.; Schuster, A.; and Wang, H., eds., Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, 417–426. ACM.
- Wang et al. (2017) Wang, Q.; Mao, Z.; Wang, B.; and Guo, L. 2017. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng., 29(12): 2724–2743.
- Wang et al. (2014) Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In Brodley, C. E.; and Stone, P., eds., Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada, 1112–1119. AAAI Press.
- Xu et al. (2019) Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019. How Powerful are Graph Neural Networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
- Yang et al. (2015) Yang, B.; Yih, W.; He, X.; Gao, J.; and Deng, L. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Bengio, Y.; and LeCun, Y., eds., 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
- Yasunaga et al. (2021) Yasunaga, M.; Ren, H.; Bosselut, A.; Liang, P.; and Leskovec, J. 2021. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. In Toutanova, K.; Rumshisky, A.; Zettlemoyer, L.; Hakkani-Tür, D.; Beltagy, I.; Bethard, S.; Cotterell, R.; Chakraborty, T.; and Zhou, Y., eds., Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, 535–546. Association for Computational Linguistics.
- Zhang et al. (2020) Zhang, H.; Liu, Z.; Xiong, C.; and Liu, Z. 2020. Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs. In Jurafsky, D.; Chai, J.; Schluter, N.; and Tetreault, J. R., eds., Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, 2031–2043. Association for Computational Linguistics.
Appendix A Weight Implementation Details
For the neighborhood aggregation mechanism on the knowledge graph (Equation 4), $\beta_{ij}$ describes the weight from source node $e_j$ to destination node $e_i$ in each layer, and the choices include:
1. Prior Weight: the straightforward application of a prior weight is an economical choice. For example, one can use $\beta_{ij} = 1 / |\mathcal{N}_{out}(j)|$, the reciprocal of the out-degree of the source node, implying that the source message should be evenly spread to each destination node.
2. GCN Weight: the GCN weight is derived from the Graph Convolutional Network (Kipf and Welling 2017) and takes the form $\beta_{ij} = \frac{1}{\sqrt{d_i d_j}}$, where $d_i$ is the degree of node $e_i$.
3. Attention Weight: the attention weight is dynamically calculated in each layer, which can capture the different relative importance of each neighbor. Here the weight of each relational neighbor $e_j$ to node $e_i$ is computed as a softmax-normalized attention score over $\mathcal{N}_r(i)$.
We recommend the use of the attention weight to dynamically aggregate neighbor information based on importance, which also showed a performance improvement in our experiments (but at the cost of extra time and memory).
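As a rough illustration only, a GAT-style parameterization of such a relational attention weight; the scoring function below is our assumption, not necessarily the paper's exact form:

```python
import torch
import torch.nn as nn

class RelationalAttention(nn.Module):
    """GAT-style attention weights over relational neighbors (an assumed parameterization)."""

    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1, bias=False)  # scoring vector over [target ; neighbor]
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h, z, edges, edge_type):
        # h: (|E|, d) entity embeddings; z: (|R|, d) relation embeddings;
        # edges: (2, m) (src, dst) pairs; edge_type: (m,) relation ids.
        src, dst = edges
        msg = h[src] + z[edge_type]  # composed neighbor representation (additive composition)
        score = self.leaky_relu(self.attn(torch.cat([h[dst], msg], dim=-1))).squeeze(-1)
        # softmax per destination node over its relational neighborhood
        score = torch.exp(score - score.max())
        denom = torch.zeros(h.size(0), device=h.device).index_add_(0, dst, score)
        return score / denom[dst].clamp(min=1e-12)  # beta_ij for every edge
```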