
Context-Enhanced Entity and Relation Embedding for
Knowledge Graph Completion (Student Abstract)

Ziyue Qiao, Zhiyuan Ning, Yi Du, Yuanchun Zhou
Corresponding author
Abstract

Most research on knowledge graph completion learns representations of entities and relations to predict missing links in incomplete knowledge graphs. However, these methods fail to take full advantage of both entity context and relation context. Here, we extract the contexts of entities and relations from the triplets they compose. We propose a model named AggrE, which conducts efficient multi-hop aggregations on entity context and relation context respectively, and learns context-enhanced entity and relation embeddings for knowledge graph completion. Experimental results show that AggrE is competitive with existing models.

Introduction

Knowledge graphs (KGs) store a wealth of real-world knowledge in structured graphs, which consist of collections of triplets; each triplet $(h,r,t)$ represents that head entity $h$ is related to tail entity $t$ through relation type $r$. KGs play a significant role in AI applications such as recommendation systems, question answering, and information retrieval. However, the coverage of today's knowledge graphs is still far from complete and comprehensive, so researchers have proposed a number of knowledge graph completion (KGC) methods to predict the missing links/relations in an incomplete knowledge graph.

Most state-of-the-art methods for KGC are based on knowledge graph embeddings: they assign an embedding vector to each entity and relation in a continuous embedding space and train the embeddings on the existing triplets, i.e., so that observed triplets score higher than random ones. However, most of them fail to utilize the context/neighborhood of entities and relations, which may contain rich and valuable information for KGC.

Recently, several studies have demonstrated the significance of the contextual information of entities and relations in KGC. For example, A2N (Bansal et al. 2019) and RGHAT (Zhang et al. 2020) propose attention-based methods that leverage contextual information for link prediction by attending to neighbor entities, leading to more accurate KGC. PathCon (Wang, Ren, and Leskovec 2020) considers both the relational context of the head/tail entities and the relational paths between them in one model, and finds that both are critical to the task of relation prediction. However, these studies utilize only entity context or only relation context, which may lead to information loss.

In this paper, we aim to take full advantage of both entity context and relation context to enhance the KGC task. Specifically, departing from the neighborhood definition in traditional KG topology, for each element of each triplet we extract the pair composed of the other two elements as one neighbor in its context. We then propose an efficient model, named AggrE, which alternately aggregates multi-hop information from entity contexts and relation contexts into entities and relations, learning context-enhanced entity and relation embeddings. Finally, we use the learned embeddings to predict the missing relation $r$ for a given pair of entities $(h,?,t)$ to complete knowledge graphs.

Figure 1: An illustration of extracting entity context and relation context from the triplets of knowledge graphs.
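The context extraction illustrated in Figure 1 can be sketched in a few lines of Python; the toy triplet list and its entity/relation names below are purely illustrative:

```python
from collections import defaultdict

# Toy triplets (h, r, t); entities and relations are illustrative only.
triples = [("a", "r1", "b"), ("a", "r2", "c"), ("b", "r2", "c")]

entity_ctx = defaultdict(set)    # C_e: pairs (relation, neighbor entity)
relation_ctx = defaultdict(set)  # C_r: pairs (head entity, tail entity)
for h, r, t in triples:
    entity_ctx[h].add((r, t))    # case (e_i, r_j, e_k) in G
    entity_ctx[t].add((r, h))    # case (e_k, r_j, e_i) in G
    relation_ctx[r].add((h, t))
```

Note that every triplet contributes one context pair to each of its three elements, so the total context size is linear in the number of triplets.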

The Proposed Model

Given a knowledge graph $G=\{(h,r,t)\}\subseteq E\times R\times E$, where $E$ and $R$ are the entity set and relation set respectively, we first extract the contexts of entities and relations from the existing triplets, as shown in Figure 1. For an entity $e_i$ in $G$, we define the entity context of $e_i$ as $C_{e_i}=\{(r_j,e_k)\,|\,(e_i,r_j,e_k)\in G\vee(e_k,r_j,e_i)\in G\}$, which is the set of neighbor entities of $e_i$ together with their corresponding relations. For a relation $r_i$ in $G$, we define the relation context of $r_i$ as $C_{r_i}=\{(e_j,e_k)\,|\,(e_j,r_i,e_k)\in G\}$, which is the set of endpoint pairs of $r_i$. Denote by $\mathbf{e}^{(0)}_{i}$ and $\mathbf{r}^{(0)}_{i}$ the randomly initialized embeddings of $e_i$ and $r_i$ respectively; our intuition is to aggregate contextual information into the representation of each entity and relation to help the prediction. We define the aggregation functions as:

$$\mathbf{e}^{(l+1)}_{i}=\mathbf{e}^{(l)}_{i}+\sum_{(r_{j},e_{k})\in C_{e_{i}}}\alpha_{i,j,k}^{(l)}\cdot(\mathbf{r}^{(l)}_{j}\odot\mathbf{e}^{(l)}_{k}) \qquad (1)$$
$$\mathbf{r}^{(l+1)}_{i}=\mathbf{r}^{(l)}_{i}+\sum_{(e_{j},e_{k})\in C_{r_{i}}}\beta_{i,j,k}^{(l)}\cdot(\mathbf{e}^{(l)}_{j}\odot\mathbf{e}^{(l)}_{k}) \qquad (2)$$

where $0\leq l<L$ and $L$ is the number of aggregation layers, $\mathbf{e}^{(l+1)}_{i}$ and $\mathbf{r}^{(l+1)}_{i}$ are the $(l+1)$-th layer's output embeddings of $e_i$ and $r_i$, and $\odot$ is the element-wise product. $\alpha_{i,j,k}^{(l)}$ and $\beta_{i,j,k}^{(l)}$ are layer-specific softmax-normalized coefficients: $\alpha_{i,j,k}^{(l)}$ represents how important each entity-context pair is for $e_i$, $\alpha_{i,j,k}^{(l)}=\frac{\exp(s^{(l)}_{i,j,k})}{\sum_{(r_{j'},e_{k'})\in C_{e_i}}\exp(s^{(l)}_{i,j',k'})}$, and $\beta_{i,j,k}^{(l)}$ represents how important each relation-context pair is for $r_i$, $\beta_{i,j,k}^{(l)}=\frac{\exp(s^{(l)}_{j,i,k})}{\sum_{(e_{j'},e_{k'})\in C_{r_i}}\exp(s^{(l)}_{j',i,k'})}$. Here $s^{(l)}_{i,j,k}$ is the score of the possible triplet $(e_i,r_j,e_k)$ after $l$ layers of aggregation, and we use the same score function as DistMult (Yang et al. 2014) to calculate the triplet scores:

$$s^{(l)}_{i,j,k}=(\mathbf{e}^{(l)}_{i})^{T}\,\mathrm{Diag}(\mathbf{r}^{(l)}_{j})\,\mathbf{e}^{(l)}_{k} \qquad (3)$$

where $\mathrm{Diag}(\mathbf{r}^{(l)}_{j})$ is a diagonal matrix with $\mathbf{r}^{(l)}_{j}$ on its diagonal. After $L$ layers of aggregation, we obtain the final outputs $\mathbf{e}^{(L)}_{i}$ and $\mathbf{r}^{(L)}_{i}$ for each entity and relation, which contain neighbor information from their $L$-hop contexts. We then apply a softmax loss function to the final triplet scores to compute the likelihood of predicting the correct relations:

$$\mathcal{L}=-\sum_{(e_{i},r_{j},e_{k})\in G}\log\frac{\exp(s^{(L)}_{i,j,k})}{\sum_{r_{j'}\in R}\exp(s^{(L)}_{i,j',k})} \qquad (4)$$

where $R$ is the set of relations. We use a mini-batch Adam optimizer to minimize $\mathcal{L}$. The difference between our aggregation model and a Graph Neural Network (GNN) is that instead of using complex matrix transformations, we use element-wise products to obtain neighborhood information and add it directly to the central nodes, since the embeddings themselves can be regarded as trainable transformation parameters. Our model is therefore expected to be more efficient and better suited to larger knowledge graphs.
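A minimal NumPy sketch of the score function (Eq. 3), one entity-side aggregation step (Eq. 1), and the relation-prediction loss (Eq. 4); the tiny dimensionality, the entity/relation names, and the helper names are hypothetical, and the Adam training loop and relation-side update (Eq. 2, which is symmetric) are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimensionality (256 in the paper; tiny here)
E = {e: rng.normal(size=d) for e in ["a", "b", "c"]}   # entity embeddings e^(l)
R = {r: rng.normal(size=d) for r in ["r1", "r2"]}      # relation embeddings r^(l)

def score(ei, rj, ek):
    """Eq. (3): DistMult score e_i^T Diag(r_j) e_k."""
    return float(np.sum(E[ei] * R[rj] * E[ek]))

def aggregate_entity(ei, ctx):
    """Eq. (1): attention-weighted sum of r_j ⊙ e_k over the entity context."""
    s = np.array([score(ei, rj, ek) for rj, ek in ctx])
    alpha = np.exp(s - s.max())
    alpha /= alpha.sum()                               # softmax over the context
    msgs = np.stack([R[rj] * E[ek] for rj, ek in ctx])
    return E[ei] + alpha @ msgs                        # residual-style update

def relation_loss(ei, ek, true_rel):
    """Eq. (4): softmax cross-entropy over all candidate relations."""
    logits = np.array([score(ei, r, ek) for r in R])
    logits -= logits.max()                             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[list(R).index(true_rel)]
```

With a single-element context the softmax weight is 1, so the update reduces to $\mathbf{e}_i+\mathbf{r}_j\odot\mathbf{e}_k$; note that no weight matrices appear anywhere, which is the efficiency argument made above.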

Experiments

We conduct experiments on two widely used KG benchmarks: FB15K-237 and WN18RR. Note that the only trainable parameters in our model are the entity and relation embeddings, so for a fair comparison we choose five traditional baselines with small parameter counts. We use Mean Reciprocal Rank (MRR, the mean of the reciprocals of the predicted ranks), Mean Rank (MR, the mean of the predicted ranks), and Hit@3 (the proportion of correct relations ranked in the top 3 predictions) as evaluation metrics. In the experiments, we set the embedding dimensionality to 256, the learning rate to 5e-3, the l2 penalty coefficient to 1e-7, and the batch size to 512, with a maximum of 20 epochs. We set the number of aggregation layers $L$ to 2 for WN18RR and 4 for FB15K-237 respectively.
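The three metrics can be computed directly from the 1-based rank of each test triplet's correct relation among all candidates; a minimal sketch (the rank values in the test are hypothetical):

```python
def metrics(ranks):
    """MRR, MR, and Hit@3 from 1-based ranks of the correct relations."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n   # mean reciprocal rank
    mr = sum(ranks) / n                     # mean rank
    hit3 = sum(r <= 3 for r in ranks) / n   # fraction ranked in the top 3
    return mrr, mr, hit3
```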

          |        WN18RR        |       FB15K-237
          |  MRR    MR    Hit@3  |  MRR    MR    Hit@3
----------+----------------------+---------------------
TransE    | 0.784  2.079  0.870  | 0.966  1.352  0.984
DistMult  | 0.847  2.024  0.891  | 0.875  1.927  0.936
ComplEx   | 0.840  2.053  0.880  | 0.924  1.494  0.970
SimplE    | 0.730  3.259  0.755  | 0.971  1.407  0.987
RotatE    | 0.799  2.284  0.823  | 0.970  1.315  0.980
AggrE     | 0.953  1.136  0.989  | 0.966  1.171  0.989
Table 1: Results of relation prediction. The baseline results are taken from (Wang, Ren, and Leskovec 2020).

As shown in Table 1, AggrE significantly outperforms all baselines on WN18RR and achieves the best MR and Hit@3 on FB15K-237, which indicates its effectiveness. The improvement is especially significant on WN18RR, where the links between entities are sparser than in FB15K-237; this may be because, with no extra parameters beyond the embeddings, AggrE is less prone to overfitting. Besides, AggrE achieves substantial improvements over DistMult on all metrics; given that the two models have similar objective functions, this indicates that aggregating contextual information for entities and relations is valuable and can greatly improve prediction performance.

Conclusion

In this paper, we specify a novel definition of the context/neighborhood of entities and relations in KGs, and propose a multi-layer aggregation model that composes contextual information into embeddings for KGC. In the future, we will explore more possible aggregation functions for our model.

Acknowledgments

This research was supported by the NSFC under Grant 61836013, National Key R&D Plan of China (2016YFB0501901), Beijing Nova Program of Science and Technology (Z191100001119090).

References

  • Bansal et al. (2019) Bansal, T.; Juan, D.-C.; Ravi, S.; and McCallum, A. 2019. A2N: attending to neighbors for knowledge graph inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4387–4392.
  • Wang, Ren, and Leskovec (2020) Wang, H.; Ren, H.; and Leskovec, J. 2020. Entity Context and Relational Paths for Knowledge Graph Completion. arXiv preprint arXiv:2002.06757.
  • Yang et al. (2014) Yang, B.; Yih, W.-t.; He, X.; Gao, J.; and Deng, L. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
  • Zhang et al. (2020) Zhang, Z.; Zhuang, F.; Zhu, H.; Shi, Z.-P.; Xiong, H.; and He, Q. 2020. Relational Graph Neural Network with Hierarchical Attention for Knowledge Graph Completion. In AAAI, 9612–9619.