
Modelling Multi-relations for Convolutional-based Knowledge Graph Embedding

Sirui Li, Kok Wai Wong, Dengya Zhu, Chun Che Fung
Discipline of Information Technology, Murdoch University, South St, Murdoch, Western Australia
Discipline of Business Information Systems, School of Management and Marketing, Curtin University, Kent St, Bentley, Western Australia
Abstract

Representation learning of knowledge graphs aims to embed entities and relations into low-dimensional vectors. Most existing works consider only the direct relations or paths between an entity pair, which disconnects the semantic links among the multiple relations that may hold between that pair. We therefore propose ConvMR, a convolutional and multi-relational representation learning model. ConvMR addresses the multi-relation issue in two ways: (1) it encodes the multi-relations between an entity pair into a unified vector that preserves their semantic connection; (2) since not every relation is equally important when joining multi-relations, it uses an attention-based relation encoder to automatically assign weights to relations according to their semantic hierarchy. Experimental results on two popular datasets, FB15k-237 and WN18RR, show consistent improvements in mean rank. We also found that ConvMR handles less frequent entities effectively.

keywords:
Knowledge Graph Embedding; Multi-relation; Convolution; Representation learning.

1 Introduction

A Knowledge Graph (KG) represents a graph-structured knowledge base that encodes real-world entities and the relationships between them. There are two main branches: triple-based KGs and quadruple-based KGs. The former is typically represented as sets of Resource Description Framework (RDF) triples (s, r, o), where s is the subject entity, o is the object entity and r is the relation, e.g. (Barack Obama, president of, USA). The latter, often called Temporal KGs (TKGs), is represented as quadruples (s, r, o, T), where T denotes the timestamp, e.g. (Barack Obama, president of, USA, 2010). Although typical KGs contain millions of entities and billions of triples, they are far from complete [1]. In this paper, we focus on completing triple-based KGs because the application of TKGs is limited to time-dependent domains, while triple-based KGs apply to more general domains [2].

Many efforts have been devoted to predicting the missing links between subject-object pairs in triple-based KGs [3]. The traditional method using formal logic is neither tractable nor robust when dealing with long-range reasoning over a large-scale KG [4]. Recent studies [5, 4, 6, 7] attempt to learn low-dimensional representations of entities and relations while preserving certain properties of the original KG. The objective of these studies is to learn a score function that assigns higher plausibility scores to valid triples than to invalid triples [6].

Recent studies fall into two main kinds: triple-level learning and path-level learning. Triple-level learning [5, 4, 6, 7] only takes the direct relations between entities into account. For example, TransE [5] assumes that the relation between two entities corresponds to a translation between the vector representations of the entities. Path-level learning [1, 8, 9] observes that there are also substantial multi-hop relation paths between entities indicating their semantic relationships. For instance, PTransE [1] is a path-based extension of TransE that not only learns vector representations from directly connected triples but also builds triples from KGs using entity pairs connected by relation paths.

Despite their success in predicting missing links, triple/path-level learning methods disconnect the diverse aspects of multi-relations that are highly semantically related [10]. Consider the instance extracted from FB15k-237 shown in Table 1: there are four relations between the entity Prince Edward Island and the entity Canada, each indicating a positional relationship at a different hierarchy. In triple-level learning, models split the example into four different triples and feed them to the score function f: f(Prince Edward Island, r_a, Canada), f(Prince Edward Island, r_b, Canada), f(Prince Edward Island, r_c, Canada) and f(Prince Edward Island, r_d, Canada). In path-level learning, models first mine paths between Prince Edward Island and Canada, then learn the score function based on the four triples and the mined paths. Either way, the example is treated as four separate triples, which weakens the semantic connection between the four relations. To reflect the relevant knowledge, it is therefore more reasonable to model the multi-relations between a pair of entities as a single vector representation rather than dividing them into four different triples [10].

Table 1: An example of multi-relations in FB15k-237. r_a, r_b, r_c and r_d denote the four relations, respectively.
Subject Relation Object
Prince Edward Island r_a: /base/biblioness/bibs_location/country Canada
r_b: /location/administrative_division/country
r_c: /location/administrative_division/first_level_division_of
r_d: /base/areas/schema/administrative_area/administrative_parent

Motivated by this observation, we propose a neural network-based schema, ConvMR (Convolutional and Multi-Relational model), to specifically address the issue of multi-relations between an entity pair. In ConvMR, in addition to directly connected triples, we also build triples from KGs using the multi-relations between entity pairs. Triple/path-level learning only considers direct connections between an entity pair, building (s, r_1, o), (s, r_2, o), ..., (s, r_N, o) and optimizing the objective over them. In contrast, ConvMR also builds a multi-relation triple (s, r_1 ∘ r_2 ∘ ... ∘ r_N, o), where ∘ is an operation that joins the multi-relations into a unified relation representation. Compared with triple-level learning, ConvMR strengthens the semantic connection between multi-relations and entities. Compared with path-level learning, ConvMR removes the burden of mining meaningful paths. Two challenges make ConvMR non-trivial. First, the multi-relations must be properly encoded into a vector representation; the encoder should be easy to implement and maintain the semantic connection. Second, some relations may be less relevant when joining multi-relations, because not every relation is necessary to the objective. To address the first challenge, we compared different encoders and chose the best one. To address the second challenge, we combined the chosen encoder with an attention mechanism, which focuses on the most relevant parts of a variable-sized input [11].

2 Related Works

2.1 Triple-based Knowledge Graph Embedding

Many Knowledge Graph Embedding models (KGEs) have been proposed to reason missing links between subject-object pairs. The key idea of recent KGEs is to learn low-dimensional vectors of entities and relations.

2.1.1 Triple-level learning

Most of the currently available techniques [4, 5, 12, 13] use the triples of a KG to perform the embedding task, enforcing vectors to be compatible with triples. These triple-level learning techniques can be roughly categorized into two groups: translational distance models and neural network-based models. Translational distance models [5, 4, 13] view the relation as a translation operation between the subject and the object. For example, given a valid triple (s, r, o), TransE [5] translates r into a sum operation, v_s + v_r = v_o, where v_x is the vector of item x. TransE is simple but has flaws in dealing with complex relations, e.g. 1-to-N, N-to-1 and N-to-N relations [14]. To better model complex relations, some models [4, 13] allow the entity or the relation to have distinct representations. However, Song et al. [15] argue that translational distance models suffer from slow convergence and poor local optimality. Moreover, these shallow models are limited in their expressiveness [16].

Recent studies [6, 17, 18, 19, 7] have raised interest in applying deep neural networks to triple-level learning. For example, ConvE [17] is a Convolutional Neural Network (CNN)-based model that concatenates v_s and v_r into a matrix, which is then fed to the convolution layer. However, ConvE ignores the transitional characteristic between triples [6]. To overcome this issue, Nguyen et al. [6] propose ConvKB, which concatenates v_s, v_r and v_o before feeding them to the convolution layer. Still, these neural network-based models are triple-level learning methods, so they often simplify the complex nature of the data in a KG [20].

2.1.2 Path-level learning

Triple-level learning methods only use direct relations between entities, so they ignore the rich information in relation paths [1]. Path-level learning methods [1, 8, 9], on the other hand, explore relation paths between entities. For instance, PTransE [1] improves TransE by incorporating relation inferences into KGEs. If there exists a path (s_1, r_1, s_2, r_2, s_3) and a triple (s_1, r_3, s_3), PTransE models a relation inference by learning v_{r_1} ∘ v_{r_2} = v_{r_3}, where ∘ is an operator, e.g. multiplication. Yang et al. [8] use first-order logical rules to generate paths. However, these methods ignore relational dependencies of entities and do not fully exploit the potential of KG paths [9].

Some path-level methods [21, 22] use neural networks to generate paths. For example, DeepWalk [21] uses uniform random walks to sample paths and employs Skip-Gram [23] to encode them. Neural networks are also used to capture the long-term relational dependencies of paths: Guo et al. [9] propose Recurrent Skipping Networks (RSNs) to enhance semantic learning of relational paths. While encoding relational paths improves model performance, complex algorithms have to be designed to generate or prune meaningful paths [14].

2.2 Quadruple-based Knowledge Graph Embedding

Recent works on quadruple-based KGEs extend triple-based KGEs to the temporal domain. For example, some works [24, 25, 26] add a timestamp encoder and design time-sensitive quadruple decoding functions. However, entity-level temporal patterns are not explicitly captured by these methods [2]. Other works [27, 28, 29] use message passing networks to capture intra-graph neighbourhood information, combined with temporal recurrence or attention mechanisms, but their focus is on continuous TKGs [2].

2.3 Attention Mechanism

Attention mechanisms have become almost a de facto standard in many Natural Language Processing (NLP) tasks [30]. They are neural networks that automatically weight the relevance of the input and take these weights into account in downstream tasks. Recently, some KGEs [31, 32] have incorporated the attention mechanism to enhance effectiveness. HRAN [32] considers the importance of relational paths through the attention mechanism; based on the learned attention values for each relational path, it aggregates the neighbour features in a hierarchical structure. Similarly, Wang et al. [31] propose Graph Attenuated Attention networks (GAATs), which integrate an attenuated attention mechanism to assign different weights to different relational paths and acquire information from the neighbours.

3 Methodology

A KG 𝒢 = ⟨E, R⟩ is a collection of valid triples of the form (subject, relation, object), denoted (s, r, o), where E is a set of entities and R is a set of relations. We denote by v_x the vector representation of item x and by k the dimensionality of the vector representations. The proposed ConvMR consists of three important modules: the multi-relation triple generator, the relation encoder and the objective learning.

3.1 Multi-relation Triple Generator

Input: All triples of a KG 𝒢.
Output: Original triples and multi-relation triples.
pair2rel = dict()
multi-triples = set()
triples = all triples of 𝒢
for (s, r, o) in triples do
      if (s, o) not in pair2rel then
            pair2rel[(s, o)] = set()
      end if
      pair2rel[(s, o)].add(r)
end for
filter = all keys in pair2rel whose value set contains more than one relation
for (s, o) in filter do
      r_1, r_2, ..., r_N = pair2rel[(s, o)], where N is the number of relations between (s, o)
      multi-triples.add((s, r_1, r_2, ..., r_N, o))
end for
return triples + multi-triples
Algorithm 1 Algorithm for the multi-relation triple generator.

In learning, ConvMR concerns not only the direct connections between an entity pair (i.e., the original triples of a KG: (s, r_1, o), (s, r_2, o), ..., (s, r_N, o)) but also the multi-relations between the pair (i.e., (s, r_1, r_2, ..., r_N, o)). Algorithm 1 describes how the latter are built: in essence, it uses a dictionary whose keys are (s, o) pairs and whose values are the corresponding relation sets.
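The generator can be implemented in a few lines of Python. The sketch below follows Algorithm 1, assuming the KG is given as an iterable of (s, r, o) tuples; the function and variable names are illustrative, not taken from the original implementation.

from collections import defaultdict

def generate_multi_relation_triples(triples):
    """Return the original triples plus one multi-relation triple per
    (subject, object) pair connected by more than one relation."""
    pair2rel = defaultdict(set)          # keys: (s, o) pairs, values: relation sets
    for s, r, o in triples:
        pair2rel[(s, o)].add(r)

    multi_triples = []
    for (s, o), rels in pair2rel.items():
        if len(rels) > 1:                # only pairs with multi-relations
            multi_triples.append((s, tuple(sorted(rels)), o))

    return list(triples) + multi_triples

# Example with the Table 1 entity pair (relation names shortened):
kg = [("Prince Edward Island", "r_a", "Canada"),
      ("Prince Edward Island", "r_b", "Canada"),
      ("Prince Edward Island", "r_c", "Canada"),
      ("Prince Edward Island", "r_d", "Canada")]
print(generate_multi_relation_triples(kg)[-1])
# ('Prince Edward Island', ('r_a', 'r_b', 'r_c', 'r_d'), 'Canada')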

3.2 Relation Encoder

Besides building multi-relation triples, ConvMR needs a reliable relation encoder that can represent either one relation from an original triple or a set of relations from a multi-relation triple. In this paper, we consider four types of operations: set-transformer [33], attention-based average (attn-average), Gated Recurrent Unit (GRU) [34] and bi-directional GRU (biGRU). They are chosen for their easy implementation and fast computation. To simplify the equations, we use (s, H, o) to represent the input of each operation, which can be either a triple (s, r, o) or a multi-relation triple (s, r_1, r_2, ..., r_N, o).

Set-transformer. We assume that the sequence of the multi-relations does not matter in learning. Hence, we introduce the set-transformer [33], an attention-based neural network module designed to model interactions among the elements of an input set. It is formalized as:

Z = Encoder(H) = SAB(SAB(H))   (1)
v_{r'} = Decoder(Z) = rFF(SAB(PMA(Z)))   (2)
SAB(H) = MAB(H, H)   (3)

where v_{r'} is the representation of H, MAB is the Multihead Attention Block, PMA is Pooling by Multihead Attention, and SAB is the Set Attention Block.
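For reference, a compact PyTorch sketch of this encoder is given below, built from torch.nn.MultiheadAttention. The block sizes, head count and single PMA seed vector are our assumptions, so it should be read as an illustration of Eqs. 1-3 rather than the exact configuration used in the experiments.

import torch
import torch.nn as nn

class MAB(nn.Module):
    """Multihead Attention Block: MAB(X, Y) = LN(H + rFF(H)), H = LN(X + Attn(X, Y, Y))."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln0, self.ln1 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.rff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, y):
        h = self.ln0(x + self.attn(x, y, y)[0])
        return self.ln1(h + self.rff(h))

class SetTransformerEncoder(nn.Module):
    """Permutation-invariant relation encoder following Eqs. 1-3."""
    def __init__(self, k, heads=4):
        super().__init__()
        self.sab1, self.sab2 = MAB(k, heads), MAB(k, heads)   # SAB(H) = MAB(H, H)
        self.seed = nn.Parameter(torch.randn(1, 1, k))        # PMA pooling seed (assumed single seed)
        self.pma, self.sab3 = MAB(k, heads), MAB(k, heads)
        self.rff_out = nn.Linear(k, k)

    def forward(self, H):                        # H: (batch, N, k) relation vectors
        z = self.sab1(H, H)                      # Eq. (1): Z = SAB(SAB(H))
        z = self.sab2(z, z)
        seed = self.seed.expand(z.size(0), -1, -1)
        p = self.pma(seed, z)                    # PMA(Z): pool the N vectors into one
        v = self.sab3(p, p)                      # Eq. (2): v_{r'} = rFF(SAB(PMA(Z)))
        return self.rff_out(v).squeeze(1)        # (batch, k)

v_multi = SetTransformerEncoder(k=100)(torch.randn(2, 4, 100))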

Attn-average. Attn-average stands for the attention-based average operation. Given the input (s, H, o), attn-average first calculates an attention weight a_i for each r_i ∈ H:

A = softmax(tanh(W × H))   (4)

The output A = [a_1, a_2, ..., a_N] ∈ R^{1×N} is the attention weight matrix, H = [v_{r_1}, v_{r_2}, ..., v_{r_N}] ∈ R^{k×N} is the matrix of relation vectors, W is a shared linear transformation and tanh is the activation function. Attn-average then represents H by:

v_{r'} = avg(H ⊙ A) = (a_1 v_{r_1} + a_2 v_{r_2} + ... + a_N v_{r_N}) / N   (5)

where ⊙ is the element-wise multiplication and avg is the average operation.
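A minimal PyTorch sketch of the attn-average encoder (Eqs. 4-5) is given below. The module and parameter names are ours, and the shared transformation W is assumed to be a single linear layer that maps each k-dimensional relation vector to a scalar score.

import torch
import torch.nn as nn

class AttnAverage(nn.Module):
    """Attention-based average over a set of relation vectors (Eqs. 4-5)."""
    def __init__(self, k):
        super().__init__()
        # Shared linear transformation W in R^{1 x k}: one scalar score per relation.
        self.W = nn.Linear(k, 1, bias=False)

    def forward(self, H):
        # H: (N, k) matrix of relation vectors v_{r_1}, ..., v_{r_N}.
        scores = torch.tanh(self.W(H))            # (N, 1)
        A = torch.softmax(scores, dim=0)          # attention weights a_1, ..., a_N
        # Weighted average: (a_1 v_{r_1} + ... + a_N v_{r_N}) / N
        return (A * H).sum(dim=0) / H.size(0)     # v_{r'}: (k,)

enc = AttnAverage(k=100)
v_multi = enc(torch.randn(4, 100))   # e.g. the four relations of Table 1
# For a single relation, the softmax weight is 1 and the encoder returns that relation's vector.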

GRU. The set-transformer and attn-average are based on the assumption that the order of the input relations is irrelevant. To investigate whether this assumption holds, we employ GRU [34], a gating mechanism in RNNs that can capture long-term dependencies in sequences. It is formalized as:

v_{r'} = GRU(H)   (6)

biGRU. GRU only considers one direction of the input. A biGRU is a sequence processing model that consists of two GRUs: one takes the input in the forward direction and the other in the backward direction. It is formalized as:

v_{r'} = biGRU(H)   (7)
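The two sequential encoders can be sketched with the standard PyTorch GRU. We take the final hidden state as v_{r'} and, for the biGRU, sum the two directional states to keep the dimensionality at k; this is one reasonable reading of Eqs. 6-7 under our assumptions, not necessarily the exact original implementation.

import torch
import torch.nn as nn

class GRURelationEncoder(nn.Module):
    """Encode a relation sequence (v_{r_1}, ..., v_{r_N}) into v_{r'} (Eqs. 6-7)."""
    def __init__(self, k, bidirectional=False):
        super().__init__()
        self.gru = nn.GRU(input_size=k, hidden_size=k,
                          batch_first=True, bidirectional=bidirectional)

    def forward(self, H):
        # H: (N, k) -> add a batch dimension of 1.
        _, h_n = self.gru(H.unsqueeze(0))   # h_n: (num_directions, 1, k)
        return h_n.sum(dim=0).squeeze(0)    # sum directions -> v_{r'}: (k,)

gru_enc = GRURelationEncoder(k=100)                        # Eq. (6)
bigru_enc = GRURelationEncoder(k=100, bidirectional=True)  # Eq. (7)
v = bigru_enc(torch.randn(4, 100))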

3.3 Objective Formalization

The objective of ConvMR (as with other KGEs) is to learn a score function that gives higher scores to valid triples than to invalid triples. Given the subject vector v_s ∈ R^{1×k}, the object vector v_o ∈ R^{1×k} and the relation/multi-relation vector v_{r'} ∈ R^{1×k}, we train a convolution layer to calculate the score of the matrix T = [v_s, v_o, v_{r'}] ∈ R^{k×3}, i.e., the concatenation of v_s, v_o and v_{r'}. CNNs have shown their power to extract local features and generalize the transitional characteristics of triples in previous KGEs [6]. We apply a set of filters in the convolution layer. The resulting score function f is:

f(s, H, o) = concat(g([v_s, v_o, v_{r'}] ∗ Ω))   (8)

where concat is the concatenation operation, Ω is a set of filters, ∗ is the convolution operation and g is the ReLU activation function.
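A sketch of the scoring network in PyTorch is shown below. Following ConvKB [6], whose loss and architecture ConvMR builds on, we assume the concatenated feature maps are reduced to a scalar score via a dot product with the model's weight vector w (Eq. 8 leaves this final step implicit); the filter count and size are illustrative.

import torch
import torch.nn as nn

class ConvScorer(nn.Module):
    """Score a (v_s, v_o, v_{r'}) matrix T in R^{k x 3} with a convolution layer (Eq. 8)."""
    def __init__(self, k, num_filters=50):
        super().__init__()
        # Each 1x3 filter slides over the k rows of T = [v_s, v_o, v_{r'}].
        self.conv = nn.Conv2d(1, num_filters, kernel_size=(1, 3))
        self.g = nn.ReLU()
        # Assumed weight vector w mapping the concatenated feature maps to a scalar score.
        self.w = nn.Linear(num_filters * k, 1, bias=False)

    def forward(self, v_s, v_o, v_r):
        T = torch.stack([v_s, v_o, v_r], dim=-1)       # (batch, k, 3)
        feats = self.g(self.conv(T.unsqueeze(1)))      # (batch, num_filters, k, 1)
        return self.w(feats.flatten(start_dim=1))      # concat + dot product -> (batch, 1)

scorer = ConvScorer(k=100)
score = scorer(torch.randn(2, 100), torch.randn(2, 100), torch.randn(2, 100))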

We used the same loss function ℒ as previous KGEs such as ConvKB [6] and ComplEx [12]. We used AdaGrad to train ConvMR by minimizing the loss function with L_2 regularization on the weight vector w of the model. The loss function ℒ is:

ℒ = Σ_{(s,H,o) ∈ 𝒢 ∪ 𝒢'} log(1 + exp(l_{(s,H,o)} · f(s,H,o))) + (λ/2) ||w||_2^2   (9)

where l_{(s,H,o)} = 1 for (s,H,o) ∈ 𝒢 and l_{(s,H,o)} = −1 for (s,H,o) ∈ 𝒢'; 𝒢' is generated by corrupting valid triples in 𝒢.
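Under the same assumptions as the scoring sketch above, the training objective of Eq. 9 can be written as follows; the softplus form log(1 + exp(x)) is used directly, and λ is the regularization hyper-parameter.

import torch
import torch.nn.functional as F

def convmr_loss(scores, labels, w, lam=0.001):
    """Eq. 9: softplus loss over valid (label = +1) and corrupted (label = -1)
    triples, plus L2 regularization on the weight vector w."""
    # scores: (batch,) values of f(s, H, o); labels: (batch,) in {+1, -1}.
    return F.softplus(labels * scores).sum() + 0.5 * lam * w.pow(2).sum()

# Usage with the ConvScorer sketched above:
# pos = scorer(v_s, v_o, v_r).squeeze(-1); neg = scorer(v_s_corrupt, v_o, v_r).squeeze(-1)
# loss = convmr_loss(torch.cat([pos, neg]),
#                    torch.cat([torch.ones_like(pos), -torch.ones_like(neg)]),
#                    scorer.w.weight)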

4 Experiments

4.1 Datasets

We evaluated the proposed ConvMR on two popular datasets: WN18RR [17] and FB15k-237 [35]. The statistical information of WN18RR and FB15k-237 is provided in Table 2.

Table 2: Statistics of the experimental datasets. #E is the number of entities. #R is the number of relations.
Dataset #E #R #Train #Valid #Test
WN18RR 40,943 11 86,835 3,034 3,134
FB15k-237 14,541 237 272,115 17,535 20,466

4.2 Baselines

We included eight baseline techniques, most of which are CNN-based KGEs, because the proposed ConvMR employs CNNs to train the score function.

  • TransE [5] is a popular transition-based method. TransE assumes the subject vector representation plus the relation vector representation should be close to the object vector representation.

  • ConvE [17] is the first KGE using CNNs. ConvE finds the convolutional features in the concatenation of the subject vector representation and the relation vector.

  • ConvKB [6] improves ConvE by concatenating subject, relation and object before feeding them into the convolution layer.

  • HypER [18] is a variation of ConvE. HypER introduces hypernetworks to generate relation-specific filters for the convolution layer.

  • ConvR [19] is a modification of ConvKB. Instead of using global filters like ConvKB, ConvR performs an adaptive convolution with relation-specific filters.

  • CapsE [7] improves ConvKB by adding a “deep” architecture, capsule (each capsule is a group of neurons), to capture the intrinsic spatial relationship between triples.

  • RSN [9] concentrates on learning vector representations from relational paths by employing Recurrent Neural Networks (RNN).

  • CompGCN [36] addresses the limitation of GCNs that they only learn entity embeddings. CompGCN jointly learns node representations and relation representations in the graph to alleviate KG sparsity.

4.3 Training Protocol

In training, we randomly replaced s or o to generate invalid instances. We employed the same initialization of entity and relation vector representations as CapsE [7] and tuned the hyper-parameters with grid search. Finally, we trained ConvMR for 20 epochs with k = 100 and 800 batches; the learning rate of the AdaGrad optimizer started at 0.01 and was reduced by 0.1 every 5 epochs.
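The corruption step can be sketched as follows: for each training instance we replace the subject or the object with a uniformly sampled entity (the 50/50 choice between the two sides is our assumption).

import random

def corrupt(triple, entities):
    """Replace s or o with a random entity to build an invalid training instance."""
    s, h, o = triple                     # h is a relation or a multi-relation tuple
    if random.random() < 0.5:            # assumed: corrupt either side with equal chance
        return (random.choice(entities), h, o)
    return (s, h, random.choice(entities))

entities = ["Prince Edward Island", "Canada", "Barack Obama", "USA"]
neg = corrupt(("Prince Edward Island", ("r_a", "r_b"), "Canada"), entities)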

4.4 Evaluation Metrics

Previous works perform the link prediction task to evaluate KGEs. Given a relation and an entity, the goal of link prediction is to predict the missing entity, i.e., to infer s given (r, o) or to infer o given (s, r). To generate corrupted triples for link prediction, we followed the ranking protocol proposed in [5]: for each test triple (s, r, o), we replaced either s or o with every entity e ∈ E. We used the Filtered setting [5], i.e., corrupted triples that already appear in the KG are not taken into account. We report two evaluation metrics: Mean Rank (MR), the average of the predicted ranks, and Hits@10, the proportion of correct entities ranked in the top 10. A lower MR and a higher Hits@10 indicate a better model. Note that the test input consists of triples (s, r, o) for a fair comparison with the baselines.
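A minimal sketch of the filtered ranking procedure for one side (replacing o) is given below, assuming score(s, r, o) returns the model's plausibility score with higher meaning more plausible, and that known_triples contains all train/valid/test triples used for filtering.

def filtered_rank(s, r, true_o, entities, known_triples, score):
    """Rank the true object among all candidate objects, skipping candidates
    that form another known triple (the Filtered setting of [5])."""
    true_score = score(s, r, true_o)
    better = 0
    for e in entities:
        if e == true_o or (s, r, e) in known_triples:
            continue                         # filter out other valid triples
        if score(s, r, e) > true_score:
            better += 1
    return better + 1                        # rank of the correct entity

def mr_and_hits10(ranks):
    """Mean Rank and Hits@10 over a list of ranks."""
    return sum(ranks) / len(ranks), sum(r <= 10 for r in ranks) / len(ranks)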

4.5 Results

We first compared the four types of operations in the relation encoder and chose the best one. Then, we designed three experiments to evaluate the power of learning the joint of multi-relations. Finally, we compared the proposed ConvMR with the baselines.

4.5.1 Comparison of Different Relation Encoders

For ConvMR, we considered four operations for the relation encoder: set-transformer, attn-average, GRU and biGRU. From Table 3 we observed that: (1) Overall, attn-average outperformed the other operations on all metrics across both datasets, while the set-transformer had the worst MR and the lowest Hits@10 on both datasets. Two reasons explain this gap. First, the set-transformer trains more parameters than attn-average, which increases the difficulty of training and hence degrades overall performance. Second, the set-transformer aggregates the features of multi-relations into a dense vector using complex components, such as the multi-head attention block, which may constrain the ability of the CNN to extract local features or to generalize the transitional characteristics of triples. (2) Attn-average did better than GRU and biGRU, especially on FB15k-237, where attn-average gained improvements of 366-155=211 in MR and 52.5-30.3=22.2 points in Hits@10. This indicates that modelling long-term dependencies among multi-relations does not help the relation encoder. (3) To investigate whether the sequence or direction of the multi-relations matters, we compared GRU with biGRU. biGRU did not dramatically outperform GRU; they achieved similar MR and Hits@10 on both datasets, showing that the sequence or direction of multi-relations is unnecessary. Based on the above analysis, we chose attn-average as the relation encoder in the following experiments.

Table 3: Comparison results of four operations in the relation encoder.
Operation WN18RR FB15k-237
MR Hits@10 MR Hits@10
set-transformer 915 45.4 500 23.1
attn-average 692 53.3 155 52.5
GRU 720 50.1 365 30.3
biGRU 718 50.1 366 31.5

4.5.2 Effect of the Attention Mechanism

In the following experiments, the relation encoder refers to attn-average. In this section, we study how the attention mechanism affects the performance of the relation encoder through an ablation study in which the attention mechanism is removed, so that multi-relations are encoded only by the average operation. On WN18RR, removing the attention mechanism worsened MR by 729-692=37 and reduced Hits@10 by 53.3-50.9=2.4 points. On FB15k-237, the removal worsened MR by 211-155=56 and reduced Hits@10 by 52.5-46.4=6.1 points. Therefore, the attention mechanism is important in the relation encoder.

To further demonstrate how the attention mechanism enhances the relation encoder, we selected the highest-frequency multi-relation from each dataset: (“derivationally_related_form”, “synset_domain_topic_of”) in WN18RR and (“award_winner”, “award_nominee”) in FB15k-237. Feeding them into the trained attention mechanism, we found that in WN18RR it assigned a weight of 0.58 to “derivationally_related_form” and 0.42 to “synset_domain_topic_of”, while in FB15k-237 it assigned 0.49 to “award_winner” and 0.51 to “award_nominee”.

A study [37] analysed the relations of popular KGs and classified them into three types based on relatedness, subject specifics and object specifics: highly related (R), (generalised) specialisation (S) and (generalised) context-shift (C). Type C is a generalised case of types R and S; that is, relations of type C have higher semantic relatedness and more subject and object specifics than relations of types R and S. In WN18RR, the relation “derivationally_related_form” belongs to type R and the relation “synset_domain_topic_of” belongs to type C. In FB15k-237, the vast majority of relations belong to type C. Based on this observation, we conclude that the learned weights reflect the hierarchies of relations: “derivationally_related_form” sits at the parent hierarchy of “synset_domain_topic_of”, whereas “award_winner” and “award_nominee” share the same or a similar hierarchy. This finding supports the claim that the relation encoder maintains the semantic/hierarchical connection between relations by automatically assigning different weights.

4.5.3 Removal of Multi-relation Triple Generator

In this section, we focus on the research question “Do multi-relations contribute to the proposed model?”. To answer it, we trained ConvMR without the multi-relation triple generator, i.e., its input consisted only of the original triples of the KG. As a result, ConvMR without multi-relations worsened MR by 735-692=43 and lost 53.3-51.0=2.3 points in Hits@10 on WN18RR, and worsened MR by 200-155=45 and lost 52.5-50.4=2.1 points in Hits@10 on FB15k-237. These drops indicate that learning with multi-relations captures semantically richer neighbourhood information and hence produces more reasonable vector representations.

Table 4: Comparison results for the four relation categories, reported as with/without multi-relations.
Category WN18RR FB15k-237
MR Hits@10 MR Hits@10
1-1 4/5 96.4/96.4 292/300 54.9/53.4
1-M 839/883 31.8/30.8 355/350 36.2/35.7
M-1 1157/1229 30.6/27.5 208/230 57.3/54.8
M-M 32/35 96.3/94.5 118/130 51.2/48.8

Following [5], we categorized relations into four categories based on the average number n_s of subjects per object and the average number n_o of objects per subject: one-to-one (1-1), one-to-many (1-M), many-to-one (M-1) and many-to-many (M-M). We compared the performance of ConvMR with and without the generator in each category; the results are shown in Table 4. Overall, there was no significant difference on 1-1 and 1-M relations in MR or Hits@10 on either dataset. However, ConvMR with multi-relations gained clear improvements on M-1 and M-M relations on both datasets. Entities in M-1 and M-M triples often appear less frequently than entities in 1-1 and 1-M triples [7]. In summary, training ConvMR with multi-relations improves the representation learning, especially for rare entities.
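The categorization can be reproduced with a simple pass over the triples, as sketched below; the 1.5 threshold on n_s and n_o is the value commonly used with this scheme [5] and is an assumption here.

from collections import defaultdict

def categorize_relations(triples, threshold=1.5):
    """Label each relation 1-1, 1-M, M-1 or M-M from the average number of
    subjects per object (n_s) and objects per subject (n_o), following [5]."""
    subj_per_obj, obj_per_subj = defaultdict(set), defaultdict(set)
    for s, r, o in triples:
        subj_per_obj[(r, o)].add(s)
        obj_per_subj[(r, s)].add(o)

    categories = {}
    for r in {rel for _, rel, _ in triples}:
        n_s_counts = [len(v) for (rel, _), v in subj_per_obj.items() if rel == r]
        n_o_counts = [len(v) for (rel, _), v in obj_per_subj.items() if rel == r]
        n_s = sum(n_s_counts) / len(n_s_counts)   # avg subjects per object
        n_o = sum(n_o_counts) / len(n_o_counts)   # avg objects per subject
        categories[r] = {(False, False): "1-1", (False, True): "1-M",
                         (True, False): "M-1", (True, True): "M-M"}[
                             (n_s >= threshold, n_o >= threshold)]
    return categories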

4.5.4 Comparison with Baselines

Table 5: Results on the WN18RR and FB15k-237 test sets. Results of TransE, ConvE, ConvKB and CapsE are from Vu et al. [7], where TransE, ConvKB and CapsE used the same vector representation initialization as ConvMR; results of CompGCN and HypER are from the original papers; results of RSN and ConvR are from Rossi et al. [38].
Method WN18RR FB15k-237
MR Hits@10 MR Hits@10
ConvR 5646 52.7 251 52.6
RSN 4210 48.3 248 44.4
CompGCN 3533 54.6 197 53.5
HypER 5798 52.2 250 52.0
ConvE 4187 51.5 244 50.1
TransE 743 56.0 347 46.5
ConvKB 763 56.0 254 53.2
CapsE 719 56.0 303 59.3
ConvMR 692 53.3 155 52.5

Table 5 compares the experimental results of ConvMR with the baselines under the same evaluation protocol. From the results, we can see that ConvMR obtained the best MR and competitive Hits@10 on both datasets. On WN18RR, ConvMR improved MR by 719-692=27 over the best baseline and outperformed four models on Hits@10. On FB15k-237, ConvMR beat four models on Hits@10 and improved MR by 197-155=42 over the best baseline, and by 250-155=95 over the best CNN-based KGE. This is mainly because previous KGEs process triples independently, disconnecting the semantic connection between multi-relations. Additionally, the combined proportion of M-1 triples (20.5%) and M-M triples (72.3%) in FB15k-237 is larger than that in WN18RR (47.4% and 36.1%, respectively), so the MR improvement on FB15k-237 is dramatic. In Section 4.5.3, we showed that training ConvMR with multi-relations performs better on M-1 and M-M triples even though the entities in these triples are rare. Moreover, as shown in Section 4.5.2, our attention mechanism assigns importance weights to relations based on their semantic hierarchy.

5 Conclusion

In this paper, we proposed a CNN-based KGE, ConvMR. Previous KGEs were triple/path-level learning methods that disconnect the semantic connection of multi-relations between subject-object pairs. The proposed ConvMR addresses this issue by building multi-relation triples and learning a weighted joint representation of them. The experiments presented two important findings: (1) Learning representations with multi-relation triples effectively mines features of less frequent entities. (2) The weighted joint of multi-relations automatically assigns different weights to relations based on their hierarchy, which maintains the semantic connection between relations. Experimental results demonstrated that ConvMR outperforms eight baseline techniques on MR across two popular datasets, WN18RR and FB15k-237. Even though ConvMR did not achieve state-of-the-art performance on Hits@10, our work identifies a new perspective on KGE. In the future, we will explore generalizing ConvMR to quadruple-based KGs and applying it to other NLP applications.

References

  • Lin et al. [2015a] Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. Modeling relation paths for representation learning of knowledge bases. arXiv preprint arXiv:1506.00379, 2015a.
  • Wu et al. [2020] Jiapeng Wu, Meng Cao, Jackie Chi Kit Cheung, and William L Hamilton. Temp: Temporal message passing for temporal knowledge graph completion. arXiv preprint arXiv:2010.03526, 2020.
  • Bordes et al. [2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. Learning structured embeddings of knowledge bases. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
  • Wang et al. [2014] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28, 2014.
  • Bordes et al. [2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems, 26, 2013.
  • Nguyen et al. [2018] Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Q. Phung. A novel embedding model for knowledge base completion based on convolutional neural network. In Marilyn A. Walker, Heng Ji, and Amanda Stent, editors, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers), pages 327–333. Association for Computational Linguistics, 2018. doi: 10.18653/v1/n18-2053. URL https://doi.org/10.18653/v1/n18-2053.
  • Vu et al. [2019] Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung, et al. A capsule network-based embedding model for knowledge graph completion and search personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2180–2189, 2019.
  • Yang et al. [2017] Fan Yang, Zhilin Yang, and William W Cohen. Differentiable learning of logical rules for knowledge base reasoning. Advances in neural information processing systems, 30, 2017.
  • Guo et al. [2019] Lingbing Guo, Zequn Sun, and Wei Hu. Learning to exploit long-term relational dependencies in knowledge graphs. In International Conference on Machine Learning, pages 2505–2514. PMLR, 2019.
  • Zhang et al. [2017] Chunhong Zhang, Miao Zhou, Xiao Han, Zheng Hu, and Yang Ji. Knowledge graph embedding for hyper-relational data. Tsinghua Science and Technology, 22(2):185–197, 2017.
  • Galassi et al. [2021] Andrea Galassi, Marco Lippi, and Paolo Torroni. Attention in natural language processing. IEEE Trans. Neural Networks Learn. Syst., 32(10):4291–4308, 2021. doi: 10.1109/TNNLS.2020.3019893. URL https://doi.org/10.1109/TNNLS.2020.3019893.
  • Trouillon et al. [2016] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In International conference on machine learning, pages 2071–2080. PMLR, 2016.
  • Lin et al. [2015b] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence, 2015b.
  • Wang et al. [2017] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, 2017.
  • Song et al. [2020] Hyun-Je Song, A-Yeong Kim, and Seong-Bae Park. Learning translation-based knowledge graph embeddings by n-pair translation loss. Applied Sciences, 10(11):3964, 2020.
  • Xie et al. [2020] Zhiwen Xie, Guangyou Zhou, Jin Liu, and Xiangji Huang. Reinceptione: Relation-aware inception network with joint local-global structural information for knowledge graph embedding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5929–5939, 2020.
  • Dettmers et al. [2018] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2d knowledge graph embeddings. In Thirty-second AAAI conference on artificial intelligence, 2018.
  • Balažević et al. [2019] Ivana Balažević, Carl Allen, and Timothy M Hospedales. Hypernetwork knowledge graph embeddings. In International Conference on Artificial Neural Networks, pages 553–565. Springer, 2019.
  • Jiang et al. [2019] Xiaotian Jiang, Quan Wang, and Bin Wang. Adaptive convolution for multi-relational learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 978–987, 2019.
  • Rosso et al. [2020] Paolo Rosso, Dingqi Yang, and Philippe Cudré-Mauroux. Beyond triplets: hyper-relational knowledge graph embedding for link prediction. In Proceedings of The Web Conference 2020, pages 1885–1896, 2020.
  • Perozzi et al. [2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710, 2014.
  • Grover and Leskovec [2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864, 2016.
  • Mikolov et al. [2013] Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies, pages 746–751, 2013.
  • Dasgupta et al. [2018] Shib Sankar Dasgupta, Swayambhu Nath Ray, and Partha P. Talukdar. Hyte: Hyperplane-based temporally aware knowledge graph embedding. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 2001–2011. Association for Computational Linguistics, 2018. doi: 10.18653/v1/d18-1225. URL https://doi.org/10.18653/v1/d18-1225.
  • Goel et al. [2020] Rishab Goel, Seyed Mehran Kazemi, Marcus A. Brubaker, and Pascal Poupart. Diachronic embedding for temporal knowledge graph completion. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 3988–3995. AAAI Press, 2020. URL https://ojs.aaai.org/index.php/AAAI/article/view/5815.
  • Lacroix et al. [2020] Timothée Lacroix, Guillaume Obozinski, and Nicolas Usunier. Tensor decompositions for temporal knowledge base completion. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. URL https://openreview.net/forum?id=rke2P1BFwS.
  • Manessi et al. [2020] Franco Manessi, Alessandro Rozza, and Mario Manzo. Dynamic graph convolutional networks. Pattern Recognition, 97:107000, 2020.
  • Pareja et al. [2020] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, Tao B. Schardl, and Charles E. Leiserson. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 5363–5370. AAAI Press, 2020. URL https://ojs.aaai.org/index.php/AAAI/article/view/5984.
  • Hajiramezanali et al. [2019] Ehsan Hajiramezanali, Arman Hasanzadeh, Krishna R. Narayanan, Nick Duffield, Mingyuan Zhou, and Xiaoning Qian. Variational graph recurrent neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 10700–10710, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/a6b8deb7798e7532ade2a8934477d3ce-Abstract.html.
  • Bahdanau et al. [2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1409.0473.
  • Wang et al. [2019] Rui Wang, Bicheng Li, Shengwei Hu, Wenqian Du, and Min Zhang. Knowledge graph embedding via graph attenuated attention networks. IEEE Access, 8:5212–5224, 2019.
  • Li et al. [2021] Zhifei Li, Hai Liu, Zhaoli Zhang, Tingting Liu, and Neal N Xiong. Learning knowledge graph embedding with heterogeneous relation attention networks. IEEE Transactions on Neural Networks and Learning Systems, 2021.
  • Lee et al. [2019] Juho Lee, Yoonho Lee, Jungtaek Kim, Adam Kosiorek, Seungjin Choi, and Yee Whye Teh. Set transformer: A framework for attention-based permutation-invariant neural networks. In International Conference on Machine Learning, pages 3744–3753. PMLR, 2019.
  • Cho et al. [2014] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
  • Toutanova and Chen [2015] Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd workshop on continuous vector space models and their compositionality, pages 57–66, 2015.
  • Vashishth et al. [2020] Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and Partha P. Talukdar. Composition-based multi-relational graph convolutional networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. URL https://openreview.net/forum?id=BylA_C4tPr.
  • Allen et al. [2019] Carl Allen, Ivana Balazevic, and Timothy M. Hospedales. On understanding knowledge graph representation. CoRR, abs/1909.11611, 2019. URL http://arxiv.org/abs/1909.11611.
  • Rossi et al. [2021] Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, and Paolo Merialdo. Knowledge graph embedding for link prediction: A comparative analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(2):1–49, 2021.