TDRE: A Tensor Decomposition Based Approach for Relation Extraction
Abstract
Extracting entity pairs along with their relation types from unstructured text is a fundamental subtask of information extraction. Most existing joint models rely on a fine-grained labeling scheme or on shared embedding parameters. These methods directly model the joint probability of multi-labeled triplets and therefore suffer from extracting redundant triplets across all relation types, even though each sentence may contain very few relation types. In this paper, we first model the final triplet extraction result as a third-order tensor of word-to-word pairs enriched with each relation type. To obtain the relations contained in a sentence, we introduce an independent but jointly trained relation classification module. A tensor decomposition strategy is then utilized to decompose the triplet tensor with the predicted relational components, which omits the calculations for unpredicted relation types. Building on effective decomposition methods, we propose the Tensor Decomposition based Relation Extraction (TDRE) approach, which is able to extract overlapping triplets and avoid detecting unnecessary entity pairs. Experiments on the benchmark datasets NYT, CoNLL04 and ADE demonstrate that the proposed method outperforms existing strong baselines.
keywords:
Relation extraction, Tensor decomposition, Natural language processing, Deep neural network

1 Introduction
Relation Extraction (RE) aims to extract entity pairs with relation types from unstructured text, which facilitates many other Natural Language Processing (NLP) tasks, including knowledge base construction and question answering. For example, given the sentence “Bill Gates co-founded Microsoft with his friend Paul Allen”, the task is to extract the triplet (Bill Gates, Founder, Microsoft). Traditional pipeline models consist of two separate subtasks, named entity recognition and relation classification, which first extract entities and then classify entity pairs with relation types, decoding the results into triplet form. However, pipeline models suffer from error propagation and do not fully exploit the relevance of the two subtasks. To tackle these problems, most recent studies pay attention to joint models, which integrate entity recognition and relation classification into a single model trained jointly, following the multi-task learning paradigm. Owing to reasonable information sharing, joint models have achieved better performance than pipeline models.
Most existing joint models rely on a fine-grained labeling scheme [1] or train by parameter sharing [2, 3] in the embedding layers. However, a fine-grained labeling scheme is unable to identify overlapping relations. As an improvement, the authors of [4] model triplet extraction as a multi-head selection problem, which approximately calculates the joint probability $p(w_s, r, w_t)$, where $(w_s, r, w_t)$ is a triplet in the given sentence, $w_s$ denotes the source entity, $w_t$ denotes the target entity and $r$ denotes the relation between them. With these notations in mind, we can model the predicted approximate joint probabilities as an $n \times K \times n$ tensor, where $n$ is the number of words in the sentence under consideration and $K$ is the number of predefined relation types. Each element in this tensor denotes the probability of the existence of a relation between a word pair. However, the great influence of relation classification would be ignored if we directly modeled these joint probabilities, and wrong relation predictions could then lead to unnecessary triplets. As is well known, a triplet is correct only when all of its elements are predicted correctly, which relies on correctly predicted relations. Wrongly predicted relation types decrease the accuracy of the extracted triplets and incur many unnecessary calculations.
In this paper, we attempt to avoid predicting redundant entity pairs while extracting overlapping triplets. In other words, we prune the extracted tensor from which overlapping relations are decoded. Based on the tensor notation mentioned above, we consider the final extracted triplets (the joint probability) as a three-way tensor. By introducing the tensor-based decomposition into directional components (DEDICOM) algorithm [5], we decompose the extracted triplet tensor into relational components so that it can effectively capture the interactions among relation types. From the perspective of probability, decomposing is an effective way to simplify the original joint probabilities. To extract the multiple relation types that a sentence contains, we treat the relation classification model as an independent module that is nevertheless part of the joint training process. We then define a new operation between the diagonal tensor appearing in the DEDICOM algorithm and the result of relation classification, suppressing the detection of entity pairs for relation types that are not predicted. This avoids extracting redundant triplets. Originally, the tensor-based DEDICOM strategy was designed to capture asymmetric interactions, here among word pairs in the sentence. This idea suits our task because the extracted triplet $(e_s, r, e_t)$ is directional, where $e_s$ is the source entity and $e_t$ is the target entity.
The main work of this paper includes:
1. We propose a new joint learning framework with a tensor decomposition strategy for relation extraction (TDRE). In particular, we employ the DEDICOM strategy to capture the interactions among relation types.
2. We introduce a jointly trained relation classification module and apply its results in the tensor decomposition process, which avoids extracting redundant triplets.
3. We conduct extensive experiments on three benchmark datasets, NYT10, CoNLL04 and ADE, which show that our model achieves better performance than most existing strong baselines.
2 Related Work
Many previous relation extraction approaches focus on manual feature engineering, which relies heavily on NLP tools and is sometimes only applicable to specific fields. In recent years, neural networks and deep learning have attracted more and more attention, and deep neural network models have made significant progress in relation extraction without complicated handcrafted features. Relation extraction approaches can be divided into two categories: pipeline models [6, 7] and joint models [8, 1, 4, 2]. Pipeline models split the task into two separate models, a named entity recognition model and a relation classification model. However, separate models cannot exploit the potential relevance between the two subtasks and suffer from error propagation. Joint models integrate the subtasks into a single model trained jointly via shared parameters.
Zheng et al. [1] introduced a novel tagging scheme where each word is tagged with a unique tag combining an entity type and a relation type; hence, the set of word tags is the Cartesian product of the relation types and the entity types. However, this scheme cannot deal with overlapping relations, where one word may be mapped to several other words with different relations in the text. Zeng et al. [9] proposed a sequence-to-sequence copy mechanism to decode overlapping triplets. Bekoulis et al. [4] considered entity-relation extraction as a multi-head selection problem, which also effectively solves the overlapping relations problem. With the development of graph convolutional networks, graph representations have been used to capture not only ordered features along the timeline but also features between different nodes in space. Sun et al. [2] made full use of the relevance between entities and relation types by building an entity-relation graph to capture combinations of entity pairs and valid relations. Guo et al. [10] also used graph convolutional networks to capture structured information with dependency trees and to select relevant sub-structures with a soft-pruning approach. Nevertheless, these methods predict every relation for every word-to-word pair and thus still suffer from redundant triplets. Takanobu et al. [11] first detected relation types and then identified the related entity pairs with a hierarchical reinforcement learning strategy, which avoids predicting redundant relation types to some extent.
Inspired by the aforementioned methods, we introduce an independent but jointly trained relation classification module. Based on the DEDICOM strategy, we develop a joint learning framework for relation extraction with a tensor decomposition algorithm. The proposed tensor decomposition based relation extraction model effectively avoids predicting redundant triplets.
3 Model
In this section, we present our tensor decomposition based relation extraction model (TDRE) which is illustrated in Figure 1.

The relation extraction problem is denoted as follows. Given the relation type set $\mathcal{R} \cup \{\text{N}\}$, where $\mathcal{R}$ represents the set of predefined relation types and “N” stands for the none-relation type: if a word is not matched with any other word, it is assigned the relation type “N”. Given a sentence $S = \{w_1, w_2, \dots, w_n\}$ consisting of $n$ words, the task is to structure the text into multiple triplets $(e_s, r, e_t)$, where $e_s$ denotes the source entity, $e_t$ denotes the target entity and $r$ denotes the relation between them.
3.1 Embedding Layer and BiLSTM Layer
In order to map each word into a word vector, we use pre-trained word embeddings from the skip-gram word2vec model [12]. We combine them with character embeddings obtained through a convolutional neural network (CNN), since character features capture morphological information such as prefixes and suffixes. A Bidirectional Long Short-Term Memory network (BiLSTM) is utilized to encode bidirectional contextual information [13],
$h_i = [\overrightarrow{\mathrm{LSTM}}(x_i);\ \overleftarrow{\mathrm{LSTM}}(x_i)] \qquad (1)$
where $h_i$ represents the hidden state of the BiLSTM, concatenating the forward and backward hidden states at position $i$ of the given sequence, and $x_i$ represents the $i$th word embedding representation. Here, $x_i$ is formed by concatenating the pre-trained word vector embedding and the CNN-based character embedding. From this layer, we obtain a final representation matrix for the given sentence:
$H = [h_1, h_2, \dots, h_n] \in \mathbb{R}^{n \times d} \qquad (2)$
where $n$ is the sequence length and $d$ is the concatenated bidirectional hidden dimension of the BiLSTM network.
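As a concrete illustration, below is a minimal PyTorch sketch of this encoding layer. All dimensions, the character max-pooling, and the class structure are our own illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Word + char-CNN embeddings followed by a BiLSTM (a sketch; sizes are illustrative)."""
    def __init__(self, vocab_size=10000, char_size=100, word_dim=50, char_dim=16, hidden=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)   # pre-trained word2vec in the paper
        self.char_emb = nn.Embedding(char_size, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_dim, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(word_dim + char_dim, hidden, num_layers=3,
                              bidirectional=True, batch_first=True)

    def forward(self, words, chars):
        # words: (batch, n); chars: (batch, n, max_word_len)
        b, n, L = chars.shape
        c = self.char_emb(chars).view(b * n, L, -1).transpose(1, 2)        # (b*n, char_dim, L)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, n, -1)  # pool over characters
        x = torch.cat([self.word_emb(words), c], dim=-1)                   # x_i = [word; char]
        H, _ = self.bilstm(x)                                              # H: (batch, n, 2*hidden)
        return H
```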
3.2 Named Entity Recognition Model
To identify the entities in a given sentence, we regard entity recognition as a sequence labeling problem, as is done in [14]. A Conditional Random Field (CRF) layer is employed here. The learned state transition matrix makes full use of neighboring tag information, which helps both entity type identification and boundary recognition. With the sentence representation $H$ after the embedding and BiLSTM layers, the given entity tag set and the defined feature score functions $f_k$, the probabilistic model is formulated as:
$p(y \mid H) = \dfrac{\exp\left(\sum_{i}\sum_{k} \lambda_k f_k(y_{i-1}, y_i, H, i)\right)}{\sum_{y'} \exp\left(\sum_{i}\sum_{k} \lambda_k f_k(y'_{i-1}, y'_i, H, i)\right)} \qquad (3)$
where $\lambda_k$ is the weight of feature function $f_k$, so each feature function may contribute differently to the final probabilistic score. Each feature function may include several sub-feature functions, formalized as $f_k(y_{i-1}, y_i, H, i)$, where $i$ represents the position of the current word in the given sentence, $y_i$ denotes the entity tag of the current word and $y_{i-1}$ denotes the entity tag of the previous word. With these definitions, the loss function is formally given as:
$L_{ner} = -\log p(\bar{y} \mid H) \qquad (4)$
$\phantom{L_{ner}} = -\sum_{i}\sum_{k} \lambda_k f_k(\bar{y}_{i-1}, \bar{y}_i, H, i) + \log \sum_{y'} \exp\left(\sum_{i}\sum_{k} \lambda_k f_k(y'_{i-1}, y'_i, H, i)\right) \qquad (5)$
where $\bar{y}$ is the ground truth label sequence. Our optimization goal is to minimize this loss function; entity tags are finally decoded with the Viterbi algorithm.
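As one possible implementation of this module, the sketch below pairs a linear emission layer with a CRF from the third-party pytorch-crf package; the hidden size and tag set size are illustrative assumptions.

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class NERHead(nn.Module):
    """Linear emission scores + CRF over BiLSTM states (a sketch; sizes are illustrative)."""
    def __init__(self, hidden=128, num_tags=9):
        super().__init__()
        self.proj = nn.Linear(hidden, num_tags)      # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)   # learns the tag-transition matrix

    def loss(self, H, tags, mask):
        # negative log-likelihood of the gold tag sequence, cf. Eqs. (4)-(5)
        return -self.crf(self.proj(H), tags, mask=mask)

    def decode(self, H, mask):
        # Viterbi decoding of the most likely tag sequence
        return self.crf.decode(self.proj(H), mask=mask)
```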
3.3 Relation Classification Model
Classifying relations alone is much easier than simultaneously classifying relations together with their related entity pairs. In this module, we treat relation classification as an independent multi-label classification problem, because each sentence may contain more than one relation type [15, 16, 17, 18]. In order to share the word representations, we use the outputs of the BiLSTM network as the inputs of our classification model, which helps capture the interactions between entity recognition and relation classification. It is worth noting that we classify the entire sentence here instead of classifying every word in the sentence.
A linear network is utilized in our relation classification model, denoted as $z = W_c \bar{h} + b_c$, where $\bar{h}$ is the sentence representation and $W_c$ and $b_c$ are training parameters. For the multi-label problem, we focus on the final expression of the classification result, where each element is calculated as follows:
$\hat{p}_k = \mathbb{1}\left[\sigma(z_k) > \delta\right], \quad k = 1, \dots, K \qquad (6)$
where $\sigma$ is the sigmoid activation function and $\delta$ is the threshold for predicting the relation types contained in the sentence.
If the real relation types are denoted by $y^{c} \in \{0,1\}^{K}$, then the goal of this classification module is to minimize the loss function:
$L_{cls} = -\sum_{k=1}^{K}\left[y^{c}_k \log \sigma(z_k) + \left(1 - y^{c}_k\right)\log\left(1 - \sigma(z_k)\right)\right] \qquad (7)$
where $y^{c}_k$ is the ground truth for the $k$th relation type and $\sigma(z_k)$ is the predicted probability that the current sentence contains the $k$th relation type. During training, the sigmoid cross-entropy loss is used since this is a multi-label problem.
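A minimal sketch of this module follows; the max-pooling used to form the sentence representation and the 0.5 threshold are our own assumptions for illustration.

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Sentence-level multi-label relation classifier (a sketch)."""
    def __init__(self, hidden=128, num_rel=5):
        super().__init__()
        self.linear = nn.Linear(hidden, num_rel)

    def forward(self, H):
        # H: (batch, n, hidden) BiLSTM states; pool into one sentence vector
        return self.linear(H.max(dim=1).values)       # (batch, num_rel) logits z

rc = RelationClassifier()
z = rc(torch.randn(2, 10, 128))
loss = nn.BCEWithLogitsLoss()(z, torch.zeros(2, 5))   # sigmoid cross-entropy, Eq. (7)
pred = (torch.sigmoid(z) > 0.5).float()               # thresholded relation mask, Eq. (6)
```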
3.4 Triplet Extraction Model
For the sentence with $n$ words defined above, the target of relation extraction is to extract the existing triplets under the predefined relation type set $\mathcal{R}$ from the unstructured text. An extracted triplet $(e_s, r, e_t)$ corresponds to specified entity spans and the relation $r$ between $e_s$ and $e_t$. For a specific relation type $r$, we focus on mapping each word to the other words in the sentence with an indicator function. The word-to-word pair mapping can be modeled as a matrix $M^r \in \{0,1\}^{n \times n}$, where element $M^r_{ij}$ indicates whether the $i$th word and the $j$th word form a triplet-related word pair; the $i$th and $j$th words represent the last words of the source entity and target entity, respectively. Hence, the indicator function yields $M^r_{ij} = 1$ when the $i$th word maps to the $j$th word under the specific relation type $r$. Extending the matrix with the relation type component, we obtain a tensor representation $\mathcal{T} \in \{0,1\}^{n \times K \times n}$, defined as follows:
$\mathcal{T}_{irj} = \begin{cases} 1, & \text{if } (w_i, r, w_j) \text{ forms a triplet}, \\ 0, & \text{otherwise}, \end{cases} \qquad (8)$
Note that in this formula, $w_i$ and $w_j$ are the tail words of the two entity spans.
In order to extract useful information from the tensor, we employ a tensor decomposition algorithm, namely the decomposition into directional components (DEDICOM) strategy introduced by Harshman et al. [5]. For a sequence with $n$ words, the defined mapping matrix needs to describe the asymmetric relationships between each word and the other words. This fits our setting, since each extracted triplet is directional: the source entity and target entity are ordered. The triplet extraction process can thus be modeled as a three-way DEDICOM model, where the target constructed tensor is $\mathcal{X} \in \mathbb{R}^{n \times K \times n}$, with $K$ denoting the number of relation types. The decomposition is:
$\mathcal{X}_r = H A\, D_r\, R\, D_r\, A^{\top} H^{\top} \qquad (9)$
where $H$ is the sequence representation after the BiLSTM network, and $A$ and $R$ are parameters. The matrices $D_r$ are diagonal, and the diagonal entry $(D_r)_{ll}$ indicates the participation of the $l$th latent component in relation $r$.
With the module described above, it would still be necessary to score all relation types for each word pair, while the sentence may contain only a few relation types. To address this problem, and to further utilize the relation types predicted by the relation classification model, we make a specific setting for the factor $D_r$ in the tensor decomposition. Indeed, it is unnecessary to calculate the mapping matrix when there is no prediction for the $r$th relation type. In that case we set $D_r$ to a zero matrix, so that $\mathcal{X}_r$ becomes a zero matrix after the calculation, meaning that no triplet exists in the $r$th relation component. It is thus theoretically evident that the decomposition strategy avoids predicting unnecessary triplets and simultaneously reduces prediction time. The custom operation can be represented as follows:
$\hat{D}_r = \hat{p}_r \cdot D_r = \begin{cases} D_r, & \hat{p}_r = 1, \\ \mathbf{0}, & \hat{p}_r = 0, \end{cases} \qquad (10)$
And then, the final decomposition algorithm can be formalized as:
$\mathcal{X}_r = H A\, \hat{D}_r\, R\, \hat{D}_r\, A^{\top} H^{\top} \qquad (11)$
where $\hat{p}$ denotes the prediction result of the relation classification module.
From the perspective of probability distributions, we do not directly model the joint probability as is done in [4] and [3], but decompose the joint probability into relation components via conditional probabilities:
$P(w_i, r, w_j) = P(r)\, P(w_i, w_j \mid r) \qquad (12)$

where $P$ represents a probability distribution, $w_i$ and $w_j$ denote the $i$th and $j$th word representations in the sentence, and $r$ denotes the relation type. Since each word may be mapped to more than one word and relation type, we employ the sigmoid function here to amplify the differences. The final triplet decoding procedure can be denoted as:
$\hat{\mathcal{T}}_{irj} = \mathbb{1}\left[\sigma(\mathcal{X}_{irj}) > \theta\right] \qquad (13)$
where $\mathcal{X}$ is the calculated tensor and $\theta$ represents the threshold for judging whether a triplet is true.
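To make the decomposition concrete, the sketch below implements the relation-masked scoring and thresholded decoding of Eqs. (9)-(13) in PyTorch; the latent dimension, parameter initialization and threshold value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TripletScorer(nn.Module):
    """Relation-masked DEDICOM scorer, a sketch of Eqs. (9)-(13)."""
    def __init__(self, hidden=128, latent=64, num_rel=5):
        super().__init__()
        self.A = nn.Parameter(torch.randn(hidden, latent) * 0.01)
        self.R = nn.Parameter(torch.randn(latent, latent) * 0.01)   # asymmetric interactions
        self.d = nn.Parameter(torch.randn(num_rel, latent) * 0.01)  # diagonals of the D_r

    def forward(self, H, rel_mask):
        # H: (batch, n, hidden); rel_mask: (batch, num_rel) in {0, 1} from the classifier
        E = H @ self.A                                    # (batch, n, latent)
        D = self.d.unsqueeze(0) * rel_mask.unsqueeze(-1)  # zero unpredicted relations, Eq. (10)
        left = torch.einsum('bnl,brl->brnl', E, D)        # rows of E scaled by diag(D_r)
        mid = torch.einsum('brnl,lm->brnm', left, self.R)
        right = torch.einsum('bjl,brl->brjl', E, D)
        # X[b, i, r, j] = E_i diag(D_r) R diag(D_r) E_j^T, Eq. (11)
        return torch.einsum('brnm,brjm->bnrj', mid, right)

scorer = TripletScorer()
X = scorer(torch.randn(2, 10, 128), torch.ones(2, 5))     # (batch, n, num_rel, n) scores
triplets = torch.sigmoid(X) > 0.5                         # threshold decoding, Eq. (13)
```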
The goal of this decomposed relation extraction module is to minimize the cross-entropy loss during training:
$L_{tri} = -\sum_{i,r,j}\left[\mathcal{T}_{irj}\log \sigma(\mathcal{X}_{irj}) + \left(1-\mathcal{T}_{irj}\right)\log\left(1-\sigma(\mathcal{X}_{irj})\right)\right] \qquad (14)$
where $\mathcal{T}$ is the ground truth label tensor and $\sigma(\mathcal{X})$ is the predicted triplet tensor.
Input: Sentence $S = \{w_1, \dots, w_n\}$, predefined relation type set $\mathcal{R}$
Output: Triplets $\{(e_s^m, r^m, e_t^m)\}_{m=1}^{M}$, where $e_s^m$ and $e_t^m$ are the source entity and the target entity, $r^m$ denotes the relation type, and $M$ denotes the number of triplets.
3.5 Loss Function
In order to jointly train the proposed model, we combine the three objective loss functions and optimize the parameters together. The final loss function is computed as follows:
$L = L_{ner} + L_{cls} + L_{tri} \qquad (15)$
where $L_{ner}$ is the loss function of the named entity recognition module, $L_{cls}$ is the loss function of the relation classification module and $L_{tri}$ is the loss function of the final triplet extraction module, calculated through the DEDICOM algorithm. To update the parameters, we optimize our model with the Adam optimizer and train the joint model following the multi-task learning paradigm.
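Putting the modules together, one joint training step could look like the following sketch, which simply sums the three losses as in Eq. (15); the module and batch-field names refer to the hypothetical sketches above.

```python
import torch.nn.functional as F

def train_step(batch, encoder, ner_head, rel_classifier, scorer, optimizer):
    # One joint update over the three modules (a sketch; names are assumptions).
    H = encoder(batch["words"], batch["chars"])
    loss_ner = ner_head.loss(H, batch["tags"], batch["mask"])
    z = rel_classifier(H)
    loss_cls = F.binary_cross_entropy_with_logits(z, batch["rel_labels"])
    rel_mask = (z.sigmoid() > 0.5).float()  # hard gate; the paper's exact gating may differ
    X = scorer(H, rel_mask)
    loss_tri = F.binary_cross_entropy_with_logits(X, batch["triplet_tensor"])
    loss = loss_ner + loss_cls + loss_tri   # L = L_ner + L_cls + L_tri, Eq. (15)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```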
Finally, our TDRE algorithm is summarized in Algorithm 1.
4 Experiments
4.1 Datasets
We conduct experiments on three benchmark datasets for relation extraction: (i) NYT10, the original New York Times corpus [19], developed by distant supervision and published by Takanobu et al. [11], who filtered the dataset by removing relations that appear in the training set but not in the test set, as well as sentences containing no relations. (ii) The CoNLL04 dataset [20], split as in Gupta et al. [21] and Adel et al. [22]. The official entity type set is {Location, Organization, Person, Other}, where we omit the beginning, inside and outside tags of entity types, and the official relation type set is {Kill, Live in, Located in, OrgBased in, Work for}. (iii) The Adverse Drug Events (ADE) dataset [23], where we use 80% for training and 20% for testing. The tag set {Beginning, Inside, Outside} is applied since there are no official entity types, and the relation type set is {Adverse-Effect, Drug-Disease Treatment}.
The details of the public datasets are reported in Table 1. The numbers of triplets for ADE are approximate since we conduct cross-validation on that dataset.
Datasets | NYT10 | CoNLL04 | ADE |
---|---|---|---|
Relation types | 29 | 5 | 2 |
Entity types | 7 | 4 | 3 |
Training set | 70339 | 910 | 3416 |
Training triplets | 87739 | 1273 | 5650 |
Test set | 4006 | 288 | 854 |
Test triplets | 5859 | 422 | 1450 |
4.2 Details of Implementation
Similar to previous works, we use the 50-dimensional word embeddings of [22], trained on Wikipedia. Character embeddings are initialized randomly, the kernel size of the convolutional neural network is set to 3, and the parameters are updated during training. We then concatenate the word embeddings and character embeddings as the final input embeddings. For the representation learning layers, we use a three-layer BiLSTM with 64 hidden units. In contrast to most existing models, we randomly initialize entity label embeddings of size 128 for all datasets and update them during optimization. Training is performed with the Adam optimizer, and the dropout technique is applied to the input embeddings and BiLSTM hidden layers with a dropout rate of 0.1. Early stopping on the validation set is used to avoid overfitting.
Methods | NYT10 Pre | NYT10 Rec | NYT10 F1 | NYT10-sub Pre | NYT10-sub Rec | NYT10-sub F1
---|---|---|---|---|---|---
Tagging [1] | 59.3 | 38.1 | 46.4 | 25.6 | 23.7 | 24.6 |
CopyR [9] | 56.9 | 45.2 | 50.4 | 39.2 | 26.3 | 31.5 |
HRL [11] | 71.4 | 58.6 | 64.4 | 81.5 | 47.5 | 60.0 |
MrMep [24] | 71.7 | 63.5 | 67.3 | 83.2 | 55.0 | 66.2 |
TDRE (ours) | 81.3 | 68.3 | 74.3 | 82.2 | 70.2 | 75.7 |
4.3 Results
Evaluation Metrics
For entity recognition, an entity is considered correctly identified only when its full span is recognized and assigned the correct type. For triplet extraction, a triplet is considered correct if and only if the source entity, the target entity and the related relation type are all correct. We adopt precision, recall and micro-F1 to evaluate performance.
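For clarity, a small helper implementing this exact-match evaluation could look as follows; the tuple format is an illustrative assumption.

```python
def micro_prf(gold: set, pred: set):
    """Micro precision/recall/F1 over exact-match triplets.

    gold, pred: sets of (sentence_id, source_entity, relation, target_entity);
    a predicted triplet counts only if all of its parts match a gold one.
    """
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f1
```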
Baselines
For comparison, we choose the following models as baselines:
1. Tagging [1]. It casts joint extraction as a sequence labeling problem with a special tagging scheme where the label of each word includes the entity type, the entity order (source or target entity) and the relation type.
2. CopyR [9]. It is a sequence-to-sequence learning framework with a copy mechanism which can handle the overlapping triplet problem.
3. HRL [11]. It applies hierarchical reinforcement learning, detecting relations first and then extracting the participating entities, regarding the related entities as the arguments of a relation.
4. MrMep [24]. It is also a joint learning model which first classifies relations and then uses triplet attention to reinforce the connections between relations and entity pairs.
5. Multi-head selection model [4]. It treats relation extraction as a multi-head selection problem, which can identify multiple relation types for each entity pair.
6. Multi-head with adversarial training regularization [25]. It also achieves strong performance on overlapping relations.
7. Li et al. [26]. It introduces a neural joint model that simultaneously extracts biomedical entities and their relations by using handcrafted features or features derived from NLP tools.
8. MultiQA [27]. It casts the task as a multi-turn question answering problem, identifying answer spans (entities) from questions constructed for the defined relations.
Experiment Results
The detailed comparisons between the baseline models and our proposed model are summarized in Table 2 and Table 3, in terms of precision, recall and micro-averaged F1. The best performances are marked in bold.
Dataset | Model | Entity Pre | Entity Rec | Entity F1 | Triplet Pre | Triplet Rec | Triplet F1 | Overall F1
---|---|---|---|---|---|---|---|---
CoNLL04 | Multi-Head | 83.75 | 84.06 | 83.9 | 63.75 | 60.43 | 62.04 | 72.97
CoNLL04 | MultiHead-AT | - | - | 83.61 | - | - | 61.95 | 72.78
CoNLL04 | MultiQA | 89.0 | 86.6 | 87.8 | 69.2 | 68.2 | 68.9 | 78.35
CoNLL04 | TDRE (ours) | 91.85 | 91.34 | 91.59 | 80.90 | 76.67 | 78.73 | 85.16
ADE | Li [26] | 82.70 | 86.70 | 84.60 | 67.50 | 75.80 | 71.40 | 78.00
ADE | Multi-Head | 84.72 | 88.16 | 86.40 | 72.10 | 77.24 | 74.58 | 80.49
ADE | MultiHead-AT | - | - | 86.73 | - | - | 75.52 | 81.13
ADE | TDRE (ours) | 88.82 | 88.28 | 88.55 | 81.63 | 76.20 | 78.82 | 83.69
On the NYT10 dataset, the proposed TDRE model significantly outperforms the others. The sub test set NYT10-sub, published by HRL [11], mostly contains overlapping relations, where triplets share both the source entity and the target entity. Our model shows the best performance particularly on NYT10-sub, which indicates that TDRE is especially effective for overlapping problems. Table 3 shows that our approach also outperforms the baseline methods in both entity recognition and relation identification on the CoNLL04 and ADE datasets.
Focusing especially on the precision of triplet extraction, the experimental results show great improvements. Theoretically, our model avoids extra computation for unpredicted relation types, while most of the other models compute scores for every relation type of an entity pair. The agreement between the theoretical analysis and the experimental results shows that the model can indeed reduce the prediction of redundant triplets. All experimental results indicate that modeling the relation extraction process via tensor decomposition into relation components is more effective than the baseline models, even without using BERT.
Ablation Study
In order to illustrate the effectiveness of the various parts of our model, we conduct ablation experiments on the CoNLL04 dataset. Concretely, we separately remove character embeddings (Char Emb.), entity label embeddings (Label Emb.) and the relation classification module (CLS). According to the results presented in Table 4, character embeddings, which capture additional information such as suffixes, prefixes and other morphological features, are helpful for triplet extraction. Entity label embeddings incorporate entity type and boundary information, which helps capture the integrity of entities. When the classification module is removed, the proposed approach degenerates to directly modeling the joint probability, and performance degrades severely in both entity recognition and triplet extraction. This indicates that the proposed tensor decomposition into relation components contributes greatly to triplet extraction.
Methods | Entity F1 | Triplet Pre | Triplet Rec | Triplet F1
---|---|---|---|---
TDRE | 91.59 | 80.90 | 76.67 | 78.73 |
- Char Emb. | 85.65 | 75.91 | 64.52 | 69.75 |
- Label Emb. | 89.51 | 75.74 | 66.90 | 71.05 |
- CLS | 88.80 | 74.53 | 66.90 | 70.51 |
Analysis of Relation Classification Model
Experiments have been conducted on the CoNLL04 dataset with recurrent neural networks (RNN), recurrent convolutional neural networks (RCNN) and bidirectional long short-term memory networks (BiLSTM) as the relation classification model. The results in Table 5 show that our model is still better than the previously compared models [4, 27] even without the classification module. We conjecture this is because the raw DEDICOM strategy already decomposes with diagonal tensors that interpret the connections between relation types, and the strategy was originally designed to capture asymmetry, which fits our ordered triplets. Looking at the results with RNN and RCNN, performance is better than the multi-head model but not better than MultiQA [27]. There are two possible reasons: (1) the classification models pay much attention to identifying relations at the cost of entity recognition accuracy; (2) parameters are not shared with named entity recognition in these classification models, which ignores the inherent connections between relation identification and entity recognition. After sharing parameters in the BiLSTM network, the model achieves substantial improvements in entity recognition and triplet extraction. Better performance in relation identification or entity recognition yields better triplet extraction performance.
Model | Rel F1 | Entity F1 | Triplet Pre | Triplet Rec | Triplet F1
---|---|---|---|---|---
- CLS | - | 88.8 | 74.5 | 66.9 | 70.5 |
RNN | 97.6 | 85.0 | 70.5 | 60.2 | 64.9 |
RCNN | 99.7 | 87.7 | 75.2 | 66.4 | 70.5 |
BiLSTM | 94.1 | 91.6 | 80.9 | 76.7 | 78.7 |
5 Conclusion
In this paper, we introduce a joint neural model with a tensor decomposition based strategy for multi-labeled relation extraction (TDRE). The introduced tensor models the connections of word pairs in each relation component, which can tackle overlapping relations. We consider relation classification as an independent but jointly trained module to obtain the relation types contained in a sentence. By injecting the predicted relations into the tensor decomposition formula as relational component factors, the model avoids identifying unnecessary entity pairs for relation types that are not present. Experiments on public datasets demonstrate that the proposed model achieves significant improvements over previous state-of-the-art baselines. In the future, we plan to explore other tensor decomposition approaches and extend our model to other tensor based modeling tasks.
Acknowledgments
This work is supported by the Science and Technology Plan Project of Sichuan Province (Key R & D Project, 2020YFS0465) and Mathematics Teaching Research and Development Center for Colleges (CMC20190501).
References
[1] S. Zheng, F. Wang, H. Bao, Y. Hao, P. Zhou, B. Xu, Joint extraction of entities and relations based on a novel tagging scheme, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017, pp. 1227–1236. doi:10.18653/v1/P17-1113. URL https://www.aclweb.org/anthology/P17-1113
[2] C. Sun, Y. Gong, Y. Wu, M. Gong, D. Jiang, M. Lan, S. Sun, N. Duan, Joint type inference on entities and relations via graph convolutional networks, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1361–1370. doi:10.18653/v1/P19-1131. URL https://www.aclweb.org/anthology/P19-1131
[3] T.-J. Fu, P.-H. Li, W.-Y. Ma, GraphRel: Modeling text as relational graphs for joint entity and relation extraction, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1409–1418. doi:10.18653/v1/P19-1136. URL https://www.aclweb.org/anthology/P19-1136
[4] G. Bekoulis, J. Deleu, T. Demeester, C. Develder, Joint entity recognition and relation extraction as a multi-head selection problem, Expert Systems with Applications 114 (2018) 34–45. doi:10.1016/j.eswa.2018.07.032.
[5] R. Harshman, P. Green, Y. Wind, M. Lundy, A model for the analysis of asymmetric data in marketing research, Marketing Science 1 (2) (1982) 205–242. doi:10.1287/mksc.1.2.205.
[6] K. Fundel-Clemens, R. Küffner, R. Zimmer, RelEx—relation extraction using dependency parse trees, Bioinformatics (2007) 365–371. doi:10.1093/bioinformatics/btl616.
[7] H. Gurulingappa, A. Mateen-Rajpu, L. Toldo, Extraction of potential adverse drug events from medical case reports, Journal of Biomedical Semantics 3 (1) (2012) 15. doi:10.1186/2041-1480-3-15.
[8] M. Miwa, M. Bansal, End-to-end relation extraction using LSTMs on sequences and tree structures, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016, pp. 1105–1116. doi:10.18653/v1/P16-1105. URL https://www.aclweb.org/anthology/P16-1105
[9] X. Zeng, D. Zeng, S. He, K. Liu, J. Zhao, Extracting relational facts by an end-to-end neural model with copy mechanism, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 506–514. doi:10.18653/v1/P18-1047. URL https://www.aclweb.org/anthology/P18-1047
[10] Z. Guo, Y. Zhang, W. Lu, Attention guided graph convolutional networks for relation extraction, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 241–251. doi:10.18653/v1/P19-1024. URL https://www.aclweb.org/anthology/P19-1024
[11] R. Takanobu, T. Zhang, J. Liu, M. Huang, A hierarchical framework for relation extraction with reinforcement learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 7072–7079.
[12] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Proceedings of the 26th International Conference on Neural Information Processing Systems, 2013, pp. 3111–3119.
[13] A. Graves, A. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 6645–6649.
[14] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, arXiv preprint arXiv:1508.01991 (2015).
[15] S. Lai, L. Xu, K. Liu, J. Zhao, Recurrent convolutional neural networks for text classification, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 2267–2273.
[16] P. Liu, X. Qiu, X. Huang, Recurrent neural network for text classification with multi-task learning, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 2873–2879.
[17] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, E. Hovy, Hierarchical attention networks for document classification, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480–1489. doi:10.18653/v1/N16-1174. URL https://www.aclweb.org/anthology/N16-1174
[18] L. Yao, C. Mao, Y. Luo, Graph convolutional networks for text classification, in: The Thirty-Third AAAI Conference on Artificial Intelligence, 2019, pp. 7370–7377.
[19] S. Riedel, L. Yao, A. McCallum, Modeling relations and their mentions without labeled text, in: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III, ECML PKDD'10, Springer-Verlag, 2010, pp. 148–163. doi:10.1007/978-3-642-15939-8_10.
[20] D. Roth, W.-t. Yih, A linear programming formulation for global inference in natural language tasks, in: Proceedings of the Eighth Conference on Computational Natural Language Learning, 2004, pp. 1–8. URL https://www.aclweb.org/anthology/W04-2401
[21] P. Gupta, H. Schütze, B. Andrassy, Table filling multi-task recurrent neural network for joint entity and relation extraction, in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 2016, pp. 2537–2547. URL https://www.aclweb.org/anthology/C16-1239
[22] H. Adel, H. Schütze, Global normalization of convolutional neural networks for joint entity and relation classification, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1723–1729. doi:10.18653/v1/D17-1181. URL https://www.aclweb.org/anthology/D17-1181
[23] H. Gurulingappa, A. M. Rajput, A. Roberts, J. Fluck, M. Hofmann-Apitius, L. Toldo, Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports, Journal of Biomedical Informatics 45 (5) (2012) 885–892. doi:10.1016/j.jbi.2012.04.008.
[24] J. Chen, C. Yuan, X. Wang, Z. Bai, MrMep: Joint extraction of multiple relations and multiple entity pairs based on triplet attention, in: Proceedings of the 23rd Conference on Computational Natural Language Learning, 2019, pp. 593–602. doi:10.18653/v1/K19-1055. URL https://www.aclweb.org/anthology/K19-1055
[25] G. Bekoulis, J. Deleu, T. Demeester, C. Develder, Adversarial training for multi-context joint entity and relation extraction, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2830–2836. doi:10.18653/v1/D18-1307. URL https://www.aclweb.org/anthology/D18-1307
[26] F. Li, M. Zhang, G. Fu, D.-H. Ji, A neural joint model for entity and relation extraction from biomedical text, BMC Bioinformatics 18 (1) (2017) 198. doi:10.1186/s12859-017-1609-9. URL https://doi.org/10.1186/s12859-017-1609-9
[27] X. Li, F. Yin, Z. Sun, X. Li, A. Yuan, D. Chai, M. Zhou, J. Li, Entity-relation extraction as multi-turn question answering, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1340–1350. doi:10.18653/v1/P19-1129. URL https://www.aclweb.org/anthology/P19-1129