Data Augmentation for Event Extraction using Text Infilling

First Author
Affiliation / Address line 1
Affiliation / Address line 2
Affiliation / Address line 3
email@domain
Second Author
Affiliation / Address line 1
Affiliation / Address line 2
Affiliation / Address line 3
email@domain

Abstract

We present XDA-EE: eXplanation-based Data Augmentation for Event Extraction.

1 Introduction

Event Extraction (EE) is an important yet challenging task in information extraction, whose goal is to extract triggers with specific types and their arguments from unstructured text. In recent years, deep learning methods have emerged as one of the most prominent approaches for this task (Nguyen2019OneFA; Lin2020AJN; Du2020EventEB; Paolini2021StructuredPA; Lu2021Text2EventCS). However, they are notorious for requiring large amounts of labelled data, which has become a major impediment to the development and application of EE systems.

Annotating data for EE is usually costly and time-consuming, as it requires expert knowledge. One possible solution is data augmentation, a technique for generating synthetic data that has been widely used for various tasks (Shorten2019ASO; Wang2020ASO; Iwana2021AnES; Wei2019EDAED). Current data augmentation methods that can be applied to this task can be broadly classified into two categories: (1) Rule-based methods create augmented examples by manipulating words in the original text, for example through word replacement (Zhang2015CharacterlevelCN; Kobayashi2018ContextualAD; Cai2020DataMT) or word position swapping (Sahin2018DataAV; Wei2019EDAED; Min2020SyntacticDA); (2) Generation-based methods synthesize new examples with generative models, such as back-translation models (Sennrich2016ImprovingNM) or pretrained language models (Ding2020DAGADA; Zhou2021MELMDA; Ye2022ZeroGenEZ). Generally speaking, rule-based methods provide good control over the content of the generated sentences through word-level manipulation, but offer little syntactic and semantic diversity. In contrast, according to evidence from previous work (Dathathri2020PlugAP; Yang2020GenerativeDA; Ye2022ZeroGenEZ), pretrained language models such as GPT-2 (radford2019language) are known for their ability to use words creatively, but offer limited control over the content of the generated text.
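As a rough illustration of the rule-based category, the sketch below applies word replacement and word position swapping while leaving annotated positions untouched. It is not taken from any of the cited systems: the SYNONYMS table, the function names, the replacement probability p, and the example sentence are all assumptions made for illustration.

```python
import random

# Toy synonym table (hypothetical); a real system would use a lexical resource.
SYNONYMS = {
    "attacked": ["assaulted", "struck"],
    "soldiers": ["troops"],
    "village": ["hamlet"],
}


def synonym_replace(tokens, protected, rng, p=0.5):
    """Replace unprotected tokens that have a synonym, each with probability p."""
    out = []
    for i, tok in enumerate(tokens):
        options = SYNONYMS.get(tok.lower())
        if i not in protected and options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def random_swap(tokens, protected, rng):
    """Swap two unprotected positions so that annotated tokens stay in place."""
    free = [i for i in range(len(tokens)) if i not in protected]
    out = list(tokens)
    if len(free) >= 2:
        i, j = rng.sample(free, 2)
        out[i], out[j] = out[j], out[i]
    return out


tokens = "The soldiers attacked the village at dawn".split()
protected = {2}  # index of the trigger "attacked" (assumed annotation)
rng = random.Random(0)
print(" ".join(synonym_replace(tokens, protected, rng)))
print(" ".join(random_swap(tokens, protected, rng)))
```

Both operations keep the sentence length and the protected trigger fixed, which is why the labels carry over directly, and also why the outputs stay close to the original sentence in structure.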

To this end, we introduce an enhanced data augmentation framework called XDA-EE (eXplanation-based Data Augmentation for Event Extraction), which combines the advantages of rule-based and generation-based data augmentation methods. Our framework augments training samples via text infilling (zhu2019text; Donahue2020EnablingLM), as illustrated with an example in Fig. LABEL:. Text infilling aims to predict missing spans of text and thus has the potential to connect fragmented concepts. By means of text infilling, we can control the content of the augmented samples by [xxx]. The text infilling model can be viewed as a keyword-based search engine, in which [xxxx]. Inspired by Wickramanayake2021ExplanationbasedDA, we also propose to use model decision explanations to guide the search for underrepresented samples from the text infilling model.
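To make the notion of text infilling concrete, the sketch below builds a blanked template from a tokenized sentence and then fills the blanks. It only illustrates the general idea and should not be read as the template construction used by XDA-EE: the BLANK token, the function names, the choice of kept spans, and the hand-written fills (which stand in for the output of an infilling model) are all assumptions.

```python
BLANK = "[blank]"  # hypothetical placeholder token for a missing span


def make_template(tokens, keep_spans):
    """Keep tokens inside keep_spans (start inclusive, end exclusive) and
    collapse every other run of tokens into a single BLANK."""
    keep = set()
    for start, end in keep_spans:
        keep.update(range(start, end))
    template = []
    for i, tok in enumerate(tokens):
        if i in keep:
            template.append(tok)
        elif not template or template[-1] != BLANK:
            template.append(BLANK)
    return template


def fill_template(template, fills):
    """Substitute one span (a list of tokens) per BLANK, left to right."""
    remaining = iter(fills)
    out = []
    for tok in template:
        if tok == BLANK:
            out.extend(next(remaining))
        else:
            out.append(tok)
    return out


tokens = "The soldiers attacked the village at dawn".split()
# Assumed annotation: keep "soldiers attacked" and "village".
template = make_template(tokens, keep_spans=[(1, 3), (4, 5)])
print(" ".join(template))

# Hand-written fills standing in for the predictions of an infilling model.
fills = [["Armed"], ["a", "remote"], ["before", "sunrise"]]
print(" ".join(fill_template(template, fills)))
```

The kept spans play the role of keywords in the search-engine analogy: they fix what the sentence must mention, while the filled blanks supply the surrounding context.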

2 Data Augmentation via Text Infilling

We propose XDA-EE to synthesize a new sample based on saliency information.

2.1 Template Construction

2.2 Data Augmentation via Text Infilling

2.3 Sample Selection with Model Decision Explanation

2.4

3 Experiments

3.1 Experimental Setup

Benchmark Datasets

Low-resource Setting

Event Extraction Models

3.2 Results and Discussion

Model	Method
Text2Event	No augmentation
	Synonym-Replacement	33.9	18.7	0.57
	Mention-Replacement
	Back-Translation
	XDA-EE

Table 1: Main Results.
