Bi-Directional Iterative Prompt-Tuning for Event Argument Extraction

Lu Dai1, Bang Wang1, Wei Xiang 1, Yijun Mo2
1School of Electronic Information and Communications,
Huazhong University of Science and Technology, Wuhan, China
2School of Computer Science and Technology,
Huazhong University of Science and Technology, Wuhan, China
{dailu18, wangbang, xiangwei, moyj}@hust.edu.cn
Abstract

Recently, prompt-tuning has attracted growing interest in event argument extraction (EAE). However, existing prompt-tuning methods have not achieved satisfactory performance due to the lack of consideration of entity information. In this paper, we propose a bi-directional iterative prompt-tuning method for EAE, where the EAE task is treated as a cloze-style task to take full advantage of entity information and pre-trained language models (PLMs). Furthermore, our method explores event argument interactions by introducing the argument roles of contextual entities into prompt construction. Since the template and the verbalizer are two crucial components in a cloze-style prompt, we propose to utilize role label semantic knowledge to construct a semantic verbalizer and design three kinds of templates for the EAE task. Experiments on the ACE 2005 English dataset with standard and low-resource settings show that the proposed method significantly outperforms the peer state-of-the-art methods. Our code is available at https://github.com/HustMinsLab/BIP.

1 Introduction

As a key step of event extraction, event argument extraction refers to identifying event arguments with predefined roles. For example, for an "Attack" event triggered by the word "fired" in the sentence "Iraqis have fired sand missiles and AAA at aircraft", EAE aims to identify that "Iraqis", "missiles", "AAA" and "aircraft" are event arguments with the "Attacker", "Instrument", "Instrument" and "Target" roles, respectively.

(a) Fine-Tuning for EAE. (b) Prompt-Tuning for EAE.
Figure 1: Illustration of fine-tuning and prompt-tuning methods for predicting the argument role of the entity mention "Iraqis" in the event triggered by the word "fired".

In order to exploit the rich linguistic knowledge contained in pre-trained language models, fine-tuning methods have been proposed for EAE. The paradigm of these methods is to use a pre-trained language model to obtain semantic representations, and then feed these representations into a well-designed neural network to extract event arguments. For example, in Figure 1(a), an event trigger representation and an entity mention representation are first obtained through a pre-trained language model and then fed into a designed neural network, such as a hierarchical modular network (Wang et al., 2019) or a syntax-attending transformer network (Ma et al., 2020), to determine the argument role that the entity mention plays in the event triggered by the trigger. However, there is a significant gap between the EAE task and the objective form of pre-training, resulting in poor utilization of the prior knowledge in PLMs. Additionally, fine-tuning methods heavily depend on extensive annotated data and perform poorly in low-resource data scenarios.

To bridge the gap between the EAE task and the pre-training task, prompt-tuning methods (Li et al., 2021; Ma et al., 2022; Hsu et al., 2022; Liu et al., 2022) have recently been proposed to formalize the EAE task into a form more consistent with the training objective of generative pre-trained language models. These methods achieve significantly better performance than fine-tuning methods in low-resource data scenarios, but still fall short of the state-of-the-art fine-tuning method ONEIE (Lin et al., 2020) in high-resource data scenarios.

To achieve excellent performance in both low-resource and high-resource data scenarios, we leverage entity information to model EAE as a cloze-style task and use a masked language model to handle the task. Figure 1(b) shows a typical cloze-style prompt-tuning method for EAE. This typical prompt-tuning method suffers from two challenges: (i) The typical human-written verbalizer (Schick and Schütze, 2021) is not a good choice for EAE. A human-written verbalizer manually assigns a label word to each argument role. For example, in Figure 1(b), the word "attacker" is chosen as the label word of the "Attacker" role. However, an argument role may have different definitions in different types of events. For example, the "Entity" role refers to "the voting agent" in the "Elect" event and "the agents who are meeting" in the "Meet" event. (ii) Event argument interactions are not explored. Existing work (Sha et al., 2018; Xiangyu et al., 2021; Ma et al., 2022) has demonstrated the usefulness of event argument interactions for EAE. For the "Attack" event triggered by the word "fired" in Figure 1, given that "missiles" is an "Instrument", it is more likely that "AAA" is correctly classified into the "Instrument" role.

In this paper, we propose a bi-directional iterative prompt-tuning (BIP) method to alleviate the aforementioned challenges. To capture argument interactions, a forward iterative prompt and a backward iterative prompt are constructed to utilize the argument roles of contextual entities to predict the current entity's role. For the verbalizer, we redefine the argument role types and assign a virtual label word to each argument role, where the initial representation of each virtual label word is generated based on the semantics of the argument role. In addition, we design three kinds of templates: a hard template, a soft template, and a hard-soft template, which are further discussed in the experimental section. Extensive experiments on the ACE 2005 English dataset show that the proposed method achieves state-of-the-art performance in both low-resource and high-resource data scenarios.

2 Related Work

In this section, we review the deep learning methods for event argument extraction and prompt-tuning methods for natural language processing.

2.1 Event Argument Extraction

Early deep learning methods use various neural networks to capture the dependencies between event triggers and event arguments to extract event arguments, such as convolutional neural network (CNN)-based models (Chen et al., 2015), recurrent neural network (RNN)-based models (Nguyen et al., 2016; Sha et al., 2018), and graph neural network (GNN)-based models (Liu et al., 2018; Dai et al., 2021). As pre-trained language models have been proven to be powerful in language understanding and generation (Devlin et al., 2019; Liu et al., 2019; Lewis et al., 2020), some PLM-based methods have been proposed to extract event arguments. These methods can be divided into two categories: fine-tuning and prompt-tuning ones.

Fine-tuning methods aim to design a variety of neural network models to transfer pre-trained language models to the EAE task. According to the modeling manner of the EAE task, existing fine-tuning work can be further divided into three groups: classification-based methods (Wang et al., 2019; Wadden et al., 2019; Lin et al., 2020; Ma et al., 2020; Xiangyu et al., 2021), machine reading comprehension-based methods (Du and Cardie, 2020; Li et al., 2020; Liu et al., 2020), and generation-based methods (Paolini et al., 2020; Lu et al., 2021). Prompt-tuning methods aim to design a template that provides useful prompt information for pre-trained language models to extract event arguments (Li et al., 2021; Ma et al., 2022; Hsu et al., 2022; Liu et al., 2022). For example, Li et al. (2021) create a template for each event type based on the event ontology definition and model the EAE task as conditional text generation. This method acquires event arguments by comparing the designed template with the generated natural language text. Hsu et al. (2022) improve the method of Li et al. (2021) by replacing the non-semantic placeholder tokens in the designed template with words carrying role label semantics.

2.2 Prompt-tuning

The core of prompt-tuning is to transform a given downstream task into a form that is consistent with a training task of the pre-trained language models (Liu et al., 2021). As prompt-tuning makes better use of the prior knowledge contained in pre-trained language models, this new paradigm has become popular in NLP tasks and has achieved promising performance (Seoh et al., 2021; Han et al., 2021; Cui et al., 2021; Hou et al., 2022; Hu et al., 2022; Chen et al., 2022). For example, Cui et al. (2021) use candidate entity spans and entity type label words to construct templates, and recognize entities based on a pre-trained generative language model's score for each template. Hu et al. (2022) convert the text classification task into a masked language modeling problem by predicting the word filled in the "[MASK]" token, and propose a knowledgeable verbalizer to map the predicted word into a label. Chen et al. (2022) consider the relation extraction problem as a cloze task and use relation label semantic knowledge to initialize the virtual label word embedding for each relation label.

3 Model

In this section, we first introduce the problem description of event argument extraction and the overall framework of our bi-directional iterative prompt-tuning method, and then explain the details of the designed semantical verbalizer, the three different templates, and model training.

Figure 2: The overall architecture of our bi-directional iterative prompt-tuning method, shown with an example of predicting the argument roles of "Iraqis", "missiles", "AAA", and "aircraft" in the "Attack" event triggered by "fired", where blue font represents the given trigger and green font represents the given entity.

3.1 Problem Description

As the most commonly used ACE dataset provides entity mention, entity type, and entity coreference information, we use this entity information to formalize event argument extraction as an argument role prediction problem over entities. The detailed problem description is as follows: given a sentence $S$, an event trigger $t$ with its event type, and $n$ entities $\{e_1, e_2, \ldots, e_n\}$, the goal is to predict the argument role that each entity plays in the event triggered by $t$ and output a set of argument roles $\{r_1, r_2, \ldots, r_n\}$.

In this paper, the argument role prediction problem is cast as a cloze-style task through a template $T(\cdot)$ and a verbalizer. For the trigger $t$ and entity $e_i$, a template $T(t, e_i, [\texttt{MASK}])$ is constructed to query the argument role that the entity $e_i$ plays in the event triggered by $t$. For example, in Figure 1(b), the template $T(\textit{fired}, \textit{Iraqis}, [\texttt{MASK}])$ can be set as "For the attack event triggered by the fired, the person, Iraqis, is [MASK]", where "attack" represents the event type of the trigger "fired" and "person" represents the entity type of the entity "Iraqis". Then the input of the $i$-th entity $e_i$ is:

$x_i = S\ [\texttt{SEP}]\ T(t, e_i, [\texttt{MASK}])$.  (1)

The verbalizer is a mapping from the label word space to the argument role space. Let $l_j$ denote the label word that is mapped into the role $r_j$; the confidence score that the $i$-th entity is classified as the $j$-th role type is:

$s_{ij} = C_i([\texttt{MASK}] = l_j)$,  (2)

where $C_i$ is the output of a pre-trained masked language model at the masked position in $x_i$, i.e., the confidence score of each word in the dictionary being filled in the [MASK] token.
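To make this cloze formulation concrete, the following is a minimal sketch of how Eqs. (1) and (2) could be realized with a BERT-style masked language model from the HuggingFace Transformers library; the helper name `score_roles`, the chosen checkpoint, and the single-token label words are our own illustrative assumptions rather than part of the proposed method.

```python
# Minimal sketch of Eqs. (1)-(2): build the cloze input and read the
# confidence of each candidate label word at the [MASK] position.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def score_roles(sentence, template, label_words):
    """Return a confidence score for each candidate label word (Eq. 2)."""
    # Eq. (1): x_i = S [SEP] T(t, e_i, [MASK])
    inputs = tokenizer(sentence, template, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**inputs).logits              # (1, seq_len, vocab_size)
    mask_logits = logits[0, mask_pos]              # C_i: scores over the vocabulary
    label_ids = tokenizer.convert_tokens_to_ids(label_words)
    return mask_logits[label_ids]                  # s_{ij} for each role j

sentence = "Iraqis have fired sand missiles and AAA at aircraft"
template = ("For the attack event triggered by the fired, "
            "the person, Iraqis, is " + tokenizer.mask_token)
print(score_roles(sentence, template, ["attacker", "target", "instrument"]))
```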

3.2 Overall Framework

Figure 2 presents the overall architecture of our bi-directional iterative prompt-tuning method, which consists of a forward iterative prompt (FIP) and a backward iterative prompt (BIP). The forward iterative prompt predicts the argument role of each entity iteratively from left to right until the argument roles of all entities are obtained. For example, in Figure 2, the order of entities is "Iraqis $\rightarrow$ missiles $\rightarrow$ AAA $\rightarrow$ aircraft".

In order to utilize the predicted argument role information to classify the current entity into the correct role, we introduce the argument roles of the first $i-1$ entities into the template of the $i$-th entity. The template of the $i$-th entity in the forward iterative prompt can be represented as:

$FIP(e_i) = T(t, e_1, \overrightarrow{l_1}, \ldots, e_{i-1}, \overrightarrow{l_{i-1}}, e_i, [\texttt{MASK}])$,  (3)

where $\overrightarrow{l_j}$ is the role label word of the $j$-th entity predicted by the forward iterative prompt. For example, in Figure 2, $\overrightarrow{l_1}$ is the word "attacker". Then the confidence score distribution of the $i$-th entity over all argument roles in the forward iterative prompt can be computed by

$\overrightarrow{\mathbf{s}_i} = MLM(S\ [\texttt{SEP}]\ FIP(e_i))$.  (4)

$\overrightarrow{l_i}$ is the label word corresponding to the argument role with the highest value in $\overrightarrow{\mathbf{s}_i}$.

Similarly, the backward iterative prompt predicts the argument role of each entity in a right-to-left manner. The argument role confidence score distribution of the $i$-th entity in the backward iterative prompt can be computed by:

$BIP(e_i) = T(t, e_n, \overleftarrow{l_n}, \ldots, e_{i+1}, \overleftarrow{l_{i+1}}, e_i, [\texttt{MASK}])$,  (5)
$\overleftarrow{\mathbf{s}_i} = MLM(S\ [\texttt{SEP}]\ BIP(e_i))$.  (6)

Then we can obtain the final argument role confidence score distribution of the $i$-th entity by

$\mathbf{s}_i = \overrightarrow{\mathbf{s}_i} + \overleftarrow{\mathbf{s}_i}$.  (7)

Finally, the argument role label with the highest score is chosen as the role prediction result.
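As a summary of the overall framework, here is a schematic sketch of the bi-directional iterative prediction described by Eqs. (3)-(7); `build_template` and `mlm_scores` are hypothetical helpers standing in for the template construction of Section 3.4 and the masked-LM scoring of Section 3.1, so this only illustrates the control flow, not the authors' implementation.

```python
# Sketch of the bi-directional iterative prediction (Eqs. 3-7).
def iterative_pass(sentence, trigger, entities, label_words, mlm_scores, build_template):
    """One directional pass: predict roles entity by entity, feeding
    previously predicted role label words into later templates."""
    scores, predicted = [], []          # predicted holds (entity, label_word) pairs
    for entity in entities:
        template = build_template(trigger, predicted, entity)   # Eq. (3) / (5)
        s = mlm_scores(sentence, template, label_words)         # Eq. (4) / (6)
        scores.append(s)
        predicted.append((entity, label_words[int(s.argmax())]))
    return scores

def predict_roles(sentence, trigger, entities, label_words, mlm_scores, build_template):
    forward = iterative_pass(sentence, trigger, entities, label_words,
                             mlm_scores, build_template)            # left-to-right
    backward = iterative_pass(sentence, trigger, entities[::-1], label_words,
                              mlm_scores, build_template)[::-1]     # right-to-left
    # Eq. (7): sum the two score distributions and take the argmax per entity.
    return [label_words[int((f + b).argmax())] for f, b in zip(forward, backward)]
```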

3.3 Semantical Verbalizer

To tackle the problem that an argument role may have different definitions in different types of events, we reconstruct the set of argument role types and design a semantical verbalizer. Specifically, we further divide an argument role that participates in multiple types of events into multiple argument roles that are specific to event types. For example, the "Entity" role is divided into "Elect:Entity", "Meet:Entity", etc. Since the "Place" role has the same meaning in all types of events, we do not divide it.

For each new argument role, the semantical verbalizer constructs a virtual word to represent the role and initializes the representation of the virtual word with the semantics of the argument role. Let an $m$-word sequence $\{q_{i1}, q_{i2}, \ldots, q_{im}\}$ denote the semantic description of the argument role $r_i$; the initial representation of the label word $l_i$ that is mapped into the role $r_i$ can be computed by:

$\mathbf{E}(l_i) = \frac{1}{m}\sum_{j=1}^{m}\mathbf{E}(q_{ij})$,  (8)

where $\mathbf{E}$ is the word embedding table of a pre-trained masked language model.

Among the redefined argument roles, different roles may have the same semantics, such as "Appeal:Adjudicator" and "Sentence:Adjudicator". Therefore, it is easy to misclassify an entity with the "Appeal:Adjudicator" role into the "Sentence:Adjudicator" role. To solve this problem, we use event structure information when extracting arguments: for an event of the "Appeal" type, the role label can only be "Appeal:Defendant", "Appeal:Adjudicator", or "Appeal:Plaintiff".
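Below is a minimal sketch of how the semantical verbalizer could be initialized (Eq. 8) and how the event structure constraint could be applied, reusing the `mlm` and `tokenizer` objects from the earlier sketch; the role descriptions are copied from Table 5, while function names such as `init_virtual_label_words` and `allowed_roles` are ours for illustration.

```python
# Sketch of the semantical verbalizer (Eq. 8): each redefined role gets a
# virtual label word whose embedding is the average of the embeddings of
# the (sub)words in its semantic description.
def init_virtual_label_words(mlm, tokenizer, role_descriptions):
    """role_descriptions maps a redefined role to its semantic description."""
    embedding_table = mlm.get_input_embeddings().weight        # E in Eq. (8)
    virtual_embeddings = {}
    for role, description in role_descriptions.items():
        token_ids = tokenizer(description, add_special_tokens=False)["input_ids"]
        virtual_embeddings[role] = embedding_table[token_ids].mean(dim=0)
    return virtual_embeddings

role_descriptions = {
    "Event:None": "the entity that is irrelevant to the event",
    "Event:Place": "the place where the event takes place",
    "Fine:Entity": "the entity that was fined",
    "Fine:Adjudicator": "the entity doing the fining",
}

# Event structure information: for an event of a given type, only the roles
# defined for that type (plus the generic "Event:" roles) are considered.
def allowed_roles(event_type, all_roles):
    return [r for r in all_roles if r.split(":")[0] in (event_type, "Event")]

print(allowed_roles("Fine", list(role_descriptions)))
# ['Event:None', 'Event:Place', 'Fine:Entity', 'Fine:Adjudicator']
```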

3.4 Templates

Figure 3: Examples of the three different templates with trigger "fired" and entity "missiles", where green font represents the given trigger, green underlined font represents the text span of the event type, blue font represents the given entity, and blue underlined font represents the text span of the entity type.

To take full advantage of event type, trigger, and entity information, the designed template should contain event types, triggers, entity types, and entity mentions. Since some entity types and event types are not human-understandable words, such as "PER" and "Phone-Write", we need to convert each entity (event) type into a human-understandable text span. For example, we use "person" and "written or telephone communication" as the text spans for "PER" and "Phone-Write", respectively.

Let $M_i = \{\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{id}\}$ denote the entity mention set of the $i$-th entity; the word sequence of the $i$-th entity can be represented as:

$\hat{e}_i = \varepsilon_{i1}\ or\ \varepsilon_{i2}\ or\ \ldots\ or\ \varepsilon_{id}$.  (9)

We use $w^t$ to denote the text span of the event type of the given trigger and $w_i^e$ to denote the text span of the entity type of the $i$-th entity. For the given trigger $t$ and the $i$-th entity $e_i$, three different templates of the forward iterative prompt are designed as follows:

  • Hard Template: All known information is connected manually with natural language: "For the $w^t$ event triggered by the $t$, the $w_1^e$, $\hat{e}_1$, is $\overrightarrow{l_1}$, …, the $w_{i-1}^e$, $\hat{e}_{i-1}$, is $\overrightarrow{l_{i-1}}$, the $w_i^e$, $\hat{e}_i$, is [MASK]"

  • Soft Template: A sequence of learnable pseudo tokens is appended after all known information: "$w^t$ $t$ $w_1^e$ $\hat{e}_1$ $\overrightarrow{l_1}$ … $w_{i-1}^e$ $\hat{e}_{i-1}$ $\overrightarrow{l_{i-1}}$ $w_i^e$ $\hat{e}_i$ [V1] [V2] [V3] [MASK] [V4] [V5] [V6]"

  • Hard-Soft Template: All known information is connected with learnable pseudo tokens: "[V1] $w^t$ [V2] $t$ [V3] [V4] $w_1^e$ [V5] $\hat{e}_1$ [V6] $\overrightarrow{l_1}$, …, [V4] $w_{i-1}^e$ [V5] $\hat{e}_{i-1}$ [V6] $\overrightarrow{l_{i-1}}$ [V4] $w_i^e$ [V5] $\hat{e}_i$ [V6] [MASK]"

Pseudo tokens are represented by "[Vi]". The embedding of each pseudo token is randomly initialized and optimized during training.
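For illustration, a small sketch of how the hard template for the forward iterative prompt could be assembled as a string, together with Eq. (9) for joining co-referent mentions; the entity dictionaries and the helper names are hypothetical and only mirror the template format shown in Figure 3.

```python
# Sketch of hard-template construction for the forward iterative prompt.
def entity_words(mentions):
    """Eq. (9): join co-referent mentions of one entity with 'or'."""
    return " or ".join(mentions)

def hard_template(event_span, trigger, previous, entity):
    """previous: list of (entity, predicted_label_word) already processed."""
    parts = [f"For the {event_span} event triggered by the {trigger}"]
    for prev_entity, label_word in previous:
        parts.append(f"the {prev_entity['type_span']}, "
                     f"{entity_words(prev_entity['mentions'])}, is {label_word}")
    parts.append(f"the {entity['type_span']}, "
                 f"{entity_words(entity['mentions'])}, is [MASK]")
    return ", ".join(parts)

iraqis = {"type_span": "person", "mentions": ["Iraqis"]}
missiles = {"type_span": "weapon", "mentions": ["missiles"]}
print(hard_template("attack", "fired", [(iraqis, "attacker")], missiles))
# For the attack event triggered by the fired, the person, Iraqis, is attacker,
# the weapon, missiles, is [MASK]
```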

3.5 Training

During training, gold argument roles are used to generate the template of each entity in both the forward and the backward iterative prompt. The optimization objective is to ensure that the masked language model can accurately predict argument roles in both prompts. We use $\overrightarrow{p_{t,i}}$ and $\overleftarrow{p_{t,i}}$ to represent the probabilities of the entity $e_i$ playing each role type in the event triggered by $t$ in the forward and backward iterative prompt, respectively. The loss function is defined as follows:

$\overrightarrow{p_{t,i}} = \mathrm{softmax}(\overrightarrow{\mathbf{s}_i})$, $\quad \overleftarrow{p_{t,i}} = \mathrm{softmax}(\overleftarrow{\mathbf{s}_i})$,
$\mathbb{L} = -\sum_{t\in\mathbb{T}}\sum_{i=1}^{n_t}\big(\log(\overrightarrow{p_{t,i}}(\tilde{r}_{t,i})) + \log(\overleftarrow{p_{t,i}}(\tilde{r}_{t,i}))\big)$,  (10)

where $\mathbb{T}$ is the event trigger set in the training set, $n_t$ is the number of entities contained in the same sentence as the event trigger $t$, and $\tilde{r}_{t,i}$ is the correct argument role that the $i$-th entity plays in the event triggered by $t$.
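A brief sketch of how the loss in Eq. (10) could be computed for the entities of a single event, assuming the forward and backward score distributions have been stacked into tensors; the function name `bip_loss` is ours, and the outer sum over triggers is left to the training loop.

```python
# Sketch of the training objective (Eq. 10): one cross-entropy term per
# direction and per entity, using the gold role indices.
import torch
import torch.nn.functional as F

def bip_loss(forward_scores, backward_scores, gold_roles):
    """forward_scores, backward_scores: (n_entities, n_roles) tensors;
    gold_roles: (n_entities,) tensor of gold role indices."""
    loss_fwd = F.cross_entropy(forward_scores, gold_roles, reduction="sum")
    loss_bwd = F.cross_entropy(backward_scores, gold_roles, reduction="sum")
    return loss_fwd + loss_bwd      # summed over entities of one trigger

# toy check with random scores for 4 entities and 5 candidate roles
fwd = torch.randn(4, 5)
bwd = torch.randn(4, 5)
gold = torch.tensor([0, 2, 2, 1])
print(bip_loss(fwd, bwd, gold))
```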

4 Experiments

4.1 Experimental Setup

We evaluate our proposed method on the most widely used event extraction dataset, the ACE 2005 English dataset (https://catalog.ldc.upenn.edu/LDC2006T06) (Doddington et al., 2004). Following previous work (Wadden et al., 2019; Lin et al., 2020; Ma et al., 2022), the dataset is pre-processed and divided into training/development/test sets, where 33 event subtypes, 7 entity types, and 22 argument roles are considered in the processed dataset. As we focus only on the event argument extraction task, we use gold entities and event triggers in our experiments.

We use the Bert-base (around 110 million parameters) (Devlin et al., 2019) and Roberta-base (around 125 million parameters) (Liu et al., 2019) models to predict the masked words and train each model with AdamW, where the batch size is set to 4 and the learning rate is set to 1e-5. For the low-resource setting, we generate subsets containing 1%, 5%, 10%, 20%, 50%, and 75% of the full training set in the same way as Hsu et al. (2022). In each experiment, the masked language model is trained on a subset and evaluated on the full development and test sets. All experiments are run on an NVIDIA Quadro P4000 GPU.
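For reference, a minimal sketch of the optimization setup stated above (AdamW, batch size 4, learning rate 1e-5), assuming `mlm` is the masked language model from Section 3; this merely mirrors the reported hyperparameters and is not the authors' training script.

```python
# Optimizer setup matching the hyperparameters reported in the text.
from torch.optim import AdamW

BATCH_SIZE = 4                                 # batch size stated above
optimizer = AdamW(mlm.parameters(), lr=1e-5)   # learning rate stated above
```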

PLM | Model | Eval | Argument Identification (P / R / F1) | Role Classification (P / R / F1)
Bert | HMEAE (EMNLP, 2019) | SM | 65.22 / 68.08 / 66.62 | 60.06 / 62.68 / 61.34
Bert | HMEAE (EMNLP, 2019) | FM | 73.67 / 72.70 / 73.18 | 66.86 / 65.99 / 66.42
Bert | ONEIE (ACL, 2020) | SM | 73.65 / 71.72 / 72.67 | 69.31 / 67.49 / 68.39
Bert | ONEIE (ACL, 2020) | FM | 79.48 / 75.77 / 77.58 | 74.89 / 71.39 / 73.09
Bert | BERD (ACL, 2021) | SM | 68.83 / 66.62 / 67.70 | 63.25 / 61.22 / 62.22
Bert | BERD (ACL, 2021) | FM | 76.01 / 71.04 / 73.55 | 69.63 / 65.26 / 67.37
Roberta | HMEAE (EMNLP, 2019) | SM | 70.37 / 69.24 / 69.80 | 64.00 / 62.97 / 63.48
Roberta | HMEAE (EMNLP, 2019) | FM | 76.58 / 72.55 / 74.51 | 69.49 / 65.84 / 67.62
Roberta | ONEIE (ACL, 2020) | SM | 72.86 / 73.18 / 73.02 | 69.81 / 70.12 / 69.96
Roberta | ONEIE (ACL, 2020) | FM | 78.55 / 79.12 / 78.84 | 75.22 / 75.77 / 75.50
Roberta | BERD (ACL, 2021) | SM | 69.03 / 69.53 / 69.28 | 63.24 / 63.70 / 63.47
Roberta | BERD (ACL, 2021) | FM | 75.72 / 73.28 / 74.48 | 69.08 / 66.86 / 67.95
Bart | DEGREE(EAE) (NAACL, 2022) | SM | 70.39 / 68.95 / 69.66 | 65.77 / 64.43 / 65.10
Bart | DEGREE(EAE) (NAACL, 2022) | FM | 79.20 / 75.60 / 77.37 | 74.16 / 70.80 / 72.44
Bart | PAIE (ACL, 2022) | SM | 72.16 / 71.12 / 71.64 | 68.65 / 66.71 / 67.67
Bart | PAIE (ACL, 2022) | FM | 76.75 / 79.55 / 78.13 | 72.82 / 74.22 / 73.51
Bert | BIP(our) | – | 75.54 / 81.29 / 78.31 | 71.60 / 77.05 / 74.23
Roberta | BIP(our) | – | 78.17 (-1.31) / 86.40 (+6.85) / 82.08 (+3.24) | 75.26 (+0.04) / 83.19 (+7.42) / 79.03 (+3.53)
Table 1: Experiment results of our proposed method with the hard template and the baselines, where boldface marks the best results and underline the second best; the results of the baselines are our re-implementations. Due to the limited memory of our GPU, only base-version models are adopted in the experiments.

4.2 Baselines

Two categories of state-of-the-art methods are compared with our proposed method.

Fine-tuning Methods:

  • HMEAE (Wang et al., 2019) is a hierarchical modular model that uses the superordinate concepts of argument roles to extract event arguments.

  • ONEIE (Lin et al., 2020) is a neural framework that leverages global features to jointly extract entities, relations, and events. When applying ONEIE to the EAE task, we also use gold entity mentions and event triggers to extract event arguments, without considering the relations.

  • BERD (Xiangyu et al., 2021) is a bi-directional entity-level recurrent decoder that utilizes the argument roles of contextual entities to predict argument roles entity by entity.

Prompt-tuning Methods:

  • DEGREE(EAE) (Hsu et al., 2022) summarizes an event into a sentence based on a designed prompt containing the event type, trigger, and event-type-specific template. Then event arguments can be extracted by comparing the generated sentence with the event-type-specific template.

  • PAIE (Ma et al., 2022) is an encoder-decoder architecture, where the given context and designed event-type-specific prompt are input into the encoder and decoder separately to extract event argument spans.

4.3 Evaluation

Since we use an entity as the unit for argument role prediction, an event argument is correctly identified if the entity corresponding to the argument is predicted to have a non-None role type. The argument is further correctly classified if the predicted role type is the same as the gold label.

The above baselines consider an event argument to be correctly classified only if its offsets and role type match the golden argument, which we call "strict match (SM)". In order to compare our model with the baselines more fairly, we also use a "flexible match (FM)" method to evaluate these baselines: an argument is correctly classified if its offsets match any of the entity mentions co-referenced with the golden argument and its role type matches the golden argument.
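To make the two criteria concrete, here is a small sketch of strict match versus flexible match; the dictionary fields (`offsets`, `role`, `coref_mention_offsets`) are hypothetical names for the predicted and gold argument records, not an existing evaluation API.

```python
# Strict match (SM): the predicted argument must reproduce the golden
# mention's offsets. Flexible match (FM): it may match the offsets of any
# mention co-referent with the golden argument.
def strict_match(pred, gold):
    return pred["offsets"] == gold["offsets"] and pred["role"] == gold["role"]

def flexible_match(pred, gold):
    coref_offsets = gold["coref_mention_offsets"]   # offsets of all co-referent mentions
    return pred["offsets"] in coref_offsets and pred["role"] == gold["role"]
```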

Following previous work, the standard micro-averaged Precision (P), Recall (R), and F1-score (F1) are used to evaluate all methods.

4.4 Overall Results

Table 1 compares the overall results between our model and baselines, from which we have several observations and discussions.

(1) BIP(Roberta) gains significant improvements in event argument extraction. The F1-scores of BIP(Roberta) are more than 9% higher than those of all baselines under the strict match evaluation. Even when the flexible match method is used to evaluate the baselines, BIP(Roberta) still outperforms the state-of-the-art ONEIE(Roberta) by 3.24% F1-score in terms of argument identification and by 3.53% F1-score in terms of role classification.

(2) Compared with the strict match, the flexible match achieves 5% to 7% F1-score improvements in terms of argument identification and role classification. These results indicate that the trained argument extraction models can indeed identify an entity mention co-referenced with the golden argument as the event argument. In addition, in actual application scenarios, we only pay attention to which entity is the event argument, not to which mention of an entity is the event argument. Therefore, it is more reasonable and efficient to predict argument roles in units of entities rather than entity mentions.

(3) Roberta-version methods outperform Bert-version methods. In particular, for our proposed BIP method, Roberta further gains 3.77% and 4.8% F1-score improvements on the argument identification and role classification tasks, respectively. These improvements can be explained by Roberta using a much larger training dataset than Bert and removing the next sentence prediction task. In the following experiments, we only consider Roberta-version methods.

Model | Role Classification (P / R / F1)
BIP(our) | 75.26 / 83.19 / 79.03
BIP(forward) | 76.06 / 78.95 / 77.47
BIP(backward) | 75.94 / 76.61 / 76.27
BIP-BI | 78.79 / 76.02 / 77.38
BIP-SV | 74.79 / 78.07 / 76.39
BIP-BI-SV | 78.19 / 74.42 / 76.25
Table 2: An ablation study of our proposed method.

4.5 Ablation Study

Table 2 presents an ablation study of our proposed BIP method. BIP(forward) only uses the forward iterative prompt to extract event arguments, and BIP(backward) only uses the backward iterative prompt. BIP-BI does not use the bi-directional iterative strategy to consider argument interactions, i.e., it predicts the argument role of each entity separately. BIP-SV replaces our designed semantical verbalizer with a human-written verbalizer, where each label word is manually selected from the pre-trained language model vocabulary. BIP-BI-SV uses neither the bi-directional iterative strategy nor the semantical verbalizer. Some observations on the ablation study are as follows:

(1) Compared with BIP, the performance of BIP(forward) and BIP(backward) decreases by 1.56% and 2.76% F1-score in terms of role classification, respectively. These results clearly demonstrate that the bi-directional iterative prompt-tuning further improves the performance compared with using only one direction.

(2) Compared with the methods BIP-BI and BIP-BI-SV, the methods BIP and BIP-SV further improve the performance of role classification by 1.65% and 0.14% F1-score, respectively. These results suggest that the bi-directional iterative strategy is useful for event argument extraction. In addition, we notice that the improvement brought by the bi-directional iterative strategy over BIP-BI is higher than that over BIP-BI-SV. This suggests that the more accurate the independently predicted argument role of each entity, the greater the improvement the bi-directional iterative strategy brings to the performance of argument extraction.

(3) The methods BIP and BIP-BI respectively outperform the methods BIP-SV and BIP-BI-SV by 2.64% and 1.13% F1-score in terms of role classification. These results illustrate that our semantical verbalizer is more effective than a human-written verbalizer for event argument extraction.

Sentence 1: Swapping smiles, handshakes and hugs at a joint press appearance after talks linked to Saint Petersburg's 300th anniversary celebrations, Bush and Putin set out to recreate the buddy atmosphere of their previous encounters.
Event Trigger: talks, Event Type: Meet
Extraction Results:
Entity | BIP | BIP(forward) | BIP(backward) | BIP-BI | BIP-SV
Bush | Entity (✓) | Entity (✓) | None (✗) | Entity (✓) | Entity (✓)
Putin | Entity (✓) | Entity (✓) | None (✗) | None (✗) | Entity (✓)
Sentence 2: Earlier Saturday, Baghdad was again targeted, one day after a massive U.S. aerial bombardment in which more than 300 Tomahawk cruise missiles rained down on the capital.
Event Trigger: targeted, Event Type: Attack
Extraction Results:
Entity | BIP | BIP(forward) | BIP(backward) | BIP-BI | BIP-SV
[Baghdad, capital] | Place (✓) | Place (✓) | Place (✓) | Place (✓) | Place (✓)
[Tomahawk, missiles] | None (✓) | Instrument (✗) | None (✓) | None (✓) | None (✓)
Sentence 3: Last month, the SEC slapped fines totaling 1.4 billion dollars on 10 Wall Street brokerages to settle charges of conflicts of interest between analysts and investors.
Event Trigger: fines, Event Type: Fine
Extraction Results:
Entity | BIP | BIP(forward) | BIP(backward) | BIP-BI | BIP-SV
SEC | Adjudicator (✓) | Adjudicator (✓) | Adjudicator (✓) | Adjudicator (✓) | Entity (✗)
brokerages | Entity (✓) | Entity (✓) | Entity (✓) | Entity (✓) | Entity (✓)
Table 3: Event argument extraction results by different methods.

4.6 Low-Resource Event Argument Extraction

Figure 4: Performance against the ratio of training data, where ONEIE and PAIE are evaluated by flexible match.

Figure 4 presents the performance of our BIP and BIP-BI methods and two state-of-the-art methods in both low-resource and high-resource data scenarios. We can observe that the F1-score tends to rise as the amount of training data increases. Compared with the fine-tuning method ONEIE, the prompt-tuning methods BIP, BIP-BI, and PAIE clearly improve the performance of role classification in low-resource data scenarios. This result shows that prompt-tuning methods can utilize the rich knowledge in PLMs more effectively than fine-tuning methods.

Even when flexible match is used to evaluate the prompt-tuning method PAIE, our methods BIP and BIP-BI achieve better performance in both low-resource and high-resource data scenarios. The main reason is that our method can make use of the entity information and predicted argument roles when constructing the template. We notice that the performance of BIP is worse than that of BIP-BI when the ratio of training data is less than 20%. This is because when the amount of training data is very small, the probability of argument roles being correctly predicted is low. If the bi-directional iterative strategy is adopted, the wrongly predicted argument roles will be used for template construction, which further degrades the performance of EAE.

4.7 Case Study

Model | Template | Argument Identification (P / R / F1) | Role Classification (P / R / F1)
BIP(our) | Hard Template | 78.17 / 86.40 / 82.08 | 75.26 / 83.19 / 79.03
BIP(our) | Soft Template | 80.63 / 82.75 / 81.67 | 77.49 / 79.53 / 78.50
BIP(our) | Hard-Soft Template | 77.15 / 82.46 / 79.72 | 74.15 / 79.24 / 76.61
BIP-BI | Hard Template | 81.82 / 78.95 / 80.36 | 78.79 / 76.02 / 77.38
BIP-BI | Soft Template | 76.25 / 84.94 / 80.36 | 73.62 / 82.02 / 77.59
BIP-BI | Hard-Soft Template | 81.84 / 80.70 / 81.12 | 78.29 / 77.49 / 77.88
Table 4: Performance of different templates.

In order to showcase the effectiveness of our method BIP, we sample three sentences from the ACE 2005 English test dataset to compare the event argument extraction results by BIP, BIP(forward), BIP(backward), BIP-BI and BIP-SV methods.

In Sentence 1 of Table 3, the method without the bi-directional iterative strategy, BIP-BI, only identifies the entity "Bush" as the "Entity" role. For the entity "Putin", the methods with the forward iterative prompt, i.e., BIP, BIP(forward), and BIP-SV, can correctly classify it into the "Entity" role. This is because the information that the entity "Bush" plays the "Entity" role is introduced into the template construction for the entity "Putin". We also notice that "Bush" and "Putin" are both misclassified by the BIP(backward) method, where the erroneous role information of "Putin" is passed to the classification of "Bush". In addition, in Sentence 2, the method with only the forward iterative prompt, BIP(forward), misclassifies the entity "[Tomahawk, missiles]" into the "Instrument" role. These results show that the argument roles of contextual entities can provide useful information for the role identification of the current entity. However, only considering argument interactions in one direction may degrade the performance of event argument extraction.

In Sentence 3, the method BIP-SV misclassifies the entity "SEC" into the "Entity" role. In the human-written verbalizer of BIP-SV, the word "judge" is selected as the label word of the "Adjudicator" role, and it is difficult to associate the entity "SEC" with the word "judge". In the semantical verbalizer, we use the text sequence "the entity doing the fining" to describe the semantics of the "Adjudicator" role in the "Fine" event. Since pre-trained language models can easily identify the entity "SEC" as "the entity doing the fining", the methods with the semantical verbalizer can correctly identify the entity "SEC" as the "Adjudicator" role. This result verifies the effectiveness of our designed semantical verbalizer.

4.8 Prompt Variants

In this section, we compare the three different templates introduced in Section 3.4 to investigate how different types of templates affect the performance of EAE, as reported in Table 4. For the BIP-BI method, the performance of the hard, soft, and hard-soft templates is comparable; since the hard-soft template combines manual knowledge and learnable virtual tokens, it achieves the best performance. However, the hard-soft template performs worst for the BIP method. Unlike the BIP-BI method, which only considers event trigger and current entity information, BIP introduces the predicted argument role information into the template. Therefore, the hard-soft template contains many learnable pseudo tokens, resulting in poor performance.

5 Conclusion and Future Work

In this paper, we regard event argument extraction as a cloze-style task and propose a bi-directional iterative prompt-tuning method to address it. The method contains a forward iterative prompt and a backward iterative prompt, which predict the argument role of each entity in a left-to-right and a right-to-left manner, respectively. For the template construction in each prompt, the predicted argument role information is introduced to capture argument interactions. In addition, a novel semantical verbalizer is designed based on the semantics of the argument roles, and three kinds of templates are designed and discussed. Experiment results have shown the effectiveness of our method in both high-resource and low-resource data scenarios. In future work, we are interested in a joint prompt-tuning method for event detection and event argument extraction.

Limitations

  • As entity information is necessary to model event argument extraction as a cloze-style task, our method is not suitable for situations where entities are not provided.

  • Compared with methods that predict all argument roles simultaneously, our method is slower because it predicts the argument role of each entity one by one.

References

Appendix A Verbalizer

A.1 Semantical Verbalizer

For our designed semantical verbalizer, an argument role that participates in multiple types of events is divided into multiple argument roles that are specific to event types. For each new argument role, we use a virtual word to represent the role and initialize the representation of the virtual word with the semantics of the argument role. Table 5 shows the redefined argument role types, and the semantic description and virtual label word of each argument role type.

Redefined Argument Role Label Semantic Description Virtual Label Word
Event:None the entity that is irrelevant to the event Event:None
Event:Place the place where the event takes place Event:Place
Be-Born:Person the person who is born Be-Born:Person
Marry:Person the person who are married Marry:Person
Divorce:Person the person who are divorced Divorce:Person
Injure:Agent the one that enacts the harm Injure:Agent
Injure:Victim the harmed person Injure:Victim
Injure:Instrument the device used to inflict the harm Injure:Instrument
Die:Agent the killer Die:Agent
Die:Victim the person who died Die:Victim
Die:Instrument the device used to kill Die:Instrument
Transport:Agent the agent responsible for the transport event Transport:Agent
Transport:Artifact the person doing the traveling or the artifact being transported Transport:Artifact
Transport:Vehicle the vehicle used to transport the person or artifact Transport:Vehicle
Transport:Origin the place where the transporting originated Transport:Origin
Transport:Destination the place where the transporting is directed Transport:Destination
Transfer-Ownership:Buyer the buying agent Transfer-Ownership:Buyer
Transfer-Ownership:Seller the selling agent Transfer-Ownership:Seller
Transfer-Ownership:Beneficiary the agent that benefits from the transaction Transfer-Ownership:Beneficiary
Transfer-Ownership:Artifact the item or organization that was bought or sold Transfer-Ownership:Artifact
Transfer-Money:Giver the donating agent Transfer-Money:Giver
Transfer-Money:Recipient the recipient agent Transfer-Money:Recipient
Transfer-Money:Beneficiary the agent that benefits from the transfer Transfer-Money:Beneficiary
Start-Org:Agent the founder Start-Org:Agent
Start-Org:Org the organization that is started Start-Org:Org
Merge-Org:Org the organizations that are merged Merge-Org:Org
Declare-Bankruptcy:Org the organization declaring bankruptcy Declare-Bankruptcy:Org
End-Org:Org the organization that is ended End-Org:Org
Attack:Attacker the attacking agent Attack:Attacker
Attack:Target the target of the attack Attack:Target
Attack:Instrument the instrument used in the attack Attack:Instrument
Demonstrate:Entity the demonstrating agent Demonstrate:Entity
Meet:Entity the agents who are meeting Meet:Entity
Phone-Write:Entity the communicating agent Phone-Write:Entity
Start-Position:Person the employee Start-Position:Person
Start-Position:Entity the employer Start-Position:Entity
End-Position:Person the employee End-Position:Person
End-Position:Entity the employer End-Position:Entity
Elect:Person the person elected Elect:Person
Elect:Entity the voting agent Elect:Entity
Nominate:Person the person nominated Nominate:Person
Nominate:Agent the nominating agent Nominate:Agent
Arrest-Jail:Person the person who is jailed or arrested Arrest-Jail:Person
Arrest-Jail:Agent the jailer or the arresting agent Arrest-Jail:Agent
Release-Parole:Person the person who is released Release-Parole:Person
Release-Parole:Entity the former captor agent Release-Parole:Entity
Trial-Hearing:Defendant the agent on trial Trial-Hearing:Defendant
Trial-Hearing:Prosecutor the prosecuting agent Trial-Hearing:Prosecutor
Trial-Hearing:Adjudicator the judge or court Trial-Hearing:Adjudicator
Charge-Indict:Defendant the agent that is indicted Charge-Indict:Defendant
Charge-Indict:Prosecutor the agent bringing charges or executing the indictment Charge-Indict:Prosecutor
Sue:Plaintiff the suing agent Sue:Plaintiff
Sue:Defendant the agent being sued Sue:Defendant
Sue:Adjudicator the judge or court Sue:Adjudicator
Convict:Defendant the convicted agent Convict:Defendant
Convict:Adjudicator the judge or court Convict:Adjudicator
Sentence:Defendant the agent who is sentenced Sentence:Defendant
Sentence:Adjudicator the judge or court Sentence:Adjudicator
Fine:Entity the entity that was fined Fine:Entity
Fine:Adjudicator the entity doing the fining Fine:Adjudicator
Execute:Person the person executed Execute:Person
Execute:Agent the agent responsible for carrying out the execution Execute:Agent
Extradite:Person the person being extradited Extradite:Person
Extradite:Agent the extraditing agent Extradite:Agent
Extradite:Origin the original location of the person being extradited Extradite:Origin
Extradite:Destination the place where the person is extradited to Extradite:Destination
Acquit:Defendant the agent being acquitted Acquit:Defendant
Acquit:Adjudicator the judge or court Acquit:Adjudicator
Pardon:Defendant the agent being pardoned Pardon:Defendant
Pardon:Adjudicator the state official who does the pardoning Pardon:Adjudicator
Appeal:Defendant the defendant Appeal:Defendant
Appeal:Adjudicator the judge or court Appeal:Adjudicator
Appeal:Plaintiff the appealing agent Appeal:Plaintiff
Table 5: Redefined argument role types, their semantic descriptions, and virtual label words of the semantical verbalizer.

A.2 Human-written Verbalizer

For the human-written verbalizer, we assign a label word to each argument role. Table 6 lists the label word of each argument role.

Argument Role Label Label Word
None none
Person person
Place place
Buyer buyer
Seller seller
Beneficiary beneficiary
Artifact artifact
Origin origin
Destination destination
Giver donor
Recipient recipient
Org organization
Agent agent
Victim victim
Instrument instrument
Entity entity
Attacker attacker
Target target
Defendant defendant
Adjudicator judge
Prosecutor prosecutor
Plaintiff plaintiff
Vehicle vehicle
Table 6: Label words of the human-written verbalizer.

Appendix B Templates

For our designed templates, each entity (event) type is converted into a human-understandable text span, so as to take full advantage of event type and entity type label information. Tables 7 and 8 list the text spans of all entity types and event types, respectively.

Entity Type Text Span
FAC facility
ORG organization
GPE geographical or political entity
PER person
VEH vehicle
WEA weapon
LOC location
Table 7: Text spans of entity types.
Event Type Text Span
Transport transport
Elect election
Start-Position employment
End-Position dimission
Attack attack
Meet meeting
Marry marriage
Transfer-Money money transfer
Demonstrate demonstration
End-Org collapse
Sue prosecution
Injure injury
Die death
Arrest-Jail arrest or jail
Phone-Write written or telephone communication
Transfer-Ownership ownership transfer
Start-Org organization founding
Execute execution
Trial-Hearing trial or hearing
Be-Born birth
Charge-Indict charge or indict
Sentence sentence
Declare-Bankruptcy bankruptcy
Release-Parole release or parole
Fine fine
Pardon pardon
Appeal appeal
Extradite extradition
Divorce divorce
Merge-Org organization merger
Acquit acquittal
Nominate nomination
Convict conviction
Table 8: Text spans of event types.