
SPEED++: A Multilingual Event Extraction Framework for
Epidemic Prediction and Preparedness

Tanmay Parekh      Jeffrey Kwan      Jiarui Yu      Sparsh Johri      Hyosang Ahn
Sreya Muppalla      Kai-Wei Chang      Wei Wang      Nanyun Peng
Computer Science Department, University of California, Los Angeles
{tparekh, weiwang, violetpeng, kwchang}@cs.ucla.edu
Abstract

Social media is often the first place where communities discuss the latest societal trends. Prior works have utilized this platform to extract epidemic-related information (e.g., infections, preventive measures) to provide early warnings for epidemic prediction. However, these works only focused on English posts, while epidemics can occur anywhere in the world, and early discussions are often in the local, non-English languages. In this work, we introduce SPEED++, the first multilingual Event Extraction (EE) framework for extracting epidemic event information for a wide range of diseases and languages. To this end, we extend a previous epidemic ontology with 20 argument roles and curate our multilingual EE dataset SPEED++ comprising 5.1K tweets in four languages for four diseases. Annotating data in every language is infeasible; thus, we develop zero-shot cross-lingual cross-disease models (i.e., training only on English COVID data) utilizing multilingual pre-training and show their efficacy in extracting epidemic-related events for 65 diverse languages across different diseases. Experiments demonstrate that our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 (3 weeks before global discussions) from Chinese Weibo posts without any training in Chinese. Furthermore, we exploit our framework's argument extraction capabilities to aggregate community epidemic discussions like symptoms and cure measures, aiding misinformation detection and public attention monitoring. Overall, we lay a strong foundation for multilingual epidemic preparedness.



1 Introduction

Timely epidemic-related information is vital for policymakers to issue warnings and implement control measures Collier et al. (2008). Being timely, publicly accessible, widely used, and high in volume Heymann et al. (2001); Lamb et al. (2013); Lybarger et al. (2021), social media acts as a crucial information source. Previous works Parekh et al. (2024); Zong et al. (2022) have explored utilizing Event Extraction (EE) Sundheim (1992); Doddington et al. (2004) to extract epidemic events from social media posts for epidemic prediction. However, these works have focused only on English, while epidemics can originate anywhere worldwide and be discussed in various regional languages.

Figure 1: Zero-shot multilingual epidemic prediction in Chinese for COVID-19 pandemic. (Top) Number of epidemic events extracted in Dec-Jan 2020. Arrows indicate SPEED++ epidemic warnings. (Bottom) SPEED++ warning with respect to the general timeline of major moments of the COVID-19 pandemic.

In our work, we introduce SPEED++ (Social Platform based Epidemic Event Detection + Arguments + Multilinguality), the first multilingual EE framework designed for epidemic preparedness. We extend the English-only SPEED Parekh et al. (2024) to the multilingual setting by developing and benchmarking zero-shot cross-lingual models capable of extracting epidemic information across many languages. While SPEED primarily identifies basic epidemic events, we develop enhanced models capable of extracting detailed event-specific information (e.g., symptoms, control measures) by incorporating Event Argument Extraction (EAE). To integrate EAE, we enrich the SPEED ontology with event-specific roles relevant to social media, yielding a rich EE ontology comprising 7 event types (e.g., infect, cure, prevent) and 20 argument roles (e.g., disease, symptoms, time, means). Apart from English, we also annotate three other languages - Spanish, Hindi, and Japanese - to benchmark multilingual EE models. Leveraging the enriched ontology and expert annotations, we create our SPEED++ dataset comprising 5.1K tweets and 4.6K event mentions across four different diseases (COVID-19, Monkeypox, Zika, and Dengue) in four languages.

Using SPEED++, we develop our zero-shot cross-lingual models by empowering state-of-the-art EE models like TagPrime Hsu et al. (2023a) with multilingual pre-training and augmented training on pseudo-generated multilingual data from CLaP Parekh et al. (2023a). These models are trained under a realistic setting on limited English COVID-specific data. Benchmarking on SPEED++ reveals that our trained models outperform various baselines by an average of 15-16 F1 points for unseen diseases across four different languages.

To demonstrate the utility of our multilingual EE SPEED++ framework, we apply it to two epidemic-related applications. First, we utilize the framework's multilingual capabilities for epidemic prediction by aggregating epidemic events across different languages. By incorporating tweet locations, we construct a global epidemic severity meter capable of providing epidemic warnings in 65 languages spanning 117 countries. Applying our framework for COVID-19 to Chinese Weibo posts, we raise early epidemic warnings by Dec 30, 2019 (Figure 1) - three weeks before global infection tracking even began. This multilingual epidemic prediction capability can significantly enhance our global preparedness for future epidemics.

As another application, we repurpose our framework as an information aggregation system for community discussions about epidemics such as symptoms, cure measures, etc. Leveraging the EAE capability of our framework, we meticulously extract these event-specific details from millions of tweets across diseases and languages. Similar arguments are then agglomeratively clustered to generate an aggregated ranked bulletin. We demonstrate that this bulletin can aid misinformation detection (e.g., cow urine as a cure for COVID-19) and public attention shift monitoring (e.g., rashes as symptoms for Monkeypox). Such an automated disease-agnostic multilingual aggregation system can significantly alleviate human effort while providing insights into public epidemic opinions.

In conclusion, our work presents a three-fold contribution. First, we create the first multilingual Event Extraction dataset for epidemic prediction SPEED++ encompassing four diseases and four languages. Second, leveraging SPEED++, we develop models proficient in extracting epidemic-related data across a wide set of diseases and languages. Lastly, we demonstrate the robust utility of our framework through two epidemic-centric applications, facilitating multilingual epidemic prediction and the aggregation of epidemic information.

2 Background

Figure 2: Illustration of Event Extraction for epidemic-related events Infect and Control. Corresponding arguments and their roles are marked in dotted boxes - that are absent in the SPEED Parekh et al. (2024) dataset.

Epidemic prediction is a classic epidemiological task that provides early warnings for future epidemics of any infectious disease Signorini et al. (2011). Previous works Lejeune et al. (2015); Lybarger et al. (2021) have utilized keyword-based and simple classification-based methods for extracting epidemic mentions (detailed in § 6). SPEED Parekh et al. (2024) was the first to explore Event Extraction (EE) for extracting epidemic-based events in English. In our work, we also utilize Event Extraction but focus multilingually on a broader range of languages. The extracted events are aggregated over time, and abnormal influxes are reported as early epidemic warnings. To the best of our knowledge, we are the first to develop a multilingual Event Extraction framework for epidemic prediction.

Task Definition

We adhere to the ACE 2005 guidelines Doddington et al. (2004) to define an event as an occurrence or change of state associated with a specific event type. An event mention is the sentence that describes the event, and it includes an event trigger, the word or phrase that most clearly indicates the event. Event Extraction comprises two subtasks: Event Detection and Event Argument Extraction. Event Detection (ED) involves identifying these event triggers in sentences and classifying them into predefined event types, while Event Argument Extraction (EAE) extracts arguments and assigns them event-specific roles. Figure 2 shows an illustration for two event mentions for the events infect and control.
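As a concrete illustration of the two subtasks, an infect event mention like those in Figure 2 can be represented as a simple data structure. The tweet, trigger span, and argument texts below are hypothetical, but the event type and role names come from the SPEED++ ontology:

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    text: str  # argument span in the sentence, e.g. "Delhi"
    role: str  # event-specific role, e.g. "place"

@dataclass
class EventMention:
    sentence: str
    trigger: str     # word/phrase that most clearly indicates the event
    event_type: str  # one of the 7 ontology event types
    arguments: list = field(default_factory=list)

# ED identifies the trigger "tested positive" and types it as "infect";
# EAE then extracts the arguments and assigns them event-specific roles.
mention = EventMention(
    sentence="I tested positive for COVID-19 in Delhi yesterday.",
    trigger="tested positive",
    event_type="infect",
    arguments=[
        Argument("I", "infected"),
        Argument("COVID-19", "disease"),
        Argument("Delhi", "place"),
        Argument("yesterday", "time"),
    ],
)
```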

3 Dataset Creation

Figure 3: Overview of the data creation process. Broadly, we expand the ontology with argument roles, preprocess and filter the multilingual data, and annotate it using bilingual experts to create SPEED++.

We focus on social media, specifically Twitter, as our main document source for studying four diseases - COVID-19, Monkeypox, Zika, and Dengue. SPEED Parekh et al. (2024) focused only on Event Detection (ED) for English. Since ED identifies events but does not provide any epidemic-related information, we improve upon SPEED by additionally incorporating Event Argument Extraction (EAE) to develop a complete EE dataset, SPEED++. Furthermore, we extend to three other languages used on social media - Spanish, Hindi, and Japanese - to enhance the multilingual capability of our framework. We detail the data creation process below, while Figure 3 provides a high-level overview.

3.1 Ontology Creation

Event ontologies comprise event types and corresponding event-specific roles. For our ontology, we derive the event types from SPEED and augment them with event-specific roles. We follow ACE guidelines Doddington et al. (2004) for role definitions while also including a few non-entity roles based on GENEVA Parekh et al. (2023b).

We initially drafted event-specific roles through a crowdsourced survey with 100 participants. Through manual inspection, we extracted the frequently mentioned roles in the responses. These were augmented with more typical roles like Time and Place. We further expanded the ontology with epidemic-specific roles (e.g., Effectiveness of a cure, Duration of a symptom) from ExcavatorCovid Min et al. (2021a), a fine-grained COVID ontology. Finally, the roles were renamed to reflect their corresponding events (e.g., the person who gets infected in the infect event is named Infected). This multi-perspective role curation approach enhances the diversity and coverage of our ontology.

Filtering and Validation

To ensure the relevance of our ontology for social media, we analyzed the event roles based on their frequency on Twitter. Specifically, we sampled 50 tweets from the SPEED dataset and annotated them with our event roles. Based on this analysis, we filtered out event roles with too few occurrences, such as Origin (the source of the disease) and Manner (how a person was infected). Additionally, we merged roles that were too similar (e.g., Impact and Effectiveness). Finally, we validated our ontology with two public health experts (epidemiologists from the Dept. of Public Health). The final ontology of event types and roles is presented in Table 1, with definitions and examples in Appendix § A.

Event Type: Argument Roles
Infect: infected, disease, place, time, value, information-source
Spread: population, disease, place, time, value, information-source, trend
Symptom: person, symptom, disease, place, time, duration, information-source
Prevent: agent, disease, means, information-source, target, effectiveness
Control: authority, disease, means, place, time, information-source, subject, effectiveness
Cure: cured, disease, means, place, time, value, facility, information-source, effectiveness, duration
Death: dead, disease, place, time, value, information-source, trend
Table 1: Event Ontology for SPEED++ comprising 7 event types and 20 argument roles.
Dataset # Langs # Event Types # Arg Roles # Sent # EM Avg. EM per Event # Args Avg. Args per Role Domain
Genia2013 1 13 7 664 6,001 429 5,660 809 Biomedical
MLEE 1 29 14 286 6,575 227 5,958 426 Biomedical
ACE 3 33 22 29,483 5,055 153 15,328 697 News
ERE 2 38 21 17,108 7,284 192 15,584 742 News
MEE 8 16 23 31,226 50,011 3126 38,748 1685 Wikipedia
SPEED++ 4 7 20 5,107 4,677 668 13,827 691 Social Media
Table 2: Data statistics for SPEED++ dataset and comparison with other standard EE datasets. Langs = languages, # = number of, Avg. = average, Sent = sentences, EM = event mentions, Args = arguments.

3.2 Data Processing

We utilize Twitter as the social media platform and focus on four diseases - COVID-19, Monkeypox (MPox), Zika, and Dengue. To maintain a similar distribution, we follow the data processing pipeline from SPEED Parekh et al. (2024). For English, we directly utilize the base data provided by SPEED, which comprises tweets from May 15 to May 31, 2020. For other languages, we extract tweets in the same date range as SPEED, utilizing the Twitter COVID-19 Endpoint as the COVID-19 base dataset. We utilize dumps from Dias (2020) as the Zika+Dengue base dataset. For tweet preprocessing, we follow Pota et al. (2021): (1) anonymizing personal information, (2) normalizing retweets and URLs, and (3) removing emojis and segmenting hashtags.
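The three preprocessing steps above can be sketched with illustrative regular expressions; the exact rules of Pota et al. (2021) may differ, and the placeholder tokens (@USER, HTTPURL) and emoji ranges here are assumptions:

```python
import re

def preprocess_tweet(text: str) -> str:
    # (1) anonymize personal information: user mentions -> @USER
    text = re.sub(r"@\w+", "@USER", text)
    # (2) normalize retweets and URLs
    text = re.sub(r"^RT\s+", "", text)
    text = re.sub(r"https?://\S+", "HTTPURL", text)
    # (3a) remove emojis (common pictograph/symbol code-point ranges)
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)
    # (3b) segment hashtags: #StayHome -> stay home, #covid19 -> covid 19
    def split_hashtag(match):
        words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", match.group(1))
        return " ".join(w.lower() for w in words) if words else match.group(1)
    text = re.sub(r"#(\w+)", split_hashtag, text)
    # collapse any leftover whitespace
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess_tweet("RT @bob check https://x.co #StayHome")` yields `"@USER check HTTPURL stay home"`.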

Event-based Filtering

To reduce annotation costs, we utilize SPEED’s event filtering technique. Specifically, each event type is associated with a seed repository of 5-10 tweets in each language. Query tweets are filtered based on their similarity to this seed repository. For procuring the multilingual event-specific seed sentences, we translate the original English seed tweets into different languages. To improve filtering efficiency, we additionally conduct keyword-based filtering for specific language-event pairs (e.g. Japanese-symptom, Japanese-cure, etc.). Here, we filtered out a query tweet if it did not contain any event-specific keywords. Finally, we apply event-based sampling from SPEED to procure the final base dataset that is utilized for data annotation. Additional details are discussed in § B.
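The seed-similarity filter can be sketched as follows. For illustration we use a bag-of-words cosine similarity over a toy seed repository; the actual pipeline would use stronger multilingual sentence representations, and the seed tweets and threshold below are hypothetical:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def passes_filter(query: str, seed_repo: list, threshold: float = 0.2) -> bool:
    """Keep a query tweet if it is similar enough to ANY seed tweet
    in the event's seed repository."""
    q = Counter(query.lower().split())
    return any(cosine(q, Counter(s.lower().split())) >= threshold
               for s in seed_repo)

# Toy seed repository spanning the infect and control events.
seeds = ["many people got infected with the virus today",
         "new lockdown measures to control the outbreak"]
```

A tweet mentioning infections passes the filter, while an off-topic tweet is dropped.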

3.3 Data Annotation

We conduct two sets of annotations to create our multilingual EE dataset: (1) EAE annotations for existing SPEED English ED data and (2) ED+EAE annotations for data in Japanese, Hindi, and Spanish. For ED, annotators were tasked to identify the presence of any events in a given tweet. For EAE, annotators were further asked to identify and extract event-specific roles that were also mentioned in the tweet. We provide further details about the annotation guidelines in § C.

Annotators and Agreement

To maintain high annotation quality, we selected a pool of seven expert annotators: computer science NLP students trained through multiple annotation rounds. Of these seven, three annotators were bilingual speakers of English and Japanese, Hindi, or Spanish, respectively. These three annotators handled the multilingual ED and EAE annotations. The remaining four annotators, along with the bilingual English-Hindi annotator, focused on English EAE annotations.

To ensure good annotation agreement, we conducted two agreement studies among the annotators: (1) ED annotations for the multilingual annotators and (2) EAE annotations for all annotators. Both studies were conducted on English data (even for the multilingual annotators) so that agreement could be measured fairly. Agreement was measured using Fleiss' Kappa Fleiss (1971). For ED, two rounds of study for the 3 multilingual annotators yielded a substantial agreement score of 0.75 (30 samples). For EAE, the agreement score for the 7 annotators after two annotation rounds was a moderate 0.6 (25 samples).
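Fleiss' Kappa for a study with N items, n raters per item, and k categories can be computed directly from per-item category counts; a minimal sketch:

```python
def fleiss_kappa(ratings):
    """ratings: one row per item; row[j] = number of raters assigning
    the item to category j. Each row sums to the number of raters n."""
    N = len(ratings)     # items
    n = sum(ratings[0])  # raters per item
    k = len(ratings[0])  # categories
    # overall proportion of assignments to each category
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # observed agreement, averaged over items
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in ratings) / N
    # expected (chance) agreement
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)
```

With perfect agreement (e.g. `[[3, 0], [0, 3]]`) the score is 1.0; in the common Landis-and-Koch reading, 0.75 falls in the substantial band and 0.6 in the moderate band.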

Annotation Verification

To mitigate single annotator bias, each datapoint in the English data is annotated by two annotators, with a third annotator resolving inconsistencies. Owing to the scarcity of multilingual annotators, we hire three additional bilingual speakers to verify the multilingual annotations. These verification annotators were selected through a thorough qualification test to ensure high verification quality. They were requested to judge if the current annotations were reasonable. If the original annotation was deemed incorrect, they were asked to provide feedback to correct the annotations. This feedback was finally utilized by the original multilingual annotators to rectify the annotation. We provide additional details in § C.1.

3.4 Data Analysis

Comparison with other datasets

SPEED++ comprises 5,106 tweets with 4,674 event mentions and 13,815 argument mentions across four diseases and four languages. We present the main statistics along with comparisons with other prominent EE datasets like ACE Doddington et al. (2004), ERE Song et al. (2015), Genia2013 Li et al. (2020), MEE Pouran Ben Veyseh et al. (2022), and MLEE Pyysalo et al. (2012) in Table 2. We note that SPEED++ is one of the few multilingual EE datasets, notably the first in the social media domain. Overall, SPEED++ is comparable in various event and argument-related statistics with the previous standard EE datasets.

Lang # Sent Avg. Length # EM # Args
en 2,560 32.5 2,887 8,423
es 1,012 32.4 614 1,485
hi 716 30.0 627 2,344
ja 819 89.2* 549 1,575
Table 3: Data statistics for SPEED++ split by language. # = number of, Avg = average, Lang = language, Sent = sentences, Args = arguments, *character-level.

Multilingual Statistics

We provide a deeper split of data statistics per language in Table 3. Owing to cheaper annotations, English has many more annotated sentences than the other languages. This is also a design choice, as we solely utilize English data for training zero-shot multilingual models (discussed in § 4). In terms of event and argument densities (i.e., # EM / # Sent and # Args / # Sent), we notice considerable variation across languages, with English and Hindi being denser. The average lengths are similar across the languages (measured in words, except character-level for Japanese).
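The density comparison can be reproduced directly from the Table 3 counts:

```python
# (sentences, event mentions, arguments) per language, from Table 3.
stats = {
    "en": (2560, 2887, 8423),
    "es": (1012, 614, 1485),
    "hi": (716, 627, 2344),
    "ja": (819, 549, 1575),
}

def densities(sent, em, args):
    """Event density (# EM / # Sent) and argument density (# Args / # Sent)."""
    return em / sent, args / sent

event_density = {lang: densities(*v)[0] for lang, v in stats.items()}
arg_density = {lang: densities(*v)[1] for lang, v in stats.items()}
```

This gives about 1.13 events and 3.29 arguments per sentence for English and 0.88 and 3.27 for Hindi, versus 0.61 and 1.47 for Spanish and 0.67 and 1.92 for Japanese.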

Figure 4: Distribution of the number of arguments (# Args) per sentence for SPEED++ relative to other datasets ACE, ERE, and MEE.

Argument Study

We study argument density (arguments per sentence) in SPEED++ by comparing with the standard EE datasets ACE and ERE and the multilingual EE dataset MEE in Figure 4. Noticeably, SPEED++ is denser (has a higher mean number of arguments per sentence) and has a broader distribution, with sentences containing up to 18 arguments. Furthermore, following GENEVA Parekh et al. (2023b), we add 4 non-entity roles, which make up 20% of the total arguments. Such non-entity arguments are not present in any other multilingual dataset. Overall, the high and broad argument density and the existence of non-entity arguments make SPEED++ a more challenging EE dataset.

4 Zero-shot Cross-lingual Event Extraction

Split Disease Language # Sent # EM
Train COVID English 1,601 1,746
Dev COVID English 374 471
Test COVID Spanish 534 365
Hindi 416 412
Japanese 542 395
Monkeypox English 286 398
Zika + Dengue English 299 272
Spanish 478 249
Hindi 300 215
Japanese 277 154
Table 4: Data split for epidemic event extraction. # = number of, Sent = sentences, EM = event mentions.
Model | COVID: hi, jp, es, en | MPox: en | Zika + Dengue: hi, jp, es | Average
(each language column reports EC and AC F1)
Baseline Models
ACE - TagPrime 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
DivED 0.0 - 0.0 - 27.0 - 36.7 - 47.7 - 4.4 - 1.3 - 12.8 - 16.2 -
Keyword 15.2 - 28.6 - 26.3 - 41.3 - 39.3 - 18.6 - 39.7 - 20.6 - 28.7 -
COVIDKB 45.2 - 42.3 - 24.2 - 18.5 - 45.5 - 47.5 - 34.0 - 34.6 - 36.5 -
GPT-3.5-turbo 35.7 14.5 36.4 15.0 43.2 16.8 46.4 24.0 56.6 33.0 45.5 20.0 29.0 11.0 39.6 15.1 41.6 18.7
Trained on SPEED++ (Our Framework)
TagPrime 60.1 39.0 35.3 7.9 62.0 37.1 70.2 45.1 66.7 48.5 65.0 40.7 27.2 7.5 49.9 28.2 54.6 31.8
TagPrime + XGear 60.1 27.1 35.3 9.7 62.0 36.0 70.2 42.0 66.7 45.8 65.0 31.9 27.2 8.8 49.9 26.6 54.6 28.5
BERT-QA 54.7 33.9 21.2 4.0 60.6 28.1 66.1 39.0 63.0 45.5 50.8 31.1 4.6 0.8 41.9 24.1 45.4 25.8
DyGIE++ 61.0 35.7 38.2 2.0 61.7 39.1 67.4 39.3 64.1 46.4 61.4 32.2 27.5 0.4 45.6 22.7 53.4 27.2
OneIE 61.9 34.3 12.0 11.4 44.5 37.3 68.8 42.6 66.7 47.9 61.9 38.5 12.0 5.0 44.5 25.4 46.5 30.3
TagPrime + CLaP 58.6 32.9 48.4 19.1 62.6 37.7 70.2 45.1 66.7 48.5 65.2 40.6 39.2 18.8 49.7 28.1 57.6 33.9
Table 5: Benchmarking EE models trained on SPEED++ for extracting event information in the cross-lingual cross-disease setting. EC = Event Classification, AC = Argument Classification, hi = Hindi, jp = Japanese, es = Spanish, and en = English. The Keyword baseline's numbers are higher due to string-matching evaluation; COVIDKB uses binary classification evaluation.

To validate the effectiveness of EE for epidemic events, we benchmark various EE models using SPEED++. Given the infeasibility of procuring quality data in all languages for all diseases, we benchmark in a zero-shot cross-lingual cross-disease fashion, i.e., we train models only on English COVID data and evaluate on the rest. We provide the data split for our benchmarking in Table 4. For evaluation, we report the F1-score for event classification (EC) and argument classification (AC) Ahn (2006), measuring the classification of events and arguments respectively. We use TextEE Huang et al. (2024) for most implementations, with specific details discussed in § D. Additional benchmarking experiments are provided in § E.
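Both EC and AC scores reduce to F1 over sets of classified tuples; a sketch in which the tuple fields are illustrative (common EE evaluations key on sentence, span, and type/role):

```python
def f1_score(pred, gold):
    """F1 over sets of tuples, e.g. (sentence_id, span, event_type)
    for EC or (sentence_id, span, role) for AC."""
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)  # exact tuple matches
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, if a model classifies one of two gold triggers with the correct event type and mislabels the other, both precision and recall are 0.5, giving F1 = 0.5.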

EE Models

Most EE models focus solely on English and cannot be directly utilized in the cross-lingual setting. To this end, we adapt the following models with multilingual pre-trained encoders and tokenization: (1) TagPrime Hsu et al. (2023a), (2) BERT-QA Du and Cardie (2020), (3) DyGIE++ Wadden et al. (2019a), (4) OneIE Lin et al. (2020a), and (5) XGear Huang et al. (2022). Since XGear is an EAE-only model, we combine it with the TagPrime ED model. To further improve the models, we train them with pseudo-generated data using the label projection model CLaP Parekh et al. (2023a).

Baseline Models

We consider the following baselines: (1) ACE - TagPrime, a TagPrime model trained on the multilingual EE dataset ACE Doddington et al. (2004) and transferred to SPEED++; (2) DivED Cai et al. (2024), a Llama2-7B model fine-tuned on a diverse range of event definitions; (3) COVIDKB Zong et al. (2022), an epidemiological work using a BERT classification model - since its original output classes differ from ours, we train it to simply classify tweets as epidemic-related or not; (4) a Keyword baseline inspired by the epidemiological work of Lejeune et al. (2015), which curates a set of keywords for each event; and (5) GPT-3.5-turbo Brown et al. (2020), a Large Language Model (LLM) baseline using seven in-context examples.

Results

We present our per-disease per-language results in Table 5. We note that most of the baseline models do not perform well on our task, as also noted in Parekh et al. (2024). The GPT-based LLM baseline performs better in English but exhibits poor performance in the other languages; similarly, the English-centric Llama-based DivED baseline transfers poorly to other languages. On the other hand, we observe stronger performance from the supervised models trained on our SPEED++ dataset, with TagPrime providing the best overall average performance. We also note that CLaP further improves performance by 2-3 F1 points in the cross-lingual setting, especially for the character-based language Japanese.

5 Applications

To validate its practical utility for epidemic preparedness, we demonstrate our framework’s use in two downstream applications: Global Epidemic Prediction and Epidemic Information Aggregation. For this, we train a multilingual TagPrime model on the entire SPEED++ dataset. Further details about these applications are provided below.

5.1 Global Epidemic Prediction

To showcase the robust multilingual utility of our framework, we highlight its extensive language coverage and provide an in-depth analysis of COVID-19 predictions from Chinese data.

Figure 5: Number of extracted events plotted against the number of reported cases for each country. Both of them are in log scale.
Figure 6: Geographical distribution of the number of reported COVID-19 cases as of May 28, 2020 in Europe. Red depicts more spread, and yellow/white indicates less spread. The blue dots indicate the events extracted by our model, and their size depicts the number of epidemic events for the specific country (log scale).

Global Epidemic Monitoring

Validating our framework for each language is resource-intensive and infeasible. Instead, we perform a preliminary study of the broad language coverage of our framework by demonstrating its capability to detect COVID-related events across 65 languages. We analyze tweets across all languages from a single day (May 28, 2020) and extract epidemic events using our framework. Utilizing user locations, we map the tweets from these languages to 117 countries. For reference, we plot the extracted events for each country against the actual number of reported COVID-19 cases (https://www.worldometers.info/coronavirus/) in Figure 5. Countries with significant COVID-19 spread appear in the top-right, while some outliers are also shown in the figure. Our framework achieves a strong correlation of 0.73 with the actual reported cases, indicating robust performance across a broad range of languages.
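The country-level comparison in Figure 5 amounts to correlating event counts with reported cases on a log scale; a sketch with toy per-country counts (the actual numbers are not reproduced here):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def log_correlation(events, cases):
    """Correlate log10(#extracted events) with log10(#reported cases)
    across countries, mirroring the log-log axes of Figure 5."""
    return pearson([math.log10(e) for e in events],
                   [math.log10(c) for c in cases])
```

Counts that grow proportionally across countries yield a correlation near 1.0 on the log scale.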

We further extend these plots geographically in Figure 6. Each country is color-coded by the number of COVID cases, with lighter shades indicating fewer cases and darker shades indicating massive spread. The number of events extracted from the country-mapped tweets by our framework is plotted as translucent circles, with bigger dots indicating more extracted events for the given country. In this plot, we observe more extracted events and COVID-19 spread in Western European countries like the United Kingdom, France, Spain, Italy, and Germany, and fewer events and less spread in Eastern European countries. We provide additional details along with a world map geographical plot in § F.

COVID-19 Epidemic Prediction using Chinese

As a case study, we examine the earliest stages of the COVID-19 pandemic, analyzing Chinese social media posts from Dec 16, 2019, to Jan 21, 2020, using Weibo data from Hu et al. (2020). Using our trained TagPrime model, we run inference on Chinese in a zero-shot fashion (i.e., without any prior training on Chinese). We aggregate the 7-day rolling average of our extracted event mentions over time and report any sharp increases as epidemic warnings, as illustrated in Figure 1. Since case reporting had not yet begun, actual COVID-19 case numbers are unavailable for this period. Instead, we also plot the total number of Weibo posts and active users Guo et al. (2021). Additionally, we compare trends with baselines from COVIDKB Zong et al. (2022) and a keyword-based approach Lejeune et al. (2015).
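The warning criterion can be sketched as a 7-day rolling average with a simple jump threshold. The 2x ratio below is an illustrative stand-in for the "sharp increase" rule, not the paper's exact criterion:

```python
def rolling_average(counts, window=7):
    """Rolling average of daily extracted event counts (shorter
    windows at the start of the series)."""
    out = []
    for i in range(len(counts)):
        lo = max(0, i - window + 1)
        out.append(sum(counts[lo:i + 1]) / (i + 1 - lo))
    return out

def epidemic_warnings(counts, window=7, ratio=2.0):
    """Flag day i as a warning when the rolling average jumps by at
    least `ratio` relative to the previous day's average."""
    avg = rolling_average(counts, window)
    return [i for i in range(1, len(avg))
            if avg[i - 1] > 0 and avg[i] / avg[i - 1] >= ratio]
```

A flat series of daily counts followed by a sudden influx of event mentions triggers warnings exactly at the days where the influx begins.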

Chinese Posts Translation
武汉华南海鲜市场出现 [infect]多个不明原因肺炎病例,请同道们提高警惕,早期发现,早期隔离 [prevent] Multiple cases of pneumonia of unknown origin have appeared [infect] in Wuhan’s Huanan Seafood Market. Please be more vigilant, detect and isolate [prevent] them as early as possible.
近日,武汉进入流感高发期 [spread],多家医院感冒 [symptom]发烧的患者数量猛增。 Recently, Wuhan has entered a period of high influenza incidence [spread], and the number of patients with colds [symptom] and fevers in many hospitals has increased sharply.
Table 6: Sample Weibo posts in Chinese with their translations identified by SPEED++ framework as epidemic-related from late December 2019. Event types and their trigger words are marked in blue.
Figure 7: Information Assimilation Bulletin as extracted by our SPEED++ framework and agglomeratively clustered. The first column represents different diseases, the second column represents different argument roles, and the third column represents different languages. We also highlight the utility of this bulletin for two applications: Public Attention Shift Monitoring and Misinformation Detection.

Figure 1 demonstrates how our SPEED++ framework provides epidemic warnings three weeks before the global tracking of infection cases began. While the keyword-based method also provides some signals, they are relatively weaker. Furthermore, Table 5 shows that the keyword baseline performs worse for morphologically richer languages, making it less robust. Additionally, the number of posts and active users do not provide any epidemic-related signals. For further validation, we present sample event mentions extracted by our framework in Table 6. In the bottom timeline (https://www.cdc.gov/museum/timeline/covid19.html) in Figure 1, we demonstrate the efficacy of our framework as it provided epidemic warnings 6 weeks before the "COVID-19" term was coined and used in social media. Overall, we show how our framework can provide early epidemic warnings multilingually without relying on any target-language data, making it suitable for global deployment.

5.2 Epidemic Information Aggregation

Our framework possesses a strong EAE capability to extract detailed information about epidemic events, such as symptoms, preventive measures, and cure measures. Aggregating such information from millions of social media posts can provide insights into public opinions regarding various epidemic aspects. To this end, we develop an information aggregation system for community epidemic discussions. Specifically, we use our framework to extract arguments for various event roles, project them into a representation space, and merge similar arguments using agglomerative clustering. The final arguments and their counts for different diseases and languages, extracted from 6M tweets, are presented as a bulletin in Figure 7. Further details and a complete table are reported in § G.
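The aggregation step can be sketched as greedy single-linkage agglomerative clustering over extracted argument strings. Here a token-overlap (Jaccard) similarity stands in for the representation-space similarity, and the cure-measure mentions are hypothetical examples:

```python
def similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two argument strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def agglomerative_cluster(args, threshold=0.5):
    """Repeatedly merge any two clusters whose closest members are at
    least `threshold` similar (single linkage), until stable."""
    clusters = [[a] for a in args]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(similarity(x, y) >= threshold
                       for x in clusters[i] for y in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical extracted cure-measure mentions; real input would come
# from the EAE model's argument spans.
mentions = ["cow urine", "cow urine rinse", "ivermectin",
            "ivermectin treatment", "yoga", "yoga exercise"]
clusters = agglomerative_cluster(mentions)
# Rank clusters by size to form the aggregated bulletin.
bulletin = sorted(clusters, key=len, reverse=True)
```

On this toy input, the six mentions collapse into three clusters (cow urine, ivermectin, yoga), which are then ranked by frequency for the bulletin.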

Tweet Translation
Hindi  -  COVID-19  -  Cure Measures
[Hindi tweet] Become a warrior against Corona and defeat it through yoga.
[Hindi tweet] Sambit is corona positive, his condition is critical, and he is being treated by gargling with cow urine under the self-reliant campaign.
[Hindi tweet] Received the news of Sambit ji being affected by Corona! Sorry to hear, but he is recovering in the hospital and is much better now.
Spanish  -  COVID-19  -  Cure Measures
Científicos rusos sugieren que una proteína presente en la leche materna puede ser clave en la lucha contra el covid-19 (url) Russian scientists suggest that a protein present in breast milk may be key in the fight against covid-19 (url)
Este transplante de pulmones a un paciente con COVID-19 es una operación realizada hasta ahora sólo en China y que por primera vez se lleva a cabo en Europa This lung transplant to a patient with COVID-19 is an operation carried out so far only in China and is being carried out for the first time in Europe.
el mundo acumula más evidencia de la efectividad de la ivermectina para el tratamiento en casa de pacientes con estadios leves de #(COVID)-19 the world accumulates more evidence of the effectiveness of ivermectin for the home treatment of patients with mild stages of #(COVID)-19
Table 7: Illustration of some actual tweets in Hindi and Spanish mentioning various cure measures related to COVID-19. The terms related to cure measures extracted by our framework are highlighted in red.

Our framework effectively extracts various arguments for COVID-19 in English (middle column of Figure 7), including cure measures such as hydroxychloroquine and remdesivir, control measures like lockdown and quarantine, and symptoms such as pneumonia. This capability extends to other diseases, such as Monkeypox, Zika, and Dengue (left column), and across languages, including Hindi and Spanish (right column). This condensed information is crucial for Public Attention Shift Monitoring, aiding policymakers in devising better control measures Liu and Fu (2022). We demonstrate this in the form of extracted symptoms such as rashes and lesions for Monkeypox and fever and shock syndrome for Dengue (left column). Simultaneously, our framework can assist in Misinformation Detection Mendes et al. (2023). Shown as caution signs in the figure, we highlight various potential COVID-19 cure misinformation such as cow urine rinse, cannabis, and transplants extracted across languages by our framework. As evidence, we also show some of the actual tweets in Hindi and Spanish flagged by our framework for mentioning these terms in Table 7. We provide further example tweets comprising these arguments as extracted by our framework in § G.

6 Related Works

Event Extraction Datasets

Event Extraction (EE) aims at detecting events (Event Detection) and extracting details about specific roles associated with each event (Event Argument Extraction) from natural text. Unlike document parsing Tong et al. (2022); Suvarna et al. (2024), we utilize EE only at the sentence/tweet level in our work. Overall, EE is a well-studied task, with the earliest works dating back to MUC Sundheim (1992); Grishman and Sundheim (1996), ACE Doddington et al. (2004), and ERE Song et al. (2015), followed by a variety of newer, more diverse datasets like MAVEN Wang et al. (2020), WikiEvents Li et al. (2021), FewEvent Deng et al. (2019), DocEE Tong et al. (2022), and GENEVA Parekh et al. (2023b). While most of these datasets are only in English, some datasets like ACE Doddington et al. (2004), ERE Song et al. (2015), and MEE Pouran Ben Veyseh et al. (2022) provide EE data in ten languages for general-purpose events in the news and Wikipedia domains. SPEED Parekh et al. (2024) introduces data for epidemic-based events in social media but is limited to Event Detection and focuses on English. Overall, SPEED++ extends SPEED to four languages and to Event Argument Extraction.

Multilingual Epidemiological Information Extraction

Early epidemiological works Lindberg et al. (1993); Rector et al. (1996); Stearns et al. (2001) largely focused on defining extensive ontologies for use by biomedical experts. BioCaster Collier et al. (2008) and PULS Du et al. (2011) explored rule-based methods for the news domain. Early information extraction systems tackled predicting influenza trends from social media Signorini et al. (2011); Lamb et al. (2013); Paul et al. (2014). More recently, IDO Babcock et al. (2021) and DO Schriml et al. (2022) are two extensive ontologies for human diseases. CIDO He et al. (2020), ExcavatorCovid Min et al. (2021b), CACT Lybarger et al. (2021), and COVIDKB Zong et al. (2022); Mendes et al. (2023) were developed specifically for COVID-19 events. While most of these works are English-focused, some others Lejeune et al. (2015); Mutuvi et al. (2020); Sahnoun and Lejeune (2021) support multilingual systems using keyword-based and simple classification methods. Overall, most of these systems are English-centric, disease-specific, ill-suited to social media, and rely on rudimentary models. In our work, utilizing disease-agnostic annotations and powerful multilingual models, we develop models that can detect events for any disease mentioned in any language.

7 Conclusion and Future Work

In our work, we pioneer the first multilingual Event Extraction (EE) framework for epidemic prediction and preparedness. To this end, we create a multilingual EE benchmarking dataset SPEED++ comprising 5K tweets spanning four languages and four diseases. To deploy our models realistically, we develop zero-shot cross-lingual cross-disease models and demonstrate their capability to extract events for 65 languages spanning 117 countries. We validate the effectiveness of our model by providing early epidemic warnings for COVID-19 from Chinese Weibo posts in a zero-shot manner. We also show evidence that our framework can be utilized as an information aggregation system aiding misinformation detection and public attention monitoring. In conclusion, we demonstrate the strong utility of multilingual EE for global epidemic preparedness.

Acknowledgements

We express our gratitude to Anh Mac, Syed Shahriar, Di Wu, Po-Nien Kung, Rohan Wadhawan, and Haw-Shiuan Chang for their valuable time, reviews of our work, and constructive feedback. We thank the anonymous reviewers and the area editors for their feedback. This work was supported by NSF 2200274, 2106859, 2312501, DARPA HR00112290103/HR0011260656, NIH U54HG012517, U24DK097771, as well as Optum Labs. We thank them for their support.

Limitations

We benchmark our framework on four languages, but its performance may be poor for many others. Owing to the lack of annotated data, it is difficult to conduct a holistic multilingual evaluation of our framework. Our experiments on global epidemic prediction and information aggregation are based on a single day of social media posts. Furthermore, owing to the expensive cost of procuring massive social media data, it is infeasible to run our framework across languages for a longer duration of time. Finally, our major experiments are based on four diseases. We would like to expand this further, but owing to budget constraints, we restrict ourselves to four diseases in this work.

Ethical Considerations

Our framework extracts signals from social media in a wide range of languages to provide epidemic information. However, internet access and social media usage are disparate across the globe, leading to biased representation. This aspect should be considered when utilizing our framework to draw inferences for low-resource languages or under-represented communities.

Since our work utilizes actual tweets, some private information may not have been completely anonymized in our pre-processing. These tweets may also carry stark emotional, racial, and political viewpoints and biases. Our work does not focus on bias mitigation, and our models may inherit such biases. Due consideration should be taken when using our data or models.

Finally, despite our best efforts, our framework is far from ideal for real-life deployment and can frequently output false positives. The goal of our work is to provide a strong prototype and encourage research in this direction. Usage of our models/framework for practical use cases should be appropriately considered.

Note: We utilized ChatGPT to improve the writing of the paper and correct grammatical mistakes.

References

Appendix A Ontology Creation: Role Definitions

We provide our complete event ontology, including argument definitions along with corresponding examples, in Tables 24-30. We underline the arguments corresponding to each role in the examples. We note that our ontology can be further utilized for other tasks as well, like relation extraction Hsu et al. (2021) and event linking Hsu et al. (2024).

Appendix B Dataset Filtering and Sampling

While some works focus on Event Extraction from multimodal tweets Bansal et al. (2024), we restrict our work to text-based tweets only. Inspired by SPEED Parekh et al. (2024), we associate each event with 5-10 seed tweets. Utilizing the embedding-space similarity between query tweets and our seed-tweet repository, we filter tweets related to epidemic events. For the non-English languages, we translate the English seed tweets into each target language and further correct the translations with the help of human experts. We provide some seed tweets per language per event in Table G. We utilize the sentence-transformer model Reimers and Gurevych (2019) for embedding the tweets.
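The seed-similarity filtering described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact code: the function name is ours, and the toy 2-D vectors stand in for actual sentence-transformer embeddings.

```python
import numpy as np

def filter_by_seed_similarity(tweet_embs, seed_embs, threshold=0.5):
    """Keep tweets whose maximum cosine similarity to any seed tweet
    meets the threshold. Each row is one embedding vector."""
    t = tweet_embs / np.linalg.norm(tweet_embs, axis=1, keepdims=True)
    s = seed_embs / np.linalg.norm(seed_embs, axis=1, keepdims=True)
    sims = t @ s.T  # (n_tweets, n_seeds) cosine similarity matrix
    return sims.max(axis=1) >= threshold

# Toy 2-D vectors standing in for sentence-transformer embeddings.
seeds = np.array([[1.0, 0.0], [0.0, 1.0]])
tweets = np.array([[0.9, 0.1], [-1.0, 0.0]])
mask = filter_by_seed_similarity(tweets, seeds)
```

The first toy tweet is near a seed direction and is kept; the second points away from both seeds and is filtered out.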

Furthermore, we utilize the event similarity to sample tweets uniformly across events. More specifically, we under-sample tweets from frequent events and over-sample tweets from infrequent ones. Such uniform sampling has proved instrumental for more robust model training, as noted in Parekh et al. (2023b).

Appendix C Annotation Guidelines and Details

We conduct two sets of annotations in our work and describe both in more detail here. First, we conduct ED annotations for multilingual data in Japanese, Hindi, and Spanish. We refer to ACE Doddington et al. (2004) and SPEED Parekh et al. (2024) to draft our guidelines. We provide instructions and examples to the annotators in English, while they are expected to annotate data in the respective target languages. We provide the exact annotation guidelines in Figure 10.

Next, we conduct EAE annotations for all the languages. Inspired by ACE Doddington et al. (2004) and GENEVA Parekh et al. (2023b), we design our guidelines with special instructions. We present the instructions in Figure 11 along with simple argument definitions in Figure 12.

Language Inconsistent rate Verification acceptance rate
Hindi 27.46% 31.48%
Japanese 17.52% 81.73%
Spanish 19.01% 66.66%
Table 8: Inconsistencies identified (as percentage) and verifications accepted (as percentage) for the multilingual verifications. The inconsistent rate is the percentage of annotations with which the bilingual speakers did not agree, while the verification acceptance rate is the percentage of suggestions from the bilingual speakers that were accepted by our multilingual annotators.

C.1 Multilingual Data Verification

Due to the scarcity of multilingual annotators, we adopted a verification process different from the English data verification. The entire verification procedure can be divided into four phases: qualification task, inter-annotator agreement (IAA) study, verification task, and correction task.

We choose bilingual speakers of English and Hindi/Japanese/Spanish as the verifiers. Note that we do not consider code-switching Garg et al. (2018a, b) in our work, but bilingualism helps ensure that the instructions are well understood by the verifiers. To ensure verification quality, each bilingual speaker must pass a qualification test before entering the verification process. They are provided with a guideline explaining their primary task and introducing the essence of ED/EAE annotation (Figure 13), argument definitions (Figure 12), along with two pairs of positive and negative examples (Figure 14) - all in English. Although not directly tasked with ED/EAE annotation, they must understand the standards of ED/EAE annotation to fairly judge the correctness of a given annotated example. After reading the instructions, the bilingual speaker must correctly answer at least 4 out of 5 test questions to pass the qualification test. Selected by our multilingual annotators, these test questions are in Japanese, Hindi, and Spanish, respectively, and are in the same format as the verification questions. Failing the qualification task indicates an insufficient understanding of the verification task, thereby disqualifying the bilingual speaker from proceeding further. We select one verifier for each language after this filtering round. Each verifier was paid $150 in total for 6 hours of service at a rate of $25/hr, in line with other works Parekh et al. (2020).

Next, we asked the three qualified bilingual speakers to verify 40 English examples as part of an inter-annotator agreement (IAA) study to ensure an adequate agreement rate among them. These IAA examples are in the same format as the actual verification questions. They reached a final IAA score of 0.6 on the 40 English samples.

Next, the qualified bilingual speakers participate in the final verification process. Along with the tweet text, they are shown all the events and arguments identified by our annotators. If they agree with the current annotation, no action is needed; otherwise, they should check the “incorrect” box and provide their reasons (as shown in Figure 14).

Following the verification process, our multilingual annotators addressed the correction task: reviewing the comments and deciding on the final annotation. We provide statistics about the total corrections suggested and accepted by the multilingual annotators for each language in Table 8.

Appendix D Benchmarking Model: Implementation Details

We use the EE benchmarking tool TextEE Huang et al. (2024) to conduct the benchmarking experiment of the models. We present details about each ED, EAE, and end-to-end model that we benchmark, along with the extensive set of hyperparameters and other implementation details.

D.1 TagPrime

TagPrime Hsu et al. (2023a) is a sequence tagging model with a word priming technique to convey more task-specific information. We run our experiments on the ED and EAE tasks of TagPrime on an NVIDIA RTX A6000 machine with support for 8 GPUs. The models are fine-tuned on XLM-RoBERTa-Large Conneau et al. (2020). We train this model separately for ED and EAE. The major hyperparameters are listed in Table 9 for the ED model and Table 10 for the EAE model.

Pre-trained LM XLM-RoBERTa-Large
Training Batch Size 16
Eval Batch Size 4
Learning Rate 0.001
Weight Decay 0.001
Gradient Clipping 5
Training Epochs 10
Warmup Epochs 5
Max Sequence Length 250
Linear Layer Dropout 0.2
Table 9: Hyperparameter details for TagPrime ED model.
Pre-trained LM XLM-RoBERTa-Large
Training Batch Size 6
Eval Batch Size 12
Learning Rate 0.001
Weight Decay 0.001
Gradient Clipping 5
Training Epochs 90
Warmup Epochs 5
Max Sequence Length 250
Linear Layer Dropout 0.2
Table 10: Hyperparameter details for TagPrime EAE model.

D.2 XGear

XGear Huang et al. (2022) is a language-agnostic model that models EAE as a generation task. This model is similar to other generative models like DEGREE Hsu et al. (2022) and AMPERE Hsu et al. (2023b) but focuses on zero-shot cross-lingual transfer. We run our experiments on the EAE tasks of XGear on an NVIDIA RTX A6000 machine with support for 8 GPUs. The model is fine-tuned on mT5-Large Xue et al. (2021). The major hyperparameters for this model are listed in Table 11. To evaluate its end-to-end performance, we complement it with the TagPrime ED model.

Pre-trained LM mT5-Large
Training Batch Size 6
Eval Batch Size 12
Learning Rate 0.00001
Weight Decay 0.00001
Gradient Clipping 5
Training Epochs 90
Warmup Epochs 5
Max Sequence Length 400
Table 11: Hyperparameter details for XGear EAE model.

D.3 BERT-QA

BERT-QA Du and Cardie (2020) is a classification model utilizing label semantics via transforming the EE task into a question-answer task. We run our experiments on the EAE tasks of BERT-QA with support for 8 GPUs. The model is fine-tuned on XLM-RoBERTa-Large Conneau et al. (2020). The major hyperparameters for this model are listed in Table 12 for the ED model and Table 13 for the EAE model.

Pre-trained LM XLM-RoBERTa-Large
Training Batch Size 6
Eval Batch Size 12
Learning Rate 0.001
Weight Decay 0.001
Gradient Clipping 5
Training Epochs 30
Warmup Epochs 5
Max Sequence Length 250
Linear Layer Dropout 0.2
Table 12: Hyperparameter details for BERT-QA ED model.
Pre-trained LM XLM-RoBERTa-Large
Training Batch Size 6
Eval Batch Size 12
Learning Rate 0.00001
Weight Decay 0.00001
Gradient Clipping 5
Training Epochs 90
Warmup Epochs 5
Max Sequence Length 400
Linear Layer Dropout 0.2
Table 13: Hyperparameter details for BERT-QA EAE model.

D.4 DyGIE++

DyGIE++ Wadden et al. (2019b) is an end-to-end model that simultaneously leverages span graph propagation for EE, entity recognition, and relation extraction tasks. We run our experiments on the end-to-end tasks of DyGIE++ on an NVIDIA RTX A6000 machine with support for 8 GPUs. The model is fine-tuned on XLM-RoBERTa-Large Conneau et al. (2020). The major hyperparameters are listed in Table 14.

Pre-trained LM XLM-RoBERTa-Large
Training Batch Size 6
Eval Batch Size 12
Learning Rate 0.001
Weight Decay 0.001
Gradient Clipping 5
Training Epochs 60
Warmup Epochs 5
Max Sequence Length 250
Linear Layer Dropout 0.4
Table 14: Hyperparameter details for DyGIE++ end-to-end model.

D.5 OneIE

OneIE Lin et al. (2020b) is an end-to-end model that extracts a globally optimal information network from input sentences to capture interactions among entities, relations, and events. We run our experiments on the end-to-end tasks of OneIE on an NVIDIA RTX A6000 machine with support for 8 GPUs. The models are fine-tuned on XLM-RoBERTa-Large Conneau et al. (2020). The major hyperparameters are listed in Table 15.

Pre-trained LM XLM-RoBERTa-Large
Training Batch Size 6
Eval Batch Size 10
Learning Rate 0.001
Weight Decay 0.001
Gradient Clipping 5
Training Epochs 60
Warmup Epochs 5
Max Sequence Length 250
Linear Layer Dropout 0.4
Table 15: Hyperparameter details for OneIE end-to-end model.

D.6 CLaP

CLaP Parekh et al. (2023a) is a multilingual data-augmentation technique for structured prediction tasks that utilizes constrained machine translation for label projection. Specifically, we translate the English portion of SPEED++ into other languages using CLaP. We utilize five in-context examples for each language - Hindi, Japanese, and Spanish - with the original CLaP prompt using the Llama2-13B Touvron et al. (2023) model. We apply the post-processing from SPEED++ to reduce the distribution difference between SPEED++ and the generated multilingual data. We train a separate model for each language, as it provided better results than joint training on data from all languages.

D.7 DivED

DivED Cai et al. (2024) trains LLaMA-2-7B Touvron et al. (2023) models on the DivED and GENEVA Parekh et al. (2023b) datasets for zero-shot event detection, utilizing 200 and 90 event types from the DivED and GENEVA datasets, respectively. Training is done using ten event definitions, ten samples, and ten negative samples per sample for each event type, while incorporating the ontology information and three hard-negative samples. We utilize their available trained model for our experiments.

D.8 COVIDKB

COVIDKB Zong et al. (2022) is a simple BERT-based classification model trained with a multi-label classification objective on the COVIDKB Twitter corpus. Since our ontology differs from theirs, we train it as a binary classification model. We run our experiments on the end-to-end tasks of this model on an NVIDIA RTX A6000 machine with 8 GPUs. The model is fine-tuned on multilingual BERT Devlin et al. (2019). The major hyperparameters are listed in Table 16.

Pre-trained LM mBERT
Training Batch Size 64
Learning Rate 0.00002
Training Epochs 4
Max Sequence Length 128
Number of Classes 2
Table 16: Hyperparameter details for COVIDKB binary classification model.
Figure 8: Illustration of the prompt used for GPT-3.5 model. It includes a task description, followed by ontology details of event types and their definitions. Next, we show some in-context examples for each event type and, finally, provide the test sentence.

D.9 Keyword

This model curates a list of keywords specific to each event and predicts a trigger for a particular event if a word matches one of the curated event keywords. We utilize the base set of English keywords from SPEED Parekh et al. (2024) and translate these event-specific keywords into the other languages. Recent works have developed advanced keyword extraction techniques Wu et al. (2024), but exploring them is beyond the scope of our work.

Model COVID MPox Zika + Dengue Avg
hi jp es en en hi jp es
Baseline Models
ACE - TagPrime 0 0 0 0 0 0 0 0 0
DivED* 0 0 16 25 32 1 0 7 10
Keyword* 8 11 10 14 12 12 20 8 12
GPT-3.5-turbo* 13 14 20 35 45 12 12 14 21
Trained on SPEED++ (Our Framework)
TagPrime 50 0 35 62 59 56 0 26 36
TagPrime + XGear 50 0 35 62 59 56 0 26 36
BERT-QA 46 0 31 60 57 43 0 22 32
DyGIE++ 51 0 36 62 58 54 0 21 35
OneIE 50 0 35 63 58 55 0 23 36
TagPrime + CLaP 45 27 36 62 59 56 27 28 42
Table 17: Benchmarking EE models trained on SPEED++ for extracting event information in the cross-lingual cross-disease setting. The evaluation used is Trigger Classification (TC). Here, hi = Hindi, jp = Japanese, es = Spanish, and en = English. *Numbers are higher compared to others as evaluation is done using string matching.
Model COVID MPox Zika + Dengue Avg
hi jp es en en hi jp es
TagPrime 55 21 51 58 67 61 25 50 49
XGear 29 17 49 54 63 44 19 47 40
BERT-QA 55 17 46 53 64 59 7 49 44
Table 18: Benchmarking EAE models trained on SPEED++ for extracting event information in the cross-lingual cross-disease setting. The evaluation used is Argument Classification (AC). Here, hi = Hindi, jp = Japanese, es = Spanish, and en = English.

D.10 GPT-3

We use the GPT-3.5-turbo model as the base GPT model. We illustrate our final prompt template in Figure 8. It comprises a task definition, ontology details, one example for each event type along with corresponding arguments, and the final test query. We conducted a looser evaluation for GPT, counting a match whenever the predicted trigger text matched the gold trigger text.
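The prompt assembly described in Figure 8 can be sketched roughly as below. The structure (task description, ontology, in-context examples, test query) follows the description above, but the exact wording, helper name, and data layout are illustrative, not the paper's actual template.

```python
def build_prompt(task_description, ontology, examples, test_sentence):
    """Assemble a prompt: task description, then the ontology
    (event type -> definition), then one in-context example per
    event type, and finally the test query."""
    parts = [task_description, "Event types:"]
    for event_type, definition in ontology.items():
        parts.append(f"- {event_type}: {definition}")
    parts.append("Examples:")
    for event_type, example in examples.items():
        parts.append(f"[{event_type}] {example}")
    parts.append(f"Sentence: {test_sentence}")
    return "\n".join(parts)

# Hypothetical ontology entry and example, for illustration only.
prompt = build_prompt(
    "Extract epidemic events from the tweet.",
    {"infect": "an individual gets infected by a disease"},
    {"infect": "300 people tested positive."},
    "The flu is spreading fast.")
```

The assembled string can then be sent as a single user message to the chat completion API.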

Figure 9: Worldwide geographical distribution of the number of reported COVID-19 cases as of May 28, 2020. The blue dots indicate the events extracted by our model, and the dot size indicates the number of epidemic events for the specific country (log scale).

Appendix E Additional Epidemic Event Extraction Experiments

In addition to evaluating event and argument classification on SPEED++, we benchmark event trigger classification (TC) performance using the same benchmarking settings. TC is a stricter evaluation metric that computes the F1 score over (trigger, event type) pairs. We present our results in Table 17. We continue to observe the strongest overall performance from the supervised baselines trained on our SPEED++ dataset with TagPrime. Models trained without CLaP perform poorly, with a zero F1 score for Japanese. This can be attributed to a tokenization difference, as Japanese is treated as a character-level language. Since the models are trained on the English component of SPEED++, they have a strong prior for trigger words being a single token; in Japanese, each character is treated as a token and triggers span multiple tokens. Due to this mismatch, the SPEED++-trained models yield zero scores. This improves when additional training is done using augmented data from CLaP.
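The TC metric above can be sketched as a micro F1 over (trigger span, event type) pairs. This is our illustrative reconstruction, not the exact TextEE scoring code.

```python
def tc_f1(gold, pred):
    """Micro F1 over (trigger span, event type) pairs: a prediction
    counts as correct only if both the span and the type match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: one of two predictions matches a gold pair exactly.
gold = [((0, 1), "infect"), ((3, 4), "spread")]
pred = [((0, 1), "infect"), ((5, 6), "control")]
score = tc_f1(gold, pred)
```

Here precision and recall are both 0.5, so the F1 score is 0.5; a span-only match with the wrong event type would contribute nothing.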

EAE Benchmarking

We also benchmark our pure EAE models trained on SPEED++ and present our results in Table 18. These models are provided with the gold event annotations and are required to predict all possible arguments corresponding to the gold events. As evident from Table 18, the TagPrime model performs the best across the different families of models, languages, and diseases.

Appendix F Global Epidemic Prediction: Additional Details

To validate the breadth and coverage of our multilingual framework, we utilize it to detect COVID-19 pandemic-related events from social media. Specifically, we focus on all tweets from a single day (chosen at random) - May 28, 2020. We utilize Twitter’s Language Identification to sort the tweets into different languages, resulting in a total of 65 languages. Next, we map each tweet to a specific location and country, as explained below.

Location mapping

We utilize the user’s location to map each tweet to a specific country. Tweets without a specified location are pooled into a set of unspecified tweets. We estimate the country distribution of tweets per language using the tweets with specified locations. Utilizing this distribution, we extrapolate locations to the unspecified tweets to approximate the actual location distribution of the tweets. Mapping tweets from the 65 languages results in a country distribution over 117 countries worldwide.
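The extrapolation step can be sketched as follows; function and variable names are ours, and the proportional-allocation scheme is our reading of the description above.

```python
from collections import Counter

def extrapolate_country_counts(located, unlocated):
    """located: {language: Counter(country -> tweet count)} built from
    tweets with a specified location; unlocated: {language: count of
    tweets without a location}. Unlocated tweets of each language are
    distributed proportionally to that language's observed country
    distribution, and per-country totals are summed across languages."""
    totals = Counter()
    for lang, by_country in located.items():
        n_located = sum(by_country.values())
        n_unloc = unlocated.get(lang, 0)
        for country, n in by_country.items():
            totals[country] += n + n_unloc * n / n_located
    return totals

# Toy numbers: 4 located Spanish tweets (75% Spain), 4 unlocated ones.
located = {"es": Counter({"Spain": 3, "Mexico": 1})}
unlocated = {"es": 4}
totals = extrapolate_country_counts(located, unlocated)
```

With these toy numbers the 4 unlocated Spanish tweets are split 3:1, giving estimated totals of 6 for Spain and 2 for Mexico.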

Geographical plotting

We consider the 117 mapped countries for these plots. We lay these countries out on a map and color them based on the number of COVID-19 cases reported (https://www.worldometers.info/coronavirus) until May 28, 2020. Lighter shades indicate fewer cases, while darker shades indicate massive spread in those countries. We utilize our framework to extract the number of events from the country-mapped tweets and plot them as translucent circles; bigger dots indicate more events extracted for the given country. We show this geographically for the whole world in Figure 9 and just for Europe in Figure 6.

From the world map, we note that many of the red countries (United States of America, India, Brazil) have large dots associated with them - indicating more events found for countries where the spread of the disease was high. Similarly, countries where the spread was lower correspondingly have smaller dots - indicating fewer epidemic events found for these countries. We observe a large cluster for Europe and thus plot it separately, as shown in Figure 6 and discussed in § 5.1.

Rank Clustered Argument Count
English  -  Monkeypox  -  Symptoms
1 rash 818
2 sick 746
3 lesions 637
4 fever 548
5 side effect 484
6 itching 441
7 rashes 412
8 cough 175
English  -  Zika  -  Symptoms
1 birth defects 2.2K
2 brain damage 1.1K
3 microcephaly 990
4 health problems 723
5 nerve disorder 705
6 congenital syndrome 391
7 nerve damage 382
8 damages placenta 196
English  -  Dengue  -  Symptoms
1 fever 4.8K
2 multiple organ failure 1.1K
3 shock syndrome 522
4 symptoms 326
5 high fever 292
6 severe disease 256
7 disease 250
8 rashes 200
Table 19: Aggregated information about symptoms for Monkeypox, Zika, and Dengue from English tweets using our SPEED++ framework.
Rank Clustered Argument Count
English  -  COVID-19  -  Symptoms
1 can’t breathe 8.8K
2 pneumonia 6.7K
3 sick 4.2K
4 hemorrhaging 2.9K
5 prevents me from staying home 2.7K
6 cough 2.1K
7 symptoms 1.9K
8 critically ill 1.2K
English  -  COVID-19  -  Cure Measures
1 hydroxychloroquine 3.7K
2 remdesivir 2.5K
3 drug 2.1K
4 treatment 1.7K
5 hcq 1.3K
6 vaccine 485
7 zinc 448
8 lockdown 425
English  -  COVID-19  -  Control Measures
1 lockdown 187K
2 quarantine 56K
3 social distancing 38K
4 deny entry 28K
5 response 21K
6 title 32 orders 15K
7 masks 15K
8 executive order 10K
Table 20: Aggregated information about various arguments for COVID-19 from English tweets using our SPEED++ framework.

Appendix G Epidemic Information Aggregation: Additional Details

In § 5.2, we discussed how we utilize the EAE capability of our trained TagPrime model for creating an information aggregation bulletin. Here we specify more details about this process. First, we utilize our EE framework to extract all possible arguments for the event-specific roles. Since many similar arguments can be extracted, we merge them together by clustering. To this end, we project the arguments into an embedding space using a Sentence Transformer (https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2) Reimers and Gurevych (2019) encoding model. Next, we utilize a hierarchical agglomerative clustering (HAC) algorithm to merge similar arguments. We implement the clustering using sklearn (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html), utilizing euclidean distance as the distance metric and a threshold of 1 as the stopping criterion. After generating the clusters, we rank the clusters by the occurrence count of all arguments in the cluster and label them based on the most frequent argument.
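A minimal sketch of this aggregation step is shown below; toy 2-D vectors stand in for the multilingual sentence embeddings, and the average-linkage choice is our assumption (the paper specifies only euclidean distance and a threshold of 1).

```python
from collections import Counter
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def aggregate_arguments(arguments, embeddings, threshold=1.0):
    """Merge similar argument strings via hierarchical agglomerative
    clustering with a distance threshold as the stopping criterion,
    then rank clusters by size and label each cluster by its most
    frequent member string."""
    hac = AgglomerativeClustering(
        n_clusters=None, distance_threshold=threshold, linkage="average")
    labels = hac.fit_predict(np.asarray(embeddings))
    clusters = {}
    for arg, lab in zip(arguments, labels):
        clusters.setdefault(lab, []).append(arg)
    ranked = [(Counter(members).most_common(1)[0][0], len(members))
              for members in clusters.values()]
    return sorted(ranked, key=lambda x: -x[1])

# Toy embeddings: the two "lockdown" points are close and merge; the
# other arguments are far apart and remain singleton clusters.
args = ["lockdown", "lockdown", "quarantine", "masks"]
embs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 0.0]]
ranked = aggregate_arguments(args, embs, threshold=1.0)
```

Setting `n_clusters=None` with a `distance_threshold` lets the cluster count emerge from the data, which is what produces the bulletin's variable-length argument lists.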

We report the top-ranked clustered arguments for several event roles for COVID-19 from English tweets in Table 20. We report similar tables for different diseases from English tweets in Table 19 and COVID-19 from multilingual tweets in Table 21. Despite some irrelevant extractions owing to model inaccuracies, most of these top clustered arguments are relevant and reflect the language and disease-specific properties quite accurately.

Rank Argument Translation Count
Hindi  -  COVID-19  -  Cure Measures
1 [Hindi] treatment 1.1K
2 [Hindi] home isolation 1K
3 [Hindi] yoga 636
4 [Hindi] recover 500
5 [Hindi] cow urine rinse 448
6 [Hindi] your blessings 240
7 [Hindi] discharge 126
8 [Hindi] medicines 120
Spanish  -  COVID-19  -  Cure Measures
1 hidroxicloroquina hydroxychloroquine 583
2 leche materna breastmilk 427
3 medicamentos medicines 252
4 tratamientos treatments 226
5 red integrada covid covid integrated network 214
6 ivermectina ivermectin 157
7 remdesivir remdesivir 152
8 transplante transplant 132
Table 21: Aggregated information about various arguments for COVID-19 from Hindi and Spanish tweets using our SPEED++ framework.

Example tweets

We also provide qualitative example tweets mentioning some of these arguments to demonstrate the efficacy of our EAE framework. Table 22 presents various English tweets with COVID-19-related mentions. Table 23 presents various English tweets for the other diseases of Monkeypox, Zika, and Dengue with their mentions. Table 7 presents various Hindi and Spanish tweets with COVID-19-related mentions. Through these tables, we see the diversity of tweets and how our framework can extract these arguments across them.

Tweet
English  -  COVID-19  -  Symptoms
My mum has pneumonia and it might be because of corona, praying for her man
Autopsies of African Americans who died of #(COVID) in New Orleans reveal hemorrhaging
Apply for a test if you have symptoms of #(coronavirus): a high temperature, a new continuous cough, loss or change to your sense of smell or taste
English  -  COVID-19  -  Cure Measures
Trump reveals he’s taking hydroxychloroquine in effort to prevent and cure coronavirus symptoms
In this new Covid-19 audio interview, editors discuss newly published studies of remdesivir that highlight its potential and its problems
Hydroxychloroquine combined with zinc has shown effective in treating covid-19
English  -  COVID-19  -  Control Measures
We should not return to school, we should not undo any other aspect of the lockdown until the test, trace and isolation policy is fully in place
Boris Johnson says from Monday, up to six people will be allowed to meet outside subject to social distancing rules in England
Masks work. Everyone has to wear a mask when in any business …
Table 22: Illustration of actual tweets in English mentioning various symptoms, cure measures, and control measures related to COVID-19. The terms extracted by our system are highlighted in red.
Tweet
English  -  Monkeypox  -  Symptoms
I have a pretty mild rash on my stomach. A little bit of itchy. The extremely optimistic part of my brain is like "What if it’s monkey pox?"
Anyone can get #(monkey pox) through close skin-to-skin contact … Healthcare providers must be vigilant and test any patient with a suspicious lesion or sore.
Its inappropriate to say but the amount of itching Ive done from the bites makes me nervous people are gonna think Ive got monkey pox or some shit
English  -  Zika  -  Symptoms
More birth defects seen in (url) areas where Zika was present (url)
Zika brain damage may go undetected in pregnancy
study sheds light on how Zika causes nerve disorder (url)
English  -  Dengue  -  Symptoms
In the evening the fever is skyrocketing & the joint pain is born-breaking & nauseating, vomiting is constant
Dengvaxia = yellow fever vaccine + live attenuated dengue virus. Multiple organ failure was already established as its key symptom
TMI but I’ve had rashes on my arms and legs for a couple of days now. Tried to tell Scott I have dengue fever but …
Table 23: Illustration of actual tweets in English mentioning various symptoms related to Monkeypox, Zika, and Dengue. The terms extracted by our system are highlighted in red.
Arguments of INFECT event
Argument Name Argument Definition Example
Infected The individual(s) being infected 300 people tested positive.
Disease The disease or virus that invaded the host I tested positive for COVID.
Place The place where the individual(s) are infected 5 students at school are infected.
Time The time when the individual(s) are infected 300 people tested positive on May 15.
Value The number of people being infected Some people are infected.
Information-source The source providing this information regarding the infection According to CDC, if you have COVID…
Table 24: Complete definition and examples of arguments of INFECT event
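The role inventory in Table 24 defines the structure of an extracted INFECT event. A minimal sketch of how such an extraction could be represented and checked against the schema (the function and variable names are ours; the argument spans follow the Table 24 example "300 people tested positive on May 15."):

```python
# Argument roles of the INFECT event, as listed in Table 24.
INFECT_ROLES = {"Infected", "Disease", "Place", "Time", "Value", "Information-source"}

def validate_infect_event(arguments: dict) -> bool:
    """Return True iff every filled argument role is defined for INFECT."""
    return set(arguments) <= INFECT_ROLES

# Illustrative extraction for "300 people tested positive on May 15."
event = {"Infected": "300 people", "Value": "300", "Time": "May 15"}
print(validate_infect_event(event))  # → True
```

Roles not filled by a given tweet (here Disease, Place, Information-source) are simply absent from the dictionary.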
Arguments of SPREAD event
Argument Name Argument Definition Example
Population The population among which the disease spreads 16000 Americans are infected.
Disease The disease/virus/pandemic that is prevailing Monkeypox is spreading …
Place The place at which the disease is spreading the flu prevails in the U.S.
Time The time during which the disease is spreading the flu prevails in the U.S. in winter.
Value The number of people being infected 16000 Americans are infected.
Information-source The source providing this information regarding the transmission of the disease My mom says COVID is spreading again.
Trend The possible change in the transmission of a disease relative to its past status COVID is spreading faster than we expected.
Table 25: Complete definition and examples of arguments of SPREAD event
Arguments of SYMPTOM event
Argument Name Argument Definition Example
Person The individual(s) displaying symptoms I’m coughing now.
Symptom The concrete symptom(s) that are displayed You may have severe fever and stomach-ache.
Disease The disease(s)/virus that are potentially causing the symptoms If you cough, that’s probably COVID.
Place The place at which the symptom(s) are displayed Students are showing illness at school.
Time The time during which the symptom(s) are displayed I felt sick yesterday.
Duration The time interval that the symptom(s) last My fever lasted three days.
Information-source The source providing this information regarding the symptoms of the disease He said half of his class were ill.
Table 26: Complete definition and examples of arguments of SYMPTOM event
Arguments of PREVENT event
Argument Name Argument Definition Example
Agent The individual(s) attempting to avoid infections You should wear a mask to protect yourself and others.
Disease The disease/virus/illness being defended against Prevent COVID infection.
Means Actions/means that may prevent infection You should wear a mask to protect yourself and others.
Information-source The source providing this information regarding the prevention of this disease CDC proves masks can efficiently block the virus.
Target The individual(s)/population to which the agent attempts to prevent the disease transmission You should wear a mask to protect yourself and others.
Effectiveness How effective the means is against the disease CDC proves masks can efficiently block the virus.
Table 27: Complete definition and examples of arguments of PREVENT event
Arguments of CONTROL event
Argument Name Argument Definition Example
Authority The authority implementing/advocating the control of a pandemic To impede COVID transmission, Chinese government required quarantine upon arrival.
Disease The intruding disease/virus/pandemic being defended against To impede COVID transmission…
Means The enacted/advocated policies/actions that may control the pandemic To impede COVID transmission, Chinese government required quarantine upon arrival.
Information-source The source providing this information regarding the control of this disease CNN reports massive pandemic lockdowns in China.
Place The place at which the control measures are implemented NY will enforce a mask policy from June.
Time The time at which the control measures are implemented NY will enforce a mask policy from June.
Effectiveness How effective the means is against the disease The infection rate has not decreased since the enforcement of the mask policy.
Subject The individual(s)/population encouraged/ordered to implement the control measures Due to the pandemic, students are required to wear masks in class.
Table 28: Complete definition and examples of arguments of CONTROL event
Arguments of CURE event
Argument Name Argument Definition Example
Cured The individual(s) recovered/receiving the treatments My grandma recovered from COVID yesterday.
Disease The disease/illness that the patients get rid of My grandma recovered from COVID yesterday.
Means The therapy that (potentially) treats the disease Just get rest and your fever will go away.
Information-source The source providing this information regarding the cure/recovery of this disease CNN reports that XX company claimed to have developed a COVID treatment.
Place The place at which the recovery takes place In the U.S., 15670 people recovered and 16000 died of COVID.
Time The time at which the recovery takes place By May 15, 15670 Americans recovered and 16000 died of COVID.
Effectiveness How effective the means is against the disease The new COVID treatment is not fully effective.
Value The number of people being cured By May 15, 15670 Americans recovered and 16000 died of COVID.
Facility The individual(s)/organization(s) utilizing/inventing certain means to facilitate recoveries CNN reports that XX company claimed to have developed a COVID treatment.
Duration The time interval that the treatment takes I received the treatment for two months before full recovery.
Table 29: Complete definition and examples of arguments of CURE event
Arguments of DEATH event
Argument Name Argument Definition Example
Dead The individual(s) who die of the infectious disease By March, 500 people died of COVID in CA.
Disease The disease/virus/pandemic that (potentially) causes the death By March, 500 people died of the virus in CA.
Information-source The source providing this information regarding the fatality of this disease Daily news: new deaths from COVID …
Place The place at which the death takes place By March, 500 people died of COVID in CA.
Time The time at which the death takes place By March, 500 people died of COVID in CA.
Value The number of death due to infectious disease By March, 500 people died of COVID in CA.
Trend The possible change in death counts caused by the disease compared to past statistics The COVID death toll is still increasing…
Table 30: Complete definition and examples of arguments of DEATH event
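Across Tables 24–30, the seven event types draw on a shared pool of argument roles. A minimal sketch collecting the role inventory into one mapping (the `EPIDEMIC_ONTOLOGY` name is ours) and confirming that it covers the 20 distinct argument roles mentioned in the abstract:

```python
# Event ontology assembled from Tables 24-30: event type -> argument roles.
EPIDEMIC_ONTOLOGY = {
    "INFECT":  ["Infected", "Disease", "Place", "Time", "Value", "Information-source"],
    "SPREAD":  ["Population", "Disease", "Place", "Time", "Value", "Information-source", "Trend"],
    "SYMPTOM": ["Person", "Symptom", "Disease", "Place", "Time", "Duration", "Information-source"],
    "PREVENT": ["Agent", "Disease", "Means", "Information-source", "Target", "Effectiveness"],
    "CONTROL": ["Authority", "Disease", "Means", "Information-source", "Place", "Time", "Effectiveness", "Subject"],
    "CURE":    ["Cured", "Disease", "Means", "Information-source", "Place", "Time", "Effectiveness", "Value", "Facility", "Duration"],
    "DEATH":   ["Dead", "Disease", "Information-source", "Place", "Time", "Value", "Trend"],
}

# The union of all role names across the seven event types.
distinct_roles = {role for roles in EPIDEMIC_ONTOLOGY.values() for role in roles}
print(len(distinct_roles))  # → 20
```

This representation makes it easy to look up the legal roles for any predicted event type during argument extraction.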
Event English Hindi Spanish Japanese
Infect I caught the virus earlier today मैं आज सुबह वायरस से बीमार हो गया हूँ Contraje el virus temprano hoy 今日ウイルスに感染した
My brother tested positive for COVID-19 मेरे भाई का COVID-19 टेस्ट पॉजिटिव आया Mi hermano dio positivo por COVID-19. 私の兄はCOVID-19陽性でした
Spread The COVID-19 outbreak put WHO in alert that the pandemic may develop into global scale COVID-19 के प्रकोप ने WHO को सतर्क कर दिया है कि यह महामारी वैश्विक स्तर पर विकसित हो सकती है El brote Covid-19 alertó al OMS que la pandemia puede alcanzar en escala global COVID-19の発生により、WHOはパンデミックが世界的な規模に発展する可能性に警戒を強めている
A new flu is sweeping across Los Angeles लॉस एंजिल्स में एक नया फ्लू फैल रहा है Una nueva gripe está propagando a traves de Los Ángeles ロサンゼルスで新型インフルエンザが流行
Symptom Many of my friends have a cold मेरे कई दोस्तों को सर्दी है Muchos de mis amigos tienen un resfriado 私の多くの友人が風邪をひいた
I became incredibly ill after catching the virus वायरस की चपेट में आने के बाद मैं अविश्वसनीय रूप से बीमार हो गया Me enfermé increíblemente después de contagiarme de el virus ウイルスに感染した後、私はすごく体調を崩した
Prevent Medical experts encourage young kids to wash their hands चिकित्सा विशेषज्ञ छोटे बच्चों को हाथ धोने के लिए प्रोत्साहित करते हैं Los expertos médicos alientan a los niños a lavarse las manos 医療専門家が幼児に手洗いを奨励
Wear a mask to protect your family from the disease अपने परिवार को बीमारी से बचाने के लिए मास्क पहनें Use una máscara para proteger a su familia de la enfermedad 家族を疾病から守るためにマスクを着用すること
Control The WHO has published new guidelines in response to the rising cases of COVID-19 WHO ने COVID-19 के बढ़ते मामलों के जवाब में नए दिशानिर्देश प्रकाशित किए हैं La OMS ha publicado nuevas pautas en respuesta a los casos crecientes de Covid-19 WHOはCOVID-19の感染者増加を受けて新しいガイドラインを発表した。
Government officials have imposed a lockdown on certain districts सरकारी अधिकारियों ने कुछ जिलों में लॉकडाउन लगा दिया है Los funcionarios gubernamentales han impuesto un aislamiento a ciertos distritos 政府当局が特定の地区にロックダウンを課した
Cure There is no magic cure for the pandemic अभी तक कोविड का कोई प्रभावी इलाज नहीं No existe una cura mágica para la pandemia パンデミックに特効薬はない
Unfortunately doctors were unable to save him from the pandemic दुर्भाग्य से डॉक्टर उसे महामारी से बचाने में असमर्थ थे Desfortunadamente, los médicos no pudieron salvarlo de la pandemia 残念ながら、医師たちは彼をパンデミックから救うことはできなかった。
Death 700 people killed by COVID कोविड से 700 लोगों की मौत 700 personas matadas por Covid COVIDによる死亡率:700人
The mortality rate of the pandemic has decreased as experts figure out how to treat it महामारी की मृत्यु दर में कमी आई है क्योंकि विशेषज्ञ यह पता लगा रहे हैं कि इसका इलाज कैसे किया जाए La tasa de mortalidad de la pandemia ha disminuido a medida porque los expertos están descubriendo cómo tratarla パンデミックの死亡率は、専門家による治療法の解明のために減少している。
Table 31: Sample translated seed tweets for the different event types in our ontology for the different languages. Triggers are highlighted in red.
Refer to caption
Figure 10: Guidelines for ED annotations for the SPEED++ dataset.
Refer to caption
Figure 11: Guidelines for EAE annotation for the SPEED++ dataset.
Refer to caption
Figure 12: Argument definitions provided as part of the EAE annotation process.
Refer to caption
Figure 13: Instructions provided for the multilingual verification task.
Refer to caption
Figure 14: Illustrations provided for the multilingual verification task.