BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification

Ishani Mondal
Microsoft Research India
[email protected]
Abstract

Healthcare predictive analytics aids medical decision-making, diagnosis prediction and drug review analysis. Prediction accuracy is therefore an important criterion, which in turn calls for robust predictive language models. However, deep learning models have been shown to be vulnerable to insignificantly perturbed input instances that humans are unlikely to misclassify. Recent work has generated adversaries using rule-based synonyms and BERT-MLMs in the general domain, but the ever-growing biomedical literature poses unique challenges. We propose BBAEG (Biomedical BERT-based Adversarial Example Generation), a black-box attack algorithm for biomedical text classification that combines domain-specific synonym replacement for biomedical named entities with BERT-MLM predictions, spelling variation and number replacement. Through automatic and human evaluation on two datasets, we demonstrate that BBAEG mounts a stronger attack with better language fluency and semantic coherence than prior work.

1 Introduction

Recent studies have highlighted the importance of biomedical NLP for human well-being, in particular for analyzing the critical process of medical decision-making. However, dialogue management tools for medical conversations between patients and healthcare providers Zhang et al. (2020), Campillos Llanos et al. (2017), Kazi and Kahanda (2019), which assist diagnosis, may introduce insignificant perturbations (spelling errors, paraphrasing). When such text is fed to a classifier that determines the type of diagnosis required, detects adverse drug effects or recommends drugs, the classifier may perform unreasonably poorly. Insignificant perturbations may also creep in from the casual language used in tweets Zilio et al. (2020). The classifier therefore needs to be robust to these perturbations.

Generating adversarial examples for text is more challenging than for computer vision tasks because of (i) the discrete nature of the input space and (ii) the need to preserve semantic coherence with the original text. Initial work on attacking text models relied on introducing character-level errors or manipulating words Feng et al. (2018) to generate adversarial examples, but the resulting examples are grammatically disfluent and look very unnatural. Rule-based synonym replacement strategies Alzantot et al. (2018), Ren et al. (2019) have led to more natural-looking examples. Jin et al. (2019) proposed TextFooler as a baseline for generating adversaries for text classification models. However, the adversarial examples created by TextFooler rely heavily on replacement by word-embedding similarity rather than on overall sentence semantics. Recently, Garg and Ramakrishnan (2020) proposed word replacements based on BERT-MLM Devlin et al. (2019) to create adversaries that better fit the overall context.

Despite these advances, far less attention has been paid to making predictions robust in critical domains such as biomedicine, which come with unique challenges. Araujo et al. (2020) proposed two types of rule-based adversarial attacks in the biomedical domain, one inspired by natural spelling errors and typos made by humans and one based on synonym replacement. The challenges include: 1) Biomedical named entities are usually multi-word phrases such as colorectal adenoma. During token replacement the entire entity needs to be replaced, but the MLM model (token-level replacement) fails to generate a correct synonym of the entity that fits the context. We therefore need a BioNER+Entity Linker Martins et al. (2019), Mondal et al. (2019) to link each entity to an ontology and generate correct synonyms. 2) Medical entities have several surface variations, for example Type I Diabetes can be written as 'Type One Diabetes', so we explore numeric entity expansion strategies for generating adversaries. 3) Spelling variations (keyboard swap, modification). Although we evaluate on two benchmark datasets, our method is general and applicable to any biomedical classification dataset.

In this paper, we present BBAEG (Biomedical BERT-based Adversarial Example Generation; code at https://github.com/Ishani-Mondal/BBAEG.git), a novel black-box attack algorithm for biomedical text classification that combines BERT-MLM replacements for non-named entities with NER-linked synonyms for named entities to better fit the overall context. In addition to replacing words with synonyms, we explore generating adversarial examples using typographical variations and numeric entity modification. Our BBAEG attack beats the existing baselines by a wide margin on both automatic and human evaluation across datasets and models. To the best of our knowledge, we are the first to introduce an algorithm for generating adversarial examples for biomedical text whose attack success is higher than that of existing baselines such as TextFooler and BAE Garg and Ramakrishnan (2020). The overall contributions of the paper are: 1) We explore several challenges of biomedical adversarial example generation. 2) We propose BBAEG, a biomedical adversarial example generation technique for text classification that combines several perturbation techniques. 3) We introduce three types of attacks for this purpose on two biomedical text classification datasets. 4) Through human evaluation, we show that BBAEG yields adversarial examples with improved naturalness.

Input: $D = [w_1, \dots, w_l]$, label $y$, target classification model $M$
Output: Adversarial example of $D$: $D_{adv}$
1  Initialization: $D_{adv} \leftarrow D$; tag the entities in $D$; named entities are in $S_{NE}$ and the rest in $S_{NNE}$;
2  Compute token importance $I_i$ $\forall w_i \in D$;
3  for $i$ in descending order of $I_i$ do
4      $L = \{\}$;
5      if ($w_i$ in $S_{NE}$ and $(w_{i-t} .. w_{i+t})$ is a NE) then
6          Syns = synonyms of NE;
7          for $s \in$ Syns do
8              $L[s] = D_{adv[1:i-t-1]}\,[s]\,D_{adv[i+t+1:l]}$
9          end for;
10     else if ($w_i$ in $S_{NNE}$) then
11         $D_{adv} = D_{adv[1:i-1]}\,[M]\,D_{adv[i+1:l]}$;
12         $T$ = top-$K$ filtered and semantically similar tokens for $M \in D_M$;
13         for $t \in T$ do
14             $L[t] = D_{adv[1:i-1]}\,[t]\,D_{adv[i+1:l]}$
15         end for;
16     end if;
17     if $\exists\, t \in T$ such that $M(L[t]) \neq y$ then
18         Return: $D_{adv} \leftarrow L[t']$ where $M(L[t]) \neq y$ and $L[t']$ has maximum similarity with $D$
19     else
20         $N_1$ = Rotate $p$ characters in $w_i$ ($p \leq l$);
21         $N_2$ = Random insertion of symbols before/end in $w_i$;
22         Noise = $N_1 + N_2$;
23         for $t \in$ Noise do
24             $L[t] = D_{adv[1:i-1]}\,[t]\,D_{adv[i+1:l]}$
25         end for;
26         if $\exists\, t \in T$ such that $M(L[t]) \neq y$ then
27             Return: $D_{adv} \leftarrow L[t']$ where $M(L[t]) \neq y$ and $L[t']$ has maximum similarity with $D$
28         else if $w_i$ contains numeric entity then
29             $t$ = Replace $w_i$ by num2words;
30             $L[t] = D_{adv[1:i-1]}\,[t]\,D_{adv[i+1:l]}$;
31             Return: $D_{adv} \leftarrow L[t]$ if $M(L[t]) \neq y$
32         else
33             Return: $D_{adv} \leftarrow L[t']$ where $L[t']$ causes max reduction in $y$ probability
34         end if;
35     end if;
36 end for;
Return $D_{adv} \leftarrow$ None
Algorithm 1 BBAEG Algorithm

2 Methodology

Problem Definition: Given a set of $n$ inputs $(D, Y) = [(D_1, y_1), \dots, (D_n, y_n)]$ and a trained classifier $M : D \rightarrow Y$, we assume the soft-label black-box setting, in which the attacker can only query the classifier for output probabilities on a given input and has no access to the model parameters, gradients or training data. For an input of length $l$ consisting of words $w_i$, where $1 \leq i \leq l$, $(D_i = [w_1, \dots, w_l], y)$, we want to generate an adversarial example $D_{adv}$ such that $M(D_{adv}) \neq y$. We would like $D_{adv}$ to be grammatically correct and semantically similar to $D$, i.e. $Sim(D, D_{adv}) \geq \alpha$, where $\alpha$ denotes the similarity threshold.
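The two conditions in the problem definition (label flip and similarity above the threshold) can be written as a single check. The sketch below is illustrative only: `toy_model` and `jaccard` are hypothetical stand-ins for the target classifier and for the sentence similarity scorer, and none of these names come from the BBAEG code base.

```python
from typing import Callable, Dict, List

Tokens = List[str]
Probs = Dict[str, float]


def predicted_label(probs: Probs) -> str:
    """Return the label with the highest output probability."""
    return max(probs, key=probs.get)


def is_successful_adversary(
    query_model: Callable[[Tokens], Probs],
    similarity: Callable[[Tokens, Tokens], float],
    original: Tokens,
    candidate: Tokens,
    true_label: str,
    alpha: float,
) -> bool:
    """True if the candidate flips the label and stays similar enough.

    Only output probabilities are queried (soft-label black-box setting).
    """
    flips = predicted_label(query_model(candidate)) != true_label
    close = similarity(original, candidate) >= alpha
    return flips and close


def toy_model(tokens: Tokens) -> Probs:
    """Hypothetical classifier: predicts ADE only if 'rash' is present."""
    p = 0.9 if "rash" in tokens else 0.2
    return {"ADE": p, "no-ADE": 1.0 - p}


def jaccard(a: Tokens, b: Tokens) -> float:
    """Token-overlap similarity, a crude stand-in for a sentence encoder."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```

With a real classifier, `query_model` would wrap the fine-tuned model's probability output and `similarity` would wrap the embedding-based scorer.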

BBAEG Algorithm:
Our proposed BBAEG algorithm consists of four steps: 1) tagging the biomedical entities in $D$ and preparing two classes, NE (named entities) and Non-NE (non-named entities); 2) ranking the important words for perturbation; 3) choosing perturbation schemes; 4) generating the final adversaries.

1) Named Entity Tagging: For each input instance $D_i$ (Line 1 in Algorithm 1), we apply sciSpacy (https://allenai.github.io/scispacy/) with en-ner-bc5cdr-md to extract biomedical named entities (drugs and diseases), followed by its Entity Linker (drugs to DrugBank Wishart et al. (2017), diseases to MESH (https://meshb.nlm.nih.gov/)). After linking the NEs to their respective ontologies, we use pyMeshSim (https://github.com/luozhhub/pyMeSHSim) for diseases and DrugBank for drugs to obtain synonyms. In each $D_i$ of size $l$ $(w_1, w_2, \dots, [w_i \dots w_{i+2}], \dots, w_l)$, multi-word expressions $(w_i \dots w_{i+2})$ are named entities. We put them in the Named Entity set $(S_{NE})$ and the other words in the non-Named Entity set $(S_{NNE})$.
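Once entity spans are available, splitting the tokens into $S_{NE}$ and $S_{NNE}$ is a simple partition. The following sketch assumes the spans are already given as half-open token index ranges (in practice they would come from the NER tagger); the function name and the span format are illustrative choices, not part of the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def split_entities(
    tokens: List[str], entity_spans: List[Span]
) -> Tuple[List[Tuple[int, int, str]], List[Tuple[int, str]]]:
    """Partition tokens into named entities (S_NE) and the rest (S_NNE).

    S_NE keeps each multi-word entity as one unit together with its span,
    so the whole phrase can later be replaced by a synonym.
    """
    covered = set()
    s_ne = []
    for start, end in entity_spans:
        s_ne.append((start, end, " ".join(tokens[start:end])))
        covered.update(range(start, end))
    s_nne = [(i, tok) for i, tok in enumerate(tokens) if i not in covered]
    return s_ne, s_nne


if __name__ == "__main__":
    words = "history of pulmonary eosinophilia after clozapine".split()
    # Spans are assumed to be produced by an upstream NER step.
    ne, nne = split_entities(words, [(2, 4), (5, 6)])
    print(ne)
    print(nne)
```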

2) Ranking of important words: We estimate the token importance $I_i$ of each $w_i \in D$ by deleting $w_i$ from $D$ and computing the decrease in the probability of predicting the correct label $y$ (Line 2), similar to Jin et al. (2019). This yields a list of the tokens in decreasing order of importance.
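The deletion-based importance score described above can be sketched as follows. The classifier is passed in as a callable that returns label probabilities; `toy_model` and its weights are hypothetical and serve only to make the example runnable.

```python
from typing import Callable, Dict, List

Tokens = List[str]
Probs = Dict[str, float]


def token_importance(
    tokens: Tokens, true_label: str, query_model: Callable[[Tokens], Probs]
) -> List[float]:
    """I_i = P(y | D) - P(y | D without w_i), one model query per token."""
    base = query_model(tokens)[true_label]
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append(base - query_model(reduced)[true_label])
    return scores


def rank_tokens(scores: List[float]) -> List[int]:
    """Token indices sorted by decreasing importance (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def toy_model(tokens: Tokens) -> Probs:
    """Hypothetical classifier whose ADE score depends on two cue words."""
    p = 0.1 + 0.6 * ("rash" in tokens) + 0.2 * ("severe" in tokens)
    return {"ADE": p, "no-ADE": 1.0 - p}


if __name__ == "__main__":
    sentence = ["severe", "rash", "after", "drug"]
    importance = token_importance(sentence, "ADE", toy_model)
    print([sentence[i] for i in rank_tokens(importance)])
```

Each score costs one extra model query, so ranking a document of length $l$ takes $l + 1$ queries.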

3) Choosing perturbation schemes: Given the input $D_i$, we describe a sieve-based approach to perturbing $D_i$. Sieves are ordered by precision, with the most precise sieve appearing first.

Sieve 1: In the first sieve, we replace the tokens in $S_{NE}$ with synonyms obtained through ontology linking (Line 5-9) and the words in $S_{NNE}$ with tokens predicted by BERT-MLM (Line 10-15). The motivation is that BERT-MLM generates reasonable replacements for non-named entities because it takes the surrounding context into account Garg and Ramakrishnan (2020). If the token is part of $S_{NE}$, we replace the entity with its domain-specific synonyms one by one; if the token is part of $S_{NNE}$, we replace the word with the top-$K$ BERT-MLM predictions. To achieve high semantic similarity with the original text, we filter the set of top $K$ tokens ($K$ is a pre-defined constant) predicted by BERT-MLM for the masked token (Line 12) using a sentence similarity scorer based on Sentence-Transformer Reimers and Gurevych (2019). Additionally, we filter out predicted tokens that do not belong to the same part of speech as the original token. If this sieve generates adversaries for $D_i$, then $D_{adv}$ is returned.
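A minimal sketch of how Sieve 1 candidates could be assembled is given below. It assumes the synonyms and the masked-token proposals have already been obtained from the ontology and the MLM respectively, and it omits the part-of-speech filter; the function names and the similarity callable are illustrative, not the paper's implementation.

```python
from typing import Callable, List

Tokens = List[str]


def entity_candidates(
    tokens: Tokens, start: int, end: int, synonyms: List[str]
) -> List[Tokens]:
    """Replace the whole entity span [start, end) by each synonym in turn."""
    return [tokens[:start] + syn.split() + tokens[end:] for syn in synonyms]


def token_candidates(
    tokens: Tokens,
    index: int,
    proposals: List[str],
    similarity: Callable[[Tokens, Tokens], float],
    alpha: float,
) -> List[Tokens]:
    """Substitute one non-entity token and keep candidates with Sim >= alpha.

    `proposals` stands for the top-K masked-token predictions; the
    part-of-speech filter mentioned in the text is left out for brevity.
    """
    kept = []
    for word in proposals:
        candidate = tokens[:index] + [word] + tokens[index + 1:]
        if similarity(tokens, candidate) >= alpha:
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    sent = ["history", "of", "pulmonary", "eosinophilia"]
    print(entity_candidates(sent, 2, 4, ["Loeffler Syndrome"]))
```

Replacing the span as a unit is what lets a multi-word entity map to a synonym with a different number of tokens.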

Sieve 2: (Line 20-28) If the first sieve does not generate an adversary, we introduce two kinds of typographical noise into the input: 1) Spelling Noise-N1: rotating $p$ random characters (Line 20); 2) Spelling Noise-N2: inserting symbols at the beginning or end of the word (Line 21). If this sieve generates adversaries for $D_i$, then $D_{adv}$ is returned.
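The two noise types can be illustrated with a short sketch. The text does not fully specify how the rotation is carried out, so the code below takes one plausible reading (cyclically rotating a randomly placed window of $p$ characters by one position) and inserts random lowercase letters for N2; both choices are assumptions made for illustration.

```python
import random
import string


def rotate_window(word: str, p: int, rng: random.Random) -> str:
    """Noise N1 (assumed form): rotate a random window of p characters left by one."""
    if p < 2 or len(word) < p:
        return word  # nothing to rotate
    start = rng.randrange(len(word) - p + 1)
    window = word[start:start + p]
    return word[:start] + window[1:] + window[0] + word[start + p:]


def insert_symbols(
    word: str, n: int, rng: random.Random, alphabet: str = string.ascii_lowercase
) -> str:
    """Noise N2 (assumed form): add n random symbols at the beginning or the end."""
    extra = "".join(rng.choice(alphabet) for _ in range(n))
    return extra + word if rng.random() < 0.5 else word + extra


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    print(rotate_window("clozapine", 2, rng))
    print(insert_symbols("schizophrenia", 2, rng))
```

With $p = 2$, this form of rotation reduces to swapping two adjacent characters.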

Sieve 3: (Line 29-31) If Sieve 2 does not generate an adversary, we replace numeric entities by spelling out their digits. For example, PMD1 can be rewritten as PMD One and Covid19 as Covid nineteen. If this sieve generates adversaries for $D_i$, then $D_{adv}$ is returned.
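Algorithm 1 names num2words for this step. To keep the illustration free of dependencies, the sketch below uses a small hand-written converter instead; it only covers cardinal numbers below 100, leaves larger numbers untouched, does not handle ordinals such as 19th, and produces lowercase words. These limits are simplifying assumptions, not properties of the method.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def small_number_to_words(n: int) -> str:
    """Spell out an integer in [0, 99]."""
    if not 0 <= n < 100:
        raise ValueError("only 0..99 are supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def expand_numbers(token: str) -> str:
    """Replace digit runs below 100 inside a token by their spelled-out form."""

    def repl(match: "re.Match[str]") -> str:
        value = int(match.group())
        if value >= 100:
            return match.group()  # out of scope for this sketch
        before = token[match.start() - 1] if match.start() > 0 else ""
        after = token[match.end()] if match.end() < len(token) else ""
        # Separate the words from adjacent letters, e.g. Covid19 -> Covid nineteen.
        left = " " if before.isalpha() else ""
        right = " " if after.isalpha() else ""
        return left + small_number_to_words(value) + right

    return re.sub(r"\d+", repl, token)
```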

4) Final adversaries generation: For each of the three sieves, among all the winning adversaries, we return the one that is most similar to the original text as measured by Reimers and Gurevych (2019). If none of the sieves generates an adversary, we return the perturbed example that causes the maximum reduction in the probability of the correct label.
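The selection rule can be sketched as a function over a pool of candidates. The sketch assumes the candidates have already been generated by a sieve and that the classifier and similarity scorer are supplied as callables; pooling the fallback into the same function is a simplification made here for brevity.

```python
from typing import Callable, Dict, List, Optional

Tokens = List[str]
Probs = Dict[str, float]


def select_adversary(
    original: Tokens,
    candidates: List[Tokens],
    true_label: str,
    query_model: Callable[[Tokens], Probs],
    similarity: Callable[[Tokens, Tokens], float],
) -> Optional[Tokens]:
    """Pick the label-flipping candidate most similar to the original.

    If no candidate flips the label, fall back to the candidate that lowers
    the probability of the true label the most. Returns None for an empty pool.
    """
    if not candidates:
        return None
    probs = [query_model(c) for c in candidates]  # one query per candidate
    winners = [
        c for c, p in zip(candidates, probs) if max(p, key=p.get) != true_label
    ]
    if winners:
        return max(winners, key=lambda c: similarity(original, c))
    best = min(range(len(candidates)), key=lambda k: probs[k][true_label])
    return candidates[best]
```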

                       Twitter ADE Corpus                    ADE
Model-Attack           Before-attack  After-attack  %        Before-attack  After-attack  %
HAN-TF                 0.80           0.33          0.10     0.83           0.46          0.09
HAN-BAE                0.80           0.35          0.08     0.83           0.43          0.06
HAN-Ours               0.80           0.36          0.05     0.83           0.31          0.11
BERT-base-TF           0.83           0.52          0.12     0.85           0.59          0.11
BERT-base-BAE          0.83           0.50          0.16     0.85           0.60          0.15
BERT-base-BBAEG        0.83           0.44          0.12     0.85           0.54          0.13
RoBERTa-base-TF        0.82           0.66          0.26     0.86           0.75          0.28
RoBERTa-base-BAE       0.82           0.63          0.23     0.86           0.74          0.24
RoBERTa-base-BBAEG     0.82           0.57          0.19     0.86           0.70          0.23
SciBERT-TF             0.85           0.45          0.11     0.88           0.53          0.13
SciBERT-BAE            0.85           0.43          0.11     0.88           0.56          0.11
SciBERT-BBAEG          0.85           0.38          0.10     0.88           0.50          0.08
BioBERT-TF             0.86           0.51          0.18     0.87           0.51          0.09
BioBERT-BAE            0.86           0.48          0.13     0.87           0.48          0.13
BioBERT-BBAEG          0.86           0.37          0.13     0.87           0.45          0.07
ClinicalBERT-TF        0.81           0.47          0.17     0.81           0.54          0.15
ClinicalBERT-BAE       0.81           0.48          0.16     0.81           0.58          0.22
ClinicalBERT-BBAEG     0.81           0.46          0.17     0.81           0.50          0.19
Table 1: Before-attack and after-attack accuracies of the models along with the % of perturbed words in the input space. Best attack and least % of perturbations are shown in bold for each dataset.
Table 2: Adversaries generated by BBAEG on handpicked examples from the test set of the ADE corpus. The adversaries generated by the baselines and by BBAEG are shown, along with the adversaries generated using different ablations of the sieves [spellings in blue, numbers in green, synonyms chosen by the attack algorithms in red].
Adverse Drug Event (ADE) Corpus (Adversaries: ADE Present $\rightarrow$ ADE Not present)
Original: Successful challenge with clozapine in a history of pulmonary eosinophilia ailment.
BAE (Using BERT-MLM): Successful challenge with hydrochloride in a history of pulmonary disease ailment.
BBAEG (Best Combination): Successful challenge with clozapinum in a history of Loeffler Syndrome ailment.
Original: A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with clozapine for schizophrenia.
BBAEG (Spelling Noise-N2): A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with inoclozapine for cdschizophrenia.
BBAEG (Spelling Noise-N1): A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with clpazoine for schizoerhpnia.
BBAEG (Synonyms): A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with Clozapinum for dementia Praecox.
BBAEG (Number Replacement): A twenty-one-year-old patient developed rhabdomyolysis during nineteenth week of treatment with clozapine for schizophrenia.

3 Experimental setup

Datasets and Experimental Details: We evaluate BBAEG on two biomedical text classification datasets: 1) Adverse Drug Event (ADE) Detection Gurulingappa et al. (2012) and 2) the Twitter ADE dataset Rosenthal et al. (2017). In both cases the task is to classify whether the sentence contains a mention of an ADE (binary).

We use six classification models as $M$: Hierarchical Attention Model Yang et al. (2016), BERT Devlin et al. (2019), RoBERTa Liu et al. (2019), BioBERT Lee et al. (2019), Clinical-BERT Huang et al. (2019) and SciBERT Beltagy et al. (2019). We fine-tune these models on the training data of each corpus using the Adam optimizer Kingma and Ba (2015) with a learning rate of 0.00002 for 10 epochs, and perform the adversarial attack on the test data. For the BBAEG non-NER synonym attacks, we use the BERT-base-uncased MLM to predict the masked tokens. We consider the top $K = 10$ synonyms from the BERT-MLM predictions and set a threshold $\alpha$ of 0.75 for the cosine similarity between the Reimers and Gurevych (2019) embeddings of the adversarial and input text. We set $p = 2$ characters for rotation to introduce noise into the input. For more details, refer to the appendix.

4 Results

Automatic Evaluation Results: We measure the success of an adversarial attack using two criteria: (1) Performance Drop ($A_{drop}$): the difference between the original accuracy (on the original test set) and the after-attack accuracy (on the perturbed test set); (2) Perturbation of input (%): the percentage of perturbed words in the generated adversary. An attack is more successful the larger criterion 1 is and the smaller criterion 2 is.
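Both criteria are simple ratios, sketched below. The perturbation rate here compares token lists position by position, which assumes the adversary has the same number of tokens as the original; multi-word entity replacements that change the length would need an alignment step that this sketch leaves out.

```python
from typing import List, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def accuracy_drop(
    original_preds: Sequence[int],
    adversarial_preds: Sequence[int],
    labels: Sequence[int],
) -> float:
    """Criterion 1: original accuracy minus after-attack accuracy."""
    return accuracy(original_preds, labels) - accuracy(adversarial_preds, labels)


def perturbation_rate(original: List[str], adversarial: List[str]) -> float:
    """Criterion 2: fraction of token positions that were changed."""
    if len(original) != len(adversarial) or not original:
        raise ValueError("token lists must be non-empty and of equal length")
    changed = sum(a != b for a, b in zip(original, adversarial))
    return changed / len(original)
```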

Effectiveness: Table 1 shows the results of the BBAEG attack on the two datasets across all models. In our experiments, the attack on HAN (a general deep learning model) is more successful than the attacks on the BERT variants and RoBERTa, and more successful than the existing baselines, in terms of both criteria (1 and 2). The attack is also highly successful on BioBERT and SciBERT (35-45% and 40-50% accuracy drop, respectively). We attribute this to the fact that these contextual embeddings have already seen the vocabulary of the datasets during pre-training, which makes them more sensitive to small perturbations. Moreover, unlike BERT and HAN, RoBERTa is clearly much less susceptible to adversarial attacks (10-20% accuracy drop), even with 20-25% of the words in the input perturbed. We also observe that BERT-MLM-based synonym replacement for non-NER tokens, combined with multi-word NER synonym replacement using entity linking, outperforms the TextFooler (TF) and BAE-based approaches in terms of accuracy drop.

                                     Twitter ADE                            ADE
Variant                              Accuracy Drop (Semantic Similarity)    Accuracy Drop (Semantic Similarity)
BioBERT-BBAEG (best variation)       0.43 (0.893)                           0.42 (0.906)
- w/o Synonym Replacement (S1)       0.39 (0.899)                           0.40 (0.919)
- w/o Spelling Noise N1 (S2-1)       0.37 (0.901)                           0.35 (0.912)
- w/o Spelling Noise N2 (S2-2)       0.34 (0.913)                           0.31 (0.891)
- w/o Number Replacement (S3)        0.30 (0.920)                           0.27 (0.915)
SciBERT-BBAEG (best variation)       0.45 (0.879)                           0.38 (0.881)
- w/o Synonym Replacement (S1)       0.42 (0.901)                           0.35 (0.912)
- w/o Spelling Noise N1 (S2-1)       0.39 (0.915)                           0.36 (0.901)
- w/o Spelling Noise N2 (S2-2)       0.31 (0.891)                           0.31 (0.847)
- w/o Number Replacement (S3)        0.32 (0.911)                           0.36 (0.903)
Table 3: Ablation analysis of the sieves (S1-S3) on accuracy drop and average semantic similarities between adversaries and original text.
                       Twitter ADE                ADE
Attack                 Accuracy   Naturalness     Accuracy   Naturalness
TextFooler (TF)        0.85       3.78            0.78       3.55
BAE Algorithm          0.88       3.95            0.84       3.89
BBAEG (Our Method)     0.94       4.23            0.90       4.56
Table 4: Human Evaluation on both the datasets.

Ablation Analysis: In Table 3, we perform an ablation analysis of the different perturbation schemes and of the effect of the attack with each sieve, using two fine-tuned contextual embedding models as the target models for ADE classification. Synonym replacement (S1) (average 35% accuracy drop) and character rotation (S2-1) (average 38% accuracy drop) seem to be the most promising approaches for successful attacks on biomedical text classification. Moreover, we conduct a deeper analysis to understand how much the synonyms of NER versus non-NER entities contribute to the prediction change. We find that replacing multi-word NERs generates natural-looking examples compared to MLM-based entity replacement. For example, pulmonary eosinophilia is replaced by Loeffler Syndrome in BBAEG through normalization to the MESH vocabulary, whereas it is replaced by disease in the BAE predictions, as shown in Table 2, which looks very unnatural. This shows that high semantic similarity does not always ensure the generation of grammatically proper adversaries.

Human Evaluation: Apart from automatic evaluation, we also perform a human evaluation of our BBAEG attacks on the BERT classifier. Two biomedical domain experts evaluate 100 randomly selected adversarial examples (from each of the attack algorithms) on each of the two datasets. For each sample, 50 annotations were collected. A similar setup was used by Garg and Ramakrishnan (2020) in their evaluation. The two main criteria for evaluating the perturbed samples are as follows:

1) Naturalness: How semantically similar are the generated adversaries to the original text, and do they preserve grammatical correctness, on a Likert scale (1-5)? To evaluate the naturalness of the adversarial examples, we first present the annotators with 50 different original data samples so that they understand the data distribution.

2) Accuracy of generated instances: the accuracy of binary classification of the presence of an Adverse Drug Reaction (ADR) on the adversarial examples. We report the average scores of the two annotators (for TextFooler (TF), BAE and our BBAEG) in Table 4.

During the ablation analysis, we observe that the synonym-replaced samples looked more natural to the human evaluators than the spelling-perturbed and number-replaced samples. When considered jointly, the number-replaced and synonym-replaced samples seemed more natural to the annotators than the spelling-perturbed samples. This is because the annotators could easily interpret the meaning of the number-replaced entities when these were presented together with the original sample. For instance, in the examples shown in Table 2, the number-replaced samples (21-year old $\rightarrow$ twenty-one-year old) look more natural and are easier to interpret than the spelling-perturbed samples (clozapine $\rightarrow$ clpazoine).

5 Conclusion and Future Work

In this paper, we propose a new technique for generating adversarial examples that combines contextual perturbations based on BERT-MLM, synonym replacement of biomedical entities, typographical errors and numeric entity expansion. We explore several classification models to demonstrate the efficacy of our method. Experiments conducted on two benchmark biomedical datasets demonstrate the strength and effectiveness of our attack. As future work, we would like to explore retraining the models with the perturbed samples in order to improve model robustness.

Acknowledgement

The author would like to thank the annotators for their hard work, and the anonymous reviewers for their insightful comments and feedback.

References