Adversarial Attacks Against Deep Learning Systems for ICD-9 Code Assignment

Sharan Raja^1,3, and Rudraksh Tuwani^2,3
¹Massachusetts Institute of Technology, Cambridge, MA
²Harvard University, Cambridge, MA
³{rsharan, rudraksh}@mit.edu

Abstract

Manual annotation of ICD-9 codes is a time consuming and error-prone process. Deep learning based systems tackling the problem of automated ICD-9 coding have achieved competitive performance. Given the increased proliferation of electronic medical records, such automated systems are expected to eventually replace human coders. In this work, we investigate how a simple typo-based adversarial attack strategy can impact the performance of state-of-the-art models for the task of predicting the top 50 most frequent ICD-9 codes from discharge summaries. Preliminary results indicate that a malicious adversary, using gradient information, can craft specific perturbations, that appear as regular human typos, for less than $3\%$ of words in the discharge summary to significantly affect the performance of the baseline model.

1 Introduction

The International Classification of Diseases (ICD) establishes a standardized fine-grained classification system for a broad range of diseases, disorders, injuries, symptoms, and other related health conditions [3]. It is primarily intended for use by healthcare workers, policymakers, insurers and national health program managers. The United States incurs administrative costs in billions of dollars annually arising from a complex billing infrastructure [16]. Specifically, the ICD code assignment is typically a manual process, consuming on average between 25 to 43 minutes per patient depending on the ICD version [1]. It is also prone to errors resulting from inexperienced coders, variation between coders, incorrect grouping of codes or mistakes in the patient discharge summaries. These errors are very costly with one report estimating that preventable errors in ICD coding have cost Medicare system 31.6 billion in FY2018 [2].

Recent work [13, 15, 4] has tried to automate the task of ICD code assignment using deep learning. Typically framed as a multilabel classification problem, researchers have trained Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Transformer models to predict ICD-9 codes from patient discharge summaries. These models have outperformed rule-based approaches and those utilizing conventional algorithms such as Logistic Regression, Support Vector Machines, Random Forests etc., achieving competitive micro F1-scores in the range 42% - 68%. Amongst these models, those based on CNNs have achieved the best performance.

Neural network models have revolutionized the field of NLP and SOTA models for various NLP tasks involve deep neural network models such as BERT, Bidirectional RNN or CNN-based methods. Recent works [10, 11, 14, 19] have shown a particular vulnerability of such deep models to adversarial examples that are often produced by adding small and imperceptible perturbations to the input data. The state of the art models of NLP are no exceptions to such perturbations. [18] provides a review of different adversarial attacks and defense strategies in the NLP literature. Based on granularity of the perturbation, adversarial attack strategies in NLP can be classified into three types - character-level attacks, word-level attacks and sentence-level attacks. In a character-level attack strategy, the model induces noise at the character level. Character-level noise can be induced due to naturally occurring reasons such as typos and misspellings or due to intentional modification by a malicious third-party. [12, 6, 5] are some of the existing character-level attack strategies in NLP. To accurately model the naturally occurring typos, [17] restrict the typos distribution based on the character constraints found in a standard English keyboard. We follow this strategy in our work. Furthermore, we assume a white-box setting where the adversary has access to gradients of the loss function wrt to the model inputs. To our knowledge, this is the first work to investigate the effects of adversarial samples in clinical NLP domain.

2 Data and Preprocessing

We used MIMIC-III [8], a large open source database comprising information of patients admitted to critical care units of Beth Israel Deaconess Medical Center (Boston, Massachusetts, USA). The database contains de-identified Electronic health records with both structured and unstructured data including diagnostics and laboratory results, medications, and discharge summaries. In this work, we focus on discharge summaries which encapsulates details pertaining to a patient’s stay.

Each discharge summary is manually annotated by human coders with multiple ICD-9 codes, describing both the diagnoses and procedures that the patient underwent. Out of the approx. 13000 possible ICD-9 codes, 8921 (6918 diagnosis, 2003 procedure) are present in our dataset. Following previous work, we merge discharge summaries corresponding to the same patient ID, such that no patient appears twice in our dataset resulting in 47,427 discharge summaries. This is done to ensure that there is no ‘data leakage’ between train, validation, and test sets.

The full label setting is quite noisy and suffers from class imbalance. Potential sources of noise include both missed assignments (not annotating all relevant ICD-9 codes) and incorrect assignments (annotating similar but incorrect ICD-9 codes). Consequently, it is relatively trivial to develop an adversarial attack strategy in the full label setting. For instance, one could simply find the keywords corresponding to low frequency labels and then either append or remove them from a discharge summary to alter a machine learning model’s prediction. This strategy will however fail for frequent labels since we expect the model to generalize beyond simply memorizing a few keywords. Therefore, we limited the label set to the 50 most frequent labels and removed discharge summaries which were not annotated with at least one of the labels. The resulting dataset was then split into training, validation and testing sets which contained 8067, 1574, and 1730 discharge summaries, respectively.

We followed the same pre-processing steps as in previous work [13]. All tokens without any alphabetic characters were removed. We then lowercased all tokens and replaced those appearing less than three times in the training documents with an ‘UNK’ token.

3 Baseline model

Our baseline models were the same as [13]. Specifically, we used a CNN-based sentence classifier model introduced by [9] which utilizes a max pooling layer to get sentence vector representations. We call this model Max Pool based CNN. The other model that we use instead utilizes label embeddings to calculate attention weights over word positions. These weights are then used to pool the output of the convolutional layer and calculate the sentence vector representation. This model is referred to as the Attention Pool based CNN.

4 Adversarial attack strategy

We generate adversarial examples based on the following algorithm: Given a pre-trained NLP algorithm $f:X\to y$ and a measure of classification $q:y\to s$ , we are interested in finding perturbations $\delta x$ on the input $X$ such that $q(X+\delta x)\leq q(X)$ under the constraint $||\delta x||\leq K$ . The final constraint ensures that the perturbations are small. In our work, we consider perturbations (typos) of four types:

1.

Insert - Insert characters into a word, such as hike $\to$ hlike
2.

Delete - Delete characters in a word, such as hike $\to$ hke
3.

Swap - swap two characters of a wors, such as hike $\to$ hkie
4.

Replace - Replace a character in a word with any neighboring keys in the keyboard, such as hike $\to$ hoke. Here o is a neighboring word to i in a standard english keyboard.

Given an input sentence $s$ that is tokenized according to the model’s tokenizer as $s=(w_{1},w_{2},..,w_{N})$ , we compute the partial derivative of loss with respect to each input item as shown below,

\mathcal{G}_{f}\left(w_{i}\right)=\nabla_{w_{i}}\mathcal{L}\left(w_{i},y\right)

(1)

Based on this gradient information, we select a input word $w_{i}$ to attack. We experiment with two different strategies here, the maximum gradient strategy where we choose the word corresponding to the maximum gradient and a random strategy where a random word is chosen to attack. Once a word is chosen, we generate all possible typos based on the four ways described above. The typo which decreases the score of the output $y$ based on the score function $q$ is chosen. Here, we use the top5 precision as the score function. Now the word replaced with the optimal typo word is again fed through this loop for $K$ times. Each time, a different word is chosen to ensure that final words don’t change from the initial words by a lot. We experiment with different choices of $K$ . The algorithm is shown in alg. 1

Algorithm 1 Adversarial attack for ICD-9 classification

1: Input: Document

X

, ground truth labels

y*

, classifier

f(.)

, budget

K

and score function

q

i\leftarrow 0,X_{best}\leftarrow X

3: while

i\leq K

c\leftarrow

Segmentation

(X_{best})

5: for each token

c_{i}

c

6: Compute gradients of component

c_{i}

according to eq. 1

7: end for

8: Find the token or word based on the gradient according to the strategy

9: Generate all possible typos for the chosen word

10: Create a list of documents; each document corresponding to a typo

11: Find the document instance that decreases the output score the most assign this to

X_{best}

12:

i\leftarrow i+1

13: end while

5 Results

To the best of our knowledge, [13] is the current state-of-the-art for the task of automated ICD-9 code assignment. We re-implemented their best performing models using the AllenNLP framework [7]. The test-set performance of the models for the task of predicting the top-50 most frequent ICD-9 codes from discharge summaries is given in table 1. We found that the Max Pool based CNN outperformed the Attention Pool based CNN on all performance metrics. Further, we found that the computation time for training as well as generating predictions for the former was much lesser than the latter. Therefore, we decided to limit our focus on developing an adversarial attack strategy for the Max Pool based CNN.

We experiment with three different values of budget $K=\{10,20,30\}$ and two different strategies - maximum gradient and random strategy for selecting the token to attack. The maximum gradient strategy can be used to analyze the robustness of the model to malicious attacks while the random strategy can be used to simulate natural settings with adversarial examples. The training time for each run on the entire corpus ( $1725$ discharge summaries) - $8$ hrs to $16$ hrs on a machine with Tesla K80 GPU. The results are summarized in table 2.

In accordance with our intuition, max grad strategy performs better than random strategy. This is because, max grad strategy can produce meaningful perturbations in a large input space (average size of input document is $\sim 1400$ tokens). The model’s performance doesn’t drop much with random strategy. This suggests that the model is some what robust to naturally occurring noise such as typos and misspellings. However, this might change as the budget is increased. Due to computational limits, we did not explore budgets beyond $30$ . A key result of our work is that, with less than $3\%$ of input tokens modified, the model’s performance drops significantly from $0.62$ to $0.377$ . This shows the potential vulnerability of this model to malicious attacks. Since, only a very few tokens are changed, it might be hard to defend against these attacks by training a discriminator to distinguish maliciously modified documents from regular ones.

Tables 3 and 4 show examples of discharge summaries before and after attack with their top5 labels. It is important to note that, on a few discharge summaries (last example in both the tables), the algorithm increases the top5 precision instead of decreasing it. One can make modifications to the algorithm to ensure that this doesn’t happen which would result in further drop in precision. Due to time constraints, we were not able to accommodate this modification. Nevertheless, these examples show the brittleness of the baseline model to input tokens.

Table 1: Performance of baseline models on MIMIC-III dataset for predicting the top 50 most frequent ICD-9 codes.

Model
Metric	Max Pool CNN	Label Attention Pool CNN
Macro F1 Score	$0.55$	$0.49$
Micro F1 Score	$0.63$	$0.55$
Macro AUC	$0.87$	$0.83$
Micro AUC	$0.91$	$0.86$
Top 5 Precision	$0.62$	$0.54$

Table 2: Results of adversarial attacks on the corpus of discharge summaries of size

1725

Top5 precision
Budget	Max grad strategy	Random strategy
Baseline ( $K=0$ ) $\to 0.62$
$10$	$0.549$	$0.592$
$20$	$0.462$	$0.574$
$30$	$0.377$	$0.567$

Table 3: Examples of sentences for budget

10

in maximum gradient strategy where the adversarial attack strategy resulted in maximum change in top5 labels. The first two examples cause the predictions to be worse and the last example shows a case where the adversarial example results in increased top5 precision. Labels in blue appear are part of ground truth labels.

Maximum gradient strategy, budget

=10

Top5 precision

Description

0.8\to 0.2

…unchanged as well. A tracheostomy tube and right subclavian line…

…unchanged as well. A tacheostomy ttube and right subclavian line…

…performed on. During tracheostomy procedure, pneumothorax occured and

chest tube…

…performed on. During tacheostomy proecedure, pneumothroax occurred

and chest tube…

Top5 labels before attack - Insertion of Sengstaken tube, Pneumonia,

Respiratory Ventilation, Venous catheterization, Arterial catheterization

Top5 labels after attack - Pneumonia, Unspecified pleural effusion, Insertion

of Sengstaken tube, Anemia, Acute post-hemorrhagic anemia

0.8\to 0.2

…cholelithiasis complicated hospital course including sepsis w persistent

hyperbilirubinemia…

…cholelithiasis complicated hospital course including sespis w persistent

hyperbilirubinemia…

…surgical or invasive procedure - ercp, laparoscopic cholecystectomy,

laparoscopic liver biopsy..

…surgical or invasive preocedure - erccp, laproscopic, cholecysectomy,

laparoscopic liver biopsy..

…presentation to hospital1 intubated jaundiced scleral…

…presentation to hospital1 int8bated jaundiced scleral…

Top5 labels before attack - Unspecified acquired hypothyroidism, Insertion

of endotracheal tube, Respiratory Ventilation, Enteral infusion of concentrated

nutritional substances, Continuous invasive mechanical ventilation

Top5 labels after attack - Unspecified acquired hypothyroidism, Diagnostic

ultrasound of heart, Old myocardial infarction, Major depressive disorder,

Other and unspecified hyperlipidemia.

0.2\to 0.8

…higher on tube feeds appreciate nutrition recs tfs changed to…

…higher on ttube fees apprciate nutritin res tfts changed to..

…for both chf and suspected aspiration pna w iv lasix…

…for both chf and suspected aspirtation pna w iv lasix…

Top5 labels before attack - Enteral infusion of concentrated nutritional

substances Venous catheterization, Food / vomit pneumonitis, Urinary tract

infection, Acute respiratory failure.

Top5 labels after attack - Acute respiratory failure, Venous catheterization,

Congestive heart failure, Insertion of endotracheal tube, Unspecified essential

hypertension

Table 4: Examples of sentences for budget

20

Maximum gradient strategy, budget

=20

Top5 precision

Description

1.0\to 0.2

…cabg, x4, hyperlipidemia, anxiety, hypertension, migraines, gi bleed…

…cbg, x4, hyperlipiddemia, axiety, hypertnesion, migranes, gi, bleedd…

…medical history - coronary artery disease, hyperlipidemia, anxiety…

…medical history - conronary atery disease, hyperlipdiemia, anxiety…

…room and underwent coronary artery bypass grafting x4 with left…

…room and underwent coronary bypas gratfting x4 with left…

Top5 labels before attack - Single internal mammary-coronary artery bypass,

Extracorporeal circulation auxiliary to open heart surgery, Other and unspecified

hyperlipidemia, Atherosclerotic heart disease of native coronary artery without

angina pectoris, Unspecified essential hypertension

Top5 labels after attack - Extracorporeal circulation auxiliary to open heart

surgery, Enteral infusion of concentrated nutritional substances, Transfusion of

packed cells, Diagnostic ultrasound of heart, Atrial fibrillation

1.0\to 0.2

…to posterior descending artery bronchosccopy reintubated history of present…

…to posterior descending artery bronchosccopy reitnubated history of present…

…the procedure was hemoptysis requiring intubation he was transferred back…

…the procedure was hemoptysis requiring ibntubation he was transferred back…

…mitral regurgitation, hypertension, hypercholesterolemia, congestive heart

failure, tobacco abuse…

…motral regunrgitation, hypertension, hypercholesterolemia, congesitve heeart

failre, taobacco abusee

Top5 labels before attack - Extracorporeal circulation auxiliary to open heart

surgery, Single internal mammary-coronary artery bypass, Atherosclerotic heart

disease of native coronary artery without angina pectoris, Mitral valve disorders,

Congestive heart failure

Top5 labels after attack - Unspecified essential hypertension, enteral infusion of

concentrated nutritional substances, Extracorporeal circulation auxiliary to open

heart surgery, Respiratory Ventilation, Transfusion of packed cells.

0.4\to 1.0

…cancer s p resection bilateral renal masses per pcp name…

…cancer s p resecton bliateral reanl mases per pcp name..

…morbid obesity, depression, restless leg syndrome…

…mtorbid obestity, deprssion, resltess leg syndrome…

Top5 labels before attack - Congestive heart failure, Chronic obstructive

pulmonary disease Chronic kidney disease, Hypertensive chronic kidney disease,

Non-invasive mechanical ventilation

Top5 labels after attack - Congestive heart failure, Chronic obstructive pulmonary

disease, Unspecified essential hypertension, Diabetes mellitus without mention of

complication, Urinary tract infection

6 Discussion

This work is a first step at exploring the robustness of NLP models used for automatic ICD-9 code classification. Clinical documents are different from regular documents as they are typically generated in a fast-paced environment with higher than average typos and non-standard acronyms. As a result, clinical NLP models are more susceptible to adversarial samples compared to a regular NLP model trained on a standard English dataset. A key extension of the work would be to consider a dictionary learnt from clinical documents and biomedical literature as a defense against these character-level perturbations. Although this might mitigate the decrease in performance, it wouldn’t completely solve it. A more rigorous way to deal with this would be to account for this in the tokenization strategy. It is easy to push a word out of vocabulary when using tokenization strategies like word2vec and GloVe. Other strategies that model words unseen in training dataset such as word-piece and byte-pair encoding will also break when typos are introduced because these models learn sub words from a standard dictionary. Therefore, any defense must account for these typos in the fundamental tokenization strategy. An interesting direction would be to learn a word similarity metric and map an unknown word to a closer word in the vocabulary given the input word and the context in which it appears. Building a robust tokenization strategy would be the first step towards a robust NLP model against character-level adversarial attacks.

References

per [2014] Perspectives, 2014. URL https://perspectives.ahima.org/preparing-for-icd-10-cmpcs-implementation-impact-on-productivity-and-quality/.
cmi [2018] Error rate drops, but medicare still lost $31.6 billion to preventable billing errors in fy2018, 2018.
who [2019] International classification of diseases (icd) information sheet, Oct 2019. URL https://www.who.int/classifications/icd/factsheet/en/.
Amin et al. [2019] S. Amin, G. Neumann, K. Dunfield, A. Vechkaeva, K. Chapman, and M. Wixted. Mlt-dfki at clef ehealth 2019: Multi-label classification of icd-10 codes with bert. 09 2019.
Ebrahimi et al. [2018] J. Ebrahimi, D. Lowd, and D. Dou. On adversarial examples for character-level neural machine translation, 2018.
Eger et al. [2019] S. Eger, G. G. Şahin, A. Rücklé, J.-U. Lee, C. Schulz, M. Mesgar, K. Swarnkar, E. Simpson, and I. Gurevych. Text processing like humans do: Visually attacking and shielding nlp systems, 2019.
Gardner et al. [2018] M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. Liu, M. Peters, M. Schmitz, and L. Zettlemoyer. Allennlp: A deep semantic natural language processing platform. 03 2018.
Johnson et al. [2016] A. E. Johnson, T. J. Pollard, L. Shen, H. L. Li-wei, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark. Mimic-iii, a freely accessible critical care database. Scientific data, 3:160035, 2016.
Kim [2014] Y. Kim. Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 08 2014. doi: 10.3115/v1/D14-1181.
Kurakin et al. [2016a] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world, 2016a.
Kurakin et al. [2016b] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale, 2016b.
Li et al. [2019] J. Li, S. Ji, T. Du, B. Li, and T. Wang. Textbugger: Generating adversarial text against real-world applications. Proceedings 2019 Network and Distributed System Security Symposium, 2019. doi: 10.14722/ndss.2019.23138. URL http://dx.doi.org/10.14722/ndss.2019.23138.
Mullenbach et al. [2018] J. Mullenbach, S. Wiegreffe, J. Duke, J. Sun, and J. Eisenstein. Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1100. URL https://www.aclweb.org/anthology/N18-1100.
Papernot et al. [2016] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning, 2016.
Schmaltz and Beam [2020] A. Schmaltz and A. L. Beam. Exemplar auditing for multi-label biomedical text classification. ArXiv, abs/2004.03093, 2020.
Shull [2018] J. Shull. Digital health and the state of interoperable ehrs (preprint). JMIR Medical Informatics, 7, 11 2018. doi: 10.2196/12712.
Sun et al. [2020] L. Sun, K. Hashimoto, W. Yin, A. Asai, J. Li, P. Yu, and C. Xiong. Adv-bert: Bert is not robust on misspellings! generating nature adversarial samples on bert, 2020.
Zhang et al. [2019] W. E. Zhang, Q. Z. Sheng, A. Alhazmi, and C. Li. Adversarial attacks on deep learning models in natural language processing: A survey, 2019.
Zhao et al. [2017] Z. Zhao, D. Dua, and S. Singh. Generating natural adversarial examples, 2017.