Benchmarking Differentially Private Residual
Networks for Medical Imagery
Abstract
Hospitals and other medical institutions often hold vast amounts of medical data that can provide significant value when used to advance research. However, this data is sensitive in nature and, largely due to privacy concerns, is not readily available for use in a research setting. In this paper, we measure the performance of a deep neural network on differentially private image datasets pertaining to pneumonia. We analyze the trade-off between the model’s accuracy and the scale of perturbation applied to the images. Knowing how the model’s accuracy varies across perturbation levels in differentially private medical images is useful in these contexts. This work is contextually significant given the coronavirus pandemic, as pneumonia has become an even greater concern as a potentially deadly complication of COVID-19 infection.
1 Introduction
Pneumonia is an infection that causes inflammation in the alveoli of the lungs and can be caused by numerous infectious agents; it often arises as a complication of an existing infection. Infectious agents include coronaviruses such as SARS‑CoV‑2, influenza viruses, and various bacterial species (usnews20). According to the Centers for Disease Control and Prevention, pneumonia accounts for about 250,000 hospitalizations and about 50,000 deaths every year (usnews20). Patient condition and immune response to pneumonia vary with the specific conditions and factors involved in the patient’s own health and physiology, as well as the characteristics of the infectious agent (usnews20). Key contributing factors include:
• Pre-existing comorbidities and the general state of overall health of the infected individual.
• Virulence level of the infectious organism.
• The level of exposure to the infectious agent. Increased proximity raises both the chance of infection and its likely severity, owing to greater inhalation of the infectious agent through various mediums (usnews20).
The percentage of deaths attributed to pneumonia and influenza (P&I) is 8.2% in the United States, exceeding the epidemic-classification threshold of 7.2% (cdc20). Recently, deaths due to pneumonia have sharply increased owing to the worldwide spread of COVID-19 and the SARS‑CoV‑2 virus. The rapid construction and evaluation of models to track, diagnose, or support the treatment and mitigation of COVID-19 is critical given global circumstances. Given the urgent need for these developments and the inherently sensitive nature of medical data, it is vital to train and evaluate models while keeping critical personal information in the data corpus obfuscated. The field of Differential Privacy approaches these constraints through various methods, including the direct obfuscation of data (dwork2006our; dwork2006calibrating). In this work, we analyze the impact of differentially private datasets on the performance of a popular image classification model, ResNet (he2016deep; szegedy2017inception). We compare model performance on the Chest X-Ray dataset (kermany2018identifying) at different levels of perturbation while ensuring the images are differentially private. This analysis aims to help medical professionals better understand the trade-off between accuracy and data privacy, and may serve as a useful reference for evaluating how sensitive information can be protected while keeping the data useful for research purposes.
This text is organized as follows: first, we introduce the Differential Privacy methods relevant to our findings; next, we cover the experimental design and the corresponding analysis of results; finally, we discuss the significance of these findings and potential future directions.
We would additionally like to acknowledge similar earlier work carried out in a more theoretical setting (mireshghallah2020principled; fan2019differential). Our paper builds upon their work and applies it to the healthcare domain, with particular relevance to medical research.
2 Methods
In this section we discuss the fundamental privacy preserving concepts used throughout the paper.
2.1 Differential Privacy (DP). (dwork2006our; dwork2006calibrating)
The central idea in differential privacy is the introduction of randomized noise to ensure privacy through plausible deniability. Based on this idea, for $\epsilon > 0$, an algorithm $\mathcal{A}$ is understood to satisfy $\epsilon$-Differential Privacy if and only if for any pair of datasets $D$ and $D'$ that differ in only one element, the following statement holds true:
$$\Pr[\mathcal{A}(D) = O] \leq e^{\epsilon} \cdot \Pr[\mathcal{A}(D') = O] \quad \text{for every output } O,$$
where $D$ and $D'$ are datasets differing by at most one element, and $\Pr[\mathcal{A}(D) = O]$ denotes the probability that $O$ is the output of $\mathcal{A}$. This setting approximates the effect of individual opt-outs by minimizing the inclusion effect of an individual’s data.
However, one major limitation of this kind of Differential Privacy is that the data owners will have to trust a central authority, i.e. the database maintainer, to ensure their privacy. Hence, in order to ensure a stronger privacy guarantee, we utilize the concept of Local Differential Privacy (LDP) (bebensee2019local). We say that an algorithm $\pi$ satisfies $\epsilon$-Local Differential Privacy, where $\epsilon > 0$, if and only if for any pair of inputs $x, x'$ and any output $y$:
$$\Pr[\pi(x) = y] \leq e^{\epsilon} \cdot \Pr[\pi(x') = y].$$
For $\epsilon$-LDP, the privacy loss is captured by $e^{\epsilon}$. Setting $\epsilon = 0$ ensures perfect privacy, as $e^{\epsilon} = 1$; on the other hand, $\epsilon \to \infty$ provides no privacy guarantee. The choice of $\epsilon$ is quite important, as the increase in privacy risk is proportional to $e^{\epsilon}$.
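As a brief worked example (not part of the original text), consider $\epsilon = \ln 2$. The definition then requires that for every output $y$ and every pair of inputs $x, x'$,
$$\frac{\Pr[\pi(x) = y]}{\Pr[\pi(x') = y]} \leq e^{\ln 2} = 2,$$
so observing $y$ can make any single input at most twice as likely as any other, which bounds what an observer can infer about the underlying data.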
2.2 Laplace Distribution. (dwork2014algorithmic)
The Laplace distribution, also known as the double-exponential distribution, is a symmetric version of the exponential distribution. The distribution centered at 0 (i.e. $\mu = 0$) with scale $b$ has the following probability density function:
$$p(x \mid b) = \frac{1}{2b} \exp\!\left(-\frac{|x|}{b}\right).$$
The variance of this distribution is $\sigma^2 = 2b^2$.
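As a quick numerical sanity check (not from the original experiments), drawing samples from this distribution recovers the $2b^2$ variance; the NumPy sampler below is purely illustrative.

```python
import numpy as np

b = 2.0  # Laplace scale parameter
samples = np.random.default_rng(seed=0).laplace(loc=0.0, scale=b, size=1_000_000)

print(samples.var())  # empirical variance, approximately 8.0
print(2 * b ** 2)     # theoretical variance 2 * b^2 = 8.0
```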
2.3 Laplace Mechanism. (dwork2006calibrating; dwork2014algorithmic)
The Laplace Mechanism independently perturbs each coordinate of the output with Laplace noise (drawn from a zero-mean Laplace distribution) scaled to the sensitivity of the function.
Given a dataset $D$ and a target function $f$, the Laplace Mechanism is the randomizing algorithm
$$\mathcal{A}_{\text{Lap}}(D) = f(D) + x,$$
where $x$ is a random variable drawn from a Laplace distribution $\mathrm{Lap}\!\left(0, \tfrac{\Delta f}{\epsilon}\right)$, corresponding to the perturbation. $\Delta f$ corresponds to the global sensitivity of the function $f$, defined as
$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$$
over all dataset pairs $(D, D')$ that differ in only one element.
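The following is a minimal Python sketch of this mechanism, assuming a real- or vector-valued query; the function name `laplace_mechanism` and the counting-query example are illustrative and not drawn from our experimental code.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Perturb each coordinate of `value` with Laplace noise of scale
    b = sensitivity / epsilon, as in the mechanism described above."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=np.shape(value))
    return np.asarray(value, dtype=float) + noise

# Example: a counting query has global sensitivity 1, since adding or
# removing one record changes the count by at most 1.
true_count = 42
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```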
3 Experimental Results
The experiments discussed in this section used an 18-layer Residual Network (ResNet) previously trained to convergence on the ImageNet task. ResNets share many ideas with the popular VGG architecture (simonyan2014very; szegedy2017inception), but use significantly fewer filters and have lower overall complexity. They make use of identity connections between sets of layers as a solution to the common problem of gradient signals vanishing during backpropagation in very deep networks (he2016deep). The experimental setup consisted of training the selected model on an image classification task based on the Chest X-Ray dataset (kermany2018identifying). The Chest X-Ray dataset consists of approximately 5,800 chest radiographs, which medical specialists use to confirm pneumonia and other medical concerns, though they are rarely the sole basis of a diagnosis. Radiographic images taken at separate time intervals, such as before and during an illness, are often useful to physicians during diagnosis. In general, these images form an important part of an often multi-stage diagnostic process.
The dataset was used in a binary classification setting, with candidate data samples corresponding to either the Normal or the Pneumonia class. The original Chest X-Ray dataset was used directly in this experiment, alongside 3 other versions generated by adding different perturbations to the images. These alternate, differentially private datasets were generated by drawing random samples from the Laplace Mechanism of Section 2.3 (dwork2006calibrating) with mean zero and varying levels of the scale parameter $b$, one per perturbed dataset.
These perturbations were added directly to the input images to create noisy representations for subsequent training. These 4 datasets were used in separate experiments to train the ResNet-18 model to convergence. Some pre-processing steps were required to train the model on these images: input images passed to the deep neural network were resized to a fixed pixel resolution and normalized to the range $[0, 1]$. Therefore, the function $f$ defined in Section 2.3 is the identity function and the sensitivity $\Delta f$ is 1.
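A minimal sketch of this perturbation step is given below, assuming the images have already been loaded as tensors with values in $[0, 1]$; the function name `perturb_image` and the use of `torch.distributions.Laplace` are illustrative rather than the exact experimental code.

```python
import torch

def perturb_image(image, scale):
    """Add pixel-wise Laplace noise with scale parameter b = `scale`
    to an image tensor whose values lie in [0, 1] (identity query,
    sensitivity 1)."""
    laplace = torch.distributions.Laplace(loc=0.0, scale=float(scale))
    noisy = image + laplace.sample(image.shape)
    # Optional post-processing: clipping back to [0, 1] does not weaken
    # the differential privacy guarantee.
    return noisy.clamp(0.0, 1.0)
```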
The experiments were all carried out using Python 3.8.2 and PyTorch 1.4.0. We trained separate instances of the pre-trained ResNet-18 model on these 4 image datasets for 120 epochs each, saved the best models from these runs, and analyzed the trade-off in accuracy across the varying scales of perturbation. Best model accuracies on both the training and test sets are included in Table 1, along with the learning curves over 120 epochs of training in Figure 1. A validation set was used to tune and improve training performance over the training epochs, and is excluded from the figures for clarity.
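A condensed sketch of this fine-tuning setup is shown below, assuming each perturbed dataset is exposed as a standard PyTorch `DataLoader`; the optimizer choice and hyperparameters (SGD, learning rate, momentum) are illustrative placeholders rather than the exact values used in our runs.

```python
import torch
import torch.nn as nn
import torchvision

def build_model():
    # Start from a ResNet-18 pre-trained on ImageNet and replace the final
    # fully connected layer with a 2-way (Normal vs. Pneumonia) classifier head.
    model = torchvision.models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

def train(model, train_loader, epochs=120, lr=1e-3, device="cuda"):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```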
Some interesting insights emerge from the data in Table 1 and Figure 1. The findings confirm the intuition that accuracy diminishes as the scale of perturbation of the images, represented by $b$, is increased. This relationship holds throughout the entire training process across all 120 epochs. Another interesting finding is the behavior of the training and test curves at different perturbation levels relative to each other. The best trade-off between model bias and variance appears to occur at a scale value of $b = 2$, and the other perturbation levels also seem to generalize better than the model trained on the original dataset, which overfits the training data. This may highlight the potential of differential privacy methods to improve model generalization to unseen data, which may be useful under certain considerations.
4 Conclusion and Future Directions
In this paper, we provide an empirical evaluation of a computationally tractable 18-layer ResNet on a medically relevant classification task using the publicly available Chest X-Ray dataset. This is an effective approach in many medical contexts, where the interactions of the underlying system are complex enough to resist straightforward analytical treatment. We find that differentially private noise mechanisms produce noticeably different results at different perturbation levels, and we highlight the inherent trade-offs in these decisions. We also highlight interesting behavior in model performance on unseen data as a function of perturbation level. This work further demonstrates the usefulness of transfer learning in privacy-preserving scenarios, as a pre-trained ResNet achieved good performance on a new classification task across various perturbations.
There are several useful directions for benchmarking privacy-preserving methods on both medical and non-medical tasks where data is sensitive but progress is critical. Our future work on this topic will include profiling additional ML models, both neural-network-based systems and otherwise, across a variety of privacy-preserving settings, including Federated Learning. We also intend to study and improve the use of transfer learning in settings where privacy is a necessary consideration. Another derivative research direction we intend to pursue is the effect of various perturbation levels on entire neural network topologies, such as those generated through meta-learning methods (elsken2018neural). Furthermore, different data modalities, including audio, video, text, tabular data, and various forms of cyber data, can be benchmarked in a similar way.