Cyclic Generative Adversarial Networks With Congruent Image–Report Generation For Explainable Medical Image Analysis
Abstract
We present a novel framework for explainable labeling and interpretation of medical images. Medical images require specialized professionals for interpretation, and are typically explained via elaborate textual reports. Different from prior methods that focus on medical report generation from images or vice-versa, we generate congruent image–report pairs employing a cyclic Generative Adversarial Network (cycleGAN): the generated report should adequately explain a medical image, while a report-generated image that effectively characterizes the text visually should (sufficiently) resemble the original. The aim of this work is to generate trustworthy and faithful explanations for the outputs of a model diagnosing chest x-ray images, by pointing a human user to similar cases in support of a diagnostic decision. Apart from enabling transparent medical image labeling and interpretation, we achieve report- and image-based labeling comparable to prior methods, including state-of-the-art performance in some cases, as evidenced by experiments on the Indiana Chest X-ray dataset.
Keywords:
Explainability · Medical image analysis · Multimodal

1 Introduction
Medical images present critical information for clinicians and epidemiologists to diagnose and treat a variety of diseases. However, unlike natural images (scenes) which can be easily analyzed and explained by laypersons, medical images are hard to understand and interpret without specialized expertise.
Artificial intelligence (AI) has made rapid advances in the last decade thanks to deep learning. However, the need for accountability and transparency in explaining decisions, alongside high performance, has spurred the development of explainable machine learning, especially in healthcare. While natural images can be analyzed and explained by decomposing them into semantically consistent and prototypical visual segments [133, 134, 28, 131, 21, 13, 154, 121, 29, 135, 22] (see also [10, 82, 16, 7, 5, 56, 46, 159, 119, 67, 43]), multimodal approaches to prototypical explanations are essential for interpreting and explaining medical imagery, given the tight connection between image and text in this domain.
1.1 Prior Work
Prior works on medical image interpretation and explainability have either attempted to characterize (chest) x-rays in terms of multiple pathological labels [31, 42, 85, 69, 70, 86, 89, 39, 38, 41, 97, 37, 113, 111, 110, 122, 32] or via automated generation of imaging reports [129, 57, 136, 151, 90, 58, 23, 153, 61, 8, 59, 25, 11] and their variants [156, 81, 64, 84, 145, 146, 53, 147, 63, 139, 138, 152, 149, 44, 94]. The CheXNet framework [137] employs a 121-layer convolutional network to label chest x-rays (see also [148, 93, 158, 66, 65, 71, 91, 118, 30, 88, 143, 144, 87, 99, 142, 40, 68, 117, 115, 116] and [114, 34, 36, 155, 35, 33, 112, 106, 108, 109, 107, 103, 104, 98, 95, 105, 120, 92]). While the above report-generation works achieve excellent performance and effectively learn mappings between image and textual features, they do not verify whether the generated report actually characterizes the input x-ray. It is this constrained characterization in our proposed work that enables us to generate prototypical chest x-ray images serving as explanations. In more recent work, saliency maps have been used to select informative x-ray images [135].
1.2 Our Approach
In contrast, this work focuses on the generation of coherent image–report pairs, and posits that if the image and report are conjoined counterparts, one should inherently describe the characteristics of the other. It is the second part of the radiology report generation model, i.e., the generation of prototypical images from the generated reports, that serves as the explanation for those reports. The proposed model can be characterized as providing post hoc explanations, where an explainer outputs explanations corresponding to the output of the model being explained. This approach differs from methods that propose simpler, inherently explainable models such as decision trees. Prototypical images have been used as explanations for natural images in [150, 125, 128, 124, 130, 132, 14, 12, 102, 123, 51, 26, 50, 24, 19, 20, 9, 62, 49, 60, 141, 6, 47, 48, 80] (see also [55, 27, 17, 100, 54, 45, 52, 4, 72, 73, 77, 3, 76, 2, 78, 79, 75, 74, 96, 83, 15, 1] and [101, 18, 140]). None of these approaches explores prototypical image generation as explanations for medical images, which this work novelly proposes with a multimodal approach.
1.3 Contributions
Overall, we make the following research contributions:
1. We present the first multimodal formulation that enforces the generation of coherent and explanatory image–report pairs via the cycle-consistency loss employed in cycleGANs [157].
2. Different from prior works, we regenerate an x-ray image from the report, and use this image to quantitatively and qualitatively evaluate report quality. Extensive labeling experiments on textual reports and images generated via the Indiana Chest X-ray dataset reveal the effectiveness of our multimodal explanation approach.
3. We evaluate the proposed model on two grounds, namely the quality of the generated reports and the quality of the generated explanations. Our method achieves results comparable to prior methods on the report generation task, while achieving state-of-the-art performance in certain conditions. The evaluations of the post-hoc explanations show that cycle-consistency constraints and multimodal analysis are employable as an explanation technique.
4. As qualitative evaluation, we present Grad-CAM-based attention maps conveying where a classification model focuses to make a prediction.
2 Method
2.1 Coherent Image-Report Pairs With CycleGANs
We aim to model the tight coherence between image and textual features in chest x-ray images and reports through our multimodal report generation model. The generated reports should be such that an x-ray image generated from just these reports is similar to the ground-truth x-ray image; conversely, the prototypical x-ray images generated as explanations should be such that a report generated from these images resembles the original report. We hence devise a multimodal, paired GAN architecture explicitly modeling the cycle-consistency constraints, based on CycleGAN [157], with data of type {image, text/labels}.
2.2 CycleGAN
Given two sets of images corresponding to domains $X$ and $Y$ (for example, two sets of paintings corresponding to different styles), cycleGAN enables learning a mapping $G: X \rightarrow Y$ such that the generated image $G(x)$, where $x \in X$ and $G(x) \in Y$, looks similar to $y \in Y$.

The generated images are also mapped back to images in domain $X$. Hence, cycleGAN also learns another mapping $F: Y \rightarrow X$, where $y \in Y$, such that $F(y)$ is similar to $x \in X$. The structural cycle-consistency assumption is modeled via the cycle-consistency loss, which enforces $F(G(x))$ to be similar to $x$ and, conversely, $G(F(y))$ to be similar to $y$. Hence the objective to be minimized enforces the following four constraints:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y) + \mathcal{L}_{GAN}(F, D_X) + \lambda \big( \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right] \big) \tag{1}$$
We exploit the CycleGAN setting in a multimodal paradigm, i.e., the two domains we work in are text (reports) and images (chest x-rays). As shown in Figure 1, our multimodal cyclic GAN architecture comprises (i) two GANs, a report-to-image generator and an image-to-report generator, and (ii) two deep neural networks, termed discriminators, to respectively compare the generated images and reports against the originals. Figure 1(a) depicts the two generator mappings, while Figure 1(b) depicts how cycle-consistency is enforced to generate coherent image–report pairs.
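To make the four constraints of Eq. (1) concrete in our multimodal setting, below is a minimal PyTorch-style sketch of the generator-side losses; the module names (`G_ri`, `G_ir`, `D_img`, `D_rep`) and the use of a dense report embedding are illustrative assumptions rather than our exact implementation.

```python
import torch
import torch.nn.functional as F

def generator_losses(G_ri, G_ir, D_img, D_rep, image, report_emb, lam=10.0):
    """Sketch of the four constraints in Eq. (1): G_ri maps report
    embeddings to images, G_ir maps images to report embeddings."""
    fake_img = G_ri(report_emb)              # report -> image
    fake_rep = G_ir(image)                   # image  -> report

    # Two adversarial terms: fool the image and report discriminators.
    logits_i, logits_r = D_img(fake_img), D_rep(fake_rep)
    adv = (F.binary_cross_entropy_with_logits(logits_i, torch.ones_like(logits_i))
           + F.binary_cross_entropy_with_logits(logits_r, torch.ones_like(logits_r)))

    # Two cycle-consistency terms: image -> report -> image should recover
    # the image, and report -> image -> report should recover the report.
    cyc = (F.l1_loss(G_ri(fake_rep), image)
           + F.l1_loss(G_ir(fake_img), report_emb))

    return adv + lam * cyc
```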
2.3 Explanatory Image–Report Pairs

Our model learns mappings between prototypical image–text decompositions (termed visual or textual words in information retrieval), akin to the 'this looks like that' formulation and to synthetic image-based explanations. Since our setting is multimodal, rather than the image-to-image setting of cycleGANs, the image-to-report generator is based on a CNN-plus-LSTM generative model, while the report-to-image generator uses a hierarchical structure composed of two GANs, following prior hierarchical text-to-image work. First, the Stage 1 GAN takes the text embedding as input and generates a low-resolution image. The Stage 2 GAN then utilizes this image together with the text embedding to generate a high-resolution image.
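A skeletal sketch of this two-stage report-to-image pathway is given below; the layer sizes, noise dimension, and module names are assumptions chosen only to make the Stage 1/Stage 2 hierarchy concrete, not the exact trained architecture.

```python
import torch
import torch.nn as nn

class TwoStageReportToImage(nn.Module):
    """Stage 1 maps a report embedding (plus noise) to a coarse image;
    Stage 2 refines it, conditioned again on the text."""
    def __init__(self, emb_dim=256, z_dim=100):
        super().__init__()
        self.stage1 = nn.Sequential(            # text + noise -> 64x64 image
            nn.Linear(emb_dim + z_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=8),
            nn.Conv2d(128, 1, 3, padding=1), nn.Tanh())
        self.stage2 = nn.Sequential(            # coarse image + text -> 256x256
            nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4),
            nn.Conv2d(64, 1, 3, padding=1), nn.Tanh())

    def forward(self, report_emb, z):
        low = self.stage1(torch.cat([report_emb, z], dim=1))
        # Crudely broadcast the text embedding spatially so that Stage 2
        # remains conditioned on the report.
        txt = report_emb.mean(dim=1, keepdim=True)[..., None, None]
        txt = txt.expand(-1, 1, low.size(2), low.size(3))
        return low, self.stage2(torch.cat([low, txt], dim=1))
```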
2.4 Dataset
We used the Indiana University Chest X-Ray Collection (IU X-Ray) for our experiments, as it contains free text reports essential for the report generation task. IU X-Ray is a set of chest x-ray images paired with their corresponding diagnostic reports. The dataset contains 7,470 images, some of which map to the same free text report. 51% of the images are frontal, while the other 49% are lateral.
The frontal and lateral images map to individual text reports, at times corresponding to the same report. Consequently, mapping reports to images may confound the generator regarding which type of image to generate. To avoid this confusion, we work only with frontal images, reducing the dataset to 3793 image–text pairs. Each report consists of the following sections: impression, findings, tags, comparison, and indication. In this work, we treat the contents of impression and findings as the target captions to be generated. We adopt an 80:20 train-test split for all experiments.
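A sketch of the corresponding preprocessing is given below; the field names (`view`, `report`) are illustrative assumptions about how the dataset records are stored.

```python
import random

def make_frontal_split(pairs, seed=0, train_frac=0.8):
    """Keep only frontal pairs (3793 on IU X-Ray) and split 80:20.
    Each element of `pairs` is assumed to be a dict with a 'view' key
    ('frontal'/'lateral') and a 'report' key holding the concatenated
    impression and findings sections."""
    frontal = [p for p in pairs if p["view"] == "frontal"]
    random.Random(seed).shuffle(frontal)
    cut = int(train_frac * len(frontal))
    return frontal[:cut], frontal[cut:]
```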
2.5 Implementation
All images were resized to a fixed size. We used higher-resolution images for the initial experiments involving the 'Ours-no-cycle' method (see Table 1) and observed better performance on the natural language metrics. However, low-resolution x-rays were used for subsequent experiments due to computational constraints. The input and hidden state dimensions for the Sentence-LSTM are 1024 and 512 respectively, while both are of length 512 for the Word-LSTM. The learning rate used for the visual encoder is 1e-5, while 5e-4 is used for the LSTM parts. The embedding dimension used as input to the text-to-image framework is 256, with the learning rate set to 2e-4 for both the discriminator and the generator. We used PyTorch-based implementations for all experiments.
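For reference, these hyperparameters can be collected into a single configuration; the dictionary layout below is our own, while the values are those listed above.

```python
# Hyperparameters of Section 2.5 (optimizer choice, e.g. Adam, is an
# assumption; only the values below are stated in the text).
CONFIG = {
    "sentence_lstm": {"input_dim": 1024, "hidden_dim": 512},
    "word_lstm":     {"input_dim": 512,  "hidden_dim": 512},
    "lr_visual_encoder": 1e-5,
    "lr_lstm":           5e-4,
    "text_to_image": {"emb_dim": 256, "lr_generator": 2e-4,
                      "lr_discriminator": 2e-4},
}
```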
First, we individually trained the image-to-text and text-to-image generator modules. In the text-to-image part, we first trained the Stage 1 generator, followed by Stage 2 training after freezing the Stage 1 generator. Note that this individual training of the text-to-image module was done on original reports from the training set; when training the full cycleGAN architecture, however, the text-to-image part took the generated text as input. When both modules were trained together directly, we observed oscillations in the loss values.
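The staged schedule can be expressed compactly as below; `g_report_to_image` is assumed to expose `.stage1` and `.stage2` submodules as in the earlier sketch.

```python
import torch

def freeze(module):
    """Freeze an already-trained stage so later training cannot disturb it."""
    for p in module.parameters():
        p.requires_grad = False

def train_stage2_only(g_report_to_image, lr=2e-4):
    """Step 2 of the schedule: freeze Stage 1, optimize Stage 2 only.
    (Step 1 trains Stage 1 alone; step 3 trains the full cycle, feeding
    generated reports into the text-to-image branch.)"""
    freeze(g_report_to_image.stage1)
    return torch.optim.Adam(g_report_to_image.stage2.parameters(), lr=lr)
```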
Table 1: Comparison of generated report quality using NLG metrics.

| Metric | Ours-Cycle∗ | Ours-no-cycle | R2Gen | Multiview |
|---|---|---|---|---|
| BLEU-1 | 0.486 | 0.520 | 0.470 | 0.529 |
| BLEU-2 | 0.360 | 0.388 | 0.304 | 0.372 |
| BLEU-3 | 0.285 | 0.302 | 0.219 | 0.315 |
| BLEU-4 | 0.222 | 0.251 | 0.165 | 0.255 |
| ROUGE | 0.440 | 0.463 | 0.371 | 0.453 |

∗ Reduced training data (only frontal image–report pairs used).
3 Evaluation
3.1 Evaluation of Generated Reports
We first evaluate the quality of the generated reports via the BLEU and ROUGE metrics, and compare our performance against other methods in Table 1. Our methods with and without the cycle-consistency loss are referred to as Ours-cycle and Ours-no-cycle. Since only frontal images were used for training Ours-cycle (see Section 2.5), the training set is reduced to 3793 image–report pairs. We obtain performance comparable to the multi-view network [126] on the NLG metrics. There is a small drop in these metrics with the addition of the cycle component, mainly due to the reduction in training data (the number of image–report pairs is approximately halved).
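As a reference point, the BLEU-n scores of Table 1 can be computed from tokenized reports with NLTK as sketched below; the smoothing choice is an assumption, since implementations differ on this detail.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu_1_to_4(references, hypotheses):
    """references/hypotheses: lists of token lists, one per report."""
    refs = [[r] for r in references]   # corpus_bleu expects reference *sets*
    smooth = SmoothingFunction().method1
    weights = [(1, 0, 0, 0), (.5, .5, 0, 0),
               (1/3, 1/3, 1/3, 0), (.25, .25, .25, .25)]
    return [corpus_bleu(refs, hypotheses, weights=w, smoothing_function=smooth)
            for w in weights]
```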
3.2 Evaluation of Explanations
To evaluate the explanations, we first assess whether the generated images truly resemble the real input images, since the quality of the generated images is also representative of the quality of the model-generated reports, as discussed in earlier sections. Second, we consider the trust and faithfulness of our explanation technique, building on ideas proposed for post-hoc explanations.
3.2.1 Evaluating Similarity of Generated Images and Real X-ray Images
We quantitatively assess the images using CheXNet [137], which offers state-of-the-art performance on multi-label classification of chest x-ray images. We apply CheXNet to input-image/generated-image pairs to measure the disparity between the true and generated images, achieving a KL-divergence of 0.101. We also introduce a 'top-k' metric to identify whether the same set of diseases is identified from the input and generated images: the metric averages the number of top-predicted diseases common to both the input and the generated images.
We compare the output labels of CheXNet on the real and generated images using the top-k, Precision@k and Recall@k metrics. From Table 2, on average 1.84 predicted disease labels are common between the input and generated images when considering only the top-two ranked disease labels. In Table 2, we also show a comparison against images generated by our text-to-image (report-to-x-ray-image) model from reports produced by the recently proposed transformer-based R2Gen algorithm. Our generated images perform better on the top-k, precision and recall metrics, quantitatively showing that the reports generated by our cycleGAN model better describe the input chest x-ray image.
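A sketch of these metrics over CheXNet's 14-dimensional label probabilities is given below; the thresholding used to form the reference label set in Precision@k/Recall@k is an illustrative assumption.

```python
import numpy as np

def top_k_overlap(p_real, p_gen, k=2):
    """Average number of disease labels shared by the top-k predictions
    on real vs. generated images; p_real, p_gen: (N, 14) probabilities."""
    real_top = np.argsort(-p_real, axis=1)[:, :k]
    gen_top = np.argsort(-p_gen, axis=1)[:, :k]
    return float(np.mean([len(set(r) & set(g))
                          for r, g in zip(real_top, gen_top)]))  # e.g. 1.84

def precision_recall_at_k(p_real, p_gen, k=2, thresh=0.5):
    """Score the generated image's top-k labels against the labels
    predicted on the real image (prob > thresh)."""
    precisions, recalls = [], []
    for pr, pg in zip(p_real, p_gen):
        ref = set(np.where(pr > thresh)[0])
        top = set(np.argsort(-pg)[:k])
        hit = len(ref & top)
        precisions.append(hit / k)
        recalls.append(hit / len(ref) if ref else 0.0)
    return float(np.mean(precisions)), float(np.mean(recalls))
```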
3.2.2 Evaluating Trustability of the Explanations
We build upon the idea of trust in an explanation technique, as suggested for post-hoc explanations. An explanation method can be considered trustworthy if the generated explanations characterize the kinds of inputs on which the model performs well, i.e., closer to the ground truth. We evaluate our explanations on this aspect by testing whether the prototypical x-ray images generated as explanations are images for which the generated reports are very close to the ground-truth reports. We evaluate the similarity of the two reports (ground-truth reports and reports generated from prototypical images) by comparing the labels output by a naive Bayes classifier on the input reports. The accuracy for each of the 14 labels is summarized in Table 3. We can infer that the x-ray images generated as explanations capture the model's behaviour, as reflected in the high accuracy (around 0.9 for most labels).
Table 3: Label-wise accuracy of agreement between labels derived from ground-truth reports and from reports generated on the prototypical explanation images.

| Label | Accuracy |
|---|---|
| No Finding | 0.78 |
| Cardiomediastinum | 0.92 |
| Cardiomegaly | 0.84 |
| Lung Lesion | 0.96 |
| Lung Opacity | 0.82 |
| Edema | 0.97 |
| Consolidation | 0.96 |
| Pneumonia | 0.97 |
| Atelectasis | 0.94 |
| Pneumothorax | 0.98 |
| Pleural (E) | 0.95 |
| Pleural (O) | 0.99 |
| Fracture | 0.96 |
| Support Devices | 0.94 |
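The label-agreement accuracy of Table 3 can be computed per label as sketched below; the bag-of-words featurization and per-label binary classification are assumptions about the naive Bayes setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

def label_agreement(train_reports, train_labels, gt_reports, proto_reports):
    """Train a naive Bayes labeler for one binary label, then measure how
    often it assigns the same label to a ground-truth report and to the
    report generated from the corresponding prototypical image."""
    vec = CountVectorizer(stop_words="english")
    clf = MultinomialNB().fit(vec.fit_transform(train_reports), train_labels)
    y_gt = clf.predict(vec.transform(gt_reports))
    y_proto = clf.predict(vec.transform(proto_reports))
    return accuracy_score(y_gt, y_proto)   # one entry of Table 3
```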
3.2.3 Evaluating Faithfulness of Explanations
Another aspect explored in some explanation works is the faithfulness of the technique, i.e., whether the explanation technique is reliable. Reliability here means that the explanation reflects the underlying associations learned by the model, rather than some other correlation (such as merely detecting the presence of edges, as in object detection tasks). We test the faithfulness of the generated explanations by randomizing the weights of the report generation model and then evaluating the quality of the prototypical images, to check whether the explanation technique is faithful to the model parameters. The Top-2, Precision@2 and Recall@2 values for images generated in this case are 0.90, 0.45 and 0.06 respectively, significantly lower than the corresponding metrics in Table 2. As evident, the prototypical images generated as explanations from the randomized-weights model are unable to characterize the original input images, because the model they are explaining no longer contains the underlying information it had previously learned for characterizing chest x-ray images.
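The randomization step can be sketched as below; the choice of Gaussian re-initialization is an assumption, as any scheme that destroys the learned weights while preserving the architecture serves the purpose.

```python
import torch.nn as nn

def randomize_weights(model):
    """Destroy the learned associations while keeping the architecture;
    if the explanations are faithful, they should degrade afterwards."""
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, std=0.02)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.LSTM):
            for p in m.parameters():
                nn.init.normal_(p, std=0.02)
```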
3.2.4 Qualitative Assessment of Generated Images using Grad-CAM
We used Grad-CAM to highlight the salient image regions that the CheXNet [137] model focuses on for label prediction, on both real and generated image pairs. Two examples are shown in Fig. 2. In the left sample pair, the real image shows fibrosis as the most probable disease label, as does the generated image. As can be observed, the highlighted region showing the presence of a nodule is the same in both x-ray images, up to a left–right flip of the lungs. This shows that the report generation model was able to capture these abnormalities in detail, since the report-generated image also captures them visually. Similarly, in the second sample pair, two of the top-three labels predicted by CheXNet are the same for the real and generated images.
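A minimal Grad-CAM sketch in PyTorch is shown below; it assumes the model outputs class logits of shape (batch, classes), and for CheXNet the target layer would typically be the last convolutional block of the DenseNet-121 backbone.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Weight the target layer's activations by the spatially pooled
    gradients of the chosen class score, then upsample to image size."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)   # GAP over gradients
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))
    return F.interpolate(cam, size=image.shape[-2:],
                         mode="bilinear", align_corners=False)
```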

4 Conclusion
In this work we propose a cycleGAN-based framework for explainable medical report generation and the synthesis of coherent image–report pairs. The generated images visually characterize the text reports, and resemble the input images with respect to pathological characteristics. Extensive experiments and evaluations on the generated images and reports show that our report-generation quality is comparable to the state-of-the-art in terms of natural language generation metrics. Moreover, the generated images depict the disease attributes, both via attention maps and via quantitative measures (precision analysis, trust, and faithfulness), demonstrating the usefulness of a cycle-constrained characterization of chest x-rays for explainable medical image analysis.
References
- [1] Antony, B., Sedai, S., Mahapatra, D., Garnavi, R.: Real-time passive monitoring and assessment of pediatric eye health. In: US Patent App. 16/178,757 (2020)
- [2] Bastide, P., Kiral-Kornek, I., Mahapatra, D., Saha, S., Vishwanath, A., Cavallar, S.V.: Machine learned optimizing of health activity for participants during meeting times. In: US Patent App. 15/426,634 (2018)
- [3] Bastide, P., Kiral-Kornek, I., Mahapatra, D., Saha, S., Vishwanath, A., Cavallar, S.V.: Visual health maintenance and improvement. In: US Patent 9,993,385 (2018)
- [4] Bastide, P., Kiral-Kornek, I., Mahapatra, D., Saha, S., Vishwanath, A., Cavallar, S.V.: Crowdsourcing health improvements routes. In: US Patent App. 15/611,519 (2019)
- [5] Bozorgtabar, B., Mahapatra, D., von Teng, H., Pollinger, A., Ebner, L., Thiran, J.P., Reyes, M.: Informative sample generation using class aware generative adversarial networks for classification of chest xrays. Computer Vision and Image Understanding 184, 57–65 (2019)
- [6] Bozorgtabar, B., Mahapatra, D., von Teng, H., Pollinger, A., Ebner, L., Thiran, J.P., Reyes, M.: Informative sample generation using class aware generative adversarial networks for classification of chest xrays. In: arXiv preprint arXiv:1904.10781 (2019)
- [7] Bozorgtabar, B., Mahapatra, D., Thiran, J.P.: Exprada: Adversarial domain adaptation for facial expression analysis. In Press Pattern Recognition 100, 15–28 (2020)
- [8] Bozorgtabar, B., Mahapatra, D., Thiran, J.P., Shao, L.: SALAD: Self-supervised aggregation learning for anomaly detection on x-rays. In: In Proc. MICCAI. pp. 468–478 (2020)
- [9] Bozorgtabar, B., Mahapatra, D., Vray, G., Thiran, J.P.: Anomaly detection on x-rays using self-supervised aggregation learning. In: arXiv preprint arXiv:2010.09856 (2020)
- [10] Bozorgtabar, B., Mahapatra, D., Zlobec, I., Rau, T., Thiran, J.: Computational pathology. Frontiers in Medicine 7 (2020)
- [11] Bozorgtabar, B., Rad, M.S., Mahapatra, D., Thiran, J.P.: Syndemo: Synergistic deep feature alignment for joint learning of depth and ego-motion. In: In Proc. IEEE ICCV (2019)
- [12] Das, S.D., Dutta, S., Shah, N.A., Mahapatra, D., Ge, Z.: Anomaly detection in retinal images using multi-scale deep feature sparse coding. In: arXiv preprint arXiv:2201.11506 (2022)
- [13] Devika, K., Mahapatra, D., Subramanian, R., Oruganti, V.R.M.: Outlier-based autism detection using longitudinal structural mri. IEEE Access 10, 27794–27808 (2022). https://doi.org/10.1109/ACCESS.2022.3157613
- [14] Devika, K., Mahapatra, D., Subramanian, R., Oruganti, V.R.M.: Outlier-based autism detection using longitudinal structural mri. In: arXiv preprint arXiv:2202.09988 (2022)
- [15] Garnavi, R., Mahapatra, D., Roy, P., Tennakoon, R.: System and method to teach and evaluate image grading performance using prior learned expert knowledge base. In: US Patent App. 10,657,838 (2020)
- [16] Ge, Z., Mahapatra, D., Chang, X., Chen, Z., Chi, L., Lu, H.: Improving multi-label chest x-ray disease diagnosis by exploiting disease and health labels dependencies. In press Multimedia Tools and Application pp. 1–14 (2019)
- [17] Ge, Z., Mahapatra, D., Sedai, S., Garnavi, R., Chakravorty, R.: Chest x-rays classification: A multi-label and fine-grained problem. In: arXiv preprint arXiv:1807.07247 (2018)
- [18] Hoog, J.D., Mahapatra, D., Garnavi, R., Jalali, F.: Personalized monitoring of injury rehabilitation through mobile device imaging. In: US Patent App. 16/589,046 (2021)
- [19] Ju, L., Wang, X., Wang, L., Liu, T., Zhao, X., Drummond, T., Mahapatra, D., Ge, Z.: Relational subsets knowledge distillation for long-tailed retinal diseases recognition. In: arXiv preprint arXiv:2104.11057 (2021)
- [20] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Harandi, M., Drummond, T., Liu, T., Ge, Z.: Improving medical image classification with label noise using dual-uncertainty estimation. In: arXiv preprint arXiv:2103.00528 (2020)
- [21] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE Transactions on Medical Imaging pp. 1–1 (2022). https://doi.org/10.1109/TMI.2022.3141425
- [22] Ju, L., Wang, X., Zhao, X., Lu, H., Mahapatra, D., Bonnington, P., Ge, Z.: Synergic adversarial label learning for grading retinal diseases via knowledge distillation and multi-task learning. IEEE JBHI 100, 1–14 (2020)
- [23] Ju, L., Wang, X., Zhao, X., Lu, H., Mahapatra, D., Ge, Z.: Relational subsets knowledge distillation for long-tailed retinal diseases recognition. In: In MICCAI 2021. pp. 1–11 (2021)
- [24] Kuanar, S., Athitsos, V., Mahapatra, D., Rajan, A.: Multi-scale deep learning architecture for nucleus detection in renal cell carcinoma microscopy image. In: arXiv preprint arXiv:2104.13557 (2021)
- [25] Kuanar, S., Athitsos, V., Mahapatra, D., Rao, K., Akhtar, Z., Dasgupta, D.: Low dose abdominal ct image reconstruction: An unsupervised learning based approach. In: In Proc. IEEE ICIP. pp. 1351–1355 (2019)
- [26] Kuanar, S., Mahapatra, D., Athitsos, V., Rao, K.: Gated fusion network for sao filter and inter frame prediction in versatile video coding. In: arXiv preprint arXiv:2105.12229 (2021)
- [27] Kuanar, S., Rao, K., Mahapatra, D., Bilas, M.: Night time haze and glow removal using deep dilated convolutional network. In: arXiv preprint arXiv:1902.00855 (2019)
- [28] Kuanar, S., Athitsos, V., Mahapatra, D., Rao, K.: Gated fusion network for sao filter and inter frame prediction in versatile video coding. Signal Processing: Image Communication 109, 116839 (2022)
- [29] Kuanar, S., Mahapatra, D., Bilas, M., Rao, K.: Multi-path dilated convolution network for haze and glow removal in night time images. The Visual Computer 38(3), 1121–1134 (2022)
- [30] Kuang, H., Guthier, B., Saini, M., Mahapatra, D., Saddik, A.E.: A real-time smart assistant for video surveillance through handheld devices. In: In Proc: ACM Intl. Conf. Multimedia. pp. 917–920 (2014)
- [31] Li, Z., Mahapatra, D., J.Tielbeek, Stoker, J., van Vliet, L., Vos, F.: Image registration based on autocorrelation of local structure. IEEE Trans. Med. Imaging 35(1), 63–75 (2016)
- [32] Mahapatra, D.: Elastic registration of cardiac perfusion images using saliency information. Sequence and Genome Analysis – Methods and Applications pp. 351–364 (2011)
- [33] Mahapatra, D.: Neonatal brain mri skull stripping using graph cuts and shape priors. In: In Proc: MICCAI workshop on Image Analysis of Human Brain Development (IAHBD) (2011)
- [34] Mahapatra, D.: Cardiac lv and rv segmentation using mutual context information. In: Proc. MICCAI-MLMI. pp. 201–209 (2012)
- [35] Mahapatra, D.: Groupwise registration of dynamic cardiac perfusion images using temporal information and segmentation information. In: In Proc: SPIE Medical Imaging (2012)
- [36] Mahapatra, D.: Landmark detection in cardiac mri using learned local image statistics. In: Proc. MICCAI-Statistical Atlases and Computational Models of the Heart. Imaging and Modelling Challenges (STACOM). pp. 115–124 (2012)
- [37] Mahapatra, D.: Skull stripping of neonatal brain mri: Using prior shape information with graphcuts. J. Digit. Imaging 25(6), 802–814 (2012)
- [38] Mahapatra, D.: Cardiac image segmentation from cine cardiac mri using graph cuts and shape priors. J. Digit. Imaging 26(4), 721–730 (2013)
- [39] Mahapatra, D.: Cardiac mri segmentation using mutual context information from left and right ventricle. J. Digit. Imaging 26(5), 898–908 (2013)
- [40] Mahapatra, D.: Graph cut based automatic prostate segmentation using learned semantic information. In: Proc. IEEE ISBI. pp. 1304–1307 (2013)
- [41] Mahapatra, D.: Joint segmentation and groupwise registration of cardiac perfusion images using temporal information. J. Digit. Imaging 26(2), 173–182 (2013)
- [42] Mahapatra, D.: An automated approach to cardiac rv segmentation from mri using learned semantic information and graph cuts. J. Digit. Imaging. 27(6), 794–804 (2014)
- [43] Mahapatra, D.: Combining multiple expert annotations using semi-supervised learning and graph cuts for medical image segmentation. Computer Vision and Image Understanding 151(1), 114–123 (2016)
- [44] Mahapatra, D.: Retinal image quality classification using neurobiological models of the human visual system. In: In Proc. MICCAI-OMIA. pp. 1–8 (2016)
- [45] Mahapatra, D.: Consensus based medical image segmentation using semi-supervised learning and graph cuts. In: arXiv preprint arXiv:1612.02166 (2017)
- [46] Mahapatra, D.: Semi-supervised learning and graph cuts for consensus based medical image segmentation. Pattern Recognition 63(1), 700–709 (2017)
- [47] Mahapatra, D.: Amd severity prediction and explainability using image registration and deep embedded clustering. In: arXiv preprint arXiv:1907.03075 (2019)
- [48] Mahapatra, D.: Generative adversarial networks and domain adaptation for training data independent image registration. In: arXiv preprint arXiv:1910.08593 (2019)
- [49] Mahapatra, D.: Registration of histopathogy images using structural information from fine grained feature maps. In: arXiv preprint arXiv:2007.02078 (2020)
- [50] Mahapatra, D.: Interpretability-driven sample selection using self supervised learning for disease classification and segmentation. In: arXiv preprint arXiv:2104.06087 (2021)
- [51] Mahapatra, D.: Learning of inter-label geometric relationships using self-supervised learning: Application to gleason grade segmentation. In: arXiv preprint arXiv:2110.00404 (2021)
- [52] Mahapatra, D., Agarwal, K., Khosrowabadi, R., Prasad, D.: Recent advances in statistical data and signal analysis: Application to real world diagnostics from medical and biological signals. In: Computational and mathematical methods in medicine (2016)
- [53] Mahapatra, D., Antony, B., Sedai, S., Garnavi, R.: Deformable medical image registration using generative adversarial networks. In: In Proc. IEEE ISBI. pp. 1449–1453 (2018)
- [54] Mahapatra, D., Bozorgtabar, B.: Retinal vasculature segmentation using local saliency maps and generative adversarial networks for image super resolution. In: arXiv preprint arXiv:1710.04783 (2017)
- [55] Mahapatra, D., Bozorgtabar, B.: Progressive generative adversarial networks for medical image super resolution. In: arXiv preprint arXiv:1902.02144 (2019)
- [56] Mahapatra, D., Bozorgtabar, B., Garnavi, R.: Image super-resolution using progressive generative adversarial networks for medical image analysis. Computerized Medical Imaging and Graphics 71, 30–39 (2019)
- [57] Mahapatra, D., Bozorgtabar, B., Ge, Z.: Medical image classification using generalized zero shot learning. In: In IEEE CVAMD 2021. pp. 3344–3353 (2021)
- [58] Mahapatra, D., Bozorgtabar, B., Kuanar, S., Ge, Z.: Self-supervised multimodal generalized zero shot learning for gleason grading. In: In MICCAI-DART 2021. pp. 1–11 (2021)
- [59] Mahapatra, D., Bozorgtabar, B., Shao, L.: Pathological retinal region segmentation from oct images using geometric relation based augmentation. In: In Proc. IEEE CVPR. pp. 9611–9620 (2020)
- [60] Mahapatra, D., Bozorgtabar, B., Thiran, J.P., Shao, L.: Pathological retinal region segmentation from oct images using geometric relation based augmentation. In: arXiv preprint arXiv:2003.14119 (2020)
- [61] Mahapatra, D., Bozorgtabar, B., Thiran, J.P., Shao, L.: Structure preserving stain normalization of histopathology images using self supervised semantic guidance. In: In Proc. MICCAI. pp. 309–319 (2020)
- [62] Mahapatra, D., Bozorgtabar, B., Thiran, J.P., Shao, L.: Structure preserving stain normalization of histopathology images using self supervised semantic guidance. In: arXiv preprint arXiv:2008.02101 (2020)
- [63] Mahapatra, D., Bozorgtabar, S., Hewavitahranage, S., Garnavi, R.: Image super resolution using generative adversarial networks and local saliencymaps for retinal image analysis,. In: In Proc. MICCAI. pp. 382–390 (2017)
- [64] Mahapatra, D., Bozorgtabar, S., Thiran, J.P., Reyes, M.: Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network. In: In Proc. MICCAI (2). pp. 580–588 (2018)
- [65] Mahapatra, D., Buhmann, J.: Obtaining consensus annotations for retinal image segmentation using random forest and graph cuts. In: In Proc. OMIA. pp. 41–48 (2015)
- [66] Mahapatra, D., Buhmann, J.: Visual saliency based active learning for prostate mri segmentation. In: In Proc. MLMI. pp. 9–16 (2015)
- [67] Mahapatra, D., Buhmann, J.: Visual saliency-based active learning for prostate magnetic resonance imaging segmentation. SPIE Journal of Medical Imaging 3(1), 014003 (2016)
- [68] Mahapatra, D., Buhmann, J.: Automatic cardiac rv segmentation using semantic information with graph cuts. In: Proc. IEEE ISBI. pp. 1094–1097 (2013)
- [69] Mahapatra, D., Buhmann, J.: Analyzing training information from random forests for improved image segmentation. IEEE Trans. Imag. Proc. 23(4), 1504–1512 (2014)
- [70] Mahapatra, D., Buhmann, J.: Prostate mri segmentation using learned semantic knowledge and graph cuts. IEEE Trans. Biomed. Engg. 61(3), 756–764 (2014)
- [71] Mahapatra, D., Buhmann, J.: A field of experts model for optic cup and disc segmentation from retinal fundus images. In: In Proc. IEEE ISBI. pp. 218–221 (2015)
- [72] Mahapatra, D., Garnavi, R., Roy, P., Tennakoon, R.: System and method to teach and evaluate image grading performance using prior learned expert knowledge base. In: US Patent App. 15/459,457 (2018)
- [73] Mahapatra, D., Garnavi, R., Roy, P., Tennakoon, R.: System and method to teach and evaluate image grading performance using prior learned expert knowledge base. In: US Patent App. 15/814,590 (2018)
- [74] Mahapatra, D., Garnavi, R., Sedai, S., Roy, P.: Joint segmentation and characteristics estimation in medical images. In: US Patent App. 15/234,426 (2017)
- [75] Mahapatra, D., Garnavi, R., Sedai, S., Roy, P.: Retinal image quality assessment, error identification and automatic quality correction. In: US Patent 9,779,492 (2017)
- [76] Mahapatra, D., Garnavi, R., Sedai, S., Tennakoon, R.: Classification of severity of pathological condition using hybrid image representation. In: US Patent App. 15/426,634 (2018)
- [77] Mahapatra, D., Garnavi, R., Sedai, S., Tennakoon, R.: Generating an enriched knowledge base from annotated images. In: US Patent App. 15/429,735 (2018)
- [78] Mahapatra, D., Garnavi, R., Sedai, S., Tennakoon, R., Chakravorty, R.: Early prediction of age related macular degeneration by image reconstruction. In: US Patent App. 15/854,984 (2018)
- [79] Mahapatra, D., Garnavi, R., Sedai, S., Tennakoon, R., Chakravorty, R.: Early prediction of age related macular degeneration by image reconstruction. In: US Patent 9,943,225 (2018)
- [80] Mahapatra, D., Ge, Z.: Combining transfer learning and segmentation information with gans for training data independent image registration. In: arXiv preprint arXiv:1903.10139 (2019)
- [81] Mahapatra, D., Ge, Z.: Training data independent image registration with gans using transfer learning and segmentation information. In: In Proc. IEEE ISBI. pp. 709–713 (2019)
- [82] Mahapatra, D., Ge, Z.: Training data independent image registration using generative adversarial networks and domain adaptation. Pattern Recognition 100, 1–14 (2020)
- [83] Mahapatra, D., Ge, Z., Sedai, S.: Joint registration and segmentation of images using deep learning. In: US Patent App. 16/001,566 (2019)
- [84] Mahapatra, D., Ge, Z., Sedai, S., Chakravorty., R.: Joint registration and segmentation of xray images using generative adversarial networks. In: In Proc. MICCAI-MLMI. pp. 73–80 (2018)
- [85] Mahapatra, D., Gilani, S., Saini., M.: Coherency based spatio-temporal saliency detection for video object segmentation. IEEE Journal of Selected Topics in Signal Processing. 8(3), 454–462 (2014)
- [86] Mahapatra, D., J.Tielbeek, Makanyanga, J., Stoker, J., Taylor, S., Vos, F., Buhmann, J.: Automatic detection and segmentation of crohn’s disease tissues from abdominal mri. IEEE Trans. Med. Imaging 32(12), 1232–1248 (2013)
- [87] Mahapatra, D., J.Tielbeek, Makanyanga, J., Stoker, J., Taylor, S., Vos, F., Buhmann, J.: Active learning based segmentation of crohn’s disease using principles of visual saliency. In: Proc. IEEE ISBI. pp. 226–229 (2014)
- [88] Mahapatra, D., J.Tielbeek, Makanyanga, J., Stoker, J., Taylor, S., Vos, F., Buhmann, J.: Combining multiple expert annotations using semi-supervised learning and graph cuts for crohn’s disease segmentation. In: In Proc: MICCAI-ABD (2014)
- [89] Mahapatra, D., J.Tielbeek, Vos, F., Buhmann, J.: A supervised learning approach for crohn’s disease detection using higher order image statistics and a novel shape asymmetry measure. J. Digit. Imaging 26(5), 920–931 (2013)
- [90] Mahapatra, D., Kuanar, S., Bozorgtabar, B., Ge, Z.: Self-supervised learning of inter-label geometric relationships for gleason grade segmentation. In: In MICCAI-DART 2021. pp. 57–67 (2021)
- [91] Mahapatra, D., Li, Z., Vos, F., Buhmann, J.: Joint segmentation and groupwise registration of cardiac dce mri using sparse data representations. In: In Proc. IEEE ISBI. pp. 1312–1315 (2015)
- [92] Mahapatra, D., Routray, A., Mishra, C.: An active snake model for classification of extreme emotions. In: IEEE International Conference on Industrial Technology (ICIT). pp. 2195–2199 (2006)
- [93] Mahapatra, D., Roy, P., Sedai, S., Garnavi, R.: A cnn based neurobiology inspired approach for retinal image quality assessment. In: In Proc. EMBC. pp. 1304–1307 (2016)
- [94] Mahapatra, D., Roy, P., Sedai, S., Garnavi, R.: Retinal image quality classification using saliency maps and cnns. In: In Proc. MICCAI-MLMI. pp. 172–179 (2016)
- [95] Mahapatra, D., Roy, S., Sun, Y.: Retrieval of mr kidney images by incorporating shape information in histogram of low level features. In: In 13th International Conference on Biomedical Engineering. pp. 661–664 (2009)
- [96] Mahapatra, D., Saha, S., Vishwanath, A., Bastide, P.: Generating hyperspectral image database by machine learning and mapping of color images to hyperspectral domain. In: US Patent App. 15/949,528 (2019)
- [97] Mahapatra, D., Saini, M.: A particle filter framework for object tracking using visual-saliency information. Intelligent Multimedia Surveillance pp. 133–147 (2013)
- [98] Mahapatra, D., Saini, M., Sun, Y.: Illumination invariant tracking in office environments using neurobiology-saliency based particle filter. In: IEEE ICME. pp. 953–956 (2008)
- [99] Mahapatra, D., Schffler, P., Tielbeek, J., Vos, F., Buhmann, J.: Semi-supervised and active learning for automatic segmentation of crohn’s disease. In: Proc. MICCAI, Part 2. pp. 214–221 (2013)
- [100] Mahapatra, D., Sedai, S., Garnavi, R.: Elastic registration of medical images with gans. In: arXiv preprint arXiv:1805.02369 (2018)
- [101] Mahapatra, D., Sedai, S., Halupka, K.: Uncertainty region based image enhancement. In: US Patent App. 10,832,074 (2020)
- [102] Mahapatra, D., Singh, A.: Ct image synthesis using weakly supervised segmentation and geometric inter-label relations for covid image analysis. In: arXiv preprint arXiv:2106.10230 (2021)
- [103] Mahapatra, D., Sun, Y.: Nonrigid registration of dynamic renal MR images using a saliency based MRF model. In: Proc. MICCAI. pp. 771–779 (2008)
- [104] Mahapatra, D., Sun, Y.: Registration of dynamic renal mr images using neurobiological model of saliency. In: Proc. ISBI. pp. 1119–1122 (2008)
- [105] Mahapatra, D., Sun, Y.: Using saliency features for graphcut segmentation of perfusion kidney images. In: In 13th International Conference on Biomedical Engineering (2008)
- [106] Mahapatra, D., Sun, Y.: Joint registration and segmentation of dynamic cardiac perfusion images using mrfs. In: Proc. MICCAI. pp. 493–501 (2010)
- [107] Mahapatra, D., Sun, Y.: Mrf based joint registration and segmentation of dynamic renal mr images. In: Second International Conference on Digital Image Processing. vol. 7546, pp. 285–290 (2010)
- [108] Mahapatra, D., Sun., Y.: An mrf framework for joint registration and segmentation of natural and perfusion images. In: Proc. IEEE ICIP. pp. 1709–1712 (2010)
- [109] Mahapatra, D., Sun, Y.: Retrieval of perfusion images using cosegmentation and shape context information. In: Proc. APSIPA Annual Summit and Conference (ASC). vol. 35 (2010)
- [110] Mahapatra, D., Sun, Y.: Rigid registration of renal perfusion images using a neurobiology based visual saliency model. EURASIP Journal on Image and Video Processing. pp. 1–16 (2010)
- [111] Mahapatra, D., Sun, Y.: Mrf based intensity invariant elastic registration of cardiac perfusion images using saliency information. IEEE Trans. Biomed. Engg. 58(4), 991–1000 (2011)
- [112] Mahapatra, D., Sun, Y.: Orientation histograms as shape priors for left ventricle segmentation using graph cuts. In: In Proc: MICCAI. pp. 420–427 (2011)
- [113] Mahapatra, D., Sun, Y.: Integrating segmentation information for improved mrf-based elastic image registration. IEEE Trans. Imag. Proc. 21(1), 170–183 (2012)
- [114] Mahapatra, D., Tielbeek, J., Buhmann, J., Vos, F.: A supervised learning based approach to detect crohn’s disease in abdominal mr volumes. In: Proc. MICCAI workshop Computational and Clinical Applications in Abdominal Imaging(MICCAI-ABD). pp. 97–106 (2012)
- [115] Mahapatra, D., Tielbeek, J., Vos, F., Buhmann, J.: Crohn's disease tissue segmentation from abdominal mri using semantic information and graph cuts. In: Proc. IEEE ISBI. pp. 358–361 (2013)
- [116] Mahapatra, D., Tielbeek, J., Vos, F., Buhmann, J.: Localizing and segmenting crohn’s disease affected regions in abdominal mri using novel context features. In: Proc. SPIE Medical Imaging (2013)
- [117] Mahapatra, D., Tielbeek, J., Vos, F., Buhmann, J.: Weakly supervised semantic segmentation of crohn’s disease tissues from abdominal mri. In: Proc. IEEE ISBI. pp. 832–835 (2013)
- [118] Mahapatra, D., Vos, F., Buhmann, J.: Crohn’s disease segmentation from mri using learned image priors. In: In Proc. IEEE ISBI. pp. 625–628 (2015)
- [119] Mahapatra, D., Vos, F., Buhmann, J.: Active learning based segmentation of crohns disease from abdominal mri. Computer Methods and Programs in Biomedicine 128(1), 75–85 (2016)
- [120] Mahapatra, D., Winkler, S., Yen, S.: Motion saliency outweighs other low-level features while watching videos. In: SPIE HVEI. pp. 1–10 (2008)
- [121] Mahapatra, D.: Registration and segmentation methodology for perfusion mr images: Application to cardiac and renal images (2011)
- [122] Mahapatra, D.: Registration and segmentation methodology for perfusion mr images: Application to cardiac and renal images (2011)
- [123] Mahapatra, D.: Multimodal generalized zero shot learning for gleason grading using self-supervised learning. In: arXiv preprint arXiv:2111.07646 (2021)
- [124] Mahapatra, D.: Generalized zero shot learning for medical image classification. In: arXiv preprint arXiv:2204.01728 (2022)
- [125] Mahapatra, D.: Improved super resolution of mr images using cnns and vision transformers. In: arXiv preprint arXiv:2207.11748 (2022)
- [126] Mahapatra, D.: Improved super resolution of mr images using cnns and vision transformers. In: arXiv preprint arXiv:2207.11748 (2022)
- [127] Mahapatra, D.: Improved super resolution of mr images using cnns and vision transformers. In: arXiv preprint arXiv:2207.11748 (2022)
- [128] Mahapatra, D.: Unsupervised domain adaptation using feature disentanglement and gcns for medical image classification. In: arXiv preprint arXiv:2206.13123 (2022)
- [129] Mahapatra, D., Ge, Z.: MR image super resolution by combining feature disentanglement CNNs and vision transformers. In: Medical Imaging with Deep Learning (2022)
- [130] Mahapatra, D., Ge, Z.: Mr image super resolution by combining feature disentanglement cnns and vision transformers (2022)
- [131] Mahapatra, D., Ge, Z., Reyes, M.: Self-supervised generalized zero shot learning for medical image classification using novel interpretable saliency maps. IEEE Transactions on Medical Imaging pp. 1–1 (2022). https://doi.org/10.1109/TMI.2022.3163232
- [132] Mahapatra, D., Korevaar, S., Tennakoon, R.: Gcn based unsupervised domain adaptation with feature disentanglement for medical image classification (2022)
- [133] Mahapatra, D., Poellinger, A., Reyes, M.: Graph node based interpretability guided sample selection for active learning. IEEE Transactions on Medical Imaging pp. 1–1 (2022). https://doi.org/10.1109/TMI.2022.3215017
- [134] Mahapatra, D., Poellinger, A., Reyes, M.: Interpretability-guided inductive bias for deep learning based medical image classification and segmentation. Medical Image Analysis p. 102551 (2022)
- [135] Mahapatra, D., Poellinger, A., Shao, L., Reyes, M.: Interpretability-driven sample selection using self supervised learning for disease classification and segmentation. IEEE TMI pp. 1–15 (2021)
- [136] Pandey, A., Paliwal, B., Dhall, A., Subramanian, R., Mahapatra, D.: This explains that: Congruent image–report generation for explainable medical image analysis with cyclic generative adversarial networks. In: In MICCAI-iMIMIC 2021. pp. 1–11 (2021)
- [137] Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., Lungren, M.P., Ng, A.: Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. In: arXiv preprint arXiv:1711.05225, (2017)
- [138] Roy, P., Chakravorty, R., Sedai, S., Mahapatra, D., Garnavi, R.: Automatic eye type detection in retinal fundus image using fusion of transfer learning and anatomical features. In: In Proc. DICTA. pp. 1–7 (2016)
- [139] Roy, P., Tennakoon, R., Cao, K., Sedai, S., Mahapatra, D., Maetschke, S., Garnavi, R.: A novel hybrid approach for severity assessment of diabetic retinopathy in colour fundus images,. In: In Proc. IEEE ISBI. pp. 1078–1082 (2017)
- [140] Roy, P., Mahapatra, D., Garnavi, R., Tennakoon, R.: System and method to teach and evaluate image grading performance using prior learned expert knowledge base. In: US Patent App. 10,984,674 (2021)
- [141] Saini, M., Guthier, B., Kuang, H., Mahapatra, D., Saddik, A.: szoom: A framework for automatic zoom into high resolution surveillance videos. In: arXiv preprint arXiv:1909.10164 (2019)
- [142] Schüffler, P., Mahapatra, D., Tielbeek, J., Vos, F., Makanyanga, J., Pendsé, D., Nio, C., Stoker, J., Taylor, S., Buhmann, J.: A model development pipeline for crohn's disease severity assessment from magnetic resonance images. In: In Proc: MICCAI-ABD (2013)
- [143] Schüffler, P., Mahapatra, D., Tielbeek, J., Vos, F., Makanyanga, J., Pendsé, D., Nio, C., Stoker, J., Taylor, S., Buhmann, J.: Semi automatic crohn's disease severity assessment on mr imaging. In: In Proc: MICCAI-ABD (2014)
- [144] Schüffler, P.J., Mahapatra, D., Vos, F.M., Buhmann, J.M.: Computer aided crohn’s disease severity assessment in mri. In: VIGOR++ Workshop 2014-Showcase of Research Outcomes and Future Outlook. pp. – (2014)
- [145] Sedai, S., Mahapatra, D., Antony, B., Garnavi, R.: Joint segmentation and uncertainty visualization of retinal layers in optical coherence tomography images using bayesian deep learning. In: In Proc. MICCAI-OMIA. pp. 219–227 (2018)
- [146] Sedai, S., Mahapatra, D., Ge, Z., Chakravorty, R., Garnavi, R.: Deep multiscale convolutional feature learning for weakly supervised localization of chest pathologies in x-ray images. In: In Proc. MICCAI-MLMI. pp. 267–275 (2018)
- [147] Sedai, S., Mahapatra, D., Hewavitharanage, S., Maetschke, S., Garnavi, R.: Semi-supervised segmentation of optic cup in retinal fundus images using variational autoencoder,. In: In Proc. MICCAI. pp. 75–82 (2017)
- [148] Sedai, S., Roy, P., Mahapatra, D., Garnavi, R.: Segmentation of optic disc and optic cup in retinal fundus images using shape regression. In: In Proc. EMBC. pp. 3260–3264 (2016)
- [149] Sedai, S., Roy, P., Mahapatra, D., Garnavi, R.: Segmentation of optic disc and optic cup in retinal images using coupled shape regression. In: In Proc. MICCAI-OMIA. pp. 1–8 (2016)
- [150] van Sonsbeek, T., Zhen, X., Mahapatra, D., Worring, M.: Probabilistic integration of object level annotations in chest x-ray classification. In: arXiv preprint arXiv:2210.06980 (2022)
- [151] Srivastava, S., Yaqub, M., Nandakumar, K., Ge, Z., Mahapatra, D.: Continual domain incremental learning for chest x-ray classification in low-resource clinical settings. In: In MICCAI-FAIR 2021. pp. 1–11 (2021)
- [152] Tennakoon, R., Mahapatra, D., Roy, P., Sedai, S., Garnavi, R.: Image quality classification for dr screening using convolutional neural networks. In: In Proc. MICCAI-OMIA. pp. 113–120 (2016)
- [153] Tong, J., Mahapatra, D., Bonnington, P., Drummond, T., Ge, Z.: Registration of histopathology images using self supervised fine grained feature maps. In: In Proc. MICCAI-DART Workshop. pp. 41–51 (2020)
- [154] Verma, R., Kumar, N., Patil, A., Kurian, N.C., Rane, S., Graham, S., Vu, Q.D., Zwager, M., Raza, S.E.A., Rajpoot, N., Wu, X., Chen, H., Huang, Y., Wang, L., Jung, H., Brown, G.T., Liu, Y., Liu, S., Jahromi, S.A.F., Khani, A.A., Montahaei, E., Baghshah, M.S., Behroozi, H., Semkin, P., Rassadin, A., Dutande, P., Lodaya, R., Baid, U., Baheti, B., Talbar, S., Mahbod, A., Ecker, R., Ellinger, I., Luo, Z., Dong, B., Xu, Z., Yao, Y., Lv, S., Feng, M., Xu, K., Zunair, H., Hamza, A.B., Smiley, S., Yin, T.K., Fang, Q.R., Srivastava, S., Mahapatra, D., Trnavska, L., Zhang, H., Narayanan, P.L., Law, J., Yuan, Y., Tejomay, A., Mitkari, A., Koka, D., Ramachandra, V., Kini, L., Sethi, A.: Monusac2020: A multi-organ nuclei segmentation and classification challenge. IEEE Transactions on Medical Imaging 40(12), 3413–3423 (2021). https://doi.org/10.1109/TMI.2021.3085712
- [155] Vos, F.M., Tielbeek, J., Naziroglu, R., Li, Z., Schffler, P., Mahapatra, D., Wiebel, A., Lavini, C., Buhmann, J., Hege, H., Stoker, J., van Vliet, L.: Computational modeling for assessment of IBD: to be or not to be? In: Proc. IEEE EMBC. pp. 3974–3977 (2012)
- [156] Xing, Y., Ge, Z., Zeng, R., Mahapatra, D., Seah, J., Law, M., Drummond, T.: Adversarial pulmonary pathology translation for pairwise chest x-ray data augmentation. In: In Proc. MICCAI. pp. 757–765 (2019)
- [157] Zhu, J., Park, T., Isola, P., Efros, A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: arXiv preprint arXiv:1703.10593 (2017)
- [158] Zilly, J., Buhmann, J., Mahapatra, D.: Boosting convolutional filters with entropy sampling for optic cup and disc image segmentation from fundus images. In: In Proc. MLMI. pp. 136–143 (2015)
- [159] Zilly, J., Buhmann, J., Mahapatra, D.: Glaucoma detection using entropy sampling and ensemble learning for automatic optic cup and disc segmentation. In Press Computerized Medical Imaging and Graphics 55(1), 28–41 (2017)