\addauthor{Manu Sheoran}{[email protected]}{1}
\addauthor{Meghal Dani}{[email protected]}{1}
\addauthor{Monika Sharma}{[email protected]}{1}
\addauthor{Lovekesh Vig}{[email protected]}{1}
\addinstitution{
Deep Learning and Artificial Intelligence Team (DLAI),\\
TCS Research,\\
New Delhi, India
}
DKMA-ULD Rebuttal
Response to reviews
We thank all the reviewers for their valuable comments on our work and for recommending that it be considered for revision. We provide our responses to the comments inline below:
1 Reviewer Comments and Changes Made
1.1 Reviewer 1 - Reject
•
Why do authors think the multi-organ segmentation performance improvement of DKMA-ULD is caused by the proposed architecture?: DeepLesion is a heterogeneous dataset with lesions annotated across multiple organs of the body, used to train a universal lesion detector (ULD). Existing ULD methods utilize multiple neighboring slices (3 to 27) of a patient’s CT-scan to provide 3D context to the network, but they do not explicitly provide organ-specific texture information and rely solely on the network to learn it implicitly. In contrast, we explicitly provide multi-intensity images using 5 HU windows to augment the network with extra information and make the heterogeneous data easier to learn (see the windowing sketch below). In addition, this multi-window information needs to be properly fused with the extracted feature maps before they are passed to an RPN for lesion detection. To that end, we proposed a novel attention-based feature fusion mechanism. In Table 2, we present an ablation study showing how adding each element to the network improves detection sensitivity. In Figure 3(b), we further support our claim that augmenting the ULD network with domain knowledge improves lesion detection performance across all available organs.
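For concreteness, the following is a minimal sketch of the multi-intensity input described above. The five HU (center, width) window pairs are illustrative placeholders and not necessarily the exact values used in our experiments:
\begin{verbatim}
import numpy as np

# Five illustrative HU (center, width) windows; placeholder values,
# not necessarily those used in the paper.
HU_WINDOWS = [(-600, 1500),   # lung-like window
              (40, 400),      # soft-tissue-like window
              (40, 80),       # brain-like window
              (60, 160),      # liver-like window
              (300, 2000)]    # bone-like window

def to_multi_intensity(ct_slice_hu):
    """Map one CT slice (in Hounsfield units) to a 5-channel image,
    one channel per HU window, each rescaled to [0, 1]."""
    channels = []
    for center, width in HU_WINDOWS:
        lo, hi = center - width / 2.0, center + width / 2.0
        win = np.clip(ct_slice_hu, lo, hi)
        channels.append((win - lo) / (hi - lo))
    return np.stack(channels, axis=0)   # shape: (5, H, W)
\end{verbatim}
Each channel highlights a different tissue intensity range, which is how the network receives the organ-specific texture cues discussed above.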
•
What is the training and testing data of [27]? Is it a fair comparison?: MELD[27] used multiple datasets, including LUNA, LiTS, NIH-Lymph, and the DeepLesion training set, for its training, and reported performance on the official DeepLesion test-set, whereas we train only on DeepLesion. The comparison is therefore skewed in favor of MELD, yet we demonstrate in Table 1 that we outperform MELD[27] on the DeepLesion test-set despite training on less data, as we used only the official DeepLesion training set for DKMA-ULD.
•
Comparison and fairness with MELD[27]: As mentioned in Table 3, the base MELD network achieves a certain sensitivity on the DeepLesion test-set, which is further improved by utilizing Missing Annotation Matching (MAM) and Negative Region Mining (NRM). On the other hand, the reported sensitivity of our proposed baseline method is obtained without self-supervision. Similar to self-supervision, techniques like MAM and NRM can be used to further improve the sensitivity of any given architecture. We have added the above-mentioned details to Table 1 of the main paper, from which it is evident that the sensitivity of our proposed baseline method is higher than that of the MELD[27] baseline at all FP levels.
•
Add the performance of [27] and [31] in Figure 3(a): We have appended the results for MULAN[31]; as shown in Figure 3(a), our model outperforms the previous SOTA. For reference, we also present the average and organ-wise sensitivity of [31] in Figure 4(a) and Figure 4(b), respectively (refer to the supplementary material for the exact values). Since MELD[27] has not released its models and code publicly, we could not provide lesion-size-wise and organ-wise sensitivity comparisons for MELD, but we have provided them for MULAN.
1.2 Reviewer 2 - Borderline Accept
•
In the experimental section, there are too few comparison methods: We have added detailed results comparing sensitivity across different lesion sizes and organs in Figures 3 and 4 (kindly refer to the supplementary material for the exact values). Our DKMA-ULD gives state-of-the-art results in all cases.
1.3 Reviewer 3 - Borderline Accept
•
The paper has some benefits but its contribution is mildly novel: We replicate the way a radiologist considers multi-organ information, by using multiple HU windows and fusing the relevant information through our proposed attention mechanism before making the final prediction (see the fusion sketch below). The multi-intensity images impart multi-organ information to the network, making it more generalizable and more robust when detecting lesions in new organs.
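To illustrate the fusion step, the following is a minimal PyTorch sketch. It is an illustrative stand-in rather than the exact module from the paper: it treats the per-window feature maps at each spatial location as a sequence and fuses them with multi-head self-attention:
\begin{verbatim}
import torch
import torch.nn as nn

class WindowFusionAttention(nn.Module):
    """Illustrative attention-based fusion of per-window feature maps
    (a stand-in, not the paper's exact module)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads,
                                          batch_first=True)

    def forward(self, feats):
        # feats: (B, W, C, H, Wd) -- one C-channel map per HU window
        b, w, c, h, wd = feats.shape
        seq = feats.permute(0, 3, 4, 1, 2).reshape(b * h * wd, w, c)
        fused, _ = self.attn(seq, seq, seq)  # attend across the W windows
        fused = fused.mean(dim=1)            # pool the window dimension
        return fused.reshape(b, h, wd, c).permute(0, 3, 1, 2)

# Usage: fuse five 256-channel feature maps before the RPN.
# fusion = WindowFusionAttention(channels=256)
# out = fusion(torch.randn(2, 5, 256, 32, 32))  # -> (2, 256, 32, 32)
\end{verbatim}
The fused map has the same shape as a single-window feature map, so it can be passed to the RPN unchanged.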
•
Very niche considering the scope of BMVC: The scope of the proposed architecture is not limited to the medical domain; it can be used for object detection in general. For example, one can compute multi-frequency views of a given image in the spectral domain and fuse this multi-view spectral information with the same attention mechanism for robust object detection on natural images (a hypothetical sketch follows). Moreover, the architecture is able to detect objects of varying sizes, which are common in real-world scenarios (aerial imagery, surveillance, etc.), improving overall detection performance.
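As a purely hypothetical sketch of this extension, the decomposition below splits a grayscale image into radial frequency bands via the 2-D FFT, producing a multi-view stack analogous to the multi-intensity HU channels; the function name and band boundaries are our own illustrative choices:
\begin{verbatim}
import torch

def multi_frequency_views(img, n_bands=5):
    """Split a grayscale image (H, W) into n_bands radial frequency
    bands via the 2-D FFT; a hypothetical analogue of HU windowing."""
    h, w = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img))
    yy, xx = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    r = torch.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    r = r / r.max()
    views = []
    for i in range(n_bands):
        lo, hi = i / n_bands, (i + 1) / n_bands
        upper = (r < hi) if i < n_bands - 1 else (r <= hi)
        mask = ((r >= lo) & upper).to(spec.dtype)
        band = torch.fft.ifft2(torch.fft.ifftshift(spec * mask)).real
        views.append(band)
    return torch.stack(views, dim=0)   # shape: (n_bands, H, W)
\end{verbatim}
The resulting views could then be fed through the same per-window backbone and attention-based fusion as in the CT setting.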