

1 Institute for Medical Imaging Technology, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
Email: [email protected]
2 Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China

mr2NST: Multi-Resolution and Multi-Reference Neural Style Transfer for Mammography

Sheng Wang¹·², Jiayu Huo¹·², Xi Ouyang¹·², Jifei Che², Xuhua Ren¹, Zhong Xue², Qian Wang¹, and Jie-Zhi Cheng²
Abstract

Computer-aided diagnosis (CAD) with deep learning techniques has been shown in many clinical studies to be helpful for the diagnosis of mammography. However, the image styles of different vendors are very distinctive, and the resulting domain gap among vendors could potentially compromise the universal applicability of a single deep learning model. In this study, we explicitly address the style variation issue with the proposed multi-resolution and multi-reference neural style transfer (mr2NST) network. The mr2NST normalizes the styles of different vendors to the same style baseline at very high resolution. We show that the image quality of the transferred images is comparable to the quality of original images of the target domain (vendor) in terms of NIMA scores. Meanwhile, the mr2NST results are also shown to be helpful for lesion detection in mammograms.

Keywords:
Mammography · Neural Style Transfer · Style Normalization.

1 Introduction

Mammography is a widely used screening tool for breast cancer. Many studies [2, 5] have shown that incorporating CAD software into the mammography reading workflow can improve the diagnostic workup. Equipped with deep learning (DL) techniques, CAD schemes have further been shown to outperform radiologists from multiple centers across several western countries [7]. Although promising CAD performance for mammography has been demonstrated in many previous studies, one essential issue has not been well and explicitly addressed in previous DL works. As shown in Fig. 1, the image styles of different vendors, such as image contrast and edge sharpness, differ considerably. Accordingly, a DL-based CAD scheme may not always perform well on mammograms from different vendors unless sufficiently large and diverse training data are provided. Because collecting large numbers of mammograms from various vendors can be very difficult and expensive, we here propose a mammographic style transfer (mST) scheme to normalize the image styles of different vendors to the same style baseline. It will be shown that the style normalization step with the mST scheme can further boost the robustness of the classic Faster-RCNN detector [6] on mammograms from different vendors and improve the detection performance for masses and microcalcifications, denoted as μCs for short throughout this paper.

The direct use of off-the-shelf neural style transfer (NST) methods for the mST scheme may encounter two major issues. First, the style transfer of very subtle but important abnormalities like μCs or calcifications is very challenging, because the style transfer of a μC, which may be depicted in fewer than 20 pixels, needs to be carried out at high resolution. However, to our best knowledge, most classic NST methods only support images with resolution below 1000×1000, whereas the dimensions of modern mammograms are usually larger than 2000×2000. Therefore, a downsizing step is inevitable in our problem, and the quality of subtle abnormalities like μCs after transfer may be compromised. Second, for most classic NST methods, a style reference image usually needs to be manually selected as network input. In our context, however, an automatic selection scheme for style reference images is needed to facilitate the style normalization process. Meanwhile, the appearance variety of mammography is large and also depends on the breast density category and the subject's figure. Considering only one style reference image may not be sufficient to yield plausible transfer results.

Refer to caption
Figure 1: The comparison of image styles from different vendors. Red circles highlight calcifications and μCs.

To address the two issues, the mST scheme is realized with a new multi-resolution and multi-reference neural style transfer (mr2NST) network in this study. By considering multiple resolutions, the details of subtle abnormalities like μCs or calcifications can be better preserved in the transfer process. With multiple reference images, our mr2NST network can deal with the wide appearance variety of mammography and integrate the style transfer effects from the reference images for more plausible style normalization results. Our mr2NST network also takes into account the similarities between the input image to be transferred and the reference images when integrating the multiple style transfer effects. To our best knowledge, this is the first study that explicitly explores style transfer techniques to mitigate the style variation problem, which may otherwise compromise the detection performance for breast lesions.

Refer to caption
Figure 2: The pipeline of the proposed mr2NST for mST. T denotes the multi-resolution operation; R stands for the refiner network.

We perform style transfer experiments comparing with the classic cycleGAN [11] and conventional exact histogram matching (EHM) [1], and test the style normalization (mST) results on the detection tasks of masses and μCs in mammograms. The experimental results suggest that the mST results from our mr2NST network are more plausible and mitigate the style differences between distinctive vendors for better detection results.

2 Method

In this section, we briefly introduce the concept of NST and then discuss the details of our mr2NST network. The architecture of the mr2NST network is shown in Fig. 2; its backbone is VGG19 [8].

2.1 Neural Style Transfer

The NST, first introduced by Gatys et al. [3], commonly requires two input images, a content image x_C to be transferred and a style reference image x_S, and then performs feature learning on the feature representations F_l(x_C) and F_l(x_S) in layer l of an NST network. Each column of F_l(x), F_l(x) ∈ R^{M_l(x) × N_l}, is a feature map, where N_l is the number of feature maps in layer l and M_l(x) = H_l(x) × W_l(x) is the product of the height and width of each feature map. The output of NST is the style-transferred image, denoted as x̂, obtained by minimizing the loss function:

L_{total} = L_{content} + L_{style},   (1)

where the content term L_{content} compares the feature maps of x̂ and x_C at a single layer l_C:

L_{content} = \dfrac{1}{N_{l_C} M_{l_C}(x_C)} \sum_{ij} (F_{l_C}(\hat{x}) - F_{l_C}(x_C))^2_{ij},   (2)

and the style term L_{style} compares a set of summary statistics:

L_{style} = \sum_l w_l E_l; \quad E_l = \dfrac{1}{4N_l^2} \sum_{ij} (G_l(\hat{x}) - G_l(x_S))^2_{ij},   (3)

where G_l(x) = \dfrac{1}{M_l(x)} F_l(x)^T F_l(x) is the Gram matrix of the feature maps of layer l in response to image x.
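For concreteness, below is a minimal sketch of these losses in PyTorch, assuming features have already been extracted from a pretrained VGG19 as in the paper; the layer weights and shapes are illustrative choices, not the authors' exact settings.

```python
# Minimal sketch of the content/style losses in Eqs. (1)-(3).
import torch

def gram_matrix(F):
    """G_l(x) = F^T F / M for features F of shape (M, N), where
    M = H*W spatial positions and N = number of feature maps."""
    M = F.shape[0]
    return F.t() @ F / M

def content_loss(F_hat, F_c):
    """Eq. (2): normalized squared feature difference at layer l_C."""
    M, N = F_c.shape
    return ((F_hat - F_c) ** 2).sum() / (N * M)

def style_loss(feats_hat, feats_s, weights):
    """Eq. (3): weighted sum of Gram-matrix discrepancies over layers."""
    loss = torch.tensor(0.0)
    for F_hat, F_s, w in zip(feats_hat, feats_s, weights):
        N = F_hat.shape[1]
        G_hat, G_s = gram_matrix(F_hat), gram_matrix(F_s)
        loss = loss + w * ((G_hat - G_s) ** 2).sum() / (4 * N ** 2)
    return loss
```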

2.2 Multiple Reference Style Images

The mr2NST network takes multiple reference style images to better accommodate the appearance variety of mammography. Different regions in a mammogram may need distinctive reference images for the transfer. For example, a dense breast image to be transferred may be better served by reference images with denser glandular tissues. To attain this goal, a quantitative measurement of style similarity is needed.

The Gram matrix in Eq. 3 computes the covariance statistics of features at one layer as a quantification of style at the corresponding perceptual level. A higher value in the Gram matrix suggests that the corresponding pair of feature maps is more similar in style. Accordingly, with n reference style images, we can compute the corresponding Gram matrices for each style image and integrate them with a max operation. Specifically, a simple but effective multi-reference style term is defined as

L_{Multi-ref} = \sum_l w_l E_l; \quad E_l = \dfrac{1}{4N_l^2} \sum_{ij} (G_l(\hat{x}) - G'_l)^2_{ij}; \quad G'_l = H(M(F_l(x_{S_1}), F_l(x_{S_2}), ..., F_l(x_{S_n})), \overline{h}).   (4)

The function M(·) is an element-wise max operation that takes the n sets of N_l × H_l × W_l feature maps F_l(x_{S_n}) from the n reference images at the l-th layer and outputs an N_l × N_l matrix, G'_l. Specifically, the function M(·) computes each element g'_{ij} of G'_l as

g'_{ij} = \max\big(F_i(x_{S_1})^T F_j(x_{S_1}), F_i(x_{S_2})^T F_j(x_{S_2}), ..., F_i(x_{S_n})^T F_j(x_{S_n})\big).   (5)

The function H is a histogram specification function that normalizes G'_l against the reference density histogram \overline{h} for numerical stabilization, where \overline{h} is the density histogram of the n × N_l × N_l matrix stacked from the n style Gram matrices of size N_l × N_l.
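As a sketch of how Eqs. 4-5 and the function H might be realized, the following NumPy snippet fuses the reference Gram matrices with an element-wise max and then applies a rank-based histogram specification; the exact form of H used here is an assumption, so treat this as an illustration rather than the authors' implementation.

```python
# Sketch of the multi-reference style target: max-fusion plus
# histogram specification against the stacked reference grams.
import numpy as np

def multi_reference_gram(ref_feats):
    """ref_feats: list of n feature arrays, each of shape (M_i, N).
    Returns the N x N target matrix G'_l of Eq. (4)."""
    grams = np.stack([F.T @ F / F.shape[0] for F in ref_feats])  # (n, N, N)
    fused = grams.max(axis=0)                                    # Eq. (5)
    # Histogram specification: give the fused matrix the value
    # distribution of the stacked reference grams (h-bar).
    ref_values = np.sort(grams.ravel())
    order = np.argsort(fused, axis=None)            # ranks of fused entries
    idx = np.linspace(0, ref_values.size - 1, fused.size).astype(int)
    out = np.empty(fused.size)
    out[order] = ref_values[idx]                    # match rank to quantile
    return out.reshape(fused.shape)
```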

Modern mammograms are commonly larger than 2080×2080 and may require formidably large GPU memory for any off-the-shelf NST method. In our experience, the style transfer of an image of size 512×512 can consume up to 10.8 GB of GPU memory for inference. For mST at the original size, more than 160 GB of GPU memory would be required, which is impractical for clinical use or laboratory study. Accordingly, we propose a multi-resolution strategy that uses GPU resources more efficiently while still attaining better consideration of local details in the mST scheme.

Referring to Fig. 2, the multi-resolution scheme considers the 2048×2048 original image (scale0), the division of the image into 4 overlapping 1024×1024 patches (scale1), and 16 overlapping 512×512 patches (scale2). The image of scale0 and the patches of scale1 are resized to 512×512 to fit the memory limit and to support feature learning with middle- and large-sized receptive fields.
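A minimal sketch of this three-scale decomposition in PyTorch is given below; the non-overlapping grid strides are an illustrative simplification of the overlapping tiling described above.

```python
# Sketch: decompose a 2048 x 2048 mammogram into scale0/scale1/scale2 inputs.
import torch
import torch.nn.functional as F

def extract_scales(img):
    """img: tensor of shape (1, 1, 2048, 2048). Returns 512 x 512 inputs
    for the three scales (full image, 4 patches, 16 patches)."""
    def tiles(x, size, stride):
        # unfold into square patches of the given size and stride
        p = x.unfold(2, size, stride).unfold(3, size, stride)
        return p.reshape(1, 1, -1, size, size).squeeze(0).squeeze(0)

    scale0 = F.interpolate(img, size=(512, 512), mode='bilinear',
                           align_corners=False)
    s1 = tiles(img, 1024, 1024)                    # 2 x 2 = 4 patches
    scale1 = F.interpolate(s1.unsqueeze(1), size=(512, 512),
                           mode='bilinear', align_corners=False)
    scale2 = tiles(img, 512, 512).unsqueeze(1)     # 4 x 4 = 16 patches
    return {'scale0': scale0, 'scale1': scale1, 'scale2': scale2}
```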

The image and patches of scale0, scale1, and scale2 are transferred by taking multiple reference style images; see Fig. 2. For each patch/image of each scale, we perform the style transfer by optimizing the multi-reference style term L_{Multi-ref} defined in Eq. 4. Afterwards, all transferred patches of scale1 and scale2 are reconstructed back into integral mammograms. The reconstructed mammograms of scale1 and scale2, as well as the transferred image of scale0, are then resized back to the original size. For the final output, we integrate the three transferred images of scale0, scale1, and scale2 with a weighted summation and further refine the summed image with a refiner network. The final style-transferred mammogram can be computed as

M_{out} = R(S_0, S_1, S_2) = r\Big(\sum_{n=0}^{2} w_n S_n\Big),   (6)

where R is the refiner network, r denotes a network composed of 3 1×1 convolutional layers [9], and w_0, w_1, and w_2 are three learnable weights. The refiner network is trained with the GAN scheme, where the refiner network acts as a generator to fool a discriminator D. The discriminator D, with a ResNet18 backbone [4], is devised to check whether the input image is of the target style. The training of the refiner GAN is driven by minimizing the loss:

L_{GAN}(R, D) = \log D(Style) + \log\big(1 - D(R(S_0, S_1, S_2))\big).   (7)
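A minimal sketch of the refiner in Eqs. 6-7 follows, assuming single-channel images and illustrative layer widths; the discriminator D is assumed to output probabilities in (0, 1).

```python
# Sketch: learnable weighted fusion of the three scales plus 1x1 convs.
import torch
import torch.nn as nn

class Refiner(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3) / 3)   # w_0, w_1, w_2
        self.r = nn.Sequential(                    # three 1x1 conv layers
            nn.Conv2d(1, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, s0, s1, s2):
        fused = self.w[0] * s0 + self.w[1] * s1 + self.w[2] * s2
        return self.r(fused)                       # Eq. (6)

def gan_loss(D, real_style, refined, eps=1e-8):
    """Eq. (7); D is assumed to return probabilities in (0, 1)."""
    return torch.log(D(real_style) + eps).mean() + \
           torch.log(1 - D(refined) + eps).mean()
```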

3 Experiments and Results

Refer to caption
Figure 3: Visual illustration of the multi-reference and multi-resolution effects. The red arrows indicate calcifications or μCs in the images.
Refer to caption
Figure 4: Visual comparison of mST results from different methods. The right part gives a zoom-in comparison of vessel structures.

In this study, we involved 1,380 mammograms, of which 840 and 540 were collected from two distinctive hospitals, denoted as H_A and H_B, with local IRB approvals. The mammograms from H_A and H_B were acquired with GE Healthcare (GE) and United Imaging Healthcare (UIH) equipment, respectively. All mammograms are organized at the unit of breast; accordingly, half of the mammograms in our dataset are craniocaudal (CC) views and half are mediolateral oblique (MLO) views. For the training of the refiner GAN with Eq. 7, we use an independent set of 80 GE and 80 UIH mammograms, which are not included in the 1,380 images.

Throughout the experiments, we set the source and target domains as GE (H_A) and UIH (H_B), respectively. As can be seen in Fig. 1 and Fig. 4, the image style of GE is relatively soft, whereas the UIH style is sharper; the image styles of the two vendors are thus very distinctive. We compare our method with the baselines of cycleGAN [11] and exact histogram matching (EHM) [1]. Since cycleGAN requires a training step, we randomly select 100 and 80 images from H_A and H_B, respectively, to train it. Except for the refiner GAN, our mr2NST does not need a training step. For each ST inference with mr2NST, we select the 5 most similar reference images of the target UIH domain from a reference image bank of 40 UIH images, which are not included in the 1,380 images or the 80 training images of the refiner GAN. The similarity for the selection is based on the area of the breast, and the selected reference images are of the same view (CC/MLO) as the source image to be transferred. The Adam optimizer is adopted with 400 epochs of optimization for our mr2NST.
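The reference-selection step can be sketched as below, assuming breast area is approximated by thresholding away the dark background; the threshold value and the bank structure are hypothetical illustrations rather than the authors' exact implementation.

```python
# Sketch: pick the k references whose breast area best matches the source.
import numpy as np

def breast_area(img, thresh=0.05):
    """Fraction of pixels above threshold, for an image scaled to [0, 1]."""
    return float((img > thresh).mean())

def select_references(src, bank, k=5):
    """src: source mammogram; bank: list of target-domain reference images,
    pre-filtered to the same view (CC/MLO) as the source."""
    src_area = breast_area(src)
    ranked = sorted(bank, key=lambda im: abs(breast_area(im) - src_area))
    return ranked[:k]
```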

Fig. 3 illustrates the efficacy of the multi-reference and multi-resolution schemes for the mST from GE to UIH. The upper row of Fig. 3 shows better enhancement of glandular tissues with 5 reference images on a case with high breast density, while the lower row suggests that calcifications can be better enhanced by fusing the transferred images from all three scales. Fig. 4 shows the mST results from our mr2NST and the baselines of cycleGAN and EHM. On visual comparison, the quality of the transferred images from mr2NST is much better. The cycleGAN requires large GPU memory and cannot support mST at high resolution. Meanwhile, referring to the right part of Fig. 4, mr2NST preserves the details of vasculature after the mST.

Table 1: NIMA scores of UIH, GE, and mST results.

          GE           UIH          mr2NST       cycleGAN     EHM
Score     5.16 ± 0.12  5.43 ± 0.10  5.42 ± 0.15  4.74 ± 0.22  5.29 ± 0.11

Two experiments are conducted to illustrate the efficacy of our mr2NST w.r.t. the transferred image quality and the detection performance. The first experiment evaluates the quality of the transferred images with the neural image assessment (NIMA) score [10]. Specifically, we randomly select 400 GE and 400 UIH images (not overlapping with the training dataset of cycleGAN) for mST. The 400 GE images are transferred to the UIH domain with the comparing methods, and the resulting NIMA scores of the transferred images are listed in Table 1. We also compare the NIMA scores between the transferred and original images in the UIH domain with Student's t-test. The computed p-values are 0.58, 4.76×10^{-12}, and 3.34×10^{-61} for mr2NST, EHM, and cycleGAN, respectively, suggesting that the quality of mST images from mr2NST is not significantly different from the quality of original UIH images in terms of NIMA scores. On the other hand, the quality differences of the mST images from EHM and cycleGAN to the UIH images are deemed significant.
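The significance test above can be reproduced along the following lines; the score arrays here are random placeholders matching the reported means and standard deviations, not real NIMA outputs.

```python
# Sketch: unpaired Student's t-test on NIMA scores (placeholder data).
import numpy as np
from scipy import stats

scores_uih = np.random.normal(5.43, 0.10, 400)     # placeholder UIH scores
scores_mr2nst = np.random.normal(5.42, 0.15, 400)  # placeholder mr2NST scores

t, p = stats.ttest_ind(scores_uih, scores_mr2nst)
print(f"t = {t:.3f}, p = {p:.3f}")  # p > 0.05: no significant difference
```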

The second experiment illustrates whether the mST can help mitigate the domain gap problem and improve the detection performance. The dataset of UIH (H_B) is relatively small; therefore, we aim to show whether the mST from GE to UIH can help improve the detection results in the UIH domain. Since the baselines cannot yield image quality comparable to the target UIH domain, we only perform this experiment with mr2NST. Specifically, we conduct 5 training schemes with various combinations of UIH, UIH_GE (simulated UIH with mr2NST from GE), and GE data for the training of Faster-RCNN [6] with a ResNet50 backbone. The detection results for masses and μCs are reported in Table 2.

The testing UIH data, which serve as the testing data for all training settings in Table 2, include 120 images: 90 positive and 30 normal. The 90 positive testing images comprise 36 images with only masses, 28 with only μCs, and 26 with both. For the training with only real UIH data, there are 420 images with 100 normal cases and 320 positive cases (131 with only masses, 123 with only μCs, and 66 with both). For the 2nd to 5th schemes in Table 2, we compare the effects of adding 420 or 840 extra training images of either real GE or UIH_GE data. The UIH_GE data of the 2nd and 3rd schemes are the mST results of the GE data of the 4th and 5th schemes, respectively, and the 420 GE images are a subset of the 840 GE images. For systematic comparison, the 420 GE images have the same distribution of mass, μC, and normal cases as the 420 real UIH images, whereas the 840 GE images follow the same ratio at double the size.

In Table 2, the detection performance is assessed with average precision (AP) and recall at an average of 0.5 (Recall_0.5) and 1 (Recall_1) false positives (FP) per image. As can be observed by comparing the 2nd and 3rd rows to the 1st row of Table 2, adding UIH_GE data to the training set boosts the detection performance. Referring to the 4th and 5th rows, the direct incorporation of GE data seems not to be helpful. The transferred UIH_GE images, on the other hand, are more similar to the real UIH images and can serve as more informative samples for the training of the detector.

Table 2: Detection performance comparison.

                                    Masses                         μCs
Training Data                AP     Recall_0.5  Recall_1    AP     Recall_0.5  Recall_1
420 real UIH                 0.656  0.761       0.869       0.515  0.459       0.567
420 real UIH + 420 UIH_GE    0.724  0.823       0.891       0.569  0.593       0.702
420 real UIH + 840 UIH_GE    0.738  0.811       0.912       0.670  0.622       0.784
420 real UIH + 420 GE        0.641  0.741       0.847       0.555  0.509       0.651
420 real UIH + 840 GE        0.654  0.738       0.869       0.632  0.604       0.738
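As a sketch of how the Recall_0.5 and Recall_1 metrics in Table 2 might be computed, the snippet below evaluates recall at a fixed false-positive budget per image, assuming detections have already been matched to ground-truth lesions (e.g., at IoU ≥ 0.5); the matching criterion is our assumption.

```python
# Sketch: recall at a fixed number of false positives per image.
import numpy as np

def recall_at_fp(det_hits, det_scores, n_gt, n_images, fp_per_image):
    """det_hits: 1 if a detection matches a ground-truth lesion, else 0,
    pooled over all test images; det_scores: matching confidences."""
    order = np.argsort(-np.asarray(det_scores))   # sort by descending score
    hits = np.asarray(det_hits)[order]
    fps = np.cumsum(1 - hits)                     # cumulative false positives
    tps = np.cumsum(hits)                         # cumulative true positives
    ok = fps <= fp_per_image * n_images           # allowed operating points
    return tps[ok].max() / n_gt if ok.any() else 0.0
```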

4 Conclusion

A new style transfer method, mr2NST, is proposed in this paper to normalize the image styles of different vendors to the same baseline. The mST results can be attained at high resolution by taking multiple reference images from the target domain. The experimental results suggest that style normalization with mr2NST can improve the detection results for masses and μCs.

References

  • [1] Coltuc, D., Bolon, P., Chassery, J.M.: Exact histogram specification. IEEE Transactions on Image Processing 15(5), 1143–1152 (2006)
  • [2] Freer, T.W., Ulissey, M.J.: Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 220(3), 781–786 (2001)
  • [3] Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2414–2423 (2016)
  • [4] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [5] Kooi, T., Litjens, G., Van Ginneken, B., Gubern-Mérida, A., Sánchez, C.I., Mann, R., den Heeten, A., Karssemeijer, N.: Large scale deep learning for computer aided detection of mammographic lesions. Medical image analysis 35, 303–312 (2017)
  • [6] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. pp. 91–99 (2015)
  • [7] Rodriguez-Ruiz, A., Lång, K., Gubern-Mérida, A., Broeders, M., Gennaro, G., Clauser, P., Helbich, T., Chevalier, M., Tan, T., Mertelmeier, T., Wallis, M., Andersson, I., Zackrisson, S., Mann, R., Sechopoulos, I.: Stand-alone artificial intelligence for breast cancer detection in mammography: Comparison with 101 radiologists. Journal of the National Cancer Institute 111 (03 2019). https://doi.org/10.1093/jnci/djy222
  • [8] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • [9] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1–9 (2015)
  • [10] Talebi, H., Milanfar, P.: NIMA: Neural image assessment. IEEE Transactions on Image Processing 27(8), 3998–4011 (2018)
  • [11] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2223–2232 (2017)