
¹National Taiwan University  ²Amazon Web Services  ³aetherAI
Email: [email protected], [email protected], [email protected], [email protected]

Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization

Ming-Yang Ho¹, Che-Ming Wu², Min-Sheng Wu³, Yufeng Jane Tseng¹
Abstract

Recent advancements in ultra-high-resolution unpaired image-to-image translation have aimed to mitigate the constraints imposed by limited GPU memory through patch-wise inference. Nonetheless, existing methods often compromise between the reduction of noticeable tiling artifacts and the preservation of color and hue contrast, attributed to the reliance on global image- or patch-level statistics in the instance normalization layers. In this study, we introduce a Dense Normalization (DN) layer designed to estimate pixel-level statistical moments. This approach effectively diminishes tiling artifacts while concurrently preserving local color and hue contrasts. To address the computational demands of pixel-level estimation, we further propose an efficient interpolation algorithm. Moreover, we invent a parallelism strategy that enables the DN layer to operate in a single pass. Through extensive experiments, we demonstrate that our method surpasses all existing approaches in performance. Notably, our DN layer is hyperparameter-free and can be seamlessly integrated into most unpaired image-to-image translation frameworks without necessitating retraining. Overall, our work paves the way for future exploration in handling images of arbitrary resolutions within the realm of unpaired image-to-image translation. Code is available at: https://github.com/Kaminyou/Dense-Normalization.

Keywords:
Unpaired image-to-image translation · Ultra-high-resolution image · Parallelism

1 Introduction

Unpaired image-to-image (I2I) translation is a conventional computer vision task that aims to translate an image from one domain to another without using paired images [1, 2, 3]. However, most frameworks are incapable of handling ultra-high-resolution (UHR) images due to GPU memory limitations. For example, the popular CUT [4] framework requires 14 GB of GPU VRAM for inference and 160 GB for training when processing an image with a resolution of 2,048×2,048, exceeding the capacity of a single 32 GB NVIDIA V100 GPU. This presents a significant challenge for researchers and practitioners working on unpaired I2I translation tasks involving UHR images.

Nevertheless, the ubiquity of UHR images in our daily life is undeniable, with mobile phones capturing 4K resolution photos and movies exceeding 8K resolution [5, 6]. Without an effective methodology, performing common image translation tasks like style transfer [7] and colorization [8] on these images would be significantly hindered.

Another critical application of UHR unpaired I2I translation is stain transformation in digital pathology [9, 10, 11]. Standard staining methods, such as hematoxylin and eosin (H&E), are commonly used due to their cost-effectiveness. However, for more detailed cancer diagnostics, the use of expensive immunohistochemical (IHC) stains becomes essential [12, 13]. Given that pathological images frequently have resolutions exceeding 10,000×10,000 pixels, an effective algorithm for stain transformation that reduces the cost of pathological staining is urgently needed.

A few strategies have been leveraged to perform unpaired I2I translation on UHR images. Simplifying the model architecture or increasing the output image size enables translation of 2K images [14, 15], but such methods still incur high GPU memory usage, with a space complexity of $\mathcal{O}(N^{2})$ for an image with a resolution of $N\times N$. Alternatively, patch-wise training and inference can decrease the space complexity to $\mathcal{O}(1)$, but they struggle to produce seamless results due to the tiling artifacts that appear when the patches are stitched back into an UHR image. Although the convolutional operators in most I2I frameworks should guarantee that the final output can be seamlessly assembled from the patches, normalization operators applied per patch disrupt this property. Since the statistical moments calculated in Instance Normalization (IN) layers affect color fidelity [16], their discrepancies between neighboring patches lead to gap-type tiling artifacts, as evidenced by patch-wise IN [17] (refer to Fig. 1 (b) and Fig. 2 (a)).

Figure 1: Comparison of translations. (a) Showcases a real2paint translated ultra-high-resolution image (3,024×4,032 pixels) produced by our Dense Normalization (DN) from the image displayed in the top right corner, with comparisons highlighted within the blue-boxed region. (b) Illustrates the occurrence of gap-type tiling artifacts in patch-wise IN [17] or KIN [18]; (c) Demonstrates jitter-type tiling artifacts resulting from TIN [19]; (d) Presents DN’s effectiveness in diminishing tiling artifacts.

To mitigate this issue, Thumbnail Instance Normalization (TIN) [19] applies global image-level statistics at the expense of local hue and contrast, resulting in over/under-colorizing. Furthermore, significant perturbations in these statistical moments unfortunately create jitter-type tiling artifacts, which manifest as color jitters at the edges of patches (see Fig. 1 (c) and Fig. 2 (b)). On the other hand, Kernelized Instance Normalization (KIN) [18] alleviates gap-type tiling artifacts by more closely aligning adjacent patch-level statistics, but it requires selecting a kernel size to trade off blurring artifacts against the preservation of local color contrast (see Fig. S1 in the supplementary material). Additionally, KIN requires a two-stage pipeline (caching and inference stages), because statistics must first be calculated and cached before convolution operations are performed on them (see Fig. 2 (c)). This raises a question: can pixel-level statistical moment estimation address all these issues, and can it be accomplished in a single pass?

Figure 2: Comparison of various normalization strategies. This figure illustrates the framework and the impact of different normalization methods on an UHR image (3,024×4,032 pixels) for the summer2autumn task: (a) Patch-wise IN [17] uses patch-level statistics and leads to statistical differences between patches, resulting in noticeable gap-type tiling artifacts. (b) TIN [19] eliminates statistical differences with global image-level statistics (from the thumbnail) but compromises color and hue details, also inducing jitter-type tiling artifacts. (c) KIN [18] utilizes a two-stage pipeline to mitigate statistical differences by applying convolutional operations on patch-level statistics, albeit at the expense of local detail. (d) DN estimates pixel-level statistical moments in a single pass, effectively preserving local color and hue while diminishing tiling artifacts. (e) DN outperforms all methods in every aspect of human evaluation. In the feature rows, “✓” indicates “achieved”, “△” indicates “partially achieved”, and “✗” indicates “not achieved”. Red close-up boxes highlight the outcomes influenced by the different statistical moments used for normalization.

To answer the above question, we propose the Dense Normalization (DN) layer, which is capable of estimating statistical moments for every pixel. It possesses four expected properties: diminishing tiling artifacts ($\mathcal{P}_{1}$), preserving local hue and color contrast ($\mathcal{P}_{2}$), executing in a single pass ($\mathcal{P}_{3}$), and being hyperparameter-free ($\mathcal{P}_{4}$), as illustrated in Fig. 1, Fig. 2 (d), and Fig. S2 in the supplementary material.

While pixel-level statistics estimation can be achieved by performing bilinear interpolation on patch-level statistics, its naïve implementation is time-consuming due to the high computational demands. Hence, we developed a fast interpolation algorithm to enhance calculation efficiency and practicality (see the comparison in Table 4). Furthermore, to perform pixel-level statistics estimation, patch-wise statistics must first be calculated and cached. Fast interpolation is then performed on these statistics, a process that would typically necessitate a two-stage pipeline similar to KIN. Nevertheless, we have devised a prefetching strategy that cleverly hides the caching process within the inference process, leveraging GPU parallelism to enable DN to operate in a single pass.

We evaluated our DN on four publicly available datasets containing natural and pathological images; quantitative evaluations confirm that DN outperforms previous methods and demonstrate its applicability in the healthcare field. In summary, our research has achieved the following:

  • To the best of our knowledge, this is the first study to estimate pixel-level statistical moments for normalization in UHR unpaired I2I translation, effectively diminishing tiling artifacts ($\mathcal{P}_{1}$) while simultaneously preserving local hue and color ($\mathcal{P}_{2}$), thereby achieving state-of-the-art performance.

  • We introduce a fast interpolation algorithm for efficient pixel-wise statistics estimation, along with a prefetching parallelism algorithm that enables DN to operate in a single pass ($\mathcal{P}_{3}$), significantly decreasing runtime in comparison to naïve implementations.

  • Our hyperparameter-free DN layer ($\mathcal{P}_{4}$) can be seamlessly integrated into any existing framework utilizing IN layers during inference, without necessitating model retraining.

2 Related works

Unpaired image-to-image translation. Several frameworks have been developed for unpaired image-to-image translation, aiming to discover the mapping between diverse image domains. CycleGAN [20], DiscoGAN [21], and DualGAN [22] utilize cycle-consistency loss to enforce the mapping. However, the pixel-level cycle-consistency constraint can lead to deformation and hinder the generation of large objects and fine textures when there are significant domain differences. Recently, strategies have been proposed to enhance performance beyond cycle-consistency. DistanceGAN [23] maintains pairwise distances between different parts of the same sample in each domain, while ACL-GAN [24] utilizes adversarial loss to address cyclic loss. CUT [4] maximizes patch-wise similarity between domains using contrastive learning, and LSeSim [25] learns spatial correlation to preserve structural similarity. Additionally, patch-wise semantic relationship regularization [26] is used to enhance correspondence between input and output images, while an energy function [27] is employed to retain domain-independent features and discard domain-specific ones. Despite these advancements, these frameworks are limited to processing small images. Our DN is a plugin designed to enable the processing of UHR images by simply replacing the IN layer in these frameworks.

Figure 3: Framework of the Proposed Method. (a) Provides an overall view of our framework’s pipeline. (b) Shows the details of the dispatcher and the Dense Normalization (DN) layer. A UHR image $\boldsymbol{X}$ is initially divided into patches $\boldsymbol{x}^{\text{patch}}_{c,r}$, with $c$ and $r$ representing the row and column coordinates, respectively. The dispatcher sequences two patches for the prefetching and inference branches. Within the DN layer, the prefetching branch calculates and caches statistical moments. For the inference branch, statistics for the patch and its eight surrounding patches are queried. Subsequently, fast interpolations are employed to estimate the mean ($\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r}$) and inverse standard deviation ($\hat{\boldsymbol{\sigma^{*}}}^{\text{pixel}}_{c,r}$) for each pixel, facilitating dense normalization.

Ultra-high-resolution unpaired image-to-image translation. Performing unpaired image-to-image translation on ultra-high-resolution images is computationally expensive. Patch-wise methods, which divide the input image into smaller patches and reassemble the translated ones, offer a solution but often lead to tiling artifacts. To address this problem, overlapping windows [28, 29] can be used, or a perceptual embedding consistency loss can be employed to learn color, contrast, and brightness invariant features [30]. Meanwhile, downsampling-based methods [14, 15] avoid tiling artifacts but may result in detail loss and increased spatial complexity for upsampled images. Thumbnail Instance Normalization (TIN) [19] eliminates gap-type tiling artifacts by assuming that all patches share the same global image-level statistics, but may result in over/under-colorizing and jitter-type tiling artifacts. Kernelized Instance Normalization (KIN) [18] involves a two-stage pipeline and computes patch-level statistics using convolution operations to preserve local information, but requires selecting an optimal kernel. Our DN differentiates itself by estimating pixel-level statistics to reduce tiling artifacts ($\mathcal{P}_{1}$) and preserve local hues and colors ($\mathcal{P}_{2}$) in a single pass ($\mathcal{P}_{3}$), without the need for hyperparameter tuning ($\mathcal{P}_{4}$).

Figure 4: Details of the fast interpolation operation utilized in DN. Panel (a) illustrates the process of deriving N×N pixel-level statistical moment estimations from a 3×3 matrix. Panel (b) visualizes the matrix multiplication operation involved in fast interpolation.

3 Proposed methods

3.1 Overall framework

Unpaired I2I translation aims to train a generator $\mathcal{G}$ to translate an image $\boldsymbol{X}$ in domain $\mathcal{X}$ to another domain $\mathcal{Y}$, even when no corresponding paired images exist. The output of $\mathcal{G}$ is denoted as $\hat{\boldsymbol{Y}}$ and is expected to lie in domain $\mathcal{Y}$. In the context of UHR unpaired I2I translation, all images in domain $\mathcal{X}$ have a high resolution of $H\times W$. To enable $\mathcal{G}$ to handle images of arbitrarily high resolution, the model must be trained and executed in a patch-wise manner so that the GPU space complexity is reduced to a constant.

After training an I2I generator $\mathcal{G}$ with patches, the IN layers are replaced with DN layers. During the inference process, an UHR image $\boldsymbol{X}\in\mathbb{R}^{H\times W}$ is divided into patches $\boldsymbol{x}^{\text{patch}}_{c,r}\in\mathbb{R}^{N\times N}$, each with a size of $N\times N$, and their coordinates $(c, r)$ relative to the original image are recorded. Here, $c\in\{0,1,\dots,\lceil\frac{H}{N}\rceil-1\}$ and $r\in\{0,1,\dots,\lceil\frac{W}{N}\rceil-1\}$. Then, a dispatcher sequentially feeds these patches and coordinates into the generator.
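As a concrete illustration, the following is a minimal PyTorch-style sketch of this patch-cropping step (function and variable names are illustrative assumptions, not the released implementation); border patches are zero-padded here so every patch has the same size, one simple way to handle images whose sides are not multiples of N.

```python
import torch

def crop_into_patches(image: torch.Tensor, patch_size: int = 512):
    """Split a (C, H, W) tensor into patch_size x patch_size patches.

    Returns a list of (patch, (c, r)) pairs, where c indexes the patch's
    vertical (row) position and r its horizontal (column) position in the
    original image, matching the notation above.
    """
    _, height, width = image.shape
    num_rows = -(-height // patch_size)   # ceil(H / N)
    num_cols = -(-width // patch_size)    # ceil(W / N)

    patches = []
    for c in range(num_rows):
        for r in range(num_cols):
            top, left = c * patch_size, r * patch_size
            patch = image[:, top:top + patch_size, left:left + patch_size]
            # Zero-pad border patches on the bottom/right to a full patch.
            pad_h = patch_size - patch.shape[1]
            pad_w = patch_size - patch.shape[2]
            if pad_h or pad_w:
                patch = torch.nn.functional.pad(patch, (0, pad_w, 0, pad_h))
            patches.append((patch, (c, r)))
    return patches
```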

Fig. 3 illustrates the overall pipeline of our framework. Specifically, during each dispatch, two patches along with their coordinates are sent: one to the prefetching branch and the other to the inference branch. Both patches simultaneously go through all layers except the DN layer. In the DN layer, the patch in the prefetching branch undergoes a standard instance normalization [17], storing the resultant statistics in the cache tables ($T_{\mu}$, $T_{\sigma}$) with the coordinates as keys. The patch in the inference branch uses its coordinates to query the cache tables for its own and its eight neighbors’ statistics, forming two $3\times 3$ matrices of coarse-level (patch-level) statistical moments ($\tilde{\boldsymbol{\mu}}^{\text{patch}}_{c,r}$, $\tilde{\boldsymbol{\sigma}}^{\text{patch}}_{c,r}\in\mathbb{R}^{3\times 3}$). Fast interpolation is then applied to these matrices to estimate fine-level (pixel-level) statistical measures ($\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r}$, $\hat{\boldsymbol{\sigma^{*}}}^{\text{pixel}}_{c,r}\in\mathbb{R}^{N\times N}$), which are subsequently used for dense normalization. The translated patches are then reassembled into an UHR image, with the DN operation effectively reducing tiling artifacts.

3.2 Details

Dispatcher. Given a collection of patches along with their coordinates, the dispatcher first arranges them vertically (column by column) to create a list of images, denoted as $P$. Specifically, $P$ is formed as $(\boldsymbol{x}^{\text{patch}}_{0,0},\boldsymbol{x}^{\text{patch}}_{1,0},\dots,\boldsymbol{x}^{\text{patch}}_{h-1,0},\boldsymbol{x}^{\text{patch}}_{0,1},\dots,\boldsymbol{x}^{\text{patch}}_{h-1,w-1})$, where $h=\lceil\frac{H}{N}\rceil$ and $w=\lceil\frac{W}{N}\rceil$. Subsequently, the dispatcher dispatches the images sequentially. At step $t$, the dispatcher sends out two images along with their coordinates: $P[t]$ for the inference branch and $P[t+h+2]$ for the prefetching branch. This arrangement ensures that the eight neighboring patches of $P[t]$, as originally cropped from the UHR image, have already been processed by the prefetching branch, guaranteeing that the corresponding statistical moments are cached in $T_{\mu}$ and $T_{\sigma}$ and can be queried. The iteration starts from $t=-(h+2)$ and goes up to $h\cdot w-1$. For $t$ values outside the range of sequence $P$, an empty image $\phi$ is provided, and the branch assigned to process it performs no action.
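A minimal sketch of this dispatch schedule (illustrative names; returning None stands in for the empty image $\phi$):

```python
def dispatch(patch_list, h, w):
    """Yield (inference_item, prefetch_item) pairs in the order described
    above.

    patch_list is the column-major sequence P of (patch, (c, r)) pairs;
    h and w are the number of patch rows and columns. Out-of-range indices
    yield None, signalling the corresponding branch to idle for that step.
    """
    total = h * w
    offset = h + 2  # the prefetching branch runs h + 2 steps ahead

    def fetch(t):
        return patch_list[t] if 0 <= t < total else None

    for t in range(-offset, total):
        yield fetch(t), fetch(t + offset)
```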

Figure 5: Comparison of two-stage and single-pass DN. A naïve implementation of DN might resemble KIN, operating in two stages. However, our dispatcher design and prefetching strategy enable the prefetching branch to run in parallel with the inference branch across most neural network (NN) layers, and to execute asynchronously in the DN layer, effectively hiding the runtime of the prefetching branch.

Prefetching branch and caching. When a patch $\boldsymbol{x}^{\text{patch}}_{c,r}$ enters the prefetching branch, it first undergoes a standard instance normalization process [17]:

IN(\boldsymbol{x}^{\text{patch}}_{c,r})=\gamma\left(\frac{\boldsymbol{x}^{\text{patch}}_{c,r}-\mathbb{E}[\boldsymbol{x}^{\text{patch}}_{c,r}]}{\sqrt{Var[\boldsymbol{x}^{\text{patch}}_{c,r}]}}\right)+\beta \quad (1)

Here, $\mathbb{E}[\boldsymbol{x}^{\text{patch}}_{c,r}]$ and $\sqrt{Var[\boldsymbol{x}^{\text{patch}}_{c,r}]}$ are denoted as ${\mu}^{\text{patch}}_{c,r}$ and ${\sigma}^{\text{patch}}_{c,r}$, respectively. These represent the mean and standard deviation of $\boldsymbol{x}^{\text{patch}}_{c,r}$. Subsequently, these two statistics are stored in the cache tables using their coordinates as keys; specifically, $T_{\mu}[c][r]:={\mu}^{\text{patch}}_{c,r}$ and $T_{\sigma}[c][r]:={\sigma}^{\text{patch}}_{c,r}$.
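A minimal PyTorch-style sketch of this prefetching step, assuming per-channel statistics and dictionary cache tables keyed by coordinates (names and the eps value are illustrative):

```python
import torch

def prefetch_step(x_patch, coord, mean_table, std_table, eps=1e-5):
    """Instance-normalize a (B, C, H, W) patch and cache its statistics.

    mean_table and std_table are dicts keyed by the patch coordinates
    (c, r); the per-channel mean and standard deviation are stored so the
    inference branch can query them later.
    """
    mu = x_patch.mean(dim=(2, 3), keepdim=True)                          # (B, C, 1, 1)
    sigma = x_patch.var(dim=(2, 3), keepdim=True, unbiased=False).add(eps).sqrt()
    mean_table[coord] = mu
    std_table[coord] = sigma
    return (x_patch - mu) / sigma  # affine gamma/beta omitted for brevity
```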

Inference branch and dense normalization. When a patch $\boldsymbol{x}^{\text{patch}}_{c,r}$ enters the inference branch, it first uses its coordinates to query the cache tables for its own and its eight neighbors’ statistical moments. Specifically, we query $T_{\mu}$ and $T_{\sigma}$ with keys $\{c-1,c,c+1\}\times\{r-1,r,r+1\}$, yielding two $3\times 3$ matrices: $\tilde{\boldsymbol{\mu}}^{\text{patch}}_{c,r}$ and $\tilde{\boldsymbol{\sigma}}^{\text{patch}}_{c,r}$. Our goal is to derive two $N\times N$ pixel-level statistical moment estimations, $\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r}$ and $\hat{\boldsymbol{\sigma^{*}}}^{\text{pixel}}_{c,r}$, from $\tilde{\boldsymbol{\mu}}^{\text{patch}}_{c,r}$ and $\tilde{\boldsymbol{\sigma}}^{\text{patch}}_{c,r}$, respectively, for $\boldsymbol{x}^{\text{patch}}_{c,r}$.

We assume that the patch-level statistics represent the statistics of the central pixel of the patch; for example, $\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r}\left[\frac{N}{2}\right]\left[\frac{N}{2}\right]={\mu}^{\text{patch}}_{c,r}$. Hence, we can use the process below to derive $\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r}$ from $\tilde{\boldsymbol{\mu}}^{\text{patch}}_{c,r}$, with Fig. 4(a) providing a visual representation of the entire interpolation process.

  1. Perform fast interpolation on each corner of the $3\times 3$ matrix $\tilde{\boldsymbol{\mu}}^{\text{patch}}_{c,r}$; for instance, the top-left corner is the $2\times 2$ submatrix $\tilde{\boldsymbol{\mu}}^{\text{patch}}_{c,r}[0:1,0:1]$.

  2. Each fast interpolation expands a $2\times 2$ submatrix into an $N\times N$ matrix.

  3. By interpolating all four $2\times 2$ submatrices into $N\times N$ matrices, a larger $2N\times 2N$ matrix is constructed.

  4. The central $N\times N$ submatrix is then extracted from this $2N\times 2N$ matrix, serving as the pixel-level statistical estimation $\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r}$ for the patch $\boldsymbol{x}^{\text{patch}}_{c,r}$.

For $\tilde{\boldsymbol{\sigma}}^{\text{patch}}_{c,r}$, we first calculate the inverse of each element to form $\tilde{\boldsymbol{\sigma}^{*}}^{\text{patch}}_{c,r}$. Then, the same interpolation and cropping processes are conducted to obtain the pixel-level statistical estimation $\hat{\boldsymbol{\sigma^{*}}}^{\text{pixel}}_{c,r}\in\mathbb{R}^{N\times N}$ for the patch $\boldsymbol{x}^{\text{patch}}_{c,r}$. A code sketch of this corner-wise procedure is given below.
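A minimal PyTorch-style sketch of this corner-wise estimation (names are illustrative; the precomputed weights are the fast-interpolation matrices derived in the “Fast interpolation” paragraph later in this section):

```python
import torch

def pixel_level_estimate(stats_3x3: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Estimate N x N pixel-level statistics from a 3 x 3 grid of
    patch-level statistics, following the corner-wise procedure above.

    stats_3x3: (C, 3, 3) patch-level moments of the patch and its eight
               neighbours.
    weights:   (2, 2, N, N) precomputed fast-interpolation matrices M_{k,l}
               (see the fast-interpolation sketch below).
    Returns a (C, N, N) map cropped from the centre of the assembled
    2N x 2N grid.
    """
    n = weights.shape[-1]
    full = stats_3x3.new_zeros(stats_3x3.shape[0], 2 * n, 2 * n)
    # Interpolate each of the four 2 x 2 corner submatrices to N x N.
    for i in range(2):          # top / bottom corner rows
        for j in range(2):      # left / right corner columns
            corner = stats_3x3[:, i:i + 2, j:j + 2]                # (C, 2, 2)
            block = torch.einsum('ckl,klij->cij', corner, weights)  # Eq. (8)
            full[:, i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    # The central N x N window corresponds to the current patch.
    half = n // 2
    return full[:, half:half + n, half:half + n]
```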

Now, we can utilize the pixel-level statistical moments to perform dense normalization, denoted as $DN(\cdot)$:

DN(\boldsymbol{x}^{\text{patch}},\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r},\hat{\boldsymbol{\sigma^{*}}}^{\text{pixel}}_{c,r})=\gamma\left((\boldsymbol{x}^{\text{patch}}-\hat{\boldsymbol{\mu}}^{\text{pixel}}_{c,r})\cdot\hat{\boldsymbol{\sigma^{*}}}^{\text{pixel}}_{c,r}\right)+\beta \quad (2)
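Eq. (2) is then a purely element-wise operation; a minimal sketch, assuming gamma and beta are the affine parameters of the original IN layer broadcast over the spatial dimensions:

```python
import torch

def dense_normalize(x_patch: torch.Tensor,
                    mu_pixel: torch.Tensor,
                    inv_sigma_pixel: torch.Tensor,
                    gamma: torch.Tensor,
                    beta: torch.Tensor) -> torch.Tensor:
    """Apply Eq. (2): every pixel of the (B, C, N, N) patch is shifted by
    its own estimated mean and scaled by its own estimated inverse standard
    deviation, then the IN layer's affine parameters gamma/beta of shape
    (1, C, 1, 1) are applied."""
    return gamma * (x_patch - mu_pixel) * inv_sigma_pixel + beta
```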

Fast interpolation. Unlike general interpolation, where input and output sizes vary from case to case, our DN repeatedly computes the interpolation from $\mathbb{R}^{2\times 2}$ to $\mathbb{R}^{N\times N}$, with $N$ being a constant. Hence, we reformulate bilinear interpolation into fast interpolation, which reduces computational demands and expedites DN computation.

Given a matrix $\boldsymbol{Q}\in\mathbb{R}^{2\times 2}$ that we wish to interpolate into $\boldsymbol{Q}^{\prime}\in\mathbb{R}^{N\times N}$, standard bilinear interpolation can be formulated as follows:

\boldsymbol{Q}=\begin{bmatrix}q_{0,0}&q_{0,1}\\ q_{1,0}&q_{1,1}\end{bmatrix} \quad (3)
\boldsymbol{Q}^{\prime}[i][j]=\frac{1}{N^{2}}\begin{bmatrix}N-v_{i}&v_{i}\end{bmatrix}\begin{bmatrix}q_{0,0}&q_{0,1}\\ q_{1,0}&q_{1,1}\end{bmatrix}\begin{bmatrix}N-v_{j}\\ v_{j}\end{bmatrix},\quad\forall i,j\in\{0,1,\dots,N-1\} \quad (4)
\text{where}\ v_{k}=\frac{kN}{N-1},\quad k\in\{0,1,\dots,N-1\} \quad (5)

This can be further reformulated as:

\boldsymbol{Q}^{\prime}[i][j]=\frac{1}{N^{2}}\left\langle\begin{bmatrix}(N-v_{i})(N-v_{j})&(N-v_{i})\cdot v_{j}\\ v_{i}\cdot(N-v_{j})&v_{i}\cdot v_{j}\end{bmatrix},\begin{bmatrix}q_{0,0}&q_{0,1}\\ q_{1,0}&q_{1,1}\end{bmatrix}\right\rangle \quad (6)
\phantom{\boldsymbol{Q}^{\prime}[i][j]}=\left\langle\begin{bmatrix}\frac{(N-v_{i})(N-v_{j})}{N^{2}}&\frac{(N-v_{i})\cdot v_{j}}{N^{2}}\\ \frac{v_{i}\cdot(N-v_{j})}{N^{2}}&\frac{v_{i}\cdot v_{j}}{N^{2}}\end{bmatrix},\begin{bmatrix}q_{0,0}&q_{0,1}\\ q_{1,0}&q_{1,1}\end{bmatrix}\right\rangle,\quad\forall i,j\in\{0,1,\dots,N-1\} \quad (7)

where $\left\langle\cdot,\cdot\right\rangle$ denotes the Frobenius inner product. Note that for a given coordinate $(i,j)$, the interpolated value $\boldsymbol{Q}^{\prime}[i][j]$ is a weighted sum of the elements of $\boldsymbol{Q}$ with a fixed set of weights, since $N$ is a constant.

Thus, the final interpolated result $\boldsymbol{Q}^{\prime}$ can be written as:

\boldsymbol{Q}^{\prime}=\sum_{k=0}^{1}\sum_{l=0}^{1}q_{k,l}\cdot\boldsymbol{M}_{k,l} \quad (8)

This equation represents the fast interpolation process (see Fig. 4 (b)), where the elements of each matrix $\boldsymbol{M}_{k,l}\in\mathbb{R}^{N\times N}$ are defined as follows:

\boldsymbol{M}_{0,0}[i][j]=\frac{(N-v_{i})(N-v_{j})}{N^{2}},\quad\forall i,j\in\{0,1,\dots,N-1\} \quad (9)
\boldsymbol{M}_{0,1}[i][j]=\frac{(N-v_{i})\cdot v_{j}}{N^{2}},\quad\forall i,j\in\{0,1,\dots,N-1\} \quad (10)
\boldsymbol{M}_{1,0}[i][j]=\frac{v_{i}\cdot(N-v_{j})}{N^{2}},\quad\forall i,j\in\{0,1,\dots,N-1\} \quad (11)
\boldsymbol{M}_{1,1}[i][j]=\frac{v_{i}\cdot v_{j}}{N^{2}},\quad\forall i,j\in\{0,1,\dots,N-1\} \quad (12)

This reformulation highlights fast interpolation’s desirable features (see Table 4). First, it consists solely of matrix multiplication, which can be accelerated by a GPU. Second, the matrices $\boldsymbol{M}_{k,l}$ are identical across all interpolation operations in our dense normalization, allowing them to be precomputed and cached.
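A minimal PyTorch-style sketch of this precomputation and of the resulting fast interpolation (names are illustrative assumptions):

```python
import torch

def precompute_fast_interp_weights(n: int) -> torch.Tensor:
    """Precompute the four N x N weight matrices M_{k,l} of Eqs. (9)-(12).

    Because N is fixed, the weights can be built once, cached, and reused
    for every fast-interpolation call; each interpolation then reduces to
    the weighted sum of Eq. (8).
    """
    v = torch.arange(n, dtype=torch.float32) * n / (n - 1)   # Eq. (5)
    a = (n - v) / n        # factor (N - v_k) / N
    b = v / n              # factor v_k / N
    weights = torch.stack([
        torch.outer(a, a),  # M_{0,0}
        torch.outer(a, b),  # M_{0,1}
        torch.outer(b, a),  # M_{1,0}
        torch.outer(b, b),  # M_{1,1}
    ]).reshape(2, 2, n, n)
    return weights


def fast_interpolate(q: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Interpolate a (C, 2, 2) matrix to (C, N, N) via Eq. (8)."""
    return torch.einsum('ckl,klij->cij', q, weights)
```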

Parallelism and single pass. Our dispatcher design obviates the need for separate caching and inference stages, enabling our framework to execute them concurrently in a single pass ($\mathcal{P}_{3}$). This efficiency is attributed to the inherent characteristics of GPUs. Specifically, processing a batch of images through a neural network layer (e.g., a convolutional layer) incurs a similar time cost regardless of the batch size. Consequently, two dispatched patches can be processed in parallel across most layers of the generator. Even though they perform different tasks upon reaching the DN layer, the operations are asynchronously enqueued and executed in parallel by the GPU. While data synchronization does require some time, it incurs only a minimal time cost. This parallel execution strategy allows the prefetching branch’s operations to be effectively “hidden” beneath those of the inference branch, markedly reducing the overall runtime (see Fig. 5).
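As a self-contained illustration of the batching claim above, the snippet below (requires a CUDA GPU; the layer size is an arbitrary assumption) times a convolution over one patch versus two stacked patches; on an underutilized GPU the two costs are close, which is what lets the prefetching patch ride along with the inference patch.

```python
import time
import torch

# Compare the per-call cost of a convolution for batch sizes 1 and 2.
conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda().eval()

def bench(batch_size: int, iters: int = 50) -> float:
    x = torch.randn(batch_size, 64, 512, 512, device="cuda")
    with torch.no_grad():
        conv(x)                      # warm-up
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            conv(x)
        torch.cuda.synchronize()
    return (time.time() - start) / iters

print(f"batch of 1: {bench(1) * 1e3:.2f} ms")
print(f"batch of 2: {bench(2) * 1e3:.2f} ms")
```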

4 Experiments

4.1 Datasets

Natural images. To assess the effectiveness of DN, we utilized two publicly accessible datasets: Kyoto summer2autumn [18] and real2paint [31]. Kyoto summer2autumn contains UHR unpaired images of summer and autumn landscapes (5,184×3,456 pixels), useful for seasonal image conversion. The real2paint dataset contains UHR paintings by Vincent Van Gogh (4,000×3,000 to 10,000×8,000 pixels) and real images (4,032×3,024 pixels) from the UHDM dataset [31]. Although low-resolution versions of Van Gogh painting datasets are available [32], we collected 21 high-resolution images of publicly available Van Gogh paintings online to facilitate research on UHR unpaired I2I translation. This curated list will be made publicly available.

Pathological whole slide images (WSIs). To demonstrate the versatility of our DN module, we performed experiments on two additional pathological datasets for stain transformation. The ACROBAT dataset [33] consists of UHR H&E WSIs and corresponding estrogen receptor (ER), anti-progesterone receptor (PGR), human epidermal growth factor receptor 2 (HER2), and Ki67 WSIs. We randomly selected unpaired H&E and PGR WSIs from this dataset as transformation targets. The ANHIR dataset [34] contains WSIs from various organs with different stainings and sizes ranging from 5,000×5,000 to 50,000×50,000 pixels. For this dataset, we selected unpaired breast (H&E to ER) and lung lesion (H&E to Ki67) WSIs as our stain transformation targets.

4.2 Experimental settings

In our experiments with the aforementioned datasets, we cropped the UHR images into 512×512 patches and trained the CycleGAN, CUT, and L-LSeSim frameworks for 100 epochs with default hyperparameters. We then replaced the IN layers with our DN layers for the inference process and compared the results with the patch-wise IN, TIN, and KIN methods. The experiments were conducted on an NVIDIA RTX 3090 GPU; due to GPU memory limitations, we were unable to directly use the IN layer with uncropped UHR images as input. We present results obtained using the CUT model, with further findings available in the supplementary material.
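As an illustration of this plug-and-play usage, a minimal PyTorch sketch of swapping IN layers for DN-style layers in a trained generator is given below; DenseNorm2d and make_dn are hypothetical names, not the released API.

```python
import torch

def replace_instance_norm(module: torch.nn.Module, make_dn):
    """Recursively swap every InstanceNorm2d in a trained generator for a
    DN-style layer produced by the user-supplied factory make_dn(old_layer).

    Because the swap happens at inference time and the factory can reuse
    any learned affine parameters of the old layer, no retraining is needed.
    """
    for name, child in module.named_children():
        if isinstance(child, torch.nn.InstanceNorm2d):
            setattr(module, name, make_dn(child))
        else:
            replace_instance_norm(child, make_dn)

# Usage sketch:
#   generator = ...  # load a trained CUT/CycleGAN generator
#   replace_instance_norm(generator, make_dn=lambda old: DenseNorm2d(old))
```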

4.3 Metrics

We compared the results from our DN, patch-wise IN, TIN, and KIN methods using qualitative and quantitative evaluation techniques. The translated UHR images were assessed using the standard Fréchet inception distance (FID) [35] metric. Additionally, we conducted a downstream task to differentiate between patches from the source and target domains. To explicitly expose the adverse effects of tiling artifacts and over/under-colorization, we intentionally cropped new patches spanning the boundaries of the raw translated patches.

However, since these metrics are not designed to evaluate UHR images and may not accurately identify tiling artifacts or over/under-colorizing, we conducted a detailed human evaluation consisting of three quality challenges. Participants were shown the source and translated images generated by DN, patch-wise IN, TIN, and KIN and asked to identify the image with the best quality, the fewest tiling artifacts, and the best color and hue. We recruited computer vision and pathology specialists in addition to the general population for the evaluation. For the translated WSIs, we conducted a fidelity challenge, following the AMT perceptual studies protocol [36].

5 Results

Figure 6: Results of translated images. The figure compares the translation results on UHR images using four normalization methods: patch-wise IN [17], TIN [19], KIN [18], and DN with a CUT [4] framework. Red close-up boxes highlight the comparisons of tiling artifacts, while yellow close-up boxes focus on evaluating over/under-colorizing and local hue preservation. DN shows the best performance overall. For a better view, please zoom in.

5.1 Qualitative evaluation

Fig. 6, Fig. S4, and Fig. S5 in the supplementary material show UHR images translated from natural and pathological WSI datasets. These images reveal that patch-wise IN generates a significant amount of gap-type tiling artifacts, while KIN mitigates some of these artifacts but at the expense of details, hue, and color. TIN reduces gap-type tiling artifacts but results in over/under-colorizing, loss of local hue details, and creates jitter-type tiling artifacts. Conversely, our DN approach is the only method that successfully diminishes tiling artifacts while maintaining local hue and color details, producing the best results.

5.2 Quantitative evaluation

Table 1: Quantitative Results. The best-performing method in each experiment is highlighted in bold. DN surpasses both TIN [19] and KIN [18], as indicated by the underlined results. With respect to the FID metric, DN generally introduces the least disturbance to diminish tiling artifacts, in some cases even outperforming the intuitive lower bound. Across all experiments for the domain differentiation downstream task, measured by accuracy, DN consistently exceeds the performance of other methods, showcasing its superior efficacy in UHR image translation.
                      FID ↓                                        Domain differentiation (%) ↑
                      Lower bound*  TIN      KIN      DN           Patch-wise IN  TIN    KIN    DN
summer2autumn         98.281        117.268  98.003   97.732       0.967          0.828  0.950  0.975
real2paint            234.732       249.612  238.561  237.202      0.971          0.556  0.757  0.986
ACROBAT               21.046        43.988   27.224   21.346       0.983          0.858  0.977  0.985
ANHIR (breast)        64.202        161.128  91.443   68.616       0.969          0.932  0.932  0.975
ANHIR (lung lesion)   130.672       174.450  133.263  130.062      0.880          0.863  0.880  0.900
  • Lower bound*: This is empirically achieved by patch-wise IN, as it is the optimized target of the backbone model, thereby setting an intuitive lower boundary for the FID values.

Table 1 (left part) presents the standard FID evaluation results on various datasets. Since the backbone model is optimized using patch-wise IN, this method can be considered as having the optimal FID values, thereby setting an intuitive lower bound, while the other methods aim to minimally disturb the translation process to remove tiling artifacts. Overall, our hyperparameter-free DN ($\mathcal{P}_{4}$) outperforms TIN and KIN, indicating that it introduces the smallest adjustment necessary to achieve the goal while preserving local color and hue ($\mathcal{P}_{2}$). Remarkably, in some cases, it even surpasses the intuitive lower bound. On the other hand, KIN secures second place due to its balance between patch-wise IN and TIN. TIN inevitably yields the worst results because the use of global statistics introduces the largest disturbance.

Table 1 (right part) displays the results of the domain differentiation downstream task. DN consistently outperforms all other methods, likely indicating that it introduces the fewest tiling artifacts ($\mathcal{P}_{1}$) while preserving hue and color ($\mathcal{P}_{2}$). On the other hand, TIN yields the worst results, probably due to the large disturbance introduced by the use of global statistics.

To address the limitations of the available metrics, we employed human evaluation to assess image quality (see Table 2). Three image quality challenges were completed by forty participants, and our DN method achieved the best performance across all three challenges ($\mathcal{P}_{1}$ and $\mathcal{P}_{2}$), particularly for the Kyoto summer2autumn dataset. Additionally, we recruited eight computer vision specialists to evaluate the translation of natural images and eight pathology specialists to evaluate the results of stain transformation. Interestingly, the effectiveness of DN was even more pronounced among these specialists. Furthermore, the fidelity challenge (see Fig. S8 in the supplementary material) revealed that the images generated by DN were nearly indistinguishable from real pathological images.

Table 2: Human evaluation results conducted by the general public and experts. The best-performing method in each experiment is highlighted in bold. DN outperforms patch-wise IN [17], TIN [19], and KIN [18], as indicated by the underlined results. Overall, DN is the most preferred method across all aspects.
Has the best quality (%) ↑
                      By the general public            By experts
                      IN*     TIN     KIN     DN       IN*     TIN     KIN     DN
summer2autumn         10.00   15.79   12.11   62.11    0.00    5.71    11.43   82.86
real2paint            3.68    34.21   22.11   40.00    0.00    28.57   22.86   48.57
stain transformation  -       -       -       -        14.29   10.71   3.57    71.43

Has the fewest tiling artifacts (%) ↑
                      By the general public            By experts
                      IN*     TIN     KIN     DN       IN*     TIN     KIN     DN
summer2autumn         4.74    18.95   12.63   63.68    0.00    14.29   8.57    77.14
real2paint            3.68    34.21   22.11   40.00    0.00    22.86   22.86   54.29
stain transformation  -       -       -       -        7.14    17.86   10.71   64.29

Exhibits the best color and hue (%) ↑
                      By the general public            By experts
                      IN*     TIN     KIN     DN       IN*     TIN     KIN     DN
summer2autumn         6.32    15.26   12.63   65.79    0.00    2.86    11.43   85.71
real2paint            7.37    32.11   18.9    41.58    0.00    25.71   20.00   54.29
stain transformation  -       -       -       -        14.29   7.14    3.57    75.00
  • IN*: patch-wise IN; -: not applicable

5.3 Runtime and resource utilization

Table 3 presents the runtime and GPU VRAM usage of the various methods. Employing operations on statistical moments generally leads to longer runtime but yields superior results. Distinctively, DN achieves faster execution in a single pass ($\mathcal{P}_{3}$) compared to KIN, with only a modest increase in GPU VRAM usage. This efficiency is due to the parallel execution of the prefetching and inference branches, highlighting DN’s innovative approach to parallelism design.

Table 3: Comparison of runtime and GPU memory usage. Using an NVIDIA RTX 3090 GPU, we benchmarked the runtime and GPU VRAM usage for a 4,302×3,024 image. One-stage DN, despite involving substantial operations on statistical moments, runs faster than KIN.
                          IN*          TIN          KIN          DN (two-stage)  DN (single-pass)
Statistics type           patch-level  image-level  patch-level  pixel-level     pixel-level
# of pipeline stages      1            1            2            2               1
Operations on statistics  ✗            ✗            ✓            ✓               ✓
Runtime (s)               2.46         2.62         4.42         5.51            4.35
GPU VRAM usage (MB)       2951         3335         3145         3161            4157
  • IN*: patch-wise IN

5.4 Ablation study

Figure 7: Ablation study for interpolating granularity. The gradual introduction of tiling artifacts is observed with increasing interpolating granularity. DN takes the most comprehensive approach to normalization by interpolating every pixel.

Interpolating granularity. DN interpolates the statistical measures for every pixel, i.e., an interpolating granularity of 1 pixel. By incrementally increasing the interpolating granularity to 2, 4, and up to 512 pixels, gap-type tiling artifacts begin to emerge gradually, as illustrated in Fig. 7. These results affirm that DN effectively diminishes tiling artifacts through its pixel-level statistical moment estimation.

Runtime Optimization. DN employs a fast interpolation algorithm and prefetching parallelism strategy to enable exhaustive estimation of pixel-level statistical moments efficiently. Table 4 presents evidence of significant acceleration. Performance benchmarks conducted on a single NVIDIA RTX 3090 GPU revealed that these strategies could achieve a speedup of 44 times for the entire image and 53 times per patch inference, respectively.

Table 4: Speedup achieved using fast interpolation and prefetching parallelism. Benchmarking was conducted on an NVIDIA RTX 3090 GPU to evaluate runtime optimization strategies for a 4,302×3,024 image. Although prefetching parallelism requires processing a slightly higher number of patches, it significantly enhances performance, achieving final speedups of 44 and 53 times for the entire image and per patch, respectively.
Fast interpolation             Prefetching   # of     Runtime (s)     Runtime (s)  Speedup (times)  Speedup (times)
Reformulation  Precomputation  parallelism   patches  (entire image)  (per patch)  (entire image)   (per patch)
✗              ✗               ✗             35       192.50          5.50         1x               1x
✓              ✗               ✗             35       8.28            0.24         23x              23x
✓              ✓               ✗             35       5.51            0.16         35x              35x
✓              ✓               ✓             42       4.35            0.10         44x              53x

6 Conclusion

In this study, we have introduced DN for UHR unpaired I2I translation. DN estimates pixel-level statistical moments for normalization, thereby diminishing tiling artifacts and preserving local hue and color simultaneously. It can be seamlessly integrated into any unpaired I2I translation model equipped with IN layers, without necessitating model retraining or hyperparameter tuning. The proposed fast interpolation algorithm allows DN to efficiently estimate statistical moments for every pixel. Additionally, a prefetching parallelism strategy enables DN to operate in a single pass. Experimental results have demonstrated that DN outperforms all prior methods on datasets containing natural images and pathological WSIs. Furthermore, DN’s ability to successfully perform stain transformation highlights its practicality in the medical domain.

Limitations and discussion. Although our research has demonstrated the superiority of DN over previous methods for UHR unpaired I2I translation, DN still requires patch-wise processing. Consequently, it can struggle to maintain the continuity of translated objects across patches. In addition, although less pronounced than with TIN, jitter-type tiling artifacts occasionally emerge in the results, slightly compromising seamlessness. Addressing these limitations remains a goal for future work.

Furthermore, there is a lack of appropriate metrics and datasets for evaluating existing methods in UHR image translation. While we conducted human evaluations to mitigate this limitation, we recognize the importance of creating new metrics and releasing large datasets. To this end, we have released a curated list of the real2paint dataset to encourage further research into UHR unpaired I2I translation.

References

  • [1] Henri Hoyez, Cédric Schockaert, Jason Rambach, Bruno Mirbach, and Didier Stricker. Unsupervised image-to-image translation: A review. Sensors, 22(21):8540, 2022.
  • [2] Shizuo Kaji and Satoshi Kida. Overview of image-to-image translation by use of deep neural networks: denoising, super-resolution, modality conversion, and reconstruction in medical imaging. Radiological physics and technology, 12:235–248, 2019.
  • [3] Yingxue Pang, Jianxin Lin, Tao Qin, and Zhibo Chen. Image-to-image translation: Methods and applications. IEEE Transactions on Multimedia, 24:3859–3881, 2021.
  • [4] Taesung Park, Alexei A Efros, Richard Zhang, and Jun-Yan Zhu. Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision, pages 319–345. Springer, 2020.
  • [5] Ryohei Funatsu, Steven Huang, Takayuki Yamashita, Kevin Stevulak, Jeff Rysinski, David Estrada, Shi Yan, Takuji Soeno, Tomohiro Nakamura, Tetsuya Hayashida, et al. 6.2 133Mpixel 60fps CMOS image sensor with 32-column shared high-speed column-parallel SAR ADCs. In 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pages 1–3. IEEE, 2015.
  • [6] Masahiko Kitamura, Daisuke Shirai, Kunitake Kaneko, Takahiro Murooka, Tomoko Sawabe, Tatsuya Fujii, and Atsushi Takahara. Beyond 4K: 8K 60p live video streaming to multiple sites. Future Generation Computer Systems, 27(7):952–959, 2011.
  • [7] Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, and Mingli Song. Neural style transfer: A review. IEEE transactions on visualization and computer graphics, 26(11):3365–3385, 2019.
  • [8] Ivana Žeger, Sonja Grgic, Josip Vuković, and Gordan Šišul. Grayscale image colorization methods: Overview and evaluation. IEEE Access, 9:113326–113346, 2021.
  • [9] Kevin de Haan, Yijie Zhang, Jonathan E Zuckerman, Tairan Liu, Anthony E Sisk, Miguel FP Diaz, Kuang-Yu Jen, Alexander Nobori, Sofia Liou, Sarah Zhang, et al. Deep learning-based transformation of H&E stained tissues into special stains. Nature Communications, 12(1):4884, 2021.
  • [10] Xilin Yang, Bijie Bai, Yijie Zhang, Yuzhu Li, Kevin de Haan, Tairan Liu, and Aydogan Ozcan. Virtual stain transfer in histology via cascaded deep neural networks. ACS Photonics, 9(9):3134–3143, 2022.
  • [11] Ranran Zhang, Yankun Cao, Yujun Li, Zhi Liu, Jianye Wang, Jiahuan He, Chenyang Zhang, Xiaoyu Sui, Pengfei Zhang, Lizhen Cui, et al. MVFStain: Multiple virtual functional stain histopathology images generation based on specific domain mapping. Medical Image Analysis, 80:102520, 2022.
  • [12] Eva-Maria Birkman, Naziha Mansuri, Samu Kurki, Annika Ålgars, Minnamaija Lintunen, Raija Ristamäki, Jari Sundström, and Olli Carpén. Gastric cancer: immunohistochemical classification of molecular subtypes and their association with clinicopathological characteristics. Virchows Archiv, 472:369–382, 2018.
  • [13] Kentaro Inamura. Update on immunohistochemistry for the diagnosis of lung cancer. Cancers, 10(3):72, 2018.
  • [14] Yuda Song, Hui Qian, and Xin Du. Multi-curve translator for high-resolution photorealistic image translation. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XV, pages 126–143. Springer, 2022.
  • [15] Jie Liang, Hui Zeng, and Lei Zhang. High-resolution photorealistic image translation in real-time: A laplacian pyramid translation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9392–9400, 2021.
  • [16] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE international conference on computer vision, pages 1501–1510, 2017.
  • [17] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
  • [18] Ming-Yang Ho, Min-Sheng Wu, and Che-Ming Wu. Ultra-high-resolution unpaired stain transformation via kernelized instance normalization. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXI, pages 490–505. Springer, 2022.
  • [19] Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, and Ping Luo. Towards ultra-resolution neural style transfer via thumbnail instance normalization. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
  • [20] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
  • [21] Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. Learning to discover cross-domain relations with generative adversarial networks. In International conference on machine learning, pages 1857–1865. PMLR, 2017.
  • [22] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2849–2857, 2017.
  • [23] Sagie Benaim and Lior Wolf. One-sided unsupervised domain mapping. In NIPS, 2017.
  • [24] Yihao Zhao, Ruihai Wu, and Hao Dong. Unpaired image-to-image translation using adversarial consistency loss. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX 16, pages 800–815. Springer, 2020.
  • [25] Chuanxia Zheng, Tat-Jen Cham, and Jianfei Cai. The spatially-correlative loss for various image translation tasks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
  • [26] Chanyong Jung, Gihyun Kwon, and Jong Chul Ye. Exploring patch-wise semantic relation for contrastive learning in image-to-image translation tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18260–18269, 2022.
  • [27] Min Zhao, Fan Bao, Chongxuan Li, and Jun Zhu. EGSDE: Unpaired image-to-image translation via energy-guided stochastic differential equations. arXiv preprint arXiv:2207.06635, 2022.
  • [28] Amal Lahiani, Jacob Gildenblat, Irina Klaman, Shadi Albarqouni, Nassir Navab, and Eldad Klaiman. Virtualization of tissue staining in digital pathology using an unsupervised deep learning approach. In European Congress on Digital Pathology, pages 47–55. Springer, 2019.
  • [29] Thomas de Bel, Meyke Hermsen, Jesper Kers, Jeroen van der Laak, and Geert Litjens. Stain-transforming cycle-consistent generative adversarial networks for improved segmentation of renal histopathology. In International Conference on Medical Imaging with Deep Learning–Full Paper Track, 2018.
  • [30] Amal Lahiani, Irina Klaman, Nassir Navab, Shadi Albarqouni, and Eldad Klaiman. Seamless virtual whole slide image synthesis and validation using perceptual embedding consistency. IEEE Journal of Biomedical and Health Informatics, 25(2):403–411, 2020.
  • [31] Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Jiajun Shen, Jia Li, and Xiaojuan Qi. Towards efficient and scale-robust ultra-high-definition image demoiréing. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVIII, pages 646–662. Springer, 2022.
  • [32] Guilherme Folego, Otavio Gomes, and Anderson Rocha. From impressionism to expressionism: Automatically identifying van gogh’s paintings. In 2016 IEEE international conference on image processing (ICIP), pages 141–145. IEEE, 2016.
  • [33] Philippe Weitz, Masi Valkonen, Leslie Solorzano, Johan Hartman, Pekka Ruusuvuori, and Mattias Rantalainen. ACROBAT-automatic registration of breast cancer tissue. In 10th International Workshop on Biomedical Image Registration, 2022.
  • [34] Jiří Borovec, Jan Kybic, Ignacio Arganda-Carreras, Dmitry V Sorokin, Gloria Bueno, Alexander V Khvostikov, Spyridon Bakas, Eric I-Chao Chang, Stefan Heldmann, et al. ANHIR: Automatic non-rigid histological image registration challenge. IEEE Transactions on Medical Imaging, 39(10):3042–3052, 2020.
  • [35] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
  • [36] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.