
Deep Reformulated Laplacian Tone Mapping

Jie Yang Corresponding author e-mail: [email protected] Westlake University, Hangzhou 310024, China Ziyi Liu I2Sense lab, University of Calgary, Calgary T2N 1N4, Canada Mengchen Lin I2Sense lab, University of Calgary, Calgary T2N 1N4, Canada Svetlana Yanushkevich Biometric Technologies Laboratory, Schulich School of Engineering, University of Calgary, Calgary T2N 1N4, Canada Orly Yadid-Pecht I2Sense lab, University of Calgary, Calgary T2N 1N4, Canada
Abstract

Wide dynamic range (WDR) images contain more scene detail and contrast than common images. However, their pixel values must be tone mapped before they can be displayed properly, and the details of WDR images can diminish during the tone mapping process. In this work, we address this problem by combining a novel reformulated Laplacian pyramid with deep learning. The reformulated Laplacian pyramid always decomposes a WDR image into two frequency bands, where the low-frequency band is global feature-oriented and the high-frequency band is local feature-oriented. The reformulation preserves the local features at their original resolution and condenses the global features into a low-resolution image. The generated frequency bands are reconstructed and fine-tuned to output the final tone mapped image, which can be displayed on screen with minimal detail and contrast loss. The experimental results demonstrate that the proposed method outperforms state-of-the-art WDR image tone mapping methods. The code is made publicly available at https://github.com/linmc86/Deep-Reformulated-Laplacian-Tone-Mapping.

Index Terms:
Tone mapping, wide dynamic range image, image processing, machine learning.

I Introduction

Wide dynamic range (WDR) imaging plays an important role in many imaging-related applications including photography, machine vision, medical imaging, and self-driving cars. Unlike traditional images that may suffer from under- and over-exposure, WDR images are obtained with WDR sensors that have a huge dynamic range [1, 2] or with radiance recovery algorithms such as [3] that take multiple exposure images to compensate for the under- and over-exposed regions. WDR images largely avoid the detail and contrast loss issues of conventional low dynamic range (LDR) images that often affect the human visual experience. However, unlike most conventional LDR images where the pixel values range from 0 to 255, the pixels of WDR images are distributed over a much wider range. They may range from 0 to 1,000,000 or from $1\times 10^{-6}$ to 1.0, depending on the way they are acquired. Although WDR display devices do exist in the commercial market for direct WDR display, they are still far from representing all available luminance levels. In fact, the absolute majority of displays are, and in the foreseeable future will most likely remain, of a very limited dynamic range. Therefore, to show WDR images on commonly used displays, additional tone mapping is still needed to convert the WDR images to a standard displayable level. To avoid any misunderstanding, we call the displayable image that is tone mapped from WDR a WDR-LDR image, to distinguish it from conventional LDR images.

Previous methods of tone mapping employ various gradient reduction methods to compress the dynamic range [4, 5, 6, 7, 8]. Unfortunately, their WDR-LDR output often inevitably loses some of the detail and contrast that are preferred by the human visual system (HVS). This is because tone mapping is not only a gradient reduction problem, but rather an in-depth topic involving human perception. A good WDR tone mapping algorithm should not only compress the large gradients but also enhance the local details of WDR images. In this paper, we propose to directly learn the global compression and local detail manipulation functionalities between WDR images and WDR-LDR images. Our method takes a WDR image as input and tone maps it to WDR-LDR automatically, compressing the global dynamic range while enhancing local details. This work has the following key contributions. First, we present the reformulated Laplacian method to decompose the global and local features of the original WDR image. The reformulated Laplacian method condenses the global features into a low-resolution image, which facilitates global feature extraction during convolution operations. Second, we present a two-stage network architecture and a full end-to-end learning approach which can directly tone map WDR images to WDR-LDR images. The entire network is a combination of three sub-networks that focus on global compression, local manipulation, and fine tuning, respectively. The three sub-networks work cooperatively to produce the final WDR-LDR image. Code and model are available on our project page.

II Related Work

In this section, we discuss works relevant to our research. These works include image-to-image transformation, conventional approaches that tone map a WDR image to a WDR-LDR image, and reverse tone mapping that reconstructs a WDR image from an LDR image.

Image-to-image transformation Generally speaking, WDR tone mapping is an image-to-image transformation task. In recent years, many image-to-image transformation tasks have been tackled by training deep convolutional neural networks. For example, deep neural networks for denoising, colorization, semantic segmentation, and super-resolution applications have been widely proposed and show great performance improvements when compared with traditional methods [9, 10, 11, 12]. Style transfer methods that adopt a perceptual loss can produce an artistically rendered counterpart of an input image while preserving the original image content [13, 14]. The perceptual loss measures high-level image feature representations extracted from pre-trained convolutional neural networks. These features are more relevant to the HVS than simple pixel values. In-network encoder-decoder architectures are also widely used in image transformation works, where the original image is encoded into a low-dimensional latent representation and then decoded to reconstruct the required image [11, 15, 16, 17].

LDR to WDR The most well-known approach to generating a WDR image is to merge multiple LDR photographs taken with different exposures [3]. It is still widely used in many applications. To remove ghost artifacts caused by misalignment between images of different exposures, many effective techniques were proposed [18, 19, 20], including a CNN-based solution [21]. Unlike WDR radiance recovery, which fuses all available information of a bracketed set of images, reverse tone mapping generates the missing information from a single LDR image. In recent years, with the growing popularity of machine learning and abundant WDR sources, traditional reverse tone mapping methods [22, 23, 24, 25] have been outperformed by machine learning based approaches. Endo et al. [26] proposed a convolutional neural network architecture that is able to create a series of bracketed images of different exposures from a single LDR image. A WDR image is then generated from these bracketed images. Eilertsen et al. [27] proposed an encoder-decoder architecture that is able to reconstruct a WDR image from an arbitrary single-exposure LDR image with unknown camera response functions and post-processing. Marnerides et al. [28] proposed ExpandNet, which consists of three different branches to reconstruct the missing information of an LDR image. A generative adversarial network has also been proposed to carry out reverse tone mapping, generating images with a wider dynamic range [29].

WDR to WDR-LDR The research on tone mapping a WDR image to a WDR-LDR image has lasted for decades. The simplest approach to tone mapping a WDR image is to use a global tone mapping operator (TMO). A global TMO applies a single global function to all pixels in the image, so identical pixel values are given an identical output value within the range of the display device. Tumblin and Rushmeier [30] and Ward [31] were among the early researchers who developed global operators for tone mapping.

Recently, Khan et al. [32] proposed a global TMO that uses a sensitivity model of the human visual system. In general, global TMOs are easy to implement and mostly artifact-free, and they have unique advantages in hardware implementations. However, the tone mapped images mostly suffer from low brightness, low contrast, and loss of details due to the global compression of the dynamic range. Different from global TMOs, local TMOs are able to tone map a pixel based on local statistics and reveal more detail and contrast. Some early local TMOs were inspired by certain features of the human visual system [6, 33, 34]. Some local TMOs solve WDR image compression as a constrained optimization problem [35, 36]. In recent years, various edge-preserving-filter-based TMOs were developed [5, 37, 8, 7] and showed unprecedented results when compared with the aforementioned methods. A comprehensive review and classification of tone mapping algorithms can be found in [38]. Recently, a machine learning method that can effectively calculate the coefficients of a locally-affine model in bilateral space was reported [39]. It shows the great potential and performance that machine learning can provide for WDR tone mapping.

Refer to caption
Figure 1: An overview of the 10-layer CNN architecture.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 2: Tone mapped results of the CNN with the flat architecture (a and b) and with the reformulated Laplacian pyramid architecture (c and d). Images (a) and (c), and images (b) and (d), show the same scene, respectively.

III Approach

To train a learning-based TMO to learn the mapping from a WDR image to a WDR-LDR image, we originally designed our CNN model as the 10-layer flat architecture with skip connections shown in Figure 1. We used a combination of the well-known $\ell_1$-norm loss, structural dissimilarity (DSSIM) [40] loss, and feature loss [13] to train our network. The $\ell_1$-norm can be formulated as:

$\ell_{1}=\|f(x;\theta)-y\|_{1}$ (1)

where $f(\cdot)$ represents the CNN network that takes the input image $x$ and the weights $\theta$, and $y$ is the ground truth.

The DSSIM loss is a variation of the Structural Similarity index (SSIM) [41] that reflects the distance between two images. The DSSIM can be formulated as:

$DSSIM=1-\mathrm{SSIM}(f(x;\theta),y)$ (2)

Feature loss, $\ell_{feat}(x,y)$, as a part of the perceptual loss, was proposed by [13]. It uses the 16-layer VGG network pre-trained on ImageNet to measure the semantic differences between two images. Unlike the $\ell_1$-norm, which pushes the output image to exactly match the label at each pixel, $\ell_{feat}(x,y)$ encourages them to increase their similarity at different feature levels. Suppose $\phi_{i}(x)$ is the output of the feature loss network at the $i$-th activation layer, and the activation map has a shape of $W_{i}\times H_{i}\times C_{i}$. We adopted 5 convolutional layers of VGG-16. The feature loss function is formulated as:

$\ell_{feat}(x_{g},y_{g})=\sum_{i=1}^{5}\frac{1}{W_{i}H_{i}C_{i}}\|\phi_{i}(x_{g})-\phi_{i}(y_{g})\|_{1}$ (3)
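For concreteness, the following is a minimal sketch of how the feature loss of Eq. (3) could be computed with a pre-trained VGG-16 in TensorFlow/Keras. The five activation layers named here are an assumption, since the text only states that five convolutional layers of VGG-16 are used, and the VGG input preprocessing is omitted for brevity.

```python
import tensorflow as tf

# Feature extractor built from VGG-16 pre-trained on ImageNet.
# The five layers chosen here are an assumption; the paper only states that
# five convolutional layers of VGG-16 are used.
_vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
_layers = ["block1_conv2", "block2_conv2", "block3_conv3",
           "block4_conv3", "block5_conv3"]
_feature_net = tf.keras.Model(
    inputs=_vgg.input,
    outputs=[_vgg.get_layer(name).output for name in _layers])
_feature_net.trainable = False

def feature_loss(x, y):
    """Eq. (3): l1 distance of VGG activations, normalized by W_i * H_i * C_i."""
    # VGG expects 3-channel input; single-channel luminance images are tiled.
    if x.shape[-1] == 1:
        x, y = tf.image.grayscale_to_rgb(x), tf.image.grayscale_to_rgb(y)
    loss = 0.0
    for phi_x, phi_y in zip(_feature_net(x), _feature_net(y)):
        # Mean over W x H x C equals (1 / W_i H_i C_i) * ||.||_1 per image,
        # additionally averaged over the batch dimension.
        loss += tf.reduce_mean(tf.abs(phi_x - phi_y))
    return loss
```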
Refer to caption
Figure 3: An overview of the proposed deep multi-band tone mapping architecture. It decomposes an input WDR image into multiple frequency bands with a Laplacian pyramid; every band is mapped to the WDR-LDR domain with a specific deep neural network.

After many experimental attempts, the results directly generated by such a CNN with skip connection architecture were unsatisfactory. The tone mapped images exhibit severe contrast loss and color distortion. Figure 2 shows two examples of the tone mapped results compared to our novel TMO that will be introduced in the next section. These cases demonstrate that a CNN, in general, can be used to compress the gradient of a WDR image. However, a CNN with this architecture lacks the ability to generate a WDR-LDR image with smooth texture and high contrast. It also fails to preserve the details in the overexposed regions, such as the scenery outside the window in (a). In addition, many halo artifacts can be visually observed in high-gradient areas. This is likely because a CNN with an ordinary architecture has difficulty extracting the high-frequency features of a WDR image. Frequency here means the rate of change of intensity per pixel: an area of the image that changes from black to white over many pixels is low-frequency, and vice versa. For that reason, we came up with the idea of redesigning our CNN to operate on the different image frequencies. One network can focus on the gradient compression in the high-frequency layer, while the other network focuses on the compression of the naturalness. In the end, the results for the two image frequency bands are reconstructed to generate the tone mapped WDR-LDR image. We combine our CNN with the reformulated Laplacian pyramid to complete this task.

Figure 3 presents an overview of the novel architecture. The objective of our work is to find the weights $\theta$ which tone map the input image $x$ to an output image $\hat{x}$, i.e. $\hat{x}=f(x;\theta)$. The input WDR image $x$ is first decomposed into $n$ different frequency bands $x_{0}$, $x_{1}$, ..., $x_{n-2}$, $x_{g}$ with Laplacian pyramid decomposition, where $x_{0}$ is the highest frequency band and $x_{g}$ is the lowest frequency band. The high frequency bands from $x_{0}$ to $x_{n-2}$ are further Laplacian reconstructed into a single image $x_{l}$ which has the original resolution of the WDR image. The entire network $f$ is composed of three sub-networks: the global compression network $f_{g}$, the local manipulation network $f_{l}$, and the fine tune network $f_{t}$. $f_{g}$ is used to generate the low frequency Laplacian component $\hat{x}_{g}$, i.e. $\hat{x}_{g}=f_{g}(x_{g};\theta_{g})$. Network $f_{l}$ is used to generate the high frequency components $\hat{x}_{l}$, i.e. $\hat{x}_{l}=f_{l}(x_{l};\theta_{l})$. Network $f_{g}$ handles global features while network $f_{l}$ deals with the high frequency local features. The generated images $\hat{x}_{l}$ and $\hat{x}_{g}$ are reconstructed and fine tuned through network $f_{t}$ to output the final WDR-LDR image $\hat{x}$.
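A schematic of this forward pass is sketched below in Python; the helper functions laplacian_decompose and laplacian_reconstruct and the three sub-network objects are placeholders for the components defined in the following subsections.

```python
import tensorflow as tf

def tone_map(x, n, global_net, local_net, fine_tune_net):
    """Schematic forward pass of the proposed model (names are illustrative).

    x: WDR luminance image of shape [batch, H, W, 1]; n: number of pyramid levels.
    """
    # 1. Laplacian decomposition into bands x_0, ..., x_{n-2}, x_g.
    bands = laplacian_decompose(x, n)                # assumed helper (Sec. III-A)
    x_g = bands[-1]                                  # lowest-frequency band
    # 2. Reconstruct all high-frequency bands into a full-resolution image x_l.
    x_l = laplacian_reconstruct(bands[:-1])          # assumed helper (Sec. III-A)
    # 3. Map each layer with its dedicated sub-network.
    x_g_hat = global_net(x_g)                        # global compression  f_g
    x_l_hat = local_net(x_l)                         # local manipulation  f_l
    # 4. Upsample the low-frequency result and merge the two layers.
    x_g_up = tf.image.resize(x_g_hat, tf.shape(x_l_hat)[1:3])
    x_t_hat = x_g_up + x_l_hat
    # 5. Fine tune the merged image to produce the final WDR-LDR output.
    return fine_tune_net(x_t_hat)
```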

III-A Laplacian Pyramid Reformulation

A Laplacian pyramid condenses the global luminance information of an image to a lower resolution without sacrificing detail, since the traditional Laplacian pyramid reconstruction operation will nevertheless restore the image. On the other hand, applying convolutional operations over an image of lower resolution effectively decreases the computational complexity, and thus reduces the requirements on the computing device.

A WDR image can be segmented into $n$ different frequency bands with a Laplacian pyramid. The lowest frequency band contains the global luminance terrain of the original image, and the higher frequency bands contain local detail and textural information which varies quickly in space. The advantages of using Laplacian decomposition are apparent:

1. Taking the lowest frequency layer $x_{g}$ as an example, its resolution is reduced by $2^{n-1}$ times in both width and height when compared to the original image; moreover, the global luminance terrain is well preserved in $x_{g}$.

2. In subsequent processing, even a small kernel in the neural network can process a large receptive field of the original image.

3. Additionally, the low-resolution input $x_{g}$ can significantly reduce the computation required for training the parameters $\theta_{g}$. Figure 4 shows a visual comparison for different choices of the number of frequency bands $n$.

However, the generated Laplacian pyramid also has certain drawbacks. Firstly, there are $n$ layers of images, each containing different frequency components of the original WDR image. It would be difficult to process all the different layers with a single neural network, because the low frequency layer needs to be compressed greatly while the high frequency layers only need to be manipulated locally. Furthermore, if the $n$ layers are processed with $n$ different neural networks, the complexity of the tone mapping model will grow, making it infeasible to fit in hardware devices.

Refer to caption
Figure 4: Visual comparison of the resulting images in different frequency bands. (a) is the ground truth image. (b), (c), (d) and (e) are the images with the number of frequency bands $n=2$, $3$, $4$ and $5$, respectively.

To overcome the mentioned drawbacks of the Laplacian pyramid, we reconstruct an image from the entire Laplacian pyramid without the lowest frequency layer. The generated $x_{l}$ is a single image that has the same resolution as the original WDR image. Now, the Laplacian pyramid is reformulated into two layers, with $x_{l}$ representing all high frequency components and $x_{g}$ representing the low frequency global luminance terrain. Figure 5 intuitively shows the relationship and the difference between the reformulated Laplacian pyramid and the original one. Using two layers in the Laplacian pyramid structure has the following advantages. First, it reduces the original Laplacian pyramid model from $n$ layers to $2$ layers, hence the computational complexity of subsequent processing is significantly reduced. Secondly, the segmentation of the high and low frequency components of the WDR image allows networks $f_{g}$ and $f_{l}$ to focus on simple tasks, namely global compression and local detail manipulation, respectively.
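The reformulation can be expressed in a few lines; the sketch below uses OpenCV's pyramid operators on a single-channel luminance image and is only meant to illustrate the two-layer split described above.

```python
import cv2
import numpy as np

def reformulate_laplacian(img, n):
    """Split a luminance image into the two reformulated layers (a sketch).

    Returns (x_l, x_g): x_l holds all high-frequency bands at full resolution,
    x_g is the low-frequency band at 1/2^(n-1) of the original resolution.
    """
    # Standard n-level Laplacian decomposition.
    gaussians = [img.astype(np.float32)]
    for _ in range(n - 1):
        gaussians.append(cv2.pyrDown(gaussians[-1]))
    laplacians = []
    for i in range(n - 1):
        up = cv2.pyrUp(gaussians[i + 1],
                       dstsize=(gaussians[i].shape[1], gaussians[i].shape[0]))
        laplacians.append(gaussians[i] - up)
    x_g = gaussians[-1]                      # lowest-frequency layer

    # Reconstruct the high-frequency layers only (lowest layer replaced by
    # zeros), yielding a single full-resolution image x_l.
    x_l = np.zeros_like(x_g)
    for lap in reversed(laplacians):
        x_l = cv2.pyrUp(x_l, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return x_l, x_g
```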

III-B Global Compression Network

The global compression network $f_{g}$ focuses on the compression of the global dynamic range of the WDR image, namely $x_{g}$. After the Laplacian pyramid decomposition, $x_{g}$ is a low resolution image and only contains the global luminance information of the original WDR image. Unlike many image transformation works [42, 29, 43] that employ an encoder-decoder architecture to avoid the loss of global features during convolution, our architecture is able to achieve the same effect with the help of the low resolution representation $x_{g}$. A small $k\times k$ kernel is able to cover $(2^{n-1}k)^{2}$ pixels of the original WDR image if the WDR image is decomposed into $n$ layers. Therefore, we adopt a simple CNN architecture to do the compression. The details of the proposed global compression network are summarized in Table I.

$W$ and $H$ are the width and height of the input image $x_{g}$, respectively. Given an input image $x$ and the ground truth WDR-LDR image $y$, we use the $\ell_1$-norm, the feature loss $\ell_{feat}$ and $\ell_2$ regularization as the loss function:

$\ell_{global}=\alpha\ell_{1}+\beta\ell_{feat}(x_{g},y_{g})+\gamma R(\theta_{g})$ (4)

where $\alpha$ = 0.5, $\beta$ = 0.5 and $\gamma$ = 0.2. The $\ell_1$-norm can be formulated as:

$\ell_{1}=\|f_{g}(x_{g};\theta_{g})-y_{g}\|_{1}$ (5)

where $y_{g}$ is the lower frequency part of the corresponding reformulated Laplacian pyramid of the ground truth $y$. As our dataset is not ample compared to all possible WDR image representations, we apply an $\ell_2$ regularization loss $R(\theta_{g})$ in all our neural networks to prevent over-fitting.
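Assuming the feature_loss sketch from above and a Keras model for $f_g$, the global loss of Eq. (4) could be assembled roughly as follows; applying the feature term to the network output rather than the raw input band is our interpretation of the notation in Eq. (4).

```python
import tensorflow as tf

ALPHA, BETA, GAMMA = 0.5, 0.5, 0.2   # weights from Eq. (4)

def global_loss(model, x_g, y_g):
    """Eq. (4): l1 term (Eq. (5)), feature loss (Eq. (3)), l2 weight regularization."""
    pred = model(x_g)
    l1 = tf.reduce_mean(tf.abs(pred - y_g))            # per-pixel l1 (mean-scaled)
    feat = feature_loss(pred, y_g)                     # see the earlier sketch
    reg = tf.add_n([tf.nn.l2_loss(w) for w in model.trainable_weights])
    return ALPHA * l1 + BETA * feat + GAMMA * reg
```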

Refer to caption
Figure 5: Illustration of the relationship and the differences between the reformulated Laplacian pyramid and the original Laplacian pyramid. Unlike the original pyramid-shaped image decomposition (top-left, $n=5$), the reformulated structure (top-right, $n=5$) always contains two layers. Layers at the same frequency are connected with red dots. The high frequency layer of the reformulated pyramid for each $n$ can be reconstructed by adding the corresponding high frequency layer and its upsampled previous high frequency layer in the original Laplacian pyramid.

III-C Local Manipulation Network

The purpose of the local manipulation network $f_{l}$ is to manipulate the high frequency part of the WDR image, namely $x_{l}$. Unlike $x_{g}$, the high frequency features contained in $x_{l}$ are mostly local. For simplicity, we adopt the same architecture as in Table I to do the local manipulation. Because $x_{l}$ has the same resolution as the WDR image, the kernels in Table I will only cover a local image patch instead of a global area. The same network can therefore serve two different goals when operating on $x_{g}$ and $x_{l}$, respectively. The learning objectives of the local manipulation network and the global compression network are the same. We use the same set of parameters and loss function:

$\ell_{local}=\alpha\ell_{1}+\beta\ell_{feat}(x_{l},y_{l})+\gamma R(\theta_{l})$ (6)

$y_{l}$ is the high frequency part of the corresponding reformulated Laplacian pyramid of the ground truth $y$.

TABLE I: Detail of the Global Compression Network and Local Manipulation Network.
Layers Input Size Kernel Size Stride Kernel Num.
Input $W\times H$ - - -
Conv_1 $W\times H$ $3\times 3$ 1 32
Batch_norm_1 $W\times H\times 32$ - - -
Conv_2 $W\times H\times 32$ $3\times 3$ 1 32
Batch_norm_2 $W\times H\times 32$ - - -
Conv_3 $W\times H\times 32$ $3\times 3$ 1 32
Batch_norm_3 $W\times H\times 32$ - - -
Conv_4 $W\times H\times 32$ $3\times 3$ 1 32
Batch_norm_4 $W\times H\times 32$ - - -
Conv1$\times$1 $W\times H\times 32$ - - 1
Output $W\times H$ Input + Conv1$\times$1
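A possible Keras rendering of Table I is sketched below; the activation function is not listed in the table, so ReLU is assumed here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_compression_net():
    """The architecture of Table I (used for both f_g and f_l), as a sketch.

    Four 3x3, stride-1, 32-filter conv layers each followed by batch norm,
    a 1x1 conv condensing 32 channels to 1, and a skip connection adding the
    input back to the 1x1-conv output. ReLU activations are an assumption.
    """
    inp = layers.Input(shape=(None, None, 1))
    x = inp
    for _ in range(4):
        x = layers.Conv2D(32, 3, strides=1, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    x = layers.Conv2D(1, 1, padding="same")(x)   # Conv1x1 row in Table I
    out = layers.Add()([inp, x])                 # Output = Input + Conv1x1
    return tf.keras.Model(inp, out)
```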

III-D Fine Tune Network

The global compression network and the local manipulation network generate the corresponding reformulated Laplacian layers $\hat{x}_{g}$ and $\hat{x}_{l}$, respectively. The Laplacian pyramid requires an additional operation to add all frequency layers, that is, $\hat{x}_{t}$ = upsampling($\hat{x}_{g}$) + $\hat{x}_{l}$. However, the merged image $\hat{x}_{t}$ cannot guarantee overall visual quality since $\hat{x}_{g}$ and $\hat{x}_{l}$ are produced with separate neural networks. Moreover, color shifts, regional blurring and other artifacts may also occur in $\hat{x}_{t}$.

To overcome these possible issues, we utilize a fine tune network $f_{t}$ to further refine the reconstructed image $\hat{x}_{t}$ toward the desired ground truth image. $f_{t}$ has a ResNet architecture with large feature maps and small depth, since the main features of the image have been learned by the previous two neural networks.

The ResNet contains 4 residual blocks. Each residual block consists of 2 convolutional layers. We use $3\times 3\times 32$ kernels for every layer with a stride of 1, and we use a batch normalization layer after each convolutional layer. At the end of this ResNet, a $1\times 1$ convolution layer is applied to condense all extracted features from the 32-channel receptive field down to 1 channel. The loss function of the fine tune network is slightly different from the previous networks. We adopted the feature loss $\ell_{feat}$ and the $\ell_2$-norm:

$\ell_{t}=\alpha_{t}\ell_{2}+\beta_{t}\ell_{feat}(\hat{x}_{t},y)$ (7)

where $\alpha_{t}$ = 0.6 and $\beta_{t}$ = 0.4.
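A sketch of $f_t$ following this description is given below; the initial convolution that lifts the single-channel input to 32 channels and the ReLU activations are assumptions, since the text does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_fine_tune_net(num_blocks=4):
    """Fine tune network f_t as described above (a sketch).

    Four residual blocks of two 3x3x32 conv layers (stride 1), batch norm
    after each conv, and a final 1x1 conv back to a single channel.
    """
    inp = layers.Input(shape=(None, None, 1))
    # Lift the single-channel input to 32 feature channels (an assumption).
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    for _ in range(num_blocks):
        shortcut = x
        y = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
        y = layers.BatchNormalization()(y)
        y = layers.Conv2D(32, 3, padding="same")(y)
        y = layers.BatchNormalization()(y)
        x = layers.Add()([shortcut, y])
    out = layers.Conv2D(1, 1, padding="same")(x)   # condense 32 channels to 1
    return tf.keras.Model(inp, out)
```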

IV Experiments

In this section, we first present the experimental setup and then analyze the effects of the proposed Laplacian pyramid reformulation. We then compare the proposed model with state-of-the-art methods on two databases.

IV-A Training Data Generation

We trained the proposed network for WDR tone mapping on the Laval indoor dataset [44]. This dataset contains 2,233 high-resolution ($7768\times 3884$), high dynamic range indoor panorama WDR images captured with a Canon 5D Mark III camera. In the Laval indoor dataset, some images contain watermarks of different scales in the bottom region. We discard the bottom 15% of the panoramas to remove the watermarks from the original images. After this cropping, the image resolution becomes $7768\times 3301$, and the total number of images used in the experiment is 2,125. These images are further down-sampled to one-quarter of their original resolution and converted to luminance images. The luminance image is generated and recovered using the methods described in [8]. We generate 20 sub-images from each training sample. The sizes of the sub-images are drawn uniformly from the range [20%, 60%] of the size of an input WDR image and re-sampled to $512\times 512$ pixels. The ground truth WDR-LDR images are generated using various tools including Luminance HDR (https://github.com/LuminanceHDR/LuminanceHDR), the HDR toolbox provided by [45], and Photoshop with human tuning and supervision.
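The sub-image sampling step can be illustrated as follows; the crop sizes and the 512x512 output follow the description above, while the random number generator and interpolation method are implementation choices.

```python
import numpy as np
import cv2

def sample_patches(wdr_lum, num_patches=20, out_size=512, rng=None):
    """Draw training sub-images as described above (a sketch).

    Crop sizes are drawn uniformly from 20%-60% of the image size and the
    crops are resampled to out_size x out_size.
    """
    rng = rng or np.random.default_rng()
    h, w = wdr_lum.shape[:2]
    patches = []
    for _ in range(num_patches):
        scale = rng.uniform(0.2, 0.6)
        ph, pw = int(h * scale), int(w * scale)
        top = rng.integers(0, h - ph + 1)
        left = rng.integers(0, w - pw + 1)
        crop = wdr_lum[top:top + ph, left:left + pw]
        patches.append(cv2.resize(crop, (out_size, out_size),
                                  interpolation=cv2.INTER_AREA))
    return np.stack(patches)
```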

Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed
Figure 6: Visual comparison on the test set. The proposed method can effectively compress the global dynamic range while preserving local detail and contrast.

IV-B Ground Truth Generation

Similar to Cai's [49] reference image generation method, we generated high-quality ground truth images using several TMOs and human tuning. We used 6 TMOs in this process, including Fattal [4], Ferradans [46], Mantiuk [35], Drago [50], Durand [5], and Reinhard [51] from Luminance HDR and the HDR toolbox. We then employed 4 volunteers and 2 photographers in this process. The two photographers first picked out the images they thought were unsatisfactory (e.g., too dark, too bright, or showing distortion), and used Photoshop to fix them according to their own preferences. The volunteers performed random pairwise comparisons independently on the 7 sets of tone mapped images, given the instructions:

  • Select one image of two that best suits your visual preferences.

  • Spend no more than 5 sec for each pair.

Images that received tied votes or that could not be selected within 5 seconds were circulated back to the photographers for modification, and then sent to the volunteers in the next round, until all images had been selected.

IV-C Implementation Details

We randomly selected 70% of the images for training our model and use the remaining 10% for validation and 20% for testing. The network parameters are initialized using the truncated normal initializer. All training experiments are performed using the TensorFlow (https://www.tensorflow.org/?hl=zh-cn) deep learning library. We adopt the ADAM optimizer for loss minimization, with learning rate $10^{-3}$, momentum $\beta_{1}$ = 0.9 and $\beta_{2}$ = 0.999, and $\epsilon=10^{-8}$. We use mini-batch gradient descent with batch size 8 for local manipulation, 64 for global compression, and 4 for fine tuning. The foregoing networks are trained in multiple steps. The networks $f_{l}$ and $f_{g}$ are trained first. Then, we use the loss function of $f_{t}$ to jointly train the entire system containing $f_{l}$, $f_{g}$ and $f_{t}$. The proposed model is trained in an end-to-end fashion.
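A sketch of the optimizer configuration and a single training step in TensorFlow 2 is shown below; the staged schedule (train $f_l$ and $f_g$ first, then the full system with the $f_t$ loss) is indicated only in the comment rather than implemented.

```python
import tensorflow as tf

# Optimizer settings as reported above; hyper-parameters copied from the text.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

@tf.function
def train_step(model, x_batch, y_batch, loss_fn):
    """One mini-batch gradient step (a generic sketch; f_l and f_g are trained
    first with their own losses, then the whole system with the f_t loss)."""
    with tf.GradientTape() as tape:
        loss = loss_fn(model, x_batch, y_batch)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```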

IV-D Parameter Setting

The Laplacian pyramid reformulation described in Section III-A has a hidden parameter $n$, which indicates the number of layers in the original Laplacian decomposition. A different $n$ value will certainly affect the training and lead to different results. In order to evaluate the effect of this parameter on the final trained model, we trained our model with $n=2,3,\ldots,7$ and evaluated the average PSNR, SSIM [41] and FSITM [52] on the test data set. The results are summarized in Table II. It is not surprising that the median $n$ values achieve higher average metrics. A smaller $n$ value assigns most of the information to the $x_{g}$ image, while a larger $n$ value moves more frequency bands to $x_{l}$. Suppose $n$ is so large that $x_{g}$ has only one pixel; then the final image will be determined solely by $f_{l}$. On the other hand, if $n$ is too small, then $x_{l}$ will contain limited information, which deteriorates the desired functionality of $f_{l}$. The model with $n=6$ gives the highest metric values. In the rest of this paper, we set $n=6$ for all remaining experiments.

Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed
Refer to caption
(i) Reference
Refer to caption
(j) Mantiuk [35]
Refer to caption
(k) Paris [7]
Refer to caption
(l) Ferradans [46]
Refer to caption
(m) Mai [47]
Refer to caption
(n) Gu [8]
Refer to caption
(o) Photomatix [48]
Refer to caption
(p) Proposed
Figure 7: Visual comparison on the test set. The proposed method is able to enhance and recover local details that cannot be seen with other algorithms.
TABLE II: Average PSNR, SSIM and FSITM values computed for models with different $n$ values. Median $n$ values are able to achieve relatively higher indices when compared with the two end values.
Parameter PSNR (dB) SSIM FSITM
$n=2$ 17.4065 0.8586 0.8983
$n=3$ 17.2129 0.8475 0.9209
$n=4$ 19.2187 0.8417 0.9212
$n=5$ 19.6627 0.8782 0.9321
$n=6$ 20.0335 0.8948 0.9378
$n=7$ 16.7248 0.7996 0.9195
TABLE III: Comparison of the average running time on an image of size $3884\times 1650$. [7] requires more than one hour of processing time.
Methods Time (s)
[8] 2.4
[35] 1.7
[46] 8.2
[47] 0.7
[48] 3.7
Ours 0.61 (with GPU)

IV-E Running Time

We report the processing time of each algorithm in Table III. We evaluated all methods on a PC with an Intel(R) Core(TM) i5-8600 CPU at 3.10 GHz and 16 GB of memory. We used one HDR image of size $3884\times 1650$ as input. Note that learning-based TM solutions are by convention designed for a GPU environment, since they often run significantly slower than other TM approaches in a CPU environment. Our model runs in 24.95 seconds with the CPU and 0.61 seconds with a GPU (Nvidia Titan Xp).

Refer to caption
(a) Gamma correction
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed
Refer to caption
(i) Gamma correction
Refer to caption
(j) Mantiuk [35]
Refer to caption
(k) Paris [7]
Refer to caption
(l) Ferradans [46]
Refer to caption
(m) Mai [47]
Refer to caption
(n) Gu [8]
Refer to caption
(o) Photomatix [48]
Refer to caption
(p) Proposed
Figure 8: Visual comparison on the Fairchild database. The proposed model renders more detail in saturated regions with fewer artifacts when compared with other state-of-the-art approaches.

IV-F Comparison With State-of-the-art Methods

We compare the proposed model with 6 other state-of-the-art image tone mapping methods, namely Gu TMO [8], Mantiuk TMO [35], Paris TMO [7], Ferradans TMO [46], Mai TMO [47], and Photomatix TMO [48]. Among them, Gu TMO [8] is based on edge-preserving filter theory; images tone mapped with this TMO usually exhibit more detail. Paris TMO [7] is based on the local Laplacian operator and is good at preserving details from the WDR image. Mantiuk TMO [35] regards tone mapping as an optimization problem; though it has some difficulty in preserving detail, it can give very natural-looking image results. Ferradans TMO [46] and Mai TMO [47] are popular tone mapping methods and are used in open source applications. Photomatix is a commercial software package dedicated to WDR image tone mapping.

The following results of the mentioned algorithms are obtained from their online websites or open source projects with default parameter settings. We evaluate the model on the test dataset first and then on a totally different database.

IV-F1 Objective Quality Assessment

We first compared our algorithm with these methods on images from the test set of the Laval database. The results on one image are shown in Figure 6. All methods are able to produce acceptable images. However, they also have different problems.

For example, the images obtained with Mantiuk TMO, Paris TMO, Ferradans TMO, and Photomatix TMO are generally darker than the other images, which puts them at a disadvantage for screening and human observation. Mai TMO generates the brightest image, but it also saturates the area within the red rectangle. Both Gu TMO and the proposed method are able to generate images that are similar to the reference image. However, the image from the proposed TMO looks more natural than the image of Gu TMO because of its global luminance distribution. Moreover, it can display more local detail on the floor when compared with Gu TMO.

In fact, the proposed model is able to extract local detail even under some extreme conditions. Two examples are given in Figure 7. The two images show a commonly seen WDR scenario where there is an extreme luminance difference between the areas inside and outside a window. The proposed model can still display the scenes outside the window more clearly than the other results, including the reference images.

TABLE IV: PSNR, SSIM, FSITM and HDR-VDP2 indices for different tone mapping methods. The values are obtained using the test data set.
Methods PSNR SSIM FSITM HDR-VDP2
Gu [8] 16.5024 0.7755 0.830 35.165
Mantiuk [35] 14.8641 0.7563 0.876 38.864
Paris [7] 18.3214 0.8443 0.853 39.598
Ferradans [46] 16.7242 0.8756 0.863 39.219
Mai [47] 19.4638 0.8842 0.858 39.346
Photomatix [48] 17.0730 0.8565 0.856 42.194
Ours 20.0335 0.8948 0.864 42.215
TABLE V: Average TMQI, BTMQI, FSITM and HDR-VDP2 indices for different tone mapping methods. The values are obtained using the Fairchild database.
Methods TMQI BTMQI FSITM HDR-VDP2
Gu [8] 0.8300 3.6683 0.823 26.161
Mantiuk [35] 0.9194 3.7474 0.872 27.686
Paris [7] 0.9228 2.8849 0.857 26.747
Ferradans [46] 0.8418 4.4286 0.857 27.35
Mai [47] 0.8894 3.8959 0.859 27.464
Photomatix [48] 0.8978 3.3978 0.869 27.226
Ours 0.9257 4.5110 0.868 29.323

PSNR, SSIM, FSITM and HDR-VDP2 [53] are employed to assess these algorithms quantitatively. FSITM is designed to evaluate the feature similarity index for tone-mapped images. We measured the quality of the images using HDR-VDP2, a visual metric that mimics the anatomy of the HVS to evaluate the quality of HDR images. The average indices obtained from the test set of the Laval database are summarized in Table IV. The proposed algorithm achieves the highest scores for PSNR, SSIM and HDR-VDP2, and the second highest score for FSITM.

To further demonstrate the robustness of our method, we evaluate it on images outside the test set. We choose the Fairchild database [54], which contains 105 WDR images covering various situations. It is a commonly used benchmark for measuring tone mapping methods. Two result images are shown in Figure 8. The first image has a very wide dynamic range in front of and behind the lamp. Mantiuk TMO, Ferradans TMO, Mai TMO, and the proposed model are able to generate acceptable images, while the other algorithms show various color artifacts in the top-left dim region. Among the four images without artifacts, the proposed model is able to show the color boards clearly under both dark and bright lighting conditions. In the second image, only Gu TMO and the proposed model are able to clearly show the shape of the sun. Images obtained with the other methods cannot show this detail. Since there are no reference images for the Fairchild database, we use blind quality indexes along with TMQI [55] to quantitatively measure the performance of the different methods. Blind quality assessment of tone mapped images (BTMQI) [56] is a blind metric that does not require a reference image to compute a score. The computed indices are summarized in Table V. The proposed model achieves the highest TMQI and BTMQI scores.

TABLE VI: Summary of the subjective experimental results. $Sum$ indicates the total number of times each TMO was selected in the Comparative Selection Section. $Ave_{selection}$ represents the average number of times each TMO was selected in the Comparative Selection Section. $Ave_{rating}$ represents the average rating score of each TMO in the Image Quality Rating Section. The computational and statistical details are elaborated in the Supplementary material.
Methods $Sum$ $Ave_{selection}$ $Ave_{rating}$
Ferradans [46] 69 8.625 6.184
Gu [8] 111 13.875 6.403
Mai [47] 65 8.125 6.571
Mantiuk [35] 54 6.75 6.079
Paris [7] 52 6.5 5.586
Photomatix [48] 39 4.875 5.642
Ours 112 14 6.580

IV-F2 Subjective Preference Assessment

We used human-preferred tuning to generate our ground truth images. In this process, no metrics were employed, because no gold-standard metric exists to evaluate the quality of a tone mapped image, nor do existing metrics truly reflect observers' preferences. These metrics can only be used as limited references. To assess the actual visual experience of our tone mapped LDR images, we also carried out a subjective preference experiment, beyond the set of objective index measurements, to assess the visual quality of the images generated by our deep learning solution.

The subjective experiment has two sections: a Comparative Selection Section and an Image Quality Rating Section. Each section contains 8 groups of tone mapped LDR images. In the Comparative Selection Section, each group contains the LDR images generated by all 7 algorithms from the same scene for visual comparison. Participants were asked to select the one image matching their visual preference. In the Image Quality Rating Section, participants were asked to rate each LDR image from 1 to 10 based on the degree of visual comfort and the degree of detail revealed in the image. Details refer to the brightness, contrast, and the extent to which overexposed and underexposed details are revealed. 1 represents "dislike" or "fuzzy details", 10 stands for "most favorite" or "clear and rich details", and 5 is in the middle. All images were randomly selected from the Laval HDR dataset.

We used the SurveyHero website (https://www.surveyhero.com) to build our subjective experiment. We sent the survey invitation randomly via email, social networking websites, and a messaging application (see the appendix for the experiment details). The survey results are shown in Table VI. In terms of the visual experience tested in the Comparative Selection Section, the voting results of our algorithm and Gu's algorithm are significantly better than those of the other approaches. The good visual experience comes from the low frequency layer, from which our Global Compression Network can effectively learn the global luminance terrain of the human-tuned ground truth image. In the Image Quality Rating Section, the Local Manipulation Network in our model can extract and enhance the local high frequency details, thereby avoiding the lack of details in the overexposed and underexposed areas of the image after tone mapping. Our result achieves the highest voting score in both sections.

V Conclusion

In this work, we have proposed a new tone mapping method that can perform high-resolution WDR image tone mapping. To preserve the global low frequency features as well as maintain local high frequency detail, we have proposed a novel reformulated Laplacian method to decompose a WDR image into a low-resolution image which contains the low frequency component of the WDR image and a high-resolution image which contains the remaining higher frequencies. The two images are processed by a dedicated global compression network and a local manipulation neural network, respectively. The global compression network learns how to compress the global-scale gradient of a WDR image, and the local manipulation network learns to manipulate local features. The generated images from the two networks are further merged together to produce the final output image for screen display. We visually and quantitatively compared our model with other state-of-the-art tone mapping methods on images from and outside the targeted database. The results show that the proposed method outperforms other methods, and sometimes even shows better results than the ground truth.

References

  • [1] O. Yadid-Pecht and E. R. Fossum, “Wide intrascene dynamic range cmos aps using dual sampling,” IEEE Transactions on Electron Devices, vol. 44, no. 10, pp. 1721–1723, 1997.
  • [2] A. Spivak, A. Belenky, A. Fish, and O. Yadid-Pecht, “Wide-dynamic-range cmos image sensors—comparative performance analysis,” IEEE transactions on electron devices, vol. 56, no. 11, pp. 2446–2461, 2009.
  • [3] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photographs,” in Proceedings of the 24th annual conference on Computer graphics and interactive techniques.   ACM Press/Addison-Wesley Publishing Co., 1997, pp. 369–378.
  • [4] R. Fattal, D. Lischinski, and M. Werman, “Gradient domain high dynamic range compression,” in ACM transactions on graphics (TOG), vol. 21, no. 3.   ACM, 2002, pp. 249–256.
  • [5] F. Durand and J. Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images,” in ACM transactions on graphics (TOG), vol. 21, no. 3.   ACM, 2002, pp. 257–266.
  • [6] E. Reinhard and K. Devlin, “Dynamic range reduction inspired by photoreceptor physiology,” IEEE Transactions on Visualization & Computer Graphics, no. 1, pp. 13–24, 2005.
  • [7] S. Paris, S. W. Hasinoff, and J. Kautz, “Local laplacian filters: edge-aware image processing with a laplacian pyramid,” Communications of the ACM, vol. 58, no. 3, pp. 81–91, 2015.
  • [8] B. Gu, W. Li, M. Zhu, and M. Wang, “Local edge-preserving multiscale decomposition for high dynamic range image tone mapping,” IEEE Transactions on image Processing, vol. 22, no. 1, pp. 70–79, 2013.
  • [9] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
  • [10] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2016.
  • [11] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention.   Springer, 2015, pp. 234–241.
  • [12] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in European Conference on Computer Vision.   Springer, 2016, pp. 649–666.
  • [13] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision.   Springer, 2016, pp. 694–711.
  • [14] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2414–2423.
  • [15] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
  • [16] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
  • [17] Y.-H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, X. Lu, and M.-H. Yang, “Deep image harmonization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2017.
  • [18] Y. S. Heo, K. M. Lee, S. U. Lee, Y. Moon, and J. Cha, “Ghost-free high dynamic range imaging,” in Asian Conference on Computer Vision.   Springer, 2010, pp. 486–500.
  • [19] C. Lee, Y. Li, and V. Monga, “Ghost-free high dynamic range imaging via rank minimization,” IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1045–1049, 2014.
  • [20] P. Sen, N. K. Kalantari, M. Yaesoubi, S. Darabi, D. B. Goldman, and E. Shechtman, “Robust patch-based hdr reconstruction of dynamic scenes.” ACM Trans. Graph., vol. 31, no. 6, pp. 203–1, 2012.
  • [21] S. Wu, J. Xu, Y.-W. Tai, and C.-K. Tang, “Deep high dynamic range imaging with large foreground motions,” in European Conference on Computer Vision.   Springer, 2018, pp. 120–135.
  • [22] F. Banterle, P. Ledda, K. Debattista, and A. Chalmers, “Inverse tone mapping,” in Proceedings of the 4th international conference on Computer graphics and interactive techniques in Australasia and Southeast Asia.   ACM, 2006, pp. 349–356.
  • [23] A. G. Rempel, M. Trentacoste, H. Seetzen, H. D. Young, W. Heidrich, L. Whitehead, and G. Ward, “Ldr2hdr: on-the-fly reverse tone mapping of legacy video and photographs,” in ACM transactions on graphics (TOG), vol. 26, no. 3.   ACM, 2007, p. 39.
  • [24] R. P. Kovaleski and M. M. Oliveira, “High-quality brightness enhancement functions for real-time reverse tone mapping,” The Visual Computer, vol. 25, no. 5-7, pp. 539–547, 2009.
  • [25] T.-H. Wang, C.-W. Chiu, W.-C. Wu, J.-W. Wang, C.-Y. Lin, C.-T. Chiu, and J.-J. Liou, “Pseudo-multiple-exposure-based tone fusion with local region adjustment,” IEEE Transactions on Multimedia, vol. 17, no. 4, pp. 470–484, 2015.
  • [26] Y. Endo, Y. Kanamori, and J. Mitani, “Deep reverse tone mapping.” ACM Trans. Graph., vol. 36, no. 6, pp. 177–1, 2017.
  • [27] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, “Hdr image reconstruction from a single exposure using deep cnns,” ACM Transactions on Graphics (TOG), vol. 36, no. 6, p. 178, 2017.
  • [28] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, “Expandnet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content,” in Computer Graphics Forum, vol. 37, no. 2.   Wiley Online Library, 2018, pp. 37–49.
  • [29] S. Lee, G. H. An, and S.-J. Kang, “Deep recursive hdri: Inverse tone mapping using generative adversarial networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 596–611.
  • [30] J. Tumblin and H. Rushmeier, “Tone reproduction for realistic images,” IEEE Computer graphics and Applications, vol. 13, no. 6, pp. 42–48, 1993.
  • [31] G. Ward, “A contrast-based scalefactor for luminance display,” Graphics gems IV, pp. 415–421, 1994.
  • [32] I. R. Khan, S. Rahardja, M. M. Khan, M. M. Movania, and F. Abed, “A tone-mapping technique based on histogram using a sensitivity model of the human visual system,” IEEE Transactions on Industrial Electronics, vol. 65, no. 4, pp. 3469–3479, 2018.
  • [33] J. H. Van Hateren, “Encoding of high dynamic range video with a model of human cones,” ACM Transactions on Graphics (TOG), vol. 25, no. 4, pp. 1380–1399, 2006.
  • [34] H. Spitzer, Y. Karasik, and S. Einav, “Biological gain control for high dynamic range compression,” in Color and Imaging Conference, vol. 2003, no. 1.   Society for Imaging Science and Technology, 2003, pp. 42–50.
  • [35] R. Mantiuk, S. Daly, and L. Kerofsky, “Display adaptive tone mapping,” in ACM Transactions on Graphics (TOG), vol. 27, no. 3.   ACM, 2008, p. 68.
  • [36] K. Ma, H. Yeganeh, K. Zeng, and Z. Wang, “High dynamic range image tone mapping by optimizing tone mapped image quality index,” in Multimedia and Expo (ICME), 2014 IEEE International Conference on.   IEEE, 2014, pp. 1–6.
  • [37] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” in ACM Transactions on Graphics (TOG), vol. 27, no. 3.   ACM, 2008, p. 67.
  • [38] G. Eilertsen, R. K. Mantiuk, and J. Unger, “A comparative review of tone-mapping algorithms for high dynamic range video,” in Computer Graphics Forum, vol. 36, no. 2.   Wiley Online Library, 2017, pp. 565–592.
  • [39] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Transactions on Graphics (TOG), vol. 36, no. 4, p. 118, 2017.
  • [40] A. Loza, L. Mihaylova, N. Canagarajah, and D. Bull, “Structural similarity-based object tracking in video sequences,” in 2006 9th International Conference on Information Fusion.   IEEE, 2006, pp. 1–6.
  • [41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans Image Process, vol. 13, no. 4, pp. 600–612, 2004.
  • [42] X. Yang, K. Xu, Y. Song, Q. Zhang, X. Wei, and R. W. Lau, “Image correction via deep reciprocating hdr transformation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1798–1807.
  • [43] J. Cai, S. Gu, and L. Zhang, “Learning a deep single image contrast enhancer from multi-exposure images,” IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018.
  • [44] M.-A. Gardner, K. Sunkavalli, E. Yumer, X. Shen, E. Gambaretto, C. Gagné, and J.-F. Lalonde, “Learning to predict indoor illumination from a single image,” arXiv preprint arXiv:1704.00090, 2017.
  • [45] F. Banterle, A. Artusi, K. Debattista, and A. Chalmers, Advanced High Dynamic Range Imaging (2nd Edition).   Natick, MA, USA: AK Peters (CRC Press), July 2017.
  • [46] S. Ferradans, M. Bertalmio, E. Provenzi, and V. Caselles, “An analysis of visual adaptation and contrast perception for tone mapping,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 10, pp. 2002–2012, 2011.
  • [47] Z. Mai, H. Mansour, R. Mantiuk, P. Nasiopoulos, R. Ward, and W. Heidrich, “Optimizing a tone curve for backward-compatible high dynamic range image and video compression,” IEEE transactions on image processing, vol. 20, no. 6, pp. 1558–1571, 2011.
  • [48] Photomatix, https://www.hdrsoft.com/, [Online; accessed 19-July-2018].
  • [49] J. Cai, S. Gu, and L. Zhang, “Learning a deep single image contrast enhancer from multi-exposure images,” IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018.
  • [50] F. Drago, K. Myszkowski, T. Annen, and N. Chiba, “Adaptive logarithmic mapping for displaying high contrast scenes,” in Computer Graphics Forum, vol. 22, no. 3.   Wiley Online Library, 2003, pp. 419–426.
  • [51] E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward, and K. Myszkowski, High dynamic range imaging: acquisition, display, and image-based lighting.   Morgan Kaufmann, 2010.
  • [52] H. Z. Nafchi, A. Shahkolaei, R. F. Moghaddam, and M. Cheriet, “Fsitm: A feature similarity index for tone-mapped images,” IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1026–1029, 2014.
  • [53] R. Mantiuk, K. J. Kim, A. G. Rempel, and W. Heidrich, “Hdr-vdp-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions,” ACM Transactions on graphics (TOG), vol. 30, no. 4, pp. 1–14, 2011.
  • [54] M. D. Fairchild, “The hdr photographic survey,” in Color and Imaging Conference, vol. 2007, no. 1.   Society for Imaging Science and Technology, 2007, pp. 233–238.
  • [55] Y. Hojatollah and W. Zhou, “Objective quality assessment of tone-mapped images,” IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, vol. 22, no. 2, pp. 657–667, 2013.
  • [56] G. Ke, S. Wang, G. Zhai, S. Ma, X. Yang, W. Lin, W. Zhang, and G. Wen, “Blind quality assessment of tone-mapped images via analysis of information, naturalness, and structure,” IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 432–443, 2016.

-A Comparative Selection

This section contains 8 questions. Each question has a group of 7 tone mapped LDR images in random order. Fig. 9 shows an example of a question. Participants can click on each image to view it at full size. During the survey, participants need to choose the image with the best visual preference.

Refer to caption
Figure 9: A screenshot of a survey question in the Comparative Selection section.

-B Image Quality Rating

This section contains 6 questions, 7 tone mapped LDR images in each question. Fig. 10 shows an example of a question. Participants are asked to rate each LDR image from 1 to 10 based on image brightness, image contrast, and the extent to which overexposure and underexposure details are revealed. 1 represents ”dislike” or ”fuzzy details” and 10 stands for ”most favorite” or ”clear and rich details”.

Refer to caption
Figure 10: A screenshot of a survey question in the Image Quality Rating section.

-C Assessment Process and Results

We sent out our survey (https://surveyhero.com/c/7040d7c6) via WeChat (a Chinese social media and multipurpose application) and email. Each participant was only allowed to complete the survey once. Participants could also choose to abandon the test at any time. Up to the submission of the paper, a total of 71 people participated in the survey. Table VII and Table VIII summarize the results of the Comparative Selection and Image Quality Rating sections, respectively.

We perform a simple averaging to obtain $Ave_{selection}$ in the Comparative Selection section using the equation

$Ave_{selection}=\frac{\sum_{i=1}^{N_{s}}s_{i}}{N_{s}}$ (8)

where $s_{i}$ denotes the number of times the LDR image from each algorithm was selected in question $i$, and $N_{s}$ represents the total number of questions in the Comparative Selection section.

To acquire $Ave_{rating}$ in the Image Quality Rating section, we first compute the weighted average score $w$ of each TMO using the equation

$w=\frac{\sum_{i=1}^{10}r_{i}n_{i}}{10}$ (9)

where $r_{i}$ denotes each rating score (from 1 to 10) and $n_{i}$ is the number of participants who gave the rating $r_{i}$ to the LDR image.

Then we use $w$ to calculate $Ave_{rating}$:

$Ave_{rating}=\frac{\sum_{i=1}^{N_{r}}w_{i}}{N_{r}}$ (10)

where $N_{r}$ represents the total number of questions in the Image Quality Rating section.
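For reference, Eqs. (9)-(10) amount to the small computation below, where the per-question vote counts passed in are hypothetical inputs.

```python
def average_rating(vote_counts):
    """Eqs. (9)-(10): weighted score per question, averaged over questions.

    vote_counts: list over questions; each entry maps a rating r (1-10) to the
    number of participants n_r who gave that rating.
    """
    weighted = [sum(r * n for r, n in counts.items()) / 10.0
                for counts in vote_counts]           # Eq. (9)
    return sum(weighted) / len(weighted)             # Eq. (10)
```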

TABLE VII: Summary of the results in the Comparative Selection section. The numbers from 1 to 8 in the first row indicate each survey question. $Sum$ denotes the total number of times each TMO was selected. $Ave_{selection}$ denotes the average number of times each TMO was selected.
Methods 1 2 3 4 5 6 7 8 $Sum$ $Ave_{selection}$
Ferradans [46] 8 4 7 10 8 14 9 9 69 8.625
Gu [8] 16 25 9 13 18 8 11 11 111 13.875
Mai [47] 15 7 8 8 5 7 9 6 65 8.125
Mantiuk [35] 3 6 10 11 7 9 4 4 54 6.75
Paris [7] 8 4 17 9 5 3 2 4 52 6.5
Photomatix [48] 2 2 10 3 4 4 8 6 39 4.875
Ours 19 23 7 13 11 11 13 15 112 14
TABLE VIII: Summary of the results in the Image Quality Rating section. The numbers from 1 to 6 in the first row indicate each survey question. $Ave_{rating}$ represents the average rating score of each TMO.
Methods 1 2 3 4 5 6 $Ave_{rating}$
Ferradans [46] 6.45 5.59 5.47 6.50 5.87 7.23 6.18
Gu [8] 6.00 6.39 6.06 7.00 6.10 6.87 6.40
Mai [47] 6.64 6.29 6.27 6.73 6.19 7.31 6.57
Mantiuk [35] 5.88 6.11 5.73 6.72 5.52 6.52 6.08
Paris [7] 5.47 5.93 5.76 5.29 5.16 5.90 5.59
Photomatix [48] 6.00 5.52 4.55 6.21 6.10 5.48 5.64
Ours 6.56 6.25 6.40 6.43 6.84 7.00 6.58

-D Color Recovery

Our approach operates on luminance WDR images. We employ an additional color recovery step to assign color to the pixels of the compressed dynamic range images, using the method described in [4]:

$\hat{x}_{c}=\left(\frac{\hat{x}_{l}}{h}\right)^{s}l$ (11)

where $\hat{x}_{c}$ is a color channel of the final WDR-LDR output after the fine tune network, and $h$ and $l$ denote the luminance of the WDR input and the luminance after dynamic range compression, respectively. We set the color saturation controller $s=0.6$, which [4] found to produce satisfactory results.
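In the standard color-reattachment form of [4], which the equation above follows, each color channel of the WDR input is divided by the WDR luminance, raised to the power $s$, and multiplied by the tone mapped luminance. A NumPy sketch under that interpretation (variable names are illustrative):

```python
import numpy as np

def recover_color(wdr_rgb, wdr_lum, ldr_lum, s=0.6):
    """Eq. (11)-style color recovery (a sketch).

    wdr_rgb: WDR color image (H, W, 3); wdr_lum: its luminance h (H, W);
    ldr_lum: tone mapped luminance l (H, W); s: color saturation controller.
    """
    eps = 1e-6  # small constant added here to avoid division by zero
    ratio = wdr_rgb / (wdr_lum[..., None] + eps)
    return np.power(ratio, s) * ldr_lum[..., None]
```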

-E Additional Qualitative Comparisons

Figs. 11 - 16 show additional results of the qualitative comparison with the state-of-the-art algorithms [35, 7, 46, 47, 8, 48]. The images were randomly chosen from the test set of the Laval dataset [44]. The dataset contains panoramic indoor WDR images of various scenes with aspect ratios of around 1.0 to 2.0. To better demonstrate our model's performance in preserving local details, we crop the full size images to half size and keep the overexposed regions. Our model is able to recover more details in these regions when compared with the other state-of-the-art methods. Figs. 17 - 20 show full size outputs of the different methods. Our method also yields visually comparable or better WDR-LDR images when compared with the other methods. Figs. 21 - 24 show a visual comparison for different choices of the number of frequency bands $n$ on the Fairchild dataset [54]. Although our neural network was trained with an indoor dataset whose dynamic range is inevitably much smaller than that of outdoor scenes, our method is still able to yield pleasing WDR-LDR images.

These extensive outputs show that the proposed method not only compresses the global dynamic range as effectively as the other compared methods, but also better preserves and enhances the details of saturated regions.

Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed
Figure 11: Qualitative comparison on Laval data test set. The proposed method is able to recover local details in the saturated region.
Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed TMO
Figure 12: Qualitative comparison on Laval data test set. The proposed method is able to recover local details in the saturated region.
Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed TMO
Figure 13: Qualitative comparison on Laval data test set. The proposed method is able to recover local details in the saturated region.
Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed TMO
Figure 14: Qualitative comparison on Laval data test set. The proposed method is able to recover local details in the saturated region.
Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed TMO
Figure 15: Qualitative comparison on Laval data test set. The proposed method is able to enhance local details in the saturated region.
Refer to caption
(a) Reference
Refer to caption
(b) Mantiuk [35]
Refer to caption
(c) Paris [7]
Refer to caption
(d) Ferradans [46]
Refer to caption
(e) Mai [47]
Refer to caption
(f) Gu [8]
Refer to caption
(g) Photomatix [48]
Refer to caption
(h) Proposed TMO
Figure 16: Qualitative comparison on Laval data test set. The proposed method is able to enhance local details in the saturated region.
Refer to caption
Refer to caption
Figure 17: top: Mai TMO [47], bottom: Ferradans TMO [46]
Refer to caption
Refer to caption
Figure 18: top: Photomatix TMO [48], bottom: Gu TMO [8]
Refer to caption
Refer to caption
Figure 19: top: Mantiuk TMO [35], bottom: Paris TMO [7]
Refer to caption
Refer to caption
Figure 20: top: Reference, bottom: Proposed TMO
Refer to caption
Figure 21: Visual comparison of the resulting images in different frequency bands. (a), (b), (c), (d), (e) and (f) are the images with the number of frequency bands $n=2$, $3$, $4$, $5$, $6$ and $7$, respectively.
Refer to caption
Figure 22: Visual comparison of the resulting images in different frequency bands. (a), (b), (c), (d), (e) and (f) are the images with the number of frequency bands $n=2$, $3$, $4$, $5$, $6$ and $7$, respectively.
Refer to caption
Figure 23: Visual comparison of the resulting images in different frequency bands. (a), (b), (c), (d), (e) and (f) are the images with the number of frequency bands $n=2$, $3$, $4$, $5$, $6$ and $7$, respectively.
Refer to caption
Figure 24: Visual comparison of the resulting images in different frequency bands. (a), (b), (c), (d), (e) and (f) are the images with the number of frequency bands $n=2$, $3$, $4$, $5$, $6$ and $7$, respectively.