
Ziteng Cui (MIL Lab, The University of Tokyo)
Lin Gu (RIKEN AIP; The University of Tokyo)
Tatsuya Harada (The University of Tokyo; RIKEN AIP)

Discovering an Image-Adaptive Coordinate System for Photography Processing

Abstract

Curve & Lookup Table (LUT) based methods directly map a pixel to the target output, making them highly efficient tools for real-time photography processing. However, due to the extreme memory complexity of learning a full RGB-space mapping, existing methods either sample a discretized 3D lattice to build a 3D LUT or decompose the mapping into three separate curves (1D LUTs) on the R, G, B channels. Here, we propose a novel algorithm, IAC, which learns an image-adaptive Cartesian coordinate system in the RGB color space before performing curve operations. This end-to-end trainable approach enables us to efficiently adjust images with a jointly learned image-adaptive coordinate system and curves. Experimental results demonstrate that this simple strategy achieves state-of-the-art (SOTA) performance on various photography processing tasks, including photo retouching, exposure correction, and white-balance editing, while maintaining a lightweight design and fast inference speed.

1 Introduction

Figure 1: Comparison of our IAC with previous image-adaptive curve & LUT methods.

The Curve & Lookup Table (LUT) serves as an array that replaces runtime computation with a simpler indexing operation. Instead of recalculating results for each operation, precomputed values stored in the table map input values directly to the corresponding outputs. In recent years, deep-network-based image-adaptive curves (also referred to as 1D LUTs) [Moran et al.(2021)Moran, McDonagh, and Slabaugh, Kim et al.(2020)Kim, Koh, and Kim, Song et al.(2021)Song, Qian, and Du, Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong, Jiang et al.(2023)Jiang, Wang, Li, Li, Fan, and Liu, Vinker et al.(2021)Vinker, Huberman-Spiegelglas, and Fattal] and image-adaptive 3D LUTs [Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang, Yang et al.(2022)Yang, Jin, Jia, Xu, and Chen, Zhang et al.(2023)Zhang, Zhang, Zhang, and Wang, Wang et al.(2021)Wang, Li, Peng, Ma, Wang, Song, and Yan, Liu et al.(2023)Liu, Yang, Fu, and Qian] have played a crucial role in image processing. Compared to methods that use networks for end-to-end mapping, curve & LUT based methods are highly efficient and can process images of arbitrary resolution.

Image-adaptive curves (IA-curves) adjust the R/G/B channels individually with 3 curves (3×1D LUT) [Kim et al.(2020)Kim, Koh, and Kim, Song et al.(2021)Song, Qian, and Du] or use a single curve to uniformly adjust all R, G, and B channels (1×1D LUT) [Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong, Jiang et al.(2023)Jiang, Wang, Li, Li, Fan, and Liu]. The advantage of curve-based methods lies in their low computational cost, fast inference time, and ability to target specific attributes (e.g. intensity [Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong]). However, such operations also pose several challenges. Existing curves are primarily built upon the R/G/B coordinate axes, which makes it impractical to adjust attributes such as hue and saturation [Moran et al.(2021)Moran, McDonagh, and Slabaugh, Wang et al.(2021)Wang, Li, Peng, Ma, Wang, Song, and Yan, Qiu et al.(2022)Qiu, Liu, Sun, Lin, and Chen]. Moreover, as shown in Fig. 1(a), pixel projections onto the R, G, and B channels often cluster together, leading to non-uniform sampling and wasted space. Some efforts have been made to alleviate this issue: Kim et al. [Kim et al.(2020)Kim, Koh, and Kim] add a pixel-wise local adjustment network after the curve adjustment, and Moran et al. [Moran et al.(2021)Moran, McDonagh, and Slabaugh] build curves in multiple colour spaces. However, these approaches typically add extra computational overhead, and building curves in multiple colour spaces increases the network's learning burden.

Image-adaptive 3D LUTs (IA-3DLUTs) appear to be another solution: they quantize the RGB colour space into a grid through uniform sampling (e.g. 33×33×33 or 16×16×16) and perform lookup operations on the sampled grid. Zeng et al. [Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang] and following works [Yang et al.(2022)Yang, Jin, Jia, Xu, and Chen, Zhang et al.(2023)Zhang, Zhang, Zhang, and Wang, Wang et al.(2021)Wang, Li, Peng, Ma, Wang, Song, and Yan] use neural networks to learn the grid values of the 3D RGB cube, then use trilinear interpolation to predict the missing colours in the 3D RGB space. Compared to curves, 3D LUTs provide more accurate colour adjustments because they operate in a 3-dimensional space. However, the spatial-domain sampling approach causes many colours to lose their index (ID) in the 3D RGB space (see Fig. 1(a)). The subsequent trilinear interpolation relies on CUDA acceleration, which is often memory-intensive and not supported by commonly used frameworks such as PyTorch or TFLite on mobile devices [Conde et al.(2024)Conde, Vazquez-Corral, Brown, and Timofte]. Meanwhile, 3D LUT methods still retain high spatial complexity (O(n^3)) and also exhibit heavy redundancy in space utilization. For instance, a 33-point image-adaptive 3D LUT typically utilizes only 5.53% of the available space.
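To make the storage gap concrete, the short sketch below counts the values stored by a 33-point 3D LUT against three 1D curves; the 200-entry curve resolution follows Sec. 3.1, and the extra nine values for a 3×3 coordinate matrix anticipate the IAC method introduced later. This is an illustrative calculation, not code from the paper.

```python
# Back-of-the-envelope count of stored values (illustrative only).
lut_points = 33
lut_values = lut_points ** 3 * 3        # an RGB triplet stored at every lattice node
curve_dim = 200                         # per-curve resolution, following Sec. 3.1
curve_values = 3 * curve_dim            # three 1D curves
iac_values = curve_values + 3 * 3       # plus a 3x3 coordinate matrix

print(lut_values)    # 107811 values for a 33-point 3D LUT
print(curve_values)  # 600 values for three curves
print(iac_values)    # 609 values for IAC's curves plus coordinate system
```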

After observing the two types of algorithms mentioned above, we asked: why not let the network learn an image-adaptive coordinate space, in which the input image is first projected into its preferred coordinate system, curve adjustments are applied in the projected space, and the result is then transformed back to the RGB space? Following this idea, we propose IAC, which integrates the learning of an image-adaptive coordinate space with curve adjustment, allowing the network to adapt its coordinate-space preference to each image and task, maximizing targeted adjustments while keeping spatial complexity low (O(n)). Our solution also requires very little computational cost and adds only a small number of additional coordinate parameters compared to IA-curve methods, which easily enhances the performance and flexibility of the curves. Our contributions can be summarized as follows:

  • We are the first to apply the concept of an image-adaptive coordinate system, which dynamically adjusts the coordinate space to better fit each image's own features and variations.

  • The advantage of our method lies in its low spatial complexity (O(n)). Meanwhile, the network part of our algorithm maintains a lightweight design (~39.7K parameters), making it feasible to deploy on mobile and edge devices.

  • Beyond the typical photo retouching task, we further validated the potential of our approach on exposure correction and white balance editing tasks. State-of-the-art (SOTA) experimental results demonstrate the effectiveness of our method.

Figure 2: We adopt an image-adaptive coordinate system (IAC) for various photography processing tasks, including photo retouching, exposure correction and white balance editing.

2 Photography Processing

Photography processing aims to handle deviations that occur during the photograph capture stage and deviations introduced in the Image Signal Processor (ISP) stage. Here we primarily focus on three tasks: photo retouching, exposure correction and white balance editing; an overview of these 3 tasks is shown in Fig. 2.

2.1 Photo Retouching

Photo retouching is the process of enhancing an image to improve its appearance, clarity, or overall quality, and it is commonly used in professional photography to create visually pleasant outputs. In earlier decades, people manually adjusted photos or relied on regression techniques [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand] (e.g. LASSO Regression [Hastie et al.(2009)Hastie, Tibshirani, Friedman, and Friedman], Gaussian Process Regression [Rasmussen(2004)]). In the era of deep learning, end-to-end mappings are achieved through neural networks, employing data-driven techniques to automatically train a network capable of adjusting images [Moran et al.(2021)Moran, McDonagh, and Slabaugh, Kim et al.(2020)Kim, Koh, and Kim, Song et al.(2021)Song, Qian, and Du, Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong, Kim et al.(2021)Kim, Choi, Kim, and Koh, Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang, Wang et al.(2021)Wang, Li, Peng, Ma, Wang, Song, and Yan, Yang et al.(2022)Yang, Jin, Jia, Xu, and Chen, Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia, Chen et al.(2018)Chen, Wang, Kao, and Chuang, Gharbi et al.(2017)Gharbi, Chen, Barron, Hasinoff, and Durand, Moran et al.(2020)Moran, Marza, McDonagh, Parisot, and Slabaugh, Ignatov et al.(2017)Ignatov, Kobyshev, Timofte, Vanhoey, and Van Gool, Yang et al.(2023b)Yang, Zhang, Wang, Yu, Wang, and Zhang, Yang et al.(2023a)Yang, Ding, Wu, Li, and Zhang, Kim et al.(2022)Kim, Lee, Kim, Jang, and Kim, Cai et al.(2023)Cai, Bian, Lin, Wang, Timofte, and Zhang]. For example, DeepLPF [Moran et al.(2020)Moran, Marza, McDonagh, Parisot, and Slabaugh] learns three different types of local parametric filters and regresses the parameters of these spatially localized filters to enhance the image, and DeepUPE [Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia] introduces an intermediate illumination representation within the network to correlate the input with the anticipated enhancement results.

2.2 Exposure Correction

Incorrect exposure times or challenging lighting conditions can result in images with exposure anomalies. Photo exposure correction aims to handle both under- and over-exposure conditions to achieve a more balanced and visually pleasing result.

Traditional exposure correction algorithms [Yuan and Sun(2012), Nayar and Branzoi(2003)] prefer to use histogram adjustment to handle exposure errors. In the deep learning era, various neural-network-based methods [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown, Cui et al.(2022)Cui, Li, Gu, Su, Gao, Jiang, Qiao, and Harada, Nguyen et al.(2023)Nguyen, Tran, Nguyen, and Nguyen, Li et al.(2023)Li, Liu, Ma, Jiang, Fan, and Liu, Zhou et al.(2024)Zhou, Li, Liang, Xu, Liu, and Xu] have been proposed to address this issue: Afifi et al. [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown] utilized a coarse-to-fine multi-scale CNN model, and Cui et al. [Cui et al.(2022)Cui, Li, Gu, Su, Gao, Jiang, Qiao, and Harada] used transformer attention to predict key ISP parameters to correct exposure. Very recently, Nguyen et al. [Nguyen et al.(2023)Nguyen, Tran, Nguyen, and Nguyen] designed a pseudo ground-truth learning scheme to achieve unsupervised exposure correction.

2.3 White Balance Editing

White balance (WB) editing aims to correct sRGB images acquired with an incorrect white balance setting. This task is more challenging than the regular WB correction task, which operates on raw-RGB where illumination estimation is performed: once the WB setting has been applied, there still remain various non-linear operation stages in the ISP, and these operations increase the complexity of white balance correction [Afifi et al.(2019)Afifi, Price, Cohen, and Brown, Delbracio et al.(2021)Delbracio, Kelly, Brown, and Milanfar].

Afifi et al. [Afifi et al.(2019)Afifi, Price, Cohen, and Brown] first introduced this task and proposed a method based on k-nearest neighbours (KNN) to compute a nonlinear colour mapping function for correcting images. After that, various deep learning methods [Afifi and Brown(2020), Kinli et al.(2022)Kinli, Yilmaz, Özcan, and Kıraç, Kinli et al.(2023)Kinli, Yilmaz, Özcan, and Kıraç] have been introduced in this area; these works aim to accomplish the mapping from incorrectly white-balanced (WB) images to correct-WB images through an end-to-end network-based approach.

In this paper, we will evaluate our method on the aforementioned three tasks to confirm the effectiveness of our image-adaptive coordinate system in diverse scenarios. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance across all three tasks while maintaining time and parameter efficiency.

3 Image-Adaptive Coordinate System

An overview of the IAC algorithm is shown in Fig. 1(b). Given an image x(r, g, b) in the RGB colour space I(R, G, B), our goal is to learn the coordinate projection vectors {n_1, n_2, n_3} that best suit x. The image x is projected along {n_1, n_2, n_3} into the new coordinate system and then adjusted with 3 curves in that system. Both the projection vectors {n_1, n_2, n_3} and the curves {curve_1, curve_2, curve_3} are learned by the network N.

3.1 Methodology

The image-adaptive coordinate system's vectors n_1, n_2 and n_3 are three 3-dimensional linearly independent vectors, which form a 3×3 invertible matrix:

\left[\vec{n}_{1},\vec{n}_{2},\vec{n}_{3}\right]=\begin{bmatrix}a_{1}&a_{2}&a_{3}\\ b_{1}&b_{2}&b_{3}\\ c_{1}&c_{2}&c_{3}\end{bmatrix},   (1)

where the RGB colour space can be seen as the special case in which [n_1, n_2, n_3] is the identity matrix. The input image x(r, g, b) ∈ (H, W, 3) is multiplied by the matrix [n_1, n_2, n_3] and projected onto the new coordinate space (depicted as (a) in Fig. 3):

\begin{aligned}
F(x) &= x\cdot\left[\vec{n}_{1},\vec{n}_{2},\vec{n}_{3}\right]=\left[x\cdot\vec{n}_{1},\; x\cdot\vec{n}_{2},\; x\cdot\vec{n}_{3}\right] \\
     &= \left[x(r),x(g),x(b)\right]\cdot\begin{bmatrix}a_{1}&a_{2}&a_{3}\\ b_{1}&b_{2}&b_{3}\\ c_{1}&c_{2}&c_{3}\end{bmatrix}=\left[t_{1},t_{2},t_{3}\right], \\
\text{where}\quad t_{i} &= x(r)\cdot a_{i}+x(g)\cdot b_{i}+x(b)\cdot c_{i},\quad i\in\{1,2,3\}.
\end{aligned}   (2)

Following Eq. 2, the image x(r, g, b) is projected with the matrix [n_1, n_2, n_3] to F(x). The projection result is represented as F(x)(t_1, t_2, t_3), where t_1, t_2, t_3 are the projected values in the new coordinate space. Then, adaptive curves {curve_1, curve_2, curve_3} are built on the t_1, t_2, t_3 channels to adjust the values of F(x) (depicted as (b) in Fig. 3). Here, we normalize the range of t_1, t_2, t_3 to between 0 and 1. Meanwhile, the curves also range from 0 to 1, and each curve is designed with 200 dimensions (for more details, please refer to our supplementary material). We then adjust the pixel values through the curves {curve_1, curve_2, curve_3}:

t_{i}^{\prime}=\textit{curve}_{i}(t_{i}),\quad i\in\{1,2,3\},   (3)

After the curve mapping adjustment, the values in F(x)(t_1, t_2, t_3) become L(F(x))(t_1', t_2', t_3'). We first perform a denormalization to recover L(F(x))'s original value range, then multiply by the inverse of [n_1, n_2, n_3] to map L(F(x)) back to the RGB colour space and obtain the final result (depicted as (c) in Fig. 3):

F^{-1}(L(F(x)))=L(F(x))\cdot\left[\vec{n}_{1},\vec{n}_{2},\vec{n}_{3}\right]^{-1}.   (4)

We initialize the matrix [n_1, n_2, n_3] as an invertible matrix. To keep it invertible during the learning stage, if the rank of the learned matrix [n_1, n_2, n_3] drops below 3, we add a set of small random numbers to restore it to rank 3.
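A minimal sketch of this rank-repair step, assuming the coordinate matrix is predicted as a 3×3 tensor; the noise scale `eps` is an illustrative choice, not a value reported here.

```python
import torch

def ensure_full_rank(n_mat: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """If the predicted 3x3 matrix [n1, n2, n3] loses full rank during training,
    add a small random perturbation until it is invertible again."""
    while torch.linalg.matrix_rank(n_mat) < 3:
        n_mat = n_mat + eps * torch.randn_like(n_mat)
    return n_mat
```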

Afterwards, we compute a loss function (e.g. L1 loss) between F^{-1}(L(F(x))) and the ground truth x_gt to optimize the network N, which helps us find the most suitable image-adaptive coordinate projection vectors [n_1, n_2, n_3] and curves.
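To make Eqs. 2-4 concrete, below is a minimal PyTorch-style sketch of the full IAC transform. The per-channel min-max normalization, the linear curve interpolation, and the tensor layout are our assumptions about details not fully specified above; `n_mat` and `curves` stand for the outputs of the network N.

```python
import torch

def apply_iac(x, n_mat, curves):
    """x: (H, W, 3) RGB image in [0, 1]; n_mat: (3, 3) matrix whose columns are n1, n2, n3;
    curves: (3, D) lookup curves with values in [0, 1]."""
    h, w, _ = x.shape
    d = curves.shape[1]

    # (a) Project pixels into the learned coordinate system: F(x) = x . [n1, n2, n3]  (Eq. 2)
    t = x.reshape(-1, 3) @ n_mat                      # (H*W, 3) -> projected values t1, t2, t3

    # (b) Normalize each projected channel to [0, 1] before the curve lookup
    t_min, t_max = t.min(dim=0).values, t.max(dim=0).values
    t_norm = (t - t_min) / (t_max - t_min + 1e-8)

    # Look up each channel in its curve with linear interpolation (Eq. 3)
    idx = t_norm * (d - 1)
    lo = idx.floor().long().clamp(0, d - 2)
    frac = idx - lo.float()
    out = torch.empty_like(t_norm)
    for i in range(3):
        c = curves[i]
        out[:, i] = c[lo[:, i]] * (1 - frac[:, i]) + c[lo[:, i] + 1] * frac[:, i]

    # (c) Denormalize and map back to RGB with the inverse matrix (Eq. 4)
    t_adj = out * (t_max - t_min) + t_min
    y = t_adj @ torch.linalg.inv(n_mat)
    return y.reshape(h, w, 3).clamp(0, 1)
```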

Figure 3: An overview of IAC's network architecture. The image x passes through the network N to generate the coordinate system vectors {n_1, n_2, n_3} and the curves {curve_1, curve_2, curve_3}.

3.2 Network Design

In this section we introduce IAC's network design. The network N is responsible for predicting the image-adaptive coordinate system {n_1, n_2, n_3} and the curves {curve_1, curve_2, curve_3}; additionally, IAC is a general approach that could also be implemented in other frameworks.

As shown in Fig. 3, the input image x first passes through 3 down-sampling blocks, each consisting of a down-sampling (↓2) 3×3 convolution, batch normalization and a GELU activation [Hendrycks and Gimpel(2016)]. After the down-sampling process, we designed a parallel generator to predict the image-adaptive coordinates and curves. The parallel generator comprises three parallel branches, G_1, G_2 and G_3, each consisting of several ConvNext [Liu et al.(2022)Liu, Mao, Wu, Feichtenhofer, Darrell, and Xie] blocks, where we set the block number to 3 in our experiments. Each ConvNext [Liu et al.(2022)Liu, Mao, Wu, Feichtenhofer, Darrell, and Xie] block consists of a 7×7 depth-wise convolution, two 1×1 convolutions, layer normalization (LN), and GELU activation [Hendrycks and Gimpel(2016)]; the channel number of the convolution blocks is set to 32, and the entire structure is linked by a residual connection. The large-kernel convolution design allows IAC to extract image features more effectively, which also enables us to better capture global image information.

Among the three parallel branches G_1, G_2, G_3, branch G_1 is responsible for predicting the vector n_1 and curve_1: features passing through G_1's ConvNext blocks go through a global average pooling layer and a linear layer to predict n_1 = [a_1, b_1, c_1] and curve_1. Similarly, branch G_2 predicts n_2 and curve_2, and branch G_3 predicts n_3 and curve_3. After that, the predicted vectors and curves are used to process the input image x, as described in Sec. 3.1. Please refer to our supplementary material for more structural details.
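A condensed PyTorch sketch of this design is given below. Kernel sizes, the block count, the channel width and the curve dimension follow the description above, while the padding, the GroupNorm stand-in for channel-wise LN, the hidden expansion ratio, and the sigmoid on the curve head are our assumptions about unspecified details.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """7x7 depth-wise conv, channel norm, two 1x1 convs with GELU, residual link."""
    def __init__(self, dim=32):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.GroupNorm(1, dim)                   # stand-in for channel-wise LayerNorm
        self.pw1 = nn.Conv2d(dim, 4 * dim, kernel_size=1)  # expansion ratio 4 is an assumption
        self.pw2 = nn.Conv2d(4 * dim, dim, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.pw2(self.act(self.pw1(self.norm(self.dwconv(x)))))

class IACBranch(nn.Module):
    """One parallel branch G_i: ConvNeXt blocks -> global average pooling -> two linear heads."""
    def __init__(self, dim=32, curve_dim=200, num_blocks=3):
        super().__init__()
        self.blocks = nn.Sequential(*[ConvNeXtBlock(dim) for _ in range(num_blocks)])
        self.head_vec = nn.Linear(dim, 3)            # n_i = [a_i, b_i, c_i]
        self.head_curve = nn.Linear(dim, curve_dim)  # curve_i with 200 entries

    def forward(self, feat):
        f = self.blocks(feat).mean(dim=(2, 3))       # global average pooling
        # sigmoid keeps curve values in [0, 1]; the exact output activation is an assumption
        return self.head_vec(f), torch.sigmoid(self.head_curve(f))

class IACNet(nn.Module):
    """Down-sampling stem followed by three parallel branches G_1, G_2, G_3."""
    def __init__(self, dim=32):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(3):                           # three stride-2 down-sampling blocks
            layers += [nn.Conv2d(in_ch, dim, 3, stride=2, padding=1),
                       nn.BatchNorm2d(dim), nn.GELU()]
            in_ch = dim
        self.stem = nn.Sequential(*layers)
        self.branches = nn.ModuleList([IACBranch(dim) for _ in range(3)])

    def forward(self, x):                            # x: (B, 3, H, W)
        feat = self.stem(x)
        vecs, curves = zip(*[branch(feat) for branch in self.branches])
        # columns of the first output are n_1, n_2, n_3; the second output is (B, 3, 200)
        return torch.stack(vecs, dim=-1), torch.stack(curves, dim=1)
```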

4 Experiments

In this section, we select 3 tasks to validate the effectiveness of our image-adaptive coordinate (IAC) method: (a) photo retouching, (b) exposure correction and (c) white balance editing. We introduce the experiments in detail as follows (for more experimental details and training settings, please refer to our supplementary material):

Figure 4: Visualization of photo retouching results on the MIT-Adobe FiveK dataset [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand].

4.1 Photo Retouching Experiments

Table 1: Experimental results on the MIT-Adobe FiveK [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand] dataset. We compare PSNR↑, SSIM↑, parameter number (# Para)↓ and inference time↓. The best results are marked in red and the best results without CUDA operations are marked in blue.

Method | DeepUPE | DPE | HDRNet | DeepLPF | DPED | 3D LUT | AdaInt | CURL | IAC (Ours)
PSNR↑ | 21.88 | 23.75 | 24.66 | 24.73 | 21.76 | 25.29 | 25.49 | 24.04 | 25.02
SSIM↑ | 0.853 | 0.828 | 0.875 | 0.916 | 0.871 | 0.922 | 0.926 | 0.900 | 0.902
Need CUDA? | No | No | No | No | No | Yes | Yes | No | No
# Para↓ | 927.1K | 3.4M | 483.1K | 1.7M | - | 593.7K | 619.7K | 1.4M | 39.7K
Inference Time↓ | 0.628s | 0.534s | 0.673s | 1.287s | - | 0.012s | 0.018s | 0.834s | 0.014s

We conduct photo retouching experiments on the MIT-Adobe FiveK [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand] dataset, which consists of 5000 images, each meticulously adjusted by five experts (A/B/C/D/E). Following previous works [Moran et al.(2020)Moran, Marza, McDonagh, Parisot, and Slabaugh, Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia, Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang], we employ the images adjusted by expert C as the ground-truth references. We compare our method with various SOTA photo retouching methods, including DeepUPE [Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia], DPE [Chen et al.(2018)Chen, Wang, Kao, and Chuang], HDRNet [Gharbi et al.(2017)Gharbi, Chen, Barron, Hasinoff, and Durand], DeepLPF [Moran et al.(2020)Moran, Marza, McDonagh, Parisot, and Slabaugh], DPED [Ignatov et al.(2017)Ignatov, Kobyshev, Timofte, Vanhoey, and Van Gool], CURL [Moran et al.(2021)Moran, McDonagh, and Slabaugh], 3D LUT [Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang] and AdaInt [Yang et al.(2022)Yang, Jin, Jia, Xu, and Chen]. It is worth noting that 3D LUT and AdaInt are two image-adaptive 3D LUT methods that must rely on CUDA for acceleration. Comparison results are shown in Table 1: our IAC approach achieves the best image quality (PSNR, SSIM) among non-CUDA methods, while keeping an extremely lightweight design (39.7K parameters) and a fast inference time. Some visualization examples are shown in Fig. 4; our IAC produces pleasing colour restoration results without overly dark or over-brightened outputs, ensuring visual quality in accordance with human perception.

Table 2: Comparison results on the exposure correction ME [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown] dataset, where the best results are marked in bold and the second-best results are underlined.

Method | Expert A (PSNR↑/SSIM↑) | Expert B | Expert C | Expert D | Expert E | Avg | Test Time↓
HE [Gonzalez and Woods(2006)] | 16.14/0.685 | 16.28/0.671 | 16.52/0.696 | 16.63/0.668 | 17.30/0.688 | 16.58/0.682 | 0.50s
LIME [Guo et al.(2017)Guo, Li, and Ling] | 11.15/0.590 | 11.83/0.610 | 11.52/0.607 | 12.64/0.628 | 13.61/0.653 | 12.15/0.618 | 10.32s
RetinexNet [Wei et al.(2018)Wei, Wang, Yang, and Liu] | 10.76/0.585 | 11.61/0.596 | 11.13/0.605 | 11.99/0.615 | 12.67/0.636 | 11.63/0.607 | 1.08s
Deep-UPE [Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia] | 13.16/0.610 | 13.90/0.642 | 13.69/0.632 | 14.80/0.649 | 15.68/0.667 | 14.25/0.640 | 0.78s
Zero-DCE [Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong] | 11.64/0.536 | 12.56/0.539 | 12.06/0.544 | 12.96/0.548 | 13.77/0.580 | 12.60/0.549 | 0.04s
3D-LUT [Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang] | 13.68/0.591 | 11.86/0.577 | 12.79/0.627 | 12.96/0.548 | 14.51/0.602 | 13.06/0.519 | 0.28s
SCI [Ma et al.(2022)Ma, Ma, Liu, Fan, and Luo] | 16.11/0.737 | 17.15/0.805 | 16.36/0.764 | 16.51/0.766 | 16.09/0.761 | 16.44/0.767 | 0.17s
MSEC [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown] | 19.16/0.746 | 20.10/0.734 | 20.20/0.769 | 18.98/0.719 | 18.98/0.727 | 19.48/0.739 | 0.72s
IAT [Cui et al.(2022)Cui, Li, Gu, Su, Gao, Jiang, Qiao, and Harada] | 19.63/0.780 | 21.21/0.816 | 21.21/0.820 | 19.58/0.805 | 19.21/0.797 | 20.07/0.804 | 0.11s
PSENet [Nguyen et al.(2023)Nguyen, Tran, Nguyen, and Nguyen] | 19.90/0.817 | 21.65/0.867 | 21.23/0.850 | 19.86/0.844 | 19.34/0.840 | 20.34/0.844 | 0.28s
MSLT [Zhou et al.(2024)Zhou, Li, Liang, Xu, Liu, and Xu] | 20.21/0.805 | 22.47/0.864 | 22.03/0.844 | 20.33/0.830 | 20.04/0.832 | 21.02/0.835 | 0.24s
IAC (Ours) | 21.23/0.829 | 21.84/0.870 | 22.05/0.859 | 20.09/0.846 | 20.88/0.848 | 21.22/0.850 | 0.09s

4.2 Exposure Correction Experiments

Secondly, we conducted experiments on the exposure correction task to further verify our method's effectiveness. We choose the ME dataset [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown], which contains 24,330 8-bit sRGB images divided into 17,675 training images, 750 validation images, and 5,905 test images. The ME dataset is rendered from the MIT-Adobe FiveK [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand] dataset's RAW data with 5 different exposure values (EVs) ranging over {-1.5, -1, 0, +1, +1.5}, covering under-exposure to over-exposure conditions. This task assesses the model's capability to simultaneously adjust both under- and over-exposure conditions.

We show the experimental results in Table 2, where we compare IAC with various methods, including the traditional methods histogram equalization (HE) [Gonzalez and Woods(2006)] and LIME [Guo et al.(2017)Guo, Li, and Ling], SOTA deep-network-based image enhancement methods [Wei et al.(2018)Wei, Wang, Yang, and Liu, Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia, Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong, Ma et al.(2022)Ma, Ma, Liu, Fan, and Luo] and SOTA deep-network-based exposure correction methods [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown, Cui et al.(2022)Cui, Li, Gu, Su, Gao, Jiang, Qiao, and Harada, Nguyen et al.(2023)Nguyen, Tran, Nguyen, and Nguyen, Zhou et al.(2024)Zhou, Li, Liang, Xu, Liu, and Xu]. From Table 2 we can see that our method achieves the best overall PSNR and SSIM while keeping a fast inference speed. We also show visualization results in Fig. 6: our IAC effectively corrects overexposure and enhances underexposure while efficiently preserving image details. An example in the underexposed "Night" scene (Fig. 6, lines 1~2) shows that Zero-DCE [Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong], MSEC [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown], and PSENet [Nguyen et al.(2023)Nguyen, Tran, Nguyen, and Nguyen] tend to over-brighten images, potentially causing them to lose details, while IAT [Cui et al.(2022)Cui, Li, Gu, Su, Gao, Jiang, Qiao, and Harada] and MSLT [Zhou et al.(2024)Zhou, Li, Liang, Xu, Liu, and Xu] may produce low-clarity results.

4.3 White Balance Editing Experiments

For the white balance editing task, we utilize the Rendered WB dataset created by Afifi et al. [Afifi et al.(2019)Afifi, Price, Cohen, and Brown]. This dataset comprises two subsets: Set1, containing 62,535 images captured by seven distinct DSLR cameras, and Set2, containing 2,881 images captured by one DSLR camera and four different phone cameras. We follow the setting of previous works [Afifi et al.(2019)Afifi, Price, Cohen, and Brown, Afifi and Brown(2020)], taking Set1 for training and Set2 for testing; as in previous work [Afifi and Brown(2020)], we randomly choose 12,000 images from Set1 as the training set.

We compare with various WB editing methods, including the classical White Patch [Brainard and Wandell(1986)] method, along with the recent methods FC4 [Hu et al.(2017)Hu, Wang, and Lin], KNN-WB [Afifi et al.(2019)Afifi, Price, Cohen, and Brown] and CNN-WB [Afifi and Brown(2020)]. Comparison results are shown in Table 3: IAC achieves competitive results while keeping the fastest inference speed. Additionally, our method is much more lightweight than CNN-WB [Afifi and Brown(2020)] (IAC ~39.7K vs. CNN-WB ~10M parameters). Some visualization results on Set2 are shown in Fig. 5, demonstrating that IAC can also handle the white balance editing task.

Table 3: Comparison results on Set2 of the white balance editing dataset [Afifi et al.(2019)Afifi, Price, Cohen, and Brown]; yellow marks the best results and blue marks the second-best results.

Method | MSE↓ (Mean/Q1/Q2/Q3) | MAE↓ (Mean/Q1/Q2/Q3) | Delta E 2000↓ (Mean/Q1/Q2/Q3) | Inference Time
White Patch | 586.72 / 148.65 / 335.76 / 664.41 | 11.26 / 6.28 / 10.17 / 16.89 | 12.28 / 8.79 / 12.07 / 15.01 | 0.15s
FC4 | 505.30 / 142.46 / 307.77 / 635.35 | 10.37 / 5.94 / 9.42 / 14.04 | 10.82 / 7.39 / 10.64 / 13.77 | 0.89s
KNN-WB | 171.09 / 37.04 / 87.04 / 190.88 | 4.48 / 2.26 / 3.64 / 5.95 | 5.60 / 3.43 / 4.90 / 7.06 | 0.54s
CNN-WB | 124.97 / 30.13 / 76.32 / 154.44 | 3.75 / 2.02 / 3.08 / 4.72 | 4.90 / 3.13 / 4.35 / 6.08 | 1.2s
Ours | 130.58 / 33.22 / 72.56 / 180.48 | 3.99 / 1.98 / 3.44 / 4.87 | 5.13 / 3.24 / 4.48 / 6.27 | 0.05s

Figure 5: Visualization of white balance editing results on Set2 of the dataset [Afifi et al.(2019)Afifi, Price, Cohen, and Brown].
Figure 6: Visualization results on the exposure correction dataset [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown]. Lines 1 and 2 are under-exposure correction results, while lines 3 to 6 are over-exposure correction results; our method handles both over- and under-exposure while keeping more details.

5 Conclusion

We present IAC, which learns an image-adaptive coordinate system for various photography processing tasks. Experimental results on photo retouching, exposure correction, and white balance editing showcase the superior performance of our method. In the future, we aim to extend the coordinate transformation solution to curved surfaces, as our algorithm may yield even better results in non-uniform coordinate spaces. We also want to validate the effectiveness of IAC for 3D applications, such as 3D reconstruction in challenging lighting conditions [Cui et al.(2024)Cui, Gu, Sun, Ma, Qiao, and Harada].

6 Acknowledgement

This work was partially supported by JST Moonshot R&D Grant Number JPMJPS2011, CREST Grant Number JPMJCR2015 and the Basic Research Grant (Super AI) of the Institute for AI and Beyond of the University of Tokyo.

References

  • [Afifi and Brown(2020)] Mahmoud Afifi and Michael S. Brown. Deep white-balance editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • [Afifi et al.(2019)Afifi, Price, Cohen, and Brown] Mahmoud Afifi, Brian Price, Scott Cohen, and Michael S Brown. When color constancy goes wrong: Correcting improperly white-balanced images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1535–1544, 2019.
  • [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown] Mahmoud Afifi, Konstantinos G Derpanis, Bjorn Ommer, and Michael S Brown. Learning multi-scale photo exposure correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9157–9167, 2021.
  • [Brainard and Wandell(1986)] David H Brainard and Brian A Wandell. Analysis of the retinex theory of color vision. JOSA A, 3(10):1651–1661, 1986.
  • [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand] Vladimir Bychkovsky, Sylvain Paris, Eric Chan, and Frédo Durand. Learning photographic global tonal adjustment with a database of input / output image pairs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
  • [Cai et al.(2023)Cai, Bian, Lin, Wang, Timofte, and Zhang] Yuanhao Cai, Hao Bian, Jing Lin, Haoqian Wang, Radu Timofte, and Yulun Zhang. Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12504–12513, October 2023.
  • [Chen et al.(2018)Chen, Wang, Kao, and Chuang] Yu-Sheng Chen, Yu-Ching Wang, Man-Hsin Kao, and Yung-Yu Chuang. Deep photo enhancer: Unpaired learning for image enhancement from photographs with gans. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
  • [Conde et al.(2024)Conde, Vazquez-Corral, Brown, and Timofte] Marcos V Conde, Javier Vazquez-Corral, Michael S Brown, and Radu Timofte. Nilut: Conditional neural implicit 3d lookup tables for image enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  • [Cui et al.(2022)Cui, Li, Gu, Su, Gao, Jiang, Qiao, and Harada] Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, ZhengKai Jiang, Yu Qiao, and Tatsuya Harada. You only need 90k parameters to adapt light: a light weight transformer for image enhancement and exposure correction. In 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022. BMVA Press, 2022. URL https://bmvc2022.mpi-inf.mpg.de/0238.pdf.
  • [Cui et al.(2024)Cui, Gu, Sun, Ma, Qiao, and Harada] Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, and Tatsuya Harada. Aleth-nerf: Illumination adaptive nerf with concealing field assumption. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  • [Delbracio et al.(2021)Delbracio, Kelly, Brown, and Milanfar] Mauricio Delbracio, Damien Kelly, Michael S. Brown, and Peyman Milanfar. Mobile computational photography: A tour. Annual Review of Vision Science, 7(1):571–604, 2021. 10.1146/annurev-vision-093019-115521. URL https://doi.org/10.1146/annurev-vision-093019-115521. PMID: 34524880.
  • [Gharbi et al.(2017)Gharbi, Chen, Barron, Hasinoff, and Durand] Michaël Gharbi, Jiawen Chen, Jonathan T Barron, Samuel W Hasinoff, and Frédo Durand. Deep bilateral learning for real-time image enhancement. ACM Transactions on Graphics (TOG), 36(4):1–12, 2017.
  • [Gonzalez and Woods(2006)] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing (3rd Edition). Prentice-Hall, Inc., USA, 2006. ISBN 013168728X.
  • [Guo et al.(2020)Guo, Li, Guo, Loy, Hou, Kwong, and Cong] C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong. Zero-reference deep curve estimation for low-light image enhancement. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  • [Guo et al.(2017)Guo, Li, and Ling] Xiaojie Guo, Yu Li, and Haibin Ling. Lime: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 2017.
  • [Hastie et al.(2009)Hastie, Tibshirani, Friedman, and Friedman] Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
  • [Hendrycks and Gimpel(2016)] Dan Hendrycks and Kevin Gimpel. Bridging nonlinearities and stochastic regularizers with gaussian error linear units. CoRR, abs/1606.08415, 2016. URL http://arxiv.org/abs/1606.08415.
  • [Hu et al.(2017)Hu, Wang, and Lin] Yuanming Hu, Baoyuan Wang, and Stephen Lin. Fc 4: Fully convolutional color constancy with confidence-weighted pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4085–4094, 2017.
  • [Ignatov et al.(2017)Ignatov, Kobyshev, Timofte, Vanhoey, and Van Gool] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. Dslr-quality photos on mobile devices with deep convolutional networks. In Proceedings of the IEEE international conference on computer vision, 2017.
  • [Jiang et al.(2023)Jiang, Wang, Li, Li, Fan, and Liu] Ting Jiang, Chuan Wang, Xinpeng Li, Ru Li, Haoqiang Fan, and Shuaicheng Liu. Meflut: Unsupervised 1d lookup tables for multi-exposure image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10542–10551, 2023.
  • [Johnson et al.(2016)Johnson, Alahi, and Fei-Fei] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, 2016.
  • [Kim et al.(2022)Kim, Lee, Kim, Jang, and Kim] Bomi Kim, Sunhyeok Lee, Nahyun Kim, Donggon Jang, and Dae-Shik Kim. Learning color representations for low-light image enhancement. In 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 904–912, 2022. 10.1109/WACV51458.2022.00098.
  • [Kim et al.(2020)Kim, Koh, and Kim] Han-Ul Kim, Young Jun Koh, and Chang-Su Kim. Global and local enhancement networks for paired and unpaired image enhancement. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision – ECCV 2020, pages 339–354, Cham, 2020. Springer International Publishing. ISBN 978-3-030-58595-2.
  • [Kim et al.(2021)Kim, Choi, Kim, and Koh] Hanul Kim, Su-Min Choi, Chang-Su Kim, and Yeong Jun Koh. Representative colour transform for image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
  • [Kinli et al.(2022)Kinli, Yilmaz, Özcan, and Kıraç] Furkan Kinli, Dogacan Yilmaz, Barış Özcan, and Mustafa Furkan Kıraç. Modeling the lighting in scenes as style for auto white-balance correction. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 4892–4902, 2022. URL https://api.semanticscholar.org/CorpusID:252917913.
  • [Kinli et al.(2023)Kinli, Yilmaz, Özcan, and Kıraç] Furkan Kinli, Dogacan Yilmaz, Barış Özcan, and Mustafa Furkan Kıraç. Deterministic neural illumination mapping for efficient auto-white balance correction. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pages 1131–1139, 2023. URL https://api.semanticscholar.org/CorpusID:260704581.
  • [Li et al.(2023)Li, Liu, Ma, Jiang, Fan, and Liu] Gehui Li, Jinyuan Liu, Long Ma, Zhiying Jiang, Xin Fan, and Risheng Liu. Fearless luminance adaptation: A macro-micro-hierarchical transformer for exposure correction. In Proceedings of the 31st ACM International Conference on Multimedia, MM ’23, page 7304–7313, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701085.
  • [Liu et al.(2023)Liu, Yang, Fu, and Qian] Chengxu Liu, Huan Yang, Jianlong Fu, and Xueming Qian. 4d lut: learnable context-aware 4d lookup table for image enhancement. IEEE Transactions on Image Processing, 32:4742–4756, 2023.
  • [Liu et al.(2022)Liu, Mao, Wu, Feichtenhofer, Darrell, and Xie] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  • [Ma et al.(2022)Ma, Ma, Liu, Fan, and Luo] Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, and Zhongxuan Luo. Toward fast, flexible, and robust low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5637–5646, 2022.
  • [Moran et al.(2020)Moran, Marza, McDonagh, Parisot, and Slabaugh] Sean Moran, Pierre Marza, Steven McDonagh, Sarah Parisot, and Gregory Slabaugh. Deeplpf: Deep local parametric filters for image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  • [Moran et al.(2021)Moran, McDonagh, and Slabaugh] Sean Moran, Steven McDonagh, and Gregory Slabaugh. Curl: Neural curve layers for global image enhancement. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 9796–9803, 2021. 10.1109/ICPR48806.2021.9412677.
  • [Nayar and Branzoi(2003)] Nayar and Branzoi. Adaptive dynamic range imaging: optical control of pixel exposures over space and time. In Proceedings Ninth IEEE International Conference on Computer Vision, pages 1168–1175 vol.2, 2003. 10.1109/ICCV.2003.1238624.
  • [Nguyen et al.(2023)Nguyen, Tran, Nguyen, and Nguyen] Hue Nguyen, Diep Tran, Khoi Nguyen, and Rang Nguyen. Psenet: Progressive self-enhancement network for unsupervised extreme-light image enhancement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1756–1765, 2023.
  • [Qiu et al.(2022)Qiu, Liu, Sun, Lin, and Chen] Zhaolin Qiu, Jiaqing Liu, Hao Sun, Lanfen Lin, and Yen-Wei Chen. Costhr: A heart rate estimating network with adaptive color space transformation. IEEE Transactions on Instrumentation and Measurement, 71:1–10, 2022. 10.1109/TIM.2022.3170976.
  • [Rasmussen(2004)] Carl Edward Rasmussen. Gaussian Processes in Machine Learning, pages 63–71. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004. ISBN 978-3-540-28650-9. 10.1007/978-3-540-28650-9_4. URL https://doi.org/10.1007/978-3-540-28650-9_4.
  • [Song et al.(2021)Song, Qian, and Du] Yuda Song, Hui Qian, and Xin Du. Starenhancer: Learning real-time and style-aware image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4126–4135, 2021.
  • [Vinker et al.(2021)Vinker, Huberman-Spiegelglas, and Fattal] Yael Vinker, Inbar Huberman-Spiegelglas, and Raanan Fattal. Unpaired learning for high dynamic range image tone mapping. In Proceedings of the IEEE/CVF international conference on computer vision, pages 14657–14666, 2021.
  • [Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia] Ruixing Wang, Qing Zhang, Chi-Wing Fu, Xiaoyong Shen, Wei-Shi Zheng, and Jiaya Jia. Underexposed photo enhancement using deep illumination estimation. In The IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • [Wang et al.(2021)Wang, Li, Peng, Ma, Wang, Song, and Yan] Tao Wang, Yong Li, Jingyang Peng, Yipeng Ma, Xian Wang, Fenglong Song, and Youliang Yan. Real-time image enhancer via learnable spatial-aware 3d lookup tables. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2471–2480, 2021.
  • [Wei et al.(2018)Wei, Wang, Yang, and Liu] Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep retinex decomposition for low-light enhancement. In British Machine Vision Conference, 2018.
  • [Yang et al.(2022)Yang, Jin, Jia, Xu, and Chen] Canqian Yang, Meiguang Jin, Xu Jia, Yi Xu, and Ying Chen. Adaint: Learning adaptive intervals for 3d lookup tables on real-time image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  • [Yang et al.(2023a)Yang, Ding, Wu, Li, and Zhang] Shuzhou Yang, Moxuan Ding, Yanmin Wu, Zihan Li, and Jian Zhang. Implicit neural representation for cooperative low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12918–12927, October 2023a.
  • [Yang et al.(2023b)Yang, Zhang, Wang, Yu, Wang, and Zhang] Shuzhou Yang, Xuanyu Zhang, Yinhuai Wang, Jiwen Yu, Yuhan Wang, and Jian Zhang. Difflle: Diffusion-guided domain calibration for unsupervised low-light image enhancement, 2023b.
  • [Yuan and Sun(2012)] Lu Yuan and Jian Sun. Automatic exposure correction of consumer photographs. In European Conference on Computer Vision, 2012.
  • [Zeng et al.(2022)Zeng, Cai, Li, Cao, and Zhang] Hui Zeng, Jianrui Cai, Lida Li, Zisheng Cao, and Lei Zhang. Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(04):2058–2073, 2022.
  • [Zhang et al.(2023)Zhang, Zhang, Zhang, and Wang] Fengyi Zhang, Lin Zhang, Tianjun Zhang, and Dongqing Wang. Adaptively hashing 3dluts for lightweight real-time image enhancement. In 2023 IEEE International Conference on Multimedia and Expo (ICME), pages 2771–2776, 2023. 10.1109/ICME55011.2023.00471.
  • [Zhou et al.(2024)Zhou, Li, Liang, Xu, Liu, and Xu] Yijie Zhou, Chao Li, Jin Liang, Tianyi Xu, Xin Liu, and Jun Xu. 4k-resolution photo exposure correction at 125 fps with  8k parameters. In Winter Conference on Applications of Computer Vision (WACV), 2024.

Appendix A Experiments Setting

All experiments with the IAC model on the 3 tasks (Photo Retouching, Exposure Correction, and White Balance Editing) were conducted on a single Nvidia A100 GPU. Next, we provide a detailed explanation of the experimental settings and training details for each task.

A.1 Photo Retouching Setting

The MIT-Adobe FiveK [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand] dataset contains 5,000 images, of which 4,500 are used for training and the remaining 500 for evaluation. The training images are uniformly resized to 400×600 and augmented with random flips and rotations. Training uses the Adam optimizer with an initial learning rate of 1e-5 and a weight decay of 0.0002. The model is trained for a total of 100 epochs, accompanied by a cosine annealing learning-rate schedule.
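A small sketch of this optimizer and schedule in PyTorch; the placeholder model and the assumption that the 100 training rounds are epochs are ours.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, kernel_size=1)   # placeholder for the IAC network

# Adam with the stated learning rate and weight decay
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=2e-4)
# cosine annealing over the training schedule (length assumed to match the 100 epochs)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one pass over the resized 400x600 training pairs with random flip/rotation ...
    scheduler.step()
```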

The loss function between the predicted image F^{-1}(L(F(x))) and the ground-truth image x̂ is a mixed loss L_mix consisting of a smooth L1 loss and a VGG loss [Johnson et al.(2016)Johnson, Alahi, and Fei-Fei]:

\mathcal{L}_{mix}=\mathcal{L}_{1smooth}+0.04\cdot\mathcal{L}_{vgg}.   (5)
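A hedged sketch of this mixed loss using torchvision VGG-16 features as the perceptual term; the feature layer (relu3_3), the L1 distance on features, the omission of ImageNet input normalization, and the torchvision weights API are our assumptions rather than details given here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class MixedLoss(nn.Module):
    """Smooth L1 plus 0.04-weighted VGG perceptual loss, following Eq. 5."""
    def __init__(self, vgg_weight=0.04):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT)
        self.features = vgg.features[:16].eval()  # up to relu3_3; the layer choice is an assumption
        for p in self.features.parameters():
            p.requires_grad = False               # VGG is frozen, used only as a feature extractor
        self.vgg_weight = vgg_weight

    def forward(self, pred, target):
        l_smooth = F.smooth_l1_loss(pred, target)
        l_vgg = F.l1_loss(self.features(pred), self.features(target))
        return l_smooth + self.vgg_weight * l_vgg
```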

A.2 Exposure Correction Setting

The exposure correction ME dataset [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown] contains 24,330 images, divided into 17,675 training images, 750 validation images, and 5,905 test images. For the exposure correction task, the training images are cropped into 256×256 patches and augmented with random flips and rotations. We also adopt the Adam optimizer, as in the photo retouching task, with an initial learning rate of 2e-5 and a weight decay of 0.0001. The model is trained for a total of 20 epochs, also with a cosine annealing learning-rate schedule. The loss function used in the exposure correction task is the L1 loss.

A.3 White Balance Editing Setting

The White Balance Editing dataset [Afifi et al.(2019)Afifi, Price, Cohen, and Brown], Rendered WB, includes two sets: Set1 containing 62,535 images and Set2 containing 2,881 images. We use 12,000 images from Set1 for training. The training settings are the same as for the exposure correction tasks, except the number of training epochs is set to 100. We also adopt the L1 loss function for this task.

Appendix B Ablation Analyse

B.1 Curve Dimension Ablation

We conducted an ablation analysis on the dimensionality of the curves on the exposure correction ME [Afifi et al.(2021)Afifi, Derpanis, Ommer, and Brown] dataset, investigating the impact of the dimension of {curve_1, curve_2, curve_3} on the experimental results. The results are shown in Table IV, from which we observe that setting the dimension to 200 is a reasonable choice. Meanwhile, as shown in Fig. VII, setting a lower dimension easily leads to pixelation in the images.

B.2 Network Structure Ablation

In our default experiments, we set the channel dimension of the ConvNext [Liu et al.(2022)Liu, Mao, Wu, Feichtenhofer, Darrell, and Xie] blocks to 32. In the ablation study on the photo retouching dataset [Bychkovsky et al.(2011)Bychkovsky, Paris, Chan, and Durand], we tried other dimensions such as 16, 24, and 64, as shown in Table V. From the perspective of parameter count/FLOPs and overall performance, we found that setting it to 24 or 32 is a more reasonable choice. Furthermore, the original network uses three parallel branches to learn the three sets of coordinates and their corresponding curves. Here, we also tried putting the three sets of coordinates and curves into a single branch for learning; however, this approach led to a significant decrease in performance, as shown in Table V ("unified branch").

Figure VII: Ablation analysis of the effect of the curve dimension.
Table IV: Ablation analysis of the curve dimension.
dims 50 dims 100 dims 150 dims 200 dims 250
PSNR 17.67 20.45 20.88 21.22 21.24
SSIM 0.634 0.795 0.832 0.850 0.849
Table V: Ablation analysis of the ConvNext [Liu et al.(2022)Liu, Mao, Wu, Feichtenhofer, Darrell, and Xie] block convolution dimension.
size 16 size 24 size 32 size 64 unified branch (size 32)
PSNR 24.01 24.81 25.02 25.01 24.12
SSIM 0.872 0.897 0.902 0.895 0.865
parameters 16.7K 28.9K 39.7K 97.8K 25.4K
FLOPs 0.72 G 1.98 G 3.25 G 7.89 G 3.04 G