
R. Pourreza-Shahri, N. Kehtarnavaz
University of Texas at Dallas
email: [email protected]

Automatic Exposure Selection and Fusion
for High Dynamic Range Photography via Smartphones

Reza Pourreza-Shahri    Nasser Kehtarnavaz
(Appeared in Signal, Image and Video Processing, volume 11, pages 1437–1444, 2017)
Abstract

High Dynamic Range (HDR) photography involves fusing a bracket of images taken at different exposure settings in order to compensate for the low dynamic range of digital cameras such as the ones used in smartphones. In this paper, a method for automatically selecting the exposure settings of such images is introduced based on the camera characteristic function. In addition, a new fusion method is introduced based on an optimization formulation and weighted averaging. Both of these methods are implemented on a smartphone platform as an HDR app to demonstrate the practicality of the introduced methods. Comparison results with several existing methods are presented, indicating the effectiveness as well as the computational efficiency of the introduced solution.

Keywords:
Automatic exposure selection · High Dynamic Range photography on smartphones · exposure bracketing

1 Introduction

Many camera sensors, in particular the ones used in smartphones, have limited dynamic or contrast ranges. High Dynamic Range (HDR) techniques allow compensating for the low dynamic ranges of such sensors by capturing a number of images at different exposures, called an exposure bracket, and by fusing these images to form an HDR image Ref1, Ref2.
Although an exposure bracket may contain many images taken at different exposure settings, Barakat et al. Ref3 showed that in most cases an exposure bracket consisting of three images is adequate for capturing the full contrast of a scene. Exposure bracketing is thus normally done by taking three images: one at the auto-exposure (AE) setting, together with a brighter-looking image and a darker-looking image taken at $EV_{AE}\pm n$ exposure settings, where $n$ is a manually selected or user-specified exposure level. An automatic exposure selection method makes it possible to generate HDR images without requiring users to select the exposure level $n$. Automatic exposure selection methods in the literature can be grouped into two major categories: scene irradiance-based methods Ref3, Ref4 and ad-hoc methods Ref5-Ref7.
Fusion approaches in the literature can also be placed into one of two major categories: (i) irradiance mapping Ref8 followed by tonal mapping, and (ii) direct exposure fusion Ref9. Tonal mapping methods work either globally or locally. Global tonal mapping methods, e.g. Ref10, use a monotonically increasing curve to compress an irradiance map; such methods do not retain local image details. On the other hand, local tonal mapping methods retain local image details. Recent tonal mapping algorithms, e.g. Ref11-Ref13, decompose the luminance of an irradiance map image into a base layer and a detail layer, where the base layer consists of large-scale variations and the detail layer of small-scale variations. Direct exposure fusion methods, e.g. Ref15-Ref21, use local image characteristics to generate weight maps and then fuse the bracket images using such maps. It is worth noting that some exposure fusion works have addressed the generation of HDR images in the presence of changing scenes Ref22, Ref23.
In this paper, a new HDR photography method for scenes that remain static during the time different exposures are captured is introduced and implemented on smartphones. This method consists of two parts: exposure selection and exposure fusion. In the exposure selection part, the scene is analyzed to determine three exposure times automatically for under-exposed, normal-exposed, and over-exposed images. These images are then blended or fused in the exposure fusion part using an optimization framework and weighted averaging.
The rest of the paper is organized as follows. In Sections 2 and 3, the exposure selection and exposure fusion parts are described, respectively. Section 4 describes the smartphone implementation, or app, for generating HDR images. The results and comparisons with several existing methods are then presented in Section 5, and finally the paper is concluded in Section 6.

2 Automatic Exposure Selection Method

Exposure bracketing involves the use of three images: a normal-exposed image ($\textbf{I}_{NE}$), an over-exposed image ($\textbf{I}_{OE}$), and an under-exposed image ($\textbf{I}_{UE}$). The exposure selection method we introduced in Ref7 has been employed here for the smartphone implementation. The steps to find optimal exposure deviations about the normal exposure are shown in Fig. 1.

Figure 1: Automatic exposure selection steps

Initially, all the camera parameters are set to automatic and an auto-exposed image $\textbf{I}_{in}$ is captured. The luminance of $\textbf{I}_{in}$ is clustered into dark, normal, and bright regions. These regions represent under-exposed, well-exposed, and over-exposed parts of the image, respectively. The optimal exposure times are then obtained in order to make the dark, the normal, and the bright regions of $\textbf{I}_{in}$ better exposed in $\textbf{I}_{OE}$, $\textbf{I}_{NE}$, and $\textbf{I}_{UE}$, respectively. Unlike the conventional automatic exposure methods that consider a linear relationship between the exposure time and the brightness level, in this work the camera characteristic function is used to establish this relationship and the exposure times are found by mapping the gray-level means of the clustered regions to the optimal gray level (usually 128 for 256-level images). More details of the exposure selection process are provided in Ref7.
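As an illustration, the following Python sketch mimics this selection step under simplifying assumptions: k-means is used for the three-way clustering, and the camera characteristic function is assumed to be available as a tabulated inverse response curve (gray level versus relative exposure). All function and variable names here are ours; Ref7 describes the actual procedure.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def select_exposures(lum, t_ae, gray_levels, rel_exposure, target=128.0):
    """Sketch: cluster the auto-exposed luminance into dark/normal/bright regions
    and map each cluster mean to an exposure time via a tabulated inverse
    characteristic function (gray_levels -> rel_exposure)."""
    pixels = lum.astype(np.float64).ravel()
    # k-means with k = 3 (sub-sampling the pixels is advisable for large images)
    centroids, labels = kmeans2(pixels, 3, minit='++')
    centroids = np.asarray(centroids).ravel()
    order = np.argsort(centroids)                      # dark, normal, bright
    cluster_maps = [(labels == c).reshape(lum.shape) for c in order]
    means = centroids[order]
    # relative exposure needed to reach a given gray level (assumed monotonic table)
    inv_crf = lambda g: np.interp(g, gray_levels, rel_exposure)
    # dark cluster -> longer (over-exposed) time, bright cluster -> shorter time
    t_oe, t_ne, t_ue = [t_ae * inv_crf(target) / inv_crf(m) for m in means]
    return (t_ue, t_ne, t_oe), cluster_maps            # times and binary maps
```

The returned binary maps (ordered dark, normal, bright) play the role of the cluster maps reused by the fusion step of Section 3.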

3 Fusion Method to Generate HDR Images

This section describes the fusion of the images $\textbf{I}_{UE}$, $\textbf{I}_{NE}$, and $\textbf{I}_{OE}$ using the algorithm described in the next subsections. Let $\textbf{Y}_{UE}$, $\textbf{Cb}_{UE}$, $\textbf{Cr}_{UE}$, $\textbf{Y}_{NE}$, $\textbf{Cb}_{NE}$, $\textbf{Cr}_{NE}$, $\textbf{Y}_{OE}$, $\textbf{Cb}_{OE}$, $\textbf{Cr}_{OE}$, and $\textbf{Y}$, $\textbf{Cb}$, $\textbf{Cr}$ represent the Y, Cb, and Cr components of the images $\textbf{I}_{UE}$, $\textbf{I}_{NE}$, $\textbf{I}_{OE}$, and the fused or output image, respectively. Fusion is conducted for the luminance and chrominance components separately, as described in the following two subsections.

3.1 Luminance fusion

Here, gradient information is used to guide the luminance fusion. The rationale behind using gradient information is that a well-exposed image provides a better representation of edges, that is, it provides higher gradient values compared to a poorly-exposed image Ref18. The gradients are extracted from the well-exposed regions of $\textbf{Y}_{UE}$, $\textbf{Y}_{NE}$, and $\textbf{Y}_{OE}$ and merged into gradient maps along the horizontal ($\mathbf{\Lambda}_{h}$) and vertical ($\mathbf{\Lambda}_{v}$) directions. An initial estimate ($\textbf{X}$) of the fused luminance ($\textbf{Y}$) is also obtained by averaging $\textbf{Y}_{UE}$, $\textbf{Y}_{NE}$, and $\textbf{Y}_{OE}$. Using the extracted gradients and the initial estimate, the fused luminance ($\textbf{Y}$) is obtained by solving the optimization problem stated below.

$$\widehat{\textbf{Y}}=\textrm{argmin}_{\textbf{Y}}\big\{\|\textbf{Y}-\textbf{X}\|_{F}^{2}+\lambda\|\nabla^{h}\textbf{Y}-\mathbf{\Lambda}_{h}\|_{F}^{2}+\lambda\|\nabla^{v}\textbf{Y}-\mathbf{\Lambda}_{v}\|_{F}^{2}\big\}$$ (1)

where $\|\cdot\|_{F}$ denotes the Frobenius matrix norm, $\nabla^{h}$ and $\nabla^{v}$ indicate the horizontal and vertical gradient operators, and $\lambda$ is a weighting parameter. The solution to this optimization problem is given by (see Appendix A):

$$\widehat{\mathbf{\Psi}}_{u,v}=\frac{\mathcal{F}\{\textbf{X}\}_{u,v}+\lambda\mathcal{F}\{\nabla^{h}\}_{u,v}^{\dagger}\mathcal{F}\{\mathbf{\Lambda}_{h}\}_{u,v}+\lambda\mathcal{F}\{\nabla^{v}\}_{u,v}^{\dagger}\mathcal{F}\{\mathbf{\Lambda}_{v}\}_{u,v}}{1+\lambda\|\mathcal{F}\{\nabla^{h}\}_{u,v}\|^{2}+\lambda\|\mathcal{F}\{\nabla^{v}\}_{u,v}\|^{2}},\qquad\widehat{\textbf{Y}}=\mathcal{F}^{-1}\big\{\widehat{\mathbf{\Psi}}\big\}$$ (2)

where $u$ and $v$ denote the 2-D frequency indices, $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the forward and inverse Fourier transforms, respectively, and $\dagger$ denotes the complex conjugate.
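For concreteness, a minimal numpy sketch of evaluating the closed-form solution (2) is given below, assuming $\textbf{X}$, $\mathbf{\Lambda}_{h}$, and $\mathbf{\Lambda}_{v}$ have already been constructed as described in the remainder of this subsection. The $[-1,1]$ gradient kernels follow the choice reported in Section 5.1; the function names are ours.

```python
import numpy as np

def kernel_otf(kernel, shape):
    """DFT of a small kernel zero-padded to the image size (kernel origin at (0, 0))."""
    pad = np.zeros(shape)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    return np.fft.fft2(pad)

def fuse_luminance(X, Lh, Lv, lam):
    """Closed-form solution (2): X is the average luminance, Lh/Lv are the merged
    horizontal/vertical gradient maps, lam is the weighting parameter."""
    Dh = kernel_otf(np.array([[-1.0, 1.0]]), X.shape)     # horizontal operator
    Dv = kernel_otf(np.array([[-1.0], [1.0]]), X.shape)   # vertical operator
    num = (np.fft.fft2(X)
           + lam * np.conj(Dh) * np.fft.fft2(Lh)
           + lam * np.conj(Dv) * np.fft.fft2(Lv))
    den = 1.0 + lam * np.abs(Dh) ** 2 + lam * np.abs(Dv) ** 2
    return np.real(np.fft.ifft2(num / den))               # inverse transform of Psi
```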
For luminance fusion, the images $\textbf{Y}_{UE}$, $\textbf{Y}_{NE}$, and $\textbf{Y}_{OE}$ are available and the output image $\textbf{Y}$ is estimated. As stated earlier, $\textbf{X}$, $\mathbf{\Lambda}_{h}$, and $\mathbf{\Lambda}_{v}$ are obtained from the available image data and used in the solution provided in (2) to estimate $\textbf{Y}$. In this work, the initial estimate $\textbf{X}$ is chosen to be the average of the images captured at different exposure settings because the average luminance conveys the brightness of a scene, as illustrated in Fig. 2. In other words, $\textbf{X}$ is set to be

$$\textbf{X}=(\textbf{Y}_{OE}+\textbf{Y}_{NE}+\textbf{Y}_{UE})/3$$ (3)
Figure 2: Top row from left to right: under-exposed, normal-exposed, and over-exposed images. Bottom row: average image.

The horizontal ($\mathbf{\Lambda}_{h}$) and vertical ($\mathbf{\Lambda}_{v}$) gradients are extracted from the luminance components using the clustering outcome obtained in the exposure selection step. In other words, the gradients in the dark, normal, and bright regions are obtained from the over-exposed, normal-exposed, and under-exposed images, respectively. For this purpose, three binary cluster maps ($\textbf{R}$) are generated which correspond to the three clusters obtained from the exposure selection step. For a pixel $k$ of $\textbf{I}_{in}$, if its gray level belongs to cluster 1, 2, or 3 (corresponding to the dark, normal, and bright areas, respectively), then the corresponding pixel in $\textbf{R}_{UE}$, $\textbf{R}_{NE}$, or $\textbf{R}_{OE}$ is set to 1, where $\textbf{R}_{UE}$, $\textbf{R}_{NE}$, and $\textbf{R}_{OE}$ represent the binary cluster maps for the under-exposed, normal-exposed, and over-exposed clusters, respectively. Hence, for a pixel $k$

$$\textbf{R}_{UE}^{k}=\left\{\begin{array}{ll}1, & C_{\textbf{I}_{in}^{k}}=1\\ 0, & \textrm{otherwise}\end{array}\right.\qquad \textbf{R}_{NE}^{k}=\left\{\begin{array}{ll}1, & C_{\textbf{I}_{in}^{k}}=2\\ 0, & \textrm{otherwise}\end{array}\right.\qquad \textbf{R}_{OE}^{k}=\left\{\begin{array}{ll}1, & C_{\textbf{I}_{in}^{k}}=3\\ 0, & \textrm{otherwise}\end{array}\right.$$ (4)

where $C_{\textbf{I}_{in}^{k}}$ denotes the cluster label of the $k^{th}$ pixel of $\textbf{I}_{in}$. Then, $\mathbf{\Lambda}_{h}$ and $\mathbf{\Lambda}_{v}$ for the pixel $k$ are computed as follows:

$$\begin{array}{l}\mathbf{\Lambda}_{h}^{k}=\textbf{R}_{UE}^{k}\big(\nabla^{h}\textbf{Y}_{OE}\big)^{k}+\textbf{R}_{NE}^{k}\big(\nabla^{h}\textbf{Y}_{NE}\big)^{k}+\textbf{R}_{OE}^{k}\big(\nabla^{h}\textbf{Y}_{UE}\big)^{k}\\ \mathbf{\Lambda}_{v}^{k}=\textbf{R}_{UE}^{k}\big(\nabla^{v}\textbf{Y}_{OE}\big)^{k}+\textbf{R}_{NE}^{k}\big(\nabla^{v}\textbf{Y}_{NE}\big)^{k}+\textbf{R}_{OE}^{k}\big(\nabla^{v}\textbf{Y}_{UE}\big)^{k}\end{array}$$ (5)

The method described above causes sharp transitions at cluster borders since the image gradients get multiplied by binary cluster maps. To avoid such sharp transitions, a low-pass filter is applied to every binary map; specifically, the guided filter Ref13 is applied to the binary masks to smooth the transitions at cluster borders. In other words, the smooth cluster maps ($\mathbf{\Omega}$) for the three clusters are obtained as follows:

$$\begin{array}{l}\mathbf{\Omega}_{UE}=\textrm{guided\_filter}(\textbf{X},\textbf{R}_{UE},r,\varepsilon)\\ \mathbf{\Omega}_{NE}=\textrm{guided\_filter}(\textbf{X},\textbf{R}_{NE},r,\varepsilon)\\ \mathbf{\Omega}_{OE}=\textrm{guided\_filter}(\textbf{X},\textbf{R}_{OE},r,\varepsilon)\end{array}$$ (6)

where $\textbf{X}$ acts as the guide image and $r$ and $\varepsilon$ denote the filter parameters. The smoothed cluster maps are then normalized. The normalized weights for a pixel $k$ are computed as follows:

$$\begin{array}{l}\omega_{UE}^{k}=\mathbf{\Omega}_{UE}^{k}\big/\big(\mathbf{\Omega}_{UE}^{k}+\mathbf{\Omega}_{NE}^{k}+\mathbf{\Omega}_{OE}^{k}\big)\\ \omega_{NE}^{k}=\mathbf{\Omega}_{NE}^{k}\big/\big(\mathbf{\Omega}_{UE}^{k}+\mathbf{\Omega}_{NE}^{k}+\mathbf{\Omega}_{OE}^{k}\big)\\ \omega_{OE}^{k}=\mathbf{\Omega}_{OE}^{k}\big/\big(\mathbf{\Omega}_{UE}^{k}+\mathbf{\Omega}_{NE}^{k}+\mathbf{\Omega}_{OE}^{k}\big)\end{array}$$ (7)

Once the normalized weights are computed, $\mathbf{\Lambda}_{h}$ and $\mathbf{\Lambda}_{v}$ are obtained as follows:

$$\begin{array}{l}\mathbf{\Lambda}_{h}^{k}=\omega_{UE}^{k}\big(\nabla^{h}\textbf{Y}_{OE}\big)^{k}+\omega_{NE}^{k}\big(\nabla^{h}\textbf{Y}_{NE}\big)^{k}+\omega_{OE}^{k}\big(\nabla^{h}\textbf{Y}_{UE}\big)^{k}\\ \mathbf{\Lambda}_{v}^{k}=\omega_{UE}^{k}\big(\nabla^{v}\textbf{Y}_{OE}\big)^{k}+\omega_{NE}^{k}\big(\nabla^{v}\textbf{Y}_{NE}\big)^{k}+\omega_{OE}^{k}\big(\nabla^{v}\textbf{Y}_{UE}\big)^{k}\end{array}$$ (8)

$\textbf{X}$, $\mathbf{\Lambda}_{h}$, and $\mathbf{\Lambda}_{v}$ computed via (3) and (8) are then inserted into (2) to obtain $\textbf{Y}$.
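The following sketch illustrates steps (4)-(8) under our assumptions: a compact box-filter implementation of the guided filter Ref13 is used as a stand-in (an optimized or library implementation could be substituted), and the circular gradient convention matches the zero-padded kernels of the luminance-fusion sketch above.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, r, eps):
    """Compact box-filter guided filter (He et al.), used here as a stand-in."""
    box = lambda img: uniform_filter(img, size=2 * r + 1)
    mean_I, mean_p = box(guide), box(src)
    cov_Ip = box(guide * src) - mean_I * mean_p
    var_I = box(guide * guide) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box(a) * guide + box(b)

def merged_gradients(X, R_ue, R_ne, R_oe, Y_ue, Y_ne, Y_oe, r=1, eps=1.0):
    """Eqs (4)-(8): smooth the binary cluster maps with the guided filter,
    normalize them, and blend the exposure gradients into Lambda_h / Lambda_v."""
    # circular gradients consistent with the zero-padded [-1, 1] kernels
    gh = lambda Y: np.roll(Y, 1, axis=1) - Y
    gv = lambda Y: np.roll(Y, 1, axis=0) - Y
    omegas = [guided_filter(X, R.astype(np.float64), r, eps)
              for R in (R_ue, R_ne, R_oe)]                    # eq (6)
    total = np.maximum(sum(omegas), 1e-12)
    w_ue, w_ne, w_oe = (o / total for o in omegas)            # eq (7)
    # dark regions (w_ue) take gradients from the over-exposed image, eq (8)
    Lh = w_ue * gh(Y_oe) + w_ne * gh(Y_ne) + w_oe * gh(Y_ue)
    Lv = w_ue * gv(Y_oe) + w_ne * gv(Y_ne) + w_oe * gv(Y_ue)
    return Lh, Lv
```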

3.2 Chrominance fusion

Unlike luminance fusion, chrominance fusion is achieved through weighted averaging. The saturation value of a pixel is used as the weight to fuse its chrominance components. The saturation $s_{k}$ for a pixel $k$ with red, green, and blue components $r_{k}$, $g_{k}$, and $b_{k}$ is computed as

$$\begin{array}{l}\rho_{k}=(r_{k}+g_{k}+b_{k})/3\\ s_{k}=\sqrt{(r_{k}-\rho_{k})^{2}+(g_{k}-\rho_{k})^{2}+(b_{k}-\rho_{k})^{2}}\end{array}$$ (9)

Let $s_{UE}^{k}$, $s_{NE}^{k}$, and $s_{OE}^{k}$ represent the saturation of the pixel $k$ in $\textbf{I}_{UE}$, $\textbf{I}_{NE}$, and $\textbf{I}_{OE}$, respectively. The normalized weights are obtained as follows:

$$\begin{array}{l}\varpi_{UE}^{k}=s_{UE}^{k}\big/\big(s_{UE}^{k}+s_{NE}^{k}+s_{OE}^{k}\big)\\ \varpi_{NE}^{k}=s_{NE}^{k}\big/\big(s_{UE}^{k}+s_{NE}^{k}+s_{OE}^{k}\big)\\ \varpi_{OE}^{k}=s_{OE}^{k}\big/\big(s_{UE}^{k}+s_{NE}^{k}+s_{OE}^{k}\big)\end{array}$$ (10)

The chrominance components of the pixel $k$ are then fused according to:

$$\begin{array}{l}\textbf{Cb}^{k}=\varpi_{UE}^{k}\textbf{Cb}_{UE}^{k}+\varpi_{NE}^{k}\textbf{Cb}_{NE}^{k}+\varpi_{OE}^{k}\textbf{Cb}_{OE}^{k}\\ \textbf{Cr}^{k}=\varpi_{UE}^{k}\textbf{Cr}_{UE}^{k}+\varpi_{NE}^{k}\textbf{Cr}_{NE}^{k}+\varpi_{OE}^{k}\textbf{Cr}_{OE}^{k}\end{array}$$ (11)
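A direct numpy sketch of the saturation-weighted chrominance fusion in (9)-(11) is given below; stacking the three exposures into lists and the variable names are our choices.

```python
import numpy as np

def fuse_chrominance(rgb_stack, cb_stack, cr_stack):
    """Eqs (9)-(11): saturation-weighted averaging of the Cb/Cr planes.
    rgb_stack: list of HxWx3 float arrays; cb/cr_stack: matching Cb/Cr planes."""
    sats = []
    for rgb in rgb_stack:
        rho = rgb.mean(axis=2, keepdims=True)                   # eq (9)
        sats.append(np.sqrt(((rgb - rho) ** 2).sum(axis=2)))
    total = np.maximum(sum(sats), 1e-12)
    weights = [s / total for s in sats]                         # eq (10)
    Cb = sum(w * cb for w, cb in zip(weights, cb_stack))        # eq (11)
    Cr = sum(w * cr for w, cr in zip(weights, cr_stack))
    return Cb, Cr
```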

4 Smartphone Implementation

A smartphone app is developed in this work to demonstrate the practicality of the introduced automatic exposure selection and fusion methods. The app is developed for smartphones running the Android operating system using Android Studio and the Android Native Development Kit (NDK), which allows C/C++ code to be incorporated into an Android Studio project. The entire processing is divided into two parts: a capturing part, implemented in Java, and a processing part, implemented in C. More details about the two parts are provided in the following two subsections.

4.1 Capturing

The capturing part involves taking $\textbf{I}_{in}$ followed by taking $\textbf{I}_{OE}$, $\textbf{I}_{NE}$, and $\textbf{I}_{UE}$. A button named Capture is provided in the app to initiate image capture. Once the Capture button is pressed by the user, the capturing process starts as described in Algorithm 1. The output of Algorithm 1 consists of three images in the YCbCr format, corresponding to the under-exposed, normal-exposed, and over-exposed conditions, as well as the clustering outcome in the form of three binary maps.

Algorithm 1: Capturing part
Input: None.
Output: Three YCbCr images and three binary cluster maps
1. Perform an initial capture.
2. Cluster the luminance component of this initial capture, record the three cluster means, and generate three binary maps.
3. Initialize three CaptureRequests and set their exposure times.
4. Initialize a burst CaptureSession with the three defined CaptureRequests.
5. Launch the CaptureSession and wait for the camera to acquire the three images.
6. Pass the three images together with the three binary maps through JNI (Java Native Interface) to the C fusion code.

4.2 Fusion part

When the three images are made available from the capturing part, the processing outlined in Section 3, which is coded in C, is performed and an HDR image is generated as described in Algorithm 2. This image is then sent back to the Java environment to be saved in memory. An illustrative Python sketch tying these steps together is given after Algorithm 2.

Algorithm 2: Fusion part
Input: Three images in YCbCr format and three binary cluster maps.
Output: HDR image
1. Fuse the luminance component
    a. Obtain $\textbf{X}$ via (3)
    b. Calculate the horizontal and vertical gradients of $\textbf{Y}_{UE}$, $\textbf{Y}_{NE}$, and $\textbf{Y}_{OE}$
    c. Apply the guided filter to get smooth cluster maps via (6) and normalize the weights via (7)
    d. Obtain the merged gradients via (8)
    e. Estimate $\textbf{Y}$ via (2)
2. Fuse the chrominance components
    a. Obtain the saturation of each pixel of $\textbf{I}_{OE}$, $\textbf{I}_{NE}$, and $\textbf{I}_{UE}$ via (9) and calculate the normalized chrominance weights via (10)
    b. Calculate $\textbf{Cb}$ and $\textbf{Cr}$ via (11)
3. Combine $\textbf{Y}$, $\textbf{Cb}$, and $\textbf{Cr}$ to form the output image in the YCbCr format and then convert it to the RGB format
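The following illustrative driver strings the earlier Python sketches together in the order of Algorithm 2; the dictionary-based argument layout and the BT.601 YCbCr-to-RGB conversion are our assumptions rather than details taken from the app.

```python
import numpy as np

def fuse_hdr(rgb, ycbcr, cluster_maps, lam):
    """Illustrative driver for Algorithm 2, built from the sketches in Section 3.
    rgb and ycbcr: dicts keyed by 'ue', 'ne', 'oe' holding HxWx3 float arrays;
    cluster_maps: the (R_UE, R_NE, R_OE) triple; lam ~ SIZE/4 per Section 5.1."""
    keys = ('ue', 'ne', 'oe')
    Y = {k: ycbcr[k][..., 0] for k in keys}
    X = (Y['ue'] + Y['ne'] + Y['oe']) / 3.0                      # eq (3)
    Lh, Lv = merged_gradients(X, *cluster_maps,
                              Y['ue'], Y['ne'], Y['oe'])         # eqs (4)-(8)
    Y_out = fuse_luminance(X, Lh, Lv, lam)                       # eq (2)
    Cb, Cr = fuse_chrominance([rgb[k] for k in keys],
                              [ycbcr[k][..., 1] for k in keys],
                              [ycbcr[k][..., 2] for k in keys])  # eqs (9)-(11)
    # assumed BT.601 conversion with Cb/Cr centered at 128
    R = Y_out + 1.402 * (Cr - 128.0)
    G = Y_out - 0.344136 * (Cb - 128.0) - 0.714136 * (Cr - 128.0)
    B = Y_out + 1.772 * (Cb - 128.0)
    return np.clip(np.stack([R, G, B], axis=-1), 0, 255).astype(np.uint8)
```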

5 Results and Discussion

This section provides the results of the experiments conducted to evaluate the introduced automatic exposure selection and fusion methods for generating HDR images.
An Android app was designed to capture 34 images of a scene at 8-megapixel resolution with fixed ISO, fixed lens position, and auto-white-balance enabled, across different exposure times. The 34 exposure times, varying from 0.5 sec to 1/4000 sec in steps of 1/3 of a standard exposure value, were used to generate a ground-truth irradiance map image ($\textbf{HDR}_{ref}$) of the scene using the method introduced in Ref8. The above process was repeated for 10 different indoor and outdoor scenes. The complete dataset is made available for public use at http://www.utdallas.edu/~kehtar/ImageEnhancement/HDR/Scenes.

The first set of experiments, described next, addressed the selection of the optimal parameters, while the second and third sets addressed the comparative performance and the results obtained on a modern smartphone, respectively.

5.1 Parameter selection

The parameters of the exposure fusion part included the regularization parameter $\lambda$ of the optimization in (2) as well as the guided filter parameters, i.e. the filter radius $r$ and the regularization parameter $\varepsilon$ in (6). For the guided filter, since maximal smoothing within short distances was desired, the minimal radius $r=1$ together with a large regularization value of $\varepsilon=1$ were used. The kernels $[-1,1]$ and $[-1,1]^{T}$ were used as the horizontal and vertical gradient operators, respectively.
The experiments revealed that the optimal $\lambda$ was highly dependent on the image size. To address this issue, resized copies of the images were generated by scaling the dimensions by factors from 0.1 to 1 in steps of 0.1. The fusion algorithm was then applied to all of the images, both original and resized. $\lambda$ was varied according to (12) and the widely used Tone-Mapped Quality Index (TMQI) Ref24 was computed for every $\lambda$,

$$\lambda=SIZE/2^{s_{\lambda}},\quad s_{\lambda}=-2,-1,\ldots,5$$ (12)

where $SIZE$ denotes the diameter of the input image and $s_{\lambda}$ is a scaling factor. The average TMQI over all the images for different scale values is shown in Fig. 3. This figure indicates that the scale value of 2 generated the best TMQI. Hence, $\lambda$ was set to $\lambda=SIZE/4$ for the subsequent experiments.

Figure 3: TMQI vs. scale $s_{\lambda}$

Furthermore, in order to gain computational efficiency, another experiment was conducted in which the guided filtering operation was applied to down-sampled binary maps (instead of the original size) and TMQI was recomputed. The binary cluster maps were scaled to a smaller size; after applying the guided filter, the filtering outcome was rescaled to the original size. All the other steps of the exposure fusion were kept the same. The dimensions of the binary cluster maps were lowered as follows:

$$SCALE=1/2^{s_{g}},\quad s_{g}=0,1,\ldots,4$$ (13)

where $SCALE$ denotes the down-sampling ratio and $s_{g}$ represents a scaling factor. The average TMQI over all the images for different $s_{g}$ values is shown in Fig. 4. As can be seen from this figure, TMQI dropped monotonically as $s_{g}$ was increased. However, since the decrease in TMQI was larger as $s_{g}$ varied from 3 to 4 compared to the previous decreases, $s_{g}=3$ was chosen in the smartphone app. In other words, for computational efficiency on a smartphone platform, the binary cluster maps were down-sampled by a factor of 8 before applying the guided filter.

Figure 4: TMQI vs. scale $s_{g}$
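A sketch of this down-sample/filter/up-sample shortcut is shown below, reusing the guided_filter sketch from Section 3.1; bilinear resampling via scipy.ndimage.zoom and factor 8 (corresponding to $s_{g}=3$) are our assumed choices.

```python
import numpy as np
from scipy.ndimage import zoom

def smooth_cluster_map(X, R, r=1, eps=1.0, factor=8):
    """Guided filtering on down-sampled inputs, then rescaling back to the
    original size, as described in Section 5.1 (factor 8 <=> s_g = 3)."""
    Xs = zoom(X, 1.0 / factor, order=1)
    Rs = zoom(R.astype(np.float64), 1.0 / factor, order=1)
    Os = guided_filter(Xs, Rs, r, eps)
    # per-axis factors so the up-sampled map matches the original size exactly
    fy, fx = X.shape[0] / Os.shape[0], X.shape[1] / Os.shape[1]
    return zoom(Os, (fy, fx), order=1)
```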

5.2 Comparison

This section provides objective and subjective comparisons with several representative existing methods. The method introduced in this paper, named OB, together with iCam Ref10 , Weighted Least Squares (WLS) Ref11 , Guided Filtering (GF) Ref13 , and Exposure Fusion (EF) Ref15 methods were applied to the image sequences. Sample image sequences as well as the corresponding fusion results are displayed in Fig. 5. As can be seen from Figs. 5 (c) and (d), the fusion results using the EF and OB methods were found to be better compared to the other methods. However, the light region in Fig. 5 (c) appeared over-exposed and the wall region above the light in Fig. 5 (d) lost some color information when using the EF method whereas these effects did not occur when using the OB method.

Figure 5: Sample results, (a) sequence 1, (b) sequence 2, from left to right: under-exposed, normal-exposed, and over-exposed, (c) HDR image for sequence 1, (d) HDR image for sequence 2, from left to right: iCam, WLS, GF, EF, OB.

The TMQI measure averaged over all 10 scenes is provided in Fig. 6. As can be seen from this figure, the performance of the EF and OB methods was found to be comparable and higher than that of the other methods.

Figure 6: TMQI for the scenes together with the average TMQI

The processing times of the aforementioned methods were measured by applying them to the images in the dataset resized to 1 megapixel. These times were found to be 2.67, 22.61, 1.07, 2.37, and 0.99 seconds for the iCam, WLS, GF, EF, and OB methods, respectively, when using a 2.66 GHz machine with all the methods coded in Matlab. It is also worth mentioning that the processing times for the iCam, WLS, and GF methods, which all fall under the tonal mapping category of HDR, only reflect the time associated with the tonal mapping and do not include the time associated with the scene irradiance map generation. Since the TMQI measure for the iCam, WLS, and GF methods was found to be lower than those for the EF and OB methods, the processing time comparison below is limited to the EF and OB methods.
The computational complexities of the EF and OB methods are $C_{1}O\big(N^{2}\log(N)\big)$ and $C_{2}O\big(N^{2}\log(N)\big)$, respectively, where $C_{1}$ and $C_{2}$ denote constants and $N$ denotes the image size (height or width). Although the two methods have the same big $O$, the constants $C_{1}$ and $C_{2}$ have a noticeable impact on the processing time as $N$ increases. Fig. 7 shows the average processing times of the EF and OB methods over the resized versions of the captured scenes in terms of the image size expressed in pixels. The processing times were measured for $N$ = 64, 128, 256, 512, 1024, and 2048. In addition, $C_{1}N^{2}\log(N)$ and $C_{2}N^{2}\log(N)$ curves were fitted to the processing times, generating the values of 2.25e-7 and 8.47e-8 for the constants $C_{1}$ and $C_{2}$, respectively. As can be seen from this figure, the two dashed lines fit the processing times closely. As a result, the processing time of the OB method is on average $C_{2}/C_{1}=0.37$ that of the EF method, that is, the OB method is about 60% faster than the EF method. More computational gain can be achieved by noting that the Fourier transforms of $\nabla^{h}$ and $\nabla^{v}$ in (2) are independent of the input image and can be computed offline and stored in memory.

Figure 7: Running time comparison
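For illustration, fitting the constant $C$ in a $CN^{2}\log(N)$ model reduces to a one-parameter least-squares problem, as sketched below with placeholder timing values (the measured times appear only in Fig. 7, so the numbers here are hypothetical).

```python
import numpy as np

# Hypothetical placeholder timings in seconds for N = 64 ... 2048, for
# illustration only; they are not the paper's measurements.
N    = np.array([64, 128, 256, 512, 1024, 2048], dtype=float)
t_ef = np.array([0.02, 0.08, 0.35, 1.5, 6.5, 28.0])   # placeholder EF times
t_ob = np.array([0.01, 0.03, 0.13, 0.6, 2.4, 10.0])   # placeholder OB times

f  = N ** 2 * np.log(N)                 # model: t ~ C * N^2 * log(N)
C1 = np.sum(t_ef * f) / np.sum(f * f)   # least-squares fit of C for EF
C2 = np.sum(t_ob * f) / np.sum(f * f)   # least-squares fit of C for OB
print(C2 / C1)                          # ratio of the fitted constants
```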

Although the introduced solution is intended for scenes that do not change while the different exposure images are captured, the exposure selection and capturing parts are designed and implemented in such a way that there is minimal delay between the three exposure captures. Thus, in practice, one experiences little or no misalignment between the three exposure images due to possible changes that may occur in a scene. Furthermore, the exposure fusion part is tolerant to slight misalignments since it extracts the fusion information locally from the auto-exposed image rather than from the other exposure images. Examples of HDR images generated by the EF and OB methods in the presence of misalignments are shown in Fig. 8. As can be seen from this figure, the ghost effect is noticeable around the cables and the edges of the monitor and the mouse in the image generated by the EF method, while no ghost artifacts are present in the image generated by the OB method. It is worth noting that computationally efficient image registration techniques, e.g. Ref4, Ref25, can be used to align the different exposure images before performing fusion in situations where severe misalignments may occur.

Figure 8: Misalignment effect; left: EF method, right: OB method.

5.3 Smartphone results

In this subsection, the outcome of the actual smartphone implementation of the OB method is reported. The HDR app can be downloaded from http://www.utdallas.edu/~kehtar/HDRApp.apk and run on an Android smartphone. A sample exposure selection and fusion outcome obtained by running the developed app on a smartphone is shown in Fig. 9 together with the corresponding under-exposed, normal-exposed, and over-exposed images.

Figure 9: Under-exposed, normal-exposed, and over-exposed images together with their fusion outcome.

The average processing time for images of size $768\times 1024$ pixels on a modern smartphone was found to be about 1.8 s. Note that the intention here has been to show the practicality of running the introduced solution on smartphones. Lower processing times can be achieved, since the app code running on the smartphone does not utilize vector processing on the NEON coprocessor present in nearly all modern smartphones.

6 Conclusion

A method for automatic exposure selection and a method for fusion of exposure bracket images were introduced in this paper. The exposure selection was done by analyzing the brightness of a scene via clustering and the camera characteristic function. For exposure fusion, the luminance and chrominance of three bracket images were blended via an optimization formulation and weighted averaging, respectively. The smartphone implementation and comparisons with the existing methods have shown the practicality and performance of the introduced solution.

Appendix A Optimization Solution

This appendix provides the solution of the optimization problem with only one gradient term. The derivation for two terms is straightforward and not included here to save space. The optimization formulation for one gradient term is given by

$$\widehat{\textbf{Y}}=\textrm{argmin}_{\textbf{Y}}\big\{\|\textbf{Y}-\textbf{X}\|_{F}^{2}+\lambda\|\nabla\textbf{Y}-\mathbf{\Lambda}\|_{F}^{2}\big\}$$ (14)

where $\mathbf{\Lambda}$ represents the target gradient map and $\nabla$ indicates the gradient operator. In vector form, Equation (14) can be written as:

$$\widehat{\textbf{y}}=\textrm{argmin}_{\textbf{y}}\big\{\|\textbf{y}-\textbf{x}\|^{2}+\lambda\|\textbf{C}\textbf{y}-\mathbf{\delta}\|^{2}\big\}$$ (15)

where $\textbf{y}$, $\textbf{x}$, and $\mathbf{\delta}$ represent the column vector versions of $\textbf{Y}$, $\textbf{X}$, and $\mathbf{\Lambda}$, respectively, and $\textbf{C}$ denotes the block-circulant matrix representation of $\nabla$. By taking the derivative with respect to $\textbf{y}$ and setting it to zero, the following solution is obtained

$$\widehat{\textbf{y}}=\big(\textbf{I}+\lambda\textbf{C}^{T}\textbf{C}\big)^{-1}\big(\textbf{x}+\lambda\textbf{C}^{T}\mathbf{\delta}\big)$$ (16)

where I denotes the identity matrix. Since C is a block-circulant matrix, it can be represented in diagonal form as:

$$\textbf{C}=\textbf{W}\textbf{E}\textbf{W}^{-1}$$ (17)

where E is the diagonal version of C and W is the DFT (Discrete Fourier Transform) operator. The diagonal values of E correspond to the DFT coefficients of $\nabla$ ($\nabla$ should be zero-padded properly before applying the DFT). Hence, (16) can be rewritten as:

$$\widehat{\textbf{y}}=\big(\textbf{I}+\lambda\textbf{W}\textbf{E}^{\dagger}\textbf{E}\textbf{W}^{-1}\big)^{-1}\big(\textbf{x}+\lambda\textbf{W}\textbf{E}^{\dagger}\textbf{W}^{-1}\mathbf{\delta}\big)$$ (18)

By multiplying both sides of (18) by $\textbf{W}^{-1}$, the following equation results

$$\textbf{W}^{-1}\widehat{\textbf{y}}=\big(\textbf{I}+\lambda\textbf{E}^{\dagger}\textbf{E}\big)^{-1}\big(\textbf{W}^{-1}\textbf{x}+\lambda\textbf{E}^{\dagger}\textbf{W}^{-1}\mathbf{\delta}\big)$$ (19)

Since $\textbf{W}^{-1}\widehat{\textbf{y}}$, $\textbf{W}^{-1}\textbf{x}$, and $\textbf{W}^{-1}\mathbf{\delta}$ correspond to the DFTs of $\widehat{\textbf{y}}$, $\textbf{x}$, and $\mathbf{\delta}$, respectively, (19) can be stated as follows:

$$\mathcal{F}\{\widehat{\textbf{Y}}\}_{u,v}=\frac{\mathcal{F}\{\textbf{X}\}_{u,v}+\lambda\mathcal{F}\{\nabla\}_{u,v}^{\dagger}\mathcal{F}\{\mathbf{\Lambda}\}_{u,v}}{1+\lambda\|\mathcal{F}\{\nabla\}_{u,v}\|^{2}}$$ (20)
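As a quick numerical check of this derivation, the sketch below compares the direct solution (16) with the frequency-domain form (20) on a small image, using a circular horizontal-difference operator; the explicit construction of C and the kernel convention are our assumptions, consistent with the sketches in Section 3.1.

```python
import numpy as np

# Check that the direct solution (16) and the frequency-domain form (20)
# coincide for a single gradient term (circular horizontal difference).
rng = np.random.default_rng(0)
nr, nc, lam = 4, 5, 0.7
X = rng.random((nr, nc))
Lam = rng.random((nr, nc))                      # target gradient map (random here)

grad_h = lambda Y: np.roll(Y, 1, axis=1) - Y    # circular convolution with [-1, 1]

# Build the block-circulant matrix C explicitly by applying grad_h to basis images.
C = np.column_stack([grad_h(e.reshape(nr, nc)).ravel() for e in np.eye(nr * nc)])
y_direct = np.linalg.solve(np.eye(nr * nc) + lam * C.T @ C,
                           X.ravel() + lam * C.T @ Lam.ravel()).reshape(nr, nc)

# Frequency-domain solution (20): DFT of the zero-padded [-1, 1] kernel.
pad = np.zeros((nr, nc))
pad[0, 0], pad[0, 1] = -1.0, 1.0
D = np.fft.fft2(pad)
y_fft = np.real(np.fft.ifft2((np.fft.fft2(X) + lam * np.conj(D) * np.fft.fft2(Lam))
                             / (1.0 + lam * np.abs(D) ** 2)))

print(np.allclose(y_direct, y_fft))             # expected: True
```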

References

  • (1) Mann, S., Picard, R.: Being ‘undigital’ with digital cameras: extending dynamic range by combining differently exposed pictures. In: IS&T 48th Annual Conference, USA, pp. 422-428 (1995).
  • (2) Reinhard, E., Heidrich, W., Debevec, P., Pattanaik, S., Ward, G., Myszkowski, K.: High Dynamic Range Imaging, 2nd edn. Morgan Kaufmann Publishers, San Francisco (2010).
  • (3) Barakat, N., Hone, A.N., Darcie, T.E.: Minimal-bracketing sets for high-dynamic-range image capture. IEEE Trans. Image Process. 17(10), 1864-1875 (2008).
  • (4) Gupta, M., Iso, D., Nayar, S.K.: Fibonacci exposure bracketing for high dynamic range imaging. In: IEEE Intl. Conf. Computer Vision (ICCV), Australia, pp. 1473-1480 (2013).
  • (5) Huang, K., Chiang, J.: Intelligent exposure determination for high quality HDR image generation. In: IEEE Intl. Conf. Image Process. (ICIP), Australia, pp. 3201-3205 (2013).
  • (6) Pourreza-Shahri, R., Kehtarnavaz, N.: Automatic exposure selection for high dynamic range photography. In: IEEE Intl. Conf. Consumer Electronics (ICCE), USA, pp. 469-497, (2015).
  • (7) Pourreza-Shahri, R., Kehtarnavaz, N.: Exposure bracketing via automatic exposure selection. In: IEEE Intl. Conf. Image Proc. (ICIP), Canada, pp. 320-323 (2015).
  • (8) Debevec, P., Malik, J.: Recovering high dynamic range radiance maps from photographs. In: 24th Annual Conf. Computer Graphics and Interactive Techniques (SIGGRAPH), USA, pp. 369-378 (1997).
  • (9) Cvetković, S., Klijn, J., With, P.H.N.: Tone-mapping functions and multiple-exposure techniques for high dynamic-range images. IEEE Trans. Consumer Electron. 54(2), 904-911 (2008).
  • (10) Kuang, J., Johnson, G.M., Fairchild, M.D.: iCAM06: A refined image appearance model for HDR image rendering. J. Visual Communication and Image Representation 18, 406-414 (2007).
  • (11) Farbman, Z., Fattal, R., Lischinski, D., Szeliski, R.: Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM Trans. Graphics 21(3), 249–256 (2008).
  • (12) Durand, F., Dorsey, J.: Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graphics 21(3), 257–266 (2002).
  • (13) He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Trans. Pattern Analysis and Machine Intelligence 35(6), 1397-1409 (2013).
  • (14) Mertens, T., Kautz, J., Van Reeth, F.: Exposure fusion: a simple and practical alternative to high dynamic range photography. Computer Graphics Forum 28(1), 161–171 (2009).
  • (15) Li, S., Kang, X.: Fast multi-exposure image fusion with median filter and recursive filter. IEEE Trans. Consumer Electron. 58(2), 626–632 (2012).
  • (16) Song, M., Tao, D., Chen, C., Bu, J., Luo, J., Zhang, C.: Probabilistic exposure fusion. IEEE Trans. Image Process. 21(1), 341–357 (2012).
  • (17) Zhang, W., Cham, W.-K.: Gradient-directed multi-exposure composition. IEEE Trans. Image Process. 21(4), 2318–2323 (2012).
  • (18) Shen, R., Cheng, I., Shi, J., Basu, A.: Generalized random walks for fusion of multi-exposure images. IEEE Trans. Image Process. 20(12), 3634-3646 (2011).
  • (19) Shen, R., Cheng, I., Basu, A.: QoE-based multi-exposure fusion in hierarchical multivariate Gaussian CRF. IEEE Trans. Image Process. 22(6), 2469–2478 (2013).
  • (20) Xu, L., Du, J., Zhang, Z.: Feature-based multiexposure image-sequence fusion with guided filter and image alignment. J. Electron. Imaging 24(1), 013022 (2015).
  • (21) Hu, J., Gallo, O., Pulli, K.: Exposure stacks of live scenes with hand-held cameras. In: Europ. Conf. Computer Vision (ECCV), Italy, pp. 499-512 (2012).
  • (22) Tico, M., Gelfand, N., Pulli, K.: Motion-blur-free exposure fusion. In: IEEE Intl. Conf. Image Process. (ICIP), China, pp. 3321-3324 (2010).
  • (23) Yeganeh H., Wang, Z.: Objective quality assessment of tone-mapped images. IEEE Trans. Image Process. 22(2), 657-667 (2013).
  • (24) Ward, G.: Fast, robust image registration for compositing high dynamic range photographs from handheld exposures. J. Graphic Tools 8(2), 17-30 (2003).