
† These authors contributed equally to this work.

Group Sparse Coding for Image Denoising

Luoyu Chen
School of Computer Engineering
The Australian National University
Canberra 2601
[email protected]
   Fei Wu
School of Computer Engineering
The Australian National University
Canberra 2601
[email protected]
Abstract

Group sparse representation (GSR) [3] has shown promising results in image deblurring and image inpainting. Its success stems from exploiting sparsity and nonlocal self-similarity (NSS) between patches of natural images and solving a regularized optimization problem. However, directly applying GSR [3] to image denoising yields unstable and unsatisfactory results. To overcome these issues, this paper proposes a progressive image denoising algorithm that successfully adapts the GSR [3] model; experiments show performance superior to some state-of-the-art methods.

1 Introduction

Image denoising is a classical problem that has been studied for decades due to its practical significance; the aim is to reconstruct a high-quality image from its degraded noisy version. The problem is formulated as a linear inverse problem: given an observed noisy image $\mathbf{y}$ corrupted by additive white Gaussian noise $\mathbf{v}$ with standard deviation $\sigma$, we want to recover the original image $\mathbf{x}$:

$$\mathbf{y}=\mathbf{x}+\mathbf{v},\qquad \mathbf{v}\sim\mathcal{N}(\mathbf{0},\sigma^{2}\mathbf{I}).$$

Most natural images contain many repeating patterns with low texture, so sparse coding is well suited to representing low-texture regions. At the same time, sparse coding struggles to capture dense texture such as white noise, so an appropriate level of sparsity can capture the desirable texture of the original image while removing the noise. This idea was explored by K-SVD [2], a sparse coding denoising method: by alternating sparse coding and dictionary learning, sparser representations and higher-quality reconstructions are obtained. However, this requires solving a large-scale optimization problem over a global dictionary, which is computationally demanding. In addition to sparsity, BM3D [1] exploits NSS as an effective prior for natural images and is one of the state-of-the-art image denoisers. Similar patches are stacked into a 3D group, and two-stage collaborative filtering is applied in a transform domain of the 3D group, exploiting both sparsity and NSS. However, its performance tends to degrade at high noise levels, because the patch matching phase is performed directly on the noisy image.

GSR [3] is an improved version of K-SVD: it changes the unit from patch to patch group and learns an adaptive dictionary for each group. This avoids large-scale sparse coding over the entire image in one pass, and acknowledges that a single global dictionary is far from enough to capture the rich texture of different regions of an image; adaptive dictionaries yield promising reconstruction quality. However, GSR suffers from instability in choosing the hyper-parameter that controls the trade-off between data fidelity and representation sparsity: the optimal value differs significantly across images and noise levels, which dramatically limits large-scale batch processing. Hence, this paper's contribution focuses on adapting GSR [3] to image denoising to yield stable and superior denoising results, and on proposing a better patch similarity search scheme to further boost denoising quality.

This paper is organized as follows. First, we introduce the $\ell_0$ optimization problem for image denoising. Second, we discuss the iterative optimization steps and give an overview of the entire algorithm. Third, we present the experimental results.

2 Group Sparse Representation Learning

2.1 Similar Patch Grouping

As illustrated in Figure 1, an image $\mathbf{x}$ of size $N$ is divided into $n$ overlapped patches $\mathbf{x}_k$ of size $\sqrt{b}\times\sqrt{b}$, $k=1,2,\cdots,n$. For each patch $\mathbf{x}_k$, its $c$ most similar patches are selected from an $L\times L$ search window to form a set $S_{\mathbf{x}_k}$. All patches in $S_{\mathbf{x}_k}$ are then stacked into a matrix $\mathbf{x}_{G_k}\in\mathbb{R}^{b\times c}$ that contains every element of $S_{\mathbf{x}_k}$ as a column, $\mathbf{x}_{G_k}=\{\mathbf{x}_{G_k,1},\mathbf{x}_{G_k,2},\cdots,\mathbf{x}_{G_k,c}\}$, where $\mathbf{x}_{G_k,i}$ denotes the $i$-th similar patch (in column form) of the $k$-th group. The similarity measure is simply the mean squared error between patches.

Figure 1: Patch matching scheme, from the GSR [3] paper.

In this paper, the similarity measurement is instead performed on progressively denoised images, which produces better groups than searching purely on the noisy image; the related experiments are presented in the third section. A sketch of the basic grouping step follows.
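As a concrete illustration of the grouping step, the following Python/NumPy sketch (our own code; the function name, the window clipping, and the patch-corner convention are illustrative assumptions, not taken from GSR [3]) collects the $c$ most similar patches around a reference patch:

import numpy as np

def group_similar_patches(img, row, col, b_sqrt, c, L):
    # Illustrative sketch: stack the c patches most similar (by MSE)
    # to the reference patch with top-left corner (row, col), searched
    # inside an L x L window, as columns of a b x c group matrix.
    H, W = img.shape
    ref = img[row:row + b_sqrt, col:col + b_sqrt]
    r0, r1 = max(0, row - L // 2), min(H - b_sqrt, row + L // 2)
    c0, c1 = max(0, col - L // 2), min(W - b_sqrt, col + L // 2)
    candidates = []
    for i in range(r0, r1 + 1):
        for j in range(c0, c1 + 1):
            patch = img[i:i + b_sqrt, j:j + b_sqrt]
            dist = np.mean((patch - ref) ** 2)  # MSE similarity
            candidates.append((dist, patch.reshape(-1)))
    candidates.sort(key=lambda t: t[0])         # closest patches first
    return np.stack([p for _, p in candidates[:c]], axis=1)  # b x c

Note that the reference patch itself has distance zero and is therefore always included in its own group.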

2.2 The $\ell_0$ Minimization Problem for Image Denoising

Each group $\mathbf{x}_{G_k}$ can be sparsely represented by a dictionary $\mathbf{D}_{G_k}$ and a coefficient vector $\mathbf{a}_{G_k}$. We want the dictionary to partially fit the noisy image so as to capture the local texture, while the coefficient vector is sparse enough to filter out the white noise. Hence, we solve the optimization problem:

$$\underset{\mathbf{D_{x}},\,\mathbf{a_{x}}}{\arg\min}\;\sum_{k=1}^{n}\|\mathbf{x}_{G_k}-\mathbf{D}_{G_k}\mathbf{a}_{G_k}\|_{2}+\lambda\|\mathbf{a_{x}}\|_{0}$$

where $\mathbf{D_{x}}=\{\mathbf{D}_{G_1},\cdots,\mathbf{D}_{G_n}\}$, $\mathbf{a_{x}}=\{\mathbf{a}_{G_1},\cdots,\mathbf{a}_{G_n}\}$, and $\lambda$ controls the trade-off between representation sparsity and data fidelity. A larger $\lambda$ encourages a sparser representation at the expense of data fidelity, which is more likely to filter out the white noise; but more sparsity generally means the recovered patches are smoother, so more local texture may be lost. A carefully selected $\lambda$ therefore removes the noise while preserving most local texture.

In GSR [3], the optimization problem is solved by alternating between two sub-problems:

$$\mathbf{D_{x}}^{(t+1)}=\underset{\mathbf{D_{x}}}{\arg\min}\;\sum_{k=1}^{n}\|\mathbf{x}_{G_k}-\mathbf{D}_{G_k}\mathbf{a}_{G_k}\|_{2}+\lambda\|\mathbf{a_{x}}\|_{0}$$
$$\mathbf{a_{x}}^{(t+1)}=\underset{\mathbf{a_{x}}}{\arg\min}\;\sum_{k=1}^{n}\|\mathbf{x}_{G_k}-\mathbf{D}_{G_k}^{(t+1)}\mathbf{a}_{G_k}\|_{2}+\lambda\|\mathbf{a_{x}}\|_{0}$$

The first sub-problem learns an adaptive dictionary for each group. GSR [3] computes the SVD of each group $\{\mathbf{x}_{G_1},\cdots,\mathbf{x}_{G_n}\}$: for the $k$-th group, $\mathbf{x}_{G_k}=\mathbf{U}_k\mathbf{S}_k\mathbf{V}_k^{T}$, and the local dictionary is $\mathbf{D}_{G_k}=[\mathbf{u}_{k,1}\mathbf{v}_{k,1}^{T},\cdots,\mathbf{u}_{k,m}\mathbf{v}_{k,m}^{T}]$, where $m=\min(b,c)$.
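A short sketch of this dictionary construction (our own illustrative code; the function name is ours):

import numpy as np

def learn_group_dictionary(x_Gk):
    # SVD of one b x c patch group: each dictionary atom is the
    # rank-one matrix u_i v_i^T (stored flattened), and the singular
    # values S_k serve as the full-rank initial coefficients a_Gk.
    U, s, Vt = np.linalg.svd(x_Gk, full_matrices=False)  # m = min(b, c)
    atoms = np.stack([np.outer(U[:, i], Vt[i, :]).reshape(-1)
                      for i in range(len(s))], axis=1)
    return atoms, s  # D_Gk (bc x m) and a_Gk = S_k

Summing the atoms weighted by the unthresholded coefficients recovers the group exactly, which is why thresholding the coefficients is what removes noise.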

The second sub-problem is solved by Split Bregman Iteration (SBI); the solution is approximated as $\mathbf{a_{x}}=\mathrm{Hard}(\mathbf{a_{x}},2\sqrt{\tau})$, where $\tau=(\lambda K)/(\mu N)$, $K=b\times c\times n$, and $N$ is the number of pixels of image $\mathbf{x}$. The hyper-parameters $\lambda$ and $\mu$ are selected empirically; $\lambda$ is the trade-off term mentioned above. However, the best $\lambda$ varies considerably across images and noise levels, so the SBI-approximated solution can be sub-optimal. Nevertheless, there still exists a 'golden-ratio' trade-off between texture fitting (the data fidelity term) and denoising effect (the sparsity regularization term). Experiments on multiple images show that the best thresholding is obtained when the mean absolute difference between the reconstructed image and the current denoised image reaches $0.73\hat{\sigma}^{(t)}$, or when the difference stops increasing as $\mathbf{a}_{G_k}$ is thresholded more aggressively (which only happens when $\sigma\geq 75$). $\mu$ is the learning rate used to produce a less noisy image at each iteration by superimposing the current denoised image $\hat{\mathbf{x}}^{(t)}$ over the noisy image; it is empirically set to 0.1 to improve denoising in follow-up iterations.
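For concreteness, the hard-thresholding operator and the SBI threshold $2\sqrt{\tau}$ look as follows (a minimal sketch under the definitions above; the function names are ours):

import numpy as np

def hard(a, T):
    # Hard(a, T): zero every coefficient whose magnitude is below T.
    out = a.copy()
    out[np.abs(out) < T] = 0.0
    return out

def sbi_threshold(lam, mu, b, c, n, N):
    # tau = (lambda * K) / (mu * N) with K = b * c * n; threshold = 2 sqrt(tau).
    tau = (lam * b * c * n) / (mu * N)
    return 2.0 * np.sqrt(tau)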

Input: the noisy image $\mathbf{y}\in\mathbb{R}^{\sqrt{N}\times\sqrt{N}}$;
   the white noise standard deviation $\sigma$;
   the patch size $b$;
   the number of similar patches to extract, $c$;
   the learning rate $\mu$;
   the maximum number of iterations, $max\_iter$.
Output: the denoised image $\mathbf{x}\in\mathbb{R}^{\sqrt{N}\times\sqrt{N}}$.

$m\leftarrow\min(b,c)$
for $t=1$ to $max\_iter$ do
   if $t=1$ then
      $\hat{\mathbf{x}}^{(t)}\leftarrow\mathbf{y}$,  $\hat{\sigma}^{(t)}\leftarrow\sigma$
   else
      $\hat{\mathbf{x}}^{(t)}\leftarrow\mathbf{y}+\mu(\hat{\mathbf{x}}^{(t-1)}-\mathbf{y})$,  $\hat{\sigma}^{(t)}\leftarrow\sqrt{\sigma^{2}-\|\mathbf{y}-\hat{\mathbf{x}}^{(t)}\|_{2}^{2}/N}$
   end if
   % search similar patches and perform grouping on $\hat{\mathbf{x}}^{(t)}$ %
   $\mathbf{x_{G}}\leftarrow\{\mathbf{x}_{G_{1}},\cdots,\mathbf{x}_{G_{n}}\}$
   for $k=1$ to $n$ do
      % learn local dictionary $\mathbf{D}_{G_{k}}$ %
      $\mathbf{x}_{G_{k}}=\mathbf{U}_{k}\mathbf{S}_{k}\mathbf{V}_{k}^{T}$,  $\mathbf{D}_{G_{k}}\leftarrow[\mathbf{u}_{k,1}\mathbf{v}_{k,1}^{T},\cdots,\mathbf{u}_{k,m}\mathbf{v}_{k,m}^{T}]$
      % initialize $\mathbf{a}_{G_{k}}$ by full-rank recovery %
      $\mathbf{a}_{G_{k}}\leftarrow\mathbf{S}_{k}$
   end for
   % threshold coefficient vector $\mathbf{a_{G}}$ %
   $T\leftarrow 0$, $rec\_loss\leftarrow 0$, $prev\_rec\_loss\leftarrow 0$
   while $rec\_loss\leq 0.73\,\hat{\sigma}^{(t)}$ do
      $\mathbf{a_{G}'}\leftarrow\mathrm{Hard}(\mathbf{a_{G}},T)$
      % put sparsely decoded patches back into the image %
      $\hat{\mathbf{x}}'^{(t)}\leftarrow\mathbf{D_{G}}\circ\mathbf{a_{G}'}$
      $rec\_loss\leftarrow\frac{1}{N}\|\hat{\mathbf{x}}'^{(t)}-\hat{\mathbf{x}}^{(t)}\|_{1}$
      if $|prev\_rec\_loss-rec\_loss|<0.001$ then break
      end if
      $prev\_rec\_loss\leftarrow rec\_loss$
      $T\leftarrow T+1$
   end while
   $\hat{\mathbf{x}}^{(t)}\leftarrow\hat{\mathbf{x}}'^{(t)}$
end for
return $\hat{\mathbf{x}}^{(max\_iter)}$
Algorithm 1: GSR image denoising with the improved patch search scheme
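The outer loop of Algorithm 1 can be summarized in a short Python/NumPy sketch (our own illustrative code; denoise_pass is a hypothetical placeholder for one grouping + SVD + thresholding pass, and the defaults are illustrative):

import numpy as np

def gsr_denoise(y, sigma, denoise_pass, mu=0.1, max_iter=8):
    # Progressive denoising: superimpose the current estimate over the
    # noisy image, re-estimate the remaining noise level, and run one
    # grouping/thresholding pass per iteration.
    N = y.size
    x_hat = y.astype(np.float64).copy()
    for t in range(max_iter):
        if t == 0:
            x_reg, sigma_hat = x_hat.copy(), float(sigma)
        else:
            x_reg = y + mu * (x_hat - y)   # mostly y, pulled toward x_hat
            sigma_hat = np.sqrt(max(sigma**2 - np.sum((y - x_reg)**2) / N, 0.0))
        x_hat = denoise_pass(x_reg, sigma_hat)
    return x_hat

Because each pass searches for similar patches on x_reg rather than on y, the patch matching operates on a progressively less noisy image, which is the improved search scheme evaluated in Section 3.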

2.3 GSR Denoising Step

The overall denoising procedure is as follows. The first step is patch grouping: for each patch, similar patches are searched on the noisy image to obtain $\{\mathbf{x}_{G_1},\cdots,\mathbf{x}_{G_n}\}$. The second step is iterative optimization. Before optimizing, the current noise level $\hat{\sigma}^{(t)}$ must be estimated: in the first iteration, the image to be denoised is the original noisy image and $\hat{\sigma}^{(t)}$ is the known standard deviation of the white noise, but in subsequent iterations the noise level must be re-estimated. Then, for each group $\mathbf{x}_{G_k}$, its adaptive dictionary $\mathbf{D}_{G_k}$ is learned by SVD and its coefficients $\mathbf{a}_{G_k}$ by thresholding, until the mean absolute reconstruction difference is very close to $0.73\hat{\sigma}^{(t)}$. The third step puts all sparsely represented patches of each group back into the image and averages each pixel's intensity by the number of times it is covered, reconstructing a partially denoised version of the noisy image. This partially denoised image is then superimposed over the noisy image to form the image to be denoised in the next iteration, similar patches are searched on this superimposed image, and the second and third steps are repeated until the maximum number of iterations is reached. Algorithm 1 gives a complete description of the GSR image denoising process; a sketch of the aggregation step follows.
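The third step, putting patches back and averaging overlaps, can be sketched as follows (our own illustrative code; the names and data layout are assumptions):

import numpy as np

def aggregate_patches(groups, positions, b_sqrt, H, W):
    # groups: list of b x c matrices of denoised patches (one per group);
    # positions: matching lists of (row, col) top-left corners, one per column.
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for group, pos_list in zip(groups, positions):
        for col, (i, j) in enumerate(pos_list):
            acc[i:i + b_sqrt, j:j + b_sqrt] += group[:, col].reshape(b_sqrt, b_sqrt)
            cnt[i:i + b_sqrt, j:j + b_sqrt] += 1.0
    return acc / np.maximum(cnt, 1.0)  # average; guard uncovered pixels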

3 Experimental Results

This section compares the proposed method with other state-of-the-art methods. The efficacy of the improved similar patch searching scheme, which searches on gradually less noisy images, is also compared against searching only on the noisy image. Both quantitative and qualitative results are reported. All test images are shown below, each of size $256\times 256$.

Figure 2: From left to right: Lena, Boat, Bird, Brab, Couple, Man, Fish.

The evaluation metric is peak signal-to-noise ratio (PSNR):

$$MSE=\frac{1}{N}\|\mathbf{x}-\hat{\mathbf{x}}\|_{2}^{2}$$
$$PSNR=20\log_{10}(255)-10\log_{10}(MSE)$$
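Computed directly from these two formulas, a straightforward helper for 8-bit images:

import numpy as np

def psnr(x, x_hat):
    # PSNR in dB, assuming pixel intensities in [0, 255].
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    return 20.0 * np.log10(255.0) - 10.0 * np.log10(mse)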

The quantitative results are displayed in Table 1 for white noise levels $\sigma\in\{10,20,30,100\}$. For each noise level, the four rows report the PSNR of K-SVD [2], BM3D [1], GSR denoising without the improved patch search scheme, and GSR denoising with the improved patch search scheme (GSRS). As the table shows, the GSR variants beat K-SVD [2] and BM3D [1] in almost all cases, and the top two results for each image and noise level almost always come from the two GSR variants. It is also clear that with the improved patch similarity search scheme, GSR denoising consistently produces better results.

σ     Method      Lena    Boat    Bird    Brab    Couple  Man     Fish
10    K-SVD [2]   34.50   33.03   35.20   33.04   32.67   31.71   30.52
      BM3D [1]    34.68   33.52   35.51   33.39   32.89   31.96   30.90
      GSR         34.80   33.41   35.42   33.63   32.78   31.94   30.94
      GSRS        35.09   33.68   35.75   33.91   33.02   32.19   31.13
20    K-SVD [2]   30.25   29.25   31.78   29.55   28.41   27.55   26.44
      BM3D [1]    31.14   29.70   32.04   29.89   29.09   27.91   26.90
      GSR         31.31   30.01   32.01   30.21   29.12   27.89   27.01
      GSRS        31.78   30.23   32.49   30.61   29.59   28.42   27.32
30    K-SVD [2]   28.89   27.50   29.70   27.45   26.34   25.50   24.40
      BM3D [1]    29.11   27.67   29.99   27.75   26.96   25.78   24.73
      GSR         29.44   28.09   30.21   28.14   27.23   26.08   24.87
      GSRS        29.91   28.43   30.62   28.72   27.78   26.40   25.24
100   K-SVD [2]   22.00   21.01   20.98   20.70   21.01   18.53   17.76
      BM3D [1]    22.21   21.13   21.18   20.86   21.13   18.77   17.99
      GSR         22.36   21.60   21.21   21.94   22.07   18.89   18.20
      GSRS        22.93   22.17   21.31   22.25   22.40   19.44   18.84
Table 1: Quantitative denoising results (PSNR, dB) for the different algorithms.
Figure 3: Denoised images at $\sigma=20$. Left to right: original, GSR (improved search scheme), GSR, BM3D, K-SVD.

In Figure 3, qualitative results are displayed for all test images. GSR with the improved search scheme preserves most of the local texture while smoothing the flat areas of the original image, producing the most natural results. GSR loses a little local texture and looks over-smoothed, because poorly matched groups tend to average out the local texture. BM3D preserves most local textures but also produces many artifacts in the flat areas of the original image. K-SVD produces the worst denoising results, with noticeable artifacts and damaged local textures.

Figure 4: Denoising results at $\sigma\in\{10,20,30,50,100\}$.
Figure 5: PSNR improvement over iterations.

A better similar patch searching scheme significantly improves denoising quality. Figure 4 shows that at all noise levels, GSR denoising with the improved search scheme (GSRS) beats GSR and the other two methods on the test images Lena and Brab. The advantage is also evident during the iterative denoising process: as shown in Figure 5, the better searching scheme significantly boosts denoising quality in the first few iterations, whereas iterating with the naive patch search helps only a little.

4 Conclusion

This paper proposes a group sparse representation based method for image denoising: sparse coding problems are solved for each group of similar patches, and the process is iterated on images of progressively better quality. The method naturally incorporates a better patch search scheme, namely searching on these progressively better images. The method is evaluated against two state-of-the-art algorithms, K-SVD and BM3D; both qualitative and quantitative results are reported and show superior performance. In addition, the improved similar patch search scheme is compared with the naive search scheme and is shown to reach better denoising quality both visually and statistically.

References

• [1] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
• [2] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
• [3] Jian Zhang, Debin Zhao, and Wen Gao. Group-based sparse representation for image restoration. IEEE Transactions on Image Processing, 23(8):3336–3351, 2014.