
Multi-stage image denoising with the wavelet transform

Chunwei Tian, Menghua Zheng, Wangmeng Zuo, Bob Zhang, Yanning Zhang, David Zhang

School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, China; National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an, Shaanxi, 710129, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China; Peng Cheng Laboratory, Shenzhen, Guangdong, 518055, China; Department of Computer and Information Science, University of Macau, Macau, 999078, China; School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, China; School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong, 518172, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China
Abstract

Deep convolutional neural networks (CNNs) are used for image denoising by automatically mining accurate structural information. However, most existing CNNs depend on enlarging the depth of the designed network to obtain better denoising performance, which may cause training difficulty. In this paper, we propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) that works in three stages: a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs) and a residual block (RB). The DCB uses a dynamic convolution to dynamically adjust the parameters of several convolutions, making a tradeoff between denoising performance and computational cost. The WEB combines a signal processing technique (the wavelet transform) with discriminative learning to suppress noise and recover more detailed information. To further remove redundant features, the RB refines the obtained features via improved residual dense architectures to improve denoising effects and reconstruct clean images. Experimental results show that the proposed MWDCNN outperforms some popular denoising methods in terms of quantitative and qualitative analysis. Code is available at https://github.com/hellloxiaotian/MWDCNN.

Keywords: Image denoising, CNN, wavelet transform, dynamic convolution, signal processing.

1 Introduction

Noisy images often arise in high-level vision tasks, which makes image denoising an important task in low-level vision [28]. Specifically, a clean image x can be related to a noisy observation through the degradation model x = y − n, where y is a given noisy image and n is additive white Gaussian noise (AWGN) with standard deviation σ [28]. Based on this model, scholars have developed many denoisers [32]. For instance, Wang et al. combined a switching scheme with progressive median filtering to obtain more detailed information for salt-and-pepper denoising [32]. To improve denoising performance, prior knowledge has been used to extract useful information [24]; Rabbani et al. relied on Laplacian random variables with high local correlation to derive an estimator based on maximum a posteriori and minimum mean squared error criteria [24]. Mapping similar 2D fragments of corrupted images into 3D data arrays can enhance sparsity and improve denoising [8]. Besides, using different gradient vectors to compute absolute cosine values can recover detailed information [16], and an edge-preserving regularization term can overcome the adverse effect of unreliable priors from a guidance image [16]. Although these methods perform well in image denoising, they suffer from two shortcomings [35]: they resort to manually selected parameters, and they may require complex optimization algorithms to pursue excellent denoising effects.
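To make this degradation model concrete, the following minimal PyTorch sketch (our illustration, not code from the paper) shows how a noisy observation can be synthesized from a clean image under AWGN, assuming images normalized to [0, 1]:

```python
import torch

def add_awgn(x_clean: torch.Tensor, sigma: float) -> torch.Tensor:
    """Synthesize a noisy observation y = x + n under the AWGN model,
    where x_clean is a clean image in [0, 1] and sigma is given on the
    0-255 scale (e.g., 15, 25 or 50, as in the experiments below)."""
    return x_clean + torch.randn_like(x_clean) * (sigma / 255.0)
```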

To address these problems, discriminative learning methods that work in end-to-end ways rather than via manually tuned parameters have been developed [28]. Due to their powerful learning abilities, discriminative learning methods, especially convolutional neural networks (CNNs), are applied to the image denoising problem [35]. Zhang et al. exploited combinations of stacked convolutions, batch normalization (BN) and ReLU to implement an efficient CNN denoiser [35]. To obtain salient features, attention ideas have been embedded into networks for image processing [29]; for instance, Tian et al. used an attention mechanism to guide a CNN to extract more accurate information for image denoising [29]. Making full use of hierarchical information can also help suppress noise [27]; for instance, Tai et al. utilized recursive units and gate units to transfer the memory abilities of shallow layers to deep layers and improve the quality of restored images [27]. To further improve performance in image restoration, Zhang et al. used residual dense blocks in a deeper CNN to integrate local and global features and obtain more robust features [39]. Although these methods are effective in image denoising, they may rely on deeper architectures to pursue excellent denoising performance, which increases training difficulty. In this paper, we present a multi-stage image denoising CNN with the wavelet transform, termed MWDCNN. It relies on three stages: a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs) and a residual block (RB). The DCB uses a dynamic convolution to dynamically adjust the parameters of several convolutions, which overcomes the poor performance of some lightweight CNNs limited in network depth and width and makes a tradeoff between denoising performance and computational costs. The stacked WEBs combine a signal processing technique (the wavelet transform) with discriminative learning (residual dense blocks) to suppress noise and recover more detailed information. To further remove redundant features, the RB refines the obtained features via improved stacked residual dense architectures to improve denoising effects and reconstruct clean images. The proposed MWDCNN is superior to popular denoising methods, i.e., the denoising CNN (DnCNN) [35] and attention-guided denoising network (ADNet) [29], in terms of quantitative and qualitative analysis.

The contributions of this paper are summarized as follows.

(1) A dynamic convolution is introduced into a CNN to address the limited depth and width of lightweight CNNs while pursuing good denoising performance.

(2) A combination of a signal processing technique and a discriminative learning technique is used for image denoising.

(3) Enhanced residual dense architectures are used to remove redundant information and improve denoising effects.

The remainder of this paper is organized as follows. Section 2 reviews related work on deep CNNs for image denoising, dynamic convolutions and wavelet transform techniques. Section 3 describes the proposed denoising method. Section 4 presents the datasets, experimental settings, analysis of the proposed method and extensive experimental results. Section 5 concludes this paper.

2 Related work

2.1 Deep CNNs for image denoising

To overcome the drawbacks of traditional machine learning in image denoising, network-based methods, i.e., CNNs, have been proposed [36]. To accelerate the training of denoisers, Zhang et al. fed patches of noisy images together with noise maps into a CNN to achieve an efficient denoising network [36]. To improve denoising performance, combining dilated convolutions and BN in a CNN can capture more accurate context information [28]. The mentioned methods achieve excellent denoising performance; however, their convolutional kernels share the same weights for all noisy images, which may incur large computational costs. Differing from these methods, our denoising method uses dynamic convolutions in a CNN to adjust the parameters of convolutional kernels and train a robust denoiser; dynamic convolutions are reviewed in Section 2.2. Besides, we combine frequency features obtained via the wavelet transform with structural information obtained via a CNN to exploit complementary information in image denoising; the wavelet transform is reviewed in Section 2.3.

2.2 Dynamic convolutions for image applications

Most existing methods share the parameters of each convolutional layer when training a CNN for image applications (e.g., image classification); hence, they cannot adjust the parameters of each layer according to different images to obtain a robust classifier [5]. To address this issue, dynamic convolutions fuse multiple parallel convolutions via attention mechanisms, reducing the computational costs of different CNNs while training a robust classifier [5]. To improve training efficiency, a dynamic convolution was embedded into a CNN to remove redundant features and improve classification accuracy [40]. To mine more context, Sun et al. used Gaussian dynamic pyramid pooling in a CNN to improve expressive ability in image segmentation [26]. Besides, using matrix decomposition to guide a dynamic convolution can make a tradeoff between performance and training speed in image recognition [17]. Due to this excellent performance, a dynamic convolution is incorporated into a CNN for image denoising in this paper.

2.3 Wavelet transform for image applications

Since images can be treated as signals, signal processing techniques such as the wavelet transform are effective for low-level tasks [7]. For instance, Cho et al. used multivariate statistical modeling to estimate wavelet transform coefficients for filtering noise in image denoising [7]. Liu et al. utilized the wavelet transform and a genetic algorithm to suppress noise [19]. To mine more useful information, the wavelet transform has been used within a CNN to learn detailed and content information for image super-resolution [12]. Alternatively, using cross-connections and residual techniques to integrate a CNN and the wavelet transform has proven a good tool for image super-resolution [34]. These illustrations show that the wavelet transform is effective for low-level tasks. Inspired by this, we fuse two cascaded wavelet transform techniques into a CNN to combine frequency information and structural information and improve visual denoising effects.

Figure 1: Network architecture of MWDCNN.

3 The proposed method

3.1 Network architecture

The designed 23-layer MWDCNN contains three parts: a DCB, two cascaded WEBs and a RB, as shown in Fig. 1. A 5-layer DCB uses a dynamic convolution to dynamically adjust the parameters of several convolutions according to different images, making a tradeoff between denoising performance and computational costs. To extract more useful information, two stacked 8-layer WEBs combine a signal processing technique and discriminative learning to remove noise; that is, they combine frequency features obtained via the wavelet transform with structural information obtained via convolutional layers to mine robust features. Finally, a 10-layer RB refines the obtained features via enhanced residual dense architectures and reconstructs clean images via a residual learning operation. This process is formalized in the following equation.

$I_C = f_{RB}(f_{WEB}(f_{WEB}(f_{DCB}(I_N)))) = f_{MWDCNN}(I_N),$  (1)

where $I_N$ and $I_C$ denote a given noisy image and the corresponding clean image, respectively; $f_{DCB}$, $f_{WEB}$ and $f_{RB}$ are the functions of the DCB, WEB and RB, respectively; and $f_{MWDCNN}$ denotes the function of MWDCNN. MWDCNN is trained with the loss function in Section 3.2.

3.2 Loss function

To keep consistency with popular denoising methods, i.e., DnCNN [35] and the fast and flexible denoising network (FFDNet) [36], we choose the mean squared error (MSE) [2] as the loss (objective) function to train the MWDCNN denoiser. Specifically, MSE uses pairs $\{I_C^i, I_N^i\}$ ($1 \leq i \leq n$) in a supervised way to train the denoising model, where $I_C^i$ and $I_N^i$ denote the $i$-th clean and noisy images, respectively, and $n$ is the number of noisy images used in training. MWDCNN is optimized by Adam to obtain suitable parameters [13]. Besides, to verify the suitability of MSE, in Section 4.3 we compare it against the Charbonnier loss of the newest denoising method [18] and the Pearson loss of a quality assessment method [3] when fused into our MWDCNN. The mentioned procedure can be formulated as Eq. (2).

$l(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\|f_{MWDCNN}(I_N^i) - I_C^i\|^2,$  (2)

where $l$ and $\theta$ denote the loss function and the parameter set, respectively.
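A minimal PyTorch sketch of Eq. (2), assuming a batch tensor holds the n training pairs; the official implementation may normalize differently:

```python
import torch

def mse_loss(pred: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """Eq. (2): 1/(2n) * sum_i ||f_MWDCNN(I_N^i) - I_C^i||^2 over a
    batch of n predicted/clean image pairs of shape (n, c, h, w)."""
    n = pred.shape[0]
    # Per-image squared Frobenius norm, summed over the batch.
    return (pred - clean).pow(2).flatten(1).sum(dim=1).sum() / (2 * n)
```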

3.3 Dynamic convolution block

The first stage of MWDCNN is implemented by a dynamic convolutional block. The dynamic convolutional block is composed of five layers: a convolutional layer, a 3-layer dynamic convolution and another convolutional layer. The first convolutional layer converts noisy images into linear features; its input channel number is 3 or 1 (depending on whether the input is a color or gray noisy image), its kernel size is 5×5 and its output channel number is 64. The 3-layer dynamic convolution consists of a 2-layer weight generator (WG) and one 5×5 convolutional layer, where the WG contains an average pooling, a combination of a 1×1 convolutional layer and ReLU, a 1×1 convolutional layer, and Softmax, as illustrated in Fig. 2 [5]. The second convolutional layer, followed by ReLU, refines the features obtained from the dynamic convolution; its input channel number is 64, its kernel size is 5×5 and its output channel number is 64. These illustrations can be presented as follows.

$O_{DCB} = f_{DCB}(I_N) = CR(R(DC(C(I_N)))),$  (3)

where $DC$ denotes the function of the dynamic convolution and $O_{DCB}$ is the output of the DCB, which feeds the two stacked WEBs. $C$, $R$ and $CR$ stand for a 5×5 convolution, the ReLU activation function and a 5×5 convolution followed by ReLU, respectively.

The mentioned dynamic convolution is implemented as follows [5]. Firstly, the WG produces four weights that act on four parallel convolutional kernels in a weighted way to adjust the aggregated parameters. Secondly, the aggregated kernel is applied as a 5×5 convolutional layer. Thirdly, the output of the first convolution in the DCB and the output of the second step are linearly fused, so that suitable parameters are obtained for different noisy images. The dynamic convolution in the DCB can be symbolized by the following equations.

$O_{DC} = DC(O_{DCB\_1C}) = C(WG(O_{DCB\_1C}) \times K) \times O_{DCB\_1C} = C\big(\sum_{i=1}^{4} WG_i(O_{DCB\_1C}) \times K_i\big) \times O_{DCB\_1C},$  (4)

where $O_{DCB\_1C}$ is the output of the first convolutional layer in the DCB and $DC$ denotes the function of the dynamic convolution. $WG$, $C$ and $K$ stand for the weight generator, a 5×5 convolution and the four parallel convolutional kernels, respectively; this convolutional layer has 64 input and 64 output channels. $WG_i$ and $K_i$ represent the $i$-th channel of the WG output and the $i$-th of the four parallel convolutional kernels, respectively. $O_{DC}$ denotes the output of the dynamic convolution. Specifically, the WG function [5] is defined in Eq. (5).

$O_{WG} = WG(O_{DCB\_1C}) = S(C_1(RC_1(AP(O_{DCB\_1C})))),$  (5)

where $AP$ and $S$ denote an average pooling and Softmax, respectively. $RC_1$ is the combination of a 1×1 convolutional layer and ReLU, and $C_1$ is a 1×1 convolutional layer. The first convolutional layer in the WG has 64 input channels and 4 output channels; the second has 4 input channels and 4 output channels. The WG is visualized in Fig. 2.

Figure 2: Architecture of the weight generator (WG).
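To make Eqs. (4) and (5) concrete, the following PyTorch sketch implements a dynamic convolution in the style of Ref. [5]. Layer shapes follow the text above; the grouped-convolution trick and the initialization are our assumptions rather than the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Dynamic convolution in the style of Ref. [5]: a weight generator (WG)
    predicts a per-image softmax attention over K parallel kernels (Eq. (5)),
    and the attended kernels are linearly fused before convolving (Eq. (4))."""

    def __init__(self, channels: int = 64, num_kernels: int = 4, ksize: int = 5):
        super().__init__()
        # K parallel convolutional kernels K_1..K_4 of Eq. (4).
        self.kernels = nn.Parameter(
            0.01 * torch.randn(num_kernels, channels, channels, ksize, ksize))
        # WG of Eq. (5): average pooling -> 1x1 Conv+ReLU (64->4) -> 1x1 Conv (4->4).
        self.wg = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, num_kernels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(num_kernels, num_kernels, 1))
        self.pad = ksize // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        attn = F.softmax(self.wg(x).flatten(1), dim=1)  # (b, K), Softmax S
        # Per-image linear fusion of the K kernels: sum_i WG_i(x) * K_i.
        fused = torch.einsum('bk,koipq->boipq', attn, self.kernels)
        fused = fused.reshape(b * c, c, *self.kernels.shape[-2:])
        # A grouped convolution applies each image's own fused kernel.
        out = F.conv2d(x.reshape(1, b * c, h, w), fused,
                       padding=self.pad, groups=b)
        return out.reshape(b, c, h, w)
```

For example, `DynamicConv2d()(torch.randn(2, 64, 48, 48))` returns a tensor of the same shape, with each image in the batch filtered by its own attended kernel.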

3.4 Two stacked wavelet transform and enhancement blocks

The second phase of MWDCNN uses two stacked wavelet transform and enhancement blocks. Each 4-layer wavelet transform and enhancement block (WEB) includes three steps that fuse frequency features and structural information by combining a signal processing technique with discriminative learning. The first step uses the discrete wavelet transform (DWT) [9] to convert linear structural information into four frequency sub-bands. The second step uses a structural network to guide the signal processing technique via a feature enhancement (FE) mechanism (a 4-layer residual dense block (RDB), Fig. 3 [39]) to suppress noise and recover more detailed information. All convolutional kernels in an FE are 5×5 except the last, which is 1×1. The third step uses the inverse discrete wavelet transform (IDWT) to convert the frequency features back into linear structural information. These steps are formalized in Eq. (6).

$O_{WEB} = f_{WEB}(f_{WEB}(O_{DCB})) = f_{WEB}(f_{IDWT}(f_{FE}(f_{DWT}(O_{DCB})))) = f_{IDWT}(f_{FE}(f_{DWT}(f_{IDWT}(f_{FE}(f_{DWT}(O_{DCB})))))),$  (6)

where $f_{WEB}$ denotes the function of a WEB; $f_{DWT}$ and $f_{IDWT}$ denote the DWT and IDWT, respectively; and $f_{FE}$ and $O_{WEB}$ represent the FE function and the output of the two stacked WEBs, respectively. More information on the DWT and IDWT can be found in Ref. [9]. The 4-layer FE (an RDB) contains three combinations of a convolutional layer and ReLU (Conv+ReLU) and one 1×1 convolutional layer; more details can be found in Ref. [39].
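The following PyTorch sketch illustrates one WEB of Eq. (6), assuming a single-level Haar wavelet (the paper follows Ref. [9], whose filters may differ). `fe` stands for the feature enhancement module; an RDB sketch for it follows Fig. 3 below:

```python
import torch
import torch.nn as nn

def dwt_haar(x: torch.Tensor) -> torch.Tensor:
    """Single-level 2D Haar DWT: (b, c, h, w) -> (b, 4c, h/2, w/2),
    concatenating the LL, HL, LH and HH sub-bands along channels.
    Assumes even spatial sizes."""
    a = x[:, :, 0::2, 0::2]  # even rows, even cols
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    hl = (-a + b - c + d) / 2
    lh = (-a - b + c + d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, hl, lh, hh], dim=1)

def idwt_haar(y: torch.Tensor) -> torch.Tensor:
    """Exact inverse of dwt_haar: (b, 4c, h, w) -> (b, c, 2h, 2w)."""
    ll, hl, lh, hh = torch.chunk(y, 4, dim=1)
    b, c, h, w = ll.shape
    x = ll.new_zeros(b, c, 2 * h, 2 * w)
    x[:, :, 0::2, 0::2] = (ll - hl - lh + hh) / 2
    x[:, :, 0::2, 1::2] = (ll + hl - lh - hh) / 2
    x[:, :, 1::2, 0::2] = (ll - hl + lh - hh) / 2
    x[:, :, 1::2, 1::2] = (ll + hl + lh + hh) / 2
    return x

class WEB(nn.Module):
    """One WEB of Eq. (6): DWT -> feature enhancement (FE) -> IDWT.
    `fe` is any channel-preserving module on the 4c sub-band channels."""
    def __init__(self, fe: nn.Module):
        super().__init__()
        self.fe = fe

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return idwt_haar(self.fe(dwt_haar(x)))
```

With 64-channel features, the FE thus operates on 256 sub-band channels; a plain `WEB(nn.Identity())` reproduces the input exactly, since this DWT/IDWT pair is an exact inverse.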

3.5 Residual block

The RB is implemented as a 10-layer enhanced residual dense architecture. That is, it is composed of two combinations of a residual dense block and ReLU plus two convolutional layers (together denoted RDC), and two residual learning operations. It works in four steps. The first step uses two combinations of a 4-layer RDB [39] and ReLU to further refine features. To prevent the long-term dependency problem, the second step uses two residual learning operations to fuse the features obtained from the two RDB+ReLU combinations and from the WEB, enhancing the memory ability of shallow layers for deep layers. To prevent over-enhancement in the second step, the third step uses a convolutional layer followed by ReLU to refine these features, where both its input and output channel numbers are 64. Finally, a convolutional layer reconstructs the noise mapping and a residual learning operation reconstructs the clean image from the noise mapping and the given noisy image. All convolutional kernels in the residual block are 5×5 except the fourth and eighth, which are 1×1. The input and output channel numbers of the two RDBs are 64, and the input channel number of the last convolutional layer is 64; its output channel number is 3 or 1, depending on whether the noisy image is color or gray. These illustrations are formalized in Eq. (7).

$I_C = f_{RB}(O_{WEB}) = I_N - f_{RDC}(O_{WEB}) = I_N - C(CR(O_{WEB} + R(f_{RDB}(O_{WEB})) + R(f_{RDB}(R(f_{RDB}(O_{WEB})))))),$  (7)

where $f_{RDC}$ denotes the function of the RDC, '−' stands for a residual learning operation ($\oplus$ in Fig. 1), and $f_{RDB}$ and $R$ denote the function of an RDB and ReLU, respectively.

Figure 3: Architecture of the feature enhancement (FE) mechanism, i.e., a residual dense block (RDB).

4 Experiments

4.1 Datasets

Training datasets: We use the Berkeley segmentation dataset (BSD) with 432 natural images of 481×321 [14] to train the color and gray Gaussian synthetic denoisers. To enlarge the number of training images and accelerate training, we randomly crop each image into 512 patches of 48×48, giving 221,184 noisy patches. For real-noisy image denoising, we choose 100 natural images of size 512×512 as training data to train a real noisy image denoiser [33]; each image in this dataset is cropped into patches of 48×48, giving 211,600 patches. To further extend the diversity of training images, we randomly apply one of eight data augmentation modes to each patch; the eight modes are described in Ref. [29], and a sketch of the patch preparation follows.
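A minimal sketch of the patch preparation described above, assuming the standard eight-mode augmentation (four rotations, each with an optional horizontal flip) of Ref. [29]:

```python
import random
import torch

def random_patch(img: torch.Tensor, size: int = 48) -> torch.Tensor:
    """Randomly crop one size x size training patch from an image of shape
    (c, h, w) and apply one of eight flip/rotation augmentation modes."""
    _, h, w = img.shape
    top, left = random.randint(0, h - size), random.randint(0, w - size)
    patch = img[:, top:top + size, left:left + size]
    mode = random.randint(0, 7)
    if mode >= 4:  # horizontal flip for modes 4-7
        patch = torch.flip(patch, dims=[-1])
    return torch.rot90(patch, k=mode % 4, dims=[-2, -1])
```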

Test datasets: To fairly test the proposed denoising method, we use the public datasets BSD68 [14], Set12 [14], CBSD68 [14], Kodak24 [10] and CC [22]. For gray Gaussian synthetic image denoising, BSD68 and Set12 are used. For color Gaussian synthetic image denoising, CBSD68 and Kodak24 are used. For real noisy image denoising, CC is used.

4.2 Implementation details

For training the denoisers in this paper, the batch size and the number of epochs are set to 64 and 90, respectively. The learning rate is initialized to 1×10⁻⁴ and decayed across epochs: it is 1×10⁻⁴ from the 1st to the 30th epoch, 1×10⁻⁵ from the 31st to the 60th epoch and 1×10⁻⁶ from the 61st to the 90th epoch. We set β1 = 0.9 and β2 = 0.999. Other experimental parameters are the same as in Ref. [29]. We implement MWDCNN with PyTorch 1.10.2 [28] and Python 3.8.5. All experiments are conducted on Ubuntu 20.04 with an AMD EPYC 7502P 3.35 GHz 32-core CPU, 128 GB of RAM and an Nvidia GeForce RTX 3090 GPU. Besides, we use Nvidia CUDA 11.1 and cuDNN 8.0.4 to accelerate the GPU.
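The schedule above corresponds to a step-wise learning-rate decay; a minimal sketch, where the `model` placeholder stands in for the MWDCNN network:

```python
import torch

# Adam with beta1 = 0.9, beta2 = 0.999 and a learning rate of
# 1e-4 (epochs 1-30), 1e-5 (epochs 31-60) and 1e-6 (epochs 61-90).
model = torch.nn.Conv2d(3, 64, 5, padding=2)  # placeholder for MWDCNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(90):
    # ... one pass over batches of 64 noisy/clean patch pairs ...
    scheduler.step()
```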

4.3 Network analysis

In this section, we analyze the designed denoising network to show its rationality and validity.

Dynamic convolutional block: Robust denoisers are very important for different scenes. However, most existing denoising methods share parameters when training denoising models, which prevents these models from being robust to different scenes and also poses challenges for high-level vision tasks. A dynamic convolution, by contrast, can adjust the parameters of convolutional kernels according to the given images [5]. Inspired by this, we fuse a dynamic convolution into a CNN to suppress noise and improve denoising performance without increasing computational costs; that is, we design the dynamic convolutional block (DCB) to address the mentioned problems. The DCB first uses a convolutional layer to extract linear features. Subsequently, a dynamic convolutional layer [5] linearly combines all convolutional kernels to dynamically adjust parameters according to different noisy images, rather than applying the same convolution parameters to all images, which makes a tradeoff between denoising performance and computational costs, as follows.

In terms of denoising performance, we design experiments to verify the denoising effect of a dynamic convolution in a CNN, treating it both as a local part of MWDCNN and as an individual component. For the dynamic convolution as a local part of MWDCNN, we compare MWDCNN without a convolution in the DCB against MWDCNN without a dynamic convolution and a convolutional layer in the DCB, as shown in Table 1, which demonstrates the effectiveness of the dynamic convolutional block in MWDCNN for image denoising. For the dynamic convolution as an individual component, we compare the combination of three stacked convolutional layers and a dynamic convolutional layer with the combination of six stacked convolutional layers (four 5×5 convolutions and two 1×1 convolutions). The first convolutional layer has 3 input channels and 64 output channels; the last has 64 input channels and 3 output channels; all other convolutional layers have 64 input and 64 output channels. In Table 1, the combination of three stacked convolutional layers and a dynamic convolutional layer obtains a higher PSNR than the combination of six stacked convolutional layers, which verifies the effectiveness of the dynamic convolution as an individual component. According to this analysis, a dynamic convolution in a CNN is effective for image denoising.

In terms of computational costs, we choose important metrics, i.e., complexity (parameters and FLOPs) and run-time, to show the denoising competitiveness of a dynamic convolution for real digital cameras in the real world. The combination of three stacked convolutional layers and a dynamic convolutional layer obtains competitive parameters and FLOPs compared with the combination of six stacked convolutional layers, as illustrated in Table 2, which shows the competitive complexity of a dynamic convolution for image denoising. It also requires less run-time than the combination of six stacked convolutional layers in Table 3, which shows the fast execution of a dynamic convolutional layer. Thus, using a dynamic convolutional layer in a CNN is reasonable and valid for image denoising in this paper. Besides, we use a convolutional layer in the DCB to extract useful information; its effectiveness is tested by comparing MWDCNN and MWDCNN without a convolution in a DCB in Table 1.

Table 1: PSNR (dB) of different methods on CBSD68 for σ=25.
Methods PSNR
MWDCNN (Ours) 31.45
MWDCNN without a convolution in a DCB 31.42
MWDCNN without a dynamic convolution and a convolutional layer in a DCB 31.39
MWDCNN with a convolutional layer and without a DCB and a WEB 31.35
MWDCNN with a convolutional layer and FE without a DCB and a WEB 31.34
MWDCNN with a convolutional layer and without DCB and two WEBs 31.33
MWDCNN with a convolutional layer and without DCB, two WEBs and two ReLUs in RDC 31.31
MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs and two RLOs in RDC 31.30
MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs, two RLOs and a RDB in RDC 31.20
MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs, two RLOs, a RDB and a convolutional layer in RDC 31.12
The combination of six stacked convolutional layers 30.83
The combination of three stacked convolutional layers and a dynamic convolutional layer 30.85
MWDCNN without two WEBs 31.35
MWDCNN with a convolutional layer and without an RDC 31.32
MWDCNN without ReLUs 31.44
Table 2: Complexities of different methods on CBSD68 for σ=25.
Methods Parameters Flops
The combination of six stacked convolutional layers 0.212M 5.59G
The combination of three stacked convolutional layers and a dynamic convolutional layer 0.498M 2.81G
Table 3: Run-time of different methods on a noisy image of 1024×1024 from CBSD68 for σ=25.
Methods Run-time
The combination of six stacked convolutional layers 0.060s
The combination of three stacked convolutional layers and a dynamic convolutional layer 0.046s
Table 4: PSNR(dB) and SSIM of MWDCNN with different loss functions on CBSD68 for σ=25\sigma=25.
Loss function PSNR SSIM
MSE[2] 31.45 0.8925
Charbonnier[4] 31.43 0.8918
MSE+Pearson[3] 31.42 0.8919

Wavelet transform enhancement block: Since images can be treated as signals, signal processing techniques such as the wavelet transform are effective for low-level tasks [7]. Besides, the combination of a CNN and the wavelet transform is very effective for low-level vision tasks, as reviewed in Section 2.3. Inspired by this, we fuse wavelet transform techniques into a CNN for image denoising. Differing from the methods in Section 2.3, we embed a network architecture into the wavelet transform to fuse frequency features and structural information via a feature enhancement mechanism and the DWT, as described in Section 3.4. The effectiveness of the wavelet transform technique is tested by comparing MWDCNN with a convolutional layer and without a DCB and a WEB against MWDCNN with a convolutional layer and FE without a DCB and a WEB in Table 1, which shows the effectiveness of the signal processing technique (the wavelet technique) in MWDCNN. MWDCNN with a convolutional layer and FE without a DCB and a WEB has a higher PSNR than MWDCNN with a convolutional layer and without DCB and two WEBs in Table 1, which shows the effectiveness of FE (discriminative learning) for image denoising. To extract more robust features, we stack two WEBs in the CNN, as described in Section 3.4; their effectiveness is verified by comparing MWDCNN without a dynamic convolution and a convolutional layer in a DCB, MWDCNN with a convolutional layer and without a DCB and a WEB, and MWDCNN with a convolutional layer and without DCB and two WEBs, as listed in Table 1. Besides, MWDCNN is superior to MWDCNN without two WEBs in Table 1, which verifies the effectiveness of combining signal processing and discriminative learning techniques in MWDCNN. According to the mentioned illustrations, two stacked WEBs in MWDCNN are effective and reasonable for image denoising.

Residual block: According to Ref. [1], stacked convolutional layers can refine obtained features. To enhance the robustness of the denoiser and relieve the dependency problem of deep networks, we propose a 10-layer residual block (RB) with improved residual dense architectures to improve the quality of predicted images and reconstruct images, as follows. Firstly, we use two stacked RDBs (each with three 5×5 convolutional kernels) to refine the features from the stacked WEBs; their effectiveness is tested by comparing MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs and two RLOs in RDC against MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs, two RLOs and a RDB in RDC in Table 1. Secondly, a ReLU follows each RDB to make the obtained features non-linear; its effectiveness is verified by comparing MWDCNN with a convolutional layer and without DCB and two WEBs against MWDCNN with a convolutional layer and without DCB, two WEBs and two ReLUs in RDC, as presented in Table 1. Thirdly, to address the long-term dependency of deep networks, we use two residual operations to fuse the features from the second RDB+ReLU combination and the second WEB; MWDCNN with a convolutional layer and without DCB, two WEBs and two ReLUs in RDC outperforms MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs and two RLOs in RDC in PSNR, as listed in Table 1, which shows its effectiveness. Fourthly, we use a 5×5 convolutional layer to prevent over-enhancement from the previous steps; MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs, two RLOs and a RDB in RDC obtains a higher PSNR than MWDCNN with a convolutional layer and without DCB, two WEBs, two ReLUs, two RLOs, a RDB and a convolutional layer in RDC, which tests the effectiveness of this convolutional layer. According to the mentioned illustrations, the designed block is reasonable and valid for image denoising. Also, MWDCNN obtains better denoising performance than MWDCNN with a convolutional layer and without an RDC in Table 1, which verifies the effectiveness of the RDC in MWDCNN. Besides, ReLUs, as important components of MWDCNN, convert linear features into non-linear ones; their effectiveness is verified by comparing MWDCNN and MWDCNN without ReLUs in Table 1.

In terms of choosing the loss function, we take the following three aspects into consideration. The first aspect follows previous denoising work and uses the MSE loss of popular denoising methods as the objective function. The second aspect fuses the Charbonnier loss of the newest Transformer-based denoising method into the proposed MWDCNN to test denoising performance. The third aspect exploits the Pearson loss of a typical quality assessment method to test denoising performance from the perspective of perceptual theory. As shown in Table 4, the proposed method with MSE is superior to the proposed method with the Charbonnier loss, which suggests that MSE suits a lightweight denoising CNN, while the Charbonnier loss [18] is more useful for large denoising models, e.g., Transformers. Also, using only MSE outperforms the combination of MSE and the Pearson loss [3] in PSNR and SSIM, which suggests that fusing two different losses may cause a negative effect. Thus, we choose MSE as the loss function to train the MWDCNN denoiser.
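For completeness, a sketch of the Charbonnier loss [4, 18] compared in Table 4; the smoothing constant ε = 1e-3 is a common choice and an assumption here:

```python
import torch

def charbonnier_loss(pred: torch.Tensor, clean: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier (smooth, L1-like) loss: mean of sqrt(diff^2 + eps^2)."""
    return torch.sqrt((pred - clean) ** 2 + eps ** 2).mean()
```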

In summary, the dynamic convolutional block with a dynamic convolution has strong adaptability to different scenes and is very competitive in terms of denoising performance and computational costs. To further improve denoising performance, the wavelet transform enhancement block combines a signal processing technique with a discriminative learning technique by fusing the wavelet transform and a CNN. To improve the robustness of the MWDCNN denoising model, the residual block uses a stacked architecture to refine the obtained information. According to the mentioned illustrations, the key components, i.e., the dynamic convolutional block, the wavelet transform enhancement block, the residual block and the chosen loss function, are reasonable and valid, which makes MWDCNN reasonable and valid for image denoising.

Table 5: Average PSNR (dB) of different methods on BSD68 with noise levels of 15, 25 and 50.
Methods 15 25 50
BM3D[8] 31.07 28.57 25.62
WNNM[11] 31.37 28.83 25.87
TNRD[6] 31.42 28.92 25.97
DnCNN[35] 31.72 29.23 26.23
FFDNet[36] 31.63 29.19 26.29
ADNet[29] 31.74 29.25 26.29
CDNet[23] 31.74 29.28 26.36
MWDCNN (Ours) 31.77 29.28 26.29
MWDCNN-B (Ours) 31.39 29.16 26.20
Table 6: Average PSNR (dB) of different methods on Set12 with noise levels of 15, 25 and 50.
Images C.man House Peppers Starfish Monarch Airplane Parrot Lena Barbara Boat Man Couple Average
noise level 15
BM3D[8] 31.91 34.93 32.69 31.14 31.85 31.07 31.37 34.26 33.10 32.13 31.92 32.10 32.37
WNNM[11] 32.17 35.13 32.99 31.82 32.71 31.39 31.62 34.27 33.60 32.27 32.11 32.17 32.70
TNRD[6] 32.19 34.53 33.04 31.75 32.56 31.46 31.63 34.24 32.13 32.14 32.23 32.11 32.50
DnCNN[35] 32.61 34.97 33.30 32.20 33.09 31.70 31.83 34.62 32.64 32.42 32.46 32.47 32.86
FFDNet[36] 32.43 35.07 33.25 31.99 32.66 31.57 31.81 34.62 32.54 32.38 32.41 32.46 32.77
GNLM[15] —— 35.01 32.98 —— —— —— —— 33.89 —— 31.69 31.94 —— ——
MWDCNN(Ours) 32.53 35.09 33.29 32.28 33.20 31.74 31.97 34.64 32.65 32.49 32.46 32.52 32.91
MWDCNN-B(Ours) 32.14 34.98 33.17 31.73 33.03 31.49 31.78 34.46 31.79 32.22 32.17 32.26 32.60
noise level 25
BM3D[8] 29.45 32.85 30.16 28.56 29.25 28.42 28.93 32.07 30.71 29.90 29.61 29.71 29.97
WNNM[11] 29.64 33.22 30.42 29.03 29.84 28.69 29.15 32.24 31.24 30.03 29.76 29.82 30.26
TNRD[6] 29.72 32.53 30.57 29.02 29.85 28.88 29.18 32.00 29.41 29.91 29.87 29.71 30.06
DnCNN[35] 30.18 33.06 30.87 29.41 30.28 29.13 29.43 32.44 30.00 30.21 30.10 30.12 30.43
FFDNet[36] 30.10 33.28 30.93 29.32 30.08 29.04 29.44 32.57 30.01 30.25 30.11 30.20 30.44
DudeNet[30] 30.23 33.24 30.98 29.53 30.44 29.14 29.48 32.52 30.15 30.24 30.08 30.15 30.52
GNLM[15] —— 32.91 30.19 —— —— —— —— 31.67 —— 29.71 29.63 —— ——
MWDCNN(Ours) 30.19 33.33 30.85 29.66 30.55 29.16 29.48 32.67 30.21 30.28 30.10 30.13 30.55
MWDCNN-B(Ours) 30.02 33.14 30.75 29.29 30.28 29.02 29.36 32.51 29.90 30.17 30.05 30.14 30.39
noise level 50
BM3D[8] 26.13 29.69 26.68 25.04 25.82 25.10 25.90 29.05 27.22 26.78 26.81 26.46 26.72
WNNM[11] 26.45 30.33 26.95 25.44 26.32 25.42 26.14 29.25 27.79 26.97 26.94 26.64 27.05
TNRD[6] 26.62 29.48 27.10 25.42 26.31 25.59 26.16 28.93 25.70 26.94 26.98 26.50 26.81
DnCNN[35] 27.03 30.00 27.32 25.70 26.78 25.87 26.48 29.39 26.22 27.20 27.24 26.90 27.18
FFDNet[36] 27.05 30.37 27.54 25.75 26.81 25.89 26.57 29.66 26.45 27.33 27.29 27.08 27.32
DudeNet[30] 27.22 30.27 27.51 25.88 26.93 25.88 26.50 29.45 26.49 27.26 27.19 26.97 27.30
GNLM[15] —— 28.99 26.96 —— —— —— —— 28.49 —— 26.63 26.78 —— ——
MWDCNN(Ours) 26.99 30.58 27.34 25.85 27.02 25.93 26.48 29.63 26.60 27.23 27.27 27.11 27.34
MWDCNN-B(Ours) 27.09 30.23 27.35 25.70 26.78 25.73 26.39 29.42 26.57 27.24 27.23 26.97 27.23
Table 7: Average PSNR (dB) of different methods on CBSD68 with noise levels of 15, 25, 35, 50 and 75.
Methods 15 25 35 50 75
CBM3D[8] 33.52 30.71 28.89 27.38 25.74
DnCNN[35] 33.98 31.31 29.65 28.01 ——
FFDNet[36] 33.80 31.18 29.57 27.96 26.24
ADNet[29] 33.99 31.31 29.66 28.04 26.33
DudeNet[30] 34.01 31.34 29.71 28.09 26.40
GradNet[20] 34.07 31.39 —— 28.12 ——
MWDCNN (Ours) 34.18 31.45 29.81 28.13 26.40
MWDCNN-B (Ours) 34.10 31.44 29.80 28.15 ——
Table 8: Average PSNR (dB) of different methods on Kodak24 with noise levels of 15, 25, 35, 50 and 75.
Methods σ=15\sigma=15 σ=25\sigma=25 σ=35\sigma=35 σ=50\sigma=50 σ=75\sigma=75
CBM3D[8] 34.28 31.68 29.90 28.46 26.82
DnCNN[35] 34.73 32.23 30.64 29.02 ——
FFDNet[36] 34.55 32.11 30.56 28.99 27.25
ADNet[29] 34.76 32.26 30.68 29.10 27.40
DudeNet[30] 34.81 32.26 30.69 29.10 27.39
GradNet[20] 34.85 32.35 —— 29.23 ——
MWDCNN (Ours) 34.91 32.40 30.87 29.26 27.55
MWDCNN-B (Ours) 34.83 32.39 30.83 29.23 ——
Table 9: PSNR (dB) of different methods on real noisy images (CC).
Setting CBM3D[8] TID[21] DnCNN[35] DudeNet[30] MWDCNN (Ours)
Canon 5D ISO = 3200 39.76 37.22 37.26 36.66 36.97
36.40 34.54 34.13 36.70 36.01
36.37 34.25 34.09 35.03 34.80
Nikon D600 ISO = 3200 34.18 32.99 33.62 33.72 33.91
35.07 34.20 34.48 34.70 34.88
37.13 35.58 35.41 37.98 37.02
Nikon D800 ISO = 1600 36.81 34.49 37.95 38.10 37.93
37.76 35.19 36.08 39.15 37.49
37.51 35.26 35.48 36.14 38.44
Nikon D800 ISO = 3200 35.05 33.70 34.08 36.93 37.10
34.07 31.04 33.70 35.80 36.72
34.42 33.07 33.31 37.49 37.25
Nikon D800 ISO = 6400 31.13 29.40 29.83 31.94 32.24
31.22 29.86 30.55 32.51 32.56
30.97 29.21 30.09 32.91 32.76
Average 35.19 33.36 33.86 35.72 35.74
Table 10: FSIM results of different methods on BSD68 with noise levels of 15, 25 and 50.
Methods σ=15\sigma=15 σ=25\sigma=25 σ=50\sigma=50
DnCNN[35] 0.746 0.689 0.602
FFDNet[36] 0.745 0.690 0.602
MWDCNN (Ours) 0.747 0.691 0.603
Table 11: FSIM results of different methods on CBSD68 and Kodak24 with noise levels of 15, 25, 35, 50 and 75.
Noise Level σ=15\sigma=15 σ=25\sigma=25 σ=35\sigma=35 σ=50\sigma=50 σ=75\sigma=75
Dataset CBSD68
DnCNN-B[35] 0.784 0.737 0.701 0.657 ——
ADNet[29] 0.785 0.738 0.702 0.659 0.606
FFDNet[36] 0.782 0.733 0.695 0.648 0.593
MWDCNN (Ours) 0.788 0.742 0.707 0.664 0.615
Dataset Kodak24
DnCNN-B[35] 0.764 0.713 0.675 0.631 ——
ADNet[29] 0.768 0.717 0.678 0.635 0.585
FFDNet[36] 0.766 0.709 0.668 0.621 0.570
MWDCNN (Ours) 0.773 0.722 0.686 0.643 0.596
Table 12: SSIM results of different methods on Kodak24 with noise levels of 15, 25, 35, 50 and 75.
Noise Level σ=15\sigma=15 σ=25\sigma=25 σ=35\sigma=35 σ=50\sigma=50 σ=75\sigma=75
DnCNN-B[35] 0.9205 0.8774 0.8394 0.7896 ——
ADNet[29] 0.9239 0.8820 0.8445 0.7984 0.7387
FFDNet [36] 0.9231 0.8792 0.8409 0.7930 0.7328
MWDCNN (Ours) 0.9269 0.8862 0.8515 0.8062 0.7491
Table 13: LPIPS results of different methods on Kodak24 with noise levels of 15, 25, 35, 50 and 75.
Noise Level σ=15\sigma=15 σ=25\sigma=25 σ=35\sigma=35 σ=50\sigma=50 σ=75\sigma=75
DnCNN-B [35] 0.1719 0.2323 0.2780 0.3344 ——
ADNet [29] 0.1663 0.2256 0.2690 0.3228 0.3837
FFDNet [36] 0.1721 0.2376 0.2866 0.3408 0.4055
MWDCNN (Ours) 0.1567 0.2154 0.2579 0.3075 0.3639
Table 14: Inception-Score of different methods on Kodak24 with noise levels of 15 and 50.
Noise Level σ=15\sigma=15 σ=50\sigma=50
DnCNN-B [35] 9.68 8.97
ADNet [29] 9.41 9.31
FFDNet [36] 9.46 9.21
MWDCNN (Ours) 9.83 9.52
Figure 4: Denoising results of a gray noisy image from Set12 with σ = 15. (a) Clean image, (b) Noisy image, (c) FFDNet/31.81dB, (d) ADNet/31.96dB, (e) DnCNN/31.83dB and (f) MWDCNN (Ours)/31.97dB.

Figure 5: Denoising results of a gray noisy image from Set12 with σ = 25. (a) Clean image, (b) Noisy image, (c) FFDNet/29.32dB, (d) ADNet/29.41dB, (e) DnCNN/29.41dB and (f) MWDCNN (Ours)/29.66dB.

Figure 6: Denoising results of a color noisy image from Kodak24 with σ = 25. (a) Clean image, (b) Noisy image, (c) FFDNet/32.59dB, (d) ADNet/32.30dB, (e) DnCNN/32.53dB and (f) MWDCNN (Ours)/32.82dB.

Figure 7: Denoising results of a color noisy image from Kodak24 with σ = 35. (a) Clean image, (b) Noisy image, (c) FFDNet/31.05dB, (d) ADNet/30.24dB, (e) DnCNN/30.93dB and (f) MWDCNN (Ours)/31.20dB.

4.4 Comparisons with state-of-the-art denoising methods

In this section, we quantitatively and qualitatively analyze the denoising performance of the proposed MWDCNN. Quantitative analysis compares the PSNR, feature similarity index measure (FSIM) [37], structural similarity index measure (SSIM) [31], learned perceptual image patch similarity (LPIPS, based on deep features) [38] and Inception-Score [25] of popular denoising methods, including DnCNN [35], block-matching and 3-D filtering (BM3D) [8], FFDNet [36], ADNet [29], the dual denoising network (DudeNet) [30], the weighted nuclear norm minimization method (WNNM) [11], trainable nonlinear reaction diffusion (TNRD) [6], the complex-valued denoising network (CDNet) [23], grey theory applied in non-local means (GNLM) [15], the image gradient network (GradNet) [20] and targeted image denoising (TID) [21], on public datasets, i.e., BSD68 [14], Set12 [14], CBSD68 [14] and Kodak24 [10] for different noise levels and CC [22] for real noisy image denoising. For gray Gaussian noisy image denoising, we use BSD68 and Set12 to test the proposed MWDCNN, as shown in Tables 5 and 6. In Table 5, MWDCNN is superior to DnCNN for three noise levels (15, 25 and 50) on BSD68. MWDCNN-B, i.e., MWDCNN for blind denoising, is trained with noise levels varying from 0 to 55. Besides, the proposed MWDCNN obtains excellent PSNR results for each noisy image in Set12 for the three noise levels, as given in Table 6.
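For reference, the PSNR reported in these tables can be computed as in the following sketch (images assumed on a [0, 255] scale):

```python
import torch

def psnr(pred: torch.Tensor, clean: torch.Tensor, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a denoised image and its
    clean reference, both on the same [0, peak] scale."""
    mse = torch.mean((pred.float() - clean.float()) ** 2)
    return (10 * torch.log10(peak ** 2 / mse)).item()
```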

For color Gaussian noisy image denoising, we use CBSD68 and Kodak24 to test different methods for noise levels of 15, 25, 35, 50 and 75, as shown in Tables 7 and 8, respectively. For instance, our MWDCNN achieves the best results, exceeding ADNet (the second best) in PSNR on CBSD68 for the five mentioned noise levels in Table 7. Also, our MWDCNN is more effective than the other comparative denoising methods on Kodak24 for the five noise levels in Table 8. These results illustrate that our MWDCNN is a good tool for color Gaussian noisy image denoising.

For real noisy image denoising, we use CC to test the denoising performance of different methods on images from real digital devices. As shown in Table 9, our MWDCNN obtains a higher average PSNR than the other denoising methods, which shows that it is effective for real noisy image denoising.

To further test denoising performance in terms of low-level perceptual aspects, we report FSIM [37] on BSD68 for noise levels of 15, 25 and 50 and on CBSD68 and Kodak24 for noise levels of 15, 25, 35, 50 and 75, and SSIM [31] and LPIPS [38] on Kodak24 for noise levels of 15, 25, 35, 50 and 75. As shown in Tables 10 and 11, our MWDCNN obtains the highest FSIM for all noise levels, verifying its excellent denoising performance. As shown in Tables 12 and 13, our MWDCNN is superior to other popular denoising methods, i.e., DnCNN-B, ADNet and FFDNet, in terms of SSIM and LPIPS. In terms of high-level metrics, we choose the Inception-Score [25]; as shown in Table 14, our method is more effective than DnCNN-B, ADNet and FFDNet at both low and high noise levels. The best denoising results in Tables 5 to 14 are those of MWDCNN in nearly all settings.

Because improvements in denoising are hard to obtain, a gain of 0.01 dB for a denoiser is considered meaningful [28, 35]. Thus, although MWDCNN offers only small PSNR improvements over the second-best methods at certain noise levels for gray and color Gaussian denoising, it is very competitive for image denoising. Also, the proposed MWDCNN is very effective for real noisy image denoising. Besides, our MWDCNN performs well on low-level metrics, i.e., FSIM, SSIM and LPIPS, and on a high-level metric, i.e., Inception-Score. These results illustrate that the proposed MWDCNN is effective and robust for image denoising in terms of quantitative analysis.

In terms of qualitative analysis, we choose popular denoising methods, i.e., FFDNet, ADNet, DnCNN and our MWDCNN, to obtain predicted images and compare visual denoising performance. For each predicted image, one area is selected and amplified as the observation area; the clearer the observation area, the more effective the corresponding denoising method. For gray noisy image denoising, we choose two images from Set12 to conduct visual experiments at noise levels of 15 and 25, respectively. As shown in Fig. 4 and Fig. 5, the result of our MWDCNN is clearer than those of the other denoising methods in the observation areas, showing that it is more effective from a visual perspective for gray image denoising. For color noisy image denoising, we choose two images from Kodak24 to conduct visual experiments at noise levels of 25 and 35, respectively. As shown in Fig. 6 and Fig. 7, the result of our MWDCNN is again clearer than those of the other denoising methods in the observation areas, illustrating that it is more effective for color image denoising. According to the mentioned descriptions, the proposed MWDCNN is superior to other popular denoising methods in qualitative analysis. In summary, our MWDCNN is a good choice for image denoising according to both quantitative and qualitative analysis.

According to the analysis in Sections 4.3 and 4.4, the use of a dynamic convolution gives the proposed MWDCNN strong adaptability and competitive computational costs for different scenes. Combining a signal processing technique with a discriminative learning technique, i.e., fusing the wavelet transform and a CNN, gives the proposed MWDCNN better denoising performance. The use of stacked architectures makes our MWDCNN robust for image denoising. Thus, the proposed MWDCNN is suitable for complex scenes on portable digital devices, i.e., smartphones and cameras.

5 Conclusion

In this paper, we propose a multi-stage image denoising CNN with the wavelet transform, MWDCNN. The first stage of MWDCNN uses a linear combination of several convolutional kernels to dynamically adjust parameters according to different noisy images, rather than using the same convolution parameters for all images, which makes a tradeoff between denoising performance and computational cost and is very useful for noisy images with unknown noise. The second stage of MWDCNN fuses frequency features and structural features via two combinations of the wavelet transform and a residual dense block to suppress noise, which enhances the robustness of the denoiser for complex scenes. The third stage of MWDCNN uses improved residual dense architectures to remove redundant features and improve denoising performance. The proposed method is very beneficial for damaged images with unknown noise. However, it depends on a supervised manner to train the denoising model, and it is difficult to obtain clean reference images for noisy images collected by cameras. Thus, in the future we will construct clean reference images to train a blind denoiser according to the properties of noisy images.

Acknowledgments

This work was supported in part by National Natural Science Foundation of China under Grant 62201468, in part by the China Postdoctoral Science Foundation Grant 2022TQ0259, in part by the Jiangsu Provincial Double–Innovation Doctor Program Grant JSSCBC20220942.

References

  • Ahn et al. [2018] Ahn, N., Kang, B., Sohn, K.A., 2018. Image super-resolution via progressive cascading residual network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 791–799.
  • Allen [1971] Allen, D.M., 1971. Mean square error of prediction as a criterion for selecting variables. Technometrics 13, 469–475.
  • Ayyoubzadeh and Royat [2021] Ayyoubzadeh, S.M., Royat, A., 2021. (ASNA) An attention-based Siamese-difference neural network with surrogate ranking loss function for perceptual image quality assessment, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 388–397.
  • Charbonnier et al. [1994] Charbonnier, P., Blanc-Feraud, L., Aubert, G., Barlaud, M., 1994. Two deterministic half-quadratic regularization algorithms for computed imaging, in: Proceedings of 1st International Conference on Image Processing, IEEE. pp. 168–172.
  • Chen et al. [2020] Chen, Y., Dai, X., Liu, M., Chen, D., Yuan, L., Liu, Z., 2020. Dynamic convolution: Attention over convolution kernels, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11030–11039.
  • Chen and Pock [2016] Chen, Y., Pock, T., 2016. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence 39, 1256–1272.
  • Cho and Bui [2005] Cho, D., Bui, T.D., 2005. Multivariate statistical modeling for image denoising using wavelet transforms. Signal Processing: Image Communication 20, 77–89.
  • Dabov et al. [2007] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K., 2007. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing 16, 2080–2095.
  • Feng et al. [2021] Feng, X., Zhang, W., Su, X., Xu, Z., 2021. Optical remote sensing image denoising and super-resolution reconstructing using optimized generative network in wavelet transform domain. Remote Sensing 13, 1858.
  • Franzen, R., 1999. Kodak lossless true color image suite. Source: http://r0k.us/graphics/kodak.
  • Gu et al. [2014] Gu, S., Zhang, L., Zuo, W., Feng, X., 2014. Weighted nuclear norm minimization with application to image denoising, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2862–2869.
  • Guo et al. [2017] Guo, T., Seyed Mousavi, H., Huu Vu, T., Monga, V., 2017. Deep wavelet prediction for image super-resolution, in: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 104–113.
  • Kingma and Ba [2014] Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 .
  • Li et al. [2013] Li, H., Cai, J., Nguyen, T.N.A., Zheng, J., 2013. A benchmark for semantic image segmentation, in: 2013 IEEE International Conference on Multimedia and Expo (ICME), IEEE. pp. 1–6.
  • Li and Suen [2016] Li, H., Suen, C.Y., 2016. A novel non-local means image denoising method based on grey theory. Pattern Recognition 49, 237–248.
  • Li et al. [2022] Li, P., Liang, J., Zhang, M., Fan, W., Yu, G., 2022. Joint image denoising with gradient direction and edge-preserving regularization. Pattern Recognition 125, 108506.
  • Li et al. [2021] Li, Y., Chen, Y., Dai, X., Liu, M., Chen, D., Yu, Y., Yuan, L., Liu, Z., Chen, M., Vasconcelos, N., 2021. Revisiting dynamic convolution via matrix decomposition. arXiv preprint arXiv:2103.08756 .
  • Liang et al. [2021] Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R., 2021. Swinir: Image restoration using swin transformer, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844.
  • Liu [2015] Liu, Y., 2015. Image denoising method based on threshold, wavelet transform and genetic algorithm. International Journal of Signal Processing, Image Processing and Pattern Recognition 8, 29–40.
  • Liu et al. [2020] Liu, Y., Anwar, S., Zheng, L., Tian, Q., 2020. Gradnet image denoising, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 508–509.
  • Luo et al. [2015] Luo, E., Chan, S.H., Nguyen, T.Q., 2015. Adaptive image denoising by targeted databases. IEEE transactions on image processing 24, 2167–2181.
  • Nam et al. [2016] Nam, S., Hwang, Y., Matsushita, Y., Kim, S.J., 2016. A holistic approach to cross-channel image noise modeling and its application to image denoising, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1683–1691.
  • Quan et al. [2021] Quan, Y., Chen, Y., Shao, Y., Teng, H., Xu, Y., Ji, H., 2021. Image denoising using complex-valued deep cnn. Pattern Recognition 111, 107639.
  • Rabbani [2009] Rabbani, H., 2009. Image denoising in steerable pyramid domain based on a local laplace prior. Pattern Recognition 42, 2181–2193.
  • Salimans et al. [2016] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., 2016. Improved techniques for training gans. Advances in neural information processing systems 29.
  • Sun et al. [2021] Sun, X., Chen, C., Wang, X., Dong, J., Zhou, H., Chen, S., 2021. Gaussian dynamic convolution for efficient single-image segmentation. IEEE Transactions on Circuits and Systems for Video Technology .
  • Tai et al. [2017] Tai, Y., Yang, J., Liu, X., Xu, C., 2017. Memnet: A persistent memory network for image restoration, in: Proceedings of the IEEE international conference on computer vision, pp. 4539–4547.
  • Tian et al. [2020a] Tian, C., Fei, L., Zheng, W., Xu, Y., Zuo, W., Lin, C.W., 2020a. Deep learning on image denoising: An overview. Neural Networks 131, 251–275.
  • Tian et al. [2020b] Tian, C., Xu, Y., Li, Z., Zuo, W., Fei, L., Liu, H., 2020b. Attention-guided cnn for image denoising. Neural Networks 124, 117–129.
  • Tian et al. [2021] Tian, C., Xu, Y., Zuo, W., Du, B., Lin, C.W., Zhang, D., 2021. Designing and training of a dual cnn for image denoising. Knowledge-Based Systems 226, 106949.
  • Wang et al. [2004] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 600–612.
  • Wang and Zhang [1999] Wang, Z., Zhang, D., 1999. Progressive switching median filter for the removal of impulse noise from highly corrupted images. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 46, 78–80.
  • Xu et al. [2018] Xu, J., Li, H., Liang, Z., Zhang, D., Zhang, L., 2018. Real-world noisy image denoising: A new benchmark. arXiv preprint arXiv:1804.02603 .
  • Yang and Wang [2021] Yang, H., Wang, Y., 2021. An effective and comprehensive image super resolution algorithm combined with a novel convolutional neural network and wavelet transform. IEEE Access 9, 98790–98799.
  • Zhang et al. [2017] Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L., 2017. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE transactions on image processing 26, 3142–3155.
  • Zhang et al. [2018a] Zhang, K., Zuo, W., Zhang, L., 2018a. Ffdnet: Toward a fast and flexible solution for cnn-based image denoising. IEEE Transactions on Image Processing 27, 4608–4622.
  • Zhang et al. [2011] Zhang, L., Zhang, L., Mou, X., Zhang, D., 2011. Fsim: A feature similarity index for image quality assessment. IEEE transactions on Image Processing 20, 2378–2386.
  • Zhang et al. [2018b] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O., 2018b. The unreasonable effectiveness of deep features as a perceptual metric, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595.
  • Zhang et al. [2018c] Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y., 2018c. Residual dense network for image super-resolution, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2472–2481.
  • Zhang et al. [2020] Zhang, Y., Zhang, J., Wang, Q., Zhong, Z., 2020. Dynet: Dynamic convolution for accelerating convolutional neural networks. arXiv preprint arXiv:2004.10694 .