
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

We thank reviewers for their constructive feedback. Clarifications and new results will be added in the final version.

(R-wUV2-C1): We agree that the alternating optimization scheme between kernel and image is widely applied and fundamental in this area; however, most existing methods rely on supervised learning with a necessary pre-training phase. The main novelty and contribution of this paper is the proposed random kernel prior learning module, which for the first time breaks with the data-dominated approach to blind SR. Unlike existing learning-based alternating optimization methods, our kernel estimator is trained during inference in a plug-and-play way, without any labelled data or cumbersome pre-training/re-training requirements. This is realized by a carefully designed network-based Langevin dynamics optimization strategy, in which the random kernel prior and the LR-image-based data consistency term are jointly integrated to improve the convergence of kernel estimation, instead of relying on a pre-trained data prior [25] or manually designed modeling [49]. Benefiting from these two merits, DKP achieves superior performance and significant flexibility across arbitrary kernel categories; we therefore believe the novelty is noteworthy.
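To make the interplay of the two terms concrete, a single Langevin-dynamics update combining a prior gradient with a data-consistency gradient can be sketched as follows. This is a minimal NumPy illustration: the gradient callables, the step size, and the fact that the paper's actual update acts through a kernel network are all simplifications, not the paper's implementation.

```python
import numpy as np

def langevin_step(k, grad_log_prior, grad_data_fidelity, step=1e-4, rng=None):
    """One Langevin-dynamics update on a blur-kernel estimate k.

    The drift follows the gradient of the log kernel prior minus the
    gradient of the LR data-consistency term; isotropic Gaussian noise
    keeps the chain exploring instead of collapsing to a single mode.
    """
    rng = np.random.default_rng() if rng is None else rng
    drift = grad_log_prior(k) - grad_data_fidelity(k)
    noise = np.sqrt(step) * rng.standard_normal(k.shape)
    return k + 0.5 * step * drift + noise
```

Iterating this step draws kernel samples that balance the random kernel prior against consistency with the LR observation.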

(R-wUV2-C2, R-7Fwy-J2, R-5pKG-C3): The runtime cost of random kernel sampling/generation is essentially negligible (<<1s). The details of the random kernel (both Gaussian and motion) sampling/generation processes are given in supplementary Sec 2.1. The low cost of obtaining the random kernel prior stems from the model-based explicit generating functions, which only require random sampling of a few hyper-parameters (e.g., a 2-dimensional Gaussian needs only a mean and a covariance). This also highlights a merit of the DKP model. We provide detailed computational cost comparisons with DIP-based alternating methods in Table 1. Clearly, the proposed DKP model adds no computational burden over existing methods, and it removes the pre-training phase (6 hours for FKP). The total FLOPs (image FLOPs + kernel FLOPs) of the DIP-based methods are also comparable.
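As an illustration of why such explicit generation is nearly free, a random anisotropic Gaussian kernel can be sampled by drawing only three scalars (two axis standard deviations and a rotation angle) and evaluating a closed-form density. The function name, kernel size, and sigma range below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sample_gaussian_kernel(size=21, sigma_range=(0.7, 5.0), rng=None):
    """Sample a random anisotropic Gaussian blur kernel.

    Only a few hyper-parameters are drawn at random, so generation
    costs a handful of array operations rather than network inference.
    """
    rng = np.random.default_rng() if rng is None else rng
    sx, sy = rng.uniform(*sigma_range, size=2)   # random axis std-devs
    theta = rng.uniform(0, np.pi)                # random rotation angle
    # Build the 2x2 covariance matrix from (sx, sy, theta).
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = R @ np.diag([sx**2, sy**2]) @ R.T
    # Evaluate the unnormalized Gaussian density on the kernel grid.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)[..., None]       # (H, W, 2, 1)
    inv = np.linalg.inv(Sigma)
    expo = -0.5 * (coords.transpose(0, 1, 3, 2) @ inv @ coords)[..., 0, 0]
    kernel = np.exp(expo)
    return kernel / kernel.sum()                 # normalize to sum to 1
```

Motion kernels can be generated analogously from a random trajectory, again requiring only a few random draws.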

(R-wUV2-C3): Thanks for your recommendations. The simulation results for the Gaussian and motion kernel scenarios are given in Table 2. All compared models are taken from the original papers and re-trained for fair comparison. The results and analysis of the mentioned methods will be added in the final version.

(R-wUV2-C4): We note that in all experiments of this paper, the degradations are set to be unknown, and all degradation formulations are equivalent to those of the recommended blind SR methods (IKC, DAN, DCLS). Please refer to Sec 1, lines 30-32, Sec 3, lines 165-176 and Sec 5.1, lines 378-385 for the explicit definition and illustration. The real-world dataset RealSRSet [23] has been used to verify the effectiveness of DKP, and the real-world results are presented in manuscript Fig. 5 line 3 and supplementary Fig. 7. In all real-world scenarios, DKP achieves superior performance over the counterparts.

(R-7Fwy-C1, J3): The necessity of RKS and PKE is discussed in P2, left column, lines 87-90. We highlight that the RKS module plays the role of generating random kernel priors, while PKE estimates blur kernels on the basis of the random kernel priors and LR observations; PKE is therefore indispensable, and we did not ablate its absence. To ensure that the random kernel priors converge during optimization, Eq. (6) is designed to re-weight the randomly sampled kernels, allowing the MCMC process to converge to kernels that minimize the LR reconstruction error. To strengthen the ablation study, we have added an ablation over different kernel network structures in the PKE module in Table 3.
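The re-weighting idea can be sketched as a softmax over negative LR reconstruction errors, so that data-consistent kernels dominate the sampling process. This is an illustrative stand-in for Eq. (6), not its exact form; the temperature parameter is an assumption.

```python
import numpy as np

def reweight_kernels(errors, temperature=1.0):
    """Re-weight sampled kernels by their LR reconstruction error.

    Kernels that reproduce the LR observation well receive larger
    weights (softmax over negative error), steering the kernel
    sampling process toward data-consistent kernels.
    """
    errors = np.asarray(errors, dtype=float)
    logits = -errors / temperature
    logits -= logits.max()            # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()                # weights are non-negative, sum to 1
```

Lower temperature concentrates the weights on the best-fitting kernels; higher temperature keeps the prior's diversity.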

(R-7Fwy-C2): We agree and will revise it for consistency.

(R-7Fwy-J1): We agree that these two terms in Eq. 5 are usually combined in statistics, as they refer to the same $\bm{k}^{l}_{r}$. In this paper, $p(\bm{k}_{r}^{l}|\bm{x}^{t-1},\bm{y})$ denotes the kernel re-weighting (Eqs. 6 and 7), while $p(\bm{k}^{l}_{r}|\bm{\Sigma}_{r})$ denotes the kernel generation (Eqs. 3 and 4); we therefore keep them separate for clearer presentation.

Table 1: Computational costs and FLOPs of different methods.
Methods | Pre-train | Runtime (total) | Model size (total) | FLOPs/iter (image side) | FLOPs/iter (kernel side)
DoubleDIP | – | ≈91s | 2.25M + 641K | ≈18.9G | ≈600K
FKP-DIP | ≈6h | ≈90s | 2.25M + 143K | ≈18.9G | ≈178K
DIP-DKP (Ours) | – | ≈92s | 2.25M + 562K | ≈18.9G | ≈536K
Table 2: Average PSNR/SSIM of different methods on public datasets synthesized with random Gaussian (top) / random motion (bottom) kernels, scale factor s=4.
Method | Set5 | Set14 | BSD100 | Urban100
Random Gaussian kernels:
IKC (2019) | 27.01/0.7895 | 25.21/0.6581 | 24.98/0.6106 | 22.57/0.6421
DAN-v1 (2020) | 27.34/0.7930 | 25.47/0.6601 | 25.12/0.6132 | 22.75/0.6441
DAN-v2 (2021) | 27.41/0.7936 | 25.62/0.6621 | 25.31/0.6153 | 22.71/0.6442
DCLS (2022) | 27.50/0.7948 | 25.68/0.6639 | 25.34/0.6169 | 22.92/0.6475
DIP-DKP (Ours) | 28.43/0.8139 | 26.34/0.6968 | 25.98/0.6611 | 23.24/0.6644
Diff-DKP (Ours) | 29.44/0.8592 | 26.76/0.7400 | 26.63/0.7057 | 23.92/0.6875
Random motion kernels:
IKC (2019) | 24.01/0.7227 | 23.96/0.6118 | 22.29/0.5816 | 20.09/0.5453
DAN-v1 (2020) | 24.11/0.7244 | 24.03/0.6131 | 22.40/0.5829 | 20.16/0.5462
DAN-v2 (2021) | 24.43/0.7276 | 24.18/0.6166 | 22.55/0.5878 | 20.27/0.5490
DCLS (2022) | 24.78/0.7323 | 24.38/0.6211 | 22.74/0.5922 | 20.49/0.5534
DIP-DKP (Ours) | 25.30/0.7417 | 24.52/0.6434 | 23.02/0.6136 | 21.24/0.5667
Diff-DKP (Ours) | 28.74/0.8313 | 26.03/0.6719 | 24.10/0.6287 | 22.26/0.5862
Table 3: Ablation on the kernel network structure in the PKE module (Set5, ×4, image PSNR/SSIM).
Layers \ Units per layer | 10 | 100 | 1000 | 10000
1 | 13.75/0.3211 | 23.57/0.7452 | 28.93/0.8285 | 28.24/0.8118
3 | 13.61/0.3183 | 28.97/0.8299 | 28.48/0.8135 | 28.35/0.8119
5 | 13.30/0.3164 | 28.81/0.8269 | 28.52/0.8127 | 26.65/0.7847
7 | 13.86/0.3407 | 28.30/0.8172 | 28.54/0.8125 | 27.93/0.8075

(R-5pKG-C2): We agree that FKP is a pre-trained model; however, it still needs to alternately optimize its latent variable (input noise) on test images [25]. Compared with the deep network used in FKP (5 layers), DKP only uses a 2-layer lightweight network for kernel estimation, thus allowing comparable runtime. This is also verified by the FLOPs comparison in Table 1.
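For intuition on why a 2-layer kernel estimator is cheap, its forward pass reduces to two matrix products. The sketch below is a hypothetical stand-in with illustrative dimensions and a softmax output (so kernel entries are non-negative and sum to 1); it is not the paper's actual network.

```python
import numpy as np

def kernel_net_forward(z, W1, b1, W2, b2, kernel_size=21):
    """Forward pass of a hypothetical 2-layer kernel estimator.

    Maps a latent vector z to a normalized blur kernel: one
    linear + ReLU hidden layer, one linear output layer, softmax
    so the predicted kernel entries are non-negative and sum to 1.
    """
    h = np.maximum(0.0, z @ W1 + b1)      # layer 1: linear + ReLU
    logits = h @ W2 + b2                  # layer 2: linear
    logits -= logits.max()                # numerical stability
    k = np.exp(logits)
    k /= k.sum()                          # softmax normalization
    return k.reshape(kernel_size, kernel_size)
```

With hidden widths in the low hundreds, the per-iteration FLOPs of such a network are dwarfed by the image-side DIP, consistent with the kernel-side columns in Table 1.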

(R-5pKG-C4): The fundamental concept of this paper is to break with data-dominated methods in blind SR, thus making existing models feasible for applications where pre-training is impractical, such as high-speed targets (e.g., satellites, aircraft) and medical imaging (e.g., a beating heart). We agree with your comment and will be happy to explore this in future work.