A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
We thank the reviewers for their constructive feedback. Clarifications and new results will be added in the final version.
(R-wUV2-C1): We agree that the alternating optimization scheme between kernel and image is widely applied and fundamental in this area; however, most existing variants rely on supervised learning and require a pre-training phase. The main novelty and contribution of this paper is the proposed random kernel prior learning module, which for the first time breaks with the data-dominated paradigm in blind SR. Different from existing learning-based alternating optimization methods, our kernel estimator is trained at test time in a plug-and-play manner, without any labelled data or cumbersome pre-training/re-training requirements. This is realized by a carefully designed network-based Langevin dynamics optimization strategy, in which the random kernel prior and LR-image-based data consistency are jointly integrated to improve the convergence of kernel estimation, instead of relying on a pre-trained data prior [25] or manually designed modeling [49]. Benefiting from these two merits, DKP achieves superior performance and significant flexibility over arbitrary kernel categories, and we therefore believe the novelty is noteworthy.
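For intuition, below is a minimal sketch of a generic Langevin-style kernel update that combines a prior term with LR data consistency; the isotropic prior, the step size, and the `degrade` operator are illustrative assumptions, not the exact formulation in the paper.

```python
import torch

def langevin_kernel_step(k, lr_img, sr_img, degrade, step=1e-3, prior_w=1e-2):
    """One generic Langevin update for a blur kernel k (illustrative sketch).

    Follows the standard Langevin dynamics form
        k <- k + (step / 2) * grad log p(k | lr_img) + sqrt(step) * z,  z ~ N(0, I),
    where log p(k | lr_img) combines LR data consistency with a simple prior term.
    """
    k = k.detach().clone().requires_grad_(True)
    # Data consistency: the SR estimate degraded by k should match the LR input.
    data_term = torch.sum((degrade(sr_img, k) - lr_img) ** 2)
    # Stand-in prior term (the paper uses the random kernel prior instead).
    prior_term = prior_w * torch.sum(k ** 2)
    grad, = torch.autograd.grad(-(data_term + prior_term), k)
    with torch.no_grad():
        k_new = k + 0.5 * step * grad + (step ** 0.5) * torch.randn_like(k)
        k_new = torch.clamp(k_new, min=0)
        k_new = k_new / k_new.sum()  # keep the kernel non-negative and normalized
    return k_new.detach()
```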
(R-wUV2-C2, R-7Fwy-J2, R-5pKG-C3): The runtime cost of random kernel sampling/generation is very small (about 1s). The details of the random kernel (both Gaussian and motion) sampling/generation processes are given in supplementary Sec 2.1. The low cost of obtaining the random kernel prior comes from the model-based explicit generating functions, which only require random sampling of a few hyper-parameters (e.g., a 2D Gaussian kernel only needs a mean and a variance). This also highlights a merit of the DKP model. We provide detailed computational cost comparisons with DIP-based alternating methods in Table 1. The proposed DKP model clearly does not increase the computational burden compared to existing methods, and it removes the pre-training phase (6 hours for FKP). The total FLOPs (image FLOPs + kernel FLOPs) of the DIP-based methods are also comparable.
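To illustrate why the sampling is cheap, here is a minimal sketch of drawing a random anisotropic Gaussian kernel from a handful of scalars; the kernel size, sigma range, and rotation handling are illustrative assumptions, not the exact settings of supplementary Sec 2.1.

```python
import numpy as np

def sample_random_gaussian_kernel(size=21, sigma_range=(0.7, 5.0)):
    """Sample a random anisotropic Gaussian blur kernel.

    Only a few scalars are drawn (two sigmas and a rotation angle),
    so the cost is negligible compared to a network forward pass.
    """
    sig_x, sig_y = np.random.uniform(*sigma_range, size=2)
    theta = np.random.uniform(0, np.pi)
    # Build the 2x2 covariance from the sigmas and the rotation.
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    cov = rot @ np.diag([sig_x ** 2, sig_y ** 2]) @ rot.T
    # Evaluate the Gaussian density on the kernel grid.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)
    expo = np.einsum('...i,ij,...j->...', coords, np.linalg.inv(cov), coords)
    kernel = np.exp(-0.5 * expo)
    return kernel / kernel.sum()
```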
(R-wUV2-C3): Thank you for the recommendations. The simulation results for the Gaussian and motion kernel scenarios are given in Table 2. All compared models are taken from the original papers and re-trained for a fair comparison. The results and analysis of the mentioned methods will be added in the final version.
(R-wUV2-C4): We note that in all experiments of this paper the degradations are unknown, and all degradation formulations are equivalent to those of the recommended blind SR methods (IKC, DAN, DCLS). Please refer to Sec 1, lines 30-32, Sec 3, lines 165-176 and Sec 5.1, lines 378-385 for the explicit definition and illustration. The real-world dataset RealSRSet [23] has been used to verify the effectiveness of DKP, and the real-world results are presented in manuscript Fig. 5 (row 3) and supplementary Fig. 7. In all real-world scenarios, DKP outperforms the counterparts.
(R-7Fwy-C1, J3): The necessity of RKS and PKE has been discussed in P2, left column, lines 87-90. We highlight that the RKS module generates the random kernel priors, while PKE estimates the blur kernel on the basis of those priors and the LR observation; PKE is therefore indispensable and we did not ablate its absence. To ensure that the random kernel priors converge during optimization, Eq. (6) is designed to re-weight the randomly sampled kernels, allowing the MCMC process to converge to kernels that minimize the LR reconstruction error (see the sketch below). To strengthen the ablation study, we have added an ablation on different kernel-network structures in the PKE module in Table 3.
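As a rough illustration of this re-weighting idea, the sketch below assigns softmax weights to candidate kernels according to their LR reconstruction errors; the softmax form, the temperature, and the `degrade` operator are illustrative assumptions rather than the exact Eq. (6).

```python
import torch

def reweight_kernels(kernels, lr_img, sr_img, degrade, temperature=0.1):
    """Re-weight candidate kernels by LR data consistency (illustrative sketch).

    Kernels whose degradation of the current SR estimate better matches the
    LR observation receive larger weights, steering the sampling process
    toward kernels that minimize the LR reconstruction error.
    """
    errors = torch.stack([
        torch.mean((degrade(sr_img, k) - lr_img) ** 2) for k in kernels
    ])
    weights = torch.softmax(-errors / temperature, dim=0)
    # One possible use of the weights: a convex combination of the candidates.
    combined = sum(w * k for w, k in zip(weights, kernels))
    return combined / combined.sum(), weights
```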
(R-7Fwy-C2): We agree and will revise it for consistency.
(R-7Fwy-J1): We agree that the two terms in Eq. 5 are usually combined in statistics, since they refer to the same quantity. In this paper, one term denotes the kernel re-weighting (Eqs. 6 and 7) and the other denotes the kernel generation (Eqs. 3 and 4), so we keep them separate for clearer presentation.
Table 1: Computational cost comparison with DIP-based alternating methods.

Methods | Pre-train | Runtime (total) | Model size (total) | FLOPs/iter (image side) | FLOPs/iter (kernel side)
---|---|---|---|---|---
DoubleDIP | ✗ | 91s | 2.25M + 641K | 18.9G | 600K
FKP-DIP | 6h | 90s | 2.25M + 143K | 18.9G | 178K
DIP-DKP (Ours) | ✗ | 92s | 2.25M + 562K | 18.9G | 536K
Table 2: PSNR/SSIM comparison under the random Gaussian kernel and random motion kernel scenarios.

Method | Kernel scenario | Set5 | Set14 | BSD100 | Urban100
---|---|---|---|---|---
IKC (2019) | random Gaussian | 27.01/0.7895 | 25.21/0.6581 | 24.98/0.6106 | 22.57/0.6421
DAN-v1 (2020) | random Gaussian | 27.34/0.7930 | 25.47/0.6601 | 25.12/0.6132 | 22.75/0.6441
DAN-v2 (2021) | random Gaussian | 27.41/0.7936 | 25.62/0.6621 | 25.31/0.6153 | 22.71/0.6442
DCLS (2022) | random Gaussian | 27.50/0.7948 | 25.68/0.6639 | 25.34/0.6169 | 22.92/0.6475
DIP-DKP (Ours) | random Gaussian | 28.43/0.8139 | 26.34/0.6968 | 25.98/0.6611 | 23.24/0.6644
Diff-DKP (Ours) | random Gaussian | 29.44/0.8592 | 26.76/0.7400 | 26.63/0.7057 | 23.92/0.6875
IKC (2019) | random motion | 24.01/0.7227 | 23.96/0.6118 | 22.29/0.5816 | 20.09/0.5453
DAN-v1 (2020) | random motion | 24.11/0.7244 | 24.03/0.6131 | 22.40/0.5829 | 20.16/0.5462
DAN-v2 (2021) | random motion | 24.43/0.7276 | 24.18/0.6166 | 22.55/0.5878 | 20.27/0.5490
DCLS (2022) | random motion | 24.78/0.7323 | 24.38/0.6211 | 22.74/0.5922 | 20.49/0.5534
DIP-DKP (Ours) | random motion | 25.30/0.7417 | 24.52/0.6434 | 23.02/0.6136 | 21.24/0.5667
Diff-DKP (Ours) | random motion | 28.74/0.8313 | 26.03/0.6719 | 24.10/0.6287 | 22.26/0.5862
Table 3: Ablation on the kernel-network structure in the PKE module (PSNR/SSIM), varying the number of layers and the units per layer.

Layers \ Units per layer | 10 | 100 | 1000 | 10000
---|---|---|---|---
1 | 13.75/0.3211 | 23.57/0.7452 | 28.93/0.8285 | 28.24/0.8118
3 | 13.61/0.3183 | 28.97/0.8299 | 28.48/0.8135 | 28.35/0.8119
5 | 13.30/0.3164 | 28.81/0.8269 | 28.52/0.8127 | 26.65/0.7847
7 | 13.86/0.3407 | 28.30/0.8172 | 28.54/0.8125 | 27.93/0.8075
(R-5pKG-C2): We agree that FKP is a pre-trained model; however, it still needs to alternately optimize its latent variable (the input noise) on test images [25]. Compared to the deeper network used in FKP (5 layers), DKP only uses a lightweight 2-layer network for kernel estimation, allowing a comparable runtime. This is also verified by the FLOPs comparison in Table 1.
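For reference, a minimal sketch of what such a lightweight kernel network could look like is given below; the noise dimension, hidden width, kernel size, and the softmax output layer are assumptions for illustration, not the exact architecture in the paper.

```python
import torch
import torch.nn as nn

class LightKernelNet(nn.Module):
    """Illustrative 2-layer MLP that maps a noise vector to a blur kernel."""

    def __init__(self, noise_dim=64, hidden=100, kernel_size=21):
        super().__init__()
        self.kernel_size = kernel_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, kernel_size * kernel_size),
        )

    def forward(self, z):
        logits = self.net(z)
        # Softmax yields a non-negative kernel that sums to one.
        k = torch.softmax(logits, dim=-1)
        return k.view(-1, self.kernel_size, self.kernel_size)
```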
(R-5pKG-C4): The fundamental goal of this paper is to break with data-dominated methods in blind SR, thereby making blind SR feasible for applications where pre-training is impractical, such as high-speed targets (e.g., satellites, aircraft) and medical imaging (e.g., a beating heart). We agree with your comment and will be happy to explore it in future work.