A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
We thank the reviewers for their constructive feedback. Clarifications and new results will be added in the final version.
(R-wUV2-C1): We agree that the alternating optimization scheme between kernel and image is widely applied and fundamental in this area; however, most existing variants rely on supervised learning and require a pre-training phase. The main novelty and contribution of this paper is the proposed random kernel prior learning module, which for the first time breaks with the data-dominated paradigm in blind SR. Different from existing learning-based alternating optimization methods, our kernel estimator is trained at test time in a plug-and-play manner, without any labelled data or cumbersome pre-training/re-training requirements. This is realized by a carefully designed network-based Langevin dynamics optimization strategy, in which the random kernel prior and LR-image-based data consistency are jointly integrated to improve the convergence of kernel estimation, instead of relying on a pre-trained data prior [25] or manually designed modeling [49]. Benefiting from these two merits, DKP achieves superior performance and significant flexibility over arbitrary kernel categories, and we therefore believe the novelty is noteworthy.
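For intuition, below is a minimal sketch of a generic Langevin-style kernel update that combines a prior term with LR data consistency; the isotropic prior, the step size, and the `degrade` operator are illustrative assumptions, not the exact formulation in the paper.

```python
import torch

def langevin_kernel_step(k, lr_img, sr_img, degrade, step=1e-3, prior_w=1e-2):
    """One generic Langevin update for a blur kernel k (illustrative sketch).

    Follows the standard Langevin dynamics form
        k <- k + (step / 2) * grad log p(k | lr_img) + sqrt(step) * z,  z ~ N(0, I),
    where log p(k | lr_img) combines LR data consistency with a simple prior term.
    """
    k = k.detach().clone().requires_grad_(True)
    # Data consistency: the SR estimate degraded by k should match the LR input.
    data_term = torch.sum((degrade(sr_img, k) - lr_img) ** 2)
    # Stand-in prior term (the paper uses the random kernel prior instead).
    prior_term = prior_w * torch.sum(k ** 2)
    grad, = torch.autograd.grad(-(data_term + prior_term), k)
    with torch.no_grad():
        k_new = k + 0.5 * step * grad + (step ** 0.5) * torch.randn_like(k)
        k_new = torch.clamp(k_new, min=0)
        k_new = k_new / k_new.sum()  # keep the kernel non-negative and normalized
    return k_new.detach()
```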
(R-wUV2-C2, R-7Fwy-J2, R-5pKG-C3): The runtime cost of random kernel sampling/generation is very small (about 1s). The details of the random kernel (both Gaussian and motion) sampling/generation processes are given in supplementary Sec 2.1. The low cost of obtaining the random kernel prior comes from the model-based explicit generating functions, which only require random sampling of a few hyper-parameters (e.g., a 2D Gaussian kernel only needs a mean and a variance). This also highlights a merit of the DKP model. We provide detailed computational cost comparisons with DIP-based alternating methods in Table 1. The proposed DKP model clearly does not increase the computational burden compared to existing methods, and it removes the pre-training phase (6 hours for FKP). The total FLOPs (image FLOPs + kernel FLOPs) of the DIP-based methods are also comparable.
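To illustrate why the sampling is cheap, here is a minimal sketch of drawing a random anisotropic Gaussian kernel from a handful of scalars; the kernel size, sigma range, and rotation handling are illustrative assumptions, not the exact settings of supplementary Sec 2.1.

```python
import numpy as np

def sample_random_gaussian_kernel(size=21, sigma_range=(0.7, 5.0)):
    """Sample a random anisotropic Gaussian blur kernel.

    Only a few scalars are drawn (two sigmas and a rotation angle),
    so the cost is negligible compared to a network forward pass.
    """
    sig_x, sig_y = np.random.uniform(*sigma_range, size=2)
    theta = np.random.uniform(0, np.pi)
    # Build the 2x2 covariance from the sigmas and the rotation.
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    cov = rot @ np.diag([sig_x ** 2, sig_y ** 2]) @ rot.T
    # Evaluate the Gaussian density on the kernel grid.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)
    expo = np.einsum('...i,ij,...j->...', coords, np.linalg.inv(cov), coords)
    kernel = np.exp(-0.5 * expo)
    return kernel / kernel.sum()
```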
(R-wUV2-C3): Thank you for the recommendations. The simulation results for the Gaussian and motion kernel scenarios are given in Table 2. All compared models are taken from the original papers and re-trained for a fair comparison. The results and analysis of the mentioned methods will be added in the final version.
(R-wUV2-C4): We note that in all experiments of this paper the degradations are unknown, and all degradation formulations are equivalent to those of the recommended blind SR methods (IKC, DAN, DCLS). Please refer to Sec 1, lines 30-32, Sec 3, lines 165-176 and Sec 5.1, lines 378-385 for the explicit definition and illustration. The real-world dataset RealSRSet [23] has been used to verify the effectiveness of DKP, and the real-world results are presented in manuscript Fig. 5 (row 3) and supplementary Fig. 7. In all real-world scenarios, DKP outperforms the counterparts.
(R-7Fwy-C1, J3): The necessity of RKS and PKE has been discussed in P2, left column, lines 87-90. We highlight that the RKS module generates the random kernel priors, while PKE estimates the blur kernel on the basis of those priors and the LR observation; PKE is therefore indispensable and we did not ablate its absence. To ensure that the random kernel priors converge during optimization, Eq. (6) is designed to re-weight the randomly sampled kernels, allowing the MCMC process to converge to kernels that minimize the LR reconstruction error (see the sketch below). To strengthen the ablation study, we have added an ablation on different kernel-network structures in the PKE module in Table 3.
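As a rough illustration of this re-weighting idea, the sketch below assigns softmax weights to candidate kernels according to their LR reconstruction errors; the softmax form, the temperature, and the `degrade` operator are illustrative assumptions rather than the exact Eq. (6).

```python
import torch

def reweight_kernels(kernels, lr_img, sr_img, degrade, temperature=0.1):
    """Re-weight candidate kernels by LR data consistency (illustrative sketch).

    Kernels whose degradation of the current SR estimate better matches the
    LR observation receive larger weights, steering the sampling process
    toward kernels that minimize the LR reconstruction error.
    """
    errors = torch.stack([
        torch.mean((degrade(sr_img, k) - lr_img) ** 2) for k in kernels
    ])
    weights = torch.softmax(-errors / temperature, dim=0)
    # One possible use of the weights: a convex combination of the candidates.
    combined = sum(w * k for w, k in zip(weights, kernels))
    return combined / combined.sum(), weights
```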
(R-7Fwy-C2): We agree and will revise it for consistency.
(R-7Fwy-J1): We agree that the two terms in Eq. 5 are usually combined in statistics, since they refer to the same quantity. In this paper, one term denotes the kernel re-weighting (Eqs. 6 and 7) and the other denotes the kernel generation (Eqs. 3 and 4), so we keep them separate for clearer presentation.
Table 1: Computational cost comparison with DIP-based alternating methods.

Methods | Pre-train | Runtime (total) | Model size (total) | FLOPs/iter (image side) | FLOPs/iter (kernel side)
---|---|---|---|---|---
DoubleDIP | ✗ | 91s | 2.25M + 641K | 18.9G | 600K
FKP-DIP | 6h | 90s | 2.25M + 143K | 18.9G | 178K
DIP-DKP (Ours) | ✗ | 92s | 2.25M + 562K | 18.9G | 536K
Table 2: PSNR/SSIM comparison under the random Gaussian kernel and random motion kernel scenarios.

Method | Kernel scenario | Set5 | Set14 | BSD100 | Urban100
---|---|---|---|---|---
IKC (2019) | random Gaussian | 27.01/0.7895 | 25.21/0.6581 | 24.98/0.6106 | 22.57/0.6421
DAN-v1 (2020) | random Gaussian | 27.34/0.7930 | 25.47/0.6601 | 25.12/0.6132 | 22.75/0.6441
DAN-v2 (2021) | random Gaussian | 27.41/0.7936 | 25.62/0.6621 | 25.31/0.6153 | 22.71/0.6442
DCLS (2022) | random Gaussian | 27.50/0.7948 | 25.68/0.6639 | 25.34/0.6169 | 22.92/0.6475
DIP-DKP (Ours) | random Gaussian | 28.43/0.8139 | 26.34/0.6968 | 25.98/0.6611 | 23.24/0.6644
Diff-DKP (Ours) | random Gaussian | 29.44/0.8592 | 26.76/0.7400 | 26.63/0.7057 | 23.92/0.6875
IKC (2019) | random motion | 24.01/0.7227 | 23.96/0.6118 | 22.29/0.5816 | 20.09/0.5453
DAN-v1 (2020) | random motion | 24.11/0.7244 | 24.03/0.6131 | 22.40/0.5829 | 20.16/0.5462
DAN-v2 (2021) | random motion | 24.43/0.7276 | 24.18/0.6166 | 22.55/0.5878 | 20.27/0.5490
DCLS (2022) | random motion | 24.78/0.7323 | 24.38/0.6211 | 22.74/0.5922 | 20.49/0.5534
DIP-DKP (Ours) | random motion | 25.30/0.7417 | 24.52/0.6434 | 23.02/0.6136 | 21.24/0.5667
Diff-DKP (Ours) | random motion | 28.74/0.8313 | 26.03/0.6719 | 24.10/0.6287 | 22.26/0.5862
Table 3: Ablation on the kernel-network structure in the PKE module (PSNR/SSIM), varying the number of layers and the units per layer.

Layers \ Units per layer | 10 | 100 | 1000 | 10000
---|---|---|---|---
1 | 13.75/0.3211 | 23.57/0.7452 | 28.93/0.8285 | 28.24/0.8118
3 | 13.61/0.3183 | 28.97/0.8299 | 28.48/0.8135 | 28.35/0.8119
5 | 13.30/0.3164 | 28.81/0.8269 | 28.52/0.8127 | 26.65/0.7847
7 | 13.86/0.3407 | 28.30/0.8172 | 28.54/0.8125 | 27.93/0.8075
(R-5pKG-C2): We agree that FKP is a pre-trained model; however, it still needs to alternately optimize its latent variable (the input noise) on test images [25]. Compared to the deeper network used in FKP (5 layers), DKP only uses a lightweight 2-layer network for kernel estimation, allowing a comparable runtime. This is also verified by the FLOPs comparison in Table 1.
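For reference, a minimal sketch of what such a lightweight kernel network could look like is given below; the noise dimension, hidden width, kernel size, and the softmax output layer are assumptions for illustration, not the exact architecture in the paper.

```python
import torch
import torch.nn as nn

class LightKernelNet(nn.Module):
    """Illustrative 2-layer MLP that maps a noise vector to a blur kernel."""

    def __init__(self, noise_dim=64, hidden=100, kernel_size=21):
        super().__init__()
        self.kernel_size = kernel_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, kernel_size * kernel_size),
        )

    def forward(self, z):
        logits = self.net(z)
        # Softmax yields a non-negative kernel that sums to one.
        k = torch.softmax(logits, dim=-1)
        return k.view(-1, self.kernel_size, self.kernel_size)
```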
(R-5pKG-C4): The fundamental goal of this paper is to break with data-dominated methods in blind SR, thereby making blind SR feasible for applications where pre-training is impractical, such as high-speed targets (e.g., satellites, aircraft) and medical imaging (e.g., a beating heart). We agree with your comment and will be happy to explore it in future work.