
Perception-Distortion Balanced Super-Resolution:
A Multi-Objective Optimization Perspective

Lingchen Sun, Jie Liang, Shuaizheng Liu, Hongwei Yong, Lei Zhang

L. Sun, S. Liu, H. Yong, and L. Zhang are with the Department of Computing, the Hong Kong Polytechnic University, Hong Kong (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). This work is supported by the Hong Kong RGC RIF grant (R5001-18) and the PolyU-OPPO Joint Innovation Lab. J. Liang is with the OPPO Research Institute (e-mail: [email protected]).
Abstract

High perceptual quality and low distortion degree are two important goals in image restoration tasks such as super-resolution (SR). Most existing SR methods aim to achieve these goals by minimizing the corresponding yet conflicting losses, such as the \ell_{1} loss and the adversarial loss. Unfortunately, the commonly used gradient-based optimizers, such as Adam, struggle to balance these objectives due to the opposite gradient descent directions of the contradictory losses. In this paper, we formulate the perception-distortion trade-off in SR as a multi-objective optimization problem and develop a new optimizer by integrating the gradient-free evolutionary algorithm (EA) with the gradient-based Adam, where EA and Adam focus on the divergence and convergence of the optimization directions, respectively. As a result, a population of optimal models with different perception-distortion preferences is obtained. We then design a fusion network to merge these models into a single stronger one for an effective perception-distortion trade-off. Experiments demonstrate that, with the same backbone network, the perception-distortion balanced SR model trained by our method can achieve better perceptual quality than its competitors while attaining better reconstruction fidelity. Codes and models can be found at https://github.com/csslc/EA-Adam.

Index Terms:
Super-resolution, GAN, evolutionary algorithm, optimization.

I Introduction

Image super-resolution (SR) [1] is a valuable technique to reconstruct a high-resolution (HR) image with good quality from a low-resolution (LR) input image. Benefiting from deep learning techniques, it has become popular to train a deep neural network (DNN) for SR by using a large amount of LR-HR image pairs [2, 3, 4, 5, 6, 7, 8, 9]. One commonly used loss function to optimize the SR models is the distortion (or fidelity) measure, e.g., the norm of the prediction errors [2, 10, 11, 12]. However, SR is a typical ill-posed problem, and there are many possible HR estimations for a given LR input. It is well-known that the \ell_{2} or \ell_{1} loss tends to generate over-smoothed HR estimates [13, 14], although it can result in good distortion measures such as the peak signal-to-noise ratio (PSNR).

To better preserve the local structures and details of reconstructed images, the SSIM loss [15] and perceptual loss [16] have also been used to train SR models. The former adopts the widely used image quality assessment index, i.e., SSIM [15, 17], as the optimization objective, while the latter minimizes the \ell_{1} or \ell_{2} norm of the prediction difference in a feature space [7, 8]. In general, the SSIM loss and perceptual loss can improve the perceptual quality of SR outputs, but the improvement is limited because they can hardly generate details that are lost during degradation [18, 19].

Figure 1: Performance comparison of perception-distortion balanced SR models trained by the Adam optimizer and our hybrid EA-Adam optimizer on the BSD100 dataset. For the perceptual quality index LPIPS, the smaller the better. For the distortion degree index PSNR, the higher the better. The models optimized by Adam (orange triangle points) with different weighted combinations of the \ell_{1}, perceptual, and adversarial losses either stagnate at an unsatisfactory solution or achieve unbalanced performance. The models optimized by our EA-Adam optimizer (blue circle points) achieve better convergence and divergence. Furthermore, the fused model (red star) from the EA-Adam models achieves the best perceptual quality index while maintaining a high fidelity index.

To obtain perceptually more realistic HR images, perception-distortion trade-off methods [20, 7, 8, 21, 22, 23] have been developed for SR. Usually, they train a generative adversarial network (GAN) [24] by using the adversarial loss. Instead of minimizing the pixel-wise \ell_{2} or \ell_{1} loss or the local structure-wise SSIM or perceptual loss, the adversarial loss is optimized to minimize the distribution divergence between the predicted SR images and the HR images. In practice, training the SR model with only the adversarial loss is unstable and impractical. Therefore, most GAN-based SR methods combine the \ell_{1}, perceptual, and adversarial losses [7, 8] into one single weighted loss for optimization.

While GAN-based SR methods can generate some realistic details, they will also introduce undesired artifacts [25, 18]. Many follow-up works [26, 13, 27, 28, 29, 30, 31] aim to reduce the artifacts caused by the adversarial loss while keeping the realistic details. Dario et al. [26] applied \ell_{1} and adversarial losses in both the spatial and Fourier domains to improve the restoration of high-frequency details. Liang et al. [13] designed an LDL loss to suppress GAN-generated artifacts by considering their statistical difference from visually friendly details. To avoid artifacts in low-frequency areas, a region-aware adversarial loss [27] was designed to help the discriminator pay more attention to high-frequency areas. Zhang et al. [28] used a similar idea to handle the low-frequency and high-frequency areas differently by using a constrained loss. Xie et al. [29] proposed to detect the artifact regions and developed a fine-tuning procedure to improve GAN-based SR models.

Unfortunately, the above-mentioned methods have two major limitations. First, empirically selecting weights to combine the losses into one is unreliable and labor-consuming [28]. Second, and more importantly, distortion- and perception-oriented losses often conflict with each other, and the commonly used gradient-based DNN optimizers, such as Adam [32], struggle to balance the conflicting gradient descent directions. We illustrate this issue in Fig. 1, where we train a population of SR networks by combining the \ell_{1}, perceptual, and adversarial losses with different weights. The Adam optimizer is used to train the networks, and the PSNR (the higher the better) and LPIPS (the smaller the better) [33] indices are used to measure the distortion degree and perceptual quality, respectively. An ideal SR model is expected to achieve both high PSNR and low LPIPS, i.e., to approach the origin point in Fig. 1. However, we can see that the models (orange triangle points) learned by Adam with different weighted combinations of the \ell_{1}, perceptual, and adversarial losses either stagnate at an unsatisfactory solution or achieve unbalanced performance.

In this paper, we propose a new approach to perception-distortion balanced SR from the network optimization perspective. We formulate the problem as a multi-objective optimization task [18], where one objective aims for high perceptual quality and the other for low distortion degree. Considering that these two objectives are hard to achieve simultaneously by gradient-based optimizers because of their conflicting gradient descent directions, we propose to integrate the gradient-free evolutionary algorithm (EA) [34, 35, 36] with the gradient-based Adam for network optimization. Adam mainly ensures model convergence, i.e., the losses decrease along the gradient, while EA mainly ensures model divergence, i.e., the potential solutions interact with and benefit from each other through the crossover operation. Meanwhile, the mutation operation of EA helps the models jump out of local optima. As a result, the obtained Pareto front (PF) consists of a series of optimal models with different perception-distortion preferences, as shown by the blue circular points in Fig. 1. Compared with those optimized by Adam, the models optimized by our proposed EA-Adam method are closer to the origin point and are more evenly distributed. Motivated by the mixture-of-experts theory [37, 38] and network interpolation methods [39, 40], we further propose a simple yet effective fusion network to merge the obtained models into a single stronger one, which inherits the advantages of the model population. As indicated by the red star in Fig. 1, the fused model achieves the best perceptual quality index while maintaining a high fidelity index. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed hybrid EA-Adam optimizer and weight aggregation network. Compared with existing state-of-the-art methods, our model achieves a better perception-distortion trade-off.

The rest of this paper is organized as follows. Section II summarizes the related works on distortion- and perception-oriented SR methods, and evolutionary algorithms on DNN learning. Section III presents the multi-objective formulation of SR, the hybrid EA-Adam optimizer, and the network fusion approach in detail. Section IV reports the experimental results and discussions. Finally, Section V concludes this paper.

II Related Work

II-A Distortion-oriented SR Methods

Most of the SR methods in this category [2, 3, 11, 5] employ pixel-wise loss functions, such as the \ell_{2} or \ell_{1} norm of the prediction error, while focusing on the design of network architectures to improve SR performance. The seminal work of SRCNN [2] introduced a convolutional neural network (CNN) with three layers to perform the SR task. Afterward, many SR models have been developed. Kim et al. [6] trained a deeper and wider network to improve the representational ability of CNNs. Deep residual connections were introduced to improve the learning performance of deep SR networks [41, 42, 10]. Tong et al. [12] proposed to use dense skip connections in a very deep network. Recursive networks [6] were designed to improve SR performance without introducing new parameters for additional convolutions. Very recently, transformer networks [43, 44, 45, 46] have been successfully employed for SR, resulting in better distortion-based measures such as PSNR. Because the \ell_{2} or \ell_{1} loss tends to generate blurry SR outputs, the SSIM loss [15] and perceptual loss [16] have been proposed to regularize the local structures of images. These two losses can improve the visual quality of SR outputs without harming the distortion measures too much. However, they cannot generate image details to make the SR image perceptually more realistic.

II-B Perception-oriented SR Methods

To make the reconstructed SR images perceptually more realistic, SRGAN [7] combined the adversarial loss with distortion-oriented losses (e.g., the \ell_{1} loss) in training. Following SRGAN, ESRGAN [8] used the VGG features before activation and employed the RRDB backbone [8] as the generator, improving the perceptual quality. DualFormer [31] leveraged spectral and spatial discriminators to identify high-frequency and low-frequency information, respectively. While GAN-based methods can generate new image details, they may introduce many unnatural visual artifacts because of unstable adversarial training. Therefore, many follow-up works aim to reduce GAN-generated artifacts from the perspectives of frequency division [27, 26, 28, 47, 48], the weights of layers in the perceptual loss [30, 49], or different statistical properties [13, 29]. In addition, some works [22] aim to optimize no-reference image quality metrics, which can be non-differentiable. For example, RankSRGAN [22] designed a differentiable rank-content loss to rank the quality of predictions with the NIQE [50] or Ma [51] index.

II-C Evolutionary Algorithms in DNN Learning

EA [35, 36] is a classical search-based optimization technique, where individuals in a population evolve interactively. The EA operators mainly consist of crossover [52], mutation, and selection. Due to its global search and gradient-free properties, EA has proven effective in many applications [53, 54, 55]. It has recently been applied to DNN learning [56, 57, 58]. Guo et al. [56] applied evolutionary architecture search to construct a simplified supernet with all architectures optimized simultaneously. Yu et al. [57] searched over a big single-stage model that contains small child models. EAGAN [58] applied an efficient two-stage EA-based network search framework to learn GANs. Another application is network compression. For example, Zhou et al. [59] formulated filter pruning as a multi-objective optimization problem and proposed a knee-guided EA algorithm to trade off between the scale of parameters and performance. In addition, EA can be used as a network optimizer. ESGD [60] alternately applied SGD and EA in one framework to improve the average performance of DNNs. Zhang et al. [61] proposed a hierarchical cluster-based suppression algorithm to remove similar weights and improve population diversity. GEMONN [62] determined the search direction in the EA optimizer by the gradient of weights to improve the efficiency of EA for training DNNs.

To optimize a network, EA is usually coupled with a gradient-based optimizer to solve large-scale optimization problems [63]. Most of the previous EA-based DNN learning works [60] optimize a single objective. In this work, we formulate the perceptually realistic SR task as a multi-objective optimization problem and propose a hybrid EA and Adam algorithm to solve it.

III The Proposed Method

Figure 2: (a) The framework of traditional GAN-based SR methods. (b) The Adam steps optimize a set of generator and discriminator networks. (c) The EA steps optimize the generator networks by fixing the discriminator networks.

This section presents the proposed hybrid EA-Adam optimizer and a simple yet effective fusion network for perception-distortion balanced SR. Firstly, we formulate the SR task as a multi-objective optimization problem, with two objectives focusing on perceptual quality and reconstruction fidelity, respectively. Secondly, a hybrid EA-Adam optimizer is designed to solve the problem, where EA addresses model divergence and Adam addresses model convergence. A set of SR models with different perception-distortion preferences is trained to form a PF. Lastly, a simple yet effective fusion network is proposed to merge the learned models into a strong perception-distortion balanced SR model.

III-A Multi-Objective Formulation

Most of the GAN-based SR methods [7, 8] train the generator model by weighting the pixel-wise loss L_{pix}, the perceptual loss L_{percep}, and the adversarial loss L_{adv} as follows:

L_{GAN}=\alpha_{1}L_{pix}+\alpha_{2}L_{percep}+\alpha_{3}L_{adv}, (1)

where L_{pix} (i.e., the \ell_{1} loss) optimizes the reconstruction fidelity, while L_{percep} and L_{adv} improve the perceptual quality. The weights \alpha_{1}, \alpha_{2}, and \alpha_{3} are often empirically set to 0.01, 1, and 0.005, respectively.

The Adam optimizer [32] is predominantly used to optimize the above loss function. Unfortunately, the gradient-based Adam struggles to minimize the perception- and distortion-oriented losses simultaneously due to their opposite gradient descent directions, while an ideal weight setting is hard to find in an infinite search space. Therefore, most traditional SR models pursue either fidelity or perceptual quality at the sacrifice of the other. As illustrated in Fig. 1, when the weights are biased toward distortion- or perception-oriented losses, the SR results will be over-smoothed or contain visually unpleasant artifacts. When similar weights are set to balance perception and distortion, the optimization may stagnate at a local minimum as no explicit gradient descent direction can be found. Note that we apply gradient normalization [64] to avoid the trivial solution.

In this paper, rather than linearly combining the distortion- and perception-oriented losses into a single objective function, we formulate the perceptually realistic SR task as a multi-objective optimization problem as follows:

\left\{\begin{array}{l}\min f_{1}=L_{pix},\\ \min f_{2}=L_{percep}+\alpha L_{adv},\end{array}\right. (2)

where the objectives f_{1} and f_{2} focus on the reconstruction fidelity and the perceptual quality of the SR results, respectively. In the following, we propose to couple EA [36] with Adam [32] to address the multi-objective optimization problem in Eq. (2). EA and Adam drive the optimization toward divergence and convergence, respectively.
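To make the contrast between Eq. (1) and Eq. (2) concrete, the following is a minimal PyTorch-style sketch. The feature extractor, the discriminator, and the non-saturating form of the adversarial term are illustrative assumptions rather than the exact modules used in the paper; only the default weights 0.01/1/0.005 and the split into (f_{1}, f_{2}) come from the text above.

```python
import torch
import torch.nn.functional as F

def weighted_gan_loss(sr, hr, feat_extractor, discriminator,
                      a1=0.01, a2=1.0, a3=0.005):
    """Conventional single-objective loss of Eq. (1)."""
    l_pix = F.l1_loss(sr, hr)
    l_percep = F.l1_loss(feat_extractor(sr), feat_extractor(hr))
    l_adv = F.softplus(-discriminator(sr)).mean()   # non-saturating GAN loss (assumed form)
    return a1 * l_pix + a2 * l_percep + a3 * l_adv

def two_objectives(sr, hr, feat_extractor, discriminator, alpha=0.005):
    """Multi-objective formulation of Eq. (2): the two objectives are kept separate."""
    f1 = F.l1_loss(sr, hr)                                    # fidelity objective
    f2 = (F.l1_loss(feat_extractor(sr), feat_extractor(hr))
          + alpha * F.softplus(-discriminator(sr)).mean())    # perceptual objective
    return f1, f2
```

Keeping f_{1} and f_{2} separate is what allows the EA step below to compare models with the Tchebycheff aggregation instead of a fixed linear combination.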

TABLE I: The proposed hybrid EA-Adam optimization algorithm.
Input: The number of optimized models N, Adam epochs T^{Adam}, EA epochs T^{EA}, total epochs T, the pretrained model \theta_{G_{0}} optimized by the \ell_{1} loss, and the probability \delta of selecting parents from the neighboring population \mathcal{B}.
Step 1. Initialization: Initialize a population of generators \mathcal{G}=\{\theta_{G_{1}},\cdots,\theta_{G_{N}}\}, where \theta_{G_{k}}=\theta_{G_{0}}, k\in[1,\cdots,N]. Initialize the discriminator population \mathcal{D}=\{\theta_{D_{1}},\cdots,\theta_{D_{N}}\}. Generate the neighboring population \mathcal{B}. Define the Adam optimizers for each model.
Step 2. Optimization iteration:
For t \in 1:T^{EA}+T^{Adam}:T do
  # Adam Steps
  For i \in \{1,\cdots,T^{Adam}\} do
    For s \in \{2,\cdots,N\} do
      1. Update the weights of \theta_{G_{s}} by Adam.
      2. Update the weights of \theta_{D_{s}} by Adam.
    EndFor
  EndFor
  # EA Steps
  For j \in \{1,\cdots,T^{EA}\} do
    For k \in \{2,\cdots,N\} do
      1. If r<\delta, \mathcal{P}=\mathcal{B}(k); otherwise, \mathcal{P}=\mathcal{G}.
      2. Select parents \theta_{1} and \theta_{2} from \mathcal{P} randomly.
      3. Generate offspring \theta_{I} by the crossover and mutation in Eq. (3) and Eq. (4).
      4. Select the new individual \theta_{G_{k}} as in Eq. (5).
    EndFor
  EndFor
  Re-initialization: Re-define the Adam optimizer for each model in the population.
EndFor
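The alternating schedule of Table I can be summarized by the schematic sketch below. This is not the authors' released code: the data loader, the objective helper (e.g., a callable like the two_objectives sketch in Section III-A with the feature extractor bound internally), the EA update, and the per-model weights are placeholder arguments, and the discriminator loss form is an assumption.

```python
import torch
import torch.nn.functional as F

def ea_adam_training(generators, discriminators, loader, two_objectives,
                     ea_update, weights, T=100, T_adam=10, T_ea=1, lr=1e-4):
    """Alternate T_adam gradient epochs with T_ea evolutionary epochs (Table I)."""
    N = len(generators)
    for _cycle in range(T // (T_adam + T_ea)):
        # Re-define the Adam optimizers (and their momentum) after every EA step.
        g_opts = [torch.optim.Adam(g.parameters(), lr=lr) for g in generators]
        d_opts = [torch.optim.Adam(d.parameters(), lr=lr) for d in discriminators]
        # ---- Adam steps: convergence; model 0 (the pure-l1 model) stays fixed. ----
        for _ in range(T_adam):
            for lr_img, hr_img in loader:
                for s in range(1, N):
                    sr = generators[s](lr_img)
                    f1, f2 = two_objectives(sr, hr_img, discriminators[s])
                    g_loss = weights[s][0] * f1 + weights[s][1] * f2
                    g_opts[s].zero_grad(); g_loss.backward(); g_opts[s].step()
                    # Standard non-saturating discriminator update (assumed form).
                    d_loss = (F.softplus(-discriminators[s](hr_img)).mean()
                              + F.softplus(discriminators[s](sr.detach())).mean())
                    d_opts[s].zero_grad(); d_loss.backward(); d_opts[s].step()
        # ---- EA steps: divergence; discriminators are frozen here. ----
        for _ in range(T_ea):
            generators = ea_update(generators)  # crossover, mutation, Tchebycheff selection
    return generators
```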
Figure 3: Illustration of the convergence process of the proposed EA-Adam optimizer in an EA-Adam cycle (T^{Adam}+T^{EA}). The Adam steps primarily focus on optimization convergence, enabling rapid search for improved objective losses. In contrast, the EA step emphasizes divergence, ensuring a uniform spread of solutions across the objective plane.
Figure 4: Two stages of our network fusion process. Left: training of the attention-based weight regression network, which predicts the fusion weights of experts for each input image. Right: model fusion by averaging weight vectors of a batch of validation data.

III-B The Hybrid EA-Adam Optimizer

The algorithm of the proposed hybrid EA-Adam optimizer is summarized in Table I. EA and Adam alternately optimize the N SR models with different perception-distortion preferences. We employ the general framework of GAN-based SR learning, which is shown in Fig. 2(a). The generator network learns the mapping between the LR and HR images, and the discriminator network discriminates between the true and fake images. The two networks compete with each other to ensure that the generated images are of high quality and more natural.

The Adam step is illustrated in Fig. 2(b). Initialized by a model pre-trained with the \ell_{1} loss, the other N-1 generator-discriminator pairs are updated by Adam separately, as shown in the Adam step of Table I. We fix the first model throughout the training process as it has the best reconstruction fidelity, being trained with the pure pixel-wise \ell_{1} loss. Then, the EA step starts, as illustrated in Fig. 2(c). The adversarial training facilitates model convergence in the Adam step to obtain better objective scores, while the EA step focuses on model divergence, enabling the optimized models to be uniformly distributed on the performance plane spanned by the two objective indices. Therefore, the discriminators are only updated in the Adam steps to enhance convergence.

In the EA step, a population of generator models is optimized together. Following MOEA/D [36], the parents are first chosen from either the neighboring population \mathcal{B} or the whole population \mathcal{G}, which corresponds to local or global evolution, respectively. The offspring is generated from the selected parents via crossover and mutation operators. The crossover operator lets the individuals in the population interact with each other so that useful information can be effectively propagated within the population. We employ the SBX crossover [52], which is defined as follows:

\theta_{I}=0.5\times\left[(1+\beta)\theta_{1}+(1-\beta)\theta_{2}\right]. (3)

Here, \theta_{I}, \theta_{1}, and \theta_{2} denote the parameters of the corresponding models, and \beta is a weighting variable determined by a random number r\in[0,1]: \beta=(r\times 2)^{1/(1+\eta)} if r<0.5 and \beta=(1/(2-r\times 2))^{1/(1+\eta)} otherwise, where \eta is a constant set to 20 as in SBX [52]. The mutation operator perturbs the optimization to jump out of local optima by adding zero-mean Gaussian noise \varepsilon^{(k)}\sim\mathcal{N}(0,0.01) [60] to each of the N models, i.e.,

\theta_{I}=\theta_{I}+\varepsilon^{(k)},\quad k\in\{1,\cdots,N\}. (4)

To select a good model from the parent models and the newly generated offspring models, we measure the model performance by using an aggregation function derived from the Tchebycheff decomposition [65] as follows:

\min F^{te}=\max\{\lambda(f_{1}-z_{1}^{*}),(1-\lambda)(f_{2}-z_{2}^{*})\}, (5)

where z_{1}^{*} and z_{2}^{*} are the moving minimum values of f_{1} and f_{2} recorded up to the current iteration, and the weights \lambda are uniformly distributed within [0,1]. Specifically, we set \lambda=0,\frac{1}{N-1},\frac{2}{N-1},\cdots,1 for the N models, respectively. With different aggregation weights \lambda, the multi-objective optimization problem is divided into N single-objective sub-problems with different perception-distortion preferences. Note that the momentum in the Adam optimizer should be re-initialized after the EA optimization step. Finally, a set of generator models with different perception-distortion preferences is obtained to form a PF, as shown in Fig. 1.
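The EA operators of Eqs. (3)-(5) can be sketched as below, acting on flattened weight vectors. The evaluation of (f_{1}, f_{2}) on a probe batch is abstracted into a callable, and the neighborhood/global parent selection governed by \delta is simplified to global sampling, so this is an illustrative sketch rather than the exact implementation.

```python
import random
import torch

def sbx_crossover(theta1, theta2, eta=20.0):
    """Simulated binary crossover of Eq. (3) on flattened parameter vectors."""
    r = random.random()
    beta = (2 * r) ** (1 / (1 + eta)) if r < 0.5 else (1 / (2 - 2 * r)) ** (1 / (1 + eta))
    return 0.5 * ((1 + beta) * theta1 + (1 - beta) * theta2)

def mutate(theta, sigma=0.1):
    """Gaussian mutation of Eq. (4); N(0, 0.01) noise has standard deviation 0.1."""
    return theta + sigma * torch.randn_like(theta)

def tchebycheff(f1, f2, lam, z1, z2):
    """Aggregation function of Eq. (5) with the moving ideal point (z1, z2)."""
    return max(lam * (f1 - z1), (1 - lam) * (f2 - z2))

def ea_step(thetas, lam_list, z, evaluate):
    """One EA epoch: thetas are 1-D weight tensors, evaluate(theta) -> (f1, f2)."""
    for k in range(1, len(thetas)):                 # the pure-l1 model (index 0) is kept
        p1, p2 = random.sample(thetas, 2)           # parent selection (global, simplified)
        child = mutate(sbx_crossover(p1, p2))
        f1c, f2c = evaluate(child)
        f1k, f2k = evaluate(thetas[k])
        z[0] = min(z[0], f1c, f1k)                  # update the moving minima z1*, z2*
        z[1] = min(z[1], f2c, f2k)
        if tchebycheff(f1c, f2c, lam_list[k], z[0], z[1]) < tchebycheff(f1k, f2k, lam_list[k], z[0], z[1]):
            thetas[k] = child                       # offspring replaces the k-th individual
    return thetas, z
```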

To illustrate the training convergence process of the EA-Adam optimization algorithm, in Fig. 3 we plot the performance of the optimized models on the two conflicting objectives f_{1} and f_{2} over an Adam-EA cycle on the BSD100 dataset. Symbols of different shapes represent the N=4 optimized models, where the green color denotes the models after the first Adam step, the blue color denotes the models after T^{Adam} Adam steps (T^{Adam}=10 in this experiment), and the red color denotes the models further optimized by EA with T^{EA}=1 step. We can see that Adam focuses on the convergence of the 4 models, enabling a fast search for better objective scores. However, due to the opposite gradient directions of f_{1} and f_{2}, Adam suffers from optimization stagnation. That is, the optimized models targeting different perception-distortion objectives tend to converge to the same side of the plane, making it difficult to form a diverse solution space. Fortunately, the introduction of EA enhances the diversity of the N models so that the optimized models are more evenly distributed in the objective plane with improved performance.

III-C Network Fusion

As mentioned above, the N SR models (with the same architectural topology) optimized by the proposed hybrid EA-Adam optimizer work better than their counterparts optimized by Adam in balancing perceptual quality and distortion degree. However, it can be labor-consuming to manually select one of them to super-resolve the input LR image. On the other hand, if we apply all the N models to the input image and then fuse the outputs into one image, inference becomes expensive and less effective [39]. To solve this issue, we propose to automatically fuse the N models into one stronger model to pursue further enhanced performance beneath the PF formed by the N models. To achieve this goal, we first introduce a lightweight attention-based weight regression network to fuse the convolution layers of the N SR models as in [38, 37, 66]. As shown on the left of Fig. 4, the input is the feature from one layer and the output is the set of weights for the corresponding layer in the N models. The generated weights satisfy \sum_{k}w_{k}=1, k\in\{1,\cdots,N\}. There are 3 modules in the weight regression network. The first module consists of 3 convolution layers with Leaky ReLU activation, all with 5\times 5 kernels. The second module is a global average pooling layer, and the last is a mapping module, which consists of two linear layers followed by a Sigmoid activation. The first module extracts the features, and the last two modules estimate the weights to fuse the convolution layers of the SR models.

We train the attention-based weight regression network with the commonly-used L_{GAN} loss as it encourages the output to be perceptually more realistic. Meanwhile, since the aggregated model linearly interpolates among the N SR models, which are fixed in this stage, the reconstruction fidelity can also be well preserved. We fuse the N SR models in one shot and apply the fused model to all testing images, as shown on the right of Fig. 4. Here, we employ a set of M validation images, i.e., \{I_{i}\}_{i=1}^{M}, to facilitate the fusion process. Based on the validation data, we calculate the adaptive fusion weights for each I_{i}, i.e., [w_{1}^{i},w_{2}^{i},\cdots,w_{N}^{i}], using the trained weight regression network, and average them over the M weight vectors to get a universal weight vector [\bar{w}_{1},\bar{w}_{2},\cdots,\bar{w}_{N}]. Finally, a fused model is obtained, which inherits the advantages of the N SR models in terms of both reconstruction fidelity and perceptual quality, while holding a stronger generalization capability.
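A hedged sketch of the weight regression head and the one-shot fusion step is given below. The channel widths, the normalization that enforces the weights to sum to one, and applying a single universal weight vector to all layers are illustrative assumptions; the paper's actual implementation predicts weights per layer.

```python
import torch
import torch.nn as nn

class WeightRegression(nn.Module):
    """3 conv layers (5x5) + Leaky ReLU, global average pooling, 2 linear layers + Sigmoid."""
    def __init__(self, in_ch=64, n_models=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 5, padding=2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, 5, padding=2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, 5, padding=2), nn.LeakyReLU(0.2, inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mapping = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, n_models), nn.Sigmoid())

    def forward(self, feat):
        w = self.mapping(self.pool(self.features(feat)).flatten(1))
        return w / w.sum(dim=1, keepdim=True)      # enforce sum_k w_k = 1 (assumed normalization)

@torch.no_grad()
def fuse_models(state_dicts, w_bar):
    """Merge N state dicts with the averaged universal weights [w_bar_1, ..., w_bar_N]."""
    fused = {}
    for name in state_dicts[0]:
        fused[name] = sum(w * sd[name].float() for w, sd in zip(w_bar, state_dicts))
    return fused
```

The fused state dict can then be loaded into a single backbone network, so inference costs exactly one forward pass regardless of N.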

IV Experiments

IV-A Experiment Setup

Datasets and evaluation metrics. To train the population of SR models, we use either the DIV2K dataset [67] (800 images) or the DF2K dataset [67, 68] (800 DIV2K images and 2650 Flickr2K images), according to the training datasets of the competing methods.

Among the 800 images in the DIV2K training dataset, we utilize 700 images as training data for the weight regression network, and the remaining 100 images are used as the validation data in the network fusion process. We do not employ any test data during the training stage. Once learned, our weight regression network is fixed and applied to any given test set. Therefore, our method is zero-shot, which aligns with the approach taken by the referenced and compared SR methods.

Following prior works [8, 13, 48], we conduct experiments with a scaling factor of 4\times under both synthetic (downsampled using the MATLAB bicubic kernel) and real-world (degraded using the RealESRGAN pipeline [69]) settings. For the bicubic-degraded SR task, we evaluate the performance of different methods on 6 benchmarks, including Set5 [70], Set14 [71], BSD100 [72], Urban100 [73], Manga109 [74], and DIV2K100 [67]. We compute the PSNR and SSIM [15] indices on the Y channel in the YCbCr space to measure the distortion degree, and compute the LPIPS [33] and DISTS [75] indices in the RGB space to evaluate the perceptual quality. For the real-world SR task, we evaluate the performance of different methods on the RealSR [76] and DRealSR [77] benchmarks.
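For reference, a small sketch of the PSNR-on-Y protocol mentioned above is given below. It assumes the ITU-R BT.601 conversion coefficients and a border crop equal to the scale factor, which are common conventions rather than details stated in the paper.

```python
import numpy as np

def rgb_to_y(img):
    """Y channel of YCbCr (BT.601) for an HxWx3 uint8 RGB image."""
    r = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    b = img[..., 2].astype(np.float64)
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr, crop=4):
    """PSNR on the Y channel; crop of the scale-factor border is an assumed convention."""
    y1, y2 = rgb_to_y(sr), rgb_to_y(hr)
    if crop:
        y1, y2 = y1[crop:-crop, crop:-crop], y2[crop:-crop, crop:-crop]
    mse = np.mean((y1 - y2) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```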

Backbones and compared methods: As in many previous perceptually realistic SR works [22, 21, 13, 27], we adopt SRResNet [7] and RRDB [8] as the backbone networks. SRResNet is a lightweight network proposed in SRGAN [7], while RRDB [8] is widely used in later GAN-based SR methods [21, 78, 13] for its superior performance. Specifically, we compare our SRResNet-based model with SRGAN [7] and RankSRGAN [22], and compare our RRDB-based model with ESRGAN [8], SPSR [78], LDL [13], CAL-GAN [79] and WGSR [48]. In addition, Transformer-based backbones have become popular as they can capture long-range dependencies. Therefore, we compare SwinIR [43] trained with the L_{GAN} loss by the Adam optimizer against SwinIR trained with our proposed hybrid EA-Adam optimization method to validate the effectiveness of our method on the Transformer backbone. For the methods that apply different backbones or training datasets, we compare them with our model of similar capacity and training data. Specifically, we compare our SRResNet-based model with CFSNet [80] and G-MGBP [20], and compare our RRDB-based model with SROOE [81], PD-ADMM [28] and DualFormer [31]. We further validate the proposed method on real-world SR tasks, and compare the obtained 'RealESRGAN+Ours' model with RealESRGAN [69], IKC [82], IKR-Net [83], LDL [13] and DASR [38]. The results of the compared methods are obtained using the officially released models. Note that we do not compare our method with the distortion-oriented SR methods, because this work focuses on perception-distortion balance.

Training details: We validate the effectiveness of our method with scaling factor 4, following the previous works [7, 8]. The data augmentation and discriminator networks used in SRGAN and ESRGAN are adopted in our SRResNet- and RRDB-based models, respectively, for a fair comparison. The size of training input patches is set to 32\times 32. For both the hybrid EA-Adam optimizer and the attention-based weight regression network, the learning rate is set to 1\times 10^{-5} for the SRResNet backbone, and 1\times 10^{-4} for the RRDB and SwinIR backbones. To balance the performance and the training efficiency, we set the population size to N=5. The number of Adam epochs T^{Adam} is 10, the number of EA epochs T^{EA} is 1, and the total number of optimization epochs T is 100. The probability \delta of selecting parents from the neighboring population \mathcal{B} is 0.7.
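For convenience, the hyper-parameters above can be collected into a configuration dictionary; the structure and key names below are our own illustration (only the values are taken from the text), and such a dictionary could drive a training-loop sketch like the one given after Table I.

```python
config = dict(
    scale=4,                      # super-resolution factor
    patch_size=32,                # LR training patch size (32x32)
    n_models=5,                   # population size N
    T_total=100,                  # total optimization epochs T
    T_adam=10,                    # Adam epochs per cycle
    T_ea=1,                       # EA epochs per cycle
    delta=0.7,                    # prob. of selecting parents from the neighboring population B
    lr={"SRResNet": 1e-5, "RRDB": 1e-4, "SwinIR": 1e-4},   # learning rates per backbone
)
```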

TABLE II: Quantitative comparison between the state-of-the-art methods and the proposed hybrid EA-Adam optimization method. Three groups of methods trained with the same backbone are compared, using the SRResNet [7], RRDB [8] and SwinIR [43] backbones, respectively. 'SwinIR+L_{GAN}' is the SwinIR model trained with the L_{GAN} loss by the Adam optimizer. The best results of each group are highlighted in bold.
Method SRGAN [7] RankSRGAN [22] SRResNet+Ours ESRGAN [8] SPSR [78] LDL [13] CAL-GAN [79] WGSR [48] RRDB+Ours SwinIR+L_{GAN} SwinIR+Ours
Training Dataset DIV2K DIV2K DIV2K DF2K+OST DIV2K DIV2K DIV2K DIV2K DIV2K DIV2K DIV2K
Set5 PSNR \uparrow 29.93 29.77 30.25 30.45 30.40 30.99 31.04 30.56 31.31 30.35 30.40
SSIM\uparrow 0.8535 0.8394 0.8584 0.8582 0.8443 0.8679 0.8555 0.8529 0.8792 0.8543 0.8581
LPIPS\downarrow 0.0740 0.0763 0.0694 0.0741 0.0686 0.0643 0.0670 0.0741 0.0683 0.0694 0.0678
DISTS\downarrow 0.0999 0.0958 0.0975 0.0972 0.0925 0.0938 0.1007 0.1073 0.0947 0.0960 0.0953
Set14 PSNR\uparrow 26.53 26.47 26.98 27.07 26.64 27.20 27.31 26.97 27.42 26.61 26.78
SSIM\uparrow 0.7173 0.7031 0.7342 0.7085 0.7138 0.7429 0.7367 0.7255 0.7537 0.7253 0.7302
LPIPS\downarrow 0.1430 0.1398 0.1330 0.1315 0.1345 0.1301 0.1308 0.1372 0.1239 0.1268 0.1268
DISTS\downarrow 0.1113 0.1105 0.1043 0.0985 0.0990 0.1003 0.1127 0.1147 0.1021 0.1074 0.1047
BSD100 PSNR\uparrow 25.58 25.51 25.94 25.33 25.51 26.12 26.29 26.01 26.51 25.58 25.75
SSIM\uparrow 0.6677 0.6510 0.6836 0.6643 0.6599 0.6941 0.6829 0.6815 0.7071 0.6764 0.6832
LPIPS\downarrow 0.1770 0.1829 0.1766 0.1602 0.1665 0.1614 0.1672 0.1791 0.1609 0.1568 0.1560
DISTS\downarrow 0.1293 0.1290 0.1286 0.1174 0.1186 0.1243 0.1287 0.1351 0.1212 0.1223 0.1207
Manga109 PSNR\uparrow 28.12 27.94 28.67 28.42 28.56 29.41 29.21 28.13 29.33 28.60 28.64
SSIM\uparrow 0.8656 0.8500 0.8735 0.8619 0.8590 0.8767 0.8671 0.8521 0.8874 0.8650 0.8694
LPIPS\downarrow 0.0704 0.0755 0.0648 0.0644 0.0664 0.0547 0.0690 0.0760 0.0536 0.0631 0.0621
DISTS\downarrow 0.0551 0.0560 0.0540 0.0467 0.0460 0.0402 0.0498 0.0653 0.0398 0.0407 0.0391
Urban100 PSNR\uparrow 24.41 24.53 24.83 24.36 24.80 25.50 25.33 24.83 25.57 25.06 25.14
SSIM\uparrow 0.7320 0.7284 0.7479 0.7363 0.7473 0.7692 0.7639 0.7453 0.7752 0.7564 0.7594
LPIPS\downarrow 0.1439 0.1435 0.1354 0.1235 0.1206 0.1096 0.1167 0.1303 0.1150 0.1215 0.1193
DISTS\downarrow 0.1076 0.1062 0.1038 0.0880 0.0861 0.0861 0.0877 0.1065 0.0842 0.0895 0.0863
DIV2K100 PSNR\uparrow 28.17 28.07 28.62 28.18 28.18 28.96 28.95 28.98 29.34 28.63 28.77
SSIM\uparrow 0.7765 0.7654 0.7900 0.7779 0.7720 0.7970 0.7897 0.7980 0.8123 0.7897 0.7945
LPIPS\downarrow 0.1254 0.1318 0.1208 0.1151 0.1126 0.1008 0.1072 0.1191 0.1038 0.1044 0.1027
DISTS\downarrow 0.0663 0.0657 0.0646 0.0594 0.0546 0.0529 0.0600 0.0687 0.0554 0.0562 0.0537
#FLOPs 10.38G 73.43G 4.42G
#Params 1.52M 16.69M 0.93M
TABLE III: Quantitative comparisons between the state-of-the-art perceptually realistic SR models and the models trained by our method. 'RRDB+Ours-PD' employs the RRDB backbone and the weight setting in PD-ADMM [28]. 'DualFormer+Ours' employs the RRDB backbone and the discriminator in DualFormer [31]. The best results of each group are highlighted in bold.
Method G-MGBP [20] CFSNet [80] SRResNet+Ours PD-ADMM [28] RRDB+Ours-PD SROOE [30] DualFormer [31] RRDB+Ours DualFormer+Ours
Training Dataset DIV2K DIV2K DIV2K DIV2K DIV2K DF2K DF2K DF2K DF2K
Set5 PSNR\uparrow 29.54 30.23 30.25 31.82 31.68 31.30 31.35 31.45 31.32
SSIM\uparrow 0.8419 0.8503 0.8584 0.8810 0.8838 0.8671 0.8690 0.8744 0.8713
LPIPS\downarrow 0.0855 0.0781 0.0694 0.0758 0.0746 0.0646 0.0678 0.0640 0.0595
DISTS\downarrow 0.1167 0.0986 0.0975 0.1107 0.1047 0.0936 0.0920 0.0930 0.0917
Set14 PSNR\uparrow 26.58 26.59 26.98 27.99 27.95 27.36 27.47 27.59 27.53
SSIM\uparrow 0.7151 0.7170 0.7342 0.7589 0.7668 0.7362 0.7397 0.7479 0.7443
LPIPS\downarrow 0.1502 0.1619 0.1330 0.1401 0.1336 0.1167 0.1191 0.1137 0.1134
DISTS\downarrow 0.1207 0.1178 0.1043 0.1129 0.1083 0.0907 0.0933 0.1007 0.0974
BSD100 PSNR\uparrow 25.68 25.60 25.94 26.86 26.93 26.34 26.50 26.64 26.43
SSIM\uparrow 0.6673 0.6677 0.6836 0.7137 0.7206 0.6953 0.6895 0.7044 0.7019
LPIPS\downarrow 0.1882 0.1822 0.1766 0.1854 0.1774 0.1528 0.1583 0.1583 0.1520
DISTS\downarrow 0.1418 0.1312 0.1286 0.1369 0.1312 0.1172 0.1207 0.1209 0.1176
Manga109 PSNR\uparrow 28.17 28.54 28.67 30.17 29.92 29.97 29.82 29.82 29.87
SSIM\uparrow 0.8609 0.8622 0.8735 0.8864 0.8952 0.8802 0.8837 0.8943 0.8876
LPIPS\downarrow 0.0786 0.0689 0.0648 0.0616 0.0528 0.0504 0.0528 0.0491 0.0504
DISTS\downarrow 0.0726 0.0558 0.0550 0.0544 0.0455 0.0402 0.0377 0.0386 0.0365
Urban100 PSNR\uparrow 24.40 24.72 24.83 26.29 25.89 25.94 25.68 25.94 25.95
SSIM\uparrow 0.7415 0.7445 0.7479 0.7882 0.7824 0.7801 0.7741 0.7853 0.7854
LPIPS\downarrow 0.1477 0.1362 0.1354 0.1234 0.1220 0.1080 0.1150 0.1124 0.1089
DISTS\downarrow 0.1358 0.1086 0.1038 0.1016 0.0931 0.0846 0.0842 0.0833 0.0827
DIV2K100 PSNR\uparrow 28.27 28.49 28.62 29.71 29.91 29.24 29.30 29.33 29.36
SSIM\uparrow 0.7766 0.7826 0.7900 0.8117 0.8226 0.8016 0.8020 0.8142 0.8111
LPIPS\downarrow 0.1488 0.1233 0.1208 0.1223 0.1136 0.0968 0.1027 0.1034 0.0989
DISTS\downarrow 0.0925 0.0649 0.0666 0.0763 0.0654 0.0565 0.0551 0.0540 0.0524
#FLOPs 22.42G 19.22G 10.38G 109.78G 73.43G 90.06G 73.43G 73.43G 73.43G
#Params 0.28M 5.00M 1.52M 18.67M 16.69M 103.44M 16.69M 16.69M 16.69M
Figure 5: Visual comparisons between the proposed EA-Adam optimization method and other state-of-the-art methods under the SRResNet backbone.
Figure 6: Visual comparisons between the proposed EA-Adam optimization method and other state-of-the-art methods under the RRDB backbone.
Figure 7: Visual comparisons between SwinIR optimized with the L_{GAN} loss by the Adam optimizer and by the proposed EA-Adam optimization method.
Figure 8: Visual comparisons between the proposed EA-Adam optimization method and PD-ADMM.
Figure 9: Visual comparisons between the proposed EA-Adam optimization method, SROOE, and DualFormer.

IV-B Comparison with State-of-the-Arts

In this section, we compare the final fused model by our proposed hybrid EA-Adam optimizer and network fusion method against state-of-the-art methods. Table II provides the quantitative comparisons among the methods with the same backbone. Table III provides the quantitative comparisons among the methods with similar model capacity.

From Table II, we can see that for the lightweight models with the SRResNet backbone, SRGAN obtains better PSNR and SSIM scores than RankSRGAN on almost all the benchmarks, while their LPIPS scores vary across benchmarks. Our proposed method improves the fidelity- and perception-oriented metrics by a large margin on all benchmarks without increasing the SR model complexity. This validates the advantages of our hybrid EA-Adam optimization strategy, which pushes the PF toward the optimal point of the solution space. For the models trained with the RRDB backbone, we can see that our proposed method achieves better results than the competing methods ESRGAN, SPSR, LDL, CAL-GAN, and WGSR in most cases. LDL also demonstrates competitive results in several instances, especially for the perceptual quality metrics on the Urban100 dataset. This advantage comes from its iterative and local discriminative strategy, which can effectively inhibit visual artifacts and encourage the generation of high-frequency details. Furthermore, our method with the SwinIR backbone demonstrates significant improvement over its counterpart, which verifies that the proposed method can be applied to both CNN-based and Transformer-based models.

From Table III, we can see that the proposed method achieves consistent improvement on most benchmarks in terms of both perceptual quality (LPIPS) and reconstruction accuracy (PSNR and SSIM), although compared methods such as G-MGBP, CFSNet and SROOE have more FLOPs (e.g., 22.42G FLOPs for 'G-MGBP' and 19.22G FLOPs for 'CFSNet', versus 10.38G FLOPs for 'SRResNet+Ours'). In addition, we test the released model of PD-ADMM, which performs favorably in terms of distortion control yet sacrifices the perceptual quality. PD-ADMM applies the same weight, i.e., [0.5, 0.5], to the perception and distortion objectives, and further introduces a constraint term to ensure that the low-frequency information in the two stages is equal; the ADMM optimization is applied to solve this constrained problem. Since it is improper and difficult to integrate the constraint term into our proposed hybrid EA-Adam approach with N optimized models, to fairly compare our method with PD-ADMM, we apply the optimization objective of PD-ADMM to EA-Adam without its constraint. One can see that our method outperforms PD-ADMM in almost all cases but with fewer FLOPs and parameters. SROOE employs an additional condition network to learn the loss aggregation weights, leading to a significantly larger model size and FLOPs than the other methods. In contrast, our EA-Adam method is more efficient while achieving comparable results with SROOE. Our method obtains better performance than DualFormer on both perception and distortion metrics. Note that DualFormer employs a spectral and a spatial discriminator. Our EA-Adam presents a new training strategy from the optimization perspective, which can be applied to DualFormer to further enhance its performance, as shown by 'DualFormer+Ours' in Table III.

We provide 4\times SR visual comparisons between the models (with SRResNet [7], RRDB [8] and SwinIR [43] backbones) trained by our method and other state-of-the-art models in Fig. 5, Fig. 6 and Fig. 7. We do not compare with the earlier method ESRGAN due to limited space. In addition, we give the visual comparisons between PD-ADMM [28] and 'RRDB+Ours-PD', i.e., the RRDB model trained by our EA-Adam optimization method with the weight setting in PD-ADMM, in Fig. 8. The visual comparisons between SROOE [30], DualFormer [31] and our proposed EA-Adam optimization method are presented in Fig. 9. One can see that our method restores more correct visual patterns (e.g., the steps in the second group of Fig. 5 and the zebra crossings in the third group of Fig. 6) while inhibiting artifacts (e.g., the building windows in the fifth group of Fig. 9). At the same time, our model can generate rich high-frequency details (e.g., the lines of buildings and the wood floor in the first and third groups of Figs. 7 and 8, respectively).

IV-C Ablation Studies

In this section, we conduct a series of ablation studies to validate the effectiveness of the proposed hybrid EA-Adam optimization. We first conduct experiments to compare our hybrid EA-Adam algorithm with the Adam algorithm, and compare their results with different selections of N. Then we compare our network fusion method with the learnable weight method. Furthermore, we study the selection of different loss functions in optimizing the weight regression network. We also study the aggregation weights of the obtained models with different perception-distortion preferences. All ablation results are evaluated on the Urban100 dataset.

TABLE IV: Quantitative comparison between the models optimized by Adam, AdamW, our hybrid EA-Adam, and hybrid EA-AdamW optimizers. The SRResNet backbone is adopted and the models are evaluated on the Urban100 dataset. 'Fused Model' is obtained by applying our network fusion method to the four trained models. The best results are highlighted.
Model Model1 Model2 Model3 Model4 Fused model
Weights [0.25,0.75] [0.5,0.5] [0.75,0.25] [1,0] -
PSNR Adam 24.36 23.82 23.96 24.22 24.41
EA-Adam 25.78 25.75 24.95 24.73 24.83
AdamW 25.48 24.67 23.41 24.18 24.37
EA-AdamW 25.95 25.81 25.21 24.82 24.91
LPIPS Adam 0.3061 0.2899 0.2665 0.1446 0.1437
EA-Adam 0.2120 0.2055 0.1536 0.1384 0.1354
AdamW 0.2570 0.2925 0.2306 0.1477 0.1444
EA-AdamW 0.2265 0.2201 0.1519 0.1404 0.1368
Figure 10: Visual comparisons among SR models trained by the original Adam optimizer and our hybrid EA-Adam optimizer with the SRResNet [7] backbone. Each model trained by EA-Adam outperforms the one trained by Adam in both fidelity and perceptual quality. The fused model by the proposed EA-Adam method achieves further benefits in fidelity and perception.

Hybrid EA-Adam vs. Adam. We compare the performance of models optimized by the original Adam and our hybrid EA-Adam. In addition, we replace Adam with AdamW [84], a recently developed variant of Adam, and report the results of AdamW and hybrid EA-AdamW for a more comprehensive evaluation of our proposed method. As traditional optimizers (Adam and AdamW) cannot directly optimize a multi-objective problem, we uniformly set 4 combinations of loss weights, i.e., [0.25,0.75], [0.5,0.5], [0.75,0.25] and [1,0], to weight the two objectives in Eq. (2) into a single-objective optimization problem. For example, if the weight is [0.25,0.75], the optimization objective of Adam is 0.25\times f_{1}+0.75\times f_{2}, which is the same as that of the Adam steps in our proposed hybrid optimizer. The models optimized with different weights are named 'Model1' to 'Model4', respectively. In addition, we apply the same network fusion method (as in Section III-C) to the 4 models optimized by Adam.

Table IV gives the quantitative comparison. Without loss of generality, we conduct experiments with the SRResNet backbone and evaluate on the Urban100 dataset. The PSNR and LPIPS metrics are used to measure the fidelity and perceptual quality, respectively. One can see that our hybrid EA-Adam/EA-AdamW optimizers outperform Adam/AdamW by a large margin in most cases. Both the fidelity and perceptual quality metrics are improved, validating that the proposed hybrid EA-Adam strategy can push the PF closer to the optimal point of the metric space. For most models, EA-Adam improves PSNR by more than 0.4dB over Adam and improves LPIPS by more than 10\%. The inferior performance of Adam might be caused by the gradient vanishing problem in the gradient-based optimization process, which can trap the models in a local minimum. In contrast, with our hybrid EA-Adam optimizer, effective interaction between different models is introduced by the crossover operator in EA, so that the models can be stably optimized with better convergence.

On the other hand, each of the four models may perform well in one aspect yet sacrifice the other as a trade-off. The fused model produced by our network fusion method achieves a better balance between fidelity and perceptual quality. Specifically, although the fidelity indices of the fused model are lower than those of Models 1, 2, and 3, its perceptual performance is better than that of Model 4, which is trained to maximize perceptual quality. This result aligns well with the goal of the fusion network, which aims to further improve the perceptual quality of the fused model without reducing the fidelity much. It is worth noting that the poor convergence of Models 1 to 3 makes the traditional Adam/AdamW optimizers gain less improvement from the network fusion process. In contrast, the proposed hybrid optimization strategy facilitates the effectiveness of network fusion via the interaction of individuals in the crossover operation. In addition, we find that AdamW does not show many advantages over Adam in the SR task. This is because AdamW is mainly proposed to handle the over-fitting problem in high-level vision tasks by introducing weight decay. However, this may not fit the SR task, as the optimization might be under-fitting given the large space of image details.

Fig. 10 shows the visual comparisons among SR models trained by the Adam and our EA-Adam optimizers, where observations consistent with Table IV can be drawn. Specifically, the models optimized by EA-Adam outperform their counterparts optimized by Adam, with structures and patterns that are more faithful to the HR image and richer generated details. It can also be observed that while each of the four models excels in either fidelity or perceptual quality, the fused model produced by our method shows comprehensive enhancement in both aspects.

TABLE V: Training time and quantitative comparisons on the Urban100 benchmark by using different numbers of individuals (N) in EA.
N PSNR SSIM LPIPS Training time
3 25.34 0.7525 0.1184 63h
5 25.57 0.7752 0.1150 112h
7 25.60 0.7804 0.1125 160h
9 25.64 0.7823 0.1114 211h
TABLE VI: Quantitative comparison between the 'Learnable Weight', 'Weight Regression', and 'Our Fusion Method' on the Urban100 dataset. The RRDB backbone is adopted. The best and second-best results are highlighted in red and blue.
PSNR SSIM LPIPS
Learnable Weight 25.60 0.7748 0.1195
Weight Regression 25.71 0.7756 0.1124
Our Fusion Method 25.57 0.7752 0.1150

Selection of the number of optimized models N. To show the influence of the number of optimized models, we conduct experiments with different selections of N. The results on Urban100 are shown in Table V. One can see that increasing N can obtain better perception-distortion performance, but at the price of longer optimization time. When N is greater than 5, the benefits are not significant. Considering both performance and training time, we choose N=5 in the experiments.

Network Fusion vs. Learnable Weight. We propose a two-stage framework for fusing the N networks. The first stage is the attention-based weight regression, which generates a set of weights for each input LR image, so that the corresponding weight vectors of the validation data can be obtained. The second stage is model fusion, which obtains a single stronger model by averaging all the weight vectors of the validation data. To show the effectiveness of the proposed fusion method, we introduce a baseline that sets the fusion weights of the N models directly as learnable parameters, which is named 'Learnable Weight'. We compare our two stages, named 'Weight Regression' and 'Our Fusion Method', with 'Learnable Weight' on the Urban100 dataset in Table VI. We can see that 'Weight Regression' achieves the best results in all metrics since it can dynamically infer the optimal weights for each input based on the image content. 'Our Fusion Method' obtains better performance than 'Learnable Weight' in the perceptual quality metric LPIPS with a 0.0045 improvement while keeping comparable PSNR and SSIM values. This is because the universal network weight is inherited from 'Weight Regression' using the validation data, which provides a stronger generalization capability.

TABLE VII: Quantitative comparison on the Urban100 benchmark among different loss functions for the network fusion process. The RRDB backbone is adopted.
PSNR SSIM LPIPS
RRDB+Ours-\ell_{1} 26.55 0.8029 0.1955
RRDB+Ours-PD [65] 25.89 0.7824 0.1220
RRDB+Ours-L_{adv} 25.57 0.7752 0.1150

Loss Selection for Network Fusion. The fusion network aims to leverage the strengths of the N models for perceptually more plausible SR results. Table VII gives the numerical comparisons on the Urban100 benchmark with the \ell_{1}, PD [28] and L_{adv} loss functions. We choose the adversarial loss in this paper because it achieves the best LPIPS score, which implies perceptually more realistic outputs.

TABLE VIII: Aggregation weights of two layers (L1 and L2) in the RRDB models with different perception-distortion preferences given two inputs (I1 and I2), and the final fusion model (Ours).
L1 L2
I1 [0.0463, 0.0167, 0.0678, 0.3333, 0.5358] [0.2348, 0.2084, 0.1426, 0.1452, 0.2689]
I2 [0.0263, 0.0144, 0.1014, 0.3307, 0.5272] [0.3371, 0.1061, 0.0022, 0.1985, 0.3561]
Ours [0.0965, 0.0733, 0.1429, 0.2629, 0.4243] [0.3203, 0.1212, 0.0224, 0.1932, 0.3428]
TABLE IX: Quantitative comparisons on the Urban100 benchmark between different ways of aggregation weight combination. The RRDB backbone is adopted. The best results are highlighted in bold.
PSNR SSIM LPIPS
(a) 21.86 0.6546 0.2269
(b) 24.52 0.7573 0.1755
(c) 25.57 0.7752 0.1150

Aggregation Weights of Obtained Models. In the fusion stage, the network parameters of the N models are frozen, and the aggregation weights are optimized for each layer using a set of validation data to fuse the N models into one by interpolation, as done in network interpolation methods [39]. To further show the advantage of our proposed fusion method, we study the aggregation weights of the obtained models. Given two inputs (I1 and I2), the aggregation weights in two randomly chosen layers (L1 and L2) of the 5 obtained RRDB models are shown in Table VIII. The aggregation weights of the final fusion model 'Ours' are also given. It can be seen that the weights are neither evenly distributed among models nor concentrated on one or two models. The aggregation weight distribution varies across layers, and each model contributes to the final fusion. This demonstrates the effectiveness of our EA-Adam optimizer, which can simultaneously obtain models with different focuses. We also perform an ablation study on the aggregation weights by (a) directly averaging the parameters of the 5 networks, (b) applying the aggregation weights of one layer to all layers of the 5 networks, and (c) our proposed fusion strategy. The results on Urban100 are shown in Table IX. Our fusion method achieves the best result, showing the effectiveness of our network fusion method.

IV-D Applications to Real-World Super-Resolution

To demonstrate the generalization capability of the proposed method, we further present comparisons under the complex real-world SR setting. Specifically, we use the generator and discriminator of RealESRGAN [69] and optimize the model with our proposed hybrid EA-Adam optimizer and the fusion strategy. DF2K [67, 68] is used as the training dataset, and the LR images are degraded with the RealESRGAN degradation pipeline [69]. The test data are from the real-world datasets RealSR [76] and DRealSR [77]. The compared methods include RealESRGAN [69], IKC [82], LDL [13], IKR-Net [83] and DASR [38]. The results are shown in Table X. One can see that IKC obtains the worst results since it focuses on solving blur degradation and cannot adapt to real-world SR tasks. The proposed method obtains the best perception-distortion results in the real-world SR task, demonstrating its effectiveness.

TABLE X: Quantitative comparisons with GAN-based state-of-the-art real-world super-resolution methods on the RealSR and DRealSR test datasets. The RRDB backbone is adopted. The best result is highlighted in bold.
RealSR DRealSR
PSNR/SSIM/LPIPS PSNR/SSIM/LPIPS
RealESRGAN 25.69/0.7616/0.2727 28.64/0.8053/0.2847
IKC 19.51/0.5088/0.4337 27.07/0.7429/0.3994
IKR-Net 25.15/0.7308/0.3912 28.94/0.8131/0.4274
LDL 25.28/0.7567/0.2766 28.21/0.8126/0.2815
DASR 27.02/0.7708/0.3151 29.77/0.8263/0.3126
RRDB+Ours 27.59/0.7806/0.2621 30.32/0.8358/0.2802

IV-E Training and Inference Time

Training time. We first compare the training time of our hybrid EA-Adam optimization method (including the EA-Adam optimizer and the network fusion stage) and the original Adam optimizer by training an SR model with the RRDB backbone for 4\times super-resolution. Table XI provides the quantitative comparison of the training time. Specifically, the Adam optimizer costs 250 epochs and 48h to converge. Though the hybrid EA-Adam costs a longer time in each epoch, it converges in far fewer epochs thanks to the effective interaction in the EA step. The hybrid EA-Adam optimization method costs 144h in total to finish the model training, which is about three times the training time of Adam. Most of the training time of the hybrid EA-Adam is spent on the EA part. It is noted that various parallel computation methods [85, 86] have been developed to speed up the EA process based on CUDA technology, showing great potential to accelerate the training of our method.

Inference time. While spending more time in training, the proposed method does not introduce any extra computational and memory burden during the inference stage. The trained SR model has the same inference time as other SR models with the same backbone.

TABLE XI: Comparison of the training time of our hybrid EA-Adam optimization method and the Adam method.
Method Adam EA-Adam optimizer Network Fusion
EA-iter Adam-iter
Run-time/epoch 0.19h 4.76h 0.80h 0.63h
Total-epoch 250 8 92 50
Total-run-time 48h 112h 32h
Figure 11: Visualization of failure cases generated by our model.

IV-F Failure Cases

While our method can improve the training process of GAN-based SR models and enhance their perception-distortion balance, it can still fail in some challenging cases. Fig. 11 shows some typical examples. If the high-frequency structures are largely destroyed in the input LR image, it is hard for our model to recover them in the SR output, as shown in Fig. 11(a). In addition, for some small-scale textures and text characters, our method may generate visual artifacts, as shown in Figs. 11(b) and 11(c).

V Conclusion

In this paper, we proposed a hybrid EA-Adam optimization method to train perception-distortion balanced image super-resolution (SR) models. We formulated the perception-distortion trade-off as a multi-objective optimization problem. A population of SR models was optimized by alternately performing the Adam and EA steps, where Adam was dedicated to model convergence and EA focused on rescuing the models from local minima thanks to its strong capability in multi-objective search. An effective model fusion strategy was then designed to merge the trained models into a stronger one, which further improved the perception-distortion trade-off without increasing the inference cost. Extensive experiments demonstrated the effectiveness of the proposed method against the traditional Adam optimizer, and the state-of-the-art perception-distortion balanced performance of the trained SR models.

References

  • [1] Z. Wang, J. Chen, and S. C. H. Hoi, “Deep learning for image super-resolution: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3365–3387, 2021.
  • [2] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European conference on computer vision, 2014, pp. 184–199.
  • [3] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in European conference on computer vision.   Springer, 2016, pp. 391–407.
  • [4] M. S. Sajjadi, B. Scholkopf, and M. Hirsch, “EnhanceNet: Single image super-resolution through automated texture synthesis,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 4491–4500.
  • [5] M. Haris, G. Shakhnarovich, and N. Ukita, “Deep back-projection networks for super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1664–1673.
  • [6] J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1637–1645.
  • [7] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4681–4690.
  • [8] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European conference on computer vision (ECCV) workshops, 2018, pp. 0–0.
  • [9] C. Laroche, A. Almansa, and M. Tassano, “Deep model-based super-resolution with non-uniform blur,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2023, pp. 1797–1808.
  • [10] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 286–301.
  • [11] S. Anwar and N. Barnes, “Densely residual laplacian super-resolution,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1192–1204, 2020.
  • [12] T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution using dense skip connections,” in Proceedings of the IEEE International Conference on computer vision, 2017, pp. 4799–4807.
  • [13] J. Liang, H. Zeng, and L. Zhang, “Details or Artifacts: A locally discriminative learning approach to realistic image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5657–5666.
  • [14] T. Xu, P. Mi, X. Zheng, L. Li, F. Chao, G. Jiang, W. Zhang, Y. Zhou, and R. Ji, “What hinders perceptual quality of PSNR-oriented methods?” arXiv preprint arXiv:2201.01034, 2022.
  • [15] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [16] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European conference on computer vision.   Springer, 2016, pp. 694–711.
  • [17] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2.   IEEE, 2003, pp. 1398–1402.
  • [18] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6228–6237.
  • [19] J. W. Soh, G. Y. Park, J. Jo, and N. I. Cho, “Natural and realistic single image super-resolution with explicit natural manifold discrimination,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8122–8131.
  • [20] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor, “The 2018 PIRM challenge on perceptual image super-resolution,” in Computer Vision – ECCV 2018 Workshops.   Cham: Springer International Publishing, 2019, pp. 334–355.
  • [21] K. Zhang, L. V. Gool, and R. Timofte, “Deep unfolding network for image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 3217–3226.
  • [22] W. Zhang, Y. Liu, C. Dong, and Y. Qiao, “RankSRGAN: Generative adversarial networks with ranker for image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3096–3105.
  • [23] Q. Cai, J. Li, H. Li, Y.-H. Yang, F. Wu, and D. Zhang, “Tdpn: Texture and detail-preserving network for single image super-resolution,” IEEE Transactions on Image Processing, vol. 31, pp. 2375–2389, 2022.
  • [24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
  • [25] E. Prashnani, H. Cai, Y. Mostofi, and P. Sen, “PieAPP: Perceptual image-error assessment through pairwise preference,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1808–1817.
  • [26] D. Fuoli, L. Van Gool, and R. Timofte, “Fourier space losses for efficient perceptual image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2360–2369.
  • [27] W. Li, K. Zhou, L. Qi, L. Lu, and J. Lu, “Best-Buddy GANs for highly detailed image super-resolution,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 1412–1420.
  • [28] Y. Zhang, B. Ji, J. Hao, and A. Yao, “Perception-distortion balanced ADMM optimization for single-image super-resolution,” in European Conference on Computer Vision.   Springer, 2022, pp. 108–125.
  • [29] L. Xie, X. Wang, X. Chen, G. Li, Y. Shan, J. Zhou, and C. Dong, “DeSRA: Detect and delete the artifacts of GAN-based real-world super-resolution models,” arXiv preprint arXiv:2307.02457, 2023.
  • [30] S. H. Park, Y. S. Moon, and N. I. Cho, “Perception-oriented single image super-resolution using optimal objective estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1725–1735.
  • [31] X. Luo, Y. Zhu, S. Xu, and D. Liu, “On the effectiveness of spectral discriminators for perceptual quality improvement,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 13243–13253.
  • [32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [33] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 586–595.
  • [34] N. Srinivas and K. Deb, “Muiltiobjective optimization using nondominated sorting in genetic algorithms,” Evolutionary computation, vol. 2, no. 3, pp. 221–248, 1994.
  • [35] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE transactions on evolutionary computation, vol. 6, no. 2, pp. 182–197, 2002.
  • [36] Q. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on evolutionary computation, vol. 11, no. 6, pp. 712–731, 2007.
  • [37] Y. Wang, L. Wang, H. Wang, P. Li, and H. Lu, “Blind single image super-resolution with a mixture of deep networks,” Pattern Recognition, vol. 102, p. 107169, 2020.
  • [38] J. Liang, H. Zeng, and L. Zhang, “Efficient and degradation-adaptive network for real-world image super-resolution,” in European Conference on Computer Vision, 2022.
  • [39] X. Wang, K. Yu, C. Dong, X. Tang, and C. C. Loy, “Deep network interpolation for continuous imagery effect transition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1692–1701.
  • [40] M. Liu, J. Pan, Z. Yan, W. Zuo, and L. Zhang, “Adaptive network combination for single-image reflection removal: A domain generalization perspective,” arXiv preprint arXiv:2204.01505, 2022.
  • [41] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
  • [42] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2472–2481.
  • [43] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: Image restoration using swin transformer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1833–1844.
  • [44] X. Zhang, H. Zeng, S. Guo, and L. Zhang, “Efficient long-range attention network for image super-resolution,” arXiv preprint arXiv:2203.06697, 2022.
  • [45] X. Chen, X. Wang, J. Zhou, Y. Qiao, and C. Dong, “Activating more pixels in image super-resolution transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22367–22377.
  • [46] Q. Cai, Y. Qian, J. Li, J. Lyu, Y.-H. Yang, F. Wu, and D. Zhang, “Hipa: Hierarchical patch transformer for single image super resolution,” IEEE Transactions on Image Processing, vol. 32, pp. 3226–3237, 2023.
  • [47] H. Huang, R. He, Z. Sun, and T. Tan, “Wavelet-SRNET: A wavelet-based CNN for multi-scale face super resolution,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1689–1697.
  • [48] C. Korkmaz, A. M. Tekalp, and Z. Dogan, “Training generative image super-resolution models by wavelet-domain losses enables better control of artifacts,” arXiv preprint arXiv:2402.19215, 2024.
  • [49] S. H. Park, Y. S. Moon, and N. I. Cho, “Flexible style image super-resolution using conditional objective,” IEEE Access, vol. 10, pp. 9774–9792, 2022.
  • [50] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal processing letters, vol. 20, no. 3, pp. 209–212, 2012.
  • [51] C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang, “Learning a no-reference quality metric for single-image super-resolution,” Computer Vision and Image Understanding, vol. 158, pp. 1–16, 2017.
  • [52] T. Higuchi, S. Tsutsui, and M. Yamamura, “Theoretical analysis of simplex crossover for real-coded genetic algorithms,” in International Conference on Parallel Problem Solving from Nature.   Springer, 2000, pp. 365–374.
  • [53] X. Zhou, A. K. Qin, M. Gong, and K. C. Tan, “A survey on evolutionary construction of deep neural networks,” IEEE Transactions on Evolutionary Computation, vol. 25, no. 5, pp. 894–912, 2021.
  • [54] R. Saravanan, S. Ramabalan, N. G. R. Ebenezer, and C. Dharmaraja, “Evolutionary multi criteria design optimization of robot grippers,” Applied Soft Computing, vol. 9, no. 1, pp. 159–172, 2009.
  • [55] H. Mala-Jetmarova, N. Sultanova, and D. Savic, “Lost in optimisation of water distribution systems? A literature review of system design,” Water, vol. 10, no. 3, p. 307, 2018.
  • [56] Z. Guo, X. Zhang, H. Mu, W. Heng, Z. Liu, Y. Wei, and J. Sun, “Single path one-shot neural architecture search with uniform sampling,” in European conference on computer vision.   Springer, 2020, pp. 544–560.
  • [57] J. Yu, P. Jin, H. Liu, G. Bender, P.-J. Kindermans, M. Tan, T. Huang, X. Song, R. Pang, and Q. Le, “BigNAS: Scaling up neural architecture search with big single-stage models,” in European Conference on Computer Vision.   Springer, 2020, pp. 702–717.
  • [58] G. Ying, X. He, B. Gao, B. Han, and X. Chu, “EAGAN: Efficient two-stage evolutionary architecture search for GANs,” in European Conference on Computer Vision.   Springer, 2022, pp. 37–53.
  • [59] Y. Zhou, G. G. Yen, and Z. Yi, “A knee-guided evolutionary algorithm for compressing deep neural networks,” IEEE transactions on cybernetics, vol. 51, no. 3, pp. 1626–1638, 2019.
  • [60] X. Cui, W. Zhang, Z. Tüske, and M. Picheny, “Evolutionary stochastic gradient descent for optimization of deep neural networks,” Advances in neural information processing systems, vol. 31, 2018.
  • [61] H. Zhang, K. Hao, L. Gao, B. Wei, and X. Tang, “Optimizing deep neural networks through neuroevolution with stochastic gradient descent,” IEEE Transactions on Cognitive and Developmental Systems, 2022.
  • [62] S. Yang, Y. Tian, C. He, X. Zhang, K. C. Tan, and Y. Jin, “A gradient-guided evolutionary approach to training deep neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
  • [63] Y. Tian, H. Chen, H. Ma, X. Zhang, K. C. Tan, and Y. Jin, “Integrating conjugate gradients into evolutionary algorithms for large-scale continuous multi-objective optimization,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 10, pp. 1801–1817, 2022.
  • [64] Z. Chen, V. Badrinarayanan, C.-Y. Lee, and A. Rabinovich, “GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks,” in International conference on machine learning.   PMLR, 2018, pp. 794–803.
  • [65] K. Miettinen, Nonlinear multi-objective optimization.   Springer Science, 2012, vol. 12.
  • [66] Y. Chen, X. Dai, M. Liu, D. Chen, L. Yuan, and Z. Liu, “Dynamic convolution: Attention over convolution kernels,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11030–11039.
  • [67] E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 126–135.
  • [68] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, “Ntire 2017 challenge on single image super-resolution: Methods and results,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 114–125.
  • [69] X. Wang, L. Xie, C. Dong, and Y. Shan, “Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 1905–1914.
  • [70] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.
  • [71] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in International conference on curves and surfaces.   Springer, 2010, pp. 711–730.
  • [72] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2.   IEEE, 2001, pp. 416–423.
  • [73] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5197–5206.
  • [74] Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa, “Sketch-based manga retrieval using Manga109 dataset,” Multimedia Tools and Applications, vol. 76, no. 20, pp. 21811–21838, 2017.
  • [75] K. Ding, K. Ma, S. Wang, and E. P. Simoncelli, “Image quality assessment: Unifying structure and texture similarity,” IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 5, pp. 2567–2581, 2020.
  • [76] J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang, “Toward real-world single image super-resolution: A new benchmark and a new model,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
  • [77] P. Wei, Z. Xie, H. Lu, Z. Zhan, Q. Ye, W. Zuo, and L. Lin, “Component divide-and-conquer for real-world image super-resolution,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16.   Springer, 2020, pp. 101–117.
  • [78] C. Ma, Y. Rao, Y. Cheng, C. Chen, J. Lu, and J. Zhou, “Structure-preserving super resolution with gradient guidance,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 7769–7778.
  • [79] J. Park, S. Son, and K. M. Lee, “Content-aware local GAN for photo-realistic super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 10585–10594.
  • [80] W. Wang, R. Guo, Y. Tian, and W. Yang, “CFSNet: Toward a controllable feature space for image restoration,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
  • [81] S. Vasu, N. Thekke Madam, and A. Rajagopalan, “Analyzing perception-distortion tradeoff using enhanced perceptual super-resolution network,” in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018, pp. 0–0.
  • [82] J. Gu, H. Lu, W. Zuo, and C. Dong, “Blind super-resolution with iterative kernel correction,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 1604–1613.
  • [83] H. F. Ates, S. Yildirim, and B. K. Gunturk, “Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation,” Computer Vision and Image Understanding, vol. 233, p. 103718, 2023.
  • [84] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in International Conference on Learning Representations, 2019.
  • [85] J. Shi, J. Sun, Q. Zhang, H. Zhang, and Y. Fan, “Improving pareto local search using cooperative parallelism strategies for multiobjective combinatorial optimization,” IEEE Transactions on Cybernetics, 2022.
  • [86] J. Shi, Q. Zhang, and J. Sun, “PPLS/D: Parallel pareto local search based on decomposition,” IEEE transactions on cybernetics, vol. 50, no. 3, pp. 1060–1071, 2018.