AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations
Abstract
Image restoration models often face the simultaneous interaction of multiple degradations in real-world scenarios. Existing approaches typically handle single or composite degradations based on scene descriptors derived from text or image embeddings. However, due to the varying proportions of different degradations within an image, these scene descriptors may not accurately differentiate between degradations, leading to suboptimal restoration in practical applications. To address this issue, we propose a novel Transformer-based restoration framework, AllRestorer. In AllRestorer, we enable the model to adaptively consider all image impairments, thereby avoiding errors from scene descriptor misdirection. Specifically, we introduce an All-in-One Transformer Block (AiOTB), which adaptively removes all degradations present in a given image by modeling the relationships between all degradations and the image embedding in latent space. To accurately address different variations potentially present within the same type of degradation and minimize ambiguity, AiOTB utilizes a composite scene descriptor consisting of both image and text embeddings to define the degradation. Furthermore, AiOTB includes an adaptive weight for each degradation, allowing for precise control of the restoration intensity. By leveraging AiOTB, AllRestorer avoids misdirection caused by inaccurate scene descriptors, achieving a 5.00 dB increase in PSNR compared to the baseline on the CDD-11 dataset.
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/cd868452-13c4-4155-a189-b7cc9418a78f/Fig1x.jpg)
1 Introduction
To meet the demands of autonomous driving [13, 58, 44] under various road conditions, recent studies have moved beyond image restoration for single-degradation scenarios [49, 28, 38] and have shifted their focus to all-in-one image restoration tasks [25, 32, 59]. As an emerging low-level vision task, all-in-one restoration requires the model to restore visuals from various types of impaired inputs.
While all-in-one image restoration has shown satisfactory performance under experimental conditions, real-world scenarios often involve multiple degradations coexisting in the same scene, posing a significant challenge to most restoration methods. To address this issue, there is an increasing need for a solution that combines all-in-one image restoration with composite degradation restoration. Recent approaches introduce the concept of scene descriptors in all-in-one image restoration [17, 5], which guide the model to accurately perform restoration by providing finer distinctions between single and composite degradations.
VPIP [9] introduces paired degraded and clean images corresponding to the current restoration task as visual scene descriptors, providing prior information for restoring composite degraded images. However, the VPIP setting requires users to accurately determine the type of degradation and provide the corresponding visual scene descriptors, making it impractical for real-world deployment. To address this limitation, MPerceiver [1] introduces impaired image-to-text inversion, creating a textual scene descriptor that adaptively leverages diffusion-based prior knowledge for composite image restoration without human intervention. OneRestore [17] uses explicit definitions of different degradations in textual form to guide scene descriptors, enabling accurate restoration of composite degradations.
Despite some progress under experimental conditions, these schemes do not effectively model composite degradations. Because a composite degradation contains multiple degradations in varying proportions, both users and models often struggle to identify the correct degradation type, which leads to incorrect restoration.

In response to the above issues, we propose a transformer-based restoration model, AllRestorer, designed to handle the four physical degradation types (see Fig. 1) in the official composite degradation dataset CDD-11 [17]. In AllRestorer, we propose a novel All-in-One Transformer Block (AiOTB) to adaptively remove all degradations in latent space, effectively addressing the composite degradation problem. As the core component of AllRestorer, AiOTB builds a memory bank containing descriptors of all single degradation types (e.g., rain, snow, fog, and low-light) and implements a novel All-in-One Attention (AiOA) mechanism. AiOA leverages the memory bank’s descriptors to introduce restoration schemes for all degradations. The self-attention mechanism in AiOTB adaptively assigns the appropriate restoration schemes for each degradation in latent space. Considering that visual scene descriptors lack precise definitions for degradations and that text scene descriptors may not efficiently capture the variations within the same degradation type, AllRestorer employs a composite scene descriptor that integrates both text and image embeddings. To adapt to varying proportions of single degradations within composite degradation scenes and to support all-in-one restoration composed of multiple single-degradation processes, AiOA also includes an adaptive weighting mechanism to control the restoration intensity for different degradations.
To demonstrate the effectiveness of AllRestorer for all-in-one image restoration and composite degradation restoration, we trained and tested it on the CDD-11 dataset. As shown in Fig. 2, AllRestorer achieves state-of-the-art (SoTA) performance on CDD-11, surpassing the previous SoTA baseline by 5.00 dB in PSNR. Additionally, real-world tests confirmed the potential of AllRestorer for practical applications.
Our contributions can be summarized as:
- We propose a novel restoration scheme, AllRestorer, which effectively addresses the composite degradation challenge by removing all types of degradation.
- Our composite scene descriptor, based on text and image embeddings, ensures an accurate representation of degradation types.
- Our adaptive weights effectively control the restoration intensity for different degradations within composite scenes, accommodating varying proportions.
2 Related Work
2.1 All-in-One Image Restoration

All-in-one image restoration aims to address multiple types of degradation in real-world scenarios using a single model. All-in-One [26] first introduces Neural Architecture Search (NAS) to extend the restoration task to an all-in-one approach. However, the large number of parameters in NAS makes practical deployment challenging. TKL [8] utilizes knowledge distillation to aggregate multiple restoration priors for the all-in-one task, while TransWeather [41] introduces learnable queries to model different degradations. However, learnable queries lack interpretability, and knowledge distillation requires collecting teacher models. WeatherDiff [30] and MPerceiver [1] use diffusion models to handle all-in-one restoration. VPIP [9] and PromptIR [32] employ prompt learning with different scenario descriptors for all-in-one restoration, and OneRestore [17] further explores composite degradation restoration by improving scene descriptors (see Fig. 3). However, inaccurate judgment of scene descriptors affects the application of these solutions.
2.2 Prompt Learning
Recently, prompt learning in Natural Language Processing (NLP) [31, 3, 24] has gained increasing importance in computer vision [55, 15]. Pre-trained models can perform various tasks based on prompts containing specific knowledge. Notably, CoOp [54] and CoCoOp [53] introduce prompt learning for optimizing CLIP [34], while SAM [23] uses multimodal prompts for unified semantic segmentation. MAE-VQGAN [2] and Painter [43] also employ prompts to address multiple visual tasks. PromptIR [32] and TransWeather [41] utilize learnable prompts for image restoration across various impaired scenarios, and VPT [21] introduces visual prompts for this purpose. Building on prompt learning, OneRestore [17] constructs text scene descriptors for adaptive composite degradation restoration. However, in all-in-one restoration and composite degradation restoration, the lack of interpretability in visual prompts may lead to confusion between different degradation tasks, while fixed text prompts struggle to adapt to variations of the same degradation in different scenarios.

3 The Proposed Method
In AllRestorer (see Fig. 4), we define a memory bank that stores the corresponding texts for all types of degradation, covering four common physical corruptions: low-light, haze, rain, and snow. To extract these features, we utilize CLIP [34] encoders (fine-tuned on CDD-11) to generate text embeddings for each entry in the memory bank, and image embeddings for the degraded input image. AllRestorer primarily uses an encoder-decoder U-Net [36] framework to capture multi-scale representations. Within this U-Net architecture, we introduce an All-in-One Transformer Block (AiOTB) as the core module. Unlike previous all-in-one restoration approaches [32, 17, 9], AiOTB incorporates a novel All-in-One Attention (AiOA) mechanism (Sec. 3.1). This mechanism introduces restoration schemes for each degradation by leveraging all scene descriptors as queries.
In AiOTB, Self-Attention (SA) is employed to remove the degradations from a given image in the latent space based on the corresponding restoration scheme. Notably, to efficiently characterize the degradation and accommodate its varying manifestations across different scenes, AiOA uses scene descriptors composed of both image and text embeddings (Sec. 3.2). Furthermore, to adapt to varying ratios of different degradations in complex scenes and to ensure that the all-in-one restoration is not negatively affected by unrelated scene descriptors, AiOA integrates an adaptive weight (Sec. 3.3) that adjusts the intensity of restoration for each type of degradation in latent space. Finally, as in most prior work [49, 20, 4], AllRestorer incorporates deep convolution in the feed-forward network of AiOTB to effectively restore the local details of the image.
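For illustration, a minimal PyTorch sketch of how the memory-bank text embeddings could be obtained with a CLIP text encoder is given below. The prompt strings and the ViT-B/32 backbone are assumptions made only for exposition; the text above specifies only that a CLIP encoder fine-tuned on CDD-11 is used.

```python
# Minimal sketch (not the released code): building the degradation memory bank
# with a CLIP text encoder. The prompt strings and the "ViT-B/32" backbone are
# assumptions; the paper only states that CLIP is fine-tuned on CDD-11.
import torch
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # fine-tuned weights would be loaded here

# Hypothetical textual entries for the four physical corruptions.
memory_bank_texts = ["low-light", "haze", "rain", "snow"]

with torch.no_grad():
    tokens = clip.tokenize(memory_bank_texts).to(device)
    text_embeds = model.encode_text(tokens)                        # (N, d): one embedding per degradation
    text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
```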
3.1 All-in-One Transformer Block
To prevent scene descriptors from being misled by the varying proportions of different degradations in composite degradation scenarios, AllRestorer introduces the novel AiOTB base unit. As illustrated in Fig. 4, AiOTB consists of three main components: AiOA, SA, and a Deep Convolutional Feed-Forward Network (DCFFN), where SA denotes the standard self-attention operation. Unlike attention mechanisms used in previous all-in-one restoration frameworks, AiOA, the core of AiOTB, considers all scene descriptors stored in the memory bank. By modeling long-range dependencies between image features and these descriptors, AiOTB effectively integrates all restoration solutions for the given image:
$$\mathrm{AiOA}(\mathbf{X}) = \sum_{i=1}^{N} w_i \,\mathrm{Attention}(\mathbf{Q}_i, \mathbf{K}, \mathbf{V}) \tag{1}$$
where $w_i$ represents the adaptive weights defined in Sec. 3.3, which control the restoration intensity for different types of degradation. The variable $N$ denotes the total number of degradation types. $\{\mathbf{Q}_i\}_{i=1}^{N}$ consists of all scene descriptors used as query matrices, while $\mathbf{K}$ and $\mathbf{V}$ are the key and value matrices projected from the image features $\mathbf{X}$, respectively. Within AiOTB, SA adaptively selects the appropriate restoration approach for each type of degradation in the latent space, allowing AiOTB to effectively remove all degradations without requiring precise human or model-specific judgments for composite degradations. For the DCFFN component, we utilize the feed-forward network design from Restormer [49].
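The following minimal single-head PyTorch sketch illustrates Eq. (1). The projection layers, tensor shapes, and the way the output is fused back into the image features are illustrative assumptions rather than our exact implementation.

```python
# Minimal single-head sketch of Eq. (1) (All-in-One Attention), not the released code.
import torch
import torch.nn as nn

class AiOASketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)   # projects scene descriptors to queries
        self.to_k = nn.Linear(dim, dim)   # projects image features to keys
        self.to_v = nn.Linear(dim, dim)   # projects image features to values
        self.scale = dim ** -0.5

    def forward(self, img_feat, descriptors, weights):
        # img_feat:    (B, L_img, C) image features in latent space
        # descriptors: (N, B, L_d, C) composite scene descriptors, one per degradation
        # weights:     (B, N) adaptive restoration weights (Sec. 3.3)
        k = self.to_k(img_feat)                        # (B, L_img, C)
        v = self.to_v(img_feat)                        # (B, L_img, C)
        out = 0.0
        for i, d in enumerate(descriptors):            # loop over the N degradation types
            q = self.to_q(d)                           # (B, L_d, C)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            scheme_i = attn.softmax(dim=-1) @ v        # (B, L_d, C) restoration scheme for degradation i
            out = out + weights[:, i, None, None] * scheme_i
        return out                                     # weighted sum over all degradations, Eq. (1)
```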
3.2 Composite Scene Descriptor
For the AiOA scene descriptor, we develop a composite scene descriptor for AllRestorer. Unlike traditional scene descriptors, our composite version integrates both text and image embeddings. Compared to image descriptors, the inclusion of text embeddings provides a more explicit definition of degradation, helping to reduce confusion across tasks. Meanwhile, unlike text descriptors, the addition of image embeddings allows the composite descriptor to adapt to variations of the same type of degradation across different scenes. Therefore, composite scene descriptors not only avoid ambiguity but also effectively define the degradation characteristics.
Specifically, AllRestorer introduces a fine-tuned CLIP encoder to extract all text embeddings $\{\mathbf{T}_i\}_{i=1}^{N}$ in the memory bank and the image tokens $\mathbf{X}_{tok}$ of the given degraded image, respectively. However, for composite degradation, $\mathbf{X}_{tok}$ will encompass various degradations, which interfere with image restoration. In AllRestorer, we calculate the dot-product similarity between all text embeddings and each image token, selecting the image tokens with the highest similarity as the image embeddings of the composite scene descriptor:
$$\mathrm{Sim}_i = \mathbf{X}_{tok}\,\mathbf{T}_i^{\top}, \qquad \mathbf{E}^{img}_i = \mathrm{Top}\text{-}k\big(\mathbf{X}_{tok},\ \mathrm{Sim}_i\big) \tag{2}$$
where $k$ denotes the token number of $\mathbf{E}^{img}_i$. The function $\mathrm{Top}\text{-}k(\cdot)$ identifies the $k$ image tokens with the highest dot-product similarity $\mathrm{Sim}_i$. Eventually, the combined image embedding and text embedding will be projected as our composite scene descriptor:
$$\mathbf{D}_i = \big[\mathbf{E}^{img}_i,\ \mathbf{T}_i\big]\,\mathbf{W}_p \tag{3}$$
where $\mathbf{W}_p$ represents the projection matrix of the composite scene descriptor, and $\mathbf{D}_i$ serves as the query $\mathbf{Q}_i$ in Eq. (1). Depending on the composite scene descriptor, AllRestorer can define composite degradation more accurately and avoid incorrect restoration.
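A minimal PyTorch sketch of Eqs. (2)-(3) is shown below. The helper name `composite_descriptors`, the shared embedding dimension between image tokens and text embeddings, and the single linear projection are illustrative assumptions.

```python
# Minimal sketch of Eqs. (2)-(3), not the released code: for each degradation, the k
# image tokens most similar to its text embedding are selected and concatenated with
# that text embedding, then projected into a composite scene descriptor.
import torch
import torch.nn as nn

def composite_descriptors(img_tokens, text_embeds, proj: nn.Linear, k: int = 10):
    # img_tokens:  (B, L, C) CLIP image tokens of the degraded input
    # text_embeds: (N, C)    CLIP text embeddings from the memory bank
    # proj:        projection W_p of the composite scene descriptor (C -> C assumed)
    B, L, C = img_tokens.shape
    N = text_embeds.size(0)
    sim = torch.einsum("blc,nc->bnl", img_tokens, text_embeds)          # dot-product similarity, Eq. (2)
    topk_idx = sim.topk(k, dim=-1).indices                              # (B, N, k) most similar tokens
    idx = topk_idx.unsqueeze(-1).expand(-1, -1, -1, C)                  # (B, N, k, C)
    img_embeds = torch.gather(
        img_tokens.unsqueeze(1).expand(-1, N, -1, -1), 2, idx
    )                                                                    # (B, N, k, C) selected image embeddings
    text = text_embeds.unsqueeze(0).unsqueeze(2).expand(B, -1, 1, -1)    # (B, N, 1, C)
    combined = torch.cat([img_embeds, text], dim=2)                      # (B, N, k+1, C)
    return proj(combined)                                                # composite scene descriptors, Eq. (3)
```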
3.3 Adaptive Weights

To account for the varying proportions of different degradations in composite degradation scenarios and to prevent AiOTB’s overall restoration from interfering with single-degradation restoration, AiOA incorporates adaptive weights to regulate the restoration strength for each type of degradation in different contexts (see Fig. 5).
In AllRestorer, the CLIP image projection layer is used to project the image class token $\mathbf{x}_{cls}$, and the dot-product similarity with each text embedding is computed to adaptively represent the proportion of each degradation in the given image. The similarity is scaled by the softmax function as an adaptive weight for each degradation:
$$w_i = \frac{\exp\big(\mathrm{Proj}(\mathbf{x}_{cls})\,\mathbf{T}_i^{\top}\big)}{\sum_{j=1}^{N}\exp\big(\mathrm{Proj}(\mathbf{x}_{cls})\,\mathbf{T}_j^{\top}\big)} \tag{4}$$
Using the similarity between the projected image vector and the text embeddings, the adaptive weights estimate the proportion of each degradation in the image, ensuring that AiOTB addresses the corresponding degradation without affecting the restoration of the others.
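The computation of Eq. (4) can be sketched as follows; the function name and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of Eq. (4), not the released code: the CLIP image class token is
# projected and compared (dot product) with each memory-bank text embedding; a softmax
# over the degradation types yields the adaptive weights.
import torch

def adaptive_weights(cls_token, text_embeds, image_proj):
    # cls_token:   (B, C_img) CLIP image class token of the degraded input
    # text_embeds: (N, C)     memory-bank text embeddings
    # image_proj:  CLIP image projection layer mapping C_img -> C
    img_vec = image_proj(cls_token)        # (B, C) projected class token
    sim = img_vec @ text_embeds.t()        # (B, N) dot-product similarity per degradation
    return sim.softmax(dim=-1)             # (B, N) adaptive weights w_i
```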
4 Experiments
Table 1. Quantitative comparison with baselines on CDD-11 (PSNR, SSIM, and parameter count).

| Types | Method | PSNR | SSIM | Params |
|---|---|---|---|---|
| Uniform | MIRNet [47] | 25.97 | 0.8474 | 31.79M |
| | MPRNet [48] | 25.47 | 0.8555 | 15.74M |
| | MIRNetv2 [50] | 25.37 | 0.8335 | 5.86M |
| | Restormer [49] | 28.99 | 0.8646 | 26.13M |
| | DGUNet [28] | 26.92 | 0.8559 | 17.33M |
| | NAFNet [10] | 24.13 | 0.7964 | 17.11M |
| | SRUDC [39] | 27.64 | 0.8600 | 6.80M |
| | Fourmer [56] | 23.44 | 0.7885 | 0.55M |
| | OKNet [11] | 26.33 | 0.8605 | 4.72M |
| All-in-One | AirNet [25] | 23.75 | 0.8140 | 8.93M |
| | TransWeather [41] | 23.13 | 0.7810 | 21.90M |
| | WeatherDiff [29] | 22.49 | 0.7985 | 82.96M |
| | PromptIR [32] | 25.90 | 0.8499 | 38.45M |
| | WGWSNet [59] | 26.96 | 0.8626 | 25.76M |
| Composite | OneRestore [17] | 28.47 | 0.8784 | 5.98M |
| | [17] | 28.72 | 0.8821 | 5.98M |
| | AllRestorer (ours) | 33.72 | 0.9436 | 12.18M |

[Figure: visual comparison — columns: Input, Restormer [49], OKNet [11], PromptIR [32], TransWeather [41], OneRestore [17], AllRestorer, Target]
4.1 Datasets and Evaluation Metrics
To fairly evaluate AllRestorer and the baselines on all-in-one image restoration and composite degradation restoration, we train all models uniformly on the official CDD-11 dataset [17] and test them on both synthetic and real data. The synthesized data in CDD-11 are used to assess all-in-one image restoration and composite degradation restoration. Real-world datasets such as LOL-v2 [45], Snow100k-R [27], NH-HAZE [14], and SPA-Data [42] are employed to evaluate AllRestorer's performance on real-world all-in-one image restoration. Additionally, we gather 200 real composite-scene cases from the web (real200) to evaluate AllRestorer's performance on real-world composite degradation restoration. For CDD-11 results, we report PSNR and SSIM; for real-world tests, we report the no-reference NIQE and PIQE metrics.
4.2 Implementation Details
We train AllRestorer for 300 epochs in PyTorch on an NVIDIA RTX 3090. Training uses the Adam optimizer with an initial learning rate of 1.25e-4. The loss function combines a smooth $L_1$ loss with a VGG-based perceptual loss weighted at 0.04. All training images are cropped to 256×256 resolution, and data augmentation is limited to random cropping. For CLIP fine-tuning, we fine-tune for only 40 epochs with a learning rate of 1e-6 on the single-degradation data in CDD-11.
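For reference, a minimal sketch of the stated loss configuration is given below. The VGG-19 feature depth and the omission of ImageNet input normalization are assumptions; only the smooth $L_1$ term and the perceptual weight of 0.04 are specified above.

```python
# Minimal sketch (assumptions, not the released training code) of the stated recipe:
# smooth L1 loss plus a VGG-based perceptual loss weighted 0.04, Adam with lr 1.25e-4.
import torch
import torch.nn as nn
import torchvision

# Frozen VGG-19 feature extractor; the cut-off layer (index 16) is an assumption.
vgg = torchvision.models.vgg19(weights=torchvision.models.VGG19_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def restoration_loss(pred, target, perceptual_weight=0.04):
    pixel = nn.functional.smooth_l1_loss(pred, target)            # smooth L1 reconstruction term
    perceptual = nn.functional.l1_loss(vgg(pred), vgg(target))    # VGG feature-space term
    return pixel + perceptual_weight * perceptual

# optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
```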
4.3 Comparison with SOTA Methods
We evaluate the performance of AllRestorer, unified image restoration networks (Uniform), all-in-one image restoration methods (All-in-One), and composite degradation restoration schemes (Composite) on CDD-11.
[Figure: visual comparison — columns: Input, OKNet [11], Fourmer [56], TransWeather [41], PromptIR [32], OneRestore [17], AllRestorer]

Table 2. Real-world composite degradation comparison on the real200 cases (NIQE / PIQE).

| Method | Venue & Year | NIQE | PIQE |
|---|---|---|---|
| NAFNet [10] | ECCV 2022 | 5.561 | 27.0 |
| SRUDC [39] | ICCV 2023 | 8.911 | 42.2 |
| Fourmer [56] | ICML 2023 | 5.731 | 24.9 |
| OKNet [11] | AAAI 2024 | 26.33 | 27.5 |
| AirNet [25] | CVPR 2022 | 6.283 | 29.5 |
| TransWeather [41] | CVPR 2022 | 5.760 | 27.4 |
| PromptIR [32] | NeurIPS 2023 | 6.103 | 28.2 |
| WeatherDiff [29] | TPAMI 2023 | 6.201 | 28.8 |
| OneRestore [17] | ECCV 2024 | 4.316 | 27.3 |
| AllRestorer (ours) | - | 4.241 | 26.5 |
Quantitative Evaluation.
In Tab. 1, we provide comparison results of AllRestorer with the baselines on CDD-11. The quantitative results indicate that AllRestorer, which removes all degradations in the latent space, demonstrates a clear advantage on CDD-11. Specifically, AllRestorer achieves 33.72 dB PSNR and 0.9436 SSIM on CDD-11, marking a 5.00 dB PSNR and 0.0615 SSIM improvement over the current state-of-the-art (SoTA) baseline. Furthermore, Fig. 6 illustrates that AllRestorer performs better on each restoration task in CDD-11.
Qualitative Evaluation.
We present the visual performances of AllRestorer and other methods for all-in-one image restoration and composite degradation restoration in Fig. 7. Visual comparisons demonstrate that AllRestorer effectively removes both individual and complex composite degradations. Residual degradations are observed in the results of both unified image restoration baselines and all-in-one restoration baselines. OneRestore occasionally misidentifies the degradation, causing incorrect restoration.
4.4 Real-World Test

In this section, we assess AllRestorer’s performance in real-world applications. Specifically, we evaluated its restoration capabilities on real-world composite degradation scenarios using 200 real-world cases collected online and tested its all-in-one restoration results under multiple official real-world datasets.
Composite Degradation Restoration.
To fairly reflect real-world composite degradation restoration performance, we uniformly compare the results of AllRestorer and the baselines trained on CDD-11 on the real200 examples. Tab. 2 shows that AllRestorer achieves satisfactory performance when restoring composite degraded scenes in the real world, decreasing NIQE by 0.075 compared to the baseline method OneRestore. In Fig. 8, we provide a visual comparison of the composite degradation restoration results of different methods on real200 examples. Although OneRestore enhances the illumination, rain and haze residues remain (see the red box in Fig. 8), and other approaches struggle with real composite degradation scenes. In contrast, AllRestorer achieves more satisfactory results.
All-in-One Image Restoration.
To evaluate the robustness of AllRestorer in real-world all-in-one restoration, we tested AllRestorer trained on CDD-11 on multiple real benchmarks. For fair comparisons, we also trained the baseline models on the corresponding degraded data in CDD-11. In Fig. 9, we compare AllRestorer with low-light enhancement baselines [57, 22, 52, 37], deraining baselines [35, 12, 46, 16], dehazing baselines [19, 18, 33, 40], and desnowing baselines [6, 7, 51, 27], respectively. Compared to baselines, AllRestorer exhibits better restoration in most impaired scenarios.
4.5 Efficiency Comparison
Table 3. Efficiency comparison: MACs and inference time for 256×256 inputs on an NVIDIA RTX 3090.

| Types | Method | MACs | Inference Time |
|---|---|---|---|
| Uniform | MPRNet [48] | 1393.831G | 75.314ms |
| | MIRNetv2 [50] | 140.918G | 41.391ms |
| | DGUNet [28] | 200.464G | 39.724ms |
| | Fourmer [56] | 34.114G | 31.590ms |
| | OKNet [11] | 39.549G | 9.382ms |
| All-in-One | TransWeather [41] | 6.131G | 12.717ms |
| | PromptIR [32] | 158.140G | 94.330ms |
| | WGWSNet [59] | 1.677G | 69.746ms |
| Composite | OneRestore [17] | 11.340G | 16.081ms |
| | [17] | 11.340G | 16.081ms |
| | AllRestorer (ours) | 11.549G | 25.019ms |
In this section, we provide an efficiency comparison of AllRestorer with other methods. In Tab. 3, we report the Multiply-Accumulate Operations (MACs) and inference time of different methods on an NVIDIA RTX 3090 for 256×256 images. Compared to most solutions, AllRestorer offers favorable computational efficiency.
4.6 Ablation Study

All-in-One Transformer Block.
We quantitatively evaluate the contribution of each component within AiOTB to AllRestorer’s performance on CDD-11. Tab. 4 (A) demonstrates that the baseline using only DCFFN is insufficient for effectively removing degradation even though it can reconstruct the image. SA improves image restoration but lacks definite guidance. While AiOA introduces all restoration solutions for AiOTB, it relies on SA to execute the corresponding restoration for the impaired images. Thus, each component in the AiOTB is indispensable (see Fig. 11).
Composite Scene Descriptor.
We compare the impact of different scene descriptors on CDD-11 to verify the effectiveness of our composite scene descriptor. Tab. 4 (B) and Fig. 10 illustrate this comparison. Although the baseline and visual scene descriptors can mitigate the effect of a single impairment on image restoration, they are less effective at distinguishing scenes with composite degradations. Text scene descriptors can define different degradations but struggle to adapt to variations of the same degradation (e.g., the snow particles of varying sizes in Fig. 10). We also report the effect of the number of selected image tokens $k$ on restoration in Tab. 5.
Table 5. Ablation of adaptive weights and the number of selected image tokens $k$ on CDD-11.

| Setting | PSNR | SSIM |
|---|---|---|
| w/o Adaptive weights | 30.66 | 0.9271 |
| w/ Adaptive weights | 33.72 | 0.9436 |
| Composite scene descriptor, $k$ = 5 | 33.04 | 0.9388 |
| Composite scene descriptor, $k$ = 10 | 33.72 | 0.9436 |
| Composite scene descriptor, $k$ = 25 | 33.59 | 0.9431 |
| Composite scene descriptor, $k$ = 50 | 33.16 | 0.9390 |
Adaptive Weights.
On CDD-11, we evaluate the contribution of adaptive weights to AllRestorer. As shown in Tab. 5, removing adaptive weights notably diminishes AllRestorer’s performance. The absence of a restoration intensity control results in pronounced artifacts in Fig. 11. This indicates that managing restoration intensity for each degradation helps achieve a higher-quality restoration.
5 Conclusion
In this paper, we propose AllRestorer, a model designed for accurate composite degradation handling and all-in-one image restoration. We develop a novel All-in-One Transformer Block (AiOTB) within AllRestorer, which integrates multiple restoration techniques for degraded images through All-in-One Attention (AiOA). This attention mechanism is guided by composite scene descriptors to effectively remove all types of degradation in the latent space. AiOA employs adaptive weights to control the restoration intensity for each type of degradation across different scenarios. Leveraging AiOTB’s adaptive approach to accurately identify and remove multiple forms of degradation, AllRestorer demonstrates SoTA performance on both synthetic and real datasets.
References
- Ai et al. [2024] Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, and Ran He. Multimodal prompt perceiver: Empower adaptiveness generalizability and fidelity for all-in-one image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25432–25444, 2024.
- Bar et al. [2022] Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, and Alexei Efros. Visual prompting via image inpainting. Advances in Neural Information Processing Systems, 35:25005–25017, 2022.
- Brown et al. [2020] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- Chen et al. [2024a] Sixiang Chen, Tian Ye, Yun Liu, and Erkang Chen. Dual-former: Hybrid self-attention transformer for efficient image restoration. Digital Signal Processing, 149:104485, 2024a.
- Chen et al. [2024b] Sixiang Chen, Tian Ye, Kai Zhang, Zhaohu Xing, Yunlong Lin, and Lei Zhu. Teaching tailored to talent: Adverse weather restoration via prompt pool and depth-anything constraint. In European conference on computer vision. Springer, 2024b.
- Chen et al. [2020] Wei-Ting Chen, Hao-Yu Fang, Jian-Jiun Ding, Chen-Che Tsai, and Sy-Yen Kuo. Jstasr: Joint size and transparency-aware snow removal algorithm based on modified partial convolution and veiling effect removal. In European Conference on Computer Vision, 2020.
- Chen et al. [2021] Wei-Ting Chen, Hao-Yu Fang, Cheng-Lin Hsieh, Cheng-Che Tsai, I Chen, Jian-Jiun Ding, Sy-Yen Kuo, et al. All snow removed: Single image desnowing algorithm using hierarchical dual-tree complex wavelet representation and contradict channel loss. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4196–4205, 2021.
- Chen et al. [2022] Wei-Ting Chen, Zhi-Kai Huang, Cheng-Che Tsai, Hao-Hsiang Yang, Jian-Jiun Ding, and Sy-Yen Kuo. Learning multiple adverse weather removal via two-stage knowledge learning and multi-contrastive regularization: Toward a unified model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17653–17662, 2022.
- Chen et al. [2024c] Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, and Chao Dong. Learning a low-level vision generalist via visual task prompt. In ACM Multimedia 2024, 2024c.
- Chu et al. [2022] Xiaojie Chu, Liangyu Chen, and Wenqing Yu. Nafssr: Stereo image super-resolution using nafnet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 1239–1248, 2022.
- Cui et al. [2024] Yuning Cui, Wenqi Ren, and Alois Knoll. Omni-kernel network for image restoration. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1426–1434, 2024.
- Deng et al. [2020] Sen Deng, Mingqiang Wei, Jun Wang, Yidan Feng, Luming Liang, Haoran Xie, Fu Lee Wang, and Meng Wang. Detail-recovery image deraining via context aggregation networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14560–14569, 2020.
- Fu et al. [2024] Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, and Yu Qiao. Drive like a human: Rethinking autonomous driving with large language models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 910–919, 2024.
- Fu et al. [2021] Minghan Fu, Huan Liu, Yankun Yu, Jun Chen, and Keyan Wang. Dw-gan: A discrete wavelet transform gan for nonhomogeneous dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 203–212, 2021.
- Ge et al. [2023] Chunjiang Ge, Rui Huang, Mixue Xie, Zihang Lai, Shiji Song, Shuang Li, and Gao Huang. Domain adaptation via prompt learning. IEEE Transactions on Neural Networks and Learning Systems, 2023.
- Guo et al. [2021] Qing Guo, Jingyang Sun, Felix Juefei-Xu, Lei Ma, Xiaofei Xie, Wei Feng, and Yang Liu. Efficientderain: Learning pixel-wise dilation filtering for high-efficiency single-image deraining. In AAAI, 2021.
- Guo et al. [2024] Yu Guo, Yuan Gao, Yuxu Lu, Ryan Wen Liu, and Shengfeng He. Onerestore: A universal restoration framework for composite degradation. In European Conference on Computer Vision, 2024.
- Hang et al. [2020] Dong Hang, Pan Jinshan, Hu Zhe, Lei Xiang, Zhang Xinyi, Wang Fei, and Yang Ming-Hsuan. Multi-scale boosted dehazing network with dense feature fusion. In CVPR, 2020.
- Hong et al. [2020] Ming Hong, Yuan Xie, Cuihua Li, and Yanyun Qu. Distilling image dehazing with heterogeneous task imitation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3462–3471, 2020.
- Huang et al. [2023] Huaibo Huang, Xiaoqiang Zhou, Jie Cao, Ran He, and Tieniu Tan. Vision transformer with super token sampling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22690–22699, 2023.
- Jia et al. [2022] Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European Conference on Computer Vision (ECCV), 2022.
- Jiang et al. [2021] Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang. Enlightengan: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30:2340–2349, 2021.
- Kirillov et al. [2023] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.
- Lester et al. [2021] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021.
- Li et al. [2022] Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-In-One Image Restoration for Unknown Corruption. In IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, 2022.
- Li et al. [2020] Ruoteng Li, Robby T Tan, and Loong-Fah Cheong. All in one bad weather removal using architectural search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3175–3185, 2020.
- Liu et al. [2018] Yun-Fu Liu, Da-Wei Jaw, Shih-Chia Huang, and Jenq-Neng Hwang. Desnownet: Context-aware deep network for snow removal. IEEE Transactions on Image Processing, 27(6):3064–3073, 2018.
- Mou et al. [2022] Chong Mou, Qian Wang, and Jian Zhang. Deep generalized unfolding networks for image restoration. In CVPR, 2022.
- Özdenizci and Legenstein [2023] Ozan Özdenizci and Robert Legenstein. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–12, 2023.
- Özdenizci and Legenstein [2023] Ozan Özdenizci and Robert Legenstein. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):10346–10357, 2023.
- Petroni et al. [2019] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. Language models as knowledge bases? arXiv preprint arXiv:1909.01066, 2019.
- Potlapalli et al. [2023] Vaishnav Potlapalli, Syed Waqas Zamir, Salman Khan, and Fahad Khan. Promptir: Prompting for all-in-one image restoration. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Qin et al. [2020] Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. Ffa-net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11908–11915, 2020.
- Radford et al. [2021] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Ren et al. [2019] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. Progressive image deraining networks: A better and simpler baseline. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
- Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer, 2015.
- Sharma and Tan [2021] Aashish Sharma and Robby T Tan. Nighttime visibility enhancement by increasing the dynamic range and suppression of light effects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11977–11986, 2021.
- Song et al. [2023a] Binbin Song, Xiangyu Chen, Shuning Xu, and Jiantao Zhou. Under-display camera image restoration with scattering effect. arXiv preprint arXiv:2308.04163, 2023a.
- Song et al. [2023b] Binbin Song, Xiangyu Chen, Shuning Xu, and Jiantao Zhou. Under-display camera image restoration with scattering effect. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12580–12589, 2023b.
- Song et al. [2023c] Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023c.
- Valanarasu et al. [2022] Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M Patel. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2353–2363, 2022.
- Wang et al. [2019] Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson WH Lau. Spatial attentive single-image deraining with a high quality real rain dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12270–12279, 2019.
- Wang et al. [2023] Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, and Tiejun Huang. Images speak in images: A generalist painter for in-context visual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6830–6839, 2023.
- Xu et al. [2024] Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robotics and Automation Letters, 2024.
- Yang et al. [2021] Wenhan Yang, Wenjing Wang, Haofeng Huang, Shiqi Wang, and Jiaying Liu. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Transactions on Image Processing, 30:2072–2086, 2021.
- Ye et al. [2021] Yuntong Ye, Yi Chang, Hanyu Zhou, and Luxin Yan. Closing the loop: Joint rain generation and removal via disentangled image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2053–2062, 2021.
- Zamir et al. [2020] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for real image restoration and enhancement. In ECCV, 2020.
- Zamir et al. [2021] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In CVPR, 2021.
- Zamir et al. [2022a] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In CVPR, 2022a.
- Zamir et al. [2022b] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for fast image restoration and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022b.
- Zhang et al. [2021] Kaihao Zhang, Rongqing Li, Yanjiang Yu, Wenhan Luo, and Changsheng Li. Deep dense multi-scale network for snow removal using semantic and geometric priors. IEEE Transactions on Image Processing, 2021.
- Zhang et al. [2019] Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM international conference on multimedia, pages 1632–1640, 2019.
- Zhou et al. [2022a] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022a.
- Zhou et al. [2022b] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International Journal of Computer Vision (IJCV), 2022b.
- Zhou et al. [2022c] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022c.
- Zhou et al. [2023] Man Zhou, Jie Huang, Chun-Le Guo, and Chongyi Li. Fourmer: An efficient global modeling paradigm for image restoration. In International conference on machine learning, pages 42589–42601. PMLR, 2023.
- Zhou et al. [2022d] Shangchen Zhou, Chongyi Li, and Chen Change Loy. Lednet: Joint low-light enhancement and deblurring in the dark. In ECCV, 2022d.
- Zhou et al. [2024] Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. Drivinggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21634–21643, 2024.
- Zhu et al. [2023] Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, and Xiaowei Hu. Learning weather-general and weather-specific features for image restoration under multiple adverse weather conditions. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.