
Utility-Oriented Underwater Image Quality Assessment Based on Transfer Learning

Weiling Chen, Rongfu Lin, Honggang Liao, Tiesong Zhao, Ke Gu, and Patrick Le Callet W. Chen, R. Lin and H. Liao are with the Fujian Key Lab for Intelligent Processing and Wireless Transmission of Media Information, Fuzhou University, Fuzhou 350108, China (E-mails: {weiling.chen, n191127021, 211127119}@fzu.edu.cn). T. Zhao is with the Fujian Key Lab for Intelligent Processing and Wireless Transmission of Media Information, Fuzhou University, Fuzhou 350108, China, and Peng Cheng Laboratory, Shenzhen 518000, China (E-mail: [email protected]). K. Gu is with the Faculty of Information Technology, Beijing University of Technology, Engineering Research Center of Intelligent Perception and Autonomous Control, Ministry of Education, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Artificial Intelligence Institute, Beijing 100124, China (E-mail: [email protected]). P. Le Callet is with Équipe Image, Perception et Interaction, Laboratoire des Sciences du Numérique de Nantes, Université de Nantes, France (E-mail: [email protected]).
Abstract

The widespread use of images has greatly promoted vision-based tasks, in which Image Quality Assessment (IQA) has become an increasingly important technique. For user enjoyment in multimedia systems, IQA exploits image fidelity and aesthetics to characterize user experience; however, for other tasks such as object recognition, there is a low correlation between utility and perception. In such cases, fidelity-based and aesthetics-based IQA methods cannot be directly applied. To address this issue, this paper proposes a utility-oriented IQA for object recognition. In particular, we initialize our research in the scenario of underwater fish detection, a critical task that has not yet been satisfactorily solved. Based on this task, we build an Underwater Image Utility Database (UIUD) and a learning-based Underwater Image Utility Measure (UIUM). Inspired by the top-down design of fidelity-based IQA, we exploit deep models of object recognition and transfer their features to our UIUM. Experiments validate that the proposed transfer-learning-based UIUM achieves promising performance in the recognition task. We envision that our research provides insights to bridge the research fields of IQA and computer vision.

Index Terms:
Image Quality Assessment (IQA), underwater images, utility-oriented IQA.

I Introduction

Images play an important role in daily life and work. During acquisition, transmission, storage, and display, noise is inevitable and can degrade image quality. Traditional Image Quality Assessment (IQA) has been developed to automatically assess the perceptual quality of images, including fidelity-oriented IQA and aesthetics-oriented IQA. However, with the massive growth in complex tasks, perceptual quality is not necessarily relevant to subsequent processing beyond user enjoyment. In such cases, traditional IQA cannot be applied and new techniques should be developed.

Figure 1: The differences among utility-oriented, aesthetics-oriented, and fidelity-oriented IQA.

As shown in Fig. 1, fidelity-oriented IQA focuses on the clarity of details and textures, which can be affected by the degree and type of distortion. It is currently the most widely used and studied form of IQA and reflects the viewability of an image. In a fidelity-oriented IQA system, image quality is positively correlated with clarity and converges as clarity improves. In aesthetics-oriented IQA, the image should convey a sense of harmony and beauty for user enjoyment. In such scenarios, image quality is positively related to composition, color harmony, etc., in addition to fidelity. To improve the visual comfort and aesthetic perception of users, the aesthetic analysis of images has also been widely studied. In Fig. 1, shades of colors and different shapes are used to indicate the influence of other factors on aesthetic quality.

Besides user enjoyment, many images are also applied in practical scenarios such as analysis, understanding, and processing. For example, underwater images are used for the exploration of marine resources. In such scenarios, utility is the deciding factor of quality, while fidelity and aesthetics are only contributing factors. The fidelity- and aesthetics-based qualities can be collectively called perceived quality. As described in [1], the link between utility and perceived quality is difficult to summarize with a single model. In [2], a further relationship is observed: improvements in perceived quality correspond to smaller upgrades in utility. Consequently, image utility saturates more easily than perceived quality; beyond the saturation point, further enhancement of perceived quality does not benefit utility. Furthermore, utility is also related to the requirements of the specific task; for example, a highlighted target is more desirable in object detection. These utility-related characteristics are usually neglected by existing fidelity- or aesthetics-oriented IQA. In view of the above, perception-based evaluation criteria are not directly applicable to utility-oriented IQA. To evaluate image utility, we propose a definition of utility-oriented IQA by summarizing the task descriptions of utility assessment in [1] as:

The quality evaluation of an image considering its utility to complete a vision-based task.

For a more specific analysis, we initialize our research in the scenario of underwater fish detection, a challenging task that has not yet been satisfactorily solved. On the basis of the Underwater Image quality database for Fish Detection (UIFD) [3], an Underwater Image Utility Database (UIUD) is built. Specifically, we add distortion types that affect image quality under specific tasks, including foreground/background distortions, and provide a tentative analysis of non-target images. Compared with several public databases, our database considers the task background in the original image selection, the image degradation processing, and the subjective experiment design. Then, we exploit deep models for object recognition and transfer their features to an Underwater Image Utility Measure (UIUM). For a more intuitive explanation, the UIUM is decomposed into a main-task and a sub-task. The sub-task is pre-trained on a large-scale object detection database to obtain a fish detection model, while the main-task predicts image utility by taking advantage of the key feature information obtained from the sub-task. Our major contributions are as follows:

(1) We raise and discuss utility-oriented IQA under the scenario of object detection. We envision that our research provides insights to bridge the research fields of IQA and computer vision.

(2) We develop a first-of-its-kind utility-oriented IQA database. It can be used as a benchmark to develop and evaluate objective methods of utility-oriented IQA.

(3) We propose a UIUM metric based on transfer learning, which is the first successful attempt at a utility-oriented IQA metric. Experimental results demonstrate the effectiveness of our model.

The remainder of this paper is organized as follows. Section II introduces the related work. Section III details the construction of the image quality database. Section IV elaborates the proposed method. In Section V, we report the experimental results. Finally, the paper is concluded in Section VI.

II Related work

II-A Image Quality Assessment

IQA can be divided into three categories according to its objective, as described above. Fidelity-oriented IQA evaluates whether an image clearly conveys all visual information. Related studies can be classified into Natural Scene Statistics (NSS)-based methods and learning-based methods. [4] utilized an NSS model of DCT coefficients to predict image quality scores. [5] used locally normalized luminance coefficients to quantify possible losses of naturalness in images due to distortions. A small codebook was employed in [6] for general-purpose blind IQA based on high-order statistics aggregation. [7] integrated NSS features derived from multiple cues to learn a multivariate Gaussian model of image patches. [8] combined feature learning and regression into one optimization process to estimate quality. A neural-network-based pooling was presented in [9] to assess global image quality from local patch qualities. To explore the relationship between fidelity and quality, [10, 11] proposed deep learning methods. Notwithstanding the prosperity of these methods, they perform well in the quality estimation of Natural Scene Images (NSIs) but fail in utility-oriented IQA.

Unlike fidelity-oriented IQA, aesthetics-oriented IQA prefers visual content with reasonable layout and visual comfort. [12] relied on artificially designed functions to improve the image aesthetics reasoning of Convolutional Neural Networks (CNNs). [13] proposed a two-stream CNN that considers heterogeneous and complementary aesthetic perceptual abilities. [14] proposed to quantify image aesthetics by distributing it across multiple quality levels. In [15], the authors proposed a multi-reference eye inpainting Generative Adversarial Network (GAN) approach based on an eye aesthetics dataset. [16] presented deep multi-modal learning for aesthetic quality assessment of unmanned aerial vehicle videos. [17] employed a gated information fusion network to weight the roles of foveal vision and peripheral vision, which are key issues in image aesthetic evaluation. [18] proposed semi-supervised deep active learning to explore the way humans perceive semantically important areas in images; moreover, it developed a probabilistic model to incorporate the aesthetic experience of multiple users by encoding the experience of several professional photographers. These methods focus on various visual factors and composition to enhance user experience.

Utility-oriented IQA evaluates the task-aware utility based on the richness of useful information: high-utility images help complete subsequent tasks. At present, there is little work on utility quality assessment. Rouse et al. proposed the concept of utility assessment for natural images in [1], where the objective evaluation was expected to be consistent with subjective judgments of usefulness. In that work, a subjective utility quality database for natural images, referred to as the CU-Nants database, was obtained. On the basis of the CU-Nants database, a series of full-reference utility quality assessment methods were proposed [2], [1], [19]. These methods are all built upon the hypothesis that contour degradations are consistent with decreased perceived utility. In [20], Scott et al. proposed a no-reference utility measurement for the CU-Nants database utilizing the Oxford Visual Geometry Group's (VGG) deep CNN. As an extension, they also found that well-performing utility measurements can predict saliency for object recognition [21], since the measured utility is related to the contour difference between test and reference images, and contour distortion impacts the observer's ability to recognize objects. [22] considered the influence of blur, dramatic pose variations, and occlusion on face quality. In [23], utility evaluation was guided by the vascular structure rather than the perceived quality of the whole retinal image. Besides, video utility evaluation algorithms have also been discussed in the context of tasks such as compression [24], [25].

Despite these great efforts, there is still a lack of deep analysis of utility-oriented IQA and of a large-scale database for underwater image utility evaluation. Most of the aforementioned contributions are designed for natural images or videos. Nevertheless, underwater images differ from natural images in their statistical properties, visual characteristics, and distortion types. As for underwater images, although sonar images [26] can provide acoustic information that assists underwater detection, underwater optical imaging remains critical for providing more intuitive visual information, and its quality assessment differs from the fidelity evaluation of sonar images [27], [28]. Therefore, fidelity-based sonar IQA is not suitable for the utility-oriented IQA discussed in this paper.

In response to these problems, we design a utility-based subjective experiment to construct a quality database. Then, an image quality evaluation method based on the fish detection task is proposed.

II-B Transfer Learning and Object Detection

Due to the limited training samples in existing databases, IQA based on deep learning faces the problem of overfitting. Transfer learning [29] offers an effective solution by utilizing data from a related source task to improve performance. However, there are few works on quality evaluation based on transfer learning. A transfer learning framework was described in [30], which learned an end-to-end image quality estimator via classification or regression. In [31], the features extracted from distorted images were transferred to the same feature space to address the problem of insufficient video content. [32] developed an IQA architecture of multi-domain transitive transfer learning, which is associated with the ImageNet source domain, the IQA target domain, and their corresponding tasks.

Figure 2: Construction of the UIUD database.

Object detection aims to find the objects of interest in images or videos and to determine their position and size. Different from image classification, object detection must solve not only the classification problem but also the localization problem. Since 2012 [33], the rise of CNNs has pushed object detection in natural scene images to a new level. With the development of ocean exploration and exploitation, detecting fish from underwater videos and images is of great significance for fishery resource assessment and ecological environment monitoring. In [34], the authors proposed a novel composite fish detection framework, called Composited FishNet, based on a composite backbone and an enhanced path aggregation network. [35] presented a novel dataset with 400 images of fish in the wild; using this dataset, state-of-the-art detection models were trained with fine-tuning strategies. [36] proposed a deep but lightweight neural network to detect fish. However, due to the harsh underwater conditions, images or videos captured underwater often have poor utility. Most fish targets are small and easily confused, and fully automatic machine recognition is difficult to achieve with current technology.

Inspired by these, we propose a UIUM metric based on transfer learning, which uses a trained object detection model to share the prior knowledge of key feature information for detection.

III Database Construction

The ocean is unknown and changeable for human beings. Diverse fish species are important parts of the marine biosphere. Hence, monitoring the amount and species of fish is essential for regional ecological balance. However, it remains a difficult task due to the unpredictability of species and the limited learning ability of machines, and manual recognition is still the most reliable approach in this scenario. In this paper, we establish a subjective quality database as a benchmark to develop and evaluate objective IQA methods.

A large number of general-purpose IQA databases have been built based on the standard ITU-R BT.500 [37], such as LIVE [38], TID2013 [39], and KonIQ-10k [40]. These databases focus on visual factors such as image details, texture, and color. For aesthetic image quality, there are databases such as AVA [41] and Waterloo IAA [42], which favor a reasonable layout of visual factors such as color, light, and background. Rouse et al. developed a subjective utility database, consisting of reference and distorted versions of natural images along with corresponding subjective utility and quality ratings [1]. The utility scores of this database were obtained by pair-wise comparison of useful information. However, there is currently no utility-oriented underwater image quality database. In the context of underwater fish detection, we develop a first-of-its-kind database for utility-oriented IQA, with its development process summarized in Fig. 2. First, all source images are prepared from underwater scenes. Then, diversified types of distortions are introduced to cover typical impairments of underwater images. After that, a comprehensive subjective test is conducted based on the single-stimulus method of ITU-R BT.500 [37]. Finally, all user scores are analyzed and summarized to obtain the MOS values.

III-A Material Preparation

(a) Reference  (b) Type1  (c) Type2  (d) Type3  (e) Type4  (f) Type5  (g) Type6  (h) Non-target
Figure 3: Examples of the images in the UIUD database.

We previously built a fish image quality database (the UIFD database) containing 2675 fish images. That database employed the Fish4Knowledge video repository, which was collected to monitor coral reefs, to simulate the underwater environment in combination with an underwater fish detection task. In this work, 145 images with clear fish characteristics were selected from the Fish4Knowledge video repository for the UIFD database, and each image includes 1 to 6 fish. To simulate the complex underwater environment, five typical underwater distortions are devised, including channel distortion, contrast distortion, illumination distortion, motion blur, and ocean-snow distortion. Besides, foreground/background distortion is also introduced to cover the impact of regional distortions on object detection. There are 4 to 5 distortion levels for most of the distortion types. In addition, we add 90 non-target images for a tentative analysis: a high-fidelity image without targets is considered a low-utility image. By introducing the foreground/background distortions as well as non-target images, the proposed database covers different utilities of underwater images.
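The exact degradation settings used to build UIUD are not reproduced here. As a minimal sketch of how two of the listed distortion types could be synthesized, the snippet below applies a rotated line kernel for motion blur and a gamma curve for contrast distortion with OpenCV and NumPy; the kernel size, angle, gamma value, and file name are illustrative assumptions rather than the parameters actually used for the database.

import cv2
import numpy as np

def motion_blur(img, kernel_size=15, angle_deg=30.0):
    """Convolve with a line-shaped kernel to mimic motion blur (size/angle are illustrative)."""
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0                                   # horizontal line kernel
    rot = cv2.getRotationMatrix2D((kernel_size / 2, kernel_size / 2), angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (kernel_size, kernel_size))    # rotate to the blur direction
    kernel /= kernel.sum()
    return cv2.filter2D(img, -1, kernel)

def contrast_distortion(img, gamma=2.5):
    """Apply a simple gamma curve to alter the global tone/contrast (gamma is illustrative)."""
    norm = img.astype(np.float32) / 255.0
    return np.clip((norm ** gamma) * 255.0, 0, 255).astype(np.uint8)

# Example usage on one reference frame (file name is hypothetical).
reference = cv2.imread("reference_fish.png")
distorted_blur = motion_blur(reference)
distorted_contrast = contrast_distortion(reference)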

In the proposed UIUD database, a total of 3,340 images are generated, including the original images and their corresponding distorted versions. We present some examples of the UIUD database in Fig. 3. For convenience, the distortion types are labeled "Type1" to "Type6". The detailed information corresponding to each type is given in Table I.

TABLE I: Composition of the UIUD database.
Distortion Types                 | Number | Numbering
Channel Distortion               | 495    | Type1
Contrast Distortion              | 580    | Type2
Illumination Distortion          | 580    | Type3
Motion Blur                      | 580    | Type4
Foreground/Background Distortion | 580    | Type5
Ocean-Snow Distortion            | 290    | Type6
Reference Image                  | 145    | --
Non-Target                       | 90     | --
Total Image                      | 3340   |

III-B Subjective Quality Evaluation

We invited 21 subjects to conduct subjective experiments on all images except the non-target images. This number of subjects is sufficient for the experimental results to reach the saturation point. All subjects were pretrained with sufficient knowledge of underwater fish. The collected information includes the number of fish, the number of fish species, the time for the subject to make a judgment, and the quality of the image. The first three kinds of information are reserved for further research; in the UIUD database, only the subjective quality scores are used.

In particular, the scoring of image quality is utility-based in this work. The traditional Absolute Category Rating (ACR) defined in ITU-R BT.500 uses five adjectives as categorical scales to assess image quality, known as the adjectival categorical judgment method, as shown in the right column of Table II. The UIUD database focuses on the utility of an image, i.e., whether or not the fish can be detected, so we develop the rating scale given in the left column of Table II. In Table II, the first adjective reflects the completeness of the fish information, while the second adjective establishes the identification threshold of the target fish. Apart from that, the experimental environment and monitor are calibrated according to the recommendations of ITU-R BT.500 [37].

Finally, to improve the reliability of the results, we randomly sampled 5 images to form a verification set. If a subject gives significantly different scores to the verification set and to the corresponding images in the main subjective test, his or her scores will be discarded. In detail, if the difference between the two scores of the same image is more than 2, the image is regarded as a fluctuation image. In this subjective test, the average number of fluctuation images per subject is less than 1; thus no scores were discarded, which means that all subjects are reliable.
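As an illustration of this screening rule (not the released processing code), the sketch below counts a subject's fluctuation images, i.e., verification images whose two ratings differ by more than 2; the rating values are made up.

import numpy as np

def count_fluctuations(first_pass, second_pass, threshold=2):
    """Count verification images whose two ratings by the same subject differ by more than the threshold."""
    diff = np.abs(np.asarray(first_pass) - np.asarray(second_pass))
    return int(np.sum(diff > threshold))

# Hypothetical ratings of one subject on the 5 verification images (first vs. repeated presentation).
first_pass = [5, 4, 2, 3, 5]
second_pass = [5, 3, 2, 4, 5]
print(count_fluctuations(first_pass, second_pass))   # 0 -> this subject would be kept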

TABLE II: The rating criteria of UIUD and traditional subjective experiments.
Rating | UIUD                           | Traditional Databases
5      | Complete and obvious           | Excellent
4      | Incomplete but uninfluential   | Good
3      | Incomplete but identifiable    | Fair
2      | Incomplete but distinguishable | Poor
1      | All lost or undistinguishable  | Bad
Figure 4: The boxplot of subjective scores. The horizontal axis corresponds to the image number, and the vertical axis corresponds to the subjective scores. The two ends of each blue rectangular box represent the 25th and 75th percentiles of the subjective scores. The upper and lower blue horizontal lines represent the maximum and minimum values, respectively.
Figure 5: The MOS histograms of UIUD and other public databases. (a) The histogram of LIVE. (b) The histogram of TID2008. (c) The histogram of TID2013. (d) The histogram of UIUD.
Figure 6: The framework of the UIUM model. The images shown in (a) are examples from FODD; the numbers 1 and (79, 81, 251, 323) represent the class and 2D coordinates of a fish, respectively. The images shown in (b) are examples from UIUD; the number represents the utility quality of the image. (c) is the feature extractor. (d) describes the sub-task and (e) indicates the main-task; they are the two output structures used for object detection and quality regression, respectively.

III-C Data Processing and Analysis

Following the practices in [43], we further verify the reliability of the user scores obtained in Section III-B. An Outlier Coefficient (OC) is introduced to quantify the subjective agreement of the UIUD database:

OC=\frac{N_{\text{outlier}}}{N_{\text{total}}}, \qquad (1)

where N_{\text{total}} denotes the total number of labeled images, and N_{\text{outlier}} denotes the number of images for which the interval between the 25th and 75th percentiles of the subjective ratings is larger than 1. To visualize the results of OC, Fig. 4 shows the boxplot of the subjective scores of 30 images; a score is considered an outlier when its blue rectangle is larger than 1. According to this analysis, our database achieves an OC of 5% and is thus considered to have high subjective agreement. Then, we arrange the rating values given by each viewer into a vector and compute the Normalized Cross Correlation (NCC) and the Euclidean Distance (EUD) between every two vectors. The final values of NCC and EUD are 0.91 and 0.08, respectively. These results demonstrate the high correlation between the subjective rating vectors.
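A minimal NumPy sketch of these agreement checks is given below, assuming the raw scores are stored as an (images × subjects) array; how the paper normalizes the NCC and EUD between rating vectors is not fully specified, so the formulas here (cosine-style NCC, per-image-averaged Euclidean distance) are reasonable assumptions rather than the exact computation.

import numpy as np

def outlier_coefficient(ratings):
    """OC of Eq. (1): fraction of images whose 25th-75th percentile interval of ratings exceeds 1.
    `ratings` is assumed to be an (n_images, n_subjects) array of raw 1-5 scores."""
    q25 = np.percentile(ratings, 25, axis=1)
    q75 = np.percentile(ratings, 75, axis=1)
    return float(np.mean((q75 - q25) > 1))

def ncc(u, v):
    """Normalized cross correlation between two subjects' rating vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def eud(u, v):
    """Euclidean distance between two rating vectors, averaged per image (normalization is an assumption)."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)) / len(u))

# Hypothetical ratings: 30 images rated by 21 subjects on the 1-5 scale.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(30, 21)).astype(float)
print(outlier_coefficient(ratings), ncc(ratings[:, 0], ratings[:, 1]), eud(ratings[:, 0], ratings[:, 1]))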

After data processing, Mean Opinion Score (MOS) values were calculated and used as the image labels. The MOS histograms of the UIUD database and other IQA databases are shown in Fig. 5. The proposed database has its own characteristics in data distribution. Without task constraints, the data often follow a Gaussian distribution, as Figs. 5(a)-(c) show. However, in subjective experiments with task backgrounds, there are discontinuities that follow from the task requirements: the scores increase with more fish information but saturate once image clarity reaches a certain level, leaving fewer mid-quality images. This is consistent with Fig. 5(d), where there are few images with mid-range quality between 30 and 45 and also few images with quality above 65. The different distributions of scores lead to different characteristics of UIUM compared with conventional IQA approaches.

IV Transfer learning for UIUM

IV-A Motivation of Framework Design

Underwater images for fish detection have particular features that differ from those extracted in fidelity- or aesthetics-oriented IQA. Traditional measurements such as pixel-mapping-based metrics and textural features are not targeted at utility. Moreover, there is still a lack of training data in real-world applications. To address these problems, the transfer learning technique, which requires less training data, is utilized in this paper to transfer utility-based features from a fish detection network to an IQA model. In addition, the introduction of transfer learning is also motivated by the top-down design of perception-based IQA, which learns various perceptual characteristics of the human visual system. As shown in Fig. 6, we design a dual-stream output structure that is decomposed into a main-task and a sub-task. The sub-task is trained on a fish detection database to obtain a fish detection model, while the main-task predicts the utility-oriented quality of images.

IV-B Network Architecture

The proposed network includes a shared layer and a dual-stream output layer. The shared layer utilizes the YOLOv4 backbone and neck to transfer a fish detection network to utility-oriented IQA; it is used as a feature extractor for both the main-task and the sub-task. The dual-stream output layer completes object detection and quality prediction, which guides the realization of transfer learning.

In the shared layer, the backbone and neck are the critical components for extracting basic features for object detection. We employ CSPDarknet53 as the backbone for its high accuracy and low complexity. CSPDarknet53 combines the YOLOv3 backbone network with the Cross Stage Partial Network (CSPNet) [44], which mainly reduces computational complexity from the perspective of network structure design. The neck further processes the important features extracted by the backbone. In order to better extract the features relevant to the fish detection task, the Spatial Pyramid Pooling (SPP) [45] module, Feature Pyramid Network (FPN) [46], and Path Aggregation Network (PAN) [47] of YOLOv4 are adopted. The SPP module applies pooling with different kernels and then concatenates the resulting feature maps of different scales, thus effectively increasing the receptive range of the backbone features and separating the most important context features. The FPN layers convey strong semantic features from top to bottom, while PAN transports strong positioning features from bottom to top; together they aggregate parameters from different backbone layers into different detection layers.
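To make the role of the SPP module concrete, the PyTorch sketch below shows the pooling-and-concatenation pattern it describes; the kernel sizes (5, 9, 13) follow the common YOLOv4 configuration and the feature-map shape is illustrative, not necessarily the exact settings of our shared layer.

import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    """Spatial Pyramid Pooling as commonly configured in YOLOv4-style necks:
    parallel max-pooling with different kernel sizes, then channel-wise concatenation."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # Keep the original map plus its pooled versions; spatial size is preserved by the padding.
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

# A 512-channel backbone feature map becomes 512 * 4 = 2048 channels.
feat = torch.randn(1, 512, 19, 19)
print(SPPBlock()(feat).shape)   # torch.Size([1, 2048, 19, 19])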

After the shared layer, three feature maps with different scales and receptive fields are obtained. Fig. 6 (d) is used for the sub-task, whose loss function is composed of a classification loss and a bounding box regression loss; its output verifies the performance of object detection. Fig. 6 (e) is designed for the main-task, i.e., quality prediction. A fully connected layer is exploited to achieve quality prediction; experimental results show that the optimal number of fully connected layers is one. We flatten and concatenate the feature maps into a one-dimensional vector, which is then fed into this fully connected layer. The Mean Square Error (MSE), widely used in regression tasks, is employed as the loss function of the main-task.
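A minimal PyTorch sketch of this main-task head is shown below: the three feature maps are flattened, concatenated, and fed to a single fully connected layer trained with MSE. The feature-map shapes and MOS targets are illustrative assumptions, not values taken from the actual model.

import torch
import torch.nn as nn

class QualityHead(nn.Module):
    """Main-task head of Fig. 6 (e): flatten the three multi-scale feature maps
    and regress a single utility score with one fully connected layer."""
    def __init__(self, feature_dim):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 1)   # one FC layer was found to be sufficient

    def forward(self, feature_maps):
        flat = torch.cat([f.flatten(start_dim=1) for f in feature_maps], dim=1)
        return self.fc(flat).squeeze(-1)

# Illustrative YOLOv4-style output shapes for a 416x416 input (channels/sizes are assumptions).
maps = [torch.randn(2, 128, 52, 52), torch.randn(2, 256, 26, 26), torch.randn(2, 512, 13, 13)]
head = QualityHead(sum(f[0].numel() for f in maps))
loss = nn.MSELoss()(head(maps), torch.tensor([72.3, 41.5]))   # MOS targets are made up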

IV-C Transfer Learning from Object Detection to IQA

After building the network architecture, we exploit the shared layer and the dual-stream output layer to transfer an object detection model to an IQA model. Our transfer learning process is performed in two steps: pre-training and fine-tuning.

Throughout the training process, we utilize three databases, namely the UIUD database, the Fish Object Detection Database (FODD), and Microsoft Common Objects in Context (COCO) [48]. The UIUD database is the image quality database established in this paper, whose labels are subjective utility quality scores. FODD and COCO are both object detection databases, whose labels are categories and coordinates. COCO is a large-scale object detection database that mainly collects images from complex daily scenes, while FODD is a database that we compiled to train the fish detection network. We denote the training sets of UIUD, FODD, and COCO by D_{\text{t}}, D_{\text{f}}, and D_{\text{c}}, respectively.

Figure 7: The test results of Fish-YOLOv4. (a)(b) Images with different backgrounds. (c)(d) High-definition images.

IV-C1 Pre-training

At this step, two training processes are carried out. We skip the network of Fig. 6 (e) and make the feature maps flow into Fig. 6 (d). The initial parameters of the shared layer are collectively denoted by \theta, which we first feed to the shared layer. The implementation process is as follows:

\textbf{First Training: }\quad\theta_{c}=\mathop{\arg\min}_{\theta}(D_{\text{c}};L_{c}(\theta),L_{bbr}(\theta)), \qquad (2)

where L_{c} and L_{bbr} are the classification loss and the bounding box regression loss, respectively, and \theta_{c} is the parameter set obtained by pre-training on COCO. Then, the network is trained on FODD to obtain a fish detection model. To avoid overfitting, we divide the training and testing sets by video content, so that each species has an appropriate ratio between the training and testing sets. The second training is implemented as:

\textbf{Second Training: }\quad\hat{\theta}_{c}=\mathop{\arg\min}_{\theta}(D_{\text{f}};L_{c}(\theta_{c}),L_{bbr}(\theta_{c})), \qquad (3)

where \hat{\theta}_{c} are the optimal parameters in Fig. 6 (c).

After the above training steps, we obtain a fish detection model, called Fish-YOLOv4 in this paper. The mAP (mean Average Precision) of Fish-YOLOv4 reaches 0.75, which is a relatively good performance. To further verify that the network has really learned fish detection, we check the reliability of Fish-YOLOv4 from two aspects. First, images with a different seabed background are selected, as shown in Fig. 7 (a)(b). Second, we choose high-definition images that differ from the first two for testing, as in Fig. 7 (c)(d). As can be seen, Fish-YOLOv4 can still successfully detect fish even when the style of the test images is quite different. Consequently, it can be employed as the feature extractor for the next step.
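A hedged, high-level sketch of this two-stage pre-training schedule (Eqs. (2)-(3)) is given below. build_yolov4(), load_detection_dataset(), train_detector(), and reset_head() are hypothetical helpers standing in for an actual YOLOv4 implementation and its training loop; the epoch counts are illustrative.

# Hypothetical helpers: build_yolov4(), load_detection_dataset(), and train_detector()
# stand in for an actual YOLOv4 implementation and its training loop; they are not library calls.
def pretrain_fish_yolov4(coco_dir, fodd_dir, coco_epochs=50, fodd_epochs=100):
    model = build_yolov4(num_classes=80)             # shared layer (c) + detection head (d) of Fig. 6

    # First Training (Eq. 2): classification + bounding-box regression losses on COCO (D_c).
    train_detector(model, load_detection_dataset(coco_dir), epochs=coco_epochs)

    # Second Training (Eq. 3): continue from the COCO weights on the fish detection database (D_f).
    model.reset_head(num_classes=1)                  # single "fish" class; hypothetical method
    train_detector(model, load_detection_dataset(fodd_dir), epochs=fodd_epochs)

    return model                                     # Fish-YOLOv4: the feature extractor for the next step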

IV-C2 Fine-tune

At the fine-tuning step, we disable the network of part (d) in Fig. 6 and allow the feature maps to flow into part (e) in Fig. 6. The parameters of the shared layer are frozen and not optimized, while the parameters of the output layer for quality prediction are fine-tuned. Specifically, the shared layer takes the role of feature extractor to obtain utility-meaningful and utility-relevant feature maps:

\textbf{Feature Extraction: }\quad F_{\text{map}}=\text{Fish-YOLOv4}(I;\hat{\theta}_{c}), \qquad (4)

where I refers to an image in the UIUD database, \hat{\theta}_{c} is the parameter set obtained from pre-training, and F_{\text{map}} represents the three feature maps obtained through the shared layer. In order to further verify the effectiveness of F_{\text{map}}, we visualize it in Fig. 8. The feature maps have higher brightness in the target areas, which means that higher weights are assigned to object regions. This implies that the features extracted by the shared layer are utility-related. We use the feature maps as the inputs of the fully connected layer to obtain a quality score, as follows:

Figure 8: Some examples of feature map visualization.
\textbf{Quality Regression: }\quad y=\text{FC}_{\text{layer}}(F_{\text{map}};\theta_{e}), \qquad (5)

where \text{FC}_{\text{layer}} represents the fully connected layer of part (e) in Fig. 6, and y is the predicted quality obtained by feeding F_{\text{map}} through the layer with the randomly initialized parameters \theta_{e}. We then fine-tune \theta_{e}:

\textbf{Parameter Training: }\quad\hat{\theta}_{e}=\mathop{\arg\min}_{\theta}\frac{1}{N}\sum_{i=0}^{N}(\hat{y}_{i}-y_{i})^{2}, \qquad (6)

where \hat{y}_{i} and y_{i} represent the ground truth and the predicted score of the i-th image, respectively. After the parameter training, the optimal solution \hat{\theta}_{e} of \text{FC}_{\text{layer}} is obtained. UIUM is then defined as:

\textbf{Quality Prediction: }\quad\text{Q}_{\text{utility}}=\text{UIUM}(I;\hat{\theta}_{c},\hat{\theta}_{e}), \qquad (7)

where \text{Q}_{\text{utility}} is the quality of the input image I, and \hat{\theta}_{c} and \hat{\theta}_{e} are the optimal solutions of Eqs. (3) and (6), respectively.
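A minimal PyTorch sketch of this fine-tuning step (Eqs. (4)-(7)) is given below: the shared layer of Fish-YOLOv4 is frozen and used only as a feature extractor, and the fully connected head (\theta_{e}) is optimized with MSE against the MOS labels. The fish_yolov4.shared_layer interface, the QualityHead from the earlier sketch, and the optimizer and learning-rate choices are assumptions, not the released implementation.

import torch
import torch.nn as nn

def finetune_quality_head(fish_yolov4, quality_head, uiud_loader, epochs=30, lr=1e-4):
    """Freeze the shared layer (theta_c) and fine-tune only the FC head (theta_e) with MSE loss."""
    # Freeze theta_c: the shared backbone + neck acts purely as a feature extractor (Eq. 4).
    for p in fish_yolov4.shared_layer.parameters():
        p.requires_grad = False
    fish_yolov4.shared_layer.eval()

    optimizer = torch.optim.Adam(quality_head.parameters(), lr=lr)
    mse = nn.MSELoss()

    for _ in range(epochs):
        for images, mos in uiud_loader:                              # UIUD images and their MOS labels
            with torch.no_grad():
                feature_maps = fish_yolov4.shared_layer(images)      # F_map in Eq. (4)
            pred = quality_head(feature_maps)                        # y in Eq. (5)
            loss = mse(pred, mos)                                    # Eq. (6)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return quality_head                                              # theta_e fixed; Eq. (7) is then frozen inference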

V Experimental results and analysis

In this section, we evaluate the performance of the proposed UIUM model and compare it with other state-of-the-art IQA metrics.

V-A Experimental protocols

Methods for comparison. To compare performance comprehensively, we choose several state-of-the-art methods for comparison, including 11 fidelity-oriented methods (BLIINDS-II [4], BRISQUE [5], HOSA [6], ILNIQE [7], WaDIQaM-NR [9], CNN-IQA [8], PSNR, SSIM [49], FSIM [50], UCIQE [51], UIQM [52]). Among them, WaDIQaM-NR and CNN-IQA are deep-learning-based NR-IQA methods; PSNR, SSIM, and FSIM are FR-IQA algorithms; and the last two are underwater IQA algorithms. In addition, an underwater utility-oriented IQA method (NRCDM [28]) and an aesthetics-oriented IQA method (NIMA [53]) are also added for comparison.

Performance criteria. Three commonly used criteria are chosen to calculate the correlation between the subjective and objective quality scores, which indicates the performance of IQA methods. These criteria include Spearman Rank-Order Correlation Coefficient (SRCC), Pearson Linear Correlation Coefficient (PLCC), and Kendall Rank Correlation Coefficient (KRCC).

The three metrics mentioned above can effectively evaluate the monotonicity, accuracy, and consistency between the quality predicted by an algorithm and the MOS. However, these performance metrics suffer from some drawbacks. On the one hand, they do not take into account the uncertainty of subjective ratings; on the other hand, they require a mapping between predicted values and subjective scores. These defects make the performance metrics vulnerable to the quality range of the stimuli in the experiments and mean that the comparison is not performed as it would be in real scenarios. To overcome these shortcomings, we also utilize the method proposed in [54], [55], which is less dependent on the range effect and is inspired by real applications without any mapping. First of all, the subjective scores need to be pre-processed as follows:

z(i,j)=\frac{|MOS(i)-MOS(j)|}{\sqrt{\frac{var(i)}{N(i)}+\frac{var(j)}{N(j)}}}, \qquad (8)

where var(i) and N(i) denote the variance of the subjective scores and the number of subjects of image i, respectively, and MOS(i) is the MOS value of image i. Then, the cumulative distribution function (cdf) of the normal distribution is employed to calculate the disparity between the qualities of images i and j:

p_{z(i,j)}=cdf(z)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}\exp\left(-\frac{t^{2}}{2}\right)dt, \qquad (9)

where the paired images are considered subjectively significantly different when p_{z(i,j)}>0.95. For significantly different pairs, the quality difference predicted by IQA model m is defined as:

\Delta_{m}(i,j)=Q_{m}(i)-Q_{m}(j), \qquad (10)

where Q_{m}(i) represents the objective score predicted by IQA model m for image i. When \Delta_{m}(i,j)>0.95, the quality of image i is objectively identified as significantly better than that of image j.

After the above processing, the percentage of correct recognition of the qualitatively better image of each pair, denoted by C_{0} (the higher the better), is employed as an additional performance metric.
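For concreteness, the sketch below implements the pairwise analysis of Eqs. (8)-(10) and the resulting C_0 statistic with NumPy and SciPy, assuming per-image MOS values, rating variances, subject counts, and model predictions are available as arrays; the two 0.95 thresholds follow the text literally, and the predicted scores are assumed to lie on a scale where the second threshold is meaningful.

import numpy as np
from scipy.stats import norm

def c0_metric(mos, var, n_subj, pred, z_thresh=0.95, delta_thresh=0.95):
    """Fraction of significantly different pairs whose better image is correctly identified (Eqs. 8-10)."""
    n, correct, significant = len(mos), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            z = abs(mos[i] - mos[j]) / np.sqrt(var[i] / n_subj[i] + var[j] / n_subj[j])   # Eq. (8)
            if norm.cdf(z) <= z_thresh:        # Eq. (9): keep only subjectively significant pairs
                continue
            significant += 1
            better, worse = (i, j) if mos[i] > mos[j] else (j, i)
            if pred[better] - pred[worse] > delta_thresh:             # Eq. (10), threshold as stated in the text
                correct += 1
    return correct / max(significant, 1)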

V-B Performance Evaluation

TABLE III: Average performance comparison of different IQA methods on the UIUD database.
Methods      | PLCC   | SRCC   | KRCC   | Number | C_0
BLINDS-II    | 0.1100 | 0.0520 | 0.0339 | 1      | 0.4919
BRISQUE      | 0.1765 | 0.1099 | 0.0754 | 2      | 0.4652
HOSA         | 0.3096 | 0.2798 | 0.1919 | 3      | 0.3662
ILNIQE       | 0.2301 | 0.2398 | 0.1629 | 4      | 0.4046
CNN-IQA      | 0.5520 | 0.5943 | 0.4181 | 5      | 0.7401
WaDIQaM-NR   | 0.8129 | 0.8024 | 0.6417 | 6      | 0.8712
NRCDM        | 0.2562 | 0.3992 | 0.2598 | 7      | 0.4648
NIMA         | 0.1088 | 0.1212 | 0.0801 | 8      | 0.5302
PSNR         | 0.2179 | 0.0394 | 0.0420 | 9      | 0.5477
SSIM         | 0.3740 | 0.2912 | 0.2074 | 10     | 0.6376
FSIMc        | 0.3601 | 0.3505 | 0.2503 | 11     | 0.6563
UCIQE        | 0.2886 | 0.1876 | 0.1242 | 12     | 0.4564
UIQM         | 0.0124 | 0.0155 | 0.0125 | 13     | 0.4981
OURS         | 0.8473 | 0.8377 | 0.6544 | 14     | 0.8794
Figure 9: The classification ability of UIUM and other IQA methods on image quality pairs (Better/Worse) in the UIUD.
TABLE IV: Performance on individual distortion types (each cell: PLCC / SRCC).
Methods     | Type1           | Type2           | Type3           | Type4           | Type5           | Type6
BLINDS-II   | 0.1994 / 0.2325 | 0.2918 / 0.2556 | 0.1377 / 0.0087 | 0.0083 / 0.0478 | 0.2837 / 0.2559 | 0.1608 / 0.1343
BRISQUE     | 0.5268 / 0.4990 | 0.4229 / 0.4042 | 0.0486 / 0.0030 | 0.3873 / 0.2891 | 0.2671 / 0.2386 | 0.2551 / 0.2132
HOSA        | 0.6893 / 0.7089 | 0.3701 / 0.2982 | 0.1160 / 0.0898 | 0.1527 / 0.0621 | 0.3125 / 0.3173 | 0.2838 / 0.2470
ILNIQE      | 0.4740 / 0.6137 | 0.5337 / 0.5260 | 0.2582 / 0.2634 | 0.3834 / 0.3552 | 0.1540 / 0.0468 | 0.2865 / 0.1870
CNN-IQA     | 0.8439 / 0.8372 | 0.7174 / 0.7333 | 0.6499 / 0.6796 | 0.6248 / 0.6168 | 0.4845 / 0.5084 | 0.6339 / 0.6349
WaDIQaM-NR  | 0.9179 / 0.9029 | 0.8220 / 0.8433 | 0.8270 / 0.8477 | 0.9251 / 0.9209 | 0.8724 / 0.8786 | 0.8561 / 0.8529
NSIQM       | 0.2733 / 0.0344 | 0.2322 / 0.1367 | 0.1836 / 0.1457 | 0.2070 / 0.2006 | 0.0719 / 0.0905 | 0.0667 / 0.0289
NIMA        | 0.0791 / 0.0917 | 0.3122 / 0.2997 | 0.0357 / 0.0047 | 0.1994 / 0.2047 | 0.2798 / 0.2850 | 0.0503 / 0.0508
PSNR        | 0.7624 / 0.7546 | 0.3916 / 0.3900 | 0.3130 / 0.2817 | 0.3109 / 0.2262 | 0.3512 / 0.3199 | 0.2447 / 0.1653
SSIM        | 0.7901 / 0.7827 | 0.5293 / 0.4751 | 0.3540 / 0.3352 | 0.4420 / 0.3792 | 0.3550 / 0.3185 | 0.2069 / 0.1726
FSIMc       | 0.8200 / 0.8156 | 0.5641 / 0.5152 | 0.3625 / 0.3574 | 0.5221 / 0.5005 | 0.3617 / 0.3228 | 0.2132 / 0.1813
UCIQE       | 0.1478 / 0.0469 | 0.4965 / 0.1933 | 0.1568 / 0.0059 | 0.1171 / 0.0262 | 0.1851 / 0.1237 | 0.1426 / 0.0114
UIQM        | 0.1388 / 0.1460 | 0.5006 / 0.4451 | 0.3223 / 0.3063 | 0.2227 / 0.1235 | 0.0828 / 0.0611 | 0.1716 / 0.0498
OURS        | 0.9213 / 0.8950 | 0.8784 / 0.8812 | 0.8801 / 0.8745 | 0.9185 / 0.9156 | 0.8968 / 0.8916 | 0.8669 / 0.8607
TABLE V: Time consumption (milliseconds/image) of the UIUM method and the other methods on the UIUD.
Method    | BLINDS-II | BRISQUE    | HOSA
Cost (ms) | 6.50×10^3 | 1.39×10^2  | 3.02×10^2
Method    | ILNIQE    | UCIQE      | UIQM
Cost (ms) | 2.18×10^3 | 2.97×10    | 7.05×10
Method    | PSNR      | SSIM       | FSIMc
Cost (ms) | 1.15×10^2 | 6.83       | 9.88
Method    | CNN-IQA   | WaDIQaM-NR | OURS
Cost (ms) | 3.14×10   | 7.93×10    | 3.26×10

Performance on the UIUD database. The UIUD database is divided into training and testing sets with an 80/20 split and no content overlap, and 10-fold cross-validation is utilized to evaluate performance. Since the UIUD database is currently the first utility-oriented image quality database, we cannot perform cross-database verification. To ensure fair comparisons, the compared deep-learning-based algorithms are retrained and fine-tuned on the UIUD database. Table III shows the average results of all methods, where the best and second-best performances are highlighted in bold and underline, respectively. Our method outperforms all of the state-of-the-art methods in terms of PLCC, SRCC, and KRCC. As shown in Table III, only the proposed method and WaDIQaM-NR achieve SRCC and PLCC values above 0.8, while the performance of the other algorithms is not satisfactory for the object detection task. The deep network structure, feature fusion, and spatial pooling of WaDIQaM-NR make it competitive; in particular, its network is significantly deeper than related IQA models. This comparison shows that UIUM is more relevant to utility. Although NRCDM evaluates images based on utility, its performance is less than satisfactory, owing to the large gap between the imaging principles of acoustic and optical images. The poor performance of NIMA results from the fact that it evaluates image content based on aesthetic appreciation.
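As a sketch of the content-disjoint split described at the beginning of this subsection, the snippet below groups every distorted image by its reference scene so that no content appears in both training and testing; GroupShuffleSplit yields repeated grouped 80/20 splits rather than a strict 10-fold partition, and the content-ID indexing is an assumption about how the database is organized.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical index: one row per UIUD image, tagged with the ID of its reference content (145 scenes).
image_ids = np.arange(3340)
content_ids = np.random.default_rng(0).integers(0, 145, size=3340)   # illustrative grouping only

# Repeated 80/20 splits grouped by content, so that all versions of a scene stay on one side.
splitter = GroupShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
for train_idx, test_idx in splitter.split(image_ids, groups=content_ids):
    assert set(content_ids[train_idx]).isdisjoint(content_ids[test_idx])   # no content overlap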

Performance on individual distortion types. To further understand the effect of different distortion types, we test the performance of the algorithms on each individual distortion type. The results are shown in Table IV, where the best and second-best results are again shown in bold and underline, respectively. Our algorithm achieves high correlations for all distortion types. The other algorithms also achieve higher correlations on some traditional distortion types, such as contrast distortion and illumination distortion; this is expected because these types of distortions also appear in fidelity-oriented quality evaluation. Most algorithms perform well on transmission distortion, since its degree is highly correlated with visual perception. In contrast, the traditional methods based on manual feature extraction fail on the remaining distortion types, because the extracted features are not suitable for underwater images, especially under task backgrounds.

From Table IV, the three deep learning methods have relatively superior performance due to their strong learning abilities. CNN-IQA has no advantage compared with the WaDIQaM-NR and UIUM algorithms, since its network structure is relatively simple and less targeted. Since deep learning is data-driven, we further verify the superiority of our algorithm from the perspective of data requirements in the following subsection.

Discrimination ability for significantly different qualities. In order to test discrimination ability, we calculate C_{0} for each selected IQA method and the UIUM. Since testing on the whole UIUD requires large-scale computation, we randomly sampled 100 images from each distortion type and completed the test on this subset. For convenience, the IQA methods are numbered as shown in the penultimate column of Table III, whose last column shows the exact values of C_{0}. The results indicate that UIUM (#14) has the best performance, and WaDIQaM-NR (#6) is second only to UIUM in C_{0}. The same conclusion can be drawn from the data bars of Fig. 9 (left), where UIUM (#14) outperforms all other algorithms. The significance plot for C_{0} is shown in Fig. 9 (right). In this plot, black and white boxes indicate that the performance of the method in the row is significantly lower and higher, respectively, than that of the method in the column, while a gray box reflects similar performance. It can be concluded from the plot and the table that UIUM is significantly better at distinguishing differences in utility than the other IQA methods.

Intuitive comparison. To visually illustrate the performance of UIUM, its prediction scores and the subjective MOS values are directly compared in Fig. 10. Figs. 10(a) to 10(c) are reference images, while Figs. 10(d) to 10(f) are the corresponding distorted images. The predictions made by UIUM have a high correlation with the MOS values. Another key difference between UIUM and traditional IQA shown here is that the quality score of an original image is not necessarily higher than that of a severely degraded image, as in Figs. 10(b) and 10(e).

(a) M:73.56 T:72.33  (b) M:70.48 T:72.57  (c) M:77.61 T:73.54
(d) M:69.28 T:61.19  (e) M:74.76 T:76.84  (f) M:75.24 T:76.03
Figure 10: UIUM scores for several examples that illustrate the good performance of the proposed method. M and T represent the subjective MOS value and the predicted score of UIUM, respectively.

Computation time comparison. To measure the time complexity of each algorithm, UIUM and all other algorithms are run on the UIUD database for testing. These computational cost tests are conducted on the MATLAB R2019a and PyTorch software platforms on a computer with a 3.98 GHz CPU, 16.00 GB of RAM, and an RTX 2070 graphics card. The average time consumption of each method is tabulated in Table V, where the first two method rows list traditional NR-IQA methods, the third row lists traditional FR-IQA methods, and the last row lists deep-learning-based NR-IQA methods. SSIM and FSIMc have obvious advantages in speed, but both rely on manually extracted features and thus struggle to achieve robust performance due to the limitation of fixed features. Among the deep learning methods, the computation time of UIUM is almost the same as that of CNN-IQA, while UIUM has a large advantage over CNN-IQA in performance. WaDIQaM-NR, which is the closest to UIUM in performance, is about 4% worse and also lags behind in computational efficiency. In general, UIUM has clear advantages in both performance and computation time.

TABLE VI: Ablation study.
Methods | PLCC   | SRCC   | KRCC
ODUQA   | 0.7529 | 0.7549 | 0.5619
UIUM    | 0.8473 | 0.8377 | 0.6544

V-C Ablation Study

To further evaluate the impact of transfer learning, we conduct an ablation experiment. In UIUM, the shared layer is first pre-trained on COCO, then the model is trained on FODD to obtain a fish detection model, and finally we fix the parameters of the backbone and neck and fine-tune the fully connected layer on the UIUD database. In the ablation experiment, we define an Object Detection Utility-Oriented Quality Assessment (ODUQA) baseline. The framework of ODUQA is the same as that of UIUM, except that ODUQA is built on a generic object detection model rather than a fish detection model. The two methods are tested separately on the UIUD database. The results in Table VI show that our method achieves higher performance, demonstrating that the feature extraction module precisely captures the information that is useful for the target domain (i.e., utility-oriented IQA) from the source domain (i.e., fish detection).

V-D Advantages of Transfer Learning on Database

In this work, transfer learning makes full use of the labeled data of the source task, which enables better performance on a new task with less labeled data. Since data annotation is tedious and expensive, we discuss the performance of UIUM and WaDIQaM-NR on smaller datasets in this section. WaDIQaM-NR is chosen because of its high performance on the complete UIUD database and because it is an excellent deep-learning-based NR-IQA method of recent years. After randomly deleting part of the reference images and their corresponding distorted images, we obtained two databases with 1007 and 2015 images, respectively. In this section, we name the databases according to their numbers of images: UIUD-3340, UIUD-2015, and UIUD-1007. The proportion of testing and training sets is the same as above.

Figure 11: The performances of UIUM and WaDIQaM-NR on UIUD with different image numbers.
Figure 12: Scatter plot of prediction scores of UIUM and WaDIQaM-NR on non-target images. The horizontal axes correspond to the image number, and the vertical axes represent the image scores.

The performances of UIUM and WaDIQaM-NR on the UIUD database with different numbers of images are shown in Fig. 11. When the number of images changes, the performance of UIUM remains stable, whereas the performance of WaDIQaM-NR drops significantly: when the number of images is reduced to 1007, its SRCC and PLCC drop by 10% and 8%, respectively. This demonstrates that our algorithm, which employs transfer learning, achieves better performance with less training data. Furthermore, the performance of WaDIQaM-NR starts to saturate on UIUD-2015. Specifically, the optimal SRCC and PLCC that WaDIQaM-NR can achieve in this task are about 0.8, while UIUM yields a 5% performance gain. In consequence, UIUM achieves higher performance on a smaller database, and its final performance is also better than that of WaDIQaM-NR.

V-E Analysis of Non-Target Images

We have added 90 non-target images to UIUD, because the task cannot be completed with a distortion-free and clear but targetless image. Therefore, we tentatively verify the non-target images; the result is shown in Fig. 12. It can be seen that UIUM assigns most targetless images low quality scores below 40 points, while the predictions of WaDIQaM-NR for these images are more volatile, even exceeding 60 points. In the future, we will add more non-target images and further analyze the content in combination with the background of specific tasks.

VI CONCLUSIONS

Nowadays, IQA has become a popular vision task for quality monitoring and optimization during acquisition, transmission, enhancement, etc. In this work, we envision another application of IQA, namely utility-oriented IQA, which associates image quality with its utility in a vision-based task. We conduct our work in the context of fish detection, since it is of great significance for underwater exploration and is difficult to automate with the current state of the art. We first develop a database consisting of representative images for fish detection and their typically distorted versions, named UIUD. To our knowledge, the UIUD database is the first utility-oriented image quality database. We then propose a UIUM algorithm to achieve utility measurement. We extract utility-related features by employing transfer learning, which transfers characteristic features from fish detection to utility-oriented IQA through a shared layer. The proposed framework can be easily extended to general object detection tasks. We hope our research can initiate quality evaluation in general computer vision tasks and expand the field of IQA. The UIUD database and the UIUM model will be made publicly available to facilitate reproducible research.

Acknowledgment

UIFD was published at the 2021 IEEE/CIC International Conference on Communications in China. Here, we reorganize its structure to evaluate the utility of underwater images.

References

  • [1] D. M. Rouse, R. Pépion, S. S. Hemami, and P. Le Callet, “Image utility assessment and a relationship with image quality assessment,” Proceedings of SPIE - The International Society for Optical Engineering, vol. 7240, pp. 724 010–724 010–14, 2009.
  • [2] D. M. Rouse, S. S. Hemami, R. Pépion, and P. Le Callet, “Estimating the usefulness of distorted natural images using an image contour degradation measure,” JOSA A, vol. 28, no. 2, pp. 157–188, 2011.
  • [3] R. Lin, T. Zhao, W. Chen, Y. Zheng, and H. Wei, “Underwater image quality database towards fish detection,” in 2021 IEEE/CIC International Conference on Communications in China (ICCC Workshops), 2021.
  • [4] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural scene statistics approach in the dct domain,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3339–3352, 2012.
  • [5] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012.
  • [6] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, and D. Doermann, “Blind image quality assessment based on high order statistics aggregation,” IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4444–4457, 2016.
  • [7] L. Zhang, L. Zhang, and A. C. Bovik, “A feature-enriched completely blind image quality evaluator,” IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2579–2591, 2015.
  • [8] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1733–1740.
  • [9] S. Bosse, D. Maniry, K.-R. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206–219, 2018.
  • [10] S. Yang, Q. Jiang, W. Lin, and Y. Wang, “SGDNet: An end-to-end saliency-guided deep neural network for no-reference image quality assessment,” in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 1383–1391.
  • [11] W. Zhang, K. Ma, G. Zhai, and X. Yang, “Uncertainty-aware blind image quality assessment in the laboratory and wild,” IEEE Transactions on Image Processing, vol. 30, pp. 3474–3486, 2021.
  • [12] M. Kucer, A. C. Loui, and D. W. Messinger, “Leveraging expert feature knowledge for predicting image aesthetics,” IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 5100–5112, 2018.
  • [13] W. Zhang, G. Zhai, X. Yang, and J. Yan, “Hierarchical features fusion for image aesthetics assessment,” in 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 3771–3775.
  • [14] C. Cui, H. Liu, T. Lian, L. Nie, L. Zhu, and Y. Yin, “Distribution-oriented aesthetics assessment with semantic-aware hybrid network,” IEEE Transactions on Multimedia, vol. 21, no. 5, pp. 1209–1220, 2019.
  • [15] B. Yan, Q. Lin, W. Tan, and S. Zhou, “Assessing eye aesthetics for automatic multi-reference eye in-painting,” in 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 13 506–13 514.
  • [16] Q. Kuang, X. Jin, Q. Zhao, and B. Zhou, “Deep multimodality learning for uav video aesthetic quality assessment,” IEEE Transactions on Multimedia, vol. 22, no. 10, pp. 2623–2634, 2020.
  • [17] X. Zhang, X. Gao, W. Lu, and L. He, “A gated peripheral-foveal convolutional neural network for unified image aesthetic prediction,” IEEE Transactions on Multimedia, vol. 21, no. 11, pp. 2815–2826, 2019.
  • [18] Z. Liu, Z. Wang, Y. Yao, L. Zhang, and L. Shao, “Deep active learning with contaminated tags for image aesthetics assessment,” IEEE Transactions on Image Processing, pp. 1–1, 2018.
  • [19] E. T. Scott and S. S. Hemami, “Image utility estimation using difference-of-gaussian scale space,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 101–105.
  • [20] E. T. Scott and S. S. Hemami, “No-reference utility estimation with a convolutional neural network,” Electronic Imaging, no. 9, pp. 202–1–202–6, 2018.
  • [21] E. T. Scott and S. S. Hemami, “Interactions between saliency and utility,” Electronic Imaging, vol. 2017, no. 14, pp. 77–84, 2017.
  • [22] L. Best-Rowden and A. K. Jain, “Learning face image quality from human assessments,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 12, pp. 3064–3077, 2018.
  • [23] Z. Yan, X. Yang, and K.-T. Cheng, “A skeletal similarity metric for quality evaluation of retinal vessel segmentation,” IEEE Transactions on Medical Imaging, vol. 37, no. 4, pp. 1045–1057, 2018.
  • [24] M. Bhat, J.-M. Thiesse, and P. L. Callet, “Combining video quality metrics to select perceptually accurate resolution in a wide quality range: A case study,” in 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 2164–2168.
  • [25] T. Vigier, L. Krasula, A. Milliat, M. P. Da Silva, and P. Le Callet, “Performance and robustness of hdr objective quality metrics in the context of recent compression scenarios,” in 2016 Digital Media Industry Academic Forum (DMIAF), 2016, pp. 59–64.
  • [26] W. Chen, F. Yuan, E. Cheng, and W. Lin, “Subjective and objective quality evaluation of sonar images for underwater acoustic transmission,” in 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 176–180.
  • [27] W. Chen, K. Gu, W. Lin, F. Yuan, and E. Cheng, “Statistical and structural information backed full-reference quality measure of compressed sonar images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 2, pp. 334–348, 2020.
  • [28] W. Chen, K. Gu, W. Lin, Z. Xia, P. Le Callet, and E. Cheng, “Reference-free quality assessment of sonar images via contour degradation measurement,” IEEE Transactions on Image Processing, vol. 28, no. 11, pp. 5336–5351, 2019.
  • [29] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proceedings of the IEEE, vol. 109, no. 1, pp. 43–76, 2021.
  • [30] Y. Feng and C. Yiyu, “No-reference image quality assessment through transfer learning,” in 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), 2017, pp. 90–94.
  • [31] Y. Zhang, X. Gao, L. He, W. Lu, and R. He, “Objective video quality assessment combining transfer learning with cnn,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 8, pp. 2716–2730, 2020.
  • [32] X. Yang, F. Li, and H. Liu, “TTL-IQA: Transitive transfer learning based no-reference image quality assessment,” IEEE Transactions on Multimedia, pp. 1–1, 2020.
  • [33] K. Alex, S. Ilya, and H. G. E, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, pp. 1097–1105, 2012.
  • [34] Z. Zhao, Y. Liu, X. Sun, J. Liu, X. Yang, and C. Zhou, “Composited fishnet: Fish detection and species recognition from low-quality underwater videos,” IEEE Transactions on Image Processing, vol. 30, pp. 4719–4734, 2021.
  • [35] T. Akgül, N. Çalik, and B. U. Töreyın, “Deep learning-based fish detection in turbid underwater images,” in 2020 28th Signal Processing and Communications Applications Conference (SIU), 2020, pp. 1–4.
  • [36] X. Li, Y. Tang, and T. Gao, “Deep but lightweight neural networks for fish detection,” in OCEANS 2017 - Aberdeen, 2017, pp. 1–5.
  • [37] Recommendation ITU-R BT.500, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunication Union, 2012.
  • [38] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, 2006.
  • [39] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, and L. Jin, “Color image database tid2013: Peculiarities and preliminary results,” in European Workshop on Visual Information Processing (EUVIP), 2013, pp. 106–111.
  • [40] V. Hosu, H. Lin, T. Sziranyi, and D. Saupe, “Koniq-10k: An ecologically valid database for deep learning of blind image quality assessment,” IEEE Transactions on Image Processing, vol. 29, pp. 4041–4056, 2020.
  • [41] N. Murray, L. Marchesotti, and F. Perronnin, “Ava: A large-scale database for aesthetic visual analysis,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2408–2415.
  • [42] W. Liu and Z. Zhou, “A database for perceptual evaluation of image aesthetics,” in 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 1317–1321.
  • [43] L. Ma, W. Lin, C. Deng, and K. N. Ngan, “Image retargeting quality assessment: A study of subjective scores and objective metrics,” IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 626–639, 2012.
  • [44] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, “CSPNet: A new backbone that can enhance learning capability of CNN,” in 2020 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 1571–1580.
  • [45] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
  • [46] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944.
  • [47] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8759–8768.
  • [48] T.-Y. Lin, M. Maire, S. Belongie, and J. Hays, “Microsoft coco: Common objects in context,” in European Conference on Computer Vision.   Springer, 2014, pp. 740–755.
  • [49] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [50] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, 2011.
  • [51] M. Yang and A. Sowmya, “An underwater color image quality evaluation metric,” IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 6062–6071, 2015.
  • [52] K. Panetta, C. Gao, and S. Agaian, “Human-visual-system-inspired underwater image quality measures,” IEEE Journal of Oceanic Engineering, vol. 41, no. 3, pp. 541–551, 2016.
  • [53] H. Talebi and P. Milanfar, “Nima: Neural image assessment,” IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 3998–4011, 2018.
  • [54] L. Krasula, K. Fliegel, P. Le Callet, and M. Klíma, “On the accuracy of objective image and video quality models: New methodology for performance evaluation,” in 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX).   IEEE, 2016, pp. 1–6.
  • [55] J. A. Hanley and B. J. McNeil, “A method of comparing the areas under receiver operating characteristic curves derived from the same cases,” Radiology, vol. 148, no. 3, pp. 839–843, 1983.