
Bucket of deep transfer learning features and classification models for melanoma detection

Mario Manzo [email protected] (Information Technology Services, University of Naples “L’Orientale”), Simone Pellino (MIUR)
Abstract

Malignant melanoma is the deadliest form of skin cancer and its worldwide incidence rate has grown rapidly in recent years. The most effective approach to targeted treatment is early diagnosis. Deep learning algorithms, specifically convolutional neural networks, represent a methodology for image analysis and representation. They optimize the feature design task, essential for an automatic approach on different types of images, including medical ones. In this paper, we adopt pretrained deep convolutional neural network architectures for image representation with the purpose of predicting skin lesion melanoma. Firstly, we apply a transfer learning approach to extract image features. Secondly, we use the transferred learning features within an ensemble classification context. Specifically, the framework trains individual classifiers on balanced subspaces and combines the provided predictions through statistical measures. An experimental phase on datasets of skin lesion images is performed, and the results show the effectiveness of the proposed approach with respect to state-of-the-art competitors.

keywords:
Melanoma detection, Deep Learning, Transfer Learning, Ensemble Classification
journal: Cognitive Systems Research

1 Introduction

Among the types of malignant cancer, melanoma is the deadliest form of skin cancer, and its incidence rate is growing rapidly around the world. Early diagnosis is particularly important, since melanoma can be cured with a simple excision. In the majority of cases, due to the similarity of the various skin lesions (melanoma and not-melanoma) [1], visual analysis can be unsuitable and lead to a wrong diagnosis. In this regard, image processing and artificial intelligence tools can provide fundamental aid to an automatic classification step [2]. A further improvement in diagnosis is provided by the dermoscopy technique [3], which captures illuminated and magnified images of the skin lesion in a non-invasive way in order to highlight areas containing spots. Furthermore, the visual quality of the deeper skin layers can be improved if the skin surface reflection is removed. Nonetheless, the classification of melanoma dermoscopy images is a difficult task for different reasons. First, the degree of similarity between melanoma and not-melanoma lesions is high. Second, segmentation, and therefore the identification of the affected area, is very complicated because of variations in texture, size, color, shape and location. Last but not least, additional skin conditions such as hair, veins or artifacts due to image capturing must be handled. To this end, many solutions have been proposed to improve the task. For example, low-level hand-crafted features [4] are adopted to discriminate non-melanoma and melanoma lesions. In some cases, this type of features is unable to discriminate clearly, leading to results that are not very relevant [5]. Alternatively, segmentation is adopted to isolate the foreground elements from the background [6]. However, such segmentation relies on low-level features with low representational power, which provides unsatisfactory results [7]. In recent years, deep learning has become an effective solution for the extraction of significant features from large amounts of data. In particular, the diffusion of deep neural networks applied to the image classification task is connected to various factors, such as the availability of open source software, the constant growth of hardware power and the availability of large datasets [8]. Deep learning has proven effective for the management, analysis, representation and classification of medical images [9]. Specifically, for the treatment of melanoma, deep neural networks have been adopted in both segmentation and classification phases [10]. However, the high variation among the types of melanoma and the imbalance of the data have a decisive impact on performance [11], hindering the generalization of the model and leading to over-fitting [12]. In order to overcome the aforementioned issues, in this paper we introduce a novel framework based on deep transfer learning and ensemble classification for melanoma detection. It works through three integrated stages: a first stage, which performs image preprocessing operations; a second, which extracts features using deep transfer learning; and a third, including an ensemble learning layer, in which different classification algorithms and extracted features are combined with the aim of making the best decision (melanoma/not-melanoma). Our approach provides the following main contributions:

  • 1.

    A deep and ensemble learning based framework that simultaneously addresses inter-class variation and class imbalance for the task of melanoma classification.

  • 2.

    A framework that, in the classification phase, simultaneously creates multiple image representation models based on features extracted with deep transfer learning.

  • 3.

    A demonstration of how the choice of multiple features can enrich image representation, leading to a lesion assessment similar to that of a skilled dermatologist.

  • 4.

    Experimental improvements over existing methods on different state-of-the-art datasets for the melanoma detection task.

The paper is structured as follows. Section 2 provides an overview of the state of the art of melanoma classification approaches. Section 3 describes the proposed framework in detail. Section 4 provides a wide experimental phase, while Section 5 concludes the paper.

2 Related work

In this section, we briefly analyze the most important approaches in the skin lesion recognition literature. This field includes numerous works that address the issue from different perspectives. Some works offer an important contribution to image representation, by implementing segmentation algorithms or new descriptors. Others implement complex learning and classification mechanisms.

In [13] a novel boundary descriptor, based on the color variation of skin lesion images acquired with standard cameras, is introduced. Furthermore, in order to reach higher performance, a set of textural and morphological features is added. A multilayer perceptron neural network is adopted as classifier.

In [14] the authors propose a complex framework that implements illumination correction and feature extraction on skin lesion images acquired using normal consumer-grade cameras. By applying a multi-stage illumination correction algorithm and defining a set of high-level intuitive features (HLIF) that quantify the level of asymmetry and border irregularity of a lesion, the proposed model can be used to produce accurate skin lesion diagnoses.

In [15] the authors, in order to properly evaluate the contents of concave contours, introduce a novel border descriptor named boundary intersection-based signature (BIBS). A shape signature is a one-dimensional representation of the shape border and cannot provide a proper description for concave borders that have more than one intersection point. For this reason, BIBS analyzes the boundary contents of shapes, especially shapes with concave contours. A support vector machine (SVM) is adopted for the classification process.

Another descriptor for the characterization of skin lesions is the set of high-level intuitive features (HLIFs) [16]. HLIFs are created to simulate a model of human-observable characteristics. They capture specific characteristics that are significant to the given application: color asymmetry, by analyzing and clustering pixel colors; structural asymmetry, by applying the Fourier descriptors of the shape; border irregularity, using morphological opening and closing; and color characteristics, by transforming the image to a perceptually uniform color space, building color-spatial representations that model the color information of a patch of pixels, clustering the patch representations into k color clusters, and quantifying the variance found using the original lesion and the k representative colors.

A texture analysis method based on Local Binary Patterns (LBP) and Block Difference of Inverse Probabilities is proposed in [17]. A comparison is provided with classification results obtained by taking the raw pixel intensity values as input. The classification stage is achieved by generating an automated model obtained with both Convolutional Neural Networks (CNN) and SVM.

In [18] the authors propose a system that automatically extracts the lesion regions, using non-dermoscopic digital images, and then computes color and texture descriptors. The extracted features are adopted for the automatic prediction step. The classification is managed using a majority vote of all predictions.

In [19] non-dermoscopic clinical images are adopted to assist a dermatologist in the early diagnosis of melanoma skin cancer. Images are preprocessed in order to reduce artifacts such as noise effects. Subsequently, images are analyzed through a pretrained CNN, a member of the family of deep learning models. The CNN is trained on a large number of training samples in order to distinguish between melanoma and benign cases.

In [20] the Predict-Evaluate-Correct K-fold (PECK) algorithm is presented. The algorithm works by merging deep CNNs with SVM and random forest classifiers to achieve an introspective learning method. In addition, the authors provide a novel segmentation algorithm, named Synthesis and Convergence of Intermediate Decaying Omnigradients (SCIDOG), to accurately detect lesion contours in non-dermoscopic images, even in the presence of significant noise, hair, and fuzzy lesion boundaries.

In [21] the authors propose a novel solution to improve melanoma classification by defining a new feature that exploits the border-line characteristics of the lesion segmentation mask, combining gradients with LBP. These border-line features are used together with the conventional ones and lead to higher accuracy in the classification stage.

In [22] an objective feature extraction function for CNNs is proposed. The goal is to acquire variation separability, as opposed to the categorical cross entropy which maximizes agreement with the target labels. The deep representative features increase the variance between the images, making the representation more discriminative. The idea is to build a CNN and perform principal component analysis (PCA) during the training phase.

In [23] a deep learning computer-aided diagnosis system for automatic segmentation and classification of melanoma lesions is proposed. The system extracts CNN features, together with statistical and contrast location features, from the results of raw image segmentation. The combined features are utilized to obtain the final classification of the melanoma as malignant or benign.

In [24] the authors propose an efficient algorithm for prescreening of pigmented skin lesions for malignancy using general-purpose digital cameras. The proposed method enhances borders and extracts a broad set of dermatologically important features. These discriminative features allow the classification of lesions into two groups, melanoma and benign.

In [25] a skin lesion detection system optimized to run entirely on a resource-constrained smartphone is described. The system combines a lightweight method for skin detection with a hierarchical segmentation approach including two fast segmentation algorithms, and proposes novel features to characterize a skin lesion. Furthermore, the system implements an improved feature selection algorithm to determine a small set of discriminative features adopted by the final lightweight system.

3 Materials and Methods

In this section we describe the proposed framework, which combines two well-known methodologies: deep neural networks and ensemble learning. The main idea is to combine feature extraction and classification algorithms. The result is a set of competitive models providing a range of decisions with associated confidence values, useful for making choices during classification. The framework is composed of three levels. The first performs preprocessing operations such as image resizing and data balancing. The second, based on transfer learning, extracts features using deep neural networks. The third level, based on ensemble learning, combines different classification algorithms (SVM [26], Logistic Label Propagation (LLP) [27], KNN [28]) and the extracted features with the aim of making the best decision. The adopted classifiers are trained and tested through a bootstrapping policy. Finally, the framework iterates a predetermined number of times in a supervised learning context.

3.1 Data balancing

Melanoma lesion analysis and classification is connected with accurate segmentation, whose purpose is to isolate the areas of the image containing information of interest. However, the wide variety of skin lesions and the unpredictable obstructions on the skin make traditional segmentation an ineffective tool, especially for non-dermoscopic images. Furthermore, the imbalance problem, present in many datasets, makes classification difficult to address, especially when the samples of the minority class are heavily underrepresented. In the case under consideration, a balancing phase was performed to compensate for the strong imbalance between the two classes. The goal is to isolate segments of the image that could contain melanoma. In particular, the resampling of the minority class is performed by adding images altered through the application of the K-means color segmentation algorithm [29]. The application of segmentation algorithms for image augmentation [30], and consequently for balancing the classes, represented a good compromise for this stage of the pipeline.
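
A minimal sketch of this balancing step is reported below, assuming MATLAB with the Image Processing Toolbox. The folder name, the number of color clusters and the target class size are illustrative assumptions, not the exact settings used in the framework.

    % Minority-class augmentation via K-means color segmentation (sketch).
    imds = imageDatastore('data/melanoma');   % hypothetical minority-class folder
    targetSize = 100;                          % assumed size of the majority class
    nOriginal  = numel(imds.Files);
    k = 3;                                     % assumed number of color clusters

    for idx = 1:(targetSize - nOriginal)
        I = readimage(imds, mod(idx-1, nOriginal) + 1);
        [L, centers] = imsegkmeans(I, k);          % cluster the pixel colors
        J = label2rgb(L, im2double(centers));      % recolor with the cluster centers
        imwrite(J, fullfile('data/melanoma', sprintf('aug_%03d.png', idx)));
    end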

3.2 Image resize

The images to be processed are resized according to the dimensions of the input layer required by the deep neural networks (details can be found in column 5 of Table 1). Many networks require this step, and it does not substantially alter the image information content. This normalization step is essential because images of different or large dimensions cannot be processed by the feature extraction stage.
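
As an illustration, the required size can be read directly from the input layer of the chosen pretrained network (a sketch, assuming MATLAB and the Deep Learning Toolbox model packages; the image file name is hypothetical):

    % Resize an image to the input size expected by a pretrained network.
    net = resnet50;                          % or alexnet, googlenet, resnet18
    inputSize = net.Layers(1).InputSize;     % e.g. [224 224 3], see Table 1
    I = imread('lesion.jpg');                % hypothetical image file
    I = imresize(I, inputSize(1:2));         % match the input layer dimensions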

3.3 Transfer learning and features extraction

The transfer learning approach has been chosen for feature extraction. Commonly, a pretrained network is adopted as a starting point to learn a new task. It is the easiest and fastest way to exploit the representational power of pretrained deep networks, and it is usually much faster and easier to tune a network with transfer learning than to train a new network from scratch with randomly initialized weights. We selected deep learning architectures for image classification based on their structure and performance. The goal is to extract features from images through neural networks by redesigning their structure in the final layer according to the needs of the addressed task (two outgoing classes: melanoma and not-melanoma). The feature extraction is performed through a chosen layer (different for each network and specified in Table 1), placed in the final part of the structure. The image is encoded through a vector of real numbers produced by consecutive convolution steps, from the input layer to the layer chosen for the representation. Below, a description of the adopted networks is reported.

Alexnet [8] consists of 5 convolutional layers and 3 fully connected layers. It includes the non-saturating ReLU activation function, which behaves better than tanh and sigmoid during the training phase. For feature extraction, we have chosen the fully connected 7 (fc7) layer, composed of 4096 neurons.

Googlenet [31] is 22 layers deep. The network is inspired by LeNet [32] but implements a novel element dubbed the inception module, based on several very small convolutions in order to drastically reduce the number of parameters. This architecture reduces the number of parameters from 60 million (AlexNet) to 4 million. Furthermore, it includes batch normalization, image distortions and the Root Mean Square Propagation algorithm. For feature extraction, we have chosen the global average pooling (pool5-7x7_s1) layer, composed of 1024 neurons.

Resnet18 and Resnet50 [33] are inspired by the pyramidal cells contained in the cerebral cortex. They use particular skip connections, or shortcuts, to jump over some layers. They are 18 and 50 layers deep respectively, and the skip connection technique has paved the way for residual networks. For feature extraction, we have chosen the two global average pooling layers (pool5 and avg_pool), composed of 512 and 2048 neurons respectively.

Table 1: Description of the adopted pretrained networks.
Network Depth Size (MB) Parameters (Millions) Input Size Features Layer
Alexnet 8 227 61 227 × 227 fc7
Googlenet 22 27 7 224 × 224 pool5-7x7_s1
Resnet18 18 44 11.7 224 × 224 pool5
Resnet50 50 96 25.6 224 × 224 avg_pool
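
As an illustrative sketch, the features can be extracted from the layers listed in Table 1 as follows (assuming MATLAB with the Deep Learning Toolbox; the image datastore layout is a hypothetical assumption):

    % Extract transfer learning features from the layer chosen in Table 1.
    imds = imageDatastore('data', 'IncludeSubfolders', true, ...
                          'LabelSource', 'foldernames');       % hypothetical layout
    net   = resnet50;
    layer = 'avg_pool';                                         % see Table 1
    inputSize = net.Layers(1).InputSize;
    augimds  = augmentedImageDatastore(inputSize(1:2), imds);   % resize on the fly
    features = activations(net, augimds, layer, 'OutputAs', 'rows');
    labels   = imds.Labels;                                     % melanoma / not-melanoma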

3.4 Network design

The adopted networks have been adapted to the melanoma classification problem. Originally, they were trained on the Imagenet dataset [34], composed of a million images classified into 1000 classes. The result is a rich feature representation for a wide range of images. The network processes an image and provides a label along with probabilities for each of the classes. Commonly, the first layer of the network is the image input layer, which requires input images with 3 color channels. The subsequent convolutional layers extract image features, which the last learnable layer and the final classification layer adopt to classify the input image. In order to make the pretrained network suitable for classifying new images, the last two layers are replaced with new layers. In many cases, the last layer including learnable weights is a fully connected layer; this is replaced with a new fully connected layer whose number of outputs equals the number of classes of the new data. Moreover, to speed up the learning in the new layer with respect to the transferred layers, it is recommended to increase its learning rate factors. As an optional choice, the weights of the earlier layers can be frozen by setting the related learning rates to zero. With this setting the weights are not updated during training, which lowers the execution time since the gradients of the related layers do not need to be computed. This aspect is very interesting to avoid overfitting in the case of small datasets.
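
A minimal sketch of this adaptation is shown below, assuming MATLAB and the ResNet-50 model package; the layer names and learning-rate factors follow the standard transfer learning recipe and are assumptions, not the exact configuration of the framework.

    % Adapt a pretrained network to the two-class melanoma problem (sketch).
    net    = resnet50;
    lgraph = layerGraph(net);
    newFc  = fullyConnectedLayer(2, 'Name', 'fc_melanoma', ...
                 'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
    lgraph = replaceLayer(lgraph, 'fc1000', newFc);
    lgraph = replaceLayer(lgraph, 'ClassificationLayer_fc1000', ...
                 classificationLayer('Name', 'output'));
    options = trainingOptions('sgdm', ...                 % settings of Section 4.2
        'MiniBatchSize', 5, 'MaxEpochs', 10, 'InitialLearnRate', 3e-4);
    % netTuned = trainNetwork(augimdsTrain, lgraph, options);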

3.5 Ensemble Learning

The contribution of different transfer learning features and classifiers can be mixed in an ensemble context. Considering the set of images, with cardinality k, belonging to x classes, to be classified

Imgs=\{i_{1},i_{2},\ldots,i_{k}\} (1)

each element of the set is treated with the procedure below. Let us consider the set C composed of n classifiers

C=\{\beta_{1},\beta_{2},\ldots,\beta_{n}\} (2)

and the set F composed of m vectors of transferred learning features

F=\{\Theta_{1},\Theta_{2},\ldots,\Theta_{m}\} (3)

the goal is the combination of each element of the set C with the elements of the set F. The set of combinations can be defined as CF

CF=\begin{bmatrix}\beta_{1}\Theta_{1}&\dots&\beta_{1}\Theta_{m}\\ \vdots&\ddots&\vdots\\ \beta_{n}\Theta_{1}&\dots&\beta_{n}\Theta_{m}\end{bmatrix}

Each combination provides a decision in I=\{-1,1\}, where 1 stands for melanoma and -1 for not-melanoma, related to an image of the set Imgs. The set of decisions D can be defined as follows

D=\begin{bmatrix}d_{\beta_{1}\Theta_{1}}&\dots&d_{\beta_{1}\Theta_{m}}\\ \vdots&\ddots&\vdots\\ d_{\beta_{n}\Theta_{1}}&\dots&d_{\beta_{n}\Theta_{m}}\end{bmatrix}

Each d_{\beta_{i}\Theta_{j}} value represents a decision based on the combination of sets C and F. In addition, the set of scores S can be defined as follows

S=\begin{bmatrix}P(i|x)_{d_{\beta_{1}\Theta_{1}}}&\dots&P(i|x)_{d_{\beta_{1}\Theta_{m}}}\\ \vdots&\ddots&\vdots\\ P(i|x)_{d_{\beta_{n}\Theta_{1}}}&\dots&P(i|x)_{d_{\beta_{n}\Theta_{m}}}\end{bmatrix}

A score value s\in[0,1] is associated with each decision d and represents the posterior probability P(i|x) that an image i belongs to class x. At this point, let us introduce the concept of mode, defined as the value which occurs most frequently in a given set

mode=l+\left(\frac{f_{1}-f_{0}}{2f_{1}-f_{0}-f_{2}}\right)\times h (4)

where l is the lower limit of the modal class, h is the size of the class interval, f_{1} is the frequency of the modal class, f_{0} is the frequency of the class which precedes the modal class and f_{2} is the frequency of the class which follows the modal class. The columns of the matrix D are analyzed with the mode, in order to obtain the values of the most frequent decisions. This step is carried out in order to verify the best response of the different classifiers, contained in the set C, which adopt the same type of features. Moreover, the mode provides two indications: the most frequent value and its occurrences (indices). For each occurrence of the modal value, the corresponding score of the matrix S is extracted. In this regard, a new vector is generated

DS=\{ds_{P(i|x)_{d_{\beta_{1,\dots,n}\Theta_{1}}}},\ldots,ds_{P(i|x)_{d_{\beta_{1,\dots,n}\Theta_{m}}}}\}, (5)

where each element ds contains the average of the scores that have the highest frequency, extracted through the mode, in the related column of the matrix D. Also, the modal value of each column of the matrix D is stored in the vector DM

DM=\{dm_{d_{\beta_{1,\dots,n}\Theta_{1}}},\dots,dm_{d_{\beta_{1,\dots,n}\Theta_{m}}}\}, (6)

the final decision consists in selecting the element of the vector DM in the same position as the maximum score value of the vector DS. This last step identifies the best prediction based on the different features adopted, essentially the best features suitable for the classification of the image.
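
A compact sketch of this fusion rule in MATLAB is given below; D and S are the n-by-m decision and score matrices defined above, and the sample mode over the columns (which, for the discrete labels {-1,1}, returns the modal value targeted by equation 4) replaces the grouped-data formula:

    % Ensemble fusion of the decision matrix D (n x m) and score matrix S (n x m).
    % Rows index the classifiers beta_i, columns the feature sets Theta_j.
    DM = mode(D, 1);                      % modal decision per feature set (column)
    DS = zeros(1, size(D, 2));
    for j = 1:size(D, 2)
        agree = (D(:, j) == DM(j));       % classifiers agreeing with the mode
        DS(j) = mean(S(agree, j));        % average score of the modal decision
    end
    [~, best]     = max(DS);              % feature set with the highest confidence
    finalDecision = DM(best);             % 1 = melanoma, -1 = not-melanoma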

3.6 Train and test strategy: Bootstrapping

Bootstrapping is a statistical technique which consists in creating samples of size B, named bootstrap samples, from a dataset of size N. The bootstrap samples are drawn at random with replacement from the dataset. This strategy has important statistical properties. The subsets can be considered as directly extracted from the original distribution, independently of each other, containing representative and independent samples, i.e. almost independent and identically distributed (i.i.d.). Two considerations must be made in order to validate these hypotheses. First, the size N of the original dataset should be large enough to capture the underlying distribution, so that sampling the original data is a good approximation of the real distribution (representativeness). Second, the size N of the dataset should be large compared to the size B of the bootstrap samples, so that the samples are not too correlated (independence). Commonly, requiring the samples to be truly independent would demand too much data compared to the amount actually available. This strategy can therefore be adopted to generate several bootstrap samples that can be considered nearly representative and almost independent (almost i.i.d. samples). In the proposed framework, bootstrapping is applied to the set F (equation 3) in order to perform the training and testing stages of the classifiers. This strategy is suitable for the problem faced, since it creates a competitive environment capable of providing the best performance.
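
A minimal sketch of this train/test strategy is reported below, assuming MATLAB; X and Y denote the feature matrix and labels, the 80/20 proportion follows the settings of Section 4.2, and the out-of-bag handling is an illustrative assumption:

    % Bootstrapped train/test splits over the feature set (sketch).
    N     = size(X, 1);            % number of feature vectors (one per image)
    B     = round(0.8 * N);        % bootstrap sample size (80% train, Section 4.2)
    nIter = 10;                    % number of iterations (Section 4.2)

    for t = 1:nIter
        trainIdx = randsample(N, B, true);     % sample with replacement
        testIdx  = setdiff(1:N, trainIdx);     % out-of-bag samples used for testing
        % each classifier of Table 2 is trained on X(trainIdx,:), Y(trainIdx)
        % and evaluated on X(testIdx,:), Y(testIdx)
    end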

4 Experimental results

This section describes the experiments performed on public datasets. In order to produce comparable performance, we adopted the settings used in well-known melanoma classification methods, in which the main critical issue concerns the feature extraction for image representation.

4.1 Datasets

The first adopted dataset is MED-NODE (http://www.cs.rug.nl/~imaging/databases/melanoma_naevi/). It was created by the Department of Dermatology of the University Medical Center Groningen (UMCG). The dataset was initially used to train the MED-NODE computer-assisted melanoma detection system [35]. It is composed of 170 non-dermoscopic images, where 70 are melanoma and 100 are nevi. The image dimensions vary greatly, ranging from 201×257 to 3177×1333 pixels.

The second adopted dataset, Skin-lesion (from now on), is described in [36]. It is composed of 206 images of skin lesions, which were obtained using standard consumer-grade cameras in varying and unconstrained environmental conditions. These images were extracted from the online public databases Dermatology Information System (http://www.dermis.net) and DermQuest (http://www.dermquest.com). Of these images, 119 are melanomas and 87 are not-melanoma. Each image contains a single lesion of interest.

4.2 Settings

The framework consists of different modules written in the Matlab language. Moreover, we applied available pretrained networks trained for the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [37]. Among all the computational stages, the feature extraction process, described in Section 3.3, was certainly the most expensive. As is well known, the networks contain fully connected layers that make the structure extremely dense and complex, which increases the computational load. Alexnet, Googlenet and Resnet50 are adopted to extract features on the MED-NODE dataset, while Resnet50 and Resnet18 are adopted for the Skin-lesion dataset. Table 1 shows some important details related to the layers chosen for feature extraction. Networks were trained by setting the mini-batch size to 5, the maximum number of epochs to 10 and the initial learning rate to 3\cdot 10^{-4}, with stochastic gradient descent with momentum (SGDM) as optimizer. For both experimental procedures, in order to train the classifiers, 80% and 20% of the images are included in the train and test sets respectively, for a number of iterations equal to 10. Table 2 enumerates the classification algorithms included in the framework and the related settings (some algorithms appear multiple times with different configurations).

Table 2: Classification algorithms and related settings.
Algorithms Setting
SVM [26] KernelFunction: polynomial, KernelScale: auto
SVM [26] KernelFunction: gaussian, KernelScale: auto
LLP [27] KernelFunction: rbf, Regularization parameter: 1, init: 0, maxiter: 1000
KNN [28] NumNeighbors: 3, Distance: spearman
KNN [28] NumNeighbors: 4, Distance: correlation
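
As an illustration of how the classifier pool of Table 2 can be instantiated in MATLAB (a sketch; Xtrain and Ytrain denote the training features and labels, and LLP is omitted since it has no built-in MATLAB implementation):

    % Classifier pool built from the settings of Table 2 (sketch, LLP omitted).
    classifiers = {
        @(X, Y) fitcsvm(X, Y, 'KernelFunction', 'polynomial', 'KernelScale', 'auto')
        @(X, Y) fitcsvm(X, Y, 'KernelFunction', 'gaussian',   'KernelScale', 'auto')
        @(X, Y) fitcknn(X, Y, 'NumNeighbors', 3, 'Distance', 'spearman')
        @(X, Y) fitcknn(X, Y, 'NumNeighbors', 4, 'Distance', 'correlation')
    };
    models = cellfun(@(train) train(Xtrain, Ytrain), classifiers, ...
                     'UniformOutput', false);       % train each configuration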

4.3 Discussion

Tables 4 and 5 describe the comparison with existing skin cancer classification methods (we refer to the results reported in the corresponding papers). The provided performance can be considered satisfactory compared to the competitors. In terms of accuracy, although it provides a rough measurement, we obtained the best result for MED-NODE and the second best for Skin-lesion (surpassed only by BIBS). PPV and NPV give good indications of the classification ability. TPR, a measure that provides greater confidence about the addressed problem, is very high for both datasets. TNR, which measures how reliably the absence of tumors within the image is detected, reaches the best value for both datasets. Regarding the remaining measures, F_{1}^{p}, F_{1}^{n} and MCC, considerable values were obtained but, unfortunately, they are not available for all competitors. We can attribute the satisfactory performance to two main aspects. First, the deep learning features, which, even if abstract, are able to represent the images well. Furthermore, the framework provides multiple representation models, which constitute a different starting point than a standard approach in which a single representation is provided; this aspect is relevant for improving performance. A non-negligible point is that the normalization of the image size, required by the first layer of the neural network before the feature extraction phase, does not produce a performance degradation, whereas in other cases normalization causes loss of quality of the image content and a consequent degradation of details. On the other hand, the weak point is the computational load, even if the pretrained networks include layers with already tuned weights; the time required for training is long, but shorter than for a network trained from scratch. Second, the classification scheme, which provides multiple choices in decision making. In fact, at each iteration, the framework chooses which classifier is most suitable for recognizing melanoma in the images of the proposed set. Certainly, this approach is more computationally expensive, but it produces better results than a single classifier.

Moreover, Table 3 shows the metrics adopted for the performance evaluation, in order to provide a uniform comparison with algorithms working on the same task.

Table 3: Evaluation metrics adopted for the performance evaluation.
Metric Equation
True Positive Rate TPR=\frac{TP}{TP+FN}
True Negative Rate TNR=\frac{TN}{TN+FP}
Positive Predictive Value PPV=\frac{TP}{TP+FP}
Negative Predictive Value NPV=\frac{TN}{TN+FN}
Accuracy ACC=\frac{TP+TN}{TP+FP+TN+FN}
F_{1}-Score (Positive) F_{1}^{P}=\frac{2\cdot PPV\cdot TPR}{PPV+TPR}
F_{1}-Score (Negative) F_{1}^{N}=\frac{2\cdot NPV\cdot TNR}{NPV+TNR}
Matthews Correlation Coefficient MCC=\frac{TP\cdot TN-FP\cdot FN}{\sqrt{(TP+FP)\cdot(TP+FN)\cdot(TN+FP)\cdot(TN+FN)}}

Looking carefully at the table, it is important to focus on the meaning of the individual measures with reference to melanoma detection. The True Positive Rate, also known as Sensitivity or Recall, concerns the portion of positive melanoma images that are correctly identified. This provides important information because it highlights the ability to identify images containing skin lesions and contributes to the robustness of the result. The same concept holds for the True Negative Rate, also known as Specificity, which instead measures the portion of negatives, not containing skin lesions, that have been correctly identified. The Positive and Negative Predictive Values are probabilistic measures that indicate whether an image with a positive or negative melanoma prediction actually does or does not contain a skin lesion; in particular, the Positive Predictive Value, also known as Precision, expresses the proportion of instances claimed to be relevant by the framework that actually are relevant, while Recall expresses the ability to find all relevant instances in the dataset. Accuracy, a well-known performance measure, is the proportion of true results among the total number of cases examined; in our case it provides an overall analysis, certainly a rougher measurement than the previous ones, of the ability of a classifier to distinguish a skin lesion from an image without lesions. The F_{1}-Score combines the Precision and Recall of the model as their harmonic mean, in order to find an optimal blend; the choice of the harmonic mean instead of a simple mean penalizes extreme values. Finally, the Matthews correlation coefficient is another well-known overall quality measure. It takes into account True/False Positives/Negatives and is generally regarded as a balanced measure which can be adopted even if the classes are of very different sizes.
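
A short sketch of how these metrics can be computed from the confusion counts in MATLAB is shown below (TP, FP, TN, FN are assumed to be already accumulated over the test iterations):

    % Evaluation metrics of Table 3 from confusion counts (sketch).
    TPR = TP / (TP + FN);                         % Sensitivity / Recall
    TNR = TN / (TN + FP);                         % Specificity
    PPV = TP / (TP + FP);                         % Precision
    NPV = TN / (TN + FN);
    ACC = (TP + TN) / (TP + FP + TN + FN);
    F1p = 2 * PPV * TPR / (PPV + TPR);            % F1-score, positive class
    F1n = 2 * NPV * TNR / (NPV + TNR);            % F1-score, negative class
    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));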

Table 4: Experimental results on MED-NODE dataset.
Method  TPR  TNR  PPV  NPV  ACC  F_{1}^{p}  F_{1}^{n}  MCC
MED-NODE annotated [18]  0.78  0.59  0.56  0.80  0.66  0.65  0.68  0.36
Spotmole [38]  0.82  0.57  0.56  0.83  0.67  0.67  0.68  0.39
Barhoumi and Zagrouba [39]  0.46  0.87  0.70  0.71  0.70  0.56  0.78  0.37
MED-NODE color [18]  0.74  0.72  0.64  0.81  0.73  0.69  0.76  0.45
MED-NODE texture [18]  0.62  0.85  0.74  0.77  0.76  0.67  0.81  0.49
Jafari et al. [24]  0.90  0.72  0.70  0.91  0.79  0.79  0.80  0.61
MED-NODE combined [18]  0.80  0.81  0.74  0.86  0.81  0.77  0.83  0.61
Nasr-Esfahani et al. [19]  0.81  0.80  0.75  0.86  0.81  0.78  0.83  0.61
Albert [20]  0.89  0.93  0.92  0.93  0.91  0.89  0.92  0.83
Pereira et al. [21] ght/svm-smo/f23-32  0.45  0.92  -  -  0.73  -  -  -
Pereira et al. [21] ght/svm-smo/f1-32  0.56  0.86  -  -  0.74  -  -  -
Pereira et al. [21] lbpc/svm-smo/f23-32  0.49  0.93  -  -  0.75  -  -  -
Pereira et al. [21] lbpc/svm-smo/f1-32  0.58  0.91  -  -  0.78  -  -  -
Pereira et al. [21] ght/svm-sda/f23-32  0.66  0.83  -  -  0.76  -  -  -
Pereira et al. [21] ght/svm-sda/f1-32  0.66  0.86  -  -  0.78  -  -  -
Pereira et al. [21] lbpc/svm-isda/f23-32  0.69  0.83  -  -  0.77  -  -  -
Pereira et al. [21] lbpc/svm-isda/f1-32  0.65  0.88  -  -  0.79  -  -  -
Pereira et al. [21] ght/ffn/f23-32  0.63  0.84  -  -  0.76  -  -  -
Pereira et al. [21] ght/ffn/f1-32  0.63  0.84  -  -  0.76  -  -  -
Pereira et al. [21] lbpc/ffn/f23-32  0.64  0.83  -  -  0.75  -  -  -
Pereira et al. [21] lbpc/ffn/f1-32  0.66  0.86  -  -  0.77  -  -  -
Sultana et al. [22]  0.73  0.86  0.77  0.83  0.81  -  -  -
Ge et al. [23]  0.94  0.93  -  -  0.92  -  -  -
Mandal et al. [40] Case 1  0.61  0.65  0.74  0.87  0.65  -  -  -
Mandal et al. [40] Case 2  0.80  0.73  0.74  0.87  0.71  -  -  -
Mandal et al. [40] Case 3  0.84  0.66  0.68  0.86  0.71  -  -  -
Jafari et al. [41]  0.82  0.71  0.67  0.85  0.76  -  -  -
Jafari et al. [24]  0.90  0.72  0.70  0.91  0.79  0.79  0.80  0.61
T. Do et al. [25] Color  0.81  0.73  0.66  0.85  0.75  -  -  -
T. Do et al. [25] Texture  0.66  0.85  0.75  0.79  0.78  -  -  -
T. Do et al. [25] Col. and Text.  0.84  0.72  0.70  0.87  0.77  -  -  -
E. Nasr-Esfahani et al. [19]  0.81  0.80  0.75  0.86  0.81  -  -  -
Our  0.90  0.97  0.97  0.90  0.93  0.93  0.94  0.87
Table 5: Experimental results on Skin-lesion dataset.
Method  TPR  TNR  PPV  NPV  ACC  F_{1}^{p}  F_{1}^{n}  MCC
Texture analysis [17]  0.87  0.71  0.76  -  0.75  -  -  -
HLIFs [16]  0.96  0.73  -  -  0.83  -  -  -
BIBS [15]  0.92  0.88  0.91  -  0.90  -  -  -
Decision Support [14]  0.84  0.79  -  -  0.81  -  -  -
Color pigment boundary [13]  0.95  0.88  0.92  -  0.82  -  -  -
R. Amelard et al. [42] Asymmetry F_{C}  0.73  0.64  -  -  0.69  -  -  -
R. Amelard et al. [42] Proposed HLIFs  0.79  0.68  -  -  0.75  -  -  -
R. Amelard et al. [42] Cavalcanti feature set  0.84  0.78  -  -  0.82  -  -  -
R. Amelard et al. [42] Modified F_{C}  0.86  0.75  -  -  0.72  -  -  -
R. Amelard et al. [42] Combined F_{MC} F_{A}^{HLIFs}  0.91  0.80  -  -  0.86  -  -  -
Our  0.84  0.92  0.91  0.85  0.88  0.87  0.88  0.76

5 Conclusions and Future Works

The challenge of discriminating melanoma from nevi has proven to be very interesting in recent years. The complexity of the task is linked to different factors, such as the large number of types of melanoma or the difficulties of the digital acquisition phase (noise, lighting, angle, distance and much more). Machine learning classifiers suffer greatly from these factors, which inevitably reflect on the quality of the results. In this context, convolutional neural networks provide great support for both the classification and the feature extraction phases. We have proposed a framework that combines standard classifiers with features extracted by convolutional neural networks using a transfer learning approach. The results produced certainly support the theoretical thesis: a multiple representation of the image, compared to a single one, is a strong discrimination factor even if the adopted features are completely abstract. The extensive experimental phase has shown that the proposed approach is competitive with, and in some cases surpasses, state-of-the-art methods. Certainly, the main weak point concerns the computational complexity of the feature extraction phase which, as is known, takes a long time, especially when the data to be processed grows. Future work will concern the study and analysis of additional convolutional neural networks still unexplored for this type of problem or, alternatively, the application of the proposed framework to tasks different from melanoma detection.

Acknowledgements

Our thoughts go to Alfredo Petrosino. He guided us during our first steps in Computer Science, through a whirlwind of goals, ideas and, especially, love and passion for this work. We will be forever grateful to a great master.

References

  • [1] N. Codella, J. Cai, M. Abedini, R. Garnavi, A. Halpern, J. R. Smith, Deep learning, sparse coding, and svm for melanoma recognition in dermoscopy images, in: International workshop on machine learning in medical imaging, Springer, 2015, pp. 118–126.
  • [2] N. K. Mishra, M. E. Celebi, An overview of melanoma detection in dermoscopy images using image processing and machine learning, arXiv preprint arXiv:1601.07843 (2016).
  • [3] M. Binder, M. Schwarz, A. Winkler, A. Steiner, A. Kaider, K. Wolff, H. Pehamberger, Epiluminescence microscopy: a useful tool for the diagnosis of pigmented skin lesions for formally trained dermatologists, Archives of dermatology 131 (3) (1995) 286–291.
  • [4] C. Barata, M. E. Celebi, J. S. Marques, A survey of feature extraction in dermoscopy image analysis of skin cancer, IEEE journal of biomedical and health informatics 23 (3) (2018) 1096–1109.
  • [5] M. E. Celebi, H. A. Kingravi, B. Uddin, H. Iyatomi, Y. A. Aslandogan, W. V. Stoecker, R. H. Moss, A methodological approach to the classification of dermoscopy images, Computerized Medical imaging and graphics 31 (6) (2007) 362–373.
  • [6] T. Tommasi, E. La Torre, B. Caputo, Melanoma recognition using representative and discriminative kernel classifiers, in: International Workshop on Computer Vision Approaches to Medical Image Analysis, Springer, 2006, pp. 1–12.
  • [7] S. Pathan, K. G. Prabhu, P. Siddalingaswamy, A methodological approach to classify typical and atypical pigment network patterns for melanoma diagnosis, Biomedical Signal Processing and Control 44 (2018) 25–37.
  • [8] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [9] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241.
  • [10] L. Yu, H. Chen, Q. Dou, J. Qin, P.-A. Heng, Automated melanoma recognition in dermoscopy images via very deep residual networks, IEEE transactions on medical imaging 36 (4) (2016) 994–1004.
  • [11] C.-K. Shie, C.-H. Chuang, C.-N. Chou, M.-H. Wu, E. Y. Chang, Transfer representation learning for medical image analysis, in: 2015 37th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2015, pp. 711–714.
  • [12] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, R. M. Summers, Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning, IEEE transactions on medical imaging 35 (5) (2016) 1285–1298.
  • [13] D. S. I. Jayant Sachdev, Shashank Shekhar, Skin lesion images classification using new color pigmented boundary descriptors, 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA 2017) (2017).
  • [14] R. Amelard, J. Glaister, A. Wong, D. A. Clausi, Melanoma decision support using lighting-corrected intuitive feature models, in: Computer Vision Techniques for the Diagnosis of Skin Cancer, Springer, 2013, pp. pp 193–219.
  • [15] S. A. Mahdiraji, Y. Baleghi, S. M. Sakhaei, Bibs, a new descriptor for melanoma/non-melanoma discrimination, in: Electrical Engineering (ICEE), Iranian Conference on, 2018, pp. 1397–1402.
  • [16] R. Amelard, J. Glaister, A. Wong, D. A. Clausi, High-level intuitive features (hlifs) for intuitive skin lesion description, IEEE Transactions on Biomedical Engineering 62 (3) (2015) 820–831.
  • [17] E. Karabulut, T. Ibrikci, Texture analysis of melanoma images for computer-aided diagnosis, in: Int. Conference on Intelligent Computing, Computer Science & Information Systems (ICCSIS 16), Vol. 2, 2016, pp. 26–29.
  • [18] I. Giotis, N. Molders, S. Land, M. Biehl, M. Jonkman, N. Petkov, Med-node: A computer-assisted melanoma diagnosis system using non-dermoscopic images, Expert Systems with Applications 42 (05 2015). doi:10.1016/j.eswa.2015.04.034.
  • [19] E. Nasr-Esfahani, S. Samavi, N. Karimi, S. M. R. Soroushmehr, M. H. Jafari, K. Ward, K. Najarian, Melanoma detection by analysis of clinical images using convolutional neural network, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2016, pp. 1373–1376.
  • [20] B. A. Albert, Deep learning from limited training data: Novel segmentation and ensemble algorithms applied to automatic melanoma diagnosis, IEEE Access 8 (2020) 31254–31269.
  • [21] P. M. Pereira, R. Fonseca-Pinto, R. P. Paiva, P. A. Assuncao, L. M. Tavora, L. A. Thomaz, S. M. Faria, Skin lesion classification enhancement using border-line features – the melanoma vs nevus problem, Biomedical Signal Processing and Control 57 (2020) 101765. doi:https://doi.org/10.1016/j.bspc.2019.101765.
    URL http://www.sciencedirect.com/science/article/pii/S1746809419303465
  • [22] N. N. Sultana, N. B. Puhan, B. Mandal, Deeppca based objective function for melanoma detection, in: 2018 International Conference on Information Technology (ICIT), IEEE, 2018, pp. 68–72.
  • [23] Y. Ge, B. Li, Y. Zhao, E. Guan, W. Yan, Melanoma segmentation and classification in clinical images using deep learning, in: Proceedings of the 2018 10th International Conference on Machine Learning and Computing, 2018, pp. 252–256.
  • [24] M. H. Jafari, S. Samavi, N. Karimi, S. M. R. Soroushmehr, K. Ward, K. Najarian, Automatic detection of melanoma using broad extraction of features from digital images, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2016, pp. 1357–1360.
  • [25] T. Do, T. Hoang, V. Pomponiu, Y. Zhou, Z. Chen, N. Cheung, D. Koh, A. Tan, S. Tan, Accessible melanoma detection using smartphones and mobile image analysis, IEEE Transactions on Multimedia 20 (10) (2018) 2849–2864.
  • [26] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (1995) 273–297.
  • [27] T. Kobayashi, K. Watanabe, N. Otsu, Logistic label propagation, Pattern Recognition Letters 33 (5) (2012) 580–588.
  • [28] B. V. Dasarathy, Nearest neighbor (nn) norms: Nn pattern classification techniques, IEEE Computer Society Tutorial (1991).
  • [29] A. Likas, N. Vlassis, J. J. Verbeek, The global k-means clustering algorithm, Pattern recognition 36 (2) (2003) 451–461.
  • [30] C. Shorten, T. M. Khoshgoftaar, A survey on image data augmentation for deep learning, Journal of Big Data 6 (1) (2019) 60.
  • [31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
  • [32] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural computation 1 (4) (1989) 541–551.
  • [33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, 2016, pp. 770–778.
  • [34] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, Ieee, 2009, pp. 248–255.
  • [35] I. Giotis, N. Molders, S. Land, M. Biehl, M. F. Jonkman, N. Petkov, Med-node: a computer-assisted melanoma diagnosis system using non-dermoscopic images, Expert systems with applications 42 (19) (2015) 6578–6585.
  • [36] R. Amelard, J. Glaister, A. Wong, D. A. Clausi, High-level intuitive features (hlifs) for intuitive skin lesion description, IEEE Transactions on Biomedical Engineering 62 (3) (2014) 820–831.
  • [37] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge, International journal of computer vision 115 (3) (2015) 211–252.
  • [38] C. Munteanu, S. Cooclea, Spotmole melanoma control system, ? (2009).
    URL https://www.spotmole.com
  • [39] E. Zagrouba, W. Barhoumi, A preliminary approach for the automated recognition of malignant melanoma, Image Analysis and Stereology 23 (2004) 121–135. doi:10.5566/ias.v23.p121-135.
  • [40] B. Mandal, N. Sultana, N. Puhan, Deep residual network with regularized fisher framework for detection of melanoma, IET Computer Vision 12 (07 2018). doi:10.1049/iet-cvi.2018.5238.
  • [41] M. H. Jafari, S. Samavi, S. M. R. Soroushmehr, H. Mohaghegh, N. Karimi, K. Najarian, Set of descriptors for skin cancer diagnosis using non-dermoscopic color images, in: 2016 IEEE international conference on image processing (ICIP), IEEE, 2016, pp. 2638–2642.
  • [42] R. Amelard, A. Wong, D. A. Clausi, Extracting high-level intuitive features (hlif) for classifying skin lesions using standard camera images, in: 2012 Ninth Conference on Computer and Robot Vision, IEEE, 2012, pp. 396–403.