Skin Cancer: Convolutional Neural Network Based Skin Lesion Analysis towards Melanoma

Abstract: Melanoma is one of the deadliest human skin diseases, and its incidence is increasing continuously. Computers are not more intelligent than humans, but they can easily extract information, such as skin color variation or texture variation, that may not be readily apparent to the human eye. As expert knowledge is a scarce resource, automated systems capable of detecting the disease could save lives, reduce unnecessary biopsies, and reduce costs. Our research addresses the computational side of melanoma detection from images of diseased skin, a task made difficult because the large amounts of medical image data required for deep learning are hard to acquire, and even when many medical images exist, they are often not properly labeled. In our work we used the International Skin Imaging Collaboration 2018 (ISIC 2018) dataset, from a challenge focused on the automatic analysis of skin lesions. In this paper, we propose three classification methods and a system that combines current developments in deep learning with well-known machine learning approaches. The goal of our research is to use artificial intelligence to build an accurate, automated diagnosis method capable of segmenting skin lesions and examining the detected region and nearby tissue for melanoma detection.

Keywords: Skin Lesion, Melanoma, CNN, VGG-16, Random Forest.

I. Introduction

Skin is the largest organ of the human body, covering the muscles and bones of the whole body. Because skin is exposed to the outer environment, disease and infection occur in it more often than elsewhere, so proper attention to skin disease is essential. Nowadays many people suffer from skin diseases; they are among the most common diseases in humans, and their frequency is increasing noticeably. Melanoma is the deadliest form of skin cancer: though it accounts for only 4% of all skin cancers, it is responsible for 75% of all skin cancer deaths [1]. If the symptoms are identified early, the disease can be treated and cured; but if it is identified too late, it can grow deeper into the skin, spread to other parts of the body and become dangerous, as it is then difficult to treat. One out of five Americans will develop skin cancer by the age of 70. There are different types of skin diseases; our dataset covers seven of them: Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines, seborrheic keratoses and lichen-planus-like keratoses; bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage; vasc). Skin cancers, however, fall into two general categories: melanoma and non-melanoma. The most common skin cancers, basal cell carcinoma (BCC) and squamous cell carcinoma (SCC), are non-melanomas and only occasionally life-threatening. Among the many applications of image processing, diagnosis of diseases by detecting particular features in medical images is now very significant, and computerized diagnostics help to improve both diagnostic accuracy and speed.
For these reasons, developing algorithms for diagnosis from medical images has become a major area of research in medical science. The computerized image processing methods in use can be grouped into low-level visual feature representations, segmentation algorithms, and classical machine learning techniques such as k-nearest neighbors (kNN) and support vector machines (SVM) [3], as well as convolutional neural networks (CNNs) [1]. A computer is not more intelligent than a human, but it may be able to extract information, such as color variation, asymmetry or texture features, that is not readily apparent to human eyes. Nevertheless, automatic recognition of melanoma from dermoscopy images remains a difficult task with several challenges.

This research applies ideas from machine learning and computer vision to the problem of detection and classification of melanoma skin cancer. The main objective of this work is to classify cancer from images by detecting the characteristics present in the affected lesion. Various machine learning techniques, such as artificial neural networks and their variants, unsupervised methods like k-means clustering, and support vector machines, can give promising results in the detection and classification task; for example, k-means clustering groups the infected and non-infected regions into different clusters, so the skin cancer can be successfully categorized. Among these techniques, deep neural networks have been found to be very efficient, and due to their non-linear behavior they can be applied effectively to images. A convolutional neural network (CNN) consists of a stack of convolutional modules, and every module usually contains three kinds of layers: a convolutional layer, a pooling layer and a fully connected layer. The key steps in image processing-based diagnosis of skin diseases are: acquisition of the skin lesion image, segmentation of the lesion from the surrounding skin region, extraction of features of the lesion spot, and feature classification. Here, a method for detecting melanoma using convolutional neural network-based skin lesion analysis is proposed. Image processing-based skin lesion analysis provides an effective way to classify cancer and can thus guide the clinical process of diagnosis.

II. Related Work

Detecting different types of skin diseases from lesion images is quite challenging. For segmentation of the skin lesion in an image, existing methods use manual, semi-automatic or fully automatic edge detection techniques. A great deal of research has recently been done in this field, yet it remains hard to obtain better results.

Nurul Huda Firdaus Mohd Azmi et al. [2] presented an analysis of the segmentation method called the ABCD rule (Asymmetry, Border irregularity, Color variegation, Diameter) in image segmentation. The authors showed that the rule effectively classifies images with a high total dermatoscopy score (TDS). The research was carried out on malignant tumor and benign skin lesion images.

A technical survey by O. Abuzaghleh et al. [5] proposed two major models. The first model is a real-time alert to help users check skin injury caused by sunlight. The second model is an image analysis module, which takes an image as input and performs hair detection and removal, lesion segmentation, feature extraction and classification. The proposed method used the PH2 dermoscopy image database from Pedro Hispano Hospital in Portugal, which has 200 dermoscopy images of lesions covering benign, atypical and melanoma cases. The experimental results showed that the proposed method is effective, classifying the benign, atypical and melanoma images with accuracies of 96.3%, 95.7% and 97.5%, respectively.

Teck Yan Tan et al. [11] employed pre-processing such as the DullRazor algorithm and median filters to remove hair and other noise. Segmentation is done using a pixel-limitation technique to separate lesions from the image background, and a Support Vector Machine (SVM) classifier performs benign and malignant lesion recognition. The method was evaluated on the Dermofit dermoscopy image database with 1300 images and achieved average accuracies of 92% and 84% for benign and malignant skin lesion classification. A Genetic Algorithm (GA) is also applied to identify the most discriminative feature subsets to improve classification accuracy.

M. H. Jafari et al. [12] proposed a method for skin lesion segmentation in medical images using deep learning. The input image is pre-processed and segmented to expose the lesion region, and a deep convolutional neural network (CNN) is used for skin lesion classification; results are reported on the DermQuest database. Yading Yuan et al. [16] proposed a 19-layer deep convolutional neural network designed with a unique loss function. The method achieved promising segmentation accuracy and was evaluated on two freely accessible databases, the ISBI 2016 challenge dataset and the PH2 database.

Codella et al. [1] proposed a method for the identification of melanoma in dermoscopy images by combining deep learning, sparse coding and support vector machine (SVM) learning algorithms. They argued that their method is helpful because it uses unsupervised learning, aiming to avoid lesion segmentation and complex pre-processing. Kawahara et al. [4] used a fully convolutional neural network on skin lesions in order to categorize melanomas with higher accuracy. Later, Codella et al. [1] recommended a method combining deep learning approaches that can segment skin images for analysis; that study trains a CNN model to identify melanoma-positive dermoscopy images, with the aim of recognizing melanoma with higher accuracy.

III. Background Theory

A. Convolutional Neural Networks

Convolutional neural networks (CNNs) are a type of neural network particularly suited to image analysis, and they are widely used for image classification, recognition and object detection [13]. A typical CNN architecture contains convolutional, pooling and fully connected layers. Relatively novel techniques such as batch normalization, dropout and shortcut connections [14] can additionally be used to increase classification accuracy.

B. Conv Net Architecture

VGGNet is a well-documented and widely used architecture for convolutional neural networks, popular due to its impressive performance on image data. Its best-performing configurations (with 16 and 19 weight layers) have been made publicly available. In this work, the VGG16 architecture was chosen, since it has been found easy to apply to, and to generalize well across, different types of datasets. During training, the input to our ConvNets is a fixed-size 224 × 224 RGB image. The only pre-processing we do is subtracting the mean RGB value, computed on the training set, from each pixel. The image is passed through a stack of convolutional layers with filters that have a very small 3 × 3 receptive field (the smallest size that captures the notion of left/right, up/down and center). Only max pooling is used in VGG-16; the pooling kernel size is always 2 × 2 and the stride is always 2. Fully connected layers are implemented using convolution in VGG-16, with sizes given in the format n1 × n2, where n1 is the size of the input tensor and n2 is the size of the output tensor. Dropout is a technique to improve the generalization of deep learning methods: it sets the weights connected to a certain percentage of the nodes in the network to 0 (VGG-16 sets the percentage to 0.5 in its two dropout layers) [15]. All hidden layers use the ReLU (Rectified Linear Unit) activation function (a nonlinearity), and spatial down-sampling is performed by the max-pooling layers. The network finishes with a classifier block consisting of three Fully Connected (FC) layers.
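As a rough illustration (not the training code used in this work), the interplay of the two layer types described above can be traced numerically: 3 × 3 convolutions with stride 1 and padding 1 preserve the spatial size, while each 2 × 2 max-pooling layer with stride 2 halves it, taking the 224 × 224 input down to a 7 × 7 map before the fully connected classifier.

```python
# Sketch: trace the feature-map side length through VGG-16's conv/pool stack.
# Block layout (conv counts and channel widths) follows the standard VGG-16
# configuration; this computes shapes only, no actual network is built.

VGG16_BLOCKS = [2, 2, 3, 3, 3]          # number of 3x3 conv layers per block
VGG16_CHANNELS = [64, 128, 256, 512, 512]

def vgg16_feature_sizes(size=224):
    """Return (channels, side length) after each conv block + pooling."""
    sizes = []
    for n_convs, channels in zip(VGG16_BLOCKS, VGG16_CHANNELS):
        # each 3x3 conv, stride 1, padding 1: (size - 3 + 2) / 1 + 1 = size
        # 2x2 max pool, stride 2: halves the side length
        size = size // 2
        sizes.append((channels, size))
    return sizes

print(vgg16_feature_sizes(224))
# -> [(64, 112), (128, 56), (256, 28), (512, 14), (512, 7)]
```

The final 512-channel 7 × 7 map is what the three FC layers of the classifier block consume.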

Fig. 1. VGG-16 Architecture

C. Support Vector Machine

Support vector machines (SVMs, also called support vector networks) are supervised learning models that analyze data for classification and regression. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. In addition to linear classification, SVMs can perform non-linear classification using the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
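A minimal sketch of this idea, using scikit-learn's `SVC` with the RBF kernel on made-up 2-D points standing in for lesion feature vectors (the data and labels here are illustrative only, not from the ISIC dataset):

```python
# Sketch: non-linear SVM classification via the kernel trick (RBF kernel).
from sklearn.svm import SVC

# Toy feature vectors: two well-separated groups standing in for
# "benign" (0) and "malignant" (1) lesion features.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
     [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]

# kernel="rbf" maps inputs implicitly into a high-dimensional space
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print(clf.predict([[0.12, 0.18], [0.88, 0.85]]))  # one point near each group
```

In practice the feature vectors would come from the segmented lesion (color, texture, asymmetry features), and C and gamma would be tuned by cross-validation.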

D. Random Forest

Random Forest is a supervised learning algorithm that builds multiple decision trees and merges them to obtain a more accurate and stable prediction. It is a flexible, easy-to-use machine learning algorithm that produces a good result most of the time even without hyper-parameter tuning, and it is among the most commonly used algorithms because of its simplicity and because it handles both classification and regression tasks. The forest it builds is an ensemble of decision trees, usually trained with the bagging method, whose general idea is that a combination of learning models improves the overall result. Random Forest has nearly the same hyperparameters as a decision tree or a bagging classifier, but it adds additional randomness to the model while growing the trees [7]: instead of searching for the most important feature when splitting a node, it searches for the best feature among a random subset of features, so only that random subset is considered at each split. This produces wide diversity among the trees, which generally yields a better model. One difference from a single decision tree is that deep decision trees may suffer from overfitting; a random forest prevents overfitting most of the time by creating random subsets of the features and building smaller trees from these subsets. Random forests are thus a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance.
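The two sources of randomness described above (bagging plus per-split feature subsets) map directly onto scikit-learn parameters, as in this illustrative sketch on the same kind of toy data (not the actual lesion features used in this work):

```python
# Sketch: Random Forest = bagged decision trees + random feature subsets.
from sklearn.ensemble import RandomForestClassifier

X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
     [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]

# n_estimators: number of trees; bootstrap=True (default) gives bagging;
# max_features=1 forces each split to consider a random single feature.
forest = RandomForestClassifier(n_estimators=100, max_features=1,
                                random_state=0)
forest.fit(X, y)

print(forest.predict([[0.9, 0.9]]))  # majority vote across the 100 trees
```

The final prediction is the majority vote of the individual trees, which is what reduces the variance relative to one deep tree.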

IV. Proposed Methodology

In this research, several methods have been investigated, ranging from classical machine learning algorithms such as SVM and the tree-based Random Forest to deep learning-based algorithms. The process of disease detection and classification is shown in Fig. 2.

A. Pre-Processing

In our work, we keep the pre-processing steps minimal to ensure better generalization when the model is tested on other dermoscopic skin lesion datasets, so we apply only some standard steps. First, we normalize the pixel values of the images. Next, the images are resized to 224 × 224 pixels.
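A minimal sketch of these two steps is below. In practice the resize would use a library routine (e.g. PIL or OpenCV with proper interpolation); nearest-neighbour index sampling is used here only to keep the example self-contained, and the input image is a dummy array.

```python
# Sketch of the two pre-processing steps: scale pixel values to [0, 1]
# and resize to 224 x 224 via nearest-neighbour sampling.
import numpy as np

def preprocess(image, target=224):
    image = image.astype(np.float32) / 255.0       # normalize to [0, 1]
    h, w = image.shape[:2]
    rows = np.arange(target) * h // target         # nearest source row per output row
    cols = np.arange(target) * w // target         # nearest source column
    return image[rows][:, cols]                    # (target, target, 3)

dummy = np.random.randint(0, 256, size=(600, 450, 3), dtype=np.uint8)
out = preprocess(dummy)
print(out.shape)  # (224, 224, 3)
```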

B. Data Augmentation

Data augmentation is a method used to avoid overfitting when training machine learning models. Its goal is to enlarge the data set so that robust convolutional network models can be trained with limited or small amounts of data, which is required to improve the performance of an image classification model. Some of the simplest augmentations are flipping, translation, rotation, scaling, color enhancement, isolating individual R, G, B color channels, and adding noise. The traditional input to a CNN architecture consists of whole images, or image patches, of a standard size in RGB format; in this work, we augment that input with the responses of a number of well-established filters that are frequently used for image feature extraction. We augment the training set by blurring the images, using Gaussian blur to reduce noise and make the image smoother. After that we process the RGB image to enhance the red channel, apply a separate layer to the image, and then perform a partition on those images. This augmentation leads to an increase in training data.
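The simplest label-preserving transforms above (flips and rotations) can be sketched with plain numpy; blurring and colour-channel manipulation would typically use OpenCV or PIL and are omitted here to keep the example self-contained.

```python
# Sketch: generate a few label-preserving variants of one training image.
import numpy as np

def augment(image):
    """Return the image plus flipped and rotated copies."""
    return [
        image,
        np.fliplr(image),       # horizontal flip
        np.flipud(image),       # vertical flip
        np.rot90(image, k=1),   # 90-degree rotation
    ]

img = np.zeros((224, 224, 3), dtype=np.uint8)
img[0, 0] = 255                 # marker pixel to track the transforms
variants = augment(img)
print(len(variants))  # 4: the data set grows fourfold
```

Each variant keeps the same class label, so a dataset of N lesion images yields 4N training examples here; adding blur, noise and colour shifts multiplies this further.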

Fig. 2. Flow Diagram of our model

Fig. 3. Data Augmentation

The results of this research have the potential to be used as a practical tool for diagnosis.

C. Image Segmentation

Image segmentation is an important area of image processing: it is the process of partitioning an image into different groups. Among the many available methods, k-means is one of the most popular. K-means clustering marks the different regions of the image with different colors and, where possible, creates boundaries separating the regions; the idea behind image segmentation with k-means is to assign a label to each pixel based on its RGB values. Color quantization is the process of reducing the number of colors in an image; it is also performed when a device can only produce a limited number of colors. Here we use k-means clustering for color quantization. There are 3 features, namely R, G and B, so we reshape the image to an array of size M × 3 (where M is the number of pixels in the image). We also set a criteria value for k-means, which defines its termination condition. We return the segmented output along with the labeled result, which contains labels 0 to N−1, where N is the number of partitions chosen; each label corresponds to one partition. After clustering, we assign the centroid values (also R, G, B triplets) to all pixels, so that the resulting image has the specified number of colors, and finally reshape it back to the shape of the original image.
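The reshape–cluster–recolor sequence can be sketched as follows (using scikit-learn's KMeans rather than the OpenCV routine, and a random dummy image, purely for illustration):

```python
# Sketch: k-means colour quantization. Reshape the image to an M x 3 array
# of RGB values, cluster into k groups, then replace every pixel by its
# cluster centroid and reshape back to the original image shape.
import numpy as np
from sklearn.cluster import KMeans

def quantize(image, k=3):
    pixels = image.reshape(-1, 3).astype(np.float64)            # M x 3
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    centroids = km.cluster_centers_.astype(np.uint8)            # k RGB triplets
    labels = km.labels_                                         # label 0..k-1 per pixel
    return centroids[labels].reshape(image.shape), labels

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
quantized, labels = quantize(img, k=3)
print(len(np.unique(quantized.reshape(-1, 3), axis=0)))  # at most 3 colours
```

The `labels` array is the segmentation (each pixel's partition), while `quantized` is the recolored image built from the centroids.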

D. Classification

For disease classification, we use a collection of machine learning models: SVM, Random Forest and convolutional neural networks. For the deep learning part, we have chosen one convolutional neural network architecture, the VGG-16 model. In our system, we propose to use segmentation, classification and a convolutional neural network together. Since we have only a small amount of data to feed into the convolutional neural network, we used data augmentation to increase the size of our training data so that the model also fits well on the validation data. This classification method proves to be efficient for most skin images.

Results and Discussion

E. Datasets and challenges

There are relatively few datasets in the field of dermatology, and even fewer datasets of skin lesion images. Moreover, most of these datasets are too small and/or not publicly available, which is an additional obstacle to reproducible research in the area. In this work, the dataset of the challenge 'ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection' (ISIC 2018) is used [6], [7]. The training set contains a total of 10015 dermatoscopic skin lesion images from seven skin diseases: melanoma (1113), melanocytic nevus (6705), basal cell carcinoma (514), actinic keratosis (327), benign keratosis (1099), dermatofibroma (115) and vascular lesions (142). The validation dataset consists of 193 images. The 10015 training images can serve as a training set for academic machine learning purposes.

F. Evaluation Metrics

The model evaluation is performed using the same training and testing partition used in the ISIC dataset. CNN architectures with lower loss and greater accuracy were considered in creating a novel CNN model. For quantitative evaluation of performance there are three commonly used metrics: Recall or Sensitivity (SENS), Classification Accuracy (ACC), and Specificity (SPEC); in our work we report accuracy. These are defined in terms of the numbers of true positives (TPs), true negatives (TNs), false negatives (FNs), and false positives (FPs) [5]. The metrics are defined as follows:

Accuracy is the number of correct predictions divided by the total number of predictions; for a binary segmentation image, it is the proportion of pixels that are correctly identified.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity, or recall, evaluates the ability to find the pixels that contain the skin lesion in the binary image.

Recall = TP / (TP + FN)

Precision is the fraction of retrieved instances that are relevant; it is also known as the positive predictive value.

Precision = TP / (TP + FP)
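The three definitions translate directly into code; the confusion counts below are hypothetical and for illustration only.

```python
# The three metrics above, computed directly from the confusion counts.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

# Hypothetical counts for a single class
tp, tn, fp, fn = 90, 80, 10, 20
print(accuracy(tp, tn, fp, fn))  # 0.85
print(recall(tp, fn))            # 0.8181...
print(precision(tp, fp))         # 0.9
```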

TABLE 1. PRECISION AND RECALL RESULTS OF OUR SVM, VGG16 AND RANDOM FOREST MODELS

Model Name   SVM Precision   SVM Recall   VGG16 Precision   VGG16 Recall   RF Precision   RF Recall
NV           1               1            0.649122          1              0.97368        1
MEL          0.973684        1            1                 0.7027         0.92857        0.702702
BKL          1               1            0.904761          1              0.98039        1
BCC          1               1            0.980392          1              0.925          1
AKIEC        1               1            0.925             1              0.97674        1
VASC         1               1            0.976744          0.9074         0.92452        0.907407
DF           1               0.96969      0.924528          0.9090         0.88235        0.909090
MEAN         0.99624         0.99567      0.90864           0.9313         0.94161        0.931314

Accuracy results for each model are shown in Table 2, with the best value achieved by VGG16.

TABLE 2. MODEL EVALUATION

Model Name   SVM           VGG16            Random Forest
Accuracy     0.903708987   0.9153 ± 0.034   0.897318007

V. Conclusion and Future Scope

The aim of our work is to develop dermoscopic image analysis tools that enable the automated diagnosis of melanoma, so that the disease can be detected easily and proper steps taken in time. Image analysis of skin lesions comprises three stages: segmentation, feature extraction and disease classification, and different methods were applied at each stage of lesion analysis. We used Support Vector Machine, VGG-16 and Random Forest, obtaining accuracies of 0.903 with the Support Vector Machine, 0.9153 ± 0.0343 with VGG-16 and 0.8973 with Random Forest. In conclusion, there should be standard procedures and freely available datasets for new researchers so that together we can fight this deadly disease. In future work we will, on the one hand, collect more training data to cover the large variation in data distribution and, on the other, investigate other techniques.

References

  1. Codella, N., Cai, J., Abedini, M., Garnavi, R., Halpern, A., & Smith, J. R. (2015, October). Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images. In International Workshop on Machine Learning in Medical Imaging (pp. 118-126). Springer, Cham.
  2. Nurulhuda Firdaus Mohd Azmi, Haslina Md Sarkan, Yazriwati Yahya, and Suriayati Chuprat. Abcd rules segmentation on malignant tumor and benign skin lesion images. In Computer and Information Sciences (ICCOINS), 2016 3rd International Conference on, pages 66-70. IEEE, 2016.
  3. Yuan, X., Yang, Z., Zouridakis, G., & Mullani, N. (2006, August). SVM-based texture classification and application to early melanoma detection. In Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual International Conference of the IEEE (pp. 4775-4778). IEEE.
  4. Kawahara, J., BenTaieb, A., & Hamarneh, G. (2016, April). Deep features to classify skin lesions. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on (pp. 1397-1400). IEEE.
  5. O. Abuzaghleh, B. D. Barkana, and M. Faezipour. Noninvasive real-time automated skin lesion analysis system for melanoma early detection and prevention. IEEE Journal of Translational Engineering in Health and Medicine, 3:1-12, 2015.
  6. https://challenge2018.isic-archive.com/task3/
  7. Codella N., Nguyen Q.B., Pankanti S., Gutman D., Helba B., Halpern A., Smith J.R. Deep learning ensembles for melanoma recognition in dermoscopy images. IBM J. Res. Dev. 2016;61 doi: 10.1147/JRD.2017.2708299.
  8. Maria João M. Vasconcelos, Luís Rosado, and Márcia Ferreira. A new risk assessment methodology for dermoscopic skin lesion images. In Medical Measurements and Applications (MeMeA), 2015 IEEE International Symposium on, pages 570-575. IEEE, 2015.
  9. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT press Cam-bridge (2016)
  10. U. Jamil, S. Khalid, and M. U. Akram. Dermoscopic feature analysis for melanoma recognition and prevention. In 2016 Sixth International Conference on Innovative Computing Technology (INTECH), pages 290-295, Aug 2016.
  11. Teck Yan Tan, Li Zhang, and Ming Jiang. An intelligent decision support system for skin cancer detection from dermoscopic images. In Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016 12th International Conference on, pages 2194-2199. IEEE, 2016.
  12. Mohammad H Jafari, Nader Karimi, Ebrahim Nasr-Esfahani, Shadrokh Samavi, S Mohamad R Soroushmehr, K Ward, and Kayvan Najarian. Skin lesion segmentation in clinical images using deep learning. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 337-342. IEEE, 2016.
  13. Krizhevsky, A., I. Sutskever, and G.E. Hinton, ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017. 60(6): p. 84-90.
  14. Srivastava, N., et al., Dropout: a simple way to prevent neural networks from overfitting. Journal of machine learning research, 2014. 15(1): p. 1929-1958.
