Abstract

Melanoma is a potentially fatal skin cancer; however, most melanomas can be treated with minor surgery if detected early. In this regard, image analysis techniques that automate skin cancer diagnosis would support dermatologists and increase their diagnostic accuracy. Enhanced melanoma detection can therefore benefit patients who show indicators of the disease. Convolutional neural networks can learn features hierarchically, but implementing a neural network requires a large volume of images to achieve high accuracy rates, so the scarcity of skin cancer images represents an additional challenge in the detection of skin lesions. The current work aims to develop an intelligent system that, based on the analysis of skin lesion images and contextual patient information, accurately determines whether a lesion represents a case of melanoma-type skin cancer. The TensorFlow library was used to execute the models in the constructed app. The MobileNet V2 model was used with a collection of 305 images retrieved from the Internet, covering melanoma, plaque psoriasis, atopic dermatitis, and Kaposi's sarcoma. The application tests were conducted on two separate devices. Performance was acceptable, above 75%, in recognizing melanoma-type lesions, plaque psoriasis, and atopic dermatitis, while Kaposi's sarcoma was recognized less reliably. Despite the low number of images used in training, the constructed mobile application performed well.

1. Introduction

Skin cancer is a disease that has been steadily spreading throughout the world's population [1], claiming hundreds of thousands of lives each year while also causing an alarming rise in treatment and prevention costs [2]. It is, in fact, one of the most common cancers and one with the most significant societal consequences. As a result, preventive medicine is constantly looking for new techniques and/or technologies that allow early detection of this disease, where operating costs are reduced without sacrificing efficiency and quality of service, both of which are critical for a health system that requires more attention and promptness in diagnosing its patients. Based on this problem and the demands of hospital centers specializing in the treatment of nonmelanoma cancer and actinic keratosis, the development of a computer program is proposed that performs analysis in the context of clinical imaging, using various mathematical algorithms integrated through image processing and analysis methods from computer vision and artificial intelligence [3]. To accomplish this, we start from the fact that the study images show skin spots or moles whose pigmentation and morphology distinguish them from ordinary skin spots by their tonality and particular structure [4]. The study is based on converting a three-channel (RGB) image into two- and one-channel images, since this transformation allows applying algorithms that help discriminate characteristic parameters in the neighborhoods of the study sample, providing relevant information on the shape and structure of the objects that compose it, with the particularity of expanding or reducing the observation scale with the corresponding number of points. In this context, the pixels that make up an image are treated as objects. In terms of image treatment (recognition, description, interpretation, and understanding), a series of phases typical of artificial vision is used, combined with deep learning (artificial intelligence) [5], where each phase implies the use of a set of algorithms that allow the registration and detection of objects that have certain irregularities within or on the border of the spot or mole, which are vital in determining whether it is cancerous or not.

Similarly, these pixels are identified, and their neighborhood, which can be of degree 4 or 8, is computed. A 4-neighborhood comprises the four pixels connected in perpendicular directions (up, down, left, and right); an 8-neighborhood also includes the four diagonal neighbors. It should be mentioned that the specialist physician performs the digital acquisition of the photographs after entering the patient's information into a clinical record; the images are then uploaded to the system's repository folder, and the program is started according to the phases presented.
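As an illustration of this connectivity rule, the following sketch (not part of the original system; names and sizes are hypothetical) enumerates the valid 4- or 8-neighbors of a pixel while respecting the image bounds:

```python
# Illustrative sketch: offsets that define the 4- and 8-neighborhoods of a
# pixel at (row, col). N4 covers the perpendicular directions; N8 adds diagonals.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # plus the four diagonals

def neighbors(row, col, height, width, connectivity=4):
    """Return the valid neighbor coordinates of a pixel inside the image bounds."""
    offsets = N4 if connectivity == 4 else N8
    return [(row + dr, col + dc)
            for dr, dc in offsets
            if 0 <= row + dr < height and 0 <= col + dc < width]

# Example: the 8-neighborhood of the corner pixel (0, 0) in a 28 x 28 image.
print(neighbors(0, 0, 28, 28, connectivity=8))  # [(1, 0), (0, 1), (1, 1)]
```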

Because the key lies in the quality or resolution of the image, which is generally stored as a bitmap, image registration is necessary when carrying out the analysis for extracting the characteristics of the objects of study. If the registration is incomplete, complementary restoration procedures become necessary, at a cost in computational resources and additional time that the specialist may not have. Moreover, for an inexperienced dermatologist, the diagnosis of melanoma can be very challenging: the accuracy of melanoma detection with dermoscopy varies between 75% and 84% [6]. This underscores the need for computer-assisted diagnostic platforms to increase the accuracy of imaging diagnosis of skin lesions.

Differentiating the type of lesion is a complex process. To diagnose a skin disease, the presence or absence of several signs is verified, such as morphology, distribution, color, scaling, and arrangement; if each of these elements is analyzed separately, the difficulty of recognizing the disease increases. Computational support systems for diagnosing these lesions have therefore become essential for dermatologists, enabling more accurate diagnoses and minimizing the subjectivity that could arise in the absence of such tools.

Many computer programs currently serve as support systems in medicine; among these is machine learning (ML) [7], a branch of artificial intelligence comprising several learning techniques [8]. Some ML-based solutions include early prediction of diabetic neuropathies based on plantar pressure, diagnostic imaging, detection of infarcts through recognition of small differences between normal and abnormal myocardium, diagnosis of Parkinson's disease using a mobile device [9], and coronary artery angiography, among many other applications. With the continuous evolution of technology, diagnostic tools have kept innovating. In this sense, the incorporation of mobile devices into clinical diagnostic activities is increasing, which brings mobile applications with it. Smartphones are being used to detect ocular symptoms, perform cardiac auscultation, let patients monitor skin lesions, and identify lesions through artificial neural networks, among other uses. Mobile applications are mostly available in virtual stores, either for patients or for use by health professionals.

Convolutional neural networks (CNNs) remain the leading ML method for medical diagnosis due to their fault tolerance and ease of integration with existing technology [10]. These networks can be configured using TensorFlow, an open-source library from Google for training and developing ML models. Its base unit is the tensor, a multidimensional data array that, for an image, encodes the width and height in pixels and each color channel.
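For instance, a color image is represented as a rank-3 tensor of height × width × channels, with a batch dimension prepended before inference; the 224 × 224 size below is only an assumed example, matching MobileNet's usual input:

```python
import tensorflow as tf

# A color image as a tensor: height x width x channels.
# Here, a hypothetical 224 x 224 RGB image (values are placeholders).
image = tf.zeros([224, 224, 3], dtype=tf.float32)
print(image.shape)   # (224, 224, 3)

# Models process batches, so a fourth (batch) dimension is prepended:
batch = tf.expand_dims(image, axis=0)
print(batch.shape)   # (1, 224, 224, 3)
```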

The documentation on computer systems that implement ML methods to recognize skin lesions is extensive; however, most of these are based on programs or pieces of software that need other calculation programs, such as MATLAB, MAPLE, or OCTAVE, for training and executing the models. The need for computers with high-speed processors and large memory capacity to execute CNN algorithms limits medical professionals to equipment that occupies large spaces in the consultation area. For this reason, the development of mobile applications for smartphones has become a trend in recent years, although the digital markets offer the dermatologist a limited number of alternatives, most of them paid applications [4, 16]. In this sense, the evidence published so far on smartphone applications that support the dermatologist in diagnosing multiple lesions is limited and scarce. The studies published in this area focus primarily on recognizing melanomas due to the characteristics of the disease [11].

The use of computer-aided diagnostic systems to detect skin cancer has increased during the last decade. In recent years, deep learning models have obtained remarkable results in different medical image analysis tasks; in particular, convolutional neural networks have become the standard approach to this type of problem [12]. A tool that, from a real-time image, helps guide an accurate diagnosis by taking advantage of the hardware and software resources of smartphones is beneficial for the clinician, as is the use of this type of support application by doctors practicing in rural populations where access to a dermatologist is difficult. In this sense, the objective of this research was to develop a prototype Android-based mobile application for the real-time recognition of skin lesions using convolutional neural networks.

2. Materials and Methods

2.1. Convolutional Neural Networks (CNN)

Artificial neural networks were used to detect the lesion to be identified. Convolutional neural networks consist of multiple layers of convolutional filters of one or more dimensions; after each layer, a function is usually added to perform a nonlinear mapping. Like any network used for classification, these networks begin with a feature extraction phase composed of convolutional neurons, followed by a reduction by sampling (downsampling); simpler perceptron neurons then perform the final classification on the extracted features. The feature extraction phase resembles the stimulation process in cells of the visual cortex and comprises alternating layers of convolutional and downsampling neurons. As the data progress through this phase, their dimensionality decreases, with neurons in deeper layers being much less sensitive to disturbances in the input data while being activated by increasingly complex features. CNNs learn to recognize a variety of objects within images, but they must first be "trained" with a significant amount of data [13].
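A minimal sketch of the architecture just described, assuming a generic Keras model rather than the one used in this work: convolution and downsampling layers extract features, and dense (perceptron) layers perform the final classification over the five classes considered here:

```python
import tensorflow as tf

# Illustrative CNN: alternating convolution (feature extraction) and
# max-pooling (downsampling) layers, then perceptron layers for classification.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # convolutional features
    tf.keras.layers.MaxPooling2D(),                     # downsampling
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),       # perceptron layer
    tf.keras.layers.Dense(5, activation="softmax"),     # one output per class
])
model.summary()
```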

2.1.1. Pixels and Samples

To begin with, the network takes the pixels of an image as input. An image of only 28 × 28 pixels in height and width is equivalent to 784 input neurons, and that is for a single channel (grayscale). A color image needs 3 RGB channels (red, green, and blue), and therefore 28 × 28 × 3 = 2,352 neurons would be used. These neurons constitute the input layer [10, 13–17].

2.1.2. Convolutions

Here begins the "distinctive processing" of convolutional networks, in other words, the "convolutions." These consist of taking "groups of nearby pixels" from the input image and operating on them mathematically (scalar product) against a small matrix called the kernel. Assuming this kernel has a size of 3 × 3 pixels, it is slid across all the input neurons (from left to right, from top to bottom), generating a new output matrix, which ultimately becomes a new layer of hidden neurons. If the image is a color image, the kernel has a size of 3 × 3 × 3, that is, a filter with 3 kernels of 3 × 3; the results of those 3 kernels are then added and make up 1 output (as if it were a single channel) [12, 13].
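The following sketch illustrates this operation with TensorFlow (random values stand in for a real image and a learned kernel): a 3 × 3 × 3 kernel slides over a color image, and the per-channel products are summed into a single output channel:

```python
import tensorflow as tf

# Sketch of a single convolution: a 3 x 3 x 3 kernel slides over a color image.
# Shapes follow tf.nn.conv2d conventions: [batch, height, width, channels].
image = tf.random.uniform([1, 28, 28, 3])   # one 28 x 28 RGB image
kernel = tf.random.uniform([3, 3, 3, 1])    # 3 x 3 kernel, 3 inputs / 1 output
output = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(output.shape)  # (1, 26, 26, 1): a new layer of hidden neurons
```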

2.2. Description of the Mobile Application

A prototype mobile application called PCIDIIT (Processor and Convolutional Identifier of Dermatological Images Implementing Transfer learning) was developed for research purposes [15, 18–23], to be used under the Android operating system and programmed in Android Studio (Google Inc.). Its purpose is to identify dermatological lesions captured by a smartphone camera using a CNN as the learning model. Execution permissions for the use of the built-in camera are preset. Its operation is based on identifying an image captured by the phone's camera without the need to photograph and store it. This recognition is executed locally, without an Internet connection, in real time.

The mobile application has two activities and a general service. One activity corresponds to the welcome screen of the application; the other is dedicated to the use of the camera, information, and configuration of the parameters for image processing; Figure 1 shows a screenshot of the application. The recognition service prepares the image, loads the CNN model, analyzes the image, makes inferences, and displays the results.

At the hardware level, PCIDIIT detects the model of the device on which it runs and the number of processing threads available for optimal operation. Running the mobile application requires a smartphone with a 1.2 GHz or faster processor, 512 MB or more of RAM, a built-in camera of more than 2 MP, and Android KitKat 4.4.2 or a later operating system. The tests of this application were done on two phones, a Samsung Galaxy S4 mini and a Nokia 7.1.

2.2.1. Image Processing

Image processing occurs in several stages before identification. The CNN-based identification model must be trained with the necessary images and classes before being incorporated into the mobile application. The captured image is interpreted as an RGB bitmap of the fixed pixel size required to execute the inference and recognition process. The recognition model transforms the received bitmap into a tensor that the TensorFlow library understands in order to execute the CNN-based algorithm and perform the identification. After the recognition process, an ordered list of probabilities corresponding to each model class is generated; these data are displayed on the screen together with the inference time.
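The following sketch illustrates this same flow using the TensorFlow Lite interpreter in Python; the app performs the equivalent steps on Android, and the model file name and input values are placeholders:

```python
import numpy as np
import tensorflow as tf

# Illustrative inference flow: load a converted model, feed a bitmap-shaped
# tensor, and read back one probability per class. "model.tflite" is a
# placeholder file name.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# The captured bitmap becomes a float tensor of the model's input shape.
bitmap = np.random.rand(*inp["shape"]).astype(np.float32)  # placeholder image
interpreter.set_tensor(inp["index"], bitmap)
interpreter.invoke()

# Display the class probabilities in descending order, as the app does.
probs = interpreter.get_tensor(out["index"])[0]
for i in np.argsort(probs)[::-1]:
    print(f"class {i}: {probs[i]:.3f}")
```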

2.2.2. Recognition Model

The model used for recognition was MobileNet V2, an efficient model for mobile applications. Floating-point mode was chosen to prioritize accuracy over recognition speed. Each network layer has at its output a normalization layer and a rectified linear unit, except for the last layer, which is connected to a normalized exponential (softmax) function that returns the inference probability. To save training and computation time, transfer learning was used to retrain the CNN model, and the integrated TensorFlow Lite converter API was used to generate a model compatible with mobile applications.
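A minimal sketch of this transfer-learning setup, assuming the standard Keras MobileNetV2 ImageNet weights as the starting point (the exact classifier head and hyperparameters of the original model are not specified in the text):

```python
import tensorflow as tf

# Transfer learning sketch: freeze the pretrained convolutional base and
# retrain only a new classifier head (softmax over the five lesion classes).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # normalized exponential
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Conversion to a TensorFlow Lite model usable by the mobile application.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```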

2.2.3. Data Set

The images were obtained using the web scraping technique on publicly accessible photographs and by consulting image banks such as http://shutterstock.com/, http://geosalud.com/, and http://istockphoto.com/. A total of 305 clinical images divided into five classes were selected for identification, as shown in Figure 2. To increase the number of images in the database, random linear transformations, such as zooms and rotations, were applied to each one. The data set was split in two: 80% of the images were used for network training and the remaining 20% for validation.
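A sketch of how such augmentation and an 80/20 split can be configured in Keras; the directory name, target size, and transformation ranges are assumptions, not the values used in this work:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random zooms and rotations expand the small data set at training time,
# while validation_split reserves 20% of the images for validation.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=30,      # random rotations (assumed range)
    zoom_range=0.2,         # random zooms (assumed range)
    validation_split=0.2,   # 80% training / 20% validation
)
train = datagen.flow_from_directory(
    "lesions/", target_size=(224, 224), subset="training")
val = datagen.flow_from_directory(
    "lesions/", target_size=(224, 224), subset="validation")
```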

3. Results

The model was trained for 25 epochs of 500 steps each. To measure training performance, the graph provided by TensorFlow was used, which tracks accuracy per epoch and the degree of overfitting of the model. Figure 3 shows the learning curve obtained; the maximum accuracy reached is around 68% on the set of images used for validation.
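Under the assumption that training used the Keras API with the model and generators sketched above, the reported configuration of 25 epochs of 500 steps would correspond to:

```python
# Training call matching the reported configuration (25 epochs x 500 steps);
# "model", "train", and "val" come from the previous sketches.
history = model.fit(
    train,
    steps_per_epoch=500,
    epochs=25,
    validation_data=val,
)
```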

To measure the model's accuracy, the confusion matrix shown in Figure 4 was built. This matrix describes the application's performance for each class or lesion; an unbalanced data set was used to build it. The results of this test reveal that 75% of melanoma and plaque psoriasis lesions were recognized from the validation image set, 67% of atopic dermatitis images were recognized, and 40% of Kaposi's sarcoma lesion images were recognized. Images classified as lesion-free or unrecognized were correctly identified in 79% of cases.
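A confusion matrix of this kind can be computed from the validation predictions as sketched below (the label arrays are placeholders, not the study's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative per-class confusion matrix: rows are true classes, columns are
# predicted classes; diagonal entries are correct recognitions.
y_true = np.array([0, 1, 2, 3, 4, 0, 1])   # ground-truth class indices
y_pred = np.array([0, 1, 2, 4, 4, 0, 3])   # model predictions
print(confusion_matrix(y_true, y_pred))
```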

The receiver operating characteristic (ROC) curve shown in Figure 5 is used to compare the validation results. The values of the area under the curve (AUC), together with the precision and sensitivity (recall) for each class, are also shown in Figure 5.

The ROC curve shows that the Kaposi's sarcoma lesion (red) has the lowest curve compared with the others, with an AUC of 0.678, representing a 67.8% probability that the model ranks a randomly chosen positive Kaposi's sarcoma case above a randomly chosen negative one. In this case, sensitivity is reduced relative to specificity, meaning that true positive cases are less readily recognized. The melanoma lesion has an AUC of 0.852, plaque psoriasis an AUC of 0.814, and atopic dermatitis an AUC of 0.79, making melanoma the lesion most recognizable by the mobile application. The precision and sensitivity values in Table 1 show that Kaposi's sarcoma is the lesion with the lowest values.

The use of unbalanced validation data sets calls for another metric to test the mobile application's performance. For this, the F1-score was used, computed from the precision and sensitivity values as F1 = 2 × (precision × sensitivity)/(precision + sensitivity); the results of this test are shown in Figure 5 for each type of lesion. The values obtained are consistent with the previous performance metrics. This demonstrates that the mobile application performs well in recognizing melanoma-type lesions, plaque psoriasis, and atopic dermatitis, but only moderately in identifying Kaposi's sarcomas.
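For reference, a minimal computation of the F1-score from precision and sensitivity; the 68.9% precision and sensitivity reported later for Kaposi's sarcoma serve as the example:

```python
# F1 = 2 * precision * recall / (precision + recall); when precision and
# recall are equal, the F1-score equals that common value.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f"{f1_score(0.689, 0.689):.3f}")  # 0.689
```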

4. Discussion

The mobile application developed in this work under Android technology turned out to be the best option in terms of operating system and software development costs, since similar applications for the diagnosis of skin lesions are mostly designed for iOS (Apple's operating system), which implies high acquisition, programming, and maintenance costs. The implemented model was based on a CNN to recognize a group of skin lesions [10–12], using transfer learning as the learning method, an approach widely adopted by ML-based recognition systems. There are lesion identification systems whose algorithms use different CNN models, including AlexNet, VGGNet, and ResNet, with performance similar to the model used in this study. Regarding model training, different investigations have used TensorFlow for mobile applications and other programs such as MATLAB for desktop solutions.

Training CNN models to identify lesions requires an image bank. In this case, the number of items per class was small, around 60 images per class, so the method of linear transformations, zooms, and rotations was used to expand the database at training time. The number of images used in other studies far exceeds that used in this work; notably, for this reason, those investigations do not use random linear transformations [11]. The recognition of images is not direct but is based on the probability of belonging to one of the classes programmed into the CNN. During the tests, it was found that the model's inference time does not exceed 500 ms, which is excellent for a diagnostic tool. This time is considered fast for identification, as is the change in the certainty percentage when the lesion is identified.

Melanoma-type lesions, plaque psoriasis, and atopic dermatitis performed well when evaluating the metrics corresponding to each lesion. Due to their clinical importance and associated severity, melanomas are skin lesions frequently studied for identification by ML methods. This work obtained an AUC of 85.2% for the melanoma lesion [12]. Other CNN-based studies with similar performance have reported moderate and consistent results, with accuracies below 80%; however, there is research evidence of accuracies exceeding 90% when recognizing melanomas.

The plaque psoriasis lesion has been included in other investigations, which obtained 72.2% and 80% accuracy, values close to the 74.2% obtained by the application developed in this work. The metrics for this disease are within the range considered good, and the predictions are interpreted as correct within a certain margin of error. Regarding the ability to correctly predict atopic dermatitis, a moderate result, below 70%, was obtained; however, the sensitivity for this class is close to 80% (79.3%), making the predictions reliable.

The PCIDIIT app offers only moderate performance for Kaposi's sarcoma-like lesions. This class was identified less frequently, as shown in the confusion matrix and the calculated metrics, with moderate precision (68.9%) and sensitivity (68.9%), which implies an increase in the number of false positives and false negatives. This performance is due to the number and quality of the images used for training.

The literature reports [10, 15] a large number of investigations based exclusively on ML methods to recognize cutaneous melanomas. These pieces of software condition the image to be identified with a series of digital filters to extract its characteristics and issue a diagnosis, such as that of Ray et al. in 2020 [7], which presented an accuracy of 75%.

5. Conclusions

The application developed to recognize skin diseases showed more than 75% accuracy and more than 77% sensitivity for three of the four chosen lesions, showing good behavior and performance despite the low number of images used for training. Being able to carry out recognition locally, in real time, on a smartphone is a great advantage for using this application as a diagnostic tool in remote areas without Internet access.

Increasing the number of images available for CNN training would improve the precision and sensitivity of this application, as would preconditioning the images to extract clinical characteristics that provide more data for the recognition of skin lesions with ML methods.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.