In this paper, a deep belief learning (DBL) network architecture is proposed for medical image classification, with the aim of improving the diagnosis of dermal melanoma as an alternative to traditional dermoscopy. Preprocessing was carried out with a linear Gaussian filter to eliminate high-frequency artifacts and distortion. The k-means segmentation technique was used to extract the region of interest. The DBL network was then applied to the segmented image for classification. The DBL architecture distributes the weights and hyperparameters across all positions in an image, making it possible to scale to various image sizes. The effects of overfitting on small datasets were mitigated by optimizing the proposed network, and the algorithm works effectively once its constraints are fine-tuned. Compared with AlexNet and LeNet, the proposed model showed an increase in accuracy ranging from 8% to 47% for segmented images and from 2% to 48% for nonsegmented images. An average reduction in error of 47.8% and 41.5% was recorded for both segmented and nonsegmented dermal images. The execution time also decreased by an average of 8-13% in comparison with the other architectures, since the weights were distributed only on the clustered regions of the segmented image rather than on the whole image, allowing the network to classify it faster and with improved accuracy.

1. Introduction

Dermal melanoma is a type of melanoma that is usually curable when detected and treated early. Once melanoma has spread deeper into the skin or to other parts of the body, it becomes difficult to treat and can be deadly. Many people worldwide die of skin cancer because dermal lesions are not diagnosed at an early stage. In 2020, there were between 2 and 3 million cases of skin cancer according to the World Health Organization (WHO) [1]. Early detection and accurate screening diagnostics both contribute to the prevention of skin cancer. Skin lesions differ in appearance from normal skin for various reasons, and the diagnosis can be exhausting if done manually [2]. A fast, accurate, and automated diagnostic process is needed to resolve this issue. Dermal lesion segmentation and classification is an active research area in which accuracy must be improved using a robust algorithm. Therefore, much remains to be done to improve the accuracy of automated dermal diagnosis.

Most CAD-based systems for both segmentation and classification use handpicked features to differentiate between normal skin and skin lesions [3]. These strategies rely on an established machine learning pipeline in which feature vectors are extracted and then passed to a classifier without regard to the nature of the extracted features. However, such features are not able to identify melanoma because of differences in morphology, which leads to underperforming diagnosis by CAD systems [4]. To optimize the performance of these systems, melanoma is classified using a well-defined segmented area called the region of interest (ROI). This improves the classification capability, as the ROI provides a better representation of melanoma attributes. Melanoma skin lesions can be classified exactly if the feature extraction steps cover only the affected area, as a skin specialist (dermatologist) would. When feature extraction mixes nonaffected skin with the affected lesion, it produces weak features because the ROI is not targeted, and the classification results may be inaccurate. This is why segmentation is considered the first step before classification in order to improve the performance of CAD systems. Another approach to designing task-adapted feature representations is to learn a hierarchy of increasingly complex features directly from in-domain data [5].

Deep learning is an upgraded form of artificial neural networks since it comprises more layers that allow higher levels of abstraction and improved data predictions. Among the deep learning technologies, convolutional neural networks (CNNs) have proven to be useful tools capable of handling a wide variety of computer vision tasks [6]. The success of the deep belief convolution network lies in its ability to autonomously learn midlevel and high-level abstractions from the analyzed image. This makes it a very effective tool for recognizing and localizing tumors in natural images. A common CNN structure comprises multiple layers of convolution filters interleaved with data reduction or pooling layers, and it typically provides one or more probabilities or class labels as its output. In addition, the convolution filters are learned from a training dataset, which allows them to process the image automatically without the need for tediously handcrafting application-specific features [7].

Many researchers have focused on segmentation and have not covered classification, which is the next step [8]. Most researchers used standard networks for feature extraction and classification, e.g., AlexNet and U-Net [9]. Some do not have access to the published datasets, and some have worked on general imaging tasks. Most of these networks cannot give good results for dermoscopic images since dermal images are often of low quality and resolution, as most are taken with mobile cameras and similar devices. The adjacent and bordering pixels of an image have similar intensities, which places a needless load on the algorithm, while the low quality of the image decreases the capability of the network to learn global features from small-sized patches [10].

Some researchers have focused on segmentation using window bounding boxes but do not achieve a precise boundary around the region of interest. Currently, there is a lot of focus in digital imaging on learning a maximum number of features in a bid to improve the detection accuracy of a convolutional neural network (CNN) architecture [11, 12]. Most of these studies agree that a CNN architecture is capable of extracting notable image features quite efficiently. However, the most noteworthy challenge faced by CNNs is the lack of labeled training data. CNNs need substantial amounts of clearly labeled data for training, and many studies have been conducted on limited datasets and therefore did not achieve the desired results [13]. It is very difficult to assemble such data because of the expense of interpretation by derma specialists and the large number of variants of the same skin disease [14]. The CNN of a deep belief network needs extensive tuning of the network architecture because of overfitting and convergence issues. This tuning is applied to the parameters of the network so that a comparable learning speed is achieved across all layers. Table 1 summarizes the methodologies and shortcomings of previous works.

This study proposes a dermoscopic image classification method using a deep belief learning convolution neural network architecture. We intend to identify and classify dermal lesions using k-means clustering segmentation in a way that not only achieves the highest accuracy but also reduces the execution time for all types of medical images in comparison with other existing architectures. Our contributions are as follows:
(1) The proposed architecture is a convolution neural network for medical images that uses a deep belief network for small datasets. This architecture is easy to train on low-resolution images
(2) It removes overfitting and convergence problems for small datasets
(3) The algorithm is made more scalable by using probabilistic pooling and weight sharing
(4) The speed and accuracy of the inference needed to create a percept have also been improved. Since neurons only communicate their stochastic binary state, the communication has also been made simple
(5) A comparison of the results between the proposed network architecture and prior models is presented for segmented and nonsegmented images, showing that the former provides better features and improved accuracy in dermal image classification

This paper is organized as follows. Section 1 gives an introduction and discusses previous related work. Section 2 highlights the major mathematical methods. Section 3 shows the simulation setup. Section 4 has results and discussion in which the performance of the proposed algorithm is evaluated and the results are shown. Lastly, Section 5 presents the conclusion.

2. Methodology

2.1. Preprocessing

The first step in deep learning is to clean the available dataset and present it in a usable form, with improved image intensities, complete information, and better resolution. This is done through preprocessing techniques. A good-quality image is needed before segmentation in order to obtain accurate and precise results. In the presented approach, the input image is passed through a low-pass Gaussian filter to remove high-frequency artifacts, thus improving the image quality through noise elimination, contrast enhancement, intensity equalization, and outlier removal. The Gaussian distribution in the 1D and 2D cases is

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^{2}}{2\sigma^{2}}}, \qquad G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}},$$

where $\sigma$ is the standard deviation of the distribution. The isotropic Gaussian in the 2D case is circularly symmetric [20].
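As a rough illustration of this preprocessing step, the sketch below builds a normalized 2D isotropic Gaussian kernel and applies it to a small grayscale image stored as nested lists. The function names, the kernel size, and the edge-replication border handling are assumptions made for the example; the paper's MATLAB preprocessing is not reproduced here.

```python
import math

def gaussian_kernel_2d(size, sigma):
    """Return a normalized size x size isotropic Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalize so the weights sum to 1 and the mean brightness is preserved.
    return [[v / total for v in row] for row in kernel]

def smooth(image, kernel):
    """Low-pass filter a 2D list of intensities, replicating edge pixels at the border."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), h - 1)  # clamp row index to the image
                    cc = min(max(c + dx, 0), w - 1)  # clamp column index to the image
                    acc += kernel[dy + half][dx + half] * image[rr][cc]
            out[r][c] = acc
    return out
```

A larger `sigma` spreads the weights over more neighbors and therefore removes more high-frequency detail, at the cost of blurring lesion borders.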

2.2. Segmentation

The k-means clustering method is broadly used to separate data into two or more clusters. It uses a process known as clustering to gather all the data points that have similar feature vectors in one cluster while grouping all the other data points that have dissimilar feature vectors in different clusters. k-means has a vital role in determining the difference between signal intensities of pixels in different clusters, providing a change estimation of intensities after evaluating parenchymal regions and finding connections between the morphovariations and the variations of strict parameters [21].

The k-means procedure is as follows:

Input: the data points $x_1, x_2, \ldots, x_n$ and the required number of clusters $k$.
(i) Pick $k$ points from the dataset as the initial centroids, either randomly or as the first $k$ points
(ii) Find the Euclidean distance of each point in the dataset from the identified $k$ points (cluster centroids)
(iii) Assign each data point to the closest centroid using the distance found in the previous step
(iv) Find the new centroid by taking the average of the points in each cluster group
(v) Repeat (ii) to (iv) for a fixed number of iterations or until the centroids do not change

The Euclidean distance between two points $p$ and $q$, where $p = (p_1, \ldots, p_m)$ and $q = (q_1, \ldots, q_m)$, is

$$d(p, q) = \sqrt{\sum_{j=1}^{m} (p_j - q_j)^{2}}.$$

Assign each point to the nearest cluster: if each cluster centroid is denoted by $c_i$ and $C$ is the set of all centroids, then each data point $x$ is assigned to the cluster

$$\arg\min_{c_i \in C} \operatorname{dist}(c_i, x)^{2},$$

where $\operatorname{dist}(\cdot)$ is the Euclidean distance and $S_i$ is the set of all points assigned to the $i$th cluster.

The new centroid is found from the clustered group of points as

$$c_i = \frac{1}{|S_i|} \sum_{x \in S_i} x,$$

where $S_i$ is the set of all points assigned to the $i$th cluster.
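The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the MATLAB implementation used in the study: the function name `kmeans`, the choice of the first k points as seeds, and the handling of empty clusters are assumptions made for the example. Pixel intensities are passed as one-element tuples so that the same code also works for multichannel feature vectors.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster equal-length tuples; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Steps (ii)-(iii): assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Step (iv): recompute each centroid as the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        # Step (v): stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

For a lesion image, running this with k = 2 on the pixel intensities would separate darker lesion pixels from lighter skin pixels, and the label list can be reshaped into a binary mask.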

Derma images are distinguished on the basis of signal intensity received from different tissues in the image. These signal intensities play a vital role in identifying and sorting the pixels into different clusters and giving a change estimation of intensities after studying dermal tissues and finding connections between the dermoscopic lesion structural variations and the variations of strict parameters [22].

2.3. Proposed Deep Belief Learning Network

Convolutional networks are a class of deep artificial neural networks most often used in image processing and object recognition. They are also known as space-invariant or shift-invariant artificial neural networks because of the shared-weight architecture of the convolution filters or kernels, which slide along the input features and produce feature maps with translation-equivariant responses. The advantage of the convolutional network architecture is that the output of one filter is passed to the subsequent layer, thereby extracting valuable features of the image. The result of applying the filters to an input image is captured by the feature maps; i.e., at each layer, the feature map is the output of that layer. In neural networks, a feature map comprises a layer of hidden neurons where each coordinate represents an individual neuron. Deep learning algorithms use a network of parameters organized in layers. Except for the input and output layers, the layers are known as hidden layers. The kernel size determines the receptive field of a neuron, and the kernel values are the weights of the connections between that neuron and the neurons in the previous layer. Each kernel becomes tuned to a different orientation, spatial scale, and frequency in accordance with the training data statistics, and the learned kernels are comparable to edge detectors [23].

Gradient descent is used to train deep networks in a bid to reduce a predefined cost function of the output layer and is shown as the negative log-likelihood function. Among a plentitude of deep network models, the type known as deep belief network (DBN) architecture is the one in which each layer is initialized as a restricted Boltzmann machine (RBM) hence minimizing the RBM input energy function [24]. Since both kinds, i.e., DBNs and RBMs, do not work with the 2D structure of images to extract a given feature, the weights that are required need to be learned one by one for each pixel. This restriction results in exhaustive scaling of these network models to full camera images because of increased computational complexity. To overcome this limitation, the deep belief learning (DBL network) was employed for derma images. In this network, convolution is used to distribute the hyperparameters among all locations in an image thus allowing inferences to be done efficiently. Hence, it is due to the intrinsic nature of this network that entire images can be scaled by this model. The DBL network utilizes convolution in a deep belief restricted Boltzmann machine (DBRBM), akin to restricted Boltzmann machine (RBM) in most ways, but the weights in the DBL networks’ visible and hidden layers are distributed between all pixels of an image [25].

The proposed multilayer perceptron (MLP) architecture of DBLCNN as described in Table 2 as shown in Figure 1 is quite different from the high-performance CNN models. Instead of applying correspondingly large receptive fields in the first convolution layer, e.g., with stride 4 or with stride 2, the input image size is with 96 kernels of size with a stride of 4 pixels which are convolved with the input at every four pixels. Our configuration is different from previous architectures in such a way that first, we have incorporated five nonlinear rectification layers instead of one, which makes the decision function more discriminative. Secondly, it controls the number of parameters by using comparatively small filter sizes and stride sizes; i.e., the 1st convolution layer has , the second convolution layer has , the third convolution layer has , and the last two convolution layers have the same size with channels. The first convolution layer stack is parametrized by weights while the second convolution layers require parameters; similarly, third layer parameters are , and the last two convolution layers have parameters.

Small-size convolution filters have been previously used, but the nets of the proposed model are significantly deeper than theirs, and they did not evaluate images either.

The proposed architecture is similar in that it is based on a very deep DBL network with small convolution filters and a weight matrix connected through probabilistic max-pooling. The network topology is more straightforward than that of other models, which allows the net to be expanded into a deeper network in a simpler way.

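Let the binary hidden units and the visible units of a restricted Boltzmann machine be $h$ and $v$, respectively. The probability is determined by the joint configuration of the visible and hidden units. Every possible joint configuration of the visible and hidden units has an energy $E(v, h)$, which defines its probability as given in the equation below:

$$P(v, h) = \frac{1}{Z}\, e^{-E(v, h)},$$

where $Z$ is the partition function.

Hence, the energy function of the DBL network is

$$E(v, h) = -\sum_{i}\sum_{j} v_i W_{ij} h_j - \sum_{i} a_i v_i - \sum_{j} b_j h_j,$$

where the hidden and visible units are $h_j$ and $v_i$, respectively, $W_{ij}$ is the weight connecting them, and $a_i$ and $b_j$ are the visible and hidden unit biases, respectively.

From the energy function, Gibbs sampling uses the following conditional distributions over the visible and hidden layers:

$$P(h_j = 1 \mid v) = \sigma\Big(\sum_{i} W_{ij} v_i + b_j\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(\sum_{j} W_{ij} h_j + a_i\Big),$$

where $\sigma$ is the rectified unit function.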
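To make the sampling step concrete, the following sketch performs one round of block Gibbs sampling for a small fully connected binary RBM. It assumes the standard RBM conditionals, in which each unit is switched on with a logistic-sigmoid probability of its total input; the names `W`, `a`, and `b` (weights, visible biases, hidden biases) and all function names are chosen for this example only, and the convolutional weight sharing and probabilistic max-pooling of the proposed DBL network are not modeled.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b, rng):
    """Sample binary hidden units given visible vector v; W[i][j] links v_i to h_j."""
    probs = [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]
    return [1 if rng.random() < p else 0 for p in probs], probs

def sample_visible(h, W, a, rng):
    """Sample binary visible units given hidden vector h."""
    probs = [
        sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(a))
    ]
    return [1 if rng.random() < p else 0 for p in probs], probs

def gibbs_step(v, W, a, b, rng):
    """One full Gibbs step: v -> h -> reconstructed v."""
    h, _ = sample_hidden(v, W, b, rng)
    v_new, _ = sample_visible(h, W, a, rng)
    return v_new, h
```

Because units within a layer are conditionally independent given the other layer, each layer can be sampled in a single pass, which is what makes this alternating scheme efficient.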

2.3.1. Rectified Unit (Activation Function)

Rectified units are an important part of the design of a convolutional neural network. The activation function for the output layer depends on the type of classification problem. In the proposed work, the logistic sigmoid activation function (used in Gibbs sampling) is the basis for the learning algorithm. The output of a neuron for its input $x$ is $f(x) = \frac{1}{1 + e^{-x}}$.

The sigmoid function maps $\mathbb{R} \to (0, 1)$. Saturation is prevented if the required targets are L2-normalized because at $x = 0$, linearity is at its maximum. These saturating nonlinearities are much slower than nonsaturating nonlinearities in terms of training time with gradient descent [26].
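The saturation effect can be checked numerically. The short sketch below, with function names chosen for illustration, evaluates the logistic sigmoid and its derivative at a few inputs: the slope is largest at zero and shrinks rapidly as the input grows, which is why gradient descent slows down once units saturate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid, f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Print the activation and its slope for increasingly large inputs.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  f(x)={sigmoid(x):.5f}  f'(x)={sigmoid_grad(x):.5f}")
```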

2.3.2. Learning Rate and Epoch

The learning rate controls how quickly the model is adapted to the problem. Smaller learning rates require more training epochs given the smaller changes made to the weights in each update, whereas larger learning rates result in rapid changes and require fewer training epochs.

The learning rate is controlled by

where “initial rate” is , “decrease constant” is , and is the “epoch/stage”; depending on these parameters, the learning rate can be increased or decreased. The learning rate for each parameter determines the error of the classifier network [27].

2.3.3. Batch Size

The batch size is the number of training samples considered when the optimization solver is updated once. Before choosing a certain batch size, several factors are kept in consideration, i.e., the computational cost and the uncertainty of update from a small batch to a large batch.

A smaller batch size produces a noisier gradient estimate than a larger batch size. When the error function has a large number of minima, a model trained with low-noise updates may get stuck in the first minimum it encounters. Hence, a batch size needs to be chosen that introduces enough noise into the model's estimate of the gradient; this noise can then push the model out of shallow valleys in the error function [28].

2.4. Performance Evaluation Parameters

During the training phase, accuracy measurement plays a vital role in architecture selection: the parameters are selected to maximize prediction accuracy on the training samples. At the end of the learning step, accuracy is measured to assess the architecture’s predictive ability on new data. Because learning algorithms are trained on finite samples, there is a risk of overfitting the training data: the model might memorize the training samples instead of learning a general rule, i.e., the data-generating model. Hence, high accuracy on unseen data, as an index of the model’s generalization ability, is a measure of the robustness of a classifier. The factors affecting the accuracy of models are overfitting and underfitting. Overfitting happens when the model learns noise instead of the true relationship; when the training error is much lower than the generalization or testing error, the model is said to be overfitted. Underfitting happens when a model is unable to capture the underlying pattern of the data; such models usually have high bias and low variance [29].

2.4.1. Accuracy

This study concerns a diagnostic test that discriminates between subjects affected by dermal melanoma (atypical) and subjects with a benign, i.e., common, condition:
(i) the number of correctly classified melanoma cases is the true positives (TP)
(ii) the number of correctly classified benign cases is the true negatives (TN)
(iii) the number of controls classified as melanoma is the false positives (FP)
(iv) the number of patients classified as benign is the false negatives (FN)

Hence, accuracy is determined by the equation below [30]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
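As a small worked example, accuracy is the fraction of correctly classified cases among all cases, which can be computed directly from the four confusion counts. The counts used in the usage note are hypothetical and are not results from this study.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases (melanoma and benign) that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return (tp + tn) / total
```

For instance, a hypothetical test set with 13 true positives, 14 true negatives, 1 false positive, and 2 false negatives gives 27 correct out of 30 cases.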

2.4.2. Error/Loss

The selection of the output layer activation function is linked with the error or loss. The output layer activation function is the identity, or linear, activation function, as the output is a simple function of the inputs. The output of the activation function is not restrained to a fixed range and is hence suitable for regression problems. Let the training dataset be $\{x_n\}$, $n = 1, \ldots, N$, with corresponding output vectors $\{t_n\}$; then, the backpropagation loss or error is formalized in the equation below, where the network output is $y(x_n, w)$ and $w$ is the weight vector:

$$E(w) = \frac{1}{2} \sum_{n=1}^{N} \lVert y(x_n, w) - t_n \rVert^{2}.$$

This is commonly called the “sum of squares error function.” In binary classification, we have an error function or a “cross entropy error function” as shown below, where $y_n = y(x_n, w)$:

$$E(w) = -\sum_{n=1}^{N} \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right].$$
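Both error functions are straightforward to compute for scalar network outputs. The sketch below assumes the standard definitions (half the summed squared differences, and the negative log-likelihood of binary targets); the function names and the small clipping constant `eps`, added to avoid taking the logarithm of zero, are choices made for this example.

```python
import math

def sum_of_squares_error(outputs, targets):
    """Half the sum of squared differences between outputs and targets."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))

def cross_entropy_error(outputs, targets, eps=1e-12):
    """Binary cross entropy for outputs in (0, 1) and targets in {0, 1}."""
    total = 0.0
    for y, t in zip(outputs, targets):
        y = min(max(y, eps), 1.0 - eps)  # clip to keep the logarithms finite
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total
```

The cross entropy penalizes confident wrong predictions far more heavily than the sum of squares does, which is one reason it is usually preferred for binary classification.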

3. Simulation Setup

The flowchart in Figure 2 outlines the simulation setup of dermal image classification using the deep belief learning classifier implemented for this work. The preprocessing and segmentation were done in MATLAB, whereas the classification was carried out using NVIDIA ‘DIGITS’.

4. Results and Discussions

The proposed DBL network classifier framework is shown in Figure 3.

The PH2 dataset includes various images of dermal lesions comprising common and atypical dermoscopic images, as shown in Figure 4. For the validation and evaluation of the proposed network, a reliable ground truth image database is needed. Such a database is a vital requisite, specifically in the field of dermoscopy. The compilation of the truth data is a task that must be done by a derma specialist: each dermoscopic image has to be manually segmented and annotated. For the implementation, the PH2 dermoscopic image database was used as the ground truth. The database contains a total of 200 dermal lesions, including 100 common images and 100 atypical images [7].

According to the Fitzpatrick skin type classification scale [31], all dermoscopic images are from either skin type II or III. Therefore, the skin colors represented in the PH2 dataset range from white to cream white. In this research, the images of the database were carefully chosen by considering their quality, resolution, and dermoscopic features. Following a common rule of thumb for deep belief learning, 70% of the images were used for training, 15% were used for validation, and the remaining 15% were used for testing. The NVIDIA deep learning GPU system was used for training and testing. The program was run on ‘DIGITS’. “DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that engineers can focus on developing and teaching networks rather than programming and debugging.” This method trains and applies multilayered artificial neural networks to realize a given objective without human intervention. DBLs that are used for image classification normally use a combination of convolutional and fully connected layers. Here, the artificial neurons are tiled so that they respond to overlapping regions of the visual field.
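The 70/15/15 partition described above can be sketched as a seeded shuffle followed by slicing. The function name, the fixed seed, and the use of integer image identifiers are assumptions for illustration; the study itself managed its data through DIGITS.

```python
import random

def split_dataset(items, train=0.70, val=0.15, seed=0):
    """Shuffle items reproducibly and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    # Whatever remains after the training and validation slices becomes the test set.
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the 200 PH2 images, this yields 140 training, 30 validation, and 30 test images.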

4.1. Preprocess the Images

The first step was to create and set up the dataset and preprocess the images with a Gaussian filter, a low-pass digital filter that effectively removes the highest and lowest intensities from the images. After preprocessing, different routes were applied to evaluate the accuracy and effectiveness of the deep belief network learning models. In the first route, the preprocessed derma images were used directly to train and test the deep belief learning classifier. Some results can be seen in Figure 5.

4.2. Segmentation

The preprocessed images were then segmented via the k-means segmentation method, and the output of the k-means method was applied to the same deep belief learning classifier. First, the output of the segmentation step, obtained through the k-means algorithm, was compared with the ground truth images, as shown in Figure 6.

Two measures, Dice similarity and Hausdorff distance, were then employed to assess the accuracy of the segmented images against the ground truth images.

The Dice similarity coefficient (DSC) was used as a statistical validation metric to evaluate the performance of the segmentation and the spatial overlap accuracy of the automated probabilistic fractional segmentation of images. The Dice coefficient lies in the range $[0, 1]$; it is 0 if there is no overlap between the two images and 1 if both images are identical. The Hausdorff distance measures the shape similarity between the segmented image and the ground truth image. The function computed the average distance from a point on the truth image to the closest point on the segmented image in both the forward and reverse directions, and the output distance was the minimum of the two; the lower the distance value, the better the match [32]. This method gives useful results even in the presence of noise or occlusion (when the target is partially hidden). The comparison is shown in Table 3.
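Both measures can be sketched on binary masks represented as sets of foreground pixel coordinates. The Dice coefficient follows its usual definition, twice the overlap divided by the total size of the two masks. The distance function follows the description in the text (average closest-point distance in each direction, then the minimum of the two), which differs from the classical Hausdorff distance based on maxima; all function names are chosen for this example.

```python
import math

def dice(a, b):
    """Dice coefficient of two sets of foreground pixel coordinates."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))

def mean_directed_distance(src, dst):
    """Average distance from each point in src to its closest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def shape_distance(truth, seg):
    """Minimum of the forward and reverse average distances, as described in the text."""
    return min(mean_directed_distance(truth, seg),
               mean_directed_distance(seg, truth))
```

This brute-force version compares every pair of points, so for full-resolution masks it would be usual to restrict the point sets to boundary pixels first.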

4.3. Classification

To achieve the best precision, the classifier was first optimized through its tuning parameters. The main tuning parameters for convolution are the batch size, learning rate, number of epochs, and kernel size. The final classification results from the two routes (segmented and nonsegmented) were then compared on several performance-related aspects, and the test results were obtained after applying the techniques described above. The support vector machine classifier was trained using stochastic gradient descent with the momentum set to 0.9 and the weight decay set to 0.0005. The weight decay proved important for the network’s ability to learn because it reduces the training error of the network. The weights were initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01, and the same learning rate was used for all layers of the network. Figure 7 shows the inference results.
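The training settings stated above (momentum 0.9, weight decay 0.0005, zero-mean Gaussian initialization with standard deviation 0.01) can be illustrated with one common formulation of the momentum update, in which the weight decay is added to the gradient as an L2 penalty. The exact update rule used inside DIGITS is not given in the text, so the rule below and the function names are assumptions.

```python
import random

MOMENTUM = 0.9         # value stated in the text
WEIGHT_DECAY = 0.0005  # value stated in the text

def init_weights(n, rng, std=0.01):
    """Draw n weights from a zero-mean Gaussian with the stated standard deviation."""
    return [rng.gauss(0.0, std) for _ in range(n)]

def sgd_step(weights, velocity, grads, lr):
    """One SGD update with momentum and L2 weight decay (assumed formulation)."""
    new_velocity = [
        MOMENTUM * v - lr * (g + WEIGHT_DECAY * w)
        for w, v, g in zip(weights, velocity, grads)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

The velocity term accumulates past gradients, so consistent gradient directions speed up progress, while the decay term continually pulls weights toward zero.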

We followed an experimental process in which the learning rate was divided by 10 each time the validation error rate stopped improving at the current learning rate. The learning rate started at 0.01 and was reduced three times before training was discontinued.
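A schedule of this kind can be sketched as follows. The sketch assumes that a plateau means the validation error of an epoch failed to improve on the best value seen so far, and that training stops at the first plateau after the third reduction; both are interpretations, and the function name and the example error values are hypothetical.

```python
def plateau_schedule(val_errors, initial_lr=0.01, factor=10.0, max_reductions=3):
    """Return the learning rate used at each epoch until training is discontinued."""
    lr = initial_lr
    best = float("inf")
    reductions = 0
    rates = []
    for err in val_errors:
        rates.append(lr)
        if err < best:
            best = err  # validation error improved: keep the current rate
            continue
        if reductions == max_reductions:
            break  # already reduced three times and still no gain: stop training
        lr /= factor
        reductions += 1
    return rates
```

With hypothetical validation errors `[0.5, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.2]`, the rate stays at 0.01 for three epochs, drops at each subsequent plateau, and training stops after seven epochs.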

In the experimental trials, the batch size was set to 10 and training ran for 50 epochs. The DBL network classifier model was applied to both the nonsegmented and segmented images after optimizing all the constraints. Figure 8 plots the accuracy and loss (validation and testing loss) across epochs for segmented and nonsegmented images.

The architecture employed for the classifier comprised five convolutional layers with kernel sizes of 9, 7, 5, 3, and 3 for each layer (for both segmented and nonsegmented images). In the DBL network, Figure 9 shows the classifier’s results in the form of learning rates and loss as well as a close look at the iterations.

The general trend in the figures given in Table 4 is that the average and maximum accuracy for the segmented dermal images are higher than those for the nonsegmented images; similarly, the loss is lower. In addition, the execution time for the same number of epochs is lower for segmented images than for nonsegmented images, indicating a reduced computational load.

During the training and testing time, we ran our models on the GPU in order to exploit its computational speed, reducing the execution time further. We noted that the segmented images used in the classifier model were processed 8-10 times faster than the nonsegmented classified model. Predictions for one image with the DBL model take on average 4 minutes.

As mentioned above, the accuracy, error, and execution time of the proposed DBL network classifier architecture were observed for both the segmented and nonsegmented images. The performance was then compared with that of the AlexNet classifier architecture, which was also implemented. Table 3 shows a comparison of the implemented DBL network architecture vs. the AlexNet architecture. As can be seen from the table, AlexNet is also a good choice for classification; however, it is hard to train on low-resolution images (neurons die quickly), it has overfitting problems for small datasets, and its computation needs improvement.

The implemented DBL network has the following characteristics: an improved feature space, initial layers that learn first-order generic features (e.g., edge detectors or color blob detectors), later layers that learn higher-order features, a DBL network for joint configuration, and layers with faster convergence. Table 5 shows the performance of some commonly used deep CNN models. The results show that for the 2D derma images, the DBL network classifier surpasses most of the recent architectures on this dataset.

5. Conclusion

In this work, we have presented an optimized deep belief learning network model for melanoma detection on dermal images that improves classification accuracy and reduces execution time. The outcome of this work indicates that the classification accuracy of the proposed network for nonsegmented images increased by 2% to 48% when compared with AlexNet and LeNet; similarly, for segmented images, there was an increase of 8% to 47%. We also observed that the average error for the dermal nonsegmented and segmented images was reduced by 47.8% and 41.5%. The execution time of the proposed classifier also decreased by an average of 8-13% when compared with the other classifier models on dermal images. The final deduction is that because the weights in the segmented image were distributed specifically on the clustered areas instead of the whole image, the network had less work to do and was able to improve the accuracy as well as work faster. The presented work can improve dermoscopic image classification, complementing doctors’ capability to detect and analyze dermal images and leading to a more reliable diagnosis. As a result, the diagnostic process can be less tedious, less expensive, and more timely for the concerned patient.

Data Availability

We used the PH2 dermoscopic image database.

Conflicts of Interest

We wish to confirm that there are no known conflicts of interest associated with this publication, and there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing, we confirm that we have followed the regulations of our institutions concerning intellectual property. We further confirm that any aspect of the work covered in this manuscript has not involved either experimental animals or human patients. We understand that the corresponding author is the sole contact for the editorial process (including editorial manager and direct communications with the office). She is responsible for communicating with the other authors about the progress, submissions of revisions, and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the corresponding author and which has been configured to accept email.


The authors extend their appreciation to King Saud University for funding this work through Researchers Supporting Project number RSP2022R426, King Saud University, Riyadh, Saudi Arabia.