Special Issue: Deep Neural Networks for Cognitive Health Assessment
Intelligent Deep Learning Enabled Oral Squamous Cell Carcinoma Detection and Classification Using Biomedical Images
Oral cancer is one of the most lethal malignant tumors globally and has become a challenging health issue in developing and low-to-middle-income countries. The prognosis of oral cancer remains poor because over 50% of patients are diagnosed at advanced stages. Earlier detection and screening models for oral cancer are mainly based on experts' knowledge, which necessitates an automated tool for oral cancer detection. Recent developments in computational intelligence (CI) and computer-vision-based approaches help to accomplish enhanced performance in medical-image-related tasks. This article develops an intelligent deep learning enabled oral squamous cell carcinoma detection and classification (IDL-OSCDC) technique using biomedical images. The presented IDL-OSCDC model involves the recognition and classification of oral cancer in biomedical images. The proposed IDL-OSCDC model employs Gabor filtering (GF) as a preprocessing step to eliminate noise content. In addition, the NasNet model is exploited for the generation of high-level deep features from the input images. Moreover, an enhanced grasshopper optimization algorithm (EGOA)-based deep belief network (DBN) model is employed for oral cancer detection and classification. The hyperparameter tuning of the DBN model is performed using the EGOA algorithm, which in turn boosts the classification outcomes. The experimental outcomes of the IDL-OSCDC model on a benchmark biomedical imaging dataset highlighted its promising performance over the other methods, with maximum accuracy, sensitivity, specificity, and F-score of 95%, 96.15%, 93.75%, and 94.67%, respectively.
1. Introduction
Oral cancer is a leading cancer globally and is characterized by late diagnosis and high morbidity and mortality rates. Two-thirds of the total incidence arises in low- and middle-income countries (LMICs), and half of the cases occur in South Asia [1, 2]. Excessive use of alcohol and tobacco is the main risk factor for oral tumors. The major factor in South and Southeast Asia is betel quid chewing, which usually comprises slaked lime, betel leaf, and areca nut and might include tobacco. Currently, these products are commercially offered in sachets and are popular with the public because of dynamic marketing strategies. Oral lesions are associated with late presentation, mainly in LMICs, where around two-thirds present at a late stage; consequently, the survival rate is poor. Cancer management, particularly at a late stage, is very expensive. The lack of knowledge among health professionals and the lack of public awareness concerning oral lesions are major reasons for late diagnosis. Since oral potentially malignant disorders (OPMDs) carry a risk of malignant transformation, their early diagnosis is of great significance for reducing mortality and morbidity from oral tumors and has been the major emphasis of screening programs. However, the application of such programs, which depend on visual inspection, has turned out to be challenging in real-world settings because they rely on healthcare professionals who may not be experienced or adequately trained to identify these lesions [7, 8].
Early identification of OSCC is of significant importance for improved diagnosis, treatment, and survival [5, 6]. Late diagnosis has hampered the quest for precision medicine in spite of advancements in the understanding of the molecular mechanisms of cancer. Thus, machine learning (ML) and deep learning (DL) models have been employed to improve recognition and thereby reduce cancer-specific death rates and morbidity. Automated image examination clearly has the potential to assist pathologists and clinicians in the earlier detection of OSCC and in management decision-making. The considerable heterogeneity in the presentation of oral cancer makes detection highly complex for healthcare professionals and is a common cause of delays in patient referral to oral lesion specialists. In addition, early-stage OSCC lesions and OPMDs are generally asymptomatic and might appear as small, harmless lesions, leading to late presentation of the patient and, eventually, to diagnostic delay [10, 11]. Advancements in the fields of deep learning and computer vision offer an effective path toward adjunctive technology that could implement automatic screening of the oral cavity and present feedback to individuals for self-examination and to healthcare professionals at the time of patient examination.
Bhandari et al. aimed to improve the performance of classifying and detecting oral tumors within a minimized processing time. The presented technique comprises a convolutional neural network with an adapted loss function that minimizes the error in classifying and predicting oral tumors by supporting multiclass classification and reducing over-fitting of the data. Lu et al. presented an automatic approach for oral tumor diagnosis on slide cytology images. The pipeline comprises per-cell focus selection, CNN-based classification, and fully convolutional regression-based nucleus recognition. The proposed method offers faster per-cell focus decisions at human-level accuracy. Song et al. introduced an image classification method based on autofluorescence and white-light images using DL. The image data are fused and features are extracted and computed to feed the deep neural network. They then compared and investigated the efficiency of regularization, convolutional neural networks, and transfer learning techniques for classifying oral tumors.
Figueroa et al. designed a DL training model that provides understandability to its predictions and guides the network to remain focused and precisely delineate the tumorous region of the image. Lim et al. developed a DL architecture called D'OraCa to categorize oral lesions in photographic images. It develops a mouth landmark recognition method for the oral image and integrates it with oral cancer classification as guidance to enhance the classification performance. Shamim et al. applied and evaluated the effectiveness of six deep convolutional neural network (DCNN) models with transfer learning for recognizing precancerous tongue lesions using a small dataset. The DCNN models can distinguish between five kinds of tongue cancer and differentiate between benign and precancerous tongue lesions.
In comparison with conventional ML models, DL models operate on raw input and do not involve a complicated feature extraction process. Besides, heterogeneous patterns can result in variance across distinct instances, which complicates handcrafted features with restricted generalization ability. In addition, DL models exhibit high scalability owing to their capability of processing large amounts of data. The considerable heterogeneity in the presentation of oral lesions makes the detection process difficult and is considered a leading cause of delays in patient referrals to oral cancer specialists. Moreover, early-stage OSCC lesions often remain symptomless and may look like small, inoffensive lesions, resulting in the late presentation of the patient and eventually leading to additional diagnostic delay. Therefore, effective OSCC classification models need to be designed.
This article presents an intelligent deep learning enabled oral squamous cell carcinoma detection and classification (IDL-OSCDC) model using biomedical images. The proposed model employs Gabor filtering (GF) as a preprocessing step to eliminate noise content. In addition, the NasNet model is exploited for the generation of high-level deep features from the input images. Moreover, an enhanced grasshopper optimization algorithm (EGOA)-based deep belief network (DBN) model is employed for oral cancer detection and classification. The hyperparameter tuning of the DBN model is performed using the EGOA algorithm, which in turn boosts the classification outcomes. The experimental evaluation of the IDL-OSCDC model is performed using a benchmark biomedical imaging dataset.
2. The Proposed IDL-OSCDC Model
In this article, a novel IDL-OSCDC model is introduced for the identification and classification of oral tumors using biomedical images. At the initial stage, the IDL-OSCDC model utilizes the GF technique to remove noise content. Following this, the NasNet model is exploited for the generation of higher-level deep features from the input images. Finally, the EGOA-DBN model is utilized to detect and categorize oral cancer. Figure 1 illustrates the overall process of the IDL-OSCDC technique.
2.1. Image Preprocessing Using GF Technique
In this study, the IDL-OSCDC model utilizes the GF technique to remove noise content. The GF is a bandpass filter that has been effectively applied to a variety of image processing and machine vision applications. In 2D, the Gabor function is an oriented complex sinusoidal grating modulated by a 2D Gaussian envelope. In a 2D coordinate system, the GF with real and imaginary components is given by

\[ g_{\mathrm{re}}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\!\left(2\pi \frac{x'}{\lambda} + \psi\right) \]
\[ g_{\mathrm{im}}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\!\left(2\pi \frac{x'}{\lambda} + \psi\right) \]

with the rotated coordinates \(x' = x\cos\theta + y\sin\theta\) and \(y' = -x\sin\theta + y\cos\theta\), where \(\lambda\) implies the wavelength of the sinusoidal factor and \(\theta\) signifies the orientation angle of the Gabor kernel. Notably, only the interval \([0, \pi)\) needs to be considered, as symmetry makes other directions redundant. \(\psi\) defines the phase offset, \(\sigma\) denotes the standard deviation (SD) of the Gaussian envelope, and \(\gamma\) denotes the spatial aspect ratio (default value 0.5) identifying the ellipticity of the support of the Gabor function. The parameter \(\sigma\) is related to the spatial frequency bandwidth \(b\) as

\[ \sigma = \frac{\lambda}{\pi} \sqrt{\frac{\ln 2}{2}} \cdot \frac{2^{b} + 1}{2^{b} - 1}. \]
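To make the preprocessing step concrete, the following is a minimal NumPy sketch (illustrative only, not the authors' implementation) that builds the real and imaginary Gabor kernels from the parameters above and convolves a grayscale image with the real part; the kernel size and parameter defaults are assumed:

```python
import numpy as np

def gabor_kernel(ksize=21, wavelength=8.0, theta=0.0, psi=0.0,
                 sigma=4.0, gamma=0.5):
    """Real and imaginary parts of a 2-D Gabor filter: a complex sinusoid of
    the given wavelength and orientation theta, modulated by a Gaussian
    envelope (spread sigma, aspect ratio gamma)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates by the orientation angle theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
    real = envelope * np.cos(2.0 * np.pi * x_t / wavelength + psi)
    imag = envelope * np.sin(2.0 * np.pi * x_t / wavelength + psi)
    return real, imag

def gabor_filter(image, **kwargs):
    """Convolve a grayscale image with the real Gabor kernel (reflect padding)."""
    kern, _ = gabor_kernel(**kwargs)
    half = kern.shape[0] // 2
    padded = np.pad(image, half, mode="reflect")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kern.shape[0],
                                      j:j + kern.shape[1]] * kern)
    return out
```

In practice, a bank of such kernels over several orientations and wavelengths would be applied; a library routine (e.g., an OpenCV or scikit-image Gabor function) would typically replace the explicit loop.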
2.2. Feature Extraction: NASNet Model
For the effectual derivation of feature vectors, the NASNet model is utilized. NASNetMobile is a recently developed DL model with 5,326,716 parameters and exhibits high reliability. The fundamental component of the NASNet model is the block, and a collection of blocks is integrated to form a cell. The search space of NASNet factorizes the network into cells, which are again split into blocks. The number and type of cells/blocks are not predefined; rather, they are optimized for the chosen dataset. The possible operations of a block comprise convolution, separable convolution, max pooling, average pooling, and identity mapping. A block maps two inputs into an output feature map through element-wise addition. When a cell receives a feature map of size H × W with a stride of 1, the output feature map has the identical size. Figure 2 depicts the framework of the NASNet model.
When the stride is 2, the size is reduced by a factor of 2. The cells are combined through an optimization method. The network design concentrates on three factors: the cell architecture, the number of stacked cells (N), and the number of filters in the first layer (F). N and F are fixed during the search; afterward, N and F are varied to control the depth and width of the network. Once the search is complete, models of various sizes are created to fit the dataset. The cells are then connected through an optimization method to develop the NASNet architecture. Each cell is associated with two input states called hidden states. For higher accuracy, NASNetLarge uses N = 6, whereas the essential concern of NASNetMobile is running with restricted resources. Using both normal and reduction cells, each with B = 5 blocks, an input of size 224 × 224 × 3 is reduced to a 7 × 7 feature map at the output. A novel regularization technique named scheduled DropPath was introduced in NASNet, in which each path in the cell is dropped with a linearly increasing probability as network training progresses.
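As a small illustration of the size arithmetic above (assuming, as stated, that each stride-2 reduction halves the spatial dimensions and that five such halvings take a 224 × 224 input down to 7 × 7):

```python
def nasnet_output_size(input_size=224, num_reductions=5):
    """Spatial size after a stack of stride-2 reductions.

    Normal cells keep the feature-map size (stride 1); each stride-2
    reduction halves it, so 224 -> 112 -> 56 -> 28 -> 14 -> 7.
    """
    size = input_size
    for _ in range(num_reductions):
        size = size // 2  # each stride-2 step halves both spatial dims
    return size
```

This is only a bookkeeping sketch of the downsampling path, not the cell search itself.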
2.3. Image Classification: DBN Model
During the image classification process, the DBN model allocates the proper class label. The DBN is a probabilistic generative model comprising a stack of restricted Boltzmann machines (RBMs) and a backpropagation (BP) neural network. It encompasses a visible layer, n hidden layers, and an output layer. The input/visible layer is placed at the bottom of the model, and the features are passed through the hidden layers during the learning procedure. Finally, the proper class label is assigned at the output layer. An RBM comprises an input (visible) layer and a hidden layer, with bidirectional links between the two layers. Consider m units in the input layer with vector \(v\) and n units in the hidden layer with vector \(h\). The energy function of the RBM can be represented using the following equation:

\[ E(v, h; \theta) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i w_{ij} h_j, \]

where \(\theta = \{a, b, W\}\) signifies the parameters of the RBM, comprising the bias \(a_i\) of input-layer unit \(i\) and the bias \(b_j\) of hidden-layer unit \(j\), and \(w_{ij}\) denotes the link weight between the nodes of the input and hidden layers. Based on the energy function of the RBM model, the joint distribution can be defined as follows:

\[ P(v, h; \theta) = \frac{e^{-E(v, h; \theta)}}{Z}, \qquad Z = \sum_{v}\sum_{h} e^{-E(v, h; \theta)}, \]

where \(Z\) is termed the normalization factor (partition function). The marginal probability distribution of the input layer can be formulated as follows:

\[ P(v; \theta) = \frac{1}{Z} \sum_{h} e^{-E(v, h; \theta)}. \]

As there exist no links among the nodes within the same layer, the conditional probability distributions of the layers can be defined as follows:

\[ P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j w_{ij} h_j\Big), \]

where \(\sigma(x) = 1/(1 + e^{-x})\) indicates the sigmoid function. The intention of the RBM is the maximization of \(P(v)\) via modifying the biases \(a\), \(b\), and the weights \(W\). The RBM parameter set is obtained from the training data by maximum likelihood estimation. The gradient of the log-likelihood with respect to the parameters can be represented as follows:

\[ \frac{\partial \ln P(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}, \]

where \(\langle \cdot \rangle_{\text{data}}\) signifies the expectation under the data distribution and \(\langle \cdot \rangle_{\text{model}}\) the expectation under the distribution defined by the RBM. In practice, the model expectation is approximated by the reconstruction, and the parameter set is updated using the contrastive divergence method:

\[ w_{ij} \leftarrow w_{ij} + \frac{\eta}{m} \big( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \big), \]

where \(\eta\) and \(m\) indicate the learning rate and batch size, respectively. Once the training of one RBM is done, its hidden layer becomes the visible layer of the succeeding RBM. After every RBM is trained, the deep features are classified.
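A hedged NumPy sketch of a Bernoulli RBM trained with one-step contrastive divergence (CD-1) may clarify these update rules; it is illustrative only, with assumed layer sizes and learning rate, and is not the authors' DBN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible-unit biases
        self.b = np.zeros(n_hidden)    # hidden-unit biases
        self.lr = lr

    def hidden_probs(self, v):
        # P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)
        return sigmoid(v @ self.W + self.b)

    def visible_probs(self, h):
        # P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j)
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0):
        """One CD-1 update on a batch v0; returns reconstruction error."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)          # reconstruction
        h1 = self.hidden_probs(v1)
        m = v0.shape[0]
        # Gradient: <v h>_data - <v h>_reconstruction, scaled by lr / batch size
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / m
        self.a += self.lr * (v0 - v1).mean(axis=0)
        self.b += self.lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))
```

Stacking such RBMs (each hidden layer feeding the next visible layer) and fine-tuning with backpropagation yields the DBN structure described above.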
2.4. Hyperparameter Optimization: EGOA Algorithm
The hyperparameter tuning of the DBN model is performed using the EGOA algorithm, which in turn boosts the classification outcomes. The GOA emulates the behavior of grasshoppers, insects that affect agriculture and crop productivity and whose life cycle comprises egg, nymph, and adulthood stages. In the nymph stage, the key feature is moving and jumping like a rolling cylinder (slow movement with small steps). In the adult stage, grasshoppers migrate over long distances in a swarm (long-range, abrupt movement). Such behaviors are mathematically expressed by modeling the location of the ith grasshopper as

\[ X_i = S_i + G_i + A_i, \]

where \(S_i\) signifies the social interaction of the ith grasshopper:

\[ S_i = \sum_{\substack{j=1 \\ j \ne i}}^{N} s(d_{ij})\, \hat{d}_{ij}, \qquad \hat{d}_{ij} = \frac{x_j - x_i}{d_{ij}}. \]

Here, \(d_{ij} = |x_j - x_i|\) indicates the distance between the ith and jth grasshoppers, whereas

\[ s(r) = f e^{-r/l} - e^{-r} \]

denotes the social force function, in which \(l\) and \(f\) indicate the attractive length scale and the intensity of attraction, respectively. \(G_i\) and \(A_i\) represent the gravity force and the wind advection for the ith grasshopper, respectively:

\[ G_i = -g\, \hat{e}_g, \qquad A_i = u\, \hat{e}_w, \]

where \(\hat{e}_g\) and \(\hat{e}_w\) represent the unit vectors toward the center of the Earth and along the direction of the wind, and \(g\) and \(u\) represent the gravitational constant and a constant drift, respectively. However, this model cannot be used directly to solve optimization problems; hence, it is rewritten as the following equation:

\[ X_i^d = c \left( \sum_{\substack{j=1 \\ j \ne i}}^{N} c\, \frac{ub_d - lb_d}{2}\, s\big(|x_j^d - x_i^d|\big)\, \frac{x_j - x_i}{d_{ij}} \right) + \hat{T}_d, \]

where \(lb_d\) and \(ub_d\) represent the lower and upper bounds of the search region in the dth dimension, \(\hat{T}_d\) denotes the dth component of the best (target) solution found so far, and \(s\) is the social force function defined above. In this form, gravity is not taken into account, and the direction of the wind is assumed to point toward the target \(\hat{T}_d\). The coefficient \(c\) shrinks the attraction, comfort, and repulsion zones:

\[ c = c_{\max} - t\, \frac{c_{\max} - c_{\min}}{T}, \]

where \(c_{\max}\) and \(c_{\min}\) represent the maximal value (equal to 1) and minimal value (equal to 0.00001) of \(c\), respectively; \(t\) denotes the current iteration, and \(T\) represents the maximal number of iterations. Finally, the pseudocode of the GOA is given in Algorithm 1.
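The position-update equation can be sketched in NumPy as follows (an illustrative implementation under assumed social-force parameters f = 0.5 and l = 1.5; it is not the authors' EGOA code):

```python
import numpy as np

rng = np.random.default_rng(1)

def s_func(r, f=0.5, l=1.5):
    """Social-force function s(r) = f * exp(-r / l) - exp(-r)."""
    return f * np.exp(-r / l) - np.exp(-r)

def goa(objective, lb, ub, n_agents=30, n_iter=100, c_max=1.0, c_min=1e-5):
    """Basic GOA minimizing `objective` over the box [lb, ub]."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    X = lb + rng.random((n_agents, dim)) * (ub - lb)
    fitness = np.array([objective(x) for x in X])
    best = X[np.argmin(fitness)].copy()
    best_fit = float(fitness.min())
    for t in range(n_iter):
        # Linearly shrinking coefficient c trades exploration for exploitation
        c = c_max - t * (c_max - c_min) / n_iter
        X_new = np.empty_like(X)
        for i in range(n_agents):
            social = np.zeros(dim)
            for j in range(n_agents):
                if i == j:
                    continue
                dist = np.linalg.norm(X[j] - X[i])
                unit = (X[j] - X[i]) / (dist + 1e-12)
                social += c * (ub - lb) / 2.0 * s_func(dist) * unit
            # Update: scaled social interaction plus the target (best) position
            X_new[i] = np.clip(c * social + best, lb, ub)
        X = X_new
        fitness = np.array([objective(x) for x in X])
        if fitness.min() < best_fit:
            best_fit = float(fitness.min())
            best = X[np.argmin(fitness)].copy()
    return best, best_fit
```

In the paper's setting, `objective` would score a DBN hyperparameter vector by validation error; here it can be exercised on any toy function, e.g. the sphere function.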
In the EGOA, the opposition-based learning (OBL) approach is utilized to determine the opposite of the existing solution; the fitness function \(f\) is then used to determine whether the opposite is superior to the existing solution. The fundamental definition of OBL considers, for a real value \(x \in [lb, ub]\), the opposite value computed as

\[ \bar{x} = lb + ub - x. \]

This definition is generalized to \(D\) dimensions by the following formula:

\[ \bar{x}_j = lb_j + ub_j - x_j, \quad j = 1, \dots, D, \]

where \(\bar{x}\) refers to the opposite vector of the real vector \(x\). During the optimization procedure, the two solutions \(x\) and \(\bar{x}\) are both evaluated, and the better one is saved while the other is eliminated by comparing the fitness function: for a minimization problem, if \(f(x) \le f(\bar{x})\), \(x\) is stored; otherwise, \(\bar{x}\) is saved.
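The OBL step is a simple elementwise mapping; a minimal sketch (illustrative, with hypothetical helper names) is:

```python
import numpy as np

def opposite(x, lb, ub):
    """Opposition-based learning: opposite_j = lb_j + ub_j - x_j."""
    return np.asarray(lb, float) + np.asarray(ub, float) - np.asarray(x, float)

def obl_select(x, lb, ub, f):
    """Keep whichever of x and its opposite has the better (lower) fitness."""
    x = np.asarray(x, float)
    x_opp = opposite(x, lb, ub)
    return x if f(x) <= f(x_opp) else x_opp
```

Applied to each GOA agent, this comparison is what distinguishes the enhanced variant (EGOA) from the basic algorithm in the description above.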
3. Results and Discussion
This section investigates the oral cancer classification performance of the IDL-OSCDC model using a benchmark dataset from the Kaggle repository. The dataset includes images of the lips and tongue, which are classified into cancerous and noncancerous groups. Sample images are shown in Figure 3.
Figure 4 showcases the confusion matrices generated by the IDL-OSCDC model on distinct training/testing (TR/TS) splits. On the 90 : 10 TR/TS split, the IDL-OSCDC model correctly categorized 9 samples as cancerous and 4 samples as noncancerous. On the 80 : 20 split, it correctly categorized 18 samples as cancerous and 7 as noncancerous. Meanwhile, on the 70 : 30 split, it correctly categorized 24 samples as cancerous and 14 as noncancerous. Eventually, on the 60 : 40 split, it correctly categorized 28 samples as cancerous and 22 as noncancerous.
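As an illustrative cross-check (assuming test-set sizes of 14, 27, 40, and 53 samples, which are consistent with the reported accuracies), the standard classification metrics can be computed from such confusion-matrix counts as follows:

```python
def accuracy(correct_pos, correct_neg, total):
    """Overall accuracy (%) from the correctly classified counts."""
    return 100.0 * (correct_pos + correct_neg) / total

def prf(tp, fp, fn):
    """Precision, recall, and F-score for the positive (cancer) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, 9 + 4 correct out of 14 test samples gives 92.86% accuracy, matching the 90 : 10 split reported below.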
Table 1 and Figure 5 report the extensive oral cancer classification performance of the IDL-OSCDC approach on the training and test data. The results are inspected under distinct sizes of TR/TS data. The experimental outcomes signify that the IDL-OSCDC model reached proficient values under all sizes of TR/TS data. For instance, with a TR/TS set of 90 : 10, the IDL-OSCDC model provided accuracy, sensitivity, specificity, and F-score of 92.86%, 90%, 95%, and 91.81%, respectively. Following this, with a TR/TS set of 80 : 20, the IDL-OSCDC methodology offered accuracy, sensitivity, specificity, and F-score of 92.59%, 88.89%, 95%, and 91.12%, respectively. Along with that, with a TR/TS set of 70 : 30, the IDL-OSCDC model gave accuracy, sensitivity, specificity, and F-score of 95%, 96.15%, 93.75%, and 94.67%, respectively. Furthermore, with a TR/TS set of 60 : 40, the IDL-OSCDC method provided accuracy, sensitivity, specificity, and F-score of 94.34%, 94.11%, 94.49%, and 94.27%, respectively.
A brief precision-recall examination of the IDL-OSCDC model on the different TR/TS datasets is portrayed in Figure 6. The figure shows that the IDL-OSCDC model accomplished maximum precision-recall performance under all datasets.
Figure 7 demonstrates the ROC inspection of the IDL-OSCDC model under the different sets of training and testing data. The result indicates that the IDL-OSCDC model achieved the highest performance on the testing dataset over the other ones.
Figure 8 illustrates the training and validation accuracy of the IDL-OSCDC approach on the applied dataset. The figure shows that the IDL-OSCDC model offered maximum training/validation accuracy in the classification process.
Next, Figure 9 presents the training and validation loss of the IDL-OSCDC model on the applied dataset. The figure shows that the IDL-OSCDC model exhibited reduced loss values.
Table 2 investigates the comparative study of the IDL-OSCDC technique with recent approaches. Figure 10 inspects the detailed accuracy examination of the IDL-OSCDC model against the other models. The figure reveals that the SVM method resulted in the least performance with a lower accuracy of 88.38%. In addition, the ANN-SVM technique reached a slightly enhanced outcome with an accuracy of 90.48%, whereas the fuzzy technique depicted a moderately improved accuracy of 92.76%. Following this, the RF and CapsNet techniques showed results closer to the proposed method. However, the IDL-OSCDC model showed an effectual outcome with a maximum accuracy of 95%.
Figure 11 examines the detailed sensitivity, specificity, and F-score examination of the IDL-OSCDC model against the other techniques. The figure exposes that the SVM system resulted in the least performance with lower sensitivity, specificity, and F-score of 89.82%, 90.65%, and 88.01%, respectively. Furthermore, the ANN-SVM model reached a slightly enhanced outcome with sensitivity, specificity, and F-score of 92.24%, 90.13%, and 91.94%, whereas the fuzzy method portrayed moderately enhanced sensitivity, specificity, and F-score of 88.03%, 89.43%, and 91.53%. Afterward, the RF and CapsNet models revealed results closer to the proposed method. Finally, the IDL-OSCDC algorithm showed an effectual outcome with maximum sensitivity, specificity, and F-score of 96.15%, 93.75%, and 94.67%, respectively. From the abovementioned figures and tables, it is apparent that the IDL-OSCDC model results in maximum performance over the other methods.
4. Conclusion
In this article, a novel IDL-OSCDC model was presented for the identification and classification of oral cancer using biomedical images. At the initial stage, the IDL-OSCDC model utilizes the GF technique to remove noise content. Following this, the NasNet model is exploited for the generation of higher-level deep features from the input images. Finally, the EGOA-DBN model is utilized to detect and categorize oral cancer. The hyperparameter tuning of the DBN model is performed using the EGOA algorithm, which in turn boosts the classification outcomes. The experimental outcomes of the IDL-OSCDC model were evaluated using a benchmark biomedical imaging dataset, and an extensive comparison study highlighted its promising performance over the other methods. In the future, advanced DL models can be utilized as classifiers to further optimize the detection performance.
Data Availability
Data sharing is not applicable to this article as no datasets were generated during the current study.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
Consent
Consent is not applicable to this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Acknowledgments
The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work under grant code 22UQU4400271DSR06 and Taif University Researchers Supporting Project number TURSP-2020/154, Taif University, Taif, Saudi Arabia.
References
A. R. Javed, M. U. Sarwar, M. O. Beg, M. Asim, T. Baker, and H. Tawfik, "A collaborative healthcare framework for shared healthcare plan with ambient intelligence," Human-centric Computing and Information Sciences, vol. 10, no. 1, pp. 1–21, 2020.
J. Musulin, D. Štifanić, A. Zulijani et al., "Automated grading of oral squamous cell carcinoma into multiple classes using deep learning methods," in Proceedings of the 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1–6, IEEE, Kragujevac, Serbia, October 2021.
S. Y. Siddiqui, A. Haider, T. M. Ghazal et al., "IoMT Cloud-based intelligent prediction of breast cancer stages empowered with deep learning," IEEE Access, vol. 9, pp. 146478–146491, 2021.
J. Lu, N. Sladoje, C. Runow Stark, E. Darai Ramqvist, J.-M. Hirsch, and J. Lindblad, "A deep learning based pipeline for efficient oral cancer screening on whole slide images," in Proceedings of the International Conference on Image Analysis and Recognition, pp. 249–261, Springer, Cham, 2020.
J. H. Lim, C. S. Tan, C. S. Chan et al., "D'OraCa: deep learning-based classification of oral lesions with mouth landmark guidance for early detection of oral cancer," in Proceedings of the Annual Conference on Medical Image Understanding and Analysis, pp. 408–422, Springer, Oxford, UK, July 2021.
F. Martínez, F. Martínez, and E. Jacinto, "Performance evaluation of the NASNet convolutional network in the automatic identification of COVID-19," International Journal of Advanced Science, Engineering and Information Technology, vol. 10, no. 2, p. 662, 2020.
R. Arunkumar and N. Balakrishnan, "Medical image classification for disease diagnosis by DBN methods," Pakistan Journal of Biotechnology, vol. 15, no. 1, pp. 107–110, 2018.
T. Y. Rahman, L. B. Mahanta, A. K. Das, and J. D. Sarma, "Histopathological imaging database for oral cancer analysis," Data in Brief, vol. 29, Article ID 105114, 2020.