Abstract
Ovarian cancer is a serious disease in elderly women. According to available data, it is the seventh leading cause of death in women and the fifth most frequent disease worldwide. Many researchers have classified ovarian cancer using Artificial Neural Networks (ANNs). Doctors consider classification accuracy an important aspect of decision making, since improved accuracy supports proper treatment; early and precise diagnosis lowers mortality rates and saves lives. On the basis of ROI (region of interest) segmentation, this research presents a novel annotated ovarian image classification utilizing FaReConvNN (rapid region-based Convolutional Neural Network). The input images were divided into three categories: epithelial, germ, and stromal cells. Each image is segmented and preprocessed, after which FaReConvNN performs the annotation procedure. For region-based classification, the method compares manually annotated features with features trained in FaReConvNN. This aids the analysis of disease identification at higher accuracy, since human annotation achieved lower accuracy in previous studies; this work therefore empirically shows that ML classification provides higher accuracy. After region-based training in FaReConvNN, classification is performed using a combination of SVC and Gaussian NB classifiers; the ensemble technique was employed in feature classification because of its better data indexing. To diagnose ovarian cancer, the simulation returns the relevant portion of the input image. FaReConvNN yields a precision above 95%: SVC achieves a precision of 95.96% and Gaussian NB a precision of 97.7%, with FaReConvNN enhancing the precision of Gaussian NB. For recall/sensitivity, SVC achieves 94.31% and Gaussian NB 97.7%, while for specificity, SVC achieves 97.39% and Gaussian NB 98.69% using FaReConvNN.
1. Introduction
One of the most common types of cancer in women is Ovarian Cancer (OC). In 2018, 295,414 women were diagnosed with ovarian cancer, resulting in 184,799 deaths around the world. Since early-stage tumors are often asymptomatic, most women with ovarian cancer have advanced disease at the time of diagnosis, resulting in lower long-term survival [1]. Although ovarian tumors are chemosensitive and show initial responses to platinum/taxane treatment, the 5-year recurrence rates in patients with advanced illness are 60% to 80% [2].
OC is characterized by modest early-stage symptoms and a low survival rate, and it is the most common as well as the most dangerous gynecologic cancer. Serous, mucinous, endometrioid, and clear cell carcinoma are the four subtypes of primary epithelial ovarian carcinoma [3]. According to earlier research, one out of every 54 women may acquire OC, and the 5-year survival rate for a patient diagnosed with OC is roughly 48.6% [4]. The low survival rate is largely attributable to cancer discovery at an advanced stage, with 72% of patients diagnosed at stage III or IV. As a result, early detection is critical. Attempts to detect OC in the preclinical stage have been made in the past, employing both medical imaging and blood markers. Although these biomarkers show promise, they have a number of drawbacks, including misclassification, sluggishness, and longer working hours [5].
Although Serum Carbohydrate Antigen 125 (CA-125) is commonly utilized, its accuracy is limited. Ultrasound imaging, MRI, and CT scans are some of the imaging modalities used to locate and characterize tumors. Early detection of any medical condition, particularly cancer, is critical for improving survival rates. According to studies, medical imaging is one of the most successful approaches for early-stage diagnosis, prediction from brain imaging modalities, monitoring cancer stages, and follow-up after cancer therapy. Manually interpreting these medical images is time-consuming and prone to human error [6]. The normal ovary and the origins of the three types of ovarian cancer are shown in Figure 1. There are three main types of ovarian tumors.
1.1. Epithelial Tumors
This type of tumor is derived from the cells on the surface of the ovary. It is the most common form of ovarian cancer and occurs primarily in adults.
1.2. Germ Cell Tumors
This type of tumor is derived from the egg-producing cells within the body of the ovary. It occurs primarily in children and teens and is rare in comparison to epithelial ovarian tumors.
1.3. Stromal Tumors
These tumors are rare in comparison to epithelial tumors, and this class of tumors often produces steroid hormones.
In addition, computer-aided diagnosis (CAD) methods are frequently employed to assist physicians and pathologists in better analyzing the outcomes of medical images. ML methods are employed in a CAD-based medical imaging strategy for cancer detection [7]. Feature extraction is a crucial stage in the machine learning approach [8]. In the literature, many feature extraction approaches have been examined and analyzed in the context of various MRI, CT, and ultrasound images. Previous work has focused on designing worthy feature descriptors and ML methods for context learning from various types of medical images. These methods have drawbacks that limit the use of CAD-based medical diagnostic procedures. In this study, we focus on representation learning to address the shortcomings of CAD-based systems. Deep learning learns from image data using hierarchical feature representation, a form of representation learning technique [9]; high-level feature representations are produced from the image data itself. With the support of massively parallel architectures and GPUs, the deep learning approach has achieved great success in applications such as image recognition, object detection, speech processing, and many others [10]. Physicians classify patients' symptoms into one of several illness classes based on their understanding; learning such a categorization model for ovarian disorders is the learning challenge in this study. A broad classification approach was established through data analysis. Training data containing cases, such as objects or instances, is characterized using attribute vectors (features or variables), which may be quantitative or qualitative in nature. In supervised learning, mutually exclusive cases as well as class labels are employed for learning, with all cases of the same class described by the same attribute vector structure.
2. Literature Survey
Recent research [9] has demonstrated that combining genetic data with pathology images to diagnose tumors is very effective. For predicting breast cancer outcomes, researchers [10] combined pathology images and genetic data; a multiple kernel learning method was used to connect the heterogeneous data of the two modalities, achieving an accuracy of 0.8022 and a precision of 0.7273. M2DP, a multimodal task feature selection technique for cancer diagnosis, was introduced in [11]; the method was tested on a breast cancer benchmark and a lung cancer benchmark, with accuracies of 72.53% and 70.08%. In [12], the authors proposed various kernel strategies for forecasting lung carcinomas by combining genomic data with pathological image features, with an accuracy of 0.8022 [13]. Using a DL model, deep features were extracted from the gene and image modalities and then combined via weighted linear aggregation, giving a prediction accuracy of 88.07%. Recent developments in CNNs and other DL methods have profound implications for medical diagnostics. For histopathologic analysis of prostate cancer, the authors of [14] used a deep residual CNN; the model correctly classified image patches into benign and malignant at a coarse level 91.5% of the time. Using residual networks (the ResNet-50 architecture), study [15] presented a method for automatically classifying brain cancers; on a patient-by-patient basis, the model accuracy was 0.97. For classifying dermoscopy images, the authors of [16] utilized the deep GoogLeNet Inception model, with a precision of 0.677 [17]. DenseNet-161 and ResNet-50 were used in that study; the F-score of the DenseNet-161 model was 92.38%, while the accuracy was 91.57%. However, many tasks in medical applications rely on long-range dependencies [18]. RNN methods are the most popular approaches for learning longitudinal data in depth.
LSTM [19] is an RNN variant that captures both long- and short-term dependencies within sequential input. The F-score of the method was 0.8905 [20]. Table 1 depicts the comparative analysis of the proposed and existing techniques.
3. Research Methodology
This section discusses the proposed technique for ovarian cancer detection based on segmentation and classification using deep learning architectures. Overfitting and other errors might occur if the training sample is too small. To enhance classification accuracy, we increased the sample size in our study by manipulating images [22]. Image enhancement and rotation are examples of such manipulation. To enlarge the sample, we rotated the original input images from 0° to 270° in 90° steps around their center point. Our two groups of data produced two separate recognition models [23]: one used the original image dataset as training data without image segmentation, and the other utilized the augmented image dataset as training data, with a sample size 11 times larger than the original. Figure 2 depicts the architecture of this study process [24].
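The 0°-to-270° rotation augmentation described above can be sketched in NumPy; `np.rot90` rotates an array in 90° steps, and the helper name below is ours, not from the paper's code:

```python
import numpy as np

def augment_rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees
    around its centre, as in the augmentation step."""
    return [np.rot90(image, k) for k in range(4)]

img = np.arange(12).reshape(3, 4)      # toy 3 x 4 "image"
rotated = augment_rotations(img)

# 0 and 180 degree rotations keep the original shape; 90/270 swap the axes,
# and every rotation preserves the pixel values themselves.
assert rotated[0].shape == (3, 4)
assert rotated[1].shape == (4, 3)
assert sorted(rotated[2].flatten().tolist()) == sorted(img.flatten().tolist())
```

Combined with flips and shifts, four rotations per image are one simple way the paper's 11-fold sample enlargement could be assembled.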
The input image was divided into three categories: epithelial, germ, and stromal cells. The image is first preprocessed for noise reduction and filtering. It was then manually tagged and trained utilizing the standard training model [25]. This study used a neural network known as FaReConvNN to compensate for hand annotation. Using FaReConvNN, an object is detected using a trained image and a manually segmented image. Since the convolution is used to detect edges, both sets of features are annotated based on region, and image segmentation annotates the contextual features. The accuracy of disease detection utilizing computer-assisted diagnosis is higher than that of manual detection. Gaussian NB and SVC are utilized for classification once FaReConvNN is applied [26].
Through various processing techniques or combinations of them, such as random rotations, shifts, shears, flips, etc., image augmentation artificially generates training images. For useful deep learning models, the validation error must drop along with the training error, and data augmentation serves this purpose well. Because the augmented images represent a wider range of potential image configurations, the gap between the training and validation sets, as well as any upcoming testing sets, is minimized. The suggested method seeks to enhance segmentation outcomes by creating a new MRI image dataset from an existing one. This work specifically discusses the segmentation of the ovarian imaging collection: the segmentation task entails locating the pixels that belong to the ovarian cancer image and separating the nuclei from the surrounding tissue. Flipping an image horizontally or vertically rearranges the pixels while maintaining the features. Images may appear at a range of angles, though they are unlikely to be upside down, so each image may be rotated by a different amount; most of the image's pixel values then differ from the original values.
Incorrect pixel values randomly distributed throughout the image can also be used to add noise. Each image in the training set can be augmented using standard methods such as flips and rotations without requiring manual image processing. Batches of images are pulled from the directory by “ImageDataGenerator,” which then applies transformations like “vertical_flip,” “horizontal_flip,” or “rotation_range.”
The augmented images first go through preprocessing to improve them before computation. The preprocessing step primarily yields images prepared for various methods of image examination; it changes the applied image into a new one that is essentially identical, with a few minor differences. Preprocessing procedures include resizing, masking, segmentation, normalization, noise removal, and others. This study preprocesses the applied images by downsizing them and filtering the noise present in each image. Each image is resized to the default size of 300 × 300 pixels, and the resized images are then sent to the filtering process so that they produce better results.
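As a rough sketch of the resize step, a nearest-neighbour reduction to the 300 × 300 default size can be written in NumPy; the function name and toy input are ours, and production code would typically use a library resizer with interpolation:

```python
import numpy as np

def resize_nearest(image, out_h=300, out_w=300):
    """Nearest-neighbour resize to the paper's default 300 x 300 size."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return image[rows][:, cols]

img = np.random.rand(120, 90)                 # toy grayscale image in [0, 1]
resized = resize_nearest(img)
assert resized.shape == (300, 300)
```

Because nearest-neighbour sampling only selects existing pixels, the resized image keeps the original intensity range, which matters before any normalization or filtering stage.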
3.1. Cancer Detection Using FaReConvNN
3.1.1. Convolution Layer
A set of filters makes up the convolutional layer, and the values of these filters are the layer's learnable parameters. In CNNs, the goal of convolution is to extract features from an image while keeping the spatial relationship between pixels, learning features inside the image by using small, equal-sized tiles. Consider an M × N × 3 input image with K filters of size I × J in the first convolutional layer, where I ≪ M, J ≪ N, and 3 indicates the color channels. Every element of the input image and the filter matrix undergoes a mathematical operation, which yields the learned features [4]:

y_f^(l)(m, n) = Σ_i Σ_j w_{i,j}^{f,(l−1)} · x_{m+i−1, n+j−1},

where y_f^(l) is the output of layer l and w^{f,(l−1)} is the weight of filter f used at layer l − 1. In other words, the filter slides through all of the image's elements and multiplies each one, producing a single matrix called a feature map. The size of the feature map matrix is determined by the depth and stride.
Additionally, an activation function known as ReLU is commonly utilized to introduce nonlinearity into CNNs, allowing them to learn nonlinear models. This rectifier is the most commonly utilized approach since ReLU considerably enhances CNN object identification performance [27].
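A minimal NumPy sketch of the convolution-plus-ReLU step described above; the filter values here are illustrative only:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNNs):
    slide the I x J filter over the image and sum elementwise products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[0.0, -1.0], [1.0, 0.0]])       # toy difference filter
fmap = relu(conv2d(img, k))

assert fmap.shape == (3, 3)                   # (M-I+1) x (N-J+1) feature map
assert (fmap == 3.0).all()                    # constant response on this ramp image
```

The output shape illustrates how the feature-map size depends on the filter size (and, in general, the stride) mentioned in the text.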
3.1.2. Pooling Layer
Pooling is one of ConvNet's distinctive concepts, as previously noted. The pooling step's goal is to lower the dimensionality of every feature map, removing noisy, unnecessary convolutions and computation while keeping the majority of the critical information. There are several types, including max, sum, and average pooling, but max-pooling is the most popular and recommended [28]. In max-pooling, a spatial neighborhood is constructed, and the maximum unit is obtained from the feature map depending on the filter dimension, for example 2 × 2 windows. Figure 3 displays max-pooling with a 2 × 2 window and a stride of 2, which reduces the dimensionality of the feature map by picking the maximum of each region.
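The 2 × 2, stride-2 max-pooling of Figure 3 reduces to a small NumPy reshape, sketched here on a toy feature map:

```python
import numpy as np

def maxpool2x2(fmap):
    """2 x 2 max-pooling with stride 2: take the maximum of each block."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]            # drop odd edge rows/cols
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 1.],
                 [4., 6., 5., 0.],
                 [7., 2., 9., 8.],
                 [1., 0., 3., 4.]])
pooled = maxpool2x2(fmap)

assert pooled.shape == (2, 2)                          # dimensionality halved
assert (pooled == np.array([[6., 5.], [7., 9.]])).all()
```

Each output entry is the maximum of one 2 × 2 region, exactly the downsampling behavior described for the pooling layer.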
3.1.3. Fully Connected Layer
It comes directly before the output layer in a ConvNet and functions like a standard NN at the end of the convolutional and pooling layers. Each neuron in a fully connected layer is coupled to each neuron in the layer before the FC layer. The FC layer's goal is to utilize the preceding layer's output features to classify images using the training dataset. In essence, a CNN's fully connected layers act as a classifier, with the convolutional layer outputs serving as the classifier's input [29].
Figure 4 shows the unified structure of FaReConvNN and RPN. Modern object detectors have anchor boxes as a standard feature. A rectangular box is acquired for every object in an image during object detection, resulting in many boxes of varied shapes as well as sizes in every image.
The images are first separated into grids because medical images are typically quite large, and each grid must be labeled (done by specialists). A CNN [30] is trained on each grid. Each grid is provided with a mask that states “cancerous” or “noncancerous” when it is transmitted. Then, sliding through each grid, the NN is trained to recognize each grid's mask.
3.1.4. Region Proposal Network
The feature map is the RPN's input, while the output is a series of rectangular object proposals, each with an objectness score [31]. Selective search takes 2 seconds per image to propose regions, whereas the RPN takes only 10 ms. Anchor boxes with three aspect ratios and three scales are used by FaReConvNN; as a result, there are 9 anchor boxes for every pixel in the feature map. A simple convolution layer with a kernel size of 3 × 3 is followed by two FC layers in the architecture, and these fully connected layers are created with 1 × 1 convolutional layers. The classification layer's output size should be 2 × 9, whereas the regression layer's output size should be 4 × 9. Over the feature map, the total number of predictions [32] will now be (4 + 2) × 9 × (H × W).
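The anchor arithmetic above can be checked directly; the feature-map size H × W below is an assumed example, not a value from the paper:

```python
# Anchor-box bookkeeping for the RPN head (3 scales x 3 ratios = 9 anchors).
scales, ratios = 3, 3
k = scales * ratios              # anchors per feature-map pixel
cls_out = 2 * k                  # object / not-object scores per pixel
reg_out = 4 * k                  # box offsets (x, y, w, h) per pixel

H, W = 38, 50                    # assumed feature-map size for illustration
total_predictions = (4 + 2) * k * H * W

assert k == 9
assert cls_out == 18 and reg_out == 36
assert total_predictions == 102600
```

This makes concrete why the classification and regression heads have output sizes 2 × 9 and 4 × 9 per feature-map location.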
3.1.5. Loss Function
The loss function utilized in FaReConvNN is represented by the following:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*),

where p_i is the predicted probability that anchor i is an object, p_i* is the ground-truth label, t_i and t_i* are the predicted and ground-truth box offsets, and L_cls and L_reg are the classification and regression losses.
As previously stated, the regression offset is determined using the closest anchor box. Anchor boxes now act as region proposals, which relates to the region proposal technique. At training time, not all anchor boxes contribute to the loss. Positive labels are given to anchors with the highest IoU with a ground-truth box, or with an IoU overlap greater than 0.7. Anchors that are neither positive nor negative do not serve the training purpose, and anchors that cross image borders are also ignored [33].
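The IoU rule for labelling anchors can be sketched as follows; boxes are given as corner coordinates, and the threshold values mirror the text:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# An anchor is labelled positive when its IoU with ground truth exceeds 0.7.
anchor, gt = (0, 0, 10, 10), (1, 1, 10, 10)
assert abs(iou(anchor, gt) - 0.81) < 1e-9     # 81 / (100 + 81 - 81)
assert iou(anchor, gt) > 0.7                  # positive label
assert iou((0, 0, 2, 2), gt) < 0.3            # would get a negative label
```

Anchors falling between the positive and negative thresholds are simply ignored during training, as the text notes.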
Consider data with labels such that FaReConvNN finds a function fitting the data as closely as possible through the utility of three different parts. The activation function satisfies the following:
The objective function is obtained from (4) as follows:
The decision variables and constraints are given by the following:
Consider an affine approximation to the objective in (2) at the current iterate and the following update rule, in which the proximal term is expressed via the proximal operator corresponding to the regularizer. From (9) and (10), the proximity operator is defined as

prox_λ(u) = sgn(u) · max(|u| − λ, 0),

where sgn denotes the signum function. The resulting update is identical to the usual iterative shrinkage-thresholding algorithm (ISTA). Utilizing the ℓ1 gradient expression,
Equation (12) is expressed in the following form:
The activation function is defined as a linear combination of K Difference-of-Gaussian (DoG) functions, and the linear function is defined by the following equations:
The training dataset D contains N examples, where the random noise vectors are assumed to be identically distributed and independent. Let c be the coefficients of the LET activation in each layer [34]. By reducing the squared estimation error over all training examples, the optimal set of activation parameters c is obtained as follows:
The gradient of J(c) with respect to c is required for the optimization. Unless a very small step size is specified, optimization of J(c) utilizing vanilla gradient descent tends to diverge. We get around this problem by noting that the Hessian does not have to be computed directly: to train the network's parameters, only the Hessian-vector product is required. The search direction is determined at the i-th iterate c_i of Hessian-free optimization (HFO) by minimizing a second-order Taylor-series approximation J̃(c) to the actual cost J(c), where the search direction is selected optimally at each iteration by reducing a normalized quadratic approximation as in (17):
The dimension of the position vectors is d. r1 and r2 are two random values in the range [0, 1], β is a constant, and Γ(x) = (x − 1)!. Equation (18) is used to determine the fitness function here:
The level set function ϕ is a surface defined over the image space that is positive inside the region Ω and negative outside it. The level set equation in its most general form is as follows:

∂ϕ/∂t + F|∇ϕ| = 0,

where F is the speed function normal to the level set surface.
3.1.6. Training
A CNN should be trained on a large database of images in order to attain low error rates. Backpropagation is utilized to train the CNN by computing the gradient required for updating the network's weights. Depending on which layer is being trained, there are a number of different steps to train the CNN [35].
The backpropagation mechanism is used in the FC layer. The error or cost function E(y^L) at the output layer must first be estimated using the squared-error loss function:

E(y^L) = (1/2) Σ_k (t_k^n − y_k^L)²,

where t_k^n is the target of class k for the n-th training example, and y_k^L is the actual output from the last layer.
The derivative of the error function is the partial derivative with respect to the output layer [25], as shown in the following:
For every input to the current neuron, a quantity usually known as the delta must be determined:

δ_j^l = ∂E/∂x_j^l = σ′(x_j^l) · ∂E/∂y_j^l,

where σ is the ReLU function applied to x_j^l, the input to the current neuron. After this is done for all neurons, the errors for the previous layer must be calculated:

∂E/∂y_j^{l−1} = Σ_k w_{jk}^l · δ_k^l,

where w_{jk}^l is the weight connecting input j to neuron k in the next layer. Then, until the input to the first fully connected layer is reached, (7) and (8) are repeated, training the network's higher reasoning [36], or dense layers, on one training sample. Equation (9) represents the change in weight, which is added to the old weight:

Δw_{jk}^l = −η · δ_k^l · y_j^{l−1},

where η is the learning rate.
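A toy single-sample gradient step for the FC layer, following the squared-error and ReLU derivation above; all sizes and the learning rate are illustrative:

```python
import numpy as np

# One gradient step for a single fully connected layer on one sample.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)               # input to the FC layer
W = rng.standard_normal((3, 4))          # weight matrix
t = np.array([1.0, 0.0, 0.0])            # one-hot target

z = W @ x                                # pre-activation
y = np.maximum(z, 0.0)                   # ReLU output
E = 0.5 * np.sum((t - y) ** 2)           # squared-error cost

delta = (y - t) * (z > 0)                # delta at the output neurons
grad_W = np.outer(delta, x)              # dE/dW for every weight
eta = 0.1
W_new = W - eta * grad_W                 # weight update with learning rate eta

E_new = 0.5 * np.sum((t - np.maximum(W_new @ x, 0.0)) ** 2)
assert grad_W.shape == (3, 4)
assert E_new <= E                        # the step does not increase the error
```

With a small enough learning rate the update moves each active output toward its target, which is why the error cannot grow here.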
3.1.7. BackpropagationMax Pooling Layers
Backpropagation in convolutional layers differs from that in the FC layer. In FC layers, the gradient for each weight must be modified only for the current layer, but since a convolutional layer shares weights, every expression involving a given weight must be included. The gradient component for an individual [37] weight is computed using the chain rule as

∂E/∂w_{ab} = Σ_i Σ_j (∂E/∂x_{ij}) · (∂x_{ij}/∂w_{ab}).

This entails calculating the effect on the loss function E of a single change in a weight of the kernel, where
This is evaluated by utilizing chain rule again as in the following:
Since the error at the current layer is already known, the deltas may be simply determined by calculating the derivative of the activation function; for ReLU, this derivative is one or zero, except at zero where it is not defined [38]. Following that, the error must be transmitted to the preceding layer. This is accomplished once more by utilizing the chain rule, as shown in the following equation:
This equation represents a convolution in which the kernel has been flipped along both axes. It is also worth noting that this will not work for values at the top and bottom borders of the feature map.

3.1.8. SVC (Support Vector Classifier)
Consider a data set of patterns grouped into a positive class and a negative class. The separating hyperplane is then considered as follows:

w · x + b = 0,

where w is the weights vector and b is a scalar.
The problem of finding the hyperplane H0 is the same as finding the optimal separating plane with the biggest margin value, which is expressed as follows:

min (1/2)‖w‖², subject to y_i(w · x_i + b) ≥ 1.
The linear case in (29) nicely demonstrates classification when the data can be divided into two types. When the data are not linearly separable, a slack variable ξ_i must be added so that y_i(w · x_i + b) ≥ 1 − ξ_i is obtained.
The problem in equation (29) can then be solved by minimizing the following:

min (1/2)‖w‖² + C Σ_i ξ_i,

with constraints y_i(w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
The parameter C balances the model's complexity against the training errors. The optimization problem in equations (11)–(13) is written as an unconstrained optimization [39] problem in (32) using the Lagrange function, as follows:
The nonnegative variables α_i ≥ 0 are known as Lagrange multipliers. The goal of (14) is to minimize L_p with respect to w and b while simultaneously maximizing L_p with respect to α. The dual problem of (14) can be solved by setting the partial derivatives of L_p with respect to w, b, and ξ to zero, subject to the corresponding constraints.
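The soft-margin objective with slack variables can be evaluated on a toy separable set; the hyperplane below is a hand-picked candidate for illustration, not a fitted one:

```python
import numpy as np

# Soft-margin SVC objective 0.5*||w||^2 + C * sum(xi) on a toy 2-D set,
# with slack xi_i = max(0, 1 - y_i(w.x_i + b)) as in the formulation above.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([0.5, 0.5]), 0.0         # hand-picked candidate hyperplane
C = 1.0

margins = y * (X @ w + b)                # y_i (w.x_i + b) for each point
slack = np.maximum(0.0, 1.0 - margins)   # slack variables xi_i
objective = 0.5 * (w @ w) + C * slack.sum()

assert (margins >= 1.0).all()            # every point satisfies the margin
assert objective == 0.25                 # no slack needed: only 0.5*||w||^2
```

When a point violated the margin, its slack term would grow and C would control how heavily that violation is penalized relative to the margin width.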
3.1.9. Gaussian NB (Naives Bayes)
Naïve Bayes is a simple and quick classification method that relies on Bayes' theorem, expressed by the following:

P(y | x) = P(x | y) P(y) / P(x).
This classifier assumes that each variable contributes to the outcome independently and with equal weight. Since the assumption that every feature is independent of the others rarely holds, plain Naïve Bayes often yields only poor accuracy on real-world problems. As a result, Gaussian NB is utilized, which assumes that the features follow a normal distribution: the conditional probability of each feature is presumed Gaussian. Equation (36) gives the Gaussian NB likelihood as follows:

P(x_i | y) = (1 / √(2πσ_y²)) · exp(−(x_i − μ_y)² / (2σ_y²)).
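Equation (36) can be sketched directly: a minimal Gaussian NB classifier built from per-class means and variances. The class statistics and labels below are assumed toy values, not the paper's data:

```python
import numpy as np

def gaussian_likelihood(x, mu, var):
    """P(x | y) under the Gaussian assumption of equation (36)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gnb_predict(x, stats, priors):
    """Pick the class maximizing prior * product of per-feature likelihoods."""
    scores = {}
    for cls, (mu, var) in stats.items():
        scores[cls] = priors[cls] * np.prod(gaussian_likelihood(x, mu, var))
    return max(scores, key=scores.get)

# Toy per-class (mean, variance) statistics for two features.
stats = {"benign": (np.array([1.0, 1.0]), np.array([0.5, 0.5])),
         "malignant": (np.array([4.0, 4.0]), np.array([0.5, 0.5]))}
priors = {"benign": 0.5, "malignant": 0.5}

assert gnb_predict(np.array([1.2, 0.8]), stats, priors) == "benign"
assert gnb_predict(np.array([3.9, 4.2]), stats, priors) == "malignant"
```

In practice the per-class means and variances are estimated from the training features extracted by FaReConvNN rather than fixed by hand.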
4. Performance Analysis
This section compares the suggested strategy for ovarian cancer diagnosis to existing methods and analyzes the performance. The model's performance is represented by a confusion matrix that includes true negatives, true positives, false negatives, and false positives.
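Precision, recall/sensitivity, and specificity follow directly from the confusion-matrix counts; the counts below are hypothetical, chosen only to illustrate the formulas and not taken from the paper's data:

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall/sensitivity and specificity from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity
    specificity = tn / (tn + fp)
    return precision, recall, specificity

p, r, s = metrics(tp=95, fp=4, tn=96, fn=5)
assert round(p, 4) == 0.9596       # 95 / 99
assert round(r, 2) == 0.95         # 95 / 100
assert round(s, 2) == 0.96         # 96 / 100
```

These are the three quantities reported for SVC and Gaussian NB in the figures and tables that follow.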
4.1. Database Description
The suggested classifier was tested on single-cell blood smear samples obtained from the Cancer Imaging Archive database [40]. Cropped sections of epithelial cells, germ cells, and stromal cells can be found in the Cancer Imaging Archive database. The database's grey-level attributes are virtually identical to those of comparable collections, but with a larger dimension.
The confusion matrix of Gaussian NB employing FaReConvNN is shown in Figure 5, where rows indicate the predicted class and columns represent the actual class of the ovarian cancer data. Correctly and erroneously classified samples are represented by the diagonal blue and off-diagonal white cells, respectively. The right-hand column represents the performance of each predicted class, whereas the bottom row reflects the performance of each actual class [41]. This confusion matrix plot for Gaussian NB utilizing FaReConvNN reveals that the overall classification performance is 98.69% correct.
The confusion matrix of SVC employing FaReConvNN is shown in Figure 6, with rows and columns indicating predicted and actual classes. This SVC confusion matrix plot utilizing FaReConvNN reveals that the total classification accuracy is 97.39%. The predicted class is represented by the right-hand column, while the performance of each actual class is represented by the bottom row. To make the performance easier to examine, zeroes are included [42]. This confusion matrix shows that a few pairs are frequently misidentified. The analysis of SVC as well as Gaussian NB with various specifications is shown in Table 2.
A graphical depiction of the metrics for Gaussian NB as well as SVC utilizing FaReConvNN is shown in Figure 7. Precision, recall/sensitivity, and specificity are the metrics determined, in percent. SVC precision is 95.96%, whereas Gaussian NB precision is 97.7%, with FaReConvNN enhancing the precision of Gaussian NB. For recall/sensitivity, SVC achieves 94.31% and Gaussian NB 97.7%, while for specificity, SVC achieves 97.39% and Gaussian NB 98.69% using FaReConvNN. As discussed above, the Gaussian NB technique in classification utilizing FaReConvNN delivers an improved predicted class for ovarian cancer detection. The parametric values obtained by the various approaches are compared in Table 3.
Figure 8 compares existing and proposed techniques in terms of precision, recall, and specificity. CNN [43] has a precision of 81.91%, DCNN 89.19%, KNN 78.45%, SVC 95.96%, and Gaussian NB 97.7%. CNN has a recall of 79.02%, DCNN 88.28%, KNN 74.19%, SVC 94.31%, and Gaussian NB 97.7%. CNN has a specificity of 82.93%, DCNN 91.91%, KNN 75.33%, SVC 97.39%, and Gaussian NB 98.69%. SVC and Gaussian NB with FaReConvNN have outperformed all other approaches considered.
5. Conclusion
In comparison to existing methodologies, the classification method combining SVC as well as Gaussian NB with FaReConvNN delivers a precision value of more than 95%, according to the performance analysis. Utilizing the proposed FaReConvNN, 97% to almost 99% precision was obtained for the predicted class with this classification technique. Based on these results, it can be stated that this OC detection and classification method is a significant contribution to the medical sector, assisting clinicians in making more precise decisions and treating patients more effectively. There remains scope for further research, particularly in experimenting with different deep learning models and optimizing their hyperparameters to achieve promising and trustworthy results. The intermediate results of the CNN can also be analyzed, and further inferences can be derived for future research.
Data Availability
The data that support the findings of this study are available on request from the corresponding author.
Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Acknowledgments
The authors extend their appreciation to Taif University for supporting the current work through Taif University Researchers Supporting Project number (TURSP-2020/257), Taif University, Taif, Saudi Arabia.