Abstract
Alongside the rapid development of emerging fields such as computer vision and its related technologies, e.g., iris recognition and glaucoma detection, the methods used by criminals are also evolving. Glaucoma, which damages the eye's optic nerve, is a leading cause of blindness in humans. Fundus photography is carried out to examine this eye disease, and medical experts evaluate the resulting photographs through a time-consuming visual inspection. Most current systems for automated glaucoma detection in fundus images rely on segmentation-based features shaped by the underlying segmentation methods. Convolutional neural networks (CNNs) are powerful tools for image classification because they can learn highly discriminative features from raw pixel intensities; however, their applicability to medical image analysis is limited by the scarcity of the large annotated datasets required for training. In this work, we aim to accelerate diagnosis through computer-aided detection of this severe disease using transfer learning based on deep convolutional neural networks. We propose an Inception V-3 approach for image classification built on convolutional neural networks. The developed model has the potential to address the classification-accuracy limitations of existing CNN models, and on our imaging data the proposed method outperforms recent state-of-the-art approaches. Because such case studies in digital forensics are an essential component of emerging technologies, glaucoma detection plays a vital role in them.
1. Introduction
Glaucoma is a group of eye conditions that damage the optic nerve, whose health is vital for good vision. This damage is often caused by abnormally high pressure in the eye. Glaucoma is a leading cause of blindness; it can occur at any age but is more common in adults over 60 years of age. Several forms of glaucoma have no warning signs, although a few symptoms can indicate the possibility of the disease. If the disease is recognized early, vision loss can be slowed or prevented entirely [1].
Deep learning is a modern technique with much promise in ophthalmology. Deep learning tools have been used to analyze ocular images, optical coherence tomography, and visual fields, among other diagnostic modalities. These methods have proved effective at detecting cataracts, glaucoma, age-related macular degeneration, and diabetic retinopathy, among other diseases. Deep learning methods are emerging quickly and can increasingly be applied in ophthalmic care [2].
Convolutional neural networks (CNNs) have become a popular method for solving computer vision problems, including image recognition and semantic segmentation. These methods are powerful enough to learn deep image properties that are often overlooked yet essential for the target task, avoiding intermediate steps such as segmentation or feature design and selection [3]. However, this benefit comes at the expense of requiring extremely large, annotated training sets [4]. Ting et al. [5] released a research report on eye disorders such as diabetic retinopathy, glaucoma, and age-related macular degeneration. In their research, they compiled a list of papers published between 2016 and 2018 and collected publications that used fundus and optical coherence tomography images together with TL methods. They did not include more recent publications that used TL approaches, and the diagnosis of eye cataract disease was outside their scope.
Similarly, Hogarty et al. [6] reviewed existing publications in ophthalmology that applied AI, but their survey did not cover systematic AI methodologies. Hagiwara et al. [7] reviewed work on using fundus images to detect glaucoma efficiently, concentrating on computer-assisted systems and optic disc segmentation systems. A range of research works has used DL and TL approaches to diagnose glaucoma.
The most popular approach to studying the human eye in ophthalmology is to acquire and evaluate an eye-fundus image. In this type of examination, a fundus camera captures the back of the eye through the pupil. Visual analysis is the common way of examining these images; in medical screening, this procedure can take many hours in front of a computer screen. We aim to speed up the diagnosis by using computational algorithms to examine the photos and to find and highlight the most relevant information, with the long-term goal of diagnosing anomalies and illnesses without human involvement. Popular image processing methods that were built and validated on low-resolution images have shown drawbacks in clinical use because the spatial resolution of fundus images keeps increasing. A new generation of approaches must therefore be created: techniques that can operate on high-resolution images while remaining computationally simple. We present a novel vessel segmentation approach with low computational requirements, together with a publicly accessible high-resolution fundus database with manually created gold standards for testing retinal structure segmentation methods.
Retinal vessel segmentation is a complicated process that has attracted researchers from all around the world for years, and many different algorithms have been proposed during this period. Segmentation algorithms fall into two main types: unsupervised and supervised. Unsupervised methods use heuristics to identify vessels, while supervised methods directly learn a decision scheme from prelabelled data used as the gold standard. Since supervised methods need a broad training set for each camera configuration, we concentrate on heuristic methods. Heuristic techniques, on the other hand, require a set of criteria tailored to the camera configuration; as a result, they are far less dependent on the test dataset in production. A more detailed look at segmentation and other algorithms used in retinal image processing follows.
Model-based approaches are another popular segmentation technique. Active contour-based techniques, level sets, and related approaches are the most well known and frequently employed. Early snake-based algorithms begin with a rough object contour that is iteratively optimized by multiple forces; in the ideal case, the forces reach their equilibrium on the object boundaries. These approaches are sensitive to parameterization, and they may encounter difficulties when segmenting thick and thin vessels simultaneously. As a result, the user must manually set and tune the parameters.
The authors of [8] proposed, in another report, an automated solution for detecting and classifying this chronic condition using the Cup-to-Disc Ratio (CDR). We may therefore infer that classifying the retina in an image can be complicated if the retinal regions are not targeted when extracting features. Once the image features have been identified, no special expertise is needed to pick the discriminative ones. Consequently, we may infer that identifying glaucoma using CADx devices is a challenging job for ophthalmologists.
The matched-filter method was one of the first and most popular methods for fundus photographs. It enhances vessels by applying predefined vessel profiles of various sizes and orientations to the picture, and early implementations relied on a simple thresholding step to obtain the vessel segmentation. These techniques were occasionally combined with other operations. The matched filters yield high-quality results, but their main drawback is that they require vessel profiles and large-region comparisons for every pixel in the image, which takes a long time to compute. The consistency and size of the vessel profile database used have a direct effect on segmentation performance; the profiles depend on the imaged population, camera configuration, and even eye or vascular disorders, which restricts their applicability. Some algorithms are designed to segment only one or a few objects identified by a user or a preprocessing step. These approaches typically do not look at the whole picture but only at the area around the already segmented regions. Region-growing methods attempt to extend the segmented area to neighbouring pixels based on similarity and other parameters. These techniques are the simplest, but they can run into problems in parts of the image where the vessels have poorer contrast against the underlying tissue, such as vessel ends or small vessels; in this situation, the growing region can segment large unwanted areas. In these cases, vessel-tracking algorithms are more reliable. They track vessels by looking for a vessel-like structure next to the already segmented area. These algorithms are robust at identifying vessel endings, but they may fail at bifurcations and vessel crossings, where the local structures no longer resemble typical vessels.
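To make the matched-filter idea concrete, the following is a minimal sketch (not the exact pipeline of any cited work) of oriented matched filtering on the green channel of a fundus image, implemented with OpenCV and NumPy; the kernel length, sigma, number of orientations, and file path are illustrative assumptions.

```python
import cv2
import numpy as np

def matched_filter_kernel(sigma=2.0, length=15, angle_deg=0.0):
    # One oriented matched-filter kernel: an inverted Gaussian cross-section
    # (vessels are darker than the background) stretched along the vessel
    # direction and then rotated to the requested angle.
    half = length // 2
    xs = np.arange(-half, half + 1, dtype=np.float32)
    profile = -np.exp(-(xs ** 2) / (2.0 * sigma ** 2))   # dark-vessel profile
    kernel = np.tile(profile, (length, 1))               # constant along the vessel axis
    kernel -= kernel.mean()                              # zero-mean response
    rot = cv2.getRotationMatrix2D((float(half), float(half)), angle_deg, 1.0)
    return cv2.warpAffine(kernel, rot, (length, length))

def enhance_vessels(green_channel, n_angles=12):
    # Per-pixel maximum response over all kernel orientations.
    img = green_channel.astype(np.float32)
    responses = [cv2.filter2D(img, -1, matched_filter_kernel(angle_deg=a))
                 for a in np.linspace(0.0, 180.0, n_angles, endpoint=False)]
    return np.max(responses, axis=0)

# Hypothetical usage: vessels have the best contrast in the green channel.
# fundus = cv2.imread("fundus.png")
# vessel_map = enhance_vessels(fundus[:, :, 1])
# norm = cv2.normalize(vessel_map, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
# _, mask = cv2.threshold(norm, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```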
Categorization is a process in which things (objects, ideas, or people) are grouped according to their similarities or shared criteria (classes, types, and indices). It allows people to simplify and organize their understanding of the things, activities, and ideas around them. Categorization is something humans share with many animals: making the most of what is perceived. It depends on characteristics that range from those of a non-member to those of a typical member of a class. Categorization is essential in learning, assessment, inference, decision-making, language, and the other ways in which animals interact with their environments [9].
An ANN is not programmed for a general task. Instead, it is trained thoroughly on datasets, called training datasets, before new patterns are presented as inputs. The patterns it then recognizes may be an exact match or simply a prediction obtained after the training session. ANNs can detect trends in measurements from real systems, computer programs, physical models, or other sources. An appropriately designed network, with its hidden layers, can handle much of this detail. The concept of the ANN is inspired by the human brain [10].
Deep learning, a subset of machine learning, relies on representations of the input and output: variables, functions, and the underlying basics [11]. It distinguishes meanings from lower to higher levels of abstraction and is designed to handle data including text, photographs, and audio. It is a branch of ANNs with many hidden perceptron layers. Furthermore, backpropagation is used in ANNs to determine the error between the actual and desired output. Each incoming connection carries a weight and an input value, and the output of each unit is a function of the summed, weighted unit values [12]. The abbreviations and acronyms used in this work are shown in Table 1.
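As a small illustration of the point about weighted connections (not code from the paper), the following sketch computes one unit's output as a sigmoid of the weighted sum of its inputs plus a bias; the numeric values are arbitrary.

```python
import numpy as np

def unit_output(inputs, weights, bias):
    # Output of one artificial neuron: a function (here, a sigmoid)
    # of the weighted sum of its inputs plus a bias term.
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative values.
print(unit_output(np.array([0.2, 0.7]), np.array([0.5, -1.3]), 0.1))
```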
1.1. Motivation
As already stated, DL and TL techniques offer several advantages, and in recent years many researchers have applied these methods. Overall, relatively few review articles on clinical databases discuss all forms of glaucoma detection at once. This study is therefore an essential and needed step towards the diagnosis of glaucoma.
1.2. Digital Forensics and Computer Vision
Nowadays, digital content is widely available and easily redistributed, whether legally or illegally [13]. For instance, after photographs are shared on the Internet, other web users can alter them and then publish the modified versions, resulting in near-identical images [14]. The presence of near duplicates has a significant impact on the performance of search engines. Computer vision is the process of automatically extracting, analyzing, and understanding useful information from digital images. Its primary use is image understanding, which encompasses various tasks, including feature extraction, object detection, object recognition, image cleaning, and image transformation. CNNs have consistently performed well in various computer vision tasks that previously needed visual inspection [15, 16]. As a result of these impressive results, researchers have begun to use CNNs to solve image forensic challenges. However, image classification approaches in computer vision typically rely on visually discernible cues, whereas forensic solutions rely on imperceptible traces that are often highly delicate and hidden in the fine details of the image under investigation. As a result, training a CNN for a forensic task requires extra caution, as standard processing procedures can significantly impair forensic traces.
1.3. Organization of the Paper
This paper is organized as follows: Section 1 provides the introduction to the research. Section 2 provides a literature review of related work, giving a concise and clear account of past studies relevant to ours. The methodology is discussed in Section 3, together with the proposed solution and its implementation. Section 4 contains the results and discussion, where we present our overall findings with result tables and performance graphs. The conclusion section summarizes our investigation, offers concluding remarks, and provides some useful hints for future research directions.
2. Literature Review
The advances achieved in information technology and the demands of image processing are spreading into many applications. Image processing comprises many techniques, and image classification is one of them. Researchers have used different methods for image classification, but there is no single definitive solution. The target of image classification is the foreground of an image, while the background can be regarded as the problem region. Classification is the way to separate a picture into various sections or stages so that the region of interest can be extracted from the image [17]; once attention moves away from a region, classification at that stage can be avoided. Conditional random fields have been used in sketch and classification tasks to boost the state of the art. A selective-search object recognition method applied at multiple locations in a single image gave rise to the OverFeat system for localization, identification, and boundary classification with convolutional networks [18], while skin-colour charts have been used for colour analysis of face position through region grading. FaceNet was used to learn a mapping from face pictures in order to compute facial similarity.
With such tools, ophthalmologists can conduct the eye screening procedure more effectively and reach a more decisive outcome, resulting in a more cost-effective approach for patients. Taking an in-depth research approach to glaucoma will save both time and money. Since ophthalmologists need to distinguish between glaucomatous and healthy retinal images, computer-aided diagnostics (CADx) methods should be introduced in clinics and hospitals to discriminate between the two.
The decision tree classifier (C4.5), the Naive Bayes classifier, and random forests are examples of current machine learning methods. These are insufficient to diagnose glaucoma with high precision. In the cited report, the authors developed a CNN architecture based on the deep learning methodology and determined which network is best for identification after analyzing performance in terms of precision, recall, and F-score. Deep learning methods may thus be used to distinguish between nonglaucoma and glaucoma patterns for diagnostic decisions, resulting in higher precision.
Reference [19] published a comprehensive analysis of automated approaches to diabetic eye disease identification from several viewpoints, including (i) accessible databases, (ii) image preprocessing methods, (iii) deep learning models, and (iv) output assessment metrics. As seen in Figure 1, the survey offered a detailed review of eye disease identification strategies, including state-of-the-art approaches in the field.

Instead of using a segmentation technique, image characteristics computed across the whole retinal area may be used to diagnose glaucoma. Otherwise, cup and disc boundaries must be correctly identified with a procedure that assumes the disorder induces morphological changes in the retina. Without requiring deep domain knowledge in image processing, the essential features can be extracted automatically and in detail.
Acharya et al. [13] used a sequence of Radon, discrete wavelet, and discrete cosine transforms to derive features from retinal images. On the Messidor data (1200 fundus photos divided into three sets of 400), a decision tree graded DME with the best results. Each set is split into four 100-image subsets, each with a corresponding Excel file containing the diagnosis.
Yu et al. [20] concentrated on offline classifiers to increase numerical performance when classifying pixels, while Chudzik et al. [21] highlighted the significance of layer freezing and transfer learning. Using transfer learning, CNNs can acquire lower-level features from public databases, which compensates for data scarcity. Hatanaka et al. [22] used a two-step DCNN to detect MAs while filtering false positives. Natural language processing (NLP) was used by Dai et al. [23] to compensate for weakly supervised images. During training, a multisieving system was used: the CNN links the supervised evidence in clinical reports to the positions of lesions in retinal photographs and then employs these newly acquired associations to detect lesions in test photos without the assistance of clinical documents.
The authors of [24] wrote a review paper on fundus image classification methods to detect glaucoma using multiple ML approaches and techniques. The authors of [25] provided a systematic review of the articles using CNNs for medical image analysis published in the medical literature before 2019. Articles were screened based on the following items: type of image analysis approach (detection or classification), algorithm architecture, dataset used, training phase, test, comparison method (with specialists or otherwise), results (accuracy, sensitivity, and specificity), and conclusion. The authors of [26] used five publicly available open-access databases (retinal fundus images, OCT images, skin lesion images, and pediatric and adult chest X-ray images) and fed them into the neural architecture search system hosted by Google Cloud AutoML, which automatically built a deep learning architecture to identify standard findings. The authors of [27] used small medical image datasets for classification with the RF algorithm and deep ensembles. The authors of [28] worked on a deep learning convolutional network based on Keras and TensorFlow using Python for image classification. They used several different medical images as a dataset to diagnose eye diseases spanning four classes: diabetic retinopathy, glaucoma, myopia, and normal. CNN, VGG16, and InceptionV3 network structures were compared individually and together using a bagging ensemble to diagnose eye diseases, and all experiments were carried out and the results obtained. The authors of [29] worked on eye disease classification using backpropagation with a parabola learning rate, achieving 89.83% classification accuracy in their experiment. Table 2 summarizes the related works.
3. Problem Formulation
The state-of-the-art literature has shown glaucoma detection using various datasets of fundus images. Different authors have used different datasets for different purposes, but a comprehensive collection of fundus images for classifying healthy and unhealthy eyes is still missing. After selecting such images, applying deep learning models in a transfer learning setting can give good results in terms of classification and reasonable statistical measures. The key contributions in the literature and their approaches are summarized in Table 2.
4. Materials and Methods
We reviewed many machine learning methods applied to fundus photos by various researchers in the literature. C4.5, the Naive Bayes classifier, and random forests are examples of current machine learning methods, and they are insufficient to diagnose glaucoma with high precision. In this analysis, we developed a CNN for glaucoma diagnosis using a deep learning framework and determined which network is better for identification after evaluating them by accuracy, as shown in Figure 2.
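A minimal sketch of the transfer-learning setup described here, assuming an ImageNet-pretrained Inception V-3 backbone with a small binary classification head; the 224 x 224 input size follows the resizing described later in this section, while the head layers, optimizer, and loss are illustrative assumptions rather than the paper's exact configuration.

```python
import tensorflow as tf

# Pretrained Inception V-3 backbone reused as a fixed feature extractor.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

# Small classification head: healthy vs. unhealthy fundus images.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```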

Convolution layers need not be connected to the input data, such as raw pixel values; they can instead take the output of other layers. Stacking convolution layers allows a hierarchical decomposition of the input. Filters that operate directly on raw pixel values learn low-level characteristics such as lines, while filters fed by the output of these first layers combine lines into shapes; still deeper layers can extract higher-level concepts such as houses, animals, or faces. In this way, the representational power of the network can be improved if we want the ANN to abstract both low-level and high-level features [31], and this can be trained in a timely fashion. The message is that the layers closest to the input learn low-level components such as lines and shapes, whereas the layers closest to the output learn composite concepts such as chairs and tables.
The core building blocks of convolutional neural networks are the convolutional layers. The convolutional layer, which gives the network its name, is the heart of the CNN and carries out the convolution operation. A CNN cannot be reduced to a single filter; several filters are applied in parallel to the same input, producing a collection of results. These layers often aim to keep the same spatial dimensions as the input. Local features tend to be captured by convolutional layers [32]. With, for example, 32 or even 512 filters collecting features from an object, or equivalently many ways of looking at the input, these layers provide many parallel views of the input data. This versatility allows flexibility during training: the network can learn not only generic lines but also very specific ones.
Colour images usually have multiple channels, typically one for each colour channel: red, green, and blue. From a data point of view, this means that a single image given as input to the model is actually three arrays. The depth of a filter must match the number of channels in its input: when an image has 3 channels, a filter applied to that image must also have a depth of 3. A 3 × 3 filter applied to such an image therefore has shape 3 × 3 × 3 (rows, columns, and channels). The filter is applied through a dot-product operation that yields a single value from the input patch and the filter across the full depth. If a convolution layer contains 32 filters, the filters are not only two- but three-dimensional, with separate filter weights for all three channels [27], and each of the 32 filters produces one of the 32 resulting feature maps.
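The relationship between channels, filter depth, and feature maps can be checked with a short Keras snippet; this is a sketch under the assumption of a 224 x 224 RGB input, and the filter counts are illustrative.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),            # rows, cols, channels
    tf.keras.layers.Conv2D(32, (3, 3), padding="same",     # 32 filters, each 3x3x3
                           activation="relu"),
    tf.keras.layers.Conv2D(64, (3, 3), padding="same",     # deeper layer combines
                           activation="relu"),              # the 32 feature maps
])
model.summary()   # first Conv2D output shape: (None, 224, 224, 32), i.e. 32 feature maps
```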
Resizing is an operation in which the number of pixels is decreased or increased; the scaling of images is referred to as resizing. When an image is resized, a new picture is produced with a higher or lower number of pixels. There are two different types of resizing: zooming and shrinking. Zooming is the process by which an image is enlarged so that it is visible in detail; the image should be scaled in such a way that its precision is not affected. Shrinking the image means deleting pixels from columns and rows that have little effect on the image details. By running a Python script from the command line with the dataset path, we resized the images to the Inception V-3 input size of [224, 224] and the VGG16 input size of [224, 224]. We divided the 1563 photos into eleven datasets, each labeled with a folder name and holding the images of its group.
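A minimal sketch of such a resizing script follows; the directory names are hypothetical, and the 224 x 224 size matches the input size given above.

```python
import os
import cv2

def resize_dataset(src_dir, dst_dir, size=(224, 224)):
    # Resize every readable image in src_dir and write it to dst_dir.
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = cv2.imread(os.path.join(src_dir, name))
        if img is None:                      # skip files that are not images
            continue
        cv2.imwrite(os.path.join(dst_dir, name), cv2.resize(img, size))

# Hypothetical invocation, e.g. from a script run on the command line with the dataset path:
# resize_dataset("fundus_raw/healthy", "fundus_224/healthy")
```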
We applied this technique with the Inception V-3 and VGG16 backbones. Data augmentation explicitly increases the amount of data available to the training model: it expands the training dataset by creating modified versions of the images in the same dataset. In a deep learning neural network, Keras provides this capability for fitting models through data augmentation. Training deep neural networks on additional data generally produces more skilful models, and the image variations created with this technique improve the ability of the fitted models to generalize what they have learned to new pictures. The Keras deep learning library can fit models on augmented image data using its image data generator. The technique feeds the network with images modified by padding, cropping, and horizontal flipping, and most data augmentation is applied to 2D images. Random samples extracted from the dataset, labeled as healthy and unhealthy, are shown in Figure 3.
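A hedged sketch of this augmentation step with the Keras ImageDataGenerator; the directory name, split ratio, and transformation ranges are illustrative assumptions, not the paper's exact settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,          # normalise pixel intensities
    horizontal_flip=True,       # horizontal flipping
    width_shift_range=0.1,      # small shifts approximate padding/cropping
    height_shift_range=0.1,
    zoom_range=0.1,
    validation_split=0.2,
)

train_gen = datagen.flow_from_directory(
    "fundus_224",               # hypothetical folder with healthy/unhealthy subfolders
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
    subset="training",
)
```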

5. Results and Discussion
Another property is the feature-map plane: all neurons within the same plane share the same weights. This map uses the sigmoid function, which, in short, serves as the activation function of the CNN. Because neurons in a mapping plane share weights, the multilayer structure can be used for 2D image processing. CNNs rely on two alternating operations: convolution and subsampling. The first operation applies filters to produce the convolutional layer [33]; after scalar weighting and the addition of a bias, the second operation combines pixels from local neighbourhoods to generate a new feature.
Input pictures are commonly colour or greyscale with dimensions such as 96 by 96 or 32 by 32 pixels. As described above, the convolutional layer that gives the network its name executes the convolution process, often keeping the same spatial size as its input, and tends to learn local characteristics [32]. With 32 or 512 filters collecting features from an object, such layers provide many parallel ways of observing the input data, allowing the network to learn not just generic lines but precise ones fitted to the training data. Each filter is applied through a dot product that yields a single value across the full input depth, so the 32 filters of a convolution layer are not only two- but three-dimensional, with separate weights for all three channels. Convolution layers may also be connected not to the raw pixel values but to the output of other layers, and this stacking allows a hierarchical organization of the results: filters operating directly on raw pixels learn low-level characteristics such as lines, while filters fed by the output of these first layers combine lines to represent shapes, as shown in Figures 4 and 5.

The average pooling layer determines rectangular pooling regions through its pool-size argument; if the pool size has a height of 2 and a width of 3, the layer returns the average value over each such region. We also used ordinary pooling layers, in which the input feature map is down-sampled, whereas global pooling summarizes the whole feature map in a single value. The layers we refer to as FC layers flatten the matrix into a vector and feed it, as in a conventional neural network, into fully connected layers. For detecting structure in the data, these layers are no worse than fully connected structures, and the characteristics of a convolution layer and of a connected layer must be learned independently. To keep this simple, consider a square input image with c channels and a square convolution kernel of size k, as illustrated in Figure 6.
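To ground the pooling and fully connected (FC) terminology, here is a minimal Keras sketch; the layer sizes are illustrative assumptions, not the paper's exact configuration: an average pooling layer over 2 x 3 regions, followed by flattening and dense FC layers (a global pooling alternative is noted in a comment).

```python
import tensorflow as tf

head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(56, 56, 32)),            # a stack of 32 feature maps
    tf.keras.layers.AveragePooling2D(pool_size=(2, 3)),   # average over 2-high, 3-wide regions
    # tf.keras.layers.GlobalAveragePooling2D(),           # alternative: one value per feature map
    tf.keras.layers.Flatten(),                             # flatten the maps into a vector
    tf.keras.layers.Dense(64, activation="relu"),          # fully connected (FC) layer
    tf.keras.layers.Dense(1, activation="sigmoid"),        # healthy vs. unhealthy output
])
head.summary()
```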

The graphical representation of the model architecture is presented in Figure 6. It consists of the model layers, including dense, pooling, and convolutional layers. The representation of the model's architecture is implemented using the visualkeras library, a Python package for plotting the architecture of a model. It supports a layered architectural view of the model, which is particularly useful for convolutional neural networks.
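A hedged sketch of how such a layered view can be produced with the visualkeras package; the small model below is purely illustrative and not the architecture used in the paper.

```python
import tensorflow as tf
import visualkeras

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Render the layered architectural view and save it as an image file.
visualkeras.layered_view(model, legend=True, to_file="architecture.png")
```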
Figure 7 gives a brief comparison of the training accuracy, training loss, and validation accuracy of the different deep learning networks. The circular diagrams show graphical comparisons of the multiple deep learning models implemented in this paper. The representation shows that EfficientNetB7, ResNet152, and InceptionNetV3 have validation accuracies of 100.0%, 99.65%, and 96.47%, respectively, while the validation loss is 0.6931 in all the networks. This graphical comparison makes it easy to visually analyze the models' training and their respective validation loss, and it allows the reader to quickly grasp the statistics of each model, which helps build a narrative for selecting deep learning networks for transfer learning on medical image datasets or any other image dataset. Each deep learning network is drawn in a different colour code to distinguish it from the others, and the validation accuracy, validation loss, and training accuracy are also presented in different colour codes. The overall comparison can be seen in Figure 7.

Eight different deep learning neural networks were implemented on the glaucoma dataset. The statistics of the different models in terms of validation accuracy, validation loss, and training accuracy are presented in Table 3. The table shows that the EfficientNetB7 model has the highest validation accuracy at 100.0%, ResNet152 has the second highest at 99.65%, and InceptionNetV3 the third highest at 96.47%. The remaining five models, namely DenseNet201, NASNetMobile, ResNet50, ResNet101V2, and VGG16, have lower validation accuracy than these three. The validation loss is the same in all the models at 0.69315.
The evaluation metrics of the implemented deep neural networks, namely precision, recall, F1-score, and support, are presented in the bar graph in Figure 8. The values of precision, recall, F1-score, and support are shown in different colours for the different evaluation parameters. The bar chart is a very comprehensive way of summarizing categorical data: the data are displayed using separate bars, each bar representing a particular category, each category is drawn in a different colour, and the height of each bar represents the value for that category. The overall evaluation metrics, including precision, recall, F1-score, and support of the deep neural models implemented in this paper, are represented in the bar chart in Figure 8.
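As an illustration of how such a grouped bar chart can be produced, a short matplotlib sketch follows; the numeric values are illustrative placeholders, not the measured results reported in the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative placeholder values, not the paper's measured results.
models = ["InceptionV3", "ResNet152", "EfficientNetB7"]
precision = [0.78, 0.80, 0.81]
recall = [0.84, 0.85, 0.84]
f1 = [0.81, 0.82, 0.82]

x = np.arange(len(models))
width = 0.25
plt.bar(x - width, precision, width, label="precision")
plt.bar(x, recall, width, label="recall")
plt.bar(x + width, f1, width, label="f1-score")
plt.xticks(x, models)
plt.ylabel("score")
plt.legend()
plt.savefig("metrics_bar.png")
```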

The distribution of the values of the evaluation metrics is represented in the box plot shown in Figure 9. A box plot is a comprehensive way of describing the distribution of data, usually with a five-number summary: minimum, first quartile, median, third quartile, and maximum. Box plots are particularly useful for identifying outlier values, and they also show whether the data are symmetrical, how tightly the data are grouped, and the skewness. The box plot in Figure 9 represents the distribution of the evaluation metric values of the deep neural networks. The values of the evaluation metrics, such as precision, recall, F1-score, and support, are grouped and summarized by their minimum, first quartile, median, third quartile, and maximum, and this five-number summary is drawn as the boxes in Figure 9.

A brief graphical representation of support, F1-score, recall, and precision is presented in Figure 10. The x-axis lists the deep neural networks implemented in this research paper, while the y-axis carries three different scales representing different evaluation metrics. The green scale represents the precision values of the various deep neural networks, ranging from 0.1 to 0.8, and the green curves plot the precision values of the networks placed on the x-axis. The orange scale represents the recall and ranges from 0.0 to 1.0; the orange curves show the recall values of each network for each class. The purple scale represents the F1-score, ranging from 0.0 to 0.8, and the purple curves show the F1-score values of each network for each class. The blue square markers in the plot are the support values of each model for each class. This graphical representation of the evaluation metrics is particularly useful for describing the models' performance.

A comparison between the different deep learning networks in terms of precision, recall, F1-score, and support is presented in Table 4 for both the healthy and unhealthy classes. Precision, also called the positive predictive value, is the fraction of true positive instances over the sum of true positive and false positive instances. Eight experiments were carried out with eight different deep learning networks, and precision was calculated for each network and each class; the resulting precision values of each model are presented. The recall value of each model for each class was also calculated. The F1-score is the harmonic mean of precision and recall and was likewise computed for every network and each class. The support is the number of true instances of each class and was also calculated for all networks and classes. All these values are derived from the confusion matrix of each model. The comparison table of these metrics is important for assessing the performance of each deep learning model; a detailed comparison of all evaluation metric values is presented in Table 4.
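For reference, these per-class metrics can be reproduced from predictions with scikit-learn's classification_report; the labels below are illustrative only (0 = healthy, 1 = unhealthy) and are not the paper's actual predictions.

```python
from sklearn.metrics import classification_report

# precision = TP / (TP + FP); recall = TP / (TP + FN);
# F1 = harmonic mean of precision and recall; support = true instances per class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
print(classification_report(y_true, y_pred, target_names=["healthy", "unhealthy"]))
```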
6. Conclusions
Previously, many neural network models for segmentation have been proposed, and many computer scientists have carried out research to boost classification performance, but the precision of such artificial neural networks is still lacking. That is why we have suggested an Inception V-3 approach for image classification based on convolutional neural networks. The model we have developed has the potential to address the classification-accuracy problem of such CNN models. DenseNet201 achieved a recall of 1 for the unhealthy class, EfficientNetB7 a recall of 1 for the healthy class, ResNet152 a recall of 1 for the healthy class, and InceptionV3 a recall of 0.97 for the healthy class. In terms of F1-measure, EfficientNetB7 and ResNet152 beat the other algorithms with a value of 0.82, while InceptionV3 reaches an F-measure of 0.81.
In future work, we will tune the parameters of different recent neural networks and compare them with the ones we have already used; further research and improvements to the current work are outlined to achieve this goal fully. Increasing the number of images in the dataset is expected to give higher precision than before. Assessing the output of other neural networks on the defined dataset is also planned. In addition, we are developing an application to track glaucoma against past databases.
Data Availability
The data supporting this study’s findings are available from the corresponding author or Jahanzaib Latif ([email protected]) upon reasonable request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported in part by the Beijing Natural Science Foundation (No. 4212015), the Natural Science Foundation of China (No. 61801008), the China Ministry of Education - China Mobile Scientific Research Foundation (No. MCM20200102), the China Postdoctoral Science Foundation (No. 2020M670074), and the Beijing Municipal Commission of Education Foundation (No. KM201910005025).