Abstract

Keratoconus is a noninflammatory disease characterized by thinning and bulging of the cornea, generally appearing during adolescence and slowly progressing, causing vision impairment. However, the detection of keratoconus remains difficult in the early stages of the disease because the patient does not feel any pain. Therefore, the development of a detection method based on machine and deep learning is necessary for early detection, in order to provide appropriate treatment as early as possible to patients. The objective of this work is thus to determine the most relevant parameters with respect to the different classifiers used for keratoconus classification, based on the keratoconus dataset of Harvard Dataverse. A total of 446 parameters, measured over 3162 observations, are analyzed by 11 different feature selection algorithms. The obtained results show that the sequential forward selection (SFS) method provides a subset of the 10 most relevant variables, generating the highest classification performance with the random forest (RF) classifier: an accuracy of 98% and 95% considering 2 and 4 keratoconus classes, respectively. The classification accuracy obtained by applying the RF classifier to the variables selected by the SFS method matches the accuracy obtained using all features of the original dataset.

1. Introduction

In many fields (computer vision, pattern recognition, etc.), the resolution of most problems is based on the processing of data extracted from data acquired in the real world and structured in the form of vectors [1]. The quality of the processing system depends directly on the right choice of the content of these vectors. In many cases, however, solving the problem becomes almost impossible because of the very large dimension of these vectors. It is therefore often useful, and sometimes necessary, to select the features most relevant to the chosen resolution method, eliminating features harmful to the adopted system, even if this selection of variables may lead to a slight loss of information. Moreover, to extract important features from such large sets of variables and data, statistical techniques are used to minimize noise and redundant data [2]. Thus, the selection of parameters, retaining those that are correlated with the target and nonredundant among themselves, is really important in improving the model. In addition, learning is done quickly, the complexity of the model is reduced, making it easier to understand, and metric performance improves in terms of precision, accuracy, and recall [3].

There are four important reasons why feature selection is essential: first, to simplify the model by reducing the number of parameters; second, to decrease the learning time; third, to reduce overfitting by improving generalization; and fourth, to avoid the problems of dimensionality [4]. Our motivation is thus to obtain the best model, with high prediction quality and small errors.

It is in this context that this work is presented. It consists in determining the most relevant parameters for diagnosing keratoconus, a deformation of the cornea (the transparent coating of the iris and the pupil of the eye) which gradually thins [5], loses its normal spherical shape, and takes on an irregular cone shape, as illustrated in Figure 1 below.

Keratoconus can be diagnosed during a consultation, motivated by the existence of functional signs secondary to progressive irregular myopic astigmatism. In general, the functional signs are not very specific. The most common is the presence of visual blurring, photophobia, fog, progressive loss of visual acuity predominantly at a distance, monocular diplopia, or persistent irritation [7]. However, there are several tools to diagnose keratoconus, such as corneal topography, corneal biomechanics, and optical coherence tomography (OCT). Each tool has its own parameters to diagnose the disease, so in this study, we analyze the different parameters using machine learning algorithms; the obtained results are then validated by an expert physician in the field. In this work, feature selection techniques are used to increase the potential for classifier generalization. Thus, a comparison of the results without and with feature selection, using filter, wrapper, embedded, and hybrid methods, will also be presented. The main contributions of this research are summarized as follows. First, various parameters are analyzed to extract the most relevant ones, especially for the analysis of classification data. Second, a comparative study of different machine learning models, such as random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), decision tree (DT), Naive Bayes (NB), logistic regression (LR), and linear discriminant analysis (LDA), is carried out using critical features; different models have different strengths in classifying data, which affects classification performance. Also, multiple feature selection methods are used to obtain the best accuracy. In addition, we review the application of variable selection and provide description, analysis, and future research suggestions. The remainder of this paper is organized as follows. Section 2 presents the related works. Section 3 describes the methodology employed in keratoconus classification. The simulation results are presented in Section 4. Section 5 presents the discussion of the results. Finally, the conclusions and future directions of the research are given in Section 6.

2. Related Works

Artificial intelligence (AI) has been integrated into different domains of the medical field, such as ophthalmology, and the number of works focusing on the detection of ophthalmic diseases using machine learning (ML) is growing. Several research teams aim to build intelligent systems for keratoconus diagnosis and classification. In [8], the authors proposed an ensemble of deep transfer learning models considering SqueezeNet (SqN), AlexNet (AlN), ShuffleNet (SfN), and MobileNet-v2 (MbN) for improved detection of keratoconus. The built system was trained on a dataset of 2136 corneal topographic maps and provided an accuracy in the range of 92.2% to 94.8%. To evaluate keratoconus diagnosability, the authors of [9] developed an intelligent system based on deep learning using color-coded maps with Placido disk-based corneal topography. Trained on a total of 3390 color-coded map images representing 4 eye classes, the proposed system achieved an accuracy of 78.5% in keratoconus classification. The authors of [10] proposed an intelligent system based on a time delay neural network (TDNN) to verify both the progression predictability using two prior tomography measurements and the system accuracy when labelling an eye as stable or suspect progressive. The obtained results showed a sensitivity of 70.8% and a specificity of 80.6% using data of 743 patients captured by Pentacam. To screen keratoconus using corneal topography, the authors of [11] adopted three convolutional neural network (CNN) models (VGG16, InceptionV3, and ResNet152) to develop their proposed system. Trained on a dataset of 354 images, the built system achieved accuracies of 93.1%, 93.1%, and 95.8% using VGG16, InceptionV3, and ResNet152, respectively. The authors of [12] proposed a convolutional neural network- (CNN-) based intelligent system for keratoconus detection. Trained on a dataset of 3000 images, provided by Pentacam technology only, the developed system provided a classification accuracy of 99.33%. The authors of [13] built a feedforward neural network- (FNN-) based intelligent system for keratoconus identification. The developed system discriminated keratoconus eyes with an accuracy of 96.56% on a dataset of 851 elements, using neighborhood component analysis for feature selection (NCAFS). In [14], an RF model was used to detect keratoconus; the obtained system provided a classification accuracy of 76% on a dataset of 500 images. Using a dataset of 124 images and 29 parameters, the authors of [15] developed a keratoconus identification and classification system using Bayesian neural networks (BNN). Adopting principal component analysis (PCA) for feature selection, the developed system allowed a classification with an accuracy of 73% and 80% for supervised and unsupervised learning, respectively. In [5], the authors proposed a keratoconus classification system based on unsupervised machine learning (UnML) and trained on a dataset of 3156 images. To reduce the dimensionality of the input data from 420 to eight important variables, the authors adopted the PCA method. The built system allowed keratoconus identification with a specificity of 94.1% and a sensitivity of 97.7%. In [16], the authors developed a BNN-based system for keratoconus classification; the classification accuracy of this system, using 16 parameters on a dataset of 60 elements, was 100%. The authors of [17] built an intelligent system to classify keratoconus based on the CNN technique. Trained on a dataset of 543 images, the proposed system reached an accuracy of 99.1%.
In [18], the authors developed an SVM-based system for keratoconus detection and classification. The classification accuracy of the built system was between 92.6% and 98.0% on a dataset of 131 images with 25 extracted parameters. Trained on a dataset of 372 images using 55 parameters, the system proposed in [19] allowed keratoconus classification using a decision tree (DT) model. The developed system discriminated normal and keratoconus eyes with a sensitivity of 100% and a specificity of 99.5% and classified normal and forme fruste keratoconus eyes with 93.6% sensitivity and 97.2% specificity. The authors of [20] proposed an SVM-based system for keratoconus detection and classification; the classification accuracy provided by the built system was 98.2% on a dataset of 3502 elements using 7 parameters. In [21], the authors developed a classification system for keratoconus with an accuracy of 90% on a dataset of 40 images and 12 parameters. The authors of [22] proposed eight classifiers in order to compare their performance. Using 11 extracted parameters on a dataset of 88 elements, RF, SVM, KNN, logistic regression (LR), linear discriminant analysis (LDA), lasso regression (LaR), DT, and multilayer perceptron neural network (MPAN) models provided accuracies of 87%, 86%, 73%, 81%, 81%, 84%, 80%, and 52%, respectively. The authors of [23] developed a system for early and mild keratoconus detection. Based on logistic regression, this system allowed early and mild keratoconus detection using only 5 variables selected from a dataset of 27 features; the variable selection was performed using the Kruskal-Wallis algorithm, and the overall accuracy of this system was 73%. In [24], the authors proposed a comparative study of 25 different machine learning models for keratoconus detection based on corneal imaging. The different classifiers were trained on a dataset of 3151 corneal images, collected from 3146 eyes. Applied on a subset of 8 parameters selected using the subset selection (SS) and feature ranking (FRank) feature selection methods, the proposed models provided classification accuracies varying between 62% and 94%, with the highest performance generated by the SVM model. Table 1 below summarizes the cited works:

Despite the good performance of the systems mentioned in the related works, which allowed a very good discrimination between normal and keratoconus eyes, many of them did not mention the use of a variable selection method. Such methods could increase system performance by eliminating irrelevant variables, reducing data dimensionality, and optimizing algorithm prediction time. In this work, we propose a comparative study of keratoconus classification using different classifiers, with and without feature selection, by applying different types of variable selection algorithms.

3. Methodology

3.1. Feature Selection

The performance of a machine learning system is affected by several factors, including the representation and relevance of the data it uses. Generally, not all learning data is relevant to the system. Hence, the selection of relevant features, by eliminating less informative, redundant, or even irrelevant variables, is of great importance to the learning system. The feature selection model adopted in this work is described in Figure 2 below.

3.2. Data Preprocessing

The data preprocessing stage generally consists of eliminating irrelevant and redundant variables, handling missing values in the dataset, and handling categorical data, such as textual data that is difficult for machines to understand. The dataset resulting from this step is then used by the different types of algorithms in order to select relevant features.

3.3. Filters

First, the dataset was filtered using filter methods. These methods select variables using different approaches and criteria to compute the relevance of a variable before the learning phase. In other words, the importance of characteristics is evaluated independently of any classifier; however, the characteristics retained by the filters can be used by all learning algorithms. Filters remove irrelevant, redundant, constant, duplicated, and correlated characteristics in a very efficient manner [25]. The main filtering methods used in this work are the following (a minimal code sketch of three of them is given after this list):

(1) Fast correlation-based filter (FCBF), which selects features having a low correlation with the other features and a high correlation with the target variable, using symmetrical uncertainty [26].

(2) Mutual information (MI), which can be defined as the measure of reduction of uncertainty of a variable given the knowledge of a second variable. It represents a statistical dependence between two random variables, thus measuring their degree of dependence in the probabilistic sense [25].

(3) Analysis of variance (ANOVA), a statistical model that compares the mathematical expectation of several subsamples in order to demonstrate possible similarities or differences on specific aspects of a studied sample [27].

(4) Variance algorithm, which calculates the variance of the different features and selects those whose variance is greater than or equal to an initially defined threshold.
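To make these criteria concrete, the following minimal Python sketch applies the variance, MI, and ANOVA filters with scikit-learn (FCBF has no standard scikit-learn implementation and is omitted); the synthetic data, the variance threshold, and k = 10 are illustrative assumptions, not the exact configuration of this study.

# Minimal sketch of three of these filters using scikit-learn; the synthetic
# data, variance threshold, and k = 10 are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import (VarianceThreshold, SelectKBest,
                                       mutual_info_classif, f_classif)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))    # stand-in for the 446 topographic features
y = rng.integers(0, 2, size=200)  # stand-in labels (normal vs. keratoconus)

# Variance filter: drop features whose variance falls below the threshold
X_var = VarianceThreshold(threshold=0.5).fit_transform(X)

# Mutual information filter: keep the k features most dependent on the target
X_mi = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# ANOVA filter: rank features by the F-statistic between classes
anova = SelectKBest(f_classif, k=10).fit(X, y)
print(anova.get_support(indices=True))  # indices of the retained features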

3.4. Wrapper Methods

The weakness of filters is that they do not consider the learning algorithm when selecting variables. Wrapper methods solve this problem by introducing the learning algorithm during feature selection: they evaluate the classification performance of a subset of variables during the selection procedure using a classifier [25]. The wrapper algorithms used in this study are the following (see the code sketch at the end of this subsection):

(1) Recursive feature elimination (RFE), a feature selection algorithm in which specific weights are assigned to features by an external estimator. The process is repeated recursively, and at each step, the attributes with the smallest weights are removed from the current set, until the desired number of features is eventually reached. In the RFE approach, the number of features to select must be defined initially [28].

(2) Sequential forward selection (SFS), an iterative algorithm starting from an empty subset of variables. At each iteration, the SFS algorithm evaluates the variables individually and retains the variable that best improves the model. The selection process stops when the performance of the system is no longer increased by adding a new variable [29].

(3) Sequential backward selection (SBS), an iterative algorithm initially using all the features of the dataset. SBS eliminates the least significant variable at each iteration until no performance improvement is noticed [29].

(4) Genetic algorithms (GA), iterative algorithms based on the genetic evolution process. A GA builds chromosomes from an initial population representing potential solutions to the studied problem. This initial population of solutions evolves using three operators (selection, crossover, and mutation) to converge to the best solution [30].

(5) Hybrid recursive feature addition (HRFA), which creates a model using only the most relevant variable selected by ranking the different variables of the original dataset. The algorithm adds the most important remaining feature at each step and reassesses the performance of the model [31]. If the metric exceeds an arbitrarily defined threshold, the variable is retained; otherwise, it is deleted. This processing is repeated until all variables are evaluated.

The feature subset selected by a wrapper method depends strongly on the classifier used in the selection phase; changing the classification algorithm can therefore produce poor classification performance.
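The sketch below illustrates two of these wrappers with scikit-learn; note that scikit-learn's SequentialFeatureSelector selects a fixed number of features (direction="backward" gives SBS) rather than stopping automatically when performance no longer improves, and the estimator and n_features_to_select=10 mirror this study's setup only as assumptions.

# Hedged sketch of two wrapper methods with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector, RFE

X, y = make_classification(n_samples=300, n_features=40, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Sequential forward selection: grow the subset one best feature at a time
sfs = SequentialFeatureSelector(rf, n_features_to_select=10,
                                direction="forward", cv=10).fit(X, y)

# Recursive feature elimination: repeatedly drop the lowest-weighted feature
rfe = RFE(rf, n_features_to_select=10).fit(X, y)
print(sfs.get_support(indices=True))
print(rfe.get_support(indices=True))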

3.5. Embedded Methods

Embedded methods select the features judged critical during the training of the machine learning model adopted for the classification [32].
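A minimal embedded-selection sketch is given below, assuming a random forest whose trained feature importances drive the selection (the paper does not specify the estimator used).

# Minimal embedded-selection sketch: the features are chosen from the
# importance weights learned during training itself.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=300, n_features=40, random_state=0)

# Keep the features whose trained importance exceeds the mean importance
embedded = SelectFromModel(RandomForestClassifier(n_estimators=100,
                                                  random_state=0))
X_reduced = embedded.fit_transform(X, y)
print(X_reduced.shape)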

3.6. Hybrid Method

A hybrid method combines a filter and a wrapper method of feature selection. The features retained by the filter algorithm are evaluated by the wrapper algorithm to find the best subset of features [25].
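One possible sketch of this combination is shown below, assuming an ANOVA filter followed by an SFS wrapper; the concrete pairing and the subset sizes (20, then 10) are illustrative, not the paper's configuration.

# Sketch of the hybrid idea: a cheap filter first prunes the feature space,
# then a wrapper refines the retained subset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (SelectKBest, f_classif,
                                       SequentialFeatureSelector)
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=40, random_state=0)
hybrid = Pipeline([
    ("filter", SelectKBest(f_classif, k=20)),            # filter stage
    ("wrapper", SequentialFeatureSelector(               # wrapper stage
        RandomForestClassifier(n_estimators=100, random_state=0),
        n_features_to_select=10, direction="forward", cv=5)),
])
X_selected = hybrid.fit_transform(X, y)
print(X_selected.shape)  # (300, 10)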

Choosing the right feature selection method usually depends on the initial goal. Filters are very good at reducing data size and eliminating redundant features; wrapper methods, on the other hand, are very powerful at producing good classification precision with a given classifier. Table 2 below illustrates the different types of feature selection algorithms used in this study:

3.7. Classification Methodology

The main objective of this work is to compare the performance and execution time of different machine learning models in the classification of keratoconus. Classification is first performed using all features of the keratoconus dataset, available in Harvard Dataverse [33]. Then, classification is performed after selecting crucial features through the application of the different types of feature selection algorithms cited above to the original dataset. In other words, keratoconus classification is realized using different models with and without feature selection. The 10-fold cross-validation technique is used for all machine learning models in order to avoid the overfitting problem. Figure 3 below illustrates the classification methodology adopted in this study.
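A hedged sketch of this evaluation protocol follows, scoring a few of the studied classifiers with 10-fold cross-validation on synthetic stand-in data; the hyperparameters are assumptions.

# Each classifier is scored with 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=40, random_state=0)
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")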

Random forest (RF) is an ensemble of many individual decision trees, proposed by Breiman in 2001 [34] as a classification prediction method based on decision trees. RF is one of the ensemble methods, which combine many learners to improve the performance of any single one of them individually. It can be described as a technique that combines a group of weak decision trees to create a stronger, aggregated one. The RF classification algorithm is structured as follows [35].

  1. For b =1 to B:
    a. Draw a bootstrap sample of size N from the training data.
    b. Grow a random-forest tree Tb to the bootstrapped data, by recursively repeating the following steps for each terminal node of the tree, until the minimum node size nmin is reached.
      i. Select m variables at random from the p variables.
      ii. Pick the best variable/split-point among the m.
      iii. Split the node into two daughter nodes.
  2. Output the ensemble of trees {Tb}, b = 1, ..., B.
To make a prediction at a new point x:
Classification (voting): let $\hat{C}_b(x)$ be the class prediction of the bth random-forest tree; then $\hat{C}_{rf}^{B}(x) = \text{majority vote}\,\{\hat{C}_b(x)\}_{1}^{B}$.

If the data change a little, the performance of the individual trees may change, but the forest remains relatively stable because it is a combination of many trees; this is the main advantage of RF.
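As an illustration, the algorithm above maps onto scikit-learn's RandomForestClassifier roughly as follows; the parameter values are assumptions, not the paper's settings.

# B trees -> n_estimators; m variables per split -> max_features;
# minimum node size -> min_samples_leaf.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            min_samples_leaf=1, random_state=0)
print(rf.fit(X, y).predict(X[:3]))  # majority vote over the 500 trees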

The Naive Bayes (NB) technique is based on Bayes' theorem. It is a probabilistic classifier well suited to high-dimensional datasets. Despite its simplicity, the NB algorithm can outperform more sophisticated classifiers. The NB classifier computes probability estimates rather than hard predictions: to verify whether a given observation belongs to a specific class, it calculates the probability of each output value. NB assumes that the attributes do not influence each other and are mutually independent [36]; this is called conditional independence.

Consider a dataset $D$ of tuples, where each tuple $X = (x_1, x_2, \ldots, x_n)$ is described by the values of $n$ attributes. Suppose that $C_1$ and $C_2$ are the two available class labels for the target data. For each new tuple $X$, the NB classifier predicts the class $C_i$ that has the highest probability conditioned on $X$, i.e., $P(C_i \mid X) > P(C_j \mid X)$ for $j \neq i$.

The class $C_i$ that maximizes $P(C_i \mid X)$ is called the maximum posterior hypothesis. By Bayes' theorem,

$P(C_i \mid X) = \dfrac{P(X \mid C_i)\, P(C_i)}{P(X)}.$

As $P(X)$ is constant for all the classes, only $P(X \mid C_i)\, P(C_i)$ needs to be maximized.
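For illustration, a Gaussian NB classifier exposes exactly these posterior estimates; the Gaussian likelihood is an assumption, since the paper does not state which NB variant was used.

# Minimal Gaussian Naive Bayes sketch on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
nb = GaussianNB().fit(X, y)
print(nb.predict_proba(X[:2]))  # posterior estimates P(Ci | X) per class
print(nb.predict(X[:2]))        # class with the maximum posterior probability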

The k-nearest neighbor (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve classification and regression problems. According to a measure of similarity, such as a distance function, k-NN classifies new cases by attributing them to the most frequent category among their k neighbors [37].

The distance from the case to be classified to the other cases is measured using norm-based functions, such as the Euclidean distance

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

or the Manhattan distance

$d(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert.$

The k-NN algorithm can be described as follows [38].

Input data:
   i. Dataset D.
   ii. A distance calculation function.
   iii. A number of neighbors K
To predict a new observation X, do:
1. Calculate all distances of this observation with the other observations in the dataset D.
2. Retain the K observations from the dataset close to X using the selected distance calculation function.
3. Take the values of retained observations:
   a. if regression: Calculate the average of retained observations values.
   b. if classification: Calculate the mode of retained observations values.
4. Return the value calculated in step 3 as a predicted value by K-NN for the observation X.
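A direct NumPy transcription of these prediction steps for the classification case is sketched below (Euclidean distance assumed; purely illustrative).

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 1: distances from x_new to every stored observation
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: indices of the K closest observations
    nearest = np.argsort(dists)[:k]
    # Steps 3b and 4: mode of the retained labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.9, 5.1]), k=3))  # -> 1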

Logistic regression (LR) is a statistics-based classification model and a linear predictive algorithm based on the concept of probability. The decision rule of LR is given by the sigmoid function, whose output probability is bounded between 0 and 1. When the predicted value is greater than a threshold, the event is likely to occur; when it is below that threshold, it is not [39]. The sigmoid function is defined as follows:

$\sigma(z) = \dfrac{1}{1 + e^{-z}}, \qquad z = w_0 + w_1 x_1 + \cdots + w_n x_n,$

where the $x_i$ are the features and the $w_i$ their corresponding weights/coefficients.

Linear discriminant analysis (LDA) is a supervised classification technique belonging to the family of competitive machine learning models, developed in 1936 by R. A. Fisher. It is a simple and robust classification method that produces models providing accuracy as good as more complex methods [40]. The idea behind LDA is to search for a linear combination of variables (predictors) that best separates two classes (targets). The LDA process can be described in 5 steps, as follows [40].

1. Computing the within-class and between-class scatter matrices.
2. Computing the eigenvectors and their corresponding eigenvalues for the scatter matrices.
3. Sorting the eigenvalues and selecting the top k.
4. Creating a new matrix containing the eigenvectors corresponding to the top k eigenvalues.
5. Obtaining the new features by taking the dot product of the data and the matrix from step 4.

The within-class scatter matrix is calculated using the following mathematical equation [40]:

$S_W = \sum_{i=1}^{c} S_i, \qquad S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T,$

where $c$ is the total number of distinct classes and

$m_i = \frac{1}{n_i} \sum_{x \in D_i} x,$

where $x$ is a sample (a row) and $n_i$ is the total number of samples within class $i$.

The between-class scatter matrix is calculated using the following mathematical equation:

$S_B = \sum_{i=1}^{c} n_i\, (m_i - m)(m_i - m)^T,$

where $m$ is the overall mean of the data.

The linear discriminants are provided by solving the generalized eigenvalue problem of the matrix $S_W^{-1} S_B$.
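The five steps can be sketched in NumPy as follows, on two-class toy data (illustrative only; k = 1 discriminant is kept, since with two classes the between-class scatter has rank one).

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

m = X.mean(axis=0)                                   # overall mean
S_W = np.zeros((4, 4))
S_B = np.zeros((4, 4))
for c in np.unique(y):                               # Step 1: scatter matrices
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_W += (Xc - mc).T @ (Xc - mc)                   # within-class scatter
    S_B += len(Xc) * np.outer(mc - m, mc - m)        # between-class scatter

# Steps 2-4: eigendecomposition of S_W^-1 S_B, keep the top-k eigenvectors
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:1]].real                       # k = 1 here

X_lda = X @ W                                        # Step 5: project the data
print(X_lda.shape)  # (100, 1)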

Decision tree (DT) is a tree-structured classification model. Each node of a DT represents a test evaluating an attribute of an individual in the population, and the arcs leaving a node represent the possible responses to the test associated with that node. Each leaf of the DT corresponds to a class, called the default class. The DT used in this work is based on the CART algorithm presented below [41].

Input: Sample S
Start: Initialize the current tree to the empty tree; the root is the current node.
Repeat
   Calculate the Gini index of the current node p:
        Gini(p) = 1 - sum_i f_i^2,   where f_i is the relative frequency of class i at node p
   If the current node p is terminal, then
      Assign it a class.
   Else
     Select a test and create as many new child nodes as there are possible answers to the test: choose the test t that maximizes ∆(p, t), where p is the current position, t is a test, and Pg and Pd are the proportions of elements sent to the child positions p1 and p2, respectively:
        ∆(p, t) = Gini(p) - Pg * Gini(p1) - Pd * Gini(p2)
   End if
   Pass to the next unexplored node if there is one
Until obtaining a decision tree
End

Support vector machine (SVM) consists in finding a hyperplane (a straight line in the case of two dimensions) that best separates two classes in the case of a binary classification [42]. The separating hyperplane is represented by the following equation [43]:

$w \cdot x + b = 0,$

where $w$ is a weight vector of $n$ dimensions and $b$ is a bias term. The decision function, for an example $x$, can be expressed as follows:

$f(x) = \operatorname{sign}(w \cdot x + b).$

In reality, most problems are multiclass; in this case, SVM-based solutions reduce the multiclass problem to a composition of several biclass hyperplanes, making it possible to draw the decision boundaries between the different classes.
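For illustration, scikit-learn's SVC follows exactly this strategy, internally composing one-vs-one binary separators for multiclass data; the toy 4-class dataset and the linear kernel below are assumptions.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)
clf = SVC(kernel="linear").fit(X, y)  # one-vs-one under the hood
print(clf.predict(X[:5]))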

3.8. Evaluation Metrics

In the different steps of keratoconus classification, the performance evaluation of the obtained results is based on precision, recall, F1-score, classification accuracy, the ROC curve, and prediction time.

Precision is a measure that expresses, out of the observations predicted positive, how many are actually positive [44]. In our case, precision indicates the proportion of eyes predicted as having keratoconus that actually have keratoconus. Precision is calculated using the following formula:

$\text{Precision} = \dfrac{TP}{TP + FP}.$

Recall (or true positive rate) measures how well the model identifies true positives [44]. Thus, over all instances that actually have keratoconus, recall indicates how many the model correctly identified as having the disease. Recall is computed using the following equation:

$\text{Recall} = \dfrac{TP}{TP + FN}.$

The F1-score is a metric accounting for both false positives and false negatives to strike a balance between precision and recall [44]. It is the weighted (harmonic) average of precision and recall. A model is considered perfect when its F1-score is 1 and a total failure when it is 0. The F1-score is computed as follows:

$F_1 = 2 \cdot \dfrac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$

Accuracy is a popular measure that describes the classification performance of the model over all classes [44]. It represents the ratio of the number of correct predictions to the total number of predictions:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN},$

where true positives (TP) is the number of samples correctly predicted as "yes," true negatives (TN) is the number of samples correctly predicted as "no," false positives (FP) is the number of samples incorrectly predicted as "yes" when they are actually "no," and false negatives (FN) is the number of samples incorrectly predicted as "no" when they are actually "yes."

Execution time measures the time consumed by the different machine learning models in the training and prediction phases to perform a classification.

The receiver operating characteristic (ROC) curve is a graph representing the relationship between the false positive rate and the true positive rate of a test for all possible thresholds: abscissas correspond to the false positive rate and ordinates to the true positive rate. The ROC curve expresses the ability of a classifier to differentiate between true positive (TP) and false positive (FP) rates [44]. The area under the ROC curve (AUC) lies between 0.5 and 1, and an efficient classifier tends to maximize the AUC towards 1.
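The metrics of this subsection can be computed as follows on toy predictions (the numbers are arbitrary, not results from this study).

from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]  # predicted probabilities

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))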

4. Simulation Results

4.1. Dataset Description

The current comparative study is based on the public keratoconus dataset of Harvard Dataverse [33], available at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/G2CRMO. Structured as a CSV file, this dataset is composed of 3162 rows, each described by 446 features. Eyes are classified into 4 classes, as described in Table 3 below.

This dataset was extracted, and used in [5], from a dataset of 12,242 eye images acquired from SS-1000 CASIA OCT Imaging Systems, representing corneal swept-source optical coherence tomography (OCT) in multiple centers across Japan.

4.2. Technical Description of the Used Calculator

The different classification models studied were implemented in Python using the Jupyter application. All simulations were carried out with CUDA 10.1 under Ubuntu 16.04, using a Xeon E5-2697 v4 CPU (18 cores, 36 threads, ECC on), 64 GB of DDR4-2133 RAM, one GTX 1070 Ti GPU (8 GB GDDR5, 2432 CUDA cores, 38,912 total threads, ECC off), and two Tesla K80 GPUs (24 GB GDDR5, 4992 CUDA cores, 53,248 total threads, ECC on).

4.3. Obtained Results considering Two Classes

The results of the first classification task, applying the different feature selection algorithms to the original dataset, considering just two classes of eyes (class 1 for normal eyes, with a total of 264 elements, and class 2 for keratoconus eyes, with a total of 2989 elements) and using the different classification models, are illustrated in Figures 4–6 below.

Figures 7–9 show the comparison of classifier performance based on accuracy, using the features retained by the different feature selection methods and considering 2 keratoconus classes.

4.4. Obtained Results considering Four Keratoconus Classes

The same proposed model is evaluated using the original dataset and considering the four classes of eyes already described in Table 3, both without and with feature selection. Figures 10 and 11 below represent the simulation results.

The comparison of classification algorithms based on the classification accuracy of the different models, for the classification task considering 4 keratoconus classes, is illustrated in Figures 12 and 13 below.

The results of the previous simulations show that the random forest algorithm achieves the highest performance compared to the other algorithms, both with and without feature selection. The RF algorithm allowed keratoconus classification with an accuracy of around 98% in the 2-class case and exceeding 91% in the 4-class case. These results were obtained by fixing the number of variables to retain at a maximum of 10 for the feature selection algorithms that require specifying the number of features to select.

5. Discussion

The main objective of this work is to present a comparative study of the performance of different machine learning models in keratoconus classification, based on the public keratoconus dataset of Harvard Dataverse, both without and with feature selection. Each classification technique was applied using all the variables of the dataset, then using 11 feature selection algorithms to select relevant variables. To assess the studied models' ability to correctly classify keratoconus, 2 classification tasks were performed. The first classification was carried out retaining only 2 classes (normal eyes and keratoconus eyes), and the second was carried out considering 4 classes (class 1 for normal eyes, class 2 for eyes with forme fruste keratoconus, class 3 for eyes with mild keratoconus, and class 4 for eyes at an advanced keratoconus stage).

Overall, the RF algorithm has a good ability to differentiate between normal eyes and keratoconus eyes. The RF classifier provided the best performance in terms of classification accuracy using all features and for all variable selection algorithms, except for the filter combined with the HRFA algorithm, in both the 2-class and 4-class cases. Table 4 below shows the performance of the RF model in terms of classification accuracy for 2 and 4 keratoconus classes with respect to the different feature selection algorithms already mentioned.

On the other hand, as illustrated in Table 4, the RF algorithm achieved its highest performance with the SFS feature selection algorithm. In the classification using only 2 eye classes, this method generated an accuracy of 98.10% using just 10 variables, against 98.0% for the same classifier applied to the full dataset of 446 variables; the execution time was also reduced remarkably, from 16.014 seconds to 3.241 seconds. In the second classification task, considering 4 keratoconus classes, the classification accuracy of RF was 95.32% using the 10 variables selected by the SFS algorithm, matching the 95.32% obtained using all dataset variables, with a significant decrease in execution time from 20.485 seconds to 3.702 seconds.

Table 5 below illustrates the performances of the different classifiers, applied with the different feature selection techniques, considering 2 and 4 classes.

Generally, the RF, LR, and LDA algorithms show the best performance across the different keratoconus classification tasks. Figure 14 below illustrates the comparison of ROC curves for the RF, LR, and LDA algorithms applied to the variables retained by the SFS feature selection algorithm, with respect to the keratoconus classes C1, C2, C3, and C4.

The obtained ROC curves show that the RF, LR, and LDA models accurately discriminate the different classes of keratoconus using just 10 variables instead of 446 features, demonstrating the effectiveness of the SFS algorithm in selecting relevant variables and thereby reducing the execution time and the computational resources required. The RF algorithm shows the highest performance, with an area under the curve (AUC) between 98% and 100% across the different keratoconus classes; the LR and LDA models provide an AUC varying between 94% and 100%.

Concerning the comparison of the feature selection algorithms according to the measured execution time, this work classifies these algorithms into 3 categories. The first category contains the fastest algorithms, which can run on personal computers and whose execution time does not exceed 3 minutes: mutual information, ANOVA, embedded, embedded with filter, filter with RFE, RFE, and filter with HRFA. The second category concerns algorithms whose execution time varies from 18 minutes to 313 hours and which require efficient calculators with good hardware configurations: filter with SFS, SFS, genetic, and filter with SBS. The third category is composed of algorithms whose execution time exceeds 300 hours; it contains the SBS algorithm, which was applied only to the 2-class keratoconus classification because of the expensive prediction time it consumed. Table 6 below summarizes the different categories of feature selection algorithms.

In order to validate the proposed methodology, the adopted process was applied to the keratoconus database of [45], composed of 205 rows with 42 features. Table 7 below presents a brief description of this dataset:

In the binary classification case considering 2 keratoconus classes (normal and keratoconus eyes) and using a subset of six selected variables, the highest performance was achieved, applying 10-fold cross-validation, by the RF classifier with the genetic feature selection algorithm; the best obtained accuracy was approximately 93%. In the case of 6 keratoconus classes, the highest performance was provided by the NB model, trained on a subset of 6 variables selected using the Boruta [28] feature selection algorithm; the classification accuracy of this model was 71%.

6. Conclusions

In conclusion, the current work presented a comparative study of keratoconus classification performance using different machine learning classifiers. The classification was performed in 2 steps, first retaining 2 target classes and then considering 4 target classes, both with and without feature selection. The obtained results demonstrated that the RF algorithm combined with the SFS algorithm, which selected just 10 features, provided a classification accuracy at least as high as that obtained using all features, i.e., 446 variables. Given the importance of execution time in addition to classification performance, the use of the SFS algorithm significantly reduced the execution time and increased classification accuracy by eliminating variables harmful to the classification models; hence the usefulness and impact of selecting critical and relevant features for classification, especially in the case of large datasets. This work was carried out as part of a project involving machine and deep learning in the field of ophthalmology, which aims to produce an intelligent system capable of detecting and classifying keratoconus based on the analysis of topographic maps of the eyes.

Data Availability

The current comparative study is based on the public keratoconus dataset of Harvard Dataverse, available at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/G2CRMO.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.