Abstract

The novel coronavirus disease 2019 (COVID-19) was first reported in December 2019 and has since created a pandemic. It has had very adverse consequences for people's daily lives, healthcare, and the world's economy. According to the World Health Organization's most recent statistics, COVID-19 has become a worldwide pandemic, and the numbers of infected persons and fatalities are growing at an alarming rate. An effective system for the early detection of COVID-19 patients is highly required to curb further spreading of the virus from affected persons. Therefore, to identify positive cases early and to support radiologists in the automatic diagnosis of COVID-19 from X-ray images, a novel method, PCA-IELM, is proposed based on principal component analysis (PCA) and the incremental extreme learning machine. The key contribution of the suggested method is that it combines the benefits of PCA and the incremental extreme learning machine. Further, our strategy PCA-IELM reduces the input dimension by extracting the most important information from an image. Consequently, the technique can effectively increase COVID-19 patient prediction performance. In addition, PCA-IELM has a faster training speed than a multi-layer neural network. The proposed approach was tested on a COVID-19 patients' chest X-ray image dataset. The experimental results indicate that the proposed approach PCA-IELM outperforms PCA-SVM and PCA-ELM in terms of accuracy (98.11%), precision (96.11%), recall (97.50%), F1-score (98.50%), and training speed.

1. Introduction

The World Health Organization (WHO) declared COVID-19 (caused by the virus SARS-CoV-2) a worldwide pandemic in March 2020. This triggered unprecedented counter-measures, such as the closure of cities, districts, and international travel. Coronaviruses (CoV) are life-threatening viruses that may cause severe acute respiratory syndrome (SARS-CoV). Various researchers and institutions have attempted effective solutions from different possible directions to counter the COVID-19 pandemic. Multimedia data (audio, pictures, video, etc.) are growing at a massive rate alongside text information as civilization enters the information era. Image classification has become more essential as the need for real-world vision systems grows [1] and has recently attracted a lot of attention from many researchers. It has evolved into one of the most essential operations, serving as a prerequisite for many other image processing operations. Image classification using learning algorithms is an open issue in image processing that has sparked a lot of interest due to its promising applications. In general, an image categorization system has two primary stages. The first stage is to create an effective image representation that retains enough information about the image to allow further classification. The second stage is to use a good classifier to classify the new image. Thus, there are two major challenges to consider when improving image classification performance: dimensionality reduction and the classifier. Apart from computer vision and image operations, one of the most important stages in image classification is feature extraction, which determines the invariant characteristics of images when using computing devices to assess and process image data.

In practical scenarios, feature extraction has been applied in many fields such as historic structure analysis, medical image processing, and remote image sensing. The image's essential lower-level qualities include color, texture, and shape. The color feature is global and can be extracted using tools such as the color histogram, color set, and color moments. It can simply describe the proportions of different colors across the image. Color is a useful characteristic for identifying images that are difficult to distinguish automatically, when spatial variation can be ignored. However, it is unable to describe the image's local distribution or the spatial positions of the distinct colors. Image classification with feature extraction using the incremental extreme learning machine is proposed in this paper. Firstly, on the COVID-19 dataset of chest X-ray images, features are extracted from each image using PCA. Then, SVM, ELM, and IELM are applied to image classification [2] once the dimension has been reduced by the PCA method. Different metrics were employed to achieve a robust evaluation: classification accuracy, recall, precision, F-score, true-negative rate (TNR), true-positive rate (TPR), AUC, G-mean, the precision-recall curve, and the receiver operating characteristic (ROC) curve.

The paper is arranged in the following sequence: several related approaches are discussed in Section 2. The suggested technique is described and discussed in Section 3. Section 4 contains a description of PCA and feature extraction techniques; Subsections 4.1–4.6 contain the different algorithmic approaches that are compared with the proposed method. In Section 5, the proposed method and algorithm are presented. Section 6 describes the different evaluation criteria that are used. Section 7 discusses the experimental setup. Section 8 describes the dataset. Finally, Section 9 discusses the experimental results, and Section 10 concludes the research.

2. Related Work

The content of image features comprises color, texture, and other visual elements. The content extracted from visual features is the main component for analyzing an image. In this section, some of the earlier work based on PCA and other feature extraction techniques, along with different classification techniques, is discussed.

Sun et al. [3] suggested an image classification system based on multi-view depth characteristics and principal component analysis. In this method, depth features and RGB features are extracted independently from the image, and PCA is applied to reduce the dimension. The Scene15, Caltech256, and MIT Indoor datasets are used in the evaluation process. Eventually, an SVM [4] is used to classify the images. The method's performance is demonstrated by the experimental results.

Mustaqeem and Saqib [5] suggested a hybrid method that is based on PCA and SVM. PROMISE (KC1: 2109 observations, CM1: 344 observations) data from NASA’s directory have been used for the experiment. The dataset was divided into two parts: training (KC1: 1476 observations, CM1: 240 observations) and testing (KC1: 633 observations, CM1: 104 observations). Principal components of the features are extracted by PCA, and it helps in dimensionality reduction and minimizing time complexity.

In addition to this, SVM is used for further classification, and GridSearchCV is used for hyperparameter tuning. With this setup, the precision, recall, F-measure, and accuracy for the KC1 dataset are 86.8%, 99.6%, 92.8%, and 86.6%, respectively, and for the CM1 dataset they are 96.1%, 99.0%, 97.5%, and 95.2%, respectively. Similarly, Castaño et al. [6] provide a deterministic approach for initializing ELM training based on the hidden node parameters and the activation function. The hidden node parameters are obtained through principal component analysis, whereas the output node parameters are calculated with the help of the Moore–Penrose generalized inverse. Experimental validation with fifteen well-known datasets was used to validate the algorithm. The Bonferroni–Dunn, Nemenyi, and Friedman tests were used to compare the results obtained. In comparison with other ELM advancements, this technique significantly reduces computing costs and outperforms them.

Mateen et al. [7] suggested VGG-19 DNN-based DR model with better performance than AlexNet and the spatial invariant feature transform (SIFT) in terms of classification accuracy and processing time. For FC7-SVD, FC7-PCA, FC8-SVD, and FC8-PCA, respectively, classification accuracies are 98.34%, 92.2%, 98.13%, and 97.96% by using SVD and PCA feature selection with fully connected layers.

Zhao et al. [8] suggested a class-incremental extreme learning machine in which the model is built from supervised samples without iterative retraining. The algorithm is shown to be stable and to achieve accuracy almost equivalent to batch learning. Similarly, Huang and Chen [9] proposed the convex incremental extreme learning machine, an algorithm that analytically calculates the output weights of hidden nodes after randomly generating and adding computational nodes to the hidden layer. Using convex optimization, the output weights of the existing hidden nodes are recalculated. This approach converges faster while maintaining efficiency and simplicity.

Zhu et al. [10] proposed a principal component analysis (PCA)-based categorization system with kernel-based extreme learning machine (KELM). Based on the resultant output, this model achieves better accuracy than SVM and other traditional classification methods. For the classification of HSIs, Kang et al. [11] developed the PCA-EPF extraction approach. In this research work, they have proposed the combination of PCA and standard edge preserving filtering (EPF)-based feature extraction. The proposed method achieves better classification accuracy with limited training samples. Similarly, Perales-González et al. [12] introduced a new ELM architecture based on the negative correlation learning framework dubbed negative correlation hidden layer ELM (NCHL-ELM). This model shows better accuracy when compared with other classifications by integrating a parameter into each node in the original ELM hidden layer.

Based on fractal dimension technology, Li et al. [13] suggested an enhanced ELM algorithm (F-ELM). By reducing the dimension of the hidden layer, the model improves in training speed. From the experimental results, it can be concluded that as compared to the standard ELM technique, the suggested algorithm significantly reduces computing time while also improving inversion accuracy and algorithm stability.

Because of the complexity of the data models, deep learning is very expensive to train. Furthermore, deep learning often necessitates the use of high-priced GPUs and hundreds of machines. There is no simple rule for choosing the best deep learning tools, since this requires an understanding of topology, training techniques, and other characteristics, whereas the simple ELM is a one-shot computation with a rapid learning pace. The biggest advantage of IELM is its ability to add randomly generated hidden nodes incrementally and fix the output weights analytically. The output error of the IELM rapidly diminishes as the number of hidden neurons increases.

In our method, SVM, ELM, and IELM based on the PCA technique are employed for image classification [14] to detect COVID-19 patients using the COVID-19 chest X-ray dataset. A summary of the most recent and related research works is given in Table 1 [3, 5–13].

3. Proposed Methodology

The back propagation (BP) approach is commonly used to train the multi-layer perceptron (MLP). Various algorithms can be used to train this typical architecture; gradient-based and heuristic algorithms are the two types commonly used. These algorithms have a few things in common: they have a hard time dealing with enormous amounts of data, and they have a slow convergence rate in such situations. Huang et al. [15] introduced the extreme learning machine as a solution to this problem.

The typical computing time required to train an SLFN using gradient-based techniques is reduced by this algorithm. The ELM, on the other hand, has several flaws. The randomly generated input weights and biases of the ELM [16] result in some network instability. If there are outliers in the training data, the hidden layer's output matrix becomes ill-conditioned, which results in low generalization performance and lower forecasting accuracy. There are two types of ELM: fixed ELM and IELM [17]. In comparison with the ELM, the output error of the IELM rapidly diminishes and tends toward zero with the growth in the number of hidden neurons [15]. In online continuous learning for regression and classification problems, this approach is very prominent [18, 19].

A trained classifier is obtained after training with a sufficient amount of image data; new images are then fed into the trained classifier for observation and analysis.

4. Feature Extraction

A single feature cannot describe the image content and quality properly, and image classification will not yield acceptable results unless distinguishing features are described. Each RGB color image comprises three component images corresponding to its three color channels. Our method uses PCA to extract the image's important information and minimize the input dimension [20–23].

4.1. Classification of Images and PCA Feature Extraction

Extracting useful features from an image is a prominent task in image classification, and principal component analysis (PCA) is used for this purpose. PCA uses an orthogonal transformation to convert the variables into fewer independent components than the original variables. The output data produced by this approach do not lose important data features, and the PCA loadings can be used to identify the important variables. PCA is a multivariate statistical analysis approach that performs a linear transformation of numerous variables to pick a few key variables. PCA transforms data using eigenvectors from N dimensions to M dimensions, where M < N. The new features are linear combinations of the old ones, allowing them to capture the data's intrinsic variability with little information loss. Figure 1 shows the steps of the proposed model.

Suppose that the research object has p indexes; these indexes are regarded as p random variables and represented as X1, X2, ..., Xp. From these, new indexes F1, F2, ..., Fp are created as combinations of the p random variables, which mirror the information in the original indexes [24]. The independent replacement indexes reflect the original indexes' essential information.

The PCA stages are as follows:

(1) Data standardization: the matrix X is standardized as

  y_ij = (x_ij − x̄_j) / s_j,

where X = {x_ij}, Y = {y_ij}, x̄_j and s_j are the mean and standard deviation of the jth index, i = 1, 2, ..., n, and j = 1, 2, ..., p.

(2) The correlation coefficient matrix R is computed as

  R = (1 / (n − 1)) YᵀY.

(3) The eigenvalues and eigenvectors of the coefficient matrix are obtained by solving

  |R − λI| = 0.

The calculated eigenvectors are a_i = (a_i1, a_i2, ..., a_ip), and the eigenvalues are λ_i (i = 1, 2, ..., p). To obtain the collection of principal components F_i, the eigenvalues are sorted in descending order, λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0.

(4) The contribution rate of the kth principal component is expressed as

  λ_k / Σ_{i=1}^{p} λ_i.

The cumulative contribution rate of the first k principal components is expressed as

  Σ_{i=1}^{k} λ_i / Σ_{i=1}^{p} λ_i.

The first principal component, F1, is the one with the highest variance among all linear combinations of Y1, Y2, ..., Yp; the second principal component, F2, is the one with the highest variance among all linear combinations of Y1, Y2, ..., Yp that are uncorrelated with F1.
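As a concrete illustration of stages (1)–(4), the following is a minimal NumPy sketch of the PCA computation; the variable names and the number of retained components are illustrative, not taken from the paper.

```python
# Minimal sketch of the PCA stages (1)-(4) above using NumPy.
# Variable names (X, n_components) are illustrative placeholders.
import numpy as np

def pca_components(X, n_components):
    # (1) Standardize each column of X (n samples x p features)
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Correlation matrix R of the standardized data
    R = np.cov(Y, rowvar=False)
    # (3) Eigenvalues and eigenvectors of R, sorted in descending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Contribution rate and cumulative contribution rate
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    # Project onto the first n_components principal components
    F = Y @ eigvecs[:, :n_components]
    return F, contribution, cumulative

# Example: reduce 100 flattened 32x32 images to 50 components
X = np.random.rand(100, 1024)
F, contrib, cum = pca_components(X, 50)
print(F.shape, cum[49])
```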

4.2. SVM

Several algorithms have been implemented and suggested in machine learning to solve the classification problem. Among them, the support vector machine (SVM) is a supervised machine learning algorithm with [5, 25] the following advantages:
(i)It employs L2 regularization to overcome overfitting problems.
(ii)It provides suitable results even with minimal data.
(iii)Different kernel functions can match complicated functions of and interactions between the features.
(iv)It manages nonlinearity in the data.
(v)The model is stable thanks to the hyper-plane splitting rule.
(vi)It analyzes data with a high degree of dimensionality.

Instead of focusing on decreasing prediction error, SVM focuses more on optimizing the classification decision boundary, which is why a hyper-plane is used to separate the classes. If the data dimension is n, the hyper-plane is an (n − 1)-dimensional surface that can be represented mathematically as

  ω · x + b = 0.

In a broader sense, the decision function is

  f(x) = sign(ωᵀx + b),

where x denotes the input feature vector, ω is the weight vector, and b is the bias. By adjusting ω and b, several hyper-planes can be created, but the hyper-plane with the best margin is chosen. The largest feasible perpendicular distance between each class and the hyper-plane is defined as the ideal margin. The cost function (objective function) is minimized to obtain the best margin and may be written as

  J(ω) = λ‖ω‖² + (1/N) Σ_{i=1}^{N} max(0, 1 − t_i(ωᵀx_i + b)).

Even if the predictions are right and the data are correctly categorized by the hypothesis, the SVM penalizes any points that lie close to the boundary (margin smaller than 1). The main goal is to find the optimal ω that minimizes J(ω); differentiating the cost function with respect to ω gives the gradient

  ∇J(ω) = 2λω − (1/N) Σ_{i: t_i(ωᵀx_i + b) < 1} t_i x_i.

Once ∇J(ω) has been calculated, the weights ω can be updated as

  ω ← ω − α ∇J(ω),

where α is the learning rate. This procedure is repeated until the smallest J(ω) is found. Because data are rarely linearly separable, we must often draw a nonlinear decision boundary between the classes rather than using a flat hyper-plane. To deal with the dataset's nonlinearity, the decision function is expressed through a kernel:

  f(x) = sign(Σ_i α_i t_i K(x_i, x) + b),

where K(·, ·) is the kernel function. Various kernel functions may be used to create an SVM, such as linear, polynomial, and exponential kernels, but the radial basis function (RBF) is used in this model. The distance measure used is the Euclidean distance, and the smoothness of the boundary is defined by the parameter γ:

  K(x, x′) = exp(−γ ‖x − x′‖²),

where ‖x − x′‖² is the squared Euclidean distance between an observation x and a training sample x′.

Input: Given N observations along with the class labels (x_i, t_i), x_i ∈ R^m, t_i ∈ R^C.
Output: SVM classification model.
(1)Procedure SVM.
(2)Initialize weights ω and bias b with any arbitrary numbers.
(3)Feature optimization.
(4)TrainDataSet, TestingDataSets = Data.Split(Ratio).
(5)Define y.
(6)Define f(y).
(7)Define J(ω).
(8)Calculate ∇J(ω).
(9)Repeat until J(ω) is minimal:
 Update ω.
 Call steps 6–8.
 End.
(10)Calculate accuracy.
(11)Return accuracy.
(12)End procedure.
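The training loop sketched in the procedure above corresponds to sub-gradient descent on the L2-regularized hinge loss. The snippet below is a minimal, self-contained illustration of that loop; the learning rate, regularization strength, and number of epochs are assumed values for demonstration only.

```python
# Minimal sketch of the SVM training loop above: sub-gradient descent on the
# L2-regularized hinge loss. Hyperparameters (lr, lam, epochs) are illustrative.
import numpy as np

def train_linear_svm(X, t, lr=0.01, lam=0.01, epochs=200):
    # t must contain labels in {-1, +1}
    n, m = X.shape
    w = np.zeros(m)          # weight vector (omega)
    b = 0.0                  # bias
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            margin = ti * (xi @ w + b)
            if margin >= 1:              # correctly classified, outside margin
                w -= lr * (2 * lam * w)  # only the regularization gradient
            else:                        # inside margin or misclassified
                w -= lr * (2 * lam * w - ti * xi)
                b -= lr * (-ti)
    return w, b

# Toy usage with two separable clusters
X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
t = np.hstack([np.ones(50), -np.ones(50)])
w, b = train_linear_svm(X, t)
pred = np.sign(X @ w + b)
print("training accuracy:", (pred == t).mean())
```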
4.3. PCA-SVM

The motive of the support vector machine (SVM) [3] is to find the best possible hyper-plane that separates the two classes in the training set; the coefficients of this hyper-plane are what must be learned. It uses structural risk minimization theory to build the best separating hyper-plane in the feature space and a learning procedure to achieve global optimization.

Assume the training data (x_i, y_i), i = 1, 2, ..., N, with y_i ∈ {−1, +1}.

This can be separated by a hyper-plane:

  ω · x + b = 0.

After normalization, the constraint becomes

  y_i(ω · x_i + b) ≥ 1, i = 1, 2, ..., N.

The classification margin is equal to 2/‖ω‖, so maximizing the margin is equivalent to minimizing ‖ω‖²/2.

Before classifying the data with the SVM, the necessary features need to be extracted from the image data so that the high-dimensional data can be converted into low-dimensional data. For this, the PCA method is used for feature extraction through the covariance matrix and the calculation of eigenvalue proportions. PCA-based SVM is then used for classification: the SVM classifies the low-dimensional data. Figure 2 depicts the working flow of PCA-SVM. Once the parameter optimization is done, the model is ready to predict the categorization.

Input: Given N observations along with the class labels (x_i, t_i), x_i ∈ R^m, t_i ∈ R^C.
Output: PCA-SVM model for classification.
(1)Procedure PCA-SVM.
(2)Identify the relationships among features through the covariance matrix.
(3)Through the linear transformation or eigendecomposition of the covariance matrix, obtain the eigenvectors and eigenvalues.
(4)Transform the data into principal components using the eigenvectors.
(5)Quantify the importance of these relationships using the eigenvalues and keep the important principal components.
(6)The data extracted by PCA are given as input.
(7)Initialize weights ω and bias b with any arbitrary numbers.
(8)Feature optimization.
(9)TrainDataSet, TestingDataSets = Data.Split(Ratio).
(10)Define y.
(11)Define f(y).
(12)Define J(ω).
(13)Calculate ∇J(ω).
(14)Repeat until J(ω) is minimal:
Update ω.
Call steps 10–13.
End.
(15)Calculate accuracy.
(16)Return accuracy.
(17)End procedure.
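For readers who prefer an off-the-shelf implementation, the PCA-SVM procedure can be approximated with scikit-learn as sketched below; the number of principal components, the RBF hyperparameters, the 80/20 split, and the toy data are all illustrative assumptions rather than the paper's exact settings.

```python
# Sketch of the PCA-SVM procedure using scikit-learn; component count,
# RBF parameters, and the 80/20 split are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# X: flattened grayscale X-ray images, y: 1 = COVID-19, 0 = normal (toy data here)
X = np.random.rand(200, 64 * 64)
y = np.random.randint(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pca_svm = make_pipeline(
    StandardScaler(),                          # scale features before PCA
    PCA(n_components=50),                      # keep the leading principal components
    SVC(kernel="rbf", C=1.0, gamma="scale"),   # RBF-kernel SVM classifier
)
pca_svm.fit(X_train, y_train)
print("test accuracy:", pca_svm.score(X_test, y_test))
```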
4.4. Extreme Learning Machine (ELM)

An extreme learning machine is a single hidden layer feedforward network that can be used for both classification and regression. In the ELM [26], the weights between the input layer and the hidden layer and the hidden biases are randomly generated. The output weights are calculated using the generalized Moore–Penrose pseudo-inverse. ELM trains faster than other feedforward networks [27] and outperforms other iterative methods. Figure 3 shows the basic network architecture of ELM.

Suppose (x_i, t_i) denotes the N training samples, where i ∈ {1, 2, 3, ..., N}, x_i = [x_i1, x_i2, ..., x_im]ᵀ ∈ R^m denotes the ith training instance, and its desired output is t_i = [t_i1, t_i2, ..., t_iC]ᵀ ∈ R^C.

Let the number of input features equal the number of input neurons, represented by m; similarly, let L be the number of hidden neurons. The number of output neurons equals the number of classes and is denoted by C. Figure 4 [24] shows the flowchart of the principal component analysis [28]. The input weight matrix is represented by U = [u_1, u_2, ..., u_j, ..., u_L]ᵀ ∈ R^{L×m}, and the hidden neuron biases are represented by b = [b_1, b_2, ..., b_j, ..., b_L]ᵀ ∈ R^L, where u_j = [u_j1, u_j2, ..., u_jm] are the connecting weights between the jth hidden neuron and the input neurons. The bias of the jth hidden neuron is b_j, and the output of the jth hidden neuron for the ith instance is

  h_ij = g(u_j · x_i + b_j).  (19)

Here, the activation function is represented by g(·). For all the training instances, the hidden layer output is represented by H = [h_ij] ∈ R^{N×L}.

Between the hidden layer and the output layer, the output weights β can be computed using Eq. (20); a linear activation function is used by the output layer in this computation:

  β = H†T.  (20)

Here, T = [t_1, t_2, ..., t_N]ᵀ ∈ R^{N×C} is the target matrix and H† is the Moore–Penrose generalized inverse of H.

The vector β_j = [β_j1, ..., β_jk, ..., β_jC]ᵀ, where j = 1, 2, 3, ..., L, represents the connecting weights between the jth hidden neuron and the output neurons. The predicted outcome of all the output neurons for all training instances is

  f(x) = h(x)β.

Here, the output function is f(x) = [f_1(x), ..., f_C(x)], and from Eq. (23) the class label of x can be predicted:

  label(x) = arg max_{k ∈ {1, ..., C}} f_k(x).  (23)

Input: Given N observations along with the class labels (x_i, t_i), x_i ∈ R^m, t_i ∈ R^C.
Output: ELM classification model.
(1)ELM procedure.
(2)Hidden biases b and input weights U randomly selected.
(3)From (19), H as hidden layer output is defined.
(4)From (20), β is defined as the weights between the hidden layer and the output layer.
(5)Return β.
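A minimal NumPy sketch of this ELM procedure is given below: random input weights and biases, a sigmoid hidden layer as in (19), and output weights obtained with the Moore–Penrose pseudo-inverse as in (20). The toy data and the choice of L are illustrative.

```python
# Minimal NumPy sketch of the ELM procedure above: random input weights and
# biases, sigmoid hidden layer, output weights via the Moore-Penrose pseudo-inverse.
import numpy as np

def elm_train(X, T, L=100, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    U = rng.uniform(-1, 1, size=(L, m))          # input weights
    b = rng.uniform(-1, 1, size=L)               # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ U.T + b)))     # hidden layer output, Eq. (19)
    beta = np.linalg.pinv(H) @ T                 # output weights, Eq. (20)
    return U, b, beta

def elm_predict(X, U, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ U.T + b)))
    return H @ beta

# Toy usage with one-hot targets for a 2-class problem
X = np.random.rand(300, 20)
labels = (X[:, 0] + X[:, 1] > 1).astype(int)
T = np.eye(2)[labels]
U, b, beta = elm_train(X, T, L=50)
pred = elm_predict(X, U, b, beta).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```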
4.5. PCA-ELM: Classification Method Based on PCA and ELM

In the PCA technique [6], the variables are first scaled. The different steps of PCA that have been applied in PCA-ELM are:
(1)Scaling of the training data.
(2)Covariance matrix evaluation.
(3)Computation of the eigenvalues and eigenvectors of the covariance matrix.
(4)Evaluation of the principal components.

The output from PCA is given as an input to ELM [29]. The process of PCA-ELM [30] is shown in Figure 5.

Input: Given N observations along with the class labels (x_i, t_i), x_i ∈ R^m, t_i ∈ R^C.
Output: PCA-ELM model for classification.
(1)Procedure PCA-ELM.
(2)Identify the relationship among features through a covariance matrix.
(3)Through the linear transformation or eigendecomposition of the covariance matrix, we get eigenvectors and eigenvalues.
(4)Transform our data using eigenvectors into principal components.
(5)Quantify the importance of these relationships using eigenvalues and keep the important principal components.
(6)Extracted principal components given as the input data.
(7)Hidden biases b and input weights U randomly selected.
(8)From (19), hidden layer output H can be computed.
(9)From (20), weights between the hidden and the output layer computed as β.
(10)Return β.
(11)End procedure.
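The following is a compact sketch of the PCA-ELM procedure, assuming scikit-learn for the PCA step and plain NumPy for the ELM step; the component count, number of hidden nodes, and stand-in data are illustrative.

```python
# Sketch of PCA-ELM: principal components from the training data are fed to an
# ELM trained with the pseudo-inverse; component count and L are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy stand-in for flattened X-ray images and binary labels
X = rng.random((300, 1024))
labels = rng.integers(0, 2, 300)
T = np.eye(2)[labels]                      # one-hot targets

# PCA feature extraction (steps 2-5 of the procedure)
F = PCA(n_components=40).fit_transform(X)

# ELM on the principal components (steps 6-9)
L = 80
U = rng.uniform(-1, 1, (L, F.shape[1]))    # random input weights
b = rng.uniform(-1, 1, L)                  # random hidden biases
H = 1.0 / (1.0 + np.exp(-(F @ U.T + b)))   # hidden layer output, Eq. (19)
beta = np.linalg.pinv(H) @ T               # output weights, Eq. (20)

pred = (H @ beta).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```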
4.6. Incremental Extreme Learning Machine (IELM)

Compared to other neural networks, the ELM learns faster, since there is no need to adjust the hidden nodes, and it provides better generalization capability. However, the ELM has various flaws. The randomly generated biases and input weights of the ELM network [31] result in some network instability. Outliers in the training data make the hidden layer's output matrix ill-conditioned, which results in poor generalization performance. In comparison to the ELM, the output error of the IELM rapidly diminishes, and the IELM resolves the issues of very small output weights and the validity of hidden layer neurons. It is appropriate for regression and classification tasks in online continuous learning.

The IELM [32] network model structure is shown in Figure 6. Suppose the numbers of inputs, hidden nodes, and outputs are m, l, and n, respectively; w_i is the input weight vector of the ith hidden layer neuron, with entries that are uniformly distributed random numbers in [−1, 1]. The bias b_i of the ith hidden node is a random number uniformly distributed in [−1, 1], the activation function of the hidden layer neurons is the sigmoid function given by (24), and β is the output weight matrix of dimension l × n.

The hidden node activation function (sigmoid) is given by

  g(x) = 1 / (1 + e^{−x}),  (24)

where x is the input to the node.

The matrix X, of dimension m × N, represents the N dataset inputs, and Y is an n × N matrix representing the outputs, so the training set consists of the N pairs {(X, Y)}. The training steps of the IELM algorithm are described as follows:

Step 1. Initialization phase: let l = 0 and let L be the maximum number of hidden nodes. The initial value of the residual E (the difference between the target and the actual output) is set to the target output Y, and ε is the expected training accuracy.

Step 2. Training phase: while l < L and ‖E‖ > ε,
(1)The number of hidden nodes is increased by 1, i.e., l = l + 1.
(2)The input weights w_l and bias b_l of the new hidden layer neuron are generated randomly.
(3)The output of the activation function is calculated for the new node (evaluated over all training inputs).
(4)The output vector of the new hidden layer neuron is calculated as

  H_l = [g(w_l · x_1 + b_l), g(w_l · x_2 + b_l), ..., g(w_l · x_N + b_l)].  (27)

(5)The output weight β_l for the new node is evaluated as

  β_l = (E · H_lᵀ) / (H_l · H_lᵀ).  (28)

(6)After adding the new hidden node, the residual error is updated:

  E = E − β_l H_l.

The output weight β_l reduces the network error. These steps are repeated iteratively, each time determining a new random input weight w_l and bias b_l, until the residual error becomes smaller than ε or the number of hidden nodes reaches L. Whether the trained network has achieved the desired result can be determined from the test set.

Input: Given N observations along with the class labels (x_i, t_i), x_i ∈ R^m, t_i ∈ R^C.
Output: IELM model for classification.
(1)Procedure IELM.
(2)Initialize: number of hidden nodes l = 0 and maximum number L.
(3)For the newly added hidden layer neuron l, randomly generate the input weights w_l and bias b_l.
(4)Calculate the output of the activation function for the node l.
(5)Calculate the hidden layer output H_l from Eq. (27).
(6)Evaluate the weight β_l between the hidden and the output layer from Eq. (28).
(7)Return β.
(8)End procedure.
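A minimal NumPy sketch of this IELM training loop is shown below: one hidden node is added per iteration and its output weight is fixed analytically from the current residual, following (27) and (28). The node limit L, tolerance eps, and toy data are assumed values.

```python
# Minimal sketch of the IELM training loop (Steps 1-2 above): hidden nodes are
# added one at a time and each output weight is fixed analytically from the
# current residual. L and eps are illustrative.
import numpy as np

def ielm_train(X, T, L=100, eps=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W, biases, betas = [], [], []
    E = T.copy()                                   # residual, initially the targets
    l = 0
    while l < L and np.linalg.norm(E) > eps:
        l += 1
        w = rng.uniform(-1, 1, m)                  # new hidden node weights
        b = rng.uniform(-1, 1)                     # new hidden node bias
        h = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # node output, Eq. (27)
        beta = (h @ E) / (h @ h)                   # output weight, Eq. (28)
        E = E - np.outer(h, beta)                  # update residual error
        W.append(w); biases.append(b); betas.append(beta)
    return np.array(W), np.array(biases), np.array(betas)

def ielm_predict(X, W, biases, betas):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + biases)))
    return H @ betas

# Toy usage
X = np.random.rand(200, 10)
labels = (X.sum(axis=1) > 5).astype(int)
T = np.eye(2)[labels]
W, bs, betas = ielm_train(X, T, L=60)
pred = ielm_predict(X, W, bs, betas).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```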

5. Proposed PCA-Based Incremental ELM (PCA-IELM)

An orthogonal transformation is used to extract meaningful characteristics from data in PCA [33]. PCA may also be used to minimize the dimensions of a large data collection. Principal components are extracted from the COVID-19 X-ray images using PCA and given as input to the IELM, which gradually adds randomly generated hidden nodes. A conventional SLFN function with n hidden nodes can be expressed as

  f_n(x) = Σ_{i=1}^{n} β_i G(a_i, b_i, x),

where G(a_i, b_i, x) denotes the output of the ith hidden node: G(a_i, b_i, x) = g(a_i · x + b_i) for additive nodes or G(a_i, b_i, x) = g(b_i ‖x − a_i‖) for RBF nodes.

The ith hidden node and the output node are linked by the output weight β_i. Hidden nodes are randomly added to the existing network in IELM. The hidden node parameters a_n and b_n are generated randomly, and the output weight of the new node is fixed as β_n = ⟨e_{n−1}, H_n⟩ / ‖H_n‖², where H_n is the output vector of the new hidden node.

Suppose the residual error function for the current network is defined as e_n = f − f_n, where n is the number of hidden nodes and f is the target function. IELM is mathematically represented as

  f_n(x) = f_{n−1}(x) + β_n G(a_n, b_n, x).

(i)Input: Given N observations along with the class labels (x_i, t_i), x_i ∈ R^m, t_i ∈ R^C.
(ii)Output: PCA_IELM model for classification.
(1)Procedure PCA_IELM.
(2)Identify the relationships among features through the covariance matrix.
(3)Through the linear transformation or eigendecomposition of the covariance matrix, obtain the eigenvectors and eigenvalues.
(4)Transform the data into principal components using the eigenvectors.
(5)Quantify the importance of these relationships using the eigenvalues and keep the important principal components.
(6)The extracted principal components are given as the input to the model.
(7)Initialize: number of hidden nodes l = 0 and maximum number L.
(8)For the newly added hidden layer neuron l, randomly generate the input weights w_l and bias b_l.
(9)Calculate g as the output of the activation function for the node l.
(10)Calculate the hidden layer output H_l from Eq. (27).
(11)Evaluate the weight β_l between the hidden and the output layer from Eq. (28).
(12)Return β.
(13)End procedure.
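To make the proposed pipeline concrete, the following sketch chains PCA feature extraction and the incremental ELM loop described above; the stand-in data, the number of principal components, and the node limit are illustrative placeholders rather than the paper's experimental settings.

```python
# Sketch of the proposed PCA-IELM pipeline: PCA features from chest X-ray
# vectors are passed to an incrementally grown ELM. Dataset, component count,
# and node limit are illustrative placeholders.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((400, 1024))                  # stand-in for flattened X-ray images
labels = rng.integers(0, 2, 400)             # 1 = COVID-19, 0 = normal
T = np.eye(2)[labels]

# Steps 2-6: extract principal components as the model input
F = PCA(n_components=50).fit_transform(X)

# Steps 7-11: incrementally add hidden nodes and fix their output weights
L, eps = 150, 1e-3
W, biases, betas = [], [], []
E = T.copy()
while len(W) < L and np.linalg.norm(E) > eps:
    w = rng.uniform(-1, 1, F.shape[1])
    b = rng.uniform(-1, 1)
    h = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # new node output, Eq. (27)
    beta = (h @ E) / (h @ h)                 # new output weight, Eq. (28)
    E -= np.outer(h, beta)                   # shrink the residual error
    W.append(w); biases.append(b); betas.append(beta)

H = 1.0 / (1.0 + np.exp(-(F @ np.array(W).T + np.array(biases))))
pred = (H @ np.array(betas)).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```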

6. Evaluation Criteria for Effective Measure of Model

For the evaluation of the different models, the confusion matrix is generally prepared. Table 2 shows a simple representation of the confusion matrix [34, 35], which cross-tabulates predicted and actual values. From the confusion matrix, we can derive different performance metrics, e.g., accuracy, precision, recall, sensitivity, and F-score. To assess the model, nine different metrics are calculated using the formulas given in Table 3 [36].
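As an illustration of how these metrics follow from the confusion matrix, the snippet below derives several of them with scikit-learn and NumPy; the y_true and y_pred arrays are placeholder values.

```python
# Sketch of deriving the evaluation metrics from a confusion matrix with
# scikit-learn; y_true and y_pred are placeholder arrays.
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)                 # recall / sensitivity
tnr = tn / (tn + fp)                 # specificity
g_mean = np.sqrt(tpr * tnr)          # geometric mean of TPR and TNR

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_pred))
print("TPR, TNR, G-mean:", tpr, tnr, g_mean)
```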

7. Experimental Setup

The whole experiment was performed on a system with a 10th Generation Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz processor, 8 GB RAM, and an NVIDIA GTX 1650 Ti graphics card. The code is written in Python 3.10.0 and uses Jupyter Notebook as the development environment, which can be installed from https://jupyter.org/install.

8. Dataset Description

The COVID-19 chest X-ray image dataset [37], downloaded from Kaggle, encompasses a total of 13,808 images: 3,616 COVID-19 positive cases (26.2%) and 10,192 normal cases (73.8%). The COVID-19 and normal patient chest X-ray images are kept in separate folders. The dataset was divided randomly into training and testing images under the condition that testing images are not repeated in the training images. During the experiment, 80% of the total images were used for training and 20% for testing. All images have the same dimension (299 × 299 pixels) in the PNG file format. Figure 7 shows X-ray images of normal and COVID-19 cases.

The histogram of an image gives a global description of the image's appearance. It represents the relative frequency of occurrence of the various intensity values in an image. In the histogram of the COVID-19 image, the intensity value is highest between bins 14–15, whereas in the normal image the histogram has its highest intensity value at bins 16–17. This difference in the intensity distribution assists in making the distinction between COVID-19 and normal images. Figure 8 demonstrates the histogram plots of normal and COVID-19 images. Figure 9 shows sample training X-ray images of COVID-19 and normal cases.
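For reference, an intensity histogram of the kind discussed above can be produced as in the following sketch, assuming OpenCV and Matplotlib are available; the image path is a hypothetical placeholder.

```python
# Sketch of plotting the intensity histogram of a chest X-ray image; the file
# path is a placeholder and OpenCV/Matplotlib are assumed to be installed.
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("COVID/COVID-1.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
plt.hist(img.ravel(), bins=32, range=(0, 255))
plt.xlabel("Pixel intensity")
plt.ylabel("Frequency")
plt.title("Intensity histogram of a chest X-ray image")
plt.show()
```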

Because PCA uses orthogonal transformation to convert all features into a few independent features, all features are considered during the feature selection process. The data to be processed are reduced to a set of features called a “reduced representation set.”

9. Results and Discussion

In this section, we present the outcomes and analysis of the experiments performed for COVID-19 patient prediction using the chest X-ray dataset. The experimental results show that the proposed method performs better in terms of accuracy, precision, recall, F1-score, AUC, G-mean, and other parameters. For each model, PCA-SVM, PCA-ELM, and PCA-IELM, a separate confusion matrix is formed, and all the performance metric values are derived from these confusion matrices (Tables 4–6). The classification accuracy gained by the proposed method PCA-IELM over the chest X-ray dataset is 98.11%, which is better than the other two models, PCA-based SVM (91.80%) and PCA-based ELM (93.80%). Accuracy alone can sometimes be misleading, so the other metrics are also taken into consideration to confirm the claim made by the classifier. PCA-IELM has the highest precision value of 96.11%; that is, PCA-IELM is 96.11% reliable in its positive decisions, whereas the models PCA-SVM and PCA-ELM record lower precision of 84.3% and 88.3%, respectively. Similarly, for the proposed method PCA-IELM, the other metrics (refer to Figure 10), recall, F1-score, TPR, TNR, and G-mean, are considerably higher than those of the other two methods, PCA-SVM and PCA-ELM.

The geometric mean (G-mean) is a statistic that assesses categorization performance across the majority and minority classes. Even if negative examples are correctly labelled as such, a poor G-mean suggests weak performance in identifying positive occurrences. This statistic is essential for preventing overfitting of the negative class while underfitting the positive class, since the COVID-19 dataset under study is also class imbalanced (IR = 2.81). Even so, the PCA-IELM model indicates good performance by attaining the highest G-mean value of 98%, whereas PCA-SVM and PCA-ELM achieve G-mean values of 88% and 90.5%, respectively.

Table 7 demonstrates the performance variation (sensitivity, specificity, precision, F1-score, accuracy) for different numbers of hidden nodes in the range of 10–150, in steps of 10 hidden nodes. The training and testing accuracies of PCA-IELM show almost the same behavior on the COVID-19 dataset (refer to Figure 11). There is moderate variation in the accuracy of PCA-IELM with respect to the number of hidden nodes: the accuracy with 10 hidden nodes was found to be 97.73%, while 98.11% was achieved with 140 hidden nodes and beyond in the PCA-IELM model (refer to Table 7).

When there is a moderate to large class imbalance, precision-recall curves should be drawn; here, the COVID-19 dataset is imbalanced with an imbalance ratio (IR) of 2.81. It is worth noting that precision is also called the positive predictive value (PPV), and recall is also known as sensitivity, hit rate, or true-positive rate (TPR); that is, they describe positive cases rather than negative ones. Most machine learning algorithms involve a trade-off between recall and precision, and a good PR curve has a greater AUC (area under the curve). Figures 12(b), 13(b), and 14(b) depict the PR curves. Figure 13(b) shows the greater AUC, which is an indication of the better performance of PCA-IELM than the other two models. In addition, the ROC curve of Figure 14(a) also covers a larger AUC than those of Figures 12(a) and 13(a). Therefore, PCA-IELM shows better performance than PCA-SVM and PCA-ELM. The proposed PCA-IELM model also outperforms other previously developed models for the identification of COVID-19 patients from chest X-ray images (refer to Table 8 [38–47]). As far as the training and testing time taken by the proposed model PCA-IELM is concerned, it was higher (refer to Table 9) because the model is executed incrementally rather than in one go.

10. Conclusions

In this paper, an effective classification model is proposed for the COVID-19 chest X-ray image dataset using principal component analysis (PCA) and the incremental extreme learning machine (IELM). This study established the valuable application of the ELM model to classify COVID-19 patients from X-ray images by developing the PCA-IELM model. The proposed PCA-based IELM algorithm is an efficient IELM-based algorithm: the network input is obtained from the principal components that PCA extracts from the training dataset, and the output weights are determined using the Moore–Penrose generalized inverse. PCA-IELM utilizes the best feature of IELM, which is to add hidden nodes incrementally and determine the output weights analytically, whereas ELM requires the appropriate number of hidden nodes to be set manually, which amounts to trial and error. In comparison with the ELM, the output error of the IELM reduces rapidly and approaches zero as the number of hidden neurons increases. It was observed that as the number of hidden nodes increased, the performance of PCA-IELM increased, and it became stable at 150 hidden nodes. PCA-IELM outperforms PCA-SVM and PCA-ELM in terms of accuracy (98.11%), precision (96.11%), recall (97.50%), F1-score (98.50%), G-mean (98%), etc. The suggested research contributes to the prospect of a low-cost, quick, and automated diagnosis of COVID-19 patients, and it may be used in clinical scenarios. Such an effective system can provide early detection of COVID-19 patients and is therefore helpful in controlling the further spread of the virus from affected persons. It offers intelligent assistance for radiologists to accurately diagnose COVID-19 from X-ray images.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.