Abstract

Background. Cloud-based machine learning environments play a vital role in medical image analysis, particularly for people residing in rural areas where health facilities are insufficient. Machine-learning-based diagnosis of COVID-19 supported by cloud computing can assist radiologists and support telehealth services for remote diagnostics during this pandemic. Methods. In the proposed computer-aided diagnosis (CAD) system, the balance contrast enhancement technique (BCET) is used to enhance chest X-ray images. Textural and shape-based features are extracted from the preprocessed X-ray images, and the fusion of these features generates the final feature vector. The gain ratio is applied for feature selection to remove insignificant features. An extreme learning machine (ELM), a neural network variant with strong pattern recognition and classification capability, is used for COVID-19 detection; to further improve its accuracy, we propose a bootstrap aggregated extreme learning machine (BA-ELM). Results. The proposed cloud-based model is evaluated on the benchmark COVID-Xray-5k dataset. We chose 504 (after data augmentation) and 100 COVID-19 images for training and testing, respectively, and 2000 and 1000 images from the non-COVID-19 category for training and testing. Conclusion. The model achieved an average accuracy of 95.7%.

1. Introduction

The coronavirus that emerged from Wuhan, China, in December 2019 was named COVID-19. According to the World Health Organization (WHO), the virus can affect the respiratory system, causing fever, cough, and pneumonia [1]. Coronaviruses are found in many animal populations and livestock, including bats, cats, camels, and cattle. SARS (severe acute respiratory syndrome) is a viral respiratory illness caused by the SARS-associated coronavirus (SARS-CoV), which was first reported in Southern China in 2003 and has since spread to many countries around the world [1]. Both the SARS and the novel COVID-19 outbreaks most likely started in Chinese wet markets, where the virus was transmitted from animals to humans. The WHO's Emergency Committee declared the novel COVID-19 outbreak a public health emergency of international concern on January 30, 2020, due to its rapid person-to-person spread and the lack of immunity in most infected cases. COVID-19 has spread across China and is now present in several neighboring countries, America, and Europe. COVID-19 has shocked the world with its rapid spread and virulence, which has had a significant effect on the lives of the global population from both a health and an economic perspective. At the time of writing this paper, around 123,758,893 people had been infected with COVID-19, and about 2,725,167 global deaths had been reported as of 15:55 on March 23, 2021 [2]. The report shows that the largest number of deaths, 542,991, occurred in the US.

COVID-19 has posed serious health risks around the world. COVID-19 symptoms are nonspecific, and the disease can lead to severe pneumonia and even death in some cases [3]. According to a study published on February 20, 2020, by the WHO-China Joint Mission on COVID-19 [4], the more common signs and symptoms are fever, dry cough, weakness, sputum production, shortness of breath, sore throat, headache, myalgia or arthralgia, chills, nausea or vomiting, nasal congestion, diarrhea, hemoptysis, and conjunctival congestion. Aslan et al. [5] developed two deep learning architectures, both based on AlexNet, to identify positive COVID-19 cases from chest CT images. The second design is a hybrid structure with a bidirectional long short-term memory (BiLSTM) layer that also considers temporal aspects. The first architecture achieved a COVID-19 classification accuracy of 98.14%, while the hybrid architecture achieved 98.70%. Saygili [6] tested his proposed approach on three public COVID-19 datasets in five main stages: data collection, preprocessing, feature extraction, dimension reduction, and classification, each with its own suboperations. The proposed model detected COVID-19 with 89.41% accuracy on dataset-1 (CT), 99.02% on dataset-2 (X-ray), and 98.11% on dataset-3 (CT). On the X-ray dataset, the three classes COVID-19 (+), COVID-19 (-), and non-COVID-19 pneumonia were classified with an accuracy of 85.96%.

In most people, COVID-19 causes mild respiratory problems and fever, with symptoms appearing after roughly 5–6 days. The majority of COVID-19-infected patients suffer mildly and recover soon; around 80% of confirmed patients have a mild or uncomplicated illness that resolves without special care. However, due to a shortage of nucleic acid detection kits and low diagnosis rates in epidemic regions, the increasing number of newly suspected and infected cases has become a concern for most hospitals [7]. X-ray and computed tomography (CT) scans are therefore key diagnostic instruments for COVID-19 detection [8].

On the other hand, skilled radiologists are required to accurately identify COVID-19 from chest scans. A CT scan provides a complete image of blood vessels, muscles, lungs, and soft tissues; internal structures can be seen, and their size, shape, density, and texture can be determined from CT images. CT scans offer a more detailed view than conventional X-rays by presenting a series of slices of a single portion of the body. Nevertheless, we employed X-rays to detect COVID-19 infection because of their availability in almost every hospital and clinic. COVID-19 diagnosis relies on a physician's ability to identify affected patients with a low rate of false negatives (FN), while heavy workloads, fatigue caused by large numbers of patients, and comparatively inexperienced radiologists lead to a high false-positive (FP) rate. As a result, early detection of COVID-19 is critical to prevent undue strain on the healthcare system and to avoid holding patients in quarantine unnecessarily [9]. This research presents a computer-aided diagnosis (CAD) system based on machine learning and a cloud server to detect whether a patient has COVID-19 or not. We believe that the proposed framework could help reduce the burden on healthcare workers and improve COVID-19 diagnosis capability during this pandemic.

2. Background

Machine learning methodologies and image processing procedures are often used in diagnostics to achieve fast and precise results. The use of machine learning, especially CNN-based architectures, is a significant innovation that has yielded promising results. Computer-aided diagnosis (CAD) systems have been suggested as the most effective approach for COVID-19 so far, and deep learning is gaining interest in medical imaging research. According to Saygili [10], automated COVID-19 detection includes preprocessing, segmentation, and classification. Preprocessing procedures include CT image scaling, sharpening, noise reduction, and contrast stretching, while segmentation uses an expectation-maximization-based Gaussian mixture model. COVID-19 is categorized as positive or negative using multiple classifiers. The research employed two publicly available CT datasets and a mixed dataset developed by combining them; dataset-1 yielded the highest accuracy (98.5%), while dataset-2 and the mixed dataset achieved 86.3% and 94.5%, respectively. Different classifiers were suggested by Xie et al. [11] to differentiate SARS from normal pneumonia using X-ray images. Wang et al. [12] used 3D CT volumes to create a deep learning model for lesion localization and COVID-19 classification; they used U-Net to segment lung regions and then a deep neural network to classify COVID-19 with an accuracy of 0.901. Butt et al. [13] proposed a COVID-19 deep learning algorithm, which they evaluated on a dataset of 618 CT images comprising 224 cases of pneumonia, 219 cases of COVID-19, and 175 normal cases. The suggested procedure achieved 92.2% accuracy, an AUC of 0.996, and 98.2% sensitivity. Shan et al. [14] used deep learning to obtain a Dice similarity coefficient (DSC) of 91.6%. Wang et al. [15] presented an inception migration-learning model with a precision of 82.9%, accuracy of 80.5%, and sensitivity of 84%. Singh et al. [16] used a CNN model to classify chest CT images as COVID-19 positive or negative and achieved a high accuracy score. Harmon et al. [17] used deep learning to classify COVID-19 from chest CT images and achieved 90.8% accuracy, 84% sensitivity, and 93% precision. Murugan et al. [18] proposed CNN-ResNet50 to identify COVID-19, pneumonia, and normal cases in an X-ray image dataset and achieved an accuracy, specificity, and sensitivity of 94.07%, 85.21%, and 91.48%, respectively. The DenseNet201 model was used by Jaiswal et al. [19] to detect COVID-19 from chest CT scans, and experiments showed that the model performs well on chest CT images.

This research's key contributions are as follows:
(i) A cloud-based diagnosis system is proposed to remotely evaluate patients suspected of or infected with COVID-19; analysis of the cases is performed on a cloud server to facilitate patients in remote areas.
(ii) A fusion of features is employed to obtain robust features from the enhanced X-ray images.
(iii) The gain ratio, an improved version of information gain, is used to remove irrelevant features and select the most appropriate ones.
(iv) A bootstrap aggregated extreme learning machine (BA-ELM) is employed after feature selection, in which multiple ELM models are trained through bootstrap resampling to enhance accuracy.

The rest of the paper is organized as follows. Section 3 provides the details of the proposed model, including input data, preprocessing, a fusion of features, best feature selection, and finally classification. Section 4 depicts evaluation criteria. Section 5 exhibits experimental results. Finally, the conclusion and future work are presented in Section 6.

3. Cloud-Based COVID-19 Diagnosis Model

This research proposes a cloud-based COVID-19 diagnosis system that facilitates evaluating and monitoring remote patients to identify COVID-19. The system concentrates on classifying and diagnosing each case as "COVID-19" or "non-COVID-19."

The architecture of the proposed model is demonstrated in Figure 1. In this model, the suspected or infected patient goes to the nearest hospital in their city/town where a chest X-ray is available. First, the hospital collects patient data in the form of X-rays and other health parameters. Then, it sends the images and data via the Internet to the CAD server in the cloud for further processing. After processing, the report is sent back to the concerned patient through the attending doctor.
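The hospital-side upload step can be sketched as follows, assuming a generic REST interface on the cloud CAD server; the endpoint URL and field names are hypothetical and only illustrate the data flow described above, not the deployed system.

```python
# Minimal sketch of the hospital-side upload, assuming a hypothetical REST
# endpoint on the cloud CAD server; URL and field names are illustrative only.
import requests

def send_case_to_cad_server(xray_path, patient_meta,
                            server_url="https://cad-server.example.org/api/cases"):
    """Upload a chest X-ray plus basic patient parameters and return the report."""
    with open(xray_path, "rb") as f:
        response = requests.post(
            server_url,
            files={"xray_image": f},      # chest X-ray file
            data=patient_meta,            # e.g. {"age": 54, "temperature_c": 38.2}
            timeout=60,
        )
    response.raise_for_status()
    return response.json()                # report relayed back via the physician
```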

In the cloud environment, the patients' data are processed in the following steps. First, the X-ray images are enhanced; then, a fusion of texture and shape features is employed [20]. Various researchers have shown that feature selection improves classification accuracy and reduces computational cost, so we employ the gain ratio method to remove insignificant features and retain the significant ones. Finally, classification is applied to the reduced feature set using the bootstrap aggregated extreme learning machine (BA-ELM).

3.1. COVID-Xray-5k Dataset

Benchmark datasets are critical for assessing the proposed approaches and comparing their outcomes with the reported literature. We chose the COVID-Xray-5k dataset [21], whose training set includes 84 COVID-19 and 2000 non-COVID-19 images. Data augmentation is applied to increase the number of COVID-19 training samples to 420. The non-COVID-19 testing data contain images belonging to 13 subcategories, such as edema, fracture, pneumonia, and no_finding. We therefore chose 400 images from the no_finding category and 600 images from the other categories (50 from each), resulting in 1000 non-COVID-19 test images. Finally, the COVID-19 test set comprises 100 images, as presented in Table 1.

3.2. Balance Contrast Enhancement Technique (BCET)

BCET is an image enhancement technique used to improve image quality for further processing. The quality is improved by stretching or compressing the contrast of the X-ray images without changing the histogram pattern. The general form of the parabolic function is computed as

$$y = p(x - q)^{2} + r,$$

where the three coefficients p, q, and r are derived from the maximum, minimum, and mean of the input and output image values:

$$q = \frac{m^{2}(a - f) - s(g - f) + n^{2}(g - a)}{2\left[m(a - f) - e(g - f) + n(g - a)\right]}, \qquad
p = \frac{g - f}{(m - n)(m + n - 2q)}, \qquad
r = f + p(n - q)^{2},$$

where "m," "n," "e," and "s" denote the maximum, minimum, mean value, and mean square sum of the input image, respectively, while "g," "f," and "a" denote the output image's maximum, minimum, and mean value. Figure 2 presents the balance contrast enhancement technique (BCET) output.
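The following sketch implements the BCET mapping above with NumPy, assuming 8-bit X-ray input; the output range and target mean (0, 255, 110) are assumed tuning values and are not parameters reported in this paper.

```python
# A minimal BCET sketch. Output min/max/mean (0, 255, 110) are assumptions.
import numpy as np

def bcet(image, out_min=0.0, out_max=255.0, out_mean=110.0):
    """Balance contrast enhancement via the parabolic mapping y = p(x - q)^2 + r."""
    x = image.astype(np.float64)
    n, m, e = x.min(), x.max(), x.mean()     # input min, max, mean
    s = np.mean(x ** 2)                      # input mean square sum
    f, g, a = out_min, out_max, out_mean     # output min, max, mean

    # Parabola coefficients derived from the input/output statistics.
    q = (m**2 * (a - f) - s * (g - f) + n**2 * (g - a)) / \
        (2.0 * (m * (a - f) - e * (g - f) + n * (g - a)))
    p = (g - f) / ((m - n) * (m + n - 2.0 * q))
    r = f + p * (n - q) ** 2

    y = p * (x - q) ** 2 + r
    return np.clip(y, out_min, out_max).astype(np.uint8)
```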

3.3. Fusion Features

Good features should exhibit qualities such as agility, integrity, abstractness, invariance, and uniqueness in order to obtain precise results [22]. It is therefore vital to employ an effective feature extraction method for the subsequent classification in COVID-19 detection. Two common families of methods are used for feature extraction: shape-based and textural features. In the proposed method, a fusion of grey level co-occurrence matrix (GLCM) [23] and shape-based features [24] is employed. GLCM exploits the composition of grey levels and their relative locations, and its statistical qualities allow objects to be identified quickly, whereas shape-based features compactly describe an object's geometric properties. Relying on a single method may therefore restrict the descriptive ability for classification; the fusion of both methods yields a distinctive descriptor for an optimal representation and subsequent classification.

For the grey level co-occurrence matrix (GLCM), twenty features are extracted from the co-occurrence matrix (combinations of grey-level occurrences) of the X-ray image, as presented in Table 2.

The shape-based features are used to identify whether the object's shape is triangular or circular and find the perimeter of the object's boundary. Seven features are utilized: compactness, elongation, solidity, rectangularity, roundness, eccentricity, and convexity, as presented in Table 3. More details regarding GLCM and shape-based features and their formula can be found in [20].

The fusion of features yields 27 features, obtained by concatenating the GLCM and shape-based features.
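A sketch of this feature-fusion step is shown below, assuming scikit-image for the GLCM and region properties; only a representative subset of the 20 GLCM and 7 shape descriptors is computed, and the threshold-based mask is a placeholder for a proper lung segmentation.

```python
# Feature-fusion sketch: a subset of GLCM and shape descriptors, concatenated.
# The thresholded mask below is a placeholder, not a real lung segmentation.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.measure import label, regionprops

def glcm_features(image):
    # image assumed uint8 (values 0-255), e.g. the BCET output
    glcm = graycomatrix(image, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

def shape_features(image):
    mask = image > image.mean()                     # placeholder segmentation
    region = max(regionprops(label(mask)), key=lambda r: r.area)
    roundness = 4 * np.pi * region.area / (region.perimeter ** 2 + 1e-9)
    elongation = region.minor_axis_length / (region.major_axis_length + 1e-9)
    return np.array([roundness, elongation, region.solidity,
                     region.extent, region.eccentricity])

def fused_feature_vector(image):
    # Concatenation of textural (GLCM) and shape-based descriptors.
    return np.concatenate([glcm_features(image), shape_features(image)])
```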

3.4. Gain Ratio

The gain ratio [25] ranks features and removes irrelevant ones using a single-attribute evaluation technique. It improves on the information gain method by eliminating the bias towards features with many values. The gain ratio is a filter technique that performs a non-iterative calculation on the dataset to determine the relevance of a feature F through

$$\mathrm{GainRatio}(F) = \frac{\mathrm{InfoGain}(F)}{\mathrm{SplitInfo}(F)},$$

where $\mathrm{InfoGain}(F) = H(C) - H(C \mid F)$ with entropy $H(C) = -\sum_{i} p_i \log_2 p_i$, in which $p_i$ is the likelihood of class $C_i$, and $\mathrm{SplitInfo}(F) = -\sum_{j} \frac{|D_j|}{|D|}\log_2 \frac{|D_j|}{|D|}$ is computed over the partitions $D_j$ of the dataset $D$ induced by feature F.
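A minimal gain-ratio sketch for a single discretized feature is given below, assuming NumPy arrays of feature values and class labels; continuous features would be binned before this step.

```python
# Gain ratio for one (discretized) feature; a sketch of the formula above.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                       # class likelihoods p_i
    return -np.sum(p * np.log2(p))

def gain_ratio(feature_values, labels):
    h_class = entropy(labels)                       # H(C)
    values, counts = np.unique(feature_values, return_counts=True)
    weights = counts / counts.sum()
    # Conditional entropy H(C | F) over the feature's partitions D_j.
    h_cond = sum(w * entropy(labels[feature_values == v])
                 for v, w in zip(values, weights))
    info_gain = h_class - h_cond
    split_info = -np.sum(weights * np.log2(weights))  # SplitInfo(F)
    return info_gain / split_info if split_info > 0 else 0.0
```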

3.5. Bootstrap Aggregated Extreme Learning Machine (BA-ELM)

Extreme learning machine (ELM) is a single-hidden-layer feed-forward neural network usually employed for regression, compression, classification, and pattern recognition problems [26]. ELM is trained differently from traditional network training techniques because it does not need gradient-based backpropagation; it removes the burden of iteratively updating weights and biases. It aims at achieving the minimum training error together with the smallest norm of output weights, which enhances the model's overall efficacy. The output of the ELM model is

$$f(x_j) = \sum_{i=1}^{L} \beta_i\, g\!\left(w_i \cdot x_j + b_i\right), \quad j = 1, \ldots, N,$$

where $L$, $g(\cdot)$, $w_i$, $x_j$, and $b_i$ represent the number of hidden neurons, the activation function, the input-layer weight vector, the input sample vector, and the bias value, respectively; $\beta_i$ is the output-layer weight corresponding to the $i$-th hidden neuron; and $N$ is the number of training samples.
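The ELM training procedure can be sketched with NumPy only: random input weights and biases, a hidden layer, and output weights obtained with the Moore-Penrose pseudo-inverse. The class name and the sigmoid activation are illustrative assumptions, not choices stated in the paper.

```python
# Minimal ELM sketch: random hidden layer, pseudo-inverse output weights.
import numpy as np

class ELM:
    def __init__(self, n_hidden=100, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng or np.random.default_rng()

    def _hidden(self, X):
        # g(w_i . x_j + b_i) with a sigmoid activation (assumed choice)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # input weights w_i
        self.b = self.rng.normal(size=self.n_hidden)                # biases b_i
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y                           # output weights beta_i
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```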

Moreover, to increase the reliability and accuracy of ELM, BA-ELM is proposed, where multiple ELM models are trained through bootstrap resampling [27, 28].

The output of a BA-ELM is calculated as

$$\hat{f}(X) = \sum_{i=1}^{n} w_i f_i(X),$$

where $\hat{f}(X)$, $X$, $f_i(X)$, $w_i$, and $n$ represent the aggregated neural network predictor, the vector of neural network inputs, the $i$-th individual neural network (ELM), the aggregating weight used for merging the networks, and the number of neural networks that are merged, respectively.
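Building on the ELM sketch above, a BA-ELM can be sketched as follows: each ELM is trained on a bootstrap resample and the outputs are combined with aggregating weights. The number of models, the uniform weights, and the 0.5 decision threshold are assumptions for illustration.

```python
# BA-ELM sketch: bootstrap resampling plus weighted aggregation of ELM outputs.
import numpy as np

class BAELM:
    def __init__(self, n_models=3, n_hidden=100, seed=0):
        self.n_models, self.n_hidden, self.seed = n_models, n_hidden, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.models = []
        self.weights = np.full(self.n_models, 1.0 / self.n_models)  # uniform w_i (assumed)
        for _ in range(self.n_models):
            idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
            self.models.append(ELM(self.n_hidden, rng).fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        agg = sum(w * m.predict(X) for w, m in zip(self.weights, self.models))
        return (agg >= 0.5).astype(int)                  # 1 = COVID-19, 0 = non-COVID-19
```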

4. Evaluation Criteria

This model's primary purpose is to classify samples as COVID-19 or non-COVID-19 based on the four outcomes presented in Table 4.

Based on Table 4, the evaluation metrics illustrated in Table 5 are employed. Moreover, we also compute a confusion matrix to show the numbers of correctly and incorrectly classified samples.
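For reference, the standard definitions of these metrics, computed from the four outcomes in Table 4 (TP, TN, FP, FN), can be sketched as follows; the formulas are the usual ones and are assumed to match those listed in Table 5.

```python
# Standard evaluation metrics from the confusion-matrix counts (sketch;
# assumes a non-degenerate confusion matrix so no denominator is zero).
import math

def metrics(tp, tn, fp, fn):
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)                     # recall / true-positive rate
    specificity = tn / (tn + fp)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    f_score     = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(precision=precision, sensitivity=sensitivity, specificity=specificity,
                accuracy=accuracy, f_score=f_score, mcc=mcc)
```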

5. Research Methods and Discussion

The experiments are performed using the proposed CAD system for COVID-19 detection. First, the input data are provided as X-ray images; then, preprocessing is performed, and the fusion of features is formed. Next, appropriate features are selected, and finally, classification is performed using multiple extreme learning machine (ELM) models, whose outputs are combined through bootstrap aggregation to achieve the best results. All these steps are illustrated in Figure 3.

5.1. Performance Analysis of Different ELM Models

The model is trained with different numbers of hidden neurons through resampling, with a minimum of 50 and a maximum of 150 hidden neurons; the models are resampled to obtain the best classification results. Table 6 summarizes the results in terms of precision, sensitivity, specificity, accuracy, F-score, and MCC. The model tuned with 50 hidden neurons achieved a precision, sensitivity, specificity, accuracy, F-score, and MCC of 95.7%, 97.8%, 64.7%, 94.1%, 96.7%, and 68%, respectively. The model trained with 100 hidden neurons accomplished 96.1%, 97.9%, 67%, 94.6%, 97%, and 70%, and the model with 150 hidden neurons achieved 96.1%, 98.4%, 69%, 95.1%, 97.2%, and 73% for the same metrics. Finally, the models are aggregated to achieve the best results, as presented in Table 6, which also shows that the sensitivity is much higher than the specificity.

The correctly and incorrectly classified X-ray images obtained through aggregation are shown in Figure 4 in the form of a confusion matrix.

We compare the proposed model with various techniques available in the literature in terms of F1-score and accuracy in Table 7, which shows that the proposed cloud-based model achieves better results through the BA-ELM classifier.

Extreme learning machine (ELM) is an advanced neural network technique for pattern recognition, and neural networks mimic the working of the human brain. BA-ELM is formed by bootstrap aggregating multiple ELM models that learn from the training data and combine their outputs to achieve better results. The proposed model reported good performance in classifying COVID-19 and non-COVID-19 images, and it can identify COVID-19 efficiently with minimal intervention from healthcare workers and even with limited knowledge about COVID-19.

6. Conclusion

This study presented a COVID-19 diagnosis approach based on X-ray images and machine learning methods. The proposed cloud-based approach comprises four main steps: image enhancement, feature extraction and fusion, feature selection, and classification. The proposed model is trained on a publicly available dataset and achieved 95.7% accuracy with the BA-ELM model. The proposed model might be used by healthcare professionals and researchers for COVID-19 detection to reduce healthcare workers' workload. The results obtained are encouraging; nevertheless, they could be improved further by employing segmentation and other learning classifiers. As future work, we will acquire more diverse and extensive datasets to assess the approach in practice.

Data Availability

The COVID-Xray-5k dataset used to support the findings of this study is included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was technically supported by Artificial Intelligence and Data Analytics Research Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia. The authors are thankful for the support.