Abstract

Blood is a vital body fluid and can be instrumental in identifying various pathological conditions. Many people are currently suffering from COVID-19, and every country has limited testing capacity. Consequently, a system is required to help doctors analyze a patient's blood structure for diseases including COVID-19. Therefore, in this paper, we extracted and selected blood features by proposing a new feature extraction and selection method named stepwise linear discriminant analysis (SWLDA). SWLDA focuses on selecting localized features from blood structure images and discriminating their classes based on regression values such as the partial F value. SWLDA begins with an equation containing the single best X variable and then attempts to add further X variables one at a time, provided the entry conditions are satisfied. The variable to be entered is determined by the F value: the highest partial F value is compared with the selected or default F-to-enter value. After this step, forward addition or backward removal begins, and the partial test values of all predictor variables already in the equation are computed. The lowest partial test value (FL) is then compared with a preselected or default significance level F0: if F0 > FL, the variable ZL is removed and the F test is started again; otherwise, the regression equation is adopted. Finally, the system is trained with a support vector machine (SVM) to label the blood images. The performance of the proposed approach is assessed on 8 different classes of blood structure images. The proposed method achieved significant results on different blood structure images, including COVID-19.

1. Introduction

Blood consists of plasma and blood cells and circulates through the heart and blood vessels, which together form the cardiovascular system of the body. Microscopic analysis of peripheral blood, as performed in hematology, is an expensive and inefficient procedure [1–3]. Blood is a vital body fluid and can be instrumental in identifying various pathological conditions. The whole world is currently suffering from the COVID-19 pandemic, and every country has limited testing capacity. A machine learning algorithm can help to classify the blood cells in various blood structure images when a large training set is available [4].

A convolutional neural network (CNN)-based model was proposed in [5] that automatically identifies the types of blood cells in order to improve clinical efficiency and save time. Similarly, an intelligent machine learning method was developed for the prognosis and prediction of cancer [6]; the authors analyzed the performance of different machine learning models for cancer prediction.

Furthermore, an ensemble method based on various morphological filters was proposed by [7] to classify different shapes of red blood cells. Additionally, the authors of [8] developed a system for the identification of white blood cells. Their system was based on six different types of machine learning methods, and they claimed a 95% recognition rate. Similarly, an intelligent system was proposed by [9] for the classification of red blood cells in series of images. In [10], the authors built a data-driven system to track and predict possible blood donors. They combined different binary classification methods to estimate the likelihood that blood donors will donate, without relying on their previous donation behavior. They tried to improve the stock-demand interval by proposing a predictive model that helps to identify potential donors. Likewise, an artificial neural network (ANN)-based system was developed by [11] to estimate and predict arterial blood gas in various trauma patients. Their system employed a feedforward backpropagation neural network (FBPNN), which can only estimate the quantity of these gases from the preliminary information of the patients.

In summary, a machine learning-based algorithm could be a highly informative tool for providing evidence about patients who have been diagnosed or are suspected to be infected. Remote intensive care of such patients, through medical-grade sensors and data gathered via video cameras, might improve medical decisions for providers. In addition, the process helps providers learn more about the illness so that they can treat it better.

Accordingly, in this work, we have extracted and selected the blood features by proposing the usage of a new feature extraction and selection algorithm named stepwise linear discriminant analysis (SWLDA), which focuses on selecting localized features from blood structure images and discriminating their classes based on regression values such as the partial F value. SWLDA begins with an equation containing the single best X variable and then attempts to add further X variables one at a time, provided the entry conditions are satisfied. The variable to be entered is determined by the F value: the highest partial F value is compared with the selected or default F-to-enter value. After this step, forward addition or backward removal begins, and the partial test values of all predictor variables already in the equation are computed. The lowest partial test value (FL) is then compared with a preselected or default significance level F0: if F0 > FL, the variable ZL is removed and the F test is started again; otherwise, the regression equation is adopted. Finally, the system is trained with a support vector machine (SVM) to label the blood images. The performance of the proposed approach is evaluated on 8 different classes of blood structure images, namely, blood cancer (BC), COVID-19, dengue (DG), human immunodeficiency virus (HIV), malaria (ML), thalassemia (TL), tuberculosis (TB), and typhoid (TY). We collected a small dataset to show the performance of the proposed approach for each class; in this dataset, each class contains 20 images. The proposed method achieved significant results on different blood structure images, including COVID-19, compared with existing state-of-the-art methods.

The remainder of the paper is arranged as follows: Section 2 summarizes the most recent works in various radiology domains. Section 3 presents the proposed methodology. The datasets utilized in this work and the experimental setup are described in Section 4. The results and discussion are presented in Section 5. Finally, the paper is concluded with some future directions in Section 6.

2. Related Works

Many machine learning methods have been proposed to classify various types of blood structures. A fused convolutional neural network (CNN) model for white blood cell detection, with MRMR feature selection and an extreme learning machine, was proposed by [12]. However, CNNs are computationally much more expensive than other models. Similarly, an intelligent feature extraction method was proposed by [13] to detect leukemia in white blood cells; the authors classified various types of white blood cells by employing a Gaussian feature convolutional method. However, their method was also computationally costly.

Traditionally, diagnosis relies on pathologists manually examining samples under a microscope, which is a time-consuming process. This is a serious issue in rural areas, because the procedure depends on experienced pathologists. As a result, detecting malaria through computer image analysis trained with learning mechanisms has gained importance in the last couple of years [14]. Deep learning is useful for large-scale image classification; with a small training dataset, however, it may not be the best choice for this task. The combined use of in-depth and confident visual features is based on algorithm-learned features. These two descriptors are manually approached together to receive their fine-tuned deep convolutional neural networks and pretrained triples [13, 15].

Likewise, an automated system was developed by [16] which detects and identifies red blood cells in various blood images. Initially, the authors employed a global threshold-based method to extract regions from the background in the green channel of color images. After that, morphological filters coupled with a labeling algorithm were utilized for noise removal and hole filling. Once the images were enhanced, they extracted the prominent features of the red blood cells through geometrical properties. However, most morphological filters spread noise over the entire image, which makes the image less clear and reduces the amount of information in the image instead of improving its quality [17]. The authors of [18–20] utilized a pretrained deep convolutional neural network called AlexNet to extract informative features from blood microscopic images, which were then sent to a classifier for recognition. However, a deep convolutional neural network requires a huge amount of resources, such as computational time and power, in order to train large models from scratch.

An intelligent approach was proposed by [21] for detecting and counting anaemia-affected red blood cells in microscopic RGB images. The approach was based on the Hough transform and morphological filters. However, the Hough transform gives misleading results when objects happen to be aligned by chance [22], which is one of its major limitations. Also, most morphological filters spread noise over the entire image, which makes the image less clear and reduces the amount of information in the image instead of improving its quality [17].

In contrast, dimensionality reduction through the extraction of discriminative features is based on the idea of maximizing the overall scatter of the data while minimizing the variance within the classes. It has been confirmed that the feature values of the blood classes overlap heavily, which might lead to a high misclassification rate. This is due to the resemblance among the blood features, which results in high within-class variance and low between-class variance. Hence, an approach is required that both separates the classes despite the within-class variance and reduces the dimensions of the feature space. Intelligent approaches such as kernel discriminant analysis (KDA) [23], generalized discriminant analysis (GDA) [24], and linear discriminant analysis (LDA) [25] were proposed to overcome the limitations of the existing works. Among them, LDA has the best performance. However, LDA is not flexible when a huge amount of data is used. See [26] for more details about LDA.

In summary, it is observed that LDA does not necessarily improve the performance of an automatic classification system [27]. Moreover, a common limitation of the previous works is that their performance was assessed on the classification of only two or three types of blood structures. Accordingly, this research proposes the use of a better technique, i.e., SWLDA [28], which does not suffer from any of the abovementioned concerns. To the best of our knowledge, SWLDA is employed here for the first time as a feature extraction and selection technique in blood structure classification systems. Furthermore, we classified eight different types of diseases using various blood structure images, which is one of the major contributions of the current work.

3. Materials and Methods

3.1. Stepwise Linear Discriminant Analysis

Fisher's linear discriminant, one of the robust linear classification methods, is used to determine the optimal separation between two classes [23]. It assumes Gaussian class distributions with equal covariance. For binary tasks, Fisher's linear discriminant is equivalent to the least-squares regression solution, which is expressed in terms of M, the matrix of observed feature vectors, and Y, the labels of the vector classes. Fisher's linear discriminant delivers the best classification result on linear data; however, in the presence of sequential data, it does not provide a satisfactory solution. Therefore, we employed a nonlinear feature extraction/selection technique named SWLDA [28]. The SWLDA model consists of multiple steps. It is an iterative process that starts without any predictors in the model. Then, in each iteration, the inclusion or exclusion of a predictor is decided based on tests of the slope parameters, i.e., partial F tests or, equivalently, t-tests. When no further predictors are included or excluded, the procedure stops, yielding the 'final model.' A brief description of the different phases of SWLDA is as follows.
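For reference, the least-squares solution that coincides (up to scaling) with Fisher's linear discriminant in the binary case is commonly written as below. This is the standard textbook form, given here as an assumption rather than as the exact expression used by the author; M and Y are the symbols defined above, while the weight vector \hat{w} is a symbol introduced only for illustration.

```latex
\hat{w} = \left( M^{\mathsf{T}} M \right)^{-1} M^{\mathsf{T}} Y
```

Under this reading, each row of M is one observed feature vector, Y holds the corresponding class labels, and a new sample is assigned to a class according to the sign of its projection onto \hat{w}.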

3.1.1. Initial Process

First, a threshold (significance level) must be set to decide whether a predictor is included in the iterative model; this is known as Alpha-to-Enter and is denoted by ale. Similarly, a second threshold is required to decide whether a predictor is excluded from the model; this is known as Alpha-to-Remove and is denoted by alr. A simple way to choose these values is to try several candidates and select those that provide the best results. Assuming three predictors (interpreters) I1, I2, and I3 and a response 'y', the following three steps are then performed.

(1) First Step. After specifying the threshold (significance) levels:
(i) Fit each one-predictor model, i.e., regress y on each of I1, I2, ..., Ip−1 separately, where p − 1 is the total number of predictors (three in this case).
(ii) Of those predictors whose t-test p-value is less than ale, the first predictor added to the stepwise model is the one with the lowest t-test p-value.
(iii) If no predictor has a t-test p-value less than ale, the procedure stops.

(2) Second Step.
(i) Suppose that I1 has the lowest t-test p-value, below ale, and is therefore the best predictor from the first step.
(ii) Fit each of the two-predictor models that include I1, i.e., regress y on (I1, I2) and on (I1, I3).
(iii) Of those predictors whose t-test p-value is less than ale, the second predictor added to the model is the one with the smallest t-test p-value.
(iv) If no predictor has a t-test p-value less than ale, the procedure stops adding predictors, and the one-predictor model obtained in the first step is taken as final.
(v) Suppose instead that I2 is the best second predictor and is therefore entered into the model.
(vi) Since I1 was the first predictor in the model, step back and check whether adding I2 has affected the significance of I1. Specifically, examine the t-test p-value for testing B1 = 0. If this p-value is no longer significant, that is, it is greater than alr, remove I1 from the model.

(3) Third Step.
(i) Repeat the above step iteratively until adding a further predictor no longer yields a t-test p-value below ale.

More information on the stepwise model can be found in [28]. The overall flowchart of the stepwise model is shown in Figure 1. Based on various experiments, we chose ale = 0.35, as this value provided the best results. Likewise, we set alr = 0.40. These are known as the selected (default) values, and the value is a vector for testing whether elements of L or 0.
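The stepwise procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the Matlab implementation used in this work: it uses partial F thresholds (the hypothetical parameters `f_enter` and `f_remove`) instead of the p-value thresholds ale and alr, it solves the least-squares problems with a small hand-written solver, and all function names are invented for the example.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def sse(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the columns in `cols`."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(A, b)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(X, y, cols, j):
    """Partial F value of column j given the other columns in `cols` (j must be in cols)."""
    full = sse(X, y, cols)
    reduced = sse(X, y, [c for c in cols if c != j])
    dof = len(y) - len(cols) - 1  # residual degrees of freedom of the full model
    return (reduced - full) / (full / dof)


def stepwise_select(X, y, f_enter=4.0, f_remove=3.9, max_iter=100):
    """Forward addition and backward removal driven by partial F values."""
    selected = []
    for _ in range(max_iter):
        changed = False
        # Forward step: enter the candidate with the highest partial F, if large enough.
        candidates = [j for j in range(len(X[0])) if j not in selected]
        if candidates:
            scores = {j: partial_f(X, y, selected + [j], j) for j in candidates}
            best = max(scores, key=scores.get)
            if scores[best] > f_enter:
                selected.append(best)
                changed = True
        # Backward step: remove the variable with the lowest partial F, if too small.
        # A single remaining variable is kept as it is.
        if len(selected) > 1:
            scores = {j: partial_f(X, y, selected, j) for j in selected}
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected
```

Here `X` is a list of feature rows and `y` a list of numeric responses; `stepwise_select(X, y)` returns the indices of the retained columns. The sketch assumes the fitted models are neither singular nor a perfect fit, since either case would cause a division by zero.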

3.2. Classification Using Support Vector Machine

Support vector machine (SVM) is one of the well-known methods used in image processing, pattern recognition, and machine intelligence [30]. SVM is used extensively for linear and binary classification. It relies on finding the optimal separating hyperplane between two or more classes, with the maximum margin between the samples of the separate classes. SVM employs a mapping function that projects the data from the original feature space to a higher-dimensional space, so that linear classification in the new space is equivalent to nonlinear classification in the original space.

SVM can separate two or more classes via hyperplanes. In this step, we employed an optimization algorithm to find the best separating hyperplane between the various class labels, as presented in Figure 2.

Commonly, SVM is described by a decision function in which M denotes the normal vector to the hyperplane that separates the two or more classes, φ is the mapping function applied to the input data, y is a data point, and T denotes the training data. This is associated with a further function Re(y), which is expressed in terms of the training patterns; these are the so-called support vectors, which carry all the information relevant to the classification problem.
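As a point of reference, the usual textbook form of the SVM decision function can be written with the symbols used above. This is a hedged sketch of the standard formulation, not necessarily the exact expressions of the original work: the bias b, the multipliers α_i, the class labels t_i, and the kernel K are symbols assumed here for illustration, and whether the second expression matches the function Re(y) of the text is not confirmed.

```latex
f(y) = M^{\mathsf{T}} \varphi(y) + b,
\qquad
M = \sum_{i \in T} \alpha_i \, t_i \, \varphi(y_i)
\quad\Longrightarrow\quad
f(y) = \sum_{i \in T} \alpha_i \, t_i \, K(y_i, y) + b,
\qquad
K(y_i, y) = \varphi(y_i)^{\mathsf{T}} \varphi(y)
```

Only the training points with α_i > 0 contribute to the sum; these are the support vectors, and the predicted class of y is given by the sign of f(y).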

4. Evaluation Process

4.1. Datasets Used

We utilized the following dataset in order to show the efficacy of the proposed approach.

4.1.1. Blood Structure Images’ Dataset

In this work, we utilized a blood structure dataset covering different diseases, namely, blood cancer, COVID-19, dengue, HIV, malaria, thalassemia, tuberculosis (TB), and typhoid. The dataset contains 160 images (20 images per class). It was collected from various open sources and was thoroughly checked by medical experts (physicians). The images were collected from both female and male patients aged between 30 and 65 years. Metadata for the patients is not provided. Each input frame was reduced to 90 × 90 pixels and transformed to a zero-mean vector of size 1 × 8100. The dataset was collected over a period of 2 months (from August to September of 2020). All the images are in JPG format for the purpose of keeping the original quality of the images. Moreover, all the images were passed through a preprocessing step in order to normalize them.
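The resizing and vectorization step described above can be illustrated as follows. This is a minimal sketch under assumptions that the text does not specify: the image is taken to be a grayscale 2D list of pixel values, nearest-neighbour sampling is used for the reduction to 90 × 90, and "zero-mean" is interpreted as subtracting the mean of each image from its own pixels. The function names are hypothetical.

```python
def resize_nearest(image, out_h=90, out_w=90):
    """Resize a 2D list of pixel values by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def to_zero_mean_vector(image, size=90):
    """Reduce the image to size x size and flatten it to a zero-mean vector."""
    small = resize_nearest(image, size, size)
    flat = [float(v) for row in small for v in row]  # 1 x (size * size) vector
    mean = sum(flat) / len(flat)
    return [v - mean for v in flat]
```

For a 90 × 90 target, the returned vector has 90 × 90 = 8100 entries, which matches the 1 × 8100 size stated above.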

4.2. Experimental Setup

(i) In this study, extensive experiments are carried out to demonstrate the significance of the proposed model. All the experiments are performed in Matlab on a personal computer with an Intel Pentium Core™ i7-6700 (3.7 GHz) processor and 16 GB of RAM.
(ii) In the first experiment, the significance of the proposed approach is analyzed. For this experiment, all the images of the eight diseases are employed to validate the proposed approach.
(iii) In the second experiment, the blood structure classification system is trained and tested without the proposed approach on the collected blood structure dataset. For this purpose, we utilized existing well-known feature extraction algorithms such as local binary pattern, local directional pattern, local directional pattern variance, Hough transform, wavelet transform, curvelet transform, scale invariant feature transform, template matching, deformable templates, and unitary image transforms.
(iv) In the third experiment, the developed model is compared with previous state-of-the-art feature extraction methods.

5. Results and Discussion

In this section, we perform a series of experiments on the blood structure image dataset described above. The overall results of the proposed approach are presented in the following sections.

5.1. First Experiment

In this experiment, the performance of the proposed approach is tested and validated using the blood structure image dataset. The experiments are performed under the setup described in Section 4. The overall results are shown in Table 1 and Figure 3, which indicate that the proposed approach achieved the best recognition rates on the blood structure image dataset. This is because the proposed approach is able to choose the most informative features by taking advantage of the forward selection model and to eliminate irrelevant features by taking advantage of the backward regression model.

5.2. Second Experiment

This section examines the significance of the proposed approach through extensive experiments. In the absence of the developed feature extraction approach, we employed well-known existing feature extraction methods such as local binary pattern, local directional pattern, local directional pattern variance, Hough transform, wavelet transform, curvelet transform, scale invariant feature transform, template matching, deformable templates, and unitary image transforms. The overall results are described in Tables 2–11.

As can be seen from Tables 2–11, the blood structure classification system does not achieve its best performance with the existing feature extraction methods (in the absence of the proposed method). This is because the proposed method begins with an equation containing the single best I variable and then attempts to add further I variables one at a time, provided the entry conditions are satisfied. The addition and selection depend on partial F-test values, which determine the variable to be entered: the highest partial F value is compared with the default F-to-enter value. After this procedure, forward addition or backward removal starts, in which the partial test values of all the interpreters already in the equation are computed. Then, the lowest partial test value, FL, is compared with a default significance level, F0: if F0 > FL, the variable ZL is removed and the F test is started again; otherwise, the regression equation is adopted.
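The partial F value referred to above is usually defined as below. This is the standard definition, stated here as an assumption about the quantity the method relies on; n (number of observations), k (number of variables currently in the equation), and the SSE terms (residual sums of squares with and without variable j) are symbols introduced for illustration, while F_L, F_0, and Z_L are those used in the text.

```latex
F_j = \frac{\mathrm{SSE}_{(-j)} - \mathrm{SSE}_{\text{full}}}
           {\mathrm{SSE}_{\text{full}} / (n - k - 1)},
\qquad
F_L = \min_j F_j,
\qquad
\text{remove } Z_L \text{ if } F_0 > F_L
```

In words, a variable's partial F value measures how much the residual error grows when that variable alone is dropped from the current equation, relative to the residual variance of the full equation.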

5.3. Third Experiment

In this experiment, we made a comprehensive comparison of recognition rates between the developed technique and state-of-the-art works. Some of the methods were implemented by us under the exact settings described in their respective articles, while for other methods, we borrowed their simulations. The comparison results are presented in Table 12.

As shown in Table 12, the designed system (with the proposed feature extraction method) outperformed the existing works. This is because the proposed feature extraction method focuses on choosing local features from blood structure images and discriminating their classes based on regression values such as the partial F test.

6. Conclusions

Technology is growing rapidly in every field of life, and the medical field is no exception. In the medical field, blood is a vital body fluid and can be instrumental in identifying various pathological conditions. Many people are currently suffering from COVID-19, and every country has limited testing capacity.

Artificial intelligence (AI)-based methods might be highly helpful in providing information about patients who have been diagnosed or are suspected of being infected. Remote intensive care of such patients, with medical-grade sensors and data gathered by video cameras, might improve medical decisions for providers, and the procedure may also help them learn more about the illness so that they can treat it well. Accordingly, in this work, we built a system that automatically identifies patients' diseases from their blood structure images. We developed the use of a nonlinear feature extraction technique named stepwise linear discriminant analysis, which extracts and selects local features from the blood structure images and discriminates them based on regression values. The extraction and selection are based on partial F test values, which determine the variable to be entered. Moreover, the proposed method utilizes the advantages of the forward and backward algorithms, which extract the relevant features and remove the irrelevant features, respectively, from the feature space. After this, a support vector machine is utilized to classify the blood samples. The proposed system might help doctors to check patients' conditions remotely, without physically inspecting them, so that doctors can easily provide some basic recommendations for improvement.

Most of the existing blood structure classification systems were implemented in a controlled environment (in labs), which is not realistic. Therefore, in the future, we will deploy the proposed system, along with the proposed feature extraction method, in healthcare settings in order to assist medical experts and physicians.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares there are no conflicts of interest.

Acknowledgments

The author extends his appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research Grant no. DSR2020–04–2613.