Leukemia is a fatal form of cancer that affects individuals of all ages, including children and adults, and is a significant cause of death worldwide. It is associated with white blood cells (WBC): it is accompanied by a rise in the number of immature lymphocytes and damages the bone marrow and/or blood. Therefore, rapid and reliable diagnosis is a critical requirement for successful therapy and higher survival rates. Currently, the disease is diagnosed by manually analyzing microscopic images of blood samples, which is slow, time-consuming, and error-prone. Furthermore, in microscopic analysis, leukemic cells look very similar to normal cells in appearance and shape, which makes detection more difficult. Over the past decades, deep learning with Convolutional Neural Networks (CNNs) has provided state-of-the-art approaches to image classification; however, there is still room to improve their efficacy, learning procedure, and performance. Therefore, in this research study, we propose a new variant of a deep learning algorithm to diagnose leukemia by analyzing microscopic images of blood samples. The proposed architecture emphasizes channel relationships at all levels of feature representation by incorporating squeeze-and-excitation learning, which adaptively recalibrates channel-wise feature responses by explicitly modeling channel interdependencies. In addition, the squeeze-and-excitation process enhances the discriminability of features between leukemic and normal cells: it exposes informative features of leukemic cells while suppressing less useful ones, thereby improving the representational power of the deep learning algorithm.
We show that stacking these squeeze and excitation operations in a deep learning model improves its performance in diagnosing leukemia from microscopic images of patients' blood samples. Furthermore, an extensive set of experiments is performed on both cropped-cell and full-size microscopic images, with data augmentation used to address the scarcity of data and further boost performance. The proposed model is tested on two publicly available datasets of blood samples from leukemia patients, namely, ALL_IDB1 and ALL_IDB2. The proposed deep learning model achieves good results and can be used for reliable computer-aided diagnosis of leukemia.

1. Introduction

Leukemia is a type of cancer with a very high mortality rate [1]. It involves the malignant clonal proliferation of abnormal white blood cells (WBC) and is hence referred to as a malignant hematological tumor [2]. Blood contains three main cell types: red blood cells, white blood cells, and platelets, as shown in Figure 1. Red blood cells are responsible for carrying oxygen to all tissues of the body [3], and they account for up to half of the total volume of blood. White blood cells play a pivotal role in the immune system and act as a line of defense against numerous infections and diseases [4]. As a result, correct categorization of white blood cells is critical for determining the nature of a disease. White blood cells are divided into categories according to the composition of their cytoplasm. Lymphocytes are one category of white blood cells, and their disorders cause Acute Lymphoblastic Leukemia (ALL) [5]. Generally, leukemia is categorized into two subtypes: acute leukemia and chronic leukemia. Without treatment, patients with acute leukemia typically survive barely three months, whereas chronic leukemia progresses more slowly. Acute lymphoblastic leukemia (ALL) is one of the most widespread types of acute leukemia and accounts for about 25% of all childhood cancers [6]. It originates in the lymphoid lineage of blood-forming cells: it first appears in the bone marrow and subsequently spreads throughout the body. In a healthy individual, WBCs are produced according to the body's needs, but in leukemia they are formed abnormally and become ineffective.

The dark-purple color of leukemic cells usually makes them easy to identify, but their assessment and further processing are complicated by variations in pattern and texture. Leukocytes vary dramatically from one another. They can be recognized by their shape or size, but one complicating factor is that they are surrounded by other blood components, including red blood cells and platelets. Normal lymphocytes have a fairly regular shape, and their nuclei have uniform, smooth borders. In patients with ALL, the lymphocytes, called lymphoblasts, have less regular borders, small cavities in the cytoplasm known as vacuoles, and spherical particles inside the nuclei known as nucleoli. The disease becomes more acute as these morphological features become more prominent. It may also result in premature death if treatment is neglected or if the diagnosis is made late in the disease's progression. Age is a vital risk factor influencing prognosis, because the probability of having ALL is greatest in children aged 7–8 years. This probability gradually decreases up to the age of 20 and starts to rise again around the age of 50. According to Ref. [7], there were 5930 new cases of ALL in the United States in 2018, and around 1500 individuals, including both children and adults, were expected to die from ALL. Furthermore, according to Ref. [8], around 876,000 people worldwide had ALL in 2015, and it caused 111,000 deaths. Treatment of acute lymphoblastic leukemia has advanced considerably over the past 50 years, and with early assessment and intervention the survival rate of patients has increased to up to 70% [9]. Hence, early diagnosis and effective treatment of acute lymphoblastic leukemia are essential.
One of the important tools used by medical practitioners to diagnose acute lymphoblastic leukemia is morphology. With this diagnostic tool, a patient is found to have acute lymphoblastic leukemia when the bone marrow contains a considerable number of cancer cells (B-lymphoblasts). The fundamental requirement for diagnosing acute lymphoblastic leukemia is to precisely distinguish cancer cells from normal cells (B-lymphoid precursors). However, in microscopic images, cancer cells look very similar to normal cells, which makes it hard to distinguish between them. Furthermore, it is crucial for the hematologist to diagnose the presence of leukemia along with its specific form, in order to prevent medical complications and determine the optimal treatment. Screening for leukemia by a specialist using human blood samples is a critical and time-consuming task.

To tackle such challenges, computer-aided diagnosis (CAD) systems designed with either machine learning or deep learning approaches perform quantitative analysis of blood samples. Numerous research studies have addressed leukemia detection. In traditional machine learning methods, a discriminative set of features of leukemic cells is first extracted, followed by classification [10]. Some researchers have proposed a segmentation step so that accurate features are extracted from the region of interest, i.e., segmented lymphocyte images [11]. These segmentation methods include k-means, watershed, and HSV color-based segmentation [11]. Segmentation removes the other elements present in the blood, so that only details related to WBCs, including lymphocytes and lymphoblasts, are retained [10]. For leukemia specifically, segmentation approaches are generally divided into pixel-based, region-based, and shape-based approaches [12]. Segmentation strategies based on k-means and edge detection are widely used to segment blast cells from various blood smears [13]. More recently, combining thresholding with morphological techniques has been reported to achieve superior segmentation [14]. However, complex images with variations such as low contrast and noise are challenging to segment accurately with these approaches [15]. Many feature extraction approaches are also employed in leukemic cell analysis. These include morphological features, such as shape and edge features, as well as textural, color, gray-level co-occurrence matrix (GLCM), geometrical, and statistical features [10]. Some research studies have combined these features to further enhance performance [16, 17].
Likewise, different classifiers have been used to distinguish leukemic from normal cells, including Support Vector Machines (SVM), K-nearest neighbor (KNN), Random Forest (RF), and Naive Bayes [10, 18, 19]. All of these traditional machine learning approaches achieve significant results; however, they require many parametric steps as well as careful analysis and feature engineering before the classification phase. A poor data representation frequently results in worse performance than a good one [20].
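To make the hand-crafted texture features mentioned above concrete, the following NumPy sketch computes a symmetric, normalized GLCM for a single pixel offset and three common descriptors derived from it (contrast, homogeneity, energy). It is an illustration under assumptions, not the pipeline of any cited work: the quantization to 8 gray levels, the horizontal offset of one pixel, and the helper names `quantize`, `glcm`, and `glcm_features` are hypothetical choices. In a traditional pipeline, feature vectors like this would then be passed to a classifier such as an SVM or KNN.

```python
import numpy as np

def quantize(gray, levels=8):
    """Map 8-bit gray values (0-255) to integer levels 0..levels-1."""
    return (gray.astype(np.int64) * levels) // 256

def glcm(q, levels, dy=0, dx=1):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset (dy, dx)."""
    h, w = q.shape
    m = np.zeros((levels, levels), dtype=float)
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            i, j = q[y, x], q[y + dy, x + dx]
            m[i, j] += 1
            m[j, i] += 1  # count the pair in both directions -> symmetric matrix
    return m / m.sum()

def glcm_features(p):
    """Three common texture descriptors computed from a normalized GLCM p."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
        "energy": float(np.sqrt(np.sum(p ** 2))),
    }

# Toy 8-bit patch with alternating dark and bright columns, shape (4, 4).
patch = np.tile(np.array([[0, 255]], dtype=np.uint8), (4, 2))
features = glcm_features(glcm(quantize(patch), levels=8))
print(features)
```

Because every horizontal neighbor pair in the stripe pattern jumps between the lowest and highest gray level, contrast is maximal and homogeneity is low; a uniform patch would give zero contrast and a homogeneity of 1.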

Subsequently, with the emergence of deep learning, many problems and challenges in image analysis have been solved, as these approaches learn features automatically. In the recent past, automated diagnosis of diseases using computer vision has emerged as a promising research area [21, 22]. Image recognition and segmentation using deep learning are essential elements of computer vision technology [23]. Convolutional Neural Networks (CNNs) are among the most frequently used deep neural networks in computer vision [24–27]. CNNs have strong self-learning capability, adaptability, and generalization power, and they are heavily used in medical imaging problems and IoT-based systems [28, 29]. Conventional image recognition techniques require hand-crafted feature extraction followed by classification, whereas CNN-based methods only require the image data as input to the network, and classification is achieved through their self-learning property [30]. However, CNNs also require a substantial amount of data and computing power to train. In many circumstances, the number of data samples is inadequate to train a CNN from scratch. In such situations, transfer learning is employed to exploit the potential of CNNs while minimizing the computing cost.

In particular, many research studies on the diagnosis of leukemia have been based on deep learning frameworks. Some of them have proposed CNN architectures with different depths and layer settings to perform leukemia detection [31, 32]. Transfer learning has been observed to be the most widely used deep learning approach for leukemia detection [33], and several pretrained models, including AlexNet, MobileNet, ResNet, and VGG16, have been exploited [34, 35]. In addition, deep learning methods have been shown to outperform traditional machine learning methods in leukemia detection [36]. However, these techniques still have shortcomings in terms of feature learning, accuracy, and effectiveness that need to be addressed. Designing new CNN architectures is a difficult engineering endeavor that often requires choosing many new hyperparameters and layer settings. Furthermore, existing studies do not adequately consider the discriminability of features between leukemic and normal cells; hence, could performance be boosted by improving the feature representation of a deep learning algorithm so that it has more discriminating power? Secondly, most approaches are based on transfer learning and report very accurate results. Is there, however, a way other than transfer learning to increase the performance of a simple deep learning algorithm? This research study attempts to answer these questions by proposing a deep learning algorithm whose representational power is improved by incorporating squeeze-and-excitation learning. The main aim of this article is to provide a deep learning solution that supports timely and accurate diagnosis by strengthening the feature discriminability between leukemic and normal cells.
Furthermore, improving deep learning algorithms is an ongoing research challenge for many researchers. Convolutional neural networks (CNNs) and traditional deep learning models are excellent algorithms for solving a wide range of visual problems. However, recent research [37] indicates that the representational power of traditional CNN architectures can be improved by adding modules that capture dynamic, nonlinear relationships among channels using global information. These modules also aid the learning of the model and considerably improve its accuracy. Hence, a secondary objective of this study is to propose and deploy better deep learning solutions. In addition, the proposed technique does not require prior segmentation or user-adjusted parametric steps to perform leukemia detection; rather, it provides a fully automated solution to leukemia detection.

More specifically, in this research article, we propose an effective deep learning model for leukemia detection from microscopic images of blood samples. The feature representation at every layer of feature extraction is improved by emphasizing the interdependencies among channels [37]. This is accomplished by the squeeze-and-excitation learning process, in which we first squeeze the features computed by the convolutional layers from microscopic blood samples to generate a channel descriptor. This descriptor aggregates the global distribution of channel-wise feature responses, so that information from the global receptive field of the network can be used by its lower layers. After the squeeze, the excitation process further enhances the features: a sample-specific activation is learned for each channel by a self-gating mechanism based on channel dependence, which regulates the excitation of each channel. Together, these two operations strengthen the feature representation of blood samples, which ultimately results in an accurate diagnosis of leukemia. In addition, these operations help improve the feature discriminability between leukemic and normal cells. Furthermore, the number of blood samples in both datasets is insufficient for training the model; therefore, extensive augmentation is also performed to boost performance. We also report the results of leukemia diagnosis by the proposed model on both cropped and full-size microscopic images. Sample images from ALL_IDB1 and ALL_IDB2 are shown in Figure 2.
The research makes the following contributions:
(i) An efficient deep learning model with improved representational power is proposed to diagnose leukemia from microscopic blood samples.
(ii) The feature discriminability between leukemic and normal cells in blood samples is enhanced by global information embedding in the squeeze operation and adaptive recalibration in the excitation operation.
(iii) During feature extraction, the proposed model emphasizes relevant features of leukemic cells while suppressing irrelevant ones, resulting in improved performance.
(iv) The proposed model shows significant improvements over the traditional deep learning model and can be integrated with Internet of Medical Things (IoMT)-based systems.

The rest of the paper is organized as follows: Section 2 reviews existing work, Section 3 describes the proposed method in detail, Section 4 presents the experimental results, and Section 5 concludes the paper and outlines future directions.

2. Related Work

Over the decades, several strategies for automated leukemia identification in microscopic images have been proposed in the literature. These include traditional machine learning classifiers and deep learning algorithms. In addition, some approaches have employed ensemble machine learning and hybrid deep learning methods for leukemia detection.

2.1. Conventional Machine Learning Approaches

In the existing literature, machine learning methods are extensively employed for leukemia detection. These methods generally consist of several steps, such as preprocessing and feature extraction, followed by classification; some also include segmentation and feature selection to further improve performance. For instance, Singhal et al. employed a Support Vector Machine (SVM)-based approach for the automated diagnosis of Acute Lymphoblastic Leukemia (ALL) [13], extracting geometric features as well as texture features based on Local Binary Patterns (LBP). Their experiments show that the texture features outperform the geometric features, with an accuracy of 89.72%, slightly higher than the 88.79% obtained with geometric features. Similarly, Mohamed et al. proposed a method in which each microscopic image is converted to the YCbCr color space, after which Gaussian distributions of the Cb and Cr values are obtained [38]. Different sets of features, including morphological, texture, and size features, are then computed to train the classifier. Using Random Forest as the classifier, their strategy attained 94.3% accuracy in detecting two classes of leukemia (ALL and AML) and myeloma. Mohapatra et al. proposed a framework for screening acute leukemia in stained blood smears and microscopic images of bone marrow [39]. After features are extracted from the microscopic images, an ensemble model is trained for classification. Compared with other traditional classifiers, such as naive Bayes (NB), K-nearest neighbor, radial basis function network (RBFN), multilayer perceptron (MLP), and SVM, their ensemble attained 94.73% accuracy, with average sensitivity and specificity above 90%.
Subsequently, Patel and Mishra designed a framework based on unsupervised learning, in which leukemia is identified using k-means clustering [40]; leukemia detection is then estimated by computing a proportion from the clustering result. Bhattacharjee et al. proposed an approach for identifying acute lymphoblastic leukemia that uses the watershed transform, preceded by morphological transformations, for segmentation. After morphological features are extracted, a Gaussian Mixture Model (GMM) and a Binary Search Tree (BST) are used for classification, and the approach achieves 95.56% accuracy. Mishra et al. proposed a model based on Linear Discriminant Analysis (LDA) for classifying leukemia, using the Discrete Orthogonal Stockwell Transform (DOST) [41] to extract features from blood sample images [42].
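The k-means segmentation idea referred to in this section can be sketched in a few lines of NumPy: cluster the RGB values of all pixels, then treat the darkest cluster as the candidate stained-cell region, relying on the dark-purple staining of leukemic cells noted in the introduction. This is a hedged illustration, not the method of [40] or any other cited work: the choice of k = 2, the "darkest cluster" rule, and the helper names `kmeans` and `darkest_cluster_mask` are assumptions, and the returned pixel fraction is only an illustrative stand-in for the proportion mentioned above, whose exact definition is not given in the text.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on an (N, D) array of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels, centers

def darkest_cluster_mask(rgb, k=2, seed=0):
    """Cluster RGB pixels; return a mask of the darkest cluster and its pixel fraction."""
    h, w, _ = rgb.shape
    pixels = rgb.reshape(-1, 3).astype(float)
    labels, centers = kmeans(pixels, k, seed=seed)
    dark = centers.sum(axis=1).argmin()  # cluster with the lowest summed intensity
    mask = (labels == dark).reshape(h, w)
    return mask, mask.mean()

# Toy "smear": left half dark purple (stained-cell-like), right half pale background.
img = np.zeros((6, 10, 3), dtype=np.uint8)
img[:, :5] = (60, 20, 90)
img[:, 5:] = (230, 220, 225)
mask, fraction = darkest_cluster_mask(img)
print(fraction)
```

On real smears, k is usually larger than 2 (nuclei, cytoplasm, red cells, background), and the selected cluster is typically cleaned up with morphological operations before features are extracted.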

2.2. Deep Learning Approaches

In the context of deep learning, many researchers have adopted or designed architectures for the automated classification of leukemia. These methods can be further divided into standalone deep learning models and transfer learning-based approaches. For instance, Shaheen et al. proposed an AlexNet-based model to diagnose Acute Myeloid Leukemia (AML) from microscopic images of blood samples [34]. They compared their approach with the LeNet-5 model in terms of accuracy, quadratic loss, recall, and precision. Their proposed method shows 98.58% accuracy along with 88.9% of the microscopic images being accurately classified with 87.4% accuracy. Rehman et al. proposed a CNN architecture comprising several convolutional and max-pooling layers for leukemia detection [31]. Before being fed to the CNN, all microscopic samples are preprocessed by converting them to the HSV color space, followed by segmentation to obtain the required region of interest. An accuracy of 97.98% is reported for leukemia detection. Zakir Ullah et al. proposed an attention-based deep learning model to extract the most relevant features of leukemic cells [43]. However, the model is based on VGG16, a pretrained deep learning model. Their method uses segmented leukemic (malignant) and normal cell images and is validated with 7-fold cross-validation. Pansombut et al. proposed a CNN model called ConVNet to detect ALL and its subtypes [44]. They compared their framework with traditional machine learning techniques, including SVM, multi-layer perceptron (MLP), and Random Forest (RF), using two datasets containing 363 images in total.
Shafique and Tehsin designed a deep learning model to categorize leukemia into six different classes [2]. They employed a pretrained AlexNet to perform binary classification on 368 images, avoiding training from scratch. Habibzadeh et al. designed a WBC classification algorithm that uses both transfer learning and deep learning [45]. In the first stage, they preprocess the dataset, followed by feature extraction; in the last stage, classification is carried out with Inception and ResNet models. A total of 352 images are used to validate the model's accuracy. Ahmed et al. also designed an efficient approach for the categorization of white blood cell leukemia [46]. In their work, deep features are extracted using VGGNet and reduced by swarm optimization. This bio-inspired optimization technique plays a pivotal role in optimizing the deep features for accurate and reliable classification, and the work reports encouraging results. One of the latest studies in leukemia detection is by Bibi et al. [47], who proposed an Internet of Medical Things (IoMT)-based framework [48] supported by cloud computing and diagnostic devices connected through Internet resources. The system enables real-time synchronization of leukemia screening and treatment between patients and medical professionals, potentially reducing the effort required of patients and doctors. Their automated system is based on the Dense Convolutional Neural Network (DenseNet-121) [49] and the Residual Convolutional Neural Network (ResNet-34). The method is validated on two benchmark datasets, ALL-IDB and the ASH image bank, and strong results are reported.

2.3. Hybrid Deep Learning Approaches

Rather than using standalone deep learning models, some research studies have designed hybrid deep learning frameworks for leukemia detection. For instance, Yu et al. proposed a hybrid method in which state-of-the-art convolutional neural networks (CNNs), namely ResNet50 [50], VGG16 [51], and VGG19 [51], are employed for automated cell identification [52]. They compared their approach with conventional machine learning approaches, i.e., K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machine (SVM), and Decision Tree (DT), and report 88.50% accuracy for cell recognition. Mourya et al. also designed a hybrid deep learning model in which two CNN architectures are combined to improve accuracy [53]; validated on 636 blood samples of healthy and ALL cells, it achieves 89.70% accuracy. Furthermore, Jiang et al. proposed ViT-CNN, an ensemble of a vision transformer and a CNN [54], which distinguishes normal cells from cancer cells to aid the detection of Acute Lymphoblastic Leukemia (ALL). In their work, the vision transformer and the CNN extract cell features in different ways, and combining them improves classification. They also used difference enhancement-random sampling (DERS) to address the imbalanced dataset. Their algorithm achieves an outstanding result of 99.03%, demonstrating its effectiveness as a CAD system for ALL. Kassani et al. designed a hybrid approach in which VGG16 and MobileNet are combined to extract deep features for the classification of leukemic B-lymphoblasts [55].
Their approach is enriched with various data augmentation methods and attains 96.17% accuracy, 95.17% sensitivity, and 98.58% specificity. Furthermore, Zoph et al. merged two deep learning models, NASNetLarge [56] and VGG19, to classify leukemic B-lymphoblast cells and normal B-lymphoid precursor cells, achieving a detection performance of 96.58% [57]. Their model effectively diagnoses acute lymphoblastic leukemia and shows that an ensemble performs much better than a single model.

In addition, optimization-based algorithms have been employed for leukemia classification. Krishna et al. proposed a Chronological Sine Cosine Algorithm (SCA)-based deep learning model to detect acute lymphocytic leukemia from blood sample images and attained 98.70% accuracy [58]. Similarly, Tuba et al. employed Generative Adversarial Optimization (GAO) [59] to detect acute lymphocytic leukemia and achieved 99.66% accuracy [60]. Saif et al. combined an Artificial Neural Network (ANN) with a Genetic Algorithm (GA) [61] to segment acute lymphoblastic leukemia cells using local pixel information and reported 97.07% accuracy [15]. Acharya et al. proposed an acute lymphoblastic leukemia diagnosis method based on image segmentation and data mining techniques [62] and achieved 98.60% accuracy. Our proposed deep learning model, by contrast, is simple: it employs squeeze-and-excitation learning to address the morphological similarity between leukemic and normal cells, thereby increasing the accuracy of the traditional deep learning model in classifying images as healthy or unhealthy blood samples.

3. Methodology

The design overview of the proposed methodology is depicted in Figure 3. The framework begins with the acquisition of microscopic images of blood samples. Data augmentation is then applied to overcome the shortage of data, since deep neural networks require large amounts of training data to perform well. Finally, a deep CNN architecture incorporating squeeze-and-excitation learning is proposed to diagnose leukemia from the input microscopic images of blood samples. Each step is explained in depth in the following subsections.

3.1. Acquisition of Data

The data used in this work to evaluate the model's performance were obtained from the Acute Lymphoblastic Leukemia Image Database for Image Processing (ALL-IDB). We used both datasets provided by this database, ALL-IDB1 and ALL-IDB2, which are publicly available and comprise microscopic images of blood samples. The database focuses on Acute Lymphoblastic Leukemia (ALL), a potentially deadly type of leukemia that is most frequently found in childhood, with the highest prevalence between the ages of 2 and 5 years. In the datasets, ALL lymphoblasts are annotated by experienced oncologists. All images were captured with a Canon PowerShot G5 camera used together with an optical laboratory microscope, with magnifications ranging from 300 to 500. All images are in JPG format with 24-bit color depth. More precisely, the first dataset, ALL-IDB1, comprises 108 images containing 39,000 blood elements, in which the lymphocytes have been annotated. The second dataset, ALL-IDB2, consists of cell regions cropped from whole microscopic images. Apart from the image dimensions, ALL-IDB2 images have gray-level properties similar to those of ALL-IDB1 images.

3.2. Data Augmentation

CNNs have achieved cutting-edge performance in a variety of tasks. However, the amount of training data has a significant influence on CNN performance [24, 25, 63]. Acquiring sufficient clinical images is challenging, especially in medical imaging, due to data privacy concerns. Models trained on the original images together with augmented samples tend to generalize better. In various image-based studies with CNNs, data augmentation has reduced the error rate of the network by acting as a form of regularization. The dataset used in this research includes a wide variety of microscopic blood sample images, but the number of blood samples in both datasets is quite limited. Hence, to address the small dataset size and reduce overfitting, we employed several types of data augmentation to artificially enlarge the training data. Specifically, rotations of 60 and 90 degrees and random shifts in the range (−1.0, 1.0) are applied.
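The augmentations listed above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' implementation: rotation uses nearest-neighbour resampling about the image centre with zero fill for uncovered pixels, and rotated images keep their original size (the text specifies neither interpolation nor fill mode). The unit of the shift range (−1.0, 1.0) is not stated, so the sketch reads it as pixels; under a fraction-of-image-size reading, the sampled value would instead be multiplied by the image height or width. The function names `rotate_nearest`, `shift_zero_fill`, and `augment` are hypothetical.

```python
import numpy as np

def rotate_nearest(img, angle_deg):
    """Rotate about the image centre (nearest-neighbour); uncovered pixels become 0."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    y0, x0 = yy - cy, xx - cx
    # Inverse mapping: for every output pixel, find the source pixel it comes from.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[yy[valid], xx[valid]] = img[src_y[valid], src_x[valid]]
    return out

def _span(n, d):
    """Source and destination slices along one axis for an integer shift d."""
    d = int(np.clip(d, -n, n))
    return slice(max(0, -d), n - max(0, d)), slice(max(0, d), n - max(0, -d))

def shift_zero_fill(img, dy, dx):
    """Translate by (dy, dx) pixels; pixels shifted in from outside are 0."""
    h, w = img.shape[:2]
    (src_r, dst_r), (src_c, dst_c) = _span(h, dy), _span(w, dx)
    out = np.zeros_like(img)
    out[dst_r, dst_c] = img[src_r, src_c]
    return out

def augment(img, rng, angles=(60, 90), shift_range=(-1.0, 1.0)):
    """Original image plus rotated and randomly shifted copies, all the same shape."""
    dy, dx = np.rint(rng.uniform(*shift_range, size=2)).astype(int)
    copies = [img] + [rotate_nearest(img, a) for a in angles]
    copies.append(shift_zero_fill(img, dy, dx))
    return copies

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in RGB patch
batch = augment(image, rng)
print(len(batch), [a.shape for a in batch])
```

Keeping every augmented copy at the input size means the copies can be stacked directly into a training batch; a 60-degree rotation therefore loses the corners of the original and introduces zero-filled corners.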

3.3. Proposed CNN Architecture

The proposed deep CNN architecture is described below in detail:

3.3.1. Convolutional Layers and Max-Pool Layers

Generally, the two fundamental operations of Convolutional Neural Networks (CNNs) are convolution and max-pooling, which mimic the behavior of different types of cells in the visual cortex. CNNs also have local receptive fields and a hierarchical organization, and they combine feature extraction and classification in a single model that learns appropriate features and the classification process automatically; this has had a significant impact on the domain of computer vision. In the proposed model, microscopic images of blood samples are preprocessed by data augmentation and fed directly into the proposed deep learning model to extract local features. Consider a microscopic blood sample image of size $H_{in} \times W_{in}$ that is convolved with a kernel of size $k_h \times k_w$ to generate a set of feature maps of size $H_{out} \times W_{out}$, as shown in equations (1) and (2):

$$H_{out} = \left\lfloor \frac{H_{in} - k_h + 2P_h}{S_h} \right\rfloor + 1 \tag{1}$$

$$W_{out} = \left\lfloor \frac{W_{in} - k_w + 2P_w}{S_w} \right\rfloor + 1 \tag{2}$$

In equations (1) and (2), the zero-padding along the width and height is denoted by $P_w$ and $P_h$, while the strides in the vertical and horizontal directions are denoted by $S_h$ and $S_w$, respectively. The input image of size $H_{in} \times W_{in}$ is provided as input to the first convolutional layer. The convolutional layers are followed by a pooling layer, which reduces the spatial size of the activations and hence the computational and memory requirements, makes the model approximately invariant to small translations, and operates independently on each channel of its input, downscaling it spatially.
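The output-size relation of equations (1) and (2), i.e. floor((n − k + 2p) / s) + 1 along each spatial axis for input size n, kernel size k, zero-padding p, and stride s, together with the convolution and max-pooling steps it describes, can be checked with the small NumPy sketch below. It is illustrative only: it handles a single channel, computes cross-correlation (which is what CNN "convolution" layers compute), and uses a 2 × 2 pooling window with stride 2 as an assumed default; the paper's actual kernel sizes, padding, and strides are not given in this section.

```python
import numpy as np

def conv_output_size(n, k, p, s):
    """Output length along one axis: floor((n - k + 2p) / s) + 1."""
    return (n - k + 2 * p) // s + 1

def conv2d(x, kernel, pad=(0, 0), stride=(1, 1)):
    """Single-channel 2D convolution (cross-correlation, as in CNN layers) with zero-padding."""
    p_h, p_w = pad
    s_h, s_w = stride
    k_h, k_w = kernel.shape
    xp = np.pad(x, ((p_h, p_h), (p_w, p_w)))  # zero-padding along height and width
    out_h = conv_output_size(x.shape[0], k_h, p_h, s_h)
    out_w = conv_output_size(x.shape[1], k_w, p_w, s_w)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = xp[i * s_h:i * s_h + k_h, j * s_w:j * s_w + k_w]
            out[i, j] = np.sum(window * kernel)
    return out

def max_pool2d(x, size=2, stride=2):
    """Max-pooling over size x size windows (non-overlapping when size == stride)."""
    out_h = conv_output_size(x.shape[0], size, 0, stride)
    out_w = conv_output_size(x.shape[1], size, 0, stride)
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

image = np.ones((5, 5))
fmap = conv2d(image, np.ones((3, 3)), pad=(1, 1), stride=(1, 1))
print(fmap.shape, fmap[0, 0], fmap[2, 2])  # (5, 5) 4.0 9.0
print(max_pool2d(np.arange(16).reshape(4, 4)))
```

With a 3 × 3 kernel, padding 1, and stride 1, the feature map keeps the input size ("same" padding); the corner value 4.0 shows that zero-padded border positions see only four of the nine input pixels.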

3.3.2. Integrating Squeeze-and-Excitation Learning

In order to raise the network's representational power, we employ squeeze-and-excitation learning [37], which strengthens the network's encoding of channel relationships. This learning emphasizes the relationships among channels by adaptively recalibrating channel-wise feature responses while explicitly modeling channel interdependencies. A squeeze-and-excitation block is a computational unit that can be built on top of any given transformation $F_{tr}: \mathbf{X} \rightarrow \mathbf{U}$, with $\mathbf{X} \in \mathbb{R}^{H' \times W' \times C'}$ and $\mathbf{U} \in \mathbb{R}^{H \times W \times C}$. For the sake of clarity, $F_{tr}$ is taken to be a convolutional operator in the notation that follows. Let $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_C]$ denote the learned set of filter kernels, where $\mathbf{v}_c$ denotes the parameters of the $c$-th filter. The outputs of $F_{tr}$ can then be written as $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_C]$, where $\mathbf{u}_c$ is defined in equation (3):

$$\mathbf{u}_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s \qquad (3)$$

In equation (3), $*$ denotes convolution, $\mathbf{v}_c = [\mathbf{v}_c^1, \ldots, \mathbf{v}_c^{C'}]$, $\mathbf{X} = [\mathbf{x}^1, \ldots, \mathbf{x}^{C'}]$, and $\mathbf{u}_c \in \mathbb{R}^{H \times W}$ (bias terms are omitted for simplicity), while $\mathbf{v}_c^s$ is a 2D spatial kernel, i.e., a single channel of $\mathbf{v}_c$ that acts on the corresponding channel of $\mathbf{X}$. Because each output is produced by summing over all input channels, channel dependencies are implicitly embedded in $\mathbf{v}_c$, but they are entangled with the local spatial correlation captured by the filters. The main objective is to increase the model's sensitivity to informative features, so that they can be exploited by subsequent transformations, while suppressing less useful ones. This is accomplished by explicitly modeling channel interdependencies and recalibrating the filter responses in two steps, squeeze and excitation, before they are passed to the next transformation. A pictorial representation of the squeeze and excite operations is shown in Figure 4.
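A minimal pure-Python sketch of equation (3) makes the channel mixing explicit: each 2D slice $\mathbf{v}_c^s$ is applied to its own input channel, and the results are summed into a single map $\mathbf{u}_c$. As in most CNN frameworks, the sketch computes cross-correlation (the kernel is not flipped); the helper names are ours.

```python
def conv2d_single(x, k):
    """'Valid' 2D cross-correlation of one H x W channel with one kernel (no padding, stride 1)."""
    kh, kw = len(k), len(k[0])
    h_out, w_out = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(w_out)]
            for r in range(h_out)]

def output_channel(x_channels, v_c):
    """u_c = sum over s of v_c^s * x^s: one output map mixes every input channel (bias omitted)."""
    per_channel = [conv2d_single(x_s, v_cs) for x_s, v_cs in zip(x_channels, v_c)]
    h, w = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(m[r][c] for m in per_channel) for c in range(w)] for r in range(h)]
```

Because the summation happens inside `output_channel`, the per-channel slices of `v_c` are the only place where channel dependencies are represented, and they are tied to spatial filtering; this is the entanglement the squeeze-and-excitation block models explicitly.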

(1) Embedding of Global Information by Squeeze Operation. To address channel dependencies, we first consider the signal to each channel in the output features. Since every learned filter operates on a local receptive field, each unit of the transformation output $\mathbf{U}$ cannot exploit contextual information outside this region. The problem is most severe in the lower layers of the model, where the receptive fields are small. It can be addressed by squeezing global spatial information into a channel descriptor. More precisely, global average pooling is used to produce channel-wise statistics. Formally, a statistic $\mathbf{z} \in \mathbb{R}^C$ is formed by shrinking $\mathbf{U}$ through its spatial dimensions $H \times W$, so that the $c$-th element of $\mathbf{z}$ is computed by the following equation:

$$z_c = F_{sq}(\mathbf{u}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \qquad (4)$$

The output $\mathbf{U}$ of the transformation can be interpreted as a collection of local descriptors whose statistics are expressive for the complete microscopic image of the blood sample. Such information is commonly exploited in feature engineering [64–66]. The aggregation technique used here is global average pooling.
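The squeeze step is simply a per-channel mean. The following sketch, with a hypothetical `squeeze` helper, assumes the feature maps are stored as a list of $C$ channels, each a nested list of $H$ rows of $W$ values.

```python
def squeeze(u):
    """Global average pooling: reduce each H x W channel of U to one scalar z_c."""
    z = []
    for channel in u:  # u is a list of C channels, each a list of H rows of W values
        h, w = len(channel), len(channel[0])
        z.append(sum(sum(row) for row in channel) / (h * w))
    return z
```

For two 2 × 2 channels `[[1, 2], [3, 4]]` and `[[0, 0], [0, 8]]`, the descriptor is `[2.5, 2.0]`: one number per channel, regardless of spatial size.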

(2) Recalibration through Excitation Operation. To fully capture channel-wise dependencies, a second operation, excitation, exploits the information aggregated by the squeeze operation. It must satisfy two conditions. The first is flexibility: the model should be able to learn nonlinear interactions among channels. The second is that the model should learn a non-mutually-exclusive relationship, so that multiple channels can be emphasized at once rather than enforcing a one-hot activation. For this purpose, a gating mechanism with a sigmoid activation is employed:

$$\mathbf{s} = F_{ex}(\mathbf{z}, \mathbf{W}) = \sigma\big(g(\mathbf{z}, \mathbf{W})\big) = \sigma\big(\mathbf{W}_2\, \delta(\mathbf{W}_1 \mathbf{z})\big) \qquad (5)$$

In equation (5), $\sigma$ denotes the sigmoid function, $\delta$ denotes the ReLU activation, $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, and $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$. To limit model complexity and aid generalization, the gating mechanism is parameterized as a bottleneck of two fully connected (FC) layers around the nonlinearity: a dimensionality-reduction layer with parameters $\mathbf{W}_1$ and reduction ratio $r$, followed by a ReLU, and then a dimensionality-increasing layer with parameters $\mathbf{W}_2$. The final output of the block is obtained by rescaling the transformation output $\mathbf{U}$ with these activations:

$$\tilde{\mathbf{x}}_c = F_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c \qquad (6)$$

where $\tilde{\mathbf{X}} = [\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \ldots, \tilde{\mathbf{x}}_C]$ and $F_{scale}(\mathbf{u}_c, s_c)$ denotes channel-wise multiplication between the feature map $\mathbf{u}_c \in \mathbb{R}^{H \times W}$ and the scalar $s_c$. The activations act as channel weights adapted to the input-specific descriptor $\mathbf{z}$. In this manner, the squeeze and excite operations improve feature discriminability by introducing dynamics conditioned on the input.
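The excitation gate of equation (5) and the channel-wise rescaling that follows can be sketched in the same style, starting from a channel descriptor `z` such as the one produced by global average pooling. This is a plain-Python illustration, not the authors' Keras code; the weight matrices passed in are hypothetical, whereas in the model they are learned, and biases are omitted as in the text.

```python
import math

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def excite(z, w1, w2):
    """s = sigmoid(W2 relu(W1 z)): bottleneck gate with C/r hidden units, biases omitted."""
    hidden = [max(0.0, v) for v in matvec(w1, z)]                    # reduce to C/r units, then ReLU
    return [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]  # expand to C gates in (0, 1)

def scale(u, s):
    """Multiply every element of channel c of U by its gate s_c."""
    return [[[s_c * v for v in row] for row in channel] for channel, s_c in zip(u, s)]
```

With $C = 2$ and $r = 2$, `w1` is a 1 × 2 matrix and `w2` a 2 × 1 matrix. Because each gate comes from an independent sigmoid rather than a softmax, several channels can be kept near 1 at once, which is the non-mutually-exclusive behavior required above.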

3.3.3. Network Architecture

The architecture of the proposed model combines the convolution, max-pooling, and squeeze-and-excitation operations described in the above sections. It consists of a stack of these layers configured with the best set of parameters found for leukemia cancer detection. The network begins at the input layer, where microscopic blood sample images of dimension are provided as input. This input is then propagated to the convolutional and max-pooling layers. The first convolutional layer operates on the image with a kernel size of to extract low-level representations that are useful for image classification. As depicted in Figure 5, mid- and high-level representations are then obtained through subsequent convolutional layers with the same kernel size. The numbers of filters in the three nonpadded convolutional layers are 32, 64, and 128, respectively. Each convolutional layer is followed by a ReLU [67] activation function, which introduces nonlinearity into the network, and by batch normalization, which normalizes the data and improves performance. After batch normalization, the output is passed through a squeeze-and-excitation block: the squeeze operation provides global information embedding, and the excitation operation performs the recalibration, as described in the above sections. Subsequently, a max-pooling layer with a window size of is used in each stage, with the pooling stride set to 1. The sizes of feature maps after every are , , and , respectively.
After this, a global max-pooling layer is added to reduce the dimensions of the extracted features, followed by two dense layers with 128 and 1 units. The first dense layer uses a ReLU activation and the last layer uses a sigmoid activation. The model is trained with the “binary_crossentropy” loss function and the Adam optimizer with a learning rate of 0.001. The traditional deep learning model is depicted in Figure 6, while Figure 4 shows the architecture of the proposed model. All of the architectural aspects stated above are included in the traditional deep learning model illustrated in Figure 6 to increase its performance.
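To make the stage ordering concrete, the following sketch traces feature-map shapes and counts trainable parameters for one reading of the description above: three stages of convolution, ReLU, batch normalization, squeeze-and-excitation, and max-pooling, followed by global max pooling and two dense layers. The input size (128 × 128 × 3), the 3 × 3 kernel, the 2 × 2 pooling window, and the SE reduction ratio r = 16 are hypothetical values, since this section does not state them. SE layers are counted without biases, and only batch normalization's trainable scale and shift are counted.

```python
# Hypothetical settings: the input size, kernel size, pooling window, and SE
# reduction ratio are NOT stated in the text and are chosen only for illustration.
INPUT_SHAPE = (128, 128, 3)
KERNEL = 3
POOL, POOL_STRIDE = 2, 1
FILTERS = (32, 64, 128)
REDUCTION = 16

def out_size(n, k, stride=1, pad=0):
    """Spatial output size of a window of size k (equations (1)-(2))."""
    return (n - k + 2 * pad) // stride + 1

def trace_model(input_shape=INPUT_SHAPE, k=KERNEL, filters=FILTERS, r=REDUCTION):
    """Return per-stage output shapes and a trainable-parameter count."""
    h, w, c = input_shape
    shapes, params = [], 0
    for f in filters:
        h, w = out_size(h, k), out_size(w, k)   # non-padded k x k convolution
        params += k * k * c * f + f             # conv kernels + biases
        params += 2 * f                         # batch norm: trainable gamma and beta
        params += 2 * f * (f // r)              # SE: W1 and W2 (biases ignored)
        h, w = out_size(h, POOL, POOL_STRIDE), out_size(w, POOL, POOL_STRIDE)  # max-pool
        c = f
        shapes.append((h, w, c))
    params += c * 128 + 128                     # global max pool -> Dense(128, ReLU)
    params += 128 * 1 + 1                       # Dense(1, sigmoid)
    return shapes, params
```

Under these assumptions, the stage outputs are 125 × 125 × 32, 122 × 122 × 64, and 119 × 119 × 128. Because the pooling stride is 1, each max-pool shrinks each side by only one pixel, so most of the spatial reduction is done by the global max-pooling layer, and the dense head always receives a 128-dimensional vector.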

4. Experiments and Results

This section discusses the results of the designed model in various experimental scenarios, followed by discussions and comparisons. The proposed model is implemented in Python with the Keras deep learning framework, and all simulations are run on Google Colab with a 12 GB NVIDIA Tesla K80 GPU. The experimental setup includes two datasets, and performance is validated on each dataset separately and on their combination. All hyperparameters were selected by trial and error, and the results are reported with the best parameter settings.

4.1. Evaluation Criteria

The criteria used to evaluate the performance of the proposed model are determined by the following metrics:

4.1.1. Accuracy

This metric measures the proportion of samples that the trained model classifies correctly across both categories, i.e., Acute Lymphoblastic Leukemia (ALL) and not Acute Lymphoblastic Leukemia (ALL). In other words, it indicates how reliably the model separates patients who have leukemia from those who do not. The higher the accuracy, the better the model [68–71]. Accuracy is defined in the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$

4.1.2. Precision

Precision measures the proportion of true positives among all cases predicted as positive [72]. In the case of leukemia, it reflects how many of the patients flagged by the model as having leukemia actually have the disease. Mathematically, it is defined as in the following equation:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (8)$$

4.1.3. Recall

Recall measures the proportion of actual leukemia patients that the model correctly identifies. It is computed by the following equation:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (9)$$

4.1.4. FScore

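This metric summarizes the overall effectiveness of the model by combining precision and recall through their harmonic mean:

$$\text{F-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10)$$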

In equations (7)–(10), TN denotes true negatives, TP true positives, FP false positives, and FN false negatives.
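Equations (7)–(10) can be computed directly from confusion-matrix counts. This is a minimal sketch: the function name is ours, returning 0 when a denominator is zero is our own convention, and the counts used in the example are hypothetical, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-score from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}
```

With hypothetical counts TP = 45, TN = 40, FP = 5, and FN = 10, accuracy is 0.85, precision 0.9, recall about 0.818, and F-score 90/105 ≈ 0.857. The F-score sits between precision and recall but closer to the smaller of the two.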

4.2. Results of the ALL_IDB1 Database

To assess the performance of the proposed model, we first validate it on the ALL_IDB1 database. The whole database is partitioned into two nonoverlapping sets of training and test samples with a ratio of 80:20. As previously stated, the number of microscopic blood samples both with and without ALL is very small for training. Hence, we employ data augmentation to increase the number of training samples, as shown in Figure 7. The numbers of training and test blood samples for the ALL and without-ALL classes are provided in Table 1. After data augmentation, the training set contains a sufficiently large number of samples, and these augmented images are used to train the model. The proposed model extracts the features of leukemic cells through its convolution and max-pooling layers. At each level of feature representation, squeeze-and-excitation learning improves the representational capacity of the model by explicitly modeling the interdependencies among the channels of the extracted convolutional features. The results on the ALL_IDB1 dataset are shown in Table 2. As illustrated in Table 2, the proposed model diagnoses patients with leukemia with 100% accuracy.

Furthermore, precision, recall, and F-Score are all 100%. We also verify the reliability of the model class-wise on the ALL_IDB1 database, analyzing performance separately for patients with ALL and without ALL. The model also performs very accurately on the individual classes, as shown in Table 2. Because the ALL_IDB1 dataset contains very few test samples with limited variation, we validate the model three times, each time forming the training and test sets differently by random shuffling. After each split, only the training set is augmented. We also plot the confusion matrix of this experiment, depicted in Figure 8 (first image), which shows the performance of the model for each class in the dataset. The results show that the proposed model classifies microscopic blood samples into the ALL and without-ALL classes very well. The model learns well because of the adaptive recalibration of channel responses, which accounts for the interdependencies among channels. Squeeze-and-excitation learning introduces input-conditioned dynamics that strengthen feature discrimination, with the squeeze and excitation operations performing global information embedding and adaptive recalibration, respectively.
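The evaluation protocol above (shuffle, split 80:20, then augment only the training portion, repeated over three runs) can be sketched as follows. The function name and the `augment` callback are hypothetical; the text does not say whether the split is stratified by class, so this sketch does not stratify.

```python
import random

def split_then_augment(samples, augment, train_frac=0.8, seed=0):
    """Shuffle, split train/test, and augment ONLY the training portion."""
    rng = random.Random(seed)          # a different seed gives a different split per run
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    # Augmented copies are derived from training images only, so no variant of a
    # test image can leak into training.
    train = train + [variant for s in train for variant in augment(s)]
    return train, test
```

Calling it with seeds 0, 1, and 2 mirrors the three-run protocol. Splitting before augmenting matters: augmenting first would place rotated or shifted copies of test images in the training set and inflate the reported scores.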

4.3. Results of the ALL_IDB2 Database

In the second phase of validation, we use the second dataset, ALL_IDB2. This dataset also contains too few microscopic blood samples for training the proposed model, so the same data augmentation procedure is applied. The augmented training set, which contains a wide range of image variations, is then used to train the model. The numbers of training and test instances for this setup are given in Table 3. All hyperparameters of the proposed model are the same as for the first database. The results of the proposed model on this database are shown in Table 2. As in the first experiment, features of the microscopic blood sample images are extracted by the convolutional and pooling layers, whose learning is improved by the squeeze and excitation operations. Table 2 shows that the model also achieves good scores on this dataset. The overall accuracy in the first run is about 96%, and precision, recall, and F-Score are each 96%. The results of the second and third runs, in which the training and test sets were formed by different random shuffles, are also encouraging: the accuracy is 98% in the second run and 99.98% in the third. The class-wise performance of the model is also reported in Table 2. Note that in this dataset the cell regions are cropped from the microscopic images, whereas the first experiment uses the full microscopic images of blood samples. The proposed model performs well in both image settings. The confusion matrix for this cropped-cell experiment is shown in Figure 8 (second image).

4.4. Results of Combining Both Datasets

In the third experiment, we combine the microscopic images of both datasets to increase diversity and the number of test images. Data augmentation is again applied to enlarge the training set. The numbers of training and test samples for the ALL and not-ALL classes are given in Table 4. As for the individual datasets, the augmented training set is given as input to the proposed model, which extracts the features of leukemic and normal cells from both the cropped-cell and the full-size images. The accuracy achieved in this scenario is also encouraging, and the recall, precision, and F-Score values are likewise high. A recall of 99.24% is achieved on the ALL class in the third experiment, compared with 97.01% in the first experiment.

The confusion matrix of this experiment is shown in Figure 8 (third image). The training loss and accuracy curves for all of the experiments are plotted in Figure 9. The learning curves show that by epoch 5 the accuracy approaches its best values and by epoch 10 the loss approaches zero, which indicates that the proposed model learns effectively. In addition, receiver operating characteristic (ROC) curves are drawn for both databases. An ROC curve plots the true-positive rate (TPR) against the false-positive rate (FPR) at various decision thresholds. ROC analysis is a standard way to evaluate a binary classifier, and it can also be extended to multi-class problems [73]. The ROC curve shows the trade-off between sensitivity (TPR) and specificity (1 – FPR). Classifiers whose curves lie closer to the top-left corner perform better, whereas the closer a curve gets to the 45-degree diagonal of ROC space, which corresponds to random guessing, the less accurate the classifier. The ROC curves for all of the experiments are depicted in Figure 10.
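An ROC curve can be traced by sweeping the decision threshold over the model's sigmoid outputs and recording (FPR, TPR) at each step. The following is a minimal sketch with hypothetical scores and labels (1 = ALL); it is not the plotting code used for Figure 10, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]                      # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

If every ALL sample scores above every normal sample, the curve passes through the top-left corner and the area is 1.0. Interleaving the scores pulls the curve toward the diagonal: with scores 0.9, 0.8, 0.3, 0.1 and labels 1, 0, 1, 0, the area drops to 0.75.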

We also compare the proposed CNN architecture with the traditional deep learning model shown in Figure 6. A traditional deep learning model generally consists of several convolution, max-pooling, and dense layers. In this study, we strengthen the representational capability of this traditional model by incorporating squeeze and excite operations. Table 5 details the improvements of the proposed model over the traditional model in terms of accuracy, precision, recall, and F-Score. Both models were tested three times, each time randomly shuffling the whole dataset to create the training and test sets. The proposed model shows an average accuracy improvement of 5.5% over the traditional deep learning model.

The precision, recall, and F-Score values of the proposed model are also higher than those of the traditional model. In addition, the average test loss is 1.44 for the traditional CNN and 0.117 for the proposed model. We also examine the effect of data augmentation on the performance of both models. Deep learning models generally require a large amount of data for effective training; with less data, a model generalizes less well and is more prone to overfitting. The influence of data augmentation on this problem is shown in Table 6. In the first setting, each model is trained three times on randomly shuffled data without any augmentation, and the accuracy is lower: 88.6% for the traditional deep learning model and 94% for the proposed model. When data augmentation is applied, both models learn better, reaching accuracies of 92.8% and 98.43%, respectively.
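Taking the Table 6 accuracies quoted above at face value, the gains in percentage points work out as follows. These are simple differences of the reported numbers, not additional results:

$$
\begin{aligned}
\text{augmentation gain, traditional model:} &\quad 92.8 - 88.6 = 4.2 \\
\text{augmentation gain, proposed model:} &\quad 98.43 - 94.0 = 4.43 \\
\text{proposed vs. traditional, without augmentation:} &\quad 94.0 - 88.6 = 5.4 \\
\text{proposed vs. traditional, with augmentation:} &\quad 98.43 - 92.8 = 5.63
\end{aligned}
$$

The gap between the two models stays at roughly 5.4 to 5.6 points with or without augmentation, so in this comparison augmentation and the squeeze-and-excitation blocks contribute largely additively.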

4.5. Comparison with Existing Works

Finally, we compare the results of the proposed model with existing work on leukemia cancer detection, as shown in Table 7. Ahmed et al. proposed a CNN-based architecture to classify different types of leukemia, both acute and chronic [32]. Their architecture achieved an average accuracy of 88.25%, and they also compared it with traditional machine learning approaches such as Naive Bayes, decision trees, K-nearest neighbors, and support vector machines (SVM). Shafique and Tehsin proposed a transfer learning approach using the AlexNet model to detect acute lymphocytic leukemia (ALL) and its subtypes [2]; for ALL detection alone, their approach achieves 99.50% accuracy. Jothi et al. performed ALL classification by using an optimization-based backtracking algorithm to segment the leukemic cells in a given microscopic image [74]. Morphological, color, and statistical features, among others, were then extracted, followed by feature selection. Finally, healthy and leukemic cells were classified using the Jaya algorithm, a population-based metaheuristic optimization method; their framework achieves 99% accuracy. Mishra et al. also performed ALL classification with an improved feature extraction method based on the 2D discrete orthonormal S-transform [42]. This method extracts an extensive set of relevant texture features, which is then reduced by PCA- and LDA-based algorithms; finally, the AdaBoost classifier separates leukemic and normal cells with an accuracy of 99.66%. Jiang et al. proposed a vision transformer–based convolutional neural network (ViT-CNN) for acute lymphocytic leukemia detection [54].
To obtain superior classification results, their ViT-CNN ensemble model extracts features from cell images in two fundamentally different ways, and it achieves an accuracy of 99.03%. Agaian et al. designed a cell energy feature–based approach for ALL feature extraction, followed by an SVM classifier [75]; their framework achieves 94% accuracy in ALL detection. Tuba and Tuba performed acute lymphocytic leukemia detection using five shape-based and six texture-based features [60]. Classification is performed by an SVM whose parameters are tuned with a generative adversarial optimization algorithm, and their technique achieves 93.84% accuracy. Moreover, Jha and Dutta proposed a chronological sine-cosine algorithm (SCA)-based deep CNN model to classify ALL images [58], in which SCA is used to find the best weights of the deep learning model; their technique achieves 98.70% accuracy. Among these approaches, some use deep learning models, some rely on transfer learning, some use optimization-based methods, and some employ traditional machine learning. All of them perform well, and deep learning–based methods are among the most accurate. Hence, the proposed framework is also based on deep learning, with its performance boosted by squeeze-and-excitation learning. The results presented in the above sections and the comparison in Table 7 show that the proposed approach is effective at classifying leukemia cancer.

The main reason for the improvements is the representational power of the model, as indicated in Ref. [37]: the representational power of traditional CNNs is enhanced by explicitly modeling the interdependencies among the channels of convolutional features. This is accomplished by adding squeeze-and-excitation operations to the layers of the deep learning model. More precisely, the squeeze operation aggregates the global spatial information of each channel-wise feature map into a channel descriptor. It is followed by the excitation operation, which takes the information extracted by the squeeze operation as input and learns the channel relationships through recalibration. These operations strengthen the model's representational power and improve its feature learning so that it extracts more compact and discriminative features, a critical prerequisite for microscopic image analysis. The proposed model is also lightweight in terms of network depth, number of layers, and number of trainable parameters. Furthermore, the suggested method is a simple, enhanced deep learning approach that does not require any pre-processing or post-processing techniques to identify leukemia cancer. However, when microscopic blood samples exhibit more diverse and complex variations, it may be necessary to upgrade the network configuration and increase the network depth, since the current model is intentionally shallow.

5. Conclusion

Leukemia is a form of blood cancer and one of the principal causes of cancer-related death. Recent studies have proposed deep learning–based strategies for leukemia detection, including transfer learning approaches, and report highly accurate results. However, improving deep learning algorithms remains an ongoing research problem. Hence, in this study, an improved deep learning model based on squeeze-and-excitation learning is proposed to diagnose leukemia cancer from microscopic blood samples of patients. In the proposed model, the representational ability is improved at every level of feature representation by performing channel-wise feature recalibration. The squeeze and excitation operations enable the model to extract strong, relevant, and discriminative features from leukemic and normal cells. The proposed model has been validated on two publicly available datasets, ALL_IDB1 and ALL_IDB2, and shows promising results compared with the traditional deep learning model. In the future, the proposed technique can be validated on the different subtypes of acute lymphocytic leukemia.

Data Availability

The datasets are publicly available and can also be obtained by contacting the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.