Malaria, one of the most common parasitic diseases in humans according to medical experts, is caused by protozoan parasites, of which Plasmodium falciparum is a common species in humans. Identifying the stages of infection is a complex procedure that must be carried out by a microscopist with expertise in malaria diagnosis. The disease remains endemic in some parts of the world. The data were collected from the NIH portal and uploaded to a Kaggle repository. The dataset contains 27558 samples, of which 13779 carry parasites and 13779 do not. This paper focuses on two of the most common deep transfer learning methods. Several image classification models, including VGG-19, have been pretrained on larger datasets; pretraining and fine-tuning made VGG-19 a well-suited feature extractor. Deep learning strategies based on pretrained models are proposed for detecting malarial parasite cases in the early stages, achieving an accuracy of 98.34 ± 0.51%.

1. Introduction

Malaria is a deadly disease and a leading cause of infection in many parts of the world. Its rapid spread and high mortality rate have been documented throughout history. In 2015, the global death toll from malaria was estimated at 429,000, including an estimated 303,000 children under the age of five; young children and pregnant women are the most at risk of death [1]. Detecting the disease at an early stage can reduce the death rate. Researchers face the challenge of providing the most accurate parasite detection with the least time, cost, and effort. During the last few decades, visual inspection of blood smears has played a vital role as a decision-making tool in medical diagnosis. A thick blood smear can identify malaria parasites in blood samples [2]. It is close to eleven times more sensitive than a thin blood smear for the rapid detection of parasites. Blood smears placed on a glass microscope slide are also often used to examine the development stages of the parasite. To confirm malaria infection, a pathologist uses a light microscope to identify changes in the size, shape, and appearance of the RBCs. The accuracy of a Plasmodium microscopy report depends on the pathologist's expertise. The technique is arduous and inefficient, and its uncertainty can cause erroneous and contradictory diagnoses, inappropriate medication, and, in rare situations, the death of the patient.

A mechanism for detecting malaria infection has been proposed using supervised learning. Methods for detecting malaria parasites were demonstrated using in vitro culture image samples [3]. The samples developed in a laboratory do not contain substances other than WBCs, platelets, or parasites. Given the severity of malaria in terms of the number of deaths it causes, it is reasonable to accept the minor errors that an automated method may make [4, 5]. Since the advent of deep learning techniques, feature extraction has become far more efficient than with traditional methods. However, deep learning methods still require trained experts and advanced techniques for disease prediction, and most of them still need their feature extraction to be optimized. CAD schemes based on ANN architectures contain many layers and levels of nonlinear mapping. Because of the many stacked hidden layers, gradient-based optimization can produce poor results [6, 7]. Microscopic diagnosis requires extensive training, experience, and skill. In rural areas where malaria is prevalent, manual microscopy has not proven to be an effective screening tool when performed by nonexperts [8–10]. This work aims to improve model performance by modifying the network architecture and tuning the hyperparameters. To determine the key features, this paper focuses on the network architecture. The basic VGG-19 model obtains 85% accuracy, but after fine-tuning the model and applying data augmentation to the training dataset, it attains 97.14%.

The remainder of this manuscript is structured as follows. Section 2 presents the related work and data acquisition. Section 3 describes the proposed deep transfer learning methodology. Section 4 discusses the findings. Section 5 concludes the paper.

2. Related Work

There are several methods of identifying parasitized RBCs. Purwar et al. preprocessed the images using local histograms [11]; the Hough transform and morphological operations were then used for segmentation and for classification of diseased and clean cells. Di Ruberto et al. employed a statistical k-means clustering algorithm [12]. Using thresholding, Ritter and Cooper [13] segmented cells, separated overlapping cells, and adjusted the division lines according to Dijkstra's algorithm. Díaz et al. [14] developed templates from parasite-stained images, which were then used to classify the infection life stage of each cell. RBCs can be segmented from the background in blood images using a variety of techniques. Díaz et al. [15] separated the RBCs and then determined whether each one was infected by the parasite. Savkare et al. [16] and Ross et al. [17] analyzed grayscale images using k-means and k-medians to divide the data into two clusters. The purpose of supervised learning is to identify the classes of sample data, and the disease stages were identified with a blend of ML algorithms. Krizhevsky et al. [18] built a well-known CNN architecture, AlexNet, which competed in and won the ImageNet challenge. Quinn et al. [19] compared a CNN with a tree-based classifier, and the accuracy rate was determined by analyzing the middle of the operating region. Chowdhury et al. [20] adopted a CNN approach to identify blood cells in blood smear images and detect infections. Raviraja et al. [6] analyzed pretrained CNN models for malaria detection in blood cell images, namely, DenseNet121, VGG-16, AlexNet, ResNet50, FastAI, and ResNet101; see also Meng et al. [21]. With a high precision of 97.5%, ResNet50 outperformed the other CNN models [22]. To conclude, many deep learning algorithms for detecting malaria from cell images have been presented (Yue [23]). Many of them used large pretrained CNN models to enhance the accuracy of classification, whereas others used customized CNNs to minimize the computational time [24].

Large numbers of misclassified samples in medical image classification can have catastrophic consequences and defeat the purpose of a medical diagnosis tool. In addition to accuracy, parameters such as F1 score [25, 26], area under the curve (AUC) [27, 28], sensitivity [29, 30], and specificity [31–33] are vital in analyzing the various approaches. A random sample of cell images infected and not infected with malaria is shown in Figure 1.

2.1. Data Acquisition

The data were obtained from the National Institutes of Health portal and uploaded to a Kaggle repository. There are 27558 cell images in the dataset, of which 13779 are malaria-infected cell images and another 13779 are malaria-free cell images. Random samples were collected and separated into three sets: training, validation, and testing. For each class, 8000 images were used for training and 3000 for validation, so 11000 images per class were used to train the models. The remaining 2779 images of each class were held out as test data to evaluate how the proposed models perform on images that have never been seen before. Because the original dataset includes images with different dimensions, the images were scaled to equal dimensions before splitting the data; they were scaled to 128 × 128 pixels with three RGB channels. Having all images the same size allows the neural networks to learn more quickly and with fewer mistakes. The outcomes obtained compared favorably with existing methods, and data augmentation was used to improve the results further. The images at the beginning of the class folders differ significantly from the ones at the end. The data were therefore sampled at random, which allows the proposed models to learn more diverse features from both classes, reduces overfitting, and helps the models generalize to new data.
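
The per-class random split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the file names, the `split_class` helper, and the fixed seed are assumptions introduced here; only the split sizes (8000/3000/2779 per class) come from the text.

```python
import random

def split_class(paths, n_train=8000, n_val=3000, seed=42):
    """Shuffle one class's file list and cut it into train/val/test parts."""
    rng = random.Random(seed)   # fixed seed (an assumption) for a repeatable split
    shuffled = list(paths)
    rng.shuffle(shuffled)       # random sampling removes any folder-order bias
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # everything left over is held out
    return train, val, test

# Hypothetical file names standing in for the 13779 images of each class.
infected = [f"infected_{i}.png" for i in range(13779)]
healthy = [f"healthy_{i}.png" for i in range(13779)]

for name, files in (("infected", infected), ("healthy", healthy)):
    train, val, test = split_class(files)
    print(name, len(train), len(val), len(test))   # 8000 3000 2779 for each class
```

Splitting each class separately keeps the three sets balanced between infected and healthy cells, which matches the equal class counts reported for the dataset.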

3. Proposed Model

Morphological image processing is used to eliminate noise and recreate object features (see Figure 2). To achieve deep feature extraction and transfer learning, the convolutional layers are frozen. To classify RBCs, a fine-tuned pretrained CNN model with image augmentation is proposed (see Figure 3).

3.1. Transfer Learning

The origins of the deep convolutional neural network architecture go back to the work of K. Fukushima. The idea has been around for a while, but for a long time it did not gain momentum because sufficient computational power was not available. In recent times, graphics processing units have advanced significantly and now serve as high-performance computing technologies. Computational intelligence techniques have gained prominence as a result of their high potential. The CNN is the most popular type of this approach and is composed of the layers described below.

Three main types of CNN layers were used in this algorithm, namely, “convolution,” “pooling,” and “fully connected” layers. Figure 3 exhibits a schematic representation of a model with one convolutional layer and one max-pooling layer. The ReLU activation function is applied to the feature maps to increase the nonlinearity of the network; it replaces all negative values in the activation map with zero. A neuron with the sigmoid activation function forms the output layer of the model, and this function gives a value between 0 and 1. The binary classification model is therefore built using sigmoid activation with binary cross-entropy as the loss function and a learning rate of 0.01. After compilation, the model is trained on the training samples to learn a mapping from the inputs to the outputs. Training amounts to choosing the set of weights that is best suited to this problem.
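
The activation and loss functions named above can be written out in a few lines. The sketch below uses plain Python and a single-weight sigmoid unit; the `sgd_step` helper and the use of plain gradient descent are simplifying assumptions made for illustration, and only the learning rate of 0.01 is taken from the text.

```python
import math

def relu(x):
    """ReLU: negative activations become zero, positive ones pass through."""
    return max(0.0, x)

def sigmoid(z):
    """Squash a real number into (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def sgd_step(w, b, x, y, lr=0.01):
    """One gradient-descent update of a single sigmoid unit (illustrative only)."""
    p = sigmoid(w * x + b)
    grad = p - y                  # derivative of the loss with respect to w*x + b
    return w - lr * grad * x, b - lr * grad

print(relu(-2.0), relu(3.0))      # 0.0 3.0
print(sigmoid(0.0))               # 0.5
print(sgd_step(0.0, 0.0, 1.0, 1)) # weights move toward predicting label 1
```

With the convention used in this paper, an output close to 1 corresponds to an infected cell and an output close to 0 to a healthy one.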

3.2. Predictable Method

The initial model is trained with conventional training for a given number of epochs. The model accuracy settled at 99.80% during training and 95.60% during validation. Figures 4 and 5 show the accuracy and loss curves of the model. AUC score, specificity, sensitivity, and test accuracy were used to evaluate the performance. Table 1 presents the results on the test set, whereas Table 2 shows the CNN model architecture. The layers, with their output depth (channels or units) and number of parameters p, are as follows: the input layer i1 (output 3, p = 0); the convolutional layer c2d (output 32, p = 896); the max-pooling layer mp2d (output 32, p = 0); the convolutional layer c2d1 (output 64, p = 18496); the max-pooling layer mp2d1 (output 64, p = 0); the convolutional layer c2d2 (output 128, p = 73856); the max-pooling layer m2d2 (output 128, p = 0); the flatten layer Fl (output 28800, p = 0); the dense layer dl (output 512, p = 14746112); the dropout layer dot (output 512, p = 0); and the dense layer dns_1 (output 512, p = 262656).
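
The parameter counts listed above follow from the standard formulas for convolutional and dense layers, which the sketch below reproduces. The 3 × 3 kernel size, the "same" padding, the 2 × 2 pooling, and the input side of 125 pixels are assumptions chosen because they reproduce the reported flattened size of 28800; under the same assumptions a 128-pixel input would give 16 × 16 × 128 = 32768.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_units, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_units + 1) * units

def flatten_size(side, n_pools, channels):
    """Flattened length after n_pools 2x2 poolings with 'same'-padded convs."""
    for _ in range(n_pools):
        side //= 2                # each pooling halves the side, rounding down
    return side * side * channels

conv_counts = [
    conv2d_params(3, 3, 32),      # first convolution on the RGB input
    conv2d_params(3, 32, 64),
    conv2d_params(3, 64, 128),
]
flat = flatten_size(125, 3, 128)  # 125 is an assumed input side (see lead-in)

print(conv_counts)                # [896, 18496, 73856]
print(flat)                       # 28800
print(dense_params(flat, 512))    # 14746112
print(dense_params(512, 512))     # 262656
```

Almost all of the parameters sit in the first dense layer, which is why the flattened size has such a large effect on the model size.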

3.3. Pretrained CNN Model

Figures 6 and 7, respectively, show the model accuracy and loss for the pretrained VGG-19 model, and its layers and weights are displayed in Table 3. VGG-19 has sixteen convolutional layers with 3 × 3 convolutional filters. The layers, with their output depth t (channels or units) and number of parameters p, are as follows. In block 1, b1-c1 (t = 64, p = 1792), b1-c2 (t = 64, p = 36928), and b1-pool (t = 64, p = 0). In block 2, b2-c1 (t = 128, p = 73856), b2-c2 (t = 128, p = 147584), and b2-pool (t = 128, p = 0). In block 3, b3-c1 (t = 256, p = 295168); b3-c2, b3-c3, and b3-c4 (t = 256, p = 590080 each); and b3-pool (t = 256, p = 0). In block 4, b4-c1 (t = 512, p = 1180160); b4-c2, b4-c3, and b4-c4 (t = 512, p = 2359808 each); and b4-pool (t = 512, p = 0). In block 5, b5-c1, b5-c2, b5-c3, and b5-c4 (t = 512, p = 2359808 each). These are followed by the flatten layer fla-1 (t = 4608, p = 0), the dense layer dse3 (t = 512, p = 2359808), the dropout layer dtt2 (t = 512, p = 0), and the dense layer dse4 (t = 512, p = 262656). In addition, the original VGG-19 has max-pooling filters for downscaling, two fully connected hidden layers with 4096 units each, and a final dense layer of one thousand units, each representing one of the image categories in ImageNet. These last three fully connected layers are dropped, and the five convolutional blocks are used so that the VGG-19 model serves for feature extraction. The original model was built from scratch on its original dataset with all 19 weight layers trainable. ReLU served as the activation function for the network.

The Adam optimizer was used to minimize the loss function. The second model was built using transfer learning with frozen pretrained layers: VGG-19 acts as a simple feature extractor, with all five convolutional blocks frozen to prevent their weights from changing across epochs, and a sigmoid activation function generates the output for each image. The third model is fine-tuned: it is built by freezing the first three blocks with their ImageNet weights and then training blocks four and five on the malaria dataset, so that the weights of blocks 4 and 5 are updated during training. The preprocessing strategies applied to this model include normalization, data augmentation, and standardization. A sigmoid activation function was applied to solve the two-class classification problem, giving an output of 1 for infected and 0 for healthy.
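
To make the difference between the frozen feature extractor and the fine-tuned model concrete, the sketch below counts how many parameters are trainable in each configuration. It is an illustration under stated assumptions: the block layout is the standard VGG-19 one with 3 × 3 kernels, the head counts only the two dense layers reported above (the final sigmoid unit is left out because its size is not listed), and `count_params` is a helper introduced here rather than a library function.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolution."""
    return (k * k * in_ch + 1) * out_ch

# (in_channels, out_channels) of the sixteen VGG-19 convolutions, by block.
VGG19_BLOCKS = {
    1: [(3, 64), (64, 64)],
    2: [(64, 128), (128, 128)],
    3: [(128, 256)] + [(256, 256)] * 3,
    4: [(256, 512)] + [(512, 512)] * 3,
    5: [(512, 512)] * 4,
}
HEAD_PARAMS = 2359808 + 262656    # the two dense layers reported in the text

def block_params(block):
    return sum(conv_params(i, o) for i, o in VGG19_BLOCKS[block])

def count_params(trainable_blocks):
    """Return (trainable, frozen) parameter totals for a freezing policy."""
    trainable = HEAD_PARAMS + sum(block_params(b) for b in trainable_blocks)
    frozen = sum(block_params(b) for b in VGG19_BLOCKS
                 if b not in trainable_blocks)
    return trainable, frozen

print(count_params(set()))        # feature extractor: all five blocks frozen
print(count_params({4, 5}))       # fine-tuned: blocks 4 and 5 trainable
```

Under these assumptions, unfreezing blocks 4 and 5 moves roughly 17.7 million convolutional parameters from the frozen set to the trainable set, which is where the extra capacity of the fine-tuned model comes from.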

3.4. Image Augmentation with a Fine-Tuned Pretrained Model

As shown in Figure 8, existing images from the training samples were transformed by rotation, shearing, translation, zooming, and similar operations to create new, modified versions of the originals. Figures 6 and 7 illustrate the model accuracy and loss; because the transformations are random, the model does not obtain exactly the same images every time. The model accuracy and loss with augmentation are shown in Figures 9 and 10.
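
One way to picture the augmentation is as a single random affine map that combines rotation, shear, zoom, and translation. The sketch below draws such a map and applies it to a point; the parameter ranges, the normalized coordinates centered on the image, and the helper names are assumptions made for illustration, since the ranges used in the experiments are not reported.

```python
import math
import random

def random_affine(rng, max_rot_deg=20.0, max_shear_deg=10.0,
                  max_zoom=0.1, max_shift=0.1):
    """Draw one random transform and return it as a 2x3 affine matrix."""
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = math.tan(math.radians(rng.uniform(-max_shear_deg, max_shear_deg)))
    zoom = 1.0 + rng.uniform(-max_zoom, max_zoom)
    tx = rng.uniform(-max_shift, max_shift)   # shifts as a fraction of image size
    ty = rng.uniform(-max_shift, max_shift)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Linear part = zoom * rotation @ shear; last column = translation.
    return [
        [zoom * cos_t, zoom * (cos_t * shear - sin_t), tx],
        [zoom * sin_t, zoom * (sin_t * shear + cos_t), ty],
    ]

def apply_affine(m, x, y):
    """Map a point given in image-centered coordinates through the matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

rng = random.Random(0)            # seeding makes the random draws repeatable
for _ in range(3):
    matrix = random_affine(rng)
    print(apply_affine(matrix, 0.5, 0.5))   # a different position on each draw
```

Each call produces a different transform, so every epoch sees slightly different versions of the same training images, whereas reusing the same seed reproduces the same sequence of transforms.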

The confusion matrix is used to count the correct and incorrect positive and negative predictions, as shown in Figure 11. It determines whether each prediction is a true positive, a true negative, a false positive, or a false negative (FN).
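
The evaluation measures mentioned in this paper (accuracy, sensitivity, specificity, and F1 score) can all be derived from the four cells of the confusion matrix. The sketch below shows the calculation on a toy set of labels; the labels are invented for illustration and are not results from the study, and label 1 is taken to mean infected.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels (1 = infected)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def ratio(num, den):
    """Safe division so an empty class does not raise an error."""
    return num / den if den else 0.0

def metrics(tp, fp, tn, fn):
    sensitivity = ratio(tp, tp + fn)          # recall on infected cells
    specificity = ratio(tn, tn + fp)          # recall on healthy cells
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
cells = confusion_matrix(y_true, y_pred)
print(cells)                      # (3, 1, 3, 1)
print(metrics(*cells))            # every metric equals 0.75 for this toy case
```

In a screening setting, false negatives (infected cells predicted as healthy) are usually the costliest errors, which is why sensitivity is reported alongside accuracy.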

4. Discussion

Individual red blood cell smear images are investigated to evaluate whether they are infected or healthy. The study comprises a range of pretrained convolutional neural networks with transfer learning that are fine-tuned and evaluated on the malaria dataset. For VGG-19 with transfer learning, the results show that preprocessing approaches such as normalization and scaling do not affect model performance, whereas the data augmentation technique has shown encouraging outcomes. The basic VGG-19 model obtains 85% accuracy, but after fine-tuning the model and applying the data augmentation technique to the training dataset, it attains 97.14%, as shown in Table 4. According to the performance analyses in Table 5, transfer learning is an effective strategy for producing promising results.

5. Conclusion

A deep learning neural network model was applied with the aim of improving model performance. Standardization and normalization were shown to have little impact on classification, whereas data augmentation improved the model performance and yielded positive results. Models based on VGG-19 pretrained on ImageNet were derived from the initial concept by combining transfer learning and parameter tuning. To determine the key features, this paper has focused on the network architecture, modifying it and tuning the hyperparameters to achieve a better-performing model.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R120), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.