Abstract

Malaria is a vector borne disease widely occurring at equatorial region. Even after decades of campaigning of malaria control, still today it is high mortality causing disease due to improper and late diagnosis. To prevent number of people getting affected by malaria, the diagnosis should be in early stage and accurate. This paper presents an automatic method for diagnosis of malaria parasite in the blood images. Image processing techniques are used for diagnosis of malaria parasite and to detect their stages. The diagnosis of parasite stages is done using features like statistical features and textural features of malaria parasite in blood images. This paper gives a comparison of the textural based features individually used and used in group together. The comparison is made by considering the accuracy, sensitivity, and specificity of the features for the same images in database.

1. Introduction

An infected female Anopheles mosquito is the host, which spread the malaria parasite [1]. Malaria is a highly occurring, vector borne disease in equatorial region. World Health Organization (WHO) has given the analysis report for the year 2013 as 300–500 billion people were the victims of malaria [1, 2]. There is a need for saving lives by making early diagnosis and medication. Nowadays in the biomedical field, all the disease diagnosis methods are turning into computerized automatic way. In this paper, diagnosis of malaria parasite in the blood images is made computerized by using the proposed method which works with image processing.

There are many clinical methods which can be used for malaria diagnosis which are peripheral blood smear (PBS), quantitative buffy coat (QBC), rapid diagnosis test (RDT), Polymerase Chain Reaction (PCR), and Third Harmonic Generation (THG) [3, 4]. Among these entire tests, the PBS is widely used and has limitations of human resistance and time requirement. These limitations can be overcome by making the system automated. The blood images are processed and then the diagnosis is done. The major problem in processing the blood images is the segmentation of red blood cells (RBC) from the background; a proper separation can help to achieve high accuracy in detecting the presence of plasmodium and helps in classification of life cycle stages of malaria parasite.

The life cycle stages of malaria parasite are differentiated into three different stages, that is, trophozoites, schizocytes, and gametocytes [3]. Each one of them is characterized by unique features such as shape, size, and texture. To identify and classify these stages, the above features will play important role while working with image processing. Features are the key way to differentiate and diagnose the parasite stages.

The organization of this paper is as follows. Section 2 gives the detailed methodology about the work done and all the different image processing techniques used stepwise, while Section 3 provides all the comparison results obtained in presented study and lastly Section 4 gives the discussion followed by conclusion part of the complete research work carried out.

2. Methodology

The method of detection of malaria parasite is mainly subdivided into two parts first as the image recognition and second as the image classification and parasitemia estimation. The proposed method follows the stepwise work as shown in Figure 1, as block diagram.

The first block consists of the three basic steps of image processing, that is, preprocessing, segmentation, and feature extraction, while in the image classification and detection of infected red blood cells that is called parasitemia separation of infected RBC and the stage detection are carried out.

In this work, the malaria images of three different stages, schizont, trophozoite, and gametocyte stages, have been captured from the blood smears. First a database was created using images from the online database like Center for Disease Control (CDC) and Prevention and from Public Health Image Library [5] consisting of both types of samples parasitic and nonparasitic blood samples. Some images were added into the database which was obtained from Smt. Kashibai Navale General Hospital and Research Centre, Pune, consisting of both malaria positive blood samples and the normal blood samples images. The oil immersion images are taken from the digital camera with 100x zoom. Altogether, 100-image database was created among which 50 images are used for training stages and the remaining 50 images for testing stage were used.

2.1. Preprocessing

Preprocessing is done to make the images easy to be processed further as it removes the noise and other unwanted artifact from the image under observation. The steps followed in preprocessing can be explained as in Figure 2 flowchart.

Initially the RGB image is converted into the gray image as shown in Figure 3. The basic Red, Green, and Blue (RGB) planes of the input image are separated so as to create object mask for segmentation. The RGB plane separation is done as shown in Figure 5. As there are intensity variations all over the image, to take histogram and work on intensity factor, thresholding is the key requirement. Therefore, autothresholding is done using Otsu’s method [6]. An object mask is the image which is used as the standard sample to segment the region of interest in the input image and hence it should contain the parasitic part in the image. Therefore, considering all these aspects, the Hue, Saturation, and Value (HSV) bands are created, as shown in Figure 4. Combined GV band is obtained and named as object mask. This object mask is used for further process because it gives good quality image containing all the RBC and parasites clearly as shown in Figure 6. Finally, the morphological operations like closing and opening are performed on the object mask for removing the unwanted artifacts [7].

2.2. Segmentation

Segmentation is the process carried out to obtain the area of interest from the complete image [8]. As the malaria parasite affects the red blood cells [3, 4], the region of interest is the red blood cells in the microscopic images of blood sample. In this work, the segmentation is done to separate each red blood cell. These segmented cells are further processed and then the infected red blood cells are identified. The segmentation technique used in this work is watershed transform with the distance transform. The main reason of using watershed transform is to detect the overlapped red blood cells. The principal concept of watershed is to change the inputimage into another image whose catchment basins are the objects or regions which are required to be segmented or identified.

The segmentation is carried out as follows:(i)Distance transform is calculated using Euclidean distance formula:where “” and “” are the two neighboring pixels between which the distance is calculated and their position can be given as and , where , are the -coordinates and , are -coordinates of the two neighboring pixels which are considered. Then, these distance values are calculated for all the pixels in an image and a distance matrix is formed. This is used as an input for the watershed transform.(ii)After calculating the distance between each pixel, it creates matrix of these values called label matrix.(iii)Label matrix is then given as an input to watershed and the image is obtained as shown in Figures 7(a) and 7(b).(iv)This obtained image is segmented using the major and minor axis criteria as shown in Figure 7(c).

2.3. Feature Extraction
2.3.1. Statistical Features

The staining of the slides varies in the color acquired by the RBC, in different colors like blue, purplish, pink, and so forth [4]. Hence the color intensity varies, therefore that can be counted into the feature list, and this variation can be plot on histogram of each image. The histogram is studied using the statistical values of color variation of the image on the basis of different staining. Therefore, the histogram based statistical features are studied in this research. The variance feature gives the variation of the intensity values in image, while the skewness gives the symmetric distribution of those intensity values. There are high peaks observed in the infected cells as compared to the normal cells, and hence the kurtosis values are efficient for detection of infected red blood cells. The statistical features are calculated as described below.

If image function is of two space variables and and , then the discrete values of function can be taken in terms of the intensity values defined by “” as , where represents the total number of intensity levels in the image. All these features are calculated using the histogram of the image and probability density function . Here is the ratio of all the intensity values and the total number of pixels in the image. The features are calculated as follows:See [9].

2.3.2. Texture Based Features

In second stage, that is, schizocytes, there is growth of granular spotting, that is, the growth of cytoplasm known as matured trophozoites. These cytoplasm dots create a particular texture which differentiates it from other two stages more prominently. Hence, the texture features are included in the feature extraction list. The infected cells are correlated with the background pixels which can be defined by the correlation values calculated and the energy level of those infected cells defines them with higher energy values as compared to normal cell values.

The texture based features are calculated using Gray-Level Cooccurrence Matrix (GLCM) from the input. The GLCM calculates the occurrence of a pixel with gray-level value “” horizontally adjacent to a pixel with the value “.” Every element can be mentioned by in GLCM which specifies the number of times that the pixel with a value “” occurred horizontally adjacent to a pixel with value “.”

These texture based features can be calculated and used as follows:(i) Contrast is the value which returns the measure of intensity contrast between a pixel and the neighboring pixel over the complete image. The standard range of these values for general images is [10]:(ii) Correlation is the measure of how pixel is correlated to the neighboring pixel in the complete image. The standard range of these values for general images is [10]: (iii) Energy is the sum of all squared elements in the GLCM matrix. The standard range of these values for general images is [10]:where the Gray-Level Cooccurrence Matrix (GLCM) is calculated first and then the properties, like contrast, correlation, and energy, are calculated.

The different samples of all three stages of malaria parasite are taken for the training of the Support Vector Machine classifier as shown in Figure 8. The three images in (a) are of third stage as gametocytes while (b) consists of second stage, that is, the schizonts, and (c) consists of the first stage trophozoites image samples which are used for training purpose.

The different values for all seven features are given to the SVM for training the stages. The images used are labeled according to the stages name as the trophozoites are named as , and so forth, while schizonts are named as , and so forth, and lastly third-stage gametocytes are named as , and so forth. The values of features are as shown in Tables 1, 2, and 3 for each stage I, II, III, respectively.

2.4. Image Classification and Parasitemia Estimation

Classification is done using the feature matrix created by calculating all the feature values for each RBC. This feature matrix of trained data is compared with the feature matrix for the image taken as an input. Then, the class is decided accordingly on the comparison results and displayed. The Support Vector Machine (SVM) is for the classification. SVM is a supervised classifier which requires trained dataset and can be used for multistage classification using multi-SVM.

Based on the classification results obtained in the work, the quality of the result is to be decided by calculating as follows: (see [11]), where TP is true positive (infected sample correctly classified), TN is true negative (normal sample correctly classified), FP is false positive (normal sample misclassified), and FN is false negative (infected sample misclassified).

All the values of calculation done for the above values are as shown in Table 4 with the elapse time requirement. The algorithm is executed on hardware platform of Laptop, model: Lenovo Win7PC, Core i5 @250 GHz, RAM: 8 GB, 64-bit OS. From this, it is observed that the time required for calculation is proportional to the number of RBC in the image. All the images used for testing were labeled as , and so forth, where Ts stands for testing sample and the number of sample followed by this.

Some of the testing samples are shown in Figure 9 where in (a) there are first-stage sample images, that is, trophozoites, and (b) consists of second-stage sample images, that is, schizonts, and lastly (c) consists of all the third-stage parasite samples, that is, gametocytes.

3. Experimental Results

The complete process is performed as mentioned stepwise above by considering all the statistical features and texture based features and then obtained the results in terms of accuracy; sensitivity and specificity are calculated in percentage. Secondly the statistical features were kept constant and texture features were considered one by one individually and calculated the values for accuracy, sensitivity, and specificity for the same database as mentioned before.

Table 5 gives all the results obtained after calculating the values of accuracy, sensitivity, and specificity for individual texture based features and altogether with the statistical features this comparison is shown in a graphical way in Figure 10.

4. Results and Discussion

From the observation done in Table 5 for accuracy, sensitivity, and specificity for considering all features together and for individual texture based features, it can be seen that if all the features are used together, then the high diagnosis accuracy can be achieved. If all the features are used together, the SVM gets trained very efficiently and the classification of the parasitic and nonparasitic samples and the stages determination are done with higher accuracy of about 97%. This means the diagnosing of parasite and the detection of stage are more reliable if all the features are used together. Among three texture based features, namely, contrast, energy, and correlation, when only energy features are considered for classification, it gives more accuracy, sensitivity, and specificity up to 90.55%, 85.36%, and 91.66%, respectively. The high accuracy achieved by presented system can be compared with other methods in the literature as shown in Table 6.

5. Conclusion

As the number of people infected by malaria in tropical region is high, there is special need of early diagnosis of malaria parasite, for its prevention against getting into next stage and medication of a person. This paper has presented automated diagnosis of malaria parasite in blood images. The diagnosis is done by using features like statistical and textural based features which has enhanced the diagnosis accuracy of the presented system. Additionally the presented method is also helpful in detecting stages of malaria parasite. The segmentation is done using watershed algorithm using distance transform which counts the overlapped RBC in the blood images. The system achieved high accuracy of 97.7% for automatic diagnosis of malaria parasite in blood images.

Further this technique can be used for diagnosing different hematological disorders by training the SVM accordingly. The textural properties can be used for enhancing the results of accuracy and sensitivity.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors gratefully acknowledge and thank Dr. Saurabh Joshi from Smt. Kashibai Navale General Hospital and Research Centre; he helped them in the formation of database in a proper way and guided them in the diseases related medical terms. The authors would like to thank all authors of the references which have been used.