Abstract

Wireless capsule endoscopy (WCE) is a technology for filming the gastrointestinal (GI) tract to find abnormalities such as tumors, polyps, and bleeding. This paper proposes a new method based on hand-crafted features to detect polyps in WCE frames. A polyp has a convex surface whose pixel values follow a specific Gaussian distribution. If a polyp exists in a WCE image, edges appear at the border of the area it occupies. Since WCE images often suffer from low illumination, a histogram equalization (HE) technique is used to enhance the image. In this paper, we initially find probable polyp edges via thresholding and use them to locate the region of interest (ROI). Then, the mean, standard deviation (STD), and division of the mean by the STD are computed from the ROI as features to discriminate between polyp and nonpolyp images using a support vector machine (SVM). Evaluation on the Kvasir-Capsule dataset shows 99% accuracy for the proposed method in polyp detection. Furthermore, the proposed method runs in real time, taking ∼0.031 seconds per image.

1. Introduction

Gastric, colorectal, and esophageal diseases cause many cancers and deaths each year [1]. Polyps are a common cause of gastrointestinal (GI) cancers. They are usually benign clumps of cells [2] and are commonly observed as oval shapes on the mucosal wall [3]. Because polyps can be precancerous lesions, it is best to identify and remove them to prevent cancer [2, 3]. Wireless capsule endoscopy (WCE) is a technology for filming the GI tract to find abnormalities such as tumors, polyps, and bleeding [4]. It is noninvasive and uses a swallowable capsule equipped with a video camera to capture the film [5]. WCE images may show various lesions such as polyps, bleeding, angiodysplasia, and erythema [6]. The output of an examination is an eight-hour video containing about 8000 frames [7]. Checking all frames is a time-consuming task for the physician [2], and physicians may miss some abnormalities during the examination [8]. Hence, automatic and accurate computer-aided detection is required to improve the accuracy and speed of diagnosis [9].

Several challenges affect polyp detection in WCE images. The main issue is that, in many cases, normal mucosa and polyp lesions are similar in color, texture, and shape (Figure 1). In addition, WCE images often suffer from noise, low resolution, and blurriness [10]. This research proposes a new technique to detect polyp lesions in WCE images.

In WCE images, a polyp is a convex shape whose pixel values follow a specific Gaussian distribution, which distinguishes it from other convex shapes in the image. Meanwhile, the periphery of a polyp is delineated by visible edges. Hence, a suitable edge detection technique can be used to extract the region of interest (ROI) for polyp detection. Because WCE images often suffer from low illumination, histogram equalization (HE) is used to enhance the image before ROI extraction. Discriminant features are then extracted from the ROIs to detect polyps and segment the image. Our goal is to use features based on the Gaussian distribution of convex areas to distinguish polyp images from normal mucosa images.

The rest of the paper is organized as follows: In Section 2, related works are reviewed. In Section 3, the proposed method is introduced. Section 4 discusses the experimental results, and Section 5 provides the conclusion and future works.

2. Related Works

In the literature, several studies address abnormality detection in WCE. Before 2015, GI abnormality detection systems commonly used hand-crafted features and classic machine learning algorithms [11]. Li and Meng [12] proposed a method using local binary patterns (LBP) and wavelet transformations to extract features from WCE frames for polyp detection; a support vector machine (SVM) then classified polyp and nonpolyp images. After 2015, GI abnormality detection systems focused more on deep convolutional neural networks (CNNs) [13]. Yu et al. [14] proposed a hierarchical network, HCNN-NELM, for polyp detection. It uses a CNN to extract features and a cascaded extreme learning machine (ELM) as the classifier. Guo and Yuan [15] proposed the Triple ANet network, which uses an abnormal-aware attention module and an adaptive dense block. They introduced an angular contrastive loss to achieve better results in situations with high intraclass variability and low interclass variance.

Recently, researchers have proposed various state-of-the-art methodologies for WCE abnormality detection. Prasath et al. [16] proposed a new image enhancement model for WCE video frames based on the human visual system. The method models the neuronal mechanism using the feature-linking model (FLM), a neural network based on the precise timing of spiking neurons. It enhances unevenly illuminated and dark regions for better visualization.

Bchir et al. [17] proposed a new automatic detection approach for multiple bleeding spots in WCE video. The method has two components: the first extracts handcrafted features, and the second uses an unsupervised learning technique to overcome performance degradation.

Ellahyani et al. [18] proposed a WCE abnormality detection system based on an extreme learning machine. In preprocessing, they use the hue component of the HSV color space; they then apply the histogram of oriented gradients (HOG) and a modified rotation-invariant local binary pattern for feature extraction, combine the features into a vector, and feed it to a kernel ELM classifier.

Mohammed et al. [19] used a recurrent neural network, the pathology-sensitive deep learning model (PS-DeVCEM), for colon disease detection. The PS-DeVCEM network combines a ResNet50 CNN, which extracts spatial features, with a residual long short-term memory (LSTM) network [20], which extracts temporal features. The network supports self-supervised learning, which can be used to generate representative labels for unlabeled data, and it can minimize within-video similarities between negative and positive feature frames.

Jha et al. [21] proposed another method for detecting polyps. It uses EfficientDet [22] as the backbone and a bidirectional feature pyramid network as the feature network, with Faster R-CNN [23] and Fast R-CNN [24] as the region proposal and detection networks, respectively. Finally, they used YOLOv3 [25] to utilize a multiclass logistic loss.

Qadir et al. [26] proposed a novel method based on fully convolutional networks and 2-dimensional Gaussian shapes for polyp prediction. The method uses a CNN-based autoencoder to predict Gaussian shapes in polyp regions of images. First, they convert binary polyp masks to Gaussian masks; then, they use the new masks to train the CNNs. Finally, the MDeNetplus network detects polyps.

Reuss et al. [13] proposed a new method based on sequential models to detect polyps in WCE. They use a pretrained self-supervised network to extract low-level features, then add a CNN and a bidirectional LSTM on top to extract high-level features. Finally, a fully connected layer performs the binary classification.

In another study, Amiri et al. [6] proposed a method that detects polyps and some other abnormalities in WCE. It uses the joint normal distribution to identify distinct areas and separate the foreground from the background to find the ROI. Then, shape, texture, and color features are extracted from the ROI, and a correlation-based feature selection technique selects the best feature sets. Finally, an SVM is trained to classify the different abnormality lesions.

3. Proposed Method

This paper proposes a new polyp detection method for WCE images; the diagram of the proposed method is depicted in Figure 2. The first step is image enhancement using HE [27], applied to each of the three RGB channels of the image. Figure 3 depicts an RGB image and its HE-enhanced version. HE enhancement benefits the subsequent feature extraction and positively impacts the proposed method. ROIs are then extracted from the image, and feature extraction and classification techniques are used for the detection.
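The per-channel enhancement described above can be sketched as follows. This is a minimal NumPy implementation of classic histogram equalization applied independently to each RGB channel; the function names are illustrative, and it is not the authors' exact code.

```python
import numpy as np

def equalize_channel(channel):
    """Classic histogram equalization of one uint8 channel:
    remap intensities so the cumulative histogram becomes roughly uniform.
    Assumes the channel is not constant."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Build a lookup table mapping old intensities to equalized ones.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[channel]

def equalize_rgb(image):
    """Apply HE to each of the three RGB channels separately,
    as the proposed method does in its preprocessing step."""
    return np.stack([equalize_channel(image[..., c]) for c in range(3)],
                    axis=-1)
```

Applying this to a low-contrast WCE frame stretches each channel's intensity range to the full [0, 255] interval, which is what makes the subtle polyp borders easier to threshold in the next step.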

3.1. ROI Extraction

We initially find the probable polyp area (ROI) via thresholding and then compute the central point coordinate of that area. In WCE images, the periphery of a polyp is delineated by visible edges, so polyp edges can be found by thresholding. For each RGB color channel C, we build a binary mask using equation (1) to find polyp borders, where a and b define the thresholding range (different for each RGB channel) and H controls the thickness of the detected polyp edges: the smaller H, the thicker the edges. The values of a, b, and H for each RGB channel, determined experimentally, are indicated in Table 1, and Figure 4 depicts the probable polyp edge thickness for different values of H.

The three channel masks are then combined into the ROI mask:

$$F_{\text{mask}}[i,j] = R_{\text{mask}}[i,j] \land G_{\text{mask}}[i,j] \land B_{\text{mask}}[i,j], \tag{2}$$

where $R_{\text{mask}}[i,j]$, $G_{\text{mask}}[i,j]$, and $B_{\text{mask}}[i,j]$ are the binary masks of the R, G, and B channels obtained using equation (1), and $F_{\text{mask}}[i,j]$ represents the ROI for polyp detection. The central point of the possible polyp is the centroid of the ROI:

$$(\bar{i}, \bar{j}) = \frac{1}{N} \sum_{\{(i,j)\,:\,F_{\text{mask}}[i,j]=1\}} (i, j), \tag{3}$$

where $(\bar{i}, \bar{j})$ is the coordinate of the possible polyp center and N is the total number of pixels equal to 1 in $F_{\text{mask}}[i,j]$. The ROI extraction process is depicted in Figure 5.
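A simplified sketch of this ROI extraction is shown below. It keeps the mask-combination and centroid steps (equations (2) and (3)), but reduces the per-channel thresholding of equation (1) to a plain range test; the `RANGES` values are placeholders, not the paper's Table 1 parameters, and the H edge-thickness control is omitted.

```python
import numpy as np

# Placeholder (a, b) thresholding ranges per channel; the paper tunes
# these per RGB channel (its Table 1), so these values are illustrative.
RANGES = {"R": (120, 200), "G": (80, 160), "B": (60, 140)}

def channel_mask(channel, a, b):
    """Simplified equation (1): 1 where the pixel value lies in [a, b]."""
    return ((channel >= a) & (channel <= b)).astype(np.uint8)

def roi_mask_and_center(image):
    """Combine the three channel masks (equation (2)) and return the
    centroid of the ROI pixels (equation (3))."""
    r = channel_mask(image[..., 0], *RANGES["R"])
    g = channel_mask(image[..., 1], *RANGES["G"])
    b = channel_mask(image[..., 2], *RANGES["B"])
    f_mask = r & g & b                # logical AND of the binary masks
    ys, xs = np.nonzero(f_mask)
    if len(ys) == 0:
        return f_mask, None          # no candidate polyp region found
    center = (int(ys.mean()), int(xs.mean()))
    return f_mask, center
```

On a frame with no pixels satisfying all three ranges the function returns `None` for the center, which corresponds to classifying the frame as containing no candidate polyp.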

3.2. Feature Extraction

As mentioned, a polyp has a convex shape with pixel values in a Gaussian distribution [25], different from other convex shapes in WCE images. We use the Shapiro–Wilk test [28] to show that the pixel values of convex regions in each RGB channel follow a Gaussian distribution. To apply the test, we made binary masks for the polyp and normal mucosa convex regions in each image, collected the pixel values inside each mask for each RGB channel separately, shaped these values into vectors, and applied the Shapiro–Wilk test to each vector. Figure 6 shows the distribution of pixel values in each RGB color channel for four convex shapes from WCE images: two polyps and two normal mucosas. As shown in Figure 6, the distributions of the ROI pixel values differ between polyp and normal mucosa images; the distribution in polyp images is closer to Gaussian than in normal mucosa images. Table 2 lists the mean, standard deviation (STD), and division of the mean by the STD for each RGB channel in the two polyp and two normal mucosa images. Therefore, we use the Gaussian distribution to extract features from each channel of the RGB image. The division of the mean by the STD helps the SVM model classify accurately because it falls in a specific range for polyp regions that differs from normal mucosa regions. From each channel, three features are extracted from the ROI: the mean, the STD, and the division of the mean by the STD. We use several window sizes around the central point of the ROI; we experimentally found that the best window sizes are 3, 71, 91, and 111. Hence, with three RGB channels, four window sizes, and three features per window, 3 × 4 × 3 = 36 features are extracted from each ROI.
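The 36-feature extraction can be sketched as follows, assuming windows centred on the ROI point are simply clipped at the image borders (the paper does not specify how boundary windows are handled, so that part is an assumption).

```python
import numpy as np

WINDOW_SIZES = (3, 71, 91, 111)   # window widths reported in the paper

def window_features(image, center, sizes=WINDOW_SIZES):
    """Extract the 3 channels x 4 windows x 3 statistics = 36 features:
    for each RGB channel and each window centred on the ROI point,
    compute the mean, the STD, and the mean divided by the STD."""
    cy, cx = center
    feats = []
    for c in range(3):
        channel = image[..., c].astype(np.float64)
        for w in sizes:
            h = w // 2
            # Clip the window at the image borders (an assumption).
            patch = channel[max(cy - h, 0):cy + h + 1,
                            max(cx - h, 0):cx + h + 1]
            mean, std = patch.mean(), patch.std()
            # Guard against zero STD in flat patches.
            feats.extend([mean, std, mean / std if std > 0 else 0.0])
    return np.array(feats)
```

The resulting 36-dimensional vector is what would be fed to the SVM classifier in the next step.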

3.3. Classification and Segmentation

In this step, the extracted features are classified using an SVM with a radial basis function (RBF) kernel and a gamma coefficient of 0.001 [29]. After classification, we segment the detected ROI. Figure 7 depicts the polyp segmentation process using the proposed method. Once the SVM model detects an ROI as a polyp, its edge binary mask is obtained using equation (2) (Figure 7(b)). Then, we use the dilation operator to fill the discontinuities in the edge [30] (Figure 7(c)); this process yields a continuous polyp border. Finally, we segment the pixels located between the central point coordinate of the ROI and the nearest border (Figure 7(d)) and remove regions that do not intersect the central point coordinates (Figure 7(e)). Our experiments show that applying equation (2) to the original RGB images gives better segmentation results than applying it to the HE-enhanced images. The values of the a, b, and H parameters for each channel in the segmentation step, found by trial and error, are indicated in Table 3.
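The dilation step used to close gaps in the edge mask can be sketched as below. This is a plain NumPy 3 × 3 binary dilation standing in for the morphological operator the paper applies; in practice a library routine such as OpenCV's `dilate` would be used instead.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element: each pass
    turns on every pixel that touches an already-on pixel, closing small
    discontinuities in the detected polyp edge before segmentation."""
    out = mask.astype(bool)
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(out, 1)          # zero-pad so shifts stay in bounds
        grown = np.zeros_like(out)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                # OR together the nine shifted copies of the mask.
                grown |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        out = grown
    return out.astype(np.uint8)
```

A single isolated edge pixel grows into a 3 × 3 block after one pass, so a one-pixel gap in a border is bridged; repeated passes close wider gaps at the cost of a thicker border.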

4. Experimental Results

4.1. Dataset

We used the Kvasir-Capsule dataset to evaluate the proposed method. It is a publicly released WCE dataset with various labeled and unlabeled data, introduced in 2020 and updated in 2021 [3]. The Kvasir-Capsule dataset has three parts: labeled images, labeled videos, and unlabeled videos. In this paper, we used the polyp and normal mucosa labeled images to measure the performance of the proposed method; the dataset contains 34,338 normal mucosa images and 55 polyp images.

4.2. Data Augmentation

All images in the dataset have a resolution of 336 × 336 pixels. Since the dataset is unbalanced, with only a few polyp images, we used data augmentation to produce more images containing a polyp. Our data augmentation includes random shifts, flips, and rotations between 0 and 270 degrees. Because of the nature of medical datasets and to avoid producing unrealistic data, we do not use zooming or other image manipulations in the augmentation step. The augmentation produced 4950 new images from the 55 polyp images. We then trained the model on the new dataset: 34,338 normal mucosa images and 4950 polyp images. The Imutils library in Python was used for data augmentation.
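A sketch of the augmentation policy is given below. The paper uses the Imutils library with arbitrary rotations up to 270 degrees; to stay dependency-free, this NumPy version restricts rotations to 90-degree multiples and uses circular shifts, so it illustrates the policy (flip, rotate, shift; 90 augmentations per polyp image) rather than reproducing the exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded for reproducibility

def augment(image):
    """One random augmentation: optional horizontal flip, a rotation by a
    multiple of 90 degrees, and a small random circular shift."""
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)                    # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))         # 0/90/180/270 degrees
    dy, dx = rng.integers(-10, 11, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))   # random shift
    return out

def make_augmented_set(images, per_image=90):
    """55 polyp images x 90 augmentations = 4950 images, as in the paper."""
    return [augment(img) for img in images for _ in range(per_image)]
```

Because the frames are square (336 × 336), the 90-degree rotations preserve the image shape, so the augmented set can be fed to the same feature extractor without resizing.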

4.3. Results

We split the augmented data into 95% for training and 5% for testing (Table 4). We then trained an SVM model on the training set and computed the performance metrics on the test set. Figure 8 depicts the learning and validation curves; they converge, indicating that the extracted features are informative enough to train the model.

In this paper, accuracy (AC), false-positive rate (FPR), false-negative rate (FNR), precision, recall, and F-measure are used to evaluate the results [31]:

$$\text{AC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{FPR} = \frac{FP}{FP + TN}, \quad \text{FNR} = \frac{FN}{FN + TP},$$

$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$

In these equations, TP, FP, TN, and FN, respectively, represent true positives, false positives, true negatives, and false negatives.
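These metrics follow their standard definitions and can be computed directly from the confusion-matrix counts, as in this short sketch:

```python
def metrics(tp, fp, tn, fn):
    """Compute the six reported metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    fpr       = fp / (fp + tn)          # false-positive rate
    fnr       = fn / (fn + tp)          # false-negative rate
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"AC": accuracy, "FPR": fpr, "FNR": fnr,
            "precision": precision, "recall": recall, "F": f_measure}
```

For example, `metrics(tp=90, fp=10, tn=90, fn=10)` gives an accuracy, precision, recall, and F-measure of 0.9 each, with FPR and FNR of 0.1.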

Table 5 presents the performance results of the proposed method and the method recently developed by Amiri et al. [6]. As shown in Table 5, our proposed method achieves better results for polyp detection on the Kvasir-Capsule dataset, outperforming [6] in all metrics except accuracy. The confusion matrix for the detection is provided in Table 6.

Figure 9 shows the output of the proposed method for several sample images containing a polyp, along with masks indicating the polyp areas from the dataset. As can be seen, the proposed method identifies polyp edges and areas accurately.

In another experiment, we investigated the running time of the proposed polyp detection method. We applied all 34393 (= 55 + 34338) images to the method, implemented in Python 3.7.9; the average detection time per image was 0.031 seconds on a computer with an Intel(R) Core(TM) i7-5820K CPU @ 3.30 GHz, 56 GB of RAM, and the Windows 10 operating system.

5. Conclusion

This paper proposed a polyp detection method for WCE images. Our investigations show that a polyp in a WCE image has a convex shape with subtle edges. Hence, the ROI is extracted from a WCE image via thresholding, and the ROI is then evaluated by the distribution of its pixel values, as the distribution in a polyp area is Gaussian. We used an SVM classifier to discriminate between polyp and nonpolyp images based on the mean, STD, and division of the mean by the STD of the specified Gaussian surface. The method consists of several steps: preprocessing, ROI extraction, feature extraction, classification, and segmentation. In the preprocessing step, the image is enhanced using the HE technique. Then, the ROI is extracted from the image, considering the distribution of pixel values in a window. Features are extracted from the ROI in each channel of the enhanced image using different window sizes. Ultimately, an SVM classifier is trained to discriminate between normal mucosa and polyp images, and once the SVM model detects an ROI as a polyp, the polyp region is segmented in the image. We evaluated the performance of the proposed method on the Kvasir-Capsule dataset and compared it with a state-of-the-art method. The final results are satisfactory in classifying WCE images into normal mucosa and polyp classes on an unbalanced dataset.

Data Availability

The Kvasir-Capsule data used to support the findings of this study have been deposited in the OSF repository at https://osf.io/dv2ag/ (DOI: 10.17605/OSF.IO/DV2AG).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Shahrood University of Technology. The authors acknowledge that this research was carried out by university students; the university's responsibility is limited to the students' tuition fees, and no research expenses were incurred by the university.