Classification of Region of Interest in Mammograms Using Dual Contourlet Transform and Improved KNN

Dong, Min; Wang, Zhe; Dong, Chenghui; Mu, Xiaomin; Ma, Yide

doi:https://doi.org/10.1155/2017/3213680

Journal of Sensors


Research Article | Open Access

Volume 2017 | Article ID 3213680 | https://doi.org/10.1155/2017/3213680


Min Dong,¹,² Zhe Wang,¹ Chenghui Dong,³ Xiaomin Mu,¹ and Yide Ma²

Academic Editor: Morteza Aramesh

Received 11 Jul 2017

Accepted 21 Aug 2017

Published 05 Nov 2017

Abstract

Goal. Breast cancer is becoming one of the most common cancers among women. Early detection can help increase survival rates, and feature extraction directly affects the diagnosis result. In this work, a novel feature extraction method based on the Dual Contourlet Transform (Dual-CT) is presented, and an improved K-nearest neighbor (KNN) classifier is employed to improve the classification performance. Method. The presented method includes three main stages: firstly, the Region of Interest (ROI) is cropped manually according to the gold standard from the Mammographic Image Analysis Society (MIAS) database; secondly, the ROIs are decomposed into different resolution levels using Dual-CT, contourlet, and wavelet, and a set of texture features is extracted; thirdly, improved KNN and traditional KNN are implemented for classification. Experiments are performed on 324 ROIs, which include 206 normal cases and 118 abnormal cases; the abnormal cases are composed of 66 benign cases and 52 malignant cases. Results. Experimental results support the validity and superiority of the Dual-CT-based features and improved KNN. In particular, classification accuracies of 94.14% and 95.76% are achieved in the Dual-CT domain. Moreover, the proposed method is comparable with state-of-the-art methods in terms of accuracy. Contribution. Dual-CT-based features are used for analyzing mammograms and can help improve breast cancer diagnosis accuracy.

1. Introduction

Breast cancer ranks second as a cause of death among women worldwide and has become a major public health problem [1]. According to American Cancer Society statistics, an estimated 246,660 new breast cancer cases were expected among women in the US during 2016, together with an estimated 40,450 breast cancer deaths, making it the most dangerous malignant tumor for women. The data show that breast cancer incidence rates have increased slightly, while death rates have declined by 36% from their peak as a result of improvements in early detection and treatment [2]. Consequently, early detection and diagnosis of breast cancer have become both a challenge and an active topic of current international research.

Mammography, as the most effective tool available, has been widely used in early breast cancer detection [3–7]. However, the growing number of mammograms, especially the large number of normal cases, increases the reading burden on radiologists and may lead to subtle abnormalities being missed. Consequently, Computer-Aided Diagnosis (CAD) is particularly valuable for providing a second opinion and reducing false positive and false negative rates. Over recent decades, many researchers have demonstrated its effectiveness in breast cancer diagnosis.

CAD methods for distinguishing normal from abnormal or benign from malignant cases have been investigated using many different techniques [8, 9]. These classification techniques can be divided into two categories: image analysis with segmentation of the lesion areas [10–16] and image analysis without segmentation [17–23]. Wei et al. [10] developed a content-based mammogram retrieval system together with a similarity measure scheme; the study was tested on the Digital Database for Screening Mammography (DDSM) dataset, and the experimental results demonstrated that round-shape masses were most discriminative when using Zernike moments and that round-shape, circumscribed-margin masses achieved the highest precision among all mass types. Mustra and Grgic [11] presented a new method for breast skin-air interface detection and pectoral muscle detection based on a selected Region of Interest (ROI); this approach was used to solve segmentation in very low contrast pectoral muscle areas. Pereira et al. [12] put forward a method for overcoming the limitation of analyzing only Craniocaudal (CC) and Mediolateral Oblique (MLO) views; an artifact removal algorithm and multiple thresholding were used for mass preprocessing and segmentation, and the idea was tested on the DDSM database. Agrawal et al. [13] proposed a method for automatic mass detection that does not remove the pectoral muscle: they first segmented the mass using saliency, then extracted different features from the segmented regions, and finally detected the mass with a Support Vector Machine (SVM). The method was tested on the MIAS database, and the results showed its effectiveness. Zhang et al. [14] focused on identifying the optimal segmentor from an ensemble of weak segmentors; the results showed that the selected segmentor achieved higher segmentation success rates in most cases. Anitha and Peter [15] proposed a new method to identify and segment the suspicious mass using a modified transition rule. Adaptive global thresholding was used to obtain the rough region; then the initial seed point and the modified transition rule were used to segment the mass. This approach yielded promising results when evaluated on 70 mass mammograms from the mini-MIAS database. Dong et al. [16] presented a novel automatic segmentation and classification method based on the DDSM and MIAS databases; the experimental results verified the effectiveness of this approach.

The methods mentioned above have contributed substantially to CAD for breast cancer. At the same time, methods without segmentation also play an important role. Campanini et al. [17] exploited all the information available in the image instead of extracting features from the ROI; an SVM was then used to classify areas as suspect or not, and finally a voting strategy over an ensemble of experts was applied to obtain the final suspect regions. The system obtained impressive results when tested on the DDSM database. Rashed et al. [18] used a fraction of the biggest wavelet coefficients in a multilevel decomposition and achieved remarkably high efficiency in distinguishing between benign and malignant tumors. Reyad et al. [19] studied the effect of different features used in a CAD system for the classification of masses; these features included Local Binary Pattern (LBP), statistical measures, and multiresolution features. The results showed that when using both statistical and LBP features the accuracy increased to 98.63%, and the contourlet-based features achieved a classification accuracy of 98.43%. Tai et al. [20] studied the local texture characteristics and the discrete photometric distribution of each ROI and used stepwise linear discriminant analysis to classify abnormal regions; the results revealed that the proposed system obtained satisfactory detection performance. Orozco et al. [21] presented a CAD system to classify lung nodules in CT images based on supervised extraction of the ROI; experimental results showed that this method helped reduce the complexity of classification by omitting the segmentation stage. Pak et al. [22] used ROI feature extraction based on the Nonsubsampled Contourlet Transform (NSCT) and Super Resolution (SR); the AdaBoost algorithm was then used to classify lesions and determine the probability of benign and malignant. Beura et al. [23] applied the Gray Level Cooccurrence Matrix (GLCM) to all the detail wavelet coefficients of the ROI and then classified the breast tissues as normal, benign, or malignant using a Back Propagation Neural Network (BPNN).

Based on the discussion above, it can be concluded that breast cancer analysis with segmentation has achieved good results, but the segmentation results directly affect the classification accuracy. Mammogram analysis without segmentation can also obtain high accuracy, and it helps reduce the complexity of classification by omitting the segmentation stage. We proposed a new structure, the Dual Contourlet Transform (Dual-CT), in our previous work [24]. To our knowledge, there is no previous research using Dual-CT-based features in digital mammogram analysis. In this paper, we extract Dual-CT-based features of the ROI and develop a new classification method based on these features and an improved K-nearest neighbors (KNN) classifier. Firstly, we identify the ROI manually according to the gold standard. Secondly, Dual-CT is used to decompose the ROI, and a series of features is extracted from the Dual-CT coefficients. Finally, improved KNN is employed to classify the mammograms into normal and abnormal and into malignant and benign.

The rest of the paper is organized as follows. Section 2 briefly describes the wavelet transform, contourlet transform, and Dual-CT, and gives a short introduction to KNN; the database and preprocessing are described in Section 3. In Section 4, feature extraction and feature analysis are presented, and the achieved results are discussed. Conclusions and future work are presented in Section 5.

2. Materials and Image Preprocessing

2.1. Wavelet Transform

The wavelet, proposed by J. Morlet, is widely used in many areas [25, 26]. Wavelets are short basis functions that are used to represent other functions, and the transform is implemented by iterating discrete time filters. The basis function is called the mother wavelet, and a family of functions can be generated by translations and dilations of this basis function. Wavelets are well localized in both time and frequency simultaneously.

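The wavelet basis function can be described as follows. First, we define the scale parameter $a$ and the translation parameter $b$, where $a, b \in \mathbb{R}$ and $a \neq 0$; the basis function is then

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right).$$

In general, the discrete wavelet transform of $f(t)$ can be defined as

$$W_f(j,k) = \int_{-\infty}^{+\infty} f(t)\,\overline{\psi_{j,k}(t)}\,dt, \qquad \psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right), \quad j, k \in \mathbb{Z}.$$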

When applying two-dimensional (2D) wavelet decomposition to an image, we get four subbands at each level: the low frequency subband and three high frequency subbands. The low frequency subband is then decomposed further. The low frequency subband contains the coarse information of the original image, while edges and other detail information are distributed in the high frequency subbands. Figure 1 shows the decomposition of the DWT. The wavelet has the following properties:
1. Multiresolution: it can represent images by successive approximation, from coarse to fine resolutions.
2. Localization: the separable wavelet provides basis elements that are localized in both the spatial and frequency domains.
3. Critical sampling: wavelets can form a basis or a frame with small redundancy.
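The four-subband structure described above can be illustrated with a small sketch. The Haar filter is an assumption made here for brevity (the text does not say which wavelet is used), and the function names `haar_dwt2` and `haar_decompose` are hypothetical.

```python
def haar_dwt2(image):
    """One level of a separable 2D Haar decomposition (illustrative sketch).

    `image` is a list of equal-length rows with even height and width.
    Returns a dict with the low frequency subband ("approx") and three
    high frequency subbands, each half the size of the input.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    out = {"approx": [], "horizontal": [], "vertical": [], "diagonal": []}
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]          # upper pixel pair
            d, e = image[r + 1][c], image[r + 1][c + 1]  # lower pixel pair
            a_row.append((a + b + d + e) / 2.0)  # average of the 2x2 block (coarse part)
            h_row.append((a + b - d - e) / 2.0)  # upper minus lower rows (horizontal edges)
            v_row.append((a - b + d - e) / 2.0)  # left minus right columns (vertical edges)
            d_row.append((a - b - d + e) / 2.0)  # diagonal difference
        out["approx"].append(a_row)
        out["horizontal"].append(h_row)
        out["vertical"].append(v_row)
        out["diagonal"].append(d_row)
    return out


def haar_decompose(image, levels):
    """Iterate the decomposition on the low frequency subband `levels` times."""
    details = []
    approx = image
    for _ in range(levels):
        sub = haar_dwt2(approx)
        details.append({k: sub[k] for k in ("horizontal", "vertical", "diagonal")})
        approx = sub["approx"]
    return approx, details
```

Calling `haar_decompose(roi, 3)` on a ROI whose sides are divisible by 8 would return the coarsest approximation plus three detail subbands per level, which matches the "three directions per scale" property of the wavelet mentioned later in the paper.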

For more details about wavelet analysis, refer to [27].

2.2. Contourlet Transform

The contourlet transform [28] is proposed as a new image representation approach that improves on the wavelet. It is a “true” 2D image representation scheme and can capture the intrinsic geometrical structure of the original image. The contourlet combines the Laplacian pyramid (LP) and directional filter banks (DFB) for multiresolution and multidirectional decomposition. The LP is first used to capture the discontinuous points; the DFB is then used to link the discontinuous points into linear structures. Figures 2 and 3 show the decomposition structure of CT and the frequency spectrum decomposition of DFB, respectively.

The LP iteratively decomposes a 2D image into low-pass and bandpass subbands, and the bandpass subbands are fed into the DFB to capture the directional information. By iterating this scheme on the low-pass subband, the contourlet coefficients are finally obtained. The contourlet has the following advantages over the wavelet:
1. Directionality: the DFB contains basis elements oriented in various directions, many more than the three directions offered by the wavelet.
2. Anisotropy: the contourlet contains basis elements with various elongated shapes and different aspect ratios, and it can capture smooth contours in images.
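The low-pass/bandpass split performed by the LP can be sketched as follows. This is a simplified illustration, not the filter bank used in the paper: it assumes a 2 × 2 box filter for the low-pass step and pixel replication for upsampling, and it omits the DFB stage entirely. The names `lp_decompose` and `lp_reconstruct` are hypothetical.

```python
def lp_decompose(image):
    """One Laplacian pyramid level with a 2x2 box filter (assumed filter).

    Returns (low, band): `low` is the downsampled low-pass subband and
    `band` is the full-size bandpass residual that would be passed on
    to a directional filter bank.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    # Low-pass and downsample: mean of each 2x2 block.
    low = [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]
    # Bandpass: original minus the upsampled (replicated) low-pass image.
    band = [[image[r][c] - low[r // 2][c // 2] for c in range(cols)] for r in range(rows)]
    return low, band


def lp_reconstruct(low, band):
    """Invert lp_decompose: upsample `low` by replication and add `band`."""
    return [
        [band[r][c] + low[r // 2][c // 2] for c in range(len(band[0]))]
        for r in range(len(band))
    ]
```

Iterating `lp_decompose` on `low` gives the multiresolution part of the scheme; the directional part would then be applied to each `band`.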

More details about contourlet can be found in [29].

2.3. Dual Contourlet Transform

The Dual Contourlet Transform is developed as an improvement over the contourlet. It is constructed by cascading a dual LP and a DFB; the dual LP is used to reduce the spectrum aliasing caused by downsampling in the LP. The DFB involves basis functions oriented at any power-of-two number of directions with flexible aspect ratios. Figure 4 shows the decomposition structure of Dual-CT.

The difference between Dual-CT and the contourlet is the special dual LP structure. The dual LP is composed of two trees whose filters satisfy a phase constraint, so the two trees can be seen as an approximate Hilbert transform pair. Figure 5 shows the decomposition structure of the dual LP; all four filters in Figure 5 are low-pass filters. The filters of the dual-tree LP are designed as follows.

At level 1, the first tree uses the low-pass filter of the “9-7” pair, and the corresponding filter of the second tree is a one-sample delay of it.

At level 2 and the following levels, the two trees use a filter pair designed by the “Q-shift” algorithm. As such, the filters of the two trees satisfy the phase-constrained relationship of a 1/2-sample delay, and the outputs of the two trees form an approximate analytic signal.
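In symbols, the two delay conditions described above are commonly written as below for dual-tree filter banks. The notation is assumed here for illustration and is not taken from the paper: $h^{(1)}$ and $g^{(1)}$ denote the level-1 low-pass filters of the two trees, and $H$ and $G$ denote the frequency responses of the low-pass filters used at level 2 and beyond.

```latex
% Level 1: one-sample delay between the two trees
g^{(1)}(n) = h^{(1)}(n-1)

% Level 2 and beyond: approximate half-sample delay
G\!\left(e^{j\omega}\right) \approx e^{-j\omega/2}\, H\!\left(e^{j\omega}\right)
```

Which tree carries the delay is a matter of convention; the point is that the relative delay is one sample at the first level and half a sample afterwards.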

Besides the properties of the contourlet, Dual-CT offers approximate shift invariance and phase information, which are very important in image processing.

2.4. K-Nearest Neighbor (KNN)

K-Nearest Neighbor (KNN) [30] was proposed by Cover and Hart in 1968, and it is one of the simplest machine learning algorithms. It is an extension of the simple nearest neighbor rule: KNN classifies an unknown sample by the “vote” of its K nearest neighbors rather than by the single nearest neighbor.

The main steps of KNN implementation are as follows:
(1) Assess similarity: calculate the similarity between the test sample and each sample of the training set. In general, the similarity can be measured by Euclidean distance, Manhattan distance, Jaccard similarity coefficient, correlation coefficient, and so on. Among these, Euclidean distance is the most widely used. For a given feature sample $x = (x_1, \ldots, x_n)$ and training set features $y_j = (y_{j1}, \ldots, y_{jn})$, $j = 1, \ldots, m$, the Euclidean distance is calculated as $$d_j = \sqrt{\sum_{i=1}^{n} \left(x_i - y_{ji}\right)^2},$$ where $n$ is the number of features in each feature vector, $m$ is the number of training samples, and $d_j$ is the Euclidean distance between the test sample and the $j$th sample of the training set.
(2) Find neighbors: sort the distances in ascending order and find the $K$ nearest neighbors. The choice of $K$ directly affects the classification result. As shown in Figure 6, the class assigned to the test sample changes with the value of $K$. Candidate values of $K$ are 3, 5, and 7, or $K$ can be chosen by experience.
(3) Vote and classify: according to the votes for each category, the test sample is assigned to one class.
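The three steps above can be sketched in a few lines. This is a minimal illustration of the traditional KNN with Euclidean distance and majority voting; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(test, train_x, train_y, k=3):
    """Classify `test` by majority vote among its k nearest training samples."""
    # Step 1: distance from the test sample to every training sample.
    dists = [(euclidean(test, x), label) for x, label in zip(train_x, train_y)]
    # Step 2: sort in ascending order of distance and keep the k nearest.
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    # Step 3: majority vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With two classes, an odd `k` such as 3, 5, or 7 avoids tied votes, which is one practical reason for the candidate values mentioned above.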

3. Mammogram Database and Preprocessing

3.1. Database

The mammogram is obtained by compressing the breast between two acrylic plates while X-rays are emitted through it. In previous studies, MIAS [31] has been widely used in mammography analysis because it is freely available [13, 16, 32, 33]. In this work, we choose the same dataset as other researchers. Another reason is that the cases in MIAS are labeled by expert radiologists based on experience and biopsy.

The mammograms of MIAS are selected from the United Kingdom National Breast Screening Program; the database contains 161 pairs of films. Every image is 1024 × 1024 pixels, and the images cover normal and abnormal cases. The center coordinates and approximate radius (in pixels) of each abnormality are given by experts. Each mediolateral oblique view is available for research purposes. The summary of the MIAS digital mammograms is listed in Table 1. Some mammograms, such as “mdb 239” and “mdb 249,” contain two lesion areas, so there are 324 samples in total.

3.2. Region of Interest (ROI) Extraction

The original mammogram contains background, muscle, and the label; this information can be seen as noise in the process of classification. Instead of segmenting the lesion areas, we manually crop the original image, and 324 ROIs of a fixed pixel size are extracted. The center of each ROI is selected according to the given center of the abnormal area. An example of the cropping result is shown in Figure 7.

(a) Full mammogram “mdb 028”

(b) Cropped mammogram “mdb 028”

(c) Cropped normal mammogram “mdb 131”

(d) Cropped malignant mammogram “mdb 083”

(e) Cropped benign mammogram “mdb 021”
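The cropping operation itself is simple and can be sketched as below. The sketch assumes the annotated center has already been converted to row and column indices of the image array (that conversion depends on the coordinate convention of the annotation file and is not shown), and it shifts the window to stay inside the image, which is a choice made here rather than something stated in the text. The function name `crop_roi` and the `size` parameter are hypothetical.

```python
def crop_roi(image, center_row, center_col, size):
    """Crop a size x size window around (center_row, center_col).

    `image` is a list of rows. If the window would extend past an image
    border, it is shifted so that it stays inside the image (assumed policy).
    """
    rows, cols = len(image), len(image[0])
    if size > rows or size > cols:
        raise ValueError("ROI size exceeds image dimensions")
    half = size // 2
    top = min(max(center_row - half, 0), rows - size)
    left = min(max(center_col - half, 0), cols - size)
    return [row[left:left + size] for row in image[top:top + size]]
```

For normal mammograms, where no lesion center is given, the same function could be called with a randomly chosen center, in line with the random extraction of normal ROIs described in Section 4.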

3.3. Performance Evaluation

The objective evaluation criterion is measured by classification accuracy and receiver operating characteristic (ROC). Sensitivity and specificity are statistical measures of a binary classification test. The confusion matrix is defined in Table 2.

Sensitivity deals only with positive cases; it indicates the proportion of the detected positive cases over the actual positive cases.

Specificity deals only with negative cases; it indicates the proportion of the detected negative cases over the actual negative cases.

Accuracy deals with all cases and is the most commonly used indicator; it reflects the overall correctness of the predicted results.
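The three measures above follow directly from the entries of a binary confusion matrix. The sketch below uses the usual names TP, FN, FP, and TN for true positives, false negatives, false positives, and true negatives; these names and the helper functions are assumptions for illustration, since the confusion matrix table is not reproduced here.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count (TP, FN, FP, TN) for a binary problem with the given positive label."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1   # positive case detected as positive
            else:
                fn += 1   # positive case missed
        else:
            if pred == positive:
                fp += 1   # negative case flagged as positive
            else:
                tn += 1   # negative case detected as negative
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Detected positive cases over actual positive cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Detected negative cases over actual negative cases."""
    return tn / (tn + fp)


def accuracy(tp, fn, fp, tn):
    """Correctly classified cases over all cases."""
    return (tp + tn) / (tp + fn + fp + tn)
```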

The ROC curve is used to evaluate the predictive accuracy of the proposed model. It shows the relation between sensitivity and specificity. The area under the ROC curve (AUC) is a widely used measure for comparing classifiers in two-class problems. If the ROC curve rises quickly towards the upper left corner of the graph, the test method performs better. An AUC close to 1.0 indicates that the diagnostic test is reliable; in contrast, an AUC close to 0.5 indicates an unreliable test result.
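One way to compute the AUC without drawing the curve is the rank interpretation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below assumes each case has a numeric score where larger means "more likely positive"; how such a score would be derived from a KNN classifier is not specified in the text, and the function name `auc` is hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve from scores of positive and negative cases.

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher; tied pairs contribute one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Perfectly separated scores give 1.0, and scores that carry no information give a value near 0.5, which matches the interpretation of the two extremes given above.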

4. The Proposed System

In this work, a new classification method for mammograms is proposed. The proposed system is presented in Figure 8, and its procedure can be summarized as follows:
1. ROI extraction: an ROI of a fixed pixel size is cropped, where the center of the ROI is determined according to the given center of the abnormal area. The normal ROI is extracted randomly from the normal mammogram.
2. Dual-CT decomposition: Dual-CT is applied to the extracted ROI, and the Dual-CT coefficients are obtained.
3. Feature extraction and analysis: features are extracted from the Dual-CT directional subbands, and the differences in these features are analyzed between normal and abnormal cases and between benign and malignant cases.
4. Classification: improved KNN is used for classification based on the extracted features.

4.1. Feature Extraction

Image texture is an important characteristic of an image; different types of image possess different textures. Previous studies [18, 22, 36] have shown that combining texture features with multiresolution transform domain features can help improve the classification accuracy. In this work, features are extracted from the multiresolution domain based on the ROI. Firstly, the extracted ROI is decomposed by the proposed Dual-CT to obtain the Dual-CT coefficients; secondly, the directional subband coefficients are used for feature extraction. After investigation and analysis, nine features, including mean, smoothness, and others, are found to be effective. These nine features are described as follows.

For the given ROI, the features below are computed from the gray values of the ROI, the gray level histogram, and the number of gray levels; correlation and homogeneity are computed from the normalized Gray Level Cooccurrence Matrix (GLCM) of the image.
(1) Mean: it reflects the average gray level of an image.
(2) Standard deviation: it reflects the degree of deviation between the whole image and the mean.
(3) Smoothness: its practical significance is similar to that of the variance.
(4) Skewness: it reflects the trend of deviation of the gray levels from the mean; gray deviation caused by a minority of extreme values shows up in this index.
(5) Uniformity.
(6) Entropy: it is often used to measure the randomness of the gray value distribution; the greater the randomness, the larger the entropy value.
(7) Contrast: it is used to measure the image definition; the deeper the texture, the larger the contrast. Its definition involves a positive parameter, which is set experimentally to 1/4.
(8) Correlation: it measures the spatial similarity of the GLCM along the row or column direction.
(9) Homogeneity: it reflects the homogeneity of the image texture; it is often used to measure local variation of the image texture.
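The features listed above can be sketched with commonly used textbook definitions. These definitions are assumptions, since the paper's own formulas are not reproduced here: smoothness is taken as 1 − 1/(1 + σ²) with the variance normalized by (levels − 1)², skewness as the third central moment, contrast as the Tamura-style ratio σ / kurtosis^(1/4), and the GLCM is built for a single pixel offset. The sketch also works on integer gray levels, whereas subband coefficients would first need to be quantized. All function names are hypothetical.

```python
import math


def histogram_features(pixels, levels=256):
    """Histogram statistics of a flat list of integer gray values in [0, levels - 1]."""
    n = len(pixels)
    counts = [0] * levels
    for z in pixels:
        counts[z] += 1
    p = [c / n for c in counts]  # normalized gray level histogram

    mean = sum(z * pz for z, pz in enumerate(p))
    var = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))
    std = math.sqrt(var)
    # Smoothness with the variance normalized to [0, 1] (assumed convention).
    smoothness = 1.0 - 1.0 / (1.0 + var / (levels - 1) ** 2)
    skewness = sum((z - mean) ** 3 * pz for z, pz in enumerate(p))  # third central moment
    uniformity = sum(pz * pz for pz in p)
    entropy = -sum(pz * math.log2(pz) for pz in p if pz > 0)
    # Tamura-style contrast: std / kurtosis ** (1/4); defined as 0 for a constant region.
    mu4 = sum((z - mean) ** 4 * pz for z, pz in enumerate(p))
    contrast = std / (mu4 / var ** 2) ** 0.25 if var > 0 else 0.0
    return {
        "mean": mean, "std": std, "smoothness": smoothness, "skewness": skewness,
        "uniformity": uniformity, "entropy": entropy, "contrast": contrast,
    }


def glcm_features(image, levels, offset=(0, 1)):
    """Correlation and homogeneity from the normalized GLCM of a 2D list of gray values."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Flatten to (i, j, P(i, j)) triples of the normalized cooccurrence matrix.
    cells = [(i, j, counts[i][j] / total) for i in range(levels) for j in range(levels)]
    mu_i = sum(i * pij for i, j, pij in cells)
    mu_j = sum(j * pij for i, j, pij in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * pij for i, j, pij in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * pij for i, j, pij in cells))
    cov = sum((i - mu_i) * (j - mu_j) * pij for i, j, pij in cells)
    # Correlation is undefined when a marginal is constant; 0.0 is returned by convention here.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0
    homogeneity = sum(pij / (1.0 + abs(i - j)) for i, j, pij in cells)
    return {"correlation": correlation, "homogeneity": homogeneity}
```

Concatenating the two dictionaries for each directional subband would give one nine-dimensional feature vector per subband under these assumed definitions.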

4.2. Feature Analysis

In this research, nine features are extracted in the Dual-CT domain. In order to verify the effectiveness of the features used in this paper, we choose ten extracted ROIs each from the abnormal and normal images and compute several features of the selected images. The selected features are standard deviation, uniformity, entropy, and correlation. Figure 9 shows the feature values of 10 normal ROIs and 10 abnormal ROIs.

It can be seen that the standard deviation of normal images is stable and low, while that of abnormal images is high and fluctuates sharply. This means that the gray scale of a normal image changes smoothly, whereas the presence of a lesion area changes the gray level distribution noticeably. For uniformity and correlation, the normal images achieve higher values than the abnormal ones, which indicates that the local similarity is higher in normal samples than in abnormal samples. The entropy of the normal images is lower than that of the abnormal images because the gray level distribution of an abnormal image is more random, while it is regular in a normal image.

Figure 9 indicates that the feature of smoothness is significant to distinguish these two types.

The same procedure is applied to the malignant and benign ROIs. We first select 10 malignant and 10 benign samples and then compute the same four features of the selected ROIs. Figure 10 shows the feature values of 10 benign ROIs and 10 malignant ROIs. From Figure 10, we can see that the features of benign and malignant ROIs differ clearly. The uniformity and correlation values of benign ROIs are larger than those of malignant ROIs, while the standard deviation and entropy values of benign ROIs are smaller than those of malignant ROIs. These differences are related to the gray level distribution of the lesion areas: the gray level of a benign lesion changes regularly, whereas the gray level of a malignant lesion changes irregularly.

As can be seen, the selected features are useful for classifying normal and abnormal ROIs and benign and malignant ROIs. In the following, we use these features for classification and analyze the experimental results.

4.3. Classification Results

In order to verify the effectiveness of the proposed method, we compare it with state-of-the-art methods. For the number of decomposition levels, previous research suggests that three levels of decomposition in feature extraction often give better classification results, so in this paper we choose three levels. The main steps are as follows. Firstly, the 324 ROIs are extracted manually. Secondly, Dual-CT, contourlet transform, and wavelet transform are used to decompose the extracted ROIs, with the decomposition level set to 3 based on the experiment, which yields the multiresolution coefficients: Dual-CT has 2⁴ directions at each scale for each tree, contourlet has 2⁴ directions at each scale, and wavelet has only 3 directions at each scale. The aforementioned nine features are extracted from the directional coefficients to build the feature database. Finally, the features are fed into the improved KNN for classification.

4.3.1. The Improved KNN

The basic KNN can be described in three steps: computing distances, finding the nearest neighbors, and classification. In this work, we use an improved KNN to improve the classification accuracy. The implementation of the improved KNN is as follows:
1. Compute distance: for each test sample, we calculate the Euclidean distance between the features of the test sample and all the other features in the feature database.
2. Find nearest samples: sort the distances in ascending order and take the first K samples. In our experiment, we set K to 3 and 5 by experience.
3. Classify: in the traditional KNN, the test sample is assigned directly to the class with the most votes. To increase the classification accuracy, we improve this step.

For a test sample and a given category, we consider the samples among the K nearest neighbors that belong to that category, together with their number; from these we define the credibility of the test sample with respect to the category.

The smaller the credibility value is, the greater the possibility that the test sample belongs to the category. If the value is equal to 0, there is no doubt that the test sample belongs to the category.

The following experiments demonstrate the effectiveness of the improved KNN in classification.

4.3.2. Classification between Normal and Abnormal

In this section, a total of 324 ROIs extracted from the MIAS database are used, comprising 206 normal areas and 118 abnormal areas. Table 3 shows the classification accuracy of different methods.

For Table 3, we analyze the classification performance from the following two aspects:
1. In terms of the KNN classifier, Dual-CT-based features generally perform better than contourlet and wavelet features. Especially for the abnormal cases, the accuracy of Dual-CT is on average 15% higher than that of contourlet and wavelet. For the correlation index, Dual-CT achieves an accuracy of up to 80.51% in abnormal cases, whereas the accuracy is 63.56% and 64.41% for contourlet and wavelet, respectively.
2. In terms of improved KNN, the classification performance improves across the board. Especially for the abnormal cases, the accuracy increases markedly.

All in all, the classification accuracy for normal cases is higher than that for abnormal cases. The Dual-CT domain features perform the best of the three multiresolution domains; contourlet-based features perform slightly better than wavelet-based features. The improved KNN helps improve the classification performance.

Table 4 shows the classification accuracy of the nine extracted features based on the KNN and improved KNN classifiers. The classification accuracy reaches 94.14% for the entropy feature using the improved KNN classifier, and the average classification accuracy is about 93%. The best classification performance of KNN is achieved by the correlation index with an accuracy of 83.02%, and the improved KNN raises this accuracy to 93.83%.

Figure 11 compares the area under the ROC curve of the multiresolution features based on the KNN and improved KNN classifiers. It indicates that the Dual-CT domain features using the improved KNN classifier achieve better performance, with an average AUC of 0.95. This again supports the superiority and robustness of our method over the others.

4.3.3. Classification between Benign and Malignant

There are 118 abnormal cases in total, which include 52 malignant cases and 66 benign cases. In this section, we classify these two types of cases with the proposed method. The classification results are listed in Table 5.

From Table 5, we can see that the best classification accuracy is achieved by the Dual-CT features using the improved KNN classifier. The KNN classifier is relatively weak at distinguishing benign from malignant cases and cannot provide reliable classification accuracy; with the improved KNN classifier, the classification accuracy increases significantly, by almost 15 percentage points. The best performance is achieved by the standard deviation and contrast in the Dual-CT domain, with accuracies of 100% and 96.15% for the benign and malignant cases, respectively.

To further confirm the effectiveness of our method, we calculate the classification accuracy based on the above simulation results and list it in Table 6. The higher the classification accuracy, the better the performance of the method. Table 6 shows that the best value is obtained by the improved KNN classifier with Dual-CT domain features (improvements of about 6%~20%). In terms of feature domain, wavelet-based features perform the worst, contourlet-based features perform slightly better, and Dual-CT-based features perform the best. In terms of classifiers, the improved KNN clearly raises the classification accuracy compared with KNN.

Figure 12 shows the AUC comparison between the KNN classifier and the improved KNN classifier based on the multiresolution features. In most instances, Dual-CT-based features achieve better performance than CT-based and wavelet-based features. The AUC is 0.95, 0.91, and 0.88 for improved KNN with Dual-CT domain, contourlet domain, and wavelet domain features, respectively. This indicates that the proposed method can detect benign and malignant lesions with high probability, and it will help reduce the number of biopsies for benign lesions.

4.3.4. Compared with State-of-the-Art Methods

In the previous sections, we have demonstrated that the proposed Dual-CT-based features with the improved KNN classifier provide better performance than with the traditional KNN classifier. Here, we compare the proposed method with state-of-the-art methods reported in the literature in terms of accuracy and AUC. Table 7 shows the comparison, listing the database and classification technique of each method. It can be seen that the proposed method obtains better diagnostic performance, and it remains comparable even to [7]. Although [7] reaches a higher accuracy, we choose the point where the classification accuracy is higher and the number of features is smaller.

4.4. Discussions

The promising results obtained suggest the following:
1. The Dual-CT-based features perform better than contourlet-based and wavelet-based features. This is consistent with expectations, since Dual-CT simultaneously possesses approximate shift invariance and higher directional selectivity than contourlet and wavelet. Dual-CT is able to capture the anisotropic structures and multidimensional features of mammograms. The wavelet lacks shift invariance and has poor directional selectivity; the contourlet performs a little better than the wavelet because of its better directional selectivity.
2. The improved KNN classifier performs better than the traditional KNN classifier in terms of classification performance. This should be attributed to the improved discrimination process: in the improved KNN, we take the number of samples in each category into consideration. For the MIAS database, there are 206 normal cases and 118 abnormal cases, and the traditional KNN directly assigns the test sample to one of the two classes according to the nearest neighbor samples. Because the number of normal cases is about double that of the abnormal cases, normal cases are more likely than abnormal cases to be selected as nearest neighbors, which leads to misclassification. The introduction of credibility addresses this problem and improves the classification performance.
3. The normal and benign cases achieve better performance than the abnormal and malignant cases. This may be because the normal cases have relatively homogeneous texture; in contrast, the abnormal cases include many conditions such as microcalcification, circumscribed mass, spiculated mass, architectural distortion, and others. The same reasoning applies to the benign and malignant cases.

5. Conclusion

In this work, a new method of digital mammogram analysis and classification is proposed. Firstly, the ROI is cropped from MIAS database manually according to the gold standard. Secondly, Dual-CT, contourlet, and wavelet transform are used to decompose each cropped ROI separately. The directional subbands from each decomposition level are used to extract feature. Then improved KNN and traditional KNN are employed to distinguish normal and abnormal and malignant and benign. We analyze the classification accuracy and AUC of each method quantitatively. The experimental results suggest that the Dual-CT-based features obtain a better performance as compared to contourlet and wavelet transform, and improved KNN gives a more outstanding performance than traditional KNN. For instance, the accuracy of abnormal based on entropy feature reaches 80.51%, while the accuracy achieved by contourlet and wavelet transform is 63.56% and 64.41%, respectively; for classification of benign and malignant, the Dual-CT-based feature using improved KNN is 95.76%, which is 20 percent higher than that of traditional KNN. Moreover, the proposed method is comparable with state-of-the-art methods reported in recent literatures in terms of accuracy and AUC.

The Dual-CT-based features are used here for the first time to analyze mammograms, and improved KNN is used to help improve the diagnosis of breast cancer. These positive results demonstrate the potential of the Dual-CT-based feature and improved KNN in the analysis and classification of biomedical data. In the future, we will try to extend the proposed method, with appropriate changes, to other medical images.

Disclosure

The authors provided their source code for the comparison results.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is jointly supported by National Natural Science Foundation of China (no. 61175012), Key Research Projects of Henan Higher Education Institutions (no. 18A510017), and Henan Postdoctoral Research Project “Image Fusion Algorithm Based on Dual Tree Contourlet.”

References

S. K. Wajid and A. Hussain, “Local energy-based shape histogram feature extraction technique for breast cancer diagnosis,” Expert Systems with Applications, vol. 42, no. 20, pp. 6990–6999, 2015.
R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer statistics, 2016,” CA: A Cancer Journal for Clinicians, vol. 66, no. 1, pp. 7–30, 2016.
R. Nithya and B. Santhi, “Computer aided diagnosis system for mammogram analysis: A survey,” Journal of Medical Imaging and Health Informatics, vol. 5, no. 4, pp. 653–674, 2015.
F. B. Garma and M. A. Hassan, “Classification of breast tissue as normal or abnormal based on texture analysis of digital mammogram,” Journal of Medical Imaging and Health Informatics, vol. 4, no. 5, pp. 647–653, 2014.
M. M. Eltoukhy, S. J. Safdar Gardezi, and I. Faye, “A method to reduce curvelet coefficients for mammogram classification,” in Proceedings of the 2014 IEEE Region 10 Symposium, IEEE TENSYMP 2014, pp. 663–666, April 2014.
A. Tahmasbi, F. Saki, and S. B. Shokouhi, “Classification of benign and malignant masses based on Zernike moments,” Computers in Biology and Medicine, vol. 41, no. 8, pp. 726–735, 2011.
M. Meselhy Eltoukhy, I. Faye, and B. Belhaouari Samir, “A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation,” Computers in Biology and Medicine, vol. 42, no. 1, pp. 123–128, 2012.
N. Al-Najdawi, M. Biltawi, and S. Tedmori, “Mammogram image visual enhancement, mass segmentation and classification,” Applied Soft Computing, vol. 35, pp. 175–185, 2015.
A. Oliver, J. Freixenet, J. Martí et al., “A review of automatic mass detection and segmentation in mammographic images,” Medical Image Analysis, vol. 14, no. 2, pp. 87–110, 2010.
C.-H. Wei, S. Y. Chen, and X. Liu, “Mammogram retrieval on similar mass lesions,” Computer Methods and Programs in Biomedicine, vol. 106, no. 3, pp. 234–248, 2012.
M. Mustra and M. Grgic, “Robust automatic breast and pectoral muscle segmentation from scanned mammograms,” Signal Processing, vol. 93, no. 10, pp. 2817–2827, 2013.
D. C. Pereira, R. P. Ramos, and M. Z. do Nascimento, “Segmentation and detection of breast cancer in mammograms combining wavelet analysis and genetic algorithm,” Computer Methods and Programs in Biomedicine, vol. 114, no. 1, pp. 88–101, 2014.
P. Agrawal, M. Vatsa, and R. Singh, “Saliency based mass detection from screening mammograms,” Signal Processing, vol. 99, pp. 29–47, 2014.
Y. Zhang, N. Tomuro, J. Furst, and D. S. Raicu, “Identifying the optimal segmentors for mass classification in mammograms,” in Proceedings of the Medical Imaging 2015: Image Processing, S. Ourselin and M. A. Styner, Eds., February 2015.
J. Anitha and J. D. Peter, “Mammogram segmentation using maximal cell strength updation in cellular automata,” Medical & Biological Engineering & Computing, vol. 53, no. 8, pp. 737–749, 2015.
M. Dong, X. Lu, Y. Ma, Y. Guo, Y. Ma, and K. Wang, “An efficient approach for automated mass segmentation and classification in mammograms,” Journal of Digital Imaging, vol. 28, no. 5, pp. 613–625, 2015.
R. Campanini, D. Dongiovanni, E. Iampieri et al., “A novel featureless approach to mass detection in digital mammograms based on support vector machines,” Physics in Medicine and Biology, vol. 49, no. 6, pp. 961–975, 2004.
E. A. Rashed, I. A. Ismail, and S. I. Zaki, “Multiresolution mammogram analysis in multilevel decomposition,” Pattern Recognition Letters, vol. 28, no. 2, pp. 286–292, 2007.
Y. A. Reyad, M. A. Berbar, and M. Hussain, “Comparison of statistical, LBP, and multi-resolution analysis features for breast mass classification,” Journal of Medical Systems, vol. 38, no. 9, 2014.
S.-C. Tai, Z.-S. Chen, and W.-T. Tsai, “An automatic mass detection system in mammograms based on complex texture features,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 2, pp. 618–627, 2014.
H. M. Orozco, O. V. Villegas, G. C. Sánchez, J. O. Domínguez, and J. N. Alfaro, “Automated system for lung nodules classification based on wavelet feature descriptor and support vector machine,” Biomedical Engineering Online, vol. 14, no. 1, 2014.
F. Pak, H. R. Kanan, and A. Alikhassi, “Breast cancer detection and classification in digital mammography based on non-subsampled contourlet transform (NSCT) and super resolution,” Computer Methods and Programs in Biomedicine, vol. 122, no. 2, pp. 89–107, 2015.
S. Beura, B. Majhi, and R. Dash, “Mammogram classification using two dimensional discrete wavelet transform and gray-level co-occurrence matrix for detection of breast cancer,” Neurocomputing, vol. 154, pp. 1–14, 2015.
M. Dong, J. Zhang, and Y. Ma, “Image denoising via bivariate shrinkage function based on a new structure of dual contourlet transform,” Signal Processing, vol. 109, pp. 25–37, 2015.
M. Schultze-Kraft, R. Becker, M. Breakspear, and P. Ritter, “Exploiting the potential of three dimensional spatial wavelet analysis to explore nesting of temporal oscillations and spatial variance in simultaneous EEG-fMRI data,” Progress in Biophysics and Molecular Biology, vol. 105, no. 1-2, pp. 67–79, 2011.
J. Weng, J. Zhong, and C. Hu, “Digital reconstruction based on angular spectrum diffraction with the ridge of wavelet transform in holographic phase-contrast microscopy,” Optics Express, vol. 16, no. 26, pp. 21971–21981, 2008.
I. Daubechies, “The wavelet transform, time-frequency localization and signal analysis,” IEEE Transactions on Information Theory, vol. 36, no. 5, pp. 961–1005, 1990.
M. Do and M. Vetterli, “Contourlets: a directional multiresolution image representation,” in Proceedings of the ICIP 2002 International Conference on Image Processing, pp. I-357–I-360, Rochester, NY, USA.
M. N. Do and M. Vetterli, “The contourlet transform: an efficient directional multiresolution image representation,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2091–2106, 2005.
T. Hastie and R. Tibshirani, “Discriminant adaptive nearest neighbor classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 6, pp. 607–616, 1996.
J. Suckling, J. Parker, and D. R. Dance, “The mammographic image analysis society digital mammogram database,” in Proceedings of the Excerpta Medica International Congress, 1994.
S. Dhahbi, W. Barhoumi, and E. Zagrouba, “Breast cancer diagnosis in digitized mammograms using curvelet moments,” Computers in Biology and Medicine, vol. 64, pp. 79–90, 2015.
T. Mu, A. K. Nandi, and R. M. Rangayyan, “Classification of breast masses using selected shape, edge-sharpness, and texture features with linear and kernel-based classifiers,” Journal of Digital Imaging, vol. 21, no. 2, pp. 153–169, 2008.
F. Saki, A. Tahmasbi, H. Soltanian-Zadeh, and S. B. Shokouhi, “Fast opposite weight learning rules with application in breast cancer diagnosis,” Computers in Biology and Medicine, vol. 43, no. 1, pp. 32–41, 2013.
B. Verma, P. McLeod, and A. Klevansky, “Classification of benign and malignant patterns in digital mammograms for the diagnosis of breast cancer,” Expert Systems with Applications, vol. 37, no. 4, pp. 3344–3351, 2010.
I. Zyout, J. Czajkowska, and M. Grzegorzek, “Multi-scale textural feature extraction and particle swarm optimization based model selection for false positive reduction in mammography,” Computerized Medical Imaging and Graphics, vol. 46, pp. 95–107, 2015.

Copyright

Copyright © 2017 Min Dong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
