Abstract

Screening mammograms is a repetitive task that causes fatigue and eye strain: for every thousand cases a radiologist analyzes, only 3–4 are cancerous, so an abnormality may be overlooked. Computer-aided detection (CAD) algorithms were developed to assist radiologists in detecting mammographic lesions. In this paper, a computer-aided detection and diagnosis (CADD) system for breast cancer is developed. The framework combines principal component analysis (PCA), independent component analysis (ICA), and a fuzzy classifier to identify and label suspicious regions. The approach is novel in that it integrates a fuzzy classifier into the ICA model. The system was implemented and tested using the MIAS database. It classifies a mammogram as either normal or abnormal and, if abnormal, further classifies the tissue as benign or malignant. Results show that the system achieves 84.03% accuracy in detecting all kinds of abnormalities and 78% diagnosis accuracy.

1. Introduction

Breast cancer is considered one of the most common and fatal cancers among women in the USA [1]. According to the National Cancer Institute, 40,480 women died of this disease, and on average one woman is diagnosed with it every three minutes. Currently, there are over two and a half million women in the US who have been treated for it [1]. Radiologists visually examine mammograms to search for signs of abnormal regions, usually looking for clusters of microcalcifications, architectural distortions, or masses.

Early detection of breast cancer via mammography improves treatment chances and survival rates [2]. Unfortunately, mammography is not perfect. False positive (FP) rates are 15–30% owing to the overlap in appearance between malignant and benign abnormalities, while false negative (FN) rates are 10–30%. A false positive occurs when a radiologist reports a suspicious change in the breast but no cancer is found after further examination; it therefore leads to unnecessary biopsies and anxiety. A false negative is a failure to detect or correctly characterize a breast cancer that later tests show to be present. Nonetheless, mammography has an overall accuracy rate of 90% [3].

CAD algorithms have been developed to assist radiologists in detecting mammographic lesions. These systems are regarded as a second reader, and the final decision is left to the radiologist. CAD algorithms have improved radiologists' overall accuracy in detecting cancerous tissue [4]. Developing CADD algorithms is extremely challenging for several reasons. First, the imaging system may have serious imperfections. Second, the image analysis task is compounded by the large variability in the appearance of abnormal regions. Finally, abnormal regions are often hidden in dense breast tissue. The goal of the detection stage is to assist radiologists in locating abnormal tissue.

Many methods using a wide variety of algorithms have been proposed in the literature for detection and diagnosis in mammography. Chang et al. [5] developed a 3D snake algorithm that reduces noise, applies an edge enhancement process, and then estimates the tumor's contour using a gradient vector flow snake. Kobatake et al. [6] proposed the iris filter to detect lesions as suspicious regions with low contrast relative to their background; the filter can also extract features of malignant tissue. Bocchi et al. [7] developed an algorithm for microcalcification detection and classification in which tumors are first detected using a region growing method combined with a neural network-based classifier, and microcalcification clusters are then detected and classified using a second fractal model. Li et al. [8] developed a tumor detection method based on segmentation, adaptive thresholding, and modified Markov random fields, followed by a classification step based on a fuzzy binary decision tree. Bruce and Adhami [9] used the modulus-maxima technique of the discrete wavelet transform for feature extraction, combined with a Euclidean distance classifier; a radial distance measure of mass boundaries is used to extract multiresolution shape features, and the technique is evaluated with the leave-one-out and apparent methods. Peña-Reyes and Sipper [10] applied a combined fuzzy-genetic approach to computer-aided diagnosis. Zheng and Chan [11] combined artificial intelligence methods with the discrete wavelet transform to build a mass detection algorithm. Hassanien and Ali [12] proposed an enhanced rough set technique for feature reduction and classification. Swiniarski and Lim [13] integrated ICA with a rough set model for breast cancer detection: features are first extracted and reduced using ICA, then selected using a rough set model, and finally a rough set-based method is used to design a rule-based classifier.

This work integrates PCA, ICA, and a fuzzy classifier to identify and label suspicious regions in digitized mammograms. The rest of this paper is organized as follows. Section 2 presents the PCA and ICA algorithms and the adaptation of fuzzy logic as a classifier. The proposed integrated approach is presented in Section 3. Section 4 presents the experimental results, followed by the conclusions in Section 5.

2. Background

2.1. PCA

PCA is a decorrelation-based technique that finds the basis vectors of a subspace that retains the most important information. PCA consists of two phases: the first finds uncorrelated, orthogonal vectors, and the second projects the testing data onto the subspace spanned by these vectors [14]. The PCA algorithm can be summarized as follows:

(i) Construct the training matrix, in which each row is one training subimage, so that it has one row per training subimage and one column per pixel of a square subimage; then generate its normalized matrix.
(ii) Construct the covariance matrix of the normalized matrix.
(iii) Compute the eigenvalues and eigenvectors of the covariance matrix; discard all eigenvalues less than a predetermined threshold and retain the rest (the principal components) to produce the reduced matrix.

The testing data are then projected onto the subspace spanned by the reduced training matrix.
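The two PCA phases can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes that "normalized" means subtracting the column means and that the retained eigenvectors form the projection basis; the function names `pca_fit` and `pca_project` are hypothetical.

```python
import numpy as np


def pca_fit(train, threshold):
    """Phase 1: principal components of a training matrix.

    train: (num_subimages, num_pixels), one flattened subimage per row.
    Returns the column means and the eigenvectors (as columns) whose
    eigenvalues are at least `threshold`, largest eigenvalue first.
    """
    mean = train.mean(axis=0)
    centered = train - mean                        # assumed meaning of "normalized"
    cov = centered.T @ centered / (train.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1]               # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= threshold                     # drop eigenvalues below threshold
    return mean, eigvecs[:, keep]


def pca_project(data, mean, components):
    """Phase 2: project (testing) subimages onto the retained components."""
    return (data - mean) @ components


# Usage with random stand-in data: 40 hypothetical 8x8 subimages.
rng = np.random.default_rng(0)
train = rng.normal(size=(40, 64))
mean, components = pca_fit(train, threshold=1.0)
test_features = pca_project(rng.normal(size=(10, 64)), mean, components)
```

For realistic subimage sizes the covariance matrix is (pixels x pixels); an SVD of the centered matrix yields the same principal directions at lower cost, which is a common alternative to forming the covariance explicitly.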

2.2. ICA

Techniques based on higher-order statistics, such as ICA, are used to compensate for the shortcomings of PCA. ICA uses moments and cumulants up to fourth order to describe the distribution of a random variable.

In general, ICA is a relatively recent technique for finding a linear representation of non-Gaussian data whose components are as statistically independent as possible. ICA can describe localized shape variations and, unlike PCA, does not require the data to be Gaussian distributed. However, the resulting vectors are not ordered, so ICA requires a method for ordering them.

ICA is defined by a statistical latent-variable model in which the observed data are assumed to be linear mixtures of independent components.

A digital mammographic image is considered a linear mixture of statistically independent source regions. The coefficients of the mixing matrix uniquely describe the mixed source regions and can be used as extracted features. After the mixing matrix and its inverse (the separating matrix) are estimated, the independent components are obtained by applying the separating matrix to the observed mixtures.
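For reference, the generic ICA model is written below in the notation commonly used in the ICA literature; this notation is assumed, since the original symbols are not available. Here $\mathbf{x}$ holds the observed mixtures, $\mathbf{s}$ the independent source regions, $\mathbf{a}_i$ the columns of the mixing matrix $\mathbf{A}$, and $\mathbf{W}$ the separating matrix, with $\mathbf{A}$ taken as square and invertible.

```latex
\mathbf{x} = \mathbf{A}\,\mathbf{s} = \sum_{i=1}^{n} \mathbf{a}_i\, s_i,
\qquad
\mathbf{W} = \mathbf{A}^{-1},
\qquad
\hat{\mathbf{s}} = \mathbf{W}\,\mathbf{x}
```

In practice $\mathbf{W}$ is estimated directly from the data, and the sources are recovered only up to scaling and permutation, which is one reason the resulting vectors are unordered.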

2.3. Fuzzy Classifier

Fuzzy logic can be interpreted as the emulation of human reasoning on computers [15]. Fuzzy rules are more comprehensible than crisp rules because they can be expressed in terms of linguistic concepts. The value of a linguistic variable is not a number but a word. For example, the linguistic variable "size" might take the values "small," "medium," and "large." Each of these values is called a fuzzy set when implemented using fuzzy logic; thus, fuzzy sets can be used to model linguistic variables.

A fuzzy classifier is well suited to labeled observed data and provides interpretable solutions. It handles imprecise data, and the resulting fuzzy rules are interpretable; that is, the classifier's structure can be analyzed through its semantic structure. There are two different methods for developing fuzzy classifiers: approximate and descriptive fuzzy rule bases.

In an approximate fuzzy rule base, which is the approach implemented in this work, each fuzzy rule is defined using the membership functions of fuzzy sets. Membership functions describe the values of a linguistic variable numerically: the membership function of a fuzzy set gives the degree to which an object belongs to that set. Its domain is the universe of discourse (all values an object may take), and its range is the interval [0, 1]. A commonly used membership function is the triangular function. Figure 1 shows a triangular membership function of the fuzzy set "Small."

In Figure 1, an object has a membership degree of 0.7 to the fuzzy set “Small.” A fuzzy space is defined to be the set of fuzzy sets that define fuzzy classes for a particular object as shown in Figure 2.

A fuzzy space allows an object to belong partially to different classes simultaneously. This is very useful when the boundary between classes is not well defined. For example, an object may have a membership degree of 0.7 to the fuzzy set "Small" and 0.3 to the fuzzy set "Medium." Similarly, in mammographic images, the boundaries between benign and malignant subimages, and between normal and abnormal subimages, are not well defined. For example, an abnormal subimage with a membership degree of 0.7 to the fuzzy set "benign" and 0.3 to the fuzzy set "malignant" would be classified as benign rather than malignant. Fuzzy membership functions are easy to implement, and their fuzzy inference engines are fast.
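A small sketch of a fuzzy space built from overlapping triangular membership functions reproduces the 0.7/0.3 split described above. The breakpoints of the three fuzzy sets are illustrative assumptions, not values from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy space for one feature: each set is (left foot, peak, right foot).
FUZZY_SPACE = {
    "Small": (-1.0, 0.0, 1.0),
    "Medium": (0.0, 1.0, 2.0),
    "Large": (1.0, 2.0, 3.0),
}


def memberships(x, space=FUZZY_SPACE):
    """Degree of membership of x in every fuzzy set of the fuzzy space."""
    return {label: triangular(x, *abc) for label, abc in space.items()}


print(memberships(0.3))  # Small ≈ 0.7, Medium ≈ 0.3, Large = 0.0
```

Because neighboring triangles overlap so that their peaks meet the adjacent feet, the memberships of any point inside the space sum to 1, which makes the partial-belonging interpretation straightforward.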

In a descriptive fuzzy rule base, linguistic variables are commonly defined by fuzzy if-then rules, in which labels represent a discrete set of linguistic fuzzy sets. For example, fuzzy classification rules may be developed to describe each class of subimages. Fuzzy rules have the form

$$\text{IF } x_1 \text{ is } A_1 \text{ AND } x_2 \text{ is } A_2 \text{ AND } \cdots \text{ AND } x_n \text{ is } A_n \text{ THEN class } C,$$

where $C$ represents the decision class (i.e., normal, abnormal, benign, or malignant) and $A_j$ represents a fuzzy set for the $j$th selected feature $x_j$.
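To show how rules of this form produce a class decision, the sketch below evaluates one rule per class, using the minimum t-norm for AND, and picks the class with the largest firing strength. The Gaussian membership functions, their parameters, and the two-feature rule base are hypothetical choices for illustration; the paper's own membership functions are built from training data, as described in Section 3.3.

```python
import math


def gaussian_mf(mean, sigma):
    """Return a Gaussian membership function (an illustrative choice)."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sigma) ** 2)


# Hypothetical rule base: IF x1 is A1 AND x2 is A2 THEN class C.
RULES = {
    "benign": [gaussian_mf(0.2, 0.15), gaussian_mf(0.3, 0.2)],
    "malignant": [gaussian_mf(0.7, 0.15), gaussian_mf(0.8, 0.2)],
}


def firing_strength(features, antecedents):
    """Degree to which a feature vector satisfies a rule (minimum t-norm for AND)."""
    return min(mf(x) for mf, x in zip(antecedents, features))


def classify(features, rules=RULES):
    """Assign the class whose rule fires most strongly."""
    strengths = {label: firing_strength(features, ants) for label, ants in rules.items()}
    return max(strengths, key=strengths.get), strengths


label, strengths = classify([0.25, 0.3])
```

The product t-norm is an equally common choice for AND; with either, the final decision is a maximum over the rule activations.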

3. Proposed CADD Algorithm

In this section, a computer-aided detection and diagnosis algorithm for suspicious regions in mammograms is developed. The PCA algorithm is used as a dimensionality reduction module, followed by ICA as a feature extraction module. Finally, a fuzzy classifier classifies the testing subimages as normal or abnormal and, at a later stage, classifies the abnormal subimages as malignant or benign as a diagnosis system. Figure 3 presents the general framework of this system.

3.1. Subimages Generation

The MIAS database has a total of 119 regions of suspicion (ROS): 51 malignant and 68 benign. Two different sets of abnormal subimages, each consisting of all 119 ROS, are cropped around the center of each abnormality and scaled to two different sizes (in pixels).

Then, five different sets of normal subimages, each consisting of 119 subimages, are cropped at random locations from normal MIAS mammograms and scaled: two sets to one size and three sets to another.

Each time, a set of abnormal subimages is mixed with one set of normal subimages, and the mixture is divided into two groups, one for the training phase and one for the testing phase, as shown in Table 1.
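The cropping step can be sketched as follows, assuming 2-D grayscale NumPy arrays, an abnormality center given as (column, row) pixel indices, and a window no larger than the image. The MIAS coordinate convention and the rescaling to the final subimage sizes are not handled, and all function names are hypothetical.

```python
import numpy as np


def crop_centered(image, cx, cy, size):
    """Crop a size x size window centered on column cx, row cy,
    shifting the window inward if it would extend past the image border."""
    h, w = image.shape
    half = size // 2
    top = min(max(cy - half, 0), h - size)
    left = min(max(cx - half, 0), w - size)
    return image[top:top + size, left:left + size]


def crop_random(image, size, rng):
    """Crop a size x size window at a random location (for normal subimages)."""
    h, w = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]


def to_matrix(subimages):
    """Stack subimages as rows: one flattened subimage per row."""
    return np.stack([s.ravel() for s in subimages])
```

Clamping the window rather than padding keeps every subimage filled with real tissue, at the cost of the abnormality sitting off-center when it lies near the border.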

Each training set is used to create a training matrix in which each row contains one subimage. The dimensionality of the training matrix is reduced using the PCA algorithm to generate the reduced matrix. Then, the covariance matrix is estimated.

3.2. Unsupervised Learning

The separating matrix and the independent source regions are estimated in an unsupervised manner. The independent source regions are estimated using (9), which is expressed in terms of the transpose of the reduced matrix. The separating matrix is initialized to the identity matrix.

To maximize the statistical independence of the estimated source regions, a nonlinear function is used to estimate their marginal probability density functions from their central moments and cumulants. The minimum mutual information algorithm [16] is used to estimate this function, as shown in (10)–(14). Equations (10) and (11) estimate the kth central moments and cumulants, where $E[\cdot]$ is the expected value and $\mu$ is the mean of the current feature. Equations (12)–(14) then estimate the nonlinear function ($\circ$ denotes the Hadamard product of two matrices).
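Equations (10) and (11) are not reproduced here. As a sketch, the standard relations between central moments and the second to fourth cumulants ($\kappa_2 = m_2$, $\kappa_3 = m_3$, $\kappa_4 = m_4 - 3m_2^2$) can be computed as follows, assuming plain sample (biased) averages in place of the expectations; the function names are hypothetical.

```python
def central_moments(samples, max_order=4):
    """Sample central moments m_k = mean((x - mean(x))**k) for k = 2..max_order."""
    n = len(samples)
    mu = sum(samples) / n
    return {k: sum((x - mu) ** k for x in samples) / n
            for k in range(2, max_order + 1)}


def cumulants_up_to_4(samples):
    """Second to fourth cumulants expressed through the central moments."""
    m = central_moments(samples, 4)
    return {2: m[2], 3: m[3], 4: m[4] - 3 * m[2] ** 2}


# A symmetric two-valued (sub-Gaussian) signal has a negative fourth cumulant.
print(cumulants_up_to_4([-1.0, 1.0, -1.0, 1.0]))  # {2: 1.0, 3: 0.0, 4: -2.0}
```

The third and fourth cumulants vanish for a Gaussian variable, so nonzero values indicate the non-Gaussian structure that ICA relies on.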

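The natural gradient descent method [16] is used to compute the change in the separating matrix from the learning rate, the identity matrix, and the nonlinear function of the current source estimates. If this change is not close to zero, the separating matrix is updated by adding the change, and the procedure is repeated.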
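The update loop can be sketched with the natural-gradient rule in its commonly cited form, $\Delta W = \eta\,(I - f(Y)Y^{T}/n)\,W$ with $Y = WX$. This is a minimal sketch, not the authors' implementation: the cumulant-based nonlinearity of (12)–(14) is replaced by $f = \tanh$, a standard choice for super-Gaussian sources, and the function name, tolerance, and demo data are hypothetical.

```python
import numpy as np


def natural_gradient_ica(x, lr=0.1, tol=1e-6, max_iter=1000):
    """Estimate a separating matrix W for x (components x samples).

    W starts at the identity. Each iteration computes the natural-gradient
    change dW = lr * (I - f(Y) Y^T / n) W with Y = W x and f = tanh, and
    stops once every entry of dW is below `tol` ("close to zero").
    tanh stands in for the paper's cumulant-based nonlinearity.
    """
    m, n = x.shape
    identity = np.eye(m)
    w = identity.copy()
    for _ in range(max_iter):
        y = w @ x                                      # current source estimates
        dw = lr * (identity - np.tanh(y) @ y.T / n) @ w
        if np.max(np.abs(dw)) < tol:                   # change is close to zero
            break
        w = w + dw
    return w


# Usage: unmix two hypothetical Laplacian (super-Gaussian) sources.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 5000))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
w = natural_gradient_ica(mixing @ sources, max_iter=2000)
global_system = w @ mixing  # close to a scaled permutation when separation works
```

The learning rate `lr` plays the role of the learning rate studied in Section 4.2: larger values converge faster but risk oscillation.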

Finally, the selected features resulting from the training process are estimated using the minimum square error (MSE) method [17, 18]:

(i) From (8), the training matrix is reconstructed as a linear mixture of the independent source regions.
(ii) Substitute (9) into (16).
(iii) Hence, the reduced-dimensionality selected features of the training set are estimated.

The same procedure is used for the testing data: the testing data are projected onto the reduced matrix obtained in the training procedure, and the reduced-dimensionality features of the testing set are then estimated in the same way.
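As an illustration of minimum-square-error feature estimation, the sketch below computes, for each row of a data matrix, the coefficients that best reconstruct it from a fixed set of basis rows (for example, estimated source regions) in the least-squares sense. This is an assumption about the role of the MSE step; the paper's exact expressions are not reproduced, and `mse_features` is a hypothetical name.

```python
import numpy as np


def mse_features(data, basis):
    """Least-squares coefficients C minimizing ||data - C @ basis|| (Frobenius norm).

    data:  (num_subimages, num_dims), one subimage (or its reduced form) per row
    basis: (num_components, num_dims), e.g. estimated source regions
    """
    # Solve basis.T @ c = row for every row of data at once.
    coeffs, *_ = np.linalg.lstsq(basis.T, data.T, rcond=None)
    return coeffs.T  # (num_subimages, num_components)
```

The same function applies unchanged to the training and the testing data, which mirrors the "same procedure" used for both phases.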

3.3. Fuzzy Classifier Modeling

The training and testing feature matrices contain the reduced-dimensionality features extracted from the subimages. Each class of subimages (normal, abnormal, benign, and malignant) is represented by a single fuzzy rule, obtained by aggregating the membership functions of each antecedent fuzzy set using the selected feature values of the training subimages.

The proposed fuzzy-based classification algorithm can be summarized as follows.

(1) Four activation functions, each a column vector with one element per testing subimage, are initialized to 0. Each element represents the aggregated membership functions of the selected feature values for the corresponding testing subimage, that is, the degree of activation of those feature values. The four vectors hold the degrees of activation for the benign, malignant, abnormal, and normal testing subimages, respectively.
(2) Because subimages have different intensities, and to reduce variation and computational complexity, the selected features of the training and testing feature matrices are mapped into a limited range.
(3) Using (21), the membership functions of the fuzzy sets for the testing subimages are obtained from the product space of the selected features from the training phase. In (21), one quantity is the number of samples of the current feature and the other is the total number of samples in that feature, that is, the product space of the current feature; one subscript indexes the selected feature of each training subimage, and the other indexes the currently processed sample of that feature.
(4) The membership functions are normalized.
(5) The degree of activation of the membership functions is computed for the testing subimages, for the normal and abnormal classes in the detection phase and for the benign and malignant classes in the diagnosis phase, by aggregating the estimated membership functions.
(6) Many methods in the literature determine the class to which a subimage belongs (i.e., normal/abnormal or benign/malignant); an efficient one is the maximum algorithm. It assigns each testing subimage to the class with the maximum degree of activation according to (24), using one index when identifying a testing subimage as normal or abnormal and another when identifying it as benign or malignant.
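The six steps can be sketched in NumPy under explicit assumptions, since the equations referenced in the steps are not reproduced here. The sketch assumes a linear min-max mapping onto the integers 0 to `upper` (a hypothetical stand-in for the mapping range), takes each membership value as the relative frequency of a mapped value among a class's training samples for that feature, normalizes to a peak of 1, aggregates by summation, and applies the maximum rule. All names are hypothetical.

```python
import numpy as np


def map_to_range(train, test, upper):
    """Step (2): map each feature linearly onto the integers 0..upper,
    using the training minimum and maximum (test values are clipped)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero

    def scale(m):
        return np.clip(np.rint((m - lo) / span * upper), 0, upper).astype(int)

    return scale(train), scale(test)


def membership_table(train_mapped, upper):
    """Steps (3)-(4): membership of each mapped value of each feature,
    taken as its relative frequency, normalized to a peak of 1."""
    table = np.zeros((train_mapped.shape[1], upper + 1))
    for j in range(train_mapped.shape[1]):
        counts = np.bincount(train_mapped[:, j], minlength=upper + 1)
        freq = counts / counts.sum()
        table[j] = freq / freq.max()
    return table


def fuzzy_classify(train, labels, test, upper=10):
    """Steps (1), (5), (6): one activation vector per class, then the max rule."""
    train_m, test_m = map_to_range(train, test, upper)
    classes = np.unique(labels)
    activation = np.zeros((len(test), len(classes)))  # step (1): start at 0
    features = np.arange(train.shape[1])
    for k, c in enumerate(classes):
        table = membership_table(train_m[labels == c], upper)
        # Step (5): aggregate (sum) the memberships of all selected features.
        activation[:, k] = table[features, test_m].sum(axis=1)
    return classes[np.argmax(activation, axis=1)], activation  # step (6)


# Usage with synthetic features for two hypothetical classes.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (30, 3)), rng.normal(4.0, 1.0, (30, 3))])
labels = np.array(["normal"] * 30 + ["abnormal"] * 30)
test = np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]])
pred, activation = fuzzy_classify(train, labels, test)
```

The same function can be called twice: once with normal/abnormal labels for detection, and again on the abnormal subimages with benign/malignant labels for diagnosis.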

4. Experimental Results

Table 2 compares the results of the proposed CADD algorithm with those of the PCA and ICA algorithms on the same testing data, all using the fuzzy classifier. Accuracy is defined as the ratio of the number of correctly classified testing subimages to the total number of testing subimages. The results demonstrate that combining the ICA and PCA algorithms improves performance on all testing sets over using the PCA algorithm alone: the best PCA result is 80.67%, while the best result of the proposed CADD algorithm is 84.03%, as shown in Table 2. Averaged over all tests, the proposed algorithm improves on the accuracy of the PCA algorithm by 8.56%.

Table 2 also shows the simulation results of the ICA algorithm versus the proposed CADD algorithm. The ICA algorithm has an accuracy of 49.58% on all testing sets, whereas the best result of the proposed CADD algorithm is 84.03%. These results indicate that using the PCA algorithm for dimensionality reduction before the ICA algorithm improves ICA accuracy by an average of 50.51%. The ICA results also show that fuzzy classifier performance degrades when no dimensionality reduction module is used: a fuzzy classifier requires a feature reduction method to minimize the total number of membership functions and to improve its accuracy. With the ICA algorithm alone, each subimage has a larger number of selected features, and fuzzy classifier performance therefore degrades on all testing subimages.

Table 3 shows the experimental results of the proposed CADD algorithm as a computer-aided diagnosis system. The best result is 78%, with 15 of 25 malignant subimages and 31 of 34 benign subimages correctly classified.
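The diagnosis figure can be checked directly from the counts in Table 3 by applying the accuracy definition given earlier in this section; the function name `accuracy` is hypothetical.

```python
def accuracy(correct, total):
    """Accuracy in percent: correctly classified / total testing subimages."""
    return 100.0 * correct / total


# Best diagnosis result reported in Table 3.
malignant_ok, malignant_total = 15, 25
benign_ok, benign_total = 31, 34

overall = accuracy(malignant_ok + benign_ok, malignant_total + benign_total)
print(f"overall {overall:.2f}%")                                    # overall 77.97%
print(f"malignant {accuracy(malignant_ok, malignant_total):.1f}%")  # malignant 60.0%
print(f"benign {accuracy(benign_ok, benign_total):.1f}%")           # benign 91.2%
```

The reported 78% is 46/59 rounded. The per-class figures show that most diagnosis errors (10 of 13) are malignant subimages classified as benign.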

The system has several parameters that affect the performance and accuracy of the results, such as the number of selected principal components, the learning rate, and the mapping range.

4.1. Number of Selected PCs

Using the PCA algorithm to reduce data dimensionality as a preprocessing step for the ICA algorithm affects the overall accuracy. Table 4 shows simulation results on test sets 1–5 (PC indicates the number of selected principal components). These results indicate that selecting fewer than 11 principal components achieves acceptable results in all simulations. This means that fewer than 0.81% of the principal components are selected for subimages of size  pixels and fewer than 0.5% for subimages of size  pixels. This is consistent with the literature that uses the PCA algorithm for dimensionality reduction.

4.2. Learning Rate

The learning rate used to compute the change in the separating matrix in the ICA algorithm determines the speed of convergence and affects the overall accuracy. Figures 4–8 show the impact of the learning rate on test sets 1–5. A learning rate close to 0.0045 produces acceptable results for all sets.

4.3. Mapping Range

Figures 9–13 show the accuracy versus the mapping range for test sets 1–5; they indicate that choosing a mapping range equal to or is acceptable for all testing sets.

The performance of the proposed system is parameter dependent; investigating this dependency is beyond the scope of this paper and is left for future work. Earlier efforts such as [19, 20] could be investigated for this purpose. Parameter estimation will remain one of the main disadvantages of algorithms such as ICA, where human intervention is needed.

Among other classification methods, the fractal model approach [7] uses a set of 30 mammograms containing single and clustered microcalcifications. Fifty subimages are extracted and divided into 30 subimages for training and 20 for testing. Using two different multilayer subnetworks in a neural network-based classifier, that system achieves a classification accuracy of 90%. The discrete wavelet transform method [9] uses a set of 60 mammograms; masses are segmented manually as a preprocessing step, and the system classifies them as round, nodular, or stellate with a classification accuracy of 83%. In [13], 330 subimages are cropped from all MIAS mammograms, one subimage per mammogram, and scaled to sizes of  and  pixels. ICA-Rough achieves a classification accuracy of 82.22% for subimages of size  pixels, and PCA-Rough achieves 88.57% for subimages of size  pixels.

Furthermore, Table 2 shows that accuracy differs across test sets, so factors such as the cropping size affect the results.

5. Concluding Remarks

A CADD system has been developed and implemented. Its framework integrates PCA, ICA, and fuzzy logic. The performance of the proposed CADD system is compared with that of PCA and ICA individually. Extensive simulations were performed using 833 subimages. The results indicate that combining the ICA and PCA algorithms improves the accuracy of the PCA algorithm by about 8.56% over all test sets and that of the ICA algorithm by about 50.51%. The best results are obtained with subimages of size  pixels rather than  pixels. Using the ICA algorithm for feature extraction without a PCA preprocessing module degraded fuzzy classifier performance; ICA takes advantage of the reduction in dimensionality and noise to produce more accurate and robust results. Parameter values play a vital role in the system's performance, and their selection should be investigated to improve the system's robustness. Other membership functions could be modeled based on the mean and standard deviation of the selected feature values.

Acknowledgments

Partial support of this work was provided by the National Science Foundation Grant (MRI-0215356) and by the Western Michigan University FRACASF Award (WMU: 2005–2007). The authors would also like to acknowledge Western Michigan University for its support of and contributions to the Information Technology and Image Analysis (ITIA) Center.