Abstract

This study involved developing a computer-aided diagnosis (CAD) system for discriminating the grades of breast cancer tumors in ultrasound (US) images. Histological tumor grades of breast cancer lesions are standard prognostic indicators. Tumor grade information enables physicians to determine appropriate treatments for their patients. US imaging is a noninvasive approach to breast cancer examination. In this study, 148 3-dimensional US images of malignant breast tumors were obtained. Textural, morphological, ellipsoid fitting, and posterior acoustic features were quantified to characterize the tumor masses. A support vector machine was developed to classify breast tumor grades as either low or high. The proposed CAD system achieved an accuracy of 85.14% (126/148), a sensitivity of 79.31% (23/29), a specificity of 86.55% (103/119), and an area under the receiver operating characteristic curve of 0.7940.

1. Introduction

Breast cancer is a leading cause of death in women worldwide [1]. The histological grade of a breast cancer tumor is regarded as a crucial prognostic indicator [2]. Rapid and accurate assessment of tumor grades helps a physician determine the appropriate treatment options for patients. Previous studies have reported ultrasound (US) imaging to be an effective supplement to mammography in screening for breast cancer [3–5]. This study involved developing a computer-aided diagnosis (CAD) [6] system for assessing the tumor grades of breast cancer according to US images.

A histological tumor grade is a measure of the differentiation between cancerous and normal cells [2]. The Nottingham system [7] categorizes breast cancer into 3 grades. In general, cancer of lower grades tends to be less aggressive than cancer of higher grades. The grade of a tumor is typically determined through a morphological assessment of biopsied tissue and cells performed by pathologists using a microscope. The grading process is invasive and time consuming and can be subjective. Assessing the grades of breast cancer tumors online by using noninvasive approaches is more desirable.

Research has indicated that tumor grades are correlated with sonographic characteristics. Lamb et al. [8] observed that high-grade tumors exhibited posterior enhancement and well-defined margins. Kim et al. [9] demonstrated that parallel orientation and echo patterns were correlated with tumor grades and certain biological markers in breast cancer. Aho et al. [10] indicated that infiltrating ductal carcinoma tumors that exhibited posterior shadowing in US images were likely to be low-grade ones. Wojcinski et al. [11] evaluated the interrelationship between tumor grades and BI-RADS [12] features and determined that high-grade tumors were associated with strong posterior acoustic enhancement and weak shadowing. Chang et al. [13] quantified stellate features by using US images and observed that masses of breast cancer associated with stellate features tended to be low-grade tumors. Another study revealed that the presence of posterior enhancement in US images was correlated with an increased likelihood of the tumor being of a high grade [14].

In this study, a CAD system was developed to determine the tumor grades of the breast cancer masses captured in 3-dimensional (3D) US images. The specific objectives were to (1) quantify features of breast cancer lesions in US images, (2) identify a set of US image features that significantly correlate with tumor grades, and (3) develop a model that can be applied to distinguish between high-grade and low-grade tumors. In this study, volumetric US breast images were collected. The tumor lesions were segmented, and the features of these tumor masses were quantified. A support vector machine (SVM) classifier was developed to distinguish tumor grades, and a genetic algorithm (GA) was used for feature selection and model parameter optimization.

2. Materials and Methods

2.1. Volumetric Ultrasound Image Acquisition

The breast US images used in this study were samples of diagnostic cases obtained during routine clinical care at Changhua Christian Hospital (Changhua, Taiwan). A total of 148 cases were examined. The images were acquired using a US scanner (Voluson 730; GE Healthcare, Zipf, Austria) equipped with a 5.6–18 MHz volume transducer (RSP6-16; GE Healthcare, Zipf, Austria). The images were quantized into 256 gray levels, and the mean voxel resolution was 0.2 mm on each side. For patients who exhibited multiple tumor masses, only the image of the largest lesion was included in the study. The lesion sizes ranged from 0.134 to 24.061 cm³ (median: 2.669 cm³). The grades of the tumors were identified based on pathological diagnoses, which involved biopsy methods and the Nottingham grading system. The numbers of grade I, II, and III tumors were 25, 94, and 29, respectively. In this study, grades I and II were defined as low grade (119 tumors), whereas grade III was defined as high grade (29 tumors). The images were collected between June 2007 and August 2009. The ages of the patients ranged from 24 to 87 years (median: 49 years). The ethics committee of the hospital approved the study. No patient identifications were disclosed to avoid diagnosis bias and ensure patient privacy.

2.2. Tumor Segmentation

Segmentation was performed to extract the tumor lesions in the US images. The tumor masses were segmented semiautomatically by using ITK-SNAP [15], which performed active contouring based on a level set algorithm [16–19]. During the segmentation process, the operators identified the lesions in the US images and placed seeds (i.e., starting points) at appropriate locations inside the tumor masses. The seeds expanded until they reached the tumor boundaries. Appropriate control parameters were set to ensure that optimal segmentation results were attained [15]. Compared with manual methods, semiautomatic segmentation is more consistent and less laborious when accurate contours must be sketched. Semiautomatic segmentation is particularly suitable for use with 3D US images. Figure 1 shows a segmented volumetric tumor mass. Experienced radiologists verified the segmentation results.

2.3. Feature Quantification

Features were quantified to describe the characteristics of the tumors. The features were categorized into 4 types: textural, morphological, ellipsoid fitting, and posterior acoustic. The textural features represent the spatial correlations in gray level among the voxels of a tumor mass. The textural features were calculated using a gray level cooccurrence matrix (GLCM) [20]. During this process, the number of gray levels in the US image subjected to analysis was reduced from 256 to 16. The frequencies of gray level cooccurrence between 2 adjacent voxels in the image were then accumulated to form the GLCM for a given displacement vector, which represents the geometric relationship between the 2 adjacent voxels [21]. Six textural features, namely, the angular second moment, contrast, inverse difference moment, entropy, dissimilarity, and correlation [20, 22], were then calculated from the GLCM. In this study, 4 displacement vectors were considered. Thus, 24 textural features (6 features for each of the 4 displacement vectors) were quantified.
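The GLCM computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes uniform binning for the reduction from 256 to 16 gray levels, a non-symmetric matrix restricted to voxel pairs inside the tumor, and the standard Haralick-style feature definitions. The function names and the dictionary-based voxel representation are hypothetical, and the displacement vector in the test data is an arbitrary example rather than one of the 4 vectors used in the study.

```python
import math
from collections import Counter

def quantize(gray, levels=16, max_gray=256):
    """Map a gray level in 0..max_gray-1 onto 0..levels-1 (uniform bins, assumed)."""
    return gray * levels // max_gray

def glcm(voxels, d):
    """Normalized cooccurrence matrix for displacement d = (dx, dy, dz).

    voxels: dict mapping (x, y, z) -> quantized gray level, tumor voxels only.
    Returns a dict mapping (i, j) -> relative frequency of that gray level pair.
    """
    counts = Counter()
    for (x, y, z), i in voxels.items():
        neighbor = (x + d[0], y + d[1], z + d[2])
        if neighbor in voxels:  # skip pairs that leave the tumor mask
            counts[(i, voxels[neighbor])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

def texture_features(p):
    """Six textural features from a normalized GLCM p (standard definitions, assumed)."""
    mu_i = sum(i * v for (i, _), v in p.items())
    mu_j = sum(j * v for (_, j), v in p.items())
    var_i = sum((i - mu_i) ** 2 * v for (i, _), v in p.items())
    var_j = sum((j - mu_j) ** 2 * v for (_, j), v in p.items())
    cov = sum((i - mu_i) * (j - mu_j) * v for (i, j), v in p.items())
    denom = math.sqrt(var_i * var_j)
    return {
        "angular_second_moment": sum(v * v for v in p.values()),
        "contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "inverse_difference_moment": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "entropy": -sum(v * math.log(v) for v in p.values() if v > 0),
        "dissimilarity": sum(abs(i - j) * v for (i, j), v in p.items()),
        # Correlation is undefined for a constant region; 0.0 is a placeholder choice.
        "correlation": cov / denom if denom > 0 else 0.0,
    }
```

Calling texture_features(glcm(voxels, d)) once per displacement vector yields 6 values per vector, which is how 4 vectors give 24 textural features.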

Morphological features [23, 24] describe the superficial regularity of the tumor masses. Six morphological features were included in this study. Volume (unit: mm³) and surface area (unit: mm²) described the basic structural characteristics of a tumor mass. Classical compactness was used to measure the degree of similarity between a tumor mass and its optimally fitted sphere, whereas discrete compactness was used to evaluate the degree of similarity between a tumor mass and its optimally fitted cube [24, 25]. The mean radius and standard deviation of radius characterized the size and surface irregularity of tumor masses.
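As a rough illustration of the morphological features, the sketch below computes volume, surface area, classical compactness, and the mean and standard deviation of the radius from a binary voxel mask. Several choices here are assumptions rather than the paper's definitions: surface area is approximated by counting exposed voxel faces, classical compactness uses the common form 36πV²/A³ (equal to 1 for a perfect sphere), and radii are measured from the centroid to the surface voxels. Discrete compactness is omitted because its exact formulation is not given in the text. The function names are hypothetical.

```python
import math

# Offsets of the six face-sharing neighbors of a voxel.
FACES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def exposed_faces(voxel, voxels):
    """Count the faces of one voxel that border a non-tumor voxel."""
    x, y, z = voxel
    return sum((x + dx, y + dy, z + dz) not in voxels for dx, dy, dz in FACES)

def morphology(voxels, side_mm=0.2):
    """voxels: non-empty set of (x, y, z) integer coordinates inside the tumor mask."""
    n = len(voxels)
    volume = n * side_mm ** 3
    faces = {v: exposed_faces(v, voxels) for v in voxels}
    # Face counting overestimates the area of a smooth surface (assumed approximation).
    area = sum(faces.values()) * side_mm ** 2
    centroid = tuple(sum(v[k] for v in voxels) / n for k in range(3))
    # Radii are distances from the centroid to voxels with at least one exposed face.
    radii = [side_mm * math.dist(v, centroid) for v, f in faces.items() if f > 0]
    mean_r = sum(radii) / len(radii)
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in radii) / len(radii))
    return {
        "volume_mm3": volume,
        "surface_area_mm2": area,
        "classical_compactness": 36 * math.pi * volume ** 2 / area ** 3,
        "radius_mean_mm": mean_r,
        "radius_sd_mm": sd_r,
    }
```

The default side_mm of 0.2 follows the mean voxel resolution reported in Section 2.1.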

Ellipsoid fitting features [24] depict the degree of similarity between a tumor mass and its optimally fitted ellipsoid (Figure 2). The optimally fitted ellipsoid can be regarded as the baseline against which the degree of shape irregularity of a tumor mass can be measured. Nine ellipsoid fitting features were quantified: axis ratio, surface ratio, volume covering ratio, number of regions outside the ellipsoid, number of regions inside the ellipsoid, number of total regions, number of regions with angularity outside the ellipsoid, number of regions with angularity inside the ellipsoid, and number of total regions with angularity. The volume covering ratio was defined as the ratio of the volume of the intersection between the tumor and the ellipsoid to the tumor volume; the number of total regions is the sum of the numbers of regions outside and inside the ellipsoid; and the number of total regions with angularity is the sum of the numbers of regions with angularity outside and inside the ellipsoid.
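The ratio of the tumor-ellipsoid intersection volume to the tumor volume can be illustrated with a short sketch. It assumes the fitted ellipsoid is already known and, for simplicity, axis-aligned; the fitting procedure itself is not described in the text, and a fitted ellipsoid would generally be rotated. The function name and parameters are hypothetical.

```python
def volume_covering_ratio(voxels, center, semi_axes):
    """Fraction of tumor voxels that fall inside an axis-aligned ellipsoid.

    voxels: non-empty set of (x, y, z) tumor voxel coordinates.
    center, semi_axes: ellipsoid center and semi-axis lengths in voxel units
    (axis alignment is a simplifying assumption).
    """
    cx, cy, cz = center
    a, b, c = semi_axes
    inside = sum(
        ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 + ((z - cz) / c) ** 2 <= 1.0
        for x, y, z in voxels
    )
    # Voxels have equal volume, so the voxel count ratio equals the volume ratio.
    return inside / len(voxels)
```

A ratio near 1 indicates that nearly the whole tumor lies inside the fitted ellipsoid.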

Posterior acoustic features [26–28] characterize the discrepancy in gray levels between a tumor mass and its corresponding posterior region (the region beneath the tumor in the A-view image in Figure 3). When acoustic enhancement occurs, the gray level of the posterior region is greater than the gray level of the lesion in ultrasound images [29]. Five posterior acoustic features were defined: the standard deviation of the gray levels in the posterior region, the ratio of the mean gray level in the posterior region to that in the tumor region, the ratio of the gray level standard deviation in the posterior region to that in the tumor region, the difference between the gray level means of the posterior and tumor regions, and the difference between the gray level standard deviations of the posterior and tumor regions. In this study, the section area (C-view image in Figure 3) of the posterior region was defined as two-thirds of the maximum tumor mass section area to avoid the edge-shadowing effect [26, 28]. The section area of the posterior region was derived using distance transform [30]. The height of the posterior region was defined as the tumor mass height and could not exceed 100 voxels [28].
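The five posterior acoustic features reduce to simple statistics once the gray levels of the two regions have been extracted. The sketch below assumes that extraction has already been done, uses the population standard deviation, and takes differences as posterior minus tumor; the paper does not state these conventions, and the function and key names are hypothetical.

```python
from statistics import mean, pstdev

def posterior_acoustic_features(posterior, tumor):
    """posterior, tumor: sequences of gray levels from the two regions.

    Assumes the tumor region has a nonzero mean and nonzero spread;
    otherwise the ratios are undefined.
    """
    mean_p, mean_t = mean(posterior), mean(tumor)
    sd_p, sd_t = pstdev(posterior), pstdev(tumor)  # population SD (assumed)
    return {
        "posterior_sd": sd_p,
        "mean_ratio": mean_p / mean_t,
        "sd_ratio": sd_p / sd_t,
        "mean_difference": mean_p - mean_t,  # posterior minus tumor (assumed sign)
        "sd_difference": sd_p - sd_t,
    }
```

With this sign convention, a mean ratio above 1 or a positive mean difference corresponds to posterior enhancement, and the opposite to shadowing.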

2.4. Tumor Grade Classification and Attribute Selection

Soft-margin SVM classifiers with radial basis function kernels were developed to differentiate between high-grade and low-grade tumors. Because the dataset used in this study was unbalanced (119 low-grade tumors and 29 high-grade tumors), the soft-margin parameter ratio was set as the reciprocal of the tumor number ratio between the 2 grades [31]. The classifiers were developed using LIBSVM [32]. During the model development process, a GA was applied to identify an optimal set of features as the model inputs and to determine the soft-margin and kernel parameters for the model [33]. Feature selection is crucial for the performance of CAD systems. Including inappropriate attributes can result in an overfitted model [34] and can therefore reduce the system performance. In this study, the fitness function [33] of the GA was set as a linear combination of the product of the model sensitivity and specificity (with a weight of 0.8) and the reciprocal of the number of selected features (with a weight of 0.2). The calculation was performed using MATLAB (MathWorks, Inc.).
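The class weighting and the GA fitness function described above can be written out as a small sketch. The weights 0.8 and 0.2 and the class sizes 119 and 29 come from the text. Fixing the low-grade weight at 1.0 and returning a fitness of 0 for an empty feature set are assumptions made for illustration, and the function names are hypothetical.

```python
def class_weights(n_low=119, n_high=29):
    """Soft-margin weights whose ratio is the reciprocal of the class-size ratio.

    The low-grade weight is fixed at 1.0 (an assumed normalization), so the
    rarer high-grade class is penalized more heavily when misclassified.
    """
    return {"low": 1.0, "high": n_low / n_high}

def fitness(sensitivity, specificity, n_selected, w_perf=0.8, w_size=0.2):
    """Linear combination of sensitivity x specificity and 1 / number of features."""
    if n_selected == 0:
        return 0.0  # an empty feature set cannot train a classifier (assumed handling)
    return w_perf * sensitivity * specificity + w_size / n_selected
```

The second term rewards smaller feature sets, so between two candidates with the same sensitivity and specificity, the GA prefers the one with fewer features. In LIBSVM, per-class weights of this kind are passed through the -wi training options, which scale the soft-margin parameter C for class i.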

2.5. Performance Assessment

Receiver operating characteristic analysis was applied to measure the performance levels of the CAD systems by using tenfold cross validation (CV). Six indices were calculated: the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) [35–37]. The sensitivity and the specificity were defined as the percentages of actual high-grade and low-grade tumors, respectively, that were correctly classified. The PPV and the NPV were defined as the percentages of predicted high-grade and low-grade tumors, respectively, that were correctly classified.
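The threshold-based indices follow directly from the confusion matrix, with high grade treated as the positive class. The sketch below is illustrative, and the function name is hypothetical. The counts in the usage note are derived from the fractions reported in the abstract (23 of 29 high-grade and 103 of 119 low-grade tumors correctly classified), which imply 6 false negatives and 16 false positives.

```python
def performance_indices(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV, and NPV from confusion counts.

    tp / fn: high-grade tumors classified as high / low grade.
    tn / fp: low-grade tumors classified as low / high grade.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Calling performance_indices(tp=23, fn=6, tn=103, fp=16) reproduces the reported accuracy (126/148), sensitivity (23/29), and specificity (103/119), and gives an NPV of 103/109.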

3. Results and Discussion

3.1. Feature Analysis

A total of 44 features were collected. Table 1 lists the mean values, median values, and standard deviations of the features concerning the low-grade and high-grade tumor lesions. Regarding the textural features, the numbers in the parentheses denote the associated displacement vectors. Student’s t-test and the Mann-Whitney test were applied to evaluate the differences in feature values between the tumors of different grades. The tests indicated that the values of some features differed significantly between the low-grade and the high-grade tumors, whereas the p values of other features were marginal.

3.2. Selected Features

Fourteen features were selected using the proposed GA-based approach. The selected feature set contained 6 of the 24 textural features, 3 of the 6 morphological features, 3 of the 9 ellipsoid fitting features, and 2 of the 5 posterior acoustic features, indicating that an appropriate combination of feature types might improve the performance of the CAD system.

3.3. Model Performance Evaluation

SVM models were developed using the selected features, all of the available features, all of the morphological features, all of the ellipsoid fitting features, all of the textural features, or all of the posterior acoustic features, separately. During the model development process, the GA was implemented to optimize the model parameters. Table 2 shows the CV classification performance results of the 6 models. The model that was developed using the selected features outperformed the other models. Practically, high-grade tumors are more severe. Misdiagnosing a high-grade tumor as a low-grade tumor may increase the risk of harm and should be avoided. Therefore, the sensitivity and the NPV are 2 critical indices for evaluating the performance of CAD systems. The model that was developed using the selected features attained reasonable sensitivity (79.31%) and a high NPV (94.50%). The model developed using all of the features was inferior to the model developed using the selected features, possibly because of overfitting (including too many trivial explanatory variables in the model).

4. Conclusion

This study proposed a CAD system for discriminating the tumor grades of breast cancer in US images. The effectiveness of the proposed system was verified based on clinical data. The textural, morphological, ellipsoid fitting, and posterior acoustic features of the tumors were quantified using the US images. An SVM classifier was developed using a GA to facilitate feature selection and model parameter optimization. An optimal set comprising 14 features (out of 44 total features) was determined. The proposed CAD system distinguished between high-grade and low-grade tumors with an accuracy of 85.14% (126/148), a sensitivity of 79.31% (23/29), a specificity of 86.55% (103/119), and an area under the receiver operating characteristic curve of 0.7940. Additional features, such as the angle of the long axis of the fitted ellipsoid or the abrupt interface between tumor and normal tissue, can be included in future research to further improve the CAD system.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Ping-Lang Yen at National Taiwan University and Dr. Shuen-De Wu at National Taiwan Normal University for their helpful discussion and their invaluable suggestions regarding this research.