Abstract

Breast cancer is one of the leading causes of cancer death worldwide and has a great impact on women’s health. Most existing classification methods rely only on high-level features. However, features at different levels are not necessarily positively correlated with the final classification results. Inspired by the recent widespread use of deep learning, this study proposes a novel method for classifying benign and malignant breast cancer based on deep features. First, we design Sliding + Random and Sliding + Class Balance Random window slicing strategies for data preprocessing. The two strategies enhance the generalization of the model and improve classification performance on minority classes. Second, features are extracted with a network based on the AlexNet model, and we discuss the influence of intermediate- and high-level features on the classification results. Third, features from different levels are input into different machine-learning models for classification, and the best combination is chosen. The experimental results show that the Sliding + Class Balance Random window slicing strategy is an effective preprocessing step on the BreaKHis dataset: the classification accuracy ranges from 83.57% to 88.69% at different magnifications. On this basis, combining intermediate- and high-level features with an SVM gives the best classification results, with accuracy ranging from 85.30% to 88.76% at different magnifications. Compared with the latest results of F. A. Spanhol’s team, who provide the BreaKHis data, the presented method shows better image-level classification performance. We believe the proposed method has promising practical value and research significance.

1. Introduction

In recent years, the global prevalence of breast cancer (BC) has gradually increased, and the affected population tends to be younger and spans genders and ethnicities, posing a serious threat to normal human life. In 2018, the World Health Organization’s International Agency for Research on Cancer estimated that there were 2.1 million new cases of BC among women, about 25 percent of all cancers in women. The number of female cases was far greater than that of any other cancer in both developed and developing countries [1]. BC is the leading cause of cancer death among women between 20 and 60 years old. Early diagnosis and treatment can effectively reduce the risk of disease and prevent the progression of cancer [2]. The traditional diagnosis of BC includes mammography, breast B-mode ultrasound, dynamic contrast-enhanced magnetic resonance imaging, and pathological biopsy [3]. Normally, pathologists need to combine the feedback from medical equipment with their own diagnostic experience to analyze sample information for cancer diagnosis and treatment planning. This diagnostic process is inefficient, costly, and subjective. Furthermore, due to the uneven distribution of pathologists and medical resources around the world, it is difficult to ensure timely and effective treatment for patients in remote areas and underdeveloped countries [4]. Therefore, an efficient, low-cost, and objective diagnosis method has important social significance and research value.

A digital pathology slide scanner converts a physical pathology slide from a photoelectric signal into a digital signal and finally generates a full-information, high-resolution digital image, namely a whole slide image (WSI) [5]. Compared with traditional glass slides, WSI has the advantages of convenient preservation, less damage, and support for remote diagnosis. WSI provides a basis for automatic classification and quantitative analysis of pathology slides [6, 7]. In pathological WSI, different diseases have different spatial pixel arrangements and cellular features, and cells of different shapes, sizes, and colors may still belong to the same type. Therefore, automatic analysis of WSI by computer algorithms is a very challenging task. With the upgrade of software and hardware, machine-learning (ML) and deep learning (DL) algorithms have been widely used for lesion region segmentation, location detection, and subtype classification as digital pathology-aided diagnosis methods. They have also become one of the hot research fields in pattern recognition and artificial intelligence [8, 9]. In the automatic analysis of pathological images, computer-aided diagnosis (CAD) technology has improved the accuracy and efficiency of disease diagnosis and captures disease progression more objectively. It also reduces the limitations imposed by economic conditions, geographical environment, and medical infrastructure.

There are two types of BC: benign and malignant. Benign tumors are prone to transform into malignant tumors at an early stage. Benign BC mainly includes adenosis, phyllodes tumor, fibroadenoma, and tubular adenoma. Malignant BC mainly includes papillary carcinoma, ductal carcinoma, lobular carcinoma, and mucinous carcinoma [10, 11]. Among them, ductal carcinoma and lobular carcinoma account for more than 95%.

In the traditional classification of BC, doctors need to combine feedback from medical equipment with their own experience to make a diagnosis, which is somewhat subjective and inefficient. In addition, with the increasing prevalence of BC and the shortage of medical resources, problems such as missed diagnosis and misdiagnosis are prone to occur. Some ML algorithms can alleviate these problems, but their performance is not satisfactory and cannot provide scientific guidance and suggestions for clinical treatment. Therefore, researchers have conducted in-depth exploration of CAD methods based on DL algorithms. Although great progress has been made, there are still shortcomings in data preprocessing, feature extraction, unbalanced data analysis, and the selection of benchmark datasets. To address these problems, combining the correlations and differences between ML and DL algorithms, we propose a novel benign and malignant BC classification scheme based on deep features of different levels. First, image preprocessing is performed on the BC pathological tissue images, including normalization and window slicing. Second, the preprocessed images are used to train a network based on the AlexNet model. Then, we discuss how to train the network model and choose appropriate nodes for feature extraction. Finally, the trained network model extracts features at the chosen nodes, and the extracted features are input into ML models for classification. The experiment is performed using a fivefold cross-validation approach, with the same folds released with the BreaKHis dataset. The experimental results show that the average image-level classification accuracy at different magnifications (40×, 100×, 200×, and 400×) is 87.85%, 86.68%, 87.75%, and 85.30%, respectively, and the average patient-level classification accuracy is 87.93%, 87.41%, 88.76%, and 85.55%, respectively. In summary, our approach achieves higher image-level accuracy and better model stability than the latest classification results of the Spanhol team on the BreaKHis image dataset in [12]. The contributions of this article are summarized as follows:

(1) To improve the generalization of the model, four data preprocessing strategies are used in this study: Sliding, Random, Sliding + Random, and Sliding + Class Balance Random window slicing. The experimental results show that the Sliding + Class Balance Random window slicing strategy yields the best classification results.

(2) Intermediate- and high-level features are extracted by the AlexNet-based model. The experiments show that the combination of intermediate- and high-level features has a better classification effect.

(3) The extracted features are input into a variety of ML models for training and evaluation on the BC pathological tissue images (BreaKHis), and SVM shows great potential for this classification task.

2. Related Work

Histopathological analysis is a highly specialized task. The effectiveness of diagnosis depends on pathologists’ experience, attention, and fatigue. Traditional techniques such as MRI, ultrasound, and biopsy are used to detect and grade BC lesions. Although biopsy is time-consuming, it remains one of the gold standards of diagnosis [13]. Common biopsy techniques include skin biopsy, fine-needle biopsy, core biopsy, and surgical biopsy. In a biopsy assay, tissue slice samples are first obtained, and then hematoxylin and eosin (H&E) staining is performed. Finally, the pathologists analyze the texture, morphology, and histological characteristics of the biopsy tissue and give the corresponding diagnostic results [14]. The pathologists focus, magnify, and scan the entire tissue under a high-power microscope. The procedure is time-consuming, repetitive, and subjective, so diagnostic results may differ considerably between pathologists.

With the advent of the computer era, how to use computer algorithms to better assist pathologists with the simple and repetitive parts of pathological diagnosis has been a long-standing challenge [15]. CAD allows pathologists to devote more energy and time to difficult cases. Therefore, a great deal of intensive research has emerged in the CAD field, especially for the analysis of BC. Khamparia et al. [16] use a modified VGG model as a pretraining model on the DDSM X-ray dataset and achieve a classification accuracy of 94.3%. Sharma et al. [17] use various evolutionary algorithms as feature extractors on the Wisconsin breast cancer dataset from the UCI repository; an ML model is then used as the classifier, and the combination of BPSO and SVM achieves a maximum accuracy of 96.45%. Ha et al. [18] classify benign and malignant MRI images of 216 BC patients provided by Columbia University Medical Center; they present a convolutional neural network (CNN) based on MRI features and achieve 70% accuracy. Cruz-Roa et al. [19] study WSI tissue regions of 162 patients with invasive ductal carcinoma provided by the University of Pennsylvania and the New Jersey Cancer Institute. The team first transforms the WSI detection problem into a classification problem, reducing the difficulty of detecting disease areas, and then designs a multilayer CNN to classify the sliced tissue images, achieving an F1 score of 71.80% and a balanced accuracy of 84.23%. In the same study, manually extracted fuzzy color histogram features input into a random forest classifier achieve an F1 score of 67.53% and a balanced accuracy of 78.74%, while RGB histogram features achieve an F1 score of 66.64% and a balanced accuracy of 77.24%. Anuranjeeta et al. [20] use morphological features to detect BC lesions and classify cells in pathological tissue images, achieving 85.7% accuracy and an AUC of 0.884 with a rotation forest classification model.

Spanhol et al. propose a series of studies on benign and malignant BC classification. The experiments are performed using a fivefold cross-validation approach, and the specific research contents are as follows:

(1) At the beginning of the series, Spanhol et al. [12] provide 7909 BC digital pathological tissue images from 82 patient cases (BreaKHis). They also design six different feature extractors to extract low-level features such as texture and edges. ML algorithms such as decision trees (DT), random forests (RF), and support vector machines (SVMs) are then used to classify the extracted features. The best classification result is obtained by the combination of the PFTAS feature extractor [21] and SVMs: the average classification accuracy at different magnifications (40×, 100×, 200×, and 400×) ranges from 80% to 85%, with a standard deviation of about 5%.

(2) To further improve the classification performance, Spanhol et al. [22] use a novel strategy to obtain small patches from the BreaKHis images. These patches are input into a CNN model for training. Image-level accuracy and patient-level accuracy are used as classification evaluation indicators, and the final classification is decided by the voting mechanism of ensemble learning. On both patient-level and image-level accuracy, the average classification accuracy ranges from 80% to 90%, with a large standard deviation.

(3) In [23], Spanhol et al. use the AlexNet model for pretraining on the BreaKHis dataset and for extracting deep (DeCAF) features. Features from layers at different depths are input into a logistic regression (LR) classifier for training and evaluation, and this method is combined with the method in [12]. The experiments show that the average accuracy at 200× magnification reaches 86.3% at the patient level and 84.2% at the image level. However, compared with [12], the evaluation results at other magnifications decrease significantly.

(4) Recently, Spanhol et al. [24] apply the multiple-instance learning principle to classify BC pathological images. At the patient level, the average accuracy at different magnifications (40×, 100×, 200×, and 400×) is 92.1%, 89.1%, 87.2%, and 82.7%, respectively; at the image level, it is 87.8%, 85.6%, 80.8%, and 82.9%, respectively. Compared with [23], the image-level accuracy decreases at 200× and 400× magnifications, while the standard deviation increases.

Though researchers have made great progress on the image classification problem of BC, there are still many deficiencies. The studies in [16–20] use small BC datasets. As a result, it is difficult to fit the models and to objectively evaluate their generalization ability. In [12], it is hard to obtain effective classification features using traditional manual feature extraction methods. In [22], the researchers adopt a random data acquisition method, which leads to model instability. Because 1000 patches (64 × 64 × 3) are obtained from each image (350 × 230 × 3), model training is difficult and the data are redundant, and it is difficult to provide a large enough global view of the data.

3. Materials and Methods

3.1. Dataset

In our work, we use the publicly available BreaKHis image dataset as the benchmark for classifying BC. The dataset contains 7909 pathological tissue images of benign and malignant BC from 82 patients, who were invited to participate in research activities at the P&D Laboratory in Brazil from January to December 2014. All information is anonymized to ensure patient privacy. Pathologists use H&E staining to prepare and label the pathological tissue images. Samples are prepared using a standard paraffin protocol, including fixation, dehydration, clearing, and embedding. During the digitization of the BC tissue slides, a Samsung SCC-131AN digital color camera is used to obtain RGB digital tissue images at magnifications of 40×, 100×, 200×, and 400×. Pathologists remove uninformative areas such as black borders and text annotations from the original images by cropping. Finally, digital pathological tissue images of BC with 700 × 460 × 3 pixels are obtained. Sample tissue images at the four magnifications (40×, 100×, 200×, and 400×) are shown in Figure 1.

The BreaKHis dataset is divided into benign and malignant BC; Table 1 shows their distribution. In addition, benign and malignant cancers have different subtypes under a high-power microscope. Benign cancer includes four types: adenosis (A), fibroadenoma (F), tubular adenoma (TA), and phyllodes tumor (PT); malignant cancer includes four types: ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC), and papillary carcinoma (PC). The class distributions of the benign and malignant subtypes are shown in Tables 2 and 3.

3.2. Methods

In our work, normalization and window slicing operations are first used to preprocess the dataset. We design the training model based on the AlexNet model, extract features from nodes at different depths of the model, and input the extracted features into ML classifiers for classification. Finally, we choose the best combination of feature extraction node and ML classifier. Four window slicing strategies (Sliding, Random, Sliding + Random, and Sliding + Class Balance Random) are used to obtain small patches of data, which are input into the model for training and evaluation. The optimal window slicing strategy is determined by the experimental results and is used as the basis for subsequent research. The method structure diagram is shown in Figure 2.

When the trained model is used to predict a whole image, the image is split into 15 small images of 128 × 128 × 3 pixels using a 128-pixel step length. The small images are input into the model for prediction, yielding 15 decision results for each whole pathological image. The class with the most votes is chosen as the final prediction result.
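As an illustration, the following minimal sketch shows this patch-and-vote prediction, assuming the whole image is a 460 × 700 × 3 NumPy array and `model` is a trained Keras classifier with a two-class softmax output; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def extract_patches(image, size=128, step=128):
    """Slice an H x W x 3 image into non-overlapping size x size patches.

    For a 460 x 700 x 3 BreaKHis image this yields 3 x 5 = 15 patches
    (the right and bottom remainders are discarded)."""
    h, w = image.shape[:2]
    patches = [image[t:t + size, l:l + size, :]
               for t in range(0, h - size + 1, step)
               for l in range(0, w - size + 1, step)]
    return np.stack(patches)

def predict_whole_image(model, image):
    """Majority vote over the patch-level predictions of one whole image."""
    patches = extract_patches(image)      # (15, 128, 128, 3)
    probs = model.predict(patches)        # (15, 2) softmax outputs
    votes = probs.argmax(axis=1)          # per-patch class decisions
    return int(np.bincount(votes, minlength=2).argmax())
```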

3.2.1. Data Preprocessing

In this study, a normalization method is adopted to preprocess the data, which compresses the data into the range [-1, 1]. This preprocessing is not only simple and easy to implement, but also improves the training efficiency and generalization of the model.
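As a hedged illustration, one common formulation that maps 8-bit pixel values into [-1, 1] is sketched below; the exact constants are an assumption rather than the equation used in the paper.

```python
import numpy as np

def normalize(image):
    """Map an 8-bit RGB image into the range [-1, 1].

    Assumed formulation: x' = x / 127.5 - 1; the paper's exact equation
    may differ (e.g., per-image min-max scaling)."""
    return image.astype(np.float32) / 127.5 - 1.0
```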

Because the BreaKHis dataset does not provide annotated regions of interest, window slicing during preprocessing can be designed according to the needs of each experiment. In [22], the original image is scaled from 700 × 460 × 3 to 350 × 230 × 3, following [25]. Four strategies are used to obtain small images; the specific division is shown in Table 4.

In Table 4, 32 × 32 and 64 × 64 pixel windows are used to obtain data images, and the sliding step is half of the corresponding window size, but it is hard to obtain a large global view of the data this way. When acquiring 1000 patches per image with the Random window slicing strategy, there is a lot of repeated information. Obviously, this preprocessing method can hardly provide effective data and can lead to underfitting or overfitting when the model is trained. To overcome this problem, we try four different window slicing strategies: Sliding window slicing, Random window slicing, Sliding + Random window slicing, and Sliding + Class Balance Random window slicing. The experimental results show that the Sliding window slicing strategy can guarantee the model’s fitting during training but cannot guarantee generalization. Conversely, Random window slicing can support generalization but cannot effectively guarantee fitting. In our work, we fully consider the advantages of the two strategies and propose the Sliding + Random window slicing strategy, which not only guarantees the model’s fitting but also improves its generalization ability.

Due to the unbalanced class distribution, we improve the Random window slicing strategy by directing the random sampling toward minority-class data. The specific window slicing strategies are shown in Table 5. The window size of all strategies is 128 × 128 × 3 pixels, and the sliding step is 128 pixels, which ensures a large enough global view of the data. We then use the four strategies to generate data for model training. The Sliding + Random window strategy obtains 45 small images and can improve the model’s fitting and generalization ability. To solve the class imbalance problem, the Sliding + Class Balance Random window strategy extends the first three strategies with a directed random sampling step that transforms the unbalanced data into balanced data. By randomly acquiring additional patches from the benign minority class, it improves the classification performance of the model. A sketch of this strategy is given below.
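The following is a minimal sketch of the Sliding + Class Balance Random strategy, assuming a benign/malignant label per image; the per-image patch counts from Table 5 are not encoded here, and the sampling simply adds random minority-class patches until the two classes have equal patch counts.

```python
import numpy as np

def sliding_patches(image, size=128, step=128):
    """Non-overlapping sliding patches (the 'Sliding' part of the strategy)."""
    h, w = image.shape[:2]
    return [image[t:t + size, l:l + size, :]
            for t in range(0, h - size + 1, step)
            for l in range(0, w - size + 1, step)]

def random_patch(image, size, rng):
    """One randomly positioned patch (the 'Random' part of the strategy)."""
    h, w = image.shape[:2]
    t = int(rng.integers(0, h - size + 1))
    l = int(rng.integers(0, w - size + 1))
    return image[t:t + size, l:l + size, :]

def sliding_plus_class_balance_random(images, labels, size=128, seed=0):
    """Sliding patches from every image, plus extra random patches sampled
    only from minority-class images until both classes are balanced."""
    rng = np.random.default_rng(seed)
    patches, patch_labels = [], []
    for img, y in zip(images, labels):
        ps = sliding_patches(img, size)
        patches.extend(ps)
        patch_labels.extend([y] * len(ps))
    patch_labels = np.array(patch_labels)
    classes, counts = np.unique(patch_labels, return_counts=True)
    minority = classes[counts.argmin()]
    deficit = int(counts.max() - counts.min())
    minority_imgs = [img for img, y in zip(images, labels) if y == minority]
    extra = [random_patch(minority_imgs[int(rng.integers(len(minority_imgs)))],
                          size, rng) for _ in range(deficit)]
    return (np.stack(patches + extra),
            np.concatenate([patch_labels, np.full(deficit, minority)]))
```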

3.2.2. Network Model Design and Construction

We use the Keras framework to design a CNN based on the AlexNet model. Krizhevsky et al. [26] proposed the AlexNet model, which won the 2012 ImageNet Large Scale Visual Recognition Challenge and consists of multiple convolution, pooling, and nonlinear mapping layers. The AlexNet network structure is shown in Figure 3(a). After a large number of experiments, the network model with the best classification performance contains the following layers and parameters:

Input layer: this layer is the input of the network model and the output of the back-propagation algorithm. The input image size is 128 × 128 × 3.

Convolution layer: as an important structure for feature extraction and learning, it is responsible for gradually abstracting low-level features into high-level features. The model contains three convolutional layers; the kernel sizes are 5 × 5, 5 × 5, and 3 × 3; the numbers of convolution kernels are 16, 32, and 72; and the strides are 3, 1, and 1. Zero padding is adopted, the convolution kernel parameters are initialized with a Gaussian distribution, and the biases are initialized to 0.

Pooling layer: this layer reduces the dimensionality of the input features through down-sampling. Each convolutional layer is followed by a pooling layer. All pooling sizes are 2 × 2 with a stride of 2.

Nonlinear mapping layer: in this layer, low-level features are mapped to high-level features to facilitate classification. ReLU is used as the activation function, and a ReLU layer is added behind each convolution layer. The ReLU function is calculated as f(x) = max(0, x).

Global max pooling layer: this layer outputs the maximum value of each feature map. It reduces the number of fitting parameters and improves the generalization ability of the model.

Output layer: this layer is the output of the model and the input of the back-propagation algorithm. The number of neurons in the output layer is set to 2, and softmax is used as the activation function.

The weights of the network layers are initialized with a Gaussian distribution, and the biases are initialized to 0. The structure of the CNN model and the configuration of its hyperparameters are shown in Tables 6 and 7.
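The following Keras sketch assembles a model matching this description. The weight initialization follows the stated Gaussian/zero scheme, while the optimizer, learning rate, and loss below are assumptions rather than the settings listed in Table 7.

```python
from tensorflow.keras import initializers, layers, models, optimizers

def build_cnn(input_shape=(128, 128, 3), num_classes=2):
    """Three conv + ReLU + max-pooling blocks, global max pooling, softmax output."""
    init = initializers.RandomNormal(mean=0.0, stddev=0.01)  # Gaussian weight init
    conv_args = dict(padding="same", kernel_initializer=init,
                     bias_initializer="zeros")
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 5, strides=3, **conv_args),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(32, 5, strides=1, **conv_args),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(72, 3, strides=1, **conv_args),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.GlobalMaxPooling2D(),
        layers.Dense(num_classes, activation="softmax",
                     kernel_initializer=init, bias_initializer="zeros"),
    ])
    # Optimizer and loss are assumptions; the paper's hyperparameters are in Table 7.
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

With these settings, the global max pooling layer yields a 72-dimensional feature vector per input patch, since the last convolution layer has 72 kernels.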

3.2.3. Deep Feature Extraction and Machine-Learning Model Classification

According to the working principle of a CNN, different convolutional layers perform different feature extraction tasks. As the network deepens, low-level features are gradually abstracted into high-level features, and the CNN uses the high-level features to complete the corresponding learning task. Research and experiments show that the classification effect of the intermediate- and high-level features extracted by a CNN is no worse than that of the high-level features alone, and adopting an appropriate ML classifier can further improve the classification results. Therefore, we present a fusion method based on deep features and ML classifiers for benign-malignant classification.

In Table 6, the trained CNN takes each convolution + pooling block as a splitting node. The model is split at three nodes, and a global max pooling layer is added behind each of them to generate three new CNN models. The data are then input into the new CNN models for feature extraction, and the extracted features are input into ML classifiers for training. The ML classifiers include SVM, LR, Gaussian naive Bayes (GNB), DT, RF, and a multilayer feedforward neural network (MFNN). The feature extraction structure diagram is shown in Figure 4.
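A minimal sketch of this split-and-classify step is shown below, assuming `cnn` is the trained Keras model and `x_train`, `y_train`, `x_test` are preprocessed patch arrays and labels; the layer index used for the second node is an assumption about how the nodes in Figure 4 map onto layer indices.

```python
from sklearn.svm import SVC
from tensorflow.keras import layers, models

def feature_extractor(trained_cnn, last_pool_index):
    """Truncate the trained CNN after a convolution + pooling block and append
    a global max pooling layer, giving one feature vector per input patch."""
    pooled = trained_cnn.layers[last_pool_index].output
    features = layers.GlobalMaxPooling2D()(pooled)
    return models.Model(inputs=trained_cnn.input, outputs=features)

# Example: features from the second node, classified with a default-parameter SVM.
# extractor = feature_extractor(cnn, last_pool_index=5)   # assumed index of node 2
# f_train, f_test = extractor.predict(x_train), extractor.predict(x_test)
# y_pred = SVC().fit(f_train, y_train).predict(f_test)
```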

4. Results and Discussion

First, the patients are split using fivefold stratified shuffle split cross-validation as the data splitting benchmark; this splitting method is consistent with that used by F. A. Spanhol’s team in [12, 22–24]. Second, the preprocessing step normalizes the data and applies the four window slicing strategies. The preprocessed data are then input into the CNN for training and evaluation to determine the best strategy. Once the optimal strategy is determined, the best feature extraction nodes and classifier choice are investigated. Finally, after feature extraction, the extracted features are input into the ML models for classification. Figure 5 shows the main structure diagram of the experiment.

4.1. Experimental Environment

In this study, the experiments are run on a Windows 10 system. The main hardware is a 4-core Xeon(R) W-2104 CPU @ 3.20 GHz and an NVIDIA Quadro P2200 GPU with 4 GB of video memory. The main software environment consists of PyCharm 2021, Python 3.6.0, CUDA 9.0, cuDNN 7.0, TensorFlow GPU 2.2.0, Anaconda 3.5.0, and Keras GPU 2.4.3.

4.2. Evaluation Indicators

Two evaluation indicators, image-level accuracy and patient-level accuracy, are used to evaluate the classification performance of BC at different magnifications. The image-level accuracy is calculated as

$$\text{Image-level accuracy} = \frac{N_{\text{rec}}}{N_{\text{all}}},$$

where $N_{\text{all}}$ represents the total number of BC images and $N_{\text{rec}}$ represents the number of correctly classified BC images. In the course of routine diagnosis, pathologists need to evaluate a patient’s overall set of cancer images to confirm the patient’s health status. Therefore, patient-level accuracy is considered an important evaluation indicator. It is calculated as

$$S_P = \frac{N_{\text{rec}}^{P}}{N_P}, \qquad \text{Patient-level accuracy} = \frac{\sum_{P} S_P}{N_{\text{patients}}},$$

where $N_{\text{patients}}$ represents the total number of patients, $N_{\text{rec}}^{P}$ represents the number of correctly classified BC images of patient $P$, and $N_P$ represents the total number of BC images of patient $P$. Due to the unbalanced distribution of the benign and malignant data, balanced accuracy (BAC) and the F1 score are also adopted to objectively evaluate the classification performance of the model:

$$\text{BAC} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right), \qquad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$
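For clarity, a small sketch of these indicators is given below; the helper names are illustrative, and BAC and F1 are taken directly from scikit-learn.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

def image_level_accuracy(y_true, y_pred):
    """N_rec / N_all: fraction of correctly classified BC images."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def patient_level_accuracy(y_true, y_pred, patient_ids):
    """Average over patients of each patient's per-image recognition rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    patient_ids = np.asarray(patient_ids)
    scores = [float((y_true[patient_ids == p] == y_pred[patient_ids == p]).mean())
              for p in np.unique(patient_ids)]
    return float(np.mean(scores))

# bac = balanced_accuracy_score(y_true, y_pred)
# f1 = f1_score(y_true, y_pred, pos_label=1)   # pos_label choice is an assumption
```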

4.3. Experiment
4.3.1. Data Splitting

In this study, the dataset is split by fivefold stratified shuffle split cross-validation. This cross-validation method can objectively evaluate the performance of the model and prevents the evaluation result from depending on a single favorable split. In the process of splitting the training and test sets, the 82 patients are used as the splitting units: 57 patients (about 70%) form the training set, and the remaining 25 patients (about 30%) form the test set. This data splitting method is the same as that of F. A. Spanhol’s team, so comparing algorithm performance against it allows the contribution of this study to be evaluated more objectively. The ratio of benign to malignant patients is 17:40 in the training set and 7:18 in the test set, so both sets have a similar class distribution, which is beneficial for objectively evaluating model performance. The training and test sets are shown in Table 8.
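A minimal sketch of this patient-level splitting, assuming scikit-learn's StratifiedShuffleSplit over per-patient benign/malignant labels, is shown below; images would then be assigned to the training or test set according to their patient ID.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

def patient_level_folds(patient_ids, patient_labels, n_splits=5, seed=0):
    """Fivefold stratified shuffle split over the 82 patients (about 70%/30%),
    so that all images of a patient fall on the same side of each split."""
    patient_ids = np.asarray(patient_ids)
    patient_labels = np.asarray(patient_labels)   # benign/malignant per patient
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=0.3,
                                      random_state=seed)
    return [(set(patient_ids[tr]), set(patient_ids[te]))
            for tr, te in splitter.split(patient_ids, patient_labels)]
```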

4.3.2. Convolutional Neural Network Classification

The data are first normalized as described in Section 3.2.1. Then, the four window slicing strategies in Table 5 are used to preprocess the training data, and the preprocessed data are input into the CNN for training and evaluation. As shown in Table 5, the data in Table 8 are augmented several times over, to different degrees. The structure of the CNN model is shown in Table 6.

In the process of model evaluation, small images are obtained from each test-set image by sliding a 128 × 128 × 3 window with a step of 128 pixels. The class of the large image is decided by voting: the class receiving the most votes among the patch predictions serves as the final prediction for the whole image. Although other studies have not focused on the classification problem from an imbalance point of view, we use the F1 score and BAC to evaluate model performance. The classification results for BC pathological tissue images at different magnifications are shown in Tables 9 and 10.

As shown in Tables 9 and 10, the Sliding + Class Balance Random window slicing strategy has excellent overall accuracy at different magnifications. Regarding image-level accuracy, its performance is more stable and it achieves better F1 score and BAC results, which shows that the presented method can alleviate the class imbalance problem. Regarding patient-level accuracy, the Sliding + Class Balance Random window slicing strategy is superior to the other strategies at all magnifications. In summary, the Sliding + Class Balance Random window slicing strategy is the most suitable for the following studies.

4.3.3. Feature Extraction and Machine-Learning Model Classification

For feature extraction, the CNN model trained after Sliding + Class Balance Random window slicing preprocessing takes each convolution + pooling block as a feature extraction node. For ML model classification, the extracted features are input into the ML models for training and testing. Except for the MFNN, the hyperparameters of the other classifiers (SVM, LR, GNB, DT, and RF) are the default parameters of the Python scikit-learn toolkit.

The combination of a convolution layer and a pooling layer is taken as the standard feature extraction node, and a global max pooling layer is added after it. There are three cases for the number of input-layer neurons of the MFNN, as shown in Figure 4(i). In the MFNN training process, the three cases share the model structure and hyperparameter configuration shown in Tables 11 and 12. The features extracted from the trained CNN at the different nodes are input into the ML classifiers for classification. The accuracy, F1 score, and BAC are shown in Tables 13–15.

Table 13 shows that most ML classifiers achieve a better classification effect at the second node across the different magnifications. Compared with the other ML classifiers, SVM has the best classification results. Regarding image-level accuracy, the mean accuracy at different magnifications reaches 87.85%, 86.68%, 87.75%, and 85.30%, respectively, which is 2.81%, 1.02%, 0.78%, and 1.01% higher than the pure CNN (Sliding + Class Balance Random) in Table 9. Regarding patient-level accuracy, the mean accuracy at different magnifications is 87.93%, 87.41%, 88.76%, and 85.55%, respectively, which is 1.87%, 1.12%, 0.07%, and 1.98% higher than the pure CNN (Sliding + Class Balance Random) in Table 9.

In Table 14, compared with the other classifiers, SVM has the highest F1 score at the second node. The mean F1 score at different magnifications reaches 91.12%, 93.30%, 92.54%, and 90.45%, respectively, which is 1.13%, 1.37%, 0.51%, and 1.04% higher than the pure CNN (Sliding + Class Balance Random) in Table 10.

In Table 15, compared with the other classifiers, SVM has the highest BAC at the second node. The mean BAC at different magnifications reaches 86.57%, 87.64%, 89.42%, and 85.31%, respectively, which is 1.08%, 1.62%, 1.42%, and 1.56% higher than the BAC of image classification by the pure CNN (Sliding + Class Balance Random) in Table 10.

In summary, the intermediate- and high-level features are very important for SVM classification. By this means, the classification effect of BC pathological tissue images has been improved as a whole.

In [23], features are extracted from fully connected layers and input into an LR classifier. We compare the combinations of different feature levels with SVM from Table 13 against that approach. In Table 16, the results of location points 1, 2, and 3 are better than those of fc6, fc7, and fc8 in image-level accuracy, and they perform better than, or at least comparably to, the former in patient-level accuracy. In particular, location point 2 has the best overall results. The experimental results show that the convolutional layers are better than the fully connected layers for feature extraction. Our experimental protocol is consistent with that of the comparative literature.

4.3.4. Comparison with the Literature

To verify the performance of the proposed method, we compare the optimal combination with the relevant literature. To ensure the validity of the comparison, the experimental protocol is the same as in the compared literature: we use the fivefold stratified shuffle split cross-validation method, and the ratio of the training set to the test set is 7:3. Tables 17 and 18 show the comparison results for image-level accuracy and patient-level accuracy.

In Table 17, regarding image-level accuracy, the classification results are higher than those of the models in the recent literature at all magnifications. In Table 18, regarding patient-level accuracy, the results indicate that the proposed method performs better than, or at least comparably to, the methods in the literature in terms of classification quality. Our method does not exceed the optimal literature values at every magnification, but it is more accurate and stable in terms of overall classification performance. In particular, the patient-level classification results are better than the others at 200× magnification.

5. Conclusions

In this study, we propose a novel classification method based on deep features of different levels to solve the BC classification problem. In the data preprocessing stage, we present four slicing methods: Sliding window slicing, Random window slicing, Sliding + Random window slicing, and Sliding + Class Balance Random window slicing. The experimental results show that the Sliding window slicing strategy can guarantee the model’s fitting during training, and Random window slicing can enhance the model’s generalization. We therefore combine the characteristics of both strategies and propose the Sliding + Random window slicing strategy, which performs well in both model fitting and generalization. To overcome the classification problem caused by the unbalanced class distribution, we further propose Sliding + Class Balance Random window slicing. Comparing model performance shows that Sliding + Class Balance Random window slicing is the best data preprocessing strategy. In the deep feature classification stage, features of different levels are combined with ML classifiers, and the combination of intermediate- and high-level features with SVM has the best classification performance. The proposed method and several state-of-the-art methods are evaluated on the BreaKHis breast cancer dataset, and the experimental results show that the proposed method obtains better results than those reported in the relevant literature. We conclude that our method can be used efficiently for solving these problems due to its simplicity, reliability, and robustness. Although a method for benign and malignant BC classification has been presented, the classification of BC subtypes has not been studied, and the subtypes have a large imbalance ratio. In future research, we aim to develop a DL algorithm for subtype classification.

Data Availability

The authors train and evaluate the method on the BreaKHis dataset provided by F. A. Spanhol’s team. The dataset is available at https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the submission and publication of this manuscript.

Acknowledgments

This work was substantially supported by the Daqing Normal University Land Enterprise Application Cultivation Project under no. 19ZR16.