Abstract

The pathologist’s diagnosis is crucial for identifying and categorizing cancerous tissue sections and for the physician’s subsequent evaluation of the patient’s condition and therapy. It is recognised as the “gold standard”; however, pathological diagnosis has both objective and subjective limits, such as tissue damage resulting from nonstandard collection, fixation, or delivery of diseased tissue, or a lack of necessary clinical data. In addition, diagnostic pathology encompasses a very large body of knowledge, so training a pathologist requires considerable time and effort. Consequently, computer-assisted diagnosis has become an essential tool for assisting, and in some tasks replacing, pathologists. In this regard, the CAMELYON 17 competition was designed to identify the best algorithm for detecting cancer metastases in lymph nodes. Each participant was given 899 whole-slide images for developing their algorithms, and more than 300 participants registered for the competition. CAMELYON 17 focuses primarily on the classification of lymph node metastases, with the TNM system as the underlying classification system. Participants mostly used classification techniques from deep learning and traditional machine learning. To better understand the top-ranked algorithms, we examine the advantages and limitations of traditional machine learning and deep learning for classifying breast cancer metastases.

1. Background

In today’s society, the number of cancer patients is increasing. Breast cancer was estimated to be the most commonly diagnosed cancer in 2020, and it has a strong familial hereditary link. Breast cancer develops from malignant tumour cells in the epithelium lining the milk ducts or the lobules of the breast. Breast cancer (BC) is the most frequent form of cancer in females and is a complex illness with many causes. In 2020, an estimated 2,261,419 new cases of breast cancer were identified and 684,996 deaths were attributed to the disease. Although the 5-year survival rate for BC is poor, those who are diagnosed and treated early have a good chance of beating the disease [1]. We therefore focus on the contribution of CAMELYON 17 [2] to machine learning for breast cancer diagnosis.

The goal of this challenge is to find and categorize breast cancer metastases in lymph nodes. Lymph nodes are small glands that filter lymph, the fluid that travels through the lymphatic system. Breast cancer is most likely to spread to the lymph nodes of the axilla. The presence of metastatic lymph nodes is one of the most significant prognostic variables in breast cancer: the prognosis worsens when cancer progresses to the lymph nodes. This is why lymph nodes are surgically removed and examined under a microscope. However, this diagnostic procedure is arduous and time-consuming for pathologists, and small metastases are notoriously difficult to detect and are frequently overlooked.

The TNM Classification of Malignant Tumours (TNM) is a widely accepted system for determining the extent of cancer dissemination in individuals with solid tumours [3]. It is one of the most important tools for physicians to determine the best treatment choice and prognosis. In breast cancer, TNM staging considers the tumour size (T stage), whether cancer has progressed to regional lymph nodes (N stage), and whether the tumour has spread to other areas of the body (M stage).

Because the histological examination of lymph node metastases is a crucial element of TNM classification, the pathologic N stage, often known as the pN stage, is the focus of CAMELYON 17. The slides were stained with hematoxylin and eosin (H&E). This stain produces vivid blue/pink contrasts across various (sub)cellular structures and may be used on various tissue types. The whole-slide images (WSI) for training were acquired from five medical centres and 100 patients, with five lymph node slides per patient. The five slides of each patient are labelled with a pN stage in an accompanying CSV file. The WSIs are stored in TIFF format, and the annotations of the metastatic regions are stored in XML format.

The CAMELYON 17 challenge dataset contains three types of metastatic tissue and five pN stages (Tables 1 and 2). A comparison of metastatic and normal tissue (Figure 1) suggests that cell size and colour contrast could play an important role in distinguishing cancerous from normal tissue. With machine learning, a computer offers higher efficiency, shorter diagnosis time, and lower training cost than a pathologist. However, it still has deficiencies in the diagnosis of rare cases and in accuracy, so we compare the advantages and disadvantages of the two computational approaches mainly used in the competition, with a view to exploring broader applications of computers in medicine in the future.
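To make the relationship between slide-level labels and the patient-level pN stage concrete, the following is a minimal sketch. The label names (`negative`, `itc`, `micro`, `macro`) and the staging rules encoded here are assumptions based on the challenge's public description, not on the tables of this paper, and should be checked against the official rules before use.

```python
from collections import Counter


def pn_stage(slide_labels):
    """Derive a patient-level pN stage from slide-level labels.

    Each label is one of "negative", "itc", "micro", "macro".
    The rules below are an ASSUMPTION based on the public challenge
    description; they are not taken from this paper.
    """
    counts = Counter(slide_labels)
    n_macro = counts["macro"]
    n_micro = counts["micro"]
    n_itc = counts["itc"]
    # Assumption: only micro- and macro-metastases count as metastatic nodes.
    n_metastatic = n_macro + n_micro

    if n_metastatic == 0:
        # Isolated tumour cells only, or nothing at all.
        return "pN0(i+)" if n_itc > 0 else "pN0"
    if n_macro == 0:
        # Micro-metastases present, but no macro-metastasis.
        return "pN1mi"
    if n_metastatic <= 3:
        # 1-3 metastatic nodes, at least one macro-metastasis.
        return "pN1"
    # 4 or more metastatic nodes, at least one macro-metastasis.
    return "pN2"


example = ["macro", "micro", "negative", "negative", "itc"]
print(pn_stage(example))
```

With five slides per patient, the stage is decided by the most severe finding together with the number of affected nodes, which is why a single missed small metastasis can change the patient-level label.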

2. Methodology

This report compares and contrasts the efficacy of deep learning and machine learning approaches to the study of breast cancer [4]. Our work here was informed by algorithms developed for the CAMELYON 17 competition. We first compare several machine learning [5] methods for spotting breast cancer. We then compare several deep learning [3, 6] methods used in the screening of breast cancer.

3. Classifier Selection in Machine Learning

Machine learning techniques can be used for the detection of many diseases [7]. This section describes feature selection and how to choose a suitable classifier for distinguishing tumour sections from normal sections. It first discusses the feature selections reported in the papers of entrants to the CAMELYON 17 challenge, which could serve as a reference for future work on feature extraction and on classifying tumour and normal areas in the CAMELYON 17 dataset. It then discusses the strengths and weaknesses of different classifiers. Finally, applications of these classifiers to histological breast cancer images are presented and discussed in order to identify a possible algorithm for classifying tumour cells and normal cells.

3.1. Feature Selection

Feature selection is the process of choosing the relevant features to extract, or a candidate subset of features. For example, if cancer cells are the target in a histological image, the features that best distinguish cancer cells from normal cells must be selected. In medical imaging, every flaw in the imaging apparatus is reflected as noise in subsequent processing; this kind of noise is caused by the device itself. In addition, the CAMELYON 17 dataset was collected from five medical centres, which means that the dataset is considerably larger than that of the previous challenge (CAMELYON 16). A typical whole-slide image is roughly 200,000 × 100,000 pixels at the highest resolution, stored in a 3-byte RGB pixel format. This amounts to 55.88 GB of uncompressed pixel data from a single level. Because the images are so large, dimensionality reduction is necessary for image processing. Dimensionality reduction is one of the most frequently used methods for removing noisy (i.e., redundant) features. Feature extraction and feature selection are the two primary dimensionality reduction approaches. Feature extraction projects the features into a lower-dimensional feature space, and the newly produced features are frequently combinations of the original features. Principal component analysis (PCA) is one example of a feature extraction technique [4]. Both feature extraction and feature selection have the potential to improve learning performance, reduce computational complexity, produce more generalizable models, and reduce storage requirements.
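The 55.88 GB figure can be checked with a short calculation. The sketch below assumes a slide of 200,000 × 100,000 pixels at the highest resolution level (an assumption taken from the challenge's data description) and interprets "GB" as binary gigabytes (2^30 bytes).

```python
def uncompressed_size_gib(width_px, height_px, bytes_per_pixel=3):
    """Size of one uncompressed image level in binary gigabytes (GiB)."""
    size_bytes = width_px * height_px * bytes_per_pixel
    return size_bytes / 2**30


# Assumed typical dimensions of the highest-resolution level.
size = uncompressed_size_gib(200_000, 100_000)
print(f"{size:.2f} GiB")  # prints "55.88 GiB"
```

A single level of a single slide therefore exceeds the memory of most workstations, which is the practical motivation for patch-based processing and dimensionality reduction.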

The dataset was provided by the CAMELYON 17 challenge [2] and consists of whole-slide images of histological lymph node sections. Because most groups use 0.9 as their threshold [7-10], this research will first attempt to threshold the tumour probability heatmap at 0.9. Texture features and morphological features were the two primary groups of features used on this dataset. Colour scale-invariant feature transform (SIFT) features [11-13], local binary patterns [11, 14-16], and features based on the grey-level co-occurrence matrix (GLCM) [11, 17, 18] are some of the texture features that have been used.

One publication employed several properties of nuclei as morphological features for categorizing breast histological images [19]. Another noteworthy example is one group’s use of the DBSCAN technique to extract morphological cues from heatmaps in order to classify slide-level metastases. To predict the slide label, the principal axis of the tumour region is computed on each slide, and the length of this axis in pixel units yields the maximum tumour length [10]. Another group of participants chose the total tumour area, the major axis length, and the area of the largest tumour as features [20].

Furthermore, thirteen morphological features from the heatmap, such as the main tumour cluster’s size and the length of the cluster’s principal axis, are employed to categorize tumour clusters into their respective groups [21]. The papers of the participating groups will be helpful for the feature extraction that we will be performing. These characteristics may help distinguish cancer tissue from the rest of the normal tissue.
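A minimal sketch of how such morphological features might be computed from a tumour probability heatmap is given below. It thresholds the heatmap at 0.9, groups pixels with simple 4-connected component labelling (swapped in here for the DBSCAN clustering used by one group, purely to keep the example dependency-free), and reports the total area, the area of the largest cluster, and the extent of that cluster along its principal axis. The function names and the exact feature definitions are illustrative assumptions, not those of any participating team.

```python
from collections import deque

import numpy as np


def connected_regions(mask):
    """Return a list of (n, 2) pixel-coordinate arrays, one per 4-connected region."""
    visited = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    regions = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        # Breadth-first flood fill starting from an unvisited tumour pixel.
        queue = deque([(r0, c0)])
        visited[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and mask[rr, cc] and not visited[rr, cc]):
                    visited[rr, cc] = True
                    queue.append((rr, cc))
        regions.append(np.array(pixels))
    return regions


def major_axis_length(pixels):
    """Extent of a region along its first principal direction, in pixels."""
    if len(pixels) < 2:
        return 1.0
    centred = pixels - pixels.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so the last eigenvector
    # is the direction of largest variance (the principal axis).
    _, eigvecs = np.linalg.eigh(np.cov(centred.T))
    projection = centred @ eigvecs[:, -1]
    return float(projection.max() - projection.min() + 1.0)


def heatmap_features(heatmap, threshold=0.9):
    """Slide-level morphological features from a probability heatmap."""
    mask = heatmap >= threshold
    regions = connected_regions(mask)
    if not regions:
        return {"total_area": 0, "largest_area": 0, "major_axis_length": 0.0}
    largest = max(regions, key=len)
    return {
        "total_area": int(mask.sum()),
        "largest_area": int(len(largest)),
        "major_axis_length": major_axis_length(largest),
    }


# Toy heatmap: a 5-pixel horizontal cluster, one isolated pixel, one low score.
toy = np.zeros((6, 8))
toy[2, 1:6] = 0.95
toy[5, 7] = 0.99
toy[0, 0] = 0.5
features = heatmap_features(toy)
print(features)
```

Lengths are in heatmap pixels; converting them to millimetres would require the physical size of one heatmap cell, which depends on the patch stride and the scanner resolution.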

3.2. Support Vector Machine (SVM) Classification

Our reading of the papers from the participating teams shows that most groups chose the SVM [22] and random forest (RF) [23] classifiers. Furthermore, some researchers advocate SVM or RF as more straightforward approaches that consistently provide high accuracy and are typically faster [24]. As a result, SVM and random forest are the first two candidates, and a summary table compares the two for classifier selection on this dataset (Table 3). Compared with SVM, the random forest is more prone to overfitting. To minimize overfitting when training the random forest classifier, one group extracted just four features from each heatmap, a choice refined from their previous work [7].

Furthermore, the number of available features may not be known. Therefore, the number of features chosen may impact the random forest output. SVM classification can perform well on datasets with numerous characteristics, even when only a few examples are available for the training process [24]. As a result, SVM might be used as a first step in picking a classifier.

SVM was proposed by Vapnik and colleagues and is grounded in Vapnik-Chervonenkis (VC) theory. Unlike Bayesian classification, this technique does not require prior distribution information for data classification. In many real scenarios, the distribution of the features is not available, and SVM may be used to categorize the provided data into distinct classes in such circumstances. SVM has been evaluated on various breast image collections (Table 4). Although some of these studies did not report results on a separate test set, SVM worked well on these histopathology images. These publications may guide the choice of an SVM library and of suitable features.

If there is time for additional study, the possible phases of the SVM approach are as follows (Figure 2). The initial effort would be to compare the pN0 stage with the pN2 stage (which has larger macro-metastases), because the features of metastatic tissue are more noticeable in pN2 than in pN0. PCA would be used to reduce the dimensionality of the extracted features, followed by a logistic regression analysis to check the relationship between these features. The basic SVM approach would be the first attempt. If the accuracy is poor, combining SVM with other algorithms, such as the MGSA algorithm [30], may be a viable option. If the extracted features prove helpful, the next step would be to train on images from all pN stages.
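The proposed PCA-then-SVM step can be sketched with scikit-learn as follows. The data are synthetic stand-ins for slide-level feature vectors (two Gaussian clusters labelled as "pN0" and "pN2"), not CAMELYON 17 data, and the choices of 13 features, 5 principal components, and an RBF kernel are illustrative assumptions. The logistic regression check mentioned above is not included.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted features (NOT real challenge data):
# 100 "pN0" and 100 "pN2" samples with 13 features each.
n_per_class, n_features = 100, 13
x_pn0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
x_pn2 = rng.normal(loc=1.5, scale=1.0, size=(n_per_class, n_features))
X = np.vstack([x_pn0, x_pn2])
y = np.array([0] * n_per_class + [1] * n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize, reduce to 5 principal components, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and PCA inside the pipeline ensures that both are fitted on the training split only, so the held-out accuracy is not inflated by information from the test samples.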

3.3. Discussion

In certain circumstances, some machine learning classification algorithms may be decisive enough to complete the classification and prediction without a deep learning model. For instance, some groups in the CAMELYON 17 challenge used a classifier, such as DenseNet-201 [21] or XGBoost [7], to categorize cancerous and normal images. As a result, in this study, a deep learning model will also be used to see which technique (SVM-based machine learning or deep learning) is superior for classifying metastatic tissue.

4. Deep Learning Method

Thanks to advancements in slide scanning technology, whole-slide scanners are now used to digitize glass slides holding tissue specimens at high resolution (up to 160 nm per pixel). Digital whole-slide images (WSI) allow deep learning techniques to help pathologists examine and quantify slides. With the development of deep learning algorithms such as convolutional neural networks (CNN) in the past decade, CNNs have been applied successfully to many computer vision tasks. Studies have shown that deep learning can automatically identify all slides containing breast cancer metastases while excluding 30-40% of the slides with normal tissue [31]. The combination of digital pathology and deep learning techniques can improve the objectivity and efficiency of histopathological slide analysis and reduce the pathologist’s workload.

4.1. Methodology

The deep learning analysis of digital lymph node slides is divided into four main stages: preprocessing, training, testing, and evaluation. The specific analysis methods are as follows (Figure 3).

Although the first 400 WSI files contained all the necessary information, they were not suitable for directly training a deep convolutional neural network (CNN) because of limited machine memory and the multiresolution problems caused by the very high image resolution. Therefore, the images must be preprocessed into smaller patches that a typical CNN can process.
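The preprocessing step can be illustrated with a small tiling routine. The sketch below cuts an RGB image into non-overlapping patches and keeps only those that contain enough tissue; the patch size of 256 pixels, the brightness-based tissue test, and its thresholds are all assumptions chosen for illustration. A real whole-slide image would be read region by region with a WSI reader rather than loaded into memory as a single array.

```python
import numpy as np


def extract_patches(image, patch_size=256, stride=256,
                    min_tissue_fraction=0.1, background_level=220):
    """Yield (row, col, patch) for tiles that contain enough tissue.

    A pixel counts as tissue when its mean RGB value is below
    background_level (H&E slides have a bright, near-white background).
    All default values are illustrative assumptions.
    """
    height, width = image.shape[:2]
    for row in range(0, height - patch_size + 1, stride):
        for col in range(0, width - patch_size + 1, stride):
            patch = image[row:row + patch_size, col:col + patch_size]
            tissue = patch.mean(axis=2) < background_level
            if tissue.mean() >= min_tissue_fraction:
                yield row, col, patch


# Toy "slide": white background with one stained region in the top-left tile.
slide = np.full((512, 768, 3), 255, dtype=np.uint8)
slide[0:256, 0:256] = (150, 80, 160)

patches = list(extract_patches(slide))
print(len(patches), [(r, c) for r, c, _ in patches])
```

Discarding background tiles early matters in practice because most of a lymph node slide is empty glass, and every tile that is kept must later pass through the CNN.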

A CNN model was used as the feature extractor to build a binary classification model that outputs, for each image patch, the probability (between 0 and 1) that the patch contains cancer (with cancer labelled as 1). The probability heatmap obtained from the CNN model is then used to train the WSI classifier and to classify the WSIs. The trained model is applied to the test set to obtain its cancer probability maps. The expected outcome of this experiment is the successful labelling of cancerous tissue.
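How per-patch probabilities are assembled into a slide-level heatmap can be sketched as follows. The function `toy_predict_patch` is a hypothetical stand-in for a trained CNN (it simply scores darker patches higher) and is used only so that the example runs without a model; the patch size is likewise an assumption.

```python
import numpy as np


def build_heatmap(image, predict_patch, patch_size=256):
    """Return a heatmap with one cell per non-overlapping patch.

    predict_patch is any callable mapping a patch to P(tumour) in [0, 1].
    """
    n_rows = image.shape[0] // patch_size
    n_cols = image.shape[1] // patch_size
    heatmap = np.zeros((n_rows, n_cols), dtype=np.float32)
    for i in range(n_rows):
        for j in range(n_cols):
            patch = image[i * patch_size:(i + 1) * patch_size,
                          j * patch_size:(j + 1) * patch_size]
            heatmap[i, j] = predict_patch(patch)
    return heatmap


def toy_predict_patch(patch):
    """Hypothetical stand-in for a trained CNN: darker patches score higher."""
    return 1.0 - float(patch.mean()) / 255.0


# Toy slide: white everywhere except a black bottom-right quadrant.
toy_slide = np.full((512, 512, 3), 255, dtype=np.uint8)
toy_slide[256:, 256:] = 0

heatmap = build_heatmap(toy_slide, toy_predict_patch)
print(heatmap)
```

In a real pipeline the heatmap, or features derived from it, would be passed to the slide-level classifier described above.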

4.2. Limitations

According to the literature, some deep learning algorithms can achieve higher accuracy than the manual judgments of 11 pathologists. However, this does not mean that deep learning can replace human judgment in evaluating breast cancer lymph node metastasis, because the competition has some limitations.

The challenge did not capture the actual clinical setting. The CAMELYON 16 dataset consisted mainly of lymph node slides with metastases, whereas in clinical practice most slides do not contain metastases. This limitation is caused by the limited amount of data used in the competition, which cannot fully describe the situations encountered in clinical practice; the dataset is therefore not directly comparable with the mix of cases seen in the clinic. In addition, after training, the competition algorithms could only detect breast cancer metastases in lymph node tissue and could not identify other, rarer conditions (such as infection, sarcoma, and lymphoma) [11]. This limitation is due to the type of data in the dataset.

5. Results and Discussion

In the 2016 CAMELYON Breast Cancer Lymph Node Metastasis Classification Challenge, 25 of the 32 algorithms submitted by participants used convolutional neural networks (Bejnordi et al.), including well-known models such as VGG-16 (Simonyan and Zisserman), GoogLeNet (Szegedy et al.), and ResNet-101 (He et al.) [6]. Researchers from Harvard Medical School and the Massachusetts Institute of Technology (Wang et al.) achieved the best scores, with an area under the ROC curve (AUC) of 0.9935 and a free-response receiver operating characteristic (FROC) score of 0.8074, using a GoogLeNet-based model [5].

The evaluation of results can be divided into two aspects: metastasis identification and metastasis classification. For identification, contestants were asked to provide the coordinates of the tumour areas detected in each slide together with a confidence score for each detection. The confidence score, which represents the probability that the area is a tumour area, can be calculated with the Dice coefficient formula. The algorithms are compared using the free-response receiver operating characteristic (FROC) curve (Figure 4).
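The Dice coefficient formula is not written out above; in its standard form, for a predicted region A and a reference region B, it is 2|A ∩ B| / (|A| + |B|). A minimal sketch for binary masks follows; treating two empty masks as perfect agreement is a convention assumed here.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


predicted = [[1, 1, 0],
             [0, 0, 0]]
reference = [[1, 0, 0],
             [1, 0, 0]]
print(dice_coefficient(predicted, reference))  # prints 0.5
```

The value ranges from 0 (no overlap) to 1 (identical masks).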

For the evaluation of metastasis grading, participants should provide the grading result for each slide together with a confidence score. The algorithms’ performance is compared using the area under the receiver operating characteristic curve (AUC) (Figure 5).
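As an illustration of the AUC metric, the sketch below uses its rank interpretation: the AUC equals the probability that a randomly chosen positive slide receives a higher confidence score than a randomly chosen negative slide, with ties counted as one half. The labels and scores are made-up example values, and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positive and negative slides.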

By comparing the four best-performing and most commonly used neural networks in the competition, we can summarize the benefits and drawbacks of different neural networks when deep learning is used to analyze lymph node metastasis (Table 5).

5.1. Data Dependency

The most significant distinction between deep learning and traditional machine learning becomes apparent as the scale of the data grows. Deep learning algorithms perform poorly when the dataset is small, because they require a large amount of data to learn its structure fully. In this situation, traditional machine learning algorithms with their hand-crafted rules have the advantage.

5.2. Hardware Dependency

In contrast to traditional machine learning algorithms, which can run on low-end equipment, deep learning algorithms generally rely on high-end machines. This is because deep learning algorithms perform a large number of matrix multiplications, and a GPU can accelerate these operations considerably, which is why the GPU has become an essential component of deep learning.

5.3. Feature Engineering

Feature engineering incorporates domain knowledge into a feature extractor, which is intended to simplify the data and make patterns more visible to the learning algorithm. This procedure is demanding and costly in terms of time and expertise. In most applications, experts must identify the relevant features and hand-code them according to the domain and data type. Pixel values, shapes, textures, locations, and orientations are examples of features. The precision of feature identification and feature extraction is critical to the effectiveness of most machine learning algorithms.

A deep learning algorithm tries to learn high-level features directly from the data. This is a distinctive aspect of deep learning and a major step beyond traditional machine learning. As a result, deep learning reduces the time it takes to design a new feature extractor for each problem. In face recognition, for example, a convolutional neural network will learn low-level features such as edges and lines in its early layers, followed by parts of the face, and finally a high-level representation of the whole face.
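The kind of low-level feature that early convolutional layers tend to learn can be illustrated with a hand-written edge filter. The sketch below applies a Sobel kernel (a classical, fixed filter used here only as an illustration; a CNN would learn its own kernels from data) to a small image containing a single vertical edge.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding (as CNN layers do)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Sobel kernel that responds to left-to-right intensity increases.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# 5 x 6 image: dark on the left (columns 0-2), bright on the right (columns 3-5).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

response = correlate2d_valid(image, sobel_x)
print(response[0])  # prints [0. 4. 4. 0.]
```

The response is non-zero only where the window straddles the edge, which is the sense in which a filter "detects" a feature; deeper layers combine many such responses into more abstract patterns.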

5.4. Execution Time

Deep learning algorithms often require a long time to train because they have a very large number of parameters. A state-of-the-art deep network such as ResNet takes roughly two weeks to train from scratch. Traditional machine learning algorithms, on the other hand, take substantially less time to train, ranging from a few seconds to a few hours.

At test time, the situation is reversed. A trained deep learning model takes relatively little time to run. In contrast, for k-nearest neighbours (a machine learning algorithm), the test time rises as the amount of data increases. Some machine learning algorithms also have a short test time, although this does not apply to all of them.
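The reason the test time of k-nearest neighbours grows with the amount of data can be seen in a brute-force sketch: every prediction computes one distance per stored training sample. The toy data and the choice of k = 3 below are illustrative assumptions.

```python
import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Brute-force k-nearest-neighbour vote; cost grows with len(train_x)."""
    # One Euclidean distance per stored training sample.
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = np.bincount(train_y[nearest])
    return int(np.argmax(votes))


train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.2, 0.1])))  # prints 0
print(knn_predict(train_x, train_y, np.array([5.5, 5.2])))  # prints 1
```

A trained neural network, by contrast, performs a fixed amount of computation per input regardless of how many training samples it has seen.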

6. Conclusion

This paper aims to identify the differences between traditional machine learning and deep learning methods for detecting cancer. These differences mainly concern accuracy, efficacy, algorithm complexity, and time cost. Traditional machine learning (SVM) and deep learning share some similarities, such as high accuracy and efficacy in tumour detection compared with manual selection; however, it is not yet possible to conclude whether tumour sections and normal sections can be classified with an SVM classifier alone instead of a deep learning model. Thus, in future work, the SVM classifier will be applied to distinguish tumour sections, and a deep learning model will also be considered for identifying tumour cells automatically. This research collected useful ideas on feature selection, classifier selection, and deep learning model selection from the CAMELYON 17 challenge, which could be helpful for similar image detection research. It could also inform research on detecting the metastasis stage in breast cancer by using machine learning methods to improve detection accuracy in the actual clinical environment. In the future, we will compare the SVM classifier and a deep learning model to find a more accurate and efficient algorithm for the CAMELYON 17 challenge.

Data Availability

All data, models, and code generated or used during the study appear in the submitted article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Tianbo Sun, Tong Meng, and Yutong Liu contributed equally to this work and should be considered co-first authors.

Acknowledgments

First and foremost, the CAMELYON 17 challenge (https://camelyon17.grand-challenge.org/) provided all of the data for this study, for which we are grateful. We also drew on ideas from the papers published by the contestants. Prof. Jens Rittscher and TA Yisha, both highly respected, responsible, and competent scholars, were of tremendous assistance in the writing of our thesis, and we thank them from the bottom of our hearts. Without their helpful advice, remarkable courtesy, and patience, we would not have been able to complete the thesis. Their meticulous and rigorous academic analysis informs this thesis and our future studies. We would also like to thank all of our teachers who helped us finish the research project. Last but not least, we thank everyone in our team who worked so hard on this project.