Abstract

Breast cancer is one of the leading causes of death among women worldwide. Advances in natural image classification techniques and Artificial Intelligence methods have been widely applied to the breast image classification task. Digital image classification gives doctors and physicians a second opinion and saves their time. Despite the many publications on breast image classification, very few review papers provide a detailed description of breast cancer image classification techniques, feature extraction and selection procedures, classification performance measures, and image classification findings. We have put a special emphasis on the Convolutional Neural Network (CNN) method for breast image classification. Along with the CNN method, we also describe the use of the conventional Neural Network (NN), Logic Based classifiers such as the Random Forest (RF) algorithm, Support Vector Machines (SVM), Bayesian methods, and a few semisupervised and unsupervised methods that have been used for breast image classification.

1. Introduction

The cells of the body follow a cycle of regeneration. A balance between the growth and death rates of cells normally maintains the natural working of the body, but this balance is not always kept. Sometimes an abnormal situation occurs in which a few cells start growing aberrantly. This abnormal growth of cells causes cancer, which can start in any part of the body and spread to other parts. Different types of cancer can form in the human body; among them, breast cancer is a serious health concern. Due to the anatomy of the human body, women are more vulnerable to breast cancer than men. Risk factors for breast cancer include age, family history, breast density, obesity, and alcohol intake.

Statistics reveal that the situation has worsened in the recent past. As a case study, Figure 1 shows the breast cancer situation in Australia over the last 12 years, including the number of new male and female breast cancer cases. In 2007, there were 12775 new breast cancer cases, while the expected number of new cases in 2018 is 18235. Statistics show that, in the last decade, the number of new breast cancer patients increased every year at an alarming rate.

Figure 2 shows the number of males and females who died of breast cancer. It is predicted that around 3156 people will die of breast cancer in 2018; of these, 3128 will be women, which is almost 99.11% of all breast cancer deaths.

Women’s breasts consist of lobules, ducts, nipples, and fatty tissue. Milk is produced in the lobules and carried towards the nipple by the ducts. Epithelial tumors usually grow inside the lobules as well as the ducts and later form cancer inside the breast [1]. Once cancer has started, it can also spread to other parts of the body. Figure 3 shows the internal structure of the breast.

Breast tumors can be divided into two broad categories.

(i) Benign (Noncancerous). Benign tumors are considered noncancerous, that is, not life-threatening, although on a few occasions they can turn cancerous. A structure known as a “sac” normally separates a benign tumor from other cells, so it can be easily removed from the body.

(ii) Malignant (Cancerous). Malignant cancer starts from abnormal cell growth and might rapidly spread to or invade nearby tissue. The nuclei of malignant tissue are normally much bigger than those of normal tissue, and malignant tumors can become life-threatening in later stages.

Cancer is a life-threatening disease, and proper treatment saves lives. Identifying normal, benign, and malignant tissue is a very important step before further treatment. To distinguish benign from malignant conditions, imaging of the targeted area of the body helps doctors and physicians in further diagnosis. With advanced modern imaging techniques, the targeted part of the body can be captured more reliably. Based on whether they penetrate the skin and damage tissue, medical imaging techniques can be classified into two groups.

(i) Noninvasive. (a) Ultrasound: this imaging technique, invented by Karl Theodor Dussik [2], works on a principle similar to SOund Navigation And Ranging (SONAR): it emits very-high-frequency sound waves and records their echoes. An ultrasound imaging machine contains a Central Processing Unit (CPU), a transducer, a display unit, and a few other peripheral devices, and it can capture both 2D and 3D images. Ultrasound imaging has essentially no side-effects, with a few exceptions such as heating and the production of gas bubbles around the targeted tissue. (b) X-ray: X-ray imaging, discovered by Wilhelm Conrad Roentgen in 1895, uses electromagnetic radiation. The mammogram is a special kind of low-dose X-ray imaging technique used to capture a detailed image of the breast [3]. X-rays sometimes increase the hydrogen peroxide level of the blood, which may cause cell damage, and they may also alter DNA bases. (c) Computer Aided Tomography (CAT): CAT, or in short CT, imaging is an advanced form of X-ray imaging in which X-ray images are taken at different angles. The CT imaging technique was invented in the early 1970s and has mostly been used for three-dimensional imaging. (d) Magnetic Resonance Imaging (MRI): MRI, invented by Professor Sir Peter Mansfield, is a noninvasive imaging technique that produces a 3D image of the body using both a magnetic field and radio waves [4]. MRI takes longer to capture images, which may cause discomfort for the patient, and extra caution is needed for patients with metal implants.

(ii) Invasive. (a) Histopathological images (biopsy imaging): histopathology is the microscopic examination of tissue. For a histopathological investigation, a patient needs to undergo a number of surgical steps. Photographs taken of the histopathological tissue provide histopathological images (see Figure 4).

2. Breast Image Classification

Researchers have used various algorithms and investigation methods to study breast images from different perspectives, depending on the nature of the disease, its status, and the quality of the images. For breast image classification in particular, machine learning (ML) and Artificial Intelligence (AI) are heavily utilized. A general breast image classifier consists of five stages (see Figure 5):
(i) Selection of a breast database
(ii) Feature extraction and selection
(iii) Classifier model
(iv) Performance measuring parameter
(v) Classifier output.

Figure 5 shows a very basic breast image classifier model.

2.1. Available Breast Image Databases

Doctors and physicians rely heavily on ultrasound, MRI, X-ray, and other images to determine the current status of breast cancer. To ease the doctors’ work, some research groups are investigating how computers can be used more reliably for breast cancer diagnosis. To make a reliable decision about the cancer outcome, researchers base their investigations on well-established image databases. Various organizations have released image databases that are available to researchers for further investigation. Table 1 lists a few of the available image databases with some of their specifications.

The image formats of the databases differ: some provide images in JPEG format, while others provide DICOM-format data. The MIAS, DDSM, and Inbreast databases contain mammogram images. According to the Springer (http://www.springer.com), Elsevier (https://www.elsevier.com), and IEEE (http://www.ieeexplore.ieee.org) web sites, researchers have mostly used the MIAS and DDSM databases for breast image classification research. The numbers of conference papers published using the DDSM and MIAS databases are 110 and 168, respectively, while 82 journal papers have used the DDSM database and 136 journal papers have used the MIAS database. We have verified these statistics on both Scopus (https://www.scopus.com) and the Web of Science database (http://www.webofknowledge.com). Figure 6 shows the number of published breast image classification papers based on the MIAS and DDSM databases from 2000 to 2017.

Histopathological images provide valuable information and are intensively examined by doctors to determine the current condition of the patient. The TCGA-BRCA and BreakHis databases contain histopathological images, and a few studies have been performed on these databases too. Of the two, BreakHis is the more recent histopathological image database, containing a total of 7909 images collected from 82 patients [6]. So far, around twenty research papers have been published based on this database.

2.2. Feature Extraction and Selection

An important step in image classification is extracting features from the images. In conventional image classification, features are crafted locally using specific rules and criteria. In contrast, state-of-the-art Convolutional Neural Network (CNN) techniques generally extract features globally using kernels, and these Global Features are then used for classification. Among the local features, texture, detector, structural, and statistical features are accepted as important for breast image classification. Texture features represent the low-level information of an image and provide more detailed information than is possible from histogram information alone. More specifically, texture features provide structural and dimensional information about the color as well as the intensity of the image. The Breast Imaging-Reporting and Data System (BI-RADS) is a mammography image assessment scheme containing 6 categories, normally assigned by the radiologist. Feature detectors indicate whether a particular feature is present in the image or not. Structural features provide information about the structure and orientation of a feature, such as its area, convex hull, and centroid; in a cancer image, for example, they can give the area of a nucleus or the centroid of a mass. Statistical features, such as the mean, median, and standard deviation, summarize important properties of the data and its distribution. The complete hierarchy of image feature extraction is presented in Figure 7, and Tables 2 and 3 further summarize the local features in detail.
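To make the distinction between statistical and texture features concrete, here is a minimal NumPy sketch (our own illustration, not a method taken from the reviewed papers). It computes the mean, median, and standard deviation of a grayscale patch, plus one classic texture descriptor: the contrast of a gray-level co-occurrence matrix (GLCM). The choice of GLCM contrast, the single horizontal pixel offset, and the small integer-valued patch are assumptions made for illustration.

```python
import numpy as np

def statistical_features(img):
    """Mean, median, and standard deviation of the pixel intensities."""
    x = np.asarray(img, dtype=float).ravel()
    return {"mean": float(x.mean()), "median": float(np.median(x)), "std": float(x.std())}

def glcm(img, levels):
    """Normalized gray-level co-occurrence matrix for the horizontal offset (0, 1).

    Entry (a, b) is the fraction of horizontally adjacent pixel pairs whose left
    pixel has gray level a and right pixel has gray level b.
    Assumes img holds integers in the range [0, levels).
    """
    img = np.asarray(img, dtype=int)
    counts = np.zeros((levels, levels), dtype=float)
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(counts, (left, right), 1.0)  # accumulates repeated index pairs correctly
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast = sum of p(a, b) * (a - b)^2; large when neighboring intensities change abruptly."""
    a, b = np.indices(p.shape)
    return float(np.sum(p * (a - b) ** 2))

# hypothetical 4-level grayscale patch
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
stats = statistical_features(patch)
contrast = glcm_contrast(glcm(patch, levels=4))
```

For this patch there are 12 horizontal pixel pairs, and the contrast works out to 7/12 (about 0.583). In practice, libraries such as scikit-image provide GLCM utilities that handle several offsets and angles.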

Features extracted for classification do not all carry the same importance, and some may even degrade classifier performance. Prioritizing the feature set can reduce the complexity of the classifier model and therefore the computational time. Feature selection and prioritization methods can be classified into three broad categories:
(i) Filter: the filter method selects features without evaluating any classifier algorithm.
(ii) Wrapper: the wrapper method selects the feature set based on the performance of a particular classifier.
(iii) Embedded: the embedded method performs feature selection as part of classifier construction, combining advantages of the filter and wrapper methods.

Figure 8 shows a generalized feature selection scheme in which we have further divided the filter method into the Fisher Score, Mutual Information, Relief, and chi-square methods. The embedded method has been divided into Bridge Regularization, Lasso, and Adaptive Lasso methods, while the wrapper method has been divided into recursive feature selection and sequential feature selection methods.
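As an example of a filter method, the Fisher Score ranks each feature by the ratio of between-class to within-class scatter, without training any classifier. The NumPy sketch below is a minimal implementation of that textbook definition; the toy data (one informative and one uninformative feature) are hypothetical.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score of each feature (column of X) for class labels y.

    score_f = sum_c n_c (mu_{c,f} - mu_f)^2 / sum_c n_c sigma_{c,f}^2
    Higher scores mean the feature separates the classes better.
    A feature that is constant within every class gives a zero denominator,
    so such degenerate features should be handled separately.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # between-class scatter per feature
    within = np.zeros(X.shape[1])   # within-class scatter per feature
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / within

X = [[1, 5], [2, 3], [3, 5], [10, 4], [11, 6], [12, 4]]
y = [0, 0, 0, 1, 1, 1]
scores = fisher_score(X, y)
ranking = np.argsort(-scores)  # feature indices, best first
```

Here feature 0 scores 30.375 and feature 1 only 0.03125, so a filter that keeps the top-ranked feature would keep feature 0. A wrapper method would instead retrain a classifier for each candidate subset, which is more expensive but accounts for feature interactions.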

2.3. Classifier Model

From a learning point of view, breast image classification techniques can be categorized into the following three classes [41]:
(i) Supervised
(ii) Unsupervised
(iii) Semisupervised.

Each of these three classes can be further split into Deep Neural Network (DNN) and conventional (non-DNN) classifiers, and into further subclasses, as shown in Table 4.

2.4. Performance Measuring Parameter

A Confusion Matrix is a two-dimensional table that gives a visual summary of the results of a classification experiment [54]. The $(i, j)$th entry of the table indicates the number of times that an object of the $i$th class is classified as the $j$th class. The diagonal of this matrix therefore gives the number of times objects are correctly classified. Figure 9 shows a graphical representation of a Confusion Matrix for the binary classification case.

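From this matrix, the following performance parameters can be computed, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:
(i) Recall is defined as $$\text{Recall} = \frac{TP}{TP + FN}$$
(ii) Precision is defined as $$\text{Precision} = \frac{TP}{TP + FP}$$
(iii) Specificity is defined as $$\text{Specificity} = \frac{TN}{TN + FP}$$
(iv) Accuracy is defined as $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
(v) F-1 score is defined as $$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
(vi) Matthews Correlation Coefficient (MCC): MCC is a performance parameter of a binary classifier, in the range −1 to +1. The closer the MCC value is to +1, the more accurate the classifier; values near 0 indicate no better than random prediction, and values trending towards −1 indicate the opposite of the true labels. MCC can be defined as $$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$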
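The Recall, Precision, Specificity, Accuracy, F-1, and MCC definitions can be checked with a few lines of plain Python. This is a sketch assuming binary labels with 1 as the positive (malignant) class; the labels are hypothetical, and zero denominators are not handled except for MCC, which is set to 0 in that case (a common convention).

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary problem, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Recall, Precision, Specificity, Accuracy, F1, and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # 0 when a row/column total is zero
    return {"recall": recall, "precision": precision, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}

# 1 = malignant (positive), 0 = benign; hypothetical labels
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(classification_metrics(*confusion_counts(y_true, y_pred)))
```

With these six labels there are 2 TP, 1 FP, 2 TN, and 1 FN, so every ratio-based metric equals 2/3 while MCC is only 1/3, which illustrates why MCC is often reported alongside Accuracy.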

3. Performance of Different Classifier Models on Breast Image Datasets

Different research groups have performed classification on different image databases using Supervised, Semisupervised, and Unsupervised methods. In this section we summarize a few of these breast image classification studies.

3.1. Performance Based on Supervised Learning

In supervised learning, a general hypothesis is built from externally supplied instances and then used to make future predictions. For a supervised classification task, features are extracted (or automatically crafted) from the available dataset, and each sample is mapped to a dedicated class. A hypothesis is then created from the features and their labels, and unknown data are classified based on this hypothesis [55].

Figure 10 shows an overall supervised classifier architecture. In general, the whole dataset is split into training and testing parts; sometimes a separate validation part is also held out. After splitting the data, the most important step is finding appropriate features to classify the data with the highest possible Accuracy. Feature-finding approaches fall into two categories: locally crafted and globally crafted. Locally crafted means that the features are designed by hand, whereas globally crafted means that a kernel method is used to extract the features. Handcrafted features can be prioritized, whereas Global Feature selection does not have this luxury.
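A minimal plain-Python sketch of the training/validation/testing split follows. The 70/15/15 proportions and the fixed random seed are assumptions for illustration, and the split is not stratified by class.

```python
import random

def train_val_test_split(samples, labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the indices once, then cut them into disjoint test, validation, and training parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed gives a reproducible split
    n_test = round(len(idx) * test_frac)
    n_val = round(len(idx) * val_frac)
    parts = (idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test])

    def take(ids):
        return [samples[i] for i in ids], [labels[i] for i in ids]

    # ((X_train, y_train), (X_val, y_val), (X_test, y_test))
    return tuple(take(p) for p in parts)

images = [f"img_{i:03d}" for i in range(100)]  # hypothetical image identifiers
classes = [i % 2 for i in range(100)]          # hypothetical 0/1 labels
(train_x, train_y), (val_x, val_y), (test_x, test_y) = train_val_test_split(images, classes)
```

When several images come from the same patient (BreakHis, for example, has 7909 images from 82 patients), it is usually safer to split by patient rather than by image, so that images of one patient do not appear in both the training and testing parts.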

3.1.1. Conventional Neural Network

The Neural Network (NN) concept comes from the working principle of the human brain. A biological neuron consists of the following four parts:
(i) Dendrites
(ii) Nucleus
(iii) Cell body
(iv) Axon.

As shown in Figure 11, dendrites collect signals, which are processed by the cell body, and the axon carries the resulting signal on to the dendrites of the next neuron. Based on this working principle, Rosenblatt proposed the perceptron model in 1957 [56]. A single-layer perceptron linearly combines the input signals and makes a decision using a threshold function. Building on this principle with more advanced mechanisms, NN methods have become an established tool for many problems. Figure 12 shows the basic working principle of NN techniques.

In the NN model, each input $x_i$ is first multiplied by its weight $w_i$, and the output is then calculated as
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right),$$
where $b$ is a bias term.

The function $f$ is known as the activation function; it can be a threshold function, a Sigmoid, a hyperbolic tangent, and so forth. Feed-forward Neural Network techniques were introduced first [57]; later, the backpropagation method was developed to use error information to improve the system performance [58, 59].
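The weighted-sum-plus-activation neuron can be illustrated with a small perceptron in plain Python. This is a sketch of the classic perceptron learning rule (weights nudged by the error on each sample), not of backpropagation; the learning rate, number of epochs, and the logical-AND toy data are assumptions.

```python
def step(z):
    """Threshold activation: output 1 when the weighted sum reaches 0, else 0."""
    return 1 if z >= 0 else 0

def neuron(x, w, b, f=step):
    """y = f(sum_i w_i * x_i + b)"""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron learning rule for 0/1 targets."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = t - neuron(x, w, b)  # 0 if correct, +1 or -1 otherwise
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Logical AND is linearly separable, so a single perceptron can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([neuron(x, w, b) for x in X])  # [0, 0, 0, 1]
```

A single perceptron can only learn linearly separable problems such as AND; problems like XOR need hidden layers, which is where multilayer networks and backpropagation come in.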

The history of breast image classification by NN is a long one. To the best of our knowledge, much of the pioneering work was performed by Dawson et al. in 1991 [60]. Since then, NN has been one of the strongest tools for breast image classification. We have summarized some of the work on NN-based breast image classification in Tables 5, 6, and 7.

3.1.2. Deep Neural Network

Deep Neural Network (DNN) is a state-of-the-art concept in which conventional NN techniques are extended with advanced engineering, most notably many more layers. Conventional NNs have difficulty solving complex problems that DNNs can solve with much higher precision; however, DNNs require more training time and computation than conventional NNs. The main DNN architectures include:
(i) Convolutional Neural Network (CNN)
(ii) Deep Belief Network (DBN)
(iii) Generative Adversarial Network (GAN)
(iv) Recurrent Neural Network (RNN)

Convolutional Neural Network. A CNN model is a combination of several intermediate mathematical structures, which create (or help to create) its different layers:

(i) Convolutional Layer. The convolutional layer is considered the most important part of a CNN model and can be regarded as its backbone. A small kernel (filter) of fixed size is scanned across the input data to perform the convolution operation, which ensures the local connectivity and weight sharing properties.

(ii) Stride and Padding. In the convolution operation, a filter scans across the input matrix. The number of positions the kernel moves at each step is known as the stride; by default, the stride is 1. With an inappropriate choice of stride, the model can lose border information. To overcome this issue, extra rows and columns containing only zeros are added around the borders of the input matrix; this is known as zero padding.

(iii) Nonlinear Operation. The output of each kernel operation is passed through a nonlinear activation function such as the Rectified Linear Unit (ReLU), Leaky-ReLU, TanH, or Sigmoid. The Sigmoid function can be defined as
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
and the tanh function can be defined as
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$$
However, ReLU is considered the most effective of these. ReLU sets every value less than or equal to zero to zero and passes all other values unchanged, as shown in Figure 13:
$$\text{ReLU}(x) = \max(0, x).$$
Another important nonlinear function is Leaky-ReLU,
$$\text{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0, \end{cases}$$
where $\alpha$ is a small predetermined parameter that can be tuned to give a better model.

(iv) Subsampling. Subsampling, also known as pooling, reduces the dimensionality of each feature map of a particular layer. By reducing the amount of feature information, it lowers the overall computational complexity of the model. Pooling is applied over small patches (windows) of the feature map. The two most popular pooling methods are:
(a) Max-Pooling
(b) Average Pooling.

In Max-Pooling, only the maximum value within each kernel window is kept for further calculation. Consider the example of a 4 × 4 image (16 pixel values) shown in Figure 14. A 2 × 2 kernel is applied to the whole image, dividing it into 4 non-overlapping blocks, and produces a 2 × 2 output image. For each block of four values, the maximum is selected: from blocks one, two, three, and four, the maximum values 4, 40, 13, and 8 are selected, respectively. In the Average Pooling operation, each kernel window outputs the average of its values.

(v) Dropout. Regularizing the weights can reduce the overfitting problem. Randomly removing some neurons during training also regularizes the model and reduces overfitting; this technique of randomly removing neurons from the network is known as dropout.

(vi) Soft-Max Layer. This layer applies the normalized exponential (Soft-Max) function to the outputs of the last layer to obtain class probabilities, which are then used to calculate the loss function for classification.
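The layer operations (i) to (v) can be sketched with NumPy as below. This is a minimal, single-channel illustration written for clarity rather than speed; real CNN libraries implement the same ideas with multi-channel, batched, and heavily optimized kernels. The function names, the Leaky-ReLU default of 0.01, and the use of inverted dropout (rescaling the kept units by 1/(1 - p) during training) are our assumptions. The 4 × 4 input is a hypothetical matrix chosen so that its 2 × 2 block maxima are 4, 40, 13, and 8 as in the pooling example; it is not the exact content of Figure 14.

```python
import numpy as np

def conv2d(x, k, stride=1, pad=0):
    """Single-channel 2-D convolution as used in CNN layers.

    Like most deep-learning libraries, the kernel is slid without flipping
    (strictly, cross-correlation). Zeros are padded on all four borders.
    Output size per axis: (N + 2*pad - K) // stride + 1.
    """
    x = np.pad(np.asarray(x, dtype=float), pad)  # zero padding (constant mode)
    k = np.asarray(k, dtype=float)
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * k)  # weight sharing: same kernel at every position
    return out

def relu(z):
    return np.maximum(np.asarray(z, dtype=float), 0.0)

def leaky_relu(z, alpha=0.01):
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling (stride == size); assumes both dimensions divide by size."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")

def dropout(a, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors by 1/(1 - p), so nothing changes at test time."""
    if not training or p == 0.0:
        return a
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(np.shape(a)) >= p
    return np.asarray(a, dtype=float) * keep / (1.0 - p)

# hypothetical 4 x 4 input whose 2 x 2 blocks have maxima 4, 40, 13 and 8
img = np.array([[1, 4, 40, 2],
                [3, 2, 5, 6],
                [13, 1, 8, 7],
                [0, 9, 3, 2]])
pooled = pool2d(img, mode="max")  # 2 x 2 array of block maxima: 4, 40, 13, 8
```

With a 3 × 3 kernel on a 4 × 4 input, conv2d returns a 2 × 2 map without padding and a 4 × 4 map with pad=1, in line with the output-size formula in its docstring.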

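Figure 15 shows a generalized CNN model for image classification. Every neuron of the layer immediately preceding a fully connected layer is connected to every neuron of that fully connected layer, as in a conventional Neural Network. Let $x_i^{l}$ represent the $i$th feature map at layer $l$. The $j$th feature map at layer $l+1$ can be represented as
$$x_j^{l+1} = f\left(\sum_{i=1}^{N_l} x_i^{l} * k_{ij}^{l} + b_j^{l}\right),$$
where $N_l$ represents the number of feature maps at the $l$th layer, $k_{ij}^{l}$ represents the kernel, $*$ denotes the convolution operation, $b_j^{l}$ represents the bias at layer $l$, and $f$ performs a nonlinear function operation. The layer before the Soft-Max Layer produces one score per class,
$$z_c = \mathbf{w}_c^{\top}\mathbf{h} + b_c, \qquad c = 1, 2,$$
where $\mathbf{h}$ is the output of the preceding fully connected layer and $\mathbf{w}_c$ and $b_c$ are the weights and bias for class $c$. As we are working on a binary classification, the Soft-Max regression normalized output can be represented as
$$p_c = \frac{e^{z_c}}{e^{z_1} + e^{z_2}}, \qquad c = 1, 2.$$
Let $c = 1$ represent the Benign class and $c = 2$ represent the Malignant class. The cross-entropy loss of the above function can be calculated as
$$L = -\sum_{c=1}^{2} y_c \log p_c,$$
where $y_c = 1$ if the sample belongs to class $c$ and $y_c = 0$ otherwise.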

Because the loss for a class is the negative logarithm of its Soft-Max probability, whichever class would incur the larger loss has the smaller probability, so the model predicts the other class.
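The Soft-Max output, the cross-entropy loss, and the resulting decision for the Benign/Malignant case can be sketched in plain Python as follows. The two scores are hypothetical final-layer outputs, and Python indices 0 and 1 stand for Benign and Malignant; subtracting the maximum score (the log-sum-exp trick) is a standard way to keep the computation numerically stable.

```python
import math

CLASSES = ("benign", "malignant")  # index 0 -> Benign, index 1 -> Malignant

def softmax(z):
    """Soft-Max of a list of scores; subtracting max(z) avoids overflow in exp."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(z, label):
    """-log p_label, computed from the raw scores via log-sum-exp."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[label]

def predict(z):
    """Pick the class with the largest Soft-Max probability."""
    p = softmax(z)
    return CLASSES[p.index(max(p))]

scores = [2.0, 0.0]  # hypothetical final-layer outputs for one image
print(softmax(scores))                                     # approx. [0.881, 0.119]
print(cross_entropy(scores, 0), cross_entropy(scores, 1))  # small loss if benign, larger if malignant
print(predict(scores))                                     # benign
```
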

One practical difficulty of working with DNNs is that they require specialized software packages for data analysis. A few research groups have been working on how DNNs can analyze data effectively from different perspectives and for different needs. Table 8 summarizes some of the software available for DNN analysis.

The history of the CNN and its use for biomedical image analysis is a long one. Fukushima first introduced a CNN named the “neocognitron”, which can recognize stimulus patterns despite shifts in their position [113]. To the best of our knowledge, Wu et al. were the first to classify a set of mammogram images into malignant and benign classes using a CNN model [78]; their proposed model used only one hidden layer. Later, in 1996, Sahiner et al. used a CNN model to classify mass and normal breast tissue and achieved an ROC score of 0.87 [79]. In 2002, Lo et al. used a Multiple Circular Path CNN (MCPCNN) for tumor identification in mammogram images and obtained an ROC score of around 0.89. After a period with little investigation, the CNN model regained momentum after the work of Krizhevsky et al. [114], whose proposed model is known as AlexNet. This work brought about a revolutionary change in image classification and analysis. Building on AlexNet, the paper titled “Going Deeper with Convolutions” by Szegedy et al. [115] introduced the GoogLeNet model, which contains a much deeper network than AlexNet. Subsequently, ResNet [116], Inception [117], Inception-v4, Inception-ResNet [118], and a few other models have been introduced.

Later, these DNN models were adapted, directly or with some modifications, for biomedical image analysis. In 2015, Fonseca et al. [81] classified breast density using CNN techniques. A CNN requires a sufficient amount of data for training, and it is very difficult to find enough medical data to train a CNN model. A pretrained CNN model with some fine-tuning can be used instead of creating a model from scratch [119]. The authors of [119] did not perform their experiments on a breast cancer image dataset; however, they performed experiments on three different medical datasets with layer-wise training and claimed that “retrained CNN along with adequate training can provide better or at least the same amount of performance.”

The Deep Belief Network (DBN) is another branch of Deep Neural Networks, built mainly from Restricted Boltzmann Machines (RBMs). The DBN method was first used for supervised image classification by Liu et al. [120]. Abdel-Zaher and Eldeib later used the DBN method for breast image classification [121]. This field is still not fully explored for breast image classification. Zhang et al. used both RBM and Point-Wise Gated RBM (PRBM) for shear-wave elastography image classification on a dataset containing 227 images [97], achieving classification Accuracy, Sensitivity, and Specificity of 93.40%, 88.60%, and 97.10%, respectively. Tables 9, 10, and 11 summarize the most recent work on breast image classification along with some pioneering work on CNN.

3.1.3. Logic Based Algorithm

A Logic Based algorithm is a very popular and effective classification method that follows the tree structure principle and logical arguments, as shown in Figure 16. It classifies instances based on the values of their features. Among other components, a decision-tree based algorithm contains the following:
(i) Root node: a root node has no incoming edge, and it may or may not have outgoing edges.
(ii) Splitting: splitting is the process of subdividing a set of cases into particular groups. The following criteria are normally used for splitting: (a) information gain, (b) Gini index, and (c) chi-squared.
(iii) Decision node: an internal node at which the cases are split further.
(iv) Leaf/terminal node: this kind of node has exactly one incoming edge and no outgoing edge. The tree always terminates here with a decision.
(v) Pruning: pruning is the process of removing subtrees from the tree, performed to reduce overfitting. Two kinds of pruning techniques are available: (a) prepruning and (b) postpruning.
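The information gain and Gini index splitting criteria can be computed directly from the class labels at a node. The following plain-Python sketch uses the standard definitions; the benign (B) / malignant (M) labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the class distribution in `labels`."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["B", "B", "M", "M"]                               # labels at a node
print(entropy(parent), gini(parent))                        # 1.0 0.5
print(information_gain(parent, [["B", "B"], ["M", "M"]]))   # 1.0 (perfect split)
print(information_gain(parent, [["B", "M"], ["B", "M"]]))   # 0.0 (useless split)
```

A tree-growing algorithm such as ID3 evaluates the gain of every candidate split and keeps the best one; C4.5 uses a normalized variant (gain ratio) and adds pruning.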

Among all tree based algorithms, Iterative Dichotomiser 3 (ID3), proposed by Quinlan [149], can be considered a pioneer. The ID3 algorithm searches for a tree that fits the training data as closely as possible, which makes it very prone to overfitting. To overcome this limitation, Quinlan introduced the C4.5 algorithm [150], which adds a pruning method to control overfitting. Pritom et al. [151] classified the Wisconsin breast dataset using 35 features. When they ranked the features, they obtained 76.30% Accuracy, a 75.10% False Positive Rate, and an ROC score of 0.745; without ranking the features, they obtained 73.70% Accuracy, a 50.70% False Positive Rate, and an ROC score of 52.80. Asri et al. [152] used the C4.5 algorithm to classify the Wisconsin database with 11 features and obtained 91.13% Accuracy.

Logic Based algorithms allow us to build more than one tree and combine their decisions for a better result; this mechanism is known as an ensemble method. An ensemble method combines more than one classifier hypothesis and produces more reliable results through voting. Boosting and bagging are two well-known ensemble methods, and both aggregate trees. The difference is that in bagging, successive trees do not depend on the preceding trees, whereas in boosting, each successive tree depends on information gathered from the preceding trees. Gradient boosting is a very popular method for data classification [153, 154], and a state-of-the-art boosting algorithm, “Extreme Gradient Boosting” (XGBoost), is very effective for data classification [155]. Interestingly, not a single paper has been published on breast image classification using the XGBoost algorithm. Along with boosting, different bagging methods are available; among them, Random Forest (RF) is very popular, in which a large number of uncorrelated trees are aggregated for better prediction. Tables 12 and 13 summarize a set of papers in which a Logic Based algorithm has been used for image classification.
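To illustrate bagging, the sketch below trains decision stumps (one-split trees, used instead of full trees to keep the example short) on bootstrap samples and combines them by majority vote. It is a minimal plain-Python sketch assuming 0/1 labels; a Random Forest would additionally pick a random subset of features at each split, and a boosting method would instead train the trees sequentially, each focusing on the errors of its predecessors.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Depth-1 decision tree for 0/1 labels: choose the feature, threshold,
    and orientation with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # candidate thresholds: midpoints between consecutive distinct values
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values[:1]
        for t in thresholds:
            for left in (0, 1):  # class predicted when x[f] <= t
                errors = sum((left if x[f] <= t else 1 - left) != yi for x, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left)
    _, f, t, left = best
    return lambda x: left if x[f] <= t else 1 - left

def bagging(X, y, n_estimators=25, seed=0):
    """Each stump is trained independently on a bootstrap sample
    (n rows drawn with replacement); predictions are combined by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict

X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]  # hypothetical single-feature data
y = [0, 0, 0, 1, 1, 1]
clf = bagging(X, y)
print(clf([0.0]), clf([10.0]))
```

Because each bootstrap sample sees a slightly different subset of the data, the individual stumps differ, and the vote smooths out their individual mistakes.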

3.1.4. Support Vector Machine (SVM)

SVMs were proposed by Vapnik and are grounded in Vapnik-Chervonenkis (VC) theory. Unlike Bayesian classification techniques, this technique does not require any prior knowledge of the data distribution. In many practical situations, the distribution of the features is not available; in such cases, SVM can be used to classify the available data into different classes.

Consider the set of two-dimensional data plotted in Figure 17, where one marker represents data belonging to Class-1 and “□” represents data belonging to Class-2. A hyperplane has been drawn that separates the data into the two classes. Interestingly, infinitely many hyperplanes can separate such data; the SVM chooses the one with the maximum margin, that is, the largest distance to the nearest training points of either class.

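Let $\mathbf{x}_i \in \mathbb{R}^{m}$, where $i = 1, \dots, M$, be data to be classified into two classes $\omega_1$ and $\omega_2$. Suppose that the classes $\omega_1$ and $\omega_2$ are labeled $y_i = +1$ and $y_i = -1$, respectively. Classification of these data can be written as
$$\mathbf{w}^{\top}\mathbf{x}_i + b > 0 \ \text{for}\ y_i = +1, \qquad \mathbf{w}^{\top}\mathbf{x}_i + b < 0 \ \text{for}\ y_i = -1.$$
During the learning stage, the SVM finds the parameters $\mathbf{w}$ and $b$ to produce a decision function $D(\mathbf{x})$:
$$D(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b,$$
where $\mathbf{w} \in \mathbb{R}^{m}$ and $b \in \mathbb{R}$. As the training data are linearly separable, no training data will satisfy the condition
$$\mathbf{w}^{\top}\mathbf{x}_i + b = 0.$$

To control the separability, we consider the following inequalities:
$$\mathbf{w}^{\top}\mathbf{x}_i + b \ge 1 \ \text{for}\ y_i = +1, \qquad \mathbf{w}^{\top}\mathbf{x}_i + b \le -1 \ \text{for}\ y_i = -1,$$
which can be written compactly as $y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1$ for $i = 1, \dots, M$.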
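Training a linear SVM means finding $\mathbf{w}$ and $b$ so that $y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1$ holds for (ideally) all training points while the margin is as large as possible. Library solvers such as LIBSVM solve this as a quadratic program; the NumPy sketch below instead uses a simple full-batch subgradient method on the equivalent soft-margin hinge-loss objective. The regularization weight, learning rate, number of epochs, and the toy data are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM by full-batch subgradient descent on
        lam/2 * ||w||^2 + mean_i max(0, 1 - y_i (w.x_i + b)),
    whose hinge term is zero exactly when y_i (w.x_i + b) >= 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points violating y_i (w.x_i + b) >= 1
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(X)
        grad_b = -y[viol].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def decision_function(X, w, b):
    """D(x) = w.x + b; the predicted class is its sign."""
    return np.asarray(X, dtype=float) @ w + b

# hypothetical linearly separable 2-D data with labels +1 / -1
X = [[2, 2], [3, 3], [3, 1], [-2, -2], [-3, -1], [-1, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(np.sign(decision_function(X, w, b)))
```

The small regularization weight lam trades margin width against training violations; as lam shrinks on separable data, the solution approaches the hard-margin constraints above.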

Sometimes it is very difficult to find a hyperplane that perfectly separates the data, but if the data are transformed into a higher dimension, they may become easily separable. To separate this kind of data, a kernel function can be introduced.

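Kernel Methods. Assume a transformation $\phi: \mathbb{R}^{m} \to \mathbb{R}^{l}$ that maps the dataset $\{\mathbf{x}_i\}$ into the dataset $\{\phi(\mathbf{x}_i)\}$, where $l > m$. Now train the linear SVM on the dataset $\{\phi(\mathbf{x}_i)\}$ to get a new classifier $D_{\phi}(\mathbf{x}) = \mathbf{w}^{\top}\phi(\mathbf{x}) + b$.

A kernel effectively computes a dot product in the higher-dimensional space $\mathbb{R}^{l}$. For $\mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^{m}$, $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$ is an inner product in $\mathbb{R}^{l}$, where $\phi$ transforms $\mathbb{R}^{m}$ to $\mathbb{R}^{l}$. Commonly used kernels include the following, where $\gamma > 0$, $r$, and $d$ are kernel parameters:
(i) Radial basis function kernel (rbf): $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right)$.
(ii) Polynomial kernel (polynomial): $K(\mathbf{x}_i, \mathbf{x}_j) = \left(\gamma\, \mathbf{x}_i^{\top}\mathbf{x}_j + r\right)^{d}$.
(iii) Sigmoid kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left(\gamma\, \mathbf{x}_i^{\top}\mathbf{x}_j + r\right)$.
(iv) Linear kernel (linear): $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\top}\mathbf{x}_j$.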
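The rbf, polynomial, sigmoid, and linear kernels can each be written in a line or two of NumPy. The sketch also checks numerically that, in two dimensions, the degree-2 polynomial kernel with gamma = 1 and r = 0 equals an ordinary dot product after the explicit mapping phi(x) = (x1², √2·x1·x2, x2²), which is exactly what computing a dot product in a higher-dimensional space means. The parameter names mirror the convention of common SVM libraries, but the function names are our own.

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=1.0):
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=2):
    return float((gamma * np.dot(xi, xj) + r) ** d)

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return float(np.tanh(gamma * np.dot(xi, xj) + r))

def linear_kernel(xi, xj):
    return float(np.dot(xi, xj))

def phi_poly2(x):
    """Explicit 3-D feature map of the 2-D degree-2 polynomial kernel (gamma=1, r=0)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x, z = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(x, z))             # 121.0, i.e. (1*3 + 2*4)^2
print(float(phi_poly2(x) @ phi_poly2(z)))  # same value, up to rounding, via the explicit map
```

The kernel never builds phi(x) explicitly, which is what makes very high-dimensional (or, for the rbf kernel, infinite-dimensional) feature spaces affordable.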

The advantage of the kernel method for breast cancer image classification with an SVM was first demonstrated by El-Naqa et al. [156]. They classified microcalcification (MC) clusters in mammogram images (76 images containing a total of 1120 MCs were used in the experiment), using the SVM method with both the Gaussian kernel and the polynomial kernel. In 2003, Chang et al. classified a database of 250 sonography images using SVM techniques, taking into account that the images are corrupted by speckle noise [157]; their achieved Accuracy was 93.20%. In another study, a total of thirteen features, including shape, Laws texture, and gradient features, were used with SVM and a Gaussian kernel for mammogram image classification; the authors performed their experiments on 193 mammogram images and achieved 83.70% sensitivity and a 30.20% False Positive Rate [158]. B. Sing et al. combined SVM with the NN method for ultrasound breast image classification on a database containing 178 images, using a hybrid feature selection method to select the best features [159].

Breast ultrasound images are usually very complex. The Multiple Instance Learning (MIL) algorithm was first used with SVM for breast image classification in [176], with an obtained Accuracy of 91.07%. In [177], the Concentric Circle BOW method was used to extract features, and the SVM method was then used for breast image classification; the achieved Accuracy was 88.33% with a feature dimension of 1000. Mhala and Bhandari [178] extracted a Bag of Features from histopathological images (using SIFT and DCT) and used SVM for classification. Their experiment was performed on a database of 361 images, of which 119 are normal, 102 are ductal carcinoma in situ, and the rest are invasive carcinoma. They achieved 100.00% classification Accuracy for ductal carcinoma in situ, 98.88% for invasive carcinoma, and 100.00% for normal images. Hiba et al. [179] classified a mammogram (DDSM) image database using SVM with the Bag of Features method: the authors first extracted LBP and quantized the binary pattern information to obtain features. Their obtained Accuracy was 91.25%.
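Since LBP features come up repeatedly in these studies, here is a minimal NumPy sketch of the basic 3 × 3 Local Binary Pattern operator and its normalized 256-bin histogram, which could then serve as the feature vector for an SVM. The neighbor ordering and the greater-than-or-equal comparison are one common convention; the quantization and Bag of Features steps used by Hiba et al. [179] are not reproduced here.

```python
import numpy as np

# Eight neighbors, clockwise from the top-left pixel (one common ordering convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp(img):
    """Basic 3x3 Local Binary Pattern code for every interior pixel.

    Bit k is 1 when the k-th neighbor is >= the center pixel, giving codes 0..255.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]  # shifted view aligned with center
        codes += (neighbor >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes: a fixed-length feature vector."""
    codes = lbp(img).ravel()
    return np.bincount(codes, minlength=256) / codes.size

# hypothetical 8-bit grayscale patch
patch = np.random.default_rng(0).integers(0, 256, size=(32, 32))
features = lbp_histogram(patch)  # 256 values summing to 1
```
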

Along with the above-mentioned work different breast image databases have been analyzed and classified using SVM. We have summarized some of the work related to SVM in Tables 14, 15, and 16.

3.1.5. Bayesian

A Bayesian classifier is a statistical method based on Bayes’ theorem. It does not follow any explicit decision rule; instead, it depends on estimating probabilities. The Naive Bayes method can be considered one of the earliest Bayesian learning algorithms.

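The Naive Bayes (NB) method works on the basis of the Bayes formula, where each of the features is considered statistically independent. Consider a dataset with $N$ samples, with each sample containing a feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with $n$ features [180] and belonging to a particular class $c_k$. According to the NB formula, the probability of the particular class $c_k$ given the feature vector $\mathbf{x}$ is represented as

$$P(c_k \mid \mathbf{x}) = \frac{P(c_k)\, P(\mathbf{x} \mid c_k)}{P(\mathbf{x})}.$$

Applying the chain rule to the joint probability gives

$$P(c_k, x_1, \ldots, x_n) = P(x_1 \mid x_2, \ldots, x_n, c_k)\, P(x_2 \mid x_3, \ldots, x_n, c_k) \cdots P(x_n \mid c_k)\, P(c_k).$$

The NB theorem considers all the features as independent given the class, i.e., $P(x_i \mid x_{i+1}, \ldots, x_n, c_k) = P(x_i \mid c_k)$, which can be represented as

$$P(c_k \mid \mathbf{x}) \propto P(c_k) \prod_{i=1}^{n} P(x_i \mid c_k).$$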
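The factorized posterior above translates directly into a classifier. The sketch below implements the Gaussian Naive Bayes variant, which assumes each $P(x_i \mid c_k)$ is a normal distribution estimated per class; it works in log space to avoid underflow when many probabilities are multiplied. The toy data, the class names, and the small `var_eps` smoothing term (added so a zero variance cannot cause division by zero) are hypothetical choices for illustration.

```python
import math
from collections import defaultdict

# Gaussian Naive Bayes sketch: predict argmax_c [log P(c) + sum_i log P(x_i | c)],
# with each P(x_i | c) modelled as an independent normal distribution.
# ASSUMPTION: toy data and var_eps smoothing are illustrative choices.

def fit_gnb(X, y, var_eps=1e-9):
    by_class = defaultdict(list)
    for xs, label in zip(X, y):
        by_class[label].append(xs)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + var_eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)   # prior, means, variances
    return model

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gnb(model, xs):
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(xs, means, variances))
    return max(model, key=log_posterior)

X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],    # hypothetical "benign" samples
     [4.0, 5.0], [4.2, 5.3], [3.9, 4.8]]    # hypothetical "malignant" samples
y = ["benign"] * 3 + ["malignant"] * 3
model = fit_gnb(X, y)
print(predict_gnb(model, [1.1, 2.1]))   # benign
print(predict_gnb(model, [4.1, 5.1]))   # malignant
```

The denominator $P(\mathbf{x})$ is dropped because it is the same for every class and does not change the argmax.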

The NB method is very easy to construct and very fast at predicting the data. This method can also use kernel density estimates for the feature distributions. However, it performs poorly on large datasets and continuous data. NB can be divided into the following subclasses: (i) Gaussian Naive Bayes, (ii) Multinomial Naive Bayes, and (iii) Bernoulli Naive Bayes.

One of the constraints of the NB classifier is its assumption that all the features are conditionally independent. A Bayesian Network is another Bayesian classifier which can overcome this constraint [181, 182]. The literature shows that Bayesian classifiers have not been used much for breast image classification. In 2003, Butler et al. used an NB classifier for X-ray breast image classification [183]. They extracted features from the low-level pixels and obtained more than 90.00% Accuracy for all feature combinations. Bayesian structural learning was utilized for a breast lesion classifier by Fischer et al. [184]. Soria et al. [185] classified a breast cancer dataset using C4.5, a multilayer perceptron, and the NB algorithm in the WEKA software [186]. They concluded that the NB method performed better than the other two methods in that particular case. They also compared their results with the output of a Bayes classifier. Some other research on Bayes classifiers and breast image classification is summarized in Tables 17 and 18.

3.2. Performance Based on Unsupervised Learning

This learning algorithm does not require any prior knowledge about the target. The main goal of unsupervised learning is to find the hidden structure and relations among the data [187] and to distribute the data into different clusters. Clustering is basically a statistical process in which a set of data points is partitioned into a set of groups, each known as a cluster. The k-means algorithm is a clustering algorithm proposed by [188]. Interestingly, unsupervised learning can also be used as a preprocessing step.

(i) k-means: the algorithm first assigns $k$ centroid points. Suppose that we have $n$ feature points $x_i$, where $i = 1, \ldots, n$. The objective of the k-means algorithm is to find $k$ positions $\mu_j$, where $j = 1, \ldots, k$, that minimize the distance between the data points and their cluster centroids by solving

$$\underset{\mu_1, \ldots, \mu_k}{\arg\min} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,$$

where $C_j$ is the set of points assigned to cluster $j$.

(ii) Self-Organizing Map (SOM): SOM is another popular unsupervised classifier, proposed by Kohonen et al. [189–191]. The main idea of the SOM method is to reduce the dimension of the data and represent the dimensionally reduced data by a map architecture, which provides more visual information.

(iii) Fuzzy c-Means Clustering (FCM): the FCM algorithm, which clusters data based on the value of a membership function, was proposed by [192] and improved by Bezdek [193].
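The k-means objective is usually minimized with Lloyd's algorithm, which alternates an assignment step and a centroid-update step until the assignments stop changing. The pure-Python sketch below assumes Euclidean distance and initializes the centroids from $k$ randomly chosen data points with a fixed seed; the function name and toy data are hypothetical, and Lloyd's algorithm only guarantees a local minimum of the objective.

```python
import math
import random

# Lloyd's algorithm for k-means, minimizing sum_j sum_{x in C_j} ||x - mu_j||^2.
# ASSUMPTIONS: Euclidean distance, centroids initialized from k random data
# points (fixed seed), and empty clusters keep their previous centroid.

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            [sum(coord) / len(members) for coord in zip(*members)]
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: centroids no longer move
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
              for p in points]
    return centroids, labels

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)   # the first three and last three points share labels
```

Because the result depends on initialization, practical implementations usually run several restarts and keep the solution with the lowest objective.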

The history of using unsupervised learning for breast image classification is a long one. In 2000, Cahoon et al. [194] classified mammogram breast images (DDSM database) in an unsupervised manner, using the k-NN and Fuzzy c-Means (FCM) methods. Chen et al. classified a set of breast images into benign and malignant classes using a SOM procedure [164]. They extracted 24 autocorrelation textural features and used 10-fold cross-validation. Markey et al. utilized the SOM method to classify 4435 BI-RADS samples [195]. Tables 19 and 20 summarize the breast image classification performance based on the k-means algorithm and the SOM method.
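Unlike k-means, Fuzzy c-Means (FCM) gives every point a graded membership in each cluster. The sketch below follows the standard Bezdek-style updates: centres are membership-weighted means with weights $u_{ij}^m$, and memberships are $u_{ij} = 1 / \sum_l (d_{ij}/d_{il})^{2/(m-1)}$. The fuzzifier `m = 2`, the random initial partition, the tolerance, and the toy points are assumptions for illustration, not the settings of Cahoon et al. [194].

```python
import math
import random

# Fuzzy c-Means sketch. ASSUMPTIONS: Euclidean distance, fuzzifier m = 2,
# random initial memberships (fixed seed), tolerance-based stopping.

def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centres, u) where u[i][j] is the degree to which point i
    belongs to cluster j; each row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):                       # random initial fuzzy partition
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centres = []
    for _ in range(iters):
        # Centre update: mean of all points weighted by u_ij ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centres.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centres]
            if min(dists) == 0.0:            # point coincides with a centre
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dj / dl) ** (2 / (m - 1)) for dl in dists)
                       for dj in dists]
            s = sum(row)
            new_u.append([r / s for r in row])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:                      # memberships have stabilised
            break
    return centres, u

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centres, u = fcm(points, c=2)
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

Taking the argmax of each membership row recovers a hard clustering when one is needed.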

3.3. Performance Based on Semisupervised Learning

The working principle of semisupervised learning lies between supervised and unsupervised learning. In semisupervised learning, a few input data have an associated target, while large amounts of data are not labeled [196]. Collecting labeled data is often very difficult, and some data, such as speech or information scraped from the web, are hard to label. Semisupervised learning is very efficient for classifying this kind of data, and lately it has been utilized for breast image classification too. Semisupervised learning can be classified as (i) Graph Based (GB), (ii) Semisupervised Support Vector Machine, and (iii) Human Semisupervised Learning.
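Graph-based methods spread the few known labels to unlabeled samples through a similarity graph. The sketch below shows one common instance, iterative label propagation in the style of Zhu and Ghahramani, on a fully connected graph with Gaussian edge weights; labeled nodes are clamped to their class at every iteration. The kernel width `sigma`, the iteration count, the function name, and the toy data are hypothetical choices, and this is not the specific method of [197] or [198].

```python
import math

# Graph-based semi-supervised sketch (iterative label propagation).
# ASSUMPTIONS: fully connected graph, Gaussian edge weights
# w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), fixed iteration budget.

def label_propagation(points, labels, n_classes, sigma=1.0, iters=200):
    """labels[i] is a class index for labeled points and None otherwise."""
    n = len(points)
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Class-score matrix F: one-hot rows for labeled points, uniform otherwise.
    f = [[1.0 if labels[i] == k else 0.0 for k in range(n_classes)]
         if labels[i] is not None else [1.0 / n_classes] * n_classes
         for i in range(n)]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            if labels[i] is not None:        # clamp the known labels
                new_f.append(f[i])
                continue
            total = sum(w[i])                # weighted average of neighbours
            new_f.append([sum(w[i][j] * f[j][k] for j in range(n)) / total
                          for k in range(n_classes)])
        f = new_f
    return [max(range(n_classes), key=lambda k: row[k]) for row in f]

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = [0, None, None, 1, None, None]      # only two labeled samples
print(label_propagation(points, labels, n_classes=2))  # [0, 0, 0, 1, 1, 1]
```

Only two of the six samples carry labels, yet the graph structure lets the remaining four inherit the label of their own cluster.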

To the best of our knowledge, Li and Yuen have utilized GB semisupervised learning for biomedical image classification [197]. The kernel trick was applied along with a semisupervised learning method for breast image classification by Li et al. [198], who performed their experiments on the Wisconsin Prognostic Breast Cancer (WPBC) dataset. Ngadi et al. utilized both the SKDA (Supervised Kernel-Based Deterministic Annealing) and NSVC methods for mammographic image classification [199]. They performed their experiments on 961 images, of which 53.60% were benign and the rest were malignant. Among other features, they used BI-RADS descriptors. With the NSVC method they also evaluated RBF, polynomial, and linear kernels, and found that the best Accuracy of 99.27% was achieved with the linear kernel. A few other studies that performed breast image classification by semisupervised learning are summarized in Tables 21 and 22.

4. Conclusion

Breast cancer is a serious threat to women throughout the world and is responsible for increasing the female mortality rate. Improving the current situation is a major concern and can be achieved through proper investigation, diagnosis, and appropriate patient and clinical management. Identifying breast cancer in its earlier stages and regular screening can save many lives. The status of a cancer changes over time, as the appearance, distribution, and structural geometry of the cells change because of the chemical changes that are always taking place inside the cells. These structural changes can be detected by analysing biomedical images obtained by mammography, MRI, and other techniques. However, these images are complex in nature and require expert knowledge to analyze malignancy accurately. Because of the nontrivial nature of the images, physicians sometimes reach decisions that contradict one another. Computer-aided diagnosis techniques based on machine learning, however, can glean a significant amount of information from the images and provide a decision based on that information, such as identifying cancer by classifying the images.

Machine learning techniques have a long history of contributing to image classification. With advanced engineering and some modifications, existing machine learning based image classification techniques have been used for biomedical image classification, especially for breast image classification and segmentation. Several branches of machine learning based image classifiers are available, such as Deep Neural Networks, Logic Based classifiers, and SVM. Apart from deep learning, machine learning-based classifiers largely depend on handcrafted feature extraction techniques, which derive statistical and structural information from various mathematical formulations and theories to capture object-specific information. These features are then used as the input to an image classifier, such as an SVM or a Logic Based classifier, for image classification.

This investigation finds that most conventional classifiers depend on prerequisite local feature extraction. Because the nature of cancer is always changing, dependence on a fixed set of local features will not provide good results on a new dataset. However, state-of-the-art Deep Neural Networks, especially CNNs, have recently advanced biomedical image classification owing to their Global Feature extraction capabilities. The core of the CNN model is the kernel, which allows the model to work with Global Features, and these globally extracted features allow the CNN model to extract more hidden structure from the images. This has produced some exceptional results for breast cancer image classification. As the CNN model is based on Global Features, this kind of classifier model should be easy to adapt to a new dataset.

This paper also finds that malignancy information is concentrated in a particular area, defined as the region of interest (ROI). Using only the ROI, that is, the information gathered from the segmented part of the data, can improve performance substantially. Recent developments in Deep Neural Networks can also be used to find the ROI and segment the data, which can then be used for image classification.

For breast cancer patient care, machine learning techniques and tools have been a tremendous success so far, and this success has gained extra impetus with the involvement of deep-learning techniques. However, the main difficulty in handling current deep-learning based classifiers is their computational complexity, which is much higher than that of traditional methods. Current research is focused on developing lightweight DNN models so that both computational and time complexity can be reduced. Another difficulty of DNN based cancer image classifiers is that they require a large amount of training data. However, techniques that reinforce learning, together with data augmentation, have been widely adopted with current CNN models and can provide reliable outcomes. Our research finds that the current trend in machine learning is largely towards deep-learning techniques. Among other challenges, the lack of appropriate tools for designing the overall deep-learning model was initially an obstacle to using deep-learning based techniques; however, reliable software has since been introduced that can be used for breast image classification. Initially it was difficult to implement DNN based architectures on simpler devices; however, owing to cloud-computing based Artificial Intelligence techniques this issue has been overcome, and DNNs have already been integrated with electronic devices such as mobile phones. In the future, combining DNNs with other learning techniques may provide better predictions about breast cancer.

Owing to the tremendous concern about breast cancer, many research contributions have been published so far. It is quite difficult to summarize all the research on machine learning based breast cancer image classification in a single article. Nevertheless, this paper has attempted to provide a holistic view of the breast cancer image classification procedure, summarizing the available breast datasets, generalized image classification techniques, feature extraction and reduction techniques, performance measuring criteria, and state-of-the-art findings.

In a nutshell, the involvement of machine learning in breast image classification gives doctors and physicians a second opinion, and it reassures patients and raises their confidence. There is also a scarcity of experts who can provide an appropriate opinion about the disease, and patients sometimes have to wait a long time because of this shortage. In this scenario, a machine learning based diagnostic system can help patients receive timely feedback about the disease, which can improve patient management.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.