Abstract

Breast cancer is the most common form of cancer in women, and its aggressive nature has made it one of the chief causes of female mortality. This has motivated research into early diagnosis, which is the best strategy for patient survival. Currently, mammography is the gold standard for detecting breast cancer. However, it is expensive, unsuitable for dense breasts, and an invasive process that exposes the patient to radiation. Infrared thermography is gaining popularity as a screening modality for the early detection of breast cancer. It is a noninvasive and cost-effective modality that allows health practitioners to observe the temperature profile of the breast region for signs of cancerous tumors. Deep learning has emerged as a powerful computational tool for the early detection of breast cancer in radiology. As such, this study reviews existing work on deep learning-based Computer-aided Diagnosis (CADx) systems for breast cancer detection, with a focus on classification using breast thermograms. It first provides an overview of infrared thermography, details of available breast thermogram datasets, and the segmentation techniques applied to these thermograms. It then gives a brief overview of deep neural networks. Finally, it reviews works adopting Deep Neural Networks (DNNs) for breast thermogram classification.

1. Introduction

Breast Cancer (BC) is the most common cancer in women, with an estimated 2.3 million new cases registered annually, making it the second leading cause of death in females [1]. Cancer results from a genetic abnormality due to a change in somatic cells. This modification can result from a genetic or epigenetic mutation [2]. Breast cancers are classified as invasive or noninvasive. A cancer is invasive if it has spread from the ducts in which it is located to nearby tissues; otherwise, it is noninvasive [3, 4]. Early detection is the best strategy for survival [5]. Currently, mammography is the state-of-the-art screening modality used to detect breast cancer in its infancy [6]. Although it has proved successful, it is an invasive process that causes discomfort and exposes the patient to radiation, thus increasing the risk of radiation-induced breast cancer [7].

Moreover, it is not recommended for young women who naturally have dense breasts [8, 9]. This is because breast density is inversely proportional to the sensitivity of a mammogram, making it difficult to pick abnormalities [10]. Instead, women with dense breasts are recommended for a breast ultrasound [11, 12]. However, breast ultrasound can only reveal superficial lesions [13].

Thermography is another screening modality that has advantages over its counterparts. First, it is noninvasive since no contact or hazardous radiation is involved [14, 15]. Second, it is portable and cost-effective [16]. This screening modality is based on infrared radiation; it measures the heat radiated from the breast surface and maps the temperature matrix onto a digital image to be analyzed by a health practitioner. Its working principle leverages the fact that skin surfaces above tumors exhibit abnormal heat patterns because of the higher metabolic activity of cancerous cells. More recently, scholars have attempted to develop predictive models for adoption in the diagnostic workflow of breast cancer. Deep Neural Networks (DNNs) have displayed high predictive accuracy in breast cancer radiology. This is due to their robustness, scalability, and universal learning approach [17].

Contrary to classical learning algorithms, DNNs eliminate the need for explicit feature extraction by learning features automatically, and they offer the option to reduce the computational cost of training a new model through transfer learning. Their strong performance has led to their adoption in the design of Computer-Aided Diagnostic (CADx) systems to augment the work of health practitioners. This review seeks to identify the work in the literature related to the prediction of breast cancer based on deep learning techniques using the thermal infrared imaging modality. The study sought to address the following key research questions:
(i) What databases are used in BC classification models based on thermography?
(ii) Which techniques are used for the segmentation of digital breast thermal infrared images?
(iii) What evaluation metrics best assess the efficiency of DNN classification models?
(iv) Which deep learning techniques are currently applied to classify breast thermograms?

2. Infrared Thermography

2.1. Thermal Radiation Overview

According to Planck’s law of radiation, every object with a temperature above absolute zero emits Electromagnetic (EM) radiation. EM radiation is composed of a collection of quantum particles referred to as “photons.” The energy of each photon is given by (1) [18] as follows:

$$E = \frac{hc}{\lambda}, \tag{1}$$

where $h$ is Planck’s constant and $c$ and $\lambda$ are the velocity and wavelength of the EM wave, respectively. In the context of thermal radiation, we are concerned with radiation resulting from a body’s temperature. Thus, we can quantify the emissive power or luminosity of a body with Stefan’s law in (2) [19]:

$$P = \sigma \varepsilon A T^{4}, \tag{2}$$

where $\sigma$ is the Stefan–Boltzmann constant, $\varepsilon$ is the emissivity of an object, $T$ is the object’s absolute temperature (K), and $A$ is the surface area of the emitter (m$^2$). It is clear from (2) that the rate at which a body radiates energy is proportional to the fourth power of its absolute temperature. The wavelength of thermal radiation lies within the infrared band, which is categorized into subranges, namely, Near Infrared (NIR), Medium Infrared (MIR), and Far Infrared (FIR) [20]. Figure 1 illustrates these subranges within the electromagnetic spectrum.
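
As a rough numerical illustration of (1) and (2), the sketch below evaluates the photon energy at a given wavelength and the power radiated by a small skin patch. The emissivity of 0.98, the patch area, and the two skin temperatures are illustrative assumptions, not values taken from the reviewed works.

```python
# Physical constants in SI units.
PLANCK_H = 6.62607015e-34           # Planck's constant, J*s
LIGHT_C = 2.99792458e8              # speed of light in vacuum, m/s
STEFAN_BOLTZMANN = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4


def photon_energy(wavelength_m: float) -> float:
    """Energy of a single photon, E = h * c / wavelength, as in (1)."""
    return PLANCK_H * LIGHT_C / wavelength_m


def radiated_power(emissivity: float, area_m2: float, temperature_k: float) -> float:
    """Emissive power, P = sigma * emissivity * A * T^4, as in (2)."""
    return STEFAN_BOLTZMANN * emissivity * area_m2 * temperature_k ** 4


# Assumed, illustrative values: emissivity 0.98 and a 0.01 m^2 skin patch.
p_normal = radiated_power(0.98, 0.01, 306.15)   # patch at 33 degrees C
p_warm = radiated_power(0.98, 0.01, 307.15)     # same patch, 1 K warmer
relative_increase = (p_warm - p_normal) / p_normal
```

Because of the fourth-power dependence, a 1 K rise near skin temperature changes the radiated power by roughly 1.3% in this sketch, which is the kind of difference an infrared camera has to resolve.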

3. Breast Thermography

Temperature is a good indicator of health [22]. Any deviation from the normal range indicates a probable illness. As chemical reactions are influenced by temperature, likewise, the metabolic process is subject to body temperature [23]. The metabolic activity of cancerous cells changes the temperature profile of the breast region, causing an asymmetric temperature distribution between the contralateral sides of the breast [24]. Skin temperature differences in symmetric body parts provide objective evidence that something is wrong [25]. As the tumor size increases, so does the surface temperature [26]. This view is supported by the authors in [27], who concluded that a tumor inside a breast could give a varying temperature profile depending on factors such as size, location, and depth. This can help detect breast cancer early, particularly in asymptomatic people.

A thermal imaging camera is used to capture the temperature distribution of the breast area. This is particularly useful in cases where it is difficult to locate the physical source of pain. Two different protocols can be utilized to produce thermal images: Static Infrared Thermography (SIT) and Dynamic Infrared Thermography (DIT). The former involves observing the temperature under steady-state conditions, whereas the latter measures the thermal response after the breast is subjected to thermal stress induced through either heating or cooling [28]. According to [29], DIT produces high-resolution images, but at the cost of increased complexity in the inspection. The idea behind DIT is that the blood vessels of a malignant tumor will not respond to sympathetic stimulation. As a result, the temperature at the tumor site will remain largely unchanged [20]. As Hernandez et al. [30] point out, DIT was introduced to address the issue of false positives and negatives encountered with the SIT protocol.

3.1. Breast Thermography Datasets

Numerous studies have attempted to produce thermographic datasets, which are given in Table 1. However, these datasets are few compared to those acquired through modalities such as mammography. They are mostly stored in private databases and are small in size. Furthermore, most of these datasets were acquired through the SIT protocol. Among the datasets in Table 1, the thermograms hosted by the Database for Mastology Research (DMR-IR) are notable since they include each patient’s corresponding clinical and personal data. DMR-IR utilized both protocols. SIT and DIT were carried out in the same room, under the same thermal conditions, and by the same team. For DIT, an electric fan was used to cool the breast area for 5 minutes, after which 20 thermograms were captured sequentially at an interval of 15 s. These images can then be used to determine the dynamic response of the breast area to thermal stress.

4. Image Segmentation

Image segmentation involves partitioning pixels in a digital image into coherent portions called homogeneous segments [43]. Pixels in the same segment share qualities such as intensity, texture, and hue [44]. Ideally, the goal is to break down a medical image into basic objects with comparable properties for easier analysis. As discussed by Sharma et al. [45], segmentation in the medical domain aims to:
(i) Identify the Region of Interest (ROI)
(ii) Measure tissue volume
(iii) Aid in treatment planning

CADx systems adopt this technology to minimize the image’s complexity by isolating the Region of Interest (ROI), thus reducing computation costs for a learning algorithm. In their review, Ramesh et al. [43] classified segmentation techniques into region-based, thresholding, edge detection, clustering-based, and model-based. Thresholding is considered the simplest method with two variants. One is bilevel thresholding, which divides an image into foreground and background (i.e., binary) with the aid of a threshold value [46]. The other is multilevel thresholding, which partitions an image into more than two classes [47]. Clustering techniques employ mathematics and statistics to generate several clusters within the image space [48]. The goal is to classify the unlabeled pixels into cohesive groups with maximum affinity [49]. Edge detection techniques use differential operators and a convolution mask to localize the contours of an object and boundaries on an image [50]. Region-based segmentation attempts to group pixels with identical features into regions [51]. Model-based techniques facilitate fully automated segmentation of biomedical images [52]. Table 2 shows the advantages and disadvantages of these methods.
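
To make the thresholding variants concrete, the following is a minimal sketch of bilevel thresholding in which the threshold is chosen with Otsu's method, followed by a multilevel split. Otsu's criterion is one common way to pick the threshold and is an assumption here, not a method attributed to the cited works. The synthetic temperature image and the two multilevel thresholds are hypothetical.

```python
import numpy as np


def otsu_threshold(image: np.ndarray, n_bins: int = 256) -> float:
    """Pick the bilevel threshold that maximizes between-class variance."""
    hist, edges = np.histogram(image.ravel(), bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    prob = hist / hist.sum()

    w0 = np.cumsum(prob)                  # fraction of pixels in bins 0..k
    w1 = 1.0 - w0                         # fraction of pixels above bin k
    cum_mean = np.cumsum(prob * centers)
    total_mean = cum_mean[-1]

    between_var = np.zeros(n_bins)
    valid = (w0 > 0) & (w1 > 0)           # skip splits with an empty class
    between_var[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    k = int(np.argmax(between_var))
    return float(edges[k + 1])            # upper edge of the best bin


# Hypothetical thermogram: cool background with one warm square region.
rng = np.random.default_rng(0)
image = rng.normal(25.0, 0.3, size=(64, 64))
image[20:40, 20:40] = rng.normal(34.0, 0.3, size=(20, 20))

threshold = otsu_threshold(image)
foreground = image > threshold                    # bilevel: binary mask
classes = np.digitize(image, bins=[28.0, 34.0])   # multilevel: labels 0, 1, 2
```

The bilevel mask isolates the warm square, while `np.digitize` with two assumed thresholds assigns each pixel to one of three classes.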

4.1. Works on Breast Thermogram Segmentation

Predictive models are often trained with manually segmented images; however, in CADx systems, ROI segmentation must be conducted in real time, i.e., while examining the patient, to keep the diagnostic workflow fast. The alternative to manual segmentation is a semiautomated or fully automated segmentation system, which minimizes delays. However, this is not an easy task due to the intensity inhomogeneity, noise, and obscured boundaries normally encountered during segmentation [58]. Table 3 summarizes works implementing segmentation techniques for breast thermograms.

In [59], automatic ROI segmentation and asymmetry analysis of breast thermograms were implemented. Initially, the breast boundaries were identified by first cropping out the undesired body portions, which consisted of the waistline and shoulder area. ROI localization was performed with the Canny edge detector, and the left and right boundary curves were then extracted using the gradient operator. After observing the parabolic shape of the lower breast boundaries, the parameters that describe these boundaries’ actual curves were selected by estimation. Finally, the portion below the detected breast curve was removed, and the two breasts were separated with a vertical line drawn at the point of contact of the contralateral breasts. Three sets of features were extracted from the segmented ROIs for asymmetry analysis of heat patterns in the breast region, including Higher-Order Statistical (HOS) features, center calculation, and histogram generation. We strongly agree with some of the premises pointed out in their conclusions. First, we broadly support their view that asymmetry analysis can be useful as a second opinion for diagnostics. We also support the usage of breast thermography for mass screening since it is a portable and low-cost modality. Finally, the graphical user interface they designed facilitates the system’s real-time operation.
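
The idea of asymmetry analysis can be sketched as follows: compute a few statistical descriptors for the left and right ROIs and compare them. This is a simplified illustration, not the feature set or implementation of [59]; the function names, the choice of descriptors, and the synthetic ROIs are all assumptions.

```python
import numpy as np


def roi_features(roi: np.ndarray) -> dict:
    """Mean, variance, skewness, and kurtosis of the temperatures in one ROI."""
    values = roi.astype(float).ravel()
    mean, std = values.mean(), values.std()
    z = (values - mean) / std          # assumes the ROI is not perfectly uniform
    return {
        "mean": float(mean),
        "variance": float(std ** 2),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
    }


def asymmetry_scores(left: np.ndarray, right: np.ndarray, n_bins: int = 32) -> dict:
    """Absolute feature differences plus an L1 distance between histograms."""
    f_left, f_right = roi_features(left), roi_features(right)
    scores = {name: abs(f_left[name] - f_right[name]) for name in f_left}
    lo = float(min(left.min(), right.min()))
    hi = float(max(left.max(), right.max()))
    h_left, _ = np.histogram(left, bins=n_bins, range=(lo, hi))
    h_right, _ = np.histogram(right, bins=n_bins, range=(lo, hi))
    scores["histogram_l1"] = float(
        np.abs(h_left / left.size - h_right / right.size).sum()
    )
    return scores


# Hypothetical ROIs: the right side contains a 2 degree C warmer patch.
rng = np.random.default_rng(1)
left_roi = rng.normal(32.0, 0.4, size=(40, 40))
right_roi = rng.normal(32.0, 0.4, size=(40, 40))
right_roi[10:20, 10:20] += 2.0
scores = asymmetry_scores(left_roi, right_roi)
```

Larger scores indicate a stronger difference between the contralateral sides; how such scores map to a clinical decision is outside the scope of this sketch.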

Venkatachalam et al. [60] presented a method for segmenting an inflamed ROI using the active contour method. First, a thresholding method based on bilateral histogram differences was employed for localizing the inflamed ROI. This was then used to automate the initialization of active contours driven by the multiscale local and global fitted image (MLGFI) model. The technique’s performance was then validated with an expert’s manually segmented ground-truth images. The method was precise and accurate in segmenting the inflamed ROI without over-segmenting or under-segmenting. To demonstrate its significance, its accuracy was evaluated on two datasets against state-of-the-art methods: K-means, fuzzy C-means, and Chan–Vese. Second-order statistical features were extracted from the segmented inflamed ROI and classified as benign or malignant. Their analyses found that the technique exhibits improved accuracy compared to state-of-the-art techniques. These results are no surprise since this method segments a smaller portion of the breast area, the inflamed portion, thereby minimizing computation costs and improving overall prediction performance.

A study in [61] proposed a novel method to detect the boundary of breast thermograms using the Smallest Univalue Segment Assimilating Nucleus (SUSAN), an edge-based segmentation technique. The Hough transform was used to determine the center points of each breast after the armpit region had been removed by hand. Then, for the bottom boundary, a SUSAN edge detector was generated with two masks uniquely oriented to emphasize the useful oblique edges while disregarding unwanted ones. After that, cubic parabolic interpolation was used to interpolate the bottom region’s edges, and the Canny detector was used to retrieve the outermost edge points. Finally, the breast boundary was obtained from the intersection of the outermost edge points and the boundary points of the bottom of the breast. The approach was remarkably effective and accurate on three different datasets. The choice of a SUSAN edge detector was well thought out, as it gives an isotropic edge response, enabling it to detect edges in all directions. The kernels were oriented at 45 and 135 degrees, allowing the detector to focus on the left and right breast edges.

Moreover, a search algorithm was implemented to extract accurate breast boundaries. However, the authors claimed the approach is useful as an automated tool; this is misleading since their approach adopted an element of manual segmentation through manual removal of the armpit area. Their approach therefore qualifies as semiautomated.

5. Artificial Intelligence

A substantial volume of published work now describes AI’s role in medicine. An increasing amount of this literature is devoted to building machine learning models integrated with the clinical setting to augment health practitioners’ work by processing the data from Electronic Health Records (EHRs) [69]. Jian et al. [70] identified machine learning as a potential solution to the challenges faced by health practitioners and patients, such as diagnostic dilemmas and waiting times. Machine learning has provided opportunities to reduce healthcare costs, provide quality health care, and improve precision [71]. Furthermore, it can play a pivotal role in accurate medical diagnosis and aid health practitioners in promptly diagnosing diseases.

Consequently, it has been extensively applied in screening and diagnoses, particularly on images from modalities such as MRI and breast ultrasound. This is due to the capability of these models to search for clinical patterns from large medical datasets to reveal abnormalities that are invisible to humans [72]. Following Patel et al. [73], pharmaceutical companies are now incorporating deep learning algorithms and machine learning into the drug discovery process to predict the characteristics and discover drug efficacy.

Generally, Machine Learning may be divided into six main learning paradigms. These are: supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, batch and online learning, and instance-based and model-based learning [74]. In a supervised learning framework, the data used to train the model includes the desired solutions called labels, which experts normally prepare to serve as the ground truth for the model [75]. In contrast, unsupervised learning presents no ground truth for the algorithm. Figure 2 illustrates examples of both supervised and unsupervised learning. Following the work in [76], the authors present a semi-supervised learning model which utilizes partially labeled data. This paradigm aids in developing powerful machine learning models on limited labeled data [77]. The reinforcement learning framework takes a different avenue toward learning in that it primarily focuses on interaction with the environment and closely mimics the biological learning aspects [78]. Batch and online learning are concerned with the ability of a system to incrementally learn from a stream of incoming data [74].

5.1. Supervised Learning Framework

In a supervised learning task, the setting is as follows: given a training set $S = ((x_1, y_1), \ldots, (x_n, y_n))$ that is a finite sequence of pairs from $X \times Y$, a function $h: X \to Y$ is needed to approximate the value $y$ corresponding to $x$. The function $h$ is referred to as the hypothesis function. The aim is to have an optimal $h$ w.r.t. a suitably chosen loss function $L$ that measures how far $h(x)$ is from the actual value $y$ [79]. To evaluate the average loss on a dataset, the most popular choice is the “empirical” or “in-sample” risk:

$$R_{\text{emp}}(h) = \frac{1}{n} \sum_{i=1}^{n} L\big(h(x_i), y_i\big).$$
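
The empirical risk is simply the average of the loss over the training pairs. The sketch below computes it for a hypothesis function and a loss function; the names `h_good`, `h_poor`, the two losses, and the toy training set are assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple


def empirical_risk(
    hypothesis: Callable[[float], float],
    loss: Callable[[float, float], float],
    samples: Sequence[Tuple[float, float]],
) -> float:
    """Average loss of `hypothesis` over the (input, label) pairs in `samples`."""
    return sum(loss(hypothesis(x), y) for x, y in samples) / len(samples)


def zero_one_loss(predicted: float, actual: float) -> float:
    """1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if predicted == actual else 1.0


def squared_loss(predicted: float, actual: float) -> float:
    return (predicted - actual) ** 2


# Hypothetical training set of (feature, label) pairs.
training_set = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]


def h_good(x: float) -> int:
    return 1 if x > 1.5 else 0      # separates the toy data perfectly


def h_poor(x: float) -> int:
    return 1 if x > 0.5 else 0      # misclassifies the sample at x = 1.0


risk_good = empirical_risk(h_good, zero_one_loss, training_set)
risk_poor = empirical_risk(h_poor, zero_one_loss, training_set)
```

Here `h_good` attains an empirical risk of 0, while `h_poor` misclassifies one of four samples and attains 0.25.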

It is worth noting that the reliance of supervised learning frameworks on consistently labeled data is a problem in a clinical setting. This is confirmed by Kraljevic et al. [80] who advocated for using detailed annotation guidelines that specify cryptic scenarios for annotators.

5.2. Performance Measurement

As stated earlier, a loss function, also known as a “cost function,” is used to determine how well a model performs on a given training set. The term “loss” refers to the difference between the predicted and expected values [81], and a loss function quantifies this error as a single real value. The higher this value is, the more inaccurate the model’s prediction. Therefore, a loss function is minimized during training to penalize high deviations from the expected values, thereby improving the model’s performance. It is also worth mentioning that not all models require such a function to be minimized; in some cases, the primary goal is to maximize it. A lucid explanation for this is provided by Couso and Sánchez in [82], where they name the opposite of the loss function a “reward function,” which needs to be maximized.

5.3. Model Evaluation

Various mathematical expressions known as metrics have been defined to assess the performance of diverse types of learning systems. Selecting a performance metric is one of the most crucial and challenging tasks. In his book on Machine Learning and Artificial Intelligence, Joshi [78] categorized commonly used ML metrics into numerical error, categorical error, and hypothesis testing. For classification tasks, most of the work in the literature relies only on accuracy to evaluate classifiers. However, accuracy alone is not a sufficient metric for comparing different classifiers [83]. Nevertheless, the interpretation of biomedical data is centered on accuracy and precision. For fair evaluation, other common classifier metrics used are precision, recall, and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve [84]. They are discussed in Table 4, and the standard expressions for accuracy, precision, and recall are:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN},$$

where
(i) True Positives (TP): positive cases correctly classified as positive instances
(ii) True Negatives (TN): negative cases correctly classified as negative instances
(iii) False Positives (FP): negative cases wrongly classified as positive instances
(iv) False Negatives (FN): positive cases wrongly classified as negative instances
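
The following sketch computes these metrics from the four counts and shows, on a hypothetical imbalanced dataset, why accuracy alone can mislead. The AUC is computed here through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one); the function names and data are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0   # 0.0 when nothing is predicted positive


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced case: 1 malignant among 10, classifier predicts all healthy.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
predictions = [0] * 10
tp, tn, fp, fn = confusion_counts(labels, predictions)
acc, rec = accuracy(tp, tn, fp, fn), recall(tp, fn)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this example the accuracy is 0.9 even though the recall is 0, because the single positive case is missed.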

5.4. Artificial Neural Networks

Artificial Neural Networks (ANNs) are among the most well-documented Machine Learning algorithms. This is due to scholars’ motivation to know more about the brain’s processing function and its ability to tackle complex problems.

ANNs work similarly to the nervous system in that they receive certain inputs, process them, and create specific outputs. A unit or neuron of a neural network is called a perceptron. As shown in Figure 3, a perceptron is defined by its state, transfer functions, and connections with other nodes [85]. The transfer function produces network nonlinearity. In ANN, a layer of neurons combines several neurons that share the same input, but each has its vector of weights or coefficients and individual output.
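
A single neuron and a layer of neurons sharing the same input can be sketched as below. The step transfer function and the hand-picked weights that realize logical AND and OR are illustrative assumptions.

```python
import numpy as np


def step(z):
    """Threshold transfer function: 1 where z >= 0, else 0."""
    return np.where(z >= 0, 1.0, 0.0)


def neuron(x, weights, bias, transfer=step):
    """Weighted sum of the inputs plus a bias, passed through a transfer function."""
    return transfer(np.dot(weights, x) + bias)


def layer(x, weight_matrix, biases, transfer=step):
    """Several neurons sharing the same input; one weight row and one output per neuron."""
    return transfer(weight_matrix @ x + biases)


# Hand-picked (assumed) weights: the first neuron computes AND, the second OR.
and_weights, and_bias = np.array([1.0, 1.0]), -1.5
or_weights, or_bias = np.array([1.0, 1.0]), -0.5
W = np.vstack([and_weights, or_weights])
b = np.array([and_bias, or_bias])

outputs = {
    (a, c): layer(np.array([a, c], dtype=float), W, b).tolist()
    for a in (0, 1)
    for c in (0, 1)
}
```

Each entry of `outputs` holds the pair (AND, OR) for one input combination, showing how two neurons with different weight vectors produce different outputs from the same input.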

5.5. Multilayer Perceptrons (MLPs)

A logical extension of the single-layer architecture forms what is referred to as a Multilayered perceptron (MLP). An MLP is an ANN’s basic architecture, which may be considered a network of interneurons partitioned into three layers: input, hidden, and output. Each internal neuron is now connected to all nodes in the subsequent layer of neurons, thus forming a structure resembling the human brain’s nerve system. ANNs are classified into two types based on their connections: feed-forward (acyclic) networks and recurrent (cyclic) networks [86]. A feed-forward NN allows signals to travel in only one direction, from input to output as shown in Figure 4.

On the contrary, a Recurrent Neural Network (RNN) provides feedback loops [87]. This architecture is shown in Figure 5, with some connections pointing in the opposite direction. Although both types leverage parallelism, RNNs are regarded as deeper than other NNs because they combine sequential and parallel information processing naturally and efficiently, thus exploiting extreme parallelism [88]. As Alamia et al. [89] suggest, recurrent networks are best suited to model explicit learning, whereas feedforward networks capture the dynamics inherent in implicit learning. They are adaptable and powerful, making them ideal for handling large and complex computational tasks [90].

To train ANNs, an algorithm called backpropagation is used. By propagating the error backward through the network, this approach computes the gradient of the error with respect to the weights for a given input [91].
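
A minimal sketch of backpropagation for a small feed-forward MLP is given below. The architecture (2 inputs, 4 tanh hidden units, 1 sigmoid output), the squared-error loss, the learning rate, and the XOR toy data are all assumptions made for illustration.

```python
import numpy as np


def forward(params, X):
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)                       # hidden layer activations
    Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))       # sigmoid output layer
    return H, Y


def loss_fn(params, X, T):
    """Half the mean squared error between outputs and targets."""
    _, Y = forward(params, X)
    return 0.5 * np.mean(np.sum((Y - T) ** 2, axis=1))


def backprop(params, X, T):
    """Gradient of loss_fn w.r.t. every weight, by propagating the error backward."""
    W1, b1, W2, b2 = params
    H, Y = forward(params, X)
    n = X.shape[0]
    delta2 = (Y - T) * Y * (1.0 - Y) / n           # error at the output pre-activation
    grad_W2 = H.T @ delta2
    grad_b2 = delta2.sum(axis=0)
    delta1 = (delta2 @ W2.T) * (1.0 - H ** 2)      # error propagated to the hidden layer
    grad_W1 = X.T @ delta1
    grad_b1 = delta1.sum(axis=0)
    return [grad_W1, grad_b1, grad_W2, grad_b2]


# Toy XOR data (hypothetical example, not from the reviewed works).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
params = [rng.normal(0, 1, (2, 4)), np.zeros(4), rng.normal(0, 1, (4, 1)), np.zeros(1)]
initial_loss = loss_fn(params, X, T)
for _ in range(2000):                              # plain gradient descent
    grads = backprop(params, X, T)
    params = [p - 0.5 * g for p, g in zip(params, grads)]
final_loss = loss_fn(params, X, T)
```

The two `delta` terms are where the error is propagated backward: the output error is multiplied by the transposed weights of the layer above and by the derivative of the hidden transfer function.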

5.6. Overview of Deep Learning

The recent emergence of computers with more processing power has led to a renewed interest in ANNs in a more complex form called DNNs. This has triggered many innovative technological developments in computer vision and speech recognition, where deep learning has outperformed its counterparts to become the state of the art. DNNs can play a pivotal role in accurate radiology diagnosis and aid health practitioners in diagnosing diseases quickly and precisely. In the clinical setting, they augment the work of health practitioners by processing large volumes of data from Electronic Health Records (EHRs) [69]. DNNs provide robustness, scalability, and a universal learning approach [17].

5.7. Convolutional Neural Networks
5.7.1. Overview

Convolutional Neural Networks (CNNs) are examples of deep networks. They perform outstandingly in visual recognition tasks such as classification. Their effectiveness in computer vision results from their unique working principle, which is physiologically inspired by the brain’s visual cortex. This area of the cerebral cortex receives and analyzes sensory nerve impulses from the eyes [93]. Hubel and Wiesel [94] found that neurons in the visual cortex respond to specific patterns within a region of the visual field known as the receptive field. The intricacy of the patterns rises as signals propagate through subsequent brain modules; hence, the corresponding neurons respond to more complex patterns. A CNN adopts a similar principle, extracting low-level features through its first layers. Then, as the image propagates through its layers, the complexity of the extracted features increases. Traditional neural networks use vectorization, which tends to ignore the 2D spatial structure of the image, whereas CNNs employ 2D convolutions, which take the 2D structure of images into account [95]. In contrast to explicit feature extraction, others [96, 97] have highlighted the advantage of CNNs in that they automatically detect distinct features of each class without any explicit attribute extraction. A CNN architecture comprises several building blocks: convolution layers, pooling layers, and Fully Connected Layers (FCLs), as shown in Figure 6.

5.7.2. Convolution Layer

This is an intrinsic component of a CNN. Here, a neuron is connected only to a narrow area of input neurons rather than being fully connected as in a conventional ANN, thereby reducing the number of parameters to be learned and allowing a network to grow deeper with fewer parameters [99]. This implies that the weights will occupy less memory, making CNNs memory efficient. The layer is made up of kernels or filters, the parameters of which must be learned. It extracts features from the image through a linear operation called convolution. As shown in Figure 7, a kernel is slid over the entire input image, and at every spatial position the dot product between the kernel and the image patch beneath it is calculated to produce a feature map. The kernel must be smaller than the input image [100]. After the convolution operation, a nonlinear operation follows in the form of an activation function. As Srinivas et al. [95] point out, a nonlinear operation between layers makes the model more expressive than a linear model and makes the model training faster.
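
The sliding-kernel operation can be sketched directly with NumPy. As in most CNN implementations, the sketch computes a cross-correlation (the kernel is not flipped) without padding; the ReLU activation, the 4x4 input, and the 2x2 kernel are illustrative assumptions.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            feature_map[i, j] = np.sum(patch * kernel)   # dot product of patch and kernel
    return feature_map


def relu(x: np.ndarray) -> np.ndarray:
    """Nonlinear activation applied after the convolution."""
    return np.maximum(x, 0.0)


# Hypothetical 4x4 input and a 2x2 diagonal-difference kernel.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])
feature_map = relu(conv2d(image, kernel))
```

The kernel has only four learnable parameters regardless of the image size, which illustrates the parameter saving of local connectivity over a fully connected layer.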

5.7.3. Pooling Layer

The key task of this layer is to reduce the dimensions of feature maps while maintaining paramount features within the map. This decreases the number of subsequent learnable parameters, thereby minimizing computational costs. Furthermore, extracting relevant information controls the overfitting of the network [102]. Like the convolution operation, pooling uses a kernel, stride, and padding as hyperparameters to execute the operation. There are several types of pooling operations. The most common types are max pooling, global average, and min pooling. Max and average pooling examples are shown in Figure 8. In max pooling, the max pixel value is picked within the kernel whereas in average pooling the average of pixels within the mask is computed.
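
Max and average pooling can be sketched with the same sliding-window idea. The 4x4 feature map, the 2x2 window, and the stride of 2 are assumed values for illustration, and padding is omitted for brevity.

```python
import numpy as np


def pool2d(feature_map: np.ndarray, size: int = 2, stride: int = 2, mode: str = "max") -> np.ndarray:
    """Downsample a feature map by taking the max or the mean of each window."""
    reducer = np.max if mode == "max" else np.mean
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            pooled[i, j] = reducer(feature_map[top:top + size, left:left + size])
    return pooled


# Hypothetical 4x4 feature map.
fmap = np.array(
    [
        [1.0, 3.0, 2.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [3.0, 2.0, 1.0, 0.0],
        [1.0, 2.0, 3.0, 4.0],
    ]
)
max_pooled = pool2d(fmap, mode="max")       # [[6, 8], [3, 4]]
avg_pooled = pool2d(fmap, mode="average")   # [[3.75, 5.25], [2, 2]]
```

Both variants reduce the 4x4 map to 2x2, so any layer that follows operates on a quarter of the values.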

5.7.4. Fully Connected Layer

The flattening layer processes feature maps of the final pooling or convolutional layers. As shown in Figure 9, this layer converts these features into a one-dimensional vector. It connects each array element to a FCL, simply a feed-forward NN. As in conventional NN, every input element in an FCL is connected to the next layer by a learnable weight. It has the same number of nodes in its output layer as the number of classes to be predicted. In [104], a study was conducted on the impact of FCL on CNN performance, from which two key conclusions were drawn. First, it was found that for better performance, shallow CNN architectures needed more nodes in the FCLs, whereas deeper ones required a smaller number of neurons regardless of the type of dataset. Lastly, shallow CNN architectures need more FCLs and a substantial number of neurons in the FCLs for wider datasets than deeper datasets and vice versa.

5.8. Deep Belief Networks
5.8.1. Overview

DNNs are typically trained using the backpropagation algorithm, which, as already discussed, propagates the output error backward through the network to modify the weights. However, as the network depth increases, the propagated error gradually vanishes toward zero, impeding the revision of early-layer weights and consequently degrading network performance [105]. Training deep networks can also lead to overfitting. To address these problems, a new kind of DNN architecture was introduced, the Deep Belief Network (DBN). DBNs employ unsupervised pretraining, which enhances the model’s performance and helps avoid overfitting [106]. This feature is crucial for small-sized datasets, as is the case for breast thermograms.

5.8.2. Restricted Boltzmann Machine

A Restricted Boltzmann Machine (RBM) is a probabilistic model that can be represented by a generative stochastic two-layer NN. As shown in Figure 10, these two layers are the hidden layer h and the visible layer v. Unlike a Boltzmann machine, the RBM has no intra-layer neuron connections; both models have inter-layer neuron connections. These neuron connections may be unidirectional or bidirectional. The migration from full connection to restricted connection reduces the time it takes to train the model and learn its parameters [107]. An RBM can learn a probability distribution from samples using an unsupervised learning approach. Hinton [108] presented the Contrastive Divergence (CD) learning algorithm for training RBMs.

5.8.3. Construction of Deep Belief Networks

A DBN architecture is composed of stacked directed RBMs, as shown in Figure 11. For efficient learning, DBNs adopt the two-step method proposed by Hinton [108]. The first step is an unsupervised layer-wise learning procedure that leverages the CD algorithm to learn the data distribution without the need for labels. This involves training the first RBM of the DBN with the original data and then using its output as input for the successor RBM. This repeats for the other RBMs until the weights of all layers have been learned. In the second step, fine-tuning is done with backpropagation to revise the weight matrix of the network. The pretraining process prevents overfitting, consequently improving the model’s generalization [109]. To use a DBN for a classification task, we must add a supervised learning network, which classifies the samples based on the features extracted by the DBN.
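
The unsupervised pretraining step can be sketched as follows: each RBM is trained with single-step Contrastive Divergence (CD-1), and its hidden activations become the input of the next RBM. This is a simplified illustration under assumed hyperparameters (layer sizes, learning rate, epochs) on a hypothetical binary toy dataset; the supervised fine-tuning step and the classifier are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, rng, epochs=200, lr=0.1):
    """Train one RBM with single-step Contrastive Divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        h0 = sigmoid(data @ W + b_h)                          # positive phase
        h_sample = (rng.random(h0.shape) < h0).astype(float)  # stochastic hidden states
        v1 = sigmoid(h_sample @ W.T + b_v)                    # reconstruction
        h1 = sigmoid(v1 @ W + b_h)                            # negative phase
        W += lr * (data.T @ h0 - v1.T @ h1) / len(data)
        b_v += lr * (data - v1).mean(axis=0)
        b_h += lr * (h0 - h1).mean(axis=0)
    return W, b_h


def pretrain_dbn(data, layer_sizes, rng):
    """Greedy layer-wise pretraining: each RBM is trained on the previous RBM's output."""
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(layer_input, n_hidden, rng)
        layers.append((W, b_h))
        layer_input = sigmoid(layer_input @ W + b_h)
    return layers


def dbn_features(layers, data):
    """Propagate data upward through the stacked RBMs to obtain features."""
    out = data
    for W, b_h in layers:
        out = sigmoid(out @ W + b_h)
    return out


# Hypothetical unlabeled toy data: two binary patterns, 20 copies each.
rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(patterns, 20, axis=0)
layers = pretrain_dbn(data, layer_sizes=[4, 2], rng=rng)
features = dbn_features(layers, data)
```

The resulting `features` would then be passed to a supervised network, whose training also fine-tunes the pretrained weights through backpropagation.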

5.9. Preprocessing Images for Deep Learning

Training a DNN requires the input data to be compatible with the network. However, there are many instances where the raw data are messy. Such issues are dealt with in the preprocessing stage. At this stage, functions are designed to automatically transform the raw data to improve the model’s performance. These functions can be reused on a live system to transform new data before feeding it into the network. The goal of preprocessing thermograms can be to emphasize suspicious areas [110, 111]; image segmentation performance also relies on preprocessing techniques for suppressing noise. In the context of breast thermograms, the following are possible preprocessing techniques.

5.9.1. Data Augmentation

DNNs have a huge appetite for data; however, biomedical datasets are currently scarce and available in small sizes. Data augmentation is a method used for addressing the issue of data scarcity by increasing the size of the dataset [112]. It can be defined as the artificial expansion of a dataset by carrying out a set of modifications on the existing instances to produce distinct ones. It can also cushion disparity between classes by boosting the diversity of the class instances. Overall, the technique can help a DNN to reduce generalization errors. This view is supported by a survey in [113] where they found the technique to prevent overfitting in DNN. There are two approaches to data augmentation, one is through image manipulation, and the other is through DNN. The former is less complex as it generates a new instance by performing transformations on the original image. This can be done through geometric transformations, the use of kernels, etc. The latter leverages some DNN architectures, such as Generative Adversarial Networks (GANs), to learn the features of the original image to generate synthetic ones based on learned features [114].
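
The image-manipulation approach can be sketched with a few simple transformations. The particular operations (horizontal flip, small translations, additive noise) and their parameters are illustrative assumptions; which transformations are appropriate for thermograms depends on the dataset.

```python
import numpy as np


def translate(image: np.ndarray, dy: int, dx: int, fill: float = 0.0) -> np.ndarray:
    """Shift an image by (dy, dx) pixels, padding uncovered pixels with `fill`."""
    h, w = image.shape
    out = np.full((h, w), fill, dtype=float)
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def augment(image: np.ndarray, rng, noise_std: float = 0.05) -> list:
    """Return several modified copies of one image."""
    return [
        np.fliplr(image),                                      # horizontal flip
        translate(image, 2, 0),                                # shift down by 2 pixels
        translate(image, 0, -3),                               # shift left by 3 pixels
        image + rng.normal(0.0, noise_std, size=image.shape),  # additive Gaussian noise
    ]


# Hypothetical 6x6 "thermogram".
rng = np.random.default_rng(2)
original = np.arange(36, dtype=float).reshape(6, 6)
augmented = augment(original, rng)
```

One original image yields four additional instances here, so a dataset of N images grows to 5N when the originals are kept.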

5.9.2. Image Registration

Image registration is the process of estimating a spatial coordinate transformation that aligns two or more images [115]. It facilitates comparing or integrating multiple images or data captured from dissimilar sources. During this process, one of the images, called the reference image, is considered fixed while the other moves through several image operations such as regularization and transformations [116]. The alignment quality is measured through similarity measures such as cross-correlation and mean squared intensity differences. The coordinate transformation that best relates the reference and moving images is estimated by iteratively minimizing the dissimilarity between them. When the DIT protocol is used, a sequence of breast thermograms is captured over time. Image registration then becomes necessary to align these images so that they can be combined into an input compatible with a DNN.
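As a simplified illustration of minimizing a dissimilarity measure, the sketch below registers two images under a pure integer translation by exhaustively searching for the shift with the lowest mean squared intensity difference. It is an assumption-laden toy: practical registration methods use richer transformation models and iterative optimizers, and the wrap-around behaviour of `np.roll` is a simplification. The function name `register_translation` and the search range are hypothetical.

```python
import numpy as np

def register_translation(fixed, moving, max_shift=5):
    """Find the integer (dy, dx) shift of `moving` that minimizes the MSE to `fixed`."""
    best_shift, best_mse = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            mse = np.mean((fixed - candidate) ** 2)
            if mse < best_mse:
                best_shift, best_mse = (dy, dx), mse
    return best_shift, best_mse

rng = np.random.default_rng(0)
fixed = rng.random((32, 32))                          # reference image (toy data)
moving = np.roll(fixed, shift=(3, -2), axis=(0, 1))   # displaced copy of the reference
shift, mse = register_translation(fixed, moving)
```

Here the recovered `shift` is the translation that moves the displaced image back onto the reference.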

5.9.3. Image Enhancement

Image enhancement aims to make an image better suited for a specific use by either recovering details that are obscured or highlighting certain features of interest. Acquired images contain various kinds of noise, which need to be suppressed to maintain image quality. However, noise, edges, and texture are all high-frequency components that are difficult to tell apart in a digital image. Noisy images can prevent the learning algorithm from reaching its optimal performance. Moreover, denoising is an essential step toward high-quality image segmentation. Linear filters such as Gaussian filters are classical approaches for suppressing noise in the spatial domain. However, they denoise at the cost of degraded edges and object details [117]. An alternative approach to denoising an image in the spatial domain is a non-linear filtering technique such as an Anisotropic Diffusion Filter (ADF). An ADF denoises while also preserving vital information [118].
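To make the edge-preserving behaviour concrete, the following is a minimal sketch of anisotropic diffusion in the Perona-Malik style, which is assumed here as a representative ADF and is not necessarily the variant used in [118]. The exponential conductance function and the values of `kappa`, `gamma`, and `n_iter` are illustrative assumptions.

```python
import numpy as np

def anisotropic_diffusion(image, n_iter=20, kappa=0.1, gamma=0.2):
    """Perona-Malik style diffusion: smooths flat regions, limits flow across strong edges."""
    img = image.astype(float).copy()
    for _ in range(n_iter):
        # Differences to the four nearest neighbours (zero flux at the borders).
        d_n = np.zeros_like(img); d_n[1:, :] = img[:-1, :] - img[1:, :]
        d_s = np.zeros_like(img); d_s[:-1, :] = img[1:, :] - img[:-1, :]
        d_w = np.zeros_like(img); d_w[:, 1:] = img[:, :-1] - img[:, 1:]
        d_e = np.zeros_like(img); d_e[:, :-1] = img[:, 1:] - img[:, :-1]
        update = np.zeros_like(img)
        for d in (d_n, d_s, d_w, d_e):
            # Conductance is close to 1 for small differences and close to 0 at edges.
            update += np.exp(-(d / kappa) ** 2) * d
        img += gamma * update  # gamma <= 0.25 keeps the 4-neighbour scheme stable
    return img

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0                                  # synthetic step edge
noisy = clean + rng.normal(0.0, 0.05, clean.shape)   # additive Gaussian noise
denoised = anisotropic_diffusion(noisy)
```

On this synthetic step edge, the noise inside each flat half is smoothed while the intensity jump between the halves is retained, which is the behaviour a Gaussian filter would blur.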

In contrast with spatial domain filtering, which operates directly on the image matrix, transform domain filtering can be adopted. It operates on the Fourier transform of an image and then transforms the result back to the spatial domain [119]. In addition to image filtering, other image enhancement techniques include contrast enhancement, sharpness enhancement, and color correction. Nevertheless, a recent survey by Liu et al. [120] identified a gap in the literature: the lack of metrics or quantitative standards for evaluating the performance of these enhancement techniques.
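Transform domain filtering can be sketched as an ideal low-pass filter applied to the 2-D Fourier transform of an image. The function name `fourier_lowpass` and the cutoff value are assumptions for illustration; practical filters often use smoother masks to limit ringing artifacts.

```python
import numpy as np

def fourier_lowpass(image, cutoff=0.15):
    """Keep only spatial frequencies with radius <= cutoff (in cycles per pixel)."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.fft.fft2(image)        # to the frequency domain
    return np.real(np.fft.ifft2(spectrum * mask))  # back to the spatial domain

rng = np.random.default_rng(0)
noisy = 0.5 + rng.normal(0.0, 0.1, (32, 32))  # flat toy image with additive noise
smoothed = fourier_lowpass(noisy)
```

Because the zero-frequency component passes through the mask, the mean intensity is preserved while high-frequency noise is attenuated.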

5.9.4. Other Preprocessing Techniques

One factor that makes it difficult to attain optimally trained models is numerical attributes with very different scales. As such, it is imperative to scale features before feeding them into a learning algorithm. Normalization is a widely used feature scaling technique that changes the range of pixel intensity values to a form that is easier for the learning algorithm to handle. In addition to feature scaling, the input images must be resized to a form compatible with the architecture of the DNN. Where segmentation is applied, morphological operations such as dilation, erosion, and closing are normally used to manipulate the shape of objects in an image based on the connectivity of pixels in order to extract the region of interest.
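The three operations described above can be sketched in NumPy as follows. Min-max scaling is assumed as the normalization, nearest-neighbour sampling as the resizing method, and a 3x3 square as the structuring element; all function names are hypothetical, and library implementations would normally be preferred in practice.

```python
import numpy as np

def min_max_normalize(image):
    """Rescale pixel intensities linearly to the range [0, 1]."""
    image = image.astype(float)
    lo, hi = image.min(), image.max()
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize to (out_h, out_w)."""
    rows = np.arange(out_h) * image.shape[0] // out_h
    cols = np.arange(out_w) * image.shape[1] // out_w
    return image[rows[:, None], cols[None, :]]

def _neighbourhood(mask, pad_value):
    """Yield the nine 3x3-shifted views of a boolean mask."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]

def dilate(mask):
    """Binary dilation: a pixel is set if any 3x3 neighbour is set."""
    return np.logical_or.reduce(list(_neighbourhood(mask, False)))

def erode(mask):
    """Binary erosion: a pixel stays set only if all 3x3 neighbours are set."""
    return np.logical_and.reduce(list(_neighbourhood(mask, True)))

def close(mask):
    """Closing (dilation followed by erosion) fills small holes in a region."""
    return erode(dilate(mask))

raw = np.array([[20.0, 30.0], [25.0, 40.0]])  # toy intensity values
scaled = min_max_normalize(raw)
resized = resize_nearest(scaled, 4, 4)

roi = np.zeros((9, 9), dtype=bool)
roi[2:7, 2:7] = True
roi[4, 4] = False          # one-pixel hole inside the region
closed = close(roi)
```

In the toy mask, closing fills the one-pixel hole without enlarging the region.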

6. Research on Breast Thermogram Classification

Roslidar et al. [121] comprehensively reviewed the potential of thermographic imaging and deep learning models to detect BC. Their review revealed that CNN classifiers achieved high accuracy on breast thermograms. To obtain better CNN classifiers, they emphasized the image preprocessing stage. They advocated integrating automatic ROI segmentation into the CNN since it reduces computation time and improves identification accuracy [122]. More proposed approaches are given in Table 5. Resmini et al. [123] presented a hybrid methodology for detecting and diagnosing BC using DIT and SIT. The study used the DMR-IR dataset. Two key actions were performed in the preprocessing of both DIT and SIT images. First, image registration was applied, which establishes a correspondence between multiple images by setting one image as the reference and subjecting the floating images to operations such as regularizations and transformations [116]. Second, the ROI was segmented manually. Before segmentation and registration of the SIT images, the temperature matrix was obtained from the image. Furthermore, a Genetic Algorithm was used as a dimensionality reduction tool for extracting texture features to reduce computation costs. Overall, the authors acknowledged several limitations: a limited number of patients, manual ROI segmentation, few iterations in the Genetic Algorithm, and failure to indicate the position of the abnormality.

Chatterjee et al. [124] proposed a computer-based diagnostic model for detecting BC using thermograms. Due to the computational overhead in training DNNs, the authors employed a transfer learning technique with pretrained layers of the VGG16 CNN architecture. After extracting features with the model, optimal features were selected with an enhanced version of the metaheuristic Dragonfly algorithm (DA) named the Grunwald–Letnikov Dragonfly algorithm (GLDA). The GLDA method effectively reduced the dimensions of the feature set, as it yielded 82% fewer features. Moreover, their model achieved a diagnostic accuracy of 100% on the DMR-IR dataset. As the authors rightly point out, the dataset was small; hence, a larger dataset is needed to demonstrate the model’s potential. Furthermore, they made another valid point that employing an ensemble improves performance. This view is also emphasized in [125].

Ekici and Jawzal [126] developed software to automate BC detection from breast thermograms. Their method consists of a CNN optimized by a Bayes algorithm. The dataset used had a cohort of 140, of which 95 were benign and the rest malignant. Due to the class disparity, data augmentation was used to balance the two classes. Also, object-oriented image segmentation was used for ROI extraction. The model’s performance was then compared with results from the literature on the same dataset, some of which used statistical features with various classifiers, namely, SVM, ANN, RBF, etc.

In contrast, their method used CNN features and a CNN for classification. The authors found the accuracy of the nonoptimized CNN to be 97.91%; after optimizing the CNN parameters with a Bayes algorithm, the accuracy rose to 98.95%. We agree with the authors’ view that a CNN is robust in automatic feature extraction and requires less developer effort than manual feature extraction. They also recognized the potential of the DIT protocol in improving performance and advocated for further research.

Gonçalves et al. [127] employed two bioinspired metaheuristic algorithms to fine-tune three CNN architectures for thermogram-based BC detection, namely, VGG-16, ResNet-50, and DenseNet-201. A GA and a Particle Swarm Optimization (PSO) algorithm were used to find optimal FCLs for the three pretrained models and to tune their hyperparameters. In the experiments conducted, the GA outperformed the PSO. However, both algorithms outperformed most manual experiments. Overall, the GA significantly improved the F1 score and maintained it above 0.90 for all architectures. The VGG-16 model improved from 0.66 to 0.92, while the ResNet-50 rose from 0.83 to 0.90. The authors acknowledged that not segmenting the ROI was a limitation of their work and proposed the technique for their next work. Additionally, they acknowledged the DIT protocol’s potential by considering its images for future work.

A recent study by Sánchez-Cauce et al. [128] offers the most innovative approach to detecting breast cancer. The method combines different views of breast thermograms with personal and clinical data considered risk factors for breast cancer. These were used to train a multi-input model with branches comprising a CNN for each image perspective and an ANN for the clinical and personal data. The individual outputs were then combined to form a global output for classification. The model achieved an accuracy of 97%, a sensitivity of 83%, and an area under the ROC curve of 0.99. The authors reached two key conclusions. First, adding lateral views of the breast improved the overall model’s performance. Second, the addition of clinical data helped to identify sick patients. These are valid points because a model trained on lateral views of the breast is less likely to miss lesions that may develop on the sides of the breast area. In addition, the clinical and personal data used are supported by substantial evidence in the literature as risk factors. For example, they used diabetes and age at menarche. Diabetic patients are more at risk of developing cancer [129]. An earlier age of menarche is a risk factor since it increases susceptibility to breast carcinogenesis [130]. Despite its impressive contributions, the study could have employed segmentation to achieve better results.

In [131], clinical data and five breast thermogram views were used for detecting breast cancer. The thermograms were acquired through the SIT protocol and covered frontal, 90-degree, and 45-degree views. Their findings are similar to those in [128], since the addition of clinical data increased the performance of the CNN model. Accuracy improved from 85.4% to 93.8% after adding clinical data. Furthermore, the model achieved a specificity of 96.7% and a sensitivity of 88.9% after adding clinical data. Also, their model was compared with other works in the literature and was found to be highly competitive. However, this work shares a limitation with the work in [128], as segmentation was not used despite its benefits.
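The accuracy, sensitivity, and specificity figures quoted in this section follow the standard definitions based on the confusion matrix of a binary classifier. The short sketch below computes them from label lists, assuming that label 1 denotes the positive (sick) class; the function name and the toy labels are illustrative and do not reproduce any reported result.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Toy labels: 3 sick and 5 healthy cases (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced data, a high accuracy can coexist with a noticeably lower sensitivity, which is why the studies above report these metrics together.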

The study in [132] presented a novel approach to detecting BC with thermograms from the DIT protocol. The method employed representation learning and texture analysis to model temperature changes during DIT and used these representations to build a multi-layered perceptron (MLP) classifier. The classifier achieved 95.8% accuracy. To validate their method, the authors compared it with three similar studies. One of these used only 22 samples and failed to use any form of cross-validation. In contrast, their work used a large dataset and randomly partitioned it into training, development, and test sets. This is good practice, as models trained on small datasets tend to overfit, and a lack of validation risks producing a predictive model that may not generalize to unseen data. Unsurprisingly, the authors concluded that their proposed method produces a compact but descriptive representation of each instance, which led to outstanding classification results compared to related ones from the literature.
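A random partition into training, development, and test sets can be sketched as follows. The 70/15/15 proportions, the function name `split_indices`, and the fixed seed are assumptions for illustration; the proportions used in [132] are not stated here.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.7, dev_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train, development, and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_dev = int(round(dev_frac * n_samples))
    train = order[:n_train]
    dev = order[n_train:n_train + n_dev]
    test = order[n_train + n_dev:]   # whatever remains goes to the test set
    return train, dev, test

train_idx, dev_idx, test_idx = split_indices(100)
```

When several thermograms come from the same patient, splitting by patient rather than by image avoids leaking information between the sets.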

In [133], a hybrid breast cancer detection modality was proposed. The method involved microwaves as a radiation source, thermography as an imaging recorder, and a CNN to obtain quantifiable information on the tumor’s location and size. The method was based on the variation of the electromagnetic power between the healthy and tumorous breasts. A screen plate was placed under the breasts so that the transmitted waves led to a heat pattern forming on them. The difference between the heat patterns was then utilized to check for breast abnormalities. The method was able to detect a tumor with a radius of 5 mm and determine its size and location. In contrast to the X-rays used in mammography, which are associated with an increased risk of breast cancer, this work used microwaves, allowing full penetration with low to no radiation exposure (Table 5).

7. Conclusion

This paper reviews the available information on deep learning-based CADx systems for breast thermograms and the metrics often used to validate these classifiers. It also examines the role of segmentation in DNN performance and existing work on breast thermogram segmentation. According to the studies reviewed, segmentation improved the efficiency of CADx systems by lowering computing complexity. Furthermore, it facilitates asymmetry analysis of breast thermograms, which can assist radiologists in averting further diagnostic conundrums. The most reliable and prevalent deep learning-based classifiers for breast thermograms were CNNs and their derivatives. This study suggests that deep learning and thermography can play a significant role in the early detection of breast cancer.

Additionally, it provides valuable information about the role of artificial intelligence in improving diagnostic workflows in health care. Further work is needed to produce breast thermogram datasets or make them publicly accessible for research, as only a few are easily accessible. Also, since previous work has focused primarily on frontal breast thermograms, more work on the segmentation and classification of lateral breast thermograms is required to cater to lesions that may develop on the lateral sides of the breast [143].

Conflicts of Interest

The authors declare that they have no conflicts of interest.