Abstract

Breast cancer is one of the most common forms of cancer. Its aggressive nature, coupled with high mortality rates, makes this cancer life-threatening; hence, early detection gives the patient a greater chance of survival. Currently, the preferred diagnosis method is mammography. However, mammography is expensive and exposes the patient to radiation. A cost-effective and less invasive method known as thermography is gaining popularity. Bearing this in mind, this work first creates machine learning models based on convolutional neural networks that use multiple thermal views of the breast to detect breast cancer on the Visual DMR dataset. The performance of these models is then evaluated after incorporating clinical data. Findings indicate that adding the clinical data decision to the model increased its performance. After building and testing two models with different architectures, the model that used the same architecture for all three views performed best. It achieved an accuracy of 85.4%, which increased to 93.8% after the clinical data decision was added. With the clinical data decision, the model classified more patients correctly, with a specificity of 96.7% and a sensitivity of 88.9% when considering sick patients as the positive class. Currently, thermography is among the lesser-known diagnosis methods, with only one public dataset. We hope our work will draw more attention to this area.

1. Introduction

Cancer is among the most prevalent diseases today. Among its various types, breast cancer is particularly notable. Breast cancer can affect both males and females. It affects more than 2.3 million women, accounting for more than 11% of the total cancer cases in the world, as stated by GLOBOCAN 2020 [1]. Meanwhile, cases in men are rare, with an incidence rate of 0.5–1% [2].

Breast cancer accounts for 14% of cancers in India. Kerala records the highest cancer rates in the country. Breast cancer specifically is quite common among women in Kerala, accounting for 30–35% of cancer cases. Prevalence rates of 19.8 per 100,000 and 30.5 per 100,000 were observed in rural and urban areas, respectively, as per the Thiruvananthapuram Cancer Registry [3]. This could be attributed to Kerala's growing elderly population, projected to account for nearly 20.9% of the state's population by 2031.

Studies in India have shown an increase in the occurrence of breast cancer among the younger population compared with 25 years ago. About 25 years ago, 69% of affected individuals were above 50; current trends show that approximately 48% are below the age of 50. By 2012, breast cancer accounted for 23–32% of cancer cases in women, overtaking cervical cancer as the most prevalent cancer type [4].

Older women (above 65) have a greater chance of developing breast cancer than younger women (below 35). Hence, many young women do not undergo cancer detection screenings until the age of 40. Due to this, survival rates among the younger affected population are much lower than those observed for affected women aged 65–74 [5].

There are several treatments available today that try to combat the effects of breast cancer. These include surgery to remove cancerous cells, chemotherapy, and radiation. With rising cases and increasing mortality rates, breast cancer has become a major medical concern within society.

A convolutional neural network is built that considers five thermal views of the breast along with the patient's clinical data. This is then used to classify the patient as healthy or sick. The main contributions of this work include the following:
(i) A performance comparison of a single-input CNN and a multi-input CNN to see which one gives better predictions
(ii) Experimental proof that adding clinical data improves the predictions of the model
(iii) Evidence that coupling the results of thermography with other screening techniques can help improve prediction results

In this paper, the sections are organized as follows. Section 2 takes a closer look at the origin of the problem and the problem statement being addressed. An extensive review of breast cancer and the existing work on its identification is given in Section 3. Details regarding the dataset and the data collected are outlined in Section 4. Sections 5 and 6 present the methods for model creation and data preparation. Results are discussed in Section 7. Conclusions and future plans are presented in Section 8.

2. Problem Statement

2.1. Origin of Work

Breast cancer is a prevalent form of cancer in women. Over the years, there has been a significant amount of research that incorporates machine learning principles to help predict the existence of cancerous cells in the breast with high accuracy. Further studies have addressed the localization of such regions, which helps radiologists study them further and provide the necessary treatments. In [6], the authors applied machine learning algorithms to predict the presence of cancer using thermal images of the breast. The most common screening technique currently used is mammography, which has disadvantages such as exposure to radiation, the high cost of a screening, and discomfort to the patient. Paper [6] is among the first to add the patient's clinical data to a multi-input convolutional neural network.

2.2. Problem Definition

Detection of breast cancer is a challenging task, and the disease can be life-threatening if not detected at an early stage. There are many tools and technological advances to detect breast cancer. Mammography has become a popular screening technique. However, mammography exposes the patient to radiation and causes discomfort. Thermographic screenings require no contact with the machine and cost less than mammography, enabling patients to get screenings more often. The advancement of artificial intelligence has enabled deep neural network approaches that aid medical practitioners in rapid diagnosis [7, 8]. The convolutional neural network designed here uses multiple thermal views of the breast for each patient from the benchmark dataset and compares the effect of adding each patient's clinical information to the model.

3. Literature Review

Breast cancer can be detected using many methods; some of them are discussed in Table 1. Thermography is considered a less invasive and more cost-efficient screening technique than mammography [16]. Mammography requires the patient to have contact with machinery during the process, whereas thermography requires no contact while screening. Furthermore, there is no exposure to the radiation that would otherwise arise from mammography. Thermography can also be used to diagnose women of all ages with any breast density, hence making it more suitable.

Due to the aggressive nature of breast cancer and its rapid cell multiplication, the cells expend a lot of energy on repeated cell divisions. This elevated metabolic activity, along with the increased blood flow that feeds it, raises the local skin temperature; hence, the presence of lesions on the breast can be detected through this temperature difference. For this study, images were collected using quantitative thermography. The main reason is that detecting tiny lesions is significant for the early detection of cancer, and temperature readings should be collected for each pixel to get a complete, in-depth view of the patient's breasts. When qualitative thermography is used, an in-depth, detailed image of the breast is not obtained, and such images may not correctly differentiate the slight temperature variations that indicate the early presence of cancer. A quantitative measurement camera can also provide the temperature variations of such regions relative to the rest of the body, which can help any model detect the presence of cancer.

Table 2 shows the different approaches to detecting breast cancer using thermal images. The popularity of thermography is relatively recent. Earlier, thermal images were analyzed by humans, making the process strenuous and inaccurate. With the emergence of artificial intelligence (AI) and machine learning (ML) algorithms, thermal images can be interpreted and used in entirely new ways. This makes thermography a powerful method that can overcome the disadvantages of mammograms [28, 29].

Currently, there are not many studies that test the efficiency of thermograms in predicting the possibility of breast cancer. Hence, devising a model along these lines could contribute to the ongoing research in this area.

3.1. Observations

Breast cancer has risen to be one of the most widespread cancer types, with a fatality rate that depends significantly on factors such as age. Early detection is a crucial method to combat this disease. Compared with older women (above 65 years old), younger women have a much lower probability of being diagnosed with breast cancer, largely because breast cancer screening typically begins only after the age of 40. Three methods discussed in this report for early detection are mammography, clinical examination, and self-examination. A convolutional neural network implementation is used here to predict breast cancer.

The CNN mechanism classifies an image by breaking it down into its features, reconstructing them, and making a prediction at the end. Edge-based samples were considered to reduce the comparison time and space, which results in increased accuracy. The reason for picking thermography over mammography is, as mentioned in this report, that the boom in AI has opened new possibilities, and thermography, being a contactless screening method, is preferable to mammography. The idea behind adding clinical data alongside the thermal images is that it enhances the accuracy of the model, as exhibited in [6]. Table 3 shows some pros and cons of certain other approaches discussed earlier.

3.2. Impact of Technology on Breast Cancer Prediction

Technology has had a substantial impact on the early prediction of breast cancer, and image processing is a very important component of this prediction. Some of the techniques used to obtain images are computerized tomography (CT) scans, MRI, mammograms, thermography, and ultrasound. One very important application of technology is to identify and detect regions of growth and segment those regions; radiologists can use this to study cancers and help the patients [34].

3.2.1. Artificial Neural Network

One of the most important uses of artificial neural networks is their ability to be used in place of mammography and breast MRIs to screen severe cases of cancer. ANNs can be used to detect cancer and to find benign tumours as efficiently and accurately as possible [35]. Mammography can be expensive and, as the patient is exposed to radiation, cannot be used as a screening tool at regular intervals. Mammography can also cause patients some discomfort, which is eliminated when neural networks are used in the early detection of breast cancer.

3.2.2. Convolutional Neural Network

Convolutional neural networks have the ability to extract complex features from any image they process [36]. They help us find patterns that are common among different images. The basic operation performed here is that, for each cell or pixel in the image matrix, the cell is multiplied by the corresponding element in the weight matrix, and the resulting products are summed. In a CNN, the first few layers help identify edges in the image, while deeper layers extract more abstract information. Moving across the network, the dimensions of the image are reduced while retaining the important information used to make a prediction. Table 4 shows some CNN models prevalent today.
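As an illustration of this multiply-accumulate operation, a minimal PyTorch sketch is shown below; the image and kernel values are random placeholders, not weights from any of the models in this work.

```python
import torch
import torch.nn.functional as F

image = torch.rand(1, 1, 6, 6)     # (batch, channels, height, width); random placeholder
kernel = torch.rand(1, 1, 3, 3)    # one 3x3 weight matrix

# At every position, each pixel is multiplied by the matching kernel weight
# and the products are summed, giving one value of the output feature map.
feature_map = F.conv2d(image, kernel, stride=1)
print(feature_map.shape)           # torch.Size([1, 1, 4, 4])
```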

4. Materials

4.1. Mammographic Images Dataset
4.1.1. Digital Mammography Dataset

This dataset consists of only the clinical data generated from around 40,000 mammograms (20,000 digital mammograms and 20,000 film screen mammograms) collected between the years 2005 and 2008 by the Breast Cancer Surveillance Consortium. The clinical data includes mammogram assessment, details on breast cancer diagnosis within one year, age of the patient, family history of breast cancer, biopsy details, breast density, etc. [40].

4.1.2. MIAS Mini Mammography Database (i.e., Mini-MIAS Database of Mammograms)

The dataset consists of digital mammograms collected by the Mammographic Image Analysis Society. It contains 322 images from 161 patients, with the right and left views of the breast for each patient. Each image has a resolution of 50 microns. Details of the presence of any abnormalities are also included in the dataset. All images are in grayscale format with a size of 1024 × 1024 pixels. The dataset was compiled in 2015, and its version is 1.21 [40, 41].

4.1.3. Digital Database for Screening Mammography (DDSM)

This dataset of mammograms consists of 2500 studies, where each sample contains two images of the breast along with the patient's clinical information, such as breast density; the presence of abnormalities is also noted. The resolution of the images ranges from 42 microns to 50 microns. The images are classified as normal, benign, or malignant. The dataset was compiled by Massachusetts General Hospital, Sandia National Laboratories, and the University of South Florida Computer Science and Engineering Department [42].

4.2. Thermographic Images Dataset

Thermography is a detection method that uses an infrared camera to identify heat patterns based on the infrared emissions given off by an individual. Database for Mastology Research (DMR) [43] is an online platform that stores and manages mastologic images for early detection of breast cancer. This database includes the data of the patients from University Hospital Antonio Pedro (HUAP) of the Federal Fluminense University of Brazil.

The database contains data from 293 patients. It includes thermal images (front and lateral views), thermal matrices (where each pixel contains the thermal information), and clinical and personal information for each patient (menstrual details, medical history, data acquired as part of the protocols, previous positive or negative diagnoses, age, and eating habits). Two protocols are used in thermography, since the purpose of a thermogram is to observe how the body's cells react under different temperature conditions.

Static protocol: the patient rests with the help of brackets while frontal and lateral views (including right, left, right oblique, and left oblique views) are captured. The purpose of this is for the patient to attain thermal stability.

Dynamic protocol: once thermal stability is attained, the patient is cooled using a conditioning fan. Frontal and lateral images are then obtained sequentially (with a 5-second gap).

Images are recorded using a FLIR SC620 thermal camera under both the static and dynamic protocols. Their dimensions are 640 × 480 pixels.

5. Methods

5.1. Steps for Predicting Breast Cancer Using Thermal Images

The dataset was obtained from the Database for Mastology Research (DMR). The thermal images of five different views (front, left 45°, right 45°, left 90°, and right 90°) of the breasts were used. Along with this, the clinical data of each patient was recorded using web scraping.

After the extraction of thermal images and clinical data, we moved on to image and data preprocessing. The images collected were of dimensions 640 × 480. Patients with fuzzy images, images that did not follow the protocol, or missing views were removed from the dataset. For data preprocessing, any anomalous entries were deleted as well. In the current implementation, the CNN models that were built and trained work on square images. Hence, as part of data preprocessing, all images were transformed to a size of 640 × 640 using the resize functionality provided by PyTorch. After this, the data passes through the classification stages shown in Figure 1.
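A minimal sketch of this resizing step is shown below, assuming torchvision's transform API; the filename and the single-channel grayscale conversion are illustrative assumptions.

```python
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # thermal image as a single channel
    transforms.Resize((640, 640)),                # 640 x 480 -> 640 x 640 square
    transforms.ToTensor(),                        # tensor of shape (1, 640, 640)
])

img = Image.open("patient_001_front.png")         # hypothetical filename
tensor = preprocess(img)                          # ready to feed to the CNN
```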

5.2. Architecture Diagram

Once the image and clinical data preprocessing is completed, the focus shifts to building models that could be used for classification. In the current design, a convolutional neural network is built for each of the five views of the breast. Each model is trained separately to make sure that they perform well individually before combining their output. The output of each model will be a tuple containing the probability of the patient being sick and the probability of the patient being healthy. The probability of a patient being diagnosed as healthy is denoted by P(H) and the likelihood of a patient being diagnosed as sick is P(S). Table 5 shows the output that would be generated by each model.

Once the training of the five neural networks is completed, the results will be combined to train the final neural network. The combined result will be a 10-element tuple such that each of the 5 views will contribute the probability of a healthy and sick diagnosis. The set of these tuples for all of the patients who were part of the training data is then used to build and train the final neural network. The resulting output of the final neural network will be the prediction of healthy or sick by the multi-input CNN model.
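The fusion step could look like the following sketch; the per-view probabilities and the hidden layer size of the final network are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical per-view outputs for one patient: 5 views x (P(H), P(S)).
view_outputs = [torch.tensor([0.3, 0.7]) for _ in range(5)]
fused = torch.cat(view_outputs)               # the 10-element tuple described above

# The final network maps the 10 fused probabilities to a healthy/sick
# prediction; the hidden size of 16 is an assumption, not from the paper.
final_net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
prediction = final_net(fused).softmax(dim=0).argmax()   # 0 = healthy, 1 = sick
```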

The next step is to add the personal and clinical data and observe the change in performance after appending the result of a model trained on the clinical data. The clinical data collected includes information about age, conditions, symptoms, family history, menarche, etc. Once categorization of this dataset is completed, a neural network is built for the clinical data. The output of this neural network is a value between 0 and 1, and a threshold of 0.5 is set to decide if the prediction is healthy or sick:
(i) If P(clinical data) < 0.5, the patient diagnosis prediction is healthy
(ii) If P(clinical data) > 0.5, the patient diagnosis prediction is sick

Once this clinical data neural network model is trained, its output is appended to the output of the multi-image CNN model, giving an 11-element tuple that is passed as input to the final neural network. Images and clinical data are treated separately, and only the outputs from their respective models (i.e., the CNNs for images and the neural network for clinical data) are concatenated to determine the final classification. Different performance evaluation metrics are run on the final neural network to see how the addition of clinical data affects its performance. A detailed overview of the steps followed to obtain the prediction using thermal images alone, and using thermal images along with clinical data, is shown in Figure 2.
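A sketch of how the clinical output might be appended is given below; whether the thresholded decision or the raw probability is concatenated, and the layer sizes of the final network, are our assumptions.

```python
import torch
import torch.nn as nn

fused = torch.rand(10)                    # stands in for the 10-element multi-view output
clinical_prob = torch.tensor([0.82])      # output of the clinical-data ANN, in [0, 1]
decision = (clinical_prob > 0.5).float()  # > 0.5 -> sick (1.0), otherwise healthy (0.0)

fused_plus_clinical = torch.cat([fused, decision])   # the 11-element input
final_net = nn.Sequential(nn.Linear(11, 16), nn.ReLU(), nn.Linear(16, 2))
prediction = final_net(fused_plus_clinical).softmax(dim=0).argmax()
```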

6. Data Preparation

The images are recorded using a FLIR SC620 thermal camera, which is captured using static and dynamic protocols (to see details of each protocol, refer to the dataset section of the report). The different views captured are shown in Figure 3.

6.1. Image Preprocessing
(i) Identify the inaccurate or incomplete entries, then either try to rectify them or delete the entry. The same is applied to the thermal and personal clinical data
(ii) All patients that did not have all five views (i.e., front, left 45°, right 45°, left 90°, and right 90°) were removed from the dataset
(iii) If the static version of a front or lateral view was not available or was fuzzy, it was replaced with the image taken with the dynamic protocol
(iv) Images were removed if they were as follows:
    (i) Blurry and barely visible (refer to Figure 4(a))
    (ii) Showing the presence of injury (refer to Figure 4(b))
    (iii) Taken without following the proper protocols during data collection (for example, keeping arms down or images taken from a different angle) (refer to Figure 4(c))
6.2. Clinical Data Preprocessing
(i) Since the database was procured from Brazil, the text had to be converted to English
(ii) The patient's age was constantly updated in the database, so it was converted to the age at the time of the visit
(iii) Some cases, such as Patient 398 with an age of 0 and Patient 211 with an age of 120, were anomalies and were removed
(iv) All blank entries were filled appropriately with values such as "not answered"
(v) The final set of features included the following:
    (i) Discrete features: diagnosis, marital status, race, complaints, symptoms, signs, menopause, eating habits, family history, cancer family, mammography, radiotherapy, plastic surgery, prosthesis, biopsy, use of hormone replacement, signs of wart on breast, smoking habit, drinks coffee, consumes alcohol, physical exercise, applied products, visible nipple changes, and body temperature
    (ii) Continuous features: age at time of screening and menarche age
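As an illustration of this cleaning, a minimal pandas sketch follows; the file name and column names are assumptions based on the feature list above, not the dataset's actual headers.

```python
import pandas as pd

df = pd.read_csv("clinical_data.csv")         # hypothetical export of the scraped data

df = df[(df["age"] > 0) & (df["age"] < 120)]  # drop anomalies such as ages 0 and 120
df = df.fillna("not answered")                # blank entries -> "not answered"

# Discrete features are encoded as integer categories; continuous features
# (age at screening, menarche age) are left numeric. Column names are assumed.
discrete = ["marital_status", "race", "symptoms", "menopause", "family_history"]
for col in discrete:
    df[col] = df[col].astype("category").cat.codes
```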
6.3. Construction of Classification Model

There are three kinds of CNN layers:
(i) Pooling layer
(ii) Convolutional layer
(iii) Fully connected layer

Using these layers, each thermal image passes through its model independently, and the outputs are concatenated to predict the outcome of the disease.

6.4. Performance Evaluation Metrics
6.4.1. Current Dataset after Preprocessing

After preprocessing, there are 157 healthy patients and 84 sick patients. These are split into 80% for the training set and 20% for the test set, as shown in Table 6. The 80/20 ratio is the most common way of splitting data between training and test sets and generally provides good results; in practice, 70–80% of the data is used for training and the remaining 20–30% for testing.
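A minimal sketch of such a split is shown below, assuming scikit-learn; the placeholder arrays, the stratification, and the random seed are our assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 157 healthy and 84 sick patients.
features = np.random.rand(241, 10)
labels = np.array([0] * 157 + [1] * 84)   # 0 = healthy, 1 = sick

X_train, X_test, y_train, y_test = train_test_split(
    features, labels,
    test_size=0.20,       # 20% held out for testing
    stratify=labels,      # keep the class ratio in both splits (an assumption)
    random_state=42,      # for reproducibility
)
```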

6.4.2. Formulas Used for Model Evaluation

The sick class is considered positive, while the healthy class is deemed to be negative. Table 7 outlines the metrics used for the evaluation of the models.
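For reference, one way the metrics of Table 7 could be computed with sick as the positive class is sketched below; the labels shown are placeholders.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0]           # placeholder labels: 1 = sick, 0 = healthy
y_pred = [1, 0, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # proportion of sick patients found
specificity = tn / (tn + fp)          # proportion of healthy patients found
accuracy = (tp + tn) / (tp + tn + fp + fn)
```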

7. Results and Discussion

7.1. Loss, Optimizers, and Normalizing Functions

For all the models discussed below, a Stochastic Gradient Descent (SGD) optimizer was used. The Adam optimizer was also tried; however, the model overfit very quickly, which negatively affected the overall diagnosis prediction. When SGD was used with a learning rate of 0.001, the loss reduced after each epoch, and training of the model was stopped after a certain number of epochs, before the loss started to flatten. The Cross Entropy Loss function was used to calculate the loss between the actual and predicted values.
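A minimal training-loop sketch matching this setup is given below; the stand-in model, the toy dataset, and the epoch count are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for one per-view CNN and its data; both are assumptions.
model = nn.Sequential(nn.Flatten(), nn.Linear(640 * 640, 2))
data = TensorDataset(torch.rand(8, 1, 640, 640), torch.randint(0, 2, (8,)))
train_loader = DataLoader(data, batch_size=4)   # batch-wise training as described

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # SGD, learning rate 0.001
criterion = nn.CrossEntropyLoss()                          # actual vs. predicted loss

num_epochs = 20   # assumption; training stops before the loss flattens
for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```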

Before the data was passed through the ANN, it was normalized using the MinMaxScaler. The scaler is used to remove any form of bias before the data reaches the ANN, so that the decision-making process of the neural network is not skewed by differences in feature scale. The models were trained batch-wise so that they saw different sets of inputs, improving their performance.

7.2. Model 1: Same Architecture for All 3 Views

In the current implementation, the models use three views of the thermographic images of the breast (frontal, left 90°, and right 90°). The model in Figure 5 uses convolutional layers with channel sizes of 32, 64, and 128. Strides of 2 and pooling layers of size 2 × 2 reduce the matrix from 640 × 640 × 1 to a 1 × 2 matrix containing the probabilities. In this multi-input CNN, the same convolutional neural network architecture was used to train all three views, and their outputs were combined. After this, another model was built that combines the results of the three CNNs with the neural network trained on the clinical data. The results of the model were then compared with those of the same model after the decision from the clinical data was added.
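A sketch of what one per-view CNN with these channel sizes could look like is shown below; the kernel sizes, padding, and classifier layout are assumptions, chosen only so that the stated 640 × 640 × 1 to 1 × 2 reduction works out.

```python
import torch
import torch.nn as nn

class ViewCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),    # 640 -> 320 (stride-2 conv) -> 160 (2x2 pool)
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),    # 160 -> 80 -> 40
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),    # 40 -> 20 -> 10
        )
        self.classifier = nn.Linear(128 * 10 * 10, 2)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x).softmax(dim=1)   # the two class probabilities

probs = ViewCNN()(torch.rand(1, 1, 640, 640))      # 1 x 2 probability output
```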

7.2.1. Model Performance

Model 1 had an accuracy of 85.4% without clinical data and 93.8% with clinical data. The proportion of sick patients classified correctly is about 77.8% without clinical data, increasing to 88.9% after the addition of the clinical data decision. Similarly, 90% of the healthy patients were classified correctly without clinical data, increasing to 96.7% after the addition of the clinical data decision. The performance scores of the model are given in Table 8.

The area under the ROC curve is around 0.89 without clinical data and 0.987 with clinical data. As this dataset has a class imbalance, and since we are more interested in the prediction of sick patients, we also calculate precision and recall and the curve associated with them. A perfect predictive model would have a curve passing through (1, 1). The precision-recall curve tends towards (1, 1), indicating that the model is performing well. The AUC score for this precision-recall curve is 0.841 without clinical data and 0.977 with clinical data. The plots for the model with and without clinical data can be seen in Figure 6.

To give a better perspective, a blue dotted line is drawn on both graphs to denote a model with no skill, i.e., one that simply outputs predictions at random. If the curve of the model (here, the multi-input CNN) lies above this line, as seen in both the ROC and precision-recall curves, it signifies much better performance than the no-skill model.
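The curve metrics reported above can be computed as in the following sketch, assuming scikit-learn; the labels and scores are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # placeholder labels (1 = sick)
y_score = np.array([0.9, 0.2, 0.7, 0.8, 0.4, 0.1, 0.6, 0.3])  # predicted P(sick)

roc_auc = roc_auc_score(y_true, y_score)        # area under the ROC curve
precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)                 # area under the precision-recall curve
```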

7.3. Model 2: Different Architectures for Frontal and 90° Views

This model also uses three views of the thermographic images of the breast (frontal, left 90°, and right 90°). In the model in Figure 7, the left and right views use the same CNN with channels of sizes 45, 90, 135, and 180, while the front view uses a different model with channels of sizes 50, 100, 150, and 200. The multi-input CNN reduces the image to output probabilities of size 1 × 2. The results of the model, along with its performance after the addition of clinical data, are given in Table 9.

7.3.1. Model Performance

Model 2 had an accuracy of 81.2% without clinical data and 89.6% with clinical data. The proportion of sick patients classified correctly is about 75.0% without clinical data and 87.5% after the addition of the clinical data decision. Similarly, 84.4% of the healthy patients were classified correctly without clinical data, increasing to 90.6% after the addition of the clinical data decision. The area under the ROC curve is around 0.802 without clinical data and 0.900 with the clinical data decision. The AUC score for the precision-recall curve is 0.771 without clinical data, increasing to 0.923 after the addition of the clinical data decision. The plots for the model with and without clinical data can be seen in Figure 8.

7.4. Discussion
7.4.1. Clinical Data

Symptoms such as irritation and rashes are common in women with breast cancer. Research also shows that factors such as age [44] or hormone replacement [45] play a role in the disease. Such data coupled with the thermal images would add value as they give a more extensive idea of the patient and their health.

The models created so far have performed with high accuracy after adding the patient's clinical information, as seen from their results. Some signs and symptoms can indicate cancerous cells in the body; hence, such clinical information can effectively predict the presence of breast cancer. After the patient's information is collected, it is passed through an ANN. ANNs use backpropagation, which helps identify which features are more important, and then use this information to make a decision.

Since the features collected are general to any patient, this information can be reused for other problems. For example, if another cancer such as lung cancer were being detected, the same ANN could be trained on patients' diagnoses for that disease. The ANN would then learn which features are most important for lung cancer and use that information to make a decision. Hence, the addition of clinical data is an easy and effective diagnostic aid that can be used for a variety of purposes.

Currently, most datasets only collect the images of the different screenings of a patient and use them to make a decision. This work aims to encourage researchers to collect clinical data along with the images and use this information in the diagnosis prediction as well.

7.4.2. CNN

Generally, the CNN used to classify the frontal images was able to classify both the healthy and sick patients with an accuracy above 0.7. When different CNN models were trained for the different views, it was noticed that a model would classify either the healthy or the sick samples with high accuracy while the other class had an accuracy of around 0.6. In such cases, different numbers of epochs were tried for the different models, and the final model was built after ensuring that each performed its best individually. The performance of the per-view models, before the combination of their results, is shown in Table 10.

Data augmentation techniques are mainly used when the dataset is small, to generate more samples so that the model can learn better. Multi-view CNNs perform better than single-view CNNs: in a multi-view CNN, different views of the same object are used as input, so the information and features generated from each image can be pooled together to improve the prediction. Data augmentation techniques have also been used to complement or improve the performance of multi-view CNNs [46].

Quantitative comparison of our work with other methods using thermographic images is shown in Table 11.

8. Conclusion

From the current work, it is found that the addition of clinical data to the convolutional neural networks increased the model's ability to correctly classify a patient as healthy or sick. We see an accuracy of 93.8% for the model that uses clinical data versus 85.4% for the one that does not. The addition of clinical data helps strengthen the prediction of breast cancer using CNN models. These findings could also bring to light the importance of increasing research in the area of thermography. Increased research could improve the techniques employed for thermography and thus help it become a standard in breast cancer detection in the future.

Since only a multi-input CNN was used for classification, a comparative study of the performance of a single-input CNN on the same input is planned. Another possible experimental study is the use of pretrained CNN models for classification. Data preprocessing techniques that could be explored include data augmentation and image segmentation. It will be interesting to see how each model interacts with the limited data available.

Data Availability

The data that was used to obtain the findings can be found in the publicly available Visual DMR dataset [43].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by Dr. Sunitha V.C., Additional Professor and Head, Department of Radiodiagnosis, JIPMER, Pondicherry, India.