Abstract

Thyroid nodules are among the common life-threatening diseases, and their incidence has increased in recent years. Ultrasound imaging is a commonly used diagnostic method for detecting and characterizing thyroid nodules. However, assessing the images in their entirety is time-consuming and challenging for experts. Assessing ultrasound images in a meaningful manner therefore requires automated, trustworthy, and objective approaches. Recent advances in deep learning have revolutionized many aspects of computer-aided diagnosis (CAD) and image analysis tools that address the problem of diagnosing thyroid nodules. In this study, we explain the objectives of deep learning in thyroid cancer imaging and review the literature on its potential, limits, and current applications in this area. We give an overview of recent progress in thyroid cancer diagnosis using deep learning methods and discuss various challenges and practical problems that might limit the growth of deep learning and its integration into clinical workflows.

1. Introduction

Thyroid cancer has become more common over the last three decades [1]. The most recent estimate reported by the American Cancer Society, for 2022, is approximately 43,800 new cases of thyroid cancer and about 2,230 deaths [2]. Thyroid cancer is a solid tumor that usually shows up as a nodule or mass in the thyroid gland, at the front base of the throat [3]. It arises when rogue cells reproduce too rapidly for the immune system to control [4]. Generally, cancer results from mutations or other changes in the genes responsible for controlling cell function; as a result, cells reproduce uncontrollably and spread into surrounding tissues [5]. Several types of thyroid cancer exist, but two types, follicular and papillary thyroid cancer, are by far the most common and account for 95% of thyroid cancers [6].

Treating malignant thyroid nodules that are detected early, before the cancerous cells spread beyond the thyroid gland, can result in more effective treatment and less harm [7]. Thyroid cancer screening is a procedure for the early detection of malignant thyroid nodules [8]. Thyroid cancer is detected using two major methods: (1) palpation of the neck during a physical examination and (2) ultrasonography, which can detect both palpable and nonpalpable nodules, especially those less than 1 cm in diameter [9]. Ultrasonography is the primary diagnostic tool for identifying the characteristics of thyroid nodules, and these characteristics help to classify nodules as benign or malignant [10–12].

Over the last decades, computer-aided diagnosis (CAD) has been employed as a new technique for the automatic diagnosis of thyroid nodules. Implementing artificial intelligence in CAD tools makes them smarter and increases the accuracy and consistency with which ultrasonography features are interpreted, ultimately decreasing the number of unnecessary biopsies. Machine learning and deep learning are the underlying techniques of AI-based CAD systems that greatly impact the medical field [13]. These methods rely on experts’ knowledge to choose the essential features from a set of predefined characteristics collected from the region of interest [14]. In thyroid ultrasound images, features such as margin, shape, echogenicity, calcifications, and composition have been used in many studies to develop CAD systems, and the efficiency of these systems has been demonstrated previously [15–17]. Previous research has shown how traditional machine learning and deep learning algorithms, such as support vector machines [18], GoogLeNet [19], and convolutional neural networks (CNNs) [20, 21], have changed thyroid nodule diagnosis. With the development of machine learning and artificial intelligence, the constraints on employing CAD tools in the everyday routine of physicians and experts have been overcome significantly [22, 23].

This paper presents a comprehensive review of the deep learning approaches used for the diagnosis of thyroid cancer. Most of the reviewed papers were published after 2018, indicating that deep learning algorithms performed well in thyroid nodule classification and have therefore gained much attention in recent years. Section 2 reviews the deep learning methods that have previously been applied to thyroid nodule classification. It explains deep learning methods such as CNNs, generative adversarial networks (GANs), autoencoders, long short-term memory (LSTM), deep belief networks (DBNs), and recurrent neural networks (RNNs), and introduces the investigations that applied these approaches to thyroid cancer classification. The rest of the paper provides the discussion and conclusion.

2. Deep Learning Methods Reviews

Deep learning is a part of artificial intelligence that uses artificial neural networks. It is a machine learning technique for extracting patterns and making predictions from large datasets. The growing application of deep learning models in health care, combined with the availability of well-characterized cancer datasets, has pushed research into deep learning’s utility in analyzing cancer cells. The following sections provide a comprehensive review of the deep learning models used for thyroid cancer classification.

2.1. Convolutional Neural Networks

A convolutional neural network (CNN) is a type of neural network that has one or more convolutional layers. These layers are used to process images, classify data, segment data, etc. [24]. A CNN is similar to a traditional artificial neural network and is made up of neurons that learn to optimize themselves. Each neuron receives data and performs an operation, which is the foundation of many artificial neural networks. The complete network expresses a single perceptual scoring function from the raw input image vectors through to the final output class score [25, 26]. As a feed-forward neural network, a CNN uses a grid-like layout to evaluate visual images, process data, and detect and categorize items in an image. A CNN does three things to make image classification practical:
(1) Reduce the number of input nodes
(2) Tolerate small shifts in where the pixels are in the image
(3) Take advantage of the correlations observed in complex images

Figure 1 shows the simple architecture of a CNN. The CNN architecture is made up of several layers (known as multibuilding blocks). Each layer and its function are described below:
(1) Convolutional layer: Convolutional layers are the fundamental component of a CNN structure. Each is made up of several convolutional filters. The output feature map is generated by convolving the input image (represented as N-dimensional matrices) with these filters.
(2) Pooling layer: Pooling is a technique used in convolutional neural networks to enable the network to recognize features regardless of their location in the image by generalizing the characteristics retrieved by the convolutional filters.
(3) Fully connected layer: In a neural network, fully connected layers are layers in which all of the inputs from one layer are connected to each activation unit of the next layer.
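The three layer types can be illustrated with a minimal pure-Python sketch. It is not taken from any of the reviewed studies: the function names (`conv2d`, `max_pool2d`, `fully_connected`), the 5×5 toy image, the edge filter, and the weights are all hypothetical values chosen for illustration, and real systems would use a deep learning framework with learned filters.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the filter over the image (no padding, stride 1) to get a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; keeps the strongest response in each window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(inputs: List[float], weights: Matrix,
                    biases: List[float]) -> List[float]:
    """Every output unit is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)          # 4x4 map, peaks at the edge
pooled = max_pool2d(feature_map, size=2)     # 2x2 map, edge still visible
flat = [v for row in pooled for v in row]    # flatten for the dense layer
scores = fully_connected(
    flat,
    weights=[[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]],  # illustrative
    biases=[0.0, 0.0],
)
```

In this toy case the filter responds only at the edge, pooling preserves that response while halving the resolution, and the fully connected layer turns the pooled values into two class scores.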

CNNs have been widely applied to ultrasound images to classify thyroid cancer and have proved efficient in thyroid disease diagnosis based on medical imaging [19, 27, 28]. According to recent research, CNNs had the greatest accuracy rates among the models used for thyroid cancer diagnosis. Table 1 lists the 20 most recent articles that used CNNs to diagnose thyroid cancer. Specificity, sensitivity, and accuracy rates are used to measure each method’s performance.
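For reference, the three measures follow directly from the counts in a binary confusion matrix. The sketch below uses their standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and do not come from any study in Table 1.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fn: malignant nodules classified correctly/incorrectly.
    tn/fp: benign nodules classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical test set: 100 malignant and 100 benign nodules
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

Because accuracy depends on the ratio of malignant to benign cases in the test set, the three values are best read together rather than in isolation.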

In this paper, we chose 20 articles out of 170 initially selected publications. These publications show remarkable growth in the use of CNNs for the assessment of thyroid nodules in recent years (Table 1). CNNs are mainly used to identify suspicious nodules and to diagnose diseases such as cancer by classifying nodules as malignant or benign, a capability that has driven their growing use over the last years. Lee et al. introduced a deep learning-based CAD system for patients with thyroid cancer. Eight different CNN models were compared on their accuracy in classifying thyroid cancer tumors. As shown in Table 1, ResNet50 performed better, with higher accuracy, sensitivity, and specificity rates [29]. Another method that has proved to perform well is VGG16. Lin et al. proposed a deep learning approach based on VGG16 and used a database of whole slide images (WSIs) for this purpose. The proposed approach achieved 99% accuracy and 94% sensitivity [31]. The Xception neural network also demonstrated high accuracy in diagnosing brain tumors. Zhang et al. indicated the high accuracy of this approach by applying the Xception neural network to CT images [38]. Cascade Mask R-CNN was applied to ultrasound images of thyroid cancer to distinguish benign from malignant tumors, and the experimental results indicated 94% accuracy [46].

2.2. Generative Adversarial Networks

Generative adversarial networks (GANs) were introduced as a type of generative model and have gained much attention among artificial intelligence researchers since their introduction. GANs are inspired by the idea of two-player zero-sum games. These models estimate the underlying distribution of a given dataset and then generate new samples from the estimated distribution [48]. GAN techniques have been widely applied in fields including image processing, computer vision, speech processing, and language processing because of their exceptional capability for dealing with many types of problems [49]. Typically, a GAN is made up of a generator and a discriminator that learn simultaneously. The generator captures the probability distribution of the given dataset and generates new data samples based on that distribution [50]. The discriminator, usually a binary classifier, is responsible for distinguishing real data from fake data. Both the generator and the discriminator can use a deep neural network structure. A GAN is trained through minimax game optimization with the goal of reaching a Nash equilibrium, at which the generator has captured the distribution of the given dataset [51]. Figure 2 shows the GAN model and its simple structure.
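The minimax game can be written compactly. The objective below is the standard formulation from the original GAN literature rather than something stated in the reviewed papers, and the notation is an assumption introduced here: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise distribution from which the generator's input $z$ is drawn.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes $V$ by assigning high probability to real samples and low probability to generated ones, while the generator minimizes it by producing samples that the discriminator accepts as real. At the equilibrium the generated distribution matches $p_{\text{data}}$ and the best discriminator can do is output $D(x) = 1/2$ everywhere.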

GANs generate fake data that are very close to the real data [48]. They are the second most used approach for thyroid nodule classification. For example, Zhang et al. proposed an adversarial learning-based approach for tissue recognition that synthesizes medical images. The synthesis model is based on the Wasserstein GAN, deep convolutional GAN, and boundary equilibrium GAN approaches. The researchers reported 98.83% accuracy for tissue recognition with synthetic images [52]. In another paper, Yang and Qianqian proposed a semisupervised learning model that integrates domain knowledge into the training of dual-path conditional GANs. They also suggested a semisupervised support vector machine for classifying thyroid nodules. After putting the model to the test, they found that it successfully avoids the mixed outcomes that might arise when using a limited dataset [53]. Zhao et al. introduced a novel thyroid cancer classification approach based on multimodal domain adaptation. To deal with visual discrepancies between modalities, the researchers created semantic consistency GANs and used adversarial learning between dual domains based on the self-attention mechanism. This approach reached an accuracy of 94.30% in classifying benign and malignant nodules [54]. Shi et al. presented a knowledge-guided adversarial augmentation technique for synthesizing medical images. They designed term and image encoders to extract domain knowledge based on radiologists’ ideas. This domain knowledge is then used as a condition to constrain the auxiliary classifier GAN and generate high-quality thyroid nodule images. The researchers tested the proposed model on the classification of ultrasonography thyroid nodules and reported an accuracy of 91.46% [55]. Levine et al. investigated the effectiveness of GANs for creating high-resolution pathology images. The researchers looked at ten histological types of cancer, including five cancer types from the Cancer Genome Atlas and the five primary histological subtypes of ovarian carcinoma. They showed that histotype classification of real and synthetic images reached similar accuracies [56].

2.3. Other Deep Learning Approaches

There are other deep learning approaches that have been applied to ultrasound images for thyroid cancer diagnosis. In the following, a review of deep learning applications is presented.

2.3.1. Autoencoders

Autoencoders (AEs) are a subtype of neural networks. They are primarily meant to encode the input, i.e., represent it in a compressed and meaningful manner, and then decode it, i.e., reconstruct the encoded input with the maximum possible similarity to the original input. Unsupervised learning and deep architectures rely heavily on autoencoders for transfer learning and other tasks. AEs have been widely used in the medical field for tumor classification, but only a few studies have applied them to the classification of thyroid cancer. For example, Ferreira et al. applied six distinct AE types to thyroid nodule classification, together with two different techniques for training the classification model. With an F1 score of 99.61% ± 0.54, they concluded that combining a deeper classification network with the reconstruction of the input space outperformed previous studies [57]. In another study, Ferreira et al. automatically classified tumor samples by analyzing their gene expression. The researchers sought to develop a methodology for distinguishing five different cancer types from RNA-Seq datasets: thyroid, skin, stomach, breast, and lung cancers. They used autoencoders to initialize the weights of deep neural networks and compared the performance of three different autoencoders. The results indicated an average F1 score of 99.03 for the RNA-Seq data [58]. Li et al. also employed a stacked denoising sparse autoencoder to categorize thyroid nodules. Their study used immune-related genes and gene expression data from thyroid nodule tissues to build the classifier. The experimental results on distinguishing benign from malignant thyroid nodules demonstrated an accuracy of 92.9% [59].
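The encode, decode, and reconstruct cycle can be sketched as follows. This is only a forward pass through a linear autoencoder with hand-chosen weights, which is an assumption made to keep the example short; the studies above learn the weights from data and use deeper, nonlinear variants. The names `encode`, `decode`, `W_ENC`, and `W_DEC` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix) -> Vector:
    """Compress the input into a lower-dimensional code."""
    return matvec(w_enc, x)


def decode(code: Vector, w_dec: Matrix) -> Vector:
    """Reconstruct an input-sized vector from the code."""
    return matvec(w_dec, code)


def mse(a: Vector, b: Vector) -> float:
    """Mean squared reconstruction error."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hand-chosen (not learned) weights: the encoder averages each pair of
# features into one code value, and the decoder copies it back to both.
W_ENC = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]
W_DEC = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]

x_redundant = [2.0, 2.0, 4.0, 4.0]   # structure the 2-value code can capture
x_irregular = [1.0, 3.0, 4.0, 4.0]   # structure the code cannot capture

err_redundant = mse(x_redundant, decode(encode(x_redundant, W_ENC), W_DEC))
err_irregular = mse(x_irregular, decode(encode(x_irregular, W_ENC), W_DEC))
```

The first input is reconstructed exactly from a code half its size, while the second is not. Training an autoencoder amounts to adjusting the weights so that reconstruction error is low on the data of interest, which forces the code to capture that data's structure.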

2.3.2. Long Short-Term Memory

Long short-term memory (LSTM) is a kind of recurrent neural network (RNN) that is able to learn order dependencies in sequence prediction problems. These networks are designed to avoid the long-term dependency problem, so remembering information for long periods of time is their default behavior. Chen et al. applied LSTM in a new approach that represents the report at two layers, a word vector layer and a sentence representation layer, each employing bidirectional long short-term memory and an attention mechanism. The resulting model performed well [60]. Wu et al. applied ML algorithms, including gradient boosting trees, k-nearest neighbors, decision trees, naïve Bayes, logistic regression, random forest, and an LSTM model, to time-series tumor marker data from two large asymptomatic cohorts comprising 163,174 records. Compared to the other ML models, the LSTM model proved the best at handling erratic data [61].
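How an LSTM retains information can be seen in a single cell update. The sketch below is a scalar version of the standard LSTM equations (without peephole connections), not the architecture of either study above; the parameter names in the dictionary and the gate biases are hypothetical values chosen so that the cell keeps its memory.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (hidden state, cell state)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


KEYS = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]

# Hypothetical parameters: forget gate almost fully open, input gate almost shut
remember = dict.fromkeys(KEYS, 0.0)
remember.update(bf=10.0, bi=-10.0)

h, c = 0.0, 1.0                      # start with a value stored in the cell
for x in [0.3, -0.7, 0.1] * 10:      # 30 time steps of arbitrary input
    h, c = lstm_step(x, h, c, remember)
```

After 30 steps the cell state is still close to its initial value of 1.0, because the forget gate multiplies it by a factor near 1 at every step. In a trained network the gates are learned, so the cell decides from the data what to keep and what to overwrite.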

2.3.3. Deep Belief Network

The deep belief network (DBN) is a type of deep neural network made up of multiple layers of restricted Boltzmann machines. DBNs address the limitations of training conventional neural networks with deep layered architectures, including getting stuck in local minima because of poorly chosen parameters, slow learning, and the need for big training datasets. The only paper that applied the DBN method to thyroid nodule diagnosis is the research by Pavithra and Parthiban. They presented a new model that combines pigeon-inspired optimization (PIO) with the DBN, named PIO-DBN, for the classification and diagnosis of thyroid disease. The PIO-DBN model reached maximum accuracies of 98.91% and 96.28% on the two thyroid datasets used to evaluate the model [62].

2.3.4. Recurrent Neural Networks

Recurrent neural networks (RNNs) are a kind of artificial neural network used for dealing with sequential or time-series data. The distinguishing feature of these algorithms is their “memory.” In RNNs, the information from prior inputs can influence the current input and output. These deep learning algorithms are commonly applied to ordinal or temporal problems. For thyroid cancer nodule diagnosis, Begum et al. utilized a bidirectional RNN to evaluate patients’ risk of developing thyroid illness. The proposed approach achieved an accuracy of 98.72% [63]. Also, Santillan et al. studied distinguishing malignant from benign thyroid lesions by applying five neural network approaches, and the results indicated that the RNN model performed better than the rest, with an accuracy of 98% [64].
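The "memory" of an RNN comes from feeding the previous hidden state back into the current step. The following scalar sketch uses the basic recurrence h_t = tanh(w_x·x_t + w_h·h_{t−1} + b); the function name `rnn_final_state`, the weights, and the two input sequences are hypothetical and unrelated to the cited studies.

```python
import math
from typing import List


def rnn_final_state(inputs: List[float], w_x: float, w_h: float,
                    b: float = 0.0) -> float:
    """Run a scalar RNN over the sequence and return the last hidden state.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0.
    """
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h


# Two hypothetical sequences that end identically but differ at the start
seq_a = [1.0, 0.0, 0.5]
seq_b = [-1.0, 0.0, 0.5]

# With a recurrent weight, the earlier input still shapes the final state
with_memory = (rnn_final_state(seq_a, w_x=1.0, w_h=0.8),
               rnn_final_state(seq_b, w_x=1.0, w_h=0.8))

# With the recurrent weight set to zero, only the last input matters
without_memory = (rnn_final_state(seq_a, w_x=1.0, w_h=0.0),
                  rnn_final_state(seq_b, w_x=1.0, w_h=0.0))
```

The two sequences produce different final states only when the recurrent weight is nonzero, which is the sense in which prior inputs influence the current output.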

3. Discussion

Ultrasound imaging has become one of the primary technologies for analyzing thyroid nodules because it is safe, cost-effective, noninvasive, and easily accessible. However, interpreting ultrasound images is challenging, and the interpretation can vary with radiologists’ prior medical knowledge and observational skills. Therefore, there is a significant need for automated, reliable, and objective technologies for interpreting ultrasound images. Progress in deep learning in recent years has revolutionized various areas of machine learning, such as computer vision and image processing. Although AI-based CAD systems are evolving rapidly, none of them has been widely adopted, and there are still conflicting issues. There is a significant need for AI-based CAD systems with better designs and practicality that provide consistent nodule management solutions in practice [65, 66]. In this paper, we reviewed recent studies that deployed deep learning-based algorithms for analyzing medical images of thyroid nodules. The literature demonstrates that although CAD systems provide sensitivity similar to that of experienced radiologists, they still cannot reach the specificity and accuracy of experts [67]. Therefore, one option to consider is to combine the specificity and accuracy of radiologists with the sensitivity of CAD systems and to use these systems as assistants for less experienced operators at primary care centers [7, 10–12]. Accordingly, it is necessary to apply deep learning approaches and develop models with high accuracy, specificity, and sensitivity [68, 69]. Future research should scrutinize the effectiveness of these methods and techniques. Moreover, more effective image preprocessing techniques need to be developed, as preprocessing can significantly alter the performance of deep learning models. 
Other challenges that need to be addressed in future research include coping with data limitations, creating valid public datasets, and developing standard evaluation measures. Furthermore, deep learning approaches should be applied to multimodal images, including B-mode, Doppler, contrast-enhanced ultrasound, and shear wave elastography (SWE), to get a complete picture of the lesions. Thyroid nodule diagnosis accuracy can be improved by registering, training on, and evaluating multimodal images of thyroid nodules. In addition, the lack of standard metrics for evaluating the performance of the suggested methods makes it difficult to compare their outcomes. Based on recent publications, it can be concluded that among all deep learning techniques, CNNs have been widely applied to thyroid cancer diagnosis, and their results yielded high sensitivity, specificity, and accuracy. Other deep learning methods have not been applied widely, however, and there are not enough papers to make a comparison between methods reasonable. The second most used deep learning method for diagnosing thyroid nodules is GANs. Their high rates of sensitivity, specificity, and accuracy indicate that applying this method to multimodal images can result in models with better performance. Other popular deep learning approaches, such as RNN, DBN, and LSTM, have not been used widely, and more research is needed to establish their accuracy.

4. Conclusion

As mentioned before, thyroid cancer begins when cells divide rapidly and spread uncontrollably into surrounding tissues. Early detection of cancerous nodules is therefore essential for managing the disease effectively and for reducing the number of deaths. AI-based CAD systems for processing thyroid images have developed very quickly over the last decades, and thyroid nodule treatment will improve if these technologies are thoroughly verified. This paper gives a comprehensive review of deep learning applications in assessing thyroid nodules. The overall conclusion of this study is that thyroid tumor classification and analysis would benefit considerably from the latest enhancements of deep learning approaches and from new systematic deep learning techniques with high specificity, sensitivity, and accuracy. Compared with the investigations that applied deep learning approaches to the detection of other cancers, such as breast cancer and brain cancer, more investigations are needed to develop systems with high accuracy. Despite the empirical strengths and successes of previous deep learning algorithms and methods in the assessment of thyroid ultrasound images, many deep learning methods still need to be applied to ultrasound images so that their performance can be investigated. Currently, the number of public datasets for thyroid cancer imaging is insufficient. Therefore, the development of reliable and accessible datasets and the creation of uniform assessment metrics are issues that need to be covered in future research. According to the specificity, sensitivity, and accuracy rates of previously proposed approaches, it can be concluded that CNN is by far the most popular deep learning method for thyroid cancer diagnosis. According to Table 1, VGG16 is the technique that has been widely used for thyroid nodule classification. 
Moreover, GANs, RNNs, and LSTM methods have been utilized in some research. However, the number of published papers is insufficient, and more investigations are required. Developing better preprocessing approaches is also necessary for improving the performance of deep learning models.

Conflicts of Interest

All authors declare that they have no conflicts of interest.