Abstract

This study analyzed the application of a deep learning-based diagnostic model to the evaluation of thyroid contrast-enhanced ultrasound images, to provide a reference for differentiating benign from malignant thyroid masses. An ultrasound image diagnosis model based on the long short-term memory (LSTM) neural network, C-LSTM, was proposed. It was compared with diagnostic methods based on the support vector machine (SVM) and manual features (MF) and was applied to the diagnosis of thyroid contrast-enhanced ultrasound images. The sensitivity, specificity, and accuracy of the C-LSTM model were significantly higher than those of the SVM- and MF-based methods, and its number of parameters and amount of calculation were significantly lower. The sensitivity, specificity, and accuracy of the C-LSTM model were also significantly higher than those of the C-LSTM-0 model, while its number of parameters and amount of calculation were significantly lower. Benign masses showed the no-enhancement, early no-enhancement, and low-enhancement modes on contrast-enhanced ultrasound more often than malignant masses, whereas the high-enhancement mode was significantly less frequent in benign than in malignant masses. The areas under the curve (AUC) of the rise time (RT) ratio, time to peak (TTP) ratio, and mean transit time (mTT) ratio for diagnosing malignant masses were large, at 0.856, 0.794, and 0.761, respectively. The RT ratio, TTP ratio, and mTT ratio had high diagnostic sensitivity and specificity for malignant masses, whereas RT, TTP, and mTT had low diagnostic sensitivity and specificity. In summary, contrast-enhanced ultrasound images analyzed with the deep learning C-LSTM model can effectively improve the diagnosis of benign and malignant thyroid masses, and the image feature parameters RT ratio, TTP ratio, and mTT ratio were efficient in diagnosing benign and malignant thyroid masses.

1. Introduction

A thyroid nodule is a mass formed by the abnormal proliferation of thyroid cells. It moves up and down with the thyroid during swallowing and is one of the most common thyroid diseases [1, 2]. Most thyroid nodules are benign, and most patients have no symptoms, so the nodules are often found incidentally during a medical examination. The cause of thyroid nodules is not clear [3, 4]. People with a history of radiation exposure are more likely to develop thyroid nodules, and nodules become more common with age. Thyroid nodules also show familial heritability: if the parents have thyroid nodules, the risk in their children is increased [5].

Clinical examination methods for thyroid nodules mainly include thyroid radionuclide scanning, thyroid computed tomography (CT) or magnetic resonance imaging (MRI), fine needle aspiration cytology, and thyroid ultrasound [6, 7]. On radionuclide scanning, nodules are classified as “hot nodules” or “cold nodules” according to their ability to absorb radionuclides. “Hot nodules” are functionally autonomous thyroid nodules that are almost always benign, whereas “cold nodules” may be cancerous. Fine needle aspiration cytological examination can reduce unnecessary thyroid surgery and improve the detection rate of intraoperative malignant tumors, with a diagnostic accuracy of 70%–90%; however, its accuracy depends on experience in puncture and cytological diagnosis [8–10]. CT and MRI can display the patient’s tissues with high clarity and can clarify the invasion of important structures, such as the cervical blood vessels, trachea, and esophagus, which is important for guiding the operation [11]. Ultrasound can show nodules as solid, cystic, or mixed lesions. Single solid nodules have a high probability of malignancy. Mixed nodules are also likely to be malignant, while pure cystic nodules are less likely to be malignant. About 10% of nodules with calcification may be malignant, and nodules with sandy calcification are the most likely to be malignant [12].

Contrast-enhanced ultrasound, also known as contrast acoustics, is an image enhancement technology developed from conventional ultrasound. The contrast agent is injected intravenously during ultrasonic scanning. The scattering signal of the contrast agent enhances the echo, so the perfusion of the tissue microvessels can be observed dynamically in real time. As the echo of the contrast agent in the blood differs from the signal of the surrounding tissues, the resolution, sensitivity, and specificity of ultrasonic diagnosis are greatly improved [13, 14]. Locating the region of interest (ROI) is a very important task in contrast-enhanced ultrasound-assisted diagnosis. Once the ROI is located, sonographers and clinicians can quickly carry out a series of analyses on it to support the subsequent diagnosis [15]. General supervised target location algorithms require the doctor to mark the true location of the ROI in the image, which is labor-intensive. In recent years, with the development of deep learning, medical imaging and artificial intelligence have become more closely combined, and research on deep learning models has become more extensive. Therefore, a method based on the long short-term memory (LSTM) neural network was considered in this research, and image-level labels were used to locate the features of contrast-enhanced ultrasound images.

To sum up, deep learning technology has been applied more and more widely in the field of medical imaging; however, there are few studies on its role in thyroid contrast-enhanced ultrasound images. Therefore, the LSTM-based diagnosis model C-LSTM was proposed in this study. It was compared with the diagnostic methods based on the support vector machine (SVM) and manual features (MF) and was applied to the diagnosis of patients’ thyroid contrast-enhanced ultrasound images. Then, the diagnostic performance of different contrast-enhanced ultrasound image features for benign and malignant thyroid masses was compared, to comprehensively evaluate the feasibility of applying deep learning-based contrast-enhanced ultrasound images in the clinical diagnosis of thyroid masses.

2. Materials and Methods

2.1. Research Objects

A total of 84 patients who were admitted to the hospital for examination because of thyroid diseases from February 5, 2019, to March 10, 2021, were selected as research subjects, including 36 males and 48 females. All patients were examined by contrast-enhanced ultrasound. This study had been approved by the medical ethics committee of the hospital. Patients and their families had been informed of this study and had signed the informed consent.

Inclusion criteria: (i) patients with complete basic data; (ii) patients who signed the informed consent; (iii) patients confirmed by needle biopsy or surgical pathology; (iv) patients who had not received treatment. Exclusion criteria: (i) pregnant women; (ii) patients with congenital heart disease with right-to-left shunt; (iii) patients with acute infarction; (iv) patients with severe pulmonary hypertension; (v) patients with moderate chronic obstructive pulmonary disease; (vi) patients allergic to contrast agents.

2.2. Contrast-Enhanced Ultrasound

An ultrasound scanner with a dedicated superficial contrast linear array probe was used to scan the patients, at a frequency of 5–10 MHz and a center frequency of 5 MHz. The thyroid was first scanned with two-dimensional gray-scale ultrasound before contrast imaging. The location, size, shape, boundary, internal echo, transformation indicator, and liquefaction of the lesion, and its relationship with the surrounding adjacent tissue and cervical lymph nodes, were observed. Then, the blood flow was observed by color Doppler and pulsed Doppler. Conventional ultrasound data were collected, and thyroid contrast-enhanced ultrasound was performed. The patient lay supine, and venous access was established through the cubital vein. For contrast-enhanced ultrasound, the contrast agent Sonovi, from an Italian company, was used, and the dry powder was diluted with normal saline. For each injection, the suspension was drawn up and injected through the cubital vein, and the line was then flushed with normal saline.

Image analysis was performed using contrast image quantitative analysis software from a German company. The analysis data included the relative peak intensity of the lesion area and the rise time (RT), time to peak (TTP), and mean transit time (mTT) of the lesion area and of the surrounding normal tissue. The maximum peak intensity of normal thyroid tissue was defined as 100%. In addition, the ratios of RT, TTP, and mTT between the lesion area and the surrounding normal thyroid tissue were calculated to assess their value in differentiating benign from malignant thyroid masses.
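The ratio parameters described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the class name `PerfusionParams`, the function `relative_to_normal`, and all numeric values are hypothetical and do not come from the study or from the analysis software.

```python
from dataclasses import dataclass


@dataclass
class PerfusionParams:
    """Time-intensity curve parameters for one region (times in seconds)."""
    peak_intensity: float  # arbitrary intensity units
    rt: float              # rise time
    ttp: float             # time to peak
    mtt: float             # mean transit time


def relative_to_normal(lesion: PerfusionParams, normal: PerfusionParams) -> dict:
    """Express lesion parameters relative to surrounding normal thyroid tissue."""
    return {
        # The peak intensity of normal tissue is defined as 100%.
        "relative_peak_pct": 100.0 * lesion.peak_intensity / normal.peak_intensity,
        "rt_ratio": lesion.rt / normal.rt,
        "ttp_ratio": lesion.ttp / normal.ttp,
        "mtt_ratio": lesion.mtt / normal.mtt,
    }


# Hypothetical values, for illustration only (not study data).
lesion = PerfusionParams(peak_intensity=30.0, rt=12.0, ttp=24.0, mtt=45.0)
normal = PerfusionParams(peak_intensity=40.0, rt=8.0, ttp=20.0, mtt=30.0)
ratios = relative_to_normal(lesion, normal)
print(ratios)
```

Dividing by the patient's own normal tissue removes part of the between-patient variation in injection timing and circulation, which is one plausible reason the ratios could discriminate better than the raw times.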

2.3. Ultrasound Image Lesion Diagnosis Algorithm Based on Deep Learning

Deep learning, especially convolutional neural networks, has achieved very good classification results on two-dimensional images. However, to perform the classification and diagnosis tasks of contrast-enhanced ultrasound better and more accurately, the entire video image needs to be used. The task of video image classification poses unique challenges for deep learning models because, in addition to the spatial features of two-dimensional images, videos also have additional temporal features. A video contains many frames of two-dimensional images, each frame is meaningful, and the order is also very important. If the order information is ignored, the classification effect may be greatly reduced.

The LSTM recurrent neural network [16] is a special recurrent neural network. An ordinary recurrent neural network handles short-term dependence well but cannot handle long-term dependence, and LSTM was designed to solve this problem. LSTM mainly uses gating units to store information outside the regular flow of the recurrent neural network. With these gating units, the network can manipulate information in a variety of ways, including storing information in cells and reading information from them. The cells decide individually what to do with the information and execute these decisions by opening or closing the gates. The chain structure of LSTM allows it to retain information over a long period of time, which makes it possible to solve tasks that are difficult or impossible for traditional recurrent neural networks.

LSTM mainly includes a forget gate, an input gate, and an output gate. The forget gate deletes information that is no longer needed for the task. It is expressed as follows:

$$f_i = \alpha\left(W_f \cdot [h_{i-1}, x_i]\right) \tag{1}$$

In equation (1), $f_i$ represents the forget gate at the $i$th time point, $x_i$ represents the input at the $i$th time point, $h_{i-1}$ represents the hidden layer state at the $(i-1)$th time point, $W_f$ represents the weight of the forget gate, and $\alpha$ represents the sigmoid function.

The input gate is responsible for adding information to the cell to update the cell state. It is expressed as follows:

$$I_i = \alpha\left(W_I \cdot [h_{i-1}, x_i]\right) \tag{2}$$

In equation (2), $I_i$ represents the input gate at the $i$th time point, and $W_I$ represents the weight of the input gate. The cell update state value is as follows:

$$\tilde{C}_i = \tanh\left(W_C \cdot [h_{i-1}, x_i]\right) \tag{3}$$

In equation (3), $\tilde{C}_i$ represents the cell update state value at the $i$th time point, and $W_C$ represents the cell update state weight. The updated cell state is the previous cell state after passing the forget gate plus the update state value after screening by the input gate, and it is expressed as follows:

$$C_i = f_i \odot C_{i-1} + I_i \odot \tilde{C}_i \tag{4}$$

In equation (4), $C_i$ represents the cell state at the $i$th time point, and $\odot$ denotes element-wise multiplication. The output gate selects and outputs the necessary information. It is expressed as follows:

$$o_i = \alpha\left(W_o \cdot [h_{i-1}, x_i]\right) \tag{5}$$

In equation (5), $o_i$ represents the output gate at the $i$th time point, and $W_o$ represents the weight of the output gate. The current hidden state is expressed as follows:

$$h_i = o_i \odot \tanh(C_i) \tag{6}$$

Thus, a complete LSTM unit is obtained (Figure 1). Because the cell state in LSTM is updated by addition, the vanishing of the back-propagated gradient is greatly alleviated.
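The gate computations described above can be sketched in NumPy. This is a minimal sketch rather than the authors' implementation: the weight names, the omission of bias terms, the use of tanh for the candidate cell state, and the dimensions are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_i, h_prev, c_prev, W_f, W_in, W_c, W_o):
    """One LSTM time step; each weight acts on the concatenation [h_prev, x_i]."""
    z = np.concatenate([h_prev, x_i])
    f_i = sigmoid(W_f @ z)                 # forget gate
    in_i = sigmoid(W_in @ z)               # input gate
    c_tilde = np.tanh(W_c @ z)             # candidate (update) cell state
    c_i = f_i * c_prev + in_i * c_tilde    # additive cell-state update
    o_i = sigmoid(W_o @ z)                 # output gate
    h_i = o_i * np.tanh(c_i)               # new hidden state
    return h_i, c_i


rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3               # hypothetical sizes
weights = [rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
           for _ in range(4)]

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # a sequence of 5 time points
    h, c = lstm_step(x, h, c, *weights)
print(h.shape, c.shape)
```

The cell state is carried forward by a gated sum rather than by repeated matrix multiplication, which is the additive path that helps gradients survive over many time steps.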

Convolutional neural networks extract spatial features well, while LSTM extracts temporal features well. The two are combined to classify the images and thereby diagnose contrast-enhanced ultrasound images. Firstly, k sequential frames of the contrast video are collected. The convolutional neural network is employed to extract the spatial features of each frame, i.e., the one-dimensional vector of the fully connected layer is taken as the abstract feature of the frame, which represents the frame to some extent. Then, the features of the k frames extracted by the convolutional neural network are input into the LSTM in sequence. The output of the last LSTM unit is used for classification. Figure 2 is a schematic diagram of C-LSTM.

A pretrained model is employed as the feature extractor, and feature extraction is performed on each frame of the image. The extracted feature vectors are saved and then used as the input of the LSTM for training. Using pretrained networks for feature extraction is already a very mature practice in transfer learning, and it works very well. In addition, only the LSTM network needs to be trained, so the number of parameters is small and training is fast. Therefore, this method is adopted for experiment and analysis.
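The pipeline described above (per-frame features from a fixed extractor, fed in order to an LSTM whose last output is classified) can be sketched as follows. This is an illustrative sketch only: `extract_features` is a random projection standing in for the pretrained CNN, the video is synthetic, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K_FRAMES, H, W = 8, 16, 16          # hypothetical clip length and frame size
FEAT_DIM, HIDDEN_DIM = 32, 16       # hypothetical feature and LSTM sizes

# Stand-in for a frozen, pretrained CNN: a fixed random projection with ReLU.
W_cnn = rng.normal(scale=0.05, size=(FEAT_DIM, H * W))


def extract_features(frame):
    """Map one 2-D frame to a 1-D feature vector (placeholder for the CNN)."""
    return np.maximum(W_cnn @ frame.ravel(), 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_hidden(features, W_gates):
    """Run an LSTM over per-frame features in order; return the last hidden state."""
    W_f, W_in, W_c, W_o = W_gates
    h = np.zeros(HIDDEN_DIM)
    c = np.zeros(HIDDEN_DIM)
    for x in features:                       # frame order is preserved
        z = np.concatenate([h, x])
        c = sigmoid(W_f @ z) * c + sigmoid(W_in @ z) * np.tanh(W_c @ z)
        h = sigmoid(W_o @ z) * np.tanh(c)
    return h


# Only these parameters would be trained; the feature extractor stays fixed.
W_gates = [rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM + FEAT_DIM))
           for _ in range(4)]
w_cls = rng.normal(scale=0.1, size=HIDDEN_DIM)

video = rng.random((K_FRAMES, H, W))                        # synthetic clip, not patient data
features = np.stack([extract_features(f) for f in video])   # cached once, shape (K, FEAT_DIM)
p_malignant = sigmoid(w_cls @ lstm_last_hidden(features, W_gates))
print(features.shape, float(p_malignant))
```

Because the features are computed once and cached, each training iteration touches only the LSTM and classifier weights, which is consistent with the small parameter count and fast training noted above.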

In addition, the error rate during training increases as the number of layers increases. Therefore, the residual network (ResNet) module (Figure 3) is introduced to form the structure of the CNN. Adding an identity shortcut connection to the structure eases the flow of information, i.e., the data can bypass the normal convolutional layers and connect directly to the subsequent layers.

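The function that the original network needs to fit is denoted as S(x), and the residual that the module needs to learn is denoted as G(x). Then, the relationship between the two is expressed as follows:

$$S(x) = G(x) + x \tag{7}$$

Here, x is the input of the module, which is passed through the identity shortcut connection.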

With the residual module, adding new layers does not harm the performance of the model and can improve it slightly. These residual modules are stacked together to form a very deep network. The identity shortcut connection also makes it very easy for each block to learn the mapping of the original function, so additional residual modules can be stacked without compromising performance on the training set. An 18-layer network structure is constructed in this research.
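The residual relationship described above, in which the block outputs its learned residual plus its own input, can be illustrated with a small NumPy sketch. The two-layer form of the residual branch, the function name `residual_block`, and the weights are assumptions for illustration; the actual module uses convolutional layers.

```python
import numpy as np


def residual_block(x, W1, W2):
    """Compute S(x) = G(x) + x, with G a small two-layer transform."""
    g = W2 @ np.maximum(W1 @ x, 0.0)   # residual branch G(x)
    return g + x                        # identity shortcut adds the input back


dim = 4
x = np.arange(1.0, dim + 1.0)

# With zero weights the residual branch outputs 0, so the block is the identity.
zeros = np.zeros((dim, dim))
assert np.allclose(residual_block(x, zeros, zeros), x)

# Stacking blocks: each one only has to learn a correction to its input.
rng = np.random.default_rng(0)
out = x
for _ in range(3):
    W1 = rng.normal(scale=0.01, size=(dim, dim))
    W2 = rng.normal(scale=0.01, size=(dim, dim))
    out = residual_block(out, W1, W2)
print(out)
```

The zero-weight case shows why extra blocks need not hurt: a block that learns nothing simply passes its input through unchanged.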

2.4. Performance Evaluation Indicators

The diagnosis method based on SVM [17], the diagnosis method based on MF [18], and the proposed C-LSTM diagnosis method were compared and analyzed. Sensitivity (SE), specificity (SP), accuracy (AC), floating-point operations (FLOPs), and parameter quantity (PQ) were used as performance evaluation indicators. The first three are expressed as follows:

$$SE = \frac{TP}{TP + FN} \tag{8}$$

$$SP = \frac{TN}{TN + FP} \tag{9}$$

$$AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

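TP is true positive, TN is true negative, FP is false positive, and FN is false negative. For a convolutional layer, with bias terms omitted, PQ and FLOPs are expressed as follows:

$$PQ = s^2 \cdot l_{in} \cdot l_{out} \tag{11}$$

$$FLOPs = h \cdot b \cdot s^2 \cdot l_{in} \cdot l_{out} \tag{12}$$

Here, $s$ represents the size of the convolution kernel, $l_{in}$ represents the number of channels in the current feature map, $l_{out}$ represents the number of channels in the next layer of feature maps, $h$ represents the height of the feature map, and $b$ represents the width of the feature map.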
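The indicators above can be computed as in the following sketch. It makes several assumptions: the counts are hypothetical, the convolution cost uses one common convention (square kernel, no bias, one multiply-accumulate counted per weight per output position), and the function names are illustrative only.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "AC": (tp + tn) / (tp + tn + fp + fn),
    }


def conv_layer_cost(s, l_in, l_out, h, b):
    """Parameter count and multiply-accumulate count for one conv layer.

    Assumes a square s x s kernel, no bias, and an h x b output feature map.
    """
    pq = s * s * l_in * l_out
    flops = pq * h * b
    return pq, flops


# Hypothetical counts, for illustration only (not the study's results).
m = diagnostic_metrics(tp=45, tn=30, fp=5, fn=5)
pq, flops = conv_layer_cost(s=3, l_in=64, l_out=64, h=56, b=56)
print(m, pq, flops)
```

Other conventions count each multiply-accumulate as two operations or include bias terms, so reported FLOPs are only comparable between models counted in the same way.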

2.5. Statistical Methods

SPSS 19.0 was employed for data statistics and analysis. Measurement data were expressed as mean ± standard deviation ($\bar{x} \pm s$), and count data were expressed as percentages (%). One-way analysis of variance was used for pairwise comparison. Differences were considered statistically significant when the P value fell below the significance threshold.
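As an illustration of the statistics named above, the following sketch computes the mean ± standard deviation and the F statistic of a one-way analysis of variance using only the Python standard library. The data are hypothetical, and the function name `one_way_anova_f` is introduced here for illustration. Converting F to a P value requires the F distribution, which a statistics package such as SPSS or `scipy.stats.f_oneway` provides.

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across several groups of measurements."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical accuracy values for three methods (not the study's data).
groups = [
    [0.91, 0.93, 0.92, 0.94],
    [0.84, 0.86, 0.85, 0.83],
    [0.80, 0.82, 0.79, 0.81],
]
for g in groups:
    print(f"{mean(g):.3f} ± {stdev(g):.3f}")
print(one_way_anova_f(groups))
```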

3. Results

3.1. Algorithm Performance Analysis

Figure 4 shows the comparison of the sensitivity, specificity, and accuracy of the three diagnostic methods. The sensitivity, specificity, and accuracy of the C-LSTM model were higher than those of the SVM and MF diagnostic methods, and the differences were statistically significant.

Figure 5 shows the comparison of the parameter quantity and calculations of the three diagnostic methods. The parameter quantity and calculations of the C-LSTM model were lower than those of the SVM and MF diagnostic methods, and the differences were statistically significant.

To analyze the role of the residual module added in this research, the performance of the model with and without the module was compared, and the results are shown in Figure 6. The sensitivity, specificity, and accuracy of the C-LSTM model were higher than those of the C-LSTM-0 model, and the differences were statistically significant. The parameter quantity and calculations of the C-LSTM model were lower than those of the C-LSTM-0 model, and the differences were statistically significant.

3.2. Image Data of Some Patient Samples

Figure 7 shows the imaging data of a 29-year-old male patient. Conventional ultrasound showed asymmetric thyroid enlargement, and the boundary between part of the capsule of the right thyroid lobe and the muscles and soft tissues in front of the neck was unclear. A hypoechoic area with an irregular shape and a blurred boundary was seen in the right gland, and it appeared to contain scattered, slightly hyperechoic points. Color ultrasound showed abundant internal blood flow signals and an irregular hypoechoic area in the left gland.

Figure 8 shows the image data of a 40-year-old female patient. Conventional ultrasound showed thickening of the thyroid isthmus, and the bilateral thyroid echoes were uneven. There was a mixed echo in the right lobe of the thyroid. The boundary was clear, the shape was regular, and the internal echo was uneven. Color ultrasound showed a small number of short rod-shaped blood flow signals around and inside the lesion.

3.3. Contrast-Enhanced Ultrasound Modes of Benign and Malignant Thyroid Masses

Among the 84 patients selected in this research, there were 86 lesions, of which 52 were malignant and 34 were benign (Figure 9(a)). As shown in Figure 9(b), the no-enhancement, early no-enhancement, and low-enhancement modes on contrast-enhanced ultrasound were more frequent in benign masses than in malignant masses, and the differences were statistically significant. The high-enhancement mode was less frequent in benign masses than in malignant masses, and the difference was statistically significant. There was no statistically significant difference in the number of equal-enhancement patterns between the benign and malignant masses.

3.4. The Diagnostic Performance of Contrast-Enhanced Ultrasound Image Characteristic Parameters for Malignant Masses

Figure 10 shows the diagnostic performance analysis results of contrast-enhanced ultrasound image characteristic parameters for malignant masses. RT ratio, TTP ratio, and mTT ratio had relatively larger diagnostic AUCs for malignant tumors, which were 0.856, 0.794, and 0.761, respectively. However, the diagnostic AUCs of RT, TTP, and mTT for malignant masses were relatively small, which were 0.644, 0.607, and 0.638, respectively.
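For readers unfamiliar with how an AUC of this kind is obtained, the following sketch computes it as the probability that a randomly chosen malignant mass has a higher parameter value than a randomly chosen benign mass. The values are hypothetical, the function name `auc_from_scores` is illustrative, and the direction (higher value suggesting malignancy) is an assumption made for the example, not a finding of the study.

```python
def auc_from_scores(malignant, benign):
    """AUC as the probability that a malignant case scores above a benign one.

    Ties count as one half (Mann-Whitney U formulation).
    """
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical RT-ratio values, for illustration only (not study data).
malignant_rt_ratio = [1.4, 1.3, 1.6, 1.1, 1.5]
benign_rt_ratio = [0.9, 1.0, 1.2, 0.8]
print(auc_from_scores(malignant_rt_ratio, benign_rt_ratio))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the reported values.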

Further quantitative comparisons of the sensitivity and specificity of each contrast-enhanced ultrasound image characteristic parameter for the diagnosis of malignant masses are presented in Figure 11. RT ratio, TTP ratio, and mTT ratio were of high sensitivity and specificity for the diagnosis of malignant tumors, while RT, TTP, and mTT were of low sensitivity and specificity for the diagnosis of malignant tumors.

4. Discussion

At present, contrast-enhanced ultrasound has been reported most often, and is most mature, in the identification of benign and malignant liver tumors, where it has been widely used in clinical practice; however, there is relatively little research on thyroid diseases [19–21]. As the thyroid has only an arterial blood supply, unlike the liver with its dual arterial and portal blood supply, the contrast enhancement behaves differently. In addition, the frequency of the superficial ultrasound probe does not match the resonance frequency of the ultrasound contrast agent, which degrades the performance of contrast imaging in thyroid scans [22, 23]. Therefore, a deep learning-based ultrasound image lesion diagnosis model, C-LSTM, was proposed in this work, and the diagnosis methods based on SVM and MF were introduced for comparison. It was found that the sensitivity, specificity, and accuracy of the C-LSTM model were higher than those of the SVM and MF diagnostic methods, and the differences were statistically significant. This was similar to the research results of Slough et al. (2019) [24], indicating that the C-LSTM model designed in this work can perform the task of diagnosing benign and malignant thyroid masses with contrast-enhanced ultrasound images. The parameter quantity and the calculations of the C-LSTM model were lower than those of the SVM and MF diagnostic methods, and the differences were statistically significant, which showed that the C-LSTM model can greatly reduce the computational workload and improve the computational efficiency while ensuring the accuracy of the diagnosis [25]. To analyze the role of the residual module added in this research, the performance of the model with and without the module was compared. The sensitivity, specificity, and accuracy of the C-LSTM model were higher than those of the C-LSTM-0 model, and the differences were statistically significant. The parameter quantity and the calculations of the C-LSTM model were lower than those of the C-LSTM-0 model, and the differences were statistically significant, which suggested that adding a residual module can greatly improve the diagnostic effect of the model.

The no-enhancement, early no-enhancement, and low-enhancement modes on contrast-enhanced ultrasound were more frequent in benign tumors than in malignant tumors, while the high-enhancement mode was significantly less frequent in benign tumors than in malignant tumors. This was similar to the study of Bailey and Wallwork [26], indicating that the imaging modes of benign and malignant masses were mainly equal enhancement and high enhancement. Then, the diagnostic performance of the contrast-enhanced ultrasound image characteristic parameters for malignant masses was analyzed. It was found that the AUCs of the RT ratio, TTP ratio, and mTT ratio for the diagnosis of malignant masses were relatively large, at 0.856, 0.794, and 0.761, respectively. However, the diagnostic AUCs of RT, TTP, and mTT for malignant masses were relatively small, at 0.644, 0.607, and 0.638, respectively. This meant that the RT ratio, TTP ratio, and mTT ratio had relatively better application value in differentiating benign from malignant thyroid masses. Further quantitative comparisons of the sensitivity and specificity of each contrast-enhanced ultrasound image characteristic parameter for the diagnosis of malignant masses were performed. It was found that the RT ratio, TTP ratio, and mTT ratio had high sensitivity and specificity for the diagnosis of malignant tumors, while RT, TTP, and mTT had low sensitivity and specificity [27]. Taken together, these results indicate that contrast-enhanced ultrasound images analyzed with the deep learning C-LSTM model can effectively improve the diagnosis of benign and malignant thyroid masses. The image feature parameters RT ratio, TTP ratio, and mTT ratio were efficient in diagnosing benign and malignant thyroid masses.

5. Conclusion

In this study, a deep learning-based diagnosis model, C-LSTM, was proposed, compared with the SVM- and MF-based diagnosis methods, and applied to the diagnosis of thyroid contrast-enhanced ultrasound images. The results showed that contrast-enhanced ultrasound images analyzed with the deep learning C-LSTM model could effectively improve the diagnosis of benign and malignant thyroid masses. Moreover, the image characteristic parameters RT ratio, TTP ratio, and mTT ratio were highly efficient in the diagnosis of benign and malignant thyroid masses. However, since the contrast-enhanced ultrasound data lack true ROI location labels, only qualitative comparative analysis and lateral quantitative analysis could be performed. In future work, the true location annotation of the ROI needs to be obtained for accurate quantitative analysis and comparison, so that the algorithm can be further optimized to achieve better diagnosis results. In conclusion, the results of this study support the clinical diagnosis of benign and malignant thyroid masses.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Ping Xu and Zusheng Du contributed equally to this work.