Abstract

Objective. This study investigates the potential of an artificial intelligence (AI) methodology, the radial basis function (RBF) artificial neural network (ANN), in the evaluation of thyroid lesions. Study Design. The study was performed on 447 patients who had both cytological and histological evaluation in agreement. Cytological specimens were prepared using liquid-based cytology, and the histological result was based on subsequent surgical samples. Each specimen was digitized, and nuclear morphology features were measured on these images using an image analysis system. The extracted measurements (41,324 nuclei) were separated into two sets: the training set, used to create the RBF ANN, and the test set, used to evaluate its performance. The system aimed to predict the histological status as benign or malignant. Results. In the training set, the RBF ANN had a sensitivity of 82.5%, a specificity of 94.6%, and an overall accuracy of 90.3%, while in the test set, these indices were 81.4%, 90.0%, and 86.9%, respectively. When an algorithm based on the RBF ANN output was used to classify patients, the overall sensitivity was 95.0% and the specificity was 95.5%, and no statistically significant difference was observed between the training and test sets. Conclusion. AI techniques, and especially ANNs, have been studied extensively only in recent years. The proposed approach is promising for avoiding misdiagnoses and assisting everyday cytopathology practice. The major drawback of this approach is the lack of an automated procedure to accurately detect and measure cell nuclei in the digitized images.

1. Introduction

Cytopathology, a medical discipline born in the 20th century, was founded by George Papanicolaou in 1928 [1] and became very popular due to the world-renowned Papanicolaou (Pap) test [2, 3]. Although cervical cancer represents the vast majority of cytological diagnoses worldwide, it is not the only disease that cytopathology deals with. Even in the early days of cytopathology [4–6], it was used for the investigation of the thyroid gland and body fluids. One of the main advantages of thyroid cytopathology is its noninvasive or minimally invasive nature; that is, the biological material is obtained from patients by painless methods (for example, cells are aspirated using a fine needle).

Cytopathology of the thyroid gland is a well-documented method, extensively used for the preoperative diagnosis of thyroid nodules [7, 8]. The Bethesda System for Reporting Thyroid Cytopathology (TBS [7]) is a well-established system for the evaluation of thyroid lesions [9–11]; it links each diagnostic category to a risk of malignancy, guiding clinicians towards patient management. TBS guides cytopathologists in the classification of thyroid lesions via established criteria.

The diagnostic accuracy of thyroid fine-needle aspiration (FNA) has been reported by many studies as having high sensitivity (80%–90.8% [12, 13]) and specificity (60%–100% [14–16]). Although TBS is already widely used in various countries, it has its own limitations [17–19], especially due to gray diagnostic zones. Thus, false-positive or false-negative results can occur not only because of poorly cellular smears but also because of misinterpretation or inadequate representation of the morphological characteristics of the smear, especially in the case of follicular neoplasms. Admittedly, the experience of the individual cytopathologist, both with the particular organ and in general diagnostic practice, is an important factor in reaching a proper diagnosis and is based mainly on extensive training.

During the last decade, information technology and artificial intelligence (AI) have enabled the creation of computer-aided systems supporting diagnosis as well as decisions on therapy and patient management. Among the numerous machine learning methodologies aimed at such medical tasks are artificial neural networks (ANNs) [20–32], more classical approaches such as discriminant analysis [20, 33, 34] and classification and regression trees [35–38], genetic algorithms [39], and, in the last decade, deep learning [40–42]. Such techniques are not new in the diagnostic cytopathology field [43], since they have already been employed in diagnostic tasks for numerous organs, such as the stomach [33, 44, 45], breast [46–49], urinary system [50–52], cervix [53–57], and thyroid [58–61], among others.

Given that a gray zone exists in the thyroid cytopathology classification system, in this study we investigate the potential of a rarely used ANN, namely, the radial basis function (RBF) network [51], for the classification of thyroid specimens based on cytomorphological characteristics.

2. Materials and Methods

The study was performed on FNA specimens that were followed up by histological evaluation of thyroidectomy specimens (performed at the 2nd Department of Pathology, National and Kapodistrian University of Athens, Medical School, “Attikon” University Hospital). The study was carried out on cytological slides collected from 2012 to 2016 and conformed to the principles of the Helsinki Declaration. In addition, it was approved by the Ethics Review Board of “Attikon” University Hospital, and the requirement for a signed consent form was waived, since the study was based on archived material and had no invasive or treatment effects on the patients.

We analyzed liquid-based cytology (LBC) specimens from 447 patients; from each specimen, various cell types were isolated and characterized. Table 1 shows the confusion matrix (histological diagnosis vs. cells measured in the cytological slides). All cases were selected serially from the database, and cases without histological confirmation were excluded from the study. In total, the lesion was benign in 288 cases (64.4%) and malignant in the remaining 159 cases (35.6%). We followed the approach proposed by Margari et al. in 2019 [62], especially for the image analysis and the subsequent construction of the ANN system.

2.1. Liquid-Based Cytology (LBC)

Liquid-based cytology (LBC) was used because of its advantages over conventional cytopathology, i.e., fewer air-drying artifacts, less obscuring of the cells by blood and inflammation, distribution of the cells in a single layer over a smaller area of the slide than in conventional smears (allowing rapid screening), well-preserved nuclear details, and the possibility of obtaining additional slides and/or biological material for molecular tests and immunocytochemical staining. Aspirates were immersed and rinsed in a vial containing CytoLyt® (fixative solution). Subsequently, a single slide was prepared using the ThinPrep® technique, as already described [63]. Cytology diagnoses were formulated according to TBS [7, 9–11].

2.2. Architecture of the AI System

A system with the architecture depicted in Figure 1 was created to classify each individual patient as benign or malignant using measurements from the cytological slides. Two different technologies were used for this application: (a) image analysis, which involves the selection of cell nuclei from the digitized images and their subsequent measurement, yielding a series of measurements characteristic of each cell nucleus, and (b) artificial intelligence, in which each individual nucleus is classified as benign or malignant by the RBF ANN, producing a list of classified nuclei for each cytological slide. This list is subsequently fed to a second subsystem (the case classifier), which assigns individual patients to the benign or malignant group. The case classifier can be based on a majority logic methodology applied either to the number of nuclei in each group or to their percentage [58, 64, 65].
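
The hand-off between the two subsystems can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: it assumes that the RBF ANN output is available as hypothetical `(slide_id, label)` pairs with labels `"benign"` or `"malignant"`, and it summarizes each slide into the count and percentage of benign nuclei on which a case classifier can then operate.

```python
from collections import defaultdict


def summarize_slides(nucleus_predictions):
    """Group per-nucleus RBF ANN labels by slide (hypothetical input format).

    nucleus_predictions: iterable of (slide_id, label) pairs, where label is
    "benign" or "malignant"; any other label raises a KeyError.
    Returns {slide_id: {"n_benign", "n_total", "pct_benign"}}.
    """
    counts = defaultdict(lambda: {"benign": 0, "malignant": 0})
    for slide_id, label in nucleus_predictions:
        counts[slide_id][label] += 1

    summary = {}
    for slide_id, c in counts.items():
        total = c["benign"] + c["malignant"]
        summary[slide_id] = {
            "n_benign": c["benign"],                   # input to the numeric classifier
            "n_total": total,
            "pct_benign": 100.0 * c["benign"] / total,  # input to the percentages classifier
        }
    return summary
```

Because a single slide is prepared per patient, each entry of the returned dictionary corresponds to one patient.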

2.3. Image Analysis and Nucleus Morphometry

A manual selection process was applied in order to select representative nuclei from every cytological slide. This manual selection takes between 10 and 20 minutes per slide and consists of defining the nucleus borders with the mouse; note that a single cytological slide is prepared for each patient. Subsequently, every nucleus was measured by the computer within milliseconds (this is a batch process that operates on all identified nuclei at once). Detailed information on the extracted nuclear morphometric features is summarized in Table 2. The measurement algorithms have already been reported in the literature [58, 66–68], and descriptive characteristics have been reported in our previous article [65].

For image analysis purposes, a computer equipped with a frame grabber and a digital camera (SONY DFW-X700, Sony Corporation, Tokyo, Japan) was used. The camera was attached to a microscope (Leica Microsystems GmbH, Wetzlar, Germany) via a C-mount adaptor and connected with the appropriate cables to the frame grabber installed in the computer. The images were captured using a 40x objective and digitized by the frame grabber at 1024 × 768 pixels, with 8 bits for each color component (red, green, and blue), i.e., a 24-bit color depth.

PathSight version 4.3 (Medical Solutions PLC, UK) was used to capture the images, and Image-Pro Plus version 4.5 (Media Cybernetics, Inc., Bethesda, MD, USA) was used for the isolation of nuclei (segmentation) and their subsequent measurement (morphometry). Image-Pro Plus was also used for background correction, i.e., to remove noise introduced by the microscope optics, such as dust particles in the light path, and for lighting calibration, ensuring that all images were captured under similar conditions and thereby improving the reproducibility of results and quality control.
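
Image-Pro Plus performs the background correction internally, and its exact algorithm is not described here. As an illustration of the general idea only, a common flat-field correction divides each raw image by a reference image of an empty, evenly illuminated field, so that uneven lighting and fixed dust shadows cancel out. The sketch below assumes NumPy arrays and hypothetical inputs `raw`, `flat` (blank-field reference), and an optional `dark` (camera offset) frame.

```python
import numpy as np


def flat_field_correct(raw, flat, dark=None, eps=1e-6):
    """Flat-field correction (illustrative; not Image-Pro Plus's own routine).

    raw  : image of the specimen
    flat : image of an empty, illuminated field taken with the same settings
    dark : optional image taken with the light path blocked (camera offset)
    Works for grayscale (H, W) or color (H, W, 3) arrays.
    """
    raw = np.asarray(raw, dtype=float)
    flat = np.asarray(flat, dtype=float)
    dark = np.zeros_like(raw) if dark is None else np.asarray(dark, dtype=float)
    # Per-pixel gain of the optical path; clipped to avoid division by zero
    gain = np.maximum(flat - dark, eps)
    # Rescale so the output keeps the mean brightness of the reference
    # (a scalar for grayscale, one value per channel for color images)
    mean_gain = gain.mean(axis=(0, 1))
    return (raw - dark) / gain * mean_gain
```

A uniformly stained region imaged under uneven illumination comes out with uniform intensity after this correction, which is what makes measurements taken in different parts of the field comparable.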

The measurements obtained can be categorized into two types: (a) geometric and (b) densitometric [32, 64, 65, 67, 69–75]. In general, geometric and densitometric features can be considered unrelated. Geometric features are based on the nucleus boundary and describe nucleus shape characteristics. Carcinogenesis disrupts the nucleoskeleton and the cytoskeleton; therefore, nuclei become deformed, and this is reflected in the geometric characteristics. The densitometric characteristics are extracted from the pixel values and their spatial distribution within the nucleus boundary. In detail, the nucleus boundary is represented as a polygon, and the coordinates of the vertices of this polygon are used to extract the geometric characteristics; for example, the nucleus area is calculated as the sum of the areas of the triangles that compose the polygon surrounding the nucleus. More details on the calculation algorithms can be found in the image analysis literature [69, 72]. A characteristic image is presented in Figure 2.
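
The triangle-sum rule for the nucleus area can be made concrete with a short sketch. Assuming the boundary is given as an ordered list of hypothetical `(x, y)` vertex coordinates in pixels, the area below is the sum of signed triangle areas fanned out from the first vertex (equivalent to the shoelace formula), which remains correct for concave outlines. The perimeter and a circularity shape factor, 4πA/P², are included only as examples of geometric features; the actual feature set and definitions are those listed in Table 2.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices, in pixels^2.

    The polygon is split into triangles fanned out from the first vertex;
    summing their signed areas also handles concave outlines, because the
    parts of triangles lying outside the shape cancel out.
    """
    x0, y0 = vertices[0]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices[1:-1], vertices[2:]):
        # Twice the signed area of triangle (v0, v1, v2) via the cross product
        twice_area += (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths, closing the polygon back to the first vertex."""
    closed = list(vertices[1:]) + [vertices[0]]
    return sum(math.dist(a, b) for a, b in zip(vertices, closed))


def circularity(vertices):
    """Example shape factor 4*pi*A/P^2: 1 for a circle, lower for irregular outlines."""
    p = polygon_perimeter(vertices)
    return 4.0 * math.pi * polygon_area(vertices) / (p * p)
```

For a unit square, `polygon_area` returns 1.0, and for an L-shaped outline made of three unit squares it returns 3.0, confirming that the signed fan handles a concave vertex.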

2.4. Measured Cytological Structures, Training and Test Sets

The total number of selected cell nuclei (and colloid structures) was 41,324 (Table 3); they were selected either from cell groups or as isolated structures in the field of view.

About 50% (N = 224) of patients were randomly selected to form the training set; thus, the measured cytological structures from these cases were used to train the AI system. The remaining cases formed the test set and were used to evaluate the performance of the system on unknown data and therefore validate the results.

Since the dataset is not balanced between the classes (288 benign and 159 malignant cases, i.e., 64% and 36%, respectively), the random selection of 50% of the cases to form the training set eventually led to a representative selection respecting this imbalanced distribution. Indeed, the training set was composed of 145 benign and 79 malignant cases (65% and 35%) and the test set of 143 benign and 80 malignant cases (64% and 36%, respectively). Note that during the training stage of the ANN, it is important to respect the data distribution across the classes to avoid biasing the learning towards one class.
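
The key design point is that the split is made at the patient level, so that all nuclei of one patient fall into the same set and no patient contributes to both training and testing. A minimal sketch, assuming a hypothetical mapping from patient identifiers to histological labels and a fixed random seed (the seed and function names are illustrative, not taken from the study), is:

```python
import random


def split_patients(patient_labels, train_fraction=0.5, seed=0):
    """Randomly assign whole patients (not individual nuclei) to train/test.

    patient_labels: dict mapping a patient id to "benign" or "malignant".
    Returns (train_ids, test_ids); every nucleus then follows its patient,
    so no patient appears in both sets.
    """
    ids = sorted(patient_labels)          # deterministic starting order
    random.Random(seed).shuffle(ids)      # reproducible shuffle
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def class_shares(ids, patient_labels):
    """Fraction of benign and malignant patients in a subset."""
    n_malignant = sum(patient_labels[i] == "malignant" for i in ids)
    return {"benign": (len(ids) - n_malignant) / len(ids),
            "malignant": n_malignant / len(ids)}
```

With 447 patients and `train_fraction=0.5`, Python's `round(223.5)` gives 224 training and 223 test patients, matching the set sizes reported above; `class_shares` can then be used to check how closely each set reproduces the 64%/36% ratio. A stratified split (sampling separately within each class) would guarantee that ratio rather than relying on chance.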

Nowadays, the most popular splits of the data into training and test sets are 70%–30% or even 80%–20%, following the Pareto principle [76]. In this study, we preferred to use 50% of the data for training and the remaining 50% for testing for several reasons: (a) we have already successfully applied this approach in numerous other cytology-related classification problems, (b) a large amount of data was available, especially at the nucleus classification level; thus, we expected that 50% of the available data would suffice for the data variance to be “learned”, (c) less data in the training set reduces the probability of overfitting (i.e., the ANN learns the training set extremely well but fails to perform well in the test set), and (d) we preferred to have training and test sets of almost equal size in order to compare the performance in these two sets and therefore assess the system robustness.

2.5. The Radial Basis Function Artificial Neural Network

Artificial neural networks are mathematical models mimicking the structure of the human brain. They are capable of learning and subsequently recalling patterns [43, 77, 78]; thus, they are well suited to learning nuclear patterns, as represented through the measurements, and subsequently assigning the nuclei to individual categories (benign or malignant in this study).

A radial basis function (RBF) network uses radial basis functions as activation functions and has strictly three layers: (a) an input layer that passes all the measurements, after weighting, to every node of the subsequent layer (the hidden layer), (b) the hidden layer, which implements a series of nonlinear RBF activation functions responsible for creating clusters of similar data, and (c) the linear output layer, which computes a linear combination of the hidden-layer RBF outputs (these depend on the inputs and the neuron parameters) and creates the ANN output (i.e., the classification result). Although RBF ANNs have existed for more than 30 years (first presented in 1988 [79]) and have many uses, such as function approximation, time series prediction, classification, and control, they have not been used extensively in medical applications. However, they possess some advantages compared with classical architectures such as multilayer perceptrons: (a) faster training, (b) the role of the nodes in the hidden layer can be interpreted, and (c) the number of nodes in the hidden layer (RBF nodes) is adjusted from the data.
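
To make the three-layer structure concrete, the sketch below implements a small RBF network with NumPy. It is a simplified stand-in, not the in-house MATLAB software used in this study: the Gaussian width, the number of centers, and the choice of centers as a random subset of the training samples are assumptions (the network described above instead adjusts the number of hidden nodes from the data). Only the output layer is fitted, by linear least squares, which is one reason RBF networks train quickly. Labels are assumed to be coded as 0 (benign) and 1 (malignant).

```python
import numpy as np


def gaussian_features(X, centers, width):
    """Hidden layer: Gaussian activation of every sample for every center."""
    # Squared Euclidean distances, shape (n_samples, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


class RBFNetwork:
    """Minimal RBF classifier (illustrative sketch, hypothetical parameters)."""

    def __init__(self, n_centers=10, width=1.0, seed=0):
        self.n_centers = n_centers
        self.width = width
        self.seed = seed

    def _design(self, X):
        # Hidden-layer outputs plus a constant column acting as the output bias
        H = gaussian_features(X, self.centers, self.width)
        return np.hstack([H, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Assumption: centers are a random subset of the training samples
        k = min(self.n_centers, X.shape[0])
        self.centers = X[rng.choice(X.shape[0], size=k, replace=False)]
        # Linear output layer: weights fitted by least squares
        self.weights, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def decision_function(self, X):
        return self._design(np.asarray(X, dtype=float)) @ self.weights

    def predict(self, X):
        # 1 = malignant, 0 = benign (cut at the midpoint of the 0/1 targets)
        return (self.decision_function(X) >= 0.5).astype(int)
```

In practice, the nuclear measurements would be standardized before training, since the Euclidean distances computed in the hidden layer are sensitive to the scale of each feature.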

2.6. Classification for the Patients

Cell nuclei classification by the RBF ANN cannot by itself assign patients as having benign or malignant thyroid disease. Thus, an additional subsystem was incorporated according to a technique already reported in the literature [58, 64, 65]. Specifically, two different approaches were used, hereafter called the numeric and the percentages classifiers; these assign a case as benign if the number or the percentage, respectively, of its nuclei classified as benign by the RBF ANN is above a certain threshold, and otherwise as malignant. In order to find such a threshold, we evaluated a broad range of thresholds starting from 1 (or 1%) and increasing up to 100 (or 100%) with an increment step of 1 (or 0.1%). For every value, the percentage of correctly classified cases was calculated, as well as the sensitivity and specificity. The threshold that produced a balanced result between sensitivity and specificity (i.e., minimized their difference) was selected as the most suitable. In order to avoid bias, only the classified nuclei of the training set were used to find these thresholds.
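
The threshold search can be sketched as follows. The function names are hypothetical; the rule (a case is benign when its value exceeds the threshold), the threshold grid, and the selection criterion (smallest difference between sensitivity and specificity) follow the description above, with malignant treated as the positive class. Breaking ties by overall accuracy is an assumption, since the text does not say how equally balanced thresholds were resolved.

```python
def sweep_thresholds(benign_values, is_malignant, thresholds):
    """Evaluate the rule 'benign if value > threshold, else malignant'.

    benign_values: per-case number (or percentage) of nuclei the RBF ANN
                   classified as benign.
    is_malignant:  per-case histological truth (True = malignant = positive).
    Returns a list of (threshold, sensitivity, specificity, accuracy).
    """
    n_pos = sum(is_malignant)
    n_neg = len(is_malignant) - n_pos
    results = []
    for t in thresholds:
        predicted_malignant = [v <= t for v in benign_values]
        tp = sum(p and m for p, m in zip(predicted_malignant, is_malignant))
        tn = sum(not p and not m for p, m in zip(predicted_malignant, is_malignant))
        results.append((t, tp / n_pos, tn / n_neg, (tp + tn) / len(is_malignant)))
    return results


def most_balanced(results):
    """Smallest |sensitivity - specificity|; ties broken by higher accuracy,
    then by the lowest threshold, because min() keeps the first minimum."""
    return min(results, key=lambda r: (abs(r[1] - r[2]), -r[3]))
```

For the numeric classifier the grid would be `range(1, 101)`; for the percentages classifier, `[i / 10 for i in range(10, 1001)]` covers 1% to 100% in 0.1% steps. Only training-set cases would be passed in, as stated above.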

2.7. Tools and Techniques

The RBF ANN for nuclei classification and the algorithms for the determination of the optimum thresholds were implemented with in-house software developed for the MATLAB environment (The MathWorks, Inc., Natick, Massachusetts, USA). MATLAB was also used to calculate the performance indicators for the training and test sets and for all data combined. The statistical measures used were specificity, sensitivity, positive and negative predictive values (PPV and NPV), false-positive and false-negative rates (FPR and FNR), overall accuracy (OA), and odds ratio. A list of the performance indices, along with their mathematical formulas and a short description, has been reported previously [80].
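
The indices listed above have standard definitions in terms of the confusion matrix, which the sketch below computes with malignant as the positive class. The exact formulas used in this study are those given in [80]; here the odds ratio is assumed to be the diagnostic odds ratio, (TP × TN)/(FP × FN), which is undefined when either error count is zero. FPR and FNR are the complements of specificity and sensitivity, respectively, which is why the corresponding differences in Section 3.1 coincide.

```python
def performance_indices(tp, fn, tn, fp):
    """Standard diagnostic indices from confusion-matrix counts.

    tp/fn: malignant cases (or nuclei) predicted malignant/benign.
    tn/fp: benign cases (or nuclei) predicted benign/malignant.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "fpr": fp / (fp + tn),               # = 1 - specificity
        "fnr": fn / (fn + tp),               # = 1 - sensitivity
        "overall_accuracy": (tp + tn) / (tp + fn + tn + fp),
        # Assumption: diagnostic odds ratio; raises ZeroDivisionError if fp or fn is 0
        "odds_ratio": (tp * tn) / (fp * fn),
    }
```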

3. Results

3.1. Results of Cell Nuclei Classification

The performance of the RBF ANN was evaluated for the training set, the test set, and the complete data set (training and test sets combined). The results are presented in Table 4.

As expected, the performance (Table 5) was better in the training set. Comparison of proportions (z-test) revealed a statistically significant difference in (a) sensitivity (difference: 1.14%, 95% CI: 0.40%–1.89%), (b) specificity (difference: 4.58%, 95% CI: 4.07%–5.10%), (c) positive predictive value (difference: 7.74%, 95% CI: 7.06%–8.42%), (d) negative predictive value (difference: 0.87%, 95% CI: 0.29%–1.45%), (e) false-positive rate (difference: 4.58%, 95% CI: 4.07%–5.10%), (f) false-negative rate (difference: 1.14%, 95% CI: 0.40%–1.89%), and (g) overall accuracy (difference: 3.32%, 95% CI: 2.71%–3.94%). Note that due to the large number of nuclei, even small differences in the percentages can lead to statistically significant differences.
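
The comparison of proportions can be reproduced with a two-sample z-test. The sketch below is an assumption about the exact variant, since the text does not specify it: it uses the pooled proportion for the test statistic and an unpooled (Wald) standard error for the 95% CI of the difference, and it computes the two-sided p value from the normal distribution via `math.erfc`. The counts passed in are hypothetical.

```python
import math


def two_proportion_test(x1, n1, x2, n2, z_crit=1.96):
    """Compare proportions x1/n1 and x2/n2 (e.g., training vs. test sensitivity).

    Returns (difference, (ci_low, ci_high), z, two_sided_p).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Test statistic under H0 (equal proportions): pooled standard error
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    # Confidence interval for the difference: unpooled (Wald) standard error
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z_crit * se_diff, diff + z_crit * se_diff), z, p_value
```

For example, `two_proportion_test(90, 100, 80, 100)` gives a difference of 0.10 with p just below 0.05, whereas the same two proportions computed over 20,000 nuclei per set would give a vanishingly small p value, because the standard error shrinks with the square root of the sample size.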

3.2. Results of Patient Classification

As mentioned, two approaches were used for the classification of patients, the numeric classifier and the percentages classifier; the results of these two methodologies are presented in Table 6, and the relevant performance indices in Table 7. Specifically, the threshold that produced the most balanced results between sensitivity and specificity for the numeric classifier was 37 nuclei, i.e., if more than 37 nuclei were classified by the RBF ANN as benign, the case (i.e., the patient) was classified as benign, and otherwise as malignant. Similarly, the threshold for the percentages classifier was 51%, i.e., if more than 51% of the case nuclei were classified by the RBF ANN as benign, the sample (the patient) was considered benign, and otherwise malignant.
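
The two decision rules with the thresholds reported above can be written out directly. The function names are hypothetical; the strict "more than" comparison follows the wording of the text.

```python
NUMERIC_THRESHOLD = 37      # nuclei, as reported above
PERCENT_THRESHOLD = 51.0    # percent, as reported above


def numeric_classifier(n_benign_nuclei):
    """Benign if more than 37 nuclei were classified benign by the RBF ANN."""
    return "benign" if n_benign_nuclei > NUMERIC_THRESHOLD else "malignant"


def percentages_classifier(n_benign_nuclei, n_total_nuclei):
    """Benign if more than 51% of the case's nuclei were classified benign."""
    pct_benign = 100.0 * n_benign_nuclei / n_total_nuclei
    return "benign" if pct_benign > PERCENT_THRESHOLD else "malignant"
```

The two rules can disagree on the same case: a slide with 40 benign nuclei out of 100 is benign under the numeric rule (40 > 37) but malignant under the percentages rule (40% does not exceed 51%), since the numeric rule ignores how many nuclei were measured in total.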

Notably, the percentages classifier had better indices for the three most important metrics: sensitivity, specificity, and overall accuracy. In general, these indicators were lower in the test set. In order to test the stability of the approach, we compared (z-test) the sensitivity, specificity, and overall accuracy between the training and test sets; for the numeric classifier, the differences were 1.16% (95% CI: −4.25% to 6.63%), 0.77% (95% CI: −4.0% to 5.57%), and 0.92% (95% CI: −4.1% to 5.98%), respectively, none of which was statistically significant. Similarly, for the percentages classifier, no statistically significant difference was observed. Thus, both approaches can be considered stable. This is also reflected by the comparison of the areas under the receiver operating characteristic (ROC) curves between the training and test sets (Figure 3). Specifically, for the numeric classifier, the area under the curve (AUC) for the training and test sets was 97.7% (95% CI: 95.7%–99.7%) and 96.9% (95% CI: 94.8%–99.1%), respectively, and no statistically significant difference was found. Similarly, for the percentages classifier, the AUC for the training and test sets was 98.1% (95% CI: 96.2%–100%) and 98.1% (95% CI: 96.6%–99.7%), respectively, and no statistically significant difference was found. Moreover, we compared the same three performance indices between the numeric and the percentages classifiers (considering the training set, the test set, and all data combined); again, none of the comparisons showed a statistically significant difference. In summary, both case classifiers can be considered stable between the training and test sets and of similar performance.
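
The AUC values can be understood through their rank interpretation: the AUC equals the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case (the Mann–Whitney statistic), with ties counted as one half. A minimal sketch, assuming each case is scored by a hypothetical value such as the percentage of its nuclei classified as malignant, is shown below. It computes the AUC only; the statistical test used to compare the training and test AUCs is not specified in the text and is not reproduced here.

```python
def auc_mann_whitney(scores_malignant, scores_benign):
    """AUC as P(score of a malignant case > score of a benign case), ties = 0.5.

    A direct O(n*m) pairwise count, adequate for a few hundred cases.
    """
    wins = 0.0
    for s_pos in scores_malignant:
        for s_neg in scores_benign:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))
```

An AUC of 1.0 means every malignant case outscores every benign case, and 0.5 corresponds to chance-level ranking.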

4. Discussion

The history of artificial intelligence in thyroid cytopathology is worth examining, so that the evolution of the various efforts can be highlighted. To the authors’ knowledge, ANNs were first applied to thyroid disease in 1993 [81], and the first application in cytopathology appeared a few years later, in 1996 [27]. Specifically, that study used the backpropagation training algorithm to train a feedforward three-layer ANN (one input, one hidden, and one output layer), which discriminated benign from malignant thyroid nuclei according to nuclear morphometry. The number of patients in this article was rather small (51). At the nucleus classification level, an overall accuracy of 90.6% was achieved, and the classification of individual patients had an overall accuracy of 98%. Three years later, in 1999 [61], four variations of the LVQ classifier (namely, LVQ1, LVQ2.1, LVQ3, and the optimized OLVQ1) were tested in 100 patients. A different approach was used: the mean value and standard deviation of the nuclear morphometry features were computed for each patient, so that each patient was represented by statistical measures of its cell nuclei (in contrast, in the present approach the ANN is applied to individual nuclei). These LVQ variations correctly classified 97.7% of the patients as benign or malignant, but no important results were reported for finer classification into histological subgroups. In 2006 [31], Cochand-Priollet et al. reported on the application of four different classification methods after nuclear morphometry, followed by statistical preselection to identify the significant features (notably, only four image morphometry features differed and were considered important for subsequent classification).
Four classifiers were compared: (1) a linear classifier, (2) a two-layer feedforward ANN, (3) a combination of two-layer feedforward ANNs generated by the AdaBoost method, and (4) the k-nearest neighbor classifier (a method with many similarities to LVQ). The results of the classifiers were between 83% and 94%, with the linear classifier having the worst performance (65%) in patient discrimination. This showed that ANNs can exploit their nonlinear nature to obtain better classification results than typical statistical approaches such as linear models.

Along the same lines, in 2007, Shapiro et al. [82] studied 197 thyroid follicular tumors (adenomas and carcinomas); various types and designs of ANNs were tested using nuclear morphometry (i.e., area, perimeter, and shape factor) and nuclear density or texture features (mean value and standard deviation of gray levels). In a similar approach, the ANNs were applied to the mean values of the nuclear measurements; thus, each patient was represented by a single vector of values. According to the results, the diagnostic accuracy in detecting follicular tumors was 97%, and the accuracy of the ANNs in discriminating adenomas from carcinomas was 87%. This research team reported that the application of ANNs raised the sensitivity of the cytological diagnosis of follicular tumors to 90%, whereas at that time, the usual cytopathology approach had a sensitivity of around 60%.

The problem of indeterminate results in thyroid cytopathology was first addressed in 2004 [83], when a feedforward ANN was trained and tested in a large group of patients (N = 453) using not only cytopathology results but also clinical data. Patients were classified as being at high or low risk for malignancy, and the authors reported that only the cytological parameters contributed to this classification. That study also reported no difference between the training and test set results; thus, the method was not only relevant to the gray zone of thyroid cytopathology but could also be considered robust (note that a robustness test is usually not reported in such studies).

In 2006, two important articles [84, 85] presented several novelties: (a) a two-layer ANN was employed, with each input of the layer assigned to one training image, (b) the classification was based on image frequency bands (i.e., application of the two-dimensional Fourier transform); thus, neither complex morphometry operations nor, most interestingly, cell identification (an extremely difficult task) were required, and (c) the proposed system classified the cases as follicular carcinomas, follicular adenomas, or unknown; thus, the ANN was capable of handling cases that were impossible to discriminate. Notably, this approach introduced a considerably narrower gray zone than that of cytology.

The first combinatorial approach was presented by Daskalakis et al. [86] in 2008, who applied a system composed of multiple classifiers to discriminate benign from malignant thyroid nodules. The team used an ensemble of classifiers and applied various combination rules to the classifiers involved. Similarly to the majority of the efforts, this study used nuclear morphological features. The classification accuracy of the ensemble was about 95.7%, while the best single classifier had an accuracy of 89.6%. Therefore, a new combinatorial methodology for thyroid cytopathology and ANNs was introduced, with the potential for better accuracy than a single ANN. Similarly, in 2011 [58], a combinatorial approach of two ANNs based on LVQ and monolayer smears was reported, with one ANN for nuclei classification and a cascaded second ANN for patient classification. The study had a relatively large number of patients (N = 335). These two combined ANNs had an overall accuracy of 94%. The study concluded that the diagnostic accuracy of thyroid FNA can be improved by the use of ANNs. Of particular interest were the results for follicular neoplasms suspicious for malignancy and for Hürthle cell tumors.

There are also approaches based on histological sections. For example, in 2014 [87], Ozolek et al. presented a method for distinguishing follicular thyroid lesions that used the optimal transport-based linear embedding to segment cell nuclei [88]. The classification results were almost perfect, and the classification was based on isolated nuclei using a supervised method [89].

Finally, in recent years, ANNs have again come to the forefront, with more efforts reported, for example, to distinguish follicular adenomas from follicular carcinomas [90] or papillary carcinomas [91]. Interestingly, whole slide imaging applications have started to appear, and deep learning approaches have been introduced [92, 93].

Since there is no standardization of the reported results (for example, some reports mention sensitivity or specificity and others overall accuracy), it is extremely difficult to compare the various approaches. Moreover, the classification unit differs: in some reports, the discrimination is on patients, while in others, it is on images or even at the cell nucleus level. Finally, the classification domain can be benign vs. malignant lesions or, in other reports, follicular adenomas vs. follicular carcinomas. For completeness, Table 8 presents the various approaches and the applied classification techniques, along with the classification units, domains, and performance.

This study has an important novelty: it represents the first approach in which the RBF ANN, a rarely applied ANN in the field of medicine, is used to discriminate benign from malignant thyroid patients on the basis of image morphometry in monolayer cytological slides. The results indicated that the proposed system was robust, with comparable performance between the training and test sets for both patient classification approaches, while its performance was within the range of the performance indicators reported by other studies so far.

The major advantages and novelty of this approach can be summarized as follows: (a) the method has increased objectivity, since it is based on measurable features, (b) slide preparation using the single-layer approach and staining devices contributes towards this objectivity, (c) the diagnostic accuracy seems increased compared with cytological examination alone; however, this is a subjective measure, since it depends on the laboratory performing the cytological examinations, and (d) the decision mechanism is computer-based and does not depend on human factors; note, however, that specialized cytopathologists are required to define the nucleus borders for the image measurement process.

Further efforts are needed towards the automation of the procedure, since only a small number of nuclei (about 100 per patient) are used both in this study and in other studies. The application of whole slide imaging with simultaneous detection of a higher number of nuclei would be of interest. Moreover, automated cell nuclei identification and measurement are promising directions for further research. This approach highlights a new combined methodology for thyroid cytopathology that has the potential to evolve towards greater accuracy and automation.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval

This study was approved by the Bioethics Committee of “Attikon” University Hospital; the decision number is 7/7/11-7-2012. The study was carried out on cytological slides collected from 2012 to 2016 and conformed to the principles of the Helsinki Declaration.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Christos Fragopoulos, Abraham Pouliakis, and Christos Meristoudis equally contributed to this work.