Abstract

This study investigates the relationship between the detection performance of an artificial intelligence (AI) algorithm and pathology in chest computed tomography (CT) images. A new pulmonary nodule (PN) detection algorithm was designed and developed on the basis of the three-dimensional (3D) connected domain algorithm. An appropriate grayscale threshold was selected, the CT images were converted into black-and-white images, and the useless images were removed. The remaining lung images were then stacked into a 3D black-and-white pixel matrix. Labeling statistics were carried out so that the size, property, and location of each PN could be measured and determined. Cases of PNs examined by chest multislice spiral CT were retrospectively collected in a self-built database, and 150 cases were randomly selected with SPSS 22.0. The images were processed with the algorithm, the detected PNs were compared with those detected by radiologists, and the detection results were counted. There were 560 benign and malignant PNs, 312 malignant, and 248 benign. The algorithm detected 498 PNs, of which 478 were detected accurately, for a sensitivity of 95.98%. The radiologists detected 424 PNs, of which 364 were accurate, for a sensitivity of 85.85%. Compared with the radiologists, the algorithm detected solid nodules and ground glass nodules more accurately. Its detection of nodules of the pleural connection type, peripheral type, central type, and hilar type was also more accurate, and the differences were statistically significant. The malignancy, size, property, and location of different nodules could be accurately determined from CT images with this algorithm. The algorithm provides important support for the pathological research of lung cancer and allows the future development of PNs in patients to be predicted more accurately.

1. Introduction

Lung cancer is a relatively common malignant tumor with the highest morbidity and mortality year in, year out [1]. Although smoking is the major risk factor for 80%–90% of all lung cancer diagnoses [2], the incidence in young people and nonsmokers is also on the rise in recent years. Such a situation poses a serious threat to the health of the Chinese people [3, 4]. According to the characteristic morphology of cancer cells, the main pathological types of lung cancer are as follows: adenocarcinoma, squamous cell carcinoma (referred to as SCC), small cell undifferentiated carcinoma (referred to as small cell carcinoma), large cell undifferentiated carcinoma (referred to as large cell carcinoma), bronchiolar−alveolar carcinoma (referred to as alveolar carcinoma), and adenosquamous carcinoma. Each type of lung cancer has its typical computed tomography (CT) manifestations [5]. Pulmonary nodule (PN) is the most important early imaging manifestation of lung cancer. It is generally considered to be a general term for round or irregular high-density shadows ≤30 mm on the CT images of the lungs. The shadow over 30 mm is called a mass [6]. In fact, PN is not the name of a disease, but an imaging sign of lung lesions. The imaging features of PNs are one of the important methods to determine their pathological conditions.

PNs of different densities carry different probabilities of malignancy. According to their densities, PNs can be classified into ground glass nodule (GGN), solid nodule (SN), and part-solid nodule (PSN) [7]. Ranked from high to low, the probability of malignancy is PSN, GGN, and SN [8]. GGNs are hazy nodules in the lungs whose density is slightly higher than that of the surrounding lung parenchyma, but the outlines of the blood vessels and bronchi can still be observed inside them [9]. SNs have a relatively uniform soft-tissue density, and the internal blood vessels and bronchi are masked by the soft-tissue density in the images [10]. PSNs have an uneven density and contain both ground glass and solid soft-tissue components [11]. In the clinical diagnosis and treatment of lung cancer, CT occupies an important position and assists in clinical diagnosis, staging, and necessary differential diagnosis. Compared with traditional chest X-ray examination, CT has many advantages. It can separate structures that overlap on X-ray images, better distinguish lesions, and clarify the relationship between the tumor and the surrounding tissues, organs, large blood vessels, and mediastinum. It can also evaluate the regional lymph nodes well, so it has an irreplaceable role. At present, spiral CT is still an important means of lung cancer screening.

Computer-aided detection/diagnosis is an auxiliary diagnostic system capable of preliminary screening and marking of lesions on a large number of CT images. It includes computer-aided detection and computer-aided diagnosis [12]. With the advancement of science and technology and the arrival of the era of big data, computer processing capabilities and algorithms have improved significantly. Artificial intelligence (AI) continues to develop, in particular through research on algorithms such as artificial neural networks, machine learning, and deep learning (DL). AI is therefore of considerable medical significance and continues to influence the progress of medical imaging [13, 14]. In image processing, a DL system learns valuable features from massive training sets by building a machine learning model with a large number of hidden layers, thereby improving the accuracy of classification or diagnosis [15]. DL can learn abstract and deep image features directly from raw image data. Therefore, DL technology has been widely used in computer-aided PN detection on chest CT. A convolutional neural network is a DL algorithm with a multilayer feed-forward structure that can use supervised and/or unsupervised learning to learn a variety of features. Often used in image recognition and processing, it has the significant advantage that it does not require a separate feature extraction step, but instead learns and distinguishes features directly from the data. In the detection of PNs, a convolutional neural network can automatically select the optimal image features directly from the image, so that more PN features, higher accuracy, and better robustness can be obtained [16, 17].

A connected region generally refers to an image region composed of adjacent foreground pixels with the same pixel value. Connected region analysis refers to finding and marking each connected region in an image, which is a relatively basic method in many application fields of image analysis and processing. It can be used in any application scenario where foreground objects are extracted for subsequent processing.

In this study, a novel pulmonary nodule detection algorithm was designed and developed based on the 3D-connected domain algorithm and was applied to the collected CT images of 150 patients with pulmonary nodules for labeling statistics. The size, nature, and location of pulmonary nodules were measured and compared with those detected by radiologists. The results were tested statistically, with the aim of supporting pathological studies of lung cancer.

2. Experimental Methods

2.1. CT Image Sources

Since this was a retrospective study, the requirement for informed consent was waived, and the patient information was de-identified. PNs examined by chest multislice spiral CT from January 2015 to December 2021 were collected in a self-built database. SPSS 22.0 was applied for random sampling, and 150 cases were selected. This study was approved by the ethics committee of the hospital. The inclusion criteria were as follows: the lung cancer patients underwent at least one CT examination before the diagnosis, in which the same PN had a diameter of ≤3 cm; all PNs were confirmed by pathological results from fiberoptic bronchoscopy, surgery, or percutaneous lung biopsy, with an accurate histopathological diagnosis; all cases underwent a plain chest CT scan before surgery or needle biopsy; the clinical data of the patients were complete; and the images were clear and showed the lesions clearly. The exclusion criteria were as follows: the same patient had multiple PNs that could not be distinguished by location and size; the patient had a history of other malignant tumors or had distant metastasis; the patient suffered from diffuse lung diseases such as tuberculosis or pneumonia; or there were severe artifacts in the images.

2.2. The Main Biopsy Methods of Lung Cancer Histopathology
2.2.1. Bronchoscopic Biopsy

Before the procedure, the patient should undergo various tests, such as routine blood tests and evaluation of coagulation function, pulmonary function, and cardiac function, to clarify the condition of the lungs and airways. The slender bronchoscope was introduced into the patient’s airway through the oral or nasal cavity, so that the airway could be observed directly under the fiberoptic bronchoscope. Secretions or tissues were taken from the suspicious lesions by brushing and lavage, fixed, and sent to the pathology department. After the pathology department received the specimen, the tissue was sampled, dehydrated, cleared, infiltrated with wax, embedded, sectioned, stained, and mounted to make sections, which a pathologist examined under a microscope. Depending on the morphological changes of the lesions, immunohistochemical staining was performed if necessary to assist in the diagnosis, and finally a pathological report was issued based on the staining results.

2.2.2. Percutaneous Lung Puncture Biopsy

Various examinations should be completed before puncture, and physicians should actively communicate with patients to ease their fears and gain their cooperation. An appropriate body position keeps the patient still and facilitates puncture, thereby shortening the operation time. During the puncture, the patient should hold their breath at the same amplitude of stable breathing as far as possible, to avoid pleural damage caused by breathing movement and deviation in the positioning of small nodules. The puncture path should target the lesion at the level closest to the chest wall and avoid the ribs, scapula, and other important organs. An 18G puncture needle was used. To avoid puncturing the necrotic area of a mass lesion, the material should be taken at the edge of the lesion, especially in the area with obvious enhancement on contrast-enhanced scans. When the lesion was combined with atelectasis, the two should be distinguished. Finally, the puncture specimens were fixed with 10% formaldehyde and sent for examination promptly.

2.3. Construction of the Algorithm

A DICOM file reader was used to obtain the detailed technical parameters of the CT images and to convert the pixel grayscale values into Hounsfield units (HU). The grayscale CT images were then converted into black-and-white images according to the set gray threshold (−550 HU). The black-and-white image slices were stacked in scanning order to form a black-and-white 3D matrix. In this 3D matrix, lung tissue that was originally connected remained a connected pixel region, and originally isolated PNs remained isolated pixel regions. With this feature, the isolated 3D-connected domains could be separated and counted by mathematical methods. The volume of each isolated 3D-connected domain was then calculated using the technical parameters of the CT scan and converted into a diameter, which was taken as the size of the PN. Given that this was a preliminary PN screening, the size threshold for PNs in this algorithm was set to 2 mm.

The algorithm was implemented as follows. The grayscale values of the CT images were converted to the unified unit HU, as given in Equation (1).

$$\text{HU} = \text{pixel value} \times \text{rescale slope} + \text{rescale intercept} \tag{1}$$

The pixel value in the equation was the pixel grayscale value of the corresponding point in the corresponding slice of the CT image. The rescaling intercept was the value of b in the relational expression between the stored value (SV) and the output units, $\text{output units} = m \times \text{SV} + b$. The rescaling slope was the value of m in the same expression.
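The HU conversion and the −550 HU thresholding described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation. The function names, the toy slice, and the slope and intercept values are hypothetical. Treating values strictly above the threshold as white is also an assumption, since the text does not say which side of the threshold maps to white.

```python
def to_hounsfield(stored_values, slope, intercept):
    """Apply the linear rescale: HU = slope * stored value + intercept."""
    return [[slope * sv + intercept for sv in row] for row in stored_values]


def binarize(hu_slice, threshold_hu=-550):
    """Return 1 (white) where HU is above the threshold, else 0 (black).

    Which side becomes white, and the strict comparison, are assumptions.
    """
    return [[1 if hu > threshold_hu else 0 for hu in row] for row in hu_slice]


# Hypothetical 2 x 3 slice of stored values; slope 1 and intercept -1024
# are illustrative values, not parameters reported in the study.
raw = [[24, 200, 1064], [474, 475, 0]]
hu = to_hounsfield(raw, slope=1, intercept=-1024)
mask = binarize(hu)
print(hu)
print(mask)
```

In practice the slope and intercept would be read from the DICOM metadata of each slice rather than hard-coded.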

The algorithm first converted the pixels into HU, and the original CT grayscale image (image A) was then converted into a black-and-white image (image B). After that, the largest white region (image C) was found and inverted into another black-and-white image (image D), and all the regions inside the human body were set to black (image E). The lung image (image F) was extracted on the basis of the black regions, and the lung images were stacked to form a 3D matrix image (image G). The regions connected in white are the 3D-connected domains, as shown in Figure 1. A size threshold for the 3D-connected domains was set, and domains smaller than the threshold were filtered out. The remaining 3D-connected domains were counted to obtain the number of isolated PNs. The diameter of an isolated PN could be worked out from the size of its 3D-connected domain.
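The labeling, counting, and size-filtering steps can be illustrated with a small breadth-first flood fill over a stacked binary volume. This is a sketch under stated assumptions and not the authors' code. It uses 6-connectivity (face neighbours only), which the text does not specify. The names `connected_domains` and `filter_domains` and the toy volume are hypothetical.

```python
from collections import deque

# 6-connectivity (face neighbours) is an assumption; the source does not state it.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_domains(volume):
    """Return the voxel count of every white (1) connected domain in a 3D list."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    seen = [[[False] * width for _ in range(height)] for _ in range(depth)]
    sizes = []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if volume[z][y][x] != 1 or seen[z][y][x]:
                    continue
                # Breadth-first flood fill from this unvisited white voxel.
                seen[z][y][x] = True
                queue, count = deque([(z, y, x)]), 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    count += 1
                    for dz, dy, dx in NEIGHBOURS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and volume[nz][ny][nx] == 1 and not seen[nz][ny][nx]):
                            seen[nz][ny][nx] = True
                            queue.append((nz, ny, nx))
                sizes.append(count)
    return sizes


def filter_domains(sizes, min_voxels):
    """Keep only domains with at least min_voxels voxels."""
    return [s for s in sizes if s >= min_voxels]


# Hypothetical 2-slice volume: one 3-voxel domain spanning both slices
# and one isolated voxel.
volume = [
    [[1, 1, 0, 0],
     [0, 0, 0, 1]],
    [[1, 0, 0, 0],
     [0, 0, 0, 0]],
]
sizes = connected_domains(volume)
kept = filter_domains(sizes, min_voxels=2)
print(sizes, kept)
```

For full-resolution CT volumes an array-based labeling routine would be far faster. The pure-Python version is used here only to keep the example self-contained.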

The calculation method for the size of PN is given in Equation (2).

The size (in pixels and as a diameter) of each detected isolated PN is calculated by Equation (2) from the slice spacing (SS) or slice thickness (ST) and the pixel spacing (PS) stored in the metadata tags of the DICOM file.

In this equation, V is the volume of the PN, and m is the number of pixel intervals between connected pixels in the X- and Y-axis directions. Since PNs are generally round, the same number of pixel intervals was taken in the X and Y directions. n is the number of slice intervals matched to the number of pixel intervals. The distance spanned by the pixel intervals and the distance spanned by the slice intervals should be nearly equal, so that the calculated 3D space is an approximate cube.
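One way to turn a connected domain into a physical size is sketched below. It is not a reconstruction of Equation (2). It assumes that the volume is the voxel count multiplied by the in-plane pixel area (PS squared) and the slice spacing (SS). It also assumes that the diameter is that of a sphere with the same volume, a conversion the text does not specify. The function names and the sample numbers are hypothetical. The 0.7 mm pixel spacing is roughly the 360 mm field of view divided by the 512-pixel matrix reported in Section 2.4.

```python
import math


def domain_volume_mm3(voxel_count, pixel_spacing_mm, slice_spacing_mm):
    """Volume = voxel count x in-plane pixel area x slice spacing (assumed)."""
    return voxel_count * pixel_spacing_mm ** 2 * slice_spacing_mm


def equivalent_diameter_mm(volume_mm3):
    """Diameter of a sphere with the given volume (assumed conversion)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


# Hypothetical domain: 100 voxels, 0.7 mm pixel spacing, 5 mm slice spacing.
volume = domain_volume_mm3(100, 0.7, 5.0)
diameter = equivalent_diameter_mm(volume)
print(round(volume, 1), round(diameter, 2))
```

A domain would then be kept as a candidate PN when its diameter is at least the 2 mm screening threshold.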

2.4. CT Scan Method and Radiologist’s Detection Method

CT scans of all enrolled patients were performed using a 16-slice multislice CT scanner. The scan ranged from the thorax to the bottom of the lung. The patients took a deep breath and then held their breath while the whole lung was scanned. The scanning parameters were as follows: tube voltage 120–140 kV, tube current 200–340 mA, pitch 1.375 : 1, slice thickness 5 mm, field of view 360 mm, and image matrix 512 × 512.

Data analysis was performed using 3D Slicer. The images obtained from the CT scans were imported into the software. Nodule detection by radiologists was based on the archived CT images. Two senior radiologists made independent professional judgments with reference to the CT images, and uncertain PNs were counted by consensus. Whether each PN (2–30 mm) was benign or malignant, together with its size, location, and density, was recorded.

2.5. Classification Criteria for PNs

According to nodule size, PNs were divided into three groups of 2–5, 5–10, and >10 mm. According to location, they were divided into four groups: nodules connected to the pleura, peripheral nodules (within 20 mm from the pleural surface and not connected to the pleura), pulmonary hilar nodules (within 20 mm from the hilar structure), and central nodules (between the peripheral and hilar regions). According to density, PNs were divided into SN, PSN, and GGN. Figure 2 displays the CT images of a patient whose nodule developed from a GGN into a PSN and finally into an SN. The left upper lung had a GGN of about 17 × 13 mm (Figure 2(a)); a CT scan 2 years later showed partial consolidation of the nodule (Figure 2(b)), and a scan after 5 years showed complete consolidation (Figure 2(c)). Postoperative pathology confirmed invasive lung adenocarcinoma with a size of 21 mm.
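The size and location groupings above can be written as two small lookup functions. This is an illustrative sketch only. The text does not say which group the boundary values of 5 mm and 10 mm belong to, nor how a nodule within 20 mm of both the pleura and the hilum is classified, so both choices below are assumptions. The function names and their arguments are hypothetical.

```python
def size_group(diameter_mm):
    """Map a diameter to the size groups used in this section.

    Assigning 5 mm and 10 mm to the lower group is an assumption.
    """
    if diameter_mm < 2 or diameter_mm > 30:
        return None  # outside the 2-30 mm range recorded in the study
    if diameter_mm <= 5:
        return "2-5 mm"
    if diameter_mm <= 10:
        return "5-10 mm"
    return ">10 mm"


def location_group(connected_to_pleura, dist_pleura_mm, dist_hilum_mm):
    """Map distances to the four location groups described above.

    Checking the peripheral rule before the hilar rule is an assumption.
    """
    if connected_to_pleura:
        return "pleura-connected"
    if dist_pleura_mm <= 20:
        return "peripheral"
    if dist_hilum_mm <= 20:
        return "hilar"
    return "central"


print(size_group(17.0), location_group(False, 35.0, 12.0))
```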

2.6. Statistical Processing Methods

SPSS 22.0 was applied for statistical analysis of the data. The sensitivity, false negative rate, and false positive rate of detection by the algorithm and by the radiologists were calculated. The McNemar test was used to compare the sensitivities of the algorithm and the radiologists in detecting benign and malignant PNs and multiple PNs. The false positive rates of the algorithm and the radiologists did not follow a normal distribution, so the Wilcoxon rank sum test was used to compare them. Differences were considered statistically significant at the prespecified significance level.
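For readers unfamiliar with the McNemar test, the sketch below computes its exact two-sided form from the two discordant counts: nodules found only by the algorithm (`b`) and nodules found only by the radiologists (`c`). The study used SPSS, and the text does not say whether the exact or the chi-square variant was applied, so the exact form here is an assumption. The counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    # Binomial tail probability under H0: each discordant pair is a fair coin flip.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, not taken from the study.
p_value = mcnemar_exact(b=9, c=1)
print(p_value)
```

Only the discordant pairs enter the test. Nodules found by both readers or missed by both do not affect the p-value.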

3. Results

3.1. Analysis of Medical Records

Among the 150 patients selected, 71 were male and 79 were female, with an average age of 58 years. The youngest was 27 years old and the oldest was 89. The mean size of the PNs was 16.08 ± 9.17 mm. A total of 530 PNs were identified by the radiologists. The PNs were classified by size, type, and location, and the data are listed in Table 1.

3.2. Detection Results of Benign and Malignant PNs

A total of 530 benign and malignant PNs were found in the 150 patients; 312 were malignant and 248 were benign. The algorithm detected 498 PNs, of which 478 were detected accurately, 20 were falsely detected, and 32 were missed, giving a sensitivity of 95.98% and a specificity of 93.96%. The radiologists detected 424 PNs in total, of which 364 were accurate, 60 were falsely detected, and 106 were missed, giving a sensitivity of 85.85% and a specificity of 80.00%. The detailed results are shown in Table 2.
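The reported percentages can be reproduced from the counts above. The mapping between each reported figure and a ratio of counts is inferred from the numbers and is not stated by the authors: the figure reported as sensitivity matches accurate detections divided by all detections, the figure reported as specificity matches all detections divided by the 530 nodules, and the false positive rate quoted in the Discussion matches false detections divided by all detections. The helper `pct` and the dictionary keys are hypothetical names.

```python
def pct(numerator, denominator):
    """Percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


TOTAL = 530  # nodules reported in this section

readers = {
    "algorithm": {"detected": 498, "accurate": 478, "false": 20, "missed": 32},
    "radiologists": {"detected": 424, "accurate": 364, "false": 60, "missed": 106},
}

summary = {}
for name, r in readers.items():
    summary[name] = {
        "reported_sensitivity": pct(r["accurate"], r["detected"]),
        "reported_specificity": pct(r["detected"], TOTAL),
        "reported_false_positive_rate": pct(r["false"], r["detected"]),
    }
    print(name, summary[name])
```

The detected and missed counts of each reader also sum to 530 (498 + 32 and 424 + 106), which is consistent with the total used here.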

Figure 3 presents the images of some benign and malignant PNs (Figure 3(a)–3(e) shows benign PNs, while Figure 3(f)–3(j) shows malignant ones). Comparison of the two sets of images revealed that the edges of benign PNs were smooth, with no or few burrs. By contrast, malignant PNs had uneven edges, mostly with burrs, and an irregular overall morphology.

3.3. Detection Results of Nodule Size

Among the 530 PNs, 2–5 mm nodules accounted for the largest proportion. The algorithm had a statistically higher accuracy than the radiologists in detecting nodules of 2–5, 5–10, and >10 mm. The accuracy of radiologists for 2–5 mm PNs was 60.41%, which suggested that traditional manual judgment still deviates considerably for small nodules. Algorithm detection could largely avoid this flaw; the results are shown in Table 3.

3.4. Detection Results of Nodule Properties

Compared with detection by radiologists, detection by the algorithm was statistically more accurate for SNs and GGNs. The detection of PSNs showed no statistically significant difference, as shown in Table 4.

3.5. Detection Results of Locations of PNs

The detection results of PNs of the pleura-connected type, peripheral type, central type, and hilar type were compared between the algorithm and the radiologists. Algorithm detection was statistically more accurate for each type, as shown in Table 5.

4. Discussion

Pulmonary nodules are round or irregular lesions with a diameter of ≤3 cm in the lungs. Among them, nodules less than 1 cm are called “small nodules,” nodules of 2–5 mm are called “micronodules,” and lesions larger than 3 cm are not called nodules but “masses.” Early-stage lung cancer is generally asymptomatic, and many people ignore it; because follow-up examinations are neglected, the disease is often found at an advanced stage, when the estimated survival is only several months [18, 19]. Some studies have found that larger nodules tend to show lobulation, burrs, pleural traction, the air-containing bronchiole sign and vesicle sign, eccentric thick-walled cavities, and similar features, and that such nodules are mostly located in the upper lobe, especially the upper lobe of the right lung [20]. CT has high density resolution in the diagnosis of lung cancer and can avoid overlapping of the chest wall, heart, and mediastinum. Many small lesions in hidden parts of the lung can be found, which is of great help in judging the scope of lesions and in disease staging [21, 22].

In this study, in the judgment of 530 benign and malignant nodules, the intelligent algorithm based on the 3D-connected domain had a detection rate of 93.96%, a true positive rate of 95.98%, and a false positive rate of 4.02%. The radiologists detected 424 nodules, with a detection rate of 80.0%, a true positive rate of 85.85%, and a false positive rate of 14.15%. The detection performance of the algorithm was therefore significantly higher than that of the radiologists. According to Kavithaa et al. [23], traditional algorithms detect and recognize the target according to the texture, color, edge gradient information, and other features of the image itself. However, their generalization ability is weak, and their recognition error rate is high, like that of manual detection by doctors. The recognition rate and generalization ability of a feature classification network based on an intelligent algorithm are better than those of traditional methods. This study comparatively evaluated the properties of all nodules and found that the algorithm determined the size, nature, and location of the nodules significantly more accurately than the radiologists. The statistics also showed that the algorithm tended to misdiagnose lobular nuclear structures, hilar blood vessels, the trachea, ground glass density shadows, old lesions, and similar structures, and most of these lesions were concentrated in the 2–5 mm range. Therefore, algorithm detection software should focus on identifying false positives to avoid excessive misdiagnosis. Perl et al. [24] compared computer-aided detection (CAD) and 3D convolutional neural network (CNN) software and found that software based on deep learning systems had similar sensitivity to traditional CAD software, but higher specificity. In addition, Sim et al. [25] compared deep convolutional neural network (DCNN) software, detection by radiologists, and detection by radiologists assisted by the algorithm software on X-ray films. The detection rate and sensitivity of the AI algorithm were high, and the false negative rate and false positive rate of the physicians’ judgments were significantly reduced with software assistance. In summary, applying AI algorithms to the intelligent identification of chest CT images is of great significance in the pathological research of lung cancer. AI is highly efficient, and the corresponding image analysis can be obtained simply by importing the images. This study also found that AI had higher accuracy. Furthermore, AI has the ability to continuously learn, simulate, and evolve. Through training on massive numbers of images, it is expected to overcome the shortcoming of a high false positive rate in the future and eventually approach a recognition rate close to 100%. The improved AI diagnostic ability of CT imaging should give a strong impetus to the research of lung cancer pathology.

5. Conclusion

In this study, a pulmonary nodule detection algorithm was constructed on the basis of the 3D-connected domain algorithm, applied to the processing of chest CT images of pulmonary nodules, and compared with detection by physicians. The results showed that the detection performance of the artificial intelligence algorithm was better than that of radiologists, that it reflected the pathological conditions of pulmonary nodules more accurately, and that it can be used as an auxiliary detection tool for screening. The study had the following limitations. A retrospective design was adopted, so only patients who had already been examined were screened and the real clinical situation may have been missed. The number of samples was limited, so the detection performance of the algorithm could not be reflected fully objectively. In future research, the sample size will be expanded, and this direction will be further explored through clinical experiments. This study provides important support for the pathological study of lung cancer and for judging the future development of pulmonary nodules in patients more accurately.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.