Abstract

Aim. To develop an algorithm, based on a convolutional neural network (CNN), for the classification of lung cancer lesions as T1-T2 or T3-T4 on staging fluorodeoxyglucose positron emission tomography (FDG-PET)/CT images. Methods. We retrospectively selected a cohort of 472 patients (divided into training, validation, and test sets) who had undergone staging FDG-PET/CT within 60 days before biopsy or surgery. The seventh edition of the TNM system was used as the reference. Postprocessing was performed to generate an adequate dataset. The input of the CNNs was a bounding box on both PET and CT images, cropped around the lesion centre. The results were classified as Correct (concordance between reference and prediction) and Incorrect (discordance between reference and prediction). Accuracy (Correct/[Correct + Incorrect]), recall (correctly predicted T3-T4/[all T3-T4]), and specificity (correctly predicted T1-T2/[all T1-T2]), as commonly defined in deep-learning models, were used to evaluate CNN performance. The area under the curve (AUC) was calculated for the final model. Results. The algorithm, composed of two networks (a “feature extractor” and a “classifier”), achieved an accuracy, recall, specificity, and AUC of 87%, 69%, 69%, and 0.83; 86%, 77%, 70%, and 0.73; and 90%, 47%, 67%, and 0.68 in the training, validation, and test sets, respectively. Conclusion. We obtained proof of concept that CNNs can be used as a tool to assist in the staging of patients affected by lung cancer.

1. Introduction

In recent years, advanced analysis of medical imaging using radiomics, machine learning, and deep learning, including convolutional neural networks (CNNs), has been explored. These approaches hold great promise for both diagnostic and predictive applications. CNNs are nonexplicitly programmed algorithms that identify the features of an input image relevant to its classification. They have been applied to various tasks, such as detection (e.g., breast lesions on mammographic scans), segmentation (e.g., the liver and liver lesions on computed tomography (CT)), and diagnosis (e.g., lung lesions on screening low-dose CT).

CNNs are a machine-learning technique based on artificial neural networks with deep architectures, relying on convolution operations (the linear application of a filter or kernel to local neighbourhoods of pixels/voxels in an input image) and downsampling or pooling operations (grouping of feature-map signals into a lower-resolution feature map). The final classification or regression task relies on higher-level features, representative of a large receptive field, which are flattened into a single vector. The development of an algorithm entails (a) selection of the hyperparameters, (b) training and validation, and (c) testing. The hyperparameters include the network topology, the number of filters per layer, and the optimisation parameters. During the training process, the dataset of input images (divided into training and validation sets) is repeatedly submitted to the network so that it captures the structure of the images salient for the task. Initially, the weights of each artificial neuron are randomly chosen. They are then adjusted at each iteration to minimise the loss function, which quantifies how close the prediction is to the target class. The performance of the trained model is then evaluated on an independent test dataset. This step also assesses whether “overfitting” has occurred: with a limited dataset and too many parameters relative to its size, a model may “memorise” the training data rather than generalise from them [1].
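To illustrate these building blocks, the following minimal PyTorch sketch (a generic example, not the network developed in this study; all layer sizes are arbitrary) chains convolution, pooling, and flattening into a two-class prediction and shows a single weight update driven by the loss function.

```python
# Minimal, generic illustration of the CNN concepts described above:
# convolution, pooling, flattening, and loss-driven weight updates.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, in_channels=2, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),  # convolution: filter applied to local neighbourhoods
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # pooling: lower-resolution feature map
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, n_classes)      # higher-level features flattened into one vector

    def forward(self, x):                                          # x: (batch, channels, 128, 128)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters())                   # weights start random, then are adjusted
loss_fn = nn.CrossEntropyLoss()                                    # quantifies closeness to the target class

x = torch.randn(4, 2, 128, 128)                                    # dummy batch of 2-channel image patches
y = torch.randint(0, 2, (4,))                                      # dummy target classes
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()                                                   # one iteration of loss minimisation
```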

In the field of lung imaging, CNNs have been tested for nodule segmentation on CT images, with reported average Dice scores of 82% and 80% for the training and test datasets, respectively [2]. CNNs have been shown to achieve better results than conventional methods for nodule detection [3, 4]. Moreover, a model for assessing cancer probability in patients with pulmonary nodules has been proposed, with an area under the curve (AUC) of 0.90 and 0.87 for the training and test sets, respectively [5]. Stage assessment has not yet been described. The present study, as a first step towards complete TNM parameter assessment, aimed to develop an algorithm for the classification of lung cancer as T1-T2 or T3-T4 on staging fluorodeoxyglucose positron emission tomography (FDG-PET)/CT images.

2. Materials and Methods

2.1. Study Design and Patient Selection

In this retrospective single-centre investigation, we screened all patients who underwent FDG-PET/CT between 01/01/2011 and 27/06/2017 for the staging of a suspected lung lesion, within 60 days before a biopsy or surgical procedure. The inclusion criteria were (1) age > 18 years and (2) histological diagnosis of primary lung cancer. The exclusion criteria were (1) inconclusive histology due to an inadequate biopsy sample and (2) diagnosis of nonmalignancy. The study was approved by the Institutional Ethics Committee.

2.2. Image Acquisition and Postprocessing

FDG-PET/CT was performed according to standard institutional procedures, previously detailed [6]. Postacquisition processing was performed to generate an adequate dataset for the CNN. The original CT and PET image sizes were 512 × 512 × Nslices and 128 × 128 × Nslices, respectively, where Nslices is the number of slices in which the lesion appears. The CT images were clipped between −1000 and 400 Hounsfield units. PET images were resampled in the CT space. Then, both images were rescaled to lie between 0 and 1. The resulting dataset consisted of 3D bounding boxes of dimension 128 × 128 × Nslices on both PET and CT images, cropped around the lesion centre identified by two nuclear medicine physicians (M.S. and M.K.). Data augmentation, a strategy commonly used in deep-learning methods, was performed: image patches were rotated in 2D space around the lesion centre, about the z-axis, by an angle randomly selected in the range [−10°, 10°]. This processing step artificially expands the size of the training set and reduces overfitting.
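The sketch below illustrates this postprocessing pipeline under the assumption that the PET volume has already been resampled into the CT space; the array layout and helper names are illustrative, not taken from the study code.

```python
# Illustrative sketch of the postprocessing steps described above.
# Assumes PET has already been resampled into the CT space; array
# shapes and function names are hypothetical.
import numpy as np
from scipy.ndimage import rotate

def preprocess(ct, pet, centre, size=128):
    """ct, pet: 3D arrays (Nslices, rows, cols); centre: (row, col) of the lesion."""
    ct = np.clip(ct, -1000.0, 400.0)                    # clip CT to [-1000, 400] HU
    ct = (ct - ct.min()) / (ct.max() - ct.min())        # rescale both images to [0, 1]
    pet = (pet - pet.min()) / (pet.max() - pet.min())
    r, c = centre
    half = size // 2
    # crop a 128 x 128 x Nslices bounding box around the lesion centre
    ct_box = ct[:, r - half:r + half, c - half:c + half]
    pet_box = pet[:, r - half:r + half, c - half:c + half]
    return ct_box, pet_box

def augment(patch, max_angle=10.0):
    """Rotate each 2D slice about the z-axis by a random angle in [-10, 10] degrees."""
    angle = np.random.uniform(-max_angle, max_angle)
    return rotate(patch, angle, axes=(1, 2), reshape=False, order=1)
```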

2.3. CNN Development and Analysis

The study workflow and the network architectures are summarised in Figure 1. During the training phase, a fivefold cross-validation strategy was adopted by dividing the cohort into a training dataset and a validation dataset. To assess the performance of the final model, the cohort was divided into training, validation, and test datasets. The algorithm was composed of two networks: a feature extractor and a classifier. The feature extractor was a CNN that took a CT-PET image patch of 128 × 128 pixels as input and performed a classification (T1-T2 with label = 0 and T3-T4 with label = 1) according to the appearance of the patch; it aimed to extract the most relevant features from a single patch. The classifier took as input the mean of the second-to-last layer of features extracted from all slices of a single patient and performed the classification (T1-T2 vs. T3-T4) for that patient. The softmax function was applied to the last layer of both networks to obtain the probabilities of the T1-T2 and T3-T4 classes; the class with the highest probability was assigned to each patient. Both models were trained with the Adam algorithm [7]. Table 1 summarises the parameters of the feature-extractor and classifier networks. The code was written in Python, using the PyTorch deep-learning library (http://pytorch.org/).
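A hedged PyTorch sketch of this two-network design is given below; the layer sizes and feature dimension are assumptions (the published parameters are those reported in Table 1, not reproduced here).

```python
# Sketch of the two-network design described above: a per-slice
# "feature extractor" and a per-patient "classifier". Layer sizes
# and the feature dimension are assumptions, not the study's settings.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
        )
        self.penultimate = nn.Linear(32 * 8 * 8, feat_dim)  # second-to-last layer of features
        self.head = nn.Linear(feat_dim, 2)                  # per-slice T1-T2 (0) vs. T3-T4 (1)

    def forward(self, x):                                   # x: (Nslices, 2, 128, 128) CT-PET patches
        feats = self.penultimate(self.conv(x).flatten(1))
        return feats, self.head(feats)

class PatientClassifier(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.head = nn.Linear(feat_dim, 2)                  # per-patient T1-T2 vs. T3-T4

    def forward(self, feats):                               # feats: (Nslices, feat_dim)
        return self.head(feats.mean(dim=0, keepdim=True))   # mean over all slices of one patient

extractor, classifier = FeatureExtractor(), PatientClassifier()
patches = torch.randn(12, 2, 128, 128)                      # dummy: all slices of a single patient
feats, _ = extractor(patches)
probs = torch.softmax(classifier(feats), dim=1)             # probabilities of T1-T2 and T3-T4
```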

The TNM classification system, 7th edition [8], was used as the reference.

The results were classified as Correct (concordance between reference and prediction) or Incorrect (discordance between reference and prediction). Accuracy (Correct/[Correct + Incorrect]), recall (correctly predicted T3-T4/[all T3-T4]), and specificity (correctly predicted T1-T2/[all T1-T2]), as commonly defined for deep-learning models, were used to evaluate CNN performance. The area under the curve (AUC) was calculated for the final model.
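For concreteness, the sketch below computes these metrics with the labels coded as in the paper (T1-T2 = 0, T3-T4 = 1); the use of scikit-learn for the AUC is an assumption, not a detail reported by the authors.

```python
# Evaluation metrics as defined above, with T1-T2 = 0 and T3-T4 = 1.
# scikit-learn is assumed for the ROC AUC computation.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(y_true, y_pred, y_prob):
    """y_true/y_pred: 0-1 labels; y_prob: predicted probability of T3-T4."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = (y_true == y_pred).mean()                  # Correct / (Correct + Incorrect)
    recall = (y_pred[y_true == 1] == 1).mean()            # correctly predicted T3-T4 / all T3-T4
    specificity = (y_pred[y_true == 0] == 0).mean()       # correctly predicted T1-T2 / all T1-T2
    auc = roc_auc_score(y_true, y_prob)                   # area under the ROC curve
    return accuracy, recall, specificity, auc
```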

3. Results

From the institutional database, a cohort of 586 patients was selected by applying the abovementioned criteria. Patients with distant metastases or with a histology other than adenocarcinoma or squamous cell carcinoma were excluded from the present analysis. Therefore, 472 patients were included in the study (T1-T2 = 353 patients; T3-T4 = 119 patients). Staging was clinical in 97 cases and pathological in 375 cases. Subsequently, the patients were randomly divided into training, validation, and test sets. The patients’ characteristics are summarised in Table 2.

Table 3 summarises the results of the cross-validation analysis.

In the final model analysis, the algorithm developed and tested in the present work achieved an accuracy of 69%, a recall of 70%, and a specificity of 67% in the test set for the identification of T1-T2 and T3-T4 lung cancer. The AUC was 0.83, 0.73, and 0.68 in the training, validation, and test sets, respectively. The results of all metrics for the final model in the training, validation, and test sets are reported in Table 4.

Figure 2 shows examples of patients classified by the CNN as T1-T2 and T3-T4.

4. Discussion

The algorithm developed and tested in the present work achieved an accuracy of 87%, 69%, and 69% in the training, validation, and test sets, respectively, for the classification of T1-T2 and T3-T4 lung cancer. The lower performance of the CNN in the validation and test sets compared with the training set is probably related to the smaller sizes of these sets.

The TNM staging system is the reference method for prognostication and treatment decision-making in cancer patients, including those with lung cancer. Pathological assessment is considered the gold standard. However, patients with tumours of the same stage can experience different rates of recurrence and survival. These variations may be related to tumour biology and to other factors, including potential differences between the pathological stage at diagnosis and after neoadjuvant treatment [9] and the possible effect of the number of surgically removed lymph nodes on the N-parameter [10]. Finally, pathological TNM staging is not feasible in advanced stages; medical imaging is the reference when pathological assessment is not feasible. Hybrid FDG-PET/CT is noninvasive and provides whole-body assessment, making it essential in baseline lung cancer staging. CT is the cornerstone of lung cancer imaging, providing the information needed for clinical T staging, while FDG-PET/CT outperforms other modalities in diagnostic accuracy for mediastinal nodal involvement and extrathoracic disease detection [11, 12].

In lung CT imaging, deep-learning approaches have been used to detect lung nodules [3, 13], to segment them [2], and to classify them as benign or malignant [5, 14]. Some preliminary data are available on radiation treatment planning [15] and outcome prediction [16]. Recently, the value of CNNs for the classification of mediastinal lymph node metastases on FDG-PET/CT images has been investigated in non-small cell lung cancer: CNNs proved promising in identifying lymph node involvement, with higher sensitivity but lower specificity compared with physicians [17].

The present work reports an innovative application of deep learning that provides an automated staging classification. Some limitations of this study have to be acknowledged. Firstly, the limited number of patients precluded the design of a network for a four-output classification task (T1, T2, T3, and T4). Moreover, better performance is expected with larger datasets; in future work, we plan to include a larger number of cases in a multicentre framework. Secondly, the reference used to define the T-parameter was the pathological assessment in surgically treated patients (80% of cases), while clinical assessment was used as the standard in patients not suitable for surgery. The choice to use pathological staging when available, instead of clinical staging, was aimed at achieving higher algorithm performance. Finally, we did not test the CT and PET modalities separately. It could be speculated that, in this first-step investigation, the CNN nodes were mostly activated by weighted features from the CT component, whereas PET may be expected to contribute more to the N- and M-parameters. A comparison of the performance of a CNN trained on either CT or PET images alone will be considered in future studies. However, our final objective is to develop a whole-body assessment tool processing CT and PET images together.

In conclusion, the key result of the present preliminary investigation is the feasibility and promising performance of CNNs in assessing the T-parameter in lung cancer. The developed tool is able to provide, within a few minutes of baseline PET/CT, the probability of a patient being T1-T2 or T3-T4. Further investigations are needed to develop robust algorithms for a complete TNM assessment. Compared with radiomics, CNNs have the advantage of eliminating the need for tumour segmentation and for feature calculation and selection, which are even more critical issues in small lesions. Still, the possibility of a complementary role for radiomics and other artificial intelligence techniques should be addressed. Moreover, an improvement in risk stratification is foreseen with the incorporation of patients’ clinical features into the neural network algorithm.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Clinical Relevance. The development of an automated system based on neural networks that is able to perform complete stage and risk assessment will ensure more accurate and reproducible use of imaging data and will allow deeper integration of this information into the therapeutic plan of each individual patient.

Ethical Approval

The study was approved by the Institutional Ethics Committee. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Conflicts of Interest

A. Chiti received speaker honoraria from General Electric, Blue Earth Diagnostics, and Sirtex Medical System, acted as the scientific advisor for Blue Earth Diagnostics and Advanced Accelerator Applications, and benefited from an unconditional grant from Sanofi to Humanitas University. All honoraria and grants are outside the scope of the submitted work. All other authors have no conflicts of interest.

Authors’ Contributions

MS, MK, and AC conceived the idea of the study; MS, MK, GS, and SM reviewed and contributed to the image processing; EV collected the data and selected the patients; GS, SM, and LA developed the network; LA and AR reviewed the images and selected the patients; MS and MK wrote the manuscript; all the authors discussed the results and commented on the manuscript. Margarita Kirienko and Martina Sollini contributed equally to this work.

Acknowledgments

M. Kirienko was supported by the AIRC (Italian Association for Cancer Research) scholarship funded by the grant won by AC (IG-2016-18585).