Abstract

Methylation of the O6-methylguanine-DNA methyltransferase (MGMT) gene promoter is correlated with the effectiveness of the current standard of care in glioblastoma patients. In this study, a deep learning pipeline was designed for automatic prediction of MGMT status in 87 glioblastoma patients with contrast-enhanced T1-weighted (CE-T1W) images and 66 with fluid-attenuated inversion recovery (FLAIR) images. The end-to-end pipeline performs both tumor segmentation and status classification. The better tumor segmentation performance came from the FLAIR images (Dice score, 0.897) compared with CE-T1WI, and the better status prediction also came from the FLAIR images (accuracy, 0.827; recall, 0.852; precision, 0.821; and F1 score, 0.836). The proposed pipeline not only saves time in tumor annotation and avoids interrater variability in glioma segmentation but also achieves good prediction of MGMT methylation status. It could help identify molecular biomarkers from routine medical images and further facilitate treatment planning.

1. Introduction

Glioblastoma multiforme (GBM) is the most common and aggressive type of primary brain tumor in adults. It accounts for 45% of primary central nervous system tumors, and the 5-year survival rate is around 5.1% [1, 2]. The standard treatment for GBM is surgical resection followed by radiation therapy and temozolomide (TMZ) chemotherapy, which improves median survival by 3 months compared to radiotherapy alone [3]. Several studies have indicated that O6-methylguanine-DNA methyltransferase (MGMT) gene promoter methylation, reported in 30-60% of glioblastomas [4], can enhance the response to TMZ and has been proven to be a prognostic biomarker in GBM patients [3, 5]. Thus, determination of MGMT promoter methylation status is important for medical decision-making.

Genetic analysis of surgical specimens is the reference standard for assessing MGMT methylation status, but a large tissue sample is required to test MGMT methylation status with methylation-specific polymerase chain reaction [6]. The major limitations are the possibility of unrepresentative biopsy samples due to tumor spatial heterogeneity and the high cost [7]. Besides, this approach cannot be used for real-time monitoring of methylation status.

Magnetic resonance imaging (MRI) is a standard examination for the diagnosis, preoperative planning, and therapy evaluation of GBM [8, 9]. Recently, radiomics, which extracts massive numbers of quantitative features from medical images, has been proposed to explore the correlation between image features and underlying genetic traits [10-12]. There is growing evidence that radiomics can be used to predict the status of MGMT promoter methylation [13-15]. However, most previous works utilized handcrafted features, in a procedure that includes tumor segmentation, feature extraction, and informatics analysis [16-19]. In particular, tumor segmentation is a challenging yet important step because most works depend on manual delineation, which is burdensome and time-consuming, and inter- or intraobserver disagreement is unavoidable. Deep learning, which extracts features automatically, has emerged as an innovative technology in many fields [20]. The convolutional neural network (CNN) has proven effective in image segmentation, disease diagnosis, and other medical image analysis tasks [21-25]. Compared to traditional methods with handcrafted features, deep learning offers several advantages, such as robustness to distortions (e.g., changes in shape) and lower computational cost. A few studies have shown that deep learning can be used to segment tumors and predict MGMT methylation status for glioma [26]. However, to the best of our knowledge, there is no previous report of a pipeline that performs both glioma segmentation and MGMT methylation status prediction in an end-to-end manner. Therefore, we investigate the feasibility of integrating tumor segmentation and status prediction for GBM patients into a single deep learning pipeline in this study.

2. Methods

2.1. Data Collection

A total of 106 GBM patients were analyzed in our study. MR images, including presurgical axial contrast-enhanced T1-weighted images (CE-T1WI) and T2-weighted fluid-attenuated inversion recovery (FLAIR) images, were collected from The Cancer Imaging Archive (http://www.cancerimagingarchive.net). The images originated from four centers (Henry Ford Hospital, University of California San Francisco, Anderson Cancer Center, and Emory University). Clinical and molecular data were also obtained from the open-access data tier of the TCGA website.

Genomic data were obtained from the TCGA data portal. MGMT methylation status analysis was performed on the Illumina HumanMethylation27 and HumanMethylation450 BeadChip platforms. A median cutoff on the level 3 beta-values available in TCGA was used to categorize methylation status. Two Illumina Human Methylation probes (cg12434587 and cg12981137) were selected for this study [27].
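
As a hypothetical illustration of this dichotomization (the file name, column layout, and averaging of the two probes are assumptions, not details given in the text):

```python
import pandas as pd

# Hypothetical file layout: one row per patient, one column per probe.
beta = pd.read_csv("tcga_level3_beta_values.csv")
probes = ["cg12434587", "cg12981137"]           # probes selected in [27]
mean_beta = beta[probes].mean(axis=1)           # combine the two probes (assumption)
cutoff = mean_beta.median()                     # median cutoff on the beta-values
beta["mgmt_methylated"] = (mean_beta > cutoff).astype(int)  # 1 = methylated
```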

Of the 106 GBM cases, 87 had CE-T1W images and 66 had FLAIR images. We randomly split the cases into training and testing sets at a ratio of 8 : 2 and applied 10-fold cross-validation to the training set using the scikit-learn library (https://scikit-learn.org/stable/). The dataset distribution is listed in Table 1.
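
A minimal sketch of this split with scikit-learn follows; the stratified sampling, fixed seed, and placeholder arrays are assumptions, since the text only specifies a random 8 : 2 split and 10-fold cross-validation:

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

# Placeholder data: replace with the preprocessed MR slices and MGMT labels.
images = np.random.rand(106, 128, 128, 1)
labels = np.random.randint(0, 2, size=106)

# 8 : 2 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels)

# 10-fold cross-validation on the training set.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for tr_idx, val_idx in kfold.split(X_train, y_train):
    X_tr, X_val = X_train[tr_idx], X_train[val_idx]
    y_tr, y_val = y_train[tr_idx], y_train[val_idx]
    # ... train on (X_tr, y_tr) and validate on (X_val, y_val)
```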

2.2. Image Preprocessing

For general images, the pixel values carry reliable image information; MR images, however, do not have a standard intensity scale. Figure 1(a) shows the density plots of two raw MR images. In each plot there are two peaks: the peak around 0 corresponds to background pixels, and the other to white matter. The white matter peaks of the two images are far apart. Thus, MR image normalization is needed to guarantee that the grey values of the same tissue are close to each other across different MR images [28].

Piecewise linear histogram matching was used to normalize the intensity distribution of the MR images [29]. First, we derived a standard histogram by averaging the 1st to 99th percentiles of all images. Then, we linearly mapped the intensities of each image onto this standard histogram; in Figure 1(b), the white matter peaks of the two images coincide after normalization. Next, the images were normalized to zero mean and unit standard deviation over valued voxels only. Finally, data augmentation was used to increase the dataset size and avoid overfitting: we rotated the images every 5 degrees from -20 to +20 degrees, resulting in a 9-fold increase in the number of MRI scans. A simplified sketch of these steps is given below.
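
In this sketch, the percentile landmarks and interpolation details are assumptions that approximate the piecewise linear matching of [29]; the rotation step follows the augmentation described above:

```python
import numpy as np
from scipy.ndimage import rotate

def standard_scale(train_images, pcts=np.arange(1, 100)):
    """Average the 1st-99th intensity percentiles over all training images."""
    landmarks = [np.percentile(img[img > 0], pcts) for img in train_images]
    return np.mean(landmarks, axis=0), pcts

def normalize(img, scale, pcts):
    """Map image percentiles onto the standard scale, then z-score valued voxels."""
    src = np.percentile(img[img > 0], pcts)
    out = np.interp(img, src, scale)          # piecewise linear intensity mapping
    mask = out > 0                            # z-score only the valued voxels
    out[mask] = (out[mask] - out[mask].mean()) / out[mask].std()
    return out

def augment(img):
    """Rotate every 5 degrees from -20 to +20 degrees (9 angles, including 0)."""
    return [rotate(img, angle, reshape=False, order=1)
            for angle in range(-20, 25, 5)]
```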

2.3. Segmentation

As for tumor segmentation, one state-of-the-art model [30] in BraTS 2018 challenge (Multimodal Brain Tumor Segmentation 2018 Challenge http://braintumorsegmentation.org/) was adapted. The whole network architecture is shown in Figure 2.

In short, the deep learning model added a variational autoencoder (VAE) branch to a fully convolutional network, and the encoder part was shared by the segmentation and VAE tasks. The prior distribution taken for the KL divergence in the VAE part was $\mathcal{N}(0, 1)$. The ResNet blocks used in the architecture [31] included two convolutions with normalization and ReLU as well as a skip connection. In the encoder part, the spatial dimensions were downsampled by a factor of 2 using strided convolutions while the channel size was doubled. The decoder part mirrored the encoder but used upsampling. The decoder endpoint had the same size as the input image and was followed by a sigmoid activation, and its output was the tumor segmentation. In the VAE part, the encoder output was reduced to 256 dimensions, and the input image was reconstructed using a structure similar to the decoder but without skip connections. Thus, the segmentation part output the tumor segmentation while the VAE branch attempted to reconstruct the input image. Except for the input and output layers, all blocks in Figure 2 are ResNet blocks with different channel numbers (indicated beside each layer). The input layer was a convolution with 3 channels; both output layers used a convolution with a dropout rate of 0.2 and weight regularization to avoid overfitting.

The loss function consists of three terms, as shown in

$$L = L_{\text{dice}} + L_{\text{L2}} + L_{\text{KL}}, \tag{1}$$

where $L_{\text{dice}}$ is the soft Dice loss between the predicted segmentation and the ground truth labels, $L_{\text{L2}}$ is the L2 loss between the VAE branch output image and the input image, and $L_{\text{KL}}$ is the standard VAE penalty term [32, 33]. The ground truth labels were manually annotated in ImageJ (https://imagej.nih.gov) by one neuroradiologist with 10 years of experience specializing in brain disease diagnosis. The Dice coefficient, as defined in equation (2), was calculated to assess the performance of segmentation:

$$\text{Dice} = \frac{2\sum_{i} p_i g_i + \epsilon}{\sum_{i} p_i + \sum_{i} g_i + \epsilon}, \tag{2}$$

where $g_i$ is the ground truth and $p_i$ is the prediction for pixel $i$, and $\epsilon$ is a small constant added to avoid division by zero.
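
A minimal Keras sketch of the Dice terms in equations (1) and (2) is given below; the smoothing constant and the reduction over the batch are assumptions, and the L2 and KL terms of the VAE branch (standard formulations from [32, 33]) are omitted:

```python
from tensorflow.keras import backend as K

def dice_coefficient(y_true, y_pred, eps=1e-8):
    # Equation (2): 2 * sum(p * g) / (sum(p) + sum(g)), smoothed by eps.
    intersection = K.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (K.sum(y_true) + K.sum(y_pred) + eps)

def soft_dice_loss(y_true, y_pred):
    # The L_dice term of equation (1).
    return 1.0 - dice_coefficient(y_true, y_pred)
```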

2.4. Status Classification

For the classification of MGMT methylation status, a 4-layer CNN was designed and cascaded with the tumor segmentation model. During model design, classification networks with different numbers of convolutional layers were tried [25], and we found that 2 convolutional layers with 2 fully connected (FC) layers performed best for this task. The first convolutional layer had 16 filters and the second had 4 filters. All the convolutional layers shared the same kernel size and used a stride of 1, each followed by LeakyReLU, batch normalization, and max pooling. LeakyReLU is a variant of the ReLU activation that avoids dead neurons by giving the negative half-axis a slope of 0.3 instead of 0; its advantages include eliminating gradient saturation, low computational cost, and fast convergence. Batch normalization normalizes features by the mean and variance within a small batch, which helps to alleviate covariate shift and ease optimization. Max pooling was used to downsample the image features extracted by the convolutional layers before they were fed into the 2 FC layers. ReLU and softmax were adopted as the activation functions of the first and second FC layers, respectively. The weights of all layers were initialized with He-normal initialization [34].
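
A sketch of this classification network in Keras follows. The 3 x 3 kernels, 2 x 2 pooling windows, input size, and hidden FC width are assumptions not specified in the text, so the parameter count will differ from the 3,498 reported in Section 2.5; the filter counts, activations, and He-normal initialization follow the description above:

```python
from tensorflow.keras import layers, models

def build_classifier(input_shape=(128, 128, 1)):
    """Two conv layers (16 and 4 filters) plus two FC layers, per the text."""
    model = models.Sequential([
        layers.Conv2D(16, (3, 3), strides=1, input_shape=input_shape,
                      kernel_initializer="he_normal"),
        layers.LeakyReLU(alpha=0.3),     # slope 0.3 on the negative half-axis
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(4, (3, 3), strides=1, kernel_initializer="he_normal"),
        layers.LeakyReLU(alpha=0.3),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(16, activation="relu", kernel_initializer="he_normal"),
        layers.Dense(2, activation="softmax", kernel_initializer="he_normal"),
    ])
    return model
```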

2.5. Parameter Settings and Software

All experiments were conducted with the open-source framework Keras (https://keras.io/) on one GeForce RTX 2080Ti GPU. The segmentation and classification models have 6,014,721 and 3,498 parameters, respectively. For tumor segmentation, the Adam optimizer was adopted with a self-designed learning rate scheduler: starting from an initial learning rate, the rate was halved whenever the validation loss did not decrease over the past 5 epochs. Training ran for 50 epochs with a batch size of 8, and each epoch took around 50 seconds. For status classification, the 4-layer CNN was trained with Adam for 50 epochs with a batch size of 32; training was stopped early if the validation accuracy remained stable for more than 10 epochs. The average elapsed time per epoch was 5 seconds.
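
Assuming the self-designed scheduler and the early stopping criterion behave like standard Keras callbacks, the classification training loop might look as follows; the initial learning rate is an assumption (the values are not given above), and `build_classifier`, `X_tr`, `y_tr`, `X_val`, and `y_val` come from the earlier sketches:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import Adam

model = build_classifier()                          # from the sketch above
model.compile(optimizer=Adam(learning_rate=1e-4),   # initial rate assumed
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Halve the learning rate when validation loss stalls for 5 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    # Stop when validation accuracy is stable for more than 10 epochs.
    EarlyStopping(monitor="val_accuracy", patience=10),
]
model.fit(X_tr, y_tr, validation_data=(X_val, y_val),
          epochs=50, batch_size=32, callbacks=callbacks)
```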

2.6. Statistical Analysis

The Dice coefficient was calculated to evaluate the performance of tumor segmentation. For the MGMT methylation status classification, the accuracy, recall, precision, and F1 score were calculated according to the equations below. In addition, the receiver operating characteristic (ROC) curve was plotted, and the area under the ROC curve (AUC) was reported to measure classification performance. All metrics were calculated in PyCharm with Python (version 3.6.8; http://www.python.org/):

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{recall} = \frac{TP}{TP + FN},$$

$$\text{precision} = \frac{TP}{TP + FP},$$

$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}},$$

where $TP$ is the number of true positives, $TN$ the number of true negatives, $FP$ the number of false positives, and $FN$ the number of false negatives.
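
For illustration, these metrics and the ROC/AUC can be computed with scikit-learn; the label and score arrays below are placeholders, and the 0.5 decision threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, roc_auc_score)

# Placeholder data: replace with real MGMT labels and model probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])

y_pred = (y_score >= 0.5).astype(int)       # threshold probabilities at 0.5
acc = accuracy_score(y_true, y_pred)        # (TP + TN) / (TP + TN + FP + FN)
rec = recall_score(y_true, y_pred)          # TP / (TP + FN)
prec = precision_score(y_true, y_pred)      # TP / (TP + FP)
f1 = f1_score(y_true, y_pred)               # 2 * prec * rec / (prec + rec)
fpr, tpr, _ = roc_curve(y_true, y_score)    # points for the ROC plot
auc = roc_auc_score(y_true, y_score)        # area under the ROC curve
```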

3. Results

3.1. Tumor Segmentation
3.1.1. Qualitative Observation

Tumors could be accurately delineated by the proposed pipeline. Figure 3 shows the annotated ground truth (first row) and the corresponding segmentation results (second row) for GBM in FLAIR images. Tumor boundaries are accurately localized by the deep network, and the major hyperintense regions are delineated. In all three cases, the automatic segmentation is quite close to the ground truth.

Figure 4 shows GBM in CE-T1WI, with the ground truth in the first row and the segmentation results in the second row. Tumor boundaries are well localized, there is no obvious difference between the manual annotations and the corresponding segmentation results from our network, and the suspicious regions are mostly contoured. The three cases show that the network's segmentation results approximate the manual delineations.

3.1.2. Quantitative Evaluation

The quantitative performance of automatic tumor segmentation is summarized in Table 2. The deep network obtained good testing performance on tumor segmentation with both CE-T1WI and FLAIR images, and the Dice scores from FLAIR (0.897 on the testing set) were slightly higher than those from CE-T1WI across the training, validation, and testing sets. The maximum difference between the average Dice scores on the training and validation sets (0.026, for CE-T1W images) indicates that the model was not overfitting.

3.1.3. Computational Performance

Table 3 compares the time consumption per MR slice between manual annotation and automatic prediction. We recorded the total time and divided it by the number of slices, so the values in Table 3 are the average segmentation time per slice. The deep network was far more efficient: it took less than 0.2 seconds to segment an MR slice, while manual annotation required more than 30 seconds.

3.2. Classification of MGMT Promoter Methylation Status

Table 4 shows the prediction performance for MGMT promoter methylation status, evaluated with four classification metrics (accuracy, recall, precision, and F1 score) at three stages (training, validation, and testing) using different MR images (CE-T1WI, FLAIR). In general, the model trained with FLAIR images achieves better results on all metrics across the three stages, followed by the model trained with CE-T1WI images. Specifically, the accuracy, recall, precision, and F1 score of the model trained with FLAIR images reach 0.827, 0.852, 0.821, and 0.836 in the testing stage, respectively.

ROC curves of the prediction results are shown in Figures 5 and 6. Figure 5 shows the best status classification achieved with FLAIR images: AUCs of 0.985 (yellow curve), 0.968 (green curve), and 0.905 (red curve) on the training, validation, and testing datasets, respectively.

The best status classification using CE-T1WI images is shown in Figure 6, with AUCs of 0.973 (yellow curve), 0.942 (green curve), and 0.887 (red curve) on the training, validation, and testing datasets, respectively.

4. Discussion

This study presents an MR-based deep learning pipeline for automatic tumor segmentation and MGMT methylation status classification in an end-to-end manner for GBM patients. Experimental results demonstrate promising performance on accurate glioma delineation (Dice score, 0.897) and MGMT status prediction (accuracy, 0.827; recall, 0.852; precision, 0.821; and F1 score, 0.836), both from the model trained with FLAIR images. In addition, the proposed pipeline dramatically shortens the inference time for glioma segmentation.

For glioma segmentation, a state-of-the-art deep model was utilized and obtained impressive performance on the involved MGMT dataset for GBM segmentation. Its performance is close to that reported in other deep network-based tumor segmentation studies. Hussain et al. [35] reported a CNN approach for glioma MRI segmentation, and the model achieved a Dice score of 0.87 on the BRATS 2013 and 2015 datasets. Cui et al. [36] proposed an automatic semantic segmentation model on the BRATS 2013 dataset, and the Dice score was near 0.80 on the combined high- and low-grade glioma datasets. Kaldera et al. [37] proposed a faster RCNN method and achieved a Dice score of 0.91 on 233 patients' data. These studies suggest that deep networks have great potential for accurate tumor segmentation in MR images.

Several deep models have been designed for the classification of MGMT methylation status in GBM patients. Chang et al. [38] proposed a deep neural network that achieved a classification accuracy of 83% for 259 glioma patients with T1W, T2W, and FLAIR images. Korfiatis et al. [26] compared different sizes of the ResNet baseline model and reached the highest accuracy of 94.9% in 155 GBM patients with T2W images. Han et al. [39] proposed a bidirectional convolutional recurrent neural network architecture for MGMT methylation classification, but the accuracy was around 62% for 262 GBM patients with T1W, T2W, and FLAIR images. In this study, a shallow CNN was used, and the classification performance is promising. The best performance comes from the model trained with FLAIR images, which achieved a satisfactory result with the highest accuracy of 0.827 and recall of 0.852, considering the relatively small dataset.

In previous studies, Drabycz et al. [40] analyzed handcrafted features to distinguish methylated from unmethylated GBM and found that texture features from T2-weighted images were important for predicting MGMT methylation status. Han et al. [41] found that MGMT promoter-methylated GBM tended to show more tumor necrosis, and the T2-weighted FLAIR sequence may be more sensitive to necrosis than T1-weighted images. Interestingly, we also find that better performance in both GBM segmentation and molecular classification is achieved on FLAIR images in our study, although the CE-T1W and FLAIR images did not come from the same patients.

The strengths of this study lie in fully automatic glioma segmentation and prediction of MGMT methylation status based on a small dataset. Generally, it takes a radiologist about one minute per slice to annotate a tumor, while the inference time of the deep learning model is about 0.1 seconds, around 1/600 of the time needed for manual annotation. Additionally, manual annotation is burdensome and prone to inter- and intraobserver variability, whereas a well-trained deep learning model can perform tumor segmentation continuously and repeatably, regardless of the observer. On the other hand, the training strategy in this study is beneficial for small-dataset analysis: a deep model generally requires a large number of training instances, but it is challenging or impossible to obtain massive numbers of high-quality images in medical imaging. Finally, although several studies have used deep networks for automatic glioma segmentation [35, 36, 42] or molecular classification [26, 38, 39], the proposed network integrates both glioma segmentation and classification in a seamless pipeline, with performance competitive with state-of-the-art studies in tumor segmentation and classification.

There are several limitations to our study. First, the sample size is small; we will further confirm the findings in a study with larger samples. Second, a multicenter research trial would help validate the capability of the proposed pipeline, although variations in MR imaging sequences, equipment vendors, and other factors could complicate model building. Third, we did not investigate the value of combining CE-T1WI and FLAIR for tumor segmentation and classification because of the limited samples. In the future, we will explore multiple MR sequences for MGMT methylation status prediction, such as amide-proton-transfer-weighted imaging and diffusion-weighted imaging, which may have great potential to improve prediction performance.

5. Conclusion

An MRI-based end-to-end deep learning pipeline was designed for tumor segmentation and MGMT methylation status prediction in GBM patients. It can save time, avoid interobserver variability in tumor segmentation, and help discover molecular biomarkers from routine medical images to aid diagnosis and treatment decision-making.

Data Availability

All MRI data are available in The Cancer Imaging Archive (https://www.cancerimagingarchive.net/), and clinical and molecular data are obtained from the open-access data tier of the TCGA website.

Conflicts of Interest

The authors declare that there is no conflict of interest.

Authors’ Contributions

Xin Chen, Min Zeng, and Yichen Tong contributed equally.

Acknowledgments

This study was supported by the Key R&D Program of Guangdong Province (2018B030339001), the National Science Fund for Distinguished Young Scholars (No. 81925023), the National Natural Scientific Foundation of China (Nos. 81601469, 81771912, 81871846, 81971574, and 81802227), the Guangzhou Science and Technology Project of Health (No. 20191A011002), the Natural Science Foundation of Guangdong Province (2018A030313282), the Fundamental Research Funds for the Central Universities, SCUT (2018MS23), and the Guangzhou Science and Technology Project (202002030268 and 201804010032).