Biomedical Applications of Computer Vision using Artificial Intelligence
SGPNet: A Three-Dimensional Multitask Residual Framework for Segmentation and IDH Genotype Prediction of Gliomas
Glioma is the main type of malignant brain tumor in adults, and the isocitrate dehydrogenase (IDH) mutation status strongly affects the diagnosis, treatment, and prognosis of gliomas. Radiographic medical imaging provides a noninvasive platform for sampling both inter- and intralesion heterogeneity of gliomas, and previous research has shown that the IDH genotype can be predicted from the fusion of multimodality radiology images. Both the features of medical images and the IDH genotype are vital for medical treatment; however, a multitask framework for segmenting glioma lesions and predicting the IDH genotype is still lacking. In this paper, we propose a novel three-dimensional (3D) multitask deep learning model for segmentation and genotype prediction (SGPNet). Residual units are introduced into SGPNet, allowing the output blocks to extract hierarchical features for the different tasks and facilitating information propagation. On the Multimodal Brain Tumor Segmentation Challenge (BRATS) 2020 and The Cancer Genome Atlas (TCGA) glioma databases, our model reduces the classification error rate by 26.6% compared with previous models. Furthermore, we are the first to investigate experimentally how lesion areas influence IDH genotype prediction, by training with different groups of learning targets. The experimental results indicate that information from lesion areas is more important for IDH genotype prediction than information from the whole brain. Our framework is effective and generalizable and can serve as a highly automated tool for clinical decision making.
Glioma is the main type of malignant brain tumor in adults, accounting for approximately 80% of such tumors, and it is divided into four grades (I to IV) according to the World Health Organization (WHO). Despite the frequency of gliomas, their histology and molecular etiology vary even within a single pathological class; hence, recognizing the molecular status is crucial for precision medicine. Isocitrate dehydrogenase (IDH) is a general term for IDH1 and IDH2, and previous studies have shown that the IDH genotype (wild-type or mutant) has a significant impact on the diagnosis, treatment, and prognosis of glioma patients [3–6]. However, identifying the IDH genotype by biopsy is an invasive and costly procedure that requires a sample of cells from the patient's lesion, whereas radiographic medical imaging provides a noninvasive platform for sampling both inter- and intralesion heterogeneity of gliomas. Previous research has demonstrated a strong correlation between phenotypes (extracted from medical images) and genotypes (extracted from gene expression files), and predicting genotypes from phenotypes has become a fast-developing research field.
High-performance models have already been built to predict the genotypes of glioma patients from medical images. For this task, an effective approach is based on radiomics and machine learning algorithms [8, 9]. Radiomics extracts lesion-related features from medical images; the features are obtained by experienced radiologists using professional software and data-characterization algorithms. After this processing, the high-dimensional image data are well represented by low-dimensional radiomics features, which make it easier for researchers to build IDH prediction models. Although radiomics feature-based models perform well on genotype prediction, they still have limitations. For example, extracting radiomics features depends on radiologists' judgment, which makes it a subjective procedure, and it is also affected by the hardware and software environment: different radiologists using different software and algorithms may describe the details of the same lesion slightly differently. Moreover, all raw images must be processed before prediction, and the low-dimensional features restrict further investigation with these models. Overall, the heavy dependence on manual intervention limits the models' generalization ability and reproducibility.
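As a rough illustration of what "low-dimensional radiomics features" means, the sketch below reduces a 3D volume to a handful of first-order statistics inside a lesion mask (lesion volume, mean, standard deviation, skewness, and histogram entropy). It is a minimal sketch under stated assumptions, not the software or feature set used in the cited studies: the function name `first_order_features`, the 32-bin histogram, and the synthetic volume are all hypothetical.

```python
import numpy as np


def first_order_features(image: np.ndarray, mask: np.ndarray, bins: int = 32) -> dict:
    """Compute a few first-order, radiomics-style features inside a lesion mask.

    Hypothetical helper for illustration only; real radiomics pipelines add
    resampling, intensity discretisation rules, and many more feature classes.
    """
    voxels = image[mask > 0].astype(np.float64)  # intensities inside the lesion
    if voxels.size == 0:
        raise ValueError("mask selects no voxels")
    mean = voxels.mean()
    std = voxels.std()
    # Skewness: third standardised moment (0 for a symmetric distribution).
    skewness = float(((voxels - mean) ** 3).mean() / std**3) if std > 0 else 0.0
    # Shannon entropy (in bits) of the discretised intensity histogram.
    counts, _ = np.histogram(voxels, bins=bins)
    p = counts[counts > 0] / voxels.size
    entropy = float(-(p * np.log2(p)).sum())
    return {
        "volume_voxels": int(voxels.size),
        "mean": float(mean),
        "std": float(std),
        "skewness": skewness,
        "entropy": entropy,
    }


# Synthetic 3D volume with a bright cubic "lesion"; the values are made up.
rng = np.random.default_rng(0)
volume = rng.normal(100.0, 5.0, size=(32, 32, 32))
lesion = np.zeros(volume.shape, dtype=np.uint8)
lesion[10:20, 10:20, 10:20] = 1
volume[lesion == 1] += 50.0
features = first_order_features(volume, lesion)
print(features)
```

The whole 32,768-voxel volume collapses to five numbers, which is the kind of compact representation a downstream classifier would then consume.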
Based on the above observations, researchers introduced deep learning (DL) algorithms into genotype prediction tasks. DL, a subclass of machine learning (ML), offers more powerful learning ability. Annotated data are required only during training, and well-trained models can take raw images as input for various tasks. Raw images preserve all the information about the lesions and the surrounding brain tissue, which allows models to perform more complex tasks. Chang et al. developed a residual convolutional neural network (CNN) using magnetic resonance (MR) sequence images. However, in Chang's work, the MR sequence images were manually selected from whole 3D brain MR images. To handle 3D brain MR images directly, Liang et al. developed a 3D DenseNet for IDH genotype prediction from MR images of low-grade gliomas (grades II and III, LGG) and high-grade gliomas (grade IV, glioblastoma multiforme, GBM) and achieved an accuracy of 84.6% on the validation dataset. DL algorithms also perform well on automatic segmentation tasks, and previous studies have established many high-performance models for segmenting lesion areas from medical images [13, 14]. Soltaninejad et al. combined DL and ML algorithms to build superpixel-based and supervoxel-based models for brain tumor segmentation and detection. However, these models cannot predict gene mutation status, which is also important for the treatment of glioma patients. Attention mechanisms have also been introduced to improve segmentation performance. Although attention mechanisms show potential for medical image tasks, they significantly increase the computational complexity of models, especially for 3D MR images; as a result, attention-based models need more training cases and are harder to train well. Liu et al. developed a multitask model that segments brainstem gliomas and predicts the H3 K27M mutation.
The phenotypes in MR images and the IDH genotype are both important criteria for glioma patients to receive proper medical treatment; however, a multitask framework for segmenting glioma lesions and predicting the IDH genotype is still lacking.
Brain MR images contain details of both normal tissues and lesion areas, and both may affect the performance of genotype prediction. However, constrained by single-task model structures, previous studies focused only on building black-box models for genotype prediction, which limits their reliability as computer-aided tools for diagnosis and treatment. Taking advantage of the multitask architecture of SGPNet, we set up controlled experiments with different groups of learning targets to examine the influence of lesion areas on IDH genotype prediction.
In this paper, we propose SGPNet, a multitask CNN model that addresses the automatic segmentation of low-grade glioma (LGG) and glioblastoma multiforme (GBM) tumor volumes and the prediction of IDH mutation from MR images. Four MR modalities (T1, T1Gd, T2, and T2-FLAIR) are preprocessed and fed into SGPNet, which consists of a single backbone with two output blocks, one for segmentation and one for IDH status. To train such a multitask model effectively, we use a multitask loss function and different learning rates for different blocks. The experimental results indicate that our model reduces the classification error rate by 26.6% compared with previous models on the Multimodal Brain Tumor Segmentation Challenge (BRATS) and The Cancer Genome Atlas (TCGA) glioma databases. In addition, we study how the features of lesion areas influence the performance of IDH genotype prediction. We believe these experiments show that lesion information is important for IDH genotype prediction and increase the reliability of such predictions.
2. Materials and Methods
2.1. Gene Profiles and Medical Images Dataset
In this paper, we used data from two databases, The Cancer Genome Atlas (TCGA) and the Brain Tumor Segmentation Challenge (BRATS) 2020, to conduct our experiments. The genotype-related dataset is TCGA, which provides various types of gene data, including gene expression profiling, copy number variation profiling, and so on. More specifically, the TCGA dataset provides four methods for identifying gene mutation status in parallel: MuSE, MuTect2, SomaticSniper, and VarScan2. We considered a gene to be mutated when more than one of these methods indicated a mutation. The BRATS 2020 dataset [22–24] provides multimodal brain MR images of LGG and GBM patients, including T1, T1Gd, T2, and T2-FLAIR volumes. One of the sources of the BRATS dataset is The Cancer Imaging Archive (TCIA), which allows us to cross-reference MR images with gene expression profile data through the project ID shared by both datasets. The segmentation labels comprise the necrotic and nonenhancing tumor core (NCR and NET), the peritumoral edema (ED), the enhancing tumor (ET), and the background. Considering the scale of the datasets and the scope of our research, we merge NCR and NET, ED, and ET into a single lesion label, which makes the evaluation of our experimental results more concise. In total, data from 121 cross-referenced patients were collected from the above datasets, comprising 56 mutant cases and 65 wild-type cases.
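The cohort-building rules described above (a gene counts as mutated when more than one of the four callers reports it, the three tumor subregions collapse into one lesion label, and only patients present in both databases are kept) can be sketched as follows. The dictionary layout of the caller outputs, the helper names, and the patient identifiers are hypothetical, and the label values 1, 2, and 4 assume the usual BRATS encoding.

```python
import numpy as np

# The four callers named in the text; the per-patient dict layout is hypothetical.
CALLERS = ("MuSE", "MuTect2", "SomaticSniper", "VarScan2")


def is_idh_mutant(calls: dict) -> bool:
    """Consensus rule from the text: mutated if more than one caller reports a mutation."""
    votes = sum(bool(calls.get(name, False)) for name in CALLERS)
    return votes > 1


def merge_lesion_labels(seg: np.ndarray) -> np.ndarray:
    """Collapse NCR/NET, ED and ET into one binary lesion mask.

    Assumes BRATS label values 1 (NCR/NET), 2 (ED), 4 (ET) and 0 (background).
    """
    return np.isin(seg, (1, 2, 4)).astype(np.uint8)


def cross_reference(mri_ids, idh_calls: dict) -> dict:
    """Keep patients present in both sources; map each to 1 (mutant) or 0 (wild-type)."""
    return {pid: int(is_idh_mutant(idh_calls[pid])) for pid in mri_ids if pid in idh_calls}


# Hypothetical patient identifiers and caller outputs, for illustration only.
calls = {
    "patient_A": {"MuSE": True, "MuTect2": True, "SomaticSniper": False, "VarScan2": True},
    "patient_B": {"MuSE": True, "MuTect2": False, "SomaticSniper": False, "VarScan2": False},
}
cohort = cross_reference(["patient_A", "patient_B", "patient_C"], calls)
print(cohort)  # {'patient_A': 1, 'patient_B': 0}
print(merge_lesion_labels(np.array([0, 1, 2, 4])))  # [0 1 1 1]
```

Here `patient_C` is dropped because it has no genotype record, mirroring how only cross-referenced patients enter the cohort.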
2.2. Data Processing
The original MR images were manually annotated by clinical experts; each case consists of four modality volumes (T1, T1Gd, T2, and T2-FLAIR) and the ground-truth segmentation labels, and all of these images have the same voxel dimensions. The preprocessing procedure has the following steps. (1) Every image is cropped to remove the black background. (2) The cropped images are resized to a unified shape, and all images except the segmentation labels are then normalized to zero mean and unit standard deviation. (3) The four modalities are concatenated as four input channels. Figure 1 shows these preprocessing steps. Because the dataset is small, we also apply data augmentation: shift and flip operations are each randomly applied with a 50% probability at every training step.
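A minimal NumPy sketch of the three preprocessing steps and the shift/flip augmentation might look like the following. The target shape, the nearest-neighbour resizing, normalization over the whole cropped volume, the circular shift of up to 8 voxels, and the choice of flip axis are all assumptions, since the text does not specify them.

```python
import numpy as np


def crop_to_foreground(volumes, label):
    """Crop every modality and the label to the bounding box of non-zero voxels."""
    foreground = np.any(np.stack(volumes) != 0, axis=0)
    coords = np.argwhere(foreground)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    box = tuple(slice(a, b) for a, b in zip(lo, hi))
    return [v[box] for v in volumes], label[box]


def resize_nearest(volume, shape):
    """Nearest-neighbour resize; also safe for integer label maps."""
    idx = [np.minimum(np.arange(n) * s // n, s - 1) for n, s in zip(shape, volume.shape)]
    return volume[np.ix_(*idx)]


def zscore(volume, eps=1e-8):
    """Normalise one modality to zero mean and unit standard deviation."""
    return (volume - volume.mean()) / (volume.std() + eps)


def preprocess(volumes, label, shape=(128, 128, 128)):
    """Crop, resize, normalise, and stack T1, T1Gd, T2, T2-FLAIR as channels (assumed shape)."""
    volumes, label = crop_to_foreground(volumes, label)
    channels = [zscore(resize_nearest(v.astype(np.float32), shape)) for v in volumes]
    return np.stack(channels, axis=0), resize_nearest(label, shape)


def augment(image, label, rng, max_shift=8):
    """Apply a random shift and a random flip, each with probability 0.5."""
    if rng.random() < 0.5:
        shift = tuple(int(s) for s in rng.integers(-max_shift, max_shift + 1, size=3))
        image = np.roll(image, shift, axis=(1, 2, 3))  # circular shift of (D, H, W)
        label = np.roll(label, shift, axis=(0, 1, 2))
    if rng.random() < 0.5:
        axis = int(rng.integers(0, 3))
        image = np.flip(image, axis=axis + 1)  # image axis 0 is the channel axis
        label = np.flip(label, axis=axis)
    return image, label


rng = np.random.default_rng(0)
# Synthetic stand-ins for T1, T1Gd, T2, T2-FLAIR: a non-zero block in a zero background.
modalities = [np.pad(rng.random((40, 50, 60)) + 1.0, 10) for _ in range(4)]
seg = np.pad(rng.choice([0, 1, 2, 4], size=(40, 50, 60)), 10)
x, y = preprocess(modalities, seg, shape=(32, 32, 32))
x_aug, y_aug = augment(x, y, rng)
print(x.shape, y.shape)  # (4, 32, 32, 32) (32, 32, 32)
```

Nearest-neighbour interpolation is used for both images and labels here so that label values are never blended; an implementation could equally use a smoother interpolator for the image channels only.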
2.3. Model Architecture
In segmentation tasks, using low-level details of the input image has proved important when datasets are small; as a result, U-Net has achieved high performance and been widely applied to medical image segmentation [26–28]. In addition, degradation is a common problem in deep network architectures. Inspired by these studies, we modify the hyperparameters of the 3D U-Net and introduce skip-connections into our model. The basic shape of our framework follows the standard U-Net, which contains two paths: a contracting path (left side) and an expansive path (right side). Five pairs of blocks are employed in the two paths, and the output of each block in the contracting path is concatenated as part of the input of the corresponding block in the expansive path. These connections create a shortcut for information between high-level and low-level feature maps, which facilitates gradient backpropagation and supplements high-level semantic features with finer details. They also allow the output blocks to extract multilevel features for the different tasks from the backbone of SGPNet.
Our proposed network consists of one backbone and two output blocks, as illustrated in Figure 2. In the figure, each 3D convolution layer is annotated with a four-tuple whose items are the input channels, output channels, kernel size, and stride of the layer. The other components are the instance normalization (IN) layer, which is designed to remove instance-specific contrast information from the input image; the up-sampling layer; the fully connected layer for predicting the IDH genotype; and the leaky rectified linear unit (LeakyReLU, LR) activation function, defined as

$$\mathrm{LR}(x) = \begin{cases} x, & x \ge 0,\\ \alpha x, & x < 0, \end{cases} \tag{1}$$

where $\alpha$ is a small positive constant that sets the slope for negative inputs.
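The two normalization and activation components can be sketched in NumPy for feature maps shaped (N, C, D, H, W). The sketch omits the learnable affine parameters that framework implementations of instance normalization usually offer, and alpha = 0.01 is an assumed slope, not a value stated in the text.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Instance normalisation for feature maps shaped (N, C, D, H, W).

    Each (sample, channel) volume is normalised with its own mean and variance,
    which removes instance-specific contrast. Learnable scale/shift are omitted.
    """
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def leaky_relu(x, alpha=0.01):
    """LeakyReLU: x where x >= 0, alpha * x elsewhere (alpha = 0.01 is an assumption)."""
    return np.where(x >= 0, x, alpha * x)


rng = np.random.default_rng(0)
# Two samples whose four channels differ strongly in contrast (scale) and brightness (offset).
scales = np.array([1.0, 5.0, 0.2, 3.0]).reshape(1, 4, 1, 1, 1)
feats = rng.normal(size=(2, 4, 8, 8, 8)) * scales + 7.0
out = leaky_relu(instance_norm(feats))
print(out.shape)  # (2, 4, 8, 8, 8)
```

After normalization every channel of every sample has roughly zero mean and unit variance regardless of its original scale, which is the contrast-removal property the text attributes to IN.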
The segmentation task and the IDH genotype prediction task share most of the weights in the backbone. Overall, our network is an end-to-end model that receives four channels of MR images as input and outputs the segmentation labels and the predicted IDH mutation status.
In the contracting path, we replace the max-pooling layer of the standard U-Net with a convolution with a stride of 2, which down-samples the feature maps and doubles the number of output channels; it is followed by two convolutions with a stride of 1. LR and IN are added after each convolution layer. In Figure 2, the blue dotted line represents the skip-connection, which adds the output of the first convolution layer to the output of the last convolution layer in each block. In the expansive path, the input of each block is the concatenation of the output of the previous block and the corresponding feature map from the paired block in the contracting path. The first convolution integrates the information of the concatenated input, and a second convolution halves the number of input channels. These two convolutions are followed by the up-sampling layer, which uses nearest-neighbor interpolation to double the width and height of the input features, and then by a convolution that further halves the number of channels. The two output blocks also use skip-connections, and each takes as input the features from blocks at three different levels of the expansive path of the backbone.
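The down-sampling scheme can be checked with some shape bookkeeping, using the standard convolution output-size formula floor((n + 2p - k) / s) + 1. The 128 x 128 x 128 input, the base width of 16 channels, the 3 x 3 x 3 kernels with padding 1, and keeping full resolution in the first block are hypothetical choices; the text fixes only the stride-2/stride-1 pattern and the channel doubling.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one axis: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_contracting_path(in_size=128, base_channels=16, n_blocks=5):
    """Return (channels, spatial size) at the output of each contracting block.

    Hypothetical configuration: block 0 keeps full resolution and maps the four
    MR modalities to base_channels; each later block opens with a stride-2 conv
    that halves the spatial size and doubles the channels.
    """
    shapes = []
    size, channels = in_size, base_channels
    for block in range(n_blocks):
        if block == 0:
            size = conv_out(size, stride=1)
        else:
            size, channels = conv_out(size, stride=2), channels * 2
        # Two stride-1 convs leave (channels, size) unchanged, so the first
        # conv's output can be added to the last conv's output (residual path).
        size = conv_out(conv_out(size, stride=1), stride=1)
        shapes.append((channels, size))
    return shapes


for level, (c, s) in enumerate(trace_contracting_path()):
    print(f"level {level}: {c} channels, {s}x{s}x{s} voxels")
```

Because the two stride-1 convolutions keep both the spatial size and the channel count unchanged, the output of the first convolution in a block has exactly the shape needed for the element-wise addition performed by the skip-connection.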
2.4. Evaluation Metrics
In this section, we describe the metrics used to assess SGPNet: specificity (SP), sensitivity (SN), accuracy (ACC), and the area under the receiver operating characteristic curve (AUC) for the IDH status prediction task, and the Dice similarity coefficient (DSC) for the segmentation task. SP measures the proportion of negatives that are correctly predicted, as in equation (2), and SN measures the true positive rate, as in equation (3). ACC is the fraction of all samples that are identified correctly, as in equation (4). AUC is the probability that a randomly selected positive example is ranked above a randomly selected negative one. DSC scores how closely the predicted segmentation labels match the annotated ground-truth segmentation labels, as in equation (5).
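Four quantities are used to calculate the above metrics: true positive (TP) is the number of positive samples correctly predicted as positive; likewise, true negative (TN) is the number of negative samples correctly predicted as negative. False positive (FP) is the number of negative samples incorrectly predicted as positive, and false negative (FN) is the number of positive samples incorrectly predicted as negative.

$$\mathrm{SP} = \frac{TN}{TN + FP} \tag{2}$$

$$\mathrm{SN} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\mathrm{DSC} = \frac{2\,|P \cap G|}{|P| + |G|} \tag{5}$$

where $P$ and $G$ are the sets of voxels labeled as lesion in the predicted and ground-truth segmentations, respectively.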
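The classification metrics, AUC, and DSC can be computed directly from predictions, as in the NumPy sketch below. Treating the IDH-mutant class as positive, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration; AUC is computed with the pairwise (Mann-Whitney) formulation described in the text, counting ties as one half.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    return int(np.sum(t & p)), int(np.sum(~t & ~p)), int(np.sum(~t & p)), int(np.sum(t & ~p))


def classification_metrics(y_true, y_pred):
    """Specificity, sensitivity and accuracy from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {"SP": tn / (tn + fp), "SN": tp / (tp + fn), "ACC": (tp + tn) / (tp + tn + fp + fn)}


def auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count one half."""
    t = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[t][:, None], s[~t][None, :]
    wins = np.sum(pos > neg) + 0.5 * np.sum(pos == neg)
    return float(wins / (pos.size * neg.size))


def dice(pred_mask, true_mask):
    """DSC = 2 * |P intersect G| / (|P| + |G|); defined as 1 when both masks are empty."""
    p = np.asarray(pred_mask).astype(bool)
    g = np.asarray(true_mask).astype(bool)
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else float(2.0 * np.sum(p & g) / denom)


# Toy data: 1 = IDH-mutant, treated as the positive class (an assumption).
y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
y_pred = (scores >= 0.5).astype(int)  # hypothetical 0.5 decision threshold
print(classification_metrics(y_true, y_pred), auc(y_true, scores))
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Note that SP, SN, and ACC depend on the chosen threshold, whereas AUC uses only the ranking of the scores.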
2.5. Implementation Details
In line with the evaluation metrics, cross-entropy and Dice loss are the objective functions of our network. In the gene mutation prediction task, the IDH status is encoded into two labels (wild-type and mutation). The binary cross-entropy (BCE) loss function measures the discrepancy between the predicted labels and the ground-truth labels and is defined as follows:

$$\mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log\left(1 - p_i\right)\right] \tag{6}$$

where $p_i$ represents the model's predicted class probability for sample $i$, $y_i$ represents the corresponding ground-truth label, and $N$ is the number of samples.
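The Dice loss function measures the spatial overlap between the predicted segmentation labels and the manually annotated labels and is defined as follows:

$$\mathcal{L}_{Dice} = 1 - \frac{2\sum_{j} q_j g_j}{\sum_{j} q_j + \sum_{j} g_j} \tag{7}$$

where $q_j$ is the predicted lesion probability at voxel $j$ and $g_j \in \{0, 1\}$ is the ground-truth label of that voxel.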
The ground-truth segmentation labels contain more information than the IDH mutation status, so it may not be ideal to weight the segmentation error equally with the classification error. To integrate the above loss functions, we define the total loss as a weighted combination of the Dice loss and the BCE loss, in which a parameter $\lambda$ balances the segmentation error and the classification error. To balance the Dice loss and the classification loss dynamically, $\lambda$ is defined adaptively rather than as a fixed constant, and the total loss of the network is computed with this adaptive $\lambda$.
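A NumPy sketch of the two loss terms and one possible way to combine them follows. Since the dynamic definition of the balancing parameter is not reproduced in this section, `lam` is simply passed in, and placing it on the BCE term is an assumption; the small `eps` terms are added only for numerical stability.

```python
import numpy as np


def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy averaged over samples; p = predicted mutation probability."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def soft_dice_loss(q, g, eps=1e-6):
    """1 - soft Dice between voxel-wise lesion probabilities q and the binary label g."""
    q = np.asarray(q, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(1.0 - (2.0 * np.sum(q * g) + eps) / (q.sum() + g.sum() + eps))


def total_loss(q, g, p, y, lam):
    """Hypothetical combination: Dice loss + lam * BCE loss, lam supplied by the caller."""
    return soft_dice_loss(q, g) + lam * bce_loss(p, y)


seg_prob = np.array([0.9, 0.8, 0.1, 0.2])  # toy voxel-wise lesion probabilities
seg_true = np.array([1, 1, 0, 0])
idh_prob = np.array([0.7, 0.2])            # toy predicted mutation probabilities
idh_true = np.array([1, 0])
print(total_loss(seg_prob, seg_true, idh_prob, idh_true, lam=0.5))
```

Setting `lam` to 0 recovers a pure segmentation objective, while dropping the Dice term entirely gives the single-task BCE objective used when only IDH labels are available.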
We set different learning rates for different parts of our network: 0.0001 for the backbone and the segmentation output block, and 0.00005 for the IDH status prediction block. Moreover, we adopt cosine-annealing learning rate scheduling during training. The network weights are optimized with the Adam method using a minibatch size of two.
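The per-block learning rates under cosine annealing can be tabulated with the closed-form schedule lr(t) = lr_min + 0.5 (lr_0 - lr_min)(1 + cos(pi t / T)). The total number of epochs T = 100 and lr_min = 0 are hypothetical values. In PyTorch, the same setup would typically be expressed by passing two parameter groups, each with its own `lr`, to `torch.optim.Adam` and wrapping the optimizer in `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math

# Base learning rates from the text; the group names are hypothetical labels.
BASE_LRS = {"backbone+segmentation": 1e-4, "idh_head": 5e-5}


def cosine_annealing(base_lr, epoch, total_epochs, eta_min=0.0):
    """Learning rate decaying from base_lr at epoch 0 to eta_min at total_epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / total_epochs))


def schedule(total_epochs):
    """Per-epoch learning rate for every parameter group."""
    return [
        {group: cosine_annealing(lr, epoch, total_epochs) for group, lr in BASE_LRS.items()}
        for epoch in range(total_epochs + 1)
    ]


lrs = schedule(total_epochs=100)  # epoch count is hypothetical
print(lrs[0], lrs[50], lrs[100])
```

Because both groups share the same cosine factor, the IDH block keeps half the backbone's learning rate throughout training, so the relative step sizes chosen for the two parts of the network are preserved as both decay.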
3. Experiments and Results
In this section, we present a series of experiments to demonstrate the performance of the proposed multitask model: we test SGPNet on the BRATS and TCGA datasets and compare it with three existing models. Furthermore, we discuss the impact of lesion information on the IDH status prediction task. In total, 121 glioma cases are included, of which 56 are mutant and 65 are wild-type. The reproducibility of the results is verified with fivefold cross-validation, and the reported results are averages over the five folds.
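One way to build the five folds for the 121 cases is sketched below. The text does not say whether the folds were stratified by IDH status; stratification, the round-robin assignment, and the fixed seed are assumptions made here to keep the mutant/wild-type ratio similar across folds.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split case indices into k test folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal cases of this class round-robin
    return folds


labels = [1] * 56 + [0] * 65  # 56 IDH-mutant and 65 IDH-wild-type cases
folds = stratified_kfold(labels)
for f, test_idx in enumerate(folds):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    n_mut = sum(labels[i] for i in test_idx)
    print(f"fold {f}: train={len(train_idx)} test={len(test_idx)} (mutant={n_mut})")
```

Each fold serves once as the test set while the other four form the training set, and a metric reported for the whole experiment is then the mean of the five per-fold values.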
3.1. Multitask Model for Segmentation and IDH Genotype Prediction
To evaluate the performance of our proposed model, we compare SGPNet with three different models. The ACC, SN, SP, and AUC metrics are used to quantitatively evaluate IDH genotype prediction, and the DSC metric is used to evaluate segmentation. Table 1 reports the ACC, SN, SP, AUC, and DSC of all models on the IDH genotype prediction and segmentation tasks. Figure 3 shows qualitative segmentation results of lesion areas produced by SGPNet, which demonstrate that SGPNet can determine lesion boundaries accurately.
Unlike single-task segmentation and classification models, SGPNet not only segments glioma lesions but also predicts the IDH genotype from brain MR images. SGPNet achieves a positive predictive value (PPV) of 0.894 and a negative predictive value (NPV) of 0.908. Moreover, these experimental results show that our proposed model reduces the classification error rate by 26.6% compared with previous models and performs well on glioma lesion segmentation.
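For reference, PPV and NPV follow from the same confusion-matrix counts (TP, TN, FP, FN) used for the other classification metrics; these are the standard definitions:

```latex
\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{NPV} = \frac{TN}{TN + FN}
```

PPV is the fraction of cases predicted as positive that truly are positive, and NPV is the corresponding fraction for predicted negatives, so unlike SN and SP both depend on the class balance of the evaluated cohort.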
3.2. The Comparisons with Different Groups of Learning Targets
Brain MR images contain details of both normal tissues and lesion areas, and both may influence genotype prediction. The multitask model structure allows us to set different groups of learning targets to investigate whether the information in lesion areas or in whole-brain MR images has a greater influence on genotype prediction, which may increase the reliability of the model as a computer-aided tool for diagnosis and treatment. In this section, we carry out three controlled experiments to analyse the relationship between genotypes and phenotypes by training SGPNet with different groups of learning targets: (1) SGPNet is trained only with the IDH genotype; (2) SGPNet is trained with the ground-truth segmentation labels and the IDH genotype; and (3) SGPNet is trained with a randomly generated tensor as segmentation labels and the IDH genotype. Table 2 shows the performance of IDH genotype prediction in the three controlled experiments, and Figure 4 compares their ROC curves.
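The three learning-target settings can be expressed as a small helper that returns the segmentation target (or none) and the IDH target for a training case. The function name, the Bernoulli(0.5) distribution for the random tensor, and the toy lesion are hypothetical; a training loop would use only the BCE loss when no segmentation target is returned and the multitask loss otherwise.

```python
import numpy as np


def learning_targets(experiment, seg_label, idh_label, rng):
    """Return (segmentation target or None, IDH target) for the three settings.

    1: IDH genotype only; no segmentation target, so only the BCE loss applies.
    2: ground-truth segmentation labels plus IDH genotype (multitask loss).
    3: random tensor as segmentation labels plus IDH genotype (multitask loss).
    Drawing the random labels from Bernoulli(0.5) is an assumption.
    """
    if experiment == 1:
        return None, idh_label
    if experiment == 2:
        return seg_label, idh_label
    if experiment == 3:
        random_seg = (rng.random(seg_label.shape) < 0.5).astype(seg_label.dtype)
        return random_seg, idh_label
    raise ValueError(f"unknown experiment: {experiment}")


rng = np.random.default_rng(0)
seg = np.zeros((8, 8, 8), dtype=np.uint8)
seg[2:5, 2:5, 2:5] = 1  # toy lesion
for exp in (1, 2, 3):
    seg_target, idh_target = learning_targets(exp, seg, 1, rng)
    uses_seg_loss = seg_target is not None
    print(exp, uses_seg_loss)
```

Settings 2 and 3 share the same multitask objective and differ only in whether the segmentation target carries real lesion information, which is what lets the comparison isolate the effect of lesion labels from the effect of the loss function.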
When SGPNet is trained only with IDH genotype labels, the total loss function reduces to a single-task objective function. SGPNet then acts as an IDH genotype classifier, and its performance is worse than that of Liang et al. and Chang et al. [11, 12]. One important reason is that Liang et al. and Chang et al. use cropped lesion areas as model input, while our model receives whole-brain MR images, which makes it harder for the model to extract useful features given the limited information in the IDH genotype labels. When the ground-truth segmentation labels are added as learning targets, the performance of the model improves significantly. However, the first experiment uses a single-task objective function (the BCE loss alone), while the second uses the multitask total loss. To further examine the influence of the objective function, we set up the third experiment, which uses randomly generated segmentation labels as learning targets. In this setting, the segmentation output block learns incorrect lesion features while the IDH status output block can still learn features of the whole MR images; as a result, the performance of the model drops significantly. Comparing these experimental results, we infer that the ground-truth segmentation labels improve IDH genotype prediction and that lesion information is more important for predicting the IDH genotype.
Automatic segmentation of 3D glioma lesions is a challenging task, considering the wide variability in tumor size, shape, and intensity. Furthermore, IDH mutation status can be used as a qualified biomarker for selecting diagnostic and therapeutic approaches for glioma patients. Previous studies have focused on predicting genotypes from medical images [8, 9, 11, 12]; however, these single-task models have limited practicality and scalability, and a multitask model for segmentation and IDH genotype prediction of gliomas is still lacking. Moreover, no previous research has compared the influence of features from whole MR images and from lesion areas on IDH genotype prediction.
SGPNet is an end-to-end framework designed to address the challenges of segmentation and IDH genotype prediction of gliomas. In Section 3.1, the experimental results indicate a significant improvement in IDH genotype prediction: the prediction error rate is reduced by 26.6% compared with the models of Liang et al. and Chang et al. [11, 12]. Owing to the multitask model architecture, in Section 3.2 we further examine whether the information in glioma lesions or in whole MR images is more likely to affect IDH genotype prediction by setting different groups of learning targets. The experimental results indicate that providing the ground-truth segmentation labels as learning targets improves IDH genotype prediction compared with the other settings. Overall, we infer that the information in lesion areas is more important for IDH genotype prediction, which increases the reliability of the model as a computer-aided tool for diagnosis and treatment.
In clinical practice, gliomas are usually diagnosed by experts based on various MR images and gene mutation statuses. Different MR modalities reflect different characteristics of the lesions; for example, T1 provides anatomical information, and T2 is sensitive to the edema area and reflects the morphological information of tumors. SGPNet integrates multimodality MR images to predict the boundaries of lesion areas and the IDH genotype of the patient, which can reduce doctors' workload and help them choose the proper treatment. SGPNet is feasible for segmentation and genotype prediction because the backbone of our framework is designed to learn the intrinsic information of patients' lesions. In addition, the SGPNet framework can be used to segment lesions in other tissues or to predict other genotypes when it is trained on the corresponding datasets. SGPNet can also be applied to multicenter and larger-scale multisequence MR image datasets, because its backbone is generic for MR images collected from different institutions, equipment, and modalities. Moreover, increasing the scale of the training datasets can improve the generalization ability of SGPNet. Generating probability density distributions for different tissue types is another effective approach to reducing environmental noise and improving generalization ability. Therefore, an automatic multitask model for gliomas has considerable clinical value. In the future, we will further develop our framework and apply SGPNet to more types of diseases and genes.
In this paper, we present a novel multitask 3D framework, SGPNet, for automatic segmentation of glioma lesions and prediction of IDH mutation status from MR images. Our framework employs a backbone for learning the intrinsic MR image information and two output blocks for segmentation and IDH genotype prediction of gliomas. The experimental results indicate that our architecture achieves better IDH genotype prediction performance on the public TCGA and BRATS 2020 datasets than previous studies and achieves good results on the segmentation task. Furthermore, we compare the influence of image features from whole MR images and from lesion areas on genotype prediction, and the experimental results indicate that the information in patients' lesions is more significant for IDH genotype prediction. In summary, accurate segmentation of glioma lesion regions and prediction of IDH mutation status will improve therapeutic criteria and assist doctors in diagnosis and treatment.
The MRI data used to support the findings of this study have been deposited in the BRATS repository (http://braintumorsegmentation.org/), and the gene profiles data used to support the findings of this study have been deposited in the TCGA-GBM and TCGA-LGG repositories (https://portal.gdc.cancer.gov/projects/TCGA-GBM and https://portal.gdc.cancer.gov/projects/TCGA-LGG).
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China (nos. 62072212 and 81600923), the Development Project of Jilin Province of China (nos. 20200401083GX and 2020C003), and the Foundation of Health and Family Planning Commission of Jilin Province (no. 2020J052). This work was also supported by the Jilin Provincial Key Laboratory of Big Data Intelligent Computing (no. 20180622002JC).
C. Hartmann, B. Hentschel, W. Wick et al., “Patients with IDH1 wild type anaplastic astrocytomas exhibit worse prognosis than IDH1-mutated glioblastomas, and IDH1 mutation status accounts for the unfavorable prognostic effect of higher age: implications for classification of gliomas,” Acta Neuropathologica, vol. 120, no. 6, pp. 707–718, 2010.
H. Arita, M. Kinoshita, A. Kawaguchi et al., “Lesion location implemented magnetic resonance imaging radiomics for predicting IDH and TERT promoter mutations in grade II/III gliomas,” Scientific Reports, vol. 8, no. 1, Article ID 11773, 2018.
S. Liang, R. Zhang, D. Liang et al., “Multimodal 3D DenseNet for IDH genotype prediction in gliomas,” Genes, vol. 9, no. 8, p. 382, 2018.
M. Soltaninejad, L. Zhang, T. Lambrou et al., “MRI brain tumor segmentation and patient survival prediction using random forests and fully convolutional networks,” in Proceedings of the International MICCAI Brainlesion Workshop, pp. 204–215, Quebec City, Canada, September 2017.
K. Tomczak, P. Czerwińska, and M. Wiznerowicz, “The cancer genome atlas (TCGA): an immeasurable source of knowledge,” Contemporary Oncology, vol. 19, no. 1A, pp. A68–A77, 2015.
Y. Fan, L. Xi, D. Hughes et al., “MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data,” Genome Biology, vol. 17, no. 1, p. 178, 2016.
S. Bakas, H. Akbari, A. Sotiras et al., “Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features,” Scientific Data, vol. 4, no. 1, Article ID 170117, 2017.
O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in Proceedings of the Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Munich, Germany, October 2015.
Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-net: learning dense volumetric segmentation from sparse annotation,” in Proceedings of the Medical Image Computing and Computer-Assisted Intervention, pp. 424–432, Athens, Greece, October 2016.
T. Zhou, S. Ruan, and S. Canu, “A review: deep learning for medical image segmentation using multi-modality fusion,” Array, vol. 3-4, Article ID 100004, 2019.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
F. Raschke, T. R. Barrick, T. L. Jones et al., “Tissue-type mapping of gliomas,” NeuroImage: Clinical, vol. 21, Article ID 101648, 2019.