Advances in MRI Techniques and ApplicationsView this Special Issue
Research Article | Open Access
Tumour Relapse Prediction Using Multiparametric MR Data Recorded during Follow-Up of GBM Patients
Purpose. We have focused on finding a classifier that best discriminates between tumour progression and regression based on multiparametric MR data retrieved from follow-up GBM patients. Materials and Methods. Multiparametric MR data consisting of conventional and advanced MRI (perfusion, diffusion, and spectroscopy) were acquired from 29 GBM patients treated with adjuvant therapy after surgery over a period of several months. A 27-feature vector was built for each time point, although not all features could be obtained at all time points due to missing data or quality issues. We tested classifiers using LOPO method on complete and imputed data. We measure the performance by computing BER for each time point and wBER for all time points. Results. If we train random forests, LogitBoost, or RobustBoost on data with complete features, we can differentiate between tumour progression and regression with 100% accuracy, one time point (i.e., about 1 month) earlier than the date when doctors had put a label (progressive or responsive) according to established radiological criteria. We obtain the same result when training the same classifiers solely on complete perfusion data. Conclusions. Our findings suggest that ensemble classifiers (i.e., random forests and boost classifiers) show promising results in predicting tumour progression earlier than established radiological criteria and should be further investigated.
GBM is the most common and malignant intracranial tumor , representing as much as 30% of primary brain tumors with increasing incidence in some geographic regions . The patients have a median survival of only 10 to 14 months after diagnosis with only 3 to 5% of patients surviving more than three years. Recurrence is universal, and, at the time of relapse, the median survival is only five to seven months despite therapy .
The current standard of care is surgical resection followed by radiotherapy and concomitant and adjuvant temozolomide (TMZ) chemotherapy .
Magnetic resonance imaging (MRI) is the most widely used medical imaging technique for identifying the location and size of brain tumours. However, conventional MRI has a limited specificity in determining the underlying type of brain tumour and tumour grade [5, 6]. More advanced MR techniques like diffusion-weighted MRI, perfusion-weighted MRI, and chemical shift imaging (CSI) are promising in the characterization of brain tumours as they give potentially more physiological information [7–9].
Diffusion-weighted imaging (DWI) and diffusion kurtosis imaging (DKI) visualize the tissue structure and are useful for assessing tumour cellularity, because they give information about the movement of the water inside different tissues including biological barriers. Typical parameters related to diffusion are apparent diffusion coefficient (ADC), mean diffusivity (MD), mean kurtosis (MK), and fractional anisotropy (FA). MD is a general parameter that accounts for the mean diffusivity in all directions, MK might be a specific parameter for tissue structure , and FA is a general index of anisotropy, with a value of zero corresponding to isotropic diffusion and a value of one corresponding to diffusion only in one direction.
Perfusion-weighted MRI (PWI) provides measurements that reflect changes in blood flow, volume, and angiogenesis. Hypervascularity due to glioma-induced neoangiogenesis may show up as high relative cerebral blood volume (rCBV) while necrosis of different tissues may show up as low rCBV .
MR spectroscopy provides information about metabolites present in normal and abnormal tissues . This information can be represented as metabolite maps using CSI.
We have studied patients with GBM that had the tumour surgically removed and afterwards were treated according to two different protocols developed for evaluating dendritic cell immunotherapy: HGG-IMMUNO-2003 [13–16] and HGG-IMMUNO-2010 .
The focus of our paper is finding a map between the multiparametric MR data acquired during the follow-up of the patients and the relapse of the brain tumour after surgery, as described by the clinically accepted RANO criteria . In order to do this, we test different families of classifiers on multiparametric MR data, starting from simple ones, for example, -nearest neighbours (-NN) and linear discriminant analysis (LDA), and moving to nonlinear classifiers, for example, random forests and neural networks, using a total of 27 features extracted from PWI, DKI, and CSI data.
2. Materials and Methods
2.1. Study Setup
Patients that were treated according to the HGG-IMMUNO-2003 protocol are patients with relapsed GBM that received immune therapy as the sole treatment strategy.
Patients that were treated according to the HGG-IMMUNO-2010 protocol are patients with primary GBM that had surgery. For the follow-up treatment after surgery the patients were split into two groups. The first group consisting of 6 patients who received radiochemotherapy and the immune therapy vaccine. The second group consisting of the remaining 7 patients who received just radiochemotherapy for the first six months after surgery, and after those six months all 7 patients received radiochemotherapy plus the immune therapy vaccine. We refer to the first group as “HGG-IMMUNO-2010 vaccine” and to the second group as “HGG-IMMUNO-2010 placebo.”
All 29 patients were offered monthly MRI follow-up, but after six months under immune therapy all patients switched to a three-monthly schedule.
The local ethics committee approved this study and informed consent was obtained from every patient before the first imaging time point.
Based on radiological evaluation of the follow-up MRI scans using the current guidelines for response assessment of high grade glioma , each patient was assigned to one of two clinical groups:(i)patients with progressive disease during follow-up which exhibit an increase of ≥25% in the sum of the products of perpendicular diameter of enhancing lesions compared to the smallest tumour measurement obtained either at baseline or best response,(ii)patients with complete response with disappearance of all measurable and nonmeasurable disease sustained for at least 4 weeks.
Based on this assessment, each MRI time point for each patient was considered to be labeled or unlabeled as follows: labeled as “responsive” for all time points at and after the moment when the patient was considered as “complete response”; labeled as “progressive” for all time points at and after the moment when the patient was considered as “progressive disease”; or “unlabeled” for all time points preceding the decision moment.
2.2. MRI Acquisition and Processing
Magnetic resonance imaging was performed on a clinical 3 Tesla MR imaging system (Philips Achieva, Best, Netherlands), using a body coil for transmission and a 32-channel head coil for signal reception. The imaging protocol consisted of diffusion kurtosis imaging, dynamic susceptibility weighted contrast-MRI (DSC-MRI), and MR spectroscopy, combined with standard anatomical imaging (T1-weighted MRI after contrast administration, T2-weighted MRI, and FLAIR (fluid attenuated inversion recovery) MR images).
2.2.1. Anatomical Magnetic Resonance Imaging
MR images were acquired as previously described [9, 18, 19]. In brief, an axial spin echo T2-weighted MR image (TR/TE: 3000/80 msec, slice/gap: 4/1 mm, field of view (FOV): 230 × 184 mm2, turbo factor (TF): 10, and acquisition matrix: 400 × 300), an axial fluid-attenuated inversion recovery (FLAIR) image (TR/TE/IR: 11000/120/2800 msec, slice/gap: 4/1 mm, and acquisition matrix: 240 × 134), and a T1-weighted 3D spoiled gradient echo scan (fast field echo-FFE, TR/TE: 9.7/4.6 msec, flip angle: 8°, turbo field echo factor: 180, acquisition voxel size: 0.98 × 0.98 × 1 mm3, 118 contiguous partitions, and inversion time: 900 msec) after contrast administration were acquired as high-resolution anatomical reference images.
Regions of interest (ROI) were manually drawn around the solid contrast-enhancing region if present, avoiding areas of necrosis (N) or cystic components such as the surgical cavity. A second ROI was manually drawn around the entire lesion (TO), that is, contrast enhancement (CE) and perilesional oedema (ED). The ROI containing the perilesional oedema was obtained by extracting the contrast-enhancing portion from the total lesion. Finally, a separate ROI was drawn around the contralateral normal appearing white matter (NAWM) to standardize the hemodynamic measurements of DSC-MRI.
The manual delineations were drawn by a radiologist (SVC) with 5 years experience of MR imaging of brain tumours. An example of delineations on T1 post contrast image can be seen in Figure 1, where green is the necrosis, red is CE, and blue is ED.
2.2.2. Magnetic Resonance Spectroscopy
A 2D-CSI short echo time protocol was used as validated in . The volume of interest (VOI) is positioned on the slice of the transverse reconstruction of the T1-weighted 3D-FFE sequence with the largest section of contrast enhancement. The slice thickness of the VOI is 10 mm and the VOI is 80 × 80 × 10 mm3, with each voxel being 5 × 5 × 10 mm3 (16 × 16 voxels in total). If the contrast-enhancing lesion was smaller than 2 cm3 or the contrast enhancement is located in areas with large susceptibility differences, for example, the basal forebrain or the anterior temporal lobes, a single voxel (SV) technique was performed (TR/TE: 2000/35 msec, minimal volume: 1 cm3).
Nine metabolites were quantified using the AQSES-MRSI quantification method : N-acetyl aspartate (NAA), glutamine (Gln), glutamate (Glu), total creatine (Cre), phosphorylcholine (PCh), glycerophosphorylcholine (GPC), myo-inositol (Myo), and lipids (Lips) at 0.9 and 1.3 ppm, referred to as Lip1 and Lip2, respectively. Glu + Gln and PCh + GPC were reported as Glx and tCho (total choline), respectively. For each metabolite, AQSES-MRSI reported metabolite concentrations in institutional units and their error estimates as Cramer-Rao lower bounds (CRLBs) . After quantification, good quality voxels were selected based on the CRLBs and spectral quality assessment as recommended by Kreis (FWHM of metabolites <0.07–0.1 ppm, no unexplained features in the residuals, no doubled peaks or evidence for movement artifacts, symmetric line shape, no outer volume ghosts or other artifacts present) . CRLB lower than 20% for tCho, NAA, Glx, Cre, and Lips and CRLB lower than 50% for Myo were considered sufficient. From these representative voxels, the mean metabolite ratios as proposed by Kounelakis et al. were calculated  over the CE region: NAA/tCho, NAA/sum, tCho/sum, NAA/Cre, Lips/tCho, tCho/Cre, Myo/sum, Cre/sum, Lips/Cre and Glx/sum (10 parameters). The sum represents the sum of the concentrations of all quantified metabolites.
Sixty-six percent (66%) of all spectroscopic time points are not included in this study. There are two reasons for this: quantification was not possible for all time points (MR spectroscopy data was not acquired for all patients due to patient movement) and the rest of them did not pass the quality control recommended by Kreis .
2.2.3. Dynamic Susceptibility Weighted Imaging (DSC-MRI)
Perfusion images were obtained using a standard DSC perfusion MR imaging protocol consisting of a gradient echo-EPI sequence, TR/TE: 1350/30 msec, section thickness/gap: 3/0 mm, dynamic scans: 60, FOV: 200 × 200 mm2, matrix: 112 × 109, number of slices: 23, and scan time: 1 minute 26 seconds. EPI data were acquired during the first pass following a rapid injection of a 0.1 mmol/kg body weight bolus of megluminegadoterat (Dotarem, Guerbet, Villepinte, France) via a mechanical pump at a rate of 4 mL/sec, followed by a 20 mL bolus of saline. Preload dosing was performed according to Hu et al. in order to correct for T1-weighted leakage (preload dose 0.1 mmol/kg megluminegadoterat, incubation time 10 min) .
The mean values of the considered perfusion parameters were retrieved in the CE, ED, and NAWM regions. We report relative rCBV (rrCBV), relative rCBF (rrCBF), and relative DR (rDR) of tumoural tissue by using the corresponding parameter value in the contralateral NAWM as internal reference.
Although quantification was possible for all time points, after quality assessment done by visual inspection by SVC, 30% of them were not included in this study.
2.2.4. Diffusion Kurtosis Imaging (DKI)
DKI data were acquired according to the previously described protocol in [18, 19] (SE-EPI-DWI sequence with TR/TE: 3200/90 msec, : 20/48.3 msec; FOV: 240 × 240 mm2, matrix: 96 × 96, number of slices: 44, 1 signal average acquired, section thickness/gap: 2.5/0 mm, and -values: 700, 1000, and 2800 sec/mm2 in 25, 40, and 75 uniformly distributed directions, resp.) . The DKI data were processed as described in . Fractional anisotropy (FA), mean diffusivity (MD), and mean kurtosis (MK) were derived from the tensors [10, 28]. A nonlinear registration of the parameter maps to the anatomical MR imaging data was performed to minimize the local misalignment between the EPI distorted DKI data and the anatomical data on which the ROIs were manually positioned. MK, MD, and FA were determined in the CE and ED regions.
Although quantification was possible for all time points, after quality control according to , 44% of them were not included in this study.
2.2.5. Summary of MRI Acquisition and Processing
In total, from 29 patients, we have 178 data points of follow-up MR imaging sessions, and each of these ones has 27 features:(i)3 volumes, contrast enhancement (CE), oedema (ED), and necrosis (N)(ii)6 perfusion features, rrCBV, rrCBF, and rDR for CE and ED(iii)6 diffusion features, MK, MD, and FA for CE and ED(iv)10 spectroscopic features, from CE—NAA/tCho, NAA/sum, tCho/sum, NAA/Cre, Lips/tCho, tCho/Cre, Myo/sum, Cre/sum, Lips/Cre, and Glx/sum(v)a parameter (0 or 1) for total resection of the tumour(vi)a parameter (0, 1, or 2) to describe the group of the patient, HGG-IMMUNO-2003, HGG-IMMUNO-2010 placebo, or HGG-IMMUNO-2010 vaccine.
Out of all 178 measurements, if we extract just the ones with complete features, it will result in a subset of 18 patients with 45 measurements. This implies that more than 75% of the measurements have at least one feature missing. Five features are always present: the three volumes, the parameter for tumour resection, and the parameter for different groups.
We have used several supervised and semisupervised classifiers, as presented in Table 1, with the goal of testing whether the unlabeled data could have been reliably labeled before the actual labeling was performed in the clinic according to the RANO criteria.
The list of classifiers in Table 1 is representative for the most important families of classification methods, starting from simple classical methods such as linear discriminant analysis (LDA) and -nearest neighbour (-NN) up to more complex nonlinear classifiers such as random forests and neural networks.
Each classifier is based on a mathematical model, which needs to be optimised on the basis of a training dataset. The training set consists here of labeled data, that is, data at and after a clinical decision has been made. The test set on which we compare classifiers consists of data that have no label, that is, time points before the decision of “progressive” or “responsive” has been made.
All classifiers are implemented in MATLAB R2013a (MathWorks, MA, USA). All classifiers except least squares support vector machines (LSSVMs) and the semisupervised ones are part of the Statistics Toolbox and Neural Networks Toolbox of MATLAB R2013a.
-NN  is one of the basic classifers in machine learning. The class label of a new testing point is given by the most common class among its neighbours. We used the default MATLAB R2013a (Statistics Toolbox) function “nnclassify” to run a grid search for the best combination of number of neighbours () and type of distance. We varied between 1 and 11 and the distance was either “euclidean,” “cityblock,” “cosine,” or “correlation.” We found the best results for the combination of 3 neighbours and the “correlation” distance.
Diagonal LDA (dLDA ) is a simple modification of linear discriminant analysis, which implies that we use the pseudoinverse of the covariance matrix instead of the actual inverse. We used the default MATLAB R2013a implementation “classify” from the Statistics Toolbox.
SVMs [31, 32] are among the most popular machine learning models because they are easy to understand: given a training set with points that belong to two classes, we try to find the best hyperplane to differentiate between the two types of points. We can try this in the original space or we can map the points to another space by using the kernel trick. We used the default MATLAB R2013a (Statistics Toolbox) implementations “svmtrain” and “svmclassify.” We used different types of kernel: linear, polynomial, radial basis function, and multilevel perceptron.
Classification tree  is an algorithm commonly used in machine learning. Like in a real tree there are leaves which represent class labels and branches. At each node of a tree a single feature is used to discriminate between different branches. We used the default MATLAB R2013a (Statistics Toolbox) implementation “classregtree.”
Neural networks [34–37] are built on interconnected layers of artificial “neurons” that try to map an input vector to its specific output. There are three types of layers: input, hidden, and output. The weights between different neurons are trained until a maximum number of iterations or a minimum error is reached. We used the default MATLAB R2013a (Neural Network Toolbox) implementation “net” with 10 hidden neurons. We tested four types of neural networks: pattern net, feed forward net, cascade forward net, and fit net.
Random forests [38, 39] are part of the ensemble methods for classification that use a collection of decision trees. Each decision tree learns a rule and then it can classify a new point. The new point is assigned to the class voted by the majority of the decision trees. We used the default MATLAB R2013a (Statistics Toolbox) implementation “TreeBagger” with 100 trees.
Boosting algorithms [40–43] start with a collection of weak classifiers (e.g., decision trees) and with each iteration they try to improve the overall classification by learning what was misclassified at the previous step. We used the default MATLAB R2013a (Statistics Toolbox) implementation “fitensemble” with 100 trees. We tested seven types of boosting algorithms: AdaBoost, LogitBoost, GentleBoost, RobustBoost, LPBoost, TotalBoost, and RUSBoost.
LSSVMs [44, 45] are a powerful machine learning technique. We downloaded LSSVMlab from  and followed the instructions from  to tune the parameters. We used different types of kernel: linear, polynomial, radial basis function, and also the Bayesian approach on LSSVM.
The semisupervised classifiers used in this paper are low density separation (LDS ), squared-loss mutual information regularization (SMIR ), and safe semisupervised support vector machine (S4VM [50, 51]). In the last years there has been a steady increase in the use and development of semisupervised classifiers, as they take into account information from unlabeled data also, not just from labeled data. This makes them powerful machine learning tools. The implementation for semisupervised classifiers was downloaded from [52–54].
Classifiers were tested first with all features described in Section 2.2.5 taken as input, but then also by selecting subsets of the available features as input, that is, only the features pertaining to a single modality (perfusion, diffusion, and spectroscopy). Additionally, classifiers were tested first on the smaller dataset containing 45 time points with a complete set of features and then on the larger dataset containing 178 time points where missing values have been imputed according to Section 2.4, presented below.
2.4. In-House Imputation Method
Some classifiers have built-in strategies of handling missing values, but other classifiers do not handle missing values (see Table 1). This is why we developed our own in-house imputation method, so the handling of missing values will be the same for all classifiers.
Our method is based on the volumes of contrast enhancement and oedema regions, in the sense that if the volume of a tumour region is zero, that missing tissue is considered healthy tissue. If we have values of any modality (perfusion, diffusion, and spectroscopy) that are missing from CE or ED, and the volume of CE or ED corresponding to that measurement is zero, and then we assume that those missing values belong to a normal type of tissue. For perfusion, because we normalize every parameter to the normal appearing white matter value, the missing values will be replaced by 1’s. For diffusion and spectroscopy, the missing values will be replaced by the average of the features taken over the measurements which were labeled as responsive, because we consider that these measurements are recorded from a healthy tissue. If we have missing values without association to zero volume for CE or ED, they will be replaced by the average taken over all the labeled measurements.
2.5. Performance Indices
Leave One Patient Out (LOPO). Classifiers are trained on labeled data from all patients except one who is the test patient. Each patient in turn is selected as test patient. All time points that belong to the test patient are classified independently. Results for each classifier are averaged per time point over all patients relative to the time point at which the clinical decision was made.
This way of testing is intuitive from a medical point of view and provides us with information about how good is the classification when we approach the decision time. In this way we can look at the temporal evolution of the classification for each patient.
We compute the balanced error rate (BER) at each time point before and after the decision, using the clinical decision assigned to each patient as expected label for all time points of this patient. BER is computed as where
For each classifier we have a grand total of 17 time points, due to the fact that there are patients with up to 6 time points after the decision time point and there are others with up to 11 time points before the decision. In order to compare the classifiers by using just one error number instead of 17, we compute a weighted average for each classifier’s time response. This performance measurement is denoted by “weighted BER (wBER)” in the Results section.
We use two sets of weights:(i)one for the temporal response—the classifier should perform better when we approach the labeling time point and after it: (ii)one for patient population—the time points with more patients get a higher weight (see Table 14 from the Appendix): The equation of wBER is
3. Results and Discussion
3.1.1. LOPO When Using All Modalities
Table 7 from the Appendix shows how different classifiers perform on complete and on imputed features when using all MR modalities.
3.1.2. LOPO When Using Each Modality
Tables 8, 9, and 10 from the Appendix list the performance of the best supervised classifiers (marked by bold font in Table 7) when using, respectively, perfusion, diffusion, or spectroscopy data separately, considering complete features only.
Tables 11, 12, and 13 from the Appendix list the performance of the best supervised classifiers (marked by bold font in Table 7) when using, respectively, perfusion, diffusion, or spectroscopy data separately, considering imputed features only.
3.1.3. In-House Imputation Strategy versus Built-In Imputation Strategy
Table 6 shows how different classifiers perform with our in-house imputation of missing values (Section 2.4) versus the built-in imputation strategy of missing values for the classifiers marked in Table 1.