BioMed Research International


Research Article | Open Access

Volume 2018 | Article ID 9128527

Qiaoliang Li, Yuzhen Xu, Zhewei Chen, Dexiang Liu, Shi-Ting Feng, Martin Law, Yufeng Ye, Bingsheng Huang, "Tumor Segmentation in Contrast-Enhanced Magnetic Resonance Imaging for Nasopharyngeal Carcinoma: Deep Learning with Convolutional Neural Network", BioMed Research International, vol. 2018, Article ID 9128527, 7 pages, 2018.

Tumor Segmentation in Contrast-Enhanced Magnetic Resonance Imaging for Nasopharyngeal Carcinoma: Deep Learning with Convolutional Neural Network

Academic Editor: Enzo Terreno
Received 14 May 2018
Revised 12 Sep 2018
Accepted 02 Oct 2018
Published 17 Oct 2018


Objectives. To evaluate a deep learning architecture based on the convolutional neural network (CNN) technique for automatic tumor segmentation of magnetic resonance imaging (MRI) in nasopharyngeal carcinoma (NPC). Materials and Methods. In this prospective study, 87 MR image slices containing tumor regions were acquired from newly diagnosed NPC patients and augmented to more than 60,000 images. The proposed CNN consists of two phases: feature representation and score map reconstruction. We designed a stepwise scheme to train the network. Performance was evaluated with case-by-case leave-one-out cross-validation (LOOCV). The ground truth tumor contours were obtained by consensus of two experienced radiologists. Results. The mean Dice similarity coefficient, percent match, and corresponding ratio achieved by our method were 0.89±0.05, 0.90±0.04, and 0.84±0.06, respectively, all of which were better than the values reported in similar studies. Conclusions. We established a deep learning-based segmentation method for NPC in contrast-enhanced magnetic resonance imaging. Further clinical trials with dedicated algorithms are warranted.

1. Introduction

Head and neck cancer (HNC), especially nasopharyngeal carcinoma (NPC), is an aggressive cancer type with a high incidence rate in Southern China [1]. Cancer incidence data collected in Guangxi and Guangdong show that nasopharyngeal cancer is the fourth most common cancer among males [2]. External beam radiation therapy is the primary treatment for this cancer. After therapy, the 3-year local control rate for NPC is higher than 80%, and the 3-year overall survival rate reaches up to 90% [3]. Noninvasive medical imaging is essential for determining the tumor volume for successful radiation treatment planning [3, 4].

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), a functional noninvasive imaging modality, plays a key role in cancer research by providing information about the physiological characteristics of tissues. Studies have concluded that DCE-MRI is useful for differentiating tumors from normal tissues in NPC [4]. Accurate segmentation of NPC tumors from DCE-MRI is important for radiotherapy treatment planning and prognosis evaluation. However, the accuracy of tumor segmentation in DCE-MRI is affected by imaging factors such as low spatial resolution, poor signal-to-noise ratio, the partial volume effect, and intensity changes during perfusion [5].

Many studies have attempted to segment NPC tumors automatically from medical images. Zhou et al. [6] segmented NPC tumors in MR images using Semi-Fuzzy C-means, with percent match (PM) values close to 0.87. Zhou et al. [7] segmented NPC tumors in MRI using a two-class support vector machine (SVM), with PM values close to 0.79. Huang et al. [8] performed semisupervised NPC lesion extraction in MR images using a spectral clustering-based method, with a positive predictive value of up to 0.71. Huang et al. [9] segmented NPC tumors using Bayesian classifiers and an SVM, with an average specificity of 0.93.

The methods above are all conventional machine learning techniques that require subjective feature extraction and selection. Deep learning (DL) techniques such as the convolutional neural network (CNN) have recently emerged as powerful tools for addressing these challenges. A CNN autonomously detects low-level features, such as shape and texture, from small patches of the input images and then combines them into high-level features for image processing tasks such as classification, segmentation, and detection, without subjective feature extraction and selection [10, 11]. Deep learning techniques also tend to generalize better to new datasets [12].

DL with CNNs has recently attracted research interest for tumor segmentation [13, 14]. Wang et al. [15] segmented NPC tumors in MR images using deep convolutional neural networks; however, their average Jaccard similarity coefficient (JSC) was below 0.8. In the current study, we report an automatic and accurate segmentation method based on a CNN architecture applied to dynamic contrast-enhanced MRI.

2. Materials and Methods

2.1. CE-MRI and Preprocessing

Twenty-nine patients with newly diagnosed NPC, seen between August 2010 and April 2013 at the First Affiliated Hospital, Sun Yat-Sen University, were included. This study was approved by the local institutional review board of Sun Yat-Sen University, and written informed consent was obtained from each patient before the MRI scan. Partial volume effects (PVE) can severely affect the images whenever the tumor size is less than 3 times the full width at half maximum (FWHM) of the reconstructed image resolution [16]. Therefore, following the advice of the radiologists, patients with lymph nodes or lesions smaller than 1 cm were excluded to avoid possible PVE. DCE-MRI covered the primary tumor region, including the retropharyngeal nodes with regional nodal metastasis, and was performed on a 3.0-T MRI system (Magnetom Trio, Siemens) with a field of view of 22 cm × 22 cm × 6 cm (AP × RL × FH), a flip angle of 15°, and a scanning time of 6 minutes 47 seconds, yielding 65 dynamic images. The contrast agent gadolinium-diethylenetriamine pentaacetic acid (Gd-DTPA) (Omniscan; Nycomed, Oslo, Norway) was injected intravenously as a bolus at around the 8th dynamic acquisition using a power injector system (Spectris Solaris, MedRad, USA), immediately followed by a 25 mL saline flush at a rate of 3.5 mL/s. The dose of Gd-DTPA was 0.1 mmol/kg body weight. The matrix of the reconstructed dynamic images was 144×144×20×65, the last dimension corresponding to the 65 dynamic time points.
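The acquisition parameters imply a few derived quantities that put the 1 cm exclusion criterion in context. The Python sketch below is illustrative only and rests on assumptions that the text does not state: the 22 cm × 22 cm in-plane field of view maps onto the 144 × 144 matrix, the 6 cm FH extent is covered by 20 contiguous slices, and the FWHM of the reconstructed resolution is approximated by the coarsest voxel dimension.

```python
# Derived acquisition geometry. Assumptions (not stated in the text): the FOV maps
# onto the matrix without zero-filling, slices are contiguous, and the FWHM is
# approximated by the coarsest voxel dimension.
FOV_MM = (220.0, 220.0, 60.0)   # field of view, AP x RL x FH
MATRIX = (144, 144, 20)         # spatial part of the reconstructed matrix
N_DYNAMICS = 65                 # dynamic images per examination
SCAN_TIME_S = 6 * 60 + 47       # total scanning time: 6 min 47 s

def voxel_size_mm(fov, matrix):
    """Voxel edge lengths in mm (field of view divided by matrix size)."""
    return tuple(f / n for f, n in zip(fov, matrix))

def pve_threshold_mm(fwhm_mm, factor=3.0):
    """Objects smaller than factor x FWHM are prone to partial volume effects [16]."""
    return factor * fwhm_mm

voxel = voxel_size_mm(FOV_MM, MATRIX)
print("voxel (mm):", [round(v, 2) for v in voxel])            # [1.53, 1.53, 3.0]
print("s per dynamic:", round(SCAN_TIME_S / N_DYNAMICS, 2))     # 6.26
print("PVE threshold (mm):", pve_threshold_mm(max(voxel)))      # 9.0
```

Under these assumptions the voxels are about 1.53 × 1.53 × 3.00 mm, each dynamic frame takes about 6.26 s, and 3 × FWHM is roughly 9 mm, of the same order as the 1 cm threshold used in the study.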

The ground truth was manually contoured in ImageJ (National Institutes of Health, Bethesda, MD) by consensus of two experienced radiologists (Dr. Yufeng Ye, 13 years' experience, and Dr. Dexiang Liu, 18 years' experience in radiology) who were blinded to this study. Since the tumors were mostly enhanced in this scan of our DCE-MRI, this scan from each patient was used for training and testing our DL model, and only the slices containing the tumor area were selected. In total, 87 CE-MRI slices were acquired from the 29 patients. To meet the large data requirement of training the DL model, we augmented the 87 slices to more than 60,000 images using the following methods [17]: rotating each slice from -10° to 10° in steps of 2°, yielding 11 versions of each slice; changing the image contrast with the built-in MATLAB function imadjust to produce 33 different versions of each slice; and adding Gaussian noise with a power of 1×10⁻⁸ to produce 2 additional versions of each slice. In total, each slice was augmented 11×33×2 = 726 times, giving 87×726 = 63,162 slices. The augmented images were then normalized by Z-score transformation [18], in which the intensity of each voxel was standardized with respect to the mean intensity (and standard deviation) of that image.
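The augmentation arithmetic and the normalization step can be sketched in Python with NumPy. This is an illustration, not the authors' MATLAB pipeline: the rotation itself is left to an image library (for example `scipy.ndimage.rotate`), the helper `stretch_contrast` is a hypothetical stand-in for `imadjust` whose percentile parameters are assumptions (the text does not say how the 33 contrast variants were generated), and the noise "power" of 1×10⁻⁸ is assumed to be the variance of zero-mean Gaussian noise.

```python
import itertools
import numpy as np

ANGLES_DEG = list(range(-10, 11, 2))   # -10 to 10 degrees in 2-degree steps: 11 angles
N_CONTRAST = 33                        # contrast variants per slice (from the text)
N_NOISE = 2                            # noise realizations per slice (from the text)
NOISE_VAR = 1e-8                       # assumption: noise "power" = variance

def augmentation_plan():
    """One (angle, contrast_id, noise_id) triple per augmented copy of a slice."""
    return list(itertools.product(ANGLES_DEG, range(N_CONTRAST), range(N_NOISE)))

def stretch_contrast(img, low_pct, high_pct):
    """Hypothetical imadjust-like linear stretch between two intensity percentiles."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    return np.clip((img - lo) / (hi - lo + 1e-12), 0.0, 1.0)

def add_gaussian_noise(img, rng, var=NOISE_VAR):
    """Add zero-mean Gaussian noise with the given variance."""
    return img + rng.normal(0.0, np.sqrt(var), size=img.shape)

def zscore(img):
    """Z-score normalization: subtract the image mean, divide by the image std."""
    return (img - img.mean()) / img.std()

plan = augmentation_plan()
print(len(plan), 87 * len(plan))   # 726 63162

rng = np.random.default_rng(0)
slice_img = rng.random((144, 144))     # stand-in for one CE-MRI slice
augmented = add_gaussian_noise(stretch_contrast(slice_img, 1, 99), rng)
normalized = zscore(augmented)
```

Enumerating the plan as a Cartesian product makes the 11 × 33 × 2 = 726 multiplier explicit and keeps each augmented copy traceable to its parameters.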

2.2. CNN Network

The CNN comprised two phases: feature representation and score map reconstruction. In the feature representation phase, the network consisted of 2 Pool-Conv-ReLU blocks (P1-P2) and 4 Conv-ReLU blocks (C1-C4) (see Figure 1). A Pool-Conv-ReLU block included one pooling layer (Pool), one convolution layer (Conv), and one rectified linear unit (ReLU) layer, while a Conv-ReLU block consisted of one convolution layer and one ReLU layer. The convolution layers detected local features in the input images, the ReLU layers introduced nonlinearity and accelerated convergence, and the pooling layers reduced the dimensions of the feature maps and the number of network parameters. In this phase, the 144×144 input images were transformed into 36×36 feature maps.
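A single-channel NumPy sketch of the two block types shows how a 144×144 input shrinks to 36×36. It assumes max pooling with a 2×2 window and stride 2 (the text does not name the pooling type) and uses hypothetical 3×3 kernels inside the Pool-Conv-ReLU blocks, since those convolutions are not listed in Table 1; the real network operates on many channels with learned weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, kernel, pad):
    """Single-channel 2-D cross-correlation, stride 1, zero padding `pad`."""
    xp = np.pad(x, pad)
    windows = sliding_window_view(xp, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling (window = stride = size); pooling type is assumed."""
    h, w = x.shape
    x = x[: h - h % size, : w - w % size]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def conv_relu(x, kernel, pad):
    return relu(conv2d(x, kernel, pad))

def pool_conv_relu(x, kernel, pad):
    return conv_relu(max_pool(x), kernel, pad)

rng = np.random.default_rng(0)
img = rng.standard_normal((144, 144))
f = conv_relu(img, rng.standard_normal((5, 5)), pad=2)     # 144x144
f = pool_conv_relu(f, rng.standard_normal((3, 3)), pad=1)  # 72x72 (hypothetical 3x3 kernel)
f = pool_conv_relu(f, rng.standard_normal((3, 3)), pad=1)  # 36x36 (hypothetical 3x3 kernel)
print(f.shape)  # (36, 36)
```

Only the pooling layers change the spatial size here; the padded convolutions preserve it, which is why two pooling stages give the factor-of-4 reduction.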

In the score map reconstruction phase (D1-D2, Ct1-Ct2, C5-C6), the output was reconstructed from the 36×36 feature maps. Two deconvolution layers (D1-D2) were applied to reconstruct an output image with a matrix size of 144×144. Because some image details may be lost when reconstructing from the 36×36 feature maps, the fine features obtained in the feature representation phase were combined with the score map to integrate local and global multilevel contextual information. A concatenation layer was used to connect this information, and a convolution layer was then applied for information fusion and the final reconstruction. The detailed parameters of the CNN network are shown in Table 1.
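The up-sampling and fusion step can be sketched in NumPy as follows. The kernel size of the deconvolution layers is not given in the text, so a 2×2 kernel with stride 2 is assumed; which fine-feature map is concatenated is also an assumption (here a 72×72×80 map, matching the Pool1 output size in Table 1), and the 1×1 fusion weights are random placeholders rather than trained values.

```python
import numpy as np

def transposed_conv_k2s2(x, kernel):
    """Transposed convolution with an assumed 2x2 kernel and stride 2, no padding.

    With kernel size equal to stride, each input pixel scales the kernel into its own
    non-overlapping 2x2 output block, which is exactly a Kronecker product.
    """
    assert kernel.shape == (2, 2)
    return np.kron(x, kernel)

def fuse(score_up, fine_features, weights, bias=0.0):
    """Concatenate along channels, then apply a 1x1 convolution to get one score map."""
    stacked = np.concatenate([score_up[..., None], fine_features], axis=-1)
    return stacked @ weights + bias

rng = np.random.default_rng(0)
score36 = rng.standard_normal((36, 36))       # coarse score map
fine72 = rng.standard_normal((72, 72, 80))    # assumed fine-feature map (Pool1-sized)
up72 = transposed_conv_k2s2(score36, rng.standard_normal((2, 2)))
score72 = fuse(up72, fine72, rng.standard_normal(81))   # placeholder 1x1 weights
print(up72.shape, score72.shape)   # (72, 72) (72, 72)
```

A second, identical up-sample and fuse stage would take the 72×72 score map to 144×144.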

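Table 1. Detailed parameters of the CNN network.

Block | Layer | Kernel size | Stride | Pad | Output size
Input | Data | - | - | - | 144×144×1
Conv-ReLU 1 | Conv1 | 5 | 1 | 2 | 144×144×80
Pool-Conv-ReLU 1 | Pool1 | 2 | 2 | - | 72×72×80
Pool-Conv-ReLU 2 | Pool2 | 2 | 2 | - | 36×36×120
Conv-ReLU 2 | Conv4 | 1 | 1 | 0 | 36×36×300
Conv-ReLU 3 | Conv5 | 3 | 1 | 1 | 36×36×100
Conv-ReLU 4 | Conv6 | 3 | 1 | 1 | 36×36×1
Conv-ReLU 5 | Conv7 | 1 | 1 | 0 | 72×72×1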
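The spatial output sizes in Table 1 follow from the standard size formula for convolution and pooling, out = ⌊(n + 2p − k)/s⌋ + 1, and, for a transposed convolution, out = (n − 1)s − 2p + k. The Python sketch below traces the feature representation rows of the table; treating the "-" pad entries of the pooling rows as 0 and giving the deconvolution layers a 2×2 kernel with stride 2 are assumptions.

```python
# (layer, kernel, stride, pad, output channels) for the feature-representation rows of
# Table 1. Assumption: "-" pad entries of the pooling rows are treated as 0. Channel
# counts are copied from the table, not computed.
FEATURE_ROWS = [
    ("Conv1", 5, 1, 2, 80),
    ("Pool1", 2, 2, 0, 80),
    ("Pool2", 2, 2, 0, 120),
    ("Conv4", 1, 1, 0, 300),
    ("Conv5", 3, 1, 1, 100),
    ("Conv6", 3, 1, 1, 1),
]

def conv_out(n, kernel, stride, pad):
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

def deconv_out(n, kernel, stride, pad):
    """Spatial size after a transposed convolution (deconvolution) layer."""
    return (n - 1) * stride - 2 * pad + kernel

def trace(n, rows):
    """Propagate an n x n input through the rows, recording each output shape."""
    shapes = []
    for name, k, s, p, c in rows:
        n = conv_out(n, k, s, p)
        shapes.append((name, n, n, c))
    return shapes

for row in trace(144, FEATURE_ROWS):
    print(row)
# Reconstruction side, assuming 2x2 stride-2 deconvolutions: 36 -> 72 -> 144.
print(deconv_out(36, 2, 2, 0), deconv_out(72, 2, 2, 0))   # 72 144
print(conv_out(72, 1, 1, 0))                               # Conv7 keeps 72: 72
```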

2.3. Model Training and Model-Based Segmentation

A stepwise scheme was used to train the CNN. First, we trained the feature representation phase of the network, obtaining a 36×36 score map. Starting from this 36×36 score map, we next trained the deconvolution layer, the concatenation layer, and the convolution layer, resulting in a 72×72 score map. Finally, based on the network parameters and output feature maps obtained in the second step, we reconstructed a 144×144 score map.

During training, the weights were updated in each iteration. The convolution kernels were initialized with weights drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 1. The training parameters were as follows: base learning rate, 1×10⁻⁷; step size, 1×10⁵; gamma, 0.1; momentum, 0.9; and weight decay, 5×10⁻⁴. A complete training run took 52 hours on an Intel Core i7 3.5 GHz computer equipped with an NVIDIA GeForce GTX 980 GPU.
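The parameter names (base learning rate, step size, gamma, momentum, weight decay) match those of a step-decay SGD solver such as Caffe's. The text does not name the framework, so the Python sketch below assumes the "step" policy, lr = base_lr · gamma^⌊iter / step_size⌋, and a momentum update in which weight decay is added to the gradient as an L2 term. The Gaussian initializer mirrors the stated mean of 0 and standard deviation of 1; the gradient used in the usage lines is a placeholder.

```python
import math
import random

BASE_LR = 1e-7        # base learning rate
STEP_SIZE = 100_000   # iterations between learning-rate drops
GAMMA = 0.1           # multiplicative drop factor
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4

def step_lr(iteration, base_lr=BASE_LR, gamma=GAMMA, step_size=STEP_SIZE):
    """Assumed 'step' policy: multiply the rate by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)

def gaussian_init(n, mean=0.0, std=1.0, seed=0):
    """Initial kernel weights drawn from a Gaussian with the stated mean and std."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

def sgd_momentum_step(weights, grads, velocity, lr,
                      momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD update: v <- momentum*v - lr*(g + wd*w); w <- w + v."""
    new_v = [momentum * v - lr * (g + weight_decay * w)
             for w, g, v in zip(weights, grads, velocity)]
    new_w = [w + v for w, v in zip(weights, new_v)]
    return new_w, new_v

w = gaussian_init(25)          # e.g., one 5x5 kernel
v = [0.0] * len(w)
grads = [0.01] * len(w)        # placeholder gradient
w, v = sgd_momentum_step(w, grads, v, step_lr(0))
print(math.isclose(step_lr(100_000), 1e-8))   # True
```

With a step size of 1×10⁵, the learning rate stays at 1×10⁻⁷ for the first 100,000 iterations and then drops by a factor of 10.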

The trained model was then used to segment NPC tumor lesions in the testing dataset. The testing images were input into the trained model, and a score map representing the NPC tumor region was obtained for each input image.
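Turning a score map into a segmentation mask and a tumor volume (as reported in Table 2) can be sketched as below. The 0.5 threshold and the voxel volume are assumptions: the text does not state how score maps were binarized, and the voxel size is derived from the 22 cm field of view over 144 pixels and an assumed 3 mm slice thickness (6 cm over 20 slices).

```python
import numpy as np

# Assumed voxel size: 220 mm / 144 pixels in-plane, 60 mm / 20 slices through-plane.
VOXEL_MM3 = (220.0 / 144) ** 2 * (60.0 / 20)

def score_to_mask(score_map, threshold=0.5):
    """Binarize a score map; the 0.5 threshold is an assumption, not from the study."""
    return np.asarray(score_map) >= threshold

def tumor_volume_cm3(masks, voxel_mm3=VOXEL_MM3):
    """Tumor volume from binary slice masks (1 cm^3 = 1000 mm^3)."""
    n_voxels = sum(int(np.count_nonzero(m)) for m in masks)
    return n_voxels * voxel_mm3 / 1000.0

score = np.zeros((144, 144))
score[60:80, 50:70] = 0.9          # toy 20x20 "tumor" region
mask = score_to_mask(score)
print(int(mask.sum()), round(tumor_volume_cm3([mask]), 2))   # 400 2.8
```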

2.4. Tumor Segmentation

The testing dataset was forward-propagated through the trained model to evaluate segmentation performance. Recall, precision, and the Dice similarity coefficient (DSC) were given by

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN},$$

where true positive (TP) denotes the correctly identified tumor area, false positive (FP) denotes area identified as tumor that is normal tissue in the ground truth, and false negative (FN) denotes area identified as normal tissue that is tumor in the ground truth. These metrics were computed for each patient.
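The three metrics can be computed from binary masks as follows. This NumPy sketch assumes that pixel counts are pooled over all slices of a patient before the ratios are taken, which is one reading of the statement that the results are given for each patient; the helper names are illustrative.

```python
import numpy as np

def confusion_counts(pred, truth):
    """TP, FP, FN pixel counts for boolean prediction and ground-truth masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.count_nonzero(pred & truth))
    fp = int(np.count_nonzero(pred & ~truth))
    fn = int(np.count_nonzero(~pred & truth))
    return tp, fp, fn

def patient_counts(slice_pairs):
    """Sum counts over all (pred, truth) slice pairs of one patient (pooling assumed)."""
    return tuple(sum(c) for c in zip(*(confusion_counts(p, t) for p, t in slice_pairs)))

def recall(tp, fp, fn):
    return tp / (tp + fn)

def precision(tp, fp, fn):
    return tp / (tp + fp)

def dice(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

truth = np.zeros((4, 4), dtype=bool)
truth[:2] = True                   # 8 tumor pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1] = True                     # 4 predicted pixels, all inside the tumor
counts = patient_counts([(pred, truth)])
print(counts, recall(*counts), precision(*counts), round(dice(*counts), 3))
# (4, 0, 4) 0.5 1.0 0.667
```

The toy case shows how an under-segmentation keeps precision at 1.0 while recall and DSC drop.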

For comparison with other published results, the corresponding ratio (CR), percent match (PM) [7], and Jaccard similarity coefficient (JSC) [15] were also calculated as

$$\mathrm{CR} = \frac{TP - 0.5\,FP}{GT}, \qquad \mathrm{PM} = \frac{TP}{GT}, \qquad \mathrm{JSC} = \frac{TP}{TP + FP + FN},$$

where GT = TP + FN is the ground-truth tumor area.

Leave-one-out cross-validation (LOOCV) was used for model validation: in each repetition, the images of 28 patients were used as the training dataset (augmented to more than 60,000 images), and the images of the remaining patient were used as the testing dataset. After the images of every patient (87 slices in total) had been tested, the mean and standard deviation of DSC, recall, CR, PM, and JSC were calculated to evaluate the segmentation performance of our method.
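The comparison metrics and the patient-wise LOOCV split can be sketched in Python. The CR and PM formulas below follow the definitions used in the NPC segmentation literature [7], PM = TP/GT and CR = (TP − 0.5·FP)/GT with GT = TP + FN; treat them as assumptions about this study. The patient IDs are hypothetical, and using the sample standard deviation for the ± values is also an assumption.

```python
import statistics

def percent_match(tp, fp, fn):
    """PM = TP / GT, with GT = TP + FN the ground-truth tumor area (assumed, after [7])."""
    return tp / (tp + fn)

def corresponding_ratio(tp, fp, fn):
    """CR = (TP - 0.5 * FP) / GT: false positives are penalized with weight 0.5."""
    return (tp - 0.5 * fp) / (tp + fn)

def jaccard(tp, fp, fn):
    """JSC = TP / (TP + FP + FN), i.e. intersection over union."""
    return tp / (tp + fp + fn)

def loocv_folds(patient_ids):
    """Yield (training IDs, test ID), holding out one patient per repetition."""
    for test_id in patient_ids:
        yield [p for p in patient_ids if p != test_id], test_id

def summarize(values):
    """Mean and sample standard deviation of per-patient scores."""
    return statistics.mean(values), statistics.stdev(values)

patients = [f"P{i:02d}" for i in range(1, 30)]   # 29 hypothetical patient IDs
folds = list(loocv_folds(patients))
print(len(folds), len(folds[0][0]))   # 29 28
```

Splitting by patient rather than by slice keeps all slices (and their augmented copies) of the test patient out of training.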

3. Results

Table 2 lists the tumor volumes segmented by the radiologists (the gold standard) and by the proposed automatic method, together with DSC, CR, PM, recall, and JSC. These values were calculated for each patient, not for each lesion. Table 3 compares the segmentation performance of our DL CNN network in terms of DSC, CR, and PM with published results obtained using other models. Over the 29 patients, our method achieved a mean DSC of 0.89±0.05 (range 0.80-0.95) and a mean PM of 0.90±0.04 (range 0.83-0.96), higher than the mean PM values below 0.90 reported in other studies. The mean CR was 0.84±0.06 (range 0.71-0.92), compared with a mean CR of 0.72 in similar studies using other algorithms [7, 15, 19].

Table 2. Tumor volumes and segmentation metrics for each patient.

Patient number | Volume by current DL method (cm³) | Volume by the two readers (cm³) | Percent match | Corresponding ratio | Dice similarity coefficient | Recall | Jaccard similarity coefficient
Mean±Std | - | - | 0.90±0.04 | 0.84±0.06 | 0.89±0.05 | 0.88±0.07 | 0.81±0.07

Table 3. Comparison of segmentation performance with published studies (DSC: Dice similarity coefficient; CR: corresponding ratio; PM: percent match).

Study | Algorithm | DSC mean ± SD | DSC range | CR mean ± SD | CR range | PM mean ± SD | PM range
Current study | Convolutional neural network | 0.89±0.05 | 0.80-0.95 | 0.84±0.06 | 0.71-0.92 | 0.90±0.04 | 0.83-0.96
Zhou et al. [7] | Support vector machine | N.A | N.A | 0.72±0.06 | 0.58-0.85 | 0.79±0.07 | 0.65-0.91
Huang et al. [19] | Hidden Markov random field | N.A | N.A | 0.72 | 0.44-0.91 | 0.85 | 0.64-1.00
Wang et al. [15] | Deep convolutional neural networks | N.A | -0.80 | N.A | N.A | N.A | -0.90

Figure 2 shows a highly accurate segmentation result, with DSC, CR, and PM of 0.941, 0.915, and 0.950, respectively, indicating good agreement between the segmentation obtained with our DL CNN network and the ground truth.

Figure 3 shows a less accurate segmentation result obtained with our DL CNN model, with DSC, CR, and PM of 0.797, 0.731, and 0.937, respectively, indicating a slight difference between the segmentation result and the ground truth.
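Assuming the metric definitions PM = TP/GT and CR = (TP − 0.5·FP)/GT from [7], together with DSC = 2TP/(2TP + FP + FN), the three values reported for each figure are linked: with the ground-truth area normalized to 1, TP = PM and FN = 1 − PM, the DSC then determines FP, and CR follows. The sketch below uses this as a consistency check on the Figure 2 and Figure 3 values; the helper `implied_cr` is illustrative.

```python
def implied_cr(pm, dsc):
    """Recover CR from PM and DSC, assuming PM = TP/GT and CR = (TP - 0.5*FP)/GT."""
    tp = pm                               # ground-truth area GT normalized to 1
    fn = 1.0 - pm
    fp = 2.0 * tp / dsc - 2.0 * tp - fn   # DSC = 2TP / (2TP + FP + FN) solved for FP
    return tp - 0.5 * fp

for label, pm, dsc, reported_cr in [("Figure 2", 0.950, 0.941, 0.915),
                                    ("Figure 3", 0.937, 0.797, 0.731)]:
    print(label, round(implied_cr(pm, dsc), 3), "reported:", reported_cr)
```

The recovered values (about 0.915 and 0.730) agree with the reported CR to within the rounding of the inputs. In Figure 3 the implied false-positive area is about 0.41 of the ground-truth area, which is why CR falls well below PM in that case.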

4. Discussion

Based on the CNN technique, we developed a supervised segmentation method for NPC tumors in CE-MRI with a mean DSC of 0.89. The performance was also robust, with a low standard deviation of 0.05 for DSC across patients. For comparison with other studies, we calculated CR and PM; the mean values achieved with our method were 0.84 and 0.90, respectively. In similar studies in the literature, the highest mean CR and PM were 0.72 and 0.90, respectively [7, 15, 19]. Our CR is clearly higher and our PM is at least as high, suggesting that our CNN technique improves the accuracy of automatic tumor segmentation compared with the other models.

First, the improvement may lie in the use of a CNN to extract image features automatically and objectively. In our model, low-level features were combined through convolutions into high-level features carrying semantic information (Figure 1). Through iterative training with the backpropagation algorithm, features associated with the target area were emphasized and irrelevant features were gradually suppressed [12]. In this way, our model can extract the most useful features and achieve better segmentation results.

Second, in our network architecture, we fused feature maps from the feature representation phase and the score map reconstruction phase for the final reconstruction. In Figure 4(a), obtained in the reconstruction phase, the tumor location and shape are roughly visible but unclear. Fusing this feature map with the fine-feature map obtained in the feature representation phase may compensate for the information lost during reconstruction. As shown in Figure 4(b), reconstruction from the fused feature maps yielded a better segmentation.

There is room to further improve the accuracy and effectiveness of our model. As shown in Table 3, our method yielded DSC values as low as 0.80 in some cases. For further improvement, we may include T2-weighted images, which are widely used in the manual contouring of tumor regions; combining DCE-MRI with T2-weighted images may therefore improve performance. We applied Z-score normalization in preprocessing to normalize the DCE-MRI [18]; however, some information could be lost during this normalization, so an appropriate normalization method that preserves the intrinsic image features should be investigated. Importantly, the network architecture itself, for example its depth, could be improved to allow direct training on 3D images and to incorporate time domain information from the dynamic scans. We expect these ideas to further improve our method and its segmentation results in future studies.

5. Conclusion

A robust segmentation method for NPC tumors based on a deep learning convolutional neural network and CE-MRI has been established. Tumors can be segmented within seconds with high accuracy. This automatic segmentation method may save time in tumor contouring for routine radiotherapy treatment planning. Future studies may aim to improve segmentation accuracy and efficiency with more training data and an optimized network structure, thus helping clinicians improve segmentation results in the clinical practice of NPC.

Data Availability

The authors do not have permission to share data.

Conflicts of Interest

All authors of the manuscript declare that there are no conflicts of interest with regard to equipment, contrast, drug, and other materials described in the study.

Acknowledgments


The study was jointly funded by the National Natural Science Foundation of China (no. 61401285), National Natural Science Foundation of China-Shenzhen Joint Fund Project (U1713220), Science and Technology Planning Project of Guangdong Province, China (no. 2017ZC0425), Shenzhen High-Caliber Personnel Research Funding (no. 000048), Shenzhen Municipal Scheme for Basic Research (no. JCYJ20160307114900292, no. JCYJ20170302152605463, and no. JCYJ20170306123423907), and Special Funding Scheme for Supporting the Innovation and Research of Shenzhen High-Caliber Overseas Intelligent (no. KQCX20140519103243534).

References


  1. Hong Kong Cancer Registry, “Hospital Authority of Hong Kong.”
  2. International Agency for Research on Cancer, “CI5 XI: Cancer incidence in five continents volume XI.”
  3. Y.-J. Hua, F. Han, L.-X. Lu et al., “Long-term treatment outcome of recurrent nasopharyngeal carcinoma treated with salvage intensity modulated radiotherapy,” European Journal of Cancer, vol. 48, no. 18, pp. 3422–3428, 2012.
  4. M. Sumi and T. Nakamura, “Extranodal spread in the neck: MRI detection on the basis of pixel-based time-signal intensity curve analysis,” Journal of Magnetic Resonance Imaging, vol. 33, no. 4, pp. 830–838, 2011.
  5. B. Huang, C.-S. Wong, B. Whitcher et al., “Dynamic contrast-enhanced magnetic resonance imaging for characterising nasopharyngeal carcinoma: comparison of semiquantitative and quantitative parameters and correlation with tumour stage,” European Radiology, vol. 23, no. 6, pp. 1495–1502, 2013.
  6. J. Zhou, T. V. Lim, and J. Huang, “Segmentation and visualization of nasopharyngeal carcinoma using MRI,” Computers in Biology & Medicine, vol. 33, no. 5, pp. 407–424, 2003.
  7. J. Zhou, K. L. Chan, P. Xu, and V. F. H. Chong, “Nasopharyngeal carcinoma lesion segmentation from MR images by support vector machine,” in Proceedings of the 2006 3rd IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1364–1367, Arlington, VA, USA, April 2006.
  8. W. Huang, K. L. Chan, Y. Gao, J. Zhou, and V. Chong, “Semi-supervised Nasopharyngeal Carcinoma Lesion Extraction from Magnetic Resonance Images Using Online Spectral Clustering with a Learned Metric,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2008, vol. 5241 of Lecture Notes in Computer Science, pp. 51–58, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
  9. W. Huang and C. Liu, “A hybrid supervised learning nasal tumor discrimination system for DMRI,” Journal of the Chinese Institute of Engineers, vol. 35, no. 6, pp. 723–733, 2012.
  10. J. Shi, S. Zhou, X. Liu, Q. Zhang, M. Lu, and T. Wang, “Stacked deep polynomial network based representation learning for tumor classification with small ultrasound image dataset,” Neurocomputing, vol. 194, pp. 87–94, 2016.
  11. J. Shi, X. Zheng, Y. Li, Q. Zhang, and S. Ying, “Multimodal Neuroimaging Feature Learning with Multimodal Stacked Deep Polynomial Networks for Diagnosis of Alzheimer's Disease,” IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 1, pp. 173–183, 2018.
  12. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  13. X. Zhao, Y. Wu, G. Song, Z. Li, Y. Fan, and Y. Zhang, “Brain Tumor Segmentation Using a Fully Convolutional Neural Network with Conditional Random Fields,” in Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, vol. 10154 of Lecture Notes in Computer Science, pp. 75–87, Springer International Publishing, Cham, 2016.
  14. M. Havaei, A. Davy, D. Warde-Farley et al., “Brain tumor segmentation with Deep Neural Networks,” Medical Image Analysis, vol. 35, pp. 18–31, 2017.
  15. Y. Wang, C. Zu, G. Hu et al., “Automatic Tumor Segmentation with Deep Convolutional Neural Networks for Radiotherapy Applications,” Neural Processing Letters.
  16. M. Soret, S. L. Bacharach, and I. Buvat, “Partial-volume effect in PET tumor imaging,” Journal of Nuclear Medicine, vol. 48, no. 6, pp. 932–945, 2007.
  17. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), vol. 9351 of Lecture Notes in Computer Science, pp. 234–241, November 2015.
  18. L. A. Shalabi, Z. Shaaban, and B. Kasasbeh, “Data mining: a preprocessing engine,” Journal of Computer Science, vol. 2, no. 9, pp. 735–739, 2006.
  19. K.-W. Huang, Z.-Y. Zhao, Q. Gong, J. Zha, L. Chen, and R. Yang, “Nasopharyngeal carcinoma segmentation via HMRF-EM with maximum entropy,” in Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2015, pp. 2968–2972, Milan, Italy, August 2015.

Copyright © 2018 Qiaoliang Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
