Algorithms and Devices for Smart Processing Technology for Energy Saving
Research Article | Open Access
Hyunhee Lee, Jaechoon Jo, Heuiseok Lim, "Study on Optimal Generative Network for Synthesizing Brain Tumor-Segmented MR Images", Mathematical Problems in Engineering, vol. 2020, Article ID 8273173, 12 pages, 2020. https://doi.org/10.1155/2020/8273173
Study on Optimal Generative Network for Synthesizing Brain Tumor-Segmented MR Images
Due to institutional and privacy issues, medical imaging research faces serious data scarcity. Image synthesis using generative adversarial networks provides a generic solution to the lack of medical imaging data. We synthesize high-quality brain tumor-segmented MR images, a problem that consists of two tasks: synthesis and segmentation. We performed experiments with two different generative networks: the first uses the ResNet model, which has significant advantages in style transfer, and the second uses the U-Net model, one of the most powerful models for segmentation. We compare the performance of each model and propose a more robust model for synthesizing brain tumor-segmented MR images. Although ResNet produced better-quality images than U-Net for the same samples, it used a great deal of memory and took much longer to train. U-Net, meanwhile, segmented the brain tumors more accurately than ResNet.
General characteristics of medical imaging data are as follows. It is difficult to obtain a large volume of data, and it is even more difficult to acquire the labelled data necessary for supervised learning. As shown in Figure 1, since the picture archiving and communication system (PACS) was introduced in hospitals, vast amounts of multimedia data in the medical imaging field have been stored. However, due to various institutional and privacy issues, external institutions have difficulty gaining access to such data. Additionally, using the accumulated data for learning requires preprocessing, which consequently takes considerable time and effort.
In addition, medical imaging data have the following characteristics. A typical chest X-ray image contains 2,000 pixels horizontally and 2,500 vertically, for a total of five million pixels, whereas the lesions usually occupy a relatively small part of the whole image. Magnetic resonance imaging (MRI) scans provide more detailed information than computerized tomography (CT) scans about internal structures such as the brain, the skeletal system, and other organ systems. Although MRI has many advantages, it also has disadvantages such as prolonged acquisition time (about 45 min), high costs, and limiting patient factors such as claustrophobia or metal devices in the body. Because MRI scanners use strong magnetic fields and magnetic field gradients, MRI can be dangerous, especially for patients with nonremovable metal inside the body, and such metal can also leave the acquired images blurred or distorted. CT scans combine X-ray images taken from different angles, and they are fast, painless, and noninvasive. However, they expose the patient to radiation, albeit at a relatively low dose. To minimize radiation exposure, low-dose CT protocols are used, and as a result, the images tend to be severely degraded by excessive noise and streak artifacts. For these reasons, relatively little usable data is available in the medical imaging field.
Generative adversarial networks (GANs) provide a generic solution to the lack of medical imaging data. As shown in Figure 2, they can be applied to diverse tasks such as image synthesis [7, 9], segmentation [6, 11, 12], reconstruction, and classification. Figure 3 shows statistics of GAN-related papers categorized by task and imaging modality. These statistics are based on the PubMed and arXiv databases and on the proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), SPIE Medical Imaging, the IEEE International Symposium on Biomedical Imaging (ISBI), and the International Conference on Medical Imaging with Deep Learning (MIDL). The cutoff date of the search was July 30, 2018, and the number of GAN-related papers increased significantly in 2017 and 2018. As Figure 3 shows, about 70% of these papers address image synthesis and segmentation, and MR is the most-studied imaging modality in GAN-related publications.
In this paper, we use CycleGAN, which has shown strong performance in medical imaging, to synthesize brain tumor-segmented MR images. Generating brain tumor-segmented MR images consists of two tasks, namely, synthesis and segmentation. The first is image-to-image translation, which synthesizes a novel image by combining the content of one image with the style of another. The second involves locating and marking lesions on the images. To perform these two tasks, we conduct two experiments using two different generative networks. In the first experiment, we use a ResNet model, which has significant advantages in style transfer. In the second, we use a U-Net model, one of the most commonly used segmentation techniques. We compare the performance of each model and propose a more robust model for synthesizing brain tumor-segmented MR images, which could lead to high-quality multimedia data augmentation in the medical imaging field.
2. Related Work
In a study by Wolterink et al. on synthesizing brain CT images from MR images, training on unpaired images performed even better than training on paired images. This suggests that CycleGAN can outperform Pix2pix in cross-modality synthesis of medical images. In addition, as shown in Table 1, CycleGAN is increasingly used in recent publications on medical image synthesis. In this section, we discuss CycleGAN, ResNet, and U-Net, which are used in our experiments.
Table 1 note: in the third column, a mark following the method name denotes modifications either to the network architecture or to the employed losses.
The problem that Pix2pix and CycleGAN solve is the translation of images from one domain to another. Pix2pix requires paired data from both domains, whereas CycleGAN can solve this problem without such pairs. CycleGAN is a GAN with two generators and two discriminators. One generator, G, converts images from domain X to domain Y; the other, F, converts images from domain Y to domain X. Each generator has a corresponding discriminator that attempts to tell its synthesized images apart from real ones. CycleGAN is not just a simple mapping technique: it also considers the return mapping and constrains each image to come back to its original state. As shown in Figure 4, not only the mapping from X to Y but also the mapping from Y back to the original X must be defined, and the same applies to the opposite direction. This is necessary because X and Y are unpaired domains. When an image from X is mapped to Y, it is checked that it looks like an image from Y, and the actual constraint is that it must keep its original shape when it is mapped back to X. That is, the shape of the image from X does not change much while its style changes to that of Y, so it looks as if only the style has been transferred.
There are two components in the CycleGAN objective function: an adversarial loss and a cycle-consistency loss. Both are essential for successful results. Adversarial training alone leaves the mapping underconstrained: it forces each generated output to belong to the appropriate domain, but it does not force the input and output to be recognizably the same image. The cycle-consistency loss addresses this problem. Putting these loss terms together and weighting the cycle-consistency loss with a hyperparameter λ gives the full objective function in equation (1), where D_X and D_Y are the discriminators for domains X and Y:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{cyc}(G, F). \qquad (1)$$

Pix2pix, which is trained on paired data, can rely heavily on its L1 loss, so its adversarial loss plays a supplementary role and it may even be better to remove it. In contrast, CycleGAN cannot learn at all without the adversarial loss, because with unpaired data nothing else tells it what the target domain looks like.
The general CNN in Figure 5(a) receives the input x and yields the output H(x) through two weight layers; this output is the input to the next layer. Figure 5(b) shows the architecture of ResNet, which uses a shortcut connection that connects the input of the block directly to its output. The idea is simple, but it performs remarkably well. As can be seen in Figure 5(b), the output H(x) is replaced by F(x) + x, and rearranging H(x) = F(x) + x gives F(x) = H(x) − x. Because the weight layers learn F(x), the residual, the network is called ResNet. The block adds the output of the weight layers to its input and then applies ReLU. If the desired mapping is close to the identity, the block only has to drive F(x) toward zero, which is easier than learning the identity through a stack of nonlinear layers. In addition, because x is passed through an identity shortcut, the connection adds no parameters and negligible computation, and the residual branch is flexible in which layers it contains: it can consist of fully connected layers as well as convolution layers.
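To make the residual formulation concrete, here is a minimal sketch of one residual block in plain Python. It assumes tiny fully connected weight layers (the residual branch may be fully connected or convolutional) and hypothetical weights, chosen only to show that the block outputs ReLU(F(x) + x) and that, when F(x) = 0, it simply passes its input through.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product: one fully connected 'weight layer'."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Return ReLU(F(x) + x), where F(x) = W2 * ReLU(W1 * x).

    The identity shortcut adds x unchanged and has no parameters of its own,
    so the two weight layers only need to learn the residual F(x) = H(x) - x.
    """
    f = matvec(w2, relu(matvec(w1, x)))              # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])   # H(x) = F(x) + x, then ReLU


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights, F(x) = 0 and the block passes ReLU(x) through.
print(residual_block(x, zeros, zeros))  # [1.0, 0.0, 3.0]
```

A real ResNet block uses convolutions (and usually normalization) instead of these toy matrices, but the addition F(x) + x is the same.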
Segmentation is the process of partitioning an image into different meaningful segments. In medical imaging, these segments often correspond to different tissue classes, organs, pathologies, or other biologically relevant structures. In the past, there were few medical images, so experts could segment them manually. However, as the volume of medical images has increased exponentially, the need to automate segmentation has grown. Analysing medical images is often difficult and time consuming, and deep neural networks can help doctors make faster and more accurate diagnoses.
U-Net is one of the most widely used models for image segmentation. Figure 6(a) shows the general encoding and decoding process, and Figure 6(b) shows a U-Net model, which adds skip connections to the encoder-decoder structure. If the image size is reduced (downsampling) and then increased again (upsampling), fine-grained pixel information is lost. This is a serious problem for image segmentation, which requires dense, pixel-by-pixel prediction. The skip connections pass this information directly from the encoder to the decoder, which yields a much sharper image in the decoder and allows more accurate prediction.
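The loss of detail described above, and how a skip connection recovers it, can be illustrated with a toy one-dimensional signal. This is a hypothetical sketch rather than U-Net itself: downsampling is modeled as averaging neighbouring pairs, upsampling as repeating each value, and the skip connection as concatenating the encoder's full-resolution features with the decoder's features position by position, the way U-Net concatenates feature maps along the channel axis.

```python
from typing import List, Tuple


def downsample(signal: List[float]) -> List[float]:
    """Halve the resolution by averaging neighbouring pairs (toy pooling)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal: List[float]) -> List[float]:
    """Double the resolution by repeating each value (toy nearest-neighbour upsampling)."""
    return [v for v in signal for _ in range(2)]


def skip_concat(encoder: List[float], decoder: List[float]) -> List[Tuple[float, float]]:
    """Pair encoder and decoder features at each position: two 'channels' per position."""
    assert len(encoder) == len(decoder), "skip connection needs matching resolutions"
    return list(zip(encoder, decoder))


edge = [0.0, 1.0, 0.0, 1.0]            # fine, pixel-level detail
restored = upsample(downsample(edge))   # encoder-decoder path without a skip connection
print(restored)                         # [0.5, 0.5, 0.5, 0.5]: the detail is gone
print(skip_concat(edge, restored))      # the original detail travels alongside
```

In the real network, the layers after each concatenation learn how to combine the sharp encoder features with the coarser decoder features.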
3. Materials and Methods
The first CycleGAN used the U-Net model. The advantage of its skip connections is that they preserve much more detail, but the disadvantage is that performance is poor when the contents of the two images are similar. The last CycleGAN, on the other hand, used the ResNet model, which produces good image quality but uses a great deal of memory.
In this paper, we synthesize high-quality brain tumor-segmented MR images. This problem consists of two tasks, synthesis and segmentation, so we conduct experiments from two perspectives. One is image-to-image translation, which synthesizes a novel image with the style of another image. The other is locating and marking tumors in brain MR images. We therefore perform two experiments with two different generative networks. In the first experiment, we use a ResNet model, which has significant advantages in style transfer. In the second experiment, we use a U-Net model, one of the most commonly used segmentation techniques. We compare the performance of each model and propose a more powerful model for synthesizing brain tumor-segmented MR images.
Figure 7 shows the source- and target-domain datasets used in our experiments. Figure 7(a) shows the brain lesion images in the source domain, and Figure 7(b) shows the segmentation mask images of the brain lesions in the target domain. We used a training set of 765 unaligned image pairs and a test set of 92 unaligned image pairs. Each image file is roughly 30 to 50 KB. Our experimental environment was as follows: Ubuntu 16.04.4 LTS as the operating system, a GeForce GTX 1080 Ti GPU with the CUDA Toolkit for GPU acceleration, and the TensorFlow library as the deep learning framework.
3.2. Architecture of Our Discriminative Model
The configuration of the discriminative model in our experiments is shown in Figure 8. It consists of four convolution layers, and we use leaky ReLU as an activation function for each layer. In the first step, we extract the features from the image, and in the last, we decide which specific category these features belong to. For that, we add a final convolution layer that produces a one-dimensional output. Both ResNet and U-Net generative models use this model as a discriminator in our experiments.
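Because the final convolution outputs a map of scores rather than a single number, it can help to trace the spatial sizes through the discriminator. The sketch below rests on assumptions the paper does not state: 256×256 inputs, 4×4 kernels, padding 1, and strides of 2, 2, 2, 1 for the four feature layers and 1 for the final layer, which is the common 70×70 PatchGAN configuration used with CycleGAN. Under those assumptions, each output score judges a 70×70 patch of the input.

```python
from typing import List, Tuple

# Hypothetical (kernel, stride, padding) per layer; only the layer count comes from the text.
LAYERS: List[Tuple[int, int, int]] = [
    (4, 2, 1),  # conv 1 + leaky ReLU
    (4, 2, 1),  # conv 2 + leaky ReLU
    (4, 2, 1),  # conv 3 + leaky ReLU
    (4, 1, 1),  # conv 4 + leaky ReLU
    (4, 1, 1),  # final conv producing the one-channel score map
]


def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def output_sizes(size: int) -> List[int]:
    """Spatial size before the first layer and after each layer."""
    sizes = [size]
    for k, s, p in LAYERS:
        sizes.append(conv_out(sizes[-1], k, s, p))
    return sizes


def receptive_field() -> int:
    """Width of the input patch seen by one output score (walk the layers backwards)."""
    r = 1
    for k, s, _ in reversed(LAYERS):
        r = (r - 1) * s + k
    return r


print(output_sizes(256))  # [256, 128, 64, 32, 31, 30]
print(receptive_field())  # 70
```

If the actual kernels or strides differ, only the `LAYERS` table needs to change.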
3.3. Architecture of Our Generative Model Using ResNet
The generator has the job of taking an input image and performing the transformation to produce the target image. The architecture of our generative model using ResNet can be viewed in Figure 9. First, the encoding process consists of three convolution layers, and ReLU is used as an activation function for each layer. In the transformation process, nine residual blocks are constructed, and each block consists of convolution layer-ReLU-convolution layer. The decoding process consists of two deconvolution layers, and each uses ReLU as an activation function. In the last decoding step, we add a final convolution layer.
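Since every residual block computes F(x) + x, the blocks must preserve the feature-map shape. The sketch below traces sizes through a generator with the stated layer counts (three encoding convolutions, nine residual blocks, two deconvolutions, one final convolution). The 256×256 three-channel input and the kernel sizes, strides, and channel counts (64, 128, 256) are assumptions borrowed from the ResNet generator of the original CycleGAN implementation, not values given in this paper.

```python
from typing import List, Tuple

Stage = Tuple[str, int, int]  # (stage name, spatial size, channels)


def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int, pad: int, out_pad: int) -> int:
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * pad + kernel + out_pad


def trace_resnet_generator(size: int = 256, channels: int = 3, n_blocks: int = 9) -> List[Stage]:
    trace: List[Stage] = [("input", size, channels)]
    # Encoding: three convolutions, each followed by ReLU.
    for name, k, s, p, c in [("enc1", 7, 1, 3, 64), ("enc2", 3, 2, 1, 128), ("enc3", 3, 2, 1, 256)]:
        size = conv_out(size, k, s, p)
        trace.append((name, size, c))
    # Transformation: residual blocks, each conv-ReLU-conv (3x3, stride 1, padding 1).
    for i in range(n_blocks):
        inner = conv_out(conv_out(size, 3, 1, 1), 3, 1, 1)
        assert inner == size, "F(x) must keep x's shape so that F(x) + x is defined"
        trace.append((f"res{i + 1}", size, 256))
    # Decoding: two stride-2 deconvolutions with ReLU, then a final convolution.
    for name, c in [("dec1", 128), ("dec2", 64)]:
        size = deconv_out(size, 3, 2, 1, 1)
        trace.append((name, size, c))
    trace.append(("final", conv_out(size, 7, 1, 3), channels))
    return trace


for stage in trace_resnet_generator():
    print(stage)
```

All nine residual blocks run at 64×64 with 256 channels under these assumptions, which is one reason this generator is memory hungry.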
3.4. Architecture of Our Generative Model Using U-Net
As shown in Figure 10, the encoder-decoder structure of our generative model using U-Net is as follows. The encoding process consists of eight convolution layers, each followed by a leaky ReLU activation. The decoding process consists of eight deconvolution layers, each followed by a ReLU activation, and 50% dropout is applied in the first three decoding layers. In the decoding process, we also concatenate encoder features with decoder features to implement the skip connections that pass important information directly from the encoder to the decoder.
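The following sketch traces this encoder-decoder, including where the concatenations and dropout occur. The 256×256 input, the 4×4 stride-2 layers, and the channel counts are assumptions taken from the common pix2pix U-Net configuration; the paper specifies only the layer counts, the activations, the dropout placement, and the use of concatenation.

```python
from typing import List, Tuple

# Assumed channel counts (pix2pix U-Net defaults); the paper gives only the layer counts.
ENC_CHANNELS = [64, 128, 256, 512, 512, 512, 512, 512]
DEC_CHANNELS = [512, 512, 512, 512, 256, 128, 64, 3]  # the last layer maps back to 3 channels

Stage = Tuple[str, int, int, bool]  # (stage, spatial size, channels after any concat, dropout?)


def trace_unet(size: int = 256) -> List[Stage]:
    encoder: List[Tuple[int, int]] = []  # (spatial size, channels) of each encoder output
    trace: List[Stage] = []
    for i, c in enumerate(ENC_CHANNELS):
        size //= 2  # 4x4 conv, stride 2, padding 1 halves an even size; leaky ReLU follows
        encoder.append((size, c))
        trace.append((f"enc{i + 1}", size, c, False))
    for i, c in enumerate(DEC_CHANNELS):
        size *= 2  # 4x4 deconv, stride 2, padding 1 doubles the size; ReLU follows
        dropout = i < 3  # 50% dropout in the first three decoding layers
        if i < len(DEC_CHANNELS) - 1:
            # Skip connection: concatenate the mirror-image encoder output along channels.
            skip_size, skip_c = encoder[-(i + 2)]
            assert skip_size == size, "concatenation needs matching spatial sizes"
            c += skip_c
        trace.append((f"dec{i + 1}", size, c, dropout))
    return trace


for stage in trace_unet():
    print(stage)
```

Eight stride-2 layers shrink a 256×256 input to 1×1, so the concatenated encoder features are what let the decoder recover pixel-level detail.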
3.5. Methods for Synthesizing Brain Tumor-Segmented MR Images
The models in our work include forward and backward cycles, just like the CycleGAN model proposed by Zhu et al. With these cycles, the synthesized image takes on only the style of the target image while retaining the shape of the original image.
As shown in Figure 11, the architecture of our forward and backward cycles is composed of two generators, which we denote $G_{AB}$ and $G_{BA}$, and two discriminators, $D_A$ and $D_B$. The forward process is $A \rightarrow G_{AB}(A) \rightarrow G_{BA}(G_{AB}(A)) \approx A$. More specifically, our forward process can be explained in three steps. First, generator $G_{AB}$ is trained to translate an input from the brain tumor domain (A) into the segmentation mask domain, producing $G_{AB}(A)$. Second, $D_B$ is trained to discriminate the generated image $G_{AB}(A)$ from the real image $B$. Third, $G_{BA}$ is trained to translate the generated image $G_{AB}(A)$ back into the brain tumor MR image $G_{BA}(G_{AB}(A))$. Likewise, the backward process is $B \rightarrow G_{BA}(B) \rightarrow G_{AB}(G_{BA}(B)) \approx B$, and it can also be described in three steps. First, $G_{BA}$ is trained to translate an input from the segmentation mask domain (B) into the brain tumor domain, producing $G_{BA}(B)$. Second, $D_A$ is trained to discriminate the generated image $G_{BA}(B)$ from the real image $A$. Third, $G_{AB}$ is trained to translate the generated image $G_{BA}(B)$ back into the segmentation mask image $G_{AB}(G_{BA}(B))$.
The goal of each discriminator is to distinguish the images synthesized by the generators from real ones, so the discriminator is trained to minimize its classification error. The goal of each generator, on the other hand, is to fool the discriminator, so the generator is trained to maximize that error. The two networks compete, and this competition drives each of them to improve with respect to its own goal. Hence, the adversarial loss for the generator $G_{AB}$ (which maps domain A to domain B) and the discriminator $D_B$ is defined as

$$\mathcal{L}_{GAN}(G_{AB}, D_B, A, B) = \mathbb{E}_{b \sim p_{data}(b)}\left[\log D_B(b)\right] + \mathbb{E}_{a \sim p_{data}(a)}\left[\log\left(1 - D_B(G_{AB}(a))\right)\right],$$

which $D_B$ aims to maximize (that is, to classify real and fake images correctly) and $G_{AB}$ aims to minimize.
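Next, the adversarial loss for the generator $G_{BA}$ (which maps domain B to domain A) and the discriminator $D_A$, which $D_A$ aims to maximize and $G_{BA}$ aims to minimize, is defined as

$$\mathcal{L}_{GAN}(G_{BA}, D_A, B, A) = \mathbb{E}_{a \sim p_{data}(a)}\left[\log D_A(a)\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\log\left(1 - D_A(G_{BA}(b))\right)\right].$$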
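Adversarial losses alone cannot guarantee that the learned function maps an individual input from domain A to the desired output in domain B. To regularize the model so that it transforms the source distribution into the target and then back again, we introduce a cycle-consistency constraint into the model. This additional cycle-consistency loss, over the generators $G_{AB}$ and $G_{BA}$, is defined as

$$\mathcal{L}_{cyc}(G_{AB}, G_{BA}) = \mathbb{E}_{a \sim p_{data}(a)}\left[\lVert G_{BA}(G_{AB}(a)) - a \rVert_1\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\lVert G_{AB}(G_{BA}(b)) - b \rVert_1\right].$$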
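The cycle-consistency term can be sketched in a few lines of plain Python. This is an illustrative sketch only: images are flattened lists of pixel values, the L1 distance is averaged over pixels (as is common in CycleGAN implementations), and the generators are hypothetical stand-in functions rather than trained networks.

```python
from typing import Callable, List

Image = List[float]                   # a flattened image, for illustration only
Generator = Callable[[Image], Image]  # stand-in for a trained generator network


def l1(x: Image, y: Image) -> float:
    """Mean absolute error between two images of equal size."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def cycle_consistency_loss(a_batch: List[Image], b_batch: List[Image],
                           g_ab: Generator, g_ba: Generator) -> float:
    """Forward cycle A -> B -> A plus backward cycle B -> A -> B, each averaged over its batch."""
    forward = sum(l1(g_ba(g_ab(a)), a) for a in a_batch) / len(a_batch)
    backward = sum(l1(g_ab(g_ba(b)), b) for b in b_batch) / len(b_batch)
    return forward + backward


# Hypothetical generators: an exactly invertible pair, and one that collapses every input.
def double(img: Image) -> Image:
    return [2 * p for p in img]


def halve(img: Image) -> Image:
    return [p / 2 for p in img]


def collapse(img: Image) -> Image:
    return [0.0 for _ in img]


a_batch, b_batch = [[1.0, 3.0]], [[2.0, 4.0]]
print(cycle_consistency_loss(a_batch, b_batch, double, halve))    # 0.0
print(cycle_consistency_loss(a_batch, b_batch, collapse, halve))  # 5.0
```

Note that the invertible pair reaches zero cycle loss without producing anything that resembles the target domain, which is why the adversarial terms are still required.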
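Putting these loss terms together and weighting the cycle-consistency loss with a hyperparameter $\lambda$ gives our full objective function:

$$\mathcal{L}(G_{AB}, G_{BA}, D_A, D_B) = \mathcal{L}_{GAN}(G_{AB}, D_B, A, B) + \mathcal{L}_{GAN}(G_{BA}, D_A, B, A) + \lambda\, \mathcal{L}_{cyc}(G_{AB}, G_{BA}),$$

where we set $\lambda = 10$, the value used in the original CycleGAN.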
We use metrics such as mean-squared error and mean absolute error to evaluate the performance of both the ResNet and the U-Net models. We also evaluate the discriminator in each model using three metrics: the discriminator's loss for the real image B; its loss for the fake image B synthesized by the generator $G_{AB}$; and the sum of these discriminator losses.
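The paper does not spell out the exact form of these discriminator losses beyond mentioning mean-squared error. Assuming the least-squares (MSE) adversarial loss that Zhu et al. use in practice in place of the log-likelihood, the three discriminator metrics could be computed as in this sketch, whose inputs are the discriminator's scores on real and fake images.

```python
from typing import List, Tuple


def mse_to(scores: List[float], target: float) -> float:
    """Mean-squared error between discriminator scores and a constant target label."""
    return sum((s - target) ** 2 for s in scores) / len(scores)


def discriminator_losses(real_scores: List[float],
                         fake_scores: List[float]) -> Tuple[float, float, float]:
    """Assumed least-squares losses for D_B.

    real_scores: D_B's outputs on real images B (target label 1).
    fake_scores: D_B's outputs on fake images G_AB(A) (target label 0).
    Returns (loss on real B, loss on fake B, their sum).
    """
    loss_real = mse_to(real_scores, 1.0)
    loss_fake = mse_to(fake_scores, 0.0)
    return loss_real, loss_fake, loss_real + loss_fake


print(discriminator_losses([0.5, 1.0], [0.5, 0.0]))  # (0.125, 0.125, 0.25)
```

The same function applies to $D_A$ with real images A and fakes $G_{BA}(B)$.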
We also evaluate the generator in each model using two metrics: the forward and backward cycle-consistency loss, where the hyperparameter λ is set to 10, and the generator's loss for the fake image B synthesized by the generator $G_{AB}$ (and vice versa for $G_{BA}$ and $D_A$). In addition to the above-mentioned evaluation metrics, we use human perception to judge the visual quality of samples, assessing both the quality of the generated images and whether they segment the brain tumors well.
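Under the same least-squares assumption as for the discriminator (not confirmed by the paper), the generator side could be sketched as follows: each generator is rewarded when the discriminator scores its fakes as real (label 1), and the cycle-consistency losses are added with the weight λ = 10 stated above. The cycle terms are passed in as already-computed numbers to keep the sketch short.

```python
from typing import List

LAMBDA_CYC = 10.0  # cycle-consistency weight stated in the text


def generator_adversarial_loss(fake_scores: List[float]) -> float:
    """Assumed least-squares generator loss: push D's scores on fakes toward the 'real' label 1."""
    return sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)


def generator_full_loss(fake_b_scores: List[float], fake_a_scores: List[float],
                        cycle_forward: float, cycle_backward: float,
                        lam: float = LAMBDA_CYC) -> float:
    """Adversarial terms for G_AB (scored by D_B) and G_BA (scored by D_A),
    plus the lambda-weighted forward and backward cycle-consistency losses."""
    adversarial = generator_adversarial_loss(fake_b_scores) + generator_adversarial_loss(fake_a_scores)
    return adversarial + lam * (cycle_forward + cycle_backward)


print(generator_full_loss([0.0], [0.0], 0.25, 0.5))  # 9.5
```

With λ = 10, even a small cycle error dominates the total, which pushes the generators to preserve the input's shape.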
4. Results and Discussion
Our training process was as follows. We used 20 epochs and 100 steps, with the Adam optimizer, an initial learning rate of 0.0002, and Adam's momentum term β1 set to 0.5. Training took 28.6 hours for the ResNet model and 6.9 hours for the U-Net model; that is, the ResNet took about four times as long to train as the U-Net (28.6/6.9 ≈ 4.1).
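For readers reimplementing the setup, the sketch below shows a single Adam update with the stated learning rate (0.0002) and first-moment decay β1 = 0.5, in plain Python over a flat parameter list. The values β2 = 0.999 and ε = 1e-8 are assumed defaults that the paper does not report, and an actual TensorFlow setup would use the framework's built-in Adam optimizer instead.

```python
import math
from typing import List, Tuple

LR, BETA1 = 0.0002, 0.5   # values stated in the text
BETA2, EPS = 0.999, 1e-8  # assumed defaults, not reported in the paper


def adam_step(params: List[float], grads: List[float], m: List[float], v: List[float],
              t: int) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update over a flat parameter list; t is the 1-based step count."""
    new_p, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = BETA1 * mi + (1 - BETA1) * g       # first moment (the beta1 "momentum" term)
        vi = BETA2 * vi + (1 - BETA2) * g * g   # second moment
        m_hat = mi / (1 - BETA1 ** t)           # bias corrections for the zero initialization
        v_hat = vi / (1 - BETA2 ** t)
        new_p.append(p - LR * m_hat / (math.sqrt(v_hat) + EPS))
        new_m.append(mi)
        new_v.append(vi)
    return new_p, new_m, new_v


params, m, v = [1.0], [0.0], [0.0]
params, m, v = adam_step(params, [3.0], m, v, t=1)
print(params)  # close to [0.9998]: the first step moves each weight by about LR
```

The low β1 of 0.5 (versus the usual 0.9) makes the momentum estimate react faster to changing gradients, a choice inherited from common GAN training practice.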
Table 2 shows all losses of the discriminator and the generator during the training of the ResNet and the U-Net. The full loss is the sum of all losses, both from the brain tumor domain to the segmentation mask domain and from the segmentation mask domain to the brain tumor domain. As shown in Table 2, the full loss of the discriminator is 0.430 for the ResNet and 4.890 × 10⁻³ for the U-Net, which indicates that the discriminator performs significantly better with U-Net than with ResNet. In contrast, the full loss of the generator is 1.338 for the ResNet and 2.572 for the U-Net; that is, the generator performs better with ResNet than with U-Net.
A is the brain tumor domain and B is the segmentation mask domain.
Additionally, Figure 12 (A ⟶ B) and Figure 13 (B ⟶ A) show the novel images generated by the ResNet and the U-Net models for the same samples. As shown in Figure 12 (A ⟶ B), U-Net, which is more robust for segmentation, located and marked tumors in the brain MR images and produced brain tumor-segmented MR images similar to the ground truth. ResNet, in contrast, did not segment the exact location of the brain lesions. Figure 13 (B ⟶ A) shows the novel brain images generated from the segmentation mask domain. Both networks produced high-quality images, although the synthesis quality of ResNet was higher than that of U-Net (see Figures 14 and 15 for more examples of images generated by the ResNet and the U-Net models for the same samples).
In our work, we synthesized brain tumor-segmented MR images for data augmentation, a problem that consists of two tasks: synthesis and segmentation. We therefore conducted two experiments, one performing image-to-image translation (image style transfer) and the other locating and marking tumors in brain MR images (image segmentation). We performed the experiments with two different generative networks: the first using the ResNet model, which has great advantages in style transfer, and the second using the U-Net model, one of the most robust models for segmentation.
The performance comparison between the ResNet and the U-Net generative models is as follows. When the generator used ResNet, its training loss was lower than with U-Net, and it produced better-quality images. However, it was a memory-intensive network, took much longer to train, and did not segment the brain tumors as well as U-Net. On the other hand, when the generator used U-Net, the discriminator was better at distinguishing whether the generated image was real or fake. Additionally, for the same samples, U-Net segmented the brain tumors more accurately than ResNet; that is, the segmented images generated from the brain tumor domain marked the exact location of the brain lesions.
The generative networks proposed in our paper could enable the synthesis not only of brain tumor-segmented images but also of the medical images in Figure 2, as well as novel images segmenting tumors in the breast, uterus, and other organs, depending on the intended application. In future work, we will apply a network that combines the advantages of the two networks; merging the two models may make it possible to generate high-quality synthetic images with accurate segmentation. We will also increase the number of epochs and tune hyperparameters such as the initial learning rate. High-quality multimedia data augmentation using GANs can directly improve radiology workflow and patient care. Although promising results have been reported, the adoption of GANs in medical imaging is still in its infancy, and there are no clinically adopted breakthrough applications yet. Therefore, more studies and more diverse attempts are needed.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2020-2018-0-01405) supervised by the Institute for Information and Communications Technology Planning and Evaluation (IITP). This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2017M3C4A7068189).
- O. Ratib, N. Roduit, D. Nidup, G. De Geer, A. Rosset, and A. Geissbuhler, “PACS for Bhutan: a cost effective open source architecture for emerging countries,” Insights Into Imaging, vol. 7, no. 5, pp. 747–753, 2016.
- R. H. Choplin, J. M. Boehme II, and C. D. Maynard, “Picture archiving and communication systems: an overview,” Radiographics, vol. 12, no. 1, pp. 127–129, 1992.
- S. Lütje, J. W. J. de Rooy, S. Croockewit, E. Koedam, W. J. G. Oyen, and R. A. Raymakers, “Role of radiography, MRI and FDG-PET/CT in diagnosing, staging and therapeutical evaluation of patients with multiple myeloma,” Annals of Hematology, vol. 88, no. 12, pp. 1161–1168, 2009.
- F. G. Shellock and J. V. Crues, “MR procedures: biologic effects, safety, and patient care,” Radiology, vol. 232, no. 3, pp. 635–652, 2004.
- I. Goodfellow, J. Pouget-Abadie, and M. Mirza, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, Montreal, Canada, December 2014.
- W. Dai, J. Doyle, and X. Liang, “SCAN: structure correcting adversarial network for chest X-rays organ segmentation,” arXiv preprint arXiv:1703.08770, 2017.
- J. M. Wolterink, A. M. Dinkla, and M. H. F. Savenije, “Deep MR to CT synthesis using unpaired data,” in Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Québec, Canada, September 2017.
- X. Yi, E. Walia, and P. Babyn, “Unsupervised and semi-supervised learning with categorical generative adversarial networks assisted by Wasserstein distance for dermoscopy image classification,” arXiv preprint arXiv:1804.03700, 2018.
- C. Senaras, M. K. K. Niazi, and B. Sahiner, “Optimized generation of high-resolution phantom images using cGAN: application to quantification of Ki67 breast cancer images,” PLoS One, vol. 13, no. 5, 2018.
- P. Costa, A. Galdran, and M. I. Meyer, “End-to-end adversarial retinal image synthesis,” IEEE Transactions on Medical Imaging, vol. 37, no. 3, pp. 781–791, 2017.
- F. Pollastri, F. Bolelli, R. Paredes, and C. Grana, “Augmenting data with GANs to segment melanoma skin lesions,” Multimedia Tools and Applications, pp. 1–18, 2019.
- M. Rezaei, H. Yang, and C. Meinel, “Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation,” Multimedia Tools and Applications, pp. 1–20, 2019.
- X. Yi, E. Walia, and P. Babyn, “Generative adversarial network in medical imaging: a review,” Medical Image Analysis, vol. 58, p. 101552, 2019.
- J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, October 2017.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, June 2016.
- O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, October 2015.
- P. Isola, J. Y. Zhu, and T. Zhou, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, July 2017.
- C.-B. Jin, H. Kim, M. Liu et al., “Deep CT to MR synthesis using paired and unpaired data,” Sensors, vol. 19, no. 10, p. 2361, 2019.
- Z. Zhang, L. Yang, and Y. Zheng, “Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, June 2018.
- Y. Hiasa, Y. Otake, and M. Takao, “Cross-modality image synthesis from unpaired data using CycleGAN,” in Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Shenzhen, China, October 2018.
- S. U. Dar, M. Yurt, L. Karacan, A. Erdem, E. Erdem, and T. Cukur, “Image synthesis in multi-contrast MRI with conditional generative adversarial networks,” IEEE Transactions on Medical Imaging, vol. 38, no. 10, pp. 2375–2388, 2019.
- P. Welander, S. Karlsson, and A. Eklund, “Generative adversarial networks for image-to-image translation on multi-contrast MR images: a comparison of CycleGAN and UNIT,” arXiv preprint arXiv:1806.07777, 2018.
- H. Ji, D. Hooshyar, K. Kim, and H. Lim, “A semantic-based video scene segmentation using a deep neural network,” Journal of Information Science, vol. 45, no. 6, pp. 833–844, 2019.
- M. Forouzanfar, N. Forghani, and M. Teshnehlab, “Parameter optimization of improved fuzzy C-means clustering algorithm for brain MR image segmentation,” Engineering Applications of Artificial Intelligence, vol. 23, no. 2, pp. 160–168, 2010.
- A. So, D. Hooshyar, K. Park, and H. Lim, “Early diagnosis of dementia from clinical data by machine learning techniques,” Applied Sciences, vol. 7, no. 7, p. 651, 2017.
Copyright © 2020 Hyunhee Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.