Abstract

The proposed approach uses deep neural networks to segment an MRI image of heterogeneously distributed pixels into specific classes, assigning a label to each pixel. The approach applies the segmentation process to a preprocessed MRI image, and the trained network can then be applied to other test images. Because labels are expensive assets in supervised training, few training images and training labels are used to obtain optimal accuracy. To validate the performance of the proposed approach, an experiment was conducted on other test images (available in the same database) that were not part of the training; the obtained results are of good visual quality in terms of segmentation and closely resemble the ground truth images. The average computed Dice similarity index for the test images is approximately 0.8, whereas the Jaccard similarity measure is approximately 0.6, which compares favorably with other methods. This implies that the proposed method can produce reference images that closely match the segmented ground truth images.

1. Introduction

Deep neural networks have been highly successful in segmenting outdoor scenes with high complexity, dissimilar patterns, variable texture, and a wide pixel range. In the present study, this model is used for segmenting brain MRI images, which are relatively simpler than outdoor scenes. The precise segmentation of a 2D image has always been a challenging task, and various approaches have been proposed for better accuracy, such as supervised and unsupervised, manual and automatic, and standalone and neural network-based techniques. Similarly, deep convolutional neural networks (CNNs) have been effective in machine learning and have had an impact on various industrial, medical, and commercial fields. Generally, image segmentation is the process of presenting and partitioning image content into distinguishable parts. Segmentation methods ranging from edge detection to supervised and unsupervised learning have been proposed. Similarly, neural networks have been developed for medical image processing, particularly for MRI image segmentation and Alzheimer’s disease classification [1, 2]. Brain MRI segmentation is fundamental in several clinical applications and influences the outcome of the entire analysis because various processing operations rely on accurate segmentation of anatomical and structural regions. For instance, MRI segmentation is frequently used for quantifying and visualizing different brain structures, delineating lesions, analyzing brain development, and image-guided interventions and surgical planning. In MRI, tissue intensities are heterogeneous owing to the bias field and the partial volume effect, which obscure the actual content of the brain, namely, white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Therefore, accurate and robust methods must be chosen.

Existing methods [3] rely on handcrafted feature extraction and criteria such as thresholding, contours, and clustering; these methods have been used extensively by many researchers and have performed well for MRI segmentation. In contrast, deep neural networks are now proving to be more accurate and more powerful, scaling to large datasets through encoder-decoder or CNN architectures. Features are learned automatically and hierarchically by each layer, from low-level features such as edges, blobs, and lines to high-level features such as color, shape, and detail. An activation function such as ReLU makes these features more discriminative and computable. Hence, the segmentation result can be obtained directly from the trained model. The main difficulties are designing the network appropriately and training it, as training requires a large amount of ground truth data.

2. Background and Methodology

2.1. Semantic Segmentation

In semantic segmentation, the image is segmented on a pixel-label basis; that is, each pixel is associated with a certain defined class. Its applications include scene understanding, autonomous driving, object recognition, and machine vision. Semantic segmentation has been improved by using fully convolutional networks [4] and deep CNNs [5–8]. These neural networks are trained in an end-to-end, pixel-to-pixel manner for image segmentation.

2.2. SegNet Layer

The SegNet layer is a deep fully convolutional architecture for semantic segmentation proposed by Badrinarayanan et al. [5]. Generally, the semantic segmentation approach is used for outdoor, indoor, and road scenes, mostly with a large number of classes. SegNet was originally designed for scene understanding applications; hence, it must be efficient in terms of memory, operations, and computational time. It is also considerably smaller in terms of the number of trainable parameters than other competing architectures, and it can be trained end-to-end on pixel labels using stochastic gradient descent and the cross-entropy loss function.

The encoder used in SegNet is identical to the convolutional layers of VGG16 [9]. The fully connected layers of VGG16 have been removed in SegNet; thus, the encoder network is considerably smaller and easier to train than other recent architectures [5, 6, 10, 11]. The most important constituent of SegNet is the encoder-decoder network, which consists of a hierarchy of downsampling encoders, each matched with an upsampling decoder, with feature maps passed between them.

2.3. CNN and Architecture

CNNs have always been important in machine learning; by using various types of neural networks, systematic training and testing of images and pixel labels can be performed. The encoder network used here consists of convolution layers with 64 filters, each of size 3 × 3, with manual padding, followed by batch normalization and a ReLU activation unit; this convolution-batch normalization-ReLU pattern is repeated for proper downsampling and robust feature extraction. The decoder network mirrors this structure, except that each stage begins with an unpooling layer, followed by a convolution layer, batch normalization, and ReLU.

The proposed CNN has an encoder network and a matching decoder network, followed by a final pixel-based classification layer. This architecture is shown in Figure 1. To simplify the architecture, two encoder and two decoder networks have been employed: encoder1 is mapped to decoder1, and encoder2 is mapped to decoder2. encoder1 consists of encoder1_conv1, encoder1_bn_1, encoder1_relu_1, and encoder1_maxpool_1 in hierarchical order, whereas decoder1 consists of decoder1_unpool_1, decoder1_conv1, decoder1_bn_1, and decoder1_relu_1. encoder2 and decoder2 are similarly structured. Here, encoder1 is followed by encoder2, and decoder2 is followed by decoder1, as shown in Figure 2. The first 13 layers constitute an encoder network that performs convolution with 64 filter banks of size 3 × 3 to obtain sets of features, along with batch normalization over minibatches of 8 images. ReLU acts as the activation function f(x) = max(0, x), which, like any other activation function used by neurons, eliminates negative values. Thereafter, a max pooling layer with a 2 × 2 window and stride 2 (nonoverlapping windows) is executed, so that the resulting output is downsampled by a factor of 2. Multiple max pooling downsampling layers are used to achieve greater translation invariance and more robust pixel classification. Similarly, each decoder in the decoder network upsamples its input feature maps by unpooling with the memorized max pooling indices, that is, the locations of the maximum feature values in the corresponding encoder feature maps. This is followed by convolution and batch normalization layers to produce dense features of the same size as the input image. The details of the simplified architecture are tabulated in Table 1.
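As an illustration, a minimal MATLAB sketch of such a simplified two-stage encoder-decoder network is given below, assuming the segnetLayers function from the Computer Vision System Toolbox; the parameter values mirror the description above but are otherwise illustrative, not the authors' exact code.

```matlab
% Minimal sketch of the simplified encoder-decoder architecture
% (illustrative only; assumes the Computer Vision System Toolbox).
imageSize    = [208 176];  % slice size extracted from the OASIS volumes
numClasses   = 4;          % background, CSF, GM, and WM
encoderDepth = 2;          % encoder1/decoder1 and encoder2/decoder2

% segnetLayers builds conv (64 filters of size 3x3 by default) -> batch
% normalization -> ReLU -> max pooling encoder sections, the matching
% unpool -> conv -> batch normalization -> ReLU decoder sections, and a
% final softmax plus pixel classification layer.
lgraph = segnetLayers(imageSize, numClasses, encoderDepth);
```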

3. Experimental Setup

For the experiments, T1-weighted structural brain MRI data available from OASIS (Open Access Series of Imaging Studies) were used. OASIS is an open-access database [12] created by the Alzheimer’s Disease Research Center at Washington University. The dataset consists mainly of brain MRI images of Alzheimer’s disease patients aged 18 to 96 and of normal human brains for comparative study. All experiments were conducted using MATLAB R2017b on a Windows desktop with an Intel Core i3-4160 CPU and 4 GB of RAM. To reduce computation time, the neural network was trained on a single GeForce GTX 1050 Ti GPU using parallel computing.

3.1. Image Extraction and Preprocessing

The dataset consists of several types of MRI scans, with raw, processed, and segmented 3D files in Analyze format (.img, .hdr). Cross-sectional averaged and coregistered scan images were used, acquired in the native acquisition space and resampled to 1 mm isotropic voxels [12], from 50 subjects (cross-sectional MRI brain scans of dimensions 208 × 176 × 160). The MRIcron software package was used to extract the mid cross-sectional slice from each MRI volume, generating images of size 208 × 176 pixels, each representing a single MRI scan.
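For reference, this slice extraction step could also be sketched directly in MATLAB using the Image Processing Toolbox readers for Analyze 7.5 files (the study itself used MRIcron; the file name below is illustrative):

```matlab
% Hypothetical MATLAB equivalent of the MRIcron slice extraction step.
info  = analyze75info('OAS1_0001_MR1.hdr');    % header of the .img/.hdr pair
vol   = analyze75read(info);                   % 208 x 176 x 160 volume
slice = vol(:, :, round(size(vol, 3)/2));      % mid cross-sectional slice
imwrite(mat2gray(slice), 'OAS1_0001_MR1.png'); % 208 x 176 pixel image
```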

Images from two discs were selected, from ID OAS1_0001_MR1 to OAS1_0080_MR1, originally consisting of 76 images. The skull-stripped images were used as training images, and the segmented image of each training image (already segmented into four parts) was used as the training label or ground truth. The trained network was then used to segment the test MRI images, and the result was compared with the ground truth segmentation. Regarding the training environment, stochastic gradient descent with momentum was selected as the optimization algorithm, with an initial learning rate of 0.001. To facilitate smooth training, training was carried out in minibatches of 8 files per epoch, with data augmentation consisting of random reflection about the X-axis and rotation of ±10 degrees from the original position of each image. The predesigned SegNet layers created a training network that underwent a stepwise feature extraction process in each CNN layer. Additionally, the classification of pixels was facilitated by class weight-class name pairs and a cross-entropy loss function. The SegNet layers acted as the training framework, whereas the pixel classification layer acted as the classification output. The overall workflow of the proposed method is illustrated in Figure 3.
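The following is a minimal sketch of this training setup, assuming MATLAB R2017b's trainingOptions, imageDataAugmenter, and pixelLabelImageSource; the datastore names imdsTrain and pxdsTrain are hypothetical placeholders for the OASIS training images and labels, and lgraph is the layer graph sketched in Section 2.3.

```matlab
% Sketch of the training configuration described above (placeholder names).
augmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...        % random reflection about the X-axis
    'RandRotation',    [-10 10]);       % rotation of +/-10 degrees

% Pair the training images with their pixel labels (R2017b API).
trainingData = pixelLabelImageSource(imdsTrain, pxdsTrain, ...
    'DataAugmentation', augmenter);

opts = trainingOptions('sgdm', ...      % stochastic gradient descent with momentum
    'InitialLearnRate', 0.001, ...
    'MiniBatchSize',    8, ...
    'ExecutionEnvironment', 'gpu');     % single GeForce GTX 1050 Ti

net = trainNetwork(trainingData, lgraph, opts);
```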

3.2. Training and Testing Accuracy

Seventy-six images, from ID OAS1_0001_MR1 to OAS1_0080_MR1 (excluding four missing MRIs), were selected for training (including the augmented images), and six images, from ID OAS1_0081_MR1 to OAS1_0087_MR1 (excluding OAS1_0082_MR1), were selected for testing. The overall training accuracy was 91.47%, with a mean global accuracy of 0.91, mean accuracy of 0.88248, mean IoU of 0.88248, and weighted IoU of 0.84. The clusterwise accuracy, IoU, and MeanBFScore are tabulated in Table 2. The intersection over union (IoU) for the best predicted image was approximately 0.8477, whereas the IoU for the worst predicted image was approximately 0.625. The confusion matrix obtained from the classification is shown in Figure 4. The obtained results show that, over the training images, the network correctly classified approximately 99% of background pixels, 93% of CSF pixels, 78% of GM pixels, and 83.6% of WM pixels, which indicates that the network is trained and ready to perform segmentation on the other test images.
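As a hedged sketch, metrics of this kind can be computed with MATLAB's semanticseg and evaluateSemanticSegmentation functions; imdsTest and pxdsTest below are hypothetical datastores for the held-out test images and their ground truth labels, and net is the network trained above.

```matlab
% Sketch of computing the global/class accuracy, IoU, and BF score metrics.
pxdsResults = semanticseg(imdsTest, net, ...
    'MiniBatchSize', 8, 'WriteLocation', tempdir);
metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTest);

metrics.DataSetMetrics  % GlobalAccuracy, MeanAccuracy, MeanIoU, WeightedIoU, MeanBFScore
metrics.ClassMetrics    % per-class Accuracy, IoU, and MeanBFScore (cf. Table 2)
metrics.ConfusionMatrix % cf. Figure 4
```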

3.3. Dice and Jaccard Similarity Index

To assess the performance of the method, the Dice similarity index, the Jaccard coefficient, and the mean squared error (MSE) of each tested image were calculated with reference to the ground truth image available in the same database. For comparison, each segmented image was converted into a label image matching the ground truth format. The experiment clearly shows that results of high visual quality were obtained, with an almost 80% Dice similarity index for each test image.

The Dice similarity coefficient of two sets x and y is defined as

$$\mathrm{DSC}(x, y) = \frac{2\,|x \cap y|}{|x| + |y|},$$

where |x| represents the cardinality of the set x and |y| represents the cardinality of the set y.

Similarly, the Jaccard similarity coefficient is defined as

$$J(x, y) = \frac{|x \cap y|}{|x \cup y|}.$$

MSE is defined as

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \bigl( g(i, j) - s(i, j) \bigr)^2,$$

where g(i, j) and s(i, j) stand for the pixel intensity values of the ground truth reference image of size M × N and of the simulated image of the same size, respectively. Both the Dice similarity index and the Jaccard similarity index are important parameters for determining how closely the images x and y are related, and IoU is used for determining how closely they are spatially matched, with no wrong mapping. Similarly, the MSE was calculated to confirm the similarity indices and the resemblance of the simulated result to the ground truth, with minimum loss of information.
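As a minimal sketch, assuming gt and seg are categorical label images of equal size, all three measures can be computed with Image Processing Toolbox functions:

```matlab
% Per-class Dice, Jaccard, and MSE between ground truth and segmentation
% (gt and seg are assumed to be categorical label images of equal size).
dsc = dice(gt, seg);     % Dice similarity coefficient, one value per class
jac = jaccard(gt, seg);  % Jaccard index (IoU), one value per class
err = immse(double(gt), double(seg));  % MSE over the underlying label codes
```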

4. Results and Discussion

The results obtained appear satisfactory and visually distinguishable. Figures 5(a)–5(g) show the results of the experiment. The first column (a) contains the original MRI images obtained from the OASIS database, which are cross-sectional T1 images; the next column (b) contains the ground truth, that is, the segmented version of the respective images in column (a). The third column (c) shows the main results, segmented using the proposed pixel-label-based method. The remaining three columns (d), (e), and (f) show the binary images extracted as the classification result of (c). The segmented image (c) is represented by gray-level intensity in (g), which is compared with the ground truth to evaluate the Dice similarity coefficient (DSC) of each class, namely, WM, GM, and CSF. Table 3 presents the performance parameters for each of the original images in column (a) of Figure 5.

4.1. Comparison with Other State-of-the-Art Methods

The computed mean DSC was approximately 80% (highest 84%, lowest 71%) over the 6 test images. For comparison, similar previous approaches to brain image segmentation are presented in Table 4. Zhang et al. [12] used a patchwise CNN on private data from 10 healthy infants, and Nie et al. [13] used a semantic segmentation approach on the same type of data. Our approach was superior to those of de Brebisson et al. [14] and Moeskops et al. [15] in terms of DSC, although the dataset used here consists of OASIS mid cross-sectional T1 MRI 2D images rather than the MICCAI 2012 atlas.

5. Conclusions

In conclusion, we have successfully applied a deep learning technique to image segmentation with convincing results. Specifically, we were able to segment closely related brain MRI images on a pixel-label basis using the encoder-decoder network of the SegNet architecture, which is generally used for semantic segmentation of outdoor scenes. This suggests that, with certain modifications and a simplified architecture, deep neural networks can be as effective in medical MRI image segmentation as in natural outdoor images.

Data Availability

The Open Access Series of Imaging Studies (OASIS) data were acquired through grants P50 AG05681, P01 AG03991, R01 AG021910, P20 MH071616, and U24 RR021382. The MRI Analyze-format (.img and .hdr) data used in the preparation of this article are publicly available from the OASIS database (http://www.oasis-brains.org/#data).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Brain Research Program through the National Research Foundation of Korea, funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C7A1046050). This work was also supported by a National Research Foundation of Korea grant funded by the Korean Government (NRF-2017R1A2B4006533).