Abstract

Accurate segmentation of brain tissue from magnetic resonance images (MRIs) is a critical task for diagnosis, treatment, and clinical research. In this paper, a novel algorithm (GMMD-U) is presented that combines a modified fully convolutional neural network, U-net, with a Gaussian-Dirichlet mixture model (GMMD) with spatial constraints. The proposed GMMD-U accounts for local spatial relationships by assuming that the prior probability obeys the Dirichlet distribution. Specifically, GMMD is applied to extract brain tissue that occupies a distinct intensity range, and the modified U-net is exploited to correct the wrong-classification areas produced by GMMD or other conventional approaches. GMMD-U is thus designed to take advantage of both statistical model-based segmentation techniques and deep neural networks. We evaluate the performance of GMMD-U on a publicly available brain MRI dataset by comparing it with several existing algorithms, and the reported results show that the proposed framework can accurately detect brain tissue from MRIs. The proposed learning-based integrated framework could be effective for brain tissue segmentation, which will be helpful for surgeons in brain disease diagnosis.

1. Introduction

Precise segmentation of human brain tissue from magnetic resonance images (MRIs) can aid in the identification and diagnosis of neurological diseases, such as Parkinson's disease and Alzheimer's disease. This is a challenging task because brain MRIs are severely affected by intensity nonuniformity, complex structure, and low contrast during acquisition. During the last decades, the large majority of MRI segmentation algorithms have been based on machine learning techniques [1]. In general, machine learning can be divided into supervised, semisupervised, and unsupervised approaches. For example, both random forests [2–4] and support vector machines [5] are typical supervised techniques commonly used in MRI segmentation. Recently, some new unsupervised algorithms have been proposed for extracting the required brain tissue, including grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF) [6, 7], with clustering analysis among the most studied [8, 9]. It has been found that clustering performance depends heavily on the selection of the initial cluster centres and that the segmentation results are sensitive to noise. The finite-mixture model (FMM) has become another important branch of unsupervised segmentation algorithms. It applies an unsupervised learning method to label the observed data and supposes that the intensity of each pixel obeys the chosen probability distribution. Typical FMMs include the Gaussian mixture model (GMM) [10], Student's-t mixture model (SMM) [11], circular mixture model [12], Rayleigh mixture model [13], etc. GMM is widely applied to model the uncertainty of the data by assuming that the conditional probability is a Gaussian distribution. It is the most commonly used model because the classical GMM is simple to implement and has very few parameters to estimate. However, GMM has difficulty modelling data with different shapes and is sensitive to outliers.
To overcome these shortcomings, segmentation techniques adopting mixture models with longer-tailed probability distributions have received wide consideration. For example, SMM, with its longer tail, can be regarded as an alternative to GMM [14]. However, one limitation of SMM is that its degrees-of-freedom parameter has no closed-form solution [15]. Another is that it cannot take into account the local spatial relationships between neighbouring pixels. An improvement can be obtained by combining the mixture models with the Markov random field (MRF) [16]. Additionally, the FMM has been incorporated with level set-based methods in some studies [17]. The unsupervised methods mentioned above are often subject to various uncertainties, such as the tissue intensity overlaps caused by limitations of the MRI acquisition process. These factors decrease the brain MRI segmentation performance. Instead of employing FMM for modelling the MRI tissue intensity, deep-learning-based algorithms (supervised or unsupervised) have received considerable attention in recently published studies [18]. As part of the broader family of machine learning, deep learning has achieved state-of-the-art results in MRI segmentation [18] and in other computer vision tasks including image retrieval [19], image classification [20], object detection [21], etc. The convolutional neural network (CNN) is a biologically inspired deep learning architecture related to the multilayer perceptron. The classical CNN framework consists of three main building blocks: convolutional layers, pooling layers, and fully connected layers. Some recent studies have claimed that it is also suitable for MRI segmentation and brain tissue detection [22, 23]. However, for the classical CNN, recognizing image patches is a tedious task.
For example, the algorithm introduced by Ciresan [24] needs a huge amount of time and redundant images for training because recognition on an image patch requires a sliding-window operation. Therefore, an end-to-end segmentation method called the fully convolutional network (FCN) was recently introduced [25, 26]; it makes pixel-wise predictions based on the images' ground truth and outputs the label map directly [27]. The purpose of the FCN is to extract critical feature maps and to restore these maps into image labels. This feature-focused procedure could be more suitable for precise segmentation, especially for medical images [28]. In fact, the capability of feature representation is the primary reason that FCN achieves great success in object detection, classification, and segmentation, but often only when a sufficient amount of training data is available. In the medical imaging field, however, data acquisition is expensive. Besides, other factors also limit data availability, such as privacy, regulations, and ethical concerns. Recently, U-net was proposed to extend the FCN by increasing and reusing more feature maps [29], so that U-net can perform well with relatively small amounts of training samples. The earliest U-net [29] was designed for labelling cells in light microscopic images and yielded very accurate segmentation results. Such a network usually produces high-precision segmentation mainly because its architecture includes more connections that copy and overlay down-sampling features.

Motivated by the aforementioned research, in this paper, we propose a novel fusion framework called GMMD-U, which incorporates the Gaussian-Dirichlet mixture model (GMMD) and a modified U-net to accurately analyse brain MRIs in terms of tissue types. The proposed algorithm differs from conventional clustering or FMM algorithms in the following respects. First, the proposed method exploits the advantages of deep learning, so that our algorithm can use a convolutional network to precisely recognize the uncertain regions. Second, this paper improves the classical U-net by attaching a padding operation and batch normalization in order to improve the convergence speed. More specifically, because CSF makes up a rather low proportion of the brain and its grey level is very close to the pure black background in most MRIs, the modified U-net should be able to learn and distinguish the shape of CSF as fast as possible. Third, this paper develops a novel FMM, the Gaussian-Dirichlet mixture model, which is a modified version of the classical GMM. Compared with our previous work [30], the proposed GMMD takes the local spatial and intensity information into consideration through the Dirichlet distribution, so that its performance is insensitive to noise and outliers. Fourth, in the proposed framework, the majority of pixels belonging to WM and GM can be accurately determined by the proposed GMMD module. The modified U-net is utilized to predict the CSF, as well as the error-prone regions produced by GMMD or other traditional methods. The experimental results on the Internet Brain Segmentation Repository (IBSR) dataset demonstrate that the proposed framework is superior to several existing unsupervised and supervised models.

The rest of this article is organized as follows. Section 2 addresses the construction of the modified GMM with spatial constraints and details the proposed fusion framework. The experimental results and discussion on the algorithm’s performance are given in Section 3. The final concluding remarks are provided in Section 4.

2. Methodology

The proposed fusion architecture consists of two fully convolutional networks (modified U-nets) and a Gaussian-Dirichlet mixture model with spatial constraints. The finite-mixture model is adopted in our fusion scheme mainly because it provides a statistical approach to modelling observed data in a probabilistic manner. Another advantage is that it classifies every pixel of an image into one of several labels, while the U-net structure can only separate an object from the background. More precisely, the output of U-net has only two labels, which means that WM, GM, and CSF cannot be extracted at once. The aim of improving the U-net framework in this study is to correct the wrongly labelled regions produced by conventional approaches. Therefore, the proposed approach combines the finite-mixture model scheme with the improved U-net to extract brain tissue more accurately. We start by introducing the modified GMM, which follows a Dirichlet distribution, for brain MRI segmentation.

2.1. Gaussian-Dirichlet Mixture Model with Spatial Constraints

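To consider the local spatial information between neighbouring pixels, this subsection presents a novel Gaussian-Dirichlet mixture model, which modifies the GMM by means of the Dirichlet distribution. Assume that each pixel $x_i$, $i = 1, 2, \ldots, N$, is an independent and identically distributed value in a grayscale image. Then, GMM assumes that $x_i$ is independent of the label $\Omega_j$, $j = 1, 2, \ldots, K$. To partition an image consisting of $N$ pixels into $K$ labels, GMM assumes the density function at pixel $x_i$ is as follows:

$$f(x_i \mid \Pi, \Theta) = \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \Theta_j), \tag{1}$$

where $\Theta_j = \{\mu_j, \Sigma_j\}$ denotes the GMM parameters and $\mu_j$ and $\Sigma_j$ are the mean and covariance of the Gaussian distribution, respectively. $\Pi = \{\pi_{ij}\}$ is the set of prior probabilities of pixel $x_i$ belonging to $\Omega_j$, which satisfies the constraints

$$0 \le \pi_{ij} \le 1, \qquad \sum_{j=1}^{K} \pi_{ij} = 1. \tag{2}$$

In (1), $\Phi(x_i \mid \Theta_j)$ is a component of the Gaussian distribution with the form

$$\Phi(x_i \mid \Theta_j) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_j|^{1/2}} \exp\left\{ -\frac{1}{2} (x_i - \mu_j)^{T} \Sigma_j^{-1} (x_i - \mu_j) \right\}, \tag{3}$$

where $D$ is the dimension of $x_i$. The joint conditional probability density of $X = (x_1, x_2, \ldots, x_N)$ is expressed by

$$p(X \mid \Pi, \Theta) = \prod_{i=1}^{N} \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \Theta_j). \tag{4}$$

Taking the natural logarithm of (4), we have the following log-likelihood function to be maximized:

$$L(\Pi, \Theta \mid X) = \sum_{i=1}^{N} \log \left\{ \sum_{j=1}^{K} \pi_{ij}\, \Phi(x_i \mid \Theta_j) \right\}. \tag{5}$$

Next, we introduce the Dirichlet distribution [33] into the classical GMM in order to model the spatial information between neighbouring pixels. First, the probability of the label is defined as

$$p(z_i \mid \alpha_i) = \int p(z_i \mid \xi_i)\, p(\xi_i \mid \alpha_i)\, d\xi_i, \tag{6}$$

where $z_i = (z_{i1}, z_{i2}, \ldots, z_{iK})$ is the $i$th pixel's discrete label taking the following form:

$$z_{ij} = \begin{cases} 1, & \text{if } x_i \in \Omega_j, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

Corresponding to (6), $\xi_i = (\xi_{i1}, \xi_{i2}, \ldots, \xi_{iK})$ is the vector of label probabilities governed by the Dirichlet distribution. It also satisfies the constraints

$$0 \le \xi_{ij} \le 1, \qquad \sum_{j=1}^{K} \xi_{ij} = 1. \tag{8}$$

Define $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{iK})$ with $\alpha_{ij} > 0$ in (6) as the vector format of the Dirichlet parameters. Then, according to the method introduced in [31, 34], the probability density function $p(z_i \mid \xi_i)$ can be written in the multinomial form

$$p(z_i \mid \xi_i) = \prod_{j=1}^{K} \xi_{ij}^{\,z_{ij}}. \tag{9}$$

Hence, the probability density function $p(\xi_i \mid \alpha_i)$ takes the form

$$p(\xi_i \mid \alpha_i) = \frac{\Gamma\left( \sum_{j=1}^{K} \alpha_{ij} \right)}{\prod_{j=1}^{K} \Gamma(\alpha_{ij})} \prod_{j=1}^{K} \xi_{ij}^{\,\alpha_{ij} - 1}. \tag{10}$$

This paper defines the Dirichlet parameters to incorporate the neighbourhood spatial information by

$$\alpha_{ij}^{(t+1)} = \frac{1}{N_i} \sum_{m \in \partial_i} \gamma_{mj}^{(t)}, \tag{11}$$

where $\partial_i$ is a window centred on $x_i$ and $N_i$ represents the number of pixels in it, that is, $x_i$ and the points around $x_i$. $\gamma_{ij}^{(t)}$ represents the posterior probability at iteration step $t$:

$$\gamma_{ij}^{(t)} = \frac{\pi_{ij}^{(t)}\, \Phi\left(x_i \mid \Theta_j^{(t)}\right)}{\sum_{k=1}^{K} \pi_{ik}^{(t)}\, \Phi\left(x_i \mid \Theta_k^{(t)}\right)}. \tag{12}$$

By (12), one can find that (11) can be regarded as a linear filter, where the value at pixel $x_i$ is replaced by the mean value of its corresponding window. To eliminate the effect of noise, generally, the size of the filter template can be or .

Taking (9) and (10), (6) can be expressed as

$$p(z_i \mid \alpha_i) = \frac{\Gamma\left( \sum_{j=1}^{K} \alpha_{ij} \right)}{\prod_{j=1}^{K} \Gamma(\alpha_{ij})} \int \prod_{j=1}^{K} \xi_{ij}^{\,z_{ij} + \alpha_{ij} - 1}\, d\xi_i. \tag{13}$$

According to the property of the probability density function (a Dirichlet density integrates to one), one has

$$\int \prod_{j=1}^{K} \xi_{ij}^{\,z_{ij} + \alpha_{ij} - 1}\, d\xi_i = \frac{\prod_{j=1}^{K} \Gamma(z_{ij} + \alpha_{ij})}{\Gamma\left( \sum_{j=1}^{K} (z_{ij} + \alpha_{ij}) \right)}. \tag{14}$$

Substituting (11) and (14) into (13) leads to

$$p(z_i \mid \alpha_i) = \frac{\Gamma\left( \sum_{j=1}^{K} \alpha_{ij} \right)}{\Gamma\left( \sum_{j=1}^{K} (z_{ij} + \alpha_{ij}) \right)} \prod_{j=1}^{K} \frac{\Gamma(z_{ij} + \alpha_{ij})}{\Gamma(\alpha_{ij})}. \tag{15}$$

The Dirichlet-based constraints consider the spatial information of neighbouring pixels in the form of linear filtering. More specifically, considering the discrete label in (7) and the Gamma function property $\Gamma(x + 1) = x\,\Gamma(x)$, the prior probability of (15) with $z_{ij} = 1$ for each pixel can be written as

$$\pi_{ij} = p(z_{ij} = 1 \mid \alpha_i) = \frac{\alpha_{ij}}{\sum_{k=1}^{K} \alpha_{ik}}. \tag{16}$$

In this case, we have the following new log-likelihood function:

$$L(\Psi) = \sum_{i=1}^{N} \log \left\{ \sum_{j=1}^{K} \frac{\alpha_{ij}}{\sum_{k=1}^{K} \alpha_{ik}}\, \Phi(x_i \mid \Theta_j) \right\}, \tag{17}$$

where $\Psi$ denotes the set of model parameters. Maximizing the log-likelihood is the same as minimizing its negative, and the logarithm is monotonically increasing. Thus, the new loss function is expressed by

$$J(\Psi) = -L(\Psi). \tag{18}$$

The change of $J$ between two successive iterations can be written as the difference

$$J\left(\Psi^{(t+1)}\right) - J\left(\Psi^{(t)}\right) = -\sum_{i=1}^{N} \log \frac{\sum_{j=1}^{K} \pi_{ij}^{(t+1)}\, \Phi\left(x_i \mid \Theta_j^{(t+1)}\right)}{\sum_{k=1}^{K} \pi_{ik}^{(t)}\, \Phi\left(x_i \mid \Theta_k^{(t)}\right)}. \tag{19}$$

Considering $\gamma_{ij}^{(t)} \ge 0$, $\sum_{j=1}^{K} \gamma_{ij}^{(t)} = 1$, and the Jensen inequality [35], expression (19) can be bounded as

$$J\left(\Psi^{(t+1)}\right) - J\left(\Psi^{(t)}\right) \le -\sum_{i=1}^{N} \sum_{j=1}^{K} \gamma_{ij}^{(t)} \log \frac{\pi_{ij}^{(t+1)}\, \Phi\left(x_i \mid \Theta_j^{(t+1)}\right)}{\pi_{ij}^{(t)}\, \Phi\left(x_i \mid \Theta_j^{(t)}\right)}. \tag{20}$$

Thus, minimizing the negative log-likelihood function in (18) is equivalent to minimizing the following error function:

$$E(\Psi) = -\sum_{i=1}^{N} \sum_{j=1}^{K} \gamma_{ij}^{(t)} \log \left\{ \pi_{ij}\, \Phi(x_i \mid \Theta_j) \right\}. \tag{21}$$

Next, we apply the gradient descent method [36] for parameter learning:

$$\Psi^{(t+1)} = \Psi^{(t)} - \eta \left. \frac{\partial E}{\partial \Psi} \right|_{\Psi = \Psi^{(t)}}, \tag{22}$$

where $\eta$ is the learning rate. The partial derivative of $E$ with respect to the mean $\mu_j$ is calculated by

$$\frac{\partial E}{\partial \mu_j} = -\sum_{i=1}^{N} \gamma_{ij}^{(t)}\, \Sigma_j^{-1} (x_i - \mu_j). \tag{23}$$

Similarly, we take the partial derivative of $E$ with respect to the covariance $\Sigma_j$ as

$$\frac{\partial E}{\partial \Sigma_j} = \frac{1}{2} \sum_{i=1}^{N} \gamma_{ij}^{(t)} \left[ \Sigma_j^{-1} - \Sigma_j^{-1} (x_i - \mu_j)(x_i - \mu_j)^{T} \Sigma_j^{-1} \right]. \tag{24}$$

Next, the partial derivative of $E$ with respect to the remaining parameter in $\Psi$ is considered, which yields (25). In detail, the computation process of the parameters is summarized as follows.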

Step 1. Initialize the prior probability $\pi_{ij}$, mean $\mu_j$, and variance $\sigma_j^2$ of each label by using K-means.

Step 2 (E-step). Calculate the posterior probability using (12).

Step 3 (M-step). Evaluate the mean $\mu_j$, the variance $\sigma_j^2$, and the remaining parameter in terms of (22), (23), (24), and (25). Update the prior probability $\pi_{ij}$ using (16).

Step 4. Check the convergence of the log-likelihood in (17). If the convergence criterion is not satisfied, increase the iteration counter $t$ and repeat Steps 2 to 4.
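
The loop in Steps 1 to 4 can be sketched in a few lines of pure Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it works on scalar grey levels, uses a square window for the neighbourhood mean of the posteriors, and, to stay short and numerically stable, replaces the gradient-descent update of the M-step with the closed-form weighted mean and variance. The function names (`gaussian`, `window_mean`, `gmmd_segment`) and the default settings are hypothetical.

```python
import math

def gaussian(x, mu, var):
    """Univariate Gaussian density for a scalar grey level."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def window_mean(post, r, c, k, radius):
    """Mean class-k posterior over the window around (r, c), clipped at the borders."""
    rows, cols = len(post), len(post[0])
    vals = [post[rr][cc][k]
            for rr in range(max(0, r - radius), min(rows, r + radius + 1))
            for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
    return sum(vals) / len(vals)

def gmmd_segment(img, mu, var, radius=1, max_iter=50, tol=1e-6):
    """EM-style loop: posterior, parameter update, spatially smoothed prior."""
    rows, cols, K = len(img), len(img[0]), len(mu)
    mu, var = list(mu), list(var)
    # Start from uniform priors (the paper initializes with K-means instead).
    prior = [[[1.0 / K] * K for _ in range(cols)] for _ in range(rows)]
    prev_ll = None
    for _ in range(max_iter):
        # E-step: posterior of every label at every pixel.
        post, ll = [], 0.0
        for r in range(rows):
            row = []
            for c in range(cols):
                w = [prior[r][c][k] * gaussian(img[r][c], mu[k], var[k]) for k in range(K)]
                s = max(sum(w), 1e-300)  # guard against underflow
                ll += math.log(s)
                row.append([wk / s for wk in w])
            post.append(row)
        # M-step (closed form here; the paper uses gradient descent).
        for k in range(K):
            wsum = sum(post[r][c][k] for r in range(rows) for c in range(cols))
            if wsum > 1e-12:
                mu[k] = sum(post[r][c][k] * img[r][c]
                            for r in range(rows) for c in range(cols)) / wsum
                v = sum(post[r][c][k] * (img[r][c] - mu[k]) ** 2
                        for r in range(rows) for c in range(cols)) / wsum
                var[k] = max(v, 1e-6)  # variance floor, an assumption of this sketch
        # Spatial prior: Dirichlet parameters as window means, then normalize.
        for r in range(rows):
            for c in range(cols):
                alpha = [window_mean(post, r, c, k, radius) for k in range(K)]
                total = sum(alpha)
                prior[r][c] = [a / total for a in alpha]
        # Convergence check on the log-likelihood.
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    labels = [[max(range(K), key=lambda k: post[r][c][k]) for c in range(cols)]
              for r in range(rows)]
    return labels, mu, var
```

On a toy image whose left half is dark and right half is bright, the call `gmmd_segment(img, mu=[0.3, 0.7], var=[0.05, 0.05])` would be expected to assign label 0 to the dark half and label 1 to the bright half.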

2.2. Training Procedure

In this paper, all experiments are performed on the IBSR 18 dataset [37], which provides manually guided expert segmentation results along with magnetic resonance brain image data. This dataset consists of MRIs and 3D ground-truth volumes of 18 brains of size with mm slice thickness. These volumes are provided after skull-stripping, normalization, and bias field correction. The ground truth is provided as manual segmentation by experts with tissue labels as for background, CSF, GM, and WM, respectively. In the proposed model, each MRI volume is read as 256 axial brain slices of size each. The proposed network structure of the training procedure can be divided into two modified U-net subnetworks and GMMD, as shown in Figure 1. To acquire the training set, we define the ground truth of CSF as a binary mask and then obtain the brain MRI without CSF by the following relation:

$$I_{\mathrm{w}} = I \odot \left(1 - G_{\mathrm{CSF}}\right), \tag{26}$$

where $I$ is the original brain MRI, $G_{\mathrm{CSF}}$ is the ground truth of CSF, and $\odot$ denotes the pixel-wise product. All original brain MRIs $I$ and CSF ground-truth images $G_{\mathrm{CSF}}$ are fed into the first modified U-net as the training dataset. This procedure is summarized in Training Model I, shown in Figure 1. To clarify this process, Figure 2 presents several examples of the extracted CSF and $I_{\mathrm{w}}$. After that, the brain MRIs without CSF are fed into the GMMD as the original input images. After the segmentation results of GM and WM have been obtained using the proposed GMMD, as shown in the schematic representation of Training Model II, the training set of wrong-classification regions is acquired from the difference between the regions labelled by GMMD and the ground truth without CSF. To clarify this process, we present four examples of extracting wrong-classification areas in Figure 3. The term 'wrong-classification area' indicates the pixels that are usually not sufficiently distinguished by the classical segmentation methods.
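
The two training-set construction steps described above can be illustrated with a small sketch. It assumes that images and masks are nested Python lists of equal shape, that the CSF ground truth is a 0/1 mask, and that tissue labels are plain integers; the function names `remove_csf` and `wrong_classification_mask` are hypothetical and are not taken from the paper.

```python
def remove_csf(image, csf_mask):
    """Zero out CSF pixels: keep a pixel only where the binary CSF mask is 0."""
    return [[px * (1 - m) for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, csf_mask)]

def wrong_classification_mask(pred_labels, true_labels, csf_mask):
    """Mark non-CSF pixels where the mixture-model label differs from the ground truth."""
    return [[1 if (m == 0 and p != t) else 0
             for p, t, m in zip(pred_row, true_row, mask_row)]
            for pred_row, true_row, mask_row in zip(pred_labels, true_labels, csf_mask)]

# Toy example (labels are assumed to be 1 = CSF, 2 = GM, 3 = WM).
image = [[10, 20], [30, 40]]
csf = [[0, 1], [0, 0]]
brain_without_csf = remove_csf(image, csf)
targets = wrong_classification_mask([[2, 0], [3, 3]], [[2, 1], [2, 3]], csf)
```

The first result would serve as input to the mixture model, and the second as the training target of the second U-net in this reading of the procedure.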
In the training phase, as can be seen from Figure 4, we found that the wrong-classification regions are almost always GM. More specifically, it can be found that almost of GM pixels are mistakenly classified as WM (see Figure 5). In most cases, sample images that are segmented by GMMD have few wrong-classification pixels, except for GM pixels. Due to the small proportion of WM area in the residual region, in this paper, the prediction of the second modified U-net module is regarded as the GM detector to correct the wrong-classification regions caused by GMMD or other classical segmentation algorithms.

2.3. Modified U-Net Framework

U-net [29] is a fully convolutional network that performs excellently in image segmentation. The classical U-net does not require a huge training set. Moreover, its training time is relatively short because it has a simple structure and fewer parameters than other networks. In this paper, we improve the classical U-net by attaching a padding operation and batch normalization in order to improve the convergence speed. This is important because batch normalization allows higher learning rates and requires less careful initialization. The whole architecture of the modified U-net is depicted in Figure 6. This network combines a feature-extracting path for collecting global features with an expanding path for locating the pixels that belong to those features. More specifically, the network structure in Figure 6 applies zero-padding to each convolution and places batch normalization after it. The ReLU layer, a nonlinear activation function, is used instead of the traditional sigmoid function. The ReLU function needs only a single threshold to activate and, moreover, reduces the computational complexity [26]. After two consecutive convolution operations, max-pooling with stride 2 is attached in the upper structure. Correspondingly, upsampling with stride 2 is performed in the lower structure. Before each max-pooling, the feature maps are copied and transferred to the corresponding location in the upsampling part. At the output layer, a convolution with the soft-max function is applied to calculate the probability of each pixel in the classification. Cross entropy is adopted as the loss function in this network; it describes the distance between the prediction and the real values of the proposed model [38]. The soft-max function is

$$S(a_i) = \frac{e^{a_i}}{\sum_{k=1}^{n} e^{a_k}}, \tag{27}$$

where $a_i$ is the logit value from the image-prediction matrix and $n$ is the size of the matrix. Then, substituting (27) into the equation of cross entropy yields

$$H = -\sum_{i=1}^{n} y_i \log p_i, \tag{28}$$

where $p_i$ represents $S(a_i)$ and $y_i$ is the ground-truth value.
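
As a concrete illustration of the soft-max and cross-entropy computation, the following minimal sketch evaluates both for a single vector of logits. It is written in plain Python rather than TensorFlow, subtracts the maximum logit before exponentiating for numerical stability, and clips probabilities with a small `eps`; these details and the function names are assumptions of the sketch.

```python
import math

def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot, eps=1e-12):
    """Cross entropy between soft-max probabilities and a one-hot ground truth."""
    probs = softmax(logits)
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

# Two equal logits give probability 0.5 each, so the loss equals log(2).
loss = cross_entropy([0.0, 0.0], [1, 0])
```

In a segmentation network, this quantity would be computed per pixel and averaged over the image.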

As illustrated in Figure 1, the proposed model utilizes two modified U-net modules, implemented in the TensorFlow framework [39], for learning the CSF and the wrong-classification areas. The first training model is for CSF detection and the second is for wrong-classification region prediction. Here, the wrong-classification region consists of the pixels that cannot be precisely classified using GMMD. Figure 7 shows the training output at several epochs of Training Models I and II. As indicated by the figure, the proposed training models achieve higher accuracy with increasing epochs. The curves of the loss function versus the number of iterations for each training procedure are depicted in Figure 8. The loss function decreases steadily with increasing iterations, which supports the effectiveness of the proposed training models.

3. Experimental Results

Figure 9 displays the flowchart of the test procedure using the proposed GMMD-U framework, which contains three parts corresponding to CSF detection (Part I), wrong-classification prediction (Part II), and remaining region clustering (Part III). We run the two trained U-net models on the testing set and produce heat maps to illustrate the prediction probabilities. In this paper, all testing samples are randomly selected from the IBSR 18 dataset, and the test results are presented in Figure 10. The prediction results are visually similar to the ground truth. This means that some regions that GMMD fails to label precisely can be predicted by the proposed model. Furthermore, comparing the ground truth (labels) and the predicted areas reported in Figure 10(b), one can observe that almost all of the predicted pixels belong to GM, because the predicted areas are essentially the same as the ground truth of GM. For the remaining regions, GMMD is better able to model the shape of the pixel regions. This is significant because these regions have fewer blurred boundaries and patches. Finally, for each MRI, the final segmentation result is assembled from three parts: CSF detected by the modified U-net I; GM and WM labelled using the posterior probability of the $i$th pixel of the MRI belonging to class $\Omega_j$; and GM and WM obtained for the uncertain pixel areas via the modified U-net II, that is, the wrong-classification prediction module.
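
One possible way to assemble the three parts into a single label map is sketched below. The paper does not state the exact fusion rule, so the precedence used here (background first, then CSF from U-net I, then the correction from U-net II treated as a GM detector, then the mixture-model label), the 0.5 threshold, and the integer label codes are all assumptions made for illustration.

```python
BACKGROUND, CSF, GM, WM = 0, 1, 2, 3  # hypothetical label codes

def fuse(image, csf_prob, gm_fix_prob, gmmd_labels, threshold=0.5):
    """Combine the U-net I, U-net II, and mixture-model outputs into one label map."""
    fused = []
    for img_row, csf_row, fix_row, lab_row in zip(image, csf_prob, gm_fix_prob, gmmd_labels):
        out = []
        for px, p_csf, p_fix, lab in zip(img_row, csf_row, fix_row, lab_row):
            if px == 0:
                out.append(BACKGROUND)   # skull-stripped background
            elif p_csf > threshold:
                out.append(CSF)          # Part I: CSF detection
            elif p_fix > threshold:
                out.append(GM)           # Part II: corrected region, treated as GM
            else:
                out.append(lab)          # Part III: mixture-model label
        fused.append(out)
    return fused

result = fuse(
    image=[[0, 5], [7, 9]],
    csf_prob=[[0.9, 0.8], [0.1, 0.2]],
    gm_fix_prob=[[0.0, 0.1], [0.7, 0.3]],
    gmmd_labels=[[WM, WM], [WM, WM]],
)
```

Other precedence orders or soft combinations of the heat maps are equally conceivable; the sketch only makes the three-part structure explicit.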

We have evaluated the proposed fusion algorithm on the public brain dataset IBSR 18. Specifically, we focus on the segmentation of clinical MRI and aim to separate the three parts of the real brain image data, namely CSF, GM, and WM. For experimentation purposes, all MR images from the IBSR 18 dataset are preprocessed by removing blank images and extracting the main brain parts from the skull. Thus, there are 1000 images of size that are selected from ten random subjects for training, and 300 images are taken from three other subjects for testing. The ground-truth images are divided into four parts: background, CSF, GM, and WM. Our experiments were carried out in MATLAB 2016a with the TensorFlow library.

The experiment first runs the proposed framework on several brain MRIs (slice 92, slice 23, slice 40, and slice 96 in IBSR 18). The corresponding results for the four different patients are presented in Figure 11. Visually, these results indicate that our supervised network performs well in detecting the CSF and the wrong-classification pixels. In addition, we observe that, regardless of which slices are used, the proposed model recovers rich details. In the next experiment, the performance of GMMD-U is compared with that of FCM, GMM, K-means, SMM-SC (see [31]), CBCLO (see [32]), and the classical U-net on a brain MRI (slice 70 of one patient) in IBSR 18, and the results are provided in Figure 12. The classical U-net algorithm used here is divided into three parts to handle the segmentation of CSF, GM, and WM. Based on visual comparison with the ground truth, GMMD-U performs better than the other methods compared. The SMM-SC method achieves better segmentation results than CBCLO.

In order to make a quantitative comparison of the different algorithms, this paper uses the Dice similarity coefficient [40], which is widely accepted in the image-processing field for evaluating the performance of segmentation algorithms. The Dice coefficient between each part's prediction and its ground truth is calculated by

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \tag{29}$$

where $A$ denotes all pixels that belong to the ground truth of a tissue in the brain image and $B$ represents the pixels predicted as that tissue. The Dice coefficient lies in the range $[0, 1]$, and a high value indicates high performance of the segmentation approach. The experiment determines the performance of five algorithms on the testing set. Figure 13 shows the Dice coefficient of the segmentation performance of FCM, K-means, GMM, SMM-SC (see [31]), CBCLO (see [32]), classical U-net, and GMMD-U on the brain MRI of one patient (IBSR 18). It is clear from this figure that GMMD-U shows the highest Dice coefficient values among all algorithms and the smallest fluctuation range for each tissue. These results indicate that GMMD-U is more accurate than several related methods, such as classical U-net, SMM-SC (see [31]), CBCLO (see [32]), K-means, GMM, and FCM. Figure 14 shows the Dice coefficient obtained for the whole testing image dataset (IBSR 18) after applying the five methods. From this figure, consistently high Dice coefficients can be observed for the GMMD-U algorithm, which indicates that brain tissues are better recognized. The classical U-net model has higher Dice coefficients and achieves better accuracy than FCM, K-means, GMM, SMM-SC, and CBCLO.
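
For reference, the Dice coefficient of one tissue label can be computed as in the following minimal sketch. It assumes that the prediction and the ground truth are integer label maps of equal shape, and it returns 1.0 when the label is absent from both maps, which is a convention chosen here rather than one stated in the paper.

```python
def dice(pred, truth, label):
    """Dice similarity coefficient of one label between two integer label maps."""
    n_truth = n_pred = n_both = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            n_truth += (t == label)
            n_pred += (p == label)
            n_both += (p == label and t == label)
    if n_truth + n_pred == 0:
        return 1.0  # label absent from both maps (convention of this sketch)
    return 2.0 * n_both / (n_truth + n_pred)

pred = [[1, 1], [2, 2]]
truth = [[1, 2], [2, 2]]
score_label_2 = dice(pred, truth, 2)  # 2 * 2 / (3 + 2)
```

Evaluating the function once per tissue label (CSF, GM, WM) gives the per-tissue scores that are compared across methods.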

4. Conclusions

In this paper, we proposed a new fusion framework for brain MRI segmentation based on two fully convolutional networks (modified U-nets) and the Gaussian-Dirichlet mixture model with spatial constraints. Compared with other finite-mixture models, our proposed method can accurately extract different brain tissues, such as WM, GM, and CSF. The advantage of the proposed framework is that it incorporates local spatial constraints through the Dirichlet distribution. Furthermore, the modified U-net can accurately distinguish tissue that would otherwise be wrongly classified. The experimental results on the brain MRIs from the IBSR 18 dataset illustrated the effectiveness of GMMD-U on segmentation tasks.

Data Availability

The data used to support the findings of this study have been deposited in the Internet Brain Segmentation Repository, https://www.nitrc.org/projects/ibsr; see also [37] of this paper.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61872143).