Abstract

Segmentation of the prostate from Magnetic Resonance Imaging (MRI) plays an important role in prostate cancer diagnosis. However, the lack of clear boundaries and the significant variation in prostate shape and appearance make automatic segmentation very challenging. In the past several years, approaches based on deep learning have made significant progress on prostate segmentation. However, those approaches mainly attend to features and contexts within each single slice of a 3D volume. As a result, they face many difficulties when segmenting the base and apex of the prostate, where the slice boundary information is limited. To tackle this problem, we propose a deep neural network with bidirectional convolutional recurrent layers for MRI prostate image segmentation. In addition to utilizing the intraslice contexts and features, the proposed model treats prostate slices as a data sequence and utilizes the interslice contexts to assist segmentation. The experimental results show that the proposed approach achieves a significant segmentation improvement compared with other reported methods.

1. Introduction

Accurately segmenting the prostate from Magnetic Resonance Imaging (MRI) provides very useful information for clinical applications like computer aided diagnosis and image guided interventions [1]. However, it is a very challenging task due to the lack of clear boundary definition and the significant variation of shape and texture across images from different patients [2] as shown in Figure 1.

According to the guidance information used in the model, we can classify the existing prostate MRI segmentation methods into four kinds: region based, shape prior based, contour based, and classification methods [3–5]. Toth et al. [6] presented an Active Shape Model (ASM) initialization scheme for prostate segmentation which leverages multimodal information to initialize the ASM. Samiee et al. [7] proposed a model that uses a shape prior of the prostate to refine the prostate boundary. Klein [8] proposed an automatic segmentation model based on matching manually segmented atlas images.

In the past several years, deep learning based techniques, especially fully convolutional neural networks (FCNs), have proved very effective for image segmentation [9–11], including biomedical image segmentation. Zhu et al. [12] proposed a deeply supervised CNN that utilized residual information to accurately segment the prostate in MRI. Bao and Chung [13] introduced a multiscale structured FCN model for brain MRI segmentation that captures discriminative features from each input patch. Other examples of introducing deep learning into biomedical image segmentation can be found in [14–17].

However, a straightforward extension of those 2D image segmentation methods to 3D may not yield satisfactory performance, due to the anisotropic nature of many medical imaging modalities. To tackle this problem, Chen et al. [18] proposed a method to combine fully convolutional neural networks with the extended Convolutional Long Short-Term Memory (C-LSTM), which improved 3D medical image segmentation performances by simultaneously leveraging the abstraction capabilities of both FCNs and RNNs. Nevertheless, their model relies on the so-called U-Net [18] to extract image features. The following Bidirectional Convolutional LSTMs (BDC-LSTMs) only work on the extracted features. Thus, useful information for assisting image segmentation may be lost.

For the prostate segmentation task, a major challenge is that the image slices at the apex and base often lack clear boundaries and necessary information. This is the main difficulty in prostate segmentation. However, a shape prior is an effective way to address this challenge. For instance, Qin et al. [19] proposed an adaptive shape prior constrained directional level set model (ASPDLS) to segment the inner and outer boundaries of the bladder wall and achieved accurate segmentation results. Motivated by the fact that the acquired MRI images typically have a high intraslice resolution and that there is a high spatial dependence between slices from the same patient, we utilize interslice information as a shape prior to guide the process of feature extraction and explore necessary information from neighboring slices to alleviate information loss, as shown in Figure 2. In addition, RNN architectures perform very well in modeling sequential data [17, 20, 21]. To improve the performance of prostate segmentation, in this paper we propose a network, called UR-Net, which treats prostate slices as a data sequence and utilizes the interslice contexts, in addition to the intraslice contexts and features, to assist segmentation.

There are two main contributions of this paper. First, we treat prostate slices as a data sequence and utilize interslice information as a shape prior to guide the process of feature extraction and to recover necessary information from neighboring slices. Second, we explore the power of RNNs, rather than traditional CNNs, to extract image features. The experimental results demonstrate that the use of RNNs can substantially improve the performance of prostate segmentation.

The rest of the paper is organized as follows. The architecture of Recurrent Neural Network and the details of proposed network architecture are described in Section 2. Section 3 presents the experimental results and performance evaluation. The conclusions are provided in Section 4.

2. UR-Net

In this section, we first review the classic Recurrent Neural Networks (RNNs) and then move on to describe the extension of RNNs to Long Short-Term Memory (LSTM) [22] and Convolutional Long Short-Term Memory (CLSTM) [18] which are specific Recurrent Neural Networks. After that, the proposed UR-Net is presented in detail.

2.1. Recurrent Neural Networks (RNNs)

The Recurrent Neural Network (RNN), which was designed to model temporal sequences, has a long history in the artificial neural network community. The architecture of a typical RNN is shown in Figure 3. This model has shown great promise in many tasks, such as NLP [23], non-Markovian control, and text tasks [24]. The idea behind RNNs is to make use of sequential information, with the output depending on the previous computations. RNNs have a memory that retains information about what has been calculated so far. In theory [25], RNNs can retain information over arbitrarily long sequences, but in practice they are limited to looking back only a few steps because of the vanishing gradient problem.

At each time step $t$, the RNN uses the input data $x_t$ and the previous hidden state $h_{t-1}$ to calculate the next hidden state $h_t$ and the output $y_t$ by applying the following recursive operation:

$$h_t = f(W x_t + U h_{t-1} + b_h), \qquad y_t = V h_t + b_y, \tag{1}$$

where $f$ is an element-wise nonlinearity function; $W$, $U$, and $b_h$ are the parameters of the hidden state; and $V$ and $b_y$ are the output parameters.
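The recursion can be illustrated with a short NumPy sketch. It assumes the common form $h_t = f(W x_t + U h_{t-1} + b_h)$, $y_t = V h_t + b_y$ with $f = \tanh$; the symbol names, the dimensions, and the random weights are illustrative assumptions rather than values taken from this paper.

```python
import numpy as np

def rnn_forward(xs, W, U, b_h, V, b_y, h0=None):
    """Run a vanilla RNN over xs of shape (T, input_dim).

    Returns the hidden states (T, hidden_dim) and outputs (T, output_dim).
    """
    h = np.zeros(U.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b_h)  # hidden state depends on input and previous state
        y = V @ h + b_y                   # linear read-out of the hidden state
        hs.append(h)
        ys.append(y)
    return np.stack(hs), np.stack(ys)

# Illustrative sizes and random parameters (assumptions, not from the paper).
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_y = np.zeros(output_dim)

xs = rng.normal(size=(T, input_dim))
hs, ys = rnn_forward(xs, W, U, b_h, V, b_y)
```

Because each hidden state is computed from the previous one, the same weights are reused at every step, which is also why gradients are multiplied repeatedly through $U$ during training.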

During the last decade, several methods have been explored for training RNNs, such as backpropagation through time (BPTT) [26], real-time recurrent learning (RTRL) [27], and extended Kalman filtering based techniques (EKF) [28]. Though those training methods can help us train RNNs, they suffer from the vanishing gradient problem.

2.2. Long Short-Term Memory (LSTM)

To address the problem of vanishing gradient and long-term dependency residing in RNNs [29], a special implementation of RNN, Long Short-Term Memory (LSTM), was introduced by Hochreiter and Schmidhuber [22]. The architecture of LSTM is shown in Figure 4.

One LSTM unit consists of an input gate ($i_t$), a forget gate ($f_t$), an output gate ($o_t$), and a memory cell ($c_t$), which together give it the ability to remember or forget information over potentially long periods of time. The input gate controls which input data are written into the memory cell. The forget gate decides what information is thrown away from the memory cell. The output gate decides which parts of the data in the memory cell are output and simultaneously controls the data flow into the rest of the network.

The steps of LSTM can be described as follows. The first step in the LSTM model corresponds to (2) and decides what information should be discarded from the cell state. This decision is made by the forget gate, whose architecture is shown in Figure 5. The forget gate has a forgetting layer which consists of a sigmoid function. Given the data from the previous stage $h_{t-1}$ and the input data $x_t$, the sigmoid function outputs a number between 0 and 1 for each entry in the cell state $c_{t-1}$; 0 represents completely throwing the entry away, while 1 represents keeping it.

The second step is to decide what new information should be stored in the cell. The input gate consists of a sigmoid function and a tanh function, as shown in Figure 6. When the input gate receives new data, the sigmoid function decides which values will be updated and the tanh function creates the candidate values $\tilde{c}_t$; these operations correspond to (3). Finally, the input gate controls how the candidate values update the cell state.

Once we have the information coming from the forget gate and the input gate, we can use (4) to update the cell state. This operation drops information that is no longer useful.

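The fourth step is to decide what information to output, based on the cell state. The final result consists of two parts: the first comes from the cell state, passed through a tanh function; the second comes from the input data, selected by a sigmoid function. The output gate is shown in Figure 7 and is computed as

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \qquad h_t = o_t \odot \tanh(c_t).$$

Putting those together, the gates at discrete time $t$ are computed as follows:

\begin{align}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \tag{2} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \qquad \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \tag{3} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{4} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \qquad h_t = o_t \odot \tanh(c_t), \tag{5}
\end{align}

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $W_\ast$, $U_\ast$, and $b_\ast$ are the weights and biases of the corresponding gate.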
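The four steps described above can be sketched as a single NumPy function. This is a minimal illustration assuming the standard LSTM formulation (forget gate, input gate with candidate values, cell update, output gate); the parameter names `W_*`, `U_*`, `b_*`, the helper `init_params`, and the sizes are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_f' to arrays."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                                # cell state update
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                          # new hidden state
    return h_t, c_t

def init_params(input_dim, hidden_dim, rng):
    """Hypothetical helper: small random weights and zero biases."""
    p = {}
    for name in "fico":
        p["W_" + name] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U_" + name] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b_" + name] = np.zeros(hidden_dim)
    return p

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
params = init_params(input_dim, hidden_dim, rng)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x_t, h, c, params)
```

When the forget gate saturates near 1 and the input gate near 0, the cell state is copied forward almost unchanged, which is the mechanism that lets the LSTM carry information across many steps.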

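The standard LSTM architecture is designed for one-dimensional data and cannot be directly applied to 2D image data. To apply LSTM to prostate images, our model uses Convolutional LSTM (CLSTM) layers in place of convolutional layers. This is achieved by replacing the matrix multiplications with convolution operators. The core equations of CLSTM are

\begin{align}
f_t &= \sigma(W_f * x_t + U_f * h_{t-1} + b_f), \\
i_t &= \sigma(W_i * x_t + U_i * h_{t-1} + b_i), \\
\tilde{c}_t &= \tanh(W_c * x_t + U_c * h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
o_t &= \sigma(W_o * x_t + U_o * h_{t-1} + b_o), \\
h_t &= o_t \odot \tanh(c_t),
\end{align}

where $*$ denotes the convolution operator and $h_t$ is the output of the layer; the index $t$ runs over slices, reflecting the fact that the CLSTM works slice by slice in a certain direction.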
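To make the substitution of convolution for matrix multiplication concrete, the following NumPy sketch implements one CLSTM step on small feature maps. It is an illustration under stated assumptions: zero-padded "same" convolutions with odd kernel sizes, cross-correlation as used by most deep learning libraries, and hypothetical parameter names and sizes. It is not the authors' Keras implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """Zero-padded 'same' 2D convolution (cross-correlation convention).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    out = np.zeros((w.shape[0], height, width))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + height, dx:dx + width]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def clstm_step(x_t, h_prev, c_prev, p):
    """One CLSTM step: the LSTM gates with convolutions instead of matmuls."""
    def gate(name, act):
        z = (conv2d_same(x_t, p["W_" + name])
             + conv2d_same(h_prev, p["U_" + name])
             + p["b_" + name])
        return act(z)
    f_t = gate("f", sigmoid)
    i_t = gate("i", sigmoid)
    c_tilde = gate("c", np.tanh)
    o_t = gate("o", sigmoid)
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Illustrative sizes (assumptions): 1 input channel, 2 hidden channels, 3x3 kernels.
rng = np.random.default_rng(0)
c_in, c_hid, k, height, width = 1, 2, 3, 8, 8
p = {}
for name in "fico":
    p["W_" + name] = rng.normal(scale=0.1, size=(c_hid, c_in, k, k))
    p["U_" + name] = rng.normal(scale=0.1, size=(c_hid, c_hid, k, k))
    p["b_" + name] = np.zeros((c_hid, 1, 1))

slices = rng.normal(size=(3, c_in, height, width))  # three adjacent slices
h = np.zeros((c_hid, height, width))
c = np.zeros((c_hid, height, width))
for x_t in slices:
    h, c = clstm_step(x_t, h, c, p)
```

The hidden and cell states keep their spatial layout, so the information carried from one slice to the next stays aligned with image coordinates.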

2.3. The Proposed Network Architecture

In order to exploit the interslice information effectively, we introduce a Bidirectional Convolutional LSTM (BDC-LSTM) layer into our deep learning network. A BDC-LSTM layer consists of two sets of CLSTMs to extract features as shown in Figure 8. The two CLSTM streams work in two opposite directions. Rather than serializing each prostate image into sequential patches and then leveraging Bidirectional LSTM to segment each patch, in our method, we treat each image as a whole and three adjacent image slices compose a sequence.

When we put one image sequence, denoted by $x_{t-1}$, $x_t$, $x_{t+1}$, into the BDC-LSTM layer as shown in Figure 8, the layer utilizes interslice and intraslice information to extract prostate features. First, the layer extracts the features of the first slice. Then the results for $x_{t-1}$ and $x_t$ are treated in turn as a shape prior and combined with the following slice, $x_t$ and $x_{t+1}$ respectively, as input to guide the segmentation process. Simultaneously, the layer extracts the features of each slice in the opposite direction, from $x_{t+1}$ to $x_{t-1}$. Finally, the two sets of feature maps are concatenated to form the input of the next layer.
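The bidirectional processing of a three-slice sequence can be sketched as follows. The recurrent step here is a deliberately simple stand-in (`toy_step`, a hypothetical function) rather than a real CLSTM, so the sketch only shows the data flow: one pass in each direction, re-alignment of the backward outputs, and channel-wise concatenation.

```python
import numpy as np

def run_direction(slices, step, h0):
    """Apply a recurrent step along the sequence, carrying the state forward."""
    h, outputs = h0, []
    for x in slices:
        h = step(x, h)
        outputs.append(h)
    return outputs

def bidirectional(slices, step_fwd, step_bwd, h0):
    """Run both directions and concatenate the features per slice (channel axis 0)."""
    fwd = run_direction(slices, step_fwd, h0)
    bwd = run_direction(slices[::-1], step_bwd, h0)[::-1]  # re-align to slice order
    return [np.concatenate([f, b], axis=0) for f, b in zip(fwd, bwd)]

def toy_step(x, h):
    # Hypothetical stand-in for a CLSTM step: blends the slice with the carried state.
    return np.tanh(x + 0.5 * h)

# Three adjacent "slices", each with 1 channel and 4x4 pixels (illustrative values).
slices = [np.full((1, 4, 4), v) for v in (0.1, 0.2, 0.3)]
h0 = np.zeros((1, 4, 4))
features = bidirectional(slices, toy_step, toy_step, h0)
```

In this sketch the middle slice is the only one that receives context from both neighbors, while the first and last slices each have one channel computed without any prior context.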

Our proposed network architecture is shown in Figure 10. The main framework of our proposed method follows the architecture of U-Net [30], since U-Net can successfully extract image features for segmentation with a reasonable network depth. As a matter of fact, U-Net has obtained state-of-the-art performances in many biomedical image processing tasks. For example, Milletari et al. [31] proposed a fully convolutional neural network for volumetric medical image segmentation, called V-Net. This model leverages the power of U-Net to process MRI volumes. The architecture of U-Net is shown in Figure 9.

The proposed network architecture consists of a contracting path on the left, an expansive path on the right, and a classification path on the bottom. Both the contracting path and the expansive path have 4 stages, and each stage consists of one BDC-LSTM layer. A softmax layer is added at the end of the network. In the contracting path, each stage is followed by a max pooling operation with a stride of 2 for downsampling, and the number of feature channels is doubled after each stage. In the expansive path, the first step of each stage is upsampling, which doubles the width and height of the feature maps each time until they reach the size of the original images. At the same time, upsampling halves the number of feature channels. To reduce information loss during convolution, the feature maps of the contracting path are concatenated with those of the expansive path. The concatenation provides features extracted at early stages to the later stages and can also speed up the convergence of the network. To avoid overfitting, dropout operations are added at the end of each stage.
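The effect of pooling and upsampling on the feature map sizes can be traced with a few lines of Python. The input size of 256 follows the resizing described in the training strategy, but the starting channel count of 64 is an assumption borrowed from the usual U-Net configuration, and the entry labelled "bottom" is simply the lowest-resolution point of the U in this sketch.

```python
def trace_shapes(size=256, base_channels=64, stages=4):
    """Return (path, stage, spatial_size, channels) for each point of the U."""
    shapes = []
    channels = base_channels
    for stage in range(1, stages + 1):
        shapes.append(("contract", stage, size, channels))
        size //= 2      # max pooling with stride 2 halves width and height
        channels *= 2   # the number of feature channels is doubled after each stage
    shapes.append(("bottom", 0, size, channels))
    for stage in range(1, stages + 1):
        size *= 2       # upsampling doubles width and height
        channels //= 2  # and halves the number of feature channels
        shapes.append(("expand", stage, size, channels))
    return shapes

shapes = trace_shapes()
for path, stage, size, channels in shapes:
    print(f"{path:8s} stage {stage}: {size}x{size}, {channels} channels")
```

After four upsampling steps the feature maps return to the input resolution, which is what allows a per-pixel softmax at the end of the network.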

2.4. Network Objective Function

For prostate images, the anatomy of interest usually occupies a very small part of an image. As a result, networks tend to ignore the region to be segmented and become biased towards the background, which often leaves the learning process trapped in local minima. To overcome this problem, we apply the dice coefficient as the objective function, which gives more weight to the region to be segmented. The dice coefficient (DSC) [19] between two images can be written as

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|},$$

where $A$ denotes the result of automatic segmentation and $B$ denotes the result of manual segmentation.

In our work, the ground truth and the segmentation results are binary images, so the dice coefficient DSC between two binary images can be written as

$$\mathrm{DSC} = \frac{2 \sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2},$$

where $N$ denotes the total number of pixels in the image, and $g_i$, $p_i$ denote the pixels from the ground truth and the segmentation, respectively.

This formulation of dice can be differentiated, yielding the gradient:

$$\frac{\partial\, \mathrm{DSC}}{\partial p_j} = 2 \left[ \frac{g_j \left( \sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2 \right) - 2 p_j \sum_{i=1}^{N} p_i g_i}{\left( \sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2 \right)^2} \right].$$

Besides, Milletari et al. [31] have shown that a network trained with the DSC objective performs much better than the same network trained with a logistic loss, which helps keep the network from becoming trapped in local minima.
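The dice objective and its gradient can be written directly in NumPy. This sketch assumes the formulation of Milletari et al. [31], $\mathrm{DSC} = 2\sum_i p_i g_i / (\sum_i p_i^2 + \sum_i g_i^2)$, with $p$ the predicted map and $g$ the ground truth; the small constant `eps` is an added assumption to avoid division by zero and is not part of the formula.

```python
import numpy as np

def dice(p, g, eps=1e-7):
    """Dice coefficient between a prediction p and a ground truth g of equal shape."""
    p, g = p.ravel(), g.ravel()
    return 2.0 * np.sum(p * g) / (np.sum(p ** 2) + np.sum(g ** 2) + eps)

def dice_grad(p, g, eps=1e-7):
    """Analytic gradient of dice(p, g) with respect to every entry of p."""
    pf, gf = p.ravel(), g.ravel()
    inter = np.sum(pf * gf)
    denom = np.sum(pf ** 2) + np.sum(gf ** 2) + eps
    grad = 2.0 * (gf * denom - 2.0 * pf * inter) / denom ** 2
    return grad.reshape(p.shape)

# Toy example: a 2x2 foreground block inside a 4x4 image.
truth = np.zeros((4, 4))
truth[1:3, 1:3] = 1.0
pred = np.where(truth == 1.0, 0.9, 0.1)  # confident but imperfect prediction

score = dice(pred, truth)
grad = dice_grad(pred, truth)
```

Because the numerator only counts foreground overlap, a prediction that is entirely background scores zero regardless of how many background pixels it gets right, which is the property that counters the bias towards the background.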

3. Experimental Results

3.1. Materials

The MRI prostate images used in our work as shown in Figure 11 were acquired from 80 patients using a Philips 3T MRI scanner with endorectal coil. The in-plane resolution is 0.3 mm × 0.3 mm and interslice distance is 3 mm. Each patient image volume consists of about 26 slices. The dimension of each 2D slice is 512 × 512 pixels.

3.2. Training Strategy

We randomly selected 76 of the 80 patients for training and used the remaining patients for testing. During training, we feed three sequential slices, denoted by $x_{t-1}$, $x_t$, $x_{t+1}$, from one patient into the network. The BDC-LSTM layers then exploit intraslice and interslice contextual information from two directions, one along the forward slice direction and the other along the reverse direction, as shown in Figure 12. Our network is trained end-to-end on the prostate scan dataset. The network is implemented with the open-source deep learning library Keras [32]. Experiments are carried out on a GTX1080 GPU with 8 GB of video memory, using CUDA 8.0. In the training phase, the learning rate is initially set to 0.0001. Because of the memory limit, we use a mini-batch size of 1. All training images and ground truth maps are resized to 256 × 256.
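The data preparation described above can be sketched in NumPy. The random split follows the 76/4 division stated in the text, while the use of overlapping windows with a stride of one slice is an assumption, since the text does not say how the three-slice sequences are drawn from each volume. The volume here is synthetic and only tags each slice with its index.

```python
import numpy as np

def adjacent_triplets(volume):
    """Group a volume (num_slices, H, W) into overlapping 3-slice sequences.

    Returns an array of shape (num_slices - 2, 3, H, W). The stride of one
    slice is an assumption made for this sketch.
    """
    n = volume.shape[0]
    return np.stack([volume[i:i + 3] for i in range(n - 2)])

# Random patient-level split: 76 for training, the remaining 4 for testing.
rng = np.random.default_rng(0)
patients = rng.permutation(80)
train_ids, test_ids = patients[:76], patients[76:]

# Synthetic stand-in for one resized patient volume of 26 slices.
volume = np.zeros((26, 256, 256), dtype=np.float32)
volume += np.arange(26, dtype=np.float32)[:, None, None]  # slice i is filled with value i

triplets = adjacent_triplets(volume)
```

Splitting at the patient level, rather than at the slice level, keeps slices from the same patient out of both the training and testing sets.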

3.3. Experiments

To validate whether a deep neural network with RNN layers can significantly improve segmentation accuracy, we also modify the FCNs by replacing their convolutional layers with BDC-LSTM layers. Each testing image comes with a corresponding ground truth segmentation map, a binary image that is used to evaluate the performance of automatic segmentation. We then compare our model with U-Net, V-Net [31], fully convolutional networks (FCNs), and the modified FCNs. Some of the segmentation results of our network are shown in Figure 13.

3.3.1. Qualitative Comparison

From the segmentation results, we selected some representative and challenging images, which have fuzzy boundaries and inhomogeneous pixel intensity distributions both inside and outside the prostate. In addition, the prostate and nonprostate regions in those images have similar intensity distributions, as shown in Figure 14.

As presented in the third column, the FCNs can detect and segment only a part of the prostate. The segmentation results are not accurate because the FCN model assigns labels to small patches rather than to each pixel. In addition, the FCNs ignore boundary information. Therefore, the FCN model cannot be directly used for the prostate segmentation problem.

As shown in the fourth column, the U-Net model obtains more accurate segmentation results than the FCNs. This is because U-Net assigns a label to every pixel, and its architecture enhances information propagation through the whole network and improves the network performance. However, for the slices at the apex and base, which lack clear boundaries and complete texture, the model cannot segment the prostate accurately.

The results of the modified FCNs are shown in the fifth column. Compared with the original FCNs, the segmentation results of the modified FCNs are more accurate. From the results, we can see that the modified FCNs detect more of the prostate under the guidance of the previous slice. This improvement can be attributed to the BDC-LSTM architecture. Compared with a traditional convolutional layer, the BDC-LSTM layer can recover missing information from adjacent slices and thereby enhance the performance of the network.

The sixth column shows the results of V-Net. Compared with FCNs and U-Net, V-Net can make full use of the 3D spatial information of the volumetric data. However, because of the limited data and memory, V-Net can receive only a local volume each time, which leaves it unable to obtain global information. From Figure 14, we can see that the prostate boundaries lose continuity and curvature.

The results of UR-Net are shown in the seventh column. We can observe that the model achieves the best results on prostate segmentation. This can be attributed to the fact that a sequence of prostate slices provides more information than a single slice, and the model utilizes this interslice information to aid the segmentation process.

3.3.2. Quantitative Comparison

To quantitatively evaluate the segmentation results, we computed three statistics of the DSC values, the mean, maximum, and median, as shown in Table 1. From Table 1, it can be seen that our proposed model obtained the highest scores among all the methods. This shows that a deep neural network with BDC-LSTM layers can obtain promising improvements in prostate MRI image segmentation. In addition, the modified FCNs obtain more accurate segmentation results than the original FCNs. This improvement can be attributed to the BDC-LSTM layers, which utilize interslice information as a shape prior to guide the process of feature extraction and recover necessary information from neighboring slices to alleviate information loss, thereby improving the segmentation results.
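The evaluation procedure can be sketched as follows: compute the DSC of each binary prediction against its ground truth and then summarize the scores by mean, maximum, and median. The masks below are made-up toy arrays used purely to exercise the code; they are not data or results from this paper, and the convention of returning 1.0 for two empty masks is an assumption of this sketch.

```python
import numpy as np

def binary_dsc(pred, truth):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * np.logical_and(pred, truth).sum() / total

def summarize(scores):
    scores = np.asarray(scores, dtype=float)
    return {
        "mean": float(scores.mean()),
        "max": float(scores.max()),
        "median": float(np.median(scores)),
    }

# Toy masks (placeholders, not results from the paper).
truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1                       # 4 foreground pixels

pred_exact = truth.copy()                 # perfect overlap
pred_wide = np.zeros((4, 4), dtype=int)
pred_wide[1:3, 1:4] = 1                   # 6 pixels, 4 of them correct
pred_small = np.zeros((4, 4), dtype=int)
pred_small[1, 1:3] = 1                    # 2 pixels, both correct

scores = [binary_dsc(p, truth) for p in (pred_exact, pred_wide, pred_small)]
summary = summarize(scores)
```

Reporting the median alongside the mean makes it easier to see whether a few very poor slices, such as those at the apex or base, are pulling the average down.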

4. Conclusions

In this paper, we propose a deep neural network with RNN layers for MRI prostate image segmentation. Unlike traditional methods, we treat the prostate scans as sequence data. In addition to the local features, we utilize the interslice information to aid prostate segmentation. In the proposed network, we feed three neighboring slices into the network at a time. The network then extracts intraslice contexts under the guidance of the previous segmentation results from the neighboring slices. Concatenating the two sets of feature maps coming from opposite sequential directions alleviates feature loss. Experimental results on MRI prostate image datasets demonstrate that the proposed model achieves better performance than state-of-the-art convolutional neural networks.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants U1536204, 60473023, and 61471274. The authors would also like to acknowledge NVIDIA Corporation for the donation of the Titan Xp GPU used for this research.