Abstract

The segmentation of organs at risk in CT volumes is a prerequisite for radiotherapy treatment planning. In this paper, we focus on esophagus segmentation, a challenging application since the wall of the esophagus, made of muscle tissue, has very low contrast in CT images. We propose in this paper an original method to segment in thoracic CT scans the 3D esophagus using a skeleton-shape model to guide the segmentation. Our method is composed of two steps: a 3D segmentation by graph cut with skeleton prior, followed by a 2D propagation. Our method yields encouraging results over 6 patients.

1. Introduction

Lymphoma is a common tumor in the mediastinum, which is often located close to the trachea and the esophagus. Before radiotherapy, organs at risk such as heart, lung, and esophagus. must be outlined, in order to minimize the quantity of irradiation. In particular, the esophagus is very challenging to delineate because of the really low contrast of its boundaries, its complex shape and its inhomogeneous appearance (Figure 1). Today, the segmentation of the esophagus is performed manually by clinicians and is a tedious task, prone to intra- and interobserver variabilities.

Due to its difficulty, the literature on (semi)automatic esophagus segmentation is quite light. In [1], Rousson et al. proposed a two-step segmentation method in cardiac CT: first, the esophagus centerline is extracted using probabilistic spatial and appearance modeling, and then the outer surface of the esophagus is extracted using multiple and coupled ellipse fittings. However, the method requires as input two points on the centerline of the esophagus and manual segmentation of the aorta and the left atrium. In [2], Ragan et al. segment several organs using deformable models but fail to accurately contour the esophagus. In [3], Huang et al. propose a semiautomated method consisting in manually drawing one contour in an axial slice, which is propagated to other slices of based registration, but no quantitative evaluation is proposed in this paper. Fieselmann et al. in [4] propose, from several contours in axial slices provided by user input, to interpolate all missing contours in the frequency domain. A fully automatic method is presented in [5] by Feulner et al. that consists in first finding the approximate shape using a “detect and connect” approach, and then a classifier is trained to find short segments of the esophagus which are approximated by an elliptical model.

Most of these methods require a significant amount of user input or are based on an elliptical model. As shown in Figure 1, the esophagus is a deformable organ and has a more complex shape which can hardly be approximated by an elliptic shape. In this work we propose a method to segment the esophagus on thoracic CT scans in two steps: (i) a 3D segmentation by graph cut using a skeleton prior and (ii) a detection of the first slice of inaccurate segmentation, an oversegmentation due to the same gray level of neighboring organs such as aorta, and a 2D segmentation by graph cut for the next slices. Our method requires a very simple user input, has a low computational cost, and gives encouraging preliminary results on 6 patients.

2. Method: 3D Graph Cut Segmentation with Shape Prior

In this section, we first outline the graph cut segmentation framework as described in [6]. Then, we introduce our method, which includes the construction of the skeleton model, the 3D segmentation using graph cuts, and the 2D graph cut propagation.

2.1. 3D Graph Cut Segmentation

Let us consider the volume as a graph , where is the set of nodes (voxels) and the set of edges. Each pair of nodes in a neighborhood is connected by a segment called n-link and weighted by , a regularization or boundary term, designed to provide spatial coherence in a neighborhood of voxels. is typically defined as where and are the gray levels of voxels and , the Euclidean distance between and , and a constant usually related to acquisition noise. Consider two additional nodes, called terminal nodes: the source representing the object (in our case, the RV cavity) and the sink representing the background . Each node is connected to the terminal nodes and by two respective segments called t-links and weighted by the so-called region term denoted by and defined by where is the likelihood of observing given that voxel belongs to class that is intensity distribution of class .

A cut in the graph consists in cutting t-links and n-links to attribute a label or to each voxel of the image, which boils down to segment the volume. The energy of a cut is defined by where is if and have the same label, otherwise. The 3D optimal segmentation is obtained by searching for the cut of minimal energy. This global search can be very efficiently performed due to mincut-maxflow algorithms, in polynomial time [7].

2.2. Proposed Graph Cut Segmentation Framework Using Shape Prior

We propose a two-step method to segment the esophagus on thoracic CT scans based on graph cuts with a skeleton prior. In Section 2.2.1, we present the construction of a skeleton model based on a principal component analysis (PCA). Using this model, the first step of the method consists in segmenting the esophagus based on 3D graph cut (Section 2.2.2). We show 3D segmentation overestimate from a certain slice, taking aorta into esophagus segmentation. In this case, the variability of the skeletons is so important that the 3D shape model cannot well guide the segmentation. The second step consists then in detecting this slice, called breaking slice in the following, and realizes from it 2D segmentation by graph cut with a skeleton prior for the rest slices (Section 2.2.3).

2.2.1. Construction of the Skeleton Model

Let us consider 3D esophagus obtained by expert’s manual delineation on CT scans. On each slice , a gravity center can be computed and a skeleton is made of all gravity centers to where is the number of slices. Gravity centers are first decimated to allow skeletons to have the same number of points. Following the definition of a point distribution model (PDM) as first introduced in [8], skeletons are then rigidly aligned using a Procrustes analysis (see Figure 2). A mean skeleton is then computed: A PCA is then performed on the set of centered skeletons and yields eigenmodes denoted by , with , and their associated eigenvalues, denoted by . The number of eigenmodes is retained, with chosen large enough to account for the most important skeleton variations present in the training set.

Let us now describe how a single skeleton prior is computed from the PCA. Our aim is to isolate areas of variation of the mean skeleton, for each principal axis. We thus generate deformed skeleton instances for each axis: has to be bounded in order to represent variations in the training set of 3D esophagus. Classical boundary values of in PCA framework are [9]. A lower value would not capture all variations given by the training set. A higher value would include too large variations and would not ensure smooth deformations. All possible positions of skeletons are summed up in . A prior volume is then obtained by computing the minimum distance between a point and its nearest skeleton point: where is the euclidean distance between a point and its nearest gravity center from all possible skeletons. Thus, the lower the distance, the higher the probability to be inside the esophagus.

2.2.2. 3D Segmentation Using Graph Cuts

Now how can this prior map be integrated into the graph cut framework? In the literature, additional energy terms on the t-links [10] or the n-links [11] are added to the graph cost function. In any case, the prior must be rigidly registered onto the image to be segmented. The user is thus required to point out two landmarks: inside the esophagus, in the first and last slices of the volume. Esophagus intensity values are in the range −100–200 in CT scans. The prior map gives the distance of pixel to a possible skeleton of the esophagus. We thus suggest that the shape prior contributes to weighting t-links. The region term with skeleton prior can straightforwardly be defined with with being the gray level of voxel , the distance prior map, and a parameter defining what value of distance can be considered as being near to the possible skeleton.

We use the classical definition of to weight the n-links as defined in (1). The final energy of a cut for our 3D graph integrating a shape prior is then where weights the relative contributions of the n-link and t-link terms.

However, as esophagus is a moving cavity, compressed by the other organs, possible skeletons are various. Moreover, esophagus shares the same gray level with neighboring organs (in particular the aorta). As a result, from a certain slice, an oversegmentation is observed with the 3D graph (see Figure 3(c)). We propose to detect slices with oversegmentation and use 2D propagation by graph cut to improve segmentations.

2.2.3. 2D Propagation by Graph Cut

To detect the slice position where the oversegmentation begins, we use a simple heuristic: esophagus area does not change consequently from one slice to the other. Our aim is to detect the slice where oversegmentation starts by browsing through slices from top to bottom. This slice level is called breaking slice and is determined by where is the esophagus area of slice , the mean esophagus area of previous slices, and the threshold of area variation. From , we use a 2D segmentation by graph cut with prior for each slice. The final energy of the 2D graph is the same as 3D graph, as defined in (8). The prior is defined by the distance to the gravity center of the previously segmented slice : where is the gravity center of the previous segmented slice .

3. Experimental Results and Discussion

The proposed segmentation method has been applied on thoracic CT scans of patients (which have been acquired with different scanners). Each CT volume includes between 73 and 108 slices. Voxel size is  mm or  mm.

3.1. Skeleton Model Construction and Method Parameterization

Following a leave-one-out cross-validation strategy (LOOCV), skeleton models are constructed using a training set of esophagus skeletons. Preliminary registration is performed by superposing the first and the last gravity centers of each skeleton on an arbitrary reference.

Parameters are derived empirically: , , , and for 3D and 2D graph segmentations. The implementation of Boykov and Kolmogorov of the mincut-maxflow algorithm (available online at http://pub.ist.ac.at/~vnk/software.html) is used to compute the cut of minimal cost in the graph [7].

3.2. Segmentation Results

Our segmentation algorithm is run on 6 patients following an LOOCV strategy. For each volume, the user is required to point out two landmarks in order to register the skeleton prior: a point inside the esophagus in the first slice and a point inside the esophagus in the last slice. Our segmentation results are compared to manual ground truth through the Dice metric (DM), a standard overlap measure for comparing two surfaces. The Dice coefficient is given for 3D results and 2D results after slice and combined 3D and 2D results. The Dice score is stopped from being computed when a zero value is found and considered next slices as not segmented. Proportions of segmented slices considering over the total number of slices of esophagus are also computed. Results are provided in Table 1.

Not surprisingly, as shown in Table 1, segmentation results are better for the first slices of esophagus with 3D segmentation, as the esophagus contours are better defined. Figure 3 shows the breaking slice detection and the segmentation differences between the 3D and 2D methods. An example of 3D reconstruction of an esophagus is also shown in Figure 4.

The use of 2D segmentation allows avoiding overestimation given by 3D graph method in slices much more difficult to segment and to obtain better results. Considering this challenging application, our method yields encouraging results. However, as shown in Table 1, an average of 88.9% (11.9%) of esophagus slices are segmented with our method (from top to bottom). Room for improvement is left in the last slices of esophagus. Note that, thanks to the well-known high computational efficiency of graph cut, the computation time of our method is about 15 seconds by patient on a regular PC hardware, a time compatible with clinical practice.

4. Conclusion and Perspectives

In this paper, we have presented a method to segment the esophagus in CT scanner, based on a graph cut approach with incorporation of a shape prior. The shape model is a 1D PDM, constructed via a PCA from a set of skeletons of the esophagus obtained by manual segmentation. This shape model is then integrated into the graph cut cost function as prior term, in order to guide the segmentation. A combined 3D and 2D segmentation method is proposed. Results have been presented over 6 3D CT scans. If results are satisfying for the first slices, room for improvement is left in the remaining ones.

Apart from a validation on a larger database of patients, future works will focus on improving the 3D model of skeleton. We are currently investigating how to use air hole to guide the 2D segmentation process. Other works on 3D curve shape models such as [12] and medial tubular models [13] could also be fruitfully investigated.