Abstract

Segmentation tasks in medical imaging pose a demanding challenge because the nature of image acquisition introduces issues that hamper correct reconstruction and visualization. Depending on the specific image modality, limitations such as noise, faded edges, or strong intensity differences, known in most cases as inhomogeneities, must be considered. New segmentation algorithms are required to provide better performance. This paper presents a unified approach that improves traditional segmentation methods, namely, Active Shape Models and the level-set-based Chan-Vese model. The approach combines these classic segmentation algorithms with local texture information given by the Hermite transform and Local Binary Patterns. The mixture of region-based methods and local descriptors highlights relevant regions by considering extra information that helps delimit structures. We performed segmentation experiments on 2D images, including the midbrain in Magnetic Resonance Imaging and the heart's left ventricle endocardium in Computed Tomography. Quantitative evaluation was carried out with the Dice coefficient and the Hausdorff distance. The results display a substantial advantage over the original methods when our characterization schemes are included. Given these promising results, we propose further validation on different organ structures.

1. Introduction

Segmentation is routinely used in a great variety of applications, being one of the most common tasks in image analysis [1]. It has become an important tool in support of clinical diagnosis, planning, and evaluation of patient treatments with noninvasive techniques. Its goal is the correct delineation of organ boundaries, for which signal processing techniques have been adapted to recognize anatomical structures according to clinical needs. Obtaining highly precise and accurate results with minimal human interaction is becoming increasingly important for clinicians in their everyday tasks.

Among the existing medical imaging modalities, the most popular are X-rays, Ultrasound, Computed Tomography (CT), and Magnetic Resonance Imaging (MRI). Each human organ has specific anatomical characteristics, and each imaging modality has its own image quality properties and limitations, such as sharpness, noise, and contrast. All of the above hampers the segmentation task, so selecting the algorithm that correctly fits the organ boundary is of great importance. To date, efforts have focused on locating the desired structure, identifying it in the image, and determining its contour [2].

Several techniques have been developed to address the segmentation task. Some of the most popular methods in medical image analysis are those based on deformable models [3–5]. They include algorithms such as Active Shape Models (ASM) and level sets (LS), which are widely documented in the literature [6], and efficient implementations have been designed aiming at faster methods [7, 8]. However, classic deformable models are associated with a functional minimization, which often leads to oversegmentation [9]. In addition, validating these methods requires a ground truth for comparison, which is usually obtained by manual delineation and is prone to user errors.

In recent years, segmentation performance has improved by combining techniques and by including additional information cues, such as texture or motion. Such is the case of Jiang et al., who combined a level set framework with a seeded region growing algorithm for hippocampus segmentation [10]. Ma et al. [11] used a Haar classifier to detect the heart area, after which the segmentation is performed with an ASM. Brox et al. [12] simultaneously introduced texture features, gray level, and optic flow into a level-set-based segmentation process. An extension of ASMs has been proposed with the Active Appearance Model (AAM), which includes textural information within the model as a whole [13], but the algorithm is inefficient for real-time systems, has a high computational cost, and shows inconsistent robustness when applied under different circumstances [14].

On the other hand, besides combining different techniques, there are features inherent to the images, such as color, texture, or morphology, that can be included to provide special characteristics that improve the final segmentation. It is well known that intensity inhomogeneity complicates many image segmentation methods, making it difficult to identify the desired regions based only on pixel intensity. Level sets and AAMs are region-based methods, and intensity inhomogeneities hamper their performance [15]. For this reason, it is important to add features to the algorithms that improve performance despite the presence of inhomogeneities.

Xu et al. [16] proposed combining an automatic global edge and region force field guided method with nonlinear exponential point evolution for lung field segmentation, expanding the ASM performance with this information. Keomany and Marcel [17] proposed using LBPs with ASMs, replacing the gray-value profile search with LBP information and thereby avoiding the limitations of AAMs.

In the case of the Hermite transform (HT) and Local Binary Patterns (LBP), both methods provide additional features that improve the segmentation algorithms. LBPs are beneficial when combined with other methods such as ASMs, yielding more robust modeling of local structures despite illumination changes [18]. In addition, LBPs have very low computational complexity and provide a robust way of describing local texture patterns [19]. The Hermite transform [20, 21] is one of the mathematical models that resemble the human visual system (HVS). These HVS models have grown in popularity because they expand images into a local decomposition that describes intrinsic attributes related to important cues and highlights structures. The information obtained from the Hermite transform coefficients has previously been used as textural data for classification tasks [22].

The present study proposes combining segmentation algorithms with information given by the HT and LBPs to improve the efficiency of current segmentation techniques such as ASMs and LS. To test our approach, we used two image modalities for different organ studies: midbrain MRI and left ventricle endocardium CT.

This paper is organized as follows: Section 2 introduces the mathematical background for the proposal, based on ASMs and the deformable Chan-Vese (CV) model. In Section 3 the method is explained, in Section 4 the dataset is described, and in Section 5 we present the experimental results and provide a discussion. Finally, Section 6 concludes our work.

2. Theoretical Background

In this section, we present the mathematical foundation. The deformable models used here are ASMs, a statistical deformable model, and the vector-valued Chan-Vese model, a method based on level sets.

2.1. Chan-Vese Model with Level Sets

The active contour model without edges was proposed in [23], inspired by the Mumford-Shah method for image segmentation, which evolves a contour by minimizing an energy functional [24]. In the original 2-dimensional Chan-Vese (CV) model, let Ω be a bounded open set of ℝ², with ∂Ω as its boundary, and define an image as I : Ω → ℝ. Now let C be an evolving curve in Ω acting as the boundary between two regions, such that Ω = Ω_in ∪ Ω_out ∪ C, where Ω_in and Ω_out represent the inner and outer regions of C, respectively. The formulation seeks the best partition of I through the fitting term

F(c₁, c₂, C) = ∫_{inside(C)} |I(x, y) − c₁|² dx dy + ∫_{outside(C)} |I(x, y) − c₂|² dx dy,   (1)

where c₁ and c₂ are the average intensity values inside and outside the contour, respectively. The energy is minimized when the contour is located at the boundary of the object of interest. The 2-dimensional energy functional of the CV model in the domain Ω is defined as

F(c₁, c₂, C) = μ · Length(C) + λ₁ ∫_{inside(C)} |I(x, y) − c₁|² dx dy + λ₂ ∫_{outside(C)} |I(x, y) − c₂|² dx dy,   (2)

where λ₁ and λ₂ are fixed nonnegative constants for data fitting and μ regularizes the length of the curve. The minimal partition problem given by (1) can be solved using the level set method [25]. The contour is represented by the zero level set of a Lipschitz function φ : Ω → ℝ, such that C = {(x, y) ∈ Ω : φ(x, y) = 0}. This approach has many advantages because it naturally handles topological changes and is computed on a rectangular grid. The functional in (2) can be written in terms of the level set as

F(c₁, c₂, φ) = μ ∫_Ω δ(φ) |∇φ| dx dy + λ₁ ∫_Ω |I − c₁|² H(φ) dx dy + λ₂ ∫_Ω |I − c₂|² (1 − H(φ)) dx dy,   (3)

where H and δ denote the Heaviside and Dirac functions.

Considering the regularized approximations H_ε and δ_ε of the Heaviside and Dirac functions [23], as specified in the original paper, the minimization is solved with the Euler-Lagrange equation for an artificial time t:

∂φ/∂t = δ_ε(φ) [ μ div(∇φ/|∇φ|) − λ₁ (I − c₁)² + λ₂ (I − c₂)² ].   (4)

Equation (4) is implemented as a discrete model through finite differences. The vector-valued Chan-Vese model (VVCV) proposed in [26] uses K channels of extra information in order to obtain a better segmentation:

∂φ/∂t = δ_ε(φ) [ μ div(∇φ/|∇φ|) − (1/K) Σ_{i=1}^{K} λ₁^i (I_i − c₁^i)² + (1/K) Σ_{i=1}^{K} λ₂^i (I_i − c₂^i)² ],   (5)

where c₁ = (c₁^1, …, c₁^K) and c₂ = (c₂^1, …, c₂^K) are constant vectors containing the average value of each of the K layers of the image inside and outside the curve C, respectively. The parameters λ₁^i and λ₂^i are constant weights defined manually by the user and are further discussed in the experimental results section. In this study we consider planar images.

The solution of (4) and (5) converges when it becomes stationary. A threshold can be set to verify that the energy is no longer changing, or that the contour is barely moving between two iterations; such a threshold can be found intuitively or experimentally. Another widely used criterion to stop the curve evolution is to manually define a stopping term with a large number of iterations; this assumes that the desired contour will be obtained at the end of the process, similar to reaching a global minimum.
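As a minimal numerical sketch of the update in (4) and of the stationarity-based stopping criterion (illustrative only, not the authors' implementation; the parameter values, the regularizations of the Heaviside and Dirac functions, and the initialization radius are arbitrary choices):

```python
import numpy as np

def chan_vese_step(phi, img, mu=0.2, lam1=1.0, lam2=1.0, dt=0.5, eps=1.0):
    """One explicit finite-difference update of the Chan-Vese level set (4)."""
    H = 0.5 * (1 + (2 / np.pi) * np.arctan(phi / eps))   # regularized Heaviside
    delta = (eps / np.pi) / (eps**2 + phi**2)            # regularized Dirac
    c1 = (img * H).sum() / (H.sum() + 1e-8)              # mean intensity inside
    c2 = (img * (1 - H)).sum() / ((1 - H).sum() + 1e-8)  # mean intensity outside
    gy, gx = np.gradient(phi)                            # curvature div(grad/|grad|)
    norm = np.sqrt(gx**2 + gy**2) + 1e-8
    ky, _ = np.gradient(gy / norm)
    _, kx = np.gradient(gx / norm)
    curv = kx + ky
    return phi + dt * delta * (mu * curv - lam1 * (img - c1)**2 + lam2 * (img - c2)**2)

def segment(img, n_iter=200, tol=1e-3, **kw):
    """Iterate until the level set barely moves (stationarity threshold) or n_iter."""
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    phi = 15.0 - np.sqrt((xx - w / 2)**2 + (yy - h / 2)**2)  # initial circle
    for _ in range(n_iter):
        new = chan_vese_step(phi, img, **kw)
        if np.abs(new - phi).mean() < tol:
            phi = new
            break
        phi = new
    return phi > 0  # binary mask of the inside region
```

The same loop generalizes to the vector-valued model by averaging the data term over the K channels.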

2.2. Active Shape Models (ASM)

ASMs, published by Cootes et al. [27], incorporate shape information as part of the model. They consist of a training phase, where the shape information is modeled statistically, and a fitting phase, where new shapes are recognized. ASMs use a set of points that define the shape within the image. These points also incorporate features, such as gray-level information, that drive the fitting (recognition) phase.

In the training phase, the points or landmarks define the prior knowledge; this is accomplished by defining the set of points corresponding to each training image. The images are aligned with respect to each other using pose transformations that include translation, rotation, and scaling. To model the variability within the training set, Cootes developed the point distribution model (PDM), where each of the N training shapes is defined by its 2-dimensional landmark coordinates x_i = (x_{i1}, y_{i1}, …, x_{in}, y_{in})^T. The mean shape is obtained as x̄ = (1/N) Σ_{i=1}^{N} x_i, and the variations around the mean shape are obtained by computing principal component analysis [27]. Any shape can be approximated by

x ≈ x̄ + P b,   (6)

where P is the matrix of the first t principal components, b is the weight vector, and x is the estimated shape. The covariance matrix is computed as

S = (1/(N − 1)) Σ_{i=1}^{N} (x_i − x̄)(x_i − x̄)^T.   (7)
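A minimal sketch of the PDM construction (illustrative only; the fraction of retained variance is an arbitrary choice, and shapes are assumed already pose-aligned):

```python
import numpy as np

def build_pdm(shapes, var_keep=0.98):
    """Point distribution model from aligned landmark vectors.
    shapes: (N, 2n) array of rows (x1, y1, ..., xn, yn)."""
    mean = shapes.mean(axis=0)                 # mean shape
    S = np.cov(shapes, rowvar=False)           # covariance matrix of the shapes
    vals, vecs = np.linalg.eigh(S)             # eigen-decomposition (ascending)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # keep the modes that explain var_keep of the total variance
    t = np.searchsorted(np.cumsum(vals) / vals.sum(), var_keep) + 1
    return mean, vecs[:, :t], vals[:t]

def reconstruct(mean, P, b):
    """Estimate a shape from the model: x = mean + P b (Eq. (6))."""
    return mean + P @ b
```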

The eigenvectors associated with the largest eigenvalues of the covariance matrix describe the most significant modes of variation [3] and are used to characterize the variability of the training set. During the same training phase, the algorithm also creates a gray-level model for each landmark of the mean shape.

In the fitting phase, the following steps are executed:
(i) Initialize the shape parameters and generate the mean shape x̄ from the training phase.
(ii) Conduct a gray-level profile search to adjust the mean shape landmarks to the image. The shape x̄ should first be placed manually close to the object of interest. Each landmark is compared with its corresponding gray-level profile, and the landmark positions are adjusted according to a distance-based optimization criterion.
(iii) Adjust the pose model, including translation, rotation, and scaling, aligning the point model to the current shape found in the previous step.
(iv) Compute the shape adjustment to update the parameter vector b, according to (6).

Except for the initialization, the steps are repeated iteratively until convergence is achieved.
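The fitting loop above can be sketched as follows (illustrative only; find_targets stands for any landmark search step, such as gray-profile matching, and the pose-alignment step is omitted for brevity; the ±3√λ limits follow the original ASM formulation):

```python
import numpy as np

def asm_fit(mean, P, vals, find_targets, n_iter=10):
    """Alternate a landmark search step with statistical shape regularization.
    mean: mean shape, P: matrix of principal modes, vals: mode variances."""
    x = mean.copy()
    for _ in range(n_iter):
        y = find_targets(x)              # suggested landmark positions
        b = P.T @ (y - mean)             # project onto the model modes
        lim = 3.0 * np.sqrt(vals)        # plausibility limits per mode
        b = np.clip(b, -lim, lim)        # keep the shape statistically plausible
        x = mean + P @ b                 # regularized shape, Eq. (6)
    return x
```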

2.3. Hermite Transform

The Hermite analysis functions d_n(x) [20, 21] are a set of functions derived from orthogonal polynomials, obtained as the product of a Gaussian window v(x) = exp(−x²/2σ²) and the Hermite polynomials H_n(x/σ), where n indicates the order of analysis.

The Hermite coefficients L_n can be obtained by convolving the function f with the analysis filters d_n. This process yields a localized description of the function in terms of the orthogonal basis. The 2-dimensional representation follows since the analysis functions are spatially separable and rotationally symmetric, that is, d_{n−m,m}(x, y) = d_{n−m}(x) d_m(y). The coefficients are then

L_{n−m,m}(x₀, y₀) = ∫∫ f(x, y) d_{n−m,m}(x₀ − x, y₀ − y) dx dy,   (8)

with n − m and m denoting the analysis orders along the x- and y-axes, for n = 0, 1, 2, … and m = 0, …, n. The associated orthogonal polynomials are defined as

G_{n−m,m}(x, y) = (1/√(2ⁿ (n − m)! m!)) H_{n−m}(x/σ) H_m(y/σ).   (9)
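As an illustrative sketch of the analysis filters described above (not the authors' implementation; the spread σ, the discrete support, and the normalization are assumptions based on the standard Hermite transform formulation):

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_filter_1d(n, sigma=1.0, half=6):
    """Sampled 1D Hermite analysis filter: Hermite polynomial H_n(x/sigma)
    times a Gaussian window, with 1/sqrt(2^n n!) normalization."""
    x = np.arange(-half, half + 1, dtype=float)
    c = np.zeros(n + 1)
    c[n] = 1.0
    Hn = hermval(x / sigma, c)                 # physicists' Hermite polynomial
    g = np.exp(-x**2 / sigma**2) / (sigma * np.sqrt(np.pi))
    return ((-1)**n / np.sqrt(2.0**n * math.factorial(n))) * Hn * g

def hermite_filter_2d(nx, ny, sigma=1.0, half=6):
    """Separable 2D analysis filter: d(x, y) = d_nx(x) d_ny(y)."""
    return np.outer(hermite_filter_1d(ny, sigma, half),
                    hermite_filter_1d(nx, sigma, half))
```

Convolving an image with these filters yields the Cartesian coefficients of the corresponding orders.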

A steered version of the Hermite transform is based on the principle of steerable filters [28] and is implemented by linearly combining the Cartesian Hermite coefficients. This representation adapts the analysis direction to the local image orientation according to a criterion of maximum oriented energy. A set of rotated coefficients is thus constructed, named the steered Hermite transform (SHT) [29, 30]:

l^θ_{n,0} = Σ_{m=0}^{n} L_{n−m,m} α_{n−m,m}(θ),   (10)

where α_{n−m,m}(θ) = √(C(n, m)) cos^{n−m}(θ) sin^m(θ) are known as the Cartesian angular functions that express the directional selectivity of the filters at each window position, and the simple relation θ = arctan(L_{0,1}/L_{1,0}) approximates the direction of maximum oriented energy.
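A minimal sketch of this coefficient rotation (illustrative only; the function name and argument layout are assumptions):

```python
import numpy as np
from math import comb, atan2

def steer_row(L, theta):
    """Combine the Cartesian Hermite coefficients of one total order n,
    given as L = [L_{n,0}, L_{n-1,1}, ..., L_{0,n}], into the rotated
    coefficient along angle theta via the Cartesian angular functions."""
    n = len(L) - 1
    alpha = [np.sqrt(comb(n, m)) * np.cos(theta)**(n - m) * np.sin(theta)**m
             for m in range(n + 1)]
    return sum(Li * a for Li, a in zip(L, alpha))
```

For first-order coefficients L_{1,0} = 3 and L_{0,1} = 4, steering to θ = atan2(4, 3) concentrates all the energy into a single coefficient of value 5, while the perpendicular projection vanishes; this illustrates the energy compaction property.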

Since the general theory of the Hermite transform states that this transformation is linear and locally unitary, the energy is preserved in terms of its coefficients. This is expressed by Parseval's formula:

Σ_{n=0}^{∞} Σ_{m=0}^{n} [L_{n−m,m}]² = Σ_{n=0}^{∞} Σ_{m=0}^{n} [l^θ_{n−m,m}]².   (11)

The SHT generates energy compaction up to order N by steering the local coefficients into the direction of maximum energy. The more oriented the local structure, the more compact the representation. We can distinguish the local energy terms E_{1D} and E_{2D}, adapted to the local orientation content and given by 1D and 2D patterns, respectively:

E_{1D} = Σ_{n=1}^{N} [l^θ_{n,0}]²,   (12)
E_{2D} = Σ_{n=1}^{N} Σ_{m=1}^{n} [l^θ_{n−m,m}]².   (13)

The total energy can be written as E = E_{DC} + E_{1D} + E_{2D}, where E_{DC} = [l_{0,0}]² represents the DC component. In the SHT, high energy compaction is achieved in the E_{1D} term; this is useful for image compression and several other applications [29].

2.4. Local Binary Patterns (LBP)

Ojala et al. proposed the Local Binary Pattern descriptor in 1994 [18]. It is based on the idea that textural properties within homogeneous regions can be mapped into patterns that represent microfeatures. Initially, it uses a fixed mask that represents a neighborhood around a central pixel. The neighboring values are compared with the central pixel, and each neighbor receives a binary label describing the intensity difference.

In a more general formulation, LBPs use a circular neighborhood instead of a fixed rectangular region. The sampling coordinates of the neighbors are calculated as (x_c + R cos(2πp/P), y_c − R sin(2πp/P)) [31]. If a coordinate does not fall on an integer position, the intensity value is bilinearly interpolated. This implementation allows choosing both the spatial resolution and the number of sampling points:

LBP_{P,R}(x_c, y_c) = Σ_{p=0}^{P−1} s(g_p − g_c) 2^p,   (14)

where P is the number of sampling points, R is the radius of the neighborhood, g_c is the value of the central pixel at (x_c, y_c), and g_p are the values of the neighbors, whereas the comparison function s(z) is defined as

s(z) = 1 if z ≥ 0, and s(z) = 0 otherwise.   (15)
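A direct sketch of this circular LBP code with bilinear interpolation (illustrative only; it assumes the pixel is far enough from the image border for the neighborhood to fit):

```python
import numpy as np

def lbp(image, y, x, P=8, R=1.0):
    """Circular LBP code of the pixel at (y, x): each of the P neighbors on
    a circle of radius R is thresholded against the center value."""
    gc = image[y, x]
    code = 0
    for p in range(P):
        # sampling coordinates on the circle (image rows grow downward)
        ny = y - R * np.sin(2 * np.pi * p / P)
        nx = x + R * np.cos(2 * np.pi * p / P)
        y0, x0 = int(np.floor(ny)), int(np.floor(nx))
        dy, dx = ny - y0, nx - x0
        # bilinear interpolation for non-integer positions
        gp = (image[y0, x0] * (1 - dy) * (1 - dx)
              + image[y0, x0 + 1] * (1 - dy) * dx
              + image[y0 + 1, x0] * dy * (1 - dx)
              + image[y0 + 1, x0 + 1] * dy * dx)
        code |= int(gp >= gc) << p
    return code
```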

LBPs also possess a variation on this definition called Uniform Local Binary Patterns (LBP^{u2}) [31]. It takes advantage of the uniformity of the patterns: most LBP texture patterns can be described with few spatial transitions, that is, changes in the circular bit pattern. The uniformity measure counts the number of spatial transitions,

U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g₀ − g_c)| + Σ_{p=1}^{P−1} |s(g_p − g_c) − s(g_{p−1} − g_c)|,   (16)

so that LBP^{u2} is obtained as

LBP^{u2}_{P,R} = Σ_{p=0}^{P−1} s(g_p − g_c) if U(LBP_{P,R}) ≤ 2, and P + 1 otherwise.   (17)
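The uniformity measure and the uniform-pattern label can be sketched as follows (illustrative helper names):

```python
def uniformity(code, P=8):
    """Number of 0/1 transitions in the circular bit pattern of an LBP code."""
    bits = [(code >> p) & 1 for p in range(P)]
    # compare each bit with its circular successor
    return sum(b1 != b2 for b1, b2 in zip(bits, bits[1:] + bits[:1]))

def lbp_u2(code, P=8):
    """Uniform pattern label: the count of 1-bits if U <= 2, else P + 1."""
    return bin(code).count("1") if uniformity(code, P) <= 2 else P + 1
```

For example, 0b00001111 has two transitions and maps to label 4, whereas the alternating pattern 0b01010101 has eight transitions and falls into the nonuniform bin P + 1 = 9.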

The pixelwise information is encoded as a histogram H, which can be interpreted as a fingerprint or signature of the analyzed object.

3. The Proposed Method

The aim of the proposed methodology is to compare the segmentation performance of the original ASM and level set methods with improved versions that include information obtained from LBP and SHT. The following subsections explain the different combinations of the ASM and CV level set algorithms used to improve them.

3.1. Combining VVCV with SHT

In this section, we describe the Hermite transform-based segmentation for planar images. This segmentation approach combines the Hermite coefficients and the vector-valued Chan-Vese algorithm based on level sets. Both image representation and segmentation techniques were computed using 2D algorithms.

Figure 1 shows the segmentation procedure used in this work. First, the HT is computed up to order N on every slice of the initial volume. Then the SHT is computed by locally rotating the HT coefficients. From these, only the first-row coefficients are used, that is, the coefficients that carry most of the 1D energy.

These coefficient images produce the final segmentation, which is computed iteratively by the formula in (5) applied to the steered coefficients:

∂φ/∂t = δ_ε(φ) [ μ div(∇φ/|∇φ|) − (1/4) Σ_{n=0}^{3} λ₁^n (l^θ_{n,0} − c₁^n)² + (1/4) Σ_{n=0}^{3} λ₂^n (l^θ_{n,0} − c₂^n)² ],   (18)

where l_{0,0}, l^θ_{1,0}, l^θ_{2,0}, and l^θ_{3,0} are the SHT coefficient images for orders n = 0 to 3, used as the layers of the vector-valued model.
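The multichannel data term of the vector-valued update (5) can be sketched as follows (an illustrative fragment, not the authors' code; here channels would hold the four steered coefficient images, and the weights are per-channel):

```python
import numpy as np

def vvcv_force(phi, channels, lam1, lam2):
    """Data term of the vector-valued Chan-Vese update: average the
    region-fitting forces over the K coefficient channels."""
    H = (phi > 0)                         # crisp inside/outside partition
    force = np.zeros_like(phi)
    K = len(channels)
    for img, l1, l2 in zip(channels, lam1, lam2):
        c1 = img[H].mean() if H.any() else 0.0     # channel mean inside
        c2 = img[~H].mean() if (~H).any() else 0.0 # channel mean outside
        force += (-l1 * (img - c1)**2 + l2 * (img - c2)**2) / K
    return force
```

The sign convention matches (5): the force is positive where the pixel resembles the inside statistics (growing φ) and negative where it resembles the outside.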

In the final stage of Figure 1, the VVCV procedure starts by initializing a circle of radius r, called φ₀, centered at a specified pixel. The initial contour should be placed close to the object of interest, and the algorithm then evolves the curve according to the tuning parameters and a convergence criterion.

3.2. Combining LBP with ASM

This combination is used as presented in [32]. In the training phase:
(1) For every training image, a set of points is taken from the expert manual annotations and aligned by means of pose transformations.
(2) A statistical point distribution model (PDM) is created with the point position data from all training images.
(3) Instead of creating gray-level profiles across the landmarks, we generate a model based on LBPs calculated within a neighborhood around the landmarks. The LBPs are computed over four square regions of pixels around every landmark. A histogram is taken for each square, as in the method described by Ojala et al. [31], and the four histograms are concatenated as shown in Figure 2.
(4) Finally, for each landmark, a mean concatenated histogram is obtained over all training images. The final model consists of the original ASM model plus an average concatenated histogram for each landmark.

The LBP concatenated mean histograms take the place of the gray-level profiles of the original ASM algorithm, using the LBPs defined by Ojala et al. [18] and the Uniform LBP (LBP-uni); Figure 3 shows the pipeline. As in the original ASM algorithm, this model is used together with the shape model to estimate the optimal position of each landmark in the recognition phase.

In the recognition phase, a new image is processed by the ASM shape and LBP-landmark fitting stage, using the model from the training phase:
(1) Initialize the shape parameters and generate the mean shape x̄ from the training phase in order to start the profile search. Before fitting the profile model with the PDM, x̄ must be manually placed close to the object.
(2) For each landmark in x̄, compare the mean concatenated trained histogram with the corresponding histograms calculated at different positions along the search profile. Differences are evaluated with the Chi-square distance, and the landmark is moved to the position with the smallest distance; a small distance means higher region similarity [33].
(3) Adjust the pose model, including translation, rotation, and scaling, aligning the point model to the current shape found in the previous step.
(4) Adjust the statistical model (6) to the new shape, updating the parameter vector b within the limits defined in the original algorithm.
(5) Repeat from step (2) until convergence is reached.
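The histogram comparison of step (2) can be sketched as follows (illustrative helper names; candidate histograms would come from positions along the search profile):

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two (concatenated) histograms;
    smaller values indicate more similar local texture."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2)**2 / (h1 + h2 + eps))

def best_profile_position(model_hist, candidate_hists):
    """Index of the candidate whose histogram is closest to the trained
    mean histogram; the landmark is moved to that position."""
    d = [chi_square(model_hist, h) for h in candidate_hists]
    return int(np.argmin(d))
```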

Figure 4 displays the pipeline of the recognition phase.

3.3. Combining SHT with ASM

This procedure is similar to ASM/LBP, but, instead of computing LBP histograms, we compute histograms of square local regions on the four images corresponding to the first four SHT coefficients l^θ_{n,0}, for n = 0, …, 3. The method resembles vector-valued processing because, at each square region, we use four steered Hermite coefficients, each one being a channel of the vector-valued ASM:

H_{concat} = [H_{l_{0,0}}, H_{l^θ_{1,0}}, H_{l^θ_{2,0}}, H_{l^θ_{3,0}}],   (19)

where H_{l_{0,0}}, H_{l^θ_{1,0}}, H_{l^θ_{2,0}}, and H_{l^θ_{3,0}} are the histograms of each landmark for each steered Hermite coefficient, and H_{concat} denotes the coefficient histograms concatenated over the square region.

The process is the same as in ASM/LBP except for the construction of the histogram information; this applies to both the training and recognition phases. Figure 5 displays how the histograms are built and finally concatenated.

4. Materials

The first dataset consists of annotated cranial midbrain studies from 10 normal subjects. These studies were labeled and randomly selected by a highly trained neurology specialist at Hospital Interlomas (Mexico). The T2 images were obtained on a 3 Tesla scanner, generating volumes of 512 × 448 × 176 voxels with a resolution of 0.44921 × 0.44921 × 0.9 mm. The data were processed with a normalization of gray intensities. We used 10 magnetic resonance brain volumes containing 10 slices each, where the midbrain is located.

Our second dataset consists of 11 annotated tomographic cardiac studies, taken during the cardiac cycle, from healthy subjects, obtained with a CT Siemens dual source scanner (128 channels) at Hospital Angeles Pedregal, Mexico, by a highly trained cardiology specialist. The studies go from a final diastolic phase (relaxation), through the systolic phase (contraction), and back to the diastolic phase, providing 10 sets of images at 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of the cardiac cycle. The spatial resolution values range from 0.302734 × 0.302735 × 1.5 mm to 0.433593 × 0.433593 × 1 mm. In order to remove negative values, the Hounsfield values were shifted by an offset. The experiments were evaluated on the mid-third slice, and the consecutive images of the cardiac cycle were retrospectively reconstructed.

In both datasets, the algorithms were executed with a 2D segmentation analysis.

5. Experimental Results

The algorithms' results were validated against manual segmentations generated by expert clinicians. The initialization steps are similar for the midbrain and left ventricle endocardium segmentations. For the training phase of the ASM algorithms, we used leave-one-out cross-validation to reduce bias.
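Both evaluation measures used throughout this section can be sketched for binary masks as follows (illustrative only; the brute-force Hausdorff computation is adequate for small contours):

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks (1 = perfect overlap)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance, in pixels, between the foreground
    point sets of two binary masks (brute-force pairwise distances)."""
    pa = np.argwhere(a)
    pb = np.argwhere(b)
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :])**2).sum(-1))
    # worst-case nearest-neighbor distance, taken in both directions
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```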

5.1. Midbrain Segmentation

We compared the texture-based methods described previously with the classic segmentation methods applied directly to each midbrain image.

Approaches based on level set methods profit from semiautomatic procedures, where user interaction in the initialization stage helps achieve fast convergence of the curve evolution. The initial circle of radius r was centered at a pixel (x_c, y_c) defined by the user and kept the same for every slice; this point is assumed to be close to the central part of the midbrain. The procedure was repeated as an initial step for each data volume.

The stopping term T results from applying a convergence criterion to the curve evolution and indicates the interval in which the segmentation is achieved. This term is crucial and can be obtained experimentally, while the time step Δt can be assigned arbitrarily [34]. Selecting a large Δt implies a fast curve evolution, which usually yields unsatisfactory results; selecting a small Δt generates slow curve motion, and the number of iterations needed to achieve convergence increases substantially. Choosing a suitable Δt and convergence criterion can therefore solve stability and resolution issues. Finally, the number of iterations can be computed as the ratio T/Δt between the stopping term and the time step.

The parameters λ₁ and λ₂ can be viewed as weights that assign priority to the measure of intensity uniformity inside and outside the curve. In many applications reported in the literature these parameters are set to 1, although they can be modified according to a criterion derived from experiments; this selection is closely related to the object of interest. We defined a relation λ₁ > λ₂ that allows less variation in the foreground than in the background; this rule aims at identifying an internally homogeneous region compared to the external region while the curve grows [35]. This relation improves the delimitation of a single structure among others in the same image because the structure of interest is preserved along the first four orders of the chosen expansion.

In the midbrain volume assessment, the CV and VVCV/SHT parameters μ, λ₁, and λ₂ in (4) and (18), respectively, were defined experimentally. We fixed the time step Δt and computed the error at each iteration. In order to exhibit the stability differences between both methodologies, the average curves resulting from the segmentation of all volumes are displayed in Figure 6, using the Dice index and the Hausdorff distance over the evaluated iteration interval.

The Dice index curves can be viewed in Figure 6(a), where values closest to 1 correspond to the best results. The average values at each iteration using VVCV/SHT are higher than those of CV despite the increasing number of iterations; it is worth highlighting that the CV curve drops suddenly after its highest value is reached. The error curves using the Hausdorff metric (Figure 6(b)) show similar behavior: a faster error increment in CV, while the VVCV/SHT values remain lower for most iterations. The performance of the VVCV/SHT implementation thus indicates a clear improvement over the CV algorithm. The error curves also show that the evolution can be stopped more easily in VVCV/SHT than in CV, by defining a threshold criterion or a fixed number of iterations, because the error remains low (Dice index close to one, Hausdorff distance close to zero) despite increasing the time step or the number of iterations. The energy of the contour for a single brain image is displayed in Figure 7; the curve exhibits greater stability with VVCV/SHT.

Moreover, in order to quantify the differences among all techniques, both Dice and Hausdorff measures were computed against the expert annotations; see Figure 8. The best resulting distances in the experiments are displayed as different color lines. Notice how the texture descriptors improve the classic segmentation methods. To illustrate the segmentation results qualitatively, we created a volume rendering, shown in Figure 9, highlighting the differences between the best results of the CV and VVCV/SHT implementations.

5.2. Left Ventricle Segmentation

The CV and VVCV parameters were defined experimentally, as in the midbrain evaluation, following the previous considerations on the μ, λ₁, and λ₂ terms. The performance of the algorithms was compared using the Hausdorff distance and the Dice coefficient, as shown in Figure 10. In general, we obtained good performance for the ASM/LBP and VVCV/SHT algorithms. The performance varies with the phase of the cardiac cycle: in most segmentation algorithms, performance decreases in systole, and ours are no exception. This can be explained because all algorithms are tested under similar conditions, such as the number of iterations, while in this phase the LV reaches maximum contraction; because of organ motion, the contracted ventricle presents a less regular shape and a smaller area. It is worth noticing that, although CT images are noisier than MRI images, the texture-based implementations exhibit better quantitative results than the remaining algorithms.

Figure 11 shows the segmentation results along the cardiac cycle. Notice how the segmentation with the level set approach detects the endocardial papillary muscles (yellow line). After this step, we applied a convex hull operation (blue line) in order to resemble the clinical segmentation and to compare it with the traditional delineations (red line) throughout all cardiac cycle percentages.

6. Conclusions and Future Work

In this research, we first introduced common deformable models for image segmentation and we further showed the improvement provided by texture descriptors for local analysis: steered Hermite transform and Local Binary Patterns. We assessed their performance segmenting midbrain in Magnetic Resonance Imaging and heart's left ventricle in Computed Tomography.

This approach compared the original algorithms with modified versions that include additional local texture information. We found that our proposal yields segmentations closer to the expert annotations. This supports the rationale for combining methodologies that consider either a level of homogeneity in certain zones or a group of edges that best encloses the structure of interest.

In general, we obtained the most consistent results when combining the CV level set method with the SHT. Adding local image information to the original segmentation algorithms proved advantageous for delimiting shapes and contours because of the energy compaction of the SHT. Furthermore, in the case of ASM, the boundaries converge with high accuracy when we include the information given by the LBPs. However, ASM requires a training phase, which takes extra effort compared to the CV procedure; this gives level set methods an advantage regarding computation time and complexity. Moreover, this study verifies that our VVCV/SHT method can obtain favorable results even when the curve evolution parameters, such as weights, time step, and convergence criterion, are selected intuitively, as in most segmentation schemes based on active contours.

Future work should focus on improving our methods and expanding them to 3D analysis, as well as including more organ structures in our assessments.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work has been sponsored by UNAM Grants PAPIIT IN116917 and SECITI 110/2015. Jimena Olveres and Erik Carbajal-Degante acknowledge Consejo Nacional de Ciencia y Tecnología (CONACYT).