Abstract

The Mumford-Shah model is extensively used in image segmentation. Its energy functional causes the content of the segments to remain homogeneous and the segment boundaries to become short. However, the problem is that optimization of the functional can be very slow. To attack this problem, we propose a reduced two-phase Mumford-Shah model to segment images having one prominent object. First, initial segmentation is obtained by the k-means clustering technique, further minimizing the Mumford-Shah functional by the Douglas-Rachford algorithm. Evaluation of segmentations with various error metrics shows that 70 percent of the segmentations keep the error values below 50%. Compared to the level set method to solve the Chan-Vese model, our algorithm is significantly faster. At the same time, it gives almost the same or better segmentation results. When compared to the recent k-means variant, it also gives much better segmentation with convex boundaries. The proposed algorithm balances well between time and quality of the segmentation. A crucial step in the design of machine vision systems is the extraction of discriminant features from the images, which is based on low-level segmentation which can be obtained by our approach.

1. Introduction

Image is an entity having different objects. Image segmentation is one of the techniques to recognize the regions that show different objects in the image.

Let the function represent the intensity of light at any point of domain when an image is captured. So, image is defined as . The RGB color representation of an image is mathematically represented by . The aim is to find regions for , of a domain, corresponding to different objects or their parts or shadows in the image. These regions of the image will have boundaries. In summary, the segmentation problem has the following definition: suppose be the domain of . Compute and find such that and varies smoothly over the regions as well as varies discontinuously over the boundaries between . The same problem can be represented in terms of approximation theory: find an approximation of in such a way that each represents the entire region via a piecewise smooth function.

There are many approaches to image segmentation [1] including thresholding, clustering, or classification methods [24], region-based methods [5], and edge-based active contour methods [68]. Segmentation techniques based on deep convolutional neural networks have been developed for various medical images MRI, CT, and X-rays, which show promising results in comparison to conventional segmentation methods [912]. The segmentation problem has also been considered as an energy minimization problem. But till date, obtaining a generalized solution that works without prepriori facts about the object elements, its characteristics like shape, color, texture, appearance of shadows, and overlapping of objects are an open problem.

Segmentation is essentially a clustering problem with additional connectivity constraint. k-means is widely used, its properties are well known, and it has also been applied to image segmentation with modest modifications. It is fast and simple to implement [13], which makes it suitable for large datasets like images. However, applying k-means merely based on the pixel values usually results in very fragmented segments. Regularized k-means (reg-KM) [14] attacks this by combining the intensity and the spatial information using two terms. The first term calculates the ratio of the segment perimeter and its area while the second term measures the intracluster dissimilarity as in the standard k-means.

In 1989, Mumford and Shah introduced the Mumford-Shah model [15]. Since then, it has applications in the areas such as image denoising, image restoration, and image segmentation [16]. The Mumford-Shah model can be implemented using the Douglas-Rachford algorithm, ADI implicit method, and other numerical optimizations [1722]. The improvement of local minima was achieved by graph-based methods, but these methods do not allow for open boundaries [23].

A convex relaxation approach was presented in [24], using functional lifting which guarantees a globally optimal solution, is independent of initialization, and takes care of open boundaries. There are suitable algorithms to solve the simplified Mumford-Shah model which include convex boundaries for any kind of image segmentation [18, 22, 2527]. The problems of the previous approaches are that those based on Mumford-Shah are rather slow, while many region and contour-based models provide poor segmentation result if the parameters are not carefully tuned.

In this paper, we introduce a two-phase image segmentation algorithm. Instead of using region or block-based heuristics to solve the segmentation problem, we define it as a global optimization problem. We input the number of segments, select the initial centroids randomly, and apply the k-means clustering technique to obtain the initial segmented image, which is further improved by minimizing the Mumford-Shah model. k-means is known to be sensitive to the initialization. A typical solution is to try the algorithm several times with different initialization or use some more robust initialization technique, see [13]. Our main contribution is to show how the proximal splitting technique can be used to solve the two-phase Mumford-Shah model.

We compare our segmentation with ground truth data for 100 color images from the Weizmann image dataset [28]. Statistically evaluating using error metrics defined in [29], we find 70% of our segmentation have an error value of 0.5 or less. The method works especially well with segments that are difficult to determine, there are multiple objects in the image, and objects have stripes, or complex objects posing so that they have artificial boundaries. For these troublesome images, we find our algorithm gives slightly better segmentation when compared to the classical Chan-Vese optimization algorithm [8], and much better than the k-means variant (reg-KM) [14]. Our implementation of the Douglas-Rachford (DR) algorithm is also a lot faster than the Chan-Vese algorithm. The Chan-Vese model jointly uses the reduced Mumford-Shah model and level set method which also uses active contours for image segmentation. This is time consuming because it is solved using reinitialization of a level set function [30, 31].

The modern machine vision methods have higher-level goals like extracting an anatomical object of interest for diagnosis of diseases, scene classification, detection of human activity in visual surveillance, assigning a name to a human face in an image, and classifying handwritten characters [32]. A crucial step in the design of such machine vision systems is the extraction of discriminant features from the images [33, 34]. The application of the proposed approach is to obtain a low-level segmentation based on color features and spatial connectivity, which can be further fed to the above-mentioned higher-level semantic segmentation.

The rest of the paper is organized as follows: Section 2 introduces the Mumford-Shah image segmentation model and shows its discrete formulation. Section 3 explains the proximal splitting technique used for optimizing the Mumford-Shah functional and also contains pseudocode. In Section 4, we explain the evaluation criteria used and discuss the results. Conclusions are drawn in Section 5.

2. The Mumford-Shah Image Segmentation Model

The Mumford-Shah model is based on the functional which is to be minimized and it depends on two parameters and . The functional is given by where is an image, function approximates which is differentiable over the regions , and where is made up of smooth arcs and it creates boundaries for . The overall length of all the smooth arcs of is expressed by . The parameters and control the approximation quality and segment coarseness, respectively.

The smaller value of gives good segmentation and better representation of the image. The first term in equation (2) measures how well approximates the image . The second term measures how much varies on each . The third term in equation (2) measures the length of the boundaries and ensures that the boundaries are as small as possible. Mumford and Shah mentioned that [15], dropping any of the terms in equation (2), we have . If the first term is not in equation (2), we have . If the second term is not there in equation (2), we take . If the third term is not there in equation (2), we take as a fine grid of horizontal and vertical lines and hence number of small squares. Also, is average of on each .

The presence of all the three terms leads to nontrivial optimal solution . The optimal value can be visualised as a cartoon of the actual image . Both Mumford and Shah were uncertain for the well-posedness of minimization of . However, they conjectured that for the continuous function , has a minimum in the set of all pairs where is differentiable on each and a finite set of singular points joined by a finite set of arcs.

The energy functional in equation (2) whose exact minimization is a nontrivial task. When is a piecewise constant function on each , obtained as the average value of on each , the second term in equation (2), . Therefore, equation (2) reduces to the simplified Mumford-Shah model [5, 21, 35] given as

This is a special case of equation (2). This is also called the minimal partition problem [6]. If is large, the emphasis is on the length of the boundaries and the minimization of leads to fewer boundaries and coarser segmentation. If is small, the emphasis is on the first term of equation (4) which leads to an approximation that follows much more closely and allows for more boundaries. Thus, a small leads to a finer segmentation. If , the minimum of forces leading to finest possible segmentation, which is a trivial segmentation, where every region consists of a single point (pixel).

2.1. Image Segmentation Model

Equation (4) is the reduced two-phase segmentation model. In the two-phase model, the image is divided into two segments, the object and the background. Chan-Vese used the level set method to implement the two-phase segmentation model [6, 8]. In our approach, we solve the reduced two-phase model by rewriting the problem using the Douglas-Rachford proximal splitting algorithm. We use a convexification technique similar to [18, 22].

In this work, we have considered RGB images. The goal is to divide the given image into two regions, the background and the foreground object. We define the foreground object as , background as , and the entire image as . So, image , here represent the arcs of the segment . The image contains pixels.

Let and denote the color values of randomly selected pixels as foreground pixel and background pixel, respectively. From simplified equation (4), we have where and , ; . Let and of

Now, our aim is to find the regions and that minimize functional which is given in equation (5). Defining , we represent the domain with binary indicator function so that

This function is the expected approximation of image . Using , the minimization of functional from equation (5) can be seen as follows:

The first term in equation (7) is the inner product of and and is defined as . The total variation pseudonorm is , which equals to for binary indicator by coarea formula established by Fleming and Rishel [20].

2.2. Discrete Formulation of Segmentation Model

To discretize variational problem (7) on a grid of pixels, we assume that . Let be a vector field for every and suppose then norm of is given by

The total variation pseudonorm, for , is defined as using equation (9). Thus, we obtain

We assume that the pixels are indexed on a two-dimensional grid where and , the finite difference gradient operator is

We use periodic boundary conditions for simplicity. The inner product is

Equation (11) is a nonsmooth and nonconvex and usually direct minimization that leads to poor local minima. By replacing the binary constraint with a box constraint , we obtain the convex formulation of the energy functional of equation (11). This defines the following finite dimensional convex problem:

One can prove that this relaxation is exact, meaning that the minimizer , when it is unique, is binary, . It means that such that actually solves the original segmentation problem.

3. Proximal Splitting Technique

Proximal algorithms are used for solving large-scale, constrained, or distributed optimization problems [36, 37]. Many image processing problems can be formulated as where : These algorithms use splitting technique such that the functions in equation (15) can be used to get an algorithm which is easily implementable. Usually, these algorithms are called proximal because each function in equation (15) is involved via its proximity operator. We propose to use proximal splitting scheme to solve our optimization problem, i.e., simplified Mumford-Shah model of equation (14).

3.1. Splitting

We introduce the variable (http://www.numericaltours.com). The optimization problem (14) can be seen as follows: where and are defined as follows:

Here, the constraints are defined using indicator function as follows:

3.2. Proximal Mapping

The proximal mapping, , for any convex function with parameter and

Here, is the minimizer of function . Proximal operators can be viewed as generalized projections (see Appendix A). Similarly, reflection is defined as

3.3. Douglas-Rachford Algorithm

The Douglas-Rachford (DR) algorithm minimizes functionals iteratively as in equation (16):

It is given that and are convex functions. Hence, we can compute their proximal mappings and , respectively. It is not required that the functions and are smooth. A DR iteration is (see Appendix C): where

We can show that for any value of and is a minimizer of .

3.3.1. Proximal Operator of

The proximal mapping of is the orthogonal projection on the convex set :

The following linear system of equations gives and because where

(see Appendix B).

Here, by convention, and . We have where

As we have discretized variational problem (7) on a grid of pixels, takes values and takes the values .

Also, and where the two-dimensional discrete Fourier transform of an image is .

3.3.2. Proximal Operator of

The function can be written as where

The proximal mapping of is

We define the value of . The proximal operator of is where

The proximal operator of the norm is a soft thresholding of the amplitude of the vector field:

3.4. Implementation

The implementation is sketched in the following pseudocode. We use the following notations:

: input image of size

number of segments

: number of iterations

: control parameter in Mumford-Shah functional

: color values of randomly selected and

: color values of and after iterations of k-means

indicate regions , , and

: indicator function

: gradient of

: value of functional at each iteration

The following subprograms are also needed:

: fast Fourier transform

: gradient

inverse fourier transform

: divergence

MS (g, segments, max iterations, lambda) →
values of randomly selected
values of randomly selected
kmeans
mu⟵1, gamma⟵1,
grad(f)
 REPEAT
  
  
  
  
  
  
  
  
  
  
  
 UNTIL AND () OR ()
Procedures
diff
 FOR all i DO
  FOR all DO
    
indicator
 FOR all i DO
  FOR all DO
   IF THEN
    
   ELSE
    
Auxiliary functions
projc1
gridk()
gridk() →
 FOR all i DO
  FOR all DO
  
projc2
gradf()
proxf0
 FOR all i DO
  FOR all DO
    

4. Experimental Results

4.1. Image Dataset

The performance of our algorithm is evaluated using the partial Weizmann dataset [28] images depicting one object in the foreground. Currently, the image database contains a hundred grey level and hundred color images along with their ground truth segmentations, out of which we have used only color images. The dataset comprises of images that clearly have an object that is different from the background. Each image has three corresponding ground truth segmentations as shown in Figure 1.

4.2. Evaluation Criterion

The performance of our algorithm is based on region-based error metrics defined in [29]. Considering as segmentation obtained by an algorithm, as one of the ground-truth, we calculate

a value in the range . For a given pixel , we consider segment in the solution and segment in the ground truth that contain the pixel. If one segment is a proper subset of the other, then the pixel lies in an area of refinement, and the local error should be zero. Otherwise, the two regions overlap in an inconsistent manner and we should calculate the corresponding error. Here, is the set of pixels corresponding to the region in segmentation that contains pixel , and the local refinement error is defined as .

This local error measure is not symmetric, and it gives refinement in one direction only. It is to be noted that if is a refinement of at pixel , but not the opposite. Considering this local refinement error in each direction at each pixel, there are two methods to combine the values into an error measure for the entire image: global consistency error that forces all local refinements to be in the same direction and local consistency error that allows refinement in different directions in different parts of the image [38]. For a given as the number of pixels, we have

The region-based consistency measure LCE is tolerant to refinement in either direction at each image pixel. If we simply replace the pixelwise minimum with a maximum, we get a measure that does not tolerate refinement at all and penalizes dissimilarity between segmentations proportional to the degree of region overlap. Applying to segmentations and , the bidirectional consistency error (BCE) is defined as

In addition, we can ask if segmentation obtained by our algorithm is consistent with the collection of all ground truth segmentations of that image. The measure is obtained by computing the minimum error at each pixel over each ground truth segmentation :

Structural similarity index [39] measures the visual quality between two images. It is calculated as a weighted combination of three factors luminance, contrast, and structural similarity. It can be expressed as where is the luminance, is the contrast, is the structural similarity, and , , and are positive constants. The three factors are calculated separately as follows: where and are local means, and are the standard deviations, and is the cross-covariance between the two images and , respectively. If , then SSIM simplifies to

We use SSIM to calculate the similarity between the image generated by the segmentation result, and the image generated from the ground truth segmentation. Here, we have implemented the proposed method for two-phase segmentation, which gives us foreground (object) and background. We have used SSIM to evaluate the foreground.

4.3. Results

We have executed the DR algorithm over 100 images from the Weizmann dataset [29], for various values of as , , , , and is a parameter which is positive, and unfortunately, there is no fitness function to adjust it automatically. This is a drawback of the method. reg-KM and Chan-Vese methods also have similar limitation.

For each of the 100 images, the segmentation results are compared with their three ground truths to obtain the error measures GCE and . The average value of the three error values is taken as the final error. Out of these values, we found that the results using and are the best in terms of accuracy in most of the images. We elaborate these in Figure 2. It is very important that the error values are more concentrated towards zero. Figure 2(a) illustrates that for and , around 69% of the images have GCE value of 0.5 or less. Similarly, Figure 2(b) illustrates that for and , around 70% of the images have value of 0.5 or less.

Figure 1 shows the original image, the three human segmentations comprising the ground truth, the initial segmentation obtained by the k-means clustering technique, and the result of the Douglas-Rachford algorithm for two selected images from the dataset. It is clear that when is large, the minimization of leads to fewer boundaries and coarser segmentation and when is small, we obtain a finer segmentation. Minimization of is achieved when the difference of between two iterations is less than 0.1 tolerance. We vary from 0.1 to 1.00 to have segmentation from finer to coarser, as established in [15]. Figure 3 compares the performance and efficiency of the iterations with different values. It shows that the value of the minimization of energy functional keeps decreasing when iterated long enough. We observe that for values of 0.75 and 1.00, minimization of energy functional reaches the lowest values. At the same time for value of 0.75, it converges faster.

Further, for comparison with other methods, we select three images from the dataset such that their segmentation is not an easy task. Figure 4 shows these three selected images and their different ground truths. For the church image, it is difficult to determine which is the object, the church, or the signboard. This confusion is also seen in the ground truths. The same is observed in the case of a lion, where either the lion or the ball, or both, is considered as the object. The segmentation of caterpillar is the most difficult because it has stripes which can act as an artificial boundary.

Figure 5 shows segmentations obtained by the reg-KM algorithm [14], Chan-Vese algorithm [8], and the proposed DR algorithm. The results are optimized as follows. Results of reg-KM are with the regularization parameter 0.5. The Chan-Vese algorithm has been used with various initial contours like a small circle, large circle, and user-defined box. We tested all these and chose to use small circles (see Figure 6) as it provided the best segmentation for these images. In our algorithm, we have used the k-means clustering method for obtaining initial segmentation and then Mumford-Shah functional minimized by Douglas-Rachford using the parameter .

Table 1 shows the numerical comparison of reg-KM, Chan-Vese, and the proposed DR algorithm. The results show that the reg-KM algorithm has worse accuracy in all three cases which we obtained through the error evaluation criteria mentioned in Section 4.2. The results of the Chan-Vese method and the proposed DR method have similar accuracy; Chan-Vese is better with caterpillar while DR is better with lion.

Table 2 summarizes the SSIM values for the three methods: reg-KM, Chan-Vese, and the proposed DR. In case of all images, the proposed DR algorithm achieves significantly higher SSIM values which indicates its superiority over the compared methods.

Table 3 summarizes the processing times of these algorithms. We can see that the proposed DR algorithm is an order of magnitude faster than Chan-Vase. The three images are segmented in about 1.5 seconds, on average, while Chan-Vese takes 40 seconds per image, on average. The reg-KM technique does not give very good segmentation but is definitely fast in terms of execution time. However, the DR algorithm uses a minimization technique and therefore it takes more time but gives much better results than reg-KM and nearly the same results when compared to the popular Chan-Vese method.

5. Conclusions

In this paper, we proposed a reduced two-phase Mumford-Shah model to segment images with one object. We obtained initial segmentation by the k-means clustering technique and further minimized it by our implementation of the Douglas-Rachford algorithm. In k-means, is a user-selected parameter about how many segments. If needs to be automatically selected, then it should also be included in the Mumford-Shah functional in (2). Possible solutions how to do it, we refer to solutions in the clustering context [40]. Experiments with various error metrics LCE, GCE, and show that 70 percent of the segmentations keep the error values below 0.5. We also compare our algorithm to Chan-Vese implementation which is one of the classical methods to optimize the Mumford-Shah functional. Our segmentation is slightly better in two of the three selected images, while the main benefit is that our method is significantly faster than the Chan-Vese algorithm. Secondly, we compare our algorithm with a recent k-means variant, reg-KM. According to the error metric, our segmentations are significantly better. However, reg-KM is much faster as it has a statistical approach of clustering but it does not give smooth segmentation with convex boundaries. Our algorithm balances time as well as quality of segmentation. As a future research, we consider adopting the Mumford-Shah model in k-means context.

Appendix

A. The Projection, Proximal Mapping, and Reflection

Let be a closed convex set. The projection onto set is a mapping defined by

The proximal mapping of an indicator function is given by which is nothing but projection of on and is denoted by

The reflection with respect to set is a function defined by

Let be -dimensional Euclidean space, be norm, and be the class of lower semicontinuous functions from such that and . For any , the minimization problem admits a unique solution, which is denoted by . The operator thus defined is the proximal operator of function

B. Solving a Linear System of Equations to Obtain

Let and is given by

Let , , and , than the is given by

Let and denote fast Fourier transform of then

C. Douglas-Rachford Algorithm

Let and be lower semicontinuous functions. Also, and need not be differentiable and .

The problem is

Solution. The problem has at least one solution. Let and .

The solution is denoted by ; Douglas-Rachford iterations are as follows:

,
{
   
   
}

Applying the same to our minimization problem, in equation (16) by taking and .

Repeat
   {
     
      
      
            
     
   }

Data Availability

The data used to support the findings of this study are publicly available: http://cs.uef.fi/sipu/images/ and http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.