Abstract
The MumfordShah model is extensively used in image segmentation. Its energy functional causes the content of the segments to remain homogeneous and the segment boundaries to become short. However, the problem is that optimization of the functional can be very slow. To attack this problem, we propose a reduced twophase MumfordShah model to segment images having one prominent object. First, initial segmentation is obtained by the kmeans clustering technique, further minimizing the MumfordShah functional by the DouglasRachford algorithm. Evaluation of segmentations with various error metrics shows that 70 percent of the segmentations keep the error values below 50%. Compared to the level set method to solve the ChanVese model, our algorithm is significantly faster. At the same time, it gives almost the same or better segmentation results. When compared to the recent kmeans variant, it also gives much better segmentation with convex boundaries. The proposed algorithm balances well between time and quality of the segmentation. A crucial step in the design of machine vision systems is the extraction of discriminant features from the images, which is based on lowlevel segmentation which can be obtained by our approach.
1. Introduction
Image is an entity having different objects. Image segmentation is one of the techniques to recognize the regions that show different objects in the image.
Let the function represent the intensity of light at any point of domain when an image is captured. So, image is defined as . The RGB color representation of an image is mathematically represented by . The aim is to find regions for , of a domain, corresponding to different objects or their parts or shadows in the image. These regions of the image will have boundaries. In summary, the segmentation problem has the following definition: suppose be the domain of . Compute and find such that and varies smoothly over the regions as well as varies discontinuously over the boundaries between . The same problem can be represented in terms of approximation theory: find an approximation of in such a way that each represents the entire region via a piecewise smooth function.
There are many approaches to image segmentation [1] including thresholding, clustering, or classification methods [2–4], regionbased methods [5], and edgebased active contour methods [6–8]. Segmentation techniques based on deep convolutional neural networks have been developed for various medical images MRI, CT, and Xrays, which show promising results in comparison to conventional segmentation methods [9–12]. The segmentation problem has also been considered as an energy minimization problem. But till date, obtaining a generalized solution that works without prepriori facts about the object elements, its characteristics like shape, color, texture, appearance of shadows, and overlapping of objects are an open problem.
Segmentation is essentially a clustering problem with additional connectivity constraint. kmeans is widely used, its properties are well known, and it has also been applied to image segmentation with modest modifications. It is fast and simple to implement [13], which makes it suitable for large datasets like images. However, applying kmeans merely based on the pixel values usually results in very fragmented segments. Regularized kmeans (regKM) [14] attacks this by combining the intensity and the spatial information using two terms. The first term calculates the ratio of the segment perimeter and its area while the second term measures the intracluster dissimilarity as in the standard kmeans.
In 1989, Mumford and Shah introduced the MumfordShah model [15]. Since then, it has applications in the areas such as image denoising, image restoration, and image segmentation [16]. The MumfordShah model can be implemented using the DouglasRachford algorithm, ADI implicit method, and other numerical optimizations [17–22]. The improvement of local minima was achieved by graphbased methods, but these methods do not allow for open boundaries [23].
A convex relaxation approach was presented in [24], using functional lifting which guarantees a globally optimal solution, is independent of initialization, and takes care of open boundaries. There are suitable algorithms to solve the simplified MumfordShah model which include convex boundaries for any kind of image segmentation [18, 22, 25–27]. The problems of the previous approaches are that those based on MumfordShah are rather slow, while many region and contourbased models provide poor segmentation result if the parameters are not carefully tuned.
In this paper, we introduce a twophase image segmentation algorithm. Instead of using region or blockbased heuristics to solve the segmentation problem, we define it as a global optimization problem. We input the number of segments, select the initial centroids randomly, and apply the kmeans clustering technique to obtain the initial segmented image, which is further improved by minimizing the MumfordShah model. kmeans is known to be sensitive to the initialization. A typical solution is to try the algorithm several times with different initialization or use some more robust initialization technique, see [13]. Our main contribution is to show how the proximal splitting technique can be used to solve the twophase MumfordShah model.
We compare our segmentation with ground truth data for 100 color images from the Weizmann image dataset [28]. Statistically evaluating using error metrics defined in [29], we find 70% of our segmentation have an error value of 0.5 or less. The method works especially well with segments that are difficult to determine, there are multiple objects in the image, and objects have stripes, or complex objects posing so that they have artificial boundaries. For these troublesome images, we find our algorithm gives slightly better segmentation when compared to the classical ChanVese optimization algorithm [8], and much better than the kmeans variant (regKM) [14]. Our implementation of the DouglasRachford (DR) algorithm is also a lot faster than the ChanVese algorithm. The ChanVese model jointly uses the reduced MumfordShah model and level set method which also uses active contours for image segmentation. This is time consuming because it is solved using reinitialization of a level set function [30, 31].
The modern machine vision methods have higherlevel goals like extracting an anatomical object of interest for diagnosis of diseases, scene classification, detection of human activity in visual surveillance, assigning a name to a human face in an image, and classifying handwritten characters [32]. A crucial step in the design of such machine vision systems is the extraction of discriminant features from the images [33, 34]. The application of the proposed approach is to obtain a lowlevel segmentation based on color features and spatial connectivity, which can be further fed to the abovementioned higherlevel semantic segmentation.
The rest of the paper is organized as follows: Section 2 introduces the MumfordShah image segmentation model and shows its discrete formulation. Section 3 explains the proximal splitting technique used for optimizing the MumfordShah functional and also contains pseudocode. In Section 4, we explain the evaluation criteria used and discuss the results. Conclusions are drawn in Section 5.
2. The MumfordShah Image Segmentation Model
The MumfordShah model is based on the functional which is to be minimized and it depends on two parameters and . The functional is given by where is an image, function approximates which is differentiable over the regions , and where is made up of smooth arcs and it creates boundaries for . The overall length of all the smooth arcs of is expressed by . The parameters and control the approximation quality and segment coarseness, respectively.
The smaller value of gives good segmentation and better representation of the image. The first term in equation (2) measures how well approximates the image . The second term measures how much varies on each . The third term in equation (2) measures the length of the boundaries and ensures that the boundaries are as small as possible. Mumford and Shah mentioned that [15], dropping any of the terms in equation (2), we have . If the first term is not in equation (2), we have . If the second term is not there in equation (2), we take . If the third term is not there in equation (2), we take as a fine grid of horizontal and vertical lines and hence number of small squares. Also, is average of on each .
The presence of all the three terms leads to nontrivial optimal solution . The optimal value can be visualised as a cartoon of the actual image . Both Mumford and Shah were uncertain for the wellposedness of minimization of . However, they conjectured that for the continuous function , has a minimum in the set of all pairs where is differentiable on each and a finite set of singular points joined by a finite set of arcs.
The energy functional in equation (2) whose exact minimization is a nontrivial task. When is a piecewise constant function on each , obtained as the average value of on each , the second term in equation (2), . Therefore, equation (2) reduces to the simplified MumfordShah model [5, 21, 35] given as
This is a special case of equation (2). This is also called the minimal partition problem [6]. If is large, the emphasis is on the length of the boundaries and the minimization of leads to fewer boundaries and coarser segmentation. If is small, the emphasis is on the first term of equation (4) which leads to an approximation that follows much more closely and allows for more boundaries. Thus, a small leads to a finer segmentation. If , the minimum of forces leading to finest possible segmentation, which is a trivial segmentation, where every region consists of a single point (pixel).
2.1. Image Segmentation Model
Equation (4) is the reduced twophase segmentation model. In the twophase model, the image is divided into two segments, the object and the background. ChanVese used the level set method to implement the twophase segmentation model [6, 8]. In our approach, we solve the reduced twophase model by rewriting the problem using the DouglasRachford proximal splitting algorithm. We use a convexification technique similar to [18, 22].
In this work, we have considered RGB images. The goal is to divide the given image into two regions, the background and the foreground object. We define the foreground object as , background as , and the entire image as . So, image , here represent the arcs of the segment . The image contains pixels.
Let and denote the color values of randomly selected pixels as foreground pixel and background pixel, respectively. From simplified equation (4), we have where and , ; . Let and of
Now, our aim is to find the regions and that minimize functional which is given in equation (5). Defining , we represent the domain with binary indicator function so that
This function is the expected approximation of image . Using , the minimization of functional from equation (5) can be seen as follows:
The first term in equation (7) is the inner product of and and is defined as . The total variation pseudonorm is , which equals to for binary indicator by coarea formula established by Fleming and Rishel [20].
2.2. Discrete Formulation of Segmentation Model
To discretize variational problem (7) on a grid of pixels, we assume that . Let be a vector field for every and suppose then norm of is given by
The total variation pseudonorm, for , is defined as using equation (9). Thus, we obtain
We assume that the pixels are indexed on a twodimensional grid where and , the finite difference gradient operator is
We use periodic boundary conditions for simplicity. The inner product is
Equation (11) is a nonsmooth and nonconvex and usually direct minimization that leads to poor local minima. By replacing the binary constraint with a box constraint , we obtain the convex formulation of the energy functional of equation (11). This defines the following finite dimensional convex problem:
One can prove that this relaxation is exact, meaning that the minimizer , when it is unique, is binary, . It means that such that actually solves the original segmentation problem.
3. Proximal Splitting Technique
Proximal algorithms are used for solving largescale, constrained, or distributed optimization problems [36, 37]. Many image processing problems can be formulated as where : These algorithms use splitting technique such that the functions in equation (15) can be used to get an algorithm which is easily implementable. Usually, these algorithms are called proximal because each function in equation (15) is involved via its proximity operator. We propose to use proximal splitting scheme to solve our optimization problem, i.e., simplified MumfordShah model of equation (14).
3.1. Splitting
We introduce the variable (http://www.numericaltours.com). The optimization problem (14) can be seen as follows: where and are defined as follows:
Here, the constraints are defined using indicator function as follows:
3.2. Proximal Mapping
The proximal mapping, , for any convex function with parameter and
Here, is the minimizer of function . Proximal operators can be viewed as generalized projections (see Appendix A). Similarly, reflection is defined as
3.3. DouglasRachford Algorithm
The DouglasRachford (DR) algorithm minimizes functionals iteratively as in equation (16):
It is given that and are convex functions. Hence, we can compute their proximal mappings and , respectively. It is not required that the functions and are smooth. A DR iteration is (see Appendix C): where
We can show that for any value of and is a minimizer of .
3.3.1. Proximal Operator of
The proximal mapping of is the orthogonal projection on the convex set :
The following linear system of equations gives and because where
(see Appendix B).
Here, by convention, and . We have where
As we have discretized variational problem (7) on a grid of pixels, takes values and takes the values .
Also, and where the twodimensional discrete Fourier transform of an image is .
3.3.2. Proximal Operator of
The function can be written as where
The proximal mapping of is
We define the value of . The proximal operator of is where
The proximal operator of the norm is a soft thresholding of the amplitude of the vector field:
3.4. Implementation
The implementation is sketched in the following pseudocode. We use the following notations:
: input image of size
number of segments
: number of iterations
: control parameter in MumfordShah functional
: color values of randomly selected and
: color values of and after iterations of kmeans
indicate regions , , and
: indicator function
: gradient of
: value of functional at each iteration
The following subprograms are also needed:
: fast Fourier transform
: gradient
inverse fourier transform
: divergence

4. Experimental Results
4.1. Image Dataset
The performance of our algorithm is evaluated using the partial Weizmann dataset [28] images depicting one object in the foreground. Currently, the image database contains a hundred grey level and hundred color images along with their ground truth segmentations, out of which we have used only color images. The dataset comprises of images that clearly have an object that is different from the background. Each image has three corresponding ground truth segmentations as shown in Figure 1.
4.2. Evaluation Criterion
The performance of our algorithm is based on regionbased error metrics defined in [29]. Considering as segmentation obtained by an algorithm, as one of the groundtruth, we calculate
a value in the range . For a given pixel , we consider segment in the solution and segment in the ground truth that contain the pixel. If one segment is a proper subset of the other, then the pixel lies in an area of refinement, and the local error should be zero. Otherwise, the two regions overlap in an inconsistent manner and we should calculate the corresponding error. Here, is the set of pixels corresponding to the region in segmentation that contains pixel , and the local refinement error is defined as .
This local error measure is not symmetric, and it gives refinement in one direction only. It is to be noted that if is a refinement of at pixel , but not the opposite. Considering this local refinement error in each direction at each pixel, there are two methods to combine the values into an error measure for the entire image: global consistency error that forces all local refinements to be in the same direction and local consistency error that allows refinement in different directions in different parts of the image [38]. For a given as the number of pixels, we have
The regionbased consistency measure LCE is tolerant to refinement in either direction at each image pixel. If we simply replace the pixelwise minimum with a maximum, we get a measure that does not tolerate refinement at all and penalizes dissimilarity between segmentations proportional to the degree of region overlap. Applying to segmentations and , the bidirectional consistency error (BCE) is defined as
In addition, we can ask if segmentation obtained by our algorithm is consistent with the collection of all ground truth segmentations of that image. The measure is obtained by computing the minimum error at each pixel over each ground truth segmentation :
Structural similarity index [39] measures the visual quality between two images. It is calculated as a weighted combination of three factors luminance, contrast, and structural similarity. It can be expressed as where is the luminance, is the contrast, is the structural similarity, and , , and are positive constants. The three factors are calculated separately as follows: where and are local means, and are the standard deviations, and is the crosscovariance between the two images and , respectively. If , then SSIM simplifies to
We use SSIM to calculate the similarity between the image generated by the segmentation result, and the image generated from the ground truth segmentation. Here, we have implemented the proposed method for twophase segmentation, which gives us foreground (object) and background. We have used SSIM to evaluate the foreground.
4.3. Results
We have executed the DR algorithm over 100 images from the Weizmann dataset [29], for various values of as , , , , and is a parameter which is positive, and unfortunately, there is no fitness function to adjust it automatically. This is a drawback of the method. regKM and ChanVese methods also have similar limitation.
For each of the 100 images, the segmentation results are compared with their three ground truths to obtain the error measures GCE and . The average value of the three error values is taken as the final error. Out of these values, we found that the results using and are the best in terms of accuracy in most of the images. We elaborate these in Figure 2. It is very important that the error values are more concentrated towards zero. Figure 2(a) illustrates that for and , around 69% of the images have GCE value of 0.5 or less. Similarly, Figure 2(b) illustrates that for and , around 70% of the images have value of 0.5 or less.
(a)
(b)
Figure 1 shows the original image, the three human segmentations comprising the ground truth, the initial segmentation obtained by the kmeans clustering technique, and the result of the DouglasRachford algorithm for two selected images from the dataset. It is clear that when is large, the minimization of leads to fewer boundaries and coarser segmentation and when is small, we obtain a finer segmentation. Minimization of is achieved when the difference of between two iterations is less than 0.1 tolerance. We vary from 0.1 to 1.00 to have segmentation from finer to coarser, as established in [15]. Figure 3 compares the performance and efficiency of the iterations with different values. It shows that the value of the minimization of energy functional keeps decreasing when iterated long enough. We observe that for values of 0.75 and 1.00, minimization of energy functional reaches the lowest values. At the same time for value of 0.75, it converges faster.
(a)
(b)
Further, for comparison with other methods, we select three images from the dataset such that their segmentation is not an easy task. Figure 4 shows these three selected images and their different ground truths. For the church image, it is difficult to determine which is the object, the church, or the signboard. This confusion is also seen in the ground truths. The same is observed in the case of a lion, where either the lion or the ball, or both, is considered as the object. The segmentation of caterpillar is the most difficult because it has stripes which can act as an artificial boundary.
Figure 5 shows segmentations obtained by the regKM algorithm [14], ChanVese algorithm [8], and the proposed DR algorithm. The results are optimized as follows. Results of regKM are with the regularization parameter 0.5. The ChanVese algorithm has been used with various initial contours like a small circle, large circle, and userdefined box. We tested all these and chose to use small circles (see Figure 6) as it provided the best segmentation for these images. In our algorithm, we have used the kmeans clustering method for obtaining initial segmentation and then MumfordShah functional minimized by DouglasRachford using the parameter .
Table 1 shows the numerical comparison of regKM, ChanVese, and the proposed DR algorithm. The results show that the regKM algorithm has worse accuracy in all three cases which we obtained through the error evaluation criteria mentioned in Section 4.2. The results of the ChanVese method and the proposed DR method have similar accuracy; ChanVese is better with caterpillar while DR is better with lion.
Table 2 summarizes the SSIM values for the three methods: regKM, ChanVese, and the proposed DR. In case of all images, the proposed DR algorithm achieves significantly higher SSIM values which indicates its superiority over the compared methods.
Table 3 summarizes the processing times of these algorithms. We can see that the proposed DR algorithm is an order of magnitude faster than ChanVase. The three images are segmented in about 1.5 seconds, on average, while ChanVese takes 40 seconds per image, on average. The regKM technique does not give very good segmentation but is definitely fast in terms of execution time. However, the DR algorithm uses a minimization technique and therefore it takes more time but gives much better results than regKM and nearly the same results when compared to the popular ChanVese method.
5. Conclusions
In this paper, we proposed a reduced twophase MumfordShah model to segment images with one object. We obtained initial segmentation by the kmeans clustering technique and further minimized it by our implementation of the DouglasRachford algorithm. In kmeans, is a userselected parameter about how many segments. If needs to be automatically selected, then it should also be included in the MumfordShah functional in (2). Possible solutions how to do it, we refer to solutions in the clustering context [40]. Experiments with various error metrics LCE, GCE, and show that 70 percent of the segmentations keep the error values below 0.5. We also compare our algorithm to ChanVese implementation which is one of the classical methods to optimize the MumfordShah functional. Our segmentation is slightly better in two of the three selected images, while the main benefit is that our method is significantly faster than the ChanVese algorithm. Secondly, we compare our algorithm with a recent kmeans variant, regKM. According to the error metric, our segmentations are significantly better. However, regKM is much faster as it has a statistical approach of clustering but it does not give smooth segmentation with convex boundaries. Our algorithm balances time as well as quality of segmentation. As a future research, we consider adopting the MumfordShah model in kmeans context.
Appendix
A. The Projection, Proximal Mapping, and Reflection
Let be a closed convex set. The projection onto set is a mapping defined by
The proximal mapping of an indicator function is given by which is nothing but projection of on and is denoted by
The reflection with respect to set is a function defined by
Let be dimensional Euclidean space, be norm, and be the class of lower semicontinuous functions from such that and . For any , the minimization problem admits a unique solution, which is denoted by . The operator thus defined is the proximal operator of function
B. Solving a Linear System of Equations to Obtain
Let and is given by
Let , , and , than the is given by
Let and denote fast Fourier transform of then
C. DouglasRachford Algorithm
Let and be lower semicontinuous functions. Also, and need not be differentiable and .
The problem is
Solution. The problem has at least one solution. Let and .
The solution is denoted by ; DouglasRachford iterations are as follows:

Applying the same to our minimization problem, in equation (16) by taking and .

Data Availability
The data used to support the findings of this study are publicly available: http://cs.uef.fi/sipu/images/ and http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.