Abstract
In this paper, a multilevel thresholding (MT) algorithm based on the harmony search algorithm (HSA) is introduced. HSA is an evolutionary method which is inspired in musicians improvising new harmonies while playing. Different to other evolutionary algorithms, HSA exhibits interesting search capabilities still keeping a low computational overhead. The proposed algorithm encodes random samples from a feasible search space inside the image histogram as candidate solutions, whereas their quality is evaluated considering the objective functions that are employed by the Otsu’s or Kapur’s methods. Guided by these objective values, the set of candidate solutions are evolved through the HSA operators until an optimal solution is found. Experimental results demonstrate the high performance of the proposed method for the segmentation of digital images.
1. Introduction
Segmentation is one of the most important tasks in image processing that endeavors to identify whether a pixel intensity corresponds to a predefined class. Thresholding is the easiest method for segmentation as it works taking a threshold () value so that pixels whose intensity value is higher than are labeled as first class while the rest correspond to a second class label. When the image is segmented into two classes, the task is called bilevel thresholding (BT) and requires only one value. On the other hand, when pixels are separated into more than two classes, the task is named as MT and demands more than one value [1].
In recent years image processing has been applied to different areas as engineering, medicine, agriculture, and so forth. Since most of such implementations use a TH methodology, several techniques had been studied. Generally, TH methods are divided into parametric and nonparametric [2–5]. Parametric approaches need to estimate values of a probability density function to model each class. The estimation process is time consuming and computationally expensive. On the other hand, the TH nonparametric employs several criteria such as the betweenclass variance, the entropy, and the error rate [6–8] in order to verify the quality of a value. These metrics could also be used as optimization functions since they result as an attractive option due their robustness and accuracy.
There exist two classical thresholding methods. The first, proposed by Otsu in [6] that maximizes the variance between classes while the second method, submitted by Kapur et al. in [7], uses the maximization of the entropy to measure the homogeneity among classes. Their efficiency and accuracy have been already proved for a bilevel segmentation [4]. Although they can be expanded for MT, their computational complexity increases exponentially when a new threshold is incorporated [5].
As an alternative to classical methods, the MT problem has also been handled through evolutionary optimization methods. In general, they have demonstrated to deliver better results than those based on classical techniques in terms of accuracy, speed, and robustness. Numerous evolutionary approaches have been reported in the literature.
Hammouche et al. in [9] provides an interesting survey of how different evolutionary algorithms are used to solve the Kaptur’s and Otsu’s problems. The study uses four classical evolutionary algorithms to test their efficiency in MT. Such methods include differential evolution (DE) [10], simulated annealing (SA) [11], and tabu search (TS) [12].
Genetic algorithms (GA) [13], inspired on the biological evolution, have been also used for solving segmentation tasks. One interesting example is presented in [14], where a GAbased algorithm is combined with Gaussian models for multilevel thresholding. Other similar works, such as that of Yin [15] proposes an improved GA for optimal thresholding. In the approach, it is used as a learning strategy to increase the speed of convergence.
Evolutionary approaches inspired on swarm behavior, such as particle swarm optimization (PSO) [16] and artificial bee colony (ABC) [17], have been employed to face the segmentation problem. In [18], both methods are used to find the optimal multilevel threshold points by using the Kapur’s entropy as fitness function.
Finally, in [19], the optimal segmentation threshold values are determined by using the bacterial foraging algorithm (BFA). Such method aims to maximize the Kapur’s and Otsu’s objective functions by considering a set of operators that are based on the social foraging behavior of the bacteria Eschericha Colli.
On the other hand, the harmony search algorithm (HSA) introduced by Geem et al. [20] is an evolutionary optimization algorithm which is based on the metaphor of the improvisation process that occurs when a musician searches for a better state of harmony. The HSA generates a new candidate solution from all existing solutions. The solution vector is analogous to the harmony in music while the local and global search schemes are analogous to musician’s improvisations. In comparison to other metaheuristics methods in the literature, HSA imposes fewer mathematical requirements as it can be easily adapted for solving several sorts of engineering optimization challenges [21]. Furthermore, numerical comparisons have demonstrated that the convergence for the HSA is faster than GA [22] which attracts further attention. It has been successfully applied to solve a wide range of practical optimization problems such as discrete and continuous structural optimization [23], parameter estimation of the nonlinear Muskingum model [24], design optimization of water distribution networks [25], vehicle routing [26], combined heat and power economic dispatch [27], design of steel frames [28], and image processing [29]. Although the standard HSA presents good search characteristics, several modifications to the original HSA have been proposed in the literature in order to enhance its own features [30].
In this paper, a novel multithresholding segmentation algorithm is introduced. The proposed approach, called the harmony search multithresholding algorithm (HSMA), combines the original harmony search algorithm (HSA) and the Otsu’s and Kapur’s methodologies. The proposed algorithm takes random samples from a feasible search space inside the image histogram. Such samples build each harmony (candidate solution) in the HSA context, whereas its quality is evaluated considering the objective function that is employed by the Otsu’s or the Kapur’s method. Guided by these objective values, the set of candidate solutions are evolved using the HSA operators until the optimal solution is found. The approach generates a multilevel segmentation algorithm which can effectively identify the threshold values of a digital image within a reduced number of iterations. Experimental results over several complex images have validated the efficiency of the proposed technique regarding accuracy, speed, and robustness.
The paper is organized as follows. In Section 2, the HSA is introduced. Section 3 gives a brief description of the Otsu’s and Kapur’s methods. Section 4 explains the implementation of the proposed algorithm. Section 5 discusses experimental results after testing the proposed method over a set benchmark images. Finally, some conclusions are discussed in Section 6.
2. Harmony Search Algorithm
2.1. The Harmony Search Algorithm
In the basic HSA, each solution is called a “harmony” and is represented by an ndimension real vector. An initial population of harmony vectors are randomly generated and stored within a harmony memory (HM). A new candidate harmony is thus generated from the elements in the HM by using a memory consideration operation either by a random reinitialization or a pitch adjustment operation. Finally, the HM is updated by comparing the new candidate harmony and the worst harmony vector in the HM. The worst harmony vector is replaced by the new candidate vector when the latter delivers a better solution in the HM. The above process is repeated until a certain termination criterion is met. The basic HS algorithm consists of three main phases: HM initialization, improvisation of new harmony vectors, and updating the HM. The following discussion addresses details about each stage.
2.1.1. Initializing the Problem and the Algorithm Parameters
In general, the global optimization problem can be summarized as follows: where is the objective function, is the set of design variables, is the number of design variables, and and are the lower and upper bounds for the design variable , respectively. The parameters for HSA are the harmony memory size, that is, the number of solution vectors lying on the harmony memory (HM), the harmonymemory consideration rate (HMCR), the pitch adjusting rate (PAR), the distance bandwidth (BW), and the number of improvisations (NI) which represents the total number of iterations. The performance of HSA is strongly influenced by values assigned to such parameters, which in turn, depend on the application domain [31].
2.1.2. Harmony Memory Initialization
In this stage, initial vector components at HM, that is, HMS vectors, are configured. Let represent the th randomlygenerated harmony vector: for and , where rand(0,1) is a uniform random number between 0 and 1, the upper and lower bounds of the search space are defined by and , respectively. Then, the HM matrix is filled with the HMS harmony vectors as follows:
2.1.3. Improvisation of New Harmony Vectors
In this phase, a new harmony vector is built by applying the following three operators: memory consideration, random reinitialization, and pitch adjustment. Generating a new harmony is known as “improvisation.” In the memory consideration step, the value of the first decision variable for the new vector is chosen randomly from any of the values already existing in the current HM, that is, from the set . For this operation, a uniform random number is generated within the range . If is less than HMCR, the decision variable is generated through memory considerations; otherwise, is obtained from a random reinitialization between the search bounds . Values of the other decision variables are also chosen accordingly. Therefore, both operations, memory consideration and random reinitialization, can be modeled as follows: Every component obtained by memory consideration is further examined to determine whether it should be pitchadjusted. For this operation, the pitchadjusting rate (PAR) is defined as to assign the frequency of the adjustment and the bandwidth factor (BW) to control the local search around the selected elements of the HM. Hence, the pitchadjusting decision is calculated as follows: Pitch adjusting is responsible for generating new potential harmonies by slightly modifying original variable positions. Such operation can be considered similar to the mutation process in evolutionary algorithms. Therefore, the decision variable is either perturbed by a random number between 0 and BW or left unaltered. In order to protect the pitch adjusting operation, it is important to assure that points lying outside the feasible range must be reassigned, that is, truncated to the maximum or minimum values of the interval.
2.1.4. Updating the Harmony Memory
After a new harmony vector is generated, the harmony memory is updated by the survival of the fit competition between and the worst harmony vector in the HM. Therefore will replace and become a new member of the HM in case the fitness value of is better than the fitness value of .
2.1.5. Computational Procedure
The computational procedure of the basic HSA can be summarized as in Algorithm 1 [20].

This procedure is implemented for minimization. If the intention is to maximize the objective function, a sign modification of step 4 () is required. In this paper the HSA is used for maximization proposes.
3. Image Multilevel Thresholding (MT)
Thresholding is a process in which the pixels of a gray scale image are divided in sets or classes depending on their intensity level (). For this classification it is necessary to select a threshold value () and follow the simple rule of where is one of the pixels of the gray scale image that can be represented in gray scale levels . and are the classes in which the pixel can be located, while th is the threshold. The rule in (5) corresponds to a bilevel thresholding and can be easily extended for multiple sets: where represent different thresholds. The problem for both bilevel and MT is to select the values that correctly identify the classes. Although, Otsu’s and Kapur’s methods are wellknown approaches for determining such values, both propose a different objective function which must be maximized in order to find optimal threshold values, just as it is discussed below.
3.1. BetweenClass Variance (Otsu’s Method)
This is a nonparametric technique for thresholding proposed by Otsu [6] that employs the maximum variance value of the different classes as a criterion to segment the image. Taking the intensity levels from a gray scale image or from each component of a RGB (red, green, and blue) image, the probability distribution of the intensity values is computed as follows: where is a specific intensity level (), is the component of the image which depends if the image is gray scale or RGB, whereas is the total number of pixels in the image. (histogram) is the number of pixels that corresponds to the intensity level in . The histogram is normalized within a probability distribution . For the simplest segmentation (bilevel) two classes are defined as where and are probabilities distributions for and , as it is shown by It is necessary to compute mean levels and that define the classes using (10). Once those values are calculated, the Otsu variance between classes is calculated using (11) as follows: Notice that for both equations, (10) and (11), depends on the type of image. In (11) the number two is part of the Otsu’s variance operator and does not represent an exponent in the mathematical sense. Moreover and in (11) are the variances of and which are defined as where and . Based on the values and , (13) presents the objective function: where is the Otsu’s variance for a given value. Therefore, the optimization problem is reduced to find the intensity level () that maximizes (13).
Otsu’s method is applied for a single component of an image. In case of RGB images, it is necessary to apply separation into single component images. The previous description of such bilevel method can be extended for the identification of multiple thresholds. Considering thresholds, it is possible separate the original image into classes using (6); then it is necessary to compute the variances and their respective elements. The objective function in (13) can thus be rewritten for multiple thresholds as follows: where , is a vector containing multiple thresholds and the variances are computed through Here, represents the class, and are, respectively, the probability of occurrence and the mean of a class. In MT, such values are obtained as And, for the mean values Similar to the bilevel case, for the MT using the Otsu’s method, corresponds to the image components, RGB , and gray scale .
3.2. Entropy Criterion Method (Kapur’s Method)
Another nonparametric method that is used to determine the optimal threshold values has been proposed by Kapur et al. [7]. It is based on the entropy and the probability distribution of the image histogram. The method aims to find the optimal that maximizes the overall entropy. The entropy of an image measures the compactness and separability among classes. In this sense, when the optimal value appropriately separates the classes, the entropy has the maximum value. For the bilevel example, the objective function of the Kapur’s problem can be defined as where the entropies and are computed by the following model: is the probability distribution of the intensity levels which is obtained using (7). and are probabilities distributions for and . stands for the natural logarithm. Similar to the Otsu’s method, the entropybased approach can be extended for multiple threshold values; for such a case, it is necessary to divide the image into classes using the similar number of thresholds. Under such conditions, the new objective function is defined as: where is a vector that contains the multiple thresholds. Each entropy is computed separately with its respective value, so (21) is expanded for entropies: The values of the probability occurrence () of the classes are obtained using (16) and the probability distribution with (10). Finally, it is necessary to use (6) to separate the pixels into the corresponding classes.
4. Multilevel Thresholding Using Harmony Search Algorithm (HSMA)
4.1. Harmony Representation
Each harmony (candidate solution) uses different elements as decision variables within the optimization algorithm. Such decision variables represent a different threshold point that is used for the segmentation. Therefore, the complete population is represented as where refers to the transpose operator, HMS is the size of the harmony memory, is the th element of HM, and is set for RGB images while is chosen for gray scale images. For this problem, the boundaries of the search space are set to and , which correspond to image intensity levels.
4.2. HMA Implementation
The proposed segmentation algorithm has been implemented considering two different objective functions: Otsu and Kapur. Therefore, the HSA has been coupled with the Otsu and Kapur functions, producing two different segmentation algorithms. The implementation of both algorithms can be summarized as in Algorithm 2.

4.3. Parameter Setting
The performance of HSA is strongly influenced by values assigned to parameters HM, HMCR, PAR, BW, and NI. Determining the most appropriate parameter values for an arbitrary problem is a complex issue, since such parameters interact to each other in a highly nonlinear manner, and no mathematical models of such interaction currently exist. The common method to find the best set of parameter values is to fix each parameter value to a random number within the parameter limits and then HSA is executed. If the final result is not satisfactory; then a new set of parameter values is defined, and the evolutionary algorithm is executed again. Evidently, this process can be very expensive (in terms of computational time), since many different trials may be required before reaching a set of satisfactory parameter values. Additionally, the set of values chosen by the user are not necessarily the best possible, but only the best from the arbitrary number of trials performed by the user. In order to reduce the number of experiments in this paper, it has used the factorial design method proposed in [32, 33] to systematically identify the best parameters of HSA.
The factorial design method [34] is a statistical technique that evaluates at the same time all process variables in order to determine which ones really exert significant influence on the final response. All variables are called factors and the different values chosen to study the factors are called levels. The factors to be considered in the factorial design are the HSA parameters, the harmony memory (HM), the harmonymemory consideration rate (HMCR), the pitch adjusting rate (PAR), the distance bandwidth (BW), and the number of improvisations (NI), whereas the response is the best fitness value obtained as a consequence of the HSA execution. Table 1 show the levels of the quantitative factors used in the factorial design. The values of zero level (central point) are based on the suggestions of the literature [33].
Each experiment is conducted combining the two different levels that define each parameter considering as a problem an image histogram example. Since the factors are five, a fractional factorial design is chosen, requiring sixteen optimization experiments plus one optimization trial for the central point. The results obtained from seventeen runs were analyzed according to [32, 33] using a general linear form of analysis of variance (ANOVA) [34] considering a 95% of confidence. After such analysis, it was found that the best possible configuration of HSA is shown in Table 2. These results were consistent considering six replications using different image histograms and the Otsu (14) or Kapur (20) functions, indistinctly. For more information on how to build fractional factorial designs, the reader is referred to [32, 33].
5. Experimental Results
The HSMA has been tested under a set of 11 benchmark images. Some of these images are widely used in the image processing literature to test different methods (Lena, Cameraman, Hunter, Baboon, etc.) [3, 9]. All the images have the same size ( pixels), and they are in JPGE format. For the sake of representation, only five images which are presented in Figure 1 have been used to show the visual results; however, the numerical outcomes are analyzed considering the complete set.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
Since HSMA is stochastic, it is necessary to employ an appropriate statistical metrics to measure its efficiency. Hence, the results have been reported executing the algorithm 35 times for each image. In order to maintain compatibility with similar works reported in the literature [14, 15, 18, 19], the number of thresholds points used in the test are . In the experiments, the stop criterion is the number of times in which the best fitness values remain with no change. Therefore, if the fitness value for the best harmony remains unspoiled in 10% of the total number of iterations (), then the HSA is stopped.
To evaluate the stability and consistency, it has been computed the standard deviation (STD) from the results obtained in the 35 executions. Since the STD represents a measure about how the data are dispersed, the algorithm becomes more instable as the STD value increases [19]. Equation (23) shows the model used to calculate the STD value: where is the best fitness of the th iteration, is the average value of , and is the number of total executions ().
On the other hand, as an index of quality, the peaktosignal ratio (PSNR) is used to assess the similarity of the segmented image against a reference image (original image) based on the produced mean square error (MSE) [18, 35]. Both PSNR and MSE are defined as where is the original image, is the segmented image, for gray scale, and for RGB images, whereas , are the total number of rows and columns of the image, respectively.
5.1. Otsu’s Results
This section analyzes the results of HSMA after considering the variance among classes (14) as the objective function, just as it has been proposed by Otsu [6]. The approach is applied over the complete set of benchmark images, whereas the results are registered in Table 3. Such results present the best threshold values after testing the proposed method with four different threshold points . The table also features the and the values. It is evident that the and values increase their magnitude as the number of threshold points also increases.
For the sake of representation, it has been selected only five images of the set to show (graphically) the segmentation results. Figure 1 presents the images selected from the benchmark set and their respective histograms which possess irregular distributions (see Figure 1(j) in particular). Under such circumstances, classical methods face great difficulties to find the best threshold values.
The processing results for the selected original images are presented in five tables: Tables 4, 5, 6, 7, and 8. Such results show the segmented images considering four different threshold points, . The tables also show the evolution of the objective function during one execution.
5.2. Kapur’s Results
This section analyzes the performance of HSMA after considering the entropy function (20) as objective function, as it has been proposed by Kapur et al. in [7]. Table 9 presents the experimental results after the application of HSMA over the entire set of benchmark images. The values listed are , , and the best threshold values of the last population (). The same test procedure that was previously applied to the Otsu’s method (Section 5.1) is used with the Kapur’s method, also considering the same stop criterion and a similar HSA parameter configuration.
The results after appling the HSMA to the selected benchmark images are presented in Tables 10, 11, 12, 13, and 14. Four different threshold points have been employed: . All tables exhibit the segmented image, the approximated histogram, and the evolution of the fitness value during the execution of the HSA method.
From the results of both Otsu’s and Kapur’s methods, it is possible to appreciate that the HSMA converges (stabilizes) after a determined number of iterations depending on the value. For experimental purposes HSMA continues running still further, even though the stop criterion is achieved. In this way, the graphics show that convergence is often reached in the first iterations of the optimization process. The segmented images provide evidence that the outcome is better with and ; however, if the segmentation task does not require to be extremely accurate then it is possible to select .
5.3. Comparisons
In order to analyze the results of HSMA, two different comparisons are executed. The first one involves the comparison between the two versions of the proposed approach, one with the Otsu function and the other with the Kapur criterion. The second analyses the comparison between the HSMA and other stateoftheart approaches.
5.3.1. Comparison between Otsu and Kapur HSMA
In order to statistically compare the results from Tables 3 and 9, a nonparametric significance proof known as the Wilcoxon’s rank test [36, 37] for 35 independent samples has been conducted. Such proof allows assessing result differences among two related methods. The analysis is performed considering a 5% significance level over the peaktosignal ratio (PSNR) data corresponding to the five threshold points. Table 15 reports the values produced by Wilcoxon’s test for a pairwise comparison of the PSNR values between the Otsu and Kapur objective functions. As a null hypothesis, it is assumed that there is no difference between the values of the two objective functions. The alternative hypothesis considers an existent difference between the values of both approaches. All values reported in Table 15 are less than 0.05 (5% significance level) which is a strong evidence against the null hypothesis, indicating that the Otsu PSNR mean values for the performance are statistically better and it has not occurred by chance.
5.3.2. Comparison among HSMA and Other MT Approaches
The results produced by HSMA have been compared with those generated by stateoftheart thresholding methods such genetic algorithms (GA) [15], particle swarm optimization (PSO) [18], and bacterial foraging (BF) [19].
All the algorithms run 35 times over each selected image. The images at this test are the same as in Sections 5.2 and 5.1 (Camera man, Lena, Baboon, Hunter, and Butterfly). For each image, the , the , and the mean of the objective function values are calculated. Moreover, the entire test is performed using both Otsu’s and Kapur’s objective functions.
Table 16 presents the computed values for a reduced benchmark test (five images). It is clear that the HSMA delivers better performance than the others. Such values are computed using the Otsu’s method as the objective function. On the other hand, the same experiment has been performed using the Kapur’s method. Using the same criteria (as those described for the Otsu’s method), the algorithm runs over 35 times at each image. The results of this experiment are presented in Table 17 and show that the proposed HSMA algorithm is better in comparison with the GA, PSO, and BF.
6. Conclusions
In this paper, a MTbased method on the original harmony search algorithm (HSA) is presented. The approach combines the good search capabilities of HSA algorithm and the use of some objective functions that have been proposed by the popular MT methods of Otsu and Kapur. In order to measure the performance of the proposed approach, the peak signaltonoise ratio (PSNR) is used to assess the segmentation quality by considering the coincidences between the segmented and the original images. In this work, a simple HSA implementation without any modification is considered in order to demonstrate that it can be applied to image processing tasks.
The study explores the comparison between two versions of HSMA: one employs the Otsu objective function while the other uses the Kapur criterion. Results show that the Otsu function delivers better results than the Kapur criterion. Such conclusion has been statistically proved considering the Wilcoxon test.
The proposed approach has been compared to other techniques that implement different optimization algorithms like GA, PSO, and BF. The efficiency of the algorithm has been evaluated in terms of the PSNR index and the STD value. Experimental results provide evidence on the outstanding performance, accuracy, and convergence of the proposed algorithm in comparison to other methods. Although the results offer evidence to demonstrate that the standard HSA method can yield good results on complicated images, the aim of our paper is not to devise an MT algorithm that could beat all currently available methods, but to show that harmony search algorithms can be effectively considered as an attractive alternative for this purpose.
Acknowledgment
The proposed algorithm is part of the vision system used by a biped robot supported under the Grant CONACYT CB 181053. The first author acknowledges The National Council of Science and Technology of Mexico (CONACyT) for the doctoral Grant number 215517 and The Youth Institute of Jalisco (IJJ) for partially support this research.