Computational Intelligence in Image Processing 2014View this Special Issue
Research Article | Open Access
Multithreshold Segmentation by Using an Algorithm Based on the Behavior of Locust Swarms
As an alternative to classical techniques, the problem of image segmentation has also been handled through evolutionary methods. Recently, several algorithms based on evolutionary principles have been successfully applied to image segmentation with interesting performances. However, most of them maintain two important limitations: (1) they frequently obtain suboptimal results (misclassifications) as a consequence of an inappropriate balance between exploration and exploitation in their search strategies; (2) the number of classes is fixed and known in advance. This paper presents an algorithm for the automatic selection of pixel classes for image segmentation. The proposed method combines a novel evolutionary method with the definition of a new objective function that appropriately evaluates the segmentation quality with respect to the number of classes. The new evolutionary algorithm, called Locust Search (LS), is based on the behavior of swarms of locusts. Different to the most of existent evolutionary algorithms, it explicitly avoids the concentration of individuals in the best positions, avoiding critical flaws such as the premature convergence to suboptimal solutions and the limited exploration-exploitation balance. Experimental tests over several benchmark functions and images validate the efficiency of the proposed technique with regard to accuracy and robustness.
Image segmentation  consists in grouping image pixels based on some criteria such as intensity, color, and texture and still represents a challenging problem within the field of image processing. Edge detection , region-based segmentation , and thresholding methods  are the most popular solutions for image segmentation problems.
Among such algorithms, thresholding is the simplest method. It works by considering threshold (points) values to adequately separate distinct pixels regions within the image being processed. In general, thresholding methods are divided into two types depending on the number of threshold values, namely, bilevel and multilevel. In bilevel thresholding, only a threshold value is required to separate the two objects of an image (e.g., foreground and background). On the other hand, multilevel thresholding divides pixels into more than two homogeneous classes that require several threshold values.
The thresholding methods use a parametric or nonparametric approach . In parametric approaches [6, 7], it is necessary to estimate the parameters of a probability density function that is capable of modelling each class. A nonparametric technique [8–11] employs a given criteria such as the between-class variance or the entropy and error rate, in order to determine optimal threshold values.
A common method to accomplish parametric thresholding is the modeling of the image histogram through a Gaussian mixture model  whose parameters define a set of pixel classes (threshold points). Therefore, each pixel that belongs to a determined class is labeled according to its corresponding threshold points with several pixel groups gathering those pixels that share a homogeneous grayscale level.
The problem of estimating the parameters of a Gaussian mixture that better model an image histogram has been commonly solved through the Expectation Maximization (EM) algorithm [13, 14] or gradient-based methods such as Levenberg-Marquardt, LM . Unfortunately, EM algorithms are very sensitive to the choice of the initial values , meanwhile gradient-based methods are computationally expensive and may easily get stuck within local minima .
As an alternative to classical techniques, the problem of Gaussian mixture identification has also been handled through evolutionary methods. In general, they have demonstrated to deliver better results than those based on classical approaches in terms of accuracy and robustness . Under these methods, an individual is represented by a candidate Gaussian mixture model. Just as the evolution process unfolds, a set of evolutionary operators are applied in order to produce better individuals. The quality of each candidate solution is evaluated through an objective function whose final result represents the similarity between the mixture model and the histogram. Some examples of these approaches involve optimization methods such as Artificial Bee Colony (ABC) , Artificial Immune Systems (AIS) , Differential Evolution (DE) , Electromagnetism Optimization (EO) , Harmony Search (HS) , and Learning Automata (LA) . Although these algorithms own interesting results, they present two important limitations. (1) They frequently obtain suboptimal approximations as a consequence of a limited balance between exploration and exploitation in their search strategies. (2) They are based on the assumption that the number of Gaussians (classes) in the mixture is preknown and fixed; otherwise, they cannot work. The cause of the first limitation is associated with their evolutionary operators employed to modify the individual positions. In such algorithms, during their evolution, the position of each agent for the next iteration is updated yielding an attraction towards the position of the best particle seen so far or towards other promising individuals. Therefore, as the algorithm evolves, these behaviors cause that the entire population rapidly concentrates around the best particles, favoring the premature convergence and damaging the appropriate exploration of the search space [25, 26]. The second limitation is produced as a consequence of the objective function that evaluates the similarity between the mixture model and the histogram. Under such an objective function, the number of Gaussians functions in the mixture is fixed. Since the number of threshold values (Gaussian functions) used for image segmentation varies depending on the image, the best threshold number and values are obtained by an exhaustive trial and error procedure.
On the other hand, bioinspired algorithms represent a field of research that is concerned with the use of biology as a metaphor for producing optimization algorithms. Such approaches use our scientific understanding of biological systems as an inspiration that, at some level of abstraction, can be represented as optimization processes.
In the last decade, several optimization algorithms have been developed by a combination of deterministic rules and randomness, mimicking the behavior of natural phenomena. Such methods include the social behavior of bird flocking and fish schooling such as the Particle Swarm Optimization (PSO) algorithm  and the emulation of the differential evolution in species such as the Differential Evolution (DE) . Although PSO and DE are the most popular algorithms for solving complex optimization problems, they present serious flaws such as premature convergence and difficulty to overcome local minima [29, 30]. The cause for such problems is associated with the operators that modify individual positions. In such algorithms, during the evolution, the position of each agent for the next iteration is updated yielding an attraction towards the position of the best particle seen so far (in case of PSO) or towards other promising individuals (in case of DE). As the algorithm evolves, these behaviors cause that the entire population rapidly concentrates around the best particles, favoring the premature convergence and damaging the appropriate exploration of the search space [31, 32].
Recently, the collective intelligent behavior of insect or animal groups in nature has attracted the attention of researchers. The intelligent behavior observed in these groups provides survival advantages, where insect aggregations of relatively simple and “unintelligent” individuals can accomplish very complex tasks using only limited local information and simple rules of behavior . Locusts (Schistocerca gregaria) are a representative example of such collaborative insects . Locust is a kind of grasshopper that can change reversibly between a solitary and a social phase, with clear behavioral differences among both phases . The two phases show many differences regarding the overall level of activity and the degree to which locusts are attracted or repulsed among them . In the solitary phase, locusts avoid contact to each other (locust concentrations). As consequence, they distribute throughout the space, exploring sufficiently over the plantation . On the other hand, in the social phase, locusts frantically concentrate around those elements that have already found good food sources . Under such a behavior, locusts attempt to efficiently find better nutrients by devastating promising areas within the plantation.
This paper presents an algorithm for the automatic selection of pixel classes for image segmentation. The proposed method combines a novel evolutionary method with the definition of a new objective function that appropriately evaluates the segmentation quality with regard to the number of classes. The new evolutionary algorithm, called Locust Search (LS), is based on the behavior presented in swarms of locusts. In the proposed algorithm, individuals emulate a group of locusts which interact to each other based on the biological laws of the cooperative swarm. The algorithm considers two different behaviors: solitary and social. Depending on the behavior, each individual is conducted by a set of evolutionary operators which mimics different cooperative conducts that are typically found in the swarm. Different to most of existent evolutionary algorithms, the behavioral model in the proposed approach explicitly avoids the concentration of individuals in the current best positions. Such fact allows avoiding critical flaws such as the premature convergence to suboptimal solutions and the incorrect exploration-exploitation balance. In order to automatically define the optimal number of pixel classes (Gaussian functions in the mixture), a new objective function has been also incorporated. The new objective function is divided into two parts. The first part evaluates the quality of each candidate solution in terms of its similarity with regard to the image histogram. The second part penalizes the overlapped area among Gaussian functions (classes). Under these circumstances, Gaussian functions that do not “positively” participate in the histogram approximation could be easily eliminated in the final Gaussian mixture model.
In order to illustrate the proficiency and robustness of the proposed approach, several numerical experiments have been conducted. Such experiments are divided into two sections. In the first part, the proposed LS method is compared to other well-known evolutionary techniques over a set of benchmark functions. In the second part, the performance of the proposed segmentation algorithm is compared to other segmentation methods which are also based on evolutionary principles. The results in both cases validate the efficiency of the proposed technique with regard to accuracy and robustness.
This paper is organized as follows: in Section 2 basic biological issues of the algorithm analogy are introduced and explained. Section 3 describes the novel LS algorithm and its characteristics. A numerical study on different benchmark function is presented in Section 4 while Section 5 presents the modelling of an image histogram through a Gaussian mixture. Section 6 exposes the LS segmentation algorithm and Section 7 the performance of the proposed segmentation algorithm. Finally, Section 8 draws some conclusions.
2. Biological Fundamentals and Mathematical Models
Social insect societies are complex cooperative systems that self-organize within a set of constraints. Cooperative groups are good at manipulating and exploiting their environment, defending resources and breeding, yet allowing task specialization among group members [38, 39]. A social insect colony functions as an integrated unit that not only possesses the ability to operate at a distributed manner but also undertakes a huge construction of global projects . It is important to acknowledge that global order for insects can arise as a result of internal interactions among members.
Locusts are a kind of grasshoppers that exhibit two opposite behavioral phases: solitary and social (gregarious). Individuals in the solitary phase avoid contact to each other (locust concentrations). As consequence, they distribute throughout the space while sufficiently exploring the plantation . In contrast, locusts in the gregarious phase gather into several concentrations. Such congregations may contain up to 1010 members, cover cross-sectional areas of up to 10 km2, and a travelling capacity up to 10 km per day for a period of days or weeks as they feed causing a devastating crop loss . The mechanism to switch from the solitary phase to the gregarious phase is complex and has been a subject of significant biological inquiry. Recently, a set of factors has been implicated to include geometry of the vegetation landscape and the olfactory stimulus .
Only few works [36, 37] that mathematically model the locust behavior have been published. Both approaches develop two different minimal models with the goal of reproducing the macroscopic structure and motion of a group of locusts. Considering that the method in  focuses on modelling the behavior for each locust in the group, its fundamentals have been employed to develop the algorithm that is proposed in this paper.
2.1. Solitary Phase
This section describes how each locust’s position is modified as a result of its behavior under the solitary phase. Considering that represents the current position of the th locust in a group of different elements, the new position is calculated by using the following model:with corresponding to the change of position that is experimented by as a consequence of its social interaction with all other elements in the group.
Two locusts in the solitary phase exert forces on each other according to basic biological principles of attraction and repulsion (see, e.g., ). Repulsion operates quite strongly over a short length scale in order to avoid concentrations. Attraction is weaker and operates over a longer length scale, providing the social force that is required to maintain the group’s cohesion. Therefore, the strength of such social forces can be modeled by the following function:Here, is a distance, describes the strength of attraction, and is the typical attractive length scale. We have scaled the time and the space coordinates so that the repulsive strength and length scale are both represented by the unity. We assume that and so that repulsion is stronger and features in a shorter-scale, while attraction is applied in a weaker and longer-scale; both facts are typical for social organisms . The social force exerted by locust over locust iswhere is the distance between the two locusts and is the unit vector pointing from to . The total social force on each locust can be modeled as the superposition of all of the pairwise interactions:The change of position is modeled as the total social force experimented by as the superposition of all of the pairwise interactions. Therefore, is defined as follows:In order to illustrate the behavioral model under the solitary phase, Figure 1 presents an example, assuming a population of three different members () which adopt a determined configuration in the current iteration . As a consequence of the social forces, each element suffers an attraction or repulsion to other elements depending on the distance among them. Such forces are represented by , , , , , and . Since and are too close, the social forces and present a repulsive nature. On the other hand, as the distances and are quite long, the social forces , , , and between and all belong to the attractive nature. Therefore, the change of position is computed as the vector resultant between and () is . The values and are also calculated accordingly.
In addition to the presented model , some studies [43–45] suggest that the social force is also affected by the dominance of the involved individuals and in the pairwise process. Dominance is a property that relatively qualifies the capacity of an individual to survive, in relation to other elements in a group. The locust’s dominance is determined by several characteristics such as size, chemical emissions, and location with regard to food sources. Under such circumstances, the social force is magnified or weakened depending on the most dominant individual that is involved in the repulsion-attraction process.
2.2. Social Phase
In this phase, locusts frantically concentrate around the elements that have already found good food sources. They attempt to efficiently find better nutrients by devastating promising areas within the plantation. In order to simulate the social phase, the food quality index is assigned to each locust of the group as such index reflects the quality of the food source where is located.
Under this behavioral model, each of the elements of the group is ranked according to its corresponding food quality index. Afterwards, the elements featuring the best food quality indexes are selected (). Considering a concentration radius that is created around each selected element, a set of new locusts is randomly generated inside . As a result, most of the locusts will be concentrated around the best elements. Figure 2 shows a simple example of the behavioral model under the social phase. In the example, the configuration includes eight locusts (), just as it is illustrated by Figure 2(a) that also presents the food quality index for each locust. A food quality index near to one indicates a better food source. Therefore, Figure 2(b) presents the final configuration after the social phase, assuming .
3. The Locust Search (LS) Algorithm
In this paper, some behavioral principles drawn from a swarm of locusts have been used as guidelines for developing a new swarm optimization algorithm. The LS assumes that entire search space is a plantation, where all the locusts interact to each other. In the proposed approach, each solution within the search space represents a locust position inside the plantation. Every locust receives a food quality index according to the fitness value of the solution that is symbolized by the locust’s position. As it has been previously discussed, the algorithm implements two different behavioral schemes: solitary and social. Depending on each behavioral phase, each individual is conducted by a set of evolutionary operators which mimics the different cooperative operations that are typically found in the swarm. The proposed method formulates the optimization problem in the following form:where is a nonlinear function whereas is a bounded feasible search space, which is constrained by the lower () and upper () limits.
In order to solve the problem formulated in (6), a population of locusts (individuals) is evolved inside the LS operation from the initial point () to a total number of iterations (). Each locust () represents an -dimensional vector where each dimension corresponds to a decision variable of the optimization problem to be solved. The set of decision variables constitutes the feasible search space , where and correspond to the lower and upper bounds for the dimension , respectively. The food quality index that is associated with each locust (candidate solution) is evaluated through an objective function whose final result represents the fitness value of . In the LS algorithm, each iteration of the evolution process consists of two operators: (A) solitary and (B) social. Beginning by the solitary stage, the set of locusts is operated in order to sufficiently explore the search space. On the other hand, the social operation refines existent solutions within a determined neighborhood (exploitation) subspace.
3.1. Solitary Operation (A)
One of the most interesting features of the proposed method is the use of the solitary operator to modify the current locust positions. Under this approach, locusts are displaced as a consequence of the social forces produced by the positional relations among the elements of the swarm. Therefore, near individuals tend to repel each other, avoiding the concentration of elements in regions. On the other hand, distant individuals tend to attract to each other, maintaining the cohesion of the swarm. A clear difference to the original model in  considers that social forces are also magnified or weakened depending on the most dominant (best fitness values) individuals that are involved in the repulsion-attraction process.
In the solitary phase, a new position is produced by perturbing the current locust position with a change of position . The change of position is the result of the social interactions experimented by as a consequence of its repulsion-attraction behavioral model. Such social interactions are pairwise computed among and the other individuals in the swarm. In the original model, social forces are calculated by using (3). However, in the proposed method, it is modified to include the best fitness value (the most dominant) of the individuals involved in the repulsion-attraction process. Therefore, the social force, that is exerted between and , is calculated by using the following new model:where is the social force strength defined in (2) and is the unit vector pointing from to . Besides, is a randomly generated number between 1 and −1. Likewise, is the dominance function that calculates the dominance value of the most dominant individual from and . In order to operate , all the individuals from are ranked according to their fitness values. The ranks are assigned so that the best individual receives the rank 0 (zero) whereas the worst individual obtains the rank . Therefore, the function is defined as follows:where the function rank() delivers the rank of the -individual. According to (8), yields a value within the interval . Its maximum value of one in is reached when either individual or is the best element of the population regarding their fitness values. On the other hand, a value close to zero is obtained when both individuals and possess quite bad fitness values. Figure 3 shows the behavior of considering 100 individuals. In the figure, it is assumed that represents one of the 99 individuals with ranks between 0 and 98 whereas is fixed to the element with the worst fitness value (rank 99).
Under the incorporation of in (7), social forces are magnified or weakened depending on the best fitness value (the most dominant) of the individuals involved in the repulsion-attraction process.
Finally, the total social force on each individual is modeled as the superposition of all of the pairwise interactions exerted over it:Therefore, the change of position is considered as the total social force experimented by as the superposition of all of the pairwise interactions. Thus, is defined as follows:After calculating the new positions of the population , the final positions must be calculated. The idea is to admit only the changes that guarantee an improvement in the search strategy. If the fitness value of is better than , then is accepted as the final solution. Otherwise, is retained. This procedure can be resumed by the following statement (considering a minimization problem):In order to illustrate the performance of the solitary operator, Figure 4 presents a simple example with the solitary operator being iteratively applied. A population of 50 different members () is assumed which adopt a concentrated configuration as initial condition (Figure 4(a)). As a consequence of the social forces, the set of elements tends to distribute throughout the search space. Examples of different distributions are shown in Figures 4(b), 4(c), and 4(d) after applying 25, 50, and 100 different solitary operations, respectively.
3.2. Social Operation (B)
The social procedure represents the exploitation phase of the LS algorithm. Exploitation is the process of refining existent individuals within a small neighborhood in order to improve their solution quality.
The social procedure is a selective operation which is applied only to a subset of the final positions (where ). The operation starts by sorting with respect to fitness values, storing the elements in a temporary population . The elements in are sorted so that the best individual receives the position whereas the worst individual obtains the location . Therefore, the subset is integrated by only the first locations of (promising solutions). Under this operation, a subspace is created around each selected particle . The size of depends on the distance which is defined as follows:where and are the upper and lower bounds in the th dimension and is the number of dimensions of the optimization problem, whereas is a tuning factor. Therefore, the limits of can be modeled as follows:where and are the upper and lower bounds of the th dimension for the subspace , respectively.
Considering the subspace around each element , a set of new particles are randomly generated inside bounds fixed by (13). Once the samples are generated, the individual of the next population must be created. In order to calculate , the best particle , in terms of its fitness value from the samples (where ), is compared to . If is better than according to their fitness values, is updated with ; otherwise, is selected. The elements of that have not been processed by the procedure () transfer their corresponding values to with no change.
The social operation is used to exploit only prominent solutions. According to the proposed method, inside each subspace , random samples are selected. Since the number of selected samples at each subspace is very small (typically ), the use of this operator substantially reduces the number of fitness function evaluations.
In order to demonstrate the social operation, a numerical example has been set by applying the proposed process to a simple function. Such function considers the interval of , whereas the function possesses one global maxima of value 8.1 at . Notice that and correspond to the axis coordinates (commonly and ). For this example, a final position population of six 2-dimensional members () is assumed. Figure 5 shows the initial configuration of the proposed example, with the black points representing half of the particles with the best fitness values (the first three elements of , ) whereas the grey points correspond to the remaining individuals. From Figure 5, it can be seen that the social procedure is applied to all black particles (, , and , , , ) yielding two new random particles (), which are characterized by white points , , , , , and , for each black point inside of their corresponding subspaces , , and . Considering the particle in Figure 7, the particle corresponds to the best particle () from the two randomly generated particles (according to their fitness values) within . Therefore, the particle will substitute in the individual for the next generation, since it holds a better fitness value than .
The LS optimization procedure is defined over a bounded search space S. Search points that do not belong to such area are considered to be infeasible. However, during the evolution process, some candidate solutions could fall outside the search space. In the proposed approach, such infeasible solutions are arbitrarily placed with a random position inside the search space S.
3.3. Complete LS Algorithm
LS is a simple algorithm with only seven adjustable parameters: the strength of attraction , the attractive length , number of promising solutions , the population size , the tuning factor , the number of random samples , and the number of generations . The operation of LS is divided into three parts: initialization of the solitary and social operations. In the initialization (), the first population is produced. The values of each individual and each dimension are randomly and uniformly distributed between the prespecified lower initial parameter bound and the upper initial parameter bound :In the evolution process, the solitary (A) and social (B) operations are iteratively applied until the number of iterations has been reached. The complete LS procedure is illustrated in Algorithm 1.
3.4. Discussion about the LS Algorithm
Evolutionary algorithms (EA) have been widely employed for solving complex optimization problems. These methods are found to be more powerful than conventional methods that are based on formal logics or mathematical programming . In the EA algorithm, search agents have to decide whether to explore unknown search positions or to exploit already tested positions in order to improve their solution quality. Pure exploration degrades the precision of the evolutionary process but increases its capacity to find new potentially solutions. On the other hand, pure exploitation allows refining existent solutions but adversely drives the process to local optimal solutions. Therefore, the ability of an EA to find a global optimal solution depends on its capacity to find a good balance between the exploitation of found-so-far elements and the exploration of the search space .
Most of swarm algorithms and other evolutionary algorithms tend to exclusively concentrate the individuals in the current best positions. Under such circumstances, such algorithms seriously limit their exploration-exploitation capacities.
Different to most of existent evolutionary algorithms, in the proposed approach, the modeled behavior explicitly avoids the concentration of individuals in the current best positions. Such fact allows not only to emulate the cooperative behavior of the locust colony in a good realistic way but also to incorporate a computational mechanism to avoid critical flaws that are commonly present in the popular evolutionary algorithms, such as the premature convergence and the incorrect exploration-exploitation balance.
It is important to emphasize that the proposed approach conducts two operators (solitary and social) within a single iteration. Such operators are similar to those that are used by other evolutionary methods such as ABC (employed bees, onlooker bees, and scout bees), AIS (clonal proliferation operator, affinity maturation operator, and clonal selection operator), and DE (mutation, crossover, and selection), which are all executed in a single iteration.
4. Numerical Experiments over Benchmark Functions
A comprehensive set of 13 functions, all collected from [48–50], has been used to test the performance of the LS approach as an optimization method. Tables 11 and 12 present the benchmark functions used in our experimental study. Such functions are classified into two different categories: unimodal test functions (Table 11) and multimodal test functions (Table 12). In these tables, is the function dimension; is the minimum value of the function, with being a subset of . The optimum locations () for functions in Tables 11 and 12 are in , except for , , and with in and in . A detailed description of optimum locations is given in Tables 11 and 12.
We have applied the LS algorithm to 13 functions whose results have been compared to those produced by the Particle Swarm Optimization (PSO) method  and the Differential Evolution (DE) algorithm , both considered as the most popular algorithms for many optimization applications. In all comparisons, the population has been set to 40 () individuals. The maximum iteration number for all functions has been set to 1000. Such stop criterion has been selected to maintain compatibility to similar works reported in the literature [48, 49].
The parameter setting for each of the algorithms in the comparison is described as follows:(1)PSO: in the algorithm, while the inertia factor () is decreased linearly from 0.9 to 0.2.(2)DE: the DE/Rand/1 scheme is employed. The parameter settings follow the instructions in [28, 51]. The crossover probability is and the weighting factor is .(3)In LS, and are set to 0.6 and , respectively. Besides, is fixed to 20 (/2), , whereas and are configured to 1000 and 40, respectively. Once such parameters have been experimentally determined, they are kept for all experiments in this section.
In the comparison, three indexes are considered: the average best-so-far solution (ABS), the standard deviation (SD), and the number of function evaluations (NFE). The first two indexes assess the accuracy of the solution whereas the last one measures the computational cost. The average best-so-far solution (ABS) expresses the average value of the best function evaluations that have been obtained from 30 independent executions. The standard deviation (SD) indicates the dispersion of the ABS values. Evolutionary methods are, in general, complex pieces of software with several operators and stochastic branches. Therefore, it is difficult to conduct a complexity analysis from a deterministic perspective. Under such circumstances, it is more appropriate to use the number of function evaluations (NFE), just as it is used in the literature [52, 53], to evaluate and assess the computational effort (time) and the complexity among optimizers. It represents how many times an algorithm uses the objective function to evaluate the objective (fitness) function until the best solution of a determined execution has been found. Since the experiments require 30 different executions, the NFE index corresponds to the averaged value obtained from these executions. A small NFE value indicates that less time is needed to reach the global optimum.
4.1. Unimodal Test Functions
Functions to are unimodal functions. The results for unimodal functions over 30 runs are reported in Table 1 considering the average best-so-far solution (ABS), the standard deviation (SD), and the number of function evaluations (NFE). According to this table, LS provides better results than PSO and DE for all functions in terms of ABS and SD. In particular, the test yields the largest performance difference in functions . Such functions maintain a narrow curving valley that is hard to optimize, in case the search space cannot be explored properly and the direction changes cannot be kept up with . For this reason, the performance differences are directly related to a better trade-off between exploration and exploitation that is produced by LS operators. In the practice, a main goal of an optimization algorithm is to find a solution as good as possible within a small time window. The computational cost for the optimizer is represented by its NFE values. According to Table 1, the NFE values that are obtained by the proposed method are smaller than its counterparts. Lower NFE values are more desirable since they correspond to less computational overload and, therefore, faster results. In the results perspective, it is clear that PSO and DE need more than 1000 generations in order to produce better solutions. However, this number of generations is considered in the experiments aiming for producing a visible contrast among the approaches. If the number of generations has been set to an exaggerated value, then all methods would converge to the best solution with no significant troubles.
A nonparametric statistical significance proof known as Wilcoxon’s rank sum test for independent samples [55, 56] has been conducted with an 5% significance level, over the “average best-so-far” data of Table 1. Table 2 reports the values produced by Wilcoxon’s test for the pairwise comparison of the “average best-so-far” of two groups. Such groups are formed by LS versus PSO and LS versus DE. As a null hypothesis, it is assumed that there is no significant difference between mean values of the two algorithms. The alternative hypothesis considers a significant difference between the “average best-so-far” values of both approaches. All values reported in the table are less than 0.05 (5% significance level) which is a strong evidence against the null hypothesis, indicating that the LS results are statistically significant and that it has not occurred by coincidence (i.e., due to the normal noise contained in the process).
4.2. Multimodal Test Functions
Multimodal functions possess many local minima which make the optimization a difficult task to be accomplished. In multimodal functions, the results reflect the algorithm’s ability to escape from local optima. We have applied the algorithms over functions to where the number of local minima increases exponentially as the dimension of the function increases. The dimension of such functions is set to 30. The results are averaged over 30 runs, with performance indexes being reported in Table 3 as follows: the average best-so-far solution (ABS), the standard deviation (SD), and the number of function evaluations (NFE). Likewise, values of the Wilcoxon signed-rank test of 30 independent runs are listed in Table 4. From the results, it is clear that LS yields better solutions than others algorithms for functions , , , and , in terms of the indexes ABS and SD. However, for functions and , LS produces similar results to DE. The Wilcoxon rank test results, that are presented in Table 6, confirm that LS performed better than PSO and DE considering four problems , whereas, from a statistical viewpoint, there is no difference between results from LS and DE for and . According to Table 3, the NFE values obtained by the proposed method are smaller than those produced by other optimizers. The reason of this remarkable performance is associated with its two operators: (i) the solitary operator allows a better particle distribution in the search space, increasing the algorithm’s ability to find the global optima and (ii) the use of the social operation provides a simple exploitation operator that intensifies the capacity of finding better solutions during the evolution process.
5. Gaussian Mixture Modelling
In this section, the modeling of image histograms through Gaussian mixture models is presented. Let one consider an image holding gray levels whose distribution is defined by a histogram represented by the following formulation: where denotes the number of pixels with gray level and Np the total number of pixels in the image. Under such circumstances, can be modeled by using a mixture of Gaussian functions of the form:where symbolizes the number of Gaussian functions of the model whereas is the a priori probability of function . and represent the mean and standard deviation of the th Gaussian function, respectively. Furthermore, the constraint must be satisfied by the model. In order to evaluate the similarity between the image histogram and a candidate mixture model, the mean square error can be used as follows:where represents the penalty associated with the constrain . Therefore, is considered as the objective function which must be minimized in the estimation problem. In order to illustrate the histogram modeling through a Gaussian mixture, Figure 6 presents an example, assuming three classes, that is, . Considering Figure 6(a) as the image histogram , the Gaussian mixture , that is shown in Figure 6(c), is produced by adding the Gaussian functions , , and in the configuration presented in Figure 6(b). Once the model parameters that better model the image histogram have been determined, the final step is to define the threshold values which can be calculated by simple standard methods, just as it is presented in [19–21].
6. Segmentation Algorithm Based on LS
In the proposed method, the segmentation process is approached as an optimization problem. Computational optimization generally involves two distinct elements: (1) a search strategy to produce candidate solutions (individuals, particles, insects, locust, etc.) and (2) an objective function that evaluates the quality of each selected candidate solution. Several computational algorithms are available to perform the first element. The second element, where the objective function is designed, is unquestionably the most critical. In the objective function, it is expected to embody the full complexity of the performance, biases, and restrictions of the problem to be solved. In the segmentation problem, candidate solutions represent Gaussian mixtures. The objective function (17) is used as a fitness value to evaluate the similarity presented between the Gaussian mixture and the image histogram. Therefore, guided by the fitness values ( values), a set of encoded candidate solutions are evolved using the evolutionary operators until the best possible resemblance can be found.
Over the last decade, several algorithms based on evolutionary and swarm principles [19–22] have been proposed to solve the problem of segmentation through a Gaussian mixture model. Although these algorithms own certain good performance indexes, they present two important limitations. (1) They frequently obtain suboptimal approximations as a consequence of an inappropriate balance between exploration and exploitation in their search strategies. (2) They are based on the assumption that the number of Gaussians (classes) in the mixture is preknown and fixed; otherwise, they cannot work.
In order to eliminate such flaws, the proposed approach includes (A) a new search strategy and (B) the definition of a new objective function. For the search strategy, the LS method (Section 4) is adopted. Under LS, the concentration of individuals in the current best positions is explicitly avoided. Such fact allows reducing critical problems such as the premature convergence to suboptimal solutions and the incorrect exploration-exploitation balance.
6.1. New Objective Function
Previous segmentation algorithms based on evolutionary and swarm methods use (17) as objective function. Under these circumstances, each candidate solution (Gaussian mixture) is only evaluated in terms of its approximation with the image histogram.
Since the proposed approach aims to automatically select the number of Gaussian functions in the final mixture , the objective function must be modified. The new objective function is defined as follows:where is a scaling constant. The new objective function is divided into two parts. The first part evaluates the quality of each candidate solution in terms of its similarity with regard to the image histogram (17). The second part penalizes the overlapped area among Gaussian functions (classes), with being defined as follows:where and represent the number of classes and the gray levels, respectively. The parameters and symbolize the Gaussian functions and , respectively, that are to be evaluated on the point (gray level) whereas the elements and represent the a priori probabilities of the Gaussian functions and , respectively. Under such circumstances, mixtures with Gaussian functions that do not “positively” participate in the histogram approximation are severely penalized.
Figure 7 illustrates the effect of the new objective function in the evaluation of Gaussian mixtures (candidate solutions). From the image histogram (Figure 7(a)), it is evident that two Gaussian functions are enough to accurately approximate the original histogram. However, if the Gaussian mixture is modeled by using a greater number of functions (e.g., four as it is shown in Figure 7(b)), the original objective function is unable to obtain a reasonable approximation. Under the new objective function , the overlapped area among Gaussian functions (classes) is penalized. Such areas, in Figure 7(c), correspond to , , and , where represents the penalization value produced between the Gaussian function and . Therefore, due to the penalization, the Gaussian mixture shown in Figures 7(b) and 7(c) provides a solution of low quality. On the other hand, the Gaussian mixture presented in Figure 7(d) maintains a low penalty; thus, it represents a solution of high quality. From Figure 7(d), it is easy to see that functions and can be removed from the final mixture. This elimination could be performed by a simple comparison with a threshold value , since and have a reduced amplitude (). Therefore, under , it is possible to find the reduced Gaussian mixture model starting from a considerable number of functions.
Since the proposed segmentation method is conceived as an optimization problem, the overall operation can be reduced to solve the formulation of (20) by using the LS algorithm:where , , and represent the probability, mean, and standard deviation of the class . It is important to remark that the new objective function allows the evaluation of a candidate solution in terms of the necessary number of Gaussian functions and its approximation quality. Under such circumstances, it can be used in combination with any other evolutionary method and not only with the proposed LS algorithm.
6.2. Complete Segmentation Algorithm
Once the new search strategy (LS) and objective function () have been defined, the proposed segmentation algorithm can be summarized by Algorithm 2. The new algorithm combines operators defined by LS and operations for calculating the threshold values.
(Line 1) The algorithm sets the operative parameters , , , , , , , and . They rule the behavior of the segmentation algorithm. (Line 2) Afterwards, the population is initialized considering different random Gaussian mixtures of functions. The idea is to generate an -random Gaussian mixture subjected to the constraints formulated in (20). The parameter must hold a high value in order to correctly segment all images (recall that the algorithm is able to reduce the Gaussian mixture to its minimum expression). (Line 3) Then, the Gaussian mixtures are evolved by using the LS operators and the new objective function . This process is repeated during cycles. (Line 8) After this procedure, the best Gaussian mixture is selected according to its objective function . (Line 9) Then, the Gaussian mixture is reduced by eliminating those functions whose amplitudes are lower than (). (Line 10) Then, the threshold values from the reduced model are calculated. (Line 11) Finally, the calculated values are employed to segment the image. Figure 8 shows a flowchart of the complete segmentation procedure.
The proposed segmentation algorithm permits to automatically detect the number of segmentation partitions (classes). Furthermore, due to its remarkable search capacities, the LS method maintains a better accuracy than previous algorithms based on evolutionary principles. However, the proposed method presents two disadvantages: first, it is related to its implementation which in general is more complex than most of the other segmentators based on evolutionary basics. The second refers to the segmentation procedure of the proposed approach which does not consider any spatial pixel characteristics. As a consequence, pixels that may belong to a determined region due to its position are labeled as a part of another region due to its gray level intensity. Such a fact adversely affects the segmentation performance of the method.
7. Segmentation Results
This section analyses the performance of the proposed segmentation algorithm. The discussion is divided into three parts: the first one shows the performance of the proposed LS segmentation algorithm while the second presents a comparison between the proposed approach and others segmentation algorithms that are based on evolutionary and swam methods. The comparison mainly considers the capacities of each algorithm to accurately and robustly approximate the image histogram. Finally, the third part presents an objective evaluation of segmentation results produced by all algorithms that have been employed in the comparisons.
7.1. Performance of LS Algorithm in Image Segmentation
This section presents two experiments that analyze the performance of the proposed approach. Table 5 presents the algorithm’s parameters that have been experimentally determined and kept for all the test images through all experiments.
First Image. The first test considers the histogram shown by Figure 9(b) while Figure 9(a) presents the original image. After applying the proposed algorithm, just as it has been configured according to parameters in Table 5, a minimum model of four classes emerges after the start from Gaussian mixtures of 10 classes. Considering 30 independent executions, the averaged parameters of the resultant Gaussian mixture are presented in Table 6. One final Gaussian mixture (ten classes), which has been obtained by LS, is presented in Figure 10. Furthermore, the approximation of the reduced Gaussian mixture is also visually compared with the original histogram in Figure 10. On the other hand, Figure 11 presents the segmented image after calculating the threshold points.
Second Image. For the second experiment, the image in Figure 12 is tested. The method aims to segment the image by using a reduced Gaussian mixture model obtained by the LS approach. After executing the algorithm according to parameters from Table 5, the resulting averaged parameters of the resultant Gaussian mixture are presented in Table 6. In order to assure consistency, the results are averaged considering 30 independent executions. Figure 13 shows the approximation quality that is obtained by the reduced Gaussian mixture model in (a) and the segmented image in (b).
7.2. Histogram Approximation Comparisons
This section discusses the comparison between the proposed segmentation algorithm and other evolutionary-segmentation methods that have been proposed in the literature. The set of methods used in the experiments includes the , , and . These algorithms consider the combination between the original objective function (17) and an evolutionary technique such as Artificial Bee Colony (ABC), the Artificial Immune Systems (AIS), and the Differential Evolution (DE) , respectively. Since the proposed segmentation approach considers the combination of the new objective function (18) and the LS algorithm, it will be referred to as . The comparison focuses mainly on the capacities of each algorithm to accurately and robustly approximate the image histogram.
In the experiments, the populations have been set to 40 () individuals. The maximum iteration number for all functions has been set to 1000. Such stop criterion has been considered to maintain compatibility to similar experiments that are reported in the literature . The parameter setting for each of the segmentation algorithms in the comparison is described as follows:(1) + ABC : in the algorithm, its parameters are configured as follows: the abandonment , and .(2) + AIS : it presents the following parameters, , , , , , and .(3) + DE : the DE/Rand/1 scheme is employed. The parameter settings follow the instructions in . The crossover probability is and the weighting factor is .(4)In , the method is set according to the values described in Table 5.
In order to conduct the experiments, a synthetic image is designed to be used as a reference in the comparisons. The main idea is to know in advance the exact number of classes (and their parameters) that are contained in the image so that the histogram can be considered as a ground truth. The synthetic image is divided into four sections. Each section corresponds to a different class which is produced by setting each gray level pixel to a value that is determined by the following model:where represents the section, whereas and are the mean and the dispersion of the gray level pixels, respectively. The comparative study employs the image of that is shown in Figure 14(a) and the algorithm’s parameters that have been presented in Table 7. Figure 14(b) illustrates the resultant histogram.
In the comparison, the discussion focuses on the following issues: first of all, accuracy; second, convergence; and third, computational cost.
Convergence. This section analyzes the approximation convergence when the number of classes that are used by the algorithm during the evolution process is different to the actual number of classes in the image. Recall that the proposed approach automatically finds the reduced Gaussian mixture which better adapts to the image histogram.
In the experiment, the methods, + ABC, + AIS, and + DE, are executed considering Gaussian mixtures composed of 8 functions. Under such circumstances, the number of classes to be detected is higher than the actual number of classes in the image. On the other hand, the proposed algorithm maintains the same configuration of Table 5. Therefore, it can detect and calculate up to ten classes ().
As a result, the techniques + ABC, + AIS, and + DE tend to overestimate the image histogram. This effect can be seen in Figure 15(a), where the resulting Gaussian functions are concentrated within actual classes. Such a behavior is a consequence of the evaluation that is considered by the original objective function , which privileges only the approximation between the Gaussian mixture and the image histogram. This effect can be graphically illustrated by Figure 16(a) that shows the pixel misclassification produced by the wrong segmentation of Figure 14(a). On the other hand, the proposed approach obtains a reduced Gaussian mixture model which allows the detection of each class from the actual histogram (see Figure 15(b)). As a consequence, the segmentation is significantly improved by eliminating the pixel misclassification, as it is shown by Figure 16(b).
It is evident from Figure 15 that the techniques, + ABC, + AIS, and + DE, all need an a priori knowledge of the number of classes that are contained in the actual histogram in order to obtain a satisfactory result. On the other hand, the proposed algorithm is able to find a reduced Gaussian mixture whose classes coincide with the actual number of classes that are contained in the image histogram.
Accuracy. In this section, the comparison among the algorithms in terms of accuracy is reported. Most of the reported comparisons [19–26] are concerned about comparing the parameters of the resultant Gaussian mixtures by using real images. Under such circumstances, it is difficult to consider a clear reference in order to define a meaningful error. Therefore, the image defined in Figure 14 has been used in the experiments because its construction parameters are clearly defined in Table 7.
Since the parameter values from Table 7 act as ground truth, a simple averaged difference between them and the values that are computed by each algorithm could be used as comparison error. However, as each parameter maintains different active intervals, it is necessary to express the differences in terms of percentage. Therefore, if is the parametric difference and the ground truth parameter, the percentage error can be defined as follows:In the segmentation problem, each Gaussian mixture represents a -dimensional model where each dimension corresponds to a Gaussian function of the optimization problem to be solved. Since each Gaussian function possesses three parameters , , and , the complete number of parameters is dimensions. Therefore, the final error produced by the final Gaussian mixture iswhere .
In order to compare accuracy, the algorithms, + ABC, + AIS, + DE, and the proposed approach are all executed over the image shown by Figure 14(a). The experiment aims to estimate the Gaussian mixture that better approximates the actual image histogram. Methods + ABC, + AIS, and + DE consider Gaussian mixtures composed of 4 functions (). In case of the method, although the algorithm finds a reduced Gaussian mixture of four functions, it is initially set with ten functions (). Table 8 presents the final Gaussian mixture parameters and the final error . The final Gaussian mixture parameters have been averaged over 30 independent executions in order to assure consistency. A close inspection of Table 8 reveals that the proposed method is able to achieve the smallest error () in comparison to the other algorithms. Figure 16 presents the histogram approximations that are produced by each algorithm whereas Figure 17 shows their correspondent segmented images. Both illustrations present the median case obtained throughout 30 runs. Figure 18 exhibits that + ABC, + AIS, and + DE present different levels of misclassifications which are nearly absent in the proposed approach case.
Computational Cost. The experiment aims to measure the complexity and the computing time spent by the + ABC, the + AIS, the + DE, and the algorithm while calculating the parameters of the Gaussian mixture in benchmark images (see Figures 19(a)–19(d)). + ABC, + AIS, and + DE consider Gaussian mixtures that are composed of 4 functions (). In case of the method, although the algorithm finds a reduced Gaussian mixture of four functions despite being initialized with ten functions (), Table 9 shows the averaged measurements after 30 experiments. It is evident that the + ABC and + DE are the slowest to converge (iterations) and the + AIS shows the highest computational cost (time elapsed) because it requires operators which demand long times. On the other hand, the shows an acceptable trade-off between its convergence time and its computational cost. Therefore, although the implementation of in general requires more code than most of other evolution-based segmentators, such a fact is not reflected in the execution time. Finally, Figure 19 below shows the segmented images as they are generated by each algorithm. It can be seen that the proposed approach generate more homogeneous regions whereas + ABC, + AIS, and + DE present several artifacts that are produced by an incorrect pixel classification.
7.3. Performance Evaluation of the Segmentation Results
This section presents an objective evaluation of segmentation results that are produced by all algorithms in the comparisons. The ill-defined nature of the segmentation problem makes the evaluation of a candidate algorithm difficult . Traditionally, the evaluation has been conducted by using some supervised criteria  which are based on the computation of a dissimilarity measure between a segmentation result and a ground truth image. Recently, the use of unsupervised measures has substituted supervised indexes for the objective evaluation of segmentation results . They enable the quantification of the quality of a segmentation result without a priori knowledge (ground truth image).
Evaluation Criteria. In this paper, the unsupervised index ROS proposed by Chabrier et al.  has been used to objectively evaluate the performance of each candidate algorithm. This index evaluates the segmentation quality in terms of the homogeneity within segmented regions and the heterogeneity among the different regions. ROS can be computed as follows:where quantifies the homogeneity within segmented regions. Similarly, measures the disparity among the regions. A segmentation result is considered better than , if . The interregion homogeneity characterized by is calculated considering the following formulation:where represents the number of partitions in which the image has been segmented. symbolizes the number of pixels contained in the partition whereas considers the number of pixels that integrate the complete image. Similarly, represents the standard deviation from the partition . On the other hand, disparity among the regions is computed as follows:where is the average gray level in the partition .
Experimental Protocol. In the comparison of segmentation results, a set of four classical images has been chosen to integrate the experimental set (Figure 20). The segmentation methods used in the comparison are + ABC , + AIS , and + DE .
From all segmentation methods used in the comparison, the proposed algorithm is the only one that has the capacity to automatically detect the number of segmentation partitions (classes). In order to conduct a fair comparison, all algorithms have been proved by using the same number of partitions. Therefore, in the experiments, the segmentation algorithm is firstly applied to detect the best possible number of partitions . Once we obtained the number of partitions , the rest of the algorithms were configured to approximate the image histogram with this number of classes.
Figure 21 presents the segmentation results obtained by each algorithm considering the experimental set from Figure 20. On the other hand, Table 10 shows the evaluation of the segmentation results in terms of the ROS index. Such values represent the averaged measurements after 30 executions. From them, it can be seen that the proposed method obtains the best ROS indexes. Such values indicate that the proposed algorithm maintains the best balance between the homogeneity within segmented regions and the heterogeneity among the different regions. From Figure 21, it can be seen that the proposed approach generates more homogeneous regions whereas + ABC, + AIS, and + DE present several artifacts that are produced by an incorrect pixel classification.