#### Abstract

Finding an optimal set of discriminative features is still a crucial but challenging task in biomedical science. The complexity of the task is intensified when either of two scenarios arises: a highly dimensioned dataset or a small sample-sized dataset. The first scenario poses a big challenge to existing machine learning approaches since the search space for identifying the most relevant feature subset is too large to be explored quickly with minimal computational resources. The second scenario poses the challenge of too few samples to learn from. Though many hybrid metaheuristic approaches (i.e., approaches combining multiple search algorithms) have been proposed in the literature to address these challenges, with very attractive performance compared to their standalone counterparts, superior hybrid approaches can be achieved if the individual metaheuristics within the hybrid algorithms are improved prior to hybridization. Motivated by this, we propose a new hybrid Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimization (IDGWO), i.e., EACSIDGWO. In EACSIDGWO, the step size of the ACS and the nonlinear control strategy of the control parameter $a$ of the IDGWO are innovatively made adaptive via the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit. Since the population has a higher diversity at the early stages of the proposed EACSIDGWO algorithm, both the ACS and IDGWO are jointly involved in local exploitation. On the other hand, to enhance mature convergence at the later stages of the proposed algorithm, the role of the ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation.
To show that the proposed algorithm learns well from few instances and selects an optimal feature subset from information-rich biomedical data, all while maintaining a high classification accuracy, the EACSIDGWO is employed to solve the feature selection problem. The EACSIDGWO as a feature selector is tested on six standard biomedical datasets from the University of California at Irvine (UCI) repository. The experimental results are compared with state-of-the-art feature selection techniques, including binary ant-colony optimization (BACO), binary genetic algorithm (BGA), binary particle swarm optimization (BPSO), and extended binary cuckoo search algorithm (EBCSA). These results reveal that the EACSIDGWO has comprehensive superiority in tackling the feature selection problem, which demonstrates the capability of the proposed algorithm in solving real-world complex problems. Furthermore, the superiority of the proposed algorithm is confirmed via various numerical techniques such as ranking methods and statistical analysis.

#### 1. Introduction

Currently, there is a growing research interest in developing and deploying population-based metaheuristics to tackle combinatorial optimization challenges. This is because they are simple, flexible with an inexpensive computational cost, and gradient-free [1].

Many researchers have applied these optimization algorithms in various research domains because of their ability to achieve best solutions.

The optimization challenge grows bigger when tackling highly dimensioned datasets. This is because these datasets have a vast feature space with many classes. Due to the presence of redundant and noninformative attributes within these datasets, the process of effective machine learning is greatly hindered. Thus, the construction of efficient classifiers with high predictive power largely depends on the selection of informative features [2].

Feature selection (FS) is one of the main steps in data preprocessing. It aims at selecting a subset of attributes out of the whole dataset, resulting in the removal of noisy, noninformative, and redundant features. This in turn increases the accuracy of a considered classifier or clustering model [3].

FS algorithms can be broadly categorized into two classes: filter and wrapper techniques [4, 5]. Filters include techniques independent of classifiers and work directly on presented data. Moreover, these methods in many situations determine the correlations between features. On the contrary, wrapper approaches engage classifiers and mainly determine interactions between dataset features. From literature, wrapper approaches have proved to be superior compared to filters for classification algorithms [6, 7].

To utilize wrapper-based techniques, three key factors need to be outlined: the considered classifier (e.g., $k$-nearest neighbor (KNN) or support vector machine (SVM)), the evaluation criteria for the identified feature subset, and the search technique utilized in determining a subset of optimal features [8].
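The three wrapper ingredients can be illustrated with a minimal sketch. It assumes a plain $k$-NN classifier, leave-one-out accuracy as the evaluation criterion, and a binary mask standing in for the candidate subset that a search technique would propose. The function name `loo_knn_accuracy` and the toy data are hypothetical and are not taken from the paper.

```python
import math

def loo_knn_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only the features where mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        dists = []
        for n in range(len(X)):
            if n == i:
                continue  # leave sample i out
            d = math.sqrt(sum((X[i][j] - X[n][j]) ** 2 for j in cols))
            dists.append((d, y[n]))
        dists.sort(key=lambda pair: pair[0])
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)

# Toy data (assumed): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.1], [0.9, -2.9]]
y = [0, 0, 0, 1, 1, 1]
acc_informative = loo_knn_accuracy(X, y, [1, 0])
acc_noise = loo_knn_accuracy(X, y, [0, 1])
print(acc_informative, acc_noise)
```

In a wrapper setting, the search technique would call such an evaluation once per candidate mask, which is why the cost of the search grows quickly with the number of features.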

Many researchers have pointed out that determining an optimal subset of attributes is not only challenging but computationally expensive as well. Though, in the recent past, metaheuristics have proved to be reliable and efficient tools in tackling many optimization tasks (e.g., engineering designs problems, machine learning, feature selection, and data mining), they are not efficient in solving problems with high computational complexity [5, 9–11].

In the recent past, a number of metaheuristic search algorithms have been utilized for FS using highly dimensioned datasets. Some of these metaheuristics are the grey wolf optimization (GWO) [12, 13], genetic algorithm (GA) [14], particle swarm optimization (PSO) [11], ant-colony optimization (ACO) [15], differential evolution algorithm (DEA) [16], cuckoo search algorithm (CSA) [17], and dragonfly algorithm (DA) [18]. Though, many of these algorithms have already made an important contribution in the field of feature selection, in many cases, they offer acceptable solutions without a guarantee of determining optimal solutions since they do not explore the entire search space [11].

Some of the new modifications that have been proposed to improve the performance of these metaheuristics include chaotic maps [19], evolutionary methods [20], sine cosine algorithms [21], biogeography-based optimization, and local searches [22].

While designing or utilizing a metaheuristic, it should be noted that diversification (exploring the search space) and intensification (exploiting optimal solutions obtained so far) are two contradicting principles that must be balanced efficiently in order to achieve an improved performance of the metaheuristic [9].

In this regard, one promising alternative is developing a memetic algorithm whereby an integration of (at least) two algorithms is done with the aim of enhancing the overall performance.

Motivated by this, a good number of hybrid algorithms have been proposed in the recent past to solve a variety of optimizations and feature selection problems [23]. However, to enhance diversification and intensification of these hybrid algorithms, exploration and fine-tuning within their basic constituent algorithms is needed prior to hybridization [24].

This emphasizes, too, that there are a number of techniques lying within these memetic algorithms that are yet to be investigated.

Firstly, the technique of combining one or more nature-inspired algorithms (NIAs) needs to be determined. Secondly, the criterion for determining how many NIAs need to be combined within the search space has to be established. Thirdly, the method of determining the application area in which the proposed memetic algorithm is to be used has to be established. Finally, the criterion for applying the memetic algorithm in a specific domain has to be established [24].

Inspired by the aforementioned, this paper proposes a new hybrid algorithm called Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimization (IDGWO), i.e., the EACSIDGWO algorithm, to solve the feature selection problem in biomedical science. In the proposed algorithm, the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit is innovatively utilized to make the step size of the ACS and the nonlinear control strategy of the control parameter $a$ of the IDGWO adaptive. Since the population has a higher diversity during the early stages of the proposed algorithm, both the ACS and IDGWO are jointly utilized to attain accelerated convergence. However, to enhance mature convergence while striking an effective balance between exploitation and exploration in the later stages, the role of the ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation.

The remainder of this paper is organized as follows: Section 2 discusses the existing literature within the same research domain. Section 3 presents the background information of the CS and the GWO, respectively, where their inspirations and mathematical models are given emphasis. The continuous version of the proposed EACSIDGWO algorithm is presented in Section 4 while the details of its binary version are given in Section 5. The experimental methodology considered in this paper is presented in Section 6 while the results on feature selection are discussed in Section 7. Finally, conclusions and the suggested future works are given in Section 8.

#### 2. Literature Reviews

##### 2.1. Review of Hybridization of GWO with Other Search Algorithms

Combining two or more metaheuristics to attain better solutions is currently a new insight in the area of optimization. In the literature, many researchers have utilized GWO in the field of hybrid metaheuristics. For instance, in [25], a hybrid of GWO and artificial bee colony (ABC) is proposed to improve the performance of a complex system. In [26], GWO is hybridized with ant lion optimizer (ALO) for wrapper feature selection. Alomoush et al. [27] proposed a hybrid of GWO and harmony search (HS). In this memetic, GWO updates the bandwidth and pitch adjustment rate in HS, which in return improves the global optimization abilities of the hybrid algorithm. In [28], Arora et al. combined GWO with the crow search algorithm (CSA). The performance of the derived memetic as a feature selector is evaluated using 21 datasets. The obtained results reveal that the combined algorithm is superior in solving complex optimization problems. In [29], a novel combination of GWO and PSO is utilized as a load-balancing technique in the cloud-computing arena. The conclusions point out that the hybrid algorithm improved both the convergence speed and the simplicity in comparison with other algorithms. Zhu et al. [30] hybridized GWO with differential evolution (DE). The hybrid algorithm was tested on 23 different functions and an NP-hard (nondeterministic polynomial-time hard) problem. The obtained results indicate that this combination achieved superior exploration. In [31], a new memetic combining the exploration ability of the fireworks algorithm (FWA) with the exploitation ability of GWO is proposed. Utilizing 16 benchmark functions with varied dimensions and complexities, the experimental results indicate that the hybrid algorithm attained attractive global search abilities and convergence speeds.

##### 2.2. Review of Hybridization of CS with Other Search Algorithms

Utilizing the concept of rand and best agents within a population, Cheng et al. [32] developed an ensemble cuckoo search variant combining three different CS approaches that coexist within the entire search domain. These CS variants actively compete to derive superior generations for numerical optimization. To maintain population diversity, they introduced an external archive. The statistical results obtained reveal that the ensemble CS attained attractive convergence speeds as well as robustness. In [33], GWO is hybridized with CS, i.e., GWOCS, for the extraction of parameters for different PV cell models situated in different conditions. Zhang et al. [34] developed an ensemble CS algorithm that first divides a population into two smaller groups and then utilizes CS and differential evolution (DE) on the derived subgroups independently. The subgroups are free to share useful information by division. Further, the CS and DE algorithms can freely utilize each other’s merits to complement their weaknesses. This approach proved to balance the quality of solutions and the computation consumption. In [34], CS is hybridized with a covariance matrix adaptation evolution approach, i.e., CMA-CS, to improve the performance of CS in different optimization problems.

Despite the advantages portrayed by the aforementioned hybrid GWO and CS metaheuristics for optimization and feature selection, superior hybrid approaches can be achieved if the single GWO and CS algorithms are improved prior to hybridization. Furthermore, the no-free-lunch (NFL) theorem has logically proved that there has been, is, and will be no single metaheuristic capable of solving all optimization and feature selection problems [33]. While a given metaheuristic can show an attractive performance on specific datasets, its performance might degrade when applied to similar or different types of datasets [34]. Thus, there is still a dire need to improve existing algorithms or develop new ones to solve function optimization problems as well as feature selection problems efficiently.

#### 3. Standard Cuckoo Search (CS)

##### 3.1. Inspiration of CS

###### 3.1.1. The Behavior of Cuckoo Birds

To date, more than a thousand different species of birds are in existence in nature [35]. For most of these species, the female birds lay eggs in nests they have built themselves [36]. However, there exist some types of birds that do not build nests of their own, but instead lay their eggs in other different species’ nests, leaving the responsibility of taking care of their eggs to the host birds. The cuckoos are the most famous of these brood parasites [37].

There are three types of brood parasitism: intraspecific brood parasitism, cooperative breeding, and nest takeover [38]. The cuckoo strategy is full of amazing traits. First, the cuckoo replaces one host egg with its own to increase the chances of its egg being hatched by the host bird. Next, it tries to mimic the pattern and color(s) of the host's eggs with the aim of reducing the chances of its egg being noticed and discarded by the host bird. The timing of laying its egg is also remarkable: it cleverly selects a nest where a host bird has just laid eggs, implying that the cuckoo’s egg will hatch prior to the host eggs. The first action taken by the hatched cuckoo is evicting the host eggs that are yet to hatch by blindly propelling them out of the nest, in order to increase its chances of being fed well by the host bird [37]. In addition, this young cuckoo mimics the call of host chicks, thus gaining more access to the food provided by the host bird [39].

However, if this host bird is able to identify the cuckoo’s egg, it can either discard it from the nest or quit this nest to build a completely new nest in a different location.

###### 3.1.2. Lévy Flights

From the literature, many researchers have shown that the behavior of many flying animals, birds, and insects can be described by a Lévy flight [40–43]. Lévy flights are evident when some birds, insects, and animals follow a long path with sudden turns in combination with short random moves [43]. These Lévy flights have been successfully applied in optimization [41, 43–45]. A Lévy flight is a random walk whose step lengths are distributed according to a heavy-tailed probability distribution.

##### 3.2. Cuckoo Search (CS) Algorithm

CS is a swarm-based metaheuristic for global optimization, inspired by cuckoos, that was proposed by Yang and Deb in 2009. The CS combines the obligate brood parasitic nature of cuckoos with the Lévy flight existing in fruit flies and some birds [38]. There are three basic idealized rules for the CS, namely:

(i) A female cuckoo lays one egg at a time and puts it in a randomly chosen nest

(ii) The best nests with high-quality eggs (highest fitness/solutions) will carry over to the next generations

(iii) The number of available host nests is kept fixed, and the host bird can discover the egg laid by the female cuckoo (alien egg) with a probability $p_a \in [0, 1]$. Depending on the value of $p_a$, the host bird can either throw away the alien egg or abandon the nest. It is assumed that only a fraction $p_a$ of the nests is replaced by new ones

Based on the above rules, an illustration of the CS is shown in Algorithm 1.

Algorithm 1: Pseudocode for the standard CS.

##### 3.3. Mathematical Modelling of the Standard CS

Considering Algorithm 1, the standard CS has three major steps [46–48]:

(1) Exploitation (intensification) by the use of Lévy flight random walk (LFRW)

(2) Exploration (diversification) using biased selective random walk (BSRW)

(3) Elitist scheme via greedy selection

###### 3.3.1. Intensification Using Lévy Flight Random Walk (LFRW)

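In this phase, new solutions are generated around the current best solution, which in return enhances the speed of the local search. This phase is achieved via the LFRW that is generally presented in (1), where the step size is derived from the Lévy distribution:

$$x_i^{t+1} = x_i^{t} + \alpha \oplus \text{Lévy}(\beta) \tag{1}$$

where $x_i^{t}$ is the $i$-th nest in the $t$-th generation and $x_i^{t+1}$ is a new nest generated by the Lévy flight. $\oplus$ implies entry-wise multiplication, and $\alpha$ is the step size, where $\alpha > 0$, formulated in (2). The formula in equation (1) ensures that a new solution will be close to the current best solution.

$$\alpha = \alpha_0 \left(x_i^{t} - x_{best}^{t}\right) \tag{2}$$

where $x_{best}^{t}$ is the current best solution and $\alpha_0$ is a scaling factor that is set to 0.01 in the standard CSA [38, 49]. $\text{Lévy}(\beta)$ is a random number derived from the Lévy distribution and is formulated in (3):

$$\text{Lévy}(\beta) = \frac{\phi \cdot u}{|v|^{1/\beta}} \tag{3}$$

where $\beta$ is a constant whose value is 1.5 as suggested by Yang in the standard CS [38]. $u$ and $v$ are random numbers derived from a normal distribution whose mean is 0 and whose standard deviation is 1. $\phi$ is a parameter computed in (4):

$$\phi = \left[\frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\!\left(\frac{1+\beta}{2}\right)\beta\,2^{(\beta-1)/2}}\right]^{1/\beta} \tag{4}$$

where $\Gamma$ is the gamma function. The final form of the Lévy flight random walk (LFRW) is a combination of equations (1) to (4), as presented in (5):

$$x_i^{t+1} = x_i^{t} + \alpha_0 \frac{\phi \cdot u}{|v|^{1/\beta}} \left(x_i^{t} - x_{best}^{t}\right) \tag{5}$$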
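The LFRW can be sketched in a few lines. The sketch assumes the Mantegna-style form commonly used with the standard CS: a Lévy number $\phi u / |v|^{1/\beta}$ with $u$ and $v$ drawn from a standard normal distribution, scaled by $\alpha_0 = 0.01$ and by the difference between the current nest and the current best nest. The function names `levy_phi` and `lfrw_step` are hypothetical.

```python
import math
import random

def levy_phi(beta=1.5):
    """Scale parameter phi of the Mantegna-style Levy step for a given beta."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def lfrw_step(x, x_best, alpha0=0.01, beta=1.5, rng=random):
    """One Levy flight random walk move of nest x relative to the best nest."""
    phi = levy_phi(beta)
    new_x = []
    for xj, bj in zip(x, x_best):
        u = rng.gauss(0, 1)  # standard normal draw
        v = rng.gauss(0, 1)  # standard normal draw
        levy = phi * u / abs(v) ** (1 / beta)
        new_x.append(xj + alpha0 * levy * (xj - bj))
    return new_x

rng = random.Random(0)
print(levy_phi())
print(lfrw_step([2.0, -1.0], [0.5, 0.5], rng=rng))
```

Because the move is scaled by the distance to the best nest, the best nest itself is left unchanged by this walk, and nests already near the best one take very small steps.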

###### 3.3.2. Diversification by the Use of Biased Selective Random Walk (BSRW)

In this phase, new solutions are randomly generated in locations far from the current best solution, an approach that ensures that the CSA is not trapped in a local optimum, thus enhancing suitable diversity and exploration of the entire search space [48]. This phase of the CSA is achieved by utilizing the BSRW, which is efficient in exploring the entire search space, especially when it is large, since the step size in the Lévy flight is much longer in the long run [46, 48].

To find new solutions that are far from the current best solution, a trial solution is first obtained by using a mutation of the current best solution and a differential step size from two solutions selected randomly. Then, a new solution is derived from a crossover operator between the current best solution and the two trial solutions [48]. The formulation of the BSRW is given in (6) [47]:

$$x_i^{t+1} = \begin{cases} x_i^{t} + r\left(x_m^{t} - x_n^{t}\right), & \text{if } r > p_a \\ x_i^{t}, & \text{otherwise} \end{cases} \tag{6}$$

where $m$ and $n$ are two random indexes, $r$ is a random number in the range [0, 1], and $p_a$ is the discovery probability whose best value is 0.25 [38, 48].
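A minimal sketch of one BSRW pass is given here. It assumes one common reading of the update: a nest moves by a random multiple of the difference between two randomly chosen nests when the random number exceeds the discovery probability, and stays put otherwise. The function name `bsrw` and the use of a single random number for both the test and the scaling are assumptions made for illustration.

```python
import random

def bsrw(nests, p_a=0.25, rng=random):
    """One biased selective random walk pass over all nests (assumed form)."""
    n = len(nests)
    new_nests = []
    for x in nests:
        m, k = rng.sample(range(n), 2)  # two distinct random nest indexes
        r = rng.random()                # uniform random number in [0, 1)
        if r > p_a:
            # Differential step built from two randomly selected nests.
            new_nests.append([xj + r * (a - b)
                              for xj, a, b in zip(x, nests[m], nests[k])])
        else:
            new_nests.append(list(x))   # nest is kept as it is
    return new_nests

nests = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(bsrw(nests, p_a=0.25, rng=random.Random(0)))
```

Since the step depends on the spread between two random nests rather than on the best nest, the moves stay large for as long as the population is spread out.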

###### 3.3.3. Elitist Scheme via Greedy Selection

After each random walk process, the cuckoo search algorithm utilizes the greedy strategy to select solutions with better fitness values that will be passed to the next generation. This facilitates maintenance of good solutions [48].
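The greedy strategy can be sketched as a pairwise comparison between each current nest and its trial nest. The sketch assumes a minimization problem; `greedy_select` and the `sphere` test function are hypothetical names used only for illustration.

```python
def sphere(x):
    """Toy fitness function (assumed): sum of squares, to be minimized."""
    return sum(v * v for v in x)

def greedy_select(current, trial, fitness):
    """Keep the trial nest only where it is strictly fitter than the current one."""
    return [t if fitness(t) < fitness(c) else c for c, t in zip(current, trial)]

current = [[1.0, 1.0], [0.1, 0.0]]
trial = [[0.5, 0.5], [2.0, 0.0]]
survivors = greedy_select(current, trial, sphere)
print(survivors)
```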

#### 4. Grey Wolf Optimization (GWO) Algorithm

GWO is a recent nature-inspired metaheuristic algorithm that was proposed by Mirjalili et al. in 2014 [28, 50, 51]. The GWO imitates both the hunting and leadership traits of the grey wolves. The grey wolves belong to the Canidae family and follow a social hierarchy that is very strict. In most cases, a pack of between 5 and 12 wolves is involved in hunting. To efficiently simulate the leadership hierarchy of the conventional GWO algorithm, four levels are considered: alpha (*α*), beta (*β*), delta (*δ*), and omega (*ω*). Alpha, which is either a male or female is at the topmost of the hierarchy and is regarded as the leader of the pack. This leader makes all suitable decisions for the pack which are not limited to discipline and order, hunting, sleeping location, and waking-up time for the entire pack. Beta is known to assist the alpha in decision-making, and their main task is the feedback suggestions. Delta behaves like scouts, caretakers, sentinels, hunters, and elders. They control and guide the omega wolves by obeying both the beta and alpha wolves. The omega wolves are the least in the hierarchy and must obey all the other wolves [28, 50, 51].

The GWO algorithm is modelled mathematically in four stages that are described as follows.

##### 4.1. Leadership Hierarchy

The mathematical model of the GWO is anchored on the social hierarchy of the grey wolves. The alpha (*α*) is considered the best solution in the population while beta (*β*) and delta (*δ*) are termed as the second and third best solutions, respectively. Lastly, the omega (*ω*) is assumed as the rest of the solutions in the population [28, 50, 51].

##### 4.2. Encircling the Prey

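Equations (7) and (8) represent the mathematical model for the wolves’ encircling trait [50]:

$$\vec{D} = \left|\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\right| \tag{7}$$

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \tag{8}$$

where $\vec{D}$ is the distance between the prey and a given wolf, $\vec{X}$ is the wolf’s position vector, and $\vec{X}_p$ depicts the prey’s position vector at iteration $t$. $\vec{A}$ and $\vec{C}$ are random vectors computed as shown in (9) and (10) [50]:

$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \tag{9}$$

$$\vec{C} = 2 \cdot \vec{r}_2 \tag{10}$$

where $\vec{r}_1$ and $\vec{r}_2$ are randomly generated vectors in the range [0, 1] and $\vec{a}$ is a vector whose components linearly decrease from 2 to 0 over the iterations.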
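A minimal sketch of the encircling move follows, assuming the standard GWO form: the distance is the absolute difference between the randomly scaled prey position and the wolf position, and the new position is the prey position minus a random coefficient times that distance. The function name `encircle` is hypothetical, and the update is applied per component.

```python
import random

def encircle(x, x_prey, a, rng=random):
    """Move wolf x around the prey x_prey for a given control parameter a."""
    new_x = []
    for xj, pj in zip(x, x_prey):
        A = 2.0 * a * rng.random() - a  # random coefficient in [-a, a)
        C = 2.0 * rng.random()          # random coefficient in [0, 2)
        D = abs(C * pj - xj)            # distance term
        new_x.append(pj - A * D)
    return new_x

print(encircle([3.0, -1.0], [1.0, 2.0], a=2.0, rng=random.Random(0)))
print(encircle([3.0, -1.0], [1.0, 2.0], a=0.0))  # collapses onto the prey
```

When the control parameter reaches zero, the random coefficient vanishes and the wolf lands exactly on the prey position, which is the limiting case of exploitation.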

##### 4.3. Hunting the Prey

In the hunting stage, the alpha is considered the best applicant for the solution while its two assistants (beta and delta) are expected to know the possible location of the prey. Thus, the best three solutions that have been achieved until a given iteration are preserved and are used to compel the remaining wolves in the pack (i.e., omega) to update their positions in the search space consistent with the optimal location.

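The mechanism utilized in updating the wolves’ positions is given in (11):

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{11}$$

where $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ are defined and computed using (12):

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \tag{12}$$

where $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$ are the three best wolves (solutions) in the pack at a given iteration $t$. $\vec{A}_1$, $\vec{A}_2$, and $\vec{A}_3$ are calculated using Equation (9), while $\vec{D}_\alpha$, $\vec{D}_\beta$, and $\vec{D}_\delta$ are calculated using (13):

$$\vec{D}_\alpha = \left|\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}\right|, \quad \vec{D}_\beta = \left|\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}\right|, \quad \vec{D}_\delta = \left|\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}\right| \tag{13}$$

where $\vec{C}_1$, $\vec{C}_2$, and $\vec{C}_3$ are calculated based on Equation (10).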
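The hunting step can be sketched as follows, assuming the standard GWO update in which each wolf computes one candidate position per leader (alpha, beta, delta) and then moves to their average. The function name `gwo_update` is hypothetical.

```python
import random

def gwo_update(x, leaders, a, rng=random):
    """New position of wolf x as the mean of the three leader-guided candidates."""
    new_x = []
    for j, xj in enumerate(x):
        candidates = []
        for leader in leaders:              # alpha, beta, delta in turn
            A = 2.0 * a * rng.random() - a  # fresh random coefficient per leader
            C = 2.0 * rng.random()
            D = abs(C * leader[j] - xj)
            candidates.append(leader[j] - A * D)
        new_x.append(sum(candidates) / len(candidates))
    return new_x

leaders = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(gwo_update([5.0, -4.0], leaders, a=1.5, rng=random.Random(0)))
print(gwo_update([5.0, -4.0], leaders, a=0.0))  # plain average of the leaders
```

The averaging treats the three leaders equally regardless of how their candidate positions score, which is the behavior that a fitness-based update criterion would replace.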

##### 4.4. Searching and Attacking the Prey

The grey wolves can only attack the prey when it stops moving. This is modelled mathematically based on vector $\vec{A}$ that is defined in Equation (9). Vector $\vec{A}$ is comprised of values that span the range $[-a, a]$, and the value of $a$ is decreased from 2 to 0 over the course of iterations using (14):

$$a = 2 - \frac{2t}{T} \tag{14}$$

where $t$ is the iteration number and $T$ is the total number of iterations.

When $|\vec{A}| < 1$, the wolves are forced to attack the prey, and when $|\vec{A}| > 1$, the wolves diverge from the current prey. Searching for the prey is the exploration phase while attacking it is the exploitation phase.
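The link between the control parameter and the two phases can be checked numerically. The sketch assumes the standard linear schedule, in which the control parameter falls from 2 to 0, and the standard coefficient that is drawn uniformly between minus and plus the control parameter. The names `linear_a` and `sample_A` are hypothetical.

```python
import random

def linear_a(t, t_max):
    """Control parameter decreasing linearly from 2 (t = 0) to 0 (t = t_max)."""
    return 2.0 - 2.0 * t / t_max

def sample_A(a, rng=random):
    """Random coefficient drawn uniformly from [-a, a)."""
    return 2.0 * a * rng.random() - a

rng = random.Random(1)
late_a = linear_a(80, 100)  # late in the run, the control parameter is below 1
late_samples = [sample_A(late_a, rng) for _ in range(1000)]
print(late_a, max(abs(s) for s in late_samples))
```

Once the control parameter drops below 1, every sampled coefficient has magnitude below 1, so only the attacking (exploitation) behavior remains possible.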

Algorithm 2: Pseudocode for the GWO.

#### 5. Excited-Adaptive Cuckoo Search-Intensification Dedicated Grey Wolf Optimization (EACSIDGWO)

In general, effective balancing between diversification (global search) and intensification (local search) in a metaheuristic plays a beneficial and crucial role in achieving excellent performance of an algorithm [52–54]. However, it is difficult to achieve this balance with a single metaheuristic (for example, either using CSA or GWO) [52, 53]. For instance, CSA is efficient at exploring the promising area of the whole search space (diversification) but ineffective at fine-tuning the end of the search space (exploitation/intensification) [55, 56]. On the other hand, GWO is good at intensification (exploitation) but inefficient at diversification (exploration) [32, 57].

For this reason, in trying to enhance mature convergence while ensuring that the required effective balance between diversification and intensification is met, a hybrid algorithm called Excited-Adaptive Cuckoo Search-Intensification Dedicated Grey Wolf Optimization (EACSIDGWO) utilizing the strengths of each algorithm (i.e., CSA’s diversification and GWO’s intensification abilities) is proposed in this paper. Moreover, the adaptability of the proposed EACSIDGWO is guided innovatively by the complete voltage and current responses of a DC excited RC circuit (whose analysis results in first order differential equations) that finds continual applications in electronics, communications, and control systems [58].

##### 5.1. Adaptive Cuckoo Search (ACS)

###### 5.1.1. Adaptive Step Size via the Complete Voltage Response of the DC Excited RC Circuit

From the details of the standard CS algorithm presented in Section 3, it is evident that the algorithm lacks a criterion to control its step size through the iteration process. Control of the step size is key in guiding the CS algorithm to reach its global maximum or minimum [48, 59].

Inspired by the complete voltage response of a direct current (DC) excited RC circuit which increases with time, a novel mechanism to control the step size is proposed. Contrary to prior research [48, 59] where the step size decays with generations, in this research, the step size grows with generations with the aim of strengthening the diversification (exploration) ability of the CS, which is a component of the proposed EACSIDGWO algorithm.

The solution to the first order differential equation of the direct current-excited RC circuit motivated the formulation of a new variant of ACS in this paper.

The complete voltage response of the RC circuit to a sudden application of a DC voltage source, with the assumption that the capacitor is initially not charged, is given in (15):

$$v(t) = V_s\left(1 - e^{-t/\tau}\right) \tag{15}$$

where $\tau = RC$ is the time constant, which expresses the rapidity with which this voltage rises to the value of $V_s$, which is a constant DC voltage source. $R$ and $C$ are the equivalent resistance and capacitance in the circuit, respectively.

Considering the situation when , equation (15) can be rewritten as presented in

As $t \to \infty$, the component $e^{-t/\tau} \to 0$, forcing $v(t) \to V_s$. We adopt this concept, i.e., the exponential growth of $v(t)$, to control the step size of the cuckoo search algorithm by introducing the proposed equation (17), where $gen$ is the current generation (iteration), $step_{Max}$ is the upper bound of the step size, and $gen_{Max}$ is the maximum number of generations (iterations).
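The saturating behavior of the capacitor voltage that motivates the step-size control can be checked numerically. The sketch evaluates the textbook step response of an initially uncharged RC circuit; the component values (1 kΩ, 1 µF, 5 V) and the function name `rc_voltage` are assumptions chosen only for illustration.

```python
import math

def rc_voltage(t, v_s, r, c):
    """Capacitor voltage at time t after a DC source v_s is applied (no initial charge)."""
    tau = r * c  # time constant
    return v_s * (1.0 - math.exp(-t / tau))

R, C, V_S = 1e3, 1e-6, 5.0  # assumed example values
tau = R * C
for k in (0, 1, 3, 5):
    print(k, round(rc_voltage(k * tau, V_S, R, C), 4))
```

The voltage rises quickly at first and then flattens: it reaches about 63% of the source value after one time constant and more than 99% after five, which is the bounded, nonlinearly increasing shape adopted for the step size.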

To ensure that the step size is proportional to the fitness of a given individual nest within the search space in the current generation, the nonlinear modulation index is formulated in equation (18).
In equation (18), the nonlinear modulation index for the $i$-th nest in generation $gen$ is computed from the following fitness values, all taken in generation $gen$: the fitness value of the alpha ($\alpha$) nest (overall best nest), the fitness value of the beta ($\beta$) nest (2nd best nest), the fitness value of the delta ($\delta$) nest (3rd best nest), the fitness value of the $i$-th nest itself, and the fitness value of the worst nest among the remaining omega ($\omega$) nests (i.e., nests whose fitness values are not featured among the top three fitness values).

Thus, equation (17) is further modified into equation (19), which gives the step size for the $i$-th nest in generation $gen$. From equation (19), the step size increases nonlinearly from relatively small values to values close to $step_{Max}$. The reasons for proposing a nonlinearly increasing strategy are as follows. First, at the early stages of the proposed EACSIDGWO algorithm, of which the ACS is a component, the population has a higher diversity. A higher diversity implies a stronger ability to explore the global space. Our aim at this point is to accelerate convergence. Therefore, the step size is set to a smaller value.

It is important to point out that the anticipated accelerated convergence is a joint effort attained by setting the step size of the ACS to a small value at the early stages and utilizing the IDGWO (whose details are presented in Section 5.2), whose core task is exploitation.

On the other hand, since the proposed EACSIDGWO algorithm is a hybrid algorithm where the ACS cooperatively works with the IDGWO, all the nests will be attracted to the global optimum, i.e., the alpha (*α*) nest, at the later stages. This would compel them to converge prematurely without being given enough room to explore the search space. A larger step size leads the nests away from a local optimum and encourages diversification. For this reason, the step size is set to a larger value at the later stages, i.e., close to $step_{Max}$. In this paper, $step_{Max}$ is set to 1.

In other words, our main reason for proposing a nonlinearly increasing step size is that its small values at the initial stages of the proposed EACSIDGWO algorithm facilitate “local exploitation” while its larger values in the later stages facilitate “global exploration”.

The ACS can then be modeled as presented in equation (20).

Equation (20) is a formulation of the new search space for the ACS from the current solution.

Moreover, if this step size is considered proportional to the global best solution, then equation (20) can be formulated as given in equation (21), where the global best solution is taken among all the host bird nests at generation $gen$.

Thus, from equations (17)–(21), it is evident that the diversification ability of the ACS is heightened as the number of generations ($gen$) approaches the maximum number of generations ($gen_{Max}$). This is because the value of the step size rapidly increases towards the set maximum value of the step size ($step_{Max}$).

##### 5.2. Intensification Dedicated Grey Wolf Optimizer (IDGWO)

###### 5.2.1. Nonlinearly Controlling Parameter via the Complete Current Response of the DC Excited RC Circuit

It is evident from Section 4.4 that parameter $a$ plays a critical role in balancing the diversification (exploration) and the intensification (exploitation) of a search agent.

A large value of control parameter $a$ facilitates diversification while a smaller value of this parameter facilitates intensification. Thus, a suitable selection of the control parameter $a$ can enhance a good balance between global diversification (exploration) and local intensification (exploitation).

In the original GWO (described in Section 4), the value of $a$ linearly decreases from 2 to 0 (refer to equation (14)). However, the search process of the GWO algorithm is both nonlinear and complicated, which cannot be truly reflected by the linear control strategy of $a$ presented in equation (14).

In addition, Mittal et al. [60] proposed that an attractive performance can be attained if parameter $a$ is nonlinearly decreased rather than decreased linearly.

Inspired by the complete current response of a direct current (DC) excited RC circuit, which decays with time, a novel nonlinear adjustment mechanism of control parameter $a$ is formulated in this paper.

The complete current response of the RC circuit to a sudden application of a DC voltage source, with the assumption that the capacitor is initially not charged, is given in (22):

$$i(t) = \frac{V_s}{R}\, e^{-t/\tau} \tag{22}$$

As $t \to \infty$, the component $e^{-t/\tau} \to 0$, forcing $i(t) \to 0$. We adopt this concept, i.e., the exponential decay of $i(t)$, to formulate a novel improved strategy, i.e., equation (23), to generate the values for control parameter $a$. In equation (23), $gen$ is the current generation (iteration) and $gen_{Max}$ is the maximum number of generations (iterations); the equation also uses the initial higher value of parameter $a$ and the nonlinear modulation index described earlier by equation (18).
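The decaying counterpart of the voltage response can be checked in the same way. The sketch evaluates the textbook charging current of an initially uncharged RC circuit; the component values and the function name `rc_current` are assumptions chosen only for illustration.

```python
import math

def rc_current(t, v_s, r, c):
    """Charging current at time t after a DC source v_s is applied (no initial charge)."""
    tau = r * c  # time constant
    return (v_s / r) * math.exp(-t / tau)

R, C, V_S = 1e3, 1e-6, 5.0  # assumed example values
tau = R * C
for k in (0, 1, 3, 5):
    print(k, rc_current(k * tau, V_S, R, C))
```

The current starts at its largest value and decays towards zero, dropping to about 37% of the initial value after one time constant. This is the bounded, nonlinearly decreasing shape adopted for the control parameter.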

Consequently, vector $\vec{A}$ is computed as given in equation (24).

Equation (23) is a nonlinearly decreasing control strategy for $a$ whose initial upper limit is equal to the initial higher value of $a$ while its final lower limit is zero.

From the original literature of GWO, the value $|\vec{A}| < 1$ compels the grey wolves to move towards the prey (exploitation) while $|\vec{A}| > 1$ compels them to move away from the prey in search of a fitter prey (exploration). Thus, setting the initial upper value of $a$ to 1 will always force the wolves to move towards the prey, which enables us to dedicate the modified GWO algorithm, a component of the proposed EACSIDGWO, to intensification.

###### 5.2.2. Enhanced Mature Convergence via a Fitness Value-Based Position-Updating Criterion

Both diversification and intensification are crucial for population-based optimization algorithms [60]. However, from the detailed account of the conventional GWO (refer to Section 3), it is evident that all the other wolves are attracted towards the three leaders *α*, *β*, and *δ*; a scenario that will force the algorithm to converge prematurely without attaining sufficient diversification of the search space. In other words, the conventional GWO is prone to premature convergence.

In reference to the position-updating criterion of GWO described by equation (11), a new candidate individual is obtained by moving the old individual towards the best leader (*α*), the second-best leader (*β*), and the third-best leader (*δ*). This approach forces all the other grey wolves to crowd into a reduced section of the search space, which might differ from the optimal region, without giving them leeway to escape from it. To overcome this major drawback, a scheme that promotes mature convergence is devised in this paper.

Instead of averaging the values of vectors $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ (a form of recombining them) as the mechanism for updating the wolves’ positions (refer to equation (11)), in this paper, we make full use of their fitness values as the criterion for arriving at new positions for the wolves.

Foremost, the search agents of the populations $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ are computed for every search agent, up to the population size, and for every dimension, up to the dimension of the search space.

Next, the fitness value of each search agent in each of the derived populations, i.e., $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$, is evaluated. Further, a new population with the fittest values is derived from these three populations.

Equations (28) and (29) represent the process undertaken to derive this new population: for each search agent, the fitness values of the candidate vectors computed during the current iteration are compared, and the fittest candidate is retained.
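
The selection step can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact equations: the candidates use the standard GWO form $X_k = X_{leader} - A\,|C\,X_{leader} - X|$, fitness is minimized on a toy sphere function, and the names `candidate`, `update_population`, and `sphere` are hypothetical.

```python
import random


def candidate(leader, wolf, a, rng):
    """One standard GWO candidate, computed dimension by dimension."""
    out = []
    for l, x in zip(leader, wolf):
        A = 2.0 * a * rng.random() - a   # step coefficient
        C = 2.0 * rng.random()           # leader weighting
        out.append(l - A * abs(C * l - x))
    return out


def update_population(pop, fitness, a, rng):
    """Replace each wolf by the fittest of its three leader-guided candidates."""
    ranked = sorted(pop, key=fitness)            # assumption: lower is better
    alpha, beta, delta = ranked[0], ranked[1], ranked[2]
    new_pop = []
    for wolf in pop:
        cands = [candidate(leader, wolf, a, rng)
                 for leader in (alpha, beta, delta)]
        # Keep the fittest candidate instead of averaging the three.
        new_pop.append(min(cands, key=fitness))
    return new_pop


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


rng = random.Random(42)
pop = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(10)]
new_pop = update_population(pop, sphere, a=0.5, rng=rng)
```

Selecting by fitness rather than averaging means a wolf is not dragged towards the centroid of the three leaders when one of the candidates is clearly better than the others.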

##### 5.3. Proposed EACSIDGWO (Continuous Version)

We cooperatively combined the proposed adaptive cuckoo search (ACS) and the intensification-dedicated grey wolf optimization (IDGWO) to develop the EACSIDGWO. In the EACSIDGWO algorithm, the ACS is actively involved in intensification (exploitation) during the early stages, when the population has higher diversity, and in diversification (exploration) at later stages. On the other hand, the IDGWO is involved only in intensification at all stages of the proposed algorithm. By doing so, an effective balance between diversification and intensification is achieved. In addition, mature convergence is enhanced, which in the end leads to high-quality solutions.

#### 6. Proposed EACSIDGWO (Binary Version)

Selection of features is binary by nature [61]. Therefore, the proposed EACSIDGWO algorithm cannot be utilized in selection of features without further modifications.

In the proposed EACSIDGWO algorithm, the new positions of the search agents will have continuous solutions, which must be converted into corresponding binary values.

In this paper, this conversion is achieved by first squashing the continuous solutions in each dimension using a sigmoid (S-shaped) transfer function [61]. This compels the search agents to move in a binary search space, as depicted by equation (31), whose input is the continuous-valued position of a search agent in a given dimension during a given generation.

The output of the sigmoid transfer function is still a continuous value, and thus, it has to be thresholded to obtain a binary value. Normally, the sigmoid function smoothly maps an unbounded input to a finite output [61]. To arrive at the binary solution when a sigmoid function is used, the commonly used stochastic threshold is applied as presented in equation (32): the binary updated position in a given dimension at a given generation is obtained by comparing the sigmoid output with a random number drawn from a uniform distribution over $[0, 1]$. Applying this in every dimension yields the equivalent binary vector of the search agent at that generation.

Using this approach, the original solutions remain in the continuous domain of the proposed EACSIDGWO algorithm and can be converted to binary when the need arises.
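
A minimal sketch of this two-step conversion is given below. It assumes the plain logistic function as the S-shaped transfer function and the common rule that a bit is set to 1 when the random number falls below the sigmoid output; the paper's equations (31) and (32) may use a different parameterization. The names `sigmoid` and `binarize` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Plain logistic transfer function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Stochastic threshold: bit is 1 when a uniform draw is below sigmoid(x)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


rng = random.Random(7)
continuous = [-50.0, -0.3, 0.0, 1.2, 50.0]   # toy continuous position
mask = binarize(continuous, rng)              # 1 = feature selected
```

Strongly negative coordinates are almost never selected and strongly positive ones almost always are, while coordinates near zero are selected roughly half of the time, which preserves some randomness in the binary search.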

The pseudocode of the binary version of the proposed EACSIDGWO algorithm is presented in Algorithm 3.

Algorithm 3. Pseudocode for the EACSIDGWO (Binary Version).

#### 7. Experimental Methodology

In this section, detailed accounts of the biomedical datasets, evaluation metrics, proposed fitness function, and the parameter setting for the considered metaheuristic algorithms are outlined.

##### 7.1. Considered Biomedical Datasets

To validate the performance of the considered metaheuristic algorithms, six benchmark biomedical datasets extracted from the University of California at Irvine (UCI) Machine Learning Repository [62] were utilized. Each dataset has two classes, and the performance of each of these algorithms is evaluated based on its ability to classify these classes correctly. Details of these datasets are given in Table 1.

##### 7.2. Evaluation Metrics

For the considered feature selection problem, the following evaluation metrics were utilized to compare the performance of each considered feature selection technique.

*Average Accuracy (Avg_Acc).* It is one of the most commonly used classification metrics and represents the proportion of instances that are correctly classified using a particular feature set. The mathematical formulation of this metric is given in equation (33), which averages the accuracy reported for each fold over the number of folds utilized and over the number of times (runs) a given metaheuristic algorithm is run. The accuracy of a fold is defined in equation (34) as the ratio of correctly predicted samples (TP + TN) to all samples in that fold (TP + TN + FP + FN), where TP and FN denote the number of positive samples in the fold that are accurately and falsely predicted, respectively, and TN and FP represent the number of negative samples in the same fold that are predicted accurately and wrongly, respectively [63].

*Average Feature Length (Avg_NFeat).* This metric characterizes the average number of selected features relative to the total number of features in the dataset. Equation (35) gives its mathematical formulation, in which the quantity averaged is the number of selected features in the testing dataset during each run.

*Minimum Accuracy (Min_Acc).* It is the least value of accuracy reported across all runs. Equation (36) depicts its formulation, where the accuracy of each run is the average cross-validation accuracy given by equation (37).

*Maximum Accuracy (Max_Acc).* It is the largest value of accuracy reported across all runs. Its mathematical formulation is given by equation (38).

*Maximum Features Selected (Max_NFeat).* It is the largest number of selected features across all runs. Equation (39) gives its mathematical formulation.

*Minimum Features Selected (Min_NFeat).* It is the least number of selected features across all runs. Equation (40) gives its mathematical formulation.
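
The six metrics above can be computed with a few lines of code. The sketch below is illustrative only: the function names `fold_accuracy` and `summarize` are hypothetical, the inputs are toy values rather than results from the paper, and `Avg_NFeat` is returned as a raw average count (dividing it by the total number of features gives the relative length).

```python
def fold_accuracy(tp, tn, fp, fn):
    """Accuracy of one fold from its confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


def summarize(run_fold_acc, run_nfeat):
    """Aggregate per-fold accuracies and per-run feature counts over all runs."""
    # Average cross-validation accuracy of each run.
    run_acc = [sum(folds) / len(folds) for folds in run_fold_acc]
    return {
        "Avg_Acc": sum(run_acc) / len(run_acc),
        "Min_Acc": min(run_acc),
        "Max_Acc": max(run_acc),
        "Avg_NFeat": sum(run_nfeat) / len(run_nfeat),
        "Min_NFeat": min(run_nfeat),
        "Max_NFeat": max(run_nfeat),
    }


# Toy example: 2 runs, 2 folds per run, and the feature count of each run.
stats = summarize([[0.5, 1.0], [1.0, 1.0]], [3, 5])
acc = fold_accuracy(tp=40, tn=45, fp=5, fn=10)
```

With an equal number of folds in every run, averaging the per-run means is the same as averaging over all folds and runs at once.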

##### 7.3. Evaluation of the Classifier Performance

Since the support vector machine (SVM) classifier has already made immense contributions in the field of microarray-based cancer classification [63], it was adopted in this paper to evaluate the classification accuracy using the selected subset of features returned by the various considered metaheuristic feature selection approaches. The MATLAB `fitcsvm` function, which trains and cross-validates an SVM model, was adopted in this paper. We set the kernel scale parameter to “auto” to allow the function to select an appropriate scale factor using a heuristic procedure.

With the SVM classifier, the data items are mapped to points in a feature space whose dimension equals the number of features, and each feature’s value is the value of the corresponding coordinate. The final output of this classifier is an optimal hyperplane which can be used to classify new cases [17, 63].

However, the performance of the SVM classifier is highly dependent on the selection of its kernel function [17, 63]. For this reason, experiments were conducted using various kernels in this paper.

The choice of a suitable kernel is both dataset and problem specific and is made experimentally [17, 63]. Based on the conducted experiments, suitable kernel functions were selected for the considered datasets. The considered datasets and their suitable kernel functions are presented in Table 2.

More information on selecting suitable SVM kernel functions is presented in [63].

##### 7.4. Fitness Function

The main aim of a feature selection exercise is to discover a subset of features from the whole set of existing features in a given dataset such that the considered optimization algorithm is able to achieve the highest possible accuracy using that subset. For instance, in datasets with many features (attributes), the objective is to minimize the number of selected features while improving the classification accuracy of the feature selection approach.

In classification tasks, there is a high chance that two feature subsets containing a different number of features will have the same accuracy [17]. However, if a subset with a large number of features is discovered earlier by a given optimization algorithm, it is likely that the one with fewer features will be ignored [17].

To overcome this challenge, a fitness function proposed in [17] to evaluate the classification performance of optimization algorithms for feature selection tasks is adopted. This fitness function is given in equation (41). It combines the total number of features within a given dataset, the number of selected features during a given run, and the average cross-validation accuracy reported during that run (refer to equation (37)), using two weights that correspond to the significance of the classification quality and the subset length, respectively. In this paper, the classification-quality weight is set to 0.8, and the subset-length weight is as adopted from [17].

It is important to point out that both terms are normalized by division by their largest possible values; i.e., the number of selected features is divided by the total number of features, and the average accuracy is divided by the value 1.
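
The effect of such a weighted fitness on ties can be illustrated with a short sketch. The exact form of the function in [17] is not reproduced here; the code assumes a maximized weighted sum that rewards accuracy and the fraction of features left out, and it assumes a subset-length weight of 0.2 as the complement of 0.8. The name `fitness` and all argument names are hypothetical.

```python
def fitness(avg_acc, n_selected, n_total, w_acc=0.8, w_len=0.2):
    """Assumed weighted fitness (higher is better).

    avg_acc is already in [0, 1]; the feature term is normalized by n_total.
    """
    return w_acc * avg_acc + w_len * (n_total - n_selected) / n_total


# Two subsets with the same accuracy but different sizes (toy values).
small = fitness(avg_acc=0.9, n_selected=10, n_total=100)
large = fitness(avg_acc=0.9, n_selected=50, n_total=100)
print(small > large)
```

Under these assumptions the smaller subset receives the higher fitness, so it is no longer ignored when a larger subset with equal accuracy is found first.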

##### 7.5. Parameter Setting for the Considered Feature Selection Techniques

The performance of the proposed EACSIDGWO algorithm was compared to those of extended binary cuckoo search (EBCS), binary ant-colony optimization (BACO), binary genetic algorithm (BGA), and binary particle swarm optimization (BPSO) that were reported earlier in [17].

Table 3 indicates the selected parameter values for both the proposed binary EACSIDGWO algorithm and each of the other algorithms as reported in [17].

To be consistent with the setup proposed in [17], the population size for the proposed EACSIDGWO was set to 30. Then, the algorithm was run 10 times to perform the feature selection task for each considered dataset. In addition, each run terminated when 10000 fitness function evaluations were reached. This approach allowed the proposed algorithm to utilize the fitness function the same number of times as the other algorithms.

In this paper, all the experiments were conducted using MATLAB 2017 running on the Windows 10 operating system on an HP desktop with an Intel® Core™ i7-3770 CPU @ 3.4 GHz and 12.0 GB of RAM.

#### 8. Results and Discussion

To examine the diversification and intensification of the proposed EACSIDGWO, a detailed comparative study is presented in this section.

The efficiency and the optimization performance of the proposed algorithm have been verified by comparing and analyzing its results with those of four other state-of-the-art optimization algorithms.

The experimental classification results have been probed through statistical tests, comparative analysis, and ranking methods.

Tables 4–9 provide the performance of all the considered optimization approaches for feature selection using the datasets described in Section 7.1. It is important to point out that the best result achieved in each column for all the considered biomedical datasets is highlighted in bold while the worst is italicized.

To prove that the proposed EACSIDGWO is superior to the other four optimization algorithms, the Wilcoxon rank-sum test, i.e., a nonparametric statistical test, is also performed. The statistical results obtained from the pairwise comparisons of the four groups are tabulated in Table 10. Tables 11 and 12 present a comparison of the overall ranking of the results obtained by the considered algorithms.

##### 8.1. Discussion

###### 8.1.1. Investigation of the Obtained Classification Results

From Tables 4–9, the following observations can be made.

(i) The proposed EACSIDGWO algorithm outperformed all the other considered algorithms in terms of classification accuracy for all the utilized datasets. It recorded the highest classification accuracy on the three highly dimensioned datasets (i.e., Ovarian, CNS, and Colon) as well as the remaining three small sample-sized datasets. This promising performance is largely attributed to the cooperative exploitation conducted by the ACS and IDGWO components of the proposed algorithm during the early generations, as well as the single-handed exploitation and exploration by IDGWO and ACS, respectively, at later generations.

(ii) For four datasets, i.e., Ovarian, Heart, CNS, and Colon, the proposed algorithm attained accuracy values that are larger than those attained by the EBCS. EBCS is a variant of cuckoo search, which is a component of the proposed EACSIDGWO algorithm. This superior performance demonstrates the competency of the proposed approach in efficiently determining the optima within the search space.

(iii) With regard to the average feature length (Avg_NFeat), the proposed binary EACSIDGWO algorithm demonstrated a superior performance by selecting the least number of features compared to the other algorithms. According to the results reported in Tables 4–9, the proposed algorithm performed better on all the considered datasets.

In comparison with the original number of features in the considered datasets, there is a notable reduction in the number of features selected by the proposed approach. For instance, the actual number of features in the Ovarian cancer, CNS, and Colon cancer datasets is 4000, 7129, and 2000, respectively, whereas the average number of features selected by the proposed EACSIDGWO is 274.8, 1208.1, and 538.5, respectively. This clearly indicates that the proposed algorithm is able to reduce the number of features as well as locate the most significant feature subsets. The strength of the proposed EACSIDGWO lies in its well-formulated algorithm (refer to Section 5), which enhances both its diversification and intensification capabilities and thereby enables it to eliminate redundant (noninformative) attributes and then actively search within the high-performance regions of the feature space.

###### 8.1.2. Statistical Analysis

The superiority of the proposed EACSIDGWO algorithm has been verified via the Wilcoxon rank-sum test, i.e., a nonparametric test, at a significance level of 5%. The results obtained for the pairwise comparisons of the four groups are presented in Table 10. Observations from Table 10 reveal the statistical significance of the obtained experimental results for all the considered datasets. This indicates that the proposed approach has an attractive performance in relation to the other four approaches. Thus, the results of our algorithm differ significantly from those of the four algorithms for all the considered datasets.
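
For readers unfamiliar with the test, a pairwise comparison can be sketched as below. This is a simplified illustration: it uses the normal approximation without a tie correction in the variance, whereas statistical packages may use an exact method for small samples. The function name `rank_sum_test` is hypothetical and the inputs are toy values, not results from the paper.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values and assign the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return w, z, p


# Toy samples of 10 runs each: every value in a exceeds every value in b.
a = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
w, z, p = rank_sum_test(a, b)
significant = p < 0.05
```

A p value below the 5% significance level leads to rejecting the hypothesis that the two samples come from distributions with equal medians.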

###### 8.1.3. Ranking Methods

Tables 11 and 12 outline the detailed ranking of all the considered algorithms with their respective comparative analysis. The ranking is based on maximum accuracy (Max_Acc), minimum accuracy (Min_Acc), average accuracy (Avg_Acc), maximum number of selected features (Max_NFeat), minimum number of selected features (Min_NFeat), and average number of selected features (Avg_NFeat). From the ranking, it is evident that the proposed EACSIDGWO algorithm obtained the best values in all these measures for all the datasets. Considering the final ranks, the proposed algorithm attained an attractive performance whose overall rank value is 37. This clearly reveals the superiority of the EACSIDGWO algorithm in relation to the four state-of-the-art algorithms.

#### 9. Conclusion

This paper proposed a new hybrid Excited- (E-) Adaptive Cuckoo Search- (ACS-) Intensification Dedicated Grey Wolf Optimizer (IDGWO), i.e., the EACSIDGWO algorithm, to solve the feature selection problem in biomedical science. In the proposed algorithm, the concept of the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit is innovatively utilized to make the step size of ACS and the nonlinear control strategy of parameter $a$ of the IDGWO adaptive. Since the population has a higher diversity during the early stages of the proposed algorithm, both the ACS and IDGWO are jointly utilized to attain accelerated convergence. However, to enhance mature convergence while striking an effective balance between exploitation and exploration in later stages, the role of ACS is switched to global exploration while the IDGWO is still left conducting the local exploitation. To test the efficiency of the proposed EACSIDGWO as a feature selector, six standard biomedical datasets from the University of California at Irvine (UCI) repository were utilized. The experimental results indicate that the proposed algorithm is superior to the state-of-the-art feature selection techniques, i.e., BACO, BGA, BPSO, and EBCSA, in attaining good learning from fewer instances and optimal feature selection from information-rich biomedical data, all while maintaining a high classification accuracy on the utilized data. In the future, utilizing this hybrid algorithm as a filter-based feature selection approach to evaluate the generality of the selected features will be a valuable contribution.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no competing interests regarding the publication of this paper.

#### Acknowledgments

This work is fully supported by the African Development Bank (AfDB), through the Ministry of Education, Kenya Support for Capacity Building.