Abstract

High-dimensional data analysis benefits numerous fields, including the biological and medical domains that underpin computer-aided disease diagnosis and prediction systems. However, a large number of redundant or irrelevant features can adversely affect system accuracy and real-time diagnosis efficiency. To mitigate this issue, this paper proposes two novel wrapper feature selection (FS) methods that integrate the ant colony optimization (ACO) algorithm and hybrid rice optimization (HRO). HRO is a recently developed metaheuristic that mimics the breeding process of three-line hybrid rice and has yet to be thoroughly explored in the context of high-dimensional FS problems. In the first hybridization, ACO is embedded as an evolutionary operator within HRO and updated alternately with it. In the second, two subpopulations evolve independently and share their local search results to assist individual updating. In an initial stage preceding hybridization, a problem-oriented heuristic factor assignment strategy based on the importance of the knee point feature is introduced to enhance the global search capability of ACO in identifying the smallest, most representative feature subset. The performance of the proposed algorithms is evaluated on fourteen high-dimensional biomedical datasets and compared with other recently advanced FS methods. Experimental results suggest that the proposed methods are efficient and computationally robust, outperforming the other algorithms involved in this study.

1. Introduction

High-dimensional data are increasingly prevalent in diverse fields such as medical diagnosis and genomics, presenting significant challenges to the construction of intelligent systems. Data from medical imaging generally comprise a variety of features that delineate patient conditions. Similarly, microarray data from high-throughput gene expression experiments concurrently measure the expression levels of thousands of genes [1]. Such high dimensionality creates difficulties in terms of storage, computation, and analysis [2]. Specifically, the abundance of features may lead to the “curse of dimensionality,” where data sparsity occurs and the distance between sample points loses significance. This situation, in turn, can lead to the overfitting of data mining models and increased computation time. Therefore, selecting representative features that positively impact intelligent systems becomes critical. Feature selection (FS) is employed to minimize the feature space, thereby not only conserving storage space but also facilitating information discovery and mitigating the potential for model overfitting.
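The loss of distance significance mentioned above can be observed directly. The sketch below is an illustration only (the dataset sizes, dimensions, and seed are arbitrary choices, not taken from the paper): it measures the relative contrast between the nearest and farthest neighbour of a query point in uniformly random data, which collapses as the dimension grows.

```python
import numpy as np

def distance_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Relative contrast (d_max - d_min) / d_min of the distances from a
    random query point to n_points uniform points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    query = rng.random(dim)
    points = rng.random((n_points, dim))
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

# As the dimension grows, nearest and farthest neighbours become almost
# equidistant, so distance-based reasoning loses discriminative power.
for d in (2, 20, 2000):
    print(f"dim={d:4d}  contrast={distance_contrast(500, d):.3f}")
```

In low dimensions the nearest neighbour is far closer than the farthest one; in thousands of dimensions the contrast shrinks toward zero, which is the sparsity effect that motivates FS.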

FS techniques directly identify the optimal feature subset (OFS) from the original feature space, preserving the interpretability of the chosen features and enhancing the comprehension of underlying data patterns and relationships. The broad applicability of FS spans areas such as text classification [3], image processing [4, 5], finance [6, 7], and other fields. FS is NP-hard by nature: its search space grows exponentially with the number of features, making traditional search methods unable to find the OFS within polynomial time. Existing FS methods primarily fall into three categories, each distinguished by its selection process: filter, wrapper, and embedded. The filter method independently evaluates the relevance and redundancy of each feature using statistical or ranking measures, irrespective of the particular classifier employed in subsequent steps. The wrapper method integrates a specific learning algorithm within the FS process, treating FS as a search problem and seeking the OFS by assessing the performance of each feature subset on the classifier. Although wrappers entail a higher computational cost than highly efficient filter methods, they can handle nonlinear feature interactions and dependencies. Furthermore, wrappers can determine the OFS for a specific learning algorithm, potentially enhancing predictive performance [8]. The embedded method incorporates the FS process within the training process of the learning algorithm. The superior performance of the wrapper method has attracted increasing research interest, marking it as a popular FS technique. To tackle the computational intensity of wrapper methods, researchers are investigating the use of metaheuristics in FS to boost search efficiency and OFS quality.
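To make the wrapper idea concrete, the sketch below scores a candidate feature subset the way a wrapper does: by running an actual classifier on exactly the selected columns. This is a simplified illustration, not the paper's pipeline; the leave-one-out 1-NN classifier and the synthetic data are assumed stand-ins.

```python
import numpy as np

def knn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy restricted to the features where
    mask == 1. This is the 'wrapper' evaluation: the subset is scored
    by a classifier rather than by a filter statistic."""
    Xs = X[:, mask.astype(bool)]
    if Xs.shape[1] == 0:
        return 0.0
    d = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    return float(np.mean(y[d.argmin(axis=1)] == y))

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 80)
X = rng.normal(size=(80, 10))
X[:, 0] += 3 * y                            # feature 0 is informative, rest are noise

informative = knn_accuracy(X, y, np.eye(10, dtype=int)[0])  # subset {feature 0}
noisy = knn_accuracy(X, y, np.ones(10, dtype=int))          # all 10 features
```

On this toy data, evaluating the single informative feature already yields high accuracy, illustrating why a wrapper can prefer a small subset over the full feature set.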

Metaheuristics have surfaced as promising alternatives for FS in high-dimensional data, efficiently balancing exploration and exploitation in the search space to converge towards near-optimal solutions. Various high-performance metaheuristics have been successfully employed across diverse FS tasks [9, 10]. For instance, Jia et al. [11] enhanced the slime mould algorithm (SMA) [12] for FS, wherein the FS process and the parameter optimization of SVM occur concurrently. To improve population diversity, a composite mutation strategy was introduced, and a trial-based restart strategy was designed to circumvent the local optima. Qu et al. [13] proposed a novel gene selection method based on Harris hawk optimization (HHO). The F-score is initially employed for preliminary feature filtering to condense the feature space, followed by a variable neighborhood learning strategy for balancing the exploration and exploitation of HHO. Ghosh et al. [14] studied eight different transfer functions from two series (S-shaped and V-shaped) and suggested an improved binary manta ray foraging optimization (MRFO) [15] for FS. Ewees et al. [16] modified the seagull optimization algorithm (SOA) using the Levy flight mechanism to overcome its linear search deficiency in the search space, expanding the search area and enhancing the capability of individuals to escape from local optima. To select the optimal gene combination effectively in microarray data, Pashaei [17] utilized mRMR to filter the top promising genes in the initial stage to reduce the feature space and then applied the Aquila Optimizer (AO) with a mutation mechanism and TVMS transfer function to search the OFS. Awadallah et al. [18] developed a new enhanced binary rat swarm optimizer (RSO), in which the local historical optimal strategy of PSO was introduced to enhance individual exploitation. 
Furthermore, three crossover operators, namely, one-point crossover, two-point crossover, and uniform crossover, are incorporated into the individual update process and randomly selected with equal probability. A multi-strategy integrated grey wolf optimizer (MSGWO) was presented for biological data classification [19]. It adopted the convergence factor concept to adjust the transition between exploration and exploitation explicitly. Additionally, multiple exploration and exploitation strategies were employed to boost the global search and local exploitation processes.

The no-free lunch (NFL) theorem suggests that no single algorithm can resolve all optimization problems, since no algorithm strikes a perfect balance between global search and local exploitation. Consequently, recent research has increasingly focused on hybrid metaheuristics to address the limitations of individual algorithms. For instance, Pashaei [2] introduced a hybrid FS method that integrates the dragonfly algorithm (DFA) and the black hole algorithm (BHA), in which the optimal solution derived from DFA serves as the initial solution for BHA. A hybrid FS method merging the binary arithmetic optimization algorithm (BAOA) and simulated annealing (SA) was proposed in [20], utilizing SA as a local search operator to discover potential solutions near the optimal solution of BAOA. Stephan et al. [21] put forward a method combining the artificial bee colony (ABC) and whale optimization algorithm (WOA) to concurrently search the OFS of breast cancer data and optimize the artificial neural network parameters. In addition, several studies have attempted to integrate HHO with other metaheuristic algorithms [22–24]. Beyond metaheuristic hybrids, the combination of filters and wrappers has also been extensively researched, aiming to determine the OFS at a lower computational cost. Zhu et al. [25] combined the Fisher filtering method with an artificial immune optimizer for high-dimensional FS, introducing a lethal mutation mechanism and a Cauchy mutation operator with an adaptive adjustment factor to enhance population diversity. A new high-dimensional FS method in conjunction with mRMR was developed in [26]. This method improved the recently proposed COOT algorithm using a crossover operator and employed the hyperbolic tangent transfer function for continuous numerical binarization. The improved COOT was hybridized with SA and applied to FS of microarray data.
While these studies have demonstrated the performance of metaheuristic-based FS methods, the risk of local optima entrapment persists due to the stochastic strategies inherent in metaheuristics that ensure the global search of the algorithm. Therefore, the development of a computationally efficient and robust high-dimensional FS method remains a crucial and ongoing research endeavor.

Hybrid rice optimization (HRO) algorithm, proposed by Ye et al. [27], is a novel population-based metaheuristic inspired by the real-world breeding process of three-line hybrid rice. The heterosis theory suggests that the first generation of hybrids manifests superior physical, reproductive, and behavioral characteristics compared to their parent generation. Echoing this concept, HRO has exhibited several desirable properties, including high search efficiency, fewer control parameters compared to other metaheuristics, and ease of implementation. Its high flexibility and reduced parameter dependency have encouraged researchers to apply it to the 0-1 knapsack problem [28] as well as the FS problem [29].

While most metaheuristics require conversion to binary form via a specific transfer function to indicate whether a feature is selected, this approach may not be appropriate for all situations, as these algorithms were initially designed to manage continuous optimization problems. As a classic swarm intelligence algorithm, ant colony optimization (ACO) is more suitable for FS tasks, given its origin as a combinatorial optimization algorithm. ACO strikes a well-balanced ratio between exploration and exploitation by dynamically adjusting pheromone density. Numerous studies have demonstrated the robustness and adaptability of ACO in resolving FS problems. Wang et al. [30] introduced a novel approach for FS, namely, the probabilistic sequence-based graphical representation ACO, incorporating symmetric uncertainty (SU) into the algorithm. Paniri et al. [31] presented an innovative multi-label FS method based on ACO, which used both unsupervised and supervised heuristic functions to seek features with minimal redundancy and maximal correlation with class labels. Moreover, a recent study [32] put forth a semisupervised FS method based on ACO that employs a nonlinear heuristic function trained using temporal difference (TD) reinforcement learning instead of the traditional linear heuristic functions. Although the standard ACO and HRO have demonstrated commendable optimization performance in low-dimensional optimization problems, they may struggle to achieve ideal performance when dealing with high-dimensional FS problems. This challenge motivates us to combine the strengths of both algorithms, aiming to create a more robust and efficient method for high-dimensional FS.

This paper proposes a two-stage hybrid technique for solving high-dimensional FS problems, leveraging the strengths of both ACO and HRO. Considering the limitations of standard ACO, the first stage, prior to hybridization, enhances it: a novel problem-oriented heuristic factor assignment strategy based on the importance of the knee point feature is designed to augment the search capability of ACO in high-dimensional FS. In the second stage, two hybrid models merging the improved binary ACO (IBACO) and HRO are presented and applied to high-dimensional FS tasks. The hybridization takes two forms: low-level relay hybrid (LRH) and high-level teamwork hybrid (HTH) [33]. In LRH, IBACO serves as an operator within HRO to guide the evolution of the maintainer line; this hybrid mode is introduced because HRO lacks a suitable maintainer line update strategy, a factor of particular importance for high-dimensional FS tasks. In HTH, subpopulations of HRO and IBACO evolve independently and share their local search results after each iteration. A comprehensive list of acronyms and symbol annotations used throughout this paper is provided in Table 1. The main contributions of the paper are summarized as follows:

(1) Two unique hybridizations of IBACO and HRO, namely, R-IBACO and C-IBACO, are presented to effectively leverage the advantages of the two algorithms and obtain a more promising solution.

(2) A new problem-oriented assignment strategy for the heuristic factor (HF) based on the feature correlation of the knee point feature is proposed.

(3) The proposed methods are applied to high-dimensional FS tasks and compared with existing state-of-the-art metaheuristic-based FS methods.

(4) The performance of the proposed methods is evaluated from multiple perspectives, and the results are subjected to the Wilcoxon signed-rank test and the Friedman test.

The remainder of this paper is structured as follows. Section 2 presents an overview of the related work. Section 3 discusses the mathematical model of FS and provides some theoretical background on BACO and HRO. The specifics of the proposed methods are elucidated in Section 4. Section 5 outlines the experiments conducted and analyzes the obtained results. Finally, Section 6 draws conclusions from this study and indicates potential areas for future research.

2. Related Work

2.1. Metaheuristic-Based FS

In recent years, the application of metaheuristics to FS problems has risen significantly. Notably, many of the proposed wrapper FS methods draw upon GWO due to its remarkable performance in solving continuous optimization problems. In the work of Hu et al. [34], an enhanced variant of the binary GWO was introduced, incorporating a novel strategy for updating the parameter governing exploration and exploitation, along with five transfer functions for mapping continuous values to their binary counterparts, thereby enhancing the quality of candidate solutions. Likewise, Abdel-Basset et al. [35] proposed three distinct binary GWO variants, each utilizing different transfer functions. In addition to GWO-based research, PSO has been a focus of prior studies. For example, Song et al. [36] proposed an improved integer PSO with a fast correlation-guided feature clustering strategy that markedly reduces the computation cost of FS. The gaining-sharing knowledge (GSK) optimization algorithm is a newly devised metaheuristic inspired by the human process of acquiring and sharing knowledge [37]. Its robustness and convergence properties have proven highly effective in solving continuous optimization problems, and several enhanced versions of GSK have been successfully employed for FS tasks. Since GSK was originally designed for continuous optimization, Agrawal et al. [38] implemented eight S-shaped and V-shaped transfer functions to map the individual codes of GSK into the binary search space; in addition, a dynamic population reduction strategy was introduced to facilitate the adjustment of population size during the pursuit of the OFS. Another improved GSK integrating chaotic map strategies was proposed in [39], which utilized a probability estimation operator to represent its binary variant.
Hanbay [40] proposed an innovative standard error-based ABC for FS, incorporating new solution search mechanisms based on standard error into the original ABC algorithm and utilizing the Shannon condition entropy value for FS. Moreover, an advanced salp swarm algorithm (SSA) was also suggested in [41], utilizing opposition-based learning to augment population diversity and implementing a novel local search mechanism to circumvent the local optima.

2.2. ACO-Based FS

ACO has displayed exceptional performance in tackling a wide range of discrete optimization challenges. For instance, Owuor et al. [42] performed a detailed evaluation of three population-based optimization techniques in the context of mining gradual patterns. The findings suggested that ACO surpassed the other two techniques and traditional counterparts. In another study, Zhang et al. [43] introduced a knowledge-based local search for multi-population ant colony systems to address the multi-objective supply chain configuration problem in supply chain management. By leveraging two independent ant colonies, the cost of goods sold and the lead time were concurrently minimized, facilitating an efficient search of the target space. Moreover, Zhao et al. [44] enhanced ACO using horizontal and vertical crossover search and applied it in the field of image segmentation. The superior combinatorial optimization capability of ACO has encouraged researchers to extend its application to various types of FS tasks, some of which have been successfully implemented [45].

The heuristic factor (HF) and the pheromone density (PD) are two crucial parameters in ACO that can significantly influence its performance. However, some FS methods based on ACO have not appropriately set the HF, even generating it randomly. Appropriately configuring the HF and PD parameters can enhance the global and local search capabilities of ACO. Recently, the impact of these parameters on the performance of ACO has been explored. Manbari et al. [46] introduced a circular graph-based ACO algorithm for FS, wherein each feature is interconnected with the subsequent one through a pair of select/deselect edges. The heuristic information pertaining to each feature is determined based on its corresponding term variance. Ghosh et al. [47] proposed an ACO-based wrapper-filter FS method that employs a filter to assess feature subsets instead of a wrapper, significantly reducing the computational complexity. The HF is estimated by calculating the similarity between the last added feature and the feature whose addition probability is being calculated. Additionally, a novel multi-label FS method based on ACO was proposed in [48], which employed a heuristic learning approach instead of a static heuristic function. The heuristic function of ACO is learned from experiences directly using TD reinforcement learning, which significantly improves the quality of the selected feature subset. Moreover, Hashemi et al. [49] proposed an ensemble FS approach based on ACO that utilized multiple heuristic information determined by a multi-criteria decision-making procedure. Such distinctive heuristic information supplies further insights about subsequent nodes. Experimental results demonstrated that the proposed method significantly outperforms other methods across various evaluation indicators.

Despite the proven effectiveness of ACO in solving FS problems, it still encounters issues such as limited global search capability and slow convergence rates, particularly when handling high-dimensional problems. To mitigate these limitations, the concept of hybridizing with complementary metaheuristics has emerged as a promising solution. As detailed by Talbi [33], there exists a two-tiered hybridization system encompassing low-level and high-level hybridization, each comprising two distinct mechanisms of hybridization. Building on this concept, Wan et al. [50] proposed VMBACO, a hybrid FS method that blends a modified binary ACO algorithm with a genetic algorithm (GA). This hybrid method employs the solution derived from GA as the assignment of the HF in BACO, resulting in substantial improvements in both the quality of feature subsets and classifier accuracy. In a similar vein, Li et al. [51] introduced a hybrid FS model that combines ACO with the antlion optimizer, which includes a novel mutation operator to enhance exploration capability. Furthermore, Ma et al. [52] suggested a two-stage hybrid ACO algorithm for high-dimensional FS. This method employs an interval strategy to determine the size of the OFS and integrates a hybrid model that harnesses the inherent relevance attributes of features and classification performance to direct the OFS search. This advanced hybrid ACO assigns the HF to a feature by calculating its correlation with the chosen feature subset after softmax normalization. Despite the improved performance of these hybrid algorithms, they remain susceptible to local optima and often exhibit relatively high computational complexity.

The heterosis-inspired HRO has demonstrated high search efficiency and robust global search capabilities, as illustrated by its successful application to the band selection problem [29]. Moreover, the diverse set of operators employed by HRO facilitates the effective maintenance of population diversity in high-dimensional problem spaces. Considering these benefits of HRO, it is hypothesized that integrating the improved, problem-oriented ACO with HRO may yield a more efficient and robust search for OFS in high-dimensional FS tasks.

3. Preliminaries

3.1. The Binary Coded Ant Colony Optimization Algorithm

Binary ant colony optimization (BACO) is a metaheuristic inspired by ant foraging behavior, designed to find optimal solutions to binary optimization problems. BACO is particularly useful in scenarios such as FS and intrusion detection [53]. BACO commences with a randomly generated set of candidate solutions, encoded as binary strings. Throughout each iteration, the algorithm adjusts the probability of including each feature, drawing on the information acquired from previously constructed solutions. In BACO, the PD serves as a directive for subsequent ants to decide whether a feature should be incorporated into their solution. Figure 1 presents a binary directed acyclic graph symbolizing the potential paths ants can follow to construct a solution. Figure 2 illustrates how the path of an ant is decoded into the corresponding feature subset: five features are considered, the bold black solid line represents the path constructed by the ant, and the bits set to 1 along this path indicate the selected features, while the bits set to 0 indicate the deselected ones.

The node selection for the next bit is determined by the state transition probability, a function of both the PD and HF. The state transition probability is given by equations (1) and (2):

p1_i(t) = [τ_i(1)]^α · [η_i(1)]^β / ([τ_i(0)]^α · [η_i(0)]^β + [τ_i(1)]^α · [η_i(1)]^β) (1)

p0_i(t) = 1 − p1_i(t) (2)

where p1_i(t) denotes the probability that the ant selects path 1 from bit i to bit i + 1, and τ_i(1) and η_i(1) are the corresponding PD and HF values for this path. Similarly, p0_i(t), τ_i(0), and η_i(0) represent the probability, PD, and HF values for the case where the ant selects path 0 from bit i to bit i + 1. The parameters α and β control the relative importance of the PD and HF, respectively. Initially, the PD for each path is set to be equal; it then evaporates on unselected paths as ants make their selections. The update rule for the PD from bit i to bit i + 1 is described by equations (3) and (4):

τ_i(t + 1) = (1 − ρ) · τ_i(t) + Δτ_i(t) (3)

Δτ_i(t) = f(s_b^t) if the path belongs to the iteration-best solution, and 0 otherwise (4)
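The transition rule can be sketched as follows. This is a minimal illustration with assumed array shapes (one (n_bits, 2) table each for PD and HF), not the authors' implementation:

```python
import numpy as np

def transition_prob_one(tau, eta, alpha=1.0, beta=1.0):
    """Probability of choosing path 1 (select the feature) at each bit.
    tau, eta: arrays of shape (n_bits, 2) holding the pheromone density
    and heuristic factor of path 0 and path 1 for every bit."""
    weight = tau ** alpha * eta ** beta
    return weight[:, 1] / weight.sum(axis=1)

def construct_solution(tau, eta, rng, alpha=1.0, beta=1.0):
    """One ant walks the binary chain, sampling path 0/1 at each bit."""
    p1 = transition_prob_one(tau, eta, alpha, beta)
    return (rng.random(p1.shape[0]) < p1).astype(int)

rng = np.random.default_rng(0)
tau = np.ones((5, 2))       # uniform initial pheromone
eta = np.ones((5, 2))
eta[2, 1] = 9.0             # bias bit 2 toward selection: p1 = 9 / (1 + 9) = 0.9
print(transition_prob_one(tau, eta))
```

With uniform pheromone, the heuristic factor alone steers the ant, which is why the HF assignment strategy discussed later matters so much.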

The evaporation factor, denoted as ρ, signifies the rate of pheromone decay on unselected paths. Conversely, the chosen path experiences an increase in PD described by Δτ, which is defined by the fitness value of the optimal candidate solution, s_b^t, procured in the current iteration. BACO also maintains a record of the optimal solution found to date, denoted as s_gb. Additionally, each iteration records its own optimal solution, s_b^t, whose fitness value is compared against that of s_gb. If the fitness value of s_b^t surpasses that of s_gb, then s_gb is updated with s_b^t at iteration t. Upon the completion of all iterations, s_gb serves as the global optimal solution.
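A minimal sketch of this update step, under the common assumption that the deposited amount is proportional to the iteration-best fitness (the paper's exact Δτ definition may differ):

```python
import numpy as np

def update_pheromone(tau, best_solution, best_fitness, rho=0.1):
    """Evaporate all paths, then deposit extra pheromone along the path
    taken by the iteration-best ant, proportional to its fitness.
    tau: (n_bits, 2) table; best_solution: 0/1 vector of chosen paths."""
    tau = (1.0 - rho) * tau                 # evaporation on every edge
    idx = np.arange(tau.shape[0])
    tau[idx, best_solution] += best_fitness  # delta-tau on the chosen edges
    return tau

tau = update_pheromone(np.ones((4, 2)), np.array([1, 0, 1, 1]), 0.8)
```

Edges used by good solutions accumulate pheromone while unused edges fade, which is the feedback loop that balances exploration and exploitation.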

BACO is limited by the absence of a suitable HF for the FS task, especially when the dimensionality is high. As a result, BACO tends to converge to a local rather than a global optimal solution. To fully exploit the performance of BACO in FS, it is essential to introduce an HF assignment strategy tailored to the FS task and to improve the relevant update strategy. The improved strategies are presented in detail in Section 4.2.

3.2. Hybrid Rice Optimization Algorithm

Hybrid rice optimization (HRO), proposed by Ye et al. [27], is an innovative metaheuristic that draws upon the advantages of heterosis, offering superior search efficiency and robust global search capabilities. HRO divides the population, sorted by fitness value, into three lines according to the population size. The first subpopulation, comprising the individuals with the best fitness in the entire population, is referred to as the maintainer line. The second subpopulation possesses the poorest fitness values and constitutes the sterile line, which requires hybridization with the maintainer line to enhance its fitness quality. The remaining subpopulation is the restorer line, which aims to evolve into the maintainer line through a process referred to as selfing. The principle of HRO is elaborated in the following sections.

3.2.1. Hybridization

The hybridization process involves crossing the maintainer line and sterile line, which have the greatest difference in fitness value. This process aims to update the sterile line by substituting the original individual with a newly generated hybrid individual if it displays superior fitness. The new individual within the sterile line is obtained by gene-wise recombination (equation (5)): each gene of the hybrid individual at iteration t + 1 is formed by combining the corresponding gene of a randomly selected individual from the sterile line with that of a randomly selected individual from the maintainer line, weighted by a random number drawn from [0, 1].
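The crossing step can be sketched as follows. The gene-wise random convex combination below is an assumed stand-in for the paper's hybridization equation (5), chosen because it matches the description above (sterile gene, maintainer gene, random weight in [0, 1]):

```python
import numpy as np

def hybridize(sterile, maintainer, rng):
    """New sterile-line candidate from a random sterile/maintainer pair.
    Each row of `sterile` / `maintainer` is one individual (gene vector)."""
    s = sterile[rng.integers(len(sterile))]       # random sterile parent
    m = maintainer[rng.integers(len(maintainer))] # random maintainer parent
    r = rng.random(s.shape)                       # one random weight per gene
    return r * m + (1.0 - r) * s                  # assumed recombination rule

rng = np.random.default_rng(0)
maintainer = rng.random((3, 6))
sterile = rng.random((4, 6))
child = hybridize(sterile, maintainer, rng)
```

The child replaces the original sterile individual only if its fitness is better, as the substitution rule in equation (7) prescribes.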

3.2.2. Selfing

The selfing process is designed to update the restorer line, steering individuals towards the global optimal solution. This behavior is modeled by equation (6): the new gene of a restorer is produced by combining the corresponding gene of the best individual found so far with that of a restorer randomly selected from the restorer line, scaled by a random number generated within the range of [0, 1].

After generating a new individual through hybridization and selfing, it is evaluated and compared to the original candidate solution. The substitution process, as defined in equation (7), replaces the old individual with the new one only if the fitness value of the new individual surpasses that of the old one.

3.2.3. Renewal

The selfing process in HRO tracks the number of iterations during which a restorer has not been updated using a parameter called SC (self-crossing count). If a restorer's SC reaches the predefined upper limit of iterations without updates, a reset operation is performed on that individual, as described by equation (8): each gene of the stagnant restorer is regenerated uniformly at random between the minimum and maximum values of the corresponding dimension. It is worth noting that the solution obtained by HRO is continuous and must be mapped into the binary solution space of FS through a transfer function. In this study, the sigmoid function, S(x) = 1 / (1 + e^(−x)), is employed as the binary map: a bit is set to 1 when a random number drawn from [0, 1] falls below S(x), and 0 otherwise.
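The sigmoid mapping and the reset operation can be sketched as follows. The stochastic thresholding rule is the standard S-shaped transfer convention, assumed here; the bounds passed to the reset are the per-dimension minimum and maximum described above:

```python
import numpy as np

def sigmoid_binarize(position, rng):
    """Map a continuous HRO position to a binary feature mask: each bit
    is set with probability sigmoid(x), the usual S-shaped transfer rule."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < prob).astype(int)

def renew(lower, upper, rng):
    """Reset a stagnant restorer uniformly within the dimension bounds."""
    return lower + rng.random(lower.shape) * (upper - lower)

rng = np.random.default_rng(0)
mask = sigmoid_binarize(np.array([-8.0, 0.0, 8.0]), rng)  # strongly negative
# positions are almost never selected, strongly positive ones almost always
```

Large negative coordinates are thus decoded to 0 (feature dropped) and large positive ones to 1 (feature kept), while coordinates near zero stay genuinely stochastic.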

4. The Proposed Method

Although the effectiveness of traditional ACO-based methods in solving low-dimensional combinatorial optimization problems is well established, the strategies employed by BACO must be further improved to tackle high-dimensional FS problems. To this end, two hybrid models are proposed, integrating IBACO and HRO to improve the efficiency of exploring the global optimal solution and to accelerate convergence. Moreover, given the considerable impact of the HF assignment strategy on both the population update process and overall performance, a novel problem-oriented HF assignment scheme based on the significance of the knee point feature is introduced to ensure that the population is updated effectively. Lastly, the objective function to be optimized has been refined to take into account not only the classification accuracy but also the size of the feature subset.
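A common form of such a composite objective is sketched below. The specific weighted-sum formula and the weight value are illustrative assumptions, not the paper's exact definition:

```python
def fitness(accuracy: float, n_selected: int, n_total: int, w: float = 0.99) -> float:
    """Composite objective: weight classification accuracy against the
    fraction of features discarded. w = 0.99 is an assumed, typical value
    that keeps accuracy dominant while still rewarding smaller subsets."""
    return w * accuracy + (1.0 - w) * (1.0 - n_selected / n_total)
```

With equal accuracy, a smaller subset scores strictly higher, which is exactly the accuracy/size trade-off described above.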

4.1. The Hybrid Models

To fully harness the convergence rate and global search capability benefits of HRO, IBACO is melded with HRO in two unique hybrid models: the relay model (R-IBACO) and the collaborative model (C-IBACO). The details of these two hybrid models are discussed in the following.

4.1.1. The Relay Model

As described in Section 3.2, the maintainer line in HRO represents the optimal candidate solutions discovered by the population. The quality of the maintainer line is crucial in determining the evolution of the population, influencing both the convergence rate and the final result. Moreover, the sterile line is updated via crossing with the maintainer line, underscoring the importance of enhancing the quality of the maintainer line in HRO. To this end, an LRH model is proposed, applying the update strategies of IBACO to the update process of the HRO maintainer line. Specifically, each bit of the binary string corresponding to a maintainer candidate solution is evolved using the PD and HF of IBACO: each individual of the maintainer line selects a path determined by the transition probability (equations (1) and (2)). This integration of the IBACO operator effectively evolves numerous dimensions of the maintainer line during the early iterations, enabling R-IBACO to concentrate on global search in discrete spaces and quickly converge towards the vicinity of the global optimum. Furthermore, the best individual from the maintainer line is employed to update the PD. The detailed implementation of the relay model is outlined in Algorithm 1.

Input: Dataset and objective function
Output: OFS and corresponding classification accuracy
(1)Initialize the population and set the maximum number of iterations T
(2)Calculate feature importance and the knee point feature using Algorithm 3
(3)Set the heuristic factor and initialize the PD
(4)while t < T do
(5) Calculate the fitness of the population
(6) Sort the population in descending order of fitness and divide it into three lines: Maintainer, Restorer, and Sterile
(7)for each individual in the population do
(8)  if the individual is in the maintainer line then
(9)   Generate a trial solution using equations (1) and (2)
(10)  else if the individual is in the sterile line then
(11)   Generate a trial solution using equation (5)
(12)  else
(13)   if the restorer's SC is below its upper limit then
(14)    Generate a trial solution using equation (6)
(15)   else
(16)    Generate a trial solution using equation (8)
(17)   end if
(18)  end if
(19)  Compute the fitness of the trial solution
(20)  Update the individual using equation (7)
(21)  if the trial solution is a binary vector with all bits set to 0 then
(22)   Reinitialize the corresponding binary vector randomly
(23)  end if
(24)end for
(25) Update the iteration-best solution, the global best solution, and the PD
(26)end while
4.1.2. The Collaborative Model

In this section, a co-evolutionary hybrid model, referred to as C-IBACO, is proposed. This model retains the efficiency of IBACO in tackling high-dimensional FS problems while integrating the effective optimization performance of HRO. C-IBACO consists of two subpopulations, with IBACO and HRO independently executing their respective update strategies in each iteration. Specifically, ants select a path based on the transition probabilities specified in equations (1) and (2), while the genes of the rice individuals are updated using equations (5), (6), and (8). The HF of each path uses the importance of the knee point feature to measure the potential of each feature to be selected: features whose correlation exceeds that of the knee point are more likely to be included in the feature subset, while those with lower correlation still retain a chance of being selected.

To promote co-evolution between these two parallel algorithms, the search results from each subpopulation are shared after each iteration. The global optimal solution is determined by comparing the fitness values of the best candidate solutions from both subpopulations. If the best individual from HRO outperforms that of IBACO, it replaces the ant-colony best candidate solution that would otherwise drive the pheromone update. Conversely, if the best candidate solution from IBACO prevails, it supplants the worst solution of the maintainer line in HRO.

By incorporating these improvement strategies and information sharing mechanisms, C-IBACO can potentially identify a superior feature subset, selecting the most promising features effectively. The specifics of the C-IBACO procedure are detailed in Algorithm 2.

Input: Dataset and objective function
Output: OFS and corresponding classification accuracy
(1) Initialize the two subpopulations and set the maximum number of iterations
(2) Calculate the feature importance and the knee point feature using Algorithm 3
(3) Set the heuristic factor and initialize the PD
(4) while t < T do
(5)  Calculate the fitness of both subpopulations
(6)  Find the minimum-fitness individual of each subpopulation
(7)  Sort the HRO subpopulation in descending order of fitness and divide it into three lines: maintainer, restorer, and sterile
(8)  for each individual in the HRO subpopulation do
(9)   if the individual is in the sterile line then
(10)    if the selfing count has not reached its upper limit then
(11)     Generate a candidate solution using equation (6)
(12)    else
(13)     Generate a candidate solution using equation (8)
(14)    end if
(15)   else if the individual is in the restorer line then
(16)    Generate a candidate solution using equation (5)
(17)   end if
(18)   Calculate the fitness of the candidate solution
(19)   Update the individual using equation (7)
(20)   Check whether the candidate solution is an all-zero binary vector and reinitialize it if so
(21)  end for
(22)  for each ant in the IBACO subpopulation do
(23)   Update the path of the ant using equations (1) and (2)
(24)   Check whether the constructed solution is an all-zero binary vector and reinitialize it if so
(25)  end for
(26)  Update the best individuals of both subpopulations
(27)  Update the PD and the worst individual in the maintainer line
(28) end while
(29) Return the better of the two subpopulation best solutions as the global best
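The information-sharing step between the two subpopulations (lines 26–27 of Algorithm 2) can be sketched as follows; the function name, argument order, and data layout are our own illustrative choices, not the paper's:

```python
def share_results(hro_best, hro_best_fit, aco_best, aco_best_fit,
                  maintainer, maintainer_fits):
    """Sketch of the C-IBACO information-sharing step (minimization).

    Returns the solution that should drive the pheromone update and
    mutates the maintainer line in place when IBACO's champion wins.
    """
    if hro_best_fit < aco_best_fit:
        # HRO's champion replaces the ant-colony solution that would
        # otherwise be used to deposit pheromone.
        return hro_best
    # IBACO's champion supplants the worst maintainer individual.
    worst = max(range(len(maintainer_fits)), key=maintainer_fits.__getitem__)
    maintainer[worst] = aco_best
    maintainer_fits[worst] = aco_best_fit
    return aco_best
```

Either way, the better of the two subpopulation champions steers both the pheromone trail and the maintainer line, which is exactly the co-evolution mechanism described above.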
4.2. The Improved Heuristic Factor

The random assignment of the HF in BACO has been identified as a limitation of its capability to optimize high-dimensional FS problems, calling for a task-specific HF assignment to enhance performance. Nevertheless, traditional FS methods, such as those based solely on high-correlation features, may be arbitrary and fail to consider the potential influence of combinations of low-correlation features on the final classification result. To mitigate this issue, this paper proposes using the weight of the knee point [54] as a threshold in the HF assignment process. This approach obviates the need for complex experimental verification and avoids significant loss of class label information.

Initially, the proposed methods employ the random forest (RF) algorithm to calculate the importance of each feature. All features are then sorted by their RF importance, and the knee point is defined as the feature with the maximum perpendicular distance to the straight line connecting the features with the maximum and minimum importance. The detailed selection process for the knee point is described in Algorithm 3 and graphically represented in Figure 3.

Input: Dataset and feature space
Output: Knee point and its weight
(1) Set the maximum vertical projection distance to zero
(2) Initialize the index of the knee point and its weight
(3) Compute the correlation (importance) of each feature using RF
(4) Sort the feature correlations in descending order
(5) Connect the features with the largest and smallest correlations to form a straight line (shown as the red dotted line in Figure 3)
(6) for each feature in the sorted list do
(7)  Calculate the vertical projection distance from the feature to the line
(8)  if this distance exceeds the current maximum then
(9)   Update the maximum vertical projection distance
(10)   Record the index and weight of the feature
(11)  end if
(12) end for
(13) The knee point is the feature at the recorded index (marked with a red circle in Figure 3), and its weight is the corresponding correlation
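Algorithm 3 amounts to the classic maximum-distance-to-chord knee detection. A compact sketch, where the function name and the convention of returning the rank index within the sorted curve are ours:

```python
import numpy as np

def knee_point(importance):
    """Find the knee of a descending importance curve.

    Features are ranked by importance; the knee is the point with the
    largest perpendicular distance to the line joining the highest-
    and lowest-ranked features. Returns (rank index, weight).
    """
    w = np.sort(np.asarray(importance, dtype=float))[::-1]  # descending
    n = len(w)
    # Endpoints of the chord: (0, w[0]) and (n - 1, w[-1])
    p1 = np.array([0.0, w[0]])
    p2 = np.array([n - 1.0, w[-1]])
    line = (p2 - p1) / np.linalg.norm(p2 - p1)
    pts = np.stack([np.arange(n), w], axis=1) - p1
    # Perpendicular distance of every point to the chord
    proj = np.outer(pts @ line, line)
    dist = np.linalg.norm(pts - proj, axis=1)
    k = int(np.argmax(dist))
    return k, w[k]
```

For a curve that drops steeply and then flattens, the returned weight is the importance threshold used by the HF assignment in Section 4.2.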

Equation (11) presents the HF assignment strategy in the proposed IBACO method, which compares the importance of each feature with that of the knee point feature: the path of a feature whose importance reaches the knee point threshold adopts the larger HF, while the path of a less important feature adopts the smaller one. According to equations (1) and (2), this assignment strategy prioritizes selecting features with higher correlation while not disregarding less significant features.
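The thresholded assignment can be expressed as a small helper; the concrete HF values adopted above and below the knee point are not reproduced here, so `eta_high` and `eta_low` are placeholder constants of our choosing:

```python
def heuristic_factors(importance, knee_weight, eta_high=0.9, eta_low=0.6):
    """Illustrative HF assignment in the spirit of equation (11).

    Features at least as important as the knee point receive the
    larger heuristic factor, so the ant transition rule favours them,
    while less correlated features keep a nonzero chance of selection.
    """
    return [eta_high if w >= knee_weight else eta_low for w in importance]
```

Because `eta_low` is strictly positive, low-correlation features are de-emphasized rather than excluded, which is the behaviour the text describes.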

4.3. Time Complexity Analysis

The time complexity of the proposed hybrid algorithms primarily hinges on four aspects: initialization, individual evaluation and update, population sorting, and pheromone update. Table 2 contrasts the time complexity of the hybrid algorithms with those of the standalone algorithms at each stage, in terms of the population size, the subpopulation sizes of HRO and IBACO within C-IBACO, and the problem dimension. The analysis reveals that, compared with ACO, R-IBACO increases the computational complexity only at the population sorting phase, while relative to HRO it adds overhead solely in updating the maintainer line and the PD. Owing to the cooperative updating of two subpopulations, C-IBACO exhibits a higher time complexity than the single algorithms; this increase arises during population initialization, population sorting, and the individual update process. The total time complexities of R-IBACO and C-IBACO over T iterations, where T represents the maximum number of iterations, are also reported in Table 2. Overall, the hybrid algorithms significantly improve overall performance within an acceptable range of increased computational overhead.

4.4. The Objective Function

The objective function guides the update process and determines the optimization direction of the algorithms. In the context of FS, the objective is to identify the most informative feature subset that can enhance the performance of classifiers. The primary criterion for evaluating the efficacy of a classifier is its classification accuracy, defined in equation (12). However, since the objective function is designed to be minimized, the classification error rate, which is the complement of accuracy (1 − accuracy), is employed as the primary term of the objective function. Moreover, it is desirable to keep the feature count of the subset minimal, prompting the inclusion of the selection ratio in the objective function, as specified in equation (13). By minimizing the objective function, the proposed methods can identify the least redundant and most informative feature subset with the highest classification accuracy.

In equation (12), TP and TN, respectively, denote instances where the classifier correctly identified a test sample as positive or negative, whereas FP and FN represent instances where the classifier incorrectly classified a test sample as positive or negative. In equation (13), the two parameters refer to the count of features included in the selected feature subset and the total number of features, respectively, and their ratio signifies the feature selection rate. The fitness value is calculated by weighting the error rate and the selection rate. As the classifier's accuracy is the primary component of the fitness value, the weight of the error rate is typically assigned a larger value, such as 0.9.
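The two equations can be written out directly; the assumption that the two weights sum to one is ours (the text only fixes the error-rate weight at around 0.9):

```python
def accuracy(tp, tn, fp, fn):
    # Equation (12): fraction of correctly classified test samples.
    return (tp + tn) / (tp + tn + fp + fn)

def fitness(error_rate, n_selected, n_total, w1=0.9, w2=0.1):
    # Equation (13): weighted sum of the classification error rate
    # and the feature-selection ratio. w2 = 1 - w1 is an assumption.
    return w1 * error_rate + w2 * (n_selected / n_total)
```

For example, a subset with a 10% error rate that keeps 50 of 1000 features scores 0.9 × 0.1 + 0.1 × 0.05 = 0.095, so shrinking the subset further can only lower the fitness if it does not raise the error rate by more than the saved selection ratio times w2/w1.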

5. Experimental Results and Discussion

In this section, the performance of the proposed methods is evaluated on fourteen high-dimensional biomedical datasets, with feature sizes ranging from 2000 to 12533 dimensions. The primary performance metrics considered are classification accuracy, size of the feature subset, and running time. The K-nearest neighbor (KNN) classifier is used to evaluate the feature subset selected by each algorithm. The effect of two methods for computing feature importance on model performance is also investigated. Five-fold cross-validation is employed to avoid model overfitting due to limited sample sizes, providing a more accurate assessment of model performance. To thoroughly analyze the FS performance of the proposed methods, comparison experiments with thirteen other metaheuristic-based FS methods are conducted in three distinct groups. In the first group, the proposed methods are compared with standard HRO and IBACO, which are components of the proposed hybrid methods. The standard ACO is also included in this group to validate the effectiveness of the proposed heuristic factor. In the second group, the proposed methods are compared with five well-known FS methods based on basic metaheuristics, such as FPA, BQPSO, ABC, SSA [41], and GWO. In the final group, the effectiveness of the proposed hybrid methods is validated against five state-of-the-art methods reported in recent studies, namely CMSRSSMA [11], MBAO [17], MSGWO [19], SCHHO [24], and HFSIA [25]. These advanced hybrid algorithms allow the proposed methods to be benchmarked against current leading solutions in the field.
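The evaluation protocol can be sketched in a few lines; this is a minimal stand-in, not the paper's harness: stratification and distance weighting are omitted, and the helper name is ours:

```python
import numpy as np

def knn_cv_accuracy(X, y, k=3, folds=5, seed=0):
    """5-fold cross-validated accuracy of a plain KNN classifier
    (majority vote among the k nearest neighbours by Euclidean
    distance), as used to score candidate feature subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for f in range(folds):
        test = idx[f::folds]                    # every folds-th index
        train = np.setdiff1d(idx, test)
        # Pairwise distances between test and train samples
        d = np.linalg.norm(X[test, None, :] - X[None, train, :], axis=2)
        nn = np.argsort(d, axis=1)[:, :k]       # k nearest neighbours
        votes = y[train][nn]
        pred = np.array([np.bincount(v).argmax() for v in votes])
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))
```

A wrapper FS method would call such a function on `X[:, subset]` inside its objective, which is why the classifier's runtime dominates the cost of each fitness evaluation on high-dimensional data.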

All algorithms employed in this research were implemented using Python language version 3.6.9. The experimental results presented in this paper were obtained on a personal computer equipped with an Intel (R) Core (TM) i7-8700 CPU operating at 3.2 GHz, 16.0 GB RAM, under a Windows 10 system.

5.1. Description of Datasets

Table 3 outlines the key characteristics of the datasets employed in this research. These datasets span a range of medical disciplines, predominantly serving binary or multi-class classification tasks. Notably, each dataset is characterized by a large number of features, varying from 2000 to 12533. A shared trait among these datasets is the presence of numerous redundant and irrelevant features, compounded by a typically limited sample size. It is imperative to apply dimensionality reduction to such data, as the presence of unnecessary features can compromise model performance.

5.2. Parameter Settings

The parameters of all algorithms are configured as follows. The maximum number of iterations is fixed at 100 for all algorithms. To ensure fairness in the maximum number of fitness evaluations (FE_MAX = 3000), the population size of all algorithms is set to 30, except for HRO, C-IBACO, and HFSIA. Considering that only the sterile line is updated during the hybridization process of HRO, and that the subpopulation sizes in C-IBACO are required to satisfy equation (14) for a total population size fixed at 30, the population size of the single standard HRO is adjusted to 45, and the subpopulation sizes of HRO and IBACO in C-IBACO are set to 27 and 12, respectively. For HFSIA, the population size is set to 14 and the crossover probability to 0.35 to maintain consistency in the number of fitness evaluations. The filter ratio of Fisher in HFSIA is set to 0.3 to preselect the top 30% of relevant features. Moreover, in the initial stage of MBAO, the number of features filtered by mRMR is fixed at 100, in line with the existing literature. Lastly, to mitigate the effects of randomness and bolster the stability of the experimental outcomes, each algorithm is executed independently on each dataset ten times.

5.3. Experiment Analysis
5.3.1. Experimental Evaluation of Methods for Computing Feature Correlation

In alignment with the high-dimensional FS research framework delineated in Section 4, this section assesses and compares two feature correlation computation methods, namely ReliefF and RF. The results of applying KNN with all features serve as a control group. Classification accuracy and the average number of selected features (AvgN) are designated as the principal evaluation metrics to provide a comprehensive performance appraisal of the different feature correlation computation methods. The experimental findings are detailed in Table 4.

The evaluation results underscore the efficiency and effectiveness of RF and ReliefF in determining feature correlation and eliminating redundant features. It is noted that ReliefF tends to select more features than RF across all datasets, indicating the superior selection efficiency of RF. For example, the dimensionality reduction rate of RF on the lymphoma dataset reaches 95.85%, which is 90.59% higher than that of ReliefF. A similar outcome is observed on the warpAR10P dataset, where the reduction rate of RF exceeds that of ReliefF by 88.32%. Moreover, both methods exhibit varying degrees of improvement in classification accuracy compared to using all original, unfiltered features. On the ALLAML dataset, the classification accuracies of RF and ReliefF are recorded at 92.92% and 95.81%, respectively, marking improvements of 19.74% and 25.43% over the baseline. RF also delivers a classification accuracy of 68.67% on the warpAR10P dataset, which is 21.18% higher than the baseline. The enhancement in classification accuracy, coupled with an average dimensionality reduction rate of 98.4%, testifies to the ability of RF to effectively discard irrelevant features and calculate feature correlation in high-dimensional data.

However, the selected feature subset does not invariably guarantee improved classification accuracy. For instance, on the Brain_Tumor_2 dataset, RF selects a small feature subset of 60 features, achieving a dimensionality reduction rate of 99.42%, yet the classification accuracy merely attains 64.82%, a decrease of 7.99% from the baseline. This trend is similarly observed on the lymphoma, Leukemia_2, and 11_Tumor datasets. The diminished classification accuracy underscores the limitations of FS methods that rely solely on feature correlation ranking, as crucial information tied to features with lower correlations than the knee point may be overlooked. Consequently, it might be more judicious to use the correlations of all features as the criterion for high-dimensional feature ordering rather than discarding less correlated features outright.

The classification experiment results, detailed in Tables 5 and 6, demonstrate the effectiveness of employing the feature correlation determined by RF and ReliefF as the heuristic factor in the proposed IBACO. The findings suggest that C-IBACO and R-IBACO surpass standalone RF or ReliefF in classification accuracy. For instance, when k is set to 3, the classification accuracies of RF-based C-IBACO and R-IBACO on the Brain_Tumor_2 dataset register at 88.82% and 86.82%, respectively, which are 22.28% and 18.51% higher than the accuracy of RF alone. Furthermore, C-IBACO, using the feature correlation calculated by RF, achieved 100% classification accuracy on the DLBCL dataset, indicating the ability of the proposed methods to identify samples that both plain KNN and RF-filtered KNN find difficult to distinguish. Compared with other values of k, the proposed methods exhibit an enhanced search capability when k is fixed at 3. As a result, in subsequent comparative experiments, KNN (k = 3) serves as the final classifier to evaluate the performance of feature subsets.

5.3.2. Comparison of the Proposed Methods with HRO and IBACO

To substantiate the effectiveness of the proposed heuristic factor assignment and hybrid strategies, the proposed methods were contrasted with the standard single algorithms. Table 7 summarizes the performance of the proposed methods and the basic methods in terms of classification accuracy, average number of selected features, and average runtime over ten independent runs. The results distinctly demonstrate that the mean accuracy of the proposed methods surpasses that of the original single algorithm across all datasets. Moreover, the introduction of an improved problem-oriented HF propels IBACO to outperform the standard ACO on ten of the fourteen datasets. Regarding the average number of selected features, the proposed methods obtained the smallest feature subsets on nine datasets. Although the standard ACO recorded the shortest average runtime across all datasets except the lung dataset, exhibiting its efficient selection capability, its lack of enhanced strategies culminated in inferior classification outcomes.

5.3.3. Comparison of the Proposed Methods with Other Basic Metaheuristics

To further investigate the superiority of the proposed methods, a fair comparison was conducted with five well-known basic metaheuristic-based FS methods, including FPA, BQPSO, ABC, ISSA [41], and GWO. The comparison maintained an equal maximum number of fitness evaluations, and the comparative outcomes of this group of methods are presented in Table 8. The proposed methods achieve the highest average and maximum classification accuracy across all datasets. BQPSO and GWO also deliver impressive results on most datasets, with classification accuracy on the Leukemia_1 dataset even surpassing that of C-IBACO, a pattern also observed on the 11_Tumor dataset.

Regarding the size of the selected feature subset, the proposed methods significantly outperform their counterparts. For instance, the feature selection rate of C-IBACO on the Prostate_Tumors dataset is remarkably low, at only 0.41%. This is in stark contrast to the substantially higher feature selection rate of 49.47% displayed by FPA and IBSSA, a difference of 99.17%, which underscores the superior effectiveness of the proposed methods. Furthermore, C-IBACO selected an average of 143.6, 32.2, 53.1, and 23.6 features on the warpAR10P, GLIOMA, Prostate_GE, and ALLAML datasets, respectively, with a minimum dimensionality reduction rate of 94.02%. R-IBACO selects the smallest average number of features on the Colon dataset, with only 57 features selected, a reduction of 94.09% compared to the average number of features chosen by IBSSA. The proposed methods record the shortest runtime on seven out of fourteen datasets. Notably, IBSSA demonstrated higher computational efficiency than other methods on high-dimensional datasets (the last four datasets with over ten thousand dimensions).

In general, as the problem dimension increases, the performance gap between the proposed methods and other algorithms widens, indicating that C-IBACO and R-IBACO exhibit greater robustness and are more suitable for high-dimensional FS tasks.

5.3.4. Comparative Study with the State of the Art

Apart from comparing the proposed methods with single standard algorithms and basic metaheuristic-based FS methods, this study also examines the performance of C-IBACO and R-IBACO against five advanced metaheuristic-based FS methods recently reported in the literature: MBAO [17], HFSIA [25], SCHHO [24], MSGWO [19], and CMSRSSMA [11]. Table 9 presents the comparison results of multiple independent runs for each advanced algorithm on each dataset. As evidenced in the table, C-IBACO and R-IBACO surpassed their counterparts in classification accuracy on thirteen datasets, the exception being the lymphoma dataset, on which MBAO achieved the best classification accuracy while selecting the fewest features. This hints at the limitations of the proposed methods in simultaneously eliminating redundant and irrelevant features and bolstering classification accuracy. Although the application of filters substantially reduces the time consumed in the FS process, it can degrade classification performance: the average classification accuracy of MBAO amounts to only 65.4% on the 11_Tumor dataset, where the accuracy of the proposed C-IBACO is 40.61% higher. Similar trends were observed on the Brain_Tumor_2 and warpAR10P datasets, where R-IBACO and C-IBACO were 24.45% and 23.30% higher than MBAO and SCHHO, respectively, which might be due to the filters eliminating features potentially useful for classification.

While emphasizing classification accuracy, the hybrid filter approaches managed to select the minimum number of features across all datasets, with MBAO achieving the minimum on thirteen datasets and HFSIA on one. This outcome can be ascribed to the ability of filters to constrain the search space to a lower dimension, thereby significantly shrinking the size of the feature subset to be searched. Regarding computational time, the hybrid filter methods demonstrate superior search efficiency, as they conduct the search within the filtered, low-dimensional feature space. Specifically, HFSIA registers the shortest computational time on the first nine datasets, whereas MBAO exhibits exceptional search efficiency on the final five datasets characterized by higher dimensionality. In terms of search efficiency across the full feature space, both C-IBACO and R-IBACO perform slightly less effectively than MSGWO, which does not employ a hybrid strategy. However, C-IBACO and R-IBACO prove more robust than SCHHO and CMSRSSMA, particularly as the feature dimension increases.

Although C-IBACO and R-IBACO may not match the computational efficiency and feature-subset sizes achieved by MBAO and HFSIA, they demonstrate strong performance and robustness in classification accuracy and stability, which justifies a modest sacrifice in runtime.

5.3.5. Graphical Analysis

Figure 4 provides a visual representation of the comparative superiority and efficacy of the proposed methods against the single standard algorithms and other metaheuristic-based FS approaches. This figure encapsulates the overall maximum and average classification accuracy, along with the number of selected features for all algorithms. Specifically, Figures 4(a) and 4(b) depict the overall maximum and average classification accuracies, respectively. As the figures suggest, the feature subsets selected by C-IBACO and R-IBACO deliver markedly superior classification performance compared to other methods. Higher overall classification accuracy underscores the superior search performance of the proposed methods and a heightened likelihood of discovering more promising candidate solutions. Conversely, MBAO did not exhibit optimal performance in overall classification accuracy, as discussed in the preceding section. This discrepancy could be due to the classifier used, which differs from those documented in the literature. Figure 4(c) displays the overall number of features selected by each algorithm. MBAO selected the fewest features, followed closely by R-IBACO and C-IBACO. It is worth noting that the number of features chosen by the proposed methods was less than that of HFSIA, even though the latter also employs a filtering method. This result reaffirms the effectiveness of the proposed methods in reducing the number of selected features.

The convergence results, illustrated in Figure 5, demonstrate that both C-IBACO and R-IBACO converge quickly to the vicinity of the global optimum before the termination of iterations. The hybrid algorithms possess an advantage in achieving a superior initial solution by properly selecting features with high correlation based on the modified heuristic factor in the first stage. As the convergence curves indicate, the proposed methods maintain a certain exploratory capability in the final stage of iterations, reducing the risk of getting trapped in the local optima. In contrast, the convergence curves of MBAO reveal its relatively underwhelming overall performance. Specifically, on datasets such as lung, Brain_Tumor_2, Leukemia_2, and 11_Tumor, MBAO prematurely converges to a local optimum and fails to further search for a better feature subset. One possible explanation for this is that mRMR filters out informative feature combinations, resulting in a search space with poor performance.

To provide a more intuitive understanding of the computational complexity of different algorithms, Figure 6 displays the average running time of all algorithms for each dataset. Evidently, HFSIA attains the shortest running time for the first nine datasets, whereas MBAO demonstrates superior search efficiency for the last five datasets with higher dimensionality. It is noticeable that IBACO achieves a shorter average CPU running time than HRO and their hybrid counterparts across all datasets, which can be attributed to its ability to quickly select features with high correlation using the modified HF, thereby accelerating convergence. However, this advantage is accompanied by lower classification accuracy and unsatisfactory average fitness values compared to the hybrid algorithms. Nevertheless, C-IBACO and R-IBACO display exceptional robustness by generating acceptable candidate solutions without incurring a significant computational cost. Overall, the experimental results affirm the feasibility and potential of deploying the proposed methods for practical high-dimensional FS tasks.

5.3.6. Experimental Results for Nonparametric Test

To determine the significance of the difference between the proposed hybrid algorithms and the compared metaheuristic-based FS approaches, the experimental results were analyzed using the Wilcoxon signed-rank test and the Friedman test [55]. The results of the Wilcoxon signed-rank test are presented in Table 10. In this table, R+ denotes the sum of ranks where the proposed hybrid algorithm outperforms the comparative method, whereas R− symbolizes the reverse. The p value signifies the level of significance, with p < 0.05 indicating a significant difference between the two algorithms under comparison. The outcomes reveal that C-IBACO outperforms R-IBACO, suggesting that C-IBACO maintains a lower individual fitness value and exhibits superior performance in the FS process. Apart from the comparison between C-IBACO and R-IBACO, for which the null hypothesis cannot be rejected, both methods are significantly superior to the other basic and advanced methods.

Table 11 presents the results of the Friedman test, including average accuracy rankings and final ranks for each algorithm across all datasets. As demonstrated in Table 11, the proposed methods claim the top two positions in the final ranking. The Friedman test produces a p value of 2.61E − 18, which is less than the preset significance level of 0.01, indicating a statistical difference among the algorithms. Table 12 provides the results of the Holm test, which serves as a post hoc test to determine significant differences between the control method and the other algorithms in pairwise comparisons. The Holm test rejected the null hypothesis at a significance level of 0.05, indicating significant differences between R-IBACO and all competing methods except C-IBACO, which is consistent with the findings of the Wilcoxon signed-rank test.
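With SciPy, both tests can be reproduced on per-dataset accuracy vectors. The numbers below are made up purely for illustration (one entry per hypothetical dataset, three hypothetical methods):

```python
from scipy.stats import wilcoxon, friedmanchisquare

# Illustrative mean accuracies over eight hypothetical datasets.
baseline = [0.90, 0.88, 0.91, 0.85, 0.92, 0.89, 0.87, 0.86]
c_ibaco  = [0.91, 0.90, 0.94, 0.89, 0.97, 0.95, 0.94, 0.94]
r_ibaco  = [0.905, 0.89, 0.93, 0.88, 0.95, 0.93, 0.92, 0.91]

# Pairwise Wilcoxon signed-rank test: p < 0.05 rejects the null
# hypothesis that the two methods perform equally.
stat, p = wilcoxon(c_ibaco, baseline)

# Friedman test over all methods simultaneously, ranking the
# methods within each dataset.
fstat, fp = friedmanchisquare(c_ibaco, r_ibaco, baseline)
```

With small sample sizes such as the fourteen datasets used here, the Wilcoxon test computes an exact p value, which is why it is a common choice for comparing metaheuristics.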

Based on the statistical test results, it can be inferred that the hybrid algorithms incorporating IBACO and HRO exhibit superior optimization performance compared to their standalone counterparts. This emphasizes the effective combination of the optimization strategies of every single algorithm, leading to an enhancement in their competence in high-dimensional FS tasks.

5.4. Discussion and Biological Interpretation

Based on the results of three comparative experiments and statistical analysis, it can be inferred that the hybrid models of HRO and IBACO integrate the advantages of both single algorithms, allowing them to converge more quickly to the most promising region in the search space while maintaining a certain level of global exploration capability in later iterations. The problem-oriented HF enhances the ability of ACO to select more important features while discarding irrelevant ones. Moreover, the optimal solution obtained by HRO guides the update of pheromones. This collaborative update process continues until the termination condition is achieved. By combining HRO and IBACO, the population benefits from greater diversity while reducing the probability of a single algorithm getting stuck into the local optima.

Numerous metaheuristics have been developed to solve low-dimensional continuous optimization problems, but these often struggle with high-dimensional discrete optimization. This study introduces two hybrid wrapper FS methods, and the experimental results on fourteen high-dimensional datasets demonstrate that the feature subsets obtained by the hybrid algorithms have higher classification accuracy and smaller size. Nevertheless, their optimization performance still falls short in some instances, a shortcoming attributable to the inherent traits of metaheuristics. To enhance the chance of locating the optimal solution, metaheuristics often adopt a global search strategy that depends on a degree of randomness, which inevitably raises the risk of the algorithm getting stuck in local optima. For instance, the algorithms only achieved an average classification accuracy of 83% on the warpAR10P dataset, a rate that may not satisfy the stringent standards demanded in practical medical scenarios. Furthermore, the introduction of advanced collaborative hybrid strategies can reduce the optimization efficiency of the algorithm, given that updating the maintainer line and maintaining independent subpopulations amplifies the complexity of the model.

The performance improvement of the hybrid algorithms can primarily be attributed to the fact that they leverage the outstanding traits of the individual algorithms. As illustrated in Table 7, BHRO exhibited superior classification accuracy, while IBACO excelled in optimization efficiency and robustness. Another crucial factor contributing to the performance enhancement is the combination of the hybrid strategy and the improved HF. These elements ensure adequate coverage of the search area, giving the proposed methods a better opportunity to locate the global optimal solution within the entire search space, which results in the leading maximum classification accuracy across all datasets. However, it is essential to note that the proposed methods still have limitations. For instance, setting the selfing upper limit parameter of HRO can be challenging, since it governs the transition between the selfing and renewal stages. Additionally, the inherent randomness of metaheuristics cannot guarantee the acquisition of the optimal feature subset in a single run.

Table 13 presents the indices of the features most frequently selected by C-IBACO across 10 independent runs. The results demonstrate that C-IBACO is capable of selecting a small number of highly discriminative features on most datasets, the exceptions being warpAR10P, Brain_Tumor_1, and Brain_Tumor_2, which suggests its potential applications in disease diagnosis and gene expression problems. Notably, the feature subsets composed of high-frequency features on the Colon and Prostate_Tumors datasets exhibit superior classification performance compared to the results in Table 7. Moreover, the sizes of these subsets are only 31.12% and 20.5% of the original average number of selected features, respectively, which suggests that the proposed methods still have potential for further removal of irrelevant features. With more appropriate parameter settings, the suggested approaches could be applied to other practical problems such as fault detection [56], scheduling problems [57, 58], text classification [59], and sentiment analysis [60].

6. Conclusion

In this research, two innovative hybrid wrapper-based FS methods that integrate HRO and IBACO are proposed to identify the most informative features in high-dimensional disease diagnosis and gene expression data. The primary objective of hybridization is to enhance the performance of IBACO by harnessing the power of HRO to facilitate the exploration and exploitation of the high-dimensional search space. By combining the superior search efficiency and robustness of IBACO with the excellent global search performance of HRO, the proposed hybrid methods manifest enhanced FS capabilities. Moreover, IBACO boosts performance through a problem-oriented assignment strategy that employs the correlation of the knee point feature, enabling the algorithm to exploit valuable latent information in the features. This strategy is also integrated into HRO to compensate for the absence of an update mechanism for the maintainer line.

Two distinct forms of hybridization are presented in this study: R-IBACO and C-IBACO. In R-IBACO, IBACO updates the maintainer line of HRO, and the best solution derived from HRO is subsequently used to update the PD at each iteration. In C-IBACO, the subpopulations of HRO and IBACO evolve independently, and the local search results are shared to update the PD and the maintainer line after each iteration. In both methods, the KNN algorithm serves as the classifier, and RF is employed to compute the feature importance required for assigning HF. The proposed methods were evaluated on fourteen well-known biomedical datasets, and their performance was benchmarked against thirteen other algorithms, including the single standard algorithms that comprise them as well as basic and advanced metaheuristic-based wrapper FS methods. The experimental results indicate that the proposed methods outperform the other techniques in terms of both the number of selected features and classification accuracy on most datasets. Furthermore, the statistical results of the Wilcoxon signed-rank test and the Friedman test reveal that the proposed methods achieved the top rank in classification accuracy, which corroborates their effectiveness as practical strategies for selecting the most representative disease-related features.
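The RF-importance and KNN-evaluation components described above can be sketched with scikit-learn, assuming its off-the-shelf RandomForestClassifier and KNeighborsClassifier stand in for the paper's implementations; the synthetic dataset, neighbor count, and top-k cutoff are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a high-dimensional biomedical dataset.
X, y = make_classification(
    n_samples=100, n_features=200, n_informative=10, random_state=0
)

# RF supplies per-feature importance scores, which the paper uses
# to assign heuristic factors (HF) to features.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importance = rf.feature_importances_

# A candidate subset (here simply the top-10 features by importance)
# is scored with KNN, the wrapper's fitness evaluator.
subset = np.argsort(importance)[::-1][:10]
knn = KNeighborsClassifier(n_neighbors=5)
score = cross_val_score(knn, X[:, subset], y, cv=5).mean()
print(len(subset), 0.0 <= score <= 1.0)
```

In the actual wrapper, the subset would come from the metaheuristic search rather than a simple top-k ranking; this sketch only shows how the two classifier roles plug together.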

Although the suggested approaches effectively enhance the exploration and exploitation capabilities of a single algorithm, it is important to note that optimization efficiency may decline as model complexity increases, and optimal performance may not be achievable in certain specific scenarios. Consequently, future research should prioritize more efficient hybrid strategies that ensure the generalization of the algorithms across problems of varying scales and dimensions. One direction involves exploring more efficient parameter settings and the joint optimization of algorithm and classifier parameters, which would help identify synergistic configurations that maximize overall performance. Additionally, it would be beneficial to explore different transfer functions and advanced classifiers, as these components have the potential to further bolster the performance of the algorithms in a variety of optimization scenarios.

Data Availability

The datasets analyzed during the current study are available at the following websites: https://jundongl.github.io/scikit-feature/datasets.html and https://ckzixf.github.io/dataset.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant no. 42201464 and the Wuhan Science and Technology Bureau Knowledge Innovation Dawning under grant no. 2022010801020270.