Abstract

Recent advances in microarray data analysis make it easy to simultaneously measure the expression levels of several thousand genes. These levels can be used to distinguish cancerous tissues from normal ones. In this work, we are interested in reducing the dimension of gene expression data for cancer classification, which is a common task in most microarray data analysis studies. This reduction plays an essential role in enhancing classification accuracy and in helping biologists accurately predict cancer; it is carried out by selecting a small subset of relevant genes and eliminating redundant or noisy genes. In this context, we propose a hybrid approach (MWIS-ACO-LS) for the gene selection problem. It combines a new graph-based approach for gene selection (MWIS), in which we seek to minimize the redundancy between genes by considering their correlation and to maximize gene-ranking (Fisher) scores, with a modified ACO coupled with a local search (LS) algorithm that uses the classifier to measure the quality of the candidate subsets. To evaluate the proposed method, we tested MWIS-ACO-LS on ten well-replicated, high-dimensional microarray datasets whose sizes range from 2308 to 12600 genes. The experimental results on these ten classification problems demonstrate the effectiveness of the proposed method.

1. Introduction

In recent years, DNA microarray technology has grown tremendously, thanks to its unquestionable scientific merit. This technology, developed in the early 1990s, allows researchers to simultaneously measure the expression levels of several thousand genes [1, 2]. These expression levels are very important for the detection or classification of specific tumor types. The microarray data are transformed into gene expression matrices, where a row represents an experimental condition (sample) and a column represents a gene; each value $x_{ij}$ is the measured expression level of the $j$th gene in the $i$th sample (see Table 1).

For the cancer classification problem, each row also contains information about the class of a sample (the type of cancer). Thus, DNA microarray analysis can be formulated as a supervised classification task [3].

In the cancer classification task, only a small number of samples are available, while each sample is described by a very large number of genes. These characteristics make the presence of redundant or irrelevant genes very likely, which limits the performance of classifiers. Thus, extracting a small subset of genes containing valuable information about a given cancer is one of the principal challenges in microarray data analysis [4].

Gene selection has become increasingly indispensable over the last few years. Its main motivation is to identify the genes in a microarray dataset that are useful for distinguishing the sample classes. It also provides a better understanding and interpretation of the phenomena studied, and it mitigates the curse of dimensionality, which improves the quality of classifiers. In general, gene selection methods are divided into two subclasses: wrapper approaches and filter approaches [5]. In wrapper methods, the selection can be seen as an exploration of the possible subsets: the principle is to generate a subset of genes and then evaluate it, the quality of a given subset being measured by a specific classifier. The classification algorithm is therefore run several times at each evaluation. Generally, the accuracy obtained with the final subset of genes is high because the generation process is biased toward the classifier used. Another advantage is conceptual simplicity: just generate and test. However, wrappers offer no theoretical justification for the selection and do not reveal the dependency relationships that may exist between genes. Moreover, the selection procedure is specific to a particular classifier, and the subsets found are not necessarily valid if the classifier is changed. Besides, wrappers typically suffer from possible overfitting and a high computational cost [5, 6], and they become infeasible when large gene subsets must be evaluated [7].

In filter methods, the final subset is selected based on gene score functions and significance measures. Unlike wrappers, the selection is independent of the classifier used. These methods evaluate each gene individually to assign it a score, and the gene selection is performed by keeping the best-ranked genes. Filters are generally less expensive in computing time, so they can be used when the number of genes is very high. Their main drawback is that they do not take into consideration the possible interactions between genes. In the literature, there are several individual gene-ranking (filter) methods, such as the t-test [8], Fisher score [9], signal-to-noise ratio [10], information gain [7], and ReliefF [11].

In wrapper methods, metaheuristics are commonly used to generate high-quality subsets of genes. Examples of classification algorithms used for measuring the quality of each candidate solution include support vector machines (SVMs) and K nearest neighbor (KNN) [12].

The first works on DNA microarray classification were published at the end of the 1990s [13, 14]. In this context, several researchers have used metaheuristic methods for solving the feature selection problem (particularly gene selection) in order to facilitate the recognition of cancer cells: the ACO algorithm [15–20], PSO [4, 6, 21–25], the genetic algorithm [4, 26, 27], the imperialist competition algorithm (ICA) [28], and the binary differential evolution (BDE) algorithm [29].

The ant colony optimization algorithm (abbreviated as ACO) is a population-based metaheuristic [30, 31]. Thanks to its efficiency, it has been used to solve several optimization problems in different fields. In the ACO algorithm, each ant represents a candidate solution to the problem, and the ants build approximate solutions iteratively (step-by-step). The process of constructing a solution can be regarded as a path (between the nest and the food source of the ants) on a graph. The choice of the best path is influenced by the quantities of pheromone left on the pathways and by a piece of heuristic information that indicates the goodness of the decision taken by an ant.

Thus, metaheuristics find application in solving the gene selection problem, which is known to be NP-hard [32, 33]. In the last decade, several researchers have also adopted graph-based techniques to select a near-optimal subset of a feature set [34–36].

In this study, we propose a hybrid approach for solving the gene selection problem. Our two-stage approach starts with a first stage in which a new graph-based method (MWIS) is applied without using any learning model. In the second stage, a wrapper method based on a modified ACO and a new local search algorithm guided by the 1NN classifier is developed. In this step, the role of the 1NN classifier is to evaluate each candidate gene subset generated. To our knowledge, the proposed approach has not been investigated previously.

This paper is organized as follows: in Section 2, we present the proposed gene selection method. Section 3 provides a detailed exposition of the experiments that we carried out on ten microarray datasets to evaluate our approach. Finally, we conclude our paper.

2. Methods

2.1. Graph Theory Approach for Gene Selection
2.1.1. Notations

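In this work, we use X to denote a dataset (Table 1) of M samples, $X = (x_1, \dots, x_M)$, where each sample $x_i \in \mathbb{R}^N$. We use $g_1, \dots, g_N$ to denote the N gene vectors, and $y_1, \dots, y_M$ are the class labels.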

Graph theory gives an abstract model to represent the relationships between two or more elements (vertices) in a given system. Let $G = (V, E)$ be an undirected graph, where V is a nonempty finite set called the set of vertices and E is the set of edges. We define a vertex-weighted graph as a graph G together with a function W (the vertex weighting function) such that $W(v) \in \mathbb{R}^{+}$ for all $v \in V$ [37, 38]. The maximum weight independent set (MWIS) problem is one of the most important optimization problems, thanks to its many domains of application [39]. This is particularly true for gene selection, where the DNA microarray data can be transformed into a vertex-weighted graph (the gene-similarity graph). In this graph, each gene is a vertex, and its Fisher score is the weight of this vertex. The set of edges represents the existence of a significant correlation (relationship) between genes; this relation is the degree of linear association (Pearson correlation) between them. After transforming the DNA microarray data, we try to find the maximum weight independent set. This set of genes is used in the second stage of our proposed method.

2.2. Construction of Gene-Similarity Graph

The construction of gene-similarity graph requires the definition of some statistical notions: starting with the Fisher score to calculate the weight of each vertex (gene).

2.2.1. Fisher Score [9]

It is mainly applied in gene selection as a filter [40]. The Fisher score of each gene represents its relevance to the dataset; a higher Fisher score means that the gene contributes more information. This information helps to measure the degree of separability of the classes through a given gene $g_i$. It is defined by

$$F(g_i) = \frac{\sum_{k=1}^{c} n_k \left(\mu_k^{i} - \mu^{i}\right)^2}{\sum_{k=1}^{c} n_k \left(\sigma_k^{i}\right)^2}, \quad (1)$$

where c, $n_k$, $\mu_k^{i}$, and $\sigma_k^{i}$ represent, respectively, the number of classes, the size of the $k$th class, and the mean and standard deviation of the $k$th class for the $i$th gene. $\mu^{i}$ is the global mean of the $i$th gene.
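
As an illustration of the Fisher score of a single gene, here is a minimal Python sketch. The function name `fisher_score`, the use of the population variance inside each class, and the convention of returning infinity when the within-class scatter is zero are assumptions made for this example; they are not taken from the original implementation.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one gene: between-class over within-class scatter.

    values: expression levels of the gene, one per sample.
    labels: class label of each sample, in the same order as values.
    """
    global_mean = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        members = [v for v, y in zip(values, labels) if y == cls]
        n_k = len(members)
        between += n_k * (fmean(members) - global_mean) ** 2
        # Population variance of the class (assumption of this sketch).
        within += n_k * pvariance(members)
    # Assumed convention: a gene that is constant inside every class
    # gets an infinite score instead of raising a division error.
    return between / within if within > 0 else float("inf")


# A gene that separates the two classes well gets a high score.
print(fisher_score([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1]))
```

For the toy gene above, the class means are 2 and 8 around a global mean of 5, so the score is 54/4, i.e., about 13.5; a gene with identical class means would score 0.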

2.2.2. Pearson Correlation Coefficient

The Pearson correlation coefficient is a measure of the strength of the linear relationship between two variables (genes). Let $X$ and $Y$ be two random variables; the correlation coefficient between $X$ and $Y$ is defined by

$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}, \quad (2)$$

where $\mathrm{cov}(X, Y)$ is the covariance between $X$ and $Y$, $\sigma_X$ is the standard deviation of $X$, and $\sigma_Y$ is the standard deviation of $Y$.

The correlation coefficient takes values in the range from $-1$ to $1$. Let $r_{ij} = |\rho(g_i, g_j)|$ be the absolute value of the correlation between $g_i$ and $g_j$.

Now, we can define the adjacency matrix $A = (a_{ij})_{1 \le i, j \le N}$, with zeros on its diagonal, to represent $G$, where $a_{ij} = 1$ if $(i, j)$ is an edge of $G$ and $a_{ij} = 0$ otherwise. More precisely, a value of 1 represents the existence of a relationship between $g_i$ (row i) and $g_j$ (column j), while a value of 0 means that no such relationship exists. Creating $A$ requires the absolute correlation matrix $R = (r_{ij})$, where $r_{ij} = |\rho(g_i, g_j)|$. Based on this matrix, we fill $A$: let $\rho_0$ be a fixed value in $[0, 1]$; we assume that if $r_{ij} \ge \rho_0$, then the dependence between $g_i$ and $g_j$ is high (i.e., the two vertices are adjacent). More exactly, the matrix $A$ is filled according to the rule below: for $i, j = 1, \dots, N$,

$$a_{ij} = \begin{cases} 1 & \text{if } r_{ij} \ge \rho_0 \text{ and } i \ne j, \\ 0 & \text{otherwise}, \end{cases} \quad (3)$$

where $\rho_0$ is the minimum correlation value for which we consider two genes to be related. The experimental study carried out with our method indicates that the chosen value of $\rho_0$ behaves well with high-dimensional data. For example, for a dataset composed of 7 genes $g_1, \dots, g_7$, Table 2 shows the corresponding absolute correlation matrix.

For the fixed threshold $\rho_0$, the adjacency matrix $A$ is given in Table 3.
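
To make the construction of the adjacency matrix concrete, the following Python sketch computes absolute Pearson correlations and thresholds them. The names `pearson` and `adjacency_matrix`, the value 0.8 used for the threshold in the usage lines, the toy gene vectors, the use of "greater than or equal" in the comparison, and the convention that a constant gene has zero correlation with everything are all assumptions of this illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumed convention for constant genes
    return cov / (sx * sy)


def adjacency_matrix(genes, rho0):
    """Threshold absolute correlations into a symmetric 0/1 matrix.

    genes: list of gene vectors (one list of expression levels per gene).
    rho0: correlation threshold in [0, 1].
    """
    n = len(genes)
    adj = [[0] * n for _ in range(n)]  # zeros on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(genes[i], genes[j])) >= rho0:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical toy data: the first three genes are perfectly (anti)correlated.
toy_genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, -1, 1, -1]]
for row in adjacency_matrix(toy_genes, 0.8):
    print(row)
```

Using the absolute value means that strongly anticorrelated genes are treated as redundant too, which matches the use of the absolute correlation matrix in the text.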

We define the weight of a vertex $i$ by using the Fisher score: $W(i) = F(g_i)$. A gene i with a high score in the DNA microarray dataset corresponds to a vertex with a high weight in $G$. This weight gives important information about the relevance of the gene to the data. Indeed, if two genes are connected by an edge in G, we prefer the gene with the higher weight. On the basis of the steps defined above, we are able to transform a given DNA microarray dataset into a vertex-weighted graph (Algorithm 1).

Algorithm 1: Construction of the gene-similarity graph.
Input: DNA microarray data, the threshold $\rho_0$.
Output: Gene-similarity graph $G = (V, E, W)$.
Begin
 Calculate the weight of each gene by using the Fisher score (1).
 Calculate the absolute correlation matrix $R$ by using (2).
 Fill the adjacency matrix $A$ associated with G, based on the rule (3).
 Create the gene-similarity graph $G = (V, E, W)$.
Return $G$.

Figure 1 shows the gene-similarity graph equivalent to the adjacency matrix (Table 3); each gene (vertex) is assigned a weight given by its Fisher score.

In the context of gene selection for cancer classification, the microarray datasets are characterized by a very large number of genes. Applying an evolutionary algorithm such as ACO directly, without a preprocessing step, is highly expensive. This is where filter methods become useful: they extract a subset of possibly informative genes, and then the evolutionary metaheuristic is applied to select the near-optimal subset of genes [19]. As examples, generalized Fisher score, ReliefF, and BPSO are combined in [6]; an information gain filter and a memetic algorithm in [41]; chi-square statistics and a GA in [26]; information gain and improved simplified swarm optimization in [42]; and ReliefF, mRMR (minimum redundancy maximum relevance), and a GA in [11]. Zhao et al. proposed a hybrid approach combining the Fisher score with a GA and PSO [40]. In order to overcome the disadvantages of filter methods, we propose an efficient approach based on graph theory techniques to select the first subset. This method takes into account possible interactions between genes.

2.3. Gene Selection Based on the Maximum Weight Independent Set

Let $G = (V, E, W)$ be a vertex-weighted undirected graph, where V is the set of its vertices, E is the set of edges, and W is the vertex weighting function. For each $v \in V$, we define the neighborhood of $v$ as $N(v) = \{u \in V : (u, v) \in E\}$. A subset $I \subseteq V$ is an independent set of G if no two vertices in I are adjacent (i.e., connected by an edge). The MWIS is the independent set with the maximum weight (the weight of a subset of vertices in V is defined as the sum of the weights of the vertices in this subset [43]).

We remark that filter methods based on gene ranking do not consider the correlation between the selected genes. This leads to the selection of subsets with a high level of redundancy, which penalizes classification performance; moreover, these methods eliminate the genes with a low individual score, ignoring the possibility that they can become highly relevant when combined with other genes [44]. This motivates us to propose a graph-based approach to overcome these problems. In the first stage of our method, we consider the gene selection problem as the search for the maximum weight independent set in the gene-similarity graph $G$. The choice of this subset is justified by two arguments. First, the term maximum weight translates, in the context of gene selection, into selecting a subset of genes with maximum relevance. Second, the notion of independence ensures the choice of a subset with minimum redundancy; i.e., in this subset, no two genes are highly correlated. In addition, this subset can contain genes with a low score. Therefore, the method proposed in this stage gives a good subset of genes to which an evolutionary algorithm such as ACO can be applied.

Finding the MWIS in a given graph is an NP-hard problem [45], and since the gene-similarity graph in our case is large (several thousand vertices and edges), it is impractical to find an exact solution in a reasonable time. For this reason, we propose a greedy algorithm (heuristic) to quickly obtain an approximate solution. The main lines of this algorithm are presented in Algorithm 2.

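Algorithm 2: Greedy algorithm for the MWIS.
Input: Gene-similarity graph $G = (V, E, W)$.
Output: An approximate maximum weight independent set I.
Begin
 $I = \emptyset$; $i = 1$; $V_1 = V$;
 while $V_i \neq \emptyset$ do
   Choose the best vertex $v_i$ in $V_i$ (i.e., the vertex with the highest weight).
   $I = I \cup \{v_i\}$;
   $V_{i+1} = V_i \setminus \{v_i\}$;
   $V_{i+1} = V_{i+1} \setminus N(v_i)$; % $N(v_i)$ is the neighborhood of $v_i$.
   i = i + 1;
 end while
 Return I.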

We illustrate the execution of our greedy algorithm (Algorithm 2) on the graph from Figure 1. In the first iteration, we select the gene with the highest weight and remove its neighborhood; in the next iteration, we choose the best gene in the remaining graph. In the last iteration, only one gene is left to choose. The resulting set I (Figure 2) is an approximate maximum weight independent set, and we can notice that our greedy algorithm gives the exact MWIS for this example.
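
The greedy procedure can be sketched in a few lines of Python. The function name `greedy_mwis`, the adjacency-matrix representation, the tie-breaking rule (lowest index first), and the five-vertex graph below are hypothetical choices for illustration; the graph is not the one from Figure 1.

```python
def greedy_mwis(weights, adj):
    """Greedy approximation of a maximum weight independent set.

    weights: list of vertex weights (e.g., Fisher scores).
    adj: symmetric 0/1 adjacency matrix with zeros on the diagonal.
    """
    remaining = set(range(len(weights)))
    independent = []
    while remaining:
        # Heaviest remaining vertex; ties go to the lowest index (assumption).
        best = max(remaining, key=lambda v: (weights[v], -v))
        independent.append(best)
        neighbours = {u for u in remaining if adj[best][u] == 1}
        remaining -= neighbours | {best}
    return independent


# Hypothetical graph with edges 0-1, 0-2, 2-3, 3-4.
weights = [5.0, 3.0, 4.0, 1.0, 2.0]
adj = [[0] * 5 for _ in range(5)]
for u, v in [(0, 1), (0, 2), (2, 3), (3, 4)]:
    adj[u][v] = adj[v][u] = 1
print(greedy_mwis(weights, adj))
```

On this hypothetical graph the heuristic returns vertices 0 and 4 (total weight 7), whereas the set {1, 2, 4} is also independent and weighs 9. This shows that the greedy result is only an approximation in general, even though it can coincide with the exact MWIS on small examples.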

2.4. Ant Colony Optimization for Gene Selection

ACO is one of the algorithms based on swarm intelligence. It was introduced in the early 1990s by Dorigo et al. [30, 31] as a method for solving optimization problems and was developed further in [46, 47]. Initially, ACO was designed to solve the traveling salesman problem, the first ACO algorithm being “Ant System” (AS) [48]. Subsequently, other applications were considered, such as quadratic assignment [49], sum coloring [50], vehicle routing [51], constraint satisfaction [52], and gene selection [15–17, 19, 20].

The ACO algorithm is inspired by the social behavior of ants. The artificial ants used in ACO cooperate with each other (by exchanging information via pheromones) to solve difficult problems; this is performed by building approximate solutions iteratively (step-by-step). A feasible solution can be regarded as a path between the nest and the food source of the ants. How this path is chosen is detailed in the next subsections.

2.4.1. ACO for Gene Selection

Denote the p genes as $g_1, \dots, g_p$. To adapt ACO to the gene selection problem, a novel ACO is proposed: the path of each ant from the nest to the food is coded as a binary string in which each bit of the pathway is attached to a gene. The selection of the pathway “1” means that gene $g_i$ has been chosen; a pathway “0” indicates that the gene is not selected in the final subset. For p = 10, the coding of our modified ACO is presented and explained in Figure 3.

The ants seek the best path, i.e., the one that maximizes the accuracy and minimizes the number of selected genes. Figure 4 describes the gene selection procedure used in our ACO. Each ant travels from the nest to the food source with the aim of finding the best path (best subset of genes). This path is built step-by-step; in each step i, the ant decides whether to add gene i to the candidate subset of genes, based on the pheromone and heuristic information assigned to this gene (Figure 4). The ant terminates its tour in p steps and outputs a subset of selected genes when it reaches the food source.

As indicated previously, the task of each ant is to construct a candidate subset of genes using heuristic information and pheromone; this is performed via a probabilistic decision rule. We compute the probability of selecting a pathway as follows:

$$P_{ij} = \frac{\tau_{ij}^{\alpha} \, \eta_{ij}^{\beta}}{\sum_{l \in \{0, 1\}} \tau_{il}^{\alpha} \, \eta_{il}^{\beta}}, \quad (4)$$

where i represents the gene, j takes the value 1 or 0 to denote whether the corresponding gene has been selected or not, $\tau_{ij}$ is the pheromone intensity that indicates the importance of the selection of the gene, and $\eta_{ij}$ is the heuristic reflecting the desirability of selecting this gene or not. α and β are two parameters controlling the relative importance of the pheromone intensity versus the visibility. With $\alpha = 0$, only the visibility (heuristic information) of the gene is taken into account, and the ants decide whether to select a given gene based only on $\eta_{ij}$. Since the experience of previous searches is lost, there is no cooperation between ants in this case. On the contrary, with $\beta = 0$, only the pheromone trails play a role. To avoid too rapid a convergence of the ACO algorithm, a compromise between these two parameters is necessary to balance diversification and intensification of the search.
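
The decision rule can be illustrated with a short Python sketch in which an ant walks through the genes and flips a biased coin at each step. The function names, the data layout (one pair of values per gene, indexed by pathway 0 or 1), and the heuristic values passed in are hypothetical; in particular, this sketch takes the heuristic values as fixed inputs, whereas in the method they also depend on the number of genes already selected by the ant.

```python
import random


def select_probability(tau_i, eta_i, alpha, beta):
    """Probability of taking pathway 1 (select the gene) at one step.

    tau_i, eta_i: pairs (value for pathway 0, value for pathway 1).
    """
    w0 = (tau_i[0] ** alpha) * (eta_i[0] ** beta)
    w1 = (tau_i[1] ** alpha) * (eta_i[1] ** beta)
    return w1 / (w0 + w1)


def build_path(tau, eta, alpha, beta, rng):
    """Visit the p genes once and return the binary path of the ant."""
    path = []
    for tau_i, eta_i in zip(tau, eta):
        p_select = select_probability(tau_i, eta_i, alpha, beta)
        path.append(1 if rng.random() < p_select else 0)
    return path


rng = random.Random(42)
tau = [(1.0, 1.0)] * 5                        # uniform initial pheromone
eta = [(1.0, 3.0), (1.0, 0.5), (1.0, 2.0), (1.0, 1.0), (1.0, 4.0)]
print(build_path(tau, eta, alpha=1.0, beta=1.0, rng=rng))
```

Setting `alpha=0` makes the pheromone factors equal to 1, so the choice depends only on the heuristic; setting `beta=0` does the same for the heuristic, which mirrors the two limiting cases discussed in the text.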

2.4.2. The Heuristic

The choice of a good heuristic, which will be used in combination with the pheromone information to build solutions, is an important task in the ACO implementation [53]. In our ACO, this heuristic is used to indicate the quality of a gene based on a scoring algorithm.

For a given ant, the heuristic information $\eta_{i1}$ is the desirability of adding gene i to the subset of selected genes. We define this quantity based on the Fisher score (1), which measures the quality of this gene, and on the number of genes selected by the ant before arriving at gene i.

For the value of $\eta_{i0}$, we combine the mean of the Fisher scores of all genes with the number of genes already selected by the ant. This means that the ants tend to choose small subsets of genes that have high relevance.

2.4.3. Updating the Pheromone Trail

The goal of the pheromone update is to increase the pheromone values associated with good solutions while reducing those associated with bad ones.

The updates of pheromones are made in two stages, a local update and a global update.

Once ant k has finished building its path, the pheromone on all of its pathways is updated. The local update formula involves $\rho_l$, the local pheromone evaporation coefficient, which represents the evaporation of the trail, and $\Delta\tau^{k}$, the amount of pheromone deposited by ant k. In our ACO, $\Delta\tau^{k}$ is computed from the candidate solution S created by the ant, the r-fold cross-validation classification accuracy of the 1NN (nearest neighbor) classifier based on S, the number of selected genes in S, and a parameter λ that indicates the importance of the number of selected genes in S.

At each iteration T, after all ants finish their traverses, a global update of pheromone quantities is made for all pathways chosen by the best ant (the best candidate solution) during the iteration T.

The global update involves $\rho_g$, the global pheromone evaporation coefficient, and the amount of pheromone deposited by the best ant during iteration T, which is given by Chiang et al. [15].

To avoid stagnation of the search, the range of possible pheromone trails is limited to an interval $[\tau_{\min}, \tau_{\max}]$.
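
The following Python sketch shows one common way to combine evaporation, deposit, and clamping of pheromone values. It is only an assumed form: the update rule `(1 - rho) * tau + deposit`, the choice to update only the pathways visited by the given path, and the names `update_pheromone`, `rho`, `deposit`, `tau_min`, and `tau_max` are illustrative and are not the exact formulas of the method, which are not reproduced here.

```python
def update_pheromone(tau, path, rho, deposit, tau_min, tau_max):
    """Evaporate and reinforce the pathways used by `path`, in place.

    tau: list of [tau_i0, tau_i1] pairs, one per gene.
    path: list of 0/1 choices made by an ant (or by the best ant).
    rho: evaporation coefficient in [0, 1].
    deposit: amount of pheromone laid on each visited pathway.
    """
    for i, choice in enumerate(path):
        value = (1.0 - rho) * tau[i][choice] + deposit  # assumed rule
        # Keep the trail inside [tau_min, tau_max] to avoid stagnation.
        tau[i][choice] = min(tau_max, max(tau_min, value))
    return tau


tau = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
update_pheromone(tau, [1, 0, 1], rho=0.1, deposit=0.5, tau_min=0.1, tau_max=2.0)
print(tau)
```

The same helper can serve for both a local and a global update by passing a different evaporation coefficient and deposit; the clamping step is what bounds the trails to the interval mentioned above.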

2.5. Fitness Function

In order to guide our novel ACO towards a high-quality subset of genes, we need to define a “fitness function” f. The quality of a candidate subset can be measured by combining the number of genes in this subset (its size) and the classification accuracy obtained with a specific classifier; in gene selection, the aim is to maximize the accuracy and minimize the number of genes used.

The classification accuracy is estimated with a given classifier using cross-validation. In this study, we use the K-nearest neighbor classifier (KNN).

2.5.1. K-Nearest Neighbor (KNN)

The KNN method is a supervised learning algorithm and was introduced by Fix and Hodges in 1951 [54]. It is based on the notion of proximity (neighbor) between samples for making a decision (classification) [55].

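In order to determine the class of a new example, we calculate the distance between the new example and all training data, and the classification is given by a majority vote of its K nearest neighbors. The neighbors are determined by the Euclidean distance, which is defined as follows:

$$d(X, T) = \sqrt{\sum_{i=1}^{p} (x_i - t_i)^2},$$

where $x_i$ and $t_i$ are the expression levels of gene i in the samples X and T, and p is the number of genes considered.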

In our proposed method, we use the 1NN classifier, which is a particular case of KNN (with $K = 1$). Let X be a new sample to classify and T a sample from the training data; then the class of X is determined as follows:

$$\mathrm{class}(X) = \mathrm{class}(T^{*}), \quad T^{*} = \arg\min_{T} d(X, T).$$

Note that the genes in gene expression data have different scales, and the KNN classifier is influenced by the distances measured between samples. Therefore, we modify our 1NN by normalizing the training data to bring them to a common scale. This transformation is based on the mean and the standard deviation of each gene computed on the training data, and the same values are used to scale the test data.
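
A minimal Python sketch of a 1NN prediction with this normalization is given below. The helper names `fit_scaler`, `scale`, and `predict_1nn`, the use of the population standard deviation, and the replacement of a zero standard deviation by 1 are assumptions of this illustration.

```python
from math import sqrt
from statistics import fmean, pstdev


def fit_scaler(train):
    """Per-gene mean and standard deviation, estimated on training data only."""
    columns = list(zip(*train))
    means = [fmean(col) for col in columns]
    # A constant gene would give a zero deviation; use 1.0 instead (assumption).
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(sample, means, stds):
    """Apply the training-set statistics to one sample."""
    return [(x - m) / s for x, m, s in zip(sample, means, stds)]


def predict_1nn(train, train_labels, sample):
    """Label of the training sample closest to `sample` (Euclidean distance)."""
    means, stds = fit_scaler(train)
    scaled_train = [scale(t, means, stds) for t in train]
    z = scale(sample, means, stds)
    distances = [sqrt(sum((a - b) ** 2 for a, b in zip(t, z)))
                 for t in scaled_train]
    return train_labels[distances.index(min(distances))]


train = [[1.0, 100.0], [2.0, 110.0], [8.0, 105.0], [9.0, 95.0]]
labels = ["A", "A", "B", "B"]
print(predict_1nn(train, labels, [7.5, 108.0]))
```

The test sample is scaled with the statistics of the training samples, never with its own, so no information from the held-out sample leaks into the classifier.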

2.5.2. Objective Function

The fitness value of a candidate solution S in our ACO is calculated as follows:

$$f(S) = \omega \cdot Acc(S) + (1 - \omega) \cdot \frac{p - |S|}{p},$$

where $\omega$ is a weight coefficient in $[0, 1]$ that controls the aggregation of both objectives (maximizing the predictive accuracy and minimizing the number of genes), $|S|$ is the number of selected genes in S, and p is the total number of genes.
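
As a worked illustration of how such a weighted fitness trades accuracy against subset size, here is a small Python sketch. The exact functional form used below (a convex combination of the accuracy and the fraction of discarded genes) is an assumption consistent with the description of the weight coefficient, the subset size, and the total number of genes; the function name and the numeric values are hypothetical.

```python
def fitness(accuracy, n_selected, n_total, omega):
    """Weighted sum of accuracy and relative subset reduction (assumed form).

    accuracy: cross-validation accuracy in [0, 1].
    n_selected: number of genes in the candidate subset.
    n_total: total number of genes available to the ACO stage.
    omega: weight in [0, 1] given to the accuracy term.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    reduction = (n_total - n_selected) / n_total
    return omega * accuracy + (1.0 - omega) * reduction


# Two subsets with the same accuracy: the smaller one gets the higher fitness.
print(fitness(1.0, 5, 100, omega=0.5))
print(fitness(1.0, 50, 100, omega=0.5))
```

With a weight of 0.5, these two calls give 0.975 and 0.75, so between equally accurate subsets the one with fewer genes is preferred; a weight close to 1 makes the size term almost irrelevant.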

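Note that “$Acc(S)$” is the average cross-validation classification accuracy calculated by the KNN classifier using leave-one-out cross-validation (LOOCV) [56], in which we divide our dataset into M nonoverlapping subsets (M tissue samples). At each iteration, we train our KNN classifier on $M - 1$ samples based on the selected genes, and we test it on the remaining sample. The “$Acc(S)$” associated with LOOCV is calculated according to the rule below:

$$Acc(S) = \frac{1}{M} \sum_{i=1}^{M} \delta(\hat{y}_i, y_i),$$

where $\hat{y}_i$ is the class predicted for the $i$th sample when it is left out, $y_i$ is its true class, and $\delta(a, b)$ equals 1 if $a = b$ and 0 otherwise.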
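
The LOOCV procedure can be sketched in Python as follows. The function names, the toy data, and the plain unscaled nearest-neighbor helper are hypothetical and kept deliberately small; any classifier with the same call signature could be plugged in.

```python
def loocv_accuracy(samples, labels, selected, classify):
    """Leave-one-out accuracy restricted to the genes in `selected`.

    samples: list of samples, each a list of expression levels.
    labels: list of class labels, one per sample.
    selected: indices of the genes kept in the candidate subset.
    classify(train, train_labels, sample) -> predicted label.
    """
    reduced = [[row[g] for g in selected] for row in samples]
    correct = 0
    for i, (sample, label) in enumerate(zip(reduced, labels)):
        train = reduced[:i] + reduced[i + 1:]          # the M - 1 other samples
        train_labels = labels[:i] + labels[i + 1:]
        if classify(train, train_labels, sample) == label:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train, train_labels, sample):
    """Plain 1NN with squared Euclidean distance (no scaling, for brevity)."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, sample)) for t in train]
    return train_labels[dists.index(min(dists))]


# Hypothetical data: gene 0 separates the classes, gene 1 does not.
samples = [[1.0, 5.0], [1.2, 9.0], [3.0, 5.1], [3.1, 9.2]]
labels = [0, 0, 1, 1]
print(loocv_accuracy(samples, labels, [0], nearest_neighbour))
print(loocv_accuracy(samples, labels, [1], nearest_neighbour))
```

On this toy dataset the subset containing only gene 0 classifies every left-out sample correctly, while the subset containing only gene 1 misclassifies all of them, which is the kind of contrast the wrapper stage exploits.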

2.6. Local Search

The local search algorithm is used to improve the solutions given by ants and provide good solutions within a reasonable time. With this aim, we are inspired by the framework proposed in [57], in which a local search based on the filter ranking method is used to solve the feature selection problem.

Given a candidate solution S generated by an ant, we define X and Y as the subsets of selected and eliminated genes, respectively; both X and Y are ranked using the Fisher score. We further define two basic operators of the local search algorithm:
(i) Add: select a gene from Y based on its ranking and add it to S
(ii) Del: select a gene from X based on its ranking and remove it from S

In our proposed method, the Add operator selects the gene to move from Y to S using the roulette wheel developed by Holland [58]. Let $Y = \{y_1, \dots, y_q\}$ and let $F(y_1), \dots, F(y_q)$ be the Fisher score values of its genes. Then the selection probability for gene $y_i$ is defined as follows:

$$P(y_i) = \frac{F(y_i)}{\sum_{j=1}^{q} F(y_j)}.$$

Similarly, for the Del operator, a gene of X is selected for removal from S by a roulette wheel whose probabilities are computed from the Fisher score values of the genes in X, so that genes with low scores are the most likely to be removed.

Based on the probabilities defined above, we can remark that the Add operator prefers genes with a high score to add to S, whereas the Del operator prefers genes with a low score to remove from S.
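
A roulette wheel for the two operators can be sketched in Python as below. The Add weights are the Fisher scores themselves; for Del, the weighting `max + min - score` is only an assumed way of reversing the preference so that low scores are favored, and all names and values are hypothetical.

```python
import random


def roulette_pick(candidates, weights, rng):
    """Pick one candidate with probability proportional to its weight."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for candidate, weight in zip(candidates, weights):
        cumulative += weight
        if threshold < cumulative:
            return candidate
    return candidates[-1]  # safety net for rounding or all-zero weights


def pick_for_add(eliminated, scores, rng):
    """Add operator: high-scoring eliminated genes are preferred."""
    return roulette_pick(eliminated, [scores[g] for g in eliminated], rng)


def pick_for_del(selected, scores, rng):
    """Del operator: low-scoring selected genes are preferred."""
    top = max(scores[g] for g in selected)
    low = min(scores[g] for g in selected)
    # Assumed reversal: the weakest gene receives the largest weight.
    weights = [top + low - scores[g] for g in selected]
    return roulette_pick(selected, weights, rng)


rng = random.Random(1)
scores = {0: 9.0, 1: 1.0}            # hypothetical Fisher scores
adds = sum(pick_for_add([0, 1], scores, rng) == 0 for _ in range(2000))
dels = sum(pick_for_del([0, 1], scores, rng) == 1 for _ in range(2000))
print(adds, dels)
```

With these scores, gene 0 should be proposed for addition in roughly 90% of the draws and gene 1 proposed for deletion in roughly 90% of the draws, while the weaker alternative still keeps a small chance, which preserves some diversity in the search.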

Our local search algorithm (Algorithm 3) is characterized by the maximal numbers of Add and Del operations and by the maximal number of consecutive iterations without improvement in the best solution. In addition, this local search algorithm is general: for example, if we fix the maximal number of Add operations at 0, the local search becomes a backward generation, in which we try to remove irrelevant genes at each iteration.
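
The overall loop of such a local search can be sketched in Python as follows. This is a simplified illustration under several assumptions: the number of Add and Del operations per iteration is drawn uniformly between 0 and its maximum, the no-improvement counter is reset after each improvement, the subset is never emptied, and the gene-picking functions are passed in as parameters. All names are hypothetical.

```python
import random


def local_search(solution, all_genes, evaluate, pick_add, pick_del,
                 max_add, max_del, max_stall, rng):
    """Return a subset whose fitness is at least that of `solution`.

    solution: set of selected genes; evaluate(set) -> fitness value.
    pick_add(eliminated, rng) and pick_del(selected, rng) return one gene.
    max_stall: allowed consecutive iterations without improvement.
    """
    best, best_fit = set(solution), evaluate(solution)
    stall = 0
    while stall < max_stall:
        candidate = set(best)
        for _ in range(rng.randint(0, max_add)):       # Add operations
            eliminated = sorted(set(all_genes) - candidate)
            if eliminated:
                candidate.add(pick_add(eliminated, rng))
        for _ in range(rng.randint(0, max_del)):       # Del operations
            if len(candidate) > 1:                     # never empty the subset
                candidate.remove(pick_del(sorted(candidate), rng))
        fit = evaluate(candidate)
        if fit > best_fit:
            best, best_fit, stall = candidate, fit, 0  # accept and reset
        else:
            stall += 1
    return best


rng = random.Random(0)
toy_fitness = lambda s: len(s & {0, 1}) - 0.1 * len(s)  # genes 0 and 1 are useful
uniform = lambda pool, r: r.choice(pool)                # stand-in for the roulette wheel
print(local_search({2, 3, 4}, range(6), toy_fitness, uniform, uniform, 2, 2, 20, rng))
```

Calling the same function with `max_add=0` only ever removes genes, which corresponds to the backward generation mentioned above.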

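Algorithm 3: Local search.
Input: DNA microarray data; S, a candidate solution given by an ant;
 the maximal number of Add operations;
 the maximal number of Del operations;
 the maximal number of consecutive iterations without improvement.
Output: A candidate solution at least as good as S.
Begin
 k = 0;
 while k is below the maximal number of iterations without improvement do
   Determine the subsets X and Y.
   Fix the numbers of Add and Del operations for this iteration, within their maxima, using the floor function.
   Apply the Add operation to S the fixed number of times.
   Apply the Del operation to S the fixed number of times.
   Create the new candidate solution $S'$.
   if $f(S') > f(S)$ then
     $S = S'$;
     reset the counter k;
   end if
   k = k + 1
 end while
 Return S.

2.7. Proposed Method for Gene Selection (MWIS-ACO-LS)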

Our hybrid method for solving the gene selection problem is based on combining filter and wrapper approaches. This is carried out taking advantage of the low computing time in filters (MWIS) and the high quality of the subsets provided by the wrapper methods (ACO and LS). The overall process of MWIS-ACO-LS can be seen in Figure 5.

The process begins by transforming the initial dataset into a vertex-weighted graph (Algorithm 1), in which we search for the MWIS. Since this is a well-known NP-hard problem, we have proposed a greedy algorithm (Algorithm 2) to find a near-optimal set of vertices (representing genes in our problem). The subset of genes selected in this first stage is taken as input to the second stage of selection, which uses an evolutionary algorithm (ACO) combined with a local search algorithm to select the minimum number of genes that gives the maximum classification accuracy for the classifier. In this stage, artificial ants cooperate to build a high-quality subset of genes based on the transition rules already presented in Section 2. Also, a local search (Algorithm 3) is proposed to help the ants achieve good results in a reasonable time. The pseudocode of our proposed method is presented in Algorithm 4.

2.8. Complexity Analysis of MWIS-ACO-LS

Suppose that N is the number of original genes and M is the number of samples. Our method is divided into three principal stages:

Stage 1: In the first step (Algorithm 1), the weights of the genes are evaluated using the Fisher score, so the time complexity is $O(NM)$. Moreover, the absolute correlation values between each pair of genes are computed, so the time complexity is $O(N^2 M)$. Finally, filling the adjacency matrix (implicitly, constructing the gene-similarity graph) takes $O(N^2)$. In the second step of this stage (Algorithm 2), the weight of each vertex is already defined, and we can conservatively assume that each iteration removes only the selected vertex itself, which gives a time complexity of $O(N^2)$. Therefore, the overall time complexity of this stage (MWIS) is $O(N^2 M)$.

Stage 2: First, recall that p represents the number of genes selected in the first stage; generally, p is smaller than N. In this stage, the fitness of each candidate subset of genes is calculated using LOOCV (leave-one-out cross-validation) with 1NN as the classifier (equation (13)). Let us now analyze the complexity of the fitness calculation using LOOCV: computing the distance between the single sample of the testing set and each training sample requires $O(pM)$, and this process is repeated M times, so the fitness calculation needs $O(pM^2)$. In each iteration of our ACO, each of the m ants travels from the nest to the food source, passing by all p genes, so the construction of a path by an ant takes $O(p)$, and each path is evaluated by LOOCV. This process is repeated by the m ants at every iteration; in addition, updating the pheromone values takes $O(p)$. Therefore, the overall computational complexity of ACO without local search is $O(T_{\max} \, m \, p \, M^2)$, where $T_{\max}$ denotes the number of ACO iterations. Concerning the local search algorithm used in the second stage (line 20 of Algorithm 4), each of its iterations evaluates one new candidate solution by LOOCV, which costs $O(pM^2)$. Generally, the number of local search iterations does not exceed the number of ants, so the total time complexity of the second stage is $O(T_{\max} \, m \, p \, M^2)$.

Stage 3: The last stage (line 24 of Algorithm 4) is the backward generation applied to the global best solution.

Algorithm 4: MWIS-ACO-LS.
Input: DNA microarray data; the parameters of the method.
Output: The global best candidate solution $S_{gbest}$.
Begin
Stage 1: The selection of the first subset of genes
3:Step 1: Use Algorithm 1 to construct the gene-similarity graph.
Step 2: Apply the greedy algorithm (Algorithm 2) to select an initial subset of genes.
Stage 2: The application of ACO to the subset of genes selected in the first stage
6:Step 1: ACO combined with the local search
 Initialize the pheromone matrix with ones.
for $T = 1$ to $T_{\max}$ do
9:for $i = 1$ to $m$ do
   Build the path (candidate solution S) of ant i based on the probabilistic decision rule defined by (4), (5), and (6).
   Calculate the fitness of the candidate solution S using LOOCV in (11).
12:  if i == 1 then
    $S_{best}^{T} = S$
  end if
15:  if $f(S) > f(S_{best}^{T})$ then
    $S_{best}^{T} = S$
  end if
18:  Do a local update of pheromones based on S.
end for
 Apply the local search (Algorithm 3) to $S_{best}^{T}$.
21: Do a global update of pheromones based on $S_{best}^{T}$.
end for
Find the global best solution $S_{gbest}$.
24:Step 2: Apply a backward generation to $S_{gbest}$.
Return $S_{gbest}$.

Consequently, the total time complexity of the proposed method MWIS-ACO-LS is the sum of the complexities of the three stages above.

3. Experimental Studies

This section presents the performance of our proposed approach (MWIS-ACO-LS) on ten well-known gene expression classification datasets, and we compare our results with those of the state-of-the-art. Furthermore, the characteristics of the used datasets, the parameter settings, and the numerical results will be described in the following sections.

The implementation of the proposed approach (MWIS-ACO-LS) is performed using Matlab R2017a.

As far as the KNN classifier is concerned, we have chosen a predefined function in Matlab. Similarly, for the SVM classifier [59, 60] used in the comparison, a predefined binary linear classifier was chosen. In addition, we have developed a multiclass SVM classifier based on the one-against-all strategy.

Concerning the logistic regression (LR) classifier, we have regularized the cost function with two penalties: the first is the lasso and the second is the elastic net regularization. The cost functions used in LR-Lasso and LR-Elasticnet are minimized by the stochastic gradient descent (SGD) algorithm implemented in the Scikit-learn package [61]. Experimental initial parameters are given in Table 4.

Additionally, in this study, we use leave-one-out-cross-validation (LOOCV) to measure the quality of the candidate subsets of genes and for comparing our results with the other works.

3.1. Environment

To evaluate our approach, we have chosen ten DNA microarray datasets concerning cancer recognition [62], which are publicly available and easily accessible. In addition, these datasets are used in several supervised classification works, particularly in the papers used in the “Comparison with state-of-the-art algorithms” section.

All datasets used are described in Table 5. These datasets differ in several characteristics (number of genes, number of samples, and binary or multiclass labels). The number of samples in some datasets is small (Brain_Tumor2, 9_Tumors, etc.), while others have a higher number (Lung_Cancer, 11_Tumors, etc.). Also, some of them have binary classes (Prostate_Tumor, DLBCL), while others are multiclass (Leukemia1, Lung_Cancer, etc.). Since our proposed method is designed for high-dimensional microarrays, all these datasets are characterized by thousands of genes, ranging from 2308 to 12600.

3.2. Parameters

All experiments were run on an Acer Aspire 7750g laptop with an Intel Core i5 2.30 GHz processor and 8 GB of RAM, running Windows 7 (64 bit).

Several tests were carried out in order to obtain an appropriate parameterization. A set of initial values for the parameters was fixed, and then the value of one parameter was changed over different runs until the solutions could no longer be improved. This adjustment process was repeated for each parameter in turn. The process was carried out on one cancer classification dataset. Table 5 presents the parameters of the proposed approach.

3.3. Results and Comparisons

Firstly, in order to limit the search space and accelerate the convergence of our proposed approach, a first subset of genes was selected using a graph-theory algorithm for gene selection (MWIS), and then a modified ACO-1NN coupled with a local search algorithm was applied to find a better subset of genes. The quality of a candidate subset is measured by the performance of the KNN classifier obtained using LOOCV and by the size of this subset.

The objectives of the experiments carried out on the ten DNA microarray datasets are as follows: to test the effect of gene selection on the classification accuracy, and to validate the proposed method and verify its effectiveness.

Given the nondeterministic nature of our approach and the SGDlogistic classifier, ten independent runs were performed for each dataset to obtain a more reliable result.

Table 6 shows the results obtained using the new graph-based approach (MWIS) for gene selection; then using the MWIS-ACO method, where we apply the ACO algorithm to the subset selected by MWIS; and finally using our improved method MWIS-ACO-LS, where the ACO is coupled with the local search (LS) method. The classification accuracy of MWIS, MWIS-ACO, and MWIS-ACO-LS is calculated using the 1NN classifier. These methods are also compared with the SVM, 1NN, and penalized SGDlogistic classifiers without selection to demonstrate the usefulness of our selection approach. We analyze our results in three ways:
(i) The classification accuracy
(ii) The number of genes used in the classification
(iii) The execution time

4. Discussion

First of all, we start with the execution time analysis of our proposed methods. The execution time is consistent with the complexity analysis: in the filter-based approach MWIS, the execution time is low, but the accuracy is not good since the selection is independent of the classifier. In MWIS-ACO and MWIS-ACO-LS, by contrast, the execution time is high because of the wrapper nature of the method and the use of the classifier at each evaluation, but the classification accuracy is high. Turning now to the analysis of the different stages of our proposed method MWIS-ACO-LS (from Figures 6 and 7 and Table 6), we can see that the role of the ACO is to improve the classification accuracy and reduce the number of genes used. In addition, the local search plays an essential role in refining the candidate solutions provided by the ants, by reducing the number of genes while retaining the classification accuracy achieved by the ants.

The effectiveness of the proposed approach (MWIS-ACO-LS) is reflected in the marked improvement in classification accuracy and the reduction in the number of genes used in the classification (shown in bold in Table 6) across all datasets (Figure 8).

The “MWIS,” “MWIS-ACO,” and “MWIS-ACO-LS” methods select a reduced subset of informative genes compared to the original subset of genes in the datasets.

From Table 6 and Figure 8, it can be observed that MWIS surpasses the results obtained by the 1NN classifier for the “9_Tumors,” “Lung_Cancer,” “SRBCT,” and “DLBCL” datasets. This is notable because the role of the MWIS algorithm is only to find a candidate subset of genes to which our modified ACO is applied: that subset can contain weak genes, and the selection process in this method is independent of the classifier used.

Based on the experiments and the application of our approach to ten cancer recognition datasets, we observe that the proposed method (MWIS-ACO-LS) outperforms all four algorithms in terms of classification accuracy and the number of genes used in the classification. The improvement in performance is most significant for the 9_Tumors dataset, where the classification becomes perfect using just 40 genes. So, we can conclude (from Table 6 and Figure 8) that MWIS-ACO-LS can successfully select a small subset of genes that yields a high classification accuracy. For all datasets, the MWIS-ACO-LS approach reached a classification accuracy of at least 99.42% using only a small subset of the original genes. In addition, MWIS-ACO-LS gave a perfect accuracy of 100% for the majority of datasets (9_Tumors, Brain_Tumor1, Brain_Tumor2, Leukemia1, Leukemia2, SRBCT, Prostate_Tumor, and DLBCL), using just 5 genes for Leukemia1 and 6 genes for the SRBCT and DLBCL datasets.

However, with regard to the MWIS approach based on graph theory principles, we note that the subset of genes selected by this method gives an acceptable classification accuracy relative to the number of genes used. This is due to the procedure used for the construction of this subset, in which genes with a low score are given the opportunity to be present. As detailed in Section 2, our two-stage method MWIS-ACO-LS starts in the first stage by using MWIS to select a small initial subset of genes that contains most of the information, and then calls our modified ACO combined with a local search algorithm. In this second stage, our algorithm tries to find the smallest subset of genes that gives the highest classification accuracy. Table 6 shows that the second stage plays a crucial role in increasing the classification accuracy for all datasets, especially Brain_Tumor2, 9_Tumors, 11_Tumors, and Leukemia1, where the results are significantly different (a large improvement in the classification accuracy).

In Figures 9–12, the abscissa expresses the number of generations in the second stage of MWIS-ACO-LS, and the ordinate expresses the classification accuracy of the best candidate solution during each iteration. This is shown for the average of all solutions and for the best solution found for the “9_Tumors, Brain_Tumor1, Brain_Tumor2, and Leukemia1” datasets. These figures clearly show that our modified ACO and the local search algorithm play a crucial role in improving the classification accuracy. As these figures show, the difference between the best solution and the average solution is not large, which suggests that MWIS-ACO-LS converges quickly toward its best solution.

In Figures 13–16, we show the evolution of the number of genes selected (ordinate) relative to the number of generations (abscissa) for the “9_Tumors, Brain_Tumor1, Brain_Tumor2, and Leukemia1” datasets. These figures illustrate the role of our wrapper algorithm based on ACO in reducing the number of genes. Moreover, the second stage of our proposed approach, based on the modified ACO and the local search algorithm, plays a key role in increasing the classification accuracy and minimizing the number of genes used. Indeed, during each iteration the ACO aims to identify the near-optimal subset of candidate genes, called the best ant, that maximizes the objective function. Once this subset is found, our local search algorithm is called to improve the accuracy or to reduce the number of genes used while retaining the classification accuracy found previously. After 100 generations of the ACO algorithm, we apply a backward local search algorithm to reduce the number of genes used in the last best solution found (Figures 13–16). Thereafter, a statistical analysis was performed using the Kruskal–Wallis test to evaluate our results and to test the significance of the difference in the results (accuracy) obtained by our approach.

The Kruskal–Wallis statistical test presented in Figures 17 and 18 compares the results obtained by MWIS, MWIS-ACO, MWIS-ACO-LS, and the 1NN classifier. According to these figures, the performance of the MWIS-ACO and MWIS-ACO-LS approaches exceeds that of the MWIS method and the 1NN classifier. In terms of the statistical significance of the results (classification accuracy), the test indicates that the difference in classification accuracy between (“1NN,” “MWIS”) and “MWIS-ACO-LS” is statistically significant.
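To make the test concrete, the following Python sketch computes the Kruskal–Wallis H statistic with the usual tie correction. It is an illustrative sketch only: the function name and the toy groups are hypothetical and do not reproduce the paper's data, and the p-value step is replaced by a comparison against the chi-square critical value for two degrees of freedom at the 5% level (about 5.991).

```python
from typing import Dict, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of: Dict[float, float] = {}
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the mean of the 1-based ranks i+1 .. j.
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    # If every value is identical there is no evidence of a difference.
    return h / correction if correction > 0 else 0.0


# Toy groups (assumption): three methods with clearly separated scores.
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat = kruskal_wallis_h(toy_groups)
print(h_stat, h_stat > 5.991)  # H is approximately 7.2 for this toy input
```

The test operates on ranks rather than raw values, so it makes no normality assumption about the accuracies being compared.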

Table 7 lists the best subset of genes selected by our proposed approach (MWIS-ACO-LS) for the datasets on which MWIS-ACO-LS gives the best performance compared to the other works (9_Tumors, Brain_Tumor1, Brain_Tumor2, and Prostate_Tumor). These genes are potential biomarkers for cancer identification.

Based on the experiments we carried out, we can conclude that our gene selection approach (MWIS-ACO-LS) is well-founded. Indeed, on the ten datasets used, our method achieved a high classification accuracy. More exactly, the proposed method yielded a classification accuracy equal to or greater than 99.42% for all datasets, with a perfect classification (100%) for 9_Tumors, Brain_Tumor1, Brain_Tumor2, Leukemia1, Leukemia2, SRBCT, Prostate_Tumor, and DLBCL using at most 40 genes. The high classification accuracy obtained by our methodology is due to two elements: the first is the combination of a graph theory method (MWIS) with the ACO metaheuristic, and the second is the use of a modified 1NN classifier in which we normalize the training data in order to bring it to a common scale.
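The normalization step mentioned above can be sketched as follows. The paper does not say which normalization is used, so min-max scaling is an assumption here, and the function names and toy values are hypothetical. The scaling parameters are fitted on the training samples only and then applied to any query sample.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-gene minimum and maximum computed on the training samples only."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row: Sequence[float],
                  lows: Sequence[float],
                  highs: Sequence[float]) -> List[float]:
    """Scale one sample using the training ranges; constant genes map to 0."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)]


# Toy data (assumption): the second gene has a much larger range than the first.
toy_train = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
lows, highs = fit_min_max(toy_train)
print(apply_min_max([2.0, 200.0], lows, highs))
```

Without such scaling, a gene with a large numeric range would dominate the Euclidean distance used by 1NN. A query sample outside the training range maps outside [0, 1], which is harmless for distance computations.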

In the following, we compare our proposed method with some recent optimization algorithms on several classification datasets.

4.1. Comparison with State-of-the-Art Algorithms

In this section, we compare our method with eight algorithms recently reported in the literature [6, 21–25, 28]. To make this comparison meaningful, the experiments are performed under the same conditions for each algorithm. Specifically, our approach is executed ten times on each dataset, and we then report the average and the best subset of genes found. Note that the papers [22, 23] report only the best results found.

Table 8 summarizes the classification accuracy and the number of selected genes (taken from the original papers) for the different approaches. The (−) symbol indicates that the result is not reported in the related work. The results obtained by our approach are very encouraging compared to previous work. Indeed, for most of the datasets examined, the classification accuracies obtained by the proposed gene selection method matched or outperformed those obtained using other methods [6, 21–25, 28].

First, for the 9_Tumors dataset, we achieve a perfect classification accuracy with only 40 genes. The best performance for this dataset is attained by our approach (MWIS-ACO-LS), which exceeds the best-known accuracy [6]. We note that FBPSO-SVM [6] reports using 71 genes to reach a good accuracy.

Similarly, for the datasets (Brain_Tumor1, Brain_Tumor2, and Prostate_Tumor), we get the best performance. In addition, we obtain a perfect classification (100%) for Brain_Tumor1, Brain_Tumor2, and Prostate_Tumor with fewer than 21 genes.

Table 9 reports the rank of the proposed method compared to other existing methods according to the average accuracy. The results in the table show that the proposed method achieved the best average accuracy on most datasets, which indicates that it is well suited to gene selection. As shown in Tables 8 and 9, we match or exceed the performance of all comparison methods, except for the Brain_Tumor2 and Prostate_Tumor datasets, on which our approach ranks second after the FBPSO-SVM approach.

The results of this comparative analysis with previous methods for the gene selection in the context of cancer classification have enabled us to conclude that our nature-inspired optimization method is useful in the gene selection problem.

5. Conclusion

In this work, we have presented a hybrid approach (MWIS-ACO-LS) for gene selection in DNA microarray data. The proposed approach consists of two stages. The first is a preselection phase that uses a new graph-theoretic approach to select a small initial subset of genes: we model the gene selection problem as an MWIS problem and present a greedy algorithm to approximate the MWIS of genes. The second is a search phase that determines a near-optimal subset of genes for cancer classification; it is based on a modified ACO and an LS algorithm.

This approach aims to select a small subset of relevant genes from an original dataset which contains redundant, noisy, or irrelevant data.

The experimental results show that our approach compares very favorably with the reference methods in terms of the classification accuracy and the number of selected genes. Although the results obtained are interesting and encouraging, several points remain to be studied in future work, such as:
(i) Modifying the MWIS method in order to improve the quality of the first subset of genes
(ii) Combining the MWIS filter with other metaheuristics such as VNS

This field of research will remain active as long as it is motivated by advances in data collection and storage systems on the one hand, and by the requirements of oncology on the other. The best way to judge this selection of genes is to collaborate with experts (biologists) for a sound interpretation of the results.

Data Availability

The datasets (DNA microarray) used in our paper are publicly available and easily accessible [62] (http://web.archive.org/web/20180625075744/http://www.gems-system.org/) (visited on 2018).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.