Research Article

Two-Stage Approach for Protein Superfamily Classification

Algorithm 1

PCA-NSGA-II.
Let population be denoted as
Probability of crossover be denoted as
Probability of mutation be denoted as
Fitness function be denoted as
Pareto fronts be denoted as
repeat
  Step  1. .
  Step  2. Initialize population .
  Initialize the chromosomes as random individuals, each encoded as a string
  of 0's and 1's. The length of a chromosome is determined by the number of
  eigenvectors having non-zero eigenvalues.
  {1 indicates that the eigenvector is included in the covariance
  matrix and 0 indicates that the eigenvector is discarded.}
  Step  3. Evaluate fitness function = number of 1's in the chromosome string.
  Step  4. Evaluate B as the product of the original matrix and the
  transformation matrix. {Based on the eigenvalues selected, map the feature
  matrix to a lower dimension by multiplying the original matrix with the
  transformation matrix.}
  Step  5. Evaluate fitness = misclassification error rate of the classifier taking
  B as input matrix.
  Step  6. Considering both fitness values (Steps 3 and 5), perform non-dominated
  sorting using NSGA-II( ) and generate the Pareto fronts.
  Step  7. Calculate the crowding distance of all solution points using the crowding distance( ).
  Step  8. Perform tournament selection by selecting random pairs from the population.
  Step  9. Use the crowded comparison operator( ) to select the most widely spread solutions.
  Step  10. Perform pairwise crossover and bitwise mutation to create new offspring.
  Step  11. Let the new population be denoted as .
until ( of )
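
The two fitness evaluations in Steps 2 to 5 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `random_chromosome`, `project`, `loo_1nn_error`, and `evaluate` are hypothetical, the transformation matrix `T` is assumed to hold one eigenvector per column, and a leave-one-out 1-nearest-neighbour rule stands in for the classifier, which the algorithm listing does not specify.

```python
import random

def random_chromosome(n_components, rng):
    """Step 2: a random bit string, one bit per candidate eigenvector."""
    return [rng.randint(0, 1) for _ in range(n_components)]

def project(A, T, chromosome):
    """Step 4: B = A x T', where T' keeps only the columns of T whose bit is 1."""
    keep = [j for j, bit in enumerate(chromosome) if bit == 1]
    return [[sum(row[k] * T[k][j] for k in range(len(row))) for j in keep]
            for row in A]

def loo_1nn_error(B, labels):
    """Assumed stand-in classifier: leave-one-out 1-nearest-neighbour error rate."""
    if not B or not B[0]:
        return 1.0  # no eigenvector selected: treat as the worst case
    errors = 0
    for i, x in enumerate(B):
        nearest = min((j for j in range(len(B)) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, B[j])))
        errors += labels[nearest] != labels[i]
    return errors / len(B)

def evaluate(chromosome, A, T, labels):
    """Steps 3 and 5: (number of 1's, misclassification error rate)."""
    first = sum(chromosome)
    second = loo_1nn_error(project(A, T, chromosome), labels)
    return first, second

# Toy data (made up for illustration): the first feature separates the two
# classes, the second feature is misleading. T is the identity for simplicity.
A = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.0], [1.1, -5.0]]
T = [[1, 0], [0, 1]]
labels = [0, 0, 1, 1]

for chrom in ([1, 0], [0, 1], [1, 1]):
    print(chrom, evaluate(chrom, A, T, labels))
```

Both values are to be minimised, so a chromosome that keeps few eigenvectors while preserving a low error rate is preferred.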
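
Steps 6, 7, and 9 can be illustrated with a short sketch. It assumes both objectives are minimised and uses a simple front-peeling sort rather than the bookkeeping-based fast non-dominated sort of NSGA-II; the fronts it produces are the same, only the running time differs. All function names here are hypothetical, and the objective values are made up for illustration.

```python
def dominates(p, q):
    """p dominates q: no worse in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated_sort(objs):
    """Step 6: peel off Pareto fronts; each front is a list of indices into objs."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Step 7: per-solution crowding distance within one front."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        # Boundary solutions are always kept, so they get infinite distance.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist

def crowded_better(i, j, rank, dist):
    """Step 9: lower front rank wins; ties go to the larger crowding distance."""
    return rank[i] < rank[j] or (rank[i] == rank[j] and dist[i] > dist[j])

# Each tuple is (number of 1's, misclassification error rate).
objs = [(1, 0.5), (2, 0.3), (3, 0.1), (2, 0.6), (4, 0.4)]
fronts = non_dominated_sort(objs)
rank = {i: r for r, front in enumerate(fronts) for i in front}
dist = {}
for front in fronts:
    dist.update(crowding_distance(objs, front))
print(fronts)
```

In a tournament between two randomly drawn individuals, `crowded_better` decides the winner, which favours solutions on better fronts and, within a front, those in less crowded regions.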