Computational Biology Journal

Research Article

Two-Stage Approach for Protein Superfamily Classification

PCA-NSGA-II.

Let population be denoted as
Probability of crossover be denoted as
Probability of mutation be denoted as
Fitness function be denoted as
Pareto fronts be denoted as

repeat
Step 1. .
Step 2. Initialize population .
Initialize the chromosomes as random individuals, each encoded as a string
of 0's and 1's. The length of a chromosome equals the number of
eigenvectors that have non-zero eigenvalues.
{1 indicates that the corresponding eigenvector of the covariance
matrix is included and 0 indicates that it is discarded.}
Step 3. Evaluate the first fitness function = number of 1's in the chromosome string.
Step 4. Evaluate B as the product of the original matrix and the transformation
matrix. {Based on the eigenvalues selected, map the feature matrix to a lower
dimension by multiplying the original matrix with the transformation matrix.}
Step 5. Evaluate the second fitness function = misclassification error rate of the
classifier taking B as input matrix.
Step 6. Considering both fitness functions, perform non-dominated sorting using NSGA-II( )
and generate the Pareto fronts.
Step 7. Calculate the crowding distance of all solution points using the crowding distance( ).
Step 8. Perform tournament selection by selecting random pairs from the population.
Step 9. Use the crowded comparison operator( ) to select the most widely spread solutions.
Step 10. Perform pairwise crossover and bitwise mutation to create new offspring.
Step 11. Let the new population be denoted as .
until ( of )
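
The steps of the listing can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, the eigenvectors are assumed to be precomputed and stored as the columns of `eigvecs`, and the classifier is stood in for by a caller-supplied `error_rate` function. Single-point crossover is an assumption, since the listing only says "pairwise crossover", and the sorting routine is a simple quadratic variant rather than the bookkeeping-based fast non-dominated sort.

```python
import random

def init_population(size, n_eig, rng):
    """Step 2: random bit strings, one bit per eigenvector with non-zero eigenvalue."""
    return [[rng.randint(0, 1) for _ in range(n_eig)] for _ in range(size)]

def project(A, eigvecs, mask):
    """Step 4: B = A x T, where T keeps the eigenvector columns whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    return [[sum(x * eigvecs[d][j] for d, x in enumerate(row)) for j in cols]
            for row in A]

def objectives(mask, A, eigvecs, error_rate):
    """Steps 3-5: (number of 1's, misclassification error on B); both are minimized.
    error_rate is a hypothetical callable and must cope with an all-zero mask."""
    return (sum(mask), error_rate(project(A, eigvecs, mask)))

def dominates(p, q):
    """p dominates q if it is no worse in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated_sort(objs):
    """Step 6: lists of indices per front; fronts[0] holds the non-dominated points."""
    remaining, fronts = set(range(len(objs))), []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i]) for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(front, objs):
    """Step 7: boundary points get infinity, interior points the normalized gap
    between their two neighbours, summed over the objectives."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist

def crowded_tournament(rank, dist, rng):
    """Steps 8-9: draw a random pair; prefer the lower front, then the larger distance."""
    i, j = rng.sample(range(len(rank)), 2)
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    return i if dist[i] >= dist[j] else j

def make_offspring(p1, p2, pc, pm, rng):
    """Step 10: single-point crossover (assumed) with probability pc, then bit flips
    with probability pm per bit."""
    c1, c2 = p1[:], p2[:]
    if len(p1) > 1 and rng.random() < pc:
        cut = rng.randint(1, len(p1) - 1)
        c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

    def flip(c):
        return [b ^ 1 if rng.random() < pm else b for b in c]

    return flip(c1), flip(c2)

def next_generation(pop, A, eigvecs, error_rate, pc, pm, rng):
    """Steps 3-11 for one pass of the repeat loop."""
    objs = [objectives(c, A, eigvecs, error_rate) for c in pop]
    rank, dist = [0] * len(pop), [0.0] * len(pop)
    for r, front in enumerate(non_dominated_sort(objs)):
        d = crowding_distance(front, objs)
        for i in front:
            rank[i], dist[i] = r, d[i]
    children = []
    while len(children) < len(pop):
        a = pop[crowded_tournament(rank, dist, rng)]
        b = pop[crowded_tournament(rank, dist, rng)]
        children.extend(make_offspring(a, b, pc, pm, rng))
    return children[:len(pop)]
```

The sketch follows the listing step by step, so the offspring simply replace the parents. The standard NSGA-II additionally merges parents and offspring before sorting and keeps the best half, which is not shown here. In practice, `error_rate` would train and test the chosen classifier on the reduced matrix B.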