Review Article

Long Noncoding RNA Identification: Comparing Machine Learning Based Tools for Long Noncoding Transcripts Discrimination

Table 2

Summary of the features of each method selected.

| Method | ORF | Codon | Sequence structure | Ribosome interaction | Alignment | Protein conservation |
| --- | --- | --- | --- | --- | --- | --- |
| CPC | Quality; coverage; integrity | No | No | No | BLASTX | Number and E-value of hits; distribution of hits |
| CPAT | Length; coverage | Hexamer frequency | Content of the bases; position of the bases | No | No | No |
| CNCI | No | ANT matrix; codon-bias | MLCDS | No | No | No |
| PLEK | No | No | Improved k-mer scheme | No | No | No |
| lncRNA-MFDL | Length; coverage | No | k-mer scheme; secondary structure; MLCDS | No | No | No |
| LncRNA-ID | Length; coverage | No | Kozak motif | Ribosome release signal; changes of binding energy | Profile HMM based alignment | Score of HMMER; length of the profile; length of aligned region |
| lncRScan-SVM | No | Distribution of stop codon | Score of txCdsPredict; length of transcripts; length and count of exon | No | Phylo-HMM based alignment | Average PhastCons scores |
| LncRNApred | Length; coverage | No | Length of the sequence; signal to noise ratio; k-mer scheme; G + C content | No | No | No |

All features are categorized into six groups according to their similarity or underlying principles. As a result, some entries in the table may not correspond one-to-one with the feature names given in the corresponding published references.
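
To make a few of the sequence-intrinsic entries in the table concrete (ORF length and coverage, G + C content, and a k-mer scheme), the following Python sketch computes simplified versions of them for a single transcript. It is an illustration under stated assumptions, not the implementation used by any of the listed tools: the ORF is assumed to be the longest ATG-to-stop stretch in the three forward frames, coverage is assumed to be ORF length divided by transcript length, and the function names (`longest_orf`, `gc_content`, `kmer_frequencies`, `basic_features`) are hypothetical.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF in the three
    forward frames, or (0, 0) if no complete ORF exists.
    Assumption: real tools may also scan the reverse strand or accept
    ORFs without a stop codon; this sketch does neither."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_frequencies(seq, k):
    """Relative frequency of every A/C/G/T k-mer, using a sliding
    window with step 1 (a plain k-mer scheme, not PLEK's improved one)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def basic_features(seq):
    """Collect the simplified features into one dictionary."""
    start, end = longest_orf(seq)
    orf_length = end - start
    return {
        "orf_length": orf_length,
        "orf_coverage": orf_length / len(seq) if seq else 0.0,
        "gc_content": gc_content(seq),
        "kmer_2": kmer_frequencies(seq, 2),
    }


if __name__ == "__main__":
    features = basic_features("CCATGAAATAGGG")
    print(features["orf_length"], features["orf_coverage"], features["gc_content"])
```

A feature dictionary of this kind would typically be flattened into a numeric vector before being passed to a classifier. The alignment-based and conservation-based columns of the table depend on external programs and databases and are not covered by this sketch.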