Review Article

Long Noncoding RNA Identification: Comparing Machine Learning Based Tools for Long Noncoding Transcripts Discrimination

Table 1

Overview of the methods concerning lncRNA identification.

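| Method | Published year | Testing datasets | Training species | Model | Query file format | Web interface |
|---|---|---|---|---|---|---|
| CPC | 2007 | | Eukaryotic | SVM | FASTA | Yes |
| CPAT | 2013 | | Human; mouse; fly; zebrafish | LR | BED; FASTA | Yes |
| CNCI | 2013 | lncRNA | Human; plant | SVM | FASTA; GTF | No |
| PLEK | 2014 | lncRNA | Human; maize | SVM | FASTA | No |
| lncRNA-MFDL | 2015 | lncRNA | Human | DL | | |
| LncRNA-ID | 2015 | lncRNA | Human; mouse | RF | BED; FASTA | No |
| lncRScan-SVM | 2015 | lncRNA | Human; mouse | SVM | GTF | No |
| LncRNApred | 2016 | lncRNA | Human | RF | FASTA | Web only |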

The testing datasets column indicates whether a method was developed to discriminate ncRNAs or lncRNAs from protein-coding transcripts. CPC, CNCI, PLEK, and lncRScan-SVM use a support vector machine (SVM) as the classification model; CPAT employs logistic regression (LR); LncRNA-ID and LncRNApred utilise random forests (RF); and lncRNA-MFDL uses deep stacking networks (DSNs), a deep learning (DL) algorithm.
Note that the most popular tool, CPC, is trained and tested on datasets of ncRNAs and protein-coding transcripts. CPAT is also trained on ncRNAs and protein-coding transcripts, although it was additionally tested on lncRNAs, where it achieved a higher accuracy.
The access link of lncRNA-MFDL has expired; thus, we cannot verify the information that the original paper failed to mention.
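
As a reading aid, the rows of Table 1 can be encoded as simple records and filtered by query file format or classification model. The sketch below is only an illustration built from the table's own entries: the names `Tool`, `TOOLS`, and `tools_accepting` are hypothetical and do not belong to any of the reviewed tools, cells that are blank in the table are stored as `None`, and the query format of LncRNA-ID is assumed to be FASTA.

```python
from typing import NamedTuple, Optional, Tuple


class Tool(NamedTuple):
    """One row of Table 1 (hypothetical record type, for illustration only)."""
    name: str
    year: int
    species: Tuple[str, ...]
    model: str
    formats: Optional[Tuple[str, ...]]  # None where Table 1 gives no value
    web: Optional[str]                  # None where Table 1 gives no value


# Entries transcribed from Table 1; nothing is added beyond the table.
TOOLS = [
    Tool("CPC", 2007, ("eukaryotic",), "SVM", ("FASTA",), "Yes"),
    Tool("CPAT", 2013, ("human", "mouse", "fly", "zebrafish"), "LR",
         ("BED", "FASTA"), "Yes"),
    Tool("CNCI", 2013, ("human", "plant"), "SVM", ("FASTA", "GTF"), "No"),
    Tool("PLEK", 2014, ("human", "maize"), "SVM", ("FASTA",), "No"),
    Tool("lncRNA-MFDL", 2015, ("human",), "DL", None, None),
    # Assumption: the query format of LncRNA-ID is read as BED; FASTA.
    Tool("LncRNA-ID", 2015, ("human", "mouse"), "RF", ("BED", "FASTA"), "No"),
    Tool("lncRScan-SVM", 2015, ("human", "mouse"), "SVM", ("GTF",), "No"),
    Tool("LncRNApred", 2016, ("human",), "RF", ("FASTA",), "Web only"),
]


def tools_accepting(fmt, model=None):
    """Names of tools whose listed query formats include `fmt`.

    If `model` is given, only tools using that classification model are kept.
    Tools with no listed query format are skipped.
    """
    return [
        t.name
        for t in TOOLS
        if t.formats is not None
        and fmt in t.formats
        and (model is None or t.model == model)
    ]


if __name__ == "__main__":
    print(tools_accepting("GTF"))
    print(tools_accepting("FASTA", model="RF"))
```

A lookup of this kind only restates the table; it says nothing about the accuracy or availability of the tools themselves.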