Research Article

iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach

Figure 2

Overall procedure for constructing and evaluating the multilabel classifier iMPT-FDNPL. Membrane proteins and their types are retrieved from the UniProt database; the types serve as labels. Functional domain information is obtained from the InterPro database and processed with a natural language processing approach (word2vec), and the resulting vectors are used to encode the proteins. Labels and vectors are fed into RAKEL, with random forest as the base classifier, to construct the multilabel classifier, which is evaluated by tenfold cross-validation.
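
The data flow in this caption can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the domain tokens (DOM_A, DOM_B), the two-dimensional toy vectors, the choice of averaging domain vectors to encode a protein, and the use of the disjoint variant of RAKEL. In a real pipeline the domain vectors would come from a trained word2vec model, and each class-id target produced below would be learned by a random forest; both steps are omitted here.

```python
import random


def encode_protein(domains, domain_vectors, dim):
    """Encode a protein as the mean of its domain vectors (assumed pooling)."""
    known = [domain_vectors[d] for d in domains if d in domain_vectors]
    if not known:
        # No domain with a known vector: fall back to the zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def partition_labels(n_labels, k, seed=0):
    """RAKEL (disjoint variant): randomly split labels into labelsets of size <= k."""
    idx = list(range(n_labels))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i:i + k]) for i in range(0, n_labels, k)]


def powerset_targets(Y, labelset):
    """Map each label combination within one labelset to a single class id.

    This is the label powerset step: one multi-class problem per labelset,
    which a base classifier (random forest in the caption) would then learn.
    """
    classes = {}
    targets = []
    for row in Y:
        combo = tuple(row[j] for j in labelset)
        targets.append(classes.setdefault(combo, len(classes)))
    return targets, classes


# Toy data: hypothetical domain tokens and vectors, not real InterPro entries.
toy_vectors = {"DOM_A": [1.0, 0.0], "DOM_B": [0.0, 1.0]}
protein_vec = encode_protein(["DOM_A", "DOM_B"], toy_vectors, dim=2)

# Five toy labels split into labelsets of at most two labels each.
labelsets = partition_labels(n_labels=5, k=2)

# Binary label matrix: one row per protein, one column per label.
Y = [[1, 0, 1, 0, 0],
     [1, 0, 0, 0, 1],
     [1, 0, 1, 0, 0]]
targets_per_labelset = [powerset_targets(Y, s)[0] for s in labelsets]
```

At prediction time, RAKEL decodes each labelset's predicted class id back into its label combination and merges the results into one multilabel output. For tenfold cross-validation, the whole procedure is repeated ten times, each time holding out a different tenth of the proteins for testing.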