Research Article

METSP: A Maximum-Entropy Classifier Based Text Mining Tool for Transporter-Substrate Identification with Semistructured Text

Figure 1

Workflow of the design and function of METSP. Step I (highlighted in pink): explicit TSPs were manually collected from the UniProt, TCDB, and TransportDB databases. Step II (in blue): the UniProt annotation text of proteins in the explicit TSPs and in a randomly selected protein set was processed to obtain positive and unlabeled sentence training sets. A maximum-entropy model was trained on these sets, and the resulting classifier was retained. Step III (in green): the classifier was used to recognize TSPs in query protein annotation text. New TSPs were obtained after further checking by experts.
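
Steps II and III of the workflow can be illustrated with a minimal sketch. It is not the METSP implementation. It assumes a two-class maximum-entropy model (equivalent to logistic regression) over bag-of-words features, and it treats the unlabeled sentences as negatives, a common simplification for positive-unlabeled data that the caption does not confirm. The example sentences, function names, and hyperparameters are all hypothetical.

```python
import math
from collections import defaultdict

def tokenize(sentence):
    # Lowercase whitespace tokens with surrounding punctuation stripped.
    tokens = (tok.strip(".,;:()").lower() for tok in sentence.split())
    return [tok for tok in tokens if tok]

def train_maxent(sentences, labels, epochs=200, lr=0.5, l2=0.01):
    # Binary maximum-entropy (logistic regression) model trained by
    # stochastic gradient ascent on the L2-regularized log-likelihood.
    weights = defaultdict(float)
    bias = 0.0
    features = [set(tokenize(s)) for s in sentences]
    for _ in range(epochs):
        for active, label in zip(features, labels):
            score = bias + sum(weights[tok] for tok in active)
            prob = 1.0 / (1.0 + math.exp(-score))
            error = label - prob  # gradient of the log-likelihood
            bias += lr * error
            for tok in active:
                weights[tok] += lr * (error - l2 * weights[tok])
    return weights, bias

def predict_proba(model, sentence):
    # Probability that a sentence describes a transporter-substrate relation.
    weights, bias = model
    score = bias + sum(weights.get(tok, 0.0) for tok in set(tokenize(sentence)))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical toy training sets (not taken from UniProt).
positive = [
    "Mediates the uptake of glucose across the membrane.",
    "Transports zinc ions into the cell.",
    "Involved in the export of citrate.",
]
unlabeled = [
    "Binds DNA and regulates transcription.",
    "Catalyzes the hydrolysis of ATP.",
    "Component of the ribosome large subunit.",
]

# Assumption: unlabeled sentences are used as negative examples.
model = train_maxent(positive + unlabeled, [1] * len(positive) + [0] * len(unlabeled))

# Step III: score a sentence from a query protein annotation.
query = "Transports glucose into the cell."
print(f"{query!r}: {predict_proba(model, query):.2f}")
```

Sentences scoring above a chosen threshold would then be passed to experts for checking, as in the final part of Step III.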