Comparative and Functional Genomics
Volume 3 (2002), Issue 5, Pages 423-440
Research Article

A Computational Strategy for Protein Function Assignment which Addresses the Multidomain Problem

1 Genetics Department, University of Málaga, Málaga 29071, Spain
2 Computer Architecture Department, University of Málaga, Málaga 29071, Spain

Received 15 March 2002; Accepted 12 August 2002

A method is presented for assigning functions to sequences of unknown function by finding correlations between short sequence signals and functional annotations in a protein database. The approach uses the keyword (KW) and feature (FT) information stored in the SWISS-PROT database: the former describes particular characteristics of a protein, and the latter locates those characteristics at specific positions in the sequence. A keyword is therefore assigned to a sequence only if sequence similarity is found at the position described by the FT field. Exhaustive tests on sequences with homologues in the database (cluster set) and without homologues (singleton set) show that function assignment is much 'cleaner' when domain information (FT field) is used than when keywords alone are used.
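
The position-aware rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Feature` and `Hit` types, the direct link between a feature and a keyword, and the `min_coverage` threshold are all assumptions made for illustration, since the abstract does not specify how KW and FT entries are paired or how much overlap is required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for an FT entry already linked to a KW term."""
    keyword: str  # keyword this feature is assumed to support
    start: int    # 1-based inclusive start on the database sequence
    end: int      # 1-based inclusive end on the database sequence


@dataclass(frozen=True)
class Hit:
    """Region of the database sequence that aligns to the query (hypothetical)."""
    start: int
    end: int


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def transfer_keywords(hit: Hit, features: list[Feature],
                      min_coverage: float = 0.5) -> set[str]:
    """Assign a keyword only if the aligned region covers enough of its feature.

    min_coverage is an assumed threshold, not a value taken from the paper.
    """
    assigned = set()
    for ft in features:
        ft_length = ft.end - ft.start + 1
        shared = overlap(hit.start, hit.end, ft.start, ft.end)
        if shared / ft_length >= min_coverage:
            assigned.add(ft.keyword)
    return assigned


# A two-domain database entry: similarity to one domain should not
# transfer the keyword that belongs to the other domain.
features = [Feature("Kinase", 10, 100), Feature("Zinc-finger", 300, 330)]
n_terminal = transfer_keywords(Hit(1, 120), features)
c_terminal = transfer_keywords(Hit(290, 340), features)
linker_only = transfer_keywords(Hit(90, 310), features)
```

In this sketch a hit confined to the N-terminal region receives only the keyword tied to the N-terminal feature, which is the behaviour the multidomain problem calls for; a keyword-only scheme would transfer both keywords to any hit against the entry.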