Table 1: List of query commands submitted to the Sequence Retrieval System (SRS) to create the positive dataset.

Subsets; Search conditions in SRS Query Language; Number of hits; Number of hits in the positive dataset

Subset 1: Lectins which are not enzymes; [libs = swiss_prot trembl -Description: lectin*] [libs-Keywords:Lectin*] [libs-Keywords:Chitin-binding*] [libs-Description:sugarbinding*] ! ([libs-Description: ] [libs-Description:/ase /]) ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 2017231

Subset 2: Lectins which are also enzymes; [libs = swiss_prot trembl -Description: lectin*] [libs-Keywords:Lectin*] [libs-Keywords:Chitin-binding*] [libs-Description: sugar-binding*] & ([libs-Description: *Peptidase*] [libs-Description: ligase*] [libs-Description: ribonuclease*] [libs-Description: *Protease*] [libs-Description: *Proteinase*] [libs-Keywords: *lipase*] [libs-Keywords: ribonuclease*] [libs-Keywords: *Protease*] [libs-Keywords: *Proteinase*] [libs-Keywords: *lipase*]) ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 374

Subset 3: Other “Carbohydrate-binding” proteins; [libs = swiss_prot trembl -Keywords: Carbohydrate-binding*] [libs-Description:Carbohydrate-binding*] ! [libs-Description: CUT*] ! [libs-Description: Hydrolase*] ! [libs-Description:lyase*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 1615

Subset 4: Hyaluronic acid-binding proteins; [libs = swiss_prot trembl -Description: Hyaluronate*] [libs-Keywords:Hyaluronate*] [libs-Description: Hyaluronan*] [libs-Keywords:Hyaluronan*] [libs-Description: Hyaluronic*] [libs-Keywords:Hyaluronic*] ! [libs-Description: lyase*] ! [libs-Description: synthase*] & ([libs-Description: *link*] [libs-Description: *bind*] [libs-Description: *associate*] [libs-Description: *receptor*] [libs-Description: *mediate*] [libs-Keywords: *link*] [libs-Keywords: *bind*] [libs-Keywords: *associate*]) ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 9014

Subset 5: Heparin-binding proteins; [libs = swiss_prot trembl -Keywords: Heparin-binding*] [libs-Description:Heparin-binding*] ! [libs-Description: Putative*] ! [libs-Description:lyase*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 33360

Subset 6: Interleukins which can bind to sugar chains; [libs = swiss_prot trembl -ID: IL1A_*] [libs-ID: IL1B_*] [libs-ID: IL4_*] [libs-ID: IL1RA_*] [libs-ID: IL6_*] [libs-ID: IL3_*] [libs-ID: IL2_*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 1547

Subset 7: FimH adhesin of type 1 pili; [libs = swiss_prot trembl -Description: FimH*] [libs-Description: Neuraminyllactose-binding*] [libs-Description: S-fimbrial adhesin*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]); 11

Subset 8: F-box only proteins which can bind to sugar chains; [libs = swiss_prot trembl -ID: FBX27_HUMAN*] [libs-ID: FBX6_HUMAN*]; 21

Subset 9: Agrin, Tenascin-C, Phospholipase A2 inhibitor subunit A, Neurexin; [libs = swiss_prot trembl -ID: AGRIN_HUMAN] [libs-ID: PLIA_TRIFL] [libs-Description: Tenascin-C] [libs-ID: NRX1A_HUMAN*]; 138

Subset 10: Chitin-binding proteins; [libs = swiss_prot trembl -Description: cbp-1] ! [libs-Description: Centromere* ] ! [libs-Description: EC*] ! [libs-Description: synthase*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:]; 44
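
Most queries in Table 1 share one pattern: inclusion terms, then exclusions for "putative" entries and for ProtExist levels 3 to 5, then a sequence-length condition. The following Python sketch mirrors that pattern for the Subset 5 query on a few made-up records. It is not SRS and not the authors' code. It assumes that `!` means "but not", `&` means "and", `*` is a wildcard, and `#30:` means a length of at least 30. It also assumes that adjacent bracketed terms with no operator between them are alternatives, which the table does not state. The record fields (`description`, `keywords`, `prot_exist`, `seq_length`), the record IDs, and the helper names are all hypothetical.

```python
from fnmatch import fnmatchcase


def field_matches(text, pattern):
    """Return True if any whitespace-separated word of `text` matches `pattern`.

    Word-level, case-sensitive wildcard matching is an assumption of this
    sketch; the real SRS index may tokenize and compare differently.
    """
    return any(fnmatchcase(word, pattern) for word in text.split())


def in_subset5(rec):
    """Approximate the Subset 5 (heparin-binding) conditions from Table 1."""
    # Inclusion terms (treated here as alternatives: an assumption).
    included = (
        field_matches(rec["keywords"], "Heparin-binding*")
        or field_matches(rec["description"], "Heparin-binding*")
    )
    # Terms introduced by `!` in the query are read as exclusions.
    excluded = any(
        field_matches(rec["description"], pattern)
        for pattern in ("Putative*", "putative*", "lyase*")
    )
    # ProtExist 3, 4 and 5 are excluded, leaving levels 1 and 2.
    weak_evidence = rec["prot_exist"] in (3, 4, 5)
    # `[libs-SeqLength#30:]` is read as "sequence length of 30 or more".
    long_enough = rec["seq_length"] >= 30
    return included and not excluded and not weak_evidence and long_enough


# Hypothetical records, invented purely for illustration.
records = [
    {"id": "REC_A", "description": "Heparin-binding growth factor 1",
     "keywords": "Heparin-binding; Secreted", "prot_exist": 1, "seq_length": 155},
    {"id": "REC_B", "description": "Putative heparin-binding protein",
     "keywords": "Heparin-binding", "prot_exist": 4, "seq_length": 200},
    {"id": "REC_C", "description": "Heparin lyase",
     "keywords": "Heparin-binding", "prot_exist": 1, "seq_length": 300},
    {"id": "REC_D", "description": "Heparin-binding peptide",
     "keywords": "", "prot_exist": 2, "seq_length": 12},
]

selected = [rec["id"] for rec in records if in_subset5(rec)]
print(selected)
```

Only `REC_A` passes: `REC_B` is rejected as putative and for its ProtExist level, `REC_C` by the `lyase*` exclusion, and `REC_D` by the length condition.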