BioMed Research International

Special Issue: Intelligent Informatics in Translational Medicine

Resource Review | Open Access

Volume 2015 | Article ID 475062 | https://doi.org/10.1155/2015/475062

Hao-Ting Lee, Chen-Che Lee, Je-Ruei Yang, Jim Z. C. Lai, Kuan Y. Chang, "A Large-Scale Structural Classification of Antimicrobial Peptides", BioMed Research International, vol. 2015, Article ID 475062, 6 pages, 2015. https://doi.org/10.1155/2015/475062

A Large-Scale Structural Classification of Antimicrobial Peptides

Academic Editor: Oliver Ray
Received 01 Sep 2014
Accepted 23 Feb 2015
Published 27 Apr 2015

Abstract

Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses. AMPs have abundant sequences and structures, two fundamental resources for bioinformatics research, but analyses of how the two are associated are either nonexistent or limited to partial classifications and data. We therefore present A Database of Anti-Microbial peptides (ADAM), which contains 7,007 unique sequences and 759 structures, to systematically establish comprehensive associations between AMP sequences and structures through structural folds and to provide easy access to their relationships. Thirty distinct AMP structural fold clusters with more than one structure are detected, and about a thousand AMPs are associated with at least one structural fold cluster. According to ADAM, AMP structural folds are limited: AMPs cover only about 3% of the overall protein fold space.

1. Introduction

Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses. To date, more than 10 AMPs have entered clinical trials [1]. Owing to their importance, several databases dedicated to AMPs have been released in the past few years. Some are species-specific, such as BACTIBASE [2], BAGEL2 [3], DADP [4], PenBase [5], and PhytAMP [6]; others curate a broad spectrum of species, such as AMPer [7], APD2 [8], CAMP2 [9], DAMPD [10], Defensins Knowledgebase [11], and YADAMP [12]. These databases range in size from hundreds to a couple of thousand AMP sequences. However, none of them contains all available AMP sequences.

Understanding sequence-structure relationships is important for AMP-based drug design. However, a major limitation of AMP databases is their poor use of structural information. Like AMP sequences, a variety of AMP structures have been resolved. Classified by secondary structure, the four traditional AMP structural classes are alpha helices, beta strands, loop structures, and extended structures [13, 14]. An alternative structural classification based on peptide backbone torsion angles also reveals many different AMP folds [1]. A few AMP databases, such as APD2, have attempted to associate AMP sequences with their secondary structures. However, none has established associations between AMP sequences and AMP structural folds. Examining AMP tertiary structures would help us understand AMPs better and support antimicrobial drug discovery.

In this work, we present A Database of Anti-Microbial peptides (ADAM) (available at http://bioinformatics.cs.ntou.edu.tw/ADAM). ADAM collects AMPs comprehensively and systematically establishes associations between AMP sequences and structures. Integrated from various sources, ADAM contains the most complete collection of AMP sequences and structures. ADAM not only allows biomedical researchers to search basic AMP information but also provides easy access for linking AMP sequences to structures and vice versa.

2. Data Collection and Methods

2.1. AMP Sequences

ADAM contains 7,007 unique AMP sequences extracted from twelve databases (Figure 1): APD2 [8], AVPpred [15], BACTIBASE [2], BAGEL3 [3], CAMP2 [9], DADP [4], DAMPD [10], HIPdb [16], PenBase [5], PhytAMP [6], RAPD [17], and YADAMP [12]. The AMP sequences in ADAM were mostly derived from natural sources, covering a broad spectrum of species such as archaea, bacteria, plants, and animals. Of the 7,007 sequences, 2,497 have been validated experimentally and recorded in the literature. Table 1 compares the AMPs of the twelve databases. Among the large AMP databases (APD2, CAMP2, DAMPD, DADP, and YADAMP), CAMP2 contains the most overlapping sequences. Among the species-specific AMP databases, AVP and HIPdb contain fewer overlapping sequences.


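Table 1

| | APD | CAMP | DADP | DAMPD | YADAMP | BACTIBASE | BAGEL | PenBase | PhytAMP | RAPD | AVP | HIPdb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| APD | 2436 | 2100 | 744 | 376 | 1601 | 86 | 39 | 1 | 107 | 55 | 56 | 33 |
| CAMP | 2100 | 3052 | 858 | 586 | 1994 | 122 | 56 | 1 | 145 | 71 | 65 | 33 |
| DADP | 744 | 858 | 1792 | 220 | 772 | 0 | 0 | 0 | 0 | 5 | 13 | 11 |
| DAMPD | 376 | 586 | 220 | 1068 | 528 | 31 | 70 | 5 | 19 | 10 | 18 | 11 |
| YADAMP | 1601 | 1994 | 772 | 528 | 2782 | 113 | 43 | 1 | 60 | 67 | 76 | 49 |
| BACTIBASE | 86 | 122 | 0 | 31 | 113 | 204 | 52 | 0 | 0 | 10 | 0 | 1 |
| BAGEL | 39 | 56 | 0 | 70 | 43 | 52 | 431 | 0 | 0 | 0 | 1 | 0 |
| PenBase | 1 | 1 | 0 | 5 | 1 | 0 | 0 | 28 | 0 | 0 | 0 | 0 |
| PhytAMP | 107 | 145 | 0 | 19 | 60 | 0 | 0 | 0 | 272 | 3 | 10 | 0 |
| RAPD | 55 | 71 | 5 | 10 | 67 | 10 | 0 | 0 | 3 | 119 | 9 | 5 |
| AVP | 56 | 65 | 13 | 18 | 76 | 0 | 1 | 0 | 10 | 9 | 604 | 156 |
| HIPdb | 33 | 33 | 11 | 11 | 49 | 1 | 0 | 0 | 0 | 5 | 156 | 744 |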
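A pairwise comparison of this kind can be sketched with plain set operations. The snippet below is only an illustration: the database names and peptide sequences are made-up placeholders, and the function names are hypothetical rather than part of ADAM.

```python
from itertools import combinations_with_replacement

def overlap_matrix(databases):
    """Count the sequences shared by each pair of databases.

    `databases` maps a database name to a set of peptide sequences.
    The diagonal holds each database's own number of unique sequences.
    """
    names = sorted(databases)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations_with_replacement(names, 2):
        shared = len(databases[a] & databases[b])
        matrix[a][b] = shared
        matrix[b][a] = shared  # the matrix is symmetric
    return matrix

def unique_sequences(databases):
    """Union of all sequences, i.e. the non-redundant merged collection."""
    merged = set()
    for seqs in databases.values():
        merged |= seqs
    return merged

# Toy input: hypothetical database names and made-up sequences.
toy = {
    "db_a": {"GIGKFLHSAK", "KWKLFKKIEK", "ACYCRIPACIAG"},
    "db_b": {"GIGKFLHSAK", "ACYCRIPACIAG"},
    "db_c": {"KWKLFKKIEK", "FLPLIGRVLSGIL"},
}
m = overlap_matrix(toy)
merged = unique_sequences(toy)
```

Here `m["db_a"]["db_b"]` counts the sequences the two toy databases share, and `merged` plays the role of the non-redundant sequence set to which IDs would be assigned.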

Each unique AMP sequence was assigned an ADAM ID. The ADAM ID is linked to the basic AMP information, structural view, physicochemical properties, amino acid composition, and external resources. The structural view displays the best corresponding PDB structure and, if available, the representative PDB structure of the fold cluster to which the AMP sequence belongs. The physicochemical properties comprise peptide length, net charge, instability index, aliphatic index, and grand average of hydropathicity index. The composition is the fraction of each amino acid in the AMP. The external resources are links to PDB, CATH, SCOP, Pfam, and other AMP databases associated with the AMP (see Supplementary Material available online at http://dx.doi.org/10.1155/2015/475062).
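Several of these properties can be computed directly from a sequence. The following is a minimal sketch, not ADAM's implementation: it assumes the Kyte-Doolittle scale for the grand average of hydropathicity, Ikai's formula for the aliphatic index, and a deliberately simplified net charge that counts only Lys/Arg and Asp/Glu. The instability index is omitted because it needs a dipeptide weight table. The example peptide is an arbitrary toy sequence.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composition(seq):
    """Fraction of each amino acid present in the sequence."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}

def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def aliphatic_index(seq):
    """Aliphatic index from the mole percentages of Ala, Val, Ile, and Leu."""
    pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])

def net_charge(seq):
    """Simplified net charge: +1 per K/R, -1 per D/E (ignores His, termini, pH)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative

peptide = "KLAKLAKKLAKLAK"  # toy sequence for illustration only
properties = {
    "length": len(peptide),
    "net_charge": net_charge(peptide),
    "aliphatic_index": aliphatic_index(peptide),
    "gravy": gravy(peptide),
    "composition": composition(peptide),
}
```

A more careful charge estimate would account for pH, histidine, and the termini; the simplified count is used here only to keep the sketch short.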

2.2. AMP Structures

The AMP structures were obtained by running BLAST with the experimentally validated AMP sequences against the Protein Data Bank [18]. In total, 408 sequences had 759 matching structures with either 100% sequence identity or at least 90% sequence identity with an E-value < 10^-5. Each matching structure was annotated by SCOP v1.75B [19] and CATH v4.0 [20]. Because not every AMP structure had a CATH or SCOP annotation, the unique AMP structural folds could not all be determined from these annotations alone.
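The selection rule above can be sketched as a filter over BLAST tabular output. This is one reading of the criterion (a hit is kept if it has 100% identity, or if it has at least 90% identity and an E-value below 10^-5), and it assumes the standard 12-column tabular layout in which column 3 is percent identity and column 11 is the E-value. The query names, subject identifiers, and rows are invented for illustration.

```python
def parse_blast_tabular(lines):
    """Yield (query, subject, percent_identity, evalue) from BLAST tabular rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[2]), float(fields[10])

def passes_filter(pident, evalue, min_identity=90.0, max_evalue=1e-5):
    """Keep exact matches, or near matches that are also statistically significant."""
    return pident == 100.0 or (pident >= min_identity and evalue < max_evalue)

def matching_structures(lines):
    """Map each query sequence to the set of structures that pass the filter."""
    hits = {}
    for query, subject, pident, evalue in parse_blast_tabular(lines):
        if passes_filter(pident, evalue):
            hits.setdefault(query, set()).add(subject)
    return hits

# Made-up rows: query, subject, %identity, length, mismatches, gap opens,
# qstart, qend, sstart, send, evalue, bitscore.
rows = [
    "amp1\tstructA\t100.00\t20\t0\t0\t1\t20\t1\t20\t2e-03\t40.0",
    "amp1\tstructB\t95.00\t40\t2\t0\t1\t40\t1\t40\t1e-12\t80.0",
    "amp2\tstructC\t92.00\t12\t1\t0\t1\t12\t1\t12\t0.5\t20.0",
    "amp3\tstructD\t85.00\t60\t9\t0\t1\t60\t1\t60\t1e-20\t110.0",
]
hits = matching_structures(rows)
```

In this toy input only `amp1` keeps any structures: `amp2` fails the E-value cutoff and `amp3` falls below 90% identity.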

Tables 2 and 3 record the number of AMP structures according to CATH v4.0 and SCOP v1.75B, respectively. The four hierarchical levels of CATH are class, architecture, topology, and homologous superfamily; the four levels of SCOP are class, fold, superfamily, and family. The topology level of CATH corresponds to the fold level of SCOP. The AMP structures span all four fundamental CATH classes (Table 2) and seven SCOP classes (Table 3). Among the 759 AMP structures, 40 of the 1,375 CATH folds (Table 2) and 47 of the 1,390 SCOP folds (Table 3) are found. These AMP structures cover about 3% of the protein fold space defined by CATH and SCOP.


Table 2

| | Class | Architecture | Topology | Homologous superfamily |
|---|---|---|---|---|
| ADAM | 4 | 11 | 40 | 41 |
| CATH 4.0 | 4 | 40 | 1375 | 2738 |


Table 3

| | Class | Fold | Superfamily | Family |
|---|---|---|---|---|
| ADAM | 7 | 47 | 53 | 72 |
| SCOP 1.75B | 11 | 1390 | 2220 | 4609 |
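The coverage figure of about 3% follows directly from the fold counts in Tables 2 and 3. As a quick check of the arithmetic (the function name is just a label for this sketch):

```python
def coverage_percent(found, total):
    """Share of a classification's folds that are observed, as a percentage."""
    return 100.0 * found / total

cath_coverage = coverage_percent(40, 1375)  # CATH topologies, Table 2
scop_coverage = coverage_percent(47, 1390)  # SCOP folds, Table 3

print(f"CATH: {cath_coverage:.1f}%  SCOP: {scop_coverage:.1f}%")
```

Both values round to roughly 3%, at about 2.9% for CATH and 3.4% for SCOP.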

2.3. AMP Structural Fold Clusters

A graph-based clustering procedure was applied to identify the unique AMP folds. In the graph, each vertex represents an AMP structure, and an edge connects two vertices if the two structures are similar. The AMP structures came from the BLAST results described above. Only the 264 best matching structures were collected under more stringent selection conditions. Each AMP is allowed at most one best matching structure, and multiple AMPs can map to the same structure. The similarity of two AMP structures was then measured by TM-score, which ranges from 0 to 1 [21]. An edge exists if the TM-score > 0.5, which indicates that the two structures should belong to the same fold [22]. In total, 136 AMP fold clusters were formed, 30 of which contain more than one AMP structure, as shown in Figure 2. The top 10 common AMP structural folds, with their CATH and SCOP annotations, are listed in Table 4. The structures in a fold cluster can share the same CATH and SCOP annotations, as in cluster #1 in Table 4. One CATH fold can map to multiple SCOP folds, as in cluster #4 in Table 4; one SCOP fold can also map to multiple CATH folds, as in cluster #9 in Table 4. Note that some AMP structures have neither a CATH nor a SCOP annotation.
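The clustering step can be sketched as follows, assuming that a fold cluster is a connected component of the similarity graph (the text does not state this explicitly) and that pairwise TM-scores have already been computed with an external structure alignment tool. The structure names and scores are invented for illustration.

```python
def cluster_by_tm_score(structures, tm_scores, threshold=0.5):
    """Group structures into connected components of the similarity graph.

    `tm_scores` maps a pair (a, b) of structure names to their TM-score.
    An edge is added only when the score is strictly above `threshold`.
    """
    parent = {s: s for s in structures}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in tm_scores.items():
        if score > threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for s in structures:
        clusters.setdefault(find(s), []).append(s)
    # Largest clusters first; ties keep their input order.
    return sorted(clusters.values(), key=len, reverse=True)

# Toy input: hypothetical structure names and made-up TM-scores.
structures = ["s1", "s2", "s3", "s4", "s5"]
tm_scores = {
    ("s1", "s2"): 0.72,
    ("s2", "s3"): 0.55,
    ("s1", "s3"): 0.41,
    ("s3", "s4"): 0.30,
    ("s4", "s5"): 0.50,  # not above the threshold, so no edge
}
clusters = cluster_by_tm_score(structures, tm_scores)
multi_member = [c for c in clusters if len(c) > 1]
```

Note that with connected components, `s1` and `s3` end up in the same cluster through `s2` even though their direct TM-score is below 0.5. A stricter scheme, such as requiring every pair in a cluster to exceed the threshold, would give different clusters.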


Table 4

| Fold cluster ID | CATH class | CATH architecture | CATH topology | SCOP class | SCOP fold |
|---|---|---|---|---|---|
| 1 | Alpha beta | 2-layer sandwich | Defensin A-like | Small proteins | Knottins |
| 2 | Mainly beta | Beta barrel | OB fold | Alpha and beta proteins (a + b) | IL8-like |
| 3 | Mainly alpha | Up-down bundle | Single alpha-helices involved in coiled-coils or other helix-helix interfaces | Peptides | Antimicrobial helix |
| 3 | | | | Peptides | Liposaccharide-binding protein CAP18 |
| 3 | | | | Peptides | Peptide hormones |
| 4 | Alpha beta | Roll | Antimicrobial peptide, beta-defensin 2; chain A | Small proteins | Defensin-like |
| 5 | | | | Small proteins | Knottins |
| 6 | Mainly alpha | Orthogonal bundle | Histone, subunit A | All alpha proteins | Histone-fold |
| 7 | Mainly alpha | Orthogonal bundle | Lysozyme | Alpha and beta proteins (a + b) | Lysozyme-like |
| 8 | Alpha beta | 2-layer sandwich | Crambin | Small proteins | Crambin-like |
| 9 | Mainly alpha | Orthogonal bundle | NK-lysin | All alpha proteins | Saposin-like |
| 9 | Mainly alpha | Up-down bundle | Bacteriocin As-48; chain A | All alpha proteins | Saposin-like |
| 10 | Alpha beta | Roll | P-30 protein | Alpha and beta proteins (a + b) | RNase A-like |

Figure 2: The vertices represent the AMP structures, and an edge between two vertices exists if the TM-score > 0.5, indicating that the two structures at those vertices fall into the same fold [22]. Of the 136 fold clusters in ADAM, the 30 that contain more than one structure are displayed here.

2.4. AMP Structures Associated with ADAM Sequences

In the sequence-to-structure direction, AMP structures were obtained by running BLAST with the experimentally validated AMPs against PDB. In the structure-to-sequence direction, about one-eighth of the ADAM sequences, over a thousand in total, were found to be associated with the AMP structures; these associations were determined by running BLAST against the best matching AMP structures with an E-value < 10^-5. Here we list the top 10 common Pfam domains and families [23] found in the experimentally validated AMPs and their associations with the AMP structural fold clusters (Table 5). Seven of these common Pfam domains and families fall within the top 10 AMP structural folds. Table 5 also indicates that no structures are available for the AMPs in the Pfam family antimicrobial_1.


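Table 5

| Rank | Pfam | AMP structural fold cluster ID |
|---|---|---|
| 1 | Antimicrobial_2 | 3 |
| 2 | Antimicrobial_1 | NA |
| 3 | Defensin_beta | 4 |
| 4 | Gamma-thionin | 1 |
| 5 | Cyclotide | 5 |
| 6 | Defensin_2 | 1 |
| 7 | Defensin_1 | 4 |
| 8 | Bacteriocin_II | 33 |
| 9 | Cecropin | 106 |
| 10 | DD_K | 3 |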
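The counts quoted from Table 5 can be checked with a small lookup. The mapping below is transcribed from Table 5, with `None` standing for "NA"; the sketch assumes that cluster IDs 1 to 10 correspond to the top 10 folds of Table 4, and the function names are hypothetical.

```python
# Pfam family -> ADAM fold cluster ID, transcribed from Table 5.
# None marks a family with no available structure ("NA").
PFAM_TO_CLUSTER = {
    "Antimicrobial_2": 3,
    "Antimicrobial_1": None,
    "Defensin_beta": 4,
    "Gamma-thionin": 1,
    "Cyclotide": 5,
    "Defensin_2": 1,
    "Defensin_1": 4,
    "Bacteriocin_II": 33,
    "Cecropin": 106,
    "DD_K": 3,
}

def families_in_top_clusters(mapping, top_n=10):
    """Families whose fold cluster ID lies within the first `top_n` clusters."""
    return [fam for fam, cid in mapping.items()
            if cid is not None and cid <= top_n]

def families_sharing_cluster(mapping):
    """Clusters that are associated with more than one Pfam family."""
    by_cluster = {}
    for fam, cid in mapping.items():
        if cid is not None:
            by_cluster.setdefault(cid, []).append(fam)
    return {cid: fams for cid, fams in by_cluster.items() if len(fams) > 1}

top = families_in_top_clusters(PFAM_TO_CLUSTER)
shared = families_sharing_cluster(PFAM_TO_CLUSTER)
```

Grouping by cluster also makes it easy to see which Pfam families with different sequences map to the same structural fold.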

3. Implementations and Results

ADAM was built using AppServ 2.6.0. The Apache HTTP server was used, the server-side scripts were written in PHP, and the database was built with MySQL.

3.1. Multiple Search Capabilities

ADAM offers multiple search capabilities, which fall into two basic categories: sequence search and structural search. Each AMP entry is assigned an ADAM ID, which corresponds to a unique sequence and, if one was found, a matching structure. The sequence search covers the direct information of an AMP sequence, including the description, source species, sequence length, and Pfam domain. Because ADAM focuses on AMP structure and sequence information, it does not contain all of the information that other AMP databases provide; external links to those databases are therefore provided. In addition, the structural search allows users to retrieve the AMP information associated with specific PDB structures or ADAM fold clusters.
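The two search directions can be illustrated with a toy relational lookup. ADAM itself runs on PHP and MySQL; the sketch below instead uses Python's built-in sqlite3 module as a stand-in, and the table layout, column names, identifiers, and rows are all hypothetical, since the actual ADAM schema is not described here.

```python
import sqlite3

# In-memory database with a hypothetical one-table schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE amp (
        entry_id   TEXT PRIMARY KEY,      -- placeholder for an ADAM ID
        sequence   TEXT UNIQUE NOT NULL,  -- one unique sequence per entry
        structure  TEXT,                  -- best matching structure, if any
        cluster_id TEXT                   -- fold cluster, if any
    )
    """
)
conn.executemany(
    "INSERT INTO amp VALUES (?, ?, ?, ?)",
    [
        ("toy_0001", "KLAKLAKKLAKLAK", "struct_x1", "AC_001"),
        ("toy_0002", "GIGKFLHSAK", "struct_x2", "AC_001"),
        ("toy_0003", "FLPLIGRVLSGIL", None, None),
    ],
)

def sequences_in_cluster(conn, cluster_id):
    """Structure-to-sequence direction: entries linked to one fold cluster."""
    rows = conn.execute(
        "SELECT entry_id FROM amp WHERE cluster_id = ? ORDER BY entry_id",
        (cluster_id,),
    )
    return [row[0] for row in rows]

def structure_for_sequence(conn, sequence):
    """Sequence-to-structure direction: (structure, cluster) or None if absent."""
    return conn.execute(
        "SELECT structure, cluster_id FROM amp WHERE sequence = ?",
        (sequence,),
    ).fetchone()

in_cluster = sequences_in_cluster(conn, "AC_001")
linked = structure_for_sequence(conn, "KLAKLAKKLAKLAK")
```

Parameterized queries are used so that user-supplied search terms are never interpolated into the SQL text.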

3.2. Structure-Sequence Cluster Browsing

ADAM offers 136 AMP fold clusters, built by TM-score, for browsing. Each structure in a cluster is annotated by CATH, SCOP, and Pfam, if available. The AMP structures from all of the clusters occupy about 3% of the protein fold space defined by CATH and SCOP. Each cluster lists its associated AMP sequences.

For example, ADAM cluster #1 (AC_001) is a cluster of 26 structures associated with 207 AMP sequences. Detailed information can be found in Table S1. The structures in this cluster, gathered by TM-score, are consistently classified into the same CATH fold, the alpha-beta 2-layer sandwich defensin A-like structure, and the same SCOP fold, small protein knottins. SCOP further classifies these structures into four different SCOP families. In addition, this AMP structural fold is associated with six different Pfam domains (antimicrobial_6, defensin_2, gamma-thionin, toxin_2, toxin_3, and toxin_37), which supports the idea that different sequences folding into the same structure could behave similarly. Another interesting example is ADAM cluster #5 (AC_005), which contains 53 AMP sequences associated with the cyclotide Pfam family. Within this cluster, only four structures are annotated by SCOP. All four structures are again classified into the same SCOP fold, knottins, but fall into multiple SCOP families.

ADAM also allows users to retrieve the relevant AMP structures by CATH or SCOP classification through the accompanying hyperlinks. Both structure-to-sequence and sequence-to-structure browsing can therefore be performed in ADAM.

Each AMP cluster was further examined. Interestingly, the peptides in one AMP cluster consistently share the same mechanism of microbial killing, either transmembrane pore formation or metabolic inhibition of intracellular targets [24], suggesting that AMP structures may play a role in the killing action. For example, the AMPs in ADAM cluster #3 (AC_003) act through transmembrane pore formation, whereas those in ADAM cluster #6 (AC_006) are metabolic inhibitors of intracellular targets.

4. Discussions

ADAM, a comprehensive AMP database, provides easy access to AMP sequences, structures, and their relations. Two distinguishing characteristics of ADAM are its size and its sequence-structure analysis. ADAM contains 7,007 unique AMP sequences and 759 structures. To our knowledge, this is the first comprehensive study to analyze the various AMP structural folds. Our analysis demonstrates that AMP structures cover about 3% of the overall CATH or SCOP folds. Biologically, this implies that AMPs have more than one scheme for fighting microbes. The results also indicate that AMP structural folds are limited: the majority of protein structural folds lack antimicrobial activity.

The development of ADAM raises several interesting research questions that are beyond the scope of this study and remain to be explored. First, Table 5 shows that little is known about the structure of the Pfam family antimicrobial_1; such AMP structures need to be resolved by X-ray crystallography or NMR spectroscopy. Second, Table 4 illustrates the long-standing observation that CATH and SCOP classifications are not always consistent with each other [21]; the best approach to annotating protein structures is still to be determined. Third, despite the sequence differences between the Pfam antimicrobial_2 and DD_K domains, the two domains share the same alpha-helical structural fold; how two different domains maintain the same structural fold as well as antimicrobial activity requires further study.

ADAM, which offers complete AMP sequence and structure information, can benefit a number of AMP research areas, such as biomimetics in drug development, comparative immunomics, and structure-function analysis. For example, ADAM cluster #1 (AC_001) has 26 structures associated with 207 AMP sequences (Table S1). Not every structure in the cluster is annotated, but those that are belong to the same CATH and SCOP fold and match six different Pfam families. Such information can help to identify key elements for antimicrobial drug design.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The work was supported by National Science Council, Taiwan (NSC-102-2221-E-019-060). Hao-Ting Lee, Chen-Che Lee, and Je-Ruei Yang were partially supported by Center for Excellence for the Oceans at National Taiwan Ocean University. In addition, the authors thank Juin-Yiing Huang and De-Hao Chen for technical support.

Supplementary Materials

The content of the Supplementary Material falls into three main categories: (1) an example of an ADAM fold cluster, (2) the technical descriptions of the aliphatic index, instability index, and hydropathicity, and (3) the frequently asked questions about ADAM. In more detail, Table S1 lists ADAM fold cluster AC_001, with 26 AMP structures associated with 207 unique AMP sequences. Figures S1 and S2 illustrate how to browse ADAM either from AMP sequence to structure or from AMP structure to sequence. Figures S3 and S4 show the basis of the AMP prediction tools provided in ADAM, which use support vector machines and hidden Markov models.


References

  1. C. D. Fjell, J. A. Hiss, R. E. W. Hancock, and G. Schneider, “Designing antimicrobial peptides: form follows function,” Nature Reviews Drug Discovery, vol. 11, no. 1, pp. 37–51, 2012.
  2. R. Hammami, A. Zouhir, C. Le Lay, J. Ben Hamida, and I. Fliss, “BACTIBASE second release: a database and tool platform for bacteriocin characterization,” BMC Microbiology, vol. 10, article 22, 2010.
  3. A. de Jong, A. J. van Heel, J. Kok, and O. P. Kuipers, “BAGEL2: mining for bacteriocins in genomic data,” Nucleic Acids Research, vol. 38, no. 2, Article ID gkq365, pp. W647–W651, 2010.
  4. M. Novković, J. Simunić, V. Bojović, A. Tossi, and D. Juretić, “DADP: the database of anuran defense peptides,” Bioinformatics, vol. 28, no. 10, pp. 1406–1407, 2012.
  5. Y. Gueguen, J. Garnier, L. Robert et al., “PenBase, the shrimp antimicrobial peptide penaeidin database: sequence-based classification and recommended nomenclature,” Developmental & Comparative Immunology, vol. 30, no. 3, pp. 283–288, 2006.
  6. R. Hammami, J. Ben Hamida, G. Vergoten, and I. Fliss, “PhytAMP: a database dedicated to antimicrobial plant peptides,” Nucleic Acids Research, vol. 37, no. 1, pp. D963–D968, 2009.
  7. C. D. Fjell, R. E. W. Hancock, and A. Cherkasov, “AMPer: a database and an automated discovery tool for antimicrobial peptides,” Bioinformatics, vol. 23, no. 9, pp. 1148–1155, 2007.
  8. G. Wang, X. Li, and Z. Wang, “APD2: the updated antimicrobial peptide database and its application in peptide design,” Nucleic Acids Research, vol. 37, no. 1, pp. D933–D937, 2009.
  9. F. H. Waghu, L. Gopi, R. S. Barai, P. Ramteke, B. Nizami, and S. Idicula-Thomas, “CAMP: collection of sequences and structures of antimicrobial peptides,” Nucleic Acids Research, vol. 42, no. 1, pp. D1154–D1158, 2014.
  10. V. S. Sundararajan, M. N. Gabere, A. Pretorius et al., “DAMPD: a manually curated antimicrobial peptide database,” Nucleic Acids Research, vol. 40, no. 1, pp. D1108–D1112, 2012.
  11. S. Seebah, A. Suresh, S. Zhuo et al., “Defensins knowledgebase: a manually curated database and information source focused on the defensins family of antimicrobial peptides,” Nucleic Acids Research, vol. 35, no. 1, pp. D265–D268, 2007.
  12. S. P. Piotto, L. Sessa, S. Concilio, and P. Iannelli, “YADAMP: yet another database of antimicrobial peptides,” International Journal of Antimicrobial Agents, vol. 39, no. 4, pp. 346–351, 2012.
  13. H. Jenssen, P. Hamill, and R. E. W. Hancock, “Peptide antimicrobial agents,” Clinical Microbiology Reviews, vol. 19, no. 3, pp. 491–511, 2006.
  14. L. T. Nguyen, E. F. Haney, and H. J. Vogel, “The expanding scope of antimicrobial peptide structures and their modes of action,” Trends in Biotechnology, vol. 29, no. 9, pp. 464–472, 2011.
  15. N. Thakur, A. Qureshi, and M. Kumar, “AVPpred: collection and prediction of highly effective antiviral peptides,” Nucleic Acids Research, vol. 40, no. 1, pp. W199–W204, 2012.
  16. A. Qureshi, N. Thakur, and M. Kumar, “HIPdb: a database of experimentally validated HIV inhibiting peptides,” PLoS ONE, vol. 8, no. 1, Article ID e54908, 2013.
  17. Y. Li and Z. Chen, “RAPD: a database of recombinantly-produced antimicrobial peptides,” FEMS Microbiology Letters, vol. 289, no. 2, pp. 126–129, 2008.
  18. P. W. Rose, C. Bi, W. F. Bluhm et al., “The RCSB Protein Data Bank: new resources for research and education,” Nucleic Acids Research, vol. 41, no. 1, pp. D475–D482, 2013.
  19. A. Andreeva, D. Howorth, J.-M. Chandonia et al., “Data growth and its impact on the SCOP database: new developments,” Nucleic Acids Research, vol. 36, no. supplement 1, pp. D419–D425, 2008.
  20. I. Sillitoe, A. L. Cuff, B. H. Dessailly et al., “New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures,” Nucleic Acids Research, vol. 41, no. 1, pp. D490–D498, 2013.
  21. Y. Zhang and J. Skolnick, “Scoring function for automated assessment of protein structure template quality,” Proteins: Structure, Function and Genetics, vol. 57, no. 4, pp. 702–710, 2004.
  22. J. Xu and Y. Zhang, “How significant is a protein structure similarity with TM-score = 0.5?” Bioinformatics, vol. 26, no. 7, pp. 889–895, 2010.
  23. M. Punta, P. C. Coggill, R. Y. Eberhardt et al., “The Pfam protein families database,” Nucleic Acids Research, vol. 40, no. 1, pp. D290–D301, 2012.
  24. K. A. Brogden, “Antimicrobial peptides: pore formers or metabolic inhibitors in bacteria?” Nature Reviews Microbiology, vol. 3, no. 3, pp. 238–250, 2005.

Copyright © 2015 Hao-Ting Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

