|
Database name | Database content | Data access and analysis support | URL |
|
Protein Sequence | | | |
|
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL, UniProt Archive (UniParc) [26] | UniProt protein sequences and functional information; a comprehensive, non-redundant database containing most of the world's publicly available protein sequences | Text search; BLAST sequence similarity search; Sequence alignment; Batch retrieval; Database ID mapping; FTP download | http://www.uniprot.org/ |
|
NCBI Reference Sequence (RefSeq) [27] | Non-redundant collection of richly annotated DNA, RNA, and protein sequences | Entrez query access; Searching Nucleotide or Protein; Searching Genome; BLAST; FTP download; Sequence Homology searches and retrieval | http://www.ncbi.nlm.nih.gov/ |
|
Gene and Genome | | | |
|
GenBank [28] | Genetic sequence database, an annotated collection of all publicly available DNA sequences | Database query; Phylogenetics; Genome Analyses; FTP download | http://www.ncbi.nlm.nih.gov/Genbank/ |
EMBL [29] | | | http://www.ebi.ac.uk/embl/ |
DDBJ [30] | | | http://www.ddbj.nig.ac.jp/ |
|
UniGene [31] | Non-redundant set of eukaryotic gene-oriented clusters of transcript sequences, together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location | Entrez query; Library browse; Digital Differential Display; FTP download | http://www.ncbi.nlm.nih.gov/unigene |
|
FlyBase [32] | Drosophila sequences and genomic information | Aberration Maps; Batch download; BLAST; Chromosome Maps; Coordinate Converter; CytoSearch; GBrowse; ID Converter; ImageBrowse; Interactions Browser; QueryBuilder; TermLink; FTP download | http://flybase.bio.indiana.edu/ |
|
Mouse Genome Database (MGD) [33] | Gene characterization, nomenclature, mapping, gene homologies among mammals, sequence links, phenotypes, allelic variants and mutants, and strain data | Genes & Markers Query; Sequence Query; MouseBLAST; Graphical Map Tools; Mouse Genome Browser; Batch Query; MGI Web Service | http://www.informatics.jax.org/ |
|
Saccharomyces Genome Database (SGD) [34] | Genetic and molecular biological information about Saccharomyces cerevisiae | Search Gene function information and Protein information; Specialized Gene and Sequence Searches; Search Yeast Literature; BLAST; Batch download; Pattern Matching; Genome Restriction Analysis; PDB Homology Query; Yeast Protein Motif Query; Yeast Biochemical Pathways; Gene Expression Connection | http://www.yeastgenome.org/ |
|
WormBase [35] | Data repository for C. elegans and C. briggsae | Gene, Phenotype, Protein, and Genetics Search; Microarray Expression download and Pattern search; Ontology Search | http://www.wormbase.org/ |
|
The Arabidopsis Information Resource (TAIR) [36] | The genetic and molecular biology information resource about Arabidopsis | Synteny Viewer; MapViewer; Pattern Matching; Motif Analysis; Bulk Data Retrieval; Chromosome Map Tool; Restriction Analysis | http://www.arabidopsis.org/ |
|
Taxonomy | | | |
|
NCBI Taxonomy [37] | Names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence | Browse; Retrieve and FTP download | http://www.ncbi.nlm.nih.gov/Taxonomy/ |
|
UniProt Taxonomy [26] | UniProt taxonomy database, which integrates taxonomy data compiled in the NCBI database and data specific to the UniProt Knowledgebase | Query the database by keywords (species name) or NCBI taxonomic identifier | http://www.uniprot.org/taxonomy/ |
|
Gene Expression | | | |
|
Gene Expression Omnibus (GEO) [38] | Public repository for high-throughput microarray experimental data | Search by accession number; Search Entrez GEO DataSets or Entrez GEO Profiles with keywords; Visualize cluster heat map images; Retrieve other genes with similar expression patterns; Retrieve chromosomally closest 20 genes; FTP download | http://www.ncbi.nlm.nih.gov/geo/ |
|
CleanEx [39] | Expression reference database that facilitates joint analysis and cross-dataset comparisons | Search by ID, Gene symbol and target ID; List expression datasets; Text search in expression datasets description lines; Extract all features of common genes between datasets; Experiments pools comparison; Batch retrieval; FTP download | http://www.cleanex.isb-sib.ch/ |
|
SOURCE [40] | Functional genomics resource for human, mouse and rat to facilitate the analysis of large sets of data using genome-scale experimental approaches | Search by CloneID, Database Accession, Gene name/Symbol, UniGene ClusterID, Probe ID, and Entrez GeneID; Batch retrieval | http://source.stanford.edu/ |
|
ArrayExpress [41] | Public repository for well-annotated data from array based platforms, including gene expression, comparative genomic hybridization (CGH) and chromatin-immunoprecipitation (ChIP) experiments, tiling arrays, and so forth | Web-based query interface; REST and Web-services access; FTP download; Web-based online microarray analysis tool—Expression Profiler | http://www.ebi.ac.uk/microarray-as/ae |
|
Proteomic Peptide ID Databases | | | |
|
Global Proteome Machine Database (GPMDB) [42] | Database that uses the information obtained by GPM servers to aid in validating peptides and protein coverage patterns | Search by protein description keywords and data set keywords | http://gpmdb.thegpm.org/ |
|
PRoteomics IDEntifications Database (PRIDE) [43] | Public data repository for proteomics data | Search by PRIDE Experiment accession number and Protein accessions; Browse experiments by project name or categories such as species, tissue, cell type, GO terms and disease; Ontology Lookup Service (OLS); Protein Identifier Cross Reference (PICR) service; Database on Demand (DOD) | http://www.ebi.ac.uk/pride/ |
|
Peptidome [44] | Public repository that archives and freely distributes tandem mass spectrometry peptide and protein identification data | Search by Accession, Author, Description, MeSH Terms, Organism, Peptide Count, Platform, Protein Count, Protein GI, Publication Date, Search Engine, Spectra Count, Submitter Institute, Title, Update Date | http://www.ncbi.nlm.nih.gov/peptidome |
|
PeptideAtlas [45] | Database of peptides identified in tandem mass spectrometry proteomics experiments | Search by Protein/Gene Name, Protein/Gene ID, Protein/Gene Symbol, Accession, RefSeq, Sequence and Peptide Accession; Browse Peptides; Browse Proteins; FTP download | http://www.peptideatlas.org/ |
|
Protein Expression | | | |
|
Swiss-2DPAGE [46] | Annotated 2D gel electrophoresis database containing data on proteins identified on various 2D PAGE and SDS-PAGE reference maps | Search by description, accession number, author, spot serial number, experimental pI/Mw range and experimental identification methods; Retrieve all the protein entries identified on a given reference map; Compute estimated location on reference maps for a user-entered sequence; FTP download | http://ca.expasy.org/ch2d |
|
Function and Pathway | | | |
|
Kyoto Encyclopedia of Genes and Genomes (KEGG) [47] | Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information | Access by KEGG object identifier; KEGG Web Services and KEGG FTP download; Pathway Mapping; Brite Mapping; KegHier for browsing and searching functional hierarchies in KEGG BRITE; KegArray for analysis of transcriptome data (gene expression profiles) and metabolome data (compound profiles) | http://www.genome.jp/kegg/ |
|
BioCyc [48] | Microbial pathway/genome databases | Visualize individual metabolic pathways; View the complete metabolic map of an organism; Genome browsing capabilities and comparative analysis tools | http://biocyc.org/ |
|
Genetic Variation and Disease | | | |
|
Online Mendelian Inheritance in Man (OMIM) [49] | A catalog of human genetic and genomic phenotypes | Entrez search at basic, advanced, or complex Boolean levels; Browse entries; Build query; Combine search results; Store search results in Clipboard; FTP download | http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim |
|
HapMap [50] | Resource for human genetic variation | Browse data; Bulk data download; HapMart—a data mining tool for retrieving data from the HapMap database | http://www.hapmap.org/ |
|
Ontology | | | |
|
Gene Ontology (GO) [51] | Controlled vocabulary of terms describing Biological process, Cellular component, and Molecular function, together with gene and gene product annotation data | Browsers; Microarray tools; Annotation tools; Mapping to other databases; FTP download in flat file, MySQL, or RDF XML format | http://www.geneontology.org/ |
|
Interaction | | | |
|
IntAct [52] | Protein-protein interaction data | Browse by UniProt Taxonomy, Gene Ontology, InterPro Domain, Reactome Pathway, Chromosomal Location, and mRNA expression; FTP download in PSI-MI and PSI-MI TAB format | http://www.ebi.ac.uk/intact |
|
Database of Interacting Proteins (DIP) [53] | Database of experimentally determined interactions between proteins, with annotations generated by curators or computational methods | Search by protein entry, BLAST, Motif, Article and pathBLAST; Data analysis services include Expression Profile Reliability Index, Paralogous Verification, and Domain Pair Verification | http://dip.doe-mbi.ucla.edu/ |
|
Modification | | | |
|
RESID [54] | Collection of annotations and structures for Protein Pre-, Co- and Post-translational modifications | Web-based search interface; FTP download database entries in XML format, and associated files containing XML DTD, graphic images, and molecular models | http://www.ebi.ac.uk/RESID |
|
Phosphosite [55] | Database of phosphorylation sites and other Post-translational modifications | Search by Protein, Sequence, or Reference; Browse MS data by Disease, Cell Line, and Tissue | http://www.phosphosite.org/ |
|
Structure | | | |
|
Protein Data Bank (PDB) [56] | Database of experimentally-determined structures of proteins, nucleic acids, and complex assemblies | Web-based search and browsing interface; File download via http and FTP services in PDB, mmCIF, and PDBML/XML format | http://www.pdb.org/pdb/home/home.do |
|
Structural Classification of Proteins (SCOP) [57] | Comprehensive ordering of all proteins of known structure according to their evolutionary and structural relationships | Keyword-based search | http://scop.mrc-lmb.cam.ac.uk/ |
|
CATH [58] | Protein domain structures database | Search by ID/Keywords and FASTA sequence; BLAST; Cathedral server and SSAP server for querying and analyzing CATH data; FTP download | http://www.cathdb.info/ |
|
Molecular Modeling Database (MMDB) [59] | Database of 3D structures | Search by UID/text term, protein sequence and 3D coordinates; FTP download | http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml |
|
PDBsum [60] | Summaries and analyses of PDB structures | Search by text or sequence; Browse by Highlights, List of PDB codes, Het Groups, Ligands, Enzymes, ProSite and Species; Download data file for protein names, protein sequences, protein annotations, Enzymes, Het Groups, and Ligands | http://www.ebi.ac.uk/pdbsum |
|
Protein Structure Model Database (Modbase) [61] | Annotated comparative protein structure models and related resources | Search by model or sequence similarity and properties | http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi |
|
Classification | | | |
|
PIRSF [62] | Family/superfamily classification of whole proteins | Batch retrieval using UniProtKB AC, PIRSF ID, Pfam ID, COG ID, EC Number, GO ID, KEGG Pathway ID, PDB ID; PIRSF scan by sequence or UniProtKB identifier; FTP download | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml |
|
UniProt Reference Clusters (UniRef) [26] | UniProt non-redundant reference clusters | Searches on various attributes of the UniRef clusters, including UniRef cluster ID, protein names, organism names and database identifiers; Direct web access in HTML, XML and FASTA format; FTP download in XML format | http://www.uniprot.org/help/uniref |
|
Pfam [63] | Protein domain families, each represented by multiple sequence alignments and hidden Markov models (HMMs) | Search by Sequence, Functional similarity, Keyword, Domain, DNA, and Taxonomy; Browse by Families, Clans, Proteomics; FTP download | http://pfam.sanger.ac.uk |
|
InterPro [64] | Integrated resource of protein families, domains, and functional sites | Text search; SRS text search; InterProScan; InterPro BioMart; Web services; FTP download | http://www.ebi.ac.uk/interpro |
|
Protein ANalysis THrough Evolutionary Relationships (PANTHER) Classification System [65] | Gene products organized by biological function | Search; Browse; Batch search; Gene expression data analysis; Evolutionary analysis of coding SNPs; HMM sequence scoring; FTP download | http://www.pantherdb.org/panther |
|
Simple Modular Architecture Research Tool (SMART) [66] | Resource for protein domain identification and the analysis of protein domain architectures | Sequence analysis; Architecture analysis; Domain detection | http://smart.embl-heidelberg.de/ |
|