Advances in Bioinformatics / 2010 / Article / Tab 1

Review Article

Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput “omics” Data

Table 1

Commonly used molecular biology databases for functional analysis of gene and protein expression data.

Database name | Database content | Data access and analysis support | URL

Protein Sequence

UniProtKB/Swiss-Prot and UniProtKB/TrEMBL, UniProt Archive (UniParc) [26] | UniProt protein sequences and functional information; a comprehensive, non-redundant database containing most of the world's publicly available protein sequences | Text search; BLAST sequence similarity search; Sequence alignment; Batch retrieval; Database ID mapping; FTP download

NCBI Reference Sequence (RefSeq) [27] | Non-redundant collection of richly annotated DNA, RNA, and protein sequences | Entrez query access; Searching Nucleotide or Protein; Searching Genome; BLAST; FTP download; Sequence Homology searches and retrieval

Gene and Genome

GenBank [28], EMBL [29], DDBJ [30] | Genetic sequence databases; annotated collections of all publicly available DNA sequences | Database query; Phylogenetics; Genome Analyses; FTP download

UniGene [31] | Non-redundant set of eukaryotic gene-oriented clusters of transcript sequences, together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location | Entrez query; Library browse; Digital Differential Display; FTP download

FlyBase [32] | Drosophila sequences and genomic information | Aberration Maps; Batch download; BLAST; Chromosome Maps; Coordinate Converter; CytoSearch; GBrowse; ID Converter; ImageBrowse; Interactions Browser; QueryBuilder; TermLink; FTP download

Mouse Genome Database (MGD) [33] | Gene characterization, nomenclature, mapping, gene homologies among mammals, sequence links, phenotypes, allelic variants and mutants, and strain data | Genes & Markers Query; Sequence Query; MouseBLAST; Graphical Map Tools; Mouse Genome Browser; Batch Query; MGI Web Service

Saccharomyces Genome Database (SGD) [34] | Genetic and molecular biological information about Saccharomyces cerevisiae | Search Gene function information and Protein information; Specialized Gene and Sequence Searches; Search Yeast Literature; BLAST; Batch download; Pattern Matching; Genome Restriction Analysis; PDB Homology Query; Yeast Protein Motif Query; Yeast Biochemical Pathways; Gene Expression Connection

WormBase [35] | Data repository for C. elegans and C. briggsae | Gene, Phenotype, Protein, and Genetics Search; Microarray Expression download and Pattern search; Ontology Search

The Arabidopsis Information Resource (TAIR) [36] | Genetic and molecular biology information resource for Arabidopsis | Synteny Viewer; MapViewer; Pattern Matching; Motif Analysis; Bulk Data Retrieval; Chromosome Map Tool; Restriction Analysis

NCBI Taxonomy [37] | Names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence | Browse; Retrieve; FTP download

UniProt Taxonomy [26] | UniProt taxonomy database, which integrates taxonomy data compiled in the NCBI database and data specific to the UniProt Knowledgebase | Query the database by keywords (species name) or NCBI taxonomic identifier

Gene Expression

Gene Expression Omnibus (GEO) [38] | Public repository for high-throughput microarray experimental data | Search by accession number; Search Entrez GEO DataSets or Entrez GEO Profiles with keywords; Visualize cluster heat map images; Retrieve other genes with similar expression patterns; Retrieve chromosomally closest 20 genes; FTP download

CleanEx [39] | Expression reference database that facilitates joint analysis and cross-dataset comparisons | Search by ID, Gene symbol, and target ID; List expression datasets; Text search in expression dataset description lines; Extract all features of common genes between datasets; Experiment pool comparison; Batch retrieval; FTP download

SOURCE [40] | Functional genomics resource for human, mouse, and rat to facilitate the analysis of large sets of data using genome-scale experimental approaches | Search by CloneID, Database Accession, Gene name/Symbol, UniGene ClusterID, Probe ID, and Entrez GeneID; Batch retrieval

ArrayExpress [41] | Public repository for well-annotated data from array-based platforms, including gene expression, comparative genomic hybridization (CGH) and chromatin-immunoprecipitation (ChIP) experiments, tiling arrays, and so forth | Web-based query interface; REST and Web-services access; FTP download; Expression Profiler, a web-based online microarray analysis tool

Proteomic Peptide ID Databases

Global Proteome Machine Database (GPMDB) [42] | Database that uses the information obtained by GPM servers to aid in peptide validation as well as protein coverage patterns | Search by protein description keywords and data set keywords

PRoteomics IDEntifications Database (PRIDE) [43] | Public data repository for proteomics data | Search by PRIDE Experiment accession number and Protein accessions; Browse experiments by project name or categories such as species, tissue, cell type, GO terms, and disease; Ontology Lookup Service (OLS); Protein Identifier Cross Reference (PICR) service; Database on Demand (DOD)
Peptidome [44] | Public repository that archives and freely distributes tandem mass spectrometry peptide and protein identification data | Search by Accession, Author, Description, MeSH Terms, Organism, Peptide Count, Platform, Protein Count, Protein GI, Publication Date, Search Engine, Spectra Count, Submitter Institute, Title, Update Date

PeptideAtlas [45] | Database of peptides identified in tandem mass spectrometry proteomics experiments | Search by Protein/Gene Name, Protein/Gene ID, Protein/Gene Symbol, Accession, RefSeq, Sequence, and Peptide Accession; Browse Peptides; Browse Proteins; FTP download

Protein Expression

Swiss-2DPAGE [46] | Annotated 2D gel electrophoresis database containing data on proteins identified on various 2D PAGE and SDS-PAGE reference maps | Search by description, accession number, author, spot serial number, experimental pI/Mw range, and experimental identification methods; Retrieve all the protein entries identified on a given reference map; Compute estimated location on reference maps for a user-entered sequence; FTP download

Function and Pathway

Kyoto Encyclopedia of Genes and Genomes (KEGG) [47] | Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information | Access by KEGG object identifier; KEGG Web Services and KEGG FTP download; Pathway Mapping; Brite Mapping; KegHier for browsing and searching functional hierarchies in KEGG BRITE; KegArray for analysis of transcriptome data (gene expression profiles) and metabolome data (compound profiles)

BioCyc [48] | Microbial pathway/genome databases | Visualize individual metabolic pathways; View the complete metabolic map of an organism; Genome browsing capabilities and comparative analysis tools

Genetic Variation and Disease

Online Mendelian Inheritance in Man (OMIM) [49] | A catalog of human genetic and genomic phenotypes | Entrez search at basic, advanced, or complex Boolean levels; Browse entries; Build query; Combine search results; Store search results in Clipboard; FTP

HapMap [50] | Resource for human genetic variation | Browse data; Bulk data download; HapMart, a data mining tool for retrieving data from the HapMap database


Gene Ontology (GO) [51] | Controlled vocabulary of terms describing biological process, cellular component, and molecular function for gene and gene product annotation data | Browsers; Microarray tools; Annotation tools; Mapping to other databases; FTP download in flat file, MySQL, or RDF XML format

IntAct [52] | Protein-protein interaction data | Browse by UniProt Taxonomy, Gene Ontology, InterPro Domain, Reactome Pathway, Chromosomal Location, and mRNA expression; FTP download in PSI-MI and PSI-MI TAB formats

Database of Interacting Proteins (DIP) [53] | Database of experimentally determined interactions between proteins, with annotations generated by curators or computational methods | Search by protein entry, BLAST, Motif, Article, and pathBLAST; Data analysis services include Expression Profile Reliability Index, Paralogous Verification, and Domain Pair Verification


RESID [54] | Collection of annotations and structures for protein pre-, co-, and post-translational modifications | Web-based search interface; FTP download of database entries in XML format, with associated files containing XML DTD, graphic images, and molecular models

Phosphosite [55] | Database of phosphorylation sites and other post-translational modifications | Search by Protein, Sequence, or Reference; Browse MS data by Disease, Cell Line, and Tissue


Protein Data Bank (PDB) [56] | Database of experimentally determined structures of proteins, nucleic acids, and complex assemblies | Web-based search and browsing interface; File download via HTTP and FTP services in PDB, mmCIF, and PDBML/XML formats

Structural Classification of Proteins (SCOP) [57] | Comprehensive ordering of all proteins of known structure according to their evolutionary and structural relationships | Keywords-based search

CATH [58] | Protein domain structures database | Search by ID/Keywords and FASTA sequence; BLAST; Cathedral server and SSAP server for querying and analyzing CATH data; FTP download

Molecular Modeling Database (MMDB) [59] | Database of 3D structures | Search by UID/text term, protein sequence, and 3D coordinates; FTP download

PDBsum [60] | Summaries and analyses of PDB structures | Search by text or sequence; Browse by Highlights, List of PDB codes, Het Groups, Ligands, Enzymes, ProSite, and Species; Download data file for protein names, protein sequences, protein annotations, Enzymes, Het Groups, and Ligands

Protein Structure Model Database (ModBase) [61] | Annotated comparative protein structure models and related resources | Search by model or sequence similarity and properties


PIRSF [62] | Family/superfamily classification of whole proteins | Batch retrieval using UniProtKB AC, PIRSF ID, Pfam ID, COG ID, EC Number, GO ID, KEGG Pathway ID, PDB ID; PIRSF scan by sequence or UniProtKB identifier; FTP download
UniProt Reference Clusters (UniRef) [26] | UniProt non-redundant reference clusters | Searches on various attributes of the UniRef clusters, including UniRef cluster ID, protein names, organism names, and database identifiers; Direct web access in HTML, XML, and FASTA format; FTP download in XML format

Pfam [63] | Protein domain families, each represented by multiple sequence alignments and hidden Markov models (HMMs) | Search by Sequence, Functional similarity, Keyword, Domain, DNA, and Taxonomy; Browse by Families, Clans, Proteomics; FTP download

InterPro [64] | Integrated resource of protein families, domains, and functional sites | Text search; SRS text search; InterProScan; InterPro BioMart; Web services; FTP download

Protein ANalysis THrough Evolutionary Relationships (PANTHER) Classification System [65] | Gene products organized by biological function | Search; Browse; Batch search; Gene expression data analysis; Evolutionary analysis of coding SNPs; HMM sequence scoring; FTP download

Simple Modular Architecture Research Tool (SMART) [66] | Resource for protein domain identification and the analysis of protein domain architectures | Sequence analysis; Architecture analysis; Domain detection
