Research Article

GenSensor Suite: A Web-Based Tool for the Analysis of Gene and Protein Interactions, Pathways, and Regulation

Table 1

Gene sets available in the GenSensor Suite and how they were generated. All gene sets are species-specific and are updated regularly: the construction script for each gene set parses the updated annotations, and the resulting gene set replaces the previous version.

Gene set | Description of contents

Gene ontology terms | Gene sets represent Gene Ontology (GO) categories. Each GO gene set contains the genes annotated with the indicated term or any of its subterms [13].
Details: GO annotations were extracted from the gene2go file in NCBI’s Gene database. The GO terms and their relationships were extracted from the ontologies file on the Gene Ontology website. A gene set was created for each GO term, containing the genes annotated with that term or with any of its subterms.

KEGG pathways | Gene sets represent the biosynthetic and regulatory pathways in the KEGG database [14].
Details: pathway “.html” files were downloaded from KEGG, and Entrez Gene identifiers were extracted from each pathway’s image-map annotations. The identifiers extracted from each pathway formed a separate gene set.

IPI protein subcellular locations | Gene sets are collections of genes whose proteins are annotated with particular subcellular locations.
Details: the UniProt-formatted protein file was downloaded from the International Protein Index (IPI) at the EBI. Subcellular location terms were extracted from the “SUBCELLULAR LOCATION:” subsection of the comment (CC) lines in the protein annotations. UniGene identifiers in the annotations were converted to Entrez Gene identifiers using the “gene2unigene” file provided by the NCBI.

IPI protein key words | Gene sets are collections of genes whose proteins are annotated with particular key words.
Details: the UniProt-formatted protein files were downloaded from IPI at the EBI. Key words were extracted from the keyword (KW) lines in the protein annotations. UniGene identifiers were converted to Entrez Gene identifiers as above.

Disease-related publication genes (human only) | Gene sets are collections of genes referenced in PubMed publications related to the disease terms in MeSH.
Details: the MeSH ontologies were downloaded from the National Library of Medicine’s Medical Subject Headings (MeSH) website. The lowest-level terms (leaf nodes) were submitted to PubMed to identify all PubMed IDs matching each term as a MeSH major topic. The genes described in each publication were identified using the gene2pubmed file at the NCBI. If a gene from a publication was nonhuman, its human homolog was identified using the HomoloGene data from the NCBI. Publications discussing more than 100 genes were excluded, as these were generally nonspecific descriptions of EST libraries or microarrays.

Drug-related publication genes (human only) | Gene sets are collections of genes referenced in PubMed publications related to the chemical and drug terms in MeSH.
Details: as for the disease-related publication genes, using the MeSH chemical and drug terms in place of the disease terms.

miRNA targets | Gene sets represent collections of potential gene targets of particular microRNAs, as predicted by the Sanger Institute.
Details: the predicted miRNA targets and the miRNA data files were downloaded from the Sanger Institute. The “external identifier” in the targets file was converted to an Entrez Gene identifier using files provided by the NCBI.

TF binding sites | Gene sets represent collections of genes with particular transcription factor (TF) binding sites located within 1000 bp upstream of their transcription start site. TF binding sites were predicted with two settings: minimal false positives and minimal false negatives.
Details: genomic sequences for the regions 1000 bp upstream of all human and mouse RefSeq transcripts were obtained from the download pages for each organism at the UCSC Genome Browser. TRANSFAC analysis was performed using BioBase’s “Match” tool. The “TF Binding Sites (min. false pos.)” set contains the genes identified using the minimal false-positive vertebrate profiles (from minFP_good102.prf). Analyses were also performed with the minimal false-negative vertebrate profiles (from minFN_good102.prf).

Tissue-specific genes | Gene sets represent collections of genes whose expression is predominantly confined to a few tissues.
Details: mouse gene expression and annotation data were downloaded from the website of the Genomics Institute of the Novartis Research Foundation. The intensities across all tissues were summed for each probeset, and the intensity of each probeset in each tissue was compared with this total. If the intensity in a particular tissue was ≥25% of the total, the gene for that probeset was added to the collection of genes specific for that tissue.
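The “Gene ontology terms” row of Table 1 states that each GO gene set also includes genes annotated with any subterm, which amounts to propagating every annotation up the GO graph. The sketch below illustrates that idea under stated assumptions: gene2go lines are taken to be tab-separated with tax_id, GeneID, and GO_ID as the first three columns, the parent map is assumed to be already built from the ontology file, and which relationship types count as subterm edges (e.g. is_a, part_of) is left to the caller. The function names are hypothetical, not part of GenSensor.

```python
from collections import defaultdict


def read_gene2go(lines, tax_id):
    """Yield (gene_id, go_id) pairs for one species from gene2go-style lines.

    Assumption: tab-separated columns begin with tax_id, GeneID, GO_ID;
    lines starting with '#' are headers and are skipped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == str(tax_id):  # keep gene sets species-specific
            yield fields[1], fields[2]


def build_go_gene_sets(annotations, parents):
    """Build one gene set per GO term, including genes annotated to subterms.

    annotations: iterable of (gene_id, go_id) pairs.
    parents: dict go_id -> set of parent go_ids (hypothetical, pre-built).
    """
    ancestor_cache = {}

    def ancestors(term):
        # Memoised walk up the GO DAG; a term counts as its own ancestor.
        if term not in ancestor_cache:
            result = {term}
            for parent in parents.get(term, ()):
                result |= ancestors(parent)
            ancestor_cache[term] = result
        return ancestor_cache[term]

    gene_sets = defaultdict(set)
    for gene_id, go_id in annotations:
        # A gene annotated with a subterm also joins every ancestor's set.
        for term in ancestors(go_id):
            gene_sets[term].add(gene_id)
    return dict(gene_sets)
```

Caching ancestors keeps the propagation cheap even though many genes share the same deep terms.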
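For the “KEGG pathways” row, gene identifiers were pulled from the image maps of downloaded pathway pages. The exact KEGG markup is not given in the source, so this is only a sketch assuming that each `<area>` element’s `href` lists genes as organism-code tokens such as `hsa:5594`, with the number being the Entrez Gene ID; the real pages may differ. It uses Python’s standard `html.parser` module, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser


class AreaHrefCollector(HTMLParser):
    """Collect the href attribute of every <area> element in an image map."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "area":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def pathway_gene_ids(html_text, org_code="hsa"):
    """Return the gene IDs linked from one pathway page's image map.

    Assumption: gene links carry tokens like 'hsa:5594' (organism code, ':', ID).
    """
    collector = AreaHrefCollector()
    collector.feed(html_text)
    pattern = re.compile(rf"\b{re.escape(org_code)}:(\d+)\b")
    ids = set()
    for href in collector.hrefs:
        ids.update(pattern.findall(href))
    return ids
```

Calling `pathway_gene_ids` once per downloaded page yields one gene set per pathway; links to compounds or other maps are ignored because they lack the organism-code prefix.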
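The “IPI protein subcellular locations” row describes reading the “SUBCELLULAR LOCATION:” topic from the CC lines of UniProt-formatted records and mapping UniGene IDs to Entrez Gene IDs. A minimal sketch follows, assuming records end with `//`, CC topics start with `-!-`, UniGene cross-references look like `DR   UniGene; Hs.123; -.`, and the UniGene-to-Entrez dictionary has already been built (for example from gene2unigene). How location text is split into terms (items ending in `.`, keeping only the text before the first `;`) is an assumption of this sketch, not the authors’ stated rule.

```python
from collections import defaultdict


def parse_records(lines):
    """Split UniProt-format flat-file lines into records at '//' terminators."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if record:
                yield record
            record = []
        else:
            record.append(line)
    if record:
        yield record


def subcellular_locations(record):
    """Return location terms from the 'SUBCELLULAR LOCATION:' CC topic."""
    topics, current = [], None
    for line in record:
        if not line.startswith("CC   "):
            continue
        text = line[5:].strip()
        if text.startswith("-!-"):
            current = [text[3:].strip()]  # a new CC topic begins
            topics.append(current)
        elif text.startswith("---"):
            current = None  # separator before a copyright block
        elif current is not None:
            current.append(text)  # continuation line of the current topic
    prefix = "SUBCELLULAR LOCATION:"
    terms = set()
    for topic in topics:
        joined = " ".join(topic)
        if joined.startswith(prefix):
            # Assumed splitting rule: '.'-separated items, text before ';'.
            for item in joined[len(prefix):].split("."):
                term = item.split(";")[0].strip()
                if term:
                    terms.add(term)
    return terms


def unigene_ids(record):
    """Return UniGene IDs from assumed 'DR   UniGene; Hs.123; -.' lines."""
    return {line.split(";")[1].strip()
            for line in record if line.startswith("DR   UniGene;")}


def location_gene_sets(lines, unigene_to_gene):
    """Build location term -> Entrez Gene ID sets from flat-file lines."""
    gene_sets = defaultdict(set)
    for record in parse_records(lines):
        genes = {unigene_to_gene[u] for u in unigene_ids(record)
                 if u in unigene_to_gene}
        for term in subcellular_locations(record):
            gene_sets[term].update(genes)
    return dict(gene_sets)
```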
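The “IPI protein key words” row applies the same approach to the KW lines. In this sketch, records are assumed to be already split into lists of lines, keywords are assumed to be separated by `;` with the list ending in `.`, and the UniGene cross-reference format and mapping dictionary are the same assumptions as for the subcellular locations. Function names are hypothetical.

```python
from collections import defaultdict


def keywords(record):
    """Return the keywords on the KW lines of one UniProt-format record."""
    text = " ".join(line[5:].strip() for line in record if line.startswith("KW   "))
    if text.endswith("."):
        text = text[:-1]  # the keyword list ends with a full stop
    return {kw.strip() for kw in text.split(";") if kw.strip()}


def keyword_gene_sets(records, unigene_to_gene):
    """Build keyword -> Entrez Gene ID sets from already-split records.

    records: iterable of records, each a list of flat-file lines.
    unigene_to_gene: dict UniGene ID -> Entrez Gene ID (assumed pre-built).
    """
    gene_sets = defaultdict(set)
    for record in records:
        genes = set()
        for line in record:
            # Assumed cross-reference form: 'DR   UniGene; Hs.123; -.'
            if line.startswith("DR   UniGene;"):
                unigene = line.split(";")[1].strip()
                if unigene in unigene_to_gene:
                    genes.add(unigene_to_gene[unigene])
        for kw in keywords(record):
            gene_sets[kw].update(genes)
    return dict(gene_sets)
```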
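The two publication-based rows combine three steps: PubMed IDs per MeSH term, genes per publication from gene2pubmed, and mapping nonhuman genes to human homologs, with publications of more than 100 genes dropped. The sketch below assumes the PubMed query results are already available as a term-to-PubMed-ID dictionary, that gene2pubmed lines begin with tax_id, GeneID, PubMed_ID, and that HomoloGene lines begin with group ID, tax_id, GeneID. Counting genes per publication across all species, and skipping homology groups without exactly one human gene, are assumptions rather than documented GenSensor behavior.

```python
from collections import defaultdict

HUMAN_TAX_ID = "9606"
MAX_GENES_PER_PUBLICATION = 100


def read_gene2pubmed(lines):
    """Map PubMed ID -> set of (tax_id, gene_id); assumed column order."""
    pub_genes = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        tax_id, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
        pub_genes[pmid].add((tax_id, gene_id))
    return pub_genes


def human_homolog_map(homologene_lines):
    """Map (tax_id, gene_id) -> human gene ID via shared homology group IDs."""
    groups = defaultdict(list)
    for line in homologene_lines:
        group_id, tax_id, gene_id = line.rstrip("\n").split("\t")[:3]
        groups[group_id].append((tax_id, gene_id))
    mapping = {}
    for members in groups.values():
        human = [g for t, g in members if t == HUMAN_TAX_ID]
        if len(human) == 1:  # assumption: skip groups without a unique human gene
            for member in members:
                mapping[member] = human[0]
    return mapping


def term_gene_sets(term_pmids, pub_genes, homolog_map):
    """Build one human gene set per MeSH term from its PubMed IDs."""
    gene_sets = {}
    for term, pmids in term_pmids.items():
        genes = set()
        for pmid in pmids:
            members = pub_genes.get(pmid, set())
            if len(members) > MAX_GENES_PER_PUBLICATION:
                continue  # likely a nonspecific EST-library or microarray paper
            for tax_id, gene_id in members:
                if tax_id == HUMAN_TAX_ID:
                    genes.add(gene_id)
                elif (tax_id, gene_id) in homolog_map:
                    genes.add(homolog_map[(tax_id, gene_id)])
        gene_sets[term] = genes
    return gene_sets
```

Nonhuman genes without a mapped human homolog simply drop out, which keeps the resulting sets human-only.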
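In the “TF binding sites” row, the upstream sequences were downloaded from UCSC rather than computed, but what “1000 bp upstream of the transcription start site” means depends on strand. The sketch below only illustrates that convention, assuming UCSC-style 0-based, half-open transcript coordinates (txStart, txEnd): on the plus strand the window lies left of txStart, and on the minus strand it lies right of txEnd and is reverse-complemented. The function names are hypothetical, and the Match analysis itself is not reproduced.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_window(tx_start, tx_end, strand, chrom_size, length=1000):
    """Return (start, end) of the region `length` bp upstream of the TSS.

    Coordinates are 0-based and half-open, clipped to the chromosome.
    """
    if strand == "+":
        # The TSS is at txStart; upstream lies to its left.
        return max(0, tx_start - length), tx_start
    # On the minus strand the TSS is at txEnd; upstream lies to its right.
    return tx_end, min(chrom_size, tx_end + length)


def upstream_sequence(chrom_seq, tx_start, tx_end, strand, length=1000):
    """Return the upstream region read 5' to 3' on the transcript's strand."""
    start, end = upstream_window(tx_start, tx_end, strand, len(chrom_seq), length)
    seq = chrom_seq[start:end]
    return seq if strand == "+" else seq.translate(COMPLEMENT)[::-1]
```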
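The “Tissue-specific genes” rule can be written directly: sum each probeset’s intensities over all tissues, then assign the probeset’s gene to every tissue holding at least 25% of that total. This sketch assumes intensities are given as a nested dictionary (probeset, then tissue) with nonnegative values and that a probeset-to-gene mapping is available from the annotation data; both structures are hypothetical.

```python
from collections import defaultdict

TISSUE_FRACTION_THRESHOLD = 0.25


def tissue_specific_sets(intensities, probe_to_gene, threshold=TISSUE_FRACTION_THRESHOLD):
    """Build tissue -> gene sets from per-probeset, per-tissue intensities.

    intensities: dict probeset -> dict tissue -> intensity (assumed >= 0).
    probe_to_gene: dict probeset -> gene ID.
    """
    gene_sets = defaultdict(set)
    for probeset, by_tissue in intensities.items():
        total = sum(by_tissue.values())  # summed intensity across all tissues
        gene = probe_to_gene.get(probeset)
        if gene is None or total <= 0:
            continue
        for tissue, value in by_tissue.items():
            if value >= threshold * total:  # tissue holds >= 25% of the total
                gene_sets[tissue].add(gene)
    return dict(gene_sets)
```

Because a probeset’s per-tissue fractions sum to 1, at most four tissues can pass a 25% threshold, which is what keeps each gene confined to a few tissues.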