Research Article

Predictive Models of Gene Regulation from High-Throughput Epigenomics Data

Table 3

Regions considered per transcript locus for the calculation of the different attributes. We defined the 13 regions based on the gene annotations from Gencode version 7 (Ensembl 62).

| Type | Region | Description |
|---|---|---|
| Fixed-length regions | Promoter 2 kb | Region starting 2 kb upstream of the transcription start site (TSS) and ending 1 bp before the TSS |
| Fixed-length regions | Promoter 5 kb | Region starting 5 kb upstream of the TSS and ending 1 bp before the TSS |
| Fixed-length regions | TSS ± 2 kb | Region starting 2 kb upstream of the TSS and ending 2 kb downstream |
| Fixed-length regions | TSS ± 5 kb | Region starting 5 kb upstream of the TSS and ending 5 kb downstream |
| Fixed-length regions | pA ± 2 kb | Region starting 2 kb upstream of the pA and ending 2 kb downstream |
| Fixed-length regions | Tail | Region starting 1 bp after the pA and ending 2 kb downstream |
| Variable-length regions | First exon | Region corresponding to the first exon of the transcript locus |
| Variable-length regions | First intron | Region corresponding to the first intron of the transcript locus |
| Variable-length regions | GB | Gene body, that is, region between the TSS and the poly-adenylation site (pA) of an annotated transcript locus |
| Variable-length regions | GB3′ss | Region between the first 3′ splice-site and the pA of an annotated transcript locus |
| Variable-length regions | GB ± 1 kb | Gene body with additional 1 kb stretches up- and downstream |
| Variable-length regions | GB ± 5 kb | Gene body with additional 5 kb stretches up- and downstream |
| Variable-length regions | GB + 5 kb | Gene body with an additional 5 kb stretch downstream of the pA |
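
The region definitions in Table 3 can be illustrated with a minimal, strand-aware sketch that derives the 13 intervals from the exon coordinates of a single transcript. This is not the authors' implementation. The function name `regions_for_transcript`, the dictionary keys, and the coordinate convention (1-based, inclusive) are hypothetical choices made for illustration. The sketch further assumes that the TSS is the 5′ end of the first exon, that the pA is the 3′ end of the last exon, that the "±" windows include the anchor base itself, and that the first 3′ splice-site is taken as the first base of the second exon.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates, start <= end


def regions_for_transcript(
    exons: List[Interval], strand: str
) -> Dict[str, Optional[Interval]]:
    """Derive the 13 regions of Table 3 for one transcript (illustrative only)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not exons:
        raise ValueError("at least one exon is required")

    sign = 1 if strand == "+" else -1
    # Order exons 5'->3' along the transcript.
    ordered = sorted(exons, reverse=(strand == "-"))

    def five_prime(exon: Interval) -> int:
        return exon[0] if strand == "+" else exon[1]

    def three_prime(exon: Interval) -> int:
        return exon[1] if strand == "+" else exon[0]

    tss = five_prime(ordered[0])   # assumption: TSS = 5' end of first exon
    pa = three_prime(ordered[-1])  # assumption: pA = 3' end of last exon

    def span(anchor_a: int, off_a: int, anchor_b: int, off_b: int) -> Interval:
        # Offsets are in transcript orientation: negative = upstream.
        a = anchor_a + sign * off_a
        b = anchor_b + sign * off_b
        # Clip at the chromosome start; the chromosome end is not known here.
        return (max(1, min(a, b)), max(a, b))

    regions: Dict[str, Optional[Interval]] = {
        # Fixed-length regions
        "promoter_2kb": span(tss, -2000, tss, -1),
        "promoter_5kb": span(tss, -5000, tss, -1),
        "tss_pm_2kb": span(tss, -2000, tss, 2000),
        "tss_pm_5kb": span(tss, -5000, tss, 5000),
        "pa_pm_2kb": span(pa, -2000, pa, 2000),
        "tail": span(pa, 1, pa, 2000),
        # Variable-length regions
        "first_exon": ordered[0],
        "first_intron": None,
        "gb": span(tss, 0, pa, 0),
        "gb_3ss": None,
        "gb_pm_1kb": span(tss, -1000, pa, 1000),
        "gb_pm_5kb": span(tss, -5000, pa, 5000),
        "gb_plus_5kb": span(tss, 0, pa, 5000),
    }
    if len(ordered) > 1:
        # First intron lies between exon 1 and exon 2.
        regions["first_intron"] = span(
            three_prime(ordered[0]), 1, five_prime(ordered[1]), -1
        )
        # Assumption: the first 3' splice-site is the first base of exon 2.
        regions["gb_3ss"] = span(five_prime(ordered[1]), 0, pa, 0)
    return regions
```

For single-exon transcripts the sketch returns `None` for the first intron and GB3′ss, since neither is defined without a second exon. How such transcripts were handled in the original analysis is not stated in the table.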