Predictive Models of Gene Regulation from High-Throughput Epigenomics Data
Table 3
Regions considered per transcript locus for the calculation of the different attributes. We defined the 13 regions based on the gene annotations from Gencode version 7 (Ensembl 62).
| Type | Region | Description |
|---|---|---|
| Fixed-length regions | Promoter 2 kb | Region starting 2 kb upstream of the transcription start site (TSS) and ending 1 bp before the TSS |
| Fixed-length regions | Promoter 5 kb | Region starting 5 kb upstream of the TSS and ending 1 bp before the TSS |
| Fixed-length regions | TSS ± 2 kb | Region starting 2 kb upstream of the TSS and ending 2 kb downstream |
| Fixed-length regions | TSS ± 5 kb | Region starting 5 kb upstream of the TSS and ending 5 kb downstream |
| Fixed-length regions | pA ± 2 kb | Region starting 2 kb upstream of the poly-adenylation site (pA) and ending 2 kb downstream |
| Fixed-length regions | Tail | Region starting 1 bp after the pA and ending 2 kb downstream |
| Variable-length regions | First exon | Region corresponding to the first exon of the transcript locus |
| Variable-length regions | First intron | Region corresponding to the first intron of the transcript locus |
| Variable-length regions | GB | Gene body, that is, the region between the TSS and the pA of an annotated transcript locus |
| Variable-length regions | GB3′ss | Region between the first 3′ splice site and the pA of an annotated transcript locus |
| Variable-length regions | GB ± 1 kb | Gene body with additional 1 kb stretches up- and downstream |
| Variable-length regions | GB ± 5 kb | Gene body with additional 5 kb stretches up- and downstream |
| Variable-length regions | GB + 5 kb | Gene body with an additional 5 kb stretch downstream of the pA |
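
The coordinate-based regions of Table 3 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function name `locus_regions`, the helpers, and the coordinate convention (1-based, inclusive positions, with "upstream" and "downstream" interpreted relative to the strand of the transcript) are assumptions made here. The exon-dependent regions (first exon, first intron, GB3′ss) are left out because they require the exon structure from the annotation, and clipping at chromosome ends is not handled.

```python
from typing import Dict, Tuple

# Assumed convention: 1-based, inclusive genomic coordinates with start <= end.
Interval = Tuple[int, int]

KB = 1000


def offset(pos: int, delta: int, strand: str) -> int:
    """Move delta bp downstream (delta > 0) or upstream (delta < 0) of pos."""
    return pos + delta if strand == "+" else pos - delta


def span(a: int, b: int) -> Interval:
    """Return the two positions ordered as a genomic interval."""
    return (min(a, b), max(a, b))


def locus_regions(tss: int, pa: int, strand: str) -> Dict[str, Interval]:
    """Hypothetical helper: regions of Table 3 derivable from TSS, pA and strand."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    def rel(anchor: int, up: int, down: int) -> Interval:
        # Window from `up` bp upstream to `down` bp downstream of the anchor;
        # a negative value places that boundary on the opposite side.
        return span(offset(anchor, -up, strand), offset(anchor, down, strand))

    return {
        # Fixed-length regions
        "Promoter 2 kb": rel(tss, 2 * KB, -1),
        "Promoter 5 kb": rel(tss, 5 * KB, -1),
        "TSS ± 2 kb": rel(tss, 2 * KB, 2 * KB),
        "TSS ± 5 kb": rel(tss, 5 * KB, 5 * KB),
        "pA ± 2 kb": rel(pa, 2 * KB, 2 * KB),
        "Tail": rel(pa, -1, 2 * KB),
        # Variable-length regions that depend only on TSS and pA
        "GB": span(tss, pa),
        "GB ± 1 kb": span(offset(tss, -KB, strand), offset(pa, KB, strand)),
        "GB ± 5 kb": span(offset(tss, -5 * KB, strand), offset(pa, 5 * KB, strand)),
        "GB + 5 kb": span(tss, offset(pa, 5 * KB, strand)),
    }


if __name__ == "__main__":
    # Made-up transcript on the minus strand: the TSS has the higher coordinate.
    for name, (start, end) in locus_regions(20000, 10000, "-").items():
        print(f"{name}\t{start}\t{end}")
```

Expressing every boundary as a strand-aware offset from the TSS or the pA keeps the definitions identical for both strands; only the final ordering of the two endpoints differs.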