International Journal of Genomics

Research Article

Predictive Models of Gene Regulation from High-Throughput Epigenomics Data

Table 5

Correctly classified instances in each transcript subset. The sets are filtered to avoid overlaps between gene bodies, promoters, or tails of transcript loci from different genes on the same or opposite strands (Section 2). Attribute selection was applied to each pair, P1 (CFS) and P2 (CFS), for each subset of intron-containing (IC) loci with high (HCG) or low (LCG) CG-content promoters. The attribute sets correspond to those in Table 4(b). P1 (CFS) denotes the model for P1 built with the attributes that have a score of 80 or higher (maximum 100) under the CFS attribute selection method. P2 (CFS-P1) indicates that the model was trained on the data from P2 using the attributes selected by CFS on P1. P1 (CFS)-on-P2 indicates that the model was trained on pair P1 with only the selected attributes and tested on pair P2.


Attributes	Transcript loci set	Instances in total	Correctly classified instances

P1 (CFS)	LCG-IC	1767	1185 (67.06%)
P1 (CFS)	HCG-IC	1959	1182 (60.34%)
P2 (CFS-P1)	LCG-IC	585	454 (77.60%)
P2 (CFS-P1)	HCG-IC	792	577 (72.85%)
P1 (CFS)-on-P2	LCG-IC	585	410 (70.09%)
P1 (CFS)-on-P2	HCG-IC	792	445 (56.19%)
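
The percentages in the last column are the ratio of correctly classified instances to total instances in each subset. As a minimal sketch, the Python snippet below recomputes them from the counts in the table. The names `ROWS` and `accuracy_percent` are illustrative and do not come from the article. Recomputed values may differ from the printed ones in the last digit because of rounding.

```python
# Counts copied from Table 5:
# (attributes, transcript loci set, instances in total, correctly classified)
ROWS = [
    ("P1 (CFS)", "LCG-IC", 1767, 1185),
    ("P1 (CFS)", "HCG-IC", 1959, 1182),
    ("P2 (CFS-P1)", "LCG-IC", 585, 454),
    ("P2 (CFS-P1)", "HCG-IC", 792, 577),
    ("P1 (CFS)-on-P2", "LCG-IC", 585, 410),
    ("P1 (CFS)-on-P2", "HCG-IC", 792, 445),
]


def accuracy_percent(correct, total):
    """Return the percentage of correctly classified instances."""
    return 100.0 * correct / total


# Print one line per table row with the recomputed percentage.
for attrs, subset, total, correct in ROWS:
    pct = accuracy_percent(correct, total)
    print(f"{attrs:<16}{subset:<8}{correct:>5}/{total:<6}{pct:.2f}%")
```

For both subsets, the model trained on P1 and tested on P2 classifies fewer instances correctly than the model trained on P2 with the same P1-selected attributes.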