Research Article

Predictive Models of Gene Regulation from High-Throughput Epigenomics Data

Table 5

Correctly classified instances in each transcript subset. The sets are filtered to exclude transcript loci whose gene bodies, promoters or tails overlap those of transcript loci from a different gene on the same or the opposite strand (Section 2). Attribute selection has been applied to each pair, P1 (CFS) and P2 (CFS), for each subset of intron-containing (IC) loci with a high (HCG) or low (LCG) CG-content promoter. The attribute sets correspond to those in Table 4(b). P1 (CFS) denotes the model for P1 built only from the attributes that score 80 or higher (out of a maximum of 100) under the CFS attribute selection method. P2 (CFS-P1) denotes a model trained on the data from P2 but using the attributes selected by CFS on P1. P1 (CFS)-on-P2 denotes a model trained on pair P1 with only the selected attributes and tested on pair P2.

Attributes        Transcript loci set   Instances in total   Correctly classified instances

P1 (CFS)          LCG-IC                1767                 1185 (67.06%)
                  HCG-IC                1959                 1182 (60.34%)
P2 (CFS-P1)       LCG-IC                585                  454 (77.60%)
                  HCG-IC                792                  577 (72.85%)
P1 (CFS)-on-P2    LCG-IC                585                  410 (70.09%)
                  HCG-IC                792                  445 (56.19%)
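The attribute-selection rule in the Table 5 caption, and the way one P1-derived attribute list is shared by all three table rows, can be sketched in Python as follows. This is a minimal illustration, assuming each attribute already carries a CFS-derived score on a 0 to 100 scale; how that score is computed is not shown. The helpers `select_attributes` and `project`, and the attributes `attr_a`, `attr_b` and `attr_c` with their scores and values, are hypothetical placeholders, not the study's features or data.

```python
# Minimal sketch of the attribute-selection rule in the Table 5 caption.
# Assumption: every attribute already has a CFS-derived score between 0 and 100;
# the attribute names, scores and values below are hypothetical placeholders.

SCORE_THRESHOLD = 80  # "a score 80 or higher (maximum 100)"


def select_attributes(scores, threshold=SCORE_THRESHOLD):
    """Return, in sorted order, the names of attributes scoring >= threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


def project(rows, attributes):
    """Restrict each instance (a dict of attribute -> value) to `attributes`."""
    return [{name: row[name] for name in attributes} for row in rows]


# Hypothetical scores obtained by running CFS on pair P1.
p1_scores = {"attr_a": 100, "attr_b": 85, "attr_c": 60}
p1_selected = select_attributes(p1_scores)

# The P1-derived attribute list is reused in all three table rows:
#   P1 (CFS):        train and evaluate on P1 instances
#   P2 (CFS-P1):     train and evaluate on P2 instances
#   P1 (CFS)-on-P2:  train on P1 instances, test on P2 instances
p2_rows = [{"attr_a": 0.4, "attr_b": 1.2, "attr_c": 0.0}]  # hypothetical P2 instance
print(p1_selected)
print(project(p2_rows, p1_selected))
```

Selecting on P1 and then only projecting P2 onto that fixed list is what makes the P2 (CFS-P1) and P1 (CFS)-on-P2 rows comparable: both use exactly the same attributes and differ only in which pair the model is trained on.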
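Each percentage in Table 5 is the number of correctly classified instances divided by the total number of instances, times 100. The Python sketch below recomputes these values from the counts and also the change in accuracy on the P2 instances when the model is trained on P1 rather than on P2. That comparison assumes the P2 (CFS-P1) figures come from an evaluation on the P2 data (for example, cross-validation), which the caption does not state. The dictionary layout and the helper `accuracy` are illustrative choices, not part of the original analysis.

```python
# Recompute the Table 5 percentages from the instance counts.
# (attributes, transcript loci set) -> (instances in total, correctly classified, reported %)
TABLE_5 = {
    ("P1 (CFS)", "LCG-IC"): (1767, 1185, 67.06),
    ("P1 (CFS)", "HCG-IC"): (1959, 1182, 60.34),
    ("P2 (CFS-P1)", "LCG-IC"): (585, 454, 77.60),
    ("P2 (CFS-P1)", "HCG-IC"): (792, 577, 72.85),
    ("P1 (CFS)-on-P2", "LCG-IC"): (585, 410, 70.09),
    ("P1 (CFS)-on-P2", "HCG-IC"): (792, 445, 56.19),
}


def accuracy(total, correct):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


for (attrs, subset), (total, correct, reported) in TABLE_5.items():
    print(f"{attrs:16} {subset:7} {accuracy(total, correct):6.2f}% (reported {reported:.2f}%)")

# Accuracy change on the P2 instances when training on P1 instead of P2
# (assumes both rows are evaluated on the same P2 instances).
for subset in ("LCG-IC", "HCG-IC"):
    within = accuracy(*TABLE_5[("P2 (CFS-P1)", subset)][:2])
    cross = accuracy(*TABLE_5[("P1 (CFS)-on-P2", subset)][:2])
    print(f"{subset}: {cross - within:+.2f} percentage points")
```

Every recomputed value agrees with the reported one to within 0.01 percentage points. Training on P1 instead of P2 changes accuracy on the P2 instances by about -7.52 points for LCG-IC and -16.67 points for HCG-IC.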