Research Article

Predictive Models of Gene Regulation from High-Throughput Epigenomics Data

Table 4

Accuracy, measured as the area under the ROC curve (AUC) in 10-fold cross-validation, for the IC transcript sets under various training conditions. Results are shown for all transcript loci before (a) and after (b) filtering for overlaps on opposite strands and overlaps between promoters and tails (Section 2). P1 and P2 denote the models for each cell line pair with all attributes. P1 (with RNAPII) denotes pair P1 with the additional RNAPII attribute, that is, the same attributes as P2 plus RNAPII. P1 (CFS) and P2 (CFS) denote the models for P1 and P2, respectively, restricted to the attributes that score 80 or higher (maximum 100) with the CFS attribute selection method, applied independently to P1 and P2. P2 (CFS-P1) denotes the model trained on the data from P2 but with the attributes selected using CFS on P1. P1-on-P2 denotes the model trained on pair P1 with all attributes and tested on pair P2. P1 (CFS)-on-P2 denotes the model trained on pair P1 with only the selected attributes and tested on pair P2.
(a) Before filtering

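| Attributes | HCG-IC Up | HCG-IC Dw | HCG-IC Nr | HCG-IC Average | LCG-IC Up | LCG-IC Dw | LCG-IC Nr | LCG-IC Average |
|---|---|---|---|---|---|---|---|---|
| P1 (with RNAPII) | 0.8 | 0.79 | 0.74 | 0.78 | 0.82 | 0.87 | 0.78 | 0.83 |
| P1 | 0.79 | 0.79 | 0.74 | 0.77 | 0.83 | 0.86 | 0.76 | 0.82 |
| P1 (CFS) | 0.8 | 0.79 | 0.74 | 0.78 | 0.82 | 0.86 | 0.76 | 0.81 |
| P2 | 0.85 | 0.83 | 0.81 | 0.83 | 0.9 | 0.88 | 0.83 | 0.87 |
| P2 (CFS-P1) | 0.85 | 0.83 | 0.8 | 0.83 | 0.9 | 0.88 | 0.83 | 0.87 |
| P1-on-P2 | 0.83 | 0.77 | 0.63 | 0.74 | 0.88 | 0.83 | 0.71 | 0.81 |
| P1 (CFS)-on-P2 | 0.83 | 0.8 | 0.57 | 0.73 | 0.88 | 0.84 | 0.74 | 0.82 |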

(b) After filtering

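| Attributes | HCG-IC Up | HCG-IC Dw | HCG-IC Nr | HCG-IC Average | LCG-IC Up | LCG-IC Dw | LCG-IC Nr | LCG-IC Average |
|---|---|---|---|---|---|---|---|---|
| P1 (with RNAPII) | 0.79 | 0.84 | 0.76 | 0.8 | 0.85 | 0.9 | 0.81 | 0.86 |
| P1 | 0.79 | 0.82 | 0.75 | 0.79 | 0.86 | 0.89 | 0.76 | 0.84 |
| P1 (CFS) | 0.79 | 0.81 | 0.73 | 0.78 | 0.84 | 0.9 | 0.77 | 0.84 |
| P2 | 0.89 | 0.88 | 0.85 | 0.87 | 0.92 | 0.91 | 0.85 | 0.89 |
| P2 (CFS-P1) | 0.87 | 0.87 | 0.84 | 0.86 | 0.92 | 0.92 | 0.86 | 0.9 |
| P1-on-P2 | 0.89 | 0.87 | 0.7 | 0.82 | 0.92 | 0.89 | 0.79 | 0.87 |
| P1 (CFS)-on-P2 | 0.85 | 0.82 | 0.68 | 0.78 | 0.91 | 0.89 | 0.81 | 0.87 |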
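
As an illustration of the AUC measure reported in Table 4, the following minimal Python sketch computes a per-class AUC from classifier scores and averages it over the three classes (Up, Dw, Nr). It rests on assumptions not stated in the table: that each class is scored one-vs-rest, and that the Average column is the mean of the three per-class values (which appears consistent with the reported numbers). The function names `auc` and `one_vs_rest_auc` and the toy data are hypothetical; the sketch does not reproduce the article's training, cross-validation, or attribute selection.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: iterable of booleans (True = positive class).
    scores: iterable of floats, higher = more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_classes, class_scores):
    """Per-class AUC, treating each class in turn as the positive class.

    class_scores: dict mapping class name -> list of scores (one per example).
    This one-vs-rest scheme is an assumption, not the article's stated method.
    """
    return {c: auc([t == c for t in true_classes], s)
            for c, s in class_scores.items()}


# Toy data (hypothetical, not taken from the article).
truth = ["Up", "Up", "Dw", "Dw", "Nr", "Nr"]
scores = {
    "Up": [0.9, 0.6, 0.2, 0.1, 0.7, 0.3],
    "Dw": [0.1, 0.2, 0.8, 0.7, 0.3, 0.4],
    "Nr": [0.2, 0.3, 0.1, 0.4, 0.6, 0.5],
}
per_class = one_vs_rest_auc(truth, scores)
average = sum(per_class.values()) / len(per_class)
print(per_class, round(average, 2))
```

In a cross-dataset setting such as P1-on-P2, the same functions would be applied to scores produced by a model trained on one pair and evaluated on the other; only the origin of the scores changes, not the AUC computation.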