Rule-Based Knowledge Acquisition Method for Promoter Prediction in Human and Drosophila Species
Table 4
Top 20 descriptors of 4-mer motifs. For each dataset, the 20 highest-scoring 4-mer motif descriptors are listed and marked according to whether they are contained in the reference set of 167 DNASDs. The descriptors of the TATA motif are ranked 199th and 98th when applied to the HPL and DPL datasets, respectively.
          HPL dataset                    DPL dataset
Rank      4-mer motif  Score  Included   4-mer motif  Score  Included
1         TGAA         1000   +          AAAG         1000   +
2         TGAT         941    +          AAGA         956    +
3         CCGG         878    −          TTCG         948    +
4         TATG         843    +          AGAA         922    −
5         TGGA         817    −          GAAA         866    −
6         GATG         770    +          AAGG         791    +
7         TCAA         739    +          CGCC         787    −
8         TACA         702    +          AGAT         777    −
9         AGGC         697    −          AATA         759    −
10        ATGA         694    +          TCGC         747    −
11        TTGA         672    +          TGAT         744    +
12        CGGC         662    −          TGAA         732    −
13        CAGG         651    −          ATCG         732    +
14        ATGT         634    −          TCGA         724    +
15        AGCG         633    −          CGGT         724    −
16        CGCG         629    −          ATAA         712    +
17        AGCC         618    −          CGAT         710    −
18        TCAT         595    −          CGCG         703    −
19        GAGC         592    +          GAAG         699    +
20        AGGG         582    −          ATAG         697    −
25                                       AAGT         642
199 / 98  TATA         111    −          TATA         365    −
+: included in the set of DNASDs. −: not included in the set of DNASDs.
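To make the pattern in Table 4 easier to check, the top-20 rows can be transcribed into a small script that counts how many motifs in each dataset carry the "+" mark and which motifs appear in both top-20 lists. This is an illustrative sketch only: the variable and function names (HPL_TOP20, DPL_TOP20, count_included, shared_motifs, scores_non_increasing) are hypothetical and are not part of the original method.

```python
"""Tally the top-20 4-mer descriptors of Table 4 (illustrative sketch)."""

# (rank, motif, score, included) transcribed from Table 4;
# included=True corresponds to the '+' mark, False to the '−' mark.
HPL_TOP20 = [
    (1, "TGAA", 1000, True), (2, "TGAT", 941, True),
    (3, "CCGG", 878, False), (4, "TATG", 843, True),
    (5, "TGGA", 817, False), (6, "GATG", 770, True),
    (7, "TCAA", 739, True), (8, "TACA", 702, True),
    (9, "AGGC", 697, False), (10, "ATGA", 694, True),
    (11, "TTGA", 672, True), (12, "CGGC", 662, False),
    (13, "CAGG", 651, False), (14, "ATGT", 634, False),
    (15, "AGCG", 633, False), (16, "CGCG", 629, False),
    (17, "AGCC", 618, False), (18, "TCAT", 595, False),
    (19, "GAGC", 592, True), (20, "AGGG", 582, False),
]

DPL_TOP20 = [
    (1, "AAAG", 1000, True), (2, "AAGA", 956, True),
    (3, "TTCG", 948, True), (4, "AGAA", 922, False),
    (5, "GAAA", 866, False), (6, "AAGG", 791, True),
    (7, "CGCC", 787, False), (8, "AGAT", 777, False),
    (9, "AATA", 759, False), (10, "TCGC", 747, False),
    (11, "TGAT", 744, True), (12, "TGAA", 732, False),
    (13, "ATCG", 732, True), (14, "TCGA", 724, True),
    (15, "CGGT", 724, False), (16, "ATAA", 712, True),
    (17, "CGAT", 710, False), (18, "CGCG", 703, False),
    (19, "GAAG", 699, True), (20, "ATAG", 697, False),
]


def count_included(rows):
    """Number of listed motifs marked '+' (contained in the DNASD set)."""
    return sum(1 for _, _, _, included in rows if included)


def shared_motifs(rows_a, rows_b):
    """Motifs that appear in both top-20 lists, sorted alphabetically."""
    return sorted({m for _, m, _, _ in rows_a} & {m for _, m, _, _ in rows_b})


def scores_non_increasing(rows):
    """Sanity check on the transcription: scores never rise with rank."""
    return all(a[2] >= b[2] for a, b in zip(rows, rows[1:]))


if __name__ == "__main__":
    print(count_included(HPL_TOP20), count_included(DPL_TOP20))
    print(shared_motifs(HPL_TOP20, DPL_TOP20))
```

Run as a script, it prints `9 9` followed by `['CGCG', 'TGAA', 'TGAT']`: nine of the 20 listed motifs are marked as included for each dataset, and three motifs rank in the top 20 of both.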