Research Article

K-mer-Based Motif Analysis in Insect Species across Anopheles, Drosophila, and Glossina Genera and Its Application to Species Classification

Table 3

CC statistics for 5′ UTRs, 3′ UTRs, and introns at k-mer lengths k = 7–9 bp, within the Drosophila family and between A. gambiae and Drosophila.

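| Comparison | Region | k | Min | Median | Mean | Max | St. dev. | n | p value |
|---|---|---|---|---|---|---|---|---|---|
| Within Drosophila family | 5′ UTR | 7 | 0.692 | 0.862 | 0.841 | 0.975 | 0.100 | 21 | NA |
| A. gambiae vs. Drosophila family | 5′ UTR | 7 | 0.623 | 0.734 | 0.722 | 0.774 | 0.050 | 7 | 5.1e−4 |
| Within Drosophila family | 3′ UTR | 7 | 0.651 | 0.828 | 0.809 | 0.963 | 0.101 | 21 | NA |
| A. gambiae vs. Drosophila family | 3′ UTR | 7 | 0.524 | 0.620 | 0.599 | 0.644 | 0.043 | 7 | 6.2e−8 |
| Within Drosophila family | Introns | 7 | 0.759 | 0.894 | 0.895 | 0.996 | 0.058 | 66 | NA |
| Within Drosophila family | 5′ UTR | 8 | 0.503 | 0.786 | 0.737 | 0.940 | 0.153 | 21 | NA |
| A. gambiae vs. Drosophila family | 5′ UTR | 8 | 0.422 | 0.643 | 0.620 | 0.694 | 0.090 | 7 | 0.024 |
| Within Drosophila family | 3′ UTR | 8 | 0.487 | 0.705 | 0.688 | 0.908 | 0.125 | 21 | NA |
| A. gambiae vs. Drosophila family | 3′ UTR | 8 | 0.392 | 0.513 | 0.498 | 0.562 | 0.055 | 7 | 1.2e−5 |
| Within Drosophila family | Introns | 8 | 0.392 | 0.690 | 0.676 | 0.981 | 0.135 | 66 | NA |
| Within Drosophila family | 5′ UTR | 9 | 0.280 | 0.626 | 0.569 | 0.854 | 0.183 | 21 | NA |
| A. gambiae vs. Drosophila family | 5′ UTR | 9 | 0.201 | 0.453 | 0.431 | 0.512 | 0.104 | 7 | 0.023 |
| Within Drosophila family | 3′ UTR | 9 | 0.334 | 0.526 | 0.524 | 0.795 | 0.122 | 21 | NA |
| A. gambiae vs. Drosophila family | 3′ UTR | 9 | 0.242 | 0.360 | 0.356 | 0.422 | 0.056 | 7 | 5.9e−5 |
| Within Drosophila family | Introns | 9 | 0.721 | 0.855 | 0.854 | 0.978 | 0.062 | 66 | NA |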

Minimum, median, mean, and maximum CC values were calculated for the 5′ UTR, 3′ UTR, and intron regions of different Drosophila species, compared with each other and with A. gambiae. The number of comparisons (n) and the p value are also included. For 5′ and 3′ UTRs, the following Drosophila species were examined: D. ananassae, erecta, grimshawi, melanogaster, mojavensis, pseudoobscura, and simulans. For introns, the following Drosophila species were examined: D. ananassae, erecta, grimshawi, melanogaster, mojavensis, persimilis, pseudoobscura, sechellia, simulans, virilis, willistoni, and yakuba.
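
The comparison counts in the table are consistent with pairwise species comparisons: 7 species give 21 unordered pairs, 12 species give 66, and one outgroup against 7 species gives 7. The following Python sketch illustrates that arithmetic and the summary columns. It assumes that n counts pairwise comparisons and that the standard deviation is the sample standard deviation; neither is stated in the table. The function names are hypothetical and are not taken from the article.

```python
from itertools import combinations
from statistics import mean, median, stdev

# Species lists as given in the table footnote.
UTR_SPECIES = [
    "ananassae", "erecta", "grimshawi", "melanogaster",
    "mojavensis", "pseudoobscura", "simulans",
]
INTRON_SPECIES = UTR_SPECIES + [
    "persimilis", "sechellia", "virilis", "willistoni", "yakuba",
]


def within_group_pairs(species):
    """All unordered species pairs within one group (hypothetical helper)."""
    return list(combinations(species, 2))


def versus_outgroup_pairs(outgroup, species):
    """One comparison per species against a single outgroup (hypothetical helper)."""
    return [(outgroup, s) for s in species]


def summarize(cc_values):
    """Summary columns for a list of CC values.

    Assumption: sample standard deviation (statistics.stdev); the table
    does not say which estimator was used.
    """
    return {
        "min": min(cc_values),
        "median": median(cc_values),
        "mean": mean(cc_values),
        "max": max(cc_values),
        "st_dev": stdev(cc_values),
        "n": len(cc_values),
    }


if __name__ == "__main__":
    print(len(within_group_pairs(UTR_SPECIES)))                   # 21
    print(len(within_group_pairs(INTRON_SPECIES)))                # 66
    print(len(versus_outgroup_pairs("A. gambiae", UTR_SPECIES)))  # 7
```

Passing the CC values of one comparison group and region to `summarize` would yield one row of summary columns; the CC values themselves are not reproduced here because the table reports only their summaries.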