Research Article

Predicting RNA 5-Methylcytosine Sites by Using Essential Sequence Features and Distributions

Figure 1

Flow chart of the procedure for constructing models to predict m5C sites. Each m5C site is represented by a 41 bp subsequence. Features of the k-mers, obtained by RNA2Vec, are combined to form the features of the subsequence. All features are ranked by the max-relevance and min-redundancy (mRMR) method. The resulting feature list is fed into incremental feature selection, which incorporates four classification algorithms and 10-fold cross-validation, to construct the optimal models.
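Two steps of this workflow can be sketched in a few lines of Python: splitting a subsequence into k-mers, and running incremental feature selection over a ranked feature list. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `kmers` and `incremental_feature_selection` are hypothetical, the k-mers are assumed to overlap with a stride of 1, and the `evaluate` callback is a stand-in for whatever performance measure a classifier achieves under 10-fold cross-validation on a given feature subset.

```python
from typing import Callable, List, Sequence, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq (stride of 1 is an assumption)."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Score growing prefixes of a ranked feature list and keep the best one.

    `ranked` is assumed to be ordered from most to least important,
    for example by mRMR. `evaluate` is a hypothetical callback that
    returns a score (higher is better) for a feature subset, such as
    a cross-validated performance measure.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    best_subset: List[str] = []
    best_score = float("-inf")
    for n in range(step, len(ranked) + 1, step):
        subset = list(ranked[:n])      # top-n features of the ranked list
        score = evaluate(subset)       # e.g. 10-fold cross-validation
        if score > best_score:         # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score
```

For a 41-character subsequence and k = 3, `kmers` returns 39 overlapping 3-mers. In practice, `evaluate` would be run once per classification algorithm, and the best-scoring prefix for each algorithm would define its model.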