BioMed Research International
Volume 2016, Article ID 9565689, 11 pages
http://dx.doi.org/10.1155/2016/9565689
Research Article

BP Neural Network Could Help Improve Pre-miRNA Identification in Various Species

1School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
2School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056038, China
3School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
4State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300074, China

Received 17 May 2016; Revised 5 July 2016; Accepted 17 July 2016

Academic Editor: Xing Chen

Copyright © 2016 Limin Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

MicroRNAs (miRNAs) are a set of short (21–24 nt) noncoding RNAs that play significant regulatory roles in cells. In the past few years, research on miRNA-related problems has become a hot field of bioinformatics because of miRNAs’ essential biological functions. miRNA-related bioinformatics analysis is beneficial in several aspects, including the functions of miRNAs and other genes, the regulatory network between miRNAs and their target mRNAs, and even biological evolution. Distinguishing miRNA precursors from other hairpin-like sequences is an essential step in detecting novel microRNAs. In this study, we employed a backpropagation (BP) neural network together with 98-dimensional novel features for microRNA precursor identification. Results show that the precision and recall of our method are 95.53% and 96.67%, respectively. Results further demonstrate that the total prediction accuracy of our method is nearly 13.17% greater than that of state-of-the-art microRNA precursor prediction software tools.

1. Introduction

MicroRNAs are some of the most important noncoding RNA genes with rather short length. They regulate the expression of genes throughout an organism at the posttranscriptional level [1]. miRNA is widely involved in the metabolic activity of the body as well as in many important life processes, including cell proliferation and apoptosis, cell differentiation, growth and development of plants and animals, and organ formation [2–4]. Recently, several studies have shown that microRNAs are related to several cancers [5–7] and other diseases [8–10]. Caligiuri et al. [11] proposed that methods and compositions involving miRNAs are useful for the treatment of various diseases and cancers. Some miRNAs are regarded as potential therapeutic targets for various diseases [12]. Recently, target gene (cancer gene) drugs, developed in accordance with the theory of miRNA gene silencing, have been used against incurable diseases that have threatened human health for years [13]. In addition, the viral genome can encode a large number of miRNAs by itself. Through combination with target genes and coding by viruses or the host cell, these miRNAs can lead to immune escape or an antiviral effect against the host cell. Therefore, the accurate prediction of miRNAs and their target genes, as well as a correct understanding of miRNA mechanisms, has important practical significance in medical treatment. Thus, research on novel miRNA identification is rather essential.

Feature selection mainly dominates the performance of the prediction model in the machine learning process [14–20]. In addition, effective features can represent the characteristics of the entire sequence data, which makes it easier to build a better prediction model. To represent microRNA precursors, Xue et al. [21] proposed 32D novel triplet features, which involve secondary structure information. Jiang et al. [22] found that random rearrangement of the sequence could help obtain significant free-energy features. However, the free-energy computation for many randomly rearranged sequences is very time consuming. Wei et al. [23] combined Xue et al.’s features and triplet nucleotide frequencies into 98D features and obtained good performance in human pre-miRNA identification. However, more features do not necessarily mean better performance, because of irrelevant and redundant features in high-dimensional or ultra-high-dimensional feature sets. The purpose of feature selection is to eliminate the irrelevant and redundant features of the feature set. In addition, training time can be reduced effectively by feature selection optimization [24]. Some studies focus on developing computational predictors by incorporating sequence-order or structure-order effects [25, 26]. Several works indicated that proper features could improve the prediction performance of classification to a certain extent. For example, Wang et al. [27] employed feature selection techniques to optimize the features in miR-SF and proved that an optimized feature subset could improve the prediction performance. In addition, some recently proposed multiobjective optimization evolutionary algorithms can also be used as promising feature selection approaches [28, 29].

Another factor that affects the performance of a machine learning prediction method is the classifier algorithm. The selection of different classifiers often leads to differences in classification results. Several different classifiers and strategies have been employed for miRNA identification. A naïve Bayes classifier was tested for predicting miRNAs across different species in 2006 [30]. The method also utilized miRNA sequence and structural features from multiple species. It proved that miRNA genes could be detected effectively in large-scale genomes of different species.

The MiPred classifier was tested for predicting miRNAs in 2007 [22]. The method utilized the random forest classifier algorithm. The prediction accuracy of MiPred is 10% higher than that of Triplet-SVM; the sensitivity and specificity of MiPred can reach 95.09% and 98.21%, respectively. A CSHMM classifier was also used for mining miRNA sequences from the genome [31], which utilized a context-sensitive hidden Markov model. Overall, the accuracy of the machine learning algorithms was up to 90%; the machine learning methods are more accurate than the other methods.

In this study, we chose the backpropagation neural network as the classifier. It has three advantages: good generalization performance, fast learning speed, and strong learning ability.

2. miRNA Identification with BP Neural Network

2.1. Pre-miRNA Features
2.1.1. k-Gram Frequency

Some studies showed that the local primary sequence is crucial to the pre-miRNA sequence [32]. Thus, the k-gram frequency is often applied as the feature map in the selection of primary sequence features [33, 34]. However, no good method is yet available for tuning the value of k. In general, k is chosen by comparing the effect of the k-gram frequency under different k values. In building our feature set, we compared different values of k; the resulting frequency features had almost the same effect on the classifier. Thus, considering that a base and its adjacent bases have practical biological significance, we chose k = 3. A total of 64 (4³ = 64) frequency features were calculated.
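
As an illustration, the 64 3-gram frequencies can be computed in a few lines (a sketch with our own function names, not the paper’s implementation):

```python
from itertools import product

def kgram_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the 4^k k-gram frequency vector of an RNA sequence."""
    kgrams = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kgrams, 0)
    windows = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        gram = seq[i:i + k]
        if gram in counts:        # skip windows containing non-ACGU characters
            counts[gram] += 1
    return [counts[g] / windows for g in kgrams]

vec = kgram_frequencies("ACGUACGUACGU")
print(len(vec))  # 64
```

With k = 3 the vector has 4³ = 64 entries, matching the feature count above.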

2.1.2. Triple Structure Sequence

In addition to the high specificity of the primary sequence features, the secondary structure of pre-miRNA is also a contributing factor. To analyze the contribution of the secondary structure, the secondary structure prediction software RNAfold is used to calculate the potential structures. In the secondary structure, each nucleotide of the sequence corresponds to one of two states, matching or nonmatching: matching is recorded as “(” or “)” and nonmatching as “·.” In the structure, every three characters are considered as a unit, and every “)” is replaced by “(.” Thus, 8 (2³ = 8) different combinations are available as a unit, including “(((,” “((·,” “(·(,” “·((,” “(··,” “·(·,” “··(,” and “···.”

To characterize the pre-miRNA sequence better, the first nucleotide of the corresponding subsequence was added to the front of each structure unit. This provides 32 (4 × 8 = 32) different combinations, that is, “A(((,” “U((·,” …, “G·((,” “G···.” For a sequence, the occurrence frequency of each combination is determined and coded into a 32D feature vector as the input of the classifier. This 32D triple structure sequence feature was originally used to train an SVM classifier, where it significantly improved the classification of pre-miRNA sequences [21].
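
The 32 combinations described above can be counted as follows (a sketch following the textual description; function and variable names are ours):

```python
from itertools import product

STRUCT_UNITS = ["".join(p) for p in product("(.", repeat=3)]   # 2^3 = 8 units
COMBOS = [n + u for n in "ACGU" for u in STRUCT_UNITS]         # 4 * 8 = 32

def triplet_structure_features(seq, structure):
    """Slide a 3-character window over the dot-bracket structure,
    rewrite every ')' as '(', prefix the window's first nucleotide,
    and return the 32 normalized occurrence frequencies."""
    s = structure.replace(")", "(")
    counts = dict.fromkeys(COMBOS, 0)
    n_windows = max(len(seq) - 2, 1)
    for i in range(len(seq) - 2):
        key = seq[i] + s[i:i + 3]
        if key in counts:
            counts[key] += 1
    return [counts[c] / n_windows for c in COMBOS]

print(len(triplet_structure_features("ACGU", "(())")))  # 32
```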

2.1.3. Energy Characteristics

Real pre-miRNA sequences are generally more stable and show a lower minimum free energy (MFE) than randomly generated pre-miRNA sequences. Therefore, an energy characteristic is often used to describe the structure of the pre-miRNA sequence as one aspect of feature extraction. To do this, the MFE value is obtained by using RNAfold to calculate the structure.
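
RNAfold (from the ViennaRNA package) prints the predicted dot-bracket structure followed by the MFE in parentheses; assuming that standard output format, the MFE can be extracted as follows (a sketch, not the paper’s actual pipeline):

```python
import re

def parse_rnafold_line(line):
    """Parse an RNAfold structure line such as
    '((((...)))) ( -5.40)' into (structure, mfe)."""
    m = re.match(r"^([().]+)\s+\(\s*(-?\d+\.\d+)\s*\)", line)
    if not m:
        raise ValueError("not an RNAfold structure line: " + line)
    return m.group(1), float(m.group(2))

structure, mfe = parse_rnafold_line("((((...)))) ( -5.40)")
print(structure, mfe)  # ((((...)))) -5.4
```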

2.1.4. Structural Diversity Characteristics

The potential for nucleotide pairing in the sequence is a significant characteristic that can also be used to describe the pre-miRNA sequence. This includes traditional Watson-Crick nucleotide pairing (A–U and C–G) as well as other forms of nucleotide pairing, such as the G–U pairing that can occur in the loop of RNA hairpin structures. We included possible G–U pairing in our description of base pairing.

To summarize, we extracted 98 features as the input of the neural network, including 64-dimensional k-gram frequency characteristics, 32-dimensional triple structure sequence characteristics, a one-dimensional energy feature, and a one-dimensional structural diversity characteristic.

2.2. Fixing the Number of Nodes in the Hidden Layer

In general, selecting the number of nodes in the hidden layer when fixing the BP neural network structure is difficult. Technically, a hidden layer facilitates the computation; however, too many hidden nodes can reduce the operation rate.

Currently, no theoretical method is available to fix the number of nodes in the hidden layer. However, the number generally follows the empirical formula

  h = √(m + n) + a,  (1)

where h represents the number of neurons in the hidden layer, m is the number of neurons in the input layer, n is the number of neurons in the output layer, and a is a constant between 1 and 10.

In this study, m = 98 and n = 1. Therefore, (1) yields values of h between 11 and 20. A comprehensive analysis of the training results with different numbers of nodes in the hidden layer was performed with the error set to 0.0001. A total of 621 samples were used to train the network, and one sample was used to test the network. The results are shown in Table 1.
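
The candidate node counts follow directly from the empirical formula; as a quick check (with m = 98 input nodes and n = 1 output node):

```python
import math

m, n = 98, 1                                 # input and output node counts
# h = sqrt(m + n) + a for a = 1..10, rounded to the nearest integer
candidates = [round(math.sqrt(m + n) + a) for a in range(1, 11)]
print(candidates)  # 11 through 20
```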

Table 1: Corresponding training results with different numbers of nodes in the hidden layers.

From the data shown in Table 1, increasing the number of nodes in the hidden layer did not result in better convergence. Additionally, more nodes increased the number of network parameters and greatly increased the amount of calculation of the classifier. Thus, keeping 13 nodes in the hidden layer required relatively fewer training epochs and lower error while still producing relatively good training effects.

2.3. Fixing the Number of Nodes in the Output Layer

Two kinds of output exist, positive and negative, which are represented as 1 for a positive sample and 0 for a negative sample. The topology structure of this prediction method based on BP neural network is shown in Figure 1.

Figure 1: Topology structure of the BP neural network.
2.4. Selecting Training and Test Model Samples

The collection and organization of training samples are often limited by objective conditions. An appropriate number of training samples is required to achieve sufficient precision. Therefore, we refer to the empirical rule that the number of training samples N should scale with the total number of network connection weights W, which is determined by the numbers of nodes in the input and hidden layers. In this study, 2236 samples were used for training.

The data set of pre-miRNAs was downloaded from http://bioinf.sce.carleton.ca/SMIRP [35]; these data include negative and positive samples for Arabidopsis lyrata. The FASTA file was converted to an ARFF file using a Java jar package that converts the reference index to numerical form. We randomly selected real pre-miRNAs and pseudo pre-miRNAs to evaluate our algorithm.
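
For readers reproducing this step, the FASTA records can be read with a minimal parser like the following (illustrative only; the paper used a Java jar for the actual conversion):

```python
def read_fasta(text):
    """Minimal FASTA parser: returns {header: sequence}."""
    records, header, parts = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:           # flush the previous record
                records[header] = "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line.upper())
    if header is not None:
        records[header] = "".join(parts)
    return records

recs = read_fasta(">seq1\nacgu\nACGU\n>seq2\nGGCC\n")
print(recs)  # {'seq1': 'ACGUACGU', 'seq2': 'GGCC'}
```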

2.5. Error Evaluation Steps Based on BP

The structure of the intelligent diagnosis model contains three layers of 98-13-1. First, we set the numbers of nodes in the input, output, and hidden layers as m, n, and l, respectively (here m = 98, n = 1, and l = 13). Given a training sample set {(x, d)}, the weight matrix between the input and hidden layers can be written as V = (v_ih), where i = 1, …, m and h = 1, …, l. We write the connection weight matrix between the hidden and output layers as W = (w_hj), where h = 1, …, l and j = 1, …, n. Then, respectively, take f(·) and g(·) as the activation functions of the nodes of the hidden and output layers. After input of the sample x, the actual output can be calculated by

  y_j = g(Σ_h w_hj · f(Σ_i v_ih · x_i)),  j = 1, …, n.

The error function is defined as

  E = (1/2) Σ_j (d_j − y_j)².

Objectively, the target of BP training is to compute the V and W that minimize the error function E. For this, gradient descent with common, simple derivatives is used. To simplify the derivation process, we write the output of hidden node h as

  z_h = f(Σ_i v_ih · x_i),  h = 1, …, l.

Then, the error function can be written as

  E = (1/2) Σ_j (d_j − g(Σ_h w_hj · z_h))².

The corresponding gradients of E with respect to w_hj and v_ih can then be expressed as

  ∂E/∂w_hj = −(d_j − y_j) · g′(Σ_h w_hj z_h) · z_h,
  ∂E/∂v_ih = −Σ_j (d_j − y_j) · g′(Σ_h w_hj z_h) · w_hj · f′(Σ_i v_ih x_i) · x_i.

For arbitrary initial values of V and W, the gradient descent rules to modify the weights in the BP learning algorithm are

  w_hj ← w_hj − η · ∂E/∂w_hj,  v_ih ← v_ih − η · ∂E/∂v_ih,

where η represents the learning efficiency, ∂E/∂w_hj is the partial derivative of the error function relative to w_hj, and ∂E/∂v_ih is the partial derivative of the error function relative to v_ih.
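
The update rules above can be sketched as a small numpy training loop (a toy 4-3-1 network with sigmoid activations standing in for f and g; the sizes and names here are our own, not the paper’s 98-13-1 configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 3, 1
V = rng.normal(scale=0.5, size=(n_in, n_hid))    # input -> hidden weights
W = rng.normal(scale=0.5, size=(n_hid, n_out))   # hidden -> output weights
eta = 0.5                                        # learning rate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.random((20, n_in))
d = (X.sum(axis=1, keepdims=True) > 2.0).astype(float)  # toy labels

def epoch_error():
    return 0.5 * np.sum((d - sigmoid(sigmoid(X @ V) @ W)) ** 2)

before = epoch_error()
for _ in range(200):
    for x, t in zip(X, d):
        z = sigmoid(x @ V)                        # hidden-layer output z_h
        y = sigmoid(z @ W)                        # network output y_j
        delta_out = (y - t) * y * (1 - y)         # dE/d(net) at the output
        delta_hid = (delta_out @ W.T) * z * (1 - z)  # back-propagated error
        W -= eta * np.outer(z, delta_out)         # w <- w - eta * dE/dw
        V -= eta * np.outer(x, delta_hid)         # v <- v - eta * dE/dv
after = epoch_error()
print(before, after)  # the error decreases with training
```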

2.6. Selection of Training Functions and Related Parameters

The above analysis allows fixing of the BP neural network structure. Table 2 shows the chosen training functions and the relevant parameters.

Table 2: Basic parameters of the classifier based on BP neural network.

This condition allows establishment of a complete classifier based on BP neural network structure. The model generation and training are summarized in Figure 2.

Figure 2: Process flow of model generation and training.
2.7. Measurement

Pre-miRNA identification with pattern recognition and machine learning methods can be treated as a two-way (binary) classification problem. Four kinds of prediction results are presented in Table 3.

Table 3: Measurements for the classification problems.

The four kinds of prediction results are true positive (TP), the number of positive cases that were correctly predicted; false positive (FP), the number of negative cases that were incorrectly predicted as positive; true negative (TN), the number of negative cases that were correctly predicted; and false negative (FN), the number of positive cases that were incorrectly predicted as negative.

Many evaluation indicators can be used for the classification results. First, the accuracy rate (ACC) is the ratio of correctly predicted cases over the entire data set. Precision and recall can also be used as evaluation indicators in tests of pattern recognition models. Precision is the fraction of cases predicted as positive that are truly positive, and recall reflects the fraction of positive examples in the test set that are correctly judged as positive [36]. The above three indicators are expressed as

  ACC = (TP + TN)/(TP + FP + TN + FN),
  Precision = TP/(TP + FP),
  Recall = TP/(TP + FN).

Additionally, sensitivity and specificity may be used to evaluate the model. Sensitivity (SE) is the same as recall, and specificity (SP) is calculated in accordance with

  SE = TP/(TP + FN),  SP = TN/(TN + FP).

A challenge arises when the positive and negative test sets are unbalanced, as is common in the study of biological information. In most cases, the number of positive samples is far less than the number of negative samples; in a few cases, the number of positive samples may be much larger. When negative samples greatly outnumber positive samples, ACC approaches SP; in this case, the classifier only reflects the classification effect on the negative samples and cannot accurately express the prediction effect on the entire test data set. To address this problem, researchers typically use the geometric mean (Gm), described by

  Gm = √(SE × SP).

Matthews correlation coefficient (MCC) [16, 21, 37, 38] provides a more equitable measure of forecast ability when a large difference exists between the numbers of positive and negative samples. MCC can be expressed as

  MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

Currently, studies on miRNA commonly use one or more of these above evaluation indices. In this work, we estimate the overall performance of the classifier by analysis of ACC, SE, SP, Gm, and MCC.
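
All five indicators can be computed directly from the confusion matrix counts; the following is a direct transcription of the formulas above:

```python
import math

def evaluate(tp, fp, tn, fn):
    """ACC, precision, SE (recall), SP, Gm, and MCC from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    se = tp / (tp + fn)                    # sensitivity = recall
    sp = tn / (tn + fp)                    # specificity
    gm = math.sqrt(se * sp)                # geometric mean
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"ACC": acc, "precision": precision, "SE": se,
            "SP": sp, "Gm": gm, "MCC": mcc}

print(evaluate(tp=90, fp=10, tn=80, fn=20))
```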

3. Results and Discussion

3.1. Analysis of Feature Set Performance

To select a better feature set for classification, we needed to determine the effect of different feature subsets on the performance of the classifier. To do this, we used the BP neural network method with the same training set (553 positive samples and 1150 negative samples) to test different feature sets, with the results shown in Table 4.

Table 4: Comparison of classification results based on different feature sets.

From Table 4, we learn that the accuracy with the entire feature set can be as high as 93.42%. This result indicates that our feature set is more effective for processing more complex structure or sequence diversity. Considering that the feature set used here is not very large and each feature subset is highly independent, reducing the dimension of the feature vector is not needed.

3.2. Performances of BP

k-fold cross-validation with moderate computational complexity is widely used for model selection. The selection of k is important because k not only determines the number of samples in each fold but also determines the computational complexity. Usually, a value of k between 5 and 10 is selected based on experience. Statistical performance shows little improvement when k is greater than 10; considering computational complexity as well, a value between 5 and 10 is best [32].
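
A k-fold split can be produced in a few lines (a from-scratch sketch; a library routine such as scikit-learn’s KFold would normally be used instead):

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]   # every k-th index per fold

folds = kfold_indices(100, k=5)
print([len(f) for f in folds])  # [20, 20, 20, 20, 20]
```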

We divided the samples into two cases for training and testing. In the first one, a large difference was observed between the number of positive and negative samples: 518 positive samples and 1078 negative samples as the training set and 166 positive samples and 366 negative samples for the test set. The second case included equal numbers of positive and negative samples: 552 positive samples and 552 negative samples as the training set and 138 positive samples and 138 negative samples for the test set. These training and testing were repeated five times. The testing performance is shown in Figures 3 and 4.

Figure 3: Different test results for varying sample quantities.
Figure 4: Different test results for same sample quantity.

From comparison of the data in Figures 3 and 4, no significant difference was observed between the actual output and the expected output of each test. As described above, the evaluation of the reference index is shown in Table 5.

Table 5: Evaluation of the reference index.

From the data presented in Table 5, the number of samples affects the precision and recall rate. In particular, the precision and recall rate of the negative samples decreased when the number of negative samples in the training set decreased. This result indicates that the more samples in the training process, the better the classification effect of the classifier. At the same time, the precision and recall rate of the positive samples were also affected: when the number of negative samples in the training set increased, the number of correct predictions increased by four and the number of error predictions was reduced by eight, showing that the precision and recall rate of the positive samples decreased with the increase in the number of negative samples.

3.3. Comparison with Other Methods

The performance of our method was compared with that of other methods: J48, random forest, LibD3C [39], Adaboost, string kernel SVM [40], LibSVM, and GBDT, all evaluated on the same data set. The data set contains 691 real pre-miRNAs and 1437 pseudo pre-miRNAs. As shown in Table 6 and Figure 5, the results demonstrate that the total prediction accuracy of our method is 13.64% greater than that of the string kernel SVM model and nearly 2% greater than the LibD3C and LibSVM models. The overall performance of the models as measured by MCC was in the following order: GBDT (0.8682), BP (0.8662), LibSVM (0.8510), LibD3C (0.8510), Adaboost (0.8120), random forest (0.7720), J48 (0.7200), and string kernel SVM (0.6002).

Table 6: Comparison of the BP with alternative models.
Figure 5: Comparison results of different models.

Thus, we conclude that the BP method allows improved recognition accuracy.

3.4. Performance on Different Species

To demonstrate the validity and universal applicability of the BP method, we analyzed six other species: Anolis carolinensis, Arabidopsis thaliana, Drosophila melanogaster, Drosophila pseudoobscura, Epstein-Barr virus, and Xenopus tropicalis. The results shown in Figure 6 indicate that the accuracy of GBDT is better than that of the BP method in some situations, but the BP method achieved fairly good results in terms of ACC, precision, recall, and MCC.

Figure 6: Test comparison results for six different species.

4. Conclusions

Identification of miRNAs is the first step toward understanding their biological characteristics. Many approaches have been proposed to predict pre-miRNAs in recent years. However, feature extraction in these methods can result in information redundancy. To overcome this drawback, a BP neural network algorithm together with optimal 98D features was employed for this analysis. We compared our method with the existing methods of J48, random forest, LibD3C, Adaboost, GBDT, string kernel SVM, and LibSVM, which were trained on the same training data set. The results demonstrate that the total prediction accuracy of our method is 13.17% greater than the string kernel SVM model and nearly 2% greater than LibD3C and LibSVM.

After the identification step, functional analysis is also important for miRNA research. When human miRNAs and diseases are the focus, two main approaches are employed to predict the relationship. The first one is statistical comparison analysis of miRNA or isomiR expression [41]. The second one is network analysis and prediction of miRNA-disease relationships [42–45]. Several advanced machine learning, network techniques, and bioinspired models can be utilized for this problem, including random forest [46], semisupervised learning [47], HeteSim scores [48], spiking neural P systems [49–52], and membrane computing [53–57]. Functional analysis of the newly detected miRNAs will be our future work.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work was supported by the Natural Science Foundation of China (no. 61370010 and no. 61302139) and the State Key Laboratory of Medicinal Chemical Biology in China.

References

  1. D. P. Bartel, “microRNAs: genomics, biogenesis, mechanism, and function,” Cell, vol. 116, no. 2, pp. 281–297, 2004. View at Publisher · View at Google Scholar · View at Scopus
  2. D. Wu, Y. Huang, J. Kang et al., “ncRDeathDB: a comprehensive bioinformatics resource for deciphering network organization of the ncRNA-mediated cell death system,” Autophagy, vol. 11, no. 10, pp. 1917–1926, 2015. View at Publisher · View at Google Scholar · View at Scopus
  3. Y. Huang, N. Liu, J. P. Wang et al., “Regulatory long non-coding RNA and its functions,” Journal of Physiology and Biochemistry, vol. 68, no. 4, pp. 611–618, 2012. View at Publisher · View at Google Scholar · View at Scopus
  4. X. Zhang, D. Wu, L. Chen et al., “RAID: a comprehensive resource for human RNA-associated (RNA–RNA/RNA–protein) interaction,” RNA, vol. 20, no. 7, pp. 989–993, 2014. View at Publisher · View at Google Scholar · View at Scopus
  5. S. Hua, W. Yun, Z. Zhiqiang, and Q. Zou, “A discussion of micrornas in cancers,” Current Bioinformatics, vol. 9, no. 5, pp. 453–462, 2014. View at Publisher · View at Google Scholar · View at Scopus
  6. Q. Wang, L. Wei, X. Guan, Y. Wu, Q. Zou, and Z. Ji, “Briefing in family characteristics of microRNAs and their applications in cancer research,” Biochimica et Biophysica Acta—Proteins and Proteomics, vol. 1844, no. 1, pp. 191–197, 2014. View at Publisher · View at Google Scholar · View at Scopus
  7. C. Yang, D. Wu, L. Gao et al., “Competing endogenous RNA networks in human cancer: hypothesis, validation, and perspectives,” Oncotarget, vol. 7, no. 12, pp. 13479–13490, 2016. View at Google Scholar
  8. X. Zeng, X. Zhang, and Q. Zou, “Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks,” Briefings in Bioinformatics, vol. 17, no. 2, pp. 193–203, 2016. View at Publisher · View at Google Scholar
  9. Q. Zou, J. Li, Q. Hong et al., “Prediction of microRNA-disease associations based on social network analysis methods,” BioMed Research International, vol. 2015, Article ID 810514, 9 pages, 2015. View at Publisher · View at Google Scholar · View at Scopus
  10. Y. Wang, L. Chen, B. Chen et al., “Mammalian ncRNA-disease repository: a global view of ncRNA-mediated disease network,” Cell Death and Disease, vol. 4, no. 8, article e765, 2013. View at Publisher · View at Google Scholar · View at Scopus
  11. M. A. Caligiuri, J. Yu, S. He, and R. Trott, “Activation of Innate Immunity by miRNA for Cancer and Infection Treatment,” United States Patent, 2016.
  12. H. Zhou, X. Ge, and X. Xue, “microRNAs regulation and its role as biomarkers in diseases,” Oncology and Translational Medicine, vol. 2, no. 1, pp. 39–46, 2016. View at Google Scholar
  13. P. S. Kelly, C. Gallagher, M. Clynes, and N. Barron, “Conserved microRNA function as a basis for Chinese hamster ovary cell engineering,” Biotechnology Letters, vol. 37, no. 4, pp. 787–798, 2015. View at Publisher · View at Google Scholar · View at Scopus
  14. Q. Zou, J. Zeng, L. Cao, and R. Ji, “A novel features ranking metric with application to scalable visual and bioinformatics data classification,” Neurocomputing, vol. 173, pp. 346–354, 2016. View at Publisher · View at Google Scholar · View at Scopus
  15. W.-X. Liu, E.-Z. Deng, W. Chen, and H. Lin, “Identifying the subfamilies of voltage-gated potassium channels using feature selection technique,” International Journal of Molecular Sciences, vol. 15, no. 7, pp. 12940–12951, 2014. View at Publisher · View at Google Scholar · View at Scopus
  16. H. Tang, W. Chen, and H. Lin, “Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique,” Molecular BioSystems, vol. 12, no. 4, pp. 1269–1275, 2016. View at Publisher · View at Google Scholar
  17. P.-P. Zhu, W.-C. Li, Z.-J. Zhong et al., “Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition,” Molecular BioSystems, vol. 11, no. 2, pp. 558–563, 2015. View at Publisher · View at Google Scholar · View at Scopus
  18. H. Ding, P.-M. Feng, W. Chen, and H. Lin, “Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis,” Molecular BioSystems, vol. 10, no. 8, pp. 2229–2235, 2014. View at Publisher · View at Google Scholar · View at Scopus
  19. H. Ding, S.-H. Guo, E.-Z. Deng et al., “Prediction of Golgi-resident protein types by using feature selection technique,” Chemometrics and Intelligent Laboratory Systems, vol. 124, pp. 9–13, 2013. View at Publisher · View at Google Scholar
  20. H. Ding, H. Lin, W. Chen et al., “Prediction of protein structural classes based on feature selection technique,” Interdisciplinary Sciences: Computational Life Sciences, vol. 6, no. 3, pp. 235–240, 2014. View at Publisher · View at Google Scholar
  21. C. Xue, F. Li, T. He, G.-P. Liu, Y. Li, and X. Zhang, “Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine,” BMC Bioinformatics, vol. 6, article 310, 2005. View at Publisher · View at Google Scholar · View at Scopus
  22. P. Jiang, H. Wu, W. Wang, W. Ma, X. Sun, and Z. Lu, “MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features,” Nucleic Acids Research, vol. 35, no. 2, pp. W339–W344, 2007. View at Publisher · View at Google Scholar · View at Scopus
  23. L. Wei, M. Liao, Y. Gao, R. Ji, Z. He, and Q. Zou, “Improved and promising identification of human microRNAs by incorporating a high-quality negative set,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 1, pp. 192–201, 2014. View at Publisher · View at Google Scholar · View at Scopus
  24. Y. Wang, X. Chen, W. Jiang et al., “Predicting human microRNA precursors based on an optimized feature subset generated by GA-SVM,” Genomics, vol. 98, no. 2, pp. 73–78, 2011. View at Publisher · View at Google Scholar · View at Scopus
  25. B. Liu, L. Fang, F. Liu, X. Wang, J. Chen, and K.-C. Chou, “Identification of real microRNA precursors with a pseudo structure status composition approach,” PLoS ONE, vol. 10, no. 3, article e0121501, 2015. View at Publisher · View at Google Scholar · View at Scopus
  26. B. Liu, L. Fang, J. Chen, F. Liu, and X. Wang, “MiRNA-dis: MicroRNA precursor identification based on distance structure status pairs,” Molecular BioSystems, vol. 11, no. 4, pp. 1194–1204, 2015. View at Publisher · View at Google Scholar · View at Scopus
  27. X. Wang, J. D. Laurie, T. Liu, J. Wentz, and X. S. Liu, “Computational dissection of Arabidopsis smRNAome leads to discovery of novel microRNAs and short interfering RNAs associated with transcription start sites,” Genomics, vol. 97, no. 4, pp. 235–243, 2011. View at Publisher · View at Google Scholar · View at Scopus
  28. X. Zhang, Y. Tian, R. Cheng, and Y. Jin, “An efficient approach to nondominated sorting for evolutionary multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 19, no. 2, pp. 201–213, 2015. View at Publisher · View at Google Scholar · View at Scopus
  29. X. Zhang, Y. Tian, and Y. Jin, “A knee point-driven evolutionary algorithm for many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 19, no. 6, pp. 761–776, 2014. View at Publisher · View at Google Scholar
  30. M. Yousef, M. Nebozhyn, H. Shatkay, S. Kanterakis, L. C. Showe, and M. K. Showe, “Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier,” Bioinformatics, vol. 22, no. 11, pp. 1325–1334, 2006. View at Publisher · View at Google Scholar · View at Scopus
  31. S. Arlot and A. Celisse, “A survey of cross-validation procedures for model selection,” Statistics Surveys, vol. 4, pp. 40–79, 2010. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at Scopus
  32. E. Bonnet, J. Wuyts, P. Rouzé, and Y. Van de Peer, “Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences,” Bioinformatics, vol. 20, no. 17, pp. 2911–2917, 2004. View at Publisher · View at Google Scholar · View at Scopus
  33. H. Liu and L. Wong, “Data mining tools for biological sequences,” Journal of Bioinformatics and Computational Biology, vol. 1, no. 1, pp. 139–167, 2003. View at Publisher · View at Google Scholar · View at Scopus
  34. B. Liu, F. Liu, X. Wang, J. Chen, L. Fang, and K. Chou, “Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences,” Nucleic Acids Research, vol. 43, no. 1, pp. W65–W71, 2015. View at Publisher · View at Google Scholar
35. R. J. Peace, K. K. Biggar, K. B. Storey, and J. R. Green, “A framework for improving microRNA prediction in non-human genomes,” Nucleic Acids Research, vol. 43, no. 20, article e138, 2015.
36. S. Yang, S. Cai, F. Zheng et al., “Representation of fluctuation features in pathological knee joint vibroarthrographic signals using kernel density modeling method,” Medical Engineering and Physics, vol. 36, no. 10, pp. 1305–1311, 2014.
37. R. Wang, Y. Xu, and B. Liu, “Recombination spot identification based on gapped k-mers,” Scientific Reports, vol. 6, Article ID 23934, 2016.
38. Y. Wu, P. Chen, X. Luo et al., “Quantification of knee vibroarthrographic signal irregularity associated with patellofemoral joint cartilage pathology based on entropy and envelope amplitude measures,” Computer Methods and Programs in Biomedicine, vol. 130, pp. 1–12, 2016.
39. C. Lin, W. Chen, C. Qiu, Y. Wu, S. Krishnan, and Q. Zou, “LibD3C: ensemble classifiers with a clustering and dynamic selection strategy,” Neurocomputing, vol. 123, pp. 424–435, 2014.
40. M. Ghandi, D. Lee, M. Mohammad-Noori, and M. A. Beer, “Enhanced regulatory sequence prediction using gapped k-mer features,” PLoS Computational Biology, vol. 10, no. 7, Article ID e1003711, 2014.
41. L. Guo, J. Yu, T. Liang, and Q. Zou, “miR-isomiRExp: a web-server for the analysis of expression of miRNA at the miRNA/isomiR levels,” Scientific Reports, vol. 6, Article ID 23700, 2016.
42. X. Chen, C. C. Yan, X. Zhang et al., “WBSMDA: within and between score for MiRNA-disease association prediction,” Scientific Reports, vol. 6, Article ID 21106, 2016.
43. X. Chen, C. Clarence Yan, X. Zhang et al., “RBMMMDA: predicting multiple types of disease-microRNA associations,” Scientific Reports, vol. 5, article 13877, 2015.
44. X. Chen, M.-X. Liu, and G.-Y. Yan, “RWRMDA: predicting novel human microRNA-disease associations,” Molecular BioSystems, vol. 8, no. 10, pp. 2792–2798, 2012.
45. X. Chen, “miREFRWR: a novel disease-related microRNA-environmental factor interactions prediction method,” Molecular BioSystems, vol. 12, no. 2, pp. 624–633, 2016.
46. Y. Liu, X. Zeng, Z. He, and Q. Zou, “Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016.
47. X. Chen and G.-Y. Yan, “Semi-supervised learning for potential human microRNA-disease associations inference,” Scientific Reports, vol. 4, article 5501, 2014.
48. X. Zeng, Y. Liao, Y. Liu, and Q. Zou, “Prediction and validation of disease genes using HeteSim scores,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016.
49. T. Song, J. Xu, and L. Pan, “On the universality and non-universality of spiking neural P systems with rules on synapses,” IEEE Transactions on NanoBioscience, vol. 14, no. 8, pp. 960–966, 2015.
50. X. Zhang, X. Zeng, B. Luo, and L. Pan, “On some classes of sequential spiking neural P systems,” Neural Computation, vol. 26, no. 5, pp. 974–997, 2014.
51. T. Song and L. Pan, “Spiking neural P systems with request rules,” Neurocomputing, vol. 193, pp. 193–200, 2016.
52. X. Wang, T. Song, F. Gong, and P. Zheng, “On the computational power of spiking neural P systems with self-organization,” Scientific Reports, vol. 6, Article ID 27624, 2016.
53. X. Zeng, L. Xu, X. Liu, and L. Pan, “On languages generated by spiking neural P systems with weights,” Information Sciences, vol. 278, pp. 423–433, 2014.
54. X. Zhang, L. Pan, and A. Păun, “On the universality of axon P systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 11, pp. 2816–2829, 2015.
55. X. Chen, M. J. Pérez-Jiménez, L. Valencia-Cabrera, B. Wang, and X. Zeng, “Computing with viruses,” Theoretical Computer Science, vol. 623, pp. 146–159, 2016.
56. T. Wu, Z. Zhang, G. Păun, and L. Pan, “Cell-like spiking neural P systems,” Theoretical Computer Science, vol. 623, pp. 180–189, 2016.
57. X. Zhang, Y. Liu, B. Luo, and L. Pan, “Computational power of tissue P systems for generating control languages,” Information Sciences, vol. 278, pp. 285–297, 2014.