Table of Contents Author Guidelines Submit a Manuscript
Computational and Mathematical Methods in Medicine
Volume 2017, Article ID 5043984, 12 pages
https://doi.org/10.1155/2017/5043984
Research Article

Utilizing Selected Di- and Trinucleotides of siRNA to Predict RNAi Activity

Ye Han,1,2 Yuanning Liu,1,2 Hao Zhang,1,2 Fei He,3,4,5 Chonghe Shu,1,2 and Liyan Dong1,2

1Department of Computer Science and Technology, Jilin University, Changchun, Jilin, China
2Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
3Department of Computer Science and Information Technology, Northeast Normal University, Changchun, Jilin, China
4Department of Environment, Northeast Normal University, Changchun, Jilin, China
5Institute of Computational Biology, Northeast Normal University, Changchun, China

Correspondence should be addressed to Liyan Dong; nc.ude.ulj@ylgnod

Received 25 October 2016; Accepted 15 December 2016; Published 24 January 2017

Academic Editor: Yu Xue

Copyright © 2017 Ye Han et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Small interfering RNAs (siRNAs) induce posttranscriptional gene silencing in various organisms. siRNAs targeted to different positions of the same gene show different effectiveness; hence, predicting siRNA activity is a crucial step. In this paper, we developed and evaluated a powerful tool named “siRNApred” with a new mixed feature set to predict siRNA activity. To improve the prediction accuracy, we proposed 2-3NTs as our new features. A Random Forest siRNA activity prediction model was constructed using the feature set selected by our proposed Binary Search Feature Selection (BSFS) algorithm. Experimental data demonstrated that the binding site of the Argonaute protein correlates with siRNA activity. “siRNApred” is effective for selecting active siRNAs, and the prediction results demonstrate that our method can outperform other current siRNA activity prediction methods in terms of prediction accuracy.

1. Introduction

RNA interference (RNAi) is a cellular process whereby double-stranded RNA (dsRNA) leads to posttranscriptional gene silencing through base-pairing interactions and is found in many eukaryotic systems, including plants, fungi, invertebrates, and mammals [14]. In mammalian cells, long dsRNA is processed into short 21–23 nucleotide (nt) dsRNAs known as small interfering RNA (siRNA) and induces instant target gene knockdown [3]. In functional genomic research, RNAi has become very helpful in drug and therapeutic applications [5]. Highly effective siRNAs can be synthesized to design novel drugs for influenza virus [6], HIV virus [7], and cancer [8]. However, Takayuki measured the RNAi activities of siRNAs targeting all positions of a single mRNA in human cells and found that few siRNAs show very high activities [9]. Therefore, predicting siRNA activity is a critical step for the successful implementation of RNAi.

Numerous siRNA-designing algorithms, which can be generally categorized as first-and second-generation algorithms, have been reported to date. The first-generation algorithms are based on small validated siRNA datasets and exploit multiple siRNA features, including GC content [10], base preferences at specific positions [11, 12], thermodynamic stability [13], internal structure [14], and target mRNA secondary structure [1517]. However, a large majority of siRNAs designed by the first-generation algorithms are not very effective [18]. The reason may be that the early datasets are too small to cover all the important features [19].

The second-generation algorithms were developed with the accumulation of validated siRNAs. Huesken developed “Biopredsi” [20] based on artificial neural network and built a major siRNA dataset including 2431 siRNAs through high-throughput analysis technology. A number of siRNA activity prediction algorithms based on machine learning models were built using Huesken’s dataset. The algorithms ThermoComposition21 [21], DSIR [22], -score [23], and Biopredsi were estimated as the best predictors [24]. In addition, Takayuki et al. proposed a complete dataset including the siRNAs targeting all positions of a single mRNA in human cells and developed an algorithm “siExplored.” They found that specific residues at every third position of siRNAs greatly influenced its RNAi activity [9].

The performance of second-generation algorithms heavily depends on the selection of the included features [25]. Because the siRNA sequence is the most important factor that determines RNAi activity, more potential features embedded in siRNA sequences should be exploited to increase prediction accuracy. Takahashi found that when the 2-3 bp RNA at every position of a siRNA sequence were substituted by DNA, the RNAi activity changed [26]. Thus, we consider that the di- and trinucleotides at certain positions of siRNA may correlate with its RNAi activity.

In this paper, we developed a powerful siRNA activity predictor by fusing multiple potential features. Our experimental results demonstrate that siRNA activity is significantly affected by its di- and trinucleotides; thus, we proposed 2-3NTs as our new features. In addition, a new mixed 230-dimensional feature set was formed by combining 191 traditional features and 39 new features. To select the most relevant features, we proposed a Binary Search Feature Selection (BSFS) algorithm. Finally, a Random Forest predictor is constructed using the selected features. At the same time, a user-friendly web server named siRNApred is developed and is available for free at http://www.jlucomputer.com:8080/RNA/. siRNApred showed better performance compared with first-generation and second-generation algorithms. The result suggests that the di- and trinucleotides of siRNA can provide important information for prediction of active siRNAs.

2. Materials and Methods

2.1. Dataset

Huesken’s dataset includes [20] 2431 siRNAs targeted to 34 human and rodent mRNAs. The dataset is divided into the 2182-sequence training set (Huesken_train) and 249-sequence testing set (Huesken_test). Three independent datasets from Vickers, Reynolds, and Haborth, including 368 siRNAs, are used for testing [11, 27, 28].

2.2. The Importance of the Di- and Trinucleotides of siRNA

In this section, we first elucidated the importance of our proposed di- and trinucleotides of siRNA on its activity. The di- and trinucleotides of siRNA can be defined as follows:The guide strand of siRNA , where . represents the dinucleotide at position , where . represents the trinucleotide at position , where .

All di- and trinucleotides at all positions of siRNA are obtained by a sliding window size of 2-3. Huesken’s dataset is divided into two classes: 1218 potent siRNAs with activities greater than 0.7 and 1213 nonpotent siRNAs with activities less than 0.7.

There are 16 2-mer RNA subsequences, that is, AA, AU, etc., and the frequencies of all 2-mer RNA subsequences at positions 1 to 20 are calculated for the two classes. The significance level is calculated by Student’s -test and the 2-mer RNA subsequences with minimal value are shown in Table 1 (-value < 0.05).

Table 1: Primary dinucleotides with minimal value.

Table 1 shows that the 2-mer RNA subsequences that appeared most often as potent were different than those that appeared most often as nonpotent siRNAs. We found that “UU” occurred more often than other 2-mer RNA subsequences in potent siRNAs, whereas “GG” and “CC” appeared most often in nonpotent siRNAs. Most of the “UU” 2-mers were found at positions 1, 4, 6, and 7 of potent siRNAs. In nonpotent siRNAs, “GG” often occurred at positions 1, 13, 14, 15, and 16 and “CC” often occurred at positions 3, 4, 5, 6, and 20.

There are 64 3-mer RNA subsequences, that is, AAA, AAU, etc. In addition, the frequencies of all 3-mer RNA subsequences at positions 1 to 19 are calculated for the two classes. The significance level is calculated by Student’s -test and the 3-mer RNA subsequences with minimal value are shown in Table 2 ( value < 0.05).

Table 2: Primary trinucleotides with minimal value.

The results demonstrate that di- and trinucleotides of siRNAs at certain positions can be used as indicators to distinguish between potent siRNAs and nonpotent siRNAs and can possibly be used as a potential feature for siRNA activity prediction.

2.3. Feature Extraction

A total of 230 features are extracted in this section for siRNA activity prediction. These features include 2-3NTs, thermodynamic stability, nucleotide representation, and nucleotide compositions.

2.3.1. 2-3NTs

2-3NTs are categorical features extracted from the di- and trinucleotides of siRNAs.

We defined the feature vector including 20 categorical features extracted from the dinucleotides of siRNA as follows:where .

The categorical feature is calculated using the following formula:where

Then, the feature vector , which includes 19 categorical features, is extracted from the trinucleotides of siRNA as follows:where .

The categorical feature is calculated using the following formula:where

2.3.2. Thermodynamic Stability

The thermodynamic stability of siRNA may influence the strand selection in the process of RNAi; thus it would influence the RNAi activity [23]. is the sum of all the siRNA local duplex stability. The siRNA local duplex stability is calculated for every two base pairs along the siRNA duplex and the thermodynamic parameters for calculations were supplied by Xia et al. [29]. The is the difference of duplex formation at the 5′ and 3′ ends of siRNA for 5 terminal nucleotides.

2.3.3. Nucleotide Representation

Preferred nucleotides at specific positions are important indicators for activity prediction [21]. For example, the nucleotides at the first position of potent siRNAs were most often or , while often appeared at positions 7 and 11 in nonpotent siRNAs [11, 20]. We defined the siRNA as a 21-dimensional vector and indicated the nucleotides at all positions. , , , and were digitized as 0.1, 0.2, 0.3, and 0.4.

2.3.4. Nucleotide Compositions

The compositions of short motifs of 1–3 nt in siRNA and mRNA contained relevant information for activity prediction [30, 31]. There are 4, 16, and 64 possible subsequences for all 1-mer, 2-mer, and 3-mer RNAs, respectively. Thus, there are 168 features extracted from nucleotide compositions.

2.4. Model Construction

Random Forest (RF) [32] is an ensemble learning method for classification and regression by growing a collection of trees. In the process of regression, the trees are constructed using a training set with variables. variables from these input variables are selected for the construction of an individual tree. The mean prediction of the individual tree will be output when the testing samples are pushed down these trees. Because the RF algorithm can randomly select features to build the ensemble of trees, it has stronger robustness than other methods. In this paper, the RF algorithm was used to develop siRNA activity prediction model.

2.5. Feature Selection

We combined 39 2-3NTs, 2 thermodynamic stabilities, 21 nucleotide representations, and 168 nucleotide compositions to obtain a 230-dimensional feature vector. Since the contributions of these features are different, we proposed BSFS algorithm based on RF-variable importance to select the optimal feature set. The process of the algorithm is shown as follows.

Firstly, all features are ranked in descending order according to its . The is calculated by the RF algorithm to measure the feature importance [32]. To get the , Variable Importance (VI) should be first calculated.

of the variable was calculated according to the mean decrease in classification accuracy after permuting values of variable over all trees. The of each tree is computed as follows:where is OOB samples of tree .where is the variable value and is predicted class before permutation.where is the variable value after randomly permuting the th variable and is the predicted class after permutation.

Please note that if is not in the tree , then .

Over all trees, is defined as follows:where is the number of trees in the Random Forest.

Finally, the of the th feature is defined as follows:where is the standard deviation of the raw importance.

Secondly, the first features are selected as the optimal features. Set and the calculation process of threshold is summarized in Algorithm 1.

Algorithm 1: The calculation process of threshold .
2.6. Model Performance Evaluation

As a validation step, we used the Pearson Correlation Coefficient (PCC) to describe the correlation between experimentally determined and predicted siRNA activity. It may be defined as follows:where is the sample size and and are the average value and standard deviation, respectively.

In addition, the Receiver Operating Characteristic (ROC) curve is applied to illustrate the performance of a binary classifier system by plotting sensitivity ( axis) against ( axis) at various threshold settings.where TN is the number of true negatives, FN is the number of false negatives, TP is the number of true positives, and FP is the number of false positives.

The area under the ROC curve (AUC) is a single measurement of the algorithm’s overall performance, and AUC of 1 and 0.5 represents perfect classification and random classification, respectively.

3. Results and Discussion

3.1. Performance of the 2-3NTs Features

To investigate the importance of di- and trinucleotides of siRNA, we learn two RF regression models trained using Huesken_train and tested on Huesken_test. “model 1” is constructed with 2 thermodynamic stabilities, 21 nucleotide representations, and 168 nucleotide compositions, which are often used for siRNA activity prediction [24]. Then, “model 2” which extended “model 1” by considering 39 2-3NTs was constructed for comparisons.

The experimental prediction results are shown in Figure 1, and the PCC between the observed and predicted siRNA activities for model 1 and model 2 are 0.671 and 0.704, respectively. The prediction efficacy achieved 4.92% improvement after adding the new proposed features. It validates that 2-3NTs are important features for the prediction of siRNA activity.

Figure 1: Comparison between model 1 and model 2. Observed siRNA activities of the Huesken_test are plotted against predicted siRNA activities by model 1 (a) and model 2 (b).
3.2. Feature Selection Result

The optimal feature set is obtained by our proposed BSFS algorithm. The details of this algorithm are shown in Section 2.5.

Table 3 shows the threshold “” and the prediction accuracy “PCC” of our model with the top features for all steps. The results show that, when , the PCC of our model reaches a maximum of 0.722. Thus, we choose as the threshold of the feature selection algorithm.

Table 3: The performance of our model with the top features.

As shown in Figure 2, 57 features are selected by the BSFS algorithm and ranked in descending order according to . The higher the , the stronger the predictive ability of the feature. There are ten features proposed by our paper in the selective feature set, including the trinucleotides at positions 1, 2, 7, 18, and 19 and the dinucleotides at positions 1, 2, 8, and 19. Significantly, Takahashi noted the terminal bps of RNA (positions 19–21) provide Argonaute protein binding sites [26]. Our results show that “CUG” occurred most often at this position in potent siRNAs. The Argonaute protein is the endonuclease of RNA-induced silencing complexes (RISC) and cleaves the target mRNA whose sequence is complementary to the guide strand of siRNA [26]. We consider that, because the trinucleotide at position 19 is the binding site of the Argonaute protein, it will influence siRNA activity. However, further experiments are needed to validate if the Argonaute protein prefers to bind to potent siRNAs with specific trinucleotides at position 19.

Figure 2: The 57 features selected by the BSFS method.

Some other features previously proven to be associated with silencing efficacy are selected, including the nucleotides at positions 1, 2, 7 and 19; thermodynamic stability and ; and U%, GGG%, C%, G%, CC%, GG%, GGC%, UGA%, CG%, GCC%, UC%, ACU%, UUC%, AA%, UU%, CGG%, AUG%, AG%, and AGA% of siRNA; AAU%, UUG%, GGG%, AAA%, ACA%, GU%, GCA%, CGU%, GCU%, CU%, GC%, CCG%, AGU%, CGA%, UA%, AU%, UAU%, UAA%, CUC%, GCG%, CUU%, AUU%, and CAU% of mRNA. Graphical boxplots are shown in Figure 3 to display the spread of potent and nonpotent siRNAs for the top 15 features.

Figure 3: Boxplots of the top 15 features. For each plot, the left side represents potent siRNAs, and the right side represents nonpotent siRNAs.
3.3. Comparison of Algorithms

After finding the optimal feature set, the final model, siRNApred, was created. The parameters and are the number of decision trees to be grown in the forest and the number of variables to split at each node, respectively. The default and are 500 and . is the number of features. To find the optimal parameters, we used a grid search method with the step size of 100 and 1. The final results are and . The PCC between the observed and predicted siRNA activities of our model with these parameters is 0.722, which is a 1.7% improvement compared to the model with default parameters. However, the results are not sensitive to over the range 24–30 according to our experimental results.

To test the performance of siRNApred, we compared our model with the most state-of-the-art methods for siRNA activity prediction recently reported in the literature. Two experiments were carried out in the same conditions and the comparative evaluation is as follows.

First, our method was compared with Biopredsi [20], -score [23], ThermoComposition-21 [21], and DSIR [22]. All the algorithms were trained using Huesken_train and tested on Huesken_test. Table 4 shows that the PCC between observed and predicted siRNA activities of our model tested on Huesken_test is 0.722, which is 9.39%, 10.39%, 9.56%, and 7.76% higher than the other four algorithms.

Table 4: PCC between observed and predicted siRNA activities for five algorithms.

In addition, the ROC curves combining both sensitivity and specificity of the five methods are plotted (Figure 4). For ROC analysis, siRNAs that produce at least 70% target gene knockdown were accepted as active siRNAs, and those below 70% were considered inactive siRNA. We calculated an AUC of 0.898 for our model, which is better than those obtained from Biopredsi, -score, ThermoComposition-21, and DSIR.

Figure 4: ROC curves of the five algorithms.

In siRNA design, more inactive siRNAs predicted as active siRNAs will increase the experimental cost, so siRNA design tools are expected to be capable of rejecting as many false positives as possible and retain the maximum number of true positives. Consequently, we should focus on the area that has higher specificity and compare the sensitivities among different algorithms in this area. Figure 4 shows that in the higher specificity area, siRNApred outperforms all other algorithms. Table 5 shows two group sensitivities of all the algorithms. When the specificity of all algorithms is 96.5%, the sensitivity of our method is 51.9%. The value is higher than Biopredsi, -score, ThermoComposition-21, and DSIR, which is 16.3%, 24.4%, 28.9%, and 20%, respectively. Our model also performs best when the specificity of all the algorithms is 99.1%. The results demonstrate that our method had more advantages than the other four algorithms for siRNA design.

Table 5: The five algorithms’ sensitivities in the high specificity area.

A second experiment was conducted to compare our model with the other nine models, including the first-generation siRNA design algorithms Reynolds [11], Ui-Tei [14], Amarzguioui [12], Katoh [9], Hsieh [33], and Takasaki [34] and the second-generation algorithms Biopredsi, -score, ThermoComposition-21, and DSIR. All the algorithms were trained on Huesken_train and tested on the three independent datasets of Vickers, Reynolds, and Harborth.

Figure 5 shows that siRNApred achieves the highest PCC compared to all nine models on all three independent testing datasets and obtained a higher AUC except when tested on Vickers’ dataset. Otherwise, siRNApred produces more stable results across each of the independent siRNA datasets. In addition, the results show that both the PCC and AUC of the first-generation siRNA design algorithms are lower than the second-generation algorithms.

Figure 5: Comparisons of ten algorithms using the three independent datasets of Vickers, Reynolds, and Harborth.

It was found that siRNApred is more stable and effective than other models in the two experiments. The reason may be that our model takes account into the influence of di- and trinucleotides and removes several redundant features. The comparison results demonstrated that prediction accuracy can be improved significantly when considering the 2-3NTs of siRNA guide strand.

4. Conclusions

Activity prediction of siRNA is a critical step for the successful implementation of RNAi. In this study, we introduced 2-3NTs as our new features. A new mixed 230-dimensional feature set was formed by combining 191 traditional features and our 39 proposed features. Since there were many potential features, the BSFS method based on RF-variable importance was proposed to select the optimal feature set. A total of 57 features were selected as input vectors of the RF model to predict siRNA activity, and nine of our proposed features were included. Significantly, the trinucleotide motif at position 19 was included in the selected feature set, which is the binding site of the Argonaute protein. We found that “CUG” occurred most often at position 19 of potent siRNAs. Further experiments are needed to validate if the Argonaute protein prefers to bind to potent siRNAs possessing a specific trinucleotide at position 19. Finally, we describe a highly accurate and reliable tool called “siRNApred.” It can design effective siRNAs for an input mRNA using an optimal feature set. The experimental comparative evaluation on commonly used datasets showed that siRNApred produced better results than first-generation and second-generation siRNA design methods. Consequently, we consider siRNApred a worthy tool for efficient siRNA design.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors would like to acknowledge the support of the National Natural Science Foundation of China (NSFC) under Grant no. 61471181, Natural Science Foundation of Jilin Province under Grant nos. 20140101194JC and 20150101056JC, the Fundamental Research Funds for the Central Universities under Grant no. 2412016KJ033, and the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, under Grant no. 93K172016K04.

References

  1. L. Timmons and A. Fire, “Specific interference by ingested dsRNA,” Nature, vol. 395, no. 6705, p. 854, 1998. View at Publisher · View at Google Scholar · View at Scopus
  2. M. K. Montgomery, S. Xu, and A. Fire, “RNA as a target of double-stranded RNA-mediated genetic interference in Caenorhabditis elegans,” Proceedings of the National Academy of Sciences of the United States of America, vol. 95, no. 26, pp. 15502–15507, 1998. View at Publisher · View at Google Scholar · View at Scopus
  3. S. M. Elbashir, J. Harborth, W. Lendeckel, A. Yalcin, K. Weber, and T. Tuschl, “Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells,” Nature, vol. 411, no. 6836, pp. 494–498, 2001. View at Publisher · View at Google Scholar · View at Scopus
  4. C. D. Novina and P. A. Sharp, “The RNAi revolution,” Nature, vol. 430, no. 6996, pp. 161–164, 2004. View at Publisher · View at Google Scholar · View at Scopus
  5. L. Aagaard and J. J. Rossi, “RNAi therapeutics: principles, prospects and challenges,” Advanced Drug Delivery Reviews, vol. 59, no. 2-3, pp. 75–86, 2007. View at Publisher · View at Google Scholar · View at Scopus
  6. C. M. McMillen, D. H. Beezhold, F. M. Blachere, S. Othumpangat, M. L. Kashon, and J. D. Noti, “Inhibition of influenza A virus matrix and nonstructural gene expression using RNA interference,” Virology, vol. 497, pp. 171–184, 2016. View at Publisher · View at Google Scholar · View at Scopus
  7. F. Wang, Y. Sun, J. Ruan et al., “Using small RNA deep sequencing data to detect human viruses,” BioMed Research International, vol. 2016, Article ID 2596782, 9 pages, 2016. View at Publisher · View at Google Scholar · View at Scopus
  8. T. Wang, S. Shigdar, H. A. Shamaileh et al., “Challenges and opportunities for siRNA-based cancer treatment,” Cancer Letters, vol. 387, no. 28, pp. 77–83, 2017. View at Publisher · View at Google Scholar · View at Scopus
  9. T. Katoh and T. Suzuki, “Specific residues at every third position of siRNA shape its efficient RNAi activity,” Nucleic Acids Research, vol. 35, no. 4, article no. e27, 2007. View at Publisher · View at Google Scholar · View at Scopus
  10. S. M. Elbashir, J. Harborth, K. Weber, and T. Tuschl, “Analysis of gene function in somatic mammalian cells using small interfering RNAs,” Methods, vol. 26, no. 2, pp. 199–213, 2002. View at Publisher · View at Google Scholar · View at Scopus
  11. A. Reynolds, D. Leake, Q. Boese, S. Scaringe, W. S. Marshall, and A. Khvorova, “Rational siRNA design for RNA interference,” Nature Biotechnology, vol. 22, no. 3, pp. 326–330, 2004. View at Publisher · View at Google Scholar · View at Scopus
  12. M. Amarzguioui and H. Prydz, “An algorithm for selection of functional siRNA sequences,” Biochemical and Biophysical Research Communications, vol. 316, no. 4, pp. 1050–1058, 2004. View at Publisher · View at Google Scholar · View at Scopus
  13. A. Khvorova, A. Reynolds, and S. D. Jayasena, “Functional siRNAs and miRNAs exhibit strand bias,” Cell, vol. 115, no. 2, pp. 209–216, 2003. View at Publisher · View at Google Scholar · View at Scopus
  14. K. Ui-Tei, Y. Naito, F. Takahashi et al., “Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference,” Nucleic Acids Research, vol. 32, no. 3, pp. 936–948, 2004. View at Publisher · View at Google Scholar · View at Scopus
  15. S. Schubert, A. Grünweller, V. A. Erdmann, and J. Kurreck, “Local RNA target structure influences siRNA efficacy: systematic analysis of intentionally designed binding regions,” Journal of Molecular Biology, vol. 348, no. 4, pp. 883–893, 2005. View at Publisher · View at Google Scholar · View at Scopus
  16. K. Q. Luo and D. C. Chang, “The gene-silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region,” Biochemical and Biophysical Research Communications, vol. 318, no. 1, pp. 303–310, 2004. View at Publisher · View at Google Scholar · View at Scopus
  17. S. M. Yiu, P. W. H. Wong, T. W. Lam et al., “Filtering of ineffective siRNAs and improved siRNA design tool,” Bioinformatics, vol. 21, no. 2, pp. 144–151, 2005. View at Publisher · View at Google Scholar · View at Scopus
  18. Y. Ren, W. Gong, Q. Xu et al., “siRecords: an extensive database of mammalian siRNAs with efficacy ratings,” Bioinformatics, vol. 22, no. 8, pp. 1027–1028, 2006. View at Publisher · View at Google Scholar · View at Scopus
  19. P. Sætrom and O. Snøve Jr., “A comparison of siRNA efficacy predictors,” Biochemical and Biophysical Research Communications, vol. 321, no. 1, pp. 247–253, 2004. View at Publisher · View at Google Scholar · View at Scopus
  20. D. Huesken, J. Lange, C. Mickanin et al., “Design of a genome-wide siRNA library using an artificial neural network,” Nature Biotechnology, vol. 23, no. 8, pp. 995–1001, 2005. View at Google Scholar
  21. S. A. Shabalina, A. N. Spiridonov, and A. Y. Ogurtsov, “Computational models with thermodynamic and composition features improve siRNA design,” BMC Bioinformatics, vol. 7, article no. 65, 2006. View at Publisher · View at Google Scholar · View at Scopus
  22. J.-P. Vert, N. Foveau, C. Lajaunie, and Y. Vandenbrouck, “An accurate and interpretable model for siRNA efficacy prediction,” BMC Bioinformatics, vol. 7, article no. 520, 2006. View at Publisher · View at Google Scholar · View at Scopus
  23. M. Ichihara, Y. Murakumo, A. Masuda et al., “Thermodynamic instability of siRNA duplex is a prerequisite for dependable prediction of siRNA activities,” Nucleic Acids Research, vol. 35, no. 18, article no. e123, 2007. View at Publisher · View at Google Scholar · View at Scopus
  24. O. Matveeva, Y. Nechipurenko, L. Rossi et al., “Comparison of approaches for rational siRNA design leading to a new efficient and transparent method,” Nucleic Acids Research, vol. 35, no. 8, article no. e63, 2007. View at Publisher · View at Google Scholar · View at Scopus
  25. B. N. Thang, T. B. Ho, and T. Kanda, “A semi-supervised tensor regression model for siRNA efficacy prediction,” BMC Bioinformatics, vol. 16, no. 1, article no. 80, 2015. View at Publisher · View at Google Scholar · View at Scopus
  26. T. Takahashi, S. Zenno, O. Ishibashi, T. Takizawa, K. Saigo, and K. Ui-Tei, “Interactions between the non-seed region of siRNA and RNA-binding RLC/RISC proteins, Ago and TRBP, in mammalian cells,” Nucleic Acids Research, vol. 42, no. 8, pp. 5256–5269, 2014. View at Publisher · View at Google Scholar · View at Scopus
  27. J. Harborth, S. M. Elbashir, K. Vandenburgh et al., “Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing,” Antisense and Nucleic Acid Drug Development, vol. 13, no. 2, pp. 83–105, 2003. View at Publisher · View at Google Scholar · View at Scopus
  28. T. A. Vickers, S. Koo, C. F. Bennett, S. T. Crooke, N. M. Dean, and B. F. Baker, “Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents. A comparative analysis,” Journal of Biological Chemistry, vol. 278, no. 9, pp. 7108–7118, 2003. View at Publisher · View at Google Scholar · View at Scopus
  29. T. Xia, J. SantaLucia Jr., M. E. Burkard et al., “Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs,” Biochemistry, vol. 37, no. 42, pp. 14719–14735, 1998. View at Publisher · View at Google Scholar · View at Scopus
  30. R. Teramoto, M. Aoki, T. Kimura, and M. Kanaoka, “Prediction of siRNA functionality using generalized string kernel and support vector machine,” FEBS Letters, vol. 579, no. 13, pp. 2878–2882, 2005. View at Publisher · View at Google Scholar · View at Scopus
  31. Y. Liu, Y. Chang, C. Zhang et al., “Influence of mRNA features on siRNA interference efficacy,” Journal of Bioinformatics and Computational Biology, vol. 11, no. 3, Article ID 1341004, 2013. View at Publisher · View at Google Scholar · View at Scopus
  32. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at Scopus
  33. A. C. Hsieh, R. Bo, J. Manola et al., “A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens,” Nucleic Acids Research, vol. 32, no. 3, pp. 893–901, 2004. View at Publisher · View at Google Scholar · View at Scopus
  34. S. Takasaki, S. Kotani, and A. Konagaya, “An effective method for selecting siRNA target sequences in mammalian cells,” Cell Cycle, vol. 3, no. 6, pp. 790–795, 2004. View at Google Scholar · View at Scopus