Predicting Gram-Positive Bacterial Protein Subcellular Location by Using Combined Features

Li, Feng-Min; Gao, Xiao-Wei

doi:https://doi.org/10.1155/2020/9701734

BioMed Research International


Special Issue

Scalable Machine Learning Algorithms in Computational Biology and Biomedicine 2020 (SDM20-2738)


Research Article | Open Access

Volume 2020 | Article ID 9701734 | https://doi.org/10.1155/2020/9701734


Guest Editor: Quan Zou

Received 23 May 2020

Revised 30 Jun 2020

Accepted 13 Jul 2020

Published 03 Aug 2020

Abstract

Bacteria are abundant in the environment, and Gram-positive bacteria are among the most common. Some Gram-positive bacteria are very harmful to the human body, so it is important to predict the subcellular location of Gram-positive bacterial proteins. Identifying these locations is also important for developing effective drugs. In this paper, a new Gram-positive bacterial protein subcellular location dataset was established. The following were selected as feature parameters and then combined:

- the amino acid composition;
- the gene ontology annotation information;
- the hydropathy dipeptide composition information;
- the amino acid dipeptide composition information;
- the autocovariance average chemical shift information.

The locations of Gram-positive bacterial proteins were predicted with the Support Vector Machine (SVM) algorithm. The overall accuracy (OA) reached 86.1% under the Jackknife test, higher than that of existing methods. This improved method may be helpful for protein function prediction.

1. Introduction

The cell is the most basic unit of life, and it contains many protein molecules. A protein can perform its proper function only when it is located in the correct subcellular compartment [1]. Studying protein subcellular location therefore helps us understand the biological function of proteins at the cellular level. In the post-genomic era, the volume of biological sequence data has grown rapidly. Traditional experimental methods have become too time-consuming and labor-intensive to keep pace. As a result, predicting protein subcellular location with machine learning methods has become an active research topic in bioinformatics [2–7].

Gram-positive bacteria are those that retain the crystal violet dye, and hence their blue-violet color, after Gram staining. They are widespread in the environment and in the human body, and some of them are harmful to human health. It is therefore important to study the subcellular location of Gram-positive bacterial proteins. Only a few studies have addressed this problem:

- In 2007, Shen and Chou [8] established a Gram-positive bacteria dataset with five categories. Using the GO-PseAA discrete model and the Fusion OET-KNN method, they achieved an overall success rate of 82.7% in the Jackknife test.
- In 2009, Shen and Chou [9] rebuilt the dataset with four categories: cell wall, cell membrane, cytoplasm, and extracellular. They extracted gene ontology and functional domain features, and the overall success rate reached 82.2% in the Jackknife test.
- In 2012, Hu et al. [10] constructed the GP25 dataset and reported an overall success rate of 85.9%.
- At the 9th International Conference on Electrical and Computer Engineering (ICECE 2016), Rahman et al. [11] proposed two hybrid features, AACPPM and PAACPPM, which combined PPM with AAC and PseAAC, respectively. Both achieved an accuracy of 73.2%.
- In 2017, Xiao et al. [12] applied a new algorithm to the dataset established by Shen and Chou in 2009 and obtained better results.
- In 2018, Xiao et al. [13] developed a bias-reducing predictor based on a quasi-balanced training dataset.

In this paper, we reconstructed the Gram-positive bacterial protein subcellular location dataset. Five kinds of information were selected as feature parameters and then combined:

- the amino acid composition [14];
- the amino acid dipeptide composition [15, 16];
- the gene ontology [17] annotation information;
- the hydropathy dipeptide composition [18];
- the autocovariance average chemical shift [19].

Using the combined parameter AAC+DC+hpDC with a Support Vector Machine, the overall accuracy reached 86.1% in the Jackknife test.

2. Materials and Methods

2.1. Dataset

To collect as much relevant information as possible while keeping the dataset high in quality, protein sequences were collected from the Swiss-Prot [20] database at http://www.uniprot.org/. The dataset was built strictly according to the following criteria:

(1) All protein sequences with "Actinobacteria" or "Firmicutes" in the OC (organism classification) field were retrieved from the UniProtKB/Swiss-Prot database.
(2) Proteins with different locations in the "Subcellular Location" annotation were selected. Entries with ambiguous or uncertain terms, such as "By similarity" and "Probably," were removed.
(3) Only sequences 50 to 3000 amino acids long were selected.
(4) Sequences annotated with two or more locations were excluded.
(5) Sequences annotated as "fragment" or containing the ambiguous residue codes "B," "X," or "Z" were excluded.
(6) To avoid homology bias, CD-HIT [21] was used to remove sequences sharing ≥25% sequence identity with any other sequence in the same subcellular location.
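Criteria (2) to (5) amount to a per-entry filter. The sketch below assumes each entry has already been parsed from Swiss-Prot into a hypothetical `Entry` record. Its field names are illustrative and are not the UniProt schema. Criterion (1), the OC query, is left to the database search, and criterion (6) to CD-HIT itself.

```python
from dataclasses import dataclass, field

AMBIGUOUS_RESIDUES = set("BXZ")
UNCERTAIN_QUALIFIERS = {"By similarity", "Probable", "Probably"}


@dataclass
class Entry:
    """Hypothetical parsed Swiss-Prot entry (field names are illustrative)."""
    accession: str
    sequence: str
    locations: list = field(default_factory=list)   # curated location terms
    qualifiers: list = field(default_factory=list)  # evidence qualifiers on those terms
    is_fragment: bool = False


def passes_criteria(entry: Entry) -> bool:
    """Apply criteria (2)-(5) from the text to a single entry."""
    # (2) discard location annotations marked as uncertain
    if any(q in UNCERTAIN_QUALIFIERS for q in entry.qualifiers):
        return False
    # (3) keep sequences of 50-3000 residues
    if not 50 <= len(entry.sequence) <= 3000:
        return False
    # (4) keep proteins annotated with exactly one location
    if len(set(entry.locations)) != 1:
        return False
    # (5) discard fragments and sequences containing ambiguous residue codes
    if entry.is_fragment or AMBIGUOUS_RESIDUES & set(entry.sequence.upper()):
        return False
    return True


if __name__ == "__main__":
    entries = [
        Entry("P1", "M" * 120, ["Cytoplasm"]),
        Entry("P2", "M" * 30, ["Cytoplasm"]),                    # too short
        Entry("P3", "M" * 120, ["Cytoplasm", "Cell membrane"]),  # two locations
    ]
    print([e.accession for e in entries if passes_criteria(e)])
```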

After these steps, we obtained 700 Gram-positive bacterial proteins. Their distribution over subcellular locations is shown in Table 1.

2.2. Amino Acid Composition (AAC)

Sequence information is the most basic source of features for proteins [22]. A protein sequence consists of 20 native amino acids (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y). The occurrence frequency of each of these 20 amino acids in a protein sequence is an important feature, so it was selected as one of the feature parameters. The amino acid composition can be expressed as a 20-dimensional feature vector:

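$$P_{\text{AAC}} = [f_1, f_2, \ldots, f_{20}]^{T}, \qquad f_i = \frac{n_i}{L} \quad (i = 1, 2, \ldots, 20),$$

where $n_i$ is the number of occurrences of the $i$-th native amino acid in the protein, $L$ is the length of the protein, and $T$ is the transpose operator.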
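As a minimal sketch, the AAC vector can be computed directly from its definition. The function name `aac` is our own, and the residue order follows the alphabetical listing above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 native residues, in the order listed above


def aac(sequence: str) -> list:
    """Return the 20-dimensional amino acid composition f_i = n_i / L."""
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    return [seq.count(residue) / length for residue in AMINO_ACIDS]


if __name__ == "__main__":
    vector = aac("ACDA")
    print(vector[:3])  # frequencies of A, C, D: [0.5, 0.25, 0.25]
```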

2.3. Dipeptide Composition (DC)

A main drawback of the amino acid composition is that it captures only global composition and ignores sequence-order information. To make fuller use of the sequence, we also used the amino acid dipeptide composition. Dipeptide composition extends AAC by recording the frequency of each pair of adjacent amino acids. This gives a 400-dimensional vector, one entry for each of the 20 × 20 possible dipeptides [23–25]. The dipeptide composition can be formulated as follows:

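$$P_{\text{DC}} = [f_1, f_2, \ldots, f_{400}]^{T}, \qquad f_j = \frac{n_j}{L-1} \quad (j = 1, 2, \ldots, 400),$$

where:

- $f_j$ is the occurrence frequency of the $j$-th dipeptide;
- $n_j$ is the number of occurrences of the $j$-th dipeptide in the protein;
- $L$ is the length of the protein, so $L - 1$ is the number of adjacent residue pairs.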
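The 400-dimensional DC vector can be sketched the same way. This sketch makes two illustrative choices. It normalizes by $L - 1$, the number of adjacent pairs. It orders the dipeptides alphabetically: AA, AC, ..., YY.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: AA, AC, ..., YY
DIPEPTIDE_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def dipeptide_composition(sequence: str) -> list:
    """Return the 400-dimensional DC vector f_j = n_j / (L - 1)."""
    seq = sequence.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("need at least two residues")
    counts = [0] * len(DIPEPTIDE_INDEX)
    for i in range(n_pairs):
        # A KeyError here means a non-standard residue; criterion (5) removes B, X and Z
        counts[DIPEPTIDE_INDEX[seq[i:i + 2]]] += 1
    return [c / n_pairs for c in counts]


if __name__ == "__main__":
    v = dipeptide_composition("ACAC")  # adjacent pairs: AC, CA, AC
    print(v[DIPEPTIDE_INDEX["AC"]], v[DIPEPTIDE_INDEX["CA"]])
```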

2.4. Gene Ontology (GO)

Gene ontology is a directed acyclic graph of terms that is widely used in bioinformatics. It consists of three parts: biological process (P), molecular function (F), and cellular component (C). In the gene ontology database, each accession (AC) number is associated with GO identifiers of the form GO:XXXXXXX (seven digits). Cellular component (C) terms directly describe the location of a protein, so using them would leak the answer into the features. To ensure a fair and accurate prediction, only biological process (P) and molecular function (F) terms were extracted.

The specific steps are as follows:

Step 1. The text-format entries of all protein sequences were downloaded from Swiss-Prot, and all biological process (P) and molecular function (F) annotations were extracted.

Step 2. For sequences lacking biological process (P) or molecular function (F) annotations, BLAST [26] was used to find homologous sequences. The sequence identity threshold was set to 60%, and the E-value threshold was set to 0.001.
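One way to read Step 2 is as annotation transfer from a sufficiently similar, annotated homolog. The sketch below makes several assumptions:

- BLAST results have already been parsed into hypothetical `BlastHit` tuples.
- GO terms are copied from the single best qualifying hit. The text does not say how multiple hits were combined, so this rule is our own.
- The GO identifiers in the example are placeholders.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """Hypothetical parsed BLAST hit (field names are illustrative)."""
    subject: str     # accession of the homolog
    identity: float  # percent identity
    evalue: float


def transfer_go_terms(hits, go_annotations, min_identity=60.0, max_evalue=1e-3):
    """Return the P/F GO terms of the best qualifying homolog, or an empty set.

    go_annotations maps an accession to its set of P and F GO identifiers.
    """
    qualifying = [
        h for h in hits
        if h.identity >= min_identity and h.evalue <= max_evalue and go_annotations.get(h.subject)
    ]
    if not qualifying:
        return set()
    # Assumption: the best hit has the lowest E-value (ties broken by higher identity)
    best = min(qualifying, key=lambda h: (h.evalue, -h.identity))
    return set(go_annotations[best.subject])


if __name__ == "__main__":
    hits = [BlastHit("Q1", 55.0, 1e-10), BlastHit("Q2", 72.0, 1e-5), BlastHit("Q3", 90.0, 0.01)]
    annotations = {"Q1": {"GO:0000001"}, "Q2": {"GO:0000002", "GO:0000003"}, "Q3": {"GO:0000004"}}
    print(transfer_go_terms(hits, annotations))  # Q1 fails identity, Q3 fails E-value
```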

Step 3. The frequency of occurrence of each GO term was calculated:

$$f_i = \frac{n_i}{N},$$

where $n_i$ is the number of occurrences of the $i$-th GO term among the Gram-positive bacterial proteins at a given subcellular location, and $N$ is the total number of protein sequences at that location. A threshold $\theta$ was set, and a GO term was retained when $f_i \ge \theta$.

Step 4. The retained GO terms of all target sequences were merged and duplicates were removed, yielding 2573 GO terms. These 2573 terms were then encoded as one vector:

$$P_{\text{GO}} = [g_1, g_2, \ldots, g_{2573}]^{T},$$

where $g_j = 1$ if the protein is annotated with the $j$-th GO term and $g_j = 0$ otherwise.
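Steps 3 and 4 can be sketched as a frequency filter followed by binary encoding. The inputs are hypothetical: one set of GO identifiers per protein at a location, with placeholder identifiers. The threshold value is also hypothetical, because the text does not state the one used.

```python
from collections import Counter


def retained_go_terms(location_annotations, threshold):
    """Step 3: keep GO terms with f_i = n_i / N >= threshold at one location.

    location_annotations: one set of GO identifiers per protein at the location.
    """
    n_proteins = len(location_annotations)
    counts = Counter(term for terms in location_annotations for term in set(terms))
    return {term for term, n in counts.items() if n / n_proteins >= threshold}


def go_vector(protein_terms, vocabulary):
    """Step 4: binary vector over an ordered GO vocabulary (1 = annotated)."""
    return [1 if term in protein_terms else 0 for term in vocabulary]


if __name__ == "__main__":
    cytoplasm = [{"GO:0000001"}, {"GO:0000001", "GO:0000002"}, {"GO:0000003"}, {"GO:0000001"}]
    membrane = [{"GO:0000004"}, {"GO:0000004", "GO:0000001"}]
    threshold = 0.5  # hypothetical value
    # Merge the retained terms of all locations; the set union removes duplicates
    vocabulary = sorted(retained_go_terms(cytoplasm, threshold) | retained_go_terms(membrane, threshold))
    print(vocabulary)
    print(go_vector({"GO:0000004"}, vocabulary))
```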

2.5. Autocovariance Average Chemical Shifts (acACS)

A key issue in building a predictor is how to extract features from the primary sequence of a protein. The acACS [27, 28] algorithm was proposed to represent a protein sample using simple secondary structure information. The average chemical shift of a protein is closely related to its secondary structure [29] and to its function.

The features were obtained in two stages. First, the secondary structure of each protein sequence (coil C, helix H, and strand E) was obtained by submitting the sequence to the PSIPRED online tool (http://bioinf.cs.ucl.ac.uk/psipred/). Second, the predicted secondary structure was submitted to the acACS average chemical shift web service of Fan et al. [30] (http://wlxy.imu.edu.cn/college/biostation/fuwu/acACS/index.asp) to obtain the chemical shift results.

Consider a protein $P = R_1 R_2 \cdots R_L$, where $L$ is the length of the sequence and each $R_i$ is one of the 20 amino acid residues. Its descriptor consists of correlation factors $\theta_\lambda$. Here $\theta_\lambda$ is the correlation between the average chemical shift of $R_i$ and that of $R_{i+\lambda}$ along the sequence, and $\lambda$ is the rank (lag) of the correlation. The factors can be computed from different combinations of chemically shifted atoms. To obtain the best accuracy, an appropriate value of $\lambda$ and the best combination of atoms were selected.
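The exact acACS formula is defined in [30] and computed by the web service above. As an illustration only, the sketch below computes a generic mean-centred lag-λ autocovariance of a per-residue numeric signal. This is the kind of correlation factor the text describes, but it is not necessarily Fan et al.'s definition. Two further choices are illustrative: the per-residue values would be average chemical shifts, but the numbers in the example are made up, and the lag range 1..max_lag is our own.

```python
def autocovariance(values, lag):
    """Mean-centred lag-`lag` autocovariance of a per-residue numeric signal."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    n_pairs = n - lag
    return sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n_pairs)) / n_pairs


def autocovariance_vector(values, max_lag):
    """Correlation factors for lags 1..max_lag (an illustrative choice of range)."""
    return [autocovariance(values, lag) for lag in range(1, max_lag + 1)]


if __name__ == "__main__":
    # Made-up per-residue values standing in for average chemical shifts
    shifts = [4.3, 4.1, 3.9, 4.5, 4.2, 4.0]
    print(autocovariance_vector(shifts, 3))
```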

2.6. Hydropathy Dipeptide Composition (hpDC)

Hydropathy dipeptide composition (hpDC) extends dipeptide composition by first grouping residues according to hydropathy. The 20 amino acids were divided into 6 categories [31] according to their hydrophilicity and hydrophobicity:

- strongly hydrophilic amino acids (H);
- strongly hydrophobic amino acids (L);
- weakly hydrophilic or weakly hydrophobic amino acids (W);
- three single-residue categories with special chemical structures: proline (P), glycine (G), and cysteine (C).

A protein sequence is thus rewritten as a sequence over these six categories. Its hydropathy dipeptide composition is a discrete representation given by a 36-dimensional vector:

$$P_{\text{hpDC}} = [f_1, f_2, \ldots, f_{36}]^{T}, \qquad f_k = \frac{n_k}{L-1} \quad (k = 1, 2, \ldots, 36),$$

where $f_k$ is the occurrence frequency of the $k$-th hydropathy dipeptide, $n_k$ is the number of occurrences of the $k$-th hydropathy dipeptide in the protein, and $L$ is the length of the protein.
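A minimal sketch of hpDC follows. The residue-to-class map is hypothetical. P, G and C form their own classes as in the text. The split of the remaining 17 residues into H, L and W is an illustrative assumption loosely following Kyte-Doolittle hydropathy, not the grouping of [31]. Normalization by $L - 1$ is likewise assumed.

```python
from itertools import product

# Hypothetical residue-to-class map: the H/L/W assignment is an assumption,
# not the grouping used in reference [31].
HYDROPATHY_CLASS = {
    **dict.fromkeys("RKDENQH", "H"),  # strongly hydrophilic (assumed)
    **dict.fromkeys("IVLFMA", "L"),   # strongly hydrophobic (assumed)
    **dict.fromkeys("STWY", "W"),     # weakly hydrophilic/hydrophobic (assumed)
    "P": "P", "G": "G", "C": "C",     # special single-residue classes
}
CLASSES = "HLWPGC"
CLASS_PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(CLASSES, repeat=2))}


def hpdc(sequence: str) -> list:
    """Return the 36-dimensional hydropathy dipeptide composition."""
    reduced = "".join(HYDROPATHY_CLASS[r] for r in sequence.upper())
    n_pairs = len(reduced) - 1
    if n_pairs < 1:
        raise ValueError("need at least two residues")
    counts = [0] * len(CLASS_PAIR_INDEX)
    for i in range(n_pairs):
        counts[CLASS_PAIR_INDEX[reduced[i:i + 2]]] += 1
    return [c / n_pairs for c in counts]


if __name__ == "__main__":
    v = hpdc("PGCA")  # reduced to "PGCL": pairs PG, GC, CL
    print([round(x, 3) for x in v if x])
```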

2.7. Support Vector Machine (SVM)

The Support Vector Machine is a machine learning method for classification and regression problems based on statistical learning theory. An SVM maps the samples into a feature space through a kernel function and finds the separating hyperplane with the largest margin between classes. New samples are mapped into the same space and classified according to which side of the hyperplane they fall on. The radial basis function (RBF) kernel was used to obtain the optimal separating hyperplane. The regularization parameter $C$ and the kernel width parameter $\gamma$ were tuned by grid search.

Because it is grounded in structural risk minimization, SVM has been applied successfully in many fields [32–38]. It is especially common in computational biology, for example in protein structure prediction and protein structural classification [28, 39–46]. In this paper, the LIBSVM package was used to make predictions from the various feature sets. LIBSVM can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
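A minimal sketch of the RBF-kernel grid search uses scikit-learn's `SVC`, which is built on LIBSVM, in place of the LIBSVM command-line tools. The feature matrix is synthetic. The exponentially spaced grids for C and γ are illustrative, not the values searched in this study. Leave-one-out cross-validation plays the role of the Jackknife test described in Section 3.1.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

# Synthetic stand-in for a real feature matrix (rows = proteins, columns = features)
X, y = make_blobs(n_samples=30, centers=2, n_features=5, random_state=0)

# Exponentially spaced grids for the regularization parameter C and the RBF width gamma
param_grid = {
    "C": [2.0 ** k for k in (-1, 1, 3, 5)],
    "gamma": [2.0 ** k for k in (-7, -5, -3, -1)],
}

# Leave-one-out = Jackknife: each sample is held out once while the rest train the model
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut(), scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("jackknife accuracy:", search.best_score_)
```

Choosing C and γ with the same leave-one-out loop that reports the accuracy gives a slightly optimistic estimate. A nested loop avoids this at a higher computational cost.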

3. Results

3.1. Cross-Validation

In statistical prediction, three methods are commonly used to test prediction accuracy: the Jackknife test, the k-fold cross-validation test, and the independent dataset test [8, 47–53]. In this paper, the rigorous and objective Jackknife test was adopted to examine the performance of the proposed model. In the Jackknife test, each protein sequence in turn is held out as the test sample while all remaining sequences form the training set. This is repeated until every sequence has been tested exactly once.
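The Jackknife procedure can be sketched independently of the classifier. To keep the example dependency-free, a simple nearest-centroid classifier stands in for the SVM. The data and location labels are made up.

```python
def fit_nearest_centroid(X, y):
    """Stand-in classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_nearest_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def jackknife(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and collect its prediction."""
    predictions = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # X and y are plain lists
        predictions.append(predict(model, X[i]))
    return predictions


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = ["cytoplasm"] * 3 + ["cell wall"] * 3
    preds = jackknife(X, y, fit_nearest_centroid, predict_nearest_centroid)
    print(sum(p == t for p, t in zip(preds, y)) / len(y))  # 1.0
```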

3.2. Evaluation of the Predictive Performances

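To evaluate the performance of the predictive methods and the reliability of the algorithm, we used five measures [54–59]: sensitivity (Sn), specificity (Sp), accuracy (ACC), Matthews correlation coefficient (MCC), and overall accuracy (OA). They are defined by

$$Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + TN + FP + FN},$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}, \qquad OA = \frac{1}{N} \sum_{i} TP(i),$$

where:

- $N$ is the total number of protein sequences in the dataset;
- TP is the number of correctly recognized positives;
- FN is the number of positives recognized as negatives;
- FP is the number of negatives recognized as positives;
- TN is the number of correctly recognized negatives.

For each subcellular location, the proteins of that location are the positives and all others are the negatives. The sum in OA runs over all subcellular locations.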
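The measures can be computed directly from the predictions. The sketch below scores each location one-vs-rest and returns an MCC of 0 when its denominator vanishes. The zero-MCC convention is our assumption, since the text does not cover that case.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Sn, Sp, ACC and MCC for one location treated as the positive class."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


def one_vs_rest_counts(y_true, y_pred, label):
    """TP, FN, FP, TN for `label` against all other locations."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return tp, fn, fp, tn


def overall_accuracy(y_true, y_pred):
    """OA: proteins predicted correctly at any location, divided by N."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = ["cytoplasm", "cytoplasm", "cell membrane", "extracellular"]
    y_pred = ["cytoplasm", "cell membrane", "cell membrane", "cytoplasm"]
    print(binary_metrics(*one_vs_rest_counts(y_true, y_pred, "cytoplasm")))
    print(overall_accuracy(y_true, y_pred))  # 0.5
```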

3.3. The Prediction of Gram-Positive Bacteria

To investigate the effectiveness of our approach, we used five feature extraction strategies with the SVM as the classification algorithm.

The autocovariance average chemical shift (acACS) vectors were formed from the protein sequences. To obtain the best prediction results, we searched for the best combination of chemically shifted atoms and the best value of the parameter λ. Figure 1 shows the prediction results for λ ranging from 0 to 56; the best value was λ = 40. Figure 2 compares the prediction results obtained with different combinations of chemically shifted atoms, from which the best-performing combination was selected.

For the gene ontology information, a 2573-dimensional vector was obtained first. Redundant features can degrade prediction performance, so we reduced this vector to 854 dimensions. The 2573 GO terms were integrated into one vector, and the frequency of each GO term was counted. The 854 terms with the largest summed frequencies were retained.
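The frequency-based reduction of the GO vector can be sketched as selecting the columns with the largest totals. The input here is a toy 0/1 matrix (rows = proteins, columns = GO terms) and `k` is a parameter. In the study, the matrix would have 2573 columns and k = 854.

```python
def top_k_go_columns(go_matrix, k):
    """Indices (in original order) of the k GO columns with the largest summed frequency."""
    n_cols = len(go_matrix[0])
    totals = [sum(row[j] for row in go_matrix) for j in range(n_cols)]
    # sorted() is stable, so ties keep their original column order
    ranked = sorted(range(n_cols), key=lambda j: totals[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(go_matrix, columns):
    """Keep only the chosen columns of every row."""
    return [[row[j] for j in columns] for row in go_matrix]


if __name__ == "__main__":
    go_matrix = [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
    ]
    cols = top_k_go_columns(go_matrix, 2)
    print(cols, select_columns(go_matrix, cols))
```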

The Jackknife results for the individual feature parameters are listed in Table 2, and those for the combined parameters are shown in Table 3. The combined parameters outperformed single feature parameters, and the combination AAC+DC+hpDC achieved the best overall accuracy, 86.1%. These results indicate that combining parameters helps predict the subcellular location of Gram-positive bacterial proteins. The accuracies of AAC+GO+acACS+hpDC, AAC+DC+GO+hpDC, and AAC+DC+GO+acACS+hpDC were lower than that of AAC+DC+hpDC, probably because of data redundancy.
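The text does not spell out how the parameters were combined. A common reading, assumed here, is serial concatenation of the per-protein vectors. Under that reading, AAC+DC+hpDC gives 20 + 400 + 36 = 456 features per protein.

```python
def combine_features(*blocks):
    """Serially concatenate per-protein feature vectors, in the order given."""
    combined = []
    for block in blocks:
        combined.extend(block)
    return combined


if __name__ == "__main__":
    # Placeholder vectors with the dimensions of AAC, DC and hpDC
    aac_vec, dc_vec, hpdc_vec = [0.0] * 20, [0.0] * 400, [0.0] * 36
    print(len(combine_features(aac_vec, dc_vec, hpdc_vec)))  # 456
```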

4. Discussion

To compare the predictive capability of our method, the results of Shen and Chou [8, 9], Hu et al. [10], and Rahman et al. [11] are listed in Table 4. As Table 4 shows, our method achieved a higher overall accuracy than the others. It was:

- 3.4 percentage points higher than Shen and Chou's first work (82.7%);
- 3.9 points higher than their second work (82.2%);
- 0.2 points higher than Hu et al.'s work (85.9%);
- 12.9 points higher than Rahman et al.'s work (73.2%).

Gram-positive bacteria are widespread in nature and can cause many diseases. Studying the subcellular location of their proteins may therefore help address disease-related problems. In this paper, the Gram-positive bacterial protein subcellular location dataset was reconstructed, and the subcellular locations of Gram-positive bacterial proteins were predicted. The method is algorithmically simple and fully automatic. The results showed that combining parameters can improve the accuracy of predicting the subcellular location of Gram-positive bacterial proteins.


Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Gao XW conceived the selection of feature parameters and carried out the computation by SVM. Li FM analysed the results and wrote the manuscript. All authors reviewed the manuscript.

Acknowledgments

This work was supported by the Natural Science Foundation of Inner Mongolia of China (2019MS03015) and the National Natural Science Foundation of China (31360206).

Supplementary Materials

The protein data used to support the findings of this study are included within the supplementary information file.

References

[1] Y. Fujiwara and M. Asogawa, "Prediction of subcellular localizations using amino acid composition and order," Genome Informatics, vol. 12, pp. 103–112, 2001.
[2] M. Suzuki, R. J. Youle, and N. Tjandra, "Structure of Bax: coregulation of dimer formation and intracellular localization," Cell, vol. 103, no. 4, pp. 645–654, 2000.
[3] M. L. Liu, W. Su, Z. X. Guan et al., "An overview on predicting protein subchloroplast localization by using machine learning methods," Current Protein & Peptide Science, vol. 21, 2020.
[4] H. Ding and D. Li, "Identification of mitochondrial proteins of malaria parasite using analysis of variance," Amino Acids, vol. 47, no. 2, pp. 329–333, 2015.
[5] T. H. Zhang and S. W. Zhang, "Advances in the prediction of protein subcellular locations with machine learning," Current Bioinformatics, vol. 14, no. 5, pp. 406–421, 2019.
[6] S. Wan, Y. Duan, and Q. Zou, "HPSLPred: an ensemble multi-label classifier for human protein subcellular location prediction with imbalanced source," Proteomics, vol. 17, no. 17-18, 2017.
[7] L. Wei, Y. Ding, R. Su, J. Tang, and Q. Zou, "Prediction of human protein subcellular localization using deep learning," Journal of Parallel & Distributed Computing, vol. 117, pp. 212–217, 2018.
[8] H. B. Shen and K. C. Chou, "Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins," Protein Engineering, Design & Selection, vol. 20, no. 1, pp. 39–46, 2007.
[9] H. B. Shen and K. C. Chou, "Gpos-mPLoc: a top-down approach to improve the quality of predicting subcellular localization of Gram-positive bacterial proteins," Protein and Peptide Letters, vol. 16, no. 12, pp. 1478–1484, 2009.
[10] Y. Hu, T. Li, J. Sun et al., "Predicting gram-positive bacterial protein subcellular localization based on localization motifs," Journal of Theoretical Biology, vol. 308, pp. 135–140, 2012.
[11] J. Rahman, M. N. I. Mondal, M. K. B. Islam, M. A. M. Hasan, and S. M. S. Amin, "Gram-positive bacterial protein subcellular localization prediction using features fusion strategy," in 2016 9th International Conference on Electrical and Computer Engineering (ICECE), pp. 291–294, 2016.
[12] X. Xiao, X. Cheng, S. Su, Q. Mao, and K. C. Chou, "pLoc-mGpos: incorporate key gene ontology information into general PseAAC for predicting subcellular localization of gram-positive bacterial proteins," Natural Science, vol. 9, no. 9, pp. 330–349, 2017.
[13] X. Xiao, X. Cheng, G. Chen, Q. Mao, and K. C. Chou, "pLoc_bal-mGpos: predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC," Genomics, vol. 111, no. 4, pp. 886–892, 2019.
[14] S. H. Li, J. Zhang, Y. W. Zhao et al., "iPhoPred: a predictor for identifying phosphorylation sites in human protein," IEEE Access, vol. 7, pp. 177517–177528, 2019.
[15] W. Yang, X. J. Zhu, J. Huang, H. Ding, and H. Lin, "A brief survey of machine learning methods in protein sub-Golgi localization," Current Bioinformatics, vol. 14, no. 3, pp. 234–240, 2019.
[16] J. X. Tan, S. H. Li, Z. M. Zhang et al., "Identification of hormone binding proteins based on machine learning methods," Mathematical Biosciences and Engineering, vol. 16, no. 4, pp. 2466–2480, 2019.
[17] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[18] F.-M. Li and X.-Q. Wang, "Identifying anticancer peptides by using improved hybrid compositions," Scientific Reports, vol. 6, no. 1, 2016.
[19] G.-L. Fan and Q.-Z. Li, "Predicting protein submitochondria locations by combining different descriptors into the general form of Chou's pseudo amino acid composition," Amino Acids, vol. 43, no. 2, pp. 545–555, 2012.
[20] B. Boeckmann, A. Bairoch, R. Apweiler et al., "The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003," Nucleic Acids Research, vol. 31, no. 1, pp. 365–370, 2003.
[21] Y. Huang, B. Niu, Y. Gao, L. Fu, and W. Li, "CD-HIT suite: a web server for clustering and comparing biological sequences," Bioinformatics, vol. 26, no. 5, pp. 680–682, 2010.
[22] M. A. Andrade, S. I. O'Donoghue, and B. Rost, "Adaptation of protein surfaces to subcellular location," Journal of Molecular Biology, vol. 276, no. 2, pp. 517–525, 1998.
[23] M. Reczko and H. Bohr, "The DEF data base of sequence based protein fold class predictions," Nucleic Acids Research, vol. 22, no. 17, pp. 3616–3619, 1994.
[24] H. Tang, Y. W. Zhao, P. Zou et al., "HBPred: a tool to identify growth hormone-binding proteins," International Journal of Biological Sciences, vol. 14, no. 8, pp. 957–964, 2018.
[25] W. Chen, F. Nie, and H. Ding, "Recent advances of computational methods for identifying bacteriophage virion proteins," Protein and Peptide Letters, vol. 27, no. 4, pp. 259–264, 2020.
[26] A. A. Schaffer, L. Aravind, T. L. Madden et al., "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements," Nucleic Acids Research, vol. 29, no. 14, pp. 2994–3005, 2001.
[27] W. Shi, M. Punta, J. Bohon et al., "Characterization of metalloproteins by high-throughput X-ray absorption spectroscopy," Genome Research, vol. 21, no. 6, pp. 898–907, 2011.
[28] X. J. Zhu, C. Q. Feng, H. Y. Lai, W. Chen, and L. Hao, "Predicting protein structural classes for low-similarity sequences by evaluating different features," Knowledge-Based Systems, vol. 163, pp. 787–793, 2019.
[29] S. P. Mielke and V. V. Krishnan, "Protein structural class identification directly from NMR spectra using averaged chemical shifts," Bioinformatics, vol. 19, no. 16, pp. 2054–2064, 2003.
[30] G. L. Fan, Y. L. Liu, and Y. C. Zuo, "acACS: improving the prediction accuracy of protein subcellular locations and protein classification by incorporating the average chemical shifts composition," The Scientific World Journal, vol. 2014, Article ID 864135, 9 pages, 2014.
[31] Y. L. Chen and Q. Z. Li, "Prediction of the subcellular location of apoptosis proteins," Journal of Theoretical Biology, vol. 245, no. 4, pp. 775–783, 2007.
[32] B. Manavalan and J. Lee, "SVMQA: support-vector-machine-based protein single-model quality assessment," Bioinformatics, vol. 33, no. 16, pp. 2496–2503, 2017.
[33] B. Manavalan, T. H. Shin, and G. Lee, "PVP-SVM: sequence-based prediction of phage virion proteins using a Support Vector Machine," Frontiers in Microbiology, vol. 9, p. 476, 2018.
[34] L. Wei, R. Su, S. Luan et al., "Iterative feature representations improve N4-methylcytosine site prediction," Bioinformatics, vol. 35, no. 23, pp. 4930–4937, 2019.
[35] C. Meng, S. Jin, L. Wang, F. Guo, and Q. Zou, "AOPs-SVM: a sequence-based classifier of antioxidant proteins using a Support Vector Machine," Frontiers in Bioengineering and Biotechnology, vol. 7, 2019.
[36] H. Bu, J. Hao, J. Guan, and S. Zhou, "Predicting enhancers from multiple cell lines and tissues across different developmental stages based on SVM method," Current Bioinformatics, vol. 13, no. 6, pp. 655–660, 2018.
[37] L. Chao, L. Wei, and Q. Zou, "SecProMTB: a SVM-based classifier for secretory proteins of Mycobacterium tuberculosis with imbalanced data set," Proteomics, vol. 19, 2019.
[38] Y. Wang, F. Shi, L. Cao et al., "Morphological segmentation analysis and texture-based Support Vector Machines classification on mice liver fibrosis microscopic images," Current Bioinformatics, vol. 14, no. 4, pp. 282–294, 2019.
[39] H. Yang, W. Yang, F. Y. Dao et al., "A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae," Briefings in Bioinformatics, 2019.
[40] H. Y. Lai, Z. Y. Zhang, Z. D. Su et al., "iProEP: a computational predictor for predicting promoter," Molecular Therapy - Nucleic Acids, vol. 17, pp. 337–346, 2019.
[41] H. Ding, W. Yang, H. Tang et al., "PHYPred: a tool for identifying bacteriophage enzymes and hydrolases," Virologica Sinica, vol. 31, no. 4, pp. 350–352, 2016.
[42] H. Y. Lai, C. Q. Feng, Z. Y. Zhang, H. Tang, W. Chen, and H. Lin, "A brief survey of machine learning application in cancerlectin identification," Current Gene Therapy, vol. 18, no. 5, pp. 257–267, 2018.
[43] K. Liu and W. Chen, "iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications," Bioinformatics, vol. 36, no. 11, pp. 3336–3342, 2020.
[44] N. Stephenson, E. Shane, J. Chase et al., "Survey of machine learning techniques in drug discovery," Current Drug Metabolism, vol. 20, no. 3, pp. 185–193, 2019.
[45] H. Tang, R. Z. Cao, W. Wang, T. S. Liu, L. M. Wang, and C. M. He, "A two-step discriminated method to identify thermophilic proteins," International Journal of Biomathematics, vol. 10, no. 4, 2017.
[46] L. Yu, F. Xu, and L. Gao, "Predict new therapeutic drugs for hepatocellular carcinoma based on gene mutation and expression," Frontiers in Bioengineering and Biotechnology, vol. 8, p. 8, 2020.
[47] H. Lin, Z. Y. Liang, H. Tang, and W. Chen, "Identifying Sigma70 promoters with novel pseudo nucleotide composition," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 16, no. 4, pp. 1316–1321, 2019.
[48] W. Chen, P. Feng, T. Liu, and D. Jin, "Recent advances in machine learning methods for predicting heat shock proteins," Current Drug Metabolism, vol. 20, no. 3, pp. 224–228, 2019.
[49] S. Basith, B. Manavalan, T. Hwan Shin, and G. Lee, "Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening," Medicinal Research Reviews, vol. 40, no. 4, pp. 1276–1314, 2020.
[50] V. Boopathi, S. Subramaniyam, A. Malik, G. Lee, B. Manavalan, and D. C. Yang, "mACPpred: a Support Vector Machine-based meta-predictor for identification of anticancer peptides," International Journal of Molecular Sciences, vol. 20, no. 8, 2019.
[51] M. M. Hasan, N. Schaduangrat, S. Basith, G. Lee, W. Shoombuatong, and B. Manavalan, "HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation," Bioinformatics, vol. 36, no. 11, pp. 3350–3356, 2020.
[52] B. Manavalan, S. Basith, T. H. Shin, L. Wei, and G. Lee, "mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation," Bioinformatics, vol. 35, no. 16, pp. 2757–2765, 2019.
[53] L. Yu, S. Y. Yao, L. Gao, and Y. H. Zha, "Conserved disease modules extracted from multilayer heterogeneous disease and gene networks for understanding disease mechanisms and predicting disease treatments," Frontiers in Genetics, vol. 9, 2019.
[54] F. Y. Dao, H. Lv, H. Zulfiqar et al., "A computational platform to identify origins of replication sites in eukaryotes," Briefings in Bioinformatics, 2020.
[55] S. Basith, B. Manavalan, T. H. Shin, and G. Lee, "SDM6A: a web-based integrative machine-learning framework for predicting 6mA sites in the rice genome," Molecular Therapy - Nucleic Acids, vol. 18, pp. 131–141, 2019.
[56] B. Manavalan, S. Basith, T. H. Shin, D. Y. Lee, L. Wei, and G. Lee, "4mCpred-EL: an ensemble learning framework for identification of DNA N(4)-methylcytosine sites in the mouse genome," Cells, vol. 8, pp. 1–14, 2019.
[57] B. Manavalan, S. Basith, T. H. Shin, L. Wei, and G. Lee, "AtbPpred: a robust sequence-based prediction of anti-tubercular peptides using extremely randomized trees," Computational and Structural Biotechnology Journal, vol. 17, pp. 972–981, 2019.
[58] B. Manavalan, S. Basith, T. H. Shin, L. Wei, and G. Lee, "Meta-4mCpred: a sequence-based meta-predictor for accurate DNA 4mC site prediction using effective feature representation," Molecular Therapy - Nucleic Acids, vol. 16, pp. 733–744, 2019.
[59] L. Yu and L. Gao, "Human pathway-based disease network," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 16, no. 4, pp. 1240–1249, 2019.

Copyright

Copyright © 2020 Feng-Min Li and Xiao-Wei Gao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
