BioMed Research International
Volume 2018, Article ID 2936257, 7 pages
Research Article

Metagenomics Biomarkers Selected for Prediction of Three Different Diseases in Chinese Population

1Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
2Binhai Genomics Institute, BGI-Tianjin, BGI-Shenzhen, Tianjin 300308, China
3Tianjin Translational Genomics Center, BGI-Tianjin, BGI-Shenzhen, Tianjin 300308, China
4School of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
5BGI-Shenzhen, Shenzhen 518083, China
6School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China

Correspondence should be addressed to Ke Zhou; k.zhou@hust.edu.cn

Received 17 July 2017; Revised 14 October 2017; Accepted 24 October 2017; Published 11 January 2018

Academic Editor: Clara G. de los Reyes-Gavilan

Copyright © 2018 Honglong Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Dysbiosis of the human microbiome has been shown to be associated with the development of many human diseases. Metagenome sequencing has emerged as a powerful tool for investigating the effects of the microbiome on disease. Identifying human gut microbiome markers associated with abnormal phenotypes may facilitate feature selection for multiclass classification. Compared with binary classifiers, multiclass classification models must capture more complex discriminative patterns. Here, we developed a pipeline to address the challenging characterization of multilabel samples. In this study, a total of 300 biomarkers were selected from the microbiome of 806 Chinese individuals (383 controls, 170 with type 2 diabetes, 130 with rheumatoid arthritis, and 123 with liver cirrhosis), and a logistic regression prediction algorithm was then applied to those markers as the intrinsic features of the model. The estimated model produced an F1 score of 0.9142, which was better than that of other popular classification methods, and an average area under the receiver operating characteristic (ROC) curve of 0.9475 indicated a strong association between the selected microbiome biomarkers and the corresponding phenotypes. The results of this study indicate that machine learning is a vital tool for mining microbiome data to identify disease-related biomarkers, which may contribute to the application of microbiome-based precision medicine in the future.
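For a four-class problem such as the one described above, summary scores of this kind are usually computed per class and then averaged. The sketch below illustrates one common convention, assuming a macro-averaged F1 score and a one-vs-rest ROC AUC averaged over classes; the abstract does not state which averaging scheme was used, and the class labels, the probabilities, and all function names here are hypothetical toy values for illustration, not data or code from the study.

```python
# Illustrative only: macro-averaged F1 and mean one-vs-rest ROC AUC
# for a four-class problem. Labels and probabilities are invented toy values.

CLASSES = ["control", "T2D", "RA", "LC"]  # hypothetical class names


def f1_for_class(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN) for one class treated as 'positive'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def binary_auc(is_positive, scores):
    """ROC AUC as the probability that a random positive sample scores
    higher than a random negative sample (ties count as one half)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ovr_auc(y_true, proba, labels):
    """Average of the one-vs-rest AUCs, one per class."""
    aucs = [
        binary_auc([t == c for t in y_true], [row[k] for row in proba])
        for k, c in enumerate(labels)
    ]
    return sum(aucs) / len(aucs)


# Toy predicted probabilities, columns ordered as in CLASSES.
y_true = ["control", "control", "T2D", "T2D", "RA", "RA", "LC", "LC"]
proba = [
    [0.70, 0.10, 0.10, 0.10],
    [0.15, 0.60, 0.15, 0.10],  # a control sample misclassified as T2D
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.10, 0.10, 0.20, 0.60],
    [0.05, 0.05, 0.10, 0.80],
]
# Predicted label = class with the highest probability.
y_pred = [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in proba]

toy_f1 = macro_f1(y_true, y_pred, CLASSES)
toy_auc = mean_ovr_auc(y_true, proba, CLASSES)
print(f"macro F1 = {toy_f1:.5f}, mean one-vs-rest AUC = {toy_auc:.5f}")
```

The F1 score depends only on the hard class assignments, whereas the AUC uses the ranking of the predicted probabilities, which is why the two numbers can differ for the same model.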