Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data
Systematic Analysis of the Association between Gut Flora and Obesity through High-Throughput Sequencing and Bioinformatics Approaches
Eighty-one stool samples from Taiwanese participants were collected to analyze the association between the gut flora and obesity. The supervised analysis showed that the most abundant genera of bacteria in normal samples (from people with a body mass index (BMI) ≤ 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant genera of bacteria in case samples (with a BMI ≥ 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples were clustered more compactly than case samples. An unsupervised analysis demonstrated that bacterial communities in the gut were clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher’s exact test). The results showed that bacterial communities in the gut were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity.
Enteric bacteria, or gut microbiota, in the human gastrointestinal (GI) tract play important roles in the body’s functions. For example, they can regulate immune responses and metabolic functions of the host [1–4]. The gut microbiota is frequently used to study the association between human health and an individual’s lifestyle. In 2011, three major enterotypes, dominated by Bacteroides, Prevotella, and Ruminococcus, were proposed and provided a new perspective for classifying individuals. Ruminococcus was reported to be an ambiguous enterotype, and the Bacteroides and Prevotella enterotypes were associated with dietary habits. Animal protein and saturated fats were highly correlated with the Bacteroides enterotype. Low meat intake and plant-based nutrition with high carbohydrates were correlated with the Prevotella enterotype.
The Human Microbiome Project (HMP) was launched by the US National Institutes of Health in 2008, and understanding the relationship between human health and the microbiota that live in or on the human body was recognized as an important goal. To decipher this relationship, high-throughput sequencing, also called next-generation sequencing (NGS), can produce the large number of reads needed to sequence 16S ribosomal RNA (rRNA) and construct complex microbial community profiles. 16S rRNA is considered the standard for studying microbial communities and assigning taxonomy to bacteria. Compared to conventional polymerase chain reaction- (PCR-) based or culture-based methods, 16S rRNA sequencing by NGS can detect hundreds to thousands of bacteria at one time and offer relative quantification of the bacteria. Interactions between different bacterial communities and their environments can be comprehensively analyzed by metagenomics research. Associations between diseases and specific bacteria have been described in previous studies, for example, for type 2 diabetes [8–11], irritable bowel syndrome (IBS) [12–14], and colorectal cancer (CRC) [15–17]. Moreover, some bacteria that are significantly associated with specific diseases have been proposed as biomarkers for constructing disease risk prediction models [8, 18].
Obesity is a major public health problem worldwide, and its prevalence is rapidly increasing. Obesity is related to several disorders, including type 2 diabetes [20–23], cardiovascular disease [24–26], and cancer [26–28]. Recently, obesity was shown to be associated with an alteration of the gut microbiota, both in humans [29–32] and in animal models [30, 33, 34]. A reduced proportion of Bacteroidetes and an increased proportion of Firmicutes were associated with human obesity [30, 35, 36]. An increase of Actinobacteria in obese individuals was also reported. In another study, amounts of Archaea and Methanobacteriales were positively correlated with obesity, and their amounts in samples from obese individuals decreased or disappeared after gastric bypass surgery. Another study reported that the amounts of Bifidobacterium and Ruminococcus were decreased in samples from obese individuals. However, other studies reported contradictory findings for the Bacteroidetes-to-Firmicutes ratio, or found no association between this ratio and obesity [5, 40].
Dietary habits and environmental factors vary widely across ethnicities and regions, and few studies have focused on Taiwanese samples. Therefore, we collected 81 stool samples from Taiwanese participants to analyze the association between gut flora and obesity. According to a study by Pan et al., Taiwan adopted body mass index (BMI) values of 24 and 27 as the cutoff points for being overweight and obese, respectively. In this study, the stools of 36 obese (BMI ≥ 27) and 45 normal persons (BMI ≤ 24) were collected, and 16S rRNA sequencing was used to assess the association between obesity and the taxonomic composition of the gut microbiota.
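The grouping rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `bmi` and `study_group` are hypothetical, and the sketch assumes that participants with a BMI strictly between the two cutoffs were simply not included in either group.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def study_group(bmi_value: float) -> Optional[str]:
    """Assign a study group from BMI using the cutoffs stated in the text.

    Returns "normal" for BMI <= 24 and "case" for BMI >= 27.
    Returns None for the intermediate range, which this sketch
    assumes was not sampled (an assumption, not a stated fact).
    """
    if bmi_value <= 24:
        return "normal"
    if bmi_value >= 27:
        return "case"
    return None


# Example: a 60 kg, 1.70 m participant falls in the normal group.
print(study_group(bmi(60, 1.70)))
```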
Participant metadata are summarized in Table 1, and detailed sample profiles are given in Table S1, available online in the Supplementary Material at http://dx.doi.org/10.1155/2014/906168, including the number of reads, gender, age, height, weight, and BMI. In total, 4,152,740 sequence reads were obtained from the 81 samples, and a mean of 51,268 reads with a median read length of 125 bp was obtained per study participant. Sequence reads were processed through our taxonomic mapping process, and the distribution of genera in samples is depicted in Figure 1. The sequencing results showed that the most abundant genera in all samples were Bacteroides (28%), Prevotella (20%), Escherichia (9.7%), Phascolarctobacterium (3.9%), Eubacterium (3.2%), Megamonas (3%), Faecalibacterium (2.9%), Gemmiger (2.2%), and Sutterella (2%).
2.1. Unsupervised Clustering Analysis
Hierarchical clustering was performed using the unweighted UniFrac distance, and the gut bacterial communities and clinical values of each sample are shown in Figure 2. The results demonstrate that the bacterial communities in the gut were clustered into two main groups: an N-like group (including the N1 and N2 subgroups) and an OB-like group (including the OB1, OB2, OB3, and OB4 subgroups). Figure 3 shows that the most abundant genera in N-like samples were Bacteroides (27.8%), Prevotella (18.6%), Escherichia (12.7%), Phascolarctobacterium (4%), and Eubacterium (3.5%). The most abundant genera in OB-like samples were Bacteroides (28.8%), Prevotella (21.7%), Escherichia (7.1%), Megamonas (4.4%), and Phascolarctobacterium (3.7%). Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher’s exact test). The results showed that gut bacterial community types were highly associated with obesity. The genera diversity analysis showed that the bacterial communities in the N-like group exhibited significantly higher alpha diversity and lower beta diversity than those in the OB-like group (Figure 4).
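The association between sample type and cluster membership can be checked with a two-sided Fisher's exact test on a 2 × 2 table. The sketch below is a standard-library implementation, not the authors' R code. The cell counts are an assumption: they are back-calculated from the reported percentages (78% of 45 normal samples is about 35; 81% of 36 case samples is about 29) and are not taken from the paper's tables.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# Assumed counts (inferred from percentages, not reported directly):
#            N-like  OB-like
# normal       35      10
# case          7      29
p_value = fisher_exact_two_sided(35, 10, 7, 29)
print(f"P = {p_value:.3g}")
```

Working from exact binomial coefficients avoids any dependency beyond the standard library; for larger tables or routine use, an established statistics package would be the more usual choice.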
2.2. Supervised Clustering Analysis
To investigate the association between gut bacterial communities and obesity, 45 stool samples of participants with a BMI of ≤ 24 were defined as normal samples, and 36 samples of participants with a BMI ≥ 27 were used as case samples. Figure 5 shows that the most abundant bacteria in normal samples were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant bacteria in case samples were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). Normal samples had a significantly higher proportion of Escherichia, while case samples had a higher proportion of Megamonas.
Genera with significantly different proportions between normal and case samples are listed in Table 2. Additionally, genera with a significantly different presence between normal and case samples are listed in Table 3, and significantly different species are provided in Table S2. The genera Shewanella, Citrobacter, Cronobacter, Leclercia, Tatumella, and Acinetobacter exhibited significant differences in both proportions and presence. Unweighted alpha and beta diversities of genera in the normal and case samples are shown in Figures 6(a) and 6(b), respectively. The results showed that the bacterial communities in normal samples exhibited significantly higher alpha diversity and lower beta diversity than those in case samples.
A PCoA of gut bacterial communities is shown in Figure 7. The results showed that most normal samples (green nodes) were located in the bottom left area, and case samples (red nodes) were spread in other areas (Figure 7(a)). Samples in the N1, N2, OB1, OB2, OB3, and OB4 subgroups are depicted in Figure 7(b). The results show that bacterial communities of N1 and N2 were highly associated with normal-weight individuals, and others were associated with obese individuals.
2.3. Potential Markers for Classification of Normal Weight and Obesity
Bacteria identified as statistically significant were used for rule-based classification. Threefold cross-validation was used to evaluate the performance of the classification model. Two of the significant species in Table S2, Parabacteroides distasonis and Serratia sp. DAP4, were selected as discriminating factors in the J48 decision tree. As shown in Figure 8, the classification rules are as follows. (1) A sample is classified as normal if Parabacteroides distasonis is absent. (2) A sample with Parabacteroides distasonis present and Serratia sp. DAP4 absent is classified as a case; otherwise, it is classified as normal. As shown in Table S3, the classifier performed well, with an area under the receiver operating characteristic curve (AUC) of 0.813. The results suggest that Parabacteroides distasonis and Serratia sp. DAP4 might be potential markers for further clinical analysis and investigation of obesity.
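The two rules above translate directly into a small function. This is only a restatement of the rules as described in the text, not the Weka model itself; the function and parameter names are hypothetical.

```python
def classify_sample(has_p_distasonis: bool, has_serratia_dap4: bool) -> str:
    """Apply the two presence/absence rules described in the text.

    Rule 1: Parabacteroides distasonis absent -> "normal".
    Rule 2: P. distasonis present and Serratia sp. DAP4 absent -> "case";
            otherwise (both present) -> "normal".
    """
    if not has_p_distasonis:
        return "normal"
    if not has_serratia_dap4:
        return "case"
    return "normal"


# Enumerate all four presence/absence combinations.
for pd_present in (False, True):
    for serratia_present in (False, True):
        label = classify_sample(pd_present, serratia_present)
        print(pd_present, serratia_present, label)
```

Only one of the four combinations yields a case call, which shows how strongly the rule depends on detecting Parabacteroides distasonis.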
Several relatively abundant genera were identified in samples (Figure 5), including Bacteroides, Prevotella, Escherichia, Phascolarctobacterium, Eubacterium, Megamonas, Faecalibacterium, Gemmiger, Sutterella, Fusobacterium, Salmonella, Megasphaera, Dialister, Bifidobacterium, and Akkermansia. In previous studies, the presence of Bacteroides, Prevotella, and Sutterella was negatively associated with obesity [36, 42, 43]. In other related gastrointestinal diseases, Prevotella was increased in children diagnosed with IBS. Bacteroides, Eubacterium, and Prevotella were increased, and Faecalibacterium was reduced in CRC patients [45, 46]. Increased Bacteroides and reduced Eubacterium and Prevotella were also found in a rat model of CRC.
At the genus level, the presence of Acinetobacter, Aliivibrio, Marinomonas, Pseudoalteromonas, and Shewanella had positive associations with obesity (Table 3). Acinetobacter is a genus of Gram-negative bacteria, and the species Acinetobacter baumannii is a key pathogen of infections in hospitals. Aliivibrio is a genus reclassified from the “Vibrio fischeri species group”, and species of Aliivibrio are symbiotic with marine animals or are described as fish pathogens [50–52]. Shewanella is a genus of marine bacteria, and some species can cause infections [53, 54]. Lachnospira, Citrobacter, and Shigella were reported to be positively associated with obesity. Lachnobacterium showed a negative association with obesity.
At the species level, the presence of Parabacteroides distasonis, Lactobacillus kunkeei, Pseudoalteromonas piscicida, Shewanella algae, Marinomonas posidonica, and Aliivibrio fischeri was positively associated with obesity (Table S2). Species of Bacteroides and Parabacteroides represent opportunistic pathogens in infectious diseases, and they are able to develop antimicrobial drug resistance. Parabacteroides distasonis, previously known as Bacteroides distasonis, is prominently found in the gut of healthy individuals. It is also related to improved human bowel health and negatively associated with celiac disease. In contrast, our results revealed a positive association between Parabacteroides distasonis and obesity. Blautia producta and Enterobacter cloacae were suggested to be related to a high-fat diet causing obesity in a mouse model. Serratia is a genus of Gram-negative, facultatively anaerobic, rod-shaped bacteria. In hospitals, Serratia species tend to colonize the respiratory and urinary tracts, causing nosocomial infections [62, 63]. In related studies of the GI tract, Serratia increased in formula-fed mice and was positively correlated with colic in infants.
In the alpha diversity analysis (Figure 6(a)), the Chao richness index differed significantly between the normal and case groups. This shows that bacterial communities in normal samples had greater genus richness than those in case samples. Results of the beta diversity analysis (Figure 6(b)) showed that bacterial communities in normal samples were more similar to each other than those in case samples. The unweighted PCoA plot (Figure 7) showed that bacterial communities in the N-like group (including N1 and N2) were highly associated with normal individuals, and bacterial communities in the OB-like group (including OB1, OB2, OB3, and OB4) were more associated with obese individuals. The unsupervised clustering heatmap of all samples (Figure S1(A)) was generated using Spearman correlations, and the results showed that most normal samples and most case samples were, respectively, clustered together when all genera were used for clustering. However, when only relatively abundant genera were used for clustering, normal and case samples were interwoven with each other (Figure S1(B)). This indicates that some genera found in small proportions might be important for distinguishing obese from normal individuals.
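For readers unfamiliar with the richness index mentioned above, the sketch below computes the bias-corrected Chao1 estimator from a vector of per-genus read counts. It assumes that the "Chao richness index" refers to Chao1, which the text does not state explicitly, and the function name is hypothetical. The estimator adds to the observed richness a term driven by singletons (F1) and doubletons (F2): S_obs + F1(F1 − 1) / (2(F2 + 1)).

```python
from collections import Counter
from typing import Iterable


def chao1(counts: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness estimate for one sample.

    counts: read counts per taxon (zeros are ignored).
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # number of singleton and doubleton taxa
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Six observed taxa, three singletons, one doubleton.
print(chao1([10, 5, 1, 1, 1, 2, 0]))  # 6 + 3*2 / (2*2) = 7.5
```

Because the correction depends only on rare taxa, the estimate is sensitive to sequencing depth, which is one reason diversity is usually computed on a rarefied table.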
This is the first study in Taiwan to investigate the association between human gut microbiota and obesity using metagenomic sequencing. The results showed that bacterial communities in the gut were clustered into N-like and OB-like groups, which were highly associated with normal and obese subjects, respectively. Several relatively abundant bacteria with significantly different distributions between normal and case samples were identified and used to establish a rule-based classification model. Although the functional roles and mechanisms of these bacteria require further validation, the results provide new insights into gut bacterial communities in the context of the rising trend in obesity.
5.1. Sample Collection and DNA Extraction
Eighty-one stool samples were collected with a Sigma-transwab (Medical Wire) into a tube with Liquid Amies Transport Medium and stored at 4°C until processing. Fresh faeces were obtained from participants, and DNA was extracted directly from stool samples using a QIAamp DNA Stool Mini Kit (Qiagen). Each swab was vigorously vortexed and incubated at room temperature for 1 min. The sample was transferred to a microcentrifuge tube containing 560 μL of Buffer ASL, vortexed, and incubated at 37°C for 30 min. The suspension was then incubated at 95°C for 15 min, vortexed, and centrifuged at 14,000 rpm for 1 min to pellet stool particles. Extraction was performed following the protocol of the QIAamp DNA Stool Mini Kit. DNA was eluted with 50 μL of Buffer AE and centrifuged at 14,000 rpm for 1 min, and the DNA extract was stored at −20°C until further analysis.
5.2. Library Construction and Sequencing of the V4 Region of 16S rDNA
The PCR primers, F515 (5′-GTGCCAGCMGCCGCGGTAA-3′) and R806 (5′-GGACTACHVGGGTWTCTAAT-3′), were designed to amplify the V4 region of bacterial 16S rDNA as described previously. PCR amplification was performed in a 50 μL reaction volume containing 25 μL 2x Taq Master Mix (Thermo Scientific), 0.2 μM of each forward and reverse primer, and 20 ng of a DNA template. The reaction conditions included an initial temperature of 95°C for 5 min, followed by 30 cycles of 95°C for 30 s, 54°C for 1 min, and 72°C for 1 min, with a final extension of 72°C for 5 min. Next, amplified products were checked by 2% agarose gel electrophoresis and ethidium bromide staining. Amplicons were purified using the AMPure XP PCR Purification Kit (Agencourt) and quantified using the Qubit dsDNA HS Assay Kit (Qubit) on a Qubit 2.0 Fluorometer (Qubit), all according to the respective manufacturer’s instructions. For V4 library preparation, Illumina adapters were attached to the amplicons using the Illumina TruSeq DNA Sample Preparation v2 Kit. Purified libraries were processed for cluster generation and sequencing using the MiSeq system.
5.3. Filtering 16S rRNA (rDNA) Sequencing Data for Quality
The FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit) was used to process the raw FASTQ read files from the Illumina MiSeq. The sequence quality criteria were as follows: (1) the minimum acceptable Phred quality score of sequences was 30, with >70% of sequence bases having a score of ≥20; (2) after quality trimming from the sequence tail, sequences of >100 bp were retained, and they also had an acceptable Phred quality score of 30; and (3) both forward and reverse sequencing reads that met the first and second requirements were retained for subsequent analysis. Sequencing reads from different samples were identified and separated according to sample-specific barcodes at the 5′ end of the sequence (with two mismatches allowed).
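The criteria above leave some room for interpretation, so the following is only one plausible reading, written in plain Python rather than with the FASTX-Toolkit itself. It assumes Phred+33 quality encoding, trims trailing bases with a score below 30, keeps reads longer than 100 bp in which more than 70% of bases score at least 20, and retains a pair only if both mates pass. All function names and the pair-wise handling are assumptions for illustration.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding

Read = Tuple[str, str]  # (sequence, quality string)


def phred_scores(quality: str) -> List[int]:
    """Decode an ASCII quality string into Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def trim_tail(seq: str, qual: str, min_q: int = 30) -> Read:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq: str, qual: str, min_q: int = 20,
                  min_fraction: float = 0.70, min_len: int = 100) -> bool:
    """True if the read is longer than min_len and more than
    min_fraction of its bases have a Phred score of at least min_q."""
    if len(seq) <= min_len:
        return False
    scores = phred_scores(qual)
    good = sum(1 for s in scores if s >= min_q)
    return good / len(scores) > min_fraction


def filter_pair(fwd: Read, rev: Read) -> Optional[Tuple[Read, Read]]:
    """Trim both mates, then keep the pair only if both pass."""
    trimmed = [trim_tail(seq, qual) for seq, qual in (fwd, rev)]
    if all(passes_filter(seq, qual) for seq, qual in trimmed):
        return trimmed[0], trimmed[1]
    return None
```

A read with 110 high-quality bases followed by 10 low-quality bases would be trimmed to 110 bp and kept, whereas a read of 90 bp would be discarded regardless of quality.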
5.4. Taxonomic Assignments of Bacterial 16S rRNA Sequences
Paired-end sequences were obtained, and their quality was assessed using the FASTX-Toolkit. To generate taxonomic assignments, Bowtie2 was used to align sequencing reads against a collection of 16S rRNA reference sequences, with a threshold of 97% similarity. The 16S rRNA sequences of bacteria were retrieved from the SILVA ribosomal RNA sequence database. From these, V4 regions were extracted using the V4 forward and reverse primers. To prevent repetitive sequence assignments, the V4 sequences from SILVA were then clustered at 97% similarity using UCLUST. Results of the taxonomic assignment were filtered to retain assignments with >10 sequences.
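The primer-based extraction of V4 regions from reference sequences can be illustrated with a small in-silico PCR sketch. The primer sequences are those given in Section 5.2; everything else (the function names, the use of regular expressions, exact matching with no mismatches, and returning the insert without primers) is an assumption made for illustration and may differ from the authors' procedure. Degenerate IUPAC bases are expanded into character classes, and the reverse primer is reverse-complemented so that both primers are searched on the same strand.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

F515 = "GTGCCAGCMGCCGCGGTAA"
R806 = "GGACTACHVGGGTWTCTAAT"


def reverse_complement(seq: str) -> str:
    """Reverse complement that also handles degenerate IUPAC bases."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer)


V4_PATTERN = re.compile(
    "(" + primer_regex(F515) + ")(.*?)("
    + primer_regex(reverse_complement(R806)) + ")"
)


def extract_v4(sequence: str, keep_primers: bool = False) -> Optional[str]:
    """Return the region between the two primer sites, or None if
    either primer site is not found. RNA input (U) is converted to DNA."""
    seq = sequence.upper().replace("U", "T")
    match = V4_PATTERN.search(seq)
    if match is None:
        return None
    return match.group(0) if keep_primers else match.group(2)
```

Exact matching is the simplest choice here; tolerating one or two primer mismatches would recover more reference sequences at the cost of a more involved search.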
5.5. Bacterial Community Analysis
After taxonomic assignment, an operational taxonomic unit (OTU) table was generated. To normalize the sample size of all samples, a rarefaction process was performed on the OTU table. Alpha and beta diversities were calculated based on the rarefied OTU table. The Kolmogorov-Smirnov test and an analysis of variance (ANOVA) test with the Bonferroni correction were used to investigate significant differences between sample groups. To observe relationships between samples and explore taxonomic associations, weighted and unweighted UniFrac distance metrics were also generated based on the rarefied OTU table. A principal coordinate analysis (PCoA) and unsupervised clustering were performed based on the UniFrac distance matrix. To explore relationships between clinical features and different sample groups, Spearman’s correlation coefficient and regression analysis were used. Statistical analyses were performed in R. The J48 machine learning method in Weka 3.6.7 was used to construct a classification rule for discriminating between obese and normal individuals.
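Rarefaction, as used above to equalize sample sizes, amounts to drawing the same number of reads from every sample without replacement. The sketch below shows the idea for a single sample in plain Python; the authors worked in R, and the function name, the dictionary representation of an OTU table column, and the choice of depth are assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int,
           seed: Optional[int] = None) -> Dict[str, int]:
    """Subsample one sample's OTU counts to exactly `depth` reads,
    drawing reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


sample = {"Bacteroides": 500, "Prevotella": 300, "Escherichia": 150, "Megamonas": 50}
print(rarefy(sample, depth=200, seed=1))
```

In practice the depth is often set to the read count of the smallest retained sample, so that no sample has to be dropped; rare taxa may fall to zero after subsampling, which is why rarefaction affects richness estimates.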
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Chih-Min Chiu, Wei-Chih Huang, and Shun-Long Weng contributed equally to this work.
The authors would like to thank the National Science Council of the Republic of China for financially supporting this research under Grant nos. NSC 101-2311-B-009-005-MY3 and NSC 102-2627-B-009-001. This work was supported in part by the Ministry of Science and Technology under Grant no. MOST 103-2221-E-038-013-MY2. This work was supported in part by UST-UCSD International Center of Excellence in Advanced Bioengineering sponsored by the Ministry of Science and Technology I-RiCE Program under Grant no. NSC-102-2911-I-009-101-, and Veterans General Hospitals and University System of Taiwan (VGHUST) Joint Research Program under Grant no. VGHUST103-G5-1-2. This work was also partially supported by MOE ATU.
Table S1: Characteristics of each sample
Table S2: Species with a significantly different presence between normal and case samples
Table S3: Performance of classification of obese and normal samples using Parabacteroides distasonis and Serratia sp. DAP4
Figure S1: Unsupervised clustering heatmap using Spearman correlation with (A) all genera and (B) relatively abundant genera
J. Chen, X. He, and J. Huang, “Diet effects in gut microbiome and obesity,” Journal of Food Science, vol. 79, no. 4, pp. R442–R451, 2014.
O. O. Erejuwa, S. A. Sulaiman, and M. S. Wahab, “Modulation of gut microbiota in the management of metabolic disorders: the prospects and challenges,” International Journal of Molecular Sciences, vol. 15, no. 3, pp. 4158–4188, 2014.
G. L. Hold, M. Smith, C. Grange, E. R. Watt, E. M. El-Omar, and I. Mukhopadhya, “Role of the gut microbiota in inflammatory bowel disease pathogenesis: what have we learnt in the past 10 years?” World Journal of Gastroenterology, vol. 20, no. 5, pp. 1192–1210, 2014.
F. Karlsson, V. Tremaroli, J. Nielsen, and F. Backhed, “Assessing the human gut microbiota in metabolic diseases,” Diabetes, vol. 62, no. 10, pp. 3341–3349, 2013.
M. A. Hullar, A. N. Burnett-Hartman, and J. W. Lampe, “Gut microbes, diet, and cancer,” Cancer Treatment and Research, vol. 159, pp. 377–399, 2014.
K. Korpela, H. J. Flint, A. M. Johnstone et al., “Gut microbiota signatures predict host and microbiota responses to dietary interventions in obese individuals,” PLoS ONE, vol. 9, no. 3, Article ID e90702, 2014.
K. C. Portero McLellan, K. Wyne, E. T. Villagomez, and W. A. Hsueh, “Therapeutic interventions to reduce the risk of progression from prediabetes to type 2 diabetes mellitus,” Therapeutics and Clinical Risk Management, vol. 10, pp. 173–188, 2014.
D. Nagakubo, M. Shirai, Y. Nakamura et al., “Prophylactic effects of the glucagon-like Peptide-1 analog liraglutide on hyperglycemia in a rat model of type 2 diabetes mellitus associated with chronic pancreatitis and obesity,” Comparative Medicine, vol. 64, no. 2, pp. 121–127, 2014.
K. A. Britton, J. M. Massaro, J. M. Murabito, B. E. Kreger, U. Hoffmann, and C. S. Fox, “Body fat distribution, incident cardiovascular disease, cancer, and all-cause mortality,” Journal of the American College of Cardiology, vol. 62, no. 10, pp. 921–925, 2013.
M. Remely, E. Aumueller, D. Jahn, B. Hippe, H. Brath, and A. G. Haslberger, “Microbiota and epigenetic regulation of inflammatory mediators in type 2 diabetes and obesity,” Beneficial Microbes, vol. 5, no. 1, pp. 33–43, 2014.
W. Pan, K. M. Flegal, H. Chang, W. Yeh, C. Yeh, and W. Lee, “Body mass index and obesity-related metabolic disorders in Taiwanese and US whites and blacks: Implications for definitions of overweight and obesity for Asians,” The American Journal of Clinical Nutrition, vol. 79, no. 1, pp. 31–39, 2004.
S. Xiao, N. Fei, X. Pang et al., “A gut microbiota-targeted dietary intervention for amelioration of chronic inflammation underlying metabolic syndrome,” FEMS Microbiology Ecology, vol. 87, no. 2, pp. 357–367, 2014.
Q. Zhu, Z. Jin, W. Wu et al., “Analysis of the intestinal lumen microbiota in an animal model of colorectal cancer,” PLoS ONE, vol. 9, no. 3, Article ID e90849, 2014.
H. Urbanczyk, J. C. Ast, M. J. Higgins, J. Carson, and P. V. Dunlap, “Reclassification of Vibrio fischeri, Vibrio logei, Vibrio salmonicida and Vibrio wodanis as Aliivibrio fischeri gen. nov., comb. nov., Aliivibrio logei comb. nov., Aliivibrio salmonicida comb. nov. and Aliivibrio wodanis comb. nov,” International Journal of Systematic and Evolutionary Microbiology, vol. 57, part 12, pp. 2823–2829, 2007.
R. Beaz-Hidalgo, A. Doce, S. Balboa, J. L. Barja, and J. L. Romalde, “Aliivibrio finisterrensis sp. nov., isolated from Manila clam, Ruditapes philippinarum and emended description of the genus Aliivibrio,” International Journal of Systematic and Evolutionary Microbiology, vol. 60, no. 1, pp. 223–228, 2010.
M. Sakamoto and Y. Benno, “Reclassification of Bacteroides distasonis, Bacteroides goldsteinii and Bacteroides merdae as Parabacteroides distasonis gen. nov., comb. nov., Parabacteroides goldsteinii comb. nov and Parabacteroides merdae comb. nov,” International Journal of Systematic and Evolutionary Microbiology, vol. 56, part 7, pp. 1599–1605, 2006.
M. Patankar, S. Sukumaran, A. Chhibba, U. Nayak, and L. Sequeira, “Comparative in-vitro activity of cefoperazone-tazobactam and cefoperazone-sulbactam combinations against ESBL pathogens in respiratory and urinary infections,” Journal of Association of Physicians of India, vol. 60, no. 11, pp. 22–24, 2012.
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” SIGKDD Explorations, vol. 11, no. 1, pp. 10–18, 2009.