Network analysis of transcriptional signatures typically relies on direct interactions between two highly expressed genes. However, this approach misses indirect and biologically relevant interactions through a third factor (hub). Here we determine whether a hub-based network analysis can select an improved signature subset that correlates with a biological change more strongly than the original signature. We have previously reported an interferon-related transcriptional signature (THP1r2Mtb-induced) from Mycobacterium tuberculosis (M. tb)-infected THP-1 human macrophages. We selected hub-connected THP1r2Mtb-induced genes into the refined network signature TMtb-iNet and grouped the excluded genes into the excluded signature TMtb-iEx. TMtb-iNet retained the enrichment of binding sites of interferon-related transcription factors and contained relatively more interferon-related interacting genes when compared to THP1r2Mtb-induced signature. TMtb-iNet correlated as strongly as THP1r2Mtb-induced signature on a public transcriptional dataset of patients with pulmonary tuberculosis (PTB). TMtb-iNet correlated more strongly in CD4+ and CD8+ T cells from PTB patients than THP1r2Mtb-induced signature and TMtb-iEx. When TMtb-iNet was applied to data during clinical therapy of tuberculosis, it resulted in the most pronounced response and the weakest correlation. Correlation on datasets from patients with AIDS or malaria was stronger for TMtb-iNet, indicating an involvement of TMtb-iNet in these chronic human infections. Collectively, the significance of this work is twofold: (1) we disseminate a hub-based approach for generating a biologically meaningful and clinically useful signature; (2) using this approach we introduce a new network-based signature and demonstrate its promising applications in understanding host responses to infections.

1. Introduction

It has been estimated that Mycobacterium tuberculosis (M. tb) infects as many as 2 billion people in the world. Between 5 and 10% of these infected individuals will likely develop active tuberculosis (TB) during their lifetime [1]. Approximately 1.4 million people die from TB each year [1]. M. tb infection results in disease after immune cells fail to contain bacterial replication. As M. tb primarily resides within macrophages after being inhaled, there is an urgent need to understand how the host macrophages respond protectively and pathologically to M. tb infection. Transcriptional profiling of host cell responses is an unbiased whole-genome approach that has already been applied to the whole blood [2–4] and blood cell subpopulations [2, 5] of TB patients and to the human macrophage cell line model THP-1 infected with M. tb [6]. The information-rich data obtained from these analyses hold great promise for exploring mechanisms of pathogenicity and immunity and for TB diagnosis/prognosis, and they have potential implications for the development of new TB vaccines.

Changes in the transcriptome cause changes in cell functions. Yet the changes in transcriptome and the resultant changes in cell function are generally mediated by changes in the availability of RNA sequences and proteins that function within cascades of network interactions. Working from the principle that genes do not function alone but in the context of networks, network-based interpretations of “omics” data can uncover novel insights for biomedical research [7–9]. In such a view, it would be of greater biological relevance if “omics” data were trained in the context of protein-protein interactions [3, 9, 10]. As a precedent, candidate genes identified from an RNAi functional screen for host genes important for regulating M. tb survival in macrophages have been analyzed in the context of protein-protein interaction data. This analysis revealed a pivotal role of the regulation of autophagy for survival of M. tb [11]. This kind of network-based approach has also been applied in other contexts, such as an AIDS-relevant network in macaques for predicting the magnitude of specific T-cell responses and viral loads [9] and a putative network underlying early human organogenesis [12].

Network analysis of highly expressed genes typically relies on preexisting knowledge of a direct interaction between pairs of highly expressed genes. However, expression of multiple genes often indicates interaction with a hub or a factor that interacts/associates with many other gene products. Connections via a hub can be missed by network analysis that is based solely on direct interaction between two expression-active gene products. As these connections are biologically relevant, we proposed that hubs could be exploited for creating a biologically relevant subnetwork of expression-active genes.

Recently, we have reported transcriptome analysis of human macrophage cell line THP-1 infected by different M. tb W-Beijing strains and have identified a core interferon-related transcriptional signature [6]. This core host transcriptional response seemed to be positively correlated with in vivo transcriptome data from patients with active pulmonary tuberculosis (PTB) and to some extent this signature decreased following clinical therapy of PTB [6]. Here, by reanalyzing our previously reported interferon-related signature with a new hub-based network analysis strategy, we aimed to produce a refined signature that was biologically and clinically more correlative with PTB patients. Interestingly, the new signature also showed greater correlation with patients with acquired immunodeficiency syndrome (AIDS) and malaria but not with patients with several other infections or inflammatory conditions. We propose that the improved interferon-related signature can be an attractive alternative to the established large interferon-related signature and should be more accessible to TB investigators interested in host cell response research.

2. Methods

2.1. Protein Interaction Network Data

The protein interaction information used in this study was obtained from the STRING database [16]. STRING contains both physical and functional interactions between proteins in a variety of organisms. We extracted these interactions from the human specific network where there was a combined score of at least 0.7. This criterion ensured high coverage without compromising data quality [16].
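The filtering step above can be illustrated with a short sketch. It assumes a whitespace-delimited links table with `protein1`, `protein2`, and `combined_score` columns and scores stored on a 0 to 1000 scale, so that a cutoff of 0.7 corresponds to 700; the function name and the toy identifiers are hypothetical and are not taken from the study.

```python
import io


def load_high_confidence_edges(handle, min_score=0.7):
    """Return undirected edges whose combined score is at least min_score.

    Assumption: the table has a header row naming protein1, protein2 and
    combined_score, and scores are integers scaled from 0 to 1000.
    """
    threshold = int(round(min_score * 1000))
    header = handle.readline().split()
    i1 = header.index("protein1")
    i2 = header.index("protein2")
    isc = header.index("combined_score")
    edges = set()
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if int(parts[isc]) >= threshold:
            # frozenset makes A-B and B-A the same undirected edge
            edges.add(frozenset((parts[i1], parts[i2])))
    return edges


# Toy input with made-up identifiers, for illustration only.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.A 9606.B 912\n"
    "9606.B 9606.A 912\n"
    "9606.A 9606.C 450\n"
    "9606.C 9606.D 700\n"
)
edges = load_high_confidence_edges(sample)
```

Storing each edge as a `frozenset` collapses the two directions in which a pair may be listed, which keeps later degree counts from being doubled.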

2.2. Derivation of a Network-Based Signature from Original THP1r2Mtb-Induced Signature

We first identified a number of genes (referred to herein as hubs) that made a minimum number of direct connections in the STRING database with genes in our previously identified active interferon-related signature (THP1r2Mtb-induced) [12]. We then used this set of hubs to select all the interacting genes in THP1r2Mtb-induced signature and grouped them into a new subset known as THP1r2Mtb-iNet[]. Genes in THP1r2Mtb-induced signature excluded from THP1r2Mtb-iNet[] were then grouped into THP1r2Mtb-iEx[]. We then assessed the biological relevance of each subset by its aggregate z-score [17]. The calculation of the aggregate z-score was similar to that described in the original paper [13]. In general, the z-score of an individual gene was calculated from the significance (adjusted P value) of the change in gene expression by subtracting it from 1 (see our previous work for the adjusted P values for 4 h versus 18 h after infection [6]), and the result was then transformed with the inverse normal cumulative distribution function (CDF). The aggregate z-score was then calculated as the sum of the z-scores of the genes in a subset divided by the square root of the number of genes in that subset. In essence, the aggregate z-score reflected the expression levels of a signature and allowed comparison of putative signatures with different numbers of genes. The higher the aggregate z-score of a signature, the more transcriptionally active the signature was. The signature with the highest aggregate z-score was visualized using Cytoscape [18].
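As a minimal sketch of the scoring described above, the per-gene score is read here as a z-score obtained by passing 1 minus the adjusted P value through the inverse normal CDF, and the aggregate score as the sum of those values divided by the square root of the subset size. That reading is an assumption, as are the function names and the clamping constant `eps`, which only guards against P values of exactly 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def gene_z(p_adj, eps=1e-12):
    """Per-gene score: inverse normal CDF of (1 - adjusted P value)."""
    # Clamp so that inv_cdf never receives exactly 0 or 1.
    p = min(max(p_adj, eps), 1 - eps)
    return NormalDist().inv_cdf(1 - p)


def aggregate_z(p_values):
    """Sum of per-gene scores divided by the square root of the gene count."""
    if not p_values:
        raise ValueError("the subset must contain at least one gene")
    zs = [gene_z(p) for p in p_values]
    return sum(zs) / sqrt(len(zs))


# Toy adjusted P values, for illustration only.
strong_subset = aggregate_z([0.01, 0.01, 0.01, 0.01])
neutral_subset = aggregate_z([0.5, 0.5])
```

Dividing by the square root of the subset size is what makes subsets of different sizes comparable: four genes with the same P value score twice as high as one such gene, not four times as high.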

2.3. Enrichment Analysis of Transcription Factor Binding Sites (TFBSs)

PRomoter Integration in Microarray Analysis (PRIMA) was applied for TFBS enrichment analysis of genes in the derived signatures [14]. The analysis was based on the promoter region spanning from 2,000 bp upstream to 200 bp downstream of transcription start sites, using the entire set of Entrez Genes as the testing background. Enrichments with a Bonferroni-corrected P value < 0.01 were declared significant.

2.4. KEGG Pathway Enrichment Analysis

The analysis was done in the web-accessible Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7, based on Benjamini and Hochberg-derived False Discovery Rate (FDR) [17].
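For readers unfamiliar with the Benjamini and Hochberg adjustment mentioned above, the generic step-up procedure can be sketched as follows. This illustrates the standard method only and is not DAVID's own code; the function name and the toy P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
```

Each raw P value is multiplied by the number of tests and divided by its rank, and the running minimum from the largest rank downward keeps the adjusted values non-decreasing in rank order.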

2.5. Gene Set Enrichment Analysis (GSEA) against Transcriptomes from Patients with PTB or Other Diseases

GSEA is a nonparametric method for determining whether signature genes are overrepresented at the top or bottom of a predefined list of ranked genes (genes are ranked from high to low according to their expression levels) [19]. The list of ranked genes was predefined according to the available transcriptome data. A total of nine transcriptome datasets were retrieved for GSEA from NCBI GEO with accession numbers GSE19491 [2], GSE31348 [20], GSE6269 [21], GSE11907 [15], GSE4124 [22], GSE6740 [23], GSE5418 [24], GSE40184 [25], and GSE7123 [26]. Among these publicly available datasets, we chose the first two (GSE19491 and GSE31348) to determine the correlation of our signatures because both datasets contained transcriptome data that compare PTB against latent tuberculosis (LTB) or healthy control (HC) and followed the course of PTB therapy. In particular, GSE19491 contains whole blood transcriptome data from a large number of PTB patients, LTB patients, and HC recruited from London in the UK and Cape Town in South Africa. These samples were grouped into 5 cohorts: (1) training set, London volunteers with PTB, LTB, and healthy controls; (2) test set, London volunteers with PTB, LTB, and healthy controls; (3) validation set, Cape Town volunteers with PTB and LTB; (4) Test_set_seperated, neutrophils (Neut), monocytes (Mono), CD4+ T cells (CD4), and CD8+ T cells (CD8) separated from the blood of the test set PTB patients and healthy controls; and (5) longitudinal study, patients after 2 months (PTB_2 m) and 12 months (PTB_12 m) of treatment and healthy controls [2]. GSE31348 contains whole blood transcriptome data from PTB patients in Cape Town, South Africa, at diagnosis (before drug treatment, week 0) and at 1, 2, 4, and 26 weeks of treatment [20].

Other datasets involving patients with other infections or inflammatory conditions were included in our analyses to determine the specificity of our network-derived gene signature. GSE6269 contains transcriptome data of peripheral blood mononuclear cells (PBMCs) from young patients. In these young patients, the infecting pathogens were (1) Escherichia coli, (2) influenza A, (3) Staphylococcus aureus, or (4) Streptococcus pneumoniae [21]. GSE11907 contains transcriptome data of PBMCs from patients with one of the following conditions: (1) E. coli infection; (2) systemic juvenile idiopathic arthritis; (3) systemic lupus erythematosus; (4) liver-transplant recipient undergoing immunosuppressive therapy; (5) metastatic melanoma; (6) type I diabetes; and (7) Staphylococcus aureus infection [15]. GSE4124 contains transcriptome data of PBMCs from HIV-1 positive/negative mothers with infants in Botswana, Africa. These mothers could be divided into the following three categories: (1) HIV-1 negative mother; (2) HIV-1 positive mother who perinatally transmitted the virus to her infant; and (3) HIV-1 positive mother who did not transmit the virus to her infant [22]. For GSE6740, CD4+ or CD8+ T cells were purified from four groups of participants: Group 1: HIV-1-negative volunteers; Group 2: individuals with HIV-1 infection within 6 months of study and asymptomatic when blood was drawn (acute HIV); Group 3: individuals with chronic progressive HIV-1 infection for at least 1 year and asymptomatic (chronic HIV); and Group 4: nonprogressor individuals with HIV-1 infection for at least 3 years (nonprogressor HIV) [23]. GSE5418 contains two groups of donors: one group is malaria patients from Cameroon, West Africa, from whom blood was obtained for PBMC separation before and after chloroquine treatment; the other group includes healthy individuals from the USA who were experimentally challenged with malaria-infected mosquitoes.
PBMCs were obtained from these subjects before mosquito challenge and when a single parasite was identified by blood smear microscopy [24]. GSE40184 contains transcriptome data of PBMCs from treatment-naïve chronic hepatitis C virus- (HCV-) infected patients or healthy controls [25]. GSE7123 contains transcriptome data of PBMCs from African-American/black (AA) or Caucasian-American/white (CA) patients with chronic HCV infection who were undergoing therapy with pegylated interferon-α2a (peginterferon). Peginterferon was given at 180 μg weekly by self-administered subcutaneous injection, and ribavirin was given orally at 1,000 or 1,200 mg daily for a body weight of less than 75 kg or of 75 kg or more, respectively. PBMCs were separated from patients prior to therapy (day 0) and on days 1 (after injection of peginterferon), 2, 7, 14, and 28. In addition, these patients were divided into three categories based on their change in HCV levels as detected by a quantitative PCR-based assay: (1) marked, defined as a decrease in virus RNA levels of more than 3.5 log10 IU/mL on day 28 relative to baseline; (2) intermediate, decrease of 1.4 to 3.5 log10 IU/mL; and (3) poor, decrease of less than 1.4 log10 IU/mL [26].
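The three virological response categories above can be written as a small rule. This is an illustrative sketch only: the function names are hypothetical, and treating decreases of exactly 1.4 or 3.5 log10 IU/mL as intermediate follows the wording of the category definitions.

```python
from math import log10


def log10_decrease(baseline_iu_ml, day28_iu_ml):
    """Decrease in HCV RNA between baseline and day 28, in log10 IU/mL."""
    return log10(baseline_iu_ml / day28_iu_ml)


def hcv_response_category(decrease):
    """Map a log10 IU/mL decrease onto marked, intermediate or poor."""
    if decrease > 3.5:
        return "marked"
    if decrease >= 1.4:
        return "intermediate"
    return "poor"


# Toy viral loads, for illustration only.
category = hcv_response_category(log10_decrease(1e6, 1e2))
```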

GSEA results are reported as normalized enrichment score (NES) and FDR. A gene signature with a positive NES is overrepresented at the top of a ranked gene list, indicating a positive correlation (upregulated expression), whereas a gene signature with a negative NES is overrepresented at the bottom of the ranked gene list, indicating a negative correlation (downregulated expression). An FDR of 0.05 or less indicates statistical significance of the NES [19].
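The sign convention of the enrichment score can be seen in a simplified running-sum sketch. It uses the unweighted, Kolmogorov-Smirnov-like form of the statistic and omits both the expression weighting and the permutation-based normalization that turn an enrichment score into an NES, so it is not the GSEA software's implementation. The function name and the toy gene list are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes is ordered from highest to lowest expression change.
    The returned value is the running-sum extreme with the largest
    absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n, n_hit = len(ranked_genes), len(members)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must cover some but not all ranked genes")
    hit_step = 1.0 / n_hit          # increment when a set member is met
    miss_step = 1.0 / (n - n_hit)   # decrement otherwise
    running = 0.0
    extreme = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Toy ranked list, for illustration only.
ranked = list("abcdefgh")
es_top = enrichment_score(ranked, {"a", "b"})
es_bottom = enrichment_score(ranked, {"g", "h"})
```

A set concentrated at the top of the list drives the running sum up early and gives a positive score, while a set concentrated at the bottom lets the sum fall first and gives a negative score.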

3. Results

3.1. Derivation of an Integrated Signature Capturing Essential Characteristics of a Previously Identified Interferon-Related Signature

Previously, we identified an active interferon-related signature (THP1r2Mtb-induced) as a common transcriptional response of THP-1 cells to infection by different M. tb W-Beijing strains [6]. Since gene products function within the context of networks and perturbation of such networks often changes cell phenotype [8], we reasoned that the transcriptional core response could be better described/refined when integrated with protein-protein interaction data. To this end, we combined the previous data with protein-protein interaction network data to refine the original signature (Figure 1). We sought to identify which of the genes that showed a dominant expression pattern during M. tb infection were also highly linked among themselves or via a hub in the human interaction/association network. Then, the highly linked signature and the excluded signature, as well as the original THP1r2Mtb-induced signature, were subjected to gene set enrichment analysis (GSEA) on publicly available patient-derived transcriptome data for validation and comparison (Figure 1). A hub was selected from the STRING protein interaction database based on the biological relevance of the hub, which was defined by the number of direct interactions (referred to herein as the degree of interaction) the hub made with expression-active gene products (i.e., gene products of THP1r2Mtb-induced). We grouped together all highly expressed genes whose gene products interacted with one another directly or indirectly via at least one of the hubs that met a given minimum degree of interaction into the refined subset signature THP1r2Mtb-iNet[]. As the minimum degree required for inclusion of hubs in a subset was increased, the number of hubs (Figure S1A) and the total number of interactions in the subset decreased dramatically (Figure S1B) (Supplementary Material is available online at http://dx.doi.org/10.1155/2014/713071). Also, more highly expressed genes were excluded from the new subset (Figure S1C).
These excluded highly expressed genes were grouped into THP1r2Mtb-iEx[]. The expression level of each THP1r2Mtb-iNet[] as a whole was then assessed by its aggregate z-score. This score allows comparison among gene groups of different sizes. The higher the aggregate z-score, the higher the level of expression of THP1r2Mtb-iNet[]. Figure 2(a) displays the distribution of aggregate z-scores as a function of the minimum degree of hubs. The aggregate z-score for THP1r2Mtb-iNet[] peaked when hubs had a degree of at least 14. We refer to the signature with a minimum degree of 14 as TMtb-iNet and the cognate excluded signature as TMtb-iEx. Figure 2(b) indicated that the TMtb-iNet genes were expressed at significantly higher levels than the TMtb-iEx genes.
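The sweep over the minimum hub degree described above can be sketched as follows. Several simplifying assumptions are made: a hub is any network node with at least the required number of direct neighbours in the signature, a signature gene is kept when it is adjacent to at least one hub, and the aggregate score is the sum of inverse-normal-transformed values of (1 minus adjusted P) divided by the square root of the subset size. All function names and the toy network are hypothetical, and ties are broken in favour of the smaller degree purely for determinism.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist


def build_adjacency(edges):
    """Turn an iterable of (a, b) pairs into an undirected adjacency map."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def split_signature(signature, adj, min_degree):
    """Return (hubs, kept genes, excluded genes) for one minimum degree."""
    signature = set(signature)
    hubs = {node for node, nbrs in adj.items()
            if len(nbrs & signature) >= min_degree}
    kept = {g for g in signature if adj.get(g, set()) & hubs}
    return hubs, kept, signature - kept


def aggregate_z(genes, p_adj, eps=1e-12):
    """Aggregate score of a gene subset from adjusted P values."""
    zs = [NormalDist().inv_cdf(1 - min(max(p_adj[g], eps), 1 - eps))
          for g in genes]
    return sum(zs) / sqrt(len(zs))


def best_min_degree(signature, adj, p_adj, degrees):
    """Minimum degree whose kept subset has the highest aggregate score."""
    scored = []
    for k in degrees:
        _, kept, _ = split_signature(signature, adj, k)
        if kept:  # skip degrees that leave no genes
            scored.append((aggregate_z(kept, p_adj), k))
    if not scored:
        return None
    return max(scored, key=lambda t: (t[0], -t[1]))[1]


# Toy network and adjusted P values, for illustration only.
edges = [("H", "g1"), ("H", "g2"), ("H", "g3"), ("X", "g4"), ("g1", "g2")]
p_adj = {"g1": 0.001, "g2": 0.001, "g3": 0.001, "g4": 0.4}
adj = build_adjacency(edges)
hubs, inet, iex = split_signature(p_adj, adj, min_degree=2)
best = best_min_degree(p_adj, adj, p_adj, degrees=range(1, 5))
```

In the toy data, raising the minimum degree from 1 to 2 drops the weakly induced gene `g4`, which raises the aggregate score of the kept subset even though the subset is smaller.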

In our previous study, we showed that the promoter regions of genes in THP1r2Mtb-induced signature are significantly enriched for transcription factor binding sites (TFBSs) of interferon-related regulators (i.e., ISRE, IRF-7, and IRF-1) [6]. To validate that TMtb-iNet genes were still representative of THP1r2Mtb-induced signature, we also looked for significant enrichment of these three putative TFBSs. Regardless of which minimum degree of hubs was utilized, we always observed superior enrichment of ISRE and IRF-7 in promoter regions of genes in THP1r2Mtb-iNet[] compared to genes in THP1r2Mtb-iEx[] (corrected ) (Figures 3(a) and 3(b)). In contrast, IRF-1 was significantly enriched in promoter regions of genes in both THP1r2Mtb-iNet[] and THP1r2Mtb-iEx[] (corrected ) independent of the minimum degree of hubs (Figure 3(c)). We especially noted that the TFBS of IRF-7 was exclusively enriched in promoter regions of genes in THP1r2Mtb-iNet[] but not of genes in THP1r2Mtb-iEx[] (Figure 3(b)). Figure 3(d) illustrates the consistent and superior significant enrichment of TFBSs in the promoter regions of genes in TMtb-iNet compared to genes in TMtb-iEx, derived using hubs with minimum degree 14.

Figure 4(a) illustrates the layout of TMtb-iNet plus its cognate hubs with minimum degree 14 according to the subcellular localization of their gene products. In this layout, the expression changes of all genes were color-coded, showing the overwhelming induction (upregulation) especially at 18 h after M. tb infection (Figure 4(b)). TMtb-iNet significantly enriched the pathways of cytokine-cytokine receptor interaction, chemokine signaling, and NOD-like receptor signaling compared to THP1r2Mtb-induced signature (Figure 5 and Table S5). By contrast, TMtb-iEx did not enrich any pathway.

In summary, by utilizing hubs with minimum degree 14, we obtained the network-based signature of TMtb-iNet that displayed the highest expression significance (the highest aggregate z-score), without losing the enrichment of interferon-related TFBSs of ISRE, IRF-7, and IRF-1 in their promoter regions.

3.2. TMtb-iNet Contains more Interferon-Related Genes than TMtb-iEx

Interferon-related genes are expected to function in the context of interferon-relevant molecular networks. Since THP1r2Mtb-induced signature correlates with interferon-related processes, we determined whether TMtb-iNet contained more interferon-related genes than TMtb-iEx did. Based on transcriptional profiling of whole blood from a large number of pulmonary tuberculosis (PTB) or latent tuberculosis (LTB) patients and healthy volunteers, Berry et al. reported a PTB-specific interferon-inducible neutrophil-driven blood transcriptional signature (393 transcripts representing 307 unique Entrez Genes) compared to LTB and healthy controls [2]. We reported earlier that 55 of these signature genes were significantly () present in the THP1r2Mtb-induced signature [6]. We found that 36 of the 55 overlapping genes were also present in TMtb-iNet, whereas only 19 were present in TMtb-iEx () (Figure 6). Chaussabel et al. constructed an array of gene modules that are expressed commonly across multiple diseases. These gene modules were associated with certain functional characteristics as clarified by literature profiling [15, 29]. THP1r2Mtb-induced signature harbors nearly half (44/95) of the genes in the interferon-related module (M3.1) [6]. We found that 33 of these THP1r2Mtb-induced genes were also present in TMtb-iNet, whereas only 11 of such THP1r2Mtb-induced genes were in TMtb-iEx () (Figure 7). Ingenuity pathway analysis also indicated that interferon signaling was enriched in TMtb-iNet and THP1r2Mtb-induced signature with the highest significances, but not in TMtb-iEx (−log10(P value) = 9.28 for TMtb-iNet and −log10(P value) = 6.9 for THP1r2Mtb-induced). Taken together, our analyses validated that the network-based signature of TMtb-iNet contained more interferon-related genes than the excluded signature of TMtb-iEx and confirmed that our approach could select a network-based signature that retained the original signature’s biological representation.

3.3. TMtb-iNet Displays Equivalent Positive Correlation with PTB Patients but Higher Positive Correlation with Separated Cell Populations of PTB Patients Compared to THP1r2Mtb-Induced Signature or TMtb-iEx

We have previously indicated the high positive correlation of THP1r2Mtb-induced signature with a public transcriptional dataset on PTB patients [6]. We therefore examined whether the network-based signature of TMtb-iNet still inherited the significant degree of positive correlation with PTB patients. As shown in Table 1 (also in Figure 8), like THP1r2Mtb-induced signature, TMtb-iNet showed similar positive correlation with PTB more than with LTB and healthy controls (e.g., for the training set, PTB versus HC showed NES = 3.23 in THP1r2Mtb-induced signature, and NES = 3.30 in TMtb-iNet). This was the case for any of the three datasets (i.e., London patient-based training set, London patient-based test set, and Cape Town patient-based validation set). In comparison, the correlation of TMtb-iEx to PTB was lower (e.g., PTB versus HC showed NES = 2.66 in the training set). Our analysis indicated that the network-based signature of TMtb-iNet, but not the excluded signature of TMtb-iEx, was overall as expression-active as THP1r2Mtb-induced signature in the whole blood of PTB patients.

We then examined whether TMtb-iNet also showed positive correlation with specific cell populations including neutrophils, monocytes, and CD4+ and CD8+ T cells from PTB patients. We found that, similar to THP1r2Mtb-induced signature, TMtb-iNet showed positive and significant correlation with each of the four cell populations (Table 2 and Figure 8). However, with CD4+ and CD8+ T cells, TMtb-iNet displayed higher positive correlation than THP1r2Mtb-induced signature (NES = 2.36 for TMtb-iNet versus NES = 1.86 for THP1r2Mtb-induced signature in CD4+ T cells; NES = 2.23 for TMtb-iNet versus NES = 1.70 for THP1r2Mtb-induced signature in CD8+ T cells). These higher correlations of TMtb-iNet were specific to CD4+ and CD8+ T cells because, with neutrophils or monocytes, TMtb-iNet did not show higher correlation than THP1r2Mtb-induced signature did. Thus, when compared to THP1r2Mtb-induced signature, TMtb-iNet was more expression-active in CD4+ and CD8+ T cells, but not in neutrophils or in monocytes. TMtb-iEx showed less correlation with neutrophils and monocytes than either TMtb-iNet or THP1r2Mtb-induced signature, indicating that TMtb-iEx was less expression-active in these two cell populations. More importantly, TMtb-iEx displayed no correlation with CD4+ and CD8+ T cells, which indicated that TMtb-iEx was not expression-active in T cells (Table 2). Taken together, our results indicated that the network-based signature of TMtb-iNet provided equivalent correlation with PTB patients and higher correlation with CD4+ and CD8+ T cells, when compared to THP1r2Mtb-induced signature or the excluded signature of TMtb-iEx.

3.4. TMtb-iNet Decreases More than Either THP1r2Mtb-Induced Signature or TMtb-iEx during Treatment of PTB

Gene set enrichment analysis (GSEA) on the datasets from PTB patients receiving treatment indicated that TMtb-iNet showed decreased, but still significant, positive correlation after two months of treatment (PTB_0 m versus HC showed NES = 3.29 and FDR = 0 before treatment; PTB_2 m versus HC showed NES = 2.83 and FDR = 0 at 2 months after treatment), and the correlation of TMtb-iNet became insignificant at 12 months after treatment (PTB_12 m versus HC showed NES = 1.15 and FDR = 0.182) (Table 3 and Figure 8). By contrast, both THP1r2Mtb-induced signature and TMtb-iEx still had significant positive correlation at 12 months after treatment, even though they showed decreasing correlation during the course of treatment (Table 3). Consistently, TMtb-iNet showed lower negative correlation with PTB_2 m and PTB_12 m when compared to pretherapy (PTB_0 m) (NES = −3.16 for TMtb-iNet, NES = −2.96 for THP1r2Mtb-induced, and NES = −2.29 for TMtb-iEx in PTB_12 m versus PTB_0 m) (Table 3). Similarly, in another dataset, TMtb-iNet showed lower negative correlation than TMtb-iEx with therapy of PTB at weeks 2, 4, and 26 after treatment compared to pretherapy (PTB_wk0), but not at week 1 after treatment (e.g., NES = −2.68 for TMtb-iNet and NES = −2.38 for TMtb-iEx in PTB_wk26 versus PTB_wk0) (Table 4 and Figure S2). THP1r2Mtb-induced signature showed the lowest negative correlation with PTB therapy at weeks 1, 2, and 4 but showed almost the same degree of negative correlation as TMtb-iNet with PTB therapy at week 26 (e.g., NES = −2.71 for THP1r2Mtb-induced signature and NES = −2.68 for TMtb-iNet in PTB_wk26 versus PTB_wk0) (Table 4 and Figure S2). These results collectively suggested that the network-based signature of TMtb-iNet was more responsive to the therapy of PTB than the original THP1r2Mtb-induced signature or the excluded signature of TMtb-iEx.

3.5. Correlation Analysis of TMtb-iNet, THP1r2Mtb-Induced, and TMtb-iEx Signatures to Several Other Infections and Inflammatory Conditions

HIV, malaria, and TB are the top infectious diseases imposing the heaviest burden on health care systems [30]. We therefore examined whether THP1r2Mtb-induced signature, TMtb-iNet, and TMtb-iEx were correlated with or well represented in transcriptome datasets from patients with HIV or malaria. We found that all the three signatures displayed general positive correlation in the transcriptome datasets of PBMCs from patients with HIV-1 infection (Table 5). Specifically, TMtb-iNet displayed higher positive correlation than THP1r2Mtb-induced signature and TMtb-iEx (e.g., NES = 3.16 for TMtb-iNet versus NES = 2.93 for THP1r2Mtb-induced signature and NES = 1.98 for TMtb-iEx) (Table 5 and Figure 8). Since HIV/TB coinfection imposes a severe death threat to patients and the underlying mechanism is the dysfunction of T cells [30], we then further applied the GSEA against transcriptome datasets from T cells (both CD4+ and CD8+) of patients with acute and chronic forms of HIV infection. All three signatures displayed general positive correlation in the T cell transcriptome datasets. By contrast, in the nonprogressor HIV group all the three signatures displayed no correlation with CD4+ T cells and lowest positive correlation with CD8+ T cells (Table 6). Notably, among the three signatures TMtb-iNet showed the highest positive correlation with both CD4+ and CD8+ T cells from acute and chronic forms of HIV infection (e.g., in CD4_chronic HIV, NES = 3.35 for TMtb-iNet versus NES = 3.06 for THP1r2Mtb-induced signature or NES = 1.59 for TMtb-iEx) (Table 6 and Figure 8). Similarly, all the three signatures displayed positive correlation with malaria from either the natural malaria infection in Cameroon or the experimental challenge malaria in USA, and once again higher positive correlation was seen with TMtb-iNet (e.g., in ExpeMalaria, NES = 2.73 for TMtb-iNet versus NES = 2.59 for THP1r2Mtb-induced signature and NES = 1.71 for TMtb-iEx) (Table 7 and Figure 8). 
In summary, blood samples from TB patients produced an interferon-related signature similar to those signatures seen in blood from patients with AIDS or malaria.

Since TMtb-iNet showed correlation with non-TB conditions that also produce a similar interferon-related signature (AIDS and malaria here), we then further applied GSEA with TMtb-iNet, along with THP1r2Mtb-induced signature and TMtb-iEx, against other infections and inflammatory conditions [15, 21, 26]. All three signatures showed strong positive correlation with datasets of PBMCs from chronic HCV-infected patients before drug treatment (Table 8) or during therapy with pegylated interferon-α2a (peginterferon-α2a) and ribavirin, regardless of whether the patients were African-American or Caucasian-American or of their drug response category (i.e., marked, intermediate, or poor) (Table S4). No clear difference in correlation was observed between TMtb-iNet and THP1r2Mtb-induced signature, although both of them showed higher positive correlation than TMtb-iEx did (Tables 8 and S4 and Figures 8 and S3). However, all three signatures showed no correlation with transcriptome datasets from patients with acute infections of Streptococcus pneumoniae, Staphylococcus aureus, influenza A, or E. coli (Table 9) or from patients with inflammatory conditions of type I diabetes, liver transplant undergoing immunosuppressive therapy, metastatic melanoma, systemic lupus erythematosus, or systemic juvenile idiopathic arthritis (Table 10 and Figure 8). Thus, blood samples from patients with TB, AIDS, malaria, or hepatitis C displayed a common interferon-related signature that could be represented by our signatures, especially by the network-based signature of TMtb-iNet.

4. Discussion

In this study, we combined our previously identified interferon-related THP1r2Mtb-induced signature with STRING protein-protein interaction data to generate a more refined version, TMtb-iNet. The refined TMtb-iNet still inherited key characteristics of THP1r2Mtb-induced signature. Promoter regions of genes in TMtb-iNet were enriched with the TFBSs of ISRE, IRF-7, and IRF-1, and the whole TMtb-iNet signature significantly overlapped with the interferon-inducible gene signature of PTB and with the interferon-related module (Figures 3, 6, and 7). Additionally, TMtb-iNet showed strong positive correlation in PTB blood and its separated cell subpopulations, as well as patterns of decreasing positive correlation during the course of anti-TB therapy (Tables 1–4, Figure 8).

A complete set of protein-protein interactions comprises a summation of knowledge on functional modularity and network interconnectivity within cells [31]. Therefore, network-based interpretation of “omics” data should be more rational and biology oriented than one based solely on transcriptomics [10–12, 32, 33]. Here we identified characteristic protein-protein connections within the THP1r2Mtb-induced profile with the involvement of a third factor (hub) (Figures 2 and 3, Figure S1). By integration of protein-protein interaction data, we refined a subset of genes from the interferon-related THP1r2Mtb-induced transcriptome signature [6] to obtain TMtb-iNet and discarded the rest into TMtb-iEx (Figure 4 and Table S3). Compared with THP1r2Mtb-induced signature or TMtb-iEx, TMtb-iNet consistently enriched interferon signaling and interferon-related TFBSs of ISRE, IRF-1, and IRF-7 in the promoter regions of its genes (Figure 3), as well as harboring more interferon-related genes (Figures 6 and 7). In addition, TMtb-iNet showed greater positive correlation with the separated cells from PTB patients (neutrophils, monocytes, and CD4+ and CD8+ T cells) (Table 2) and specifically displayed a decreasing pattern of positive correlation during therapy of PTB (Table 3). All these results indicated the reliability of the hub-based network approach for identifying a functionally enriched signature.

A key finding was that there exists a universal core of functionally associated host responses irrespective of immune cell type. Transcriptional responses of innate and adaptive immune cells during human M. tb infection have been studied by others. After migrating to tissues (e.g., lung), monocytes can differentiate into macrophages and dendritic cells, which are major phagocytes that engulf M. tb and induce adaptive immunity [34, 35]. CD4+ and CD8+ T cells are both important adaptive immune cells in TB, and dysfunction of either significantly abrogates control of TB infection [34]. Neutrophils can also be a prominent cell type infected by M. tb [36]. Circulating monocytes undergo functional and phenotypic changes in TB patients, although the presence of different subtypes of monocytes in peripheral blood may have opposing implications for TB control at sites of infection [37, 38]. As a refined functional signature, TMtb-iNet showed a higher degree of positive correlation with all four separated peripheral blood cell populations of PTB patients (i.e., neutrophils, monocytes, and CD4+ and CD8+ T cells), with the highest positive correlation in neutrophils (Table 2, Figure 8). Thus, a common host response existed among these immune cells in PTB patients, irrespective of cell type or of in vitro (i.e., THP-1) or in vivo (i.e., PTB) conditions. This was consistent with other reports showing that immune cells often exhibit a core gene expression profile when exposed to various microorganisms [39–42]. Another key finding was the decreasing pattern of correlation of TMtb-iNet during clinical therapy of PTB patients (Table 3), which suggested that this set of genes (TMtb-iNet) or the biological process behind their activity (probably an interferon-related process) was strongly involved in the generation/treatment of PTB, as also reported by others [2, 3, 43, 44].
The interferon-based nature of the TMtb-iNet and THP1r2Mtb-induced signatures was indirectly validated by their strong positive correlation with the therapy of chronic hepatitis C; type 1 interferon-related processes were inevitably activated in these patients because they were treated with peginterferon-α2a (Table S4, Figure S3).
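The correlations discussed here come from GSEA, which asks whether the genes of a signature cluster at one end of a ranked gene list. The following is a minimal sketch of the unweighted running-sum enrichment score only, without the expression weighting, permutation testing, or normalization that a full GSEA run performs; the function name `enrichment_score` and the toy gene labels are assumptions for illustration, and the ranked list is assumed to contain no duplicate genes.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    ranked_genes : list of unique gene names, ordered from most up-regulated
                   to most down-regulated in the comparison of interest
    gene_set     : iterable of gene names forming the signature
    Returns the running-sum value with the largest absolute deviation from
    zero: positive if the signature concentrates at the top of the ranking,
    negative if it concentrates at the bottom.
    """
    hits = set(gene_set) & set(ranked_genes)
    n_total = len(ranked_genes)
    n_hit = len(hits)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    hit_step = 1.0 / n_hit                # added at each signature gene
    miss_step = 1.0 / (n_total - n_hit)   # subtracted at every other gene

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in hits else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking: a signature at the top scores positively,
# the same-sized signature at the bottom scores negatively.
ranking = ["a", "b", "c", "d", "e", "f"]
print(enrichment_score(ranking, {"a", "b"}))
print(enrichment_score(ranking, {"e", "f"}))
```

Under this simplification, a decreasing positive correlation during therapy would correspond to the signature genes drifting away from the top of the ranked list at later time points.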

Interestingly, TMtb-iNet also displayed stronger positive correlation with AIDS and malaria, as well as with hepatitis C (Tables 5–8, Figure 8), but not with several other acute infections or inflammatory conditions (Tables 9 and 10, Figure 8). This indicated that a similar host response (most likely an interferon-related process in this case) is shared when a host is fighting or adapting to these three pathogens of diverse phyla. Coinfection with HIV is known to increase latent TB reactivation about 20-fold [30]. T cells are vital in adaptive control of M. tb infection [45]; however, HIV infection can gradually deplete CD4+ T cells, some of which can be M. tb-specific [30], and CD4+ T cell depletion is a key factor contributing to latent TB reactivation [30, 46]. Other changes in host cells caused by HIV can also facilitate M. tb survival, such as disruption of bactericidal activities of macrophages [47, 48] and deregulation of chemotaxis [49]. GSEA analysis (Tables 1–4, Figure 8) and transcriptome analysis on PTB patients confirmed that interferon-related processes are dynamically regulated in the pathogenesis/treatment of PTB [2]. We demonstrated a positive correlation of TMtb-iNet with CD4+ and CD8+ T cells from PTB patients or AIDS patients (acute and chronic forms of HIV infection), forming a transcriptome bridge of similarity (i.e., an interferon-related process) between these two diseases [23]. However, the functional significance of such similarity remains elusive in TB/HIV coinfection patients. Detailed transcriptional profiling with high-throughput analyses is needed to unravel this complex mutual correlation and its potential clinical utility in TB, HIV, and TB/HIV coinfection patients. A recent network-based transcriptome study based on mouse models noted strong overlap between genes regulated during cerebral malaria and genes regulated during M. tb infections [50].
These observations call for caution before using transcriptional signatures alone for TB diagnosis or prognosis.

The observation that TMtb-iNet represented a transcriptional response related to patients with AIDS, malaria, and HCV (Tables 5–8) might suggest that M. tb expresses a molecular pattern that is also expressed by the other disease agents during infection. Innate immune recognition of HIV and HCV primarily involves sensing of nucleic acids. A DNA-containing protein complex from Plasmodium falciparum, the causative agent of malaria, is also known to be the major trigger of the innate immune response [51]. In a study based on mouse macrophages it was shown that M. tb activates a nucleic acid sensing pathway [52]. Thus, nucleic acids of M. tb, the malaria parasite, HIV, and HCV might produce the common transcriptional response in immune cells represented by TMtb-iNet. It is of note that TMtb-iNet does not reflect a generic interferon-related signature. Other pathogens, such as S. pneumoniae [53] and influenza A [54], also trigger nucleic acid-dependent immune responses, but they failed to induce a TMtb-iNet-related transcriptional signature (Tables 9 and 10).

In summary, we derived a refined network signature (TMtb-iNet) from the original transcriptional signature (THP1r2Mtb-induced) by selecting genes according to their direct interactions with one another or their indirect interactions through a group of hubs. The refined signature TMtb-iNet was a highly connected signature induced by M. tb infection in vitro, and it showed positive correlation with clinical TB. We believe that the gene products of TMtb-iNet, especially those with higher degrees of interaction, as well as the connecting hubs, are major regulators of immune responses to TB. The shared correlation of TMtb-iNet with other important infectious diseases deserves the attention of investigators involved in developing transcriptome-based TB diagnostic or prognostic tests.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was supported in part by grants from the Chinese National Mega Science & Technology Program on Infectious Diseases (2013ZX10003007-003), National Science Foundation of China (81301407, 81273328, 31170876, 30901276, and 81371777), Shanghai Rising-Star Program (12QH1401900), Shanghai Health Bureau (20114013), Shanghai Science and Technology Commission (134119a5200 and 114119a3100), and Shanghai Natural Science Fund for Youth Scholars (12ZR1448200).

Supplementary Materials

Supplementary Material contains three figures and five tables (FIGURES S1-S3 and TABLES S1-S5). FIGURE S1: General feature of THP1r2Mtb-iNet[i] and its cognate hubs; FIGURE S2: Correlation of THP1r2Mtb-induced, TMtb-iNet, and TMtb-iEx during therapy of PTB; FIGURE S3: Correlation of THP1r2Mtb-induced, TMtb-iNet, and TMtb-iEx during therapy of chronic hepatitis C; TABLE S1: General features of THP1r2Mtb-iNet[i]; TABLE S2: TFBS profiling of THP1r2Mtb-iNet[i]; TABLE S3: Relative expression levels of TMtb-iNet and its cognate hubs in THP-1 cells; TABLE S4: GSEA using transcriptome data of PBMCs from chronic HCV-infected patients undergoing therapy; TABLE S5: Gene lists from KEGG analysis in THP1r2Mtb-induced, TMtb-iNet, and TMtb-iEx.
