Expression Data Analysis to Identify Biomarkers Associated with Asthma in Children
Asthma is characterized by recurrent episodes of wheezing, shortness of breath, chest tightness, and coughing. It is usually caused by a combination of complex and incompletely understood environmental and genetic interactions. We obtained gene expression data generated by high-throughput screening and identified biomarkers of children's asthma using bioinformatics tools. We then examined the pathogenesis of children's asthma from the perspective of gene regulatory networks: DAVID was applied to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for the top 3000 pairs of relationships in the differentially regulatory network. We found that HAND1, PTK7, NFKB1, ZIC3, STAT6, E2F1, PELP1, USF2, and CBFB may play important roles in the initiation of children's asthma. Based on the regulatory impact factor (RIF) score, HAND1, PTK7, and ZIC3 were potential asthma-related factors. Our study provides a foundation for a biomarker discovery strategy despite the poor understanding of the mechanisms underlying children's asthma.
1. Introduction
Asthma is the most common chronic inflammatory disease of the trachea in childhood, characterized by variable and recurring symptoms, reversible airflow obstruction, and bronchospasm. Prevalence varies significantly among regions and ethnic groups; in general, developed countries have a higher prevalence than developing countries. Asthma prevalence is rising worldwide: according to the report of the International Study of Asthma and Allergies in Childhood (ISAAC), the incidence rate of asthma in British children rose from 10.2% in 2000 to 20.9% in 2011; the prevalence in American children below 17 years increased from 3.2% in 1999 to 5.7% in 2010. In China, the incidence rate in urban children aged 0–14 increased from 0.5% in 1998 to 4.33% in 2008. Thus, there is an urgent need to identify the underlying basis of asthma.
Asthma is thought to be caused by a combination of genetic and environmental factors, which influence both the severity of asthma and its responsiveness to treatment. Smoking during pregnancy and after delivery, low air quality, and exposure to indoor allergens, such as dust mites, cockroaches, animal dander, and mold, have been found to be associated with children’s asthma. Asthma is believed to have a strong genetic background, and hundreds of genes have been reported to be related to asthma, including GSTM1, IL10, CTLA-4, SPINK5, LTC4S, IL4R, and ADAM33. Some genetic variants may cause asthma only when they are combined with specific environmental exposures, for example, a specific single nucleotide polymorphism in the CD14 region combined with exposure to endotoxin. Understanding the genetic basis of asthma susceptibility will allow disease prediction and risk stratification.
Bioinformatics plays an important role in addressing the complexity of the underlying genetic basis of common human disease. Microarray data analysis enables the identification of disease marker genes and gene regulatory networks [14, 15]. In this study, we obtained gene expression profiles generated with high-throughput technology and screened differentially coexpressed gene pairs. Integrating high-throughput gene expression data with computational bioinformatics analysis may shed new light on molecular biomarker identification in children’s asthma.
2. Materials and Methods
2.1. Data Source and Preprocessing
The expression profile of GSE18965 was downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) of the NCBI (National Center for Biotechnology Information); it is based on the GPL96 platform ([HG-U133A] Affymetrix Human Genome U133A Array). Microarrays from seven normal tissues and nine children’s asthma tissues were available. Probes in the expression profile were mapped to their corresponding gene symbols based on the GPL96 platform annotation. For genes represented by several probes, the average expression value across those probes was used as the value for the symbol, yielding 13,046 gene symbols after transformation. Next, the limma package in R was used to screen for differentially expressed genes (DEGs), with a false discovery rate (FDR) < 0.05 set as the threshold.
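The probe-to-symbol averaging and the FDR threshold described above can be sketched in Python as follows. This is a minimal illustration, not the limma workflow used in the study: the probe IDs, gene symbols, and p-values are invented toy values, the function names are hypothetical, and the Benjamini-Hochberg adjustment is assumed as the FDR procedure because the text does not name one.

```python
from collections import defaultdict


def collapse_probes(probe_expr, probe_to_symbol):
    """Average probe-level expression rows that map to the same gene symbol."""
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:  # skip probes without an annotated symbol
            grouped[symbol].append(values)
    # Column-wise mean over all probes of a symbol (one value per sample).
    return {
        symbol: [sum(col) / len(col) for col in zip(*rows)]
        for symbol, rows in grouped.items()
    }


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input (hypothetical): three probes, two samples each.
probe_expr = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 6.0],
    "probe_c": [1.0, 1.0],
}
probe_to_symbol = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
gene_expr = collapse_probes(probe_expr, probe_to_symbol)

# Toy p-values (hypothetical), one per gene; keep genes with adjusted p < 0.05.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
selected = [i for i, q in enumerate(adj) if q < 0.05]
```

Averaging is done per sample, so a symbol keeps one expression value for each array regardless of how many probes map to it.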
2.2. Screening of Transcriptional Regulatory Relationships
According to the central dogma, differences in gene expression can arise in various ways, but at the transcription level regulatory molecules are the decisive factors, for example, transcription factors (TFs), which switch genes on and off. First, human hg18 transcription factor binding site data and genomic coordinate information were downloaded from the UCSC database. Second, we searched for transcription factor binding sites in the range from 1 kb upstream to 0.5 kb downstream of the transcription start site of each gene, and each TF found was considered to be associated with that gene. In total, we obtained 214,608 pairs of gene regulatory relationships involving 216 TFs and 16,863 genes.
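The promoter-window search can be sketched as below. This is a hedged illustration only: the gene names, coordinates, and TF labels are invented, the function names are hypothetical, and two details are assumptions not stated in the text, namely that the window is oriented by strand and that a binding site counts when it overlaps the window.

```python
def promoter_window(tss, strand, upstream=1000, downstream=500):
    """Return (start, end) of the search window around a transcription start site."""
    if strand == "+":
        return tss - upstream, tss + downstream
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return tss - downstream, tss + upstream


def assign_tfs(genes, sites):
    """Pair each gene with every TF that has a binding site in its window.

    genes: gene name -> (chromosome, TSS position, strand)
    sites: list of (TF name, chromosome, site start, site end)
    """
    pairs = set()
    for gene, (chrom, tss, strand) in genes.items():
        lo, hi = promoter_window(tss, strand)
        for tf, s_chrom, s_start, s_end in sites:
            # Assumption: any overlap with the window counts as a hit.
            if s_chrom == chrom and s_start <= hi and s_end >= lo:
                pairs.add((tf, gene))
    return pairs


# Toy annotation (hypothetical coordinates).
genes = {
    "GENE_A": ("chr1", 10000, "+"),
    "GENE_B": ("chr1", 50000, "-"),
}
sites = [
    ("TF_X", "chr1", 9200, 9215),    # inside GENE_A window (9000-10500)
    ("TF_Y", "chr1", 10600, 10615),  # just outside GENE_A window
    ("TF_Z", "chr1", 50900, 50915),  # inside GENE_B window (49500-51000)
    ("TF_X", "chr2", 9200, 9215),    # wrong chromosome
]
regulatory_pairs = assign_tfs(genes, sites)
```

The nested loop is quadratic and only suitable for toy input; a genome-scale run would normally sort sites by chromosome and position or use an interval index.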
2.3. Differentially Coexpressed Analysis 
Differential coexpression analysis determines the discrepancy in coexpression of gene pairs or gene-TF pairs under different conditions. Previous differential coexpression analyses have generated many insightful biological hypotheses. In this study, for any pair of genes or pair of gene versus TF $(i, j)$, the Pearson correlation coefficient (PCC) in normal tissues (P-normal) and asthma tissues (P-asthma) was calculated, and then their absolute difference was obtained. Finally, the pairs with an absolute difference >1 were selected as differentially coexpressed pairs:

$$\left| r_{ij}^{\mathrm{normal}} - r_{ij}^{\mathrm{asthma}} \right| > 1. \qquad (1)$$

In this term, $r_{ij}^{\mathrm{normal}}$ represents the PCC of gene/TF $i$ and gene/TF $j$ in the normal state, and $r_{ij}^{\mathrm{asthma}}$ represents the PCC of gene/TF $i$ and gene/TF $j$ in the asthma state. Two kinds of coexpression relationship, negative and positive, were distinguished according to the values of P-normal and P-asthma.
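The selection rule, an absolute difference in Pearson correlation greater than 1 between the two conditions, can be illustrated with a short Python sketch. The expression values and gene names are invented toy data and the function names are hypothetical; the sketch assumes every profile is non-constant, since the correlation is undefined otherwise.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def differential_pairs(normal, disease, threshold=1.0):
    """Return pairs whose correlation differs by more than `threshold`.

    normal, disease: gene name -> expression values in that condition.
    The result maps (gene1, gene2) to (r_normal, r_disease).
    """
    selected = {}
    for g1, g2 in combinations(sorted(normal), 2):
        r_normal = pearson(normal[g1], normal[g2])
        r_disease = pearson(disease[g1], disease[g2])
        if abs(r_normal - r_disease) > threshold:
            selected[(g1, g2)] = (r_normal, r_disease)
    return selected


# Toy expression profiles (hypothetical), four samples per condition.
normal = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
disease = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 5]}
dc_pairs = differential_pairs(normal, disease)
```

Because a correlation coefficient lies between -1 and 1, a difference above 1 can only occur when the two correlations have opposite signs, so this threshold keeps pairs whose relationship reverses between conditions.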
2.4. Regulatory Impact Factors (RIF) Calculation
Regulatory impact factor (RIF) appears to be a robust and valuable methodology for identifying the regulators with the highest contribution to differential gene expression between two biological conditions. It is a metric assigned to each TF that combines the expression values of the target genes with the coexpression values between the TF and its target genes. The RIF of the $i$th TF is computed as follows:

$$\mathrm{RIF}_i = \frac{1}{n_{de}} \sum_{j=1}^{n_{de}} \left[ \left( e1_j \times r1_{ij} \right)^2 - \left( e2_j \times r2_{ij} \right)^2 \right], \qquad (2)$$

where $n_{de}$ is the number of DEGs; $e1_j$ and $e2_j$ represent the expression values of the $j$th DEG in conditions 1 and 2, respectively; and $r1_{ij}$ and $r2_{ij}$ represent the coexpression correlation between the $i$th TF and the $j$th DEG in conditions 1 and 2, respectively.
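A small Python sketch may help make the RIF computation concrete. It assumes the variant of RIF that averages, over all DEGs, the difference between the squared product of expression and TF-DEG correlation in condition 1 and the same quantity in condition 2, and it assumes the expression value of a DEG in a condition is its mean across samples. All gene names and values are invented, and the function names are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rif_score(tf, degs, cond1, cond2):
    """Average over DEGs of (e1 * r1)^2 - (e2 * r2)^2 for one TF.

    cond1, cond2: gene name -> expression values in that condition.
    """
    total = 0.0
    for gene in degs:
        e1, e2 = mean(cond1[gene]), mean(cond2[gene])  # assumed: mean expression
        r1 = pearson(cond1[tf], cond1[gene])
        r2 = pearson(cond2[tf], cond2[gene])
        total += (e1 * r1) ** 2 - (e2 * r2) ** 2
    return total / len(degs)


# Toy data (hypothetical): one TF, two DEGs, three samples per condition.
cond1 = {"TF1": [1, 2, 3], "G1": [2, 4, 6], "G2": [3, 2, 1]}
cond2 = {"TF1": [1, 2, 3], "G1": [6, 4, 2], "G2": [1, 3, 2]}
score = rif_score("TF1", ["G1", "G2"], cond1, cond2)
```

In this toy case G1 contributes nothing because only the sign of its correlation with the TF changes, while G2 contributes because the strength of its correlation changes, which shows that this variant responds to changes in squared correlation rather than in sign.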
2.5. Pathway Enrichment Analysis
To facilitate the functional annotation and analysis of large lists of genes in the regulatory network, we submitted all the DEGs to DAVID for KEGG term enrichment analysis. DAVID identifies enriched canonical pathways by calculating the association between a given set of genes and a canonical pathway using a hypergeometric test. A P value < 0.05 was the screening criterion.
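The hypergeometric test behind pathway enrichment can be written in a few lines of Python. This is a plain upper-tail hypergeometric probability with invented counts and a hypothetical function name; DAVID reports its own modified statistic (the EASE score), so values from this sketch are not expected to match DAVID output exactly.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of at least k pathway genes in a list of n genes.

    N: genes in the background, K: background genes in the pathway,
    n: genes in the submitted list, k: submitted genes in the pathway.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy counts (hypothetical): 20 background genes, 5 in the pathway,
# 4 submitted genes of which 2 fall in the pathway.
p_value = hypergeom_pvalue(k=2, n=4, K=5, N=20)
is_enriched = p_value < 0.05
```

With these counts the pathway would not pass the 0.05 criterion, since two hits out of four is not unusual when a quarter of the background belongs to the pathway.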
3. Results
3.1. Screening of Differentially Coexpressed Gene Pairs
If the expression levels of two genes, or of a gene and a TF, are similar across a series of samples, they are called a coexpressed pair. If a pair is coexpressed in condition A but not in condition B, or vice versa, it is called a differentially coexpressed gene pair. We calculated the PCCs between two genes and between gene and TF from their expression profile data in the normal and asthma states and then used formula (1) to screen differentially coexpressed gene pairs. A total of 9,775,369 differentially coexpressed gene pairs were obtained (Table 1).
3.2. Construction and Analysis of Differentially Regulatory Network
Transcriptional regulation pairs were selected from the differentially coexpressed gene pairs and then used to construct the differentially regulatory network. The transcriptional regulation relationships in the network under the disease state differed from those under the normal state, which may have a significant impact on the incidence of disease. The constructed differentially regulatory network comprised 10,899 pairs of regulation relationships, including 133 TFs and 5,083 target genes. The top 25% of relationships were visualized using Cytoscape software (Figure 1).
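The network construction step, keeping only the TF-target relationships whose two members are also differentially coexpressed, amounts to a set intersection. The sketch below uses invented pair names and a hypothetical function name, and it assumes that coexpressed pairs are unordered while regulatory pairs are directed from TF to target.

```python
def build_network(regulatory_pairs, diff_coexpressed):
    """Keep TF -> target edges that are also differentially coexpressed.

    regulatory_pairs: iterable of (tf, target) tuples (directed).
    diff_coexpressed: iterable of 2-tuples of gene/TF names (unordered).
    Returns the retained edges plus the sets of TFs and targets they involve.
    """
    unordered = {frozenset(pair) for pair in diff_coexpressed}
    edges = {
        (tf, target)
        for tf, target in regulatory_pairs
        if frozenset((tf, target)) in unordered
    }
    tfs = {tf for tf, _ in edges}
    targets = {target for _, target in edges}
    return edges, tfs, targets


# Toy input (hypothetical names).
regulatory = {("TF_X", "GENE_A"), ("TF_X", "GENE_B"), ("TF_Z", "GENE_B")}
coexpressed = {("GENE_A", "TF_X"), ("TF_Z", "GENE_B"), ("GENE_A", "GENE_B")}
edges, tfs, targets = build_network(regulatory, coexpressed)
```

Counting the returned sets gives the kind of summary reported for the network: number of regulation pairs, number of TFs, and number of target genes.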
3.3. Impact Analysis of Transcription Factor
The above network generated vast amounts of data. To focus on the most meaningful information, we evaluated the impact of TFs by calculating their RIF. The top-ranked TFs by RIF were HAND1, PTK7, NFKB1, ZIC3, STAT6, E2F1, PELP1, USF2, CBFB, SOX9, and FOXO4 (Table 2).
A PubMed search showed that NFKB1 [22, 23], STAT6, E2F1, USF1, and CBFB were verified asthma-related TFs, and that NFKB1 [22, 23] and STAT6 were reported as asthma-related genes in studies published in 2013. In addition, HAND1, PTK7, and ZIC3 were found to be potential asthma-related factors considering their RIF values.
3.4. Enrichment of KEGG Pathway
We used DAVID to perform KEGG pathway enrichment analysis for the top 3000 pairs of relationships in the differentially regulatory network. As shown in Table 3, the differentially regulatory network was mainly enriched in several important pathways, such as the cancer, Wnt, and MAPK pathways.
4. Discussion
Molecular biomarkers are useful for improving diagnostic and predictive accuracy in the clinic as well as treatment efficacy. Because microarrays can interrogate the expression levels of thousands of genes in the human genome simultaneously, they have been widely used in the discovery of disease biomarkers [27–29]. In this work, we analyzed gene expression data with computational methods with the aim of uncovering biomarkers that were potentially dysregulated in children’s asthma. We identified a total of 9,775,369 differentially coexpressed gene pairs between the normal tissue microarrays and the children’s asthma tissue microarrays. After regulatory network construction and RIF analysis, we found that the TFs HAND1, PTK7, NFKB1, ZIC3, STAT6, E2F1, PELP1, USF2, CBFB, SOX9, and FOXO4 may play important roles in the initiation of children’s asthma. Based on RIF score, HAND1, PTK7, and ZIC3 were considered potential asthma-related factors.
Heart and neural crest derivatives-expressed protein 1 (HAND1) is a protein encoded by the HAND1 gene in humans. The protein encoded by this gene belongs to the basic helix-loop-helix family of TFs. A recent study provides evidence that HAND1 is indeed an important regulator of the interventricular boundary, but a role for HAND1 in asthma has not been reported. Tyrosine-protein kinase-like 7 (PTK7) is a human enzyme encoded by the PTK7 gene. Receptor protein tyrosine kinases transduce extracellular signals across the cell membrane, and PTK7, as a defective receptor tyrosine kinase, is thought to mediate signals by recruiting other signaling molecules. Our research showed that the PTK7 gene was associated with the occurrence of asthma in children. Zinc finger protein ZIC3 is a protein encoded by the ZIC3 gene, which encodes a member of the ZIC family of C2H2-type zinc finger proteins. Our results suggest a role of ZIC3 in the maintenance of asthma. However, further experimental verification is needed of the possible roles of HAND1, PTK7, and ZIC3 in asthma proposed in this study.
NFKB1, STAT6, and E2F1 were verified asthma-related TFs in PubMed, and they were found to exert regulatory impact in this study. NFKB1 (nuclear factor of kappa light polypeptide gene enhancer in B cells 1), which is located within the linkage peak, encodes the p105/p50 subunit of the NFκB family of proteins. NFKB1 was found to be differentially expressed when RNA expression was measured in buccal mucosa samples of patients with asthma. NFKBIA/IκBα has been identified as a central hub in transcriptional responses of prevalent childhood lung diseases, including asthma. The STAT6 gene (human signal transducer and activator of transcription 6) is considered one of the most promising candidate genes for asthma. Genome-wide association studies have revealed that particular polymorphisms, haplotype variants, and epigenetic modifications of STAT6 are associated with asthma in childhood. The transcription factor E2F1 is an additional target of c-Myc that promotes cell cycle progression. E2F1 was differentially expressed in lung tissues from human donors diagnosed with asthma compared with normal bronchial epithelial cells.
During cellular processes, genes interact with each other; thus, disease-related genes may form different coexpression patterns with other genes under different conditions. Most previous analyses applied a single-gene differential expression method, whereas we applied differential coexpression analysis. The differential coexpression approach provides an FDR-controlled list of interesting gene sets, with no requirement that genes be highly correlated in at least one biological condition, and it is now readily applied to data from individual and multiple experiments. Nevertheless, the differentially coexpressed gene pairs identified using the computational bioinformatics method should be further confirmed by in vitro analysis with normal controls.
In conclusion, our analysis identified 9,775,369 differentially coexpressed gene pairs associated with asthma initiation using a computational bioinformatics analysis of gene expression. We also uncovered a network of transcription factors that putatively contribute to the dysregulation of several genes in asthma. In the differentially regulatory network, the transcription factors HAND1, PTK7, NFKB1, ZIC3, STAT6, E2F1, PELP1, USF2, CBFB, SOX9, and FOXO4 were found to have altered expression levels in asthma patients. We suggest that HAND1, PTK7, and ZIC3 may be used as biomarkers for asthma; however, more work is needed to validate our results.
Conflicts of Interests
The author has no conflict of interests to state.
References
[1] F. D. Martinez, “Genes, environments, development and asthma: a reappraisal,” European Respiratory Journal, vol. 29, no. 1, pp. 179–184, 2007.
[2] M. Koehoorn, L. Tamburic, C. McLeod, P. Demers, L. Lynd, and S. Kennedy, “Population-based surveillance of asthma among workers in British Columbia, Canada,” Chronic Diseases and Injuries in Canada, vol. 33, no. 2, pp. 88–94, 2013.
[3] L. J. Akinbami, C. M. Bailey, C. A. Johnson et al., National Surveillance of Asthma: United States, 2001–2010, vol. 3, no. 35 of Vital and Health Statistics, National Center for Health Statistics, 2012.
[4] D. Fishwick, C. M. Barber, L. M. Bradshaw et al., “Standards of care for occupational asthma: an update,” Thorax, vol. 67, no. 3, pp. 278–280, 2012.
[5] J. S. Kuriakose and R. L. Miller, “Environmental epigenetics and allergic diseases: recent advances,” Clinical & Experimental Allergy, vol. 40, no. 11, pp. 1602–1610, 2010.
[6] F. J. Kelly and J. C. Fussell, “Air pollution and airway disease,” Clinical & Experimental Allergy, vol. 41, no. 8, pp. 1059–1071, 2011.
[7] G. McGwin Jr., J. Lienert, and J. I. Kennedy Jr., “Formaldehyde exposure and asthma in children: a systematic review,” Environmental Health Perspectives, vol. 118, no. 3, pp. 313–317, 2010.
[8] S. H. Arshad, “Does exposure to indoor allergens contribute to the development of asthma and allergy?” Current Allergy and Asthma Reports, vol. 10, no. 1, pp. 49–55, 2010.
[9] C. Ober and S. Hoffjan, “Asthma genetics 2006: the long and winding road to gene discovery,” Genes and Immunity, vol. 7, no. 2, pp. 95–100, 2006.
[10] E. Halapi and U. S. Bjornsdottir, “Overview on the current status of asthma genetics,” The Clinical Respiratory Journal, vol. 3, no. 1, pp. 2–7, 2009.
[11] F. D. Martinez, “CD14, endotoxin, and asthma risk: actions and interactions,” Proceedings of the American Thoracic Society, vol. 4, no. 3, pp. 221–225, 2007.
[12] E. M. Schauberger, S. L. Ewart, S. H. Arshad et al., “Identification of ATPAF1 as a novel candidate gene for asthma in children,” Journal of Allergy and Clinical Immunology, vol. 128, no. 4, pp. 753.e11–760.e11, 2011.
[13] J. H. Moore, F. W. Asselbergs, and S. M. Williams, “Bioinformatics challenges for genome-wide association studies,” Bioinformatics, vol. 26, no. 4, pp. 445–455, 2010.
[14] E. Segal, M. Shapira, A. Regev et al., “Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data,” Nature Genetics, vol. 34, no. 2, pp. 166–176, 2003.
[15] Y. Wang, T. Joshi, X.-S. Zhang, D. Xu, and L. Chen, “Inferring gene regulatory networks from multiple microarray datasets,” Bioinformatics, vol. 22, no. 19, pp. 2413–2420, 2006.
[16] T. Barrett, T. O. Suzek, D. B. Troup et al., “NCBI GEO: mining millions of expression profiles—database and tools,” Nucleic Acids Research, vol. 33, no. supplement 1, pp. D562–D566, 2005.
[17] D. Karolchik, R. Baertsch, M. Diekhans et al., “The UCSC genome browser database,” Nucleic Acids Research, vol. 31, no. 1, pp. 51–54, 2003.
[18] M. Bhattacharyya and S. Bandyopadhyay, “Studying the differential co-expression of microRNAs reveals significant role of white matter in early Alzheimer's progression,” Molecular BioSystems, vol. 9, no. 3, pp. 457–466, 2013.
[19] S. Cho, J. Kim, and J. H. Kim, “Identifying set-wise differential co-expression in gene expression microarray data,” BMC Bioinformatics, vol. 10, article 109, 2009.
[20] A. Reverter, N. J. Hudson, S. H. Nagaraj, M. Pérez-Enciso, and B. P. Dalrymple, “Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data,” Bioinformatics, vol. 26, no. 7, pp. 896–904, 2010.
[21] S. Klunker, M. M. W. Chong, P.-Y. Mantel et al., “Transcription factors RUNX1 and RUNX3 in the induction and suppressive function of Foxp3+ inducible regulatory T cells,” Journal of Experimental Medicine, vol. 206, no. 12, pp. 2701–2715, 2009.
[22] S. Ali, A. F. Hirschfeld, M. L. Mayer et al., “Functional genetic variation in NFKBIA and susceptibility to childhood asthma, bronchiolitis, and bronchopulmonary dysplasia,” The Journal of Immunology, vol. 190, no. 8, pp. 3949–3958, 2013.
[23] C. A. Vyhlidal, A. K. Riffel, H. Dai, L. J. Rosenwasser, and B. L. Jones, “Detecting gene expression in buccal mucosa in subjects with asthma versus subjects without asthma,” Pediatric Allergy and Immunology, vol. 24, no. 2, pp. 138–143, 2013.
[24] M. Godava, R. Vrtel, and R. Vodicka, “STAT6—polymorphisms, haplotypes and epistasis in relation to atopy and asthma,” Biomedical Papers of the Medical Faculty of the University Palacký, Olomouc, Czechoslovakia, vol. 157, no. 2, pp. 172–180, 2013.
[25] B. Diao, Y. Liu, Y. Zhang, Q. Liu, W. J. Lu, and G. Xu, “Functional network analysis with the subcellular location and gene ontology information in human allergic asthma,” Genetic Testing and Molecular Biomarkers, vol. 16, no. 11, pp. 1287–1292, 2012.
[26] J. S. Bickford, K. J. Newsom, J.-D. Herlihy et al., “Induction of group IVC phospholipase A2 in allergic asthma: transcriptional regulation by TNFα in bronchoepithelial cells,” Biochemical Journal, vol. 442, no. 1, pp. 127–137, 2012.
[27] C. S. Cooper, C. Campbell, and S. Jhavar, “Mechanisms of disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer,” Nature Clinical Practice Urology, vol. 4, no. 12, pp. 677–687, 2007.
[28] C. R. Scherzer, A. C. Eklund, L. J. Morse et al., “Molecular markers of early Parkinson's disease based on gene expression in blood,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 3, pp. 955–960, 2007.
[29] A. Allam, R. S. Gumpeny, and S. V. Guttula, “Analyzing microarray data of Alzheimer's using cluster analysis to identify the biomarker genes,” International Journal of Alzheimer's Disease, vol. 2012, Article ID 649456, 5 pages, 2012.
[30] M. Knöfler, G. Meinhardt, R. Vasicek, P. Husslein, and C. Egarter, “Molecular cloning of the human Hand1 gene/cDNA and its tissue-restricted expression in cytotrophoblastic cells and heart,” Gene, vol. 224, no. 1-2, pp. 77–86, 1998.
[31] K. Togi, T. Kawamoto, R. Yamauchi, Y. Yoshida, T. Kita, and M. Tanaka, “Role of Hand1/eHAND in the dorso-ventral patterning and interventricular septum formation in the embryonic heart,” Molecular and Cellular Biology, vol. 24, no. 11, pp. 4627–4635, 2004.
[32] K. Mossie, B. Jallal, F. Alves, I. Sures, G. D. Plowman, and A. Ullrich, “Colon carcinoma kinase-4 defines a new subclass of the receptor tyrosine kinase family,” Oncogene, vol. 11, no. 10, pp. 2179–2184, 1995.
[33] J. Boudeau, D. Miranda-Saavedra, G. J. Barton, and D. R. Alessi, “Emerging roles of pseudokinases,” Trends in Cell Biology, vol. 16, no. 9, pp. 443–452, 2006.
[34] S. Alonso, M. E. Pierpont, W. Radtke et al., “Heterotaxia syndrome and autosomal dominant inheritance,” American Journal of Medical Genetics, vol. 56, no. 1, pp. 12–15, 1995.
[35] H. J. Edenberg, X. Xuei, L. F. Wetherill et al., “Association of NFKB1, which encodes a subunit of the transcription factor NF-κB, with alcohol dependence,” Human Molecular Genetics, vol. 17, no. 7, pp. 963–970, 2008.
[36] G. Duetsch, T. Illig, S. Loesgen et al., “STAT6 as an asthma candidate gene: polymorphism-screening, association and haplotype analysis in a Caucasian sib-pair study,” Human Molecular Genetics, vol. 11, no. 6, pp. 613–621, 2002.
[37] K. A. O'Donnell, E. A. Wentzel, K. I. Zeller, C. V. Dang, and J. T. Mendell, “C-Myc-regulated microRNAs modulate E2F1 expression,” Nature, vol. 435, no. 7043, pp. 839–843, 2005.
[38] S. Damian, A. Deighan, M. Smithhisler, and G. Klarmann, “Gene Expression Differences in Primary Tracheobronchial Airway Epithelial Cells from Human Donors Diagnosed with Asthma or Chronic Obstructive Pulmonary Disorder (COPD),” Resource, 2011.