Integrative Genomics and Computational Systems MedicineView this Special Issue
Expression Sensitivity Analysis of Human Disease Related Genes
Background. Genome-wide association studies (GWAS) have shown its revolutionary power in seeking the influenced loci on complex diseases genetically. Thousands of replicated loci for common traits are helpful in diseases risk assessment. However it is still difficult to elucidate the variations in these loci that directly cause susceptibility to diseases by disrupting the expression or function of a protein currently. Results. We evaluate the expression features of disease related genes and find that different diseases related genes show different expression perturbation sensitivities in various conditions. It is worth noting that the expression of some robust disease-genes doesn’t show significant change in their corresponding diseases, these genes might be easily ignored in the expression profile analysis. Conclusion. Gene ontology enrichment analysis indicates that robust disease-genes execute essential function in comparison with sensitive disease-genes. The diseases associated with robust genes seem to be relatively lethal like cancer and aging. On the other hand, the diseases associated with sensitive genes are apparently nonlethal like psych and chemical dependency diseases.
To elucidate the etiology and pathogenesis of the diseases, scientists have made efforts to map human disease loci genetically and clone many diseases genes [1, 2]. Recently with the development of new sequencing technology and high throughput microarray technology, the searching for the genetic traits of the diseases and scanning of personal genomic variations are revolutionized. Genome-wide association studies (GWAS)  showed its strong abilities in detecting complicated genetic variations in genes and building genomic variation patterns compared with linkage analysis  and candidate gene studies . GWAS have made contribution to establish plenty of disorder-gene association pairs . As reported, over 4008 SNPs are associated with 819 common diseases . The Genetic Association Database (GAD)  is a valuable resource of human genetic association studies on complex diseases and disorders, which facility us to rapidly establish the relationship of disorder-gene association pairs. The association studies explained the relationships between the diseases and genes on the genomic levels. Diseases-associated studies have identified functional genetic variations, but they didn’t well make sure the variations could cause the diseases directly. Annotating the diseases associated variations on different levels are necessary to identify the outstanding risking genes. Here we wonder whether the genes associated the same diseases show similar expression features?
In order to probe the expression feature of disease genes which are detected by the associated studies, we referred to the method of gene expression sensitivity analysis . We firstly investigated human global gene expression characters in response to the environmental perturbation. Gene expression patterns are different in various biological conditions, a lot of case-control expression patterns have been profiled using the high throughput microarray technology. Studies show a group of genes’ expression that could be easily disturbed with various external stimulations [9, 10]; however, some genes are stably expressed in different environments, which indicate that the genes show different expression sensitivity. For example, housekeeping genes which have been well investigated in maintaining the basal cellular functions have revealed its expression stability [11, 12]. Recently gene expression sensitivity to external stimulation have been studied in yeast and human [9, 10], which guide us to generate the idea to investigate the expression sensitivity of diseases genes.
It is worthy to obtain a global view of the intrinsic properties of human disease gene expression as a response to perturbations. Gene Expression Omnibus (GEO) database  and Genetic Association Database (GAD)  were used to analyze the human associated gene expression sensitivity. A meta-analysis method could be used to seek the sensitive genes and robust genes in the expression profiles globally. Based on our calculation of sensitive values of gene expression, we firstly categorized the genes into robust and sensitive groups. Furthermore we investigated the expression sensitivity of disease related genes in response to the perturbations and found some of genes were detected by the association studies previously, but the expression of these genes is relatively stable in their corresponding disease studies. The results also indicate some diseases related genes that show their expression robustness (DGR) like cancer and aging genes, and chemical dependency disease related genes seem to be relatively sensitive (DGS).
2. Materials and Computational Methods
2.1. Data Collection and Preprocessing
The Genetic Association Database (GAD) includes over 80,000 gene records of genetic association studies. Importantly, the database has a designation of whether the gene record was reported to be associated with disease phenotype. The option Y means that the gene of record was associated with the disease phenotype; otherwise, the option N was not associated. We collected the records in GAD that associated with the disease phenotype and got the records only annotated with the standard disease phenotype keywords from MeSH (http://www.nlm.nih.gov/mesh/) vocabulary.
After filtering, 13277 records were used for further investigation. In our study, 2588 disease related genes were acquired from 13275 records from the Genetic Association Database. Among these genes, 1804 genes were associated with more than one disease and 784 were associated with only one disease (Supplementary Material available online at http://dx.doi.org/10.1155/2013/637424, Table S2). These diseases related genes are mainly from human genetic association studies of complex diseases and disorders. 1464 kinds of diseases were extracted from 13275 records. To establish the relationships between genes and their corresponding diseases groups, the 1464 kinds of diseases were divided into 16 groups by paring database.
We downloaded the HGU133plus2.0 microarray datasets from the GEO database, each dataset had been normalized with MAS5 when the authors submitted them into the database as required (http://www.ncbi.nlm.nih.gov/geo/). To calculate the expression level of each gene, we referred to the methods from the previous work . We discarded the data sets with less than 6 arrays and changed the expression values into 10 if the expression values are less than 10. Then the expression values of all probes were logarithmic transformed (base 2). We choose the maximum expression value as gene expression value if multiple probes illustrate the same gene expression.
2.2. Calculate Sensitive Values (SV) of Each Gene
In our study, 167 datasets (labeled as in the Formula below) in GEO were used to calculate the sensitive values of genes. In each dataset , we calculate the standard deviation (SD) and mean of each gene , getting the coefficient of variation (CV) of each gene with SD divided mean. The CV of each gene in each dataset is calculated as follows: In order to merge the results of different datasets and minimum experimental variation during sensitivity analysis, we employed the large scale meta-analysis method reported in our previous work . We ranked the CV of genes in each dataset and constructed a matrix of ranked CV to datasets. Sensitive values (SV) are calculated as mean of each ranked CVs of all datasets:
2.3. Defining Sensitive Genes and Robust Genes
In the current study, we tried to establish the relationships between the different kinds of disease and different sensitive genes. Firstly we selected a group of genes whose expression could be significantly disturbed and a group of genes whose expression significantly stable. After calculating the sensitive values of genes, we used the 5 percent as the cutoff value. The top 5 percent of genes are significantly sensitively expressed, we took five percent of genes with lowest sensitive values as robust genes groups, and five percent of genes with highest SV were considered as sensitive genes.
The Affymetrix HGU 133a plus 2.0 microarray covers 31835 genes, which represents more genes than the HGU133a microarray does. The distribution of SV is skewed normal distribution (Figure 1). The figure suggested that there are more genes with a comparatively lower sensitive value than those with higher sensitive values indicating existing more robust genes than sensitive genes. Although the distribution of sensitive values are skewed normal, we took five percent of genes with lowest sensitive values as robust genes groups, and also five percent of genes with highest SV were considered as sensitive genes. Therefore, we got 1592 robust genes and sensitive genes individually. Therefore, we got 1592 robust genes and 1592 sensitive genes. After comparison with disease related genes, we got 131 diseases related robust genes and 467 disease related sensitive genes respectively (Supplementary Table S1).
3.1. Functional Annotations of Expression Robust and Sensitive Genes
In order to identify the biological functions in the cell, we conducted GO enrichment analysis [14, 15]. The results show that robust genes are strongly engaged in the cellular component organization (GO: 0071842), reproductive process (GO: 0022414), and viral reproduction (GO: 0016032) (Figure 2 and Table 1). However the sensitive genes play important roles in progesterone receptor signaling pathway (GO: 0050847) and negative regulation of osteoclast differentiation (GO: 0045671) (Figure 3 and Table 1). Furthermore enrichments analysis for cell components modules indicates that the robust gene are strongly enriched in the intracellular (GO: 0005622) and ribosome (GO: 0005840) (Figure 3), but the sensitive genes didn’t show enrichment in the cell components. Based on the above observation, we conclude that the robust genes are engaged in the basic biological process.
3.2. A Case about Robust Genes Consistently Expressing in Their Corresponding Disease
We have found the expression of robust genes are stable in various conditions and inferred that the some disease related robust genes may express consistently in their corresponding diseases conditions. To verify our assumption, we took colorectal cancer and its associated robust genes HIF1A and MLH as an example.
HIF1A associated with colorectal cancer is one of robust genes. Hypoxia-inducible factor-1 (HIF1) is a heterodimer composed with HIF1A and HIF1B. HIF1 is functionally important in cellular and systemic homeostatic responses to hypoxia and initially found as transcription factor in mammalian cells cultured under reduced oxygen tension . Fransén et al. found that polymorphic alleles in the gene of HIF1A show significant higher risk for the development of ulcerative colorectal cancer, which indicates that the HIF1A polymorphisms display their importance in the development of ulcerative intestinal tumors . To view the gene expression variations of colorectal cancer, we took a case from a work  that analyzed expression changes in early onset colorectal cancer (GDS2609). The expression of HIF1A didn’t show significant change in the study (Figure 4(a)). The robust gene HIF1A might be ignored in the colorectal cancer expression analysis.
Another robust gene MLH1 involved in DNA mismatch repair is also associated with colorectal cancer . Liu et al. revealed that colorectal cancer is associated with 2 missense mutations in exon 16 of the MLH1 . Chan et al. described a novel germline 1.8-kb deletion involving of the MLH1 gene associated with hereditary nonpolyposis colorectal cancer in a Hong Kong family . Recently Nejda et al. suggests that gender should be considered in colorectal cancer association studies . They found that nucleotide polymorphism in MLH1 displays a higher risk in sporadic colorectal carcinogenesis especially in men. A mechanism of genomic instability has been identified in colorectal cancer , the DNA mismatch repair genes MLH1 inactivated by hypermethylation of their promoter could cause microsatellite instability. We also found the expression of MLH1 in the early onset colorectal cancer investigation (GDS2609) is not significantly changed (Figure 4(b)). The expression profile analysis may overlook importance of robust gene MLH1. Therefore, we believe that diseases related robust genes are easily ignored in the expression analysis. In order to know the robust genes are enriched in what kinds of diseases, we established the relationships between diseases and gene expression sensitivities.
3.3. The Relationships between Diseases and Gene Expression Sensitivity
The 1469 diseases with their associated genes from GAD were divided into 16 groups. The disease genes are classified as robust disease gene groups and sensitive diseases genes groups based on whether they were included in the robust genes groups or sensitive genes groups. We performed the enrichment analysis of disease genes based on the hyper geometric distribution. The results (Table 2) show that the cancer, aging, and pharmacogenomics are enriched with robust genes (DGR), with values of 0.001309, 0.025063, and 0.06734 individually, and psych, chemical dependency, and reproduction are enriched with sensitive genes (DGS), with corresponding values of 0.001954, 0.028318, and 0.055457.
We annotated the gene from DGR and DGS with Gene Ontology . Firstly the DGR and GDS were classified into seven groups (Figure 5(a)) according to whether the genes associated one or more diseases. In the DGR groups, cancer genes are mainly engaged in the mismatch repair (GO: 0006298), DNA catabolic process (GO: 0006308), and base excision repair (GO: 0006284). The GO analysis of DGS (Figure 5(b)) shows the genes only associated with psych diseases are enriched in the learning (GO:0007612) process and the genes associated both psych and chemical dependency diseases are engaged in the dopamine secretion (GO:0014046) and gamma-aminobutyric acid signaling pathway (GO:0007214) and so forth. We conclude that the DGR mainly engaged in more essential biological process compared with DGS which mainly involve in the regulation and response process.
4. Discussion and Conclusion
We parsed the GAD databases and selected diseases associated genes. Because the human genes show different expression sensitivity in response to the environmental perturbation , we evaluated the expression features of diseases genes with the method of gene expression sensitivity analysis. Because the finding of expression of robust genes is not easily changed in various biological conditions, we assumed that the disease related robust genes might be expressed stably in their corresponding disease conditions. The colorectal cancer associated robust genes HIF1A and MLH did not show significant expression changes in the studies of colorectal cancer [17, 18, 20, 22]. Importantly our results suggested the genes associated with different diseases also reveal different sensitivities. We found the cancer, aging, and pharmacogenomics related genes display expression robustness, and psych, chemical dependency, and reproduction-associated genes are relatively sensitive. In our study, the robust disease related genes were investigated not only by combining the gene ontology but also by grouped disease information. The defect of robust genes could cause more lethal diseases, such as cancer and aging diseases. Thus diseases related robust genes might play more essential roles to keep health for human.
Additionally, the protein interaction network and gene ontology provide extensive information to detail the relationships between different diseases genes. It was found that the structure of a cellar network and its functional properties were connected with protein or “Hubs” which are more likely encoded by essential genes [25, 26]. Human robust genes are higher degree centrality than the random groups of genes in the protein interaction network . Gene ontology analysis also indicates the robust genes play an essential role in the cellar biology. The robust genes have shown its importance in different levels. Recent studies reveal that some kinds of diseases genes potentially encode hubs [27, 28]. Goh et al. suggested that cancer genes are more likely to encode hubs in the human disease networks and show higher coexpression with the rest of the genes in the cell , which means that cancer genes play critical roles in cellar development and growth. Age-related diseases tend to attack the center of the human protein network . PPI network investigation above indicated that the cancer and aging related genes are potentially robust.
Based on the studies above, we believe that the different diseases genes reveal distinct expression sensitivity. Diseases that are associated with robust genes seem to be lethal, and the diseases associated with sensitive genes are nonlethal apparently. The gene ontology analysis indicates the robust genes are more essential when compared with sensitive genes. The robust genes those stably express in various environmental conditions are easily ignored in the expression analysis. Therefore the consideration of sensitivity of disease genes might be greatly helpful in elucidating of etiology and pathogenesis of the diseases. In practice, calculation of the diseases genes’ sensitive values could be used to predict the potential harm to heath. In addition, if a robust gene is a potential drug target, it would have little therapeutic effects to these diseases by disturbing the expression level of the robust genes.
Conflict of Interests
The authors declare that they have no conflict of interests.
Liang-Xiao Ma and Ya-Jun Wang statistical analyses for the study and drafted the paper. Liang-Xiao Ma, Ya-Jun Wang, and Jing-Fang Wang participated in the design of the study and provided guidance. Pei Hao and Xuan Li conceived the study and finalized the organization and contents of the paper. All authors approved the final paper.
This work was supported by the National Basic Research Program of China (973 Program, nos. 2012CB517905 and 2012CB316501), Shanghai Natural Science Foundation (no. 11ZR1425700), and the National Natural Science Foundation of China (nos. 31200547 and 90913009). The authors gratefully acknowledge the support of the SA-SIBS Scholarship Program.
Table S1. The number of genes associated with 16 kinds of grouped diseases related genes.
Table S2. The number of diseases associated with each gene involved in the current study.
N. Risch and K. Merikangas, “The future of genetic studies of complex human diseases,” Science, vol. 273, no. 5281, pp. 1516–1517, 1996.View at: Google Scholar
L. A. Hindorff, P. Sethupathy, H. A. Junkins et al., “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 23, pp. 9362–9367, 2009.View at: Publisher Site | Google Scholar
G. L. Wang, B.-H. Jiang, E. A. Rue, and G. L. Semenza, “Hypoxia-inducible factor 1 is a basic-helix-loop-helix-PAS heterodimer regulated by cellular O2 tension,” Proceedings of the National Academy of Sciences of the United States of America, vol. 92, no. 12, pp. 5510–5514, 1995.View at: Publisher Site | Google Scholar
N. Papadopoulos, N. C. Nicolaides, Y.-F. Wei et al., “Mutation of a mutL homolog in hereditary colon cancer,” Science, vol. 263, no. 5153, pp. 1625–1629, 1994.View at: Google Scholar
N. Nejda, D. Iglesias, M. Moreno Azcoita, V. Medina Arana, J. J. González-Aguilera, and A. M. Fernández-Peralta, “A MLH1 polymorphism that increases cancer risk is associated with better outcome in sporadic colorectal cancer,” Cancer Genetics and Cytogenetics, vol. 193, no. 2, pp. 71–77, 2009.View at: Publisher Site | Google Scholar
K. Söreide, E. Janssen, H. Söiland, H. Körner, and J. Baak, “Microsatellite instability in colorectal cancer,” British Journal of Surgery, vol. 93, no. 4, pp. 395–406, 2006.View at: Google Scholar
J. D. J. Han, N. Bertin, T. Hao et al., “Evidence for dynamically organized modularity in the yeast protein-protein interaction network,” Nature, vol. 430, no. 6995, pp. 88–93, 2004.View at: Google Scholar