BioMed Research International

Research Article | Open Access

Volume 2016 | Article ID 2395341 | 6 pages | https://doi.org/10.1155/2016/2395341

Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data

Academic Editor: Xun Lan
Received 09 Apr 2016
Revised 30 May 2016
Accepted 01 Jun 2016
Published 27 Jun 2016

Abstract

Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Several studies have shown that some SNPs located in protein-coding regions are associated with disease by affecting gene expression. However, in noncoding regions, the mechanism by which SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. SNPs located in enhancer elements may affect gene expression and lead to disease. We present a method for identifying liver cancer-related enhancer SNPs by integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results suggest that these enhancer SNPs may play important roles in liver cancer.

1. Introduction

A single nucleotide polymorphism (SNP) is a variation at a single nucleotide in a DNA sequence [1]. In the last decade, a large number of genome-wide association studies (GWAS) have been published, indicating that thousands of SNPs are associated with diseases. Linkage disequilibrium (LD) is the nonrandom association of alleles at different genome locations [2]. Many SNPs are in LD with the causal SNP at a specific GWAS locus [3, 4]. Over 90% of these GWAS variants are located in noncoding regions, and approximately 10% are in LD with a protein-coding variant [5, 6]. In protein-coding regions, many studies have shown that some SNPs are associated with disease by affecting gene expression [7, 8]. However, in noncoding regions, the mechanism by which SNPs contribute to disease susceptibility remains unclear.

Enhancers are core regulatory components of the genome that act over a distance to positively regulate gene expression [9]. It is estimated that 400,000 to 1 million putative enhancers exist in the human genome [10, 11]. Recently, some studies have shown that disease-related GWAS SNPs are correlated with enhancers marked with specific histone modifications [12–15]. Therefore, by integrating GWAS and histone modification ChIP-seq data for a given disorder, we can identify disease-related enhancer SNPs.

We provide a method for identifying liver cancer-related enhancer SNPs by integrating liver cancer GWAS and histone modification ChIP-seq data. We identified 22 enhancer SNPs associated with liver cancer, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results suggest that these enhancer SNPs may play important roles in liver cancer.

2. Results

2.1. Pipeline of Identifying Liver Cancer-Related Enhancer SNPs

As shown in Figure 1, the pipeline consists of four steps. Firstly, we downloaded liver cancer-related GWAS SNPs from the GRASP [16] database and used LD data from HapMap [17] to infer liver cancer-related LD SNPs. Secondly, we identified enhancer regions in liver cancer through integrating histone modification ChIP-seq data in the HepG2 cell line. Thirdly, we mapped the liver cancer-related LD SNPs to the identified enhancers in liver cancer and obtained liver cancer-related enhancer SNPs. Finally, we used a curated regulatory SNP database named rVarBase [18] to validate our results.

2.2. Linkage Disequilibrium Analysis with Liver Cancer-Associated SNPs

We obtained 45 liver cancer-associated SNPs from GRASP (Table 1). These SNPs form the initial set of potential liver cancer-related SNPs. We then used LD data from HapMap to obtain the SNPs in LD with them. The total number of potential liver cancer-related SNPs is 340.


Table 1

| SNP ID | P value | Chromosome | Population | PMID |
|---|---|---|---|---|
| rs17401966 | 1.20E-19 | 1 | CHB | 20676096 |
| rs1249458 | 8.30E-06 | 2 | CHB | 22807686 |
| rs1714259 | 1.10E-06 | 2 | CHB | 22807686 |
| rs2396470 | 5.10E-07 | 2 | CHB | 20676096 |
| rs7424161 | 8.80E-06 | 2 | CHB | 22807686 |
| rs7574865 | 1.70E-11 | 2 | CHB | 23242368 |
| rs3905886 | 3.70E-06 | 3 | CHB | 22807686 |
| rs1073547 | 6.80E-06 | 4 | CHB | 22807686 |
| rs17081345 | 3.70E-07 | 6 | CHB | 20676096 |
| rs9272105 | 3.30E-23 | 6 | CHB | 22807686 |
| rs9275319 | 8.70E-19 | 6 | CHB | 23242368 |
| rs9494257 | 1.10E-14 | 6 | CHB | 20676096 |
| rs12682266 | 6.70E-06 | 8 | CHB | 22174901 |
| rs1573266 | 7.40E-06 | 8 | CHB | 22174901 |
| rs2275959 | 6.40E-06 | 8 | CHB | 22174901 |
| rs7821974 | 7.00E-06 | 8 | CHB | 22174901 |
| rs7898005 | 7.00E-08 | 10 | CHB | 20676096 |
| rs10160758 | 6.00E-06 | 11 | CHB | 22807686 |
| rs10896464 | 6.50E-06 | 11 | CHB | 22807686 |
| rs2611145 | 9.30E-06 | 11 | CHB | 22174901 |
| rs3825023 | 3.10E-06 | 11 | CHB | 22807686 |
| rs7119426 | 3.50E-06 | 11 | CHB | 22807686 |
| rs402071 | 8.60E-06 | 19 | CHB | 22807686 |
| rs3092194 | 4.40E-06 | 20 | CHB | 22807686 |
| rs368007 | 9.90E-06 | 20 | CHB | 22807686 |
| rs455804 | 4.40E-10 | 21 | CHB | 22807686 |
| rs1980215 | 2.30E-06 | 3 | JPT | 21499248 |
| rs2596542 | 4.20E-13 | 6 | JPT | 21499248 |
| rs9275572 | 1.40E-09 | 6 | JPT | 21499248 |
| rs1568658 | 6.90E-06 | 7 | JPT | 21499248 |
| rs952656 | 2.80E-06 | 8 | JPT | 21499248 |
| rs4363614 | 4.20E-07 | 11 | JPT | 21499248 |
| rs1957496 | 4.60E-06 | 14 | JPT | 21499248 |
| rs8019534 | 3.90E-06 | 14 | JPT | 21499248 |
| rs1794304 | 3.60E-06 | 16 | JPT | 21725309 |
| rs2208456 | 3.80E-06 | 20 | JPT | 21499248 |
| rs1012068 | 1.30E-14 | 22 | JPT | 21725309 |
| rs11703779 | 4.20E-06 | 22 | JPT | 21725309 |
| rs4820994 | 3.00E-06 | 22 | JPT | 21725309 |
| rs4820996 | 4.20E-06 | 22 | JPT | 21725309 |
| rs5753816 | 4.90E-06 | 22 | JPT | 21725309 |
| rs5753818 | 9.40E-06 | 22 | JPT | 21725309 |
| rs5998152 | 1.20E-07 | 22 | JPT | 21725309 |
| rs7287054 | 3.80E-06 | 22 | JPT | 21725309 |
| rs737084 | 5.90E-06 | 22 | JPT | 21725309 |

2.3. Identification of Liver Cancer-Related Enhancer SNPs

Previous studies indicated that the enhancer regions are marked by a strong H3K4me1 signal and a relatively weak H3K4me3 signal [19, 20]. Thus, we used histone modification ChIP-seq data to recognize the enhancer regions in liver cancer. Then, we mapped the liver cancer-related GWAS SNPs to the enhancer regions and obtained 22 enhancer SNPs in liver cancer (Table 2).
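The mapping step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `snps_in_regions`, the half-open `(start, end)` interval convention, and the toy SNP IDs and coordinates are all hypothetical. It assumes the enhancer regions on each chromosome are sorted and non-overlapping, so a binary search can locate the only candidate region for each SNP.

```python
from bisect import bisect_right


def snps_in_regions(snps, regions_by_chrom):
    """Return IDs of SNPs that fall inside an enhancer region.

    snps: iterable of (snp_id, chrom, pos) tuples.
    regions_by_chrom: dict mapping chrom -> list of half-open (start, end)
        intervals, sorted by start and non-overlapping (an assumption).
    """
    # Precompute the sorted start coordinates once per chromosome.
    starts_by_chrom = {
        chrom: [start for start, _ in regions]
        for chrom, regions in regions_by_chrom.items()
    }
    hits = []
    for snp_id, chrom, pos in snps:
        starts = starts_by_chrom.get(chrom, [])
        # Index of the last region whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < regions_by_chrom[chrom][i][1]:
            hits.append(snp_id)
    return hits


# Toy data (hypothetical IDs and coordinates, not taken from the study).
toy_regions = {"chr1": [(100, 200), (500, 600)]}
toy_snps = [
    ("rsA", "chr1", 150),  # inside the first region
    ("rsB", "chr1", 200),  # at the excluded end of the first region
    ("rsC", "chr1", 599),  # inside the second region
    ("rsD", "chr2", 150),  # chromosome without any region
]
print(snps_in_regions(toy_snps, toy_regions))  # ['rsA', 'rsC']
```

For genome-scale inputs, an interval tool such as BEDtools would normally perform this intersection; the sketch only makes the logic explicit.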


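Table 2

| SNP ID | Chromosome | Start | End | Strand | Population |
|---|---|---|---|---|---|
| rs12751375 | chr1 | 10291873 | 10291874 | + | CHB |
| rs6700866 | chr1 | 10306037 | 10306038 | + | CHB |
| rs9494257 | chr6 | 135827471 | 135827472 | + | CHB |
| rs17064474 | chr6 | 135680137 | 135680138 | + | CHB |
| rs17721919 | chr6 | 135748923 | 135748924 | + | CHB |
| rs17721931 | chr6 | 135749377 | 135749378 | + | CHB |
| rs6903949 | chr6 | 135821065 | 135821066 | + | CHB |
| rs6996881 | chr8 | 37407919 | 37407920 | + | CHB |
| rs4739519 | chr8 | 37412858 | 37412859 | + | CHB |
| rs6988263 | chr8 | 37414659 | 37414660 | + | CHB |
| rs12156293 | chr8 | 37419921 | 37419922 | + | CHB |
| rs6928810 | chr6 | 31410523 | 31410524 | + | JPT |
| rs3869132 | chr6 | 31410947 | 31410948 |  | JPT |
| rs2596562 | chr6 | 31354594 | 31354595 |  | JPT |
| rs2523475 | chr6 | 31361709 | 31361710 |  | JPT |
| rs2523467 | chr6 | 31362929 | 31362930 |  | JPT |
| rs9501387 | chr6 | 31364458 | 31364459 | + | JPT |
| rs1568658 | chr7 | 29141557 | 29141558 |  | JPT |
| rs1794304 | chr16 | 12625394 | 12625395 | + | JPT |
| rs5994449 | chr22 | 32304178 | 32304179 | + | JPT |
| rs5753816 | chr22 | 32312841 | 32312842 | + | JPT |
| rs5749339 | chr22 | 32315734 | 32315735 | + | JPT |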

2.4. Validation as Regulatory SNPs

rVarBase is a database that provides reliable, comprehensive, and user-friendly annotations of the regulatory features of variants [18]. It includes regulatory SNPs (rSNPs), LD-proxies of rSNPs, and genes that are potentially regulated by rSNPs. We used rVarBase to analyze the 22 enhancer SNPs in liver cancer and found that 14 SNPs have evidence of being regulatory SNPs, of which 9 (rs9494257, rs6903949, rs6996881, rs4739519, rs6988263, rs12156293, rs1568658, rs5994449, and rs5753816) are involved in distal transcriptional regulation (Table 3). Table 4 shows the potential target genes of these 9 SNPs.


Table 3

| SNP ID | Regulatory SNP | Distal regulation | Chromatin state | Related regulatory elements |
|---|---|---|---|---|
| rs12751375 | Yes | No | Inactive region | n/a |
| rs6700866 | Yes | No | Weak transcription; ZNF genes and repeats; strong transcription; enhancers | n/a |
| rs9494257 | Yes | Yes | Enhancers; flanking active TSS; weak transcription | Chromatin interactive region |
| rs17064474 | Yes | No | Weak transcription; active TSS; flanking active TSS; enhancers | n/a |
| rs17721919 | Yes | No | Weak transcription | n/a |
| rs17721931 | Yes | No | Weak transcription | n/a |
| rs6903949 | Yes | Yes | Weak transcription; enhancers | TF binding region; chromatin interactive region |
| rs6996881 | Yes | Yes | Weak transcription; enhancers | Chromatin interactive region |
| rs4739519 | Yes | Yes | Enhancers; weak transcription | Chromatin interactive region |
| rs6988263 | Yes | Yes | Enhancers; weak transcription; genic enhancers; bivalent enhancer; flanking active TSS | Chromatin interactive region |
| rs12156293 | Yes | Yes | Enhancers; weak transcription; bivalent enhancer; genic enhancers | Chromatin interactive region |
| rs1568658 | Yes | Yes | Weak transcription; enhancers; strong transcription | Chromatin interactive region |
| rs5994449 | Yes | Yes | Weak transcription; strong transcription; ZNF genes and repeats | Chromatin interactive region |
| rs5753816 | Yes | Yes | Weak transcription; enhancers; flanking active TSS | Chromatin interactive region |


Table 4

| SNP ID | Gene symbol | Ensembl ID | Regulation type |
|---|---|---|---|
| rs9494257 | BCLAF1 | ENSG00000029363 | Distal transcriptional regulation |
| rs9494257 | AHI1 | ENSG00000135541 | Distal transcriptional regulation |
| rs9494257 | LINC00271 | ENSG00000231028 | Distal transcriptional regulation |
| rs6903949 | MYB | ENSG00000118513 | Distal transcriptional regulation |
| rs6903949 | BCLAF1 | ENSG00000029363 | Distal transcriptional regulation |
| rs6903949 | AHI1 | ENSG00000135541 | Distal transcriptional regulation |
| rs6903949 | LINC00271 | ENSG00000231028 | Distal transcriptional regulation |
| rs6996881 | ZNF703 | ENSG00000183779 | Distal transcriptional regulation |
| rs6996881 | ERLIN2 | ENSG00000147475 | Distal transcriptional regulation |
| rs6996881 | Null | ENSG00000183154 | Distal transcriptional regulation |
| rs6996881 | Null | ENSG00000253161 | Distal transcriptional regulation |
| rs4739519 | ZNF703 | ENSG00000183779 | Distal transcriptional regulation |
| rs4739519 | Null | ENSG00000254290 | Distal transcriptional regulation |
| rs6988263 | ZNF703 | ENSG00000183779 | Distal transcriptional regulation |
| rs6988263 | Null | ENSG00000254290 | Distal transcriptional regulation |
| rs12156293 | ZNF703 | ENSG00000183779 | Distal transcriptional regulation |
| rs12156293 | Null | ENSG00000254290 | Distal transcriptional regulation |
| rs12156293 | ERLIN2 | ENSG00000147475 | Distal transcriptional regulation |
| rs12156293 | Null | ENSG00000183154 | Distal transcriptional regulation |
| rs1568658 | Null | ENSG00000228421 | Distal transcriptional regulation |
| rs1568658 | TRIL | ENSG00000176734 | Distal transcriptional regulation |
| rs1568658 | Null | ENSG00000255690 | Distal transcriptional regulation |
| rs5994449 | DEPDC5 | ENSG00000100150 | Distal transcriptional regulation |
| rs5994449 | FBXO7 | ENSG00000100225 | Distal transcriptional regulation |
| rs5994449 | SYN3 | ENSG00000185666 | Distal transcriptional regulation |
| rs5994449 | PRR14L | ENSG00000183530 | Distal transcriptional regulation |
| rs5994449 | PISD | ENSG00000241878 | Distal transcriptional regulation |
| rs5994449 | EIF4ENIF1 | ENSG00000184708 | Distal transcriptional regulation |
| rs5994449 | RNU6-28 | ENSG00000199248 | Distal transcriptional regulation |
| rs5994449 | SFI1 | ENSG00000198089 | Distal transcriptional regulation |
| rs5753816 | YWHAH | ENSG00000128245 | Distal transcriptional regulation |
| rs5753816 | C22orf24 | ENSG00000128254 | Distal transcriptional regulation |
| rs5753816 | PISD | ENSG00000241878 | Distal transcriptional regulation |
| rs5753816 | DEPDC5 | ENSG00000100150 | Distal transcriptional regulation |
| rs5753816 | RNU6-28 | ENSG00000199248 | Distal transcriptional regulation |
| rs5753816 | SFI1 | ENSG00000198089 | Distal transcriptional regulation |
| rs5753816 | EIF4ENIF1 | ENSG00000184708 | Distal transcriptional regulation |
| rs5753816 | RFPL3S | ENSG00000205853 | Distal transcriptional regulation |
| rs5753816 | Null | ENSG00000230736 | Distal transcriptional regulation |
| rs5753816 | Null | ENSG00000243519 | Distal transcriptional regulation |
| rs5753816 | Null | ENSG00000241954 | Distal transcriptional regulation |
| rs5753816 | Null | ENSG00000232218 | Distal transcriptional regulation |
| rs5753816 | SYN3 | ENSG00000185666 | Distal transcriptional regulation |

3. Materials and Methods

3.1. GWAS and LD Datasets

We downloaded the human liver cancer-related GWAS SNPs from GRASP. The database includes 26 and 19 liver cancer-associated SNPs () from Han Chinese in Beijing, China (CHB), and Japanese in Tokyo, Japan (JPT), respectively. The URL is https://grasp.nhlbi.nih.gov/Overview.aspx. We obtained all SNPs in LD with GWAS-lead SNPs using LD blocks identified with publicly available HapMap data on the CHB and JPT populations. The LD data can be downloaded from http://hapmap.ncbi.nlm.nih.gov/index.html/.

3.2. Histone Modification Datasets

We downloaded the human histone modification ChIP-seq datasets in the HepG2 cell line from the ENCODE Production Data/Broad Institute. The URL is http://genome.ucsc.edu/ENCODE/downloads.html.

3.3. Linkage Disequilibrium Analysis

In the genome, SNPs located in close proximity tend to be in linkage disequilibrium with each other. The International HapMap Project has characterized linkage disequilibrium among SNPs across the human genome. We used LD data from HapMap to obtain the SNPs in LD with the liver cancer-associated SNPs ().
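The LD expansion can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `expand_with_ld`, the in-memory `(snp_a, snp_b, r2)` tuples standing in for parsed HapMap LD records, the toy SNP IDs, and the `r2_min=0.8` cutoff are all assumptions. The cutoff actually used in the study cannot be recovered from the text.

```python
from collections import defaultdict


def expand_with_ld(lead_snps, ld_pairs, r2_min=0.8):
    """Return the lead SNPs plus every SNP in LD with at least one of them.

    lead_snps: iterable of GWAS lead SNP IDs.
    ld_pairs: iterable of (snp_a, snp_b, r2) tuples.
    r2_min: hypothetical r^2 cutoff; the study's own threshold is unknown.
    """
    # Build a symmetric lookup of LD partners that pass the cutoff.
    partners = defaultdict(set)
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 >= r2_min:
            partners[snp_a].add(snp_b)
            partners[snp_b].add(snp_a)

    lead = set(lead_snps)
    expanded = set(lead)
    for snp in lead:
        expanded |= partners.get(snp, set())
    return expanded


# Toy records (hypothetical IDs and r^2 values, not HapMap data).
toy_pairs = [
    ("rsA", "rsB", 0.90),  # passes the cutoff
    ("rsA", "rsC", 0.20),  # too weak, ignored
    ("rsD", "rsA", 0.85),  # passes; pair order does not matter
]
print(sorted(expand_with_ld({"rsA"}, toy_pairs)))  # ['rsA', 'rsB', 'rsD']
```

Only direct partners of a lead SNP are added; partners of partners are not followed, which keeps the expansion tied to the original GWAS signals.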

3.4. Identify Enhancer Regions and Enhancer SNPs

Firstly, we downloaded the human histone modification BAM files (H3K4me1 and H3K4me3) for the HepG2 cell line from the ENCODE project. Then, we used BEDtools [21] to count read coverage at every position of the genome. By calculating the ratio H3K4me1/H3K4me3 and selecting the regions with log2(H3K4me1/H3K4me3) > 1.2, we identified the potential enhancer regions. Finally, we mapped the potential LD SNPs to these enhancer regions and obtained the liver cancer-related enhancer SNPs.
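The thresholding step can be illustrated as follows. This is a hedged sketch, not the published pipeline: the function name `enhancer_regions`, the plain Python lists standing in for per-position coverage, the toy coverage values, and in particular the `pseudocount` (added here only to avoid division by zero and log of zero) are assumptions that the text does not specify. Only the log2 ratio and the 1.2 cutoff come from the description above.

```python
import math


def enhancer_regions(k4me1, k4me3, threshold=1.2, pseudocount=1.0):
    """Return half-open (start, end) runs where log2(H3K4me1/H3K4me3) > threshold.

    k4me1, k4me3: per-position read coverage for one chromosome.
    pseudocount: hypothetical smoothing term, not described in the source.
    """
    regions = []
    start = None
    n = min(len(k4me1), len(k4me3))
    for pos in range(n):
        ratio = math.log2((k4me1[pos] + pseudocount) / (k4me3[pos] + pseudocount))
        if ratio > threshold:
            if start is None:
                start = pos  # open a new candidate region
        elif start is not None:
            regions.append((start, pos))  # close the current region
            start = None
    if start is not None:
        regions.append((start, n))  # region runs to the end of the track
    return regions


# Toy coverage tracks (hypothetical values).
toy_me1 = [0, 10, 12, 9, 1, 0, 8]
toy_me3 = [0, 1, 1, 2, 5, 0, 0]
print(enhancer_regions(toy_me1, toy_me3))  # [(1, 4), (6, 7)]
```

In practice the coverage would come from a tool such as BEDtools, and adjacent positions would usually be binned or smoothed before thresholding; the sketch keeps single-position resolution for clarity.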

4. Discussion

Through integrating liver cancer GWAS SNPs from GRASP, LD data from HapMap, and histone modification ChIP-seq data from ENCODE, we explored liver cancer-related enhancer SNPs. We compared our results with rVarBase and found that 9 SNPs (rs9494257, rs6903949, rs6996881, rs4739519, rs6988263, rs12156293, rs1568658, rs5994449, and rs5753816) were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer.

Compared with protein-coding regions in the human genome, noncoding regions contain many more genetic variants. Some important regulatory regions, such as enhancers, have a strong influence on target gene expression. SNPs located in these regions may disturb gene expression and even cause disease. Thus, identifying SNPs in enhancer regions helps to explain the mechanism behind the association between SNPs and diseases.

We presented a method to identify disease-related SNPs located in enhancer regions, which offers a new way to investigate the relationship between SNPs and diseases. The method can also be applied to other diseases and will enable biologists to investigate the mechanism of disease risk associated with SNPs.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Authors’ Contributions

Tianjiao Zhang collected the data, designed the computational experiments, carried out the statistical analysis, and wrote the paper. Qinghua Jiang participated in the design of the study. Yang Hu, Xiaoliang Wu, and Rui Ma participated in the revision of this paper. Yadong Wang gave comments and revisions to the final version of this paper. All authors read and approved the final paper. Tianjiao Zhang and Yang Hu equally contributed to this paper.

Acknowledgments

This work was partially supported by the National High-Tech Research and Development Program (863) of China (2012AA02A601, 2012AA02A602, 2012AA020404, 2012AA020409, 2012AA02A604, 2014AA021505, 2015AA020101, and 2015AA020108), National Science and Technology Major Project [no. 2013ZX03005012], and the National Natural Science Foundation of China (61571152, 31301089).

References

  1. R. Sachidanandam, D. Weissman, S. C. Schmidt et al., “A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms,” Nature, vol. 409, no. 6822, pp. 928–933, 2001.
  2. M. Slatkin, “Linkage disequilibrium—understanding the evolutionary past and mapping the medical future,” Nature Reviews Genetics, vol. 9, no. 6, pp. 477–485, 2008.
  3. J. Peng, S. Uygun, T. Kim, Y. Wang, S. Y. Rhee, and J. Chen, “Measuring semantic similarities by combining gene ontology annotations and gene co-function networks,” BMC Bioinformatics, vol. 16, no. 1, article 44, 2015.
  4. J. Peng, T. Wang, J. Wang, Y. Wang, and J. Chen, “Extending gene ontology with gene association networks,” Bioinformatics, vol. 32, no. 8, pp. 1185–1194, 2016.
  5. M. T. Maurano, R. Humbert, E. Rynes et al., “Systematic localization of common disease-associated variation in regulatory DNA,” Science, vol. 337, no. 6099, pp. 1190–1195, 2012.
  6. M. A. Schaub, A. P. Boyle, A. Kundaje, S. Batzoglou, and M. Snyder, “Linking disease associations with regulatory information in the human genome,” Genome Research, vol. 22, no. 9, pp. 1748–1759, 2012.
  7. L. A. Hindorff, P. Sethupathy, H. A. Junkins et al., “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 23, pp. 9362–9367, 2009.
  8. Q. Zou, J. Li, Q. Hong et al., “Prediction of microRNA-disease associations based on social network analysis methods,” BioMed Research International, vol. 2015, Article ID 810514, 9 pages, 2015.
  9. J. Banerji, S. Rusconi, and W. Schaffner, “Expression of a β-globin gene is enhanced by remote SV40 DNA sequences,” Cell, vol. 27, no. 2, pp. 299–308, 1981.
  10. C. Buecker and J. Wysocka, “Enhancers as information integration hubs in development: lessons from genomics,” Trends in Genetics, vol. 28, no. 6, pp. 276–284, 2012.
  11. W. Xie and B. Ren, “Developmental biology. Enhancing pluripotency and lineage specification,” Science, vol. 341, no. 6143, pp. 245–247, 2013.
  12. S. Heinz, C. Benner, N. Spann et al., “Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities,” Molecular Cell, vol. 38, no. 4, pp. 576–589, 2010.
  13. J. Ernst, P. Kheradpour, T. S. Mikkelsen et al., “Mapping and analysis of chromatin state dynamics in nine human cell types,” Nature, vol. 473, no. 7345, pp. 43–49, 2011.
  14. B. Akhtar-Zaidi, R. Cowper-Sallari, O. Corradin et al., “Epigenomic enhancer profiling defines a signature of colon cancer,” Science, vol. 336, no. 6082, pp. 736–739, 2012.
  15. G. Trynka, C. Sandor, B. Han et al., “Chromatin marks identify critical cell types for fine mapping complex trait variants,” Nature Genetics, vol. 45, no. 2, pp. 124–130, 2013.
  16. R. Leslie, C. J. O'Donnell, and A. D. Johnson, “GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database,” Bioinformatics, vol. 30, no. 12, pp. i185–i194, 2014.
  17. The International HapMap Consortium, “The international HapMap project,” Nature, vol. 426, no. 6968, pp. 789–796, 2003.
  18. L. Guo, Y. Du, S. Qu, and J. Wang, “rVarBase: an updated database for regulatory features of human variants,” Nucleic Acids Research, vol. 44, no. 1, pp. D888–D893, 2016.
  19. F. de Santa, I. Barozzi, F. Mietton et al., “A large fraction of extragenic RNA Pol II transcription sites overlap enhancers,” PLoS Biology, vol. 8, no. 5, article e1000384, 2010.
  20. A. C. Marques, J. Hughes, B. Graham, M. S. Kowalczyk, D. R. Higgs, and C. P. Ponting, “Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs,” Genome Biology, vol. 14, no. 11, article R131, 2013.
  21. A. R. Quinlan, “UNIT 11.12 BEDTools: the swiss-army tool for genome feature analysis,” Current Protocols in Bioinformatics, vol. 47, pp. 11.12.1–11.12.34, 2014.

Copyright © 2016 Tianjiao Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

