Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data
Many disease-related single nucleotide polymorphisms (SNPs) have been identified by genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with a range of diseases by affecting gene expression. However, how SNPs in noncoding regions contribute to disease susceptibility remains unclear. Enhancers are functional DNA elements located in noncoding regions that play an important role in regulating gene expression, so SNPs located in enhancers may alter gene expression and lead to disease. We present a method for identifying liver cancer-related enhancer SNPs by integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results suggest that these enhancer SNPs may play important roles in liver cancer.
A single nucleotide polymorphism (SNP) is a variation at a single nucleotide position in a DNA sequence. In the last decade, a large number of genome-wide association studies (GWAS) have been published, indicating that thousands of SNPs are associated with diseases. Linkage disequilibrium (LD) is the nonrandom association of alleles at different genomic locations. At a given GWAS locus, many SNPs are in LD with the causal SNP [3, 4]. Over 90% of GWAS variants are located in noncoding regions, and only approximately 10% are in LD with a protein-coding variant [5, 6]. In protein-coding regions, many studies have shown that some SNPs are associated with a range of diseases by affecting gene expression [7, 8]. However, how SNPs in noncoding regions contribute to disease susceptibility remains unclear.
Enhancers are core regulatory elements of the genome that act over a distance to positively regulate gene expression. It is estimated that 400,000 to 1 million putative enhancers exist in the human genome [10, 11]. Recent studies have shown that disease-related GWAS SNPs are associated with enhancers marked by specific histone modifications [12–15]. Therefore, by integrating GWAS and histone modification ChIP-seq data for a given disorder, we can identify disease-related enhancer SNPs.
We present a method for identifying liver cancer-related enhancer SNPs by integrating liver cancer GWAS data with histone modification ChIP-seq data. We identified 22 enhancer SNPs associated with liver cancer, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results suggest that these enhancer SNPs may play important roles in liver cancer.
2.1. Pipeline of Identifying Liver Cancer-Related Enhancer SNPs
As shown in Figure 1, the pipeline consists of four steps. First, we downloaded liver cancer-related GWAS SNPs from the GRASP database and used LD data from HapMap to infer liver cancer-related LD SNPs. Second, we identified enhancer regions in liver cancer from histone modification ChIP-seq data for the HepG2 cell line. Third, we mapped the liver cancer-related LD SNPs to these enhancers to obtain liver cancer-related enhancer SNPs. Finally, we used rVarBase, a curated database of regulatory SNPs, to validate our results.
2.2. Linkage Disequilibrium Analysis with Liver Cancer-Associated SNPs
We obtained 45 liver cancer-associated SNPs from GRASP (Table 1), which form the initial set of candidate liver cancer-related SNPs. We then used LD data from HapMap to add the SNPs in LD with these GWAS SNPs, giving a total of 340 potential liver cancer-related SNPs.
2.3. Identification of Liver Cancer-Related Enhancer SNPs
Previous studies indicated that enhancer regions are marked by a strong H3K4me1 signal and a relatively weak H3K4me3 signal [19, 20]. We therefore used histone modification ChIP-seq data to identify enhancer regions in liver cancer. We then mapped the potential liver cancer-related SNPs (the GWAS SNPs and their LD proxies) to these enhancer regions and obtained 22 enhancer SNPs in liver cancer (Table 2).
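Mapping SNPs to enhancer regions is a point-in-interval test. The following Python sketch is illustrative only and is not the authors' code: it assumes enhancer regions are stored per chromosome as sorted, non-overlapping, 0-based half-open (start, end) intervals, and that SNP positions use the same 0-based convention and the same genome build. The function name `snps_in_enhancers` and the toy identifiers are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical representation: chromosome -> sorted, non-overlapping,
# 0-based half-open (start, end) enhancer intervals.
Enhancers = Dict[str, List[Tuple[int, int]]]


def snps_in_enhancers(snps: List[Tuple[str, str, int]], enhancers: Enhancers) -> List[str]:
    """Return IDs of SNPs (id, chrom, 0-based pos) that lie inside an enhancer."""
    # Extract interval starts once per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in enhancers.items()}
    hits = []
    for snp_id, chrom, pos in snps:
        ivs = enhancers.get(chrom)
        if not ivs:
            continue  # no enhancers called on this chromosome
        # The last interval whose start is <= pos is the only candidate,
        # because intervals are sorted and non-overlapping.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            hits.append(snp_id)
    return hits


# Toy example with hypothetical identifiers.
enhancers = {"chr1": [(100, 200), (500, 650)], "chr2": [(10, 20)]}
snps = [
    ("snp_a", "chr1", 150),  # inside chr1:100-200
    ("snp_b", "chr1", 200),  # end coordinate is exclusive
    ("snp_c", "chr2", 10),   # start coordinate is inclusive
    ("snp_d", "chr3", 5),    # chromosome without enhancers
]
print(snps_in_enhancers(snps, enhancers))  # ['snp_a', 'snp_c']
```

Binary search keeps each lookup logarithmic in the number of enhancers; for a few hundred SNPs a linear scan would also suffice, and `bedtools intersect` performs the same overlap test directly on BED files.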
2.4. Validation as Regulatory SNPs
rVarBase is a database that provides reliable, comprehensive, and user-friendly annotations of the regulatory features of variants. It includes regulatory SNPs (rSNPs), LD proxies of rSNPs, and genes that are potentially regulated by rSNPs. We used rVarBase to analyze the 22 enhancer SNPs in liver cancer and found that 14 SNPs have evidence of being regulatory SNPs and 9 SNPs (rs9494257, rs6903949, rs6996881, rs4739519, rs6988263, rs12156293, rs1568658, rs5994449, and rs5753816) are involved in distal transcriptional regulation (Table 3). Table 4 shows the potential target genes of these 9 SNPs.
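The cross-check against rVarBase amounts to filtering the enhancer SNPs by their annotations. A minimal Python sketch, assuming the rVarBase results have already been exported into a hypothetical mapping from SNP ID to a set of evidence labels; the label `"distal_regulation"` and the function `summarize_validation` are illustrative assumptions, not rVarBase's own schema or interface.

```python
from typing import Dict, List, Set, Tuple


def summarize_validation(
    enhancer_snps: List[str],
    annotations: Dict[str, Set[str]],
    distal_label: str = "distal_regulation",  # hypothetical label name
) -> Tuple[List[str], List[str]]:
    """Return (SNPs with any regulatory evidence, SNPs with distal-regulation evidence).

    Input order is preserved in both lists.
    """
    # Any non-empty evidence set counts as regulatory evidence.
    regulatory = [s for s in enhancer_snps if annotations.get(s)]
    # Distal regulation is checked independently of the first filter.
    distal = [s for s in enhancer_snps if distal_label in annotations.get(s, set())]
    return regulatory, distal


# Toy example with hypothetical SNP identifiers and evidence labels.
snps = ["snp1", "snp2", "snp3", "snp4"]
annotations = {
    "snp1": {"tf_binding", "distal_regulation"},
    "snp2": {"tf_binding"},
    "snp4": set(),
}
regulatory, distal = summarize_validation(snps, annotations)
print(regulatory)  # ['snp1', 'snp2']
print(distal)      # ['snp1']
```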
3. Materials and Methods
3.1. GWAS and LD Datasets
We downloaded the human liver cancer-related GWAS SNPs from GRASP. The database includes 26 and 19 liver cancer-associated SNPs from Han Chinese in Beijing, China (CHB), and Japanese in Tokyo, Japan (JPT), respectively. The URL is https://grasp.nhlbi.nih.gov/Overview.aspx. We obtained all SNPs in LD with the GWAS lead SNPs using LD blocks identified from publicly available HapMap data on the CHB and JPT populations. The LD data can be downloaded from http://hapmap.ncbi.nlm.nih.gov/index.html/.
3.2. Histone Modification Datasets
We downloaded the human histone modification ChIP-seq datasets in the HepG2 cell line from the ENCODE Production Data/Broad Institute. The URL is http://genome.ucsc.edu/ENCODE/downloads.html.
3.3. Linkage Disequilibrium Analysis
In the genome, SNPs located close to each other tend to be in linkage disequilibrium. The International HapMap Project has characterized linkage disequilibrium among human genome SNPs in several populations. We used LD data from HapMap to obtain liver cancer-associated LD SNPs.
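The LD expansion can be sketched as collecting, for each GWAS lead SNP, every SNP whose pairwise r² with it exceeds a cutoff. This Python sketch is an illustration under stated assumptions, not the authors' procedure: the text describes LD blocks and does not give its cutoff here, so the pairwise r² approach and the default `r2_min=0.8` are stand-ins, and the `(snp1, snp2, r2)` input format and function name `expand_by_ld` are hypothetical (parsing of HapMap files is omitted).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def expand_by_ld(
    lead_snps: Iterable[str],
    ld_pairs: Iterable[Tuple[str, str, float]],
    r2_min: float = 0.8,  # assumed cutoff; not taken from the paper
) -> Set[str]:
    """Return the lead SNPs plus every SNP in direct LD (r^2 >= r2_min) with one."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b, r2 in ld_pairs:
        if r2 >= r2_min:
            # LD is symmetric, so record the pair in both directions.
            partners[a].add(b)
            partners[b].add(a)
    leads = set(lead_snps)
    expanded = set(leads)
    for snp in leads:
        expanded |= partners.get(snp, set())
    return expanded


# Toy example with hypothetical SNP identifiers.
pairs = [
    ("snp1", "snp2", 0.90),
    ("snp3", "snp1", 0.85),
    ("snp1", "snp4", 0.50),  # below the cutoff
    ("snp2", "snp5", 0.95),  # partner of a partner, not added
]
print(sorted(expand_by_ld(["snp1"], pairs)))  # ['snp1', 'snp2', 'snp3']
```

Only direct LD partners of a lead SNP are added, which avoids chaining through SNPs linked only to other proxies. Because LD patterns differ between populations, the CHB and JPT lead SNPs would each be expanded with their own population's LD data and the results combined.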
3.4. Identify Enhancer Regions and Enhancer SNPs
First, we downloaded the human histone modification BAM files (H3K4me1 and H3K4me3) for the HepG2 cell line from the ENCODE project. Then, we used BEDtools to compute read coverage at every position of the genome. We calculated the H3K4me1/H3K4me3 coverage ratio and selected the regions with log2(H3K4me1/H3K4me3) > 1.2 as potential enhancer regions. Finally, we mapped the potential LD SNPs to these enhancer regions to obtain liver cancer-related enhancer SNPs.
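The thresholding step can be sketched in Python for a single chromosome. The sketch assumes the two per-position coverage tracks are already loaded as equal-length lists indexed by 0-based position (for example, from the one-based per-base output of `bedtools genomecov -d`, shifted down by one). The pseudocount, which avoids division by zero where H3K4me3 coverage is zero, and the function name `call_enhancers` are assumptions; the text does not say how zero coverage or differences in sequencing depth between the two libraries were handled.

```python
import math
from typing import List, Sequence, Tuple


def call_enhancers(
    me1_cov: Sequence[float],
    me3_cov: Sequence[float],
    threshold: float = 1.2,
    pseudocount: float = 1.0,  # assumption: keeps the ratio finite and nonzero
) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) runs of consecutive positions where
    log2((H3K4me1 + pseudocount) / (H3K4me3 + pseudocount)) > threshold."""
    if len(me1_cov) != len(me3_cov):
        raise ValueError("coverage tracks must cover the same positions")
    regions: List[Tuple[int, int]] = []
    start = None  # start of the run currently being extended, if any
    for pos, (k1, k3) in enumerate(zip(me1_cov, me3_cov)):
        passes = math.log2((k1 + pseudocount) / (k3 + pseudocount)) > threshold
        if passes and start is None:
            start = pos  # open a new candidate enhancer region
        elif not passes and start is not None:
            regions.append((start, pos))  # close the current region
            start = None
    if start is not None:
        regions.append((start, len(me1_cov)))  # region runs to the end of the track
    return regions


# Toy coverage tracks for seven positions.
me1 = [0, 5, 6, 1, 0, 9, 9]
me3 = [0, 0, 1, 1, 0, 1, 2]
print(call_enhancers(me1, me3))  # [(1, 3), (5, 7)]
```

A cutoff of 1.2 on the log2 scale means H3K4me1 coverage more than 2^1.2 ≈ 2.3 times the H3K4me3 coverage. In practice one would usually also normalize for library size and require a minimum coverage, since with a pseudocount of 1 even two H3K4me1 reads over zero H3K4me3 reads pass the cutoff; genome-scale tracks would also call for arrays rather than Python lists.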
By integrating liver cancer GWAS SNPs from GRASP, LD data from HapMap, and histone modification ChIP-seq data from ENCODE, we identified liver cancer-related enhancer SNPs. We compared our results with rVarBase and found that 9 SNPs (rs9494257, rs6903949, rs6996881, rs4739519, rs6988263, rs12156293, rs1568658, rs5994449, and rs5753816) were regulatory SNPs involved in distal transcriptional regulation. The results suggest that these enhancer SNPs may play important roles in liver cancer.
Compared with protein-coding regions, noncoding regions of the human genome contain much more genetic variation. Important regulatory regions such as enhancers strongly influence the expression of their target genes, and SNPs located in these regions may disturb gene expression and even cause disease. Identifying SNPs in enhancer regions therefore helps to explain the mechanisms linking SNPs to diseases.
We presented a method to identify disease-related SNPs located in enhancer regions, which offers a new approach to investigating the relationship between SNPs and diseases. The method can also be applied to other diseases and may help biologists investigate the mechanisms by which SNPs influence disease risk.
The authors declare that there are no competing interests regarding the publication of this paper.
Tianjiao Zhang collected the data, designed the computational experiments, carried out the statistical analysis, and wrote the paper. Qinghua Jiang participated in the design of the study. Yang Hu, Xiaoliang Wu, and Rui Ma participated in the revision of this paper. Yadong Wang gave comments and revisions to the final version of this paper. All authors read and approved the final paper. Tianjiao Zhang and Yang Hu contributed equally to this paper.
This work was partially supported by the National High-Tech Research and Development Program (863) of China (2012AA02A601, 2012AA02A602, 2012AA020404, 2012AA020409, 2012AA02A604, 2014AA021505, 2015AA020101, and 2015AA020108), National Science and Technology Major Project [no. 2013ZX03005012], and the National Natural Science Foundation of China (61571152, 31301089).
L. A. Hindorff, P. Sethupathy, H. A. Junkins et al., “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 23, pp. 9362–9367, 2009.
The International HapMap Consortium, “The international HapMap project,” Nature, vol. 426, no. 6968, pp. 789–796, 2003.
A. C. Marques, J. Hughes, B. Graham, M. S. Kowalczyk, D. R. Higgs, and C. P. Ponting, “Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs,” Genome Biology, vol. 14, no. 11, article R131, 2013.