Common Expression Quantitative Trait Loci Shared by Histone Genes
A genome-wide association study (GWAS) was conducted to identify expression quantitative trait loci (eQTLs) for histone genes. We examined common eQTLs for multiple histone genes in lymphoblastoid cell lines (LCLs) from 373 Europeans. A linear regression model was employed to identify single-nucleotide polymorphisms (SNPs) associated with expression of the histone genes, and the number of eQTLs was determined by linkage disequilibrium analysis. Additional associations of the identified eQTLs with other genes were also examined. Genome-wide analysis identified 31 eQTLs for 29 histone genes (). Among them, 12 eQTLs were associated with the expression of multiple histone genes. Transcriptome-wide association analysis using the identified eQTLs showed their associations with 80 additional genes (). In particular, expression of the RPPH1, SCARNA2, and SCARNA7 genes was associated with 26, 25, and 23 eQTLs, respectively. This study suggests that histone genes share 12 common eQTLs that might regulate cell cycle-dependent transcription of histone and other genes. Further investigations are needed to elucidate the transcriptional mechanisms of these genes.
Histone mRNA transcripts and proteins are important for packing DNA into chromatin and are thus tightly regulated in most human cells. In humans, the genes encoding histones are clustered on chromosomes 1 and 6. It has been suspected that the clustered structure of genes can provide a manageable unit for coordinating transcription. Recently, genome-wide chromatin interaction analysis with paired-end-tag sequencing (ChIA-PET) has shown that some histone genes can share promoters.
While many efforts have been made to understand the mechanisms for the transcription of histone genes, they have not yet been well defined. Nuclear protein of the ataxia-telangiectasia-mutated locus (NPAT), which promotes the transcription of histone genes, is located near the Cajal body. Clusters of histone genes are also located near the Cajal body. The positions of histone gene clusters near the Cajal body have been observed between the restriction point (R-point) and the G1/S transition (S-point) during the cell cycle. The objective of this study was to select simultaneously expressed histone genes, identify their expression quantitative trait loci (eQTLs), and examine the functions of those eQTLs.
2. Material and Methods
2.1. Subjects and Data
The subjects of this study were 373 Europeans, including 95 Finnish in Finland, 94 British in England and Scotland, 93 Tuscans from Italy, and 91 Utah residents with Northern and Western European ancestry from the CEPH collection. Their genotypic data were derived from the phase 1 dataset produced by the 1000 Genomes Project (http://www.internationalgenome.org/). This study utilized genotypic data at 5,796,145 SNPs after filtering out the SNPs with minor allele frequency < 0.05, with missing rate > 0.05, or in Hardy-Weinberg disequilibrium with .
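The three SNP filters described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the function name `snp_passes_qc` is hypothetical, the Hardy-Weinberg check is a simple one-degree-of-freedom chi-square test rather than an exact test, and the default `hwe_p_min` is a placeholder because the threshold is not given in the text.

```python
import math

def snp_passes_qc(genotypes, maf_min=0.05, miss_max=0.05, hwe_p_min=1e-6):
    """Return True if one SNP passes the three filters.

    genotypes: list with one entry per subject, coded 0/1/2 as the count
    of one allele, or None when the genotype call is missing.
    hwe_p_min is a hypothetical placeholder; the text does not state
    the Hardy-Weinberg threshold that was actually used.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return False
    miss_rate = 1 - n / len(genotypes)

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_bb + n_ab) / (2 * n)      # frequency of the counted allele
    maf = min(p, 1 - p)
    if maf == 0:
        return False                      # monomorphic SNP

    # Chi-square goodness of fit against Hardy-Weinberg proportions.
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_hwe = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df

    return miss_rate <= miss_max and maf >= maf_min and p_hwe >= hwe_p_min
```

A SNP is kept only when all three conditions hold, so a single call per SNP is enough to reproduce the kind of filtering that reduces a genotype panel to the analyzed set.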
Transcriptional data on 10,518 human genes were obtained in lymphoblastoid cells of the subjects by the Geuvadis RNA sequencing project (http://www.geuvadis.org/web/geuvadis/rnaseq-project). The unit used for the mRNA expression level was reads per kilobase per million mapped reads (RPKM). Outliers were removed based on sample similarity, which was estimated by the Spearman rank correlation between RPKMs and the exon counts of the samples. Sample swaps or contaminated samples were excluded based on allele-specific expression analysis. For details on the quality control process, see 't Hoen et al.
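For reference, RPKM scales a raw read count by transcript length and by sequencing depth. The sketch below uses a hypothetical function name and made-up numbers; it illustrates the unit only and is not the Geuvadis quantification pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_of_reads)

# Illustrative values: 500 reads on a 2 kb gene in a library of
# 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
```

Because both length and depth are divided out, RPKM values are comparable across genes of different sizes and across samples sequenced to different depths.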
2.2. Statistical Methods
We selected histone genes that were expressed simultaneously. Pairwise gene expression relationships were estimated using Pearson’s correlation coefficient (r). The significance of the correlation was determined by .
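The pairwise correlation step can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the p-value uses the Fisher z-transformation with a normal approximation, which is reasonable for a sample of several hundred subjects but is not necessarily the test used in the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transformation.

    Normal approximation; requires |r| < 1 and n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))
```

Applying `pearson_r` to every pair of genes yields the correlation matrix from which co-expressed genes can be selected once a significance threshold is fixed.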
We investigated genome-wide associations of the expression of the selected histone genes. A regression model was employed to identify SNPs associated with the expression of histone genes using PLINK. The Bonferroni correction was applied for multiple testing, and the significance was determined by .
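A single SNP-gene test of this kind can be sketched as an ordinary least-squares regression of expression on allele dosage. The code below is an illustration, not PLINK itself: the function names are hypothetical, the additive 0/1/2 coding is an assumption, the p-value uses a normal approximation to the t-distribution (adequate only for large samples), and the family-wise error rate passed to `bonferroni_threshold` is chosen by the caller because the study's threshold is not given in the text.

```python
import math

def additive_regression(dosage, expression):
    """Fit expression = a + b * dosage; return (b, standard error, t)."""
    n = len(dosage)
    mean_g, mean_e = sum(dosage) / n, sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))
    beta = sxy / sxx
    intercept = mean_e - beta * mean_g
    rss = sum((e - intercept - beta * g) ** 2
              for g, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)   # assumes a non-perfect fit (rss > 0)
    return beta, se, beta / se

def approx_two_sided_p(t_stat):
    """Normal approximation to the two-sided p-value of a t-statistic."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance level that controls the family-wise error."""
    return family_alpha / n_tests
```

An association would be declared significant when its p-value falls below `bonferroni_threshold(family_alpha, n_tests)`, where `n_tests` counts every SNP-gene pair tested.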
Linkage disequilibrium (LD) between the identified SNPs was estimated using the HaploView program. The LD block was determined according to the 95% confidence interval of the D′ value for pairwise LD between the nucleotide variants with minor allele frequency > 0.05.
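To make the pairwise LD measure concrete, the sketch below computes a point estimate of |D′| from phased two-SNP haplotypes. It assumes that the pairwise LD value referred to above is D′, which the text does not name explicitly. It does not reproduce the confidence-interval procedure that HaploView applies to define blocks, and the function name is hypothetical.

```python
def d_prime(haplotypes):
    """Point estimate of |D'| between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1,
    one tuple per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0
```

Values near 1 indicate little evidence of historical recombination between the two SNPs, which is the property that block definitions based on D′ rely on.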
The identified eQTLs were further analyzed for their associations with the expression of nonhistone genes throughout the genome. The Bonferroni correction for multiple testing, based on the t-statistic, was also applied with a significance threshold of .
The functions of identified SNPs were examined using the Ensembl Variant Effect Predictor program and RegulomeDB (e.g., motifs from DNase footprinting assays, chromatin structure by DNase-seq, and protein binding by ChIP-seq).
We observed numerous correlations among the expression levels of the histone genes investigated in the current study (Figure 1). In particular, the expression levels of 29 genes were significantly correlated (). Genome-wide association analysis showed that 74 SNPs were associated with the expression of the 29 histone genes (; Table 1). Among them, 26 SNPs were simultaneously associated with the expression of multiple histone genes, and 5 of the 26 SNPs were associated with the expression of more than 10 histone genes (Figure 2). Thirty-one LD blocks were constructed covering the identified SNPs (Figure 3). The eQTLs corresponded to functional sites reported by several functional annotation resources. Of these, rs79335804 had the strongest evidence of function, with a RegulomeDB score of 2b (Table S1, available online at https://doi.org/10.1155/2017/6202567).
Transcriptome-wide association analysis revealed 80 additional genes associated with the 31 identified eQTLs (; Table 2). The genes encoding ribonuclease P RNA component H1 (RPPH1) and some small Cajal body-specific RNAs (scaRNAs), in particular, were associated with more than half of the eQTLs (>15 eQTLs; Table 2).
We analyzed the eQTLs for simultaneously expressed histone genes. We found significant correlations among the expression levels of 29 histone genes, which were all clustered on chromosome 1 or 6. This clustered structure of the genes may serve to control simultaneous transcription, and this is supported by the observation that the expression of other histone genes not located on chromosome 1 or 6, including H1FX and H2A family members, was not correlated with that of the 29 selected genes. Furthermore, correlation estimates showed two subgroups nested within the large group (one with 21 genes and the other with 10 genes; with strong correlation coefficients of ), which likely provide a manageable unit for coordinating transcription.
The genome-wide eQTL analysis revealed that 12 loci were associated with the expression of multiple histone genes. The eQTLs were located on chromosomes 2, 7, and 11. Since the 29 histone genes were all located on chromosome 1 or 6, we suspect that the identified eQTLs were trans-acting. This suggests that many histone genes are simultaneously transcribed by remote regulators.
Functional analysis of the identified eQTLs suggests that they are important for transcription. For example, rs79335804, an SNP within an eQTL on chromosome 2, was located in a binding motif for the Kruppel-like factor 4 (Klf4) protein in various cells including LCLs. Klf4 has been associated with chromosomal aberrations and can prevent cell proliferation by acting as a transcription factor. Aberrant chromatin formation could be caused by overproduction of a histone dimer set (H2A-H2B or H3-H4). Thus, we suspect that there is an association between the chromosomal aberrations linked to Klf4 and histone gene mRNA expression. Another SNP, rs849578, within a different eQTL on chromosome 2, was associated with autism in the Chinese Han population. It is located in an intron of neuropilin 2 (NRP2), which may be an effector of apoptosis, proliferation, and neuronal development. Histones are known to be related to developmental regulation, but additional study is required to elucidate the underlying mechanism of the relationship between histones and NRP2.
Transcriptome-wide association analysis revealed that many nonhistone genes were also associated with the identified eQTLs. In particular, some genes were associated with 23 or more eQTLs. One was RPPH1, an RNA component of RNase P, which may assist in the cell cycle-dependent transcription of ribosomal RNAs (rRNAs) by associating with chromatin. The expression of rRNAs increases from G1 to S and peaks at G2. The transcription of histone genes increases rapidly before the S phase of the cell cycle and decreases shortly thereafter. Thus, many eQTLs identified in this study might be involved in the cell cycle-dependent expression of both RPPH1 and histone genes. Characterizing how these eQTLs are regulated would be a key step toward resolving the underlying mechanisms. The other genes associated with many eQTLs were those encoding scaRNAs (SCARNA2 and SCARNA7), which are located in the Cajal body, similar to the pre-mRNAs of histones, which move to the Cajal body for mRNA processing [3, 19].
Interestingly, many genes controlled by the same eQTLs as those for histones do not have polyadenylated structures. In particular, the genes associated with more than 10 eQTLs were all nonpolyadenylated. They were snoRNAs, scaRNAs, and RPPH1. Considering that histones are also nonpolyadenylated, this may help us to understand the transcriptional regulation of histone genes by these eQTLs.
The expression of histone genes plays an important role in controlling chromatin accessibility. Improper expression of histone genes has been associated with tumorigenesis [22–24]. Expression of NPAT, a transcriptional activator for histone genes, is also associated with human tumorigenesis. The influence of histone genes on tumor development might be supported by the eQTLs identified in the current study, because some of the eQTLs were located within putative tumor suppressor genes such as low-density lipoprotein receptor-related protein 1B (LRP1B) and utrophin (UTRN) [26, 27].
In conclusion, we identified 31 eQTLs for histone genes. The eQTLs were also associated with nonhistone genes that exhibited both cell cycle-dependent expression and a nonpolyadenylated RNA structure. Further investigations are required to understand the mechanisms regulating the transcription of the histone and nonhistone genes identified in this study and to appreciate their influence on cancer and other diseases. Moreover, identifying eQTLs in disease-specific cell types would help to resolve the mechanisms relevant to each disease.
Conflicts of Interest
The authors declare that they have no competing interests.
This work was funded by the National Research Foundation of Korea, the Ministry of Education, Science, and Technology (Grant no. NRF-2012M3A9D1054705).
Table S1. Functional capability of eQTLs identified for histones using RegulomeDB.
L. S. Shopland, M. Byron, J. L. Stein, J. B. Lian, G. S. Stein, and J. B. Lawrence, “Replication-dependent histone gene expression is related to Cajal body (CB) association but does not require sustained CB contact,” Molecular Biology of the Cell, vol. 12, no. 3, pp. 565–576, 2001.
C. X. Liu, Y. Li, L. M. Obermoeller-McCormick, A. L. Schwartz, and G. Bu, “The putative tumor suppressor LRP1B, a novel member of the low density lipoprotein (LDL) receptor family, exhibits both overlapping and distinct properties with the LDL receptor-related protein,” The Journal of Biological Chemistry, vol. 276, no. 31, pp. 28889–28896, 2001.