Influence of Genetic Variants in EGF and Other Genes on Hematological Traits in Korean Populations by a Genome-Wide Approach
Hematological traits are important health indicators and are used as diagnostic clinical parameters for human disorders. Recently, genome-wide association studies (GWAS) identified many genetic loci associated with hematological traits in diverse ethnic groups. However, additional GWAS are necessary to elucidate the breadth of genetic variation and the underlying genetic architecture represented by hematological metrics. To identify additional genetic loci influencing hematological traits (hematocrit, hemoglobin concentration, white blood cell count, red blood cell count, and platelet count), we conducted GWAS and meta-analyses on data from 12,509 Korean individuals drawn from population-based cohorts. Of particular interest is EGF, a factor that plays a role in the proliferation and differentiation of hematopoietic progenitor cells. We identified a novel EGF variant that was associated with platelet count in our study. Our study also replicated, at genome-wide significance, 16 genetic associations with the five hematological traits that were previously established in other ethnic groups. Of these, variants influencing platelet count are distributed across several genes and have pleiotropic effects in coronary artery disease and dyslipidemia. Our findings may aid in elucidating molecular mechanisms underlying not only hematopoiesis but also inflammatory and cardiovascular diseases.
1. Introduction
Hematological metrics are used as essential medical indicators. Maintenance of homeostasis is linked to physiological pathways that can be tested via blood chemistry panels. Variation in hematological traits is heritable [3, 4]. Recently, genome-wide association studies (GWAS) have revealed hundreds of genetic loci associated with hematological traits [5–8]. Many of the loci associated with hematological traits are shared between different ethnic groups. Despite the successful discovery of a large number of disease-associated variants, the identified variants explain less than 10% of the heritability.
In addition, previous studies illustrated that significant differences in hematological traits exist between ethnic groups. For example, African Americans tend to have lower white blood cell counts, whereas persons of Japanese descent generally have fewer red blood cell-related anomalies than typically seen in other populations [10, 11]. These observations suggest that many hematological traits have a genetic basis and that variants not yet uncovered still need to be investigated. Moreover, previously identified common loci have yet to be thoroughly evaluated through a genome-wide scan in persons of Korean descent.
In this study, we sought to identify additional ethnic Korean-specific genetic variants associated with five hematological traits: hemoglobin (Hb), hematocrit (Hct), red blood cell count (RBC), white blood cell count (WBC), and platelet count (PLT). To this end, we carried out a GWAS and meta-analysis in Korean populations to look specifically for effects related to these metrics. Subsequently, we performed pleiotropic association analyses and functional annotation of the identified trait-associated loci. Our results may not only highlight the biologically important role of genetic variants in hematological traits found in Korean populations but also provide useful insight into genetic diversity between ethnic groups.
2. Materials and Methods
2.1. Study Subjects
We performed GWAS on 5 hematological traits (Hb, Hct, WBC, RBC, and PLT) with data from 12,509 subjects from two population-based cohorts that are part of the Korean Genome Epidemiology Study (KoGES). In the discovery stage, we analyzed data for 8,842 subjects from the Korea Association Resource (KARE) project of KoGES. To validate our discovery stage results, 3,667 healthy subjects in the Cardio Vascular Disease Association Study (CAVAS) of KoGES were used for the replication stage. For further replication of a novel locus, 8,053 subjects taking part in the Health2 study of KoGES and 23,032 Japanese subjects from the BioBank Japan project were selected for analyses. The descriptive statistics of each cohort are given in Supplementary Table 1 (available online at http://dx.doi.org/10.1155/2015/914965). More detailed descriptions of each cohort have been published previously [6, 12, 13].
This study was approved by the ethics committee of the Korea Centers for Disease Control and Prevention’s Institutional Review Board, and all study subjects provided written informed consent prior to taking part in the study.
2.2. Phenotype Determination
Hematological trait values were available for up to 20,562 subjects (8,842 KARE subjects, 3,667 CAVAS subjects, and 8,053 Health2 subjects) taking part in KoGES. Fasting blood samples were drawn from study subjects into a test tube containing an anticoagulant (e.g., EDTA), and relevant traits were measured or calculated using an automated electronic cell counter, the ADVIA 120 hematology system (Bayer Diagnostics, USA).
2.3. Genotyping and Quality Control
In the discovery stage, 10,004 KARE study samples were genotyped with the Affymetrix Genome-Wide Human SNP array 5.0. Our quality control criteria were as follows. Samples were excluded if they (i) had a missing genotype call rate above 4%, (ii) showed excessive heterozygosity (>30%), (iii) showed gender inconsistencies, or (iv) came from a subject with cancer. SNPs were excluded if they (i) had a missing genotype call rate above 5%, (ii) had a low MAF (<0.01), or (iii) deviated from Hardy-Weinberg equilibrium. Following quality control analyses, data for 8,842 subjects and 352,228 SNPs were retained for further study. For in silico replication data, the 4,034 CAVAS samples were genotyped using the Illumina HumanOmni1-Quad BeadChip. The report file containing input signal intensity of samples was converted using the Illumina BeadStudio software package. Following quality control, 3,667 samples and 730,073 SNPs were deemed appropriate for analyses. Detailed quality control sample criteria and the genotypes from the two cohorts were described previously [12, 13].
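The sample-level and SNP-level exclusion rules can be illustrated as two simple predicates. This is a minimal sketch and not the pipeline used in the study: the record types `SampleQC` and `SnpQC` and their field names are hypothetical, and the Hardy-Weinberg p-value cutoff is a required argument because its value is not stated here.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample QC summary."""
    missing_rate: float    # fraction of missing genotype calls
    heterozygosity: float  # fraction of heterozygous calls
    sex_mismatch: bool     # recorded and genotype-inferred sex disagree
    has_cancer: bool


@dataclass
class SnpQC:
    """Hypothetical per-SNP QC summary."""
    missing_rate: float  # fraction of samples with no call
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg equilibrium test p-value


def keep_sample(s: SampleQC) -> bool:
    """Apply the four sample exclusion criteria listed in the text."""
    return not (
        s.missing_rate > 0.04
        or s.heterozygosity > 0.30
        or s.sex_mismatch
        or s.has_cancer
    )


def keep_snp(v: SnpQC, hwe_p_cutoff: float) -> bool:
    """Apply the three SNP exclusion criteria; the HWE cutoff is supplied by the caller."""
    return not (
        v.missing_rate > 0.05
        or v.maf < 0.01
        or v.hwe_p < hwe_p_cutoff
    )
```

For example, `keep_snp(SnpQC(0.01, 0.20, 0.5), hwe_p_cutoff=1e-3)` keeps the SNP; the cutoff shown is a placeholder for illustration only.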
For further replication of a novel locus, two approaches were used: de novo and in silico replication. In the de novo replication, we genotyped a SNP with the GoldenGate assay (Illumina Inc.) in 8,053 samples from the Health2 study. The genotype success rate was 99.9%. In the in silico replication, we used imputed data for the 4q25 region based on genotype data from the Illumina Human610-Quad BeadChip in 23,032 samples from the BioBank Japan project at the Center for Integrative Medical Sciences, RIKEN. The quality control criteria for the Japanese samples and SNPs were described previously.
SNPs were imputed based on HapMap (phase 2, release 22, NCBI build 36 and dbSNP build 126; http://hapmap.ncbi.nlm.nih.gov/) data from the Japanese population in Tokyo, Japan (JPT), and the Han Chinese population in Beijing, China (CHB), using the IMPUTE program (version 2) (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html). Following our quality control testing, 1,590,162 SNPs from the KARE study and 2,150,086 SNPs from the CAVAS study were used for further analyses [13, 15].
2.4. Statistical Analyses
To investigate the genetic basis of the five specified hematological traits, we carried out GWAS using a linear regression model in the PLINK program (http://pngu.mgh.harvard.edu/~purcell/plink/). Phenotypes used in the analyses were approximately normally distributed, and age and gender were included as covariates. For SNPs that met our significance criteria in both the discovery stage and the replication stage, we conducted meta-analyses with the inverse variance method using the METAL program (http://genome.sph.umich.edu/wiki/METAL). After meta-analysis, SNPs reaching the accepted genome-wide significance level, which reflects testing of one million SNPs, were considered statistically significant.
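For readers unfamiliar with the inverse variance method, a fixed-effect combination of per-cohort regression estimates can be sketched as follows. This illustrates the general technique, not METAL's implementation; the function name `inverse_variance_meta` and the example inputs are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort estimates.

    betas: per-cohort effect sizes (regression coefficients)
    ses:   matching standard errors (all > 0)
    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical discovery and replication estimates for one SNP
pooled = inverse_variance_meta([0.0, 2.0], [1.0, 2.0])
```

Cohorts with smaller standard errors receive proportionally more weight, so the larger discovery cohort would typically dominate the pooled estimate.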
2.5. Association Analyses with Related Traits (Coronary Artery Disease and Lipid Profiles)
As we were interested in the effects of genome-wide significant PLT-associated SNPs on related traits, we tested each SNP for association with lipid profile metrics (total cholesterol (TC), triglyceride (TG), LDL-cholesterol (LDL), and HDL-cholesterol (HDL)) in the 8,842 KARE subjects and with CAD in 2,123 cases and 2,690 controls, using previously published data sets [15, 19]. Age and gender were used as covariates in all analyses.
3. Results
We conducted GWAS on 1,590,162 common SNPs (minor allele frequency (MAF) > 1%) and five hematological traits, namely, Hb, Hct, WBC, RBC, and PLT, in 8,842 subjects of the KARE project. We carried forward the top-associated SNPs that satisfied the discovery threshold for replication in 3,667 subjects of the CAVAS study, a rural population-based cohort. Thirty-two variants were validated with statistical significance in the CAVAS study (Supplementary Table 3). Descriptive information for the study samples and the inflation of test statistics (genomic control) are shown in Supplementary Table 1 and Supplementary Table 2, respectively. The quantile-quantile plots for the five hematological traits are presented in Supplementary Figure 1.
In the combined data set of 12,509 subjects, we identified 17 genetic regions that reached our threshold for genome-wide significance: one novel association for PLT (4q25, in the EGF gene), one region for Hb, six for RBC, two for WBC, six for PLT, and one region (6q23.3) associated with three traits (Hct, RBC, and PLT) (Table 1 and Figure 1).
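The genomic control inflation factor is conventionally computed as the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null hypothesis. The sketch below assumes per-SNP z-scores are available; the function name `genomic_inflation` is hypothetical, and this is not the code used to produce Supplementary Table 2.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df, i.e. the squared
# 75th percentile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Return lambda_GC = median(z^2) / median of chi-square(1 df)."""
    chi2_stats = [z * z for z in z_scores]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1 indicates little inflation of the test statistics, whereas values well above 1 can point to population stratification or cryptic relatedness.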
3.1. Previously Reported Loci
Of the 17 regions we identified, seven included previously reported associations of erythrocyte-related traits (Hb, Hct, and RBC) with the following loci: 22q12.3 (TMPRSS6), 6q23.3 (HBS1L-MYB), 4q12 (PDGFRA-KIT), 6p21.1 (CCND3), 12p13.3 (PARP11-CCND2), 9q34.2 (ABO), and 2p21 (PRKCE) (Table 1). Our analyses of WBC also revealed two previously reported loci, with rs8070454 located in PSMD3-CSF3 (17q21.1) and rs11981340 in CDK6 (7q21.2) (Table 1). GWAS for PLT showed previously reported associations with the following six loci: 22q13.31 (PNPLA3), 6p21.33 (LY6G5C), 6p21.32 (HLA-DOA-HLA-DPA1), 12q24.12 (SH2B3), 6q23.3 (HBS1L-MYB), and 3q27.1 (THPO) (Table 1) [6, 20]. Other known loci were also scanned using the GWAS catalog of the National Human Genome Research Institute (Supplementary Table 4).
3.2. A Newly Identified Locus
We identified a novel intronic variant, rs2282786, located in EGF at 4q25, that was associated with lower platelet counts (Table 1 and Figure 2). This variant also reached statistical significance in subsequent replication stages in two populations: 8,053 subjects from the Health2 cohort and 23,032 subjects from a Japanese population (Table 3). We also found an ethnic difference in the allele frequency of rs2282786 (T allele): based on HapMap project data (http://hapmap.ncbi.nlm.nih.gov/), the SNP is monomorphic in those of European and Yoruba descent, in contrast to those of Asian descent (CHB + JPT, MAF = 0.258) (Supplementary Figure 2).
3.3. Pleiotropic Effect of PLT Related Variants on CAD and Lipid Profile
We examined associations between the seven PLT-associated variants with genome-wide significance and other traits related to CAD and lipid profile, including TC, TG, LDL, and HDL (Table 2). Two variants, one near HBS1L-MYB and one near PNPLA3, were associated with lipid traits (TC, TG, and LDL). rs739496, located in the 3′-UTR of SH2B3, was associated with both decreased platelet count and a decreased risk for CAD (Table 2). Other variants did not have compelling associations with these five traits.
4. Discussion
Recently, numerous genetic loci for hematological traits were discovered through several GWAS of European, African American, and Japanese populations [5–7, 22]. Using a similar approach, we screened data for 12,509 Korean individuals, confirmed 16 known loci associated with hematological traits, and identified one novel genetic locus affecting PLT. The SNP rs2282786, located in EGF at 4q25, showed a strong association with PLT at genome-wide significance in a combined meta-analysis of 43,594 individuals (20,562 Koreans and 23,032 Japanese) (Table 1, Figure 2, and Table 3). This SNP also showed an ethnicity-based difference in allele frequency (Supplementary Figure 2). This is a compelling discovery and provides evidence of a divergent genetic background underlying the ethnic differences seen in hematological traits.
Additionally, to examine the association between genetic variants and the level of gene expression, we cross-referenced the novel PLT-associated locus with expression quantitative trait loci (eQTL) associations using genetic variation and gene expression profiling data from Gene Expression Variation (GENEVAR) (http://www.sanger.ac.uk/resources/software/genevar). These data were based on lymphoblastoid cell lines (LCLs) from 162 HapMap3 individuals (80 CHB + 82 JPT). An intronic SNP in EGF, rs4698756, which is in weak linkage disequilibrium (LD) with rs2282786, showed a statistically significant cis-regulatory effect on gene expression levels of EGF in Chinese populations (data not shown). Furthermore, to elucidate the regulatory function of the locus, we surveyed Encyclopedia of DNA Elements (ENCODE) features such as regulatory chromatin states, DNase hypersensitivity, and ChIP-seq experiments using the UCSC Genome Browser (http://genome.ucsc.edu/). According to the functional annotation based on ENCODE data, rs4698756 lies within regulatory functional elements comprising transcription factor binding sites, DNase clusters, and proteins required for chemical modification of histones. Although the LD between rs2282786 and rs4698756 was not strong enough for rs4698756 to serve as a direct surrogate for rs2282786, this functional information suggests the possibility that regulation of EGF expression modulates platelet counts.
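LD between two biallelic SNPs is commonly summarized by r² and D′, both derived from the coefficient D = p(AB) − p(A)p(B). The following sketch shows the standard calculation from haplotype and allele frequencies; the function name `ld_stats` and the example frequencies are hypothetical and are not the values for rs2282786 and rs4698756.

```python
def ld_stats(p_ab, p_a, p_b):
    """Compute (r^2, D') for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    return r2, d_prime


# Hypothetical frequencies: no recombinant haplotype, but unequal allele
# frequencies, so D' is at its maximum while r^2 stays low.
r2, d_prime = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
```

The example illustrates why a SNP pair can have a high D′ yet a low r²: r² also penalizes differences in allele frequency, which is what limits one SNP's usefulness as a surrogate for the other.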
The EGF gene encodes epidermal growth factor; the encoded protein acts as a potent mitogenic factor, playing an important role in the growth, proliferation, and differentiation of numerous cell types [MIM: 131530]. It may play a role in the growth, proliferation, and differentiation of megakaryocytes and in platelet production. Previous studies reported that platelets activated by inflammation may secrete EGF and proinflammatory substances for subsequent thrombus formation in an inflammation-hemostasis cycle, a tightly interrelated pathophysiologic process [25, 26]. It is well known that platelets play an important role in CAD, both in the pathogenesis of atherosclerosis and in the development of acute thrombotic events. Accordingly, high blood lipid levels can enhance platelet aggregation, causing CAD. The associations observed among the identified PLT loci, CAD, and lipid profile suggest that these loci also have pleiotropic effects in the process of CAD and dyslipidemia (Table 2). Two PLT-associated variants located in HBS1L-MYB and PNPLA3 were also significantly associated with lipids (TC, TG, and LDL), and one PLT-associated variant in SH2B3 was also significantly associated with CAD. Among them, SH2B3 encodes a member of the SH2B adaptor family of proteins [MIM: 605093] that plays a critical role in hematopoiesis involving blood coagulation and erythropoietin signaling pathways. This gene has previously been found to be associated with type 1 diabetes, cardiovascular diseases, and hypertension.
To date, many studies have reported genetic factors associated with hematological traits via GWAS across diverse ethnic groups [5–7, 22]. According to previous transethnic studies, the most commonly revealed variants were replicated across all ethnic groups and have a species-wide role in biological pathways of hematopoiesis. However, these common genetic loci may account for only a low percentage of hematological trait heritability. Therefore, to identify genetic mechanisms underlying these traits, additional large-scale population-based studies incorporating multiple rare variants or gene-gene and gene-environment interactions should be undertaken. Functional experiments may also help to validate the results of statistical analyses for human hematological traits.
In summary, we illustrated that a genome-wide approach identified genetic variants contributing to phenotypic variation of hematological traits in Korean populations. We identified one novel ethnic-specific variant associated with PLT that localized to a key regulator of hematopoiesis and confirmed previously implicated loci that were associated with five hematological traits. We also reported pleiotropic effects of PLT-associated variants that may support the biological role of genetic determinants for hematological traits. Our findings may help identify biological pathways that contribute not only to hematopoiesis but also to inflammatory and cardiovascular diseases in humans.
Conflict of Interests
The authors declare no conflict of interests.
Yun Kyoung Kim and Ji Hee Oh contributed equally to this work.
This work was supported by an intramural grant from the Korea National Institute of Health (2012-N73002-00) and grants from the Korea Centers for Disease Control and Prevention (4845-301, 4851-302, and 4851-307).
Supplementary Figure 1: The quantile-quantile plots for five hematological traits in the discovery stage.
Supplementary Figure 2: Population diversity of rs2282786.
Supplementary Table 1: Descriptive statistics of samples analyzed in this study.
Supplementary Table 2: Inflation of test statistics (genomic control) for the five hematological traits.
Supplementary Table 3: Association results of 32 variants that were verified in the replication stage.
Supplementary Table 4: Replication of previously reported loci associated with hematological traits in Korean populations.
C. Garner, T. Tatu, J. E. Reittie et al., “Genetic influences on F cells and other hematologic variables: a twin heritability study,” Blood, vol. 95, no. 1, pp. 342–346, 2000.
S. Margetic, “Inflammation and haemostasis,” Biochemia Medica, vol. 22, no. 1, pp. 49–62, 2012.