Abstract

Background. The hypoxic conditions at high altitudes are great threats to survival, causing pressure for adaptation. More and more high-altitude denizens are not adapted with the condition known as high-altitude polycythemia (HAPC) that featured excessive erythrocytosis. As a high-altitude sickness, the etiology of HAPC is still unclear. Methods. In this study, we reported the whole-genome sequencing-based study of 10 native Tibetans with HAPC and 10 control subjects followed by genotyping of selected 21 variants from discovered single nucleotide variants (SNVs) in an independent cohort (232 cases and 266 controls). Results. We discovered the egl nine homologue 3 (egln3/phd3) (14q13.1, rs1346902, ) and PPP1R2P1 (Protein Phosphatase 1 Regulatory Inhibitor Subunit 2) gene (6p21.32, rs521539, ). Our results indicated an unbiased framework to identify etiological mechanisms of HAPC and showed that egln3/phd3 and PPP1R2P1 may be associated with the susceptibility to HAPC. Egln3/phd3b is associated with hypoxia-inducible factor subunit α (HIFα). Protein Phosphatase 1 Regulatory Inhibitor is associated with reactive oxygen species (ROS) and oxidative stress. Conclusions. Our genome sequencing conducted in Tibetan HAPC patients identified egln3/phd3 and PPP1R2P1 associated with HAPC.

1. Introduction

In excess of 140 million humans have primarily lived in the high-altitude regions of various locations around the world, including the Qinghai-Tibet plateau in Asia, the Andes Mountains of South America, and the Ethiopian plateau in East Africa [1]. All dwellers in plateaus like native Tibetans of the Qinghai-Qinghai-Tibet plateau are regarded as one of the best dwellers adapted to high altitudes. They possess heritable adaptations to the hypoxic condition [2], as they display lower hemoglobin and hematocrit [3], higher oxygen saturation of blood [4], and better work efficiency [5], which are conducive to adapting the high-altitude and hypoxic situations. However, some Tibetans living at high altitude with genetic adaptations remain maladapted and possess symptoms of high-altitude polycythemia (HAPC).

HAPC is diagnosed with excessive erythrocytosis encountered by around 10% of the population residing at the Qinghai-Tibet plateau [6, 7]. HAPC shows several impacts on blood viscosity with various harmful consequences such as dysfunction of microcirculation, vascular thrombosis, extensive organ damage, sleep disturbances, and even death [7, 8]. In high-altitude dwellers, hypobaric hypoxia is thought to be one of the primary initiators of HAPC. However, the mechanisms underlying the pathogenesis and genetic basis of HAPC have not been well understood.

In recent years, the genetic basis underlying adaptation to high altitude in Tibetans has been elucidated by using exome and single nucleotide polymorphism (SNP) array data [911]. Furthermore, gene candidate approaches in Tibetan have identified that genomic variants in EPAS1, CYP17A1, CYP2E1, and mitochondrial DNA 10609T are associated with HAPC [1215]. However, comprehensive genome-wide scanning to detect the variations associated with HAPC has not been performed. It has been known that the variants that occurred in the genome-wide association study (GWAS) may account for some diseases with only relatively modest proportions and numerous deleterious variants with low frequency in the population.

With the technical development of high-throughput sequencing, it provides an opportunity to resequence multiple genetic regions and recognize novel risk alleles in diseases including high-altitude sickness [1, 1618]. To dissect the genetic mechanisms underlying HAPC, we compared genetic variations between native Tibetans with HAPC and adapted subjects without HAPC by whole-genome sequencing (WGS) of 20 individuals, and selected variants were further genotyped in an additional larger case-control cohort. Our investigation revealed the egl nine homologue 3 (egln3/phd3) (14q13.1) and PPP1R2P1 (Protein Phosphatase 1 Regulatory Inhibitor Subunit 2) gene (6p21.32) associated with HAPC.

2. Methods

2.1. Patients and Healthy Control Subjects

As the same ethnicity, HAPC was collected in healthy control subjects and matched with native high-altitude Tibetans. In this study, 10 patients with HAPC and 10 control subjects without HAPC that were performed in Tibet area (Table 1) were recruited. Based on the International Consensus Statement [7], (1) HAPC was diagnosed with an Hb concentration of at least 19 g/dL and 21 g/dL for women and men, respectively, and (2) HAPC patients were local Tibetans normally living at 3600 to 4500 m. Blood samples for WGS were collected, and each subject was informed and gave written consent.

2.2. Whole-Genome Sequencing

Blood DNA was isolated from each subject, and the desired length fragments were gel purified. According to the manufacturer’s instruction, DNA cluster preparation and adaptor ligation were performed with the library preparation kit (Illumina). WGS was performed on Illumina HiseqX ten Platform with 150 base pairs (bps) paired-end reads.

2.3. Read Alignment and Variant Calling

All reads were mapped to human reference genome by BWA [19] with default parameter. For downstream filtering, picard tools (http://broadinstitute.github. io/picard/) were used to mark PCR duplicates. We used Genome Analysis Toolkit (GATK) to adjust the alignments via GATK indel realignment and score recalibration modules [20]. We finally called and filtered SNVs using the GATK Unified Genotyper under default parameters. With the Ensembl annotated database, the resulting SNVs were annotated by ANNOVAR [21] according to the locations and expected effect on encoded gene products. According to dnSNP, SNVs were identified into “newly identified” and “known” variants.

2.4. SNP Discovery Stage

The allele frequency distribution of 10 cases and 10 controls and variants with significant difference () were analyzed by the Fisher exact test in allele frequency. To select the SNVs for association mapping, the pathway enrichment analysis using WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt/) was performed by the distribution of genes with different allele frequencies. We used the hypergeometric test and Benjamini-Hochberg FDR to determine the significance of enrichment and adjust the multiple testing, respectively. This analysis revealed pathways including the HIF-1 signaling pathway and hematopoietic cell lineage [22] were associated with high-attitude adaption. We selected 21 SNVs in the two pathways from SNVs with different allele frequencies for genotyping in the stage of the association study.

2.5. Genotyping and Statistical Analysis

Genotyping of the selected 21 variants was performed with Sequenom MassARRAY or Snapshot system in an additional 232 HAPC cases and 266 controls. The primer of genotyping was designed by use of Primer 5.0. The frequency of alleles was calculated by genotype frequency in HAPC patients and control subjects, and the intergroup differences were counted using the Fisher exact test. value for multiple testing was adjusted by Benjamini-Hochberg FDR. QQ plot was produced using the man statistics package of R (http://cran.r-project.org/web/packages/qqman/). We use the chi-square test to assess deviations from Hardy-Weinberg equilibrium.

3. Results

3.1. Whole-Genome Sequencing and Single Nucleotide Variants (SNVs)

After extracting DNA from the peripheral blood, the whole-genome sequencing of 10 native Tibetans with HAPC and 10 adapted subjects without HAPC was performed by using next-generation sequencing technology (Supplementary Table S1). As a result, there were more than 360 million paired-end reads for each subject (Table 2). More than 99.7% of the reference genome was covered, and we sequenced the genome with a mean depth of 39x per individual. After applying stringent quality controls, we obtained 69,591,555 SNVs (single nucleotide variants), of which 191,734 (0.28%) were nonsilent mutations (nonsynonymous and splice site mutations). Comparing with 1000 Genome Project [23] gave 94% of SNVs known in 1000 Genomes Phase 1 ASN.

To discover the susceptibility loci associated with HAPC, two-stage studies including SNP discovery study and replication study were performed. To select the candidate SNVs for association mapping, we applied the Fisher exact test to WGS data and identified 33,662 SNVs () in allele frequency between native Tibetans with HAPC and adapted subjects without HAPC, of which 72 were nonsilent SNVs (Supplementary Table S2). Pathway enrichment analysis was performed on the genes with significant difference in allele frequency and revealed pathways including the HIF-1 signaling pathway [24] and hematopoietic cell lineage [22] that were associated with high-attitude adaption (Supplementary Table S3). To explore the association between HAPC and the two pathways, we subjected 21 SNVs that are in the two pathways and also showed allele frequency difference in the above Fisher exact test for genotyping in the stage of the association study (Supplementary Table S4).

3.2. Association Study

Associations between the 21 selected variants and HAPC were evaluated by genotyping in an additional 232 Tibetans with HAPC and 266 control subjects (Supplementary Table S5). Most of the observed associations showed close to the expected distribution based on the quantile-quantile (QQ) plot, which is consistent with the null hypothesis of no association (Figure 1). Two loci exceeded significance in association with HAPC (Table 3): egln3/phd3 at rs1346902 in EGLN3 on 14q13.1 (, , 95% confidence intervals (CI): 0.44-0.74) and PPP1R2P1 at rs521539 in the region between HLA-DRB1 and HLA-DQA1 on 6p21.32 (, , 95% CI: 1.08-1.84). The allele frequencies of other variants between these two groups were not significantly different. Furthermore, the overlap of these two associated SNPs was evaluated with the Encyclopedia of DNA Elements—annotated genomic elements (Supplementary Table S6). There were histone modifications at promoter or enhancer; DNase hypersensitivity sites, binding proteins, and motifs changed in both SNVs.

3.3. The Analysis of Clinical and Genetic Correlation

We next measured whether tegln3/phd3 and PPP1R2P1 were concerned with clinical characters between the HAPC group and the control group using analysis of variance method (Figure 1). In all Tibetan subjects, we found the obvious correlation () between tegln3/phd3 and hemoglobin (HGB or Hb; , ), red blood cell count (RBC; , ), hematocrit (HCT; , ), and direct bilirubin (DBIL; , ). Tibetan subjects with CC genotype showed lower Hb than those with either the CG () or the GG () genotype (Figure 2(a)). Similarly, the lower HGB, HCT, and DBIL were also observed in individuals with CC genotype compared with CG or GG individuals (Figures 2(b)2(d)). Additionally, we also found a significant association between PPP1R2P1 (A/G) and the albumin to globulin ratio (Figure 2(e); , ).

4. Discussion

In the present study, we used whole-genome sequencing of native Tibetans with HAPC and adapted subjects without HAPC to identify alleles’ associated risk in previously unidentified HAPC susceptibility genes. We selected 21 variants from our WGS data for further evaluation in an additional larger cohort of Tibetans with HAPC and controls. The differences in the SNPs of tegln3/phd3 (C/G) and PPP1R2P1 (A/G) between the two groups are found for the first time in this present paper.

SNP rs1346902 is located in the intron region (transcript ID: ENST00000487915 and ENST00000551935) of EGLN3 (also known as PHD3) that belongs to the Egl-9 (EGLN) family. EGLN3, along its paralogous genes EGLN1 and EGLN2, can catalyze the hydroxylation of hypoxia-inducible factor subunit α (HIFα), leading to increased ubiquitination and proteasomal degradation of HIFα and decreased HIFα activity [25]. Upregulated EGLN3 is related to apoptosis in sympathetic neurons [26], p53-induced growth arrest in RAS-transformed embryo fibroblasts [27], and differentiation of C2C12 myoblasts [28]. EGLN3 also plays important roles in human endothelial cells [29] and cancers, including pancreatic cancer [30] and glioblastoma [31]. Previous study also identifies that EGLN3 can negatively regulate the NF-κB pathway, which is a crucial regulator of cell differentiation, inflammation, stress responses, and immunity [32]. Although association analysis with microsatellite markers in an Andean high-altitude population shows a possible association between EGLN3 (marker D14S1049) and severe polycythemia [33], no susceptible association between SNPs in EGLN3 and HAPC is reported. Here, we found a susceptibility locus in EGLN3 for HAPC. We also found significantly lower Hb and RBC in the subjects with rs1346902 C/C allele (Figures 2(a) and 2(b)), supporting and extending the potential role of EGLN3 in HAPC. Therefore, the significant difference of genotypes between two groups for SNP rs1346902 (C/C) was the beneficial allele of HAPC and it might contribute to high-altitude adaption in Tibetans with lower Hb and RBC.

A large number of genomic surveys about nucleotide polymorphism among humans in high-altitude environments have offered evidence for genes in the HIF oxygen signaling pathway, among which EPAS1 and EGLN1 are notable [9, 10, 3436]. EPAS1, EGLN1, VEGFA, CYP17A1, and CYP2E1 in the HIF pathway are also shown to be associated with high-altitude sickness, including HAPC [1214, 3739]. The finding of susceptibility SNP in EGLN3 for HAPC in our study further supported the key role of the HIF oxygen signaling pathway in high-altitude adaption and relevant diseases.

Human leukocyte antigen (HLA) loci have been associated with more than 100 diseases [40]. HLA-DRB1 and HLA-DQA1 play an important role in the immune system, which belong to the classical HLA class II molecules. They can take part in recognizing and presenting antigens to T cells and are mainly expressed in B cells, dendritic cells, and macrophages. In addition, several studies have found a connection between HLA class II molecules and immune-mediated disorders, such as susceptibility loci in HLA-DRB1 for sarcoidosis [41] and pemphigus [42] and susceptibility loci in HLA-DQA1 for achalasia [43] and celiac diseases [44].

In our study, we also found that there were significant differences () in the allele frequency of PPP1R2P1 near HLA-DRB1/DQA1 between the HAPC group and the control group, with high-frequency A allele among HAPC (57%) than controls (47%). Our finding was consistent with positive selection of HLA loci (HLA-DRA, HLA-DQB1/HLA-DQA2) in Ethiopian population [45] and significantly associated with the high-altitude pulmonary edema of HLA-DR6 and HLA-DQ4 [46]. Our results also revealed a significant correlation between PPP1R2P1 (A/G) and albumin/globulin ratio (Figure 2(e)), which is an indicator of liver and kidney disorders. The implication of HLA suggests a role of inflammation perhaps associated with the kidneys or livers. Previous studies show that chronic hypoxia induced hypertension and erythrocytosis leads to renal injury and inflammation in rats [47]. The similar effect between HLA and HAPC may exist in humans. Protein Phosphatase 1 Regulatory Inhibitor has been reported to be associated with reactive oxygen species (ROS) and oxidative stress [48].

The primary limitation of our research was the small case-control cohort in the discovery phase, which might result in low power of detection of candidate susceptibility loci. Moreover, the functional role of newly identified susceptibility loci was needed to be verified by some experiments. We will pay more attention to the expression pattern of these genes and related regulatory pathways in HAPC patients.

5. Conclusions

In summary, our genome sequencing conducted in Tibetan HAPC patients identified two new susceptibility loci at 14q13.1 and 6p21.32 and offered to our knowledge of the genetic architecture of HAPC. Further study about these two loci was likely to improve our knowledge on the etiology of HAPC.

Abbreviations

HAPC:High-altitude polycythemia
SNP:Single nucleotide polymorphism
GWAS:Genome-wide association study
WGS:Whole-genome sequencing
SNVs:Single nucleotide variants
HIFα:Hypoxia-inducible factor subunit α
HLA:Human leukocyte antigen.

Data Availability

Our datasets are presented within the manuscript and additional supporting files.

Ethical Approval

The experimental protocol was established according to the ethical guidelines of the Helsinki Declaration and approved by the ethics committee of People’s Hospital of Tibet Autonomous Region, Lhasa, China. Besides, all experiments were performed in accordance with relevant guidelines and regulations

Written informed consent was obtained from the individual participants.

Conflicts of Interest

The authors declare no actual or perceived competing interests.

Authors’ Contributions

L.G., K.L., L.G., K.L., C.D., Y.B., R.T., and M.N. provided materials and performed sample preparation. Q.D. and G.G. performed laboratory work. L.G., K.L., and A san performed high-throughput genome sequencing. L.G., K.L., and A san analyzed genomic sequence and SNP data. L.G. and K.L. wrote the manuscript with contributions from all coauthors. All authors read and approved the final manuscript.

Acknowledgments

This study was supported by a grant from the National “Twelfth Five-Year” Plan for Science & Technology Support of China (2013BAI05B04), the Key project for Natural Science Foundation in Tibet Autonomous Region, China (2012), and the National Natural Science Foundation of China (NSFC) (No. 81960029).

Supplementary Materials

Supplementary Table S1: sequence data for HAPC and control subjects. Supplementary Table S2: list of 72 nonsilent SNPs with in allele frequency between cases and healthy controls in the discovery phase. Supplementary Table S3: pathway enrichment analysis on the genes with significant difference in allele frequency between 10 cases and 10 controls. Supplementary Table S4: the selected 21 SNPs for genotyping. Supplementary Table S5: characteristics of case and control subjects in the genotyping phase. Supplementary Table S6: functional annotation of SNPs correlated with rs1346902 and rs521539 () using data from ENCODE. (Supplementary Materials)