Abstract

Introduction. Despite remarkable progress in identifying Parkinson’s disease (PD) genetic risk loci, the genetic basis of PD remains largely unknown. Using the endophenotype approach and data from dopamine transporter single-photon emission computerized tomography (DaTscan), we sought to identify genes potentially involved in PD. Method. We conducted an imaging genetic study by performing an exome-wide association study (EWAS) and a genome-wide association study (GWAS) on the specific binding ratio (SBR) of six DaTscan anatomical areas in the Parkinson’s progression markers initiative (PPMI) cohort. The EWAS included 489 subjects and 83,623 single-nucleotide polymorphisms (SNPs)/insertion-deletion mutations (INDELs), and the GWAS included 559 subjects and 36,845 SNPs. We also investigated the association of the cerebrospinal fluid (CSF) protein concentration of our significant genes with PD progression using PPMI CSF proteome data. Results. Among 83,623 SNPs/INDELs in EWAS, one SNP (rs201465075) on the 1q32.1 locus was significantly (P value = 4.03 × 10−7) associated with left caudate DaTscan SBR, and 33 SNPs were suggestive. Among 36,845 SNPs in GWAS, one SNP (rs12450112) on the 17p12 locus was significantly (P value = 1.34 × 10−6) associated with right anterior putamen DaTscan SBR, and 39 SNPs were suggestive, 8 of which were intergenic. We found that rs201465075 and rs12450112 are most likely related to the IGFN1 and MAP2K4 genes, respectively. The protein level of MAP2K4 in the CSF was significantly associated with PD progression in the PPMI cohort; however, proteomic data were not available for IGFN1. Conclusion. We have shown that particular variants of the IGFN1 and MAP2K4 genes may be associated with PD. Since DaTscan imaging can be positive in other Parkinsonian syndromes, caution should be taken when interpreting our results. Future experimental studies are also needed to verify these findings.

1. Introduction

PD is a neurodegenerative disease affecting 8–18 subjects out of 100,000 individuals annually [1]. It is characterized by tremor, dyskinesia, and rigidity [2]. The diagnosis of PD is based on clinical findings, and patients usually experience a prodromal phase before the definite diagnosis [3]. The prodromal phase mainly manifests with nonmotor symptoms such as constipation, hyposmia, REM-sleep behavior disorder, depression, anxiety, and cognitive impairment [3]. The pathological hallmark of PD is the aggregation of misfolded α-synuclein, also known as Lewy body, resulting in the loss of dopaminergic neurons in the substantia nigra [4]. The diagnosis of PD is not usually made until at least 30% of dopaminergic neurons and 50–60% of their axon terminals in substantia nigra are lost [5].

The exact etiology of PD is not yet fully understood; however, a higher incidence of PD in monozygotic compared to dizygotic twins and in people with a family history of PD than in those without a family history of PD highlights the prominent role of genetics in PD [6, 7]. Genetic studies have revealed several monogenic causes of PD, such as SNCA, PRKN, LRRK2, PINK1, DJ-1, VPS35, and GBA [8]. Recently, GWAS and EWAS have indicated the involvement of numerous genetic polymorphisms in PD [9]. Despite all these findings, the overall estimated heritability of PD is about 60%, and the known genetic variants do not account for all of it [10, 11].

An alternative approach to better identify the susceptibility genes for complex traits is the endophenotype approach [12]. An endophenotype refers to a biological or psychological feature of a disease believed to be in the causal chain between genetic backgrounds and diagnosable symptoms of the disease [13]. Neuroimaging endophenotypes have been widely used in neurological disorders such as PD. Neuroimaging endophenotypes provide quantitative measures of the brain structure or function that index genetic liability for a neurological condition [14]. In the substantia nigra of patients with PD, these endophenotypes can appear as reduced fractional anisotropy in diffusion tensor imaging, increased echogenicity in transcranial sonography, and decreased signal in the dorsolateral segment in magnetic resonance imaging (MRI) [15]. However, more specific and sensitive neuroimaging tools are needed to assess brain changes in PD.

Dopamine transporter single-photon emission computerized tomography (DaTscan) is a relatively new imaging modality in PD. DaTscan utilizes ioflupane I-123 injection to visualize the striatal dopamine transporters [16]. Ioflupane I-123 is derived from cocaine compounds and binds to dopamine active transporter (DAT), thereby providing a specific binding ratio (SBR) [17]. Due to its high sensitivity (84.4%) and specificity (96.2%), DaTscan has been used to diagnose PD and differentiate it from other neurodegenerative disorders in probable cases [18]. It also plays a role in detecting the preclinical phase of PD, as a longitudinal study has shown that SBR decline begins before the manifestation of motor symptoms and continues to decrease during disease progression [19].

Given the information above, we hypothesized that the underlying genetic basis of PD may be associated with neuroimaging findings from DaTscan, and that a genetic association study using these findings as endophenotypes can reveal the genetic basis of PD more accurately than case-control studies. Moreover, we conducted our genomic analyses on a mixed population (PD cases, prodromal patients, and healthy controls) because we expected that increasing the phenotypic variance would increase the power of our study. Compared with noncoding variants, exome variants can better prioritize the putative causal genes in PD [20]. We therefore aimed to explore the possible associations between striatal DaTscan findings and SNPs by conducting EWAS and GWAS on data from the Parkinson’s progression markers initiative (PPMI) cohort.

2. Method

2.1. Study Design

We used data from the PPMI cohort in our study. In brief, PPMI is a longitudinal, observational, and multicenter study. The primary goal of the PPMI cohort was to assess clinical and neuroimaging features and genetic and biological markers across all stages of PD. The PPMI cohort began in 2010, and after an enrollment gap, the cohort reinitiated recruitment in 2020. The study included patients with PD, patients in the prodromal stage of PD, and healthy controls. All samples were collected using the relevant guidelines. All subjects signed the consent form, and the Regional Ethics Committees approved this cohort. The details of this cohort and its protocol can be found on its website (https://www.ppmi-info.org/) or through Supplementary Material 1. By filling out the data access request form and specifying our research purpose, we were allowed to access the database of this cohort in the LONI database (https://ida.loni.usc.edu/).

2.2. Participants

We used the ppmi_wes_645_cohort_vcf.tar, PPMI_NEUROX_Nov11th2013.zip, DaTScan_Analysis.csv, Age_at_visit.csv, and Demographics.csv files, which contained whole-exome genetic polymorphism data of 645 subjects, whole-genome SNP data of 619 subjects, six neuroimaging features from DaTscan in 2930 visits of participants, and participants’ age and demographic information at their screening visits. We chose the screening visits of PPMI cohort patients for our study because, based on the inclusion and exclusion criteria of the PPMI cohort, the DaTscan values at screening were the least confounded by disease progression or anti-Parkinson’s drugs.

2.3. Quality Control

We performed several stages of quality control on the whole-exome sequence data before performing EWAS. In summary, exome sequencing was performed on whole blood-derived DNA samples following the PPMI Research Biomarkers Laboratory Manual using the Illumina Nextera Rapid Capture Expanded Exome kit in 2015. Nextera Expanded Exome targets 201,121 exons, UTRs, and miRNA and covers 95.3% of the Refseq exome. More details about the exome data in the vcf file can be found on the cohort website. The file contained data on 707,050 genetic polymorphisms, including SNPs and INDELs, of 645 subjects. We first converted the .vcf file to PLINK binary files with the bed suffix using PLINK 1.9 software (https://www.coggenomics.org/plink/). Quality control of the exomic data was also performed using PLINK 1.9. All SNPs on the sex chromosomes were excluded. SNPs with a minor allele frequency (MAF) of less than 5%, a genotype call rate of less than 95%, or a Hardy–Weinberg equilibrium (HWE) P value of less than 0.000001 were excluded. Participants with a genotype call rate of less than 95% or a heterozygosity rate more than three standard deviations from the mean were excluded. In the next step, the exomic data satisfying the quality control were merged with the neuroimaging features at screening visits and the demographic characteristics.
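The variant- and sample-level filters described above can be illustrated with a minimal Python sketch. It is not the PLINK implementation used in the study: the function names, the 0/1/2 genotype coding with None for a missing call, and the reading of the 95% genotyping threshold as a minimum call rate are assumptions made for illustration, and the Hardy–Weinberg test is omitted for brevity.

```python
from statistics import mean, pstdev

# Illustrative assumption: genotypes are minor-allele counts (0, 1, 2),
# and None marks a missing call.


def call_rate(calls):
    """Fraction of non-missing genotype calls."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the less common allele among non-missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)


def filter_variants(genotypes, min_maf=0.05, min_call_rate=0.95):
    """Return the variant IDs that pass the MAF and call-rate thresholds.

    `genotypes` maps a variant ID to a list of calls, one per subject.
    """
    return [
        variant_id
        for variant_id, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    ]


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return subject IDs whose heterozygosity rate lies more than
    `n_sd` standard deviations from the mean across subjects."""
    mu = mean(het_rates.values())
    sd = pstdev(het_rates.values())
    return [sid for sid, rate in het_rates.items() if abs(rate - mu) > n_sd * sd]
```

Here a variant is dropped if either threshold fails, and a subject is flagged regardless of the direction of the heterozygosity deviation, which matches the two-sided wording of the criterion.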

We used the NeuroX SNP data of the original cohort to perform GWAS. In summary, SNP genotyping was performed using the Illumina NeuroX array. The NeuroX array is an Illumina Infinium iSelect HD custom genotyping array containing 267,607 Illumina standard content exomic variants and 24,706 custom variants designed for studying neurological diseases. There were data of 267,607 SNPs from 619 subjects. All quality control steps were performed similarly to the exomic data. Finally, the genomic data satisfying the quality control were merged with data from DaTscan.

2.4. DaTscan

DaTscan was performed in PPMI imaging centers. The imaging protocol of PPMI is available at http://www.ppmi-info.org/study-design/research-documents-and-sops/. Before injecting ioflupane I-123, females with possible pregnancy underwent a urine pregnancy test. Spatial normalization for consistent orientation on DaTscan, reconstruction, and attenuation correction was conducted according to PPMI protocol. Count densities were extracted from striatal regions of interest, including the caudate nucleus and putamen, and the occipital cortex was considered the reference region. The regional SBR was calculated using the following formula: (SBR) = (striatal region)/(occipital) − 1 [21]. In our study, the SBR scores of subjects at the screening visit were used for the analysis.
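As a small worked illustration of the SBR formula above, the following Python sketch computes the ratio from two count densities. The function name and the example values are hypothetical and are not taken from the PPMI data.

```python
def specific_binding_ratio(striatal_density, occipital_density):
    """SBR = (striatal region) / (occipital reference) - 1."""
    if occipital_density <= 0:
        raise ValueError("occipital reference density must be positive")
    return striatal_density / occipital_density - 1.0


# Hypothetical count densities: a striatal region with three times the
# occipital reference density has an SBR of 2.
example_sbr = specific_binding_ratio(30.0, 10.0)
```

An SBR of 0 therefore corresponds to a striatal region with no binding above the occipital reference.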

2.5. EWAS and GWAS

We used GCTA 1.94.1 software (https://yanglab.westlake.edu.cn/software/gcta/#Download) to perform EWAS and GWAS. We used a mixed linear model with the --mlma command. Mixed statistical models were used and corrected for the following confounding factors in genome-wide association analyses: genetic relatedness and population structure (ethnicity). The details of the GCTA software and mixed model association methods have been described elsewhere [22, 23]. The mathematical equation of this statistical model is as follows.

y = a + bx + g + e, where y is the phenotype (DaTscan features in our study), a is the mean term, b is the additive effect (fixed effect) of the candidate SNP to be tested for association, x is the SNP genotype indicator variable coded as 0, 1, or 2, g is the polygenic effect (random effect), i.e., the accumulated effect of all SNPs (as captured by the genetic relatedness matrix (GRM) calculated using all SNPs), and e is the residual. We used age and gender as covariates. Using the --mlma-no-preadj-covar option, covariates were fitted together with the SNP for the association test. For ease of computation, the genetic variance, var(g), was estimated based on the null model, i.e., y = a + g + e, and then fixed while testing for the association between each SNP and the trait. Using the Bonferroni correction method, the suggestive and significant thresholds for P values were calculated based on the number of SNPs/INDELs used in our study. After performing EWAS and GWAS on six neuroimaging endophenotypes, we computed the λ-statistic to evaluate the degree of genomic inflation and to check for residual population stratification. Thereafter, we plotted the observed versus expected P values in Q-Q plots using the qqman R package in R software version 4.2.2. We also used the qqman R package to generate the Manhattan plots.
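The Bonferroni threshold and the λ-statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the significance level of 0.05 is assumed (0.05 divided by 83,623 gives about 5.98 × 10−7, in line with the EWAS threshold reported in the Results), and λ is computed by the conventional definition, the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.455). The function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided P values.

    Each P value (0 < p <= 1) is converted to a 1-df chi-square statistic
    by squaring the corresponding standard normal quantile.
    """
    standard_normal = NormalDist()
    chi2 = [standard_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A λ close to 1 indicates that the bulk of the test statistics follows the null distribution, whereas values well above 1 suggest inflation, for example from uncorrected population structure.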

2.6. Variant Annotation and Functional Fine Mapping

The positional gene mapping for all suggestive and significant SNPs/INDELs was performed using the UCSC genome browser website (https://genome.ucsc.edu/) based on the GRCh37/hg19 assembly. Using the Genotype-Tissue Expression (GTEx) version 8 (V8) data from the GTEx website (https://gtexportal.org/home/index.html), top genes with a significant eQTL P value related to the SNPs/INDELs in 3 tissues (caudate nucleus, putamen nucleus, and substantia nigra) were identified for functional annotation. Finally, information about the association of the identified genes with PD or PD-related phenotypes was evaluated on the GWAS catalog website (https://www.ebi.ac.uk/gwas/) and the Genecards website (https://www.genecards.org/) to identify novel candidate genes potentially associated with PD.

2.7. Replication of Significant Genes with CSF Proteome Data

To confirm the potential role of the statistically significant genes identified in EWAS and GWAS, we investigated whether the CSF concentration of the proteins produced by these genes is associated with PD progression. We used ordinal regression with PD status as the response variable (control = 0, prodromal = 1, and PD = 2) and age and gender as covariates. P values less than 0.05 were considered statistically significant.
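To make the structure of such a model concrete, the sketch below computes the category probabilities of a cumulative-logit (proportional-odds) model, one common form of ordinal regression. The text does not state which link function or software was used, so the proportional-odds form, the function names, and the cutpoint values are all assumptions for illustration; the sketch evaluates the model for given parameters and does not fit it to data.

```python
import math


def _logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(linear_predictor, cutpoints):
    """P(Y = k) for k = 0..len(cutpoints) under a cumulative-logit model.

    The model assumes P(Y <= k) = logistic(cutpoints[k] - linear_predictor),
    where the linear predictor is the weighted sum of the covariates
    (for example, protein concentration, age, and gender).
    `cutpoints` must be in increasing order.
    """
    cumulative = [_logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probabilities = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probabilities.append(cumulative[k] - cumulative[k - 1])
    return probabilities


# Hypothetical cutpoints separating control (0), prodromal (1), and PD (2).
baseline = category_probabilities(0.0, [-1.0, 1.0])
shifted = category_probabilities(1.0, [-1.0, 1.0])
```

With three ordered categories there are two cutpoints, and a positive coefficient on a covariate raises the linear predictor and shifts probability toward the higher categories, here toward PD.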

2.8. CSF Proteome Data

Proteomics data from the CSF of patients with PD and healthy volunteers were measured using the SOMAscan platform. The data were quality controlled by removing outlier samples, calibrators, buffer, and nonhuman SOMAmers. The measured values were hybridization normalized, plate scaled, median normalized intraplate, and calibrated at SomaLogic’s side, then log 2 transformed, median normalized interplates, and batch corrected at the plate level. More details about the quality control steps of proteomic data can be found on the PPMI website.

2.9. Results

In the exomic data, 557,019 SNPs/INDELs with a MAF of less than 5%, 60,100 SNPs/INDELs with a genotype call rate of less than 95%, 6,212 SNPs/INDELs with a Hardy–Weinberg P value of less than 0.000001, and 96 SNPs/INDELs on sex chromosomes were excluded. In addition, 95 subjects with a genotype call rate of less than 95% and 13 participants with a heterozygosity rate more than three standard deviations from the mean were excluded. Finally, the genetic data of 537 participants and 83,623 exomic SNPs/INDELs passed all steps of quality control. Among these 537 participants, data from DaTscan and demographic characteristics were available for 489 participants. Among the 489 participants, there were 124 healthy subjects, 317 patients with PD, and 48 subjects from the SWEDD cohort (patients with symptoms of PD and normal DaTscan). The mean age of participants was 61.23 ± 10.08 years. Finally, EWAS for six neuroimaging endophenotypes was performed on 489 subjects (318 males and 171 females). Based on the Bonferroni correction method, the statistically significant and suggestive thresholds in our EWAS were P value < 5.97 × 10−7 and 5.97 × 10−7 < P value < 1.19 × 10−4, respectively. The λ-statistic indicating the degree of genomic inflation was low in our 12 analyses. Manhattan plots and Q-Q plots of the six endophenotypes are shown in Supplementary Material 2.
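The exome quality-control counts reported above can be checked with simple arithmetic. The short Python sketch below assumes that the filters were applied sequentially, so that no variant or subject is counted in more than one exclusion category; the function and variable names are hypothetical.

```python
def remaining_after_exclusions(total, exclusions):
    """Subtract the per-filter exclusion counts from the starting total."""
    remaining = total - sum(exclusions.values())
    if remaining < 0:
        raise ValueError("exclusions exceed the starting total")
    return remaining


# Variants: 707,050 SNPs/INDELs in the exome data before quality control.
exome_variants = remaining_after_exclusions(
    707_050,
    {"maf": 557_019, "call_rate": 60_100, "hwe": 6_212, "sex_chromosomes": 96},
)

# Subjects: 645 in the exome data before quality control.
exome_subjects = remaining_after_exclusions(
    645,
    {"call_rate": 95, "heterozygosity": 13},
)
```

Under this assumption, the counts reproduce the 83,623 SNPs/INDELs and 537 participants that passed quality control.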

In the NeuroX genotyping data, 217,443 SNPs with a MAF of less than 5%, 12,461 SNPs with a genotype call rate of less than 95%, 191 SNPs with a Hardy–Weinberg P value of less than 0.000001, and 667 SNPs on sex chromosomes were excluded. In addition, 13 subjects with a genotype call rate of less than 95% and 2 participants with a heterozygosity rate more than three standard deviations from the mean were excluded. Finally, the genetic data of 606 participants and 36,845 SNPs passed all steps of quality control. Among these 606 participants, data from DaTscan and demographic characteristics were available for 559 participants. Among the 559 participants, there were 148 healthy subjects, 359 patients with PD, and 52 subjects from the SWEDD cohort (patients with symptoms of PD and normal DaTscan). The mean age of participants was 61.06 ± 10.34 years. Finally, GWAS for six neuroimaging endophenotypes was performed on 559 subjects (365 males and 194 females). Based on the Bonferroni correction method, the statistically significant and suggestive thresholds in our GWAS were P value < 1.35 × 10−6 and 1.35 × 10−6 < P value < 2.71 × 10−4, respectively. The λ-statistic indicating the degree of genomic inflation was low in our six analyses. Manhattan plots and Q-Q plots of the six endophenotypes are shown in Supplementary Material 3.

Among 83,623 SNPs/INDELs in EWAS, one SNP (rs201465075) reached the statistically significant threshold (P value = 4.03 × 10−7) for association with left caudate DaTscan SBR, and 33 SNPs were considered suggestive. Positional gene mapping identified 30 candidate genes associated with the DaTscan features. The results for all SNPs with their mapped genes are shown in Table 1.

Among 36,845 SNPs in GWAS, one SNP (rs12450112) reached the statistically significant threshold (P value = 1.34 × 10−6) for association with right anterior putamen DaTscan SBR, and 39 SNPs were suggestive, 8 of which were intergenic. For positional gene mapping of intergenic SNPs, the two nearest genes on the left and right sides of the SNP were reported. The complete results of GWAS are shown in Table 2.

Among all suggestive and significant SNPs/INDELs in EWAS and GWAS, 16 SNPs were eQTL for 12 genes in three tissues relevant to PD pathology (caudate nucleus, putamen nucleus, and substantia nigra): ANKRD65, B3GALT6, OBSCN, WNT3A, CCDC25, ZNF471, AC007228.11, IDUA, KRI1, UBE3D, GAS2L1P2, and VPS28. The complete data of mentioned genes with their eQTLs are shown in Table 3.

2.10. CSF Proteome Analysis

One of our 2 significant SNPs was mapped to the IGFN1 gene. The other significant SNP was near the MAP2K4 gene. We hypothesized that the concentration of the protein products of these 2 genes in CSF may be associated with PD progression. Only the CSF concentration of MAP2K4 protein was available in the PPMI CSF proteome dataset. The CSF concentration of MAP2K4 and demographic information were extracted for 1,156 subjects. The mean age of subjects was 61.64 ± 9.32 years. There were 637 males and 517 females. There were 185 healthy subjects, 354 patients in the prodromal phase of PD, and 617 patients with PD in our analysis. The mean CSF concentrations of MAP2K4 among healthy subjects, patients in the prodromal phase of PD, and patients with PD were 7.82 ± 0.49, 7.79 ± 0.53, and 7.95 ± 0.59, respectively. The mean CSF concentration of MAP2K4 among all subjects was 7.88 ± 0.56. There was a statistically significant association between the concentration of MAP2K4 in CSF and PD progression across the PD spectrum (P value = 0.001).

3. Discussion

Using data from the PPMI cohort, we performed EWAS and GWAS and identified two loci, 1q32.1 at the IGFN1 gene and 17p12 near the MAP2K4 gene, with a significant association with striatal DaTscan SBR, which may have implications for PD pathology. We also found several suggestive genes (Table 4) associated with DaTscan SBR. Furthermore, the CSF proteome analysis showed that increased CSF concentration of MAP2K4 is associated with PD progression.

Based on The Human Protein Atlas, immunoglobulin-like and fibronectin type III domain containing 1 (IGFN1) is highly expressed in the muscular tissue and has low expression in the central nervous system (CNS) and basal ganglia [24]. IGFN1 is essential for myoblast fusion and differentiation [25]. IGFN1 is upregulated during muscle denervation and interacts with eukaryotic translation elongation factor 1A (eEF1A), thereby downregulating protein synthesis during muscle denervation [26]. eEF1A2 knockdown has been associated with impaired autophagy, mitochondrial dysfunction, α-synuclein deposition, and apoptosis in the 1-methyl-4-phenylpyridinium ion (MPP+)-induced cellular model of PD [76]. Giri et al. indicated that loss of function in the IGFN1 gene is significantly associated with Parkinson’s disease [27]. Nikonova et al. compared differentially expressed genes between kinase hyperactive G2019S transgenic mice and mice with knockout of leucine-rich repeat kinase 2 (LRRK2), a common genetic cause of both autosomal dominant familial and sporadic PD. They indicated that IGFN1 gene was one of the differentially expressed genes between LRRK2 knockout mice and kinase hyperactive G2019S transgenic mice [77]. Lavin et al. found that rehabilitative training enhances IGFN1 expression in the skeletal muscle of patients with PD; however, data are scarce about its status in the CNS [78]. In our analysis, rs201465075 was significantly associated with left caudate SBR. The rs201465075 is a missense variant in IGFN1 gene. It may alter the IGFN1 protein structure. There are no data about the effect of this SNP on gene expression of nearby genes.

The MAP2K4 gene, also known as MKK4, encodes mitogen-activated protein kinase kinase 4, a member of the mitogen-activated protein kinase (MAPK) family [79]. This family of proteins is an integration point for intracellular signaling pathways and has been implicated in various cellular processes such as proliferation, differentiation, and transcription regulation [80]. The MAP2K4 gene has been previously implicated in the pathogenesis of Alzheimer’s disease [81]. Chen et al. showed that the G2019S mutation of LRRK2 enhanced its kinase activity and led to overphosphorylation of MAP2K4 [82]. LRRK2-mediated overphosphorylation of MAP2K4 upregulated and activated proapoptotic factors such as Fas ligand, caspase-9, caspase-8, and caspase-3 in the dopaminergic neurons of the substantia nigra in transgenic mice, resulting in apoptotic neuronal death [82]. However, no subjects in the GWAS and EWAS samples carried an LRRK2 mutation.

It was also observed that MAP2K4 activated c-Jun N-terminal kinase (JNK)-mediated apoptosis in the MPP+-induced cellular model of PD [83]. Consistently, inhibition of JNK attenuated MAP2K4-mediated neuronal death [83, 84]. Interestingly, Shakespear et al. found that miR-200a-3p targets the 3′-UTR of MAP2K4/MKK4 mRNA and downregulates MAP2K4 mRNA and protein expression, thereby preventing apoptosis of MPP+-treated SH-SY5Y cells [85]. In our study, rs12450112, near the MAP2K4 gene, was significantly associated with right anterior putamen SBR. In the absence of data about the effect of this SNP on gene expression, we hypothesized that MAP2K4 is the potential causal gene involved in PD pathology. Interestingly, CSF proteome analysis revealed that the CSF concentration of MAP2K4 increases across the PD spectrum. Thus, MAP2K4 may be a potential causal gene in PD pathology. However, other genes, such as myocardin (MYOCD), zinc finger protein 18 (ZNF18), dynein axonemal heavy chain 9 (DNAH9), and Rho GTPase activating protein 44 (ARHGAP44), are also located near our significant SNP, which may affect their expression levels as well. Thus, further studies, such as single cell-based integration of omic data or CRISPR-based experimental methods, are needed to link our significant SNP to its target genes.

Our study has several limitations. First, there are no data regarding the effect of our two significant SNPs on the gene expression or protein structure of related genes: we found no compelling evidence in bioinformatics databases that these SNPs affect the expression level of nearby genes, and no eQTLs were associated with them in external studies. Second, data on the protein level of IGFN1 do not exist in the PPMI cohort database. Third, DaTscan can be positive in various types of Parkinsonian syndromes; therefore, the significant genes identified in our analyses may be associated with other forms of Parkinsonian syndromes rather than PD. Finally, although using a mixed population increased the power of our study for detecting significant SNPs, the heterogeneity of our sample with respect to disease status is an important limitation.

4. Conclusion

Analyzing data from the PPMI cohort, EWAS and GWAS identified two potential genes, IGFN1 and MAP2K4, with a significant association with DaTscan SBR and PD. Furthermore, we found several suggestive genes (Table 4) for PD by EWAS and GWAS. Moreover, the CSF proteome data showed that the CSF concentration of MAP2K4 is associated with PD progression. Future studies should use larger sample sizes. The exact role of our significant SNPs and genes should be investigated in experimental and animal studies.

Data Availability

Data from the PPMI cohort were used for analysis in this study. Data can be obtained from their website.

Ethical Approval

All procedures were approved by the Institutional Ethics Committee.

Informed consent was obtained before data collection.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Supplementary Materials

Supplementary Material 1: the detailed protocol of PPMI cohort. Supplementary Material 2: Q-Q plot and Manhattan plot of right putamen, right caudate, right anterior putamen, left putamen, left caudate, and left anterior putamen DaTscan EWAS. Supplementary Material 3: Q-Q plot and Manhattan plot of right putamen, right caudate, right anterior putamen, left putamen, left caudate, and left anterior putamen DaTscan GWAS.