Abstract

Rheumatoid arthritis (RA) is a complex and multifactorial autoimmune disorder with the involvement of multiple genetic and environmental factors. Genome-wide association studies (GWAS) have identified more than 50 RA genetic loci in European populations. Given the anticipated overlap of RA-relevant genes and pathways across different ethnic groups, we sought to replicate 58 GWAS-implicated SNPs reported in Europeans in Pakistani subjects. 1,959 unrelated subjects comprising 1,222 RA cases and 737 controls were collected from three rheumatology facilities in Pakistan. Genotyping was performed using iPLEX or TaqMan® methods. A total of 50 SNPs were included in the final association analysis after excluding those that failed assay design/run or postrun QC analysis. Fourteen SNPs (LINC00824/rs1516971, PADI4/rs2240336, CEP57/rs4409785, CTLA4/rs3087243, STAT4/rs13426947, HLA-B/MICA/rs2596565, C5orf30/rs26232, CCL21/rs951005, GATA3/rs2275806, VPS37C/rs595158, HLA-DRB1/rs660895, EOMES/rs3806624, SPRED2/rs934734, and RUNX1/rs9979383) were replicated in our Pakistani sample at false discovery rate (FDR) of <0.20 with nominal values ranging from 4.73E-06 to 3.48E-02. Our results indicate that several RA susceptibility loci are shared between Pakistani and European populations, supporting the role of common genes/pathways.

1. Introduction

Rheumatoid arthritis (RA) is a complex autoimmune disorder due to the involvement of many genetic, environmental, and stochastic elements in its etiology [1, 2]. It is characterized by continuous inflammation associated with altered expression of different proinflammatory (TNFα, IL-1, IL-17) and anti-inflammatory (IL-4, IL-10, IL-13, IL-35) cytokines and activation of B and T cells, which leads to the destruction of synovial joints and ultimately physical disability [3, 4]. Worldwide, RA affects 0.5 to 1% of the population [5]. RA prevalence does not differ significantly between rural and urban areas [6]. It can affect both sexes at any age, but data suggest that women are three times more prone than men [7]. There is a significant genetic susceptibility to RA with 50-60% heritability estimates [8]. Twin studies also revealed high concordance rates in monozygotic twins (12.3-15.4%) relative to dizygotic twins (3.5%), strongly indicating the presence of a genetic component in RA [9].

During the last two decades, many genetic studies, including genome-wide association studies (GWAS), have been conducted to understand the genetic basis of RA and this has led to the identification of more than 50 risk loci [10]. Besides HLA-DRB1, many non-HLA loci have been implicated that are involved in multiple functions/pathways including immune activation (NF-κB signaling, JAK-STAT pathway, regulation, and activation of CD4+T-cells, complement activation), fibroblast differentiation and dedifferentiation, and bone modeling and repair [11, 12].

Thus far, the major focus of genetic studies on RA has largely been on Europeans or European-derived populations, with only limited data available on other populations [13], especially in South Asians [14]. In an effort to expand genetic studies on RA in South Asians and to further delineate its genetic basis, we sought to replicate 58 genome-wide significant single nucleotide polymorphisms (SNPs) from 58 loci previously implicated in Europeans or European-derived populations [1518] in Pakistani population where there is the paucity of such data.

2. Material and Methods

2.1. Study Subjects

A total of 1,959 subjects (1,222 cases, 737 controls) were included in our study. Blood samples and relevant clinical information were collected from three major public or private rheumatology clinics in Pakistan: Pakistan Institute of medical sciences (PIMS), Military hospital, and Rehmat Noor Clinic. All cases included in this study (, 78.6% women) were clinically diagnosed by rheumatologists and met the American College of Rheumatology (ACR) 1987 classification criteria for RA [19]. All controls included in this study (, 39.5% women) were recruited from the general population and were free of any autoimmune disease at the time of recruitment. A screening questionnaire was filled out and a written informed consent was obtained from each subject at the time of the recruitment. All blood samples were collected in EDTA tubes to avoid coagulation and processed shortly after the collection. The study was approved by the Institutional review board (IRB) of the University of Pittsburgh, USA (IRB no. PRO12110472).

2.2. Genomic DNA Extraction

Genomic DNA was extracted from whole blood using either the standard phenol-chloroform extraction method or GeneJET Whole Blood Genomic DNA Purification (Thermo Scientific USA) and quantified using the NanoDrop™ 2000 spectrophotometer (Thermo Scientific USA).

2.3. Genotyping

A total of 58 RA-associated genome-wide significant SNPs () was selected from the previously published GWAS in subjects of European ancestry (Table 1). SNP genotyping was performed using either the TaqMan® (Applied Biosystems, Thermo Fisher Scientific) or iPLEX® Gold (Agena Bioscience) methods and following the manufacturer’s design/order instructions and protocols. After the thermal cycling of the TaqMan® assays and DNAs on 384-well plates, the endpoint fluorescence reading was performed on a QuantStudio™ 12K Flex system (Applied Biosystems, Thermo Fisher Scientific). The iPLEX® Gold genotyping was performed in the Core laboratories of the University of Pittsburgh. 18% replicates were used to test genotyping consistency.

2.4. Statistical Analysis

Concordance to Hardy-Weinberg Equilibrium (HWE) was tested using the Chi-square goodness of fit test. Departure from HWE was considered at arbitrarily . Logistic regression using an additive model and minor allele as the effect allele was employed for case-control association analysis using sex and age as covariates. The Benjamin Hochberg false discovery rate (FDR) was applied to correct for multiple testing [20]. was considered as suggestive evidence of association and FDR ( value) of <0.20 as statistically significant as used in previous reports [21, 22]. All analyses were implemented in R, version 3.4.4.

2.5. Functional Annotations

To evaluate the potential biological significance of reported genome-wide significant SNPs, we used the Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/home/) to search for expression quantitative trait loci (eQTL) in RA-relevant tissues and whole blood. We also used the RegulomeDB online database (http://regulome.stanford.edu/) to determine possible regulatory functions of the SNPs located in noncoding regions.

3. Results

A total of 1,222 unrelated RA cases and 737 controls were recruited for this research study. The prevalence of RA was higher in females (78%) than males (22%), supporting the earlier data that females are more prone to RA [14].

Eight of the 58 genotyped SNPs failed the QC (quality control) during assay design (either iPLEX® Gold/TaqMan® or both). The genotype distribution of all QC-passed 50 SNPs adhered to the HWE. The association analyses results in our Pakistani sample are presented in Table 2. Fourteen SNPs showed nominal significance at and FDR of ≤0.20.

We found the same direction of association where the minor allele showed the same risk or protective effect as compared to the previous findings on the tested allele (major or minor). In our data, LINC00824/rs1516971 showed the most significant association (). The second most significant SNP was PADI4/rs2240336 () followed by CEP57/rs4409785 (), CTLA4/rs3087243 (), STAT4/rs13426947 (), and HLA-B-MICA/rs2596565 (). Eight other SNPs showed marginal significance: C5orf30/rs26232 (), CCL21/rs951005 (), GATA3/rs2275806 (), VPS37C/rs595158 (), HLA-DRB1/rs660895 (), EOMES/rs3806624 (), SPRED2/rs934734 (), and RUNX1/rs9979383 (). Figure 1 shows the distribution of tested SNPs across the genome where the SNPs with value < 0.05 are labeled.

Next, we examined the functional significance of all 50 SNPs using the GTEx and RegulomeDB databases. Table 3 shows the category summaries of RegulomeDB scores, and Table 4 shows 14 SNPs that had a RegulomeDB score of ≤3, indicating strong evidence of potential regulatory role. SNPs falling in this category have more likelihood to affect the binding of transcriptional factors. Out of these fourteen SNPs, only five (HLA-B/MICA/rs2596565, HLA-DRB1/rs660895, C5orf30/rs26232, CTLA4/rs3087243, and RUNX1/rs9979383) had in our association results. SYNGR1/rs909685 had the top RegulomeDB score of 1b followed by FADS2/rs968567, CD40/rs4810485, CCR6/rs3093023, HLA-B/MICA/rs2596565, and HLA-DRB1/rs660895 with a score of 1f. The latter two SNPs (HLA-B/MICA/rs2596565 and HLA-DRB1/rs660895) were significant in our sample ( and ) and based on RegulomeDB, both SNPs are eQTL for HLA-DQA1. While HLA-DRB1/rs660895 affects the binding of two proteins (BCLAF1 and POLR2A), no such evidence was found for HLA-B/MICA/rs2596565. SYNGR1/rs909685 is an intronic variant, which falls in the p53decamer binding motif and affects the binding of five different proteins (MAX, MYC, PAX5, TRIM28, and EBF1). FADS2/rs968567 is also an intronic variant and it affects the binding of twenty-eight proteins, including MAX, POLR2A, NFKB1, and NFIC. The binding of these proteins is also affected by the CD40/rs4810485 variant. An intronic CCR6/rs3093023 variant falls in the Oct-1 binding motif and is eQTL for CCR6.

GTEx data eQTL showed the lowest p-eQTL (2.59E-34) for FADS2/rs968567 that affects FADS2 gene expression in transformed fibroblast cells; the same variant also affects TMEM258 (), ZBTB3 (), MYRF (), DAGLA (), and RAB3IL1 (). In transformed fibroblast cells, LINC00824/rs1516971 was a weak eQTL () for RP11-89M16.1. No data was available for CTLA4/rs3087243 in GTEx.

STAT4/rs13426947 was eQTL for STAT4, AC005540.3, and RP11-647K16.1 in EBV-transformed lymphocyte cells. HLA-B/MICA/rs2596565 was eQTL for multiple immune-related genes including C4A, C4B, HLA-S, HLA-B, HLA-C, and MICA in all three tested tissues. C5orf30/rs26232 showed the strongest eQTL () with PPIP5K2 only in whole blood. Additional details of GTEx data are given in supplementary Table S1.

4. Discussion

There have been a number of genome-wide association studies on RA over the past decade that have resulted in the identification of many RA susceptibility loci, thus improving our understanding of the complex genetic underpinning of RA [1517]. Most of the recently published GWAS have been conducted on Europeans and North Americans, which have identified many new RA risk loci and replicated and confirmed the previously reported putative risk loci. Many of these reported risk loci are shared among different ethnic groups, while some are specific to certain populations [23]. Replication studies across different ethnic groups are necessary to better understand the globe-wide population specificity of these established RA risk loci and guide future directions. Since South Asians, including the Pakistani population, have significant European ancestry [24, 25], we chose European originated GWAS-implicated SNPs for replication in Pakistanis. For this purpose, we enrolled 1,959 unrelated RA cases and controls from two public and private Rheumatology facilities, where people from different demographic regions of Pakistan visit for treatment, and we successfully genotyped 50 GWAS—implicated SNPs in those subjects.

In our data set, rs1516971 was the most significant SNP (), which is an intronic variant in the LINC00824 gene at chromosome 8q24.21. Previously, this variant has also been reported in a trans-ethnic association study of RA () [16]. Another SNP (rs6651252) from the same region, which is in complete LD () with rs1516971, has also been reported to be associated with RA and Crohn’s disease [15, 26, 27]. GTEx expression data suggests that rs1516971 affects the expression of RP11-89M16.1 at chromosome 8 in transformed fibroblast cells ().

The second most significant SNP in our data was rs2240336 (), which is present in the 9th intron of PADI4 at chromosome 1p36. PADI4 is one of the most important RA susceptibility loci in multiple ethnic groups, including Europeans, Asians, and Latin Americans [28, 29]. Another SNP in this region, rs2301888 that is in strong LD () with rs2240336, has also been reported to be associated with RA in Europeans and Koreans [30]. However, a study on Iranian population found no significant association of two other SNPs (rs11203367 and rs874881) in the PADI4 region with RA, which are not in strong LD () with rs2240336 [31]. GTEx expression data showed that rs2240336 is eQTL for RP4-798A10.2 () and PADI3 () in EBV transformed lymphocytes. The rs2240336 SNP also controls the expression of FBXO42 () in transformed fibroblasts. The third most important SNP in our data is CEP57/rs4409785 (), which is an intergenic variant on chromosome 11q21. This SNP has also been reported to be associated with other immune-related diseases such as multiple sclerosis, vitiligo, and Graves’ disease [3234]. In GTEx expression data, rs4409785 was eQTL for JRKL () in EBV-transformed lymphocytes, and for AP001877.1 (), and MAML2 () in transformed fibroblast cells.

Among the reported genome-wide significant SNPs that did not show significant association in our sample, the top functional ones based on the GTEx and RegulomeDB database included SYNGR1/rs909685, FADS2/rs968567, CD40/rs4810485, CCR6/rs3093023, and IKZF3/rs12936409. SYNGR1/rs909685 has a RegulomeDB score of 1b and is a strong eQTL for SYNGR1 in EBV-transformed lymphocytes () and whole blood (). FADS2/rs968567 has a RegulomeDB score of 1f and is eQTL for FADS2 in EBV-transformed lymphocytes (), whole blood and () transformed fibroblasts (). CD40/rs4810485 also has a high RegulomeDB score of 1f and is strong eQTL for CD40 in whole blood () and transformed fibroblasts () but weak eQTL for CD40 in EBV-transformed lymphocytes (). CCR6/rs3093023 with RegulomeDB score of 1f was the strongest eQTL for RNASET2 in transformed fibroblasts () and whole blood () with no data reported for EBV-transformed lymphocytes in GTEx. IKZF3/rs12936409 was a strong eQTL for ORMDL3 in all three tested tissues. In GTEx expression data, rs12936409 was found to control the expression of GSDMA, GSDMB, and ORMDL3 in all tested cells and whole blood.

5. Conclusions

We were able to successfully replicate 14 of 50 SNPs selected from previously published GWAS results in European populations in our unique Pakistani population. These findings suggest that there is a sharing of RA risk loci among different population groups. A weakness of our study is the availability of relatively small-sized sample, which may have prevented 36 SNPs from achieving the significance threshold, although they showed a trend for the same directional effects as the reported ones. Further studies using larger samples may help to identify and replicate more RA risk loci in the Pakistani population.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

All authors declared no competing interests.

Acknowledgments

We are very thankful to all Rheumatologists and their staff for providing us blood samples of RA patients and the Higher Education Commission (HEC) and the University of Pittsburgh for providing funding. We thank all the study subjects (RA patients and control subjects) for their participation in this study.

Supplementary Materials

S1: additional details of GTEx data. (Supplementary Materials)