Abstract

Human leukocyte antigen- (HLA-) A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1 allele and haplotype frequencies were studied in a subset of 237 volunteer bone marrow donors registered at the South African Bone Marrow Registry (SABMR). Hapl-o-Mat software was used to compute allele and haplotype frequencies from individuals typed at various resolutions, with some alleles in multiple allele code (MAC) format. Four hundred and thirty-eight HLA-A, 235 HLA-B, 234 HLA-DRB1, 41 HLA-DQB1, and 29 HLA-C alleles are reported. The most frequent alleles were A*02:01g (0.096), B*07:02g (0.082), C*07:02g (0.180), DQB1*06:02 (0.157), and DRB1*15:01 (0.072). The most common haplotype was A*03:01g~B*07:02g~C*07:02g~DQB1*06:02~DRB1*15:01 (0.067), which has also been reported in other populations. Deviations from Hardy-Weinberg equilibrium were observed at the A, B, and DRB1 loci, with C~DQB1 being the only locus pair in linkage disequilibrium. This study describes allele and haplotype frequencies from a subset of donors registered at the SABMR, the only active bone marrow donor registry in Africa. Although the sample size was small, our results form a key resource for future population studies, disease association studies, and donor recruitment strategies.

1. Introduction

The ~4 Mb human leukocyte antigen (HLA) complex on chromosome 6 is amongst the most polymorphic gene regions in the human genome [1]. Seventeen thousand eight hundred and seventy-four (17,874) HLA alleles have been described in the IMGT/HLA database to date [2]. HLA gene products are key for antigen presentation to T cells and form the basis of host defense mechanisms against pathogens [3]. HLA also plays a role in vaccine development and has a determining role in transplantation outcome [4–11]. In hematopoietic stem cell transplantation (HSCT), good clinical outcomes are associated with high-resolution HLA matching [12, 13], with the number of mismatches correlating with the risk of rejection and/or graft versus host disease (GVHD) [14–16].

Bone Marrow Donors Worldwide (BMDW) is a centralized databank of HLA phenotypes and other relevant data of unrelated stem cell donors which aims to support HSCT programmes [17]. The South African Bone Marrow Registry (SABMR), a nonprofit initiative based in Cape Town, was started in 1991 with the objective of providing HLA-matched unrelated donors for South African patients and the world at large. The registry, listed in the BMDW, has more than 73,000 HLA-typed volunteer donors from South Africa [18]. Unrelated donor registries globally, including the SABMR, increase the chances of HLA matches for many patients in need of transplantation. Despite the high donor numbers globally, it is still difficult to find HLA matches for patients of Black African origin, partly because of (a) the great genetic diversity in these populations [19] and (b) limited information on HLA diversity [9]. Most transplants facilitated by the SABMR are from foreign donors, mainly due to the limited number of donors in the registry, particularly those of Black African and Asiatic/Indian origin [20]. There is thus a need to improve recruitment from these underrepresented populations into the SABMR, which, since 1997, has been the only registry on the African continent supporting an HLA-matched unrelated donor stem cell transplantation programme [20, 21].

Donor registries continuously try to improve their recruitment strategies through increasing donor numbers [22], recruiting young males [23], minority recruitment [24–26], recruiting donors with rare HLA phenotypes [27], or alternatively using currently available HLA allele and haplotype frequencies [25, 28]. Although there is limited HLA diversity data for Southern Africans (reviewed in [29]), Africans are considered to be genetically diverse [19] as has been determined using multiple markers [30–32], including HLA [33]. Most HLA families that exist globally are found in African populations [34], further confirming genetic diversity in these populations.

In this study, we describe HLA allele and haplotype frequency data from 237 donors registered with the SABMR, which serves as the source of unrelated marrow donors in South Africa. Frequencies of HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1 alleles and haplotypes were analyzed with the aim of developing a resource for disease association, anthropology, and evolutionary studies. Furthermore, these data will support models for population-specific vaccine development [35] and will improve donor recruitment strategies in South African populations.

2. Methods

2.1. Study Population, Data Access, and Ethics

Two hundred and thirty-seven (237) consenting, SABMR-registered volunteer bone marrow donors, HLA typed at varying resolutions, were included in this study. This subset was accessed following an extensive reconsenting procedure of donors in the SABMR. The self-reported ethnic grouping of the study population was Asian, Black, Chinese, Coloured, White, and some unknown. High-resolution typing has only recently been adopted by the SABMR, and most donors have low-resolution (two-digit) typing [20, 21], which did not meet the current study criteria. For ethical compliance, donors had to be reconsented to participate in the current study. As a result, only 237 of the potential 400 participants provided consent. Ethical clearance for this study was granted by the University of Pretoria, Faculty of Health Sciences Research Ethics Committee (220/2015) and the SABMR Board. Participants’ data accessed included HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1 loci molecular typings and self-reported ethnicity. Some typings in this dataset were represented by multiple allele codes (MAC, formerly NMDP allele codes) as described in https://hml.nmdp.org/MacUI/.

2.2. HLA Allele and Haplotype Frequency Analysis

Allele and haplotype (two, three, four, and five loci) frequencies were estimated by resolving phase and allelic ambiguities using the expectation-maximization (EM) algorithm [36, 37] in the open-source Hapl-o-Mat software [38]. This software allows for allele verification using the IMGT/HLA database (http://www.ebi.ac.uk/ipd/imgt/hla/) [2, 3] and recognizes ambiguities including MACs. Deviations from Hardy-Weinberg equilibrium (HWE) were assessed at locus level using a chi-squared test [39]. Global linkage disequilibrium (LD) and HWE tests were performed in Arlequin v3.5.2 [40]. MAC-coded alleles were reduced to two-digit resolution for HWE and LD analysis.
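To illustrate the EM principle used for haplotype frequency estimation, the following minimal Python sketch resolves phase ambiguity only. It is not the Hapl-o-Mat implementation: it assumes fully resolved alleles (no MACs or other allelic ambiguities), Hardy-Weinberg proportions for haplotype pairs, and a fixed number of iterations. The function names and the toy alleles are hypothetical.

```python
from collections import defaultdict
from itertools import product


def phase_options(genotype):
    """List the haplotype pairs compatible with an unphased genotype.

    genotype: tuple of (allele1, allele2) pairs, one pair per locus.
    """
    first, *rest = genotype
    options = set()
    # Keep the first locus fixed and try both orientations of every other locus.
    for flips in product((False, True), repeat=len(rest)):
        h1, h2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            h1.append(b if flip else a)
            h2.append(a if flip else b)
        options.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(options)


def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pairs_per_person = [phase_options(g) for g in genotypes]
    haplotypes = {h for pairs in pairs_per_person for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n_chromosomes = 2 * len(genotypes)
    for _ in range(iterations):
        counts = defaultdict(float)
        for pairs in pairs_per_person:
            # E-step: weight each compatible pair by its probability under HWE.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        freq = {h: counts[h] / n_chromosomes for h in haplotypes}
    return freq


# Toy data: two homozygous donors and one double heterozygote (phase unknown).
toy_genotypes = [
    (("A1", "A1"), ("B1", "B1")),
    (("A1", "A2"), ("B1", "B2")),
    (("A2", "A2"), ("B2", "B2")),
]
toy_freq = em_haplotype_frequencies(toy_genotypes)
```

In this toy example the two homozygous donors pull the ambiguous donor towards the A1~B1/A2~B2 phasing, which is the behaviour that makes small samples prone to overestimating common haplotypes.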

3. Results

3.1. Demographics and Allele Diversity

Self-reported ethnicity was not considered for analysis in this study owing to redundancy and simplicity of this classification as previously discussed [41, 42]. One hundred and thirty-one (131) Black, 69 Caucasian, 19 Mixed ancestry (Coloured), 15 Asian, 2 unknown, and 1 Chinese individuals were included in this study. Nine hundred and seventy-seven (977) different possible alleles are reported in this study (Table S1). There were 438 HLA-A, 235 HLA-B, 29 HLA-C, 234 HLA-DRB1, and 41 HLA-DQB1 alleles (Table S1), with the HLA-C locus having the lowest allelic diversity.

3.2. Hardy-Weinberg Equilibrium and Global LD Analysis

In this donor subset, HLA-A, HLA-B, and HLA-DRB1 genotypes deviated significantly from the expected HWE proportions, whereas HLA-C and HLA-DQB1 showed no significant difference between expected and observed heterozygosity (Table 1). No significant global LD was detected between the A~B, A~C, B~C, A~DRB1, B~DRB1, C~DRB1, A~DQB1, B~DQB1, and DRB1~DQB1 locus pairs (Table 2). In contrast, the C~DQB1 locus pair showed significant LD, as summarized in Table 2.
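As an illustration of a locus-level goodness-of-fit test for HWE, the sketch below computes a chi-squared statistic by comparing observed genotype counts with those expected from the sample allele frequencies. It is a simplified example under our own assumptions (hypothetical function name, fully resolved genotypes) and is not necessarily the procedure applied by Arlequin; with many rare alleles, as is typical for HLA, expected counts become small and exact tests are usually preferred.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_squared(genotypes):
    """Return (chi-squared statistic, degrees of freedom) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freq)
    statistic = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # Expected genotype proportion under HWE: p^2 or 2pq.
        p_genotype = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
        expected = n * p_genotype
        statistic += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(alleles)
    # Genotype classes minus one, minus (k - 1) estimated allele frequencies.
    return statistic, k * (k - 1) // 2
```

The statistic is then compared with a chi-squared distribution with the returned degrees of freedom to obtain a p value.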

3.3. HLA Allele Frequency

The full list of alleles, including those derived from MACs, and their frequencies are given in Table S1. The top 20 most frequent alleles across the five loci are summarized in Table 3, with the top three alleles per locus being A*02:01g (0.096), A*03:01g (0.093), and A*01:01g (0.057); B*07:02g (0.082), B*08:01g (0.049), and B*58:02 (0.048); C*07:02g (0.180), C*07:01g (0.104), and C*04:01g (0.091); DRB1*15:01 (0.072), DRB1*15:03 (0.065), and DRB1*07:01 (0.057); and DQB1*06:02 (0.157), DQB1*03:01 (0.139), and DQB1*05:01 (0.118).

3.4. HLA Haplotype Frequency

All two-, three-, four-, and five-locus (extended) haplotype frequencies are detailed in Supplementary Table 2 (Table S2), with the 20 most frequent haplotypes summarized in Tables 4 and 5 (extended haplotypes). The most common computed two-, three-, and four-locus haplotypes were B*07:02g~C*07:02g (0.145), C*07:02g~DRB1*15:01~DQB1*06:02 (0.107), and B*07:02g~C*07:02g~DRB1*15:01~DQB1*06:02 (0.108), respectively. We report a possible 7498 two-locus, 6446 three-locus, and 773 four-locus haplotypes in the SABMR subset of donors (Table S2). A*33:95~B*07:231N (1.08725E-06), A*03:01g~C*07:02g~DQB1*03:02 (1.03519E-06), and A*11:01g~C*01:02g~DRB1*01:01~DQB1*05:01 (2.8507E-06) were among the least frequent two-, three-, and four-locus haplotypes, respectively (Table S2). The twenty most frequent extended haplotypes (five loci) are summarized in Table 5, with A*03:01g~B*07:02g~C*07:02g~DRB1*15:01~DQB1*06:02 being the most frequent (0.067).

4. Discussion

Although this study had a limited sample size of 237, we provide an in-depth analysis of HLA diversity in a subset of donors in the SABMR. Mixed-resolution HLA typing data with multiple allele codes (https://hml.nmdp.org/MacUI) were analyzed using the Hapl-o-Mat package [38] to compute allele and haplotype frequencies through the EM algorithm. The package supports typing ambiguities in NMDP codes (MAC), G group, and GL string formats. Since Hapl-o-Mat does not compute LD and HWE, we reduced all MAC-encoded typings in our data set to two-digit resolution to estimate these parameters in Arlequin v3.5.2 [40]. Although the loss of some allele information may have led to underestimation, global LD and HWE deviation are important parameters in genetic studies.
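The reduction to two-digit resolution amounts to keeping only the first field of each allele name. A minimal sketch, assuming colon-delimited allele names (the helper names and the MAC string in the usage note are hypothetical and only for illustration):

```python
def to_first_field(allele):
    """Truncate an HLA allele name to its first field ("two-digit" resolution)."""
    return allele.split(":", 1)[0]


def reduce_genotype(genotype):
    """Reduce both alleles of a single-locus genotype to first-field resolution."""
    return tuple(sorted(to_first_field(a) for a in genotype))
```

For example, `reduce_genotype(("A*02:01g", "A*30:XYZ"))` gives `("A*02", "A*30")`. Distinct high-resolution alleles that share a first field collapse into one category, which is the source of the information loss noted above.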

Strong LD of the C~DQB1 locus pair (Table 2) in our study suggests limited recombination between alleles at these loci in our population and hence a greater chance of their being inherited together. LD patterns of HLA or other genes may be used to infer evolutionary relatedness of populations [44]. Generally, individuals with haplotypes in LD are more likely to find haplomatches, and strong LD is indicative of evolutionary relatedness of those alleles/loci. Carvalho and colleagues [45] report HLA-A, HLA-B, and HLA-DRB1 in HWE, which contrasts with the significant deviation observed in the current study (Table 1). Sample size and mixed typing resolution in the current study may have affected HWE proportions. When there is no deviation from HWE, HLA data may be used to infer human peopling history in anthropological studies [46]. Furthermore, there is evidence of large HWE deviations influencing EM algorithm-based allele and haplotype frequency estimations [47]. It is thus important to note the sample size and mixed typing resolution limitations of the current study in interpreting HWE and LD analysis.
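For readers less familiar with LD, the standard pairwise coefficients D and D' for a single pair of alleles can be computed from one haplotype frequency and the two corresponding allele frequencies, as sketched below. This allele-pair measure is given for illustration only; it is not the global locus-pair test reported in Table 2, and the function name and input values are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for alleles a and b.

    p_ab: frequency of the haplotype carrying both alleles.
    p_a, p_b: frequencies of the individual alleles.
    """
    d = p_ab - p_a * p_b  # zero when the alleles associate at random
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies, not taken from Tables 3 or 4.
d, d_prime = linkage_disequilibrium(p_ab=0.10, p_a=0.20, p_b=0.30)
```

A positive D means the two alleles occur together on the same haplotype more often than expected from their individual frequencies; D' rescales D by its maximum attainable value given those frequencies.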

Taking into account the nature of the HLA data in the current study, we report 977 possible alleles (Table S1). HLA-C had the lowest number of alleles (29), and HLA-A had the highest (438). There are generally more reported HLA-B alleles in the HLA database [2, 3]. We note though that previously most registries routinely typed HLA-A, HLA-B, and HLA-DRB1 for new donors, with few being typed for HLA-C and HLA-DQB1 [48]. This might explain the observed allele numbers in our study. There is an ever-increasing number of alleles in the database (currently 17,874 in the IMGT/HLA database release 3.31) [2, 3], with South Africa contributing some unique alleles [49, 50].

HLA-A*02:01g, with a frequency of 9.6% in the current study, has been reported in North West England Caucasians at a higher frequency of 28.9% [51]. This English study also reported B*07:02g, C*07:02g, and DRB1*15:01 at frequencies of 15.3%, 15.6%, and 15.9%, respectively [51], compared to 8.2%, 18.0%, and 7.2% in the current study. It is important to note that the fifth most common allele in our study, namely, A*30:02g (5% frequency in Table 3 and Table S1), is identical (exon 2 and 3 amino acid sequence) to a novel A*30:02:01:03 allele previously reported in a SABMR donor [49]. HLA-DQB1*06:02 (15.7%) has been observed at higher frequencies in previous studies in West Africans (30.8%) and Shona Zimbabweans (24.7%) and is lower in Kenyans (14.6%), Colombians (15.0%), and people from Papua New Guinea (15.0%) [26]. HLA-DRB1*15:01 (7.2%) in the current study (Table 3) has been reported previously in South African populations at varying frequencies: 11.2% in Caucasians and 2.4% in Black Africans [26]. Additionally, DRB1*15:01 had a 3.8% frequency in Inuit women [52], 11.65% in Chinese [53], and more than 50% in North Africans, Asians, people from Oceania, and Europeans [54].

The main thrust of our study has been the ability to estimate, with high confidence, haplotype frequencies from mixed-resolution typings including MAC-encoded (https://hml.nmdp.org/MacUI) alleles [38]. No record of the most frequent two-, three-, and four-locus haplotypes reported in this study (Table 4 and Table S2) is found in the allele frequency database [2, 3, 55]. The most frequent (6.7%) extended haplotype, A*03:01g~B*07:02g~C*07:02g~DRB1*15:01~DQB1*06:02, has previously been reported amongst Chinese populations at lower and varying frequencies (0.93–5.20%) [53]. There is no record of this haplotype in African populations in the AFND allele frequency database [56]. A lower frequency (3.31%) of this haplotype has also been reported in a German registry as described by Sauter and colleagues [57].

Haplotype frequencies from a specific population may be useful for resolving typing ambiguities using statistical approaches in typing prospective individuals from the same population [58]. It is important though to note that sample size affects these computations, with a tendency towards haplotype overestimation in small sample-sized studies [35]. Other confounders include typing ambiguity as previously described [59]. Additionally, multilocus haplotype frequency estimation better informs disease association studies than allele frequency [47]. A complete list of donor registry HLA haplotype frequencies better informs donor-patient matching tools like EasyMatch® [60], NMDP HapLogic [61, 62], and OptiMatch [63], especially for patients of African origin who might benefit from donors in the SABMR. These tools use haplotype frequencies to compute the likelihood of a donor-patient match and also anticipate the most likely mismatches. Haplotype frequency may be used to estimate the probability of finding a recipient match or may give an indication of the likelihood of mismatches from initial registry searches [35]. Additionally, haplotypes are better indicators of HLA match estimation compared to allele frequency alone [35]. Variations in allele frequency distribution in populations in general provide insight into peopling history [64, 65]. HLA genetic makeup of populations provides insight into history including selective pressures by pathogens [33], migration, admixture, and changes in population size [54, 66–68].
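The link between haplotype frequencies and match likelihood can be illustrated with a deliberately simplified calculation. The sketch below assumes HWE, independent donors, a single haplotype pair per genotype, and a registry drawn from the same population as the patient; it does not reproduce the algorithms of EasyMatch®, HapLogic, or OptiMatch. The function names, the second haplotype frequency, and the registry size are hypothetical.

```python
def genotype_frequency(h1_freq, h2_freq, homozygous=False):
    """Expected frequency of a haplotype pair under Hardy-Weinberg proportions."""
    return h1_freq ** 2 if homozygous else 2 * h1_freq * h2_freq


def prob_at_least_one_match(genotype_freq, registry_size):
    """Probability that at least one of registry_size donors carries the genotype."""
    return 1 - (1 - genotype_freq) ** registry_size


# Patient carrying the most frequent extended haplotype from this study (0.067)
# together with a hypothetical second haplotype of frequency 0.01.
g = genotype_frequency(0.067, 0.01)
p_match = prob_at_least_one_match(g, registry_size=1000)  # hypothetical registry size
```

Because the match probability depends on the product of two haplotype frequencies, patients carrying rare haplotypes require far larger donor pools, which is why frequency data from underrepresented populations matter for recruitment planning.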

Allele and haplotype frequencies from this study highlight the need for continued analysis by the SABMR for a better understanding of HLA diversity in the region. There is limited HLA diversity data for South African populations (reviewed in [29]), despite the evident value in transplantation, donor recruitment, disease association, and population studies. In addition, some registries specifically aim to improve recruitment from ethnic minorities [25] to increase the HLA diversity and hence the probability of finding an appropriate donor for a given patient. In this context, knowledge of the distribution of alleles and haplotypes in many different population groups, as determined by high-resolution typing, may allow for modification of recruitment strategies.

5. Conclusions

Although the results reported here are from a small subset of SABMR-registered donors, allele and haplotype frequencies generated by the Hapl-o-Mat tool [38] could be a useful resource for future anthropological and population genetics studies in South Africans. Furthermore, these findings may better inform donor recruitment strategies for the SABMR. The small sample size limitation of this study also highlights the need for larger studies in order to better understand HLA diversity in South African populations. It would also be interesting to analyze the whole donor registry and compare its HLA diversity data to other registries globally.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

The authors thank the SABMR donors for their willingness to participate in this study. This research and the publication thereof is the result of funding provided by the South African Medical Research Council (SAMRC) in terms of the MRC’s Flagships Awards Project SAMRC-RFA-UFSP-01-2013/STEM CELLS, the SAMRC Extramural Unit for Stem Cell Research and Therapy, the Institute for Cellular and Molecular Medicine of the University of Pretoria, and the National Research Foundation of South Africa.

Supplementary Materials

Supplementary Table 1 (Table S1): HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1 allele frequencies in 237 volunteer bone marrow donors registered in the South African Bone Marrow Registry. The 237 individuals described herein are a subset of all SABMR registered donors. Supplementary Table 2 (Table S2): Two, three, four, and five loci Hapl-o-Mat [38] estimated haplotype frequencies in 237 volunteer bone marrow donors registered in the South African Bone Marrow Registry. The 237 individuals described herein are a subset of all SABMR registered donors.