Abstract

The frequent mislabelling of causal relationships between drug hypersensitivity reactions and culprit drugs reinforces the need for accurate diagnosis. The systematic reviews and meta-analyses of in vitro assays published so far have focused on immediate reactions and the most severe delayed reactions, while the most frequent drug-induced delayed reactions, nonsevere exanthemas, have been underrepresented. We aim to fill this gap. A systematic review of studies on in vitro assays used in the diagnosis of nonsevere drug-induced delayed reactions was conducted following the methodology of the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies Statement. The EMBASE and PubMed databases were searched. We included 33 studies, from which we extracted the data and then performed a meta-analysis where possible or synthesised the evidence narratively. The quality of the analysed studies was assessed with the QUADAS-2 tool. The most frequently identified tests were the lymphocyte transformation test (LTT), ELISpot, and ELISA. In the meta-analysis carried out for LTT in reactions induced by beta-lactams, the pooled estimates of sensitivity and specificity were 49.1% (95% CI: 14.0%, 85.0%) and 94.6% (95% CI: 81.7%, 98.6%), respectively. The studies were heterogeneous in design and laboratory settings, which resulted in a wide range of specificity and sensitivity of testing.

1. Introduction

Drug hypersensitivity reactions (DHRs) are common and are an important cause of unsuccessful treatment or of the need to apply second-choice pharmacotherapy in daily clinical practice. They represent not only a health problem but also a significant financial burden for affected individuals and health systems [1–3]. DHRs are clinically classified as immediate reactions (IRs) or nonimmediate/delayed reactions (NIRs) [3, 4]. IRs trigger a spectrum of symptoms from mild to severe, including urticaria, angioedema, or anaphylaxis [3]. NIRs cover the full range of clinical manifestations, from maculopapular exanthema (MPE) or fixed drug eruption (FDE) to severe cutaneous adverse reactions (SCARs) such as Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN), acute generalized exanthematous pustulosis (AGEP), or drug reaction with eosinophilia and systemic symptoms (DRESS) [5, 6].

As for other diseases, misdiagnosis is the major factor that increases the burden and costs of the disease for patients [7, 8]. Over recent years, mislabelling has proved to be a relevant problem in two opposite directions: patients falsely labelled as allergic and patients falsely labelled as nonallergic. This reinforces the need for accurate diagnosis followed by appropriate management [3]. The sensitivity and specificity of individual tests or assays used in the diagnosis of drug-induced reactions (immediate and delayed) are frequently investigated and discussed in the literature [3, 9–11]. However, studies in which the authors attempted to perform a meta-analysis (MA) based on studies identified through a systematic review of medical databases are rare and have focused on immediate reactions [12].

Sousa-Pinto et al. have recently published an excellent systematic review with meta-analysis dedicated to the accuracy of penicillin allergy diagnostic tests [12]. However, the field of delayed DHR is far underrepresented in this respect. We previously published a detailed analysis of in vitro assays as potential diagnostic tests in SCARs [10], but a more in-depth analysis of this issue for MPE and other mild and moderate clinical manifestations of delayed DHR is still lacking. Given the high prevalence of these reactions in daily practice, a reliable and well-established diagnostic tool would facilitate the management of these clinical conditions. Therefore, we set out to systematize the available knowledge of in vitro assays used to diagnose delayed reactions other than SCARs and, where suitable tests were available, to carry out an MA of them.

2. Methods

2.1. Design

A systematic review of primary studies of in vitro tests used in the diagnosis of delayed DHR (other than SCARs) was conducted according to the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) Statement [13]. Drug-induced reactions were classified as delayed on the basis of the judgement of the authors of the particular publication analysed.

2.2. Criteria for Considering Studies

Publications meeting the following criteria were included in the review: (1) the publication concerned the diagnosis of nonimmediate drug hypersensitivity reactions other than SCARs (i.e., SJS/TEN, DRESS, and AGEP); (2) the test was conducted in vitro; (3) the study was conducted in a population of at least 5 patients; (4) the results were presented in a form that allows sensitivity, specificity, or positive/negative predictive value (PPV/NPV) to be estimated, with the value either given directly or estimated from the number of patients with positive and negative test results; and (5) the publication was in English or Polish, which is the native language of the authors. The following publication types were excluded: letters to the editor, conference abstracts, books, and documents. No restrictions on the publication date were applied.

2.3. Search Strategy

The databases EMBASE and PubMed (Medline) were searched in December 2020, without limitation on publication date. The search strategy was created in PubMed using keywords related to drug hypersensitivity, delayed symptoms of drug allergies, and in vitro tests. It was then adapted for EMBASE but kept as similar as possible. To ensure that no important study was omitted, the references of the included works were checked for compliance with the subject of the study and the inclusion/exclusion criteria. Furthermore, in August 2020, a systematic review of the latest secondary studies, consisting of reviews and systematic reviews (i.e., published after January 1, 2018), was carried out to find primary studies that were not identified in the main review. A detailed search strategy, the inclusion criteria, and a description of the selection process are available in Supplementary Data.

2.4. Study Selection

Publications were selected in two stages. First, the titles and abstracts of the articles retrieved were analysed, and a list of studies that initially met the inclusion criteria was developed. In the second stage, full-text copies were obtained and checked to qualify the studies for inclusion in the analysis. The selection was carried out by two independent authors (E.R. and S.D.). In case of disagreement of opinions during the verification of full texts, the final position was developed by consensus with the participation of a third party (P.D.).

2.5. Data Extraction

The key data from the selected studies were extracted by one of the authors (E.R.) and then verified by another (S.D.). The extraction was carried out using a previously prepared, standardized form that included the following: (i) study characteristics (country, setting, and sampling method); (ii) population characteristics (number of patients and age group); (iii) information on drug hypersensitivity (drugs that trigger allergic reactions as reported by participants and clinical manifestation); (iv) information on the applied in vitro tests (type of test and threshold for the definition of a positive result); and (v) results. If the study involved more than one group of drugs, the extraction was also carried out by drug group. The endpoints assessed in this review were selected a priori and included the sensitivity, specificity, or PPV/NPV of the test.

2.6. Methods of Synthesis

For each of the tests, we extracted the frequency with which a positive result occurred. On the basis of this data and information about the results of reference tests or the patient’s medical history (confirmed symptoms of drug allergy), we determined true positive (TP), true negative (TN), false positive (FP), or false negative (FN) values for each test. Where such an estimation was not possible, the values of parameters such as sensitivity, specificity, or PPV/NPV presented in the publication were used. If the study involved subgroups (i.e., acute or postrecovery phase) or data was split by drug or hypersensitivity reaction, the results were obtained both for the general population and for each subgroup (if available). Studies were grouped and analysed by test type.
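The derivation of accuracy measures from the four cell counts can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' extraction code; the class name `TwoByTwo`, the function `accuracy_measures`, and the example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a diagnostic 2x2 table (hypothetical container)."""
    tp: int  # test positive, reaction confirmed
    fp: int  # test positive, reaction not confirmed
    fn: int  # test negative, reaction confirmed
    tn: int  # test negative, reaction not confirmed


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # A measure is undefined when its denominator is zero,
    # e.g. PPV when no test result was positive.
    return numerator / denominator if denominator > 0 else None


def accuracy_measures(t: TwoByTwo) -> Dict[str, Optional[float]]:
    """Return sensitivity, specificity, PPV and NPV as fractions (or None)."""
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        "specificity": _ratio(t.tn, t.tn + t.fp),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts, not taken from any included study.
example = TwoByTwo(tp=8, fp=2, fn=4, tn=16)
measures = accuracy_measures(example)

# A table with no positive results at all: PPV cannot be calculated.
no_positives = accuracy_measures(TwoByTwo(tp=0, fp=0, fn=3, tn=5))
```

Returning `None` instead of raising an error mirrors the situation in which a predictive value cannot be calculated because no test result falls in the relevant row.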

2.7. Quantitative Synthesis

If multiple studies assessing the same issue are available, the appropriate procedure to obtain pooled results is an MA. While a typical MA usually concerns one endpoint, the assessment of diagnostic tests most often takes into account two parameters, sensitivity and specificity. Therefore, these endpoints should be analysed together. For MA, we used an interactive web-based tool, MetaDTA (Diagnostic Test Accuracy Meta-Analysis v2.0, 15th March 2021) [14, 15], which takes into account the correlation between sensitivity and specificity. Individual study estimates for trials included in the MA are presented as forest plots, and MA results are presented as a hierarchical summary receiver-operating characteristic (HSROC) curve with the false positive rate (1-specificity) on the x-axis and sensitivity on the y-axis. A random-effects approach was used. Point estimates and 95% confidence intervals (CI) were estimated. MA was conducted only for tests for which at least 4 studies with full data (TP, FN, TN, and FP) were available. To be included in the MA, the studies had to concern the same group of drugs (particular drugs could differ), to use the same cut-off definition for a positive result, and to include patients with similar symptoms of delayed drug allergy.
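For orientation, the joint analysis of sensitivity and specificity can be written as a bivariate random-effects model. The formulation below is a sketch of the standard bivariate binomial-normal model, given on the assumption that the tool fits a model of this form; the symbols $\mu_{Se}$, $\mu_{Sp}$, $\sigma_{Se}$, $\sigma_{Sp}$, and $\rho$ are notation introduced here and do not appear in the source.

```latex
\begin{align*}
TP_i &\sim \mathrm{Binomial}(TP_i + FN_i,\ Se_i), \\
TN_i &\sim \mathrm{Binomial}(TN_i + FP_i,\ Sp_i), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
\begin{pmatrix}
\sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp} \\
\rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}
\end{pmatrix}
\right)
\end{align*}
```

Here $i$ indexes the studies, the pooled sensitivity and specificity are $\operatorname{logit}^{-1}(\mu_{Se})$ and $\operatorname{logit}^{-1}(\mu_{Sp})$, and $\rho$ captures the between-study correlation between the two measures.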

2.8. Assessment of Risk of Bias

The quality of the included studies was independently evaluated by 2 authors (S.D. and P.D.) using the QUADAS-2 tool for assessing the quality of diagnostic accuracy studies [16]. This tool allows one to evaluate a study in terms of its applicability (patient selection, index test, and reference standard) and to assess the risk of bias in 4 domains (the 3 domains listed above, plus flow and timing). Possible inconsistencies in the assessment were resolved through discussion and consensus with another author (E.R.).

3. Results

A total of 650 records were identified from the search of medical databases, of which 571 were excluded on the basis of title and abstract. After removing duplicates, there were 79 records for which full texts were obtained and then assessed for eligibility. We found 16 primary studies that met the inclusion criteria with extractable data on in vitro tests used for the diagnosis of nonimmediate drug hypersensitivity reactions. An additional 17 studies were identified from the references and the systematic review of secondary studies (for details, see Figure 1 Suppl.). In total, 33 primary studies were included in this analysis.

3.1. Study Characteristics

Characteristics of the included primary studies are presented in Table 1. Three trials [17–19] included children, and seven were conducted in mixed paediatric and adult populations [20–24], while four lacked information on the age of the patient population [25–28]. The remaining studies included adults [29–39]. Twenty-three studies were carried out in Europe [17, 23–27, 29–31, 33, 34, 39–47], and for one of the studies, information on where the trial was conducted was not available [48]. Most of the trials (28 out of 33) included a convenience sample of patients [17, 19–21, 23, 24, 26–28, 30–39, 41–49], while three had a retrospective design [18, 25, 40], and two used the consecutive sample method [22, 29]. Information on the trial setting (inpatients and outpatients) was available only in two studies [20, 34].

3.2. Quality of Included Studies

The main issue with the quality of the primary studies included is the lack of sufficient information to determine whether the risk of bias is high or low. More than 30% of the studies were rated “unclear” in individual domains, and in case of flow and timing, this percentage reaches 94% of the studies. Additionally, the evaluation carried out on concerns related to the applicability of studies has a high percentage of studies rated “unclear”—depending on the domain, this percentage varies from 45% to 64%. Figure 1 presents our assessment of the risk of bias and applicability concerns of the 33 primary studies included in the systematic review according to the QUADAS-2 tool.

3.3. Results of Individual Studies

The results of individual studies, divided by test type, are presented in Table 2.

3.4. Lymphocyte Proliferation/Transformation Tests (LTT)

The LTT results of 431 patients were extracted and pooled from the analysed studies. The control comprised 318 LTT results from healthy individuals or LTT assays performed with irrelevant drugs in the patients. Three studies were conducted in the paediatric population, six studies were conducted in both the paediatric and adult populations, 13 studies described the results in the adult population, and four studies were carried out in an unknown population. Sensitivity in particular studies ranged from 0% to 100%, depending on the type of delayed hypersensitivity reaction and the drug analysed. The specificity ranged from 66.7% to 100%. The overall average sensitivity and specificity of the LTT assay were 48.6% and 93.7%, respectively. The sensitivity of the LTT assay for the most frequently tested drugs, beta-lactams and antiepileptics, ranged from 0% to 100% and 0% to 50%, respectively. The specificity of LTT for those drug groups ranged from 66.7% to 100% and from 95.8% to 100%, respectively. The most frequently tested drugs were antibiotics (represented by beta-lactams) and antiepileptic drugs. Among the delayed hypersensitivity reactions, the most common was maculopapular exanthema (MPE). LTT seems to be the most accurate for the detection of MPE caused by amoxicillin. Modifications of routine protocols with additional anti-CD3/anti-CD28 monoclonal antibody stimulation [47] and with B cells and monocytes or with dendritic cells serving as antigen-presenting cells [43] increased sensitivity from 54.5% to 72.7% and from 22.2% to 88.9%, respectively. The data are collected in Table 2.

3.5. Enzyme-Linked Immunospot Assays (ELISpot)

ELISpot results from 415 patients were extracted and pooled from the analysed studies, in which the assay was used to detect cells secreting IFN-γ, IL-4, IL-5, and GrB (Table 2). The control comprised 85 ELISpot results from healthy individuals or ELISpot assays performed with irrelevant drugs in patients. Most studies were carried out in the adult population, 1 study was carried out in the adult and paediatric population, and 1 only in paediatric patients. In one of the studies, the age of the population was not specified. The sensitivity in particular studies ranged from 0% to 100%, depending on the type of delayed hypersensitivity reaction and the drug analysed. Specificity ranged from 82.9% to 100%; however, specificity was determined in only 4 studies. The overall average sensitivity and specificity of the ELISpot assay were 55.6% and 93.1%, respectively. The sensitivity of the ELISpot assay for the most frequently tested drugs, beta-lactams (represented by penicillins) and antiepileptics (represented by carbamazepine, lamotrigine, oxcarbazepine, and phenytoin), ranged from 60% to 90.9% and 0% to 72.7%, respectively. The specificity of ELISpot for those drug groups ranged from 82.9% to 95% and from 95.8% to 100%, respectively. Among the delayed hypersensitivity reactions, the most common was MPE. The ELISpot test appears to be the most accurate for the detection of MPE caused by penicillin (sensitivity: 90.9%; specificity: 95%). The value of this test for the detection of penicillin-induced DHRs is further supported by a PPV of 100% and an NPV of 84.6%.

In the papers by Polak et al. and Haw et al. [24, 40], ELISpot was used simultaneously for detection of various cytokines (IFN-γ and IL-4), which allows for direct comparison. ELISpot for IFN-γ and ELISpot for IL-4 showed very similar sensitivities in both studies (Table 2). In turn, the same cytokine detected with the ELISpot can provide significantly different results depending on the tested causal drugs, e.g., ELISpot for IFN-γ reached sensitivity 37.5% with antiepileptics [42] and 90.9% with beta-lactams [26].

3.6. Enzyme-Linked Immunosorbent Assay (ELISA)

ELISA experiments were performed with the measurement of the following cytokines: IL-5, IL-10, and IFN-γ. The experiments were designed to identify delayed allergic reactions only against antiepileptic drugs. The results of ELISA from 61 patients were extracted and pooled from the analysed studies. The control comprised 61 ELISA results from healthy individuals or ELISA assays performed with irrelevant drugs in the patients. One study was carried out in a mixed paediatric and adult population, and the other 3 were carried out in the adult population. Sensitivity in particular studies ranged from 17.4% to 91.7%, depending on the cytokine measured. The specificity ranged from 60% to 100%. The best-performing biomarker was IL-5, with a sensitivity of 91.7% and a specificity of 100%. The overall average sensitivity and specificity of the ELISA assay were 50.9% and 92%, respectively. Among the delayed hypersensitivity reactions, the only one under investigation was MPE (Table 2).

3.7. Basophil Activation Test (BAT)

BAT, a well-known test for immediate allergic reactions, was also tested in delayed drug hypersensitivity in two studies. BAT assays were performed in three settings measuring the expression of CD203c+ and/or CD63+ to test allergic reactions against beta-lactams and antibiotics. The BAT results from 20 patients were extracted and pooled from the analysed studies. The control consisted of 30 BAT results from healthy individuals. Sensitivity in particular studies ranged from 0% to 33.3%, depending on the activation marker measured, with CD63+ being more relevant for such a measurement. Specificity ranged from 78.6% to 100%. In a single study, CD63+ revealed 40% and 3.3% of positive and negative predictive values, respectively. In another study, where CD203c+ and CD63+ were applied, BAT had a negative predictive value of 53.3% (the positive predictive value was impossible to calculate). Among the delayed hypersensitivity reactions, patients with MPE and benign skin rashes were tested with BAT (Table 2).

3.8. Other Tests

Other in vitro diagnostic assays identified in the literature were based on the detection of IFN-γ or on intracellular staining followed by cytometric analysis of CD4+ cell proliferation [19]. Some other tests were based on the heparin-induced IgG assay [33] and the radioallergosorbent test (RAST), known from the detection of immediate drug hypersensitivity reactions. Neither the heparin-induced IgG assay nor RAST showed any suitability for DHR determination, as reflected by sensitivities of 22.2% and 0%, respectively. Cytometric analysis of CD4+ cell proliferation showed high diagnostic value in the detection of DHR, with a sensitivity of 100% and a specificity of 90.9%. IFN-γ secretion appears to be an equally useful DHR determination test, with a sensitivity of 71.4% and a specificity of 100% [19]. Diagnostic parameters of these tests are shown in Table 2.

3.9. Meta-Analysis: LTT for Beta-Lactams

An MA of studies reporting the detection of delayed allergies related to beta-lactams using LTT (all with the same stimulation cut-off) was conducted and comprised four studies [17, 26, 46, 47]. All these studies had been conducted in Europe and evaluated a reasonable sample of patients. One study [17] included children, two studies [46, 47] assessed adults, and one study [26] lacked information about the age of the participants. Three studies [17, 26, 46] investigated amoxicillin, each together with one other beta-lactam: cefuroxime [17], ticarcillin [26], and penicillin G [46], respectively. The fourth study tested ampicillin [47]. Studies included participants who reported benign skin rashes [17], MPE [26], exanthema [46], and macular or maculopapular exanthema [47]. Individual study estimates for each trial, both for sensitivity and specificity, are presented in Figure 2. Across these four studies, the pooled estimates of sensitivity and specificity were 49.1% (95% CI: 14.0%, 85.0%) and 94.6% (95% CI: 81.7%, 98.6%), respectively. The hierarchical summary receiver-operating characteristic curve for the diagnostic performance of beta-lactam LTT in patients with delayed reactions is shown in Figure 3.
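The reported intervals are asymmetric on the percentage scale, which is what one would expect if they were computed on the logit scale and then back-transformed. The sketch below checks this using only the figures quoted above. It assumes a symmetric Wald-type interval on the logit scale, which the source does not state; the function name `implied_logit_summary` is hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def implied_logit_summary(lower: float, upper: float, z: float = 1.959964):
    """Back out the centre (as a proportion) and the logit-scale standard
    error implied by a 95% interval, ASSUMING the interval is symmetric
    on the logit scale."""
    lo, hi = logit(lower), logit(upper)
    centre = expit((lo + hi) / 2.0)
    standard_error = (hi - lo) / (2.0 * z)
    return centre, standard_error


# Interval bounds quoted in the text for the pooled LTT estimates.
sens_centre, sens_se = implied_logit_summary(0.140, 0.850)
spec_centre, spec_se = implied_logit_summary(0.817, 0.986)

print(f"sensitivity: implied centre {sens_centre:.3f}, logit SE {sens_se:.2f}")
print(f"specificity: implied centre {spec_centre:.3f}, logit SE {spec_se:.2f}")
```

Under this assumption, the implied centres fall within rounding distance of the reported point estimates (49.1% and 94.6%), and the larger logit-scale standard error for sensitivity reflects how much less precisely it is estimated than specificity.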

4. Discussion

In vitro diagnostic tests in delayed DHR are considered difficult to perform and are therefore limited to highly specialized centers. On the other hand, they would be highly useful to avoid time-consuming and costly in vivo challenges (including drug provocation tests), to facilitate the allergologic diagnostic work-up of patients living far from reference centers, and to delabel patients with a suspicion of drug allergy (a positive laboratory test confirms the allergy and redirects the diagnostic pathway towards alternative drugs). An important issue that needs to be emphasized when considering in vitro tests is the fact that cytokines are determined in various conditions and various diseases [50]; however, only certain cytokines (i.e., TNF-α, IL-2, IL-4, IL-5, IL-6, IL-10, IL-13, and IFN-γ) are important in allergic reactions [51–56]. Delayed drug hypersensitivity reactions are most often mediated by IFN-γ, but in reactions with a dominant role of eosinophils, IL-5 and IL-4/IL-13 play a major role. On the other hand, in reactions with a cytotoxic mechanism, such as SJS, perforin and granzyme B are of key importance. In turn, drug-induced reactions associated with neutrophilic inflammation, such as AGEP, seem to be associated with an increase in CXCL-8/GM-CSF [57]. We aimed to identify the most valuable and promising assay(s) for further development and application in daily practice. For practitioners working in the field of drug hypersensitivity, the burning questions regarding in vitro tests are how sensitive and specific those tests are in the detection of immune reactions to drugs implicated in DHR and how diagnostic parameters compare between in vitro tests.

In this systematic review, we evaluated the usefulness of in vitro tests for the diagnosis of delayed drug reactions other than SCARs. Severe reactions such as SJS/TEN, AGEP, and DRESS were excluded because a more in-depth analysis of this issue addressing maculopapular exanthema and other mild clinical manifestations of delayed DHR is still lacking. Given the high prevalence of these reactions in daily practice, a reliable and well-established diagnostic tool would facilitate the management of these clinical conditions. The systematic review covered a broad spectrum of tests and drugs; despite a few differences between the identified studies (included population, reported drug reactions, and associated risk of bias), it was possible to carry out an MA of studies reporting the detection of delayed allergies related to beta-lactams using LTT. However, the limitations of the MA resulting from both the low number of studies included and the moderate heterogeneity should be taken into account when interpreting the pooled sensitivity and specificity of the test, especially in the context of the patient population. Different response markers applied in parallel on the same platform may improve the overall test performance (e.g., both IFN-γ and IL-5 ELISpot [49]).

One of the major challenges in the diagnosis of DHR is the lack of standardized criteria for the evaluation of DHRs. Although there are a few testing possibilities, criteria for selecting the best diagnostic tool for each drug group are still missing. Among all tests used to diagnose DHR, the most studied is LTT and its modifications. The publications describe both paediatric [17, 18, 20–24, 29, 40] and adult groups [20–24, 29–49]. A broad spectrum of drugs was also tested. The outcome of our analysis clearly highlights the lack of standardization in both the performance of the tests and the readout. This results in a wide range of specificity and sensitivity of testing, which, in turn, poses challenges for physicians in daily practice.

ELISpot is a commercially available method that analyses the secretion of cytokines and other soluble molecules by T cells. The results of our analysis clearly demonstrate that immunological responses other than proliferation are also available for use in the diagnosis of DHR. The seven studies included in our analysis yielded a range of specificity and sensitivity values [24, 26, 31, 40–42, 49]. Although the specificity seemed to be much higher than for LTT, it is important to underline that only 4 studies included in the analysis aimed to measure it [24, 26, 41, 42]. A further limitation is that the paediatric population was represented in only 1 study [40]. Importantly, ELISpot was performed mostly for the detection of DHR in patients suffering from MPE [26, 40–42, 49].

ELISA [29, 32, 41, 42], BAT [17, 30], and other tests [19, 33, 44] were represented by a very limited number of studies of sufficient quality to be included in our systematic review. Again, as for the methods described above, the range of specificity and sensitivity suggests that the tests require further optimization before they can be used reliably for DHR detection. Of interest is the high specificity in all three groups of tests (92% ELISA, 89.3% BAT, and 95.45% other tests). Unfortunately, the sensitivity of BAT [17, 30] and the heparin-induced IgG assay [33] was below that of LTT and ELISpot, which underlines the limitations of such methods.

Taking into consideration the ELISpot and ELISA tests, it has to be noted that the lack of standard cut-off values used in the discussed studies makes it difficult to directly compare their results. There are very different approaches in the literature to calculate thresholds in performance evaluation. For instance, a positive response can be calculated as a result above the upper limit of 95% confidence interval or higher than mean and 2 standard deviations calculated from samples serving as negative controls [11]. Therefore, well-designed methodological studies on the precise determination of threshold values for tests would provide crucial input for further development of these assays and their wider introduction into everyday practice.
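One of the threshold conventions mentioned above, the mean of the negative controls plus 2 standard deviations, can be sketched as follows. This is a minimal illustration, not a validated protocol: the function names and the control readouts are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the source does not specify which estimator is meant.

```python
from statistics import mean, stdev
from typing import Sequence


def cutoff_mean_plus_2sd(negative_controls: Sequence[float]) -> float:
    """Positivity threshold: mean of negative controls + 2 sample SDs.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(negative_controls) < 2:
        raise ValueError("at least two negative-control values are required")
    return mean(negative_controls) + 2 * stdev(negative_controls)


def is_positive(readout: float, negative_controls: Sequence[float]) -> bool:
    """A readout counts as positive only if it exceeds the threshold."""
    return readout > cutoff_mean_plus_2sd(negative_controls)


# Hypothetical negative-control readouts (e.g. spot counts per well).
controls = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cutoff = cutoff_mean_plus_2sd(controls)

print(f"cut-off: {cutoff:.2f}")
print(is_positive(10.0, controls), is_positive(9.0, controls))
```

Because the threshold depends entirely on the spread of the negative controls, two laboratories running the same assay with different control sets can classify the same readout differently, which is one reason results are hard to compare across studies.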

It should be noted that despite the theoretical availability of many in vitro diagnostic methods, our meta-analysis indicates that the possibilities of using these methods in practice are limited. The most consistent evidence-based data relate to LTT. The experience of a given diagnostic center in cultivating PBMCs and stimulating them with suspected drugs is also of great importance, as this is a common step in different in vitro tests, regardless of the readout system used, such as LTT, ELISA, or ELISpot.

5. Conclusions

In summary, more specific and sensitive diagnostic tools are needed for better patient management. Current testing brings uncertainty, and our systematic review does not provide a clear answer to the question of which test should be used for each drug and patient group. LTT is the most commonly used test and performs well in beta-lactam-induced MPE. On that basis, LTT seems to have the highest value in clinical practice among the in vitro tests. This applies especially to DHR detection in the adult population, as only single studies are available in small paediatric cohorts. Due to the large heterogeneity in the results of particular studies, this conclusion needs further investigation in well-designed studies conducted on large cohorts.

Data Availability

The data supporting this systematic review and meta-analysis are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author on reasonable request to researchers who provide a methodologically sound proposal.

Conflicts of Interest

The authors declare no competing interests.

Supplementary Materials

Supplementary Material.docx (Figures S1 and S2 and Tables S1-S10).