Research Article | Open Access
The Reproducibility of the Immunohistochemical PD-L1 Testing in Non-Small-Cell Lung Cancer: A Multicentric Italian Experience
The scientific community has made a substantial harmonization effort to standardize both the preanalytical and interpretative phases of programmed death-ligand 1 (PD-L1) immunohistochemical (IHC) testing in non-small-cell lung cancer (NSCLC). This analysis is crucial for selecting patients with advanced-stage tumors who are eligible for treatment with pembrolizumab and potentially with other anti-PD-1/PD-L1 checkpoint inhibitors. This multicentric retrospective study evaluated the reproducibility of PD-L1 testing in the Italian scenario for both closed and open platforms. In the evaluation of the well-known gold-standard combinations (Agilent 22C3 PharmDx on Dako Autostainer versus Roche’s Ventana SP263 on BenchMark), the results confirmed the literature data and showed complete overlap between the two methods. With regard to performance on open platforms, the combination of 22C3 with Dako Omnis or BenchMark obtained generally good results, while the 28-8 clone seemed to be associated with worse scores.
1. Introduction
The scientific community has made a substantial harmonization effort to standardize both the preanalytical and interpretative phases of programmed death-ligand 1 (PD-L1) immunohistochemical (IHC) testing in non-small-cell lung cancer (NSCLC) [1, 2]. This analysis is crucial for selecting patients with advanced-stage tumors who are eligible for treatment with pembrolizumab and potentially with other anti-PD-1/PD-L1 checkpoint inhibitors. Several antibody clones (especially 22C3, 28-8, SP263, and SP142) were evaluated and showed good reproducibility in harmonization studies. However, in clinical practice, further validation efforts seem necessary, since diagnostic reports from different laboratories may not completely overlap. The Blueprint project showed that the percentage of PD-L1 positive tumor cells was comparable for clones 22C3, 28-8, and SP263, while clone SP142 characteristically identified lower percentages of positive neoplastic cells. Consequently, pathologists usually choose the 22C3, SP263, and 28-8 clones for routine testing of cytological and histological specimens, combining them with closed and open commercially available IHC platforms. Moreover, because technical and interpretative expertise differs among laboratories, further analytical variables may affect the final local reports. In the Italian scenario, a study confirmed a high correlation between PD-L1 IHC expression data obtained with the 22C3 and SP263 clones, suggesting that the two assays could be used interchangeably. After 1 year of routine PD-L1 testing, the present multicentric retrospective study aimed to compare the results obtained with different protocols performed on the same tissue microarray (TMA) of a series of NSCLC histological specimens analyzed in different laboratories, and to evaluate whether heterogeneous results still persist, especially when open platforms are used. The data were recorded in terms of interpretative/analytical error, highlighting the current state of reproducibility in the routine practice of PD-L1 IHC testing.
2. Materials and Methods
Formalin-fixed paraffin-embedded (FFPE) histological samples from 18 lung surgical specimens with NSCLC were retrospectively selected for this study. The series included adenocarcinomas and squamous cell carcinomas. The inclusion criteria were the following: adult patients (>18 years old) who underwent total or partial pneumonectomy for NSCLC between 1 December 2016 and 31 January 2018, with no previous neoadjuvant chemoradiotherapy. The original samples were recovered from the archive of the Pathology Department of University Milan Bicocca-ASST Monza, San Gerardo Hospital, Monza. The study was approved by the Ethical Committee of ASST Monza, under approval N. 1311, dated 17/07/2018. To maximize the homogeneity of preanalytical variables, cases were selected from a single institution with trackable processing phases. For this study, fixation time was set at 24 hours following the surgical procedure, as previously described. Tissues were subsequently grossed and processed as routine cases; a representative hematoxylin and eosin (H&E) stained section of the original nodules was evaluated by two dedicated lung pathologists (FB, FP), avoiding poorly fixed areas and areas of necrosis and fibrosis, and the corresponding paraffin block was chosen for the study. For every case, PD-L1 staining (Agilent 22C3 pharmDx on Dako Autostainer, Dako, Glostrup, Denmark) was performed to guide the sampling of TMA cores according to three balanced groups: negative (Tumor Proportion Score (TPS) <1% or absence of reactivity), intermediate expressors (TPS 1-49% of tumor cells), and strong expressors (TPS ≥50% of tumor cells).
For the TMA construction, two separate areas (about 3 mm in diameter) with homogeneous PD-L1 expression patterns were selected from the original block and punched using a 2 mm-diameter needle. The TMA layout was built using the Galileo TMA R4.30 ISE software (Integrated Systems Engineering Srl, Milan, Italy), and the TMA blocks were constructed with the semiautomatic ISE Galileo TMA CK 4500 arrayer (Integrated Systems Engineering). Serial sections of 1-2 µm thickness were obtained on positively charged slides. All the collected sections were then kept in a thermostated oven at 60°C overnight. First, TMA blanks were stained using two closed platforms to obtain the gold-standard scores (Agilent 22C3 PharmDx on Dako Autostainer and Roche’s Ventana SP263 on BenchMark with the OptiView DAB IHC Detection Kit, Ventana, CA, USA). PD-L1 staining was evaluated by two dedicated lung pathologists (FB, FP), first independently and blinded, and then jointly to reach the final agreement. Second, further TMA blanks were stained using 7 alternative protocols for PD-L1 scoring on open platforms (Table 1). The slides were evaluated blinded to the gold-standard results, and scores were recorded in a Microsoft Office Excel 2007 database for the statistical analysis. All the discordant cases were reevaluated jointly by a board of pathologists to identify the possible source of errors (meeting at UNIMIB in 07/2018).
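As an illustration of the three TPS groups used to balance the TMA, the cutoffs stated above (<1%, 1-49%, ≥50%) can be written as a small classification function. This is only a minimal sketch: the function name `tps_category` and the group labels are hypothetical and are not part of the study's workflow, in which scoring was performed by pathologists.

```python
def tps_category(tps_percent: float) -> str:
    """Map a Tumor Proportion Score (percentage of PD-L1 positive
    tumor cells) to one of the three groups described in the text.

    The function name and the returned labels are illustrative
    assumptions; only the cutoffs come from the article.
    """
    if not 0 <= tps_percent <= 100:
        raise ValueError("TPS must be a percentage between 0 and 100")
    if tps_percent < 1:
        return "negative"       # TPS <1% or absence of reactivity
    if tps_percent < 50:
        return "intermediate"   # TPS 1-49%
    return "strong"             # TPS >=50%


if __name__ == "__main__":
    for value in (0, 1, 49, 50, 100):
        print(value, tps_category(value))
```

Writing the cutoffs this way makes the boundary handling explicit: a score of exactly 1% falls in the intermediate group and a score of exactly 50% in the strong expressor group.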
3. Results
In the first phase of the study, PD-L1 was evaluated using the two closed standard platforms (Roche Ventana SP263 on BenchMark and Dako PharmDx 22C3 on Autostainer). The comparative results are listed in Table 2. In 15 out of 18 cases (83%), PD-L1 scoring overlapped in both cores regardless of which of the two platforms was used. In 3 out of 18 cases (n. 11, 12, and 17), the 22C3/Autostainer PD-L1 staining produced different results in the two cores, so the final grade was based on the Ventana staining. The definitive gold-standard scores included 6 negative cases, 7 intermediate expressors, and 5 strong expressors. While the conclusive results were equivalent, some differences were noticed in terms of intensity. SP263 produced stronger IHC reactivity, readily perceptible at 4x, whereas a magnification greater than 4x was needed to assess 22C3 staining correctly (Figure 1(a)). In the second phase, the gold-standard scores were compared with the results obtained using the 7 different open-platform protocols (Table 3). All the discordant cases were reevaluated by the expert board to identify the possible source of error.
Three possibilities were identified.
(1) Technical errors (T): representative situations are shown in Figures 1(b)–1(d).
(2) Pathologist interpretative errors (P): typical examples are shown in Figures 2(a)–2(d).
(3) Mixed errors (M): in these cases, the error resulted from a combination of low staining intensity due to technical reasons (compared to the gold standard) and underestimation of the signal by the pathologists.
Overall, 23 out of 126 tests (18%; 4 T, 15 P, 4 M) were affected by the three error sources; the error rate (ER) of the single centers ranged from 25% (5/18) to 5% (1/18). Sensitivity ranged from 69% to 92% and specificity from 33% to 100%. The best performance was obtained by protocol n. 2, using clone 22C3 on Dako Omnis (ER=5.5%; Sn=92%; Sp=100%), followed by protocols n. 3 and n. 7 (ER=11%; Sn=85%; Sp=100%), using clone 22C3 on Dako Omnis and Ventana BenchMark, respectively (Table 4).
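The overall error rate can be retraced from the counts given in the text (4 technical, 15 interpretative, and 4 mixed errors over 7 protocols applied to 18 cases). The following is a minimal sketch under that assumption; the helper `error_rate` and the variable names are hypothetical, and sensitivity and specificity are not recomputed because the per-protocol counts of Table 4 are not reproduced here.

```python
def error_rate(n_errors: int, n_tests: int) -> float:
    """Return the error rate as a percentage of the tests performed."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if not 0 <= n_errors <= n_tests:
        raise ValueError("n_errors must lie between 0 and n_tests")
    return 100.0 * n_errors / n_tests


# Error counts by source as reported in the text:
# T = technical, P = pathologist interpretative, M = mixed.
errors_by_source = {"T": 4, "P": 15, "M": 4}

n_protocols = 7   # alternative open-platform protocols
n_cases = 18      # cases on the TMA
total_tests = n_protocols * n_cases           # 126 tests
total_errors = sum(errors_by_source.values()) # 23 errors

overall_er = error_rate(total_errors, total_tests)

if __name__ == "__main__":
    print(f"Overall error rate: {overall_er:.1f}%")
    # Share of each error source among all errors.
    for source, count in errors_by_source.items():
        print(f"{source}: {100.0 * count / total_errors:.1f}% of errors")
```

The same helper can be applied to a single protocol, for example `error_rate(1, 18)` for a protocol with one discordant case out of 18.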
3.1. Negative Cases and Therapy (N=6)
Five out of 7 protocols correctly assigned all the negative cases (Table 5). For 2 protocols, the board identified possible problems in terms of specificity, mainly due to interpretative errors. In 2 out of 6 patients, PD-L1 testing was negative regardless of the platform and the center performing the examination.
3.2. Intermediate Expressor Cases and Therapy (N=7)
Some disagreement persisted in the intermediate group, where technical (or mixed) errors seemed to be more relevant than in negative cases. In 2 out of 10 errors, cases were scored as strong expressors instead of intermediate; in the majority of the situations the score was underestimated (Table 6).
3.3. Strong Expressor Cases and Therapy (N=5)
The positive group showed good diagnostic agreement for both technical and interpretative variables (Table 7). Case n. 12 (Figure 2(d)) was particularly challenging (isolated tumor cells in normal lung parenchyma), highlighting the importance of examination at high magnification in the absence of a strong PD-L1 signal.
4. Discussion
Immune-checkpoint inhibitors have changed the treatment paradigm in locally advanced/advanced NSCLC [7–9]. Four monoclonal antibodies are currently used in clinical practice, with some overlapping indications: nivolumab and atezolizumab in pretreated patients; pembrolizumab, which extends the possibility of using immune-checkpoint inhibitors (ICIs) upfront; and, more recently, durvalumab as maintenance/consolidation treatment in patients with inoperable locally advanced NSCLC who benefit from chemoradiotherapy. All these drugs have an indication that is more or less linked to tumoral IHC PD-L1 expression. In particular, pembrolizumab is approved in the first-line setting only for tumors with strong PD-L1 expression (TPS ≥ 50%), while in further lines of therapy PD-L1 positivity (TPS ≥ 1%) is sufficient to indicate its use. Similarly, following a recent post hoc analysis in which the maximum benefit was demonstrated in tumors with PD-L1 ≥ 1%, durvalumab is indicated after chemoradiation only in these tumors. Nivolumab and atezolizumab are indicated from the second line of treatment in all comers, without restriction by PD-L1 tumor expression. In this scenario, it becomes evident how crucial the detection and correct interpretation of PD-L1 expression on tumor cells are for offering patients the best therapeutic strategy. Beyond the technical aspects discussed below, it is important to note the intra- and intertumoral heterogeneity of PD-L1 expression, which may affect the reproducibility of this analysis. Finally, the storage of archived tissue samples may affect the detection and degree of staining, leading to misinterpretation of PD-L1 status [17–19]. This multicentric retrospective study evaluated the reproducibility of PD-L1 testing in the Italian scenario for both closed and open platforms.
In the evaluation of the well-known gold-standard combinations (Agilent 22C3 PharmDx on Dako Autostainer versus Roche’s Ventana SP263 on BenchMark), the results confirmed the literature data and showed complete overlap between the two methods. As regards staining intensity, the Ventana platform produced more intense reactions but made the morphological distinction between tumor cells and immune cells (usually alveolar macrophages, which are normally positive) more difficult, while the Dako platform provided fairly soft reactions with a clearer morphological differentiation of the cell types. Secondly, the comparison between the gold standard and the PD-L1 IHC staining obtained with 7 alternative, locally validated protocols on open platforms revealed some possible sources of error in routine practice. The study identified mainly two representative situations: interpretative errors (Figure 2), which mostly produced false negative results, and technical errors (Figure 1). In the first group, the histological interpretation may be complicated by the possible staining of immune cells in the sample, as well as by the inappropriate application of the magnification rule, which recommends the use of a high-power field for scoring mild/focal reactivity. As shown in Table 6, in the intermediate group sensitivity may be affected whenever mild or focal staining, related to low technical amplification or low antigen retrieval, leads to false negative interpretative errors. On the other hand, low specificity is also possible, as in protocols 4 and 6 (Table 5), because of the difficulty pathologists have in distinguishing macrophages from neoplastic cells in particularly challenging specimens or in recognizing false positive staining in mucus-rich tumors. In this subgroup, the study revealed that some protocols may frequently produce unspecific perimembranous reactivity. Every laboratory should set up its PD-L1 protocol so as to avoid this signal, which is inappropriately considered positive.
Another difficult situation arose when pathologists had to decide around the threshold of 50% positive cells, suggesting that only a careful and extensive quantitative evaluation may avoid underestimation in routine practice. Among the strong positive patients, only one (case 12) produced equivocal results, due to a particularly challenging TMA core that included isolated (PD-L1 positive) tumor cells. A significant proportion of the technical errors may be related to the limitations of a TMA-based study; the heterogeneity among the serial levels of the histological sections may in fact exclude PD-L1 positive foci from the analysis. Moreover, due to the intrinsic characteristics of TMAs, some cores cannot be adequately examined (Figure 1). With regard to performance on open platforms, the combination of 22C3 with Dako Omnis or BenchMark obtained generally good results, while the 28-8 clone seemed to be associated with worse scores.
This study was designed to stress the methodological challenges of PD-L1 IHC testing and collected particularly difficult cases through a preliminary histological selection of NSCLC samples that did not necessarily reflect a normal case mix. Within these limitations, we can conclude that oncologists should remember that the selection of NSCLC patients by PD-L1 staining still has some technical and interpretative caveats. On the other hand, after several efforts to harmonize the readout of PD-L1 status among different antibody clones, assays, and platforms, pathologists now have focused experience and adequate training to give more detailed and reproducible PD-L1 results to clinicians [20, 21].
The immunohistochemical data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare no conflicts of interest regarding this paper.
This work was funded by AIRC (Associazione Italiana per la Ricerca sul Cancro) MFAG GRANT 2016-Id. 18445.
- M. S. Tsao, K. M. Kerr, M. Kockx, M. B. Beasley et al., “PD-L1 immunohistochemistry comparability study in real-life clinical samples: results of blueprint phase 2 project,” Journal of Thoracic Oncology, vol. 13, pp. 1302–1311, 2018.
- A. Marchetti, M. Barberis, R. Franco et al., “Multicenter comparison of 22C3 PharmDx (Agilent) and SP263 (Ventana) assays to test PD-L1 expression for NSCLC patients to be treated with immune checkpoint inhibitors,” Journal of Thoracic Oncology, vol. 12, no. 11, pp. 1654–1663, 2017.
- J. Gong, A. Chehrazi-Raffle, S. Reddi, and R. Salgia, “Development of PD-1 and PD-L1 inhibitors as a form of cancer immunotherapy: a comprehensive review of registration trials and future considerations,” Journal for ImmunoTherapy of Cancer, vol. 6, no. 1, p. 8, 2018.
- B. Melosky, Q. Chu, R. Juergens, N. Leighl, D. McLeod, and V. Hirsh, “Pointed progress in second-line advanced non-small-cell lung cancer: the rapidly evolving field of checkpoint inhibition,” Journal of Clinical Oncology, vol. 34, no. 14, pp. 1676–1688, 2016.
- R. Buttner, J. R. Gosney, B. G. Skov et al., “Programmed death-ligand 1 immunohistochemistry testing: A review of analytical assays and clinical implementation in non-small-cell lung cancer,” Journal of Clinical Oncology, vol. 35, no. 34, pp. 3867–3876, 2017.
- A. Smith, M. Galli, I. Piga et al., “Molecular signatures of medullary thyroid carcinoma by matrix-assisted laser desorption/ionisation mass spectrometry imaging,” Journal of Proteomics, vol. 191, pp. 114–123, 2019.
- H. Yu, T. A. Boyle, C. Zhou, D. L. Rimm, and F. R. Hirsch, “PD-L1 expression in lung cancer,” Journal of Thoracic Oncology, vol. 11, no. 7, pp. 964–975, 2016.
- A. H. Scheel, M. Dietel, L. C. Heukamp et al., “Harmonized PD-L1 immunohistochemistry for pulmonary squamous-cell and adenocarcinomas,” Modern Pathology, vol. 29, no. 10, pp. 1165–1172, 2016.
- F. R. Hirsch, A. McElhinny, D. Stanforth et al., “PD-L1 immunohistochemistry assays for lung cancer: results from phase 1 of the blueprint PD-L1 IHC assay comparison project,” Journal of Thoracic Oncology, vol. 12, no. 2, pp. 208–222, 2017.
- L. Horn, D. R. Spigel, E. E. Vokes et al., “Nivolumab versus docetaxel in previously treated patients with advanced non-small-cell lung cancer: Two-year outcomes from two randomized, open-label, phase III Trials (CheckMate 017 and CheckMate 057),” Journal of Clinical Oncology, vol. 35, no. 35, pp. 3924–3933, 2017.
- A. Rittmeyer, F. Barlesi, D. Waterkamp et al., “Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial,” The Lancet, vol. 389, no. 10066, pp. 255–265, 2017.
- M. Reck, D. Rodríguez-Abreu, A. G. Robinson et al., “Pembrolizumab versus chemotherapy for PD-L1–positive non–small-cell lung cancer,” The New England Journal of Medicine, vol. 375, no. 19, pp. 1823–1833, 2016.
- S. J. Antonia, A. Villegas, D. Daniel et al., “Overall survival with durvalumab after chemoradiotherapy in stage III NSCLC,” The New England Journal of Medicine, vol. 379, pp. 2342–2350, 2018.
- E. B. Garon, N. A. Rizvi, R. Hui et al., “Pembrolizumab for the treatment of non-small-cell lung cancer,” The New England Journal of Medicine, vol. 372, pp. 2018–2028, 2015.
- M. J. Ratcliffe, A. Sharpe, A. Midha et al., “Agreement between programmed cell death ligand-1 diagnostic assays across multiple protein expression cutoffs in non–small cell lung cancer,” Clinical Cancer Research, vol. 23, no. 14, pp. 3585–3591, 2017.
- J. McLaughlin, G. Han, K. A. Schalper et al., “Quantitative assessment of the heterogeneity of PD-L1 expression in non-small-cell lung cancer,” JAMA Oncology, vol. 2, no. 1, pp. 46–54, 2016.
- S. P. Kang, K. Gergich, G. M. Lubiniecki et al., “Pembrolizumab KEYNOTE-001: an adaptive study leading to accelerated approval for two indications and a companion diagnostic,” Annals of Oncology, vol. 28, no. 6, pp. 1388–1398, 2017.
- D. L. Rimm, G. Han, J. M. Taube et al., “A prospective, multi-institutional, pathologist-based assessment of 4 immunohistochemistry assays for PD-L1 expression in non–small cell lung cancer,” JAMA Oncology, vol. 3, no. 8, pp. 1051–1058, 2017.
- S. Hendry, D. J. Byrne, G. M. Wright et al., “Comparison of four PD-L1 immunohistochemical assays in lung cancer,” Journal of Thoracic Oncology, vol. 13, no. 3, pp. 367–376, 2018.
- J. Adam, N. Le Stang, I. Rouquette et al., “Multicenter harmonization study for PD-L1 IHC testing in non-small-cell lung cancer,” Annals of Oncology, vol. 29, no. 4, pp. 953–958, 2018.
- D. Fujimoto, Y. Sato, K. Uehara et al., “Predictive performance of four programmed cell death ligand 1 assay systems on nivolumab response in previously treated patients with non–small cell lung cancer,” Journal of Thoracic Oncology, vol. 13, no. 3, pp. 377–386, 2018.
Copyright © 2019 Elena Vigliar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.