Background. Given limited data on the epidemiology of microsatellite instability-high (MSI-H) and deficient mismatch repair (dMMR) across solid tumors (except colorectal cancer (CRC)), the current study was designed to estimate their prevalence. Materials and Methods. A structured literature review identified English language publications that used immunohistochemistry (IHC) or polymerase chain reaction (PCR) techniques. Publications were selected for all tumors except CRC using MEDLINE, EMBASE, and Cochrane databases and key congresses; CRC and pan-tumor genomic publications were selected through a targeted review. Meta-analysis was performed to estimate pooled prevalence of MSI-H/dMMR across all solid tumors and for selected tumor types. Where possible, prevalence within tumor types was estimated by disease stage. Results. Of 1,176 citations retrieved, 103 and 48 publications reported prevalence of MSI-H and dMMR, respectively. Five pan-tumor genomic studies supplemented the evidence base. Tumor types with at least 5 publications included gastric (n = 39), ovarian (n = 23), colorectal (n = 20), endometrial (n = 53), esophageal (n = 6), and renal cancer (n = 8). Overall MSI-H prevalence (with 95% CI) across 25 tumors was based on 90 papers (28,213 patients) and estimated at 14% (10%–19%). MSI-H prevalence among Stage 1/2 cancers was estimated at 15% (8%–23%); Stage 3 and Stage 4 prevalence was estimated at 9% (3%–17%) and 3% (1%–7%), respectively. Overall, dMMR prevalence across 13 tumor types (based on 54 papers and 20,383 patients) was estimated at 16% (11%–22%). Endometrial cancer had the highest pooled MSI-H and dMMR prevalence (26% and 25% all stages, respectively). Conclusions. This is the first comprehensive attempt to report pooled prevalence estimates of MSI-H/dMMR across solid tumors based on published data. Prevalence determined by IHC and PCR was generally comparable, with some variations by cancer type. Late-stage prevalence was lower than that in earlier stages.

1. Introduction

DNA mismatch repair (MMR) plays a key role in maintaining genomic stability by recognizing and repairing base-base mismatches and insertions/deletions generated during DNA replication and recombination. Defects in MMR (deficient MMR, dMMR) are associated with genome-wide instability and the progressive accumulation of mutations, especially in regions of simple repetitive DNA sequences known as microsatellites, resulting in microsatellite instability (MSI). MSI-high (MSI-H) is a hypermutable phenotype that allows mutations to accumulate rapidly, resulting in tumor development via the selection of cancer-promoting mutations in pathways that are responsible for maintaining functional DNA repair, apoptosis, and cell growth.

To test for MSI-H and dMMR statuses in solid tumors, polymerase chain reaction (PCR) and immunohistochemistry (IHC) methods have been widely accepted as respective testing platforms for these biomarkers. The PCR method uses a panel of microsatellite markers to detect size shifts in different loci. The IHC method uses a more direct test to determine the presence of MMR proteins. A tumor is typically classified as MSI-H if shifts are detected in at least 2 of 5 loci using the PCR method and dMMR if at least one MMR protein is absent using the IHC method. The use of NCI (BAT-25, BAT-26, D2S123, D5S346, and D17S250) [1] and Promega (BAT-25, BAT-26, NR-21, NR-24, and MONO-27) [2] panels in PCR and the use of MLH1, MSH2, MSH6, and PMS2 proteins in IHC are considered the gold standard approaches [3, 4, 5].
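The two classification rules described above can be written as a minimal Python sketch. The function names, the "not MSI-H" label, and the "pMMR" (proficient MMR) label are illustrative assumptions and do not come from the included studies; only the panel markers, the four proteins, and the thresholds (at least 2 of 5 loci shifted; at least one protein absent) are taken from the text.

```python
# Marker panels and MMR proteins as listed in the text.
NCI_PANEL = ("BAT-25", "BAT-26", "D2S123", "D5S346", "D17S250")
PROMEGA_PANEL = ("BAT-25", "BAT-26", "NR-21", "NR-24", "MONO-27")
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def classify_msi(shifted_loci, panel=NCI_PANEL, min_shifted=2):
    """Return 'MSI-H' when at least `min_shifted` panel loci show a size shift."""
    shifted = set(shifted_loci)
    unknown = shifted - set(panel)
    if unknown:
        raise ValueError(f"loci not in panel: {sorted(unknown)}")
    # Anything below the threshold is reported only as 'not MSI-H' here
    # (hypothetical label; the MSI-L/MSS split is not modeled).
    return "MSI-H" if len(shifted) >= min_shifted else "not MSI-H"


def classify_mmr(protein_present):
    """Return 'dMMR' when at least one of the four MMR proteins is absent.

    `protein_present` maps each protein name to True (expressed) or False.
    """
    missing = [p for p in MMR_PROTEINS if p not in protein_present]
    if missing:
        raise ValueError(f"no IHC result for: {missing}")
    all_present = all(protein_present[p] for p in MMR_PROTEINS)
    return "pMMR" if all_present else "dMMR"  # 'pMMR' is an assumed label
```

For example, `classify_msi(["BAT-25", "BAT-26"])` yields "MSI-H" under the 2-of-5 rule, while a tumor that lacks only MSH6 on IHC is classified as "dMMR".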

Among patients diagnosed with metastatic cancer and MSI-H or dMMR, prognosis is generally poor [6]. Recently, evidence has mounted on the benefits of immunotherapy, especially with checkpoint inhibitors such as pembrolizumab on MSI-H/dMMR tumors [7, 8, 9]. Historically, most patients with a solid tumor diagnosis were not tested for MSI; a better understanding of MSI-H and dMMR prevalence can help estimate the size of the potential target population. To provide reliable estimates of MSI-H and dMMR prevalence, a comprehensive structured literature review was conducted to gather relevant and recent evidence on the epidemiology of MSI-H and dMMR across multiple tumors. When sufficient data were available, meta-analysis was performed to estimate the prevalence of MSI-H and dMMR tumors overall, across individual tumor types, and by stage of disease.

2. Methods

Study eligibility criteria outlined in Table 1 guided study identification and selection for the literature review.

2.1. Literature Review

Relevant studies were identified by searching the following through the Ovid platform: Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica database (Embase), and Cochrane Central Register of Controlled Trials. Predefined search strategies were executed on October 26th, 2017. Study design filters recommended by the Scottish Intercollegiate Guidelines Network (SIGN) were used. Population terms were adapted from published research [9]; no intervention or comparator terms were used.

Systematic reviews, meta-analyses, and key narrative reviews of interest were identified via hand searching. Targeted hand searches were conducted to identify colorectal cancer (CRC) studies and pan-tumor genomic studies reporting MSI-H/dMMR prevalence. Studies for all solid tumors except CRC were selected through database searches; CRC and pan-tumor genomic studies were selected through a targeted review. One reviewer reviewed all abstracts and proceedings identified through database searches and the targeted review according to the selection criteria. Studies identified as potentially eligible during abstract screening were screened in full text by the same reviewer. The full-text studies identified at this stage were included for data extraction. The process of study identification and selection is summarized with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagrams [10].

One reviewer extracted data on study characteristics, interventions, patient characteristics, and outcomes from included studies. The second reviewer independently extracted data from a random 10% of the publications, reconciled the data, and determined the error rate and missing data rate of data extraction by the first reviewer. The error rate (number of cells with incorrect data/number of cells with text) was 2.9%, and the missing data rate (number of cells with missing data/number of blank cells) was 1.2% (an error rate greater than 5% would have triggered extraction of a further 10% of publications by the second reviewer). All errors discovered through this process were corrected. Potential publication biases were checked through funnel plots. Data were stored and managed in a Microsoft Excel workbook.
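The two quality-check rates defined above, together with the 5% trigger, reduce to simple ratios. The sketch below is illustrative only: the function names are hypothetical, and any cell counts passed to it would be the reviewer's own tallies, which are not reported in this paper.

```python
def extraction_rates(incorrect_cells, cells_with_text, missing_cells, blank_cells):
    """Return (error_rate, missing_data_rate) using the definitions in the text.

    error_rate        = cells with incorrect data / cells with text
    missing_data_rate = cells with missing data   / blank cells
    """
    error_rate = incorrect_cells / cells_with_text
    missing_data_rate = missing_cells / blank_cells
    return error_rate, missing_data_rate


def needs_further_extraction(error_rate, threshold=0.05):
    """True when the error rate exceeds the 5% trigger for a further 10% check."""
    return error_rate > threshold
```

With the reported error rate of 2.9%, `needs_further_extraction(0.029)` is False, which matches the fact that no additional sample was extracted.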

Only studies that used PCR or IHC methods were included in this review. To increase validity of the meta-analysis, only studies that used NCI (BAT-25, BAT-26, D2S123, D5S346, and D17S250) or Promega (BAT-25, BAT-26, NR-21, NR-24, and MONO-27) panels in PCR and MLH1, MSH2, MSH6, and PMS2 proteins in IHC were included in the meta-analysis. The only exceptions were pan-tumor genomic studies, which used large-scale sequencing techniques to test for the presence of only the MLH1 gene. These genomic studies were included in sensitivity analyses to detect their potential effect on the meta-analysis.

Prevalence of MSI-H and/or dMMR was extracted overall, by tumor type, histology, stage, and country.

2.2. Meta-Analysis

Reported proportions were transformed according to the Freeman–Tukey variant of the arcsine square root (double arcsine) transformed proportion [11]. The pooled proportion was calculated by back-transforming the weighted mean of the transformed proportions, using the DerSimonian–Laird random effects model [12].
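As a rough illustration of this pooling step, the following Python sketch applies the Freeman–Tukey double arcsine transformation to each study, pools the transformed values with a DerSimonian–Laird random effects model, and back-transforms the result. It is not the authors' code (the analysis was run with metafor in R), and it makes three simplifying assumptions: the transformation is written in the halved form with variance 1/(4n + 2), the back-transformation uses the approximation sin²(y) rather than an exact inverse that depends on sample size, and the tumor-level prevalence weights described in this section are not applied.

```python
import math


def freeman_tukey(events, n):
    """Double arcsine transform of events/n and its approximate variance."""
    y = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    v = 1.0 / (4 * n + 2)
    return y, v


def back_transform(y):
    """Approximate inverse: sin^2(y), with y clamped to [0, pi/2]."""
    y = min(max(y, 0.0), math.pi / 2)
    return math.sin(y) ** 2


def pool_dersimonian_laird(studies, z=1.96):
    """Pool (events, n) pairs; return prevalence, 95% CI bounds, and tau^2."""
    ys, vs = zip(*(freeman_tukey(x, n) for x, n in studies))
    k = len(ys)
    w = [1.0 / v for v in vs]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of between-study variance, truncated at 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]          # random effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "prevalence": back_transform(mu),
        "ci_low": back_transform(mu - z * se),
        "ci_high": back_transform(mu + z * se),
        "tau2": tau2,
    }
```

A call such as `pool_dersimonian_laird([(10, 100), (20, 100), (15, 150)])`, with made-up counts, returns a pooled proportion with its interval on the original 0 to 1 scale.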

Meta-analysis was conducted using the metafor package version 1.9-9 in R 3.4.0. Weighting of each tumor type was based on cancer-specific prevalence estimates provided by the GLOBOCAN 2012 database from the World Health Organization [13]. For rare tumor types, when data were unavailable on the GLOBOCAN database, other databases and peer-reviewed publications were referenced [14–18]. Each tumor type was assigned a weight based on its general prevalence; in cases where two or more studies were included for a given tumor type, weight was split proportionally between studies based on the sample size.
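The weight-splitting rule can be sketched as follows. The function name and input layout are hypothetical, and the tumor weights in the usage note are invented for illustration rather than taken from GLOBOCAN. How these weights were combined with the random effects weights inside metafor is not specified in the text and is not modeled here.

```python
def split_tumor_weights(tumor_weights, studies):
    """Split each tumor type's weight across its studies by sample size.

    tumor_weights: dict mapping tumor type -> weight for that tumor type
    studies:       list of (study_id, tumor_type, sample_size) tuples
    Returns a dict mapping study_id -> study-level weight.
    """
    # Total sample size per tumor type across its included studies.
    totals = {}
    for _, tumor, n in studies:
        totals[tumor] = totals.get(tumor, 0) + n
    # Each study receives its share of the tumor weight, proportional to n.
    return {
        study_id: tumor_weights[tumor] * n / totals[tumor]
        for study_id, tumor, n in studies
    }
```

For instance, with illustrative tumor weights of 0.6 (gastric) and 0.4 (renal), two gastric studies of 100 and 300 patients receive 0.15 and 0.45, and a single renal study keeps the full 0.4.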

3. Results

The study selection process for identification of studies reporting MSI-H or dMMR prevalence in the structured literature review is outlined in Figure 1. Overall, 1,176 publications were assessed for eligibility; a total of 156 full-text publications were included based on the structured and targeted literature review.

3.1. Feasibility Assessment of Meta-Analysis

References for included studies can be found in Tables 2–4. Of the 156 included full-text publications, 103 studies reported prevalence of any MSI status, which included MSI-H, MSI-L (microsatellite instability-low), and MSS (microsatellite stable). Forty-eight studies reported prevalence of dMMR according to the eligibility criteria. Five large pan-tumor genomic studies reported MSI-H status across multiple solid tumors.

3.2. Study Characteristics

The most common tumor types (excluding CRC) identified were endometrial (53 studies), gastric (39 studies), ovarian (23 studies), renal (9 studies), and esophageal (6 studies). Twenty CRC studies were identified from the targeted review. Overall, 54 studies were conducted in the United States, 18 in Korea, 12 in Japan, 12 in multiple countries, and 60 in other countries. Most studies provided an MSI-H cut-off between 30 and 40%, inclusive, translating into a change in loci size of greater than or equal to 2 of 5 loci tested; however, there were two prominent outliers at 9% (Glavac 2003) and 66% (Wen 2012). Fifty-four studies used all four MMR proteins to detect MMR status, 3 studies used three proteins, 6 studies used two proteins, and 3 studies did not specify number of proteins used. Included studies reported different study designs: case control, cross-sectional, prospective cohort, and retrospective cohort.

3.3. Patient Characteristics

Across studies, the mean/median age ranged from 20.7 to 74 years. Percentage of patients by ethnicity ranged as follows: Caucasian (0–94.8%), African American (0–17.2%), Asian (0–100%), and other ethnic groups (0–13.8%). In studies where disease stage was reported, percentage of patients with stage 1 disease ranged from 0 to 80.7%, stage 2 disease ranged from 4.2 to 38.6%, stage 3 disease ranged from 8 to 73.5%, and stage 4 disease ranged from 0 to 97.7%.

3.4. MSI-H and dMMR Prevalence

The number of studies with available MSI-H and dMMR data is presented in Table 5. Of the 156 included studies, MSI-H prevalence as determined by NCI or Promega markers was reported in 90 studies, and MSS prevalence was reported in 79 studies. Sixty-six studies reported dMMR prevalence; 54 of those used all 4 MMR proteins in the IHC assay. Pooled MSI-H and dMMR prevalence estimates were reported in 140 studies.

MSI-H prevalence was available in 25 studies conducted in the United States, 17 studies conducted in Korea, and 8 studies conducted in Japan. dMMR prevalence data were available in 27 studies conducted in the United States and 2 studies conducted in Japan. MSI-H prevalence was reported by stages 1 (18 studies), 2 (18 studies), 3 (17 studies), 4 (16 studies), 1 or 2 (24 studies), and 3 or 4 (23 studies).

Beyond the 6 main tumor types feasible for tumor-specific meta-analyses, 19 other tumor types were included in the meta-analysis of overall MSI-H prevalence. Overall, MSI-H prevalence differed considerably across tumor types. A low prevalence of 2% (95% CI, 0%–8%) was observed in Ewing sarcoma [19], while a much higher prevalence of 35% (95% CI, 15%–57%) was reported in sebaceous tumors [20]. Small bowel [21] and cervical tumors [22] had prevalence of 12% each, which were very close to the all-tumor estimate.

3.5. Meta-Analysis Results: Random Effects

Overall meta-analysis results are presented in Figure 2. Prevalence estimates, 95% confidence intervals, and number of studies included in each analysis are shown. Meta-analysis results obtained from the random effects model in all tumor types are presented as forest plots in the Supplementary information (Appendix Figures 1–26). Funnel plots obtained from each meta-analysis are also presented in the Supplementary information (Appendix Figures 27–44).

The weighted prevalence of MSI-H without genomic studies was estimated to be 14% (95% CI, 10%–19%) across all tumor types and stages. The prevalence was 10% (95% CI, 7%–14%) when four of the five large pan-tumor genomic studies were included (one genomic study was excluded as it did not report the total number of patients or the number of patients with MSI-H). Overall weighted dMMR prevalence was estimated to be 16% (95% CI, 11%–22%) across all tumor types and stages. This estimate remained unchanged (16% (95% CI, 12%–21%)) in the sensitivity analysis, in which two studies (Everett 2014 and Roberts 2013) that possibly screened patients based on their Lynch syndrome status were excluded. Overall, MSS prevalence was found to be 79% (95% CI, 72%–85%) across tumor types and stages. Estimated pooled MSI-H and dMMR prevalence without genomic studies was 15% (95% CI, 11%–18%) and dropped to 11% (95% CI, 8%–15%) when genomic studies were included.

Country-specific MSI-H prevalence was estimated only in the United States, Korea, and Japan, for which at least 2 publications were included. The weighted prevalence of MSI-H for the United States, Korea, and Japan was estimated at 20% (95% CI, 16%–24%), 9% (95% CI, 6%–12%), and 16% (95% CI, 9%–26%), respectively, across all cancers and stages. dMMR all-stage prevalence for the United States was estimated at 14% (95% CI, 6%–23%) and for Japan was estimated at 20% (95% CI, 0%–63%). Stages 1-2 MSI-H prevalence was 15% (8–23%), while stage 3 and stage 4 prevalence was estimated at 9% (3%–17%) and 3% (1%–7%), respectively.

Tumor-specific meta-analysis was feasible for 3 key gastrointestinal tumors (gastric, colorectal, and esophageal), 2 gynecological tumors (endometrial and ovarian), and 1 genitourinary tumor (renal), with results presented in Figures 3–5. Among the gastrointestinal tumors, gastric cancer MSI-H pooled prevalence (with 95% CI) from 32 studies (16,308 patients) was estimated at 11% (9–12%) and dMMR pooled prevalence from 4 studies (854 patients) was estimated at 8% (2–17%). Based on stages across gastrointestinal tumors, the prevalence was 13% (10%–16%; 10 studies; 3,194 patients) for stages 1-2 and 10% (7–13%; 10 studies; 1,319 patients) for stages 3-4. The highest MSI-H pooled prevalence was observed for the intestinal histological subtype with 13% (10–17%) based on 14 studies (2,652 patients). In CRC, MSI-H pooled prevalence from 14 studies (8,156 patients) was estimated at 13% (10–16%) and dMMR pooled prevalence from 4 studies (11,434 patients) was estimated at 10% (5–15%). For stages 1-2 CRC, the prevalence was 20% (10%–32%; 4 studies; 888 patients), and for stages 3-4, the prevalence was 9% (3–16%; 4 studies; 873 patients). Based on histology, the highest MSI-H pooled prevalence was observed for the poorly differentiated CRC subtype with 32% (25–40%) based on 6 studies (1,204 patients). Among esophageal cancers, MSI-H pooled prevalence from 3 studies (147 patients) was estimated at 4% (0–11%). For stages 3-4 esophageal cancers, the prevalence was 18% (4%–39%; 2 studies; 62 patients). Based on histology, the highest MSI-H pooled prevalence was observed for well-differentiated and poorly differentiated esophageal subtypes with 16% (3–35%) and 16% (0%–45%), respectively. dMMR analysis was not feasible for esophageal tumors.

For the gynecological tumors, endometrial cancer MSI-H pooled prevalence from 27 studies (6,813 patients) was estimated at 26% (23–29%) and dMMR pooled prevalence from 26 studies (5,248 patients) was estimated at 25% (22–28%). In ovarian cancers, MSI-H pooled prevalence from 17 studies (4,150 patients) was estimated at 11% (6–18%) and dMMR pooled prevalence from 5 studies (356 patients) was estimated at 8% (6–11%). Based on histology, the highest MSI-H pooled prevalence was observed for the endometrioid subtype for each tumor with 30% (25–35%) based on 6 studies (1,204 patients) for endometrial cancers and 17% (25–35%) based on 3 studies (211 patients) for ovarian cancers. Among renal tumors, MSI-H all-stage prevalence was estimated to be 1% (95% CI, 0%–2%) based on 7 studies (2,231 patients); dMMR analysis was not feasible for renal tumors.

4. Discussion

This structured literature review and meta-analysis investigated MSI-H and dMMR prevalence across tumor types and compared prevalence estimates by tumor type, tumor stage, and country subgroups. Analysis results estimated the prevalence of MSI-H across all tumor types as 14% (95% CI, 10%–19%). dMMR prevalence was comparable at 16% (95% CI, 11%–22%).

Pooled dMMR prevalence estimates by tumor type were similar to those for MSI-H. It has been suggested that, for Lynch syndrome testing, PCR testing may be less sensitive than IHC because mutations in MSH6 may present as MSI-L [23]. The results of this review, however, suggest that MSI-H results from PCR testing and dMMR results from IHC testing are generally comparable.

The United States had higher MSI-H prevalence than Korea and Japan, but this result is possibly biased due to the lack of weighting for country-specific tumor prevalence.

Subgroup analysis indicated that early-stage disease (stages 1 and 2) tended to have a higher MSI-H prevalence than later stages (stages 3 and 4). Numerous studies have established the value of MSI status as a prognostic factor [24–26]. Results of a meta-analysis including 7,642 patients indicated that MSI (MSI-H + MSI-L) tumors corresponded with significantly improved prognosis compared to MSS CRCs (overall survival HR 0.65 (95% CI, 0.59–0.71)) [27]. This may partially explain the lower MSI-H prevalence in the later stages of cancers.

Some tumor types had noticeably higher MSI-H prevalence than others. Endometrial tumors had MSI-H prevalence of 26% (95% CI, 23%–29%), whereas renal tumors only had MSI-H prevalence of 1% (95% CI, 0%–2%). This observation corroborates findings from recent genomic studies, which revealed that the frequency of MSI-H events is highly variable across tumor types [13, 28]. One study noted that MSI-H prevalence was highest in Lynch syndrome-associated tumor types (endometrial, colon, gastric, and rectal) [13] which is well-aligned with findings from the current study.

The identified evidence base included 156 articles reporting on the prevalence of MSI-H and/or dMMR published between 1999 and 2017. This review includes more cancer types than any published review to date. Of the other three known published meta-analyses that have quantified the prevalence of MSI-H for selected tumors, the first (including publications to 2007) reported an MSI-H prevalence of 12% (95% CI, 8%–17%) in ovarian tumors [29], the second (including publications to 2009) reported an MSI-H prevalence of 10% (95% CI, 6%–14%) in ovarian tumors [30], and the third (including studies published up to 2014) reported an MSI-H prevalence of 17% (95% CI, 15%–19%) in colorectal tumors [31]. The findings from the current meta-analysis suggest an MSI-H prevalence of 11% (95% CI, 6%–18%) in ovarian cancer patients and 13% (95% CI, 10%–16%) in colorectal cancer patients, which are well-aligned with findings from previous meta-analyses.

This large-scale meta-analysis of the prevalence of MSI-H and dMMR used rigorous methodology in selection of testing methods, subgroup analyses, and incorporation of pan-tumor genomic studies in sensitivity analyses. First, this meta-analysis of MSI-H and dMMR prevalence included the largest number of studies (156) to date. Second, weighting techniques were used to adjust for overall tumor prevalence in order to prevent oversampling of commonly reported tumor types. Third, only studies that utilized the “gold standard” MSI-H and dMMR testing methods were included in the meta-analysis, so the results from these studies were more comparable. Fourth, the subgroup analyses, which were stratified by factors such as tumor type, country, and disease stage, indicated which factors had potential association with prevalence. Fifth, the inclusion of pan-tumor genomic studies in the sensitivity analyses offered an alternative scenario and suggested that the testing method used in large-scale genomic studies (sequencing) is significantly different from the widely accepted methods (PCR and IHC) used in other included studies.

This meta-analysis has some limitations. First, the literature review for CRC was a targeted hand search; some potentially relevant publications may not have been identified. Second, studies were reviewed by a single researcher, although a quality check was performed to validate the dataset. An additional limitation was the heterogeneity of the included study designs, which comprised case control, cross-sectional, prospective cohort, and retrospective cohort studies. However, because data were scarce for most cancer types, studies with different designs were included to maximize the data sources. Symmetry was observed on most funnel plots, which suggests a lack of publication bias. To address heterogeneity in study designs included in the meta-analysis, data were analyzed using fixed- and random-effects models; however, this exploration did not provide evidence of any specific source of heterogeneity. Finally, given the lack of MSI/MMR publications on a few major cancer types, the “overall” prevalence estimate does not include all solid tumors.

Recent evidence [32, 33] supporting the role of MSI-H and dMMR, and associated immunogenicity as a mechanism for increased efficacy of PD-1/PD-L1 blockade in metastatic tumors with MSI-H or dMMR [8], demonstrates the importance of increasing understanding [34] of prevalence across tumor type, stage, histology, and ethnicity.

Conflicts of Interest

M. Amonkar and K.-L. Liaw are employees of and own shares in Merck & Co., Inc. The other authors declare no conflicts of interest.


This work was financially supported by Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA.

Supplementary Materials

Meta-analysis results obtained from the random effects model in all tumor types are presented as forest plots in the Supplementary information (Appendix Figures 1–26). Funnel plots obtained from each meta-analysis are also presented in the Supplementary information (Appendix Figures 27–44).