#### Abstract

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.

#### 1. Introduction

The last two decades have seen an exponential growth in the popularity of meta-analyses across scientific disciplines including medical research [1] and diagnostic medicine [2]. The purpose of meta-analysis is to assess the consistency and robustness of findings across populations, settings, and contextual factors in order to help ensure that a practice is likely to produce similar results when it is implemented. A single study cannot determine, with certainty, that an intervention works or does not work. Instead, studies that are combined together, across different settings, and conducted over time can establish a pattern of consistent findings that may be useful to justify new or refined practice. Several studies combined can establish both significance and repeatability of results [3]. Common statistical methods to combine studies in meta-analyses are based on a fixed-effects model which assumes a homogeneous treatment effect among studies or a more general random-effects model which allows heterogeneity among studies [4].

Multiple meta-analyses are sometimes conducted to investigate the effect of the same topic or intervention. The synthesis of these meta-analyses can be used to highlight what is known on a particular topic or intervention or what can contribute to more complete understanding of the extant empirical evidence [5]. However, the summarizing of meta-analyses can be very challenging because existing meta-analyses may be conflicting, may be reported differently or incompletely across studies, or may be more or less valid based on the quality of the research synthesis methods that were employed in conducting the reviews. This raises challenges for interpreting and drawing conclusions about what the results of these studies mean and how they should be used to inform future research, theory development, and practice. Systematic reviews are further complicated when a diverse array of health-relevant outcomes are assessed. This is essentially the same issue researchers and practitioners face when attempting to summarize primary studies, the same issue that makes meta-analysis appealing.

Several nonstatistical approaches to summarizing meta-analyses of the same topic or intervention are currently available. The Cochrane Collaboration has developed a set of recommended procedures for conducting overviews of reviews when multiple meta-analyses exist regarding different treatments for the same clinical condition [6–8]. Nonstatistical approaches to summarizing multiple reviews, which may be used for meta-analyses, include vote counting and using decision algorithms to identify the review(s) that are most salient [7]. However, narrative reviews can be far too subjective to reflect the knowledge that has been gained through research [9]. An ideal statistical approach to summarizing meta-analyses is to conduct a new meta-analysis of all primary studies included in multiple meta-analyses. That is, individual studies in all the related meta-analyses are identified, and their effect sizes are combined to calculate a summary statistic using meta-analytic techniques such as the random-effects model in DerSimonian and Laird [4]. However, while this approach is ideal and appealing when there are only two or three meta-analyses to combine, it is not as efficient as directly summarizing effect sizes reported in existing meta-analyses on similar topics. Recently, Hemming et al. [10] provided a Bayesian method to summarize multiple reviews. Their method assumes that estimated summary effect sizes follow a random-effects model based on an exchangeability assumption. Differing types of effect sizes are not considered in their method. In addition, the authors did not study the fixed-effects model, under which the combined overall effect size coincides with the estimate obtained from pooling all individual studies.

In this paper we start by describing methods to meta-analyze effect size estimates from individual studies extracted from several existing meta-analyses. However, these methods require a substantial amount of time and resources. Thus, we also introduce a more efficient statistical method to directly summarize information from existing meta-analyses without going back to individual studies. The paper is organized as follows. In Section 2 we introduce the notation and our method for synthesizing meta-analyses with the same type of summary statistics. We also describe two-step methods for combining different types of summary statistics commonly used in meta-analyses. We start with the fixed-effects model and show that summarizing the reviews yields the same overall effect size as conducting a meta-analysis on all possible individual studies. We then describe the method for the random-effects model. Section 3 applies the proposed method to two examples. Section 4 discusses some potential applications of this method.

#### 2. Methods

The use of quantitative research synthesis techniques such as meta-analysis overcomes many of the limitations of traditional narrative literature reviews. Meta-analysis provides an objective and quantitative approach to research synthesis by taking into account factors such as sample size, magnitude, and direction of relationships and the methodological quality of the various studies analyzed [11–13]. Meta-analyses are not limited by a reliance on traditional indicators of statistical significance but instead rely upon effect sizes to give a picture of the size and scope of the impact of an intervention [14]. However, the more meta-analyses that are conducted on a given topic, the more difficult it can be to analyze and determine the cumulative findings from a body of literature.

Calculating the summary effect sizes across existing meta-analyses is a potentially valuable way to synthesize the knowledge base on a given topic. This process should be guided by the same rule that guides traditional meta-analysis which requires that studies being combined should be assessing a common clinical research question. Suppose several meta-analyses consider a common clinical question, and use similar search criteria to include and exclude individual studies. Summarizing these meta-analyses can provide better understanding of the clinical question than any single meta-analysis.

When multiple meta-analyses address the same topic, we propose that researchers should explore methods to synthesize the summary effect sizes from the existing meta-analyses. This proposed approach only requires summary effect sizes and their variances which are reported in the existing meta-analyses. The calculated overall effect size for the combined meta-analyses is essentially a weighted average of the summary effect sizes with the weights being the inverse of the variances; this approach is analogous to the techniques used to conduct a meta-analysis of primary studies.

Popular effect size estimates used in meta-analyses include standardized mean differences, odds ratios, and correlation coefficients [14]. In this section, we first consider the simplest case of combining multiple meta-analyses when all analyses use the same type of effect size estimates in a fixed-effects model. We then extend the method to the more commonly used random-effects model. Finally, we present a method for combining meta-analyses with different types of summary statistics in the fixed-effects or random-effects model.

##### 2.1. Combining Effect Sizes from the Same Measure

###### 2.1.1. Synthesizing All Individual Studies

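Suppose $K$ meta-analyses using similar search criteria on a common topic are identified and need to be combined to obtain an overall summary statistic. We use $\theta_{ki}$ and $Y_{ki}$ to denote a true effect parameter and its estimate, respectively, for study $i$, $i = 1, \ldots, n_k$, in meta-analysis $k$, $k = 1, \ldots, K$. The estimate $Y_{ki}$ is commonly assumed to follow a normal distribution as $Y_{ki} \sim N(\theta_{ki}, \sigma_{ki}^2)$, where $\sigma_{ki}^2$ is the variance of $Y_{ki}$. The variance estimator, $s_{ki}^2$, obtained from the study tends to be smaller for a larger sample size. If all $Y_{ki}$'s and $s_{ki}^2$'s are available, researchers can conduct a meta-analysis using existing statistical methods based on a random-effects model (REM) or a fixed-effects model (FEM). As a popular model in meta-analysis, the REM assumes heterogeneity among the $\theta_{ki}$ and is given as follows:

$$Y_{ki} = \theta + \delta_{ki} + \epsilon_{ki}, \tag{1}$$

where $\delta_{ki} \sim N(0, \tau^2)$ and $\epsilon_{ki} \sim N(0, \sigma_{ki}^2)$. Here $\sigma_{ki}^2$ is estimated using the sample variance $s_{ki}^2$ from study $i$ in meta-analysis $k$. Conventionally, the true variance $\sigma_{ki}^2$ is assumed to be known and equal to $s_{ki}^2$, and to simplify the notation we will use $\sigma_{ki}^2$ instead of $s_{ki}^2$ throughout the paper. The random effect, $\delta_{ki}$, represents heterogeneity among all studies, and its variance $\tau^2$ can be estimated using a moment estimator by DerSimonian and Laird [4]:

$$\hat{\tau}^2 = \frac{Q - (N - 1)}{\sum_{k}\sum_{i} w_{ki} - \sum_{k}\sum_{i} w_{ki}^2 \big/ \sum_{k}\sum_{i} w_{ki}},$$

where

$$Q = \sum_{k}\sum_{i} w_{ki}\left(Y_{ki} - \bar{Y}\right)^2, \qquad \bar{Y} = \frac{\sum_{k}\sum_{i} w_{ki} Y_{ki}}{\sum_{k}\sum_{i} w_{ki}}, \qquad N = \sum_{k} n_k,$$

and the weight $w_{ki}$ is given by $1/\sigma_{ki}^2$, indicating a larger weight for a larger sample size. The moment estimate, $\hat{\tau}^2$, should always be nonnegative. If a negative value of $\hat{\tau}^2$ is obtained, $\hat{\tau}^2$ is set to 0 according to DerSimonian and Laird [4]. Using $w_{ki}^{*} = 1/(\sigma_{ki}^2 + \hat{\tau}^2)$ as a weight for study $i$ in meta-analysis $k$, an estimate for the summary effect $\theta$ is given by a weighted average of effect estimates:

$$\hat{\theta} = \frac{\sum_{k}\sum_{i} w_{ki}^{*} Y_{ki}}{\sum_{k}\sum_{i} w_{ki}^{*}}. \tag{4}$$

In the FEM, the random-effect component $\delta_{ki}$ in (1) is 0 by assuming true effect parameters are homogeneous across studies, and the REM in (1) becomes

$$Y_{ki} = \theta + \epsilon_{ki}.$$

The estimate for $\theta$ can be obtained using (4) by setting $\hat{\tau}^2 = 0$.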
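The pooled estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the paper's calculations use the R package "metafor"); the function name `pool_effects` and the three-study data set are hypothetical.

```python
def pool_effects(estimates, variances, random_effects=True):
    """Inverse-variance pooling of study-level effect estimates.

    Returns (theta_hat, var_theta_hat, tau2_hat). With random_effects=False
    the between-study variance is fixed at 0 (fixed-effects model).
    """
    w = [1.0 / v for v in variances]                 # w = 1 / sigma^2
    sum_w = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    tau2 = 0.0
    if random_effects and len(estimates) > 1:
        # DerSimonian-Laird moment estimator, truncated at 0.
        q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, estimates))
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        tau2 = max(0.0, (q - (len(estimates) - 1)) / c)

    w_star = [1.0 / (v + tau2) for v in variances]   # w* = 1 / (sigma^2 + tau^2)
    theta = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return theta, 1.0 / sum(w_star), tau2


# Hypothetical data: three studies with equal within-study variances.
theta, var_theta, tau2 = pool_effects([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
theta_fe, var_fe, _ = pool_effects([0.1, 0.3, 0.5], [0.01, 0.01, 0.01],
                                   random_effects=False)
```

With these inputs the random-effects variance of the pooled estimate is larger than the fixed-effects variance, because the between-study component is added to every study's variance before weighting.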

###### 2.1.2. Synthesizing Meta-Analyses

Obtaining $\hat{\theta}$ with results from all individual studies as in (4) is an ideal way to synthesize multiple meta-analyses. In the traditional meta-analysis, hundreds of studies are often available on a similar topic, and when authors of some individual studies are not contactable, criteria can be set accordingly to exclude these studies. However, extracting individual study data requires tremendous time and resources. Alternatively, we propose to directly summarize meta-analyses. We start with showing the equivalence of synthesizing meta-analyses and synthesizing all individual studies under the simple FEM which assumes the same effect parameter, $\theta$, across studies. The effect parameter in meta-analysis $k$ is estimated using

$$\hat{\theta}_k = \frac{\sum_{i} w_{ki} Y_{ki}}{\sum_{i} w_{ki}},$$

where $Y_{ki}$ is the effect estimate for study $i$ in meta-analysis $k$ and $w_{ki} = 1/\sigma_{ki}^2$ is the inverse of its variance. The variance estimate of $\hat{\theta}_k$ is given by $1/\sum_{i} w_{ki}$. Using $w_k = \sum_{i} w_{ki}$ as the weight for meta-analysis $k$, we can obtain a weighted average of meta-analysis results as follows:

$$\tilde{\theta} = \frac{\sum_{k} w_k \hat{\theta}_k}{\sum_{k} w_k}.$$

Because $w_k \hat{\theta}_k = \sum_{i} w_{ki} Y_{ki}$, the numerator equals $\sum_{k}\sum_{i} w_{ki} Y_{ki}$ and the denominator equals $\sum_{k}\sum_{i} w_{ki}$, so the expression above is the same as (4) with $\hat{\tau}^2 = 0$. The result implies that conducting a meta-analysis by pooling summary effect sizes of the meta-analyses of interest is equivalent to a meta-analysis of combining all individual studies from these meta-analyses. The variance of the weighted average is given by $1/\sum_{k} w_k$.
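The fixed-effects equivalence can be checked numerically. The sketch below assumes two hypothetical meta-analyses with no overlapping studies; the helper `fixed_effect` and all numbers are illustrative and not taken from the paper.

```python
def fixed_effect(estimates, variances):
    """Fixed-effects inverse-variance summary: returns (estimate, variance)."""
    w = [1.0 / v for v in variances]
    theta = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return theta, 1.0 / sum(w)


# Hypothetical study-level estimates and variances for two meta-analyses.
meta_a = ([0.10, 0.25, 0.40], [0.02, 0.05, 0.01])
meta_b = ([0.30, 0.15], [0.04, 0.03])

# Step 1: each meta-analysis reports its own summary estimate and variance.
summaries = [fixed_effect(*m) for m in (meta_a, meta_b)]

# Step 2: pool the reported summaries, weighting by inverse variance.
from_summaries = fixed_effect([s[0] for s in summaries],
                              [s[1] for s in summaries])

# Reference: pool all five individual studies directly.
from_studies = fixed_effect(meta_a[0] + meta_b[0], meta_a[1] + meta_b[1])
```

Both routes give the same estimate and the same variance up to floating-point error, which is the equivalence stated for the fixed-effects model.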

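The assumption of the same effect parameter across the studies may not be realistic considering studies' heterogeneous characteristics. This assumption is relaxed in the REM as shown in (1). Using the data from meta-analysis $k$, the parameter $\tau^2$ accounting for the random component is estimated by the following moment estimator:

$$\hat{\tau}_k^2 = \frac{Q_k - (n_k - 1)}{\sum_{i} w_{ki} - \sum_{i} w_{ki}^2 \big/ \sum_{i} w_{ki}}, \tag{8}$$

where

$$Q_k = \sum_{i} w_{ki}\left(Y_{ki} - \bar{Y}_k\right)^2, \qquad \bar{Y}_k = \frac{\sum_{i} w_{ki} Y_{ki}}{\sum_{i} w_{ki}}, \qquad w_{ki} = \frac{1}{\sigma_{ki}^2}.$$

The moment estimator $\hat{\tau}_k^2$ has the same expression as the DerSimonian and Laird estimator $\hat{\tau}^2$ based on all individual studies in Section 2.1.1, except that it is now obtained using only data from meta-analysis $k$. Meta-analysis $k$ then uses $\tilde{w}_{ki} = 1/(\sigma_{ki}^2 + \hat{\tau}_k^2)$ as a weight for study $i$ to obtain an estimate for the summary effect by

$$\hat{\theta}_k = \frac{\sum_{i} \tilde{w}_{ki} Y_{ki}}{\sum_{i} \tilde{w}_{ki}}.$$

To synthesize meta-analyses, a possible way is to use $\tilde{w}_k = \sum_{i} \tilde{w}_{ki}$, the inverse of the variance of $\hat{\theta}_k$, as the weight for meta-analysis $k$ and calculate a weighted average of the $\hat{\theta}_k$'s as follows:

$$\tilde{\theta} = \frac{\sum_{k} \tilde{w}_k \hat{\theta}_k}{\sum_{k} \tilde{w}_k}. \tag{11}$$

Comparison between the estimate above and the one in (4) shows some difference. We can write (4) as the weighted average of meta-analysis summary statistics in the following expression:

$$\hat{\theta} = \frac{\sum_{k} w_k^{*} \hat{\theta}_k^{*}}{\sum_{k} w_k^{*}},$$

where $w_k^{*} = \sum_{i} w_{ki}^{*}$, $\hat{\theta}_k^{*} = \sum_{i} w_{ki}^{*} Y_{ki} / \sum_{i} w_{ki}^{*}$, and $w_{ki}^{*} = 1/(\sigma_{ki}^2 + \hat{\tau}^2)$. A close look at $\tilde{w}_k$ and $w_k^{*}$ reveals a slight difference. This is due to different weight estimates, $\tilde{w}_{ki}$ and $w_{ki}^{*}$, where the former uses studies in meta-analysis $k$ to estimate $\tau^2$ and the latter pools studies from all meta-analyses to estimate $\tau^2$. In practice, a well-planned meta-analysis includes a large number of studies, and the $\hat{\tau}_k^2$'s and $\hat{\tau}^2$ should all be close to the true parameter $\tau^2$. Consequently, the difference between $\tilde{w}_k$ and $w_k^{*}$ can usually be ignored. Thus, although the weights used in (11) may not be as accurate as the ones obtained by pooling all individual studies, $\tilde{\theta}$ is still a statistically sound estimate for $\theta$.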
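Under the random-effects model the two routes no longer coincide exactly, because each meta-analysis estimates the between-study variance from its own studies. The sketch below illustrates this with two small hypothetical meta-analyses; `dl_pool` and `combine_summaries` are illustrative helper names, and with so few studies per meta-analysis the gap is larger than it would typically be in a well-populated review.

```python
def dl_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooling: (estimate, variance, tau2)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    theta = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return theta, 1.0 / sum(w_star), tau2


def combine_summaries(estimates, variances):
    """Inverse-variance weighted average of reported summary estimates."""
    w = [1.0 / v for v in variances]
    theta = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return theta, 1.0 / sum(w)


# Hypothetical data: meta-analysis A is heterogeneous, B is homogeneous.
meta_a = ([0.10, 0.30, 0.50], [0.01, 0.01, 0.01])
meta_b = ([0.20, 0.25, 0.22], [0.02, 0.02, 0.02])

sum_a = dl_pool(*meta_a)   # uses tau2 estimated from A only
sum_b = dl_pool(*meta_b)   # uses tau2 estimated from B only

# Route 1: synthesize the two reported summaries.
synthesized, synthesized_var = combine_summaries([sum_a[0], sum_b[0]],
                                                 [sum_a[1], sum_b[1]])

# Route 2: pool all six studies with a single tau2 estimate.
pooled, pooled_var, pooled_tau2 = dl_pool(meta_a[0] + meta_b[0],
                                          meta_a[1] + meta_b[1])
```

Comparing `synthesized` with `pooled` shows a small discrepancy that comes entirely from the different between-study variance estimates used in the weights.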

##### 2.2. Combining Effect Sizes from Different Types of Statistics

The summary statistics in meta-analyses may be of different types. When individual studies in a meta-analysis have continuous outcomes, the statistic may appear in the form of the sample mean difference. Sometimes, other studies address the same topic, but the outcome is dichotomized, and the odds ratio is commonly used to summarize the difference between two groups. The Pearson correlation coefficient $r$ is frequently used to evaluate the correlation between two continuous variables. When one variable is a continuous outcome and the other variable is dichotomized to indicate the group status, $r$ can be used to compare outcomes between two groups. When meta-analyses addressing the same topic use different types of statistics, one should convert these statistics to the same type of summary statistics before synthesizing them.

###### 2.2.1. Synthesizing All Individual Studies

We now discuss how to combine meta-analyses with these types of summary statistics using a two-step procedure. The first step is to convert various types of statistics to the same type. In this paper, we discuss how to convert the log-transformed odds ratio (OR) and the correlation coefficient to the same scale of the sample mean difference. The log-transformed OR can be converted to the sample mean difference using their linear relationship as will be discussed. However, a Taylor expansion is needed to convert the correlation coefficient to the sample mean difference via linear relationship. We focus on the linear relations because the weighted average in a meta-analysis is a linear combination of effect estimates, and preserving the linear nature in the conversion will facilitate the future calculation for synthesizing the meta-analyses.

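Suppose meta-analysis $k$, $k = 1, \ldots, K_1$, is conducted using the standardized sample mean difference $d_{ki}$ with its variance of $\sigma_{d,ki}^2$ and meta-analysis $k$, $k = K_1 + 1, \ldots, K_2$, has the estimated log-transformed odds ratio $l_{ki} = \log(\mathrm{OR}_{ki})$ as the summary effect and its sample variance of $\sigma_{l,ki}^2$.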

Chinn gives a detailed discussion on how to convert between the standardized mean difference and the estimated odds ratios from individual studies [15]. Since the log-transformed odds ratio, $l_{ki}$, in study $i$ of meta-analysis $k$ follows approximately a logistic distribution which differs from a standard normal distribution mainly in the tail area, dividing $l_{ki}$ by $\pi/\sqrt{3}$, or 1.81, converts $l_{ki}$ to an approximate standardized sample mean difference.

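In meta-analysis $k$, $k = K_2 + 1, \ldots, K$, with correlation coefficients, the coefficient $r_{ki}$ from study $i$ is converted to Fisher's $z$-score before the further calculation [14]. The conversion is given by

$$z_{ki} = \frac{1}{2}\log\frac{1 + r_{ki}}{1 - r_{ki}}.$$

Since a correlation coefficient can be converted to a standardized mean difference using $d = 2r/\sqrt{1 - r^2}$, we can convert Fisher's $z$-score to $d$ using $d_{ki} = e^{z_{ki}} - e^{-z_{ki}}$. The relationship above between $d_{ki}$ and $z_{ki}$ is nonlinear. The second order Taylor expansion of $e^{z} - e^{-z}$ around $z = 0$ gives the approximate linear relationship between $d_{ki}$ and $z_{ki}$, namely $d_{ki} \approx 2 z_{ki}$.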
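The conversions in the first step can be written as small helper functions. This is a sketch with hypothetical function names; the variance rescaling follows from treating each conversion as multiplication by a constant, which is the linear approximation the text relies on.

```python
import math

LOGIT_SD = math.pi / math.sqrt(3)  # approximately 1.81


def log_or_to_d(log_or, var_log_or):
    """Convert a log odds ratio and its variance to the mean-difference scale."""
    return log_or / LOGIT_SD, var_log_or / LOGIT_SD ** 2


def r_to_fisher_z(r):
    """Fisher's z transform of a correlation coefficient."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_z_to_d_exact(z):
    """Exact conversion: d = 2r / sqrt(1 - r^2) with r = tanh(z), i.e. 2*sinh(z)."""
    return math.exp(z) - math.exp(-z)


def fisher_z_to_d_linear(z, var_z):
    """Linear approximation d = 2z; the variance scales by 2^2 = 4."""
    return 2.0 * z, 4.0 * var_z


# Hypothetical check for r = 0.3.
r = 0.3
z = r_to_fisher_z(r)
d_direct = 2.0 * r / math.sqrt(1.0 - r ** 2)
d_exact = fisher_z_to_d_exact(z)
d_linear, _ = fisher_z_to_d_linear(z, 0.01)
```

For small to moderate correlations the linear approximation stays close to the exact conversion, and the gap widens as the correlation moves away from zero.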

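After the effect estimates are converted to the same scale of $d$, they can be used to estimate the overall effect parameter $\theta$ in the following REM model:

$$d_{ki} = \theta + \delta_{ki} + \epsilon_{d,ki}, \qquad \frac{\sqrt{3}}{\pi}\, l_{ki} = \theta + \delta_{ki} + \epsilon_{l,ki}, \qquad 2 z_{ki} = \theta + \delta_{ki} + \epsilon_{z,ki},$$

where $\delta_{ki} \sim N(0, \tau^2)$, $\epsilon_{d,ki} \sim N(0, \sigma_{d,ki}^2)$, $\epsilon_{l,ki} \sim N(0, 3\sigma_{l,ki}^2/\pi^2)$, and $\epsilon_{z,ki} \sim N(0, 4\sigma_{z,ki}^2)$. Here $d_{ki}$, $l_{ki}$, and $z_{ki}$ denote the standardized mean difference, the log-transformed odds ratio, and Fisher's $z$-score from study $i$ in meta-analysis $k$, with variances $\sigma_{d,ki}^2$, $\sigma_{l,ki}^2$, and $\sigma_{z,ki}^2$, respectively.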

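In the second step, $d_{ki}$, $l_{ki}$, and $z_{ki}$ are combined to obtain the overall summary effect estimate $\hat{\theta}$. Let $w_{d,ki} = 1/\sigma_{d,ki}^2$, $w_{l,ki} = \pi^2/(3\sigma_{l,ki}^2)$, and $w_{z,ki} = 1/(4\sigma_{z,ki}^2)$, and write $\sum_{d}$, $\sum_{l}$, and $\sum_{z}$ for sums over all studies in the meta-analyses using standardized mean differences, log-transformed odds ratios, and correlation coefficients, respectively. Again, the parameter $\tau^2$ can be estimated by a moment estimator similar to (8) using all effect estimates and their variances:

$$\hat{\tau}^2 = \frac{Q - (N - 1)}{W - W_2 / W}, \tag{17}$$

where $N$ is the total number of studies, $W = \sum_{d} w_{d,ki} + \sum_{l} w_{l,ki} + \sum_{z} w_{z,ki}$, $W_2 = \sum_{d} w_{d,ki}^2 + \sum_{l} w_{l,ki}^2 + \sum_{z} w_{z,ki}^2$, and

$$Q = \sum_{d} w_{d,ki}\left(d_{ki} - \bar{Y}\right)^2 + \sum_{l} w_{l,ki}\left(\frac{\sqrt{3}}{\pi}\, l_{ki} - \bar{Y}\right)^2 + \sum_{z} w_{z,ki}\left(2 z_{ki} - \bar{Y}\right)^2,$$

with $\bar{Y} = \left(\sum_{d} w_{d,ki} d_{ki} + \frac{\sqrt{3}}{\pi}\sum_{l} w_{l,ki} l_{ki} + 2\sum_{z} w_{z,ki} z_{ki}\right) / W$. The resulting estimate for the summary effect using the REM is given by

$$\hat{\theta} = \frac{\sum_{d} w_{d,ki}^{*} d_{ki} + \frac{\sqrt{3}}{\pi}\sum_{l} w_{l,ki}^{*} l_{ki} + 2\sum_{z} w_{z,ki}^{*} z_{ki}}{\sum_{d} w_{d,ki}^{*} + \sum_{l} w_{l,ki}^{*} + \sum_{z} w_{z,ki}^{*}},$$

where $w_{d,ki}^{*} = 1/(\sigma_{d,ki}^2 + \hat{\tau}^2)$, $w_{l,ki}^{*} = 1/(3\sigma_{l,ki}^2/\pi^2 + \hat{\tau}^2)$, and $w_{z,ki}^{*} = 1/(4\sigma_{z,ki}^2 + \hat{\tau}^2)$. The variance of $\hat{\theta}$ is given by $1/\left(\sum_{d} w_{d,ki}^{*} + \sum_{l} w_{l,ki}^{*} + \sum_{z} w_{z,ki}^{*}\right)$.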

###### 2.2.2. Synthesizing Meta-Analyses

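Combining meta-analyses with different types of statistics is similar to synthesizing meta-analyses using the same type of statistics except for the appropriate conversion of summary statistics. As before, $d_{ki}$, $l_{ki}$, and $z_{ki}$ denote the study-level standardized mean difference, log-transformed odds ratio, and Fisher's $z$-score, with variances $\sigma_{d,ki}^2$, $\sigma_{l,ki}^2$, and $\sigma_{z,ki}^2$. We let $\hat{\tau}_{d,k}^2$ be an estimate for $\tau^2$ in the meta-analysis using the standardized sample mean difference, $\hat{\tau}_{l,k}^2$ in the meta-analysis using the log-transformed odds ratio, and $\hat{\tau}_{z,k}^2$ in the meta-analysis using the correlation coefficient. These estimates take the similar form as (17) using data from the corresponding meta-analysis. The estimate for the summary effect in each meta-analysis is derived by using these estimates for $\tau^2$ and the variances of the effect estimates in the weights and is given by

$$\hat{d}_k = \frac{\sum_{i} \tilde{w}_{d,ki}\, d_{ki}}{\sum_{i} \tilde{w}_{d,ki}}, \qquad \hat{l}_k = \frac{\sum_{i} \tilde{w}_{l,ki}\, l_{ki}}{\sum_{i} \tilde{w}_{l,ki}}, \qquad \hat{z}_k = \frac{\sum_{i} \tilde{w}_{z,ki}\, z_{ki}}{\sum_{i} \tilde{w}_{z,ki}},$$

where $\tilde{w}_{d,ki} = 1/(\sigma_{d,ki}^2 + \hat{\tau}_{d,k}^2)$, $\tilde{w}_{l,ki} = 1/(3\sigma_{l,ki}^2/\pi^2 + \hat{\tau}_{l,k}^2)$, and $\tilde{w}_{z,ki} = 1/(4\sigma_{z,ki}^2 + \hat{\tau}_{z,k}^2)$. To directly synthesize these meta-analysis results, we use a weighted average of converted effect estimates as follows:

$$\tilde{\theta} = \frac{\sum_{k=1}^{K_1} \tilde{w}_{d,k}\, \hat{d}_k + \frac{\sqrt{3}}{\pi}\sum_{k=K_1+1}^{K_2} \tilde{w}_{l,k}\, \hat{l}_k + 2\sum_{k=K_2+1}^{K} \tilde{w}_{z,k}\, \hat{z}_k}{\sum_{k=1}^{K_1} \tilde{w}_{d,k} + \sum_{k=K_1+1}^{K_2} \tilde{w}_{l,k} + \sum_{k=K_2+1}^{K} \tilde{w}_{z,k}}, \tag{21}$$

where $\tilde{w}_{d,k} = \sum_{i} \tilde{w}_{d,ki}$, $\tilde{w}_{l,k} = \sum_{i} \tilde{w}_{l,ki}$, and $\tilde{w}_{z,k} = \sum_{i} \tilde{w}_{z,ki}$. The variance of $\tilde{\theta}$ is given by $1/\left(\sum_{k} \tilde{w}_{d,k} + \sum_{k} \tilde{w}_{l,k} + \sum_{k} \tilde{w}_{z,k}\right)$. The weights $\tilde{w}_{d,k}$, $\tilde{w}_{l,k}$, and $\tilde{w}_{z,k}$ are usually calculated from the variances, $p$ values, or the confidence intervals found in the original meta-analyses being combined. Articles which only report effect estimates do not provide sufficient information on the precision of the estimated effect and should be excluded from the synthesis. Following similar lines as in previous sections, we can show that the estimated overall effect is approximately the same as the one obtained by combining individual studies.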

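The effect estimates used in the expression (21) are already converted to the same type. The estimates reported in original meta-analyses may not be in the same form. For example, a meta-analysis using the correlation coefficient is likely to report the correlation coefficient instead of Fisher's $z$-score. Suppose $\mathrm{OR}_k$ is the estimated odds ratio in meta-analysis $k$, $k = K_1 + 1, \ldots, K_2$, and $r_k$ is the estimated correlation coefficient in meta-analysis $k$, $k = K_2 + 1, \ldots, K$. In terms of these indices and their variances originally reported in the meta-analyses, the overall effect estimate can be written as the following expression:

$$\tilde{\theta} = \frac{\sum_{k=1}^{K_1} \tilde{w}_{d,k}\, \hat{d}_k + \frac{\sqrt{3}}{\pi}\sum_{k=K_1+1}^{K_2} \tilde{w}_{l,k} \log(\mathrm{OR}_k) + \sum_{k=K_2+1}^{K} \tilde{w}_{z,k} \log\dfrac{1 + r_k}{1 - r_k}}{\sum_{k=1}^{K_1} \tilde{w}_{d,k} + \sum_{k=K_1+1}^{K_2} \tilde{w}_{l,k} + \sum_{k=K_2+1}^{K} \tilde{w}_{z,k}}, \tag{22}$$

where $\tilde{w}_{d,k} = 1/\widehat{\operatorname{Var}}(\hat{d}_k)$, $\tilde{w}_{l,k} = \pi^2 / \{3\,\widehat{\operatorname{Var}}(\log \mathrm{OR}_k)\}$, and $\tilde{w}_{z,k} = 1/\{4\,\widehat{\operatorname{Var}}(\hat{z}_k)\}$ with $\hat{z}_k = \frac{1}{2}\log\{(1 + r_k)/(1 - r_k)\}$. Its variance is

$$\widehat{\operatorname{Var}}(\tilde{\theta}) = \frac{1}{\sum_{k=1}^{K_1} \tilde{w}_{d,k} + \sum_{k=K_1+1}^{K_2} \tilde{w}_{l,k} + \sum_{k=K_2+1}^{K} \tilde{w}_{z,k}}.$$

The proposed estimate in (22) is a summary standardized sample mean difference. Its expression represents two steps. The first step is to convert the odds ratios and correlation coefficients to the same scale of the standardized sample mean difference, and the second step is to calculate the overall mean difference from all meta-analyses.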
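The two-step procedure can be sketched as a single function that takes the summaries as they are typically reported. This is an illustration under stated assumptions, not the authors' code: the function name `synthesize_mixed`, the tuple layout, and the input numbers are hypothetical, and the variances are assumed to be supplied on the log odds ratio scale for odds ratios and on the Fisher's z scale for correlations.

```python
import math


def synthesize_mixed(summaries):
    """Two-step synthesis of meta-analytic summaries on different metrics.

    `summaries` is a list of (kind, estimate, variance) tuples where kind is
      "d"  : standardized mean difference, variance of d
      "or" : odds ratio, variance of the LOG odds ratio
      "r"  : correlation coefficient, variance of its Fisher z-score
    Returns (overall estimate on the d scale, variance of that estimate).
    """
    converted = []
    for kind, est, var in summaries:
        if kind == "d":
            converted.append((est, var))
        elif kind == "or":
            scale = math.sqrt(3) / math.pi          # divide log OR by pi/sqrt(3)
            converted.append((scale * math.log(est), scale ** 2 * var))
        elif kind == "r":
            two_z = math.log((1.0 + est) / (1.0 - est))  # 2 * Fisher z
            converted.append((two_z, 4.0 * var))
        else:
            raise ValueError(f"unknown effect size type: {kind}")

    # Step 2: inverse-variance weighted average on the common scale.
    weights = [1.0 / v for _, v in converted]
    theta = sum(w * y for w, (y, _) in zip(weights, converted)) / sum(weights)
    return theta, 1.0 / sum(weights)


# Hypothetical reported summaries from three meta-analyses.
reported = [("d", 0.40, 0.010), ("or", 2.0, 0.09), ("r", 0.20, 0.0025)]
overall, overall_var = synthesize_mixed(reported)
```

Raising an error for an unrecognized metric mirrors the recommendation to exclude summaries that cannot be placed on the common scale with a usable variance.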

#### 3. Examples

We illustrate the proposed methods in two examples. The first example illustrates that combining two meta-analyses using our methods produces the same overall effect size as conducting a meta-analysis from all includable primary studies based on either the fixed-effects model or the random-effects model. The second example illustrates the process of combining multiple meta-analyses of the same clinical question which have concordant findings. The calculation is conducted using the R package “metafor.”

The first example is based on a meta-analysis by Swift and Callahan [16], which compared treatment outcomes between appropriately and inappropriately matched groups using 26 individual studies [16]. The treatments in these studies include pharmacotherapy, cognitive behavioral therapy, and group therapy. The treatment outcomes range from substance use days to weight loss. All the studies used the Pearson correlation coefficient as the effect estimate, and the estimates and the confidence intervals are presented in Table 1. The summary correlation coefficient estimate using all 26 studies is with 95% confidence interval based on a fixed-effects model. We conduct two separate meta-analyses based on studies 1–12 and studies 13–26 as cited in the reference section of the review in Swift and Callahan [16]. The summary effect sizes for these two meta-analyses are calculated to be with its sample variance being and with its sample variance being . It is of interest to investigate whether combining these two effect sizes gives a similar result as conducting the meta-analysis using all 26 individual studies. We use the inverse of the sample variances as the weights and calculate the weighted average of the summary effect sizes. The resulting overall effect size is with its variance being . The 95% confidence interval is also . The results are identical to those found using all individual effect sizes indicating the potential utility of this procedure.

To further illustrate the technique we repeat the analyses with a random-effects model (REM). We again use the 26 studies from Swift and Callahan [16] and illustrate that the proposed summary effect size obtained under REM may differ slightly from the one achieved from combining all individual studies. Under the REM, the summary correlation coefficient using all 26 studies is with 95% confidence interval (0.0941, 0.2253). We repeat the process described above based on studies 1–12 and studies 13–26, respectively. Using the random-effects model, the summary correlation coefficients for these two meta-analyses are with its variance being and with its variance being . Combining these two effect sizes produces an overall weighted mean correlation coefficient , with its variance being . This variance is smaller than those of the separate meta-analyses, indicating greater precision in estimating the overall correlation coefficient when more individual studies are included. The 95% confidence interval is . Although the estimates of the between-studies variance differ slightly across the two meta-analyses, combining the two pseudoanalyses produces a similar finding to that of combining all of the includable individual studies.

To illustrate the method for synthesizing different types of effect estimates, we create a synthetic dataset using the studies from Swift and Callahan [16]. We convert the correlation coefficients to standardized sample mean differences for studies 1–9 and to log-transformed odds ratios for studies 10–17. The estimates are kept unchanged for studies 18–26. The converted estimates and the corresponding confidence intervals are listed in Table 2. The variances of the estimates are obtained using . We use the fixed-effects model to conduct meta-analyses on these three sets of studies. The summary effect estimates are with , with , and with . The overall effect estimate in mean difference is calculated using (22) to be with variance . We see that this result is quite close to the summary effect size calculated from combining individual studies at the beginning of the section which is with the variance 0.0016 after converting to the mean effect size. This indicates that the proposed procedure for directly combining results from meta-analyses produces similar results to the ideal but much more tedious procedure, which instead combines individual studies extracted from the meta-analyses.

The anterior cruciate ligament (ACL) is a commonly reconstructed ligament of the knee. The meta-analyses conducted by Biau et al. [17] and M. C. Forster and I. W. Forster [18] indicate that the graft failure using hamstring autografts is not significantly different compared with bone-patellar tendon-bone autografts. We sought to apply our approach to synthesize these two meta-analyses on the graft choices in the ACL reconstruction. The summary odds ratios of graft failure in Biau et al. [17] are with the 95% confidence interval and with the 95% confidence interval in M. C. Forster and I. W. Forster [18]. We first convert the odds ratios and their confidence limits to the natural log scale since meta-analyses using odds ratios are commonly conducted using the weighted average of the log-transformed odds ratios. After the conversion, we obtain the log-transformed odds ratios with the variance and with the variance . The resulting overall estimate of log odds ratio is with the variance 0.0695. This variance is smaller than that of either meta-analysis due to the inclusion of more studies. The 95% confidence interval of is (). Converting back to the original scale, the overall estimated odds ratio based on the two meta-analyses is 1.2613 with 95% confidence interval (0.7523, 2.115). Although this confidence interval does not show a significant difference between the two graft choices, it is narrower than those in Biau et al. [17] and M. C. Forster and I. W. Forster [18]. This is expected since the overall estimated odds ratio calculated across meta-analyses is more precise than those calculated from separate meta-analyses.
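The log-scale arithmetic used in this example can be sketched as follows. The helper names are hypothetical, and the only inputs taken from the text are the overall odds ratio of 1.2613 and the log-scale variance of 0.0695; the per-study odds ratios from the two source meta-analyses are not reproduced here.

```python
import math


def variance_from_ci(lower, upper, z=1.96):
    """Recover the variance of a log odds ratio from a reported 95% CI."""
    return ((math.log(upper) - math.log(lower)) / (2.0 * z)) ** 2


def odds_ratio_ci(log_or, var_log_or, z=1.96):
    """Back-transform a log odds ratio and its variance to (lower, OR, upper)."""
    half_width = z * math.sqrt(var_log_or)
    return (math.exp(log_or - half_width),
            math.exp(log_or),
            math.exp(log_or + half_width))


# Overall estimate and log-scale variance as reported in the text.
lower, estimate, upper = odds_ratio_ci(math.log(1.2613), 0.0695)

# Round trip: the variance implied by the interval matches the input.
implied_var = variance_from_ci(lower, upper)
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits after back-transformation, so the upper limit must exceed the estimate.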

#### 4. Discussion

Despite the advantages of meta-analyses over traditional narrative review methods, these techniques are not without limitations. As is the case with primary studies, multiple meta-analyses often produce conflicting findings even when they examine the same body of literature [7] given the numerous methodological decisions that are made during the review process. These decisions affect the search strategy, coding of studies, moderators examined, inclusion/exclusion criteria, and the reporting of effect sizes, and each has the potential to impact the review findings. Despite the transparency of the method, there is always a certain degree of subjectivity in research synthesis [14]. We have presented methods which can synthesize meta-analysis findings with minimal subjectivity.

Attempting to make statements about the overall findings of a body of systematic review literature raises a number of challenges. Part of the challenge is the lack of consistency in how authors report the findings of meta-analyses [19, 20]. Since no standardized reporting procedures have been adopted, there is a great deal of variability in how meta-analytic findings are reported both within and across academic disciplines. Additionally, many extant meta-analyses do not completely report effect sizes; confidence intervals, variance measures, and precise values are often omitted. This lack of consistency and quality assurance is a major limitation of this body of research and potentially limits the transportability of meta-analysis findings [21]. Like primary studies, meta-analyses on similar constructs may differ from one another in many ways. To address this issue and make meta-analyses more compatible and therefore more appropriate for synthesis and better equipped to inform practice, it is apparent that the research community needs to continue to work towards adopting a set of standardized best practices for conducting and reporting meta-analyses in behavioral health (see, e.g., [22]).

While the examples provided here illustrate that it is possible to calculate summary effect sizes across multiple meta-analyses of the same intervention, our testing of these techniques is still ongoing. A potential barrier to this approach is the lack of independence that results when multiple meta-analyses include many of the same primary studies. Treating these overlapping meta-analyses as independent can produce incorrect estimates of the variance for the summary effect, but it does not invalidate the calculation of the effect size itself. This can be seen from (4) for combining individual studies. For example, suppose $K$ meta-analyses use exactly the same set of studies, with estimates $Y_i$ and weights $w_i^{*}$. Equation (4) becomes

$$\hat{\theta} = \frac{K\sum_{i} w_i^{*} Y_i}{K\sum_{i} w_i^{*}} = \frac{\sum_{i} w_i^{*} Y_i}{\sum_{i} w_i^{*}},$$

which gives the same estimate as the summary effect estimate in any one of the meta-analyses. Ideally, the resulting variance of $\hat{\theta}$ should be the same as the one in any one of the meta-analyses. But instead, we have $1/(K\sum_{i} w_i^{*})$, which tends to be considerably smaller than the appropriate value of $1/\sum_{i} w_i^{*}$. This limitation makes the proposed method more appropriate for meta-analyses with no or a limited number of overlapping studies. A possible inflation factor of $K$ can be multiplied with the variance in this case to correct for the biased variance estimator. More generally, with overlapping studies in all meta-analyses, the unbiased variance of the summary estimator can be written as

$$\operatorname{Var}(\hat{\theta}) = \frac{c}{\sum_{k}\sum_{i} w_{ki}^{*}},$$

where the inflation factor $c$ corrects for the underestimated variance and we have $c \geq 1$. If the $w_{ki}^{*}$'s can be retrieved from the overlapped studies, the factor $c$ can be precisely calculated. Otherwise, the investigator may have to guess an appropriate value for the inflation factor.
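One way to compute such an inflation factor when the overlap is known is sketched below. This is an illustrative derivation under a fixed-effects model with independent unique studies and known weights, not necessarily the authors' formula: if unique study $j$ has weight $w_j$ and appears in $m_j$ meta-analyses, the naive variance is $1/\sum_j m_j w_j$, while the actual variance of the same estimator is $\sum_j m_j^2 w_j / (\sum_j m_j w_j)^2$. The function names and inputs are hypothetical.

```python
def overlap_inflation_factor(weights, multiplicities):
    """Ratio of the actual to the naive variance of the pooled estimate.

    weights[j]        : inverse-variance weight of unique study j
    multiplicities[j] : number of meta-analyses that include study j
    Assumes a fixed-effects model and independent unique studies.
    """
    num = sum(m * m * w for w, m in zip(weights, multiplicities))
    den = sum(m * w for w, m in zip(weights, multiplicities))
    return num / den


def corrected_variance(weights, multiplicities):
    """Naive variance (overlap ignored) and its overlap-corrected value."""
    naive = 1.0 / sum(m * w for w, m in zip(weights, multiplicities))
    return naive, naive * overlap_inflation_factor(weights, multiplicities)


# Hypothetical weights for three unique studies.
w = [10.0, 20.0, 30.0]

# Two meta-analyses that share every study: the factor equals 2.
factor_full = overlap_inflation_factor(w, [2, 2, 2])
naive_var, fixed_var = corrected_variance(w, [2, 2, 2])

# No overlap at all: no correction is needed.
factor_none = overlap_inflation_factor(w, [1, 1, 1])

# Partial overlap: only the first study is shared.
factor_partial = overlap_inflation_factor(w, [2, 1, 1])
```

With complete overlap the factor reduces to the number of meta-analyses, matching the special case discussed in the text, and with partial overlap it falls between 1 and that number.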

While further research is needed to refine the technique proposed in this paper, the findings of these initial validations suggest the utility of this method for combining mean effect sizes from multiple meta-analyses of the same intervention or treatment. The methodology detailed in this paper has several possible applications. The technique has its most utility when several meta-analyses have been conducted on the same treatment and have produced varying results. Even when multiple meta-analyses report similar findings regarding the magnitude and direction of effect sizes, the proposed technique can be used to summarize across the extant meta-analytic evidence base. The technique also offers a solution when meta-analyses have been conducted to update prior research syntheses; rather than ignoring findings predating the inclusion criteria of the updated study, this technique can be used to summarize the full range of available evidence. These and other potential applications make this a particularly appealing technique for researchers and practitioners alike who are faced with the challenge of summarizing the rapidly expanding body of meta-analytic research in medicine and many other scientific disciplines.

#### Acknowledgments

The authors would like to thank the associate editor and two referees for their constructive comments.